The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys
Niels Claeys shares how his team at Dataminded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.
You will learn:
- Why the default Kubernetes scheduler wastes money on batch workloads, and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization
- How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques
- Node pool design principles that balance Kubernetes overhead with workload efficiency
- Platform-specific gotchas, like AWS cross-AZ data transfer costs, that can spike bills unexpectedly
Sponsor
This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive-scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW
Interested in sponsoring an episode? Learn more.
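To make the "least allocated" versus "most allocated" distinction concrete, here is a minimal sketch of the two scoring strategies that kube-scheduler's NodeResourcesFit plugin offers (selected via `scoringStrategy.type` in a KubeSchedulerConfiguration). It is a simplified model, not the episode's or Kubernetes' actual code: it assumes equal CPU and memory weights, ignores every other scoring plugin, and the `Node`, `schedule`, and `empty_nodes` names are hypothetical. It shows why bin-packing leaves whole nodes empty, which is what lets an autoscaler remove them sooner.

```python
from dataclasses import dataclass

MAX_NODE_SCORE = 100  # kube-scheduler scores nodes on a 0..100 scale


@dataclass
class Node:  # hypothetical, simplified node model
    name: str
    cpu_capacity: float  # cores
    mem_capacity: float  # GiB
    cpu_requested: float = 0.0
    mem_requested: float = 0.0


def least_allocated_score(node, pod_cpu, pod_mem):
    """Default-style strategy: prefer nodes with the most free capacity (spreads pods)."""
    cpu_free = (node.cpu_capacity - (node.cpu_requested + pod_cpu)) / node.cpu_capacity
    mem_free = (node.mem_capacity - (node.mem_requested + pod_mem)) / node.mem_capacity
    return MAX_NODE_SCORE * (cpu_free + mem_free) / 2  # assumes equal resource weights


def most_allocated_score(node, pod_cpu, pod_mem):
    """Bin-packing strategy: prefer nodes that are already busy."""
    cpu_used = (node.cpu_requested + pod_cpu) / node.cpu_capacity
    mem_used = (node.mem_requested + pod_mem) / node.mem_capacity
    return MAX_NODE_SCORE * (cpu_used + mem_used) / 2


def schedule(nodes, pods, score_fn):
    """Place each (cpu, mem) pod on the highest-scoring node that still fits it."""
    for pod_cpu, pod_mem in pods:
        fits = [
            n for n in nodes
            if n.cpu_requested + pod_cpu <= n.cpu_capacity
            and n.mem_requested + pod_mem <= n.mem_capacity
        ]
        if not fits:
            raise RuntimeError(f"pod ({pod_cpu} cores, {pod_mem} GiB) is unschedulable")
        # max() keeps the first node on ties, giving a deterministic choice
        best = max(fits, key=lambda n: score_fn(n, pod_cpu, pod_mem))
        best.cpu_requested += pod_cpu
        best.mem_requested += pod_mem
    return nodes


def empty_nodes(nodes):
    """Nodes with no requests: candidates for an autoscaler to scale down."""
    return [n.name for n in nodes if n.cpu_requested == 0 and n.mem_requested == 0]
```

With four 8-core/32 GiB nodes and four 2-core/8 GiB pods, the least-allocated strategy puts one pod on each node and leaves nothing to remove, while the most-allocated strategy packs all four pods onto `node-0` and leaves three nodes empty.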
--------
The Making of Flux: The Scale, a KubeFM Original Series
In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to build reliable, compliant, and scalable platforms.
- How Orange uses Flux to manage bare-metal Kubernetes through its SYLVR project
- Why financial services institutions (FSIs) rely on GitOps to balance agility with governance
- How Flux helps enterprises achieve resilience, compliance, and repeatability at scale
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Atlanta. Register here.
More info
Find all the links and info for this episode here: https://ku.bz/tWcHlJm7M
Interested in sponsoring an episode? Learn more.
--------
23:09
--------
How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov
Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.
You will learn:
- How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows
- A custom OCI image format implementation for managing 55-60 GB macOS VM images with layered copy-on-write disks and digest validation
- Networking and security challenges, including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH
- Real-world adoption considerations, including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
Sponsor
This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive-scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/q_JS76SvM
Interested in sponsoring an episode? Learn more.
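Digest validation in OCI images means hashing each blob and comparing the result with a content address of the form `algorithm:hex`, such as `sha256:…`. The sketch below illustrates that general idea only; it is not Agoda's implementation, and the `compute_digest` and `verify_reader` helpers are hypothetical. It streams the blob in chunks, because reading a 55-60 GB VM disk layer into memory at once would be impractical.

```python
import hashlib
import hmac

# Digest algorithms accepted by this sketch (the OCI image spec registers sha256 and sha512)
SUPPORTED_ALGORITHMS = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def compute_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return an OCI-style digest string such as 'sha256:<hex>' for in-memory bytes."""
    return f"{algorithm}:{SUPPORTED_ALGORITHMS[algorithm](data).hexdigest()}"


def verify_reader(reader, expected_digest: str, chunk_size: int = 1 << 20) -> bool:
    """Stream a binary file-like object and check it against an expected digest.

    Hashing chunk by chunk keeps memory use constant regardless of blob size.
    """
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm not in SUPPORTED_ALGORITHMS or not expected_hex:
        raise ValueError(f"unsupported or malformed digest: {expected_digest!r}")
    hasher = SUPPORTED_ALGORITHMS[algorithm]()
    for chunk in iter(lambda: reader.read(chunk_size), b""):
        hasher.update(chunk)
    # Constant-time comparison of the hex strings
    return hmac.compare_digest(hasher.hexdigest(), expected_hex)
```

In practice `reader` would be a blob file opened with `open(path, "rb")`. A mismatch indicates a corrupted or tampered layer, which should not be used to boot a VM.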
--------
The Making of Flux: The Rewrite, a KubeFM Original Series
In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.
- How early Flux addressed the risks of manual, unsafe Kubernetes upgrades
- Why the complete V2 rewrite was critical for stability, scalability, and adoption
- What the maintainers learned about building a sustainable, community-driven open-source project
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Atlanta. Register here.
More info
Find all the links and info for this episode here: https://ku.bz/bgkgn227-
Interested in sponsoring an episode? Learn more.
--------
44:58
--------
Scaling CI horizontally with Buildkite, Kubernetes, and multiple pipelines, with Ben Poland
Ben Poland walks through Faire's complete CI transformation, from a single Jenkins instance struggling with thousands of lines of Groovy to a distributed Buildkite system running across multiple Kubernetes clusters. He details the technical challenges of running CI workloads at scale, including API rate limiting, etcd pressure points, and the trade-offs of splitting monolithic pipelines into service-scoped ones.
You will learn:
- How to architect CI systems that match team ownership and eliminate shared failure points across services
- Kubernetes scaling patterns for CI workloads, including multi-cluster strategies, predictive node provisioning, and handling API throttling
- Performance optimization techniques like Git mirroring, node-level caching, and spot instance management for variable CI demands
- Migration strategies and lessons learned from moving away from monolithic CI, including proof-of-concept approaches and avoiding the sunk cost fallacy
Sponsor
This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive-scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/klBmzMY5-
Interested in sponsoring an episode? Learn more.
Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.