Multi-Cluster Microservices Platform
Orchestrating 150+ microservices across multiple AWS regions serving 500M+ monthly active users with custom operators and advanced autoscaling.
Zero-Downtime Infrastructure Migration
Complete migration of legacy infrastructure to Kubernetes serving 300M+ monthly users with custom networking solutions for ML inference.
Black Friday Traffic Scaling
Autoscaling platform handling 10x traffic spikes during peak shopping events with GitOps deployments and strict multi-tenant isolation.
Compliant Banking Platform
Production banking workloads on Kubernetes with PCI-DSS compliance, multi-cluster federation, and advanced RBAC with Vault integration.
What Kubernetes Engineers Actually Build
Before writing your job description, understand what Kubernetes work looks like at different companies. Here are real examples:
Streaming & Media
Spotify uses Kubernetes to orchestrate 150+ microservices serving 500M+ monthly active users. Their K8s engineers handle:
- Multi-cluster deployments across AWS regions
- Custom operators for stateful services (data pipelines, ML models)
- Autoscaling for traffic spikes (new album releases, podcast launches)
- Service mesh implementation for observability
Netflix runs a hybrid infrastructure where Kubernetes engineers manage:
- Container orchestration alongside their legacy Titus platform
- Chaos engineering experiments on K8s workloads
- Cost optimization across massive compute fleets
E-Commerce & Retail
Shopify relies on Kubernetes to handle extreme traffic variability:
- Black Friday/Cyber Monday scaling (10x normal traffic)
- Multi-tenant isolation for merchant workloads
- GitOps deployments using ArgoCD
- Pod security controls for payment processing (the PodSecurityPolicy API was removed in Kubernetes 1.25 in favor of Pod Security Admission)
Pinterest completed a massive Kubernetes migration:
- 300M+ monthly users, billions of Pins served
- Zero-downtime migration from legacy infrastructure
- Custom networking solutions for ML inference workloads
Fintech & Banking
Capital One runs production banking workloads on Kubernetes:
- Strict compliance requirements (PCI-DSS, SOX)
- Multi-cluster federation across regions
- Advanced RBAC and policy enforcement
- Secrets management with HashiCorp Vault integration
Stripe uses Kubernetes for their developer tools and internal platforms:
- Self-service deployment platforms for engineering teams
- Canary deployments for payment-critical services
- Comprehensive observability stacks
What to Look For: Skills by Business Need
Managed vs Self-Managed: A Critical Hiring Distinction
This is the most important question for your job description: What kind of Kubernetes environment are you running?
Managed Kubernetes (EKS, GKE, AKS)
Most companies today use managed Kubernetes where the cloud provider handles the control plane. Your engineers focus on:
- Application deployment and configuration
- Networking within the cluster (Services, Ingress, Network Policies)
- Resource optimization (requests, limits, autoscaling)
- Security at the workload level (RBAC, Pod Security)
Who needs this: Most startups, mid-size companies, and enterprises that want to focus on applications rather than infrastructure.
Skill level required: Strong K8s practitioners who understand the platform deeply but don't need to manage etcd backups or control plane upgrades.
Self-Managed Kubernetes
Some organizations run their own clusters (on-prem, bare metal, or custom cloud setups). This requires:
- Full cluster lifecycle management (provisioning, upgrades, disaster recovery)
- etcd administration and backup strategies
- Control plane high availability
- CNI and CSI driver management
- Deep Linux systems knowledge
Who needs this: Large enterprises with specific compliance requirements, companies with on-premises data centers, or those with unique infrastructure needs.
Skill level required: True K8s experts—these candidates are rare and expensive. Budget accordingly.
Critical hiring tip: A developer with 3 years of GKE experience may struggle with self-managed clusters. Be explicit about your environment in the job description—candidates appreciate the honesty and it filters applications effectively.
Modern Kubernetes Practices (2024-2026)
The Kubernetes ecosystem evolves rapidly. Here's what modern K8s looks like:
GitOps is Now Standard
If your team isn't using GitOps, you're behind. Tools like ArgoCD and Flux have become the default for Kubernetes deployments:
- Declarative configuration stored in Git
- Automated sync between Git state and cluster state
- Audit trails for all changes
- Easy rollbacks via git revert
Interview signal: Ask candidates about their deployment workflow. If they describe running kubectl apply by hand or SSH-ing into servers, they are likely working with outdated practices.
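To make the "automated sync between Git state and cluster state" idea concrete for interviewers, here is a minimal Python sketch of the comparison a GitOps tool performs. It is an illustration only, not how ArgoCD or Flux are implemented: the `plan_sync` function, the `prune` flag, and the "Kind/namespace/name" key format are assumptions made up for this example, and real controllers compare full Kubernetes objects rather than small dictionaries.

```python
from typing import Dict, List, Tuple

# Simplified stand-ins: a real controller compares full Kubernetes objects.
Manifest = Dict[str, object]
State = Dict[str, Manifest]  # keyed by "Kind/namespace/name" (hypothetical format)


def plan_sync(desired: State, live: State, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, key) steps needed to make `live` match `desired`.

    `desired` represents what is committed in Git, `live` what runs in the cluster.
    """
    plan: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in live:
            plan.append(("create", key))
        elif live[key] != desired[key]:
            # Drift (for example a manual edit in the cluster): Git wins.
            plan.append(("update", key))
    if prune:
        for key in sorted(live):
            if key not in desired:
                plan.append(("delete", key))
    return plan


if __name__ == "__main__":
    desired = {
        "Deployment/shop/web": {"replicas": 3},
        "Service/shop/web": {"port": 80},
    }
    live = {
        "Deployment/shop/web": {"replicas": 5},  # someone scaled it by hand
        "ConfigMap/shop/old": {"key": "value"},  # no longer in Git
    }
    for action, key in plan_sync(desired, live):
        print(f"{action:6} {key}")
```

A candidate who works with GitOps daily should be able to explain each branch: why drift is overwritten, why pruning is usually opt-in, and why a git revert rolls the cluster back simply by changing the desired state.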
Platform Engineering Over Raw K8s
Senior Kubernetes roles increasingly focus on platform engineering—building internal developer platforms (IDPs) that abstract K8s complexity:
- Backstage-style developer portals
- Self-service namespace provisioning
- Standardized Helm chart libraries
- Golden paths for common deployment patterns
Companies like Spotify, Airbnb, and Shopify pioneered this approach. Now it's spreading to mid-size companies.
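As a rough picture of what "self-service namespace provisioning" can mean in practice, the sketch below renders the three objects a platform team might create for each new team: a Namespace, a ResourceQuota, and a RoleBinding to the built-in `edit` ClusterRole. The function name `render_team_namespace`, the object names `team-quota` and `team-edit`, the default quota values, and the convention that the identity provider group shares the team's name are all assumptions for illustration, not a description of any company's platform.

```python
import json
import re
from typing import Any, Dict, List

# Namespace names must be valid DNS labels: lowercase alphanumerics and "-", max 63 chars.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def render_team_namespace(team: str, cpu: str = "4", memory: str = "8Gi") -> List[Dict[str, Any]]:
    """Render the manifests for one team's namespace (hypothetical golden path)."""
    if len(team) > 63 or not DNS_LABEL.fullmatch(team):
        raise ValueError(f"{team!r} is not a valid namespace name")
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": team, "labels": {"team": team}},
        },
        {
            # Caps the total CPU and memory the team's pods may request.
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "team-quota", "namespace": team},
            "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
        },
        {
            # Grants the team's group the built-in "edit" role in its own namespace only.
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": "team-edit", "namespace": team},
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": "edit",
            },
            "subjects": [
                {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": team}
            ],
        },
    ]


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output could be committed to Git as-is.
    print(json.dumps(render_team_namespace("payments", cpu="8", memory="16Gi"), indent=2))
```

The point for hiring is the shape of the work: platform engineers encode guardrails (quotas, scoped permissions, naming rules) once so that application teams never have to write them by hand.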
eBPF and Advanced Networking
The cutting edge of Kubernetes networking uses eBPF for:
- High-performance networking (Cilium CNI)
- Advanced observability without sidecars
- Runtime security enforcement
- Service mesh without proxy overhead
This is senior/staff-level territory. Don't require it for most roles, but recognize it as a differentiator for platform teams.
Cost Optimization is a Core Skill
As K8s adoption matures, cost management has become critical:
- Right-sizing resource requests and limits
- Spot/preemptible instance strategies
- Karpenter or Cluster Autoscaler tuning
- FinOps dashboards and showback
Spotify reportedly saves millions annually through K8s cost optimization. Ask candidates about cost-aware architecture decisions.
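A simple showback calculation illustrates what "cost-aware" means here. The sketch below splits a monthly cluster bill across namespaces in proportion to their CPU requests and estimates how much of each share pays for reserved but unused capacity. The `NamespaceUsage` type, the `showback` function, and all numbers are hypothetical; real FinOps tooling also accounts for memory, storage, and shared overhead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NamespaceUsage:
    name: str
    cpu_requested: float  # cores reserved through resource requests
    cpu_used: float       # average cores actually consumed


def showback(usages: List[NamespaceUsage], monthly_cluster_cost: float) -> Dict[str, Dict[str, float]]:
    """Allocate cluster cost by CPU requests and estimate the idle portion."""
    total_requested = sum(u.cpu_requested for u in usages)
    if total_requested <= 0:
        raise ValueError("no CPU requests to allocate cost against")
    report: Dict[str, Dict[str, float]] = {}
    for u in usages:
        cost = monthly_cluster_cost * u.cpu_requested / total_requested
        utilization = u.cpu_used / u.cpu_requested if u.cpu_requested else 0.0
        # Usage above requests is possible (bursting), so never report negative idle cost.
        idle_cost = cost * max(0.0, 1.0 - utilization)
        report[u.name] = {
            "cost": round(cost, 2),
            "utilization": round(utilization, 2),
            "idle_cost": round(idle_cost, 2),
        }
    return report


if __name__ == "__main__":
    usages = [
        NamespaceUsage("checkout", cpu_requested=30, cpu_used=24),
        NamespaceUsage("search", cpu_requested=10, cpu_used=2),
    ]
    for name, row in showback(usages, monthly_cluster_cost=10_000).items():
        print(name, row)
```

In this made-up example the smaller namespace wastes more money than the larger one because its requests are far above its usage, which is exactly the kind of finding a right-sizing effort looks for.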
Recruiter's Cheat Sheet: Spotting Great Candidates
Conversation Starters That Reveal Skill Level
Instead of asking "Do you know Kubernetes?", try these:
| Question | Junior Answer | Senior Answer |
|---|---|---|
| "Describe a K8s incident you handled" | "A pod crashed and I restarted it" | "etcd latency caused cascading failures. I identified the root cause, implemented rate limiting, and added monitoring to prevent recurrence" |
| "How do you decide on resource requests/limits?" | "I use whatever the defaults are" | "I profile actual usage over 2 weeks, set requests at p90, limits at p99, and configure VPA for dynamic adjustment" |
| "What's your deployment strategy?" | "kubectl apply" | "GitOps with ArgoCD, progressive rollouts, automated rollback on error rate increase" |
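For interviewers who want to follow the "requests at p90, limits at p99" answer in the table above, this sketch shows the arithmetic on a handful of CPU samples. It assumes the samples are millicore readings exported from a metrics system, uses the nearest-rank percentile definition, and the `percentile` and `recommend_cpu` names are invented for the example. The p90/p99 choice comes from the sample answer and is one reasonable heuristic, not a universal rule.

```python
import math
from typing import Dict, List


def percentile(samples: List[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def recommend_cpu(samples_millicores: List[int]) -> Dict[str, int]:
    """Suggest a CPU request (p90) and limit (p99) from observed usage."""
    return {
        "request_m": percentile(samples_millicores, 90),
        "limit_m": percentile(samples_millicores, 99),
    }


if __name__ == "__main__":
    # Ten hypothetical readings, including one spike to 400m.
    usage = [100, 120, 110, 400, 130, 125, 115, 105, 135, 140]
    print(recommend_cpu(usage))
```

With these ten readings the request lands on typical busy-period usage while the limit leaves room for the spike. A strong candidate can explain that trade-off and when they would deviate from it, for example by removing CPU limits for latency-sensitive services.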
Resume Signals That Matter
✅ Look for:
- Specific scale metrics ("Managed 50-node clusters serving 1M+ requests/day")
- Production incident experience ("Led incident response for platform outages")
- Modern tooling (ArgoCD, Cilium, Karpenter, Prometheus)
- Cost optimization achievements ("Reduced K8s spend by 40%")
- Platform engineering work ("Built self-service developer platform")
🚫 Be skeptical of:
- Certification-only credentials (CKA without production experience)
- Listing every CNCF project (indicates tutorial completion, not real usage)
- Claims of "5 years Kubernetes experience" made before 2019 (Kubernetes 1.0 only shipped in 2015, and adoption wasn't widespread yet)
- Generic descriptions ("Worked with Kubernetes infrastructure")
GitHub/Portfolio Red Flags
- Only local Minikube or Kind examples
- No mention of monitoring, logging, or observability
- Helm charts with all default values
- No documentation or README files
Common Hiring Mistakes
1. Requiring CKA Certification as a Hard Requirement
The Certified Kubernetes Administrator exam tests knowledge, not production experience. Someone who passed the CKA last week has likely seen fewer real failure modes than an engineer who has managed production clusters for 2 years without certification.
Better approach: Use certification as a "nice to have" signal that they're invested in learning, but prioritize production experience in interviews.
2. Conflating Managed and Self-Managed Experience
A developer who's deployed applications to GKE for 3 years may have never touched etcd, managed control plane upgrades, or configured CNI plugins. If you need self-managed K8s expertise, test for it explicitly.
Shopify's approach: They clearly specify whether roles are "platform team" (deep K8s expertise) vs "application team" (deploys to K8s).
3. Testing for YAML Writing
Anyone can copy YAML from documentation. The real skill is understanding why the configuration works, how to troubleshoot when it doesn't, and when to deviate from defaults.
Better approach: Give candidates a broken deployment and ask them to diagnose it. This tests understanding, not memorization.
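If you need a starting point for such an exercise, the sketch below pairs a deliberately broken Deployment (written as a Python dictionary) with a small checker that lists what is wrong. The `BROKEN` manifest, the `diagnose_deployment` function, and the three checks are assumptions chosen for illustration. The probe-port check in particular is only a heuristic, since Kubernetes does not require a probed port to be declared under `ports`.

```python
from typing import Any, Dict, List

# A deliberately broken Deployment for an interview exercise (hypothetical).
BROKEN: Dict[str, Any] = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "webapp"}},  # does not match the selector
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.27",
                        "ports": [{"containerPort": 80}],
                        # Probes 8080, but the container declares port 80.
                        "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                    }
                ]
            },
        },
    },
}


def diagnose_deployment(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for a few common Deployment mistakes."""
    findings: List[str] = []
    spec = manifest.get("spec", {})
    selector = spec.get("selector", {}).get("matchLabels", {})
    template = spec.get("template", {})
    pod_labels = template.get("metadata", {}).get("labels", {})

    # The API server rejects a Deployment whose selector does not match its pod template.
    for key, value in selector.items():
        if pod_labels.get(key) != value:
            findings.append(f"selector {key}={value} does not match pod template labels {pod_labels}")

    for container in template.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        if "requests" not in container.get("resources", {}):
            findings.append(f"container {name}: no resource requests set")
        declared = {p.get("containerPort") for p in container.get("ports", [])}
        probe_port = container.get("readinessProbe", {}).get("httpGet", {}).get("port")
        # Heuristic: a probe on an undeclared port often means pods never become Ready.
        if isinstance(probe_port, int) and declared and probe_port not in declared:
            findings.append(f"container {name}: readiness probe targets port {probe_port}, declared ports are {sorted(declared)}")
    return findings


if __name__ == "__main__":
    for finding in diagnose_deployment(BROKEN):
        print("-", finding)
```

In a live interview the candidate plays the role of the checker: hand over the manifest and the symptoms, and listen for how they narrow down the cause rather than whether they recite field names.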
4. Ignoring Soft Skills
The best K8s engineers at companies like Pinterest and Spotify aren't just technical experts—they write documentation, mentor team members, and communicate with non-technical stakeholders about infrastructure decisions.
Capital One's approach: Their platform engineer interviews include an "explain this to a business stakeholder" component.
5. Overloading Technology Requirements
Requiring experience with every CNCF project (Kubernetes AND Prometheus AND Jaeger AND Linkerd AND Falco AND OPA AND ArgoCD AND Crossplane) signals you don't understand your own stack.
Better approach: List what you actually use. A strong candidate can learn adjacent tools quickly.