# ML Engineer (Production Systems / MLOps)
**Location:** Seattle, WA (Hybrid) · **Employment Type:** Full-time · **Level:** Mid-Senior
## About [Company]
[Company] is building an AI-powered logistics optimization platform that helps e-commerce companies reduce shipping costs and delivery times. Our ML models power real-time routing decisions, demand forecasting, and carrier selection for 200+ enterprise customers.
Our platform processes over 10 million predictions daily across 25 models in production. We serve real-time inference with P99 latency under 45ms and maintain 99.95% uptime. This isn't experimental ML—this is production ML at scale that directly impacts customer revenue.
**Why join [Company]?**
- Work on ML systems processing 10M+ predictions daily
- Join a 120-person company with a dedicated ML Platform team
- Series C funded ($85M from Andreessen Horowitz and Sequoia)
- Clear ownership: Data Scientists train models, you ship them
## The Role
**This is a Production ML / MLOps role. Not research. Not model training.**
We're looking for an ML Engineer to join our ML Platform team. You'll deploy, serve, monitor, and operate ML models in production—not train them. Think of this as backend engineering specialized for machine learning systems.
At [Company], our Data Scientists build and experiment with models. As an ML Engineer, you take those trained models and make them production-ready: reliable, fast, scalable, and observable. You'll own the entire lifecycle from model handoff to production deployment and ongoing operations.
**What you'll solve:** Our recommendation and routing models serve 10M+ predictions/day, but we're hitting scaling limits during holiday peaks. We need to redesign our serving infrastructure to handle 3x growth while maintaining sub-50ms latency, and to build self-service deployment tooling for our Data Science team.
## What This Role IS
- **Model Deployment & Serving** — Taking trained models from Data Scientists and deploying them to production via REST APIs, batch inference jobs, and real-time pipelines
- **MLOps Infrastructure** — Building and maintaining CI/CD pipelines for ML artifacts, model registries, feature stores, and experiment tracking systems
- **Production Monitoring** — Implementing monitoring for model performance, data drift, prediction distributions, and system health
- **Reliability Engineering** — Ensuring models meet latency SLAs, designing fallback mechanisms, managing canary deployments and rollbacks
- **Platform Building** — Creating self-service tooling that enables Data Scientists to deploy models without engineering bottlenecks
- **Performance Optimization** — Model quantization, caching strategies, GPU utilization, and infrastructure cost optimization
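To give a flavor of the reliability-engineering work above, here is a minimal sketch of deterministic, hash-based canary traffic splitting. This is an illustration only (the function name, request IDs, and percentages are made up, and our actual router is more involved):

```python
import hashlib

def canary_route(request_id: str, canary_percent: float) -> str:
    """Deterministically route a request to 'canary' or 'stable'.

    Hashing the request ID (rather than sampling randomly per call)
    keeps a given caller pinned to one model version for the whole
    rollout, which makes canary metrics much easier to interpret.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100.0
    return "canary" if bucket < canary_percent else "stable"

# With a 5% canary, roughly 1 in 20 request IDs lands on the new model.
routes = [canary_route(f"req-{i}", 5.0) for i in range(10_000)]
share = routes.count("canary") / len(routes)
```

The design choice worth noting: because routing is a pure function of the request ID, a rollback simply means setting the canary percentage to zero, with no per-caller state to clean up.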
## What This Role is NOT
- **Research** — You won't write papers, develop novel algorithms, or push the boundaries of ML theory
- **Model Training** — You won't tune hyperparameters, run experiments, or optimize for accuracy metrics
- **Data Science** — You won't do exploratory analysis, build dashboards, or create business intelligence reports
- **Academic ML** — We don't need deep mathematical theory; we need production engineering skills
**If you want to train models, apply for our Data Scientist roles. If you want to ship models reliably at scale, keep reading.**
## Objectives of This Role
- Deploy and operate 25+ ML models serving 10M predictions/day with 99.95% uptime
- Reduce model deployment time from 2 weeks to under 2 days through automation
- Build monitoring systems that catch model degradation before it impacts customers
- Create self-service deployment tooling that enables Data Scientists to ship independently
- Maintain P99 latency under 50ms across all real-time inference endpoints
- Establish MLOps best practices and mentor Data Scientists on production patterns
## Responsibilities
- Deploy ML models to production using TensorFlow Serving and custom inference containers
- Build and maintain real-time inference APIs handling 500+ requests/second
- Implement model versioning, A/B testing, and canary deployment strategies
- Design and operate our feature store (Feast) for consistent features across training and serving
- Build CI/CD pipelines specifically for ML artifacts (model validation, performance regression tests)
- Monitor model performance, data drift, and prediction distributions using Prometheus and Grafana
- Respond to ML-specific production incidents and participate in on-call rotation (1 week every 5 weeks)
- Automate model retraining triggers based on performance degradation thresholds
- Optimize model inference latency through quantization, caching, and batching strategies
- Document MLOps processes and create runbooks for common operational scenarios
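As a concrete flavor of the drift-monitoring responsibility, here is a minimal Population Stability Index (PSI) check in plain Python. This is a sketch, not our production code (our checks run against Prometheus metrics), and the thresholds in the docstring are the common rule of thumb rather than our tuned values:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference (training-time)
    prediction distribution and a live serving window.

    Common rule of thumb: PSI < 0.1 is stable, 0.1-0.25 warrants a
    look, and > 0.25 usually means the model is seeing shifted inputs.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def bucket_fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            idx = max(idx, 0)  # clamp live values below the reference range
            counts[idx] += 1
        # Floor each fraction to avoid log(0) on empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In use, you would compare a rolling window of live prediction scores against the scores the model produced on its validation set; an identical distribution scores 0, and a clearly shifted one scores well above 0.25.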
## Required Skills and Qualifications
- 4+ years of professional backend/infrastructure engineering experience
- Strong Python proficiency with production code (not just notebooks)
- Experience deploying ML models to production using serving frameworks (TensorFlow Serving, TorchServe, Triton, or custom)
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes)
- Understanding of ML concepts (you don't train models, but you need to understand what you're deploying)
- Experience with REST API design and microservices architecture
- Familiarity with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
- Experience debugging distributed systems and handling production incidents
- Comfort with on-call responsibilities and incident management
- Strong software engineering fundamentals: testing, code review, documentation
## Preferred Skills and Qualifications
- Experience with MLOps platforms (MLflow, Kubeflow, Weights & Biases)
- Familiarity with feature stores (Feast, Tecton, or custom implementations)
- Experience with model optimization techniques (quantization, pruning, distillation)
- Background in recommendation systems or real-time inference deployment
- Experience with AWS SageMaker or similar cloud ML services
- Familiarity with streaming data systems (Kafka, Kinesis)
- Experience operating ML systems at 1M+ predictions/day scale
- GPU infrastructure management and optimization experience
## Tech Stack
- **Model Serving:** TensorFlow Serving, custom Python inference containers, ONNX Runtime
- **Feature Store:** Feast (online and offline stores)
- **Experiment Tracking:** MLflow, Weights & Biases
- **Model Registry:** MLflow Model Registry
- **Orchestration:** Kubernetes (EKS), Airflow for batch pipelines
- **Monitoring:** Prometheus, Grafana, PagerDuty, custom model performance dashboards
- **Infrastructure:** AWS (EKS, SageMaker endpoints, S3, Lambda), Terraform
- **Data Platform:** Snowflake, Apache Kafka, dbt
- **GPU Compute:** AWS EC2 G4/P4 instances, spot instance management
- **Languages:** Python (primary), Go for high-performance services
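For orientation on the serving layer: TensorFlow Serving exposes a standard v1 REST predict API, and day-to-day work involves building and debugging requests of this shape. The model name, host, and feature vector below are illustrative, not our actual ones:

```python
import json

# Shape of a TensorFlow Serving v1 REST predict call (default REST
# port is 8501). "router" and the feature values are made up here.
MODEL = "router"
url = f"http://tf-serving:8501/v1/models/{MODEL}:predict"
payload = json.dumps({"instances": [[0.12, 3.4, 0.0, 1.0]]})
# POST `payload` to `url`; the response body is {"predictions": [...]}.
```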
## ML Scale
- **10M predictions/day** across real-time and batch inference
- **25 models in production** (recommendations, routing, demand forecasting, pricing)
- **P99 latency: 45ms** for real-time endpoints
- **500+ requests/second** peak traffic
- **2,500+ features** in our feature store
- **Daily retraining** for high-velocity models, weekly for stable models
- **99.95% uptime SLA** with automatic failover
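To put the SLA and traffic figures in perspective, a quick back-of-the-envelope (assuming a 30-day month; these are derived numbers, not additional commitments):

```python
# Error budget implied by a 99.95% uptime SLA.
sla = 0.9995
minutes_per_month = 30 * 24 * 60                      # 43,200
allowed_downtime_min = minutes_per_month * (1 - sla)  # ~21.6 min/month
# A single bad deploy without automatic rollback can burn most of that.

# Average vs. peak request rate.
avg_rps = 10_000_000 / 86_400   # ~116 predictions/s on average
peak_headroom = 500 / avg_rps   # peak is roughly 4x the daily average
```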
## Compensation and Benefits
**Salary:** $175,000 - $230,000 (based on experience and location)
**Equity:** 0.05% - 0.15% (4-year vest, 1-year cliff)
**Benefits:**
- Medical, dental, and vision insurance (100% covered for employees, 80% for dependents)
- Unlimited PTO (we encourage a minimum of 15 days per year)
- $4,000 annual learning budget (ML conferences like NeurIPS, MLOps Community, courses)
- $2,000 home office setup allowance
- 401(k) with 4% company match
- 16 weeks paid parental leave
- On-call compensation: $600/week when primary on-call
- Flexible hybrid work (2 days in Seattle office, or full remote for exceptional candidates)
## Interview Process
Our interview process typically takes 2-3 weeks. We focus on production ML skills, not research.
- **Step 1: Recruiter Screen** (30 min) - We'll clarify this is production ML, not research, and discuss your background.
- **Step 2: Technical Screen** (60 min) - ML systems knowledge, past deployment experience, and Python proficiency.
- **Step 3: ML System Design** (60 min) - Design an ML serving system for real-time recommendations.
- **Step 4: MLOps Deep-Dive** (60 min) - CI/CD for ML, monitoring strategies, and retraining pipelines.
- **Step 5: Production Scenarios** (45 min) - Incident handling and debugging ML systems in production.
- **Step 6: Team Interviews** (2 x 30 min) - Collaboration with Data Scientists and culture fit.
- **Step 7: Hiring Manager** (30 min) - Career goals and offer discussion.
**What we evaluate:** Production deployment, software engineering, MLOps, system design.
**What we don't evaluate:** Model training, research publications, LeetCode, ML theory.
## Equal Opportunity
[Company] is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We believe diverse teams build better ML systems—different perspectives lead to more robust solutions and fewer blind spots in production.
We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
We encourage applications from candidates who may not meet 100% of the qualifications. Research shows that women and underrepresented groups are less likely to apply unless they meet every requirement—we'd rather you apply and let us decide.
---
*This role is about making ML models work reliably in production—not training them in notebooks. If shipping models at scale excites you, apply now.*