# AI Engineer
**Location:** San Francisco, CA (Hybrid) · **Employment Type:** Full-time · **Level:** Mid-Senior
## About [Company]
[Company] is building AI-powered features that help product teams ship faster. Our platform uses large language models to automate repetitive workflows, generate content, and surface insights from unstructured data.
We're a 65-person team backed by $38M in Series B funding from Andreessen Horowitz and Greylock. Our AI features process 500,000+ queries daily across 2,000+ customers—from product managers drafting specs to support teams resolving tickets.
**Why join [Company]?**
- Build AI features used by thousands of teams daily
- Work with cutting-edge LLMs (GPT-4, Claude, open-source models)
- Strong AI engineering culture—we ship production AI, not demos
- Competitive compensation with meaningful equity in a growing market
## The Role
**Let's be crystal clear: This is about building AI FEATURES with LLMs, not training models from scratch.**
We're looking for an AI Engineer to join our Product Intelligence team. You'll design and build the RAG systems, prompt chains, and LLM integrations that power our AI-assisted features. This means working with pre-trained models (OpenAI, Anthropic, open-source), not training neural networks.
You'll report to our Head of AI Engineering and work alongside 4 other AI engineers, 3 backend engineers, and 2 product managers. Your focus will be making our AI features more accurate, faster, and cost-effective.
**The problem you'll solve:**
Our AI assistant handles 500K queries daily but struggles with domain-specific questions. We need someone to build RAG systems that ground responses in customer documentation, design evaluation frameworks to measure quality, and optimize our $15K/month LLM spend without sacrificing accuracy.
## What This Role IS
- **LLM API integration**—connecting OpenAI, Anthropic, and open-source models to production features
- **RAG system development**—building retrieval pipelines that combine LLMs with knowledge bases
- **Prompt engineering**—designing reliable prompts that handle edge cases and produce consistent outputs
- **AI evaluation**—creating test suites, measuring quality metrics, reducing hallucinations
- **Cost optimization**—caching, model selection, token efficiency at scale
- **Software engineering with AI**—building reliable systems, not just calling APIs
## What This Role is NOT
- **Training ML models from scratch**—we use pre-trained LLMs, not custom neural network training
- **Traditional ML/data science**—no regression models, decision trees, or statistical modeling
- **ML research**—we ship products, not papers
- **MLOps/model serving infrastructure**—we use managed LLM APIs, not self-hosted model deployments
- **Pure data engineering**—while you'll work with data, AI features are the focus
*If you're looking for traditional ML model training, this isn't the role. We're specifically hiring for LLM integration and RAG systems.*
## Objectives of This Role
- Build RAG systems that improve response accuracy from 72% to 90%+
- Reduce hallucination rate from 8% to under 2%
- Optimize LLM costs—target 30% reduction while maintaining quality
- Establish prompt engineering best practices for the team
- Design evaluation frameworks that catch quality regressions before production
## Responsibilities
- Design and build RAG pipelines using vector databases and embedding models
- Integrate LLM APIs (OpenAI, Anthropic, open-source) with proper error handling, fallbacks, and retry logic
- Engineer prompts for production use cases—system prompts, few-shot examples, chain-of-thought
- Build evaluation frameworks to measure AI output quality systematically
- Implement caching and model selection strategies to optimize costs
- Create guardrails for content safety, PII detection, and hallucination prevention
- Collaborate with product and backend teams on AI feature requirements
- Document prompt patterns and RAG architectures for team knowledge sharing
- Monitor AI systems in production—latency, costs, quality metrics, error rates
- Participate in on-call rotation for AI feature incidents (1 week every 8 weeks)
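To give a feel for the integration work above, here is a minimal sketch of the retry-and-fallback pattern for LLM calls. The provider callables and error type are illustrative placeholders, not our actual client code:

```python
import random
import time

def call_with_fallback(prompt, providers, max_retries=3, base_delay=0.5):
    """Try each provider in order; retry transient failures with
    exponential backoff plus jitter before falling back to the next."""
    last_error = None
    for call_model in providers:  # e.g. primary model, then a cheaper fallback
        for attempt in range(max_retries):
            try:
                return call_model(prompt)
            except TimeoutError as exc:  # transient error: back off and retry
                last_error = exc
                time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
    raise RuntimeError("all providers failed") from last_error
```

In production the exception handling would cover provider-specific rate-limit and timeout errors, and the fallback chain would trade quality for availability.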
## Required Skills and Qualifications
**Software Engineering Foundation (Required First):**
- 4+ years of professional software engineering experience
- Strong proficiency in Python (our primary AI stack language)
- Production experience building and deploying web services and APIs
- Solid understanding of software architecture, testing, and code quality
- You're an engineer first, AI specialist second—we need reliable systems
**LLM Integration Experience:**
- 1+ years of hands-on experience integrating LLM APIs (OpenAI, Anthropic, Cohere) into production systems
- Understanding of model capabilities, limitations, and appropriate use cases
- Experience with token management, rate limiting, and error handling
- Familiarity with prompt engineering patterns (few-shot, chain-of-thought, structured outputs)
**RAG or Similar Experience:**
- Built at least one RAG system or similar retrieval-augmented application
- Understanding of embeddings, vector similarity, and retrieval strategies
- Experience with document chunking and context window optimization
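As a rough illustration of the chunking requirement above, a minimal fixed-size chunker with overlap looks like this (real pipelines would split on sentence or token boundaries rather than raw characters):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character-based chunks so that
    context spanning a boundary is not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap means the tail of each chunk is repeated at the head of the next, which costs some extra tokens but keeps boundary-spanning facts retrievable.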
## Preferred Skills and Qualifications
- Experience with vector databases (Pinecone, Weaviate, pgvector, Qdrant)
- Familiarity with LLM frameworks (LangChain, LlamaIndex, Semantic Kernel)
- Fine-tuning experience with open-source LLMs (Llama, Mistral)
- Background in B2B SaaS or developer tools
- Experience with AI agents, tool use, and function calling
- Understanding of AI safety, content moderation, and responsible AI practices
- Experience building AI evaluation frameworks and quality metrics
- Contributions to open-source AI projects
## Tech Stack
**LLM Providers:**
- OpenAI (GPT-4, GPT-4-turbo for production)
- Anthropic (Claude 3 for complex reasoning tasks)
- Open-source models via Together AI for cost-sensitive features
**AI Frameworks:**
- LangChain for prompt chains and agent workflows
- LlamaIndex for RAG pipelines
- Custom evaluation framework built on pytest
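An evaluation framework built on pytest can be as simple as assertions over a fixed set of eval cases. The cases and the groundedness check below are toy illustrations, not our actual harness:

```python
# Illustrative eval cases: (question, retrieved context, model answer).
EVAL_CASES = [
    ("What is the refund window?",
     "Refunds are accepted within 30 days.",
     "Refunds are accepted within 30 days."),
    ("Which plans include SSO?",
     "SSO is available on the Enterprise plan.",
     "SSO is available on the Enterprise plan."),
]

def is_grounded(answer, context):
    """Toy groundedness check: every content word in the answer must
    appear in the retrieved context (real checks use NLI or LLM judges)."""
    content_words = {w.lower().strip(".,") for w in answer.split() if len(w) > 3}
    context_words = {w.lower().strip(".,") for w in context.split()}
    return content_words <= context_words

def run_regression(cases=EVAL_CASES):
    """Return the cases whose answers are not grounded in their context."""
    return [(q, a) for q, ctx, a in cases if not is_grounded(a, ctx)]
```

In CI, cases like these would typically be wired into `pytest.mark.parametrize` so that a prompt change that introduces ungrounded answers fails the build.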
**Vector Database:**
- Pinecone for production vector search
- pgvector for development and smaller datasets
**Embedding Models:**
- OpenAI text-embedding-3-large (primary)
- Cohere embed-v3 for multilingual content
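For intuition, retrieval over embeddings reduces to nearest-neighbor search by cosine similarity. A toy in-memory version of what a vector database does at scale with approximate indexes:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=3):
    """docs: list of (doc_id, embedding). Return the k closest doc ids."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Exact search like this is O(n) per query; Pinecone and pgvector exist precisely because that stops being viable past a few hundred thousand vectors.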
**Backend:**
- Python (FastAPI) for AI services
- Node.js (TypeScript) for main application
- PostgreSQL for structured data
- Redis for caching and rate limiting
**Infrastructure:**
- AWS (ECS, Lambda, S3, CloudWatch)
- Kubernetes for AI service orchestration
- Terraform for infrastructure as code
**Monitoring:**
- Datadog for metrics and tracing
- LangSmith for LLM observability
- Custom dashboards for AI quality metrics
## AI Metrics
- **AI Queries/Day:** 500,000+
- **P95 Latency:** 1.8 seconds (target: 1.2s)
- **Monthly LLM Spend:** $15,000 (target: $10,500 with same quality)
- **Response Accuracy:** 72% (target: 90%+)
- **Hallucination Rate:** 8% (target: <2%)
- **User Satisfaction:** 4.1/5 (target: 4.5/5)
**What we're working on:**
- Multi-document RAG for cross-referencing customer data sources
- Streaming responses with real-time citation highlighting
- Prompt caching to reduce costs by 40%
- Automated evaluation pipeline for prompt regression testing
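Prompt caching, listed above, can be approximated by keying responses on a hash of the normalized prompt. A minimal in-memory sketch (production would use Redis with a TTL, and the hit/miss counters are just for illustration):

```python
import hashlib

class PromptCache:
    """Cache LLM responses keyed by a hash of the normalized prompt,
    so repeated identical queries skip the paid API call."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt):
        # Normalize whitespace and case so trivially different
        # phrasings of the same prompt share a cache entry.
        normalized = " ".join(prompt.split()).lower()
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_call(self, prompt, call_model):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_model(prompt)
        self._store[key] = response
        return response
```

Even a cache this naive pays off when many users ask near-identical questions; the hit rate on normalized prompts is what determines how much of the LLM spend it recovers.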
## Compensation and Benefits
**Salary:** $180,000 - $240,000 (based on experience and location)
**Equity:** 0.05% - 0.15% (4-year vest, 1-year cliff)
**Benefits:**
- Medical, dental, and vision insurance (100% covered for employees, 80% for dependents)
- Unlimited PTO, with a 15-day annual minimum encouraged
- $5,000 annual learning budget (AI conferences, courses, certifications)
- $2,000 home office setup allowance
- 401(k) with 4% company match
- 16 weeks paid parental leave
- Flexible hybrid work (2-3 days in SF office)
**AI-Specific Perks:**
- $100/month OpenAI/Anthropic API credits for personal projects
- Conference budget for AI-focused events (NeurIPS, AI Engineer Summit)
- 10% time for AI experimentation and prototyping new techniques
- Access to latest models and tools before general availability
## Interview Process
Our interview process typically takes 2-3 weeks. We focus on LLM integration skills, not traditional ML.
- **Step 1: Application Review** (3-5 days) - We review your resume, AI project experience, and GitHub.
- **Step 2: Recruiter Screen** (30 min) - We'll discuss your background, AI interests, and compensation.
- **Step 3: Technical Screen** (60 min) - LLM fundamentals, past AI projects, and prompt engineering basics.
- **Step 4: RAG System Design** (60 min) - Design a RAG system including chunking, retrieval, and evaluation.
- **Step 5: Prompt Engineering Exercise** (60 min) - Design and iterate on prompts with live evaluation.
- **Step 6: Team Interviews** (2 x 30 min) - Meet potential teammates and discuss collaboration.
- **Step 7: Hiring Manager** (30 min) - Career goals and offer discussion.
We respond within 5 business days at every stage and provide written feedback. You can expect:
- Real AI problems, not LeetCode
- Transparent discussion about our AI challenges and roadmap
**What We DON'T Do:**
- LeetCode-style algorithm puzzles (we focus on AI system design)
- Traditional ML/statistics interviews (this isn't that role)
- "Gotcha" questions about transformer architectures
- Ghost candidates (we always follow up)
## Equal Opportunity
[Company] is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We believe diverse teams build better AI products—different perspectives help us identify biases and build systems that work for everyone.
We encourage applications from candidates who may not meet 100% of the qualifications. Research shows that women and underrepresented groups are less likely to apply unless they meet every requirement—we'd rather you apply and let us decide.
If you need accommodations during the interview process, let us know and we'll make it work.
---
*Questions before applying? Email hiring@company.com*
*Ready to build AI features that help teams work smarter? Apply now.*