# AI Engineer (LLM Applications)
**Location:** San Francisco, CA (Hybrid) · **Employment Type:** Full-time · **Level:** Mid-Senior
## About [Company]
[Company] is building AI-powered customer support automation for B2B SaaS companies. Our platform uses LLMs to automatically resolve customer tickets, answer questions from documentation, and route complex issues to the right human agents. We process 200K+ support interactions monthly across 150+ customers.
We're a 38-person team backed by $25M in Series B funding from Sequoia and Greylock. Our AI features achieve 65% automated resolution rates for our top customers—meaning fewer repetitive tickets for support teams and faster answers for customers. We're pragmatic about AI: we ship features that work, not demos that impress.
**Why join [Company]?**
- Build AI features used by thousands of support teams daily
- Work with cutting-edge LLMs (GPT-4, Claude, open-source models)
- Solve real problems: accuracy, latency, cost, hallucination prevention
- Competitive compensation with meaningful equity in a growing market
## The Role
**Let's be clear: This is about building production AI FEATURES, not research or demos.**
We're looking for an AI Engineer to own our LLM integration layer. You'll build RAG systems that ground responses in customer documentation, design prompt chains that handle edge cases reliably, and optimize our $20K/month LLM spend without sacrificing quality.
You'll report to our Head of AI and work alongside 3 other AI engineers, 4 backend engineers, and 2 ML engineers. Your focus will be making our AI features more accurate, faster, and cost-effective—while shipping improvements weekly.
**The problem you'll solve:**
Our AI assistant handles 200K+ interactions monthly but struggles with domain-specific questions and occasionally hallucinates. We need someone to improve our RAG pipeline accuracy from 78% to 90%+, reduce the hallucination rate from 5% to under 1%, and cut P95 latency from 4 seconds to under 2 seconds.
## What This Role IS
- **RAG system development**—building retrieval pipelines that ground LLM responses in customer knowledge bases (see the sketch after this list)
- **Prompt engineering**—designing reliable prompts that handle edge cases and produce consistent outputs
- **LLM integration**—connecting OpenAI, Anthropic, and open-source models to production features
- **AI evaluation**—creating test suites, measuring quality metrics, catching regressions
- **Cost optimization**—caching, model selection, and token efficiency at scale
- **Production engineering**—building reliable systems with proper error handling, not just calling APIs
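To make the first bullet concrete, here is a minimal sketch of the retrieve-then-generate loop this role owns. The index name, metadata fields, and prompt are illustrative placeholders, not our production code:

```python
# Minimal retrieve-then-generate sketch (names and prompt are illustrative).
import os

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("support-docs")  # hypothetical index name

def answer(question: str, customer_id: str) -> str:
    # 1. Embed the question.
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the top documentation chunks for this customer.
    hits = index.query(
        vector=emb, top_k=5, include_metadata=True,
        filter={"customer_id": customer_id},
    )
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)

    # 3. Generate an answer grounded in the retrieved context only.
    resp = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the context below. "
                        "If the context is insufficient, say so.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

The real pipeline adds reranking, per-customer prompt overrides, and hallucination guardrails; improving each of those stages is the job.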
## What This Role Is NOT
- **ML research**—we ship products, not papers
- **Training models from scratch**—we use pre-trained LLMs
- **Traditional ML/data science**—no regression models or statistical analysis
- **Pure data engineering**—we have dedicated data engineers
*If you're looking for traditional ML model training, this isn't the role. We're specifically hiring for LLM integration and RAG systems.*
## Objectives of This Role
- Improve RAG accuracy from 78% to 90%+ on our evaluation benchmark
- Reduce hallucination rate from 5% to under 1%
- Cut P95 response latency from 4 seconds to under 2 seconds
- Reduce monthly LLM costs by 30% while maintaining quality
- Build an evaluation framework that catches regressions before they reach production
## Responsibilities
- Design and implement RAG pipelines using LangChain, LlamaIndex, or direct APIs
- Build document chunking and retrieval strategies optimized for support content
- Engineer prompts for production use cases—system prompts, few-shot examples, chain-of-thought
- Integrate LLM APIs with proper error handling, fallbacks, and retry logic (a sketch follows this list)
- Create evaluation frameworks to measure AI output quality systematically
- Implement caching and model selection strategies to optimize costs
- Build guardrails for hallucination detection and content safety
- Monitor AI systems in production—latency, costs, quality metrics, error rates
- Collaborate with product and customer success to understand AI requirements
- Document prompt patterns and RAG architectures for team knowledge sharing
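To give a flavor of the integration work, here is a minimal sketch of the retry-and-fallback pattern referenced above, assuming OpenAI as the primary provider and Claude as the fallback (model names, timeouts, and limits are illustrative):

```python
# Sketch of retry-with-fallback for LLM calls (illustrative config, not production).
import anthropic
import openai
from openai import OpenAI
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

@retry(  # exponential backoff on transient failures only
    retry=retry_if_exception_type((openai.RateLimitError, openai.APITimeoutError)),
    wait=wait_exponential(multiplier=1, max=10),
    stop=stop_after_attempt(3),
    reraise=True,  # surface the original exception so the fallback can catch it
)
def _ask_openai(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        timeout=10,
    )
    return resp.choices[0].message.content

def complete(prompt: str) -> str:
    try:
        return _ask_openai(prompt)
    except openai.OpenAIError:
        # Primary provider exhausted its retries; fall back to Claude.
        resp = anthropic_client.messages.create(
            model="claude-3-sonnet-20240229",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
```

Note the `reraise=True`: without it, tenacity wraps the final failure in a `RetryError` and the fallback branch never fires. Details like this are exactly the production-engineering work the role involves.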
## Required Skills and Qualifications
**Software Engineering Foundation (Required First):**
- 3+ years of professional software engineering experience
- Strong proficiency in Python
- Production experience building and deploying web services
- Understanding of software architecture, testing, and code quality
- You're an engineer first, AI specialist second—we need reliable systems
**LLM Integration Experience:**
- 1+ years building LLM-powered applications (chatbots, RAG, agents, or similar)
- Understanding of LLM concepts: prompting, tokens, context windows, embeddings
- Experience with LLM APIs (OpenAI, Anthropic, or similar)
- Familiarity with prompt engineering patterns
**RAG Experience:**
- Built at least one RAG system or retrieval-augmented application
- Understanding of embeddings, vector similarity, and retrieval strategies
- Experience with document chunking and context window optimization
## Preferred Skills and Qualifications
- Experience with LLM frameworks (LangChain, LlamaIndex)
- Familiarity with vector databases (Pinecone, Weaviate, pgvector)
- Experience with multiple LLM providers and model comparison
- Background in prompt optimization and systematic evaluation
- Experience with AI agents and tool use patterns
- Understanding of fine-tuning approaches
- Cost optimization experience for production AI systems
## Tech Stack
**LLM Providers:**
- OpenAI (GPT-4, GPT-4-turbo) for primary responses
- Anthropic (Claude 3) for complex reasoning
- Open-source models via Together AI for cost-sensitive features
**AI Frameworks:**
- LangChain for prompt chains and retrieval
- Custom evaluation framework built on pytest (sketched below)
- LangSmith for LLM observability
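The evaluation framework itself is internal, but a pared-down sketch of the pytest pattern looks roughly like this (the `answer` entry point and the golden cases are placeholders):

```python
# Pared-down sketch of a pytest-based LLM regression suite (placeholder cases).
import pytest

from myapp.rag import answer  # hypothetical import of the RAG entry point

GOLDEN_CASES = [
    # (question, substrings a correct, grounded answer must contain)
    ("How do I reset my API key?", ["Settings", "API Keys"]),
    ("What is the webhook retry policy?", ["3 retries", "exponential backoff"]),
]

@pytest.mark.parametrize("question,must_contain", GOLDEN_CASES)
def test_answer_is_grounded(question, must_contain):
    out = answer(question, customer_id="test-customer")
    for fragment in must_contain:
        assert fragment in out, f"expected {fragment!r} in answer"
```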
**Vector Database:**
- Pinecone for production vector search
- 5M+ document chunks indexed
**Backend:**
- Python (FastAPI) for AI services
- Redis for response caching (sketched below)
- PostgreSQL for structured data
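A minimal sketch of that caching layer, assuming exact-match keys and a one-hour TTL (both illustrative; production uses more nuanced key schemes):

```python
# Minimal exact-match response cache on Redis (TTL and key scheme are illustrative).
import hashlib

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cached_complete(prompt: str, generate) -> str:
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    if (hit := r.get(key)) is not None:
        return hit                  # cache hit: skip the LLM call entirely
    out = generate(prompt)          # cache miss: pay for one LLM call
    r.setex(key, 3600, out)         # store the response for an hour
    return out
```

Every cache hit is an LLM call we don't pay for, which is one of the main levers behind the 30% cost-reduction objective.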
**Infrastructure:**
- AWS (ECS, Lambda, S3)
- Kubernetes for AI service orchestration
## AI Metrics
- **Monthly Interactions:** 200K+
- **Current Accuracy:** 78% (target: 90%+)
- **Hallucination Rate:** 5% (target: <1%)
- **P95 Latency:** 4 seconds (target: <2 seconds)
- **Monthly LLM Spend:** $20K (target: $14K with same quality)
- **Automated Resolution Rate:** 65% (target: 80%)
## Compensation and Benefits
**Salary:** $175,000 - $230,000 (based on experience)
**Equity:** 0.05% - 0.12% (4-year vest, 1-year cliff)
**Benefits:**
- Medical, dental, and vision insurance (premiums covered 100% for employees, 80% for dependents)
- Unlimited PTO, with a minimum of 15 days per year encouraged
- $5,000 annual learning budget (AI conferences, courses)
- $2,000 home office setup allowance
- 401(k) with 4% company match
- 16 weeks paid parental leave
- Flexible hybrid work (2-3 days in SF office)
**AI-Specific Perks:**
- $100/month OpenAI/Anthropic API credits for personal projects
- Conference budget for AI-focused events (AI Engineer Summit, NeurIPS)
- 10% time for AI experimentation
## Interview Process
Our interview process typically takes 2-3 weeks. We focus on LLM integration skills, not traditional ML.
- **Step 1: Application Review** (3-5 days) — We review your resume and AI project experience
- **Step 2: Recruiter Screen** (30 min) — Background, interests, and compensation
- **Step 3: Technical Screen** (60 min) — LLM concepts, RAG patterns, past projects
- **Step 4: RAG System Design** (60 min) — Design a RAG system including chunking, retrieval, and evaluation
- **Step 5: Prompt Engineering Exercise** (60 min) — Design and iterate on prompts with live evaluation
- **Step 6: Team Interviews** (2 x 30 min) — Meet potential teammates
- **Step 7: Hiring Manager** (30 min) — Career goals and offer discussion
Real AI problems, not LeetCode. We respond within 5 business days with written feedback.
## How to Apply
Submit your resume. We'd especially love to see examples of LLM applications you've built—GitHub repos, blog posts, or descriptions in your resume all work.
[Company] is an equal opportunity employer. We evaluate candidates based on skills and learning ability, not years of LangChain experience (nobody has 10 years of it anyway).