What ETL Developers Actually Build
ETL developers create the data movement infrastructure that powers analytics.
Data Extraction
Getting data from sources:
- Database connectors — Pulling from MySQL, PostgreSQL, Oracle, SQL Server
- API integrations — REST, GraphQL, and custom API connections
- File ingestion — CSV, JSON, XML, Parquet from various sources
- Streaming sources — Kafka, Kinesis, and event streams
- SaaS integrations — Salesforce, HubSpot, Stripe, and other platforms
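Most of these sources are pulled through some variant of a paginated fetch loop. The sketch below shows the shape of that loop in plain Python; the page size and record layout are made up, and `fetch_page` stands in for a real HTTP call (which would need a client library, auth, and retries).

```python
# Sketch of a paginated extraction loop. fetch_page is a stand-in for a
# real HTTP/API call; the record shape and page size are illustrative.

def fetch_page(page: int, page_size: int = 2):
    """Return one page of fake source records (simulates an API call)."""
    data = [{"id": i} for i in range(1, 6)]  # pretend source table
    start = (page - 1) * page_size
    return data[start:start + page_size]

def extract_all(page_size: int = 2):
    """Yield records page by page until the source is exhausted."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break
        yield from batch
        page += 1

records = list(extract_all())
```

Streaming sources replace the page loop with a consumer loop, but the extraction concern is the same: pull everything once, in order, without gaps.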
Data Transformation
Converting data for use:
- Cleaning — Handling nulls, duplicates, and invalid data
- Standardization — Consistent formats, naming, and types
- Enrichment — Adding derived fields and lookups
- Aggregation — Summarizing and grouping data
- Business logic — Applying rules and calculations
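The first three transformation steps above (cleaning, standardization, dedup) can be sketched in a few lines of plain Python; a real pipeline would more likely do this in pandas or SQL, and the field names here are illustrative.

```python
# Minimal cleaning/standardization pass: normalize a key field, drop
# invalid rows, and dedupe on the standardized value. Plain Python for
# clarity; field names are hypothetical.

raw = [
    {"email": " Alice@Example.COM ", "signup": "2024-01-05"},
    {"email": "alice@example.com",   "signup": "2024-01-05"},  # duplicate
    {"email": None,                  "signup": "2024-02-01"},  # invalid
]

def clean(rows):
    seen, out = set(), []
    for row in rows:
        email = (row["email"] or "").strip().lower()  # standardize format
        if not email:         # drop rows that fail a basic validity rule
            continue
        if email in seen:     # dedupe on the standardized key
            continue
        seen.add(email)
        out.append({"email": email, "signup": row["signup"]})
    return out

cleaned = clean(raw)
```

Note that deduplication only works after standardization; the two "Alice" rows above match only once case and whitespace are normalized.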
Data Loading
Delivering to destinations:
- Warehouse loading — Snowflake, BigQuery, Redshift
- Data lake ingestion — S3, GCS, Azure Data Lake
- Database synchronization — Keeping systems in sync
- Real-time delivery — Low-latency data pipelines
- Incremental updates — Efficient change data capture
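The incremental-update bullet deserves a concrete shape. One common pattern is a high-watermark pull plus an upsert: read only rows whose `updated_at` has advanced past the last run, then insert-or-update them at the destination. The sketch below uses sqlite as a stand-in for both source and warehouse; table and column names are illustrative.

```python
import sqlite3

# High-watermark incremental load sketch: pull only rows newer than the
# last watermark and upsert them. sqlite stands in for a warehouse;
# table/column names are hypothetical.

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")

def incremental_load(watermark: str) -> str:
    """Copy rows changed since `watermark`; return the new watermark."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (watermark,)).fetchall()
    dst.executemany(
        """INSERT INTO orders VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
             amount = excluded.amount, updated_at = excluded.updated_at""",
        rows)
    return max((r[2] for r in rows), default=watermark)

wm = incremental_load("1970-01-01")   # initial pull gets everything
src.execute("UPDATE orders SET amount = 25.0, updated_at = '2024-01-03' WHERE id = 2")
wm = incremental_load(wm)             # second run picks up only the change
```

True change data capture reads the database's transaction log instead of polling a timestamp column, which also catches deletes, but the watermark pattern is the common starting point.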
ETL vs. ELT: The Modern Evolution
Traditional ETL
Transform before loading:
- Data transformed on dedicated ETL server
- Transformations happen outside the warehouse
- Common with legacy tools (Informatica, SSIS, Talend)
- Limited by ETL server compute
Modern ELT
Load then transform:
- Raw data loaded to warehouse first
- Transformations happen in warehouse (dbt, SQL)
- Leverages warehouse compute power
- More flexible, version-controlled transformations
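The ELT flow above can be shown end to end in miniature: land the raw data untouched, then build transformations as SQL on top of it inside the warehouse. Here sqlite stands in for the warehouse (in a real stack, dbt would manage the SQL models), and all names are illustrative.

```python
import sqlite3

# ELT in miniature: load raw data as-is, then transform with SQL inside
# the "warehouse" (sqlite as a stand-in; names are hypothetical).

wh = sqlite3.connect(":memory:")

# 1. Load: raw data lands untouched, mixed quality and all.
wh.execute("CREATE TABLE raw_events (user_id, event, amount)")
wh.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    ("u1", "purchase", 10.0),
    ("u1", "purchase", 5.0),
    ("u2", "refund", -3.0),
])

# 2. Transform: a SQL model layered on top of the raw table.
wh.execute("""
    CREATE VIEW user_revenue AS
    SELECT user_id, SUM(amount) AS revenue
    FROM raw_events
    WHERE event = 'purchase'
    GROUP BY user_id
""")

rows = wh.execute("SELECT * FROM user_revenue ORDER BY user_id").fetchall()
```

Because the raw table is preserved, the transformation can be rewritten and replayed at any time, which is exactly the flexibility and version-control benefit the bullets above describe.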
When to Use Each
| Use ETL When | Use ELT When |
|---|---|
| Limited warehouse compute | Modern cloud warehouse |
| Sensitive data filtering needed | Warehouse handles all data |
| Legacy tool investment | Greenfield or modernizing |
| Real-time requirements | Batch is acceptable |
Skills by Experience Level
Junior ETL Developer (0-2 years)
Capabilities:
- Build basic pipelines with guidance
- Write SQL transformations
- Handle common data formats
- Use ETL tools or Python for extraction
- Debug data quality issues
Learning areas:
- Complex transformation logic
- Pipeline optimization
- Error handling patterns
- Orchestration tools
Mid-Level ETL Developer (2-4 years)
Capabilities:
- Design pipelines for complex sources
- Optimize transformation performance
- Implement error handling and recovery
- Work with orchestration (Airflow)
- Handle incremental and CDC patterns
- Mentor junior developers
Growing toward:
- Architecture decisions
- Team leadership
- Pipeline strategy
Senior ETL Developer (4+ years)
Capabilities:
- Architect data integration strategy
- Lead pipeline modernization
- Optimize for scale and cost
- Define standards and patterns
- Handle complex real-time requirements
- Guide technology decisions
Across these levels, the progression runs: curiosity & fundamentals → independence & ownership → architecture & leadership → strategy & org impact.
Interview Focus Areas
Pipeline Design
Core competency:
- "Design a pipeline to sync Salesforce data to our warehouse"
- "How would you handle incremental updates for a large table?"
- "Explain change data capture and when you'd use it"
- "How do you handle schema changes in source systems?"
Transformation Logic
Daily work:
- "Walk me through how you'd clean and standardize customer data"
- "Write a SQL transformation for [business requirement]"
- "How do you handle null values and data quality issues?"
- "Explain slowly changing dimension handling in ETL"
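A strong answer to the slowly changing dimension question usually describes Type 2 handling: when an attribute changes, close out the current row and insert a new one so history is preserved. A minimal sketch of that merge logic, with made-up field names, looks like this:

```python
# SCD Type 2 sketch: on change, expire the current dimension row and
# append a new current version. Field names are illustrative.

from datetime import date

dim = [  # existing dimension rows
    {"customer_id": 1, "city": "Austin", "valid_from": date(2023, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, as_of):
    current = next(r for r in dim
                   if r["customer_id"] == customer_id and r["is_current"])
    if current["city"] == new_city:
        return dim                      # no change, nothing to do
    current["valid_to"] = as_of         # close out the old version
    current["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": as_of, "valid_to": None, "is_current": True})
    return dim

apply_scd2(dim, 1, "Denver", date(2024, 6, 1))
```

Candidates should also be able to contrast this with Type 1 (overwrite in place, no history) and explain when each is appropriate.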
Error Handling
Production readiness:
- "How do you handle pipeline failures?"
- "Design an alerting strategy for data pipelines"
- "How do you ensure data consistency after failures?"
- "Explain idempotent pipeline design"
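Idempotency is worth making concrete: a pipeline run should produce the same end state whether it runs once or is retried after a failure. One common pattern is partition replacement, where each run deletes and reloads its own slice inside a single transaction. A sketch with sqlite standing in for the warehouse (table and partition key are hypothetical):

```python
import sqlite3

# Idempotent load via partition replacement: each run deletes then
# reloads its own partition in one transaction, so a rerun cannot
# duplicate rows. Names are illustrative; sqlite stands in for a warehouse.

wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE daily_sales (day TEXT, amount REAL)")

def load_partition(day, amounts):
    with wh:  # connection as context manager = one transaction per run
        wh.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
        wh.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                       [(day, amt) for amt in amounts])

load_partition("2024-06-01", [10.0, 20.0])
load_partition("2024-06-01", [10.0, 20.0])   # rerun: same end state
count = wh.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
```

An append-only load without the delete would double the rows on retry; that contrast is the heart of the interview question.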
Tools and Technology
Technical depth:
- "Compare Airflow, Prefect, and Dagster"
- "When would you use Spark vs. pure SQL?"
- "How do you choose between batch and streaming?"
- "Explain the role of dbt in modern data pipelines"
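Underneath Airflow, Prefect, and Dagster alike, a pipeline is modeled as a DAG and tasks run in dependency order. The core idea can be shown with the standard library's `graphlib`; the task names below are made up.

```python
from graphlib import TopologicalSorter

# Toy pipeline DAG: each task maps to the set of tasks it depends on.
# Orchestrators resolve this into an execution order, as sketched here.

dag = {
    "load": {"transform"},       # load runs after transform
    "transform": {"extract"},    # transform runs after extract
    "extract": set(),            # extract has no dependencies
}

# static_order() yields tasks so every dependency comes before its dependent.
order = list(TopologicalSorter(dag).static_order())
```

Real orchestrators add scheduling, retries, parallelism, and observability on top, but a candidate who can explain this dependency-ordering core can usually reason about any of the three tools.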
Common Hiring Mistakes
Over-Valuing Legacy Tool Experience
Informatica and SSIS experience is less relevant in modern stacks. Focus on fundamentals: SQL, Python, data modeling, and the ability to learn new tools. Code-based approaches are increasingly standard.
Ignoring SQL Depth
ETL is fundamentally about data manipulation. Strong SQL skills are essential regardless of tooling. Candidates who rely entirely on drag-and-drop tools may struggle with complex transformations.
Conflating with Data Engineering
ETL development focuses on integration and transformation. Data engineering is broader, including infrastructure, architecture, and potentially real-time systems. Be clear about what you need.
Expecting Both Batch and Real-Time Expertise
Batch ETL and real-time streaming are different skill sets. If you need both, consider whether you need two people or explicitly hire for streaming experience.
Where to Find ETL Developers
High-Signal Sources
- Data communities — dbt Slack, data engineering Discord
- Python data libraries — Contributors to pandas, Airflow
- LinkedIn — Keywords: data pipeline, ETL, data integration
- Technical content — Writers on data engineering topics
- daily.dev — Data engineering topic followers
Background Transitions
| Background | Strengths | Gaps |
|---|---|---|
| Database Admins | SQL, data understanding | Pipeline tooling |
| Backend Engineers | Code skills, APIs | Data domain |
| BI Developers | Transformation logic | Engineering practices |
| Data Analysts | Business context | Engineering depth |
Recruiter's Cheat Sheet
Resume Green Flags
- Production pipeline experience
- SQL expertise demonstrated
- Modern tools (Airflow, dbt, Python)
- Multiple source systems handled
- Scale mentioned (data volumes)
- Error handling and monitoring
Resume Yellow Flags
- Only legacy tools (Informatica, SSIS) without modern
- No code-based experience
- No production pipeline ownership
- Missing orchestration experience
- No data quality focus
Technical Terms to Know
| Term | What It Means |
|---|---|
| ETL | Extract, Transform, Load |
| ELT | Extract, Load, Transform |
| CDC | Change Data Capture |
| DAG | Directed Acyclic Graph (pipeline structure) |
| Airflow | Popular orchestration tool |
| dbt | SQL transformation tool |
| Idempotent | Safe to rerun without side effects |
| Incremental | Processing only changed data |
| Batch | Processing data in scheduled chunks |
| Streaming | Processing data continuously in real time |