
Building a Data Platform: The Complete Guide

Market Snapshot

  • Senior salary (US): $165k–$230k
  • Hiring difficulty: Very hard
  • Average time to hire: 6–12 weeks

Data Engineer

Definition

A Data Engineer is a technical professional who designs, builds, and maintains the systems that move, store, and transform data: ingestion pipelines, warehouses and lakes, and the infrastructure around them. The role requires deep technical expertise, continuous learning, and collaboration with cross-functional teams to deliver reliable, high-quality data that meets business needs.

For recruiters, hiring managers, and candidates alike, understanding what data engineers actually do is essential to navigating modern tech hiring, where technical expertise and cultural fit must be carefully balanced.

Overview

A data platform is the comprehensive infrastructure and tooling that enables data work across your entire organization. Unlike individual data pipelines or analytics dashboards, a data platform provides the foundation, standards, and self-service capabilities that let data engineers, analysts, data scientists, and product teams work effectively with data.

A mature data platform includes: reliable data ingestion pipelines, scalable data storage (warehouses and lakes), transformation and modeling layers, data discovery and cataloging, access control and governance, self-service tooling for data consumers, monitoring and quality systems, and platform engineering capabilities that make data work easier for everyone.

Building a data platform is a strategic initiative that requires careful planning, the right team composition, and technology decisions that balance immediate needs with long-term scalability. The companies that succeed treat their data platform as a product with internal customers, not just infrastructure to maintain.

What Success Looks Like

A successful data platform enables data work across your organization without constant bottlenecks or reliability issues. Here's what distinguishes a mature data platform from ad-hoc data infrastructure:

Self-Service Capability

  • Data consumers find what they need through catalogs and discovery tools without asking the data team
  • New data sources integrate through established patterns and self-service connectors
  • Analysts and scientists provision their own compute resources and environments
  • Documentation is discoverable so knowledge scales beyond individual team members
  • Onboarding is fast—new team members become productive in days, not weeks

Reliability and Trust

  • Pipelines run reliably without daily firefighting or manual intervention
  • Data quality issues surface automatically before reaching consumers
  • Lineage is traceable from source to final metric or dashboard
  • Stakeholders trust the numbers because definitions are clear and consistent
  • Incidents are rare and recovery is automated when they occur

Developer Experience

  • Data engineers work efficiently using established patterns and tooling
  • CI/CD exists for data transformations and models
  • Testing frameworks catch issues before production
  • Development environments mirror production without friction
  • Platform capabilities reduce repetitive work and enable focus on business logic

Governance at Scale

  • Access controls work without blocking legitimate needs
  • Sensitive data is protected with appropriate masking and restrictions
  • Compliance requirements are met through automated policies
  • Cost is predictable and scales appropriately with usage
  • Audit trails exist for debugging and compliance

Platform Architecture Layers

Understanding what a data platform includes helps you plan hiring and technology decisions:

Layer 1: Data Ingestion

Purpose: Getting data from sources into your platform

Components:

  • Connectors - Fivetran, Airbyte, or custom pipelines for APIs, databases, files
  • Event streaming - Kafka, Kinesis, or Pub/Sub for real-time data (when needed)
  • Change data capture - Replicating database changes automatically
  • File ingestion - Handling batch uploads, exports, and data shares

Who builds this: Data Engineers (Pipeline focus)
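Whatever tool handles ingestion, the incremental-load logic underneath is conceptually simple: track a watermark per source and pull new data in bounded windows, so a stalled connector catches up in chunks rather than one giant query. A minimal sketch in Python (function and variable names are illustrative, not from any specific tool):

```python
from datetime import datetime, timedelta

def plan_backfill_windows(last_loaded, now, window=timedelta(hours=6)):
    """Split the gap between the last successful load and now into
    fixed-size windows, so catch-up extraction stays bounded."""
    windows = []
    start = last_loaded
    while start < now:
        end = min(start + window, now)
        windows.append((start, end))
        start = end
    return windows
```

A connector that fell a day behind would then issue four six-hour extraction queries instead of one 24-hour scan, which keeps source-database load and retry cost predictable.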

Layer 2: Data Storage

Purpose: Scalable, queryable storage for all your data

Components:

  • Data warehouse - Snowflake, BigQuery, Redshift for structured analytics
  • Data lake - S3, GCS, Azure Data Lake for raw and semi-structured data
  • Feature stores - For ML features (Feast, Tecton) if doing ML
  • Metadata storage - Data catalogs, schema registries, lineage graphs

Who builds this: Data Engineers, Platform Engineers
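In the lake layer, files are typically organized by partition keys embedded in the object path (Hive-style key=value directories), which lets query engines prune irrelevant files. A sketch of the convention, with an invented bucket and dataset for illustration:

```python
from datetime import date

def partition_path(bucket, dataset, event_date, source="app"):
    """Build a Hive-style partitioned object key; engines such as Spark
    and Athena can prune scans on these key=value directories."""
    return (f"s3://{bucket}/{dataset}/"
            f"source={source}/"
            f"year={event_date.year}/month={event_date.month:02d}/day={event_date.day:02d}/")
```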

Layer 3: Data Transformation

Purpose: Turning raw data into clean, business-ready models

Components:

  • Transformation layer - dbt, Spark, or custom SQL/Python transformations
  • Data modeling - Dimensional models, data vault, or other patterns
  • Metrics layer - Centralized metric definitions (dbt metrics, Looker LookML)
  • Quality testing - dbt tests, Great Expectations, Monte Carlo

Who builds this: Analytics Engineers, Data Engineers
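dbt's built-in tests (not_null, unique, accepted_values) amount to simple assertions over a model's rows. A plain-Python sketch of what two common tests check, independent of any tool:

```python
def not_null(rows, column):
    """Return rows that violate a NOT NULL expectation."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Return values that appear more than once in the column."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)
```

The point of the transformation layer is that checks like these run automatically on every model build, so a duplicated order ID fails CI instead of inflating a revenue dashboard.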

Layer 4: Data Discovery and Catalog

Purpose: Helping people find and understand data

Components:

  • Data catalog - Atlan, DataHub, Collibra, or custom solutions
  • Schema registry - Tracking data schemas and changes
  • Lineage tracking - Understanding data flow from source to consumption
  • Documentation - Automated and manual documentation of datasets

Who builds this: Platform Engineers, Analytics Engineers
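Lineage answers "where does this metric come from?" by walking a dependency graph from a dashboard back to raw sources. A toy traversal over an adjacency map (the graph contents here are invented for illustration):

```python
def upstream(node, parents):
    """Collect every transitive upstream dependency of a dataset."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Catalog tools like DataHub maintain essentially this graph at scale, populated automatically from dbt manifests, query logs, and orchestrator metadata rather than by hand.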

Layer 5: Access and Governance

Purpose: Controlling who can access what data

Components:

  • Access control - RBAC, row-level security, column masking
  • Secrets management - Secure credential storage and rotation
  • Compliance tooling - PII detection, GDPR/CCPA automation
  • Audit logging - Tracking data access and changes

Who builds this: Platform Engineers, Security Engineers
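Column masking is easiest to reason about as a policy applied at read time: redact sensitive columns unless the caller's role is explicitly allowed. Warehouses implement this natively (e.g. masking policies), but the logic reduces to something like this sketch, with role names and PII columns invented for illustration:

```python
def mask_row(row, role, pii_columns=("email", "ssn")):
    """Return a copy of the row with PII columns redacted unless the
    caller's role is explicitly permitted to see them."""
    if role in ("admin", "compliance"):
        return dict(row)
    return {k: ("***" if k in pii_columns else v) for k, v in row.items()}
```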

Layer 6: Self-Service Tooling

Purpose: Enabling data consumers to work independently

Components:

  • BI platforms - Looker, Tableau, Mode, Metabase
  • Notebook environments - Jupyter, Hex, Deepnote for data science
  • Query interfaces - SQL editors, query builders, API gateways
  • Compute provisioning - Self-service Spark clusters, warehouse resources

Who builds this: Platform Engineers, Analytics Engineers

Layer 7: Platform Infrastructure

Purpose: Making the platform itself easier to build and operate

Components:

  • CI/CD for data - Testing, versioning, and deploying data transformations
  • Orchestration - Airflow, Dagster, Prefect for workflow management
  • Monitoring and alerting - Pipeline health, data quality, cost monitoring
  • Developer tooling - SDKs, CLIs, APIs for platform capabilities

Who builds this: Platform Engineers, Data Engineers
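At their core, orchestrators like Airflow and Dagster run tasks in dependency order, which is a topological sort of the DAG. A minimal sketch using Python's standard library (the task names are illustrative):

```python
from graphlib import TopologicalSorter

def run_order(deps):
    """deps maps each task to the set of tasks it depends on; returns
    a valid execution order, as an orchestrator's scheduler would."""
    return list(TopologicalSorter(deps).static_order())
```

Real orchestrators add retries, scheduling, backfills, and parallelism on top, but the dependency-resolution core is exactly this.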


Roles You'll Need

Building a data platform requires different skills than building individual pipelines. Here's who you need and when:

Data Engineer (Pipeline Focus)

Focus: Building and maintaining data pipelines and ingestion infrastructure
Key skills: Python, SQL, orchestration (Airflow/Dagster), cloud data warehouses
When to hire: First data platform hire—establishes core pipelines
Salary range: $120-165K mid, $165-210K senior

Data engineers focused on pipelines handle ingestion, orchestration, and reliability. They build the infrastructure that moves data from sources to destinations and ensure it runs reliably. At early stages, one strong data engineer handles everything. As you scale, they specialize into pipeline engineers (ingestion focus) and platform engineers (tooling focus).

Analytics Engineer

Focus: Transforming raw data into clean, business-ready models using dbt
Key skills: Advanced SQL, dbt, data modeling, stakeholder communication
When to hire: After pipelines exist—builds the transformation layer
Salary range: $110-145K mid, $145-185K senior

Analytics engineers bridge raw data and business metrics. They own the transformation layer—turning event logs into user journeys, transactions into revenue metrics, and raw tables into dimensional models. This role emerged with dbt's popularity and is distinct from traditional data engineering. Analytics engineers need excellent SQL and business context.

Data Platform Engineer

Focus: Building internal tools and platforms for data producers and consumers
Key skills: Software engineering, infrastructure, developer experience, product thinking
When to hire: When your data team reaches 5+ and needs better tooling
Salary range: $140-180K mid, $180-230K senior

Platform engineers build the developer experience for data work—catalog systems, data discovery tools, access management, self-service infrastructure, CI/CD for data, and developer tooling. This is a senior role for teams at scale where the data infrastructure itself becomes a product with internal customers. Platform engineers need product thinking, not just technical skills.

Data Engineer (Infrastructure Focus)

Focus: Data storage, compute, and infrastructure optimization
Key skills: Cloud infrastructure, warehouse optimization, cost management, scalability
When to hire: When storage/compute costs or performance become bottlenecks
Salary range: $120-165K mid, $165-210K senior

Infrastructure-focused data engineers optimize warehouses, manage data lakes, handle partitioning strategies, optimize query performance, and control costs. They understand the storage and compute layer deeply and ensure the platform scales efficiently.

ML Engineer (for Feature Platforms)

Focus: Building feature pipelines and ML infrastructure
Key skills: Python, feature stores, ML frameworks, streaming systems
When to hire: When ML models need production features beyond batch data
Salary range: $150-190K mid, $190-250K senior

If your data platform feeds ML models, you'll need engineers who understand both data engineering and ML requirements. Feature pipelines have different latency, freshness, and consistency needs than analytics pipelines.


Technology Stack Decisions

Most companies should start with managed services and add complexity only when needed:

  • Ingestion - Fivetran, Airbyte, or Stitch: managed connectors reduce maintenance burden
  • Warehouse - Snowflake, BigQuery, or Redshift: scalable, SQL-native, managed
  • Transformation - dbt: industry standard, testable, version-controlled
  • Orchestration - Airflow, Dagster, or Prefect: dependency management, monitoring, alerting
  • Quality - dbt tests, Great Expectations: catch issues before stakeholders do
  • BI - Looker, Tableau, Mode, or Metabase: self-service analytics for stakeholders

Start here. This stack handles 80% of data platform needs without custom development.

When to Add Platform Components

Add a data catalog when:

  • You have 50+ datasets and people can't find what they need
  • Multiple teams create similar datasets without knowing
  • Data lineage questions come up frequently ("Where does this metric come from?")

Add self-service compute when:

  • Analysts wait for data engineers to provision resources
  • Data scientists need custom environments frequently
  • Compute costs are unpredictable due to manual provisioning

Add custom platform tooling when:

  • Off-the-shelf tools can't meet your requirements
  • You need deep integration with proprietary systems
  • You have 10+ data engineers and tooling becomes a bottleneck

Add streaming infrastructure when:

  • You have user-facing latency requirements (fraud detection, personalization)
  • Event-driven architectures need real-time reactions
  • Batch windows can't complete due to data volumes

Build vs. Buy Decision Framework

Buy (managed services) when:

  • Standard functionality meets your needs
  • You want vendor support and maintenance
  • Time-to-value matters more than customization
  • Your team lacks capacity for custom development

Build (custom platform) when:

  • Analytics is a core product feature (embedded analytics)
  • Off-the-shelf tools can't meet performance requirements
  • You need deep integration with proprietary systems
  • Regulatory requirements prohibit third-party tools

Reality check: Even companies with sophisticated data platforms use managed services for core infrastructure (warehouses, ingestion) and build custom tooling only where it provides competitive advantage.


Team Structure and Hiring Sequence

Phase 1: Foundation (1-2 people)

Your first data platform hire should be a senior data engineer who can:

  • Set up core infrastructure (warehouse, ingestion, orchestration)
  • Build initial pipelines for critical data sources
  • Establish patterns others can follow
  • Work independently with minimal supervision
  • Make pragmatic decisions without over-engineering

Interview focus: "Tell me about a time you built data infrastructure from scratch. What tradeoffs did you make?"

What to look for:

  • Strong SQL and Python fundamentals
  • Experience with at least one modern orchestrator
  • Product mindset (understands why data matters to the business)
  • Self-directed with minimal supervision

Red flag: Candidates who want to implement Kafka, Spark, and a lakehouse from day one. Start simple.

Phase 2: Growing the Platform (3-5 people)

Once your foundation is solid, add specialists:

Second hire: Analytics Engineer

  • Handles dbt transformations and data modeling
  • Partners with analysts to understand needs
  • Frees up the first engineer for infrastructure work
  • Builds the transformation layer

Third hire: Based on your bottleneck

  • More ingestion complexity → Data Engineer (Pipeline focus)
  • More modeling needs → Analytics Engineer
  • Reliability issues → Data Engineer (Infrastructure focus)
  • Need for self-service → Data Platform Engineer

Introduce specialization:

  • Domain ownership — Engineers own specific data domains
  • Platform work — Someone focuses on tooling and developer experience
  • Reliability — Someone focuses on monitoring and incident response

Phase 3: Scale (6+ people)

At this stage, formalize teams:

Pipeline Team:

  • Ingestion and orchestration
  • Reliability and monitoring
  • Infrastructure optimization

Analytics Engineering Team:

  • dbt models and transformations
  • Data quality and testing
  • Metric definitions and governance

Platform Team:

  • Self-service tooling
  • Data catalog and discovery
  • Developer experience
  • Access control and governance

You'll need technical leadership (Data Platform Manager or Head of Data Platform) to coordinate across teams, set standards, and represent data platform in business decisions.


Common Pitfalls

1. Building Before Understanding Needs

The mistake: "Let's build a data platform" without clear requirements
The result: Expensive infrastructure that doesn't solve actual problems

Better approach: Start with specific problems. "We need reliable product analytics" leads to pipelines and dashboards. "We need self-service data access" leads to catalog and tooling. Build incrementally based on actual needs.

2. Over-Engineering from Day One

The mistake: Building a data lakehouse with Spark, Kafka, and custom orchestration before you have reliable pipelines
The result: Months of infrastructure work before delivering business value

Better approach: Use managed services aggressively. Fivetran/Airbyte for ingestion, a cloud warehouse, dbt for transformations. Add complexity only when managed services can't meet requirements.

3. Ignoring Developer Experience

The mistake: Building platform capabilities without considering how data engineers will use them
The result: Tooling that's technically impressive but doesn't improve productivity

Better approach: Treat data engineers as customers. Gather feedback, measure adoption, iterate based on usage. Platform engineering is product engineering—build for users, not for technical elegance.

4. Requiring Specific Tool Experience

The mistake: "Must have 3+ years of Airflow AND dbt AND Snowflake experience"
The result: Filtered out excellent candidates who used Prefect, Dagster, or BigQuery

Better approach: Test fundamentals—SQL depth, data modeling, systems thinking, product thinking. Someone who understands orchestration concepts learns Airflow in weeks. Someone who understands data modeling applies it across tools.

5. Skipping Data Quality Until Crisis

The mistake: Building pipelines without tests or monitoring
The result: Stakeholder trust erodes when numbers don't match

Better approach: Data quality is a first-class concern from day one. Build tests alongside pipelines, not after. Monitor data quality metrics. Catch issues before they reach consumers.

6. Underestimating Platform Engineering Needs

The mistake: Expecting data engineers to build platform tooling in addition to pipelines
The result: Platform capabilities are half-built and unreliable

Better approach: Platform engineering is a distinct discipline requiring product thinking, software engineering skills, and developer empathy. Hire dedicated platform engineers when you need self-service capabilities, not just infrastructure.

7. No Governance Until Scale

The mistake: Building a platform without access controls, documentation, or lineage tracking
The result: Security issues, knowledge silos, and inability to trace data issues

Better approach: Build governance into the platform from the start. Even simple access controls and documentation prevent problems later. Governance doesn't mean bureaucracy—it means making data work safely and reliably.


Interview Strategy

Technical Assessment

For Data Engineers:

  • SQL depth - Complex queries with window functions, CTEs, optimization scenarios
  • Pipeline design - "Design a pipeline architecture for this use case"
  • Debugging - "This pipeline is slow/failing. Walk me through your approach."
  • System thinking - "How would you handle late-arriving data? Schema changes?"
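As a concrete example of the SQL depth worth probing, a window-function exercise such as "compute each customer's running revenue total" works against any warehouse; here it is sketched with Python's built-in sqlite3, on an invented orders schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('a', '2024-01-01', 10), ('a', '2024-01-03', 20), ('b', '2024-01-02', 5);
""")
rows = conn.execute("""
    SELECT customer, order_date, amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY order_date)
               AS running_total
    FROM orders
    ORDER BY customer, order_date
""").fetchall()
```

A strong candidate explains the partition-versus-order distinction and the default window frame without prompting; a weaker one reaches for a self-join.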

For Analytics Engineers:

  • Data modeling - "Design a data model for e-commerce analytics"
  • dbt knowledge - "Walk me through your dbt project structure"
  • Stakeholder scenarios - "A stakeholder questions your numbers. How do you handle it?"

For Platform Engineers:

  • Product thinking - "How would you design a data catalog for 50+ data consumers?"
  • API design - "Design an API for developers to provision data resources"
  • Developer experience - "How would you gather feedback from data engineers?"
  • System design - "Design a self-service platform for data access"

Questions to Ask

For data platform engineers:

  • "Walk me through a platform capability you built. Who used it? What impact did it have?"
  • "How do you balance building new features with maintaining existing infrastructure?"
  • "Tell me about a time you improved developer experience based on feedback."
  • "How would you set up data platform infrastructure for a company at our stage?"

For senior hires:

  • "How would you structure a data platform team from scratch?"
  • "What's your approach to build vs. buy decisions for platform components?"
  • "How do you think about balancing standardization with flexibility?"
  • "How do you measure platform success?"

Building Your Data Platform Culture

Great data platforms aren't just technology—they're culture. Hire for these traits:

  • Product thinking — Platform engineers treat infrastructure as a product with customers
  • Ownership mentality — Engineers feel responsible for platform reliability and developer experience
  • Stakeholder empathy — Understanding that data serves business decisions, not technical elegance
  • Documentation habits — Writing things down so knowledge scales beyond individuals
  • Quality obsession — Treating data quality and reliability as non-negotiable

The best data platform teams feel ownership over enabling data work across the organization. They celebrate when data consumers become productive faster, not just when pipelines run or tools look impressive.


Budget Planning

Cost Per Team Member (US, 2026)

Data Engineer (Mid-level):

  • Salary: $120-165K
  • Infrastructure/tools: $10-20K
  • Total: $130-185K

Analytics Engineer (Mid-level):

  • Salary: $110-145K
  • Infrastructure/tools: $5-10K
  • Total: $115-155K

Data Platform Engineer (Mid-level):

  • Salary: $140-180K
  • Infrastructure/tools: $15-25K
  • Total: $155-205K

Infrastructure Costs (Annual)

Small Platform (3-5 people, moderate data volume):

  • Warehouse: $50-150K
  • Ingestion tools: $20-50K
  • Orchestration: $10-30K
  • BI tools: $30-80K
  • Total: $110-310K

Medium Platform (6-10 people, high data volume):

  • Warehouse: $150-400K
  • Ingestion tools: $50-150K
  • Orchestration: $30-80K
  • BI tools: $80-200K
  • Catalog/governance: $50-150K
  • Total: $360-980K

Large Platform (10+ people, very high data volume):

  • Warehouse: $400K-1M+
  • Ingestion tools: $150-400K
  • Orchestration: $80-200K
  • BI tools: $200-500K
  • Catalog/governance: $150-400K
  • Custom platform development: $200-500K
  • Total: $1.18M-3.0M+

ROI Considerations

Value delivered:

  • Faster time-to-insight for business decisions
  • Reduced engineering time on repetitive tasks
  • Better data quality leading to better decisions
  • Self-service capability reducing bottlenecks
  • Compliance and governance reducing risk

Cost of not building:

  • Data engineers spend 50%+ time on manual, repetitive work
  • Data quality issues causing bad business decisions
  • Security/compliance risks from ad-hoc data access
  • Knowledge silos creating bus factor risk
  • Inability to scale data work with business growth

Timeline: Building Your Data Platform

Months 1-3: Foundation

  • Hire first data engineer
  • Set up warehouse and core ingestion
  • Build pipelines for critical data sources
  • Establish basic patterns and standards

Months 4-6: Transformation Layer

  • Hire analytics engineer
  • Set up dbt and build initial models
  • Create foundational dashboards
  • Document data definitions

Months 7-12: Scale and Reliability

  • Add second data engineer or platform engineer
  • Improve monitoring and alerting
  • Optimize performance and costs
  • Build self-service capabilities (catalog, discovery)

Months 13-18: Platform Maturity

  • Formalize team structure
  • Add governance and access controls
  • Build advanced platform capabilities
  • Scale to support more data consumers

Ongoing: Continuous Improvement

  • Gather feedback from data consumers
  • Iterate on platform capabilities
  • Optimize costs and performance
  • Add new capabilities based on needs

Frequently Asked Questions

What's the difference between data pipelines and a data platform?

Data pipelines are individual workflows that move data from sources to destinations. A data platform is the comprehensive infrastructure that enables all data work—pipelines, storage, transformation, discovery, governance, and self-service tooling. Pipelines are components of a platform. Building a platform means creating the foundation, standards, and capabilities that make data work easier for everyone, not just individual pipelines.
