
Hiring OpenTelemetry Engineers: The Complete Guide

Market Snapshot

  • Senior Salary (US): $170k–$220k
  • Hiring Difficulty: Very Hard
  • Avg. Time to Hire: 5–7 weeks

Site Reliability Engineer (SRE)

Definition

A Site Reliability Engineer (SRE) applies software engineering to operations problems: defining SLOs and error budgets, automating away manual toil, and building the tooling and monitoring that keep large-scale systems reliable. The role demands deep technical expertise and close collaboration with development teams, but its output is measured in availability and operational efficiency rather than shipped features.

SREs are a natural fit for OpenTelemetry-heavy roles: they are often the engineers who operate Collectors, define sampling policies, and drive instrumentation standards across teams. When hiring for observability work, weight SRE experience that includes hands-on telemetry ownership, not just on-call rotation.

Shopify (E-Commerce)

Commerce Platform Observability

Instrumenting thousands of services to handle Black Friday traffic peaks. OTel-based tracing enables debugging of complex order flows across payment, inventory, and fulfillment services with sub-second resolution.

Tags: High-Scale Tracing · Tail Sampling · Custom Instrumentation · Event Correlation
eBay (E-Commerce)

Global Marketplace Telemetry Migration

Migrated from proprietary instrumentation to OpenTelemetry across their global marketplace. Reduced vendor lock-in while improving trace coverage from 40% to 95% of requests.

Tags: Migration Strategy · Collector Gateway · Cross-Region · Vendor Neutral
Skyscanner (Travel)

Travel Search Distributed Tracing

End-to-end tracing from mobile app search queries through dozens of microservices including pricing, availability, and booking. Context propagation across multiple languages (Kotlin, Python, Go).

Tags: Multi-Language · Mobile-to-Backend · Context Propagation · Business Attributes
Zalando (E-Commerce)

Fashion E-Commerce Observability Platform

Built internal observability platform on OpenTelemetry serving 2,000+ engineers. Self-service instrumentation with standardized span schemas and automated sampling policies.

Tags: Platform Engineering · Self-Service · Schema Design · Sampling Strategy

What OpenTelemetry Expertise Actually Means

Before assessing OpenTelemetry skills, understand the different levels of expertise and what they mean for your hiring needs.

Level 1: Instrumented Application User (Most Engineers)

Every engineer working with modern observability can:

  • Read distributed traces in Jaeger/Zipkin/vendor UIs
  • Navigate span hierarchies to debug slow requests
  • Understand basic context propagation concepts
  • Add simple spans to code using auto-instrumentation

This is table stakes. Don't test for it in interviews—assume any decent backend engineer develops this within weeks of working with observability.

Level 2: Instrumentation Implementer (Backend/Platform)

Engineers with hands-on OTel experience can:

  • Add custom spans, attributes, and events to critical code paths
  • Configure OTel SDK initialization and exporters
  • Set up the OTel Collector with processors and pipelines
  • Implement proper context propagation across async boundaries
  • Create meaningful span names and attribute schemas

This is your target for backend engineers who'll instrument services. It develops naturally with 6-12 months of production observability work.

Level 3: Observability Platform Owner (Specialized)

A smaller subset of engineers can:

  • Deploy and operate OTel Collectors at scale
  • Design telemetry pipelines with sampling, filtering, and routing
  • Build custom instrumentation libraries for internal frameworks
  • Optimize cost by managing cardinality and sampling strategies
  • Integrate OTel with CI/CD for deployment correlation

This is rare and valuable for platform teams building internal observability infrastructure. It requires dedicated focus, not just incidental OTel usage.


The Three Pillars of OpenTelemetry

Understanding how OTel handles each signal helps you assess candidate depth.

Traces (Distributed Tracing)

The most mature OTel signal. Traces capture request flow across service boundaries, with spans representing individual operations. Each span includes:

  • Span name: What operation occurred (HTTP GET, database query, queue processing)
  • Attributes: Key-value pairs describing the operation (http.method, db.system, user.id)
  • Events: Timestamped annotations within a span (exception thrown, cache miss)
  • Status: Success, error, or unset

Interview signal: Candidates who discuss trace context propagation, span attribute conventions, and sampling strategies understand distributed systems debugging deeply.
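The span fields above can be sketched with a small stdlib-only model. This is a toy illustration of the concepts, not the real OTel SDK; the `Span` class and its methods are invented here for clarity:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    # Toy model of an OTel span: name, attributes, events, and status.
    name: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    status: str = "UNSET"  # one of UNSET, OK, ERROR

    def add_event(self, name, **attrs):
        # Events are timestamped annotations recorded within the span.
        self.events.append({"name": name, "time": time.time(), "attributes": attrs})

# A span for a database query, with semantic-convention-style attribute names.
span = Span("SELECT orders", attributes={"db.system": "postgresql", "http.method": "GET"})
span.add_event("cache_miss", key="order:42")
span.status = "OK"
print(span.name, span.status, len(span.events))
```

A candidate at Level 2 or above should be able to explain why each of these fields exists and which ones they would populate for a given operation.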

Metrics (Measurements and Aggregations)

OTel Metrics provide standardized instruments for measuring application behavior:

  • Counters: Monotonically increasing values (requests served, errors occurred)
  • Gauges: Point-in-time values (current queue depth, memory usage)
  • Histograms: Distribution of values (request latency percentiles)

Interview signal: Understanding when to use counters vs. histograms, and awareness of cardinality concerns, indicates operational maturity.
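The counter-versus-histogram distinction can be shown with a minimal stdlib sketch (toy classes, not the real SDK instruments): a counter only accumulates, so it can answer "how many", while a histogram keeps the distribution, so it can answer "how slow at p95":

```python
# Toy versions of two OTel metric instruments (illustrative only).
class Counter:
    # Monotonically increasing: only non-negative increments are allowed.
    def __init__(self):
        self.value = 0
    def add(self, amount):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount

class Histogram:
    # Records individual measurements so percentiles can be derived later.
    def __init__(self):
        self.samples = []
    def record(self, value):
        self.samples.append(value)
    def percentile(self, p):
        # Nearest-rank percentile over the recorded samples.
        s = sorted(self.samples)
        k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
        return s[k]

requests = Counter()
latency_ms = Histogram()
for ms in [12, 15, 11, 240, 14, 13, 16, 12, 11, 300]:
    requests.add(1)
    latency_ms.record(ms)

print(requests.value)               # 10
print(latency_ms.percentile(90))    # the slow tail a mean would hide
```

Note what the histogram preserves: the average of these latencies looks fine, but the p90 exposes the 240 ms outlier, which is exactly why latency belongs in a histogram, not a gauge.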

Logs (Structured Logging)

The newest OTel signal, still gaining adoption. OTel Logs aim to correlate log entries with traces:

  • Automatic injection of trace/span IDs into log records
  • Structured logging with semantic conventions
  • Unified collection through the OTel Collector

Interview signal: Candidates who understand log-trace correlation have modern observability perspectives beyond basic logging.
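Log-trace correlation reduces to one idea: stamp the active trace ID onto every log record. A minimal sketch using only Python's stdlib `logging` module (the real OTel logging instrumentation does this automatically; the hard-coded trace ID here stands in for the active span's context):

```python
import io
import logging

# Would come from the active span's context in a real OTel setup.
CURRENT_TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"

class TraceContextFilter(logging.Filter):
    # Stamps the current trace ID onto every log record so log lines
    # can later be joined with traces in the backend.
    def filter(self, record):
        record.trace_id = CURRENT_TRACE_ID
        return True

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())

logger.info("payment authorized")
line = buf.getvalue().strip()
print(line)
```

Once every log line carries a `trace_id`, "show me all logs for this slow request" becomes a single query instead of a grep across services.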


The OpenTelemetry Collector

The Collector is central to production OTel deployments. It receives, processes, and exports telemetry data.

Architecture Patterns

Agent Pattern: Collector runs as a sidecar or daemon, receiving data from local applications and forwarding to backends.

Gateway Pattern: Centralized Collector clusters receive data from many applications, enabling unified processing, sampling, and routing.

Combined Pattern: Agents forward to gateways, enabling both local buffering and centralized processing.

Key Collector Capabilities

  • Receivers: ingest data in various formats (e.g., OTLP, Jaeger, Prometheus, Zipkin)
  • Processors: transform, filter, and batch data (e.g., tail sampling, attribute modification, batching)
  • Exporters: send data to backends (e.g., Jaeger, Zipkin, Datadog, Honeycomb, OTLP)
  • Connectors: route data between pipelines (e.g., span-to-metrics conversion)

Interview question: "How would you deploy OTel Collectors for a 500-service platform?"
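A strong answer usually includes a concrete pipeline. As a rough illustration, a gateway-mode Collector config wires receivers, processors, and exporters together like this (endpoints and thresholds are placeholders; `tail_sampling` ships in the Collector contrib distribution):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 2000}
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}

exporters:
  otlp:
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp]
```

Candidates who can talk through a config like this, and explain why tail sampling must run before batching, have operated Collectors rather than just read about them.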


When OpenTelemetry Expertise Matters (And When It Doesn't)

High Value: Observability Platform Teams

If you're hiring someone to:

  • Build and maintain your company's telemetry pipeline
  • Design instrumentation standards for development teams
  • Migrate from proprietary instrumentation to OTel
  • Optimize observability costs through sampling and filtering

Then deep OTel experience matters. Look for candidates who've operated Collectors at scale, designed span schemas, and built instrumentation libraries.

Medium Value: Backend/Microservices Engineers

For engineers who will:

  • Instrument services they build and maintain
  • Debug production issues using distributed tracing
  • Add custom spans for business-critical code paths

Instrumentation skills matter, but specific OTel API knowledge is secondary. Prioritize candidates who understand what to instrument and can explain their tracing philosophy.

Low Value: Most Application Developers

For engineers who will:

  • Read traces to debug requests
  • Rely on auto-instrumentation
  • Focus on business logic over infrastructure

OTel familiarity is nice to have but shouldn't drive hiring decisions. Auto-instrumentation handles most needs, and reading traces takes minutes to learn.


Real-World OpenTelemetry Usage Patterns

Pattern 1: Service Mesh Integration

Challenge: Existing service mesh (Istio, Linkerd) provides some observability, but lacks application-level context like user IDs, feature flags, or business metrics.

Solution: Combine mesh-level telemetry with application instrumentation. OTel propagates trace context through mesh proxies while application code adds business-relevant attributes.

Interview question: "How would you correlate service mesh metrics with application-level traces?"

Pattern 2: Sampling at Scale

Challenge: Full trace collection at 10,000 requests per second generates terabytes of data daily—expensive to store and process.

Solution: Head-based sampling (decide at trace start) for baseline coverage, tail-based sampling (decide after trace completes) to capture errors and slow requests. The OTel Collector supports sophisticated sampling strategies.

Interview question: "You have 50,000 RPS across your platform. How do you approach trace sampling?"
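The core trick behind head-based sampling is worth understanding: derive the keep/drop decision from the trace ID itself, so every service in the trace reaches the same decision without coordination. A stdlib-only sketch of the idea behind OTel's `TraceIdRatioBased` sampler (the real SDK's hashing differs in detail):

```python
import random

def should_sample(trace_id: str, ratio: float) -> bool:
    # Treat the low 8 bytes of the 128-bit trace ID as a uniform value
    # and keep the trace if it falls below the ratio threshold.
    return int(trace_id[-16:], 16) < ratio * 2**64

random.seed(7)
trace_ids = [f"{random.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(kept)  # roughly 10% of 10,000

# Deterministic: every service computing this sees the same answer.
assert should_sample(trace_ids[0], 0.10) == should_sample(trace_ids[0], 0.10)
```

Head-based sampling like this gives cheap baseline coverage; the Collector's tail-based sampling then layers on "always keep errors and slow traces", which requires buffering whole traces before deciding.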

Pattern 3: Migration from Proprietary SDKs

Challenge: You're using Datadog APM or New Relic agents but want to reduce vendor lock-in without losing observability during migration.

Solution: OTel's vendor-neutral design allows gradual migration. Start by deploying the Collector to receive data from existing agents, then incrementally replace proprietary instrumentation with OTel SDKs.

Interview question: "Walk me through migrating from Datadog APM to OpenTelemetry without losing visibility."

Pattern 4: Cross-Language Context Propagation

Challenge: A request flows from JavaScript frontend to Go API to Python ML service to Java payment processor. How do you maintain trace continuity?

Solution: OTel's W3C Trace Context and Baggage propagation standards work across all languages. Configure each service's OTel SDK to inject and extract context from HTTP headers.

Interview question: "How do you ensure trace context propagates across services in different languages?"
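The mechanics reduce to a shared header format. A stdlib-only sketch of injecting and extracting the W3C `traceparent` header, whose layout is `version-traceid-spanid-flags` per the W3C Trace Context spec (helper names here are invented for illustration):

```python
import random

def inject(trace_id: str, span_id: str, sampled: bool) -> dict:
    # Attach trace context to outgoing request headers.
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}

def extract(headers: dict):
    # Recover trace context from incoming request headers.
    version, trace_id, span_id, flags = headers["traceparent"].split("-")
    return trace_id, span_id, flags == "01"

# A Go API calling a Python service produces/consumes the very same header,
# which is why continuity survives language boundaries.
trace_id = f"{random.getrandbits(128):032x}"
parent_span = f"{random.getrandbits(64):016x}"
headers = inject(trace_id, parent_span, sampled=True)
got_trace, got_span, got_sampled = extract(headers)
print(headers["traceparent"])
```

In practice each language's OTel SDK does this via its configured propagator; the point of the sketch is that the wire format, not the SDK, is what makes cross-language traces work.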


Recruiter's Cheat Sheet: Spotting Real Expertise


Conversation Starters That Reveal Skill Level

  • "How do you decide what to instrument?"
    Surface-level: "I add spans to slow endpoints"
    Deep understanding: "I instrument at service boundaries, async operations, and business-critical paths. I focus on actionable data—what will help me debug at 3 AM?"
  • "How do you handle high cardinality?"
    Surface-level: "I don't add too many attributes"
    Deep understanding: "I distinguish between indexed attributes for querying versus span events for context. I use bounded cardinality for service/endpoint names, sampling for high-cardinality debugging data"
  • "What's your sampling strategy?"
    Surface-level: "We sample 10% of traces"
    Deep understanding: "We use head-based sampling for baseline coverage plus tail-based sampling in the Collector to capture 100% of errors and high-latency traces"

Resume Signals That Matter

Look for:

  • Scale context ("Instrumented 200+ services with OTel, processing 1M spans/sec")
  • Collector experience ("Deployed OTel Collector gateway handling cross-region traffic")
  • Migration work ("Led migration from X-Ray to OpenTelemetry across platform")
  • Sampling design ("Implemented tail-based sampling reducing storage costs 60%")
  • Schema ownership ("Defined span naming conventions and attribute schemas")

🚫 Be skeptical of:

  • "OpenTelemetry expert" without scale or complexity context
  • Only consuming traces, never instrumenting code
  • No mention of Collector operations or sampling
  • Listing OTel alongside 20 tools without depth on any
  • Can't explain the difference between traces, metrics, and logs

GitHub/Portfolio Indicators

  • OTel instrumentation libraries or contrib modules
  • Collector configuration with custom processors
  • Span attribute schema documentation
  • Blog posts about instrumentation patterns or sampling strategies

Common Hiring Mistakes

1. Requiring "3+ Years OpenTelemetry Experience"

OpenTelemetry only became production-ready around 2021-2022. Requiring years of experience excludes candidates with deep observability expertise from OpenTracing, Jaeger, or Zipkin—whose skills transfer directly. OTel is learnable in weeks; distributed systems thinking takes years.

2. Conflating OpenTelemetry with Observability Backends

OTel is an instrumentation standard, not an observability platform. Candidates might have deep OTel experience but use Jaeger, or extensive Datadog experience without OTel. Understanding where OTel fits in the observability stack helps you ask the right questions.

3. Testing OTel API Knowledge in Interviews

Asking "How do you create a span in Go?" wastes interview time. APIs differ by language and are in documentation. Instead, ask about instrumentation strategy: "What would you instrument in a payment processing service, and why?"

4. Ignoring the Collector

Engineers who've only added spans to application code have limited OTel experience. Production deployments require Collector expertise: pipelines, processors, sampling, batching, reliability. Ask about operational experience, not just SDK usage.

5. Treating OTel and Vendor Experience as Mutually Exclusive

The best OTel engineers often have deep experience with Datadog, New Relic, or Honeycomb. Vendor expertise provides context for why vendor-neutral matters. Don't dismiss candidates because they learned observability through commercial tools.


OpenTelemetry in the Broader Observability Landscape

Understanding where OTel fits helps you assess candidates accurately.

OTel vs. Proprietary APM (Datadog, New Relic, Dynatrace)

Proprietary APMs provide integrated instrumentation, storage, and visualization—turnkey observability with vendor lock-in. OTel provides vendor-neutral instrumentation that exports to any backend.

Implication: OTel expertise indicates comfort with building from components and valuing portability. Proprietary APM expertise indicates working within integrated platforms. Different skills, both valuable.

OTel vs. OpenTracing/OpenCensus

OpenTelemetry supersedes both projects. OpenTracing focused on distributed tracing; OpenCensus added metrics. OTel unifies both and adds logs.

Implication: Candidates with OpenTracing or OpenCensus experience have directly transferable skills. Don't treat these as separate requirements.

The Vendor Landscape

All major observability vendors support OTel: Datadog, New Relic, Honeycomb, Lightstep, Grafana, Splunk. OTLP (OpenTelemetry Protocol) is becoming the standard export format.

Implication: OTel skills are increasingly portable. Engineers can switch between vendors or run multi-vendor setups without re-instrumenting applications.

The Future: OTel Everywhere

OTel is becoming the default instrumentation layer. Cloud providers (AWS, GCP, Azure) are adding native OTel support. Frameworks and libraries are shipping with built-in OTel instrumentation. Candidates who understand OTel deeply are positioned for the observability future.

Frequently Asked Questions


Do candidates need prior OpenTelemetry experience, or does general observability experience transfer?

For most roles, general observability experience is more valuable than OTel specifically. OpenTelemetry concepts—distributed tracing, context propagation, instrumentation patterns—transfer directly from OpenTracing, Jaeger, Zipkin, or vendor APMs like Datadog. An engineer with strong distributed tracing experience from any background can become productive with OTel in 2-3 weeks. Exception: if you're hiring someone to own OTel Collector infrastructure at scale, specific Collector experience saves significant ramp-up time.
