Should we build custom WebSocket infrastructure or use a managed service?

Use managed services (Pusher, Ably, Supabase Realtime) unless you have compelling reasons not to. These services handle connection management, scaling, and reliability—work that takes months to build and maintain yourself. Build custom only when: (1) real-time is your core product and you need maximum control, (2) scale exceeds what managed services handle cost-effectively (typically 100K+ concurrent connections), (3) you need capabilities managed services don't offer, or (4) latency requirements are extremely demanding (<50ms). For most companies, managed services are the right choice. The engineering investment in custom infrastructure rarely pays off for non-core features.

How many concurrent connections should we plan for?

Start by understanding your concurrent user patterns, not total users. Rule of thumb: peak concurrent connections are typically 5-15% of daily active users, depending on use case. A chat app might see 10-15% concurrency; a notification system might see 3-5%. For planning: 10K concurrent connections is achievable with managed services and simple architecture. 100K requires thoughtful architecture but isn't exceptional. 1M+ is genuinely challenging and requires specialized expertise. Most companies never reach 100K concurrent—don't over-engineer for theoretical scale. Design for 10x your current needs, not 1000x.
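As a rough worked example of the rule of thumb above, the sketch below turns a daily-active-user count into a planning range. The function names and the 200K DAU figure are illustrative assumptions. The 5-15% ratios and the 10x headroom are the ones quoted in this answer.

```python
def peak_concurrent_range(daily_active_users: int,
                          low_ratio: float = 0.05,
                          high_ratio: float = 0.15) -> tuple[int, int]:
    """Estimate peak concurrent connections as a share of DAU (5-15% by default)."""
    return (round(daily_active_users * low_ratio),
            round(daily_active_users * high_ratio))


def design_target(current_peak: int, headroom: int = 10) -> int:
    """Capacity to design for: 10x the current peak, per the guidance above."""
    return current_peak * headroom


# Hypothetical product with 200K daily active users.
low, high = peak_concurrent_range(200_000)
print(low, high)             # 10000 30000
print(design_target(high))   # 300000
```

For a notification-style product, passing `low_ratio=0.03, high_ratio=0.05` would reflect the lower concurrency mentioned above.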

What's the difference between WebSockets and Server-Sent Events?

WebSockets provide bidirectional communication—both client and server can send messages anytime. Server-Sent Events (SSE) are server-to-client only—the server pushes updates, but the client uses regular HTTP requests to send data. Choose WebSockets for: chat, collaborative editing, gaming, or any bidirectional real-time feature. Choose SSE for: notifications, live feeds, dashboards, or any feature where updates flow one direction. SSE advantages: simpler to implement, works over standard HTTP, automatic reconnection with last-event-ID, better firewall compatibility. Many companies over-use WebSockets when SSE would suffice. If your client only receives updates, consider SSE first.

How do we handle users on poor mobile connections?

Mobile connections are the primary challenge for real-time reliability. Key strategies: (1) Aggressive reconnection with exponential backoff—detect disconnection quickly and reconnect automatically. (2) Message persistence—queue messages server-side and replay on reconnection so nothing is lost. (3) Optimistic updates—show the user their action succeeded immediately, sync with server in background. (4) Connection status UI—make it clear when the user is offline or reconnecting. (5) Batch messages—reduce connection frequency by grouping updates. (6) Background/foreground handling—gracefully handle app backgrounding and restore state when foregrounded. Test on throttled networks and airplane mode recovery, not just office WiFi.
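Strategy (1) can be sketched as follows. This is a minimal illustration of exponential backoff with "full jitter", not any particular client library's behavior. The `base` and `cap` values are assumptions you would tune for your product.

```python
import random
from typing import Iterator, Optional


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one wait time (in seconds) per reconnect attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so clients that dropped at the same moment do not all reconnect
    at the same instant.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)


for attempt, delay in enumerate(backoff_delays(5)):
    print(f"attempt {attempt}: wait up to {delay:.2f}s before reconnecting")
```

The random spread matters as much as the doubling: without it, a server restart can trigger every client to retry on the same schedule.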

Hiring to Build Real-Time Features: The Complete Guide

Q: Do we need a dedicated real-time engineer or can any backend developer do this?

For most use cases, experienced backend developers can build real-time features effectively. Real-time has become a standard backend capability rather than a deep specialization. If you're using managed services (Pusher, Ably, Firebase) for chat, notifications, or live updates, any solid backend developer can integrate them. You need specialized real-time expertise only when: (1) you're building custom WebSocket infrastructure at scale (50K+ concurrent connections), (2) you're implementing collaborative editing with conflict resolution, or (3) real-time is your core product (like Slack or Discord). Start with good backend developers and managed services—you'll know when you need specialists because you'll hit specific limitations.

Overview

Real-time features deliver updates to users instantly without requiring page refreshes. This includes chat messaging, live notifications, collaborative document editing, real-time dashboards, multiplayer gaming, and live activity feeds. The technical approaches include WebSockets (bidirectional persistent connections), Server-Sent Events (server-to-client streams), and managed services like Pusher, Ably, or Firebase.

Building real-time isn't the deep specialization it once was. Modern frameworks and managed services have significantly lowered the barrier to entry. Socket.IO, Supabase Realtime, and cloud-native solutions handle much of the complexity automatically. The expertise gap appears at scale: handling tens of thousands of concurrent connections, keeping global latency under 100ms, or building collaborative editing with conflict resolution.

For most companies, real-time is a capability within backend engineering, not a separate discipline. Hire solid backend engineers with event-driven experience, and they'll build your chat feature. Reserve specialized real-time expertise for truly demanding use cases.

What Success Looks Like

Before diving into hiring, understand what successful real-time implementation actually means for your product. Real-time features have unique success criteria beyond just "it works."

Signs You Made the Right Hires

Technical Execution

Messages deliver in under 100ms for 95% of users
System handles connection drops and reconnections gracefully
State synchronization works reliably across devices
No message ordering issues or duplicate deliveries
Scales predictably as concurrent users increase
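To make "no message ordering issues or duplicate deliveries" concrete, here is a minimal client-side sketch. It assumes the server stamps each message on a channel with an increasing integer `seq` starting at 1. That is an assumption of this example, not something every service provides.

```python
class OrderedInbox:
    """Deliver messages in sequence order and drop duplicates."""

    def __init__(self) -> None:
        self.next_seq = 1                    # next sequence number to deliver
        self.pending: dict[int, str] = {}    # out-of-order messages held back

    def receive(self, seq: int, payload: str) -> list[str]:
        """Accept one message; return any messages now ready, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                        # duplicate delivery: ignore it
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


inbox = OrderedInbox()
print(inbox.receive(2, "b"))   # [] (held until message 1 arrives)
print(inbox.receive(1, "a"))   # ['a', 'b']
print(inbox.receive(2, "b"))   # [] (duplicate)
```

A candidate who has shipped real-time features will usually recognize this pattern and can explain what happens when a gap never fills.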

Architecture Quality

Clear separation between real-time transport and business logic
Monitoring and observability built in from day one
Fallback mechanisms for degraded connectivity
Infrastructure costs scale linearly, not exponentially
Easy to add new real-time features to existing infrastructure

User Experience

Users perceive updates as "instant" (< 200ms perceived latency)
Presence indicators (online/typing) feel accurate
Optimistic updates make the interface feel responsive
Connection issues surface helpful feedback, not silent failures
Works reliably on mobile networks and variable connectivity

Red Flags in Implementation

Watch for these warning signs that indicate problems:

Increasing latency as user count grows
"Ghost" messages that appear then disappear
Reconnection storms when server restarts
Memory leaks in long-running connections
State drift between connected clients
High infrastructure costs relative to concurrent users

Technology Decisions: The Foundation

Your technology choice shapes your hiring needs. Make this decision before you start recruiting.

WebSockets vs Server-Sent Events vs Managed Services

Approach	Best For	Complexity	Talent Pool
Managed Services (Pusher, Ably, Firebase)	Simple real-time, rapid development	Low	Large (any backend dev)
Server-Sent Events	Server-to-client only (notifications, feeds)	Medium	Large
Socket.io / WebSocket Libraries	Custom bidirectional real-time	Medium	Large
Raw WebSockets	Maximum control and optimization	High	Smaller
Custom Protocol (WebRTC, QUIC)	Specialized use cases (gaming, video)	Very High	Small

When to Use Managed Services

Choose managed services (Pusher, Ably, Supabase Realtime) when:

Speed-to-market matters — You need chat or notifications in weeks, not months
Scale is moderate — Under 100K concurrent connections
Team is small — Don't have dedicated infrastructure engineers
Focus is elsewhere — Real-time is a feature, not your core product
Budget allows — $500-2000/month is acceptable for infrastructure

Real-world examples: SaaS notification systems, internal chat features, live dashboards for SMB products, MVPs testing real-time concepts.

Trade-offs:

Vendor lock-in and pricing at scale
Limited customization for edge cases
Dependency on third-party availability
May outgrow as you scale

When to Build Custom

Build custom WebSocket infrastructure when:

Real-time is your product — Chat apps, collaboration tools, gaming platforms
Scale requirements are high — Expecting millions of concurrent connections
Latency is critical — Need sub-50ms delivery for competitive advantage
Custom protocol needs — Standard pub/sub doesn't fit your use case
Cost optimization is essential — Managed service costs are prohibitive at scale

Real-world examples: Slack, Discord, Figma, multiplayer games, trading platforms, live video platforms.

Trade-offs:

Significant engineering investment (3-6 months for production-ready)
Operational complexity (monitoring, scaling, failover)
Need specialized expertise
Must handle edge cases yourself

Server-Sent Events: The Overlooked Option

Server-Sent Events (SSE) are underused but perfect for many scenarios:

Choose SSE when:

Data flows server-to-client only (notifications, feeds, dashboards)
You want simpler implementation than WebSockets
HTTP/2 infrastructure is already in place
Automatic reconnection handling is valuable

SSE advantages:

Works over standard HTTP (better firewall compatibility)
Automatic reconnection, with the browser sending a Last-Event-ID header so the server can resume the stream
Simpler server implementation
No special load balancer configuration

SSE limitations:

No client-to-server channel (use REST for that)
Browser connection limits (6 per domain over HTTP/1.1; HTTP/2 multiplexing largely removes this limit)
Less suitable for high-frequency bidirectional communication
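Part of why SSE is simpler is that the wire format is plain text. The sketch below formats events for a `text/event-stream` response and replays what a reconnecting client missed. The in-memory `log` and the function names are hypothetical, and a real server would stream this over an open HTTP response.

```python
from typing import Optional


def format_sse(data: str, event_id: int, event: str = "message") -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = [f"id: {event_id}", f"event: {event}"]
    # Multi-line payloads need one "data:" field per line.
    lines += [f"data: {part}" for part in data.split("\n")]
    return "\n".join(lines) + "\n\n"   # a blank line terminates the event


def replay_after(log: list[tuple[int, str]], last_event_id: Optional[int]) -> str:
    """Build the stream body a (re)connecting client should receive.

    `log` is a hypothetical list of (event_id, data) pairs; `last_event_id`
    is the value from the browser's Last-Event-ID request header, or None
    on a first connection.
    """
    start = last_event_id or 0
    return "".join(format_sse(data, eid) for eid, data in log if eid > start)


log = [(1, "deploy started"), (2, "deploy finished")]
print(replay_after(log, last_event_id=1), end="")
```

Because the browser's `EventSource` reconnects on its own and reports the last ID it saw, the server side of "state recovery" can be this small for one-directional feeds.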

Roles You'll Need

The roles you hire depend on your technology choice and scale requirements.

For Managed Service Approach

You need backend developers with event-driven experience, not real-time specialists.

Backend Developer with Real-Time Experience

Core responsibilities:

Integrate managed real-time services (Pusher, Ably, Firebase)
Design event architecture and channel structure
Build server-side event publishing logic
Handle client-side subscription and state management
Optimize for cost and performance within service limits

Required skills:

Strong backend development (Node.js, Python, Go, etc.)
Event-driven architecture understanding
API design and REST principles
Basic understanding of WebSockets/pub-sub
Experience with at least one managed real-time service

What to assess:

Can they design a channel/room structure for your use case?
Have they handled reconnection and state recovery?
Do they understand event ordering and delivery guarantees?
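If you want a concrete reference point for the channel-design question, the toy sketch below shows the shape of an answer. The `Hub` class is an in-process stand-in for a managed service, and the `private-room-42` naming scheme is illustrative, not any vendor's required convention.

```python
from collections import defaultdict
from typing import Callable


def channel_name(scope: str, entity_id: str, private: bool = False) -> str:
    """Build names such as 'private-room-42' (the scheme is illustrative)."""
    prefix = "private-" if private else ""
    return f"{prefix}{scope}-{entity_id}"


class Hub:
    """Toy in-process pub/sub hub standing in for a managed service."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> int:
        """Deliver the event to every subscriber of the channel; return the count."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


hub = Hub()
received: list[dict] = []
room = channel_name("room", "42", private=True)
hub.subscribe(room, received.append)
hub.publish(room, {"type": "message.created", "text": "hello"})
print(room, received)
```

Strong candidates tend to talk about one channel per room or per user, which channels need authorization, and what the event payloads look like, before they mention any specific vendor.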

For Custom WebSocket Implementation

You need experienced backend engineers with infrastructure knowledge.

Senior Backend Developer (Real-Time Focus)

Core responsibilities:

Design and implement WebSocket server infrastructure
Build connection management and session handling
Implement pub/sub patterns and message routing
Handle scaling, load balancing, and failover
Create monitoring and debugging systems for real-time traffic

Required skills:

Deep backend expertise (5+ years)
WebSocket protocol understanding
Event-driven and async programming patterns
Distributed systems concepts (eventual consistency, CAP theorem)
Experience with message brokers and queues (Redis Pub/Sub, RabbitMQ, Kafka)
Load balancing and horizontal scaling knowledge

What to assess:

How would they handle 100K concurrent connections?
What's their approach to connection state and sticky sessions?
How do they debug message delivery issues?
Have they run WebSockets at scale in production (not just tutorials)?

For Collaborative Features (Google Docs-style)

This is the most specialized real-time domain.

Collaboration Engineer

Core responsibilities:

Implement real-time synchronization for collaborative editing
Design and build conflict resolution systems (CRDTs, OT)
Handle presence, cursors, and selection sharing
Optimize for latency-sensitive operations
Build offline support with eventual sync

Required skills:

Deep understanding of CRDTs or Operational Transformation
Experience with collaborative software
Strong algorithm and data structure background
Client and server-side development
Understanding of distributed consistency models

What to assess:

Can they explain CRDTs vs OT and when to use each?
Have they built collaborative features in production?
How do they handle conflicting edits?
Do they have experience with offline-first architecture?

Note: True collaboration engineers are rare and expensive. Consider using libraries like Yjs or Automerge before hiring for custom CRDT development.
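For interviewers who want a feel for what a CRDT is, here is a deliberately tiny example: a grow-only counter. It is one of the simplest CRDTs and is nowhere near what text editing needs (libraries like Yjs and Automerge solve a much harder problem). It does show the core property that replicas converge regardless of merge order.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Record a local change in this replica's own slot."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum; commutative, associative, idempotent."""
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)   # changes made while offline
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
print(alice.value, bob.value)   # 5 5
```

A candidate who can explain why `merge` uses `max` rather than addition understands the idea well enough to evaluate an existing library.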

Team Structure Options

Option 1: Feature Team (Most Common)

When to use: Real-time is a feature, not the core product

Team Size	Composition	Notes
Minimal (1-2)	1-2 backend developers	Using managed services
Standard (2-4)	2 backend + 1 frontend	Custom implementation
Scaled (4-6)	3 backend + 1 frontend + 1 DevOps	High-scale custom

How it works: Backend engineers own real-time infrastructure. Frontend engineers integrate real-time into the UI. The team ships features end-to-end.

Option 2: Platform Team

When to use: Real-time is foundational infrastructure used by multiple product teams

Team Size	Composition	Notes
Core (3-4)	2-3 backend + 1 DevOps	Builds reusable real-time platform
Scaled (5-7)	3-4 backend + 1-2 DevOps + 1 SRE	Enterprise-scale infrastructure

How it works: Platform team provides real-time capabilities as internal service. Product teams consume the platform to build features. Centralizes expertise and reduces duplication.

Option 3: Embedded Specialist

When to use: Multiple teams need real-time, but not enough for full platform team

Structure	How It Works
1 specialist, multiple teams	Specialist advises and reviews, teams implement
Rotating consultant model	Specialist joins each team for initial implementation

How it works: One experienced engineer provides guidance across teams. Teams own their implementations with specialist oversight. Good for scaling expertise without large dedicated team.

Hiring Sequence

For Managed Services Approach

Phase 1: Backend Foundation (Weeks 1-8)

Start with experienced backend developers:

Implement the managed service integration
Design event architecture
Build initial features (notifications, basic chat)

Phase 2: Frontend Integration (Weeks 4-12)

Add frontend capability:

Real-time UI components
Optimistic updates
Connection state management
Mobile considerations

Phase 3: Scale Optimization (As Needed)

Add infrastructure expertise when:

Costs are increasing faster than expected
Performance issues emerge
Considering migration from managed to custom

For Custom Implementation

Phase 1: Architect/Lead (Weeks 1-10)

Hire your most experienced real-time engineer first:

Makes foundational architecture decisions
Establishes patterns for connection handling
Sets up monitoring and debugging infrastructure

Critical: This person's decisions will affect everything. Don't rush.

Phase 2: Backend Engineers (Weeks 6-16)

Build out the team:

Implement features on established architecture
Scale connection infrastructure
Build specialized subsystems (presence, sync)

Phase 3: Frontend and DevOps (Weeks 10-20)

Complete the team:

Frontend real-time integration
DevOps for infrastructure scaling
SRE for reliability and monitoring

Common Pitfalls

1. Over-Engineering Before You Need It

The mistake: Building custom WebSocket infrastructure for your 1,000-user product because "we might need it later."

What happens: 6 months of engineering time, complex infrastructure to maintain, and the product still has 1,000 users. You've built Slack's infrastructure for a feature that Pusher handles for $50/month.

Better approach: Start with managed services. If you keep the real-time transport separate from business logic, migrating to custom later is manageable when you actually need it. Most companies never need custom WebSocket infrastructure.

2. Underestimating Connection Management

The mistake: Focusing on message delivery while ignoring connection lifecycle: reconnection, state recovery, authentication refresh, graceful degradation.

What happens: Users experience disconnections with no feedback, lose messages during network switches, have to refresh to recover state. The feature works in demos but fails in production.

Better approach: Connection management is half the work:

Automatic reconnection with exponential backoff
State recovery and message replay on reconnect
Clear UI feedback for connection status
Offline queueing for messages sent during disconnection
Authentication token refresh without reconnection
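State recovery and message replay can be sketched with a bounded server-side buffer. This is a minimal in-memory illustration with hypothetical names. A production system would typically persist the buffer and keep one per channel.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Keep the most recent messages so reconnecting clients can catch up."""

    def __init__(self, max_messages: int = 1000) -> None:
        self._messages: deque = deque(maxlen=max_messages)  # (seq, payload) pairs
        self._next_seq = 1

    def append(self, payload: str) -> int:
        """Store a message and return the sequence number assigned to it."""
        seq = self._next_seq
        self._messages.append((seq, payload))
        self._next_seq += 1
        return seq

    def since(self, last_seen_seq: int) -> Optional[list]:
        """Messages after last_seen_seq, or None if some were already evicted.

        None tells the caller the client is too far behind for a replay
        and needs a full state refresh instead.
        """
        oldest = self._messages[0][0] if self._messages else self._next_seq
        if last_seen_seq + 1 < oldest:
            return None
        return [(seq, p) for seq, p in self._messages if seq > last_seen_seq]


buffer = ReplayBuffer(max_messages=3)
for text in ["a", "b", "c", "d"]:
    buffer.append(text)
print(buffer.since(2))   # [(3, 'c'), (4, 'd')]
print(buffer.since(0))   # None (message 1 was evicted)
```

The explicit "too far behind" case is the part teams tend to forget; without it, clients silently miss messages after a long disconnection.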

3. Ignoring Mobile Network Reality

The mistake: Testing real-time on office WiFi and assuming it works everywhere.

What happens: Mobile users on 4G experience constant disconnections, high latency, and battery drain. What works on desktop fails on mobile.

Better approach: Test and design for mobile from the start:

Aggressive reconnection strategies
Batch messages to reduce how often the device has to send and receive
Implement proper background/foreground handling
Consider battery impact of persistent connections
Test on throttled networks and airplane mode recovery
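Message batching, for instance, can be as simple as the sketch below. The class name and thresholds are assumptions. Timestamps are passed in explicitly to keep the example deterministic, and a real client would also flush on a timer rather than only when a new update arrives.

```python
class MessageBatcher:
    """Group outgoing updates so they are sent as one frame per flush.

    A flush is due when `max_items` updates are queued or `max_wait`
    seconds have passed since the first queued update.
    """

    def __init__(self, max_items: int = 20, max_wait: float = 0.25) -> None:
        self.max_items = max_items
        self.max_wait = max_wait
        self._queue: list[dict] = []
        self._first_at = 0.0   # time the oldest queued update was added

    def add(self, update: dict, now: float) -> list[dict]:
        """Queue an update; return the batch to send if a flush is due, else []."""
        if not self._queue:
            self._first_at = now
        self._queue.append(update)
        if len(self._queue) >= self.max_items or now - self._first_at >= self.max_wait:
            batch, self._queue = self._queue, []
            return batch
        return []


batcher = MessageBatcher(max_items=3, max_wait=1.0)
print(batcher.add({"n": 1}, now=0.0))   # []
print(batcher.add({"n": 2}, now=0.1))   # []
print(batcher.add({"n": 3}, now=0.2))   # [{'n': 1}, {'n': 2}, {'n': 3}]
```

The trade-off is latency: a larger `max_wait` saves battery and bandwidth but makes updates feel less immediate.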

4. Hiring Real-Time Specialists Too Early

The mistake: Hiring a "WebSocket engineer" before you've validated the product needs sophisticated real-time.

What happens: Expensive hire builds infrastructure you don't need. They get bored because your real-time needs are simple. They leave for a company with actual scale challenges.

Better approach: Start with generalist backend engineers who have event-driven experience. Managed services handle most cases. Hire specialists only when you've genuinely outgrown simpler approaches—typically 50K+ concurrent connections or complex collaboration requirements.

5. No Observability Strategy

The mistake: Building real-time features without proper monitoring, treating WebSocket connections as black boxes.

What happens: Debugging production issues becomes guesswork. You can't tell if messages are slow, dropped, or never sent. Performance problems are discovered by users, not monitoring.

Better approach: Build observability from day one:

Connection metrics (count, duration, error rates)
Message metrics (latency, delivery rate, ordering issues)
Client-side telemetry (reconnection frequency, perceived latency)
Distributed tracing through the real-time pipeline
Alerting on anomalies before users complain
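As a small illustration of the message-latency metric, the sketch below computes nearest-rank percentiles from a list of delivery times and checks them against a 100ms p95 target. The sample values and function names are made up for the example. In practice these numbers would come from your metrics system.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(latencies_ms: list[float], target_ms: float = 100.0) -> dict:
    """Summarize delivery latency and flag whether p95 is under the target."""
    p95 = percentile(latencies_ms, 95)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": p95,
        "meets_target": p95 < target_ms,
    }


# Hypothetical delivery times in milliseconds.
samples = [20, 35, 40, 42, 55, 60, 75, 80, 90, 250]
print(latency_report(samples))
# {'p50': 55, 'p95': 250, 'meets_target': False}
```

Note how one slow delivery in ten is enough to miss the target even though the median looks healthy, which is why averages alone hide real-time problems.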

6. Premature Horizontal Scaling

The mistake: Designing for distributed WebSocket clusters before you need them.

What happens: Complexity explosion. Now you need sticky sessions, distributed pub/sub, state synchronization across nodes, and cluster coordination. Development slows to a crawl.

Better approach: Vertical scaling goes further than you think for WebSockets. A single well-optimized server can handle 100K+ connections. Design for horizontal scaling but implement vertical first. Add distribution only when single-node limits are actually reached.

Budget Planning

Team Costs (US Market, 2026)

Managed Services Approach

Role	Base Salary	Total Comp*
Backend Developer (Real-Time Experience)	$130-165K	$155-200K
Frontend Developer	$125-155K	$150-190K
Senior Backend Developer	$160-190K	$190-240K

3-person team: $500K-630K/year total comp

Custom Implementation Approach

Role	Base Salary	Total Comp*
Real-Time Architect/Lead	$180-220K	$220-290K
Senior Backend Developer	$160-190K	$190-240K
Backend Developer	$130-160K	$155-195K
DevOps/Infrastructure	$150-180K	$180-225K

5-person team (assuming a second Backend Developer): about $900K-1.15M/year total comp

*Total comp includes equity, benefits, and employer costs (~20-30% overhead)
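To sanity-check team totals against your own headcount, you can sum the total-comp ranges directly. The sketch below uses the figures from the two tables above (in $K). Which role is doubled for a fifth hire is an assumption, shown here as a second Backend Developer.

```python
from typing import Optional

# Total-comp ranges in $K per year, copied from the tables above.
MANAGED_TEAM = {
    "Backend Developer (Real-Time Experience)": (155, 200),
    "Frontend Developer": (150, 190),
    "Senior Backend Developer": (190, 240),
}
CUSTOM_TEAM = {
    "Real-Time Architect/Lead": (220, 290),
    "Senior Backend Developer": (190, 240),
    "Backend Developer": (155, 195),
    "DevOps/Infrastructure": (180, 225),
}


def team_total(roles: dict, headcount: Optional[dict] = None) -> tuple[int, int]:
    """Sum the low and high ends of each role's range, one hire per role by default."""
    headcount = headcount or {}
    low = sum(lo * headcount.get(name, 1) for name, (lo, _) in roles.items())
    high = sum(hi * headcount.get(name, 1) for name, (_, hi) in roles.items())
    return low, high


print(team_total(MANAGED_TEAM))                            # (495, 630)
print(team_total(CUSTOM_TEAM))                             # (745, 950)
print(team_total(CUSTOM_TEAM, {"Backend Developer": 2}))   # (900, 1145)
```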

Infrastructure Costs

Service	Monthly Cost	Scale
Pusher	$50-500	Up to 100K connections
Ably	$100-1000	Up to 10M messages
Firebase Realtime	Pay-per-use	Variable
Custom (AWS/GCP)	$1000-5000+	Unlimited (you scale)

Cost Optimization Strategies

Evaluate managed services carefully — Calculate TCO including engineering time. Pusher at $500/month is cheap compared to 1 engineer-month building custom infrastructure.
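A quick back-of-the-envelope version of that comparison, using the Senior Backend Developer total comp from the tables above and the 3-6 month build estimate from "When to Build Custom". The function names are illustrative, and the calculation ignores ongoing maintenance, which would widen the gap further.

```python
def engineer_month_cost(annual_total_comp: float) -> float:
    """Cost of one engineer-month, given annual total compensation."""
    return annual_total_comp / 12


def months_of_service_covered(annual_total_comp: float,
                              service_monthly_cost: float) -> float:
    """How many months of a managed service one engineer-month would pay for."""
    return engineer_month_cost(annual_total_comp) / service_monthly_cost


def build_cost(annual_total_comp: float, engineers: int, months: int) -> float:
    """Salary cost of building custom infrastructure, before any maintenance."""
    return engineer_month_cost(annual_total_comp) * engineers * months


# Low end of the Senior Backend Developer total-comp range: $190K per year.
print(round(engineer_month_cost(190_000)))                   # 15833
print(round(months_of_service_covered(190_000, 500), 1))     # 31.7
print(build_cost(240_000, engineers=1, months=3))            # 60000.0
```

On these assumptions, a single engineer-month costs more than two and a half years of a $500/month plan.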

Consider hybrid approaches — Use managed services for most features, custom for specialized high-volume cases.

Optimize connection patterns — Batch messages, use efficient protocols, close idle connections. Architecture decisions affect infrastructure costs significantly.

Remote hiring — Real-time expertise is location-independent. Remote hiring can reduce costs 20-30% while accessing broader talent.
