Global Streaming Data Platform
Operates one of the world's largest Cassandra deployments with 10,000+ nodes, storing viewing history, personalization data, and microservices state for 230+ million subscribers across multiple regions.
Message Storage System
Stores 4+ trillion messages with real-time persistence, handling massive write volumes from millions of concurrent users across global servers with low-latency requirements.
iCloud Data Infrastructure
Manages 10+ petabytes of user data for iCloud services, including device sync state and iMessage delivery, with 99.99% availability requirements across 175+ countries.
Real-Time Location & Trip Data
Stores driver locations, trip history, and real-time dispatch data for millions of rides per day, requiring sub-second writes and reads for the matching algorithm.
What Cassandra Developers Actually Build
Before defining your role, understand what Cassandra work looks like at companies operating at massive scale:
Streaming & Entertainment
Netflix runs one of the world's largest Cassandra deployments, handling data for 230+ million subscribers:
- Viewing history and watch progress across devices
- Personalization and recommendation data storage
- A/B testing and experimentation data
- Microservices state management across regions
- Over 10,000 Cassandra nodes globally
Discord uses Cassandra to store over 4 trillion messages:
- Real-time message persistence at massive scale
- User presence and status tracking
- Channel and server metadata
- Cassandra-backed search indexing
- Multi-region replication for global users
Consumer Platforms
Apple relies on Cassandra for iCloud infrastructure:
- 10+ petabytes of user data
- Device synchronization state
- iMessage delivery tracking
- Global availability across 175+ countries
Instagram manages user activity at unprecedented scale:
- Feed generation and ranking data
- User interactions (likes, comments, follows)
- Story and Reel engagement tracking
- Activity logging for 2+ billion accounts
Fintech & E-Commerce
PayPal processes financial data with Cassandra:
- Transaction logging and audit trails
- User account state management
- Fraud detection data pipelines
- Multi-datacenter disaster recovery
Walmart powers e-commerce operations:
- Real-time inventory tracking
- Shopping cart persistence
- Order history across channels
- Black Friday traffic handling
Cassandra vs Other Distributed Databases
Comparisons with other distributed databases come up constantly in hiring discussions. Knowing the competitive landscape helps you recognize candidates who grasp architectural trade-offs.
Cassandra vs DynamoDB
| Aspect | Cassandra | DynamoDB |
|---|---|---|
| Deployment | Self-managed or DataStax (any cloud) | AWS-managed only |
| Cost Model | Infrastructure cost + ops | Pay-per-request or provisioned |
| Scaling | Manual node addition | Automatic (with limits) |
| Data Model | Wide-column with secondary indexes | Key-value with GSI/LSI |
| Multi-Region | Native multi-datacenter | DynamoDB Global Tables |
| Consistency | Tunable (ONE to ALL) | Eventually or strongly consistent |
| Vendor Lock-in | None | AWS-specific |
When companies choose Cassandra over DynamoDB:
- Multi-cloud or hybrid cloud requirements
- Need to avoid vendor lock-in
- Large data volumes where DynamoDB costs become prohibitive
- Require fine-grained consistency tuning
- Need on-premises deployment option
When companies choose DynamoDB over Cassandra:
- Fully serverless architecture on AWS
- Small to medium data volumes
- Don't want operational overhead
- Already deep in AWS ecosystem
Cassandra vs ScyllaDB
ScyllaDB is a Cassandra-compatible database reimplemented in C++; its vendor claims up to 10x performance improvements:
| Aspect | Cassandra | ScyllaDB |
|---|---|---|
| Language | Java (JVM) | C++ (native) |
| Performance | High | Very high (lower latency) |
| Ecosystem | Mature, extensive tooling | Growing, Cassandra-compatible |
| GC Pauses | Can occur (JVM) | None |
| Community | Large, established | Smaller but active |
| Production Use | Netflix, Apple, Discord | Discord (partial), Comcast |
What this means for hiring:
Candidates who've evaluated both and can articulate trade-offs demonstrate senior-level thinking. ScyllaDB uses the same CQL and drivers, so skills transfer directly.
Cassandra vs MongoDB
| Aspect | Cassandra | MongoDB |
|---|---|---|
| Data Model | Wide-column | Document |
| Write Performance | Exceptional | Good |
| Query Flexibility | Limited (denormalized) | Rich query language |
| Scaling | Linear horizontal | Horizontal (sharding) |
| Use Case | Time-series, high writes | Flexible documents, aggregations |
Cassandra excels when: High write throughput, time-series data, multi-datacenter replication, known query patterns.
MongoDB excels when: Flexible schemas, complex queries, document-centric data, smaller to medium scale.
The Modern Cassandra Developer (2024-2026)
Cassandra expertise has evolved significantly. Modern developers work with managed services and cloud-native patterns.
Managed Cassandra Services
Today's Cassandra developers often use managed offerings:
- DataStax Astra: Serverless Cassandra-as-a-service
- Amazon Keyspaces: Cassandra-compatible on AWS (not full Cassandra)
- Azure Cosmos DB (Cassandra API): Cassandra-compatible interface
- Instaclustr: Managed Cassandra clusters
A modern Cassandra developer understands trade-offs between managed services and self-hosted deployments—and knows that Amazon Keyspaces has compatibility limitations.
Cloud-Native Cassandra
Kubernetes operators now manage Cassandra:
- K8ssandra: Production-ready Cassandra on Kubernetes
- CassKop: Kubernetes operator for Cassandra
- DataStax Kubernetes Operator: Enterprise-grade orchestration
Modern Tooling
Strong candidates know the modern tooling ecosystem:
- cqlsh: Command-line CQL shell
- DataStax DevCenter: Visual CQL development (a legacy tool; candidates may mention its successor, DataStax Studio)
- Medusa: Backup and restore tool
- Reaper: Anti-entropy repair automation
- Cassandra Exporter: Prometheus metrics
- Stargate: Data API gateway for Cassandra
Skill Levels: What to Test For
Level 1: Basic Cassandra (Every Backend Developer)
- Write CQL queries (SELECT, INSERT, UPDATE, DELETE)
- Understand keyspace and table concepts
- Use Cassandra client drivers
- Basic understanding of partition keys
- Connect and run queries
Red flag: Only knows SQL and tries to apply relational patterns
Level 2: Competent Cassandra User
- Designs effective data models (partition keys, clustering columns)
- Understands consistency levels (ONE, QUORUM, ALL)
- Writes efficient queries (avoids ALLOW FILTERING)
- Handles TTL for data expiration
- Understands replication factor and datacenter concepts
- Can explain when to use Cassandra vs. alternatives
This is the minimum for backend developers building high-scale applications.
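One way to probe the "avoids ALLOW FILTERING" item is to ask a candidate which WHERE clauses a table can serve efficiently. The sketch below is a deliberately simplified model of that rule, not Cassandra's actual query planner: it considers only equality restrictions and ignores secondary indexes, IN, range predicates, and token(). The function name and the example table columns are hypothetical.

```python
def needs_allow_filtering(partition_key, clustering, restricted):
    """Simplified check: would this equality-only WHERE clause need ALLOW FILTERING?

    partition_key: partition key columns, in order
    clustering:    clustering columns, in order
    restricted:    columns restricted with '=' in the WHERE clause
    """
    restricted = set(restricted)
    if not restricted:
        return False  # an unrestricted scan is legal CQL, though rarely a good idea
    if not set(partition_key) <= restricted:
        return True  # the full partition key must be supplied
    remaining = restricted - set(partition_key)
    # Clustering columns may only be restricted as a prefix, in declared order.
    for column in clustering:
        if column not in remaining:
            break
        remaining.discard(column)
    # Anything left is a regular column or a skipped clustering column.
    return bool(remaining)


pk = ("user_id", "day")   # hypothetical table layout
ck = ("event_id",)
print(needs_allow_filtering(pk, ck, {"user_id", "day"}))                # False
print(needs_allow_filtering(pk, ck, {"user_id"}))                       # True
print(needs_allow_filtering(pk, ck, {"user_id", "day", "event_type"}))  # True
```

A candidate at this level should be able to explain each result in terms of how partitions are located and how rows are ordered within them.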
Level 3: Cassandra Expert / Distributed Systems Engineer
- Designs distributed data architectures for multi-datacenter
- Optimizes cluster performance and capacity planning
- Manages compaction strategies and disk I/O
- Handles node failures, repairs, and cluster maintenance
- Understands tombstones, read repairs, and anti-entropy
- Tunes JVM and Cassandra configurations
This is Database Engineer or Distributed Systems Engineer territory.
Recruiter's Cheat Sheet: Spotting Great Candidates
Conversation Starters That Reveal Skill Level
| Question | Junior Answer | Senior Answer |
|---|---|---|
| "How do you design a Cassandra data model?" | "Create tables like SQL" | Explains partition keys, clustering columns, denormalization, query-first design, and avoiding large partitions |
| "What's the difference between consistency levels?" | "They're different options" | Explains ONE vs QUORUM vs ALL trade-offs, LOCAL_QUORUM for multi-DC, impact on availability and latency |
| "A query is slow. What do you do?" | "Add an index" | Checks partition size, uses tracing, reviews data model, considers if ALLOW FILTERING snuck in |
| "When wouldn't you use Cassandra?" | "Always use Cassandra" | Explains when relational DBs win (complex queries, ACID), when Redis wins (caching), when DynamoDB might be simpler |
| "How do you handle a failed node?" | "Restart it" | Explains Cassandra's automatic failover, repair process, when to replace vs. recover |
Resume Signals That Matter
✅ Look for:
- Specific scale improvements ("Handled 1M writes/second")
- Production scale experience ("Managed 200-node Cassandra cluster")
- Mentions specific features (multi-datacenter replication, compaction tuning, repair)
- Data modeling experience (not just basic CQL queries)
- Distributed systems background (understands CAP theorem in practice)
- Experience with related tools (Kafka, Spark, real-time pipelines)
🚫 Be skeptical of:
- Only lists "Cassandra" without context
- No mention of data modeling or distributed systems
- "Expert in Cassandra" with only tutorial projects
- Claims Cassandra expertise but describes relational patterns
- No production environment experience
GitHub/Portfolio Green Flags
- Data model designs with partition key reasoning
- Performance optimization case studies
- Multi-datacenter deployment documentation
- Scripts for cluster operations and monitoring
- Contributions to Cassandra or related tools
Common Hiring Mistakes
1. Testing SQL Knowledge Only
Cassandra requires fundamentally different thinking. Candidates who ace SQL interviews may struggle with Cassandra's query-first, denormalized approach. Test CQL, data modeling for wide-column stores, and distributed systems understanding.
Better approach: Give them a use case (user activity logging) and ask for a data model. Watch if they ask about query patterns first.
2. Ignoring Data Modeling Expertise
Many developers can write CQL but design terrible data models. Cassandra's performance depends almost entirely on partition key design. Hot partitions and unbounded partition growth cause production outages.
Netflix's approach: Their interviews focus on partition key design and understanding query patterns before writing any CQL.
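As an illustration of what good partition key reasoning looks like for the user activity logging exercise, here is a minimal sketch. It assumes the main query is "show one user's activity for one day, newest first" and bounds partition growth by bucketing on the day. The table, column names, and helper functions are hypothetical; the CQL is carried as a string only to show the shape of the schema.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical schema: names are illustrative assumptions, not from a real system.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS user_activity (
    user_id    uuid,
    day        date,
    event_id   timeuuid,
    event_type text,
    PRIMARY KEY ((user_id, day), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);
"""


def partition_key(user_id, event_time):
    """Composite partition key: one partition per user per UTC day."""
    return (user_id, event_time.astimezone(timezone.utc).date())


def rows_per_partition(events):
    """Count rows landing in each partition for (user_id, event_time) pairs."""
    return Counter(partition_key(user_id, ts) for user_id, ts in events)


# Two events per day over three days for a single user.
events = [
    ("user-1", datetime(2024, 1, day, hour, tzinfo=timezone.utc))
    for day in (1, 2, 3)
    for hour in (9, 17)
]
sizes = rows_per_partition(events)
print(len(sizes), max(sizes.values()))  # 3 partitions, 2 rows in the largest
```

With `user_id` alone as the partition key, the same events would pile into a single partition that grows without bound. A strong candidate will also point out the cost: reading a week of activity now takes seven partition reads.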
3. Overlooking Distributed Systems Knowledge
Cassandra is fundamentally a distributed system. Good candidates understand CAP theorem implications, consistency trade-offs, and why "eventual consistency" isn't actually a problem for most use cases.
4. Hiring a DBA When You Need Application Integration
If you need developers to integrate Cassandra into applications, don't require cluster management expertise. Conversely, if you need someone to run a 500-node cluster, don't test only CQL skills.
5. Conflating Cassandra with Relational Databases
Candidates who think "I know PostgreSQL so I know databases" often struggle. Cassandra requires different mental models—denormalization is mandatory, JOINs don't exist, and you model for queries, not entities.
Why Companies Choose Cassandra
High Write Throughput
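Cassandra's write path is optimized for speed. Each write is appended to the commit log and applied to an in-memory memtable before it is acknowledged; memtables are flushed to immutable on-disk SSTables in the background. Because writes are sequential appends rather than in-place updates, a well-sized cluster can sustain millions of writes per second.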
Netflix handles: 1+ million writes per second across their global clusters
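The write path described above can be sketched as a toy single-node model. This is an illustrative simplification, not Cassandra's implementation: real nodes resolve conflicts by write timestamp, compact SSTables, and use bloom filters and indexes on reads. The class name `ToyNode` and the `memtable_limit` parameter are hypothetical.

```python
class ToyNode:
    """Toy model of the write path: commit log + memtable, flushed to SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only log, kept for crash recovery
        self.memtable = {}     # in-memory view holding the latest value per key
        self.sstables = []     # immutable, sorted snapshots (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append
        self.memtable[key] = value            # in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the memtable as an immutable SSTable and reset in-memory state."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs log replay

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit and triggers a flush
node.write("a", 3)   # newer value lives in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))  # 3 2 1
```

Note that no write ever modifies an existing SSTable, which is why reads may need to consult several files and why compaction matters in practice.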
Linear Horizontal Scalability
Adding nodes increases capacity close to linearly: doubling the nodes roughly doubles throughput. There is no single master bottleneck and no manual sharding configuration, because Cassandra's consistent hashing distributes data across nodes automatically.
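To make the consistent hashing idea concrete, here is a toy token ring with virtual nodes. It is a sketch under stated assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner, replication is ignored, and the `Ring` class, node names, and `vnodes` count are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a 64-bit token (MD5 used here as a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring where each node owns several virtual-node tokens."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        """Walk clockwise to the first vnode token at or after the key's token."""
        idx = bisect.bisect_right(self.tokens, (token(key), ""))
        return self.tokens[idx % len(self.tokens)][1]


keys = [f"user-{i}" for i in range(2000)]
ring = Ring(["node-1", "node-2", "node-3", "node-4"])
before = {key: ring.owner(key) for key in keys}

ring.add_node("node-5")
moved = [key for key in keys if ring.owner(key) != before[key]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new node")
```

Only keys that fall into the new node's token ranges change owner; the rest stay where they were. That property is what lets a cluster grow without reshuffling all of its data.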
Multi-Datacenter Replication
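Built-in support for replicating data across datacenters with configurable consistency. With LOCAL_QUORUM, a write succeeds once a quorum of replicas in the coordinator's local datacenter acknowledges it, without waiting for cross-DC acknowledgment; replicas in remote datacenters still receive the write asynchronously.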
Discord's approach: Multi-region deployment ensures users always connect to nearby nodes
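The arithmetic behind these consistency levels is simple enough to sketch. The example below computes how many replica acknowledgments a coordinator waits for, assuming a keyspace replicated three ways in each of two datacenters. The datacenter names and the `required_acks` helper are hypothetical; the quorum formula (half the replicas, rounded down, plus one) is the standard definition.

```python
def quorum(replicas):
    """A quorum is a strict majority of the replicas."""
    return replicas // 2 + 1


def required_acks(level, replication, local_dc):
    """Replica acknowledgments needed for one request.

    replication: dict mapping datacenter name -> replication factor
    local_dc:    the coordinator's datacenter
    """
    total = sum(replication.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(replication[local_dc])
    if level == "QUORUM":
        return quorum(total)
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in replication.values())
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")


replication = {"us_east": 3, "eu_west": 3}  # hypothetical two-DC keyspace
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, replication, local_dc="us_east"))
```

With this layout LOCAL_QUORUM needs 2 acknowledgments, all from the local datacenter, while QUORUM needs 4 and therefore always involves at least one cross-datacenter round trip.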
No Single Point of Failure
Every node is equal in the peer-to-peer architecture. Any node can coordinate any request. Node failures don't cause downtime as long as enough replicas remain to satisfy the requested consistency level; the surviving replicas serve the data.
Tunable Consistency
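Choose consistency per-query. Time-series data might use ONE for speed. Financial data might use QUORUM or ALL for stronger guarantees, accepting higher latency and, with ALL, failed requests whenever any replica is unavailable. This flexibility lets architects optimize for specific use cases.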
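The trade-off can be reduced to one rule of thumb: a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor (R + W > RF). The sketch below applies that rule for a single datacenter; the function names are hypothetical, and it ignores hinted handoff, read repair, and multi-DC levels.

```python
# Replicas that must respond at each level, as a function of replication factor.
ACKS = {
    "ONE": lambda rf: 1,
    "TWO": lambda rf: 2,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def reads_see_latest_write(write_level, read_level, rf):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return ACKS[write_level](rf) + ACKS[read_level](rf) > rf


def tolerated_failures(level, rf):
    """How many replicas can be down while requests at this level still succeed."""
    return rf - ACKS[level](rf)


rf = 3
print(reads_see_latest_write("QUORUM", "QUORUM", rf))  # True:  2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", rf))        # False: 1 + 1 <= 3
print(reads_see_latest_write("ALL", "ONE", rf))        # True:  3 + 1 > 3
print(tolerated_failures("QUORUM", rf), tolerated_failures("ALL", rf))  # 1 0
```

This is why QUORUM for both reads and writes is a common default: it gives overlapping reads while still tolerating one replica failure at RF 3, whereas ALL tolerates none.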
Understanding why companies choose Cassandra helps evaluate candidates who grasp trade-offs versus those who just learned the syntax.