Tags: partitioning, scaling, routing, state-distribution
Consistent Hashing
Distribute keys across nodes while minimizing the number of keys remapped when nodes are added or removed.
Definition
Consistent hashing places nodes and keys on a hash ring so membership changes move only a bounded subset of keys.
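A minimal ring can be sketched in a few lines of Python. This is an illustrative sketch, not any particular system's implementation: the node names, the choice of MD5, and the virtual-node count of 100 are all assumptions made for the example.

```python
import bisect
import hashlib

def _point(value: str) -> int:
    # Map a string to a point on the ring; MD5 is used here only
    # because it is stable and well distributed (an assumption, not
    # a requirement of the technique).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        # The owner is the first node clockwise from the key's point.
        idx = bisect.bisect(self._ring, (_point(key), chr(0x10FFFF)))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

Lookups are a binary search over the sorted ring, so routing stays O(log V) in the total number of virtual nodes; removing a node only reassigns the keys that fell in its arcs.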
When To Use
- Large keyspaces in caches, KV stores, and partitioned services.
- Clusters with frequent scale-out/scale-in operations.
- Systems where repartitioning cost must stay bounded.
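The bounded-repartitioning point can be made concrete by simulation: when a fifth node joins, ring placement moves roughly 1/5 of the keys, while naive `hash(key) % n` placement moves most of them. The key and node names below are arbitrary, and the sketch re-derives a tiny ring inline so it stands alone.

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=200):
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def owner(ring, key):
    idx = bisect.bisect(ring, (h(key), chr(0x10FFFF))) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(10_000)]
nodes = [f"node-{i}" for i in range(4)]

ring4 = build_ring(nodes)
ring5 = build_ring(nodes + ["node-4"])
moved_ring = sum(owner(ring4, k) != owner(ring5, k) for k in keys) / len(keys)

# Modulo partitioning for contrast: owner = hash(key) % n.
moved_mod = sum(h(k) % 4 != h(k) % 5 for k in keys) / len(keys)

print(f"ring remapped:   {moved_ring:.0%}")   # roughly 1/5 of keys
print(f"modulo remapped: {moved_mod:.0%}")    # roughly 4/5 of keys
```

This is the core argument for the pattern: repartitioning cost scales with the size of the membership change, not with the size of the cluster.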
When Not To Use
- Very small clusters where simpler modulo partitioning is acceptable.
- Workloads dominated by extreme hot keys without separate skew handling.
- Cases requiring strict range-based data locality.
Tradeoffs
- Limits remapping on topology changes, but adds ring/virtual-node management.
- Works well for uniform keys, but still needs skew controls for hot tenants.
- Improves availability during scaling, but adds operational tuning complexity.
Common Failure Modes
- Too few virtual nodes produce uneven load across shards.
- Ring metadata inconsistency routes requests to wrong owners.
- Rebalancing saturates network and destabilizes p99 latency.
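The first failure mode is easy to demonstrate: with one virtual node per physical node, arc sizes on the ring vary wildly and the busiest node carries far more than its even share; raising the virtual-node count flattens the distribution. A rough simulation (node count, key count, and hash choice are assumptions for the example):

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def max_overload(vnodes: int, nodes: int = 8, keys: int = 20_000) -> float:
    """Ratio of the busiest node's key count to the ideal even share."""
    ring = sorted((h(f"node-{n}#{i}"), f"node-{n}")
                  for n in range(nodes) for i in range(vnodes))
    hi = chr(0x10FFFF)
    counts = Counter()
    for k in range(keys):
        idx = bisect.bisect(ring, (h(f"key-{k}"), hi)) % len(ring)
        counts[ring[idx][1]] += 1
    return max(counts.values()) / (keys / nodes)

print(f"1 vnode/node:    busiest node at {max_overload(1):.2f}x ideal")
print(f"200 vnodes/node: busiest node at {max_overload(200):.2f}x ideal")
```

More virtual nodes mean each physical node's share is the sum of many small arcs, so per-node load concentrates around the mean; the cost is a larger ring to store and propagate.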
Interview Framing
Use this structure when the interviewer asks for this pattern explicitly.
Explain virtual nodes, ownership metadata propagation, and rebalance throttling to avoid cascading instability.
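Rebalance throttling can be as simple as a token bucket over migration traffic. The class below is a hypothetical sketch for the interview discussion, not any real system's API; the rate and batch sizes are illustrative assumptions.

```python
import time

class MigrationThrottle:
    """Cap bytes migrated per second so rebalancing cannot saturate
    the network and destabilize tail latency. Illustrative token
    bucket; rate and naming are assumptions, not a real system's API."""

    def __init__(self, bytes_per_sec: int):
        self.rate = bytes_per_sec
        self.allowance = float(bytes_per_sec)
        self.last = time.monotonic()

    def admit(self, nbytes: int) -> float:
        # Returns how many seconds the caller should sleep before
        # sending a batch of `nbytes`; 0.0 means send immediately.
        now = time.monotonic()
        self.allowance = min(self.rate,
                             self.allowance + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.allowance:
            self.allowance -= nbytes
            return 0.0
        wait = (nbytes - self.allowance) / self.rate
        self.allowance = 0.0
        return wait
```

In an interview answer, the point to land is that the migration rate is an explicit, tunable budget: rebalancing proceeds at a pace the cluster can absorb rather than as fast as the network allows.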
Related Project Deep Dives
Distributed Cache System
Design a distributed cache system like Redis or Memcached that handles millions of requests per second with sub-millisecond latency, high availability, and intelligent eviction policies.
Distributed Key-Value Store
Design a highly reliable key-value store with partitioning, replication, quorum reads/writes, and predictable low-latency access.
Related Concepts
Sharding Strategies
Partition data/work across shards to scale throughput and storage while controlling skew.
Quorum Consistency
Use read/write quorum sizes to balance consistency, availability, and latency in replicated stores.
Backpressure
Control producer rate based on downstream capacity to avoid queue explosions and cascading failures.