
Scalability

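Scalability is the ability of a system to handle increased load. Vertical scaling (scale up) means moving to a bigger machine; horizontal scaling (scale out) means adding more machines. Key strategies: stateless services, database replication (read replicas), sharding (partitioning data), caching, and asynchronous processing. Prefer designs that can scale horizontally: a single machine has a hard hardware ceiling, while a horizontally scaled system has no fixed one, although coordination overhead grows as machines are added.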

Key Concepts

Deep Dive: Vertical vs Horizontal Scaling

| | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| How | Bigger machine (CPU, RAM) | More machines |
| Limit | Hardware ceiling | Virtually unlimited |
| Downtime | Usually required | Typically none (nodes are added behind a load balancer) |
| Cost | Expensive at scale | Commodity hardware |
| Complexity | Simple | More complex (distributed) |

Deep Dive: Database Replication

Primary-Replica (Master-Slave):

Writes → Primary → Replicates → Replica 1 (reads)
                               → Replica 2 (reads)
                               → Replica 3 (reads)

Benefits:
- Read traffic distributed across replicas
- High availability (promote a replica if the primary fails)
- Backups from a replica without impacting the primary

Challenges:
- Replication lag: replicas may be slightly behind the primary
- Write bottleneck: all writes go to one primary
- Consistency: a read issued right after a write may see stale data
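One common way to soften the read-after-write problem is to send a user's reads to the primary for a short time after that user writes. The sketch below illustrates the idea only; `ReplicatedRouter`, `pin_seconds`, and the string connection names are hypothetical, and the pin duration is an assumed upper bound on replication lag rather than a measured one.

```python
import itertools
import time


class ReplicatedRouter:
    """Routes writes to the primary and reads to replicas (illustrative names)."""

    def __init__(self, primary, replicas, pin_seconds=1.0, clock=time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over replicas
        self.pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self._clock = clock
        self._last_write = {}  # user_id -> time of that user's latest write

    def for_write(self, user_id):
        # All writes go to the single primary; remember when this user wrote.
        self._last_write[user_id] = self._clock()
        return self.primary

    def for_read(self, user_id):
        # Shortly after a write, read from the primary so the user sees
        # their own change even if replicas are still catching up.
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return self.primary
        return next(self._replicas)
```

The trade-off is that pinned reads add load to the primary, so the pin window should stay short. In a multi-server deployment the last-write timestamps would need to live somewhere shared (or travel with the client) rather than in one process's memory.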

Deep Dive: Database Sharding

Split data across multiple databases. Each shard holds a subset.

Users A-M → Shard 1
Users N-Z → Shard 2

Sharding strategies:

| Strategy | How | Pros | Cons |
|----------|-----|------|------|
| Range-based | By key range (A-M, N-Z) | Simple | Uneven distribution |
| Hash-based | hash(key) % N | Even distribution | Resharding is hard |
| Directory-based | Lookup table maps key → shard | Flexible | Lookup table = bottleneck |
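To make the "resharding is hard" cost of the hash-based strategy concrete, here is a minimal sketch of `hash(key) % N` placement plus a helper that measures how many keys change shard when N changes. The function names and the `user-<i>` keys are made up for illustration; SHA-256 is used only because it is stable across processes, not because the source prescribes it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based placement: hash(key) % N, using a stable hash."""
    # Python's built-in hash() is randomized per process for strings,
    # so use a digest that gives the same answer on every server.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)  # growing from 4 shards to 5
```

For a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens for 4 of every 20 hash values. So about 80% of keys are expected to move when going from 4 to 5 shards.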

Challenges:
- Cross-shard queries (JOINs across shards)
- Resharding when adding/removing shards
- Hot spots (popular shards get more traffic)

Consistent hashing minimizes data movement when adding or removing shards: on average only about 1/N of the keys move, whereas with hash(key) % N most keys change shard.
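A consistent hash ring can be sketched in a few lines. This is a simplified illustration, not a production design: the `HashRing` class, the `vnodes` parameter (virtual nodes per shard), and the `shard#i` labelling scheme are all assumptions introduced here.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a stable position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes  # virtual nodes per shard smooth out the distribution
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> shard name
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{shard}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = shard

    def remove(self, shard: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def shard_for(self, key: str) -> str:
        # First ring position clockwise from the key, wrapping at the end.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a shard is added, the only keys that change owner are those that now fall just before one of the new shard's ring positions, and they all move to the new shard. Every other key stays where it was.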

Deep Dive: Stateless vs Stateful Services

Stateless — no session data stored on the server. Any server can handle any request.

Request + Auth Token → Any Server → Response

Stateful — session tied to a specific server.

Request → Must go to Server 2 (has the session)

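Stateless services are essential for horizontal scaling: a load balancer can send each request to any instance, and instances can be added or removed without losing sessions.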

Where to store state:
- Database (persistent)
- Redis/Memcached (fast, shared)
- JWT tokens (client-side)
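The client-side option works because any server that holds the signing key can verify the token without looking up a session. The sketch below uses a simplified HMAC-signed token to show the principle; it is not the JWT format, and `SECRET`, `issue_token`, and `verify_token` are hypothetical names. A real deployment would more likely use an established JWT library and include an expiry claim.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: a key shared by every server instance


def issue_token(claims: dict) -> str:
    """Serialize the session claims and append an HMAC-SHA256 signature."""
    raw = json.dumps(claims, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + sig


def verify_token(token: str):
    """Return the claims if the signature matches, otherwise None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because verification needs only the shared key, `verify_token` gives the same answer on every instance, which is what lets a request land on any server.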

Deep Dive: Rate Limiting

Protect services from being overwhelmed.

Algorithms:

| Algorithm | Description |
|-----------|-------------|
| Token Bucket | Tokens added at fixed rate, request consumes a token |
| Sliding Window Log | Track timestamps of requests in a window |
| Sliding Window Counter | Approximate count in sliding window |
| Fixed Window Counter | Count requests in fixed time windows |
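As an illustration of the Token Bucket row, here is a minimal single-process sketch. The `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are names chosen for this example; the bucket refills lazily on each call instead of using a background timer, which is one common way to implement it.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = capacity   # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=2.0`, a client can burst two requests immediately and then sustain one request per second. The injectable `clock` is there only to make the behavior easy to test.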

Implementation:

Key: rate_limit:{user_id}:{window}
Value: request count
TTL: window size

if count < limit → allow, increment
else → reject (HTTP 429)
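The pseudocode above corresponds to the fixed window counter. A minimal in-memory sketch follows, with a plain dict standing in for the shared store; `FixedWindowLimiter` and its parameters are hypothetical names, and the manual cleanup step only imitates what a TTL would do.

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counts = {}  # stand-in for the shared store: key -> request count

    def allow(self, user_id: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        key = f"rate_limit:{user_id}:{window}"
        # Drop counters from earlier windows; a store with TTLs does this for us.
        suffix = f":{window}"
        self._counts = {k: v for k, v in self._counts.items() if k.endswith(suffix)}
        count = self._counts.get(key, 0)
        if count < self.limit:
            self._counts[key] = count + 1
            return True
        return False  # the caller would respond with HTTP 429
```

Across several servers the check and the increment have to be a single atomic operation in the shared store (for example, an atomic increment followed by comparing the returned value against the limit). Otherwise two concurrent requests can both read the same count and both be allowed.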

Common Interview Questions
  • What is the difference between vertical and horizontal scaling?
  • What is database sharding? What are the strategies?
  • What is consistent hashing?
  • What is database replication? What is replication lag?
  • Why are stateless services important for scaling?
  • What is rate limiting? Name some algorithms.
  • How would you scale a system from 1K to 1M users?