Case Study: Rate Limiter

A rate limiter controls the number of requests a user or client can make within a time window. It protects against abuse and DDoS attacks and ensures fair usage. Common algorithms: Token Bucket (most popular; smooths traffic while allowing bursts), Sliding Window (precise), Fixed Window (simple but bursty at window edges). Rate limiting is typically implemented at the API Gateway layer, using Redis for distributed counting.

System Design

Step 1: Requirements

Functional:
  • Limit requests per user/IP/API key
  • Configurable rules, e.g., 100 req/min or 1000 req/hour (see the rule sketch after this list)
  • Return 429 (Too Many Requests) when the limit is exceeded
  • Include rate limit headers in the response

Non-Functional:
  • Low latency (the limiter must not slow down requests)
  • Distributed (works across multiple servers)
  • Fault-tolerant (if the limiter fails, allow traffic through)
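
The "configurable rules" requirement suggests a small rule object. A possible shape, as a sketch (the record and its field names are assumptions for illustration, not a standard):

import java.time.Duration;

public record RateLimitRule(String keyType, int limit, Duration window) {

    // Example rules matching the requirements above
    public static final RateLimitRule PER_MINUTE =
            new RateLimitRule("user", 100, Duration.ofMinutes(1));   // 100 req/min
    public static final RateLimitRule PER_HOUR =
            new RateLimitRule("user", 1000, Duration.ofHours(1));    // 1000 req/hour
}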

Step 2: Algorithms

Token Bucket:

Bucket: max capacity = 10 tokens
Refill rate = 1 token/second

Request arrives:
  tokens > 0? → Allow, decrement
  tokens = 0? → Reject (429)
✅ Smooths traffic, allows bursts up to the bucket size ✅ Simple and configurable
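
A minimal single-node token bucket in Java as an illustration (in-memory only; class and field names are assumptions, and the distributed Redis version appears in Step 3):

public class TokenBucket {
    private final double capacity;     // max tokens the bucket can hold
    private final double refillRate;   // tokens added per second
    private double tokens;             // current token count
    private long lastRefillNanos;      // timestamp of the last refill

    public TokenBucket(double capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;        // start full
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;               // consume one token for this request
            return true;               // allow
        }
        return false;                  // reject (429)
    }

    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillRate);
        lastRefillNanos = now;
    }
}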

Sliding Window Log:

Store timestamp of each request in sorted set
On request: remove entries older than window
Count remaining: < limit → allow, else reject
✅ Precise. ❌ Memory-heavy (stores all timestamps).
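
A single-node sketch of the log approach using an in-memory deque per key (a Redis sorted set plays the same role in a distributed setup; names are illustrative):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SlidingWindowLogLimiter {
    private final int limit;                  // max requests per window
    private final long windowMillis;          // window length in milliseconds
    private final Map<String, Deque<Long>> logs = new ConcurrentHashMap<>();

    public SlidingWindowLogLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public boolean isAllowed(String userId) {
        long now = System.currentTimeMillis();
        Deque<Long> log = logs.computeIfAbsent(userId, k -> new ArrayDeque<>());
        synchronized (log) {
            // Drop timestamps that have fallen out of the window
            while (!log.isEmpty() && log.peekFirst() <= now - windowMillis) {
                log.pollFirst();
            }
            if (log.size() < limit) {
                log.addLast(now);             // record this request
                return true;
            }
            return false;                     // limit reached within the window
        }
    }
}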

Fixed Window Counter:

Window: 1 minute (e.g., 12:00-12:01)
Counter per window: increment on each request
Counter >= limit → reject
✅ Simple. ❌ Bursty at window edges (with a 200 req/min limit, 200 req at 12:00:59 plus 200 req at 12:01:00 means 400 requests in about two seconds).

Sliding Window Counter (hybrid):

estimated_count = previous_window_count × overlap_fraction + current_window_count
(overlap_fraction = the share of the sliding window that falls inside the previous fixed window)
Approximates the true sliding window with low memory: only two counters per key
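
Worked example: with a 100 req/min limit, 80 requests in the previous minute, 30 so far in the current minute, and the request arriving 25% of the way into the current minute (overlap_fraction = 0.75), the estimate is 80 × 0.75 + 30 = 90 < 100, so the request is allowed.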

Step 3: Distributed Implementation (Redis)

Fixed window counter using Redis INCR with a per-window TTL (redis is an injected Redis client; LIMIT is the configured max requests per window):

public boolean isAllowed(String userId) {
    String key = "rate_limit:" + userId;
    long count = redis.incr(key);   // atomic increment; returns the new count
    if (count == 1) {
        // First request in this window: start the 60-second window.
        // If the process dies between INCR and EXPIRE the key never expires;
        // running both in a Lua script (or using SET with NX and EX) avoids that race.
        redis.expire(key, 60);
    }
    return count <= LIMIT;
}

Token bucket in Redis (Lua script for atomicity):

-- KEYS[1] = token count key, KEYS[2] = last-refill timestamp key
-- ARGV[1] = bucket capacity, ARGV[2] = current time (seconds), ARGV[3] = refill rate (tokens/second)
local capacity = tonumber(ARGV[1])
local now = tonumber(ARGV[2])
local rate = tonumber(ARGV[3])
local tokens = tonumber(redis.call("get", KEYS[1]) or capacity)
local last_refill = tonumber(redis.call("get", KEYS[2]) or now)
tokens = math.min(capacity, tokens + (now - last_refill) * rate)  -- refill for elapsed time
if tokens >= 1 then
    redis.call("set", KEYS[1], tokens - 1)
    redis.call("set", KEYS[2], now)
    return 1  -- allowed
end
return 0  -- rejected
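
A sketch of invoking that script from Java, assuming the Jedis client (any Redis client with EVAL support works the same way); key names and the method shape are illustrative:

import java.util.List;
import redis.clients.jedis.Jedis;

public class RedisTokenBucket {

    // luaScript is the token-bucket script shown above
    public static boolean isAllowed(Jedis jedis, String luaScript, String userId,
                                    int capacity, double refillRatePerSecond) {
        String tokensKey = "rate_limit:tokens:" + userId;     // KEYS[1]
        String refillKey = "rate_limit:refill:" + userId;     // KEYS[2]
        long nowSeconds = System.currentTimeMillis() / 1000;  // ARGV[2]
        Object result = jedis.eval(luaScript,
                List.of(tokensKey, refillKey),
                List.of(String.valueOf(capacity),
                        String.valueOf(nowSeconds),
                        String.valueOf(refillRatePerSecond)));
        return Long.valueOf(1L).equals(result);               // 1 = allowed, 0 = rejected
    }
}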

Step 4: Response Headers
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640995260
Retry-After: 30
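
A minimal sketch of emitting these headers from application code, assuming the Jakarta Servlet API (the helper class and its parameters are illustrative):

import jakarta.servlet.http.HttpServletResponse;

public class RateLimitHeaders {

    // Writes the rate-limit headers; sends 429 plus Retry-After when the request is rejected.
    public static void write(HttpServletResponse response, int limit, int remaining,
                             long resetEpochSeconds, boolean allowed) {
        response.setHeader("X-RateLimit-Limit", String.valueOf(limit));
        response.setHeader("X-RateLimit-Remaining", String.valueOf(remaining));
        response.setHeader("X-RateLimit-Reset", String.valueOf(resetEpochSeconds));
        if (!allowed) {
            long retryAfter = Math.max(0, resetEpochSeconds - System.currentTimeMillis() / 1000);
            response.setHeader("Retry-After", String.valueOf(retryAfter));
            response.setStatus(429);   // Too Many Requests
        }
    }
}
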
Step 5: Where to Place
Client → API Gateway (rate limiter) → Services
  • API Gateway: Most common, centralized
  • Middleware: Per-service rate limiting
  • Client-side: Unreliable (can be bypassed)
Common Follow-up Questions
  • How do you handle rate limiting in a distributed system?
  • What happens when the rate limiter (Redis) goes down?
  • How do you implement different limits for different API tiers?
  • How would you rate-limit by IP vs by user?
  • What is the difference between rate limiting and throttling?