Case Study: Chat System

Design a real-time chat system (like WhatsApp/Slack). Core: WebSockets for real-time messaging, message queue for delivery guarantees, database for persistence. Key features: 1:1 and group chat, online presence, read receipts, push notifications. Use WebSocket servers behind a load balancer, Kafka for message routing, Cassandra for message storage (write-heavy, time-series).

System Design

Step 1: Requirements

Functional:
- 1:1 messaging and group chat
- Online/offline status
- Message history
- Read receipts
- Push notifications for offline users

Non-Functional:
- Low latency (< 100ms delivery)
- At-least-once delivery
- Messages must be persistent and ordered
- Handle millions of concurrent connections

Step 2: High-Level Architecture

┌──────────┐     ┌──────────────┐     ┌──────────────┐
│  Client  │←─ws→│  Chat Server │←───→│  Message     │
│  (App)   │     │  (WebSocket) │     │  Queue       │
└──────────┘     └──────┬───────┘     │  (Kafka)     │
                        │             └──────┬───────┘
                 ┌──────┴───────┐            │
                 │  Presence    │     ┌──────┴───────┐
                 │  Service     │     │  Message DB  │
                 │  (Redis)     │     │ (Cassandra)  │
                 └──────────────┘     └──────────────┘

Step 3: Real-Time Communication

WebSocket for persistent, bidirectional connection:

1. Client connects via WebSocket
2. Server maintains mapping: userId → WebSocket connection
3. On message send:
   sender → Chat Server → route to recipient's Chat Server → recipient
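
The mapping and routing steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a real WebSocket server: `ChatServer`, `Router`, and the list used as an "outbox" are hypothetical stand-ins for live connections and for whatever directory tracks which server holds each user.

```python
from dataclasses import dataclass, field


@dataclass
class ChatServer:
    """One WebSocket server; `connections` stands in for live sockets."""
    name: str
    connections: dict = field(default_factory=dict)  # userId -> outbox list

    def connect(self, user_id):
        self.connections[user_id] = []

    def deliver(self, user_id, message):
        self.connections[user_id].append(message)


class Router:
    """Hypothetical directory: which server holds each user's connection."""

    def __init__(self):
        self.user_to_server = {}

    def register(self, user_id, server):
        server.connect(user_id)
        self.user_to_server[user_id] = server

    def unregister(self, user_id):
        server = self.user_to_server.pop(user_id, None)
        if server is not None:
            server.connections.pop(user_id, None)

    def send(self, sender_id, recipient_id, text):
        """Route to the recipient's server; return False if they are offline."""
        server = self.user_to_server.get(recipient_id)
        if server is None:
            return False
        server.deliver(recipient_id, {"from": sender_id, "text": text})
        return True
```

A `False` return is the point where the offline path (push notification) would take over.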

Why not HTTP polling?

| Method | Latency | Resource Usage |
|--------|---------|----------------|
| HTTP Polling | High (interval) | Wasteful |
| Long Polling | Medium | Moderate |
| WebSocket | Low (real-time) | Efficient |

Step 4: Message Flow

1. Sender sends message via WebSocket
2. Chat Server publishes the message to Kafka (durable buffer and routing)
3. Chat Server writes the message to Cassandra (long-term persistence)
4. If recipient online → deliver via WebSocket
5. If recipient offline → push notification
6. Recipient's client acknowledges delivery → delivery receipt; when the recipient opens the message → read receipt
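
At-least-once delivery means a message may arrive more than once (for example, when an acknowledgement is lost and the server retries), so the receiving side has to deduplicate by message ID. The sketch below illustrates that idea under simplifying assumptions: `DeliveryQueue`, `Recipient`, and the `drop_ack` flag are hypothetical names used only to simulate a lost acknowledgement in memory.

```python
class Recipient:
    """Client side: at-least-once delivery means duplicates must be ignored."""

    def __init__(self):
        self.seen_ids = set()
        self.inbox = []

    def receive(self, message):
        if message["id"] not in self.seen_ids:
            self.seen_ids.add(message["id"])
            self.inbox.append(message["text"])
        return message["id"]  # the ack is returned even for duplicates


class DeliveryQueue:
    """Server side: keep a message pending until it is acknowledged."""

    def __init__(self):
        self.pending = {}  # message id -> message awaiting an ack

    def send(self, message, recipient, drop_ack=False):
        self.pending[message["id"]] = message
        ack = recipient.receive(message)
        if not drop_ack:  # drop_ack simulates an ack lost in transit
            self.pending.pop(ack, None)

    def retry_pending(self, recipient):
        for message in list(self.pending.values()):
            self.send(message, recipient)
```

After a dropped acknowledgement and one retry, the recipient has received the message twice on the wire but shows it once.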

Message storage (Cassandra):

CREATE TABLE messages (
    chat_id UUID,
    message_id TIMEUUID,
    sender_id UUID,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
Partitioned by chat_id and clustered by message_id (a time-based UUID) in descending order, so "get last N messages" is a single-partition read of the first N rows.
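
To see why this layout makes the "last N" read cheap, here is a small in-memory model of it. It is only an analogy, not a Cassandra client: `MessageStore` is a hypothetical class, and integer message IDs stand in for TIMEUUID values.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """In-memory stand-in for the table: one sorted list per chat_id partition."""

    def __init__(self):
        self.partitions = defaultdict(list)  # chat_id -> [(-message_id, content)]

    def insert(self, chat_id, message_id, content):
        # Negating the id keeps each partition sorted newest-first,
        # mimicking CLUSTERING ORDER BY (message_id DESC).
        bisect.insort(self.partitions[chat_id], (-message_id, content))

    def last_n(self, chat_id, n):
        # Single-partition read: take the first n rows, no sorting at query time.
        rows = self.partitions.get(chat_id, [])
        return [content for _, content in rows[:n]]
```

The sort cost is paid at write time, so reading the newest messages never has to scan or reorder the whole chat.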

Step 5: Presence & Scaling

Online presence (Redis):

SETEX presence:user123 30 "online"   # 30 second TTL
# Client sends heartbeat every 10 seconds to refresh
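
The TTL-plus-heartbeat pattern can be modelled without Redis to make the expiry behaviour concrete. This is a sketch under stated assumptions: `PresenceStore` is a hypothetical in-memory class, and the injectable `clock` parameter exists only so the expiry logic can be exercised without waiting.

```python
import time


class PresenceStore:
    """In-memory stand-in for Redis keys with a TTL (not a Redis client)."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at = {}  # user_id -> expiry timestamp

    def heartbeat(self, user_id):
        # Similar in spirit to: SETEX presence:<user> 30 "online"
        self.expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id):
        expiry = self.expires_at.get(user_id)
        return expiry is not None and self.clock() < expiry
```

With these numbers, a user who disconnects without notice is shown as offline at most 30 seconds after their last heartbeat.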

Scaling WebSocket servers:
- Each server handles ~100K connections
- Use consistent hashing to route users to servers
- Message routing between servers via Kafka
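
Consistent hashing is what keeps most users on the same server when the pool changes. Below is a minimal sketch of a hash ring with virtual nodes; `HashRing`, the server names, and the choice of 100 virtual nodes per server are illustrative assumptions, not values from a specific deployment.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def lookup(self, user_id):
        # First virtual node clockwise from the user's hash, wrapping around.
        i = bisect.bisect_left(self.ring, (_hash(user_id), ""))
        return self.ring[i % len(self.ring)][1]
```

Removing a server only reassigns the users that were on it; everyone else keeps their mapping. As a rough sizing check, at ~100K connections per server, one million concurrent connections needs on the order of 10 servers before headroom.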

Common Follow-up Questions
  • How do you handle message ordering?
  • How do you implement group chat?
  • How do you handle offline users?
  • How do you implement end-to-end encryption?
  • How do you handle media (images, videos)?