Case Study: Notification Service
A notification service sends alerts to users across multiple channels: push notifications, email, SMS, and in-app messages. The architecture is event-driven, with a message queue (Kafka) providing reliability and decoupling between event producers and delivery. Key components are the notification router (which determines channel and priority), a template engine, one delivery service per channel, a user-preferences store, and retry with exponential backoff for failed deliveries.
System Design
Step 1: Requirements
Functional:
- Send notifications via push, email, SMS, in-app
- User preferences (opt-in/out per channel)
- Template-based messages
- Scheduling (send later)
- Delivery tracking (sent, delivered, read)

Non-Functional:
- At-least-once delivery
- Handle millions of notifications/day
- Low latency for real-time alerts
- Fault-tolerant (retry failed deliveries)
Step 2: High-Level Architecture
Event Producers      Notification Service          Delivery
┌──────────┐         ┌──────────────────┐          ┌────────────┐
│ Order    │─events─→│ Message Queue    │─────────→│ Push       │
│ Service  │         │ (Kafka)          │          │ (FCM/APNs) │
├──────────┤         ├──────────────────┤          ├────────────┤
│ Auth     │─events─→│ Notification     │─────────→│ Email      │
│ Service  │         │ Router           │          │ (SES)      │
├──────────┤         ├──────────────────┤          ├────────────┤
│ Payment  │─events─→│ Template Engine  │─────────→│ SMS        │
│ Service  │         │ + Preferences    │          │ (Twilio)   │
└──────────┘         └──────────────────┘          └────────────┘
Step 3: Notification Flow
1. Event produced (e.g., "order_placed")
2. Notification Service receives event from Kafka
3. Look up user preferences (channel, frequency, opt-in/out)
4. Select template + populate with data
5. Route to appropriate channel queue(s)
6. Delivery service sends via provider (FCM, SES, Twilio)
7. Track delivery status
8. On failure → retry with exponential backoff
   After max retries → move to dead-letter queue
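Steps 3 to 5 of the flow can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory `PREFERENCES` and `TEMPLATES` dictionaries, the `Event` and `Router` classes, and the per-channel lists standing in for queues are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from string import Template

# Hypothetical in-memory stand-ins for the preference and template stores.
PREFERENCES = {
    "user-1": {"PUSH": True, "EMAIL": True, "SMS": False},
}
TEMPLATES = {
    "order_placed": Template("Hi $name, your order $order_id has been placed."),
}


@dataclass
class Event:
    type: str
    user_id: str
    data: dict


@dataclass
class Router:
    # One list per channel stands in for the per-channel queues.
    channel_queues: dict = field(default_factory=dict)

    def handle(self, event: Event) -> list:
        """Look up preferences, render the template, enqueue per channel."""
        prefs = PREFERENCES.get(event.user_id, {})
        enabled = [channel for channel, on in prefs.items() if on]
        message = TEMPLATES[event.type].substitute(event.data)
        for channel in enabled:
            self.channel_queues.setdefault(channel, []).append(
                {"user_id": event.user_id, "content": message}
            )
        return enabled


router = Router()
routed = router.handle(
    Event("order_placed", "user-1", {"name": "Ada", "order_id": "A-42"})
)
print(routed)  # ['PUSH', 'EMAIL']
```

The opted-out channel (SMS) never receives a queue entry, so the delivery services only see work the user has agreed to.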
Step 4: Database Design
CREATE TABLE notifications (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    type VARCHAR(50),        -- ORDER_PLACED, PASSWORD_RESET
    channel VARCHAR(20),     -- PUSH, EMAIL, SMS, IN_APP
    content TEXT,
    status VARCHAR(20),      -- PENDING, SENT, DELIVERED, FAILED
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP,
    sent_at TIMESTAMP,
    delivered_at TIMESTAMP
);

CREATE TABLE user_preferences (
    user_id UUID,
    channel VARCHAR(20),
    enabled BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (user_id, channel)
);
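To see how the two tables are used together, the sketch below loads the schema into an in-memory SQLite database and runs the two most common operations: finding a user's enabled channels and marking a notification as sent. SQLite is an assumption made only so the example runs anywhere (it accepts type names such as `UUID` without enforcing them), and the helper names `enabled_channels` and `mark_sent` are hypothetical.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notifications (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    type VARCHAR(50),
    channel VARCHAR(20),
    content TEXT,
    status VARCHAR(20),
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP,
    sent_at TIMESTAMP,
    delivered_at TIMESTAMP
);
CREATE TABLE user_preferences (
    user_id UUID,
    channel VARCHAR(20),
    enabled BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (user_id, channel)
);
""")


def now_iso():
    return datetime.now(timezone.utc).isoformat()


def enabled_channels(user_id):
    """Return the channels this user has not opted out of."""
    rows = conn.execute(
        "SELECT channel FROM user_preferences "
        "WHERE user_id = ? AND enabled = 1 ORDER BY channel",
        (user_id,),
    )
    return [row[0] for row in rows]


def mark_sent(notification_id):
    """Record the PENDING -> SENT transition with a timestamp."""
    conn.execute(
        "UPDATE notifications SET status = 'SENT', sent_at = ? WHERE id = ?",
        (now_iso(), notification_id),
    )


user_id = str(uuid.uuid4())
conn.executemany(
    "INSERT INTO user_preferences (user_id, channel, enabled) VALUES (?, ?, ?)",
    [(user_id, "PUSH", 1), (user_id, "EMAIL", 1), (user_id, "SMS", 0)],
)

notification_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO notifications "
    "(id, user_id, type, channel, content, status, created_at) "
    "VALUES (?, ?, 'ORDER_PLACED', 'EMAIL', 'Your order was placed.', 'PENDING', ?)",
    (notification_id, user_id, now_iso()),
)
mark_sent(notification_id)

print(enabled_channels(user_id))  # ['EMAIL', 'PUSH']
```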
Step 5: Reliability & Scaling
Retry strategy:
Attempt 1: immediate
Attempt 2: after 1 min
Attempt 3: after 5 min
Attempt 4: after 30 min
After max retries → Dead Letter Queue → alert ops team
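The schedule above maps directly onto a small retry loop. The following is a sketch under assumptions: `deliver_with_retries`, the `send` callable, and the list used as a dead-letter queue are hypothetical names. A real consumer would more likely re-enqueue the message with a delay than block a worker in `sleep`; the injected `sleep` parameter keeps the example testable without waiting.

```python
import time

# Delay before each attempt, in seconds: immediate, 1 min, 5 min, 30 min.
RETRY_DELAYS_SECONDS = [0, 60, 300, 1800]


def deliver_with_retries(send, notification, sleep=time.sleep, dead_letter=None):
    """Try to send; return the attempt number that succeeded, or None."""
    last_error = None
    for attempt, delay in enumerate(RETRY_DELAYS_SECONDS, start=1):
        if delay:
            sleep(delay)
        try:
            send(notification)
            return attempt
        except Exception as exc:  # a real system would catch provider errors only
            last_error = exc
    # All attempts failed: hand the message to the dead-letter queue.
    if dead_letter is not None:
        dead_letter.append((notification, repr(last_error)))
    return None


# Usage: a provider that fails twice, then succeeds.
attempts = []


def flaky_send(notification):
    attempts.append(notification)
    if len(attempts) < 3:
        raise ConnectionError("provider unavailable")


waits = []
dlq = []
result = deliver_with_retries(
    flaky_send, {"id": "n-1"}, sleep=waits.append, dead_letter=dlq
)
print(result, waits)  # 3 [60, 300]
```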
Scaling:
- Partition each channel's Kafka topic for parallelism
- Separate consumer groups per channel
- Rate limit per channel (email providers have send limits)
- Batch notifications where possible (e.g., a daily digest)
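Per-channel rate limiting is often implemented with a token bucket: each send consumes a token, and tokens refill at the provider's allowed rate. The sketch below is one possible version. The `TokenBucket` class is a hypothetical helper, and the rate and capacity values are illustrative rather than actual provider limits.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Consume one token if available; return whether the send may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
email_bucket = TokenBucket(rate_per_sec=2, capacity=2, clock=lambda: fake_now[0])

burst = [email_bucket.try_acquire() for _ in range(3)]
print(burst)  # [True, True, False]

fake_now[0] = 0.5  # half a second later, one token has been refilled
after_wait = email_bucket.try_acquire()
print(after_wait)  # True
```

A delivery worker that gets `False` can leave the message on its channel queue and try again later, which keeps one slow provider from affecting the other channels.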
Deduplication:
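With at-least-once delivery, the same event can reach a consumer more than once, so the service needs a way to avoid sending a user the same notification twice. One common approach is an idempotency key checked before delivery. The sketch below assumes each event carries a unique `event_id` (not stated in the design above) and keeps keys in memory with a time-to-live; a shared store would be needed once several consumers run in parallel. The `Deduplicator` class and its method names are hypothetical.

```python
import hashlib
import time


class Deduplicator:
    """Remember (event, user, channel) keys for `ttl_seconds`."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # key -> expiry time

    @staticmethod
    def key(event_id, user_id, channel):
        raw = f"{event_id}:{user_id}:{channel}".encode()
        return hashlib.sha256(raw).hexdigest()

    def first_time(self, event_id, user_id, channel):
        """Return True only the first time a key is seen within the TTL."""
        now = self.clock()
        # Drop expired keys; a full scan is fine for a sketch, not at scale.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        k = self.key(event_id, user_id, channel)
        if k in self._seen:
            return False
        self._seen[k] = now + self.ttl
        return True


# Usage with a fake clock.
fake_now = [0.0]
dedup = Deduplicator(ttl_seconds=60, clock=lambda: fake_now[0])

first = dedup.first_time("evt-1", "user-1", "EMAIL")
repeat = dedup.first_time("evt-1", "user-1", "EMAIL")
other_channel = dedup.first_time("evt-1", "user-1", "PUSH")
print(first, repeat, other_channel)  # True False True
```

Including the channel in the key means a duplicate event is suppressed per channel, so a retry on email does not block the first push for the same event.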
Common Follow-up Questions
- How do you handle notification priorities (critical vs marketing)?
- How do you implement a daily digest?
- How do you handle delivery failures?
- How do you prevent notification spam?
- How do you track delivery and open rates?