
Case Study: Notification Service

A notification service sends alerts to users across multiple channels: push notifications, email, SMS, and in-app. The architecture is event-driven, with a message queue (Kafka) for reliability and decoupling. Key components: a notification router (determines channel and priority), a template engine, one delivery service per channel, user preferences, and retry with exponential backoff for failed deliveries.

System Design

Step 1: Requirements

Functional:
- Send notifications via push, email, SMS, and in-app
- User preferences (opt in/out per channel)
- Template-based messages
- Scheduling (send later)
- Delivery tracking (sent, delivered, read)

Non-Functional:
- At-least-once delivery
- Handle millions of notifications per day
- Low latency for real-time alerts
- Fault tolerance (retry failed deliveries)
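A quick calculation turns "millions of notifications per day" into a per-second rate. The sketch below assumes 10 million notifications/day and a 10x peak-to-average ratio; both figures are illustrative assumptions, not numbers from this case study.

```python
# Back-of-envelope throughput estimate.
# ASSUMPTIONS (illustrative only): 10M notifications/day, peak = 10x average.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(per_day: int) -> float:
    """Average notifications per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY


def peak_rate_per_second(per_day: int, peak_factor: float) -> float:
    """Peak rate, assuming bursts reach peak_factor times the average."""
    return average_rate_per_second(per_day) * peak_factor


if __name__ == "__main__":
    daily = 10_000_000  # hypothetical daily volume
    print(f"average ~ {average_rate_per_second(daily):.0f}/s")
    print(f"peak    ~ {peak_rate_per_second(daily, 10):.0f}/s")
```

Under these assumptions that is roughly 116/s on average and about 1,160/s at peak. At this scale the queue itself is unlikely to be the bottleneck; per-channel provider send limits are likely to matter more.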

Step 2: High-Level Architecture
Event Producers      Notification Service            Delivery
┌──────────┐         ┌──────────────────┐          ┌──────────┐
│ Order    │─events─→│ Message Queue    │─────────→│ Push     │
│ Service  │         │ (Kafka)          │          │(FCM/APNs)│
├──────────┤         ├──────────────────┤          ├──────────┤
│ Auth     │─events─→│ Notification     │─────────→│ Email    │
│ Service  │         │ Router           │          │ (SES)    │
├──────────┤         ├──────────────────┤          ├──────────┤
│ Payment  │─events─→│ Template Engine  │─────────→│ SMS      │
│ Service  │         │ + Preferences    │          │ (Twilio) │
└──────────┘         └──────────────────┘          └──────────┘
Step 3: Notification Flow
1. Event produced (e.g., "order_placed")
2. Notification Service receives event from Kafka
3. Look up user preferences (channel, frequency, opt-in/out)
4. Select template + populate with data
5. Route to appropriate channel queue(s)
6. Delivery service sends via provider (FCM, SES, Twilio)
7. Track delivery status
8. On failure → retry with exponential backoff
   Max retries → move to dead-letter queue
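Steps 3 to 5 of the flow can be sketched in-process to show how preferences, templates, and routing fit together. This is a minimal illustration, assuming an in-memory dict in place of the preferences table and one Python list per channel in place of the channel queues; `TEMPLATES`, `Router`, `handle_event`, and the event fields are hypothetical names. Priority handling, delivery (step 6), and tracking are deliberately left out.

```python
from dataclasses import dataclass, field
from string import Template

# Hypothetical template store: (event type, channel) -> message template.
TEMPLATES = {
    ("order_placed", "EMAIL"): Template("Hi $name, your order $order_id was placed."),
    ("order_placed", "PUSH"): Template("Order $order_id placed"),
}


@dataclass
class Event:
    event_id: str
    event_type: str
    user_id: str
    data: dict


@dataclass
class Router:
    # user_id -> {channel: enabled}; stands in for the user_preferences table.
    preferences: dict
    # channel -> outgoing messages; stands in for the per-channel queues.
    queues: dict = field(default_factory=dict)

    def handle_event(self, event: Event) -> list:
        """Look up preferences, render templates, and enqueue per channel."""
        routed = []
        prefs = self.preferences.get(event.user_id, {})
        for channel, enabled in prefs.items():
            template = TEMPLATES.get((event.event_type, channel))
            if not enabled or template is None:
                continue  # opted out, or no template for this channel
            message = {
                "event_id": event.event_id,
                "user_id": event.user_id,
                "channel": channel,
                "content": template.substitute(event.data),
            }
            self.queues.setdefault(channel, []).append(message)
            routed.append(channel)
        return routed
```

For a user with EMAIL and PUSH enabled and SMS disabled, an `order_placed` event is rendered and enqueued on the EMAIL and PUSH queues only. Keeping the router free of provider calls means each channel's delivery service can scale and fail independently.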
Step 4: Database Design
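CREATE TABLE notifications (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    type VARCHAR(50) NOT NULL,      -- ORDER_PLACED, PASSWORD_RESET
    channel VARCHAR(20) NOT NULL,   -- PUSH, EMAIL, SMS, IN_APP
    content TEXT,
    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
                                    -- PENDING, SENT, DELIVERED, READ, FAILED
    retry_count INT NOT NULL DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    sent_at TIMESTAMP,
    delivered_at TIMESTAMP,
    read_at TIMESTAMP               -- supports the "read" tracking requirement
);

-- Supports listing a user's recent notifications (e.g., the in-app inbox)
CREATE INDEX idx_notifications_user_created
    ON notifications (user_id, created_at);

CREATE TABLE user_preferences (
    user_id UUID NOT NULL,
    channel VARCHAR(20) NOT NULL,
    enabled BOOLEAN NOT NULL DEFAULT TRUE,
    PRIMARY KEY (user_id, channel)
);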
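To show how the two tables are used together, the sketch below runs the preference lookup and a status update through Python's built-in `sqlite3` module. It assumes SQLite as a stand-in for the production database, so `UUID` and `BOOLEAN` columns become `TEXT` and `INTEGER`, and the tables are trimmed to the columns the queries touch. The helper names `enabled_channels` and `mark_sent` are hypothetical.

```python
import sqlite3
import uuid

# Simplified SQLite version of the schema (types adapted, columns trimmed).
SCHEMA = """
CREATE TABLE notifications (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    type TEXT,
    channel TEXT,
    content TEXT,
    status TEXT NOT NULL DEFAULT 'PENDING',
    retry_count INTEGER NOT NULL DEFAULT 0,
    sent_at TEXT
);
CREATE TABLE user_preferences (
    user_id TEXT,
    channel TEXT,
    enabled INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (user_id, channel)
);
"""


def enabled_channels(conn: sqlite3.Connection, user_id: str) -> list:
    """Channels with a stored preference row that is enabled (flow step 3)."""
    rows = conn.execute(
        "SELECT channel FROM user_preferences "
        "WHERE user_id = ? AND enabled = 1 ORDER BY channel",
        (user_id,),
    )
    return [channel for (channel,) in rows]


def mark_sent(conn: sqlite3.Connection, notification_id: str) -> None:
    """Record a successful hand-off to the provider (flow step 7)."""
    conn.execute(
        "UPDATE notifications SET status = 'SENT', sent_at = CURRENT_TIMESTAMP "
        "WHERE id = ?",
        (notification_id,),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    user = str(uuid.uuid4())
    conn.executemany(
        "INSERT INTO user_preferences (user_id, channel, enabled) VALUES (?, ?, ?)",
        [(user, "EMAIL", 1), (user, "PUSH", 1), (user, "SMS", 0)],
    )
    print(enabled_channels(conn, user))
```

One design question the schema leaves open: a channel with no preference row is not returned here. Whether a missing row should mean "opted in" (matching `DEFAULT TRUE`) is a policy decision the lookup has to make explicitly.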
Step 5: Reliability & Scaling

Retry strategy (delays grow roughly exponentially: 1 → 5 → 30 min):

Attempt 1: immediate
Attempt 2: after 1 min
Attempt 3: after 5 min
Attempt 4: after 30 min
After max retries → Dead Letter Queue → alert ops team
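The schedule above can be encoded as a lookup table. The sketch assumes attempts are numbered from 1 (so in the Step 4 schema, the attempt number would be `retry_count + 1`) and that a `None` delay means "give up and move the message to the dead-letter queue". The optional jitter draws the delay uniformly from [0, delay] (often called "full jitter"); it is an addition not in the original schedule, meant to keep many failed messages from retrying at the same instant. `delay_before_attempt` is a hypothetical name.

```python
import random
from typing import Optional

# Delay before each attempt, in seconds, taken from the schedule above.
RETRY_DELAYS_SECONDS = [0, 60, 5 * 60, 30 * 60]  # attempts 1..4
MAX_ATTEMPTS = len(RETRY_DELAYS_SECONDS)


def delay_before_attempt(attempt: int, jitter: bool = False) -> Optional[float]:
    """Seconds to wait before `attempt` (1-based), or None once retries are exhausted.

    None tells the caller to move the message to the dead-letter queue.
    With jitter=True the delay is drawn uniformly from [0, delay].
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > MAX_ATTEMPTS:
        return None
    delay = RETRY_DELAYS_SECONDS[attempt - 1]
    return random.uniform(0, delay) if jitter else float(delay)
```

Without jitter, attempts 1 to 5 yield 0, 60, 300, 1800 seconds and then `None`. Keeping the schedule as data makes it easy to tune per channel (for example, fewer retries for time-sensitive SMS codes).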

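Scaling:
- Per-channel Kafka topics, each with multiple partitions for parallelism (keying messages by user_id keeps each user's notifications in order within a partition)
- A separate consumer group per channel, so a slow channel (e.g., SMS) does not hold back the others
- Rate limiting per channel (email and SMS providers enforce send limits)
- Batching where possible (e.g., a daily digest)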
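For the per-channel rate limit, a token bucket is one common choice; the section does not prescribe an algorithm. The sketch keeps one in-process bucket per channel. The rates in `LIMITS` are made-up placeholders, not real provider limits, and a deployment with several consumer instances would need the bucket state in a shared store instead.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` sends, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock      # injectable for testing
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means delay or re-queue the send."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-channel limits (sends/second, burst size).
LIMITS: Dict[str, TokenBucket] = {
    "EMAIL": TokenBucket(rate=10, capacity=20),
    "SMS": TokenBucket(rate=1, capacity=5),
}
```

A consumer would call `LIMITS[channel].try_acquire()` before each provider call and pause or re-queue the message when it returns `False`, rather than letting the provider reject it.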

Deduplication:

Idempotency key: {event_id}:{user_id}:{channel}
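Check the key before sending and store it once the send succeeds, so that a redelivered event (expected under at-least-once delivery) does not produce a duplicate notification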
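A minimal sketch of the check, assuming an in-memory set in place of a shared store (such as a database table with a unique key, or Redis); `Deduplicator` and `deliver_once` are hypothetical names. The key is recorded only after the send succeeds: a crash between sending and recording can still cause one duplicate on redelivery, which at-least-once delivery accepts, whereas recording before sending would risk silently dropping the notification instead. If two consumers could process the same key concurrently, the check and the record would need to become a single atomic operation in the shared store.

```python
from typing import Callable, Set


def idempotency_key(event_id: str, user_id: str, channel: str) -> str:
    """Key format from the section above: {event_id}:{user_id}:{channel}."""
    return f"{event_id}:{user_id}:{channel}"


class Deduplicator:
    """In-memory stand-in for a shared store of already-sent keys."""

    def __init__(self) -> None:
        self._sent: Set[str] = set()

    def already_sent(self, key: str) -> bool:
        return key in self._sent

    def record_sent(self, key: str) -> None:
        # Called only after the provider accepted the message.
        self._sent.add(key)


def deliver_once(dedup: Deduplicator, key: str, send: Callable[[], None]) -> bool:
    """Send unless the key is already recorded; returns True if a send happened."""
    if dedup.already_sent(key):
        return False
    send()  # if this raises, nothing is recorded and a retry will resend
    dedup.record_sent(key)
    return True
```

Because the channel is part of the key, the same event can still reach a user once per channel, while a redelivered Kafka message for the same channel is skipped.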

Common Follow-up Questions
  • How do you handle notification priorities (critical vs marketing)?
  • How do you implement a daily digest?
  • How do you handle delivery failures?
  • How do you prevent notification spam?
  • How do you track delivery and open rates?