Skip to content

Database Design

Database design involves choosing the right database, designing schemas, and defining relationships. Key decisions: SQL vs NoSQL, normalization vs denormalization, sharding strategy, indexing. Schema design should optimize for the most common query patterns. Use ERD (Entity-Relationship Diagrams) to visualize relationships.

Key Concepts

Deep Dive: Choosing the Right Database
Requirement Database Choice
ACID transactions PostgreSQL, MySQL
Flexible schema MongoDB
High write throughput Cassandra
Caching Redis
Full-text search Elasticsearch
Relationships/graphs Neo4j
Time-series data InfluxDB, TimescaleDB
Key-value lookups DynamoDB, Redis

Many systems use polyglot persistence — multiple databases for different needs:

User data → PostgreSQL (ACID)
Product search → Elasticsearch (full-text)
Sessions → Redis (fast reads)
Activity log → Cassandra (write-heavy)

Deep Dive: Schema Design Patterns

1. Many-to-Many with Junction Table:

CREATE TABLE students (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE courses (id SERIAL PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (
    student_id INT REFERENCES students(id),
    course_id INT REFERENCES courses(id),
    enrolled_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (student_id, course_id)
);

2. Polymorphic Associations:

-- Comments on posts, videos, and photos
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    body TEXT,
    commentable_type VARCHAR(50),  -- 'Post', 'Video', 'Photo'
    commentable_id INT,
    created_at TIMESTAMP
);
CREATE INDEX idx_commentable ON comments (commentable_type, commentable_id);

3. Soft Delete:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    deleted_at TIMESTAMP NULL  -- NULL = active, set = "deleted"
);
-- Always filter: WHERE deleted_at IS NULL

4. Audit Trail:

CREATE TABLE audit_log (
    id BIGSERIAL PRIMARY KEY,
    entity_type VARCHAR(50),
    entity_id INT,
    action VARCHAR(20),  -- INSERT, UPDATE, DELETE
    old_value JSONB,
    new_value JSONB,
    changed_by INT,
    changed_at TIMESTAMP DEFAULT NOW()
);

Deep Dive: Data Modeling for NoSQL (MongoDB)

Embed vs Reference:

Embed (denormalize) — when data is read together:

{
    "_id": "user1",
    "name": "John",
    "addresses": [
        { "street": "123 Main St", "city": "NYC" },
        { "street": "456 Oak Ave", "city": "LA" }
    ]
}

Reference (normalize) — when data is shared or large:

// users collection
{ "_id": "user1", "name": "John" }

// orders collection
{ "_id": "order1", "userId": "user1", "total": 100 }

Rule of thumb: - One-to-few → embed - One-to-many → embed or reference (depends on size) - Many-to-many → reference

Common Interview Questions
  • How do you choose between SQL and NoSQL?
  • Design the schema for a social media app.
  • What is polyglot persistence?
  • When do you embed vs reference in MongoDB?
  • How do you handle soft deletes?
  • Design a schema for an e-commerce platform.