Normalization¶

Normalization reduces data redundancy and ensures integrity by organizing tables. 1NF — atomic values, no repeating groups. 2NF — 1NF + no partial dependencies (every non-key column depends on the FULL primary key). 3NF — 2NF + no transitive dependencies (non-key columns depend only on the key). Denormalization intentionally adds redundancy for read performance (common in OLAP/analytics).

Key Concepts¶

Deep Dive: Normal Forms with Examples

Unnormalized: | OrderID | Customer | Items | |---------|----------|-------| | 1 | John | Laptop, Mouse | | 2 | Jane | Keyboard |

1NF — Atomic values, no repeating groups: | OrderID | Customer | Item | |---------|----------|------| | 1 | John | Laptop | | 1 | John | Mouse | | 2 | Jane | Keyboard |

2NF — Remove partial dependencies:

Problem: Customer depends on OrderID, not the full key (OrderID, Item).

Orders table: | OrderID | Customer | |---------|----------| | 1 | John | | 2 | Jane |

OrderItems table: | OrderID | Item | |---------|------| | 1 | Laptop | | 1 | Mouse | | 2 | Keyboard |

3NF — Remove transitive dependencies:

Problem: If Customer → City (city depends on customer, not order).

Customers table: | CustomerID | Name | City | |------------|------|------| | 1 | John | NYC | | 2 | Jane | LA |

Orders table: | OrderID | CustomerID | |---------|------------| | 1 | 1 | | 2 | 2 |

Deep Dive: When to Denormalize

Denormalization adds redundancy intentionally for performance.

Scenarios: - Read-heavy workloads — avoid expensive JOINs - Analytics/reporting — pre-computed aggregates - Caching layer — materialized views

Example:

-- Normalized (requires JOIN)
SELECT o.id, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Denormalized (no JOIN, faster read)
SELECT id, customer_name, customer_email
FROM orders_denormalized;

Trade-offs: | | Normalized | Denormalized | |-|-----------|-------------| | Redundancy | Minimal | Intentional | | Write speed | Faster | Slower (update multiple places) | | Read speed | Slower (JOINs) | Faster (no JOINs) | | Consistency | Easier | Harder (sync redundant data) | | Use case | OLTP (transactions) | OLAP (analytics) |

Common Interview Questions

What is normalization? Why is it important?
Explain 1NF, 2NF, and 3NF with examples.
What is denormalization? When would you use it?
What is the difference between OLTP and OLAP?
Can you over-normalize a database?
What is a partial dependency? Transitive dependency?