Normalization¶
Normalization reduces data redundancy and ensures integrity by organizing tables. 1NF — atomic values, no repeating groups. 2NF — 1NF + no partial dependencies (every non-key column depends on the FULL primary key). 3NF — 2NF + no transitive dependencies (non-key columns depend only on the key). Denormalization intentionally adds redundancy for read performance (common in OLAP/analytics).
Key Concepts¶
Deep Dive: Normal Forms with Examples
Unnormalized: | OrderID | Customer | Items | |---------|----------|-------| | 1 | John | Laptop, Mouse | | 2 | Jane | Keyboard |
1NF — Atomic values, no repeating groups: | OrderID | Customer | Item | |---------|----------|------| | 1 | John | Laptop | | 1 | John | Mouse | | 2 | Jane | Keyboard |
2NF — Remove partial dependencies:
Problem: Customer depends on OrderID, not the full key (OrderID, Item).
Orders table: | OrderID | Customer | |---------|----------| | 1 | John | | 2 | Jane |
OrderItems table: | OrderID | Item | |---------|------| | 1 | Laptop | | 1 | Mouse | | 2 | Keyboard |
3NF — Remove transitive dependencies:
Problem: If Customer → City (city depends on customer, not order).
Customers table: | CustomerID | Name | City | |------------|------|------| | 1 | John | NYC | | 2 | Jane | LA |
Orders table: | OrderID | CustomerID | |---------|------------| | 1 | 1 | | 2 | 2 |
Deep Dive: When to Denormalize
Denormalization adds redundancy intentionally for performance.
Scenarios: - Read-heavy workloads — avoid expensive JOINs - Analytics/reporting — pre-computed aggregates - Caching layer — materialized views
Example:
-- Normalized (requires JOIN)
SELECT o.id, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Denormalized (no JOIN, faster read)
SELECT id, customer_name, customer_email
FROM orders_denormalized;
Trade-offs: | | Normalized | Denormalized | |-|-----------|-------------| | Redundancy | Minimal | Intentional | | Write speed | Faster | Slower (update multiple places) | | Read speed | Slower (JOINs) | Faster (no JOINs) | | Consistency | Easier | Harder (sync redundant data) | | Use case | OLTP (transactions) | OLAP (analytics) |
Common Interview Questions
- What is normalization? Why is it important?
- Explain 1NF, 2NF, and 3NF with examples.
- What is denormalization? When would you use it?
- What is the difference between OLTP and OLAP?
- Can you over-normalize a database?
- What is a partial dependency? Transitive dependency?