Skip to content

Normalization

Normalization reduces data redundancy and ensures integrity by organizing tables. 1NF — atomic values, no repeating groups. 2NF — 1NF + no partial dependencies (every non-key column depends on the FULL primary key). 3NF — 2NF + no transitive dependencies (non-key columns depend only on the key). Denormalization intentionally adds redundancy for read performance (common in OLAP/analytics).

Key Concepts

Deep Dive: Normal Forms with Examples

Unnormalized: | OrderID | Customer | Items | |---------|----------|-------| | 1 | John | Laptop, Mouse | | 2 | Jane | Keyboard |


1NF — Atomic values, no repeating groups: | OrderID | Customer | Item | |---------|----------|------| | 1 | John | Laptop | | 1 | John | Mouse | | 2 | Jane | Keyboard |


2NF — Remove partial dependencies:

Problem: Customer depends on OrderID, not the full key (OrderID, Item).

Orders table: | OrderID | Customer | |---------|----------| | 1 | John | | 2 | Jane |

OrderItems table: | OrderID | Item | |---------|------| | 1 | Laptop | | 1 | Mouse | | 2 | Keyboard |


3NF — Remove transitive dependencies:

Problem: If Customer → City (city depends on customer, not order).

Customers table: | CustomerID | Name | City | |------------|------|------| | 1 | John | NYC | | 2 | Jane | LA |

Orders table: | OrderID | CustomerID | |---------|------------| | 1 | 1 | | 2 | 2 |

Deep Dive: When to Denormalize

Denormalization adds redundancy intentionally for performance.

Scenarios: - Read-heavy workloads — avoid expensive JOINs - Analytics/reporting — pre-computed aggregates - Caching layer — materialized views

Example:

-- Normalized (requires JOIN)
SELECT o.id, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Denormalized (no JOIN, faster read)
SELECT id, customer_name, customer_email
FROM orders_denormalized;

Trade-offs: | | Normalized | Denormalized | |-|-----------|-------------| | Redundancy | Minimal | Intentional | | Write speed | Faster | Slower (update multiple places) | | Read speed | Slower (JOINs) | Faster (no JOINs) | | Consistency | Easier | Harder (sync redundant data) | | Use case | OLTP (transactions) | OLAP (analytics) |

Common Interview Questions
  • What is normalization? Why is it important?
  • Explain 1NF, 2NF, and 3NF with examples.
  • What is denormalization? When would you use it?
  • What is the difference between OLTP and OLAP?
  • Can you over-normalize a database?
  • What is a partial dependency? Transitive dependency?