SQL Fundamentals¶

SQL (Structured Query Language) is used to manage relational databases. Key operations: SELECT (read), INSERT/UPDATE/DELETE (write). Joins: INNER (matching rows), LEFT (all from left + matching), RIGHT, FULL OUTER, CROSS. Aggregations: GROUP BY with HAVING. Subqueries vs CTEs. Know the difference between WHERE and HAVING, and UNION vs UNION ALL.

JOIN Types¶

INNER JOIN — only matching rows from both tables. LEFT JOIN — all rows from left table + matches from right (NULL if no match). RIGHT JOIN — opposite. FULL OUTER JOIN — all rows from both. CROSS JOIN — Cartesian product. SELF JOIN — join table with itself (e.g., employee-manager hierarchy).

Deep Dive: Examples

-- INNER JOIN: Only matching rows
SELECT u.name, o.total
FROM users u INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN: All users + their orders (NULL if no orders)
SELECT u.name, o.total
FROM users u LEFT JOIN orders o ON u.id = o.user_id;

-- SELF JOIN: Employee-Manager hierarchy
SELECT e.name AS employee, m.name AS manager
FROM employees e LEFT JOIN employees m ON e.manager_id = m.id;

Join Type	Returns
INNER	Only matching rows
LEFT	All left + matching right (NULL if no match)
RIGHT	All right + matching left (NULL if no match)
FULL OUTER	All from both (NULL where no match)
CROSS	Cartesian product (M × N rows)

GROUP BY, HAVING & Aggregates¶

GROUP BY groups rows sharing a value. Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX(). WHERE filters rows before grouping. HAVING filters groups after aggregation. This is why you can use aggregates in HAVING but not in WHERE.

Deep Dive: Examples

-- Count orders per user, only those with > 5 orders
SELECT u.name, COUNT(o.id) AS order_count, SUM(o.total) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.name
HAVING COUNT(o.id) > 5
ORDER BY total_spent DESC;

-- WHERE filters rows BEFORE grouping
-- HAVING filters groups AFTER aggregation
SELECT department, AVG(salary)
FROM employees
WHERE status = 'ACTIVE'      -- Before grouping
GROUP BY department
HAVING AVG(salary) > 50000;  -- After grouping

Subqueries vs CTEs¶

Subqueries are inline queries nested inside another query. CTEs (Common Table Expressions, WITH ... AS) are named temporary result sets — more readable, reusable within the query. Recursive CTEs can traverse hierarchical data (org charts, trees). Use CTEs for complex queries; subqueries for simple one-offs.

Deep Dive: Examples

Subquery:

SELECT name FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

CTE — more readable:

WITH high_spenders AS (
    SELECT user_id, SUM(total) AS total_spent
    FROM orders GROUP BY user_id HAVING SUM(total) > 1000
)
SELECT u.name, hs.total_spent
FROM users u JOIN high_spenders hs ON u.id = hs.user_id;

Recursive CTE — hierarchical data:

WITH RECURSIVE org_chart AS (
    SELECT id, name, manager_id, 0 AS level
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level;

Window Functions¶

Window functions operate over a set of rows without collapsing them (unlike GROUP BY). ROW_NUMBER() — unique sequential number. RANK() / DENSE_RANK() — ranking with/without gaps. SUM() OVER() — running totals. LAG() / LEAD() — access previous/next row. Essential for analytics queries.

Deep Dive: Examples

-- ROW_NUMBER: rank employees by salary within each department
SELECT name, department, salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;

-- RANK vs DENSE_RANK (ties)
SELECT name, salary,
    RANK() OVER (ORDER BY salary DESC) AS rank,         -- 1, 2, 2, 4
    DENSE_RANK() OVER (ORDER BY salary DESC) AS drank   -- 1, 2, 2, 3
FROM employees;

-- Running total
SELECT order_date, total,
    SUM(total) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- LAG / LEAD: access previous/next row
SELECT order_date, total,
    LAG(total) OVER (ORDER BY order_date) AS prev_total,
    total - LAG(total) OVER (ORDER BY order_date) AS diff
FROM orders;

SQL Execution Order¶

SQL doesn't execute in the order you write it. Actual order: FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT. This is why you can't use column aliases in WHERE (SELECT hasn't run yet) but can in ORDER BY.

Deep Dive: Order Diagram

1. FROM / JOIN     — identify tables
2. WHERE           — filter rows
3. GROUP BY        — group rows
4. HAVING          — filter groups
5. SELECT          — compute columns
6. DISTINCT        — remove duplicates
7. ORDER BY        — sort results
8. LIMIT / OFFSET  — paginate

Consequences: - Can't use column aliases in WHERE (SELECT hasn't run yet) - Can use aliases in ORDER BY (runs after SELECT) - HAVING can use aggregates but WHERE can't

DML: INSERT, UPDATE, DELETE¶

INSERT adds rows. UPDATE modifies existing rows (always use WHERE to avoid updating all!). DELETE removes rows (always use WHERE). TRUNCATE removes all rows faster (no logging). DROP deletes the entire table structure. DELETE can be rolled back; TRUNCATE and DROP usually cannot.

Deep Dive: Examples

INSERT INTO users (name, email) VALUES ('John', 'john@test.com');

UPDATE users SET status = 'INACTIVE' WHERE last_login < '2024-01-01';

DELETE FROM users WHERE status = 'INACTIVE';

TRUNCATE TABLE temp_logs;  -- Fast, removes all rows, resets auto-increment

DROP TABLE old_data;  -- Deletes table entirely

Operation	Rows affected	Rollback?	Speed
DELETE	WHERE-filtered	✅	Slower (logged)
TRUNCATE	All rows	❌ (usually)	Fast
DROP	Entire table	❌	Instant

Common Interview Questions¶

Common Interview Questions

What is the difference between INNER JOIN and LEFT JOIN?
What is the difference between WHERE and HAVING?
Write a query to find the second highest salary.
What are window functions? Give an example.
What is the difference between UNION and UNION ALL?
What is a CTE? When would you use one?
In what order does SQL execute a query?
What is the difference between DELETE, TRUNCATE, and DROP?
What is a correlated subquery?
Write a query to find duplicate rows.