SQL Fundamentals¶
SQL (Structured Query Language) is used to manage relational databases. Key operations: SELECT (read), INSERT/UPDATE/DELETE (write). Joins: INNER (matching rows), LEFT (all from left + matching), RIGHT, FULL OUTER, CROSS. Aggregations: GROUP BY with HAVING. Subqueries vs CTEs. Know the difference between WHERE and HAVING, and UNION vs UNION ALL.
JOIN Types¶
INNER JOIN — only matching rows from both tables. LEFT JOIN — all rows from left table + matches from right (NULL if no match). RIGHT JOIN — opposite. FULL OUTER JOIN — all rows from both. CROSS JOIN — Cartesian product. SELF JOIN — join table with itself (e.g., employee-manager hierarchy).
Deep Dive: Examples
-- INNER JOIN: Only matching rows
SELECT u.name, o.total
FROM users u INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN: All users + their orders (NULL if no orders)
SELECT u.name, o.total
FROM users u LEFT JOIN orders o ON u.id = o.user_id;
-- SELF JOIN: Employee-Manager hierarchy
SELECT e.name AS employee, m.name AS manager
FROM employees e LEFT JOIN employees m ON e.manager_id = m.id;
| Join Type | Returns |
|---|---|
| INNER | Only matching rows |
| LEFT | All left + matching right (NULL if no match) |
| RIGHT | All right + matching left (NULL if no match) |
| FULL OUTER | All from both (NULL where no match) |
| CROSS | Cartesian product (M × N rows) |
GROUP BY, HAVING & Aggregates¶
GROUP BY groups rows sharing a value. Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX(). WHERE filters rows before grouping. HAVING filters groups after aggregation. This is why you can use aggregates in HAVING but not in WHERE.
Deep Dive: Examples
-- Count orders per user, only those with > 5 orders
SELECT u.name, COUNT(o.id) AS order_count, SUM(o.total) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.name
HAVING COUNT(o.id) > 5
ORDER BY total_spent DESC;
-- WHERE filters rows BEFORE grouping
-- HAVING filters groups AFTER aggregation
SELECT department, AVG(salary)
FROM employees
WHERE status = 'ACTIVE' -- Before grouping
GROUP BY department
HAVING AVG(salary) > 50000; -- After grouping
Subqueries vs CTEs¶
Subqueries are inline queries nested inside another query. CTEs (Common Table Expressions, WITH ... AS) are named temporary result sets — more readable, reusable within the query. Recursive CTEs can traverse hierarchical data (org charts, trees). Use CTEs for complex queries; subqueries for simple one-offs.
Deep Dive: Examples
Subquery:
CTE — more readable:
WITH high_spenders AS (
SELECT user_id, SUM(total) AS total_spent
FROM orders GROUP BY user_id HAVING SUM(total) > 1000
)
SELECT u.name, hs.total_spent
FROM users u JOIN high_spenders hs ON u.id = hs.user_id;
Recursive CTE — hierarchical data:
Window Functions¶
Window functions operate over a set of rows without collapsing them (unlike GROUP BY). ROW_NUMBER() — unique sequential number. RANK() / DENSE_RANK() — ranking with/without gaps. SUM() OVER() — running totals. LAG() / LEAD() — access previous/next row. Essential for analytics queries.
Deep Dive: Examples
-- ROW_NUMBER: rank employees by salary within each department
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
-- RANK vs DENSE_RANK (ties)
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank, -- 1, 2, 2, 4
DENSE_RANK() OVER (ORDER BY salary DESC) AS drank -- 1, 2, 2, 3
FROM employees;
-- Running total
SELECT order_date, total,
SUM(total) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- LAG / LEAD: access previous/next row
SELECT order_date, total,
LAG(total) OVER (ORDER BY order_date) AS prev_total,
total - LAG(total) OVER (ORDER BY order_date) AS diff
FROM orders;
SQL Execution Order¶
SQL doesn't execute in the order you write it. Actual order: FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT. This is why you can't use column aliases in WHERE (SELECT hasn't run yet) but can in ORDER BY.
Deep Dive: Order Diagram
1. FROM / JOIN — identify tables
2. WHERE — filter rows
3. GROUP BY — group rows
4. HAVING — filter groups
5. SELECT — compute columns
6. DISTINCT — remove duplicates
7. ORDER BY — sort results
8. LIMIT / OFFSET — paginate
Consequences:
- Can't use column aliases in WHERE (SELECT hasn't run yet)
- Can use aliases in ORDER BY (runs after SELECT)
- HAVING can use aggregates but WHERE can't
DML: INSERT, UPDATE, DELETE¶
INSERT adds rows. UPDATE modifies existing rows (always use WHERE to avoid updating all!). DELETE removes rows (always use WHERE). TRUNCATE removes all rows faster (no logging). DROP deletes the entire table structure. DELETE can be rolled back; TRUNCATE and DROP usually cannot.
Deep Dive: Examples
INSERT INTO users (name, email) VALUES ('John', 'john@test.com');
UPDATE users SET status = 'INACTIVE' WHERE last_login < '2024-01-01';
DELETE FROM users WHERE status = 'INACTIVE';
TRUNCATE TABLE temp_logs; -- Fast, removes all rows, resets auto-increment
DROP TABLE old_data; -- Deletes table entirely
| Operation | Rows affected | Rollback? | Speed |
|---|---|---|---|
| DELETE | WHERE-filtered | ✅ | Slower (logged) |
| TRUNCATE | All rows | ❌ (usually) | Fast |
| DROP | Entire table | ❌ | Instant |
Common Interview Questions¶
Common Interview Questions
- What is the difference between INNER JOIN and LEFT JOIN?
- What is the difference between WHERE and HAVING?
- Write a query to find the second highest salary.
- What are window functions? Give an example.
- What is the difference between UNION and UNION ALL?
- What is a CTE? When would you use one?
- In what order does SQL execute a query?
- What is the difference between DELETE, TRUNCATE, and DROP?
- What is a correlated subquery?
- Write a query to find duplicate rows.