Query Optimization¶

Query optimization improves database performance. Key techniques: use indexes on WHERE/JOIN/ORDER BY columns, avoid SELECT *** (fetch only needed columns), use EXPLAIN to analyze query plans, avoid N+1 queries (batch or JOIN), use pagination instead of loading all rows, denormalize for read-heavy workloads, and use connection pooling** (HikariCP).

Key Concepts¶

Deep Dive: Common Performance Anti-Patterns

1. SELECT * (fetching all columns):

-- Bad
SELECT * FROM users WHERE id = 1;

-- Good (fetch only what you need)
SELECT name, email FROM users WHERE id = 1;

2. Missing index on WHERE/JOIN columns:

-- Slow (full table scan)
SELECT * FROM orders WHERE user_id = 5;

-- Fix: CREATE INDEX idx_orders_user_id ON orders (user_id);

3. Using functions on indexed columns:

-- Bad (index on created_at is NOT used)
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Good (index-friendly)
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

4. Using OR instead of IN:

-- Harder to optimize
SELECT * FROM users WHERE status = 'ACTIVE' OR status = 'PENDING';

-- Better
SELECT * FROM users WHERE status IN ('ACTIVE', 'PENDING');

5. LIKE with leading wildcard:

-- Cannot use index
SELECT * FROM users WHERE name LIKE '%john%';

-- Can use index
SELECT * FROM users WHERE name LIKE 'john%';

Deep Dive: N+1 Query Problem

// 1 query to load users
List<User> users = userRepository.findAll();

// N additional queries (one per user)
for (User user : users) {
    List<Order> orders = orderRepository.findByUserId(user.getId());
}
// Total: 1 + N queries!

Solutions:

1. JOIN FETCH:

@Query("SELECT u FROM User u JOIN FETCH u.orders")
List<User> findAllWithOrders();
// 1 query with JOIN

2. Batch fetching:

spring.jpa.properties.hibernate.default_batch_fetch_size=20
# Loads orders in batches of 20 users

3. @EntityGraph:

@EntityGraph(attributePaths = {"orders"})
List<User> findAll();

Deep Dive: Pagination

Offset-based (simple but slow for large offsets):

SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
-- DB must scan 1020 rows, then discard 1000

Keyset/Cursor-based (fast for any page):

-- First page
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;

-- Next page (use last row's value as cursor)
SELECT * FROM orders
WHERE created_at < '2024-01-15T10:30:00'
ORDER BY created_at DESC
LIMIT 20;
-- Uses index, constant-time regardless of page number

	Offset-based	Keyset-based
Performance	Degrades with page number	Constant
Jump to page N	Yes	No (sequential only)
Implementation	Simple	More complex
Use case	Small datasets, random access	Large datasets, infinite scroll

Deep Dive: Connection Pooling

Database connections are expensive to create. Connection pooling reuses connections.

HikariCP (Spring Boot default):

spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000

Sizing rule of thumb:

Pool size = (core_count * 2) + disk_spindles

For most apps: 10-20 connections is sufficient.

Common issues: - Connection leak — connection not returned to pool (use try-with-resources) - Pool exhaustion — all connections in use (increase pool or optimize queries) - Long-running transactions — hold connections too long

Common Interview Questions

How do you optimize a slow SQL query?
What is the N+1 query problem? How do you fix it?
What is EXPLAIN? How do you read a query plan?
What is connection pooling? Why is it important?
What is the difference between offset and keyset pagination?
Why should you avoid SELECT *?
How do you handle slow queries in production?
What is query caching?
How do you optimize JOINs?
What tools do you use for database performance monitoring?