Query Optimization¶
Query optimization improves database performance. Key techniques: use indexes on WHERE/JOIN/ORDER BY columns, avoid SELECT *** (fetch only needed columns), use EXPLAIN to analyze query plans, avoid N+1 queries (batch or JOIN), use pagination instead of loading all rows, denormalize for read-heavy workloads, and use connection pooling** (HikariCP).
Key Concepts¶
Deep Dive: Common Performance Anti-Patterns
1. SELECT * (fetching all columns):
-- Bad
SELECT * FROM users WHERE id = 1;
-- Good (fetch only what you need)
SELECT name, email FROM users WHERE id = 1;
2. Missing index on WHERE/JOIN columns:
-- Slow (full table scan)
SELECT * FROM orders WHERE user_id = 5;
-- Fix: CREATE INDEX idx_orders_user_id ON orders (user_id);
3. Using functions on indexed columns:
-- Bad (index on created_at is NOT used)
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- Good (index-friendly)
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
4. Using OR instead of IN:
-- Harder to optimize
SELECT * FROM users WHERE status = 'ACTIVE' OR status = 'PENDING';
-- Better
SELECT * FROM users WHERE status IN ('ACTIVE', 'PENDING');
5. LIKE with leading wildcard:
Deep Dive: N+1 Query Problem
// 1 query to load users
List<User> users = userRepository.findAll();
// N additional queries (one per user)
for (User user : users) {
List<Order> orders = orderRepository.findByUserId(user.getId());
}
// Total: 1 + N queries!
Solutions:
1. JOIN FETCH:
@Query("SELECT u FROM User u JOIN FETCH u.orders")
List<User> findAllWithOrders();
// 1 query with JOIN
2. Batch fetching:
3. @EntityGraph:
Deep Dive: Pagination
Offset-based (simple but slow for large offsets):
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
-- DB must scan 1020 rows, then discard 1000
Keyset/Cursor-based (fast for any page):
-- First page
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;
-- Next page (use last row's value as cursor)
SELECT * FROM orders
WHERE created_at < '2024-01-15T10:30:00'
ORDER BY created_at DESC
LIMIT 20;
-- Uses index, constant-time regardless of page number
| Offset-based | Keyset-based | |
|---|---|---|
| Performance | Degrades with page number | Constant |
| Jump to page N | Yes | No (sequential only) |
| Implementation | Simple | More complex |
| Use case | Small datasets, random access | Large datasets, infinite scroll |
Deep Dive: Connection Pooling
Database connections are expensive to create. Connection pooling reuses connections.
HikariCP (Spring Boot default):
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
Sizing rule of thumb:
For most apps: 10-20 connections is sufficient.Common issues: - Connection leak — connection not returned to pool (use try-with-resources) - Pool exhaustion — all connections in use (increase pool or optimize queries) - Long-running transactions — hold connections too long
Common Interview Questions
- How do you optimize a slow SQL query?
- What is the N+1 query problem? How do you fix it?
- What is EXPLAIN? How do you read a query plan?
- What is connection pooling? Why is it important?
- What is the difference between offset and keyset pagination?
- Why should you avoid SELECT *?
- How do you handle slow queries in production?
- What is query caching?
- How do you optimize JOINs?
- What tools do you use for database performance monitoring?