Monitoring & Logging
Monitoring and logging are essential for operating production systems. Metrics (Prometheus + Grafana) track system health over time. Logging (the ELK Stack) captures individual application events. Alerting sends notifications when a metric breaches a threshold. Distributed tracing (Jaeger/Zipkin) follows a single request as it crosses service boundaries. A common starting point for deciding what to measure is the four golden signals from Google's Site Reliability Engineering book: latency, traffic, errors, and saturation.
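Alerting is usually more than a bare threshold check: to avoid paging on one noisy sample, a rule typically requires the condition to hold for some time (Prometheus alerting rules express this with a `for:` clause, and Alertmanager then routes the notification). The sketch below is a minimal, hypothetical in-process evaluator that assumes one observation per evaluation tick; `ThresholdAlert` and its methods are illustrative names, not a real library API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical threshold alert: fires only after the value has stayed
 *  above the threshold for at least holdFor (the idea behind a `for:` clause). */
final class ThresholdAlert {
    enum State { INACTIVE, PENDING, FIRING }

    private final double threshold;
    private final Duration holdFor;
    private Instant breachedSince; // null while the value is within bounds

    ThresholdAlert(double threshold, Duration holdFor) {
        this.threshold = threshold;
        this.holdFor = holdFor;
    }

    /** Feeds one observation and returns the alert state after evaluating it. */
    State evaluate(double value, Instant now) {
        if (value <= threshold) {
            breachedSince = null;      // back within bounds: reset the clock
            return State.INACTIVE;
        }
        if (breachedSince == null) {
            breachedSince = now;       // first breaching sample starts the clock
        }
        Duration breachedFor = Duration.between(breachedSince, now);
        return breachedFor.compareTo(holdFor) >= 0 ? State.FIRING : State.PENDING;
    }
}
```

With an error-rate threshold of 0.1% and a five-minute hold, a single bad minute only reaches PENDING; a breach sustained for five minutes moves the alert to FIRING. The three states mirror the inactive/pending/firing states Prometheus reports for its alerting rules.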
Key Concepts
Deep Dive: Four Golden Signals
| Signal | What to Measure | Example |
|---|---|---|
| Latency | Time taken to serve a request | p99 response time < 200 ms |
| Traffic | Demand on the system (request volume) | 1000 requests per second (RPS) |
| Errors | Rate of failed requests | Error rate < 0.1% |
| Saturation | How "full" the service is (resource usage) | CPU < 80%, memory < 85% |
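As a worked example of where the table's numbers come from, the sketch below computes three of the signals from a hypothetical batch of request samples collected over a fixed window: traffic as requests per second, errors as failed/total, and latency as a percentile using the nearest-rank method. In production these values come from the monitoring system instead (for example, PromQL's `histogram_quantile` estimates percentiles by interpolating within histogram buckets). Saturation is omitted because it comes from resource gauges such as CPU and memory, not from request samples. `Sample` and `GoldenSignals` are illustrative names.

```java
import java.util.List;

/** One observed request: its duration and whether it failed (hypothetical type). */
record Sample(double latencyMs, boolean failed) {}

final class GoldenSignals {
    /** Traffic: requests per second over a window of windowSeconds. */
    static double trafficRps(List<Sample> samples, double windowSeconds) {
        return samples.size() / windowSeconds;
    }

    /** Errors: fraction of requests that failed, from 0.0 to 1.0. */
    static double errorRate(List<Sample> samples) {
        if (samples.isEmpty()) return 0.0;
        long failed = samples.stream().filter(Sample::failed).count();
        return (double) failed / samples.size();
    }

    /** Latency: nearest-rank percentile, e.g. p = 99 for p99. */
    static double percentileMs(List<Sample> samples, double p) {
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.stream().mapToDouble(Sample::latencyMs).sorted().toArray();
        if (sorted.length == 0) throw new IllegalArgumentException("no samples");
        // Nearest rank: the smallest value with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

Nearest-rank is used here because it is easy to check by hand: with 200 samples, p99 is simply the 198th-smallest latency.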
Deep Dive: Monitoring Stack
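Prometheus + Grafana:
- Prometheus periodically scrapes (pulls) metrics over HTTP; a Spring Boot application serves them at /actuator/prometheus when the micrometer-registry-prometheus dependency is on the classpath and the endpoint is exposed
- Grafana queries Prometheus (using PromQL) and visualizes the results as dashboards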
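Spring Boot metrics (application.properties):
```properties
# Expose these Actuator endpoints over HTTP
management.endpoints.web.exposure.include=prometheus,health,info
# Attach the tag application="my-app" to every meter
management.metrics.tags.application=my-app
```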
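When Prometheus scrapes /actuator/prometheus, the response is plain text in the Prometheus exposition format. Micrometer's Prometheus registry rewrites dotted meter names with underscores, appends `_total` to counter names, and renders common tags such as the `application` tag configured above as labels, so the `orders.created` counter from the custom-metrics example would be exposed as a series along the lines of `orders_created_total{application="my-app"}` (exact formatting differs between Micrometer versions). The stdlib-only sketch below, using a hypothetical `PromText` class, mimics just that mapping for a single counter; it is not Micrometer's code and skips `# HELP` lines and label-value escaping.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Simplified illustration of one counter rendered in Prometheus text format. */
final class PromText {
    /** "orders.created" -> "orders_created_total" (Micrometer-style counter naming). */
    static String counterName(String dottedName) {
        String base = dottedName.replace('.', '_');
        return base.endsWith("_total") ? base : base + "_total";
    }

    /** Renders a "# TYPE" line plus one sample line, with labels sorted by key. */
    static String renderCounter(String dottedName, Map<String, String> tags, double value) {
        String name = counterName(dottedName);
        String labels = new TreeMap<>(tags).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        String series = labels.isEmpty() ? name : name + "{" + labels + "}";
        return "# TYPE " + name + " counter\n" + series + " " + value + "\n";
    }
}
```

Because tags become labels, Grafana can filter or group by `application` across every service that sets the same property.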
Custom metrics:
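```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderService(MeterRegistry registry) {
        // Exposed to Prometheus as orders_created_total
        this.orderCounter = registry.counter("orders.created");
        // Exposed as orders_processing_time_seconds_count / _sum / _max
        this.orderTimer = registry.timer("orders.processing.time");
    }

    public void createOrder(OrderRequest request) {
        // Time the whole operation; the counter is incremented only if
        // processOrder completes without throwing.
        orderTimer.record(() -> {
            processOrder(request);
            orderCounter.increment();
        });
    }

    private void processOrder(OrderRequest request) {
        // business logic (not shown in the original)
    }
}
```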
Deep Dive: Structured Logging
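```java
import java.util.UUID;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.MDC;
import org.springframework.web.bind.annotation.*;

// Structured logging with MDC: keys set here are attached to every log event on this thread
@Slf4j
@RestController
public class OrderController {
    @PostMapping("/orders")
    public Order createOrder(@RequestBody OrderRequest request) {
        MDC.put("userId", String.valueOf(request.getUserId()));
        MDC.put("orderId", UUID.randomUUID().toString());
        try {
            log.info("Creating order for amount: {}", request.getTotal());
            // ... order creation elided in the original
            return placeOrder(request); // hypothetical helper, not shown
        } finally {
            // Request threads are pooled; clear MDC so keys don't leak into the next request
            MDC.clear();
        }
    }
}
```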
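Logs and distributed traces connect through shared identifiers: if the current trace ID is also put into MDC, each log line can be looked up next to its trace in Jaeger or Zipkin. In practice a tracing library (for example, Micrometer Tracing or OpenTelemetry instrumentation) typically extracts the incoming context and populates MDC for you. The stdlib-only sketch below just illustrates the idea by parsing a W3C Trace Context `traceparent` header, whose version-00 format is `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`; the `TraceParent` type is hypothetical and handles version 00 only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parsed form of a W3C traceparent header (version 00 only). */
record TraceParent(String traceId, String parentId, boolean sampled) {
    // version-traceid-parentid-flags, all lowercase hex
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final String ZERO_TRACE = "0".repeat(32);
    private static final String ZERO_PARENT = "0".repeat(16);

    static Optional<TraceParent> parse(String header) {
        if (header == null) return Optional.empty();
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return Optional.empty();
        // All-zero IDs are invalid per the specification.
        if (m.group(1).equals(ZERO_TRACE) || m.group(2).equals(ZERO_PARENT)) {
            return Optional.empty();
        }
        int flags = Integer.parseInt(m.group(3), 16);
        boolean sampled = (flags & 0x01) != 0; // bit 0 is the "sampled" flag
        return Optional.of(new TraceParent(m.group(1), m.group(2), sampled));
    }
}
```

In the controller, the parsed trace ID could then be added with `MDC.put("traceId", tp.traceId())` alongside `userId` and `orderId`.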
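Log levels, in increasing severity: TRACE < DEBUG < INFO < WARN < ERROR. A logger set to a given level emits messages at that level and every more severe level.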
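The ordering is what makes level thresholds work. A minimal sketch of the rule, using a hypothetical `Level` enum whose declaration order matches the severity order above; logging frameworks such as Logback apply the same comparison, though not with this code:

```java
/** Hypothetical level enum; declaration order = increasing severity. */
enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

final class LevelFilter {
    /** True if a message at `level` passes a logger configured at `threshold`. */
    static boolean isEnabled(Level threshold, Level level) {
        return level.compareTo(threshold) >= 0; // enums compare by declaration order
    }
}
```

With the threshold at INFO, the `log.info(...)` call in the controller is written, while any `log.debug(...)` calls are skipped.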
ELK Stack: Elasticsearch (stores and indexes logs for search) + Logstash (ingests, parses, and transforms logs) + Kibana (searches and visualizes them).
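ELK works best when applications emit structured logs, commonly one JSON object per line, so that fields such as `userId` and `orderId` reach Elasticsearch as searchable fields without regex parsing in Logstash. In Spring Boot this is usually done with a JSON encoder (for example, logstash-logback-encoder, which copies MDC entries into the JSON output). The stdlib-only sketch below only illustrates the shape of such a line; `JsonLogLine` is a hypothetical class, the field names `@timestamp`, `level`, and `message` follow common convention but are assumptions, and the escaping is minimal.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative JSON-lines formatter for one log event (not a real encoder). */
final class JsonLogLine {
    static String format(Instant ts, String level, String message, Map<String, String> mdc) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("@timestamp", ts.toString());             // ISO-8601 in UTC
        fields.put("level", level);
        fields.put("message", message);
        fields.putAll(mdc);                                  // e.g. userId, orderId
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
                }
            }
        }
        return out.toString();
    }
}
```

Keeping each event on a single line matters because log shippers typically split input on newlines; escaping `\n` inside messages keeps multi-line text (such as stack traces) inside one event.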
Common Interview Questions
- What are the four golden signals?
- How do you monitor a microservices application?
- What is the difference between metrics and logs?
- What is distributed tracing?
- How do you set up alerting?
- What is the ELK stack?