Monitoring & Logging

Monitoring and logging are essential for operating production systems. Metrics (Prometheus + Grafana) track system health over time. Logging (ELK Stack) captures discrete application events. Alerting sends notifications when a metric breaches a threshold. Distributed tracing (Jaeger/Zipkin) follows a single request across services. The four golden signals to monitor are latency, traffic, errors, and saturation.

Key Concepts

Deep Dive: Four Golden Signals
Signal      | What to Measure  | Example
Latency     | Request duration | p99 response time < 200 ms
Traffic     | Request volume   | 1000 RPS
Errors      | Failure rate     | Error rate < 0.1%
Saturation  | Resource usage   | CPU < 80%, Memory < 85%
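
As a rough illustration of how the example thresholds in the table could be evaluated over a window of requests, here is a minimal sketch in plain Java. The class `GoldenSignals`, its method names, and the sample data are hypothetical and not part of any monitoring library; in practice Prometheus computes these values from scraped metrics. The percentile uses the simple nearest-rank method, which is only one of several common definitions.

```java
import java.util.Arrays;

/** Hypothetical helper: evaluates golden-signal values for one window of requests. */
final class GoldenSignals {

    /** Nearest-rank percentile: the smallest sample with at least pct% of samples <= it. */
    static long percentile(long[] latenciesMs, double pct) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Errors: fraction of failed requests (0.0 when there was no traffic). */
    static double errorRate(long failed, long total) {
        return total == 0 ? 0.0 : (double) failed / total;
    }

    /** Traffic: average requests per second over the window. */
    static double requestsPerSecond(long total, long windowSeconds) {
        return (double) total / windowSeconds;
    }

    public static void main(String[] args) {
        // Made-up sample: 99 requests between 10 and 108 ms, plus one 450 ms outlier.
        long[] latencies = new long[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = 10 + i;
        }
        latencies[99] = 450;

        long p99 = percentile(latencies, 99.0);
        double errors = errorRate(2, 1000);

        // Thresholds taken from the table: p99 < 200 ms, error rate < 0.1%.
        System.out.println("p99 latency (ms): " + p99 + ", breached: " + (p99 >= 200));
        System.out.println("error rate: " + errors + ", breached: " + (errors >= 0.001));
        System.out.println("traffic (RPS): " + requestsPerSecond(1000, 1));
    }
}
```

With this sample the single outlier does not move the p99 (108 ms), while 2 failures in 1000 requests exceeds the 0.1% error budget. Saturation is left out because it is read from the host or runtime rather than from request data.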
Deep Dive: Monitoring Stack

Prometheus + Grafana:
  • Prometheus scrapes metrics from /actuator/prometheus
  • Grafana visualizes the collected metrics on dashboards

Spring Boot metrics (with spring-boot-starter-actuator and micrometer-registry-prometheus on the classpath):

management.endpoints.web.exposure.include=prometheus,health,info
management.metrics.tags.application=my-app

Custom metrics:

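import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {
    private final Counter orderCounter;
    private final Timer orderTimer;

    // Spring injects the auto-configured MeterRegistry
    public OrderService(MeterRegistry registry) {
        this.orderCounter = registry.counter("orders.created");
        this.orderTimer = registry.timer("orders.processing.time");
    }

    public void createOrder(OrderRequest request) {
        // Timer.record measures how long the lambda takes to run
        orderTimer.record(() -> {
            processOrder(request);
            // Reached only if processOrder did not throw, so failed orders are not counted
            orderCounter.increment();
        });
    }

    private void processOrder(OrderRequest request) {
        // Business logic is not shown in the original; placeholder only
    }
}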
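
To make the scrape step concrete, the following plain-Java sketch renders a counter roughly the way it appears in the Prometheus text format. It is a hand-rolled illustration, not Micrometer's implementation: the class `ScrapeSketch` and its methods are hypothetical, and the real /actuator/prometheus output contains additional lines (such as HELP comments) and many more meters. It assumes the usual naming convention in which dots become underscores and counters get a `_total` suffix, and it shows the `application` tag from the configuration above as a label.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

/** Hand-rolled illustration only; in a Spring Boot app Micrometer renders this output. */
final class ScrapeSketch {
    private final AtomicLong ordersCreated = new AtomicLong();

    /** Stands in for orderCounter.increment(). */
    void recordOrder() {
        ordersCreated.incrementAndGet();
    }

    /** Dotted meter name to Prometheus-style counter name: dots to underscores, plus "_total". */
    static String prometheusCounterName(String meterName) {
        return meterName.replace('.', '_') + "_total";
    }

    /** Simplified version of what a scrape would return for this one counter. */
    String scrape(String application) {
        String name = prometheusCounterName("orders.created");
        String value = String.format(Locale.ROOT, "%.1f", (double) ordersCreated.get());
        return "# TYPE " + name + " counter\n"
             + name + "{application=\"" + application + "\"} " + value + "\n";
    }

    public static void main(String[] args) {
        ScrapeSketch sketch = new ScrapeSketch();
        sketch.recordOrder();
        sketch.recordOrder();
        sketch.recordOrder();
        // Prints:
        // # TYPE orders_created_total counter
        // orders_created_total{application="my-app"} 3.0
        System.out.print(sketch.scrape("my-app"));
    }
}
```

The point of the pull model is that the application only keeps current values in memory; Prometheus decides when to read them and stores the history.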

Deep Dive: Structured Logging
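// Attach request context to every log line via MDC
import java.util.UUID;
import lombok.extern.slf4j.Slf4j;
import org.slf4j.MDC;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@Slf4j
@RestController
public class OrderController {
    @PostMapping("/orders")
    public Order createOrder(@RequestBody OrderRequest request) {
        try {
            MDC.put("userId", request.getUserId());
            MDC.put("orderId", UUID.randomUUID().toString());
            log.info("Creating order for amount: {}", request.getTotal());
            // ... (order creation and the return statement are elided here)
        } finally {
            // Always clear: MDC is thread-local and request threads are pooled
            MDC.clear();
        }
    }
}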
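
The MDC pattern can be imitated in a few lines of plain Java to show what it buys you: context set once per request ends up as separate fields on every log line, which is what makes the output searchable in a log store. The class `LogContext` below is a hypothetical stand-in, not the SLF4J API, and its JSON escaping is deliberately minimal; a real setup would use a JSON encoder provided by the logging framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for MDC plus a JSON log encoder. */
final class LogContext {
    // One context map per thread, like MDC.
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(LinkedHashMap::new);

    static void put(String key, String value) {
        CTX.get().put(key, value);
    }

    static void clear() {
        CTX.remove();
    }

    /** Renders one log event as a JSON object, appending the current context fields. */
    static String format(String level, String message) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"level\":\"").append(escape(level)).append("\",");
        sb.append("\"message\":\"").append(escape(message)).append('"');
        for (Map.Entry<String, String> e : CTX.get().entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Minimal escaping (backslash and double quote only); not a full JSON encoder.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        try {
            put("userId", "u-42");
            put("orderId", "o-1001");
            // {"level":"INFO","message":"Creating order","userId":"u-42","orderId":"o-1001"}
            System.out.println(format("INFO", "Creating order"));
        } finally {
            clear(); // without this, the next request on this thread would inherit the fields
        }
        // {"level":"INFO","message":"No request context"}
        System.out.println(format("INFO", "No request context"));
    }
}
```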

Log levels, from least to most severe: TRACE < DEBUG < INFO < WARN < ERROR
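
The ordering matters because a logger configured at a given level emits that level and everything more severe. A minimal sketch of that check, using a hypothetical `LogLevel` enum rather than any framework's own level type:

```java
/** Hypothetical stand-in for a logging framework's level check. */
enum LogLevel {
    TRACE, DEBUG, INFO, WARN, ERROR; // declared in increasing severity

    /** True if a message at this level passes a logger configured at the given threshold. */
    boolean isEnabledAt(LogLevel threshold) {
        return this.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        // With the threshold at INFO, TRACE and DEBUG are dropped; INFO, WARN, ERROR are emitted.
        for (LogLevel level : values()) {
            System.out.println(level + " emitted at INFO threshold: " + level.isEnabledAt(INFO));
        }
    }
}
```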

ELK Stack: Elasticsearch (store and search) + Logstash (ingest and transform) + Kibana (visualize)
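
Deep Dive: Distributed Tracing

Distributed tracing works by passing a trace identifier along with each outgoing call, so that every service can tag its own span with the same trace ID and point back to the caller's span. The sketch below is a simplified, hypothetical `TraceContext` class in plain Java. It assumes a header laid out in the W3C Trace Context `traceparent` style (version, 32-hex trace ID, 16-hex span ID, flags); Jaeger and Zipkin also have their own native header formats, and real applications would rely on a tracing library rather than code like this.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, simplified trace context; real systems use a tracing library. */
final class TraceContext {
    final String traceId;      // 32 hex chars, shared by every span in the request
    final String spanId;       // 16 hex chars, unique per unit of work
    final String parentSpanId; // null for the first span of a trace

    private TraceContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    /** Starts a new trace at the edge of the system. */
    static TraceContext newRoot() {
        return new TraceContext(randomHex(32), randomHex(16), null);
    }

    /** Value to send with an outgoing call, in traceparent-style layout (sampled flag set). */
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    /** Called by the receiving service: same trace ID, new span, caller's span as parent. */
    static TraceContext fromHeader(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("malformed trace header: " + header);
        }
        return new TraceContext(parts[1], randomHex(16), parts[2]);
    }

    private static String randomHex(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(Character.forDigit(ThreadLocalRandom.current().nextInt(16), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TraceContext gateway = TraceContext.newRoot();
        TraceContext orderService = TraceContext.fromHeader(gateway.toHeader());
        System.out.println("same trace: " + gateway.traceId.equals(orderService.traceId));
        System.out.println("parent link: " + gateway.spanId.equals(orderService.parentSpanId));
    }
}
```

A tracing backend such as Jaeger or Zipkin can then group spans by trace ID and use the parent links to reconstruct the path of a request across services.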

Common Interview Questions
  • What are the four golden signals?
  • How do you monitor a microservices application?
  • What is the difference between metrics and logs?
  • What is distributed tracing?
  • How do you set up alerting?
  • What is the ELK stack?