Monitoring

Monitoring tracks system health through metrics, logs, and traces, and alerts operators when those signals indicate a problem.

The Three Pillars

Pillar    Tool examples            Purpose
Metrics   Prometheus, Grafana      Numeric trends and dashboards
Logs      ELK, Loki, journalctl    Event investigation
Traces    Jaeger, Zipkin           Request flow across services

Key Metrics

  • Latency — response time percentiles (p50, p95, p99)
  • Traffic — requests per second
  • Errors — error rate percentage
  • Saturation — CPU, memory, disk usage
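
As a rough illustration of how the first three metrics above could be derived from raw request data, here is a minimal Python sketch. The function names (percentile, summarize), the (latency_ms, status_code) tuple shape, and the choice to count HTTP 5xx responses as errors are all assumptions made for this example, not part of any particular monitoring tool. Saturation is left out because it usually comes from host-level gauges rather than request records.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so p=0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_seconds):
    """Summarize one observation window.

    requests: list of (latency_ms, status_code) tuples (hypothetical shape).
    window_seconds: length of the window the requests were collected over.
    """
    if not requests:
        raise ValueError("need at least one request")
    latencies = [latency for latency, _ in requests]
    # Assumption: only 5xx status codes count as errors.
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "requests_per_second": len(requests) / window_seconds,
        "error_rate_percent": 100 * errors / len(requests),
    }
```

Nearest-rank is used here because it is easy to check by hand. Systems such as Prometheus typically estimate percentiles from histogram buckets instead of keeping every sample, so the values they report are approximations.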