Reliability Overview
Reliability means building systems that keep operating correctly even when individual components fail.
Key Concepts
| Concept | Description |
|---|---|
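| Redundancy | Run duplicate copies of critical components so that no single failure stops the system |
| Failover | Switch automatically to a backup when the primary fails |
| Circuit breaker | Stop calling a failing service for a period so that failures do not cascade |
| Graceful degradation | Reduce functionality instead of crashing |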
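
As a rough illustration of the last two rows, here is a minimal circuit breaker combined with a fallback value. This is a sketch under stated assumptions, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, and `fetch_with_fallback`, and the `failure_threshold` and `reset_timeout` parameters, are all hypothetical and chosen only for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    # Hypothetical helper: opens after `failure_threshold` consecutive
    # failures and allows a trial call once `reset_timeout` seconds pass.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: skip the call entirely.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the timer.
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def fetch_with_fallback(breaker, fetch, fallback_value):
    """Graceful degradation: return a reduced result instead of crashing."""
    try:
        return breaker.call(fetch)
    except Exception:
        return fallback_value
```

A caller might wrap a remote lookup as `fetch_with_fallback(breaker, load_recommendations, [])`, where `load_recommendations` is a placeholder for whatever downstream call is being protected. While the breaker is open, the caller gets the empty list immediately and the failing service receives no traffic.
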
Observability
Three pillars for understanding production systems:
- Metrics — numeric measurements over time
- Logs — discrete event records
- Traces — request flow across services
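
To make the three signals concrete, the sketch below emits one of each while handling a single request, using only the Python standard library. It is an illustrative toy, assuming in-memory storage; real systems typically send these signals to dedicated backends. The names `metrics`, `logs`, `spans`, `log_event`, `span`, and `handle_request` are hypothetical.

```python
import json
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

metrics = defaultdict(float)  # metric name -> running total
logs = []                     # discrete event records, stored as JSON strings
spans = []                    # finished trace spans


def log_event(message, **fields):
    # A log is a timestamped record of one event.
    logs.append(json.dumps({"ts": time.time(), "msg": message, **fields}))


@contextmanager
def span(name, trace_id, parent_id=None):
    # A span times one step of a request; spans sharing a trace_id
    # describe the request's path, linked by parent_id.
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(user_id):
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id) as root_id:
        metrics["requests_total"] += 1  # metric: a count over time
        log_event("request received", trace_id=trace_id, user_id=user_id)
        with span("load_profile", trace_id, parent_id=root_id):
            profile = {"user_id": user_id}  # stand-in for a downstream call
    return profile
```

Including the `trace_id` in the log record is a deliberate choice here: it lets a log line be matched to the trace for the same request, which is one common way the three signals are correlated.
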