When something breaks in production, the goal is to restore service as quickly as possible — then understand why.Documentation Index
Fetch the complete documentation index at: https://docs.yupvid.com/llms.txt
Use this file to discover all available pages before exploring further.
Severity levels
| Level | Description | Response target |
|---|---|---|
| SEV-1 | Production down or data loss | Immediate, all hands |
| SEV-2 | Significant degradation, major feature broken | Within 30 minutes |
| SEV-3 | Minor degradation, workaround available | Within 2 hours |
| SEV-4 | Cosmetic or low-impact issue | Next business day |
Responding to an alert
- Acknowledge the alert in your alerting tool to signal you’re on it.
- Assess severity — is this SEV-1/2 or lower?
- Open a war room — for SEV-1/2, create a Slack thread in
#incidentsand invite your on-call partner. - Mitigate first — roll back, disable a feature flag, or scale up before diagnosing root cause.
- Communicate — post updates to
#incidentsevery 15 minutes until resolved. - Resolve and document — mark the incident resolved and file a postmortem for SEV-1/2.
On-call expectations
Rotation schedule, escalation paths, and what to do when you’re paged.
Postmortem process
How to write a blameless postmortem and drive follow-through.