When something breaks in production, the goal is to restore service as quickly as possible — then understand why.

Severity levels

Level   Description                                     Response target
SEV-1   Production down or data loss                    Immediate, all hands
SEV-2   Significant degradation, major feature broken   Within 30 minutes
SEV-3   Minor degradation, workaround available         Within 2 hours
SEV-4   Cosmetic or low-impact issue                    Next business day
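
If you want these targets available to tooling (for example, a bot that flags overdue responses), the table above could be encoded roughly as follows. This is a minimal sketch, not an existing integration: the names `Severity`, `RESPONSE_TARGET`, and `respond_by` are hypothetical, and treating "Immediate" as a zero-length window and "Next business day" as "no fixed duration" are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    SEV1 = 1  # Production down or data loss
    SEV2 = 2  # Significant degradation, major feature broken
    SEV3 = 3  # Minor degradation, workaround available
    SEV4 = 4  # Cosmetic or low-impact issue


# Response targets taken from the severity table.
# Assumption: "Immediate" is modeled as a zero-length window, and
# "Next business day" has no fixed duration, so it is stored as None.
RESPONSE_TARGET = {
    Severity.SEV1: timedelta(0),
    Severity.SEV2: timedelta(minutes=30),
    Severity.SEV3: timedelta(hours=2),
    Severity.SEV4: None,
}


def respond_by(severity: Severity, alerted_at: datetime) -> Optional[datetime]:
    """Return the latest time a response should begin, or None for SEV-4."""
    target = RESPONSE_TARGET[severity]
    return None if target is None else alerted_at + target
```

For example, `respond_by(Severity.SEV3, alerted_at)` returns a time two hours after `alerted_at`. SEV-4 returns `None` because computing the next business day depends on a team calendar that this sketch does not model.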

Responding to an alert

  1. Acknowledge the alert in your alerting tool to signal you’re on it.
  2. Assess severity — is this SEV-1/2 or lower?
  3. Open a war room — for SEV-1/2, create a Slack thread in #incidents and invite your on-call partner.
  4. Mitigate first — roll back, disable a feature flag, or scale up before diagnosing root cause.
  5. Communicate — post updates to #incidents every 15 minutes until resolved.
  6. Resolve and document — mark the incident resolved and file a postmortem for SEV-1/2.
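
The severity-dependent parts of the steps above can be sketched as a few small helpers: which incidents get a war room, which require a postmortem, and when the next status update is due. The function names are hypothetical, severity is passed as a plain integer (1 for SEV-1, and so on), and applying the 15-minute cadence to every open incident regardless of severity is an assumption, since the steps do not restrict it.

```python
from datetime import datetime, timedelta

# Step 5: post updates to #incidents every 15 minutes until resolved.
UPDATE_INTERVAL = timedelta(minutes=15)

# Steps 3 and 6 apply only to SEV-1 and SEV-2.
HIGH_SEVERITY = (1, 2)


def needs_war_room(sev: int) -> bool:
    """Step 3: a Slack thread in #incidents is opened for SEV-1/2."""
    return sev in HIGH_SEVERITY


def needs_postmortem(sev: int) -> bool:
    """Step 6: a postmortem is filed for SEV-1/2."""
    return sev in HIGH_SEVERITY


def next_update_due(last_update: datetime) -> datetime:
    """Step 5: the next status update is due 15 minutes after the last one."""
    return last_update + UPDATE_INTERVAL
```

A reminder bot could call `next_update_due` after each post in the incident thread and nudge the responder if that time passes without a new update.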

On-call expectations

Rotation schedule, escalation paths, and what to do when you’re paged.

Postmortem process

How to write a blameless postmortem and drive follow-through.
Last modified on May 4, 2026