Incident Checklist

Use this when something is broken.

  • What is the user impact?
  • Is data at risk?
  • Can we disable the broken path?
  • Can we roll back?
  • Who needs to know?
  • What changed recently?
  • Can we reproduce it?
  • Which boundary between components is failing?
  • What logs prove the failure?
  • What metrics changed?
  • Is the database healthy?
  • Are external services healthy?
  • What is the smallest safe mitigation?
  • What is the root cause fix?
  • What test or check proves the fix works?
  • What should be monitored afterwards?

Once the incident is resolved, record:

  • Timeline.
  • Impact.
  • Root cause.
  • Detection gap.
  • Fix.
  • Prevention.
  • Owner and deadline for follow-ups.
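
One way to keep the last seven items (timeline through follow-up owners) from being skipped is to hold them in a small record and check it for gaps before the incident is closed. The sketch below is only an illustration: the names IncidentRecord, FollowUp, and missing are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class FollowUp:
    # Hypothetical structure: one follow-up action from the checklist.
    action: str
    owner: str = ""
    deadline: Optional[date] = None


@dataclass
class IncidentRecord:
    # Hypothetical structure: one text field per write-up item in the checklist.
    timeline: str = ""
    impact: str = ""
    root_cause: str = ""
    detection_gap: str = ""
    fix: str = ""
    prevention: str = ""
    follow_ups: List[FollowUp] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of items that are still empty."""
        gaps = [
            f.name
            for f in fields(self)
            if f.name != "follow_ups" and not getattr(self, f.name).strip()
        ]
        # Every follow-up needs both an owner and a deadline.
        for i, item in enumerate(self.follow_ups):
            if not item.owner.strip():
                gaps.append(f"follow_ups[{i}].owner")
            if item.deadline is None:
                gaps.append(f"follow_ups[{i}].deadline")
        return gaps


# Made-up example: the detection gap and one deadline are not filled in yet.
record = IncidentRecord(
    timeline="Alert fired, rollback completed",
    impact="Some requests failed",
    root_cause="Bad configuration change",
    fix="Reverted the change",
    prevention="Validate configuration before deploy",
    follow_ups=[FollowUp(action="Add an alert", owner="on-call lead")],
)
print(record.missing())  # ['detection_gap', 'follow_ups[0].deadline']
```

An empty list from missing() means every item has been filled in. The check only confirms that something was written, not that it is accurate.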