During an incident nobody has spare hands to take notes. A copilot watches the same signals you do and writes down what happened, with timestamps, while you fix it.
# incident-copilot
watch:
- logs: service=checkout level>=error
- metrics: p99_latency, error_rate, saturation
- alerts: pagerduty incident feed
on_event: append to timeline (UTC, newest last)
name the system, never the person
on_resolve: hand the timeline to prompts/postmortem.mdNotes you didn’t have to take#
The value is not real-time analysis — it is the record. When the incident closes, you have a timestamped account instead of a foggy memory and a scrollback you have to reconstruct.
It feeds the postmortem#
The copilot’s output is the exact input the postmortem prompt expects. Two tools, one pipeline: the incident produces its own writeup as a byproduct of being handled.
Guardrails#
The copilot should not diagnose beyond the evidence it can see. It can capture that checkout errors spiked at 14:03 UTC, that a deployment happened at 13:58 UTC, and that p99 latency recovered after a queue drain. It should not guess who caused it or invent a root cause before the team has one.
Useful incident assistance is boring: timestamp, source, system, observed change, link. That makes it safe to hand into the postmortem prompt later. The model is there to preserve facts while humans make the operational decisions.
When it should stay quiet#
During mitigation, the copilot should not flood the channel with theories. It should append concise timeline entries and wait for clear evidence before naming relationships.
The best output is useful after the incident: what changed, when it changed, what recovered, and which signals were missing. That is what turns the incident into better instrumentation and runbooks.