Incident-to-fix agent — observability turned into draft PRs

architecture · 2026-04
My role: Designed the error-driven workflow and its human-in-the-loop guardrails

A scheduled agent that polls production error monitoring, investigates backtraces, and files draft PRs: a proposed fix, or an investigation report when it can’t fix with confidence. Two judgments shaped it. First, agents propose and humans decide: every PR is a draft, labelled by confidence. Second, the workflow is error-driven: it routes from the failing line back to the repo that owns it, a fundamentally different shape from repo-scanning agents. (Full four-part write-up to follow.)
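As a rough illustration of those two judgments, here is a minimal Python sketch, not the production design. Every name in it is hypothetical: the `REPO_BY_PREFIX` ownership table, the `FIX_THRESHOLD` cut-off, the repo names, and the `confidence:` label format are assumptions made for the example. It routes a backtrace to the first frame that belongs to a known repo, then plans a PR that is always a draft and becomes an investigation report when confidence is low.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical ownership table: source-path prefix -> owning repository.
REPO_BY_PREFIX = {
    "services/billing/": "org/billing",
    "services/auth/": "org/auth",
}

# Hypothetical cut-off: below this, file a report instead of a fix.
FIX_THRESHOLD = 0.8


@dataclass(frozen=True)
class Frame:
    path: str
    line: int


@dataclass(frozen=True)
class DraftPR:
    repo: str
    kind: str                # "fix" or "investigation"
    labels: Tuple[str, ...]
    draft: bool = True       # agents propose, humans decide


def owning_repo(backtrace: Sequence[Frame]) -> Optional[str]:
    """Walk frames from the failing line outward; return the first owner found."""
    for frame in backtrace:
        for prefix, repo in REPO_BY_PREFIX.items():
            if frame.path.startswith(prefix):
                return repo
    return None  # no known owner (for example, only third-party frames)


def plan_pr(backtrace: Sequence[Frame], confidence: float) -> Optional[DraftPR]:
    """Plan a draft PR in the owning repo, labelled by confidence."""
    repo = owning_repo(backtrace)
    if repo is None:
        return None
    confident = confidence >= FIX_THRESHOLD
    return DraftPR(
        repo=repo,
        kind="fix" if confident else "investigation",
        labels=(f"confidence:{'high' if confident else 'low'}",),
    )
```

The routing starts from the error rather than from a repository, so which repo the agent touches is decided by the backtrace, and a backtrace with no known owner produces no PR at all.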

ai-agents · automation · observability · reliability