Connor Holly

Autonomous multi-agent execution loop with three roles — planner, worker, and judge — coordinating through a shared plan file to deliver milestone-based work without human intervention.

The Pattern

Three agents collaborate through a single plan file that serves as shared state:

Planner --> Plan File --> Worker --> Output --> Judge
              ^                                  |
              |           approve/revise         |
              +----------------------------------+

Planner decomposes the goal into milestones (not tasks). Each milestone is a meaningful, verifiable deliverable — "authentication flow works end-to-end," not "write login function." The planner writes these to a plan file with status markers.
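
As a rough illustration of what the planner might write, here is a minimal sketch of a plan file. The JSON layout, the field names (id, outcome, acceptance, status), and the "pending" marker are all assumptions made for this example; the pattern itself only requires milestones with status markers.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical status marker; the pattern does not prescribe specific values.
PENDING = "pending"

@dataclass
class Milestone:
    id: str
    outcome: str       # outcome-scoped deliverable, not an implementation task
    acceptance: str    # what the judge checks to verify the outcome
    status: str = PENDING

def render_plan(goal, milestones):
    """Serialize the goal and its milestones as JSON text for the plan file."""
    return json.dumps(
        {"goal": goal, "milestones": [asdict(m) for m in milestones]},
        indent=2,
    )

plan_text = render_plan(
    "Ship user accounts",
    [
        Milestone("m1", "Authentication flow works end-to-end",
                  "A new user can sign up, log in, and log out"),
        Milestone("m2", "Password reset works end-to-end",
                  "A user can request a reset link and set a new password"),
    ],
)
print(plan_text)
```

Each entry names an outcome and a way to check it, which is what lets a judge verify the milestone without reading the implementation.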

Worker picks up one milestone at a time, executes it, and writes a completion marker. Workers are stateless between milestones — they read the plan file fresh each time.
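
The stateless worker step could look something like the sketch below. It assumes a JSON plan file with "pending" and "done" markers, and `execute` is a hypothetical callable standing in for whatever agent actually does the work; none of these names come from the pattern itself.

```python
import json
import tempfile
from pathlib import Path

def work_one_milestone(plan_path, execute):
    """Read the plan fresh, run the first pending milestone, mark it done."""
    plan = json.loads(Path(plan_path).read_text())  # no state carried between calls
    for milestone in plan["milestones"]:
        if milestone["status"] == "pending":
            milestone["output"] = execute(milestone)  # hypothetical agent call
            milestone["status"] = "done"              # completion marker
            Path(plan_path).write_text(json.dumps(plan, indent=2))
            return milestone["id"]
    return None  # nothing left to pick up

# Usage: three calls against a two-milestone plan.
with tempfile.TemporaryDirectory() as tmp:
    plan_path = Path(tmp) / "plan.json"
    plan_path.write_text(json.dumps({"milestones": [
        {"id": "m1", "status": "pending"},
        {"id": "m2", "status": "pending"},
    ]}))
    handled = [
        work_one_milestone(plan_path, lambda m: f"output for {m['id']}")
        for _ in range(3)
    ]
    final_plan = json.loads(plan_path.read_text())
```

Because every call starts by re-reading the file, a restarted worker behaves exactly like one that never stopped.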

Judge reviews the worker's output against the milestone definition. Three possible outcomes: approve (move to next milestone), request revision (worker retries with feedback), or escalate (flag for human attention). The judge writes its verdict back to the plan file.
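
The three outcomes map naturally onto a small enum. The sketch below is one possible way to record a verdict on a milestone; the status strings ("approved", "pending", "needs_human") and the dict layout are assumptions for illustration only.

```python
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    ESCALATE = "escalate"

def apply_verdict(milestone, verdict, feedback=""):
    """Record the judge's verdict on a milestone dict and return it."""
    milestone["verdict"] = verdict.value
    if verdict is Verdict.APPROVE:
        milestone["status"] = "approved"      # loop moves to the next milestone
    elif verdict is Verdict.REVISE:
        milestone["status"] = "pending"       # worker retries
        milestone["feedback"] = feedback      # retry is guided by the judge's notes
    else:
        milestone["status"] = "needs_human"   # flagged for human attention
    return milestone

revised = apply_verdict(
    {"id": "m1", "status": "done"},
    Verdict.REVISE,
    feedback="Logout does not clear the session",
)
approved = apply_verdict({"id": "m2", "status": "done"}, Verdict.APPROVE)
escalated = apply_verdict({"id": "m3", "status": "done"}, Verdict.ESCALATE)
```

In a full loop the updated milestone would then be written back to the plan file so the next worker pass sees the feedback.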

The loop terminates when all milestones are approved or a completion promise marker (e.g., <promise>COMPLETE</promise>) is emitted. A max iteration ceiling prevents runaway loops.
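
A minimal sketch of the termination logic follows. The function name `run_loop`, the in-memory milestone list, the plain-string verdicts, and the default ceiling of 20 are all assumptions; `worker` and `judge` are hypothetical callables standing in for the agents.

```python
PROMISE = "<promise>COMPLETE</promise>"

def run_loop(milestones, worker, judge, max_iterations=20):
    """Run worker and judge until everything is approved, the promise
    marker appears, or the iteration ceiling is reached."""
    iterations = 0
    while iterations < max_iterations:
        pending = [m for m in milestones if m["status"] != "approved"]
        if not pending:
            return "all_approved", iterations
        iterations += 1
        current = pending[0]
        output = worker(current)
        if PROMISE in output:
            return "promise", iterations
        if judge(current, output) == "approve":
            current["status"] = "approved"
    # Ceiling reached: distinguish "finished on the last pass" from "gave up".
    done = all(m["status"] == "approved" for m in milestones)
    return ("all_approved" if done else "max_iterations"), iterations

# Usage: a judge that approves each milestone on its second attempt.
attempts = {}

def demo_worker(m):
    attempts[m["id"]] = attempts.get(m["id"], 0) + 1
    return f"draft {attempts[m['id']]} of {m['id']}"

def demo_judge(m, output):
    return "approve" if attempts[m["id"]] >= 2 else "revise"

result = run_loop(
    [{"id": "m1", "status": "pending"}, {"id": "m2", "status": "pending"}],
    demo_worker, demo_judge, max_iterations=10,
)
print(result)  # → ('all_approved', 4)
```

Returning the reason alongside the iteration count makes it easy to tell a clean finish from a loop that hit the ceiling.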

Key Decisions

Milestones over tasks. Tasks are implementation-scoped ("write function X"). Milestones are outcome-scoped ("users can authenticate"). This matters because the judge can verify outcomes without understanding implementation details.

Plan file as source of truth. No message passing, no shared memory, no event bus. One file that every agent reads and writes. Simple to debug, simple to recover. If an agent crashes, the plan file tells the next agent exactly where things stand.
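
To make the recovery claim concrete, here is a sketch of reading "where things stand" from the file alone. The atomic write through a temporary file and `os.replace` is an addition of this example, not something the pattern prescribes; it is one way to keep a crash mid-write from leaving a half-written plan. The JSON layout and status names are likewise assumed.

```python
import json
import os
import tempfile

def save_plan(path, plan):
    """Write the plan to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, suffix=".tmp"
    ) as tmp:
        json.dump(plan, tmp, indent=2)
        tmp_name = tmp.name
    os.replace(tmp_name, path)  # readers see the old file or the new one, never a mix

def where_things_stand(path):
    """Return the id of the first unapproved milestone, or None if all are approved."""
    with open(path) as f:
        plan = json.load(f)
    for milestone in plan["milestones"]:
        if milestone["status"] != "approved":
            return milestone["id"]
    return None

# Usage: a fresh agent resumes using nothing but the file.
with tempfile.TemporaryDirectory() as tmp_dir:
    plan_path = os.path.join(tmp_dir, "plan.json")
    save_plan(plan_path, {"milestones": [
        {"id": "m1", "status": "approved"},
        {"id": "m2", "status": "pending"},
    ]})
    resume_at = where_things_stand(plan_path)
    leftover = [name for name in os.listdir(tmp_dir) if name.endswith(".tmp")]
```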

Judge loop over single-pass. Without a judge, you get first-draft quality. The judge creates a revision cycle that catches errors the worker missed. The tradeoff is slower execution — each milestone takes 2-3 passes instead of 1. Worth it for anything that needs to actually work.

Recovery by simplification. When a worker fails repeatedly on a milestone, the judge can split it into smaller milestones or simplify the acceptance criteria rather than request yet another revision. Shrinking the scope is better than retrying the same scope indefinitely.
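
Splitting a stuck milestone might be sketched as follows. The failure counter, the threshold of three, and the dotted sub-ids are assumptions for this example; in practice the smaller outcomes would come from the judge rather than a hard-coded list.

```python
def split_if_stuck(milestones, index, sub_outcomes, max_failures=3):
    """Replace a repeatedly failing milestone with smaller ones.

    Returns the list unchanged while the failure count is below the threshold.
    """
    stuck = milestones[index]
    if stuck.get("failures", 0) < max_failures:
        return milestones
    smaller = [
        {"id": f"{stuck['id']}.{n}", "outcome": outcome,
         "status": "pending", "failures": 0}
        for n, outcome in enumerate(sub_outcomes, start=1)
    ]
    return milestones[:index] + smaller + milestones[index + 1:]

plan = [
    {"id": "m1", "outcome": "Users can authenticate",
     "status": "pending", "failures": 3},
    {"id": "m2", "outcome": "Users can reset passwords",
     "status": "pending", "failures": 0},
]
new_plan = split_if_stuck(
    plan, 0, ["Users can sign up", "Users can log in and log out"]
)
print([m["id"] for m in new_plan])  # → ['m1.1', 'm1.2', 'm2']
```

The replacement milestones start with a fresh failure count, so each smaller scope gets its own retry budget.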

When to Use It

This pattern fits autonomous code generation, document creation, data pipeline construction — any multi-step work where you want AI to self-correct without human babysitting. The judge loop is the key differentiator from simpler orchestration: it turns single-pass generation into iterative refinement.

The pattern is overkill for anything a single prompt can handle. The coordination overhead pays off only when the work has three or more distinct phases that benefit from independent verification.