
“All Done”
I asked an AI agent to write tests for 527 functions. The agent finished its work and reported back.
“Done.”
Functions that actually got tests: 40.
It wasn’t lying. After 40 functions, it decided “that’s enough.” When it hit a difficult function, it skipped it. After a few more, it concluded “the rest follow a similar pattern, so we’re good.”
LLMs are great at generation. But they cannot be trusted to judge whether they are finished.
The Ratchet
A ratchet wrench has teeth that catch in only one direction. Turn it and it moves forward. Let go and it stops — but it never moves backward.
The Ratchet Pattern applies this mechanism to agent control.
Item 1: mechanical verification → PASS → next
Item 2: mechanical verification → FAIL → retry (with feedback)
Item 2: mechanical verification → PASS → next
...
Item N: PASS → complete. Stop.
Three rules:
- Show only one item at a time.
- An item must pass before the next one opens.
- When all items pass, stop.
Implement these rules as a CLI, and the agent only needs to know one command: `next`. The machine decides the rest.
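The three rules can be sketched as a loop. This is an illustrative Python sketch, not a specific tool: the state-file name and the `generate`/`verify` callbacks are assumptions standing in for "the LLM" and "the mechanical check."

```python
import json
import os

STATE = "ratchet_state.json"  # hypothetical state file; any durable store works

def run_ratchet(items, generate, verify):
    """Drive the loop: one item at a time, forward only, machine decides 'done'."""
    # Resume from disk if a previous agent died mid-run.
    done = set()
    if os.path.exists(STATE):
        with open(STATE) as f:
            done = set(json.load(f))
    for item in items:
        if item in done:
            continue                              # a PASS is immutable: never reopened
        feedback = None
        while True:                               # retry this one item until it passes
            attempt = generate(item, feedback)    # LLM: generation only
            ok, feedback = verify(item, attempt)  # machine: judgment only
            if ok:
                break
        done.add(item)
        with open(STATE, "w") as f:               # progress survives agent crashes
            json.dump(sorted(done), f)
        print(f"{len(items) - len(done)} remaining")
    print("complete")                             # the machine, not the LLM, says so
```

Note that `generate` receives the verifier's feedback on retry, and the loop never consults the LLM about whether to continue.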
The Agent That Stopped at 40 vs. the Ratchet That Finished 527
Same model. Same project. Same 527 functions.
Autonomous agent: 40 / 527 (7.6%) — agent declared "done"
Ratchet CLI: 527 / 527 (100%) — at the 40-function mark, the machine simply declared "487 remaining"
The difference is not model performance. It is who decides when it’s over.
With an autonomous agent, the LLM decides when to stop. LLMs are optimistic. After 40, it feels like enough. With a ratchet, the machine decides when to stop. The machine doesn’t feel. It declares “not yet” until the remaining count hits zero.
One-Sentence Definition
Place a probabilistic agent inside a deterministic state machine.
| Role | Owner |
|---|---|
| Generation | LLM |
| Judgment | verifier |
| Progress control | ratchet |
Many systems hand generation, judgment, and termination decisions all to the LLM. The Ratchet separates them.
Five Principles
1. The termination condition is mechanical
pass/fail. Not “looks good.” If `go test` passes, it’s a PASS. If coverage hits 100%, it’s a PASS. There is no room for subjective judgment.
2. A PASS is immutable
Once an item passes, it never reopens. No rollback. The remaining work count decreases monotonically.
remaining_work(t+1) <= remaining_work(t)
What you build today doesn’t get torn apart tomorrow. Forward only. This is the fundamental difference from a “24-hour agent.” An agent running without a termination condition adds an abstraction today, removes it tomorrow, and adds it back the day after. The ratchet does not permit that kind of oscillation.
3. The LLM only generates
Generate code, write tests, propose fixes — that is the LLM’s role. What to fix, whether it passed, what comes next, whether it’s done — the machine decides all of that. The LLM is not a planner; it is a constrained generator.
4. Strip the agent of its right to declare completion
If the LLM says “done,” it stops at 40. If the machine says “done,” it stops at 527. The entire reason the ratchet exists is captured in this single line.
5. The verifier must be deterministic
Not everything qualifies as a verifier.
| Can be a verifier | Cannot be a verifier |
|---|---|
| `go test` | “looks cleaner” |
| coverage measurement | “seems better” |
| AST validation | “more scalable” |
| schema diff | “clean architecture” |
A verifier must satisfy four conditions: deterministic, machine-checkable, resumable, localized feedback. If these are not met, the ratchet’s teeth have nothing to catch on.
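As a concrete stand-in for the left column, here is what an AST-validation verifier might look like. A minimal sketch, assuming the rule being enforced is "every function has a docstring" (the rule itself is made up; what matters is the shape: deterministic, machine-checkable, and feedback localized to line numbers):

```python
import ast

def verify_docstrings(source: str):
    """Deterministic AST check: returns (ok, feedback) for one unit of work."""
    tree = ast.parse(source)
    missing = [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    ]
    if missing:
        # Localized feedback: the LLM knows exactly which lines to fix.
        return False, f"functions at lines {missing} lack docstrings"
    return True, "ok"
```

The same input always yields the same verdict, so the ratchet’s teeth have something to catch on; “looks cleaner” offers no such purchase.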
Feedback as a Gradient Signal
If the ratchet only returns “pass/fail,” the LLM corrects without direction. The more specific the feedback, the more precise the LLM’s corrections become.
Weak feedback: "test failed" → LLM corrects without direction
Medium feedback: "coverage 65%" → LLM roughly reinforces
Strong feedback: "line 41, 44, 70 uncovered" → LLM covers exactly those branches
Numbers verified in a real project:
Without feedback: stuck at 60-70% coverage
With feedback: 100% achieved (for reachable functions)
Same model. A single line — “line 41 not covered” — acts as a gradient signal.
As feedback resolution increases, the LLM’s correction accuracy rises, loop iterations decrease, and costs drop.
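The three tiers above can be made concrete. A hypothetical helper, assuming the verifier can report which executable lines actually ran — the same underlying data rendered at three resolutions:

```python
def feedback(executable: set[int], covered: set[int], level: str) -> str:
    """Render identical coverage data as weak, medium, or strong feedback."""
    if level == "weak":
        return "pass" if covered >= executable else "test failed"
    pct = 100 * len(covered & executable) // len(executable)
    if level == "medium":
        return f"coverage {pct}%"
    # Strong: localized feedback, a gradient signal the LLM can follow.
    uncovered = sorted(executable - covered)
    return f"coverage {pct}%; lines {uncovered} uncovered"
```

With `executable = {41, 44, 70, 71, 72}` and `covered = {71, 72}`, the weak tier says only "test failed", the medium tier "coverage 40%", and the strong tier "coverage 40%; lines [41, 44, 70] uncovered" — the version the LLM can act on directly.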
Agents Die. Progress Survives.
Agents inevitably crash. Token limits, network errors, session disconnects. If the ratchet persists progress to storage, the next agent picks up where the last one left off.
Agent A: processes functions 1-200 → dies
Agent B: next → continues from 201
Agent C: next → continues from 401
Agents are disposable. Progress accumulates.
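If the ratchet’s only mutable state is a persisted cursor, the handoff above is trivial. A sketch under that assumption (the file name and helpers are made up):

```python
import json
import os

CURSOR = "progress.json"  # hypothetical persisted cursor

def _load() -> int:
    if not os.path.exists(CURSOR):
        return 0
    with open(CURSOR) as f:
        return json.load(f)["done"]

def next_unfinished(items):
    """Return the next item to work on; a fresh agent resumes automatically."""
    done = _load()
    return items[done] if done < len(items) else None  # None => complete

def mark_passed():
    """Advance the cursor. Forward only: it is never decremented."""
    done = _load() + 1
    with open(CURSOR, "w") as f:
        json.dump({"done": done}, f)
```

Agent B needs no knowledge of Agent A. It calls `next_unfinished`, reads the cursor from storage, and continues — the ratchet, not the agent, holds the progress.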
Swap the Verifier, Get a Different Tool
The ratchet is not tied to any specific verifier. Change the verifier and you get a different tool.
| Ratchet + Verifier | Use case |
|---|---|
| Ratchet + `go test` + coverage | Per-function test generation |
| Ratchet + structural rule validator | Code structure cleanup |
| Ratchet + hurl pass/fail | API endpoint verification |
| Ratchet + spec cross-validation | SSOT consistency |
| Ratchet + Toulmin verdict | User-defined rule enforcement |
One pattern. The verifier determines the domain.
Questions
How many items did your agent complete before saying “all done”?
Was it truly done?
Who decided “done” — the agent, or the machine?
Related: Model IQ Matters Less Than Feedback Topology — The theoretical background of the Ratchet Pattern. Why feedback structure matters more than model performance.