Class 5. AI with Reins — Introduction to Reins Engineering


Quick Tips — Just Know This and You Can Command AI

The limitation of AI coding tools is having only fences (harness) without direction (reins). Linters, formatters, and CI say “don’t go outside” but not “go this way.” Code arrives at production “clean but wrong.”

Core principle: Don’t change the model — add contracts. The same model stops at 40 or completes 527 depending on the feedback structure. Waiting for a smarter model is rule by man. Adding verification loops is rule of law.

Check right now:

To the agent: “Clean up the code. Don’t change functionality.”

After AI finishes refactoring, check whether the rules you set (written in CLAUDE.md) remain intact. Have API paths changed? Have DB table names changed? Has the response format changed?

There’s a high probability something has changed. AI “tidies up,” treats your decisions as implementation details, and overwrites them. This happens because decisions are mixed into the code.

Remember the three pillars of Reins:

Deterministic feedback — Not “seems a bit off” but “line 41: field name mismatch”
Ratchet lock — Lock when passed, move on
Separation of decisions and implementation — Decisions in SSOT, code is disposable projection

If any one is missing, convergence breaks.

Hands-on Try

Open the Class 1 app with Claude Code. Then command:

To the agent: “Clean up the code. Don’t change functionality.”

After AI finishes refactoring, check whether the rules set in Class 1 (written in CLAUDE.md) remain intact. Have API paths changed? DB table names? Response format?

There’s a high probability something has changed. AI “tidies up” and treats your decisions as implementation details and overwrites them. This is the core of Class 5 — it happens because decisions and implementation are mixed inside code.
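The manual check above can also be sketched as a small script. This is only an illustration: the decision values (`/api/v1/todos`, `todo_items`, `"items"`) and the refactored snippet are hypothetical, and a plain substring search is an assumed, deliberately crude way to detect that a decision has disappeared.

```python
# Hypothetical decisions copied out of CLAUDE.md (illustrative values only).
DECISIONS = {
    "API path": "/api/v1/todos",
    "DB table name": "todo_items",
    "Response field": '"items"',
}


def find_missing_decisions(source: str, decisions: dict[str, str]) -> list[str]:
    """Return one factual message per decision whose literal text is gone."""
    return [
        f"{label}: expected to find {text!r} in the code, but it is missing"
        for label, text in decisions.items()
        if text not in source
    ]


# Hypothetical code after a "clean up" pass that quietly renamed the table.
refactored = '''
@app.get("/api/v1/todos")
def list_todos():
    rows = db.query("SELECT * FROM todos")
    return {"items": rows}
'''

report = find_missing_decisions(refactored, DECISIONS)
for line in report:
    print(line)
```

In this made-up case the API path and response field survive, but the table name does not, so the script reports exactly one missing decision.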

Why You Need to Command This Way

Previous Class Recap

In Class 4 we experienced yongol firsthand.

Separated decisions from code into 10 declarative specifications (SSOT)
One operationId threaded through 10 layers
287 rules caught cross-layer contradictions
Even 4.5B models converged to 0 errors with validate feedback

This class steps back and asks: why does this work?

yongol is a tool. Behind the tool is a principle. Understanding this principle lets you apply the same thinking even in situations without yongol.

Four Eras

AI coding has gone through four paradigm shifts. Each era was born from the limitations of the previous one.

1st Gen: Prompt Engineering — “Just speak well”

“Make a TODO app with React.” “Make a bulletin board with FastAPI.”

The era of thinking AI would build well if you write good prompts. It actually works — once. The second prompt produces a different pattern. The third breaks the first.

Limitation: One-time. Prompts vanish. In the next conversation, AI no longer knows what you decided.

2nd Gen: Context Engineering — “Just give good context”

Write CLAUDE.md. Write requirements.md. Write progress.md. AI reads these files at the start of every conversation and remembers previous decisions.

What we learned in Class 1 is this stage. If prompts are one-time, files are permanent. Externalizing decisions maintains context across sessions.

Limitation: Evaporation. Even with context, AI forgets earlier parts as conversations get long. If you feed 200 endpoints’ context at once, middle information gets missed. Context is given but compliance isn’t enforced.

3rd Gen: Harness Engineering — “Just cage it in structure”

Linters, formatters, CI/CD, project structure, coding guidelines. Fences preventing the agent from going outside.

What we learned in Class 3 is part of this stage. Hurl tests, Git, CI/CD — these are fences. CI rejects if AI breaks existing features.

Limitation: No direction. Fences say “don’t go outside” but not “go this way.” Whatever the agent does inside the fence — overwriting existing logic, changing types, skipping state transitions — linters pass. Formatters pass. CI passes. Code arrives at production “clean but wrong.”

4th Gen: Reins Engineering — “Give direction”

Not fences but reins.

What we experienced in Class 4 is this stage. yongol validate doesn’t say “don’t go outside” but “here’s the misalignment, fix this way.” Directional feedback. Deterministic facts. Contracts AI has no choice but to follow.

The Difference Between Fences and Reins

To intuitively understand this difference, imagine riding a horse.

Harness (fence): Build a fence around the ranch. The horse roams freely inside. It can graze, wander in circles, or sleep. The only guarantee is it can’t get outside. But there’s no guarantee of reaching the destination.

Reins: Mount the horse and hold the reins. Pull right, it goes right. Pull left, it goes left. The horse runs freely — but you control the direction. You reach the destination.

Most AI coding tools in the industry today are at the fence stage. Linters, formatters, CI/CD, coding guidelines — all say “do it within this boundary.” The agent spins inside the fence. Code is clean, but nobody knows if it’s heading in the intended direction.

Reins Engineering says “go this direction.” When validate says “misalignment here,” AI corrects in that direction. Reins reach the destination without restricting freedom.

Research shows code complexity permanently increases 41% after AI coding tool adoption¹, and delivery stability decreases as AI adoption grows². Evidence that fences alone aren’t enough.

The Three Pillars of Reins

Reins Engineering consists of three principles.

Pillar 1: Deterministic Feedback

Give AI facts, not opinions.

Bad feedback: “This seems a bit off”
Good feedback: “line 41: field name mismatch, expected ‘user_id’, got ‘userId’”

What’s the difference? Bad feedback leaves room for AI to flatter. “I think it looks right? This is a better pattern.” Good feedback has no room for flattery. Numbers and locations aren’t emotions.

The 1,000-word sorting experiment confirmed this quantitatively:

Feedback Method	Feedback Nature	Result
“Are you sure?”	Opinion	27pp accuracy drop
“There are errors”	Vague fact	6 → 10 — worsened
“There are 23 errors”	Quantitative fact	Improved to 1 error
“6 errors, they’re here”	Precise fact	0 errors — 100%

“Wrong” alone causes over-correction and worsens things. Error count creates a target for persistent search. Location enables perfect fixing.
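The three factual feedback levels in the table can be sketched for a sorting task. This is an assumed reconstruction, not the experiment's actual harness: counting an "error" as a word that sorts before its predecessor is my own simplification, and the word list is made up.

```python
def out_of_order_positions(words: list[str]) -> list[int]:
    """Indexes i where words[i] sorts before words[i - 1]."""
    return [i for i in range(1, len(words)) if words[i] < words[i - 1]]


def feedback(words: list[str], level: str) -> str:
    """Build feedback at one of three levels: vague, count, or precise."""
    bad = out_of_order_positions(words)
    if level == "vague":
        return "There are errors" if bad else "No errors"
    if level == "count":
        return f"There are {len(bad)} errors"
    # "precise": the count plus the exact location of every error
    details = ", ".join(
        f"index {i}: {words[i]!r} comes after {words[i - 1]!r}" for i in bad
    )
    return f"{len(bad)} errors: {details}"


attempt = ["apple", "cherry", "banana", "date", "fig", "elderberry"]
print(feedback(attempt, "vague"))    # There are errors
print(feedback(attempt, "count"))    # There are 2 errors
print(feedback(attempt, "precise"))
```

All three messages come from the same deterministic check. Only the amount of fact handed back to the model changes.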

What yongol validate does in Class 4 is exactly this. “SSaC’s CancelReservation calls Reservation.SoftDelete, but there’s no SoftDelete method in sqlc queries.” Not opinion. Fact. The only response AI can have is “I’ll fix it.”
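A cross-layer check of this kind can be sketched in a few lines. The data structures below are hypothetical stand-ins for two specification layers, not yongol's real formats or rules. They only show how a contradiction between layers turns into a factual message.

```python
# Hypothetical, simplified stand-ins for two layers of a specification.
service_calls = {  # operation -> model methods the service layer calls
    "CancelReservation": ["Reservation.SoftDelete"],
    "GetReservation": ["Reservation.FindByID"],
}
query_methods = {  # methods the query layer actually defines
    "Reservation.FindByID",
    "Reservation.Create",
}


def cross_check(calls: dict[str, list[str]], defined: set[str]) -> list[str]:
    """Report every call that points at a method the other layer lacks."""
    errors = []
    for operation, methods in sorted(calls.items()):
        for method in methods:
            if method not in defined:
                errors.append(
                    f"{operation} calls {method}, "
                    "but no such method is defined in the query layer"
                )
    return errors


errors = cross_check(service_calls, query_methods)
for message in errors:
    print(message)
```

The message names both sides of the contradiction, so the only sensible next step is to add the method or remove the call.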

Research shows “do TDD” procedural instructions actually worsen regression, while providing specific test files as context reduces regression by 70%³. Instructions don’t prevent regression — facts do.

Pillar 2: Ratchet Lock (Ratchet Pattern)

Lock when verification passes.

Think of a ratchet wrench. Teeth engage in one direction only. Turn and it goes forward; release and it stops but doesn’t go backward. Ratchet Pattern applies this mechanism to agent control.

Item 1: Mechanical verification → PASS → Lock → Next
Item 2: Mechanical verification → FAIL → Retry (with feedback)
Item 2: Mechanical verification → PASS → Lock → Next
...
Item N: PASS → Complete. Stop.

Three rules:

Show only one item at a time.
Must pass to unlock the next.
Stop when all pass.

Numbers show the actual difference:

Autonomous agent:  40 / 527  (7.6%)  — Agent declares "done"
Ratchet CLI:      527 / 527 (100%)  — Machine declares "487 still remaining"

Same model. Same project. The difference is who decides “done.”

In the autonomous agent, the LLM judges termination. LLMs are optimistic. They do 40 and “feel” it’s enough. In the ratchet, the machine judges termination. Machines don’t feel. They declare “not yet” until remaining items reach 0.
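The three rules can be sketched as a loop in which the machine, not the worker, decides when the job is done. Everything here is illustrative: `run_ratchet`, `stub_agent`, and `verify_upper` are hypothetical names, and the stub stands in for a real model call.

```python
from typing import Callable, Optional


def run_ratchet(items: list[str],
                attempt: Callable[[str, Optional[str]], str],
                verify: Callable[[str, str], Optional[str]],
                max_tries: int = 5) -> list[tuple[str, str]]:
    """Process items one at a time; lock each result once verification passes."""
    locked = []
    for item in items:                       # rule 1: one item at a time
        feedback = None
        for _ in range(max_tries):
            result = attempt(item, feedback)
            feedback = verify(item, result)  # None means PASS
            if feedback is None:
                locked.append((item, result))  # rule 2: lock, then move on
                break
        else:
            raise RuntimeError(
                f"{item}: still failing after {max_tries} tries: {feedback}"
            )
    return locked                            # rule 3: reached only when all pass


def stub_agent(item: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: the first try is wrong, the retry follows feedback.
    return item.upper() if feedback else item.lower()


def verify_upper(item: str, result: str) -> Optional[str]:
    expected = item.upper()
    return None if result == expected else f"expected {expected!r}, got {result!r}"


locked = run_ratchet(["a", "b", "c"], stub_agent, verify_upper)
print(f"{len(locked)} / 3 locked")
```

The worker never sees the full list and never gets to say "done". The loop ends only when every item has passed or an item exhausts its retries.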

Hurl tests from Class 3 are one form of ratchet. Lock when Hurl tests pass. Can’t delete existing tests when adding new features. The agent can freely change code but can’t change behavior. Only forward.

Pillar 3: Separation of Decisions and Implementation

What we experienced directly in Class 4. Let’s organize the principle once more.

Source code contains three things mixed:

User decisions — This column is BIGINT. This API is owner-access only.
Business logic — Pricing policies, workflows, state transitions.
Implementation details — Variable names, error handling, library calls.

When AI reads this code, it sees text where three things are interleaved. Say “refactor” and it mistakes your decisions for details and overwrites them.

Pulling decisions out of code eliminates this problem. Decisions live in SSOT, code is a disposable projection from SSOT. AI can’t mistake decisions for details. Decision survival becomes independent of model size.
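A minimal sketch of "decisions in SSOT, code as a disposable projection" follows. The `SSOT` dictionary and the `project_create_table` function are hypothetical. Real specifications would be richer, but the direction of flow is the same: the decision is edited in one place and the code is regenerated from it.

```python
# Hypothetical SSOT: decisions only, no implementation details.
SSOT = {
    "table": "reservations",
    "columns": {"id": "BIGINT", "owner_id": "BIGINT", "status": "TEXT"},
}


def project_create_table(spec: dict) -> str:
    """Project the decisions into SQL; the output can be thrown away and rebuilt."""
    cols = ",\n".join(
        f"  {name} {sql_type}" for name, sql_type in spec["columns"].items()
    )
    return f"CREATE TABLE {spec['table']} (\n{cols}\n);"


ddl = project_create_table(SSOT)
print(ddl)
```

If a refactoring pass mangles the generated SQL, nothing is lost. Running the projection again restores `id BIGINT`, because the decision never lived in the SQL.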

How Three Pillars Work Together

The three pillars don’t work separately. They work together.

Let’s decompose the yongol workflow from Class 4 into the three pillars:

AI edits SSOT                        ← Pillar 3: Separation of decisions and implementation
  ↓
yongol validate returns errors        ← Pillar 1: Deterministic feedback
  ↓
AI fixes errors
  ↓
validate passes → ratchet lock        ← Pillar 2: Ratchet lock
  ↓
On to next feature

Separation — What AI edits isn’t code but declarative specifications. Only decisions, no details.
Feedback — validate pinpoints contradictions precisely. “This field here differs from that field there.” It’s a fact.
Lock — Pass and move on. No going backward.

What happens if any are missing:

Feedback and lock without separation: feedback is precise, but nothing prevents AI from overwriting decisions.
Separation and lock without feedback: nobody knows what’s wrong, so AI fixes without direction.
Separation and feedback without lock: AI fixes then breaks again. Oscillation.

All three must be present for convergence to be guaranteed.

Symbolic Feedback Loop — Tracks Over Trains

Organizing everything so far into one structure:

LLM generates → Deterministic tool judges → Result fed back to LLM → Repeat

This is called the Symbolic Feedback Loop.

Why “Symbolic”? Because feedback consists of specific symbols like numbers, filenames, line numbers. Not “seems a bit off” but “line 41, should be user_id but says userId.” Not human natural-language judgment but machine deterministic judgment. That’s why it’s called Symbolic Feedback Loop.
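The loop can be sketched end to end with a stub in place of the model. The names `judge` and `stub_llm` and the two-line draft are hypothetical. The point is only that the judge is deterministic and speaks in line numbers and field names.

```python
import re

EXPECTED_FIELD = "user_id"   # hypothetical decision
WRONG_FIELD = "userId"       # the drift the judge looks for


def judge(source: str) -> list[str]:
    """Deterministic judge: the same input always yields the same messages."""
    return [
        f"line {n}: field name mismatch, "
        f"expected '{EXPECTED_FIELD}', got '{WRONG_FIELD}'"
        for n, line in enumerate(source.splitlines(), start=1)
        if re.search(rf"\b{WRONG_FIELD}\b", line)
    ]


def stub_llm(source: str, errors: list[str]) -> str:
    """Stand-in for a model call: fixes exactly the lines named in the feedback."""
    lines = source.splitlines()
    for err in errors:
        n = int(err.split(":")[0].removeprefix("line "))
        lines[n - 1] = lines[n - 1].replace(WRONG_FIELD, EXPECTED_FIELD)
    return "\n".join(lines)


draft = 'row = {"userId": 1}\nprint(row["userId"])'
rounds = 0
while (errors := judge(draft)):      # generate -> judge -> feed back -> repeat
    draft = stub_llm(draft, errors)
    rounds += 1

print(rounds, draft)
```

With a real model the fix step is probabilistic, but the exit condition stays the same: the loop ends when the judge returns an empty list, not when the model feels finished.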

The current industry mainstream is different. AI validates AI. An LLM writes code, another LLM reviews it. This is called LLM Feedback Loop.

Analogy: a drunk person asking their drunk friend “am I drunk?” Both are probabilistic, so errors accumulate.

Symbolic Feedback Loop is different. go test doesn’t hallucinate. yongol validate doesn’t flatter. Coverage measurement doesn’t lie. Deterministic tools give the same output for the same input every time.

Think about why coding agents work. In Claude Code, when AI writes code, it saves to filesystem, runs tests, and tries building. Deterministic gates (tests, builds, type checks) are inserted in this process. This inadvertently creates a Symbolic Feedback Loop. That’s why it works.

But why does it break? It breaks where tests are absent. Where deterministic gates are missing, AI judges probabilistically, and probabilistic judgments degrade multiplicatively.

If 97.7% accuracy steps are chained:

2 steps: 0.977 x 0.977 = 95.5%
5 steps: 0.977^5 = 89.0%
10 steps: 0.977^10 = 79.2%
100 steps: 0.977^100 = 9.8%

This is the mathematical explanation for why vibe coding crumbles at 200 endpoints. In small projects, there are few consecutive steps so probability holds up; in large projects, multiplication works catastrophically.
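The chained-accuracy arithmetic is easy to reproduce. The sketch below assumes, as the text does, that every step succeeds independently with the same probability. `chained_accuracy` is a name chosen for illustration.

```python
def chained_accuracy(p: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return p ** steps


for steps in (2, 5, 10, 100):
    print(f"{steps:>3} steps: {chained_accuracy(0.977, steps):.1%}")

#   2 steps: 95.5%
#   5 steps: 89.0%
#  10 steps: 79.2%
# 100 steps: 9.8%
```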

The ratchet solves this. Insert a deterministic gate at every step, and degradation resets. Running 10 steps at once makes multiplication catastrophic, but fixing with a ratchet at each step makes accuracy independent per step.
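A rough way to put numbers on this is to compare an ungated chain with a gated one. The sketch rests on two idealized assumptions that are mine, not the source's: the gate never passes a wrong result, and each retry succeeds independently with the same probability `p`. Under those assumptions the number of tries per step follows a geometric distribution with mean 1/p.

```python
def ungated_success(p: float, steps: int) -> float:
    """Chance that a chain with no gates ends with every step correct."""
    return p ** steps


def gated_expected_attempts(p: float, steps: int) -> float:
    """Expected total attempts when each step is retried until its gate passes.

    Assumes a perfect gate and independent retries (geometric distribution,
    mean 1/p attempts per step).
    """
    return steps / p


p, steps = 0.977, 100
print(f"ungated: {ungated_success(p, steps):.1%} chance of a fully correct run")
print(f"gated:   about {gated_expected_attempts(p, steps):.1f} attempts for 100 locked steps")
```

Under these assumptions the gated run costs only a few extra attempts, while the ungated run is correct less than one time in ten.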

Making trains faster matters less than laying tracks. Many people are making trains. Almost nobody is laying tracks yet.

“Don’t Change the Model — Add Contracts”

This is Reins Engineering’s core thesis.

The same model stops at 40 or completes 527. The same model causes drift when it edits raw code and converges to 0 errors when it edits SSOT. The same model stalls at 60-70% coverage without feedback and reaches 100% with it.

The difference is not the model but the feedback topology.

“Adding a verification loop is 10x cheaper than increasing model IQ.”

As an analogy: rule by man vs rule of law.

Rule by man depends on a wise king. If the king is smart, the country runs well; if stupid, it collapses. Waiting for a smarter model is rule by man. “GPT-6 will fix it.”

Rule of law depends on law. Whether the king is smart or stupid, with law the system works. Adding contracts is rule of law. “If validate catches it, whatever model converges.”

Humanity already knows this answer. Promises written in blood. The reason 8 billion humans coexist on one planet isn’t because humans are good — it’s because law exists.

AI is no different.

Constraints Are Contracts

For rule of law to work, three conditions are needed:

1. Verifiable. Whether a violation occurred can be mechanically determined.

2. Violation is defined. Not “don’t write bad code” but “if this field and that field have type mismatch, it’s a violation.” Discrete. Either violation or not.

3. Enforceable. Violation has consequences. When yongol validate fails, code generation is refused. A promise without consequences isn’t a promise — it’s a wish.
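The three conditions can be sketched as code. This is a hypothetical miniature, not how yongol is implemented: `Contract`, `type_match`, and `generate` are illustrative names, and the spec is a made-up dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Contract:
    name: str
    # Verifiable: a function decides mechanically. None means "no violation".
    check: Callable[[dict], Optional[str]]


class ContractViolation(Exception):
    """Raised when generation is refused."""


def type_match(spec: dict) -> Optional[str]:
    # Violation is defined: the two types either match or they do not.
    api, db = spec["api_type"], spec["db_type"]
    return None if api == db else f"type mismatch: API says {api}, DB says {db}"


def generate(spec: dict, contracts: list[Contract]) -> str:
    violations = [
        f"{c.name}: {msg}"
        for c in contracts
        if (msg := c.check(spec)) is not None
    ]
    if violations:
        # Enforceable: a violation has a consequence, so no code is produced.
        raise ContractViolation("; ".join(violations))
    return f"-- code generated for {spec['field']}"


contracts = [Contract("field-type", type_match)]
print(generate({"field": "id", "api_type": "BIGINT", "db_type": "BIGINT"}, contracts))
```

Calling `generate` with `api_type` set to `INT` and `db_type` set to `BIGINT` raises `ContractViolation` instead of returning code.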

Putting these three conditions in the same table reveals a pattern:

Domain	Promise	Verification	Violation Definition	Enforcement
Human society	Law	Trial	Statute	Penalty/Compensation
Programming	Type system	Compiler	Type error	Compile rejection
Code structure	filefunc	validate	22 rule violations	ERROR
AI coding	yongol validate	~287 rules	Cross-validation failure	Code generation refused

Every working system has promises. If promises are verifiable, violations are defined, and enforceable — the system converges.

Without promises, there’s chaos. AI produces different results every time. No termination condition. Drift accumulates.

With excessive promises, there’s oppression. Regulating everything kills flexibility. Forcing 10-line verification rules on a 3-line function inverts means and ends.

You need to find the golden ratio. Sufficient constraints without excess. yongol’s 287 rules are the minimal promise that “there must be no cross-layer contradictions.” AI writes freely within a layer. Contracts operate only between layers.

Sufficient freedom within sufficient order. That’s the golden ratio.

Sycophancy Bias Is a Bug? No, an Asset

The sycophancy bias most often cited as AI’s biggest flaw (the “tendency to agree with whatever the user says”) becomes an asset in the Reins structure. Give the model opinions and it flatters; give it deterministic facts and it meekly accepts them and actually fixes the problem. Why this helps in the ratchet structure is covered in detail in Class 7.

The Real Reason Coding Agents Work

In Class 2 we learned “why they break.” Now it’s time to understand “why they work.”

Same model. The model that hallucinates in web chat ships 200-line features in one go in Claude Code. The model didn’t suddenly get smarter. The structure changed.

Conversational AI loop:

LLM → Human → LLM → Human

Feedback is all natural language. Probabilistic generation followed by probabilistic evaluation.

Coding agent loop:

LLM → Code generation → File save → Test run → pass/fail → LLM

Deterministic gates are inserted in the loop. Filesystem saves exactly what’s written. Tests are pass or fail. Compilers say wrong when wrong.
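One such gate can be sketched with Python's built-in `compile`, which either accepts source text or raises `SyntaxError` with a line number. `GateResult` and `syntax_gate` are hypothetical names, and a real agent loop would chain several gates (build, type check, tests) in the same pass/fail style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    ok: bool
    line: Optional[int] = None
    message: str = ""


def syntax_gate(source: str) -> GateResult:
    """Pass or fail, never "seems fine": the compiler decides."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as err:
        return GateResult(ok=False, line=err.lineno, message=err.msg)
    return GateResult(ok=True)


good = "import math\ndef area(r):\n    return math.pi * r * r\n"
bad = "import math\ndef area(r)\n    return math.pi * r * r\n"  # missing colon

print(syntax_gate(good))
print(syntax_gate(bad))
```

The wording of the error message depends on the Python version, but the verdict and the line number are facts the model can act on.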

AI is an unstable part that produces different results every time. But placing stable rules on top of unstable parts is something engineering has always done.

The ocean churns but the lighthouse doesn’t move. Deliveries sometimes get lost but tracking numbers find them.

The reason coding agents work is the same. A deterministic verifier is placed on top of unstable AI.

Reins Engineering turns this accident into intention. Not “it happens to work because tests happen to be there” but “convergence is guaranteed by consciously designing verification gates.”

Summary — What to Remember from This Class

Four eras. Prompt → Context → Harness → Reins. Each era born from previous era’s limits. Fences (harness) don’t set direction. Reins set direction.
Three pillars. Deterministic feedback, ratchet lock (Ratchet Pattern), separation of decisions and implementation. All three must be present for convergence.
Symbolic Feedback Loop. LLM generates, deterministic tools judge, results fed back to LLM. Tracks matter more than trains.
Don’t change the model — add contracts. Same model stops at 40 or completes 527 depending on feedback topology. Not rule by man but rule of law.
Sycophancy bias is an asset. Give opinions and it flatters, give facts and it accepts. Deterministic feedback + sycophantic LLM = loop with guaranteed convergence.

Exercise: Distinguish Decisions from Details

Goal: Open your project mentally and find where decisions and details mix.

Think based on what you felt in the experience above.

Question 1: What are the “decisions” in your project?

Think back to things you told AI to “make.” Among them, things you designated “this must be this way” are decisions. For example:

“Login uses email” — Decision
“Password must be 8+ characters” — Decision
“Variable name is userEmail” — Detail

Write down 3 decisions and 3 details.

Question 2: Have decisions written in CLAUDE.md been overwritten in code?

You created CLAUDE.md in Class 1. There should be rules written there. Has AI ever ignored those rules and made things differently? If so, that’s exactly drift from decisions and details mixing.

Question 3: What if those decisions were outside code?

If those decisions were in separate specifications rather than code, could AI have overwritten them? What yongol does in Class 4 is exactly this. Pulling decisions out of code to fundamentally prevent AI from mistaking them for details and overwriting.

What you should have felt in this thought experiment:

The sense that decisions and details are different
Recognition that within code, these two are indistinguishable
Understanding that Class 4’s tool (yongol) and Class 5’s principle (Reins Engineering) are one structure

Next Class Preview

In Class 5 we understood the principles of Reins. In Class 6 we deep-dive into Ratchet Pattern. The concrete method of making an agent that stopped at 40 complete 527. Ratchet principles, application in bulk tasks, and practical implementation of the Symbolic Feedback Loop.

Reins Engineering Full Course

Class	Title
Class 0	Install Claude Code
Class 1	How to Command AI
Class 2	How to Distrust AI
Class 3	Apps That Don’t Break
Class 4	Decisions Outside Code
Class 5	AI with Reins
Class 6	Pass Then Lock
Class 7	Flipping Sycophancy
Class 8	The Agent’s Factory
Class 9	Automation Beyond Code
Class 10	The Law of Data
Class 11	How to Rescue Failed Vibe Coding

Sources

1. Carnegie Mellon University, MSR 2026: 41% permanent increase in code complexity after AI coding tool adoption.
2. Google DORA Report, 2025: 7.2% decrease in delivery stability per 25% increase in AI adoption.
3. TDAD, ACM AIWare 2026: “Do TDD” procedural instruction (6.08% → 9.94%) worsens regression; providing specific test files as context (6.08% → 1.82%) reduces regression by 70%.

Changelog

2026-05-24: Initial release
