The Most Sycophantic Model Is the Most Obedient
The Biggest Flaw Becomes the Biggest Asset
LLM sycophancy bias is a problem the AI industry wants to fix. When a user asks “Are you sure?”, a model will often flip a correct answer to an incorrect one. The average capitulation rate across frontier models is 58%, and once sycophancy kicks in, it persists for the rest of the conversation with 78.5% probability.
But what happens if you flip this flaw on its head?
The essence of sycophancy bias is instruction following. Models trained with RLHF are optimized to comply with user feedback. The IFEval benchmark measures exactly this – “Does it do what it is told?”
The problem arises when the user provides opinions. “Is this right?” -> “Yes, it is” (sycophancy). “Are you sure?” -> “Oh, I was wrong” (capitulation).
But when the user provides deterministic facts, something different happens.
Give an Opinion, Get Sycophancy. Give a Fact, Get a Fix
In a 1,000-word sorting experiment, only the feedback style was varied for the same output:
| Feedback | Nature | Result |
|---|---|---|
| “Are you sure?” | Opinion | Flipped a correct answer – accuracy dropped 27pp |
| “There are errors” | Vague fact | Over-correction – 6 errors worsened to 10 |
| “There are 23 errors” | Quantitative fact | Improved to 1 error |
| “6 errors, here they are” | Precise fact | 0 errors – 100% achieved |
Give an opinion, and sycophancy bias activates. Give a fact, and there is nothing to be sycophantic about – numbers and positions are not emotions.
Sycophancy bias is loyalty aimed in the wrong direction. Redirect it – facts instead of opinions, verification results instead of praise – and that loyalty becomes an engine that drives accuracy upward.
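As a minimal sketch of what "precise fact" feedback could look like in practice, the snippet below turns a list of validator findings into a message that states only counts and positions. The `ValidationError` type, its field names, and the second sample error are assumptions made for illustration; they are not the format used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationError:
    """One deterministic finding. The field names are hypothetical."""
    line: int
    message: str


def format_feedback(errors: list[ValidationError]) -> str:
    """Render findings as countable, locatable facts with no opinion words."""
    if not errors:
        return "0 errors."
    noun = "error" if len(errors) == 1 else "errors"
    lines = [f"{len(errors)} {noun}, here they are:"]
    # Sort by line number so the model can walk the file top to bottom.
    for err in sorted(errors, key=lambda e: e.line):
        lines.append(f"- line {err.line}: {err.message}")
    return "\n".join(lines)


if __name__ == "__main__":
    found = [
        ValidationError(41, "field name mismatch, expected 'user_id', got 'userId'"),
        ValidationError(12, "missing required field 'email'"),  # illustrative only
    ]
    print(format_feedback(found))
```

The message contains nothing the model could agree or disagree with, only a count and a list of locations to fix.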
Evidence: A 4.5B Model Accepts Feedback
This is not theory. It was confirmed in experiments using yongol validate.
Experiment design:
- Target: A single SaaS backend Login endpoint
- Task: Write 9 single-source-of-truth (SSOT) files (DDL, OpenAPI, Rego, SSaC, etc.)
- Metric: Error count at initial generation (R1) -> Error count after feedback (R2)
Feedback Only, No Examples
| Model | R1 Errors | R2 Errors | Result |
|---|---|---|---|
| Grok 4.3 | 1 | 1 | Could not fix |
| Gemini 2.5 Flash | 1 | 1 | Could not fix |
| Local 20B | 1 | 1 | Could not fix |
Total failure. The models appeared to accept the feedback, but in reality they did not know what to write.
Examples + Feedback Together
| Model | R1 Errors | R2 Errors | Result |
|---|---|---|---|
| Grok 4.3 | 0 | – | Passed on first attempt |
| Gemini 2.5 Flash | 1 | 0 | Fixed with 1 round of feedback |
| Gemma4 4.5B (local) | Errors | 0 | Fixed with 1 round of feedback |
| Qwen3 8B (local) | Errors | 0 | Fixed with 1 round of feedback |
Even a 4.5B local model corrects itself when given examples + deterministic feedback.
Key Finding: The Bottleneck Is Context, Not Intelligence
The accurate diagnosis was not “it cannot incorporate feedback” but “it does not know what to write.” SSaC is a yongol-specific grammar absent from pretraining data. Adding 3 lines of examples to the prompt yielded 0 errors from Grok on the first attempt, 0 errors from Gemini after 1 feedback round, and 0 errors from the 4.5B local model after 1 feedback round.
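One way to picture "examples + feedback together" is as a prompt with three parts: the task, a few reference examples, and the validator's findings from the previous round. The sketch below is an assumption about how such a prompt might be assembled; the section headings and the `build_prompt` name are hypothetical, and the example strings are placeholders rather than real SSaC.

```python
from typing import Optional


def build_prompt(task: str, examples: list[str],
                 feedback: Optional[str] = None) -> str:
    """Assemble a prompt: task, then reference examples, then validator feedback."""
    parts = ["## Task", task.strip()]
    if examples:
        # Examples supply the grammar the model never saw in pretraining.
        parts.append("## Examples of valid output")
        parts.extend(ex.strip() for ex in examples)
    if feedback:
        # Feedback is only present from the second round onward.
        parts.append("## Validator feedback on your previous attempt")
        parts.append(feedback.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        task="Write the spec file for the Login endpoint.",
        examples=["<example line 1>", "<example line 2>", "<example line 3>"],
        feedback="1 error, here it is:\n- line 3: unknown keyword",
    )
    print(prompt)
```

Keeping the examples in every round, not just the first, means the model sees both what was wrong and what right looks like each time it retries.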
This suggests that the higher a model scores on IFEval – that is, the better it is at being sycophantic – the more readily it accepts deterministic feedback.
Ratchet Code: A Code Generation Method That Exploits Sycophancy Bias
Turn this discovery into a system and you get ratchet code.
┌──────────────────────────────────────────────┐
│ LLM: Generate code                           │
│      (probabilistic, sycophantic)            │
│   ↓                                          │
│ Validator: Deterministic verification        │
│   ↓                                          │
│ Errors? → Feed errors + examples to LLM      │
│   ↓                                          │
│ LLM: "Yes, I'll fix it"                      │
│      (sycophancy = acceptance)               │
│   ↓                                          │
│ Validator: Verify again                      │
│   ↓                                          │
│ Pass? → Ratchet locks. Move to next file.    │
└──────────────────────────────────────────────┘
Sycophancy bias becomes the force that closes the loop. The loop converges because the LLM does not push back with “No, I’m correct” but complies with “Yes, I’ll fix it.”
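The loop in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not yongol's implementation: `generate` and `validate` are hypothetical callables standing in for the LLM call and the validator, the file name is made up, and the `max_rounds` cap is a safety net we add here so that a model that never complies cannot spin forever.

```python
from typing import Callable, Optional

# Hypothetical interfaces, not a real API:
Generate = Callable[[str, Optional[str]], str]  # (file name, feedback) -> content
Validate = Callable[[str, str], list[str]]      # (file name, content) -> error messages


def ratchet(files: list[str], generate: Generate, validate: Validate,
            max_rounds: int = 3) -> dict[str, str]:
    """Generate each file, feed validator errors back, lock files that pass."""
    locked: dict[str, str] = {}
    for name in files:
        feedback: Optional[str] = None
        for _ in range(max_rounds):
            content = generate(name, feedback)
            errors = validate(name, content)
            if not errors:
                # The validator, not the model, declares the pass.
                locked[name] = content
                break
            feedback = f"{len(errors)} error(s):\n" + "\n".join(errors)
        else:
            raise RuntimeError(f"{name}: still failing after {max_rounds} rounds")
    return locked


def fake_generate(name: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: wrong on the first try, compliant once given feedback.
    return "user_id" if feedback else "userId"


def fake_validate(name: str, content: str) -> list[str]:
    # Stand-in for a deterministic validator with a single rule.
    if content == "user_id":
        return []
    return [f"line 1: field name mismatch, expected 'user_id', got '{content}'"]


if __name__ == "__main__":
    print(ratchet(["login.sql"], fake_generate, fake_validate))
```

The stub model fails round one and passes round two, which mirrors the R1 -> R2 pattern in the tables above.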
Three Conditions for Convergence
Feedback must be deterministic fact. Not “this looks a bit odd” but “line 41: field name mismatch, expected ‘user_id’, got ‘userId’”. Feedback of this kind leaves no room for sycophancy.
Examples must be in the context. Feedback alone is not enough. The model needs examples showing “this is what the code should look like” to orient itself. It is a matter of context, not intelligence.
Once verification passes, it cannot be reversed. This is the ratchet’s tooth. A file that has passed is locked, and the process moves on to the next one. It is not the agent declaring “I’m done” – it is the validator ruling “this file passes.”
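The third condition, irreversibility, can be made concrete with a small store that accepts a file only when the validator reports zero errors and then refuses any later overwrite. The `RatchetStore` class and its method names are hypothetical, a sketch of the idea rather than how yongol stores results.

```python
class RatchetStore:
    """Holds files that passed validation; a locked file can never be replaced."""

    def __init__(self) -> None:
        self._locked: dict[str, str] = {}

    def lock(self, name: str, content: str, errors: list[str]) -> None:
        """Lock a file, but only on a clean validator result and only once."""
        if name in self._locked:
            raise PermissionError(f"{name} already passed and is locked")
        if errors:
            raise ValueError(
                f"{name} has {len(errors)} error(s); only passing files lock"
            )
        self._locked[name] = content

    def is_locked(self, name: str) -> bool:
        return name in self._locked

    def get(self, name: str) -> str:
        return self._locked[name]


if __name__ == "__main__":
    store = RatchetStore()
    store.lock("login.sql", "user_id", errors=[])
    print(store.is_locked("login.sql"))
```

Because `lock` takes the validator's error list as an argument, the agent has no code path to mark a file as done on its own say-so.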
Why Frontier Models Are Unnecessary
In this architecture, the model’s role is not creative judgment but instruction execution.
95% of a SaaS backend is CRUD + authentication + authorization + state machines. Novel algorithms are rarely needed. If the SSOT specification already defines “what to build,” the model just fills in the blanks.
Measured costs:
| Model | Environment | 1 Login endpoint | Estimated for 200 endpoints |
|---|---|---|---|
| Gemma4 4.5B | Local (16GB VRAM) | Free, ~1s | Free, ~3min |
| Gemini 2.5 Flash | API (free tier) | Free, ~10s | Free, ~30min |
| Grok 4.3 | API ($1.25/M tokens) | ~$0.05 | ~$10 |
By this estimate, a local 4.5B model can generate a 200-endpoint backend in about 3 minutes at $0. No frontier model needed. A small model that is good at being sycophantic is enough.
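The 200-endpoint column follows from the single-endpoint measurements by straight multiplication. The sketch below reproduces that arithmetic; it assumes cost and time scale linearly with endpoint count, with endpoints generated one after another and no extra feedback rounds, which is a simplification.

```python
def estimate_total(per_endpoint: float, endpoints: int = 200) -> float:
    """Linear extrapolation: one endpoint's cost (seconds or dollars) times the count."""
    return per_endpoint * endpoints


if __name__ == "__main__":
    # Per-endpoint figures are taken from the table above.
    print(f"Gemma4 4.5B:      ~{estimate_total(1) / 60:.1f} min")
    print(f"Gemini 2.5 Flash: ~{estimate_total(10) / 60:.1f} min")
    print(f"Grok 4.3:         ~${estimate_total(0.05):.2f}")
```

At 1 second per endpoint, 200 endpoints take 200 seconds, a little over 3 minutes; at 10 seconds each they take roughly 33 minutes, which the table rounds to ~30.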
Sycophancy Bias Is Not a Bug
The AI industry tries to fix sycophancy bias. We exploit it.
| Perspective | Role of Sycophancy Bias |
|---|---|
| Chat interface | Flaw – agrees with incorrect information |
| LLM-as-Judge | Fatal – 36% false passes |
| Ratchet code | Asset – the model reliably accepts feedback |
The difference is the nature of the feedback. Give opinions and sycophancy becomes poison; give facts and sycophancy becomes medicine.
Deterministic validator + sycophantic LLM = a code generation loop with guaranteed convergence.
Don’t change the model. Change the feedback.
Reins: Harness with Reins
We call the control system that combines these three conditions (deterministic feedback, example context, and ratchet locking) Reins.
What passes for a “harness” today is a fence. It keeps the agent from going outside, but guarantees nothing about reaching the destination. Reins are the bridle. They set the direction, correct with facts, and lock on pass. A harness without reins is just a fence.