The Most Sycophantic Model Is the Most Obedient


The Biggest Flaw Becomes the Biggest Asset

LLM sycophancy bias is a problem the AI industry wants to fix. When a user asks “Are you sure?”, a model will often abandon a correct answer for an incorrect one. The average capitulation rate across frontier models is 58%. Once sycophancy kicks in, it persists for the rest of the conversation with 78.5% probability.

But what happens if you flip this flaw on its head?

The essence of sycophancy bias is instruction following. Models trained with RLHF are optimized to comply with user feedback. The IFEval benchmark measures exactly this – “Does it do what it is told?”

The problem arises when the user provides opinions. “Is this right?” -> “Yes, it is” (sycophancy). “Are you sure?” -> “Oh, I was wrong” (capitulation).

But when the user provides deterministic facts, something different happens.


Give an Opinion, Get Sycophancy. Give a Fact, Get a Fix

In a 1,000-word sorting experiment, only the feedback style was varied for the same output:

Feedback                  | Nature            | Result
“Are you sure?”           | Opinion           | Flipped a correct answer – accuracy dropped 27pp
“There are errors”        | Vague fact        | Over-correction – 6 errors worsened to 10
“There are 23 errors”     | Quantitative fact | Improved to 1 error
“6 errors, here they are” | Precise fact      | 0 errors – 100% achieved

Give an opinion, and sycophancy bias activates. Give a fact, and there is nothing to be sycophantic about – numbers and positions are not emotions.

Sycophancy bias is loyalty aimed in the wrong direction. Redirect it – facts instead of opinions, verification results instead of praise – and that loyalty becomes an engine that drives accuracy upward.
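To make the last row of the table concrete, here is a minimal sketch of how "N errors, here they are" feedback could be produced for a sorting task. The helper names `find_sort_errors` and `precise_feedback` and the sample word list are illustrative assumptions, not the harness used in the experiment.

```python
from typing import List, Tuple


def find_sort_errors(words: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, previous_word, word) for each adjacent pair that is out of order."""
    errors = []
    for i in range(1, len(words)):
        if words[i - 1] > words[i]:
            errors.append((i, words[i - 1], words[i]))
    return errors


def precise_feedback(words: List[str]) -> str:
    """Render a count plus exact positions, leaving nothing to interpret as opinion."""
    errors = find_sort_errors(words)
    if not errors:
        return "0 errors"
    noun = "error" if len(errors) == 1 else "errors"
    lines = [f"{len(errors)} {noun}, here they are:"]
    for index, prev, cur in errors:
        lines.append(f"position {index}: '{cur}' follows '{prev}' but sorts before it")
    return "\n".join(lines)


# Hypothetical model output with two misplaced words.
output = ["apple", "cherry", "banana", "date", "fig", "elderberry"]
print(precise_feedback(output))
```

The message carries only a count and positions, so the model has something to fix and nothing to agree or disagree with.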


Evidence: A 4.5B Model Accepts Feedback

This is not theory. It was confirmed in experiments using yongol validate.

Experiment design:

  • Target: A single SaaS backend Login endpoint
  • Task: Write 9 SSOT files (DDL, OpenAPI, Rego, SSaC, etc.)
  • Metric: Error count at initial generation (R1) -> Error count after feedback (R2)

Feedback Only, No Examples

Model            | R1 Errors | R2 Errors | Result
Grok 4.3         | 1         | 1         | Could not fix
Gemini 2.5 Flash | 1         | 1         | Could not fix
Local 20B        | 1         | 1         | Could not fix

Total failure. The models appeared to accept the feedback, but in reality they did not know what to write.

Examples + Feedback Together

Model               | R1 Errors | R2 Errors | Result
Grok 4.3            | 0         | -         | Passed on first attempt
Gemini 2.5 Flash    | 1         | 0         | Fixed with 1 round of feedback
Gemma4 4.5B (local) | Errors    | 0         | Fixed with 1 round of feedback
Qwen3 8B (local)    | Errors    | 0         | Fixed with 1 round of feedback

Even a 4.5B local model corrects itself when given examples + deterministic feedback.

Key Finding: The Bottleneck Is Context, Not Intelligence

The accurate diagnosis was not “it cannot incorporate feedback” but “it does not know what to write.” SSaC is a yongol-specific grammar absent from pretraining data. Adding 3 lines of examples to the prompt yielded 0 errors from Grok, 0 errors from Gemini after 1 feedback round, and a pass from the 4.5B local model.
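The difference between the two experiment arms comes down to what is in the prompt. The sketch below shows one way a prompt could combine the task, a few example lines, and validator errors. `build_prompt` and its section labels are hypothetical, and the example lines are placeholders rather than real SSaC syntax, which this article does not show.

```python
from typing import List, Sequence


def build_prompt(task: str, examples: Sequence[str], errors: Sequence[str] = ()) -> str:
    """Assemble task + examples (+ validator errors in later rounds) into one prompt."""
    parts: List[str] = [f"Task:\n{task}"]
    if examples:
        parts.append("Examples of valid output:\n" + "\n".join(examples))
    if errors:
        bullet_list = "\n".join(f"- {e}" for e in errors)
        parts.append(f"The validator reported {len(errors)} error(s). Fix exactly these:\n{bullet_list}")
    return "\n\n".join(parts)


# Placeholder example lines; real ones would be valid SSaC taken from the project.
examples = ["<example line 1>", "<example line 2>", "<example line 3>"]

round1 = build_prompt("Write the SSaC file for the Login endpoint.", examples)
round2 = build_prompt(
    "Write the SSaC file for the Login endpoint.",
    examples,
    errors=["line 41: field name mismatch, expected 'user_id', got 'userId'"],
)
print(round2)
```

Round 1 carries only the task and examples; round 2 adds the validator's findings verbatim.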

The higher a model scores on IFEval – that is, the better it is at being sycophantic – the more readily it accepts deterministic feedback.


Ratchet Code: A Code Generation Method That Exploits Sycophancy Bias

Turn this discovery into a system and you get ratchet code.

┌────────────────────────────────────────────┐
│  LLM: Generate code (probabilistic,        │
│       sycophantic)                         │
│       ↓                                    │
│  Validator: Deterministic verification     │
│       ↓                                    │
│  Errors? → Feed errors + examples to LLM   │
│       ↓                                    │
│  LLM: "Yes, I'll fix it" (sycophancy =     │
│        acceptance)                         │
│       ↓                                    │
│  Validator: Verify again                   │
│       ↓                                    │
│  Pass? → Ratchet locks. Move to next file. │
└────────────────────────────────────────────┘

Sycophancy bias becomes the force that closes the loop. The loop converges because the LLM does not push back with “No, I’m correct” but complies with “Yes, I’ll fix it.”
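The loop in the diagram can be sketched in a few lines. Everything here is an assumption for illustration: `run_ratchet`, the `Generate` and `Validate` callables, and the stub model and validator stand in for a real LLM call and a real validator such as yongol validate. The `max_rounds` cap is a safety assumption of this sketch, not part of the method as described.

```python
from typing import Callable, List, Optional

Generate = Callable[[List[str]], str]  # takes the previous validator errors, returns new code
Validate = Callable[[str], List[str]]  # returns error messages; an empty list means pass


def run_ratchet(generate: Generate, validate: Validate, max_rounds: int = 3) -> Optional[str]:
    """Generate, validate, feed errors back, and stop as soon as the validator passes."""
    errors: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(errors)
        errors = validate(candidate)
        if not errors:
            return candidate  # the validator ruled "pass"; the ratchet locks here
    return None  # did not converge within the cap


def stub_model(errors: List[str]) -> str:
    """Stand-in for an LLM: complies with feedback once it receives any."""
    return "user_id BIGINT" if errors else "userId BIGINT"


def stub_validator(code: str) -> List[str]:
    """Stand-in for a deterministic validator with a single naming rule."""
    if "user_id" in code:
        return []
    return ["line 1: field name mismatch, expected 'user_id', got 'userId'"]


result = run_ratchet(stub_model, stub_validator)
print(result)
```

The stub model fails round 1, receives the exact error, and passes round 2, which mirrors the R1 -> R2 pattern in the tables.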

Three Conditions for Convergence

  1. Feedback must be deterministic fact. Not “this looks a bit odd” but “line 41: field name mismatch, expected ‘user_id’, got ‘userId’”. Feedback that leaves no room for sycophancy.

  2. Examples must be in the context. Feedback alone is not enough. The model needs examples showing “this is what the code should look like” to orient itself. It is a matter of context, not intelligence.

  3. Once verification passes, it cannot be reversed. The ratchet’s tooth. A file that has passed is locked, and the process moves on to the next one. It is not the agent declaring “I’m done” – it is the validator ruling “this file passes.”
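Conditions 1 and 3 can be expressed as data structures. The sketch below is a minimal illustration under assumptions: `ValidationError`, `Ratchet`, and the file name `login.sql` are hypothetical names, not yongol's actual types or files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ValidationError:
    """One deterministic finding: where, which rule, what was expected, what was found."""
    line: int
    rule: str
    expected: str
    got: str

    def render(self) -> str:
        return f"line {self.line}: {self.rule}, expected '{self.expected}', got '{self.got}'"


class Ratchet:
    """Holds files that passed validation. A locked file cannot be submitted again."""

    def __init__(self) -> None:
        self._locked: Dict[str, str] = {}

    def submit(self, path: str, content: str, errors: List[ValidationError]) -> bool:
        if path in self._locked:
            raise PermissionError(f"{path} already passed and is locked")
        if errors:
            return False  # stays open; the rendered errors go back to the model
        self._locked[path] = content  # the validator's ruling, not the agent's claim
        return True

    def locked_files(self) -> List[str]:
        return sorted(self._locked)


state = Ratchet()
mismatch = ValidationError(41, "field name mismatch", "user_id", "userId")
print(mismatch.render())

first = state.submit("login.sql", "userId BIGINT", [mismatch])  # rejected, stays open
second = state.submit("login.sql", "user_id BIGINT", [])        # passes, locks
print(first, second, state.locked_files())
```

Raising on resubmission is one way to model "cannot be reversed"; a real system might instead skip locked files silently.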


Why Frontier Models Are Unnecessary

In this architecture, the model’s role is not creative judgment but instruction execution.

95% of a SaaS backend is CRUD + authentication + authorization + state machines. Novel algorithms are rarely needed. If the SSOT specification already defines “what to build,” the model just fills in the blanks.

Measured costs:

Model            | Environment       | 1 Login endpoint | Estimated for 200 endpoints
Gemma4 4.5B      | Local (16GB VRAM) | Free, ~1s        | Free, ~3min
Gemini 2.5 Flash | API (free tier)   | Free, ~10s       | Free, ~30min
Grok 4.3         | API ($1.25/M)     | ~$0.05           | ~$10

By this estimate, a local 4.5B model can generate a 200-endpoint backend in about 3 minutes at $0. No frontier model needed. A small model that is good at being sycophantic is enough.
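The 200-endpoint column follows from the single-endpoint figures by straight multiplication. The sketch below reproduces that arithmetic under two assumptions that the table does not state: cost and time scale linearly, and endpoints are generated one after another. The names `PER_ENDPOINT` and `estimate` are illustrative.

```python
from typing import Optional, Tuple

ENDPOINTS = 200

# (seconds per endpoint, or None where the table reports no time; dollars per endpoint)
PER_ENDPOINT = {
    "Gemma4 4.5B": (1.0, 0.0),
    "Gemini 2.5 Flash": (10.0, 0.0),
    "Grok 4.3": (None, 0.05),
}


def estimate(seconds: Optional[float], dollars: float,
             endpoints: int = ENDPOINTS) -> Tuple[Optional[float], float]:
    """Scale per-endpoint time (to minutes) and cost linearly to the full backend."""
    minutes = None if seconds is None else seconds * endpoints / 60
    return minutes, dollars * endpoints


for model, (seconds, dollars) in PER_ENDPOINT.items():
    minutes, cost = estimate(seconds, dollars)
    time_text = "time not reported" if minutes is None else f"~{minutes:.0f} min"
    print(f"{model}: {time_text}, ~${cost:.2f}")
```

At 10 seconds per endpoint the result is about 33 minutes, which the table rounds to ~30min.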


Sycophancy Bias Is Not a Bug

The AI industry tries to fix sycophancy bias. We exploit it.

Perspective    | Role of Sycophancy Bias
Chat interface | Flaw – agrees with incorrect information
LLM-as-Judge   | Fatal – 36% false passes
Ratchet code   | Asset – guarantees feedback acceptance rate

The difference is the nature of the feedback. Give opinions and sycophancy becomes poison; give facts and sycophancy becomes medicine.

Deterministic validator + sycophantic LLM = a code generation loop with guaranteed convergence.

Don’t change the model. Change the feedback.


Reins: Harness with Reins

Combine these three conditions (deterministic feedback, example context, and ratchet locking) into a single control system and you get what we call Reins.

What passes for a “harness” today is a fence. It keeps the agent from going outside, but guarantees nothing about reaching the destination. Reins are the bridle. They set the direction, correct with facts, and lock on pass. A harness without reins is just a fence.