Ratchet Code That Exploits IFEval


If your LLM follows instructions well but the results are a mess, if you want to exploit sycophancy bias instead of eliminating it, if you want even a 4.5B local model to generate correct code – the combination of IFEval and the ratchet is the answer.

The Most Sycophantic Model Is the Most Obedient

The Biggest Flaw Becomes the Biggest Asset

LLM sycophancy bias is a problem the AI industry wants to fix. When a user asks “Are you sure?”, the model flips a correct answer to an incorrect one. The average capitulation rate across frontier models is 58%, and once sycophancy kicks in, it persists for the rest of the conversation with 78.5% probability (Fanous et al., 2025).

But what happens if you flip this flaw on its head?

The essence of sycophancy bias is instruction following. Models trained with RLHF are optimized to comply with user feedback (Ouyang et al., 2022). The IFEval benchmark measures exactly this: “Does it do what it is told?” (Zhou et al., 2023).

The problem arises when the user provides opinions. “Is this right?” -> “Yes, it is” (sycophancy). “Are you sure?” -> “Oh, I was wrong” (capitulation).

But when the user provides deterministic facts, something different happens.

Give an Opinion, Get Sycophancy. Give a Fact, Get a Fix

In a 1,000-word sorting experiment, only the feedback style was varied for the same output:

Feedback	Nature	Result
“Are you sure?”	Opinion	Flipped a correct answer – accuracy dropped 27pp
“There are errors”	Vague fact	Over-correction – 6 errors worsened to 10
“There are 23 errors”	Quantitative fact	Improved to 1 error
“6 errors, here they are”	Precise fact	0 errors – 100% achieved

Give an opinion, and sycophancy bias activates. Give a fact, and there is nothing to be sycophantic about – numbers and positions are not emotions.

Sycophancy bias is loyalty aimed in the wrong direction. Redirect it – facts instead of opinions, verification results instead of praise – and that loyalty becomes an engine that drives accuracy upward.
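The precise-fact row in the table above can be illustrated with a small checker. This is a minimal sketch, not the experiment's actual scoring code: it assumes an "error" means an adjacent pair of words that is out of order, and the function names `find_order_errors` and `precise_feedback` are hypothetical.

```python
from typing import List, Tuple


def find_order_errors(words: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, word, next_word) for each adjacent pair that is out of order."""
    return [
        (i, words[i], words[i + 1])
        for i in range(len(words) - 1)
        if words[i] > words[i + 1]
    ]


def precise_feedback(words: List[str]) -> str:
    """Build feedback that states a count and exact positions, never an opinion."""
    errors = find_order_errors(words)
    if not errors:
        return "0 errors"
    noun = "error" if len(errors) == 1 else "errors"
    lines = [f"{len(errors)} {noun}, here they are:"]
    for i, left, right in errors:
        lines.append(f"position {i}: '{left}' must not come before '{right}'")
    return "\n".join(lines)


# Assumed sample input for illustration only.
print(precise_feedback(["apple", "cherry", "banana", "date", "fig", "elder"]))
```

The output contains only a count and positions, so there is nothing in it for the model to agree or disagree with.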

Evidence: A 4.5B Model Accepts Feedback

This is not theory. It was confirmed in experiments using yongol validate.

Experiment design:

Target: A single SaaS backend Login endpoint
Task: Write 9 SSOT files (DDL, OpenAPI, Rego, SSaC, etc.)
Metric: Error count at initial generation (R1) -> Error count after feedback (R2)

Feedback Only, No Examples

Model	R1 Errors	R2 Errors	Result
Grok 4.3	1	1	Could not fix
Gemini 2.5 Flash	1	1	Could not fix
Local 20B	1	1	Could not fix

Total failure. The models appeared to accept the feedback, but in reality they did not know what to write.

Examples + Feedback Together

Model	R1 Errors	R2 Errors	Result
Grok 4.3	0	–	Passed on first attempt
Gemini 2.5 Flash	1	0	Fixed with 1 round of feedback
Gemma4 4.5B (local)	Errors	0	Fixed with 1 round of feedback
Qwen3 8B (local)	Errors	0	Fixed with 1 round of feedback

Even a 4.5B local model corrects itself when given examples + deterministic feedback.

Key Finding: The Bottleneck Is Context, Not Intelligence

The accurate diagnosis was not “it cannot incorporate feedback” but “it does not know what to write.” SSaC is a yongol-specific grammar absent from pretraining data. Adding 3 lines of examples to the prompt yielded 0 errors from Grok, 0 errors from Gemini after 1 feedback round, and a pass from the 4.5B local model.
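One way to put both examples and validator errors into the same context is a small prompt builder. The sketch below is an assumption about how such a prompt might be assembled, not yongol's actual prompt format; `build_prompt` is a hypothetical helper, and the example strings are placeholders rather than real SSaC syntax.

```python
from typing import Sequence


def build_prompt(task: str, examples: Sequence[str], errors: Sequence[str] = ()) -> str:
    """Assemble a prompt: task, then reference examples, then validator errors if any."""
    parts = [f"Task:\n{task}", "Examples of valid output:"]
    parts.extend(f"- {example}" for example in examples)
    if errors:
        parts.append(f"The validator reported {len(errors)} error(s):")
        parts.extend(f"- {error}" for error in errors)
        parts.append("Fix exactly these errors and change nothing else.")
    return "\n".join(parts)


# Placeholder examples: substitute real lines written in the target grammar.
examples = ["<example line 1>", "<example line 2>", "<example line 3>"]

first_round = build_prompt("Write the Login endpoint spec.", examples)
second_round = build_prompt(
    "Write the Login endpoint spec.",
    examples,
    errors=["line 41: field name mismatch, expected 'user_id', got 'userId'"],
)
print(second_round)
```

The examples are repeated in every round, so the model always has a reference for what valid output looks like while it applies the fix.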

The higher a model scores on IFEval – that is, the better it is at being sycophantic – the more readily it accepts deterministic feedback.

Ratchet Code: A Code Generation Method That Exploits Sycophancy Bias

Turn this discovery into a system and you get ratchet code.

┌────────────────────────────────────────────┐
│  LLM: Generate code (probabilistic,        │
│       sycophantic)                         │
│       ↓                                    │
│  Validator: Deterministic verification     │
│       ↓                                    │
│  Errors? → Feed errors + examples to LLM   │
│       ↓                                    │
│  LLM: "Yes, I'll fix it" (sycophancy =     │
│        acceptance)                         │
│       ↓                                    │
│  Validator: Verify again                   │
│       ↓                                    │
│  Pass? → Ratchet locks. Move to next file. │
└────────────────────────────────────────────┘

Sycophancy bias becomes the force that closes the loop. The loop converges because the LLM does not push back with “No, I’m correct” but complies with “Yes, I’ll fix it.” Iteratively fixing LLM-generated code with compiler and test feedback is not new: Self-Debug (Chen et al., 2024) reported that debugging typically completes within 3 turns. Ratchet code takes this further by removing the LLM’s self-judgment entirely and leaving only deterministic facts.
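The loop in the diagram can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the yongol implementation: `generate` and `validate` are hypothetical callables standing in for the LLM call and the deterministic validator, and the `max_rounds` cutoff and the choice to record `None` for a file that never passes are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Optional

Generate = Callable[[str, List[str]], str]  # (spec, errors so far) -> candidate text
Validate = Callable[[str], List[str]]       # candidate text -> list of error messages


def ratchet(specs: Dict[str, str], generate: Generate, validate: Validate,
            max_rounds: int = 3) -> Dict[str, Optional[str]]:
    """Process files one at a time; lock each one as soon as the validator passes it."""
    locked: Dict[str, Optional[str]] = {}
    for name, spec in specs.items():
        errors: List[str] = []
        locked[name] = None  # None means the file never passed within max_rounds
        for _ in range(max_rounds):
            candidate = generate(spec, errors)
            errors = validate(candidate)
            if not errors:
                locked[name] = candidate  # ratchet tooth: not regenerated after this
                break
    return locked


# Toy stand-ins so the sketch runs without a model or a real validator.
def toy_generate(spec: str, errors: List[str]) -> str:
    return "user_id" if errors else "userId"


def toy_validate(text: str) -> List[str]:
    if text == "user_id":
        return []
    return [f"line 1: field name mismatch, expected 'user_id', got '{text}'"]


result = ratchet({"login.sql": "spec"}, toy_generate, toy_validate)
print(result)
```

The pass decision comes only from `validate`; the generator never gets to declare that it is finished.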

Three Conditions for Convergence

Feedback must be deterministic fact. Not “this looks a bit odd” but “line 41: field name mismatch, expected ‘user_id’, got ‘userId’”. Feedback that leaves no room for sycophancy.
Examples must be in the context. Feedback alone is not enough. The model needs examples showing “this is what the code should look like” to orient itself. It is a matter of context, not intelligence.
Once verification passes, it cannot be reversed. The ratchet’s tooth. A file that has passed is locked, and the process moves on to the next one. It is not the agent declaring “I’m done” – it is the validator ruling “this file passes.”
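The first and third conditions can be made concrete with two small data structures. The sketch below is a hypothetical illustration: `ValidationError` and `RatchetLock` are names invented here, and locking by SHA-256 digest is one possible design, not necessarily how any particular tool does it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ValidationError:
    line: int
    rule: str
    expected: str
    got: str

    def render(self) -> str:
        """Render as a position plus a fact, with no evaluative wording."""
        return f"line {self.line}: {self.rule}, expected '{self.expected}', got '{self.got}'"


class RatchetLock:
    """Remember the digest of every file that passed, and refuse later changes."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    @staticmethod
    def _digest(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def lock(self, name: str, content: str) -> None:
        self._digests[name] = self._digest(content)

    def is_locked(self, name: str) -> bool:
        return name in self._digests

    def write_allowed(self, name: str, content: str) -> bool:
        """Allow a write only if the file is unlocked or the content is unchanged."""
        if name not in self._digests:
            return True
        return self._digest(content) == self._digests[name]


error = ValidationError(41, "field name mismatch", "user_id", "userId")
print(error.render())

lock = RatchetLock()
lock.lock("login.sql", "CREATE TABLE users (user_id INT);")
print(lock.write_allowed("login.sql", "CREATE TABLE users (userId INT);"))
```

Keeping the lock outside the model means a later generation step cannot quietly undo a file that already passed.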

Why Frontier Models Are Unnecessary

In this architecture, the model’s role is not creative judgment but instruction execution.

95% of a SaaS backend is CRUD + authentication + authorization + state machines. Novel algorithms are rarely needed. If the SSOT specification already defines “what to build,” the model just fills in the blanks.

Measured costs:

Model	Environment	1 Login endpoint	Estimated for 200 endpoints
Gemma4 4.5B	Local (16GB VRAM)	Free, ~1s	Free, ~3min
Gemini 2.5 Flash	API (free tier)	Free, ~10s	Free, ~30min
Grok 4.3	API ($1.25/M tokens)	~$0.05	~$10

A local 4.5B model can generate a 200-endpoint backend in 3 minutes at $0. No frontier model needed. A small model that is good at being sycophantic is enough.
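The 200-endpoint column is consistent with a straight linear extrapolation from the single-endpoint measurements. The sketch below reproduces that arithmetic; it assumes endpoints are generated one after another, that every endpoint costs the same as the Login endpoint, and that no extra feedback rounds are needed. The `extrapolate` helper is hypothetical.

```python
ENDPOINTS = 200

# (seconds per endpoint, dollars per endpoint), copied from the table above.
measured = {
    "Gemma4 4.5B": (1, 0.0),
    "Gemini 2.5 Flash": (10, 0.0),
    "Grok 4.3": (None, 0.05),  # the table reports no latency for Grok
}


def extrapolate(seconds, dollars, n=ENDPOINTS):
    """Scale one-endpoint time and cost linearly to n endpoints."""
    minutes = None if seconds is None else seconds * n / 60
    return minutes, dollars * n


for model, (seconds, dollars) in measured.items():
    minutes, cost = extrapolate(seconds, dollars)
    time_text = "n/a" if minutes is None else f"~{minutes:.0f} min"
    print(f"{model}: {time_text}, ${cost:.2f}")
```

Real projects will deviate from this estimate wherever endpoints are more complex than Login or need more than one feedback round.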

Sycophancy Bias Is Not a Bug

The AI industry tries to fix sycophancy bias. We exploit it.

Perspective	Role of Sycophancy Bias
Chat interface	Flaw – agrees with incorrect information
LLM-as-Judge	Fatal – 36% false passes
Ratchet code	Asset – guarantees feedback acceptance

The difference is the nature of the feedback. Give opinions and sycophancy becomes poison; give facts and sycophancy becomes medicine.

Deterministic validator + sycophantic LLM = a code generation loop with guaranteed convergence.

Don’t change the model. Change the feedback.

Reins: Harness with Reins

Combine these three conditions (deterministic feedback, example context, and ratchet locking) into a single control system and you get what we call Reins.

What passes for a “harness” today is a fence. It keeps the agent from going outside, but guarantees nothing about reaching the destination. Reins are the bridle. They set the direction, correct with facts, and lock on pass. A harness without reins is just a fence.

References

Zhou, J., Lu, T., Mishra, S., Brahma, S., Basu, S., Luan, Y., Zhou, D., & Hou, L. (2023). “Instruction-Following Evaluation for Large Language Models.” arXiv:2311.07911
Ouyang, L., Wu, J., Jiang, X., et al. (2022). “Training Language Models to Follow Instructions with Human Feedback.” NeurIPS 2022. arXiv:2203.02155
Chen, X., Lin, M., Schärli, N., & Zhou, D. (2024). “Teaching Large Language Models to Self-Debug.” ICLR 2024. arXiv:2304.05128
Sharma, M., Tong, M., Korbak, T., et al. (2024). “Towards Understanding Sycophancy in Language Models.” ICLR 2024. arXiv:2310.13548
Fanous, A., Goldberg, J., Agarwal, A., et al. (2025). “SycEval: Evaluating LLM Sycophancy.” AAAI/ACM AIES 2025. arXiv:2502.08177
Shapira, I., Benade, G., & Procaccia, A. D. (2026). “How RLHF Amplifies Sycophancy.” arXiv:2602.01002
Ibrahim, L., Hafner, F. S., & Rocher, L. (2026). “Training Language Models to Be Warm Can Reduce Accuracy and Increase Sycophancy.” Nature, 652, 1159-1165

Changelog

2026-05-20: Initial release