If you asked an AI agent to refactor and it broke the app, if you want to turn a legacy system into an environment where AI can work, or if you want to convert Fortune 500 legacy maintenance budgets into transformation budgets, this post is your roadmap.
Locked Memory
During the IT bubble, corporations began accumulating digital assets. Code, databases, specifications, documents, APIs — decades of corporate memory.
That memory was locked. Unsearchable, unverifiable, unreachable. The only way in was for a human to read it, understand it, and modify it by hand. That is why 60–80% of Fortune 500 IT budgets go to maintaining this locked memory. They cannot open it, so they just guard it.
We are in the middle of what is called the AI bubble. The real meaning of this era is not that models are getting smarter. It is that corporations’ long-locked legacy memory is becoming reachable.
But not yet. In 2026, AI agents write code. One burns millions of tokens over 68 minutes, finishes a refactor. The app is completely broken. Stories like this show up on X every day.
Why does this keep happening?
Not because the agent is stupid. Because the environment is not built for agents to work in. You do not put a robot in a human office. You build a factory where robots can work.
To unlock locked memory, the memory must first be transformed into a shape that can be opened. This is not just a code problem. Databases, specifications, documents, APIs — the entire digital estate is opaque to agents.
What Agent-Operable Means
For an agent to work autonomously, three conditions must hold:
1. It must be readable — without noise
When an agent needs one function in a 2,000-line file, the other 1,950 lines are noise. Finding customer data in an unnormalized database means joining three tables. Business rules buried in Excel sheets are invisible to agents.
Readable does not mean a human can read it. It means a machine can structurally parse it.
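As a rough illustration of "structurally parse," here is a minimal sketch that uses Python's standard `ast` module to pull one function out of a larger source string, so the agent sees only the lines it needs. The sample source and the helper names (`extract_function`, `list_functions`) are made up for this example and are not part of any tool named in this post.

```python
import ast
from typing import List, Optional

# Hypothetical stand-in for a large legacy file.
SOURCE = '''
def parse_date(text):
    return text.split("-")

def round_amount(value):
    """Round a payment amount to two decimals."""
    return round(value, 2)

def unrelated_helper():
    return None
'''


def list_functions(source: str) -> List[str]:
    """Return the names of all top-level functions, in file order."""
    tree = ast.parse(source)
    return [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of one top-level function, or None if absent."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # get_source_segment slices the original text by the node's span.
            return ast.get_source_segment(source, node)
    return None


if __name__ == "__main__":
    print(list_functions(SOURCE))
    print(extract_function(SOURCE, "round_amount"))
```

The point of the sketch is that the lookup goes through the syntax tree rather than through text search, so the result is exactly one function and nothing else.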
2. It must be verifiable — deterministically
If an agent cannot tell what broke after a change, it falls into a doom loop. Code needs tests. Databases need constraints. APIs need schema validation. Specifications need cross-verification.
Having an LLM verify another LLM is like asking your drunk friend, “Am I drunk?” Deterministic tools are different: go test does not hallucinate. A CHECK constraint does not lie. JSON Schema does not drift.
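To make the CHECK-constraint point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `payments` table, its column, and the helper names are assumptions invented for the example. The database, not a model, decides whether a write is valid, and it gives the same answer every time.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical payments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " id INTEGER PRIMARY KEY,"
        " amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)"
        ")"
    )
    return conn


def try_insert(conn: sqlite3.Connection, amount_cents) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (amount_cents) VALUES (?)",
                (amount_cents,),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for CHECK and NOT NULL violations alike.
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, 1250))  # valid amount
    print(try_insert(db, -5))    # violates the CHECK constraint
```

An agent that writes a negative amount gets a hard failure it can act on, instead of an opinion it has to interpret.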
3. State must persist — even when the agent dies
Agents will crash. Token limits, network errors, session drops. If progress is not saved, every run starts from zero.
When Agent A processes up to function #200 and dies, Agent B must pick up at #201. Agents are disposable. Progress must accumulate.
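The handoff from Agent A to Agent B can be sketched with nothing more than a checkpoint file. This is a minimal sketch under stated assumptions: the JSON layout, the `last_done` key, and the function names are hypothetical, and the "work" is a placeholder.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the last completed item, or 0 if nothing is saved."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f)["last_done"]


def save_checkpoint(path: str, last_done: int) -> None:
    """Write progress via a temp file and rename, so a crash mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"last_done": last_done}, f)
    os.replace(tmp, path)


def run_agent(path: str, total: int, crash_after=None) -> list:
    """Process items after the checkpoint; optionally simulate a crash."""
    processed = []
    start = load_checkpoint(path) + 1
    for i in range(start, total + 1):
        if crash_after is not None and i > crash_after:
            raise RuntimeError("agent died")
        processed.append(i)       # stand-in for real work on function #i
        save_checkpoint(path, i)  # progress outlives this process
    return processed
```

In this sketch the agent holds no state of its own. Whoever runs next reads the file and continues, which is what makes the agent disposable.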
Step Zero: Freeze the Bugs
The three conditions are the destination. The starting point is different. No documentation, no tests, 300 files at 2,000 lines each. That is the starting point.
Tell an agent to “refactor this” in that state, and what happens? It “fixes” a ten-year-old bug. The problem is, that bug is not a bug.
Hyrum’s Law: with enough users of an API, every observable behavior of the system ends up with someone depending on it. A decimal-rounding error left alone for a decade has a VIP customer’s payment logic wired to it. A date-parsing bug spawned an Excel macro that holds together the entire accounting department. Old bugs are implicit business specifications.
The agent’s first job is not to fix code. It is to freeze current behavior.
Poke the API. Record the response. Pin that response as a Hurl test. Bizarre bug or intended behavior — no distinction. Freeze it as-is. This is the first tooth of the ratchet — it locks the agent out of “improving” things on its own.
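The post pins behavior with Hurl tests against a live API. As a language-neutral stand-in, the same idea can be sketched in plain Python as a snapshot (characterization) test. Everything here is hypothetical: `legacy_price` is an invented function with a truncation quirk, and the snapshot format is just a dictionary.

```python
import json


def legacy_price(cents, percent):
    """Hypothetical legacy code: truncates instead of rounding."""
    return int(cents * percent / 100)


def record_snapshot(func, cases):
    """Call func on each case and record whatever it returns today."""
    return {json.dumps(args): func(*args) for args in cases}


def check_snapshot(func, snapshot):
    """Return the cases whose current output differs from the frozen one."""
    diffs = []
    for key, expected in snapshot.items():
        args = json.loads(key)
        actual = func(*args)
        if actual != expected:
            diffs.append((args, expected, actual))
    return diffs


CASES = [(199, 50), (1000, 15), (333, 33)]

if __name__ == "__main__":
    frozen = record_snapshot(legacy_price, CASES)
    # A well-meaning "fix" that rounds instead of truncating.
    fixed = lambda cents, percent: round(cents * percent / 100)
    for args, expected, actual in check_snapshot(fixed, frozen):
        print(f"{args}: frozen {expected}, now {actual}")
```

The snapshot makes no judgment about whether truncation is right. It only reports that behavior changed, which hands the decision back to whoever holds the spec.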
Changes are decided only by the person who holds the spec. The agent is an executor. Not a judge.
Once the freeze is done, the transition toward the three conditions — readability, verifiability, persistence — begins.
Not Just Code
“Agent-operable codebase” is the starting point. A corporation’s digital assets are not just code.
| Asset | Current State | Agent-Operable State |
|---|---|---|
| Code | 2,000-line files, no tests | One concept per file, tests for every function |
| Database | Unnormalized, undocumented | Declarative DDL management, auto-generated migrations |
| Specs | Wiki, verbal handoffs, drift | 9 single sources of truth (SSOTs), cross-verified and chained by a single identifier |
| Documents | Rules buried in PDFs and Excel | Schema-extracted, machine-readable |
| API | Undocumented, implicit contracts | OpenAPI capture, schema validation |
Individually, each row looks like “we should tidy up.” Together, they form a system.
Symbolic Feedback Loop
A common structure makes this transition possible.
LLM generates → deterministic tool judges → result fed back to LLM → repeat
In code, in tests, in specs, in data — the same loop operates:
- Code structure: filefunc validate → violation feedback → LLM fixes → repeat
- Tests: go test + coverage → uncovered-line feedback → LLM augments → repeat
- Spec consistency: yongol check → drift feedback → LLM fixes → repeat
- User rules: rulecat evaluate → violation feedback → LLM fixes → repeat
The only thing the LLM does is generate. What to fix, whether it passed, what comes next, whether it is done — all decided by machines. The LLM gets no decision-making authority.
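The shape of that loop fits in a few lines. The sketch below is illustrative only: `stub_generator` is a hypothetical stand-in for an LLM call that returns canned candidates, and `judge` is a toy deterministic checker, not any of the tools named above. What matters is where the decisions live: the judge alone decides pass, fail, and done.

```python
from typing import Callable, List, Optional, Tuple

# Fixed expectations for a toy "add" function: (arguments, expected result).
CASES = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]


def judge(candidate: Callable) -> List[str]:
    """Deterministic judge: return failure messages (empty list means pass)."""
    failures = []
    for args, expected in CASES:
        actual = candidate(*args)
        if actual != expected:
            failures.append(f"add{args} returned {actual}, expected {expected}")
    return failures


def stub_generator(feedback: List[str], attempt: int) -> Callable:
    """Hypothetical stand-in for an LLM call; ignores feedback and
    returns canned candidates in order."""
    attempts = [lambda a, b: a * b, lambda a, b: a - b, lambda a, b: a + b]
    return attempts[min(attempt, len(attempts) - 1)]


def feedback_loop(
    generate: Callable, check: Callable, max_rounds: int = 5
) -> Tuple[Optional[Callable], List[int]]:
    """Generate, judge, feed back, repeat. The generator never decides
    whether it is done; only an empty failure list ends the loop early."""
    feedback: List[str] = []
    history: List[int] = []
    for round_no in range(max_rounds):
        candidate = generate(feedback, round_no)
        feedback = check(candidate)
        history.append(len(feedback))
        if not feedback:
            return candidate, history
    return None, history  # budget exhausted without a passing candidate
```

A real generator would use the feedback strings to produce its next attempt. The loop itself would not change.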
This is not an invention. C. elegans dedicates 60 of its 302 neurons (20%) to sensory input. To verification, not generation. Five hundred million years of evolution reached this conclusion: improving feedback quality beats adding more neurons for survival.
The industry is making the train (the model) faster. Bigger models, smarter agents, better prompts. But the faster the train, the more the rails matter.
80/20
In the final state, the system splits into two layers.
SSOT (80–90%)
├── OpenAPI, DDL, SSaC, FuncSpec, Rego, Hurl, React TSX, Mermaid, manifest
└── Generated from specs. Drift eliminated at the source. Agents modify freely.
Custom (10–20%)
├── Business rules, domain logic, legal/policy calculations
└── Structured with filefunc, tested with tsma. Humans review.
The code that humans actually need to care about compresses to 10–20%. The rest is generated by agents reading specs, verified by machines.
The Fortune 500’s 60%
60–80% of enterprise IT budgets go to legacy maintenance. 42% of developer time goes to technical debt. 70% of digital transformation projects fail to meet their goals.
This budget is already being spent. No new budget is needed. Just redirect it. Turn maintenance budgets into transformation budgets.
Feed in a legacy system whole, and an agent-operable system comes out. That is the promise of Building Agent-Operable Systems.
Why Big Tech Won’t Do This
Anthropic and OpenAI build general-purpose models. Improve a model by 10% and it applies to every customer. But build a Go test feedback loop and it only applies to Go developers. Build a Python coverage tool and it only applies to Python projects.
Symbolic verification is inherently domain-specific. Every language, every framework, every spec requires a different verifier. No generality means it does not fit big tech’s ROI.
That is why this space is empty. The people building the train and the people laying the rails are not competitors. They are complements.
Questions
Your agent writes code. But who checks whether that code is correct?
Another agent? Or go test?
Does your LLM actually read all 100,000 lines?
Or does it pretend to?
What the agent era needs is not smarter agents. It is systems where agents can work.
Sources
- Gartner, “IT Budget and Cost Optimization” — 60–80% of enterprise IT budgets consumed by legacy maintenance
- Stripe & Harris Poll, The Developer Coefficient (2018) — 42% of developer time spent on technical debt
- McKinsey & Company, Why do most transformations fail? (2019) — ~70% of digital transformation projects fall short of goals
- Hyrum Wright, Hyrum’s Law — “With a sufficient number of users of an API, all observable behaviors of your system will be depended on by somebody”
- Winters, Manshreck, Wright, Software Engineering at Google (O’Reilly, 2020) — Formal book source for Hyrum’s Law
- White et al., “The structure of the nervous system of C. elegans”, Phil. Trans. R. Soc. Lond. B 314 (1986) — 302-neuron connectome
- Inglis et al., The sensory cilia of C. elegans, WormBook (2007) — 60 sensory neurons (~20% of total)
- METR, Early-2025 AI Developer Productivity Study (2025) — AI tools made experienced developers 19% slower, yet developers perceived 24% speedup
- GitClear, AI Copilot Code Quality 2025 (2025) — 211M lines analyzed, refactoring down 60%, copy-paste code up 48%
- Mehtiyev & Assuncao, Beyond Resolution Rates (2026) — 19 agents, 9,374 trajectories; 12.4% of total compute spent on zero-yield tasks