Building Agent-Operable Systems

[Image: Building Agent-Operable Systems. Image generated by Google Gemini.]

If you asked an AI agent to refactor and it broke the app, if you want to turn a legacy system into an environment where AI can work, or if you want to convert Fortune 500 legacy maintenance budgets into transformation budgets, this post is your roadmap.

Locked Memory

During the IT bubble, corporations began accumulating digital assets. Code, databases, specifications, documents, APIs — decades of corporate memory.

That memory was locked. Unsearchable, unverifiable, unreachable. The only way in was for a human to read it, understand it, and modify it by hand. That is why 60–80% of Fortune 500 IT budgets go to maintaining this locked memory. They cannot open it, so they just guard it.

We are in the middle of what is called the AI bubble. The real meaning of this era is not that models are getting smarter. It is that corporations’ long-locked legacy memory is becoming reachable.

But not yet. In 2026, AI agents write code. One burns millions of tokens over 68 minutes and finishes a refactor. The app is completely broken. Stories like this show up on X every day.

Why does this keep happening?

Not because the agent is stupid. Because the environment is not built for agents to work in. You do not put a robot in a human office. You build a factory where robots can work.

To unlock locked memory, the memory must first be transformed into a shape that can be opened. This is not just a code problem. Databases, specifications, documents, APIs — the entire digital estate is opaque to agents.

What Agent-Operable Means

For an agent to work autonomously, three conditions must hold:

1. It must be readable — without noise

When an agent needs one function in a 2,000-line file, the other 1,950 lines are noise. Finding customer data in an unnormalized database takes a three-table join. Business rules buried in Excel sheets are invisible to agents.

Readable does not mean a human can read it. It means a machine can structurally parse it.
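As a minimal sketch of what "structurally parse" can mean for Go code, the example below uses the standard library's go/parser to list every top-level function with its line span, so an agent could load only the lines it needs. The names FuncSpan, ListFuncs, and the sample billing source are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FuncSpan records where one top-level function lives in a file.
// (Hypothetical type, for illustration only.)
type FuncSpan struct {
	Name       string
	Start, End int // 1-based line numbers, inclusive
}

// ListFuncs parses Go source into a syntax tree and returns the line
// span of every top-level function or method declaration.
func ListFuncs(filename, src string) ([]FuncSpan, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	var spans []FuncSpan
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, and consts
		}
		spans = append(spans, FuncSpan{
			Name:  fn.Name.Name,
			Start: fset.Position(fn.Pos()).Line,
			End:   fset.Position(fn.End()).Line,
		})
	}
	return spans, nil
}

// sample is a hypothetical stand-in for a much larger legacy file.
const sample = `package billing

func Round(x float64) float64 {
	return x
}

func Charge(id string) error {
	return nil
}
`

func main() {
	spans, err := ListFuncs("billing.go", sample)
	if err != nil {
		panic(err)
	}
	for _, s := range spans {
		fmt.Printf("%s: lines %d-%d\n", s.Name, s.Start, s.End)
	}
}
```

With an index like this, the agent asks for "Charge, lines 7-9" instead of scanning the whole file. Other asset types would need their own parsers; this sketch covers Go source only.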

2. It must be verifiable — deterministically

If an agent cannot tell what broke after a change, it falls into a doom loop. Code needs tests. Databases need constraints. APIs need schema validation. Specifications need cross-verification.

Having an LLM verify another LLM is like asking your drunk friend, “Am I drunk?” Deterministic tools are different: go test does not hallucinate. A CHECK constraint does not lie. JSON Schema does not drift.
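To make "deterministic" concrete, here is a small sketch of a verifier that gives the same verdict for the same input every time. It is not JSON Schema; it stands in for schema validation using only the Go standard library (strict decoding plus fixed rules). The Customer shape and the Validate function are assumptions made up for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Customer is a hypothetical payload shape used only for illustration.
type Customer struct {
	ID    string `json:"id"`
	Email string `json:"email"`
	Age   int    `json:"age"`
}

// Validate decodes the payload strictly and applies fixed rules.
// The same bytes always produce the same verdict.
func Validate(payload []byte) error {
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.DisallowUnknownFields() // reject fields the struct does not declare
	var c Customer
	if err := dec.Decode(&c); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	if c.ID == "" {
		return errors.New("id is required")
	}
	if c.Age < 0 {
		return errors.New("age must be non-negative")
	}
	return nil
}

func main() {
	inputs := []string{
		`{"id":"c1","email":"a@example.com","age":30}`,
		`{"id":"c1","nickname":"x"}`,
		`{"id":"c1","age":-1}`,
	}
	for _, in := range inputs {
		fmt.Printf("%s -> %v\n", in, Validate([]byte(in)))
	}
}
```

The error string is what gets handed back to the agent. It names one concrete violation, which gives the agent something specific to fix rather than an opinion to argue with.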

3. State must persist — even when the agent dies

Agents will crash. Token limits, network errors, session drops. If progress is not saved, every run starts from zero.

When Agent A processes up to function #200 and dies, Agent B must pick up at #201. Agents are disposable. Progress must accumulate.
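The hand-off from Agent A to Agent B can be sketched with a checkpoint file. This is one possible shape, not a prescribed design: Checkpoint, Load, Save, and Run are hypothetical names, and the maxSteps parameter exists only to simulate an agent dying partway through.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Checkpoint is the only state that outlives an agent.
type Checkpoint struct {
	LastDone int `json:"last_done"` // index of the last fully processed item
}

// Load returns a zero checkpoint when no file exists yet.
func Load(path string) (Checkpoint, error) {
	var cp Checkpoint
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return cp, nil
	}
	if err != nil {
		return cp, err
	}
	err = json.Unmarshal(data, &cp)
	return cp, err
}

// Save writes a temp file and renames it over the old checkpoint, so a
// crash mid-write does not leave a half-written file. (No fsync here;
// a production version would likely want one.)
func Save(path string, cp Checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// Run processes items LastDone+1..total and saves after each one.
// It stops after maxSteps items to simulate an agent that dies early.
// It returns how many items this run processed.
func Run(path string, total, maxSteps int, process func(i int) error) (int, error) {
	cp, err := Load(path)
	if err != nil {
		return 0, err
	}
	done := 0
	for i := cp.LastDone + 1; i <= total && done < maxSteps; i++ {
		if err := process(i); err != nil {
			return done, err
		}
		cp.LastDone = i
		if err := Save(path, cp); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}

func main() {
	dir, err := os.MkdirTemp("", "ckpt")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "progress.json")
	noop := func(int) error { return nil }

	a, _ := Run(path, 300, 200, noop)  // Agent A dies after 200 items
	b, _ := Run(path, 300, 1000, noop) // Agent B resumes from the file
	fmt.Println("agent A processed", a, "agent B processed", b)
}
```

Nothing about Agent A survives except the file, which is the point: the second run needs no knowledge of the first.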

Step Zero: Freeze the Bugs

The three conditions are the destination. The starting point looks different: no documentation, no tests, 300 files at 2,000 lines each.

Tell an agent to “refactor this” in that state, and what happens? It “fixes” a ten-year-old bug. The problem is, that bug is not a bug.

Hyrum’s Law: with enough users of an API, every observable behavior ends up with someone depending on it. A decimal-rounding error left alone for a decade has a VIP customer’s payment logic wired to it. A date-parsing bug spawned an Excel macro that holds together the entire accounting department. Old bugs are implicit business specifications.

The agent’s first job is not to fix code. It is to freeze current behavior.

Poke the API. Record the response. Pin that response as a Hurl test. Bizarre bug or intended behavior — no distinction. Freeze it as-is. This is the first tooth of the ratchet — it locks the agent out of “improving” things on its own.
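The post pins responses as Hurl tests. The same freeze idea can be sketched in Go with net/http/httptest: record what the endpoint returns today, then flag any later difference, whether it is a fix or a regression. Everything here is hypothetical: the Snapshot type, the Record and Diff helpers, and the two toy handlers, where the legacy one truncates a price instead of rounding it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Snapshot is the frozen observable behavior of one request.
type Snapshot struct {
	Status int
	Body   string
}

// Record sends a GET to the handler in-process and captures the reply.
func Record(h http.Handler, path string) Snapshot {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return Snapshot{Status: rec.Code, Body: rec.Body.String()}
}

// Diff returns "" when behavior is unchanged, otherwise a description.
// It makes no judgment about which side is "correct".
func Diff(frozen, current Snapshot) string {
	if frozen == current {
		return ""
	}
	return fmt.Sprintf("behavior changed: frozen %d %q, now %d %q",
		frozen.Status, frozen.Body, current.Status, current.Body)
}

// priceMilli is 19.999 expressed in thousandths, to keep the math exact.
const priceMilli = 19999

// legacyTotal truncates to cents: the decade-old "bug".
func legacyTotal(w http.ResponseWriter, r *http.Request) {
	cents := priceMilli / 10
	fmt.Fprintf(w, "%d.%02d", cents/100, cents%100)
}

// fixedTotal rounds to the nearest cent: the agent's "improvement".
func fixedTotal(w http.ResponseWriter, r *http.Request) {
	cents := (priceMilli + 5) / 10
	fmt.Fprintf(w, "%d.%02d", cents/100, cents%100)
}

var (
	legacy http.Handler = http.HandlerFunc(legacyTotal)
	fixed  http.Handler = http.HandlerFunc(fixedTotal)
)

func main() {
	frozen := Record(legacy, "/total") // freeze current behavior as-is
	fmt.Println("frozen:", frozen.Status, frozen.Body)
	fmt.Println(Diff(frozen, Record(fixed, "/total")))
}
```

The comparison fails on the rounding "fix" just as it would on a real regression. Whether 19.99 or 20.00 is right is left to whoever holds the spec.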

Changes are decided only by the person who holds the spec. The agent is an executor. Not a judge.

Once the freeze is done, the transition toward the three conditions — readability, verifiability, persistence — begins.

Not Just Code

“Agent-operable codebase” is the starting point. A corporation’s digital assets are not just code.

Asset	Current State	Agent-Operable State
Code	2,000-line files, no tests	One concept per file, tests for every function
Database	Unnormalized, undocumented	Declarative DDL management, auto-generated migrations
Specs	Wiki, verbal handoffs, drift	9 SSOTs cross-verified, chained by a single identifier
Documents	Rules buried in PDFs and Excel	Schema-extracted, machine-readable
API	Undocumented, implicit contracts	OpenAPI capture, schema validation

Individually, each row looks like “we should tidy up.” Together, they form a system.

Symbolic Feedback Loop

A common structure makes this transition possible.

LLM generates → deterministic tool judges → result fed back to LLM → repeat

In code, in tests, in specs, in data — the same loop operates:

Code structure:     filefunc validate → violation feedback → LLM fixes → repeat
Tests:              go test + coverage → uncovered-line feedback → LLM augments → repeat
Spec consistency:   yongol check → drift feedback → LLM fixes → repeat
User rules:         rulecat evaluate → violation feedback → LLM fixes → repeat

The only thing the LLM does is generate. What to fix, whether it passed, what comes next, whether it is done — all decided by machines. The LLM gets no decision-making authority.
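The loop itself is small. Below is a hedged sketch of its control flow, with a toy generator standing in for the LLM and a toy checker standing in for tools such as go test. Generator, Verifier, Loop, and the TODO-counting example are hypothetical names invented here; they are not the interfaces of filefunc, yongol, or rulecat.

```go
package main

import (
	"fmt"
	"strings"
)

// Generator stands in for the LLM. It receives the previous draft and
// the verifier's feedback and returns a new draft. It decides nothing.
type Generator func(draft, feedback string) string

// Verifier is a deterministic tool. Empty feedback means "pass".
type Verifier func(draft string) (feedback string)

// Loop runs generate -> verify -> feed back until the verifier passes
// or maxRounds is used up. The machine, not the generator, decides
// when the work is done.
func Loop(gen Generator, verify Verifier, maxRounds int) (draft string, rounds int, ok bool) {
	feedback := ""
	for rounds = 1; rounds <= maxRounds; rounds++ {
		draft = gen(draft, feedback)
		feedback = verify(draft)
		if feedback == "" {
			return draft, rounds, true
		}
	}
	return draft, maxRounds, false
}

// noTODO is a toy deterministic check: it fails while TODO markers remain.
func noTODO(draft string) string {
	if n := strings.Count(draft, "TODO"); n > 0 {
		return fmt.Sprintf("%d TODO markers remain", n)
	}
	return ""
}

// toyGen is a toy generator: it resolves one TODO per round.
func toyGen(draft, feedback string) string {
	if draft == "" {
		return "TODO a; TODO b; done"
	}
	return strings.Replace(draft, "TODO ", "", 1)
}

func main() {
	draft, rounds, ok := Loop(toyGen, noTODO, 5)
	fmt.Println(draft, rounds, ok)
}
```

The round limit matters as much as the verifier. Without it, a generator that never converges would loop forever; with it, the run ends with ok set to false and the last feedback available for a human to inspect.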

This is not an invention. C. elegans dedicates 60 of its 302 neurons (20%) to sensory input. To verification, not generation. Five hundred million years of evolution reached this conclusion: improving feedback quality beats adding more neurons for survival.

The industry is making the train (the model) faster. Bigger models, smarter agents, better prompts. But the faster the train, the more the rails matter.

80/20

In the final state, the system splits into two layers.

SSOT (80–90%)
├── OpenAPI, DDL, SSaC, FuncSpec, Rego, Hurl, React TSX, Mermaid, manifest
└── Generated from specs. Drift eliminated at the source. Agents modify freely.

Custom (10–20%)
├── Business rules, domain logic, legal/policy calculations
└── Structured with filefunc, tested with tsma. Humans review.

The code that humans actually need to care about compresses to 10–20%. The rest is generated by agents reading specs, verified by machines.

The Fortune 500’s 60%

60–80% of enterprise IT budgets go to legacy maintenance. 42% of developer time goes to technical debt. 70% of digital transformation projects fail to meet their goals.

This budget is already being spent. No new budget is needed. Just redirect it. Turn maintenance budgets into transformation budgets.

Feed in the legacy system whole, and an agent-operable system comes out. That is the promise of Building Agent-Operable Systems.

Why Big Tech Won’t Do This

Anthropic and OpenAI build general-purpose models. Improve a model by 10% and it applies to every customer. But build a Go test feedback loop and it only applies to Go developers. Build a Python coverage tool and it only applies to Python projects.

Symbolic verification is inherently domain-specific. Every language, every framework, every spec requires a different verifier. No generality means it does not fit big tech’s ROI.

That is why this space is empty. The people building the train and the people laying the rails are not competitors. They are complements.

Questions

Your agent writes code. But who checks whether that code is correct?

Another agent? Or go test?

Does your LLM actually read all 100,000 lines?

Or does it pretend to?

What the agent era needs is not smarter agents. It is systems where agents can work.

Sources

Gartner, “IT Budget and Cost Optimization” — 60–80% of enterprise IT budgets consumed by legacy maintenance
Stripe & Harris Poll, The Developer Coefficient (2018) — 42% of developer time spent on technical debt
McKinsey & Company, Why do most transformations fail? (2019) — ~70% of digital transformation projects fall short of goals
Hyrum Wright, Hyrum’s Law — “With a sufficient number of users of an API, all observable behaviors of your system will be depended on by somebody”
Winters, Manshreck, Wright, Software Engineering at Google (O’Reilly, 2020) — Formal book source for Hyrum’s Law
White et al., “The structure of the nervous system of C. elegans”, Phil. Trans. R. Soc. Lond. B 314 (1986) — 302-neuron connectome
Inglis et al., The sensory cilia of C. elegans, WormBook (2007) — 60 sensory neurons (~20% of total)
METR, Early-2025 AI Developer Productivity Study (2025) — AI tools made experienced developers 19% slower, yet developers perceived 24% speedup
GitClear, AI Copilot Code Quality 2025 (2025) — 211M lines analyzed, refactoring down 60%, copy-paste code up 48%
Mehtiyev & Assuncao, Beyond Resolution Rates (2026) — 19 agents, 9,374 trajectories; 12.4% of total compute spent on zero-yield tasks

Changelog

2026-05-27: Initial release