tsma – Regression Defense Line for Legacy Code

How Do You Refactor Code with No Tests?

You inherited a 100,000-line legacy codebase. There are no tests. You want to refactor, but touching anything might break something. Writing tests requires understanding the code, and understanding the code requires documentation – which doesn’t exist either.

Nobody touches it. It rots further.

Every legacy codebase in the world is stuck in this deadlock. 60-80% of Fortune 500 IT budgets go to legacy maintenance. 42% of developer time is spent dealing with technical debt.

What if an LLM could write the tests for you?


The Problems When You Hand Tests to an LLM

Ask an LLM to “write tests for this function” and it produces something. The problems are threefold.

First, it doesn’t know where to start. When there are 527 functions, do you go in order from #1? Start with the most critical? There’s no criterion.

Second, you can’t verify test quality. The LLM’s tests pass. But are they actually verifying the function’s behavior, or are they empty shells that call the function without a single assertion? You’d have to read each one manually to know.

Third, without feedback, LLM tests plateau at 60-70%. Just saying “test this function” won’t reach 100% branch coverage. You need to tell it which branches are missing so it can fill the gaps.

It’s not that LLMs can’t write tests. The problem is the absence of a structure that tells the LLM what to write and how well it wrote it.


tsma: A Test Rail Driven by One Command

tsma is a CLI tool that indexes every function in a project, detects test presence, measures coverage, and gives precise feedback to LLM agents.

The agent needs to know exactly one command:

$ tsma next

This single command drives the entire loop:

$ tsma next          # Shows the next untested function
  → Write a test
$ tsma next          # Detects the new test, runs it, measures coverage
  → 100%? PASS, move to the next function
  → <100%? Reports uncovered branches with line numbers
$ tsma next          # Re-measures the revised test
  → Whether improved or not, marks DONE and moves on

Repeat until “All functions complete!” appears.


Validated on 527 Functions

tsma was applied to a real Go project with 527 functions.

Result                         Count   Ratio
PASS (100% branch coverage)      246   46.7%
DONE (best-effort)               281   53.3%
TODO (unprocessed)                 0    0.0%

246 functions reached 100% branch coverage. The remaining 281 did not reach 100%, but best-effort tests were still written for them.

Why can’t some functions reach 100%?


Functions That Reach 100% and Those That Don’t

Whether a function can reach 100% branch coverage depends on how it receives its dependencies.

Interface (mockable) – 100% achievable:

type Handler struct {
    svc AuthSvc              // interface -- replaceable with a mock
}

Inject a mock in tests and you can control every path:

svc := mocks.NewMockAuthSvc(ctrl)
svc.EXPECT().Login(...).Return(result, nil)   // success path
svc.EXPECT().Login(...).Return(nil, err)      // failure path

Concrete type (not mockable) – 100% impossible:

type Handler struct {
    svc *service.SMSImportService    // struct pointer -- not replaceable
}

The real implementation runs with internal dependencies on databases, external APIs, etc. You can’t force specific errors or specific return values. Branches that depend on those results are unreachable by unit tests.

tsma’s response: After uncovered-branch feedback, it tries once more. If the branches are still unreachable, it accepts DONE. This isn’t a tool limitation – it reflects the code’s testability. Introducing interfaces (DI) would make 100% possible, but that means modifying the original code.


Feedback Dramatically Transforms LLM Tests

tsma’s core value isn’t indexing or coverage measurement. It’s telling the LLM exactly which branches are uncovered, by line number.

Without feedback:

"Write tests for the ListContracts function"
→ LLM tests only the happy path
→ Coverage 60-70%

With feedback:

"Write tests for the ListContracts function"
→ Coverage 65% (11/17)
→ UNCOVERED:
    line 41 -- if params.Status != nil
    line 44 -- if params.BuildingId != nil
    line 70 -- if err != nil (CountSummary)
→ LLM adds tests covering exactly those branches
→ Coverage 100%

Same LLM. The only difference is the presence of feedback. Three lines of line numbers separate 60% from 100%.


Progress Survives Even When the Agent Dies

LLM agents crash. Token limits, network errors, session drops. You can’t process 527 functions in a single session.

tsma persists progress to .tsma/session.json.

$ tsma status

527 functions
PASS:  246 (46.7%)
DONE:  281 (53.3%)
TODO:    0 (0.0%)

If the agent dies after function #200, a new agent runs tsma next and picks up at #201. session.json is the checkpoint.

Multiple agents can take turns without conflicts. Each function is atomic.
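As a rough mental model, .tsma/session.json can be pictured as a per-function status map like the following (this shape is a guess for illustration; tsma's actual schema may differ):

```json
{
  "functions": {
    "pkg/auth.Login":  {"status": "PASS", "coverage": 1.0},
    "pkg/sms.Import":  {"status": "DONE", "coverage": 0.62},
    "pkg/util.Parse":  {"status": "TODO"}
  }
}
```

Any agent that can read this file knows exactly where the previous one stopped.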


The Session Is a Cache; Source Files Are the Truth

One of tsma’s design principles: the session is a cache, and source files are the source of truth.

If you delete a test file, even if session.json records it as PASS, that function reverts to TODO. The session never drifts from reality.

Principle:
  Even if session.json says "PASS"
  If the test file is missing → TODO
  If the source file changed → re-measurement target

Instructions for the LLM Agent

The agent needs exactly 6 lines of instructions:

1. Run tsma next
2. If TODO -- read the function and write a test
3. If the test fails -- read the error and fix the test
4. If uncovered branches are shown -- add tests covering those branches
5. If PASS/DONE -- the next function is shown automatically
6. Repeat until "All functions complete!" appears

The only command the agent needs to know is tsma next. The CLI constrains the rest.


Trains and Tracks

Vibe coding is a train. It’s fast. But without tracks, it derails.

Every AI coding tool is focused on making the train faster. Bigger models, smarter agents, better prompts. But the faster the train goes, the worse the derailment.

tsma is the track. The LLM generates tests (Neural), and the CLI defines “this far and no further” (Symbolic Constraint). The LLM’s creativity stays intact, but the quality of results is enforced by the machine.

                     Before                           tsma
Test writing         Human (slow) or LLM (chaotic)    LLM writes, CLI verifies
Where to start?      Human decides                    CLI determines order
Quality check        Human reviews                    CLI measures coverage
Feedback             None                             Uncovered branch line numbers
Progress tracking    None                             session.json automatic

The LLM generates freely. But it runs only on the track called tsma next.


Language Support

Language     Indexer   Test Runner             Coverage
Go           go/ast    go test                 go test -coverprofile
TypeScript   regex     npx vitest / npx jest   c8 / istanbul
Python       regex     pytest                  coverage.py

Go uses an AST parser for precise function extraction. TypeScript and Python use regex-based extraction.

Generated files (*_gen.go, *.pb.go), test files, and vendor/node_modules are automatically excluded from indexing.


Installation and Usage

make install
cd your-legacy-project
tsma next

That’s all.

MIT License. github.com/park-jun-woo/tsma