Class 10. Law of Data — Agent Operable Data


Quick Tips — Just Know This and You Can Command AI

We structured code (Class 8) and systems (Class 9). What remains is data, and data is the most dangerous of the three. When code is wrong, tests catch it. When the system is wrong, /health catches it. When data is wrong, nobody knows until it turns up 3 months later in a quarterly report.

To the agent: “Make the schema with explicit columns and constraints instead of JSONB. amount must be greater than 0, status must only accept defined values.”

If anything can go into JSONB, after 6 months you have 100 different formats mixed together. Explicit columns and constraints are data’s law: the DB immediately rejects anything that violates them.

To the agent: “Import this Excel into the DB. DDL constraints must be respected. Report rows that violate constraints in a separate file.”

The agent runs the conversion and the DB constraints do the validating. Rejected rows are reported with reasons, so you only check the rejected data.

To the agent: “Modify the DDL and pass yongol validate. Generate migration files, and roll back on failure.”

The ratchet works when schemas change, too: pass and proceed to the next step; fail and revert.

If you don’t know databases, all you do is decide “what to store.” “Phone numbers must start with 010,” “emails must be unique”: speak these decisions in natural language and the agent translates them into DDL.

Hands-on Try

You can try this without a DB. In Claude Code:

“Create CSV data: 10 customers with name, email, phone, signup date. Intentionally mix in problems: 2 with bad email format, 1 with empty phone, 1 with future signup date.”

Once the CSV is generated:

“Find the problematic rows in this CSV.”

See how many the AI finds. Most likely it won’t find all of them; some rows slip through as “looks fine.”

Now:

“First define a schema for this CSV. Email must contain @, phone is NOT NULL, signup date is before today. Validate again with that schema.”

With a schema declared first, the AI checks every row mechanically against the rules and catches every violation they cover. Ask for opinions and it misses; give rules and it catches. The principle from Class 7 applies identically to data.
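The hands-on result can be reproduced in a few lines of Python. In this sketch the rules from the prompt are declared as data in a `SCHEMA` dict, and validation is a mechanical loop over every row and every rule, with nothing left to judgment. The sample rows, the `validate` helper, and the fixed `today` date are assumptions chosen so the run is reproducible.

```python
import csv
import io
from datetime import date

# Hypothetical schema for the hands-on CSV: column -> list of (rule text, check).
# Each check receives the cell value and the reference date.
SCHEMA = {
    "email": [("email must contain @", lambda v, today: "@" in v)],
    "phone": [("phone is NOT NULL", lambda v, today: v != "")],
    "signup_date": [
        ("signup date is before today", lambda v, today: date.fromisoformat(v) < today)
    ],
}

# Hypothetical sample: one clean row, then one violation per rule.
CSV_TEXT = """name,email,phone,signup_date
Kim,kim@example.com,010-1234-5678,2024-03-01
Lee,lee.example.com,010-2222-3333,2024-05-10
Park,park@example.com,,2024-06-01
Choi,choi@example.com,010-9999-0000,2099-01-01
"""


def validate(rows, schema, today):
    """Apply every rule to every row; return (line number, column, rule) per violation."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, rules in schema.items():
            value = (row.get(column) or "").strip()
            for rule, check in rules:
                try:
                    ok = check(value, today)
                except ValueError:  # e.g. an unparseable date counts as a violation
                    ok = False
                if not ok:
                    problems.append((line_no, column, rule))
    return problems


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
problems = validate(rows, SCHEMA, today=date(2025, 1, 1))
for line_no, column, rule in problems:
    print(f"line {line_no}: {column} violates '{rule}'")
```

Adding a rule means adding one entry to `SCHEMA`; the loop never changes. The human declares the rules, the machine applies them.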

Why You Need to Command This Way

Introduction: Data Corrupts Before Code

We structured code (Class 8). We structured the system (Class 9). What remains is data.

But data is fundamentally different from code or systems.

When code is wrong, tests catch it. Run go test and within a second you see what broke. When the system is wrong, monitoring catches it: /health returns 500 and an alarm sounds immediately.

When data is wrong, nobody knows.

A customer phone number should start with 010, but someone entered one starting with 02. An order amount is negative. The delivery status is “shipping” but the shipping date is null. Tests don’t catch these errors, and neither does monitoring. They are discovered 3 months later in a quarterly report: “why is revenue negative?”

Imagine building an app with vibe coding. “Make an order management app” produces code fast. Users enter data. The agent adds features and the data format changes. Migrations don’t run cleanly, and old and new data get mixed. The code is fine, but the data is corrupted.

Code drift is visible. Data drift is invisible.

This is why data is more dangerous than code.

Three Types of Data Corruption

Common data corruption in vibe coding falls into three types.

1. No Schema — Trading Without a Contract

If you just tell the agent “make an order table”:

-- If you make it like this
CREATE TABLE orders (
    id SERIAL,
    data JSONB    -- anything goes in here
);

This table declares no rules at all, and that is the root cause.

A JSONB column accepts anything. That is convenient at first. 6 months later, 100 different formats are mixed together: some orders have amount, others have price; some amounts are numbers, others are strings. To handle this data, agents must guess among 100 formats.
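One way to make this drift visible is to count the distinct “shapes” in the column, where a shape is the set of keys plus the type of each value. The four rows below are hypothetical, written the way different features might write them over time; four orders already produce four shapes, and any code reading the column has to handle every one of them.

```python
import json
from collections import Counter

# Hypothetical contents of a `data JSONB` column after a few feature changes.
ROWS = [
    '{"amount": 12000, "item": "pen"}',
    '{"amount": "15,000", "item": "notebook"}',  # amount became a string
    '{"price": 9000, "item": "eraser"}',         # amount renamed to price
    '{"price": 9000.5, "product": "ruler"}',     # float price, item renamed to product
]


def shape(doc):
    """A document's shape: sorted (key, value type) pairs."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


shapes = Counter(shape(json.loads(raw)) for raw in ROWS)
print(f"{len(ROWS)} rows, {len(shapes)} distinct shapes")
for s, count in shapes.items():
    print(count, s)
```

With explicit columns and types, the table itself fixes the shape, so there is nothing to count.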

2. Migration Failure — Past vs Present Collision

You tell the agent “add an email field to the user table.” The agent modifies the DDL and runs the migration. New users have an email, but the existing 100,000 users have email set to NULL. The code assumes email always exists, so existing users who log in get 500 errors.
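A minimal sketch of the pre-deploy check this scenario was missing, using an in-memory SQLite database. The `users` table, its rows, and the `users_without_email` helper are hypothetical. After the naive migration every pre-existing row has `email = NULL`, and counting those rows before shipping code that assumes an email is a one-line query. SQLite also refuses to add a `NOT NULL` column without a non-NULL default, which forces the backfill decision into the open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Kim",), ("Lee",), ("Park",)])

# Naive migration: the column is added, but every existing row gets NULL.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Choi", "choi@example.com"))


def users_without_email(conn):
    """Count rows that would break code assuming email always exists."""
    return conn.execute("SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]


missing = users_without_email(conn)
print(f"{missing} existing users have no email")  # gate: don't deploy while this is > 0

# Declaring a new column NOT NULL without a default is refused outright.
try:
    conn.execute("ALTER TABLE users ADD COLUMN phone TEXT NOT NULL")
    not_null_added = True
except sqlite3.OperationalError as err:
    not_null_added = False
    print("rejected:", err)
```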

3. Business Rule Violation — Data That Shouldn’t Be Allowed

“The discount rate must be between 0 and 50%.” If this rule exists only in code, the agent can remove it during a refactor. Without a CHECK (discount >= 0 AND discount <= 50) constraint in the DB, a 200% discount goes in and nobody knows until settlement, 3 months later.
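A minimal sketch of why the rule belongs in the DB, using an in-memory SQLite database. `save_order` is a hypothetical function whose 0-50% check has been lost in a refactor; the `CHECK` constraint still rejects the 200% discount, because it doesn’t depend on any code path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        discount INTEGER NOT NULL CHECK (discount >= 0 AND discount <= 50)
    )
""")


def save_order(conn, discount):
    # Hypothetical: the agent removed the 0-50% validation from here during a refactor.
    conn.execute("INSERT INTO orders (discount) VALUES (?)", (discount,))


save_order(conn, 20)  # within the rule: stored
try:
    save_order(conn, 200)  # 200% discount
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected by the DB:", err)

stored = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(stored, "order stored")
```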

4 Conditions for Agent Operable Data

Four conditions are needed for agents to safely handle data.

Condition 1. Schema Is Declared — DDL Is the Contract for Data

Below is a database blueprint (DDL). It looks like a programming language, but each line is one rule. You don’t need to read it; each line is explained right below.

CREATE TABLE orders (
    id          BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id BIGINT       NOT NULL REFERENCES customers(id),
    amount      DECIMAL(12,2) NOT NULL CHECK (amount > 0),
    status      TEXT         NOT NULL DEFAULT 'pending'
                             CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
    shipped_at  TIMESTAMPTZ,
    CONSTRAINT  shipped_requires_date 
                CHECK (status != 'shipped' OR shipped_at IS NOT NULL)
);

Let’s read this DDL. Even non-developers can read it.

amount must be greater than 0 — negative orders impossible
status must be one of 5 — arbitrary values like “processing” are rejected
shipped_at must exist when status is ‘shipped’ — prevents “shipping but no ship date”
customer_id must exist in customers table — prevents ghost customers

DDL is the contract for data. Types, constraints, and relationships are explicit, so agents don’t have to guess how to interpret the data. The rules are declared in the DB itself.
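To watch the four rules reject bad rows, the DDL can be run almost unchanged in SQLite. This is a sketch under one stated assumption: PostgreSQL-only syntax (`GENERATED ALWAYS AS IDENTITY`, `DECIMAL`, `TIMESTAMPTZ`, `now()`) is swapped for SQLite equivalents, and foreign keys are switched on explicitly because SQLite leaves them off by default. The sample rows are hypothetical.

```python
import sqlite3

# SQLite translation of the orders DDL (types and defaults adapted; rules unchanged).
DDL = """
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      NUMERIC NOT NULL CHECK (amount > 0),
    status      TEXT    NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    shipped_at  TEXT,
    CONSTRAINT shipped_requires_date CHECK (status != 'shipped' OR shipped_at IS NOT NULL)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript(DDL)
conn.execute("INSERT INTO customers (id) VALUES (1)")

# (customer_id, amount, status, shipped_at)
attempts = {
    "valid order": (1, 50000, "pending", None),
    "negative amount": (1, -100, "pending", None),
    "unknown status": (1, 50000, "processing", None),
    "shipped without date": (1, 50000, "shipped", None),
    "ghost customer": (999, 50000, "pending", None),
}

results = {}
for label, row in attempts.items():
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, amount, status, shipped_at) VALUES (?, ?, ?, ?)",
            row,
        )
        results[label] = "accepted"
    except sqlite3.IntegrityError as err:
        results[label] = f"rejected: {err}"
    print(f"{label:22} {results[label]}")
```

Each rejection message names the violated constraint, which is what lets an agent report a reason instead of guessing.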

Condition 2. Transformations Are Verifiable

Data transforms. From CSV to DB. From one table to another. From raw data to reports.

Transformation rules must be declarative and results mechanically checkable.

# Unverifiable transformation
Tell the agent: "Put this Excel file into the DB"
→ Agent maps columns on its own
→ 3,000 of 100,000 rows wrongly mapped
→ Nobody knows

# Verifiable transformation
Tell the agent: "Put this Excel file into the DB.
Follow the mapping rules in transform.yaml.
After import, compare row counts
and verify the amount totals match the original."

Write the transformation rules in a declaration file and verify invariants (such as row counts and totals) before and after. That is a verifiable transformation.
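A minimal sketch of a verifiable import: the mapping is declared as data (the `MAPPING` dict stands in for transform.yaml), and two invariants, row count and amount total, are compared between source and DB. The CSV contents, column names, and `check_invariants` helper are hypothetical. Amounts are summed as `Decimal` so the totals compare exactly.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Hypothetical Excel sheet exported as CSV, and a declared column mapping.
SOURCE = """Order No,Total
A-001,12000.50
A-002,8000
A-003,30500.25
"""
MAPPING = {"Order No": "order_no", "Total": "amount"}

source_rows = list(csv.DictReader(io.StringIO(SOURCE)))

conn = sqlite3.connect(":memory:")
# amount is stored as TEXT so the exact decimal string survives the round trip.
conn.execute("CREATE TABLE orders (order_no TEXT PRIMARY KEY, amount TEXT NOT NULL)")
for row in source_rows:
    mapped = {MAPPING[col]: value for col, value in row.items()}
    conn.execute("INSERT INTO orders (order_no, amount) VALUES (:order_no, :amount)", mapped)


def check_invariants(source_rows, conn):
    """Return a list of broken invariants; an empty list means the import is verified."""
    failures = []
    src_count = len(source_rows)
    db_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    if src_count != db_count:
        failures.append(f"row count: source {src_count}, db {db_count}")
    src_total = sum(Decimal(r["Total"]) for r in source_rows)
    db_total = sum(Decimal(a) for (a,) in conn.execute("SELECT amount FROM orders"))
    if src_total != db_total:
        failures.append(f"amount total: source {src_total}, db {db_total}")
    return failures


print(check_invariants(source_rows, conn) or "verified: counts and totals match")
```

A mis-mapped column (say, a tax column mapped into amount) would break the total invariant the same way a dropped row breaks the count.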

yongol’s DDL → sqlc chaining is an example of this principle. You declare the schema in DDL, and sqlc generates type-safe Go code from it. If the DDL and the sqlc queries drift apart, yongol validate catches it, so everything from schema to code is verified end to end.

Condition 3. Source and Timestamp Are Tracked

“When, where, and why was this data created?”

Machines must be able to answer this question.

CREATE TABLE orders (
    ...
    source      TEXT         NOT NULL,   -- 'web', 'api', 'import', 'migration'
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
    created_by  BIGINT       REFERENCES members(id),
    updated_at  TIMESTAMPTZ,
    updated_by  BIGINT       REFERENCES members(id)
);

Source, timestamps (created_at, updated_at), and actors (created_by, updated_by) are recorded in the DB. When the agent investigates “why is this order negative?”:

SELECT source, created_at, created_by FROM orders WHERE amount < 0;
-- source: 'import', created_at: 2026-02-15, created_by: NULL

“Negative amounts came from data imported on Feb 15, with no actor recorded.” With this information, the cause can be traced. Without it, it’s a mystery.
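The same trace as a runnable sketch in SQLite. The rows are hypothetical; grouping the suspicious rows by source, day, and actor isolates the bad batch in one query instead of inspecting rows one by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        amount     NUMERIC NOT NULL,
        source     TEXT    NOT NULL CHECK (source IN ('web', 'api', 'import', 'migration')),
        created_at TEXT    NOT NULL,
        created_by INTEGER            -- NULL means no actor was recorded
    )
""")
conn.executemany(
    "INSERT INTO orders (amount, source, created_at, created_by) VALUES (?, ?, ?, ?)",
    [
        (50000, "web", "2026-02-10T09:00:00", 7),
        (-12000, "import", "2026-02-15T03:12:00", None),
        (-800, "import", "2026-02-15T03:12:01", None),
        (30000, "api", "2026-02-16T11:30:00", 3),
    ],
)

# Where did the negative amounts come from?
trace = conn.execute("""
    SELECT source, date(created_at), created_by, COUNT(*)
    FROM orders
    WHERE amount < 0
    GROUP BY source, date(created_at), created_by
""").fetchall()
print(trace)
```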

Just as whyso tracks the “why” of code, the “why” of data must be tracked too. For code, whyso plays that role; for data, the source and timestamp columns do.

Condition 4. Ratchet Applies to Data Changes Too

The Ratchet Pattern from Class 6 doesn’t apply only to code. It applies to data too.

Migration ratchet:

Schema change request
→ DDL modification
→ yongol validate (cross-validation passes)
→ Migration file auto-generated (up + down)
→ Apply to staging DB
→ Verify existing data integrity
→ Pass → Apply to production (approval gate)
→ Fail → Rollback with down file

yongol already implements this. yongol generate detects DDL changes and auto-generates migration files. Up and down files come in pairs, so no migration is irreversible.

What the ratchet guarantees: if a migration succeeds, proceed to the next step; if it fails, revert to the previous state. It never stops halfway. This is the same principle as the code ratchet.
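A minimal sketch of one ratchet step in Python with SQLite, which runs DDL inside transactions. The `apply_migration` helper, the table, and both migrations are hypothetical and are not yongol’s implementation. A step is applied inside a transaction, integrity checks run against the existing data, and the step is committed only if everything passes; any failure rolls back to the previous state. The `down` statements are only recorded here, as the way to revert a step that was already committed.

```python
import sqlite3


def apply_migration(conn, name, up, down, checks, history):
    """One ratchet step (hypothetical helper).

    `up`/`down` are lists of SQL statements; each query in `checks` returns a count
    of existing rows that are incompatible with the new schema (0 means compatible).
    """
    conn.execute("BEGIN")
    try:
        for statement in up:
            conn.execute(statement)
        for query in checks:
            (bad_rows,) = conn.execute(query).fetchone()
            if bad_rows:
                raise ValueError(f"{bad_rows} existing rows fail: {query}")
    except (sqlite3.DatabaseError, ValueError) as err:
        conn.execute("ROLLBACK")  # fail: revert to the previous state
        print(f"{name}: rolled back ({err})")
        return False
    conn.execute("COMMIT")        # pass: lock the step in
    history.append((name, down))  # progress persists; `down` can revert it later
    print(f"{name}: applied")
    return True


conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount NUMERIC NOT NULL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(100,), (-5,)])
history = []

# Step 1: add a status column; the check confirms existing rows were backfilled.
ok1 = apply_migration(
    conn, "001_add_status",
    up=["ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'pending'"],
    down=["ALTER TABLE orders DROP COLUMN status"],
    checks=["SELECT COUNT(*) FROM orders WHERE status IS NULL"],
    history=history,
)

# Step 2: rebuild the table with CHECK (amount > 0); the existing -5 row makes it fail.
ok2 = apply_migration(
    conn, "002_positive_amount",
    up=[
        "CREATE TABLE orders_new (id INTEGER PRIMARY KEY,"
        " amount NUMERIC NOT NULL CHECK (amount > 0), status TEXT NOT NULL DEFAULT 'pending')",
        "INSERT INTO orders_new SELECT id, amount, status FROM orders",
        "DROP TABLE orders",
        "ALTER TABLE orders_new RENAME TO orders",
    ],
    down=[],  # not needed in this run: the step is rolled back before it is committed
    checks=[],
    history=history,
)
```

Step 2 fails because the existing data is incompatible, so nothing changes and the history still ends at step 1. Fixing the -5 row is a human decision; after that, the same step can be retried.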

Schema Is the Law I Establish

Here the philosophy that runs through the entire course comes into view.

In Class 5 we learned that “constraints are contracts.” The three conditions of the rule of law (verifiable, violations defined, enforceable) apply identically to code.

In data, this principle manifests more directly.

Databases have schemas.

Schemas define what valid data is and what isn’t. NOT NULL, FOREIGN KEY, CHECK: data must pass these constraints to be stored, no matter who inserts it. Human, program, or AI, if the data satisfies the schema it enters; if not, it is rejected. The idea goes back to the relational model E.F. Codd proposed in 1970.

Schema is law. Law I establish.

Let’s map the rule-of-law principles onto data again:

Rule of Law	Data Schema
Verifiable	`CHECK (amount > 0)` — DB verifies automatically
Violation is defined	NOT NULL violation, FOREIGN KEY violation — discrete
Enforceable	INSERT is rejected on violation

This connects to the author’s worldview:

Law is not justice (正義) but definition (定義).

Law doesn’t guarantee justice. Schemas don’t guarantee data’s “truth” either. But law guarantees definition. Schemas guarantee validity.

This minimal guarantee (rules knowable in advance, mechanically verifiable, violations rejected) is what humanity spent thousands of years winning in blood, and what databases have proven over 50 years.

Data without schema is a society without law. Anyone can put any data in. Wrong data goes unnoticed. Discovered 3 months later.

Data with schema is a rule-of-law society. Break the rules and you’re immediately rejected. Reasons are stated. You can fix and retry.

From Unstructured to Structured

Most real-world data is unstructured.

Excel files — different formats per sheet
Call recordings — audio files
Meeting notes — free-form text
PDF documents — semi-structured
Email — natural language

For agents to handle this data, structuring must come first.

Unstructured data → Decide schema → Transform            → DB

Excel             → Declare DDL   → Import               → PostgreSQL
Recordings        → STT           → Structure            → Summary DB
Notes             → Parse         → Extract action items → Task DB
PDF               → OCR + Parse   → Extract fields       → Document DB

Note the key point: humans decide the schema (the structure), and agents execute the transformation.

“Separation of decisions and implementation” from Class 5 applies identically here.

Decision (human): “Customer info needs name, phone, email, signup date. Phone must start with 01.”
Implementation (agent): Read Excel, map columns, put data matching constraints into DB.

The agent executes the transformation, and the DB constraints validate it. Data that violates a constraint is rejected, with the reason reported. A human checks the rejected data and decides whether to fix and re-insert it or discard it.
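A minimal sketch of that loop with SQLite as the validator. The sheet contents and column rules are hypothetical (the phone rule follows the “starts with 01” decision above). Each row is inserted as-is; the DB either accepts it or raises an error naming the violated constraint, and rejected rows go to a separate rejects file (an in-memory `StringIO` here) with that reason attached. One detail matters: an empty spreadsheet cell arrives as an empty string, which `NOT NULL` would accept, so it is converted to NULL first.

```python
import csv
import io
import sqlite3

# Hypothetical customer sheet, exported from Excel as CSV.
SHEET = """name,phone,email
Kim,010-1234-5678,kim@example.com
Lee,02-555-0101,lee@example.com
Park,010-8888-7777,park.example.com
Choi,,choi@example.com
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT NOT NULL CHECK (phone LIKE '01%'),
        email TEXT NOT NULL CHECK (email LIKE '%@%')
    )
""")

rejects = io.StringIO()  # stands in for a separate rejects.csv file
writer = csv.writer(rejects)
writer.writerow(["line", "name", "phone", "email", "reason"])

accepted = 0
for line_no, row in enumerate(csv.DictReader(io.StringIO(SHEET)), start=2):
    # An empty cell is "", which NOT NULL would accept; treat it as NULL instead.
    values = {col: ((val or "").strip() or None) for col, val in row.items()}
    try:
        conn.execute(
            "INSERT INTO customers (name, phone, email) VALUES (:name, :phone, :email)", values
        )
        accepted += 1
    except sqlite3.IntegrityError as err:
        writer.writerow([line_no, row["name"], row["phone"], row["email"], str(err)])

conn.commit()
print(f"{accepted} row(s) imported; rejected rows:")
print(rejects.getvalue())
```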

Data Validation Pattern: 3-Layer Defense

Data validation isn’t a single layer; it is three layers of defense.

1st Defense — DB Constraints (Most Powerful)

NOT NULL           -- Prevent empty values
UNIQUE             -- Prevent duplicates
CHECK (amount > 0) -- Range restriction
FOREIGN KEY        -- Referential integrity
DEFAULT            -- Guarantee defaults

Application code can’t bypass DB constraints. Whatever code you write and whatever agent you use, data that violates them doesn’t get in. That’s why they are the 1st defense, and why the most important rules must be declared as DB constraints.

2nd Defense — Business Rules (Rego)

Some rules can’t be expressed as DB constraints: “discounts over 30% require manager approval,” “3 or more orders per day from the same customer is a suspicious transaction.” These rules are declared in Rego.

You don’t need to read this either. State the rules in natural language and the agent translates them into Rego:

package orders

# Order validation rules (OPA 1.x syntax: `contains` and `if` are required)
deny contains msg if {
    input.order.discount > 30
    not input.approver.role == "manager"   # also denies when no approver is given
    msg := "Discounts over 30% require manager approval"
}

warn contains msg if {
    count(input.customer.orders_today) >= 3
    msg := "Same customer 3+ orders today: verify for suspicious transaction"
}

In natural language:

“Discount over 30% and manager didn’t approve → deny” = first rule
“Same customer ordered 3+ times today → warn” = second rule
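To make the deny/warn distinction concrete, here is the logic of the two rules mirrored in plain Python. This is a swapped-in illustration, not OPA and not how yongol evaluates Rego; the input shape (`order.discount`, `approver.role`, `customer.orders_today`) is the one the Rego rules read, and the `decide` helper is hypothetical. As in Rego, a missing field keeps a rule from firing, except that a missing approver counts as “not a manager.”

```python
def evaluate(inp):
    """Return (deny, warn) message lists for one order, mirroring the two Rego rules."""
    deny, warn = [], []
    discount = inp.get("order", {}).get("discount")
    role = (inp.get("approver") or {}).get("role")
    # Rule 1: discount over 30 and the approver is not a manager -> deny.
    if discount is not None and discount > 30 and role != "manager":
        deny.append("Discounts over 30% require manager approval")
    # Rule 2: 3 or more orders from the same customer today -> warn.
    orders_today = inp.get("customer", {}).get("orders_today")
    if orders_today is not None and len(orders_today) >= 3:
        warn.append("Same customer 3+ orders today: verify for suspicious transaction")
    return deny, warn


def decide(inp):
    """Hypothetical policy: deny blocks the order; warn lets it through but flags it."""
    deny, warn = evaluate(inp)
    if deny:
        return "block", deny
    if warn:
        return "allow with warning", warn
    return "allow", []


print(decide({"order": {"discount": 40}, "approver": {"role": "clerk"}}))
print(decide({"order": {"discount": 40}, "approver": {"role": "manager"},
              "customer": {"orders_today": [101, 102, 103]}}))
```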

Rego rules are one of yongol’s SSOTs. Class 4’s cross-validation works here too: if SSaC declares @auth, Rego must have a corresponding rule. If not, yongol validate catches it.

3rd Defense — Migration Ratchet

When the schema changes, this layer verifies that existing data is compatible with the new schema.

How the three lines of defense divide responsibility:

Defense	Handles	On violation
1st: DB constraints	Data integrity	INSERT/UPDATE rejected
2nd: Rego rules	Business logic	Warning or block
3rd: Migration ratchet	Schema evolution	Rollback or backfill

Same Pattern, Different Domains

Do you see the common pattern across Classes 8, 9, and 10?

	Class 8: Code	Class 9: System	Class 10: Data
Readable?	filefunc (1 file 1 concept)	/health (structured JSON)	DDL (declarative schema)
Verifiable?	go test + tsma	CI/CD + health check	DB constraints + Rego
Reversible?	git revert	Previous image rollback	migration down
Progress persists?	session.json (tsma)	Terraform state	migration history
Decisions separated from implementation?	SSOT → code generation	Declarative config → execution	Schema → data

All the same structure:

Declare, verify, lock, persist.

The principle that works in code works identically in systems and data. It isn’t a new invention, just the same principle applied to new domains.

“Constraints are contracts” from Class 5 runs through all three domains:

Code’s contract: filefunc 22 rules, yongol 287 cross-validation rules
System’s contract: Docker Compose, Terraform, CI/CD pipelines
Data’s contract: DDL constraints, Rego rules, migration ratchet

When reasonable constraints are verifiable, their violations are defined, and they are enforceable, any domain converges.

DDL → sqlc → Code: Seamless Chaining

Let’s see concretely how data chaining works in yongol.

You don’t need to read any code here either. Understanding the flow, from DDL all the way to auto-generated code, is enough.

1. Declare schema in DDL
2. Declare queries in sqlc
3. yongol validate cross-validates
4. yongol generate produces type-safe Go code

From schema (DDL) to code generation, there is no gap left for human interpretation. DDL changes → sqlc changes → generated code changes → tests catch any break. This structure keeps drift out of data-driven development.
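Code generation closes this gap because the model is derived from the schema. When a model is written by hand, the gap has to be checked instead. This sketch shows the kind of drift a cross-validation step looks for: it compares a table’s columns with a Python dataclass. The `Order` model and `schema_drift` helper are hypothetical; this is not how sqlc or yongol validate work internally.

```python
import dataclasses
import sqlite3


# Hypothetical hand-written model meant to mirror the orders table.
@dataclasses.dataclass
class Order:
    id: int
    customer_id: int
    amount: str
    status: str


def schema_drift(conn, table, model):
    """Compare a table's columns with a dataclass's fields (hypothetical check)."""
    # `table` comes from our own code, not user input, so formatting it in is safe here.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    fields = {f.name for f in dataclasses.fields(model)}
    return {
        "missing_in_code": sorted(columns - fields),
        "missing_in_db": sorted(fields - columns),
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,"
    " amount NUMERIC NOT NULL, status TEXT NOT NULL)"
)
before = schema_drift(conn, "orders", Order)

# The DDL gains a column, but the model is not regenerated: drift.
conn.execute("ALTER TABLE orders ADD COLUMN shipped_at TEXT")
after = schema_drift(conn, "orders", Order)
print(before)
print(after)
```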

The Vibe Coder’s Data Practice

“I don’t know DB — how do I do this?”

You don’t need to. The agent writes the DDL. All you do is decide what to store.

To the agent: "Make a customer management table.
- Need name, phone, email, signup date
- Phone must start with 010
- Email must be unique
- Signup date auto-populated
Make it as DDL with constraints."

The agent writes the DDL, yongol validate cross-validates it against the other SSOTs, and when it passes, the migration is generated. You make the decision (“phone must start with 010”) without reading the DDL.
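For the curious, here is one possible DDL for the prompt above, run in SQLite so each natural-language decision can be tested. It is a sketch, not necessarily what an agent will produce; a PostgreSQL version would use types such as `TIMESTAMPTZ` and `DEFAULT now()`. The comments map each constraint back to the sentence it came from, and the `try_insert` helper is hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE customers (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,                            -- "need name"
    phone       TEXT NOT NULL CHECK (phone LIKE '010%'),  -- "phone must start with 010"
    email       TEXT NOT NULL UNIQUE,                     -- "email must be unique"
    signup_date TEXT NOT NULL DEFAULT CURRENT_DATE        -- "signup date auto-populated"
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)


def try_insert(name, phone, email):
    """Insert a customer and report whether the DB accepted it."""
    try:
        conn.execute(
            "INSERT INTO customers (name, phone, email) VALUES (?, ?, ?)", (name, phone, email)
        )
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


r_ok = try_insert("Kim", "010-1234-5678", "kim@example.com")
r_phone = try_insert("Lee", "011-222-3333", "lee@example.com")   # wrong prefix
r_email = try_insert("Park", "010-555-6666", "kim@example.com")  # duplicate email
signup = conn.execute("SELECT signup_date FROM customers WHERE name = 'Kim'").fetchone()[0]
print(r_ok, r_phone, r_email, signup, sep="\n")
```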

Decisions in natural language. Realization of decisions in DDL. Verification of DDL by machine.

Truth Vanishes at the Speed of Light

Here we draw one final philosophy from this course.

Physics states a cold fact. The moment an event happens, its truth vanishes at the speed of light. The moon you see right now is the moon as it was about 1.3 seconds ago. A galaxy 10 billion light-years away appears as it was 10 billion years ago.

Truth physically vanishes. What remains are only claims — fragments of truth.

“I saw this.” “This measurement read this.” “This source said this.” — All claims. Claims with sources, timestamps, and reliability.

Data is the same. “Order amount 50,000 won” in the DB isn’t truth. It’s a claim. A claim someone put in at some time through some path. That’s why source and timestamp matter. Data without source is a claim without evidence. Data without timestamp is a newspaper without a date.

What schema does is give structure to claims. “This claim must be in this form, must satisfy these constraints, must come from this source.” This structure is data’s law.

What has no source is not my data. What has no timestamp is not my record. What has no schema is not my system’s data.

Class 10 Vision: Speak and It’s Built — Code, System, Data

In Class 1 we started here.

“Make a todo list app.”

Code appeared. It worked up to 3 features and crumbled at 5.

Where we stand at the end of Class 10:

Class 1's world:
  "Make an app" → Code appears → Crumbles at 5 features

Class 10's world:
  "Make an order management SaaS"
  → Decisions: Define schema, declare features, declare rules
  → Code: yongol generates from SSOT (Class 8)
  → System: CI/CD automates build-deploy-monitor (Class 9)
  → Data: DDL enforces schema, Rego validates rules (Class 10)
  → Approval: Human just presses "approve"
  → Doesn't crumble even at 200 endpoints

Class	What we learned	Result
1	Vibe coding’s present	“Speak and code appears”
2	Why it crumbles	Drift, context evaporation, sycophancy
3	How to prevent	Hurl, Git, CI/CD
4	The 200-endpoint wall	yongol — declarative SSOT
5	AI with reins	Reins Engineering 3 pillars
6	Lock and progress	Ratchet Pattern — one-directional ratchet
7	Reverse-engineer sycophancy	IFEval — feedback creates convergence
8	Structure the code	filefunc + tsma — Agent Operable Codebase
9	Structure the system	4 conditions — Agent Operable System
10	Structure the data	Schema is law — Agent Operable Data

Code → System → Data. The same principle works across all three domains.

Declare, verify, lock, persist. Decisions by humans, implementation and verification by machines. Not rule by man but rule of law.

From Class 1’s “make a todo list app” to Class 10’s “make an order management SaaS” — what changed isn’t model size. It’s structure. We put reins on the agent, laid tracks, and established law.

Speak and it’s built. Not just code, but system and data too. For that to be possible, there must be reins, there must be tracks, there must be law. Designing those reins, tracks, and law is Reins Engineering.

You started this course unable to read a single line of code. Having finished Class 10, what changed isn’t that you can now read code. You now know what to tell agents, why to tell them, and how to verify their reports. This is the capability of a decision-maker.

Reins Engineering Full Course

Class	Title
Class 0	Install Claude Code
Class 1	How to Command AI
Class 2	How to Distrust AI
Class 3	Apps That Don’t Break
Class 4	Decisions Outside Code
Class 5	AI with Reins
Class 6	Pass Then Lock
Class 7	Flipping Sycophancy
Class 8	The Agent’s Factory
Class 9	Automation Beyond Code
Class 10	The Law of Data
Class 11	How to Rescue Failed Vibe Coding

Sources

Stanford, “Lost in the Middle: How Language Models Use Long Contexts” (2024) — 30%+ performance drop when relevant info buried in context middle (re-referenced from Class 8)
Amazon, “Context Length Alone Hurts LLM Performance” (2025) — 13.9-85% performance drop even with whitespace tokens (re-referenced from Class 8)
E.F. Codd, “A Relational Model of Data for Large Shared Data Banks” (1970) — Relational database model, theoretical foundation for schema-based data integrity
OPA (Open Policy Agent) / Rego — Declarative policy language for verifying business rules outside code
yongol DDL → sqlc chaining — Seamless cross-validation structure from schema to type-safe code
Rule of Law principle — Three conditions of verifiability, violation definition, and enforceability apply identically to code/system/data
“Law is not justice (正義) but definition (定義)” — Digital rule of law philosophy, presenting schema as the analogue of law
“Truth vanishes at the speed of light” — Foundation for data source/timestamp tracking from limitations of physical observation