Nine cycles of red-team → fix → regression-test

A self-improving system that doesn't audit itself is a vibes-driven research demo. This is the audit log. In every cycle, parallel red-team agents attacked the previous cycle's fixes; the findings landed in code with regression tests; and the next cycle usually found that the previous one had fixed the bug at the wrong abstraction layer.

9 cycles · 99 findings, 0 open · 92 → 213 tests · 4 architectural pivots

The numbers

99 findings. 96 closed in code, 3 deferred, 0 open.

Each cycle's critique lives at brain/critiques/ with reproducible attacks and file:line evidence; each cycle's shipped state is snapshotted under brain/snapshots/. Test count grew 92 → 213 (+121) across the nine cycles. The three deferrals — E8, H2, H11 — each carry an explicit Phase-1 plan.

Cycle	Findings	Critical	High	Medium	Low	Deferred	Tests after
1 · pre-impl	18	2	6	7	3	0	52
2 · post-impl	13	2	6	3	2	0	78
3 · deltas	11	2	4	4	1	0	114
4 · fixes	11	2	4	4	1	0	141
5 · fixes	11	3	3	4	1	0	165
6 · fixes	10	3	3	3	1	1	186
7 · fixes	7	1	2	3	1	0	195
8 · fixes	6	2	2	2	0	0	202
9 · fixes	12	4	3	2	3	2	213
Total	99	21	33	32	13	3	—

Since the nine cycles, two further work items — the distributional-grader stack and a cross-process event-log chain-integrity fix — brought the suite to 246 passing (4 skipped). Those are feature and bug work, not red-team cycles, so they don't appear in the table above.

Cycle by cycle

What the loop learned, one cycle at a time.

Cycle 1 · pre-impl

The pre-implementation red team

Found 18 architecture-level issues before any code existed, among them F2 (TOCTOU between Auditor sign and promote) and F1 (mock-vs-mock self-confirmation). Foundational invariants went into the design instead of being discovered at runtime.

test bar set at 52
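F2's shape, a check (the Auditor's sign) separated in time from the use (promote), is easiest to see in code. A minimal sketch, assuming a design where the signature binds to a SHA-256 digest of the exact candidate bytes; `Signature`, `sign`, and `promote` are hypothetical names, not the project's API:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical approval record that binds the Auditor's sign-off to exact bytes."""
    digest: str


def sign(content: bytes) -> Signature:
    # Approve a specific byte string, not a path or a mutable object reference.
    return Signature(hashlib.sha256(content).hexdigest())


def promote(content: bytes, signature: Signature) -> bool:
    # Re-hash the exact bytes being promoted: anything edited after signing
    # no longer matches the digest, so the promotion is refused.
    return hashlib.sha256(content).hexdigest() == signature.digest


approval = sign(b"candidate v1")
print(promote(b"candidate v1", approval))          # True
print(promote(b"candidate v1, edited", approval))  # False
```

For this to actually close the window, promote has to hash the same in-memory bytes it then writes, rather than re-reading the file after the check.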

Cycle 2 · post-impl

The post-implementation red team

13 new findings against the implementation, including P1 (fake-PROMOTED ghosts after a failed promote), P2 (stale-hash semantics), and P5 (patterns defeated by hyphen substitution). All critical and high findings were closed.

tests 52 → 78

Cycle 3 · deltas

The cycle-2 fixes became the attack surface

C1 (canonicalization regression: one canonical form wasn't enough, so a strip+space dual form was needed) and C2 (the budget tracker's single-process lock was insufficient when runs executed in parallel).

tests 78 → 114
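A minimal sketch of what a strip+space dual form can look like, assuming hypothetical `dual_forms`/`matches` helpers and a guessed separator set; the project's real patterns and canonicalizer aren't shown in this log. The spaced form defeats P5-style hyphen substitution, and the stripped form catches separators inserted between every letter:

```python
import re
from typing import Tuple

# Assumed set of separators an attacker might substitute for spaces.
_SEPARATORS = re.compile(r"[\s\-_.]+")


def dual_forms(text: str) -> Tuple[str, str]:
    """Return (spaced, stripped) canonical forms of `text`."""
    lowered = text.lower()
    spaced = _SEPARATORS.sub(" ", lowered).strip()  # "ignore-previous" -> "ignore previous"
    stripped = _SEPARATORS.sub("", lowered)         # "ignore-previous" -> "ignoreprevious"
    return spaced, stripped


def matches(text: str, pattern: str) -> bool:
    text_spaced, text_stripped = dual_forms(text)
    pat_spaced, pat_stripped = dual_forms(pattern)
    # A hit in either form counts.
    return pat_spaced in text_spaced or pat_stripped in text_stripped


PATTERN = "ignore previous instructions"  # hypothetical blocklist entry
print(matches("Please IGNORE-previous_instructions", PATTERN))  # True (spaced form)
print(matches("i-g-n-o-r-e previous instructions", PATTERN))    # True (stripped form only)
print(matches("previous instructions are fine", PATTERN))       # False
```

The stripped form is the aggressive half: it also matches when separators fall in different places than in the pattern, which is what catches letter-spacing, and also where false positives can come from.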

Cycle 4 · fixes

C-class fixes landed

The TOCTOU window in _build_diff got closed with a frozen live_snapshot.

tests 114 → 141
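A minimal sketch of the snapshot-once idea, with a hypothetical `build_diff` standing in for `_build_diff`: live state is read exactly once into a read-only copy, and both the diff and whatever the caller applies come from that copy, so a write landing between check and use can't change the outcome.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional, Tuple

Diff = Dict[str, Tuple[Optional[str], Optional[str]]]


def build_diff(before: Mapping[str, str], live: Dict[str, str]) -> Tuple[Mapping[str, str], Diff]:
    """Hypothetical stand-in for _build_diff: read live state exactly once."""
    # Copy, then wrap read-only: later writes to `live` cannot reach the snapshot.
    snapshot = MappingProxyType(dict(live))
    diff: Diff = {k: (before.get(k), v) for k, v in snapshot.items() if before.get(k) != v}
    diff.update({k: (v, None) for k, v in before.items() if k not in snapshot})
    # Return the snapshot too, so the caller applies exactly what was diffed.
    return snapshot, diff


before = {"a": "1", "b": "2", "d": "5"}
live = {"a": "1", "b": "3", "c": "4"}
snapshot, diff = build_diff(before, live)
live["b"] = "mutated after the check"
print(diff)           # {'b': ('2', '3'), 'c': (None, '4'), 'd': ('5', None)}
print(snapshot["b"])  # 3
```

The values here are immutable strings; with mutable values, a copy this shallow would still share them with live state.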

Cycle 5 · fixes

Dual-form fallout

D1 (dual-form canonicalization false positives), D2 (a shallow snapshot let mutation through), D3 (sibling-reservation race in the budget tracker).

tests 141 → 165
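D2's failure mode is easy to reproduce: a read-only view over a shallow copy still shares nested containers with live state. A sketch, assuming state is built from plain dicts, lists, and sets; `deep_freeze` is a hypothetical helper, not the fix that shipped:

```python
from types import MappingProxyType
from typing import Any


def deep_freeze(obj: Any) -> Any:
    """Recursively replace dicts with read-only proxies and lists/sets with tuples/frozensets."""
    if isinstance(obj, dict):
        return MappingProxyType({k: deep_freeze(v) for k, v in obj.items()})
    if isinstance(obj, (list, tuple)):
        return tuple(deep_freeze(v) for v in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(obj)  # set members are already hashable
    return obj  # assumed immutable: str, int, None, ...


live = {"tier_a": ["cand-1"], "tier_b": []}
shallow = MappingProxyType(dict(live))  # read-only at the top level only
frozen = deep_freeze(live)

live["tier_a"].append("cand-2")  # in-place mutation of a nested list
print(shallow["tier_a"])  # ['cand-1', 'cand-2']  (leaked through the shallow snapshot)
print(frozen["tier_a"])   # ('cand-1',)
```

Any container type the helper doesn't recognize is passed through as-is, so it would still leak; the sketch only covers the plain built-ins.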

Cycle 6 · fixes

An embarrassing one

E1 — a bare except: swallowed a NameError in the cycle-5 D4 fix, silently leaking VERIFIED candidates after every TierMismatch for the whole cycle. Lesson written in stone: regression tests must assert the remediation effect, not just surface events.

tests 165 → 186 · E8 (Pydantic frozen-dict) deferred
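A hypothetical reconstruction of the E1 failure mode (not the real D4 code; `drop_verified_*` and the test helpers are illustrative names): a typo raises NameError, a bare except swallows it, and a test that only checks "the cleanup ran" stays green while the cleanup does nothing.

```python
from typing import List, Set


def drop_verified_buggy(candidates: List[str], verified: Set[str]) -> List[str]:
    """Hypothetical cleanup with the E1 failure mode: a typo plus a bare except."""
    try:
        return [c for c in candidates if c not in verifed]  # typo: NameError at runtime
    except:  # bare except swallows the NameError
        return list(candidates)  # every VERIFIED candidate silently survives


def drop_verified_fixed(candidates: List[str], verified: Set[str]) -> List[str]:
    # No blanket handler: a NameError here would fail loudly.
    return [c for c in candidates if c not in verified]


def surface_test(fn) -> bool:
    """Weak regression test: only checks that the cleanup ran without raising."""
    fn(["a", "b"], {"b"})
    return True


def effect_test(fn) -> bool:
    """Strong regression test: checks that the verified candidate is actually gone."""
    return fn(["a", "b"], {"b"}) == ["a"]


print(surface_test(drop_verified_buggy), effect_test(drop_verified_buggy))  # True False
print(effect_test(drop_verified_fixed))  # True
```

Linters flag both halves of this bug (pyflakes F821 for the undefined name, pycodestyle E722 for the bare except), but only the effect-level test proves the remediation happened.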

Cycle 7 · fixes

Another embarrassing one

F1 caught a _wrapped name-mangling escape that the cycle-6 docstring had explicitly acknowledged as a known limit, and that had shipped anyway. Lesson written in stone, again: a docstring saying "known limit" must be followed by # TODO(cycleN): fix.

tests 186 → 195
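Two of the escapes named in cycle 8's list, mangled name and post-hoc setattr, take a few lines each against a guard that inspects the class body. A sketch with a hypothetical `Tracker` standing in for `_BackendTracker`; it shows the mechanism, not the actual F1 reproduction:

```python
class Tracker:
    """Hypothetical stand-in for _BackendTracker with a perimeter-style subclass guard."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Refuse subclasses whose class body defines an attribute literally named "_wrapped".
        if "_wrapped" in cls.__dict__:
            raise TypeError(f"{cls.__name__} may not define _wrapped")


class Mangled(Tracker):
    # Double-underscore names in a class body are mangled to "_Mangled__wrapped",
    # so the literal-name check above never sees this attribute.
    __wrapped = "evil backend"


class Patched(Tracker):
    pass


# __init_subclass__ runs once, when the class is created; later writes are unchecked.
Patched._wrapped = "evil backend"

print(sorted(k for k in vars(Mangled) if "wrapped" in k))  # ['_Mangled__wrapped']
print(Patched._wrapped)                                     # evil backend
```

Each new guard closes one spelling of the attack; the surface of spellings is open-ended.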

Cycle 8 · pivot

architectural pivot #1

Stop fighting subclass attacks; constrain the wrap site

G1 was the third generation of the same _wrapped exposure attack (C4 → E4 → F1 → G1). Each prior cycle added another guard to _BackendTracker.__init_subclass__; each got bypassed by the next escape — literal name, mangled name, post-hoc setattr, metaclass. Cycle 8 dropped the subclass guard and added a wrap-site type(self.backend) is _BackendTracker check at Daemon.__init__.

tests 195 → 202
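A sketch of why an exact-type check at the wrap site ends the subclass arms race: `isinstance` accepts any subclass, including one that overrides away the metering, while `type(x) is C` rejects all of them without having to enumerate escapes. `BackendTracker`, `Unmetered`, `EchoBackend`, and `check_wrapped` are hypothetical names:

```python
class BackendTracker:
    """Hypothetical metering wrapper (stand-in for _BackendTracker)."""

    def __init__(self, inner) -> None:
        self.inner = inner
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self.inner.complete(prompt)


class EchoBackend:
    def complete(self, prompt: str) -> str:
        return prompt


class Unmetered(BackendTracker):
    # Skips metering, yet still satisfies isinstance(x, BackendTracker).
    def complete(self, prompt: str) -> str:
        return self.inner.complete(prompt)


def check_wrapped(backend: object) -> None:
    # Exact-type check: subclasses are rejected outright, so there is
    # no need to police what a subclass may override or rename.
    if type(backend) is not BackendTracker:
        raise TypeError(f"backend must be exactly BackendTracker, got {type(backend).__name__}")


check_wrapped(BackendTracker(EchoBackend()))  # passes silently
try:
    check_wrapped(Unmetered(EchoBackend()))
except TypeError as exc:
    print(exc)  # backend must be exactly BackendTracker, got Unmetered
```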

Cycle 9 · pivot

architectural pivot #2

The chokepoint was one layer down — and a humbling rerun of pivot #1

Twelve findings, four critical. H1 found that cycle 8's wrap-site check was at the wrong layer: Coordinator, not Daemon, is the actual chokepoint, and three production scripts bypassed the Daemon entirely. H3 found that the check ran only once, in __init__, so a later setattr could swap the wrapper silently. H4 found that cycle 8's pre-consolidate snapshot was racy. Fixes: chokepoint relocation, property setters that re-validate on every assignment, and a writer_iteration_id stamp on every candidate. H2 and H11 were honestly deferred: in-process pure-Python guards have a ceiling.

tests 202 → 213

Two patterns that kept reappearing

The same two lessons, across four architectural pivots.

pattern 1

Identity beats timing

Cycle 7 narrowed TierMismatch cleanup to "walk entry.tier", but the buggy-Librarian threat model is exactly "wrote to one tier and lied about it." Cycle 8 added a pre-consolidate snapshot to identify "what this iteration introduced" by id-set difference. Cycle 9 found that the snapshot has its own race window.

The fix that ended the arms race wasn't a wider snapshot — it was a writer_iteration_id field on the candidate itself, stamped atomically at write_candidate time. Ownership belongs on the data, not in the timing.
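A minimal sketch of ownership-on-the-data, assuming a hypothetical in-memory `CandidateStore` guarded by a `threading.Lock`: the owner is stamped when the record is created, and cleanup selects by that stamp across every tier, so a writer that misreports its tier is still cleaned up.

```python
import threading
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    tier: str                 # the tier the writer claims it used
    writer_iteration_id: int  # stamped at write time, never inferred afterwards


class CandidateStore:
    """Hypothetical store; only the ownership-on-the-data idea comes from the log."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._candidates: List[Candidate] = []

    def write_candidate(self, name: str, tier: str, iteration_id: int) -> Candidate:
        # Creating the record and recording its owner happen in one step under the lock.
        with self._lock:
            cand = Candidate(name, tier, iteration_id)
            self._candidates.append(cand)
            return cand

    def discard_iteration(self, iteration_id: int) -> List[Candidate]:
        # Select by owner across every tier: no snapshot, no timing window,
        # and no trust in which tier the writer said it wrote to.
        with self._lock:
            dropped = [c for c in self._candidates if c.writer_iteration_id == iteration_id]
            self._candidates = [c for c in self._candidates if c.writer_iteration_id != iteration_id]
            return dropped

    def names(self) -> List[str]:
        with self._lock:
            return [c.name for c in self._candidates]


store = CandidateStore()
store.write_candidate("kept", "tier_1", iteration_id=1)
store.write_candidate("bad-a", "tier_2", iteration_id=2)
store.write_candidate("bad-b", "tier_1", iteration_id=2)  # wrote to a different tier
print([c.name for c in store.discard_iteration(2)])  # ['bad-a', 'bad-b']
print(store.names())                                   # ['kept']
```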

pattern 2

Chokepoints beat perimeters

Cycle 4 captured the wrapped backend in a closure. Cycles 6 and 7 found subclass and name-mangling escapes; the guard kept widening. Cycle 8 dropped the subclass guard for a wrap-site type check. Cycle 9 found three scripts bypassed that wrap site entirely.

The fix moved the check into Coordinator.__init__ — the real chokepoint every LLM call passes through — with property setters that re-validate on assignment. Constrain the finite chokepoint; don't perimeter-fence an infinite surface.
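A sketch of a chokepoint that re-validates on every assignment, using a Python property so that `__init__` and any later `coord.backend = ...` run the same check. `Coordinator` here is hypothetical, as are `MeteredBackend` and `RawBackend`:

```python
class MeteredBackend:
    """Hypothetical stand-in for the metering wrapper."""

    def complete(self, prompt: str) -> str:
        return "ok"


class RawBackend:
    def complete(self, prompt: str) -> str:
        return "unmetered"


class Coordinator:
    """Hypothetical chokepoint: the backend is validated on every assignment."""

    def __init__(self, backend: object) -> None:
        self.backend = backend  # routed through the property setter below

    @property
    def backend(self) -> MeteredBackend:
        return self._backend

    @backend.setter
    def backend(self, value: object) -> None:
        # Same exact-type rule at construction and on every later reassignment.
        if type(value) is not MeteredBackend:
            raise TypeError(f"Coordinator requires MeteredBackend, got {type(value).__name__}")
        self._backend = value


coord = Coordinator(MeteredBackend())
try:
    coord.backend = RawBackend()  # a later swap is rejected, not silently accepted
except TypeError as exc:
    print(exc)  # Coordinator requires MeteredBackend, got RawBackend
print(type(coord.backend).__name__)  # MeteredBackend
```

Assigning `coord._backend` directly still bypasses the setter; like every in-process guard, this protects honest code paths that forget to wrap, not an attacker.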

Two honest deferrals from cycle 9

Where in-process guards hit their ceiling.

deferred → Phase 1

H2 · closure-cell mutation

fn.__closure__[i].cell_contents = EvilBackend() swaps the wrapped backend while type(self.backend) is _BackendTracker still passes — reproduced live with 100 unmetered calls in the cycle-9 critique.

In pure Python, any in-process attacker with code-execution rights defeats any in-process guard. The wrapped-backend invariant is for honest code paths that forget to wrap (cycle 9 found three). For adversarial in-process attackers the answer is process-level isolation — not an in-process guard.
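The H2 mechanism in miniature, with a hypothetical `BackendTracker` that keeps its backend only in a closure cell: since Python 3.7, `cell_contents` is writable, so the captured object can be replaced while the wrapper's type is untouched. Metering isn't modeled; the point is only that a type check can't see the swap.

```python
class BackendTracker:
    """Hypothetical wrapper: the real backend is held only in a closure cell."""

    def __init__(self, backend) -> None:
        def forward(prompt: str) -> str:
            return backend.complete(prompt)  # `backend` is a free variable of forward
        self._forward = forward

    def complete(self, prompt: str) -> str:
        return self._forward(prompt)


class RealBackend:
    def complete(self, prompt: str) -> str:
        return "real"


class EvilBackend:
    def complete(self, prompt: str) -> str:
        return "evil"


tracker = BackendTracker(RealBackend())
fn = tracker._forward
i = fn.__code__.co_freevars.index("backend")
fn.__closure__[i].cell_contents = EvilBackend()  # cell_contents is writable since Python 3.7

print(type(tracker) is BackendTracker)  # True: an exact-type check still passes
print(tracker.complete("hi"))           # evil
```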

deferred → Phase 1

H11 · POSIX unlink-during-lock race

A POSIX-only race where deleting the lock file mid-lock could let a second opener acquire a parallel lock. Not reproducible on Windows — the OS blocks the unlink.

Deferred to Phase-1 Linux verification with an xfail-shaped test marker, so the gap is tracked in the suite rather than forgotten.
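A POSIX-only sketch of the H11 race using `fcntl.flock` (whether the project's lock uses `flock` is an assumption, and `budget.lock` is a made-up name). `flock` locks the open file, not the path, so once the path is unlinked, the next opener creates a fresh inode and locks that instead:

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Open `path` (creating it if needed) and try a non-blocking exclusive flock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        os.close(fd)
        return None


lock_path = os.path.join(tempfile.mkdtemp(), "budget.lock")  # hypothetical lock file

holder = try_lock(lock_path)
assert holder is not None           # first opener takes the lock
assert try_lock(lock_path) is None  # a second opener is refused, as intended

os.unlink(lock_path)                # the file is deleted while `holder` still has it locked
intruder = try_lock(lock_path)      # O_CREAT makes a new inode at the same path
print(intruder is not None)         # True: two "exclusive" holders coexist
```

One commonly used mitigation is to re-check identity after acquiring: compare `os.fstat(fd).st_ino` with `os.stat(path).st_ino` and retry on a mismatch or a missing path. The log doesn't say which approach Phase 1 will take.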

Reading the trail · what it doesn't claim

How to follow it, and where it stops.

How to read it. Each cycle's critique is under brain/critiques/ with reproducible attacks; each cycle's shipped state is under brain/snapshots/. Cold-start entry point: open brain/BRAIN.html — the catalog, with clickable links to every cycle, plan, critique, snapshot, and source file.
Not a proof of safety. 96 findings closed in code and a regression test per closed finding are evidence of careful engineering, not certification of safe behavior under adversarial conditions. The H2 / H11 deferrals are explicit about exactly that.
The LLM backend is mock by default. Real Anthropic calls are gated behind --backend anthropic, an ANTHROPIC_API_KEY, cost ceilings, and WAL-backed budget metering.
"Self-improving" in Phase 0 means the infrastructure for self-improvement is working and tested. Real learning happens via scripts/burst.py on demand against the real backend.
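The log doesn't describe how "WAL-backed budget metering" is implemented (WAL could equally mean a database journal mode). One possible reading, as a POSIX-only sketch with a hypothetical `BudgetLog`: an append-only log that is durably written before a call proceeds, with a single exclusive `flock` spanning read, check, and append so that parallel processes can't both reserve the same headroom, the kind of cross-process race that C2 and D3 point at.

```python
import fcntl
import json
import os
import tempfile


class BudgetLog:
    """Hypothetical append-only spend log shared by all processes on one host (POSIX)."""

    def __init__(self, path: str, ceiling_cents: int) -> None:
        self.path = path
        self.ceiling_cents = ceiling_cents

    def reserve(self, cents: int) -> bool:
        """Atomically check the ceiling and record a reservation; False if it would overflow."""
        with open(self.path, "a+") as f:
            # One exclusive lock spans read, check, and append, so two processes
            # cannot both observe the same headroom and both spend it.
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            try:
                f.seek(0)
                spent = sum(json.loads(line)["cents"] for line in f.read().splitlines() if line)
                if spent + cents > self.ceiling_cents:
                    return False
                f.write(json.dumps({"cents": cents}) + "\n")  # "a+" always appends at EOF
                f.flush()
                os.fsync(f.fileno())  # durable before the metered call is allowed to proceed
                return True
            finally:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)


log = BudgetLog(os.path.join(tempfile.mkdtemp(), "spend.log"), ceiling_cents=100)
print([log.reserve(40) for _ in range(3)])  # [True, True, False]
```

Re-reading the whole log on every reservation is O(n); a real implementation would likely checkpoint a running total.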