Nine cycles of red-team → fix → regression-test

A self-improving system that doesn't audit itself is a vibes-driven research demo. This is the audit log — every cycle, parallel red-team agents attacked the previous cycle's fixes, the findings landed in code with regression tests, and the next cycle usually found the previous one had fixed the bug at the wrong abstraction layer.

9 cycles · 99 findings · 0 open · 92 → 213 tests · 4 architectural pivots

The numbers

99 findings. 96 closed in code, 3 deferred, 0 open.

Each cycle's critique lives at brain/critiques/ with reproducible attacks and file:line evidence; each cycle's shipped state is snapshotted under brain/snapshots/. Test count grew 92 → 213 (+121) across the nine cycles. The three deferrals — E8, H2, H11 — each carry an explicit Phase-1 plan.

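Cycle | Findings | Critical | High | Medium | Low | Open | Deferred | Tests after
1 · pre-impl | 18 | 2 | 6 | 7 | 3 | 0 | 0 | 52
2 · post-impl | 13 | 2 | 6 | 3 | 2 | 0 | 0 | 78
3 · deltas | 11 | 2 | 4 | 4 | 1 | 0 | 0 | 114
4 · fixes | 11 | 2 | 4 | 4 | 1 | 0 | 0 | 141
5 · fixes | 11 | 3 | 3 | 4 | 1 | 0 | 0 | 165
6 · fixes | 10 | 3 | 3 | 3 | 1 | 0 | 1 | 186
7 · fixes | 7 | 1 | 2 | 3 | 1 | 0 | 0 | 195
8 · fixes | 6 | 2 | 2 | 2 | 0 | 0 | 0 | 202
9 · fixes | 12 | 4 | 3 | 2 | 3 | 0 | 2 | 213
Total | 99 | 21 | 33 | 32 | 13 | 0 | 3 |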

Since the nine cycles, two further work items — the distributional-grader stack and a cross-process event-log chain-integrity fix — brought the suite to 246 passing (4 skipped). Those are feature and bug work, not red-team cycles, so they don't appear in the table above.

Cycle by cycle

What the loop learned, one cycle at a time.

1 · pre-impl
The pre-implementation red team

Found 18 architecture-level issues before any code existed — F2 (TOCTOU between Auditor sign and promote), F1 (mock-vs-mock self-confirmation). Foundational invariants went into the design rather than getting discovered at runtime.

test bar set at 52

2 · post-impl
The post-implementation red team

13 new findings against the implementation — P1 (fake-PROMOTED ghosts after a failed promote), P2 (stale-hash semantics), P5 (patterns defeated by hyphen substitution). All critical and high closed.

tests 52 → 78

3 · deltas
The cycle-2 fixes became the attack surface

Two findings stood out: C1, a canon regression in which one canonical form wasn't enough and a strip+space dual form was needed, and C2, where the budget tracker's single-process lock proved insufficient when run in parallel.
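The dual form behind C1 can be pictured with a small sketch. It assumes the patterns are phrases matched against canonicalized text; `canonical_forms`, `matches_phrase`, and the example phrase are hypothetical names for illustration, not the project's actual code.

```python
import re

_HYPHENS = re.compile(r"-+")
_WHITESPACE = re.compile(r"\s+")


def _squash(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return _WHITESPACE.sub(" ", text).strip()


def canonical_forms(text: str) -> tuple:
    """Return (stripped, spaced): hyphens removed, and hyphens turned into spaces."""
    lowered = text.lower()
    stripped = _squash(_HYPHENS.sub("", lowered))
    spaced = _squash(_HYPHENS.sub(" ", lowered))
    return stripped, spaced


def matches_phrase(text: str, phrase: str) -> bool:
    """True if the phrase appears in either canonical form of the text."""
    needle = _squash(phrase.lower())
    return any(needle in form for form in canonical_forms(text))


# Hyphen between words: only the spaced form still contains the phrase.
print(matches_phrase("please ignore-previous instructions", "ignore previous"))
# Hyphen inside a word: only the stripped form still contains the phrase.
print(matches_phrase("please ig-nore previous instructions", "ignore previous"))
```

Either single form misses one of the two inputs, which is the gap a dual form closes. Matching on two forms also widens the match surface, which is consistent with the false positives cycle 5 later reported as D1.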

tests 78 → 114

4 · fixes
C-class fixes landed

The TOCTOU window in _build_diff got closed with a frozen live_snapshot.

tests 114 → 141

5 · fixes
Dual-form fallout

D1 (dual-form canonicalization false positives), D2 (a shallow snapshot let mutation through), D3 (sibling-reservation race in the budget tracker).
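The shallow-snapshot problem in D2 is easy to reproduce in isolation. The sketch below assumes the live state is a dict with nested mutable values; `shallow_snapshot`, `deep_snapshot`, and the field names are hypothetical and do not mirror the real `live_snapshot`.

```python
import copy
from types import MappingProxyType


def shallow_snapshot(live: dict) -> MappingProxyType:
    """Read-only view over a top-level copy; nested objects are still shared."""
    return MappingProxyType(dict(live))


def deep_snapshot(live: dict) -> MappingProxyType:
    """Read-only view over a deep copy; nested objects are independent copies."""
    return MappingProxyType(copy.deepcopy(live))


live = {"tier": "VERIFIED", "tags": ["reviewed"]}
shallow = shallow_snapshot(live)
deep = deep_snapshot(live)

# A later writer mutates a nested list on the live state.
live["tags"].append("injected")

print(shallow["tags"])  # shares the list with live, so it sees the mutation
print(deep["tags"])     # independent copy, unchanged
```

Both views reject top-level assignment, so the shallow one looks frozen while still letting mutation through nested objects. The deep copy only isolates the snapshot from the live state; its nested values are copies, not frozen objects.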

tests 141 → 165

6 · fixes
An embarrassing one

E1 — a bare except: swallowed a NameError in the cycle-5 D4 fix, silently leaking VERIFIED candidates after every TierMismatch for the whole cycle. Lesson written in stone: regression tests must assert the remediation effect, not just surface events.
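A minimal reconstruction of the E1 failure shape, under stated assumptions: the store is a plain dict, and the function names and event strings are hypothetical. It shows why a test that only checks surface events passes on the broken path.

```python
def cleanup_buggy(store: dict, candidate_id: str) -> list:
    """Cleanup whose bare except hides a NameError, so nothing is removed."""
    events = ["cleanup_started"]
    try:
        # Deliberate misspelling: raises NameError at runtime.
        del store[candidat_id]  # noqa: F821
    except:  # noqa: E722  (bare except swallows the NameError)
        pass
    events.append("cleanup_finished")
    return events


def cleanup_fixed(store: dict, candidate_id: str) -> list:
    """Same flow, catching only the error that is actually expected."""
    events = ["cleanup_started"]
    try:
        del store[candidate_id]
    except KeyError:
        pass  # already gone is fine
    events.append("cleanup_finished")
    return events


def surface_test(cleanup) -> bool:
    """Weak test: checks only that the expected events were emitted."""
    store = {"c1": "VERIFIED"}
    return cleanup(store, "c1") == ["cleanup_started", "cleanup_finished"]


def effect_test(cleanup) -> bool:
    """Strong test: checks that the candidate is actually gone afterwards."""
    store = {"c1": "VERIFIED"}
    cleanup(store, "c1")
    return "c1" not in store
```

The surface test passes for both versions; only the effect test separates them. Narrowing the `except` clause would also have surfaced the `NameError` on the first run.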

tests 165 → 186 · E8 (Pydantic frozen-dict) deferred

7 · fixes
Another embarrassing one

F1 caught a _wrapped name-mangling escape that the cycle-6 docstring had explicitly acknowledged and shipped anyway. Lesson written in stone, again: a docstring saying "known limit" must be followed by # TODO(cycleN): fix.

tests 186 → 195

8 · pivot
architectural pivot #1
Stop fighting subclass attacks; constrain the wrap site

G1 was the third generation of the same _wrapped exposure attack (C4 → E4 → F1 → G1). Each prior cycle added another guard to _BackendTracker.__init_subclass__; each got bypassed by the next escape — literal name, mangled name, post-hoc setattr, metaclass. Cycle 8 dropped the subclass guard and added a wrap-site type(self.backend) is _BackendTracker check at Daemon.__init__.

tests 195 → 202

9 · pivot
architectural pivot #2
The chokepoint was one layer down — and a humbling rerun of pivot #1

Twelve findings, four critical. H1 found that cycle 8's wrap-site check sat at the wrong layer: Coordinator, not Daemon, is the actual chokepoint, and three production scripts bypassed the Daemon entirely. H3 found that the check ran only once, in __init__; a later setattr swapped the wrapper silently. H4 found that cycle 8's pre-consolidate snapshot was racy. Fixes: chokepoint relocation, property setters that re-validate on every assignment, and a writer_iteration_id stamp on every candidate. H2 and H11 were honestly deferred, because in-process pure-Python guards have a ceiling.

tests 202 → 213

Two patterns that kept reappearing

The same two lessons, across four architectural pivots.

pattern 1
Identity beats timing

Cycle 7 narrowed TierMismatch cleanup to "walk entry.tier", but the buggy-Librarian threat model is exactly "wrote to one tier and lied about it." Cycle 8 added a pre-consolidate snapshot to identify "what this iteration introduced" by id-set difference. Cycle 9 found that the snapshot has its own race window.

The fix that ended the arms race wasn't a wider snapshot — it was a writer_iteration_id field on the candidate itself, stamped atomically at write_candidate time. Ownership belongs on the data, not in the timing.
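A sketch of ownership carried on the data. `writer_iteration_id` and `write_candidate` come from the text; the `Candidate` fields, the `Store` class, and `remove_iteration` are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    tier: str
    writer_iteration_id: int  # set once at construction, never reassigned


class Store:
    def __init__(self):
        self._lock = threading.Lock()
        self._candidates = {}

    def write_candidate(self, candidate_id: str, tier: str, iteration_id: int) -> Candidate:
        """Create and store the candidate with its owner stamp in one locked step."""
        candidate = Candidate(candidate_id, tier, iteration_id)
        with self._lock:
            self._candidates[candidate_id] = candidate
        return candidate

    def remove_iteration(self, iteration_id: int) -> list:
        """Remove exactly the candidates stamped with this iteration id."""
        with self._lock:
            owned = [cid for cid, cand in self._candidates.items()
                     if cand.writer_iteration_id == iteration_id]
            for cid in owned:
                del self._candidates[cid]
        return sorted(owned)

    def ids(self) -> list:
        with self._lock:
            return sorted(self._candidates)


store = Store()
store.write_candidate("a", "VERIFIED", iteration_id=7)
store.write_candidate("b", "VERIFIED", iteration_id=8)  # interleaved writer
store.write_candidate("c", "DRAFT", iteration_id=7)
removed = store.remove_iteration(7)
```

Cleanup here ignores both the claimed tier and the order of writes. An id-set difference taken around iteration 7 would also have contained "b", because it landed inside the same window.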

pattern 2
Chokepoints beat perimeters

Cycle 4 captured the wrapped backend in a closure. Cycles 6 and 7 found subclass and name-mangling escapes; the guard kept widening. Cycle 8 dropped the subclass guard for a wrap-site type check. Cycle 9 found three scripts bypassed that wrap site entirely.

The fix moved the check into Coordinator.__init__ — the real chokepoint every LLM call passes through — with property setters that re-validate on assignment. Constrain the finite chokepoint; don't perimeter-fence an infinite surface.
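Re-validation on assignment can be sketched with a property. `Coordinator` and `_BackendTracker` are names from the text; the constructor signature, `RawBackend`, and the private `_backend` attribute are assumptions.

```python
class RawBackend:
    """Stand-in for an unwrapped, unmetered backend."""


class _BackendTracker:
    def __init__(self, inner):
        self._inner = inner


class Coordinator:
    def __init__(self, backend):
        # Assignment goes through the property setter, so __init__ and
        # every later reassignment share the same validation.
        self.backend = backend

    @property
    def backend(self):
        return self._backend

    @backend.setter
    def backend(self, value):
        if type(value) is not _BackendTracker:
            raise TypeError("backend must be exactly _BackendTracker")
        self._backend = value


coordinator = Coordinator(_BackendTracker(RawBackend()))
original = coordinator.backend
try:
    coordinator.backend = RawBackend()  # later swap attempt via setattr
    swap_rejected = False
except TypeError:
    swap_rejected = True
```

Because `backend` is a data descriptor on the class, `setattr(coordinator, "backend", ...)` also goes through the setter. Writing to `coordinator._backend` directly still bypasses it, which is the same in-process ceiling the deferrals acknowledge.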

Two honest deferrals from cycle 9

Where in-process guards hit their ceiling.

deferred → Phase 1
H2 · closure-cell mutation

fn.__closure__[i].cell_contents = EvilBackend() swaps the wrapped backend while type(self.backend) is _BackendTracker still passes — reproduced live with 100 unmetered calls in the cycle-9 critique.

In pure Python, any in-process attacker with code-execution rights defeats any in-process guard. The wrapped-backend invariant is for honest code paths that forget to wrap (cycle 9 found three). For adversarial in-process attackers the answer is process-level isolation — not an in-process guard.

deferred → Phase 1
H11 · POSIX unlink-during-lock race

A POSIX-only race where deleting the lock file mid-lock could let a second opener acquire a parallel lock. Not reproducible on Windows — the OS blocks the unlink.

Deferred to Phase-1 Linux verification with an xfail-shaped test marker, so the gap is tracked in the suite rather than forgotten.
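The race shape can be shown on a POSIX system with a short sketch. It assumes an flock-style advisory lock on a lock file, which the text does not confirm; `try_lock`, `second_opener_gets_lock`, and the file name are hypothetical. The `fcntl` module is unavailable on Windows, so this runs only on POSIX.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Open (or create) path and try a non-blocking exclusive flock.

    Returns the file descriptor on success, or None if the lock is held.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def second_opener_gets_lock(unlink_mid_lock: bool) -> bool:
    """Take the lock, optionally unlink the file, then try to lock again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.lock")
        first = try_lock(path)
        assert first is not None
        if unlink_mid_lock:
            # The holder keeps its lock on the old inode; the name is now free.
            os.unlink(path)
        second = try_lock(path)  # recreates the file if it was unlinked
        acquired = second is not None
        for fd in (first, second):
            if fd is not None:
                os.close(fd)
        return acquired
```

Without the unlink, the second open hits the same inode and the lock attempt fails. With it, the second opener creates a fresh inode under the same name and both descriptors hold an exclusive lock at once.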

Reading the trail · what it doesn't claim

How to follow it, and where it stops.