Each cycle's critique lives at brain/critiques/ with reproducible attacks and file:line evidence; each cycle's shipped state is snapshotted under brain/snapshots/. Test count grew 92 → 213 (+121) across the nine cycles. The three deferrals — E8, H2, H11 — each carry an explicit Phase-1 plan.
| Cycle | Findings | Critical | High | Medium | Low | Open | Deferred | Tests after |
|---|---|---|---|---|---|---|---|---|
| 1 · pre-impl | 18 | 2 | 6 | 7 | 3 | 0 | 0 | 52 |
| 2 · post-impl | 13 | 2 | 6 | 3 | 2 | 0 | 0 | 78 |
| 3 · deltas | 11 | 2 | 4 | 4 | 1 | 0 | 0 | 114 |
| 4 · fixes | 11 | 2 | 4 | 4 | 1 | 0 | 0 | 141 |
| 5 · fixes | 11 | 3 | 3 | 4 | 1 | 0 | 0 | 165 |
| 6 · fixes | 10 | 3 | 3 | 3 | 1 | 0 | 1 | 186 |
| 7 · fixes | 7 | 1 | 2 | 3 | 1 | 0 | 0 | 195 |
| 8 · fixes | 6 | 2 | 2 | 2 | 0 | 0 | 0 | 202 |
| 9 · fixes | 12 | 4 | 3 | 2 | 3 | 0 | 2 | 213 |
| Total | 99 | 21 | 33 | 32 | 13 | 0 | 3 | — |
Since the nine cycles, two further work items — the distributional-grader stack and a cross-process event-log chain-integrity fix — brought the suite to 246 passing (4 skipped). Those are feature and bug work, not red-team cycles, so they don't appear in the table above.
Found 18 architecture-level issues before any code existed, including F2 (TOCTOU between Auditor sign and promote) and F1 (mock-vs-mock self-confirmation). Foundational invariants went into the design rather than being discovered at runtime.
test bar set at 52
13 new findings against the implementation — P1 (fake-PROMOTED ghosts after a failed promote), P2 (stale-hash semantics), P5 (patterns defeated by hyphen substitution). All critical and high closed.
tests 52 → 78
Headline findings: C1 (a canon regression: one canonical form wasn't enough, so the fix needed a strip+space dual form) and C2 (the budget tracker's single-process lock was insufficient when run in parallel).
tests 78 → 114
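The hyphen-substitution bypass (P5) and the strip+space dual form (C1) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the names `canonical_forms` and `matches_blocked`, the separator set, and the substring-style matching are hypothetical and not taken from the project's code.

```python
import re

# Hypothetical separator set (ASCII hyphen, underscore, Unicode hyphens/dashes).
_SEPARATORS = re.compile(r"[-_\u2010-\u2015]+")
_WHITESPACE = re.compile(r"\s+")


def canonical_forms(text):
    """Return (strip_form, space_form) for the given text."""
    lowered = text.lower()
    # Strip form: separators are removed, so "ig-nore" becomes "ignore".
    strip_form = _WHITESPACE.sub(" ", _SEPARATORS.sub("", lowered)).strip()
    # Space form: separators become spaces, so "ignore-previous"
    # becomes "ignore previous".
    space_form = _WHITESPACE.sub(" ", _SEPARATORS.sub(" ", lowered)).strip()
    return strip_form, space_form


def matches_blocked(text, pattern):
    """True if the pattern appears in either canonical form."""
    return any(pattern in form for form in canonical_forms(text))


# A single strip-only form misses this input ("ignoreprevious" has no space);
# the space form catches it.
hit_space = matches_blocked("please ignore-previous instructions", "ignore previous")
# A single space-only form misses this input ("ig nore previous");
# the strip form catches it.
hit_strip = matches_blocked("ig-nore previous", "ignore previous")
```

Checking two forms widens the match surface, which is in line with the false positives that D1 later reported against the dual form.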
The TOCTOU window in _build_diff got closed with a frozen live_snapshot.
tests 114 → 141
Headline findings: D1 (dual-form canonicalization false positives), D2 (a shallow snapshot let mutation through), and D3 (a sibling-reservation race in the budget tracker).
tests 141 → 165
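The difference between a shallow and a frozen snapshot (the D2 class of bug) can be sketched as follows. The state layout and the helper name `freeze_snapshot` are hypothetical; the project's real `live_snapshot` may be built differently.

```python
import copy
from types import MappingProxyType


def freeze_snapshot(state):
    """Return a detached copy of the state behind a read-only top-level view.

    Note: MappingProxyType only blocks writes to the top-level mapping;
    the deep copy is what detaches nested values from the live state.
    """
    return MappingProxyType(copy.deepcopy(state))


# Hypothetical live state: one entry with a nested, mutable record.
live = {"entry-1": {"tier": "VERIFIED", "tags": ["a"]}}

shallow = dict(live)           # copies the outer dict, shares the inner records
frozen = freeze_snapshot(live)

# A mutation lands after both snapshots were taken.
live["entry-1"]["tier"] = "PROMOTED"

shallow_tier = shallow["entry-1"]["tier"]  # follows the mutation
frozen_tier = frozen["entry-1"]["tier"]    # still the value at snapshot time
```

A diff built from `shallow` would compare the live state against itself for nested fields, so the mutation would go unnoticed; the deep copy keeps the pre-mutation value.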
E1 — a bare except: swallowed a NameError in the cycle-5 D4 fix, silently leaking VERIFIED candidates after every TierMismatch for the whole cycle. Lesson written in stone: regression tests must assert the remediation effect, not just surface events.
tests 165 → 186 · E8 (Pydantic frozen-dict) deferred
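The E1 failure mode and its lesson can be reproduced in a few lines. This is a sketch under assumptions: the store layout, the event name, and the function names are hypothetical stand-ins, not the project's code.

```python
def cleanup_buggy(store, events):
    """Cleanup path with the E1-style bug: a bare except hides a NameError."""
    events.append("tier_mismatch_handled")
    try:
        for cid in [c for c, tier in store.items() if tier == "VERIFIED"]:
            discard_candidate(store, cid)  # typo: this name is never defined
    except:  # noqa: E722  (bare except swallows the NameError)
        pass


def cleanup_fixed(store, events):
    """Same cleanup without the swallowed error."""
    events.append("tier_mismatch_handled")
    for cid in [c for c, tier in store.items() if tier == "VERIFIED"]:
        del store[cid]


def event_only_check(cleanup):
    """Weak regression check: only asserts that the surface event fired."""
    store, events = {"c1": "VERIFIED"}, []
    cleanup(store, events)
    return "tier_mismatch_handled" in events


def effect_check(cleanup):
    """Strong regression check: asserts the remediation actually happened."""
    store, events = {"c1": "VERIFIED"}, []
    cleanup(store, events)
    return "VERIFIED" not in store.values()
```

The event-only check passes for both versions, so it cannot tell them apart; only the effect check fails on the buggy cleanup, because the VERIFIED candidate is still in the store.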
F1 caught a _wrapped name-mangling escape that the cycle-6 docstring had explicitly acknowledged and shipped anyway. Lesson written in stone, again: a docstring saying "known limit" must be followed by # TODO(cycleN): fix.
tests 186 → 195
G1 was the third generation of the same _wrapped exposure attack (C4 → E4 → F1 → G1). Each prior cycle added another guard to _BackendTracker.__init_subclass__; each got bypassed by the next escape — literal name, mangled name, post-hoc setattr, metaclass. Cycle 8 dropped the subclass guard and added a wrap-site type(self.backend) is _BackendTracker check at Daemon.__init__.
tests 195 → 202
Twelve findings, four critical. H1 found that cycle 8's wrap-site check was at the wrong layer: Coordinator, not Daemon, is the actual chokepoint, and three production scripts bypassed the Daemon entirely. H3 found that the check ran only once, in __init__, so a later setattr swapped the wrapper silently. H4 found that cycle 8's pre-consolidate snapshot was racy. Fixes: chokepoint relocation, property setters that re-validate on every assignment, and a writer_iteration_id stamp on every candidate. H2 and H11 were honestly deferred, because in-process pure-Python guards have a ceiling.
tests 202 → 213
Cycle 7 narrowed TierMismatch cleanup to "walk entry.tier", but the buggy-Librarian threat model is exactly "wrote to one tier and lied about it." Cycle 8 added a pre-consolidate snapshot to identify "what this iteration introduced" by id-set difference. Cycle 9 found that the snapshot approach has its own race window.
The fix that ended the arms race wasn't a wider snapshot — it was a writer_iteration_id field on the candidate itself, stamped atomically at write_candidate time. Ownership belongs on the data, not in the timing.
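The contrast between attribution by snapshot difference and attribution by a stamp on the data can be sketched like this. The `Candidate` and `CandidateStore` classes, their fields other than `writer_iteration_id`, and the `write_candidate` signature are assumptions for illustration; the sketch is single-threaded and simulates the interleaving by ordering the writes by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    writer_iteration_id: int  # stamped once, at write time


class CandidateStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self._items = {}

    def write_candidate(self, candidate_id, iteration_id):
        # Ownership is recorded on the candidate itself, in the same step
        # that makes it visible.
        self._items[candidate_id] = Candidate(candidate_id, iteration_id)

    def ids(self):
        return set(self._items)

    def owned_by(self, iteration_id):
        return {
            c.candidate_id
            for c in self._items.values()
            if c.writer_iteration_id == iteration_id
        }


store = CandidateStore()
before = store.ids()                               # iteration 7 snapshots ids
store.write_candidate("c-other", iteration_id=8)   # another writer lands in the window
store.write_candidate("c-mine", iteration_id=7)

by_snapshot_diff = store.ids() - before  # blames iteration 7 for both writes
by_stamp = store.owned_by(7)             # only what iteration 7 actually wrote
```

The id-set difference depends on nothing else writing between the snapshot and the diff; the stamp does not depend on timing at all.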
Cycle 4 captured the wrapped backend in a closure. Cycles 6 and 7 found subclass and name-mangling escapes; the guard kept widening. Cycle 8 dropped the subclass guard for a wrap-site type check. Cycle 9 found three scripts bypassed that wrap site entirely.
The fix moved the check into Coordinator.__init__ — the real chokepoint every LLM call passes through — with property setters that re-validate on assignment. Constrain the finite chokepoint; don't perimeter-fence an infinite surface.
fn.__closure__[i].cell_contents = EvilBackend() swaps the wrapped backend while type(self.backend) is _BackendTracker still passes — reproduced live with 100 unmetered calls in the cycle-9 critique.
In pure Python, any in-process attacker with code-execution rights defeats any in-process guard. The wrapped-backend invariant is for honest code paths that forget to wrap (cycle 9 found three). For adversarial in-process attackers the answer is process-level isolation — not an in-process guard.
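The closure-cell swap can be reproduced with a small sketch. The tracker below is a hypothetical stand-in that captures its inner backend in a closure; it is not the project's `_BackendTracker`, and it only shows that the type check keeps passing after the swap (writing to `cell_contents` requires Python 3.7 or later).

```python
class RealBackend:
    def complete(self, prompt):
        return "real:" + prompt


class EvilBackend:
    def complete(self, prompt):
        return "evil:" + prompt


class _BackendTracker:
    """Hypothetical wrapper that hides the inner backend in a closure."""

    def __init__(self, inner):
        def _call(prompt):
            return inner.complete(prompt)

        self.complete = _call


tracker = _BackendTracker(RealBackend())
before_swap = tracker.complete("x")

# Locate the closure cell that holds `inner` and overwrite its contents.
fn = tracker.complete
i = fn.__code__.co_freevars.index("inner")
fn.__closure__[i].cell_contents = EvilBackend()

after_swap = tracker.complete("x")
still_passes = type(tracker) is _BackendTracker  # the wrap-site check is unaffected
```

Nothing visible at the wrap site changes: same object, same type, same attribute. Only the cell behind the closure was rewritten, which is why the text above points to process-level isolation rather than another in-process check.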
A POSIX-only race where deleting the lock file mid-lock could let a second opener acquire a parallel lock. Not reproducible on Windows — the OS blocks the unlink.
Deferred to Phase-1 Linux verification with an xfail-shaped test marker, so the gap is tracked in the suite rather than forgotten.
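The shape of the unlink race can be sketched as follows, assuming the lock is an advisory `fcntl.flock` on a lock file. That assumption, the file name, and the function name are hypothetical; the source does not say which locking primitive the project uses.

```python
import os
import tempfile


def demonstrate_unlink_race():
    """Return True if two exclusive locks were held at once.

    Returns None on platforms without fcntl (for example Windows).
    """
    try:
        import fcntl
    except ImportError:
        return None

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "budget.lock")  # hypothetical lock file name

        first = open(path, "w")
        fcntl.flock(first, fcntl.LOCK_EX)   # first holder takes the lock

        os.unlink(path)                     # lock file deleted mid-lock

        second = open(path, "w")            # recreates the path as a new inode
        try:
            # Non-blocking attempt: succeeds because the lock is tied to the
            # old, now-unlinked file rather than to the path.
            fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            both_held = True
        except OSError:
            both_held = False
        finally:
            second.close()
            first.close()
        return both_held


result = demonstrate_unlink_race()
```

On POSIX systems the second opener is expected to acquire its own lock, so `result` would be `True`; a test built around this could carry an expected-failure marker until the Linux verification pass decides on a fix.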
brain/critiques/ with reproducible attacks; each cycle's shipped state is under brain/snapshots/.
Cold-start entry point: open brain/BRAIN.html, the catalog, with clickable links to every cycle, plan, critique, snapshot, and source file.
--backend anthropic, an ANTHROPIC_API_KEY, cost ceilings, and WAL-backed budget metering.
scripts/burst.py on demand against the real backend.