Agent Architecture — vitalChess

The Machine
Behind the Board

18 specialized agents across 5 teams, orchestrated by two skills — the CEO (build pipeline) and the CPO (product quality). Every feature flows through a mandatory 8-agent sequence with hard gates, parallel execution, and zero tolerance for skipped steps.

18 Agents · 5 Teams · 25 Features Shipped · 879+ Tests · 213KB Bundle

Per-Feature Pipeline

The Mandatory Sequence

Every feature traverses this pipeline. No exceptions. Skipping an agent is a logged pipeline deviation. The CEO orchestrates; agents execute.

  1. Plan: Architect, Cartographer, Librarian
  2. Spec: Specwriter (TDD: failing tests)
  3. Build: Mason(s) (parallel on 3+ files)
  4. Polish: Artisan (never skipped)
  5. Gate: Gatekeeper (APPROVE / REJECT)
  6. QA: Eye + Automator + Critic (parallel)
  7. Docs: Scribe (JSDoc + README)
  8. Board: Analyst + Strategist (parallel)

Hard gates: Gatekeeper REJECT loops back to Mason (max 3). Board review is never deferred. Scribe is mandatory. "I'll do it later" is a pipeline violation.

Orchestrator #1

The CEO — /vitalchess-go

The CEO is the main conversation loop. It reads state, picks the next task from the backlog, and drives it through the full pipeline. It never writes code — it spawns agents, waits for results, routes fixes, and maintains bookkeeping files.

CEO Main Loop

755 lines of orchestration logic
0. Bootstrap: Read State
Read PROJECT_STATE.md, BACKLOG.md, SPEC.md, COMPONENT_REGISTRY.md, and all rules. Determine recovery point if resuming a dead session. Pick next task.
1. Foundation (Layer 0): One-time
Surveyor audits environment → Foreman scaffolds project → Tokensmith builds design tokens → Bricklayer creates primitives. 17 tasks. Runs once at project start.
▶ Surveyor ▶ Foreman ▶ Tokensmith ▶ Bricklayer
2. Feature Build Loop: PLAN phase
Architect designs component hierarchy, data flow, edge cases, and animation notes. Cartographer maps blast radius and dependency chains. Librarian cross-references COMPONENT_REGISTRY for reuse. 9 tasks per feature.
▶ Architect ▶ Cartographer ⚠ Librarian (APPROVE/REVISE)
3. Feature Build Loop: SPEC phase
Specwriter writes failing tests and type contracts. Tests are the spec — if all tests pass, the feature is done. Includes 1 E2E happy path + cross-feature integration tests.
▶ Specwriter (TDD red)
4. Feature Build Loop: BUILD phase
Mason(s) implement minimum code to pass tests. For features with 3+ files across domains, multiple Masons run in parallel on non-overlapping file sets — components, hooks/lib, and routes. Artisan then polishes with animations and micro-interactions. Artisan is never skipped.
▶▶ Mason-A (components) ▶▶ Mason-B (hooks/lib) ▶▶ Mason-C (routes) ▶ Artisan (polish)
5. Feature Build Loop: GATE
Gatekeeper runs 11-point compliance checklist: design tokens, component reuse, type safety, test coverage, external mock ban, code quality, accessibility, route rendering, plan scope, auth security, bundle delta. REJECT loops back to Mason. Max 3 rejections per feature, then escalate to Dom.
⚠ Gatekeeper (APPROVE/REJECT ×3)
6. MERGE + Bookkeeping
Squash merge to main. TypeScript check with tsconfig.app.json. Then immediately: update BACKLOG.md, PROJECT_STATE.md, COMPONENT_REGISTRY.md. No deferring — stale state files caused cross-session context loss.
7. QA + Fix Cycle
Eye (visual screenshots at 4 breakpoints) + Automator (test coverage gaps) + Critic (spec compliance) run in parallel. If findings, route to the owning agent for fixes. Explorer runs every 3rd merge for deep exploratory testing.
▶▶ Eye ▶▶ Automator ▶▶ Critic ↺ Explorer (every 3rd)
8. Docs + Board Review
Scribe updates README, adds JSDoc. Analyst audits delivery metrics (rejections, rework, coverage deltas). Strategist reviews pipeline architecture. Both Board agents run in parallel. Never deferred. Recommendations go through Dom's approval gate before implementation.
▶ Scribe ▶▶ Analyst ▶▶ Strategist
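
The QA cadence in step 7 can be sketched in TypeScript. This is an illustrative sketch only, not the CEO skill's actual logic: the function name `qaAgentsForMerge` and the assumption that merges are counted from 1 are both hypothetical.

```typescript
type QaAgent = "Eye" | "Automator" | "Critic" | "Explorer";

// Returns the agents to spawn in parallel after a merge.
// Assumption: mergeNumber starts at 1, so Explorer joins on merges 3, 6, 9, ...
function qaAgentsForMerge(mergeNumber: number): QaAgent[] {
  const agents: QaAgent[] = ["Eye", "Automator", "Critic"];
  if (mergeNumber > 0 && mergeNumber % 3 === 0) {
    agents.push("Explorer");
  }
  return agents;
}
```

Keeping the cadence in one small function makes it easy to tighten later, for example if the Board recommends running Explorer more often.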

The CEO Never Writes Code. Not a line. Not a "quick fix." The P5-04 micro-fix exception (≤3 lines) was revoked after the CEO exceeded it on first use — a 5-line test fix + production cleanup that broke the build. All fixes route to the owning agent.

Token Budget Heuristics: After 4 features, warn. After 6, shut down. Context compression events trigger immediate warnings. 150 CEO turns = warn, 200 = forced shutdown with 20-turn reserve for clean exit.

Fix Routing — Not Everything Goes to Mason

A key pattern learned through failure. The CEO routes fixes to the agent that owns the domain, not the default Mason.

Finding type | Owning agent | Notes
Feature logic / hooks / data | Mason | The only thing Mason should get
Visual / animation / styling | Artisan | CSS files, animation timings, polish
Missing or broken tests | Specwriter | Test data bugs, test quality issues
Build config / infra / deps | Foreman | vite.config, tsconfig, package.json
Component reuse violation | Librarian → Mason | Re-plan first, then implement
Documentation / stale docs | Scribe | README, JSDoc, CHANGELOG
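
The routing above can be expressed as a lookup. The following is a minimal sketch, not the CEO skill's real implementation: the `FindingKind` labels and the `routeFix` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical finding categories, one per row of the routing list above.
type FindingKind =
  | "feature-logic"
  | "visual"
  | "tests"
  | "build-config"
  | "reuse-violation"
  | "docs";

type Agent = "Mason" | "Artisan" | "Specwriter" | "Foreman" | "Librarian" | "Scribe";

// Each kind maps to an ordered list of agents. Reuse violations are
// re-planned by the Librarian before Mason implements.
const FIX_ROUTES: Record<FindingKind, readonly Agent[]> = {
  "feature-logic": ["Mason"],
  visual: ["Artisan"],
  tests: ["Specwriter"],
  "build-config": ["Foreman"],
  "reuse-violation": ["Librarian", "Mason"],
  docs: ["Scribe"],
};

function routeFix(kind: FindingKind): readonly Agent[] {
  return FIX_ROUTES[kind];
}
```

Because `FIX_ROUTES` is typed as a `Record` over every `FindingKind`, adding a new category without a route fails at compile time rather than silently defaulting to Mason.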

Orchestrator #2

The CPO — /vitalchess-cpo

The CPO is the product lens. It tests the live app as a user — not code analysis, not test suites. It spawns agents to screenshot, explore, break things, then routes findings to the CEO pipeline. Its primary deliverable is Playwright E2E tests for verified-working flows.

CPO Session Flow

760 lines · 8 phases · 12 tasks per session
0. Read State
PROJECT_STATE, BACKLOG, TEST_PLAYBOOK, git log, dev server status. Check for primed plans from CEO. Load test credentials (admin PIN: 0617, not 1234).
1. Automated Pre-Scan
Build check (npm run build), test check (npx vitest run), route screenshots via Playwright MCP, playbook risk analysis. Produces a pre-scan report.
2. Agent-Driven Testing
Eye + Explorer spawn in parallel. Eye does visual audit at 4 breakpoints (375px, 768px, 1280px, 1920px). Explorer tries weird inputs, race conditions, auth lifecycle edge cases. Findings merge into a unified report for Dom.
▶▶ Eye (screenshots) ▶▶ Explorer (chaos)
3. Fix Round
Findings route to CEO pipeline or get fixed directly (small issues). CPO prefers to find, not fix — all substantial fixes go through the standard CEO pipeline with proper agent routing.
4. E2E Test Generation
Primary objective. Sentinel verifies flows via Playwright MCP, then writes @playwright/test specs for each passing flow. Creates reusable auth fixtures. Updates tests/e2e/COVERAGE.md.
▶ Sentinel (verify + write)
5. Product Documentation
Scribe produces/updates PRODUCT.md — the centralized product document covering features, user flows, and known limitations.
▶ Scribe
6. Board Review
Same Board as CEO: Analyst + Strategist in parallel. Reviews session findings, testing gaps, pipeline health. Recommendations go through Dom's approval gate.
▶▶ Analyst ▶▶ Strategist
7. Session Close
Scorecard (visual quality, usability, completeness, performance, polish — 1-5 scale). Update TEST_PLAYBOOK with new heuristics. Log metrics to PROJECT_STATE. Findings summary for CEO pipeline.

Features first, edges second. CPO must exercise core features (match submit, confirm, badges) end-to-end before probing auth and validation edges. CPO Session 1 found 5 major bugs that 7 consecutive CEO QA cycles had missed, because those cycles tested only happy paths.

CEO vs CPO

CEO | CPO
Builds features through mandatory pipeline | Tests the product as a real user
Spawns Masons to fix code | Spawns Eye+Explorer to find bugs
Maintains bookkeeping state files | Maintains test playbook + product docs
Runs Board to improve pipeline | Runs Board to improve testing

Shared Infrastructure

  • Both read PROJECT_STATE.md and BACKLOG.md on boot
  • Both spawn the same Board (Analyst + Strategist)
  • Both use Scribe for documentation
  • Both log structured metrics for cross-session continuity
  • CPO findings feed into CEO's fix queue (BACKLOG.md)
  • Both have 12-phase task tracking via TaskCreate/TaskUpdate
  • Session recovery uses same protocol: read state, check git, resume

All 5 Teams

The Agents

Foundation
Layer 0 — One-Time

Scaffold, tokens, primitives. Runs once at project start.

🔎
Surveyor
Skill & tool auditor
Scans available skills, MCP servers, dependency versions, Node/npm/git tooling. Produces a structured report of what's available and what to install.
Tools: Read, Grep, Glob, Bash, WebSearch, WebFetch
🏗
Foreman
Project scaffolding
Initializes git, Vite+React+TypeScript, installs all deps, configures build tools, creates folder structure. 19 initialization steps including PGlite smoke test and test helpers.
Tools: Read, Write, Edit, Bash, Glob, Grep
🎨
Tokensmith
Design token system
Creates the CSS custom property token system — primitive + semantic layers. Colors, typography, spacing, animation, shadows. Single source of truth for all visual decisions.
Tools: Read, Write, Edit, Bash, Glob, Grep
🧱
Bricklayer
Primitive component builder
Creates the reusable component library (Button, Card, Badge, Input, Select, Dialog, Sheet, etc.) on shadcn/ui, consuming only design tokens. Every primitive has tests.
Tools: Read, Write, Edit, Bash, Glob, Grep
Planners
Per Feature

Design, map impact, enforce reuse, write failing tests, document.

Architect
Feature designer
Reads task, designs component hierarchy, data flow, edge cases. Produces a feature plan with animation notes, acceptance criteria, schema contracts, and integration surfaces. Max 25 turns.
Tools: Read, Grep, Glob, Bash
Output: docs/plans/{task-id}.md
Model: Opus
🌎
Cartographer
Impact analyst
Takes the Architect's plan and maps the full blast radius — every file touched, every consumer affected, every orphan created. Produces deletion manifests and consumer lists that Masons execute against.
Tools: Read, Grep, Glob, Bash
📚
Librarian
Reusability specialist
Cross-references planned components against COMPONENT_REGISTRY. Flags duplication. Suggests generalizations and shared extractions. Verdict: APPROVE or REVISE. Max 20 turns.
Tools: Read, Grep, Glob
Gate: APPROVE / REVISE
Specwriter
TDD test author
Writes failing tests and type contracts. Component tests, hook tests, integration tests, 1 E2E happy path, cross-feature assertions. Tests are the spec. Max 30 turns.
Tools: Read, Write, Edit, Bash, Glob, Grep
Rule: No external service mocks
📖
Scribe
Documentation specialist
Maintains README, adds JSDoc to exports, component docs, user-facing help text. Checks SPEC.md vs implementation reality. Runs after foundation, after every merge, before final QA. Max 25 turns.
Tools: Read, Write, Edit, Bash, Glob, Grep
Builders
Per Feature

Implement, polish, review. The assembly line.

Mason
Feature implementer
Reads tests (they are the spec), implements minimum code to pass. Uses only primitives + tokens. Can be parallelized: Mason-A (components), Mason-B (hooks/lib), Mason-C (routes). Max 40 turns.
Tools: Read, Write, Edit, Bash, Glob, Grep
Parallel: 3+ files across domains
Model: Opus
Artisan
Animation & polish
Adds entrance animations, hover states, staggered lists, micro-interactions, page transitions. Uses Motion (Framer Motion) + Magic UI. Never skipped. Even CRUD forms get polish. Max 35 turns.
Tools: Read, Write, Edit, Bash, Glob, Grep
Rule: All timings use token vars
Gatekeeper
Compliance enforcer
11-point compliance checklist. 7 are BLOCKERS (auto-reject): design tokens, component reuse, type safety, tests, external mock ban, route rendering, plan scope. Max 3 rejections then escalate. Max 35 turns.
Tools: Read, Grep, Glob, Bash
Gate: APPROVE / REJECT ×3
Inspectors
Post-Merge & Final

Test, validate, break things, report.

👁
Eye
Visual tester
Screenshots at 4 breakpoints (375/768/1280/1920px). Visual regression via Playwright toHaveScreenshot() with 0.2% diff threshold. Design token compliance scan. Lighthouse audit on deploys.
Tools: Bash, Glob, Grep, Playwright MCP
Automator
Test engineer
Runs vitest with coverage, identifies gaps, writes tests for top 5 gaps. Runs Playwright E2E suite. PWA offline smoke test (one-time baseline). Coverage must be >80% for feature code.
Tools: Read, Write, Edit, Bash, Glob, Grep
🔎
Critic
Spec validator
Reads SPEC.md acceptance criteria, verifies each: PASS / PARTIAL / FAIL / UNTESTABLE. Any FAIL = QA blocker. Checks convention compliance (folder structure, Supabase patterns, design tokens, TDD).
Tools: Read, Grep, Glob, Bash
💣
Explorer
Exploratory tester
No test plan — just instincts. Tries weird inputs (emoji, SQL injection, boundary values), unexpected navigation, race conditions, auth lifecycle. Grades stability A–F. Runs every 3rd merge + CPO sessions.
Tools: Bash, Glob, Grep, Playwright MCP
💻
Sentinel
E2E test architect
Two-phase: (1) drive Playwright MCP through user flows, record PASS/FAIL. (2) Write @playwright/test specs for passing flows. Creates reusable auth fixtures. Updates E2E coverage manifest.
Tools: Write, Edit, Bash, Playwright MCP
Advocate
Accessibility tester
Tests as if blind, motor-impaired, colorblind, confused, non-technical. WCAG 2.2 AA compliance, keyboard navigation, screen reader compatibility, focus management. Referenced but not yet fully defined.
Tools: Read, Write, Bash, Playwright MCP
The Board
After Every Feature

Audit the delivery process itself. Both agents always run in parallel.

📈
Analyst
Delivery process auditor
Reviews rejection rates, QA finding patterns, rework rate, token efficiency, test quality, handoff quality, documentation gaps. Produces a health score (A–F) with actionable recommendations tagged by Impact/Effort/Type.
Tools: Read, Grep, Glob, Bash
Strategist
Pipeline architect
Reviews team composition, agent role gaps, rule effectiveness, handoff design, CEO orchestration efficiency. Produces structural recommendations (0–3 per review). System health: Healthy / Needs Attention / Restructure.
Tools: Read, Grep, Glob, Bash

Enforcement

The Rules

5 enforcement rules, all checked by the Gatekeeper. Violations are merge blockers unless noted.

Rule | What It Enforces | Severity | Checked By
Design Tokens | No hardcoded colors, fonts, spacing, animation timings, or inline styles. All values from tokens.css semantic layer. | BLOCKER | Gatekeeper, Eye
Component Reuse | Decision tree for new vs extend. No one-off components. Shared hooks in src/lib/. Check COMPONENT_REGISTRY first. | BLOCKER | Librarian, Gatekeeper
TDD | Failing tests first. No external service mocks. Auth lifecycle E2E mandatory. Behavior assertions only. Coverage >80%. | BLOCKER | Specwriter, Gatekeeper
Auth & Security | Server-validated auth state. No plaintext PINs. Route protection with redirects. No privilege escalation via client state. | BLOCKER | Gatekeeper, Explorer
Performance | Bundle <2MB gzipped. Lighthouse >80 (deploy only). 60fps animations. No UI-blocking queries >100ms. | BLOCKER / WARNING | Gatekeeper, Eye

Deep Dive

Orchestration Patterns

The specific patterns that make the pipeline work — parallel execution, gate loops, budget management, and session recovery.

When a feature has 3+ implementation files across different domains, the CEO splits work across multiple Masons in the same worktree:

Mason-A
Components Layer
src/features/{name}/components/*.tsx
Mason-B
Data Layer
hooks/*.ts + src/lib/*.ts
Mason-C
Routing Layer
route.tsx + integration glue

All Masons share type contracts from the Specwriter as the boundary. After all finish, the CEO runs npx vitest run to catch integration issues. Max 2 integration fix rounds. If a Mason reports BLOCKED, the CEO identifies the dependency and runs it serially.

When NOT to parallelize: Features with <3 files, tightly coupled files, or the first feature establishing a new pattern.
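
The split decision can be sketched as a small planning function. This is a sketch under stated assumptions, not the CEO's actual logic: `laneFor`, `planBuild`, the path-matching rules, and the reading of "across domains" as "at least two lanes have work" are all hypothetical.

```typescript
type MasonLane = "Mason-A" | "Mason-B" | "Mason-C";

type BuildPlan =
  | { mode: "serial" }
  | { mode: "parallel"; lanes: Map<MasonLane, string[]> };

// Classify a path by the layer conventions listed above (hypothetical matching rules).
function laneFor(path: string): MasonLane {
  if (path.includes("/components/")) return "Mason-A"; // components layer
  if (path.includes("/hooks/") || path.startsWith("src/lib/")) return "Mason-B"; // data layer
  return "Mason-C"; // route.tsx and integration glue
}

function planBuild(
  files: string[],
  flags: { tightlyCoupled: boolean; firstOfPattern: boolean },
): BuildPlan {
  // Serial cases: fewer than 3 files, tight coupling, or a pattern-setting feature.
  if (files.length < 3 || flags.tightlyCoupled || flags.firstOfPattern) {
    return { mode: "serial" };
  }
  const lanes = new Map<MasonLane, string[]>();
  for (const file of files) {
    const lane = laneFor(file);
    lanes.set(lane, [...(lanes.get(lane) ?? []), file]);
  }
  // Assumption: "across domains" means at least two lanes received files.
  return lanes.size >= 2 ? { mode: "parallel", lanes } : { mode: "serial" };
}
```

Since every file lands in exactly one lane, the file sets handed to each Mason cannot overlap, which is the property the parallel build depends on.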

The Gatekeeper is a hard gate with a 3-strike escalation pattern:

Mason submits → Gatekeeper reviews
  APPROVED? → merge
  REJECTED (strike 1)? → fix → re-submit
  REJECTED (strike 2)? → fix → re-submit
  REJECTED (strike 3)? → ESCALATE TO DOM
  Dom intervenes → counter resets to 0

Fixes are routed to the owning agent based on the violation type, not always back to Mason. Visual issues go to Artisan, test issues to Specwriter, etc.
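
The 3-strike loop can be sketched as follows. This is a minimal illustration, not the real orchestration code: `runGate`, `review`, and `fix` are hypothetical names, and the callbacks are synchronous for brevity, whereas real agent spawns would be asynchronous.

```typescript
type Verdict = "APPROVE" | "REJECT";

type GateOutcome =
  | { status: "merged"; rejections: number }
  | { status: "escalated"; rejections: number };

const MAX_REJECTIONS = 3;

// `review` stands in for a Gatekeeper run; `fix` stands in for routing the
// violation to its owning agent. Both are placeholders for agent spawns.
function runGate(
  review: () => Verdict,
  fix: (strike: number) => void,
): GateOutcome {
  let rejections = 0;
  while (true) {
    if (review() === "APPROVE") {
      return { status: "merged", rejections };
    }
    rejections += 1;
    if (rejections >= MAX_REJECTIONS) {
      // Strike 3: no further fix attempt, hand over to Dom.
      return { status: "escalated", rejections };
    }
    fix(rejections);
  }
}
```

A caller would start a fresh `runGate` after Dom intervenes, which matches the counter resetting to 0.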

Agents die mid-task from rate limits, context exhaustion, or network issues. The recovery protocol:

  1. Read PROJECT_STATE.md to determine phase + current branch
  2. Check git state: git status, branches, worktrees
  3. Check for persisted artifacts: docs/plans/{task-id}.md, test files, code
  4. Determine recovery point (which agent to resume with)
  5. Resume pipeline from last completed phase

Worktree work persists across session deaths — the branch and changed files survive. State is persisted at every phase transition, not just feature completion.
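
Steps 4 and 5 of the recovery protocol can be sketched as a pure function. This is an assumption-laden sketch: the format of PROJECT_STATE.md is not shown here, so the function takes an already-parsed value, and the `Phase` names and `resumeFrom` are hypothetical labels for the pipeline phases described earlier.

```typescript
// Hypothetical phase labels, in pipeline order.
const PHASES = ["PLAN", "SPEC", "BUILD", "GATE", "MERGE", "QA", "DOCS_BOARD"] as const;
type Phase = (typeof PHASES)[number];

// Given the last phase recorded as complete (null if none was recorded),
// return the phase to resume with, or "DONE" when the feature finished.
function resumeFrom(lastCompleted: Phase | null): Phase | "DONE" {
  if (lastCompleted === null) return PHASES[0];
  const next = PHASES.indexOf(lastCompleted) + 1;
  return PHASES[next] ?? "DONE";
}
```

This only works because state is written at every phase transition. If state were saved only on feature completion, `lastCompleted` would always be `null` after a crash and the whole feature would restart.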

4 Features → Warn · 6 Features → Shutdown · 150 CEO Turns → Warn · 200 CEO Turns → Shutdown · 20 Turns Reserved for Exit

Context compression events trigger immediate warnings. Agent spawn failures = stop immediately. The CEO reserves 20 turns minimum for the shutdown sequence (state save, push, report).
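
The thresholds above can be sketched as a single check. This is illustrative only: `SessionCounters` and `budgetAction` are hypothetical names, the sketch treats "stop immediately" as a shutdown, and it does not model the 20-turn exit reserve.

```typescript
type BudgetAction = "continue" | "warn" | "shutdown";

interface SessionCounters {
  featuresCompleted: number;
  ceoTurns: number;
  contextCompressed: boolean; // a context compression event was observed
  spawnFailed: boolean; // an agent spawn failed
}

const LIMITS = { warnFeatures: 4, stopFeatures: 6, warnTurns: 150, stopTurns: 200 };

function budgetAction(s: SessionCounters): BudgetAction {
  // Hard stops are checked first so a warning can never mask a shutdown.
  if (s.spawnFailed) return "shutdown";
  if (s.featuresCompleted >= LIMITS.stopFeatures || s.ceoTurns >= LIMITS.stopTurns) {
    return "shutdown";
  }
  if (
    s.contextCompressed ||
    s.featuresCompleted >= LIMITS.warnFeatures ||
    s.ceoTurns >= LIMITS.warnTurns
  ) {
    return "warn";
  }
  return "continue";
}
```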

When the Architect's plan lists 10+ modified files or 3+ deletions:

  • Artisan: Split into scoped sub-passes (e.g., "hooks polish" and "component polish")
  • Gatekeeper: Split into two passes: (a) type safety + stale imports, (b) code quality + token compliance
  • Scribe: Provide a scoped file list, not "document everything new"
  • Cartographer: Especially critical — blast radius is where bugs hide
  • Consider decomposing into 2–3 sequential sub-features if 50+ files span multiple domains

Analyst + Strategist run in parallel
→ Both produce reports with recommendations
→ CEO presents ALL recommendations to Dom via AskUserQuestion
→ Dom approves / rejects / modifies each
→ CEO implements ONLY approved items in actual files
→ Commit + push BEFORE starting next feature

"Logged but not implemented" is a pipeline failure.

Each recommendation is tagged: Impact (HIGH/MED/LOW), Effort (XS/S/M/L), Type (rule-change / agent-change / process-change / tooling-change).
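
The tagging scheme and approval gate can be sketched as types plus a filter. This is a sketch, not the Board's real report format: the `Recommendation` and `Decision` shapes and the `approvedItems` function are hypothetical, and treating a modified item as implementable in its modified form is an assumption.

```typescript
type Impact = "HIGH" | "MED" | "LOW";
type Effort = "XS" | "S" | "M" | "L";
type RecType = "rule-change" | "agent-change" | "process-change" | "tooling-change";

interface Recommendation {
  id: string;
  summary: string;
  impact: Impact;
  effort: Effort;
  type: RecType;
}

// Dom's decision per recommendation. A "modify" carries the revised wording.
type Decision =
  | { kind: "approve" }
  | { kind: "reject" }
  | { kind: "modify"; summary: string };

function approvedItems(
  recs: Recommendation[],
  decisions: Map<string, Decision>,
): Recommendation[] {
  const out: Recommendation[] = [];
  for (const rec of recs) {
    const d = decisions.get(rec.id);
    // Undecided or rejected items are never implemented.
    if (d === undefined || d.kind === "reject") continue;
    out.push(d.kind === "modify" ? { ...rec, summary: d.summary } : rec);
  }
  return out;
}
```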

The two orchestrators don't run simultaneously — they alternate. The handoff mechanism:

  1. CEO builds features and pushes to main
  2. CPO runs a session against the live app, finds bugs, writes E2E tests
  3. CPO logs findings to BACKLOG.md under "CPO Findings" section
  4. CEO picks up findings in its next session and routes to agents
  5. Shared state: PROJECT_STATE.md, BACKLOG.md, TEST_PLAYBOOK.md

Key insight: CPO finds, CEO fixes. CPO prefers to only discover issues. Substantial fixes always route through the standard CEO pipeline with proper agent routing and Gatekeeper review.
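
Step 3 of the handoff can be sketched as a text transform over the backlog file. This is a sketch with loudly stated assumptions: the real layout of BACKLOG.md is not shown here, so the `## CPO Findings` heading, the `- [SEVERITY] summary` entry format, and the `appendCpoFindings` function are all hypothetical.

```typescript
type Severity = "MAJOR" | "MINOR" | "POLISH";

interface Finding {
  severity: Severity;
  summary: string;
}

// Returns new backlog text with the findings listed under the CPO section.
// Assumed heading and entry format; adjust to the real BACKLOG.md layout.
function appendCpoFindings(
  backlog: string,
  findings: Finding[],
  heading = "## CPO Findings",
): string {
  const entries = findings.map((f) => `- [${f.severity}] ${f.summary}`);
  const lines = backlog.split("\n");
  const at = lines.indexOf(heading);
  if (at === -1) {
    // No section yet: create it at the end of the file.
    return [...lines, "", heading, ...entries].join("\n");
  }
  // Insert directly under the heading so the newest findings come first.
  lines.splice(at + 1, 0, ...entries);
  return lines.join("\n");
}
```

Returning a new string instead of writing the file keeps the transform easy to test. The caller decides when to persist and commit.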

Honest Assessment

What Works & What Breaks

Pipeline Discipline: Working

  • The 8-agent sequence produces consistent, high-quality features
  • Gatekeeper's 11-point checklist catches issues that humans miss
  • TDD (Specwriter → Mason) means features are testable by design
  • 25 features shipped with this pipeline, 879+ tests accumulated
  • Bundle dropped from 3MB to 213KB after Supabase migration

Parallel Execution: Working

  • Running parallel Masons on non-overlapping files dramatically speeds builds
  • Eye + Automator + Critic running in parallel during QA catch 3x more findings
  • Analyst + Strategist always parallel — neither blocks the other
  • Worktree isolation prevents branch conflicts between features
  • Tests run in parallel forks in ~10s after PGlite removal

Design Token System: Working

  • Zero hardcoded colors in 25 features — Gatekeeper enforces religiously
  • Dark theme consistency across all components
  • Artisan uses motion presets from tokens, not ad-hoc timings
  • Token layer (primitive → semantic) makes theme changes trivial

Board Self-Improvement Loop: Working

  • Board found test data collision pattern → now enforced in Specwriter rules
  • Board found missing cross-feature integration tests → now mandatory
  • Board found error-path testing gap → now required in E2E coverage
  • Pipeline rules have grown organically from Board recommendations

CEO Compliance: Chronic Violator

  • CEO has a documented history of skipping steps and logging without doing
  • Micro-fix exception was revoked after CEO exceeded it on first use
  • "I'll do it later" deferrals — Scribe and Board were the most skipped
  • Board recommendations were "logged but not implemented" multiple times
  • CEO needed explicit enforcement rules written into CLAUDE.md to stop violations
  • Post-merge bookkeeping was the most frequently deferred step

CPO Effectiveness: High Signal, Friction

  • CPO Session 1 found 5 major bugs that 7 CEO QA cycles missed
  • Product-lens testing (as a user, not code analysis) catches different bugs
  • 15+ open findings from 2 CPO sessions — high bug discovery rate
  • E2E test generation is the primary objective but often blocked by open bugs
  • Fix routing back to CEO creates a feedback lag between discovery and resolution
  • Admin test credentials caused 10-minute debugging session (0617 not 1234)

No-Mocks Rule: Correct But Costly

  • Rule exists because mocked tests passed while real Supabase queries failed
  • Login case-sensitivity bug shipped because mocks couldn't catch it
  • But: E2E tests require running Supabase, slowing test iteration
  • Unit tests for transform functions still provide fast feedback on pure logic
  • E2E coverage is the primary CPO deliverable but often blocked

Happy Path Bias: Recurring Failure

  • 7 consecutive CEO QA cycles tested only happy paths
  • CPO Session 1 instantly found bugs in error paths, edge cases, mobile
  • Explorer agent exists to combat this, but only runs every 3rd merge
  • Current open findings: 6 major bugs in validation, error handling, auth flows
  • Mobile layout breaks were missed because QA tested desktop-only
  • The pipeline produces working features but fragile edges

Context Window Pressure: Structural Limit

  • CEO skill is 755 lines. CPO skill is 760 lines. Both push context limits.
  • Each agent spawn consumes context budget from the orchestrator
  • Large features with 10+ files exhaust agents mid-task
  • Rate limit deaths kill agents mid-pipeline, requiring recovery
  • 4-feature soft limit, 6-feature hard limit per session
  • Board review alone (2 agents + reports) consumes significant context

Bookkeeping Overhead: Necessary Evil

  • 3 state files (BACKLOG, PROJECT_STATE, COMPONENT_REGISTRY) must stay current
  • Stale state files cause cross-session context loss (proven failure mode)
  • But updating 3 files after every merge burns turns and context
  • PROJECT_STATE.md is already 460+ lines and growing
  • Structured metrics (YAML format) help, but still add overhead per feature

Agent Definition Drift: Maintenance Cost

  • 18 agent definitions need to stay synchronized with CLAUDE.md rules
  • Board recommendations change agent rules, but changes must propagate to all agent .md files
  • Cartographer and Advocate are referenced but not fully defined
  • Agent turn limits (20–40) are educated guesses, not measured optimums
  • Pipeline evolution means agent definitions accumulate historical rules

The Meta-Pattern: Self-Improving System

  • Every failure becomes a rule. The login mock bug became the no-external-mocks rule. The CEO micro-fix failure became the no-code-ever rule. The happy-path bias became the Explorer cadence and CPO sessions.
  • The Board is the immune system. After every feature, it examines what went wrong and proposes structural fixes. The pipeline literally evolves.
  • Two orchestrators cover blind spots. CEO builds but is biased toward "it compiles, ship it." CPO tests as a user and finds what CEO systematically misses.
  • Rules are enforced, not suggested. The Gatekeeper doesn't warn — it rejects. Design tokens, component reuse, type safety, and TDD are non-negotiable gates, not guidelines.
  • The cost is high. 18 agents, 5 rules files, 3 state files, 2 orchestrators, 755-line skills. But the output is consistent: features ship with tests, tokens, accessibility, and documentation every time.

Current State

Open Findings

15+ findings from CPO Sessions 1–3. Two were fixed in P5-04. The rest await the CEO pipeline.

FIXED | Mobile nav broken for admin: 10+ items horizontal overflow | P5-04
FIXED | Profile page crashes: null user race condition on mount | P5-04
MAJOR | Login error shows "Network error" instead of "Invalid username or PIN" | Open
MAJOR | Mobile Pending Matches header collision with navigation | Open
MAJOR | Recent matches shows "No matches": RPC returns 404 (parameter mismatch) | Open
MAJOR | Logout doesn't redirect: stays on protected route after signout | Open
MAJOR | Create Player validation missing: accepts 1-char names, 3-digit PINs | Open
MAJOR | Registration missing username length validation | Open
MAJOR | Player name accepts SQL injection strings and special characters | Open
MINOR | Mobile leaderboard names truncated on small viewports | Open
MINOR | Login/register forms not centered on page | Open
MINOR | Stale provisional badge test failing intermittently | Open
MINOR | AdminCreatePlayerDialog passes empty existingNames array | Open
MINOR | Duplicate matches appearing in history view | Open
POLISH | Badges empty state inconsistent styling between views | Open
POLISH | Profile ELO shows "— ?" artifact in display | Open
2 Fixed · 7 Major Open · 5 Minor Open · 2 Polish Open