The operating rules that turn 18 people into a 270-person-equivalent shipping machine.
Distilled from Ryan Lapo's harness-engineering work at OpenAI Frontier
and adapted to graph8's real stack — 13 product boards, 3 first-party repos (g8 · agent-os · infra), ~565 PRs/month today.
Pin this page. Reread it weekly. If one of these rules feels uncomfortable, that's the
one to enforce hardest.
"Every time I have to type 'continue' to the agent is a failure of the harness to provide enough context for it to continue to completion. Code is free. Implementation is no longer the scarce resource. Your role is to figure out how to productively deploy infinite agent capacity into your code and your team."
Each principle has a one-line statement, a why, the practice at graph8, and a verification — how you know it's actually being followed (not just nodded at in a meeting).
If your fingers are on the keyboard typing source code, the harness has failed. Engineers dispatch agents, review PRs, set policy, and handle escalations — they don't type implementations. The models are isomorphic to a senior engineer; let them do the work and use your scarce attention for system design and delegation.
SELECT count(*) FROM skill_invocations WHERE engineer_id = 'you' AND ts > now() - interval '1 day' returns more than 10. If it's lower, you're typing.

A single MBM bot review is not enough. Every PR triggers persona-specific review agents in parallel: security_reviewer · reliability_reviewer · frontend_architect · backend_scalability · bug_predictor · test_writer. Each posts comments; the authoring agent addresses them or pushes back with reasoning. Humans review postmerge.
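The persona fan-out can be sketched with a thread pool. Only the persona names come from this playbook; the dispatch function and its return shape are hypothetical stubs standing in for real agent calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Persona names from the playbook; everything else here is illustrative.
PERSONAS = [
    "security_reviewer",
    "reliability_reviewer",
    "frontend_architect",
    "backend_scalability",
    "bug_predictor",
    "test_writer",
]

def run_persona_review(persona: str, pr_number: int) -> dict:
    """Stub: dispatch one persona agent against a PR and collect its comments."""
    # A real implementation would call the agent harness here.
    return {"persona": persona, "pr": pr_number, "comments": []}

def review_pr(pr_number: int) -> list[dict]:
    """Fan all personas out in parallel; every PR gets every review."""
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        futures = [pool.submit(run_persona_review, p, pr_number) for p in PERSONAS]
        return [f.result() for f in futures]

reviews = review_pr(1234)
```

The point of the parallel fan-out is latency: six reviews land in the time of the slowest one, so the authoring agent can address them in a single pass.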
mbm_critic proposes updates to those rubrics weekly.

The inner loop is everything. When CI takes 5 minutes, agents wait — and waiting agents are wasted capacity. Rebuild the build tooling whenever it slips above 60s. Ryan's team switched from Makefile → Bazel → Turbo → NX in a single week to hit it. Speed of feedback > locality of tradition.
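The 60-second budget can be enforced mechanically rather than by feel. A minimal sketch, assuming the real build command is substituted for the placeholder:

```python
import subprocess
import sys
import time

def inner_loop_within_budget(cmd: list[str], budget_s: float = 60.0) -> tuple[float, bool]:
    """Time one build/test command; flag it when it blows the inner-loop budget."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # the real build command goes here
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= budget_s

# A trivially fast command stands in for the real build.
elapsed, ok = inner_loop_within_budget([sys.executable, "-c", "pass"])
```

Wired into CI as a hard gate, this turns "the build feels slow" into a failing check that forces the tooling rebuild the principle demands.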
Blocking on synchronous human review caps throughput at human-attention rate. Trust agent review + persona reviewers + CI gates. Merge automatically. Humans audit postmerge — sample, learn, encode findings into next week's rubric updates. Spot-checks beat blockers.
mbm_critic.

Slop accumulates. PR comments, fix-cycle reruns, "the agent did it wrong again" moments — each one is a missing prompt, doc, or lint. Friday is the ritual where last week's friction becomes next week's encoded knowledge. The team eliminates classes of misbehavior, not instances.
knowledge_compactor + mbm_critic surface the candidates Monday morning. git log --since='1 week ago' --grep='gc-friday' returns ≥ 1 commit per engineer per week.

Ryan's team runs ~6 skills. Each new skill is one more decision burden, one more place for engineer config to drift, one more thing the org has to policy-gate. Consolidate ruthlessly. When a new use-case emerges, extend an existing skill before creating a new one. If you can't fit it into an existing skill, it probably isn't a skill at all.
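The git log check above can be turned into a per-engineer tally. A minimal sketch, assuming the log is exported with --format='%ae'; the email addresses below are made up for illustration:

```python
from collections import Counter

def gc_commit_counts(log_emails: str) -> Counter:
    """Count gc-friday commits per engineer from the output of
    git log --since='1 week ago' --grep='gc-friday' --format='%ae'."""
    return Counter(line.strip() for line in log_emails.splitlines() if line.strip())

def engineers_missing_gc(counts: Counter, roster: list[str]) -> list[str]:
    """Everyone on the roster owes at least one gc-friday commit per week."""
    return [e for e in roster if counts[e] < 1]

# Hypothetical week of log output (addresses invented for the example).
log = "ana@graph8.com\nben@graph8.com\nana@graph8.com\n"
counts = gc_commit_counts(log)
missing = engineers_missing_gc(counts, ["ana@graph8.com", "ben@graph8.com", "cho@graph8.com"])
```

Run as a Monday cron, the missing list becomes the accountability side of the Friday ritual.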
skill_mortality cron auto-opens a deprecation issue for any skill with < 5 fires / month. Drift gets caught in days, not quarters. find */.claude/commands -name '*.md' | wc -l across infra + g8 = 8.

"Unused variable" is useless. "Use fx_org_id from tests/conftest.py:42 — this fixture is canonical at graph8." is a fix. Lints, tests, type errors, review comments — every diagnostic is text that goes straight into an agent's context. Author them as prompts.
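The contrast between the two diagnostics can be encoded as a formatter that refuses to emit a bare label. A toy sketch; the function name and rule id are hypothetical, while the fixture example comes from the text above:

```python
def agent_actionable(rule: str, bad_symbol: str, fix: str) -> str:
    """Render a diagnostic as a prompt an agent can act on, not just a label."""
    return f"{rule} ({bad_symbol}): {fix}"

# "Unused variable" alone is useless to an agent; pair it with the canonical fix.
msg = agent_actionable(
    "unused-variable",
    "org_id",
    "Use fx_org_id from tests/conftest.py:42; this fixture is canonical at graph8.",
)
```

The design choice is that the fix text is mandatory: a lint author cannot ship a rule without also shipping the sentence an agent needs to resolve it.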
Code is free; rewriting is cheap. The valuable thing is the spec, the CLAUDE.md, the lint rules, the agent prompts — the things that determine what gets generated. Treat code as compiled output. Refactor without ceremony. Delete without nostalgia. Migrate large surfaces in a day.
knowledge_compactor + contract_test_runner keep the canonical patterns stable as code churns underneath.

When an agent stops and asks you to confirm, that's a missing piece of context, not a feature. Don't just type "continue." Stop, identify what was missing, encode it into the harness, then resume. Over time, agents run end-to-end with zero prompts. That's the goal state.
agent_health + manual review identify top 3 friction points and add them to next Friday's GC queue.

If you're watching one agent work, you're a bottleneck. Engineers run 10–15 parallel agents during the day, plus 3–6 overnight runs. The skill is dispatch + review, not observation. Token cost is irrelevant compared to engineer-hour cost; load up the queue and let it cook.
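The dispatch-many pattern can be sketched with asyncio. The dispatch stub is hypothetical and stands in for a real harness call; the concurrency cap mirrors the 10–15 in-flight figure above:

```python
import asyncio

async def dispatch(task_id: int) -> str:
    """Stub: one agent run; a real version would drive the harness."""
    await asyncio.sleep(0)  # stand-in for hours of unattended agent work
    return f"task-{task_id}: done"

async def run_queue(n_tasks: int, max_parallel: int = 12) -> list[str]:
    """Keep a dozen agents in flight; the engineer's job is dispatch + review."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(i: int) -> str:
        async with sem:
            return await dispatch(i)

    return await asyncio.gather(*(guarded(i) for i in range(n_tasks)))

results = asyncio.run(run_queue(12))
```

Nothing here watches an individual run; results are reviewed in batch after the fact, which is the point.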
If your week still looks like the left column, you're not living the principles. The Garbage Collection ritual exists specifically to catch and fix that drift.
bug_predictor.

If we want to ship like a 270-person team at 18 people, we need to be uncomfortable about how aggressively we ban human-written code. These are the rules that make it possible. Some will feel wrong on first read. Try them anyway.
Ryan's team is not allowed to open their code editors. Period. The only way to change code is to dispatch an agent. This forces every workflow improvement to flow through the harness — making the system better instead of working around it. Painful in week 1, transformative by month 3.
Ryan distributes Symphony as a 6-layer Markdown spec — a "ghost library" — that Codex reconstitutes in any repo. Code becomes downstream of intent. graph8 could distribute Lifecycle the same way: every customer reconstitutes it locally, customized to their stack. Zero binary distribution, infinite leverage.
Ryan's repo: 750 PNPM packages with custom lint rules that fail builds if you cross package boundaries. The architecture isn't a doc — it's a CI failure. Boundaries become non-negotiable, which means agents (and humans) learn them instantly.
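A boundary check of that kind can be sketched in a few lines. The package names and the ALLOWED map are invented for illustration; a real lint pass would extract the import edges from source files:

```python
# Hypothetical boundary map: which packages each package may import from.
ALLOWED = {
    "app": {"ui", "core"},
    "ui": {"core"},
    "core": set(),
}

def boundary_violations(imports: list[tuple[str, str]]) -> list[str]:
    """Return CI-failing messages for every cross-boundary import.

    `imports` is a list of (importing_package, imported_package) edges.
    """
    errs = []
    for src, dst in imports:
        if dst != src and dst not in ALLOWED.get(src, set()):
            errs.append(f"{src} may not import {dst}: boundary rule enforced in CI, not a doc")
    return errs

violations = boundary_violations([("ui", "core"), ("core", "ui")])
```

Because the map lives in code and the check fails the build, the architecture is learned by agents the same way humans learn it: by hitting the wall once.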
Ryan: "You can just point Codex at its own session logs to ask it to tell you how to use the tool better." Agents introspect their failures and propose skill updates. The system improves itself, no human required to notice the pattern.
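That introspection loop can be sketched as a tally over session logs: rank skills by how often a human had to type "continue," then target the worst offender for a skill-file update. The record shape and skill names below are assumptions, not a real trace-ledger schema:

```python
from collections import defaultdict

# Toy stand-in for a trace ledger; field names are invented for the example.
SESSIONS = [
    {"skill": "review", "human_continues": 0},
    {"skill": "review", "human_continues": 1},
    {"skill": "migrate", "human_continues": 3},
]

def continue_rate(sessions: list[dict]) -> dict[str, float]:
    """Average 'continue' prompts per session, per skill. Every continue is
    missing context the harness should have provided."""
    total, count = defaultdict(int), defaultdict(int)
    for s in sessions:
        total[s["skill"]] += s["human_continues"]
        count[s["skill"]] += 1
    return {k: total[k] / count[k] for k in total}

def worst_skill(sessions: list[dict]) -> str:
    """The skill whose .md file most needs a proposed update."""
    rates = continue_rate(sessions)
    return max(rates, key=rates.get)
```

Feeding the worst offender back as a proposed PR against its skill file closes the loop with no human needed to spot the pattern.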
skill_distiller · weekly · reads the trace ledger for the 100 longest sessions · classifies what context was missing · proposes additions to the relevant skill .md file as a PR. mbm_critic's sibling.

The hardest part of Lifecycle isn't the engineering — it's the cultural discipline. These 10 principles are what make 5–15× output sustainable. If one feels uncomfortable, it's probably the one we need to lean into hardest.