graph8 lifecycle
Cultural manifesto · pin this
The cultural rules · how graph8 ships now

10 principles. No human-written code. 10+ parallel agents per engineer. CI under 60 seconds. Every "continue" is a harness failure.

The operating rules that turn 18 people into a 270-person-equivalent shipping machine. Distilled from Ryan Lapo's harness-engineering work at OpenAI Frontier and adapted to graph8's real stack — 13 product boards, 3 first-party repos (g8 · agent-os · infra), ~565 PRs/month today. Pin this page. Reread it weekly. If one of these rules feels uncomfortable, that's the one to enforce hardest.

Rule #1
0
lines of code an engineer hand-writes
Rule #6
8
skills total · max · ever
Rule #10
10+
agents per engineer in flight
Rule #3
< 60s
CI build time · SLO
Rule #4
0
PRs blocked on human review
Rule #5
1
Garbage Collection Day · every Friday
"
"Every time I have to type 'continue' to the agent is a failure of the harness to provide enough context for it to continue to completion. Code is free. Implementation is no longer the scarce resource. Your role is to figure out how to productively deploy infinite agent capacity into your code and your team."
Ryan Lapo · Member of Technical Staff, OpenAI Frontier · "Harness Engineering" (Nov 2025). Token billionaire. Banned his team from touching editors. 9 months of shipping with zero human-written code.
The 10 principles

One per page-break in your head. Memorize the numbers.

Each principle has a one-line statement, a why, the practice at graph8, and a verification — how you know it's actually being followed (not just nodded at in a meeting).

01

No human-written code.

If your fingers are on the keyboard typing source code, the harness has failed. Engineers dispatch agents, review PRs, set policy, and handle escalations — they don't type implementations. The models are isomorphic to a senior engineer; let them do the work and use your scarce attention for system design and delegation.

At graph8 · Every PR is opened by an agent. Every commit message is authored by an agent. If you find yourself opening a file in your editor to edit it, stop. Write a ticket. Dispatch an agent.
Verify · SELECT count(*) FROM skill_invocations WHERE engineer_id='you' AND ts > now() - interval '1 day' · result should be > 10. If it's lower, you're typing.
02

Every PR opens a fleet of review agents.

A single MBM bot review is not enough. Every PR triggers persona-specific review agents in parallel: security_reviewer · reliability_reviewer · frontend_architect · backend_scalability · bug_predictor · test_writer. Each posts comments; the authoring agent addresses them or pushes back with reasoning. Humans review postmerge.

At graph8 · Add one new persona reviewer per month. By month 3, every PR runs through 6+ persona agents before any human reads it. Each persona has a rubric document; mbm_critic proposes updates to those rubrics weekly.
Verify · pick any merged PR · count the bot comments · should be ≥ 4.
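A minimal sketch of the fan-out, assuming a hypothetical run_persona_review stand-in where the real system would dispatch an actual review agent; the persona names come from the rule above, everything else is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Persona roster from Rule #2. The names match the manifesto; the
# dispatch API below is invented for illustration.
PERSONAS = [
    "security_reviewer", "reliability_reviewer", "frontend_architect",
    "backend_scalability", "bug_predictor", "test_writer",
]

def run_persona_review(persona: str, pr_diff: str) -> dict:
    """Stand-in for a real agent call: each persona reviews the same
    diff against its own rubric and returns a comment payload."""
    return {"persona": persona, "pr_comment": f"[{persona}] reviewed {len(pr_diff)} chars"}

def review_fleet(pr_diff: str) -> list[dict]:
    # All personas run in parallel; the PR never waits on a serial queue.
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        return list(pool.map(lambda p: run_persona_review(p, pr_diff), PERSONAS))

comments = review_fleet("diff --git a/app.py b/app.py ...")
```

The authoring agent then reads `comments` and either fixes or pushes back on each one; humans only see the thread postmerge.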
03

CI under 60 seconds. No exceptions.

The inner loop is everything. When CI takes 5 minutes, agents wait — and waiting agents are wasted capacity. Rebuild the build tooling whenever it slips above 60s. Ryan's team switched from Makefile → Bazel → Turbo → NX in a single week to hit it. Speed of feedback > locality of tradition.

At graph8 · Top-level CI SLO. Bandit + ruff + pytest -m unit + pyright all run in < 60s. If they don't, the next priority is sharding/caching/precompilation until they do. Don't make agents wait.
Verify · pick the last 10 merged PRs in g8 · median CI duration < 60s.
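One way to sketch the gate: run every check in parallel and fail the build if wall-clock time breaks the SLO. The commands below are trivial stand-ins so the sketch stays runnable; the real graph8 list would be bandit, ruff, pytest -m unit, and pyright:

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in check commands; swap in the real linters/tests per repo.
CHECKS = {
    "lint": [sys.executable, "-c", "print('lint ok')"],
    "unit": [sys.executable, "-c", "print('unit ok')"],
}
SLO_SECONDS = 60

def run_check(name_cmd):
    name, cmd = name_cmd
    result = subprocess.run(cmd, capture_output=True, text=True)
    return name, result.returncode

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_check, CHECKS.items()))
elapsed = time.monotonic() - start

# Green means every check passed AND the whole suite beat the SLO.
ok = all(code == 0 for code in results.values()) and elapsed < SLO_SECONDS
```

The key design choice is that the SLO itself is a CI assertion, not a dashboard metric: a slow build is a red build.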
04

Postmerge code review. Humans don't gate.

Blocking on synchronous human review caps throughput at human-attention rate. Trust agent review + persona reviewers + CI gates. Merge automatically. Humans audit postmerge — sample, learn, encode findings into next week's rubric updates. Spot-checks beat blockers.

At graph8 · Once dashboards show 30 days of clean agent merges, flip to auto-merge on green-CI + bot-approved. Humans do weekly postmerge sweeps (sample 5–10% of PRs) and feed findings into mbm_critic.
Verify · median PR-open-to-merge time < 30 min · with humans only in the loop for migrations and policy-flagged paths.
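A hedged sketch of what the auto-merge gate could look like: merge without a human when CI is green, a bot approved, and no policy-flagged path is touched. The field names and path list are invented for illustration:

```python
# Hypothetical policy-flagged path prefixes; these route to a human.
POLICY_FLAGGED = ("migrations/", "infra/terraform/")

def can_auto_merge(pr: dict) -> bool:
    """Rule #4 as a predicate: green CI + bot approval + no flagged paths."""
    touches_flagged = any(
        path.startswith(POLICY_FLAGGED) for path in pr["changed_files"]
    )
    return pr["ci_green"] and pr["bot_approved"] and not touches_flagged

pr = {"ci_green": True, "bot_approved": True, "changed_files": ["g8/api/users.py"]}
# A PR touching migrations/ would fail the predicate and wait for a human.
```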
05

Garbage Collection Friday. Every week. Non-negotiable.

Slop accumulates. PR comments, fix-cycle reruns, "the agent did it wrong again" moments — each one is a missing prompt, doc, or lint. Friday is the ritual where last week's friction becomes next week's encoded knowledge. The team eliminates classes of misbehavior, not instances.

At graph8 · Every Friday afternoon: each engineer picks one recurring slop pattern from the week, writes a lint / CLAUDE.md addition / review-agent rubric update / new test that categorically prevents it. knowledge_compactor + mbm_critic surface the candidates Monday morning.
Verify · git log --since='1 week ago' --grep='gc-friday' returns ≥ 1 commit per engineer per week.
06

8 skills total. Maximum. Forever.

Ryan's team runs ~6 skills. Each new skill is one more decision burden, one more place for engineer config to drift, one more thing the org has to policy-gate. Consolidate ruthlessly. When a new use-case emerges, extend an existing skill before creating a new one. If you can't fit it into an existing skill, it probably isn't a skill at all.

At graph8 · By end of week 4: 22 skills → 8. skill_mortality cron auto-opens a deprecation issue for any skill with < 5 fires / month. Drift gets caught in days, not quarters.
Verify · find */.claude/commands -name '*.md' | wc -l across infra + g8 = 8.
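The skill_mortality check is simple enough to sketch: given the skill roster and a fire ledger, flag anything under 5 fires in the trailing 30 days. The ledger row shape here is an assumption:

```python
from datetime import datetime, timedelta

DEPRECATION_THRESHOLD = 5  # fires per trailing 30 days, per Rule #6

def stale_skills(skills: list[str], ledger: list[dict], now: datetime) -> list[str]:
    """Return skills that should get an auto-opened deprecation issue.
    Hypothetical ledger rows: {"skill": str, "fired_at": datetime}."""
    cutoff = now - timedelta(days=30)
    counts = {s: 0 for s in skills}  # start at 0 so never-fired skills are caught
    for row in ledger:
        if row["fired_at"] >= cutoff and row["skill"] in counts:
            counts[row["skill"]] += 1
    return sorted(s for s, n in counts.items() if n < DEPRECATION_THRESHOLD)
```

Seeding every skill at zero matters: the stalest skill is the one that never appears in the ledger at all.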
07

Every error message is a remediation prompt.

"Unused variable" is useless. "Use fx_org_id from tests/conftest.py:42 — this fixture is canonical at graph8." is a fix. Lints, tests, type errors, review comments — every diagnostic is text that goes straight into an agent's context. Author them as prompts.

At graph8 · Custom ESLint + ruff rules with bespoke messages that name the canonical alternative, link to the relevant CLAUDE.md section, and tell the agent what to do. New rules ship as part of every Garbage Collection Friday.
Verify · grep any custom lint message in g8 · should reference a file path or a CLAUDE.md anchor, not just describe the problem.
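The message shape is the whole idea, so here is a small sketch of a remediation registry. The rule code and registry contents are invented; the pattern is what matters: name the canonical alternative, point at a file, link the doc anchor:

```python
# Hypothetical remediation registry keyed by rule code.
REMEDIATIONS = {
    "G8001": {
        "problem": "ad-hoc org-id fixture",
        "fix": "Use fx_org_id from tests/conftest.py:42; this fixture is canonical at graph8.",
        "doc": "CLAUDE.md#fixtures",
    },
}

def lint_message(code: str, path: str, line: int) -> str:
    """Emit a diagnostic that reads as a remediation prompt, not a complaint:
    location, problem, concrete fix, and a doc anchor the agent can open."""
    r = REMEDIATIONS[code]
    return f"{path}:{line} {code} {r['problem']}. {r['fix']} See {r['doc']}."
```

Because this string lands directly in an agent's context, every field is actionable: a file path to read, a symbol to use, an anchor to consult.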
08

Code is a build artifact. The spec is the source of truth.

Code is free; rewriting is cheap. The valuable thing is the spec, the CLAUDE.md, the lint rules, the agent prompts — the things that determine what gets generated. Treat code as compiled output. Refactor without ceremony. Delete without nostalgia. Migrate large surfaces in a day.

At graph8 · Major refactors (file structure, naming conventions, framework moves) happen in one sprint via parallel agent dispatch. No 3-month migrations. knowledge_compactor + contract_test_runner keep the canonical patterns stable as code churns underneath.
Verify · any framework / convention change should land across the whole codebase within 7 days · not 7 weeks.
09

Every "continue" is a harness failure. Ship the fix.

If an agent stops and asks you to confirm — that's a missing piece of context, not a feature. Don't just type "continue." Stop, identify what was missing, encode it into the harness, then resume. Over time, agents run end-to-end with zero prompts. That's the goal state.

At graph8 · Every time an engineer types "continue" or "yes" or re-prompts an agent, log it. Weekly: agent_health + manual review identify top 3 friction points and add them to next Friday's GC queue.
Verify · count of "continue"-style prompts per engineer per week trends to zero. Today: high. Week 4: down 80%. Week 12: rare.
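A minimal sketch of the friction log, assuming a hypothetical event shape where each re-prompt is recorded with the context the agent was missing; the weekly top 3 feed GC Friday:

```python
from collections import Counter

# Prompts that count as harness failures under Rule #9 (illustrative list).
CONTINUE_WORDS = {"continue", "yes", "go ahead", "keep going"}

def top_friction(events: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Hypothetical events: {"prompt": str, "missing_context": str}.
    Returns the n most common missing-context reasons."""
    reasons = Counter(
        e["missing_context"] for e in events
        if e["prompt"].strip().lower() in CONTINUE_WORDS
    )
    return reasons.most_common(n)
```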
10

10+ agents per engineer in flight. Anything less is wasted capacity.

If you're watching one agent work, you're a bottleneck. Engineers run 10–15 parallel agents during the day, plus 3–6 overnight runs. The skill is dispatch + review, not observation. Token cost is irrelevant compared to engineer-hour cost; load up the queue and let it cook.

At graph8 · Every morning: spawn 5+ agents within the first 30 minutes. Every evening: 3+ overnight runs queued before logging off. Weekly utilization grade on the dashboard — "excellent" means consistently 10+ parallel, "spinning" means high fires + low merges.
Verify · daily skill-fires per engineer in the ledger · target: 50+. Currently top performer (Usjid) is ~1.2/day. Need a 40× ramp here.
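One possible shape for that utilization grade, with thresholds invented for illustration: "excellent" needs both parallelism and merged output, while "spinning" is the failure mode of many fires and few merges:

```python
def utilization_grade(parallel_agents: int, fires: int, merges: int) -> str:
    """Sketch of the Rule #10 weekly grade. Thresholds are assumptions,
    not the real dashboard's."""
    if parallel_agents >= 10 and merges >= fires * 0.2:
        return "excellent"   # consistently 10+ parallel AND work lands
    if fires > 20 and merges < fires * 0.1:
        return "spinning"    # lots of activity, almost nothing merges
    return "ramping"
```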
What changes in practice

The day-to-day, before and after.

If your week still looks like the left column, you're not living the principles. The Garbage Collection ritual exists specifically to catch and fix that drift.

Engineer week · old way

  • Open IDE · type code · iterate
  • 1–2 PRs / day, all hand-written
  • Wait for MBM bot review (3–10 min)
  • Wait for human reviewer (hours–days)
  • Address every comment manually
  • "Did the agent run?" Slack question
  • 5 hrs/week in status meetings
  • Sprint planning estimates tickets
  • Refactors take weeks
  • Bugs caught in prod via Sentry

Engineer week · principles applied

  • Open dashboard · dispatch 5+ agents · review
  • 10–15 PRs / day, all agent-authored
  • Persona reviewers post in 60s, all parallel
  • Auto-merge on green-CI + bot approval
  • Push back on agent feedback with reasoning
  • One SQL query in Grafana < 1 sec
  • 10 min/week reading the dashboard together
  • "What loops do we close this week?"
  • Refactors land in 1 day, parallel-dispatched
  • Bugs caught at PR time by bug_predictor
The extreme version · for the brave

Ryan's team's actual operating rules. Adopt as much as you can stomach.

If we want to ship like a 270-person team at 18 people, we have to embrace rules that feel uncomfortably aggressive, starting with the ban on human-written code. These are the rules that make it possible. Some will feel wrong on first read. Try them anyway.

Ban editors entirely.

Ryan's team is not allowed to open their code editors. Period. The only way to change code is to dispatch an agent. This forces every workflow improvement to flow through the harness — making the system better instead of working around it. Painful in week 1, transformative by month 3.

For graph8 · Run an experiment: pick one engineer for a 2-week sprint. They are not allowed to type code. All work goes through agents. Measure: PRs shipped, slop introduced, friction reported. If it works, expand.

Ship Lifecycle itself as a spec, not a product.

Ryan distributes Symphony as a 6-layer Markdown spec (a "ghost library") that Codex reconstitutes in any repo. Code becomes downstream of intent. graph8 could distribute Lifecycle the same way — every customer reconstitutes it locally, customized to their stack. Zero binary distribution, infinite leverage.

For graph8 · After day 30 of internal use, package Lifecycle's components (trace ledger schema · agent prompt patterns · cron definitions · CLAUDE.md template) as a single spec.md file. Test by pointing Codex at it in a clean repo and watching it reconstitute the whole system.

Architecture as enforced lint.

Ryan's repo: 750 PNPM packages with custom lint rules that fail builds if you cross package boundaries. The architecture isn't a doc — it's a CI failure. Boundaries become non-negotiable, which means agents (and humans) learn them instantly.

For graph8 · Pick the 5 most-violated architecture rules in g8 (e.g., layer crossing in Clean Arch, cross-product imports) · write a custom ruff or AST-based lint per rule · flip to blocking · let GC Friday catch the drift.

Skill distillation · agents tune themselves.

Ryan: "You can just point Codex at its own session logs to ask it to tell you how to use the tool better." Agents introspect their failures and propose skill updates. The system improves itself, no human required to notice the pattern.

For graph8 · Add an 11th cron: skill_distiller · weekly · reads the trace ledger for the 100 longest sessions · classifies what context was missing · proposes additions to the relevant skill .md file as a PR. mbm_critic's sibling.
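The core of that proposed cron can be sketched in a few lines: take the longest sessions, bucket them by the context that was missing, and emit proposed lines for each skill's .md file. The session fields are assumptions about the trace ledger schema:

```python
def distill(sessions: list[dict], top_n: int = 100) -> dict[str, list[str]]:
    """Sketch of the proposed skill_distiller. Hypothetical session rows:
    {"skill": str, "duration_s": int, "missing_context": str}.
    Long sessions are treated as the signal: they are where the agent
    burned time for lack of context."""
    longest = sorted(sessions, key=lambda s: s["duration_s"], reverse=True)[:top_n]
    proposals: dict[str, list[str]] = {}
    for s in longest:
        line = f"- When {s['missing_context']} was missing, include it up front."
        proposals.setdefault(s["skill"], []).append(line)
    return proposals
```

In the full version each proposal would land as a PR against the skill file, reviewed by the same persona fleet as any other change.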

Pin this page. Reread it weekly.

The hardest part of Lifecycle isn't the engineering — it's the cultural discipline. These 10 principles are what make 5–15× output sustainable. If one feels uncomfortable, it's probably the one we need to lean into hardest.