graph8 lifecycle
graph8 internal
graph8's transformation · 18 people · 13 boards · 3 repos · 0 human-typed code

How graph8 ships like a 270-person team while staying at 18 — by banning human-written code.

graph8 today: 15 engineers + 3 QA across 13 product boards (Studio · Inbox · Enrichment · Web Chat · Copilot · Agents · Signals · Voice AI · Dialer · Stripe/Credits · Engage · Mashup · UX/UI). Real throughput last 30 days: ~565 merged PRs across the top 4 repos (top performer Usjid: 37 PRs/30d, hand-writing the code). Most agent activity (~70%) runs invisibly on personal Max plans. The change is harness engineering, full send: one shared trace, one policy plane, one dashboard — engineers banned from touching editors — each running 10+ parallel agents at any moment against the 13 boards, plus overnight long-runs. Set up in 7 days. Same 18 people. 10–15× the output. The new bottleneck isn't engineering — it's how fast we feed the assembly line with PRDs. Read the 10 principles first →

Where you fit · find your name
Real 90-day merged-PR counts from graph8-com/g8 · after-Lifecycle = 8–15× the same engineer dispatching agents instead of typing
Usjid Nisar
Studio · top shipper
117 PRs/90d → ~950–1,750
Muhammad Sadiq
Enrichment · Mashup
99 PRs/90d → ~800–1,500
Hassan Baig
AI Inbox
56 PRs/90d → ~450–850
Muhammad Waleed
Web Chat · sync
48 PRs/90d → ~380–720
Thomas Cornelius
Infra · Product
44 PRs/90d → ~350–660
Huzaifa Aamir
Agents
27 PRs/90d → ~220–410
Oleksii Bokov
Signals · Appts
14 PRs/90d → ~110–210
Shaharyar Ahmad
Database · MBM
12 PRs/90d → ~95–180
Multiplier varies by work type: surgical fixes scale higher (15×) than architectural overhauls (8×). All numbers from gh pr list --repo graph8-com/g8 --state merged --search "author:<you> merged:>2026-02-17". Don't see yourself? Eeshan · Musa · Ibrahim · Joaquin · Hamza · Muhammad I · plus the 3 QA (Ayesha · Rania · Immama) are all in the full engineer wall.
24/7
Continuous agent runs · day & night
~30%
Of agent activity org-visible today

Click any number below to drill into the full list. Click any item in the list to see the underlying file content — the actual skill prompt, the actual agent definition, the actual workflow YAML.

Today · status quo

Six real journeys an agent or engineer takes every day at graph8.

Click any journey to walk through it end-to-end. Click a skill name (purple) to see what it does, who can fire it, and whether the org has any record of the run. The most common surprise: more than half of the steps run somewhere the org cannot see.

30%

The org sees roughly 30% of agent activity.

K8s-job agent runs are observable (Loki logs, MBM postgres). Engineer-local Claude Code and Codex sessions — where most skill invocations actually happen — are not. There's no audit log of which skills fired, on whose machine, against which repo, with what model token. This is the single biggest control gap.

30% visible (K8s, MBM, GitHub events) · 70% invisible (local CLI, personal tokens)
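Closing the visibility gap doesn't require infrastructure on day one: a client-side wrapper every skill invocation passes through, appending to a local ledger that a background process later syncs upward, is enough to start. A minimal Python sketch; the ledger path and field names are illustrative, not graph8's actual schema:

```python
import json
import os
import pathlib
import time

# Hypothetical local ledger path; a background shipper would sync
# this file into the org-wide trace once the ledger (rec #1) lands.
LEDGER = pathlib.Path.home() / ".graph8" / "trace.jsonl"

def record_skill_fire(skill, repo, model="claude", ledger=LEDGER):
    """Append one skill invocation as a JSON line and return the event.
    Field names are illustrative placeholders."""
    event = {
        "ts": time.time(),
        "engineer": os.environ.get("USER", "unknown"),
        "skill": skill,
        "repo": repo,
        "model": model,
    }
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

One JSON line per fire is deliberately boring: it's greppable offline, trivially syncable, and becomes the source table for every utilization metric further down this page.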
Tomorrow · what should change

Seven moves, ranked by impact, to close the loop.

Each move is one PR away from starting. The order matters: the first three unlock the rest by giving the org the data and the levers it currently lacks.


Simplifying the skill surface

22 skills sounds modest, but several do the same job under different names. Here's the merge table — fewer, sharper skills means engineers actually remember what fires when, and the org can write meaningful policy.

| Skills today | Becomes | Why | Removes |
| --- | --- | --- | --- |
| /start · /analyse-system · /investigate-bug | /start (auto-routes by tier) | /start already classifies tier; the other two should be branches inside it | 2 skills |
| /commit · /ship | /ship (commit becomes a sub-step) | You never want to commit without intent to ship | 1 skill |
| /review · /security-review | /review --security | Security review is a stricter rubric on the same skill | 1 skill |
| /assign-prds · /capacity-check | /capacity (one command, two views) | Both read the same data; splitting forces two queries for one decision | 1 skill |
| /check-prds · /check-architecture | /check (subcommand picks target) | "Check" skills should be a verb with an object, not separate skills | 1 skill |
| /article-image · /changelog-image · /write-article · /changelog | /publish (article + image bundled) | These belong together — writing an article without picking its image is rare | 2 skills |

22 skills → 14 skills. 8 fewer skills, same coverage: the org can write 14 policies instead of 22.
Horizon · closing the agent-to-agent loops

Today every agent writes to humans. Tomorrow, agent A's output is agent B's input.

That's the leverage. Right now 25 agents fan out into Roam digests, Slack pings, and GitHub issues — and a human reads them all and decides what to do. That doesn't scale with 18 engineers running 13 product boards. The ten loops below close the gap. All are shippable in days or weeks; none takes more than 30 days end-to-end. Each loop maps to one or more of the 10 principles →

External validation · this exact pattern works

Ryan Lapo's team at OpenAI Frontier (3 people · 1 million lines of code · 1,500 PRs · zero human-typed lines · 9 months) ships exactly this way: "PR comments indicate some context failure on behalf of the agent — get it into the repository and figure out ways to automatically prompt-inject the agent so it self-heals." Same pattern, different name. Their full playbook → openai.com/index/harness-engineering · our adaptation → the 10 principles.

Bug-prevention loop · 7 days

Closing question: why are bugs reaching prod when their patterns are visible at PR time? Mechanism: an agent reads each PR diff plus the recent Sentry corpus, flags lines whose patterns historically broke things. Comments on the PR with predicted risk.

First step: Day 1–5, ship bug_predictor agent (new TaskSpawner triggered on pull_request.opened in g8). Day 6–7, tune false-positive rate on the last 30 days of merged PRs before turning on for everyone.
EFFORT
Medium
IMPACT
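The pattern-matching core of bug_predictor can be sketched without any Sentry or GitHub plumbing. Assuming the agent has the added lines of a PR diff and a corpus of stack-trace lines from past incidents, a naive first version normalizes away literals and flags repeats (function names and the min-hits threshold are hypothetical, not a spec):

```python
import re
from collections import Counter

def normalize(line):
    # Collapse numbers and string literals so retry(3) and retry(5)
    # reduce to the same pattern.
    return re.sub(r"\d+|'[^']*'|\"[^\"]*\"", "_", line.strip())

def build_risk_patterns(sentry_stack_lines, min_hits=3):
    """Patterns appearing in >= min_hits past incident stacks become
    risk signatures."""
    counts = Counter(normalize(l) for l in sentry_stack_lines)
    return {p for p, n in counts.items() if n >= min_hits}

def flag_risky_lines(diff_added_lines, risk_patterns):
    """Return (line, matched pattern) pairs the PR comment would cite."""
    return [(l, normalize(l)) for l in diff_added_lines
            if normalize(l) in risk_patterns]
```

Days 6–7 of tuning are exactly this threshold: raise min_hits until the false-positive rate on the last 30 days of merged PRs is tolerable.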

Test-coverage loop · 3 days

Closing question: why are PRs landing with new public functions that have no tests? Mechanism: The test-writer agent already exists at g8/.claude/agents/test-writer.md — it's defined but not dispatched. Wire it to PR-open events on any new public function without coverage.

First step: Day 1, GitHub Action triggers on PR open. Day 2, action invokes the existing test-writer agent against the diff. Day 3, agent opens a follow-up PR with the missing tests, MBM reviews both PRs together.
EFFORT
Small
IMPACT

MBM-feedback loop · 14 days

Closing question: is MBM getting smarter, or drifting? Mechanism: a nightly agent reads PRs where MBM's CHANGES_REQUESTED was dismissed by the merger; classifies as "MBM was wrong" vs "human shipped anyway." Proposes rubric updates as a PR against tenants/graph8-eng/agents/reviewer/prompt.md.

First step: Day 1–7, ship mbm_critic agent (runs daily via cron). Day 8–14, after first week of data, draft the first rubric-update PR. Each future PR is opt-in via human approval, not auto-applied.
EFFORT
Medium
IMPACT
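The "MBM was wrong" vs "human shipped anyway" split needs a ground-truth signal, and the simplest one is whether the PR later broke. A sketch of the heuristic mbm_critic could start with; the ledger field names are illustrative:

```python
def classify_dismissal(pr):
    """Classify a PR where the merger dismissed MBM's CHANGES_REQUESTED.
    Heuristic: if the PR was reverted or linked to a bug within 30 days,
    the human overrode a valid concern; otherwise the review was noise."""
    if pr["reverted_within_30d"] or pr["bug_linked_within_30d"]:
        return "human shipped anyway"
    return "MBM was wrong"
```

Every PR landing in the "MBM was wrong" bucket is a candidate line in the rubric-update PR against tenants/graph8-eng/agents/reviewer/prompt.md.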

Agent-health loop · 7 days

Closing question: are our 25 agents actually good, or are some silently making things worse? Mechanism: measure each axon agent's PR survival rate (merged + not reverted within 30 days). Flag agents below threshold. Surface the unwired ones (pr_cleanup, rage_click_detector) and dispatch them or delete them.

First step: Day 1–5, ship agent_health cron — reads MBM postgres + GitHub, writes per-agent stats to a table. Day 6–7, add a single per-agent panel to the dashboard, alert on regression.
EFFORT
Small
IMPACT
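The survival metric and the flagging rule fit in a few lines. A sketch assuming the cron has already pulled, per agent, one record per PR opened in the window (field names and the 50% threshold are illustrative):

```python
def survival_rate(prs):
    """prs: one dict per PR the agent opened in the 30-day window,
    e.g. {"merged": True, "reverted": False}. Returns None when the
    agent has no runs at all."""
    if not prs:
        return None
    survived = sum(1 for p in prs if p["merged"] and not p["reverted"])
    return survived / len(prs)

def flag_unhealthy(stats, threshold=0.5):
    """stats: {agent_name: [pr dicts]}. Flags agents below threshold
    and agents with zero runs (defined but never dispatched)."""
    flagged = []
    for name, prs in stats.items():
        rate = survival_rate(prs)
        if rate is None:
            flagged.append((name, "unwired"))
        elif rate < threshold:
            flagged.append((name, f"survival {rate:.0%}"))
    return flagged
```

The "unwired" branch is what surfaces pr_cleanup and rage_click_detector automatically instead of someone remembering they exist.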

Knowledge-compounding loop · 14 days

Closing question: why do agents re-derive the same context every run? Mechanism: instrument agents to log which files / searches they repeatedly fetch. Daily cron analyzes the log, opens a PR proposing additions to CLAUDE.md (root or feature-level). Same pattern for engineer-domains.json — derive expertise from actual commit patterns, not human-maintained guesses.

First step: Day 1–7, add context-fetch logging to the axon agent runner. Day 8–14, ship knowledge_compactor cron that opens the first CLAUDE.md-update PR.
EFFORT
Medium
IMPACT
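Once context-fetch logging exists, the compaction heuristic is just "which files do distinct runs keep fetching." A sketch under the assumption that the runner logs (run_id, path) pairs; the threshold is a placeholder:

```python
from collections import Counter

def compaction_candidates(fetch_log, min_refetches=10):
    """fetch_log: list of (run_id, path). Files fetched by at least
    min_refetches distinct runs are candidates for inlining into
    CLAUDE.md, counting each run once per file."""
    seen = set()
    per_file_runs = Counter()
    for run_id, path in fetch_log:
        if (run_id, path) not in seen:
            seen.add((run_id, path))
            per_file_runs[path] += 1
    return [p for p, n in per_file_runs.most_common() if n >= min_refetches]
```

Counting distinct runs rather than raw fetches matters: one run re-reading a file in a loop is a retry problem, not missing context.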

Skill-mortality loop · 3 days (after ledger)

Closing question: which of our 22 skills should we delete this week? Mechanism: once the trace ledger lands (rec #1, day 7), usage analytics flag skills with <5 fires/month. Auto-open consolidation issues, link to the merge-table proposal already documented.

First step: Days 8–10 (right after the ledger ships on day 7). One SQL query + an auto-issue creator. One cleanup PR per dead skill. By day 30, you're down to 14.
EFFORT
Small
IMPACT
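The "one SQL query" is short enough to show. A sketch assuming the ledger lands as a skill_fires(skill, ts) table (demonstrated on SQLite here; the real ledger store may differ). One subtlety: skills with zero fires never appear in the ledger at all, so the catalog has to be diffed in explicitly:

```python
import sqlite3

LOW_USE_SQL = """
SELECT skill, COUNT(*) AS fires
FROM skill_fires
WHERE ts >= datetime('now', '-30 days')
GROUP BY skill
"""

def find_dead_skills(conn, catalog, min_fires=5):
    """Return catalog skills with fewer than min_fires fires in the
    last 30 days, including skills that never fired at all."""
    fires = dict(conn.execute(LOW_USE_SQL))
    return sorted(s for s in catalog if fires.get(s, 0) < min_fires)
```

Feed the returned list to the auto-issue creator, one consolidation issue per dead skill.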

Cross-repo regression loop · 30 days

Closing question: how do we catch a change in g8 that breaks graph8-com/g8-eda-server before deploy, not after? Mechanism: a registry of cross-repo contracts (event schemas, RPC signatures). Agent runs contract tests on every PR that touches a known interface. Failure blocks merge.

First step: Week 1, register the top 5 cross-repo contracts as JSON-schemas. Week 2, build contract_test_runner agent triggered on PR. Week 3–4, broaden to remaining contracts, turn on enforcement.
EFFORT
Medium
IMPACT
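A registered contract can be as small as required fields plus types. A stdlib-only sketch; the event name and fields are invented for illustration, not an actual g8 ↔ g8-eda-server schema:

```python
# Hypothetical registry entry: an event g8 emits and
# g8-eda-server consumes. Real contracts would live as JSON-schemas
# in the registry repo.
CONTRACTS = {
    "contact.enriched": {
        "contact_id": str,
        "org_id": str,
        "fields": dict,
    },
}

def check_contract(event_name, payload):
    """Return a list of violations; an empty list means the payload
    still satisfies the registered cross-repo contract."""
    schema = CONTRACTS.get(event_name)
    if schema is None:
        return [f"unknown contract: {event_name}"]
    problems = []
    for field, typ in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], typ):
            problems.append(f"{field}: expected {typ.__name__}")
    return problems
```

contract_test_runner would call this against sample payloads from any PR touching a known interface and block merge on a non-empty result.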

Onboarding loop · 30 days

Closing question: why does a new engineer ask Shaharyar the same questions every cohort? Mechanism: per-engineer onboarding agent watches the first 4 weeks of activity. Surfaces "you haven't tried /start yet," "you keep editing this file — here's the convention," points to feature-level CLAUDE.md they'd benefit from. Graduates them off training wheels at week 4.

First step: Week 1, add joined: YYYY-MM-DD to engineer-domains.json. Week 2–4, build onboarding agent triggered for engineers within 28 days of join.
EFFORT
Medium
IMPACT
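The trigger condition for the onboarding agent is one date comparison once the joined field exists. A sketch assuming the proposed joined: YYYY-MM-DD field in engineer-domains.json:

```python
from datetime import date

def in_onboarding_window(joined, today, days=28):
    """True while an engineer is within `days` of their join date.
    joined: ISO 'YYYY-MM-DD' string from engineer-domains.json
    (a proposed field, not yet in the file)."""
    return (today - date.fromisoformat(joined)).days <= days
```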

Garbage Collection Friday · weekly cadence (forever)

Closing question: what stops Friday from being the day where last week's slop becomes next week's encoded knowledge? Mechanism: every Friday afternoon, each engineer picks one recurring pattern from the week's PR comments / fix-cycles / friction and turns it into a lint, a CLAUDE.md addition, a review-agent rubric update, or a new test. Eliminates a class of misbehavior, not an instance. This is principle #5.

First step: Monday morning of week 2 (day 8), knowledge_compactor + mbm_critic deliver a curated "GC candidates" digest. Friday afternoon: ship one cleanup per engineer. Commit prefix gc-friday:.
EFFORT
Small
IMPACT

Lints-as-prompts · the diagnostic-rewriting loop · 30 days

Closing question: why is every lint error message generic — "unused variable" — when it could be a remediation prompt that names the canonical alternative and links to the right CLAUDE.md section? Mechanism: audit every lint message in the codebase. Rewrite each to be a one-line prompt for the agent ("use fx_org_id from tests/conftest.py:42 — canonical at graph8"). Then add bespoke rules that enforce the patterns the agents most often violate (file size, package boundaries, single canonical zod schema). This is principle #7.

First step: Week 1 — pick the top 10 lint/test failures by frequency from last 30 days of CI. Week 2 — rewrite their messages as remediation prompts. Week 3–4 — author 5 bespoke lints to enforce architecture invariants (cross-product imports, layer crossings, duplicate zod schemas).
EFFORT
Medium
IMPACT
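The rewrite itself is a lookup table from diagnostic code to remediation prompt. A sketch; the mapping below uses the pyflakes code F841 as an example, and the message text and CLAUDE.md section are hypothetical:

```python
# Illustrative code → remediation-prompt mapping. The prompt names the
# canonical alternative instead of just stating the symptom.
REMEDIATION = {
    "F841": ("unused variable '{name}' — delete it, or if it's a test "
             "fixture, use the canonical one (see CLAUDE.md testing section)"),
}

def rewrite_diagnostic(code, name, default_msg):
    """Emit a remediation prompt the agent can act on directly,
    falling back to the stock lint message for unmapped codes."""
    template = REMEDIATION.get(code)
    return template.format(name=name) if template else default_msg
```

Bespoke architecture lints (cross-product imports, duplicate zod schemas) get entries in the same table, so every failure an agent sees is already a prompt.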
The meta-observation

Of the 25 axon agents running today, not one reads the output of another. The improver agent writes to Roam. The pipeline-analyzer writes to Roam. The monitor opens GitHub issues. The flywheel writes monthly digests. MBM's reviews go to PRs. Every agent's output is consumed by a human, who then decides whether to do something about it. That's the bottleneck. The ten loops above are ten places to wire agent → agent directly. Compound intelligence starts there.

Open the day-1 blueprint ↗
The change · what graph8 is building

One trace. One policy. One dashboard. Five capabilities graph8 is wiring up over 7 days.

Built on top of what graph8 already runs — Claude Code locally, Codex locally, K8s axon agents, MBM, Modal, GitHub, Slack, Roam — these five capabilities turn the 25 existing agents into a measurable, governed fleet. Then the 18-person team starts dispatching 3+ agents per day in parallel. See what the team looks like after →

Five capabilities. One control plane. Set up in 7 days at graph8.

Universal trace

Every skill, every agent, every run — captured wherever it happens. Local Claude Code, local Codex, K8s Jobs, Modal, webhook handlers. One timeline per engineer, per repo, per journey.

  • OAuth-pool-scoped activity ledger
  • Mandatory client-side wrapper
  • Per-tool latency + success telemetry
  • Per-engineer skill utilization (Max plans)
  • Per-K8s-agent unit economics (OAuth pool)

Utilization + agent economics

Two slices that actually matter for AI-first teams. Per-engineer Max-plan utilization — are they leveraging their seat or coasting? — and per-K8s-agent unit economics — which autonomous agents are worth their token cost? Token $-per-engineer isn't a thing on Max plans; utilization is.

  • Skill fires / day / engineer (utilization proxy)
  • Skill-to-PR ratio per engineer (autonomy ROI)
  • Cost per merged PR, per K8s agent
  • Underutilization + cost-blowout alerts
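The utilization grade can be computed from just the two ledger counters. A sketch whose thresholds are tuned to reproduce the sample table below; they are illustrative defaults, not graph8 policy:

```python
def utilization_grade(skill_fires, prs_merged):
    """Grade a Max-plan seat from 30-day ledger counters.
    Thresholds are illustrative and would be tuned on real data."""
    ratio = skill_fires / prs_merged if prs_merged else float("inf")
    if skill_fires < 30:
        return "untapped"    # seat paid for, barely used
    if ratio > 40:
        return "spinning"    # lots of fires, almost nothing landing
    if skill_fires >= 300:
        return "excellent"
    if skill_fires >= 120:
        return "strong"
    return "developing"
```

Note the order: spinning is checked before excellent, so a high-fire engineer stuck in a loop is flagged rather than celebrated.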

Org policy engine

Write a policy once, enforce it everywhere. "No /ship on prod database migrations without 2 reviewers" — binds whether the engineer fires from laptop or K8s agent.

  • YAML or visual policy editor
  • Pre-flight checks at skill invocation
  • Versioned, code-reviewable
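A pre-flight check is a pure function from (skill, invocation context) to a list of blocking policies, which is what makes it enforceable identically on a laptop and in a K8s Job. A sketch of the "2 reviewers on prod migrations" rule; the rule shape and context fields are illustrative, and real policies would compile from the YAML editor rather than live in Python:

```python
# One compiled policy rule: name, the skill it binds to, a predicate
# for when it applies, and the condition it requires.
POLICIES = [
    {
        "name": "prod-migration-needs-2-reviewers",
        "skill": "/ship",
        "when": lambda ctx: any("migrations/" in f for f in ctx["files"]),
        "require": lambda ctx: ctx["approvals"] >= 2,
    },
]

def preflight(skill, ctx):
    """Run before any skill fires, laptop or K8s agent alike.
    Returns the names of policies that block this invocation."""
    return [p["name"] for p in POLICIES
            if p["skill"] == skill and p["when"](ctx) and not p["require"](ctx)]
```

An empty list means the skill fires; a non-empty list is both the block and the explanation shown to the engineer or agent.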

Journey explorer

Every workflow visualized as a traceable journey. Stale PR? Click it, see every retry, every skill that fired, every reviewer comment, every backoff, every human handoff.

  • Real-time + historical views
  • SLO per step, alert on regression
  • Replay any journey end-to-end

Day / night cadence

Work doesn't stop when engineers log off. Lifecycle keeps agents running on cron + webhook triggers, escalates to human-in-loop only when policy requires, presents results as a morning digest.

  • Configurable escalation routes
  • Roam / Slack morning digest
  • Rate-limit-aware OAuth rotation

Per-engineer utilization + per-agent economics

The metering that actually exists on AI-first teams using Max plans. Engineer side: $ per engineer is flat ($200/mo seat), so the live metric is utilization — skill fires, skill-to-PR ratio, whether the seat is being leveraged or wasted. Agent side: K8s autonomous agents pay per-token via the OAuth pool, so unit economics are real and measurable. Both tables below are illustrative — real numbers populate the day the ledger ships (target: 7 days).

Engineer utilization · last 30 days

| Engineer | Skill fires | PRs merged | Skill→PR | Top skill | Utilization |
| --- | --- | --- | --- | --- | --- |
| Thomas C. | 312 | 24 | 13:1 | /start | excellent |
| Shaharyar K. | 198 | 19 | 10:1 | /investigate-bug | strong |
| Engineer C | 254 | 16 | 16:1 | /analyse-system | strong |
| Engineer D | 387 | 28 | 14:1 | /ship | excellent |
| Engineer E | 84 | 11 | 8:1 | /commit | developing |
| Engineer F | 412 | 7 | 59:1 | /investigate-bug | spinning |
| Engineer G | 142 | 14 | 10:1 | /start | strong |
| Engineer H | 12 | 6 | 2:1 | (mostly manual) | untapped |
| Team median | 170 | 15 | 11:1 | | |

K8s autonomous-agent unit economics · last 30 days · OAuth pool

| Agent | Runs | Token $ | PRs opened | Merged (30d survival) | $ / merged PR |
| --- | --- | --- | --- | --- | --- |
| pr_fixer | 142 | $580 | 89 | 76 | $7.63 |
| infra | 24 | $390 | 22 | 22 | $17.73 |
| g8_5xx_fixer | 38 | $1,240 | 38 | 31 | $40.00 (high cost) |
| g8_frontend | 19 | $310 | 18 | 16 | $19.38 |
| monitor | 60 | $180 | opens issues, no PRs | n/a | n/a |
| improver | 30 | $290 | writes to Roam, no PRs | n/a | n/a |
| pr_cleanup · rage_click_detector | 0 | $0 | defined, not dispatched | n/a | n/a |
Spinning · check in today

Engineer F fired 412 skills for 7 merged PRs (59:1 vs team median 11:1). Most likely looping in /investigate-bug on a hard problem. A pair-session today beats another week of solo grinding.

Untapped seat · onboard this week

Engineer H fired 12 skills total this month — the seat is paid for and barely used. Either skeptical of the tools or never onboarded properly. AI-first orgs can't afford a 1-in-8 "mostly manual" engineer.

High-cost agent · audit this week

g8_5xx_fixer costs $40 per merged PR — roughly 5× pr_fixer's $7.63 and more than double the fleet median. Either the fixer prompt is bloated, or it's burning tokens on impossible bugs. Either way, audit the last 10 runs and trim.

Dead-code agents · remove today

pr_cleanup and rage_click_detector are defined with prompts but never dispatched. They're noise in the catalog. Either wire them this week or delete them today.


graph8 today (last 30 days · real numbers)

  • ~ 565 merged PRs / 30d across top 4 repos (g8 · infra · agent-os · customer-hub)
  • Top engineer (Usjid) ships 37 PRs/30d hand-writing code · median engineer 6–10
  • Team sees ~30% of agent activity (K8s slice only)
  • 0% visibility into per-engineer Max-plan utilization
  • Skills live in 4 places, 22 of them, no usage data
  • "Did the agent run?" gets asked in Slack daily
  • Each engineer context-switches across 2–3 product boards
  • Cycle-time: feature → prod = 1–4 weeks · bug → fix = 4–48 hrs
  • Bugs caught reactively in prod via Sentry

graph8 after Lifecycle (7 days later · projected)

  • 3,400–6,500 merged PRs / 30d org-wide · 6–12×
  • Each engineer dispatches 10–15 parallel agents · reviews PRs as they land
  • 100% of agent activity in one ledger · per-engineer utilization grade daily
  • 14 sharp skills with fire-rate + abandon-rate
  • One Slack post replaces 5 hours of daily standups per person
  • Dashboard answers "did the agent run?" in < 1 sec
  • Cycle-time: feature → prod = 1–5 days · bug → fix = 10 min – 2 hrs
  • Bugs predicted at PR time by bug_predictor · 5xx auto-fixed in < 2 hr 85% of the time
  • New constraint: PRD backlog depth, not engineering capacity

Why now for graph8 — at 18 people, this is the window.

graph8 is past the point where one eng lead holds it all in their head — too many products, too many surfaces, too many agents. But graph8 is not yet at the size where a platform team funds itself. The 7-day setup gives the existing 18 people the leverage of a 70-person team — without the hiring, the comms overhead, or the platform-org tax. The window is open right now; move through it before the next product launch.

18 people · 13 product boards · 3 repos

Three first-party repos hold everything: g8 monorepo (all 13 product boards) · agent-os (company operations) · infra (autonomous K8s engineering). Plus jitsu as an open-source dependency to fold back into g8. Each engineer context-switches across 2–3 product boards inside g8; the dashboard collapses that into one queue.

25 agents already running

The axon platform is here, the OAuth pool is here, MBM is here. Lifecycle isn't a from-scratch build — it's wiring + visibility on what graph8 already has. The 7 days is mostly plumbing.

5–10× output, same team. Possibly 15×.

Real today: ~565 PRs/30d org-wide. After: 3,400–6,500 PRs/30d. graph8 ships like a 90-person team at 5×, a 180-person team at 10×, a 270-person team at the ambitious 15×. Full velocity math →
