graph8 is 15 engineers + 3 QA shipping 13 product boards across
3 first-party repos — g8 monorepo (all 13 product boards: Studio · Enrichment · Mashup · Signals · Copilot · Agents · Inbox · Stripe/Credits · Web Chat · Dialer · Sequencer · UX/UI · Voice AI), agent-os (company operations), infra (autonomous K8s engineering) — plus jitsu as an open-source dependency we'll fold into g8.
Today's real throughput across the top 4 repos: ~565 merged PRs / 30 days. Top performer
(Usjid): 37 PRs / 30d, hand-writing the code. After Lifecycle — with engineers banned from
touching editors and the 10 principles in force — each becomes
an assembly-line conductor running 10–20 parallel agents at any moment, plus 5+
overnight long-runs. Same 18 people. 10–30× throughput. Possibly higher once the PRD backlog
catches up. Ryan Lapo's 3-person team at OpenAI Frontier shipped 1M lines of code / 1,500 PRs
/ 9 months with zero hand-typed code — we have 5× their headcount across 13 products. The math is
conservative.
Every person on the graph8 engineering floor, with their actual last-90-days PR count from graph8-com/g8 and the projected after-Lifecycle range (8–15× their current output). Same 18 humans · new job descriptions · same paychecks · 10–30× the team output.
Not a single PR in the g8 monorepo has sat untouched for > 7 days in the last 90. The team is healthy at today's throughput. Lifecycle's job is to keep this true as throughput goes 10–30× — automatic merges, agent-driven PR fixes, postmerge review, and the pr_reconciler hourly cron all exist to prevent regression. Healthy stays healthy.
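The pr_reconciler check referenced above can be sketched in a few lines. A minimal sketch only: the real cron pulls open PRs from the GitHub API hourly, and the data shape here is an assumption for illustration.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=7)  # the "untouched > 7 days" health threshold

def find_stale_prs(prs, now):
    """Return PRs with no activity for longer than STALE_AFTER.

    `prs` is a list of dicts like {"number": 101, "last_activity": datetime}.
    A real reconciler would fetch open PRs each hour and dispatch an agent
    to revive, fix, or close each stale one before it crosses the line.
    """
    return [pr for pr in prs if now - pr["last_activity"] > STALE_AFTER]

# Example: one fresh PR, one that has sat idle for 11 days.
now = datetime(2025, 1, 31)
prs = [
    {"number": 101, "last_activity": datetime(2025, 1, 30)},
    {"number": 102, "last_activity": datetime(2025, 1, 20)},
]
stale = find_stale_prs(prs, now)  # only PR 102 is flagged
```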
- test-writer agent + coverage policy · already shipping (6 PRs/30d)
- test-writer agent on every PR
- contract_test_runner + cross-repo regression schemas
- bug_predictor false-positive tuning + smoke-test SLO

10.7K-LOC AI Inbox visual polish + dashboard — largest single contribution in last 2 weeks. Shipped in one PR. Lifecycle would split this into 4–6 parallel agent runs and merge each as its own gated PR.
4 sequential CI/workflow PRs in 24h — RBAC fail-open, MBM review automation, sync token guard, lowercase normalizer. Lifecycle would dispatch all four in parallel + a 5th to write the regression tests.
103K-LOC qa→main merge — the kind of cross-domain integration that takes weeks of human attention. Under Lifecycle, qa→main auto-promotes after 48h soak. Single human button-click becomes zero clicks.
300 merged PRs / week sustained for two consecutive weeks. Cadence is stable, not accelerating. Lifecycle is what gets us from 300/week to 800–2,400/week without adding bodies.
Each product is a separate board in MBM with its own owner, its own backlog, its own dispatch queue. Today the boards exist but the engineering throughput is the bottleneck. After Lifecycle, each board has 5–10 agents running against it per day — and the constraint shifts to how fast does each owner write PRDs, not how fast does engineering code.
| Product board | Owner | PRs / 30d today | After Lifecycle | The new constraint |
|---|---|---|---|---|
| Studio | Usjid | 37 | 150–300 | PRD throughput · feature backlog |
| Inbox | Hassan + Waleed + Ibrahim | 40 (combined) | 180–360 | UX decisions · webchat/SMS provider integrations |
| Enrichment | Sadiq | 20 | 100–200 | Data vendor onboarding · waterfall logic |
| Web Chat | Waleed + Muhammad I. | 27 (combined) | 130–260 | Customer playbook · integrations |
| Copilot | Hamza + Huzaifa | 19 (combined) | 100–200 | Prompt iteration · eval harness |
| Agents | Huzaifa | 10 | 80–150 | Agent prompt iteration · per-agent eval |
| Signals | Oleksii + Sadiq | 6 | 60–120 | Mailbox provider onboarding · deliverability |
| Voice AI | Eeshan | N/A (separate repo) | 80–150 | Customer-specific call flows · sales coach prompts |
| Dialer | Abdullah | 6 | 60–120 | Telephony providers · ghost-numbers · Twilio rules |
| Stripe / Credits | Ibrahim + Hassan | 6 (in Inbox) | 40–80 | Pricing experiments · billing edge cases |
| Engage / Sequencer | Musa | 3 | 40–80 | Sequence templates · timezone logic |
| Mashup | Abdullah + Sadiq | part of Dialer + Enrichment | 40–80 | Cross-feature data flow |
| UX / UI | Joaquin | 2 | 30–60 | Design system iterations · component library |
| All 13 boards · total | 15 eng | ~ 200 | 1,100–2,150 | PRD backlog depth |
Note: "today" numbers are last 30 days from the g8 monorepo (highest-volume repo · houses all 13 product boards). Real org-wide today across our 3 first-party repos (g8 + agent-os + infra): ~565 PRs/30d. After-Lifecycle projection: ~5,600–17,000 PRs/30d at 10–30× across the same 3 repos. At that throughput, the per-product PRD backlog becomes the rate-limiting input — not engineering.
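The note's projection is simple multiplication; a quick check using the document's own 10–30× band:

```python
today_org = 565                 # merged PRs / 30d across the first-party repos
low_x, high_x = 10, 30          # after-Lifecycle multiplier range from the text
projection = (today_org * low_x, today_org * high_x)
# (5650, 16950) — the ~5,600–17,000 PRs/30d band quoted in the note
```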
An engineer's day at graph8 (after Lifecycle) revolves around the agent fleet. Mornings open with a digest of what happened while you slept, then dispatch. Afternoons review. Evenings dispatch the heavy stuff. Nights run autonomously.
Open the Lifecycle dashboard. Read the overnight digest from triage + task_planner + agent_health. Scan your inbox for PRs from overnight agent runs. Decide what to dispatch next.
While the morning agents run, you do the work that needs your judgement: architecture decisions, complex bug investigations, customer conversations, PRD drafting, design reviews on subtle UX.
The morning's 3+ agent runs have opened PRs. MBM has reviewed them. Walk the queue: merge the clean ones, kick the iffy ones back with a comment, sanity-check the risky ones yourself.
Same pattern. Afternoon's PRs land; you review and merge. By 15:00 you've shipped 4–6 PRs across the day without writing a line of code yourself. Spend the remaining hour on improvement work: read this morning's improver digest, pick one suggestion, dispatch it.
Before you log off, kick off the long-running agents. These are the tasks that take hours: bulk refactors, comprehensive test-writing on a module, cross-repo contract validation, knowledge-compactor analysis. They run while you sleep.
This is the asymmetric leverage. Twelve hours of autonomous agent work per night, every night. By morning, your overnight dispatches have opened PRs · the improver agent has logged suggestions · the bug_predictor has flagged risks on the current sprint · agent_health has scored the fleet · mbm_critic has classified yesterday's reviews.
The team arrives. Triage + task_planner + agent_health digests are waiting. Overnight PRs are in your inbox. Yesterday's improvements got picked up by the overnight runs. You start the morning with momentum, not from zero.
The math, grounded in real graph8 data from the last 30 days. Top performer Usjid ships 37 PRs/30d hand-writing code (~1.2/day). With assembly-line dispatch, the same engineer can have 10–15 parallel agents in flight at any moment. The merge rate drops (autonomous work isn't 100% mergeable), but the volume more than compensates. The same 15 engineers, doing 5–10× the work — possibly 15× once the PRD backlog is deep enough to keep all 15 conveyors loaded.
| Metric | Today · real data (last 30d) | After Lifecycle · assembly line | Lift |
|---|---|---|---|
| Top engineer · PRs / day | Usjid: ~ 1.2 (37/30d, g8 only) | 10–15 dispatched + reviewed | 8–12× |
| Median engineer · PRs / day | ~ 0.3–0.5 | 5–10 dispatched + reviewed | 10–25× |
| PRs / team / day (15 eng) · g8 only | ~ 7 (200 PRs/30d) | 75–150 | 10–20× |
| PRs / team / day · org-wide (g8 + agent-os + infra) | ~ 19 (565 PRs/30d) | 150–300 | 8–15× |
| Overnight PRs / night | ~ 0 (humans sleep) | 45–90 (3–6 per engineer overnight) | ∞ |
| Total org PRs / month | ~ 565 | 3,400–6,500 | 6–12× |
| Cycle time · idea → prod | 1–4 weeks (median ~2 wk) | 4 hrs – 2 days (median ~1 day) | 7–15× |
| Cycle time · bug → fix in prod | 4–48 hours (Sentry-then-human) | 10 min – 2 hr (agent autofix) | 12–50× |
| Bug detection | Reactive (Sentry hits prod) | Predictive (bug_predictor at PR time) | qualitative |
| QA throughput · stories / day | ~ 15 (3 QA × manual) | 100–200 (agent-orchestrated) | 7–13× |
| "Did the agent run?" answer time | Slack question, 5–30 min | SQL query, < 1 sec | 100×+ |
| Status meetings / week / person | ~ 5 hrs | ~ 1 hr (dashboard replaces standups) | 5× |
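The "Did the agent run?" row assumes the trace ledger is queryable. A minimal sketch, using SQLite in place of the Postgres ledger; the agent_runs table name and columns are assumptions for illustration.

```python
import sqlite3

# Stand-in for the trace-ledger Postgres table; schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE agent_runs (agent TEXT, task_id TEXT, started_at TEXT, status TEXT)"
)
db.execute(
    "INSERT INTO agent_runs VALUES ('test_writer', 'T-412', '2025-01-30T22:14Z', 'merged')"
)

# "Did the agent run?" stops being a 5-30 min Slack thread and becomes
# a parameterized lookup that returns in well under a second.
row = db.execute(
    "SELECT started_at, status FROM agent_runs WHERE agent = ? AND task_id = ?",
    ("test_writer", "T-412"),
).fetchone()
```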
Once engineering capacity stops being the constraint, the rate limit becomes how fast we feed the assembly line. At 5–10× throughput, the bottleneck moves to four upstream questions:
(1) Are there enough approved PRDs per product board to keep the dispatch queues loaded?
(2) Is architecture coherent enough to absorb 200+ PRs/day without drift? (knowledge_compactor + contract_test_runner handle this.)
(3) Can MBM + the 3 QA review at this rate? (Bot review + agent orchestration scale.)
(4) Is the customer feedback loop fast enough to know what to build next?
Solve those four and the ceiling becomes 15×+. Lifecycle ships the engineering plumbing; the PRD-throughput half of the equation is the next horizon.
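Question (1) can be made roughly quantitative. A back-of-envelope sketch: the run totals are the document's own dispatch-cadence numbers, while the one-task-per-run consumption rate is my assumption.

```python
# How fast must each board's PRD backlog refill to keep the conveyors loaded?
# Assumption: one dispatched agent run consumes roughly one PRD-derived task.
runs_per_day = (210, 405)   # org-wide daily agent runs (dispatch-cadence totals)
boards = 13
tasks_per_board = tuple(r // boards for r in runs_per_day)
# roughly 16-31 approved tasks per board per day
```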
The QA function gets the largest relative leverage from Lifecycle. Today's manual-test loop becomes an agent-orchestration loop. Same 3 people, ~5× the surface area covered.
- test-writer agent · sets coverage policy · reviews generated tests
- contract_test_runner · maintains cross-repo schemas · resolves contract breaks
- bug_predictor · tunes false-positive rate · runs the smoke-suite SLO
- bug_predictor · sentry-detector loop

When agents do most of the writing and reviews are batched, the human-coordination rhythm has to adapt. Fewer status meetings. More dashboard-reading. New rituals around fleet health.
- mbm_critic weekly digest + agent_health survival-rate trends. Discussion is "which agent prompts need tuning · which loops to close next."
- triage cron. Weekly meeting becomes 10 min on escalations only.
- knowledge_compactor surfaces the patterns; review the diffs to CLAUDE.md and engineer-domains.json before approving.
- gc-friday: knowledge_compactor + mbm_critic hand you the Monday-morning candidate list.

Each engineer becomes a conductor for a small fleet. Morning batch + afternoon batch + 1–3 overnight runs. Across 15 engineers, that's 75–120 agent runs per day. Plus the org-level cron-driven agents (monitor, triage, agent_health, etc.).
| Window | Parallel agents / engineer | Total runs / team (15 eng) | What they do |
|---|---|---|---|
| Morning batch (08–12) | 5–10 in flight | 75–150 | Product-board PRD tasks · bug fixes · small refactors · most-touched code |
| Afternoon batch (13–17) | 5–10 in flight | 75–150 | Broader-scope work · improvement-loop dispatches · test-writer follow-ups |
| Overnight long-runs (18–07) | 3–6 per engineer | 45–90 | Big refactors · comprehensive test suites · cross-repo contract validation · knowledge_compactor analysis |
| Org-level crons | — | ~ 15 / day | monitor · triage · task_planner · improver · agent_health · mbm_critic · pipeline-analyzer · pr_reconciler · pr_hygiene · knowledge_compactor · skill_mortality |
| Total agent runs / day | — | 210–405 | vs ~ 20 K8s-job runs/day today. 10–20× agent dispatch volume. |
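The total row follows directly from the batch rows; a quick re-derivation:

```python
# Re-derive "Total agent runs / day" from the table's batch rows.
morning = (75, 150)
afternoon = (75, 150)
overnight = (45, 90)
crons = (15, 15)            # ~15 org-level cron runs, treated as fixed
total = tuple(map(sum, zip(morning, afternoon, overnight, crons)))
# (210, 405), matching the table's 210-405 band
```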
Real assembly-line math. A single engineer can supervise 10–15 parallel agents at any moment — the agents work asynchronously, the engineer reviews and re-dispatches. With 15 engineers each running this pattern, plus overnight dispatches that work for 12 hours autonomously, the org runs 300–400 agent runs/day. At 50–60% merge survival, that's 150–240 merged PRs/day vs today's ~19. The math holds as long as the PRD backlog is deep enough to keep the conveyors loaded — see the bottleneck box above.
The bottom-line projection. Same 18-person team, same products. What changes is the throughput.
| Output | Today (per quarter, real) | After Lifecycle (per quarter) | Lift |
|---|---|---|---|
| Total merged PRs · org-wide | ~ 1,700 (565/mo × 3) | 10,000–20,000 | 6–12× |
| Features shipped | ~ 20–30 | 150–300 | 7–10× |
| Bug fixes shipped | ~ 200 | 1,500–2,500 | 7–12× |
| 5xx errors resolved < 2 hr | ~ 10% | ~ 85% | 8× |
| PRD → first release time | 3–6 weeks | 1–5 days | 8–15× |
| Architecture docs maintained | stale within 60d | auto-updated nightly (knowledge_compactor) | qualitative |
| Test coverage growth | ~ 0.5% / quarter | ~ 5–10% / quarter (test-writer on every PR) | 10–20× |
| Cross-repo regressions caught at PR time | ~ 0% | ~ 85% (contract_test_runner) | qualitative |
| New products / quarter shipped to GA | ~ 1–2 | 5–10 | 5× |
At 5×, graph8 ships like a 75-person team while staying at 18. At 10×, like a 150-person team. At the ambitious 15× (achievable with full backlog depth + overnight cadence), like a 270-person team. A competitor hiring against this can't catch up by adding bodies — only by adopting the same operating model, which takes them 12–18 months to even decide to do, let alone execute. graph8's window to make this transition before the rest of the market is right now. The 7-day setup is all it takes to step through it.
And the cost of being wrong is bounded. Even at the conservative 5× read, the math justifies the 7-day investment many times over: same 18 people, same payroll, the only new line item is the trace-ledger Postgres table + a Cloudflare Worker. There is no scenario where this doesn't pay back inside a quarter.