You install both Claude Code and Codex. The tools' harnesses are post-trained alongside their
respective models — using only one means giving up half the leverage. Use Claude Code (Opus 4.7) for deep
reasoning, complex refactors, and long sessions. Use Codex (GPT 5.4 + 1M context) for fast iteration on focused changes and for huge codebases. Pick desktop OR CLI for each based on your workflow — the
lifecycle-trace-mcp wrapper supports all 4 paths. Every skill fire lands in the org ledger; you see
your own utilization grade in the dashboard within minutes. For heavy work, dispatch to the K8s axon fleet via
mbm.spawn_agent — local UI, cloud execution, results stream back into your session.
30-minute setup. One-time.
Required: Claude Code AND Codex. The harnesses are post-trained alongside the models — using only one means giving up half the leverage. Pick desktop or CLI per tool based on your workflow (most engineers run desktop for one + CLI for the other). The same MCP wrapper plugs into all 4 paths.
Native macOS / Windows app. Best for long agentic sessions, visual diff review, integrated browser preview, and the floating Skill tool palette. Configure once in ~/.claude/mcp.json:
```jsonc
// ~/.claude/mcp.json
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "claude-code-desktop"
      }
    }
  }
}
```
Restart Claude Code. Trace coverage: 100% (tool calls + skill fires via SessionStart/Stop hooks in settings.json).
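For intuition, this is how a hook event maps to one ledger row. A minimal sketch, assuming Claude Code's hook input shape (stdin JSON carrying hook_event_name) and the skill_invocations fields shown later in this doc; it is not the wrapper's actual source:

```typescript
// Illustrative only: map one SessionStart/Stop hook event to one ledger row.
type HookEvent = { hook_event_name: string; session_id: string };

type TraceRow = {
  ts: string;
  engineer_id: string | undefined; // from GITHUB_USER, hence the env var
  skill_name: string;
  runtime: string;
  status: "success" | "abandoned";
};

function rowFromHook(event: HookEvent): TraceRow {
  return {
    ts: new Date().toISOString(),
    engineer_id: process.env.GITHUB_USER,
    skill_name: event.hook_event_name, // "SessionStart" | "Stop" | ...
    runtime: process.env.RUNTIME ?? "claude-code-desktop",
    status: "success",
  };
}
```

The row then goes out over the fire-and-forget emitter described in the trace section below.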
Install: `npm i -g @anthropic-ai/claude-code`. Terminal-native. Best for engineers who live in iTerm/tmux, want one-shot scripted runs (`claude -p "..."`), or pipe Claude into Unix tools. Same MCP config, same env vars — just a different runtime label:
```jsonc
// ~/.claude/mcp.json (CLI reads the same file)
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "claude-code-cli"
      }
    }
  }
}
```
Run `claude` once in any repo to confirm it picks up the wrapper. Trace coverage: 100% (the CLI fires the same hook surface as desktop).
The Codex panel inside the ChatGPT desktop app. Best for fast iteration on a single file, side-by-side with a chat, or when you want OpenAI's harness wired to your IDE via the local agent. Add the MCP wrapper to ~/.codex/mcp.json:
```jsonc
// ~/.codex/mcp.json
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "codex-desktop"
      }
    }
  }
}
```
Restart the ChatGPT app. Trace coverage: ~80% via MCP (slash commands inside the ChatGPT UI aren't tools and don't fire hooks — supplemented by Codex's own activity log).
Install: `npm i -g @openai/codex`, then `codex login`. Terminal-native. The harness OpenAI Frontier uses internally (see Ryan Lapo). Best for scripted runs, headless agents, long-context refactors, and dispatch from mbm.spawn_agent. Authenticates via OAuth — the auth blob lives in ~/.codex/auth.json (also what we pool for the Modal runtime).
```jsonc
// ~/.codex/mcp.json (CLI reads the same file as desktop)
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "codex-cli"
      }
    }
  }
}
```
Run `codex` in any repo. Trace coverage: 100% (the CLI surfaces every tool call as a hook). Bonus: your OAuth blob is what the cloud Codex runtime pool consumes — if you'd like to donate a slot, follow the `infisical secrets set --path /oauth-pool/openai-codex/...` steps in CLAUDE.md.
Both graph8-com/infra and graph8-com/g8 already register the MCP wrapper in .claude/mcp.json and .codex/mcp.json. When you clone either repo, all 4 install paths pick it up automatically — no per-repo edits needed. Only the global ~/.claude/mcp.json and ~/.codex/mcp.json need editing once on each laptop, then both desktop and CLI inherit it.
```sql
-- After firing one tool call in each, you should see 4 distinct runtime rows:
SELECT runtime, COUNT(*)
FROM skill_invocations
WHERE engineer_id = '<your-handle>'
  AND ts > NOW() - INTERVAL '1 hour'
GROUP BY runtime;
-- claude-code-desktop | 1
-- claude-code-cli     | 1
-- codex-desktop       | 1
-- codex-cli           | 1
```
The harness is post-trained alongside the model. Claude's harness rewards multi-step plans and self-correction. Codex's harness rewards parallel exploration of huge codebases and surgical edits. Both teams sat with engineers using their tool for a year and shaped the post-training around what worked. Use the right one for the task and you ship 2–3× faster than if you pick your favorite and force the task to fit.
| Task shape | Claude Code | Codex | Why |
|---|---|---|---|
| Multi-file refactor with risk | ✓ pick this | — | Claude's harness plans, checks, reverses. Lower regression rate on cross-cutting changes. |
| Bulk migration across 50+ files | — | ✓ pick this | Codex's 1M context + parallel apply. Burns through repetitive edits faster. |
| Greenfield feature, fuzzy spec | ✓ pick this | — | Claude asks better clarifying questions, drafts a plan you can edit before code lands. |
| Targeted bug fix in known file | also fine | ✓ pick this | Codex is faster end-to-end when scope is bounded — less ceremony before the edit. |
| Reading a huge unfamiliar codebase | also fine | ✓ pick this | Drop the whole repo in context, ask Codex to map it. Claude tends to skim too quickly. |
| Test-driven session (write test → fail → fix) | ✓ pick this | — | Claude's loop discipline (run test, read output, iterate) is the strongest in the space. |
| Dispatching cloud agents in parallel | ✓ pick this | — | mbm.spawn_agent + Claude's Skill tool composes cleanly. Codex parallelism is in beta. |
| PR review on a 5K-LOC diff | also fine | ✓ pick this | 1M context comfortably eats the diff + base files. Faster summary, less truncation. |
| Writing a PRD or design doc | ✓ pick this | — | Claude's prose quality + the /write-prd skill chain. Codex is terser but less structured. |
| SQL / data exploration in OpenSearch / PG | also fine | ✓ pick this | Codex's faster cycle on iterative query refinement. Use Claude only if you need narrative output. |
| Long-running agentic loop (overnight) | ✓ pick this | — | Claude's hibernation + replay semantics + Modal runtime support are first-class. Use Codex CLI for parallel branches. |
Ryan Lapo's point (OpenAI Frontier): the harness matters as much as the model. Codex CLI's parallel tool execution + 1M context + iterative apply-patch loop isn't just a wrapper — it's what their RL post-training optimized for. Same with Claude Code's plan-and-revise discipline. Pick the harness whose post-training matches your task, not the model name. That's why we require both installed.
One row per invocation in skill_invocations. Visible in Grafana within minutes. Privacy by design — file paths and skill names are captured, prompt content is not.
```
-- Example row from skill_invocations after you run /start
{
  ts:             "2026-05-17T14:23:01Z",
  engineer_id:    "hassanbaigy",
  skill_name:     "/start",
  repo:           "graph8-com/g8",
  runtime:        "claude-code-desktop",
  model:          "claude-opus-4-7",
  latency_ms:     1240,
  status:         "success",
  input_tokens:   null,             -- max plan = no metering
  output_tokens:  null,
  cost_usd_cents: null,
  arg_hash:       "sha256:8f3a..."  -- one-way hash of the args
}
```
Captured: skill / tool name · engineer GitHub handle · repo · model · latency · success or abandoned · anonymized argument hash · timestamp.
Never captured: prompt content · code you typed · file contents · API responses · personal access tokens · anything in your ~/.ssh.
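The argument hash is what makes "anonymized" concrete: the ledger can tell repeat invocations apart without storing what you typed. A sketch of how such a one-way hash could be computed; the exact canonicalization the wrapper uses is an assumption here:

```typescript
import { createHash } from "node:crypto";

// One-way hash of skill arguments: comparable across invocations,
// unrecoverable from the ledger.
function argHash(args: unknown): string {
  // Sketch: the real wrapper may canonicalize key order before hashing.
  const canonical = JSON.stringify(args);
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}

// argHash({ file: "src/client.ts" }) -> "sha256:" + 64 hex chars
```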
POST to https://mbm.graph8.com/v1/trace · stored in skill_invocations Postgres table · visible in Grafana dashboard grafana.graph8.com/d/lifecycle-engineer
Within 60 seconds of the skill fire. Dashboard auto-refreshes every minute.
Fire-and-forget · the trace is dropped silently · never blocks your skill. The hook has a 200ms timeout.
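In code terms the contract is roughly this (a behavioral sketch, not the shipped hook; the endpoint comes from the pipeline lines above):

```typescript
// Fire-and-forget trace emit: 200ms budget, silent drop, never throws.
async function emitTrace(row: object): Promise<void> {
  try {
    await fetch(`${process.env.MBM_URL}/v1/trace`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(row),
      signal: AbortSignal.timeout(200), // the hook's 200ms budget
    });
  } catch {
    // MBM down, DNS flake, or timeout: drop the trace, never block the skill.
  }
}
```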
Your own utilization grade · skill mix · daily fire count · top-fired skills · skill-to-PR ratio. All at your row on the team wall.
mbm.spawn_agent: when you want the cluster to do the work. For heavy work — bulk refactors · overnight runs · jobs that need private cluster context (Postgres · OpenSearch · internal APIs) — you can dispatch from your local session to the K8s axon fleet. The agent runs on the cluster using OAuth-pool tokens; results stream back into your terminal via SSE.
You, in Claude Code desktop, locally:

> Use the heavy-lift agent to refactor all integrations_v4 modules to use the new client base.

Claude calls the MCP tool mbm.spawn_agent:

```
POST /v1/spawn-agent
{
  "skill": "refactor",
  "repo": "graph8-com/g8",
  "context": "integrations_v4 · client base migration",
  "budget": "high",
  "callback": "sse://<your-session-id>"
}
```

MBM creates a Task CR in the axon-system K8s namespace:
→ spawns a g8 agent pod
→ uses an OAuth pool token (not your personal Max plan)
→ has access to Postgres, OpenSearch, internal APIs

Output streams back to your local session via SSE:

```
[pod started] Reading integrations_v4/* ...
[pod working] Identified 23 files to refactor
[pod working] Opened PR #7141: "refactor(integrations_v4): adopt client base"
[pod done]    Total runtime: 8m 42s · cost: $4.18
```

You see the work happen in your session — but it ran on the cluster.
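For intuition, the round trip looks roughly like this from the client side. An illustrative sketch, not the MCP tool's source: it assumes the POST body above and that the spawn response itself streams text/event-stream (in the real flow the callback session receives the stream), and it splits lines naively where a real client would buffer partial chunks:

```typescript
// Sketch: dispatch a heavy task to MBM and print pod output as it streams.
async function spawnAndStream(sessionId: string): Promise<void> {
  const res = await fetch("https://mbm.graph8.com/v1/spawn-agent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "refactor",
      repo: "graph8-com/g8",
      context: "integrations_v4 · client base migration",
      budget: "high",
      callback: `sse://${sessionId}`,
    }),
  });

  // Each SSE "data:" line is one [pod ...] progress event.
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const line of value.split("\n")) {
      if (line.startsWith("data:")) console.log(line.slice(5).trim());
    }
  }
}
```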
Same Claude Code or Codex you use today · plus the trace MCP wrapper running silently · plus the option to dispatch to the cluster when you want. Three rituals change.
First tab of the day: grafana.graph8.com/d/lifecycle-engineer. Check your row · skill fires from yesterday · overnight dispatches that landed PRs · utilization grade. Spend 30 seconds reviewing, then dispatch the morning queue.
Verify: your engineer_id appears with at least 1 fire from yesterday + any overnight dispatches.

Use Claude Code or Codex normally. /start · /investigate-bug · /ship · tool calls · web searches — all captured. No friction. Your dashboard row updates within 60 seconds of each fire. If you cross 10 skill fires by midday, you're operating at the principle-1 cadence.
mbm.spawn_agent before you log off. Pick 1–3 heavy tasks and dispatch them to the cluster. They run on OAuth-pool tokens (not your Max plan), use cluster context, and post PRs by morning. The next morning's dashboard shows what landed.
Verify: `kubectl get tasks -n axon-system` · PRs in your inbox by 7 AM.

Most issues fall into 4 buckets. If none of these match, post in #engineering.
1. Did you restart Claude Code / Codex after editing the MCP config?
2. Is the GITHUB_USER env var your actual GitHub handle (not your name)?
3. Run `curl -s $MBM_URL/v1/health` — it should return `{"ok":true}`.
Rows show up as unknown: you missed the GITHUB_USER env var. Fix it in your ~/.claude/mcp.json and restart. Old unknown rows can be deleted by Thomas or Shaharyar on request.
Codex slash commands don't appear: known limitation — Codex's tool interception doesn't include slash command names. There's an open issue tracking native Codex hook support. For now, rely on Claude Code as your primary and use Codex for shorter sessions.
mbm.spawn_agent says "rate limit": the OAuth pool hit a rate ceiling. Wait 5 min and retry, or fall back to local execution for now. Pool rate-limit handling lives in codex_pool at services/mbm/internal/pool/.
Traces are slow: the MBM URL might be wrong (it defaults to graph8.com prod, which is correct for everyone). If the wrapper is reaching out to staging, you'll see 500ms+ round trips vs the normal <50ms.
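A quick probe to tell the two cases apart (hypothetical helper; /v1/health is the endpoint from the checklist above):

```typescript
// Time one MBM round trip: prod answers /v1/health in <50ms;
// 500ms+ usually means the wrapper points at the wrong URL.
async function probeMbm(url = process.env.MBM_URL ?? "https://mbm.graph8.com") {
  const t0 = performance.now();
  const res = await fetch(`${url}/v1/health`);
  const ms = Math.round(performance.now() - t0);
  console.log(`${url} -> HTTP ${res.status} in ${ms}ms`, await res.json());
}
```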
Delete the lifecycle-trace entry from ~/.claude/mcp.json and ~/.codex/mcp.json. Restart the apps. Your skill fires stop appearing in the ledger immediately. (We'd ask you not to — universal trace is principle #1 — but technically nothing breaks.)
You get to see your own work patterns. Leadership sees aggregates. The cluster gets cheaper to run because we know which K8s agents are worth their token budget. Everyone wins because the data exists.
You can finally see your own skill mix, where you spend AI time, whether you're spinning (high fires, low PRs) or flying (high fires, high PRs). 1:1s become data conversations instead of vibes.
Heavy tasks dispatch to the cluster · your laptop stays fast · your Max plan stays under limit · overnight runs work for you while you sleep.
Standups become "look at the dashboard" instead of "what did you work on." Sprint planning becomes "which loops do we close" instead of "estimate these tickets."
OAuth-pool $/merged-PR per agent visible. Bad agents get retired. Good agents get amplified. We ship like a 270-person team without paying for one.