You install both Claude Code and Codex. The tools' harnesses are post-trained alongside their
respective models — using only one means giving up half the leverage. Use Claude Code (Opus 4.7) for deep
reasoning, complex refactors, and long sessions. Use Codex (GPT 5.4 + 1M context) for fast iteration on focused changes and for huge codebases. Pick desktop OR CLI for each based on your workflow — the
lifecycle-trace-mcp wrapper supports all 4 paths. Every skill fire lands in the org ledger; you see
your own utilization grade in the dashboard within minutes. For heavy work, dispatch to the K8s axon fleet via
mbm.spawn_agent — local UI, cloud execution, results stream back into your session.
30-minute setup. One-time.
Required: Claude Code AND Codex. The harnesses are post-trained alongside the models — using only one means giving up half the leverage. Pick desktop or CLI per tool based on your workflow (most engineers run desktop for one + CLI for the other). The same MCP wrapper plugs into all 4 paths.
Native macOS / Windows app. Best for long agentic sessions, visual diff review, integrated browser preview, and the floating Skill tool palette. Configure once in ~/.claude/mcp.json:
```jsonc
// ~/.claude/mcp.json
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "claude-code-desktop"
      }
    }
  }
}
```
Restart Claude Code. Trace coverage: 100% (tool calls + skill fires via SessionStart/Stop hooks in settings.json).
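For intuition, this is how a hook event maps to one ledger row. A minimal sketch, assuming Claude Code's hook input shape (stdin JSON carrying hook_event_name) and the skill_invocations fields shown later in this doc; it is not the wrapper's actual source:

```typescript
// Illustrative only: map one SessionStart/Stop hook event to one ledger row.
type HookEvent = { hook_event_name: string; session_id: string };

type TraceRow = {
  ts: string;
  engineer_id: string | undefined; // from GITHUB_USER, hence the env var
  skill_name: string;
  runtime: string;
  status: "success" | "abandoned";
};

function rowFromHook(event: HookEvent): TraceRow {
  return {
    ts: new Date().toISOString(),
    engineer_id: process.env.GITHUB_USER,
    skill_name: event.hook_event_name, // "SessionStart" | "Stop" | ...
    runtime: process.env.RUNTIME ?? "claude-code-desktop",
    status: "success",
  };
}
```

The row then goes out over the fire-and-forget emitter described in the trace section below.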
Install: `npm i -g @anthropic-ai/claude-code`. Terminal-native. Best for engineers who live in iTerm/tmux, want one-shot scripted runs (`claude -p "..."`), or pipe Claude into Unix tools. Same MCP config, same env vars — just a different runtime label:
```jsonc
// ~/.claude/mcp.json (CLI reads the same file)
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "claude-code-cli"
      }
    }
  }
}
```
Run `claude` once in any repo to confirm it picks up the wrapper. Trace coverage: 100% (the CLI fires the same hook surface as desktop).
The Codex panel inside the ChatGPT desktop app. Best for fast iteration on a single file, side-by-side with a chat, or when you want OpenAI's harness wired to your IDE via the local agent. Add the MCP wrapper to ~/.codex/mcp.json:
```jsonc
// ~/.codex/mcp.json
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "codex-desktop"
      }
    }
  }
}
```
Restart the ChatGPT app. Trace coverage: ~80% via MCP (slash commands inside the ChatGPT UI aren't tools and don't fire hooks — supplemented by Codex's own activity log).
Install: `npm i -g @openai/codex`, then `codex login`. Terminal-native. The harness OpenAI Frontier uses internally (see Ryan Lapo). Best for scripted runs, headless agents, long-context refactors, and dispatch from mbm.spawn_agent. Authenticates via OAuth — the auth blob lives in ~/.codex/auth.json (also what we pool for the Modal runtime).
```jsonc
// ~/.codex/mcp.json (CLI reads the same file as desktop)
{
  "mcpServers": {
    "lifecycle-trace": {
      "command": "npx",
      "args": ["-y", "@graph8/lifecycle-trace-mcp"],
      "env": {
        "MBM_URL": "https://mbm.graph8.com",
        "GITHUB_USER": "<your-github-handle>",
        "RUNTIME": "codex-cli"
      }
    }
  }
}
```
Run `codex` in any repo. Trace coverage: 100% (the CLI surfaces every tool call as a hook). Bonus: your OAuth blob is what the cloud Codex runtime pool consumes — if you'd like to donate a slot, follow the `infisical secrets set --path /oauth-pool/openai-codex/...` steps in CLAUDE.md.
Both graph8-com/infra and graph8-com/g8 already register the MCP wrapper in .claude/mcp.json and .codex/mcp.json. When you clone either repo, all 4 install paths pick it up automatically — no per-repo edits needed. Only the global ~/.claude/mcp.json and ~/.codex/mcp.json need editing once on each laptop, then both desktop and CLI inherit it.
```sql
-- After firing one tool call in each, you should see 4 distinct runtime rows:
SELECT runtime, COUNT(*)
FROM skill_invocations
WHERE engineer_id = '<your-handle>'
  AND ts > NOW() - INTERVAL '1 hour'
GROUP BY runtime;
-- claude-code-desktop | 1
-- claude-code-cli     | 1
-- codex-desktop       | 1
-- codex-cli           | 1
```
The harness is post-trained alongside the model. Claude's harness rewards multi-step plans and self-correction. Codex's harness rewards parallel exploration of huge codebases and surgical edits. Both teams sat with engineers using their tool for a year and shaped the post-training around what worked. Use the right one for the task and you ship 2–3× faster than if you pick your favorite and force the task to fit.
| Task shape | Claude Code | Codex | Why |
|---|---|---|---|
| Multi-file refactor with risk | ✓ pick this | — | Claude's harness plans, checks, reverses. Lower regression rate on cross-cutting changes. |
| Bulk migration across 50+ files | — | ✓ pick this | Codex's 1M context + parallel apply. Burns through repetitive edits faster. |
| Greenfield feature, fuzzy spec | ✓ pick this | — | Claude asks better clarifying questions, drafts a plan you can edit before code lands. |
| Targeted bug fix in known file | also fine | ✓ pick this | Codex is faster end-to-end when scope is bounded — less ceremony before the edit. |
| Reading a huge unfamiliar codebase | also fine | ✓ pick this | Drop the whole repo in context, ask Codex to map it. Claude tends to skim too quickly. |
| Test-driven session (write test → fail → fix) | ✓ pick this | — | Claude's loop discipline (run test, read output, iterate) is the strongest in the space. |
| Dispatching cloud agents in parallel | ✓ pick this | — | mbm.spawn_agent + Claude's Skill tool composes cleanly. Codex parallelism is in beta. |
| PR review on a 5K-LOC diff | also fine | ✓ pick this | 1M context comfortably eats the diff + base files. Faster summary, less truncation. |
| Writing a PRD or design doc | ✓ pick this | — | Claude's prose quality + the /write-prd skill chain. Codex is terser but less structured. |
| SQL / data exploration in OpenSearch / PG | also fine | ✓ pick this | Codex's faster cycle on iterative query refinement. Use Claude only if you need narrative output. |
| Long-running agentic loop (overnight) | ✓ pick this | — | Claude's hibernation + replay semantics + Modal runtime support are first-class. Use Codex CLI for parallel branches. |
Ryan Lapo's point (OpenAI Frontier): the harness matters as much as the model. Codex CLI's parallel tool execution + 1M context + iterative apply-patch loop isn't just a wrapper — it's what their RL post-training optimized for. Same with Claude Code's plan-and-revise discipline. Pick the harness whose post-training matches your task, not the model name. That's why we require both installed.
One row per invocation in skill_invocations. Visible in Grafana within minutes. Privacy by design — file paths and skill names are captured, prompt content is not.
```
-- Example row from skill_invocations after you run /start
{
  ts:             "2026-05-17T14:23:01Z",
  engineer_id:    "hassanbaigy",
  skill_name:     "/start",
  repo:           "graph8-com/g8",
  runtime:        "claude-code-desktop",
  model:          "claude-opus-4-7",
  latency_ms:     1240,
  status:         "success",
  input_tokens:   null,             -- max plan = no metering
  output_tokens:  null,
  cost_usd_cents: null,
  arg_hash:       "sha256:8f3a..."  -- one-way hash of the args
}
```
Captured: skill / tool name · engineer GitHub handle · repo · model · latency · success or abandoned · anonymized argument hash · timestamp.
Never captured: prompt content · code you typed · file contents · API responses · personal access tokens · anything in your ~/.ssh.
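The argument hash is what makes "anonymized" concrete: the ledger can tell repeat invocations apart without storing what you typed. A sketch of how such a one-way hash could be computed; the exact canonicalization the wrapper uses is an assumption here:

```typescript
import { createHash } from "node:crypto";

// One-way hash of skill arguments: comparable across invocations,
// unrecoverable from the ledger.
function argHash(args: unknown): string {
  // Sketch: the real wrapper may canonicalize key order before hashing.
  const canonical = JSON.stringify(args);
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}

// argHash({ file: "src/client.ts" }) -> "sha256:" + 64 hex chars
```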
POST to https://mbm.graph8.com/v1/trace · stored in skill_invocations Postgres table · visible in Grafana dashboard grafana.graph8.com/d/lifecycle-engineer
Within 60 seconds of the skill fire. Dashboard auto-refreshes every minute.
Fire-and-forget · the trace is dropped silently · never blocks your skill. The hook has a 200ms timeout.
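In code terms the contract is roughly this (a behavioral sketch, not the shipped hook; the endpoint comes from the pipeline lines above):

```typescript
// Fire-and-forget trace emit: 200ms budget, silent drop, never throws.
async function emitTrace(row: object): Promise<void> {
  try {
    await fetch(`${process.env.MBM_URL}/v1/trace`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(row),
      signal: AbortSignal.timeout(200), // the hook's 200ms budget
    });
  } catch {
    // MBM down, DNS flake, or timeout: drop the trace, never block the skill.
  }
}
```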
Your own utilization grade · skill mix · daily fire count · top-fired skills · skill-to-PR ratio. All at your row on the team wall.
mbm.spawn_agent: when you want the cluster to do the work. For heavy work — bulk refactors · overnight runs · jobs that need private cluster context (Postgres · OpenSearch · internal APIs) — you can dispatch from your local session to the K8s axon fleet. The agent runs on the cluster using OAuth-pool tokens; results stream back into your terminal via SSE.
You, in Claude Code desktop, locally:

> Use the heavy-lift agent to refactor all integrations_v4 modules to use the new client base.

Claude calls the MCP tool mbm.spawn_agent:

```
POST /v1/spawn-agent
{
  "skill": "refactor",
  "repo": "graph8-com/g8",
  "context": "integrations_v4 · client base migration",
  "budget": "high",
  "callback": "sse://<your-session-id>"
}
```

MBM creates a Task CR in the axon-system K8s namespace:
→ spawns a g8 agent pod
→ uses an OAuth pool token (not your personal Max plan)
→ has access to Postgres, OpenSearch, internal APIs

Output streams back to your local session via SSE:

```
[pod started] Reading integrations_v4/* ...
[pod working] Identified 23 files to refactor
[pod working] Opened PR #7141: "refactor(integrations_v4): adopt client base"
[pod done]    Total runtime: 8m 42s · cost: $4.18
```

You see the work happen in your session — but it ran on the cluster.
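For intuition, the round trip looks roughly like this from the client side. An illustrative sketch, not the MCP tool's source: it assumes the POST body above and that the spawn response itself streams text/event-stream (in the real flow the callback session receives the stream), and it splits lines naively where a real client would buffer partial chunks:

```typescript
// Sketch: dispatch a heavy task to MBM and print pod output as it streams.
async function spawnAndStream(sessionId: string): Promise<void> {
  const res = await fetch("https://mbm.graph8.com/v1/spawn-agent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "refactor",
      repo: "graph8-com/g8",
      context: "integrations_v4 · client base migration",
      budget: "high",
      callback: `sse://${sessionId}`,
    }),
  });

  // Each SSE "data:" line is one [pod ...] progress event.
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const line of value.split("\n")) {
      if (line.startsWith("data:")) console.log(line.slice(5).trim());
    }
  }
}
```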
Same Claude Code or Codex you use today · plus the trace MCP wrapper running silently · plus the option to dispatch to the cluster when you want. Three rituals change.
First tab of the day: grafana.graph8.com/d/lifecycle-engineer. Check your row · skill fires from yesterday · overnight dispatches that landed PRs · utilization grade. Spend 30 seconds reviewing, then dispatch the morning queue.
Verify: your engineer_id appears with at least 1 fire from yesterday + any overnight dispatches.

Use Claude Code or Codex normally. /start · /investigate-bug · /ship · tool calls · web searches — all captured. No friction. Your dashboard row updates within 60 seconds of each fire. If you cross 10 skill fires by midday, you're operating at the principle-1 cadence.
mbm.spawn_agent before you log off. Pick 1–3 heavy tasks and dispatch them to the cluster. They run on OAuth-pool tokens (not your Max plan), use cluster context, and post PRs by morning. The next morning's dashboard shows what landed.
Verify: `kubectl get tasks -n axon-system` · PRs in your inbox by 7 AM.

Most issues fall into 4 buckets. If none of these match, post in #engineering.
1. Did you restart Claude Code / Codex after editing the MCP config?
2. Is the GITHUB_USER env var your actual GitHub handle (not your name)?
3. Run `curl -s $MBM_URL/v1/health` — it should return `{"ok":true}`.
Rows show up as unknown: you missed the GITHUB_USER env var. Fix it in your ~/.claude/mcp.json and restart. Old unknown rows can be deleted by Thomas or Shaharyar on request.
Codex slash commands don't appear: known limitation — Codex's tool interception doesn't include slash command names. There's an open issue tracking native Codex hook support. For now, rely on Claude Code as your primary and use Codex for shorter sessions.
mbm.spawn_agent says "rate limit": the OAuth pool hit a rate ceiling. Wait 5 min and retry, or fall back to local execution for now. Pool rate-limit handling lives in codex_pool at services/mbm/internal/pool/.
Traces are slow: the MBM URL might be wrong (it defaults to graph8.com prod, which is correct for everyone). If the wrapper is reaching out to staging, you'll see 500ms+ round trips vs the normal <50ms.
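A quick probe to tell the two cases apart (hypothetical helper; /v1/health is the endpoint from the checklist above):

```typescript
// Time one MBM round trip: prod answers /v1/health in <50ms;
// 500ms+ usually means the wrapper points at the wrong URL.
async function probeMbm(url = process.env.MBM_URL ?? "https://mbm.graph8.com") {
  const t0 = performance.now();
  const res = await fetch(`${url}/v1/health`);
  const ms = Math.round(performance.now() - t0);
  console.log(`${url} -> HTTP ${res.status} in ${ms}ms`, await res.json());
}
```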
Delete the lifecycle-trace entry from ~/.claude/mcp.json and ~/.codex/mcp.json. Restart the apps. Your skill fires stop appearing in the ledger immediately. (We'd ask you not to — universal trace is principle #1 — but technically nothing breaks.)
You get to see your own work patterns. Leadership sees aggregates. The cluster gets cheaper to run because we know which K8s agents are worth their token budget. Everyone wins because the data exists.
You can finally see your own skill mix, where you spend AI time, whether you're spinning (high fires, low PRs) or flying (high fires, high PRs). 1:1s become data conversations instead of vibes.
Heavy tasks dispatch to the cluster · your laptop stays fast · your Max plan stays under limit · overnight runs work for you while you sleep.
Standups become "look at the dashboard" instead of "what did you work on." Sprint planning becomes "which loops do we close" instead of "estimate these tickets."
OAuth-pool $/merged-PR per agent visible. Bad agents get retired. Good agents get amplified. We ship like a 270-person team without paying for one.