---
title: "Engineer autopilot — local daemon that auto-picks AGENT-READY issues"
status: draft
author: "thomas"
assignee: null
created: 2026-05-17
approved: null
domains: [npm/@graph8/engineer-autopilot, services/mbm/internal/spawn, tenants/graph8-eng/agents, .claude]
issues: []
complexity: medium
gtm_score: 12
analysis_date: 2026-05-17
---

# Engineer autopilot — local daemon that auto-picks AGENT-READY issues

## GTM Automation Assessment

- Manual step reduction: **5/5** — eliminates the "engineer manually
  dispatches each agent" step that today happens 30+ times per day per
  engineer. Engineer wakes up; the night's AGENT-READY backlog is
  already in flight or shipped.
- User autonomy ("approve-only"): **5/5** — engineer starts the daemon
  once; from then on, the only human touch is a Slack/Roam ack when an
  agent is genuinely stuck.
- Data pipeline contribution: **3/5** — emits trace rows tagged
  `/autopilot-dispatch` so the per-engineer utilization dashboard
  (PRD 2) shows autopilot activity distinctly from interactive sessions.
- Market positioning: **differentiator** — no other agent platform
  ships a "pick up your day's issues without you typing" daemon at this
  level of integration with the issue-tracking + dispatch pipeline.
- **Total: 12/15** — ties with PRD 4 for highest in the Lifecycle set
  because it eliminates the *last* manual step in the engineering
  assembly line.

## Problem

Shaharyar, in the 2026-05-17 conversation (transcript captured in this
PR), described the current flow: *"how currently I... did something was
I pointed to GitHub and say, okay, get all the issues right now and
start it. And for each, I ask it manually. But what if we just have it?
Like, all you have to do is go start and Cloud Code does it for them?"*

Today the flow is:

1. Engineer opens GitHub, scrolls to AGENT-READY issues assigned to
   them.
2. Engineer types `claude code` (or `codex`) in a terminal.
3. Engineer pastes the issue URL or summary.
4. Engineer waits for the agent to finish.
5. Engineer reviews + opens PR.
6. Engineer repeats for the next issue.

That's 30+ context switches per day, each one a chance to lose focus
and a tax on throughput. Per [principle 10 — 10+ agents per engineer
in flight](https://graph8-lifecycle.pages.dev/principles.html), the
right shape is many agents running in parallel — but the engineer
shouldn't be the orchestration layer. The harness should be.

What we want instead (Shaharyar, same transcript): *"we can have a
background agent running in the desktop that just looks into the
issues if they are assigned to them and just start working on it. And
it would be a pinging system if human help is needed somewhere
suddenly in some agent."*

This PRD ships that daemon.

## Analysis Summary

### Existing Code to Reuse

| What | Where | How we reuse it |
|------|-------|-----------------|
| `mbm.spawn_agent` MCP tool | shipping in [PRD 5 — Lifecycle deploy loop](lifecycle-deploy-loop.md), `npm/@graph8/spawn-agent-mcp` | Autopilot calls `mbm.spawn_agent` for each picked issue when the engineer opts into cloud execution; local execution uses `claude code` / `codex` directly |
| Trace ledger | shipping in [PRD 1 — Lifecycle trace pipeline](lifecycle-trace-pipeline.md), `@graph8/lifecycle-trace-mcp` | Autopilot fires `/autopilot-dispatch` trace rows per pick |
| MBM `POST /v1/spawn-agent` endpoint | shipping in [PRD 5](lifecycle-deploy-loop.md), `services/mbm/internal/spawn/handler.go` | Same endpoint; autopilot uses Bearer auth same as `mbm.spawn_agent` MCP |
| GitHub Projects API for AGENT READY status | already used by `gh project item-list` in setup scripts | Daemon queries `gh api graphql` for items with Status=AGENT READY assigned to engineer |
| Claude Code / Codex hook surface | already used by trace MCP for `SessionStart` etc | Autopilot reuses the same hook ecosystem to detect "agent stuck" events |
| Slack webhook from `tenants/graph8-eng/channels` | existing GitHub App + Slack integration | Autopilot posts pings via the same Slack channel MBM uses |

### Data Dependencies

- **Read:** GitHub issues (REST + GraphQL Projects API) for AGENT READY items assigned to the engineer.
- **Read:** local `.claude/mcp.json` and `~/.claude/mcp.json` for harness config.
- **Write:** GitHub comments (when an agent gets stuck or finishes), PR creates.
- **Write:** trace rows via PRD 1's pipeline.
- **No new tables.**

### Integration Points

- **Local laptop process** — long-running Node daemon, `npx @graph8/engineer-autopilot start` to launch.
- **GitHub** — polls the mbm project (and any other configured boards) for AGENT READY items.
- **Claude Code / Codex** — invoked as subprocess per issue. The harness is selected per the matrix in [`docs/local.html`](https://graph8-lifecycle.pages.dev/local.html#when-to-use); engineer can override via env or per-issue label.
- **MBM** — optional fallback to cloud dispatch via `mbm.spawn_agent` for heavy issues (engineer-configurable threshold based on est. agent runtime).
- **Slack/Roam** — ping channel for human-help events.

### Competitive Context

- **Cursor's "background agents"** — run in the cloud; you can't dispatch from a local CLI to your own queue.
- **GitHub Copilot Workspace** — operates at the PR level; doesn't manage an issue queue.
- **Anthropic's computer-use SDK** — runs in Anthropic's cloud; not designed for an engineer-controlled local issue queue.
- **Aider / Cline** — interactive, not autonomous; require the engineer to type each prompt.

The combination (local daemon + GitHub issue queue + harness pick + Slack ping on block) is bespoke.

## Technical Fit Review

### System Touchpoints

| Layer | Affected | Details |
|-------|----------|---------|
| Modules | new | `npm/@graph8/engineer-autopilot` (Node ≥18) |
| Routes | none | reuses MBM `/v1/spawn-agent` from PRD 5 |
| Repositories | none | direct GitHub API calls |
| Data models | none | reuses GitHub issues + `skill_invocations` |
| Infrastructure | none | runs on engineer's laptop |
| Events (EDA) | none | |
| External services | new | Slack webhook for pings (reuses existing) |

### Reuse Plan

| Component | Action | Notes |
|-----------|--------|-------|
| `@graph8/spawn-agent-mcp` package | reuse | Sister package; autopilot can import or shell out |
| `@graph8/lifecycle-trace-mcp` package | reuse | Autopilot dispatches fire trace rows through the same pipeline |
| Claude Code CLI / Codex CLI | wrap | Autopilot spawns these as subprocesses per issue |
| `gh` CLI | reuse | Used for issue queries, PR creates, comment posts |
| `mbm.spawn_agent` MCP tool | reuse | Cloud-dispatch path for heavy issues |
| Slack webhook (existing graph8 ops channel) | reuse | New webhook URL added to Infisical as needed |

### Extension-Before-Replacement

Nothing is being replaced; this is a pure addition. The interactive
harness path (engineer types `claude code` themselves) continues to
work — autopilot is just an additional dispatcher.

### Architecture Scorecard

| Criterion | Score (1-5) | Notes |
|-----------|-------------|-------|
| Fits current architecture | 5 | Same npm + GHCR + harness + trace + dispatch shape as everything else |
| Delivery speed | 4 | One npm package + thin wrappers around existing endpoints; ~1 week |
| Operational risk | 3 | Daemon runs unattended; bug → wrong issue picked or PR opened. Mitigated by dry-run mode + per-issue confirmation flag |
| Reversibility | 5 | Engineer stops the daemon; queue freezes; no side effects |
| Complexity added | 3 | One package, but it touches GH API + harness subprocess + Slack |
| Future extensibility | 5 | New dispatchers (e.g. desktop UI panel, mobile ack) layer on top |

## Requirements

### npm package `@graph8/engineer-autopilot`

- **Registry:** GHCR npm under `@graph8-com` scope.
- **Runtime:** Node ≥18.
- **Dependencies:** `@octokit/rest`, `@modelcontextprotocol/sdk` (optional, for spawn-agent integration), `commander` (CLI), `chalk` (logging), `node-fetch` (Slack webhook).
- **Binary:** `engineer-autopilot` exposed via `package.json` `bin`. Subcommands:

```bash
engineer-autopilot start [--concurrency N] [--harness claude|codex|auto] [--cloud-threshold-min M]
engineer-autopilot stop
engineer-autopilot status
engineer-autopilot dry-run    # print what would be picked, don't dispatch
engineer-autopilot config     # write/show ~/.config/graph8/autopilot.yaml
```
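The `config` subcommand's `~/.config/graph8/autopilot.yaml` might look like this (a sketch; keys follow the boot-time list under "Daemon behavior", and all values here are illustrative):

```yaml
# Sample autopilot.yaml (illustrative values)
github_user: shaharyar            # required: engineer's GitHub handle
boards:
  - graph8-com/projects/42        # default mbm board
concurrency: 1                    # max 3
harness: auto                     # claude | codex | auto
cloud_threshold_min: 30           # estimates above this go to MBM cloud dispatch
slack_webhook_url: https://hooks.slack.com/services/XXX  # from Infisical /lifecycle/autopilot-slack-webhook
pause_on_block: true              # pause queue until human ack
```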

### Daemon behavior

1. **Boot:**
   - Read `~/.config/graph8/autopilot.yaml` (or env overrides):
     - `github_user` (required; engineer's GitHub handle)
     - `boards` (default: `["graph8-com/projects/42"]`)
     - `concurrency` (default 1; max 3)
     - `harness` (default `auto` — uses the [when-to-use matrix](https://graph8-lifecycle.pages.dev/local.html#when-to-use))
     - `cloud_threshold_min` (default 30 — issues estimated to run >30 min get dispatched to MBM `/v1/spawn-agent` instead of local)
     - `slack_webhook_url` (from Infisical `/lifecycle/autopilot-slack-webhook`)
     - `pause_on_block` (default `true` — pauses queue until human ack)
   - Verify auth: `gh auth status` must succeed; `mbm.spawn_agent` token must be present if cloud-dispatch enabled.
   - Print "autopilot running for <github_user>; watching N boards; concurrency M" and start polling loop.

2. **Poll loop (every 60s):**
   - Query mbm project (and any other configured boards) for items with Status=AGENT READY assigned to the engineer's GitHub handle.
   - Filter out items already in-flight (tracked in memory + persisted to `~/.local/state/graph8/autopilot-state.json`).
   - Pick the top N items (where N = concurrency - in_flight).
   - For each:
     - Move item to Status=IN PROGRESS on the project board.
     - Add a comment on the issue: `🤖 Autopilot picked up at <ts>. Harness: <claude-code-cli|codex-cli>. PR will follow.`
     - Spawn the agent (local subprocess OR cloud `mbm.spawn_agent`).
     - Track the spawn in state file.
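
   The pick step above is essentially a pure function over the board items; a minimal sketch (the `BoardItem` shape and `pickNext` name are assumptions, not the shipped API):

   ```typescript
   // Minimal model of a project-board item as the poll loop sees it.
   interface BoardItem {
     id: string;
     assignee: string | null; // GitHub handle
     status: string;          // board Status field, e.g. "AGENT READY"
   }

   // Pick the next items to dispatch: Status=AGENT READY, assigned to the
   // engineer, not already in flight, capped at concurrency - in_flight.
   function pickNext(
     items: BoardItem[],
     githubUser: string,
     inFlight: Set<string>,
     concurrency: number,
   ): BoardItem[] {
     const slots = Math.max(0, concurrency - inFlight.size);
     return items
       .filter(
         (it) =>
           it.status === "AGENT READY" &&
           it.assignee === githubUser &&
           !inFlight.has(it.id),
       )
       .slice(0, slots);
   }
   ```

   With `concurrency: 3` and one item already in flight, at most two new items are picked per cycle.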

3. **Agent execution (local path):**
   - Create a temp worktree from `main`.
   - Build the agent prompt from the issue body + sub-issue context.
   - Run `claude code -p "$PROMPT"` or `codex -p "$PROMPT"` in the worktree.
   - On success: open a PR with `gh pr create --base main --head autopilot/<issue-number> --title "<issue title>"`. PR body includes "Closes #<issue>" + agent transcript link.
   - On a non-zero agent exit code: capture stderr + last 200 lines of stdout. Trigger ping flow.
   - On agent timeout (default 90 min): kill, capture state, trigger ping flow.
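
   The run-and-capture step above could be sketched with Node's `spawnSync` (the `runAgent` name and result shape are assumptions; the real daemon would run agents asynchronously to honor concurrency):

   ```typescript
   import { spawnSync } from "node:child_process";

   interface AgentRunResult {
     ok: boolean;
     timedOut: boolean;
     stderr: string;
     stdoutTail: string; // last 200 lines, fed to the ping flow on failure
   }

   // Run a harness command in the temp worktree with a hard timeout
   // (default 90 min, per the spec); capture output for diagnostics.
   function runAgent(
     cmd: string,
     args: string[],
     worktree: string,
     timeoutMs = 90 * 60 * 1000,
   ): AgentRunResult {
     const res = spawnSync(cmd, args, {
       cwd: worktree,
       encoding: "utf8",
       timeout: timeoutMs,
     });
     // spawnSync sets error.code to "ETIMEDOUT" when the timeout fires.
     const err = res.error as { code?: string } | undefined;
     const timedOut = err?.code === "ETIMEDOUT";
     return {
       ok: res.status === 0 && !timedOut,
       timedOut,
       stderr: res.stderr ?? "",
       stdoutTail: (res.stdout ?? "").split("\n").slice(-200).join("\n"),
     };
   }
   ```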

4. **Agent execution (cloud path):**
   - Estimate issue duration from labels / size (heuristic: PR-fixing < 15 min, feature work > 60 min).
   - If estimate > `cloud_threshold_min`: POST to MBM `/v1/spawn-agent` with the issue prompt.
   - Track the returned `task_id`; subscribe to the SSE `stream_url` and surface progress in autopilot logs.
   - On task completion: same PR-open flow as local.
   - On task failure or stuck: trigger ping flow.
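
   The routing decision above is a one-liner once the estimate exists; a sketch, with the caveat that the label names (`pr-fix`, `feature`) are illustrative placeholders rather than an agreed taxonomy:

   ```typescript
   // Rough runtime estimate in minutes from issue labels, per the
   // heuristic above: PR-fixing < 15 min, feature work > 60 min.
   function estimateMinutes(labels: string[]): number {
     if (labels.includes("pr-fix")) return 15;  // illustrative label
     if (labels.includes("feature")) return 60; // illustrative label
     return 30; // unknown size: assume mid-range
   }

   // Cloud-dispatch when the estimate exceeds cloud_threshold_min.
   function useCloudDispatch(labels: string[], cloudThresholdMin = 30): boolean {
     return estimateMinutes(labels) > cloudThresholdMin;
   }
   ```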

5. **Ping flow:**
   - Post to Slack webhook: `🔔 Autopilot blocked on issue #N (title). Reason: <reason>. Last output: <link to gist>. Ack to unblock.`
   - Post a comment on the issue: same content, plus `cc @<engineer>`.
   - Move item back to Status=TODO on the board with a comment explaining the block (per Shaharyar's flow: blocked items need human review before re-promoting).
   - If `pause_on_block: true`, pause the queue (don't pick new items) until engineer runs `engineer-autopilot resume`.
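
   The Slack text in the first step can be built with a tiny formatter (the function name is an assumption):

   ```typescript
   // Format the Slack/issue-comment ping for a blocked item, matching
   // the template above.
   function buildPingText(
     issueNumber: number,
     title: string,
     reason: string,
     gistUrl: string,
   ): string {
     return (
       `🔔 Autopilot blocked on issue #${issueNumber} (${title}). ` +
       `Reason: ${reason}. Last output: ${gistUrl}. Ack to unblock.`
     );
   }
   ```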

6. **Trace integration:**
   - On every pick: fire trace row `skill_name='/autopilot-dispatch'`, `metadata: {issue_url, harness, mode}`.
   - On every PR open: fire `skill_name='/autopilot-pr-open'`.
   - On every ping: fire `skill_name='/autopilot-block'`.
   - Visible in PRD 2's per-engineer dashboard, distinct from interactive sessions.

7. **Stop:**
   - `Ctrl-C` or `engineer-autopilot stop`: gracefully wait up to 30s for in-flight agents to checkpoint; persist state; exit.
   - On hard kill: state file is recoverable on next `start`; in-flight items get a comment "autopilot interrupted; needs re-pickup."

### UI (Frontend)

None for v1 — CLI only. Daemon output is structured logs to stdout; engineer can `tail` them. Future v2 might ship a small desktop panel showing the queue + current agent, but v1 is intentionally minimal.

### Data Model

No new tables. State persisted as JSON on the engineer's laptop at `~/.local/state/graph8/autopilot-state.json`.

## Acceptance Criteria

- [ ] npm package `@graph8/engineer-autopilot@0.1.0` published to GHCR npm registry under `@graph8-com` scope.
- [ ] `npx @graph8/engineer-autopilot start` runs in <5s on a freshly-cloned laptop with `gh auth login` complete.
- [ ] Daemon writes a sample `~/.config/graph8/autopilot.yaml` on first run if none exists.
- [ ] `engineer-autopilot dry-run` lists the issues that *would* be picked, with the chosen harness per issue, without dispatching.
- [ ] Daemon picks AGENT-READY issues assigned to the configured `github_user` and skips issues assigned to others.
- [ ] Daemon moves picked items to Status=IN PROGRESS on the mbm project before dispatching.
- [ ] Daemon posts an "autopilot picked up" comment on each picked issue within 5s of dispatch.
- [ ] Local dispatch: `claude code` / `codex` runs in a temp worktree (no side effects on the engineer's main checkout).
- [ ] Cloud dispatch: estimated-long issues (>30 min default) call `mbm.spawn_agent` and subscribe to the returned SSE stream.
- [ ] Successful agent run → PR opened with `Closes #N` body + correct branch naming (`autopilot/N`).
- [ ] Stuck agent → Slack ping + issue comment + item moved back to TODO + queue pauses (when `pause_on_block: true`).
- [ ] `engineer-autopilot stop` triggers graceful shutdown; in-flight agents finish or checkpoint cleanly.
- [ ] Trace rows fire for `/autopilot-dispatch`, `/autopilot-pr-open`, `/autopilot-block` and are visible in PRD 2's per-engineer dashboard within 60s.
- [ ] OTEL span `autopilot.dispatch` visible in Tempo per pick.
- [ ] End-to-end test: seed 3 AGENT-READY issues assigned to a test user, start daemon, observe 3 PRs opened within total agent runtime + ≤5s overhead per dispatch.
- [ ] Failure test: seed 1 AGENT-READY issue with a deliberately broken acceptance criterion; agent fails; daemon posts ping + moves to TODO; queue pauses.
- [ ] Documentation: package README documents all subcommands + sample `autopilot.yaml`; [`docs/local.html`](https://graph8-lifecycle.pages.dev/local.html) gets a new "Engineer autopilot" section under Hybrid Execution.

## Out of Scope

- **Desktop UI panel.** CLI-only in v1. A floating panel showing queue state is v2.
- **Multi-laptop coordination.** No leader election if two laptops run autopilot for the same user; v1 assumes single laptop.
- **Auto-assignment of issues to itself.** Daemon only picks issues already assigned; it doesn't claim unassigned items.
- **Auto-promotion of TODO → AGENT READY.** Per Shaharyar's flow rule, that's a human-approval gate. Autopilot only consumes AGENT READY.
- **Cross-board coordination beyond mbm.** v1 watches the mbm board (#42). Additional boards configurable but not enabled by default.
- **Cost-aware dispatch.** No "this issue is expensive, skip it" logic. Cost surfaces are PRD 2 dashboards; autopilot just dispatches.
- **PR review delegation.** Autopilot opens the PR; MBM bot reviews (existing pipeline). No new review logic here.

## Dependencies

**Depends on:**

- **[PRD 5 — Lifecycle deploy loop](lifecycle-deploy-loop.md).** Specifically `mbm.spawn_agent` MCP + MBM `/v1/spawn-agent` endpoint for the cloud-dispatch path. If PRD 5 slips, autopilot v0.1 can ship local-only (the cloud path becomes a v0.2 follow-up).

**Soft signal from:**

- **[PRD 1 — Lifecycle trace pipeline](lifecycle-trace-pipeline.md).** Autopilot fires trace rows; if PRD 1 isn't live, the rows are silently dropped (per its fire-and-forget design) and the dashboards stay empty until both ship.
- **[PRD 2 — Lifecycle dashboards](lifecycle-dashboards.md).** Per-engineer autopilot activity shows up there; not blocking.

## Sequencing notes

Ship after PRD 5 (sprint 3 end or sprint 4 start). Could ship local-only sooner if PRD 5 slips.

## How a coding agent picks this up

When this PRD is `status: approved`, planner spawns ~7 issues:

1. npm package scaffolding + `engineer-autopilot start/stop/status/dry-run/config` CLI commands.
2. Daemon poll loop + GitHub Projects API integration.
3. Local agent subprocess execution + temp-worktree handling.
4. Cloud dispatch via `mbm.spawn_agent` (depends on PRD 5).
5. Ping flow + Slack webhook integration + state pause/resume.
6. Trace row integration with PRD 1.
7. End-to-end test fixture + sample `autopilot.yaml`.

Labels: npm package + CLI + state → `g8_fullstack` agent. GitHub Projects integration → `infra` agent. MBM integration touches → `mbm` agent.
