docs: SUBAGENTS.md — role taxonomy, lifecycle, output contract (#404)

The role taxonomy expansion in #404 added Implementer + Verifier as
distinct postures alongside General / Explore / Plan / Review /
Custom. The issue body explicitly lists \`docs/AGENTS.md or
docs/SUBAGENTS.md\` as a target file; this commit creates that file.

Coverage:

- Role taxonomy table — stance, write/shell access, typical use per
  role.
- "When to pick which role" — narrative guidance the model can read
  if the role choice isn't obvious.
- Alias map — every accepted spelling routed to a canonical role,
  matching what \`SubAgentType::from_str\` accepts.
- Concurrency cap — the 10-by-default value, the
  \`[subagents].max_concurrent\` knob, and the running-only
  semantics (#509).
- Lifecycle — Pending → Running → terminal states, plus
  \`Interrupted\` after a process restart.
- Session boundaries (#405) — \`session_boot_id\` mechanics,
  default current-session filter, \`include_archived=true\` escape
  hatch, pre-#405 record handling.
- Output contract — the SUMMARY/CHANGES/EVIDENCE/RISKS/BLOCKERS
  format every sub-agent must produce.
- Memory + \`remember\` integration (#489) — sub-agents inherit the
  parent's memory file when memory is enabled and can append durable
  notes.
- Implementation notes — source path, persisted state file,
  is_running semantics, RwLock pattern.

Cross-link added in \`docs/TOOL_SURFACE.md\` so the sub-agent section
points to this doc.

No Rust code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hunter Bown
2026-05-03 04:09:27 -05:00
parent 2f9b58b910
commit d129ab4150
2 changed files with 168 additions and 1 deletions
+164
View File
@@ -0,0 +1,164 @@
# Sub-Agents
Sub-agents are background instances of the agent loop. The parent
agent spawns one with a focused task, gets back an `agent_id`
immediately, and continues working while the sub-agent runs to
completion. Sub-agents inherit the parent's tool registry by default
and run with `CancellationToken::child_token()`, so cancelling the
parent cancels every descendant.
This doc covers the role taxonomy. For the orchestration tool surface
(`agent_spawn` / `agent_wait` / `agent_result` / `agent_cancel` /
`agent_list` / `agent_send_input` / `agent_resume` / `agent_assign`)
see `prompts/base.md` "Sub-Agent Strategy" and the in-line tool
descriptions.
## Role taxonomy
The `agent_type` field on `agent_spawn` selects a system-prompt
posture for the child. Each role is a distinct stance toward the
work — not just a different label.
| Role | Stance | Writes? | Runs shell? | Typical use |
|---------------|----------------------------------------|---------|-------------|----------------------------------------------|
| `general` | flexible; do whatever the parent says | yes | yes | the default; multi-step tasks |
| `explore` | read-only; map the relevant code fast | no | yes (read) | "find every call site of `Foo`" |
| `plan` | analyse and produce a strategy | minimal | minimal | "design the migration; don't execute" |
| `review` | read-and-grade with severity scores | no | no | "audit this PR for bugs" |
| `implementer` | land a specific change with min edit | yes | yes | "rewrite `bar.rs::Foo::bar` to do X" |
| `verifier` | run tests / validation, report outcome | no | yes (test) | "run cargo test --workspace, report" |
| `custom` | explicit narrow tool allowlist | depends | depends | locked-down dispatch with hand-picked tools |
Each role's full system prompt lives in
`crates/tui/src/tools/subagent/mod.rs` (search for
`*_AGENT_PROMPT`). The prompt prefix loads automatically when the
child agent boots; the parent's spawn prompt becomes the first
turn's user message.
### When to pick which role
- **`general`** — when the task is "do this whole thing", not "go
look", "design", or "verify". This is the right default; reach for
a more specific role only when the posture matters.
- **`explore`** — when the parent needs evidence before deciding what
to do next. Explorers are cheap and fast; spawn 23 in parallel
for independent regions.
- **`plan`** — when the parent has an objective but no executable
decomposition. Planners write artifacts (`update_plan` rows,
`checklist_write` entries) but don't carry them out.
- **`review`** — when there's already a change and the parent wants
it graded. Reviewers don't patch — they describe the fix in the
finding so the parent can dispatch an Implementer if the verdict
is "fix it".
- **`implementer`** — when the change is already specified and just
needs to land. Implementers stay tightly scoped: minimum edit, no
drive-by refactoring, run a quick verification before handing back.
- **`verifier`** — when the parent needs an authoritative pass/fail
on the test suite or other validation. Verifiers don't fix
failures; they capture the failing assertion + stack and put fix
candidates under RISKS.
- **`custom`** — only when the parent needs to constrain the tool
set explicitly. Pass the allowlist via the `allowed_tools` field
on `agent_spawn`.
### Aliases
The model can spell each role multiple ways:
| Canonical | Aliases |
|---------------|------------------------------------------------------------------|
| `general` | `worker`, `default`, `general-purpose` |
| `explore` | `explorer`, `exploration` |
| `plan` | `planning`, `awaiter` |
| `review` | `reviewer`, `code-review` |
| `implementer` | `implement`, `implementation`, `builder` |
| `verifier` | `verify`, `verification`, `validator`, `tester` |
| `custom` | (none; explicit `allowed_tools` array required) |
All matching is case-insensitive. Unknown values produce a typed
error listing the accepted set, so the model can self-correct on
the next turn.
## Concurrency cap
The dispatcher caps concurrent sub-agents at 10 by default
(configurable via `[subagents].max_concurrent` in `~/.deepseek/config.toml`,
hard ceiling 20). When the parent hits the cap, `agent_spawn` returns
an error with the cap value; the parent should `agent_wait` for
completion or `agent_cancel` to free a slot before retrying.
The cap counts only **running** agents — completed / failed /
cancelled records persist for inspection but don't occupy a slot.
Agents that lost their `task_handle` (e.g. across a process
restart) also don't count against the cap.
## Lifecycle
Each spawn produces a record that progresses through:
```
Pending → Running → (Completed | Failed(reason) | Cancelled | Interrupted(reason))
```
`Interrupted` fires when the manager detects a `Running` agent
whose task handle is gone — typically after a process restart that
loaded the agent from `~/.deepseek/subagents.v1.json`. The parent
can `agent_resume` to attempt continuation or treat it as a
terminal state.
### Session boundaries (#405)
Each `SubAgentManager` instance assigns itself a fresh
`session_boot_id` on construction. Every spawn stamps the agent
with that id; the persisted state file carries it across restarts.
`agent_list` defaults to **current-session only**: prior-session
agents that aren't still running are filtered out. Pass
`include_archived=true` to surface every record, with the
`from_prior_session: true` flag so the model can tell archived
records apart from live ones.
Records that loaded from a pre-#405 persisted state file (no
`session_boot_id` field) classify as prior-session because the
manager can't match them to the current boot.
## Output contract
Every sub-agent produces a final result string with five sections,
in order:
```
SUMMARY: one paragraph; what you did and what happened
CHANGES: files modified, with one-line descriptions; "None." if read-only
EVIDENCE: path:line-range citations and key findings; one bullet each
RISKS: what could go wrong / what the parent should double-check
BLOCKERS: what stopped you; "None." if you finished cleanly
```
The exact format lives in `crates/tui/src/prompts/subagent_output_format.md`.
The parent reads `EVIDENCE` as a working set for the next turn, so
explorers and reviewers should be precise here.
## Memory and the `remember` tool (#489)
Sub-agents inherit the parent's memory file when memory is enabled
(`[memory] enabled = true` or `DEEPSEEK_MEMORY=on`). They can
append durable notes via the `remember` tool — handy for an
explorer that discovers a project convention worth carrying across
sessions, or a verifier that learns "this test is flaky".
Memory writes are scoped to the user's own `memory.md` file; they
don't go through the standard write-approval flow.
## Implementation notes
- Source: `crates/tui/src/tools/subagent/mod.rs` (about 3500 LOC).
- Persisted state: `~/.deepseek/subagents.v1.json`. Schema version
`1` (forward-compatible — new optional fields use
`#[serde(default)]`).
- The `is_running` check ignores agents whose `task_handle` is
`None`; this avoids counting persisted-but-detached records
toward the concurrency cap (#509).
- `SharedSubAgentManager` is `Arc<RwLock<...>>` — read paths use
read locks so `/agents` and the sidebar projection don't block
the main loop during multi-agent fan-out (#510).
+4 -1
View File
@@ -138,7 +138,10 @@ Large logs and command outputs should be artifacts with compact summaries in the
`agent_spawn` plus the supporting tools (`agent_result` / `wait` / `send_input` /
`agent_assign` / `agent_cancel` / `resume_agent` / `agent_list`).
See `agent.txt` for the delegation protocol.
See `agent.txt` for the delegation protocol and
[`SUBAGENTS.md`](SUBAGENTS.md) for the role taxonomy
(`general` / `explore` / `plan` / `review` / `implementer` /
`verifier` / `custom`).
### Parallel fan-out: cost-class caps