From d129ab4150e9c923e8a3bf2b29ef2e13af1537cb Mon Sep 17 00:00:00 2001 From: Hunter Bown Date: Sun, 3 May 2026 04:09:27 -0500 Subject: [PATCH] =?UTF-8?q?docs:=20SUBAGENTS.md=20=E2=80=94=20role=20taxon?= =?UTF-8?q?omy,=20lifecycle,=20output=20contract=20(#404)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The role taxonomy expansion in #404 added Implementer + Verifier as distinct postures alongside General / Explore / Plan / Review / Custom. The issue body explicitly lists \`docs/AGENTS.md or docs/SUBAGENTS.md\` as a target file; this commit creates that file. Coverage: - Role taxonomy table — stance, write/shell access, typical use per role. - "When to pick which role" — narrative guidance the model can read if the role choice isn't obvious. - Alias map — every accepted spelling routed to a canonical role, matching what \`SubAgentType::from_str\` accepts. - Concurrency cap — the 10-by-default value, the \`[subagents].max_concurrent\` knob, and the running-only semantics (#509). - Lifecycle — Pending → Running → terminal states, plus \`Interrupted\` after a process restart. - Session boundaries (#405) — \`session_boot_id\` mechanics, default current-session filter, \`include_archived=true\` escape hatch, pre-#405 record handling. - Output contract — the SUMMARY/CHANGES/EVIDENCE/RISKS/BLOCKERS format every sub-agent must produce. - Memory + \`remember\` integration (#489) — sub-agents inherit the parent's memory file when memory is enabled and can append durable notes. - Implementation notes — source path, persisted state file, is_running semantics, RwLock pattern. Cross-link added in \`docs/TOOL_SURFACE.md\` so the sub-agent section points to this doc. No Rust code changed. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/SUBAGENTS.md | 164 +++++++++++++++++++++++++++++++++++++++++++ docs/TOOL_SURFACE.md | 5 +- 2 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 docs/SUBAGENTS.md diff --git a/docs/SUBAGENTS.md b/docs/SUBAGENTS.md new file mode 100644 index 00000000..3fe83329 --- /dev/null +++ b/docs/SUBAGENTS.md @@ -0,0 +1,164 @@ +# Sub-Agents + +Sub-agents are background instances of the agent loop. The parent +agent spawns one with a focused task, gets back an `agent_id` +immediately, and continues working while the sub-agent runs to +completion. Sub-agents inherit the parent's tool registry by default +and run with `CancellationToken::child_token()`, so cancelling the +parent cancels every descendant. + +This doc covers the role taxonomy. For the orchestration tool surface +(`agent_spawn` / `agent_wait` / `agent_result` / `agent_cancel` / +`agent_list` / `agent_send_input` / `agent_resume` / `agent_assign`) +see `prompts/base.md` "Sub-Agent Strategy" and the in-line tool +descriptions. + +## Role taxonomy + +The `agent_type` field on `agent_spawn` selects a system-prompt +posture for the child. Each role is a distinct stance toward the +work — not just a different label. + +| Role | Stance | Writes? | Runs shell? | Typical use | +|---------------|----------------------------------------|---------|-------------|----------------------------------------------| +| `general` | flexible; do whatever the parent says | yes | yes | the default; multi-step tasks | +| `explore` | read-only; map the relevant code fast | no | yes (read) | "find every call site of `Foo`" | +| `plan` | analyse and produce a strategy | minimal | minimal | "design the migration; don't execute" | +| `review` | read-and-grade with severity scores | no | no | "audit this PR for bugs" | +| `implementer` | land a specific change with min edit | yes | yes | "rewrite `bar.rs::Foo::bar` to do X" | +| `verifier` | run tests / validation, report outcome | no | yes (test) | "run cargo test --workspace, report" | +| `custom` | explicit narrow tool allowlist | depends | depends | locked-down dispatch with hand-picked tools | + +Each role's full system prompt lives in +`crates/tui/src/tools/subagent/mod.rs` (search for +`*_AGENT_PROMPT`). The prompt prefix loads automatically when the +child agent boots; the parent's spawn prompt becomes the first +turn's user message. + +### When to pick which role + +- **`general`** — when the task is "do this whole thing", not "go + look", "design", or "verify". This is the right default; reach for + a more specific role only when the posture matters. +- **`explore`** — when the parent needs evidence before deciding what + to do next. Explorers are cheap and fast; spawn 2–3 in parallel + for independent regions. +- **`plan`** — when the parent has an objective but no executable + decomposition. Planners write artifacts (`update_plan` rows, + `checklist_write` entries) but don't carry them out. +- **`review`** — when there's already a change and the parent wants + it graded. Reviewers don't patch — they describe the fix in the + finding so the parent can dispatch an Implementer if the verdict + is "fix it". +- **`implementer`** — when the change is already specified and just + needs to land. Implementers stay tightly scoped: minimum edit, no + drive-by refactoring, run a quick verification before handing back. +- **`verifier`** — when the parent needs an authoritative pass/fail + on the test suite or other validation. Verifiers don't fix + failures; they capture the failing assertion + stack and put fix + candidates under RISKS. +- **`custom`** — only when the parent needs to constrain the tool + set explicitly. Pass the allowlist via the `allowed_tools` field + on `agent_spawn`. + +### Aliases + +The model can spell each role multiple ways: + +| Canonical | Aliases | +|---------------|------------------------------------------------------------------| +| `general` | `worker`, `default`, `general-purpose` | +| `explore` | `explorer`, `exploration` | +| `plan` | `planning`, `awaiter` | +| `review` | `reviewer`, `code-review` | +| `implementer` | `implement`, `implementation`, `builder` | +| `verifier` | `verify`, `verification`, `validator`, `tester` | +| `custom` | (none; explicit `allowed_tools` array required) | + +All matching is case-insensitive. Unknown values produce a typed +error listing the accepted set, so the model can self-correct on +the next turn. + +## Concurrency cap + +The dispatcher caps concurrent sub-agents at 10 by default +(configurable via `[subagents].max_concurrent` in `~/.deepseek/config.toml`, +hard ceiling 20). When the parent hits the cap, `agent_spawn` returns +an error with the cap value; the parent should `agent_wait` for +completion or `agent_cancel` to free a slot before retrying. + +The cap counts only **running** agents — completed / failed / +cancelled records persist for inspection but don't occupy a slot. +Agents that lost their `task_handle` (e.g. across a process +restart) also don't count against the cap. + +## Lifecycle + +Each spawn produces a record that progresses through: + +``` +Pending → Running → (Completed | Failed(reason) | Cancelled | Interrupted(reason)) +``` + +`Interrupted` fires when the manager detects a `Running` agent +whose task handle is gone — typically after a process restart that +loaded the agent from `~/.deepseek/subagents.v1.json`. The parent +can `agent_resume` to attempt continuation or treat it as a +terminal state. + +### Session boundaries (#405) + +Each `SubAgentManager` instance assigns itself a fresh +`session_boot_id` on construction. Every spawn stamps the agent +with that id; the persisted state file carries it across restarts. + +`agent_list` defaults to **current-session only**: prior-session +agents that aren't still running are filtered out. Pass +`include_archived=true` to surface every record, with the +`from_prior_session: true` flag so the model can tell archived +records apart from live ones. + +Records that loaded from a pre-#405 persisted state file (no +`session_boot_id` field) classify as prior-session because the +manager can't match them to the current boot. + +## Output contract + +Every sub-agent produces a final result string with five sections, +in order: + +``` +SUMMARY: one paragraph; what you did and what happened +CHANGES: files modified, with one-line descriptions; "None." if read-only +EVIDENCE: path:line-range citations and key findings; one bullet each +RISKS: what could go wrong / what the parent should double-check +BLOCKERS: what stopped you; "None." if you finished cleanly +``` + +The exact format lives in `crates/tui/src/prompts/subagent_output_format.md`. +The parent reads `EVIDENCE` as a working set for the next turn, so +explorers and reviewers should be precise here. + +## Memory and the `remember` tool (#489) + +Sub-agents inherit the parent's memory file when memory is enabled +(`[memory] enabled = true` or `DEEPSEEK_MEMORY=on`). They can +append durable notes via the `remember` tool — handy for an +explorer that discovers a project convention worth carrying across +sessions, or a verifier that learns "this test is flaky". + +Memory writes are scoped to the user's own `memory.md` file; they +don't go through the standard write-approval flow. + +## Implementation notes + +- Source: `crates/tui/src/tools/subagent/mod.rs` (about 3500 LOC). +- Persisted state: `~/.deepseek/subagents.v1.json`. Schema version + `1` (forward-compatible — new optional fields use + `#[serde(default)]`). +- The `is_running` check ignores agents whose `task_handle` is + `None`; this avoids counting persisted-but-detached records + toward the concurrency cap (#509). +- `SharedSubAgentManager` is `Arc>` — read paths use + read locks so `/agents` and the sidebar projection don't block + the main loop during multi-agent fan-out (#510). diff --git a/docs/TOOL_SURFACE.md b/docs/TOOL_SURFACE.md index a8674895..b58470c7 100644 --- a/docs/TOOL_SURFACE.md +++ b/docs/TOOL_SURFACE.md @@ -138,7 +138,10 @@ Large logs and command outputs should be artifacts with compact summaries in the `agent_spawn` plus the supporting tools (`agent_result` / `wait` / `send_input` / `agent_assign` / `agent_cancel` / `resume_agent` / `agent_list`). -See `agent.txt` for the delegation protocol. +See `agent.txt` for the delegation protocol and +[`SUBAGENTS.md`](SUBAGENTS.md) for the role taxonomy +(`general` / `explore` / `plan` / `review` / `implementer` / +`verifier` / `custom`). ### Parallel fan-out: cost-class caps