dgf1988/codewhale

Hunter Bown d129ab4150 docs: SUBAGENTS.md — role taxonomy, lifecycle, output contract (#404 )

The role taxonomy expansion in #404 added Implementer + Verifier as
distinct postures alongside General / Explore / Plan / Review /
Custom. The issue body explicitly lists `docs/AGENTS.md or
docs/SUBAGENTS.md` as a target file; this commit creates that file.

Coverage:

- Role taxonomy table: stance, write/shell access, typical use per
  role.
- "When to pick which role": narrative guidance the model can read
  if the role choice isn't obvious.
- Alias map: every accepted spelling routed to a canonical role,
  matching what `SubAgentType::from_str` accepts.
- Concurrency cap: the 10-by-default value, the
  `[subagents].max_concurrent` knob, and the running-only
  semantics (#509).
- Lifecycle: Pending → Running → terminal states, plus
  `Interrupted` after a process restart.
- Session boundaries (#405): `session_boot_id` mechanics,
  default current-session filter, `include_archived=true` escape
  hatch, pre-#405 record handling.
- Output contract: the SUMMARY/CHANGES/EVIDENCE/RISKS/BLOCKERS
  format every sub-agent must produce.
- Memory + `remember` integration (#489): sub-agents inherit the
  parent's memory file when memory is enabled and can append durable
  notes.
- Implementation notes: source path, persisted state file,
  is_running semantics, RwLock pattern.

Cross-link added in `docs/TOOL_SURFACE.md` so the sub-agent section
points to this doc.

No Rust code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 04:09:27 -05:00


Sub-Agents

Sub-agents are background instances of the agent loop. The parent agent spawns one with a focused task, gets back an agent_id immediately, and continues working while the sub-agent runs to completion. Sub-agents inherit the parent's tool registry by default and run with CancellationToken::child_token(), so cancelling the parent cancels every descendant.

This doc covers the role taxonomy, the concurrency cap, the lifecycle, session boundaries, the output contract, and memory integration. For the orchestration tool surface (agent_spawn / agent_wait / agent_result / agent_cancel / agent_list / agent_send_input / agent_resume / agent_assign) see prompts/base.md "Sub-Agent Strategy" and the inline tool descriptions.

Role taxonomy

The agent_type field on agent_spawn selects a system-prompt posture for the child. Each role is a distinct stance toward the work — not just a different label.

| Role | Stance | Writes? | Runs shell? | Typical use |
|---|---|---|---|---|
| `general` | flexible; do whatever the parent says | yes | yes | the default; multi-step tasks |
| `explore` | read-only; map the relevant code fast | no | yes (read) | "find every call site of `Foo`" |
| `plan` | analyse and produce a strategy | minimal | minimal | "design the migration; don't execute" |
| `review` | read-and-grade with severity scores | no | no | "audit this PR for bugs" |
| `implementer` | land a specific change with min edit | yes | yes | "rewrite `bar.rs::Foo::bar` to do X" |
| `verifier` | run tests / validation, report outcome | no | yes (test) | "run cargo test --workspace, report" |
| `custom` | explicit narrow tool allowlist | depends | depends | locked-down dispatch with hand-picked tools |

Each role's full system prompt lives in crates/tui/src/tools/subagent/mod.rs (search for *_AGENT_PROMPT). The prompt prefix loads automatically when the child agent boots; the parent's spawn prompt becomes the first turn's user message.

When to pick which role

- general: when the task is "do this whole thing", not "go look", "design", or "verify". This is the right default; reach for a more specific role only when the posture matters.
- explore: when the parent needs evidence before deciding what to do next. Explorers are cheap and fast; spawn 2–3 in parallel for independent regions.
- plan: when the parent has an objective but no executable decomposition. Planners write artifacts (update_plan rows, checklist_write entries) but don't carry them out.
- review: when there's already a change and the parent wants it graded. Reviewers don't patch; they describe the fix in the finding so the parent can dispatch an Implementer if the verdict is "fix it".
- implementer: when the change is already specified and just needs to land. Implementers stay tightly scoped: minimum edit, no drive-by refactoring, and a quick verification run before handing back.
- verifier: when the parent needs an authoritative pass/fail on the test suite or other validation. Verifiers don't fix failures; they capture the failing assertion + stack and put fix candidates under RISKS.
- custom: only when the parent needs to constrain the tool set explicitly. Pass the allowlist via the allowed_tools field on agent_spawn.

Aliases

The model can spell each role multiple ways:

| Canonical | Aliases |
|---|---|
| `general` | `worker`, `default`, `general-purpose` |
| `explore` | `explorer`, `exploration` |
| `plan` | `planning`, `awaiter` |
| `review` | `reviewer`, `code-review` |
| `implementer` | `implement`, `implementation`, `builder` |
| `verifier` | `verify`, `verification`, `validator`, `tester` |
| `custom` | (none; explicit `allowed_tools` array required) |

All matching is case-insensitive. Unknown values produce a typed error listing the accepted set, so the model can self-correct on the next turn.
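The alias routing above can be sketched as a `FromStr` implementation. This is a minimal illustration, not the code in `mod.rs`: the `Role` enum, the `UnknownRole` error, and the `CANONICAL` list are hypothetical names, and the error wording is invented. Only the alias spellings and the case-insensitive matching come from the table.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for the real role enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    General,
    Explore,
    Plan,
    Review,
    Implementer,
    Verifier,
    Custom,
}

/// Hypothetical typed error that carries the rejected spelling.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownRole(pub String);

/// Canonical names, listed in the error so the caller can self-correct.
pub const CANONICAL: [&str; 7] = [
    "general", "explore", "plan", "review", "implementer", "verifier", "custom",
];

impl fmt::Display for UnknownRole {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown agent_type '{}'; accepted: {}", self.0, CANONICAL.join(", "))
    }
}

impl FromStr for Role {
    type Err = UnknownRole;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Lowercase once so every comparison is case-insensitive.
        match s.to_ascii_lowercase().as_str() {
            "general" | "worker" | "default" | "general-purpose" => Ok(Role::General),
            "explore" | "explorer" | "exploration" => Ok(Role::Explore),
            "plan" | "planning" | "awaiter" => Ok(Role::Plan),
            "review" | "reviewer" | "code-review" => Ok(Role::Review),
            "implementer" | "implement" | "implementation" | "builder" => Ok(Role::Implementer),
            "verifier" | "verify" | "verification" | "validator" | "tester" => Ok(Role::Verifier),
            "custom" => Ok(Role::Custom),
            _ => Err(UnknownRole(s.to_string())),
        }
    }
}
```

With this sketch, `"Builder".parse::<Role>()` resolves to `Role::Implementer`, and an unknown spelling yields an error whose message lists the canonical names.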

Concurrency cap

The dispatcher caps concurrent sub-agents at 10 by default (configurable via [subagents].max_concurrent in ~/.deepseek/config.toml, hard ceiling 20). When the parent hits the cap, agent_spawn returns an error with the cap value; the parent should agent_wait for completion or agent_cancel to free a slot before retrying.

The cap counts only running agents — completed / failed / cancelled records persist for inspection but don't occupy a slot. Agents that lost their task_handle (e.g. across a process restart) also don't count against the cap.
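The running-only semantics can be illustrated with a small slot check. This is a sketch under stated assumptions: `AgentRecord`, `Status`, `effective_cap`, and `check_slot` are hypothetical names, `has_task_handle` stands in for a `task_handle.is_some()` check, and clamping an oversized configured value to the ceiling is assumed behaviour that the text above does not confirm.

```rust
/// Hypothetical, simplified status; the real record carries more detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

/// Hypothetical record shape used only for this illustration.
pub struct AgentRecord {
    pub status: Status,
    /// Stand-in for "the task handle is still attached".
    pub has_task_handle: bool,
}

pub const DEFAULT_MAX_CONCURRENT: usize = 10;
pub const HARD_CEILING: usize = 20;

/// Resolve the cap from an optional configured value.
/// Assumption: values above the ceiling are clamped rather than rejected.
pub fn effective_cap(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_MAX_CONCURRENT).min(HARD_CEILING)
}

/// Count only agents that are Running and still have a task handle.
pub fn running_count(records: &[AgentRecord]) -> usize {
    records
        .iter()
        .filter(|r| r.status == Status::Running && r.has_task_handle)
        .count()
}

/// Return an error that names the cap when no slot is free.
pub fn check_slot(records: &[AgentRecord], cap: usize) -> Result<(), String> {
    let running = running_count(records);
    if running >= cap {
        Err(format!("sub-agent cap reached: {} running, cap is {}", running, cap))
    } else {
        Ok(())
    }
}
```

Finished records and detached `Running` records are skipped by `running_count`, so they never block a new spawn in this model.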

Lifecycle

Each spawn produces a record that progresses through:

Pending → Running → (Completed | Failed(reason) | Cancelled | Interrupted(reason))

Interrupted fires when the manager detects a Running agent whose task handle is gone — typically after a process restart that loaded the agent from ~/.deepseek/subagents.v1.json. The parent can agent_resume to attempt continuation or treat it as a terminal state.
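The state progression and the restart check can be modelled roughly as follows. The enum mirrors the diagram above, but the type name `AgentStatus`, the `reconcile` helper, and the reason string are hypothetical; the real manager may detect the missing handle differently.

```rust
/// States from the lifecycle diagram; the type name is illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AgentStatus {
    Pending,
    Running,
    Completed,
    Failed(String),
    Cancelled,
    Interrupted(String),
}

impl AgentStatus {
    /// Everything other than Pending and Running sits on the terminal side
    /// of the diagram. Interrupted is included, although the parent may
    /// still try to resume it.
    pub fn is_terminal(&self) -> bool {
        !matches!(self, AgentStatus::Pending | AgentStatus::Running)
    }
}

/// Hypothetical reconciliation step for a record loaded from disk:
/// a Running record without a live task handle becomes Interrupted.
pub fn reconcile(status: AgentStatus, has_task_handle: bool) -> AgentStatus {
    match status {
        AgentStatus::Running if !has_task_handle => {
            AgentStatus::Interrupted("task handle missing after restart".to_string())
        }
        other => other,
    }
}
```

Only `Running` records are rewritten; a record that had already finished before the restart keeps its stored state.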

Session boundaries (#405)

Each SubAgentManager instance assigns itself a fresh session_boot_id on construction. Every spawn stamps the agent with that id; the persisted state file carries it across restarts.

agent_list defaults to current-session only: prior-session agents that aren't still running are filtered out. Pass include_archived=true to surface every record; prior-session records then carry from_prior_session: true so the model can tell archived records apart from live ones.

Records loaded from a pre-#405 persisted state file (no session_boot_id field) are classified as prior-session because the manager can't match them to the current boot.
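A rough model of the filtering rule, assuming the boot id is stored as an optional string. `Record`, `Listed`, and `list_agents` are hypothetical names for illustration; only the field names `session_boot_id` and `from_prior_session` and the `include_archived` flag come from the text.

```rust
/// Hypothetical persisted record; the real struct has more fields.
pub struct Record {
    pub agent_id: String,
    /// `None` models a record written before #405.
    pub session_boot_id: Option<String>,
    pub running: bool,
}

/// Hypothetical list entry that pairs a record with its session flag.
pub struct Listed<'a> {
    pub record: &'a Record,
    pub from_prior_session: bool,
}

/// Apply the default current-session filter, or return everything when
/// `include_archived` is set.
pub fn list_agents<'a>(
    records: &'a [Record],
    current_boot_id: &str,
    include_archived: bool,
) -> Vec<Listed<'a>> {
    records
        .iter()
        .filter_map(|r| {
            // A missing id can never equal the current boot id, so
            // pre-#405 records classify as prior-session.
            let from_prior_session = r.session_boot_id.as_deref() != Some(current_boot_id);
            if from_prior_session && !r.running && !include_archived {
                None
            } else {
                Some(Listed { record: r, from_prior_session })
            }
        })
        .collect()
}
```

In this model a prior-session agent that is still running stays visible by default, which matches the "aren't still running" wording above.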

Output contract

Every sub-agent produces a final result string with five sections, in order:

SUMMARY:    one paragraph; what you did and what happened
CHANGES:    files modified, with one-line descriptions; "None." if read-only
EVIDENCE:   path:line-range citations and key findings; one bullet each
RISKS:      what could go wrong / what the parent should double-check
BLOCKERS:   what stopped you; "None." if you finished cleanly

The exact format lives in crates/tui/src/prompts/subagent_output_format.md. The parent reads EVIDENCE as a working set for the next turn, so explorers and reviewers should be precise here.
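A parent-side sanity check for the five-section contract might look like the sketch below. It is an assumption that headers start a line and end with a colon, as in the listing above; `SECTIONS` and `check_sections` are hypothetical names, and nothing here is taken from subagent_output_format.md.

```rust
/// Section headers in the order the contract requires.
pub const SECTIONS: [&str; 5] = ["SUMMARY:", "CHANGES:", "EVIDENCE:", "RISKS:", "BLOCKERS:"];

/// Check that every header appears at the start of a line, in contract order.
/// On failure, return the first header that is missing or out of order.
pub fn check_sections(result: &str) -> Result<(), &'static str> {
    let mut lines = result.lines();
    for header in SECTIONS {
        // `any` consumes lines up to and including the match, so the next
        // header is only searched for after the current one.
        if !lines.any(|line| line.trim_start().starts_with(header)) {
            return Err(header);
        }
    }
    Ok(())
}
```

Because the line iterator is shared across headers, a result that puts EVIDENCE before CHANGES is rejected even though all five headers are present.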

Memory and the `remember` tool (#489)

Sub-agents inherit the parent's memory file when memory is enabled ([memory] enabled = true or DEEPSEEK_MEMORY=on). They can append durable notes via the remember tool — handy for an explorer that discovers a project convention worth carrying across sessions, or a verifier that learns "this test is flaky".

Memory writes are scoped to the user's own memory.md file; they don't go through the standard write-approval flow.

Implementation notes

- Source: crates/tui/src/tools/subagent/mod.rs (about 3500 LOC).
- Persisted state: ~/.deepseek/subagents.v1.json. Schema version 1 (forward-compatible: new optional fields use #[serde(default)]).
- The is_running check ignores agents whose task_handle is None; this avoids counting persisted-but-detached records toward the concurrency cap (#509).
- SharedSubAgentManager is Arc<RwLock<...>>: read paths use read locks so /agents and the sidebar projection don't block the main loop during multi-agent fan-out (#510).
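The shared-manager pattern in the last note can be sketched with the standard library. This uses `std::sync::RwLock` purely for illustration; the real `SharedSubAgentManager` may wrap an async lock instead, and `Manager`, `SharedManager`, `list_ids`, and `insert` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical manager state: agent id mapped to a status label.
#[derive(Default)]
pub struct Manager {
    agents: HashMap<String, String>,
}

/// Shared handle, in the spirit of `Arc<RwLock<...>>` from the note above.
pub type SharedManager = Arc<RwLock<Manager>>;

/// Read path: takes a read lock, so concurrent listings don't block each other.
pub fn list_ids(shared: &SharedManager) -> Vec<String> {
    let guard = shared.read().expect("lock poisoned");
    let mut ids: Vec<String> = guard.agents.keys().cloned().collect();
    ids.sort();
    ids
}

/// Write path: takes the exclusive lock only for the duration of the insert.
pub fn insert(shared: &SharedManager, id: &str, status: &str) {
    let mut guard = shared.write().expect("lock poisoned");
    guard.agents.insert(id.to_string(), status.to_string());
}
```

Several readers can hold the lock at once, so a listing view and a sidebar refresh would only wait while a writer holds the exclusive lock.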
