refactor(#35): tighten agent prompt tool descriptions, drop alias dupes

Tool-surface audit pass:

- FILE OPERATIONS rewritten so each line states the niche, not just the
  verb. read_file mentions PDF auto-extraction + `pages` slicing.
- New SEARCH section consolidates grep_files / file_search / web_search /
  fetch_url so the model sees them next to each other and picks the right
  one. fetch_url (#33) added; previously absent from the prompt.
- request_user_input pulled out of FILE OPERATIONS into its own USER
  section — it never belonged there.
- SUB-AGENTS list shrinks by 3: drops `spawn_agent` (use `agent_spawn`),
  `close_agent` (use `agent_cancel`), and the `agent_assign / assign_agent`
  dual-name. The underlying dispatchers still resolve those names, so
  existing sessions don't break — they just no longer pollute the model's
  tool list.

Adds `docs/TOOL_SURFACE.md` with the rationale, the v0.5.1 final surface,
and the dropped aliases. Calls out that grep_files is pure-Rust (no
rg/grep shell-out, so the "fall back to grep" AC from #35 is vacuously
satisfied — the tool has no shell dependency to fall back from).

Closes #35.
@@ -35,15 +35,21 @@ Step budgeting:
 
 Available tools:
 
-FILE OPERATIONS:
-- list_dir: List directory contents
-- read_file: Read file contents
-- write_file: Create or overwrite a file
-- edit_file: Search and replace text in a file
-- apply_patch: Apply a unified diff patch to a file
-- grep_files: Search files by regex
-- web_search: Quick web search (fallback when citations are not needed)
-- request_user_input: Ask the user short multiple-choice questions
+FILE OPERATIONS (prefer these over `exec_shell` equivalents — they return structured output):
+- read_file: Read a file. PDFs are auto-extracted via pdftotext; pass `pages: "1-5"` to slice.
+- list_dir: List directory contents (structured, gitignore-aware).
+- write_file: Create or overwrite a file.
+- edit_file: Search-and-replace inside a single file. Cheaper than rewriting.
+- apply_patch: Apply a unified diff patch — the right tool for multi-hunk edits.
+
+SEARCH:
+- grep_files: Regex search file contents within the workspace; returns matches + context lines.
+- file_search: Fuzzy-match filenames (NOT contents). Use to locate a file when you know roughly the name.
+- web_search: DuckDuckGo/Bing search; returns ranked snippets with ref_ids for citation.
+- fetch_url: Direct HTTP GET on a known URL (faster than web_search when the link is already known). HTML is stripped to text by default.
+
+USER:
+- request_user_input: Ask the user a short multiple-choice question.
 
 PARALLEL TOOL USE:
 - Issue independent tool calls in parallel by emitting multiple tool_calls in one assistant turn (the model API supports this natively). Do not wrap them in any meta-tool or pseudo-XML.
@@ -79,21 +85,18 @@ TASK MANAGEMENT:
 - note: Record important information
 
 SUB-AGENTS:
-- spawn_agent: Spawn a background sub-agent (agent_type, message/items)
-- agent_spawn: Spawn a background sub-agent (type, prompt, allowed_tools)
-- spawn_agents_on_csv: Batch-process CSV rows with one worker sub-agent per row
-- report_agent_job_result: Worker-only job row report tool for spawn_agents_on_csv
-- agent_swarm: Spawn a dependency-aware swarm of sub-agents (tasks, shared_context)
-- swarm_status: Check status for a previously started swarm (swarm_id)
-- swarm_result: Get full results for a previously started swarm (swarm_id, optional block/timeout)
-- agent_result: Get result from a sub-agent (agent_id, block, timeout_ms)
-- send_input: Send input to a running sub-agent (agent_id, message/items, interrupt)
-- agent_assign / assign_agent: Update assignment objective/role and optionally push immediate guidance
-- wait: Wait for one or more sub-agents to complete (ids optional, wait_mode:any|all, timeout_ms)
-- agent_cancel: Cancel a running sub-agent (agent_id)
-- close_agent: Close a running sub-agent (alias for cancel)
-- resume_agent: Resume a previously closed/completed sub-agent
-- agent_list: List all sub-agents and their status
+- agent_spawn: Spawn a background sub-agent (type, prompt, allowed_tools).
+- spawn_agents_on_csv: Batch-process CSV rows with one worker sub-agent per row.
+- report_agent_job_result: Worker-only job row report tool for spawn_agents_on_csv.
+- agent_swarm: Spawn a dependency-aware swarm of sub-agents (tasks, shared_context).
+- swarm_status / swarm_result: Inspect a swarm by swarm_id (status; or full results, with optional block/timeout).
+- agent_result: Get result from a sub-agent (agent_id, block, timeout_ms).
+- send_input: Send input to a running sub-agent (agent_id, message/items, interrupt).
+- agent_assign: Update assignment objective/role and optionally push immediate guidance.
+- wait: Wait for one or more sub-agents to complete (ids optional, wait_mode:any|all, timeout_ms).
+- agent_cancel: Cancel a running sub-agent (agent_id).
+- resume_agent: Resume a previously closed/completed sub-agent.
+- agent_list: List all sub-agents and their status.
 Delegation protocol:
 - Delegate only bounded, parallelizable work with a clear input, expected output, and tool limits.
 - Prefer multiple sub-agents for independent steps to maximize parallelism.
@@ -0,0 +1,91 @@
# Tool surface

Why these specific tools, in these groupings, and how each one is meant to be
chosen over the available shell equivalent. Companion to `crates/tui/src/prompts/agent.txt`.

## Design stance

- **Dedicated tools over `exec_shell` whenever the dedicated tool returns
  structured output.** Bash escaping is error-prone and platform behavior
  varies (GNU vs BSD `grep`, `rg` is not always installed). Structured
  output also frees the model from re-parsing free-form text.
- **`exec_shell` for everything else.** Build, test, format, lint, ad-hoc
  commands, anything platform-specific. We don't try to wrap the long tail.
- **Drop tools that don't beat their shell equivalent.** Two-tool aliases
  for the same backing operation are a model trap — the LLM will alternate
  between them and the cache hit rate suffers.

## Final surface (v0.5.1)

### File operations

| Tool | Niche |
|---|---|
| `read_file` | Read a UTF-8 file. PDFs auto-extracted via `pdftotext` (poppler) when available; `pages: "1-5"` slices large docs. |
| `list_dir` | Structured, gitignore-aware listing. Preferred over `exec_shell("ls")`. |
| `write_file` | Create or overwrite a file. |
| `edit_file` | Search-and-replace inside a single file. Cheaper than a full rewrite. |
| `apply_patch` | Apply a unified diff. The right tool for multi-hunk edits. |

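The `pages: "1-5"` argument on `read_file` reads as an inclusive page range. The following is only a minimal sketch of how such a spec might be parsed, assuming 1-based inclusive bounds and also accepting a single page such as `"3"`. `parse_pages` is a hypothetical helper introduced for illustration; the actual parser behind `read_file` may accept other forms.

```rust
/// Hypothetical helper: parse a `pages` spec such as "1-5" (or a single
/// page like "3") into an inclusive, 1-based (first, last) pair.
/// Assumption: this mirrors the `pages: "1-5"` shape shown in the table;
/// it is not the tool's real implementation.
fn parse_pages(spec: &str) -> Result<(u32, u32), String> {
    let spec = spec.trim();
    // "1-5" splits into ("1", "5"); a bare "3" is treated as "3-3".
    let (first, last) = match spec.split_once('-') {
        Some((a, b)) => (a.trim(), b.trim()),
        None => (spec, spec),
    };
    let first: u32 = first
        .parse()
        .map_err(|_| format!("invalid first page in {spec:?}"))?;
    let last: u32 = last
        .parse()
        .map_err(|_| format!("invalid last page in {spec:?}"))?;
    // Reject page 0 and reversed ranges instead of silently fixing them.
    if first == 0 || last < first {
        return Err(format!("page range must be 1-based and ascending: {spec:?}"));
    }
    Ok((first, last))
}
```

Returning an error for malformed or reversed ranges, rather than clamping, keeps the failure visible to the caller so it can retry with a corrected argument.
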
### Search

| Tool | Niche |
|---|---|
| `grep_files` | Regex search file contents within the workspace; structured matches + context lines. Pure-Rust (`regex` crate), no `rg`/`grep` shell-out. |
| `file_search` | Fuzzy-match filenames (not contents). Use when you know roughly the name. |
| `web_search` | DuckDuckGo (with Bing fallback); ranked snippets + `ref_id` for citation. |
| `fetch_url` | Direct HTTP GET on a known URL. Faster than `web_search` when the link is already known. HTML stripped to text by default. |

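To illustrate what "structured matches + context lines" can look like, here is a rough, std-only sketch. It is an assumption-laden illustration, not the `grep_files` implementation: plain substring matching stands in for the `regex` crate so the example has no dependencies, and the `Hit` struct and `search_with_context` function are hypothetical names.

```rust
/// One hit from a content search: 1-based line number, the matching line,
/// and up to `context` lines on either side.
/// Hypothetical shape; the real `grep_files` output may differ.
#[derive(Debug, PartialEq)]
struct Hit {
    line_no: usize,
    line: String,
    before: Vec<String>,
    after: Vec<String>,
}

/// Search `text` line by line and return every hit with its context.
/// Assumption: substring matching replaces regex matching here purely to
/// keep the sketch self-contained.
fn search_with_context(text: &str, needle: &str, context: usize) -> Vec<Hit> {
    let lines: Vec<&str> = text.lines().collect();
    let mut hits = Vec::new();
    for (idx, line) in lines.iter().enumerate() {
        if !line.contains(needle) {
            continue;
        }
        // Clamp the context window to the bounds of the file.
        let start = idx.saturating_sub(context);
        let end = (idx + 1 + context).min(lines.len());
        hits.push(Hit {
            line_no: idx + 1,
            line: line.to_string(),
            before: lines[start..idx].iter().map(|s| s.to_string()).collect(),
            after: lines[idx + 1..end].iter().map(|s| s.to_string()).collect(),
        });
    }
    hits
}
```

Because each hit carries its own line number and context, a caller does not have to re-parse `grep`-style text output, which is the point the design stance makes about structured results.
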
### Shell

| Tool | Niche |
|---|---|
| `exec_shell` | Run a shell command. Foreground or background (`background: true` returns a `task_id`). |
| `exec_shell_wait` | Poll a background task for incremental output. |
| `exec_shell_interact` | Send stdin to a running background task and read incremental output. |

### Git / diagnostics / testing

| Tool | Niche |
|---|---|
| `git_status` | Inspect repo status without running shell. |
| `git_diff` | Inspect working-tree or staged diffs. |
| `diagnostics` | Workspace, git, sandbox, and toolchain info in one call. |
| `run_tests` | `cargo test` with optional args. |

### Task management

| Tool | Niche |
|---|---|
| `todo_write` | Granular per-item progress. |
| `update_plan` | Structured checklist for complex multi-step work. |
| `note` | One-off important fact for later. |

### Sub-agents

`agent_spawn`, `agent_swarm`, `spawn_agents_on_csv`, plus the supporting
tools (`agent_result` / `swarm_result` / `wait` / `send_input` /
`agent_assign` / `agent_cancel` / `resume_agent` / `agent_list` /
`report_agent_job_result` / `swarm_status`). See `agent.txt` for the
delegation protocol.

## Recently consolidated (v0.5.1)

Removed from the prompt as duplicates of equivalent tools (the underlying
dispatchers still resolve them, so existing sessions don't break — they just
no longer pollute the model's tool list):

- `spawn_agent` → use `agent_spawn`.
- `close_agent` → use `agent_cancel`.
- `assign_agent` → use `agent_assign`.

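The "still resolved, no longer advertised" arrangement for the three aliases above can be pictured as a name-canonicalization step ahead of dispatch. This is a hypothetical sketch only: the alias pairs come from the list above, while `canonical_tool_name`, `is_legacy_alias`, and the single-lookup design are assumptions, not the project's dispatcher code.

```rust
/// Hypothetical sketch: map a legacy tool name to its canonical name before
/// dispatch. Names that are not legacy aliases pass through unchanged.
fn canonical_tool_name(name: &str) -> &str {
    match name {
        "spawn_agent" => "agent_spawn",
        "close_agent" => "agent_cancel",
        "assign_agent" => "agent_assign",
        other => other,
    }
}

/// True when `name` is one of the dropped aliases, i.e. a name that is
/// still accepted on input but should not appear in the prompt's tool list.
fn is_legacy_alias(name: &str) -> bool {
    canonical_tool_name(name) != name
}
```

Under this sketch, an older session that calls `close_agent` is routed to the same handler as `agent_cancel`, while the prompt lists only the canonical name.
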
## Why we don't ship a single `bash` tool

Single-`bash` agents (Claude Code's design) are powerful but hand the model
all the foot-guns of shell scripting: quoting, platform divergence,
side-effects from misread cwd, `cd` not persisting between calls, etc. Our
file tools are also significantly cheaper to render in the transcript
(structured JSON-shaped output collapses better than `ls -la` walls of text).

The model can always fall back to `exec_shell` when something is missing.
The dedicated tools just take the common 80% off the shell escape-hatch.