Files
codewhale/docs/TOOL_SURFACE.md
T
Hunter Bown 82e4a564aa refactor(#35): tighten agent prompt tool descriptions, drop alias dupes
Tool-surface audit pass:

- FILE OPERATIONS rewritten so each line states the niche, not just the
  verb. read_file mentions PDF auto-extraction + `pages` slicing.
- New SEARCH section consolidates grep_files / file_search / web_search /
  fetch_url so the model sees them next to each other and picks the
  right one. fetch_url (#33) added; previously absent from the prompt.
- request_user_input pulled out of FILE OPERATIONS into its own USER
  section — it never belonged there.
- SUB-AGENTS list shrinks by 3: drops `spawn_agent` (use `agent_spawn`),
  `close_agent` (use `agent_cancel`), and the `agent_assign /
  assign_agent` dual-name. The underlying dispatchers still resolve those
  names, so existing sessions don't break — they just no longer
  pollute the model's tool list.

Adds `docs/TOOL_SURFACE.md` with the rationale, the v0.5.1 final
surface, and the dropped aliases. Calls out that grep_files is pure-Rust
(no rg/grep shell-out, so the "fall back to grep" AC from #35 is
vacuously satisfied — the tool has no shell dependency to fall back from).

Closes #35.
2026-04-25 13:44:43 -05:00

92 lines
3.8 KiB
Markdown

# Tool surface
Why these specific tools, in this groupings, and how each one is meant to be
chosen over the available shell equivalent. Companion to `crates/tui/src/prompts/agent.txt`.
## Design stance
- **Dedicated tools over `exec_shell` whenever the dedicated tool returns
structured output.** Bash escaping is error-prone and platform behavior
varies (GNU vs BSD `grep`, `rg` is not always installed). Structured
output also frees the model from re-parsing free-form text.
- **`exec_shell` for everything else.** Build, test, format, lint, ad-hoc
commands, anything platform-specific. We don't try to wrap the long tail.
- **Drop tools that don't beat their shell equivalent.** Two-tool aliases
for the same backing operation are a model trap — the LLM will alternate
between them and the cache hit rate suffers.
## Final surface (v0.5.1)
### File operations
| Tool | Niche |
|---|---|
| `read_file` | Read a UTF-8 file. PDFs auto-extracted via `pdftotext` (poppler) when available; `pages: "1-5"` slices large docs. |
| `list_dir` | Structured, gitignore-aware listing. Preferred over `exec_shell("ls")`. |
| `write_file` | Create or overwrite a file. |
| `edit_file` | Search-and-replace inside a single file. Cheaper than a full rewrite. |
| `apply_patch` | Apply a unified diff. The right tool for multi-hunk edits. |
### Search
| Tool | Niche |
|---|---|
| `grep_files` | Regex search file contents within the workspace; structured matches + context lines. Pure-Rust (`regex` crate), no `rg`/`grep` shell-out. |
| `file_search` | Fuzzy-match filenames (not contents). Use when you know roughly the name. |
| `web_search` | DuckDuckGo (with Bing fallback); ranked snippets + `ref_id` for citation. |
| `fetch_url` | Direct HTTP GET on a known URL. Faster than `web_search` when the link is already known. HTML stripped to text by default. |
### Shell
| Tool | Niche |
|---|---|
| `exec_shell` | Run a shell command. Foreground or background (`background: true` returns a `task_id`). |
| `exec_shell_wait` | Poll a background task for incremental output. |
| `exec_shell_interact` | Send stdin to a running background task and read incremental output. |
### Git / diagnostics / testing
| Tool | Niche |
|---|---|
| `git_status` | Inspect repo status without running shell. |
| `git_diff` | Inspect working-tree or staged diffs. |
| `diagnostics` | Workspace, git, sandbox, and toolchain info in one call. |
| `run_tests` | `cargo test` with optional args. |
### Task management
| Tool | Niche |
|---|---|
| `todo_write` | Granular per-item progress. |
| `update_plan` | Structured checklist for complex multi-step work. |
| `note` | One-off important fact for later. |
### Sub-agents
`agent_spawn`, `agent_swarm`, `spawn_agents_on_csv`, plus the supporting
tools (`agent_result` / `swarm_result` / `wait` / `send_input` /
`agent_assign` / `agent_cancel` / `resume_agent` / `agent_list` /
`report_agent_job_result` / `swarm_status`). See `agent.txt` for the
delegation protocol.
## Recently consolidated (v0.5.1)
Removed from the prompt as duplicates of equivalent tools (the underlying
dispatchers still resolve them, so existing sessions don't break — they just
no longer pollute the model's tool list):
- `spawn_agent` → use `agent_spawn`.
- `close_agent` → use `agent_cancel`.
- `assign_agent` → use `agent_assign`.
## Why we don't ship a single `bash` tool
Single-`bash` agents (Claude Code's design) are powerful but hand the model
all the foot-guns of shell scripting: quoting, platform divergence,
side-effects from misread cwd, `cd` not persisting between calls, etc. Our
file tools are also significantly cheaper to render in the transcript
(structured JSON-shaped output collapses better than `ls -la` walls of text).
The model can always fall back to `exec_shell` when something is missing.
The dedicated tools just take the common 80% off the shell escape-hatch.