docs(prompts): tighten /rlm guidance — specialty tool, not first-class (#358)

The previous rlm prompt guidance ("Treat rlm as a normal reasoning tool, not a last-resort escape hatch") encouraged the model to reach for rlm in cases where a direct read_file or focused agent_spawn would do better. The "RLM Is First-Class" framing was too encouraging given that rlm is genuinely a specialty tool: it pays off ONLY when the input can't fit in the model's context window.

Three audit items from #358 addressed:

1. **Reaching for rlm too often.** Reframed as "specialty tool" with explicit do-not-use-when guidance front-loaded. The decomposition workflow now says "ONLY when an input genuinely doesn't fit" with a concrete size threshold (~50K tokens / a whole file / a long transcript / a multi-document corpus).
2. **Tool description encourages overuse.** The rlm tool's description() now leads with "DO NOT use this tool when..." (input fits, grep suffices, short classification, interactive exploration), and only then describes the legitimate use cases. Adds explicit cost/speed caveat.
3. **Helpers documented as if they were tools.** Both the rlm tool description and base.md/base.txt now state plainly: `llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched` live INSIDE the Python REPL. They are functions the sub-agent uses, NOT separately-callable tools the model invokes.

Closes #358.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 10:16:31 -05:00
parent bc13dbfee7
commit f0e1a6c63a
3 changed files with 32 additions and 17 deletions
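
The routing rule described in the commit message can be sketched as a small decision function. This is an illustrative sketch only, not code from this commit: `Route`, `estimate_tokens`, `choose_route`, and `RLM_TOKEN_THRESHOLD` are hypothetical names, and the 4-characters-per-token estimate is a rough assumption rather than a real tokenizer. Only the ~50K-token figure and the preference for `read_file` / `grep_files` over `rlm` come from the text above.

```rust
/// Hypothetical routing outcome; these variants only mirror the tool
/// names mentioned in the commit message, not a real API.
#[derive(Debug, PartialEq)]
enum Route {
    ReadFile,
    GrepFiles,
    Rlm,
}

/// Threshold mirroring the "~50K tokens" figure from the commit message.
const RLM_TOKEN_THRESHOLD: usize = 50_000;

/// Rough token estimate, assuming about 4 characters per token.
/// This is a heuristic assumption, not a tokenizer.
fn estimate_tokens(char_count: usize) -> usize {
    char_count / 4
}

/// Pick a tool for an input of `char_count` characters.
/// `answerable_by_search` is a hypothetical flag meaning that a grep-style
/// search would already answer the question.
fn choose_route(char_count: usize, answerable_by_search: bool) -> Route {
    if answerable_by_search {
        // A search pipeline is cheaper than any LLM pass over the input.
        Route::GrepFiles
    } else if estimate_tokens(char_count) > RLM_TOKEN_THRESHOLD {
        // Only inputs that genuinely do not fit go to the specialty tool.
        Route::Rlm
    } else {
        // Everything else is read and reasoned over directly.
        Route::ReadFile
    }
}

fn main() {
    println!("{:?}", choose_route(12_000, false));
    println!("{:?}", choose_route(400_000, false));
    println!("{:?}", choose_route(400_000, true));
}
```

The ordering of the checks reflects the intent of the change: the cheaper options are considered first, and `rlm` is reached only when size rules out direct reasoning.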
@@ -33,16 +33,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking. Batch independent tool calls in a single turn.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.

 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.

-## RLM Is First-Class
+## RLM Is a Specialty Tool

-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead. Those are faster, cheaper, and easier to reason about.

-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code. You only call `rlm` itself from the model side.

 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
@@ -9,16 +9,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.

 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.

-## RLM Is First-Class
+## RLM Is a Specialty Tool

-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead.

-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code.

 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
@@ -54,15 +54,26 @@ impl ToolSpec for RlmTool {
    }

    fn description(&self) -> &'static str {
-        "Heavy-lift recursive language model. Use when you have a long input \
-         (a whole file, a long transcript, a doc) that doesn't fit in your \
-         working context. The input is loaded into a sandboxed Python REPL \
-         where a sub-agent writes code to chunk and process it via sub-LLM \
-         calls, and returns a synthesized answer. Provide `task` (what to \
-         do) plus exactly one of `file_path` (relative to workspace, \
-         preferred) or `content` (inline, capped at 200k chars). Slower and \
-         pricier than `read_file` / `rlm_query` — only reach for it when \
-         the input genuinely doesn't fit. Returns the final answer string."
+        "Specialty tool for processing long inputs that don't fit in your \
+         own context window. Loads the input into a sandboxed Python REPL \
+         as `PROMPT`; a sub-agent writes Python that chunks the input and \
+         calls in-REPL helpers (`llm_query`, `llm_query_batched`, \
+         `rlm_query`, `rlm_query_batched`) to process it, then returns a \
+         synthesized answer. \n\n\
+         DO NOT use this tool when: the input fits in your context (just \
+         use `read_file` and reason directly); a `grep_files` / \
+         `exec_shell` pipeline would answer the question; the task is a \
+         short classification or extraction; you need interactive \
+         iterative exploration (rlm is one-shot batch). \n\n\
+         Use this tool only when the input is genuinely too large to load \
+         (a whole file > 50K tokens, a long transcript, a multi-document \
+         corpus). It is slower and more expensive than direct reasoning. \n\n\
+         Provide `task` (what to do) plus exactly one of `file_path` \
+         (workspace-relative, preferred — keeps the long input out of \
+         your context entirely) or `content` (inline, capped at 200k \
+         chars). The Python helpers (`llm_query`, `rlm_query`, etc.) live \
+         INSIDE the REPL — they are not separately-callable tools. \n\n\
+         Returns the final synthesized answer as a string."
    }

    fn input_schema(&self) -> Value {