diff --git a/crates/tui/src/prompts/base.md b/crates/tui/src/prompts/base.md
index aa257eaa..fa46dcbb 100644
--- a/crates/tui/src/prompts/base.md
+++ b/crates/tui/src/prompts/base.md
@@ -33,16 +33,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking. Batch independent tool calls in a single turn.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.
 
 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.
 
-## RLM Is First-Class
+## RLM Is a Specialty Tool
 
-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead. Those are faster, cheaper, and easier to reason about.
 
-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code. You only call `rlm` itself from the model side.
 
 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
diff --git a/crates/tui/src/prompts/base.txt b/crates/tui/src/prompts/base.txt
index 962fbdf0..47cb64a9 100644
--- a/crates/tui/src/prompts/base.txt
+++ b/crates/tui/src/prompts/base.txt
@@ -9,16 +9,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.
 
 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.
 
-## RLM Is First-Class
+## RLM Is a Specialty Tool
 
-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead.
 
-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code.
 
 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
diff --git a/crates/tui/src/tools/rlm.rs b/crates/tui/src/tools/rlm.rs
index cfc1f0a9..80a13138 100644
--- a/crates/tui/src/tools/rlm.rs
+++ b/crates/tui/src/tools/rlm.rs
@@ -54,15 +54,26 @@ impl ToolSpec for RlmTool {
     }
 
     fn description(&self) -> &'static str {
-        "Heavy-lift recursive language model. Use when you have a long input \
-         (a whole file, a long transcript, a doc) that doesn't fit in your \
-         working context. The input is loaded into a sandboxed Python REPL \
-         where a sub-agent writes code to chunk and process it via sub-LLM \
-         calls, and returns a synthesized answer. Provide `task` (what to \
-         do) plus exactly one of `file_path` (relative to workspace, \
-         preferred) or `content` (inline, capped at 200k chars). Slower and \
-         pricier than `read_file` / `rlm_query` — only reach for it when \
-         the input genuinely doesn't fit. Returns the final answer string."
+        "Specialty tool for processing long inputs that don't fit in your \
+         own context window. Loads the input into a sandboxed Python REPL \
+         as `PROMPT`; a sub-agent writes Python that chunks the input and \
+         calls in-REPL helpers (`llm_query`, `llm_query_batched`, \
+         `rlm_query`, `rlm_query_batched`) to process it, then returns a \
+         synthesized answer. \n\n\
+         DO NOT use this tool when: the input fits in your context (just \
+         use `read_file` and reason directly); a `grep_files` / \
+         `exec_shell` pipeline would answer the question; the task is a \
+         short classification or extraction; you need interactive \
+         iterative exploration (rlm is one-shot batch). \n\n\
+         Use this tool only when the input is genuinely too large to load \
+         (a whole file > 50K tokens, a long transcript, a multi-document \
+         corpus). It is slower and more expensive than direct reasoning. \n\n\
+         Provide `task` (what to do) plus exactly one of `file_path` \
+         (workspace-relative, preferred — keeps the long input out of \
+         your context entirely) or `content` (inline, capped at 200k \
+         chars). The Python helpers (`llm_query`, `rlm_query`, etc.) live \
+         INSIDE the REPL — they are not separately-callable tools. \n\n\
+         Returns the final synthesized answer as a string."
     }
 
     fn input_schema(&self) -> Value {