diff --git a/crates/tui/src/prompts/base.md b/crates/tui/src/prompts/base.md
index aa257eaa..fa46dcbb 100644
--- a/crates/tui/src/prompts/base.md
+++ b/crates/tui/src/prompts/base.md
@@ -33,16 +33,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking. Batch independent tool calls in a single turn.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.
 
 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.
 
-## RLM Is First-Class
+## RLM Is a Specialty Tool
 
-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead. Those are faster, cheaper, and easier to reason about.
 
-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately callable tools; they are functions the sub-agent uses inside its Python code. From the model side, you call only `rlm` itself.
 
 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
diff --git a/crates/tui/src/prompts/base.txt b/crates/tui/src/prompts/base.txt
index 962fbdf0..47cb64a9 100644
--- a/crates/tui/src/prompts/base.txt
+++ b/crates/tui/src/prompts/base.txt
@@ -9,16 +9,18 @@ Your default workflow for any non-trivial request:
 2. **Execute** — work through each checklist item, updating status as you go.
 3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `checklist_write` (granular steps).
 4. **For parallel work**, spawn sub-agents (`agent_spawn`) — each does one thing well. Link them to plan/todo items in your thinking.
-5. **For long inputs, recursive sub-LLM work, or high-leverage parallel reasoning**, use `rlm` — it loads input into a Python REPL as `context` and runs sub-LLM calls there so long strings and batched deliberation stay out of your window.
+5. **Only when an input genuinely doesn't fit your context window** — a whole file > ~50K tokens, a long transcript, a multi-document corpus — use `rlm`. It loads the input into a Python REPL where a sub-agent processes it. For shorter inputs, use `read_file` and reason directly.
 6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.
 
 **Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.
 
-## RLM Is First-Class
+## RLM Is a Specialty Tool
 
-Treat `rlm` as a normal reasoning tool, not a last-resort escape hatch. Reach for it when you need independent second opinions, batched issue triage, design-option comparison, test-plan generation, risky implementation review, or map-reduce over bulky artifacts. Ask bounded questions with explicit inputs and expected output shape.
+`rlm` is for one specific shape of work: a long input that genuinely does not fit in your context (a whole file > ~50K tokens, a long transcript, a multi-document corpus). Reach for it ONLY when direct reasoning over the input is impossible because of its size. For everything else — short inputs, focused questions, parallel exploration — use `read_file`, `grep_files`, or `agent_spawn` instead.
 
-`rlm` output is advisory. Use it to find blind spots and alternate routes, then ground decisions in local files, live tool output, GitHub issue text, and passing verification before claiming completion.
+When you do use `rlm`, ask bounded questions with explicit inputs and expected output shape. The result is advisory — ground decisions in local files, live tool output, and passing verification before claiming completion.
+
+The Python helpers visible inside the REPL (`llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched`) are NOT separately callable tools; they are functions the sub-agent uses inside its Python code.
 
 ## Context
 You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.
diff --git a/crates/tui/src/tools/rlm.rs b/crates/tui/src/tools/rlm.rs
index cfc1f0a9..80a13138 100644
--- a/crates/tui/src/tools/rlm.rs
+++ b/crates/tui/src/tools/rlm.rs
@@ -54,15 +54,26 @@ impl ToolSpec for RlmTool {
     }
 
     fn description(&self) -> &'static str {
-        "Heavy-lift recursive language model. Use when you have a long input \
-         (a whole file, a long transcript, a doc) that doesn't fit in your \
-         working context. The input is loaded into a sandboxed Python REPL \
-         where a sub-agent writes code to chunk and process it via sub-LLM \
-         calls, and returns a synthesized answer. Provide `task` (what to \
-         do) plus exactly one of `file_path` (relative to workspace, \
-         preferred) or `content` (inline, capped at 200k chars). Slower and \
-         pricier than `read_file` / `rlm_query` — only reach for it when \
-         the input genuinely doesn't fit. Returns the final answer string."
+        "Specialty tool for processing long inputs that don't fit in your \
+         own context window. Loads the input into a sandboxed Python REPL \
+         as `PROMPT`; a sub-agent writes Python that chunks the input and \
+         calls in-REPL helpers (`llm_query`, `llm_query_batched`, \
+         `rlm_query`, `rlm_query_batched`) to process it, then returns a \
+         synthesized answer.\n\n\
+         DO NOT use this tool when: the input fits in your context (just \
+         use `read_file` and reason directly); a `grep_files` / \
+         `exec_shell` pipeline would answer the question; the task is a \
+         short classification or extraction; you need interactive \
+         iterative exploration (rlm is one-shot batch).\n\n\
+         Use this tool only when the input is genuinely too large to load \
+         (a whole file > ~50K tokens, a long transcript, a multi-document \
+         corpus). It is slower and more expensive than direct reasoning.\n\n\
+         Provide `task` (what to do) plus exactly one of `file_path` \
+         (workspace-relative, preferred — keeps the long input out of \
+         your context entirely) or `content` (inline, capped at 200k \
+         chars). The Python helpers (`llm_query`, `rlm_query`, etc.) live \
+         INSIDE the REPL; they are not separately callable tools.\n\n\
+         Returns the final synthesized answer as a string."
     }
 
     fn input_schema(&self) -> Value {