v0.6.0: native rlm_query tool + scroll fix + cleanup

Adds a structured rlm_query tool for parallel/batched LLM fan-out. The model calls it with one prompt or up to 16 concurrent prompts; children dispatch concurrently via join_all (futures_util) against the existing DeepSeek client. Default child model is deepseek-v4-flash; override per-call via the model field. Available in Plan / Agent / YOLO. Cost folds into the session's running total automatically.

Fixes scroll-stuck regression (#56): TranscriptScroll::resolve_top and scrolled_by now use a three-level fallback chain (same line → same cell line 0 → nearest cell at-or-before) instead of teleporting to ToBottom when an anchor cell vanishes.

Loosens command-safety chains (#57): cargo build && cargo test and similar chains of known-safe commands now escalate to RequiresApproval instead of being hard-blocked as Dangerous. Chains containing unknown commands still block.

Suppresses the GettingCrowded footer chip; the context-percent header already covers conversation pressure.

Refactors:
- Extracts file_mention parsing/completion/expansion (~450 LOC) from the 5,500-line ui.rs into crates/tui/src/tui/file_mention.rs.
- Deletes truly unused helpers (write_bytes, timestamped_filename, extension_from_url, output_path, has_project_doc, primary_doc_path).

Tests: 853 pass. cargo clippy --workspace -D warnings clean. cargo fmt --all -- --check clean.

Closes #46 #47 #48 #49 #50 #53 #54 #55 #56 #57 #58.
2026-04-25 21:38:48 -05:00
parent 027d6d19b6
commit 5f223adea6
36 changed files with 1366 additions and 1248 deletions
@@ -11,6 +11,9 @@ This file provides context for AI assistants working on this project.
 - Format: `cargo fmt --all`
 - Run: `cargo run -p deepseek-tui`

+### Build Dependencies
+- **Rust** 1.85+ (for the workspace)
+
 ### Documentation
 See README.md for project overview, docs/ARCHITECTURE.md for internals.

@@ -25,4 +28,4 @@ See README.md for project overview, docs/ARCHITECTURE.md for internals.
 ## Important Notes

 - **Token/cost tracking inaccuracies**: Token counting and cost estimation may be inflated due to thinking token accounting bugs. Use `/compact` to manage context, and treat cost estimates as approximate.
- **Modes**: Three modes — Plan (read-only investigation), Agent (tool use with approval), YOLO (auto-approved). See `docs/MODES.md` for details.
+- **Modes**: Three modes — Plan (read-only investigation), Agent (tool use with approval), YOLO (auto-approved). See `docs/MODES.md` for details. All three modes can call the `rlm_query` tool for parallel/batched LLM fan-out (`crates/tui/src/tools/rlm_query.rs`).
@@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.6.0] - 2026-04-25
+
+### Added
+- **`rlm_query` tool: recursive language models as a first-class structured tool.** Inspired by [Alex Zhang's RLM work](https://github.com/alexzhang13/rlm) and Sakana AI's published novelty-search research, but trimmed to what an agent loop actually needs. The model calls `rlm_query` with one prompt or up to 16 concurrent prompts; children run on `deepseek-v4-flash` by default and can be promoted to Pro per-call. Children dispatch concurrently via `futures_util::future::join_all` against the existing DeepSeek client, with no external runtime, no fenced-block DSL, and no Python sandbox. Returns plain text for one prompt, indexed `[0] ...\n\n---\n\n[1] ...` blocks for many. Available in Plan / Agent / YOLO. Cost is folded into the session's running total automatically.
+
+### Changed
+- **Scroll position survives content rewrites (#56).** `TranscriptScroll::resolve_top` and `scrolled_by` no longer teleport to bottom when the anchor cell vanishes. Three-level fallback chain: same line → same cell, line 0 → nearest surviving cell at-or-before. Previously, any rewrite of the assistant message (e.g. tool-result replacement) silently dropped the user back to the live tail mid-scroll.
+- **Looser command-safety chains (#57).** `cargo build && cargo test`, `git fetch && git rebase`, and similar chains of known-safe commands now escalate to `RequiresApproval` instead of being hard-blocked as `Dangerous`. Chains containing unknown commands still block.
+- **`GettingCrowded` no longer surfaces a footer chip.** The context-percent header already covers conversation pressure; the chip now only fires for active engine interventions (`refreshing context`, `verifying`, `resetting plan`).
+
 ## [0.5.2] - 2026-04-25

 ### Added
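
Reviewer sketch (not part of the commit): the three-level scroll fallback described in the 0.6.0 changelog entry can be illustrated roughly as below. The `Anchor` type, the `resolve_anchor` function, and the `(cell id, line count)` representation are all hypothetical stand-ins; the real `TranscriptScroll::resolve_top` is not shown in this diff and may differ, including in what it does when no cell at-or-before survives.

```rust
/// Hypothetical anchor: a cell id plus a line offset inside that cell.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Anchor {
    cell: usize,
    line: usize,
}

/// Resolve a scroll anchor against the cells that survived a rewrite.
/// `cells` holds `(cell id, line count)` pairs in ascending id order
/// (an assumption made for this sketch).
fn resolve_anchor(anchor: Anchor, cells: &[(usize, usize)]) -> Option<Anchor> {
    if let Some(&(_, line_count)) = cells.iter().find(|(id, _)| *id == anchor.cell) {
        // Level 1: the same cell and the same line still exist.
        if anchor.line < line_count {
            return Some(anchor);
        }
        // Level 2: the cell survives but the line is gone; fall back to line 0.
        return Some(Anchor { cell: anchor.cell, line: 0 });
    }
    // Level 3: the cell vanished; clamp to the nearest surviving cell
    // at or before it. `None` means nothing suitable survived.
    cells
        .iter()
        .rev()
        .find(|(id, _)| *id <= anchor.cell)
        .map(|&(id, _)| Anchor { cell: id, line: 0 })
}
```

In this sketch the caller decides what to do with `None`; only in that case would a jump to the live tail be a reasonable last resort.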
@@ -806,7 +806,7 @@ dependencies = [

 [[package]]
 name = "deepseek-agent"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "deepseek-config",
 "serde",
@@ -814,7 +814,7 @@ dependencies = [

 [[package]]
 name = "deepseek-app-server"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "axum",
@@ -837,7 +837,7 @@ dependencies = [

 [[package]]
 name = "deepseek-config"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "dirs",
@@ -848,7 +848,7 @@ dependencies = [

 [[package]]
 name = "deepseek-core"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "chrono",
@@ -867,7 +867,7 @@ dependencies = [

 [[package]]
 name = "deepseek-execpolicy"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "deepseek-protocol",
@@ -876,7 +876,7 @@ dependencies = [

 [[package]]
 name = "deepseek-hooks"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "async-trait",
@@ -890,7 +890,7 @@ dependencies = [

 [[package]]
 name = "deepseek-mcp"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "deepseek-protocol",
@@ -900,7 +900,7 @@ dependencies = [

 [[package]]
 name = "deepseek-protocol"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "serde",
 "serde_json",
@@ -908,7 +908,7 @@ dependencies = [

 [[package]]
 name = "deepseek-state"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "chrono",
@@ -920,7 +920,7 @@ dependencies = [

 [[package]]
 name = "deepseek-tools"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "async-trait",
@@ -933,7 +933,7 @@ dependencies = [

 [[package]]
 name = "deepseek-tui"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "arboard",
@@ -987,7 +987,7 @@ dependencies = [

 [[package]]
 name = "deepseek-tui-cli"
-version = "0.5.2"
+version = "0.6.0"
 dependencies = [
 "anyhow",
 "chrono",
@@ -1005,7 +1005,7 @@ dependencies = [

 [[package]]
 name = "deepseek-tui-core"
-version = "0.5.2"
+version = "0.6.0"

 [[package]]
 name = "deranged"
@@ -18,7 +18,7 @@ default-members = ["crates/cli", "crates/app-server", "crates/tui"]
 resolver = "2"

 [workspace.package]
-version = "0.5.2"
+version = "0.6.0"
 edition = "2024"
 license = "MIT"
 repository = "https://github.com/Hmbown/DeepSeek-TUI"
@@ -21,10 +21,11 @@ DeepSeek TUI is a coding agent that runs entirely in your terminal. It gives Dee

 ### Key Features

+- 🌊 **Native RLM** *(new in v0.6)* — `rlm_query` tool fans out 1–16 cheap `deepseek-v4-flash` children in parallel against the existing DeepSeek client. The model uses it for batched analysis, decomposition, or cheap parallel reasoning — one structured tool call, no external runtime
 - 🧠 **Thinking-mode streaming** — watch DeepSeek's chain-of-thought as it reasons about your code
 - 🔧 **Full tool suite** — file ops, shell execution, git, web search/browse, apply-patch, sub-agents, MCP servers, and more
 - 🪟 **1M-token context** — feed entire codebases; automatic intelligent compaction when context fills up
- 🎛️ **Three interaction modes** — Plan (read-only explore), Agent (interactive with approval), YOLO (auto-approved)
+- 🎛️ **Three interaction modes** — Plan (read-only explore), Agent (interactive with approval), YOLO (auto-approved). All three can call `rlm_query` for parallel research
 - ⚡ **Reasoning-effort tiers** — cycle through `off → high → max` with Shift+Tab
 - 🔄 **Session save/resume** — checkpoint and resume long sessions, fork conversations
 - 🌐 **HTTP/SSE runtime API** — `deepseek serve --http` for headless agent workflows
@@ -68,17 +69,45 @@ DEEPSEEK_PROVIDER=nvidia-nim NVIDIA_API_KEY="..." deepseek
 ```bash
 git clone https://github.com/Hmbown/DeepSeek-TUI.git
 cd DeepSeek-TUI
-cargo install --path crates/tui --locked   # requires Rust 1.85+
+cargo install --path crates/tui --bin deepseek-tui --locked   # requires Rust 1.85+
+cargo install --path crates/cli --bin deepseek --locked
 ```

 </details>

 ---

-## What's new in v0.5.0
+## What's new in v0.6.0

- **Multi-turn tool calls no longer 400 on thinking-mode models.** Every assistant message now replays `reasoning_content` (with a safe placeholder when the round produced none), and a final-pass sanitizer guarantees the wire payload satisfies DeepSeek's thinking-mode contract — even for sessions restored from older checkpoints or sub-agents that bypass the engine path.
- **Phantom `web.run` references stripped** from prompts and the `web_search` tool ([#25](https://github.com/Hmbown/DeepSeek-TUI/issues/25)).
+### 🌊 `rlm_query` — recursive language models as a first-class tool
+
+The model now has direct access to a native recursive-LLM primitive. Inspired by [Alex Zhang's RLM work](https://github.com/alexzhang13/rlm) and Sakana AI's published research on novelty search, but trimmed to what an agent loop actually needs: one tool, structured args, no DSL.
+
+```jsonc
+// Single child:
+rlm_query({ "prompt": "Summarise this 4k-line log: ..." })
+
+// 8 parallel children, indexed result:
+rlm_query({
+  "prompts": [
+    "Review src/foo.rs for race conditions: ...",
+    "Review src/foo.rs for input validation: ...",
+    "Review src/foo.rs for error-handling gaps: ...",
+    "..."
+  ]
+})
+
+// Promote one call to Pro:
+rlm_query({ "prompt": "Hard reasoning here", "model": "deepseek-v4-pro" })
+```
+
+Children run concurrently against the existing DeepSeek client via `tokio` — no external binary, no Python sandbox, no fenced-block DSL. Returns a single string for one prompt or `[i] ...` indexed blocks for many. Available in Plan / Agent / YOLO. The cost is folded into the session's running total automatically.
+
+### Other changes
+
+- **Scroll position survives content rewrites** — anchor fallback now clamps to the nearest surviving cell instead of teleporting to the bottom (#56)
+- **Looser command-safety chains** — `cargo build && cargo test` is no longer blocked outright; chains of known-safe commands escalate to RequiresApproval instead of Dangerous (#57)
+- **Multi-turn tool calls no longer 400 on thinking-mode models** — `reasoning_content` is replayed across user-message boundaries with a safe placeholder when the round produced none

 Full history: [CHANGELOG.md](CHANGELOG.md).

@@ -138,6 +167,8 @@ deepseek serve --http                         # HTTP/SSE API server
 | **Agent** 🤖 | Default interactive mode — multi-step tool use with approval gates |
 | **YOLO** ⚡ | Auto-approve all tools in a trusted workspace (use with caution) |

+All three modes have access to the `rlm_query` tool for parallel/batched LLM fan-out (see "What's new in v0.6.0" above).
+
 ---

 ## Configuration
@@ -7,5 +7,5 @@ repository.workspace = true
 description = "Model/provider registry and fallback strategy for DeepSeek workspace architecture"

 [dependencies]
-deepseek-config = { path = "../config", version = "0.5.0" }
+deepseek-config = { path = "../config", version = "0.6.0" }
 serde.workspace = true
@@ -10,15 +10,15 @@ description = "Codex-style app-server transport for DeepSeek workspace architect
 anyhow.workspace = true
 axum.workspace = true
 clap.workspace = true
-deepseek-agent = { path = "../agent", version = "0.5.0" }
-deepseek-config = { path = "../config", version = "0.5.0" }
-deepseek-core = { path = "../core", version = "0.5.0" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.5.0" }
-deepseek-hooks = { path = "../hooks", version = "0.5.0" }
-deepseek-mcp = { path = "../mcp", version = "0.5.0" }
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
-deepseek-state = { path = "../state", version = "0.5.0" }
-deepseek-tools = { path = "../tools", version = "0.5.0" }
+deepseek-agent = { path = "../agent", version = "0.6.0" }
+deepseek-config = { path = "../config", version = "0.6.0" }
+deepseek-core = { path = "../core", version = "0.6.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.6.0" }
+deepseek-hooks = { path = "../hooks", version = "0.6.0" }
+deepseek-mcp = { path = "../mcp", version = "0.6.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
+deepseek-state = { path = "../state", version = "0.6.0" }
+deepseek-tools = { path = "../tools", version = "0.6.0" }
 serde.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
@@ -14,12 +14,12 @@ path = "src/main.rs"
 anyhow.workspace = true
 clap.workspace = true
 clap_complete.workspace = true
-deepseek-agent = { path = "../agent", version = "0.5.0" }
-deepseek-app-server = { path = "../app-server", version = "0.5.0" }
-deepseek-config = { path = "../config", version = "0.5.0" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.5.0" }
-deepseek-mcp = { path = "../mcp", version = "0.5.0" }
-deepseek-state = { path = "../state", version = "0.5.0" }
+deepseek-agent = { path = "../agent", version = "0.6.0" }
+deepseek-app-server = { path = "../app-server", version = "0.6.0" }
+deepseek-config = { path = "../config", version = "0.6.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.6.0" }
+deepseek-mcp = { path = "../mcp", version = "0.6.0" }
+deepseek-state = { path = "../state", version = "0.6.0" }
 chrono.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
@@ -9,14 +9,14 @@ description = "Core runtime boundaries for DeepSeek workspace architecture"
 [dependencies]
 anyhow.workspace = true
 chrono.workspace = true
-deepseek-agent = { path = "../agent", version = "0.5.0" }
-deepseek-config = { path = "../config", version = "0.5.0" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.5.0" }
-deepseek-hooks = { path = "../hooks", version = "0.5.0" }
-deepseek-mcp = { path = "../mcp", version = "0.5.0" }
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
-deepseek-state = { path = "../state", version = "0.5.0" }
-deepseek-tools = { path = "../tools", version = "0.5.0" }
+deepseek-agent = { path = "../agent", version = "0.6.0" }
+deepseek-config = { path = "../config", version = "0.6.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.6.0" }
+deepseek-hooks = { path = "../hooks", version = "0.6.0" }
+deepseek-mcp = { path = "../mcp", version = "0.6.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
+deepseek-state = { path = "../state", version = "0.6.0" }
+deepseek-tools = { path = "../tools", version = "0.6.0" }
 serde_json.workspace = true
 tokio.workspace = true
 uuid.workspace = true
@@ -8,5 +8,5 @@ description = "Execution policy and approval model parity for DeepSeek workspace

 [dependencies]
 anyhow.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
 serde.workspace = true
@@ -10,7 +10,7 @@ description = "Hook dispatch and notifications parity for DeepSeek workspace arc
 anyhow.workspace = true
 async-trait.workspace = true
 chrono.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
 reqwest.workspace = true
 serde.workspace = true
 serde_json.workspace = true
@@ -8,6 +8,6 @@ description = "MCP server lifecycle and tool proxy compatibility for DeepSeek wo

 [dependencies]
 anyhow.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
 serde.workspace = true
 serde_json.workspace = true
@@ -9,7 +9,7 @@ description = "Tool invocation lifecycle, schema validation, and scheduler paral
 [dependencies]
 anyhow.workspace = true
 async-trait.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.5.0" }
+deepseek-protocol = { path = "../protocol", version = "0.6.0" }
 serde.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
@@ -256,6 +256,16 @@ pub fn analyze_command(command: &str) -> SafetyAnalysis {
    }

    if command.contains("&&") || command.contains("||") || command.contains(';') {
+        // Chains of known-safe commands (cargo/git/zig/npm/etc.) are routine
+        // for build+test workflows and should not be hard-blocked. Escalate to
+        // RequiresApproval so the user still has the chance to deny in
+        // non-trusted modes; YOLO/auto-approve passes through.
+        if all_segments_known_safe(command) {
+            return SafetyAnalysis::requires_approval(
+                command,
+                vec!["Command chain consists only of known-safe segments (cargo/git/etc.)".to_string()],
+            );
+        }
        return SafetyAnalysis::dangerous(
            command,
            vec!["Command chaining detected".to_string()],
@@ -379,6 +389,44 @@ fn is_safe_command(command: &str) -> bool {
    false
 }

+/// Build/test/source-control commands that are reasonable to chain in a
+/// trusted workspace (`cd /tmp/foo && cargo build`, `cargo test --workspace
+/// && cargo clippy`, etc.). The match is by leading token, not full string,
+/// so flags don't trip the check.
+const KNOWN_SAFE_CHAIN_PREFIXES: &[&str] = &[
+    "cargo", "rustc", "rustup", "git", "gh", "hub", "npm", "yarn", "pnpm", "node", "npx", "zig",
+    "go", "deno", "bun", "make", "cmake", "ninja", "meson", "python", "python3", "pip", "pip3",
+    "uv", "poetry", "ls", "pwd", "cd", "echo", "cat", "head", "tail", "grep", "rg", "find", "fd",
+    "wc", "sort", "uniq", "which", "env", "true", "false",
+];
+
+/// Return true when every segment of a chained command (`a && b ; c || d`)
+/// has a leading token in `KNOWN_SAFE_CHAIN_PREFIXES`. Used to permit routine
+/// build+test chains without escalating to Dangerous.
+fn all_segments_known_safe(command: &str) -> bool {
+    let normalized = command
+        .replace("&&", "\n")
+        .replace("||", "\n")
+        .replace(';', "\n");
+    let segments: Vec<&str> = normalized
+        .split('\n')
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .collect();
+    if segments.is_empty() {
+        return false;
+    }
+    segments.iter().all(|seg| {
+        let head = seg
+            .split_whitespace()
+            .find(|tok| !tok.contains('=') && *tok != "env")
+            .unwrap_or("");
+        KNOWN_SAFE_CHAIN_PREFIXES
+            .iter()
+            .any(|prefix| head.eq_ignore_ascii_case(prefix))
+    })
+}
+
 /// Check if a command is safe within the workspace
 fn is_workspace_safe_command(command: &str) -> bool {
    let command_lower = command.to_lowercase();
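
Reviewer sketch (not part of the commit): the segment check added in this hunk can be exercised in isolation. The function body below mirrors `all_segments_known_safe` from the diff, but the prefix list is deliberately abbreviated for illustration, so results for commands outside this short list will differ from the real build.

```rust
/// Abbreviated prefix list, for illustration only; the commit's
/// `KNOWN_SAFE_CHAIN_PREFIXES` is considerably longer.
const KNOWN_SAFE_CHAIN_PREFIXES: &[&str] = &["cargo", "git", "npm", "ls", "cd", "echo"];

/// True when every segment of a chained command has a known-safe leading token.
fn all_segments_known_safe(command: &str) -> bool {
    // Treat `&&`, `||` and `;` uniformly as segment separators.
    let normalized = command
        .replace("&&", "\n")
        .replace("||", "\n")
        .replace(';', "\n");
    let segments: Vec<&str> = normalized
        .split('\n')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();
    if segments.is_empty() {
        return false;
    }
    segments.iter().all(|seg| {
        // Skip leading `VAR=value` assignments and a bare `env` wrapper,
        // so the token actually being executed is what gets checked.
        let head = seg
            .split_whitespace()
            .find(|tok| !tok.contains('=') && *tok != "env")
            .unwrap_or("");
        KNOWN_SAFE_CHAIN_PREFIXES
            .iter()
            .any(|prefix| head.eq_ignore_ascii_case(prefix))
    })
}
```

With this list, `cargo build && cargo test` and `FOO=1 cargo test ; git status` pass, while `cargo build && rm -rf target` and `env rm -rf /` do not. Because `env` is skipped when locating the leading token, a segment consisting only of `env` has no head and is rejected.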
@@ -564,6 +564,22 @@ fn should_default_defer_tool(name: &str, mode: AppMode) -> bool {
        return false;
    }

+    // Shell tools are kept active in Agent so the model can run verification
+    // commands (build/test/git/cargo) without first having to discover the
+    // tool through ToolSearch. Plan mode never registers shell tools.
+    let always_loaded_in_action_modes = matches!(mode, AppMode::Agent)
+        && matches!(
+            name,
+            "exec_shell"
+                | "exec_shell_wait"
+                | "exec_shell_interact"
+                | "exec_wait"
+                | "exec_interact"
+        );
+    if always_loaded_in_action_modes {
+        return false;
+    }
+
    !matches!(
        name,
        "read_file"
@@ -571,6 +587,7 @@ fn should_default_defer_tool(name: &str, mode: AppMode) -> bool {
            | "grep_files"
            | "file_search"
            | "diagnostics"
+            | "rlm_query"
            | MULTI_TOOL_PARALLEL_NAME
            | "update_plan"
            | "todo_write"
@@ -1696,6 +1713,7 @@ impl Engine {

        builder = builder
            .with_review_tool(self.deepseek_client.clone(), self.session.model.clone())
+            .with_rlm_query_tool(self.deepseek_client.clone())
            .with_user_input_tool()
            .with_parallel_tool();

@@ -2956,8 +2974,9 @@ impl Engine {
                let _ = self.tx_event.send(Event::MessageComplete { index }).await;
            }

+            // RLM is a structured tool call (`rlm_query`) handled by the
+            // normal tool dispatch path; no content rewrite required.
            // DeepSeek chat API rejects assistant messages that contain only
-            // reasoning/thinking content without visible text or tool calls.
            // Keep thinking for UI stream events, but persist only sendable
            // assistant turns in the conversation state.
            let has_sendable_assistant_content = content_blocks.iter().any(|block| {
@@ -193,8 +193,16 @@ fn yolo_mode_keeps_tools_preloaded() {

 #[test]
 fn non_yolo_mode_retains_default_defer_policy() {
-    assert!(should_default_defer_tool("exec_shell", AppMode::Agent));
+    // Shell tools are kept loaded in action modes so the model can verify
+    // work without an extra ToolSearch round-trip; non-action tools (e.g.
+    // MCP) still defer.
+    assert!(!should_default_defer_tool("exec_shell", AppMode::Agent));
+    assert!(should_default_defer_tool("exec_shell", AppMode::Plan));
    assert!(!should_default_defer_tool("read_file", AppMode::Agent));
+    assert!(should_default_defer_tool(
+        "mcp_read_resource",
+        AppMode::Agent
+    ));
 }

 #[test]
@@ -131,15 +131,3 @@ pub fn load_from_workspace(workspace: &Path) -> Option<String> {
    let paths = discover_paths(workspace);
    read_project_docs(&paths, DEFAULT_MAX_BYTES)
 }
-
-/// Check if workspace has any project doc
-#[allow(dead_code)]
-pub fn has_project_doc(workspace: &Path) -> bool {
-    !discover_paths(workspace).is_empty()
-}
-
-/// Get the primary project doc path (for display)
-#[allow(dead_code)]
-pub fn primary_doc_path(workspace: &Path) -> Option<PathBuf> {
-    discover_paths(workspace).into_iter().next()
-}
@@ -135,3 +135,30 @@ For long-running commands (build, test, server), use exec_shell with background:
 This returns a task_id immediately in the tool output.
 Use exec_shell_wait to poll for output, and exec_shell_interact to send stdin (or close stdin).
 Use tty: true for interactive programs that require a TTY.
+
+## Recursive Language Model (RLM) primitive — `rlm_query`
+
+When you need parallel analysis, recursive decomposition, or batched generation, call the `rlm_query` tool. It runs N prompts in parallel against the cheap fast model (`deepseek-v4-flash`) and returns the joined results — much faster and cheaper than doing the work inline.
+
+Two shapes:
+
+- **Single child:** `rlm_query({ "prompt": "Analyze X" })` → returns the response text.
+- **Parallel batch:** `rlm_query({ "prompts": ["Analyze X angle A", "Analyze X angle B", "Analyze X angle C"] })` → returns `[0] ...\n\n---\n\n[1] ...\n\n---\n\n[2] ...`.
+
+Optional fields: `model` (override the child model — set to `"deepseek-v4-pro"` if a child genuinely needs deep reasoning), `system` (shared system prompt for all children), `max_tokens` (per-child cap, default 4096). Hard cap: 16 prompts per call.
+
+### Worked example
+
+User: "Review these three modules for risk."
+
+You call `rlm_query` once with `prompts: ["Review src/foo.rs for risk: <contents>", "Review src/bar.rs for risk: <contents>", "Review src/baz.rs for risk: <contents>"]`. Three flash children run concurrently, the joined result comes back, you synthesise.
+
+For recursive drill-down: call `rlm_query` again with a single `prompt` on the strongest finding from the first call.
+
+Do NOT use RLM when the task requires file-system modification, interactive user input, or is trivial enough for a single sentence.
+
+| Primitive | Use when | Cost | Speed |
+|---|---|---|---|
+| Inline reasoning | Simple Q&A, one-step tasks | Low | Fast |
+| `rlm_query` | Parallel / batched / recursive read-only work | Very low (flash) | Fast |
+| `agent_swarm` | Multi-step autonomous work with tools | Higher | Slower (polling) |
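
Reviewer sketch (not part of the commit): the result format the prompt describes (plain text for one prompt, `[i] ...` blocks separated by `---` for several) could be produced along these lines. `join_child_outputs` is a hypothetical name; the actual joining code in `rlm_query.rs` lies outside the portion of the file shown here.

```rust
/// Join child completions into the shape the prompt describes:
/// a single result is returned as-is, several are indexed and
/// separated by a `---` rule. Hypothetical helper for illustration.
fn join_child_outputs(outputs: &[String]) -> String {
    match outputs {
        // One child: return its text unchanged.
        [single] => single.clone(),
        // Zero or several children: index each block from 0.
        many => many
            .iter()
            .enumerate()
            .map(|(i, text)| format!("[{i}] {text}"))
            .collect::<Vec<_>>()
            .join("\n\n---\n\n"),
    }
}
```

Indexing from zero keeps each block aligned with the position of its prompt in the `prompts` array, so the caller can map findings back to the request that produced them.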
@@ -1,63 +0,0 @@
-You are DeepSeek TUI in Hetun mode (河豚, "Plan + Recursive Agents"). Hetun folds planning and execution into one rhythm: you research the problem with recursive RLM, present a single mission for the user to approve, and then carry that mission out without further per-step interruptions.
-
-IMPORTANT: You are ALREADY running inside the DeepSeek TUI. You have direct access to all tools below — do NOT try to launch the CLI binary. Your tools execute directly in the current session.
-
-## The two-step rhythm
-
-1. **Research + plan.** Use RLM aggressively to investigate the workspace in parallel, then synthesise a concrete mission. Land it in the transcript ending with an explicit "OK to run?" prompt — and stop there. Do not execute in the same turn.
-2. **Execute.** After the user approves, emit a `repl` block that runs the planned sub-tasks via `rlm_query_batched` and aggregates into a `FINAL`. No further approval prompts — you approved the mission, now run it.
-
-If the user redirects ("change item 2", "drop item 3"), revise the mission and ask again. Once approved, execute and report.
-
-## How the research phase actually works
-
-Hetun's research is not one-shot batched queries. It is a small recursive program inside a `repl` block, modelled on the recursive-novelty-search rhythm:
-
- **Sample broadly first.** Read or chunk the relevant material (files the user named, the working directory, prior turns) into a coarse `ctx` and run a flash sweep that asks each chunk "what is surprising or important here, and why?".
- **Score by novelty, recurse on the high-signal chunks.** The chunks whose flash answers carry the most new information get resampled at finer resolution. Stop recursing when the answers stop changing or you hit a budget (default: 2–3 levels, ~12 total flash calls).
- **Build a hierarchical narrative tree, not a flat list.** Cluster the findings into intermediate nodes (related observations) under root nodes (top-level themes) under the mission goal. The mission card the user approves displays this tree.
- **Cross-verify before locking the mission.** Every load-bearing claim in the mission gets two passes: a flash sweep for obvious errors / contradictions, and one Pro check for subtler structural issues. Claims that fail either pass are marked low-confidence rather than dropped silently — the user gets to decide whether to keep them.
- **Hypothesis-verification loop.** Form a working hypothesis from the first round of findings; generate verification queries from it ("if X is true, we should also see Y — check for Y"); run them; update. Cap the loop at 2–3 iterations.
-
-This is the substance behind "Plan + Recursive Agents". A bad Hetun turn is "fan out 8 fixed queries and concatenate"; a good one is the recursive sampling + hierarchical synthesis + verification loop above.
-
-## The mission card
-
-When you present the mission for approval, structure it like this:
-
- **Goal** (one sentence)
- **Hierarchy of findings** (the tree, collapsible — top-level themes with their child observations)
- **Sub-tasks to execute** (numbered, each with: what it looks at, expected output, anything that gets written)
- **Confidence notes** (any claim flagged as low-confidence by the cross-verification pass)
- **Estimate** (e.g. "~6 flash calls during execution")
- End with: **"OK to run? (Enter to approve, Esc to cancel, prose to revise)"**
-
-Do not skip the hierarchy or the confidence notes — they are what makes the mission card legitimately useful versus a wall of bullets.
-
-## RLM usage cheat sheet
-
- **Parallel analysis** ("review these 3 files") → `rlm_query_batched`
- **Recursive decomposition** ("break this into sub-tasks") → `rlm_query` with depth
- **Programmatic data inspection** (grep / extract / chunk / diff a blob already in memory) → use the `ctx` helpers inside the same `repl` block; do NOT round-trip through shell
- **Cheap leaf work** (any reasoning, search, summarisation, classification that doesn't need tools) → flash via `rlm_query_batched`
-
-The child model is `deepseek-v4-flash` (~1/10th the cost of Pro). Be lavish with parallelism: 8–16 children is normal when the work is decomposable. Reserve Pro for the cross-verification check and for sub-tasks that genuinely need deep reasoning.
-
-## Frontier escalation
-
-If a sub-task genuinely needs Pro, use the explicit `zigrlm` tool with `main_model = "deepseek-v4-pro"`. Default everything else to flash.
-
-## Tool use during execution
-
-After mission approval the execution turn runs without per-block approval — that is the point of the mode. But:
-
- Avoid unnecessary destructive or irreversible actions inside `repl` blocks.
- Prefer `repl` blocks over `agent_swarm` for parallel work; `agent_swarm` is for multi-step autonomous workflows that need tools at each step.
- Use `grep_files` + `list_dir` for quick lookups that don't need parallelism.
-
-## What Hetun is not
-
- Not auto-execute. The mission gate is real and required.
- Not Plan mode. Plan stays unchanged for design-first investigation that hands off to a human.
- Not a model swap. Your conversational model is unchanged when entering Hetun.
- Not a `/hetun` slash command. Tab cycles into the mode like any other.
@@ -27,6 +27,7 @@ EXPLORATION:
 - list_dir: Browse directories in the workspace
 - read_file: Read file contents to understand context
 - grep_files: Search files by regex
+- rlm_query: Run 1–16 cheap parallel children on `deepseek-v4-flash` for fan-out analysis ("review these 4 angles in parallel"). Pass `prompt` for one call or `prompts: [...]` for batched. Useful when one Pro turn would have to enumerate sequentially.
 - web_search: Quick web search (fallback when citations are not needed)
 - request_user_input: Ask the user short multiple-choice questions

@@ -128,3 +128,24 @@ For long-running commands (build, test, server), use exec_shell with background:
 This returns a task_id immediately in the tool output.
 Use exec_shell_wait to poll for output, and exec_shell_interact to send stdin (or close stdin).
 Use tty: true for interactive programs that require a TTY.
+
+## Recursive Language Model (RLM) primitive — `rlm_query`
+
+When you need parallel analysis, recursive decomposition, or batched generation, call the `rlm_query` tool. It runs N prompts in parallel against the cheap fast model (`deepseek-v4-flash`) and returns the joined results — much faster and cheaper than doing the work inline.
+
+Two shapes:
+
+- **Single child:** `rlm_query({ "prompt": "Analyze X" })` → returns the response text.
+- **Parallel batch:** `rlm_query({ "prompts": ["Analyze X angle A", "Analyze X angle B", "Analyze X angle C"] })` → returns `[0] ...\n\n---\n\n[1] ...\n\n---\n\n[2] ...`.
+
+Optional fields: `model` (override the child model — set to `"deepseek-v4-pro"` if a child genuinely needs deep reasoning), `system` (shared system prompt for all children), `max_tokens` (per-child cap, default 4096). Hard cap: 16 prompts per call.
+
+For recursive drill-down: call `rlm_query` once for the breakdown, then call it again with a single `prompt` on the strongest finding.
+
+Do NOT use RLM when the task requires file-system modification, interactive user input, or is trivial enough for a single sentence.
+
+| Primitive | Use when | Cost | Speed |
+|---|---|---|---|
+| Inline reasoning | Simple Q&A, one-step tasks | Low | Fast |
+| `rlm_query` | Parallel / batched / recursive read-only work | Very low (flash) | Fast |
+| `agent_swarm` | Multi-step autonomous work with tools | Higher | Slower (polling) |
@@ -14,6 +14,7 @@ pub mod plan;
 pub mod project;
 pub mod registry;
 pub mod review;
+pub mod rlm_query;
 pub mod search;
 pub mod shell;
 mod shell_output;
@@ -381,6 +381,14 @@ impl ToolRegistryBuilder {
        self.with_tool(Arc::new(ApplyPatchTool))
    }

+    /// Include the native RLM tool (`rlm_query`). Parallel/batched LLM
+    /// fan-out runs through the existing DeepSeek client.
+    #[must_use]
+    pub fn with_rlm_query_tool(self, client: Option<DeepSeekClient>) -> Self {
+        use super::rlm_query::RlmQueryTool;
+        self.with_tool(Arc::new(RlmQueryTool::new(client)))
+    }
+
    /// Include the review tool.
    #[must_use]
    pub fn with_review_tool(self, client: Option<DeepSeekClient>, model: String) -> Self {
@@ -0,0 +1,339 @@
+//! Native Rust RLM tool — parallel/batched LLM fan-out as a structured
+//! tool call. Inspired by alexzhang13/rlm but trimmed to the primitives
+//! that actually matter inside an agent loop: a single tool that runs
+//! N concurrent child completions on the cheap flash model and returns
+//! the joined result.
+
+use std::sync::Arc;
+
+use async_trait::async_trait;
+use futures_util::future::join_all;
+use serde_json::{Value, json};
+
+use crate::client::DeepSeekClient;
+use crate::llm_client::LlmClient;
+use crate::models::{ContentBlock, Message, MessageRequest, SystemPrompt};
+use crate::tools::spec::{
+    ApprovalRequirement, ToolCapability, ToolContext, ToolError, ToolResult, ToolSpec,
+    optional_str, optional_u64,
+};
+
+/// Default child model — cheap and fast.
+const DEFAULT_CHILD_MODEL: &str = "deepseek-v4-flash";
+/// Per-child completion ceiling.  Children are meant to be short.
+const DEFAULT_MAX_TOKENS: u32 = 4096;
+/// Hard cap on parallel children — protects against runaway fan-out.
+const MAX_PARALLEL: usize = 16;
+
+/// Tool: `rlm_query`. Runs one or more prompts in parallel and joins the
+/// results. Structured tool call so the model can trigger fan-out reliably.
+pub struct RlmQueryTool {
+    client: Option<DeepSeekClient>,
+    default_model: String,
+}
+
+impl RlmQueryTool {
+    #[must_use]
+    pub fn new(client: Option<DeepSeekClient>) -> Self {
+        Self {
+            client,
+            default_model: DEFAULT_CHILD_MODEL.to_string(),
+        }
+    }
+}
+
+#[async_trait]
+impl ToolSpec for RlmQueryTool {
+    fn name(&self) -> &'static str {
+        "rlm_query"
+    }
+
+    fn description(&self) -> &'static str {
+        "Run one or more prompts in parallel against the fast cheap model (deepseek-v4-flash). \
+         Use for fan-out analysis, batched review, or cheap parallel decomposition: pass `prompts` \
+         as an array to run them concurrently, or `prompt` for a single call. Each child runs \
+         in isolation under the shared (optional) system prompt; results come back as `[i] <text>` \
+         joined blocks (or just the text when there's one prompt). Cheaper than spawning sub-agents \
+         for read-only reasoning work."
+    }
+
+    fn input_schema(&self) -> Value {
+        json!({
+            "type": "object",
+            "properties": {
+                "prompt": {
+                    "type": "string",
+                    "description": "Single prompt to run. Use this OR prompts, not both."
+                },
+                "prompts": {
+                    "type": "array",
+                    "items": { "type": "string" },
+                    "description": "Up to 16 prompts to run concurrently. Returns indexed `[0] ... [N-1]` blocks."
+                },
+                "model": {
+                    "type": "string",
+                    "description": "Model override (default: deepseek-v4-flash)."
+                },
+                "system": {
+                    "type": "string",
+                    "description": "Optional shared system prompt applied to every child."
+                },
+                "max_tokens": {
+                    "type": "integer",
+                    "description": "Per-child token cap (default: 4096)."
+                }
+            }
+        })
+    }
+
+    fn capabilities(&self) -> Vec<ToolCapability> {
+        vec![ToolCapability::Network, ToolCapability::ReadOnly]
+    }
+
+    fn approval_requirement(&self) -> ApprovalRequirement {
+        ApprovalRequirement::Auto
+    }
+
+    fn supports_parallel(&self) -> bool {
+        true
+    }
+
+    async fn execute(&self, input: Value, _context: &ToolContext) -> Result<ToolResult, ToolError> {
+        let Some(client) = self.client.clone() else {
+            return Err(ToolError::not_available(
+                "rlm_query requires an active DeepSeek client".to_string(),
+            ));
+        };
+
+        let model = optional_str(&input, "model")
+            .map(|s| s.to_string())
+            .unwrap_or_else(|| self.default_model.clone());
+        let system = optional_str(&input, "system").map(|s| s.to_string());
+        let max_tokens = u32::try_from(
+            optional_u64(&input, "max_tokens", u64::from(DEFAULT_MAX_TOKENS))
+                .min(u64::from(u32::MAX)),
+        )
+        .unwrap_or(DEFAULT_MAX_TOKENS);
+
+        // Accept either `prompts: [...]` or `prompt: "..."`.
+        let prompts: Vec<String> =
+            if let Some(arr) = input.get("prompts").and_then(|v| v.as_array()) {
+                arr.iter()
+                    .filter_map(|v| v.as_str().map(str::to_string))
+                    .collect()
+            } else if let Some(p) = input.get("prompt").and_then(|v| v.as_str()) {
+                vec![p.to_string()]
+            } else {
+                return Err(ToolError::invalid_input(
+                    "rlm_query requires `prompt` (string) or `prompts` (array of strings)",
+                ));
+            };
+
+        if prompts.is_empty() {
+            return Err(ToolError::invalid_input("rlm_query: prompts list is empty"));
+        }
+        if prompts.len() > MAX_PARALLEL {
+            return Err(ToolError::invalid_input(format!(
+                "rlm_query: too many prompts ({}, max {MAX_PARALLEL})",
+                prompts.len(),
+            )));
+        }
+
+        let client = Arc::new(client);
+        let model = Arc::new(model);
+        let system = Arc::new(system);
+
+        let futures = prompts.into_iter().enumerate().map(|(idx, prompt)| {
+            let client = Arc::clone(&client);
+            let model = Arc::clone(&model);
+            let system = Arc::clone(&system);
+            async move {
+                let request = MessageRequest {
+                    model: (*model).clone(),
+                    messages: vec![Message {
+                        role: "user".to_string(),
+                        content: vec![ContentBlock::Text {
+                            text: prompt,
+                            cache_control: None,
+                        }],
+                    }],
+                    max_tokens,
+                    system: system.as_ref().clone().map(SystemPrompt::Text),
+                    tools: None,
+                    tool_choice: None,
+                    metadata: None,
+                    thinking: None,
+                    reasoning_effort: None,
+                    stream: Some(false),
+                    temperature: Some(0.4),
+                    top_p: Some(0.9),
+                };
+                (idx, client.create_message(request).await)
+            }
+        });
+
+        let results = join_all(futures).await;
+
+        let mut ordered: Vec<(usize, String)> = results
+            .into_iter()
+            .map(|(idx, res)| match res {
+                Ok(response) => (idx, extract_text(&response.content)),
+                Err(e) => (idx, format!("[error: {e}]")),
+            })
+            .collect();
+        ordered.sort_by_key(|(idx, _)| *idx);
+
+        let body = if ordered.len() == 1 {
+            ordered
+                .into_iter()
+                .next()
+                .map(|(_, t)| t)
+                .unwrap_or_default()
+        } else {
+            ordered
+                .into_iter()
+                .map(|(idx, t)| format!("[{idx}] {t}"))
+                .collect::<Vec<_>>()
+                .join("\n\n---\n\n")
+        };
+
+        Ok(ToolResult::success(body))
+    }
+}
+
+fn extract_text(blocks: &[ContentBlock]) -> String {
+    blocks
+        .iter()
+        .filter_map(|b| match b {
+            ContentBlock::Text { text, .. } => Some(text.as_str()),
+            _ => None,
+        })
+        .collect::<Vec<_>>()
+        .join("\n")
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::tools::spec::ToolContext;
+    use std::path::PathBuf;
+
+    fn ctx() -> ToolContext {
+        ToolContext::with_auto_approve(
+            PathBuf::from("."),
+            false,
+            PathBuf::from("notes.txt"),
+            PathBuf::from("mcp.json"),
+            true,
+        )
+    }
+
+    fn tool_without_client() -> RlmQueryTool {
+        RlmQueryTool::new(None)
+    }
+
+    #[test]
+    fn schema_advertises_both_shapes() {
+        let schema = tool_without_client().input_schema();
+        let props = schema
+            .get("properties")
+            .and_then(|v| v.as_object())
+            .expect("schema has properties");
+        assert!(props.contains_key("prompt"));
+        assert!(props.contains_key("prompts"));
+        assert!(props.contains_key("model"));
+        assert!(props.contains_key("system"));
+        assert!(props.contains_key("max_tokens"));
+        // Neither prompt nor prompts is required at the schema level — the
+        // tool accepts either, and validates "one or the other" at runtime.
+        assert!(schema.get("required").is_none());
+    }
+
+    #[tokio::test]
+    async fn returns_not_available_without_client() {
+        let tool = tool_without_client();
+        let err = tool
+            .execute(json!({ "prompt": "hi" }), &ctx())
+            .await
+            .unwrap_err();
+        assert!(matches!(err, ToolError::NotAvailable { .. }));
+    }
+
+    #[tokio::test]
+    async fn rejects_input_missing_both_prompt_and_prompts() {
+        let tool = tool_without_client();
+        let err = tool.execute(json!({}), &ctx()).await.unwrap_err();
+        // The not-available branch fires first when there's no client; that
+        // catches users with no API key. To exercise the missing-prompts
+        // branch directly we'd need a stub client, which this module does
+        // not have yet. Until then this test only pins that empty input
+        // without a client reports NotAvailable rather than succeeding.
+        assert!(matches!(err, ToolError::NotAvailable { .. }));
+    }
+
+    #[test]
+    fn extract_text_joins_text_blocks_and_skips_others() {
+        let blocks = vec![
+            ContentBlock::Text {
+                text: "first".to_string(),
+                cache_control: None,
+            },
+            ContentBlock::Thinking {
+                thinking: "ignored".to_string(),
+            },
+            ContentBlock::Text {
+                text: "second".to_string(),
+                cache_control: None,
+            },
+        ];
+        assert_eq!(extract_text(&blocks), "first\nsecond");
+    }
+
+    #[test]
+    fn extract_text_returns_empty_when_no_text_blocks() {
+        let blocks = vec![ContentBlock::Thinking {
+            thinking: "no visible text".to_string(),
+        }];
+        assert_eq!(extract_text(&blocks), "");
+    }
+
+    #[test]
+    fn default_model_is_flash() {
+        let tool = tool_without_client();
+        assert_eq!(tool.default_model, DEFAULT_CHILD_MODEL);
+        assert_eq!(DEFAULT_CHILD_MODEL, "deepseek-v4-flash");
+    }
+
+    #[test]
+    fn max_parallel_cap_is_sixteen() {
+        // The cap is documented in the schema description and enforced in
+        // execute(); pin it here so a future refactor doesn't silently
+        // raise the ceiling without a deliberate decision.
+        assert_eq!(MAX_PARALLEL, 16);
+    }
+
+    #[test]
+    fn approval_is_auto_so_calls_are_unattended() {
+        // RLM children are read-only LLM completions — the user shouldn't
+        // be prompted to approve every fan-out call.
+        let tool = tool_without_client();
+        assert_eq!(tool.approval_requirement(), ApprovalRequirement::Auto);
+    }
+
+    #[test]
+    fn supports_parallel_dispatch() {
+        // Tells the engine it's safe to issue concurrent rlm_query tool
+        // calls in one assistant turn (e.g. when the model emits multiple
+        // tool_calls for fan-out).
+        let tool = tool_without_client();
+        assert!(tool.supports_parallel());
+    }
+
+    #[test]
+    fn capabilities_mark_network_and_read_only() {
+        let tool = tool_without_client();
+        let caps = tool.capabilities();
+        assert!(caps.contains(&ToolCapability::Network));
+        assert!(caps.contains(&ToolCapability::ReadOnly));
+    }
+}
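
The `rejects_input_missing_both_prompt_and_prompts` test above cannot reach the validation branch without a client. One possible way around that, sketched here under assumptions, is to factor the shape check into a pure function. `resolve_prompts` and `PromptError` are hypothetical names; the real `execute()` reads a `serde_json::Value` and returns `ToolError`, which this std-only sketch does not reproduce.

```rust
/// Mirrors the cap pinned by `max_parallel_cap_is_sixteen`.
const MAX_PARALLEL: usize = 16;

/// Hypothetical error type standing in for `ToolError::invalid_input`.
#[derive(Debug, PartialEq)]
enum PromptError {
    Missing,
    Empty,
    TooMany(usize),
}

/// Hypothetical pure version of the prompt/prompts check in `execute()`.
/// As in the diff, `prompts` takes precedence when both are supplied.
fn resolve_prompts(
    prompt: Option<&str>,
    prompts: Option<Vec<String>>,
) -> Result<Vec<String>, PromptError> {
    let resolved = match (prompts, prompt) {
        (Some(list), _) => list,
        (None, Some(single)) => vec![single.to_string()],
        (None, None) => return Err(PromptError::Missing),
    };
    if resolved.is_empty() {
        return Err(PromptError::Empty);
    }
    if resolved.len() > MAX_PARALLEL {
        return Err(PromptError::TooMany(resolved.len()));
    }
    Ok(resolved)
}

fn main() {
    println!("{:?}", resolve_prompts(None, None));
    println!("{:?}", resolve_prompts(Some("hi"), None));
    println!("{:?}", resolve_prompts(None, Some(vec!["x".to_string(); 17])));
}
```

With the check isolated like this, each rejection path could be asserted in a plain `#[test]` with no network client involved.
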
@@ -827,7 +827,6 @@ impl App {

        let entering_yolo = mode == AppMode::Yolo && previous_mode != AppMode::Yolo;
        let leaving_yolo = previous_mode == AppMode::Yolo && mode != AppMode::Yolo;
-
        self.mode = mode;
        self.status_message = Some(format!("Switched to {} mode", mode.label()));

@@ -863,7 +862,7 @@ impl App {
        true
    }

-    /// Cycle through modes: Plan -> Agent -> YOLO
+    /// Cycle through modes: Plan → Agent → YOLO → Plan.
    pub fn cycle_mode(&mut self) {
        let next = match self.mode {
            AppMode::Plan => AppMode::Agent,
@@ -873,7 +872,7 @@ impl App {
        let _ = self.set_mode(next);
    }

-    /// Cycle through modes in reverse: YOLO -> Agent -> Plan
+    /// Cycle through modes in reverse.
    #[allow(dead_code)]
    pub fn cycle_mode_reverse(&mut self) {
        let next = match self.mode {
@@ -0,0 +1,475 @@
+//! `@`-mention parsing, completion, and expansion for the composer.
+//!
+//! Two responsibilities live here:
+//!
+//! 1. **Tab-completion** at the cursor — `try_autocomplete_file_mention` is
+//!    called by the composer's Tab handler. Walks the workspace, ranks
+//!    candidates by prefix-then-substring match, and either splices the
+//!    completion in directly (single match), extends to a shared prefix, or
+//!    surfaces options in the status line.
+//! 2. **Expansion before send** — when the user hits Enter on a message that
+//!    contains `@<path>` references, `user_request_with_file_mentions`
+//!    appends a "Local context from @mentions" block with the file contents
+//!    (or directory listings, or media-attachment hints) so the model can see
+//!    what the user pointed at. Capped per-message and per-file.
+//!
+//! The module is deliberately self-contained: nothing inside reaches into UI
+//! widgets or rendering, so it stays unit-testable from `ui/tests.rs` and
+//! from its own module-level tests.
+//!
+//! Pulled out of `ui.rs` to shrink the 5,500-line monolith and to give the
+//! mention logic a single home that future maintainers can find without
+//! grepping for `@` across half the codebase.
+
+use std::fmt::Write;
+use std::io::Read;
+use std::path::{Path, PathBuf};
+
+use ignore::WalkBuilder;
+
+use crate::tui::app::App;
+
+/// Maximum number of `@`-mentions whose contents are inlined into one user
+/// message. Beyond this we stop appending blocks but the raw `@token` text
+/// remains in the message.
+pub const MAX_FILE_MENTIONS_PER_MESSAGE: usize = 8;
+/// Per-file byte ceiling when inlining mention contents.
+pub const MAX_MENTION_FILE_BYTES: u64 = 128 * 1024;
+/// Per-directory entry ceiling when inlining a directory listing.
+pub const MAX_DIRECTORY_MENTION_ENTRIES: usize = 80;
+
+/// Maximum file-mention completion candidates to consider per keypress. Caps
+/// the cost of walking large workspaces; subsequent keystrokes narrow further.
+const FILE_MENTION_COMPLETION_LIMIT: usize = 64;
+
+/// Maximum directory depth walked when completing a file mention. Mirrors the
+/// existing `project_tree` cutoff and keeps Tab snappy in deep monorepos.
+const FILE_MENTION_COMPLETION_DEPTH: usize = 6;
+
+// ---------------------------------------------------------------------------
+//  Tab-completion
+// ---------------------------------------------------------------------------
+
+/// If the cursor sits inside a `@<partial>` token in the input, return the
+/// byte offset where the `@` starts (so we can splice in a completion) and
+/// the token text after the `@`, which is empty when nothing follows it.
+/// The token stops at whitespace or the end of input. Returns `None` when
+/// the cursor is outside any mention or the `@` is not at a mention boundary.
+pub fn partial_file_mention_at_cursor(input: &str, cursor_chars: usize) -> Option<(usize, String)> {
+    let chars: Vec<char> = input.chars().collect();
+    if cursor_chars > chars.len() {
+        return None;
+    }
+    // Walk left from the cursor until we find an `@` or whitespace; if
+    // whitespace comes first, the cursor isn't inside a mention.
+    let mut start_chars = cursor_chars;
+    while start_chars > 0 {
+        let prev = chars[start_chars - 1];
+        if prev == '@' {
+            start_chars -= 1;
+            break;
+        }
+        if prev.is_whitespace() {
+            return None;
+        }
+        start_chars -= 1;
+    }
+    if start_chars == cursor_chars || chars.get(start_chars) != Some(&'@') {
+        return None;
+    }
+    // Confirm the `@` itself is at a valid mention boundary.
+    if !is_file_mention_start(&chars, start_chars) {
+        return None;
+    }
+    // Consume from the `@` to the next whitespace (the end of the token).
+    let mut end_chars = start_chars + 1;
+    while end_chars < chars.len() && !chars[end_chars].is_whitespace() {
+        end_chars += 1;
+    }
+    let partial: String = chars[start_chars + 1..end_chars].iter().collect();
+    let byte_start: usize = chars[..start_chars].iter().map(|c| c.len_utf8()).sum();
+    Some((byte_start, partial))
+}
+
+/// Walk the workspace and return relative paths whose representation matches
+/// the partial mention. A path matches when its lowercased relative form
+/// either starts with the partial or contains it as a substring; prefix
+/// matches rank first, so a partial like `docs/de` resolves to
+/// `docs/deepseek_v4.pdf` before any path that merely contains that text.
+pub fn find_file_mention_completions(workspace: &Path, partial: &str, limit: usize) -> Vec<String> {
+    if limit == 0 {
+        return Vec::new();
+    }
+    let needle = partial.to_lowercase();
+    let mut prefix_hits: Vec<String> = Vec::new();
+    let mut substring_hits: Vec<String> = Vec::new();
+
+    let mut builder = WalkBuilder::new(workspace);
+    builder
+        .hidden(true)
+        .follow_links(false)
+        .max_depth(Some(FILE_MENTION_COMPLETION_DEPTH));
+
+    for entry in builder.build().flatten() {
+        if prefix_hits.len() + substring_hits.len() >= limit {
+            break;
+        }
+        let path = entry.path();
+        let Ok(rel) = path.strip_prefix(workspace) else {
+            continue;
+        };
+        let rel_str = rel.to_string_lossy().replace('\\', "/");
+        if rel_str.is_empty() {
+            continue;
+        }
+        let is_dir = entry.file_type().is_some_and(|ft| ft.is_dir());
+        let candidate = if is_dir {
+            format!("{rel_str}/")
+        } else {
+            rel_str.clone()
+        };
+        let lower = candidate.to_lowercase();
+        if needle.is_empty() || lower.starts_with(&needle) {
+            prefix_hits.push(candidate);
+        } else if lower.contains(&needle) {
+            substring_hits.push(candidate);
+        }
+    }
+
+    prefix_hits.sort();
+    substring_hits.sort();
+    prefix_hits.extend(substring_hits);
+    prefix_hits.truncate(limit);
+    prefix_hits
+}
+
+/// Tab-completion handler for `@file` mentions. Mirrors the slash-command
+/// flow: a single match is applied directly; multiple matches with a longer
+/// shared prefix extend the partial; otherwise the first few candidates are
+/// surfaced via the status line. Returns true when the input was modified or
+/// a suggestion was offered, so the caller can short-circuit other handlers.
+pub fn try_autocomplete_file_mention(app: &mut App) -> bool {
+    let Some((byte_start, partial)) =
+        partial_file_mention_at_cursor(&app.input, app.cursor_position)
+    else {
+        return false;
+    };
+    let workspace = app.workspace.clone();
+    let candidates =
+        find_file_mention_completions(&workspace, &partial, FILE_MENTION_COMPLETION_LIMIT);
+    if candidates.is_empty() {
+        app.status_message = Some(format!("No files match @{partial}"));
+        return true;
+    }
+    if candidates.len() == 1 {
+        replace_file_mention(app, byte_start, &partial, &candidates[0]);
+        app.status_message = Some(format!("Attached @{}", candidates[0]));
+        return true;
+    }
+    let candidate_refs: Vec<&str> = candidates.iter().map(String::as_str).collect();
+    let shared = longest_common_prefix(&candidate_refs);
+    if shared.len() > partial.len() {
+        replace_file_mention(app, byte_start, &partial, shared);
+        app.status_message = Some(format!("@{shared}…"));
+        return true;
+    }
+    let preview = candidates
+        .iter()
+        .take(5)
+        .map(|c| format!("@{c}"))
+        .collect::<Vec<_>>()
+        .join(", ");
+    app.status_message = Some(format!("Matches: {preview}"));
+    true
+}
+
+/// Splice a completion into the input, replacing the `@<partial>` token at
+/// `byte_start` with `@<replacement>`. Cursor moves to the end of the new
+/// token so further keystrokes extend (or escape via space) naturally.
+fn replace_file_mention(app: &mut App, byte_start: usize, partial: &str, replacement: &str) {
+    let original_token_len = '@'.len_utf8() + partial.len();
+    let original_token_end = byte_start + original_token_len;
+    let mut new_input =
+        String::with_capacity(app.input.len() - original_token_len + 1 + replacement.len());
+    new_input.push_str(&app.input[..byte_start]);
+    new_input.push('@');
+    new_input.push_str(replacement);
+    if original_token_end < app.input.len() {
+        new_input.push_str(&app.input[original_token_end..]);
+    }
+    let new_cursor_chars =
+        app.input[..byte_start].chars().count() + 1 + replacement.chars().count();
+    app.input = new_input;
+    app.cursor_position = new_cursor_chars;
+}
+
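+/// Return the longest prefix of the first value that every value starts
+/// with, cut on a UTF-8 char boundary. Returns `""` for an empty slice.
+pub fn longest_common_prefix<'a>(values: &[&'a str]) -> &'a str {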
+    let Some(first) = values.first().copied() else {
+        return "";
+    };
+    let mut end = first.len();
+
+    for value in values.iter().skip(1) {
+        while end > 0 && !value.starts_with(&first[..end]) {
+            end -= 1;
+            // Ensure we land on a valid UTF-8 char boundary.
+            while end > 0 && !first.is_char_boundary(end) {
+                end -= 1;
+            }
+        }
+        if end == 0 {
+            return "";
+        }
+    }
+
+    &first[..end]
+}
+
+// ---------------------------------------------------------------------------
+//  Expansion at send-time
+// ---------------------------------------------------------------------------
+
+/// Append a "Local context from @mentions" block to the user's message when
+/// any `@path` references are present. Returns the input unchanged when
+/// there are none.
+pub fn user_request_with_file_mentions(input: &str, workspace: &Path) -> String {
+    let Some(context) = local_context_from_file_mentions(input, workspace) else {
+        return input.to_string();
+    };
+    format!("{input}\n\n---\n\nLocal context from @mentions:\n{context}")
+}
+
+fn local_context_from_file_mentions(input: &str, workspace: &Path) -> Option<String> {
+    let mentions = extract_file_mentions(input);
+    if mentions.is_empty() {
+        return None;
+    }
+
+    let mut blocks = Vec::new();
+    let mut seen = std::collections::HashSet::new();
+    for mention in mentions.into_iter().take(MAX_FILE_MENTIONS_PER_MESSAGE) {
+        let path = resolve_mention_path(&mention, workspace);
+        let display_path = path
+            .canonicalize()
+            .unwrap_or_else(|_| path.clone())
+            .display()
+            .to_string();
+        if !seen.insert(display_path.clone()) {
+            continue;
+        }
+        blocks.push(render_file_mention_context(&mention, &path, &display_path));
+    }
+
+    if blocks.is_empty() {
+        None
+    } else {
+        Some(blocks.join("\n\n"))
+    }
+}
+
+fn extract_file_mentions(input: &str) -> Vec<String> {
+    let chars: Vec<char> = input.chars().collect();
+    let mut mentions = Vec::new();
+    let mut idx = 0;
+
+    while idx < chars.len() {
+        if chars[idx] != '@' || !is_file_mention_start(&chars, idx) {
+            idx += 1;
+            continue;
+        }
+
+        let Some(next) = chars.get(idx + 1).copied() else {
+            break;
+        };
+        if next.is_whitespace() {
+            idx += 1;
+            continue;
+        }
+
+        if matches!(next, '"' | '\'') {
+            let quote = next;
+            let mut end = idx + 2;
+            let mut raw = String::new();
+            while end < chars.len() && chars[end] != quote {
+                raw.push(chars[end]);
+                end += 1;
+            }
+            if !raw.trim().is_empty() {
+                mentions.push(raw.trim().to_string());
+            }
+            idx = end.saturating_add(1);
+            continue;
+        }
+
+        let mut end = idx + 1;
+        let mut raw = String::new();
+        while end < chars.len() && !chars[end].is_whitespace() {
+            raw.push(chars[end]);
+            end += 1;
+        }
+        let trimmed = trim_unquoted_mention(&raw);
+        if !trimmed.is_empty() {
+            mentions.push(trimmed.to_string());
+        }
+        idx = end;
+    }
+
+    mentions
+}
+
+fn is_file_mention_start(chars: &[char], idx: usize) -> bool {
+    if idx == 0 {
+        return true;
+    }
+    chars
+        .get(idx.saturating_sub(1))
+        .is_some_and(|ch| ch.is_whitespace() || matches!(ch, '(' | '[' | '{' | '<' | '"' | '\''))
+}
+
+fn trim_unquoted_mention(raw: &str) -> &str {
+    let mut trimmed = raw.trim();
+    while trimmed.chars().count() > 1
+        && trimmed
+            .chars()
+            .last()
+            .is_some_and(|ch| matches!(ch, ',' | ';' | ':' | '!' | '?' | ')' | ']' | '}'))
+    {
+        trimmed = &trimmed[..trimmed.len() - trimmed.chars().last().unwrap().len_utf8()];
+    }
+    trimmed
+}
+
+fn resolve_mention_path(raw_path: &str, workspace: &Path) -> PathBuf {
+    let path = expand_mention_home(raw_path);
+    if path.is_absolute() {
+        path
+    } else {
+        workspace.join(path)
+    }
+}
+
+fn expand_mention_home(path: &str) -> PathBuf {
+    if path == "~" {
+        if let Some(home) = std::env::var_os("HOME") {
+            return PathBuf::from(home);
+        }
+    } else if let Some(rest) = path.strip_prefix("~/")
+        && let Some(home) = std::env::var_os("HOME")
+    {
+        return PathBuf::from(home).join(rest);
+    }
+    PathBuf::from(path)
+}
+
+fn render_file_mention_context(raw: &str, path: &Path, display_path: &str) -> String {
+    if !path.exists() {
+        return format!("<missing-file mention=\"@{raw}\" path=\"{display_path}\" />");
+    }
+    if path.is_dir() {
+        return render_directory_mention_context(raw, path, display_path);
+    }
+    if !path.is_file() {
+        return format!("<unsupported-path mention=\"@{raw}\" path=\"{display_path}\" />");
+    }
+    if is_media_path(path) {
+        return format!(
+            "<media-file mention=\"@{raw}\" path=\"{display_path}\">\nUse /attach {raw} when the intent is to attach this image or video to the next message.\n</media-file>"
+        );
+    }
+
+    match read_text_prefix(path) {
+        Ok((text, truncated)) => {
+            let truncated_attr = if truncated { " truncated=\"true\"" } else { "" };
+            format!(
+                "<file mention=\"@{raw}\" path=\"{display_path}\"{truncated_attr}>\n{text}\n</file>"
+            )
+        }
+        Err(err) => {
+            format!(
+                "<unreadable-file mention=\"@{raw}\" path=\"{display_path}\">\n{err}\n</unreadable-file>"
+            )
+        }
+    }
+}
+
+fn render_directory_mention_context(raw: &str, path: &Path, display_path: &str) -> String {
+    let entries = match std::fs::read_dir(path) {
+        Ok(entries) => entries,
+        Err(err) => {
+            return format!(
+                "<unreadable-directory mention=\"@{raw}\" path=\"{display_path}\">\n{err}\n</unreadable-directory>"
+            );
+        }
+    };
+
+    let mut names = entries
+        .filter_map(|entry| entry.ok())
+        .map(|entry| {
+            let marker = entry
+                .file_type()
+                .ok()
+                .filter(|ty| ty.is_dir())
+                .map_or("", |_| "/");
+            format!("{}{}", entry.file_name().to_string_lossy(), marker)
+        })
+        .collect::<Vec<_>>();
+    names.sort();
+    let total = names.len();
+    names.truncate(MAX_DIRECTORY_MENTION_ENTRIES);
+    let mut body = names.join("\n");
+    if total > MAX_DIRECTORY_MENTION_ENTRIES {
+        let omitted = total - MAX_DIRECTORY_MENTION_ENTRIES;
+        let _ = write!(body, "\n... {omitted} more entries");
+    }
+    format!("<directory mention=\"@{raw}\" path=\"{display_path}\">\n{body}\n</directory>")
+}
+
+fn read_text_prefix(path: &Path) -> std::io::Result<(String, bool)> {
+    let mut file = std::fs::File::open(path)?;
+    let mut buffer = Vec::new();
+    file.by_ref()
+        .take(MAX_MENTION_FILE_BYTES + 1)
+        .read_to_end(&mut buffer)?;
+    let truncated = buffer.len() as u64 > MAX_MENTION_FILE_BYTES;
+    if truncated {
+        buffer.truncate(MAX_MENTION_FILE_BYTES as usize);
+    }
+    if buffer.contains(&0) {
+        return Err(std::io::Error::new(
+            std::io::ErrorKind::InvalidData,
+            "file appears to be binary",
+        ));
+    }
+    let text = if truncated {
+        String::from_utf8_lossy(&buffer).to_string()
+    } else {
+        std::str::from_utf8(&buffer)
+            .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "file is not UTF-8"))?
+            .to_string()
+    };
+    Ok((text, truncated))
+}
+
+fn is_media_path(path: &Path) -> bool {
+    let Some(ext) = path.extension().and_then(|ext| ext.to_str()) else {
+        return false;
+    };
+    matches!(
+        ext.to_ascii_lowercase().as_str(),
+        "png"
+            | "jpg"
+            | "jpeg"
+            | "gif"
+            | "webp"
+            | "bmp"
+            | "tif"
+            | "tiff"
+            | "ppm"
+            | "mp4"
+            | "mov"
+            | "m4v"
+            | "webm"
+            | "avi"
+            | "mkv"
+    )
+}
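
The shared-prefix step of Tab-completion in the file above can be tried in isolation. The sketch below copies `longest_common_prefix` from the diff unchanged; the candidate lists and the `partial` value in `main` are made up for illustration and are not taken from the repository.

```rust
/// Copied from the diff above: longest prefix shared by all values, cut on
/// a UTF-8 char boundary.
pub fn longest_common_prefix<'a>(values: &[&'a str]) -> &'a str {
    let Some(first) = values.first().copied() else {
        return "";
    };
    let mut end = first.len();

    for value in values.iter().skip(1) {
        while end > 0 && !value.starts_with(&first[..end]) {
            end -= 1;
            // Step back until the cut lands on a valid char boundary.
            while end > 0 && !first.is_char_boundary(end) {
                end -= 1;
            }
        }
        if end == 0 {
            return "";
        }
    }

    &first[..end]
}

fn main() {
    // Illustrative inputs only.
    let partial = "docs/d";
    let shared = longest_common_prefix(&["docs/deepseek_v4.pdf", "docs/design.md"]);
    // The Tab handler extends the token only when the shared prefix is
    // longer than what the user already typed.
    println!("shared = {shared:?}, extends = {}", shared.len() > partial.len());

    // Multi-byte characters: the cut never splits a code point.
    println!("{:?}", longest_common_prefix(&["héllo", "hèllo"]));
}
```

The inner boundary loop matters for the second call: without it, slicing `first` in the middle of `é` would panic.
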
@@ -8,6 +8,7 @@ pub mod clipboard;
 pub mod command_palette;
 pub mod diff_render;
 pub mod event_broker;
+pub mod file_mention;
 pub mod history;
 pub mod markdown_render;
 pub mod model_picker;
@@ -358,7 +358,15 @@ mod tests {
            yolo: false,
            resume_session_id: None,
        };
-        App::new(options, &Config::default())
+        let mut app = App::new(options, &Config::default());
+        // App::new merges in `~/.config/deepseek/settings.toml` /
+        // `Application Support/deepseek/settings.toml`, which can override
+        // the model and effort with whatever the developer happens to have
+        // saved. Pin both back to known values so the picker tests below
+        // exercise the picker logic, not the user's environment.
+        app.model = "deepseek-v4-pro".to_string();
+        app.reasoning_effort = ReasoningEffort::Max;
+        app
    }

    #[test]
@@ -46,6 +46,14 @@ pub enum TranscriptScroll {

 impl TranscriptScroll {
    /// Resolve the anchor to a top line index.
+    ///
+    /// When the original anchor cell no longer exists (because content was
+    /// rewritten — e.g. an inline RLM `repl` block expanded into
+    /// `Thinking + Text`, or a tool result was replaced) we used to fall
+    /// straight to `ToBottom`, which felt like the view "got stuck" because
+    /// the user's next Up press would teleport-then-recompute from the
+    /// bottom. Instead, clamp to the nearest surviving cell so scroll
+    /// position is preserved across rewrites.
    #[must_use]
    pub fn resolve_top(self, line_meta: &[TranscriptLineMeta], max_start: usize) -> (Self, usize) {
        match self {
@@ -54,23 +62,60 @@ impl TranscriptScroll {
                cell_index,
                line_in_cell,
            } => {
-                let anchor = anchor_index(line_meta, cell_index, line_in_cell);
-                match anchor {
-                    Some(idx) => (self, idx.min(max_start)),
-                    None => (TranscriptScroll::ToBottom, max_start),
+                if let Some(idx) = anchor_index(line_meta, cell_index, line_in_cell) {
+                    return (self, idx.min(max_start));
                }
+                // Fallback 1: same cell, top line — handles cases where the
+                // line count for a cell shrank (e.g. text was condensed).
+                if let Some(idx) = anchor_index(line_meta, cell_index, 0) {
+                    return (
+                        TranscriptScroll::Scrolled {
+                            cell_index,
+                            line_in_cell: 0,
+                        },
+                        idx.min(max_start),
+                    );
+                }
+                // Fallback 2: nearest surviving cell at or before the
+                // requested cell index. A linear scan of line_meta.
+                if let Some((ci, li, idx)) = nearest_cell_at_or_before(line_meta, cell_index) {
+                    return (
+                        TranscriptScroll::Scrolled {
+                            cell_index: ci,
+                            line_in_cell: li,
+                        },
+                        idx.min(max_start),
+                    );
+                }
+                // Last resort — there are no cell lines at all (empty
+                // transcript). ToBottom is fine here.
+                (TranscriptScroll::ToBottom, max_start)
            }
            TranscriptScroll::ScrolledSpacerBeforeCell { cell_index } => {
-                let anchor = spacer_before_cell_index(line_meta, cell_index);
-                match anchor {
-                    Some(idx) => (self, idx.min(max_start)),
-                    None => (TranscriptScroll::ToBottom, max_start),
+                if let Some(idx) = spacer_before_cell_index(line_meta, cell_index) {
+                    return (self, idx.min(max_start));
                }
+                if let Some((ci, li, idx)) = nearest_cell_at_or_before(line_meta, cell_index) {
+                    return (
+                        TranscriptScroll::Scrolled {
+                            cell_index: ci,
+                            line_in_cell: li,
+                        },
+                        idx.min(max_start),
+                    );
+                }
+                (TranscriptScroll::ToBottom, max_start)
            }
        }
    }

    /// Apply a delta scroll and return the updated anchor.
+    ///
+    /// When the existing anchor cell is gone (content rewrite), fall back to
+    /// the nearest surviving cell instead of `max_start`. Without that, an
+    /// Up press from a missing-anchor state would resolve `current_top` to
+    /// the bottom and then walk up by `delta`, teleporting the user near
+    /// the bottom of the transcript.
    #[must_use]
    pub fn scrolled_by(
        self,
@@ -94,10 +139,15 @@ impl TranscriptScroll {
                cell_index,
                line_in_cell,
            } => anchor_index(line_meta, cell_index, line_in_cell)
+                .or_else(|| anchor_index(line_meta, cell_index, 0))
+                .or_else(|| nearest_cell_at_or_before(line_meta, cell_index).map(|(_, _, idx)| idx))
                .unwrap_or(max_start)
                .min(max_start),
            TranscriptScroll::ScrolledSpacerBeforeCell { cell_index } => {
                spacer_before_cell_index(line_meta, cell_index)
+                    .or_else(|| {
+                        nearest_cell_at_or_before(line_meta, cell_index).map(|(_, _, idx)| idx)
+                    })
                    .unwrap_or(max_start)
                    .min(max_start)
            }
@@ -110,7 +160,7 @@ impl TranscriptScroll {
            current_top.saturating_add(delta).min(max_start)
        };

-        if new_top == max_start {
+        if new_top >= max_start {
            TranscriptScroll::ToBottom
        } else {
            TranscriptScroll::anchor_for(line_meta, new_top).unwrap_or(TranscriptScroll::ToBottom)
@@ -198,6 +248,48 @@ fn anchor_at_or_before(line_meta: &[TranscriptLineMeta], start: usize) -> Option
        .find_map(|(_, entry)| entry.cell_line())
 }

+/// Scan `line_meta` and return the last line of the highest-indexed surviving
+/// cell whose `cell_index <= target`. Used as a fallback when the
+/// original anchor cell was removed by a content rewrite — keeps the user
+/// near where they were instead of teleporting to the bottom.
+///
+/// Returns `(cell_index, line_in_cell, line_meta_index)`.
+fn nearest_cell_at_or_before(
+    line_meta: &[TranscriptLineMeta],
+    target: usize,
+) -> Option<(usize, usize, usize)> {
+    let mut best: Option<(usize, usize, usize)> = None;
+    for (idx, entry) in line_meta.iter().enumerate() {
+        if let TranscriptLineMeta::CellLine {
+            cell_index,
+            line_in_cell,
+        } = *entry
+            && cell_index <= target
+        {
+            match best {
+                None => best = Some((cell_index, line_in_cell, idx)),
+                Some((bci, _, _)) if cell_index >= bci => {
+                    best = Some((cell_index, line_in_cell, idx));
+                }
+                _ => {}
+            }
+        }
+    }
+    // Fall back to the very first surviving cell if nothing matched.
+    if best.is_none() {
+        for (idx, entry) in line_meta.iter().enumerate() {
+            if let TranscriptLineMeta::CellLine {
+                cell_index,
+                line_in_cell,
+            } = *entry
+            {
+                return Some((cell_index, line_in_cell, idx));
+            }
+        }
+    }
+    best
+}
+
 /// Direction for mouse scroll input.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum ScrollDirection {
@@ -250,3 +342,151 @@ impl MouseScrollState {
        ScrollUpdate { delta_lines: delta }
    }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn cell_line(cell_index: usize, line_in_cell: usize) -> TranscriptLineMeta {
+        TranscriptLineMeta::CellLine {
+            cell_index,
+            line_in_cell,
+        }
+    }
+
+    /// Build a synthetic line-meta array for a transcript with `cell_count`
+    /// cells, each `lines_per_cell` lines tall, separated by spacers.
+    fn synth_line_meta(cell_count: usize, lines_per_cell: usize) -> Vec<TranscriptLineMeta> {
+        let mut meta = Vec::new();
+        for cell in 0..cell_count {
+            for line in 0..lines_per_cell {
+                meta.push(cell_line(cell, line));
+            }
+            if cell + 1 < cell_count {
+                meta.push(TranscriptLineMeta::Spacer);
+            }
+        }
+        meta
+    }
+
+    /// Regression test for the "stuck after content rewrite" bug from
+    /// issue #56. When the anchor cell still exists, scroll position is
+    /// preserved.
+    #[test]
+    fn resolve_top_keeps_position_when_anchor_cell_exists() {
+        let meta = synth_line_meta(5, 3); // 5 cells × 3 lines + 4 spacers = 19 entries
+        let max_start = meta.len().saturating_sub(8);
+        let anchor = TranscriptScroll::Scrolled {
+            cell_index: 2,
+            line_in_cell: 1,
+        };
+        let (state, top) = anchor.resolve_top(&meta, max_start);
+        assert_eq!(state, anchor);
+        // Cell 2, line 1 sits at: 0,1,2 (cell 0), spacer, 4,5,6 (cell 1),
+        // spacer, 8,9,10 (cell 2) — line 1 of cell 2 is index 9.
+        assert_eq!(top, 9);
+    }
+
+    /// Regression test for issue #56: when a content rewrite removes the
+    /// anchor cell, the previous behaviour was to teleport to ToBottom.
+    /// Now we clamp to the nearest surviving cell at-or-before the
+    /// requested cell index.
+    #[test]
+    fn resolve_top_clamps_to_nearest_cell_when_anchor_cell_removed() {
+        // Original transcript had cells 0..5; a rewrite shrank it to 0..3.
+        // The anchor pointed at cell 4 — that cell no longer exists.
+        let meta = synth_line_meta(3, 2); // cells 0..3, 2 lines each + 2 spacers
+        let max_start = meta.len();
+        let stale_anchor = TranscriptScroll::Scrolled {
+            cell_index: 4,
+            line_in_cell: 0,
+        };
+        let (state, top) = stale_anchor.resolve_top(&meta, max_start);
+        // Should clamp to the highest-indexed surviving cell (cell 2)
+        // rather than ToBottom.
+        match state {
+            TranscriptScroll::Scrolled { cell_index, .. } => assert_eq!(cell_index, 2),
+            other => panic!("expected Scrolled, got {other:?}"),
+        }
+        // top should be a valid index into meta, not max_start.
+        assert!(top < meta.len());
+    }
+
+    /// Same fallback behaviour applies when scrolling further by a delta:
+    /// scrolled_by computes its current_top from the (stale) anchor and
+    /// the user's keypress should still move them up rather than locking
+    /// them near the bottom.
+    #[test]
+    fn scrolled_by_does_not_teleport_on_missing_anchor() {
+        let meta = synth_line_meta(3, 2);
+        let visible = 4;
+        let stale_anchor = TranscriptScroll::Scrolled {
+            cell_index: 4,
+            line_in_cell: 0,
+        };
+        // User presses Up (negative delta) from a stale anchor.
+        let new_state = stale_anchor.scrolled_by(-1, &meta, visible);
+        // Either ends up Scrolled within the surviving content
+        // or ToBottom if the transcript fits in one screen — but the
+        // failure mode we're testing for is "ToBottom even though Up was
+        // pressed and there's room to scroll," which depends on
+        // total_lines > visible_lines.
+        if meta.len() > visible {
+            assert!(matches!(new_state, TranscriptScroll::Scrolled { .. }));
+        }
+    }
+
+    /// When the transcript fits entirely in the viewport, the scroll
+    /// state collapses to ToBottom regardless of where the anchor was.
+    #[test]
+    fn scrolled_by_collapses_to_bottom_when_view_fits() {
+        let meta = synth_line_meta(2, 2);
+        let visible = meta.len() + 5;
+        let anchor = TranscriptScroll::Scrolled {
+            cell_index: 0,
+            line_in_cell: 0,
+        };
+        let new_state = anchor.scrolled_by(-1, &meta, visible);
+        assert_eq!(new_state, TranscriptScroll::ToBottom);
+    }
+
+    /// `nearest_cell_at_or_before` returns the highest cell_index that
+    /// is still ≤ the requested target.
+    #[test]
+    fn nearest_cell_at_or_before_picks_highest_surviving() {
+        let meta = synth_line_meta(4, 1); // cells 0..4, one line each + spacers
+        let result = nearest_cell_at_or_before(&meta, 10);
+        let (cell_index, line_in_cell, _) = result.expect("a cell should match");
+        assert_eq!(cell_index, 3);
+        assert_eq!(line_in_cell, 0);
+    }
+
+    /// And falls back to the very first surviving cell when target is
+    /// below all surviving cells (shouldn't happen in practice but the
+    /// helper guards against it).
+    #[test]
+    fn nearest_cell_at_or_before_falls_back_to_first_when_target_too_low() {
+        let mut meta = synth_line_meta(0, 0);
+        meta.push(cell_line(5, 0));
+        meta.push(cell_line(6, 0));
+        let result = nearest_cell_at_or_before(&meta, 2);
+        let (cell_index, _, _) = result.expect("falls back to first cell");
+        assert_eq!(cell_index, 5);
+    }
+
+    /// Empty line_meta returns None — caller falls through to ToBottom.
+    #[test]
+    fn nearest_cell_at_or_before_empty_returns_none() {
+        let meta: Vec<TranscriptLineMeta> = Vec::new();
+        assert!(nearest_cell_at_or_before(&meta, 0).is_none());
+    }
+
+    #[test]
+    fn to_bottom_anchor_resolves_to_max_start() {
+        let meta = synth_line_meta(5, 2);
+        let max_start = 7;
+        let (state, top) = TranscriptScroll::ToBottom.resolve_top(&meta, max_start);
+        assert_eq!(state, TranscriptScroll::ToBottom);
+        assert_eq!(top, max_start);
+    }
+}
@@ -1,7 +1,7 @@
 //! TUI event loop and rendering logic for `DeepSeek` CLI.

 use std::fmt::Write;
-use std::io::{self, Read, Stdout};
+use std::io::{self, Stdout};
 use std::path::{Path, PathBuf};
 use std::process::Command;
 use std::time::{Duration, Instant};
@@ -16,7 +16,6 @@ use crossterm::{
    execute,
    terminal::{EnterAlternateScreen, LeaveAlternateScreen, disable_raw_mode, enable_raw_mode},
 };
-use ignore::WalkBuilder;
 use ratatui::{
    Frame, Terminal,
    backend::CrosstermBackend,
@@ -92,9 +91,6 @@ const SLASH_MENU_LIMIT: usize = 6;
 const MIN_CHAT_HEIGHT: u16 = 3;
 const MIN_COMPOSER_HEIGHT: u16 = 2;
 const CONTEXT_WARNING_THRESHOLD_PERCENT: f64 = 85.0;
-const MAX_FILE_MENTIONS_PER_MESSAGE: usize = 8;
-const MAX_MENTION_FILE_BYTES: u64 = 128 * 1024;
-const MAX_DIRECTORY_MENTION_ENTRIES: usize = 80;
 const CONTEXT_CRITICAL_THRESHOLD_PERCENT: f64 = 95.0;
 const UI_IDLE_POLL_MS: u64 = 48;
 const UI_ACTIVE_POLL_MS: u64 = 24;
@@ -1269,10 +1265,18 @@ async fn run_event_loop(
                    if try_autocomplete_slash_command(app) {
                        continue;
                    }
-                    if try_autocomplete_file_mention(app) {
+                    if crate::tui::file_mention::try_autocomplete_file_mention(app) {
                        continue;
                    }
+                    let prior_model = app.model.clone();
                    app.cycle_mode();
+                    if app.model != prior_model {
+                        let _ = engine_handle
+                            .send(Op::SetModel {
+                                model: app.model.clone(),
+                            })
+                            .await;
+                    }
                }
                KeyCode::BackTab => {
                    app.cycle_effort();
@@ -1587,7 +1591,7 @@ fn try_autocomplete_slash_command(app: &mut App) -> bool {
    }

    let names = matches.iter().map(|info| info.name).collect::<Vec<_>>();
-    let shared = longest_common_prefix(&names);
+    let shared = crate::tui::file_mention::longest_common_prefix(&names);

    if !shared.is_empty() && shared.len() > prefix.len() {
        app.input = format!("/{shared}");
@@ -1616,189 +1620,6 @@ fn try_autocomplete_slash_command(app: &mut App) -> bool {
    true
 }

-/// Maximum file-mention completion candidates to consider per keypress. Caps
-/// the cost of walking large workspaces; subsequent keystrokes narrow further.
-const FILE_MENTION_COMPLETION_LIMIT: usize = 64;
-
-/// Maximum directory depth walked when completing a file mention. Mirrors the
-/// existing `project_tree` cutoff and keeps Tab snappy in deep monorepos.
-const FILE_MENTION_COMPLETION_DEPTH: usize = 6;
-
-/// If the cursor sits inside a `@<partial>` token in the input, return the
-/// byte offset where the `@` starts (so we can splice in a completion) and
-/// the partial path the user has typed so far. The token stops at whitespace
-/// or the end of input. Returns `None` when the cursor is outside any mention
-/// or the token is empty (`@` with nothing after it).
-fn partial_file_mention_at_cursor(input: &str, cursor_chars: usize) -> Option<(usize, String)> {
-    let chars: Vec<char> = input.chars().collect();
-    if cursor_chars > chars.len() {
-        return None;
-    }
-    // Walk left from the cursor until we find an `@` or a whitespace; if
-    // whitespace comes first the cursor isn't inside a mention.
-    let mut start_chars = cursor_chars;
-    while start_chars > 0 {
-        let prev = chars[start_chars - 1];
-        if prev == '@' {
-            start_chars -= 1;
-            break;
-        }
-        if prev.is_whitespace() {
-            return None;
-        }
-        start_chars -= 1;
-    }
-    if start_chars == cursor_chars || chars.get(start_chars) != Some(&'@') {
-        return None;
-    }
-    // Confirm the `@` itself is at a valid mention boundary.
-    if !is_file_mention_start(&chars, start_chars) {
-        return None;
-    }
-    // Consume from the `@` to the next whitespace (the end of the token).
-    let mut end_chars = start_chars + 1;
-    while end_chars < chars.len() && !chars[end_chars].is_whitespace() {
-        end_chars += 1;
-    }
-    let partial: String = chars[start_chars + 1..end_chars].iter().collect();
-    let byte_start: usize = chars[..start_chars].iter().map(|c| c.len_utf8()).sum();
-    Some((byte_start, partial))
-}
-
-/// Walk the workspace and return relative paths whose representation matches
-/// the partial mention. A file matches when its case-insensitive relative
-/// path either starts with the partial or contains it as a substring; the
-/// former rank earlier so a partial like `docs/de` resolves to
-/// `docs/deepseek_v4.pdf` before any path that merely contains those bytes.
-fn find_file_mention_completions(workspace: &Path, partial: &str, limit: usize) -> Vec<String> {
-    if limit == 0 {
-        return Vec::new();
-    }
-    let needle = partial.to_lowercase();
-    let mut prefix_hits: Vec<String> = Vec::new();
-    let mut substring_hits: Vec<String> = Vec::new();
-
-    let mut builder = WalkBuilder::new(workspace);
-    builder
-        .hidden(true)
-        .follow_links(false)
-        .max_depth(Some(FILE_MENTION_COMPLETION_DEPTH));
-
-    for entry in builder.build().flatten() {
-        if prefix_hits.len() + substring_hits.len() >= limit {
-            break;
-        }
-        let path = entry.path();
-        let Ok(rel) = path.strip_prefix(workspace) else {
-            continue;
-        };
-        let rel_str = rel.to_string_lossy().replace('\\', "/");
-        if rel_str.is_empty() {
-            continue;
-        }
-        let is_dir = entry.file_type().is_some_and(|ft| ft.is_dir());
-        let candidate = if is_dir {
-            format!("{rel_str}/")
-        } else {
-            rel_str.clone()
-        };
-        let lower = candidate.to_lowercase();
-        if needle.is_empty() || lower.starts_with(&needle) {
-            prefix_hits.push(candidate);
-        } else if lower.contains(&needle) {
-            substring_hits.push(candidate);
-        }
-    }
-
-    prefix_hits.sort();
-    substring_hits.sort();
-    prefix_hits.extend(substring_hits);
-    prefix_hits.truncate(limit);
-    prefix_hits
-}
-
-/// Tab-completion handler for `@file` mentions. Mirrors the slash-command
-/// flow: a single match is applied directly; multiple matches with a longer
-/// shared prefix extend the partial; otherwise the first few candidates are
-/// surfaced via the status line. Returns true when the input was modified or
-/// a suggestion was offered, so the caller can short-circuit other handlers.
-fn try_autocomplete_file_mention(app: &mut App) -> bool {
-    let Some((byte_start, partial)) =
-        partial_file_mention_at_cursor(&app.input, app.cursor_position)
-    else {
-        return false;
-    };
-    let workspace = app.workspace.clone();
-    let candidates =
-        find_file_mention_completions(&workspace, &partial, FILE_MENTION_COMPLETION_LIMIT);
-    if candidates.is_empty() {
-        app.status_message = Some(format!("No files match @{partial}"));
-        return true;
-    }
-    if candidates.len() == 1 {
-        replace_file_mention(app, byte_start, &partial, &candidates[0]);
-        app.status_message = Some(format!("Attached @{}", candidates[0]));
-        return true;
-    }
-    let candidate_refs: Vec<&str> = candidates.iter().map(String::as_str).collect();
-    let shared = longest_common_prefix(&candidate_refs);
-    if shared.len() > partial.len() {
-        replace_file_mention(app, byte_start, &partial, shared);
-        app.status_message = Some(format!("@{shared}…"));
-        return true;
-    }
-    let preview = candidates
-        .iter()
-        .take(5)
-        .map(|c| format!("@{c}"))
-        .collect::<Vec<_>>()
-        .join(", ");
-    app.status_message = Some(format!("Matches: {preview}"));
-    true
-}
-
-/// Splice a completion into the input, replacing the `@<partial>` token at
-/// `byte_start` with `@<replacement>`. Cursor moves to the end of the new
-/// token so further keystrokes extend (or escape via space) naturally.
-fn replace_file_mention(app: &mut App, byte_start: usize, partial: &str, replacement: &str) {
-    let original_token_len = '@'.len_utf8() + partial.len();
-    let original_token_end = byte_start + original_token_len;
-    let mut new_input =
-        String::with_capacity(app.input.len() - original_token_len + 1 + replacement.len());
-    new_input.push_str(&app.input[..byte_start]);
-    new_input.push('@');
-    new_input.push_str(replacement);
-    if original_token_end < app.input.len() {
-        new_input.push_str(&app.input[original_token_end..]);
-    }
-    let new_cursor_chars =
-        app.input[..byte_start].chars().count() + 1 + replacement.chars().count();
-    app.input = new_input;
-    app.cursor_position = new_cursor_chars;
-}
-
-fn longest_common_prefix<'a>(values: &[&'a str]) -> &'a str {
-    let Some(first) = values.first().copied() else {
-        return "";
-    };
-    let mut end = first.len();
-
-    for value in values.iter().skip(1) {
-        while end > 0 && !value.starts_with(&first[..end]) {
-            end -= 1;
-            // Ensure we land on a valid UTF-8 char boundary.
-            while end > 0 && !first.is_char_boundary(end) {
-                end -= 1;
-            }
-        }
-        if end == 0 {
-            return "";
-        }
-    }
-
-    &first[..end]
-}
-
 async fn fetch_available_models(config: &Config) -> Result<Vec<String>> {
    use crate::client::DeepSeekClient;

@@ -2034,7 +1855,8 @@ fn build_queued_message(app: &mut App, input: String) -> QueuedMessage {
 }

 fn queued_message_content_for_app(app: &App, message: &QueuedMessage) -> String {
-    let user_request = user_request_with_file_mentions(&message.display, &app.workspace);
+    let user_request =
+        crate::tui::file_mention::user_request_with_file_mentions(&message.display, &app.workspace);
    if let Some(skill_instruction) = message.skill_instruction.as_ref() {
        format!("{skill_instruction}\n\n---\n\nUser request: {user_request}")
    } else {
@@ -2042,248 +1864,6 @@ fn queued_message_content_for_app(app: &App, message: &QueuedMessage) -> String
    }
 }

-fn user_request_with_file_mentions(input: &str, workspace: &Path) -> String {
-    let Some(context) = local_context_from_file_mentions(input, workspace) else {
-        return input.to_string();
-    };
-    format!("{input}\n\n---\n\nLocal context from @mentions:\n{context}")
-}
-
-fn local_context_from_file_mentions(input: &str, workspace: &Path) -> Option<String> {
-    let mentions = extract_file_mentions(input);
-    if mentions.is_empty() {
-        return None;
-    }
-
-    let mut blocks = Vec::new();
-    let mut seen = std::collections::HashSet::new();
-    for mention in mentions.into_iter().take(MAX_FILE_MENTIONS_PER_MESSAGE) {
-        let path = resolve_mention_path(&mention, workspace);
-        let display_path = path
-            .canonicalize()
-            .unwrap_or_else(|_| path.clone())
-            .display()
-            .to_string();
-        if !seen.insert(display_path.clone()) {
-            continue;
-        }
-        blocks.push(render_file_mention_context(&mention, &path, &display_path));
-    }
-
-    if blocks.is_empty() {
-        None
-    } else {
-        Some(blocks.join("\n\n"))
-    }
-}
-
-fn extract_file_mentions(input: &str) -> Vec<String> {
-    let chars: Vec<char> = input.chars().collect();
-    let mut mentions = Vec::new();
-    let mut idx = 0;
-
-    while idx < chars.len() {
-        if chars[idx] != '@' || !is_file_mention_start(&chars, idx) {
-            idx += 1;
-            continue;
-        }
-
-        let Some(next) = chars.get(idx + 1).copied() else {
-            break;
-        };
-        if next.is_whitespace() {
-            idx += 1;
-            continue;
-        }
-
-        if matches!(next, '"' | '\'') {
-            let quote = next;
-            let mut end = idx + 2;
-            let mut raw = String::new();
-            while end < chars.len() && chars[end] != quote {
-                raw.push(chars[end]);
-                end += 1;
-            }
-            if !raw.trim().is_empty() {
-                mentions.push(raw.trim().to_string());
-            }
-            idx = end.saturating_add(1);
-            continue;
-        }
-
-        let mut end = idx + 1;
-        let mut raw = String::new();
-        while end < chars.len() && !chars[end].is_whitespace() {
-            raw.push(chars[end]);
-            end += 1;
-        }
-        let trimmed = trim_unquoted_mention(&raw);
-        if !trimmed.is_empty() {
-            mentions.push(trimmed.to_string());
-        }
-        idx = end;
-    }
-
-    mentions
-}
-
-fn is_file_mention_start(chars: &[char], idx: usize) -> bool {
-    if idx == 0 {
-        return true;
-    }
-    chars
-        .get(idx.saturating_sub(1))
-        .is_some_and(|ch| ch.is_whitespace() || matches!(ch, '(' | '[' | '{' | '<' | '"' | '\''))
-}
-
-fn trim_unquoted_mention(raw: &str) -> &str {
-    let mut trimmed = raw.trim();
-    while trimmed.chars().count() > 1
-        && trimmed
-            .chars()
-            .last()
-            .is_some_and(|ch| matches!(ch, ',' | ';' | ':' | '!' | '?' | ')' | ']' | '}'))
-    {
-        trimmed = &trimmed[..trimmed.len() - trimmed.chars().last().unwrap().len_utf8()];
-    }
-    trimmed
-}
-
-fn resolve_mention_path(raw_path: &str, workspace: &Path) -> PathBuf {
-    let path = expand_mention_home(raw_path);
-    if path.is_absolute() {
-        path
-    } else {
-        workspace.join(path)
-    }
-}
-
-fn expand_mention_home(path: &str) -> PathBuf {
-    if path == "~" {
-        if let Some(home) = std::env::var_os("HOME") {
-            return PathBuf::from(home);
-        }
-    } else if let Some(rest) = path.strip_prefix("~/")
-        && let Some(home) = std::env::var_os("HOME")
-    {
-        return PathBuf::from(home).join(rest);
-    }
-    PathBuf::from(path)
-}
-
-fn render_file_mention_context(raw: &str, path: &Path, display_path: &str) -> String {
-    if !path.exists() {
-        return format!("<missing-file mention=\"@{raw}\" path=\"{display_path}\" />");
-    }
-    if path.is_dir() {
-        return render_directory_mention_context(raw, path, display_path);
-    }
-    if !path.is_file() {
-        return format!("<unsupported-path mention=\"@{raw}\" path=\"{display_path}\" />");
-    }
-    if is_media_path(path) {
-        return format!(
-            "<media-file mention=\"@{raw}\" path=\"{display_path}\">\nUse /attach {raw} when the intent is to attach this image or video to the next message.\n</media-file>"
-        );
-    }
-
-    match read_text_prefix(path) {
-        Ok((text, truncated)) => {
-            let truncated_attr = if truncated { " truncated=\"true\"" } else { "" };
-            format!(
-                "<file mention=\"@{raw}\" path=\"{display_path}\"{truncated_attr}>\n{text}\n</file>"
-            )
-        }
-        Err(err) => {
-            format!(
-                "<unreadable-file mention=\"@{raw}\" path=\"{display_path}\">\n{err}\n</unreadable-file>"
-            )
-        }
-    }
-}
-
-fn render_directory_mention_context(raw: &str, path: &Path, display_path: &str) -> String {
-    let entries = match std::fs::read_dir(path) {
-        Ok(entries) => entries,
-        Err(err) => {
-            return format!(
-                "<unreadable-directory mention=\"@{raw}\" path=\"{display_path}\">\n{err}\n</unreadable-directory>"
-            );
-        }
-    };
-
-    let mut names = entries
-        .filter_map(|entry| entry.ok())
-        .map(|entry| {
-            let marker = entry
-                .file_type()
-                .ok()
-                .filter(|ty| ty.is_dir())
-                .map_or("", |_| "/");
-            format!("{}{}", entry.file_name().to_string_lossy(), marker)
-        })
-        .collect::<Vec<_>>();
-    names.sort();
-    let total = names.len();
-    names.truncate(MAX_DIRECTORY_MENTION_ENTRIES);
-    let mut body = names.join("\n");
-    if total > MAX_DIRECTORY_MENTION_ENTRIES {
-        let omitted = total - MAX_DIRECTORY_MENTION_ENTRIES;
-        let _ = write!(body, "\n... {omitted} more entries");
-    }
-    format!("<directory mention=\"@{raw}\" path=\"{display_path}\">\n{body}\n</directory>")
-}
-
-fn read_text_prefix(path: &Path) -> std::io::Result<(String, bool)> {
-    let mut file = std::fs::File::open(path)?;
-    let mut buffer = Vec::new();
-    file.by_ref()
-        .take(MAX_MENTION_FILE_BYTES + 1)
-        .read_to_end(&mut buffer)?;
-    let truncated = buffer.len() as u64 > MAX_MENTION_FILE_BYTES;
-    if truncated {
-        buffer.truncate(MAX_MENTION_FILE_BYTES as usize);
-    }
-    if buffer.contains(&0) {
-        return Err(std::io::Error::new(
-            std::io::ErrorKind::InvalidData,
-            "file appears to be binary",
-        ));
-    }
-    let text = if truncated {
-        String::from_utf8_lossy(&buffer).to_string()
-    } else {
-        std::str::from_utf8(&buffer)
-            .map_err(|_| std::io::Error::new(std::io::ErrorKind::InvalidData, "file is not UTF-8"))?
-            .to_string()
-    };
-    Ok((text, truncated))
-}
-
-fn is_media_path(path: &Path) -> bool {
-    let Some(ext) = path.extension().and_then(|ext| ext.to_str()) else {
-        return false;
-    };
-    matches!(
-        ext.to_ascii_lowercase().as_str(),
-        "png"
-            | "jpg"
-            | "jpeg"
-            | "gif"
-            | "webp"
-            | "bmp"
-            | "tif"
-            | "tiff"
-            | "ppm"
-            | "mp4"
-            | "mov"
-            | "m4v"
-            | "webm"
-            | "avi"
-            | "mkv"
-    )
-}
-
 async fn dispatch_user_message(
    app: &mut App,
    engine_handle: &EngineHandle,
@@ -3338,16 +2918,28 @@ fn render_sidebar_subagents(f: &mut Frame, area: Rect, app: &App) {
            .iter()
            .filter(|agent| matches!(agent.status, SubAgentStatus::Running))
            .count();
-        lines.push(Line::from(vec![
-            Span::styled(
-                format!("{running} running"),
-                Style::default().fg(palette::DEEPSEEK_SKY).bold(),
-            ),
-            Span::styled(
-                format!(" / {}", app.subagent_cache.len()),
-                Style::default().fg(palette::TEXT_MUTED),
-            ),
-        ]));
+        let done = app.subagent_cache.len().saturating_sub(running);
+        // When agents have all finished, "0 running / 1" reads as broken.
+        // Switch to "1 done" once nothing is in flight; only show the
+        // running/total split while activity is live.
+        let header = if running > 0 {
+            vec![
+                Span::styled(
+                    format!("{running} running"),
+                    Style::default().fg(palette::DEEPSEEK_SKY).bold(),
+                ),
+                Span::styled(
+                    format!(" / {}", app.subagent_cache.len()),
+                    Style::default().fg(palette::TEXT_MUTED),
+                ),
+            ]
+        } else {
+            vec![Span::styled(
+                format!("{done} done"),
+                Style::default().fg(palette::STATUS_SUCCESS),
+            )]
+        };
+        lines.push(Line::from(header));

        let usable_rows = area.height.saturating_sub(3) as usize;
        let max_agents = usable_rows.saturating_sub(lines.len());
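The header selection in this hunk can be read in isolation as a small pure function. The sketch below is illustrative only: `subagent_header_label` is a hypothetical name, and `running` and `total` stand in for the running count and `app.subagent_cache.len()`. It returns plain text rather than styled spans, so it shows the branching but not the palette choices.

```rust
/// Hypothetical helper (not part of the commit) mirroring the header logic:
/// show "N running / total" while work is in flight, else "N done".
fn subagent_header_label(running: usize, total: usize) -> String {
    if running > 0 {
        // Activity is live: keep the running/total split.
        format!("{running} running / {total}")
    } else {
        // Nothing in flight: every cached agent counts as done.
        let done = total.saturating_sub(running);
        format!("{done} done")
    }
}

fn main() {
    println!("{}", subagent_header_label(2, 3));
    println!("{}", subagent_header_label(0, 1));
}
```

Whether the caller ever renders this header with an empty cache is not visible in the hunk, so the `0 done` case is left as the function naturally produces it.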
@@ -3903,14 +3495,13 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec<Span<'static>> {
 }

 fn footer_coherence_spans(app: &App) -> Vec<Span<'static>> {
-    // Only show coherence when it's NOT healthy — normal operation
-    // needs no label; anomalies stand out. Renamed "crowded" to
-    // "high load" because the capacity model measures tool/action
-    // complexity, not context-window fullness, and "crowded" is
-    // confusing when the header shows 6% context.
+    // Only surface coherence when the engine is actively intervening — the
+    // user-facing signal is "we're doing something different now," not
+    // "your conversation is getting complex," which the context-percent
+    // header already covers. `GettingCrowded` is just a soft hint, so we
+    // suppress it; the active interventions get their own visible label.
    let (label, color) = match app.coherence_state {
-        CoherenceState::Healthy => return Vec::new(),
-        CoherenceState::GettingCrowded => ("high load", palette::STATUS_WARNING),
+        CoherenceState::Healthy | CoherenceState::GettingCrowded => return Vec::new(),
        CoherenceState::RefreshingContext => ("refreshing context", palette::STATUS_WARNING),
        CoherenceState::VerifyingRecentWork => ("verifying", palette::DEEPSEEK_SKY),
        CoherenceState::ResettingPlan => ("resetting plan", palette::STATUS_ERROR),
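As a reading aid, the suppression rule can be sketched with a local stand-in for the enum. Two assumptions: the `CoherenceState` below lists only the variants visible in this hunk (the real type in `crate::core::coherence` may have more), and `footer_chip_label` is a hypothetical name that returns just the label, without the color.

```rust
/// Local stand-in for the real enum; only the variants shown in the hunk.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CoherenceState {
    Healthy,
    GettingCrowded,
    RefreshingContext,
    VerifyingRecentWork,
    ResettingPlan,
}

/// Hypothetical helper: `None` means "render no footer chip".
fn footer_chip_label(state: CoherenceState) -> Option<&'static str> {
    match state {
        // Soft hints stay quiet; the context-percent header covers them.
        CoherenceState::Healthy | CoherenceState::GettingCrowded => None,
        // Active engine interventions get a visible label.
        CoherenceState::RefreshingContext => Some("refreshing context"),
        CoherenceState::VerifyingRecentWork => Some("verifying"),
        CoherenceState::ResettingPlan => Some("resetting plan"),
    }
}

fn main() {
    for state in [CoherenceState::Healthy, CoherenceState::ResettingPlan] {
        println!("{state:?} -> {:?}", footer_chip_label(state));
    }
}
```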
@@ -1,5 +1,9 @@
 use super::*;
 use crate::config::Config;
+use crate::tui::file_mention::{
+    find_file_mention_completions, partial_file_mention_at_cursor, try_autocomplete_file_mention,
+    user_request_with_file_mentions,
+};
 use crate::tui::history::{GenericToolCell, ToolCell, ToolStatus};
 use crate::tui::views::{ModalView, ViewAction};
 use std::path::PathBuf;
@@ -395,11 +399,16 @@ fn footer_coherence_chip_hides_healthy_and_uses_clear_labels() {
        "healthy state should produce no footer chip"
    );

+    // GettingCrowded is intentionally suppressed — see the rationale in
+    // `footer_coherence_spans`. The footer only surfaces active engine
+    // interventions; soft pressure hints stay quiet.
+    app.coherence_state = crate::core::coherence::CoherenceState::GettingCrowded;
+    assert!(
+        footer_coherence_spans(&app).is_empty(),
+        "GettingCrowded should not surface a footer chip; only active interventions do"
+    );
+
    let cases = [
-        (
-            crate::core::coherence::CoherenceState::GettingCrowded,
-            "high load",
-        ),
        (
            crate::core::coherence::CoherenceState::RefreshingContext,
            "refreshing context",
@@ -1,8 +1,7 @@
 //! Utility helpers shared across the `DeepSeek` CLI.

 use std::fs;
-use std::path::{Path, PathBuf};
-use std::time::{SystemTime, UNIX_EPOCH};
+use std::path::Path;

 use crate::models::{ContentBlock, Message};
 use anyhow::{Context, Result};
@@ -142,25 +141,6 @@ pub fn ensure_dir(path: &Path) -> Result<()> {
        .with_context(|| format!("Failed to create directory: {}", path.display()))
 }

-#[allow(dead_code)]
-pub fn write_bytes(path: &Path, bytes: &[u8]) -> Result<()> {
-    if let Some(parent) = path.parent() {
-        ensure_dir(parent)?;
-    }
-    fs::write(path, bytes).with_context(|| format!("Failed to write {}", path.display()))
-}
-
-/// Create a timestamped filename for generated assets.
-#[must_use]
-#[allow(dead_code)]
-pub fn timestamped_filename(prefix: &str, extension: &str) -> String {
-    let now = SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .unwrap_or_default()
-        .as_secs();
-    format!("{prefix}_{now}.{extension}")
-}
-
 /// Render JSON with pretty formatting, falling back to a compact string on error.
 #[must_use]
 #[allow(dead_code)]
@@ -168,25 +148,6 @@ pub fn pretty_json(value: &Value) -> String {
    serde_json::to_string_pretty(value).unwrap_or_else(|_| value.to_string())
 }

-/// Extract a lowercase file extension from a URL, if present.
-#[must_use]
-#[allow(dead_code)]
-pub fn extension_from_url(url: &str) -> Option<String> {
-    let path = url.split('?').next().unwrap_or(url);
-    let ext = Path::new(path)
-        .extension()
-        .and_then(|ext| ext.to_str())
-        .map(str::to_lowercase);
-    ext.filter(|e| !e.is_empty())
-}
-
-/// Build an output path within the given directory.
-#[must_use]
-#[allow(dead_code)]
-pub fn output_path(output_dir: &Path, filename: &str) -> PathBuf {
-    output_dir.join(filename)
-}
-
 /// Truncate a string to a maximum length, adding an ellipsis if truncated.
 ///
 /// Uses char boundaries to avoid panicking on multi-byte UTF-8 characters.
@@ -2,18 +2,19 @@

 DeepSeek TUI has two related concepts:

-- **TUI mode**: what kind of visible interaction you're in (Plan/Agent/YOLO/Hetun).
+- **TUI mode**: what kind of visible interaction you're in (Plan/Agent/YOLO).
 - **Approval mode**: how aggressively the UI asks before executing tools.

 ## TUI Modes

-Press `Tab` to cycle through the visible modes: **Plan → Agent → YOLO → Hetun → Plan**.
-Press `Shift+Tab` to cycle in reverse. Hetun sits at the end of the cycle so a fresh session doesn't land on it accidentally — the default landing mode is unchanged.
+Press `Tab` to cycle through the visible modes: **Plan → Agent → YOLO → Plan**.
+Press `Shift+Tab` to cycle reasoning effort.

 - **Plan**: design-first prompting. Read-only investigation tools stay available; shell and patch execution stay off. Use this when you want to think out loud and produce a plan to hand to a human (yourself later, or a reviewer).
-- **Agent**: multi-step tool use. Approvals for shell and paid tools (file writes are allowed without a prompt). RLM is available — the model reaches for `repl` blocks when the work is decomposable.
-- **YOLO**: enables shell + trust mode and auto-approves all tools. RLM is available and auto-executes like everything else. Use only in trusted repos.
-- **Hetun** (河豚, "Plan + Recursive Agents"): the most opinionated mode the TUI offers. The model uses RLM aggressively to research and decompose tasks in parallel via cheap `deepseek-v4-flash` child calls, then presents a consolidated **mission** for your approval. Once approved, the RLM tree auto-executes without per-tool interruption — you approve the mission, not each individual bullet. Plan + execution folded into one rhythm.
+- **Agent**: multi-step tool use. Approvals for shell and paid tools (file writes are allowed without a prompt).
+- **YOLO**: enables shell + trust mode and auto-approves all tools. Use only in trusted repos.
+
+All three modes have access to the `rlm_query` tool — a structured tool call that fans out 1–16 cheap parallel children on `deepseek-v4-flash`. The model reaches for it when work is decomposable.
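The bounded fan-out described above can be sketched with the standard library alone. This is not the commit's implementation: the commit message says children are dispatched with `join_all` against the DeepSeek client, while this sketch swaps in `std::thread::scope` so it runs without extra crates. `fan_out`, `MAX_FANOUT`, and the `child` closure are hypothetical names, and rejecting out-of-range batches (rather than clamping) is an assumption.

```rust
use std::thread;

/// Upper bound on concurrent children, taken from the "1-16" wording above.
const MAX_FANOUT: usize = 16;

/// Hypothetical sketch: run `child` once per prompt in parallel and return
/// the answers in the same order as the prompts.
fn fan_out<F>(prompts: &[String], child: F) -> Result<Vec<String>, String>
where
    F: Fn(&str) -> String + Sync,
{
    // Assumption: out-of-range batch sizes are rejected, not clamped.
    if prompts.is_empty() || prompts.len() > MAX_FANOUT {
        return Err(format!(
            "expected 1..={MAX_FANOUT} prompts, got {}",
            prompts.len()
        ));
    }
    let child = &child;
    let results = thread::scope(|s| {
        // Spawn one scoped thread per prompt, then join in input order.
        let handles: Vec<_> = prompts
            .iter()
            .map(|p| s.spawn(move || child(p.as_str())))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("child thread panicked"))
            .collect::<Vec<String>>()
    });
    Ok(results)
}

fn main() {
    let prompts = vec!["summarize a".to_string(), "summarize b".to_string()];
    // Stand-in for a model call: echo the prompt in upper case.
    let answers = fan_out(&prompts, |p| p.to_uppercase());
    println!("{answers:?}");
}
```

Joining handles in spawn order keeps each answer aligned with its prompt, which is the property a caller aggregating batched results would rely on.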

 ## Compatibility Notes

@@ -45,16 +46,6 @@ Legacy note: `/set approval_mode ...` was retired in favor of `/config`.
 - `auto`: auto-approves all tools (similar to YOLO approval behavior, but without forcing YOLO mode).
 - `never`: blocks any tool that isn't considered safe/read-only.

-### Task-level approval (Hetun mode)
-
-Hetun mode introduces a higher-level approval concept. Before executing an RLM tree, the engine presents a **mission card** showing what will be done, estimated flash calls, and expected outcomes. You can:
-
-- **Approve** — the RLM tree runs without further prompts.
-- **Reject** — the engine returns to planning.
-- **Modify** — edit the mission description and re-submit.
-
-This is independent of the base `approval_mode` setting. If you set `approval_mode = auto` while in Hetun, you still see mission cards (task-level approval is part of the mode, not the approval policy).
-
 ## Small-Screen Status Behavior

 When terminal height is constrained, the status area compacts first so header/chat/composer/footer remain visible:
@@ -87,7 +78,6 @@ Run `deepseek --help` for the canonical list. Common flags:
 - `--model <MODEL>`: when using the `deepseek` facade, forward a DeepSeek model override to the TUI
 - `--workspace <DIR>`: workspace root for file tools
 - `--yolo`: start in YOLO mode
-- `--hetun`: start in Hetun mode
 - `-r, --resume <ID|PREFIX|latest>`: resume a saved session
 - `-c, --continue`: resume the most recent session
 - `--max-subagents <N>`: clamp to `1..=20`
@@ -1,217 +0,0 @@
-# ReAct vs. Recursive Language Models (RLM): A Design Document Comparison
-
-> **Purpose:** Provide the deepseek-tui team with a grounded, citation-rich comparison of the ReAct agent paradigm and the emerging Recursive Language Model (RLM) paradigm so that integration choices (e.g. `zigrlm`, `agent_swarm`, inline tool use) can be made deliberately.
-
----
-
-## 1. ReAct: Reasoning + Acting
-
-### 1.1 Origins and Definition
-
-**ReAct** (Reason + Act) is a prompting and inference paradigm introduced by Yao et al. (Google Research / Princeton) and published at ICLR 2023. It unifies *reasoning traces* (chain-of-thought-style internal monologue) with *task-specific actions* (tool calls, API requests, environment commands) in a single autoregressive loop.  
-**Citation:** Shunyu Yao et al., *"ReAct: Synergizing Reasoning and Acting in Language Models"*, ICLR 2023.
-
-The core insight is that reasoning without acting suffers from fact hallucination and stale knowledge, while acting without reasoning lacks planning, error recovery, and interpretability. ReAct interleaves the two explicitly.
-
-### 1.2 The Thought → Action → Observation Loop
-
-At each timestep \(t\) the agent maintains a context \(c_t\) containing the original query and all prior tuples. The loop is:
-
-1. **Thought** — The LLM generates a reasoning trace: plan decomposition, progress tracking, or exception handling.  
-   \(c_t \rightarrow \text{Thought}_{t+1}\)
-2. **Action** — Conditioned on the thought, the LLM emits a structured action (e.g. `Search[entity]`, `Calculator[expr]`, `Finish[answer]`).  
-   \(c_{t+1} := c_t \parallel \text{Thought}_{t+1}\)
-3. **Observation** — The action is executed in the external environment and the result is appended.  
-   \(c_{t+1} := c_{t+1} \parallel \text{Action}_{t+1} \parallel \text{Obs}_{t+1}\)
-
-The process halts when a special finish action is produced or a hard iteration limit is reached. Probabilistically this is:
-
-\[
-P(\tau \mid q) = \prod_{t=1}^{T} P(v_t \mid q, v_{<t})
-\]
-
-where \(v_t\) spans both Thought and Action tokens and \(\tau\) is the trajectory.
-
-**Key traits:**
-- **Linear / sequential** — Each observation must return before the next thought is generated.
-- **Scratchpad-based** — The entire history of thoughts, actions, and observations is appended to the prompt; there is no external variable store.
-- **Bounded by context window** — As the loop iterates, the prompt grows monotonically (until compaction heuristics truncate it).
-
-### 1.3 Implementations in the Wild
-
-| Framework | ReAct flavour |
-|-----------|---------------|
-| **OpenAI Function Calling** (and compatible APIs) | The model emits JSON `function_call` objects as Actions; tool results are fed back as `tool` role messages as Observations. The "Thought" is often implicit or rendered as a visible `<thinking>` block. |
-| **LangChain / LangGraph** | Pre-built `ReAct` agent chain with a stop-and-observe parser. LangGraph generalises the loop into a state machine with nodes (Thought, Action, Observation) and conditional edges. |
-| **LlamaIndex, BeeAI, etc.** | Provide pre-configured ReAct modules that wrap an LLM with a tool registry and a loop driver. |
-
-A 2025–2026 refinement called **Focused ReAct** presets the original query at each step to prevent drift, reportedly improving accuracy by >5× and reducing runtime by ~34%.
-
----
-
-## 2. Recursive Language Models (RLM)
-
-### 2.1 Origins and Definition
-
-**RLM** is a general *inference-time scaling* paradigm proposed by Zhang, Kraska, and Khattab (MIT CSAIL) in late 2025. Rather than viewing the user prompt as static input tokens, RLMs treat the prompt as part of an **external environment** that the model can programmatically examine, decompose, and recursively query.  
-**Citation:** Alex L. Zhang, Tim Kraska, and Omar Khattab, *"Recursive Language Models"*, arXiv:2512.24601 [cs.AI], 2025 (v2 Jan 2026).
-
-A second formalisation, **λ-RLM**, refines the open-ended code generation of the original paper into a deterministic λ-calculus combinator runtime (SPLIT, PEEK, MAP, FILTER, REDUCE, CONCAT) to eliminate brittle free-form generation.  
-**Citation:** *"Solving Long-Context Rot with λ-Calculus"*, arXiv:2603.20105, 2026.
-
-### 2.2 The REPL Environment and Recursive Call Model
-
-The canonical RLM implementation wraps the root LM in a read-eval-print loop (REPL) — usually Python, though Clojure (`loop-infer`) and bash (`claude-rlm`) adaptations exist. The full context is stored as a variable (e.g. `context`) in the REPL, **not** in the model's prompt window.
-
-At each root iteration:
-1. The LM receives only *metadata* about the REPL state (short stdout prefix, variable names).
-2. The LM emits **code** (or fenced `repl` directives) that manipulate the variable, run regex/grep, or spawn recursive sub-calls.
-3. The code executes; stdout and updated variables are captured.
-4. The loop repeats until the LM sets a special `Final` variable (or emits `FINAL(...)` / `FINAL_VAR(...)`), at which point the run returns.
-
-Because the full text never enters the root LM context window, RLMs can scale to **10M+ tokens** (two orders of magnitude beyond the base model's window) without retraining.
-
-### 2.3 The `repl` Grammar and Tree Structure
-
-In the `zigrlm` runtime (and the reference Python implementation), the root LM writes fenced blocks tagged `repl`. The grammar includes:
-
-| Directive | Semantics |
-|-----------|-----------|
-| `let name = "..."` | Bind a string variable |
-| `js name = "...FINAL(...)"` | Execute deterministic JS in a sandbox |
-| `llm_query name = expr` | Call the *same* model (same depth) |
-| `rlm_query name = expr` | Spawn a **child** RLM (depth + 1) |
-| `llm_query_batched name = a \| b \| c` | Parallel same-depth calls |
-| `rlm_query_batched name = a \| b \| c` | Parallel child RLMs |
-| `FINAL(expression)` | Terminate and return this string |
-| `FINAL_VAR(name)` | Terminate and return the named variable |
-
-These recursive calls form a **tree of reasoning**, not a single chain. Each child processes a snippet of the external context and stores its partial result back into a parent REPL variable. Aggregation is performed programmatically (lists, tallies, tables) rather than autoregressively.
-
-**Key traits:**
-- **Context-centric decomposition** — The model decides how to slice the *input context*, not just how to sequence actions.
-- **Variable store** — Intermediate results live in the REPL, keeping the LM context window constant-size.
-- **Bounded output** — Because `Final` can be assembled from variables, RLMs can produce answers longer than the model's output token limit.
-
-### 2.4 Implementations and Ecosystem
-
-| Project | Notes |
-|---------|-------|
-| **alexzhang13/rlm** | Official research repo (Python). Includes reference REPL, natively fine-tuned `RLM-Qwen3-8B`, and OOLONG / BrowseComp-Plus benchmarks. |
-| **alexzhang13/rlm-minimal** | Stripped-down Python version for hacking. |
-| **zigrlm** | Zig-native runtime with JS sandbox, batched parallel fan-out, and JSONL tracing. Used by deepseek-tui for cheap `deepseek-v4-flash` child dispatch. |
-| **claude-rlm** | Depth-N recursion using Claude Code instances as sub-agents; bash-as-REPL; `mkdir`-based concurrency limiter. |
-| **loop-infer** | Clojure REPL implementation. |
-| **minrlm** | Independent minimal RLM reducing token usage up to 4× vs. flat inference. |
-| **rlm-mcp** | MCP server wrapper exposing RLM through the Model Context Protocol. |
-
----
-
-## 3. Key Differences
-
-### 3.1 Parallelism
-
-| Dimension | ReAct | RLM |
-|-----------|-------|-----|
-| **Structure** | Linear chain. Each Action depends on the prior Observation. | Tree. A parent can fan out N children in parallel. |
-| **Batched execution** | Not native. Some frameworks (LangGraph) add parallel branches, but the canonical ReAct loop is sequential. | Native via `*_batched` directives. `zigrlm` dispatches children across OS threads capped by `max_concurrent_subcalls`. |
-| **Synchronisation** | Implicit: the loop blocks on the environment. | Children write to named variables; parent continues only after aggregation code runs. |
-
-The RLM paper explicitly notes that their reference implementation used *blocking* sequential sub-calls and left async fan-out as "low-hanging fruit" for systems builders. `zigrlm` realises that fruit.
-
-### 3.2 Reasoning Representation
-
-| Dimension | ReAct | RLM |
-|-----------|-------|-----|
-| **Form** | Natural-language "Thought" traces appended to a scratchpad. | Code / DSL inside fenced `repl` blocks, plus natural-language plan text outside the fence. |
-| **State management** | Monolithic prompt history. Intermediate values are re-tokenised every turn. | External REPL variables. The LM sees only constant-size metadata. |
-| **Aggregation** | The model must autoregressively synthesise the final answer from the scratchpad. | Programmatic: `FINAL_VAR(tally)` or `FINAL("\n".join(results))`. |
-| **Length limits** | Bounded by context window for both input and output. | Input: theoretically unbounded (10M+ tested). Output: bounded only by REPL variable memory. |
-
-### 3.3 Tool Use
-
-| Dimension | ReAct | RLM |
-|-----------|-------|-----|
-| **Interface** | Structured JSON schemas (OpenAI function calling) or text parsing (LangChain). | Natural-language fenced blocks (`repl`). The "tool" is the REPL itself. |
-| **Tool set** | Fixed registry of functions known at build time. | Open-ended: the LM can write arbitrary regex, loops, or JS to manipulate data. |
-| **Child agents** | Spawning a sub-agent is a heavyweight Action (new thread/process, full tool registry, event channels). | Spawning a child is a lightweight `rlm_query` inside the same runtime; the child uses a cheaper model by default. |
-
-### 3.4 Cost Model
-
-| Dimension | ReAct | RLM |
-|-----------|-------|-----|
-| **Primary model** | Usually one expensive frontier model (e.g. GPT-5, Claude Opus, deepseek-v4-pro) for every turn. | A **root** model (frontier) for control + cheap **child** models (`deepseek-v4-flash`, GPT-5-mini) for sub-tasks. |
-| **Cost scaling** | Grows with iteration count × full prompt length. Compaction heuristics trade quality for cost. | Grows with *task complexity*, not input length. Selective inspection means most tokens are never fed to the LM. |
-| **Empirical results** | N/A (baseline). | On OOLONG 128K, `RLM(GPT-5-mini)` outperformed flat `GPT-5` by >2× and was cheaper on average. On BrowseComp-Plus (1K docs, 6–11M tokens), RLM(GPT-5) averaged **$0.99** vs. $1.50–$2.75 for the base model ingesting everything. |
-| **Variance** | Predictable per-turn cost. | High variance: some trajectories are cheaper than a flat call, outliers can be more expensive. |
-
-### 3.5 Observability
-
-| Dimension | ReAct | RLM |
-|-----------|-------|-----|
-| **Trace shape** | Linear log of (Thought, Action, Observation) tuples. | Tree log: each node is a REPL turn that may branch into child RLM nodes. |
-| **Depth** | Flat iteration count. | Explicit recursion depth (`max_depth`). |
-| **Tooling** | LangSmith, OpenTelemetry spans, simple print logging. | JSONL trace files (`--trace`) capturing every code cell, stdout snapshot, and sub-call with usage metadata. |
-| **Human readability** | Easy: read the scratchpad top-to-bottom. | Harder: requires tree traversal, but the `FINAL` node summarises the aggregate. |
-
----
-
-## 4. When Is Each Appropriate? Trade-offs
-
-### Use ReAct when …
-- The task is **interactive and stateful** (e.g. browsing, CLI commands, file editing) where each observation is dynamic and the next action cannot be predicted ahead of time.
-- The tool surface is **fixed and schema-driven** (e.g. a known set of REST APIs, file-system operations, database queries).
-- You need **deterministic latency bounds** per turn (e.g. a chat UI that must stream a Thought before the next Action).
-- The context fits comfortably within the model's window and does not suffer from context rot.
-- Human inspectability of a single linear reasoning chain is a priority.
-
-### Use RLM when …
-- The input is **very long** (100K–10M+ tokens) and you want to avoid summarisation or compaction loss.
-- The work is **embarrassingly parallel** (e.g. classify 1,000 rows, evaluate 50 files, score 20 answers). `rlm_query_batched` maps naturally.
-- The task is **recursively decomposable** (e.g. divide-and-conquer summarisation, map-reduce aggregation, multi-hop retrieval over a corpus).
-- Cost is a constraint: you can offload leaf work to a **cheap child model** while reserving the frontier model for control decisions.
-- You need **deterministic local compute** interleaved with model calls (JS / Python in the REPL).
-
-### Hybrids
-There is no forced binary choice. A pragmatic system (like deepseek-tui) can use:
-- **ReAct / OpenAI-style function calling** for interactive tool use and user-facing chat turns.
-- **RLM `repl` blocks** for internal parallel decomposition, batched generation, or long-context analysis.
-- **Agent swarm** (multi-step ReAct sub-agents) only when autonomous, stateful, multi-tool workflows are required.
-
-The RLM paper itself positions RLMs as the next milestone *after* CoT-style reasoning and ReAct-style agents, not as a replacement for them.
-
----
-
-## 5. Bibliography
-
-1. **Yao, S. et al.** *ReAct: Synergizing Reasoning and Acting in Language Models.* ICLR 2023.  
-   - Blog explainer: https://www.promptingguide.ai/techniques/react  
-   - IBM overview: https://www.ibm.com/think/topics/react-agent
-
-2. **Zhang, A. L., Kraska, T., and Khattab, O.** *Recursive Language Models.* arXiv:2512.24601 [cs.AI], 2025 (v2 Jan 2026).  
-   - Paper: https://arxiv.org/abs/2512.24601  
-   - Blog: https://alexzhang13.github.io/blog/2025/rlm/  
-   - Code: https://github.com/alexzhang13/rlm  
-   - Minimal code: https://github.com/alexzhang13/rlm-minimal
-
-3. **λ-RLM authors.** *Solving Long-Context Rot with λ-Calculus.* arXiv:2603.20105, 2026.  
-   - Formalises RLM control into typed combinators (SPLIT, MAP, FILTER, REDUCE) to replace free-form code generation.
-
-4. **zigrlm** (Zig RLM runtime). Local build: `/Volumes/VIXinSSD/zigrlm/zig-out/bin/zigrlm`.  
-   - Supports `cli`, `cli-claude`, `cli-codex`, `cli-openai`, `zai`, `openai-proxy`, etc.  
-   - Grammar: fenced `repl` blocks with `rlm_query`, `rlm_query_batched`, `FINAL`, `FINAL_VAR`.
-
-5. **Community implementations and extensions**  
-   - `claude-rlm` (depth-N recursion via Claude Code + bash): https://github.com/Tenobrus/claude-rlm  
-   - `minrlm` (token-reduction focus): https://github.com/avilum/minrlm  
-   - `loop-infer` (Clojure REPL): https://github.com/unravel-team/loop-infer  
-   - `rlm-mcp` (MCP server): https://github.com/eesb99/rlm-mcp  
-   - `rlm_repl` (Python PoC): https://github.com/fullstackwebdev/rlm_repl
-
-6. **Benchmarks referenced**  
-   - **OOLONG** (long-context aggregation): Bertsch et al., 2025.  
-   - **BrowseComp-Plus** (multi-hop QA over document corpora): Chen et al., 2025.
-
----
-
-*Document generated for deepseek-tui design review. Corresponds to repo state: main @ 229b1993.*
@@ -1,380 +0,0 @@
-# RLM as a Fundamental Agent Primitive
-
-## Thesis
-
-We will make Recursive Language Models a first-class primitive in `deepseek-tui` by teaching the flat agent loop to detect fenced ```` ```repl ```` blocks in assistant text and hand them directly to the external `zigrlm` binary. `zigrlm` orchestrates cheap parallel `deepseek-v4-flash` child calls, runs a JS sandbox, and returns a single `FINAL` result that becomes the assistant's response for that turn. This replaces the heavy `agent_swarm` tokio-task-per-child model with a lightweight subprocess tree where N flash calls cost less than one Pro call, inverting the usual sub-agent economics.
-
-## Where We Are Today
-
-The agent loop is in `crates/tui/src/core/engine.rs` (`Engine::handle_deepseek_turn()`, ~line 2330). It streams `ContentBlock`s from the model into `session.messages`; if any block is `ToolUse`, it builds a `ToolExecutionPlan`, executes via `execute_tool_with_lock()` (~line 2209), and loops back for another model turn. Parallel work today goes through `AgentSwarmTool` in `crates/tui/src/tools/swarm.rs` (`run_swarm()`, ~line 582), which spawns full background tokio tasks via `SubAgentManager::spawn_background_with_assignment()` in `crates/tui/src/tools/subagent.rs` (~line 584). Each child runs its own agent loop, tool registry, and event channel. That is correct for autonomous multi-step work, but wasteful for simple parallel Q&A or recursive decomposition.
-
-`zigrlm` (already built at `/Volumes/VIXinSSD/zigrlm/zig-out/bin/zigrlm` or cloneable from GitHub) solves this externally. Its `cli` command reads a prompt, drives a root model turn, parses any ```` ```repl ```` blocks with its Zig-native parser (`src/parser.zig`), fans out batched child calls across OS threads capped by `max_concurrent_subcalls` (default 8), and returns the `FINAL` string on stdout. The integration work is wiring this into the engine so the model naturally emits repl blocks instead of JSON tool calls.
-
-## Key Design Questions
-
-### 1. How is `zigrlm` auto-configured with DeepSeek credentials? (#48)
-
-**New file:** `crates/tui/src/zigrlm_config.rs`
-
-A `ZigrlmRuntimeConfig` struct is built from the session's existing `ResolvedRuntimeOptions` (`api_key`, `base_url`, `model`). It constructs the two environment variables `zigrlm` expects:
-
-- `ZIGRLM_MAIN_CMD`: `zigrlm openai-proxy --model <pro> --base-url <url>`
-- `ZIGRLM_RLM_CMD`: `zigrlm openai-proxy --model deepseek-v4-flash --base-url <url>`
-
-The API key is passed as `OPENAI_API_KEY`. Binary discovery (in priority order): `config.toml` field `zigrlm.bin_path` → env `ZIGRLM_BIN` → known local build `/Volumes/VIXinSSD/zigrlm/zig-out/bin/zigrlm` → `PATH` via `which zigrlm`. If not found, RLM features degrade gracefully with a logged warning.
-
-**Config additions:** `crates/config/src/lib.rs` gets a `ZigrlmConfigToml` struct with optional overrides for `bin_path`, `rlm_model`, `max_depth` (default 2), `max_iterations` (default 20), and `timeout_ms` (default 600000). These are exposed under a new `zigrlm:` table in `config.toml`.
-
-### 2. How does the engine detect and execute repl blocks? (#49)
-
-**Modified file:** `crates/tui/src/core/engine.rs`
-
-After the streaming loop in `handle_deepseek_turn()` persists the assistant message to `session.messages`, we insert a new branch before tool execution:
-
-```rust
-if message.has_tool_calls() {
-    // existing tool-execution path
-} else if has_repl_block(&message.content) {
-    let result = zigrlm_runtime.run_inline(&message.content).await?;
-    // Replace the Text block with the aggregated result
-    message.replace_repl_with_result(&result.response);
-    // Append usage metadata as a system note or hidden block
-    if let Some(usage) = result.usage {
-        session.add_system_note(format!(
-            "[RLM: {} calls, {} tokens]", usage.calls, usage.total_tokens
-        ));
-    }
-    // Turn completes; no extra model round-trip
-}
-```
-
-`has_repl_block()` checks `ContentBlock::Text` for the exact substring "\`\`\`repl" using the same fence logic as `zigrlm/src/parser.zig`. `run_inline()` lives in the new `crates/tui/src/zigrlm_runtime.rs` and shells out to:
-
-```bash
-zigrlm cli \
-  --max-depth 2 \
-  --max-iterations 20 \
-  --timeout-ms 600000 \
-  "<assistant_text>"
-```
-
-with `ZIGRLM_MAIN_CMD`, `ZIGRLM_RLM_CMD`, and `OPENAI_API_KEY` injected into the child environment. The full assistant text is the prompt because the model's natural-language plan preceding the fence is part of the root context `zigrlm` expects.
-
-**UX:** While `zigrlm` runs, the engine emits `Event::RlmStarted` and the TUI shows a spinner: "Running RLM tree…". On completion, `Event::RlmComplete` carries usage so the transcript can render a collapsible "[RLM: 3 calls, 2.1K tokens, 1.2s]" line. `Ctrl-C` during this phase forwards `SIGTERM` to the child process.
-
-### 3. How does the result re-enter the conversation?
-
-The raw assistant message is mutated in-place in `session.messages` (`crates/tui/src/core/session.rs`). Its `ContentBlock::Text` block containing the repl fence is replaced by the `FINAL` string from `zigrlm` stdout. The original repl block is preserved as a `ContentBlock::Thinking` block (or a new internal metadata field) so the model can see its own plan on subsequent turns, but the primary visible response is the aggregated result. This keeps the conversation history clean: the next turn's context contains the unified answer, not raw DSL.
-
-### 4. What happens to the explicit `zigrlm` tool / bridge? (#46)
-
-It remains as an **escape hatch** in `crates/tui/src/tools/zigrlm.rs` (new file), registered via `ToolRegistryBuilder::with_zigrlm_tool()` in `crates/tui/src/tools/registry.rs`. The tool accepts explicit parameters (`prompt`, `max_depth`, `trace_path`, etc.) and is useful for:
-
-- DSPy-style signatures via `dszig`
-- Docker-backed Python sandboxes
-- Custom traces for benchmarking
-- User-explicit RLM experiments
-
-The inline primitive and the explicit tool share `ZigrlmRuntimeConfig` but serve different purposes. The model prompt (see below) teaches when to use each.
-
-### 5. How do we teach the model to use this? (#50)
-
-**Modified files:** `crates/tui/src/prompts/agent.txt`, `crates/tui/src/prompts/yolo.txt`
-
-A new section, gated by config flag `rlm.prompt_enabled` (default `true`), is appended to the agent system prompt:
-
-```text
-## Recursive Language Model (RLM) primitive
-
-When you need parallel analysis, recursive decomposition, or batched generation,
-prefer a fenced `repl` block over spawning subagents or doing sequential inline work.
-
-- `rlm_query_batched name = "prompt" | "prompt" | ...` for parallel work
-- `rlm_query name = "prompt"` for recursive child tasks
-- End with `FINAL(expression)` or `FINAL_VAR(name)`
-
-The child model is deepseek-v4-flash (very fast and cheap).
-
-Do NOT use RLM when the task requires file-system modification, interactive user
-input, or is trivial enough for a single sentence.
-```
-
-A comparison table in the prompt clarifies the trade-offs:
-
-| Primitive | Use when | Cost | Speed |
-|---|---|---|---|
-| Inline reasoning | Simple Q&A, one-step tasks | Low | Fast |
-| `repl` block | Parallel / recursive / batched work | Very low (flash) | Fast |
-| `agent_swarm` | Multi-step autonomous work with tools | Higher | Slower (polling) |
-
-This lets us A/B test by toggling `rlm.prompt_enabled` and measuring turns-per-task and token usage.
-
-### 6. How does the model do non-trivial work *inside* a `repl` block? (#53)
-
-Parallel fan-out alone isn't enough. A `repl` block that just splits N prompts and concatenates results is barely better than `agent_swarm`. The unlock is giving the model **cheap programmatic access to data that's already in process memory** — so it can grep, extract, slice, diff, and search a 50K-token blob in one repl block without burning context tokens on the raw bytes or paying for a tool round-trip per query.
-
-This is what makes RLM actually usable, not just clever. We bake a curated helper layer + a sandboxed Python REPL into the runtime as a first-class capability.
-
-**The shape:**
-
-A `repl` block doesn't only fan out to flash children. It can also run Python in a sandboxed namespace where:
-
- A `ctx` variable holds preloaded data the agent wants to interrogate (a file it just read, a tool result, a stream of search hits).
- A small curated helper module is in scope — about 15–25 functions chosen because they meaningfully beat shell when the data is already in memory: `peek` / `lines` / `head` / `tail` / `chunk` / `between`, `grep` / `count_matches` / `find_all` / `semantic_search`, `extract_json_objects` / `extract_urls` / `extract_paths` / `extract_dates`, `replace_all` / `split_by` / `diff` / `similarity`, `dedupe` / `group_by` / `partition` / `frequency`.
- Sandbox is AST-validated + restricted builtins + import allowlist + execution timeout — best-effort, same posture as the JS sandbox the runtime already exposes.
- State persists across `repl` blocks within a turn. The model can assign `chunks = chunk(ctx, 4000)` once and reuse `chunks` in subsequent fan-outs.
-
-**Why this lives at the runtime level, not as a separate tool:**
-
-If we shipped a `python_repl` tool alongside RLM, the model would have to choose between "fan out to flash children" (repl block) and "inspect data in Python" (tool call) every turn. They're the same workflow — load → slice → fan out flash queries on the slices → aggregate. Splitting them across two interfaces forces the wrong choice. Putting the helper layer *inside* the repl runtime means a single block can do all four steps with shared state.
-
-**Why these specific helpers and not a giant library:**
-
-The model already has shell + grep + read_file. It doesn't need 124 helpers. It needs ~20 that are obviously the right move when working with in-memory data — the ones where shell would force an unnecessary round-trip or lose structure. Keep the menu small and obvious. Anything not on the menu, the model can write inline (Python is in the sandbox; helpers are conveniences, not a closed world).
-
-## Spike Target
-
-The smallest end-to-end proof is a hardcoded path in `crates/tui/src/core/engine.rs` that, when an assistant message contains a test repl block, shells out to a pre-built `zigrlm` binary with a hardcoded `ZIGRLM_RLM_CMD` pointing at `deepseek-v4-flash`, and injects the stdout result back into `session.messages`.
-
-**Estimated surface area:**
- `engine.rs`: ~30 lines (detection branch + subprocess call)
- `zigrlm_runtime.rs` (new, spike version): ~80 lines (Command builder + stdout capture)
- No config plumbing, no TUI spinner, no usage parsing, no prompt changes.
-
-**Success criteria:** A local test where the Pro model emits:
-
-````
-```repl
-rlm_query_batched answers = "What is 2+2?" | "What is 3+3?"
-FINAL_VAR(answers)
-```
-````
-
-… and the engine returns a single assistant message containing the aggregated `[0]\n4\n[1]\n6` result, with no tool call JSON emitted and no extra model round-trip.
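The detection half of the spike can be sketched in a few lines. This is a minimal sketch, not the engine's actual code: `extract_repl_block` and `format_batched` are hypothetical names, and the fence marker is built at runtime only so the example does not contain a literal fence.

```rust
/// Hypothetical helper: returns the body of the first fenced `repl` block, if any.
fn extract_repl_block(message: &str) -> Option<String> {
    let fence = "`".repeat(3);
    let open = format!("{fence}repl");
    let mut body: Vec<&str> = Vec::new();
    let mut inside = false;
    for line in message.lines() {
        let trimmed = line.trim();
        if !inside {
            if trimmed == open {
                inside = true;
            }
        } else if trimmed == fence {
            return Some(body.join("\n"));
        } else {
            body.push(line);
        }
    }
    None // no repl block, or the fence was never closed
}

/// Formats child answers in the indexed layout quoted in the success criteria.
fn format_batched(answers: &[&str]) -> String {
    answers
        .iter()
        .enumerate()
        .map(|(i, a)| format!("[{i}]\n{a}"))
        .collect::<Vec<_>>()
        .join("\n")
}
```

An unterminated fence is treated as "no block" so a partially streamed message never triggers execution.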
-
-## Hetun Mode — "Plan + Recursive Agents" (added, doesn't replace Plan)
-
-**Tracking issue:** #54
-
-**Hetun** (河豚, Mandarin for *pufferfish*) is added as a fourth mode positioned at the **end** of the Tab cycle so people don't accidentally land on it from a fresh session. The cycle becomes `Plan → Agent → YOLO → Hetun → Plan`. Default landing mode is unchanged. Plan stays exactly as it is — read-only investigation, hand the plan to the human. Hetun is the next step further up the orchestration ladder: planning *and* execution folded together, gated on a single mission-level approval.
-
-The mode badge surfaces this as **"Hetun (Plan + Recursive Agents)"** so users immediately understand the relationship to Plan. Sakana already named the flash-coordinator architecture *Fugu* (the Japanese reading of 河豚); since DeepSeek is Chinese, the Mandarin reading *hetun* is the right cultural fit.
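The Tab cycle and badge text described above could look roughly like this. It is a sketch under assumptions: only `AppMode::Hetun` and the Hetun badge string come from the text; the other variant names, the `next` and `badge` methods, and the non-Hetun badge labels are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AppMode {
    Plan,
    Agent,
    Yolo,
    Hetun,
}

impl AppMode {
    /// Advances one step through Plan → Agent → YOLO → Hetun → Plan.
    fn next(self) -> Self {
        match self {
            AppMode::Plan => AppMode::Agent,
            AppMode::Agent => AppMode::Yolo,
            AppMode::Yolo => AppMode::Hetun,
            AppMode::Hetun => AppMode::Plan,
        }
    }

    /// Badge text; only the Hetun label is taken from the design text.
    fn badge(self) -> &'static str {
        match self {
            AppMode::Plan => "Plan",
            AppMode::Agent => "Agent",
            AppMode::Yolo => "YOLO",
            AppMode::Hetun => "Hetun (Plan + Recursive Agents)",
        }
    }
}
```

Placing Hetun last means a fresh session needs three Tab presses to reach it, which matches the intent of not landing there by accident.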
-
-### What Hetun does
-
-It's the most opinionated mode the TUI offers. The model both **plans the work and runs it**, but the user gates the transition with one explicit mission approval:
-
-1. **Research + plan** — Hetun uses RLM aggressively to investigate the workspace in parallel (multiple `rlm_query_batched` reads of relevant files / patterns / prior turns), synthesises the findings into a concrete mission (sub-tasks, what each looks at, expected outputs, anything that gets written), and lands it in the transcript ending with an explicit "OK to run?" prompt.
-2. **Execute** — once approved, Hetun emits a `repl` block that fans the planned sub-tasks out via `rlm_query_batched` and aggregates into a `FINAL`. No further per-block approvals — you approved the **mission**, the runtime carries it out.
-
-This is meaningfully different from Plan (read-only investigate, hand back to human, human implements) and from YOLO (auto-execute everything turn-by-turn). Hetun keeps the human in the loop at the only point that matters — the gate between "we know what to do" and "do it" — and removes them from every per-step approval after that.
-
-### Behaviour and configuration
-
- **The user's configured model is left alone.** Entering Hetun does *not* swap the conversational model or reasoning effort. If you were on `deepseek-v4-pro` / `max`, you stay there. The flash-as-coordinator behaviour is internal to the runtime (`ZIGRLM_RLM_CMD` always points to flash regardless of mode), not a global model swap. On exit nothing has to be restored because nothing was changed.
- **No `/hetun` slash command.** Tab cycles into the mode like any other; `/plan` keeps switching to Plan as it does today.
- **Mission-level approval, not block-level.** Hetun introduces one approval gate per turn (the mission), then runs the execution `repl` block straight through. Inside Plan, Agent, and YOLO the existing approval policies are unchanged.
-
-RLM is not Hetun-only. Agent and YOLO modes keep using `repl` blocks where the model judges them appropriate (#49 wires the inline primitive globally). Hetun is just the mode that *expects* RLM-first behaviour, and the prompt is tuned for it.
-
-### What "Plan + Recursive Agents" actually means inside Hetun
-
-Sakana's writeup of the Fugu / "intelligence" architecture (the system they shipped to MIC's misinformation programme) describes more than a flash-coordinator wrapper. The mode adopts those technical patterns and translates them into our primitives. A Hetun research phase is not one batched fan-out — it is a small recursive program that runs inside a `repl` block:
-
- **Novelty search via recursive sampling.** Instead of pre-deciding N fixed queries and firing them in parallel, Hetun draws an initial broad sample of the workspace (`ctx` chunks of relevant files / patterns / prior turns), runs a flash sweep over the sample asking "what here is surprising or important?", scores responses by novelty, and recursively zooms into the highest-novelty chunks at finer resolution. The recursion stops when novelty plateaus or a budget is hit. This gives much better coverage than a flat 8-way fan-out for any task where the interesting bits are non-uniformly distributed.
- **Hierarchical narrative tree synthesis.** Findings from the recursion don't get concatenated into a flat list. They get organised into a tree: leaves are individual observations, intermediate nodes are clusters of related findings, the root is the mission goal. The mission card the user approves displays this tree (collapsible, navigable) rather than a wall of bullets — same intuition Sakana uses to make SNS narrative spaces legible at a glance.
- **Multi-detector cross-verification.** Every claim Hetun puts into the mission goes through at least two passes: a flash sweep for obvious errors / contradictions, and a frontier (Pro) check for subtler structural issues. Sakana's framing is "frontier model handles macro structure, specialised models handle fine structure, blind spots cancel". For us that maps to flash for breadth and Pro for depth, with claims that fail either pass marked as low-confidence rather than hidden — the user can choose to verify them manually or drop them from the mission.
- **Hypothesis-verification loop.** The research phase isn't a single round. Hetun forms a working hypothesis from the initial findings, generates verification queries from the hypothesis (e.g. "if X is true, we should also see Y; check for Y"), runs them, and updates the hypothesis. The loop continues until the hypothesis is stable across iterations or the iteration budget (capped low — typically 2–3) is exhausted. This is the same hypothesis-driven investigation rhythm Sakana models on human fact-checkers.
-
-These patterns are not separate features — they are how the Hetun prompt teaches the model to use `repl` blocks. The runtime primitives (`rlm_query_batched`, `ctx` helpers from #53, flash/Pro tiering from #48) are already in place once the rest of the RLM stack ships; Hetun is the prompt + approval layer that wires them together into the recursive-research-and-mission rhythm above.
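The recursive novelty sampling described in the list above can be illustrated with a small, model-free sketch. Everything here is an assumption made for illustration: the `zoom` function, the halving strategy, the threshold, and the scoring closure (which stands in for a flash sweep) are not part of the design.

```rust
/// Recursively zooms into the halves of `lines` whose novelty clears `threshold`.
/// Each scoring call spends one unit of `budget`; chunks reached after the
/// budget is exhausted are dropped unscored.
fn zoom<'a>(
    lines: &'a [&'a str],
    score: &dyn Fn(&[&str]) -> f64,
    threshold: f64,
    budget: &mut usize,
    out: &mut Vec<&'a [&'a str]>,
) {
    if lines.is_empty() || *budget == 0 {
        return;
    }
    *budget -= 1;
    if score(lines) < threshold {
        return; // novelty plateaued: stop descending here
    }
    if lines.len() == 1 || *budget == 0 {
        out.push(lines); // keep the chunk at its current resolution
        return;
    }
    let (left, right) = lines.split_at(lines.len() / 2);
    zoom(left, score, threshold, budget, out);
    zoom(right, score, threshold, budget, out);
}
```

With a scorer that flags chunks containing `TODO`, a four-line input narrows to the single interesting line in five scoring calls, rather than scoring every line at the finest resolution up front.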
-
-We do **not** import Sakana's full system: the ABM persona-simulation framework (Shachi) and the misinformation-specific image/video detectors are out of scope. What we adopt is the **research methodology** — recursive novelty sampling, hierarchical synthesis, multi-detector verification, hypothesis loops — applied to the agent-coding domain instead of the misinformation domain.
-
-**Files:** `crates/tui/src/tui/app.rs` (add `AppMode::Hetun`, place it last in the Tab cycle), `crates/tui/src/tui/palette.rs` (add `MODE_HETUN` colour, e.g. purple to distinguish from YOLO red and Plan orange), `crates/tui/src/tui/prompts.rs` (add `HETUN_PROMPT`), `crates/tui/src/prompts/hetun.txt` (the prompt body — must teach the recursive-novelty + hierarchical-synthesis + verification-loop pattern, not just "fan out queries"), `crates/tui/src/core/engine.rs` (mission-level approval hook before RLM execution in Hetun), `crates/tui/src/tui/widgets/header.rs` (mode-badge text reads "Hetun (Plan + Recursive Agents)").
-
-## Vendoring zigrlm
-
-**Tracking issue:** #55
-
-Rather than treating `zigrlm` as an external binary, we will vendor it as a git submodule at `vendor/zigrlm` and build it alongside the Rust project. This lets us:
-
-1. Guarantee the binary exists for contributors and CI
-2. Patch zigrlm for deepseek-tui-specific features (three-tier model routing, custom JS builtins, DeepSeek trace format)
-3. Eventually link it as a C library instead of shelling out
-
-**Build integration:** A `build.rs` in `crates/tui/` invokes `zig build` in `vendor/zigrlm` when `zig` is on PATH. If `zig` is missing, the TUI falls back to existing binary-discovery logic. `ZigrlmRuntimeConfig` prefers the vendored path.
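A build-script helper along these lines would cover the "build if `zig` is on PATH, otherwise fall back" behaviour. This is a sketch, assuming the function names below (all hypothetical) and assuming that a successful launch of `zig version` is a good enough PATH check.

```rust
use std::path::Path;
use std::process::Command;

/// Returns true when a `zig` executable can be launched from PATH.
fn zig_available() -> bool {
    Command::new("zig").arg("version").output().is_ok()
}

/// Builds (but does not run) the `zig build` invocation for the vendored checkout.
fn zig_build_command(vendor_dir: &Path) -> Command {
    let mut cmd = Command::new("zig");
    cmd.arg("build").current_dir(vendor_dir);
    cmd
}

/// Hypothetical entry point that a `build.rs` `main` could call.
/// Returns false when the caller should rely on runtime binary discovery.
fn build_vendored_zigrlm(vendor_dir: &Path) -> bool {
    if !vendor_dir.exists() || !zig_available() {
        return false;
    }
    zig_build_command(vendor_dir)
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}
```

Returning a boolean instead of panicking keeps a missing `zig` toolchain from failing the Rust build, which is the fallback the paragraph above asks for.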
-
-**Files:** `.gitmodules`, `crates/tui/build.rs`, `crates/tui/src/zigrlm_config.rs`, `README.md`, `AGENTS.md`.
-
-## Plan
-
-These ship together as one cohesive RLM landing — Hetun is the flagship that gives the rest a reason to exist on day one, the helper layer is what gives RLM something to do besides toy parallelism, and the auto-config + vendoring make it work without any user setup. The order below is the implementation dependency order, not a staggered release schedule. We just keep shipping.
-
-| Issue | Scope | Files |
-|---|---|---|
-| #48 | **Auto-config.** Build `ZigrlmRuntimeConfig`, binary discovery, config schema. | `crates/tui/src/zigrlm_config.rs` (new), `crates/config/src/lib.rs` |
-| #49 | **Inline primitive.** Detect repl blocks in `handle_deepseek_turn()`, shell out, replace message content, emit RLM events. | `crates/tui/src/core/engine.rs`, `crates/tui/src/zigrlm_runtime.rs` (new), `crates/tui/src/core/events.rs` |
-| #53 | **Helper layer + Python sandbox.** Curated `ctx` helpers + AST-validated Python sandbox baked into the repl runtime. | New helper module (location TBD between zigrlm upstream and `crates/tui/src/zigrlm_runtime/`) |
-| #55 | **Vendor zigrlm.** Add git submodule, build script, prefer vendored path. | `.gitmodules`, `crates/tui/build.rs`, `crates/tui/src/zigrlm_config.rs` |
-| #50 | **Prompt engineering.** Add RLM section to agent/yolo prompts, config toggle, examples that exercise the helper layer. | `crates/tui/src/prompts/agent.txt`, `crates/tui/src/prompts/yolo.txt`, `crates/config/src/lib.rs` |
-| #54 | **Hetun mode.** Add a 4th mode at the end of the cycle, with mission-level approval gate. Plan stays unchanged. | `crates/tui/src/tui/app.rs`, `crates/tui/src/tui/palette.rs`, `crates/tui/src/prompts/hetun.txt`, `crates/tui/src/core/engine.rs`, `crates/tui/src/tui/widgets/header.rs` |
-| #46 | **Explicit bridge.** Implement `ZigrlmTool` spec, register in registry, add to sub-agent allowed lists. | `crates/tui/src/tools/zigrlm.rs` (new), `crates/tui/src/tools/registry.rs`, `crates/tui/src/tools/subagent.rs` |
-
-We diverge from the old #40 plan (building a native Rust repl parser) because `zigrlm` already owns parsing, sandboxing, and trace emission. Reimplementing that in Rust is waste — #53 (the helper layer) is where we add the value that makes the runtime actually usable.
-
-## Non-Goals (deferred)
-
- **Native repl parser in Rust** (#41–#45, all closed). zigrlm's Zig parser is sufficient.
- **Real-time streaming of child progress** into the TUI transcript. Spinner + final summary is enough.
- **Process pool / pre-warming** of zigrlm subprocesses. One fork per repl block is acceptable given flash latency.
- **Replacing `agent_swarm` entirely.** Swarm remains for multi-step autonomous work that requires tools.
- **Automatic migration** of existing swarm task graphs to repl blocks.
- **Windows-specific binary discovery quirks.** macOS / Linux are the priority surfaces.
- **JS sandbox hardening.** Trusted-local-compute model, same posture as the Python sandbox in #53.
- **Three-tier model routing** (frontier escalation inside repl). Requires zigrlm patches; do once vendoring (#55) lands.
- **Native C-library linkage** of zigrlm. Worth doing only after subprocess overhead is shown to be a real bottleneck.
-
---
-
-## Appendix A: ReAct vs. RLM — Why Both?
-
-> A deeper treatment lives in `docs/research-react-vs-rlm.md`. This appendix extracts the decisions that matter for our integration.
-
-**ReAct** (Yao et al., ICLR 2023) is the incumbent paradigm: a linear chain of *Thought → Action (JSON tool call) → Observation → repeat*. The model's entire history of reasoning and tool results is appended to the prompt every turn. It is simple, inspectable, and works well for interactive, stateful tasks (editing files, running shell commands, browsing).
-
-**RLM** (Zhang et al., arXiv:2512.24601) is a tree-structured inference paradigm. The model writes fenced `repl` blocks that manipulate an external REPL variable store and spawn recursive child calls. Because the full context lives in variables, the LM sees only constant-size metadata. This enables:
-
- **Native parallelism** via `rlm_query_batched`
- **10M+ token scale** (two orders of magnitude beyond the base window)
- **Cheap child models** for leaf work while a frontier model handles control
-
-### Comparison Table
-
-| Dimension | ReAct (today) | RLM (proposed) |
-|---|---|---|
-| **Structure** | Linear chain | Tree of recursive calls |
-| **Parallelism** | Sequential (or heavy swarm tasks) | Native batched fan-out (up to 8 concurrent) |
-| **State** | Monolithic prompt scratchpad | External REPL variables |
-| **Tool interface** | JSON schema (`ToolUse`) | Fenced `repl` DSL blocks |
-| **Child cost** | Full agent loop per child | Cheap `deepseek-v4-flash` subprocess calls |
-| **Observability** | Linear transcript | JSONL tree trace (`--trace`) |
-| **Best for** | Interactive, stateful, tool-driven work | Parallel analysis, long context, batch generation |
-
-### The Hybrid Stance
-
-We do not replace ReAct with RLM; we make RLM a first-class *primitive inside* the ReAct loop. The agent still reasons in natural language and calls tools via JSON when it needs interactive side effects. But when it wants to fan out parallel analysis, decompose a large context, or batch-generate, it writes a `repl` block instead of spawning `agent_swarm`. The engine detects the block, runs `zigrlm`, and feeds the aggregated `FINAL` result back as the assistant's answer for that turn.
-
-This maps to the paper's own framing: RLM is the next milestone *after* CoT and ReAct, not a replacement for them.
-
---
-
-## Appendix B: UI/UX Design
-
-### The Core Tension
-
-The Pro model streams its response naturally. The user sees "I'll break this into parallel searches…" and then watches the opening fence of a `repl` block appear character-by-character. This is **good**: it shows intent and builds trust. But once the fence closes, the engine must pause, fork `zigrlm`, wait for N parallel flash calls to finish, and then present a single coherent answer. The UI must bridge that gap without feeling broken.
-
-### The Solution: Progressive Disclosure via "Thinking Reclassification"
-
-The existing TUI already has the perfect visual language for this: `HistoryCell::Thinking` (`crates/tui/src/tui/history.rs`, lines 1129–1198). Thinking blocks render with a left border (`▏`), a header showing a spinner and duration, and a markdown body, and they are collapsed by default. We reuse that pattern exactly.
-
-**The flow:**
-
-1. **Streaming phase:** The Pro model streams its response. The TUI shows it live in `HistoryCell::Assistant { streaming: true }`, exactly as today.
-2. **Detection phase:** After `MessageStop`, the engine detects the repl block and emits `Event::RlmStarted { message_index, estimated_calls }`.
-3. **Reclassification:** The engine mutates `session.messages`:
-   - The `ContentBlock::Text` containing the repl fence is moved to a new `ContentBlock::Thinking` block.
-   - A transient `ContentBlock::Text` placeholder is inserted: "*Running RLM tree…*"
-4. **Execution phase:** `zigrlm` runs. The TUI footer shows `recursing ⌀` (sky blue, same family as `working`). The transcript shows the thinking block collapsed with a live spinner: `◦ recursing live`.
-5. **Completion phase:** `zigrlm` returns. The engine replaces the placeholder text with the `FINAL` result and emits `Event::RlmComplete { message_index, usage, duration_ms }`.
-6. **Final render:** The TUI updates:
-   - `ContentBlock::Text` now shows the aggregated result (normal markdown).
-   - `ContentBlock::Thinking` shows the original repl plan, now collapsed and labeled `◦ recursing done · 1.2s`.
-   - A one-line metadata footer is appended: `▸ 3 flash calls · 2.1K tokens · ~$0.003`.
-
-This gives the user **one thoughtful response**: the result is the message, and the repl block is the reasoning behind it — exactly how `Thinking` blocks work today.
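The reclassification and completion steps in the flow above amount to a small mutation of the message's content blocks. The sketch below uses a simplified stand-in `ContentBlock` enum (the real type almost certainly carries more fields), and `reclassify` / `complete` are hypothetical names; detection of the repl fence is passed in as a closure so the sketch stays independent of how that is done.

```rust
const PLACEHOLDER: &str = "*Running RLM tree…*";

/// Simplified stand-in for the real content block type.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    Thinking(String),
}

/// Moves the first text block matching `is_repl` into a Thinking block and
/// inserts a transient placeholder after it. Returns the placeholder's index.
fn reclassify(blocks: &mut Vec<ContentBlock>, is_repl: impl Fn(&str) -> bool) -> Option<usize> {
    let idx = blocks
        .iter()
        .position(|b| matches!(b, ContentBlock::Text(t) if is_repl(t.as_str())))?;
    let placeholder = ContentBlock::Text(PLACEHOLDER.to_string());
    let plan = match std::mem::replace(&mut blocks[idx], placeholder) {
        ContentBlock::Text(t) | ContentBlock::Thinking(t) => t,
    };
    blocks.insert(idx, ContentBlock::Thinking(plan));
    Some(idx + 1)
}

/// Swaps the placeholder for the aggregated FINAL result.
fn complete(blocks: &mut [ContentBlock], placeholder_idx: usize, result: &str) {
    blocks[placeholder_idx] = ContentBlock::Text(result.to_string());
}
```

Returning the placeholder index lets the completion step update the block in place without searching for the placeholder text again.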
-
-### Visual States
-
-| State | Transcript | Footer | Thinking Block |
-|---|---|---|---|
-| **Streaming** | Assistant cell, `streaming: true` | `thinking ⌀` | Not visible yet (model hasn't emitted fence) |
-| **Executing** | Assistant cell, spinner suffix on placeholder | `recursing ⌀` | Collapsed, header reads `◦ recursing live` |
-| **Complete** | Assistant cell, result text | Idle | Collapsed, header reads `◦ recursing done · 1.2s` |
-| **Expanded** | Same | Idle | Expanded, shows full repl DSL with syntax highlighting |
-
-### Why Not a Tool Card?
-
-It is tempting to model RLM execution as a new `ToolCell` variant. We explicitly reject this because RLM is **not a tool call** — it is an inline primitive. Rendering it as a tool card would:
- Break the "single thoughtful response" metaphor
- Train the user to think of RLM as an external action rather than assistant reasoning
- Add visual noise (tool headers, argument summaries, result boxes) for what is essentially accelerated thinking
-
-The `Thinking` block is the right container because the repl block *is* the model's reasoning about how to parallelize. The result is simply the output of that reasoning.
-
-### Keyboard & Detail Views
-
- **`v` on the assistant message** — Opens the `PagerView` (`crates/tui/src/tui/pager.rs`) showing the full message: the original repl block at the top (with `zigrlm` trace path if available), the FINAL result below, and the JSONL tree if the user wants to inspect child calls.
- **`v` on the thinking block** — Toggles collapse/expand inline, same as existing thinking behavior.
- **`Alt+4` (sidebar)** — Future: an RLM panel showing recent RLM executions with call counts, depth, and trace file paths. Deferred to v0.6.
-
-### Footer & Status Indicators
-
-**New footer state** in `crates/tui/src/tui/ui.rs` (`footer_state_label()`, ~line 4022):
-
-```rust
-else if app.active_rlm.is_some() {
-    ("recursing ⌀", Style::default().fg(Color::Sky))
-}
-```
-
-The word *recursing* is playful, accurate, and short enough for the footer. Alternatives considered: `recursive thinking ⌀` (too long), `RLM ⌀` (too opaque). `recursing` wins because it describes what is actually happening — the model is recursively fanning out child calls — and it fits the existing informal voice of the TUI (`thinking ⌀`, `working`, `compacting ⌀`).
-
-**Motion refresh:** The existing `UI_STATUS_ANIMATION_MS` (360 ms) timer already bumps the transcript cache when `history_has_live_motion` is true. We add `app.active_rlm.is_some()` to that check so the spinner animates while `zigrlm` runs.
-
-### Events to Add
-
-**File:** `crates/tui/src/core/events.rs`
-
-```rust
-pub enum Event {
-    // ... existing variants
-    RlmStarted {
-        message_index: usize,
-        estimated_calls: Option<usize>,
-    },
-    RlmComplete {
-        message_index: usize,
-        usage: Usage,
-        duration_ms: u64,
-    },
-    RlmFailed {
-        message_index: usize,
-        error: String,
-    },
-}
-```
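How the TUI side might consume these events is sketched below. The `Event` variants mirror the enum above; the `Usage` stand-in, the `App` fields other than `active_rlm`, and the `apply_event` function are assumptions made so the example compiles on its own.

```rust
/// Stand-in for the real `Usage` type, which is not shown in this document.
#[derive(Debug, Clone, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

enum Event {
    RlmStarted { message_index: usize, estimated_calls: Option<usize> },
    RlmComplete { message_index: usize, usage: Usage, duration_ms: u64 },
    RlmFailed { message_index: usize, error: String },
}

#[derive(Debug, Default)]
struct App {
    active_rlm: Option<usize>,
    last_rlm_ms: Option<u64>,
    rlm_error: Option<String>,
}

/// Updates UI state; events for a message that is not active are ignored.
fn apply_event(app: &mut App, event: Event) {
    match event {
        Event::RlmStarted { message_index, .. } => {
            app.active_rlm = Some(message_index);
            app.rlm_error = None;
        }
        Event::RlmComplete { message_index, duration_ms, .. } => {
            if app.active_rlm == Some(message_index) {
                app.active_rlm = None;
                app.last_rlm_ms = Some(duration_ms);
            }
        }
        Event::RlmFailed { message_index, error } => {
            if app.active_rlm == Some(message_index) {
                app.active_rlm = None;
                app.rlm_error = Some(error);
            }
        }
    }
}

/// Footer text while an RLM run is active, mirroring the `recursing ⌀` state.
fn footer_state_label(app: &App) -> Option<&'static str> {
    app.active_rlm.map(|_| "recursing ⌀")
}
```

Keying completion on `message_index` guards against a stale event clearing the spinner for a newer run.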
-
-### Future: Tree Visualization
-
-`zigrlm` can emit `--trace /path/to/run.jsonl`. In v0.6 we can parse that JSONL and render a tree widget showing:
-
- Root prompt (depth 0)
- Each `rlm_query` / `rlm_query_batched` child (depth 1..N)
- Per-node usage (calls, tokens, cost)
- Duration bars
-
-This would live in the `PagerView` or a dedicated sidebar panel, not in the main transcript. It is a debugging/observability feature, not part of the default conversation flow.
-
-### Cost Accounting
-
-`zigrlm` returns usage metadata per run (`calls`, `input_tokens`, `output_tokens`, `cost_micros`). The engine must fold this into the session's aggregate `Usage` in `crates/tui/src/core/session.rs`. However, there is a subtlety: the **root Pro call** that emitted the `repl` block is already counted as part of the normal assistant-message usage. `zigrlm` then performs its *own* root call (also Pro, because `ZIGRLM_MAIN_CMD` points at the session model) plus N child calls (flash). In practice this means the Pro prompt is billed twice — once by our client for the streaming turn, once by `zigrlm` for the root RLM call. This double-counting is acceptable for the spike, but Phase 2 should explore passing the *already-received* assistant text directly into `zigrlm` without re-billing the root call, or subtracting the overlap from displayed totals.
-
-**Display policy:** Show raw numbers only (`3 flash calls · 2.1K tokens · ~$0.003`). Do **not** attempt to show "savings vs. ReAct" because that is a counterfactual — we cannot know how many Pro turns `agent_swarm` would have needed for the same task. The user can infer the value themselves: one Pro call + eight flash calls is visibly cheaper than five Pro calls.
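The fold-and-display step can be sketched as follows. The field names `calls`, `input_tokens`, `output_tokens`, and `cost_micros` come from the text above; the struct names, the assumption that `cost_micros` means millionths of a dollar, and the assumption that every counted call is a flash call are illustrative only.

```rust
/// Per-run usage as described above; `cost_micros` is assumed to be micro-dollars.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct RlmUsage {
    calls: u32,
    input_tokens: u64,
    output_tokens: u64,
    cost_micros: u64,
}

/// Hypothetical session aggregate.
#[derive(Debug, Default, PartialEq)]
struct SessionTotals {
    input_tokens: u64,
    output_tokens: u64,
    cost_micros: u64,
}

impl SessionTotals {
    /// Adds one RLM run to the running totals (no double-count correction).
    fn fold(&mut self, run: &RlmUsage) {
        self.input_tokens += run.input_tokens;
        self.output_tokens += run.output_tokens;
        self.cost_micros += run.cost_micros;
    }
}

/// Raw-numbers summary line, with no "savings" estimate.
fn summary_line(run: &RlmUsage) -> String {
    let tokens = run.input_tokens + run.output_tokens;
    let tokens_label = if tokens >= 1000 {
        format!("{:.1}K", tokens as f64 / 1000.0)
    } else {
        tokens.to_string()
    };
    let dollars = run.cost_micros as f64 / 1_000_000.0;
    format!("{} flash calls · {} tokens · ~${:.3}", run.calls, tokens_label, dollars)
}
```

Keeping the cost in integer micro-units until display time avoids accumulating floating-point error in the session total.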
-
-### Anti-Patterns
-
- **Do not** stream `zigrlm` child progress into the transcript in real time. Flash calls complete in 1–3 seconds; the noise is not worth the signal.
- **Do not** show a modal or full-screen overlay during RLM execution. The user should be able to scroll, read history, and type the next query while `zigrlm` works.
- **Do not** render the raw `[0]\n…\n[1]\n…` batched response format directly. If `FINAL` was missing and we fall back to raw output, strip the indexed prefixes before displaying.
- **Do not** show fake "you saved $X.XX" badges. The comparison baseline is undefined and the math is misleading.
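For the raw-output fallback in the list above, stripping the indexed prefixes could be as simple as dropping marker-only lines. This is a sketch with hypothetical function names; it assumes each `[n]` marker sits alone on its own line, as in the `[0]\n4\n[1]\n6` example, and it would also drop a child answer line that happens to consist solely of such a marker.

```rust
/// True for lines of the form `[<digits>]`, such as `[0]` or `[12]`.
fn is_index_marker(line: &str) -> bool {
    line.strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .map_or(false, |digits| {
            !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
        })
}

/// Removes batched-response index markers, keeping the answers in order.
fn strip_index_prefixes(raw: &str) -> String {
    raw.lines()
        .filter(|line| !is_index_marker(line.trim()))
        .collect::<Vec<_>>()
        .join("\n")
}
```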
@@ -1,7 +1,7 @@
 {
  "name": "deepseek-tui",
-  "version": "0.5.2",
-  "deepseekBinaryVersion": "0.5.2",
+  "version": "0.6.0",
+  "deepseekBinaryVersion": "0.6.0",
  "description": "Install and run deepseek and deepseek-tui binaries from GitHub release artifacts.",
  "author": "Hmbown",
  "license": "MIT",