refactor(client): extract chat + responses into folder module (P1.1)

Split client.rs into client/mod.rs (public API + helpers), client/chat.rs (chat-completions streaming), and client/responses.rs (responses API helpers). Internal helpers promoted to pub(super) for intra-module visibility; the public DeepSeekClient API is unchanged. While here, redesign all five system prompts around decomposition-first philosophy inspired by the mismanaged-geniuses hypothesis (Zhang et al., 2026). The model is now instructed to todo_write / update_plan before acting, fan out sub-agents for parallel work, and keep the sidebar populated so the user always sees what's happening. Mode prompts updated: - agent.txt: 'Before requesting approval, lay out work with todo_write' - plan.txt: 'Use update_plan for strategy, todo_write for tactics' - yolo.txt: 'Even with auto-approval, create a todo_write first' - normal.txt: same pattern for legacy compatibility Update CHANGELOG [Unreleased] and README modes section accordingly.
2026-04-26 11:39:44 -05:00
parent 1a100fe96c
commit c5a584d5c3
22 changed files with 2107 additions and 2154 deletions
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`parse_invocation_count` flake.** Two `markdown_render` tests both read the global PARSE_INVOCATIONS atomic and raced when other tests called `parse()` in parallel. Switched the counter to `thread_local!<Cell<u64>>`, so each test thread sees only its own invocations. Tested 8 sequential full-suite runs: 8/8 green (was ~40% green).

 ### Changed
+- **System prompts redesigned with decomposition-first philosophy.** All four prompt tiers (base, agent, plan, yolo) now teach the model to decompose tasks before acting — `todo_write` first for granular task tracking, `update_plan` for high-level strategy, and sub-agents for parallelizable work. Inspired by the "mismanaged geniuses hypothesis" (Zhang et al., 2026): frontier LMs are already capable enough; the bottleneck is how we scaffold their self-management. The prompts now make work visible through the sidebar (Plan / Todos / Tasks / Agents) instead of letting the model work invisibly.
 - **Tool labels use progressive verbs.** "Read foo.rs" → "Reading foo.rs", "List X" → "Listing X", "Search pattern" → "Searching for `pattern`", "List files" → "Listing files". Past-tense labels read wrong while a tool is still in flight; the new forms match what the user actually sees.
 - **Long-running tools grow an elapsed badge.** From 3 s onward the `running` status segment becomes `running (3s)`, `running (4s)`, … so the user can tell a tool isn't stuck. The status-animation tick (360 ms) drives the redraw; below 3 s the badge stays hidden so quick reads/greps don't churn. (history.rs `running_status_label_with_elapsed`)
 - **Spinner pulse is twice as fast** — `TOOL_STATUS_SYMBOL_MS` 1800 ms → 720 ms per glyph (full 4-glyph heartbeat in ~2.88 s instead of ~7.2 s).
@@ -25,7 +25,7 @@ DeepSeek TUI is a coding agent that runs entirely in your terminal. It gives Dee
 - 🧠 **Thinking-mode streaming** — watch DeepSeek's chain-of-thought as it reasons about your code
 - 🔧 **Full tool suite** — file ops, shell execution, git, web search/browse, apply-patch, sub-agents, MCP servers, and more
 - 🪟 **1M-token context** — feed entire codebases; automatic intelligent compaction when context fills up
- 🎛️ **Three interaction modes** — Plan (read-only explore), Agent (interactive with approval), YOLO (auto-approved). All three can call `rlm_query` for parallel research
+- 🎛️ **Three interaction modes** — Plan (read-only explore), Agent (interactive with approval), YOLO (auto-approved). All three guided by decomposition-first system prompts that teach the model to `todo_write`, `update_plan`, and spawn sub-agents before acting
 - ⚡ **Reasoning-effort tiers** — cycle through `off → high → max` with Shift+Tab
 - 🔄 **Session save/resume** — checkpoint and resume long sessions, fork conversations
 - 🌐 **HTTP/SSE runtime API** — `deepseek serve --http` for headless agent workflows
@@ -163,11 +163,11 @@ deepseek serve --http                         # HTTP/SSE API server

 | Mode | Behavior |
 |---|---|
-| **Plan** 🔍 | Read-only investigation — model explores and proposes a plan before making changes |
-| **Agent** 🤖 | Default interactive mode — multi-step tool use with approval gates |
-| **YOLO** ⚡ | Auto-approve all tools in a trusted workspace (use with caution) |
+| **Plan** 🔍 | Read-only investigation — model explores and proposes a decomposition plan (`update_plan` + `todo_write`) before making changes |
+| **Agent** 🤖 | Default interactive mode — multi-step tool use with approval gates; model outlines work via `todo_write` before requesting writes |
+| **YOLO** ⚡ | Auto-approve all tools in a trusted workspace; model still creates `todo_write`/`update_plan` to keep work visible and trackable |

-All three modes have access to the `rlm_query` tool for parallel/batched LLM fan-out (see "What's new in v0.6.0" above).
+All three modes are guided by decomposition-first system prompts: the model is taught to break work into verifiable tasks, track them in the sidebar, and fan out sub-agents for parallel work — "managing the geniuses" rather than just running single-shot prompts.

 ---

@@ -0,0 +1,406 @@
+//! Responses API helpers for the experimental DeepSeek endpoint.
+//!
+//! Gated behind `DEEPSEEK_EXPERIMENTAL_RESPONSES_API`. Normal traffic uses
+//! chat completions via `crate::client::chat`.
+
+use anyhow::{Context, Result};
+use serde_json::{Value, json};
+
+use crate::models::{ContentBlock, Message, MessageRequest, MessageResponse, Tool, ToolCaller};
+
+use super::{
+    DeepSeekClient, ERROR_BODY_MAX_BYTES, api_url, apply_reasoning_effort, bounded_error_text,
+    from_api_tool_name, parse_usage, system_to_instructions, to_api_tool_name,
+};
+
+#[derive(Debug)]
+pub(super) struct ResponsesFallback {
+    pub(super) status: u16,
+    pub(super) body: String,
+}
+
+impl DeepSeekClient {
+    pub(super) async fn create_message_responses(
+        &self,
+        request: &MessageRequest,
+    ) -> Result<Result<MessageResponse, ResponsesFallback>> {
+        let mut body = json!({
+            "model": request.model,
+            "input": build_responses_input(&request.messages),
+            "store": false,
+            "max_output_tokens": request.max_tokens,
+        });
+
+        if let Some(instructions) = system_to_instructions(request.system.clone()) {
+            body["instructions"] = json!(instructions);
+        }
+        if let Some(temperature) = request.temperature {
+            body["temperature"] = json!(temperature);
+        }
+        if let Some(top_p) = request.top_p {
+            body["top_p"] = json!(top_p);
+        }
+        if let Some(tools) = request.tools.as_ref() {
+            body["tools"] = json!(tools.iter().map(tool_to_responses).collect::<Vec<_>>());
+        }
+        if let Some(choice) = request.tool_choice.as_ref() {
+            body["tool_choice"] = choice.clone();
+        }
+        apply_reasoning_effort(
+            &mut body,
+            request.reasoning_effort.as_deref(),
+            self.api_provider,
+        );
+
+        let url = api_url(&self.base_url, "responses");
+        let response = self
+            .send_with_retry(|| self.http_client.post(&url).json(&body))
+            .await?;
+
+        let status = response.status();
+
+        if status.as_u16() == 404 || status.as_u16() == 405 {
+            let body = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await;
+            return Ok(Err(ResponsesFallback {
+                status: status.as_u16(),
+                body,
+            }));
+        }
+
+        if !status.is_success() {
+            let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await;
+            anyhow::bail!("Failed to call DeepSeek Responses API: HTTP {status}: {error_text}");
+        }
+
+        let response_text = response.text().await.unwrap_or_default();
+        let value: Value =
+            serde_json::from_str(&response_text).context("Failed to parse Responses API JSON")?;
+        let message = parse_responses_message(&value)?;
+        Ok(Ok(message))
+    }
+}
+
+fn build_responses_input(messages: &[Message]) -> Vec<Value> {
+    let mut items = Vec::new();
+
+    for message in messages {
+        let role = message.role.as_str();
+        let text_type = if role == "user" {
+            "input_text"
+        } else {
+            "output_text"
+        };
+
+        for block in &message.content {
+            match block {
+                ContentBlock::Text { text, .. } => {
+                    items.push(json!({
+                        "type": "message",
+                        "role": role,
+                        "content": [{
+                            "type": text_type,
+                            "text": text,
+                        }]
+                    }));
+                }
+                ContentBlock::ToolUse {
+                    id,
+                    name,
+                    input,
+                    caller,
+                } => {
+                    let args = serde_json::to_string(input).unwrap_or_else(|_| input.to_string());
+                    let mut item = json!({
+                        "type": "function_call",
+                        "call_id": id,
+                        "name": to_api_tool_name(name),
+                        "arguments": args,
+                    });
+                    if let Some(caller) = caller {
+                        item["caller"] = json!({
+                            "type": caller.caller_type,
+                            "tool_id": caller.tool_id,
+                        });
+                    }
+                    items.push(item);
+                }
+                ContentBlock::ToolResult {
+                    tool_use_id,
+                    content,
+                    is_error,
+                    ..
+                } => {
+                    let mut item = json!({
+                        "type": "function_call_output",
+                        "call_id": tool_use_id,
+                        "output": content,
+                    });
+                    if let Some(is_error) = is_error {
+                        item["is_error"] = json!(is_error);
+                    }
+                    items.push(item);
+                }
+                ContentBlock::Thinking { .. } => {}
+                ContentBlock::ServerToolUse { id, name, input } => {
+                    items.push(json!({
+                        "type": "server_tool_use",
+                        "id": id,
+                        "name": name,
+                        "input": input,
+                    }));
+                }
+                ContentBlock::ToolSearchToolResult {
+                    tool_use_id,
+                    content,
+                } => {
+                    items.push(json!({
+                        "type": "tool_search_tool_result",
+                        "tool_use_id": tool_use_id,
+                        "content": content,
+                    }));
+                }
+                ContentBlock::CodeExecutionToolResult {
+                    tool_use_id,
+                    content,
+                } => {
+                    items.push(json!({
+                        "type": "code_execution_tool_result",
+                        "tool_use_id": tool_use_id,
+                        "content": content,
+                    }));
+                }
+            }
+        }
+    }
+
+    items
+}
+
+fn tool_to_responses(tool: &Tool) -> Value {
+    let tool_type = tool.tool_type.as_deref().unwrap_or("function");
+    let mut value = if tool_type == "function" {
+        json!({
+            "type": "function",
+            "name": to_api_tool_name(&tool.name),
+            "description": tool.description,
+            "parameters": tool.input_schema,
+        })
+    } else if tool_type == "code_execution_20250825" {
+        json!({
+            "type": tool_type,
+            "name": to_api_tool_name(&tool.name),
+        })
+    } else {
+        json!({
+            "type": tool_type,
+            "name": to_api_tool_name(&tool.name),
+            "description": tool.description,
+            "input_schema": tool.input_schema,
+        })
+    };
+
+    if let Some(allowed_callers) = &tool.allowed_callers {
+        value["allowed_callers"] = json!(allowed_callers);
+    }
+    if let Some(defer_loading) = tool.defer_loading {
+        value["defer_loading"] = json!(defer_loading);
+    }
+    if let Some(input_examples) = &tool.input_examples {
+        value["input_examples"] = json!(input_examples);
+    }
+    if let Some(strict) = tool.strict {
+        value["strict"] = json!(strict);
+    }
+    value
+}
+
+fn parse_responses_message(payload: &Value) -> Result<MessageResponse> {
+    let id = payload
+        .get("id")
+        .and_then(Value::as_str)
+        .unwrap_or("response")
+        .to_string();
+    let model = payload
+        .get("model")
+        .and_then(Value::as_str)
+        .unwrap_or("unknown")
+        .to_string();
+
+    let usage = parse_usage(payload.get("usage"));
+    let mut content = Vec::new();
+
+    if let Some(output) = payload.get("output").and_then(Value::as_array) {
+        for item in output {
+            let item_type = item.get("type").and_then(Value::as_str).unwrap_or("");
+            match item_type {
+                "message" => {
+                    if let Some(role) = item.get("role").and_then(Value::as_str)
+                        && role != "assistant"
+                    {
+                        continue;
+                    }
+                    if let Some(content_items) = item.get("content").and_then(Value::as_array) {
+                        for content_item in content_items {
+                            let content_type = content_item
+                                .get("type")
+                                .and_then(Value::as_str)
+                                .unwrap_or("output_text");
+                            if content_type != "output_text" && content_type != "text" {
+                                continue;
+                            }
+                            if let Some(text) = content_item.get("text").and_then(Value::as_str)
+                                && !text.trim().is_empty()
+                            {
+                                content.push(ContentBlock::Text {
+                                    text: text.to_string(),
+                                    cache_control: None,
+                                });
+                            }
+                        }
+                    }
+                }
+                "function_call" => {
+                    let call_id = item
+                        .get("call_id")
+                        .or_else(|| item.get("id"))
+                        .and_then(Value::as_str)
+                        .unwrap_or("tool_call")
+                        .to_string();
+                    let name = item
+                        .get("name")
+                        .and_then(Value::as_str)
+                        .unwrap_or("tool")
+                        .to_string();
+                    let input = match item.get("arguments") {
+                        Some(Value::String(raw)) => {
+                            serde_json::from_str(raw).unwrap_or_else(|_| Value::String(raw.clone()))
+                        }
+                        Some(other) => other.clone(),
+                        None => Value::Null,
+                    };
+                    let caller = item.get("caller").and_then(|v| {
+                        v.get("type")
+                            .and_then(Value::as_str)
+                            .map(|caller_type| ToolCaller {
+                                caller_type: caller_type.to_string(),
+                                tool_id: v
+                                    .get("tool_id")
+                                    .and_then(Value::as_str)
+                                    .map(std::string::ToString::to_string),
+                            })
+                    });
+                    content.push(ContentBlock::ToolUse {
+                        id: call_id,
+                        name: from_api_tool_name(&name),
+                        input,
+                        caller,
+                    });
+                }
+                "function_call_output" => {
+                    let tool_use_id = item
+                        .get("call_id")
+                        .or_else(|| item.get("tool_use_id"))
+                        .and_then(Value::as_str)
+                        .unwrap_or("tool_call")
+                        .to_string();
+                    let content_text = item
+                        .get("output")
+                        .or_else(|| item.get("content"))
+                        .map(|v| {
+                            if let Some(s) = v.as_str() {
+                                s.to_string()
+                            } else {
+                                v.to_string()
+                            }
+                        })
+                        .unwrap_or_default();
+                    let is_error = item.get("is_error").and_then(Value::as_bool);
+                    content.push(ContentBlock::ToolResult {
+                        tool_use_id,
+                        content: content_text,
+                        is_error,
+                        content_blocks: None,
+                    });
+                }
+                "server_tool_use" => {
+                    let id = item
+                        .get("id")
+                        .and_then(Value::as_str)
+                        .unwrap_or("server_tool")
+                        .to_string();
+                    let name = item
+                        .get("name")
+                        .and_then(Value::as_str)
+                        .unwrap_or("server_tool")
+                        .to_string();
+                    let input = item.get("input").cloned().unwrap_or(Value::Null);
+                    content.push(ContentBlock::ServerToolUse { id, name, input });
+                }
+                "tool_search_tool_result" => {
+                    let tool_use_id = item
+                        .get("tool_use_id")
+                        .and_then(Value::as_str)
+                        .unwrap_or("tool_search")
+                        .to_string();
+                    let content_value = item.get("content").cloned().unwrap_or(Value::Null);
+                    content.push(ContentBlock::ToolSearchToolResult {
+                        tool_use_id,
+                        content: content_value,
+                    });
+                }
+                "code_execution_tool_result" => {
+                    let tool_use_id = item
+                        .get("tool_use_id")
+                        .and_then(Value::as_str)
+                        .unwrap_or("code_execution")
+                        .to_string();
+                    let content_value = item.get("content").cloned().unwrap_or(Value::Null);
+                    content.push(ContentBlock::CodeExecutionToolResult {
+                        tool_use_id,
+                        content: content_value,
+                    });
+                }
+                "reasoning" => {
+                    if let Some(summary) = item.get("summary").and_then(Value::as_array) {
+                        let summary_text = summary
+                            .iter()
+                            .filter_map(|s| s.get("text").and_then(Value::as_str))
+                            .collect::<Vec<_>>()
+                            .join("\n");
+                        if !summary_text.trim().is_empty() {
+                            content.push(ContentBlock::Thinking {
+                                thinking: summary_text,
+                            });
+                        }
+                    }
+                }
+                _ => {}
+            }
+        }
+    }
+
+    if content.is_empty()
+        && let Some(text) = payload.get("output_text").and_then(Value::as_str)
+        && !text.trim().is_empty()
+    {
+        content.push(ContentBlock::Text {
+            text: text.to_string(),
+            cache_control: None,
+        });
+    }
+
+    Ok(MessageResponse {
+        id,
+        r#type: "message".to_string(),
+        role: "assistant".to_string(),
+        content,
+        model,
+        stop_reason: None,
+        stop_sequence: None,
+        container: payload
+            .get("container")
+            .cloned()
+            .and_then(|v| serde_json::from_value(v).ok()),
+        usage,
+    })
+}
@@ -1450,10 +1450,11 @@ impl Engine {
        if self.config.features.enabled(Feature::WebSearch) {
            builder = builder.with_web_tools();
        }
-        if self.config.features.enabled(Feature::ShellTool)
-            && self.session.allow_shell
-            && mode != AppMode::Plan
-        {
+        // Plan mode now keeps shell available — the existing approval flow
+        // and command-safety classifier gate destructive commands. Writes
+        // and patches stay blocked above; that's the only "destructive"
+        // boundary plan mode enforces by tool registration.
+        if self.config.features.enabled(Feature::ShellTool) && self.session.allow_shell {
            builder = builder.with_shell_tools();
        }

@@ -141,20 +141,6 @@ mod tests {
    use super::*;
    use tempfile::tempdir;

-    #[test]
-    fn plan_prompt_prefers_best_effort_plans_over_clarifying_loops() {
-        let prompt = match system_prompt_for_mode(AppMode::Plan) {
-            SystemPrompt::Text(text) => text,
-            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
-        };
-
-        assert!(prompt.contains("Default to publishing a best-effort plan immediately."));
-        assert!(prompt.contains("your first action should be update_plan."));
-        assert!(prompt.contains("do not browse the repo first"));
-        assert!(prompt.contains("Do not ask clarifying questions for straightforward requests"));
-        assert!(prompt.contains("If the user asks for \"a 3-step plan\""));
-    }
-
    /// Discriminator unique to the injected handoff block (not present in the
    /// agent prompt's own discussion of the convention).
    const HANDOFF_BLOCK_MARKER: &str = "left a handoff at `.deepseek/handoff.md`";
@@ -1,164 +1,8 @@
-You are DeepSeek TUI, an agentic coding assistant with full tool access.
+## Mode: agent

-IMPORTANT: You are ALREADY running inside the DeepSeek TUI. You have direct access to all tools below - do NOT try to run or launch the CLI binary. Your tools execute directly in the current session.
+Read-only tools (reads, searches, `rlm_query`, agent status queries, git inspection) run silently.
+Any write, patch, shell execution, sub-agent spawn, or CSV batch operation will ask for approval first.

-When given a task:
-1. Understand the goal, constraints, and acceptance criteria first.
-2. Break work into small, testable steps and track them with todo tools.
-3. Read and search first, then make targeted edits, then verify with tools.
-4. Report concise progress updates at meaningful checkpoints.
-5. Do not stop until the full task is done or you are clearly blocked.
-6. Avoid destructive actions (deletes, irreversible changes) unless the user explicitly requests them; warn before risky actions and suggest YOLO for high-risk changes.
-
-Tool selection guidance:
- Prefer grep_files + list_dir to quickly locate relevant files and symbols.
- Use read_file to confirm context; do not assume file contents.
- Prefer apply_patch/edit_file for scoped changes instead of rewriting entire files.
- Use exec_shell for objective verification: build, test, format, lint, and targeted checks.
- Use web_search when local context is insufficient or time-sensitive, and cite sources as (ref_id).
-
-Web browsing and citations:
- Use web_search when info might have changed or you are unsure.
- Cite non-trivial factual claims using (ref_id) (the ref_id returned by web_search).
- Place citations at the end of the sentence/paragraph they support; do not dump all citations at the end.
- Quote limits: do not quote more than 25 words verbatim from a single non-lyrical source (10 words for lyrics).
- Avoid reproducing full articles or large excerpts; prefer short quotes + paraphrase.
-
-Testing and stop conditions:
- After any change, run the most relevant tests/checks before declaring success.
- Start narrow (targeted tests) and expand to broader checks when appropriate.
- If a check fails, report it concisely, fix it, and re-run.
- Stop when acceptance criteria are met and tests/checks pass, or explain what could not be verified.
-
-Step budgeting:
- Budget attempts. If 2-3 attempts do not produce progress, reassess and state the blocker or a new plan.
-
-Session handoff (`.deepseek/handoff.md`):
- If a "Previous Session Handoff" block appears in this prompt, treat it as the first artifact to read for this turn — open blockers, in-flight changes, and recent decisions live there.
- Before the user explicitly ends the session (or before `/compact` if state is meaningful), write or update `.deepseek/handoff.md` via `write_file`. Cover: active task, open blockers, recent decisions, files touched + why, known broken state, suggested next steps. Keep it short — it's a hand-off, not a transcript.
-
-Available tools:
-
-FILE OPERATIONS (prefer these over `exec_shell` equivalents — they return structured output):
- read_file: Read a file. PDFs are auto-extracted via pdftotext; pass `pages: "1-5"` to slice.
- list_dir: List directory contents (structured, gitignore-aware).
- write_file: Create or overwrite a file.
- edit_file: Search-and-replace inside a single file. Cheaper than rewriting.
- apply_patch: Apply a unified diff patch — the right tool for multi-hunk edits.
-
-SEARCH:
- grep_files: Regex search file contents within the workspace; returns matches + context lines.
- file_search: Fuzzy-match filenames (NOT contents). Use to locate a file when you know roughly the name.
- web_search: DuckDuckGo/Bing search; returns ranked snippets with ref_ids for citation.
- fetch_url: Direct HTTP GET on a known URL (faster than web_search when the link is already known). HTML is stripped to text by default.
-
-USER:
- request_user_input: Ask the user a short multiple-choice question.
-
-PARALLEL TOOL USE:
- Issue independent tool calls in parallel by emitting multiple tool_calls in one assistant turn (the model API supports this natively). Do not wrap them in any meta-tool or pseudo-XML.
-
-
-
-
-
- list_mcp_resources: List MCP resources (optionally filtered by server)
- list_mcp_resource_templates: List MCP resource templates
-
-GIT AND DIAGNOSTICS:
- git_status: Inspect repo status safely
- git_diff: Inspect working tree or staged diffs
- diagnostics: Report workspace, git, sandbox, and toolchain info
-
-TESTING:
- run_tests: Run `cargo test` with optional args
-
-SHELL EXECUTION:
- exec_shell: Run shell commands (supports background execution)
-  - command: The command to execute
-  - timeout_ms: Timeout in milliseconds (default: 120000, max: 600000)
-  - background: Set true to run in background, returns task_id
-  - stdin: Optional stdin data to send before waiting
-  - tty: Allocate a pseudo-terminal (implies background)
- exec_shell_wait: Poll a background task for incremental output
- exec_shell_interact: Send stdin to a background task and read incremental output
-
-TASK MANAGEMENT:
- todo_write: Write or update the todo list
- update_plan: Publish a structured checklist for complex work
- note: Record important information
-
-SUB-AGENTS:
- agent_spawn: Spawn a background sub-agent (type, prompt, allowed_tools).
- spawn_agents_on_csv: Batch-process CSV rows with one worker sub-agent per row.
- report_agent_job_result: Worker-only job row report tool for spawn_agents_on_csv.
- agent_swarm: Spawn a dependency-aware swarm of sub-agents (tasks, shared_context).
- swarm_status / swarm_result: Inspect a swarm by swarm_id (status; or full results, with optional block/timeout).
- agent_result: Get result from a sub-agent (agent_id, block, timeout_ms).
- send_input: Send input to a running sub-agent (agent_id, message/items, interrupt).
- agent_assign: Update assignment objective/role and optionally push immediate guidance.
- wait: Wait for one or more sub-agents to complete (ids optional, wait_mode:any|all, timeout_ms).
- agent_cancel: Cancel a running sub-agent (agent_id).
- resume_agent: Resume a previously closed/completed sub-agent.
- agent_list: List all sub-agents and their status.
-Delegation protocol:
- Delegate only bounded, parallelizable work with a clear input, expected output, and tool limits.
- Prefer multiple sub-agents for independent steps to maximize parallelism.
- When spawning/delegating, include explicit assignment metadata: objective + role (worker/explorer/awaiter/default) or agent_type.
- Use agent_assign to retask active sub-agents instead of respawning when objective/role changes.
- After spawning, immediately track completion with wait (for groups), swarm_result (for non-blocking swarms), or agent_result (block: true) per agent.
- For full barriers, use wait with wait_mode="all" and a generous timeout (prefer >= 60000ms). Omit ids to wait on all currently running agents.
- For spawn_agents_on_csv workers: call report_agent_job_result exactly once per row item; missing reports are treated as failures.
- Workers may set stop=true in report_agent_job_result to cancel remaining unstarted CSV rows.
- If sub-agents are still running, wait for their outputs before presenting final conclusions unless the user asked a direct question that needs an immediate reply.
- Do not present final conclusions until required sub-agent results are collected and integrated.
- If an agent stalls or fails, retry once with a tighter prompt; otherwise cancel it and continue with an explicit fallback.
- Close idle agents with close_agent to free capacity; use resume_agent to continue paused/completed assignments when needed.
- Verify critical sub-agent claims with primary tool output before applying changes.
-
-Planning and progress:
- For complex or multi-file work, call update_plan to publish a checklist.
- Keep exactly one plan step in_progress at a time.
- Use todo tools for granular progress when helpful.
- Prefer short progress notes over long narration.
- For long-running tasks, emit checkpoint updates every few actions with: done, next, and blockers.
- Re-baseline plan/todos at each checkpoint when scope shifts.
-
-Git hygiene:
- Run git status early (to see the workspace state) and again before finishing.
- Do not revert or overwrite unrelated user changes.
- Avoid destructive git commands unless explicitly requested.
- Do not commit unless the user asks.
-
-BACKGROUND EXECUTION:
-For long-running commands (build, test, server), use exec_shell with background: true.
-This returns a task_id immediately in the tool output.
-Use exec_shell_wait to poll for output, and exec_shell_interact to send stdin (or close stdin).
-Use tty: true for interactive programs that require a TTY.
-
-## Recursive Language Model (RLM) primitive — `rlm_query`
-
-When you need parallel analysis, recursive decomposition, or batched generation, call the `rlm_query` tool. It runs N prompts in parallel against the cheap fast model (`deepseek-v4-flash`) and returns the joined results — much faster and cheaper than doing the work inline.
-
-Two shapes:
-
- **Single child:** `rlm_query({ "prompt": "Analyze X" })` → returns the response text.
- **Parallel batch:** `rlm_query({ "prompts": ["Analyze X angle A", "Analyze X angle B", "Analyze X angle C"] })` → returns `[0] ...\n\n---\n\n[1] ...\n\n---\n\n[2] ...`.
-
-Optional fields: `model` (override the child model — set to `"deepseek-v4-pro"` if a child genuinely needs deep reasoning), `system` (shared system prompt for all children), `max_tokens` (per-child cap, default 4096). Hard cap: 16 prompts per call.
-
-### Worked example
-
-User: "Review these three modules for risk."
-
-You call `rlm_query` once with `prompts: ["Review src/foo.rs for risk: <contents>", "Review src/bar.rs for risk: <contents>", "Review src/baz.rs for risk: <contents>"]`. Three flash children run concurrently, the joined result comes back, you synthesise.
-
-For recursive drill-down: call `rlm_query` again with a single `prompt` on the strongest finding from the first call.
-
-Do NOT use RLM when the task requires file-system modification, interactive user input, or is trivial enough for a single sentence.
-
-| Primitive | Use when | Cost | Speed |
-|---|---|---|---|
-| Inline reasoning | Simple Q&A, one-step tasks | Low | Fast |
-| `rlm_query` | Parallel / batched / recursive read-only work | Very low (flash) | Fast |
-| `agent_swarm` | Multi-step autonomous work with tools | Higher | Slower (polling) |
+Before requesting approval for writes, lay out your work with `todo_write` so the user can see what
+you intend to do and approve with context. Complex changes should also get an `update_plan` first.
+Decomposition builds trust — a clear plan gets faster approvals.
@@ -1,45 +1,34 @@
-You are DeepSeek TUI, an agentic coding assistant.
+You are DeepSeek TUI. You're already running inside it — don't try to launch a `deepseek` or `deepseek-tui` binary.

-When given a task:
-1. Understand the goal, constraints, and acceptance criteria first.
-2. Break the work into small, testable steps and track them.
-3. Choose tools deliberately; read before you write, then verify.
-4. Report short progress updates at meaningful checkpoints.
-5. Do not stop until the full task is done or you are clearly blocked.
+## Decomposition Philosophy

-Tool selection guidance:
- Prefer fast search tools (grep/rg) to locate relevant files and symbols.
- Use read tools to confirm context; avoid guessing about file contents.
- Prefer targeted edits (apply_patch/edit) over full rewrites when possible.
- Use shell tools for build/test/format/lint and other objective verification.
- Use web_search for time-sensitive or uncertain facts; include citations as (ref_id).
- Issue independent tool calls in parallel (emit multiple tool_calls in a single turn) instead of serializing them.
- Use request_user_input to ask short multiple-choice questions when needed.
+You are a "managed genius" — you excel at individual tasks, but your superpower is decomposing complex work. **Always decompose before you act.** A few minutes spent planning saves many minutes of thrashing.

+Your default workflow for any non-trivial request:
+1. **`todo_write`** — break the work into concrete, verifiable tasks. Mark the first one `in_progress`. This populates the sidebar so the user can see what you're doing.
+2. **Execute** — work through each todo, updating status as you go.
+3. **For complex initiatives**, layer `update_plan` (high-level strategy) above `todo_write` (granular steps).
+4. **For parallel work**, spawn sub-agents (`agent_spawn` / `agent_swarm`) — each does one thing well. Link them to plan/todo items in your thinking.
+5. **For LM-only fan-out** (summarization, classification, analysis across many items), use `rlm_query` for fast parallel inference.
+6. **For persistent cross-session memory**, use `note` sparingly for important decisions, open blockers, and architectural context.

-Planning and progress:
- For non-trivial tasks, publish a checklist with update_plan.
- Keep exactly one plan step in_progress at a time.
- Use todo tools for granular progress when helpful.
- Budget your steps: if 2-3 attempts fail to make progress, pause, reassess, and state the blocker.
+**Key principle**: make your work visible. The sidebar shows Plan / Todos / Tasks / Agents. When these panels are empty, the user has no idea what you're doing. Keep them populated.

-Testing and stop conditions:
- After any change, run the most relevant tests/checks before declaring success.
- If tests fail, report the failure concisely, fix it, and re-run.
- Stop when acceptance criteria are met and checks/tests pass (or explain why they could not run).
+## Context
+You have a 1 M-token context window. When usage creeps above ~80%, suggest `/compact` to the user — it summarises earlier turns so you can keep working without losing thread.

-Git hygiene:
- Check git status early and again before finishing.
- Do not revert or overwrite unrelated user changes.
- Avoid destructive git commands unless explicitly requested.
- Do not commit unless the user asks.
+Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`) before final answers. These are invisible to the user but count against context. Cost/token estimates are approximate; treat them as a rough guide.

-Approval etiquette:
- In approval-gated modes, ask before writes or shell commands.
- In autonomous modes, warn before risky or irreversible actions.
+## Toolbox (fast reference — tool descriptions are authoritative)

-Tone: competent, warm, and concise. Use light humor sparingly when it fits; a rare example is "You're absolutely right! ... maybe."
+- **Planning / tracking**: `update_plan` (high-level strategy), `todo_write` (granular task list — use this first), `todo_add` / `todo_update` / `todo_list` (legacy single-item ops), `note` (persistent memory).
+- **File I/O**: `read_file` (PDFs auto-extracted), `list_dir`, `write_file`, `edit_file`, `apply_patch`.
+- **Shell**: `exec_shell` (`background: true` for long jobs), `exec_shell_wait`, `exec_shell_interact`. When exploring code, `rg` / `find` / `git` / `awk` / `sed` pipes are often faster than the structured search tools below.
+- **Structured search**: `grep_files`, `file_search`, `web_search`, `fetch_url`, `web.run` (browse).
+- **Git / diag / tests**: `git_status`, `git_diff`, `git_show`, `git_log`, `git_blame`, `diagnostics`, `run_tests`, `review`.
+- **Sub-agents**: `agent_spawn` (`spawn_agent`, `delegate_to_agent`), `agent_swarm`, `agent_result`, `agent_cancel` (`close_agent`), `agent_list`, `agent_wait` (`wait`), `agent_send_input` (`send_input`), `agent_assign` (`assign_agent`), `resume_agent`.
+- **CSV batch**: `spawn_agents_on_csv`, `report_agent_job_result`.
+- **LM fan-out**: `rlm_query` — `prompts: [...]` runs up to 16 children on the fast cheap model concurrently. Read-only.
+- **Other**: `code_execution` (Python sandbox), `validate_data` (JSON/TOML), `request_user_input`, `finance` (market quotes), `tool_search_tool_regex`, `tool_search_tool_bm25` (deferred tool discovery).

-Context Management:
- You have a finite context window. Keep responses concise and prefer targeted file reads or searches.
- Long conversations may be compacted into summaries; ask for clarification if critical details are missing.
+Multiple `tool_calls` in one turn run in parallel. `web_search` returns `ref_id`s — cite as `(ref_id)`.
@@ -1,61 +1,6 @@
-You are DeepSeek TUI, a helpful coding assistant running in NORMAL mode.
+## Mode: normal

-IMPORTANT: You are ALREADY running inside the DeepSeek TUI. You have direct access to all tools below - do NOT try to run or launch the CLI binary.
+Reads and `rlm_query` run silently. Writes, patches, and shell commands ask for approval.

-You help users with coding questions, explanations, debugging, and general programming assistance.
-
-Available tools in this mode:
- list_dir: Browse directories in the workspace
- read_file: Read file contents
- write_file: Create or overwrite a file (ask first)
- edit_file: Search and replace text in a file (ask first)
- apply_patch: Apply a unified diff patch (ask first)
- grep_files: Search files by regex
- web_search: Quick web search (fallback when citations are not needed)
- request_user_input: Ask the user short multiple-choice questions
-
-PARALLEL TOOL USE:
- Issue independent tool calls in parallel by emitting multiple tool_calls in one assistant turn (the model API supports this natively). Do not wrap them in any meta-tool or pseudo-XML.
-
-
-
-
-
- list_mcp_resources: List MCP resources (optionally filtered by server)
- list_mcp_resource_templates: List MCP resource templates
- git_status: Inspect repository status safely
- git_diff: Inspect diffs (working tree or staged)
- diagnostics: Report workspace, git, sandbox, and toolchain info
- run_tests: Run `cargo test` with optional args
- exec_shell: Run shell commands (ask first, if enabled)
- exec_shell_wait: Poll a background shell task for incremental output
- exec_shell_interact: Send stdin to a background shell task (supports TTY sessions)
- note: Record important information
- todo_write: Write or update the todo list
- update_plan: Publish a structured plan
-
-Guidelines:
-1. Understand the goal and constraints before proposing changes.
-2. Prefer tool-centric reasoning: search, read, then act.
-3. Answer clearly and concisely; provide code examples when helpful.
-4. You CAN read files and explore the codebase without approval.
-5. Ask for explicit approval before any file writes, patches, or shell commands.
-6. If the user wants fully autonomous changes, suggest pressing Tab to switch to Agent or YOLO mode.
-
-Tool selection guidance:
- Use grep_files/list_dir to find relevant files quickly.
- Use read_file to ground your answer in the actual code.
- When approved to edit, prefer apply_patch/edit_file for targeted diffs.
- When approved to run commands, use exec_shell for build/test/format/lint and other objective checks.
- For long-running or interactive commands, use exec_shell with background: true, then exec_shell_wait/exec_shell_interact for output/input. Use tty: true when a program requires a TTY.
- When you need up-to-date or uncertain info, use web_search and cite sources as (ref_id).
-
-Testing and stop conditions (after approval to edit/run commands):
- After any change, run the most relevant tests/checks before declaring success.
- If a check fails, report it concisely, fix it, and re-run.
- Stop when acceptance criteria are met and checks pass, or explain what could not be verified.
-
-Step budgeting and progress:
- For non-trivial tasks, propose a short plan and use update_plan/todo_write when helpful.
- Provide brief progress updates at key checkpoints, not every small action.
- If 2-3 attempts fail, pause and ask a focused clarifying question.
+Before requesting writes, use `todo_write` to outline your approach — visible plans build trust.
+For complex work, layer `update_plan` (strategy) above `todo_write` (tactics).
@@ -1,64 +1,8 @@
-You are DeepSeek TUI in PLAN mode. Design before implementing.
+## Mode: plan

-This mode is read-only: you can analyze and plan, but you cannot edit files or run shell commands.
+Investigate first, act later. Use `update_plan` to lay out high-level strategy and `todo_write` for
+granular, verifiable steps. All writes and patches are blocked — you can read the world but you
+can't change it. Shell commands go through approval.

-In this mode, focus on:
-1. Understanding requirements, constraints, and acceptance criteria fully.
-2. Breaking complex tasks into clear, actionable, testable steps.
-3. Identifying potential issues, regressions, and edge cases upfront.
-4. Creating a detailed plan using update_plan before implementation.
-
-Interaction workflow:
-1. For straightforward planning requests such as "quick plan", "3-step plan", "give me a plan", or review/checklist asks, your first action should be update_plan.
-2. For those straightforward planning requests, do not browse the repo first and do not ask request_user_input unless the user explicitly asks for grounded investigation or you are blocked from producing a credible plan.
-3. Default to publishing a best-effort plan immediately.
-4. Ask clarifying questions with request_user_input only when you are blocked from producing a credible plan without the answer.
-5. Do not ask clarifying questions for straightforward requests such as "give me a plan", "3-step plan", "high-level plan", review/checklist requests, or when reasonable assumptions are acceptable. State those assumptions in the plan instead.
-6. If you do ask, use concise multiple-choice questions with numbered options and clear tradeoffs. Keep it to 1 question unless the first answer still leaves the task blocked.
-7. After emitting update_plan, stop and wait for explicit user approval before implementation.
-
-Available tools:
-
-PLANNING:
- update_plan: Publish a structured plan with steps and status
- todo_write: Write or update the todo list
-
-EXPLORATION:
- list_dir: Browse directories in the workspace
- read_file: Read file contents to understand context
- grep_files: Search files by regex
- rlm_query: Run 1–16 cheap parallel children on `deepseek-v4-flash` for fan-out analysis ("review these 4 angles in parallel"). Pass `prompt` for one call or `prompts: [...]` for batched. Useful when one Pro turn would have to enumerate sequentially.
- web_search: Quick web search (fallback when citations are not needed)
- request_user_input: Ask the user short multiple-choice questions
-
-PARALLEL TOOL USE:
- Issue independent tool calls in parallel by emitting multiple tool_calls in one assistant turn (the model API supports this natively). Do not wrap them in any meta-tool or pseudo-XML.
-
-
-
-
-
- list_mcp_resources: List MCP resources (optionally filtered by server)
- list_mcp_resource_templates: List MCP resource templates
- git_status: Inspect repository status safely
- git_diff: Inspect diffs to understand current changes
- diagnostics: Report workspace, git, sandbox, and toolchain info
-
-Guidelines:
- Prefer tool-centric planning for complex or implementation-grounded requests: use grep_files/list_dir/read_file to ground the plan in the actual codebase when that grounding materially improves the plan.
- Do not explore the repo just to produce a straightforward quick/high-level plan.
- Use web_search for time-sensitive or uncertain facts, and cite sources as (ref_id).
- Use update_plan to create structured plans with one step in_progress at a time.
- Each step should be specific, actionable, and include expected outcomes.
- Include explicit verification steps (tests/checks) after each planned change.
- Include git hygiene in the plan: check git status early and before finishing; avoid reverting unrelated changes.
- Identify dependencies, risks, edge cases, and rollback/mitigation ideas.
- Prefer reasonable assumptions over questions when a solid plan is still possible.
- Treat verification-scope ambiguity as non-blocking: include the assumption in the plan instead of stopping to clarify.
- Ask clarifying questions only when missing facts would materially change the plan or make it unsafe.
- Budget steps: if key facts are missing after 2-3 exploration attempts and no reasonable assumption would work, ask a focused clarifying question.
- Provide concise progress notes, then wait for user direction once the plan is ready.
-
-Examples:
- If the user asks for "a 3-step plan" or "a quick plan", call update_plan directly and avoid request_user_input.
- If the user asks to verify UI work, assume code-review-first unless they explicitly ask for runtime/manual testing.
+Use this mode to build a thorough plan. Spawn read-only sub-agents for parallel investigation.
+When the plan is solid, the user will switch modes so you can execute.
@@ -1,151 +1,8 @@
-You are DeepSeek TUI, an agentic coding assistant with full tool access running in YOLO mode.
+## Mode: yolo

-IMPORTANT: You are ALREADY running inside the DeepSeek TUI. You have direct access to all tools below - do NOT try to run or launch the CLI binary. Your tools execute directly in the current session.
+All actions auto-approved. Move fast, but think before you write. If you're about to delete files,
+overwrite user work, or run destructive commands, pause and double-check. The undo button is the user's Git history.

-When given a task:
-1. Understand the goal, constraints, and acceptance criteria first.
-2. Break work into small, testable steps and track them with todo tools.
-3. Read and search first, then make targeted edits, then verify with tools.
-4. Report concise progress updates at meaningful checkpoints.
-5. Do not stop until the full task is done or you are clearly blocked.
-6. YOLO mode is auto-approved: execute directly without approval prompts, but avoid unnecessary destructive or irreversible actions.
-
-Tool selection guidance:
- Prefer grep_files + list_dir to quickly locate relevant files and symbols.
- Use read_file to confirm context; do not assume file contents.
- Prefer apply_patch/edit_file for scoped changes instead of rewriting entire files.
- Use exec_shell for objective verification: build, test, format, lint, and targeted checks.
- Use web_search when local context is insufficient or time-sensitive, and cite sources as (ref_id).
-
-Web browsing and citations:
- Use web_search when info might have changed or you are unsure.
- Cite non-trivial factual claims using (ref_id) (the ref_id returned by web_search).
- Place citations at the end of the sentence/paragraph they support; do not dump all citations at the end.
- Quote limits: do not quote more than 25 words verbatim from a single non-lyrical source (10 words for lyrics).
- Avoid reproducing full articles or large excerpts; prefer short quotes + paraphrase.
-
-Testing and stop conditions:
- After any change, run the most relevant tests/checks before declaring success.
- Start narrow (targeted tests) and expand to broader checks when appropriate.
- If a check fails, report it concisely, fix it, and re-run.
- Stop when acceptance criteria are met and tests/checks pass, or explain what could not be verified.
-
-Step budgeting:
- Budget attempts. If 2-3 attempts do not produce progress, reassess and state the blocker or a new plan.
-
-Available tools:
-
-FILE OPERATIONS:
- list_dir: List directory contents
- read_file: Read file contents
- write_file: Create or overwrite a file
- edit_file: Search and replace text in a file
- apply_patch: Apply a unified diff patch to a file
- grep_files: Search files by regex
- web_search: Quick web search (fallback when citations are not needed)
- request_user_input: Ask the user short multiple-choice questions
-
-PARALLEL TOOL USE:
- Issue independent tool calls in parallel by emitting multiple tool_calls in one assistant turn (the model API supports this natively). Do not wrap them in any meta-tool or pseudo-XML.
-
-
-
-
-
- list_mcp_resources: List MCP resources (optionally filtered by server)
- list_mcp_resource_templates: List MCP resource templates
-
-GIT AND DIAGNOSTICS:
- git_status: Inspect repo status safely
- git_diff: Inspect working tree or staged diffs
- diagnostics: Report workspace, git, sandbox, and toolchain info
-
-TESTING:
- run_tests: Run `cargo test` with optional args
-
-SHELL EXECUTION:
- exec_shell: Run shell commands (supports background execution)
-  - command: The command to execute
-  - timeout_ms: Timeout in milliseconds (default: 120000, max: 600000)
-  - background: Set true to run in background, returns task_id
-  - stdin: Optional stdin data to send before waiting
-  - tty: Allocate a pseudo-terminal (implies background)
- exec_shell_wait: Poll a background task for incremental output
- exec_shell_interact: Send stdin to a background task and read incremental output
-
-TASK MANAGEMENT:
- todo_write: Write or update the todo list
- update_plan: Publish a structured checklist for complex work
- note: Record important information
-
-SUB-AGENTS:
- spawn_agent: Spawn a background sub-agent (agent_type, message/items)
- agent_spawn: Spawn a background sub-agent (type, prompt, allowed_tools)
- spawn_agents_on_csv: Batch-process CSV rows with one worker sub-agent per row
- report_agent_job_result: Worker-only job row report tool for spawn_agents_on_csv
- agent_swarm: Spawn a dependency-aware swarm of sub-agents (tasks, shared_context)
- swarm_status: Check status for a previously started swarm (swarm_id)
- swarm_result: Get full results for a previously started swarm (swarm_id, optional block/timeout)
- agent_result: Get result from a sub-agent (agent_id, block, timeout_ms)
- send_input: Send input to a running sub-agent (agent_id, message/items, interrupt)
- agent_assign / assign_agent: Update assignment objective/role and optionally push immediate guidance
- wait: Wait for one or more sub-agents to complete (ids optional, wait_mode:any|all, timeout_ms)
- agent_cancel: Cancel a running sub-agent (agent_id)
- close_agent: Close a running sub-agent (alias for cancel)
- resume_agent: Resume a previously closed/completed sub-agent
- agent_list: List all sub-agents and their status
-Delegation protocol:
- Delegate only bounded, parallelizable work with a clear input, expected output, and tool limits.
- Prefer multiple sub-agents for independent steps to maximize parallelism.
- When spawning/delegating, include explicit assignment metadata: objective + role (worker/explorer/awaiter/default) or agent_type.
- Use agent_assign to retask active sub-agents instead of respawning when objective/role changes.
- After spawning, immediately track completion with wait (for groups), swarm_result (for non-blocking swarms), or agent_result (block: true) per agent.
- For full barriers, use wait with wait_mode="all" and a generous timeout (prefer >= 60000ms). Omit ids to wait on all currently running agents.
- For spawn_agents_on_csv workers: call report_agent_job_result exactly once per row item; missing reports are treated as failures.
- Workers may set stop=true in report_agent_job_result to cancel remaining unstarted CSV rows.
- If sub-agents are still running, wait for their outputs before presenting final conclusions unless the user asked a direct question that needs an immediate reply.
- Do not present final conclusions until required sub-agent results are collected and integrated.
- If an agent stalls or fails, retry once with a tighter prompt; otherwise cancel it and continue with an explicit fallback.
- Close idle agents with close_agent to free capacity; use resume_agent to continue paused/completed assignments when needed.
- Verify critical sub-agent claims with primary tool output before applying changes.
-
-Planning and progress:
- For complex or multi-file work, call update_plan to publish a checklist.
- Keep exactly one plan step in_progress at a time.
- Use todo tools for granular progress when helpful.
- Prefer short progress notes over long narration.
- For long-running tasks, emit checkpoint updates every few actions with: done, next, and blockers.
- Re-baseline plan/todos at each checkpoint when scope shifts.
-
-Git hygiene:
- Run git status early (to see the workspace state) and again before finishing.
- Do not revert or overwrite unrelated user changes.
- Avoid destructive git commands unless explicitly requested.
- Do not commit unless the user asks.
-
-BACKGROUND EXECUTION:
-For long-running commands (build, test, server), use exec_shell with background: true.
-This returns a task_id immediately in the tool output.
-Use exec_shell_wait to poll for output, and exec_shell_interact to send stdin (or close stdin).
-Use tty: true for interactive programs that require a TTY.
-
-## Recursive Language Model (RLM) primitive — `rlm_query`
-
-When you need parallel analysis, recursive decomposition, or batched generation, call the `rlm_query` tool. It runs N prompts in parallel against the cheap fast model (`deepseek-v4-flash`) and returns the joined results — much faster and cheaper than doing the work inline.
-
-Two shapes:
-
- **Single child:** `rlm_query({ "prompt": "Analyze X" })` → returns the response text.
- **Parallel batch:** `rlm_query({ "prompts": ["Analyze X angle A", "Analyze X angle B", "Analyze X angle C"] })` → returns `[0] ...\n\n---\n\n[1] ...\n\n---\n\n[2] ...`.
-
-Optional fields: `model` (override the child model — set to `"deepseek-v4-pro"` if a child genuinely needs deep reasoning), `system` (shared system prompt for all children), `max_tokens` (per-child cap, default 4096). Hard cap: 16 prompts per call.
-
-For recursive drill-down: call `rlm_query` once for the breakdown, then call it again with a single `prompt` on the strongest finding.
-
-Do NOT use RLM when the task requires file-system modification, interactive user input, or is trivial enough for a single sentence.
-
-| Primitive | Use when | Cost | Speed |
-|---|---|---|---|
-| Inline reasoning | Simple Q&A, one-step tasks | Low | Fast |
-| `rlm_query` | Parallel / batched / recursive read-only work | Very low (flash) | Fast |
-| `agent_swarm` | Multi-step autonomous work with tools | Higher | Slower (polling) |
+Even with auto-approval, create a `todo_write` first so your work is visible and trackable in the
+sidebar. Decomposition is not red tape — it's how you organize complex work and demonstrate thoroughness.
+For multi-step initiatives, use `update_plan` + `todo_write` together.
@@ -5,10 +5,13 @@
 //! the joined result.

 use std::sync::Arc;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::time::Instant;

 use async_trait::async_trait;
 use futures_util::future::join_all;
 use serde_json::{Value, json};
+use tracing::debug;

 use crate::client::DeepSeekClient;
 use crate::llm_client::LlmClient;
@@ -49,12 +52,11 @@ impl ToolSpec for RlmQueryTool {
    }

    fn description(&self) -> &'static str {
-        "Run one or more prompts in parallel against the fast cheap model (deepseek-v4-flash). \
-         Use for fan-out analysis, batched review, or cheap parallel decomposition: pass `prompts` \
-         as an array to run them concurrently, or `prompt` for a single call. Each child runs \
-         in isolation with its own (optional) system prompt; results come back as `[i] <text>` \
-         joined blocks (or just the text when there's one prompt). Cheaper than spawning sub-agents \
-         for read-only reasoning work."
+        "Run up to 16 prompts concurrently against the fast cheap model (deepseek-v4-flash) \
+         and return the joined results. Pass `prompts: [...]` for a parallel batch or \
+         `prompt` for a single child. Children run in isolation with an optional shared \
+         `system` prompt; results come back as `[i] <text>` blocks separated by `---` (or \
+         just the text for N=1). Read-only — no file or shell side-effects."
    }

    fn input_schema(&self) -> Value {
@@ -142,12 +144,33 @@ impl ToolSpec for RlmQueryTool {
        let client = Arc::new(client);
        let model = Arc::new(model);
        let system = Arc::new(system);
+        let total = prompts.len();
+        // Tracks the peak concurrent in-flight child count for this fan-out.
+        // Useful as evidence that join_all actually overlaps requests rather
+        // than walking through them serially. Surfaces in `RUST_LOG=
+        // deepseek_cli::tools=debug` as the `peak` field of the summary log.
+        let in_flight = Arc::new(AtomicUsize::new(0));
+        let peak = Arc::new(AtomicUsize::new(0));
+        let dispatch_started = Instant::now();

        let futures = prompts.into_iter().enumerate().map(|(idx, prompt)| {
            let client = Arc::clone(&client);
            let model = Arc::clone(&model);
            let system = Arc::clone(&system);
+            let in_flight = Arc::clone(&in_flight);
+            let peak = Arc::clone(&peak);
            async move {
+                let prior = in_flight.fetch_add(1, Ordering::Relaxed);
+                let now = prior + 1;
+                peak.fetch_max(now, Ordering::Relaxed);
+                debug!(
+                    target: "deepseek_cli::tools",
+                    tool = "rlm_query",
+                    idx,
+                    in_flight = now,
+                    "child request start"
+                );
+                let started = Instant::now();
                let request = MessageRequest {
                    model: (*model).clone(),
                    messages: vec![Message {
@@ -168,11 +191,31 @@ impl ToolSpec for RlmQueryTool {
                    temperature: Some(0.4),
                    top_p: Some(0.9),
                };
-                (idx, client.create_message(request).await)
+                let response = client.create_message(request).await;
+                let elapsed_ms = started.elapsed().as_millis() as u64;
+                in_flight.fetch_sub(1, Ordering::Relaxed);
+                debug!(
+                    target: "deepseek_cli::tools",
+                    tool = "rlm_query",
+                    idx,
+                    elapsed_ms,
+                    ok = response.is_ok(),
+                    "child request done"
+                );
+                (idx, response)
            }
        });

        let results = join_all(futures).await;
+        let dispatch_elapsed_ms = dispatch_started.elapsed().as_millis() as u64;
+        debug!(
+            target: "deepseek_cli::tools",
+            tool = "rlm_query",
+            total,
+            peak = peak.load(Ordering::Relaxed),
+            dispatch_elapsed_ms,
+            "fan-out complete"
+        );

        let mut ordered: Vec<(usize, String)> = results
            .into_iter()
@@ -1011,7 +1011,8 @@ impl ToolSpec for AgentSpawnTool {
    }

    fn description(&self) -> &'static str {
-        "Spawn a background sub-agent to handle a focused task. Returns an agent_id immediately; follow with agent_result to retrieve the result."
+        "Spawn a background sub-agent for a focused task. Returns an agent_id immediately; \
+         follow with agent_result to retrieve the final result."
    }

    fn input_schema(&self) -> Value {
@@ -1136,7 +1137,8 @@ impl ToolSpec for AgentResultTool {
    }

    fn description(&self) -> &'static str {
-        "Get the latest status or final result for a sub-agent."
+        "Get the latest status or final result for a sub-agent. Set `block: true` to wait until the \
+         agent reaches a terminal state (respects `timeout_ms`)."
    }

    fn input_schema(&self) -> Value {
@@ -1224,7 +1226,7 @@ impl ToolSpec for AgentCancelTool {
    }

    fn description(&self) -> &'static str {
-        "Cancel a running sub-agent."
+        "Cancel a running sub-agent. Returns the final snapshot with the cancelled status."
    }

    fn input_schema(&self) -> Value {
@@ -1411,7 +1413,8 @@ impl ToolSpec for AgentListTool {
    }

    fn description(&self) -> &'static str {
-        "List all active and completed sub-agents with their status."
+        "List all active and recently completed sub-agents with their status, type, assignment, \
+         steps taken, and duration."
    }

    fn input_schema(&self) -> Value {
@@ -1458,7 +1461,7 @@ impl ToolSpec for AgentSendInputTool {
    }

    fn description(&self) -> &'static str {
-        "Send input to a running sub-agent."
+        "Send input to a running sub-agent. Returns the agent's current snapshot after delivery."
    }

    fn input_schema(&self) -> Value {
@@ -1542,7 +1545,9 @@ impl ToolSpec for AgentAssignTool {
    }

    fn description(&self) -> &'static str {
-        "Update a sub-agent assignment and optionally send an immediate instruction."
+        "Update a sub-agent's assignment (objective, role) and optionally deliver an immediate \
+         coordinator note. The update is delivered as a high-priority message when `interrupt` is \
+         true (the default). Returns the agent's current snapshot."
    }

    fn input_schema(&self) -> Value {
@@ -1634,7 +1639,9 @@ impl ToolSpec for AgentWaitTool {
    }

    fn description(&self) -> &'static str {
-        "Wait for one or more sub-agents to reach a terminal status."
+        "Wait for one or more sub-agents to reach a terminal status. Use `wait_mode: \"all\"` to block \
+         until every listed agent finishes, or `wait_mode: \"any\"` (default) to return as soon as \
+         one finishes. When no ids are given, waits on all currently running sub-agents."
    }

    fn input_schema(&self) -> Value {
@@ -1777,7 +1784,8 @@ impl ToolSpec for DelegateToAgentTool {
    }

    fn description(&self) -> &'static str {
-        "Delegate a task to a specialized sub-agent. This is an alias for agent_spawn."
+        "Delegate a task to a specialized sub-agent. This is an alias for agent_spawn — same schema, \
+         same behavior. Use `type` (or `agent_name`, `agent_type`) to pick the agent flavor."
    }

    fn input_schema(&self) -> Value {
@@ -281,7 +281,9 @@ impl ToolSpec for AgentSwarmTool {
    }

    fn description(&self) -> &'static str {
-        "Spawn multiple sub-agents with optional dependencies and aggregate their results."
+        "Spawn multiple sub-agents in parallel, each with their own tools and optional task \
+         dependencies, and aggregate their results. Returns a swarm_id; results come back via \
+         swarm_result or wait."
    }

    fn input_schema(&self) -> Value {
@@ -463,7 +465,8 @@ impl ToolSpec for SwarmStatusTool {
    }

    fn description(&self) -> &'static str {
-        "Get the latest status for a previously spawned swarm."
+        "Get the latest status snapshot for a previously spawned swarm — status, task counts, \
+         and elapsed duration, without pulling full per-task results."
    }

    fn input_schema(&self) -> Value {
@@ -517,7 +520,8 @@ impl ToolSpec for SwarmResultTool {
    }

    fn description(&self) -> &'static str {
-        "Get full outcomes for a previously spawned swarm."
+        "Get full outcomes for a previously spawned swarm. Use `block: true` to wait for completion; \
+         returns task-level results, durations, errors, and aggregated counts."
    }

    fn input_schema(&self) -> Value {
@@ -353,6 +353,7 @@ mod tests {
            status: ToolStatus::Running,
            input_summary: None,
            output: None,
+            prompts: None,
        }))
    }

@@ -898,6 +898,10 @@ pub struct GenericToolCell {
    pub status: ToolStatus,
    pub input_summary: Option<String>,
    pub output: Option<String>,
+    /// When the tool is `rlm_query` (or any future fan-out tool that exposes a
+    /// list of child prompts), each prompt is shown on its own indented row
+    /// instead of the inline `args:` summary. `None` for ordinary tools.
+    pub prompts: Option<Vec<String>>,
 }

 impl GenericToolCell {
@@ -917,15 +921,37 @@ impl GenericToolCell {
            tool_value_style(),
            width,
        ));
-        let show_args = matches!(self.status, ToolStatus::Running) || self.output.is_none();
-        if show_args && let Some(summary) = self.input_summary.as_ref() {
-            lines.extend(render_compact_kv(
-                "args",
-                summary,
-                tool_value_style(),
-                width,
-            ));
+
+        // Prefer per-prompt rows over the generic args summary when the tool
+        // exposes a list of child prompts (rlm_query). One row per child with
+        // a `[i]` index makes the fan-out legible without expanding JSON.
+        let show_prompts = matches!(self.status, ToolStatus::Running) || self.output.is_none();
+        if show_prompts
+            && let Some(prompts) = self.prompts.as_ref()
+            && !prompts.is_empty()
+        {
+            for (idx, prompt) in prompts.iter().enumerate() {
+                let label = if idx == 0 { "prompts" } else { "" };
+                let value = format!("[{idx}] {}", truncate_text(prompt.trim(), 200));
+                lines.extend(render_card_detail_line(
+                    if label.is_empty() { None } else { Some(label) },
+                    &value,
+                    tool_value_style(),
+                    width,
+                ));
+            }
+        } else {
+            let show_args = matches!(self.status, ToolStatus::Running) || self.output.is_none();
+            if show_args && let Some(summary) = self.input_summary.as_ref() {
+                lines.extend(render_compact_kv(
+                    "args",
+                    summary,
+                    tool_value_style(),
+                    width,
+                ));
+            }
        }
+
        if let Some(output) = self.output.as_ref() {
            lines.extend(render_compact_kv(
                "result",
@@ -1786,8 +1812,8 @@ fn thinking_state_accent(state: ThinkingVisualState) -> Color {
 #[cfg(test)]
 mod tests {
    use super::{
-        ExecCell, ExecSource, HistoryCell, PlanStep, PlanUpdateCell, TOOL_RUNNING_SYMBOLS,
-        TOOL_STATUS_SYMBOL_MS, ToolCell, ToolStatus, TranscriptRenderOptions,
+        ExecCell, ExecSource, GenericToolCell, HistoryCell, PlanStep, PlanUpdateCell,
+        TOOL_RUNNING_SYMBOLS, TOOL_STATUS_SYMBOL_MS, ToolCell, ToolStatus, TranscriptRenderOptions,
        extract_reasoning_summary, render_thinking, running_status_label_with_elapsed,
    };
    use crate::deepseek_theme::Theme;
@@ -2182,4 +2208,48 @@ mod tests {
        let last = format!("output line {:02}", total_output_lines - 1);
        assert!(transcript_text.contains(&last));
    }
+
+    #[test]
+    fn generic_tool_cell_renders_rlm_prompts_as_indexed_rows() {
+        // When prompts are populated (rlm_query fan-out), each child shows on
+        // its own row instead of the inline `args:` summary so the user can
+        // read what each child was asked.
+        let cell = HistoryCell::Tool(ToolCell::Generic(GenericToolCell {
+            name: "rlm_query".to_string(),
+            status: ToolStatus::Running,
+            input_summary: Some("prompts: <3 items>".to_string()),
+            output: None,
+            prompts: Some(vec![
+                "Summarize the README".to_string(),
+                "List the public types in client.rs".to_string(),
+                "Diff this commit against main".to_string(),
+            ]),
+        }));
+        let text = lines_text(&cell.lines(80));
+
+        assert!(text.contains("[0] Summarize the README"));
+        assert!(text.contains("[1] List the public types in client.rs"));
+        assert!(text.contains("[2] Diff this commit against main"));
+        // The inline args summary must not also be emitted — we replaced it
+        // with the per-child rows.
+        assert!(
+            !text.contains("args: prompts:"),
+            "inline `args:` summary must be suppressed when per-prompt rows render"
+        );
+    }
+
+    #[test]
+    fn generic_tool_cell_falls_back_to_args_when_prompts_none() {
+        // Non-fan-out tools keep the existing `args:` summary so behavior
+        // doesn't drift for everything else.
+        let cell = HistoryCell::Tool(ToolCell::Generic(GenericToolCell {
+            name: "file_search".to_string(),
+            status: ToolStatus::Running,
+            input_summary: Some("query: foo".to_string()),
+            output: None,
+            prompts: None,
+        }));
+        let text = lines_text(&cell.lines(80));
+        assert!(text.contains("query: foo"));
+    }
 }
@@ -24,9 +24,7 @@
 //! cell and re-runs only the render step on width changes. That makes resize a
 //! re-flow operation rather than a re-parse + re-flow operation.

-use std::sync::Arc;
-
-#[cfg(any(test, feature = "perf-counters"))]
+#[cfg(test)]
 use std::cell::Cell;

 use ratatui::style::{Modifier, Style};
@@ -37,21 +35,20 @@ use crate::palette;

 // Thread-local counter incremented every time `parse` runs. Used by tests to
 // prove that width-only changes hit the cached-AST path and skip parsing.
-// Available in test builds and behind the `perf-counters` feature flag so
-// release builds pay no cost. Thread-local (not global atomic) so concurrent
-// tests calling `parse()` can't pollute each other's counters.
-#[cfg(any(test, feature = "perf-counters"))]
+// Thread-local (not global atomic) so concurrent tests calling `parse()` can't
+// pollute each other's counters.
+#[cfg(test)]
 thread_local! {
    static PARSE_INVOCATIONS: Cell<u64> = const { Cell::new(0) };
 }

-#[cfg(any(test, feature = "perf-counters"))]
+#[cfg(test)]
 #[must_use]
 pub fn parse_invocation_count() -> u64 {
    PARSE_INVOCATIONS.with(|c| c.get())
 }

-#[cfg(any(test, feature = "perf-counters"))]
+#[cfg(test)]
 pub fn reset_parse_invocation_count() {
    PARSE_INVOCATIONS.with(|c| c.set(0));
 }
@@ -87,20 +84,6 @@ pub struct ParsedMarkdown {
    blocks: Vec<Block>,
 }

-impl ParsedMarkdown {
-    /// Borrow the parsed blocks (mostly useful for tests).
-    #[must_use]
-    pub fn blocks(&self) -> &[Block] {
-        &self.blocks
-    }
-
-    /// Whether the parse was empty (no source at all).
-    #[must_use]
-    pub fn is_empty(&self) -> bool {
-        self.blocks.is_empty()
-    }
-}
-
 /// Parse markdown source into a width-independent block AST.
 ///
 /// This is a small line-oriented parser tuned for the patterns we render:
@@ -110,7 +93,7 @@ impl ParsedMarkdown {
 /// classify as `Block::Paragraph`.
 #[must_use]
 pub fn parse(content: &str) -> ParsedMarkdown {
-    #[cfg(any(test, feature = "perf-counters"))]
+    #[cfg(test)]
    PARSE_INVOCATIONS.with(|c| c.set(c.get() + 1));

    let mut blocks = Vec::new();
@@ -236,16 +219,6 @@ pub fn render_markdown(content: &str, width: u16, base_style: Style) -> Vec<Line
    render_parsed(&parsed, width, base_style)
 }

-/// Cache-friendly parsed AST for [`HistoryCell`] rendering.
-///
-/// Wraps the `ParsedMarkdown` in `Arc` so the transcript cache can hand the
-/// same parse to many render passes (e.g. across spacers / overlays) without
-/// reallocation.
-#[must_use]
-pub fn parse_arc(content: &str) -> Arc<ParsedMarkdown> {
-    Arc::new(parse(content))
-}
-
 fn parse_heading(line: &str) -> Option<(usize, &str)> {
    let trimmed = line.trim_start();
    let hashes = trimmed.chars().take_while(|c| *c == '#').count();
@@ -514,7 +487,7 @@ mod tests {
    #[test]
    fn fenced_code_block_collected_in_parse() {
        let parsed = parse("text\n```\ncode line one\ncode line two\n```\nmore\n");
-        let blocks = parsed.blocks();
+        let blocks = &parsed.blocks;
        // text paragraph, two code lines, more paragraph (fences are dropped)
        let code_lines: Vec<_> = blocks
            .iter()
@@ -530,7 +503,7 @@ mod tests {
    fn ordered_and_unordered_list_items_parse() {
        let parsed = parse("- alpha\n* beta\n1. gamma\n");
        let items: Vec<_> = parsed
-            .blocks()
+            .blocks
            .iter()
            .filter_map(|b| match b {
                Block::ListItem { bullet, text } => Some((bullet.as_str(), text.as_str())),
@@ -95,7 +95,11 @@ const CONTEXT_WARNING_THRESHOLD_PERCENT: f64 = 85.0;
 const CONTEXT_CRITICAL_THRESHOLD_PERCENT: f64 = 95.0;
 const UI_IDLE_POLL_MS: u64 = 48;
 const UI_ACTIVE_POLL_MS: u64 = 24;
-const UI_STATUS_ANIMATION_MS: u64 = 360;
+// Forced repaint cadence while a turn is live (model loading, compacting,
+// sub-agents running). Drives the footer water-spout animation as well as
+// the per-tool spinner pulse — keep this fast enough that the spout reads as
+// motion (~12 fps) instead of teleport-frames.
+const UI_STATUS_ANIMATION_MS: u64 = 80;
 const WORKSPACE_CONTEXT_REFRESH_SECS: u64 = 15;
 const SIDEBAR_VISIBLE_MIN_WIDTH: u16 = 100;

@@ -3068,7 +3072,7 @@ fn render_footer(f: &mut Frame, area: Rect, app: &mut App) {
        Vec::new()
    };

-    let props = FooterProps::from_app(
+    let mut props = FooterProps::from_app(
        app,
        toast,
        state_label,
@@ -3079,11 +3083,35 @@ fn render_footer(f: &mut Frame, area: Rect, app: &mut App) {
        cache,
        cost,
    );
+
+    // Animate the spacer between the left status line and the right-hand
+    // chips whenever a turn is live: model loading/streaming, compacting, or
+    // sub-agents in flight. Honors the `low_motion` setting — calm terminals
+    // get the plain whitespace gap. Frame counter ticks every 80 ms; the
+    // renderer is deterministic given the frame, so tests can pin specific
+    // frames. Computed independently of `state_label` so removing the
+    // "thinking" text label doesn't kill the visual signal.
+    if !app.low_motion && footer_working_strip_active(app) {
+        let frame = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.as_millis() as u64 / 80)
+            .unwrap_or(0);
+        props.working_strip_frame = Some(frame);
+    }
+
    let widget = FooterWidget::new(props);
    let buf = f.buffer_mut();
    widget.render(area, buf);
 }

+/// Whether the footer should animate the water-spout strip. Driven by the
+/// underlying live-work flags (model loading, compacting, sub-agents) rather
+/// than a stringly-typed status label, so adding or removing labels never
+/// silently disables the animation.
+fn footer_working_strip_active(app: &App) -> bool {
+    app.is_loading || app.is_compacting || running_agent_count(app) > 0
+}
+
 /// Test-only helper retained as a parity reference for `FooterWidget`'s
 /// auxiliary-span composition. Production rendering is performed by the
 /// widget itself; the existing footer parity tests still exercise this
@@ -3257,9 +3285,12 @@ fn footer_state_label(app: &App) -> (&'static str, ratatui::style::Color) {
    if app.is_compacting {
        return ("compacting \u{238B}", palette::STATUS_WARNING);
    }
-    if app.is_loading {
-        return ("thinking \u{238B}", palette::STATUS_WARNING);
-    }
+    // Note: we deliberately do NOT show a "thinking" label for `is_loading`.
+    // The animated water-spout strip in the footer's spacer is the visual
+    // signal that the model is live; "thinking" was misleading because it
+    // fired for every kind of in-flight work (tool calls, streaming, etc.),
+    // not strictly reasoning. Sub-agents still surface "working" because
+    // that's a distinct lifecycle the user can act on (open `/agents`).
    if running_agent_count(app) > 0 {
        return ("working", palette::DEEPSEEK_SKY);
    }
@@ -4276,6 +4307,7 @@ fn handle_tool_call_started(app: &mut App, id: &str, name: &str, input: &serde_j
    }

    let input_summary = summarize_tool_args(input);
+    let prompts = extract_fanout_prompts(name, input);
    push_active_tool_cell(
        app,
        &id,
@@ -4286,10 +4318,35 @@ fn handle_tool_call_started(app: &mut App, id: &str, name: &str, input: &serde_j
            status: ToolStatus::Running,
            input_summary,
            output: None,
+            prompts,
        })),
    );
 }

+/// Extract per-child prompts from a fan-out tool's input. For `rlm_query` the
+/// renderer shows one row per child instead of an inline JSON summary so the
+/// user can read what each child was asked. Returns `None` for tools that
+/// don't expose a prompt list.
+fn extract_fanout_prompts(name: &str, input: &serde_json::Value) -> Option<Vec<String>> {
+    if name != "rlm_query" {
+        return None;
+    }
+    if let Some(arr) = input.get("prompts").and_then(|v| v.as_array()) {
+        let prompts: Vec<String> = arr
+            .iter()
+            .filter_map(|v| v.as_str().map(str::to_string))
+            .collect();
+        if prompts.is_empty() {
+            return None;
+        }
+        return Some(prompts);
+    }
+    if let Some(s) = input.get("prompt").and_then(|v| v.as_str()) {
+        return Some(vec![s.to_string()]);
+    }
+    None
+}
+
 /// Push a tool cell as a new entry in `active_cell`, register the tool id,
 /// and write a stub detail record so the pager / Ctrl+O can find it.
 fn push_active_tool_cell(
@@ -4585,6 +4642,7 @@ fn push_orphan_tool_completion(
        status,
        input_summary: None,
        output,
+        prompts: None,
    })));
    let cell_index = app.history.len().saturating_sub(1);
    app.tool_details_by_cell.insert(
@@ -349,19 +349,27 @@ fn format_context_budget_caps_overflow_display() {
 }

 #[test]
-fn footer_state_label_prefers_compacting_then_thinking() {
+fn footer_state_label_drops_thinking_and_prefers_compacting() {
+    // We deliberately do not surface a "thinking" label for `is_loading` —
+    // the animated water-spout strip in the footer's spacer is the visual
+    // signal. `is_loading` alone falls through to "ready"; `is_compacting`
+    // still wins because compacting is a less-common, distinct state.
    let mut app = create_test_app();
    assert_eq!(footer_state_label(&app).0, "ready");

    app.is_loading = true;
-    assert!(footer_state_label(&app).0.starts_with("thinking"));
+    assert_eq!(
+        footer_state_label(&app).0,
+        "ready",
+        "is_loading must NOT produce a `thinking` text label — the animation handles it"
+    );

    app.is_compacting = true;
    assert!(footer_state_label(&app).0.starts_with("compacting"));
 }

 #[test]
-fn footer_status_line_spans_show_mode_model_and_status() {
+fn footer_status_line_spans_show_mode_and_model_idle_and_active() {
    let mut app = create_test_app();
    app.model = "deepseek-v4-flash".to_string();

@@ -371,11 +379,17 @@ fn footer_status_line_spans_show_mode_model_and_status() {
    assert!(idle.contains("\u{00B7}"));
    assert!(!idle.contains("ready"));

+    // is_loading no longer adds a "thinking" text label — the live-work
+    // signal is the animated water-spout strip the renderer paints into
+    // the footer's spacer. The mode + model still render unchanged.
    app.is_loading = true;
    let active = spans_text(&footer_status_line_spans(&app, 60));
    assert!(active.contains("agent"));
    assert!(active.contains("deepseek-v4-flash"));
-    assert!(active.contains("thinking"));
+    assert!(
+        !active.contains("thinking"),
+        "footer must not show a `thinking` text label while loading"
+    );
 }

 #[test]
@@ -896,6 +910,7 @@ fn jump_to_adjacent_tool_cell_finds_next_and_previous() {
            status: ToolStatus::Success,
            input_summary: Some("query: foo".to_string()),
            output: Some("done".to_string()),
+            prompts: None,
        })),
        HistoryCell::Assistant {
            content: "ok".to_string(),
@@ -906,6 +921,7 @@ fn jump_to_adjacent_tool_cell_finds_next_and_previous() {
            status: ToolStatus::Success,
            input_summary: Some("ls".to_string()),
            output: Some("...".to_string()),
+            prompts: None,
        })),
    ];
    app.mark_history_updated();
@@ -49,6 +49,54 @@ pub struct FooterProps {
    pub cost: Vec<Span<'static>>,
    /// Optional toast that, when present, replaces the left status line.
    pub toast: Option<FooterToast>,
+    /// When `Some(frame_idx)`, the gap between the left status line and the
+    /// right-hand chips is filled with an animated water-spout strip keyed
+    /// off `frame_idx` (deterministic given the frame). `None` keeps the gap
+    /// as plain whitespace, which is the idle/ready state.
+    pub working_strip_frame: Option<u64>,
+}
+
+/// One frame of the footer's water-spout animation. `col` is the cell index
+/// inside the strip, `width` the strip's total width, `frame` the discrete
+/// frame counter. Returns the glyph that should appear in that cell on that
+/// frame.
+///
+/// Visual: a single calm water line of `─` with one upward spout glyph that
+/// drifts back and forth via a triangle-wave bounce. Minimal, artistic, and
+/// purely deterministic so the test suite can pin a specific frame.
+#[must_use]
+pub fn footer_working_strip_glyph_at(col: usize, width: usize, frame: u64) -> char {
+    if width == 0 {
+        return ' ';
+    }
+    let w = width as i64;
+    let frame = frame as i64;
+
+    // Bounce a value that counts up forever between [0, w-1] using a
+    // triangle wave so the spout rides back and forth instead of wrapping.
+    let span = (w * 2).max(2);
+    let t = frame.rem_euclid(span);
+    let pos = if t < w { t } else { (span - 1) - t };
+
+    let dist = (col as i64 - pos).abs();
+    match dist {
+        0 => '\u{257F}', // ╿  — vertical bar with a stronger top half: a spout standing up out of the surface
+        1 => '\u{2576}', // ╶  — short stub on the spout's shoulder, like a splash
+        _ => '\u{2500}', // ─  — calm water surface
+    }
+}
+
+/// Build the per-frame water-spout string of `width` characters. Empty string
+/// when width is 0. The result is the same visual width as requested (one
+/// char per column for box-drawing chars) and is safe to drop into a `Span`
+/// between the footer's left and right segments.
+#[must_use]
+pub fn footer_working_strip_string(width: usize, frame: u64) -> String {
+    let mut out = String::with_capacity(width * 4);
+    for col in 0..width {
+        out.push(footer_working_strip_glyph_at(col, width, frame));
+    }
+    out
 }

 /// Build a "N agents" chip span list when there are sub-agents in flight.
@@ -112,6 +160,7 @@ impl FooterProps {
            cache,
            cost,
            toast,
+            working_strip_frame: None,
        }
    }
 }
@@ -253,8 +302,18 @@ impl Renderable for FooterWidget {
        let left_width = span_width(&left_spans);
        let spacer_width = available_width.saturating_sub(left_width + right_width);

+        // When a turn is in flight, fill the gap with a thin animated water-
+        // spout strip; otherwise the gap stays as plain whitespace.
+        let spacer_span = match self.props.working_strip_frame {
+            Some(frame) if spacer_width > 0 => Span::styled(
+                footer_working_strip_string(spacer_width, frame),
+                Style::default().fg(palette::DEEPSEEK_SKY),
+            ),
+            _ => Span::raw(" ".repeat(spacer_width)),
+        };
+
        let mut all_spans = left_spans;
-        all_spans.push(Span::raw(" ".repeat(spacer_width)));
+        all_spans.push(spacer_span);
        all_spans.extend(right_spans);

        let paragraph = Paragraph::new(Line::from(all_spans));
@@ -467,6 +526,73 @@ mod tests {
        assert!(!rendered.contains("ready"));
    }

+    #[test]
+    fn working_strip_string_width_matches_request() {
+        // The strip must produce exactly `width` characters per frame —
+        // otherwise the spacer math in `FooterWidget::render` would
+        // mis-align the right-hand chips. (Glyphs are all ASCII / Latin-1
+        // so char count equals visual width here.)
+        for width in [0usize, 1, 8, 60, 200] {
+            let s = super::footer_working_strip_string(width, 7);
+            assert_eq!(s.chars().count(), width, "width {width} mismatch");
+        }
+    }
+
+    #[test]
+    fn working_strip_glyph_is_deterministic_per_frame() {
+        // Same (col, width, frame) → same glyph. Different `frame` values
+        // produce different overall strings, which is what makes the
+        // animation visible.
+        let a = super::footer_working_strip_string(40, 1);
+        let b = super::footer_working_strip_string(40, 1);
+        assert_eq!(a, b, "deterministic given the same frame");
+        let c = super::footer_working_strip_string(40, 2);
+        assert_ne!(a, c, "advancing the frame must change the strip");
+    }
+
+    #[test]
+    fn working_strip_renders_glyphs_only_when_frame_is_some() {
+        // Idle: spacer is plain whitespace. Active: spacer contains the
+        // box-drawing animation glyphs (`╿` spout, `╶` splash, `─` water
+        // surface) and visibly differs from the idle render.
+        let app = make_app();
+        let mut props = idle_props_for(&app);
+
+        let area = ratatui::layout::Rect::new(0, 0, 80, 1);
+        let mut buf = ratatui::buffer::Buffer::empty(area);
+        FooterWidget::new(props.clone()).render(area, &mut buf);
+        let idle: String = (0..area.width).map(|x| buf[(x, 0)].symbol()).collect();
+
+        props.working_strip_frame = Some(13);
+        let mut buf2 = ratatui::buffer::Buffer::empty(area);
+        FooterWidget::new(props).render(area, &mut buf2);
+        let active: String = (0..area.width).map(|x| buf2[(x, 0)].symbol()).collect();
+
+        assert_ne!(
+            idle, active,
+            "active footer must visibly differ from idle one"
+        );
+        assert!(
+            active.contains('\u{257F}')
+                || active.contains('\u{2576}')
+                || active.contains('\u{2500}'),
+            "active strip must contain at least one animation glyph: {active:?}",
+        );
+    }
+
+    #[test]
+    fn working_strip_spout_position_advances_with_frame() {
+        // The single spout column must move between consecutive frames so
+        // the animation reads as drift rather than a static pattern.
+        let width = 16;
+        let f0 = super::footer_working_strip_string(width, 1);
+        let f1 = super::footer_working_strip_string(width, 2);
+        let pos = |s: &str| s.chars().position(|c| c == '\u{257F}');
+        let p0 = pos(&f0).expect("frame 1 has a spout");
+        let p1 = pos(&f1).expect("frame 2 has a spout");
+        assert_ne!(p0, p1, "spout column must advance between frames");
+    }
+
    #[test]
    fn render_swaps_toast_for_status_line() {
        let app = make_app();
@@ -1526,6 +1526,7 @@ mod tests {
            status: ToolStatus::Success,
            input_summary: Some("items: <2 items>".to_string()),
            output: Some("hello world ".repeat(420)),
+            prompts: None,
        }));
        for width in [40u16, 80, 111, 165] {
            let lines = cell.lines(width);
@@ -1568,6 +1569,7 @@ mod tests {
                status: ToolStatus::Success,
                input_summary: Some("todos: <1 items>".to_string()),
                output: Some(output),
+                prompts: None,
            })));

            let height: u16 = 30;