diff --git a/CHANGELOG.md b/CHANGELOG.md
index c414c513..c52f9e4b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   oriented.** README, rebrand notes, crate metadata, and npm package text now
   describe CodeWhale as an agentic terminal for open source and open-weight
   coding models while preserving the official DeepSeek provider as first-class.
+- **Model auto-routing is documented separately from TUI modes.** README and
+  modes docs now reserve "mode" for Plan / Agent / YOLO, describe
+  `--model auto` as model/thinking routing, and name the fast
+  `deepseek-v4-flash` thinking-off seam as Fin.
+- **Rebrand shim docs now match the v0.8.x transition window.** The npm and
+  migration notes no longer imply the legacy `deepseek-tui` package/shims
+  expired immediately after v0.8.41.
 
 ### Fixed
 
diff --git a/README.ja-JP.md b/README.ja-JP.md
index 7924b75c..9af0d14f 100644
--- a/README.ja-JP.md
+++ b/README.ja-JP.md
@@ -68,8 +68,9 @@ codewhale は、ターミナル内で完結するコーディングエージェ
 
 ### 主な機能
 
-- **Auto モード** — `--model auto` / `/model auto` がターンごとにモデルと推論強度を選択
-- **ネイティブ RLM** (`rlm_open`/`rlm_eval`) — 永続 REPL セッションでバッチ解析を行い、`peek`、`search`、`chunk`、`sub_query_batch` などの補助関数で低コストな `deepseek-v4-flash` 子タスクを実行
+- **モデル自動ルーティング** — `--model auto` / `/model auto` がターンごとにモデルと推論強度を選択
+- **Fin の高速経路** — thinking off の低コストな `deepseek-v4-flash` がルーティング、RLM 子呼び出し、要約、調整作業を担当
+- **ネイティブ RLM** (`rlm_open`/`rlm_eval`) — 永続 REPL セッションでバッチ解析を行い、`peek`、`search`、`chunk`、`sub_query_batch` などの補助関数を利用
 - **Thinking-mode ストリーミング** — モデルがタスクに取り組む様子をリアルタイムで観察し、思考連鎖の展開を追える
 - **完全なツールスイート** — ファイル操作、シェル実行、Git、Web 検索/ブラウズ、apply-patch、サブエージェント、MCP サーバー
 - **100 万トークンコンテキスト** — コンテキスト追跡、手動または設定ベースのコンパクション、プレフィックスキャッシュのテレメトリ
@@ -235,10 +236,10 @@ TUI 内では `/provider` でプロバイダーピッカー、`/model` でロー
 ```bash
 codewhale # インタラクティブ TUI
 codewhale "explain this function" # ワンショットプロンプト
-codewhale exec --auto --output-format stream-json "fix this bug" # NDJSON バックエンドストリーム
+codewhale exec --auto --output-format stream-json "fix this bug" # ツール自動承認付きの agentic exec
 codewhale exec --resume "follow up" # 非対話セッションを継続
 codewhale --model deepseek-v4-flash "summarize" # モデルの上書き
-codewhale --model auto "fix this bug" # モデルと推論強度を自動選択
+codewhale --model auto "fix this bug" # モデルと推論強度を自動ルーティング
 codewhale --yolo # ツールを自動承認
 codewhale auth set --provider deepseek # API キーの保存
 codewhale doctor # セットアップと接続性のチェック
@@ -287,6 +288,11 @@ codewhale update # バイナリ更新の確認
 | **Agent** 🤖 | デフォルトのインタラクティブモード — 承認ゲート付きのマルチステップなツール利用。モデルは `checklist_write` で作業を概説 |
 | **YOLO** ⚡ | 信頼できるワークスペースですべてのツールを自動承認。可視性のための計画とチェックリストは引き続き維持 |
 
+モードとモデル自動ルーティングは別物です。`Tab` は Plan / Agent / YOLO
+を切り替え、`/model auto` はモデルと thinking レベルを選びます。`/goal`
+は現時点ではセッション目標と token 予算の追跡であり、将来の Goal
+ワークサーフェスは `--model auto` とは別に扱います。
+
 ---
 
 ## 設定
diff --git a/README.md b/README.md
index 7426271b..0faff8e9 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # CodeWhale
 
-> DeepSeek-first agentic terminal for open source and open-weight coding models. It runs from the `codewhale` command, streams reasoning blocks, edits local workspaces with approval gates, and includes an auto mode that chooses both model and thinking level per turn.
+> DeepSeek-first agentic terminal for open source and open-weight coding models. It runs from the `codewhale` command, streams reasoning blocks, edits local workspaces with approval gates, and can auto-route each turn to the right DeepSeek model and thinking level.
 
 [简体中文 README](README.zh-CN.md)
 [日本語 README](README.ja-JP.md)
@@ -78,7 +78,7 @@ It is built around DeepSeek V4 (`deepseek-v4-pro` / `deepseek-v4-flash`), includ
 
 ### Key Features
 
-- **Auto mode** — `--model auto` / `/model auto` chooses both the model and thinking level for each turn
+- **Model auto-routing** — `--model auto` / `/model auto` chooses both the model and thinking level for each turn
 - **Thinking-mode streaming** — see DeepSeek reasoning blocks as the model works
 - **Full tool suite** — file ops, shell execution, git, web search/browse, apply-patch, sub-agents, MCP servers
 - **1M-token context** — context tracking, manual or configured compaction, and prefix-cache telemetry
@@ -91,7 +91,8 @@ It is built around DeepSeek V4 (`deepseek-v4-pro` / `deepseek-v4-flash`), includ
 - **Durable task queue** — background tasks can survive restarts
 - **HTTP/SSE runtime API** — `codewhale serve --http` for headless agent workflows
 - **MCP protocol** — connect to Model Context Protocol servers for extended tooling; please see [docs/MCP.md](docs/MCP.md)
-- **Native RLM** (`rlm_open`/`rlm_eval`) — persistent REPL sessions for batched analysis; run cheap `deepseek-v4-flash` children with bounded helpers like `peek`, `search`, `chunk`, and `sub_query_batch`
+- **Fin-powered seams** — cheap `deepseek-v4-flash` with thinking off handles routing, RLM child calls, summaries, and other fast coordination work
+- **Native RLM** (`rlm_open`/`rlm_eval`) — persistent REPL sessions for batched analysis with bounded helpers like `peek`, `search`, `chunk`, and `sub_query_batch`
 - **LSP diagnostics** — inline error/warning surfacing after every edit via rust-analyzer, pyright, typescript-language-server, gopls, clangd
 - **User memory** — optional persistent note file injected into the system prompt for cross-session preferences
 - **Localized UI** — `en`, `ja`, `zh-Hans`, `pt-BR` with auto-detection
@@ -164,18 +165,18 @@ Start with [docs/TENCENT_CLOUD_REMOTE_FIRST.md](docs/TENCENT_CLOUD_REMOTE_FIRST.
 then use [docs/TENCENT_LIGHTHOUSE_HK.md](docs/TENCENT_LIGHTHOUSE_HK.md) for the
 server runbook.
 
-### Auto Mode
+### Model Auto-Routing and Fin
 
 Use `codewhale --model auto` or `/model auto` when you want codewhale to decide how much model and reasoning power a turn needs.
 
-Auto mode controls two settings together:
+Model auto-routing controls two settings together:
 
 - Model: `deepseek-v4-flash` or `deepseek-v4-pro`
 - Thinking: `off`, `high`, or `max`
 
-Before the real turn is sent, the app makes a small `deepseek-v4-flash` routing call with thinking off. That router looks at the latest request and recent context, then selects a concrete model and thinking level for the real request. Short/simple turns can stay on Flash with thinking off; coding, debugging, release work, architecture, security review, or ambiguous multi-step tasks can move up to Pro and/or higher thinking.
+Before the real turn is sent, the app makes a small `deepseek-v4-flash` routing call with thinking off. That fast path is called **Fin**: a low-latency seam for model selection, summaries, RLM children, context maintenance, and other coordination work that should not spend a full reasoning turn. Fin looks at the latest request and recent context, then selects a concrete model and thinking level for the real request. Short/simple turns can stay on Flash with thinking off; coding, debugging, release work, architecture, security review, or ambiguous multi-step tasks can move up to Pro and/or higher thinking.
 
-`auto` is local to codewhale. The upstream API never receives `model: "auto"`; it receives the concrete model and thinking setting chosen for that turn. The TUI shows the selected route, and cost tracking is charged against the model that actually ran. If the router call fails or returns an invalid answer, the app falls back to a local heuristic. Sub-agents inherit auto mode unless you assign them an explicit model.
+`--model auto` and `/model auto` are local to codewhale. The upstream API never receives `model: "auto"`; it receives the concrete model and thinking setting chosen for that turn. The TUI shows the selected route, and cost tracking is charged against the model that actually ran. If the Fin route fails or returns an invalid answer, the app falls back to a local heuristic. Sub-agents inherit model auto-routing unless you assign them an explicit model.
 
 Use a fixed model or fixed thinking level when you want repeatable benchmarking, a strict cost ceiling, or a specific provider/model mapping.
 
@@ -310,10 +311,10 @@ interfaces, and extension points.
 ```bash
 codewhale # interactive TUI
 codewhale "explain this function" # one-shot prompt
-codewhale exec --auto --output-format stream-json "fix this bug" # NDJSON backend stream
+codewhale exec --auto --output-format stream-json "fix this bug" # agentic exec with tool auto-approvals
 codewhale exec --resume "follow up" # continue a non-interactive session
 codewhale --model deepseek-v4-flash "summarize" # model override
-codewhale --model auto "fix this bug" # auto-select model + thinking
+codewhale --model auto "fix this bug" # auto-route model + thinking
 codewhale --yolo # auto-approve tools
 codewhale auth set --provider deepseek # save API key
 codewhale doctor # check setup & connectivity
@@ -416,6 +417,12 @@ Full shortcut catalog: [docs/KEYBINDINGS.md](docs/KEYBINDINGS.md).
 | **Agent** 🤖 | Default interactive mode — multi-step tool use with approval gates; substantial work is tracked with `checklist_write` |
 | **YOLO** ⚡ | Auto-approve all tools in a trusted workspace; multi-step work still keeps a visible checklist |
 
+Modes are separate from model auto-routing. `Tab` cycles Plan / Agent / YOLO,
+while `/model auto` controls model and thinking selection. The `/goal` command
+tracks a session objective and token budget today; a fuller Goal work surface is
+the right future home for persistent objective progress rather than another
+meaning of "auto".
+
 ---
 
 ## Configuration
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 5b652f43..99777469 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -73,8 +73,9 @@ codewhale 是一个完全运行在终端里的编程智能体。它让 DeepSeek
 
 ### 主要功能
 
-- **Auto 模式** —— `--model auto` / `/model auto` 每轮自动选择模型和推理强度
-- **原生 RLM**(`rlm_open`/`rlm_eval`)—— 持久化 REPL 会话用于批量分析;使用带界面的辅助函数(`peek`、`search`、`chunk`、`sub_query_batch`)运行低成本 `deepseek-v4-flash` 子任务
+- **模型自动路由** —— `--model auto` / `/model auto` 每轮自动选择模型和推理强度
+- **Fin 快速通道** —— 使用关闭思考的低成本 `deepseek-v4-flash` 承担路由、RLM 子调用、摘要和协调工作
+- **原生 RLM**(`rlm_open`/`rlm_eval`)—— 持久化 REPL 会话用于批量分析;使用带界面的辅助函数(`peek`、`search`、`chunk`、`sub_query_batch`)
 - **思考模式流式输出** —— 实时观察模型在解决问题时的思维链展开
 - **完整工具集** —— 文件操作、shell 执行、git、网页搜索/浏览、apply-patch、子智能体、MCP 服务器
 - **100 万 token 上下文** —— 上下文跟踪、手动或配置驱动的压缩,以及前缀缓存遥测
@@ -151,18 +152,18 @@ CNB 镜像/源码,腾讯云 Lighthouse 香港实例,飞书/Lark 长连接桥
 先看 [docs/TENCENT_CLOUD_REMOTE_FIRST.md](docs/TENCENT_CLOUD_REMOTE_FIRST.md),
 再按 [docs/TENCENT_LIGHTHOUSE_HK.md](docs/TENCENT_LIGHTHOUSE_HK.md) 配置服务器。
 
-### Auto 模式
+### 模型自动路由与 Fin
 
 使用 `codewhale --model auto` 或 `/model auto` 让 codewhale 自行决定每轮需要多少模型和推理能力。
 
-Auto 模式同时控制两个设置:
+模型自动路由同时控制两个设置:
 
 - 模型:`deepseek-v4-flash` 或 `deepseek-v4-pro`
 - 推理强度:`off`、`high` 或 `max`
 
-在真实请求发出之前,应用会先用关闭推理的 `deepseek-v4-flash` 进行一次小型路由调用。路由器审视最新请求和最近的上下文,然后为真实请求选定具体的模型和推理强度。简短/简单的轮次保持在 Flash + 关闭推理;编码、调试、发布、架构、安全审查或模糊的多步骤任务可升级到 Pro 和/或更高推理强度。
+在真实请求发出之前,应用会先用关闭推理的 `deepseek-v4-flash` 进行一次小型路由调用。这条快速路径叫 **Fin**:用于模型选择、摘要、RLM 子任务、上下文维护以及其他不该消耗完整推理轮次的协调工作。Fin 审视最新请求和最近的上下文,然后为真实请求选定具体的模型和推理强度。简短/简单的轮次保持在 Flash + 关闭推理;编码、调试、发布、架构、安全审查或模糊的多步骤任务可升级到 Pro 和/或更高推理强度。
 
-`auto` 是 codewhale 本地行为。上游 API 永远不会收到 `model: "auto"`,它只会收到为当前轮次选定的具体模型和推理强度设置。TUI 会显示选定的路由,成本跟踪按实际运行的模型计费。如果路由调用失败或返回无效答案,应用会回退到本地启发式规则。子智能体会继承 auto 模式,除非你为它们指定了显式模型。
+`--model auto` 和 `/model auto` 是 codewhale 本地行为。上游 API 永远不会收到 `model: "auto"`,它只会收到为当前轮次选定的具体模型和推理强度设置。TUI 会显示选定的路由,成本跟踪按实际运行的模型计费。如果 Fin 路由失败或返回无效答案,应用会回退到本地启发式规则。子智能体会继承模型自动路由,除非你为它们指定了显式模型。
 
 需要可重复基准测试、严格控制成本上限或特定提供商/模型映射时,请使用固定模型或固定推理强度。
 
@@ -289,10 +290,10 @@ codewhale --provider ollama --model codewhale-coder:1.3b
 ```bash
 codewhale # 交互式 TUI
 codewhale "explain this function" # 一次性提示
-codewhale exec --auto --output-format stream-json "fix this bug" # 面向后端集成的 NDJSON 流
+codewhale exec --auto --output-format stream-json "fix this bug" # 自动批准工具的 agentic exec
 codewhale exec --resume "follow up" # 继续非交互会话
 codewhale --model deepseek-v4-flash "summarize" # 指定模型
-codewhale --model auto "fix this bug" # 自动选择模型 + 推理强度
+codewhale --model auto "fix this bug" # 自动路由模型 + 推理强度
 codewhale --yolo # 自动批准工具
 codewhale auth set --provider deepseek # 保存 API key
 codewhale doctor # 检查配置和连接
@@ -374,6 +375,10 @@ DeepSeek 可作为自定义 Agent Client Protocol 服务器运行,供 Zed 等
 | **Agent** 🤖 | 默认交互模式;多步工具调用带审批门禁 |
 | **YOLO** ⚡ | 在可信工作区自动批准工具;仍会维护计划和清单以保持可见性 |
 
+模式与模型自动路由是两个概念。`Tab` 切换 Plan / Agent / YOLO,
+`/model auto` 选择模型和思考强度。`/goal` 当前用于追踪会话目标和
+token 预算;未来如果扩展成 Goal 工作区,也应与 `--model auto` 保持独立。
+
 ---
 
 ## 配置
diff --git a/crates/tui/CHANGELOG.md b/crates/tui/CHANGELOG.md
index c414c513..c52f9e4b 100644
--- a/crates/tui/CHANGELOG.md
+++ b/crates/tui/CHANGELOG.md
@@ -19,6 +19,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   oriented.** README, rebrand notes, crate metadata, and npm package text now
   describe CodeWhale as an agentic terminal for open source and open-weight
   coding models while preserving the official DeepSeek provider as first-class.
+- **Model auto-routing is documented separately from TUI modes.** README and
+  modes docs now reserve "mode" for Plan / Agent / YOLO, describe
+  `--model auto` as model/thinking routing, and name the fast
+  `deepseek-v4-flash` thinking-off seam as Fin.
+- **Rebrand shim docs now match the v0.8.x transition window.** The npm and
+  migration notes no longer imply the legacy `deepseek-tui` package/shims
+  expired immediately after v0.8.41.
 
 ### Fixed
 
diff --git a/crates/tui/src/tui/ui/tests.rs b/crates/tui/src/tui/ui/tests.rs
index 12d8855a..fd0246a5 100644
--- a/crates/tui/src/tui/ui/tests.rs
+++ b/crates/tui/src/tui/ui/tests.rs
@@ -429,13 +429,17 @@ fn selection_to_text_copies_rendered_transcript_block() {
     let selected = selection_to_text(&app).expect("selection text");
     assert!(selected.contains("Note copy system"), "{selected:?}");
     assert!(selected.contains("copy user"), "{selected:?}");
+    // Short completed thinking now renders inline (v0.8.42 thinking-preview
+    // change); it should be selectable/copyable as visible transcript text.
     assert!(
-        !selected.contains("copy thinking"),
-        "raw completed thinking should stay out of live selection text: {selected:?}"
+        selected.contains("copy thinking"),
+        "short completed thinking should be visible inline: {selected:?}"
     );
+    // Short thinking that fits entirely inline doesn't need the Ctrl+O
+    // affordance; only truncated or explicit-summary thinking shows it.
     assert!(
-        selected.contains("Ctrl+O"),
-        "selection should keep the reasoning detail affordance: {selected:?}"
+        !selected.contains("Ctrl+O"),
+        "short completed thinking should not show the detail affordance: {selected:?}"
     );
     assert!(selected.contains("tool output line"), "{selected:?}");
     assert!(selected.contains("copy assistant"), "{selected:?}");
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 3350428d..13176265 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -481,7 +481,9 @@ If you are upgrading from older releases:
 - `[snapshots].enabled` (bool, default `true`)
 - `[snapshots].max_age_days` (int, default `7`)
 - snapshots live under `~/.deepseek/snapshots///.git` and never use the workspace's own `.git` directory
-- `context.*` (optional): append-only Flash seam manager, currently opt-in.
+- `context.*` (optional): append-only Fin seam manager, currently opt-in.
+  Fin is the fast `deepseek-v4-flash` path with thinking off used for
+  coordination work such as routing, summaries, and context maintenance.
   Thresholds use the active request input estimate, not lifetime summed API
   usage:
 - `[context].enabled` (bool, default `false`)
diff --git a/docs/MODES.md b/docs/MODES.md
index 091db030..99226815 100644
--- a/docs/MODES.md
+++ b/docs/MODES.md
@@ -5,6 +5,10 @@ codewhale has two related concepts:
 - **TUI mode**: what kind of visible interaction you're in (Plan/Agent/YOLO).
 - **Approval mode**: how aggressively the UI asks before executing tools.
 
+Model selection is separate. `--model auto` and `/model auto` route each turn to
+a concrete model and thinking level; they are not TUI modes and are not part of
+the `Tab` cycle.
+
 ## TUI Modes
 
 Press `Tab` to complete composer menus, queue a draft as a next-turn follow-up
@@ -20,6 +24,14 @@ Run `/mode` to open the mode picker, or switch directly with `/mode agent`,
 
 All three modes have access to persistent RLM sessions through `rlm_open`, `rlm_eval`, `rlm_configure`, and `rlm_close`. Inside an RLM Python REPL, `sub_query_batch` fans out 1-16 cheap parallel child calls pinned to `deepseek-v4-flash`. The model reaches for it when work is too large or repetitive for the parent transcript.
 
+The fast `deepseek-v4-flash` / thinking-off path is called Fin in the product
+language. Fin is a seam for routing, summaries, cheap child calls, and
+coordination work; it does not change approval behavior.
+
+`/goal` sets a session objective with an optional token budget. It is goal
+tracking today, not a separate TUI mode. If CodeWhale grows a persistent Goal
+work surface later, it should remain distinct from `--model auto`.
+
 ## Compatibility Notes
 
 - Older settings files with `default_mode = "normal"` still load as `agent`; saving rewrites the normalized value.
diff --git a/docs/REBRAND.md b/docs/REBRAND.md
index e14444f0..4ce7c9ba 100644
--- a/docs/REBRAND.md
+++ b/docs/REBRAND.md
@@ -62,16 +62,16 @@ Anything that targets the DeepSeek provider API stays exactly as it was:
   will be flipped to the new names in a follow-up.
 - **Docker image**: `ghcr.io/hmbown/codewhale`.
 
-## Deprecation shims (one release cycle)
+## Deprecation shims (through v0.8.x)
 
 To keep existing shell aliases, scripts, and CI working through the rename,
-v0.8.41 ships **deprecation shims** for one cycle:
+v0.8.41 and later v0.8.x releases ship **deprecation shims**:
 
 - A `deepseek` binary that prints a one-line warning to stderr and forwards
   argv to `codewhale`.
 - A `deepseek-tui` binary that does the same for `codewhale-tui`.
-- An `npm` package at `deepseek-tui@0.8.41` with no `bin` and a postinstall
-  that prints a clear "rename" notice.
+- An `npm` package at `deepseek-tui@0.8.x` with no `bin` and a postinstall
+  that prints a clear rename notice.
 
 These shims will be removed in **v0.9.0**. Please migrate before then.
 
diff --git a/npm/codewhale/README.md b/npm/codewhale/README.md
index 7fb283e4..ac6b0a23 100644
--- a/npm/codewhale/README.md
+++ b/npm/codewhale/README.md
@@ -5,7 +5,7 @@ models, from GitHub release artifacts.
 
 > Previously published as `deepseek-tui`. See `docs/REBRAND.md` in the upstream
 > repository for the migration notes; the legacy `deepseek-tui` npm package
-> still exists as a deprecation shim for one release cycle.
+> remains a deprecation shim through the v0.8.x transition.
 
 ## Install
 
diff --git a/npm/deepseek-tui/README.md b/npm/deepseek-tui/README.md
index 33308356..9b3dd161 100644
--- a/npm/deepseek-tui/README.md
+++ b/npm/deepseek-tui/README.md
@@ -9,7 +9,7 @@ npm install -g codewhale
 
 `codewhale` ships the same `codewhale` and `codewhale-tui` binaries plus
 deprecation shims under the old `deepseek` / `deepseek-tui` names so existing
-scripts keep working through one transition release.
+scripts keep working through the v0.8.x transition.
 
 See [docs/REBRAND.md](https://github.com/Hmbown/CodeWhale/blob/main/docs/REBRAND.md)
 for the full migration story.
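
The README hunks in this diff describe `--model auto` as a Fin routing call that picks a concrete model and thinking level, with a local heuristic taking over when that call fails or returns an invalid answer. That flow can be sketched roughly as below. This is a minimal illustration only: the type and function names, the `"<model> <thinking>"` answer format, and the keyword heuristic are all assumptions made for the example and are not taken from the codewhale source.

```rust
// Hypothetical types for illustration; not the codewhale implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Model {
    Flash,
    Pro,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Thinking {
    Off,
    High,
    Max,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Route {
    model: Model,
    thinking: Thinking,
}

/// Parse a router answer in the assumed form "<model> <thinking>".
/// Anything that is not a concrete model plus thinking level is invalid.
fn parse_route(answer: &str) -> Option<Route> {
    let mut parts = answer.split_whitespace();
    let model = match parts.next()? {
        "deepseek-v4-flash" => Model::Flash,
        "deepseek-v4-pro" => Model::Pro,
        _ => return None,
    };
    let thinking = match parts.next()? {
        "off" => Thinking::Off,
        "high" => Thinking::High,
        "max" => Thinking::Max,
        _ => return None,
    };
    // Reject trailing tokens so a chatty answer counts as invalid.
    if parts.next().is_some() {
        return None;
    }
    Some(Route { model, thinking })
}

/// Stand-in local heuristic (assumed): a few keywords escalate to Pro/high,
/// everything else stays on Flash with thinking off.
fn local_heuristic(prompt: &str) -> Route {
    let lower = prompt.to_lowercase();
    let heavy = ["debug", "refactor", "security", "architecture"]
        .iter()
        .any(|keyword| lower.contains(keyword));
    if heavy {
        Route { model: Model::Pro, thinking: Thinking::High }
    } else {
        Route { model: Model::Flash, thinking: Thinking::Off }
    }
}

/// Resolve `auto` to a concrete route. `router_answer` is `None` when the
/// routing call failed; an unparseable answer also triggers the fallback.
fn resolve_auto(prompt: &str, router_answer: Option<&str>) -> Route {
    router_answer
        .and_then(parse_route)
        .unwrap_or_else(|| local_heuristic(prompt))
}
```

In this sketch, `resolve_auto("debug this crash", None)` falls through to the stand-in heuristic, while a well-formed router answer is used as-is. The result is always a concrete model and thinking level, which matches the README's statement that the upstream API never receives `model: "auto"`.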