From 06612495fc6027e9a365060a39053fca6dbc929b Mon Sep 17 00:00:00 2001 From: Hunter Bown Date: Tue, 2 Jun 2026 17:36:18 -0700 Subject: [PATCH] =?UTF-8?q?chore(release):=20prep=20v0.8.51=20=E2=80=94=20?= =?UTF-8?q?Arcee=20provider,=20cycle=20removal,=20UI=20fixes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Release-preparation checkpoint for v0.8.51 (workspace + npm bumped to 0.8.51). Added: - Arcee AI direct provider: [providers.arcee], ARCEE_API_KEY/BASE_URL/MODEL, CLI auth, provider + model picker, registry. Default direct-API model is trinity-large-thinking (reasoning, 262K ctx/out); preview + mini selectable. Cloudflare-WAF-safe opening turn (benign read-only tool surface, system-prompt payload splitting) and reasoning_content replay on tool-call turns. - Expanded model catalog (qwen3.6 flash/plus/max-preview, Xiaomi MiMo v2.5 chat/ASR/TTS); provider-aware model picker with per-provider saved models. Changed: - Auto-compaction is percentage- and model-aware (compaction_threshold_for_model_at_percent; default 80%; auto-enable for <=256K windows, opt-in for 1M models). - Provider/gateway HTTP errors sanitized (HTML/WAF interstitials collapsed, 401/403 split into authentication vs authorization). Removed: - The session cycle / checkpoint-restart system: /cycles, /cycle, /recall, recall_archive tool, cycle_manager, cycle-handoff prompt, sidebar cycle lines, EngineConfig.cycle / Event::CycleAdvanced / seam cycle thresholds. Fixed: - Orphaned assistant 'blue dot' role glyph on whitespace-only turns. - Sidebar mouse-wheel scroll leaking into the transcript. - Sidebar hover tooltip overlap + warning-orange styling. - README Constitution description corrected to match prompts/base.md. - Repaired release-blocking unit/integration tests after the refactors. Preflight: cargo fmt clean, workspace builds, 3903 tui tests pass (1 known flaky MCP SSE test under parallel load, passes in isolation). 
--- CHANGELOG.md | 72 +- Cargo.lock | 30 +- Cargo.toml | 2 +- README.md | 48 +- config.example.toml | 20 +- crates/agent/Cargo.toml | 2 +- crates/agent/src/lib.rs | 159 ++- crates/app-server/Cargo.toml | 18 +- crates/cli/Cargo.toml | 16 +- crates/cli/src/lib.rs | 43 +- crates/config/Cargo.toml | 4 +- crates/config/src/lib.rs | 83 +- crates/core/Cargo.toml | 16 +- crates/execpolicy/Cargo.toml | 2 +- crates/hooks/Cargo.toml | 2 +- crates/tools/Cargo.toml | 2 +- crates/tui/CHANGELOG.md | 72 +- crates/tui/Cargo.toml | 10 +- crates/tui/src/client.rs | 36 +- crates/tui/src/client/chat.rs | 180 +++- crates/tui/src/commands/config.rs | 1 + crates/tui/src/commands/core.rs | 78 +- crates/tui/src/commands/cycle.rs | 225 ---- crates/tui/src/commands/mod.rs | 28 - crates/tui/src/commands/provider.rs | 24 +- crates/tui/src/commands/skills.rs | 5 +- crates/tui/src/compaction.rs | 95 +- crates/tui/src/config.rs | 214 +++- crates/tui/src/core/engine.rs | 383 +++---- crates/tui/src/core/engine/context.rs | 4 - crates/tui/src/core/engine/tests.rs | 122 ++- crates/tui/src/core/engine/tool_catalog.rs | 58 + crates/tui/src/core/engine/tool_setup.rs | 3 +- crates/tui/src/core/engine/turn_loop.rs | 8 + crates/tui/src/core/events.rs | 11 - crates/tui/src/core/session.rs | 18 - crates/tui/src/cost_status.rs | 6 +- crates/tui/src/cycle_manager.rs | 1115 -------------------- crates/tui/src/error_taxonomy.rs | 24 +- crates/tui/src/llm_client/mod.rs | 274 ++++- crates/tui/src/localization.rs | 68 +- crates/tui/src/main.rs | 26 +- crates/tui/src/models.rs | 147 +-- crates/tui/src/prompts/cycle_handoff.md | 76 -- crates/tui/src/runtime_threads.rs | 52 +- crates/tui/src/seam_manager.rs | 108 +- crates/tui/src/settings.rs | 37 +- crates/tui/src/tools/mod.rs | 1 - crates/tui/src/tools/recall_archive.rs | 718 ------------- crates/tui/src/tools/registry.rs | 9 - crates/tui/src/tools/subagent/tests.rs | 4 +- crates/tui/src/tui/app.rs | 191 +++- crates/tui/src/tui/history.rs | 145 ++- 
crates/tui/src/tui/model_picker.rs | 230 +++- crates/tui/src/tui/mouse_ui.rs | 32 +- crates/tui/src/tui/notifications.rs | 2 +- crates/tui/src/tui/sidebar.rs | 26 - crates/tui/src/tui/ui.rs | 239 ++++- crates/tui/src/tui/ui/tests.rs | 186 +++- crates/tui/src/tui/views/mod.rs | 1 + crates/tui/src/vision/tools.rs | 7 +- crates/tui/tests/integration_mock_llm.rs | 3 +- docs/CONFIGURATION.md | 43 +- docs/LEGACY_RUST_AUDIT_0_7_6.md | 2 +- docs/PROVIDERS.md | 32 +- docs/REMOTE_SETUP_DESIGN.md | 264 +++++ npm/codewhale/package.json | 4 +- 67 files changed, 2965 insertions(+), 3201 deletions(-) delete mode 100644 crates/tui/src/commands/cycle.rs delete mode 100644 crates/tui/src/cycle_manager.rs delete mode 100644 crates/tui/src/prompts/cycle_handoff.md delete mode 100644 crates/tui/src/tools/recall_archive.rs create mode 100644 docs/REMOTE_SETUP_DESIGN.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 6fdabac0..146bce56 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,12 +7,76 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.8.51] - 2026-06-02 + ### Added -- Added Arcee AI as a direct OpenAI-compatible provider with `[providers.arcee]`, - `ARCEE_API_KEY` / `ARCEE_BASE_URL` / `ARCEE_MODEL`, `trinity-mini` as the - default model, and `trinity-large-preview` as the documented direct API model. - OpenRouter's `arcee-ai/trinity-large-thinking` route remains separate. +- **Arcee AI as a direct provider.** New `[providers.arcee]` config block and + `ARCEE_API_KEY` / `ARCEE_BASE_URL` / `ARCEE_MODEL` environment variables, + wired through CLI auth (`codewhale auth set --provider arcee`), the TUI + provider picker, and the model registry. The default direct-API model is + `trinity-large-thinking` (reasoning-capable, 262K context and 262K max + output); `trinity-large-preview` (262K context, non-reasoning) and + `trinity-mini` (128K context) are also selectable. 
OpenRouter's + `arcee-ai/trinity-large-thinking` route remains separate. +- **Arcee Cloudflare-WAF compatibility.** The opening turn to the Arcee gateway + uses a benign read-only tool surface (`read_file`, `list_dir`, `file_search`, + `grep_files`, `git_status`, `git_diff`, `checklist_write`, `update_plan`) and + splits example payloads such as `python -c …` out of the system prompt, so the + WAF does not reject the first request; the full tool catalog stays reachable + through tool-search. `trinity-large-thinking`'s `reasoning_content` is + recognized and replayed on tool-call turns. +- **Expanded model catalog.** Added context-window, max-output, and + reasoning-capability metadata for additional model IDs, including + `qwen/qwen3.6-flash`, `qwen/qwen3.6-plus`, `qwen/qwen3.6-max-preview`, and + Xiaomi MiMo v2.5 chat/ASR/TTS variants; `trinity-large-preview`'s context + window was corrected to 262K. +- **Provider-aware model picker.** The picker groups models by provider, shows + per-model hints, and remembers a saved model per provider. + +### Changed + +- **Auto-compaction is now percentage- and model-aware.** The per-model + threshold helper is `compaction_threshold_for_model_at_percent(model, + percent)` (replacing the effort-based variant), and the default + `auto_compact_threshold_percent` is 80%. Auto-compaction defaults on for + models with a context window of 256K or smaller and stays opt-in for 1M-token + models (e.g. DeepSeek V4) to protect prefix-cache economics, unless the user + has explicitly set `auto_compact`. +- **Clearer provider/gateway errors.** HTTP error bodies are sanitized before + display — HTML interstitials and Cloudflare "Access Denied" pages collapse to + a one-line reason (with the ray/error ID) instead of dumping raw markup into + the transcript — and 403s are split into authentication vs. authorization + (gateway/WAF block) categories. +- The invalid-model error now names the active provider and lists Arcee among + the options. 
+ +### Removed + +- **The session "cycle" / checkpoint-restart system.** Removed the `/cycles`, + `/cycle `, and `/recall` commands, the `recall_archive` tool, the + cycle-handoff briefing prompt, the sidebar "cycles" lines, and the + `cycle_manager` engine plumbing (`EngineConfig.cycle`, `Event::CycleAdvanced`, + seam-manager cycle thresholds and flash briefings). Long sessions no longer + auto-reset their context at a fixed token boundary — reclaim budget with + `/compact` or model-aware auto-compaction instead. Existing on-disk cycle + archives are left untouched but are no longer read or written. + +### Fixed + +- Assistant turns no longer leave an orphaned role glyph (the stray "blue dot") + when a turn streams only whitespace between reasoning and a tool call. +- Scrolling the mouse wheel over the right-hand sidebar no longer leaks into the + transcript scroll. +- The sidebar hover tooltip now appears only for truncated lines, sits below the + cursor, and uses a neutral surface color instead of the warning-orange + highlight that overlapped neighbouring rows. +- Corrected the README's description of the Constitution (Article VII is the + hierarchy itself; Article II's truth duty overrides even a user request) to + match `prompts/base.md`. +- Repaired release-blocking unit and integration tests left failing by the + cycle-removal and compaction-threshold refactors (relay instruction, + model-reject message, compaction budget, mock-LLM threshold helper). 
## [0.8.50] - 2026-06-02 diff --git a/Cargo.lock b/Cargo.lock index 4f3e1854..9b6465ab 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -803,7 +803,7 @@ checksum = "e9b18233253483ce2f65329a24072ec414db782531bdbb7d0bbc4bd2ce6b7e21" [[package]] name = "codewhale-agent" -version = "0.8.50" +version = "0.8.51" dependencies = [ "codewhale-config", "serde", @@ -811,7 +811,7 @@ dependencies = [ [[package]] name = "codewhale-app-server" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "axum", @@ -836,7 +836,7 @@ dependencies = [ [[package]] name = "codewhale-cli" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "chrono", @@ -863,7 +863,7 @@ dependencies = [ [[package]] name = "codewhale-config" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "codewhale-execpolicy", @@ -877,7 +877,7 @@ dependencies = [ [[package]] name = "codewhale-core" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "chrono", @@ -895,7 +895,7 @@ dependencies = [ [[package]] name = "codewhale-execpolicy" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "codewhale-protocol", @@ -904,7 +904,7 @@ dependencies = [ [[package]] name = "codewhale-hooks" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "async-trait", @@ -918,7 +918,7 @@ dependencies = [ [[package]] name = "codewhale-mcp" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "serde", @@ -927,7 +927,7 @@ dependencies = [ [[package]] name = "codewhale-protocol" -version = "0.8.50" +version = "0.8.51" dependencies = [ "serde", "serde_json", @@ -935,7 +935,7 @@ dependencies = [ [[package]] name = "codewhale-release" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "reqwest", @@ -946,7 +946,7 @@ dependencies = [ [[package]] name = "codewhale-secrets" -version = "0.8.50" +version = "0.8.51" dependencies = [ "dirs", "keyring", @@ -959,7 +959,7 @@ dependencies = [ [[package]] name = "codewhale-state" -version = "0.8.50" 
+version = "0.8.51" dependencies = [ "anyhow", "chrono", @@ -971,7 +971,7 @@ dependencies = [ [[package]] name = "codewhale-tools" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "async-trait", @@ -985,7 +985,7 @@ dependencies = [ [[package]] name = "codewhale-tui" -version = "0.8.50" +version = "0.8.51" dependencies = [ "anyhow", "arboard", @@ -1054,7 +1054,7 @@ dependencies = [ [[package]] name = "codewhale-tui-core" -version = "0.8.50" +version = "0.8.51" [[package]] name = "colorchoice" diff --git a/Cargo.toml b/Cargo.toml index 614400a8..7daf7ea1 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -20,7 +20,7 @@ default-members = ["crates/cli", "crates/app-server", "crates/tui"] resolver = "2" [workspace.package] -version = "0.8.50" +version = "0.8.51" edition = "2024" # Rust 1.88 stabilized `let_chains` in `if`/`while` conditions, which the # codebase relies on extensively. Cargo enforces this so users on older diff --git a/README.md b/README.md index 6a071520..b8dcdde8 100644 --- a/README.md +++ b/README.md @@ -104,18 +104,19 @@ for authority in a single turn. LLM-as-a-judge needs jurisdiction — which source wins when they disagree? CodeWhale answers this with a **Constitution** (`prompts/base.md`). It's a -formal hierarchy of law — Article VII ranks nine sources from the +formal hierarchy of law — Article VII ranks nine tiers from the Constitution's own articles down to prior-session handoffs. The user's current message outranks stale project instructions. Live tool output outranks assumptions. Verification outranks confidence. The model inherits a clear chain of authority every turn and never has to guess which directive to follow. 
-Seven articles sit above the hierarchy, defining the model's identity, -duties, and agency: a verification mandate (Article V — every action leaves -evidence, never declare success on faith), a coordination legacy (Article -VI — leave the workspace legible for the next intelligence), and a -primacy-of-truth clause (Article II — no lower rule may override it). +Six Articles define the model's identity, duties, and agency (Article VII +is the hierarchy itself): a verification mandate (Article V — every action +leaves evidence, never declare success on faith), a coordination legacy +(Article VI — leave the workspace cleaner and the handoff truthful for the +next intelligence), and a primacy-of-truth clause (Article II — +non-negotiable; not even a user request may override the duty of truth). DeepSeek V4's prefix caching makes this practical. The Constitution is long and detailed, but once cached it costs roughly 100× less per turn than a @@ -320,14 +321,45 @@ codewhale --provider openrouter --model deepseek/deepseek-v4-pro codewhale --provider openrouter --model arcee-ai/trinity-large-thinking codewhale --provider openrouter --model minimax/minimax-m3 -# Arcee AI direct API +Arcee AI offers direct API access to its powerful Trinity models, including the reasoning-capable Trinity-Large Thinking. This section provides comprehensive setup instructions and model comparisons. 
+ +## Configuration + +### API Key +Authenticate with either the `ARCEE_API_KEY` environment variable or the `[providers.arcee]` section of `~/.codewhale/config.toml`: + +```toml +[providers.arcee] +# api_key = "your-arcee-api-key" +# base_url = "https://api.arcee.ai/api/v1" +# model = "trinity-large-thinking" # or "trinity-large-preview", "trinity-mini" +``` + +### Environment Variables + +- `ARCEE_API_KEY`: Your Arcee API key (required unless `api_key` is set in `[providers.arcee]`) +- `ARCEE_BASE_URL`: Custom base URL (optional, defaults to `https://api.arcee.ai/api/v1`) +- `ARCEE_MODEL`: Default model to use (optional, defaults to `trinity-large-thinking`) + +### Model Support + +CodeWhale supports three Arcee models: + +| Model | Reasoning | Context Window | Max Output | Best For | +|--------|-----------|----------------|------------|----------| +| `trinity-large-thinking` | ✅ Yes | 262,144 tokens | 262,144 tokens | Complex reasoning, coding, math | +| `trinity-large-preview` | ❌ No | 262,144 tokens | 4,096 tokens | High-accuracy non-reasoning tasks | +| `trinity-mini` | ❌ No | 128,000 tokens | 4,096 tokens | Faster, cost-effective tasks | + +**Note:** Only `trinity-large-thinking` supports reasoning (thinking mode); combined with its 262,144-token context window, that makes it the best fit for complex programming tasks. `trinity-large-preview` and `trinity-mini` are non-reasoning models with 262,144-token and 128,000-token context windows respectively. 
codewhale auth set --provider arcee --api-key "YOUR_ARCEE_API_KEY" -codewhale --provider arcee --model trinity-mini +codewhale --provider arcee --model trinity-large-thinking codewhale --provider arcee --model trinity-large-preview # Xiaomi MiMo codewhale auth set --provider xiaomi-mimo --api-key "YOUR_XIAOMI_KEY" codewhale --provider xiaomi-mimo --model mimo-v2.5-pro +codewhale --provider xiaomi-mimo --model mimo-v2.5 codewhale --provider xiaomi-mimo speech "Hello from MiMo" --model tts -o hello.wav # Novita diff --git a/config.example.toml b/config.example.toml index 13649c80..51017ef9 100644 --- a/config.example.toml +++ b/config.example.toml @@ -45,13 +45,14 @@ base_url = "https://api.deepseek.com/beta" # deepseek-ai/deepseek-v4-flash — default AtlasCloud model ID # deepseek-reasoner — default Wanjie Ark model ID # mimo-v2.5-pro — default Xiaomi MiMo model ID -# mimo-v2.5-tts ? Xiaomi MiMo speech/TTS model ID -# mimo-v2.5-tts-voicedesign ? Xiaomi MiMo voice-design TTS model ID -# mimo-v2.5-tts-voiceclone ? 
Xiaomi MiMo voice-clone TTS model ID +# mimo-v2.5 — Xiaomi MiMo V2.5 Omni model ID +# mimo-v2.5-tts — Xiaomi MiMo speech/TTS model ID +# mimo-v2.5-tts-voicedesign — Xiaomi MiMo voice-design TTS model ID +# mimo-v2.5-tts-voiceclone — Xiaomi MiMo voice-clone TTS model ID # accounts/fireworks/models/deepseek-v4-pro — Fireworks AI Pro model ID # deepseek-ai/DeepSeek-V4-Pro — SiliconFlow hosted Pro model ID # deepseek-ai/DeepSeek-V4-Flash — SiliconFlow hosted Flash model ID -# trinity-mini — default direct Arcee AI API model ID +# trinity-large-thinking — default direct Arcee AI API model ID # trinity-large-preview — direct Arcee AI API model ID # deepseek-ai/DeepSeek-V4-Pro — SGLang self-hosted Pro model ID # deepseek-ai/DeepSeek-V4-Flash — SGLang self-hosted Flash model ID @@ -304,7 +305,8 @@ max_subagents = 10 # optional (1-20) # base_url = "https://openrouter.ai/api/v1" # model = "deepseek/deepseek-v4-pro" # Recent large model IDs also accepted here include arcee-ai/trinity-large-thinking, -# xiaomi/mimo-v2.5-pro, qwen/qwen3.6-35b-a3b, +# xiaomi/mimo-v2.5-pro, qwen/qwen3.6-flash, qwen/qwen3.6-35b-a3b, +# qwen/qwen3.6-max-preview, qwen/qwen3.6-27b, qwen/qwen3.6-plus, # google/gemma-4-31b-it, z-ai/glm-5.1, moonshotai/kimi-k2.6, and # nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free. 
@@ -313,6 +315,7 @@ max_subagents = 10 # optional (1-20) # api_key = "YOUR_XIAOMI_KEY" # base_url = "https://api.xiaomimimo.com/v1" # model = "mimo-v2.5-pro" # chat/reasoning +# Chat model IDs: mimo-v2.5-pro, mimo-v2.5 # TTS aliases are also accepted by `codewhale speech`: tts, voice-design, voice-clone # TTS model IDs: mimo-v2.5-tts, mimo-v2.5-tts-voicedesign, mimo-v2.5-tts-voiceclone, mimo-v2-tts @@ -338,7 +341,7 @@ max_subagents = 10 # optional (1-20) [providers.arcee] # api_key = "YOUR_ARCEE_API_KEY" # base_url = "https://api.arcee.ai/api/v1" -# model = "trinity-mini" # or trinity-large-preview +# model = "trinity-large-thinking" # or trinity-large-preview # Moonshot/Kimi OpenAI-compatible endpoint (https://platform.moonshot.ai) [providers.moonshot] @@ -510,7 +513,7 @@ exponential_base = 2.0 # ───────────────────────────────────────────────────────────────────────────────── # Auto-compaction is a saved UI setting edited with `/config` (`auto_compact`). # The optional saved threshold setting is `auto_compact_threshold_percent` -# (default 70, still gated by the 500K-token floor). There is no config-file +# (default 80). There is no config-file # `[compaction]` table yet; runtime compaction budgets are chosen by the TUI # from the active model/context window. @@ -524,9 +527,6 @@ verbatim_window_turns = 16 l1_threshold = 192000 l2_threshold = 384000 l3_threshold = 576000 -# Hard cycle reserves the normal 262144-token internal turn budget plus 1024 -# safety tokens, separate from V4's official 384000 max-output metadata. 
-cycle_threshold = 768000 seam_model = "deepseek-v4-flash" # ───────────────────────────────────────────────────────────────────────────────── diff --git a/crates/agent/Cargo.toml b/crates/agent/Cargo.toml index a4293707..a3967b51 100644 --- a/crates/agent/Cargo.toml +++ b/crates/agent/Cargo.toml @@ -7,5 +7,5 @@ repository.workspace = true description = "Model/provider registry and fallback strategy for DeepSeek workspace architecture" [dependencies] -codewhale-config = { path = "../config", version = "0.8.50" } +codewhale-config = { path = "../config", version = "0.8.51" } serde.workspace = true diff --git a/crates/agent/src/lib.rs b/crates/agent/src/lib.rs index a96dc226..c1414adc 100644 --- a/crates/agent/src/lib.rs +++ b/crates/agent/src/lib.rs @@ -164,6 +164,17 @@ impl Default for ModelRegistry { supports_tools: true, supports_reasoning: true, }, + ModelInfo { + id: "trinity-large-thinking".to_string(), + provider: ProviderKind::Arcee, + aliases: vec![ + "trinity".to_string(), + "arcee-trinity".to_string(), + "arcee-trinity-large-thinking".to_string(), + ], + supports_tools: true, + supports_reasoning: true, + }, ModelInfo { id: "deepseek/deepseek-v4-pro".to_string(), provider: ProviderKind::Openrouter, @@ -217,6 +228,13 @@ impl Default for ModelRegistry { supports_tools: true, supports_reasoning: true, }, + ModelInfo { + id: "qwen/qwen3.6-flash".to_string(), + provider: ProviderKind::Openrouter, + aliases: vec!["qwen3.6-flash".to_string(), "qwen-3.6-flash".to_string()], + supports_tools: true, + supports_reasoning: true, + }, ModelInfo { id: "qwen/qwen3.6-35b-a3b".to_string(), provider: ProviderKind::Openrouter, @@ -227,6 +245,17 @@ impl Default for ModelRegistry { supports_tools: true, supports_reasoning: true, }, + ModelInfo { + id: "qwen/qwen3.6-max-preview".to_string(), + provider: ProviderKind::Openrouter, + aliases: vec![ + "qwen3.6-max-preview".to_string(), + "qwen-3.6-max-preview".to_string(), + "qwen-max-preview".to_string(), + ], + supports_tools: 
true, + supports_reasoning: true, + }, ModelInfo { id: "qwen/qwen3.6-27b".to_string(), provider: ProviderKind::Openrouter, @@ -234,6 +263,13 @@ impl Default for ModelRegistry { supports_tools: true, supports_reasoning: true, }, + ModelInfo { + id: "qwen/qwen3.6-plus".to_string(), + provider: ProviderKind::Openrouter, + aliases: vec!["qwen3.6-plus".to_string(), "qwen-3.6-plus".to_string()], + supports_tools: true, + supports_reasoning: true, + }, ModelInfo { id: "moonshotai/kimi-k2.6".to_string(), provider: ProviderKind::Openrouter, @@ -296,17 +332,40 @@ impl Default for ModelRegistry { ModelInfo { id: "mimo-v2.5-pro".to_string(), provider: ProviderKind::XiaomiMimo, - aliases: vec!["mimo".to_string()], + aliases: vec![ + "mimo".to_string(), + "pro".to_string(), + "xiaomi-mimo-v2.5-pro".to_string(), + "xiaomi-mimo-v2-5-pro".to_string(), + ], supports_tools: true, supports_reasoning: true, }, ModelInfo { id: "mimo-v2.5".to_string(), provider: ProviderKind::XiaomiMimo, - aliases: vec!["xiaomi-mimo-v2.5".to_string()], + aliases: vec![ + "omni".to_string(), + "mimo-omni".to_string(), + "v2.5-omni".to_string(), + "mimo-v2.5-omni".to_string(), + "xiaomi-mimo-v2.5".to_string(), + "xiaomi-mimo-v2.5-omni".to_string(), + ], supports_tools: true, supports_reasoning: true, }, + ModelInfo { + id: "mimo-v2.5-asr".to_string(), + provider: ProviderKind::XiaomiMimo, + aliases: vec![ + "asr".to_string(), + "speech-to-text".to_string(), + "transcribe".to_string(), + ], + supports_tools: false, + supports_reasoning: false, + }, ModelInfo { id: "mimo-v2.5-tts".to_string(), provider: ProviderKind::XiaomiMimo, @@ -403,17 +462,6 @@ impl Default for ModelRegistry { supports_tools: true, supports_reasoning: true, }, - ModelInfo { - id: "trinity-mini".to_string(), - provider: ProviderKind::Arcee, - aliases: vec![ - "trinity".to_string(), - "arcee-trinity".to_string(), - "arcee-trinity-mini".to_string(), - ], - supports_tools: true, - supports_reasoning: false, - }, ModelInfo { id: 
"trinity-large-preview".to_string(), provider: ProviderKind::Arcee, @@ -581,6 +629,16 @@ impl ModelRegistry { fallback_chain, }; } + if provider_hint == Some(ProviderKind::XiaomiMimo) + && let Some(model) = xiaomi_mimo_passthrough_model(name) + { + return ModelResolution { + requested: Some(name.to_string()), + resolved: model, + used_fallback: false, + fallback_chain, + }; + } if let Some(idx) = self.alias_map.get(&normalize(name)) { return ModelResolution { requested: Some(name.to_string()), @@ -671,6 +729,21 @@ fn arcee_passthrough_model(requested: &str) -> Option<ModelInfo> { }) } +fn xiaomi_mimo_passthrough_model(requested: &str) -> Option<ModelInfo> { + let requested = requested.trim(); + if requested.is_empty() || requested.chars().any(char::is_control) { + return None; + } + + Some(ModelInfo { + id: requested.to_string(), + provider: ProviderKind::XiaomiMimo, + aliases: Vec::new(), + supports_tools: true, + supports_reasoning: true, + }) +} + #[cfg(test)] mod tests { use super::*; @@ -807,6 +880,40 @@ mod tests { assert_eq!(resolved.resolved.id, "mimo-v2.5-tts-voiceclone"); } + #[test] + fn xiaomi_mimo_chat_aliases_resolve_when_provider_hinted() { + let registry = ModelRegistry::default(); + + let resolved = registry.resolve(Some("omni"), Some(ProviderKind::XiaomiMimo)); + assert_eq!(resolved.resolved.provider, ProviderKind::XiaomiMimo); + assert_eq!(resolved.resolved.id, "mimo-v2.5"); + assert!(resolved.resolved.supports_tools); + } + + #[test] + fn xiaomi_mimo_provider_hint_preserves_custom_model_id() { + let registry = ModelRegistry::default(); + let resolved = + registry.resolve(Some("account-custom-mimo"), Some(ProviderKind::XiaomiMimo)); + + assert_eq!(resolved.resolved.provider, ProviderKind::XiaomiMimo); + assert_eq!(resolved.resolved.id, "account-custom-mimo"); + assert!(!resolved.used_fallback); + } + + #[test] + fn xiaomi_mimo_provider_hint_does_not_reclassify_openrouter_model_id() { + let registry = ModelRegistry::default(); + let resolved = registry.resolve( + 
Some("deepseek/deepseek-v4-pro"), + Some(ProviderKind::XiaomiMimo), + ); + + assert_eq!(resolved.resolved.provider, ProviderKind::XiaomiMimo); + assert_eq!(resolved.resolved.id, "deepseek/deepseek-v4-pro"); + assert!(!resolved.used_fallback); + } + #[test] fn wanjie_ark_default_uses_reasoner_model_id() { let registry = ModelRegistry::default(); @@ -849,19 +956,30 @@ mod tests { } #[test] - fn arcee_default_uses_direct_trinity_mini_model_id() { + fn arcee_default_uses_direct_trinity_large_thinking_model_id() { let registry = ModelRegistry::default(); let resolved = registry.resolve(None, Some(ProviderKind::Arcee)); assert_eq!(resolved.resolved.provider, ProviderKind::Arcee); - assert_eq!(resolved.resolved.id, "trinity-mini"); + assert_eq!(resolved.resolved.id, "trinity-large-thinking"); + assert!(resolved.resolved.supports_reasoning); } #[test] - fn arcee_trinity_alias_resolves_to_direct_provider_not_openrouter() { + fn arcee_trinity_alias_resolves_to_direct_large_thinking_not_openrouter() { let registry = ModelRegistry::default(); let resolved = registry.resolve(Some("trinity"), Some(ProviderKind::Arcee)); + assert_eq!(resolved.resolved.provider, ProviderKind::Arcee); + assert_eq!(resolved.resolved.id, "trinity-large-thinking"); + assert!(resolved.resolved.supports_reasoning); + } + + #[test] + fn arcee_trinity_mini_remains_explicit_compatibility_model() { + let registry = ModelRegistry::default(); + let resolved = registry.resolve(Some("trinity-mini"), Some(ProviderKind::Arcee)); + assert_eq!(resolved.resolved.provider, ProviderKind::Arcee); assert_eq!(resolved.resolved.id, "trinity-mini"); assert!(!resolved.resolved.supports_reasoning); @@ -870,11 +988,11 @@ mod tests { #[test] fn arcee_provider_hint_preserves_explicit_future_model_id() { let registry = ModelRegistry::default(); - let resolved = registry.resolve(Some("trinity-large-thinking"), Some(ProviderKind::Arcee)); + let resolved = registry.resolve(Some("trinity-large-next"), Some(ProviderKind::Arcee)); 
assert_eq!(resolved.resolved.provider, ProviderKind::Arcee); - assert_eq!(resolved.resolved.id, "trinity-large-thinking"); - assert!(resolved.resolved.supports_reasoning); + assert_eq!(resolved.resolved.id, "trinity-large-next"); + assert!(!resolved.resolved.supports_reasoning); assert!(!resolved.used_fallback); } @@ -920,7 +1038,10 @@ mod tests { for (alias, expected) in [ ("trinity-large-thinking", "arcee-ai/trinity-large-thinking"), + ("qwen3.6-flash", "qwen/qwen3.6-flash"), ("qwen3.6-35b-a3b", "qwen/qwen3.6-35b-a3b"), + ("qwen3.6-max-preview", "qwen/qwen3.6-max-preview"), + ("qwen3.6-plus", "qwen/qwen3.6-plus"), ("gemma-4-31b-it", "google/gemma-4-31b-it"), ("glm-5.1", "z-ai/glm-5.1"), ("minimax-m3", "minimax/minimax-m3"), diff --git a/crates/app-server/Cargo.toml b/crates/app-server/Cargo.toml index 9ec76487..7d2b3deb 100644 --- a/crates/app-server/Cargo.toml +++ b/crates/app-server/Cargo.toml @@ -10,15 +10,15 @@ description = "Codex-style app-server transport for DeepSeek workspace architect anyhow.workspace = true axum.workspace = true clap.workspace = true -codewhale-agent = { path = "../agent", version = "0.8.50" } -codewhale-config = { path = "../config", version = "0.8.50" } -codewhale-core = { path = "../core", version = "0.8.50" } -codewhale-execpolicy = { path = "../execpolicy", version = "0.8.50" } -codewhale-hooks = { path = "../hooks", version = "0.8.50" } -codewhale-mcp = { path = "../mcp", version = "0.8.50" } -codewhale-protocol = { path = "../protocol", version = "0.8.50" } -codewhale-state = { path = "../state", version = "0.8.50" } -codewhale-tools = { path = "../tools", version = "0.8.50" } +codewhale-agent = { path = "../agent", version = "0.8.51" } +codewhale-config = { path = "../config", version = "0.8.51" } +codewhale-core = { path = "../core", version = "0.8.51" } +codewhale-execpolicy = { path = "../execpolicy", version = "0.8.51" } +codewhale-hooks = { path = "../hooks", version = "0.8.51" } +codewhale-mcp = { path = "../mcp", version 
= "0.8.51" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } +codewhale-state = { path = "../state", version = "0.8.51" } +codewhale-tools = { path = "../tools", version = "0.8.51" } serde.workspace = true serde_json.workspace = true tokio.workspace = true diff --git a/crates/cli/Cargo.toml b/crates/cli/Cargo.toml index 59d7bd00..338c6c80 100644 --- a/crates/cli/Cargo.toml +++ b/crates/cli/Cargo.toml @@ -25,14 +25,14 @@ path = "src/bin/deepseek_legacy_shim.rs" anyhow.workspace = true clap.workspace = true clap_complete.workspace = true -codewhale-agent = { path = "../agent", version = "0.8.50" } -codewhale-app-server = { path = "../app-server", version = "0.8.50" } -codewhale-config = { path = "../config", version = "0.8.50" } -codewhale-execpolicy = { path = "../execpolicy", version = "0.8.50" } -codewhale-mcp = { path = "../mcp", version = "0.8.50" } -codewhale-release = { path = "../release", version = "0.8.50" } -codewhale-secrets = { path = "../secrets", version = "0.8.50" } -codewhale-state = { path = "../state", version = "0.8.50" } +codewhale-agent = { path = "../agent", version = "0.8.51" } +codewhale-app-server = { path = "../app-server", version = "0.8.51" } +codewhale-config = { path = "../config", version = "0.8.51" } +codewhale-execpolicy = { path = "../execpolicy", version = "0.8.51" } +codewhale-mcp = { path = "../mcp", version = "0.8.51" } +codewhale-release = { path = "../release", version = "0.8.51" } +codewhale-secrets = { path = "../secrets", version = "0.8.51" } +codewhale-state = { path = "../state", version = "0.8.51" } chrono.workspace = true dirs.workspace = true serde.workspace = true diff --git a/crates/cli/src/lib.rs b/crates/cli/src/lib.rs index a0815743..85901a06 100644 --- a/crates/cli/src/lib.rs +++ b/crates/cli/src/lib.rs @@ -784,7 +784,6 @@ fn write_provider_api_key_to_config( provider: ProviderKind, api_key: &str, ) { - store.config.provider = provider; store.config.auth_mode = Some("api_key".to_string()); 
store.config.providers.for_provider_mut(provider).api_key = Some(api_key.to_string()); if provider == ProviderKind::Deepseek { @@ -987,7 +986,6 @@ fn run_auth_command_with_secrets( let provider: ProviderKind = provider.into(); let slot = provider_slot(provider); if provider == ProviderKind::Ollama && api_key.is_none() && !api_key_stdin { - store.config.provider = provider; let provider_cfg = store.config.providers.for_provider_mut(provider); if provider_cfg.base_url.is_none() { provider_cfg.base_url = Some("http://localhost:11434/v1".to_string()); @@ -2369,6 +2367,44 @@ mod tests { let _ = std::fs::remove_file(path); } + #[test] + fn auth_set_provider_key_does_not_switch_active_provider() { + let nanos = chrono::Utc::now().timestamp_nanos_opt().unwrap_or_default(); + let path = std::env::temp_dir().join(format!( + "deepseek-cli-auth-set-preserve-provider-test-{}-{nanos}.toml", + std::process::id() + )); + let mut store = ConfigStore::load(Some(path.clone())).expect("store should load"); + store.config.provider = ProviderKind::Deepseek; + let secrets = no_keyring_secrets(); + + run_auth_command_with_secrets( + &mut store, + AuthCommand::Set { + provider: ProviderArg::Arcee, + api_key: Some("arcee-key".to_string()), + api_key_stdin: false, + }, + &secrets, + ) + .expect("set should succeed"); + + assert_eq!(store.config.provider, ProviderKind::Deepseek); + assert_eq!( + store.config.providers.arcee.api_key.as_deref(), + Some("arcee-key") + ); + + let reloaded = ConfigStore::load(Some(path.clone())).expect("store should reload"); + assert_eq!(reloaded.config.provider, ProviderKind::Deepseek); + assert_eq!( + reloaded.config.providers.arcee.api_key.as_deref(), + Some("arcee-key") + ); + + let _ = std::fs::remove_file(path); + } + #[test] fn auth_set_ollama_accepts_empty_key_and_records_base_url() { let nanos = chrono::Utc::now().timestamp_nanos_opt().unwrap_or_default(); @@ -2377,6 +2413,7 @@ mod tests { std::process::id() )); let mut store = 
ConfigStore::load(Some(path.clone())).expect("store should load"); + store.config.provider = ProviderKind::Deepseek; let secrets = no_keyring_secrets(); run_auth_command_with_secrets( @@ -2390,7 +2427,7 @@ mod tests { ) .expect("ollama auth set should not require a key"); - assert_eq!(store.config.provider, ProviderKind::Ollama); + assert_eq!(store.config.provider, ProviderKind::Deepseek); assert_eq!( store.config.providers.ollama.base_url.as_deref(), Some("http://localhost:11434/v1") diff --git a/crates/config/Cargo.toml b/crates/config/Cargo.toml index 71191068..9ffefcb7 100644 --- a/crates/config/Cargo.toml +++ b/crates/config/Cargo.toml @@ -8,8 +8,8 @@ description = "Config schema and precedence model for DeepSeek workspace archite [dependencies] anyhow.workspace = true -codewhale-execpolicy = { path = "../execpolicy", version = "0.8.50" } -codewhale-secrets = { path = "../secrets", version = "0.8.50" } +codewhale-execpolicy = { path = "../execpolicy", version = "0.8.51" } +codewhale-secrets = { path = "../secrets", version = "0.8.51" } dirs.workspace = true serde.workspace = true serde_json.workspace = true diff --git a/crates/config/src/lib.rs b/crates/config/src/lib.rs index 5f3161b8..758214f2 100644 --- a/crates/config/src/lib.rs +++ b/crates/config/src/lib.rs @@ -38,12 +38,17 @@ const OPENROUTER_GLM_5_1_MODEL: &str = "z-ai/glm-5.1"; const OPENROUTER_KIMI_K2_6_MODEL: &str = "moonshotai/kimi-k2.6"; const OPENROUTER_NEMOTRON_3_NANO_OMNI_MODEL: &str = "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free"; +const OPENROUTER_QWEN_3_6_FLASH_MODEL: &str = "qwen/qwen3.6-flash"; const OPENROUTER_QWEN_3_6_35B_A3B_MODEL: &str = "qwen/qwen3.6-35b-a3b"; +const OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL: &str = "qwen/qwen3.6-max-preview"; const OPENROUTER_QWEN_3_6_27B_MODEL: &str = "qwen/qwen3.6-27b"; +const OPENROUTER_QWEN_3_6_PLUS_MODEL: &str = "qwen/qwen3.6-plus"; const OPENROUTER_TENCENT_HY3_PREVIEW_MODEL: &str = "tencent/hy3-preview"; const 
OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL: &str = "xiaomi/mimo-v2.5-pro"; const OPENROUTER_XIAOMI_MIMO_V2_5_MODEL: &str = "xiaomi/mimo-v2.5"; const DEFAULT_XIAOMI_MIMO_MODEL: &str = "mimo-v2.5-pro"; +const XIAOMI_MIMO_V2_5_OMNI_MODEL: &str = "mimo-v2.5"; +const XIAOMI_MIMO_ASR_MODEL: &str = "mimo-v2.5-asr"; const XIAOMI_MIMO_TTS_MODEL: &str = "mimo-v2.5-tts"; const XIAOMI_MIMO_TTS_VOICE_DESIGN_MODEL: &str = "mimo-v2.5-tts-voicedesign"; const XIAOMI_MIMO_TTS_VOICE_CLONE_MODEL: &str = "mimo-v2.5-tts-voiceclone"; @@ -53,8 +58,9 @@ const DEFAULT_NOVITA_FLASH_MODEL: &str = "deepseek/deepseek-v4-flash"; const DEFAULT_FIREWORKS_MODEL: &str = "accounts/fireworks/models/deepseek-v4-pro"; const DEFAULT_SILICONFLOW_MODEL: &str = "deepseek-ai/DeepSeek-V4-Pro"; const DEFAULT_SILICONFLOW_FLASH_MODEL: &str = "deepseek-ai/DeepSeek-V4-Flash"; -const DEFAULT_ARCEE_MODEL: &str = "trinity-mini"; +const DEFAULT_ARCEE_MODEL: &str = "trinity-large-thinking"; const ARCEE_TRINITY_LARGE_PREVIEW_MODEL: &str = "trinity-large-preview"; +const ARCEE_TRINITY_MINI_MODEL: &str = "trinity-mini"; const DEFAULT_MOONSHOT_MODEL: &str = "kimi-k2.6"; const DEFAULT_MOONSHOT_BASE_URL: &str = "https://api.moonshot.ai/v1"; const DEFAULT_KIMI_CODE_MODEL: &str = "kimi-for-coding"; @@ -193,7 +199,7 @@ pub struct ProvidersToml { pub volcengine: ProviderConfigToml, #[serde(default)] pub openrouter: ProviderConfigToml, - #[serde(default)] + #[serde(default, alias = "xiaomi", alias = "mimo", alias = "xiaomimimo")] pub xiaomi_mimo: ProviderConfigToml, #[serde(default)] pub novita: ProviderConfigToml, @@ -1559,8 +1565,12 @@ fn normalize_model_for_provider(provider: ProviderKind, model: &str) -> String { ProviderKind::Siliconflow, "deepseek-v4-flash" | "deepseek-v4flash" | "deepseek-chat" | "deepseek-v3", ) => DEFAULT_SILICONFLOW_FLASH_MODEL.to_string(), - (ProviderKind::Arcee, "trinity" | "arcee-trinity" | "arcee-trinity-mini") => { - DEFAULT_ARCEE_MODEL.to_string() + ( + ProviderKind::Arcee, + "trinity" | 
"arcee-trinity" | "trinity-large-thinking" | "arcee-trinity-large-thinking", + ) => DEFAULT_ARCEE_MODEL.to_string(), + (ProviderKind::Arcee, "trinity-mini" | "arcee-trinity-mini") => { + ARCEE_TRINITY_MINI_MODEL.to_string() } (ProviderKind::Arcee, "arcee-trinity-large-preview") => { ARCEE_TRINITY_LARGE_PREVIEW_MODEL.to_string() @@ -1595,8 +1605,22 @@ fn canonical_xiaomi_mimo_model_id(model: &str) -> Option<&'static str> { | "mimo-v2-5-pro" | "xiaomi-mimo-v2.5-pro" | "xiaomi-mimo-v2-5-pro" => Some(DEFAULT_XIAOMI_MIMO_MODEL), - "mimo-v2.5" | "mimo-v25" | "mimo-v2-5" | "xiaomi-mimo-v2.5" | "xiaomi-mimo-v2-5" => { - Some("mimo-v2.5") + "omni" + | "mimo-omni" + | "v2.5-omni" + | "v25-omni" + | "mimo-v2.5" + | "mimo-v25" + | "mimo-v2-5" + | "mimo-v2.5-omni" + | "mimo-v25-omni" + | "mimo-v2-5-omni" + | "xiaomi-mimo-v2.5" + | "xiaomi-mimo-v2-5" + | "xiaomi-mimo-v2.5-omni" + | "xiaomi-mimo-v2-5-omni" => Some(XIAOMI_MIMO_V2_5_OMNI_MODEL), + "asr" | "mimo-asr" | "mimo-v2.5-asr" | "speech-to-text" | "transcribe" => { + Some(XIAOMI_MIMO_ASR_MODEL) } "mimo-tts" | "mimo-v25-tts" | "mimo-v2.5-tts" | "tts" | "speech" => { Some(XIAOMI_MIMO_TTS_MODEL) @@ -1646,9 +1670,19 @@ fn canonical_openrouter_recent_model_id(model: &str) -> Option<&'static str> { | "qwen3.6-35b-a3b" | "qwen-3.6-35b-a3b" | "qwen3-6-35b-a3b" => Some(OPENROUTER_QWEN_3_6_35B_A3B_MODEL), + OPENROUTER_QWEN_3_6_FLASH_MODEL | "qwen3.6-flash" | "qwen-3.6-flash" => { + Some(OPENROUTER_QWEN_3_6_FLASH_MODEL) + } + OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL + | "qwen3.6-max-preview" + | "qwen-3.6-max-preview" + | "qwen-max-preview" => Some(OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL), OPENROUTER_QWEN_3_6_27B_MODEL | "qwen3.6-27b" | "qwen-3.6-27b" | "qwen3-6-27b" => { Some(OPENROUTER_QWEN_3_6_27B_MODEL) } + OPENROUTER_QWEN_3_6_PLUS_MODEL | "qwen3.6-plus" | "qwen-3.6-plus" => { + Some(OPENROUTER_QWEN_3_6_PLUS_MODEL) + } OPENROUTER_TENCENT_HY3_PREVIEW_MODEL | "hy3-preview" | "tencent-hy3-preview" => { 
Some(OPENROUTER_TENCENT_HY3_PREVIEW_MODEL) } @@ -3693,7 +3727,39 @@ unix_socket_path = "/tmp/cw-hooks.sock" } #[test] - fn xiaomi_mimo_tts_aliases_resolve_to_canonical_models() { + fn xiaomi_provider_alias_table_maps_to_mimo_runtime_config() { + let _lock = env_lock(); + let _env = EnvGuard::without_deepseek_runtime_overrides(); + let config: ConfigToml = toml::from_str( + r#" +provider = "xiaomi-mimo" +default_text_model = "deepseek/deepseek-v4-pro" + +[providers.xiaomi] +api_key = "mimo-table-key" +base_url = "https://token-plan-sgp.xiaomimimo.com/v1" +model = "mimo-v2.5-pro" +"#, + ) + .expect("xiaomi provider alias config"); + + let resolved = config.resolve_runtime_options(&CliRuntimeOverrides::default()); + + assert_eq!(resolved.provider, ProviderKind::XiaomiMimo); + assert_eq!(resolved.api_key.as_deref(), Some("mimo-table-key")); + assert_eq!( + resolved.base_url, + "https://token-plan-sgp.xiaomimimo.com/v1" + ); + assert_eq!(resolved.model, DEFAULT_XIAOMI_MIMO_MODEL); + } + + #[test] + fn xiaomi_mimo_aliases_resolve_to_canonical_models() { + assert_eq!( + normalize_model_for_provider(ProviderKind::XiaomiMimo, "omni"), + "mimo-v2.5" + ); assert_eq!( normalize_model_for_provider(ProviderKind::XiaomiMimo, "tts"), "mimo-v2.5-tts" @@ -4343,7 +4409,10 @@ unix_socket_path = "/tmp/cw-hooks.sock" "trinity-large-thinking", OPENROUTER_ARCEE_TRINITY_LARGE_THINKING_MODEL, ), + ("qwen3.6-flash", OPENROUTER_QWEN_3_6_FLASH_MODEL), ("qwen3.6-35b-a3b", OPENROUTER_QWEN_3_6_35B_A3B_MODEL), + ("qwen3.6-max-preview", OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL), + ("qwen3.6-plus", OPENROUTER_QWEN_3_6_PLUS_MODEL), ("mimo-v2.5-pro", OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL), ("kimi-k2.6", OPENROUTER_KIMI_K2_6_MODEL), ("gemma-4-31b-it", OPENROUTER_GEMMA_4_31B_MODEL), diff --git a/crates/core/Cargo.toml b/crates/core/Cargo.toml index 43011a6b..a03f2805 100644 --- a/crates/core/Cargo.toml +++ b/crates/core/Cargo.toml @@ -9,13 +9,13 @@ description = "Core runtime boundaries for DeepSeek 
workspace architecture" [dependencies] anyhow.workspace = true chrono.workspace = true -codewhale-agent = { path = "../agent", version = "0.8.50" } -codewhale-config = { path = "../config", version = "0.8.50" } -codewhale-execpolicy = { path = "../execpolicy", version = "0.8.50" } -codewhale-hooks = { path = "../hooks", version = "0.8.50" } -codewhale-mcp = { path = "../mcp", version = "0.8.50" } -codewhale-protocol = { path = "../protocol", version = "0.8.50" } -codewhale-state = { path = "../state", version = "0.8.50" } -codewhale-tools = { path = "../tools", version = "0.8.50" } +codewhale-agent = { path = "../agent", version = "0.8.51" } +codewhale-config = { path = "../config", version = "0.8.51" } +codewhale-execpolicy = { path = "../execpolicy", version = "0.8.51" } +codewhale-hooks = { path = "../hooks", version = "0.8.51" } +codewhale-mcp = { path = "../mcp", version = "0.8.51" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } +codewhale-state = { path = "../state", version = "0.8.51" } +codewhale-tools = { path = "../tools", version = "0.8.51" } serde_json.workspace = true uuid.workspace = true diff --git a/crates/execpolicy/Cargo.toml b/crates/execpolicy/Cargo.toml index 4214f686..721064f3 100644 --- a/crates/execpolicy/Cargo.toml +++ b/crates/execpolicy/Cargo.toml @@ -8,5 +8,5 @@ description = "Execution policy and approval model parity for DeepSeek workspace [dependencies] anyhow.workspace = true -codewhale-protocol = { path = "../protocol", version = "0.8.50" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } serde.workspace = true diff --git a/crates/hooks/Cargo.toml b/crates/hooks/Cargo.toml index c1460ab0..1ea4bfd8 100644 --- a/crates/hooks/Cargo.toml +++ b/crates/hooks/Cargo.toml @@ -10,7 +10,7 @@ description = "Hook dispatch and notifications parity for DeepSeek workspace arc anyhow.workspace = true async-trait.workspace = true chrono.workspace = true -codewhale-protocol = { path = "../protocol", version = 
"0.8.50" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } reqwest.workspace = true serde.workspace = true serde_json.workspace = true diff --git a/crates/tools/Cargo.toml b/crates/tools/Cargo.toml index ca14cd65..e3790c94 100644 --- a/crates/tools/Cargo.toml +++ b/crates/tools/Cargo.toml @@ -9,7 +9,7 @@ description = "Tool invocation lifecycle, schema validation, and scheduler paral [dependencies] anyhow.workspace = true async-trait.workspace = true -codewhale-protocol = { path = "../protocol", version = "0.8.50" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } serde.workspace = true serde_json.workspace = true thiserror.workspace = true diff --git a/crates/tui/CHANGELOG.md b/crates/tui/CHANGELOG.md index 6fdabac0..801dd9ee 100644 --- a/crates/tui/CHANGELOG.md +++ b/crates/tui/CHANGELOG.md @@ -7,12 +7,76 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.8.51] - 2026-06-02 + ### Added -- Added Arcee AI as a direct OpenAI-compatible provider with `[providers.arcee]`, - `ARCEE_API_KEY` / `ARCEE_BASE_URL` / `ARCEE_MODEL`, `trinity-mini` as the - default model, and `trinity-large-preview` as the documented direct API model. - OpenRouter's `arcee-ai/trinity-large-thinking` route remains separate. +- **Arcee AI as a direct provider.** New `[providers.arcee]` config block and + `ARCEE_API_KEY` / `ARCEE_BASE_URL` / `ARCEE_MODEL` environment variables, + wired through CLI auth (`codewhale auth set --provider arcee`), the TUI + provider picker, and the model registry. The default direct-API model is + `trinity-large-thinking` (reasoning-capable, 262K context and 262K max + output); `trinity-large-preview` (262K context, non-reasoning) and + `trinity-mini` (128K context) are also selectable. OpenRouter's + `arcee-ai/trinity-large-thinking` route remains separate. 
+- **Arcee Cloudflare-WAF compatibility.** The opening turn to the Arcee gateway + uses a benign read-only tool surface (`read_file`, `list_dir`, `file_search`, + `grep_files`, `git_status`, `git_diff`, `checklist_write`, `update_plan`) and + splits example payloads such as `python -c …` out of the system prompt, so the + WAF does not reject the first request; the full tool catalog stays reachable + through tool-search. `trinity-large-thinking`'s `reasoning_content` is + recognized and replayed on tool-call turns. +- **Expanded model catalog.** Added context-window, max-output, and + reasoning-capability metadata for additional model IDs, including + `qwen/qwen3.6-flash`, `qwen/qwen3.6-plus`, `qwen/qwen3.6-max-preview`, and + Xiaomi MiMo v2.5 chat/ASR/TTS variants; `trinity-large-preview`'s context + window was corrected to 262K. +- **Provider-aware model picker.** The picker groups models by provider, shows + per-model hints, and remembers a saved model per provider. + +### Changed + +- **Auto-compaction is now percentage- and model-aware.** The per-model + threshold helper is `compaction_threshold_for_model_at_percent(model, + percent)` (replacing the effort-based variant), and the default + `auto_compact_threshold_percent` is 80%. Auto-compaction defaults on for + models with a context window of 256K or smaller and stays opt-in for 1M-token + models (e.g. DeepSeek V4) to protect prefix-cache economics, unless the user + has explicitly set `auto_compact`. +- **Clearer provider/gateway errors.** HTTP error bodies are sanitized before + display — HTML interstitials and Cloudflare "Access Denied" pages collapse to + a one-line reason (with the ray/error ID) instead of dumping raw markup into + the transcript — and 403s are split into authentication vs. authorization + (gateway/WAF block) categories. +- The invalid-model error now names the active provider and lists Arcee among + the options. 
+ +### Removed + +- **The session "cycle" / checkpoint-restart system.** Removed the `/cycles`, + `/cycle <n>`, and `/recall` commands, the `recall_archive` tool, the + cycle-handoff briefing prompt, the sidebar "cycles" lines, and the + `cycle_manager` engine plumbing (`EngineConfig.cycle`, `Event::CycleAdvanced`, + seam-manager cycle thresholds and flash briefings). Long sessions no longer + auto-reset their context at a fixed token boundary — reclaim budget with + `/compact` or model-aware auto-compaction instead. Existing on-disk cycle + archives are left untouched but are no longer read or written. + +### Fixed + +- Assistant turns no longer leave an orphaned role glyph (the stray "blue dot") + when a turn streams only whitespace between reasoning and a tool call. +- Scrolling the mouse wheel over the right-hand sidebar no longer leaks into + the transcript scroll. +- The sidebar hover tooltip now appears only for truncated lines, sits below + the cursor, and uses a neutral surface color instead of the warning-orange + highlight that overlapped neighbouring rows. +- Corrected the README's description of the Constitution (Article VII is the + hierarchy itself; Article II's truth duty overrides even a user request) to + match `prompts/base.md`. +- Repaired release-blocking unit and integration tests left failing by the + cycle-removal and compaction-threshold refactors (relay instruction, + model-reject message, compaction budget, mock-LLM threshold helper). 
## [0.8.50] - 2026-06-02 diff --git a/crates/tui/Cargo.toml b/crates/tui/Cargo.toml index ce781812..8f88110d 100644 --- a/crates/tui/Cargo.toml +++ b/crates/tui/Cargo.toml @@ -27,11 +27,11 @@ path = "src/bin/deepseek_tui_legacy_shim.rs" [dependencies] anyhow = "1.0.100" arboard = "3.4" -codewhale-config = { path = "../config", version = "0.8.50" } -codewhale-protocol = { path = "../protocol", version = "0.8.50" } -codewhale-release = { path = "../release", version = "0.8.50" } -codewhale-secrets = { path = "../secrets", version = "0.8.50" } -codewhale-tools = { path = "../tools", version = "0.8.50" } +codewhale-config = { path = "../config", version = "0.8.51" } +codewhale-protocol = { path = "../protocol", version = "0.8.51" } +codewhale-release = { path = "../release", version = "0.8.51" } +codewhale-secrets = { path = "../secrets", version = "0.8.51" } +codewhale-tools = { path = "../tools", version = "0.8.51" } schemaui = { version = "0.12.0", default-features = false, optional = true } async-stream = "0.3.6" async-trait = "0.1" diff --git a/crates/tui/src/client.rs b/crates/tui/src/client.rs index 90e61610..8b8353d9 100644 --- a/crates/tui/src/client.rs +++ b/crates/tui/src/client.rs @@ -16,7 +16,8 @@ use tokio::sync::Mutex as AsyncMutex; use crate::config::{ApiProvider, Config, RetryPolicy, wire_model_for_provider}; use crate::llm_client::{ - LlmClient, LlmError, RetryConfig as LlmRetryConfig, extract_retry_after, with_retry, + LlmClient, LlmError, RetryConfig as LlmRetryConfig, extract_retry_after, + sanitize_http_error_body, with_retry, }; use crate::logging; use crate::models::{MessageRequest, MessageResponse, ServerToolUsage, SystemPrompt, Usage}; @@ -667,6 +668,13 @@ impl DeepSeekClient { &self.base_url } + /// Returns the active API provider for this client. Used by the turn loop + /// to apply provider-specific request policies (e.g. Arcee's reduced + /// first-turn tool surface that clears the Cloudflare WAF). 
+ pub fn api_provider(&self) -> ApiProvider { + self.api_provider + } + /// Translate text to the requested target language using a focused /// non-streaming chat completion call on the supplied model. /// @@ -731,7 +739,12 @@ impl DeepSeekClient { let status = response.status(); if !status.is_success() { - let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let raw_error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let error_text = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &raw_error_text, + ); anyhow::bail!("Failed to list models: HTTP {status}: {error_text}"); } let response_text = response.text().await.unwrap_or_default(); @@ -806,7 +819,12 @@ impl DeepSeekClient { .await?; let status = response.status(); if !status.is_success() { - let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let raw_error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let error_text = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &raw_error_text, + ); anyhow::bail!("Speech synthesis failed: HTTP {status}: {error_text}"); } @@ -897,6 +915,11 @@ impl DeepSeekClient { } let retry_after = extract_retry_after(response.headers()); let body = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let body = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &body, + ); Err(LlmError::from_http_response_with_retry_after( status.as_u16(), &body, @@ -1301,7 +1324,12 @@ impl DeepSeekClient { .await?; let status = response.status(); if !status.is_success() { - let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let raw_error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let error_text = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &raw_error_text, + ); anyhow::bail!("FIM API error: HTTP 
{status}: {error_text}"); } let response_text = response.text().await.unwrap_or_default(); diff --git a/crates/tui/src/client/chat.rs b/crates/tui/src/client/chat.rs index 6a95abf3..49621186 100644 --- a/crates/tui/src/client/chat.rs +++ b/crates/tui/src/client/chat.rs @@ -61,6 +61,7 @@ fn stream_idle_timeout() -> Duration { use crate::config::ApiProvider; use crate::llm_client::StreamEventBox; +use crate::llm_client::sanitize_http_error_body; use crate::logging; use crate::models::{ ContentBlock, ContentBlockStart, Delta, Message, MessageDelta, MessageRequest, MessageResponse, @@ -157,7 +158,12 @@ impl DeepSeekClient { let status = response.status(); if !status.is_success() { - let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let raw_error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let error_text = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &raw_error_text, + ); anyhow::bail!("Failed to call DeepSeek Chat API: HTTP {status}: {error_text}"); } @@ -246,7 +252,12 @@ impl DeepSeekClient { let status = response.status(); if !status.is_success() { - let error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let raw_error_text = bounded_error_text(response, ERROR_BODY_MAX_BYTES).await; + let error_text = sanitize_http_error_body( + Some(self.api_provider.display_name()), + status.as_u16(), + &raw_error_text, + ); // If DeepSeek rejected for missing reasoning_content despite the // sanitizer, dump the offending indices so we can diagnose where // they came from on the next failure. 
@@ -507,7 +518,7 @@ impl<'a> PromptBuilder<'a> { } fn build_for_provider(self, provider: ApiProvider) -> Vec<Value> { - build_chat_messages_with_reasoning( + let mut messages = build_chat_messages_with_reasoning( self.system, self.messages, self.model, @@ -517,7 +528,11 @@ impl<'a> PromptBuilder<'a> { self.reasoning_effort, ), false, - ) + ); + if provider == ApiProvider::Arcee { + apply_arcee_waf_safe_message_encoding(&mut messages); + } + messages } fn inspect(self) -> PromptInspection { @@ -564,6 +579,68 @@ impl<'a> PromptBuilder<'a> { } } +const ARCEE_WAF_TEXT_SPLIT_TRIGGERS: &[(&str, &str, &str)] = &[("python -c", "python ", "-c")]; + +fn apply_arcee_waf_safe_message_encoding(messages: &mut [Value]) { + for message in messages { + if message.get("role").and_then(Value::as_str) != Some("system") { + continue; + } + let Some(content) = message.get("content").and_then(Value::as_str) else { + continue; + }; + let Some(parts) = arcee_waf_safe_text_parts(content) else { + continue; + }; + message["content"] = json!(parts); + } +} + +fn arcee_waf_safe_text_parts(content: &str) -> Option<Vec<Value>> { + let mut parts = Vec::new(); + let mut cursor = 0usize; + let mut split_any = false; + + while cursor < content.len() { + let Some((trigger_start, trigger, left, right)) = next_arcee_waf_trigger(content, cursor) + else { + push_text_part(&mut parts, &content[cursor..]); + break; + }; + + push_text_part(&mut parts, &content[cursor..trigger_start]); + push_text_part(&mut parts, left); + push_text_part(&mut parts, right); + cursor = trigger_start + trigger.len(); + split_any = true; + } + + split_any.then_some(parts) +} + +fn next_arcee_waf_trigger<'a>( + content: &'a str, + cursor: usize, +) -> Option<(usize, &'a str, &'a str, &'a str)> { + ARCEE_WAF_TEXT_SPLIT_TRIGGERS + .iter() + .filter_map(|(trigger, left, right)| { + content[cursor..] 
+ .find(trigger) + .map(|offset| (cursor + offset, *trigger, *left, *right)) + }) + .min_by_key(|(start, _, _, _)| *start) +} + +fn push_text_part(parts: &mut Vec<Value>, text: &str) { + if !text.is_empty() { + parts.push(json!({ + "type": "text", + "text": text, + })); + } +} + pub(crate) const CACHE_WARMUP_USER_TAIL: &str = "请只回复 OK"; const TOOL_RESULT_SENT_CHAR_BUDGET: usize = 12_000; const TOOL_RESULT_HEAD_CHARS: usize = 4_000; @@ -1821,6 +1898,18 @@ fn is_reasoning_model_for_stream(provider: ApiProvider, model: &str) -> bool { provider_accepts_reasoning_content(provider) && model_supports_reasoning(model) } +/// Providers whose chat-completions API both returns and accepts a dedicated +/// `reasoning_content` field on assistant messages. +/// +/// Arcee is intentionally included. Trinity-Large-Thinking natively emits +/// `<think>...</think>` traces, but Arcee's hosted API serves it through vLLM +/// with `--reasoning-parser deepseek_r1`, which parses those blocks into a +/// `reasoning_content` field (verified live against `api.arcee.ai`: thinking +/// streams as `delta.reasoning_content`, the answer as `delta.content`, with no +/// `<think>` tags on the wire). Arcee's docs require replaying `reasoning_content` +/// on assistant tool-call turns; dropping it makes the model emit tool calls as +/// raw XML inside its thinking ("xml_in_reasoning" pitfall). Do not remove Arcee +/// here without new live evidence — see docs.arcee.ai/capabilities/reasoning-traces. 
fn provider_accepts_reasoning_content(provider: ApiProvider) -> bool { matches!( provider, @@ -1832,6 +1921,7 @@ fn provider_accepts_reasoning_content(provider: ApiProvider) -> bool { | ApiProvider::Novita | ApiProvider::Fireworks | ApiProvider::Siliconflow + | ApiProvider::Arcee | ApiProvider::Sglang ) } @@ -2416,6 +2506,83 @@ mod stream_diagnostics_tests { } } +#[cfg(test)] +mod arcee_waf_message_encoding_tests { + use super::build_chat_messages_for_request_and_provider; + use crate::config::ApiProvider; + use crate::models::{MessageRequest, SystemPrompt}; + use serde_json::Value; + + fn request_with_system(system: &str) -> MessageRequest { + MessageRequest { + model: "trinity-large-thinking".to_string(), + messages: Vec::new(), + max_tokens: 16, + system: Some(SystemPrompt::Text(system.to_string())), + tools: None, + tool_choice: None, + metadata: None, + thinking: None, + reasoning_effort: None, + stream: None, + temperature: None, + top_p: None, + } + } + + fn decoded_content(content: &Value) -> String { + if let Some(text) = content.as_str() { + return text.to_string(); + } + content + .as_array() + .expect("content parts") + .iter() + .map(|part| part.get("text").and_then(Value::as_str).expect("text part")) + .collect() + } + + #[test] + fn arcee_splits_waf_trigger_without_changing_decoded_system_prompt() { + let system = "Run calculations with `python -c 'print(1)'` when a tool is available."; + let request = request_with_system(system); + + let messages = build_chat_messages_for_request_and_provider(&request, ApiProvider::Arcee); + let content = &messages[0]["content"]; + + assert!( + content.is_array(), + "Arcee system content with a WAF trigger should be encoded as text parts" + ); + assert_eq!(decoded_content(content), system); + let serialized = serde_json::to_string(&messages).expect("serialize messages"); + assert!( + !serialized.contains("python -c"), + "wire JSON should not contain the Cloudflare trigger contiguously: {serialized}" + ); + } + + 
#[test] + fn non_arcee_providers_keep_system_prompt_as_string() { + let system = "Run calculations with `python -c 'print(1)'` when a tool is available."; + let request = request_with_system(system); + + let messages = build_chat_messages_for_request_and_provider(&request, ApiProvider::Openai); + + assert_eq!(messages[0]["content"].as_str(), Some(system)); + } + + #[test] + fn arcee_keeps_non_triggering_system_prompt_as_string() { + let system = "Use read-only tools to inspect files before reporting results."; + let request = request_with_system(system); + + let messages = build_chat_messages_for_request_and_provider(&request, ApiProvider::Arcee); + + assert_eq!(messages[0]["content"].as_str(), Some(system)); + } +} + // === #103 Phase 4: SSE decoder behavior on canned chunk sequences ============ #[cfg(test)] @@ -3367,6 +3534,7 @@ mod alias_thinking_detection_tests { assert!(provider_accepts_reasoning_content(ApiProvider::Deepseek)); assert!(provider_accepts_reasoning_content(ApiProvider::NvidiaNim)); assert!(provider_accepts_reasoning_content(ApiProvider::XiaomiMimo)); + assert!(provider_accepts_reasoning_content(ApiProvider::Arcee)); } #[test] @@ -3467,6 +3635,10 @@ mod alias_thinking_detection_tests { is_reasoning_model_for_stream(ApiProvider::XiaomiMimo, "mimo-v2.5-pro"), "mimo-v2.5-pro should stream reasoning as thinking on Xiaomi MiMo" ); + assert!( + is_reasoning_model_for_stream(ApiProvider::Arcee, "trinity-large-thinking"), + "trinity-large-thinking should stream reasoning as thinking on direct Arcee" + ); for model in [ "arcee-ai/trinity-large-thinking", "minimax/minimax-m3", diff --git a/crates/tui/src/commands/config.rs b/crates/tui/src/commands/config.rs index 39485c6b..4e9b9b1a 100644 --- a/crates/tui/src/commands/config.rs +++ b/crates/tui/src/commands/config.rs @@ -512,6 +512,7 @@ pub fn set_config_value(app: &mut App, key: &str, value: &str, persist: bool) -> match key.as_str() { "auto_compact" | "compact" => { app.auto_compact = 
settings.auto_compact; + app.auto_compact_user_configured = true; action = Some(AppAction::UpdateCompaction(app.compaction_config())); } "calm_mode" | "calm" => { diff --git a/crates/tui/src/commands/core.rs b/crates/tui/src/commands/core.rs index 349002c7..21e5b38d 100644 --- a/crates/tui/src/commands/core.rs +++ b/crates/tui/src/commands/core.rs @@ -4,7 +4,8 @@ use std::fmt::Write; use std::path::PathBuf; use crate::config::{ - COMMON_DEEPSEEK_MODELS, normalize_custom_model_id, normalize_model_name_for_provider, + ApiProvider, COMMON_DEEPSEEK_MODELS, normalize_custom_model_id, + normalize_model_name_for_provider, provider_passes_model_through, }; use crate::localization::{MessageId, tr}; use crate::tui::app::{App, AppAction, AppMode, ReasoningEffort}; @@ -149,8 +150,20 @@ pub fn model(app: &mut App, model_name: Option<&str>) -> CommandResult { model_id } else { let Some(model_id) = normalize_model_name_for_provider(app.api_provider, name) else { + if let Some((provider, model_id)) = saved_provider_model_match(app, name) { + return CommandResult::with_message_and_action( + format!( + "Switching provider to {} for model {model_id}.", + provider.as_str() + ), + AppAction::SwitchProvider { + provider, + model: Some(model_id), + }, + ); + } return CommandResult::error(format!( - "Invalid model '{name}'. Expected auto or a DeepSeek model ID. Common models: {}", + "Invalid model '{name}'. Expected auto, a model for the active provider, or a saved provider model. 
Common DeepSeek models: {}", COMMON_DEEPSEEK_MODELS.join(", ") )); }; @@ -179,6 +192,43 @@ pub fn model(app: &mut App, model_name: Option<&str>) -> CommandResult { } } +fn saved_provider_model_match(app: &App, name: &str) -> Option<(ApiProvider, String)> { + let requested = normalize_custom_model_id(name)?; + let mut saved = app + .provider_models + .iter() + .filter_map(|(provider_name, model)| { + let provider = ApiProvider::parse(provider_name)?; + (provider != app.api_provider).then_some((provider, model.as_str())) + }) + .collect::<Vec<_>>(); + saved.sort_by_key(|(provider, _)| provider.as_str()); + + for (provider, saved_model) in saved { + let Some(saved_model) = normalize_model_for_provider_selection(provider, saved_model) + else { + continue; + }; + let requested_model = normalize_model_for_provider_selection(provider, &requested) + .unwrap_or_else(|| requested.clone()); + if saved_model.eq_ignore_ascii_case(&requested_model) + || saved_model.eq_ignore_ascii_case(&requested) + { + return Some((provider, saved_model)); + } + } + + None +} + +fn normalize_model_for_provider_selection(provider: ApiProvider, model: &str) -> Option<String> { + if provider_passes_model_through(provider) { + normalize_custom_model_id(model) + } else { + normalize_model_name_for_provider(provider, model) + } +} + /// Fetch and list available models from the configured API endpoint. 
pub fn models(_app: &mut App) -> CommandResult { CommandResult::action(AppAction::FetchModels) @@ -433,6 +483,8 @@ mod tests { let mut app = App::new(options, &Config::default()); app.ui_locale = crate::localization::Locale::En; app.api_provider = crate::config::ApiProvider::Deepseek; + app.model = "deepseek-v4-pro".to_string(); + app.auto_model = false; app.model_ids_passthrough = false; app } @@ -791,12 +843,32 @@ mod tests { assert!(result.message.is_some()); let msg = result.message.unwrap(); assert!(msg.contains("Invalid model")); - assert!(msg.contains("DeepSeek model ID")); + assert!(msg.contains("active provider")); assert!(msg.contains("deepseek-v4-pro")); assert!(msg.contains("deepseek-v4-flash")); assert!(result.action.is_none()); } + #[test] + fn model_command_switches_to_saved_provider_model() { + let mut app = create_test_app(); + app.api_provider = crate::config::ApiProvider::Deepseek; + app.provider_models + .insert("moonshot".to_string(), "kimi-k2.6".to_string()); + + let result = model(&mut app, Some("kimi-k2.6")); + + match result.action { + Some(AppAction::SwitchProvider { provider, model }) => { + assert_eq!(provider, crate::config::ApiProvider::Moonshot); + assert_eq!(model.as_deref(), Some("kimi-k2.6")); + } + other => panic!("expected SwitchProvider action, got {other:?}"), + } + assert_eq!(app.api_provider, crate::config::ApiProvider::Deepseek); + assert_eq!(app.model, "deepseek-v4-pro"); + } + #[test] fn test_model_without_args_opens_picker() { let mut app = create_test_app(); diff --git a/crates/tui/src/commands/cycle.rs b/crates/tui/src/commands/cycle.rs deleted file mode 100644 index 7a1c9c65..00000000 --- a/crates/tui/src/commands/cycle.rs +++ /dev/null @@ -1,225 +0,0 @@ -//! Cycle commands: `/cycles` (list past cycle boundaries) and -//! `/cycle ` (show one cycle's briefing in detail). - -use std::fmt::Write; - -use crate::tui::app::App; - -use super::CommandResult; - -/// `/cycles` — list past cycle handoffs in compact form. 
-pub fn list_cycles(app: &App) -> CommandResult { - if app.cycle_briefings.is_empty() { - let msg = format!( - "No cycle boundaries have fired yet (current cycle: 1, threshold: {} tokens for {}).", - app.cycle.threshold_for(&app.model), - app.model - ); - return CommandResult::message(msg); - } - - let mut out = String::new(); - let _ = writeln!( - out, - "Cycle handoffs in this session ({} total). Active cycle: {}.", - app.cycle_briefings.len(), - app.cycle_count.saturating_add(1), - ); - out.push('\n'); - for brief in &app.cycle_briefings { - let preview = first_line(&brief.briefing_text, 80); - let _ = writeln!( - out, - " cycle {n} @ {ts} briefing: {tokens} tokens ─ {preview}", - n = brief.cycle, - ts = brief.timestamp.to_rfc3339(), - tokens = brief.token_estimate, - preview = preview, - ); - } - out.push('\n'); - out.push_str("Use `/cycle ` to show the full briefing for a specific cycle.\n"); - CommandResult::message(out) -} - -/// `/cycle ` — print the full briefing for cycle `n`. -pub fn show_cycle(app: &App, arg: Option<&str>) -> CommandResult { - let Some(raw) = arg.map(str::trim) else { - return CommandResult::error( - "Usage: /cycle — n is the cycle number from /cycles".to_string(), - ); - }; - if raw.is_empty() { - return CommandResult::error("Usage: /cycle ".to_string()); - } - let Ok(n) = raw.parse::() else { - return CommandResult::error(format!( - "Cycle number must be a positive integer (got '{raw}')." - )); - }; - - let Some(brief) = app.cycle_briefings.iter().find(|b| b.cycle == n) else { - let known: Vec = app - .cycle_briefings - .iter() - .map(|b| b.cycle.to_string()) - .collect(); - let known_str = if known.is_empty() { - "(none)".to_string() - } else { - known.join(", ") - }; - return CommandResult::error(format!( - "Cycle {n} not found in this session. Known cycles: {known_str}." 
- )); - }; - - let mut out = String::new(); - let _ = writeln!( - out, - "── Cycle {n} ({ts}) briefing: {tokens} tokens ──", - n = brief.cycle, - ts = brief.timestamp.to_rfc3339(), - tokens = brief.token_estimate, - ); - out.push('\n'); - out.push_str(brief.briefing_text.trim()); - out.push('\n'); - CommandResult::message(out) -} - -/// `/recall ` — user-initiated BM25 search of cycle archives. -/// -/// Synchronous wrapper around `tools::recall_archive::RecallArchiveTool` so -/// users can probe the archive without invoking the model. Output is the -/// same JSON payload the agent would see; the assistant pretty-prints -/// short results and dumps long ones inline. -pub fn recall_archive(app: &App, arg: Option<&str>) -> CommandResult { - use crate::tools::recall_archive::RecallArchiveTool; - use crate::tools::spec::{ToolContext, ToolSpec}; - - let Some(raw) = arg.map(str::trim) else { - return CommandResult::error("Usage: /recall ".to_string()); - }; - if raw.is_empty() { - return CommandResult::error("Usage: /recall ".to_string()); - } - - let session_id = app - .current_session_id - .clone() - .unwrap_or_else(|| "workspace".to_string()); - - let context = ToolContext::new(app.workspace.clone()).with_state_namespace(session_id); - let tool = RecallArchiveTool; - let input = serde_json::json!({"query": raw}); - - let result = tokio::task::block_in_place(|| { - tokio::runtime::Handle::current().block_on(tool.execute(input, &context)) - }); - - match result { - Ok(res) => CommandResult::message(res.content), - Err(err) => CommandResult::error(format!("recall_archive failed: {err}")), - } -} - -/// Truncate `text` to its first non-empty line, capped at `max_chars`. 
-fn first_line(text: &str, max_chars: usize) -> String { - let line = text - .lines() - .map(str::trim) - .find(|l| !l.is_empty()) - .unwrap_or(""); - if line.chars().count() <= max_chars { - line.to_string() - } else { - let prefix: String = line.chars().take(max_chars).collect(); - format!("{prefix}…") - } -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::cycle_manager::CycleBriefing; - use crate::tui::app::{App, TuiOptions}; - use chrono::Utc; - use std::path::PathBuf; - - fn test_options() -> TuiOptions { - TuiOptions { - model: "deepseek-v4-pro".to_string(), - workspace: PathBuf::from("."), - config_path: None, - config_profile: None, - allow_shell: false, - use_alt_screen: true, - use_mouse_capture: false, - use_bracketed_paste: true, - max_subagents: 1, - skills_dir: PathBuf::from("."), - memory_path: PathBuf::from("memory.md"), - notes_path: PathBuf::from("notes.txt"), - mcp_config_path: PathBuf::from("mcp.json"), - use_memory: false, - start_in_agent_mode: false, - skip_onboarding: true, - yolo: false, - resume_session_id: None, - initial_input: None, - } - } - - #[test] - fn list_cycles_reports_no_boundaries_yet() { - let app = App::new(test_options(), &crate::config::Config::default()); - let res = list_cycles(&app); - assert!(res.message.is_some()); - assert!( - res.message - .as_deref() - .unwrap() - .contains("No cycle boundaries") - ); - } - - #[test] - fn show_cycle_rejects_nonexistent_cycle() { - let app = App::new(test_options(), &crate::config::Config::default()); - let res = show_cycle(&app, Some("3")); - let msg = res.message.expect("error message"); - assert!(msg.contains("Cycle 3 not found"), "got: {msg}"); - } - - #[test] - fn list_and_show_cycles_render_briefings() { - let mut app = App::new(test_options(), &crate::config::Config::default()); - app.cycle_briefings.push(CycleBriefing { - cycle: 1, - timestamp: Utc::now(), - briefing_text: "Decision: chose A; constraint: no async.".to_string(), - token_estimate: 12, - }); - 
app.cycle_count = 1; - - let listed = list_cycles(&app).message.expect("list message"); - assert!(listed.contains("cycle 1")); - assert!(listed.contains("12 tokens")); - - let shown = show_cycle(&app, Some("1")).message.expect("show message"); - assert!(shown.contains("Decision: chose A")); - } - - #[test] - fn show_cycle_validates_argument() { - let app = App::new(test_options(), &crate::config::Config::default()); - let res = show_cycle(&app, None); - let msg = res.message.expect("error message"); - assert!(msg.contains("Usage: /cycle")); - - let res = show_cycle(&app, Some("not-a-number")); - let msg = res.message.expect("error message"); - assert!(msg.contains("must be a positive integer")); - } -} diff --git a/crates/tui/src/commands/mod.rs b/crates/tui/src/commands/mod.rs index d2dc21df..dd10da10 100644 --- a/crates/tui/src/commands/mod.rs +++ b/crates/tui/src/commands/mod.rs @@ -9,7 +9,6 @@ mod balance; mod change; mod config; mod core; -mod cycle; mod debug; mod feedback; mod goal; @@ -340,24 +339,6 @@ pub const COMMANDS: &[CommandInfo] = &[ usage: "/context", description_id: MessageId::CmdContextDescription, }, - CommandInfo { - name: "cycles", - aliases: &["zhouqi"], - usage: "/cycles", - description_id: MessageId::CmdCyclesDescription, - }, - CommandInfo { - name: "cycle", - aliases: &[], - usage: "/cycle ", - description_id: MessageId::CmdCycleDescription, - }, - CommandInfo { - name: "recall", - aliases: &[], - usage: "/recall ", - description_id: MessageId::CmdRecallDescription, - }, CommandInfo { name: "export", aliases: &["daochu"], @@ -610,9 +591,6 @@ pub fn execute(cmd: &str, app: &mut App) -> CommandResult { "load" | "jiazai" => session::load(app, arg), "compact" | "yasuo" => session::compact(app), "purge" | "qingchu" => session::purge(app), - "cycles" | "zhouqi" => cycle::list_cycles(app), - "cycle" => cycle::show_cycle(app, arg), - "recall" => cycle::recall_archive(app, arg), "export" | "daochu" => session::export(app, arg), // Config commands 
@@ -849,10 +827,6 @@ fn build_relay_instruction(app: &App, focus: Option<&str>) -> String { if let Some(budget) = app.hunt.token_budget { let _ = writeln!(out, "- Hunt token budget: {budget}"); } - if app.cycle_count > 0 { - let _ = writeln!(out, "- Cycle count: {}", app.cycle_count); - } - if let Ok(todos) = app.todos.try_lock() { let snapshot = todos.snapshot(); if !snapshot.items.is_empty() { @@ -1184,7 +1158,6 @@ mod tests { let mut app = create_test_app(); app.hunt.quarry = Some("Unify the work surface".to_string()); app.hunt.token_budget = Some(12_000); - app.cycle_count = 2; { let mut todos = app.todos.try_lock().expect("todo lock"); todos.add("inspect workspace".to_string(), TodoStatus::Completed); @@ -1220,7 +1193,6 @@ mod tests { assert!(message.contains("Requested relay focus: verify install")); assert!(message.contains("Hunt quarry: Unify the work surface")); assert!(message.contains("Hunt token budget: 12000")); - assert!(message.contains("Cycle count: 2")); assert!(message.contains("Work checklist (primary progress surface, 50% complete)")); assert!(message.contains("#1 [completed] inspect workspace")); assert!(message.contains("#2 [in_progress] patch relay command")); diff --git a/crates/tui/src/commands/provider.rs b/crates/tui/src/commands/provider.rs index 72cf1bd8..c11e564d 100644 --- a/crates/tui/src/commands/provider.rs +++ b/crates/tui/src/commands/provider.rs @@ -17,7 +17,7 @@ use super::CommandResult; /// With no args, opens the picker modal. With ` [model]`, performs /// the switch directly (e.g. `/provider nim flash` lands on /// `deepseek-ai/deepseek-v4-flash`). The optional model accepts shorthand -/// (`flash`, `pro`, `v4-flash`, `v4-pro`) or any normal DeepSeek model ID. +/// (`flash`, `pro`, `v4-flash`, `v4-pro`) or any normal provider model ID. 
pub fn provider(app: &mut App, args: Option<&str>) -> CommandResult { let trimmed = args.map(str::trim).filter(|s| !s.is_empty()); let Some(args) = trimmed else { @@ -52,7 +52,7 @@ pub fn provider(app: &mut App, args: Option<&str>) -> CommandResult { Some(normalized) => Some(normalized), None => { return CommandResult::error(format!( - "Invalid model '{raw}'. Try: flash, pro, deepseek-v4-flash, deepseek-v4-pro, or xiaomi-mimo tts." + "Invalid model '{raw}'. Try: flash, pro, deepseek-v4-flash, deepseek-v4-pro, or xiaomi-mimo omni." )); } } @@ -74,7 +74,7 @@ fn expand_model_alias_for_provider(provider: ApiProvider, name: &str) -> String if matches!(provider, ApiProvider::XiaomiMimo) { return match lower.as_str() { "pro" | "mimo" => "mimo-v2.5-pro".to_string(), - "text" => "mimo-v2.5".to_string(), + "text" | "omni" | "v2.5-omni" => "mimo-v2.5".to_string(), "tts" | "speech" | "mimo-tts" => "mimo-v2.5-tts".to_string(), "voicedesign" | "voice-design" | "mimo-voice-design" => { "mimo-v2.5-tts-voicedesign".to_string() @@ -196,6 +196,24 @@ mod tests { } } + #[test] + fn switch_to_xiaomi_mimo_accepts_chat_shorthands() { + let mut app = create_test_app(); + for (input, expected) in [ + ("xiaomi-mimo omni", "mimo-v2.5"), + ("xiaomi-mimo v2.5-omni", "mimo-v2.5"), + ] { + let result = provider(&mut app, Some(input)); + match result.action { + Some(AppAction::SwitchProvider { provider, model }) => { + assert_eq!(provider, ApiProvider::XiaomiMimo); + assert_eq!(model.as_deref(), Some(expected)); + } + other => panic!("expected SwitchProvider for {input}, got {other:?}"), + } + } + } + #[test] fn switch_to_atlascloud_emits_action() { let mut app = create_test_app(); diff --git a/crates/tui/src/commands/skills.rs b/crates/tui/src/commands/skills.rs index 7e311906..e852d030 100644 --- a/crates/tui/src/commands/skills.rs +++ b/crates/tui/src/commands/skills.rs @@ -495,9 +495,8 @@ where F: std::future::Future, { // We're on the TUI's thread, which is part of the multi-threaded runtime. 
-    // `block_in_place` + `Handle::current().block_on` is the pattern used by
-    // `commands/cycle.rs` to bridge sync slash-command handlers back into the
-    // async ecosystem.
+    // `block_in_place` + `Handle::current().block_on` bridges sync
+    // slash-command handlers back into the async ecosystem.
     tokio::task::block_in_place(|| tokio::runtime::Handle::current().block_on(future))
 }

diff --git a/crates/tui/src/compaction.rs b/crates/tui/src/compaction.rs
index 4048524d..8dcb8ccc 100644
--- a/crates/tui/src/compaction.rs
+++ b/crates/tui/src/compaction.rs
@@ -20,7 +20,7 @@ use crate::models::{
 /// Configuration for conversation compaction behavior.
 ///
 /// v0.8.11 simplified this from the prior token-OR-message-count trigger
-/// to a token-only trigger gated by an absolute floor. The
+/// to a token-only trigger. The
 /// `message_threshold` field was removed: its only purpose was to fire
 /// compaction on long sessions of small messages, which is exactly the
 /// case where rewriting the V4 prefix cache is least valuable. Token
@@ -31,13 +31,6 @@ pub struct CompactionConfig {
     pub token_threshold: usize,
     pub model: String,
     pub cache_summary: bool,
-    /// Hard floor — `should_compact` returns `false` when total session
-    /// tokens fall below this number, regardless of `enabled` or
-    /// `token_threshold`. Defaults to [`MINIMUM_AUTO_COMPACTION_TOKENS`]
-    /// (500K) for v0.8.11+. Tests that want to exercise the threshold
-    /// logic at small fixture sizes can set this to `0` to disable the
-    /// floor.
-    pub auto_floor_tokens: usize,
 }

 impl Default for CompactionConfig {
@@ -62,28 +55,10 @@ impl Default for CompactionConfig {
             token_threshold: 800_000,
             model: DEFAULT_TEXT_MODEL.to_string(),
             cache_summary: true,
-            auto_floor_tokens: MINIMUM_AUTO_COMPACTION_TOKENS,
         }
     }
 }

-/// Hard floor for automatic compaction in v0.8.11+.
-///
-/// Below this token count, `should_compact` returns `false` regardless of
-/// `enabled` or `token_threshold`. The point of the floor is V4 prefix-cache
-/// economics: compaction rewrites the stable prefix, which destroys the KV
-/// cache. At low token counts the prefix cache is healthy and compaction's
-/// cost (full re-prefill at miss prices) dwarfs its benefit (a tiny budget
-/// reclaim). Above the floor compaction can still be net-positive — cache
-/// is already pressured, the prefix has drifted, and freeing budget matters.
-///
-/// Manual `/compact` slash command bypasses this floor with explicit user
-/// agency.
-///
-/// Constant rather than configurable for v0.8.11. If anyone needs to dial
-/// it (smaller models, opinionated workflows), we can add a setting later.
-pub const MINIMUM_AUTO_COMPACTION_TOKENS: usize = 500_000;
-
 pub const KEEP_RECENT_MESSAGES: usize = 4;
 const RECENT_WORKING_SET_WINDOW: usize = 12;
 const MAX_WORKING_SET_PATHS: usize = 24;
@@ -649,21 +624,6 @@ pub fn should_compact(
         return false;
     }

-    // v0.8.11: hard floor enforcement. Below the floor (default 500K tokens
-    // — see `MINIMUM_AUTO_COMPACTION_TOKENS`), automatic compaction is
-    // refused because rewriting the prefix kills V4's prefix cache for
-    // little budget recovery. Manual `/compact` and the `compact_now` tool
-    // bypass this floor by going through different code paths.
-    if config.auto_floor_tokens > 0 {
-        let total_session_tokens: usize = messages
-            .iter()
-            .map(|m| estimate_tokens_for_message(m, false))
-            .sum();
-        if total_session_tokens < config.auto_floor_tokens {
-            return false;
-        }
-    }
-
     let plan = plan_compaction(
         messages,
         workspace,
@@ -2028,7 +1988,6 @@ mod tests {
         let config = CompactionConfig {
             enabled: true,
             token_threshold: 1_000_000,
-            auto_floor_tokens: 0,
             ..Default::default()
         };

@@ -2447,7 +2406,6 @@ mod tests {
         let config = CompactionConfig {
             enabled: true,
             token_threshold: 100, // Low threshold for testing
-            auto_floor_tokens: 0,
             ..Default::default()
         };

@@ -2474,61 +2432,16 @@ mod tests {
         assert!(!should_compact(&messages, &config, None, None, None));
     }

-    /// v0.8.11: the 500K hard floor blocks auto-compaction even when the
-    /// token-percentage threshold would otherwise fire. This is the V4
-    /// prefix-cache protection — below 500K total tokens, rewriting the
-    /// prefix loses cache for tiny budget gains.
     #[test]
-    fn auto_compaction_floor_blocks_below_500k_even_when_threshold_says_yes() {
+    fn auto_compaction_uses_token_threshold_without_fixed_floor() {
         let config = CompactionConfig {
             enabled: true,
-            token_threshold: 100, // would normally fire instantly
-            // Use the production default explicitly so this test pins the
-            // floor's contract rather than relying on `Default`.
-            auto_floor_tokens: MINIMUM_AUTO_COMPACTION_TOKENS,
+            token_threshold: 100,
             ..Default::default()
         };

         let messages: Vec = (0..10).map(|_| msg("user", &"x".repeat(50))).collect();
-        // Total tokens way under 500K, so floor blocks compaction.
-        assert!(!should_compact(&messages, &config, None, None, None));
-    }
-
-    /// v0.8.11: when total tokens cross the 500K floor, the existing
-    /// threshold/message-count logic takes over again.
-    #[test]
-    fn auto_compaction_floor_yields_to_threshold_logic_above_500k() {
-        let config = CompactionConfig {
-            enabled: true,
-            token_threshold: 2_000_000,
-            auto_floor_tokens: MINIMUM_AUTO_COMPACTION_TOKENS,
-            ..Default::default()
-        };
-
-        // Each message ~500 tokens; 1100 messages → ~550K total tokens.
-        // That's above the floor (500K) AND below the deliberately high
-        // token_threshold, so auto-compaction stays off — by threshold,
-        // not floor.
-        let messages: Vec = (0..1100).map(|_| msg("user", &"x".repeat(2000))).collect();
-        assert!(!should_compact(&messages, &config, None, None, None));
-
-        // Crank threshold below total → compaction fires now that we're
-        // past the floor.
-        let config_lower = CompactionConfig {
-            token_threshold: 100_000,
-            ..config
-        };
-        assert!(should_compact(&messages, &config_lower, None, None, None));
-    }
-
-    /// `CompactionConfig::default()` ships with the 500K floor on by
-    /// default — production callers via `..Default::default()` get the
-    /// safety guarantee automatically.
- #[test] - fn compaction_config_default_carries_500k_floor() { - let config = CompactionConfig::default(); - assert_eq!(config.auto_floor_tokens, MINIMUM_AUTO_COMPACTION_TOKENS); - assert_eq!(config.auto_floor_tokens, 500_000); + assert!(should_compact(&messages, &config, None, None, None)); } #[test] diff --git a/crates/tui/src/config.rs b/crates/tui/src/config.rs index 42a247bb..2b2f2a67 100644 --- a/crates/tui/src/config.rs +++ b/crates/tui/src/config.rs @@ -56,8 +56,11 @@ pub const OPENROUTER_KIMI_K2_6_MODEL: &str = "moonshotai/kimi-k2.6"; pub const OPENROUTER_MINIMAX_M3_MODEL: &str = "minimax/minimax-m3"; pub const OPENROUTER_NEMOTRON_3_NANO_OMNI_MODEL: &str = "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free"; +pub const OPENROUTER_QWEN_3_6_FLASH_MODEL: &str = "qwen/qwen3.6-flash"; pub const OPENROUTER_QWEN_3_6_35B_A3B_MODEL: &str = "qwen/qwen3.6-35b-a3b"; +pub const OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL: &str = "qwen/qwen3.6-max-preview"; pub const OPENROUTER_QWEN_3_6_27B_MODEL: &str = "qwen/qwen3.6-27b"; +pub const OPENROUTER_QWEN_3_6_PLUS_MODEL: &str = "qwen/qwen3.6-plus"; pub const OPENROUTER_TENCENT_HY3_PREVIEW_MODEL: &str = "tencent/hy3-preview"; pub const OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL: &str = "xiaomi/mimo-v2.5-pro"; pub const OPENROUTER_XIAOMI_MIMO_V2_5_MODEL: &str = "xiaomi/mimo-v2.5"; @@ -66,8 +69,11 @@ pub const RECENT_OPENROUTER_LARGE_MODELS: &[&str] = &[ OPENROUTER_MINIMAX_M3_MODEL, OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL, OPENROUTER_XIAOMI_MIMO_V2_5_MODEL, + OPENROUTER_QWEN_3_6_FLASH_MODEL, OPENROUTER_QWEN_3_6_35B_A3B_MODEL, + OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL, OPENROUTER_QWEN_3_6_27B_MODEL, + OPENROUTER_QWEN_3_6_PLUS_MODEL, OPENROUTER_KIMI_K2_6_MODEL, OPENROUTER_GLM_5_1_MODEL, OPENROUTER_TENCENT_HY3_PREVIEW_MODEL, @@ -78,6 +84,8 @@ pub const RECENT_OPENROUTER_LARGE_MODELS: &[&str] = &[ pub const DEFAULT_OPENROUTER_BASE_URL: &str = "https://openrouter.ai/api/v1"; pub const DEFAULT_XIAOMI_MIMO_MODEL: &str = "mimo-v2.5-pro"; pub const 
DEFAULT_XIAOMI_MIMO_BASE_URL: &str = "https://api.xiaomimimo.com/v1"; +pub const XIAOMI_MIMO_V2_5_OMNI_MODEL: &str = "mimo-v2.5"; +pub const XIAOMI_MIMO_ASR_MODEL: &str = "mimo-v2.5-asr"; pub const XIAOMI_MIMO_TTS_MODEL: &str = "mimo-v2.5-tts"; pub const XIAOMI_MIMO_TTS_VOICE_DESIGN_MODEL: &str = "mimo-v2.5-tts-voicedesign"; pub const XIAOMI_MIMO_TTS_VOICE_CLONE_MODEL: &str = "mimo-v2.5-tts-voiceclone"; @@ -90,8 +98,9 @@ pub const DEFAULT_FIREWORKS_BASE_URL: &str = "https://api.fireworks.ai/inference pub const DEFAULT_SILICONFLOW_MODEL: &str = "deepseek-ai/DeepSeek-V4-Pro"; pub const DEFAULT_SILICONFLOW_FLASH_MODEL: &str = "deepseek-ai/DeepSeek-V4-Flash"; pub const DEFAULT_SILICONFLOW_BASE_URL: &str = "https://api.siliconflow.com/v1"; -pub const DEFAULT_ARCEE_MODEL: &str = "trinity-mini"; +pub const DEFAULT_ARCEE_MODEL: &str = "trinity-large-thinking"; pub const ARCEE_TRINITY_LARGE_PREVIEW_MODEL: &str = "trinity-large-preview"; +pub const ARCEE_TRINITY_MINI_MODEL: &str = "trinity-mini"; pub const DEFAULT_ARCEE_BASE_URL: &str = "https://api.arcee.ai/api/v1"; pub const DEFAULT_MOONSHOT_MODEL: &str = "kimi-k2.6"; pub const DEFAULT_MOONSHOT_BASE_URL: &str = "https://api.moonshot.ai/v1"; @@ -328,9 +337,10 @@ pub fn provider_capability(provider: ApiProvider, resolved_model: &str) -> Provi return ProviderCapability { provider, resolved_model: resolved_model.to_string(), - context_window: 1_000_000, - max_output: 128_000, - thinking_supported: true, + context_window: crate::models::context_window_for_model(resolved_model) + .unwrap_or(crate::models::LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS), + max_output: crate::models::max_output_tokens_for_model(resolved_model).unwrap_or(4096), + thinking_supported: crate::models::model_supports_reasoning(resolved_model), cache_telemetry_supported: false, request_payload_mode: RequestPayloadMode::ChatCompletions, alias_deprecation: None, @@ -544,9 +554,19 @@ fn canonical_openrouter_recent_model_id(model: &str) -> Option<&'static str> { | 
"qwen3.6-35b-a3b" | "qwen-3.6-35b-a3b" | "qwen3-6-35b-a3b" => Some(OPENROUTER_QWEN_3_6_35B_A3B_MODEL), + OPENROUTER_QWEN_3_6_FLASH_MODEL | "qwen3.6-flash" | "qwen-3.6-flash" => { + Some(OPENROUTER_QWEN_3_6_FLASH_MODEL) + } + OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL + | "qwen3.6-max-preview" + | "qwen-3.6-max-preview" + | "qwen-max-preview" => Some(OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL), OPENROUTER_QWEN_3_6_27B_MODEL | "qwen3.6-27b" | "qwen-3.6-27b" | "qwen3-6-27b" => { Some(OPENROUTER_QWEN_3_6_27B_MODEL) } + OPENROUTER_QWEN_3_6_PLUS_MODEL | "qwen3.6-plus" | "qwen-3.6-plus" => { + Some(OPENROUTER_QWEN_3_6_PLUS_MODEL) + } OPENROUTER_TENCENT_HY3_PREVIEW_MODEL | "hy3-preview" | "tencent-hy3-preview" => { Some(OPENROUTER_TENCENT_HY3_PREVIEW_MODEL) } @@ -573,8 +593,22 @@ fn canonical_xiaomi_mimo_model_id(model: &str) -> Option<&'static str> { | "mimo-v2-5-pro" | "xiaomi-mimo-v2.5-pro" | "xiaomi-mimo-v2-5-pro" => Some(DEFAULT_XIAOMI_MIMO_MODEL), - "mimo-v2.5" | "mimo-v25" | "mimo-v2-5" | "xiaomi-mimo-v2.5" | "xiaomi-mimo-v2-5" => { - Some("mimo-v2.5") + "omni" + | "mimo-omni" + | "v2.5-omni" + | "v25-omni" + | "mimo-v2.5" + | "mimo-v25" + | "mimo-v2-5" + | "mimo-v2.5-omni" + | "mimo-v25-omni" + | "mimo-v2-5-omni" + | "xiaomi-mimo-v2.5" + | "xiaomi-mimo-v2-5" + | "xiaomi-mimo-v2.5-omni" + | "xiaomi-mimo-v2-5-omni" => Some(XIAOMI_MIMO_V2_5_OMNI_MODEL), + "asr" | "mimo-asr" | "mimo-v2.5-asr" | "speech-to-text" | "transcribe" => { + Some(XIAOMI_MIMO_ASR_MODEL) } "mimo-tts" | "mimo-v25-tts" | "mimo-v2.5-tts" | "tts" | "speech" => { Some(XIAOMI_MIMO_TTS_MODEL) @@ -600,9 +634,10 @@ fn canonical_arcee_model_id(model: &str) -> Option<&'static str> { let normalized = model.trim().to_ascii_lowercase(); let normalized = normalized.replace(['_', ' '], "-"); match normalized.as_str() { - "trinity" | "arcee-trinity" | "arcee-trinity-mini" | DEFAULT_ARCEE_MODEL => { + "trinity" | "arcee-trinity" | "trinity-large-thinking" | "arcee-trinity-large-thinking" => { Some(DEFAULT_ARCEE_MODEL) } + 
"arcee-trinity-mini" | ARCEE_TRINITY_MINI_MODEL => Some(ARCEE_TRINITY_MINI_MODEL), "arcee-trinity-large-preview" | ARCEE_TRINITY_LARGE_PREVIEW_MODEL => { Some(ARCEE_TRINITY_LARGE_PREVIEW_MODEL) } @@ -692,14 +727,7 @@ pub fn model_completion_names_for_provider(provider: ApiProvider) -> Vec<&'stati models.extend_from_slice(RECENT_OPENROUTER_LARGE_MODELS); models } - ApiProvider::XiaomiMimo => vec![ - DEFAULT_XIAOMI_MIMO_MODEL, - "mimo-v2.5", - XIAOMI_MIMO_TTS_MODEL, - XIAOMI_MIMO_TTS_VOICE_DESIGN_MODEL, - XIAOMI_MIMO_TTS_VOICE_CLONE_MODEL, - XIAOMI_MIMO_V2_TTS_MODEL, - ], + ApiProvider::XiaomiMimo => vec![DEFAULT_XIAOMI_MIMO_MODEL, XIAOMI_MIMO_V2_5_OMNI_MODEL], ApiProvider::Novita => vec![DEFAULT_NOVITA_MODEL, DEFAULT_NOVITA_FLASH_MODEL], ApiProvider::Fireworks => vec![DEFAULT_FIREWORKS_MODEL], ApiProvider::Siliconflow => { @@ -1347,9 +1375,6 @@ pub struct ContextConfig { pub l2_threshold: Option, #[serde(default)] pub l3_threshold: Option, - /// Hard cycle boundary. Default: 768000. - #[serde(default)] - pub cycle_threshold: Option, /// Model used for seam/briefing work. Default: "deepseek-v4-flash". 
#[serde(default)] pub seam_model: Option, @@ -1798,7 +1823,7 @@ pub struct ProvidersConfig { pub volcengine: ProviderConfig, #[serde(default)] pub openrouter: ProviderConfig, - #[serde(default)] + #[serde(default, alias = "xiaomi", alias = "mimo", alias = "xiaomimimo")] pub xiaomi_mimo: ProviderConfig, #[serde(default)] pub novita: ProviderConfig, @@ -2123,6 +2148,29 @@ impl Config { }) } + pub(crate) fn provider_config_for_mut(&mut self, provider: ApiProvider) -> &mut ProviderConfig { + let providers = self.providers.get_or_insert_with(ProvidersConfig::default); + match provider { + ApiProvider::Deepseek => &mut providers.deepseek, + ApiProvider::DeepseekCN => &mut providers.deepseek_cn, + ApiProvider::NvidiaNim => &mut providers.nvidia_nim, + ApiProvider::Openai => &mut providers.openai, + ApiProvider::Atlascloud => &mut providers.atlascloud, + ApiProvider::WanjieArk => &mut providers.wanjie_ark, + ApiProvider::Openrouter => &mut providers.openrouter, + ApiProvider::XiaomiMimo => &mut providers.xiaomi_mimo, + ApiProvider::Novita => &mut providers.novita, + ApiProvider::Fireworks => &mut providers.fireworks, + ApiProvider::Siliconflow => &mut providers.siliconflow, + ApiProvider::Arcee => &mut providers.arcee, + ApiProvider::Moonshot => &mut providers.moonshot, + ApiProvider::Sglang => &mut providers.sglang, + ApiProvider::Vllm => &mut providers.vllm, + ApiProvider::Ollama => &mut providers.ollama, + ApiProvider::Volcengine => &mut providers.volcengine, + } + } + pub(crate) fn provider_config(&self) -> Option<&ProviderConfig> { self.provider_config_for(self.api_provider()) } @@ -2180,17 +2228,26 @@ impl Config { if moonshot_uses_kimi_code { return DEFAULT_KIMI_CODE_MODEL.to_string(); } + if let Some(model) = self.default_text_model.as_deref() + && model.trim().eq_ignore_ascii_case("auto") + { + return "auto".to_string(); + } + if provider == ApiProvider::XiaomiMimo + && let Some(model) = self.default_text_model.as_deref() + && let Some(canonical) = 
canonical_xiaomi_mimo_model_id(model) + { + return canonical.to_string(); + } + if provider == ApiProvider::XiaomiMimo { + return DEFAULT_XIAOMI_MIMO_MODEL.to_string(); + } if let Some(model) = self.default_text_model.as_deref() && (provider_passes_model_through(provider) || self.active_provider_preserves_custom_base_url_model()) { return model.trim().to_string(); } - if let Some(model) = self.default_text_model.as_deref() - && model.trim().eq_ignore_ascii_case("auto") - { - return "auto".to_string(); - } if let Some(model) = self.default_text_model.as_deref() && let Some(normalized) = normalize_model_name_for_provider(provider, model) { @@ -3993,10 +4050,6 @@ fn merge_config(base: Config, override_cfg: Config) -> Config { .context .l3_threshold .or(base.context.l3_threshold), - cycle_threshold: override_cfg - .context - .cycle_threshold - .or(base.context.cycle_threshold), seam_model: override_cfg.context.seam_model.or(base.context.seam_model), }, subagents: override_cfg.subagents.or(base.subagents), @@ -6697,7 +6750,10 @@ api_key = "old-openrouter-key" "trinity-large-thinking", OPENROUTER_ARCEE_TRINITY_LARGE_THINKING_MODEL, ), + ("qwen3.6-flash", OPENROUTER_QWEN_3_6_FLASH_MODEL), ("qwen3.6-35b-a3b", OPENROUTER_QWEN_3_6_35B_A3B_MODEL), + ("qwen3.6-max-preview", OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL), + ("qwen3.6-plus", OPENROUTER_QWEN_3_6_PLUS_MODEL), ("mimo-v2.5-pro", OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL), ("kimi-k2.6", OPENROUTER_KIMI_K2_6_MODEL), ("minimax-m3", OPENROUTER_MINIMAX_M3_MODEL), @@ -6716,7 +6772,10 @@ api_key = "old-openrouter-key" for (alias, expected) in [ ("trinity", DEFAULT_ARCEE_MODEL), ("arcee-trinity", DEFAULT_ARCEE_MODEL), - ("arcee-trinity-mini", DEFAULT_ARCEE_MODEL), + ("trinity-large-thinking", DEFAULT_ARCEE_MODEL), + ("arcee-trinity-large-thinking", DEFAULT_ARCEE_MODEL), + ("arcee-trinity-mini", ARCEE_TRINITY_MINI_MODEL), + ("trinity-mini", ARCEE_TRINITY_MINI_MODEL), ( "arcee-trinity-large-preview", ARCEE_TRINITY_LARGE_PREVIEW_MODEL, @@ 
-6731,7 +6790,11 @@ api_key = "old-openrouter-key" } #[test] - fn normalize_xiaomi_mimo_tts_aliases_for_provider() { + fn normalize_xiaomi_mimo_aliases_for_provider() { + assert_eq!( + normalize_model_name_for_provider(ApiProvider::XiaomiMimo, "omni").as_deref(), + Some("mimo-v2.5") + ); assert_eq!( normalize_model_name_for_provider(ApiProvider::XiaomiMimo, "tts").as_deref(), Some("mimo-v2.5-tts") @@ -6747,17 +6810,27 @@ api_key = "old-openrouter-key" } #[test] - fn model_completion_names_for_xiaomi_mimo_include_tts_models() { + fn model_completion_names_for_xiaomi_mimo_include_chat_models() { let models = model_completion_names_for_provider(ApiProvider::XiaomiMimo); - for expected in [ - "mimo-v2.5-pro", - "mimo-v2.5", + for expected in ["mimo-v2.5-pro", "mimo-v2.5"] { + assert!(models.contains(&expected), "missing {expected}"); + } + for deprecated in ["mimo-v2-pro", "mimo-v2-omni", "mimo-v2-flash"] { + assert!( + !models.contains(&deprecated), + "{deprecated} is deprecated and should not be promoted" + ); + } + for speech_model in [ "mimo-v2.5-tts", "mimo-v2.5-tts-voicedesign", "mimo-v2.5-tts-voiceclone", "mimo-v2-tts", ] { - assert!(models.contains(&expected), "missing {expected}"); + assert!( + !models.contains(&speech_model), + "{speech_model} belongs in speech/TTS selection, not /model" + ); } } @@ -6779,7 +6852,11 @@ api_key = "old-openrouter-key" OPENROUTER_ARCEE_TRINITY_LARGE_THINKING_MODEL, OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL, OPENROUTER_MINIMAX_M3_MODEL, + OPENROUTER_QWEN_3_6_FLASH_MODEL, OPENROUTER_QWEN_3_6_35B_A3B_MODEL, + OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL, + OPENROUTER_QWEN_3_6_27B_MODEL, + OPENROUTER_QWEN_3_6_PLUS_MODEL, OPENROUTER_GEMMA_4_31B_MODEL, ] { assert!(models.contains(&expected), "missing {expected}"); @@ -6818,7 +6895,6 @@ api_key = "old-openrouter-key" let config = Config::default(); assert!(!config.context.enabled.unwrap_or(false)); assert_eq!(config.context.l1_threshold.unwrap_or(192_000), 192_000); - 
assert_eq!(config.context.cycle_threshold.unwrap_or(768_000), 768_000); assert_eq!( config .context @@ -6859,7 +6935,6 @@ api_key = "old-openrouter-key" l1_threshold = 111 l2_threshold = 222 l3_threshold = 333 - cycle_threshold = 444 "#, )?; @@ -7235,6 +7310,45 @@ http_headers = { "X-Model-Provider-Id" = "from-file" } Ok(()) } + #[test] + fn xiaomi_mimo_provider_ignores_non_mimo_root_default_model() -> Result<()> { + let config = Config { + provider: Some("xiaomi-mimo".to_string()), + default_text_model: Some(DEFAULT_OPENROUTER_MODEL.to_string()), + ..Default::default() + }; + + config.validate()?; + assert_eq!(config.api_provider(), ApiProvider::XiaomiMimo); + assert_eq!(config.default_model(), DEFAULT_XIAOMI_MIMO_MODEL); + Ok(()) + } + + #[test] + fn xiaomi_provider_alias_table_maps_to_mimo_config() -> Result<()> { + let config: Config = toml::from_str( + r#" +provider = "xiaomi-mimo" +default_text_model = "deepseek/deepseek-v4-pro" + +[providers.xiaomi] +api_key = "mimo-table-key" +base_url = "https://token-plan-sgp.xiaomimimo.com/v1" +model = "mimo-v2.5-pro" +"#, + )?; + + config.validate()?; + assert_eq!(config.api_provider(), ApiProvider::XiaomiMimo); + assert_eq!(config.deepseek_api_key()?, "mimo-table-key"); + assert_eq!( + config.deepseek_base_url(), + "https://token-plan-sgp.xiaomimimo.com/v1" + ); + assert_eq!(config.default_model(), DEFAULT_XIAOMI_MIMO_MODEL); + Ok(()) + } + #[test] fn xiaomi_mimo_env_overrides_provider_base_url_model_and_key() -> Result<()> { let _lock = lock_test_env(); @@ -8964,6 +9078,11 @@ model = "deepseek-ai/deepseek-v4-pro" 262_144, 262_144, ), + (OPENROUTER_QWEN_3_6_FLASH_MODEL, 1_000_000, 65_536), + (OPENROUTER_QWEN_3_6_35B_A3B_MODEL, 262_144, 262_140), + (OPENROUTER_QWEN_3_6_MAX_PREVIEW_MODEL, 262_144, 65_536), + (OPENROUTER_QWEN_3_6_27B_MODEL, 262_144, 262_140), + (OPENROUTER_QWEN_3_6_PLUS_MODEL, 1_000_000, 65_536), (OPENROUTER_XIAOMI_MIMO_V2_5_PRO_MODEL, 1_000_000, 131_072), (OPENROUTER_MINIMAX_M3_MODEL, 1_000_000, 
524_288), ] { @@ -8982,10 +9101,25 @@ model = "deepseek-ai/deepseek-v4-pro" #[test] fn provider_capability_arcee_direct_models_use_api_docs_shape() { - for model in [DEFAULT_ARCEE_MODEL, ARCEE_TRINITY_LARGE_PREVIEW_MODEL] { + let thinking_cap = provider_capability(ApiProvider::Arcee, DEFAULT_ARCEE_MODEL); + assert_eq!(thinking_cap.context_window, 262_144); + assert_eq!(thinking_cap.max_output, 262_144); + assert!(thinking_cap.thinking_supported); + assert!(!thinking_cap.cache_telemetry_supported); + assert_eq!( + thinking_cap.request_payload_mode, + RequestPayloadMode::ChatCompletions + ); + + for model in [ARCEE_TRINITY_LARGE_PREVIEW_MODEL, ARCEE_TRINITY_MINI_MODEL] { let cap = provider_capability(ApiProvider::Arcee, model); - assert_eq!(cap.context_window, 128_000); + let expected_window = if model == ARCEE_TRINITY_LARGE_PREVIEW_MODEL { + 262_144 + } else { + 128_000 + }; + assert_eq!(cap.context_window, expected_window); assert_eq!(cap.max_output, 4096); assert!(!cap.thinking_supported); assert!(!cap.cache_telemetry_supported); @@ -9000,7 +9134,7 @@ model = "deepseek-ai/deepseek-v4-pro" fn provider_capability_xiaomi_mimo_has_thinking_no_cache() { let cap = provider_capability(ApiProvider::XiaomiMimo, DEFAULT_XIAOMI_MIMO_MODEL); assert_eq!(cap.context_window, 1_000_000); - assert_eq!(cap.max_output, 128_000); + assert_eq!(cap.max_output, 131_072); assert!(cap.thinking_supported); assert!(!cap.cache_telemetry_supported); assert_eq!( diff --git a/crates/tui/src/core/engine.rs b/crates/tui/src/core/engine.rs index 23b4194a..8efb9570 100644 --- a/crates/tui/src/core/engine.rs +++ b/crates/tui/src/core/engine.rs @@ -26,10 +26,6 @@ use crate::compaction::{ CompactionConfig, compact_messages_safe, merge_system_prompts, should_compact, }; use crate::config::{ApiProvider, Config, DEFAULT_MAX_SUBAGENTS, DEFAULT_TEXT_MODEL}; -use crate::cycle_manager::{ - CycleBriefing, CycleConfig, StructuredState, archive_cycle, build_seed_messages, - estimate_briefing_tokens, 
produce_briefing, should_advance_cycle, -}; use crate::error_taxonomy::{ErrorCategory, ErrorEnvelope, StreamError}; use crate::features::{Feature, Features}; use crate::llm_client::LlmClient; @@ -44,19 +40,21 @@ use crate::prompts; use crate::purge::{emit_purge_completed, emit_purge_failed, emit_purge_started, run_purge}; use crate::seam_manager::{SeamConfig, SeamManager}; use crate::tools::goal::{SharedGoalState, new_shared_goal_state}; -use crate::tools::plan::{SharedPlanState, new_shared_plan_state}; +use crate::tools::plan::{PlanSnapshot, SharedPlanState, new_shared_plan_state}; use crate::tools::shell::{SharedShellManager, new_shared_shell_manager}; use crate::tools::spec::RuntimeToolServices; use crate::tools::spec::{ApprovalRequirement, ToolError, ToolResult}; use crate::tools::subagent::{ - Mailbox, SharedSubAgentManager, SubAgentCompletion, SubAgentForkContext, SubAgentRuntime, - SubAgentType, new_shared_subagent_manager, resolve_subagent_assignment_route, + Mailbox, SharedSubAgentManager, SubAgentCompletion, SubAgentForkContext, SubAgentResult, + SubAgentRuntime, SubAgentStatus, SubAgentType, new_shared_subagent_manager, + resolve_subagent_assignment_route, }; -use crate::tools::todo::{SharedTodoList, new_shared_todo_list}; +use crate::tools::todo::{SharedTodoList, TodoListSnapshot, new_shared_todo_list}; use crate::tools::user_input::{UserInputRequest, UserInputResponse}; use crate::tools::{ToolContext, ToolRegistryBuilder}; use crate::tui::app::AppMode; use crate::utils::spawn_supervised; +use crate::working_set::WorkingSet; use super::capacity::{ CapacityController, CapacityControllerConfig, CapacityDecision, CapacityObservationInput, @@ -73,6 +71,139 @@ use super::session::Session; use super::tool_parser; use super::turn::{TurnContext, TurnToolCall, post_turn_snapshot, pre_turn_snapshot}; +/// Snapshot of parent state that can be passed to forked sub-agents without +/// rewriting the parent transcript. 
+#[derive(Debug, Clone, Default)] +struct StructuredState { + mode_label: String, + workspace: PathBuf, + cwd: Option<PathBuf>, + working_set_summary: Option<String>, + todo_snapshot: Option<TodoListSnapshot>, + plan_snapshot: Option<PlanSnapshot>, + subagent_snapshots: Vec<SubAgentResult>, +} + +impl StructuredState { + async fn capture( + mode_label: impl Into<String>, + workspace: PathBuf, + cwd: Option<PathBuf>, + working_set: &WorkingSet, + todos: &SharedTodoList, + plan_state: &SharedPlanState, + subagents: Option<&SharedSubAgentManager>, + ) -> Self { + let working_set_summary = working_set.summary_block(&workspace); + + let todo_snapshot = { + let guard = todos.lock().await; + let snap = guard.snapshot(); + if snap.items.is_empty() { + None + } else { + Some(snap) + } + }; + + let plan_snapshot = { + let guard = plan_state.lock().await; + if guard.is_empty() { + None + } else { + Some(guard.snapshot()) + } + }; + + let subagent_snapshots = if let Some(handle) = subagents { + let guard = handle.read().await; + guard + .list() + .into_iter() + .filter(|s| matches!(s.status, SubAgentStatus::Running)) + .collect() + } else { + Vec::new() + }; + + Self { + mode_label: mode_label.into(), + workspace, + cwd, + working_set_summary, + todo_snapshot, + plan_snapshot, + subagent_snapshots, + } + } + + #[must_use] + fn to_system_block(&self) -> Option<String> { + let mut out = String::new(); + out.push_str("## Fork State\n\n"); + out.push_str(&format!("- Mode: `{}`\n", self.mode_label)); + out.push_str(&format!("- Workspace: `{}`\n", self.workspace.display())); + if let Some(cwd) = self.cwd.as_ref() { + out.push_str(&format!("- Cwd: `{}`\n", cwd.display())); + } + + if self.todo_snapshot.is_some() || self.plan_snapshot.is_some() { + out.push_str("\n### Work\n"); + } + + if let Some(todos) = self.todo_snapshot.as_ref() { + out.push_str(&format!( + "\nChecklist ({}% complete)\n", + todos.completion_pct + )); + for item in &todos.items { + let marker = match item.status { + crate::tools::todo::TodoStatus::Pending => "[ ]", + 
crate::tools::todo::TodoStatus::InProgress => "[~]", + crate::tools::todo::TodoStatus::Completed => "[x]", + }; + out.push_str(&format!("- {marker} {}\n", item.content)); + } + } + + if let Some(plan) = self.plan_snapshot.as_ref() { + out.push_str("\nStrategy metadata\n"); + if let Some(explanation) = plan.explanation.as_ref() { + out.push_str(&format!("{explanation}\n\n")); + } + for item in &plan.items { + let marker = match item.status { + crate::tools::plan::StepStatus::Pending => "[ ]", + crate::tools::plan::StepStatus::InProgress => "[~]", + crate::tools::plan::StepStatus::Completed => "[x]", + }; + out.push_str(&format!("- {marker} {}\n", item.step)); + } + } + + if !self.subagent_snapshots.is_empty() { + out.push_str("\n### Open Sub-Agents\n"); + for s in &self.subagent_snapshots { + let role = s.assignment.role.as_deref().unwrap_or("-"); + let goal = if s.assignment.objective.is_empty() { + "(no objective set)" + } else { + s.assignment.objective.as_str() + }; + out.push_str(&format!("- `{}` (role: {}) - {}\n", s.agent_id, role, goal)); + } + } + + if let Some(working_set) = self.working_set_summary.as_deref() { + out.push('\n'); + out.push_str(working_set); + out.push('\n'); + } + + Some(out) + } +} + // === Types === /// Configuration for the engine @@ -115,16 +246,7 @@ pub struct EngineConfig { /// Feature flags controlling tool availability. pub features: Features, /// Auto-compaction settings for long conversations. - /// - /// As of v0.6.6 the high-level summarization compaction (`compact_messages_safe`) - /// is **disabled by default**; the checkpoint-restart cycle architecture - /// (`cycle_manager`) replaces it. The compaction config is still wired through - /// for the per-tool-result truncation path (`compact_tool_result_for_context`) - /// and for users who explicitly opt back in through the `auto_compact` - /// setting or a direct engine config. pub compaction: CompactionConfig, - /// Checkpoint-restart cycle settings (issue #124). 
- pub cycle: CycleConfig, /// Capacity-controller settings. pub capacity: CapacityControllerConfig, /// Shared Todo list state. @@ -223,7 +345,6 @@ impl Default for EngineConfig { max_subagents: DEFAULT_MAX_SUBAGENTS, features: Features::with_defaults(), compaction: CompactionConfig::default(), - cycle: CycleConfig::default(), capacity: CapacityControllerConfig::default(), todos: new_shared_todo_list(), plan_state: new_shared_plan_state(), @@ -550,10 +671,6 @@ impl Engine { .context .l3_threshold .unwrap_or(crate::seam_manager::DEFAULT_L3_THRESHOLD), - cycle_threshold: api_config - .context - .cycle_threshold - .unwrap_or(crate::seam_manager::DEFAULT_CYCLE_THRESHOLD), seam_model: api_config .context .seam_model @@ -1573,16 +1690,6 @@ impl Engine { ) .await; - // Checkpoint-restart cycle boundary (issue #124). Run BEFORE - // TurnComplete so the engine loop doesn't block the terminal after - // the turn signal (#234). The status chip ("↻ context refreshing...") - // is visible during the wait, and once TurnComplete fires the - // terminal is immediately responsive. No-op unless the estimated - // input tokens have crossed the per-cycle threshold. - if matches!(status, TurnOutcomeStatus::Completed) { - self.maybe_advance_cycle(mode).await; - } - // Update session usage self.session.total_usage.add(&turn.usage); @@ -1848,10 +1955,6 @@ impl Engine { .token_threshold .min(target_budget.saturating_sub(1)) .max(1); - // v0.8.11: forced compaction (capacity guardrail) bypasses the floor - // because we're at a hard ceiling and have to free budget regardless - // of cache cost. - forced_config.auto_floor_tokens = 0; match compact_messages_safe( client, @@ -2142,205 +2245,6 @@ impl Engine { ))) .await; } - /// its token threshold (issue #124). No-op in the common case. - /// - /// Caller must invoke this only at a clean turn boundary (no in-flight - /// tool, no open stream, no pending approval modal). 
The phase guard - /// inside `should_advance_cycle` is a defence-in-depth check; the - /// engine's wider state machine is the primary enforcement layer. - /// - /// Sub-agents are intentionally NOT awaited: each sub-agent has its own - /// context, the parent's reset doesn't invalidate them. Their handles - /// are captured in the structured-state block so the next cycle can see - /// they're still running. - async fn maybe_advance_cycle(&mut self, mode: AppMode) { - if !should_advance_cycle( - self.estimated_input_tokens() as u64, - turn_response_headroom_tokens(), - &self.session.model, - &self.config.cycle, - false, - ) { - return; - } - - let Some(client) = self.deepseek_client.clone() else { - crate::logging::warn( - "Cycle boundary skipped: API client not configured for briefing turn", - ); - return; - }; - - let from = self.session.cycle_count; - let to = from.saturating_add(1); - let archive_started = self.session.current_cycle_started; - let max_briefing_tokens = self.config.cycle.briefing_max_for(&self.session.model); - - let _ = self - .tx_event - .send(Event::status(format!( - "↻ context refreshing (cycle {from} → {to}, generating briefing…)" - ))) - .await; - - // 1. Generate the model-curated briefing. Prefer the Flash seam - // manager (#159) for cost and speed; fall back to the main model - // (legacy produce_briefing) when the seam manager isn't available. 
- let briefing_text = if let Some(ref seam_mgr) = self.seam_manager { - let seams = seam_mgr.collect_seam_texts(&self.session.messages).await; - let state_text = { - let s = StructuredState::capture( - mode.label(), - self.config.workspace.clone(), - std::env::current_dir().ok(), - &self.session.working_set, - &self.config.todos, - &self.config.plan_state, - Some(&self.subagent_manager), - ) - .await; - s.to_system_block() - }; - match seam_mgr - .produce_flash_briefing(&seams, state_text.as_deref()) - .await - { - Ok(text) => text, - Err(err) => { - crate::logging::warn(format!( - "Flash briefing failed, falling back to main model: {err}" - )); - match produce_briefing( - &client, - &self.session.model, - &self.session.messages, - max_briefing_tokens, - ) - .await - { - Ok(text) => text, - Err(err2) => { - crate::logging::warn(format!( - "Cycle briefing turn failed; skipping cycle advance: {err2}" - )); - let _ = self - .tx_event - .send(Event::status(format!( - "↻ cycle handoff failed (continuing in cycle {from}): {err2}" - ))) - .await; - return; - } - } - } - } - } else { - match produce_briefing( - &client, - &self.session.model, - &self.session.messages, - max_briefing_tokens, - ) - .await - { - Ok(text) => text, - Err(err) => { - crate::logging::warn(format!( - "Cycle briefing turn failed; skipping cycle advance: {err}" - )); - let _ = self - .tx_event - .send(Event::status(format!( - "↻ cycle handoff failed (continuing in cycle {from}): {err}" - ))) - .await; - return; - } - } - }; - - let briefing_tokens = estimate_briefing_tokens(&briefing_text); - let now = chrono::Utc::now(); - let briefing = CycleBriefing { - cycle: to, - timestamp: now, - briefing_text: briefing_text.clone(), - token_estimate: briefing_tokens, - }; - - // 2. Archive the cycle to disk. 
If the archive write fails we still - // proceed with the swap — the briefing alone preserves enough - // state to continue, and the user can recover the lost archive - // from their session log if needed. - match archive_cycle( - &self.session.id, - to, - &self.session.messages, - &self.session.model, - archive_started, - ) { - Ok(path) => { - crate::logging::info(format!("Cycle {to} archived to {}", path.display())); - } - Err(err) => { - crate::logging::warn(format!( - "Failed to archive cycle {to}; continuing with swap: {err}" - )); - } - } - - // 3. Capture structured state. Locks are held only for the snapshot. - let state = StructuredState::capture( - mode.label(), - self.config.workspace.clone(), - std::env::current_dir().ok(), - &self.session.working_set, - &self.config.todos, - &self.config.plan_state, - Some(&self.subagent_manager), - ) - .await; - let state_block = state.to_system_block(); - - // 4. Build the seed messages. The next cycle starts with the - // base system prompt (refreshed below) and these seeds. - let seed_messages = build_seed_messages( - state_block.as_deref(), - Some(&briefing), - None, // pending_user_message — pulled from steer/queue elsewhere - ); - - // 5. Atomic swap. - self.session.messages = seed_messages; - self.session.cycle_count = to; - self.session.current_cycle_started = now; - self.session.cycle_briefings.push(briefing.clone()); - // Reset seam tracking for the new cycle. - if let Some(ref seam_mgr) = self.seam_manager { - seam_mgr.reset().await; - } - // Drop any compaction summary — that path is incompatible with the - // fresh-context model and would Frankenstein-merge with the briefing. 
- self.session.compaction_summary_prompt = None; - self.refresh_system_prompt(mode); - self.emit_session_updated().await; - - let _ = self - .tx_event - .send(Event::CycleAdvanced { - from, - to, - briefing: briefing.clone(), - }) - .await; - let _ = self - .tx_event - .send(Event::status(format!( - "↻ context refreshed (cycle {from} → {to}, briefing: {briefing_tokens} tokens carried)" - ))) - .await; - } - /// Refresh the system prompt based on current mode and context. fn refresh_system_prompt(&mut self, mode: AppMode) { let user_memory_block = @@ -2636,7 +2540,6 @@ use context::{ COMPACTION_SUMMARY_MARKER, MAX_CONTEXT_RECOVERY_ATTEMPTS, MIN_RECENT_MESSAGES_TO_KEEP, context_input_budget, effective_max_output_tokens, estimate_input_tokens_conservative, extract_compaction_summary_prompt, is_context_length_error_message, summarize_text, - turn_response_headroom_tokens, }; mod dispatch; mod loop_guard; @@ -2674,10 +2577,10 @@ use self::streaming::{ }; use self::tool_catalog::{ CODE_EXECUTION_TOOL_NAME, JS_EXECUTION_TOOL_NAME, MULTI_TOOL_PARALLEL_NAME, - REQUEST_USER_INPUT_NAME, active_tools_for_step, build_model_tool_catalog, - ensure_advanced_tooling, execute_code_execution_tool, execute_tool_search, - initial_active_tools, is_tool_search_tool, maybe_hydrate_requested_deferred_tool, - missing_tool_error_message, + REQUEST_USER_INPUT_NAME, active_tools_for_step, apply_provider_tool_policy, + build_model_tool_catalog, ensure_advanced_tooling, execute_code_execution_tool, + execute_tool_search, initial_active_tools, is_tool_search_tool, + maybe_hydrate_requested_deferred_tool, missing_tool_error_message, }; #[cfg(test)] use self::tool_catalog::{ diff --git a/crates/tui/src/core/engine/context.rs b/crates/tui/src/core/engine/context.rs index 7d3e8832..08ce9004 100644 --- a/crates/tui/src/core/engine/context.rs +++ b/crates/tui/src/core/engine/context.rs @@ -586,10 +586,6 @@ pub(super) fn context_input_budget(model: &str) -> Option { .and_then(|v| 
v.checked_sub(CONTEXT_HEADROOM_TOKENS)) } -pub(super) fn turn_response_headroom_tokens() -> u64 { - u64::from(TURN_MAX_OUTPUT_TOKENS).saturating_add(CONTEXT_HEADROOM_TOKENS as u64) -} - pub(super) fn is_context_length_error_message(message: &str) -> bool { crate::error_taxonomy::classify_error_message(message) == ErrorCategory::InvalidInput } diff --git a/crates/tui/src/core/engine/tests.rs b/crates/tui/src/core/engine/tests.rs index 48491277..6043a608 100644 --- a/crates/tui/src/core/engine/tests.rs +++ b/crates/tui/src/core/engine/tests.rs @@ -584,6 +584,105 @@ fn model_tool_catalog_applies_native_and_mcp_deferral() { assert_eq!(defer_loading("mcp_server_write"), Some(true)); } +#[test] +fn arcee_provider_policy_defers_risky_tools_keeps_read_only_and_tool_search() { + let always_load = HashSet::new(); + let mut catalog = vec![ + api_tool("read_file"), + api_tool("list_dir"), + api_tool("git_status"), + api_tool("git_diff"), + api_tool("grep_files"), + api_tool("file_search"), + api_tool("update_plan"), + api_tool("checklist_write"), + api_tool("exec_shell"), + api_tool("apply_patch"), + api_tool("write_file"), + api_tool("edit_file"), + api_tool("fetch_url"), + api_tool("web_search"), + api_tool("tool_search_tool_regex"), + api_tool("tool_search_tool_bm25"), + ]; + + apply_provider_tool_policy(&mut catalog, ApiProvider::Arcee, &always_load); + + let defer = |name: &str| { + catalog + .iter() + .find(|tool| tool.name == name) + .and_then(|tool| tool.defer_loading) + }; + + // Benign read-only first-turn set stays active so the opening Arcee + // request clears Cloudflare's WAF. + for active in [ + "read_file", + "list_dir", + "git_status", + "git_diff", + "grep_files", + "file_search", + "update_plan", + "checklist_write", + ] { + assert_eq!(defer(active), Some(false), "{active} should stay active"); + } + // Tool-search stays active so the deferred tail remains discoverable. 
+ assert_eq!(defer("tool_search_tool_regex"), Some(false)); + assert_eq!(defer("tool_search_tool_bm25"), Some(false)); + // WAF-risky / mutating tools are deferred on the first Arcee turn. + for deferred in [ + "exec_shell", + "apply_patch", + "write_file", + "edit_file", + "fetch_url", + "web_search", + ] { + assert_eq!(defer(deferred), Some(true), "{deferred} should be deferred"); + } + + let active = initial_active_tools(&catalog); + assert!(active.contains("read_file")); + assert!(active.contains("tool_search_tool_regex")); + assert!(!active.contains("exec_shell")); + assert!(!active.contains("apply_patch")); +} + +#[test] +fn provider_tool_policy_is_noop_for_non_waf_providers() { + let always_load = HashSet::new(); + let mut catalog = vec![api_tool("exec_shell"), api_tool("read_file")]; + + // DeepSeek has no reduced first-turn surface: the policy must leave the + // default deferral flags untouched (here: still unset). + apply_provider_tool_policy(&mut catalog, ApiProvider::Deepseek, &always_load); + + assert!(catalog.iter().all(|tool| tool.defer_loading.is_none())); +} + +#[test] +fn arcee_provider_policy_honors_always_load_override() { + let mut always_load = HashSet::new(); + always_load.insert("exec_shell".to_string()); + let mut catalog = vec![api_tool("exec_shell"), api_tool("apply_patch")]; + + apply_provider_tool_policy(&mut catalog, ApiProvider::Arcee, &always_load); + + let defer = |name: &str| { + catalog + .iter() + .find(|tool| tool.name == name) + .and_then(|tool| tool.defer_loading) + }; + // A user-pinned always_load tool stays active even on Arcee. + assert_eq!(defer("exec_shell"), Some(false)); + // Other risky tools remain deferred. 
+ assert_eq!(defer("apply_patch"), Some(true)); +} + #[test] fn agent_catalog_keeps_edit_file_loaded_when_fuzz_is_omitted() { let (engine, _handle) = Engine::new(EngineConfig::default(), &Config::default()); @@ -684,7 +783,6 @@ fn print_agent_tool_catalog_metrics() { .with_plan_tool(new_shared_plan_state()) .with_review_tool(None, DEFAULT_TEXT_MODEL.to_string()) .with_rlm_tool(None, DEFAULT_TEXT_MODEL.to_string()) - .with_recall_archive_tool() .with_notify_tool() .with_subagent_tools(manager, runtime) .build(context); @@ -1172,8 +1270,6 @@ fn turn_tool_registry_builder_keeps_plan_mode_read_only_for_files() { assert!(registry.contains("task_list")); assert!(registry.contains("task_read")); assert!(registry.contains("handle_read")); - assert!(registry.contains("recall_archive")); - let plan_state_tools = [ "checklist_add", "checklist_update", @@ -1332,26 +1428,6 @@ fn plan_mode_toggle_preserves_catalog_byte_stability() { ); } -#[test] -fn parent_turn_registry_includes_recall_archive_for_investigative_modes() { - let (engine, _handle) = Engine::new(EngineConfig::default(), &Config::default()); - - for mode in [AppMode::Plan, AppMode::Agent, AppMode::Yolo] { - let registry = engine - .build_turn_tool_registry_builder( - mode, - engine.config.todos.clone(), - engine.config.plan_state.clone(), - ) - .build(engine.build_tool_context(mode, false)); - - assert!( - registry.contains("recall_archive"), - "parent {mode:?} registry should expose recall_archive" - ); - } -} - #[test] fn parent_turn_registry_includes_goal_tools_for_all_modes() { let (engine, _handle) = Engine::new(EngineConfig::default(), &Config::default()); diff --git a/crates/tui/src/core/engine/tool_catalog.rs b/crates/tui/src/core/engine/tool_catalog.rs index de4690c1..e88416a3 100644 --- a/crates/tui/src/core/engine/tool_catalog.rs +++ b/crates/tui/src/core/engine/tool_catalog.rs @@ -11,6 +11,7 @@ use std::time::Duration; use serde_json::{Value, json}; +use crate::config::ApiProvider; use 
crate::models::Tool; use crate::tools::spec::{ToolError, ToolResult, optional_u64, required_str}; use crate::tui::app::AppMode; @@ -90,6 +91,63 @@ pub(super) fn apply_native_tool_deferral( } } +/// First-turn native tool surface for Arcee (Trinity). +/// +/// Arcee's hosted API is fronted by Cloudflare, whose managed WAF returns +/// HTTP 403 "Access Denied" when a request body contains injection-like text. +/// CodeWhale's full agent catalog trips it: shell/patch/code-execution tool +/// descriptions and schemas carry example payloads (`rm -rf`, `../../`, +/// ` + 2600:1700:467:d410:f137:b94f:1dd0:d1e4 + a059a2873f3fdf82 +
Cloudflare Error Pages
+ "#; + + let message = sanitize_http_error_body(Some("Arcee AI"), 403, body); + + assert!(message.contains("Arcee AI API returned Cloudflare Access Denied")); + assert!(message.contains("ID a059a2873f3fdf82")); + assert!(!message.contains("

Access Denied

Cloudflare Error Pages

"#, + ); + let err = LlmError::from_http_response(403, &message); + + assert!(matches!(err, LlmError::AuthorizationError(_))); + } + + #[test] + fn arcee_access_denied_without_literal_cloudflare_is_still_summarized() { + // Mirrors api.arcee.ai's real 403 page: "Cloudflare" appears only in a + // `` attribute and the ` +

Access Denied

+

The action you just performed triggered a security alert.

+

Please contact us if this was a mistake.

+ Contact Support + 2600:1700:467:d410:f137:b94f:1dd0:d1e4 + a059c0d4caf1f9cc + "#; + + let message = sanitize_http_error_body(Some("Arcee AI"), 403, body); + + assert!( + message.contains("Arcee AI API returned Access Denied"), + "got: {message}" + ); + assert!(message.contains("ID a059c0d4caf1f9cc"), "got: {message}"); + assert!( + !message.to_ascii_lowercase().contains("cloudflare"), + "stripped Arcee page has no literal Cloudflare: {message}" + ); + assert!(!message.contains('<'), "no raw markup: {message}"); + assert!(message.len() < 300, "stays concise: {message}"); + + // A WAF block is authorization, not a bad API key. + let err = LlmError::from_http_response(403, &message); + assert!(matches!(err, LlmError::AuthorizationError(_))); + } + #[test] fn test_llm_error_suggested_retry_delay() { let err = LlmError::RateLimited { diff --git a/crates/tui/src/localization.rs b/crates/tui/src/localization.rs index f08fbccc..dfff3e64 100644 --- a/crates/tui/src/localization.rs +++ b/crates/tui/src/localization.rs @@ -263,8 +263,6 @@ pub enum MessageId { CmdConfigDescription, CmdContextDescription, CmdCostDescription, - CmdCycleDescription, - CmdCyclesDescription, CmdDiffDescription, CmdEditDescription, CmdExitDescription, @@ -305,7 +303,6 @@ pub enum MessageId { CmdQueueMissingIndex, CmdQueueIndexPositive, CmdQueueIndexMin, - CmdRecallDescription, CmdRelayDescription, CmdRenameDescription, CmdRestoreDescription, @@ -546,8 +543,6 @@ pub const ALL_MESSAGE_IDS: &[MessageId] = &[ MessageId::CmdConfigDescription, MessageId::CmdContextDescription, MessageId::CmdCostDescription, - MessageId::CmdCycleDescription, - MessageId::CmdCyclesDescription, MessageId::CmdDiffDescription, MessageId::CmdEditDescription, MessageId::CmdExitDescription, @@ -586,7 +581,6 @@ pub const ALL_MESSAGE_IDS: &[MessageId] = &[ MessageId::CmdQueueMissingIndex, MessageId::CmdQueueIndexPositive, MessageId::CmdQueueIndexMin, - MessageId::CmdRecallDescription, MessageId::CmdRelayDescription, 
MessageId::CmdRenameDescription, MessageId::CmdRestoreDescription, @@ -1024,17 +1018,13 @@ fn english(id: MessageId) -> &'static str { } MessageId::CmdBalanceDescription => "Check the active provider account balance", MessageId::CmdClearDescription => "Clear conversation history", - MessageId::CmdCompactDescription => { - "Trigger context compaction to free up space (legacy; v0.6.6 prefers cycle restart)" - } + MessageId::CmdCompactDescription => "Trigger context compaction to free up space", MessageId::CmdPurgeDescription => { "Let the agent surgically prune conversation history to free context space" } MessageId::CmdConfigDescription => "Open interactive configuration editor", MessageId::CmdContextDescription => "Open compact session context inspector", MessageId::CmdCostDescription => "Show session cost breakdown", - MessageId::CmdCycleDescription => "Show the carry-forward briefing for a specific cycle", - MessageId::CmdCyclesDescription => "List checkpoint-restart cycle handoffs in this session", MessageId::CmdDiffDescription => "Show file changes since session start", MessageId::CmdEditDescription => "Revise and resubmit the last message", MessageId::CmdExitDescription => "Exit the application", @@ -1089,7 +1079,6 @@ fn english(id: MessageId) -> &'static str { } MessageId::CmdQueueIndexPositive => "Index must be a positive number", MessageId::CmdQueueIndexMin => "Index must be >= 1", - MessageId::CmdRecallDescription => "Search prior cycle archives (BM25 over message text)", MessageId::CmdRelayDescription => "Create a session relay (接力) for a fresh thread", MessageId::CmdRenameDescription => "Rename the current session", MessageId::CmdRestoreDescription => { @@ -1448,19 +1437,13 @@ fn vietnamese(id: MessageId) -> Option<&'static str> { "Kiểm tra số dư tài khoản của nhà cung cấp dịch vụ đang hoạt động" } MessageId::CmdClearDescription => "Xóa lịch sử trò chuyện", - MessageId::CmdCompactDescription => { - "Kích hoạt nén ngữ cảnh để giải phóng không gian (cũ; 
v0.6.6 ưu tiên khởi động lại chu kỳ)" - } + MessageId::CmdCompactDescription => "Kích hoạt nén ngữ cảnh để giải phóng không gian", MessageId::CmdPurgeDescription => { "Cho agent cắt gọn lịch sử trò chuyện để giải phóng ngữ cảnh" } MessageId::CmdConfigDescription => "Mở trình chỉnh sửa cấu hình tương tác", MessageId::CmdContextDescription => "Mở trình kiểm tra ngữ cảnh phiên thu gọn", MessageId::CmdCostDescription => "Hiển thị chi tiết chi phí của phiên làm việc", - MessageId::CmdCycleDescription => "Hiển thị báo cáo chuyển tiếp cho một chu kỳ cụ thể", - MessageId::CmdCyclesDescription => { - "Liệt kê các lần bàn giao chu kỳ checkpoint-restart trong phiên này" - } MessageId::CmdDiffDescription => "Hiển thị các thay đổi của tệp kể từ khi bắt đầu phiên", MessageId::CmdEditDescription => "Chỉnh sửa và gửi lại tin nhắn gần nhất", MessageId::CmdExitDescription => "Thoát ứng dụng", @@ -1521,9 +1504,6 @@ fn vietnamese(id: MessageId) -> Option<&'static str> { } MessageId::CmdQueueIndexPositive => "Chỉ mục phải là số dương", MessageId::CmdQueueIndexMin => "Chỉ mục phải >= 1", - MessageId::CmdRecallDescription => { - "Tìm kiếm kho lưu trữ chu kỳ trước (BM25 trên văn bản tin nhắn)" - } MessageId::CmdRelayDescription => "Tạo một phiên tiếp sức cho một luồng mới", MessageId::CmdRenameDescription => "Đổi tên phiên làm việc hiện tại", MessageId::CmdRestoreDescription => { @@ -1912,19 +1892,13 @@ fn japanese(id: MessageId) -> Option<&'static str> { } MessageId::CmdBalanceDescription => "アクティブなプロバイダーのアカウント残高を確認", MessageId::CmdClearDescription => "会話履歴をクリア", - MessageId::CmdCompactDescription => { - "コンテキスト圧縮で容量を確保(旧式:v0.6.6 以降はサイクル再起動を推奨)" - } + MessageId::CmdCompactDescription => "コンテキスト圧縮で容量を確保", MessageId::CmdPurgeDescription => { "エージェントに会話履歴を分析させ、不要なメッセージを削除・要約" } MessageId::CmdConfigDescription => "インタラクティブな設定エディタを開く", MessageId::CmdContextDescription => "コンパクトなセッションコンテキスト検査ツールを開く", MessageId::CmdCostDescription => "セッションのコスト内訳を表示", - MessageId::CmdCycleDescription => 
"指定したサイクルの引き継ぎブリーフィングを表示", - MessageId::CmdCyclesDescription => { - "セッション内のチェックポイント再起動サイクルの引き継ぎを一覧表示" - } MessageId::CmdDiffDescription => "セッション開始以降のファイル変更を表示", MessageId::CmdEditDescription => "最後のメッセージを編集して再送信", MessageId::CmdExitDescription => "アプリを終了", @@ -1983,9 +1957,6 @@ fn japanese(id: MessageId) -> Option<&'static str> { } MessageId::CmdQueueIndexPositive => "インデックスは正の数値である必要があります", MessageId::CmdQueueIndexMin => "インデックスは 1 以上である必要があります", - MessageId::CmdRecallDescription => { - "過去のサイクルアーカイブを検索(メッセージ本文への BM25 検索)" - } MessageId::CmdRelayDescription => "新しいスレッド用のセッションリレー(接力)を作成", MessageId::CmdRenameDescription => "現在のセッションの名前を変更", MessageId::CmdRestoreDescription => { @@ -2323,15 +2294,11 @@ fn chinese_simplified(id: MessageId) -> Option<&'static str> { } MessageId::CmdBalanceDescription => "查看当前提供商账户余额", MessageId::CmdClearDescription => "清除对话历史", - MessageId::CmdCompactDescription => { - "触发上下文压缩以释放空间(旧版命令;v0.6.6 起建议改用循环重启)" - } + MessageId::CmdCompactDescription => "触发上下文压缩以释放空间", MessageId::CmdPurgeDescription => "让 Agent 分析对话历史,精确保留有用信息并移除冗余内容", MessageId::CmdConfigDescription => "打开交互式配置编辑器", MessageId::CmdContextDescription => "打开紧凑会话上下文检查器", MessageId::CmdCostDescription => "显示本次会话的费用明细", - MessageId::CmdCycleDescription => "显示指定循环的延续简报", - MessageId::CmdCyclesDescription => "列出本次会话中的检查点重启循环交接", MessageId::CmdDiffDescription => "显示会话开始以来的文件变更", MessageId::CmdEditDescription => "修改并重新提交最后一条消息", MessageId::CmdExitDescription => "退出应用", @@ -2380,7 +2347,6 @@ fn chinese_simplified(id: MessageId) -> Option<&'static str> { MessageId::CmdQueueMissingIndex => "缺少索引。用法: /queue edit 或 /queue drop ", MessageId::CmdQueueIndexPositive => "索引必须为正数", MessageId::CmdQueueIndexMin => "索引必须 >= 1", - MessageId::CmdRecallDescription => "搜索此前的循环归档(基于消息文本的 BM25 检索)", MessageId::CmdRelayDescription => "为新线程创建会话接力摘要", MessageId::CmdRenameDescription => "重命名当前会话", MessageId::CmdRestoreDescription => { @@ -2690,21 +2656,13 @@ fn portuguese_brazil(id: MessageId) -> 
Option<&'static str> { } MessageId::CmdBalanceDescription => "Verificar o saldo da conta do provedor ativo", MessageId::CmdClearDescription => "Limpar o histórico da conversa", - MessageId::CmdCompactDescription => { - "Compactar o contexto para liberar espaço (legado; a v0.6.6 prefere o reinício de ciclo)" - } + MessageId::CmdCompactDescription => "Compactar o contexto para liberar espaço", MessageId::CmdPurgeDescription => { "Deixe o agente podar cirurgicamente o histórico para liberar espaço de contexto" } MessageId::CmdConfigDescription => "Abrir o editor interativo de configuração", MessageId::CmdContextDescription => "Abrir o inspetor compacto de contexto da sessão", MessageId::CmdCostDescription => "Exibir o detalhamento de custo da sessão", - MessageId::CmdCycleDescription => { - "Exibir o briefing de continuidade de um ciclo específico" - } - MessageId::CmdCyclesDescription => { - "Listar as transferências dos ciclos checkpoint-restart desta sessão" - } MessageId::CmdDiffDescription => "Mostrar alterações em arquivos desde o início da sessão", MessageId::CmdEditDescription => "Revisar e reenviar a última mensagem", MessageId::CmdExitDescription => "Sair do aplicativo", @@ -2765,9 +2723,6 @@ fn portuguese_brazil(id: MessageId) -> Option<&'static str> { } MessageId::CmdQueueIndexPositive => "O índice deve ser um número positivo", MessageId::CmdQueueIndexMin => "O índice deve ser >= 1", - MessageId::CmdRecallDescription => { - "Buscar arquivos de ciclos anteriores (BM25 sobre o texto das mensagens)" - } MessageId::CmdRelayDescription => "Criar um relay da sessão para um novo thread", MessageId::CmdRenameDescription => "Renomear a sessão atual", MessageId::CmdRestoreDescription => { @@ -3133,21 +3088,13 @@ fn spanish_latin_america(id: MessageId) -> Option<&'static str> { } MessageId::CmdBalanceDescription => "Consultar el saldo de la cuenta del proveedor activo", MessageId::CmdClearDescription => "Limpiar el historial de la conversación", - 
MessageId::CmdCompactDescription => { - "Compactar el contexto para liberar espacio (heredado; v0.6.6 prefiere reinicio de ciclo)" - } + MessageId::CmdCompactDescription => "Compactar el contexto para liberar espacio", MessageId::CmdPurgeDescription => { "Permite al agente eliminar quirúrgicamente historial innecesario para liberar espacio de contexto" } MessageId::CmdConfigDescription => "Abrir el editor interactivo de configuración", MessageId::CmdContextDescription => "Abrir el inspector compacto de contexto de la sesión", MessageId::CmdCostDescription => "Mostrar el desglose de costo de la sesión", - MessageId::CmdCycleDescription => { - "Mostrar el resumen de continuidad de un ciclo específico" - } - MessageId::CmdCyclesDescription => { - "Listar las transferencias de checkpoint-restart de esta sesión" - } MessageId::CmdDiffDescription => "Mostrar cambios en archivos desde el inicio de la sesión", MessageId::CmdEditDescription => "Revisar y reenviar el último mensaje", MessageId::CmdExitDescription => "Salir de la aplicación", @@ -3216,9 +3163,6 @@ fn spanish_latin_america(id: MessageId) -> Option<&'static str> { } MessageId::CmdQueueIndexPositive => "El índice debe ser un número positivo", MessageId::CmdQueueIndexMin => "El índice debe ser >= 1", - MessageId::CmdRecallDescription => { - "Buscar archivos de ciclos anteriores (BM25 sobre el texto de los mensajes)" - } MessageId::CmdRelayDescription => "Crear un relay de sesión (接力) para un hilo nuevo", MessageId::CmdRenameDescription => "Renombrar la sesión actual", MessageId::CmdRestoreDescription => { diff --git a/crates/tui/src/main.rs b/crates/tui/src/main.rs index 72795d03..1dd470a6 100644 --- a/crates/tui/src/main.rs +++ b/crates/tui/src/main.rs @@ -30,7 +30,6 @@ mod config; mod config_ui; mod core; mod cost_status; -mod cycle_manager; mod deepseek_theme; mod dependencies; mod error_taxonomy; @@ -5637,7 +5636,9 @@ async fn run_exec_agent( use crate::core::engine::{EngineConfig, spawn_engine}; use 
crate::core::events::Event; use crate::core::ops::Op; - use crate::models::compaction_threshold_for_model; + use crate::models::{ + auto_compact_default_for_model, compaction_threshold_for_model_at_percent, + }; use crate::tools::plan::new_shared_plan_state; use crate::tools::todo::new_shared_todo_list; use crate::tui::app::AppMode; @@ -5649,15 +5650,19 @@ async fn run_exec_agent( .reasoning_effort .map(|effort| effort.as_setting().to_string()); - // Compaction defaults to disabled in v0.6.6: the checkpoint-restart cycle - // architecture (issue #124) handles long-context resets via fresh contexts - // rather than progressive summarization. The compaction config is still - // wired through so users who explicitly opt back in through TUI settings - // or direct engine config keep their old behavior. + let settings = crate::settings::Settings::load().unwrap_or_default(); + let auto_compact_enabled = if crate::settings::Settings::auto_compact_explicitly_configured() { + settings.auto_compact + } else { + auto_compact_default_for_model(&effective_model) + }; let compaction = CompactionConfig { - enabled: false, + enabled: auto_compact_enabled, model: effective_model.clone(), - token_threshold: compaction_threshold_for_model(&effective_model), + token_threshold: compaction_threshold_for_model_at_percent( + &effective_model, + settings.auto_compact_threshold_percent, + ), ..Default::default() }; @@ -5669,8 +5674,6 @@ async fn run_exec_agent( .lsp .clone() .map(crate::config::LspConfigToml::into_runtime); - let settings = crate::settings::Settings::load().unwrap_or_default(); - let engine_config = EngineConfig { model: effective_model.clone(), workspace: workspace.clone(), @@ -5691,7 +5694,6 @@ async fn run_exec_agent( max_subagents, features: config.features(), compaction, - cycle: crate::cycle_manager::CycleConfig::default(), capacity: crate::core::capacity::CapacityControllerConfig::from_app_config(config), todos: new_shared_todo_list(), plan_state: 
new_shared_plan_state(), diff --git a/crates/tui/src/models.rs b/crates/tui/src/models.rs index 60ed6351..5a749bfa 100644 --- a/crates/tui/src/models.rs +++ b/crates/tui/src/models.rs @@ -15,6 +15,7 @@ pub const DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS: u32 = 1_000_000; /// [`compaction_threshold_for_model`] (#664). pub const DEFAULT_COMPACTION_TOKEN_THRESHOLD: usize = 102_400; const COMPACTION_THRESHOLD_PERCENT: u32 = 80; +pub const DEFAULT_AUTO_COMPACT_MAX_CONTEXT_WINDOW_TOKENS: u32 = 262_144; // === Core Message Types === @@ -240,25 +241,31 @@ pub fn context_window_for_model(model: &str) -> Option { fn known_context_window_for_model(model_lower: &str) -> Option { match model_lower { - "trinity-mini" | "trinity-large-preview" => Some(128_000), - "arcee-ai/trinity-large-thinking" | "trinity-large-thinking" => Some(262_144), + "trinity-mini" => Some(128_000), + "arcee-ai/trinity-large-thinking" | "trinity-large-thinking" | "trinity-large-preview" => { + Some(262_144) + } "google/gemma-4-31b-it" | "google/gemma-4-31b-it:free" | "google/gemma-4-26b-a4b-it" | "google/gemma-4-26b-a4b-it:free" | "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free" | "qwen/qwen3.6-35b-a3b" + | "qwen/qwen3.6-max-preview" | "qwen/qwen3.6-27b" | "tencent/hy3-preview" | "moonshotai/kimi-k2.6" | "moonshotai/kimi-k2.6:free" => Some(262_144), "z-ai/glm-5.1" | "z-ai/glm-5v-turbo" => Some(202_752), - "minimax/minimax-m3" => Some(1_000_000), - "xiaomi/mimo-v2.5-pro" - | "xiaomi/mimo-v2.5" - | "mimo-v2.5-pro" - | "mimo-v2.5" - | "qwen/qwen3.6-plus" => Some(1_000_000), + "minimax/minimax-m3" | "qwen/qwen3.6-flash" | "qwen/qwen3.6-plus" => Some(1_000_000), + "xiaomi/mimo-v2.5-pro" | "xiaomi/mimo-v2.5" | "mimo-v2.5-pro" | "mimo-v2.5" => { + Some(1_000_000) + } + "mimo-v2.5-asr" + | "mimo-v2.5-tts" + | "mimo-v2.5-tts-voicedesign" + | "mimo-v2.5-tts-voiceclone" + | "mimo-v2-tts" => Some(8_000), _ => None, } } @@ -275,9 +282,15 @@ pub fn max_output_tokens_for_model(model: &str) -> Option { } 
"minimax/minimax-m3" => Some(524_288), "qwen/qwen3.6-35b-a3b" | "qwen/qwen3.6-27b" => Some(262_140), + "qwen/qwen3.6-flash" | "qwen/qwen3.6-max-preview" | "qwen/qwen3.6-plus" => Some(65_536), "xiaomi/mimo-v2.5-pro" | "xiaomi/mimo-v2.5" | "mimo-v2.5-pro" | "mimo-v2.5" => { Some(131_072) } + "mimo-v2.5-asr" => Some(2_048), + "mimo-v2.5-tts" + | "mimo-v2.5-tts-voicedesign" + | "mimo-v2.5-tts-voiceclone" + | "mimo-v2-tts" => Some(8_192), "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free" => Some(65_536), "google/gemma-4-31b-it" => Some(16_384), "google/gemma-4-31b-it:free" | "google/gemma-4-26b-a4b-it:free" => Some(32_768), @@ -303,8 +316,11 @@ pub fn model_supports_reasoning(model: &str) -> bool { | "moonshotai/kimi-k2.6:free" | "minimax/minimax-m3" | "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free" + | "qwen/qwen3.6-flash" | "qwen/qwen3.6-35b-a3b" + | "qwen/qwen3.6-max-preview" | "qwen/qwen3.6-27b" + | "qwen/qwen3.6-plus" | "tencent/hy3-preview" | "xiaomi/mimo-v2.5-pro" | "xiaomi/mimo-v2.5" @@ -347,34 +363,32 @@ fn explicit_context_window_hint(model_lower: &str) -> Option { None } -/// Derive a compaction token threshold from model context window. -/// -/// Keeps headroom for tool outputs and assistant completion by defaulting to 80% -/// of known context windows. This is the hard automatic compaction threshold -/// used only when `auto_compact` is enabled; model-facing guidance still -/// suggests manual `/compact` earlier (~60%) during sustained work. +/// Derive a compaction token threshold from model context and a caller-supplied +/// percentage. 
#[must_use] -pub fn compaction_threshold_for_model(model: &str) -> usize { +pub fn compaction_threshold_for_model_at_percent(model: &str, percent: f64) -> usize { let Some(window) = context_window_for_model(model) else { return DEFAULT_COMPACTION_TOKEN_THRESHOLD; }; - let threshold = (u64::from(window) * u64::from(COMPACTION_THRESHOLD_PERCENT)) / 100; + let percent = percent.clamp(10.0, 100.0); + let threshold = (f64::from(window) * percent / 100.0).round(); + let threshold = if threshold.is_finite() && threshold > 0.0 { + threshold as u64 + } else { + u64::from(window) * u64::from(COMPACTION_THRESHOLD_PERCENT) / 100 + }; usize::try_from(threshold).unwrap_or(DEFAULT_COMPACTION_TOKEN_THRESHOLD) } -/// Compaction threshold keyed by model and caller-supplied effort tier. -/// -/// Replacement-style compaction rewrites the stable prefix, which works against -/// DeepSeek V4 prefix-cache economics. Reasoning effort must not lower V4's -/// automatic replacement threshold; V4-family models use the same late -/// 80%-of-window hard guard as `compaction_threshold_for_model`. +/// Whether auto-compaction should be enabled when the user did not explicitly +/// configure it. V4-class 1M models keep the prefix-cache-friendly opt-in +/// behavior; 256K-class and smaller known models need automatic pressure +/// relief near the context wall. 
#[must_use] -pub fn compaction_threshold_for_model_and_effort( - model: &str, - _reasoning_effort: Option<&str>, -) -> usize { - compaction_threshold_for_model(model) +pub fn auto_compact_default_for_model(model: &str) -> bool { + context_window_for_model(model) + .is_some_and(|window| window <= DEFAULT_AUTO_COMPACT_MAX_CONTEXT_WINDOW_TOKENS) } // === Streaming Structures === @@ -503,9 +517,13 @@ mod tests { for (model, expected_window) in [ ("arcee-ai/trinity-large-thinking", 262_144), ("trinity-large-thinking", 262_144), + (concat!("qwen/", "qwen3.6-flash"), 1_000_000), (concat!("qwen/", "qwen3.6-35b-a3b"), 262_144), + (concat!("qwen/", "qwen3.6-max-preview"), 262_144), + (concat!("qwen/", "qwen3.6-plus"), 1_000_000), (concat!("xiaomi/", "mimo-v2.5-pro"), 1_000_000), ("mimo-v2.5-pro", 1_000_000), + ("mimo-v2.5", 1_000_000), ("minimax/minimax-m3", 1_000_000), ("moonshotai/kimi-k2.6", 262_144), ("google/gemma-4-31b-it", 262_144), @@ -518,10 +536,13 @@ mod tests { #[test] fn arcee_direct_models_have_static_windows_without_reasoning_flag() { - for model in ["trinity-mini", "trinity-large-preview"] { - assert_eq!(context_window_for_model(model), Some(128_000)); - assert!(!model_supports_reasoning(model)); - } + assert_eq!( + context_window_for_model("trinity-large-preview"), + Some(262_144) + ); + assert!(!model_supports_reasoning("trinity-large-preview")); + assert_eq!(context_window_for_model("trinity-mini"), Some(128_000)); + assert!(!model_supports_reasoning("trinity-mini")); } #[test] @@ -534,11 +555,24 @@ mod tests { max_output_tokens_for_model("trinity-large-thinking"), Some(262_144) ); + assert_eq!( + max_output_tokens_for_model(concat!("qwen/", "qwen3.6-flash")), + Some(65_536) + ); + assert_eq!( + max_output_tokens_for_model(concat!("qwen/", "qwen3.6-max-preview")), + Some(65_536) + ); + assert_eq!( + max_output_tokens_for_model(concat!("qwen/", "qwen3.6-plus")), + Some(65_536) + ); assert_eq!( max_output_tokens_for_model(concat!("xiaomi/", 
"mimo-v2.5-pro")), Some(131_072) ); assert_eq!(max_output_tokens_for_model("mimo-v2.5-pro"), Some(131_072)); + assert_eq!(max_output_tokens_for_model("mimo-v2.5"), Some(131_072)); assert_eq!( max_output_tokens_for_model("minimax/minimax-m3"), Some(524_288) @@ -561,7 +595,7 @@ mod tests { #[test] fn compaction_threshold_scales_with_context_window() { assert_eq!( - compaction_threshold_for_model("deepseek-v3.2-128k"), + compaction_threshold_for_model_at_percent("deepseek-v3.2-128k", 80.0), 102_400 ); // v0.8.11 (#664): unknown-model fallback also resolves to 80% of @@ -570,55 +604,38 @@ mod tests { // `50_000` pre-v0.8.11; that hardcoded value compacted at ~5% of a // 1M window when model detection silently fell through, which is // exactly the prefix-cache-burning behaviour we're getting away from. - assert_eq!(compaction_threshold_for_model("unknown-model"), 102_400); + assert_eq!( + compaction_threshold_for_model_at_percent("unknown-model", 80.0), + 102_400 + ); } #[test] fn compaction_scales_for_deepseek_v4_1m_context() { - assert_eq!(compaction_threshold_for_model("deepseek-v4-pro"), 800_000); - } - - #[test] - fn v4_replacement_compaction_ignores_reasoning_effort() { assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v4-pro", Some("off")), - 800_000 - ); - assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v4-pro", Some("high")), - 800_000 - ); - assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v4-pro", Some("max")), + compaction_threshold_for_model_at_percent("deepseek-v4-pro", 80.0), 800_000 ); } #[test] - fn v4_soft_caps_only_apply_to_v4_models() { + fn compaction_threshold_honors_configured_percent() { assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v3.2-128k", Some("max")), - 102_400 + compaction_threshold_for_model_at_percent("deepseek-v4-pro", 75.0), + 750_000 ); - // v0.8.11 (#664): unknown-model fallback also lands on the - // 80%-of-128K legacy DeepSeek fallback instead of the legacy - 
// hardcoded 50K, so model-detection-fall-through doesn't quietly - // burn V4 prefix cache at 5%-of-window. assert_eq!( - compaction_threshold_for_model_and_effort("unknown-model", Some("max")), - 102_400 + compaction_threshold_for_model_at_percent("trinity-large-thinking", 80.0), + 209_715 ); } #[test] - fn v4_replacement_compaction_defaults_to_late_guard_when_effort_unknown() { - assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v4-pro", None), - 800_000 - ); - assert_eq!( - compaction_threshold_for_model_and_effort("deepseek-v4-pro", Some("unknown")), - 800_000 - ); + fn auto_compaction_defaults_on_for_256k_class_models_only() { + assert!(auto_compact_default_for_model("trinity-large-thinking")); + assert!(auto_compact_default_for_model("deepseek-v3.2-128k")); + assert!(!auto_compact_default_for_model("deepseek-v4-pro")); + assert!(!auto_compact_default_for_model("mimo-v2.5-pro")); + assert!(!auto_compact_default_for_model("unknown-model")); } } diff --git a/crates/tui/src/prompts/cycle_handoff.md b/crates/tui/src/prompts/cycle_handoff.md deleted file mode 100644 index c66c8ac3..00000000 --- a/crates/tui/src/prompts/cycle_handoff.md +++ /dev/null @@ -1,76 +0,0 @@ -# Cycle Handoff Briefing - -You are about to cross a context cycle boundary. The conversation so far has -crossed the per-cycle token budget, so this entire transcript is going to be -**archived to disk** and the next turn will start with a fresh context: the -original system prompt, structured state (todos, plan, working set, open -sub-agents), the user's pending message, and a free-form briefing that **you -write right now**. - -Your job, in this single message: produce a `<carry_forward>` block of at most -**3,000 tokens** that captures the irreducible state the *next cycle's you* will -need to continue without redoing work. - -## What to put in `<carry_forward>` - -Write concrete prose, not bullet-point summaries of the transcript. 
Cover: - -- **Decisions made and why.** The things you've chosen and the reasoning that - led there. Not "we discussed options" — name the choice and the constraint - that made it the right one. -- **Constraints discovered.** Concrete facts about the codebase, environment, - user preferences, or external systems that the next cycle will trip over if - it doesn't know them. (e.g. "the audit log is JSONL not JSON", "the user - insists on no `unwrap()` in non-test code", "macOS sandbox blocks raw - sockets in tools/exec.rs".) -- **Hypotheses being tested.** Open questions you're actively investigating, - what you're trying to falsify, what evidence would change your mind. -- **Approaches that failed.** Dead ends with enough detail that the next - cycle won't repeat them. Name the approach and the specific reason it - didn't work, not just "tried X, didn't work". -- **Open questions for the user.** Things you're blocked on that the next - cycle should ask about if the user doesn't volunteer them. - -## What NOT to put in `<carry_forward>` - -- Tool output bytes. (They're already archived to disk.) -- File contents you read. (The next cycle can re-read them — pricier than a - briefing token, but cheaper than a wrong assumption built on a stale - paraphrase.) -- Step-by-step recap of what you did. The next cycle does not need to know - the order of operations; it needs to know the *current state*. -- Pleasantries, throat-clearing, framing language. Every token matters. - -## Format - -Open with `<carry_forward>` on its own line. Close with `</carry_forward>` on -its own line. No prose outside the tags. No nested tags. No code fences around -the block itself (you can use code fences inside if you need to quote a -specific snippet). - -The `recall_archive` tool is available in the next cycle. It searches the -archived transcripts (BM25 over message text, top-N hits) when your briefing -missed something the next cycle needs. 
Use it sparingly — frequent recalls -mean your briefing was too sparse, so refine your *next* briefing rather than -leaning on the archive. Don't try to be exhaustive here: be precise about the -load-bearing state and trust the archive for the rest. - -## Example shape (do not copy verbatim — write your own) - -``` -<carry_forward> -Working on issue #124 (cycle-restart). Key decisions: (1) trigger at 110K -tokens not 128K — need ~8.5K headroom for the briefing turn itself plus -next-turn growth before the next boundary; (2) archive to JSONL with a -header line so future tools can stream-read without parsing the whole -file. Constraint discovered: DeepSeek V4 thinking-mode requires -reasoning_content replay on assistant messages with tool calls — so seed -messages can't include orphan tool calls from the archived cycle. The -approach of "summarize then keep recent messages" (the old compaction -path) was failing because the model couldn't tell which fragments were -verbatim vs. paraphrased; replacing it entirely. Open question for user: -do they want per-model briefing token caps, or one global cap? -</carry_forward> -``` - -Now write your `<carry_forward>` for this conversation. 
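Reviewer note on the `crates/tui/src/models.rs` hunk earlier in this patch: the percentage-aware threshold and the model-aware auto-compact default can be sketched in isolation as follows. This is a minimal sketch, not the crate's code. `window_for` is a hypothetical stand-in for `context_window_for_model` and lists only three of the models named in the patch, and the constants mirror `DEFAULT_COMPACTION_TOKEN_THRESHOLD` and `DEFAULT_AUTO_COMPACT_MAX_CONTEXT_WINDOW_TOKENS`.

```rust
/// Fallback when the context window is unknown (80% of a 128K window).
const DEFAULT_THRESHOLD: usize = 102_400;
/// Largest window that gets auto-compaction by default (256K class).
const AUTO_COMPACT_MAX_WINDOW: u32 = 262_144;

/// Hypothetical stand-in for `context_window_for_model`; assumption only.
fn window_for(model: &str) -> Option<u32> {
    match model {
        "trinity-mini" => Some(128_000),
        "trinity-large-thinking" => Some(262_144),
        "deepseek-v4-pro" => Some(1_000_000),
        _ => None,
    }
}

/// Token threshold at `percent` of the model's window, clamped to 10..=100.
fn threshold_at_percent(model: &str, percent: f64) -> usize {
    let Some(window) = window_for(model) else {
        return DEFAULT_THRESHOLD;
    };
    let percent = percent.clamp(10.0, 100.0);
    let threshold = (f64::from(window) * percent / 100.0).round();
    if threshold.is_finite() && threshold > 0.0 {
        threshold as usize
    } else {
        // A NaN percent survives `clamp`; fall back to the fixed 80% guard.
        (u64::from(window) * 80 / 100) as usize
    }
}

/// Auto-compaction defaults on only for known windows of 256K or less.
fn auto_compact_default(model: &str) -> bool {
    window_for(model).is_some_and(|window| window <= AUTO_COMPACT_MAX_WINDOW)
}
```

Under these assumptions a 1M-window model at 75% yields 750,000 tokens, while an unknown model keeps the fixed fallback and stays opt-in, which matches the expectations in the patch's own tests.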
diff --git a/crates/tui/src/runtime_threads.rs b/crates/tui/src/runtime_threads.rs index 805bb2e8..db6a2cc3 100644 --- a/crates/tui/src/runtime_threads.rs +++ b/crates/tui/src/runtime_threads.rs @@ -31,7 +31,10 @@ use crate::core::coherence::CoherenceState; use crate::core::engine::{EngineConfig, EngineHandle, spawn_engine}; use crate::core::events::{Event as EngineEvent, TurnOutcomeStatus}; use crate::core::ops::Op; -use crate::models::{ContentBlock, Message, SystemPrompt, Usage, compaction_threshold_for_model}; +use crate::models::{ + ContentBlock, Message, SystemPrompt, Usage, auto_compact_default_for_model, + compaction_threshold_for_model_at_percent, +}; use crate::tools::plan::new_shared_plan_state; use crate::tools::subagent::SubAgentStatus; use crate::tools::todo::new_shared_todo_list; @@ -58,12 +61,9 @@ fn validated_record_id<'a>(id: &'a str, label: &str) -> Result<&'a str> { Ok(trimmed) } -/// Bumped to 2 for v0.6.6 — see issue #124. The persisted thread/turn/item -/// records didn't change shape, but the live engine semantics did: cycle -/// boundaries advance the `Session.cycle_count` and produce archived JSONL -/// files at `~/.deepseek/sessions//cycles/.jsonl`. A v1 reader on a -/// session written by v2 wouldn't know about the cycle archive directory and -/// might misinterpret message counts; bumping is the safe choice. +/// Bumped to 2 for v0.6.6 after live engine semantics changed. The persisted +/// thread/turn/item records did not change shape, but a v1 reader on a v2 +/// session should still fail closed rather than silently mis-replay. const CURRENT_RUNTIME_SCHEMA_VERSION: u32 = 2; const RUNTIME_RESTART_REASON: &str = "Interrupted by process restart"; const APPROVAL_DECISION_TIMEOUT: Duration = Duration::from_secs(300); @@ -1945,12 +1945,20 @@ impl RuntimeThreadManager { } // Compaction defaults to disabled in v0.6.6 — the cycle architecture - // (issue #124) handles long-context resets. 
Threads keep the - // legacy summarizer wired off unless an operator opts in via config. + let settings = crate::settings::Settings::load().unwrap_or_default(); + let auto_compact_enabled = + if crate::settings::Settings::auto_compact_explicitly_configured() { + settings.auto_compact + } else { + auto_compact_default_for_model(&thread.model) + }; let compaction = CompactionConfig { - enabled: false, + enabled: auto_compact_enabled, model: thread.model.clone(), - token_threshold: compaction_threshold_for_model(&thread.model), + token_threshold: compaction_threshold_for_model_at_percent( + &thread.model, + settings.auto_compact_threshold_percent, + ), ..Default::default() }; let network_policy = self.config.network.clone().map(|toml_cfg| { @@ -1961,7 +1969,6 @@ impl RuntimeThreadManager { .lsp .clone() .map(crate::config::LspConfigToml::into_runtime); - let settings = crate::settings::Settings::load().unwrap_or_default(); let engine_cfg = EngineConfig { model: thread.model.clone(), workspace: thread.workspace.clone(), @@ -1983,7 +1990,6 @@ impl RuntimeThreadManager { max_subagents: self.config.max_subagents().clamp(1, MAX_SUBAGENTS), features: self.config.features(), compaction, - cycle: crate::cycle_manager::CycleConfig::default(), capacity: crate::core::capacity::CapacityControllerConfig::from_app_config( &self.config, ), @@ -2418,26 +2424,6 @@ impl RuntimeThreadManager { .await?; } } - EngineEvent::CycleAdvanced { from, to, briefing } => { - // Surface the cycle boundary in the runtime event timeline so - // background-task subscribers and replay see it. The actual - // archive write is the engine's responsibility (see - // `cycle_manager::archive_cycle`); this event is informational. 
- self.emit_event( - &thread_id, - Some(&turn_id), - None, - "cycle.advanced", - json!({ - "from": from, - "to": to, - "briefing_tokens": briefing.token_estimate, - "cycle": briefing.cycle, - "timestamp": briefing.timestamp, - }), - ) - .await?; - } EngineEvent::CoherenceState { state, label, diff --git a/crates/tui/src/seam_manager.rs b/crates/tui/src/seam_manager.rs index 404ae3fe..53645486 100644 --- a/crates/tui/src/seam_manager.rs +++ b/crates/tui/src/seam_manager.rs @@ -22,11 +22,9 @@ //! | L1 | 192K | 0–128K | ~2,500 tokens | //! | L2 | 384K | 0–320K | ~1,800 tokens | //! | L3 | 576K | 0–512K | ~1,200 tokens | -//! | Cycle | 768K | All -> archive | <=3,000 tokens | //! //! Thresholds derived from V4 paper Figure 9 (MMR): 128K->256K is the real -//! cliff at -0.09. L1 triggers at 192K, before the cliff. Hard cycle at -//! 768K (~75% of 1M window). +//! cliff at -0.09. L1 triggers at 192K, before the cliff. use std::fmt::Write; use std::path::Path; @@ -40,7 +38,7 @@ use crate::client::DeepSeekClient; use crate::compaction::KEEP_RECENT_MESSAGES; use crate::compaction::plan_compaction; use crate::llm_client::LlmClient; -use crate::models::{ContentBlock, Message, MessageRequest, SystemBlock, SystemPrompt}; +use crate::models::{ContentBlock, Message, MessageRequest, SystemPrompt}; /// Default seam model — Flash is cheap and fast, ideal for summarization. pub const DEFAULT_SEAM_MODEL: &str = "deepseek-v4-flash"; @@ -49,7 +47,6 @@ pub const DEFAULT_SEAM_MODEL: &str = "deepseek-v4-flash"; pub const DEFAULT_L1_THRESHOLD: usize = 192_000; pub const DEFAULT_L2_THRESHOLD: usize = 384_000; pub const DEFAULT_L3_THRESHOLD: usize = 576_000; -pub const DEFAULT_CYCLE_THRESHOLD: usize = 768_000; /// Verbatim window: last N turns never summarized. pub const VERBATIM_WINDOW_TURNS: usize = 16; @@ -70,8 +67,6 @@ pub struct SeamConfig { pub l1_threshold: usize, pub l2_threshold: usize, pub l3_threshold: usize, - /// Hard cycle boundary. 
- pub cycle_threshold: usize, /// Model used for seam/briefing work. pub seam_model: String, } @@ -84,7 +79,6 @@ impl Default for SeamConfig { l1_threshold: DEFAULT_L1_THRESHOLD, l2_threshold: DEFAULT_L2_THRESHOLD, l3_threshold: DEFAULT_L3_THRESHOLD, - cycle_threshold: DEFAULT_CYCLE_THRESHOLD, seam_model: DEFAULT_SEAM_MODEL.to_string(), } } @@ -153,16 +147,6 @@ impl SeamManager { seam_level_for_active_input(&self.config, active_input_tokens, highest_existing_level) } - /// Check whether the hard cycle boundary is crossed. - /// - /// Note: not currently called — cycle detection uses an inline check. - /// Kept as the canonical boundary definition for future wiring. - #[must_use] - #[allow(dead_code)] - pub fn should_cycle(&self, active_input_tokens: usize) -> bool { - self.config.enabled && active_input_tokens >= self.config.cycle_threshold - } - /// Compute the verbatim window: the last N message indices that must /// never be summarized. Returns the start index of the verbatim window. pub fn verbatim_window_start(&self, message_count: usize) -> usize { @@ -368,82 +352,6 @@ impl SeamManager { )) } - /// Produce a cycle briefing using Flash. Unlike the current - /// `produce_briefing` in cycle_manager.rs (which uses the main model), - /// this consumes existing `` blocks as input rather - /// than scanning raw history. - pub async fn produce_flash_briefing( - &self, - existing_seams: &[String], - structured_state: Option<&str>, - ) -> Result { - let mut input = String::from( - "## Briefing Request\n\n\ - Produce a block summarizing the session state. \ - Include: decisions made + why, constraints discovered, \ - hypotheses being tested, approaches that failed, open questions. 
\ - Do NOT include tool output bytes, file contents, or step-by-step recaps.\n\n", - ); - - if let Some(state) = structured_state { - let _ = write!(input, "## Structured State\n\n{state}\n\n"); - } - - if !existing_seams.is_empty() { - input.push_str("## Prior Context Summaries\n\n"); - for (i, seam) in existing_seams.iter().enumerate() { - let _ = write!(input, "### Seam {}\n{seam}\n\n", i + 1); - } - } else { - input.push_str( - "No prior context summaries available. Produce a brief carry-forward \ - from the structured state alone.\n", - ); - } - - let request = MessageRequest { - model: self.config.seam_model.clone(), - messages: vec![Message { - role: "user".to_string(), - content: vec![ContentBlock::Text { - text: input, - cache_control: None, - }], - }], - max_tokens: 4_096, - system: Some(SystemPrompt::Blocks(vec![SystemBlock { - block_type: "text".to_string(), - text: crate::cycle_manager::CYCLE_HANDOFF_TEMPLATE.to_string(), - cache_control: None, - }])), - tools: None, - tool_choice: None, - metadata: None, - thinking: None, - reasoning_effort: None, - stream: Some(false), - temperature: Some(0.2), - top_p: None, - }; - - let response = self.flash_client.create_message(request).await?; - // Seam recompaction calls are billed; route through the - // side-channel (#526) so the footer total matches the - // DeepSeek website. - crate::cost_status::report(&response.model, &response.usage); - let raw = response - .content - .iter() - .filter_map(|block| match block { - ContentBlock::Text { text, .. } => Some(text.as_str()), - _ => None, - }) - .collect::>() - .join("\n"); - - Ok(crate::cycle_manager::extract_carry_forward(&raw)) - } - /// Internal: summarize a slice of messages using Flash. async fn summarize_messages( &self, @@ -568,11 +476,6 @@ impl SeamManager { let seams = self.active_seams.lock().await; seams.last().map(|s| s.level) } - - /// Clear seam tracking (called on hard cycle reset). 
- pub async fn reset(&self) { - self.active_seams.lock().await.clear(); - } } #[must_use] @@ -660,13 +563,6 @@ mod tests { ); } - #[test] - fn cycle_threshold_check() { - let config = SeamConfig::default(); - assert!(768_000 >= config.cycle_threshold); - assert!(700_000 < config.cycle_threshold); - } - #[test] fn verbatim_window_calculation() { let config = SeamConfig { diff --git a/crates/tui/src/settings.rs b/crates/tui/src/settings.rs index 29919554..021475ca 100644 --- a/crates/tui/src/settings.rs +++ b/crates/tui/src/settings.rs @@ -322,17 +322,13 @@ pub struct Settings { impl Default for Settings { fn default() -> Self { Self { - // v0.8.11: default flipped to `false` to stop the engine from - // routinely rewriting the prompt prefix, which breaks DeepSeek - // V4's prefix cache (~90% discount on cached prefix tokens) and - // ends up costing more than the compaction itself saves. With - // V4's 1M-token window the user has plenty of headroom to run - // long sessions without auto-trimming, and the explicit - // `/compact` slash command + `auto_compact = on` opt-in remain - // available for users / agents that decide compaction is - // worth the cache hit on their workload (#664). + // Keep the persisted fallback `false`; startup code may enable + // auto-compaction by model window when the user has not saved an + // explicit preference. V4-class 1M-token models stay opt-in to + // preserve prefix-cache behavior, while 256K-class models default + // on at the configured percent threshold. auto_compact: false, - auto_compact_threshold_percent: 70.0, + auto_compact_threshold_percent: 80.0, calm_mode: false, low_motion: false, fancy_animations: true, @@ -428,6 +424,23 @@ impl Settings { Ok(settings) } + /// Whether the user explicitly persisted an `auto_compact` preference. + /// When absent, callers may choose a model-aware default. 
+ pub fn auto_compact_explicitly_configured() -> bool { + let Ok(path) = Self::path() else { + return false; + }; + let Ok(content) = std::fs::read_to_string(path) else { + return false; + }; + let Ok(value) = toml::from_str::<toml::Value>(&content) else { + return false; + }; + value + .as_table() + .is_some_and(|table| table.contains_key("auto_compact")) + } + /// Apply environment-driven overlays after disk load. Used for /// platform a11y signals that should ignore the user's saved /// preference (#450). The env values are consulted at startup; @@ -826,7 +839,7 @@ impl Settings { ), ( "auto_compact_threshold_percent", - "Auto-compact trigger threshold percent when auto_compact is on: 10-100 (default 70)", + "Auto-compact trigger threshold percent when auto_compact is on: 10-100 (default 80)", ), ("calm_mode", "Calmer UI defaults: on/off"), ( @@ -1222,7 +1235,7 @@ mod tests { // flipped so the cache-friendly path is the one users get // without configuring anything (#664). assert!(!settings.auto_compact); - assert_eq!(settings.auto_compact_threshold_percent, 70.0); + assert_eq!(settings.auto_compact_threshold_percent, 80.0); } #[test] diff --git a/crates/tui/src/tools/mod.rs b/crates/tui/src/tools/mod.rs index 20a94e24..8c23ca9f 100644 --- a/crates/tui/src/tools/mod.rs +++ b/crates/tui/src/tools/mod.rs @@ -35,7 +35,6 @@ pub mod parallel; pub mod plan; pub mod plugin; pub mod project; -pub mod recall_archive; pub mod registry; pub mod remember; pub mod revert_turn; diff --git a/crates/tui/src/tools/recall_archive.rs b/crates/tui/src/tools/recall_archive.rs deleted file mode 100644 index 6ec0b1a6..00000000 --- a/crates/tui/src/tools/recall_archive.rs +++ /dev/null @@ -1,718 +0,0 @@ -//! `recall_archive` tool — search prior cycle archives (issue #127). -//! -//! Companion to the checkpoint-restart cycle architecture (#124). When the -//! agent's `<carry_forward>` briefing missed something, this tool scans the -//! on-disk JSONL archives at `~/.deepseek/sessions//cycles/*.jsonl` and -//! 
returns the top-N matching messages. -//! -//! ## Scoring -//! -//! v1: a simplified BM25 over tokenized message text. No external embedding -//! model, no cache — every call walks the archives. Acceptable because the -//! per-cycle archive is bounded by the 110K cycle threshold and most sessions -//! cross at most a handful of cycles. v2 (later) can add an -//! `~/.deepseek/embeddings/` cache built on archive write. - -use std::collections::HashMap; -use std::fs::read_dir; -use std::path::PathBuf; - -use async_trait::async_trait; -use serde::Serialize; -use serde_json::{Value, json}; - -use super::spec::{ - ApprovalRequirement, ToolCapability, ToolContext, ToolError, ToolResult, ToolSpec, - optional_u64, required_str, -}; -use crate::cycle_manager::open_archive; -use crate::models::{ContentBlock, Message}; - -const DEFAULT_MAX_RESULTS: usize = 3; -const HARD_MAX_RESULTS: usize = 10; -const CONTEXT_WINDOW_CHARS: usize = 240; - -/// BM25 hyper-parameters. Standard defaults from the literature. -const K1: f64 = 1.5; -const B: f64 = 0.75; - -pub struct RecallArchiveTool; - -#[derive(Debug, Clone, Serialize)] -struct RecallHit { - cycle: u32, - /// 0-based message index within the cycle. - message_index: usize, - role: String, - score: f64, - /// Short window around the best match, with `…` markers when truncated. - excerpt: String, -} - -#[async_trait] -impl ToolSpec for RecallArchiveTool { - fn name(&self) -> &'static str { - "recall_archive" - } - - fn description(&self) -> &'static str { - "Search prior context cycles for content not in your briefing. Use sparingly — \ - frequent recalls mean your briefing was too sparse; refine your next briefing." - } - - fn input_schema(&self) -> Value { - json!({ - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "Search query. Tokenized and BM25-scored against archived messages." - }, - "cycle": { - "type": "integer", - "description": "Optional: limit to a specific prior cycle number." 
- }, - "max_results": { - "type": "integer", - "description": "Maximum hits to return (default 3, hard-capped at 10)." - } - }, - "required": ["query"] - }) - } - - fn capabilities(&self) -> Vec { - vec![ToolCapability::ReadOnly] - } - - fn approval_requirement(&self) -> ApprovalRequirement { - ApprovalRequirement::Auto - } - - async fn execute(&self, input: Value, context: &ToolContext) -> Result { - let query = required_str(&input, "query")?.trim().to_string(); - if query.is_empty() { - return Err(ToolError::invalid_input("query cannot be empty")); - } - - let max_results = (optional_u64(&input, "max_results", DEFAULT_MAX_RESULTS as u64) - as usize) - .clamp(1, HARD_MAX_RESULTS); - let cycle_filter = input.get("cycle").and_then(Value::as_u64).map(|n| n as u32); - - let session_id = context.state_namespace.as_str(); - let archives = list_archives(session_id).map_err(|err| { - ToolError::execution_failed(format!("Failed to enumerate cycle archives: {err}")) - })?; - - if archives.is_empty() { - return Ok(ToolResult::success(json!({ - "hits": [], - "note": "No prior cycle archives exist. The session has not crossed a cycle boundary yet." 
- }).to_string())); - } - - let documents = load_messages(&archives, cycle_filter).map_err(|err| { - ToolError::execution_failed(format!("Failed to read cycle archives: {err}")) - })?; - - if documents.is_empty() { - let note = match cycle_filter { - Some(c) => format!("Cycle {c} has no messages in its archive."), - None => "Cycle archives exist but contain no message text.".to_string(), - }; - return Ok(ToolResult::success( - json!({"hits": [], "note": note}).to_string(), - )); - } - - let query_tokens = tokenize(&query); - if query_tokens.is_empty() { - return Err(ToolError::invalid_input( - "query has no scoring tokens after tokenization", - )); - } - - let hits = score_bm25(&documents, &query_tokens, max_results); - - let payload = json!({ - "query": query, - "cycles_searched": archives.len(), - "messages_scanned": documents.len(), - "hits": hits, - }); - - Ok(ToolResult::success(payload.to_string())) - } -} - -/// One archived message + its provenance, ready to score. -struct ArchivedDoc { - cycle: u32, - message_index: usize, - role: String, - text: String, - tokens: Vec, -} - -fn archive_root(session_id: &str) -> Result { - let home = dirs::home_dir().ok_or_else(|| { - std::io::Error::new( - std::io::ErrorKind::NotFound, - "Could not resolve home directory for cycle archive root", - ) - })?; - // Use resolved sessions dir (prefers ~/.codewhale/sessions) - let sessions = codewhale_config::resolve_state_dir("sessions") - .unwrap_or_else(|_| home.join(".deepseek").join("sessions")); - Ok(sessions.join(session_id).join("cycles")) -} - -/// Enumerate all archive files for a session, sorted by cycle number ascending. -fn list_archives(session_id: &str) -> Result, std::io::Error> { - let root = archive_root(session_id)?; - if !root.exists() { - return Ok(Vec::new()); - } - let mut archives: Vec<(u32, PathBuf)> = Vec::new(); - for entry in read_dir(&root)? 
{ - let entry = entry?; - let path = entry.path(); - if path.extension().and_then(|s| s.to_str()) != Some("jsonl") { - continue; - } - let stem = match path.file_stem().and_then(|s| s.to_str()) { - Some(s) => s, - None => continue, - }; - let Ok(cycle_n) = stem.parse::() else { - continue; - }; - archives.push((cycle_n, path)); - } - archives.sort_by_key(|(n, _)| *n); - Ok(archives) -} - -/// Read messages from each archive into a flat scoreable list. -fn load_messages( - archives: &[(u32, PathBuf)], - cycle_filter: Option, -) -> Result, anyhow::Error> { - let mut docs: Vec = Vec::new(); - for (cycle_n, path) in archives { - if let Some(filter) = cycle_filter - && *cycle_n != filter - { - continue; - } - let (header, reader) = open_archive(path)?; - for (idx, message_result) in reader.enumerate() { - let message = message_result?; - let text = message_text(&message); - if text.trim().is_empty() { - continue; - } - let tokens = tokenize(&text); - if tokens.is_empty() { - continue; - } - docs.push(ArchivedDoc { - cycle: header.cycle, - message_index: idx, - role: message.role, - text, - tokens, - }); - } - } - Ok(docs) -} - -/// Concatenate all text-bearing content blocks of a message. -fn message_text(message: &Message) -> String { - let mut out = String::new(); - let mut push = |s: &str| { - if !out.is_empty() { - out.push('\n'); - } - out.push_str(s); - }; - for block in &message.content { - match block { - ContentBlock::Text { text, .. } => push(text), - ContentBlock::ToolUse { name, input, .. } => { - push(&format!("[tool_use {name}] {input}")); - } - ContentBlock::ToolResult { content, .. } => { - push(&format!("[tool_result] {content}")); - } - ContentBlock::Thinking { thinking } => { - push(&format!("[thinking] {thinking}")); - } - ContentBlock::ServerToolUse { name, input, .. } => { - push(&format!("[server_tool_use {name}] {input}")); - } - ContentBlock::ToolSearchToolResult { content, .. 
} => { - push(&format!("[tool_search_result] {content}")); - } - ContentBlock::CodeExecutionToolResult { content, .. } => { - push(&format!("[code_execution_result] {content}")); - } - } - } - out -} - -/// Lower-case, split on non-alphanumerics, drop short tokens. Same recipe as -/// most lightweight BM25 implementations. -fn tokenize(text: &str) -> Vec { - text.to_ascii_lowercase() - .split(|c: char| !c.is_alphanumeric()) - .filter(|s| s.len() >= 2) - .map(str::to_string) - .collect() -} - -/// Score documents against a query using BM25, return the top-N. -fn score_bm25(docs: &[ArchivedDoc], query_tokens: &[String], max_results: usize) -> Vec { - if docs.is_empty() || query_tokens.is_empty() { - return Vec::new(); - } - - let n = docs.len() as f64; - let avgdl: f64 = docs.iter().map(|d| d.tokens.len() as f64).sum::() / n.max(1.0); - - // Document frequency per query term. - let mut df: HashMap<&str, u64> = HashMap::new(); - for token in query_tokens { - let mut count = 0u64; - for doc in docs { - if doc.tokens.iter().any(|t| t == token) { - count += 1; - } - } - df.insert(token.as_str(), count); - } - - let mut scored: Vec<(f64, &ArchivedDoc)> = docs - .iter() - .map(|doc| (bm25_doc_score(doc, query_tokens, &df, n, avgdl), doc)) - .filter(|(score, _)| *score > 0.0) - .collect(); - - scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal)); - scored.truncate(max_results); - - scored - .into_iter() - .map(|(score, doc)| RecallHit { - cycle: doc.cycle, - message_index: doc.message_index, - role: doc.role.clone(), - score: round_score(score), - excerpt: best_window(&doc.text, query_tokens, CONTEXT_WINDOW_CHARS), - }) - .collect() -} - -fn bm25_doc_score( - doc: &ArchivedDoc, - query_tokens: &[String], - df: &HashMap<&str, u64>, - n: f64, - avgdl: f64, -) -> f64 { - let dl = doc.tokens.len() as f64; - if dl == 0.0 { - return 0.0; - } - let mut score = 0.0; - for token in query_tokens { - let tf = doc.tokens.iter().filter(|t| *t == 
token).count() as f64; - if tf == 0.0 { - continue; - } - let df_t = df.get(token.as_str()).copied().unwrap_or(0) as f64; - let idf = ((n - df_t + 0.5) / (df_t + 0.5) + 1.0).ln(); - let denom = tf + K1 * (1.0 - B + B * (dl / avgdl.max(1.0))); - score += idf * (tf * (K1 + 1.0)) / denom.max(f64::EPSILON); - } - score -} - -fn round_score(score: f64) -> f64 { - (score * 1000.0).round() / 1000.0 -} - -/// Find the substring of `text` of at most `window_chars` characters that -/// contains the densest cluster of query tokens. Returns it with `…` markers -/// when truncated. Falls back to a head-of-text excerpt when no tokens hit. -fn best_window(text: &str, query_tokens: &[String], window_chars: usize) -> String { - let lower = text.to_ascii_lowercase(); - let mut hit_positions: Vec = Vec::new(); - for token in query_tokens { - let mut start = 0usize; - while let Some(pos) = lower[start..].find(token.as_str()) { - hit_positions.push(start + pos); - start += pos + token.len(); - } - } - if hit_positions.is_empty() { - return head_excerpt(text, window_chars); - } - hit_positions.sort_unstable(); - - // Greedy: center the window on the first hit, walk forward as long as - // additional hits fit in the window. - let center = hit_positions[0]; - let half = window_chars / 2; - let start = center.saturating_sub(half); - let end = (start + window_chars).min(text.len()); - let start = align_char_boundary(text, start, false); - let end = align_char_boundary(text, end, true); - let prefix = if start > 0 { "…" } else { "" }; - let suffix = if end < text.len() { "…" } else { "" }; - format!("{prefix}{}{suffix}", &text[start..end]) -} - -fn head_excerpt(text: &str, max_chars: usize) -> String { - if text.len() <= max_chars { - return text.to_string(); - } - let cut = align_char_boundary(text, max_chars, true); - format!("{}…", &text[..cut]) -} - -/// Walk left or right until `idx` lands on a UTF-8 char boundary. 
-fn align_char_boundary(text: &str, mut idx: usize, walk_right: bool) -> usize { - if idx >= text.len() { - return text.len(); - } - while idx > 0 && idx < text.len() && !text.is_char_boundary(idx) { - if walk_right { - idx += 1; - } else { - idx -= 1; - } - } - idx -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::cycle_manager::archive_cycle; - use crate::models::{ContentBlock, Message}; - use chrono::Utc; - use tempfile::TempDir; - - fn user_msg(text: &str) -> Message { - Message { - role: "user".to_string(), - content: vec![ContentBlock::Text { - text: text.to_string(), - cache_control: None, - }], - } - } - - fn asst_msg(text: &str) -> Message { - Message { - role: "assistant".to_string(), - content: vec![ContentBlock::Text { - text: text.to_string(), - cache_control: None, - }], - } - } - - /// Guard that points `dirs::home_dir()` at a tempdir for the test's - /// lifetime and restores the original on drop. On Unix this means - /// `HOME`; on Windows it means `USERPROFILE`. We set both so the same - /// guard works portably. Holds process-wide lock to serialize. - struct HomeGuard { - _tmp: TempDir, - original_home: Option, - original_userprofile: Option, - _lock: std::sync::MutexGuard<'static, ()>, - } - impl HomeGuard { - fn new() -> Self { - let lock = crate::test_support::lock_test_env(); - let tmp = TempDir::new().expect("tempdir"); - let original_home = std::env::var("HOME").ok(); - let original_userprofile = std::env::var("USERPROFILE").ok(); - // SAFETY: serialized by process-wide lock; only this thread mutates the - // env vars for the duration of the guard. - unsafe { - std::env::set_var("HOME", tmp.path()); - std::env::set_var("USERPROFILE", tmp.path()); - } - Self { - _tmp: tmp, - original_home, - original_userprofile, - _lock: lock, - } - } - } - impl Drop for HomeGuard { - fn drop(&mut self) { - // SAFETY: still holding HOME_LOCK. 
- unsafe { - match self.original_home.take() { - Some(v) => std::env::set_var("HOME", v), - None => std::env::remove_var("HOME"), - } - match self.original_userprofile.take() { - Some(v) => std::env::set_var("USERPROFILE", v), - None => std::env::remove_var("USERPROFILE"), - } - } - } - } - - fn fresh_session_id() -> String { - format!("test-{}", uuid::Uuid::new_v4()) - } - - fn ctx_for_session(workspace: &std::path::Path, session_id: &str) -> ToolContext { - ToolContext::new(workspace).with_state_namespace(session_id.to_string()) - } - - #[test] - fn tokenize_lowers_splits_drops_short() { - // Filter is `len >= 2`, so "a" and "0" drop; "42" stays. - let toks = tokenize("Hello, World! a 42 OAuth-2.0"); - assert_eq!(toks, vec!["hello", "world", "42", "oauth"]); - } - - #[test] - fn message_text_concatenates_blocks() { - let m = Message { - role: "user".to_string(), - content: vec![ - ContentBlock::Text { - text: "first".to_string(), - cache_control: None, - }, - ContentBlock::Text { - text: "second".to_string(), - cache_control: None, - }, - ], - }; - assert_eq!(message_text(&m), "first\nsecond"); - } - - #[test] - fn list_archives_handles_missing_dir() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let archives = list_archives(&sid).expect("list_archives"); - assert!(archives.is_empty()); - } - - #[test] - fn list_archives_sorts_by_cycle_number() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let now = Utc::now(); - archive_cycle(&sid, 3, &[user_msg("c3")], "deepseek-v4-pro", now).unwrap(); - archive_cycle(&sid, 1, &[user_msg("c1")], "deepseek-v4-pro", now).unwrap(); - archive_cycle(&sid, 2, &[user_msg("c2")], "deepseek-v4-pro", now).unwrap(); - let archives = list_archives(&sid).unwrap(); - let cycles: Vec = archives.iter().map(|(n, _)| *n).collect(); - assert_eq!(cycles, vec![1, 2, 3]); - } - - #[tokio::test] - async fn execute_returns_empty_when_no_archives() { - let _home = HomeGuard::new(); - let sid = 
fresh_session_id(); - let workspace = TempDir::new().unwrap(); - let ctx = ctx_for_session(workspace.path(), &sid); - let tool = RecallArchiveTool; - let result = tool - .execute(json!({"query": "anything"}), &ctx) - .await - .unwrap(); - assert!(result.content.contains("No prior cycle archives")); - } - - #[tokio::test] - async fn execute_finds_matching_messages() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let workspace = TempDir::new().unwrap(); - let ctx = ctx_for_session(workspace.path(), &sid); - let now = Utc::now(); - let messages = vec![ - user_msg("How does the cycle restart strategy work?"), - asst_msg("It archives messages to JSONL when crossing the 110K threshold."), - user_msg("What happens if briefing is too short?"), - asst_msg("Use recall_archive to retrieve specific past content from JSONL files."), - ]; - archive_cycle(&sid, 1, &messages, "deepseek-v4-pro", now).unwrap(); - - let tool = RecallArchiveTool; - let result = tool - .execute( - json!({"query": "JSONL archive briefing", "max_results": 3}), - &ctx, - ) - .await - .unwrap(); - assert!( - result.content.contains("\"cycle\":1"), - "got: {}", - result.content - ); - assert!( - result.content.contains("\"hits\""), - "got: {}", - result.content - ); - assert!(result.content.contains("JSONL"), "got: {}", result.content); - } - - #[tokio::test] - async fn execute_filters_by_cycle() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let workspace = TempDir::new().unwrap(); - let ctx = ctx_for_session(workspace.path(), &sid); - let now = Utc::now(); - archive_cycle( - &sid, - 1, - &[user_msg("alpha pattern")], - "deepseek-v4-pro", - now, - ) - .unwrap(); - archive_cycle( - &sid, - 2, - &[user_msg("alpha pattern")], - "deepseek-v4-pro", - now, - ) - .unwrap(); - - let tool = RecallArchiveTool; - let result = tool - .execute( - json!({"query": "alpha", "cycle": 2, "max_results": 5}), - &ctx, - ) - .await - .unwrap(); - assert!( - 
result.content.contains("\"cycle\":2"), - "got: {}", - result.content - ); - assert!( - !result.content.contains("\"cycle\":1"), - "got: {}", - result.content - ); - } - - #[tokio::test] - async fn execute_caps_max_results_at_hard_max() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let workspace = TempDir::new().unwrap(); - let ctx = ctx_for_session(workspace.path(), &sid); - let now = Utc::now(); - let mut messages: Vec = Vec::new(); - for i in 0..30 { - messages.push(user_msg(&format!("alpha message number {i}"))); - } - archive_cycle(&sid, 1, &messages, "deepseek-v4-pro", now).unwrap(); - - let tool = RecallArchiveTool; - let result = tool - .execute(json!({"query": "alpha", "max_results": 999}), &ctx) - .await - .unwrap(); - let count = result.content.matches("\"message_index\":").count(); - assert!(count <= HARD_MAX_RESULTS, "got {count} hits"); - } - - #[tokio::test] - async fn execute_rejects_empty_query() { - let _home = HomeGuard::new(); - let sid = fresh_session_id(); - let workspace = TempDir::new().unwrap(); - let ctx = ctx_for_session(workspace.path(), &sid); - let tool = RecallArchiveTool; - let err = tool - .execute(json!({"query": " "}), &ctx) - .await - .unwrap_err(); - assert!(matches!(err, ToolError::InvalidInput { .. })); - } - - #[test] - fn best_window_centers_on_first_hit() { - let text = "lorem ipsum dolor sit amet, the quick brown fox jumps over the lazy dog"; - let win = best_window(text, &["fox".to_string()], 30); - assert!(win.contains("fox"), "got: {win}"); - } - - #[test] - fn best_window_falls_back_to_head_when_no_hits() { - let text = "the quick brown fox jumps"; - let win = best_window(text, &["zzz".to_string()], 10); - assert!(win.starts_with("the quick"), "got: {win}"); - } - - #[test] - fn align_char_boundary_handles_multibyte() { - let text = "héllo world"; - // Index 2 is mid-byte for `é` (UTF-8 encoded as 2 bytes). 
- let aligned = align_char_boundary(text, 2, true); - assert!(text.is_char_boundary(aligned), "boundary check"); - } - - #[test] - fn bm25_returns_relevant_docs_drops_irrelevant() { - // BM25 length normalization can let very short matching docs outrank - // longer ones with higher term-frequency, so we only assert the - // weak invariant: matching docs are returned, non-matching docs are - // filtered out. - let docs = vec![ - ArchivedDoc { - cycle: 1, - message_index: 0, - role: "user".to_string(), - text: "cat dog cat dog cat".to_string(), - tokens: tokenize("cat dog cat dog cat"), - }, - ArchivedDoc { - cycle: 1, - message_index: 1, - role: "user".to_string(), - text: "fish bird".to_string(), - tokens: tokenize("fish bird"), - }, - ArchivedDoc { - cycle: 1, - message_index: 2, - role: "user".to_string(), - text: "cat sleeps".to_string(), - tokens: tokenize("cat sleeps"), - }, - ]; - let hits = score_bm25(&docs, &["cat".to_string()], 3); - let indices: Vec = hits.iter().map(|h| h.message_index).collect(); - assert!(indices.contains(&0), "doc 0 (3x cat) should appear"); - assert!(indices.contains(&2), "doc 2 (1x cat) should appear"); - assert!(!indices.contains(&1), "zero-score doc filtered"); - assert!(hits[0].score > 0.0, "top hit has positive score"); - } -} diff --git a/crates/tui/src/tools/registry.rs b/crates/tui/src/tools/registry.rs index 8dcf0e54..cb3ca0bf 100644 --- a/crates/tui/src/tools/registry.rs +++ b/crates/tui/src/tools/registry.rs @@ -820,14 +820,6 @@ impl ToolRegistryBuilder { self.with_tool(Arc::new(ReviewTool::new(client, model))) } - /// Include the `recall_archive` tool — searches prior cycle archives - /// produced by the checkpoint-restart system (issue #127). - #[must_use] - pub fn with_recall_archive_tool(self) -> Self { - use super::recall_archive::RecallArchiveTool; - self.with_tool(Arc::new(RecallArchiveTool)) - } - /// Include note tool. 
#[must_use] pub fn with_note_tool(self) -> Self { @@ -982,7 +974,6 @@ impl ToolRegistryBuilder { .with_review_tool(client.clone(), model.clone()) .with_rlm_tool(client, model) .with_speech_tools(speech_client, speech_output_dir) - .with_recall_archive_tool() .with_subagent_tools(manager, runtime) } diff --git a/crates/tui/src/tools/subagent/tests.rs b/crates/tui/src/tools/subagent/tests.rs index 2fd3a51a..50469c09 100644 --- a/crates/tui/src/tools/subagent/tests.rs +++ b/crates/tui/src/tools/subagent/tests.rs @@ -453,9 +453,7 @@ fn forked_subagent_messages_preserve_parent_prefix_then_append_task() { let fork_context = SubAgentForkContext { system: Some(parent_system.clone()), messages: vec![parent_message.clone()], - structured_state_block: Some( - "## Cycle State (Auto-Preserved)\n- Mode: `AGENT`".to_string(), - ), + structured_state_block: Some("## Fork State\n- Mode: `AGENT`".to_string()), }; let assignment = SubAgentAssignment::new("inspect parser".to_string(), Some("worker".into())); diff --git a/crates/tui/src/tui/app.rs b/crates/tui/src/tui/app.rs index 8c517c6b..a9bcd607 100644 --- a/crates/tui/src/tui/app.rs +++ b/crates/tui/src/tui/app.rs @@ -17,10 +17,12 @@ use crate::config::{ }; use crate::config_ui::ConfigUiMode; use crate::core::coherence::CoherenceState; -use crate::cycle_manager::{CycleBriefing, CycleConfig}; use crate::hooks::{HookContext, HookEvent, HookExecutor, HookResult}; use crate::localization::{Locale, MessageId, resolve_locale, tr}; -use crate::models::{Message, SystemPrompt, Tool, compaction_threshold_for_model_and_effort}; +use crate::models::{ + Message, SystemPrompt, Tool, auto_compact_default_for_model, + compaction_threshold_for_model_at_percent, +}; use crate::palette::{self, UiTheme}; use crate::pricing::{CostCurrency, CostEstimate}; use crate::session_manager::SessionContextReference; @@ -606,7 +608,7 @@ fn looks_like_raw_mouse_report_fragment(run: &[char]) -> bool { /// first BEL (`\x07`), `\x1b\\`, lone `\\`, or the next 
`\x1b]8;` /// block — terminator characters are optional because crossterm may /// have already consumed them. -/// - **Kitty CSI**: `(\x1b?) [ (? | > | =) ... u` — the `?`/`>`/`=` +/// - **Kitty CSI**: `(\x1b?) [ (? | > | < | =) ... u` — the /// private-parameter prefix is what distinguishes a Kitty response /// from a user-typed `[…u` (which is exceedingly rare and would /// need an explicit private-parameter byte to be a real CSI). @@ -711,8 +713,8 @@ fn match_osc8_fragment(chars: &[char], start: usize) -> Option { } /// If a Kitty keyboard protocol CSI fragment starts at `chars[start]`, -/// return its end index (exclusive). Shape: `(ESC)? [ (? | > | =) -/// [0-9;:]* u`. The private-parameter byte (`?`, `>`, `=`) is what +/// return its end index (exclusive). Shape: `(ESC)? [ (? | > | < | =) +/// [0-9;:]* u`. The private-parameter byte (`?`, `>`, `<`, `=`) is what /// keeps this distinct from text the user might plausibly type. fn match_kitty_csi_fragment(chars: &[char], start: usize) -> Option { let after_csi = if chars.get(start) == Some(&'\x1b') && chars.get(start + 1) == Some(&'[') { @@ -724,7 +726,7 @@ fn match_kitty_csi_fragment(chars: &[char], start: usize) -> Option { }; let priv_byte = chars.get(after_csi)?; - if !matches!(priv_byte, '?' | '>' | '=') { + if !matches!(priv_byte, '?' | '>' | '<' | '=') { return None; } @@ -979,6 +981,10 @@ pub struct ViewportState { pub transcript_scrollbar_dragging: bool, pub last_transcript_area: Option, pub last_composer_area: Option, + /// Outer rect of the right-hand sidebar (when visible), stored at render + /// time so mouse hit-testing can keep scroll events over the sidebar from + /// leaking into the transcript viewport. 
+ pub last_sidebar_area: Option, pub last_transcript_top: usize, pub last_transcript_visible: usize, pub last_transcript_total: usize, @@ -1007,6 +1013,7 @@ impl Default for ViewportState { transcript_scrollbar_dragging: false, last_transcript_area: None, last_composer_area: None, + last_sidebar_area: None, last_transcript_top: 0, last_transcript_visible: 0, last_transcript_total: 0, @@ -1175,6 +1182,9 @@ pub struct App { /// Last status text already promoted from `status_message` into toast state. pub last_status_message_seen: Option, pub model: String, + /// Persisted model selections by provider name. Loaded from settings so + /// `/model` and the picker can surface saved provider-specific choices. + pub provider_models: HashMap, /// When true, the model is auto-selected based on request complexity /// rather than using a fixed model. The `/model auto` command sets this. /// `dispatch_user_message` calls `auto_model_heuristic` to resolve the @@ -1237,6 +1247,7 @@ pub struct App { #[allow(dead_code)] pub system_prompt: Option, pub auto_compact: bool, + pub auto_compact_user_configured: bool, pub auto_compact_threshold_percent: f64, pub calm_mode: bool, pub low_motion: bool, @@ -1536,14 +1547,6 @@ pub struct App { /// states. See [`App::arm_quit`] / [`App::quit_is_armed`]. pub quit_armed_until: Option, - /// Number of checkpoint-restart cycles crossed in this session - /// (issue #124). Mirrors `Session.cycle_count` on the engine side. - pub cycle_count: u32, - - /// Briefings produced at past cycle boundaries, in chronological order. - /// Used by `/cycles` and `/cycle ` slash commands. - pub cycle_briefings: Vec, - // === Prefix-Cache Stability Tracking === /// Number of times the prefix (system prompt + tool specs) has changed. pub prefix_change_count: u64, @@ -1558,10 +1561,6 @@ pub struct App { /// `/cache stats` for cache-hit debugging. 
pub last_pinned_prefix_hash: Option, - /// Active cycle configuration (token threshold, briefing cap, per-model - /// overrides). Loaded from config and forwarded to the engine. - pub cycle: CycleConfig, - // === Transcript filtering (#397) === /// Transcript cells the user has collapsed (hidden from view). /// Stores **original** virtual cell indices (pre-filtering). @@ -1753,8 +1752,15 @@ impl App { let settings = Settings::load().unwrap_or_else(|_| Settings::default()); let mut provider = config.api_provider(); - // Let settings override the config provider so runtime switches survive restarts. - if let Some(ref provider_str) = settings.default_provider + // Let settings preserve runtime switches only when config/CLI did not + // explicitly select a provider. A configured provider must not be + // pushed back to a stale saved setting on restart. + if config + .provider + .as_deref() + .and_then(ApiProvider::parse) + .is_none() + && let Some(ref provider_str) = settings.default_provider && let Some(parsed) = ApiProvider::parse(provider_str) { provider = parsed; @@ -1770,7 +1776,8 @@ impl App { let api_key_env_only = crate::config::active_provider_uses_env_only_api_key(&effective_auth_config); let was_onboarded = crate::tui::onboarding::is_onboarded(); - let auto_compact = settings.auto_compact; + let settings_auto_compact = settings.auto_compact; + let auto_compact_user_configured = Settings::auto_compact_explicitly_configured(); let auto_compact_threshold_percent = settings.auto_compact_threshold_percent; let calm_mode = settings.calm_mode; let low_motion = settings.low_motion; @@ -1808,10 +1815,10 @@ impl App { { ui_theme = ui_theme.with_background_color(background); } - let model = settings - .provider_models - .as_ref() - .and_then(|m| m.get(provider.as_str()).cloned()) + let provider_models = settings.provider_models.clone().unwrap_or_default(); + let model = provider_models + .get(provider.as_str()) + .cloned() .or_else(|| { // default_model is a 
DeepSeek-centric setting; other providers // get their model from config.toml / env (e.g. OPENAI_MODEL). @@ -1832,8 +1839,15 @@ impl App { } else { model.as_str() }; - let compact_threshold = - compaction_threshold_for_model_and_effort(threshold_model, configured_reasoning_effort); + let compact_threshold = compaction_threshold_for_model_at_percent( + threshold_model, + auto_compact_threshold_percent, + ); + let auto_compact = if auto_compact_user_configured { + settings_auto_compact + } else { + auto_compact_default_for_model(threshold_model) + }; let reasoning_effort = if auto_model { ReasoningEffort::Auto } else { @@ -1950,6 +1964,7 @@ impl App { sticky_status: None, last_status_message_seen: None, model, + provider_models, auto_model, last_effective_model: None, api_provider: provider, @@ -1970,6 +1985,7 @@ impl App { bracketed_paste_seen: false, system_prompt: None, auto_compact, + auto_compact_user_configured, auto_compact_threshold_percent, calm_mode, low_motion, @@ -2109,14 +2125,11 @@ impl App { last_submitted_prompt: None, auto_submit_initial_input, quit_armed_until: None, - cycle_count: 0, - cycle_briefings: Vec::new(), prefix_change_count: 0, prefix_checks_total: 0, prefix_stability_pct: None, last_prefix_change_desc: None, last_pinned_prefix_hash: None, - cycle: CycleConfig::default(), collapsed_cells: HashSet::new(), folded_thinking: HashSet::new(), collapsed_cell_map: Vec::new(), @@ -4708,7 +4721,10 @@ impl App { pub fn update_model_compaction_budget(&mut self) { let model = self.effective_model_for_budget().to_string(); self.compact_threshold = - compaction_threshold_for_model_and_effort(&model, self.reasoning_effort.api_value()); + compaction_threshold_for_model_at_percent(&model, self.auto_compact_threshold_percent); + if !self.auto_compact_user_configured { + self.auto_compact = auto_compact_default_for_model(&model); + } } pub fn set_model_selection(&mut self, model: String) { @@ -4780,12 +4796,6 @@ impl App { ..Default::default() } } - - /// 
Forward the active cycle configuration to the engine. Cloned so the - /// engine has its own copy to mutate per-session. - pub fn cycle_config(&self) -> CycleConfig { - self.cycle.clone() - } } pub fn media_attachment_reference(kind: &str, path: &Path, description: Option<&str>) -> String { @@ -5116,6 +5126,78 @@ mod tests { assert!(!app.api_key_env_only); } + #[test] + fn explicit_config_provider_wins_over_saved_default_provider() { + let _lock = lock_test_env(); + let tmp = tempfile::TempDir::new().expect("tempdir"); + let config_path = tmp.path().join("config.toml"); + std::fs::write( + tmp.path().join("settings.toml"), + "default_provider = \"deepseek\"\ndefault_model = \"deepseek-v4-pro\"\n", + ) + .expect("settings"); + let _config_path = EnvVarGuard::set("DEEPSEEK_CONFIG_PATH", &config_path); + + let config = Config { + provider: Some("xiaomi-mimo".to_string()), + providers: Some(ProvidersConfig { + xiaomi_mimo: ProviderConfig { + api_key: Some("mimo-config-key".to_string()), + model: Some("mimo-v2.5-pro".to_string()), + ..ProviderConfig::default() + }, + ..ProvidersConfig::default() + }), + ..Config::default() + }; + + let mut options = test_options(false); + options.model = "mimo-v2.5-pro".to_string(); + let app = App::new(options, &config); + + assert_eq!(app.api_provider, ApiProvider::XiaomiMimo); + assert_eq!(app.model, "mimo-v2.5-pro"); + assert!( + !app.onboarding_needs_api_key, + "Xiaomi MiMo provider config key should satisfy startup auth" + ); + } + + #[test] + fn app_new_defaults_auto_compact_on_for_256k_class_models_when_unset() { + let _lock = lock_test_env(); + let tmp = tempfile::TempDir::new().expect("tempdir"); + let config_path = tmp.path().join("config.toml"); + let _config_path = EnvVarGuard::set("DEEPSEEK_CONFIG_PATH", &config_path); + + let mut options = test_options(false); + options.model = "trinity-large-thinking".to_string(); + let app = App::new(options, &Config::default()); + + assert!(app.auto_compact); + 
assert!(!app.auto_compact_user_configured); + assert_eq!(app.auto_compact_threshold_percent, 80.0); + assert_eq!(app.compact_threshold, 209_715); + } + + #[test] + fn app_new_respects_explicit_auto_compact_false_for_256k_class_models() { + let _lock = lock_test_env(); + let tmp = tempfile::TempDir::new().expect("tempdir"); + let config_path = tmp.path().join("config.toml"); + std::fs::write(tmp.path().join("settings.toml"), "auto_compact = false\n") + .expect("settings"); + let _config_path = EnvVarGuard::set("DEEPSEEK_CONFIG_PATH", &config_path); + + let mut options = test_options(false); + options.model = "trinity-large-thinking".to_string(); + let app = App::new(options, &Config::default()); + + assert!(!app.auto_compact); + assert!(app.auto_compact_user_configured); + assert_eq!(app.compact_threshold, 209_715); + } + #[test] fn sidebar_focus_accepts_work_and_maps_legacy_trackers_to_work() { assert_eq!(SidebarFocus::from_setting("auto"), SidebarFocus::Auto); @@ -5328,9 +5410,9 @@ mod tests { app.insert_str("ready "); // Kitty keyboard protocol responses look like `\x1b[?1u`, - // `\x1b[>1u`, or `\x1b[?u`. With the ESC consumed, the tail - // shape is `[?…u` or `[>…u`. - app.insert_str("[?1u[>1u[?u"); + // `\x1b[>1u`, `\x1b[<1u`, or `\x1b[?u`. With the ESC consumed, + // the tail shape is `[?…u`, `[>…u`, or `[<…u`. + app.insert_str("[?1u[>1u[<1u[?u"); assert_eq!(app.input, "ready "); assert_eq!(app.cursor_position, "ready ".chars().count()); @@ -6079,14 +6161,31 @@ mod tests { #[test] fn test_update_model_compaction_budget() { let mut app = App::new(test_options(false), &Config::default()); + // Pin the inputs so the budget math is deterministic and does not + // depend on the developer's local `auto_compact_threshold_percent` + // setting (App::new loads real settings) or on auto-model resolution. 
+ app.auto_model = false; + app.auto_compact_threshold_percent = 80.0; + + // A large-context model earns a proportionally larger compaction + // budget; an unknown model falls back to the fixed default threshold. + app.model = "deepseek-v4-pro".to_string(); + app.update_model_compaction_budget(); + let large_window_threshold = app.compact_threshold; + app.model = "unknown-test-model".to_string(); app.update_model_compaction_budget(); - let initial_threshold = app.compact_threshold; - app.model = "deepseek-v3.2-128k".to_string(); - app.update_model_compaction_budget(); - // Threshold may have changed based on model - // Explicit 128k DeepSeek model IDs have a higher threshold than unknown models. - assert!(app.compact_threshold >= initial_threshold); + let unknown_threshold = app.compact_threshold; + + assert!( + unknown_threshold > 0, + "unknown model must still get a positive budget" + ); + assert!( + large_window_threshold > unknown_threshold, + "a large-context model ({large_window_threshold}) should budget more \ + than an unknown model ({unknown_threshold})" + ); } #[test] diff --git a/crates/tui/src/tui/history.rs b/crates/tui/src/tui/history.rs index ca07070e..9ef96eb5 100644 --- a/crates/tui/src/tui/history.rs +++ b/crates/tui/src/tui/history.rs @@ -15,7 +15,7 @@ use crate::tools::review::ReviewOutput; use crate::tui::app::TranscriptSpacing; use crate::tui::diff_render; use crate::tui::markdown_render; -use crate::tui::ui_text::CopyLineSeparator; +use crate::tui::ui_text::{CopyLineSeparator, truncate_line_to_width}; // === Constants === @@ -266,7 +266,17 @@ impl HistoryCell { folded: bool, ) -> Vec> { match self { - HistoryCell::Thinking { .. } if !options.show_thinking => Vec::new(), + HistoryCell::Thinking { + streaming, + duration_secs, + .. 
+ } if !options.show_thinking => { + if *streaming { + render_hidden_thinking_activity(width, *duration_secs, options.low_motion) + } else { + Vec::new() + } + } HistoryCell::Thinking { content, streaming, @@ -2271,6 +2281,46 @@ fn render_thinking( lines } +fn render_hidden_thinking_activity( + width: u16, + duration_secs: Option, + low_motion: bool, +) -> Vec> { + let state = ThinkingVisualState::Live; + let rail_style = Style::default().fg(thinking_state_accent(state)); + let body_style = thinking_style().italic(); + let content_width = width.saturating_sub(3).max(1) as usize; + + let mut header_spans = vec![ + Span::styled( + format!("{REASONING_OPENER} "), + Style::default().fg(thinking_state_accent(state)), + ), + Span::styled("thinking", thinking_title_style()), + Span::styled(" ", Style::default()), + Span::styled(thinking_status_label(state), thinking_status_style(state)), + ]; + if let Some(dur) = duration_secs { + header_spans.push(Span::styled(" · ", Style::default().fg(palette::TEXT_DIM))); + header_spans.push(Span::styled(format!("{dur:.1}s"), thinking_meta_style())); + } + + let mut body = + truncate_line_to_width("reasoning hidden; model is still working", content_width); + if !low_motion { + body.push(' '); + body.push_str(REASONING_CURSOR); + } + + vec![ + Line::from(header_spans), + Line::from(vec![ + Span::styled(REASONING_RAIL.to_string(), rail_style), + Span::styled(body, body_style), + ]), + ] +} + fn render_message( prefix: &str, label_style: Style, @@ -2291,6 +2341,15 @@ fn render_message_with_copy_metadata( content: &str, width: u16, ) -> Vec { + // An assistant cell whose content is entirely whitespace (e.g. a stray + // newline streamed between reasoning and a tool call) would otherwise + // render as a bare, orphaned role glyph floating on its own line — the + // "blue dots with nothing after them" artifact. Render nothing so the + // transcript doesn't accumulate empty markers. 
Real prose, including + // messages that merely start with blank lines, still renders normally. + if prefix == ASSISTANT_GLYPH && content.trim().is_empty() { + return Vec::new(); + } let prefix_width = UnicodeWidthStr::width(prefix); let prefix_width_u16 = u16::try_from(prefix_width.saturating_add(2)).unwrap_or(u16::MAX); let content_width = usize::from(width.saturating_sub(prefix_width_u16).max(1)); @@ -3903,6 +3962,56 @@ mod tests { ); } + #[test] + fn render_hidden_streaming_thinking_shows_activity_without_content() { + let cell = HistoryCell::Thinking { + content: "private chain of thought that must not be shown".to_string(), + streaming: true, + duration_secs: None, + }; + + let lines = cell.lines_with_options( + 80, + TranscriptRenderOptions { + show_thinking: false, + low_motion: true, + ..TranscriptRenderOptions::default() + }, + ); + let text = lines_text(&lines); + + assert!( + text.contains("reasoning hidden"), + "hidden live thinking should still show progress: {text}" + ); + assert!( + !text.contains("private chain of thought"), + "hidden live thinking must not reveal content: {text}" + ); + } + + #[test] + fn render_hidden_completed_thinking_stays_hidden() { + let cell = HistoryCell::Thinking { + content: "completed hidden reasoning".to_string(), + streaming: false, + duration_secs: Some(1.0), + }; + + let lines = cell.lines_with_options( + 80, + TranscriptRenderOptions { + show_thinking: false, + ..TranscriptRenderOptions::default() + }, + ); + + assert!( + lines.is_empty(), + "completed hidden thinking should stay out of the transcript" + ); + } + #[test] fn render_thinking_streaming_truncated_shows_continues_affordance() { // #861 RC4: when a streaming thinking block exceeds the line cap, @@ -4084,6 +4193,38 @@ mod tests { assert_ne!(head.style.bg, Some(palette::SURFACE_ELEVATED)); } + #[test] + fn whitespace_only_assistant_cell_renders_nothing() { + // Regression: a stray newline/space streamed between reasoning and a + // tool call produced a 
whitespace-only Assistant cell that rendered as + // a bare, orphaned role glyph — the "blue dot with nothing after it" + // artifact. It must collapse to zero lines instead. + for content in ["", " ", "\n", "\n\n", " \t \n"] { + for streaming in [false, true] { + let cell = HistoryCell::Assistant { + content: content.to_string(), + streaming, + }; + assert!( + cell.lines(80).is_empty(), + "whitespace-only assistant content {content:?} (streaming={streaming}) \ + must render no lines", + ); + } + } + + // Sanity: real prose still renders the role glyph as its first span. + let cell = HistoryCell::Assistant { + content: "hi".to_string(), + streaming: false, + }; + assert_eq!( + cell.lines(80)[0].spans[0].content.as_ref(), + ASSISTANT_GLYPH, + "non-empty assistant content must still render the role glyph", + ); + } + #[test] fn assistant_cell_still_renders_markdown() { let cell = HistoryCell::Assistant { diff --git a/crates/tui/src/tui/model_picker.rs b/crates/tui/src/tui/model_picker.rs index 563bcf41..e2a8f1a0 100644 --- a/crates/tui/src/tui/model_picker.rs +++ b/crates/tui/src/tui/model_picker.rs @@ -39,6 +39,7 @@ enum Pane { pub struct ModelPickerView { initial_model: String, + initial_provider: ApiProvider, initial_effort: ReasoningEffort, /// Working selection (separate from the initial values so we can offer a /// clean Esc-to-cancel without mutating App state). @@ -48,7 +49,14 @@ pub struct ModelPickerView { /// True when the active model is one we don't list — we still show it /// so the picker doesn't quietly forget the user's chosen IDs. 
show_custom_model_row: bool, - model_ids: Vec<&'static str>, + model_rows: Vec, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct ModelPickerRow { + id: String, + provider: Option, + hint: String, } impl ModelPickerView { @@ -59,11 +67,14 @@ impl ModelPickerView { } else { app.model.clone() }; - let model_ids = picker_model_ids_for_provider(app.api_provider); - let mut selected_model_idx = model_ids.iter().position(|id| *id == initial_model); + let model_rows = picker_model_rows_for_app(app); + let mut selected_model_idx = model_rows.iter().position(|row| { + row.id == initial_model + && (row.provider.is_none() || row.provider == Some(app.api_provider)) + }); let show_custom_model_row = selected_model_idx.is_none(); if show_custom_model_row { - selected_model_idx = Some(model_ids.len()); + selected_model_idx = Some(model_rows.len()); } let selected_model_idx = selected_model_idx.unwrap_or(0); @@ -80,35 +91,49 @@ impl ModelPickerView { Self { initial_model, + initial_provider: app.api_provider, initial_effort, selected_model_idx, selected_effort_idx, focus: Pane::Model, show_custom_model_row, - model_ids, + model_rows, } } - fn visible_model_ids(&self) -> Vec<&'static str> { - self.model_ids.clone() + #[cfg(test)] + fn visible_model_ids(&self) -> Vec<&str> { + self.model_rows.iter().map(|row| row.id.as_str()).collect() + } + + fn visible_model_rows(&self) -> &[ModelPickerRow] { + &self.model_rows } fn model_row_count(&self) -> usize { - self.visible_model_ids().len() + if self.show_custom_model_row { 1 } else { 0 } + self.model_rows.len() + if self.show_custom_model_row { 1 } else { 0 } } /// Resolve the currently highlighted row to a model id. 
fn resolved_model(&self) -> String { - let visible = self.visible_model_ids(); - if self.show_custom_model_row && self.selected_model_idx == visible.len() { + if self.show_custom_model_row && self.selected_model_idx == self.model_rows.len() { self.initial_model.clone() - } else if self.selected_model_idx < visible.len() { - visible[self.selected_model_idx].to_string() + } else if self.selected_model_idx < self.model_rows.len() { + self.model_rows[self.selected_model_idx].id.clone() } else { self.initial_model.clone() } } + fn resolved_provider(&self) -> Option { + if self.show_custom_model_row && self.selected_model_idx == self.model_rows.len() { + return Some(self.initial_provider); + } + self.model_rows + .get(self.selected_model_idx) + .and_then(|row| row.provider) + } + fn resolved_effort(&self) -> ReasoningEffort { if self.resolved_model().trim().eq_ignore_ascii_case("auto") { return ReasoningEffort::Auto; @@ -162,8 +187,12 @@ impl ModelPickerView { } fn build_event(&self) -> ViewEvent { + let provider = self + .resolved_provider() + .filter(|provider| *provider != self.initial_provider); ViewEvent::ModelPickerApplied { model: self.resolved_model(), + provider, effort: self.resolved_effort(), previous_model: self.initial_model.clone(), previous_effort: self.initial_effort, @@ -311,6 +340,7 @@ fn fit_text(text: &str, width: usize) -> String { out } +#[cfg(test)] fn picker_model_ids_for_provider(provider: ApiProvider) -> Vec<&'static str> { let mut models = vec!["auto"]; for id in model_completion_names_for_provider(provider) { @@ -321,6 +351,58 @@ fn picker_model_ids_for_provider(provider: ApiProvider) -> Vec<&'static str> { models } +fn picker_model_rows_for_app(app: &App) -> Vec { + let mut rows = Vec::new(); + push_model_row( + &mut rows, + "auto".to_string(), + None, + picker_model_hint("auto").to_string(), + ); + + for id in model_completion_names_for_provider(app.api_provider) { + if id != "auto" { + push_model_row( + &mut rows, + id.to_string(), + 
Some(app.api_provider), + picker_model_hint(id).to_string(), + ); + } + } + + if let Some(model) = app + .provider_models + .get(app.api_provider.as_str()) + .map(|model| model.trim()) + .filter(|model| !model.is_empty()) + { + push_model_row( + &mut rows, + model.to_string(), + Some(app.api_provider), + format!("{} saved", app.api_provider.display_name()), + ); + } + + rows +} + +fn push_model_row( + rows: &mut Vec, + id: String, + provider: Option, + hint: String, +) { + if rows + .iter() + .any(|row| row.id == id && row.provider == provider) + { + return; + } + rows.push(ModelPickerRow { id, provider, hint }); +} + fn picker_model_hint(id: &str) -> &'static str { match id { "auto" => "select per turn", @@ -331,7 +413,8 @@ fn picker_model_hint(id: &str) -> &'static str { "faster model" } "arcee-ai/trinity-large-thinking" => "large thinking", - "xiaomi/mimo-v2.5-pro" | "mimo-v2.5-pro" => "long context", + "xiaomi/mimo-v2.5-pro" | "mimo-v2.5-pro" => "reasoning / coding", + "xiaomi/mimo-v2.5" | "mimo-v2.5" => "v2.5 omni", "mimo-v2.5-tts" | "mimo-v2-tts" => "speech / TTS", "mimo-v2.5-tts-voicedesign" => "voice design", "mimo-v2.5-tts-voiceclone" => "voice clone", @@ -474,9 +557,9 @@ impl ModelPickerView { .split(inner); let mut model_rows: Vec<(String, String)> = self - .visible_model_ids() - .into_iter() - .map(|id| (id.to_string(), picker_model_hint(id).to_string())) + .visible_model_rows() + .iter() + .map(|row| (row.id.clone(), row.hint.clone())) .collect(); if self.show_custom_model_row { model_rows.push((self.initial_model.clone(), "current (custom)".to_string())); @@ -558,6 +641,7 @@ mod tests { app.reasoning_effort = ReasoningEffort::Max; app.api_provider = crate::config::ApiProvider::Deepseek; app.model_ids_passthrough = false; + app.provider_models.clear(); (app, lock) } @@ -661,6 +745,38 @@ mod tests { assert_eq!(view.resolved_model(), "minimax/minimax-m3"); } + #[test] + fn picker_lists_xiaomi_mimo_chat_models_without_speech_models() { + let (mut app, 
_lock) = create_test_app(); + app.api_provider = crate::config::ApiProvider::XiaomiMimo; + app.model = "mimo-v2.5-pro".to_string(); + app.auto_model = false; + + let view = ModelPickerView::new(&app); + let model_ids = view.visible_model_ids(); + + for expected in ["mimo-v2.5-pro", "mimo-v2.5"] { + assert!(model_ids.contains(&expected), "missing {expected}"); + } + for deprecated in ["mimo-v2-pro", "mimo-v2-omni", "mimo-v2-flash"] { + assert!( + !model_ids.contains(&deprecated), + "{deprecated} is deprecated and should not be promoted" + ); + } + for speech_model in [ + "mimo-v2.5-tts", + "mimo-v2.5-tts-voicedesign", + "mimo-v2.5-tts-voiceclone", + "mimo-v2-tts", + ] { + assert!( + !model_ids.contains(&speech_model), + "{speech_model} should not appear in the chat model picker" + ); + } + } + #[test] fn visible_row_window_tracks_selection_in_short_panes() { assert_eq!(visible_row_window(0, 16, 8), (0, 8)); @@ -704,6 +820,88 @@ mod tests { assert_eq!(view.resolved_model(), "opencode-go/glm-5.1"); } + #[test] + fn picker_exposes_saved_model_for_active_provider() { + let (mut app, _lock) = create_test_app(); + app.api_provider = crate::config::ApiProvider::XiaomiMimo; + app.model = "mimo-v2.5-custom".to_string(); + app.auto_model = false; + app.provider_models + .insert("xiaomi-mimo".to_string(), "mimo-v2.5-custom".to_string()); + + let mut view = ModelPickerView::new(&app); + view.selected_model_idx = view + .model_rows + .iter() + .position(|row| { + row.id == "mimo-v2.5-custom" + && row.provider == Some(crate::config::ApiProvider::XiaomiMimo) + }) + .expect("saved Xiaomi MiMo model row"); + + let action = view.handle_key(KeyEvent::new( + KeyCode::Enter, + crossterm::event::KeyModifiers::NONE, + )); + match action { + ViewAction::EmitAndClose(ViewEvent::ModelPickerApplied { + model, provider, .. 
+ }) => { + assert_eq!(model, "mimo-v2.5-custom"); + assert_eq!(provider, None); + } + other => panic!("expected ModelPickerApplied EmitAndClose, got {other:?}"), + } + } + + #[test] + fn picker_hides_saved_models_from_other_providers() { + let (mut app, _lock) = create_test_app(); + app.api_provider = crate::config::ApiProvider::XiaomiMimo; + app.model = "mimo-v2.5-pro".to_string(); + app.auto_model = false; + app.provider_models + .insert("deepseek".to_string(), "deepseek-v4-pro".to_string()); + app.provider_models + .insert("moonshot".to_string(), "kimi-k2.6".to_string()); + + let view = ModelPickerView::new(&app); + let model_ids = view.visible_model_ids(); + + assert!(model_ids.contains(&"mimo-v2.5-pro")); + assert!(!model_ids.contains(&"deepseek-v4-pro")); + assert!(!model_ids.contains(&"kimi-k2.6")); + assert!(!view.show_custom_model_row); + } + + #[test] + fn picker_does_not_hijack_current_custom_model_with_saved_provider_row() { + let (mut app, _lock) = create_test_app(); + app.api_provider = crate::config::ApiProvider::Openai; + app.model_ids_passthrough = true; + app.model = "kimi-k2.6".to_string(); + app.provider_models + .insert("moonshot".to_string(), "kimi-k2.6".to_string()); + + let mut view = ModelPickerView::new(&app); + + assert!(view.show_custom_model_row); + assert_eq!(view.resolved_model(), "kimi-k2.6"); + let action = view.handle_key(KeyEvent::new( + KeyCode::Enter, + crossterm::event::KeyModifiers::NONE, + )); + match action { + ViewAction::EmitAndClose(ViewEvent::ModelPickerApplied { + model, provider, .. 
+ }) => { + assert_eq!(model, "kimi-k2.6"); + assert_eq!(provider, None); + } + other => panic!("expected ModelPickerApplied EmitAndClose, got {other:?}"), + } + } + #[test] fn arrow_keys_move_within_focused_pane() { let (mut app, _lock) = create_test_app(); diff --git a/crates/tui/src/tui/mouse_ui.rs b/crates/tui/src/tui/mouse_ui.rs index 1d0feca0..8fc0e3c5 100644 --- a/crates/tui/src/tui/mouse_ui.rs +++ b/crates/tui/src/tui/mouse_ui.rs @@ -221,12 +221,32 @@ pub(crate) fn handle_mouse_event(app: &mut App, mouse: MouseEvent) -> Vec= area.x + && mouse.column < area.x.saturating_add(area.width) + && mouse.row >= area.y + && mouse.row < area.y.saturating_add(area.height) + }) { + return Vec::new(); + } + match mouse.kind { MouseEventKind::Moved => { // Update last mouse position for tooltip rendering. app.last_mouse_pos = Some((mouse.column, mouse.row)); - // Check sidebar sections for hover tooltip. + // Check sidebar sections for hover tooltip. Only surface a tooltip + // when the hovered line was actually truncated to fit the panel + // width — otherwise it just paints a redundant copy of + // already-visible text over the neighbouring row, which reads as + // visual corruption. 
let mut found = false; for section in &app.sidebar_hover.sections { if mouse.column >= section.content_area.x @@ -243,10 +263,12 @@ pub(crate) fn handle_mouse_event(app: &mut App, mouse: MouseEvent) -> Vec section.content_area.width as usize; + let desired = truncated.then(|| full.clone()); + if app.sidebar_hover_tooltip != desired { + app.sidebar_hover_tooltip = desired; app.needs_redraw = true; } found = true; diff --git a/crates/tui/src/tui/notifications.rs b/crates/tui/src/tui/notifications.rs index 1cadf7d1..5c3b84bd 100644 --- a/crates/tui/src/tui/notifications.rs +++ b/crates/tui/src/tui/notifications.rs @@ -508,7 +508,7 @@ fn macos_display_notification(msg: &str) { /// Examples: /// * `"45s"`, `"1m"`, `"1m 12s"` /// * `"1h"`, `"3h 12m"` (#447 — was previously `"192m"` form) -/// * `"1d"`, `"2d 5h"` (#447 — multi-day sessions/cycles) +/// * `"1d"`, `"2d 5h"` (#447 — multi-day sessions) /// * `"1w"`, `"3w 2d"` (#447 — long-running automations) /// /// The output drops the secondary unit when it's zero, so `"1h"` diff --git a/crates/tui/src/tui/sidebar.rs b/crates/tui/src/tui/sidebar.rs index 89316271..9645bd8f 100644 --- a/crates/tui/src/tui/sidebar.rs +++ b/crates/tui/src/tui/sidebar.rs @@ -174,7 +174,6 @@ struct SidebarWorkSummary { goal_completed: bool, goal_started_at: Option, tokens_used: u32, - cycle_count: u32, checklist_completion_pct: u8, checklist_items: Vec, strategy_explanation: Option, @@ -194,7 +193,6 @@ impl SidebarWorkSummary { self.goal_objective .as_deref() .is_some_and(|s| !s.trim().is_empty()) - || self.cycle_count > 0 || !self.checklist_items.is_empty() || self.has_strategy() || self.state_updating @@ -235,7 +233,6 @@ fn sidebar_work_summary(app: &App) -> SidebarWorkSummary { goal_completed: app.hunt.verdict == HuntVerdict::Hunted, goal_started_at: app.hunt.started_at, tokens_used: app.session.total_conversation_tokens, - cycle_count: app.cycle_count, ..SidebarWorkSummary::default() }; @@ -303,17 +300,6 @@ fn work_panel_lines( 
push_work_checklist_lines(summary, content_width, max_rows, &mut lines, ui_theme); push_work_strategy_lines(summary, content_width, max_rows, &mut lines, &theme); - if summary.cycle_count > 0 && lines.len() < max_rows { - lines.push(Line::from(Span::styled( - format!( - "cycles: {} (active: {})", - summary.cycle_count, - summary.cycle_count.saturating_add(1) - ), - Style::default().fg(ui_theme.text_muted), - ))); - } - if lines.is_empty() { lines.push(Line::from(Span::styled( work_panel_empty_hint(content_width), @@ -1854,18 +1840,6 @@ fn render_context_panel(f: &mut Frame, area: Rect, app: &mut App) { Style::default().fg(theme.text_muted), ))); - // ── Cycles ─────────────────────────────────────────────────── - if app.cycle_count > 0 { - lines.push(Line::from(Span::styled( - format!( - "cycles: {} crossed, {} briefing(s)", - app.cycle_count, - app.cycle_briefings.len() - ), - Style::default().fg(theme.text_muted), - ))); - } - // ── Memory ─────────────────────────────────────────────────── if app.use_memory { let size_hint = std::fs::metadata(&app.memory_path) diff --git a/crates/tui/src/tui/ui.rs b/crates/tui/src/tui/ui.rs index d0082c31..81b3d92a 100644 --- a/crates/tui/src/tui/ui.rs +++ b/crates/tui/src/tui/ui.rs @@ -40,7 +40,7 @@ use crate::client::{ inspect_prompt_for_request, }; use crate::commands; -use crate::compaction::{MINIMUM_AUTO_COMPACTION_TOKENS, estimate_input_tokens_conservative}; +use crate::compaction::estimate_input_tokens_conservative; use crate::config::{ ApiProvider, Config, DEFAULT_NVIDIA_NIM_BASE_URL, ProviderConfig, ProvidersConfig, StatusItem, UpdateConfig, save_provider_auth_mode_for, @@ -751,7 +751,6 @@ fn build_engine_config(app: &App, config: &Config) -> EngineConfig { max_subagents: app.max_subagents, features: config.features(), compaction: app.compaction_config(), - cycle: app.cycle_config(), capacity: crate::core::capacity::CapacityControllerConfig::from_app_config(config), todos: app.todos.clone(), plan_state: 
app.plan_state.clone(), @@ -858,6 +857,7 @@ async fn refresh_active_task_panel(app: &mut App, task_manager: &SharedTaskManag .map(task_summary_to_panel_entry) .collect(); + entries.extend(active_reasoning_task_entries(app)); entries.extend(active_rlm_task_entries(app)); if let Some(shell_mgr) = app.runtime_services.shell_manager.as_ref() @@ -879,6 +879,32 @@ async fn refresh_active_task_panel(app: &mut App, task_manager: &SharedTaskManag app.task_panel = entries; } +fn active_reasoning_task_entries(app: &App) -> Vec { + let Some(active) = app.active_cell.as_ref() else { + return Vec::new(); + }; + let duration_ms = app + .turn_started_at + .map(|started| u64::try_from(started.elapsed().as_millis()).unwrap_or(u64::MAX)); + + active + .entries() + .iter() + .enumerate() + .filter_map(|(idx, entry)| match entry { + HistoryCell::Thinking { + streaming: true, .. + } => Some(TaskPanelEntry { + id: format!("reasoning-{}", idx + 1), + status: "running".to_string(), + prompt_summary: "model reasoning".to_string(), + duration_ms, + }), + _ => None, + }) + .collect() +} + fn active_rlm_task_entries(app: &App) -> Vec { let Some(active) = app.active_cell.as_ref() else { return Vec::new(); @@ -1150,7 +1176,18 @@ async fn run_event_loop( let mut queued_to_send: Option = None; { let mut rx = engine_handle.rx_event.write().await; - while let Ok(event) = rx.try_recv() { + loop { + let event = match rx.try_recv() { + Ok(event) => event, + Err(tokio::sync::mpsc::error::TryRecvError::Empty) => break, + Err(tokio::sync::mpsc::error::TryRecvError::Disconnected) => { + if recover_engine_event_disconnect(app) { + received_engine_event = true; + transcript_batch_updated = true; + } + break; + } + }; received_engine_event = true; if app.suppress_stream_events_until_turn_complete { if matches!(event, EngineEvent::TurnStarted { .. 
}) { @@ -1884,21 +1921,6 @@ async fn run_event_loop( app.is_purging = false; app.status_message = Some(message); } - EngineEvent::CycleAdvanced { from, to, briefing } => { - // Mirror the engine-side counter on the UI app state - // so the sidebar / slash commands stay in sync, and - // record the briefing so `/cycle ` can show it. - app.cycle_count = to; - let briefing_tokens = briefing.token_estimate; - app.cycle_briefings.push(briefing); - let separator = format!( - "─── cycle {from} → {to} (briefing: {briefing_tokens} tokens) ───" - ); - app.add_message(HistoryCell::System { content: separator }); - app.status_message = Some(format!( - "↻ context refreshed (cycle {from} → {to}, briefing: {briefing_tokens} tokens carried)" - )); - } EngineEvent::CoherenceState { state, .. } => { app.coherence_state = state; } @@ -4205,6 +4227,55 @@ fn reconcile_turn_liveness(app: &mut App, now: Instant, has_running_agents: bool false } +fn recover_engine_event_disconnect(app: &mut App) -> bool { + let had_live_work = app.is_loading + || app.is_compacting + || app.is_purging + || matches!(app.runtime_turn_status.as_deref(), Some("in_progress")) + || app.streaming_message_index.is_some() + || app.streaming_thinking_active_entry.is_some() + || app + .active_cell + .as_ref() + .is_some_and(|cell| !cell.is_empty()); + + if !had_live_work { + return false; + } + + streaming_thinking::finalize_current(app); + app.finalize_streaming_assistant_as_interrupted(); + app.finalize_active_cell_as_interrupted(); + app.streaming_state.reset(); + app.streaming_message_index = None; + app.streaming_thinking_active_entry = None; + app.is_loading = false; + app.is_compacting = false; + app.is_purging = false; + app.turn_started_at = None; + app.turn_last_activity_at = None; + app.runtime_turn_status = None; + app.runtime_turn_id = None; + app.dispatch_started_at = None; + app.user_scrolled_during_stream = false; + + for msg in app.drain_pending_steers() { + app.queue_message(msg); + } + + 
app.add_message(HistoryCell::Error { + message: "Engine stopped before completing the turn. Check ~/.codewhale/crashes and retry." + .to_string(), + severity: crate::error_taxonomy::ErrorSeverity::Error, + }); + app.push_status_toast( + "Engine stopped before completing the turn.", + StatusToastLevel::Error, + None, + ); + true +} + fn record_turn_activity(app: &mut App, event: &EngineEvent, now: Instant) { if matches!(event, EngineEvent::TurnStarted { .. }) { app.turn_last_activity_at = Some(now); @@ -4916,8 +4987,10 @@ async fn drain_web_config_events( /// a one-line status describing what changed. async fn apply_model_picker_choice( app: &mut App, - engine_handle: &EngineHandle, + engine_handle: &mut EngineHandle, + config: &mut Config, model: String, + target_provider: Option, mut effort: crate::tui::app::ReasoningEffort, previous_model: String, previous_effort: crate::tui::app::ReasoningEffort, @@ -4926,6 +4999,25 @@ async fn apply_model_picker_choice( if model_is_auto { effort = ReasoningEffort::Auto; } + if let Some(target_provider) = target_provider + && target_provider != app.api_provider + && !model_is_auto + { + switch_provider( + app, + engine_handle, + config, + target_provider, + Some(model.clone()), + ) + .await; + if app.api_provider != target_provider { + return; + } + apply_picker_effort_choice(app, engine_handle, effort, previous_effort).await; + return; + } + let model_changed = model != previous_model || app.auto_model != model_is_auto; let effort_changed = effort != previous_effort; if !model_changed && !effort_changed { @@ -4938,6 +5030,8 @@ async fn apply_model_picker_choice( if model_changed { app.set_model_selection(model.clone()); + app.provider_models + .insert(app.api_provider.as_str().to_string(), model.clone()); app.clear_model_scoped_telemetry(); } if effort_changed { @@ -4954,7 +5048,12 @@ async fn apply_model_picker_choice( let persist_result = (|| -> anyhow::Result<()> { let mut settings = crate::settings::Settings::load()?; if 
model_changed { - settings.set("default_model", &model)?; + if matches!( + app.api_provider, + ApiProvider::Deepseek | ApiProvider::DeepseekCN + ) { + settings.set("default_model", &model)?; + } settings.set_model_for_provider(app.api_provider.as_str(), &model); } if effort_changed { @@ -5001,6 +5100,42 @@ async fn apply_model_picker_choice( app.status_message = Some(summary); } +async fn apply_picker_effort_choice( + app: &mut App, + engine_handle: &EngineHandle, + effort: ReasoningEffort, + previous_effort: ReasoningEffort, +) { + if effort == previous_effort { + return; + } + + app.reasoning_effort = effort; + app.last_effective_reasoning_effort = None; + app.update_model_compaction_budget(); + + let persist_warning = (|| -> anyhow::Result<()> { + let mut settings = crate::settings::Settings::load()?; + settings.set("reasoning_effort", effort.as_setting())?; + settings.save() + })() + .err() + .map(|err| format!(" (not persisted: {err})")); + + apply_model_and_compaction_update(engine_handle, app.compaction_config(), app.mode).await; + + let mut summary = format!( + "Thinking: {} → {} · model {}", + previous_effort.short_label(), + effort.short_label(), + app.model_display_label() + ); + if let Some(warning) = persist_warning { + summary.push_str(&warning); + } + app.status_message = Some(summary); +} + /// Apply a `/provider` switch by mutating the in-memory config, validating /// that credentials exist for the new provider, then respawning the engine /// so the API client picks up the new base URL/key. 
When `model_override` @@ -5018,6 +5153,7 @@ async fn switch_provider( let previous_provider_str = config.provider.clone(); let previous_base_url = config.base_url.clone(); let previous_default_text_model = config.default_text_model.clone(); + let previous_providers = config.providers.clone(); config.provider = Some(target.as_str().to_string()); if matches!(target, ApiProvider::NvidiaNim) @@ -5029,23 +5165,24 @@ async fn switch_provider( { config.base_url = Some(DEFAULT_NVIDIA_NIM_BASE_URL.to_string()); } - if matches!(target, ApiProvider::Deepseek) + if matches!(target, ApiProvider::Deepseek | ApiProvider::DeepseekCN) && config .base_url .as_deref() - .map(|base| base.contains("integrate.api.nvidia.com")) + .map(root_base_url_belongs_to_non_deepseek_provider) .unwrap_or(false) { config.base_url = None; } if let Some(ref model) = model_override { - config.default_text_model = Some(model.clone()); + config.provider_config_for_mut(target).model = Some(model.clone()); } if let Err(err) = DeepSeekClient::new(config) { config.provider = previous_provider_str; config.base_url = previous_base_url; config.default_text_model = previous_default_text_model; + config.providers = previous_providers; app.add_message(HistoryCell::System { content: format!( "Failed to switch provider to {}: {err}\nProvider unchanged ({}).", @@ -5061,6 +5198,10 @@ async fn switch_provider( app.api_provider = target; app.model_ids_passthrough = config.model_ids_pass_through(); app.set_model_selection(new_model.clone()); + if model_override.is_some() { + app.provider_models + .insert(target.as_str().to_string(), new_model.clone()); + } app.update_model_compaction_budget(); if cache_scope_changed { app.clear_model_scoped_telemetry(); @@ -5105,10 +5246,37 @@ async fn switch_provider( // Persist the provider choice so it survives restarts. 
if let Ok(mut settings) = crate::settings::Settings::load() { settings.default_provider = Some(target.as_str().to_string()); + if model_override.is_some() { + settings.set_model_for_provider(target.as_str(), &new_model); + if matches!(target, ApiProvider::Deepseek | ApiProvider::DeepseekCN) { + let _ = settings.set("default_model", &new_model); + } + } let _ = settings.save(); } } +fn root_base_url_belongs_to_non_deepseek_provider(base_url: &str) -> bool { + let lower = base_url.to_ascii_lowercase(); + [ + "integrate.api.nvidia.com", + "api.openai.com", + "api.atlascloud.ai", + "maas-openapi.wanjiedata.com", + "volces.com", + "openrouter.ai", + "xiaomimimo.com", + "novita.ai", + "fireworks.ai", + "siliconflow", + "arcee.ai", + "moonshot.ai", + "api.kimi.com", + ] + .iter() + .any(|needle| lower.contains(needle)) +} + fn sync_config_provider_from_app(config: &mut Config, app: &App) { config.provider = Some(app.api_provider.as_str().to_string()); } @@ -6388,6 +6556,10 @@ fn render(f: &mut Frame, app: &mut App) { sidebar_area = Some(split[1]); } + // Record the sidebar rect (or its absence) every frame so mouse + // hit-testing can route scroll events correctly. + app.viewport.last_sidebar_area = sidebar_area; + let chat_widget = ChatWidget::new(app, chat_area); let buf = f.buffer_mut(); chat_widget.render(chat_area, buf); @@ -6404,8 +6576,10 @@ fn render(f: &mut Frame, app: &mut App) { let x = mouse_col .saturating_add(2) .min(size.width.saturating_sub(text_width)); + // Sit one row BELOW the cursor so the tooltip never paints over + // the row above the hovered line (which read as corruption). 
let y = mouse_row - .saturating_sub(1) + .saturating_add(1) .min(size.height.saturating_sub(tooltip_height)); if text_width > 0 && tooltip_height > 0 { let tooltip_area = Rect { @@ -6414,10 +6588,12 @@ fn render(f: &mut Frame, app: &mut App) { width: text_width, height: tooltip_height, }; + // Neutral elevated-surface styling so the tooltip reads as a + // tooltip, not a warning highlight (was STATUS_WARNING). let tooltip = ratatui::widgets::Paragraph::new(tooltip_text.as_str()).style( Style::default() - .bg(palette::STATUS_WARNING) - .fg(palette::TEXT_MUTED), + .bg(palette::SURFACE_ELEVATED) + .fg(palette::TEXT_PRIMARY), ); f.render_widget(tooltip, tooltip_area); } @@ -6898,6 +7074,7 @@ async fn handle_view_events( } ViewEvent::ModelPickerApplied { model, + provider, effort, previous_model, previous_effort, @@ -6905,7 +7082,9 @@ async fn handle_view_events( apply_model_picker_choice( app, engine_handle, + config, model, + provider, effort, previous_model, previous_effort, @@ -7997,11 +8176,8 @@ fn maybe_warn_context_pressure(app: &mut App) { return; } - let below_auto_floor = used < MINIMUM_AUTO_COMPACTION_TOKENS as i64; let recommendation = if !app.auto_compact { "Consider enabling auto_compact or use /compact." - } else if below_auto_floor { - "Auto-compaction is enabled but waits for the 500K token floor." } else if percent >= configured_threshold { "Auto-compaction will run before the next send." 
} else { @@ -8032,10 +8208,7 @@ fn should_auto_compact_before_send(app: &App) -> bool { return false; } context_usage_snapshot(app) - .map(|(used, _, pct)| { - used >= MINIMUM_AUTO_COMPACTION_TOKENS as i64 - && pct >= app.auto_compact_threshold_percent.clamp(10.0, 100.0) - }) + .map(|(_, _, pct)| pct >= app.auto_compact_threshold_percent.clamp(10.0, 100.0)) .unwrap_or(false) } diff --git a/crates/tui/src/tui/ui/tests.rs b/crates/tui/src/tui/ui/tests.rs index 04d1dc4c..c1a89cf8 100644 --- a/crates/tui/src/tui/ui/tests.rs +++ b/crates/tui/src/tui/ui/tests.rs @@ -2104,7 +2104,7 @@ async fn provider_switch_clears_turn_cache_history() { // serialize with other tests that mutate the same env vars. // Wrap the lock inside a guard struct so clippy's // `await_holding_lock` doesn't fire on the `.await` below; the - // pattern matches `tools::recall_archive::HomeGuard`. + // pattern matches other tests that guard HOME / USERPROFILE mutations. struct HomeGuard { _tmp: tempfile::TempDir, prev_home: Option, @@ -2197,6 +2197,95 @@ async fn provider_switch_to_deepseek_canonicalizes_openrouter_default_model() { assert_eq!(app.model, DEFAULT_TEXT_MODEL); } +#[tokio::test] +async fn provider_switch_to_deepseek_drops_stale_xiaomi_root_base_url() { + let _home = SettingsHomeGuard::new(); + let mut app = create_test_app(); + app.api_provider = ApiProvider::XiaomiMimo; + app.model = "mimo-v2.5-pro".to_string(); + app.model_ids_passthrough = true; + let mut engine = mock_engine_handle(); + let mut config = Config { + provider: Some("xiaomi-mimo".to_string()), + api_key: Some("deepseek-key".to_string()), + base_url: Some("https://token-plan-sgp.xiaomimimo.com/v1".to_string()), + default_text_model: Some("mimo-v2.5-pro".to_string()), + providers: Some(ProvidersConfig { + xiaomi_mimo: ProviderConfig { + api_key: Some("mimo-key".to_string()), + model: Some("mimo-v2.5-pro".to_string()), + ..Default::default() + }, + ..Default::default() + }), + ..Default::default() + }; + + switch_provider( + 
&mut app, + &mut engine.handle, + &mut config, + ApiProvider::Deepseek, + None, + ) + .await; + + assert_eq!(app.api_provider, ApiProvider::Deepseek); + assert!(!app.model_ids_passthrough); + assert_eq!(app.model, DEFAULT_TEXT_MODEL); + assert_eq!(config.provider.as_deref(), Some("deepseek")); + assert_eq!(config.base_url, None); +} + +#[tokio::test] +async fn provider_switch_model_override_updates_target_provider_model_slot() { + let _home = SettingsHomeGuard::new(); + let mut app = create_test_app(); + app.api_provider = ApiProvider::XiaomiMimo; + app.model = "mimo-v2.5-pro".to_string(); + let mut engine = mock_engine_handle(); + let mut config = Config { + provider: Some("xiaomi-mimo".to_string()), + api_key: Some("deepseek-key".to_string()), + default_text_model: Some("mimo-v2.5-pro".to_string()), + providers: Some(ProvidersConfig { + xiaomi_mimo: ProviderConfig { + api_key: Some("mimo-key".to_string()), + model: Some("mimo-v2.5-pro".to_string()), + ..Default::default() + }, + ..Default::default() + }), + ..Default::default() + }; + + switch_provider( + &mut app, + &mut engine.handle, + &mut config, + ApiProvider::Deepseek, + Some("deepseek-v4-flash".to_string()), + ) + .await; + + assert_eq!(app.api_provider, ApiProvider::Deepseek); + assert_eq!(app.model, "deepseek-v4-flash"); + assert_eq!( + config + .providers + .as_ref() + .and_then(|providers| providers.deepseek.model.as_deref()), + Some("deepseek-v4-flash") + ); + assert_eq!( + config + .providers + .as_ref() + .and_then(|providers| providers.xiaomi_mimo.model.as_deref()), + Some("mimo-v2.5-pro") + ); +} + #[tokio::test] async fn provider_switch_to_openrouter_canonicalizes_deepseek_default_model() { let _home = SettingsHomeGuard::new(); @@ -2607,6 +2696,62 @@ fn turn_liveness_recovers_stalled_in_progress_turn() { assert!(toast.text.contains("Turn stalled")); } +#[test] +fn engine_event_disconnect_recovers_live_turn_immediately() { + let mut app = create_test_app(); + app.is_loading = true; + 
app.runtime_turn_status = Some("in_progress".to_string()); + app.runtime_turn_id = Some("turn_dead".to_string()); + app.turn_started_at = Some(Instant::now()); + app.dispatch_started_at = Some(Instant::now()); + app.user_scrolled_during_stream = true; + let thinking_idx = crate::tui::streaming_thinking::ensure_active_entry(&mut app); + crate::tui::streaming_thinking::append(&mut app, thinking_idx, "partial reasoning"); + app.push_pending_steer(crate::tui::app::QueuedMessage::new( + "please continue after recovery".to_string(), + None, + )); + + let recovered = recover_engine_event_disconnect(&mut app); + + assert!(recovered); + assert!(!app.is_loading); + assert!(app.runtime_turn_status.is_none()); + assert!(app.runtime_turn_id.is_none()); + assert!(app.dispatch_started_at.is_none()); + assert!(app.turn_started_at.is_none()); + assert!(app.streaming_thinking_active_entry.is_none()); + assert!(!app.user_scrolled_during_stream); + assert_eq!(app.queued_message_count(), 1); + assert_eq!( + app.queued_messages + .front() + .map(crate::tui::app::QueuedMessage::content), + Some("please continue after recovery".to_string()) + ); + assert!( + app.history.iter().any(|cell| matches!( + cell, + HistoryCell::Error { message, .. 
} + if message.contains("Engine stopped before completing the turn") + )), + "disconnect recovery should add a visible transcript error" + ); + let toast = app.status_toasts.back().expect("disconnect toast"); + assert_eq!(toast.level, StatusToastLevel::Error); +} + +#[test] +fn engine_event_disconnect_while_idle_is_noop() { + let mut app = create_test_app(); + + let recovered = recover_engine_event_disconnect(&mut app); + + assert!(!recovered); + assert!(app.history.is_empty()); + assert!(app.status_toasts.is_empty()); +} + #[test] fn fixed_model_auto_thinking_skips_auto_model_router() { let mut app = create_test_app(); @@ -3480,8 +3625,8 @@ fn should_auto_compact_before_send_respects_threshold_and_setting() { app.auto_compact = false; assert!(!should_auto_compact_before_send(&app)); - // Small estimated context + auto_compact ON → does NOT trigger, - // regardless of what `last_prompt_tokens` reports. This matches the + // Small estimated context + auto_compact ON can trigger once the + // configured percent threshold is crossed. This still matches the // #115 fix: the estimate is the primary signal, not the engine's // turn-cumulative reported value (which used to rule the displayed // % and could spuriously trigger / suppress auto-compact). 
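Reviewer note: the percent-only check that `should_auto_compact_before_send_respects_threshold_and_setting` exercises can be sketched in isolation as below. This is a minimal illustration, not the code in `crates/tui`: `AppSketch`, its field types (`f64`, `Option<f64>`), and `should_auto_compact_sketch` are hypothetical stand-ins, and the real function derives the percentage from `context_usage_snapshot(app)` rather than from a stored field.

```rust
/// Hypothetical, trimmed-down stand-in for the TUI `App` state (assumption:
/// the real struct has many more fields and may use different numeric types).
struct AppSketch {
    auto_compact: bool,
    auto_compact_threshold_percent: f64,
    /// Estimated context usage as a percentage of the model window, if known.
    context_percent: Option<f64>,
}

/// Percent-only pre-send check: no absolute token floor is consulted, so a
/// small context can trigger compaction once the configured percent is crossed.
fn should_auto_compact_sketch(app: &AppSketch) -> bool {
    if !app.auto_compact {
        return false;
    }
    app.context_percent
        // The configured threshold is clamped to the documented 10-100 range.
        .map(|pct| pct >= app.auto_compact_threshold_percent.clamp(10.0, 100.0))
        .unwrap_or(false)
}
```

Under these assumptions, a threshold configured below 10 behaves like 10 because of the clamp, and a missing usage snapshot never triggers compaction.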
@@ -3490,12 +3635,12 @@ fn should_auto_compact_before_send_respects_threshold_and_setting() { app.auto_compact_threshold_percent = 10.0; app.session.last_prompt_tokens = Some(10_000); let (used, _, percent) = - context_usage_snapshot(&app).expect("floor context snapshot should be available"); + context_usage_snapshot(&app).expect("context snapshot should be available"); assert!( - used < crate::compaction::MINIMUM_AUTO_COMPACTION_TOKENS as i64 && percent >= 10.0, - "test fixture should cross percent threshold but stay below the 500K floor; used={used} percent={percent:.2}" + used > 0 && percent >= 10.0, + "test fixture should cross percent threshold; used={used} percent={percent:.2}" ); - assert!(!should_auto_compact_before_send(&app)); + assert!(should_auto_compact_before_send(&app)); } #[test] @@ -4672,6 +4817,26 @@ fn active_rlm_task_entries_surface_foreground_rlm_work() { assert!(entries[0].duration_ms.unwrap_or_default() >= 3000); } +#[test] +fn active_reasoning_task_entries_surface_reasoning_only_turns() { + let mut app = create_test_app(); + app.turn_started_at = Some(Instant::now() - Duration::from_secs(2)); + let mut active = ActiveCell::new(); + active.push_thinking(HistoryCell::Thinking { + content: "reasoning text".to_string(), + streaming: true, + duration_secs: None, + }); + app.active_cell = Some(active); + + let entries = active_reasoning_task_entries(&app); + assert_eq!(entries.len(), 1); + assert_eq!(entries[0].id, "reasoning-1"); + assert_eq!(entries[0].status, "running"); + assert_eq!(entries[0].prompt_summary, "model reasoning"); + assert!(entries[0].duration_ms.unwrap_or_default() >= 2000); +} + #[test] fn alt_nav_modifiers_require_alt_and_exclude_ctrl_super() { // v0.8.30 — transcript-nav shortcuts (`Alt+[`, `Alt+]`, etc.) 
require @@ -5311,12 +5476,15 @@ async fn model_picker_persists_model_and_reasoning_effort() { let mut app = create_test_app(); app.set_model_selection("auto".to_string()); app.reasoning_effort = ReasoningEffort::Auto; - let engine = mock_engine_handle(); + let mut engine = mock_engine_handle(); + let mut config = Config::default(); apply_model_picker_choice( &mut app, - &engine.handle, + &mut engine.handle, + &mut config, "deepseek-v4-pro".to_string(), + None, ReasoningEffort::High, "auto".to_string(), ReasoningEffort::Auto, diff --git a/crates/tui/src/tui/views/mod.rs b/crates/tui/src/tui/views/mod.rs index 7cc4c11b..cd11a102 100644 --- a/crates/tui/src/tui/views/mod.rs +++ b/crates/tui/src/tui/views/mod.rs @@ -140,6 +140,7 @@ pub enum ViewEvent { /// nothing changed and craft a clear status message. ModelPickerApplied { model: String, + provider: Option, effort: crate::tui::app::ReasoningEffort, previous_model: String, previous_effort: crate::tui::app::ReasoningEffort, diff --git a/crates/tui/src/vision/tools.rs b/crates/tui/src/vision/tools.rs index 41cc41c1..f80ad396 100644 --- a/crates/tui/src/vision/tools.rs +++ b/crates/tui/src/vision/tools.rs @@ -8,7 +8,7 @@ use base64::{Engine as _, engine::general_purpose::STANDARD as BASE64}; use serde_json::{Value, json}; use crate::config::VisionModelConfig; -use crate::llm_client::{LlmError, RetryConfig, with_retry}; +use crate::llm_client::{LlmError, RetryConfig, sanitize_http_error_body, with_retry}; use crate::tools::spec::{ ToolCapability, ToolContext, ToolError, ToolResult, ToolSpec, required_str, }; @@ -213,6 +213,11 @@ impl ToolSpec for ImageAnalyzeTool { .text() .await .unwrap_or_else(|_| "Unknown error".to_string()); + let error_text = sanitize_http_error_body( + Some("Vision provider"), + status.as_u16(), + &error_text, + ); return Err(LlmError::from_http_response(status.as_u16(), &error_text)); } Ok(response) diff --git a/crates/tui/tests/integration_mock_llm.rs b/crates/tui/tests/integration_mock_llm.rs 
index 94796862..ee447420 100644 --- a/crates/tui/tests/integration_mock_llm.rs +++ b/crates/tui/tests/integration_mock_llm.rs @@ -548,8 +548,7 @@ fn compaction_config_defaults_are_enabled_for_session_survivability() { // This test is a smoke check that the defaults compile and are correct. // The production `CompactionConfig::default()` is exercised by // `compaction::tests::should_compact_respects_enabled_flag` etc. - let config = - crate::models::compaction_threshold_for_model_and_effort("deepseek-v4-pro", Some("high")); + let config = crate::models::compaction_threshold_for_model_at_percent("deepseek-v4-pro", 80.0); // Verify the threshold is reasonable (> 0 and < context window). assert!(config > 0, "compaction threshold must be positive"); assert!(config < 1_000_000, "compaction threshold must be below 1M"); diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index c5de80eb..0e05bd47 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -121,9 +121,12 @@ SiliconFlow defaults to `https://api.siliconflow.com/v1`, accepts `https://api.siliconflow.cn/v1` can still be configured explicitly when a user needs the regional endpoint. Arcee AI defaults to `https://api.arcee.ai/api/v1`, accepts `ARCEE_BASE_URL`, -and uses `trinity-mini` by default. `trinity-large-preview` is also listed as a -direct Arcee API model; OpenRouter's `arcee-ai/trinity-large-thinking` remains -an OpenRouter model ID, not the direct Arcee default. +and uses `trinity-large-thinking` by default for CodeWhale agent work. +`trinity-large-preview` is also listed as a direct Arcee API model; OpenRouter's +`arcee-ai/trinity-large-thinking` remains the OpenRouter namespaced form, while +the direct Arcee provider uses the bare `trinity-large-thinking` ID. Direct +Arcee large-model API calls are tracked as 256K-context BF16 serving; Thinking +is reasoning-capable, while Preview is not marked as a thinking model. 
### Custom OpenAI-Compatible Gateways @@ -509,12 +512,12 @@ codewhale also stores user preferences in: - `~/.deepseek/settings.toml` or the legacy platform config-dir `deepseek/settings.toml` when an existing settings file is present -Notable settings include `auto_compact` (default `false`), which opts into -replacement-style summarization before the active model limit. The trigger -defaults to `auto_compact_threshold_percent = 70`, but the 500K-token floor -still blocks early compaction. The default V4 path preserves the stable message -prefix for cache reuse; use manual `/compact` / Ctrl+L or enable -`auto_compact` only when you explicitly want automatic replacement compaction. +Notable settings include `auto_compact` (default `false` for 1M-class models, +model-aware default-on for 256K-class models), which opts into replacement-style +summarization before the active model limit. The trigger defaults to +`auto_compact_threshold_percent = 80`. The default V4 path preserves the stable +message prefix for cache reuse; use manual `/compact` / Ctrl+L or enable +`auto_compact` when you explicitly want automatic replacement compaction. You can inspect or update these from the TUI with `/settings` and `/config` (interactive editor). @@ -527,7 +530,7 @@ Common settings keys: community presets apply across the TUI. Aliases such as `whale`, `mono`, `black-white`, `tokyonight`, and `gruvbox` are accepted. - `auto_compact` (on/off, default off) -- `auto_compact_threshold_percent` (10-100, default `70`): pre-send +- `auto_compact_threshold_percent` (10-100, default `80`): pre-send auto-compaction threshold used only when `auto_compact` is enabled. - `paste_burst_detection` (on/off, default on): fallback rapid-key paste detection for terminals that do not emit bracketed-paste events. 
This is @@ -593,18 +596,19 @@ separate: | Quantity | Meaning | Allowed to drive | |---|---|---| -| Active request input estimate | Conservative estimate of the next request's live system prompt and transcript payload. | Header/footer context percent, hard-cycle trigger, opt-in Flash seam trigger, and emergency overflow preflight. | -| Reserved response headroom | The internal turn budget plus safety headroom. v0.8.16 keeps normal turns at `262144` reserved output tokens and adds `1024` safety tokens for context-window checks, even though V4 capability metadata reports the official `384000` max output. | Hard-cycle and emergency overflow budget checks only. | +| Active request input estimate | Conservative estimate of the next request's live system prompt and transcript payload. | Header/footer context percent, auto-compaction trigger, opt-in Flash seam trigger, and emergency overflow preflight. | +| Reserved response headroom | The internal turn budget plus safety headroom. v0.8.16 keeps normal turns at `262144` reserved output tokens and adds `1024` safety tokens for context-window checks, even though V4 capability metadata reports the official `384000` max output. | Emergency overflow budget checks only. | | Cumulative API usage | Provider-reported input plus output tokens summed across completed API calls; multi-tool turns may count the same stable prefix more than once. | Session usage and approximate cost telemetry only. | -| Prompt cache hit/miss | Provider cache telemetry for the most recent call when available. | Cache-hit display and cost estimation only; never compaction, seam, or cycle triggers. | +| Prompt cache hit/miss | Provider cache telemetry for the most recent call when available. | Cache-hit display and cost estimation only; never compaction or seam triggers. | | Context percent | Active request input estimate divided by the model context window. | Display only; it mirrors the active-input basis used by context safeguards. 
| | Cost estimate | Approximate spend from provider usage and configured DeepSeek rates. | Display only. | -For the default V4 path, hard cycles fire when active input reaches the smaller -of the configured cycle threshold (`768000`) and the model window minus reserved -response headroom. Replacement compaction remains opt-in (`auto_compact = false` -by default), the Flash seam manager remains opt-in (`[context].enabled = false`), -and the capacity controller remains disabled unless configured. +For the default V4 path, replacement compaction remains opt-in +(`auto_compact = false` by default) and fires at the active model's +compaction threshold when enabled. For 256K-class models, auto-compaction is +enabled by default unless the user explicitly configures `auto_compact`. The +Flash seam manager remains opt-in (`[context].enabled = false`), and the +capacity controller remains disabled unless configured. ### Command Migration Notes @@ -626,7 +630,7 @@ If you are upgrading from older releases: - `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. 
`nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint at `https://api.xiaomimimo.com/v1`; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`. - `api_key` (string, required for hosted providers): must be non-empty for DeepSeek/hosted providers (or set the provider API key env var). Self-hosted SGLang, vLLM, and Ollama can omit it. - `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. 
Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://api.xiaomimimo.com/v1` for `xiaomi-mimo`, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features. -- `default_text_model` (string, optional): defaults to `deepseek-v4-pro` for DeepSeek and generic OpenAI-compatible endpoints, `deepseek-ai/deepseek-v4-pro` for NVIDIA NIM, `deepseek-ai/deepseek-v4-flash` for AtlasCloud, `deepseek-reasoner` for Wanjie Ark, `DeepSeek-V4-Pro` for Volcengine Ark, `deepseek/deepseek-v4-pro` for OpenRouter and Novita, `mimo-v2.5-pro` for Xiaomi MiMo, `accounts/fireworks/models/deepseek-v4-pro` for Fireworks, `deepseek-ai/DeepSeek-V4-Pro` for SiliconFlow, `trinity-mini` for Arcee AI, `kimi-k2.6` for Moonshot, `deepseek-ai/DeepSeek-V4-Pro` for SGLang/vLLM, and `deepseek-coder:1.3b` for Ollama. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows, 384K max output, and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash` until July 24, 2026, except SiliconFlow maps `deepseek-reasoner` and `deepseek-r1` to its Pro model while `deepseek-chat` and `deepseek-v3` map to Flash. 
Provider-specific mappings translate `deepseek-v4-pro` / `deepseek-v4-flash` to each provider's model ID where supported. OpenRouter also recognizes recent large IDs such as `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-35b-a3b`, `google/gemma-4-31b-it`, and `moonshotai/kimi-k2.6`; direct Arcee uses bare IDs such as `trinity-mini` and `trinity-large-preview`. Generic `openai`, `atlascloud`, `wanjie-ark`, `xiaomi-mimo`, `arcee`, and Ollama model IDs are passed through unchanged after known aliases are normalized. OpenRouter and SiliconFlow provider configs with a custom `base_url` also preserve explicit model values, which lets OpenAI-compatible gateways accept bare model IDs. Use `/models` or `codewhale models` to discover live IDs from your configured endpoint. `CODEWHALE_MODEL` overrides this for a single process; `DEEPSEEK_MODEL` is the legacy alias. +- `default_text_model` (string, optional): defaults to `deepseek-v4-pro` for DeepSeek and generic OpenAI-compatible endpoints, `deepseek-ai/deepseek-v4-pro` for NVIDIA NIM, `deepseek-ai/deepseek-v4-flash` for AtlasCloud, `deepseek-reasoner` for Wanjie Ark, `DeepSeek-V4-Pro` for Volcengine Ark, `deepseek/deepseek-v4-pro` for OpenRouter and Novita, `mimo-v2.5-pro` for Xiaomi MiMo, `accounts/fireworks/models/deepseek-v4-pro` for Fireworks, `deepseek-ai/DeepSeek-V4-Pro` for SiliconFlow, `trinity-large-thinking` for Arcee AI, `kimi-k2.6` for Moonshot, `deepseek-ai/DeepSeek-V4-Pro` for SGLang/vLLM, and `deepseek-coder:1.3b` for Ollama. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows, 384K max output, and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash` until July 24, 2026, except SiliconFlow maps `deepseek-reasoner` and `deepseek-r1` to its Pro model while `deepseek-chat` and `deepseek-v3` map to Flash. 
Provider-specific mappings translate `deepseek-v4-pro` / `deepseek-v4-flash` to each provider's model ID where supported. OpenRouter also recognizes recent large IDs such as `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, and `moonshotai/kimi-k2.6`; direct Arcee uses bare IDs such as `trinity-large-thinking` and `trinity-large-preview`; direct Xiaomi MiMo recognizes chat IDs `mimo-v2.5-pro` and `mimo-v2.5`, while TTS IDs are selected through `codewhale speech` / `tts`. Generic `openai`, `atlascloud`, `wanjie-ark`, `xiaomi-mimo`, `arcee`, and Ollama model IDs are passed through unchanged after known aliases are normalized. OpenRouter and SiliconFlow provider configs with a custom `base_url` also preserve explicit model values, which lets OpenAI-compatible gateways accept bare model IDs. Use `/models` or `codewhale models` to discover live IDs from your configured endpoint. `CODEWHALE_MODEL` overrides this for a single process; `DEEPSEEK_MODEL` is the legacy alias. - `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`; defaults to the configured UI tier. DeepSeek Platform receives top-level `thinking` / `reasoning_effort` fields. NVIDIA NIM receives equivalent settings through `chat_template_kwargs`. - `allow_shell` (bool, optional): defaults to `false`; shell tools must be explicitly enabled. - `approval_policy` (string, optional): `on-request`, `untrusted`, or `never`. Runtime `approval_mode` editing in `/config` also accepts `on-request` and `untrusted` aliases. 
@@ -701,7 +705,6 @@ If you are upgrading from older releases: - `[context].l1_threshold` (int, default `192000`) - `[context].l2_threshold` (int, default `384000`) - `[context].l3_threshold` (int, default `576000`) - - `[context].cycle_threshold` (int, default `768000`) - `[context].seam_model` (string, default `deepseek-v4-flash`) - `retry.*` (optional): retry/backoff settings for API requests: - `[retry].enabled` (bool, default `true`) diff --git a/docs/LEGACY_RUST_AUDIT_0_7_6.md b/docs/LEGACY_RUST_AUDIT_0_7_6.md index e189ca91..2010ef67 100644 --- a/docs/LEGACY_RUST_AUDIT_0_7_6.md +++ b/docs/LEGACY_RUST_AUDIT_0_7_6.md @@ -10,7 +10,7 @@ This audit is deliberately non-destructive. No compatibility code is removed in |---|---|---|---|---|---|---| | Legacy MCP sync API (`McpServerInput`, `list`, `add`, `remove`, `call_tool`, `load_legacy`) | `crates/tui/src/mcp.rs` | Not wired into current `/mcp` command path; retained behind `#[allow(dead_code)]` | Direct Rust references and current MCP command path inspected; saved/config JSON compatibility still needs a dedicated smoke | Preserves old JSON shape including `mcpServers` alias and sync call helpers while the async MCP manager is the active path | Code TODO only | Gate behind an explicit legacy module or remove after CLI/runtime parity tests prove no caller uses it. Tracked by #218. | | Legacy prompt constants/functions (`AGENT_PROMPT`, `YOLO_PROMPT`, `PLAN_PROMPT`, `base_system_prompt`, `normal_system_prompt`, etc.) | `crates/tui/src/prompts.rs` | Tests and older callers that still import prompt constants directly | Direct Rust references remain; public-crate and older harness imports are not proven absent | Layered prompt API replaced monolithic prompts, but older call sites may still compile against constants | None | Keep for v0.7.6; add deprecation annotations only after internal callers are migrated. Tracked by #219. 
| -| `/compact` slash command positioning | `crates/tui/src/commands/mod.rs` | Public slash-command registry and help overlay | Public command registry/docs path still references it | Current cycle/seam policy prefers restart/cycle flows, but users may still run `/compact` manually | Description says legacy and points at cycle restart | Keep as a manual compatibility command; do not remove until context/token issues are resolved. | +| `/compact` slash command positioning | `crates/tui/src/commands/mod.rs` | Public slash-command registry and help overlay | Public command registry/docs path still references it | Users may still run `/compact` manually when they want an immediate replacement-style summary | Description is intentionally explicit about manual compaction | Keep as a manual compatibility command; do not remove until context/token issues are resolved. | | `todo_*` compatibility tools | `crates/tui/src/tools/todo.rs` | Tool registry/model calls that still use `todo_add`, `todo_update`, `todo_list`, `todo_write` | Tool registry compatibility and saved tool-call risk remain | `checklist_*` is canonical, but old tool names may appear in saved prompts, traces, or model priors | Metadata marks `compat_alias: true`; descriptions say compatibility alias | Add explicit deprecation metadata with target version, then remove only after tool-schema migration evidence. Tracked by #220. | | Deprecated sub-agent alias tools (`spawn_agent`, `send_input`, delegate aliases) | `crates/tui/src/tools/subagent/mod.rs` | Tool registry and model/tool-call compatibility | Tool registry compatibility and saved tool-call risk remain | Canonical names are `agent_spawn`, `agent_send_input`, etc.; alias names preserve older tool-call compatibility | `_deprecation` metadata and tracing warn; removal target is `v0.8.0` | Keep through v0.7.x; removal already has metadata. Tracked by #221. 
| | Legacy root/provider TOML `api_key` compatibility | `crates/tui/src/config.rs`, `crates/config/src/lib.rs` | Config resolver; users with existing `api_key` in config files | Public config loading and docs still mention migration behavior | Keyring migration is preferred, but breaking existing configs would block startup/auth | Tracing warnings point to `deepseek auth set` / `deepseek auth migrate` | Keep; warnings are user-actionable. Removal should wait for a migration command and release-note window. | diff --git a/docs/PROVIDERS.md b/docs/PROVIDERS.md index 000fe9a9..94c2f475 100644 --- a/docs/PROVIDERS.md +++ b/docs/PROVIDERS.md @@ -117,12 +117,12 @@ endpoint. | `atlascloud` | `[providers.atlascloud]` | `ATLASCLOUD_API_KEY` | `ATLASCLOUD_BASE_URL`; default `https://api.atlascloud.ai/v1` | Default `deepseek-ai/deepseek-v4-flash`; explicit `vendor/model-id` values pass through when AtlasCloud is selected | OpenAI-compatible hosted route. `ATLASCLOUD_MODEL` is accepted by the TUI config path, the static `ModelRegistry` keeps DeepSeek V4 fallback rows, and provider-hinted CLI model IDs are sent to AtlasCloud exactly as requested. | | `wanjie-ark` | `[providers.wanjie_ark]` | `WANJIE_ARK_API_KEY`, `WANJIE_API_KEY`, `WANJIE_MAAS_API_KEY` | `WANJIE_ARK_BASE_URL`, `WANJIE_BASE_URL`, `WANJIE_MAAS_BASE_URL`; default `https://maas-openapi.wanjiedata.com/api/v1` | `deepseek-reasoner` | OpenAI-compatible hosted route. `WANJIE_ARK_MODEL`, `WANJIE_MODEL`, and `WANJIE_MAAS_MODEL` are accepted. | | `volcengine` | `[providers.volcengine]` | `VOLCENGINE_API_KEY`, `VOLCENGINE_ARK_API_KEY`, `ARK_API_KEY` | `VOLCENGINE_BASE_URL`, `VOLCENGINE_ARK_BASE_URL`, `ARK_BASE_URL`; default `https://ark.cn-beijing.volces.com/api/coding/v3` | `DeepSeek-V4-Pro`, `DeepSeek-V4-Flash` | Volcengine/Volcano Engine Ark OpenAI-compatible coding endpoint. `VOLCENGINE_MODEL` and `VOLCENGINE_ARK_MODEL` are accepted. 
| -| `openrouter` | `[providers.openrouter]` | `OPENROUTER_API_KEY` | `OPENROUTER_BASE_URL`; default `https://openrouter.ai/api/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`; recent large IDs include `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-35b-a3b`, `google/gemma-4-31b-it`, `z-ai/glm-5.1`, `moonshotai/kimi-k2.6` | Additive open-model routing layer. It does not replace DeepSeek; it lets users route supported model IDs through OpenRouter when they choose it. | -| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`; default `https://api.xiaomimimo.com/v1` | `mimo-v2.5-pro`, `mimo-v2.5`, `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. | +| `openrouter` | `[providers.openrouter]` | `OPENROUTER_API_KEY` | `OPENROUTER_BASE_URL`; default `https://openrouter.ai/api/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`; recent large IDs include `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, `z-ai/glm-5.1`, `moonshotai/kimi-k2.6` | Additive open-model routing layer. It does not replace DeepSeek; it lets users route supported model IDs through OpenRouter when they choose it. 
| +| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`; default `https://api.xiaomimimo.com/v1` | Chat: `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS: `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. | | `novita` | `[providers.novita]` | `NOVITA_API_KEY` | `NOVITA_BASE_URL`; default `https://api.novita.ai/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash` | OpenAI-compatible hosted route for DeepSeek model IDs. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. | | `fireworks` | `[providers.fireworks]` | `FIREWORKS_API_KEY` | `FIREWORKS_BASE_URL`; default `https://api.fireworks.ai/inference/v1` | `accounts/fireworks/models/deepseek-v4-pro` | OpenAI-compatible hosted route. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. | | `siliconflow` | `[providers.siliconflow]` | `SILICONFLOW_API_KEY` | `SILICONFLOW_BASE_URL`; default `https://api.siliconflow.com/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | OpenAI-compatible hosted route. Official docs use the `.com` endpoint; users who need the regional endpoint can set `https://api.siliconflow.cn/v1` explicitly. `SILICONFLOW_MODEL` is accepted. Reasoning aliases `deepseek-reasoner` and `deepseek-r1` map to Pro; `deepseek-chat` and `deepseek-v3` map to Flash. | -| `arcee` | `[providers.arcee]` | `ARCEE_API_KEY` | `ARCEE_BASE_URL`; default `https://api.arcee.ai/api/v1` | `trinity-mini`, `trinity-large-preview` | Arcee AI direct OpenAI-compatible route. `ARCEE_MODEL` is accepted. OpenRouter's `arcee-ai/trinity-large-thinking` remains an OpenRouter model ID; direct Arcee uses bare model IDs such as `trinity-mini`. 
| +| `arcee` | `[providers.arcee]` | `ARCEE_API_KEY` | `ARCEE_BASE_URL`; default `https://api.arcee.ai/api/v1` | `trinity-large-thinking`, `trinity-large-preview` | Arcee AI direct OpenAI-compatible route, tracked as 256K-context BF16 serving. `ARCEE_MODEL` is accepted. OpenRouter's `arcee-ai/trinity-large-thinking` remains the OpenRouter namespaced model ID; direct Arcee uses the bare `trinity-large-thinking` ID. | | `moonshot` | `[providers.moonshot]` | `MOONSHOT_API_KEY`, `KIMI_API_KEY` | `MOONSHOT_BASE_URL`, `KIMI_BASE_URL`; default `https://api.moonshot.ai/v1` | `kimi-k2.6`; Kimi Code path uses `kimi-for-coding` at `https://api.kimi.com/coding/v1` | Moonshot/Kimi route. `MOONSHOT_MODEL`, `KIMI_MODEL_NAME`, and `KIMI_MODEL` are accepted. `[providers.moonshot] auth_mode = "kimi_oauth"` reads Kimi CLI OAuth credentials when present. | | `sglang` | `[providers.sglang]` | Optional `SGLANG_API_KEY` | `SGLANG_BASE_URL`; default `http://localhost:30000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted OpenAI-compatible route. Localhost deployments commonly omit auth. `SGLANG_MODEL` is accepted. | | `vllm` | `[providers.vllm]` | Optional `VLLM_API_KEY` | `VLLM_BASE_URL`; default `http://localhost:8000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted vLLM OpenAI-compatible route. Localhost deployments commonly omit auth. `VLLM_MODEL` is accepted. | @@ -131,9 +131,10 @@ endpoint. ### Xiaomi MiMo Notes `xiaomi-mimo` defaults to `mimo-v2.5-pro` for long-context reasoning and coding -work, while the static registry also exposes `mimo-v2.5`. Xiaomi MiMo TTS is -available through `codewhale --provider xiaomi-mimo speech "text" --model tts` -(or the `tts` alias) plus model-visible `speech` / `tts` tools in Agent/YOLO mode. +work. The chat picker also exposes the latest Omni model `mimo-v2.5`. 
Xiaomi MiMo +TTS is available through `codewhale --provider xiaomi-mimo speech "text" +--model tts` (or the `tts` alias) plus model-visible `speech` / `tts` tools in +Agent/YOLO mode. Voice-design and voice-clone shorthands map to `mimo-v2.5-tts-voicedesign` and `mimo-v2.5-tts-voiceclone`. Xiaomi's current [image-understanding guide](https://platform.xiaomimimo.com/docs/en-US/usage-guide/multimodal-understanding/image-understanding) @@ -145,8 +146,9 @@ separate `[vision_model]` / `image_analyze` path; set that model to OpenRouter completions and static registry rows include the April 2026 onward large models verified through OpenRouter's model metadata: -`arcee-ai/trinity-large-thinking`, `qwen/qwen3.6-35b-a3b`, -`qwen/qwen3.6-27b`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, +`arcee-ai/trinity-large-thinking`, `qwen/qwen3.6-flash`, +`qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, +`qwen/qwen3.6-plus`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `xiaomi/mimo-v2.5`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `tencent/hy3-preview`, `google/gemma-4-31b-it`, `google/gemma-4-26b-a4b-it`, and `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free`. @@ -168,12 +170,12 @@ endpoint when the endpoint supports model listing. 
| `atlascloud` | `deepseek-ai/deepseek-v4-flash`, `deepseek-ai/deepseek-v4-pro` | yes | yes | | `wanjie-ark` | `deepseek-reasoner` | yes | yes | | `volcengine` | `DeepSeek-V4-Pro`, `DeepSeek-V4-Flash` | yes | yes | -| `openrouter` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`, `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `xiaomi/mimo-v2.5`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-27b`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `tencent/hy3-preview`, `google/gemma-4-31b-it`, `google/gemma-4-26b-a4b-it`, `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` | yes | yes | -| `xiaomi-mimo` | `mimo-v2.5-pro`, `mimo-v2.5`, `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | yes | yes for chat models; no for TTS models | +| `openrouter` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`, `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `xiaomi/mimo-v2.5`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `tencent/hy3-preview`, `google/gemma-4-31b-it`, `google/gemma-4-26b-a4b-it`, `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` | yes | yes | +| `xiaomi-mimo` | `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS IDs are selected through `codewhale speech` / `tts` | yes | yes for chat models; no for speech/TTS models | | `novita` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash` | yes | yes | | `fireworks` | `accounts/fireworks/models/deepseek-v4-pro` | yes | yes | | `siliconflow` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | yes | yes | -| `arcee` | `trinity-mini`, `trinity-large-preview`; provider-hinted custom model IDs pass through | yes | no for documented direct API models | +| `arcee` | `trinity-large-thinking`, `trinity-large-preview`; provider-hinted custom model IDs pass through | yes | yes for 
`trinity-large-thinking`; no for `trinity-large-preview` | | `moonshot` | `kimi-k2.6` | yes | yes | | `sglang` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | yes | yes | | `vllm` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | yes | yes | @@ -201,9 +203,13 @@ All shipped providers use the Chat Completions request payload mode today. | NVIDIA NIM V4 registry models | 1,000,000 | 384,000 | yes | yes | not documented in code | | Volcengine Ark V4 model IDs | 1,000,000 | 384,000 | yes | yes | not documented in code | | OpenRouter, Novita, Fireworks, SiliconFlow, SGLang, and vLLM V4 model IDs | 1,000,000 | 384,000 | yes | no | not documented in code | -| Xiaomi MiMo models | 1,000,000 | 128,000 | yes | no | not documented in code | +| Xiaomi MiMo `mimo-v2.5-pro`, `mimo-v2.5` | 1,000,000 | 131,072 | yes | no | not documented in code | +| OpenRouter Qwen 3.6 Flash / Plus | 1,000,000 | 65,536 | yes | no | not documented in code | +| OpenRouter Qwen 3.6 35B / 27B | 262,144 | 262,140 | yes | no | not documented in code | +| OpenRouter Qwen 3.6 Max Preview | 262,144 | 65,536 | yes | no | not documented in code | | Wanjie Ark `reasoner` / `r1` model IDs | 128,000 | 4,096 | yes | no | not documented in code | -| Direct Arcee API models (`trinity-mini`, `trinity-large-preview`) | 128,000 | 4,096 | no in doctor capability metadata | no | not documented in code | +| Direct Arcee API `trinity-large-thinking` | 262,144 | 262,144 | yes | no | not documented in code | +| Direct Arcee API `trinity-large-preview` | 262,144 | 4,096 | no in doctor capability metadata | no | not documented in code | | Generic `openai`, AtlasCloud, and Moonshot/Kimi | 128,000 | 4,096 | no in doctor capability metadata | no | not documented in code | | Ollama | 8,192 | 4,096 | no | no | not documented in code | | Other recognized DeepSeek model IDs | 128,000 unless the model name carries an explicit `Nk` hint | 4,096 | no unless V4/reasoner logic matches | DeepSeek/NIM 
only | DeepSeek beta only |
diff --git a/docs/REMOTE_SETUP_DESIGN.md b/docs/REMOTE_SETUP_DESIGN.md
new file mode 100644
index 00000000..384c67ef
--- /dev/null
+++ b/docs/REMOTE_SETUP_DESIGN.md
@@ -0,0 +1,264 @@
+# `codewhale remote-setup` — Design & Implementation Plan
+
+Status: **design** (do not implement against the 0.8.48 release wrap; land on a
+branch or after 0.8.48 ships). Author handoff doc, mirrors the style of
+`REFACTOR_HANDOFF.md`.
+
+## Goal
+
+One command — `codewhale remote-setup` — that guides a user through standing up
+a remote CodeWhale agent they can talk to from a phone chat app, across:
+
+- **Cloud target:** Tencent Lighthouse **or** Azure (extensible to GCP/Hetzner/bare).
+- **Chat bridge:** Feishu/Lark **or** Telegram (extensible to Slack/Discord).
+- **Model provider:** any entry in the existing `PROVIDERS` registry
+  (DeepSeek, OpenAI, NVIDIA NIM, Atlascloud, WanjieArk, OpenRouter, Novita,
+  Fireworks, Moonshot, SGLang, vLLM, Ollama, Xiaomi).
+
+Decisions locked with the user:
+- **Form:** native Rust subcommand in-binary (touches `crates/cli` + `crates/tui`).
+- **Scope:** generate the deploy bundle **and** optionally auto-provision via the
+  cloud CLI (`az` / CNB), behind a confirmation gate.
+
+## Prior art: Hermes Agent (reference only — do not copy)
+
+`/Volumes/VIXinSSD/hermesagent` (Nous Research's Hermes Agent, Python) solves the
+same problem and **validates this design**. Use it for ideas; keep CodeWhale's
+style (Rust core, zero-dep Node bridges, plain-text replies).
+
+- Its `gateway/platform_registry.py` is exactly the table-driven approach here: a
+  `PlatformEntry { name, label, adapter_factory, check_fn, validate_config,
+  required_env, install_hint, setup_fn, source }`. That maps 1:1 onto our
+  `BridgeSpec`/`CloudTarget` rows, and its per-platform `setup_fn` + `required_env`
+  are what our wizard reads to prompt. A single gateway process fans out to many
+  platforms — the model we want.
+- Its `gateway/pairing.py` mirrors our allowlist/first-pairing flow.
+
+### Telegram hardening checklist (mined from `gateway/platforms/telegram.py`)
+
+That adapter is battle-tested; its method names enumerate edge cases our MVP
+bridge should handle. Status against `integrations/telegram-bridge`:
+
+| Edge case | In Hermes | In our bridge |
+|---|---|---|
+| 409 polling conflict (two `getUpdates`) | `_looks_like_polling_conflict` | **done** — poll loop backs off 10s + warns |
+| 429 `retry_after` | rate-limit handling | **done** — `telegramApi` honors `parameters.retry_after` |
+| Forum "General topic = 1" send/typing asymmetry | `_message_thread_id_for_send` vs `_for_typing` | **done** — omit `message_thread_id` when id is 1 on send |
+| "message to be replied not found" after restart | `_send_with_dm_topic_reply_anchor_retry` | **sidestepped** — we never set `reply_to_message_id` |
+| Network/connect-timeout retry | `_looks_like_network_error` | partial — generic 3s backoff in poll loop |
+| Text batching + progress-edit (edit one msg vs spam) | `test_telegram_text_batching` | **deferred** — we send a chunk every 15s |
+| MarkdownV2 escaping + table rendering | `_escape_mdv2`, `_wrap_markdown_tables` | **deferred** — plain text (safe; tables look plain) |
+| Webhook mode as an alternative to long polling | `_webhook_mode` | out of scope — long-poll only (no inbound ports) |
+
+Deferred items are deliberate: progress-edit and MarkdownV2 add real UX polish
+but also complexity and (for MDV2) a whole class of parser-escaping bugs. Revisit
+after `remote-setup` lands.
+
+## Design principle: table-driven, like `ProviderSpec`
+
+The provider registry (`crates/config/src/lib.rs::PROVIDERS`) is the model to
+copy: "adding a provider is one row." Apply the same to clouds and bridges so
+the matrix grows by data, not by new control flow.
+
+```
+  CloudTarget   ×   BridgeSpec   +   ProviderSpec (existing registry)
+  ───────────       ──────────       ────────────────────────────────
+  lighthouse        feishu           deepseek / openai / nvidia-nim / …
+  azure             telegram         (wizard reads PROVIDERS, prompts for
+  (future…)         (future…)         that provider's env_keys[0])
+```
+
+Clean separation that the architecture already implies:
+- **Provider = a `runtime.env` concern.** The runtime resolves the provider from
+  `CODEWHALE_PROVIDER` and the provider's own key var. The bridge never needs to
+  know which provider is behind the runtime — it only forwards `model` to
+  `/v1/threads`. So "multi-provider" only touches `runtime.env` generation.
+- **Cloud = where it runs + where secrets live.**
+- **Bridge = pure transport** between a chat app and `127.0.0.1:7878`.
+
+## Command surface
+
+New variant in `crates/cli/src/lib.rs` `Commands`:
+
+```rust
+/// Provision and configure a remote CodeWhale agent (cloud + chat bridge).
+RemoteSetup(RemoteSetupArgs),
+```
+
+`RemoteSetupArgs` (clap):
+
+| Flag | Meaning |
+|---|---|
+| `--cloud ` | Skip the cloud prompt. |
+| `--bridge ` | Skip the bridge prompt. |
+| `--provider ` | Provider slug; validated against `PROVIDERS`. |
+| `--out ` | Bundle output dir (default `./codewhale-deploy/-`). |
+| `--generate-only` | Emit the bundle, do not provision (default). |
+| `--apply` | Run the cloud CLI to actually provision (the auto-provision path). |
+| `--yes` | Skip the final confirmation gate (CI/non-interactive). |
+| `--non-interactive` | Fail instead of prompting if any required value is missing. |
+
+CLI delegates to the TUI binary exactly like `Serve`/`Setup` do
+(`delegate_to_tui(&cli, &resolved_runtime, tui_args("remote-setup", args))`).
+The implementation lives next to `run_setup` in `crates/tui/src/`.
+
+## Code layout
+
+New module `crates/tui/src/remote_setup/`:
+
+```
+remote_setup/
+  mod.rs            # run_remote_setup(): wizard orchestration + dispatch
+  registry.rs       # CloudTarget + BridgeSpec tables (the matrix)
+  prompt.rs         # thin stdin prompt helpers (reuse existing patterns)
+  bundle.rs         # render env files / systemd units / RUNBOOK.md to --out
+  provision/
+    mod.rs          # Provisioner trait + confirmation gate + dry-run printer
+    azure.rs        # az preflight, RG, VM+cloud-init, Key Vault, NSG, start
+    lighthouse.rs   # cnb.yml + tag_deploy.yml generation, CNB guidance
+  templates/        # runtime.env, .env, *.service, cloud-init.yaml.tmpl
+```
+
+### Registry types
+
+```rust
+pub struct BridgeSpec {
+    pub slug: &'static str,          // "telegram"
+    pub display: &'static str,       // "Telegram"
+    pub package_dir: &'static str,   // "integrations/telegram-bridge"
+    pub service_unit: &'static str,  // "codewhale-telegram-bridge.service"
+    pub env_template: &'static str,  // templates/telegram.env
+    /// Bridge-specific secret env keys to prompt for (token, etc.).
+    pub secret_keys: &'static [&'static str], // ["TELEGRAM_BOT_TOKEN"]
+    /// One-liner shown before prompting (e.g. "Create a bot with @BotFather").
+    pub setup_hint: &'static str,
+}
+
+pub struct CloudTarget {
+    pub slug: &'static str,          // "azure"
+    pub display: &'static str,       // "Azure VM"
+    pub secret_store: SecretStore,   // KeyVault | EnvFile
+    pub install: InstallMethod,      // Docker | NativeSystemd
+    /// Builds the ordered list of provisioning steps as (description, command).
+    /// Commands are returned as data so they can be dry-run printed, gated,
+    /// and only then executed.
+    pub plan: fn(&DeployInputs) -> Vec<ProvisionStep>,
+}
+```
+
+A `ProvisionStep { description, program, args, secret_args }` is *data*, never a
+shell string — so the confirmation gate can print every command, secrets are fed
+via stdin/temp files (never argv/`history`), and `--apply` just executes the
+already-printed plan.
+
+## Wizard flow
+
+1. **Cloud** — pick from `CLOUD_TARGETS` (or `--cloud`).
+2. **Bridge** — pick from `BRIDGES` (or `--bridge`); print `setup_hint`.
+3. **Provider** — list `PROVIDERS` (canonical names), pick (or `--provider`).
+   Look up `spec.env_keys[0]` as the key var to prompt for.
+4. **Secrets** — prompt for: provider API key, bridge token(s) from
+   `secret_keys`, allowlist (chat ids). Generate a random `CODEWHALE_RUNTIME_TOKEN`.
+5. **Mode** — generate-only vs `--apply`.
+6. **Render bundle** to `--out` (always, even with `--apply`).
+7. **Confirm + provision** (only if `--apply`): print the full ordered command
+   plan, require `y` (unless `--yes`), then execute step by step with progress.
+8. **Print RUNBOOK.md** path and the remaining manual steps.
+
+## Generated bundle
+
+Written to `./codewhale-deploy/-/`:
+
+- `runtime.env` — **provider config lives here**:
+  ```
+  CODEWHALE_PROVIDER=openai
+  OPENAI_API_KEY=…   # the provider's own key var, from registry
+  CODEWHALE_MODEL=auto
+  CODEWHALE_RUNTIME_TOKEN=
+  CODEWHALE_RUNTIME_PORT=7878
+  CODEWHALE_RUNTIME_WORKERS=2
+  RUST_LOG=info
+  ```
+- `.env` — transport only: `CODEWHALE_RUNTIME_URL=http://127.0.0.1:7878`,
+  matching `CODEWHALE_RUNTIME_TOKEN`, allowlist, `TELEGRAM_BOT_TOKEN` (or Feishu
+  app id/secret), `CODEWHALE_WORKSPACE`, `CODEWHALE_MODEL`.
+- `codewhale-runtime.service`, `codewhale-.service`.
+- Cloud artifact: `cloud-init.yaml` + `provision.sh` (Azure) or `cnb.yml` +
+  `tag_deploy.yml` (Lighthouse).
+- `RUNBOOK.md` — the exact remaining commands + first-pairing steps.
+
+## Auto-provision
+
+### Azure (`--apply --cloud azure`)
+Preflight: `az account show` (fail with "run `az login`" if absent). Then the
+`plan()` emits, in order:
+1. `az group create` (region prompted; default `eastus`).
+2. `az keyvault create` + `az keyvault secret set` for the provider key and the
+   runtime token (secrets via stdin, not argv).
+3. `az vm create` with `--custom-data cloud-init.yaml` and a **system-assigned
+   managed identity**; cloud-init pulls `ghcr.io/hmbown/codewhale:latest`, reads
+   the secrets from Key Vault via the identity, writes `/etc/codewhale/*.env`,
+   installs both systemd units, `enable --now`.
+4. NSG: SSH (22) only, scoped to the caller's IP; **7878 stays on `127.0.0.1`**.
+5. Print the SSH tunnel command for `/status` from a laptop if desired.
+
+### Lighthouse (`--apply --cloud lighthouse`)
+Reuse the existing `deploy/tencent-lighthouse/cnb/*.example` pipeline: render
+`cnb.yml` + `tag_deploy.yml` from inputs and walk the user through the CNB
+trigger (CNB does the VM-side work). Systemd units mirror the existing
+`codewhale-runtime.service`.
+
+Safety (matches the harness rules for outward-facing actions):
+- Every command printed before execution; `y` gate unless `--yes`.
+- Secrets never in argv or shell history.
+- `--generate-only` is the default; `--apply` is explicit.
+
+## Namespace migration: `DEEPSEEK_*` → `CODEWHALE_*`
+
+Follow the convention already in `crates/config/src/lib.rs`: **read
+`CODEWHALE_X` first, fall back to `DEEPSEEK_X`.** Nothing breaks for existing
+deployments.
+
+Touch list:
+1. **Bridges** (`integrations/feishu-bridge`, `integrations/telegram-bridge`):
+   in `lib.mjs`/`index.mjs`, read `process.env.CODEWHALE_X ?? process.env.DEEPSEEK_X`
+   for `RUNTIME_URL`, `RUNTIME_TOKEN`, `WORKSPACE`, `MODEL`, `MODE`, `ALLOW_SHELL`,
+   `TRUST_MODE`, `AUTO_APPROVE`, `CHAT_ALLOWLIST`, `ALLOW_UNLISTED`, `TURN_TIMEOUT_MS`.
+   Validators accept either; templates emit `CODEWHALE_*`.
+2. **Deploy units** (`deploy/tencent-lighthouse/systemd/*`,
+   `integrations/*/deploy/*`): `DEEPSEEK_RUNTIME_*` → `CODEWHALE_RUNTIME_*`,
+   env file paths `/etc/deepseek/` → `/etc/codewhale/` (keep reading the old path
+   if present).
+3. **`.env.example` files + `config.example.toml`**: lead with `CODEWHALE_*`,
+   document `DEEPSEEK_*` as legacy aliases.
+4. **Drop DeepSeek-shaped defaults** in the bridge: no hardcoded
+   `DEEPSEEK_MODEL=auto`; the provider lives in `runtime.env` via
+   `CODEWHALE_PROVIDER` + the registry's key var.
+
+Note: items 1–3 touch **tracked** files, so they are part of the same
+"don't ship during 0.8.48" hold. The brand-new (untracked) Telegram bridge can
+be converted to `CODEWHALE_*` first as the reference implementation.
+
+## Tests
+
+- `registry.rs`: every `CloudTarget`/`BridgeSpec` slug is unique; each bridge's
+  `package_dir`/`service_unit`/`env_template` exists.
+- `bundle.rs`: rendering a bundle for each cloud×bridge×provider triple produces
+  files with `CODEWHALE_*` keys, a matching runtime/bridge token, and a non-empty
+  RUNBOOK.
+- `provision`: `plan()` returns the expected ordered steps; **commands are built
+  but never executed** in tests (assert on program+args, secrets redacted).
+
+## Extensibility check
+
+- Add **GCP**: one `CloudTarget` row + a `provision/gcp.rs` + a cloud-init reuse.
+- Add **Slack**: one `BridgeSpec` row + `integrations/slack-bridge` + template.
+No changes to the wizard control flow — it iterates the registries.
+
+## Suggested sequencing (given the 0.8.48 freeze)
+
+1. **Now (safe, untracked):** convert the new Telegram bridge to `CODEWHALE_* ??
+   DEEPSEEK_*`; finalize this design.
+2. **Post-0.8.48, branch:** namespace migration on tracked bridges + deploy units.
+3. **Then:** implement `remote-setup` (registry → bundle → Azure provisioner →
+   Lighthouse provisioner), generate-only first, `--apply` second.
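Reviewer sketch (not part of the patch): the design doc above treats each `ProvisionStep` as data so the confirmation gate can print the whole plan before anything runs. A minimal Rust illustration of that idea follows. The doc only lists the field names, so the field types and the helpers `render_plan` and `confirmed` are hypothetical names chosen for this sketch, not the project's API.

```rust
use std::fmt::Write as _;

/// Hypothetical shape of one provisioning step. Field names follow the design
/// doc's `ProvisionStep { description, program, args, secret_args }`; the
/// concrete types are an assumption.
#[derive(Debug, Clone)]
pub struct ProvisionStep {
    pub description: String,
    pub program: String,
    pub args: Vec<String>,
    /// Secret values. Assumed to be fed via stdin or a temp file, never argv.
    pub secret_args: Vec<String>,
}

/// Render the ordered plan for the confirmation gate or a dry run.
/// Secret values are never written out; only their count is shown.
pub fn render_plan(steps: &[ProvisionStep]) -> String {
    let mut out = String::new();
    for (i, step) in steps.iter().enumerate() {
        // Writing into a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "{}. {}", i + 1, step.description);
        let _ = write!(out, "   $ {}", step.program);
        for arg in &step.args {
            let _ = write!(out, " {}", arg);
        }
        if !step.secret_args.is_empty() {
            let _ = write!(
                out,
                "  [{} secret(s) via stdin, redacted]",
                step.secret_args.len()
            );
        }
        out.push('\n');
    }
    out
}

/// Confirmation gate: proceed only on an explicit `y`, or when the
/// `--yes` flag (modelled here as `assume_yes`) was passed.
pub fn confirmed(answer: &str, assume_yes: bool) -> bool {
    assume_yes || answer.trim().eq_ignore_ascii_case("y")
}
```

Because the secret values never enter the rendered string, the same text can be shown at the gate, logged, or asserted on in tests, which fits the doc's requirement that commands are built but never executed under test. An executor for `--apply` would walk the same `Vec<ProvisionStep>` only after `confirmed` returns `true`.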
diff --git a/npm/codewhale/package.json b/npm/codewhale/package.json
index 6e15c9ae..7a2230d2 100644
--- a/npm/codewhale/package.json
+++ b/npm/codewhale/package.json
@@ -1,7 +1,7 @@
 {
   "name": "codewhale",
-  "version": "0.8.50",
-  "codewhaleBinaryVersion": "0.8.50",
+  "version": "0.8.51",
+  "codewhaleBinaryVersion": "0.8.51",
   "description": "Install and run CodeWhale, the agentic terminal for open-source and open-weight coding models, from GitHub release artifacts.",
   "author": "Hmbown",
   "license": "MIT",