diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 3a589933..57ce9cec 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -115,3 +115,21 @@ jobs:
         with:
           files: artifacts/*/*
           prerelease: false
+
+  publish-npm:
+    needs: release
+    runs-on: ubuntu-latest
+    # Trusted Publishing via OIDC — no NPM_TOKEN required.
+    # Configure the trusted publisher on the npm package settings page.
+    permissions:
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          registry-url: 'https://registry.npmjs.org'
+      - name: Publish wrapper to npm
+        working-directory: npm/deepseek-tui
+        run: npm publish --provenance --access public
diff --git a/.mailmap b/.mailmap
new file mode 100644
index 00000000..de638f42
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1 @@
+Hunter Bown devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
diff --git a/AGENTS.md b/AGENTS.md
index 24d722d9..d9934cb5 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -17,9 +17,10 @@ See README.md for project overview, docs/ARCHITECTURE.md for internals.
 ## DeepSeek-Specific Notes
 
 - **Thinking Tokens**: DeepSeek models output thinking blocks (`ContentBlock::Thinking`) before final answers. The TUI streams and displays these with visual distinction.
-- **Reasoning Models**: `deepseek-reasoner` and `deepseek-r1` excel at step-by-step problem solving.
-- **Large Context Window**: 128k tokens. Use search tools to navigate efficiently.
-- **API**: OpenAI-compatible with Responses API preferred, chat completions as fallback. Base URL configurable for global (`api.deepseek.com`) or China (`api.deepseeki.com`).
+- **Reasoning Models**: `deepseek-v4-pro` and `deepseek-v4-flash` are the documented V4 model IDs. Legacy `deepseek-chat` and `deepseek-reasoner` are compatibility aliases for `deepseek-v4-flash`.
+- **Large Context Window**: DeepSeek V4 models have 1M-token context windows. Use search tools to navigate efficiently.
+- **API**: OpenAI-compatible Chat Completions (`/chat/completions`) is the documented DeepSeek API path. Base URL configurable for global (`api.deepseek.com`) or China (`api.deepseeki.com`); `/v1` is accepted for OpenAI SDK compatibility, and `/beta` is only needed for beta features such as strict tool mode, chat prefix completion, and FIM completion.
+- **Thinking + Tool Calls**: In V4 thinking mode, assistant messages that contain tool calls must replay their `reasoning_content` in all subsequent requests or the API returns HTTP 400.
 
 ## Important Notes
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 65163082..93f27ca0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+- DeepSeek V4 thinking-mode tool calls now preserve prior assistant `reasoning_content` whenever a tool call is replayed, matching DeepSeek's multi-turn contract and avoiding HTTP 400 rejections on later turns.
+- Raw Chat Completions requests now send DeepSeek's top-level `thinking` parameter instead of the OpenAI SDK-only `extra_body` wrapper.
+- Context-window budgeting now treats legacy `deepseek-chat` / `deepseek-reasoner` aliases as V4 Flash's 1M-token context window.
+- npm wrapper first-run downloads now use process-unique temp files so concurrent `deepseek` / `deepseek-tui` invocations do not race on `*.download` files.
+
+## [0.4.0] - 2026-04-23
+
+### Added
+- **DeepSeek V4 support**: `deepseek-v4-pro` (flagship) and `deepseek-v4-flash` (fast/cheap) are now first-class model IDs with 1M context windows.
+- **Reasoning-effort tier**: new `reasoning_effort` config field (`off | low | medium | high | max`) mapped to DeepSeek's `reasoning_effort` + `thinking` request fields. Defaults to `max`.
+- **Shift+Tab cycles reasoning-effort** through the three behaviorally distinct tiers (`off → high → max`). The current tier is shown as a ⚡ chip in the header.
+- Per-model pricing table: `deepseek-v4-pro` priced at $0.145/$1.74/$3.48 per 1M tokens (cache-hit/miss/output); `deepseek-v4-flash` and legacy aliases at $0.028/$0.14/$0.28.
+
+### Changed
+- **Default model flipped to `deepseek-v4-pro`** (from `deepseek-reasoner`).
+- `deepseek-chat` / `deepseek-reasoner` remain as silent aliases of `deepseek-v4-flash` for API compatibility; priced identically.
+- **Context compaction**: raised `MAX_COMPACTION_MESSAGE_THRESHOLD` from 150 → 500 so 1M-context models can use proportionally more history before message-count compaction. Token-based compaction still triggers at 80% of the window and scales automatically.
+- Cycling modes is now Tab-only; Shift+Tab is repurposed for reasoning-effort (reverse-mode cycle was low-value with only three modes).
+- Updated help/hint strings, validator error messages, and the model picker to reference V4 IDs.
+
+### Fixed
+- `requires_reasoning_content` now recognizes `deepseek-v4*` so thinking streams render correctly on V4 models.
+
 ## [0.3.33] - 2026-04-11
 
 ### Changed
@@ -474,4 +498,4 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 [0.1.7]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.6...v0.1.7
 [0.1.6]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.5...v0.1.6
 [0.1.5]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.0...v0.1.5
-[0.1.0]: https://github.com/Hmbown/DeepSeek-TUI/releases/tag/v0.1.0
\ No newline at end of file
+[0.1.0]: https://github.com/Hmbown/DeepSeek-TUI/releases/tag/v0.1.0
diff --git a/Cargo.lock b/Cargo.lock
index 0661f52c..66379d59 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -806,7 +806,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-agent"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "deepseek-config",
  "serde",
@@ -814,7 +814,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-app-server"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "axum",
@@ -837,7 +837,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-config"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "dirs",
@@ -848,7 +848,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-core"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "chrono",
@@ -867,7 +867,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-execpolicy"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "deepseek-protocol",
@@ -876,7 +876,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-hooks"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "async-trait",
@@ -890,7 +890,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-mcp"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "deepseek-protocol",
@@ -900,7 +900,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-protocol"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "serde",
  "serde_json",
@@ -908,7 +908,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-state"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
"anyhow", "chrono", @@ -920,7 +920,7 @@ dependencies = [ [[package]] name = "deepseek-tools" -version = "0.3.33" +version = "0.4.0" dependencies = [ "anyhow", "async-trait", @@ -933,7 +933,7 @@ dependencies = [ [[package]] name = "deepseek-tui" -version = "0.3.33" +version = "0.4.0" dependencies = [ "anyhow", "arboard", @@ -987,7 +987,7 @@ dependencies = [ [[package]] name = "deepseek-tui-cli" -version = "0.3.33" +version = "0.4.0" dependencies = [ "anyhow", "chrono", @@ -1005,7 +1005,7 @@ dependencies = [ [[package]] name = "deepseek-tui-core" -version = "0.3.33" +version = "0.4.0" [[package]] name = "deranged" diff --git a/Cargo.toml b/Cargo.toml index 2c273331..b1b92600 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -18,7 +18,7 @@ default-members = ["crates/cli", "crates/app-server", "crates/tui"] resolver = "2" [workspace.package] -version = "0.3.33" +version = "0.4.0" edition = "2024" license = "MIT" repository = "https://github.com/Hmbown/DeepSeek-TUI" diff --git a/README.md b/README.md index 70b46717..ef7f0b9c 100644 --- a/README.md +++ b/README.md @@ -21,15 +21,18 @@ npm install -g deepseek-tui Start the TUI: ```bash -deepseek-tui +deepseek ``` On first launch, it will prompt for your API key if one is not already configured. +The package also installs `deepseek-tui`; both commands share the same +`~/.deepseek/config.toml` for DeepSeek auth and default model settings. You can also set auth ahead of time with either of these: ```bash -deepseek-tui login +deepseek login --api-key "YOUR_DEEPSEEK_API_KEY" +deepseek-tui login --api-key "YOUR_DEEPSEEK_API_KEY" DEEPSEEK_API_KEY="YOUR_DEEPSEEK_API_KEY" deepseek-tui ``` @@ -59,7 +62,7 @@ surface immediately. A terminal coding agent for DeepSeek models with file editing, shell execution, `web.run` browsing, git operations, session resume, and [MCP](https://modelcontextprotocol.io) server integration. 
-Three visible modes (**Tab** / **Shift+Tab** to cycle):
+Three visible modes (**Tab** to cycle):
 
 | Mode | Behavior |
 |------|----------|
@@ -67,16 +70,35 @@ Three visible modes (**Tab** / **Shift+Tab** to cycle):
 | **Agent** | Default interactive mode with multi-step tool use |
 | **YOLO** | Auto-approve tools in a trusted workspace |
 
+**Shift+Tab** cycles the reasoning-effort tier for DeepSeek thinking mode:
+`off` → `high` → `max`. The current tier is shown as a ⚡ chip in the header.
+Set a default in config with `reasoning_effort = "max"` (or `off` / `low` /
+`medium` / `high`).
+
+## Models & pricing
+
+| Model | Thinking | Context | Input cache hit | Input cache miss | Output |
+|---|---|---|---|---|---|
+| `deepseek-v4-pro` | default | 1M | $0.145 / 1M | $1.74 / 1M | $3.48 / 1M |
+| `deepseek-v4-flash` | default | 1M | $0.028 / 1M | $0.14 / 1M | $0.28 / 1M |
+
+Legacy `deepseek-chat` and `deepseek-reasoner` remain as silent aliases for
+`deepseek-v4-flash` (priced identically). Pricing is per 1M tokens as published
+by DeepSeek and is subject to change.
+
 ## Usage
 
 ```bash
-deepseek-tui                                   # interactive TUI
-deepseek-tui -p "explain this in 2 sentences"  # one-shot prompt
-deepseek-tui --yolo                            # YOLO mode
-deepseek-tui login                             # save API key to config
-deepseek-tui doctor                            # check setup
-deepseek-tui models                            # list available models
-deepseek-tui serve --http                      # HTTP/SSE API server
+deepseek                                        # interactive TUI
+deepseek "explain this in 2 sentences"          # one-shot prompt
+deepseek --model deepseek-v4-flash "summarize"  # one-shot with model override
+deepseek --yolo                                 # YOLO mode
+deepseek login --api-key "..."                  # save API key to shared config
+deepseek doctor                                 # check setup
+deepseek models                                 # list live DeepSeek API models
+deepseek sessions                               # list saved sessions
+deepseek resume --last                          # resume the latest session
+deepseek serve --http                           # HTTP/SSE API server
 ```
 
 Controls: `F1` help, `Esc` backs out of the current action, `Ctrl+K` command palette.
@@ -85,7 +107,13 @@ Controls: `F1` help, `Esc` backs out of the current action, `Ctrl+K` command pal `~/.deepseek/config.toml` — see [config.example.toml](config.example.toml) for all options. -Key environment overrides: `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_PROFILE`. +Key environment overrides: `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, +`DEEPSEEK_MODEL`, `DEEPSEEK_PROFILE`. + +The client targets DeepSeek's documented OpenAI-compatible Chat Completions API +(`/chat/completions`). DeepSeek context caching is automatic; when the API +returns cache hit/miss token fields, the TUI includes them in usage and cost +tracking. Full reference: [docs/CONFIGURATION.md](docs/CONFIGURATION.md). diff --git a/config.example.toml b/config.example.toml index befdab9e..bff40a73 100644 --- a/config.example.toml +++ b/config.example.toml @@ -16,11 +16,28 @@ api_key = "YOUR_DEEPSEEK_API_KEY" # must be non-empty # ───────────────────────────────────────────────────────────────────────────────── base_url = "https://api.deepseek.com" # base_url = "https://api.deepseeki.com" # China users +# base_url = "https://api.deepseek.com/beta" # DeepSeek beta features such as strict tool mode # ───────────────────────────────────────────────────────────────────────────────── # Default Models # ───────────────────────────────────────────────────────────────────────────────── -default_text_model = "deepseek-reasoner" # common IDs: deepseek-reasoner, deepseek-chat (any deepseek-* ID is valid) +# DeepSeek V4 family: +# deepseek-v4-pro — flagship reasoning model +# deepseek-v4-flash — fast, cost-efficient (legacy aliases: deepseek-chat, deepseek-reasoner) +default_text_model = "deepseek-v4-pro" + +# ───────────────────────────────────────────────────────────────────────────────── +# Thinking Mode (DeepSeek V4 reasoning effort) +# ───────────────────────────────────────────────────────────────────────────────── +# "off" — disables chain-of-thought (thinking.type = disabled) +# "low" — 
compat-maps to "high" server-side +# "medium" — compat-maps to "high" server-side +# "high" — reasoning_effort = high (DeepSeek default) +# "max" — reasoning_effort = max (deepest reasoning) +# +# Shift+Tab in the TUI cycles between off / high / max. The header shows the +# current tier as a ⚡ chip. +reasoning_effort = "max" # ───────────────────────────────────────────────────────────────────────────────── # Paths @@ -81,7 +98,7 @@ exponential_base = 2.0 # token_threshold = 50000 # Trigger compaction above this token estimate # message_threshold = 50 # Or above this message count # model = "deepseek-chat" # Model to use for summarization -# cache_summary = true # Cache the summary block +# cache_summary = true # Keep summary blocks stable; DeepSeek context caching is automatic # ───────────────────────────────────────────────────────────────────────────────── # Capacity Controller (runtime pressure guardrails) diff --git a/crates/agent/Cargo.toml b/crates/agent/Cargo.toml index 34e4ccfc..b33f1f41 100644 --- a/crates/agent/Cargo.toml +++ b/crates/agent/Cargo.toml @@ -7,5 +7,5 @@ repository.workspace = true description = "Model/provider registry and fallback strategy for DeepSeek workspace architecture" [dependencies] -deepseek-config = { path = "../config", version = "0.3.33" } +deepseek-config = { path = "../config", version = "0.4.0" } serde.workspace = true diff --git a/crates/agent/src/lib.rs b/crates/agent/src/lib.rs index 44b20713..f88c6708 100644 --- a/crates/agent/src/lib.rs +++ b/crates/agent/src/lib.rs @@ -30,18 +30,24 @@ impl Default for ModelRegistry { fn default() -> Self { let models = vec![ ModelInfo { - id: "deepseek-reasoner".to_string(), + id: "deepseek-v4-pro".to_string(), provider: ProviderKind::Deepseek, - aliases: vec!["deepseek-r1".to_string()], + aliases: vec![], supports_tools: true, supports_reasoning: true, }, ModelInfo { - id: "deepseek-chat".to_string(), + id: "deepseek-v4-flash".to_string(), provider: ProviderKind::Deepseek, - aliases: 
vec!["deepseek-v3".to_string(), "deepseek-v3.2".to_string()], + aliases: vec![ + "deepseek-chat".to_string(), + "deepseek-reasoner".to_string(), + "deepseek-r1".to_string(), + "deepseek-v3".to_string(), + "deepseek-v3.2".to_string(), + ], supports_tools: true, - supports_reasoning: false, + supports_reasoning: true, }, ModelInfo { id: "gpt-4.1".to_string(), @@ -112,13 +118,13 @@ impl ModelRegistry { } let final_fallback = self.models.first().cloned().unwrap_or(ModelInfo { - id: "deepseek-reasoner".to_string(), + id: "deepseek-v4-pro".to_string(), provider: ProviderKind::Deepseek, aliases: Vec::new(), supports_tools: true, supports_reasoning: true, }); - fallback_chain.push("global_default:deepseek-reasoner".to_string()); + fallback_chain.push("global_default:deepseek-v4-pro".to_string()); ModelResolution { requested: requested.map(ToOwned::to_owned), resolved: final_fallback, diff --git a/crates/app-server/Cargo.toml b/crates/app-server/Cargo.toml index 88f8e74b..35415872 100644 --- a/crates/app-server/Cargo.toml +++ b/crates/app-server/Cargo.toml @@ -10,15 +10,15 @@ description = "Codex-style app-server transport for DeepSeek workspace architect anyhow.workspace = true axum.workspace = true clap.workspace = true -deepseek-agent = { path = "../agent", version = "0.3.33" } -deepseek-config = { path = "../config", version = "0.3.33" } -deepseek-core = { path = "../core", version = "0.3.33" } -deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" } -deepseek-hooks = { path = "../hooks", version = "0.3.33" } -deepseek-mcp = { path = "../mcp", version = "0.3.33" } -deepseek-protocol = { path = "../protocol", version = "0.3.33" } -deepseek-state = { path = "../state", version = "0.3.33" } -deepseek-tools = { path = "../tools", version = "0.3.33" } +deepseek-agent = { path = "../agent", version = "0.4.0" } +deepseek-config = { path = "../config", version = "0.4.0" } +deepseek-core = { path = "../core", version = "0.4.0" } +deepseek-execpolicy = { path = 
"../execpolicy", version = "0.4.0" } +deepseek-hooks = { path = "../hooks", version = "0.4.0" } +deepseek-mcp = { path = "../mcp", version = "0.4.0" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } +deepseek-state = { path = "../state", version = "0.4.0" } +deepseek-tools = { path = "../tools", version = "0.4.0" } serde.workspace = true serde_json.workspace = true tokio.workspace = true diff --git a/crates/cli/Cargo.toml b/crates/cli/Cargo.toml index 8547b1f4..77e771d2 100644 --- a/crates/cli/Cargo.toml +++ b/crates/cli/Cargo.toml @@ -14,12 +14,12 @@ path = "src/main.rs" anyhow.workspace = true clap.workspace = true clap_complete.workspace = true -deepseek-agent = { path = "../agent", version = "0.3.33" } -deepseek-app-server = { path = "../app-server", version = "0.3.33" } -deepseek-config = { path = "../config", version = "0.3.33" } -deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" } -deepseek-mcp = { path = "../mcp", version = "0.3.33" } -deepseek-state = { path = "../state", version = "0.3.33" } +deepseek-agent = { path = "../agent", version = "0.4.0" } +deepseek-app-server = { path = "../app-server", version = "0.4.0" } +deepseek-config = { path = "../config", version = "0.4.0" } +deepseek-execpolicy = { path = "../execpolicy", version = "0.4.0" } +deepseek-mcp = { path = "../mcp", version = "0.4.0" } +deepseek-state = { path = "../state", version = "0.4.0" } chrono.workspace = true serde_json.workspace = true tokio.workspace = true diff --git a/crates/cli/src/main.rs b/crates/cli/src/main.rs index 32869513..2d4355cb 100644 --- a/crates/cli/src/main.rs +++ b/crates/cli/src/main.rs @@ -10,7 +10,7 @@ use deepseek_agent::ModelRegistry; use deepseek_app_server::{ AppServerOptions, run as run_app_server, run_stdio as run_app_server_stdio, }; -use deepseek_config::{CliRuntimeOverrides, ConfigStore, ProviderKind}; +use deepseek_config::{CliRuntimeOverrides, ConfigStore, ProviderKind, ResolvedRuntimeOptions}; use 
deepseek_execpolicy::{AskForApproval, ExecPolicyContext, ExecPolicyEngine}; use deepseek_mcp::{McpServerDefinition, run_stdio_server}; use deepseek_state::{StateStore, ThreadListFilters}; @@ -42,7 +42,11 @@ struct Cli { config: Option, #[arg(long)] profile: Option, - #[arg(long, value_enum)] + #[arg( + long, + value_enum, + help = "Advanced provider selector for non-TUI registry/config commands" + )] provider: Option, #[arg(long)] model: Option, @@ -70,7 +74,37 @@ struct Cli { enum Commands { /// Run interactive/non-interactive flows via the TUI binary. Run(RunArgs), - /// Login using API key, ChatGPT token, or device code style session. + /// Run DeepSeek TUI diagnostics. + Doctor(TuiPassthroughArgs), + /// List live DeepSeek API models via the TUI binary. + Models(TuiPassthroughArgs), + /// List saved TUI sessions. + Sessions(TuiPassthroughArgs), + /// Resume a saved TUI session. + Resume(TuiPassthroughArgs), + /// Fork a saved TUI session. + Fork(TuiPassthroughArgs), + /// Create a default AGENTS.md in the current directory. + Init(TuiPassthroughArgs), + /// Bootstrap MCP config and/or skills directories. + Setup(TuiPassthroughArgs), + /// Run the DeepSeek TUI non-interactive agent command. + Exec(TuiPassthroughArgs), + /// Run a DeepSeek-powered code review over a git diff. + Review(TuiPassthroughArgs), + /// Apply a patch file or stdin to the working tree. + Apply(TuiPassthroughArgs), + /// Run the offline TUI evaluation harness. + Eval(TuiPassthroughArgs), + /// Manage TUI MCP servers. + Mcp(TuiPassthroughArgs), + /// Inspect TUI feature flags. + Features(TuiPassthroughArgs), + /// Run a local TUI server. + Serve(TuiPassthroughArgs), + /// Generate shell completions for the TUI binary. + Completions(TuiPassthroughArgs), + /// Save a DeepSeek API key to the shared config. Login(LoginArgs), /// Remove saved authentication state. 
Logout, @@ -101,17 +135,23 @@ struct RunArgs { args: Vec, } +#[derive(Debug, Args, Clone)] +struct TuiPassthroughArgs { + #[arg(trailing_var_arg = true, allow_hyphen_values = true)] + args: Vec, +} + #[derive(Debug, Args)] struct LoginArgs { - #[arg(long, value_enum, default_value_t = ProviderArg::Deepseek)] + #[arg(long, value_enum, default_value_t = ProviderArg::Deepseek, hide = true)] provider: ProviderArg, #[arg(long)] api_key: Option, - #[arg(long, default_value_t = false)] + #[arg(long, default_value_t = false, hide = true)] chatgpt: bool, - #[arg(long, default_value_t = false)] + #[arg(long, default_value_t = false, hide = true)] device_code: bool, - #[arg(long)] + #[arg(long, hide = true)] token: Option, } @@ -279,12 +319,57 @@ fn run() -> Result<()> { approval_policy: cli.approval_policy.clone(), sandbox_mode: cli.sandbox_mode.clone(), }; - let _resolved_runtime = store.config.resolve_runtime_options(&runtime_overrides); + let resolved_runtime = store.config.resolve_runtime_options(&runtime_overrides); let command = cli.command.take(); match command { - Some(Commands::Run(args)) => delegate_to_tui(&cli, args.args), + Some(Commands::Run(args)) => delegate_to_tui(&cli, &resolved_runtime, args.args), + Some(Commands::Doctor(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("doctor", args)) + } + Some(Commands::Models(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("models", args)) + } + Some(Commands::Sessions(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("sessions", args)) + } + Some(Commands::Resume(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("resume", args)) + } + Some(Commands::Fork(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("fork", args)) + } + Some(Commands::Init(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("init", args)) + } + Some(Commands::Setup(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("setup", args)) + } + 
Some(Commands::Exec(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("exec", args)) + } + Some(Commands::Review(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("review", args)) + } + Some(Commands::Apply(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("apply", args)) + } + Some(Commands::Eval(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("eval", args)) + } + Some(Commands::Mcp(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("mcp", args)) + } + Some(Commands::Features(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("features", args)) + } + Some(Commands::Serve(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("serve", args)) + } + Some(Commands::Completions(args)) => { + delegate_to_tui(&cli, &resolved_runtime, tui_args("completions", args)) + } Some(Commands::Login(args)) => run_login_command(&mut store, args), Some(Commands::Logout) => run_logout_command(&mut store), Some(Commands::Auth(args)) => run_auth_command(&mut store, args.command), @@ -305,11 +390,18 @@ fn run() -> Result<()> { forwarded.push("--prompt".to_string()); forwarded.push(prompt); } - delegate_to_tui(&cli, forwarded) + delegate_to_tui(&cli, &resolved_runtime, forwarded) } } } +fn tui_args(command: &str, args: TuiPassthroughArgs) -> Vec { + let mut forwarded = Vec::with_capacity(args.args.len() + 1); + forwarded.push(command.to_string()); + forwarded.extend(args.args); + forwarded +} + fn run_login_command(store: &mut ConfigStore, args: LoginArgs) -> Result<()> { let provider: ProviderKind = args.provider.into(); store.config.provider = provider; @@ -349,12 +441,33 @@ fn run_login_command(store: &mut ConfigStore, args: LoginArgs) -> Result<()> { }; store.config.auth_mode = Some("api_key".to_string()); store.config.providers.for_provider_mut(provider).api_key = Some(api_key); + if provider == ProviderKind::Deepseek { + store.config.api_key = store.config.providers.deepseek.api_key.clone(); 
+ if store.config.default_text_model.is_none() { + store.config.default_text_model = Some( + store + .config + .providers + .deepseek + .model + .clone() + .unwrap_or_else(|| "deepseek-v4-pro".to_string()), + ); + } + } store.save()?; - println!("logged in using API key mode ({})", provider.as_str()); + if provider == ProviderKind::Deepseek { + println!( + "logged in using API key mode (deepseek). This also updates the shared deepseek-tui config." + ); + } else { + println!("logged in using API key mode ({})", provider.as_str()); + } Ok(()) } fn run_logout_command(store: &mut ConfigStore) -> Result<()> { + store.config.api_key = None; store.config.providers.deepseek.api_key = None; store.config.providers.openai.api_key = None; store.config.auth_mode = None; @@ -382,6 +495,7 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()> .deepseek .api_key .as_ref() + .or(store.config.api_key.as_ref()) .is_some_and(|v| !v.trim().is_empty()); let openai_file = store .config @@ -407,6 +521,9 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()> }; store.config.provider = provider; store.config.providers.for_provider_mut(provider).api_key = Some(api_key); + if provider == ProviderKind::Deepseek { + store.config.api_key = store.config.providers.deepseek.api_key.clone(); + } store.save()?; println!("saved API key for {}", provider.as_str()); Ok(()) @@ -414,6 +531,9 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()> AuthCommand::Clear { provider } => { let provider: ProviderKind = provider.into(); store.config.providers.for_provider_mut(provider).api_key = None; + if provider == ProviderKind::Deepseek { + store.config.api_key = None; + } store.save()?; println!("cleared API key for {}", provider.as_str()); Ok(()) @@ -623,7 +743,11 @@ fn persist_mcp_server_definitions( store.save() } -fn delegate_to_tui(cli: &Cli, passthrough: Vec) -> Result<()> { +fn delegate_to_tui( + cli: &Cli, + 
resolved_runtime: &ResolvedRuntimeOptions, + passthrough: Vec, +) -> Result<()> { let current = std::env::current_exe().context("failed to locate current executable path")?; let tui = current.with_file_name("deepseek-tui"); if !tui.exists() { @@ -642,6 +766,19 @@ fn delegate_to_tui(cli: &Cli, passthrough: Vec) -> Result<()> { } cmd.args(passthrough); + if resolved_runtime.provider != ProviderKind::Deepseek { + bail!( + "The interactive TUI only supports the DeepSeek API. Remove --provider {} or use `deepseek model ...` for provider registry inspection.", + resolved_runtime.provider.as_str() + ); + } + + cmd.env("DEEPSEEK_MODEL", &resolved_runtime.model); + cmd.env("DEEPSEEK_BASE_URL", &resolved_runtime.base_url); + if let Some(api_key) = resolved_runtime.api_key.as_ref() { + cmd.env("DEEPSEEK_API_KEY", api_key); + } + if let Some(provider) = cli.provider { cmd.env("DEEPSEEK_PROVIDER", ProviderKind::from(provider).as_str()); } @@ -931,6 +1068,67 @@ mod tests { )); } + #[test] + fn parses_direct_tui_command_aliases() { + let cli = parse_ok(&["deepseek", "doctor"]); + assert!(matches!( + cli.command, + Some(Commands::Doctor(TuiPassthroughArgs { ref args })) if args.is_empty() + )); + + let cli = parse_ok(&["deepseek", "models", "--json"]); + assert!(matches!( + cli.command, + Some(Commands::Models(TuiPassthroughArgs { ref args })) if args == &["--json"] + )); + + let cli = parse_ok(&["deepseek", "resume", "abc123"]); + assert!(matches!( + cli.command, + Some(Commands::Resume(TuiPassthroughArgs { ref args })) if args == &["abc123"] + )); + + let cli = parse_ok(&["deepseek", "setup", "--skills", "--local"]); + assert!(matches!( + cli.command, + Some(Commands::Setup(TuiPassthroughArgs { ref args })) + if args == &["--skills", "--local"] + )); + } + + #[test] + fn deepseek_login_writes_tui_compatible_config() { + let nanos = chrono::Utc::now().timestamp_nanos_opt().unwrap_or_default(); + let path = std::env::temp_dir().join(format!( + 
"deepseek-cli-login-test-{}-{nanos}.toml", + std::process::id() + )); + let mut store = ConfigStore::load(Some(path.clone())).expect("store should load"); + + run_login_command( + &mut store, + LoginArgs { + provider: ProviderArg::Deepseek, + api_key: Some("sk-test".to_string()), + chatgpt: false, + device_code: false, + token: None, + }, + ) + .expect("login should write config"); + + assert_eq!(store.config.api_key.as_deref(), Some("sk-test")); + assert_eq!( + store.config.default_text_model.as_deref(), + Some("deepseek-v4-pro") + ); + let saved = std::fs::read_to_string(&path).expect("config should be written"); + assert!(saved.contains("api_key = \"sk-test\"")); + assert!(saved.contains("default_text_model = \"deepseek-v4-pro\"")); + + let _ = std::fs::remove_file(path); + } + #[test] fn parses_global_override_flags() { let cli = parse_ok(&[ @@ -981,6 +1179,11 @@ mod tests { for token in [ "run", + "doctor", + "models", + "sessions", + "resume", + "setup", "login", "logout", "auth", diff --git a/crates/config/src/lib.rs b/crates/config/src/lib.rs index 61214775..1b0513af 100644 --- a/crates/config/src/lib.rs +++ b/crates/config/src/lib.rs @@ -6,7 +6,7 @@ use anyhow::{Context, Result, bail}; use serde::{Deserialize, Serialize}; pub const CONFIG_FILE_NAME: &str = "config.toml"; -const DEFAULT_DEEPSEEK_MODEL: &str = "deepseek-reasoner"; +const DEFAULT_DEEPSEEK_MODEL: &str = "deepseek-v4-pro"; const DEFAULT_OPENAI_MODEL: &str = "gpt-4.1"; const DEFAULT_DEEPSEEK_BASE_URL: &str = "https://api.deepseek.com"; const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1"; @@ -72,6 +72,13 @@ impl ProvidersToml { #[derive(Debug, Clone, Serialize, Deserialize, Default)] pub struct ConfigToml { + /// TUI-compatible DeepSeek API key. Kept at the root so both `deepseek` + /// and `deepseek-tui` can share a single config file. + pub api_key: Option, + /// TUI-compatible DeepSeek base URL. + pub base_url: Option, + /// TUI-compatible default DeepSeek model. 
+ pub default_text_model: Option, #[serde(default)] pub provider: ProviderKind, pub model: Option, @@ -94,6 +101,9 @@ impl ConfigToml { pub fn get_value(&self, key: &str) -> Option { match key { "provider" => Some(self.provider.as_str().to_string()), + "api_key" => self.api_key.clone(), + "base_url" => self.base_url.clone(), + "default_text_model" => self.default_text_model.clone(), "model" => self.model.clone(), "auth.mode" => self.auth_mode.clone(), "auth.chatgpt_access_token" => self.chatgpt_access_token.clone(), @@ -119,6 +129,9 @@ impl ConfigToml { self.provider = ProviderKind::parse(value) .with_context(|| format!("unknown provider '{value}'"))?; } + "api_key" => self.api_key = Some(value.to_string()), + "base_url" => self.base_url = Some(value.to_string()), + "default_text_model" => self.default_text_model = Some(value.to_string()), "model" => self.model = Some(value.to_string()), "auth.mode" => self.auth_mode = Some(value.to_string()), "auth.chatgpt_access_token" => self.chatgpt_access_token = Some(value.to_string()), @@ -131,12 +144,20 @@ impl ConfigToml { "approval_policy" => self.approval_policy = Some(value.to_string()), "sandbox_mode" => self.sandbox_mode = Some(value.to_string()), "providers.deepseek.api_key" => { - self.providers.deepseek.api_key = Some(value.to_string()) + let value = value.to_string(); + self.providers.deepseek.api_key = Some(value.clone()); + self.api_key = Some(value); } "providers.deepseek.base_url" => { - self.providers.deepseek.base_url = Some(value.to_string()); + let value = value.to_string(); + self.providers.deepseek.base_url = Some(value.clone()); + self.base_url = Some(value); + } + "providers.deepseek.model" => { + let value = value.to_string(); + self.providers.deepseek.model = Some(value.clone()); + self.default_text_model = Some(value); } - "providers.deepseek.model" => self.providers.deepseek.model = Some(value.to_string()), "providers.openai.api_key" => self.providers.openai.api_key = Some(value.to_string()), 
"providers.openai.base_url" => self.providers.openai.base_url = Some(value.to_string()), "providers.openai.model" => self.providers.openai.model = Some(value.to_string()), @@ -151,6 +172,9 @@ impl ConfigToml { pub fn unset_value(&mut self, key: &str) -> Result<()> { match key { "provider" => self.provider = ProviderKind::Deepseek, + "api_key" => self.api_key = None, + "base_url" => self.base_url = None, + "default_text_model" => self.default_text_model = None, "model" => self.model = None, "auth.mode" => self.auth_mode = None, "auth.chatgpt_access_token" => self.chatgpt_access_token = None, @@ -160,9 +184,18 @@ impl ConfigToml { "telemetry" => self.telemetry = None, "approval_policy" => self.approval_policy = None, "sandbox_mode" => self.sandbox_mode = None, - "providers.deepseek.api_key" => self.providers.deepseek.api_key = None, - "providers.deepseek.base_url" => self.providers.deepseek.base_url = None, - "providers.deepseek.model" => self.providers.deepseek.model = None, + "providers.deepseek.api_key" => { + self.providers.deepseek.api_key = None; + self.api_key = None; + } + "providers.deepseek.base_url" => { + self.providers.deepseek.base_url = None; + self.base_url = None; + } + "providers.deepseek.model" => { + self.providers.deepseek.model = None; + self.default_text_model = None; + } "providers.openai.api_key" => self.providers.openai.api_key = None, "providers.openai.base_url" => self.providers.openai.base_url = None, "providers.openai.model" => self.providers.openai.model = None, @@ -178,6 +211,15 @@ impl ConfigToml { let mut out = BTreeMap::new(); out.insert("provider".to_string(), self.provider.as_str().to_string()); + if let Some(v) = self.api_key.as_ref() { + out.insert("api_key".to_string(), redact_secret(v)); + } + if let Some(v) = self.base_url.as_ref() { + out.insert("base_url".to_string(), v.clone()); + } + if let Some(v) = self.default_text_model.as_ref() { + out.insert("default_text_model".to_string(), v.clone()); + } if let Some(v) = 
self.model.as_ref() { out.insert("model".to_string(), v.clone()); } @@ -236,17 +278,28 @@ impl ConfigToml { let provider = cli.provider.or(env.provider).unwrap_or(self.provider); let provider_cfg = self.providers.for_provider(provider); + let root_deepseek_api_key = (provider == ProviderKind::Deepseek) + .then(|| self.api_key.clone()) + .flatten(); + let root_deepseek_base_url = (provider == ProviderKind::Deepseek) + .then(|| self.base_url.clone()) + .flatten(); + let root_deepseek_model = (provider == ProviderKind::Deepseek) + .then(|| self.default_text_model.clone()) + .flatten(); let api_key = cli .api_key .clone() .or_else(|| env.api_key_for(provider)) - .or_else(|| provider_cfg.api_key.clone()); + .or_else(|| provider_cfg.api_key.clone()) + .or(root_deepseek_api_key); let base_url = cli .base_url .clone() .or_else(|| env.base_url_for(provider)) .or_else(|| provider_cfg.base_url.clone()) + .or(root_deepseek_base_url) .unwrap_or_else(|| match provider { ProviderKind::Deepseek => DEFAULT_DEEPSEEK_BASE_URL.to_string(), ProviderKind::Openai => DEFAULT_OPENAI_BASE_URL.to_string(), @@ -257,6 +310,7 @@ impl ConfigToml { .clone() .or_else(|| env.model.clone()) .or_else(|| provider_cfg.model.clone()) + .or(root_deepseek_model) .or_else(|| self.model.clone()) .unwrap_or_else(|| match provider { ProviderKind::Deepseek => DEFAULT_DEEPSEEK_MODEL.to_string(), @@ -475,3 +529,117 @@ impl EnvRuntimeOverrides { } } } + +#[cfg(test)] +mod tests { + use super::*; + use std::env; + use std::ffi::OsString; + use std::sync::{Mutex, OnceLock}; + + fn env_lock() -> std::sync::MutexGuard<'static, ()> { + static LOCK: OnceLock<Mutex<()>> = OnceLock::new(); + LOCK.get_or_init(|| Mutex::new(())).lock().unwrap() + } + + struct EnvGuard { + deepseek_api_key: Option<OsString>, + deepseek_base_url: Option<OsString>, + deepseek_model: Option<OsString>, + deepseek_provider: Option<OsString>, + } + + impl EnvGuard { + fn without_deepseek_runtime_overrides() -> Self { + let guard = Self { + deepseek_api_key: env::var_os("DEEPSEEK_API_KEY"), + 
deepseek_base_url: env::var_os("DEEPSEEK_BASE_URL"), + deepseek_model: env::var_os("DEEPSEEK_MODEL"), + deepseek_provider: env::var_os("DEEPSEEK_PROVIDER"), + }; + // Safety: test-only environment mutation guarded by a module mutex. + unsafe { + env::remove_var("DEEPSEEK_API_KEY"); + env::remove_var("DEEPSEEK_BASE_URL"); + env::remove_var("DEEPSEEK_MODEL"); + env::remove_var("DEEPSEEK_PROVIDER"); + } + guard + } + + unsafe fn restore_var(key: &str, value: Option<OsString>) { + if let Some(value) = value { + unsafe { env::set_var(key, value) }; + } else { + unsafe { env::remove_var(key) }; + } + } + } + + impl Drop for EnvGuard { + fn drop(&mut self) { + // Safety: test-only environment mutation guarded by a module mutex. + unsafe { + Self::restore_var("DEEPSEEK_API_KEY", self.deepseek_api_key.take()); + Self::restore_var("DEEPSEEK_BASE_URL", self.deepseek_base_url.take()); + Self::restore_var("DEEPSEEK_MODEL", self.deepseek_model.take()); + Self::restore_var("DEEPSEEK_PROVIDER", self.deepseek_provider.take()); + } + } + } + + #[test] + fn root_deepseek_fields_are_runtime_fallbacks() { + let _lock = env_lock(); + let _env = EnvGuard::without_deepseek_runtime_overrides(); + let config = ConfigToml { + api_key: Some("root-key".to_string()), + base_url: Some("https://api.deepseek.com".to_string()), + default_text_model: Some("deepseek-chat".to_string()), + ..ConfigToml::default() + }; + + let resolved = config.resolve_runtime_options(&CliRuntimeOverrides::default()); + + assert_eq!(resolved.provider, ProviderKind::Deepseek); + assert_eq!(resolved.api_key.as_deref(), Some("root-key")); + assert_eq!(resolved.base_url, "https://api.deepseek.com"); + assert_eq!(resolved.model, "deepseek-chat"); + } + + #[test] + fn provider_specific_deepseek_fields_override_tui_compat_fields() { + let _lock = env_lock(); + let _env = EnvGuard::without_deepseek_runtime_overrides(); + let mut config = ConfigToml { + api_key: Some("root-key".to_string()), + base_url: 
Some("https://api.deepseek.com".to_string()), + default_text_model: Some("deepseek-chat".to_string()), + ..ConfigToml::default() + }; + config.providers.deepseek.api_key = Some("provider-key".to_string()); + config.providers.deepseek.base_url = Some("https://api.deepseeki.com".to_string()); + config.providers.deepseek.model = Some("deepseek-reasoner".to_string()); + + let resolved = config.resolve_runtime_options(&CliRuntimeOverrides::default()); + + assert_eq!(resolved.api_key.as_deref(), Some("provider-key")); + assert_eq!(resolved.base_url, "https://api.deepseeki.com"); + assert_eq!(resolved.model, "deepseek-reasoner"); + } + + #[test] + fn list_values_redacts_root_api_key() { + let config = ConfigToml { + api_key: Some("sk-deepseek-secret".to_string()), + ..ConfigToml::default() + }; + + let values = config.list_values(); + + assert_eq!( + values.get("api_key").map(String::as_str), + Some("sk-d***cret") + ); + } +} diff --git a/crates/core/Cargo.toml b/crates/core/Cargo.toml index 52a2022d..bf48386d 100644 --- a/crates/core/Cargo.toml +++ b/crates/core/Cargo.toml @@ -9,14 +9,14 @@ description = "Core runtime boundaries for DeepSeek workspace architecture" [dependencies] anyhow.workspace = true chrono.workspace = true -deepseek-agent = { path = "../agent", version = "0.3.33" } -deepseek-config = { path = "../config", version = "0.3.33" } -deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" } -deepseek-hooks = { path = "../hooks", version = "0.3.33" } -deepseek-mcp = { path = "../mcp", version = "0.3.33" } -deepseek-protocol = { path = "../protocol", version = "0.3.33" } -deepseek-state = { path = "../state", version = "0.3.33" } -deepseek-tools = { path = "../tools", version = "0.3.33" } +deepseek-agent = { path = "../agent", version = "0.4.0" } +deepseek-config = { path = "../config", version = "0.4.0" } +deepseek-execpolicy = { path = "../execpolicy", version = "0.4.0" } +deepseek-hooks = { path = "../hooks", version = "0.4.0" } +deepseek-mcp = 
{ path = "../mcp", version = "0.4.0" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } +deepseek-state = { path = "../state", version = "0.4.0" } +deepseek-tools = { path = "../tools", version = "0.4.0" } serde_json.workspace = true tokio.workspace = true uuid.workspace = true diff --git a/crates/execpolicy/Cargo.toml b/crates/execpolicy/Cargo.toml index 09960f29..0cedaec1 100644 --- a/crates/execpolicy/Cargo.toml +++ b/crates/execpolicy/Cargo.toml @@ -8,5 +8,5 @@ description = "Execution policy and approval model parity for DeepSeek workspace [dependencies] anyhow.workspace = true -deepseek-protocol = { path = "../protocol", version = "0.3.33" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } serde.workspace = true diff --git a/crates/hooks/Cargo.toml b/crates/hooks/Cargo.toml index c9d7c08a..9d23fec8 100644 --- a/crates/hooks/Cargo.toml +++ b/crates/hooks/Cargo.toml @@ -10,7 +10,7 @@ description = "Hook dispatch and notifications parity for DeepSeek workspace arc anyhow.workspace = true async-trait.workspace = true chrono.workspace = true -deepseek-protocol = { path = "../protocol", version = "0.3.33" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } reqwest.workspace = true serde.workspace = true serde_json.workspace = true diff --git a/crates/mcp/Cargo.toml b/crates/mcp/Cargo.toml index a15bfa43..104167de 100644 --- a/crates/mcp/Cargo.toml +++ b/crates/mcp/Cargo.toml @@ -8,6 +8,6 @@ description = "MCP server lifecycle and tool proxy compatibility for DeepSeek wo [dependencies] anyhow.workspace = true -deepseek-protocol = { path = "../protocol", version = "0.3.33" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } serde.workspace = true serde_json.workspace = true diff --git a/crates/tools/Cargo.toml b/crates/tools/Cargo.toml index 9ffb6345..eb7cee71 100644 --- a/crates/tools/Cargo.toml +++ b/crates/tools/Cargo.toml @@ -9,7 +9,7 @@ description = "Tool invocation lifecycle, schema validation, and 
scheduler paral [dependencies] anyhow.workspace = true async-trait.workspace = true -deepseek-protocol = { path = "../protocol", version = "0.3.33" } +deepseek-protocol = { path = "../protocol", version = "0.4.0" } serde.workspace = true serde_json.workspace = true tokio.workspace = true diff --git a/crates/tui/src/automation_manager.rs b/crates/tui/src/automation_manager.rs index 6fcacbef..290d56e0 100644 --- a/crates/tui/src/automation_manager.rs +++ b/crates/tui/src/automation_manager.rs @@ -403,7 +403,7 @@ impl AutomationManager { } out.push(record); } - out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at)); + out.sort_by_key(|r| std::cmp::Reverse(r.updated_at)); Ok(out) } @@ -522,7 +522,7 @@ impl AutomationManager { out.push(run); } - out.sort_by(|a, b| b.created_at.cmp(&a.created_at)); + out.sort_by_key(|r| std::cmp::Reverse(r.created_at)); if let Some(limit) = limit { out.truncate(limit); } diff --git a/crates/tui/src/client.rs b/crates/tui/src/client.rs index 77f65a0f..8be8823c 100644 --- a/crates/tui/src/client.rs +++ b/crates/tui/src/client.rs @@ -1,7 +1,8 @@ -//! HTTP client for the DeepSeek OpenAI-compatible APIs. +//! HTTP client for DeepSeek's OpenAI-compatible Chat Completions API. //! -//! Uses the OpenAI Responses API when available, falling back to Chat Completions -//! if the Responses endpoint is unsupported by the target base URL. +//! DeepSeek documents `/chat/completions` as the primary endpoint. A legacy +//! Responses probe remains available behind `DEEPSEEK_EXPERIMENTAL_RESPONSES_API` +//! for local compatibility experiments, but normal traffic uses chat completions. use std::collections::HashSet; use std::pin::Pin; @@ -134,15 +135,16 @@ pub struct DeepSeekClient { retry: RetryPolicy, default_model: String, use_chat_completions: AtomicBool, - /// Counter of chat-completions requests since last Responses API probe. - /// After RESPONSES_RECOVERY_INTERVAL requests, we retry the Responses API. 
+ /// Counter of chat-completions requests since last experimental Responses API probe. + /// After RESPONSES_RECOVERY_INTERVAL requests, we retry the Responses API when + /// `DEEPSEEK_EXPERIMENTAL_RESPONSES_API` is set. chat_fallback_counter: AtomicU32, connection_health: Arc>, rate_limiter: Arc>, } -/// After this many chat-completions requests, retry the Responses API to see -/// if it has recovered. +/// After this many chat-completions requests, retry the experimental Responses +/// API to see if it has recovered. const RESPONSES_RECOVERY_INTERVAL: u32 = 20; const CONNECTION_FAILURE_THRESHOLD: u32 = 2; const RECOVERY_PROBE_COOLDOWN: Duration = Duration::from_secs(15); @@ -150,6 +152,7 @@ const RECOVERY_PROBE_COOLDOWN: Duration = Duration::from_secs(15); const DEFAULT_CLIENT_RATE_LIMIT_RPS: f64 = 8.0; const DEFAULT_CLIENT_RATE_LIMIT_BURST: f64 = 16.0; const ALLOW_INSECURE_HTTP_ENV: &str = "DEEPSEEK_ALLOW_INSECURE_HTTP"; +const EXPERIMENTAL_RESPONSES_API_ENV: &str = "DEEPSEEK_EXPERIMENTAL_RESPONSES_API"; const SSE_BACKPRESSURE_HIGH_WATERMARK: usize = 8 * 1024 * 1024; // 8 MB const SSE_BACKPRESSURE_SLEEP_MS: u64 = 10; @@ -375,6 +378,30 @@ fn validate_base_url_security(base_url: &str) -> Result<()> { ) } +fn experimental_responses_api_enabled() -> bool { + std::env::var(EXPERIMENTAL_RESPONSES_API_ENV) + .ok() + .as_deref() + .is_some_and(|v| v == "1" || v.eq_ignore_ascii_case("true")) +} + +fn versioned_base_url(base_url: &str) -> String { + let trimmed = base_url.trim_end_matches('/'); + if trimmed.ends_with("/v1") || trimmed.ends_with("/beta") { + trimmed.to_string() + } else { + format!("{trimmed}/v1") + } +} + +fn api_url(base_url: &str, path: &str) -> String { + format!( + "{}/{}", + versioned_base_url(base_url).trim_end_matches('/'), + path.trim_start_matches('/') + ) +} + // === DeepSeekClient === impl DeepSeekClient { @@ -428,7 +455,7 @@ impl DeepSeekClient { /// List available models from the provider. 
pub async fn list_models(&self) -> Result> { - let url = format!("{}/v1/models", self.base_url.trim_end_matches('/')); + let url = api_url(&self.base_url, "models"); let response = self.send_with_retry(|| self.http_client.get(&url)).await?; let status = response.status(); @@ -475,7 +502,7 @@ impl DeepSeekClient { if !should_probe { return; } - let health_url = format!("{}/v1/models", self.base_url.trim_end_matches('/')); + let health_url = api_url(&self.base_url, "models"); let probe = self.http_client.get(health_url).send().await; match probe { Ok(resp) if resp.status().is_success() => { @@ -581,8 +608,9 @@ impl DeepSeekClient { if let Some(choice) = request.tool_choice.as_ref() { body["tool_choice"] = choice.clone(); } + apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref()); - let url = format!("{}/v1/responses", self.base_url.trim_end_matches('/')); + let url = api_url(&self.base_url, "responses"); let response = self .send_with_retry(|| self.http_client.post(&url).json(&body)) .await?; @@ -632,11 +660,9 @@ impl DeepSeekClient { { body["tool_choice"] = mapped; } + apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref()); - let url = format!( - "{}/v1/chat/completions", - self.base_url.trim_end_matches('/') - ); + let url = api_url(&self.base_url, "chat/completions"); let response = self .send_with_retry(|| self.http_client.post(&url).json(&body)) .await?; @@ -666,7 +692,7 @@ impl LlmClient for DeepSeekClient { } async fn health_check(&self) -> Result { - let health_url = format!("{}/v1/models", self.base_url.trim_end_matches('/')); + let health_url = api_url(&self.base_url, "models"); self.wait_for_rate_limit().await; let response = self.http_client.get(health_url).send().await; match response { @@ -688,6 +714,10 @@ impl LlmClient for DeepSeekClient { } async fn create_message(&self, request: MessageRequest) -> Result { + if !experimental_responses_api_enabled() { + return self.create_message_chat(&request).await; + } + // Check if 
it's time to probe Responses API recovery if self.use_chat_completions.load(Ordering::Relaxed) { let count = self.chat_fallback_counter.fetch_add(1, Ordering::Relaxed); @@ -738,6 +768,9 @@ impl LlmClient for DeepSeekClient { "messages": messages, "max_tokens": request.max_tokens, "stream": true, + "stream_options": { + "include_usage": true + }, }); if let Some(temperature) = request.temperature { @@ -754,11 +787,9 @@ impl LlmClient for DeepSeekClient { { body["tool_choice"] = mapped; } + apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref()); - let url = format!( - "{}/v1/chat/completions", - self.base_url.trim_end_matches('/') - ); + let url = api_url(&self.base_url, "chat/completions"); let response = self .send_with_retry(|| self.http_client.post(&url).json(&body)) .await?; @@ -789,7 +820,7 @@ impl LlmClient for DeepSeekClient { usage: Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }, }, }); @@ -1357,7 +1388,8 @@ fn build_chat_messages( let reasoning_content = thinking_parts.join("\n"); let has_text = !content.trim().is_empty(); let has_tool_calls = !tool_calls.is_empty(); - let has_reasoning = include_reasoning && !reasoning_content.trim().is_empty(); + let include_reasoning_for_turn = include_reasoning && has_tool_calls; + let has_reasoning = include_reasoning_for_turn && !reasoning_content.trim().is_empty(); // DeepSeek rejects assistant messages where both `content` and // `tool_calls` are missing/null. 
Skip such entries even if they @@ -1546,8 +1578,10 @@ fn tool_to_chat(tool: &Tool) -> Value { if let Some(input_examples) = &tool.input_examples { value["input_examples"] = json!(input_examples); } - if let Some(strict) = tool.strict { - value["strict"] = json!(strict); + if let Some(strict) = tool.strict + && let Some(function) = value.get_mut("function") + { + function["strict"] = json!(strict); } value } @@ -1576,12 +1610,43 @@ fn map_tool_choice_for_chat(choice: &Value) -> Option { fn requires_reasoning_content(model: &str) -> bool { let lower = model.to_lowercase(); lower.contains("deepseek-v3.2") + || lower.contains("deepseek-v4") || lower.contains("reasoner") || lower.contains("-reasoning") || lower.contains("-thinking") || has_deepseek_r_series_marker(&lower) } +/// Translate the TUI's effort-tier string into DeepSeek's request fields. +/// +/// The config surface accepts `off | low | medium | high | max`. DeepSeek +/// itself collapses `low`/`medium` → `"high"` and `xhigh` → `"max"` at the +/// API boundary (per their docs); `off` emits the disable toggle. +fn apply_reasoning_effort(body: &mut Value, effort: Option<&str>) { + let Some(effort) = effort else { + return; + }; + let normalized = effort.trim().to_ascii_lowercase(); + match normalized.as_str() { + "off" | "disabled" | "none" | "false" => { + body["thinking"] = json!({ "type": "disabled" }); + } + "max" | "maximum" | "xhigh" => { + body["reasoning_effort"] = json!("max"); + body["thinking"] = json!({ "type": "enabled" }); + } + "low" | "minimal" | "medium" | "mid" | "high" | "" => { + // Per DeepSeek docs: low/medium compat-map to "high". + body["reasoning_effort"] = json!("high"); + body["thinking"] = json!({ "type": "enabled" }); + } + _ => { + // Unknown value — do not mutate the request, let the provider + // apply its own defaults. 
+ } + } +} + fn has_deepseek_r_series_marker(model_lower: &str) -> bool { const PREFIX: &str = "deepseek-r"; model_lower.match_indices(PREFIX).any(|(idx, _)| { @@ -1701,6 +1766,19 @@ fn parse_usage(usage: Option<&Value>) -> Usage { }) .and_then(Value::as_u64) .unwrap_or(0); + let prompt_cache_hit_tokens = usage + .and_then(|u| u.get("prompt_cache_hit_tokens")) + .and_then(Value::as_u64) + .map(|v| v as u32); + let prompt_cache_miss_tokens = usage + .and_then(|u| u.get("prompt_cache_miss_tokens")) + .and_then(Value::as_u64) + .map(|v| v as u32); + let reasoning_tokens = usage + .and_then(|u| u.get("completion_tokens_details")) + .and_then(|details| details.get("reasoning_tokens")) + .and_then(Value::as_u64) + .map(|v| v as u32); let server_tool_use = usage.and_then(|u| u.get("server_tool_use")).map(|server| { let code_execution_requests = server @@ -1720,6 +1798,9 @@ fn parse_usage(usage: Option<&Value>) -> Usage { Usage { input_tokens: input_tokens as u32, output_tokens: output_tokens as u32, + prompt_cache_hit_tokens, + prompt_cache_miss_tokens, + reasoning_tokens, server_tool_use, } } @@ -1843,6 +1924,20 @@ fn parse_sse_chunk( return events; }; + if choices.is_empty() { + if let Some(usage_val) = chunk.get("usage") { + let usage = parse_usage(Some(usage_val)); + events.push(StreamEvent::MessageDelta { + delta: MessageDelta { + stop_reason: None, + stop_sequence: None, + }, + usage: Some(usage), + }); + } + return events; + } + for choice in choices { let delta = choice.get("delta"); let finish_reason = choice @@ -2077,7 +2172,23 @@ mod tests { } #[test] - fn chat_messages_include_reasoning_content_for_reasoner() { + fn api_url_handles_default_v1_and_beta_base_urls() { + assert_eq!( + api_url("https://api.deepseek.com", "chat/completions"), + "https://api.deepseek.com/v1/chat/completions" + ); + assert_eq!( + api_url("https://api.deepseek.com/v1", "chat/completions"), + "https://api.deepseek.com/v1/chat/completions" + ); + assert_eq!( + 
api_url("https://api.deepseek.com/beta", "chat/completions"), + "https://api.deepseek.com/beta/chat/completions" + ); + } + + #[test] + fn chat_messages_strip_reasoning_content_from_final_answer() { let message = Message { role: "assistant".to_string(), content: vec![ @@ -2096,9 +2207,10 @@ mod tests { .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) .expect("assistant message"); assert_eq!( - assistant.get("reasoning_content").and_then(Value::as_str), - Some("plan") + assistant.get("content").and_then(Value::as_str), + Some("done") ); + assert!(assistant.get("reasoning_content").is_none()); } #[test] @@ -2117,7 +2229,7 @@ mod tests { } #[test] - fn chat_messages_preserve_thinking_only_assistant_for_reasoner_model() { + fn chat_messages_drop_thinking_only_assistant_for_reasoner_model() { let message = Message { role: "assistant".to_string(), content: vec![ContentBlock::Thinking { @@ -2125,19 +2237,14 @@ mod tests { }], }; let out = build_chat_messages(None, &[message], "deepseek-reasoner"); - let assistant = out - .iter() - .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) - .expect("assistant message"); - assert_eq!(assistant.get("content").and_then(Value::as_str), Some("")); - assert_eq!( - assistant.get("reasoning_content").and_then(Value::as_str), - Some("plan") + assert!( + !out.iter() + .any(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) ); } #[test] - fn chat_messages_preserve_thinking_only_assistant_for_r_series_model() { + fn chat_messages_drop_thinking_only_assistant_for_r_series_model() { let message = Message { role: "assistant".to_string(), content: vec![ContentBlock::Thinking { @@ -2145,6 +2252,47 @@ mod tests { }], }; let out = build_chat_messages(None, &[message], "deepseek-r2-lite-preview"); + assert!( + !out.iter() + .any(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) + ); + } + + #[test] + fn 
chat_messages_preserve_current_tool_round_reasoning_for_reasoner_model() { + let messages = vec![ + Message { + role: "user".to_string(), + content: vec![ContentBlock::Text { + text: "Need the date".to_string(), + cache_control: None, + }], + }, + Message { + role: "assistant".to_string(), + content: vec![ + ContentBlock::Thinking { + thinking: "Need to call a tool".to_string(), + }, + ContentBlock::ToolUse { + id: "tool-1".to_string(), + name: "get_date".to_string(), + input: json!({}), + caller: None, + }, + ], + }, + Message { + role: "user".to_string(), + content: vec![ContentBlock::ToolResult { + tool_use_id: "tool-1".to_string(), + content: "2026-04-23".to_string(), + is_error: None, + content_blocks: None, + }], + }, + ]; + let out = build_chat_messages(None, &messages, "deepseek-reasoner"); let assistant = out .iter() .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) @@ -2152,10 +2300,183 @@ mod tests { assert_eq!(assistant.get("content").and_then(Value::as_str), Some("")); assert_eq!( assistant.get("reasoning_content").and_then(Value::as_str), - Some("plan") + Some("Need to call a tool") ); } + #[test] + fn chat_messages_preserve_prior_tool_round_reasoning_after_new_user_turn() { + let messages = vec![ + Message { + role: "user".to_string(), + content: vec![ContentBlock::Text { + text: "Need the date".to_string(), + cache_control: None, + }], + }, + Message { + role: "assistant".to_string(), + content: vec![ + ContentBlock::Thinking { + thinking: "Need to call a tool".to_string(), + }, + ContentBlock::ToolUse { + id: "tool-1".to_string(), + name: "get_date".to_string(), + input: json!({}), + caller: None, + }, + ], + }, + Message { + role: "user".to_string(), + content: vec![ContentBlock::ToolResult { + tool_use_id: "tool-1".to_string(), + content: "2026-04-23".to_string(), + is_error: None, + content_blocks: None, + }], + }, + Message { + role: "assistant".to_string(), + content: vec![ContentBlock::Text { + text: "It is 
2026-04-23.".to_string(), + cache_control: None, + }], + }, + Message { + role: "user".to_string(), + content: vec![ContentBlock::Text { + text: "Thanks. Next question.".to_string(), + cache_control: None, + }], + }, + ]; + let out = build_chat_messages(None, &messages, "deepseek-reasoner"); + let assistant = out + .iter() + .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) + .expect("assistant message"); + assert_eq!( + assistant.get("reasoning_content").and_then(Value::as_str), + Some("Need to call a tool") + ); + } + + #[test] + fn chat_messages_preserve_v4_tool_round_reasoning() { + let messages = vec![ + Message { + role: "user".to_string(), + content: vec![ContentBlock::Text { + text: "Use a tool".to_string(), + cache_control: None, + }], + }, + Message { + role: "assistant".to_string(), + content: vec![ + ContentBlock::Thinking { + thinking: "Need a tool for this".to_string(), + }, + ContentBlock::ToolUse { + id: "call-1".to_string(), + name: "read_file".to_string(), + input: json!({"path": "Cargo.toml"}), + caller: None, + }, + ], + }, + Message { + role: "user".to_string(), + content: vec![ContentBlock::ToolResult { + tool_use_id: "call-1".to_string(), + content: "workspace manifest".to_string(), + is_error: None, + content_blocks: None, + }], + }, + Message { + role: "assistant".to_string(), + content: vec![ContentBlock::Text { + text: "Read it.".to_string(), + cache_control: None, + }], + }, + Message { + role: "user".to_string(), + content: vec![ContentBlock::Text { + text: "Now continue.".to_string(), + cache_control: None, + }], + }, + ]; + + let out = build_chat_messages(None, &messages, "deepseek-v4-pro"); + let assistant = out + .iter() + .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant")) + .expect("assistant message"); + assert_eq!( + assistant.get("reasoning_content").and_then(Value::as_str), + Some("Need a tool for this") + ); + assert!(assistant.get("tool_calls").is_some()); + } + + #[test] 
+ fn reasoning_effort_uses_deepseek_top_level_thinking_parameter() { + let mut body = json!({}); + apply_reasoning_effort(&mut body, Some("max")); + + assert_eq!( + body.get("reasoning_effort").and_then(Value::as_str), + Some("max") + ); + assert_eq!( + body.pointer("/thinking/type").and_then(Value::as_str), + Some("enabled") + ); + assert!(body.get("extra_body").is_none()); + } + + #[test] + fn reasoning_effort_off_disables_top_level_thinking() { + let mut body = json!({}); + apply_reasoning_effort(&mut body, Some("off")); + + assert_eq!( + body.pointer("/thinking/type").and_then(Value::as_str), + Some("disabled") + ); + assert!(body.get("reasoning_effort").is_none()); + assert!(body.get("extra_body").is_none()); + } + + #[test] + fn chat_tool_strict_flag_is_nested_under_function() { + let tool = Tool { + tool_type: Some("function".to_string()), + name: "emit_json".to_string(), + description: "Emit JSON".to_string(), + input_schema: json!({"type": "object", "properties": {}}), + allowed_callers: None, + defer_loading: None, + input_examples: None, + strict: Some(true), + cache_control: None, + }; + let encoded = tool_to_chat(&tool); + assert_eq!( + encoded + .get("function") + .and_then(|function| function.get("strict")) + .and_then(Value::as_bool), + Some(true) + ); + assert!(encoded.get("strict").is_none()); + } + #[test] fn chat_messages_drop_thinking_only_assistant_for_non_reasoning_model() { let message = Message { @@ -2240,6 +2561,43 @@ mod tests { assert_eq!(deltas, vec![0, 1]); } + #[test] + fn parse_sse_chunk_handles_empty_choices_usage_chunk() { + let chunk = json!({ + "choices": [], + "usage": { + "prompt_tokens": 100, + "completion_tokens": 20, + "prompt_cache_hit_tokens": 70, + "prompt_cache_miss_tokens": 30 + } + }); + + let mut content_index = 0; + let mut text_started = false; + let mut thinking_started = false; + let mut tool_indices: std::collections::HashMap = + std::collections::HashMap::new(); + let events = parse_sse_chunk( + &chunk, + &mut 
content_index, + &mut text_started, + &mut thinking_started, + &mut tool_indices, + false, + ); + + let StreamEvent::MessageDelta { + usage: Some(usage), .. + } = &events[0] + else { + panic!("expected usage delta"); + }; + assert_eq!(usage.input_tokens, 100); + assert_eq!(usage.prompt_cache_hit_tokens, Some(70)); + assert_eq!(usage.prompt_cache_miss_tokens, Some(30)); + } + #[test] fn chat_messages_drop_orphan_tool_results() { let messages = vec![Message { @@ -2515,6 +2873,25 @@ mod tests { ); } + #[test] + fn parse_usage_reads_deepseek_cache_and_reasoning_tokens() { + let usage = parse_usage(Some(&json!({ + "prompt_tokens": 100, + "completion_tokens": 20, + "prompt_cache_hit_tokens": 70, + "prompt_cache_miss_tokens": 30, + "completion_tokens_details": { + "reasoning_tokens": 12 + } + }))); + + assert_eq!(usage.input_tokens, 100); + assert_eq!(usage.output_tokens, 20); + assert_eq!(usage.prompt_cache_hit_tokens, Some(70)); + assert_eq!(usage.prompt_cache_miss_tokens, Some(30)); + assert_eq!(usage.reasoning_tokens, Some(12)); + } + #[test] fn token_bucket_enforces_delay_when_empty() { let now = Instant::now(); diff --git a/crates/tui/src/compaction.rs b/crates/tui/src/compaction.rs index d886a7f0..11520d07 100644 --- a/crates/tui/src/compaction.rs +++ b/crates/tui/src/compaction.rs @@ -462,7 +462,9 @@ fn estimate_tokens_for_message(message: &Message) -> usize { .iter() .map(|c| match c { ContentBlock::Text { text, .. } => text.len() / 4, - ContentBlock::Thinking { thinking } => thinking.len() / 4, + // Historical reasoning blocks are UI/session metadata for DeepSeek. + // They are only sent back during an in-progress tool-call round. + ContentBlock::Thinking { .. } => 0, ContentBlock::ToolUse { input, .. 
} => serde_json::to_string(input) .map(|s| s.len() / 4) .unwrap_or(100), @@ -819,6 +821,7 @@ async fn create_summary( tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: Some(0.3), top_p: None, @@ -859,15 +862,14 @@ fn extract_workflow_context(messages: &[Message], workspace: Option<&Path>) -> S files_touched.push(path); } } - ContentBlock::Text { text, .. } => { + ContentBlock::Text { text, .. } // Look for task/todo mentions - if text.contains("TODO") || text.contains("task") || text.contains("need to") { + if (text.contains("TODO") || text.contains("task") || text.contains("need to")) => { let task = truncate_chars(text, 200).to_string(); if !tasks_identified.contains(&task) { tasks_identified.push(task); } } - } _ => {} } } diff --git a/crates/tui/src/config.rs b/crates/tui/src/config.rs index 1ca4fa35..59432916 100644 --- a/crates/tui/src/config.rs +++ b/crates/tui/src/config.rs @@ -15,14 +15,24 @@ use crate::hooks::HooksConfig; pub const DEFAULT_MAX_SUBAGENTS: usize = 5; pub const MAX_SUBAGENTS: usize = 20; -pub const DEFAULT_TEXT_MODEL: &str = "deepseek-reasoner"; +pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro"; const API_KEYRING_SENTINEL: &str = "__KEYRING__"; -pub const COMMON_DEEPSEEK_MODELS: &[&str] = &["deepseek-chat", "deepseek-reasoner"]; +pub const COMMON_DEEPSEEK_MODELS: &[&str] = &[ + "deepseek-v4-pro", + "deepseek-v4-flash", + "deepseek-chat", + "deepseek-reasoner", +]; /// Canonicalize common model aliases to stable DeepSeek IDs. +/// +/// Legacy `deepseek-chat` / `deepseek-reasoner` remain as silent aliases: they +/// resolve to themselves for API compatibility and are priced as `deepseek-v4-flash`. 
#[must_use] pub fn canonical_model_name(model: &str) -> Option<&'static str> { match model.trim().to_ascii_lowercase().as_str() { + "deepseek-v4-pro" | "deepseek-v4pro" => Some("deepseek-v4-pro"), + "deepseek-v4-flash" | "deepseek-v4flash" => Some("deepseek-v4-flash"), "deepseek-chat" | "deepseek-v3" | "deepseek-v3.2" => Some("deepseek-chat"), "deepseek-reasoner" | "deepseek-r1" => Some("deepseek-reasoner"), _ => None, @@ -123,6 +133,9 @@ pub struct Config { pub api_key: Option, pub base_url: Option, pub default_text_model: Option, + /// DeepSeek reasoning-effort tier: `"off" | "low" | "medium" | "high" | "max"`. + /// Defaults to `"max"` at runtime if unset. + pub reasoning_effort: Option, pub tools_file: Option, pub skills_dir: Option, pub mcp_config_path: Option, @@ -215,7 +228,7 @@ impl Config { && normalize_model_name(model).is_none() { anyhow::bail!( - "Invalid default_text_model '{model}': expected a DeepSeek model ID (for example: deepseek-chat, deepseek-reasoner, deepseek-v4)." + "Invalid default_text_model '{model}': expected a DeepSeek model ID (for example: deepseek-v4-pro, deepseek-v4-flash)." ); } if let Some(policy) = self.approval_policy.as_deref() { @@ -373,6 +386,12 @@ impl Config { .clamp(1, MAX_SUBAGENTS) } + /// Return the configured DeepSeek reasoning-effort tier, if any. + #[must_use] + pub fn reasoning_effort(&self) -> Option<&str> { + self.reasoning_effort.as_deref() + } + /// Get hooks configuration, returning default if not configured. 
pub fn hooks_config(&self) -> HooksConfig { self.hooks.clone().unwrap_or_default() @@ -566,6 +585,11 @@ fn apply_env_overrides(config: &mut Config) { if let Ok(value) = std::env::var("DEEPSEEK_BASE_URL") { config.base_url = Some(value); } + if let Ok(value) = + std::env::var("DEEPSEEK_MODEL").or_else(|_| std::env::var("DEEPSEEK_DEFAULT_TEXT_MODEL")) + { + config.default_text_model = Some(value); + } if let Ok(value) = std::env::var("DEEPSEEK_SKILLS_DIR") { config.skills_dir = Some(value); } @@ -753,6 +777,7 @@ fn merge_config(base: Config, override_cfg: Config) -> Config { api_key: override_cfg.api_key.or(base.api_key), base_url: override_cfg.base_url.or(base.base_url), default_text_model: override_cfg.default_text_model.or(base.default_text_model), + reasoning_effort: override_cfg.reasoning_effort.or(base.reasoning_effort), tools_file: override_cfg.tools_file.or(base.tools_file), skills_dir: override_cfg.skills_dir.or(base.skills_dir), mcp_config_path: override_cfg.mcp_config_path.or(base.mcp_config_path), @@ -926,6 +951,11 @@ api_key = "{key_to_write}" # Default model default_text_model = "{default_model}" + +# Thinking mode (DeepSeek V4 reasoning effort): +# "off" | "low" | "medium" | "high" | "max" +# Shift+Tab in the TUI cycles between off / high / max. +reasoning_effort = "max" "#, default_model = DEFAULT_TEXT_MODEL ) @@ -1007,6 +1037,8 @@ mod tests { userprofile: Option, deepseek_config_path: Option, deepseek_api_key: Option, + deepseek_model: Option, + deepseek_default_text_model: Option, } impl EnvGuard { @@ -1018,18 +1050,24 @@ mod tests { let userprofile_prev = env::var_os("USERPROFILE"); let deepseek_config_prev = env::var_os("DEEPSEEK_CONFIG_PATH"); let api_key_prev = env::var_os("DEEPSEEK_API_KEY"); + let model_prev = env::var_os("DEEPSEEK_MODEL"); + let default_text_model_prev = env::var_os("DEEPSEEK_DEFAULT_TEXT_MODEL"); // Safety: test-only environment mutation guarded by a global mutex. 
unsafe { env::set_var("HOME", &home_str); env::set_var("USERPROFILE", &home_str); env::set_var("DEEPSEEK_CONFIG_PATH", &config_str); env::remove_var("DEEPSEEK_API_KEY"); + env::remove_var("DEEPSEEK_MODEL"); + env::remove_var("DEEPSEEK_DEFAULT_TEXT_MODEL"); } Self { home: home_prev, userprofile: userprofile_prev, deepseek_config_path: deepseek_config_prev, deepseek_api_key: api_key_prev, + deepseek_model: model_prev, + deepseek_default_text_model: default_text_model_prev, } } } @@ -1042,6 +1080,11 @@ mod tests { Self::restore_var("USERPROFILE", self.userprofile.take()); Self::restore_var("DEEPSEEK_CONFIG_PATH", self.deepseek_config_path.take()); Self::restore_var("DEEPSEEK_API_KEY", self.deepseek_api_key.take()); + Self::restore_var("DEEPSEEK_MODEL", self.deepseek_model.take()); + Self::restore_var( + "DEEPSEEK_DEFAULT_TEXT_MODEL", + self.deepseek_default_text_model.take(), + ); } } } @@ -1280,4 +1323,29 @@ mod tests { config.validate()?; Ok(()) } + + #[test] + fn deepseek_model_env_overrides_default_text_model() -> Result<()> { + let _lock = lock_test_env(); + let nanos = SystemTime::now() + .duration_since(UNIX_EPOCH) + .unwrap() + .as_nanos(); + let temp_root = env::temp_dir().join(format!( + "deepseek-tui-model-env-test-{}-{}", + std::process::id(), + nanos + )); + fs::create_dir_all(&temp_root)?; + let _guard = EnvGuard::new(&temp_root); + + // Safety: test-only environment mutation guarded by a global mutex. 
+ unsafe { + env::set_var("DEEPSEEK_MODEL", "deepseek-chat"); + } + + let config = Config::load(None, None)?; + assert_eq!(config.default_text_model.as_deref(), Some("deepseek-chat")); + Ok(()) + } } diff --git a/crates/tui/src/core/engine.rs b/crates/tui/src/core/engine.rs index 5f4ccafa..67cbc732 100644 --- a/crates/tui/src/core/engine.rs +++ b/crates/tui/src/core/engine.rs @@ -30,7 +30,7 @@ use crate::llm_client::LlmClient; use crate::mcp::McpPool; use crate::models::{ ContentBlock, ContentBlockStart, DEFAULT_CONTEXT_WINDOW_TOKENS, Delta, Message, MessageRequest, - StreamEvent, SystemPrompt, Tool, ToolCaller, Usage, context_window_for_model, + StreamEvent, SystemBlock, SystemPrompt, Tool, ToolCaller, Usage, context_window_for_model, }; use crate::prompts; use crate::tools::plan::{SharedPlanState, new_shared_plan_state}; @@ -360,6 +360,7 @@ const TOOL_RESULT_CONTEXT_SNIPPET_CHARS: usize = 900; /// Max chars to keep from metadata-provided output summaries. const TOOL_RESULT_METADATA_SUMMARY_CHARS: usize = 320; const COMPACTION_SUMMARY_MARKER: &str = "Conversation Summary (Auto-Generated)"; +const WORKING_SET_SUMMARY_MARKER: &str = "## Repo Working Set"; const TOOL_CALL_START_MARKERS: [&str; 5] = [ "[TOOL_CALL]", @@ -1127,6 +1128,56 @@ fn extract_compaction_summary_prompt(prompt: Option) -> Option) -> Option { + match prompt { + Some(SystemPrompt::Blocks(blocks)) => { + let filtered: Vec = blocks + .iter() + .filter(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER)) + .cloned() + .collect(); + if filtered.is_empty() { + None + } else { + Some(SystemPrompt::Blocks(filtered)) + } + } + Some(SystemPrompt::Text(text)) => Some(SystemPrompt::Text(text.clone())), + None => None, + } +} + +fn append_working_set_summary( + prompt: Option, + working_set_summary: Option<&str>, +) -> Option { + let Some(summary) = working_set_summary.map(str::trim).filter(|s| !s.is_empty()) else { + return prompt; + }; + let working_set_block = SystemBlock { + block_type: 
"text".to_string(), + text: summary.to_string(), + cache_control: None, + }; + + match prompt { + Some(SystemPrompt::Text(text)) => Some(SystemPrompt::Blocks(vec![ + SystemBlock { + block_type: "text".to_string(), + text, + cache_control: None, + }, + working_set_block, + ])), + Some(SystemPrompt::Blocks(mut blocks)) => { + blocks.retain(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER)); + blocks.push(working_set_block); + Some(SystemPrompt::Blocks(blocks)) + } + None => Some(SystemPrompt::Blocks(vec![working_set_block])), + } +} + fn estimate_text_tokens_conservative(text: &str) -> usize { text.chars().count().div_ceil(3) } @@ -1231,12 +1282,10 @@ impl Engine { // Set up system prompt with project context (default to agent mode) let working_set_summary = session.working_set.summary_block(&config.workspace); - let system_prompt = prompts::system_prompt_for_mode_with_context( - AppMode::Agent, - &config.workspace, - working_set_summary.as_deref(), - ); - session.system_prompt = Some(system_prompt); + let system_prompt = + prompts::system_prompt_for_mode_with_context(AppMode::Agent, &config.workspace, None); + session.system_prompt = + append_working_set_summary(Some(system_prompt), working_set_summary.as_deref()); let subagent_manager = new_shared_subagent_manager(config.workspace.clone(), config.max_subagents); @@ -1285,6 +1334,7 @@ impl Engine { content, mode, model, + reasoning_effort, allow_shell, trust_mode, auto_approve, @@ -1293,6 +1343,7 @@ impl Engine { content, mode, model, + reasoning_effort, allow_shell, trust_mode, auto_approve, @@ -1444,11 +1495,13 @@ impl Engine { } /// Handle a send message operation + #[allow(clippy::too_many_arguments)] async fn handle_send_message( &mut self, content: String, mode: AppMode, model: String, + reasoning_effort: Option, allow_shell: bool, trust_mode: bool, auto_approve: bool, @@ -1511,6 +1564,7 @@ impl Engine { self.session.model = model; self.config.model.clone_from(&self.session.model); + 
self.session.reasoning_effort = reasoning_effort; self.session.allow_shell = allow_shell; self.config.allow_shell = allow_shell; self.session.trust_mode = trust_mode; @@ -1650,7 +1704,7 @@ impl Engine { let zero_usage = Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }; let Some(client) = self.deepseek_client.clone() else { let message = "Manual compaction unavailable: API client not configured".to_string(); @@ -2428,6 +2482,7 @@ impl Engine { }, metadata: None, thinking: None, + reasoning_effort: self.session.reasoning_effort.clone(), stream: Some(true), temperature: None, top_p: None, @@ -2471,7 +2526,7 @@ impl Engine { let mut usage = Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }; let mut current_block_kind: Option = None; let mut current_tool_index: Option = None; @@ -4277,13 +4332,11 @@ impl Engine { .session .working_set .summary_block(&self.config.workspace); - let base = prompts::system_prompt_for_mode_with_context( - mode, - &self.config.workspace, - working_set_summary.as_deref(), - ); - self.session.system_prompt = + let base = prompts::system_prompt_for_mode_with_context(mode, &self.config.workspace, None); + let stable_prompt = merge_system_prompts(Some(&base), self.session.compaction_summary_prompt.clone()); + self.session.system_prompt = + append_working_set_summary(stable_prompt, working_set_summary.as_deref()); } fn merge_compaction_summary(&mut self, summary_prompt: Option) { @@ -4294,8 +4347,15 @@ impl Engine { self.session.compaction_summary_prompt.as_ref(), summary_prompt.clone(), ); + let current_without_working_set = + remove_working_set_summary(self.session.system_prompt.as_ref()); + let merged = merge_system_prompts(current_without_working_set.as_ref(), summary_prompt); + let working_set_summary = self + .session + .working_set + .summary_block(&self.config.workspace); self.session.system_prompt = - merge_system_prompts(self.session.system_prompt.as_ref(), 
summary_prompt); + append_working_set_summary(merged, working_set_summary.as_deref()); } } diff --git a/crates/tui/src/core/engine/tests.rs b/crates/tui/src/core/engine/tests.rs index de510121..7a3071e5 100644 --- a/crates/tui/src/core/engine/tests.rs +++ b/crates/tui/src/core/engine/tests.rs @@ -221,12 +221,80 @@ fn detects_context_length_errors_from_provider_payloads() { #[test] fn context_budget_reserves_output_and_headroom() { - let budget = context_input_budget("deepseek-reasoner", TURN_MAX_OUTPUT_TOKENS) + let budget = context_input_budget("deepseek-v3.2-128k", TURN_MAX_OUTPUT_TOKENS) .expect("deepseek models should have known context window"); let expected = 128_000usize - 4_096usize - 1_024usize; assert_eq!(budget, expected); } +#[test] +fn refresh_system_prompt_places_working_set_after_stable_prefix() { + let tmp = tempdir().expect("tempdir"); + fs::create_dir_all(tmp.path().join("src")).expect("mkdir"); + fs::write(tmp.path().join("src/lib.rs"), "pub fn sample() {}").expect("write"); + + let config = EngineConfig { + workspace: tmp.path().to_path_buf(), + ..Default::default() + }; + let (mut engine, _handle) = Engine::new(config, &Config::default()); + engine + .session + .working_set + .observe_user_message("please inspect src/lib.rs", tmp.path()); + + engine.refresh_system_prompt(AppMode::Agent); + + let Some(SystemPrompt::Blocks(blocks)) = &engine.session.system_prompt else { + panic!("expected structured prompt blocks"); + }; + let last = blocks.last().expect("working-set block"); + assert!(last.text.contains(WORKING_SET_SUMMARY_MARKER)); + assert!( + blocks[..blocks.len() - 1] + .iter() + .all(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER)) + ); +} + +#[test] +fn compaction_summary_stays_before_volatile_working_set() { + let tmp = tempdir().expect("tempdir"); + fs::create_dir_all(tmp.path().join("src")).expect("mkdir"); + fs::write(tmp.path().join("src/main.rs"), "fn main() {}").expect("write"); + + let config = EngineConfig { + workspace: 
tmp.path().to_path_buf(), + ..Default::default() + }; + let (mut engine, _handle) = Engine::new(config, &Config::default()); + engine + .session + .working_set + .observe_user_message("continue in src/main.rs", tmp.path()); + engine.refresh_system_prompt(AppMode::Agent); + engine.merge_compaction_summary(Some(SystemPrompt::Blocks(vec![SystemBlock { + block_type: "text".to_string(), + text: format!("{COMPACTION_SUMMARY_MARKER}\nsummary"), + cache_control: None, + }]))); + + let Some(SystemPrompt::Blocks(blocks)) = &engine.session.system_prompt else { + panic!("expected structured prompt blocks"); + }; + let summary_index = blocks + .iter() + .position(|block| block.text.contains(COMPACTION_SUMMARY_MARKER)) + .expect("summary block"); + let working_set_index = blocks + .iter() + .position(|block| block.text.contains(WORKING_SET_SUMMARY_MARKER)) + .expect("working-set block"); + + assert!(summary_index < working_set_index); + assert_eq!(working_set_index, blocks.len() - 1); +} + #[tokio::test] async fn pre_request_refresh_invoked_when_medium_risk() { let capacity = CapacityControllerConfig { @@ -245,6 +313,11 @@ async fn pre_request_refresh_invoked_when_medium_risk() { .capacity_controller .mark_turn_start(engine.turn_counter); + // Pin the model to an explicit 128k-context variant so the pressure ratio stays + // stable regardless of changes to the workspace-wide default model. + engine.session.model = "deepseek-v3.2-128k".to_string(); + engine.config.model = "deepseek-v3.2-128k".to_string(); + let long = "x".repeat(5_000); for _ in 0..200 { engine.session.messages.push(Message { diff --git a/crates/tui/src/core/ops.rs b/crates/tui/src/core/ops.rs index f33f3558..efb42bcb 100644 --- a/crates/tui/src/core/ops.rs +++ b/crates/tui/src/core/ops.rs @@ -16,6 +16,9 @@ pub enum Op { content: String, mode: AppMode, model: String, + /// Reasoning-effort tier: `"off" | "low" | "medium" | "high" | "max"`. + /// `None` lets the provider apply its default. 
+ reasoning_effort: Option<String>, allow_shell: bool, trust_mode: bool, auto_approve: bool, @@ -72,6 +75,7 @@ impl Op { content: impl Into<String>, mode: AppMode, model: impl Into<String>, + reasoning_effort: Option<String>, allow_shell: bool, trust_mode: bool, auto_approve: bool, @@ -80,6 +84,7 @@ impl Op { content: content.into(), mode, model: model.into(), + reasoning_effort, allow_shell, trust_mode, auto_approve, diff --git a/crates/tui/src/core/session.rs b/crates/tui/src/core/session.rs index 59a964f8..37629848 100644 --- a/crates/tui/src/core/session.rs +++ b/crates/tui/src/core/session.rs @@ -13,6 +13,11 @@ pub struct Session { /// Model being used pub model: String, + /// Reasoning-effort tier for DeepSeek thinking mode: + /// `"off" | "low" | "medium" | "high" | "max"`. `None` lets the provider + /// apply its own defaults. + pub reasoning_effort: Option<String>, + /// Workspace directory pub workspace: PathBuf, @@ -69,6 +74,12 @@ impl SessionUsage { pub fn add(&mut self, usage: &Usage) { self.input_tokens += u64::from(usage.input_tokens); self.output_tokens += u64::from(usage.output_tokens); + if let Some(tokens) = usage.prompt_cache_miss_tokens { + self.cache_creation_input_tokens += u64::from(tokens); + } + if let Some(tokens) = usage.prompt_cache_hit_tokens { + self.cache_read_input_tokens += u64::from(tokens); + } } } @@ -88,6 +99,7 @@ impl Session { Self { model, + reasoning_effort: None, workspace, system_prompt: None, compaction_summary_prompt: None, diff --git a/crates/tui/src/core/turn.rs b/crates/tui/src/core/turn.rs index b14c1733..ea187710 100644 --- a/crates/tui/src/core/turn.rs +++ b/crates/tui/src/core/turn.rs @@ -57,7 +57,7 @@ impl TurnContext { usage: Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }, } } @@ -94,6 +94,25 @@ impl TurnContext { pub fn add_usage(&mut self, usage: &Usage) { self.usage.input_tokens += usage.input_tokens; self.usage.output_tokens += usage.output_tokens; + self.usage.prompt_cache_hit_tokens = 
add_optional_usage( + self.usage.prompt_cache_hit_tokens, + usage.prompt_cache_hit_tokens, + ); + self.usage.prompt_cache_miss_tokens = add_optional_usage( + self.usage.prompt_cache_miss_tokens, + usage.prompt_cache_miss_tokens, + ); + self.usage.reasoning_tokens = + add_optional_usage(self.usage.reasoning_tokens, usage.reasoning_tokens); + } +} + +fn add_optional_usage(total: Option<u32>, delta: Option<u32>) -> Option<u32> { + match (total, delta) { + (Some(total), Some(delta)) => Some(total.saturating_add(delta)), + (None, Some(delta)) => Some(delta), + (Some(total), None) => Some(total), + (None, None) => None, } } diff --git a/crates/tui/src/main.rs b/crates/tui/src/main.rs index 404eabca..304d25bf 100644 --- a/crates/tui/src/main.rs +++ b/crates/tui/src/main.rs @@ -1264,6 +1264,7 @@ async fn test_api_connectivity(config: &Config) -> Result { tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: None, top_p: None, @@ -1548,6 +1549,7 @@ Provide findings ordered by severity with file references, then open questions, tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: Some(0.2), top_p: Some(0.9), @@ -2175,6 +2177,7 @@ async fn run_one_shot(config: &Config, model: &str, prompt: &str) -> Result<()> tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: None, top_p: None, @@ -2213,6 +2216,7 @@ async fn run_one_shot_json(config: &Config, model: &str, prompt: &str) -> Result tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: Some(0.2), top_p: Some(0.9), @@ -2293,6 +2297,7 @@ async fn run_exec_agent( prompt, mode, model, + None, auto_approve || config.allow_shell(), trust_mode, auto_approve, @@ -2343,25 +2348,19 @@ async fn run_exec_agent( } ends_with_newline = content.ends_with('\n'); } - Event::MessageComplete { .. 
} => { - if !json_output && !ends_with_newline { - println!(); + Event::MessageComplete { .. } if !json_output && !ends_with_newline => { + println!(); + } + Event::ToolCallStarted { name, input, .. } if !json_output => { + let summary = summarize_tool_args(&input); + if let Some(summary) = summary { + eprintln!("tool: {name} ({summary})"); + } else { + eprintln!("tool: {name}"); } } - Event::ToolCallStarted { name, input, .. } => { - if !json_output { - let summary = summarize_tool_args(&input); - if let Some(summary) = summary { - eprintln!("tool: {name} ({summary})"); - } else { - eprintln!("tool: {name}"); - } - } - } - Event::ToolCallProgress { id, output } => { - if !json_output { - eprintln!("tool {id}: {}", summarize_tool_output(&output)); - } + Event::ToolCallProgress { id, output } if !json_output => { + eprintln!("tool {id}: {}", summarize_tool_output(&output)); } Event::ToolCallComplete { name, result, .. } => match result { Ok(output) => { diff --git a/crates/tui/src/models.rs b/crates/tui/src/models.rs index 89c248eb..0ecdaae8 100644 --- a/crates/tui/src/models.rs +++ b/crates/tui/src/models.rs @@ -3,11 +3,12 @@ use serde::{Deserialize, Serialize}; pub const DEFAULT_CONTEXT_WINDOW_TOKENS: u32 = 128_000; +pub const DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS: u32 = 1_000_000; pub const DEFAULT_COMPACTION_TOKEN_THRESHOLD: usize = 50_000; pub const DEFAULT_COMPACTION_MESSAGE_THRESHOLD: usize = 50; const COMPACTION_THRESHOLD_PERCENT: u32 = 80; const COMPACTION_MESSAGE_DIVISOR: u32 = 1200; -const MAX_COMPACTION_MESSAGE_THRESHOLD: usize = 150; +const MAX_COMPACTION_MESSAGE_THRESHOLD: usize = 500; // === Core Message Types === @@ -27,6 +28,10 @@ pub struct MessageRequest { pub metadata: Option, #[serde(skip_serializing_if = "Option::is_none")] pub thinking: Option, + /// DeepSeek reasoning-effort tier: "off" | "low" | "medium" | "high" | "max". + /// Translated by the client into DeepSeek's `reasoning_effort` + `thinking` fields. 
+ #[serde(skip_serializing_if = "Option::is_none")] + pub reasoning_effort: Option, #[serde(skip_serializing_if = "Option::is_none")] pub stream: Option, #[serde(skip_serializing_if = "Option::is_none")] @@ -176,11 +181,17 @@ pub struct MessageResponse { } /// Token usage metadata for a response. -#[derive(Debug, Serialize, Deserialize, Clone)] +#[derive(Debug, Serialize, Deserialize, Clone, Default)] pub struct Usage { pub input_tokens: u32, pub output_tokens: u32, #[serde(skip_serializing_if = "Option::is_none")] + pub prompt_cache_hit_tokens: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub prompt_cache_miss_tokens: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub reasoning_tokens: Option, + #[serde(skip_serializing_if = "Option::is_none")] pub server_tool_use: Option, } @@ -188,11 +199,15 @@ pub struct Usage { #[must_use] pub fn context_window_for_model(model: &str) -> Option { let lower = model.to_lowercase(); - // DeepSeek models default to 128k unless an explicit *k suffix is present. + // Unknown DeepSeek model IDs default to 128k unless an explicit *k suffix is present. + // DeepSeek-V4 family and current legacy aliases ship with a 1M context window. 
if lower.contains("deepseek") { if let Some(explicit_window) = deepseek_context_window_hint(&lower) { return Some(explicit_window); } + if lower.contains("v4") || is_current_deepseek_v4_alias(&lower) { + return Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS); + } return Some(DEFAULT_CONTEXT_WINDOW_TOKENS); } if lower.contains("claude") { @@ -201,6 +216,13 @@ pub fn context_window_for_model(model: &str) -> Option { None } +fn is_current_deepseek_v4_alias(model_lower: &str) -> bool { + matches!( + model_lower, + "deepseek-chat" | "deepseek-reasoner" | "deepseek-r1" | "deepseek-v3" | "deepseek-v3.2" + ) +} + fn deepseek_context_window_hint(model_lower: &str) -> Option { let bytes = model_lower.as_bytes(); let mut i = 0usize; @@ -342,13 +364,29 @@ mod tests { use super::*; #[test] - fn deepseek_models_map_to_128k_context_window() { + fn current_deepseek_aliases_map_to_v4_1m_context_window() { assert_eq!( context_window_for_model("deepseek-reasoner"), - Some(DEFAULT_CONTEXT_WINDOW_TOKENS) + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) + ); + assert_eq!( + context_window_for_model("deepseek-chat"), + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) + ); + assert_eq!( + context_window_for_model("deepseek-v3"), + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) ); assert_eq!( context_window_for_model("deepseek-v3.2"), + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) + ); + } + + #[test] + fn unknown_deepseek_models_map_to_128k_context_window() { + assert_eq!( + context_window_for_model("deepseek-coder"), Some(DEFAULT_CONTEXT_WINDOW_TOKENS) ); assert_eq!( @@ -357,6 +395,18 @@ mod tests { ); } + #[test] + fn deepseek_v4_models_map_to_1m_context_window() { + assert_eq!( + context_window_for_model("deepseek-v4-pro"), + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) + ); + assert_eq!( + context_window_for_model("deepseek-v4-flash"), + Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS) + ); + } + #[test] fn deepseek_models_with_k_suffix_use_hint() { assert_eq!(context_window_for_model("deepseek-v3.2-32k"), Some(32_000)); @@ -372,17 
+422,32 @@ mod tests { #[test] fn compaction_threshold_scales_with_context_window() { - assert_eq!(compaction_threshold_for_model("deepseek-reasoner"), 102_400); + assert_eq!( + compaction_threshold_for_model("deepseek-v3.2-128k"), + 102_400 + ); assert_eq!(compaction_threshold_for_model("unknown-model"), 50_000); } #[test] fn compaction_message_threshold_scales_with_context_window() { assert_eq!( - compaction_message_threshold_for_model("deepseek-reasoner"), + compaction_message_threshold_for_model("deepseek-v3.2-128k"), 106 ); assert_eq!(compaction_message_threshold_for_model("unknown-model"), 50); - assert_eq!(compaction_message_threshold_for_model("claude-3"), 150); + // 200k / 1200 = 166, within the raised cap of 500. + assert_eq!(compaction_message_threshold_for_model("claude-3"), 166); + } + + #[test] + fn compaction_scales_for_deepseek_v4_1m_context() { + // 80% of 1M = 800k tokens before token-based compaction. + assert_eq!(compaction_threshold_for_model("deepseek-v4-pro"), 800_000); + // 1M / 1200 = 833, clamped to the 500-message cap. + assert_eq!( + compaction_message_threshold_for_model("deepseek-v4-pro"), + 500 + ); } } diff --git a/crates/tui/src/modules/text.rs b/crates/tui/src/modules/text.rs index 6a898a61..66553112 100644 --- a/crates/tui/src/modules/text.rs +++ b/crates/tui/src/modules/text.rs @@ -180,6 +180,7 @@ async fn process_deepseek_turn( tool_choice: options.tool_choice.clone(), metadata: None, thinking: None, + reasoning_effort: None, stream: Some(options.stream), temperature: options.temperature, top_p: options.top_p, diff --git a/crates/tui/src/pricing.rs b/crates/tui/src/pricing.rs index 74762b1a..70d78048 100644 --- a/crates/tui/src/pricing.rs +++ b/crates/tui/src/pricing.rs @@ -2,53 +2,66 @@ //! //! Pricing based on DeepSeek's published rates (per million tokens). +use crate::models::Usage; + /// Per-million-token pricing for a model. 
struct ModelPricing { - input_per_million: f64, + input_cache_hit_per_million: f64, + input_cache_miss_per_million: f64, output_per_million: f64, } /// Look up pricing for a model name. fn pricing_for_model(model: &str) -> Option { let lower = model.to_lowercase(); - if lower.contains("deepseek-reasoner") || lower.contains("deepseek-r1") { - // DeepSeek-R1: $0.55/M input, $2.19/M output + if !lower.contains("deepseek") { + return None; + } + if lower.contains("v4-pro") || lower.contains("v4pro") { Some(ModelPricing { - input_per_million: 0.55, - output_per_million: 2.19, - }) - } else if lower.contains("deepseek-v3.2") { - // DeepSeek-V3.2 (with reasoning): same pricing tier as V3 - Some(ModelPricing { - input_per_million: 0.27, - output_per_million: 1.10, - }) - } else if lower.contains("deepseek-chat") || lower.contains("deepseek-v3") { - // DeepSeek-V3: $0.27/M input, $1.10/M output - Some(ModelPricing { - input_per_million: 0.27, - output_per_million: 1.10, - }) - } else if lower.contains("deepseek") { - // Generic DeepSeek fallback (V3 pricing) - Some(ModelPricing { - input_per_million: 0.27, - output_per_million: 1.10, + input_cache_hit_per_million: 0.145, + input_cache_miss_per_million: 1.74, + output_per_million: 3.48, }) } else { - None + // deepseek-v4-flash and legacy aliases (deepseek-chat, deepseek-reasoner, + // deepseek-v3*) all price as v4-flash. + Some(ModelPricing { + input_cache_hit_per_million: 0.028, + input_cache_miss_per_million: 0.14, + output_per_million: 0.28, + }) } } /// Calculate cost for a turn given token usage and model. 
#[must_use] +#[allow(dead_code)] pub fn calculate_turn_cost(model: &str, input_tokens: u32, output_tokens: u32) -> Option { let pricing = pricing_for_model(model)?; - let input_cost = (input_tokens as f64 / 1_000_000.0) * pricing.input_per_million; + let input_cost = (input_tokens as f64 / 1_000_000.0) * pricing.input_cache_miss_per_million; let output_cost = (output_tokens as f64 / 1_000_000.0) * pricing.output_per_million; Some(input_cost + output_cost) } +/// Calculate cost from provider usage, honoring DeepSeek context-cache fields. +#[must_use] +pub fn calculate_turn_cost_from_usage(model: &str, usage: &Usage) -> Option { + let pricing = pricing_for_model(model)?; + let hit_tokens = usage.prompt_cache_hit_tokens.unwrap_or(0); + let miss_tokens = usage + .prompt_cache_miss_tokens + .unwrap_or_else(|| usage.input_tokens.saturating_sub(hit_tokens)); + let accounted_input = hit_tokens.saturating_add(miss_tokens); + let uncategorized_input = usage.input_tokens.saturating_sub(accounted_input); + + let hit_cost = (hit_tokens as f64 / 1_000_000.0) * pricing.input_cache_hit_per_million; + let miss_cost = ((miss_tokens.saturating_add(uncategorized_input)) as f64 / 1_000_000.0) + * pricing.input_cache_miss_per_million; + let output_cost = (usage.output_tokens as f64 / 1_000_000.0) * pricing.output_per_million; + Some(hit_cost + miss_cost + output_cost) +} + /// Format a USD cost for compact display. 
#[must_use] #[allow(dead_code)] diff --git a/crates/tui/src/runtime_api.rs b/crates/tui/src/runtime_api.rs index ae1c9d8d..a7dc3523 100644 --- a/crates/tui/src/runtime_api.rs +++ b/crates/tui/src/runtime_api.rs @@ -2014,7 +2014,7 @@ mod tests { usage: Usage { input_tokens: 10, output_tokens: 5, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -2027,7 +2027,7 @@ mod tests { usage: Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -2159,7 +2159,7 @@ mod tests { usage: Usage { input_tokens: 5, output_tokens: 3, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -2281,7 +2281,7 @@ mod tests { usage: Usage { input_tokens: 2, output_tokens: 1, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -2500,7 +2500,7 @@ mod tests { usage: Usage { input_tokens: 4, output_tokens: 2, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, diff --git a/crates/tui/src/runtime_threads.rs b/crates/tui/src/runtime_threads.rs index ca97963d..1b856626 100644 --- a/crates/tui/src/runtime_threads.rs +++ b/crates/tui/src/runtime_threads.rs @@ -320,7 +320,7 @@ impl RuntimeThreadStore { } out.push(thread); } - out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at)); + out.sort_by_key(|t| std::cmp::Reverse(t.updated_at)); Ok(out) } @@ -349,7 +349,7 @@ impl RuntimeThreadStore { out.push(turn); } } - out.sort_by(|a, b| a.created_at.cmp(&b.created_at)); + out.sort_by_key(|a| a.created_at); Ok(out) } @@ -1044,6 +1044,7 @@ impl RuntimeThreadManager { prompt, mode, model.clone(), + None, allow_shell, trust_mode, auto_approve, @@ -2638,7 +2639,7 @@ mod tests { usage: Usage { input_tokens: 10, output_tokens: 12, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ 
-2913,7 +2914,7 @@ mod tests { usage: Usage { input_tokens: 5, output_tokens: 5, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -3144,7 +3145,7 @@ mod tests { usage: Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -3226,7 +3227,7 @@ mod tests { usage: Usage { input_tokens: 0, output_tokens: 0, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -3286,7 +3287,7 @@ mod tests { usage: Usage { input_tokens: 8, output_tokens: 9, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -3395,7 +3396,7 @@ mod tests { usage: Usage { input_tokens: 3, output_tokens: 3, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, @@ -3425,7 +3426,7 @@ mod tests { usage: Usage { input_tokens: 1, output_tokens: 1, - server_tool_use: None, + ..Usage::default() }, status: TurnOutcomeStatus::Completed, error: None, diff --git a/crates/tui/src/session_manager.rs b/crates/tui/src/session_manager.rs index 013d7eb1..81c8b408 100644 --- a/crates/tui/src/session_manager.rs +++ b/crates/tui/src/session_manager.rs @@ -306,7 +306,7 @@ impl SessionManager { } // Sort by updated_at descending (most recent first) - sessions.sort_by(|a, b| b.updated_at.cmp(&a.updated_at)); + sessions.sort_by_key(|s| std::cmp::Reverse(s.updated_at)); Ok(sessions) } diff --git a/crates/tui/src/settings.rs b/crates/tui/src/settings.rs index a96306fd..0430be7e 100644 --- a/crates/tui/src/settings.rs +++ b/crates/tui/src/settings.rs @@ -235,7 +235,7 @@ impl Settings { let Some(model) = normalize_model_name(trimmed) else { anyhow::bail!( - "Failed to update setting: invalid model '{value}'. Expected: a DeepSeek model ID (for example deepseek-chat, deepseek-reasoner, deepseek-v4), or none/default." 
+ "Failed to update setting: invalid model '{value}'. Expected: a DeepSeek model ID (for example deepseek-v4-pro, deepseek-v4-flash), or none/default." ); }; self.default_model = Some(model); diff --git a/crates/tui/src/task_manager.rs b/crates/tui/src/task_manager.rs index 6dd60406..de3d33e5 100644 --- a/crates/tui/src/task_manager.rs +++ b/crates/tui/src/task_manager.rs @@ -767,7 +767,7 @@ impl TaskManager { .values() .map(TaskSummary::from) .collect::>(); - items.sort_by(|a, b| b.created_at.cmp(&a.created_at)); + items.sort_by_key(|i| std::cmp::Reverse(i.created_at)); if let Some(limit) = limit { items.truncate(limit); } @@ -1483,7 +1483,7 @@ mod tests { data_dir: root, worker_count: 1, default_workspace: PathBuf::from("."), - default_model: "deepseek-v3.2".to_string(), + default_model: "deepseek-v4-flash".to_string(), default_mode: "agent".to_string(), allow_shell: false, trust_mode: false, diff --git a/crates/tui/src/tools/review.rs b/crates/tui/src/tools/review.rs index eb5d4e9a..497567c6 100644 --- a/crates/tui/src/tools/review.rs +++ b/crates/tui/src/tools/review.rs @@ -227,6 +227,7 @@ impl ToolSpec for ReviewTool { tool_choice: None, metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: Some(0.2), top_p: Some(0.9), diff --git a/crates/tui/src/tools/subagent.rs b/crates/tui/src/tools/subagent.rs index e28478f2..732f2866 100644 --- a/crates/tui/src/tools/subagent.rs +++ b/crates/tui/src/tools/subagent.rs @@ -2301,6 +2301,7 @@ async fn run_subagent( tool_choice: Some(json!({ "type": "auto" })), metadata: None, thinking: None, + reasoning_effort: None, stream: Some(false), temperature: None, top_p: None, @@ -2311,10 +2312,8 @@ async fn run_subagent( let mut tool_uses = Vec::new(); for block in &response.content { match block { - ContentBlock::Text { text, .. } => { - if !text.trim().is_empty() { - final_result = Some(text.clone()); - } + ContentBlock::Text { text, .. 
} if !text.trim().is_empty() => { + final_result = Some(text.clone()); } ContentBlock::ToolUse { id, name, input, .. diff --git a/crates/tui/src/tui/app.rs b/crates/tui/src/tui/app.rs index d84bec83..95ab05a3 100644 --- a/crates/tui/src/tui/app.rs +++ b/crates/tui/src/tui/app.rs @@ -49,6 +49,81 @@ pub enum AppMode { Plan, } +/// DeepSeek reasoning-effort tier, mirrored on ChatGPT/Claude effort pickers. +/// +/// The config file accepts all five string values for forward-compat with +/// providers that expose the full spectrum; DeepSeek currently collapses +/// `Low`/`Medium` → `high` and `Max` → `max` at the API boundary. The +/// keyboard cycler (Shift+Tab) walks only the three behaviorally distinct +/// tiers: `Off` → `High` → `Max` → `Off`. +#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)] +pub enum ReasoningEffort { + Off, + Low, + Medium, + High, + #[default] + Max, +} + +impl ReasoningEffort { + /// Parse a config-file string into an effort tier. Unknown values fall + /// back to the default (`Max`) rather than erroring out. + #[must_use] + pub fn from_setting(value: &str) -> Self { + match value.trim().to_ascii_lowercase().as_str() { + "off" | "disabled" | "none" | "false" => Self::Off, + "low" | "minimal" => Self::Low, + "medium" | "mid" => Self::Medium, + "high" => Self::High, + "max" | "maximum" | "xhigh" => Self::Max, + _ => Self::default(), + } + } + + /// Canonical lowercase label used for config storage and UI hints. + #[must_use] + pub fn as_setting(self) -> &'static str { + match self { + Self::Off => "off", + Self::Low => "low", + Self::Medium => "medium", + Self::High => "high", + Self::Max => "max", + } + } + + /// Short label for the header chip. + #[must_use] + pub fn short_label(self) -> &'static str { + match self { + Self::Off => "off", + Self::Low => "low", + Self::Medium => "med", + Self::High => "high", + Self::Max => "max", + } + } + + /// Value forwarded to the engine/client. 
`None` means "provider default" + /// (for `Off` we still emit `"off"` so the client can inject + /// `thinking = {"type": "disabled"}`). + #[must_use] + pub fn api_value(self) -> Option<&'static str> { + Some(self.as_setting()) + } + + /// Cycle through the three behaviorally distinct tiers. + #[must_use] + pub fn cycle_next(self) -> Self { + match self { + Self::Off => Self::High, + Self::Low | Self::Medium | Self::High => Self::Max, + Self::Max => Self::Off, + } + } +} + /// Sidebar content focus mode. #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum SidebarFocus { @@ -302,6 +377,9 @@ pub struct App { /// Last status text already promoted from `status_message` into toast state. pub last_status_message_seen: Option, pub model: String, + /// Current reasoning-effort tier for DeepSeek thinking mode. + /// Cycled via Shift+Tab; initialized from config at startup. + pub reasoning_effort: ReasoningEffort, pub workspace: PathBuf, pub skills_dir: PathBuf, pub use_alt_screen: bool, @@ -415,6 +493,10 @@ pub struct App { pub last_prompt_tokens: Option, /// Last completion token usage pub last_completion_tokens: Option, + /// DeepSeek context-cache hit tokens from the last API call. + pub last_prompt_cache_hit_tokens: Option, + /// DeepSeek context-cache miss tokens from the last API call. + pub last_prompt_cache_miss_tokens: Option, /// Cached git context snapshot for the footer. pub workspace_context: Option, /// Timestamp for cached workspace context. 
@@ -593,6 +675,11 @@ impl App { sticky_status: None, last_status_message_seen: None, model, + reasoning_effort: config + .reasoning_effort() + .map_or_else(ReasoningEffort::default, |s| { + ReasoningEffort::from_setting(s) + }), workspace, skills_dir, use_alt_screen, @@ -673,6 +760,8 @@ impl App { runtime_turn_status: None, last_prompt_tokens: None, last_completion_tokens: None, + last_prompt_cache_hit_tokens: None, + last_prompt_cache_miss_tokens: None, workspace_context: None, workspace_context_refreshed_at: None, task_panel: Vec::new(), @@ -763,6 +852,7 @@ impl App { } /// Cycle through modes in reverse: YOLO -> Agent -> Plan + #[allow(dead_code)] pub fn cycle_mode_reverse(&mut self) { let next = match self.mode { AppMode::Agent => AppMode::Plan, @@ -772,6 +862,18 @@ impl App { let _ = self.set_mode(next); } + /// Cycle reasoning-effort through the three behaviorally distinct tiers: + /// `Off` → `High` → `Max` → `Off`. + pub fn cycle_effort(&mut self) { + self.reasoning_effort = self.reasoning_effort.cycle_next(); + self.needs_redraw = true; + self.push_status_toast( + format!("Thinking: {}", self.reasoning_effort.short_label()), + StatusToastLevel::Info, + Some(1_500), + ); + } + /// Execute hooks for a specific event with the given context pub fn execute_hooks(&self, event: HookEvent, context: &HookContext) -> Vec { self.hooks.execute(event, context) @@ -1538,10 +1640,10 @@ mod tests { fn test_update_model_compaction_budget() { let mut app = App::new(test_options(false), &Config::default()); let initial_threshold = app.compact_threshold; - app.model = "deepseek-reasoner".to_string(); + app.model = "deepseek-v3.2-128k".to_string(); app.update_model_compaction_budget(); // Threshold may have changed based on model - // deepseek-reasoner has 128k context, so threshold should be higher + // Explicit 128k DeepSeek model IDs have a higher threshold than unknown models. 
assert!(app.compact_threshold >= initial_threshold); } diff --git a/crates/tui/src/tui/history.rs b/crates/tui/src/tui/history.rs index fd8ff54f..7f31b97a 100644 --- a/crates/tui/src/tui/history.rs +++ b/crates/tui/src/tui/history.rs @@ -1432,11 +1432,9 @@ fn status_symbol(started_at: Option, status: ToolStatus, low_motion: bo |t| t.elapsed().as_millis(), ); let cycle = u128::from(TOOL_STATUS_SYMBOL_MS); - let idx = if cycle == 0 { - 0 - } else { - (elapsed_ms / cycle) % (TOOL_RUNNING_SYMBOLS.len() as u128) - }; + let idx = elapsed_ms + .checked_div(cycle) + .map_or(0, |d| d % (TOOL_RUNNING_SYMBOLS.len() as u128)); TOOL_RUNNING_SYMBOLS[usize::try_from(idx).unwrap_or_default()].to_string() } ToolStatus::Success => TOOL_DONE_SYMBOL.to_string(), diff --git a/crates/tui/src/tui/session_picker.rs b/crates/tui/src/tui/session_picker.rs index f2170bf6..8ac6481b 100644 --- a/crates/tui/src/tui/session_picker.rs +++ b/crates/tui/src/tui/session_picker.rs @@ -83,14 +83,14 @@ impl SessionPickerView { match self.sort_mode { SortMode::Recent => { self.sessions - .sort_by(|a, b| b.updated_at.cmp(&a.updated_at)); + .sort_by_key(|s| std::cmp::Reverse(s.updated_at)); } SortMode::Name => { self.sessions.sort_by(|a, b| a.title.cmp(&b.title)); } SortMode::Size => { self.sessions - .sort_by(|a, b| b.message_count.cmp(&a.message_count)); + .sort_by_key(|s| std::cmp::Reverse(s.message_count)); } } diff --git a/crates/tui/src/tui/ui.rs b/crates/tui/src/tui/ui.rs index 1d2ffa2c..a386fc18 100644 --- a/crates/tui/src/tui/ui.rs +++ b/crates/tui/src/tui/ui.rs @@ -148,6 +148,8 @@ pub async fn run_tui(config: &Config, options: TuiOptions) -> Result<()> { app.total_conversation_tokens = app.total_tokens; app.last_prompt_tokens = None; app.last_completion_tokens = None; + app.last_prompt_cache_hit_tokens = None; + app.last_prompt_cache_miss_tokens = None; if let Some(prompt) = saved.system_prompt { app.system_prompt = Some(SystemPrompt::Text(prompt)); } @@ -521,16 +523,16 @@ async fn 
run_event_loop( app.total_conversation_tokens.saturating_add(turn_tokens); app.last_prompt_tokens = Some(usage.input_tokens); app.last_completion_tokens = Some(usage.output_tokens); + app.last_prompt_cache_hit_tokens = usage.prompt_cache_hit_tokens; + app.last_prompt_cache_miss_tokens = usage.prompt_cache_miss_tokens; if let Some(error) = error { app.status_message = Some(format!("Turn failed: {error}")); } // Update session cost - if let Some(turn_cost) = crate::pricing::calculate_turn_cost( - &app.model, - usage.input_tokens, - usage.output_tokens, - ) { + if let Some(turn_cost) = + crate::pricing::calculate_turn_cost_from_usage(&app.model, &usage) + { app.session_cost += turn_cost; } @@ -907,13 +909,11 @@ async fn run_event_loop( let _ = engine_handle.send(Op::Shutdown).await; return Ok(()); } - KeyCode::Esc => { - if app.onboarding == OnboardingState::ApiKey { - app.onboarding = OnboardingState::Welcome; - app.api_key_input.clear(); - app.api_key_cursor = 0; - app.status_message = None; - } + KeyCode::Esc if app.onboarding == OnboardingState::ApiKey => { + app.onboarding = OnboardingState::Welcome; + app.api_key_input.clear(); + app.api_key_cursor = 0; + app.status_message = None; } KeyCode::Enter => match app.onboarding { OnboardingState::Welcome => { @@ -1068,20 +1068,26 @@ async fn run_event_loop( // Global keybindings match key.code { - KeyCode::Enter if app.input.is_empty() && app.transcript_selection.is_active() => { - if open_pager_for_selection(app) { - continue; - } + KeyCode::Enter + if app.input.is_empty() + && app.transcript_selection.is_active() + && open_pager_for_selection(app) => + { + continue; } - KeyCode::Char('l') if key.modifiers.is_empty() && app.input.is_empty() => { - if open_pager_for_last_message(app) { - continue; - } + KeyCode::Char('l') + if key.modifiers.is_empty() + && app.input.is_empty() + && open_pager_for_last_message(app) => + { + continue; } - KeyCode::Char('v') if key.modifiers.is_empty() && app.input.is_empty() => { - if 
open_tool_details_pager(app) { - continue; - } + KeyCode::Char('v') + if key.modifiers.is_empty() + && app.input.is_empty() + && open_tool_details_pager(app) => + { + continue; } KeyCode::Char('1') if key.modifiers.contains(KeyModifiers::ALT) => { if key.modifiers.contains(KeyModifiers::CONTROL) { @@ -1169,11 +1175,11 @@ async fn run_event_loop( return Ok(()); } } - KeyCode::Char('d') if key.modifiers.contains(KeyModifiers::CONTROL) => { - if app.input.is_empty() { - let _ = engine_handle.send(Op::Shutdown).await; - return Ok(()); - } + KeyCode::Char('d') + if key.modifiers.contains(KeyModifiers::CONTROL) && app.input.is_empty() => + { + let _ = engine_handle.send(Op::Shutdown).await; + return Ok(()); } KeyCode::Esc => match next_escape_action(app, slash_menu_open) { EscapeAction::CloseSlashMenu => app.close_slash_menu(), @@ -1193,10 +1199,12 @@ async fn run_event_loop( KeyCode::Up if key.modifiers.contains(KeyModifiers::ALT) => { app.scroll_up(3); } - KeyCode::Up if key.modifiers.is_empty() && slash_menu_open => { - if app.slash_menu_selected > 0 { - app.slash_menu_selected = app.slash_menu_selected.saturating_sub(1); - } + KeyCode::Up + if key.modifiers.is_empty() + && slash_menu_open + && app.slash_menu_selected > 0 => + { + app.slash_menu_selected = app.slash_menu_selected.saturating_sub(1); } KeyCode::Down if key.modifiers.contains(KeyModifiers::ALT) => { app.scroll_down(3); @@ -1224,7 +1232,7 @@ async fn run_event_loop( app.cycle_mode(); } KeyCode::BackTab => { - app.cycle_mode_reverse(); + app.cycle_effort(); } KeyCode::Char('g') if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open => @@ -1243,18 +1251,20 @@ async fn run_event_loop( app.scroll_to_bottom(); } KeyCode::Char('[') - if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open => + if key.modifiers.is_empty() + && app.input.is_empty() + && !slash_menu_open + && !jump_to_adjacent_tool_cell(app, SearchDirection::Backward) => { - if !jump_to_adjacent_tool_cell(app, 
SearchDirection::Backward) { - app.status_message = Some("No previous tool output".to_string()); - } + app.status_message = Some("No previous tool output".to_string()); } KeyCode::Char(']') - if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open => + if key.modifiers.is_empty() + && app.input.is_empty() + && !slash_menu_open + && !jump_to_adjacent_tool_cell(app, SearchDirection::Forward) => { - if !jump_to_adjacent_tool_cell(app, SearchDirection::Forward) { - app.status_message = Some("No next tool output".to_string()); - } + app.status_message = Some("No next tool output".to_string()); } // Input handling KeyCode::Char('j') if key.modifiers.contains(KeyModifiers::CONTROL) => { @@ -1861,6 +1871,8 @@ async fn dispatch_user_message( } app.last_prompt_tokens = None; app.last_completion_tokens = None; + app.last_prompt_cache_hit_tokens = None; + app.last_prompt_cache_miss_tokens = None; // Persist immediately so abrupt termination can recover this in-flight turn. persist_checkpoint(app); @@ -1869,6 +1881,7 @@ async fn dispatch_user_message( content, mode: app.mode, model: app.model.clone(), + reasoning_effort: app.reasoning_effort.api_value().map(str::to_string), allow_shell: app.allow_shell, trust_mode: app.trust_mode, auto_approve: app.mode == AppMode::Yolo, @@ -2324,6 +2337,7 @@ fn render(f: &mut Frame, app: &mut App) { .and_then(|value| value.to_str()) .filter(|value| !value.is_empty()) .unwrap_or("workspace"); + let effort_label = app.reasoning_effort.short_label(); let header_data = HeaderData::new( app.mode, &app.model, @@ -2336,7 +2350,8 @@ fn render(f: &mut Frame, app: &mut App) { sanitized_context_window, app.session_cost, sanitized_prompt_tokens, - ); + ) + .with_reasoning_effort(Some(effort_label)); let header_widget = HeaderWidget::new(header_data); let buf = f.buffer_mut(); header_widget.render(chunks[0], buf); @@ -2966,6 +2981,8 @@ fn apply_loaded_session(app: &mut App, session: &SavedSession) { app.total_conversation_tokens = 
app.total_tokens; app.last_prompt_tokens = None; app.last_completion_tokens = None; + app.last_prompt_cache_hit_tokens = None; + app.last_prompt_cache_miss_tokens = None; app.current_session_id = Some(session.metadata.id.clone()); app.workspace_context = None; app.workspace_context_refreshed_at = None; @@ -3180,6 +3197,7 @@ fn render_footer(f: &mut Frame, area: Rect, app: &mut App) { fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec> { let context_spans = footer_context_spans(app); + let cache_spans = footer_cache_spans(app); let cost_spans = if app.session_cost > 0.001 { vec![Span::styled( format!("${:.2}", app.session_cost), @@ -3190,6 +3208,20 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec> { }; let mut candidates = Vec::new(); + if !context_spans.is_empty() && !cache_spans.is_empty() && !cost_spans.is_empty() { + let mut combined = context_spans.clone(); + combined.push(Span::raw(" ")); + combined.extend(cache_spans.clone()); + combined.push(Span::raw(" ")); + combined.extend(cost_spans.clone()); + candidates.push(combined); + } + if !context_spans.is_empty() && !cache_spans.is_empty() { + let mut combined = context_spans.clone(); + combined.push(Span::raw(" ")); + combined.extend(cache_spans.clone()); + candidates.push(combined); + } if !context_spans.is_empty() && !cost_spans.is_empty() { let mut combined = context_spans.clone(); combined.push(Span::raw(" ")); @@ -3199,6 +3231,9 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec> { if !context_spans.is_empty() { candidates.push(context_spans); } + if !cache_spans.is_empty() { + candidates.push(cache_spans); + } if !cost_spans.is_empty() { candidates.push(cost_spans); } @@ -3210,6 +3245,23 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec> { .unwrap_or_default() } +fn footer_cache_spans(app: &App) -> Vec> { + let Some(hit_tokens) = app.last_prompt_cache_hit_tokens else { + return Vec::new(); + }; + let miss_tokens = 
app.last_prompt_cache_miss_tokens.unwrap_or(0); + let total = hit_tokens.saturating_add(miss_tokens); + if total == 0 { + return Vec::new(); + } + + let percent = (f64::from(hit_tokens) / f64::from(total) * 100.0).clamp(0.0, 100.0); + vec![Span::styled( + format!("cache {:.0}%", percent), + Style::default().fg(palette::TEXT_MUTED), + )] +} + fn footer_context_spans(app: &App) -> Vec> { let (_, _, percent) = match context_usage_snapshot(app) { Some(snapshot) => snapshot, @@ -3612,12 +3664,10 @@ fn handle_mouse_event(app: &mut App, mouse: MouseEvent) { app.transcript_selection.head = Some(point); } } - MouseEventKind::Up(MouseButton::Left) => { - if app.transcript_selection.dragging { - app.transcript_selection.dragging = false; - if selection_has_content(app) { - copy_active_selection(app); - } + MouseEventKind::Up(MouseButton::Left) if app.transcript_selection.dragging => { + app.transcript_selection.dragging = false; + if selection_has_content(app) { + copy_active_selection(app); } } _ => {} diff --git a/crates/tui/src/tui/ui/tests.rs b/crates/tui/src/tui/ui/tests.rs index 36746f75..bf51a038 100644 --- a/crates/tui/src/tui/ui/tests.rs +++ b/crates/tui/src/tui/ui/tests.rs @@ -322,6 +322,8 @@ fn footer_auxiliary_spans_prioritize_context_when_busy() { let mut app = create_test_app(); app.is_loading = true; app.last_prompt_tokens = Some(48_000); + app.last_prompt_cache_hit_tokens = Some(36_000); + app.last_prompt_cache_miss_tokens = Some(12_000); app.session_cost = 12.34; let compact = spans_text(&footer_auxiliary_spans(&app, 8)); @@ -332,13 +334,27 @@ fn footer_auxiliary_spans_prioritize_context_when_busy() { let roomy = spans_text(&footer_auxiliary_spans(&app, 20)); assert!(roomy.contains("ctx")); assert!(roomy.contains('%')); + assert!(roomy.contains("cache")); +} + +#[test] +fn footer_auxiliary_spans_can_display_cache_and_cost_when_roomy() { + let mut app = create_test_app(); + app.last_prompt_tokens = Some(48_000); + app.last_prompt_cache_hit_tokens = 
Some(36_000); + app.last_prompt_cache_miss_tokens = Some(12_000); + app.session_cost = 12.34; + + let roomy = spans_text(&footer_auxiliary_spans(&app, 32)); + assert!(roomy.contains("ctx")); + assert!(roomy.contains("cache 75%")); assert!(roomy.contains("$12.34")); } #[test] fn context_usage_snapshot_prefers_estimate_when_reported_exceeds_window() { let mut app = create_test_app(); - app.last_prompt_tokens = Some(320_000); + app.last_prompt_tokens = Some(1_200_000); app.api_messages = vec![Message { role: "user".to_string(), content: vec![ContentBlock::Text { @@ -349,7 +365,7 @@ fn context_usage_snapshot_prefers_estimate_when_reported_exceeds_window() { let (used, max, percent) = context_usage_snapshot(&app).expect("context usage should be available"); - assert_eq!(max, 128_000); + assert_eq!(max, 1_000_000); assert!(used > 0); assert!(used <= i64::from(max)); assert!(percent < 100.0); @@ -372,7 +388,7 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() { let (used, max, percent) = context_usage_snapshot(&app).expect("context usage should be available"); assert_eq!(used, estimated); - assert_eq!(max, 128_000); + assert_eq!(max, 1_000_000); assert!(used > i64::from(app.last_prompt_tokens.expect("reported tokens"))); assert!(percent > 0.0); } @@ -380,7 +396,7 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() { #[test] fn should_auto_compact_before_send_respects_threshold_and_setting() { let mut app = create_test_app(); - app.last_prompt_tokens = Some(123_000); + app.last_prompt_tokens = Some(950_000); app.auto_compact = true; assert!(should_auto_compact_before_send(&app)); diff --git a/crates/tui/src/tui/views/mod.rs b/crates/tui/src/tui/views/mod.rs index bafa1e75..3c808e92 100644 --- a/crates/tui/src/tui/views/mod.rs +++ b/crates/tui/src/tui/views/mod.rs @@ -580,7 +580,7 @@ impl ConfigView { fn config_hint_for_key(key: &str) -> &'static str { match key { "model" => { - "deepseek-chat | deepseek-reasoner | deepseek-* (aliases: 
deepseek-v3, deepseek-v3.2, deepseek-r1)" + "deepseek-v4-pro | deepseek-v4-flash | deepseek-* (aliases: deepseek-chat, deepseek-reasoner, deepseek-v3, deepseek-v3.2, deepseek-r1)" } "approval_mode" => "auto | suggest | never", "auto_compact" | "calm_mode" | "low_motion" | "show_thinking" | "show_tool_details" @@ -591,7 +591,7 @@ fn config_hint_for_key(key: &str) -> &'static str { "sidebar_width" => "10..=50", "sidebar_focus" => "auto | plan | todos | tasks | agents", "max_history" => "integer (0 allowed)", - "default_model" => "deepseek-chat | deepseek-reasoner | deepseek-* | none/default", + "default_model" => "deepseek-v4-pro | deepseek-v4-flash | deepseek-* | none/default", _ => "", } } @@ -1256,7 +1256,7 @@ impl ModalView for SubAgentsView { if idx > 0 { summary.push(Span::raw(" · ")); } - summary.extend(part.into_iter()); + summary.extend(part); } lines.push(Line::from(summary)); lines.push(Line::from(Span::styled( diff --git a/crates/tui/src/tui/widgets/header.rs b/crates/tui/src/tui/widgets/header.rs index 6a7a0598..6b482a38 100644 --- a/crates/tui/src/tui/widgets/header.rs +++ b/crates/tui/src/tui/widgets/header.rs @@ -33,6 +33,9 @@ pub struct HeaderData<'a> { pub session_cost: f64, /// Input tokens from the most recent API call (current context utilization). pub last_prompt_tokens: Option, + /// Short label for the current reasoning-effort tier (e.g. "max", "high", + /// "off"). Rendered as a chip when space allows. + pub reasoning_effort_label: Option<&'a str>, } impl<'a> HeaderData<'a> { @@ -55,9 +58,17 @@ impl<'a> HeaderData<'a> { context_window: None, session_cost: 0.0, last_prompt_tokens: None, + reasoning_effort_label: None, } } + /// Attach a short reasoning-effort label for the header chip. + #[must_use] + pub fn with_reasoning_effort(mut self, label: Option<&'a str>) -> Self { + self.reasoning_effort_label = label; + self + } + /// Set token/cost fields. 
#[must_use] pub fn with_usage( @@ -191,6 +202,27 @@ impl<'a> HeaderWidget<'a> { )] } + fn effort_chip_spans(&self, include_prefix: bool) -> Vec> { + let Some(label) = self.data.reasoning_effort_label else { + return Vec::new(); + }; + let trimmed = label.trim(); + if trimmed.is_empty() { + return Vec::new(); + } + let color = if trimmed.eq_ignore_ascii_case("off") { + palette::TEXT_HINT + } else { + palette::DEEPSEEK_SKY + }; + let body = if include_prefix { + format!("⚡{trimmed}") + } else { + trimmed.to_string() + }; + vec![Span::styled(body, Style::default().fg(color))] + } + fn status_variant( &self, show_stream_label: bool, @@ -199,7 +231,16 @@ impl<'a> HeaderWidget<'a> { ) -> Vec> { let mut spans = Vec::new(); + let effort_spans = self.effort_chip_spans(true); + let has_effort = !effort_spans.is_empty(); + if has_effort { + spans.extend(effort_spans); + } + if self.data.is_streaming { + if has_effort { + spans.push(Span::raw(" ")); + } spans.push(Span::styled( "●", Style::default() diff --git a/crates/tui/src/working_set.rs b/crates/tui/src/working_set.rs index 974223ff..d0b7ad2e 100644 --- a/crates/tui/src/working_set.rs +++ b/crates/tui/src/working_set.rs @@ -274,7 +274,7 @@ impl WorkingSet { .values() .map(|entry| (entry.path.clone(), score_entry(entry, self.turn))) .collect(); - ranked.sort_by(|a, b| a.1.cmp(&b.1)); + ranked.sort_by_key(|a| a.1); let to_remove = self.entries.len().saturating_sub(max_entries); for (path, _) in ranked.into_iter().take(to_remove) { diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 904b6ece..a8495b5d 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -81,18 +81,21 @@ Current boundary note: ### LLM Integration -- **`client.rs`** - HTTP client for DeepSeek's OpenAI-compatible Responses API (with chat fallback) +- **`client.rs`** - HTTP client for DeepSeek's documented OpenAI-compatible Chat Completions API - **`llm_client.rs`** - Abstract LLM client trait with retry logic - **`models.rs`** - Data 
structures for API requests/responses #### DeepSeek API Endpoints DeepSeek exposes OpenAI-compatible endpoints. The CLI uses: -- `https://api.deepseek.com/v1/responses` - preferred Responses API -- `https://api.deepseek.com/v1/chat/completions` - fallback if Responses is unavailable +- `https://api.deepseek.com/v1/chat/completions` - normal and streaming model turns +- `https://api.deepseek.com/v1/models` - live model discovery and health checks -The engine uses `handle_deepseek_turn()` to drive the agent loop against the -Responses API (with automatic fallback if needed). +`https://api.deepseek.com/v1` is accepted for OpenAI SDK compatibility, and +`https://api.deepseek.com/beta` can be configured for beta-only features such as +strict tool mode, chat prefix completion, and FIM completion. The public +DeepSeek docs do not document a Responses API path for this workflow; the engine +drives turns through Chat Completions. ### Tool System diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index 0c3957e2..9a14aae4 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -15,6 +15,11 @@ Overrides: If both are set, `--config` wins. Environment variable overrides are applied after the file is loaded. +The `deepseek` facade and `deepseek-tui` binary share the same config file for +DeepSeek auth and model defaults. `deepseek login --api-key ...` writes the +root `api_key` field that `deepseek-tui` reads directly, and `deepseek --model +deepseek-chat` is forwarded to the TUI as `DEEPSEEK_MODEL`. + To bootstrap MCP and skills directories at their resolved paths, run `deepseek-tui setup`. To only scaffold MCP, run `deepseek-tui mcp init`. 
@@ -30,7 +35,7 @@ You can define multiple profiles in the same file: ```toml api_key = "PERSONAL_KEY" -default_text_model = "deepseek-reasoner" +default_text_model = "deepseek-v4-pro" [profiles.work] api_key = "WORK_KEY" @@ -50,6 +55,7 @@ These override config values: - `DEEPSEEK_API_KEY` - `DEEPSEEK_BASE_URL` +- `DEEPSEEK_MODEL` or `DEEPSEEK_DEFAULT_TEXT_MODEL` - `DEEPSEEK_SKILLS_DIR` - `DEEPSEEK_MCP_CONFIG` - `DEEPSEEK_NOTES_PATH` @@ -123,8 +129,9 @@ If you are upgrading from older releases: ### Core keys (used by the TUI/engine) - `api_key` (string, required): must be non-empty (or set `DEEPSEEK_API_KEY`). -- `base_url` (string, optional): defaults to `https://api.deepseek.com` (OpenAI-compatible Responses API). -- `default_text_model` (string, optional): defaults to `deepseek-reasoner`. Any valid DeepSeek model ID is accepted (common IDs: `deepseek-reasoner`, `deepseek-chat`). Use `/models` to discover live IDs from your configured endpoint. +- `base_url` (string, optional): defaults to `https://api.deepseek.com` for DeepSeek's OpenAI-compatible Chat Completions API. `https://api.deepseek.com/v1` is also accepted for SDK compatibility; use `https://api.deepseek.com/beta` only for DeepSeek beta features such as strict tool mode, chat prefix completion, and FIM completion. +- `default_text_model` (string, optional): defaults to `deepseek-v4-pro`. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash`. Use `/models` or `deepseek models` to discover live IDs from your configured endpoint. `DEEPSEEK_MODEL` overrides this for a single process. +- `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`; defaults to `max`. `off` sends `thinking = {"type": "disabled"}`.
`low` and `medium` are normalized to `high`; `max` sends `reasoning_effort = "max"`. - `allow_shell` (bool, optional): defaults to `true` (sandboxed). - `approval_policy` (string, optional): `on-request`, `untrusted`, or `never`. Runtime `approval_mode` editing in `/config` also accepts `on-request` and `untrusted` aliases. - `sandbox_mode` (string, optional): `read-only`, `workspace-write`, `danger-full-access`, `external-sandbox`. diff --git a/docs/MODES.md b/docs/MODES.md index 50100290..f5fd30a8 100644 --- a/docs/MODES.md +++ b/docs/MODES.md @@ -73,6 +73,7 @@ See `MCP.md`. Run `deepseek --help` for the canonical list. Common flags: - `-p, --prompt `: one-shot prompt mode (prints and exits) +- `--model `: when using the `deepseek` facade, forward a DeepSeek model override to the TUI - `--workspace `: workspace root for file tools - `--yolo`: start in YOLO mode - `-r, --resume `: resume a saved session diff --git a/npm/deepseek-tui/README.md b/npm/deepseek-tui/README.md index fc0c0fd0..482cb246 100644 --- a/npm/deepseek-tui/README.md +++ b/npm/deepseek-tui/README.md @@ -20,6 +20,23 @@ npx deepseek-tui --help `postinstall` downloads platform binaries into `bin/downloads/` and exposes `deepseek` and `deepseek-tui` commands. +## First run + +```bash +deepseek login --api-key "YOUR_DEEPSEEK_API_KEY" +deepseek doctor +deepseek +``` + +The `deepseek` facade and `deepseek-tui` binary share `~/.deepseek/config.toml` +for DeepSeek auth and default model settings. Common TUI commands are available +directly through the facade, including `deepseek doctor`, `deepseek models`, +`deepseek sessions`, and `deepseek resume --last`. + +The app talks to DeepSeek's documented OpenAI-compatible Chat Completions API. +Set `DEEPSEEK_BASE_URL` only if you need the China endpoint or DeepSeek beta +features such as strict tool mode, chat prefix completion, or FIM completion. 
+ ## Supported platforms - Linux x64 diff --git a/npm/deepseek-tui/package.json b/npm/deepseek-tui/package.json index de1548ea..003d2490 100644 --- a/npm/deepseek-tui/package.json +++ b/npm/deepseek-tui/package.json @@ -1,7 +1,7 @@ { "name": "deepseek-tui", - "version": "0.3.33", - "deepseekBinaryVersion": "0.3.33", + "version": "0.4.0", + "deepseekBinaryVersion": "0.4.0", "description": "Install and run deepseek and deepseek-tui binaries from GitHub release artifacts.", "author": "Hmbown", "license": "MIT", diff --git a/npm/deepseek-tui/scripts/install.js b/npm/deepseek-tui/scripts/install.js index a325938e..0afae017 100644 --- a/npm/deepseek-tui/scripts/install.js +++ b/npm/deepseek-tui/scripts/install.js @@ -150,7 +150,7 @@ async function ensureBinary(targetPath, assetName, version, repo, checksums) { } } const url = releaseAssetUrl(assetName, version, repo); - const destination = `${targetPath}.download`; + const destination = `${targetPath}.${process.pid}.${Date.now()}.download`; await download(url, destination); try { await verifyChecksum(destination, assetName, checksums);