diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 3a589933..57ce9cec 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -115,3 +115,21 @@ jobs:
         with:
           files: artifacts/*/*
           prerelease: false
+
+  publish-npm:
+    needs: release
+    runs-on: ubuntu-latest
+    # Trusted Publishing via OIDC: no NPM_TOKEN required (needs npm CLI 11.5.1 or later).
+    # Configure the trusted publisher on the npm package settings page.
+    permissions:
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          registry-url: 'https://registry.npmjs.org'
+      - name: Publish wrapper to npm
+        working-directory: npm/deepseek-tui
+        run: npm publish --provenance --access public
diff --git a/.mailmap b/.mailmap
new file mode 100644
index 00000000..de638f42
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1 @@
+Hunter Bown <hmbown@gmail.com> devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
diff --git a/AGENTS.md b/AGENTS.md
index 24d722d9..d9934cb5 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -17,9 +17,10 @@ See README.md for project overview, docs/ARCHITECTURE.md for internals.
 ## DeepSeek-Specific Notes
 
 - **Thinking Tokens**: DeepSeek models output thinking blocks (`ContentBlock::Thinking`) before final answers. The TUI streams and displays these with visual distinction.
-- **Reasoning Models**: `deepseek-reasoner` and `deepseek-r1` excel at step-by-step problem solving.
-- **Large Context Window**: 128k tokens. Use search tools to navigate efficiently.
-- **API**: OpenAI-compatible with Responses API preferred, chat completions as fallback. Base URL configurable for global (`api.deepseek.com`) or China (`api.deepseeki.com`).
+- **Reasoning Models**: `deepseek-v4-pro` and `deepseek-v4-flash` are the documented V4 model IDs. Legacy `deepseek-chat` and `deepseek-reasoner` are compatibility aliases for `deepseek-v4-flash`.
+- **Large Context Window**: DeepSeek V4 models have 1M-token context windows. Use search tools to navigate efficiently.
+- **API**: OpenAI-compatible Chat Completions (`/chat/completions`) is the documented DeepSeek API path. Base URL configurable for global (`api.deepseek.com`) or China (`api.deepseeki.com`); `/v1` is accepted for OpenAI SDK compatibility, and `/beta` is only needed for beta features such as strict tool mode, chat prefix completion, and FIM completion.
+- **Thinking + Tool Calls**: In V4 thinking mode, assistant messages that contain tool calls must replay their `reasoning_content` in all subsequent requests or the API returns HTTP 400.
 
 ## Important Notes
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 65163082..93f27ca0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+- DeepSeek V4 thinking-mode tool calls now preserve prior assistant `reasoning_content` whenever a tool call is replayed, matching DeepSeek's multi-turn contract and avoiding HTTP 400 rejections on later turns.
+- Raw Chat Completions requests now send DeepSeek's top-level `thinking` parameter instead of the OpenAI SDK-only `extra_body` wrapper.
+- Context-window budgeting now treats the legacy `deepseek-chat` / `deepseek-reasoner` aliases as having V4 Flash's 1M-token context window.
+- npm wrapper first-run downloads now use process-unique temp files so concurrent `deepseek` / `deepseek-tui` invocations do not race on `*.download` files.
+
+## [0.4.0] - 2026-04-23
+
+### Added
+- **DeepSeek V4 support**: `deepseek-v4-pro` (flagship) and `deepseek-v4-flash` (fast/cheap) are now first-class model IDs with 1M context windows.
+- **Reasoning-effort tier**: new `reasoning_effort` config field (`off | low | medium | high | max`) mapped to DeepSeek's `reasoning_effort` + `thinking` request fields. Defaults to `max`.
+- **Shift+Tab cycles reasoning-effort** through the three behaviorally distinct tiers (`off → high → max`). The current tier is shown as a ⚡ chip in the header.
+- Per-model pricing table: `deepseek-v4-pro` priced at $0.145/$1.74/$3.48 per 1M tokens (cache-hit/miss/output); `deepseek-v4-flash` and legacy aliases at $0.028/$0.14/$0.28.
+
+### Changed
+- **Default model flipped to `deepseek-v4-pro`** (from `deepseek-reasoner`).
+- `deepseek-chat` / `deepseek-reasoner` remain as silent aliases of `deepseek-v4-flash` for API compatibility; priced identically.
+- **Context compaction**: raised `MAX_COMPACTION_MESSAGE_THRESHOLD` from 150 to 500 so 1M-context models can use proportionally more history before message-count compaction. Token-based compaction still triggers at 80% of the window and scales automatically.
+- Cycling modes is now Tab-only; Shift+Tab is repurposed for reasoning-effort (reverse-mode cycle was low-value with only three modes).
+- Updated help/hint strings, validator error messages, and the model picker to reference V4 IDs.
+
+### Fixed
+- `requires_reasoning_content` now recognizes `deepseek-v4*` so thinking streams render correctly on V4 models.
+
 ## [0.3.33] - 2026-04-11
 
 ### Changed
@@ -474,4 +498,4 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 [0.1.7]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.6...v0.1.7
 [0.1.6]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.5...v0.1.6
 [0.1.5]: https://github.com/Hmbown/DeepSeek-TUI/compare/v0.1.0...v0.1.5
-[0.1.0]: https://github.com/Hmbown/DeepSeek-TUI/releases/tag/v0.1.0
\ No newline at end of file
+[0.1.0]: https://github.com/Hmbown/DeepSeek-TUI/releases/tag/v0.1.0
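The `reasoning_content` replay rule described in AGENTS.md and the changelog can be checked before a request is sent. The following is a minimal sketch, not the code in this change: `ChatMessage`, its fields, and `tool_call_turns_missing_reasoning` are hypothetical names, and the check assumes that only a missing (not an empty) `reasoning_content` matters.

```rust
/// Hypothetical, trimmed-down chat message; the real client type is not shown in this diff.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: &'static str,
    reasoning_content: Option<String>,
    tool_call_ids: Vec<String>,
}

impl ChatMessage {
    /// Convenience constructor used only for this illustration.
    fn new(role: &'static str, reasoning: Option<&str>, tool_call_ids: &[&str]) -> Self {
        Self {
            role,
            reasoning_content: reasoning.map(str::to_string),
            tool_call_ids: tool_call_ids.iter().map(|id| id.to_string()).collect(),
        }
    }
}

/// Returns the indices of assistant messages that carry tool calls but no
/// `reasoning_content`. Per the notes above, replaying such a message in
/// V4 thinking mode is what leads to the HTTP 400 rejection.
fn tool_call_turns_missing_reasoning(history: &[ChatMessage]) -> Vec<usize> {
    history
        .iter()
        .enumerate()
        .filter(|(_, m)| {
            m.role == "assistant" && !m.tool_call_ids.is_empty() && m.reasoning_content.is_none()
        })
        .map(|(index, _)| index)
        .collect()
}
```

A caller could run this over the outgoing history and refuse to send (or log a warning) when the returned list is non-empty, rather than waiting for the API to reject the request.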
diff --git a/Cargo.lock b/Cargo.lock
index 0661f52c..66379d59 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -806,7 +806,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-agent"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "deepseek-config",
  "serde",
@@ -814,7 +814,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-app-server"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "axum",
@@ -837,7 +837,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-config"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "dirs",
@@ -848,7 +848,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-core"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "chrono",
@@ -867,7 +867,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-execpolicy"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "deepseek-protocol",
@@ -876,7 +876,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-hooks"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "async-trait",
@@ -890,7 +890,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-mcp"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "deepseek-protocol",
@@ -900,7 +900,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-protocol"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "serde",
  "serde_json",
@@ -908,7 +908,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-state"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "chrono",
@@ -920,7 +920,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-tools"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "async-trait",
@@ -933,7 +933,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-tui"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "arboard",
@@ -987,7 +987,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-tui-cli"
-version = "0.3.33"
+version = "0.4.0"
 dependencies = [
  "anyhow",
  "chrono",
@@ -1005,7 +1005,7 @@ dependencies = [
 
 [[package]]
 name = "deepseek-tui-core"
-version = "0.3.33"
+version = "0.4.0"
 
 [[package]]
 name = "deranged"
diff --git a/Cargo.toml b/Cargo.toml
index 2c273331..b1b92600 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -18,7 +18,7 @@ default-members = ["crates/cli", "crates/app-server", "crates/tui"]
 resolver = "2"
 
 [workspace.package]
-version = "0.3.33"
+version = "0.4.0"
 edition = "2024"
 license = "MIT"
 repository = "https://github.com/Hmbown/DeepSeek-TUI"
diff --git a/README.md b/README.md
index 70b46717..ef7f0b9c 100644
--- a/README.md
+++ b/README.md
@@ -21,15 +21,18 @@ npm install -g deepseek-tui
 Start the TUI:
 
 ```bash
-deepseek-tui
+deepseek
 ```
 
 On first launch, it will prompt for your API key if one is not already configured.
+The package also installs `deepseek-tui`; both commands share the same
+`~/.deepseek/config.toml` for DeepSeek auth and default model settings.
 
 You can also set auth ahead of time with either of these:
 
 ```bash
-deepseek-tui login
+deepseek login --api-key "YOUR_DEEPSEEK_API_KEY"
+deepseek-tui login --api-key "YOUR_DEEPSEEK_API_KEY"
 DEEPSEEK_API_KEY="YOUR_DEEPSEEK_API_KEY" deepseek-tui
 ```
 
@@ -59,7 +62,7 @@ surface immediately.
 
 A terminal coding agent for DeepSeek models with file editing, shell execution, `web.run` browsing, git operations, session resume, and [MCP](https://modelcontextprotocol.io) server integration.
 
-Three visible modes (**Tab** / **Shift+Tab** to cycle):
+Three visible modes (**Tab** to cycle):
 
 | Mode | Behavior |
 |------|----------|
@@ -67,16 +70,35 @@ Three visible modes (**Tab** / **Shift+Tab** to cycle):
 | **Agent** | Default interactive mode with multi-step tool use |
 | **YOLO** | Auto-approve tools in a trusted workspace |
 
+**Shift+Tab** cycles the reasoning-effort tier for DeepSeek thinking mode:
+`off` → `high` → `max`. The current tier is shown as a ⚡ chip in the header.
+Set a default in config with `reasoning_effort = "max"` (or `off` / `low` /
+`medium` / `high`).
+
+## Models & pricing
+
+| Model | Thinking | Context | Input cache hit | Input cache miss | Output |
+|---|---|---|---|---|---|
+| `deepseek-v4-pro` | default | 1M | $0.145 / 1M | $1.74 / 1M | $3.48 / 1M |
+| `deepseek-v4-flash` | default | 1M | $0.028 / 1M | $0.14 / 1M | $0.28 / 1M |
+
+Legacy `deepseek-chat` and `deepseek-reasoner` remain as silent aliases for
+`deepseek-v4-flash` (priced identically). Pricing is per 1M tokens as published
+by DeepSeek and is subject to change.
+
 ## Usage
 
 ```bash
-deepseek-tui                                  # interactive TUI
-deepseek-tui -p "explain this in 2 sentences" # one-shot prompt
-deepseek-tui --yolo                           # YOLO mode
-deepseek-tui login                            # save API key to config
-deepseek-tui doctor                           # check setup
-deepseek-tui models                           # list available models
-deepseek-tui serve --http                     # HTTP/SSE API server
+deepseek                                      # interactive TUI
+deepseek "explain this in 2 sentences"        # one-shot prompt
+deepseek --model deepseek-v4-flash "summarize" # one-shot with model override
+deepseek --yolo                               # YOLO mode
+deepseek login --api-key "..."                # save API key to shared config
+deepseek doctor                               # check setup
+deepseek models                               # list live DeepSeek API models
+deepseek sessions                             # list saved sessions
+deepseek resume --last                        # resume the latest session
+deepseek serve --http                         # HTTP/SSE API server
 ```
 
 Controls: `F1` help, `Esc` backs out of the current action, `Ctrl+K` command palette.
@@ -85,7 +107,13 @@ Controls: `F1` help, `Esc` backs out of the current action, `Ctrl+K` command pal
 
 `~/.deepseek/config.toml` — see [config.example.toml](config.example.toml) for all options.
 
-Key environment overrides: `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_PROFILE`.
+Key environment overrides: `DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`,
+`DEEPSEEK_MODEL`, `DEEPSEEK_PROFILE`.
+
+The client targets DeepSeek's documented OpenAI-compatible Chat Completions API
+(`/chat/completions`). DeepSeek context caching is automatic; when the API
+returns cache hit/miss token fields, the TUI includes them in usage and cost
+tracking.
 
 Full reference: [docs/CONFIGURATION.md](docs/CONFIGURATION.md).
 
diff --git a/config.example.toml b/config.example.toml
index befdab9e..bff40a73 100644
--- a/config.example.toml
+++ b/config.example.toml
@@ -16,11 +16,28 @@ api_key = "YOUR_DEEPSEEK_API_KEY" # must be non-empty
 # ─────────────────────────────────────────────────────────────────────────────────
 base_url = "https://api.deepseek.com"
 # base_url = "https://api.deepseeki.com"         # China users
+# base_url = "https://api.deepseek.com/beta"     # DeepSeek beta features such as strict tool mode
 
 # ─────────────────────────────────────────────────────────────────────────────────
 # Default Models
 # ─────────────────────────────────────────────────────────────────────────────────
-default_text_model = "deepseek-reasoner" # common IDs: deepseek-reasoner, deepseek-chat (any deepseek-* ID is valid)
+# DeepSeek V4 family:
+#   deepseek-v4-pro    — flagship reasoning model
+#   deepseek-v4-flash  — fast, cost-efficient (legacy aliases: deepseek-chat, deepseek-reasoner)
+default_text_model = "deepseek-v4-pro"
+
+# ─────────────────────────────────────────────────────────────────────────────────
+# Thinking Mode (DeepSeek V4 reasoning effort)
+# ─────────────────────────────────────────────────────────────────────────────────
+# "off"    — disables chain-of-thought (thinking.type = disabled)
+# "low"    — compat-maps to "high" server-side
+# "medium" — compat-maps to "high" server-side
+# "high"   — reasoning_effort = high (DeepSeek default)
+# "max"    — reasoning_effort = max (deepest reasoning)
+#
+# Shift+Tab in the TUI cycles between off / high / max. The header shows the
+# current tier as a ⚡ chip.
+reasoning_effort = "max"
 
 # ─────────────────────────────────────────────────────────────────────────────────
 # Paths
@@ -81,7 +98,7 @@ exponential_base = 2.0
 # token_threshold = 50000          # Trigger compaction above this token estimate
 # message_threshold = 50           # Or above this message count
 # model = "deepseek-chat"           # Model to use for summarization
-# cache_summary = true             # Cache the summary block
+# cache_summary = true             # Keep summary blocks stable; DeepSeek context caching is automatic
 
 # ─────────────────────────────────────────────────────────────────────────────────
 # Capacity Controller (runtime pressure guardrails)
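The tier table in the Thinking Mode comments can be read as a small mapping. The sketch below is illustrative only: `ThinkingRequest`, `effective_request`, and `next_tier` are hypothetical names, `thinking.type = "enabled"` is an assumption (the comments only state the `disabled` case), and the `low` / `medium` handling mirrors the documented server-side behavior rather than anything the client is known to do.

```rust
/// Illustrative view of the two request fields a tier ends up as.
#[derive(Debug, PartialEq, Eq)]
struct ThinkingRequest {
    /// Value for `thinking.type`; "enabled" is assumed, "disabled" is documented.
    thinking_type: &'static str,
    /// Value for `reasoning_effort`; left out when thinking is off.
    reasoning_effort: Option<&'static str>,
}

/// Maps a configured tier to its effective request fields.
/// Returns `None` for values outside `off | low | medium | high | max`.
fn effective_request(tier: &str) -> Option<ThinkingRequest> {
    match tier {
        "off" => Some(ThinkingRequest {
            thinking_type: "disabled",
            reasoning_effort: None,
        }),
        // `low` and `medium` are documented as compat-mapping to `high`.
        "low" | "medium" | "high" => Some(ThinkingRequest {
            thinking_type: "enabled",
            reasoning_effort: Some("high"),
        }),
        "max" => Some(ThinkingRequest {
            thinking_type: "enabled",
            reasoning_effort: Some("max"),
        }),
        _ => None,
    }
}

/// Shift+Tab cycle over the three distinct tiers: off -> high -> max -> off.
/// Treating `low` / `medium` like `high` here is an assumption.
fn next_tier(current: &str) -> &'static str {
    match current {
        "off" => "high",
        "low" | "medium" | "high" => "max",
        _ => "off",
    }
}
```

Collapsing `low` and `medium` into `high` in one place makes it clear why the key binding only needs three stops even though the config field accepts five values.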
diff --git a/crates/agent/Cargo.toml b/crates/agent/Cargo.toml
index 34e4ccfc..b33f1f41 100644
--- a/crates/agent/Cargo.toml
+++ b/crates/agent/Cargo.toml
@@ -7,5 +7,5 @@ repository.workspace = true
 description = "Model/provider registry and fallback strategy for DeepSeek workspace architecture"
 
 [dependencies]
-deepseek-config = { path = "../config", version = "0.3.33" }
+deepseek-config = { path = "../config", version = "0.4.0" }
 serde.workspace = true
diff --git a/crates/agent/src/lib.rs b/crates/agent/src/lib.rs
index 44b20713..f88c6708 100644
--- a/crates/agent/src/lib.rs
+++ b/crates/agent/src/lib.rs
@@ -30,18 +30,24 @@ impl Default for ModelRegistry {
     fn default() -> Self {
         let models = vec![
             ModelInfo {
-                id: "deepseek-reasoner".to_string(),
+                id: "deepseek-v4-pro".to_string(),
                 provider: ProviderKind::Deepseek,
-                aliases: vec!["deepseek-r1".to_string()],
+                aliases: vec![],
                 supports_tools: true,
                 supports_reasoning: true,
             },
             ModelInfo {
-                id: "deepseek-chat".to_string(),
+                id: "deepseek-v4-flash".to_string(),
                 provider: ProviderKind::Deepseek,
-                aliases: vec!["deepseek-v3".to_string(), "deepseek-v3.2".to_string()],
+                aliases: vec![
+                    "deepseek-chat".to_string(),
+                    "deepseek-reasoner".to_string(),
+                    "deepseek-r1".to_string(),
+                    "deepseek-v3".to_string(),
+                    "deepseek-v3.2".to_string(),
+                ],
                 supports_tools: true,
-                supports_reasoning: false,
+                supports_reasoning: true,
             },
             ModelInfo {
                 id: "gpt-4.1".to_string(),
@@ -112,13 +118,13 @@ impl ModelRegistry {
         }
 
         let final_fallback = self.models.first().cloned().unwrap_or(ModelInfo {
-            id: "deepseek-reasoner".to_string(),
+            id: "deepseek-v4-pro".to_string(),
             provider: ProviderKind::Deepseek,
             aliases: Vec::new(),
             supports_tools: true,
             supports_reasoning: true,
         });
-        fallback_chain.push("global_default:deepseek-reasoner".to_string());
+        fallback_chain.push("global_default:deepseek-v4-pro".to_string());
         ModelResolution {
             requested: requested.map(ToOwned::to_owned),
             resolved: final_fallback,
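The alias table introduced in this file implies a simple lookup: match the requested name against each model's ID or aliases, and fall back to the first registry entry. The sketch below illustrates that idea under stated assumptions; `ModelEntry`, `DEEPSEEK_MODELS`, and `resolve_model_id` are hypothetical names, matching is exact and case-sensitive, and the real `ModelRegistry` resolution (which also tracks a fallback chain and other providers) is not reproduced here.

```rust
/// Hypothetical, static stand-in for the DeepSeek entries of the registry.
struct ModelEntry {
    id: &'static str,
    aliases: &'static [&'static str],
}

/// IDs and aliases copied from the registry defaults in this diff.
const DEEPSEEK_MODELS: &[ModelEntry] = &[
    ModelEntry {
        id: "deepseek-v4-pro",
        aliases: &[],
    },
    ModelEntry {
        id: "deepseek-v4-flash",
        aliases: &[
            "deepseek-chat",
            "deepseek-reasoner",
            "deepseek-r1",
            "deepseek-v3",
            "deepseek-v3.2",
        ],
    },
];

/// Resolves a requested name to a canonical model ID.
/// Unknown or missing names fall back to the first entry, mirroring the
/// `self.models.first()` fallback shown above.
fn resolve_model_id(requested: Option<&str>) -> &'static str {
    let fallback = DEEPSEEK_MODELS[0].id;
    let Some(name) = requested.map(str::trim).filter(|n| !n.is_empty()) else {
        return fallback;
    };
    DEEPSEEK_MODELS
        .iter()
        .find(|m| m.id == name || m.aliases.contains(&name))
        .map(|m| m.id)
        .unwrap_or(fallback)
}
```

With this shape, a config that still says `deepseek-reasoner` resolves to `deepseek-v4-flash`, which is consistent with the changelog's note that the legacy names are aliases priced identically to V4 Flash.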
diff --git a/crates/app-server/Cargo.toml b/crates/app-server/Cargo.toml
index 88f8e74b..35415872 100644
--- a/crates/app-server/Cargo.toml
+++ b/crates/app-server/Cargo.toml
@@ -10,15 +10,15 @@ description = "Codex-style app-server transport for DeepSeek workspace architect
 anyhow.workspace = true
 axum.workspace = true
 clap.workspace = true
-deepseek-agent = { path = "../agent", version = "0.3.33" }
-deepseek-config = { path = "../config", version = "0.3.33" }
-deepseek-core = { path = "../core", version = "0.3.33" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" }
-deepseek-hooks = { path = "../hooks", version = "0.3.33" }
-deepseek-mcp = { path = "../mcp", version = "0.3.33" }
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
-deepseek-state = { path = "../state", version = "0.3.33" }
-deepseek-tools = { path = "../tools", version = "0.3.33" }
+deepseek-agent = { path = "../agent", version = "0.4.0" }
+deepseek-config = { path = "../config", version = "0.4.0" }
+deepseek-core = { path = "../core", version = "0.4.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.4.0" }
+deepseek-hooks = { path = "../hooks", version = "0.4.0" }
+deepseek-mcp = { path = "../mcp", version = "0.4.0" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
+deepseek-state = { path = "../state", version = "0.4.0" }
+deepseek-tools = { path = "../tools", version = "0.4.0" }
 serde.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
diff --git a/crates/cli/Cargo.toml b/crates/cli/Cargo.toml
index 8547b1f4..77e771d2 100644
--- a/crates/cli/Cargo.toml
+++ b/crates/cli/Cargo.toml
@@ -14,12 +14,12 @@ path = "src/main.rs"
 anyhow.workspace = true
 clap.workspace = true
 clap_complete.workspace = true
-deepseek-agent = { path = "../agent", version = "0.3.33" }
-deepseek-app-server = { path = "../app-server", version = "0.3.33" }
-deepseek-config = { path = "../config", version = "0.3.33" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" }
-deepseek-mcp = { path = "../mcp", version = "0.3.33" }
-deepseek-state = { path = "../state", version = "0.3.33" }
+deepseek-agent = { path = "../agent", version = "0.4.0" }
+deepseek-app-server = { path = "../app-server", version = "0.4.0" }
+deepseek-config = { path = "../config", version = "0.4.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.4.0" }
+deepseek-mcp = { path = "../mcp", version = "0.4.0" }
+deepseek-state = { path = "../state", version = "0.4.0" }
 chrono.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
diff --git a/crates/cli/src/main.rs b/crates/cli/src/main.rs
index 32869513..2d4355cb 100644
--- a/crates/cli/src/main.rs
+++ b/crates/cli/src/main.rs
@@ -10,7 +10,7 @@ use deepseek_agent::ModelRegistry;
 use deepseek_app_server::{
     AppServerOptions, run as run_app_server, run_stdio as run_app_server_stdio,
 };
-use deepseek_config::{CliRuntimeOverrides, ConfigStore, ProviderKind};
+use deepseek_config::{CliRuntimeOverrides, ConfigStore, ProviderKind, ResolvedRuntimeOptions};
 use deepseek_execpolicy::{AskForApproval, ExecPolicyContext, ExecPolicyEngine};
 use deepseek_mcp::{McpServerDefinition, run_stdio_server};
 use deepseek_state::{StateStore, ThreadListFilters};
@@ -42,7 +42,11 @@ struct Cli {
     config: Option<PathBuf>,
     #[arg(long)]
     profile: Option<String>,
-    #[arg(long, value_enum)]
+    #[arg(
+        long,
+        value_enum,
+        help = "Advanced provider selector for non-TUI registry/config commands"
+    )]
     provider: Option<ProviderArg>,
     #[arg(long)]
     model: Option<String>,
@@ -70,7 +74,37 @@ struct Cli {
 enum Commands {
     /// Run interactive/non-interactive flows via the TUI binary.
     Run(RunArgs),
-    /// Login using API key, ChatGPT token, or device code style session.
+    /// Run DeepSeek TUI diagnostics.
+    Doctor(TuiPassthroughArgs),
+    /// List live DeepSeek API models via the TUI binary.
+    Models(TuiPassthroughArgs),
+    /// List saved TUI sessions.
+    Sessions(TuiPassthroughArgs),
+    /// Resume a saved TUI session.
+    Resume(TuiPassthroughArgs),
+    /// Fork a saved TUI session.
+    Fork(TuiPassthroughArgs),
+    /// Create a default AGENTS.md in the current directory.
+    Init(TuiPassthroughArgs),
+    /// Bootstrap MCP config and/or skills directories.
+    Setup(TuiPassthroughArgs),
+    /// Run the DeepSeek TUI non-interactive agent command.
+    Exec(TuiPassthroughArgs),
+    /// Run a DeepSeek-powered code review over a git diff.
+    Review(TuiPassthroughArgs),
+    /// Apply a patch file or stdin to the working tree.
+    Apply(TuiPassthroughArgs),
+    /// Run the offline TUI evaluation harness.
+    Eval(TuiPassthroughArgs),
+    /// Manage TUI MCP servers.
+    Mcp(TuiPassthroughArgs),
+    /// Inspect TUI feature flags.
+    Features(TuiPassthroughArgs),
+    /// Run a local TUI server.
+    Serve(TuiPassthroughArgs),
+    /// Generate shell completions for the TUI binary.
+    Completions(TuiPassthroughArgs),
+    /// Save a DeepSeek API key to the shared config.
     Login(LoginArgs),
     /// Remove saved authentication state.
     Logout,
@@ -101,17 +135,23 @@ struct RunArgs {
     args: Vec<String>,
 }
 
+#[derive(Debug, Args, Clone)]
+struct TuiPassthroughArgs {
+    #[arg(trailing_var_arg = true, allow_hyphen_values = true)]
+    args: Vec<String>,
+}
+
 #[derive(Debug, Args)]
 struct LoginArgs {
-    #[arg(long, value_enum, default_value_t = ProviderArg::Deepseek)]
+    #[arg(long, value_enum, default_value_t = ProviderArg::Deepseek, hide = true)]
     provider: ProviderArg,
     #[arg(long)]
     api_key: Option<String>,
-    #[arg(long, default_value_t = false)]
+    #[arg(long, default_value_t = false, hide = true)]
     chatgpt: bool,
-    #[arg(long, default_value_t = false)]
+    #[arg(long, default_value_t = false, hide = true)]
     device_code: bool,
-    #[arg(long)]
+    #[arg(long, hide = true)]
     token: Option<String>,
 }
 
@@ -279,12 +319,57 @@ fn run() -> Result<()> {
         approval_policy: cli.approval_policy.clone(),
         sandbox_mode: cli.sandbox_mode.clone(),
     };
-    let _resolved_runtime = store.config.resolve_runtime_options(&runtime_overrides);
+    let resolved_runtime = store.config.resolve_runtime_options(&runtime_overrides);
 
     let command = cli.command.take();
 
     match command {
-        Some(Commands::Run(args)) => delegate_to_tui(&cli, args.args),
+        Some(Commands::Run(args)) => delegate_to_tui(&cli, &resolved_runtime, args.args),
+        Some(Commands::Doctor(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("doctor", args))
+        }
+        Some(Commands::Models(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("models", args))
+        }
+        Some(Commands::Sessions(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("sessions", args))
+        }
+        Some(Commands::Resume(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("resume", args))
+        }
+        Some(Commands::Fork(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("fork", args))
+        }
+        Some(Commands::Init(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("init", args))
+        }
+        Some(Commands::Setup(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("setup", args))
+        }
+        Some(Commands::Exec(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("exec", args))
+        }
+        Some(Commands::Review(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("review", args))
+        }
+        Some(Commands::Apply(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("apply", args))
+        }
+        Some(Commands::Eval(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("eval", args))
+        }
+        Some(Commands::Mcp(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("mcp", args))
+        }
+        Some(Commands::Features(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("features", args))
+        }
+        Some(Commands::Serve(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("serve", args))
+        }
+        Some(Commands::Completions(args)) => {
+            delegate_to_tui(&cli, &resolved_runtime, tui_args("completions", args))
+        }
         Some(Commands::Login(args)) => run_login_command(&mut store, args),
         Some(Commands::Logout) => run_logout_command(&mut store),
         Some(Commands::Auth(args)) => run_auth_command(&mut store, args.command),
@@ -305,11 +390,18 @@ fn run() -> Result<()> {
                 forwarded.push("--prompt".to_string());
                 forwarded.push(prompt);
             }
-            delegate_to_tui(&cli, forwarded)
+            delegate_to_tui(&cli, &resolved_runtime, forwarded)
         }
     }
 }
 
+fn tui_args(command: &str, args: TuiPassthroughArgs) -> Vec<String> {
+    let mut forwarded = Vec::with_capacity(args.args.len() + 1);
+    forwarded.push(command.to_string());
+    forwarded.extend(args.args);
+    forwarded
+}
+
 fn run_login_command(store: &mut ConfigStore, args: LoginArgs) -> Result<()> {
     let provider: ProviderKind = args.provider.into();
     store.config.provider = provider;
@@ -349,12 +441,33 @@ fn run_login_command(store: &mut ConfigStore, args: LoginArgs) -> Result<()> {
     };
     store.config.auth_mode = Some("api_key".to_string());
     store.config.providers.for_provider_mut(provider).api_key = Some(api_key);
+    if provider == ProviderKind::Deepseek {
+        store.config.api_key = store.config.providers.deepseek.api_key.clone();
+        if store.config.default_text_model.is_none() {
+            store.config.default_text_model = Some(
+                store
+                    .config
+                    .providers
+                    .deepseek
+                    .model
+                    .clone()
+                    .unwrap_or_else(|| "deepseek-v4-pro".to_string()),
+            );
+        }
+    }
     store.save()?;
-    println!("logged in using API key mode ({})", provider.as_str());
+    if provider == ProviderKind::Deepseek {
+        println!(
+            "logged in using API key mode (deepseek). This also updates the shared deepseek-tui config."
+        );
+    } else {
+        println!("logged in using API key mode ({})", provider.as_str());
+    }
     Ok(())
 }
 
 fn run_logout_command(store: &mut ConfigStore) -> Result<()> {
+    store.config.api_key = None;
     store.config.providers.deepseek.api_key = None;
     store.config.providers.openai.api_key = None;
     store.config.auth_mode = None;
@@ -382,6 +495,7 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()>
                 .deepseek
                 .api_key
                 .as_ref()
+                .or(store.config.api_key.as_ref())
                 .is_some_and(|v| !v.trim().is_empty());
             let openai_file = store
                 .config
@@ -407,6 +521,9 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()>
             };
             store.config.provider = provider;
             store.config.providers.for_provider_mut(provider).api_key = Some(api_key);
+            if provider == ProviderKind::Deepseek {
+                store.config.api_key = store.config.providers.deepseek.api_key.clone();
+            }
             store.save()?;
             println!("saved API key for {}", provider.as_str());
             Ok(())
@@ -414,6 +531,9 @@ fn run_auth_command(store: &mut ConfigStore, command: AuthCommand) -> Result<()>
         AuthCommand::Clear { provider } => {
             let provider: ProviderKind = provider.into();
             store.config.providers.for_provider_mut(provider).api_key = None;
+            if provider == ProviderKind::Deepseek {
+                store.config.api_key = None;
+            }
             store.save()?;
             println!("cleared API key for {}", provider.as_str());
             Ok(())
@@ -623,7 +743,11 @@ fn persist_mcp_server_definitions(
     store.save()
 }
 
-fn delegate_to_tui(cli: &Cli, passthrough: Vec<String>) -> Result<()> {
+fn delegate_to_tui(
+    cli: &Cli,
+    resolved_runtime: &ResolvedRuntimeOptions,
+    passthrough: Vec<String>,
+) -> Result<()> {
     let current = std::env::current_exe().context("failed to locate current executable path")?;
     let tui = current.with_file_name("deepseek-tui");
     if !tui.exists() {
@@ -642,6 +766,19 @@ fn delegate_to_tui(cli: &Cli, passthrough: Vec<String>) -> Result<()> {
     }
     cmd.args(passthrough);
 
+    if resolved_runtime.provider != ProviderKind::Deepseek {
+        bail!(
+            "The interactive TUI only supports the DeepSeek API. Remove --provider {} or use `deepseek model ...` for provider registry inspection.",
+            resolved_runtime.provider.as_str()
+        );
+    }
+
+    cmd.env("DEEPSEEK_MODEL", &resolved_runtime.model);
+    cmd.env("DEEPSEEK_BASE_URL", &resolved_runtime.base_url);
+    if let Some(api_key) = resolved_runtime.api_key.as_ref() {
+        cmd.env("DEEPSEEK_API_KEY", api_key);
+    }
+
     if let Some(provider) = cli.provider {
         cmd.env("DEEPSEEK_PROVIDER", ProviderKind::from(provider).as_str());
     }
@@ -931,6 +1068,67 @@ mod tests {
         ));
     }
 
+    #[test]
+    fn parses_direct_tui_command_aliases() {
+        let cli = parse_ok(&["deepseek", "doctor"]);
+        assert!(matches!(
+            cli.command,
+            Some(Commands::Doctor(TuiPassthroughArgs { ref args })) if args.is_empty()
+        ));
+
+        let cli = parse_ok(&["deepseek", "models", "--json"]);
+        assert!(matches!(
+            cli.command,
+            Some(Commands::Models(TuiPassthroughArgs { ref args })) if args == &["--json"]
+        ));
+
+        let cli = parse_ok(&["deepseek", "resume", "abc123"]);
+        assert!(matches!(
+            cli.command,
+            Some(Commands::Resume(TuiPassthroughArgs { ref args })) if args == &["abc123"]
+        ));
+
+        let cli = parse_ok(&["deepseek", "setup", "--skills", "--local"]);
+        assert!(matches!(
+            cli.command,
+            Some(Commands::Setup(TuiPassthroughArgs { ref args }))
+                if args == &["--skills", "--local"]
+        ));
+    }
+
+    #[test]
+    fn deepseek_login_writes_tui_compatible_config() {
+        let nanos = chrono::Utc::now().timestamp_nanos_opt().unwrap_or_default();
+        let path = std::env::temp_dir().join(format!(
+            "deepseek-cli-login-test-{}-{nanos}.toml",
+            std::process::id()
+        ));
+        let mut store = ConfigStore::load(Some(path.clone())).expect("store should load");
+
+        run_login_command(
+            &mut store,
+            LoginArgs {
+                provider: ProviderArg::Deepseek,
+                api_key: Some("sk-test".to_string()),
+                chatgpt: false,
+                device_code: false,
+                token: None,
+            },
+        )
+        .expect("login should write config");
+
+        assert_eq!(store.config.api_key.as_deref(), Some("sk-test"));
+        assert_eq!(
+            store.config.default_text_model.as_deref(),
+            Some("deepseek-v4-pro")
+        );
+        let saved = std::fs::read_to_string(&path).expect("config should be written");
+        assert!(saved.contains("api_key = \"sk-test\""));
+        assert!(saved.contains("default_text_model = \"deepseek-v4-pro\""));
+
+        let _ = std::fs::remove_file(path);
+    }
+
     #[test]
     fn parses_global_override_flags() {
         let cli = parse_ok(&[
@@ -981,6 +1179,11 @@ mod tests {
 
         for token in [
             "run",
+            "doctor",
+            "models",
+            "sessions",
+            "resume",
+            "setup",
             "login",
             "logout",
             "auth",
diff --git a/crates/config/src/lib.rs b/crates/config/src/lib.rs
index 61214775..1b0513af 100644
--- a/crates/config/src/lib.rs
+++ b/crates/config/src/lib.rs
@@ -6,7 +6,7 @@ use anyhow::{Context, Result, bail};
 use serde::{Deserialize, Serialize};
 
 pub const CONFIG_FILE_NAME: &str = "config.toml";
-const DEFAULT_DEEPSEEK_MODEL: &str = "deepseek-reasoner";
+const DEFAULT_DEEPSEEK_MODEL: &str = "deepseek-v4-pro";
 const DEFAULT_OPENAI_MODEL: &str = "gpt-4.1";
 const DEFAULT_DEEPSEEK_BASE_URL: &str = "https://api.deepseek.com";
 const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1";
@@ -72,6 +72,13 @@ impl ProvidersToml {
 
 #[derive(Debug, Clone, Serialize, Deserialize, Default)]
 pub struct ConfigToml {
+    /// TUI-compatible DeepSeek API key. Kept at the root so both `deepseek`
+    /// and `deepseek-tui` can share a single config file.
+    pub api_key: Option<String>,
+    /// TUI-compatible DeepSeek base URL.
+    pub base_url: Option<String>,
+    /// TUI-compatible default DeepSeek model.
+    pub default_text_model: Option<String>,
     #[serde(default)]
     pub provider: ProviderKind,
     pub model: Option<String>,
@@ -94,6 +101,9 @@ impl ConfigToml {
     pub fn get_value(&self, key: &str) -> Option<String> {
         match key {
             "provider" => Some(self.provider.as_str().to_string()),
+            "api_key" => self.api_key.clone(),
+            "base_url" => self.base_url.clone(),
+            "default_text_model" => self.default_text_model.clone(),
             "model" => self.model.clone(),
             "auth.mode" => self.auth_mode.clone(),
             "auth.chatgpt_access_token" => self.chatgpt_access_token.clone(),
@@ -119,6 +129,9 @@ impl ConfigToml {
                 self.provider = ProviderKind::parse(value)
                     .with_context(|| format!("unknown provider '{value}'"))?;
             }
+            "api_key" => self.api_key = Some(value.to_string()),
+            "base_url" => self.base_url = Some(value.to_string()),
+            "default_text_model" => self.default_text_model = Some(value.to_string()),
             "model" => self.model = Some(value.to_string()),
             "auth.mode" => self.auth_mode = Some(value.to_string()),
             "auth.chatgpt_access_token" => self.chatgpt_access_token = Some(value.to_string()),
@@ -131,12 +144,20 @@ impl ConfigToml {
             "approval_policy" => self.approval_policy = Some(value.to_string()),
             "sandbox_mode" => self.sandbox_mode = Some(value.to_string()),
             "providers.deepseek.api_key" => {
-                self.providers.deepseek.api_key = Some(value.to_string())
+                let value = value.to_string();
+                self.providers.deepseek.api_key = Some(value.clone());
+                self.api_key = Some(value);
             }
             "providers.deepseek.base_url" => {
-                self.providers.deepseek.base_url = Some(value.to_string());
+                let value = value.to_string();
+                self.providers.deepseek.base_url = Some(value.clone());
+                self.base_url = Some(value);
+            }
+            "providers.deepseek.model" => {
+                let value = value.to_string();
+                self.providers.deepseek.model = Some(value.clone());
+                self.default_text_model = Some(value);
             }
-            "providers.deepseek.model" => self.providers.deepseek.model = Some(value.to_string()),
             "providers.openai.api_key" => self.providers.openai.api_key = Some(value.to_string()),
             "providers.openai.base_url" => self.providers.openai.base_url = Some(value.to_string()),
             "providers.openai.model" => self.providers.openai.model = Some(value.to_string()),
@@ -151,6 +172,9 @@ impl ConfigToml {
     pub fn unset_value(&mut self, key: &str) -> Result<()> {
         match key {
             "provider" => self.provider = ProviderKind::Deepseek,
+            "api_key" => self.api_key = None,
+            "base_url" => self.base_url = None,
+            "default_text_model" => self.default_text_model = None,
             "model" => self.model = None,
             "auth.mode" => self.auth_mode = None,
             "auth.chatgpt_access_token" => self.chatgpt_access_token = None,
@@ -160,9 +184,18 @@ impl ConfigToml {
             "telemetry" => self.telemetry = None,
             "approval_policy" => self.approval_policy = None,
             "sandbox_mode" => self.sandbox_mode = None,
-            "providers.deepseek.api_key" => self.providers.deepseek.api_key = None,
-            "providers.deepseek.base_url" => self.providers.deepseek.base_url = None,
-            "providers.deepseek.model" => self.providers.deepseek.model = None,
+            "providers.deepseek.api_key" => {
+                self.providers.deepseek.api_key = None;
+                self.api_key = None;
+            }
+            "providers.deepseek.base_url" => {
+                self.providers.deepseek.base_url = None;
+                self.base_url = None;
+            }
+            "providers.deepseek.model" => {
+                self.providers.deepseek.model = None;
+                self.default_text_model = None;
+            }
             "providers.openai.api_key" => self.providers.openai.api_key = None,
             "providers.openai.base_url" => self.providers.openai.base_url = None,
             "providers.openai.model" => self.providers.openai.model = None,
@@ -178,6 +211,15 @@ impl ConfigToml {
         let mut out = BTreeMap::new();
         out.insert("provider".to_string(), self.provider.as_str().to_string());
 
+        if let Some(v) = self.api_key.as_ref() {
+            out.insert("api_key".to_string(), redact_secret(v));
+        }
+        if let Some(v) = self.base_url.as_ref() {
+            out.insert("base_url".to_string(), v.clone());
+        }
+        if let Some(v) = self.default_text_model.as_ref() {
+            out.insert("default_text_model".to_string(), v.clone());
+        }
         if let Some(v) = self.model.as_ref() {
             out.insert("model".to_string(), v.clone());
         }
@@ -236,17 +278,28 @@ impl ConfigToml {
         let provider = cli.provider.or(env.provider).unwrap_or(self.provider);
 
         let provider_cfg = self.providers.for_provider(provider);
+        let root_deepseek_api_key = (provider == ProviderKind::Deepseek)
+            .then(|| self.api_key.clone())
+            .flatten();
+        let root_deepseek_base_url = (provider == ProviderKind::Deepseek)
+            .then(|| self.base_url.clone())
+            .flatten();
+        let root_deepseek_model = (provider == ProviderKind::Deepseek)
+            .then(|| self.default_text_model.clone())
+            .flatten();
         let api_key = cli
             .api_key
             .clone()
             .or_else(|| env.api_key_for(provider))
-            .or_else(|| provider_cfg.api_key.clone());
+            .or_else(|| provider_cfg.api_key.clone())
+            .or(root_deepseek_api_key);
 
         let base_url = cli
             .base_url
             .clone()
             .or_else(|| env.base_url_for(provider))
             .or_else(|| provider_cfg.base_url.clone())
+            .or(root_deepseek_base_url)
             .unwrap_or_else(|| match provider {
                 ProviderKind::Deepseek => DEFAULT_DEEPSEEK_BASE_URL.to_string(),
                 ProviderKind::Openai => DEFAULT_OPENAI_BASE_URL.to_string(),
@@ -257,6 +310,7 @@ impl ConfigToml {
             .clone()
             .or_else(|| env.model.clone())
             .or_else(|| provider_cfg.model.clone())
+            .or(root_deepseek_model)
             .or_else(|| self.model.clone())
             .unwrap_or_else(|| match provider {
                 ProviderKind::Deepseek => DEFAULT_DEEPSEEK_MODEL.to_string(),
@@ -475,3 +529,117 @@ impl EnvRuntimeOverrides {
         }
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::env;
+    use std::ffi::OsString;
+    use std::sync::{Mutex, OnceLock};
+
+    fn env_lock() -> std::sync::MutexGuard<'static, ()> {
+        static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
+        LOCK.get_or_init(|| Mutex::new(())).lock().unwrap()
+    }
+
+    struct EnvGuard {
+        deepseek_api_key: Option<OsString>,
+        deepseek_base_url: Option<OsString>,
+        deepseek_model: Option<OsString>,
+        deepseek_provider: Option<OsString>,
+    }
+
+    impl EnvGuard {
+        fn without_deepseek_runtime_overrides() -> Self {
+            let guard = Self {
+                deepseek_api_key: env::var_os("DEEPSEEK_API_KEY"),
+                deepseek_base_url: env::var_os("DEEPSEEK_BASE_URL"),
+                deepseek_model: env::var_os("DEEPSEEK_MODEL"),
+                deepseek_provider: env::var_os("DEEPSEEK_PROVIDER"),
+            };
+            // SAFETY: test-only env mutation; callers hold `env_lock()` while this guard is alive.
+            unsafe {
+                env::remove_var("DEEPSEEK_API_KEY");
+                env::remove_var("DEEPSEEK_BASE_URL");
+                env::remove_var("DEEPSEEK_MODEL");
+                env::remove_var("DEEPSEEK_PROVIDER");
+            }
+            guard
+        }
+
+        unsafe fn restore_var(key: &str, value: Option<OsString>) {
+            if let Some(value) = value {
+                unsafe { env::set_var(key, value) };
+            } else {
+                unsafe { env::remove_var(key) };
+            }
+        }
+    }
+
+    impl Drop for EnvGuard {
+        fn drop(&mut self) {
+            // SAFETY: test-only env mutation; the guard drops before the `env_lock()` guard declared ahead of it.
+            unsafe {
+                Self::restore_var("DEEPSEEK_API_KEY", self.deepseek_api_key.take());
+                Self::restore_var("DEEPSEEK_BASE_URL", self.deepseek_base_url.take());
+                Self::restore_var("DEEPSEEK_MODEL", self.deepseek_model.take());
+                Self::restore_var("DEEPSEEK_PROVIDER", self.deepseek_provider.take());
+            }
+        }
+    }
+
+    #[test]
+    fn root_deepseek_fields_are_runtime_fallbacks() {
+        let _lock = env_lock();
+        let _env = EnvGuard::without_deepseek_runtime_overrides();
+        let config = ConfigToml {
+            api_key: Some("root-key".to_string()),
+            base_url: Some("https://api.deepseek.com".to_string()),
+            default_text_model: Some("deepseek-chat".to_string()),
+            ..ConfigToml::default()
+        };
+
+        let resolved = config.resolve_runtime_options(&CliRuntimeOverrides::default());
+
+        assert_eq!(resolved.provider, ProviderKind::Deepseek);
+        assert_eq!(resolved.api_key.as_deref(), Some("root-key"));
+        assert_eq!(resolved.base_url, "https://api.deepseek.com");
+        assert_eq!(resolved.model, "deepseek-chat");
+    }
+
+    #[test]
+    fn provider_specific_deepseek_fields_override_tui_compat_fields() {
+        let _lock = env_lock();
+        let _env = EnvGuard::without_deepseek_runtime_overrides();
+        let mut config = ConfigToml {
+            api_key: Some("root-key".to_string()),
+            base_url: Some("https://api.deepseek.com".to_string()),
+            default_text_model: Some("deepseek-chat".to_string()),
+            ..ConfigToml::default()
+        };
+        config.providers.deepseek.api_key = Some("provider-key".to_string());
+        config.providers.deepseek.base_url = Some("https://api.deepseeki.com".to_string());
+        config.providers.deepseek.model = Some("deepseek-reasoner".to_string());
+
+        let resolved = config.resolve_runtime_options(&CliRuntimeOverrides::default());
+
+        assert_eq!(resolved.api_key.as_deref(), Some("provider-key"));
+        assert_eq!(resolved.base_url, "https://api.deepseeki.com");
+        assert_eq!(resolved.model, "deepseek-reasoner");
+    }
+
+    #[test]
+    fn list_values_redacts_root_api_key() {
+        let config = ConfigToml {
+            api_key: Some("sk-deepseek-secret".to_string()),
+            ..ConfigToml::default()
+        };
+
+        let values = config.list_values();
+
+        assert_eq!(
+            values.get("api_key").map(String::as_str),
+            Some("sk-d***cret")
+        );
+    }
+}
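Reviewer note: the `resolve_runtime_options` hunks above place the root-level TUI fields beneath the provider-specific ones, which is what the two precedence tests exercise. Below is a minimal standalone sketch of that fallback order. `Layers`, `resolve`, and `DEFAULT_MODEL` are hypothetical names chosen for illustration and are not the crate's API; the generic `model` key that follows the root field in the real chain is left out for brevity.

```rust
/// Hypothetical stand-in for the places one setting can come from.
/// The real crate spreads these across CLI overrides, env overrides,
/// `providers.deepseek.*`, and the root TUI-compatible fields.
#[derive(Default)]
struct Layers {
    cli: Option<String>,
    env: Option<String>,
    provider_cfg: Option<String>,
    root_tui: Option<String>,
}

/// Assumed default, mirroring `DEFAULT_DEEPSEEK_MODEL` in the diff.
const DEFAULT_MODEL: &str = "deepseek-v4-pro";

/// Walk the layers from highest to lowest priority and fall back to the
/// built-in default when none of them is set.
fn resolve(layers: &Layers) -> String {
    layers
        .cli
        .clone()
        .or_else(|| layers.env.clone())
        .or_else(|| layers.provider_cfg.clone())
        .or_else(|| layers.root_tui.clone())
        .unwrap_or_else(|| DEFAULT_MODEL.to_string())
}
```

With only `root_tui` set, the root value is used; once `provider_cfg` is also set it takes priority, matching `provider_specific_deepseek_fields_override_tui_compat_fields`.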
diff --git a/crates/core/Cargo.toml b/crates/core/Cargo.toml
index 52a2022d..bf48386d 100644
--- a/crates/core/Cargo.toml
+++ b/crates/core/Cargo.toml
@@ -9,14 +9,14 @@ description = "Core runtime boundaries for DeepSeek workspace architecture"
 [dependencies]
 anyhow.workspace = true
 chrono.workspace = true
-deepseek-agent = { path = "../agent", version = "0.3.33" }
-deepseek-config = { path = "../config", version = "0.3.33" }
-deepseek-execpolicy = { path = "../execpolicy", version = "0.3.33" }
-deepseek-hooks = { path = "../hooks", version = "0.3.33" }
-deepseek-mcp = { path = "../mcp", version = "0.3.33" }
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
-deepseek-state = { path = "../state", version = "0.3.33" }
-deepseek-tools = { path = "../tools", version = "0.3.33" }
+deepseek-agent = { path = "../agent", version = "0.4.0" }
+deepseek-config = { path = "../config", version = "0.4.0" }
+deepseek-execpolicy = { path = "../execpolicy", version = "0.4.0" }
+deepseek-hooks = { path = "../hooks", version = "0.4.0" }
+deepseek-mcp = { path = "../mcp", version = "0.4.0" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
+deepseek-state = { path = "../state", version = "0.4.0" }
+deepseek-tools = { path = "../tools", version = "0.4.0" }
 serde_json.workspace = true
 tokio.workspace = true
 uuid.workspace = true
diff --git a/crates/execpolicy/Cargo.toml b/crates/execpolicy/Cargo.toml
index 09960f29..0cedaec1 100644
--- a/crates/execpolicy/Cargo.toml
+++ b/crates/execpolicy/Cargo.toml
@@ -8,5 +8,5 @@ description = "Execution policy and approval model parity for DeepSeek workspace
 
 [dependencies]
 anyhow.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
 serde.workspace = true
diff --git a/crates/hooks/Cargo.toml b/crates/hooks/Cargo.toml
index c9d7c08a..9d23fec8 100644
--- a/crates/hooks/Cargo.toml
+++ b/crates/hooks/Cargo.toml
@@ -10,7 +10,7 @@ description = "Hook dispatch and notifications parity for DeepSeek workspace arc
 anyhow.workspace = true
 async-trait.workspace = true
 chrono.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
 reqwest.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/crates/mcp/Cargo.toml b/crates/mcp/Cargo.toml
index a15bfa43..104167de 100644
--- a/crates/mcp/Cargo.toml
+++ b/crates/mcp/Cargo.toml
@@ -8,6 +8,6 @@ description = "MCP server lifecycle and tool proxy compatibility for DeepSeek wo
 
 [dependencies]
 anyhow.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
 serde.workspace = true
 serde_json.workspace = true
diff --git a/crates/tools/Cargo.toml b/crates/tools/Cargo.toml
index 9ffb6345..eb7cee71 100644
--- a/crates/tools/Cargo.toml
+++ b/crates/tools/Cargo.toml
@@ -9,7 +9,7 @@ description = "Tool invocation lifecycle, schema validation, and scheduler paral
 [dependencies]
 anyhow.workspace = true
 async-trait.workspace = true
-deepseek-protocol = { path = "../protocol", version = "0.3.33" }
+deepseek-protocol = { path = "../protocol", version = "0.4.0" }
 serde.workspace = true
 serde_json.workspace = true
 tokio.workspace = true
diff --git a/crates/tui/src/automation_manager.rs b/crates/tui/src/automation_manager.rs
index 6fcacbef..290d56e0 100644
--- a/crates/tui/src/automation_manager.rs
+++ b/crates/tui/src/automation_manager.rs
@@ -403,7 +403,7 @@ impl AutomationManager {
             }
             out.push(record);
         }
-        out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
+        out.sort_by_key(|r| std::cmp::Reverse(r.updated_at));
         Ok(out)
     }
 
@@ -522,7 +522,7 @@ impl AutomationManager {
             out.push(run);
         }
 
-        out.sort_by(|a, b| b.created_at.cmp(&a.created_at));
+        out.sort_by_key(|r| std::cmp::Reverse(r.created_at));
         if let Some(limit) = limit {
             out.truncate(limit);
         }
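Reviewer note: both hunks above replace a reversed comparator with `sort_by_key` plus `std::cmp::Reverse`. Here is a small sketch of why the two forms order records the same way, assuming a hypothetical `Record` type with an integer timestamp (the real field types are not shown in this diff).

```rust
use std::cmp::Reverse;

/// Hypothetical record; the real types carry timestamp fields such as
/// `updated_at`, modeled here as a plain integer.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: u32,
    updated_at: u64,
}

/// Sort newest first. `sort_by_key` is a stable sort, so records with
/// equal timestamps keep their original relative order.
fn newest_first(mut records: Vec<Record>) -> Vec<Record> {
    records.sort_by_key(|r| Reverse(r.updated_at));
    records
}

/// The comparator form the diff removes, kept for comparison.
fn newest_first_with_cmp(mut records: Vec<Record>) -> Vec<Record> {
    records.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
    records
}
```

Note that `sort_by_key` needs the key to be returned by value, so this form fits when the timestamp type is `Copy` or cheap to clone.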
diff --git a/crates/tui/src/client.rs b/crates/tui/src/client.rs
index 77f65a0f..8be8823c 100644
--- a/crates/tui/src/client.rs
+++ b/crates/tui/src/client.rs
@@ -1,7 +1,8 @@
-//! HTTP client for the DeepSeek OpenAI-compatible APIs.
+//! HTTP client for DeepSeek's OpenAI-compatible Chat Completions API.
 //!
-//! Uses the OpenAI Responses API when available, falling back to Chat Completions
-//! if the Responses endpoint is unsupported by the target base URL.
+//! DeepSeek documents `/chat/completions` as the primary endpoint. A legacy
+//! Responses probe remains available behind `DEEPSEEK_EXPERIMENTAL_RESPONSES_API`
+//! for local compatibility experiments, but normal traffic uses chat completions.
 
 use std::collections::HashSet;
 use std::pin::Pin;
@@ -134,15 +135,16 @@ pub struct DeepSeekClient {
     retry: RetryPolicy,
     default_model: String,
     use_chat_completions: AtomicBool,
-    /// Counter of chat-completions requests since last Responses API probe.
-    /// After RESPONSES_RECOVERY_INTERVAL requests, we retry the Responses API.
+    /// Counter of chat-completions requests since last experimental Responses API probe.
+    /// After RESPONSES_RECOVERY_INTERVAL requests, we retry the Responses API when
+    /// `DEEPSEEK_EXPERIMENTAL_RESPONSES_API` is set.
     chat_fallback_counter: AtomicU32,
     connection_health: Arc<AsyncMutex<ConnectionHealth>>,
     rate_limiter: Arc<AsyncMutex<TokenBucket>>,
 }
 
-/// After this many chat-completions requests, retry the Responses API to see
-/// if it has recovered.
+/// After this many chat-completions requests, retry the experimental Responses
+/// API to see if it has recovered.
 const RESPONSES_RECOVERY_INTERVAL: u32 = 20;
 const CONNECTION_FAILURE_THRESHOLD: u32 = 2;
 const RECOVERY_PROBE_COOLDOWN: Duration = Duration::from_secs(15);
@@ -150,6 +152,7 @@ const RECOVERY_PROBE_COOLDOWN: Duration = Duration::from_secs(15);
 const DEFAULT_CLIENT_RATE_LIMIT_RPS: f64 = 8.0;
 const DEFAULT_CLIENT_RATE_LIMIT_BURST: f64 = 16.0;
 const ALLOW_INSECURE_HTTP_ENV: &str = "DEEPSEEK_ALLOW_INSECURE_HTTP";
+const EXPERIMENTAL_RESPONSES_API_ENV: &str = "DEEPSEEK_EXPERIMENTAL_RESPONSES_API";
 
 const SSE_BACKPRESSURE_HIGH_WATERMARK: usize = 8 * 1024 * 1024; // 8 MB
 const SSE_BACKPRESSURE_SLEEP_MS: u64 = 10;
@@ -375,6 +378,30 @@ fn validate_base_url_security(base_url: &str) -> Result<()> {
     )
 }
 
+fn experimental_responses_api_enabled() -> bool {
+    std::env::var(EXPERIMENTAL_RESPONSES_API_ENV)
+        .ok()
+        .as_deref()
+        .is_some_and(|v| v == "1" || v.eq_ignore_ascii_case("true"))
+}
+
+fn versioned_base_url(base_url: &str) -> String {
+    let trimmed = base_url.trim_end_matches('/');
+    if trimmed.ends_with("/v1") || trimmed.ends_with("/beta") {
+        trimmed.to_string()
+    } else {
+        format!("{trimmed}/v1")
+    }
+}
+
+fn api_url(base_url: &str, path: &str) -> String {
+    format!(
+        "{}/{}",
+        versioned_base_url(base_url).trim_end_matches('/'),
+        path.trim_start_matches('/')
+    )
+}
+
 // === DeepSeekClient ===
 
 impl DeepSeekClient {
@@ -428,7 +455,7 @@ impl DeepSeekClient {
 
     /// List available models from the provider.
     pub async fn list_models(&self) -> Result<Vec<AvailableModel>> {
-        let url = format!("{}/v1/models", self.base_url.trim_end_matches('/'));
+        let url = api_url(&self.base_url, "models");
         let response = self.send_with_retry(|| self.http_client.get(&url)).await?;
 
         let status = response.status();
@@ -475,7 +502,7 @@ impl DeepSeekClient {
         if !should_probe {
             return;
         }
-        let health_url = format!("{}/v1/models", self.base_url.trim_end_matches('/'));
+        let health_url = api_url(&self.base_url, "models");
         let probe = self.http_client.get(health_url).send().await;
         match probe {
             Ok(resp) if resp.status().is_success() => {
@@ -581,8 +608,9 @@ impl DeepSeekClient {
         if let Some(choice) = request.tool_choice.as_ref() {
             body["tool_choice"] = choice.clone();
         }
+        apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref());
 
-        let url = format!("{}/v1/responses", self.base_url.trim_end_matches('/'));
+        let url = api_url(&self.base_url, "responses");
         let response = self
             .send_with_retry(|| self.http_client.post(&url).json(&body))
             .await?;
@@ -632,11 +660,9 @@ impl DeepSeekClient {
         {
             body["tool_choice"] = mapped;
         }
+        apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref());
 
-        let url = format!(
-            "{}/v1/chat/completions",
-            self.base_url.trim_end_matches('/')
-        );
+        let url = api_url(&self.base_url, "chat/completions");
         let response = self
             .send_with_retry(|| self.http_client.post(&url).json(&body))
             .await?;
@@ -666,7 +692,7 @@ impl LlmClient for DeepSeekClient {
     }
 
     async fn health_check(&self) -> Result<bool> {
-        let health_url = format!("{}/v1/models", self.base_url.trim_end_matches('/'));
+        let health_url = api_url(&self.base_url, "models");
         self.wait_for_rate_limit().await;
         let response = self.http_client.get(health_url).send().await;
         match response {
@@ -688,6 +714,10 @@ impl LlmClient for DeepSeekClient {
     }
 
     async fn create_message(&self, request: MessageRequest) -> Result<MessageResponse> {
+        if !experimental_responses_api_enabled() {
+            return self.create_message_chat(&request).await;
+        }
+
         // Check if it's time to probe Responses API recovery
         if self.use_chat_completions.load(Ordering::Relaxed) {
             let count = self.chat_fallback_counter.fetch_add(1, Ordering::Relaxed);
@@ -738,6 +768,9 @@ impl LlmClient for DeepSeekClient {
             "messages": messages,
             "max_tokens": request.max_tokens,
             "stream": true,
+            "stream_options": {
+                "include_usage": true
+            },
         });
 
         if let Some(temperature) = request.temperature {
@@ -754,11 +787,9 @@ impl LlmClient for DeepSeekClient {
         {
             body["tool_choice"] = mapped;
         }
+        apply_reasoning_effort(&mut body, request.reasoning_effort.as_deref());
 
-        let url = format!(
-            "{}/v1/chat/completions",
-            self.base_url.trim_end_matches('/')
-        );
+        let url = api_url(&self.base_url, "chat/completions");
         let response = self
             .send_with_retry(|| self.http_client.post(&url).json(&body))
             .await?;
@@ -789,7 +820,7 @@ impl LlmClient for DeepSeekClient {
                     usage: Usage {
                         input_tokens: 0,
                         output_tokens: 0,
-                        server_tool_use: None,
+                        ..Usage::default()
                     },
                 },
             });
@@ -1357,7 +1388,8 @@ fn build_chat_messages(
             let reasoning_content = thinking_parts.join("\n");
             let has_text = !content.trim().is_empty();
             let has_tool_calls = !tool_calls.is_empty();
-            let has_reasoning = include_reasoning && !reasoning_content.trim().is_empty();
+            let include_reasoning_for_turn = include_reasoning && has_tool_calls;
+            let has_reasoning = include_reasoning_for_turn && !reasoning_content.trim().is_empty();
 
             // DeepSeek rejects assistant messages where both `content` and
             // `tool_calls` are missing/null. Skip such entries even if they
@@ -1546,8 +1578,10 @@ fn tool_to_chat(tool: &Tool) -> Value {
     if let Some(input_examples) = &tool.input_examples {
         value["input_examples"] = json!(input_examples);
     }
-    if let Some(strict) = tool.strict {
-        value["strict"] = json!(strict);
+    if let Some(strict) = tool.strict
+        && let Some(function) = value.get_mut("function")
+    {
+        function["strict"] = json!(strict);
     }
     value
 }
@@ -1576,12 +1610,43 @@ fn map_tool_choice_for_chat(choice: &Value) -> Option<Value> {
 fn requires_reasoning_content(model: &str) -> bool {
     let lower = model.to_lowercase();
     lower.contains("deepseek-v3.2")
+        || lower.contains("deepseek-v4")
         || lower.contains("reasoner")
         || lower.contains("-reasoning")
         || lower.contains("-thinking")
         || has_deepseek_r_series_marker(&lower)
 }
 
+/// Translate the TUI's effort-tier string into DeepSeek's request fields.
+///
+/// The config surface accepts `off | low | medium | high | max` plus a few
+/// aliases. Per DeepSeek's docs, `low`/`medium` collapse to `"high"` and
+/// `xhigh` to `"max"`, so we send the collapsed value; `off` disables thinking.
+fn apply_reasoning_effort(body: &mut Value, effort: Option<&str>) {
+    let Some(effort) = effort else {
+        return;
+    };
+    let normalized = effort.trim().to_ascii_lowercase();
+    match normalized.as_str() {
+        "off" | "disabled" | "none" | "false" => {
+            body["thinking"] = json!({ "type": "disabled" });
+        }
+        "max" | "maximum" | "xhigh" => {
+            body["reasoning_effort"] = json!("max");
+            body["thinking"] = json!({ "type": "enabled" });
+        }
+        "low" | "minimal" | "medium" | "mid" | "high" | "" => {
+            // Per DeepSeek docs: low/medium compat-map to "high".
+            body["reasoning_effort"] = json!("high");
+            body["thinking"] = json!({ "type": "enabled" });
+        }
+        _ => {
+            // Unknown value — do not mutate the request, let the provider
+            // apply its own defaults.
+        }
+    }
+}
+
 fn has_deepseek_r_series_marker(model_lower: &str) -> bool {
     const PREFIX: &str = "deepseek-r";
     model_lower.match_indices(PREFIX).any(|(idx, _)| {
@@ -1701,6 +1766,19 @@ fn parse_usage(usage: Option<&Value>) -> Usage {
         })
         .and_then(Value::as_u64)
         .unwrap_or(0);
+    let prompt_cache_hit_tokens = usage
+        .and_then(|u| u.get("prompt_cache_hit_tokens"))
+        .and_then(Value::as_u64)
+        .map(|v| v as u32);
+    let prompt_cache_miss_tokens = usage
+        .and_then(|u| u.get("prompt_cache_miss_tokens"))
+        .and_then(Value::as_u64)
+        .map(|v| v as u32);
+    let reasoning_tokens = usage
+        .and_then(|u| u.get("completion_tokens_details"))
+        .and_then(|details| details.get("reasoning_tokens"))
+        .and_then(Value::as_u64)
+        .map(|v| v as u32);
 
     let server_tool_use = usage.and_then(|u| u.get("server_tool_use")).map(|server| {
         let code_execution_requests = server
@@ -1720,6 +1798,9 @@ fn parse_usage(usage: Option<&Value>) -> Usage {
     Usage {
         input_tokens: input_tokens as u32,
         output_tokens: output_tokens as u32,
+        prompt_cache_hit_tokens,
+        prompt_cache_miss_tokens,
+        reasoning_tokens,
         server_tool_use,
     }
 }
@@ -1843,6 +1924,20 @@ fn parse_sse_chunk(
         return events;
     };
 
+    if choices.is_empty() {
+        if let Some(usage_val) = chunk.get("usage") {
+            let usage = parse_usage(Some(usage_val));
+            events.push(StreamEvent::MessageDelta {
+                delta: MessageDelta {
+                    stop_reason: None,
+                    stop_sequence: None,
+                },
+                usage: Some(usage),
+            });
+        }
+        return events;
+    }
+
     for choice in choices {
         let delta = choice.get("delta");
         let finish_reason = choice
@@ -2077,7 +2172,23 @@ mod tests {
     }
 
     #[test]
-    fn chat_messages_include_reasoning_content_for_reasoner() {
+    fn api_url_handles_default_v1_and_beta_base_urls() {
+        assert_eq!(
+            api_url("https://api.deepseek.com", "chat/completions"),
+            "https://api.deepseek.com/v1/chat/completions"
+        );
+        assert_eq!(
+            api_url("https://api.deepseek.com/v1", "chat/completions"),
+            "https://api.deepseek.com/v1/chat/completions"
+        );
+        assert_eq!(
+            api_url("https://api.deepseek.com/beta", "chat/completions"),
+            "https://api.deepseek.com/beta/chat/completions"
+        );
+    }
+
+    #[test]
+    fn chat_messages_strip_reasoning_content_from_final_answer() {
         let message = Message {
             role: "assistant".to_string(),
             content: vec![
@@ -2096,9 +2207,10 @@ mod tests {
             .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
             .expect("assistant message");
         assert_eq!(
-            assistant.get("reasoning_content").and_then(Value::as_str),
-            Some("plan")
+            assistant.get("content").and_then(Value::as_str),
+            Some("done")
         );
+        assert!(assistant.get("reasoning_content").is_none());
     }
 
     #[test]
@@ -2117,7 +2229,7 @@ mod tests {
     }
 
     #[test]
-    fn chat_messages_preserve_thinking_only_assistant_for_reasoner_model() {
+    fn chat_messages_drop_thinking_only_assistant_for_reasoner_model() {
         let message = Message {
             role: "assistant".to_string(),
             content: vec![ContentBlock::Thinking {
@@ -2125,19 +2237,14 @@ mod tests {
             }],
         };
         let out = build_chat_messages(None, &[message], "deepseek-reasoner");
-        let assistant = out
-            .iter()
-            .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
-            .expect("assistant message");
-        assert_eq!(assistant.get("content").and_then(Value::as_str), Some(""));
-        assert_eq!(
-            assistant.get("reasoning_content").and_then(Value::as_str),
-            Some("plan")
+        assert!(
+            !out.iter()
+                .any(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
         );
     }
 
     #[test]
-    fn chat_messages_preserve_thinking_only_assistant_for_r_series_model() {
+    fn chat_messages_drop_thinking_only_assistant_for_r_series_model() {
         let message = Message {
             role: "assistant".to_string(),
             content: vec![ContentBlock::Thinking {
@@ -2145,6 +2252,47 @@ mod tests {
             }],
         };
         let out = build_chat_messages(None, &[message], "deepseek-r2-lite-preview");
+        assert!(
+            !out.iter()
+                .any(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
+        );
+    }
+
+    #[test]
+    fn chat_messages_preserve_current_tool_round_reasoning_for_reasoner_model() {
+        let messages = vec![
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Need the date".to_string(),
+                    cache_control: None,
+                }],
+            },
+            Message {
+                role: "assistant".to_string(),
+                content: vec![
+                    ContentBlock::Thinking {
+                        thinking: "Need to call a tool".to_string(),
+                    },
+                    ContentBlock::ToolUse {
+                        id: "tool-1".to_string(),
+                        name: "get_date".to_string(),
+                        input: json!({}),
+                        caller: None,
+                    },
+                ],
+            },
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::ToolResult {
+                    tool_use_id: "tool-1".to_string(),
+                    content: "2026-04-23".to_string(),
+                    is_error: None,
+                    content_blocks: None,
+                }],
+            },
+        ];
+        let out = build_chat_messages(None, &messages, "deepseek-reasoner");
         let assistant = out
             .iter()
             .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
@@ -2152,10 +2300,183 @@ mod tests {
         assert_eq!(assistant.get("content").and_then(Value::as_str), Some(""));
         assert_eq!(
             assistant.get("reasoning_content").and_then(Value::as_str),
-            Some("plan")
+            Some("Need to call a tool")
         );
     }
 
+    #[test]
+    fn chat_messages_preserve_prior_tool_round_reasoning_after_new_user_turn() {
+        let messages = vec![
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Need the date".to_string(),
+                    cache_control: None,
+                }],
+            },
+            Message {
+                role: "assistant".to_string(),
+                content: vec![
+                    ContentBlock::Thinking {
+                        thinking: "Need to call a tool".to_string(),
+                    },
+                    ContentBlock::ToolUse {
+                        id: "tool-1".to_string(),
+                        name: "get_date".to_string(),
+                        input: json!({}),
+                        caller: None,
+                    },
+                ],
+            },
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::ToolResult {
+                    tool_use_id: "tool-1".to_string(),
+                    content: "2026-04-23".to_string(),
+                    is_error: None,
+                    content_blocks: None,
+                }],
+            },
+            Message {
+                role: "assistant".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "It is 2026-04-23.".to_string(),
+                    cache_control: None,
+                }],
+            },
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Thanks. Next question.".to_string(),
+                    cache_control: None,
+                }],
+            },
+        ];
+        let out = build_chat_messages(None, &messages, "deepseek-reasoner");
+        let assistant = out
+            .iter()
+            .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
+            .expect("assistant message");
+        assert_eq!(
+            assistant.get("reasoning_content").and_then(Value::as_str),
+            Some("Need to call a tool")
+        );
+    }
+
+    #[test]
+    fn chat_messages_preserve_v4_tool_round_reasoning() {
+        let messages = vec![
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Use a tool".to_string(),
+                    cache_control: None,
+                }],
+            },
+            Message {
+                role: "assistant".to_string(),
+                content: vec![
+                    ContentBlock::Thinking {
+                        thinking: "Need a tool for this".to_string(),
+                    },
+                    ContentBlock::ToolUse {
+                        id: "call-1".to_string(),
+                        name: "read_file".to_string(),
+                        input: json!({"path": "Cargo.toml"}),
+                        caller: None,
+                    },
+                ],
+            },
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::ToolResult {
+                    tool_use_id: "call-1".to_string(),
+                    content: "workspace manifest".to_string(),
+                    is_error: None,
+                    content_blocks: None,
+                }],
+            },
+            Message {
+                role: "assistant".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Read it.".to_string(),
+                    cache_control: None,
+                }],
+            },
+            Message {
+                role: "user".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "Now continue.".to_string(),
+                    cache_control: None,
+                }],
+            },
+        ];
+
+        let out = build_chat_messages(None, &messages, "deepseek-v4-pro");
+        let assistant = out
+            .iter()
+            .find(|value| value.get("role").and_then(Value::as_str) == Some("assistant"))
+            .expect("assistant message");
+        assert_eq!(
+            assistant.get("reasoning_content").and_then(Value::as_str),
+            Some("Need a tool for this")
+        );
+        assert!(assistant.get("tool_calls").is_some());
+    }
+
+    #[test]
+    fn reasoning_effort_uses_deepseek_top_level_thinking_parameter() {
+        let mut body = json!({});
+        apply_reasoning_effort(&mut body, Some("max"));
+
+        assert_eq!(
+            body.get("reasoning_effort").and_then(Value::as_str),
+            Some("max")
+        );
+        assert_eq!(
+            body.pointer("/thinking/type").and_then(Value::as_str),
+            Some("enabled")
+        );
+        assert!(body.get("extra_body").is_none());
+    }
+
+    #[test]
+    fn reasoning_effort_off_disables_top_level_thinking() {
+        let mut body = json!({});
+        apply_reasoning_effort(&mut body, Some("off"));
+
+        assert_eq!(
+            body.pointer("/thinking/type").and_then(Value::as_str),
+            Some("disabled")
+        );
+        assert!(body.get("reasoning_effort").is_none());
+        assert!(body.get("extra_body").is_none());
+    }
+
+    #[test]
+    fn chat_tool_strict_flag_is_nested_under_function() {
+        let tool = Tool {
+            tool_type: Some("function".to_string()),
+            name: "emit_json".to_string(),
+            description: "Emit JSON".to_string(),
+            input_schema: json!({"type": "object", "properties": {}}),
+            allowed_callers: None,
+            defer_loading: None,
+            input_examples: None,
+            strict: Some(true),
+            cache_control: None,
+        };
+        let encoded = tool_to_chat(&tool);
+        assert_eq!(
+            encoded
+                .get("function")
+                .and_then(|function| function.get("strict"))
+                .and_then(Value::as_bool),
+            Some(true)
+        );
+        assert!(encoded.get("strict").is_none());
+    }
+
     #[test]
     fn chat_messages_drop_thinking_only_assistant_for_non_reasoning_model() {
         let message = Message {
@@ -2240,6 +2561,43 @@ mod tests {
         assert_eq!(deltas, vec![0, 1]);
     }
 
+    #[test]
+    fn parse_sse_chunk_handles_empty_choices_usage_chunk() {
+        let chunk = json!({
+            "choices": [],
+            "usage": {
+                "prompt_tokens": 100,
+                "completion_tokens": 20,
+                "prompt_cache_hit_tokens": 70,
+                "prompt_cache_miss_tokens": 30
+            }
+        });
+
+        let mut content_index = 0;
+        let mut text_started = false;
+        let mut thinking_started = false;
+        let mut tool_indices: std::collections::HashMap<u32, u32> =
+            std::collections::HashMap::new();
+        let events = parse_sse_chunk(
+            &chunk,
+            &mut content_index,
+            &mut text_started,
+            &mut thinking_started,
+            &mut tool_indices,
+            false,
+        );
+
+        let StreamEvent::MessageDelta {
+            usage: Some(usage), ..
+        } = &events[0]
+        else {
+            panic!("expected usage delta");
+        };
+        assert_eq!(usage.input_tokens, 100);
+        assert_eq!(usage.prompt_cache_hit_tokens, Some(70));
+        assert_eq!(usage.prompt_cache_miss_tokens, Some(30));
+    }
+
     #[test]
     fn chat_messages_drop_orphan_tool_results() {
         let messages = vec![Message {
@@ -2515,6 +2873,25 @@ mod tests {
         );
     }
 
+    #[test]
+    fn parse_usage_reads_deepseek_cache_and_reasoning_tokens() {
+        let usage = parse_usage(Some(&json!({
+            "prompt_tokens": 100,
+            "completion_tokens": 20,
+            "prompt_cache_hit_tokens": 70,
+            "prompt_cache_miss_tokens": 30,
+            "completion_tokens_details": {
+                "reasoning_tokens": 12
+            }
+        })));
+
+        assert_eq!(usage.input_tokens, 100);
+        assert_eq!(usage.output_tokens, 20);
+        assert_eq!(usage.prompt_cache_hit_tokens, Some(70));
+        assert_eq!(usage.prompt_cache_miss_tokens, Some(30));
+        assert_eq!(usage.reasoning_tokens, Some(12));
+    }
+
     #[test]
     fn token_bucket_enforces_delay_when_empty() {
         let now = Instant::now();
diff --git a/crates/tui/src/compaction.rs b/crates/tui/src/compaction.rs
index d886a7f0..11520d07 100644
--- a/crates/tui/src/compaction.rs
+++ b/crates/tui/src/compaction.rs
@@ -462,7 +462,9 @@ fn estimate_tokens_for_message(message: &Message) -> usize {
         .iter()
         .map(|c| match c {
             ContentBlock::Text { text, .. } => text.len() / 4,
-            ContentBlock::Thinking { thinking } => thinking.len() / 4,
+            // Historical reasoning blocks are UI/session metadata for DeepSeek.
+            // They are only sent back during an in-progress tool-call round.
+            ContentBlock::Thinking { .. } => 0,
             ContentBlock::ToolUse { input, .. } => serde_json::to_string(input)
                 .map(|s| s.len() / 4)
                 .unwrap_or(100),
@@ -819,6 +821,7 @@ async fn create_summary(
         tool_choice: None,
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(false),
         temperature: Some(0.3),
         top_p: None,
@@ -859,15 +862,14 @@ fn extract_workflow_context(messages: &[Message], workspace: Option<&Path>) -> S
                         files_touched.push(path);
                     }
                 }
-                ContentBlock::Text { text, .. } => {
+                ContentBlock::Text { text, .. }
                     // Look for task/todo mentions
-                    if text.contains("TODO") || text.contains("task") || text.contains("need to") {
+                    if (text.contains("TODO") || text.contains("task") || text.contains("need to")) => {
                         let task = truncate_chars(text, 200).to_string();
                         if !tasks_identified.contains(&task) {
                             tasks_identified.push(task);
                         }
                     }
-                }
                 _ => {}
             }
         }
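Reviewer note on the `estimate_tokens_for_message` hunk above: text and tool input are still costed at roughly four bytes per token, while thinking blocks now contribute nothing because, per the new comment, they are only sent back during an in-progress tool-call round. The following is a minimal sketch of that rule using simplified stand-in types; the `Block` enum and the `estimate_tokens` name are hypothetical and are not the crate's definitions.

```rust
/// Simplified stand-in for the crate's `ContentBlock` (illustration only).
enum Block {
    Text(String),
    Thinking(String),
    /// Tool input, assumed to be already serialized to a JSON string.
    ToolUse(String),
}

/// Rough estimate: about four bytes per token, thinking blocks excluded.
fn estimate_tokens(blocks: &[Block]) -> usize {
    blocks
        .iter()
        .map(|block| match block {
            Block::Text(text) => text.len() / 4,
            // Historical reasoning is not counted toward the estimate.
            Block::Thinking(_) => 0,
            Block::ToolUse(input_json) => input_json.len() / 4,
        })
        .sum()
}
```

One consequence worth keeping in mind: a message consisting only of a long thinking block now estimates to zero tokens, so compaction thresholds are driven by text and tool traffic alone.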
diff --git a/crates/tui/src/config.rs b/crates/tui/src/config.rs
index 1ca4fa35..59432916 100644
--- a/crates/tui/src/config.rs
+++ b/crates/tui/src/config.rs
@@ -15,14 +15,24 @@ use crate::hooks::HooksConfig;
 
 pub const DEFAULT_MAX_SUBAGENTS: usize = 5;
 pub const MAX_SUBAGENTS: usize = 20;
-pub const DEFAULT_TEXT_MODEL: &str = "deepseek-reasoner";
+pub const DEFAULT_TEXT_MODEL: &str = "deepseek-v4-pro";
 const API_KEYRING_SENTINEL: &str = "__KEYRING__";
-pub const COMMON_DEEPSEEK_MODELS: &[&str] = &["deepseek-chat", "deepseek-reasoner"];
+pub const COMMON_DEEPSEEK_MODELS: &[&str] = &[
+    "deepseek-v4-pro",
+    "deepseek-v4-flash",
+    "deepseek-chat",
+    "deepseek-reasoner",
+];
 
 /// Canonicalize common model aliases to stable DeepSeek IDs.
+///
+/// Legacy `deepseek-chat` / `deepseek-reasoner` remain as silent aliases: they
+/// resolve to themselves for API compatibility and are priced as `deepseek-v4-flash`.
 #[must_use]
 pub fn canonical_model_name(model: &str) -> Option<&'static str> {
     match model.trim().to_ascii_lowercase().as_str() {
+        "deepseek-v4-pro" | "deepseek-v4pro" => Some("deepseek-v4-pro"),
+        "deepseek-v4-flash" | "deepseek-v4flash" => Some("deepseek-v4-flash"),
         "deepseek-chat" | "deepseek-v3" | "deepseek-v3.2" => Some("deepseek-chat"),
         "deepseek-reasoner" | "deepseek-r1" => Some("deepseek-reasoner"),
         _ => None,
@@ -123,6 +133,9 @@ pub struct Config {
     pub api_key: Option<String>,
     pub base_url: Option<String>,
     pub default_text_model: Option<String>,
+    /// DeepSeek reasoning-effort tier: `"off" | "low" | "medium" | "high" | "max"`.
+    /// Defaults to `"max"` at runtime if unset.
+    pub reasoning_effort: Option<String>,
     pub tools_file: Option<String>,
     pub skills_dir: Option<String>,
     pub mcp_config_path: Option<String>,
@@ -215,7 +228,7 @@ impl Config {
             && normalize_model_name(model).is_none()
         {
             anyhow::bail!(
-                "Invalid default_text_model '{model}': expected a DeepSeek model ID (for example: deepseek-chat, deepseek-reasoner, deepseek-v4)."
+                "Invalid default_text_model '{model}': expected a DeepSeek model ID (for example: deepseek-v4-pro, deepseek-v4-flash)."
             );
         }
         if let Some(policy) = self.approval_policy.as_deref() {
@@ -373,6 +386,12 @@ impl Config {
             .clamp(1, MAX_SUBAGENTS)
     }
 
+    /// Return the configured DeepSeek reasoning-effort tier, if any.
+    #[must_use]
+    pub fn reasoning_effort(&self) -> Option<&str> {
+        self.reasoning_effort.as_deref()
+    }
+
     /// Get hooks configuration, returning default if not configured.
     pub fn hooks_config(&self) -> HooksConfig {
         self.hooks.clone().unwrap_or_default()
@@ -566,6 +585,11 @@ fn apply_env_overrides(config: &mut Config) {
     if let Ok(value) = std::env::var("DEEPSEEK_BASE_URL") {
         config.base_url = Some(value);
     }
+    if let Ok(value) =
+        std::env::var("DEEPSEEK_MODEL").or_else(|_| std::env::var("DEEPSEEK_DEFAULT_TEXT_MODEL"))
+    {
+        config.default_text_model = Some(value);
+    }
     if let Ok(value) = std::env::var("DEEPSEEK_SKILLS_DIR") {
         config.skills_dir = Some(value);
     }
@@ -753,6 +777,7 @@ fn merge_config(base: Config, override_cfg: Config) -> Config {
         api_key: override_cfg.api_key.or(base.api_key),
         base_url: override_cfg.base_url.or(base.base_url),
         default_text_model: override_cfg.default_text_model.or(base.default_text_model),
+        reasoning_effort: override_cfg.reasoning_effort.or(base.reasoning_effort),
         tools_file: override_cfg.tools_file.or(base.tools_file),
         skills_dir: override_cfg.skills_dir.or(base.skills_dir),
         mcp_config_path: override_cfg.mcp_config_path.or(base.mcp_config_path),
@@ -926,6 +951,11 @@ api_key = "{key_to_write}"
 
 # Default model
 default_text_model = "{default_model}"
+
+# Thinking mode (DeepSeek V4 reasoning effort):
+# "off" | "low" | "medium" | "high" | "max"
+# Shift+Tab in the TUI cycles between off / high / max.
+reasoning_effort = "max"
 "#,
             default_model = DEFAULT_TEXT_MODEL
         )
@@ -1007,6 +1037,8 @@ mod tests {
         userprofile: Option<OsString>,
         deepseek_config_path: Option<OsString>,
         deepseek_api_key: Option<OsString>,
+        deepseek_model: Option<OsString>,
+        deepseek_default_text_model: Option<OsString>,
     }
 
     impl EnvGuard {
@@ -1018,18 +1050,24 @@ mod tests {
             let userprofile_prev = env::var_os("USERPROFILE");
             let deepseek_config_prev = env::var_os("DEEPSEEK_CONFIG_PATH");
             let api_key_prev = env::var_os("DEEPSEEK_API_KEY");
+            let model_prev = env::var_os("DEEPSEEK_MODEL");
+            let default_text_model_prev = env::var_os("DEEPSEEK_DEFAULT_TEXT_MODEL");
             // Safety: test-only environment mutation guarded by a global mutex.
             unsafe {
                 env::set_var("HOME", &home_str);
                 env::set_var("USERPROFILE", &home_str);
                 env::set_var("DEEPSEEK_CONFIG_PATH", &config_str);
                 env::remove_var("DEEPSEEK_API_KEY");
+                env::remove_var("DEEPSEEK_MODEL");
+                env::remove_var("DEEPSEEK_DEFAULT_TEXT_MODEL");
             }
             Self {
                 home: home_prev,
                 userprofile: userprofile_prev,
                 deepseek_config_path: deepseek_config_prev,
                 deepseek_api_key: api_key_prev,
+                deepseek_model: model_prev,
+                deepseek_default_text_model: default_text_model_prev,
             }
         }
     }
@@ -1042,6 +1080,11 @@ mod tests {
                 Self::restore_var("USERPROFILE", self.userprofile.take());
                 Self::restore_var("DEEPSEEK_CONFIG_PATH", self.deepseek_config_path.take());
                 Self::restore_var("DEEPSEEK_API_KEY", self.deepseek_api_key.take());
+                Self::restore_var("DEEPSEEK_MODEL", self.deepseek_model.take());
+                Self::restore_var(
+                    "DEEPSEEK_DEFAULT_TEXT_MODEL",
+                    self.deepseek_default_text_model.take(),
+                );
             }
         }
     }
@@ -1280,4 +1323,29 @@ mod tests {
         config.validate()?;
         Ok(())
     }
+
+    #[test]
+    fn deepseek_model_env_overrides_default_text_model() -> Result<()> {
+        let _lock = lock_test_env();
+        let nanos = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_nanos();
+        let temp_root = env::temp_dir().join(format!(
+            "deepseek-tui-model-env-test-{}-{}",
+            std::process::id(),
+            nanos
+        ));
+        fs::create_dir_all(&temp_root)?;
+        let _guard = EnvGuard::new(&temp_root);
+
+        // Safety: test-only environment mutation guarded by a global mutex.
+        unsafe {
+            env::set_var("DEEPSEEK_MODEL", "deepseek-chat");
+        }
+
+        let config = Config::load(None, None)?;
+        assert_eq!(config.default_text_model.as_deref(), Some("deepseek-chat"));
+        Ok(())
+    }
 }
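Reviewer note on the `apply_env_overrides` hunk above: `DEEPSEEK_MODEL` is consulted first and `DEEPSEEK_DEFAULT_TEXT_MODEL` only when the former is unset. The sketch below restates that precedence with an injected lookup so it can be exercised without mutating the process environment. The `model_override` function and its `lookup` parameter are hypothetical names for illustration, and the sketch uses `Option` where the real code chains `Result` values from `std::env::var`.

```rust
/// Pick the model override, preferring `DEEPSEEK_MODEL` and falling back to
/// `DEEPSEEK_DEFAULT_TEXT_MODEL`. `lookup` stands in for an environment read.
fn model_override<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup("DEEPSEEK_MODEL").or_else(|| lookup("DEEPSEEK_DEFAULT_TEXT_MODEL"))
}
```

In production the lookup could be `|key| std::env::var(key).ok()`. Injecting it is a design choice for testability: the test in the hunk above needs an `unsafe` block and a global mutex precisely because it edits real environment variables.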
diff --git a/crates/tui/src/core/engine.rs b/crates/tui/src/core/engine.rs
index 5f4ccafa..67cbc732 100644
--- a/crates/tui/src/core/engine.rs
+++ b/crates/tui/src/core/engine.rs
@@ -30,7 +30,7 @@ use crate::llm_client::LlmClient;
 use crate::mcp::McpPool;
 use crate::models::{
     ContentBlock, ContentBlockStart, DEFAULT_CONTEXT_WINDOW_TOKENS, Delta, Message, MessageRequest,
-    StreamEvent, SystemPrompt, Tool, ToolCaller, Usage, context_window_for_model,
+    StreamEvent, SystemBlock, SystemPrompt, Tool, ToolCaller, Usage, context_window_for_model,
 };
 use crate::prompts;
 use crate::tools::plan::{SharedPlanState, new_shared_plan_state};
@@ -360,6 +360,7 @@ const TOOL_RESULT_CONTEXT_SNIPPET_CHARS: usize = 900;
 /// Max chars to keep from metadata-provided output summaries.
 const TOOL_RESULT_METADATA_SUMMARY_CHARS: usize = 320;
 const COMPACTION_SUMMARY_MARKER: &str = "Conversation Summary (Auto-Generated)";
+const WORKING_SET_SUMMARY_MARKER: &str = "## Repo Working Set";
 
 const TOOL_CALL_START_MARKERS: [&str; 5] = [
     "[TOOL_CALL]",
@@ -1127,6 +1128,56 @@ fn extract_compaction_summary_prompt(prompt: Option<SystemPrompt>) -> Option<Sys
     }
 }
 
+fn remove_working_set_summary(prompt: Option<&SystemPrompt>) -> Option<SystemPrompt> {
+    match prompt {
+        Some(SystemPrompt::Blocks(blocks)) => {
+            let filtered: Vec<SystemBlock> = blocks
+                .iter()
+                .filter(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER))
+                .cloned()
+                .collect();
+            if filtered.is_empty() {
+                None
+            } else {
+                Some(SystemPrompt::Blocks(filtered))
+            }
+        }
+        Some(SystemPrompt::Text(text)) => Some(SystemPrompt::Text(text.clone())),
+        None => None,
+    }
+}
+
+fn append_working_set_summary(
+    prompt: Option<SystemPrompt>,
+    working_set_summary: Option<&str>,
+) -> Option<SystemPrompt> {
+    let Some(summary) = working_set_summary.map(str::trim).filter(|s| !s.is_empty()) else {
+        return prompt;
+    };
+    let working_set_block = SystemBlock {
+        block_type: "text".to_string(),
+        text: summary.to_string(),
+        cache_control: None,
+    };
+
+    match prompt {
+        Some(SystemPrompt::Text(text)) => Some(SystemPrompt::Blocks(vec![
+            SystemBlock {
+                block_type: "text".to_string(),
+                text,
+                cache_control: None,
+            },
+            working_set_block,
+        ])),
+        Some(SystemPrompt::Blocks(mut blocks)) => {
+            blocks.retain(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER));
+            blocks.push(working_set_block);
+            Some(SystemPrompt::Blocks(blocks))
+        }
+        None => Some(SystemPrompt::Blocks(vec![working_set_block])),
+    }
+}
+
 fn estimate_text_tokens_conservative(text: &str) -> usize {
     text.chars().count().div_ceil(3)
 }
@@ -1231,12 +1282,10 @@ impl Engine {
 
         // Set up system prompt with project context (default to agent mode)
         let working_set_summary = session.working_set.summary_block(&config.workspace);
-        let system_prompt = prompts::system_prompt_for_mode_with_context(
-            AppMode::Agent,
-            &config.workspace,
-            working_set_summary.as_deref(),
-        );
-        session.system_prompt = Some(system_prompt);
+        let system_prompt =
+            prompts::system_prompt_for_mode_with_context(AppMode::Agent, &config.workspace, None);
+        session.system_prompt =
+            append_working_set_summary(Some(system_prompt), working_set_summary.as_deref());
 
         let subagent_manager =
             new_shared_subagent_manager(config.workspace.clone(), config.max_subagents);
@@ -1285,6 +1334,7 @@ impl Engine {
                     content,
                     mode,
                     model,
+                    reasoning_effort,
                     allow_shell,
                     trust_mode,
                     auto_approve,
@@ -1293,6 +1343,7 @@ impl Engine {
                         content,
                         mode,
                         model,
+                        reasoning_effort,
                         allow_shell,
                         trust_mode,
                         auto_approve,
@@ -1444,11 +1495,13 @@ impl Engine {
     }
 
     /// Handle a send message operation
+    #[allow(clippy::too_many_arguments)]
     async fn handle_send_message(
         &mut self,
         content: String,
         mode: AppMode,
         model: String,
+        reasoning_effort: Option<String>,
         allow_shell: bool,
         trust_mode: bool,
         auto_approve: bool,
@@ -1511,6 +1564,7 @@ impl Engine {
 
         self.session.model = model;
         self.config.model.clone_from(&self.session.model);
+        self.session.reasoning_effort = reasoning_effort;
         self.session.allow_shell = allow_shell;
         self.config.allow_shell = allow_shell;
         self.session.trust_mode = trust_mode;
@@ -1650,7 +1704,7 @@ impl Engine {
         let zero_usage = Usage {
             input_tokens: 0,
             output_tokens: 0,
-            server_tool_use: None,
+            ..Usage::default()
         };
         let Some(client) = self.deepseek_client.clone() else {
             let message = "Manual compaction unavailable: API client not configured".to_string();
@@ -2428,6 +2482,7 @@ impl Engine {
                 },
                 metadata: None,
                 thinking: None,
+                reasoning_effort: self.session.reasoning_effort.clone(),
                 stream: Some(true),
                 temperature: None,
                 top_p: None,
@@ -2471,7 +2526,7 @@ impl Engine {
             let mut usage = Usage {
                 input_tokens: 0,
                 output_tokens: 0,
-                server_tool_use: None,
+                ..Usage::default()
             };
             let mut current_block_kind: Option<ContentBlockKind> = None;
             let mut current_tool_index: Option<usize> = None;
@@ -4277,13 +4332,11 @@ impl Engine {
             .session
             .working_set
             .summary_block(&self.config.workspace);
-        let base = prompts::system_prompt_for_mode_with_context(
-            mode,
-            &self.config.workspace,
-            working_set_summary.as_deref(),
-        );
-        self.session.system_prompt =
+        let base = prompts::system_prompt_for_mode_with_context(mode, &self.config.workspace, None);
+        let stable_prompt =
             merge_system_prompts(Some(&base), self.session.compaction_summary_prompt.clone());
+        self.session.system_prompt =
+            append_working_set_summary(stable_prompt, working_set_summary.as_deref());
     }
 
     fn merge_compaction_summary(&mut self, summary_prompt: Option<SystemPrompt>) {
@@ -4294,8 +4347,15 @@ impl Engine {
             self.session.compaction_summary_prompt.as_ref(),
             summary_prompt.clone(),
         );
+        let current_without_working_set =
+            remove_working_set_summary(self.session.system_prompt.as_ref());
+        let merged = merge_system_prompts(current_without_working_set.as_ref(), summary_prompt);
+        let working_set_summary = self
+            .session
+            .working_set
+            .summary_block(&self.config.workspace);
         self.session.system_prompt =
-            merge_system_prompts(self.session.system_prompt.as_ref(), summary_prompt);
+            append_working_set_summary(merged, working_set_summary.as_deref());
     }
 }
 
diff --git a/crates/tui/src/core/engine/tests.rs b/crates/tui/src/core/engine/tests.rs
index de510121..7a3071e5 100644
--- a/crates/tui/src/core/engine/tests.rs
+++ b/crates/tui/src/core/engine/tests.rs
@@ -221,12 +221,80 @@ fn detects_context_length_errors_from_provider_payloads() {
 
 #[test]
 fn context_budget_reserves_output_and_headroom() {
-    let budget = context_input_budget("deepseek-reasoner", TURN_MAX_OUTPUT_TOKENS)
+    let budget = context_input_budget("deepseek-v3.2-128k", TURN_MAX_OUTPUT_TOKENS)
         .expect("deepseek models should have known context window");
     let expected = 128_000usize - 4_096usize - 1_024usize;
     assert_eq!(budget, expected);
 }
 
+#[test]
+fn refresh_system_prompt_places_working_set_after_stable_prefix() {
+    let tmp = tempdir().expect("tempdir");
+    fs::create_dir_all(tmp.path().join("src")).expect("mkdir");
+    fs::write(tmp.path().join("src/lib.rs"), "pub fn sample() {}").expect("write");
+
+    let config = EngineConfig {
+        workspace: tmp.path().to_path_buf(),
+        ..Default::default()
+    };
+    let (mut engine, _handle) = Engine::new(config, &Config::default());
+    engine
+        .session
+        .working_set
+        .observe_user_message("please inspect src/lib.rs", tmp.path());
+
+    engine.refresh_system_prompt(AppMode::Agent);
+
+    let Some(SystemPrompt::Blocks(blocks)) = &engine.session.system_prompt else {
+        panic!("expected structured prompt blocks");
+    };
+    let last = blocks.last().expect("working-set block");
+    assert!(last.text.contains(WORKING_SET_SUMMARY_MARKER));
+    assert!(
+        blocks[..blocks.len() - 1]
+            .iter()
+            .all(|block| !block.text.contains(WORKING_SET_SUMMARY_MARKER))
+    );
+}
+
+#[test]
+fn compaction_summary_stays_before_volatile_working_set() {
+    let tmp = tempdir().expect("tempdir");
+    fs::create_dir_all(tmp.path().join("src")).expect("mkdir");
+    fs::write(tmp.path().join("src/main.rs"), "fn main() {}").expect("write");
+
+    let config = EngineConfig {
+        workspace: tmp.path().to_path_buf(),
+        ..Default::default()
+    };
+    let (mut engine, _handle) = Engine::new(config, &Config::default());
+    engine
+        .session
+        .working_set
+        .observe_user_message("continue in src/main.rs", tmp.path());
+    engine.refresh_system_prompt(AppMode::Agent);
+    engine.merge_compaction_summary(Some(SystemPrompt::Blocks(vec![SystemBlock {
+        block_type: "text".to_string(),
+        text: format!("{COMPACTION_SUMMARY_MARKER}\nsummary"),
+        cache_control: None,
+    }])));
+
+    let Some(SystemPrompt::Blocks(blocks)) = &engine.session.system_prompt else {
+        panic!("expected structured prompt blocks");
+    };
+    let summary_index = blocks
+        .iter()
+        .position(|block| block.text.contains(COMPACTION_SUMMARY_MARKER))
+        .expect("summary block");
+    let working_set_index = blocks
+        .iter()
+        .position(|block| block.text.contains(WORKING_SET_SUMMARY_MARKER))
+        .expect("working-set block");
+
+    assert!(summary_index < working_set_index);
+    assert_eq!(working_set_index, blocks.len() - 1);
+}
+
 #[tokio::test]
 async fn pre_request_refresh_invoked_when_medium_risk() {
     let capacity = CapacityControllerConfig {
@@ -245,6 +313,11 @@ async fn pre_request_refresh_invoked_when_medium_risk() {
         .capacity_controller
         .mark_turn_start(engine.turn_counter);
 
+    // Pin the model to an explicit 128k-context variant so the pressure ratio stays
+    // stable regardless of changes to the workspace-wide default model.
+    engine.session.model = "deepseek-v3.2-128k".to_string();
+    engine.config.model = "deepseek-v3.2-128k".to_string();
+
     let long = "x".repeat(5_000);
     for _ in 0..200 {
         engine.session.messages.push(Message {
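Reviewer note on the two prompt-ordering tests earlier in this file: both assert that the working-set block is the last system block and appears exactly once. Presumably (this is an inference from the "stable prefix" and "volatile" wording in the test names, not something the diff states) the goal is to keep the earlier blocks identical across turns. The sketch below models the ordering rule on plain strings. The `with_working_set` function is a hypothetical simplification of `append_working_set_summary`, covering only the block-list case.

```rust
const WORKING_SET_MARKER: &str = "## Repo Working Set";

/// Return `blocks` with any previous working-set block removed and the new
/// summary appended last. An empty or whitespace-only summary leaves the
/// input untouched.
fn with_working_set(mut blocks: Vec<String>, summary: Option<&str>) -> Vec<String> {
    let Some(summary) = summary.map(str::trim).filter(|s| !s.is_empty()) else {
        return blocks;
    };
    // Drop the stale copy first so the marker never appears twice.
    blocks.retain(|block| !block.contains(WORKING_SET_MARKER));
    blocks.push(summary.to_string());
    blocks
}
```

Because the stale block is removed before the push, calling the function repeatedly with fresh summaries leaves everything before the final block unchanged.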
diff --git a/crates/tui/src/core/ops.rs b/crates/tui/src/core/ops.rs
index f33f3558..efb42bcb 100644
--- a/crates/tui/src/core/ops.rs
+++ b/crates/tui/src/core/ops.rs
@@ -16,6 +16,9 @@ pub enum Op {
         content: String,
         mode: AppMode,
         model: String,
+        /// Reasoning-effort tier: `"off" | "low" | "medium" | "high" | "max"`.
+        /// `None` lets the provider apply its default.
+        reasoning_effort: Option<String>,
         allow_shell: bool,
         trust_mode: bool,
         auto_approve: bool,
@@ -72,6 +75,7 @@ impl Op {
         content: impl Into<String>,
         mode: AppMode,
         model: impl Into<String>,
+        reasoning_effort: Option<String>,
         allow_shell: bool,
         trust_mode: bool,
         auto_approve: bool,
@@ -80,6 +84,7 @@ impl Op {
             content: content.into(),
             mode,
             model: model.into(),
+            reasoning_effort,
             allow_shell,
             trust_mode,
             auto_approve,
diff --git a/crates/tui/src/core/session.rs b/crates/tui/src/core/session.rs
index 59a964f8..37629848 100644
--- a/crates/tui/src/core/session.rs
+++ b/crates/tui/src/core/session.rs
@@ -13,6 +13,11 @@ pub struct Session {
     /// Model being used
     pub model: String,
 
+    /// Reasoning-effort tier for DeepSeek thinking mode:
+    /// `"off" | "low" | "medium" | "high" | "max"`. `None` lets the provider
+    /// apply its own defaults.
+    pub reasoning_effort: Option<String>,
+
     /// Workspace directory
     pub workspace: PathBuf,
 
@@ -69,6 +74,12 @@ impl SessionUsage {
     pub fn add(&mut self, usage: &Usage) {
         self.input_tokens += u64::from(usage.input_tokens);
         self.output_tokens += u64::from(usage.output_tokens);
+        if let Some(tokens) = usage.prompt_cache_miss_tokens {
+            self.cache_creation_input_tokens += u64::from(tokens);
+        }
+        if let Some(tokens) = usage.prompt_cache_hit_tokens {
+            self.cache_read_input_tokens += u64::from(tokens);
+        }
     }
 }
 
@@ -88,6 +99,7 @@ impl Session {
 
         Self {
             model,
+            reasoning_effort: None,
             workspace,
             system_prompt: None,
             compaction_summary_prompt: None,
diff --git a/crates/tui/src/core/turn.rs b/crates/tui/src/core/turn.rs
index b14c1733..ea187710 100644
--- a/crates/tui/src/core/turn.rs
+++ b/crates/tui/src/core/turn.rs
@@ -57,7 +57,7 @@ impl TurnContext {
             usage: Usage {
                 input_tokens: 0,
                 output_tokens: 0,
-                server_tool_use: None,
+                ..Usage::default()
             },
         }
     }
@@ -94,6 +94,25 @@ impl TurnContext {
     pub fn add_usage(&mut self, usage: &Usage) {
         self.usage.input_tokens += usage.input_tokens;
         self.usage.output_tokens += usage.output_tokens;
+        self.usage.prompt_cache_hit_tokens = add_optional_usage(
+            self.usage.prompt_cache_hit_tokens,
+            usage.prompt_cache_hit_tokens,
+        );
+        self.usage.prompt_cache_miss_tokens = add_optional_usage(
+            self.usage.prompt_cache_miss_tokens,
+            usage.prompt_cache_miss_tokens,
+        );
+        self.usage.reasoning_tokens =
+            add_optional_usage(self.usage.reasoning_tokens, usage.reasoning_tokens);
+    }
+}
+
+fn add_optional_usage(total: Option<u32>, delta: Option<u32>) -> Option<u32> {
+    match (total, delta) {
+        (Some(total), Some(delta)) => Some(total.saturating_add(delta)),
+        (None, Some(delta)) => Some(delta),
+        (Some(total), None) => Some(total),
+        (None, None) => None,
     }
 }
 
diff --git a/crates/tui/src/main.rs b/crates/tui/src/main.rs
index 404eabca..304d25bf 100644
--- a/crates/tui/src/main.rs
+++ b/crates/tui/src/main.rs
@@ -1264,6 +1264,7 @@ async fn test_api_connectivity(config: &Config) -> Result<String> {
         tool_choice: None,
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(false),
         temperature: None,
         top_p: None,
@@ -1548,6 +1549,7 @@ Provide findings ordered by severity with file references, then open questions,
         tool_choice: None,
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(false),
         temperature: Some(0.2),
         top_p: Some(0.9),
@@ -2175,6 +2177,7 @@ async fn run_one_shot(config: &Config, model: &str, prompt: &str) -> Result<()>
         tool_choice: None,
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(false),
         temperature: None,
         top_p: None,
@@ -2213,6 +2216,7 @@ async fn run_one_shot_json(config: &Config, model: &str, prompt: &str) -> Result
         tool_choice: None,
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(false),
         temperature: Some(0.2),
         top_p: Some(0.9),
@@ -2293,6 +2297,7 @@ async fn run_exec_agent(
             prompt,
             mode,
             model,
+            None,
             auto_approve || config.allow_shell(),
             trust_mode,
             auto_approve,
@@ -2343,25 +2348,19 @@ async fn run_exec_agent(
                 }
                 ends_with_newline = content.ends_with('\n');
             }
-            Event::MessageComplete { .. } => {
-                if !json_output && !ends_with_newline {
-                    println!();
+            Event::MessageComplete { .. } if !json_output && !ends_with_newline => {
+                println!();
+            }
+            Event::ToolCallStarted { name, input, .. } if !json_output => {
+                let summary = summarize_tool_args(&input);
+                if let Some(summary) = summary {
+                    eprintln!("tool: {name} ({summary})");
+                } else {
+                    eprintln!("tool: {name}");
                 }
             }
-            Event::ToolCallStarted { name, input, .. } => {
-                if !json_output {
-                    let summary = summarize_tool_args(&input);
-                    if let Some(summary) = summary {
-                        eprintln!("tool: {name} ({summary})");
-                    } else {
-                        eprintln!("tool: {name}");
-                    }
-                }
-            }
-            Event::ToolCallProgress { id, output } => {
-                if !json_output {
-                    eprintln!("tool {id}: {}", summarize_tool_output(&output));
-                }
+            Event::ToolCallProgress { id, output } if !json_output => {
+                eprintln!("tool {id}: {}", summarize_tool_output(&output));
             }
             Event::ToolCallComplete { name, result, .. } => match result {
                 Ok(output) => {
diff --git a/crates/tui/src/models.rs b/crates/tui/src/models.rs
index 89c248eb..0ecdaae8 100644
--- a/crates/tui/src/models.rs
+++ b/crates/tui/src/models.rs
@@ -3,11 +3,12 @@
 use serde::{Deserialize, Serialize};
 
 pub const DEFAULT_CONTEXT_WINDOW_TOKENS: u32 = 128_000;
+pub const DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS: u32 = 1_000_000;
 pub const DEFAULT_COMPACTION_TOKEN_THRESHOLD: usize = 50_000;
 pub const DEFAULT_COMPACTION_MESSAGE_THRESHOLD: usize = 50;
 const COMPACTION_THRESHOLD_PERCENT: u32 = 80;
 const COMPACTION_MESSAGE_DIVISOR: u32 = 1200;
-const MAX_COMPACTION_MESSAGE_THRESHOLD: usize = 150;
+const MAX_COMPACTION_MESSAGE_THRESHOLD: usize = 500;
 
 // === Core Message Types ===
 
@@ -27,6 +28,10 @@ pub struct MessageRequest {
     pub metadata: Option<serde_json::Value>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub thinking: Option<serde_json::Value>,
+    /// DeepSeek reasoning-effort tier: "off" | "low" | "medium" | "high" | "max".
+    /// Translated by the client into DeepSeek's `reasoning_effort` + `thinking` fields.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub reasoning_effort: Option<String>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub stream: Option<bool>,
     #[serde(skip_serializing_if = "Option::is_none")]
@@ -176,11 +181,17 @@ pub struct MessageResponse {
 }
 
 /// Token usage metadata for a response.
-#[derive(Debug, Serialize, Deserialize, Clone)]
+#[derive(Debug, Serialize, Deserialize, Clone, Default)]
 pub struct Usage {
     pub input_tokens: u32,
     pub output_tokens: u32,
     #[serde(skip_serializing_if = "Option::is_none")]
+    pub prompt_cache_hit_tokens: Option<u32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub prompt_cache_miss_tokens: Option<u32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub reasoning_tokens: Option<u32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub server_tool_use: Option<ServerToolUsage>,
 }
 
@@ -188,11 +199,15 @@ pub struct Usage {
 #[must_use]
 pub fn context_window_for_model(model: &str) -> Option<u32> {
     let lower = model.to_lowercase();
-    // DeepSeek models default to 128k unless an explicit *k suffix is present.
+    // Unknown DeepSeek model IDs default to 128k unless an explicit *k suffix is present.
+    // DeepSeek-V4 family and current legacy aliases ship with a 1M context window.
     if lower.contains("deepseek") {
         if let Some(explicit_window) = deepseek_context_window_hint(&lower) {
             return Some(explicit_window);
         }
+        if lower.contains("v4") || is_current_deepseek_v4_alias(&lower) {
+            return Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS);
+        }
         return Some(DEFAULT_CONTEXT_WINDOW_TOKENS);
     }
     if lower.contains("claude") {
@@ -201,6 +216,13 @@ pub fn context_window_for_model(model: &str) -> Option<u32> {
     None
 }
 
+fn is_current_deepseek_v4_alias(model_lower: &str) -> bool {
+    matches!(
+        model_lower,
+        "deepseek-chat" | "deepseek-reasoner" | "deepseek-r1" | "deepseek-v3" | "deepseek-v3.2"
+    )
+}
+
 fn deepseek_context_window_hint(model_lower: &str) -> Option<u32> {
     let bytes = model_lower.as_bytes();
     let mut i = 0usize;
@@ -342,13 +364,29 @@ mod tests {
     use super::*;
 
     #[test]
-    fn deepseek_models_map_to_128k_context_window() {
+    fn current_deepseek_aliases_map_to_v4_1m_context_window() {
         assert_eq!(
             context_window_for_model("deepseek-reasoner"),
-            Some(DEFAULT_CONTEXT_WINDOW_TOKENS)
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
+        );
+        assert_eq!(
+            context_window_for_model("deepseek-chat"),
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
+        );
+        assert_eq!(
+            context_window_for_model("deepseek-v3"),
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
         );
         assert_eq!(
             context_window_for_model("deepseek-v3.2"),
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
+        );
+    }
+
+    #[test]
+    fn unknown_deepseek_models_map_to_128k_context_window() {
+        assert_eq!(
+            context_window_for_model("deepseek-coder"),
             Some(DEFAULT_CONTEXT_WINDOW_TOKENS)
         );
         assert_eq!(
@@ -357,6 +395,18 @@ mod tests {
         );
     }
 
+    #[test]
+    fn deepseek_v4_models_map_to_1m_context_window() {
+        assert_eq!(
+            context_window_for_model("deepseek-v4-pro"),
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
+        );
+        assert_eq!(
+            context_window_for_model("deepseek-v4-flash"),
+            Some(DEEPSEEK_V4_CONTEXT_WINDOW_TOKENS)
+        );
+    }
+
     #[test]
     fn deepseek_models_with_k_suffix_use_hint() {
         assert_eq!(context_window_for_model("deepseek-v3.2-32k"), Some(32_000));
@@ -372,17 +422,32 @@ mod tests {
 
     #[test]
     fn compaction_threshold_scales_with_context_window() {
-        assert_eq!(compaction_threshold_for_model("deepseek-reasoner"), 102_400);
+        assert_eq!(
+            compaction_threshold_for_model("deepseek-v3.2-128k"),
+            102_400
+        );
         assert_eq!(compaction_threshold_for_model("unknown-model"), 50_000);
     }
 
     #[test]
     fn compaction_message_threshold_scales_with_context_window() {
         assert_eq!(
-            compaction_message_threshold_for_model("deepseek-reasoner"),
+            compaction_message_threshold_for_model("deepseek-v3.2-128k"),
             106
         );
         assert_eq!(compaction_message_threshold_for_model("unknown-model"), 50);
-        assert_eq!(compaction_message_threshold_for_model("claude-3"), 150);
+        // 200k / 1200 = 166, within the raised cap of 500.
+        assert_eq!(compaction_message_threshold_for_model("claude-3"), 166);
+    }
+
+    #[test]
+    fn compaction_scales_for_deepseek_v4_1m_context() {
+        // 80% of 1M = 800k tokens before token-based compaction.
+        assert_eq!(compaction_threshold_for_model("deepseek-v4-pro"), 800_000);
+        // 1M / 1200 = 833, clamped to the 500-message cap.
+        assert_eq!(
+            compaction_message_threshold_for_model("deepseek-v4-pro"),
+            500
+        );
     }
 }
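
Review note: the threshold arithmetic exercised by the tests above can be reproduced in isolation. The following is a minimal sketch, not the crate's implementation. The function names `token_threshold` and `message_threshold` are hypothetical, the window is passed in directly instead of being resolved from a model ID, and any lower bound the real code applies to small windows is not visible in this diff and is omitted.

```rust
// Constants copied from the diff; the functions below are illustrative only.
const COMPACTION_THRESHOLD_PERCENT: u64 = 80;
const COMPACTION_MESSAGE_DIVISOR: u32 = 1200;
const MAX_COMPACTION_MESSAGE_THRESHOLD: usize = 500;
const DEFAULT_COMPACTION_TOKEN_THRESHOLD: usize = 50_000;
const DEFAULT_COMPACTION_MESSAGE_THRESHOLD: usize = 50;

/// Token count at which compaction starts: 80% of the context window,
/// or the fixed default when the window is unknown.
fn token_threshold(window: Option<u32>) -> usize {
    match window {
        Some(w) => (u64::from(w) * COMPACTION_THRESHOLD_PERCENT / 100) as usize,
        None => DEFAULT_COMPACTION_TOKEN_THRESHOLD,
    }
}

/// Message count at which compaction starts: window / 1200 (integer
/// division), capped at 500, or the fixed default when the window is unknown.
fn message_threshold(window: Option<u32>) -> usize {
    match window {
        Some(w) => {
            let scaled = (w / COMPACTION_MESSAGE_DIVISOR) as usize;
            scaled.min(MAX_COMPACTION_MESSAGE_THRESHOLD)
        }
        None => DEFAULT_COMPACTION_MESSAGE_THRESHOLD,
    }
}
```

With a 1M window the message threshold hits the cap (1_000_000 / 1200 = 833, clamped to 500), which is why the cap was raised from 150 in this change.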
diff --git a/crates/tui/src/modules/text.rs b/crates/tui/src/modules/text.rs
index 6a898a61..66553112 100644
--- a/crates/tui/src/modules/text.rs
+++ b/crates/tui/src/modules/text.rs
@@ -180,6 +180,7 @@ async fn process_deepseek_turn(
         tool_choice: options.tool_choice.clone(),
         metadata: None,
         thinking: None,
+        reasoning_effort: None,
         stream: Some(options.stream),
         temperature: options.temperature,
         top_p: options.top_p,
diff --git a/crates/tui/src/pricing.rs b/crates/tui/src/pricing.rs
index 74762b1a..70d78048 100644
--- a/crates/tui/src/pricing.rs
+++ b/crates/tui/src/pricing.rs
@@ -2,53 +2,66 @@
 //!
 //! Pricing based on DeepSeek's published rates (per million tokens).
 
+use crate::models::Usage;
+
 /// Per-million-token pricing for a model.
 struct ModelPricing {
-    input_per_million: f64,
+    input_cache_hit_per_million: f64,
+    input_cache_miss_per_million: f64,
     output_per_million: f64,
 }
 
 /// Look up pricing for a model name.
 fn pricing_for_model(model: &str) -> Option<ModelPricing> {
     let lower = model.to_lowercase();
-    if lower.contains("deepseek-reasoner") || lower.contains("deepseek-r1") {
-        // DeepSeek-R1: $0.55/M input, $2.19/M output
+    if !lower.contains("deepseek") {
+        return None;
+    }
+    if lower.contains("v4-pro") || lower.contains("v4pro") {
         Some(ModelPricing {
-            input_per_million: 0.55,
-            output_per_million: 2.19,
-        })
-    } else if lower.contains("deepseek-v3.2") {
-        // DeepSeek-V3.2 (with reasoning): same pricing tier as V3
-        Some(ModelPricing {
-            input_per_million: 0.27,
-            output_per_million: 1.10,
-        })
-    } else if lower.contains("deepseek-chat") || lower.contains("deepseek-v3") {
-        // DeepSeek-V3: $0.27/M input, $1.10/M output
-        Some(ModelPricing {
-            input_per_million: 0.27,
-            output_per_million: 1.10,
-        })
-    } else if lower.contains("deepseek") {
-        // Generic DeepSeek fallback (V3 pricing)
-        Some(ModelPricing {
-            input_per_million: 0.27,
-            output_per_million: 1.10,
+            input_cache_hit_per_million: 0.145,
+            input_cache_miss_per_million: 1.74,
+            output_per_million: 3.48,
         })
     } else {
-        None
+        // deepseek-v4-flash, legacy aliases (deepseek-chat, deepseek-reasoner,
+        // deepseek-v3*), and any other DeepSeek ID are priced as v4-flash.
+        Some(ModelPricing {
+            input_cache_hit_per_million: 0.028,
+            input_cache_miss_per_million: 0.14,
+            output_per_million: 0.28,
+        })
     }
 }
 
 /// Calculate cost for a turn given token usage and model.
 #[must_use]
+#[allow(dead_code)]
 pub fn calculate_turn_cost(model: &str, input_tokens: u32, output_tokens: u32) -> Option<f64> {
     let pricing = pricing_for_model(model)?;
-    let input_cost = (input_tokens as f64 / 1_000_000.0) * pricing.input_per_million;
+    let input_cost = (input_tokens as f64 / 1_000_000.0) * pricing.input_cache_miss_per_million;
     let output_cost = (output_tokens as f64 / 1_000_000.0) * pricing.output_per_million;
     Some(input_cost + output_cost)
 }
 
+/// Calculate cost from provider usage, honoring DeepSeek context-cache fields.
+#[must_use]
+pub fn calculate_turn_cost_from_usage(model: &str, usage: &Usage) -> Option<f64> {
+    let pricing = pricing_for_model(model)?;
+    let hit_tokens = usage.prompt_cache_hit_tokens.unwrap_or(0);
+    let miss_tokens = usage
+        .prompt_cache_miss_tokens
+        .unwrap_or_else(|| usage.input_tokens.saturating_sub(hit_tokens));
+    let accounted_input = hit_tokens.saturating_add(miss_tokens);
+    let uncategorized_input = usage.input_tokens.saturating_sub(accounted_input);
+
+    let hit_cost = (hit_tokens as f64 / 1_000_000.0) * pricing.input_cache_hit_per_million;
+    let miss_cost = ((miss_tokens.saturating_add(uncategorized_input)) as f64 / 1_000_000.0)
+        * pricing.input_cache_miss_per_million;
+    let output_cost = (usage.output_tokens as f64 / 1_000_000.0) * pricing.output_per_million;
+    Some(hit_cost + miss_cost + output_cost)
+}
+
 /// Format a USD cost for compact display.
 #[must_use]
 #[allow(dead_code)]
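
Review note: a worked example may help when checking the cache-aware cost path. This is a sketch under stated assumptions, not the crate's code. `Rates`, `V4_FLASH`, and `turn_cost` are hypothetical names, the rates are the v4-flash figures from this diff, and the usage fields are passed as plain arguments instead of a `Usage` struct.

```rust
/// Per-million-token rates (hypothetical stand-in for `ModelPricing`).
struct Rates {
    hit: f64,
    miss: f64,
    output: f64,
}

// v4-flash rates as they appear in the diff.
const V4_FLASH: Rates = Rates { hit: 0.028, miss: 0.14, output: 0.28 };

/// Cost of one turn in USD. When the miss count is absent, it is derived as
/// `input - hit`; input tokens covered by neither field are billed as misses.
fn turn_cost(rates: &Rates, input: u32, hit: Option<u32>, miss: Option<u32>, output: u32) -> f64 {
    let hit = hit.unwrap_or(0);
    let miss = miss.unwrap_or_else(|| input.saturating_sub(hit));
    let uncategorized = input.saturating_sub(hit.saturating_add(miss));
    let per_million = |tokens: u32, rate: f64| f64::from(tokens) / 1_000_000.0 * rate;

    per_million(hit, rates.hit)
        + per_million(miss.saturating_add(uncategorized), rates.miss)
        + per_million(output, rates.output)
}
```

For 1M input tokens with 800k cache hits and 100k output tokens, this gives 0.8 × 0.028 + 0.2 × 0.14 + 0.1 × 0.28 = 0.0784 USD. The same input with no cache fields reported is billed entirely at the miss rate, 0.14 USD before output.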
diff --git a/crates/tui/src/runtime_api.rs b/crates/tui/src/runtime_api.rs
index ae1c9d8d..a7dc3523 100644
--- a/crates/tui/src/runtime_api.rs
+++ b/crates/tui/src/runtime_api.rs
@@ -2014,7 +2014,7 @@ mod tests {
                                 usage: Usage {
                                     input_tokens: 10,
                                     output_tokens: 5,
-                                    server_tool_use: None,
+                                    ..Usage::default()
                                 },
                                 status: TurnOutcomeStatus::Completed,
                                 error: None,
@@ -2027,7 +2027,7 @@ mod tests {
                                 usage: Usage {
                                     input_tokens: 0,
                                     output_tokens: 0,
-                                    server_tool_use: None,
+                                    ..Usage::default()
                                 },
                                 status: TurnOutcomeStatus::Completed,
                                 error: None,
@@ -2159,7 +2159,7 @@ mod tests {
                     usage: Usage {
                         input_tokens: 5,
                         output_tokens: 3,
-                        server_tool_use: None,
+                        ..Usage::default()
                     },
                     status: TurnOutcomeStatus::Completed,
                     error: None,
@@ -2281,7 +2281,7 @@ mod tests {
                     usage: Usage {
                         input_tokens: 2,
                         output_tokens: 1,
-                        server_tool_use: None,
+                        ..Usage::default()
                     },
                     status: TurnOutcomeStatus::Completed,
                     error: None,
@@ -2500,7 +2500,7 @@ mod tests {
                     usage: Usage {
                         input_tokens: 4,
                         output_tokens: 2,
-                        server_tool_use: None,
+                        ..Usage::default()
                     },
                     status: TurnOutcomeStatus::Completed,
                     error: None,
diff --git a/crates/tui/src/runtime_threads.rs b/crates/tui/src/runtime_threads.rs
index ca97963d..1b856626 100644
--- a/crates/tui/src/runtime_threads.rs
+++ b/crates/tui/src/runtime_threads.rs
@@ -320,7 +320,7 @@ impl RuntimeThreadStore {
             }
             out.push(thread);
         }
-        out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
+        out.sort_by_key(|t| std::cmp::Reverse(t.updated_at));
         Ok(out)
     }
 
@@ -349,7 +349,7 @@ impl RuntimeThreadStore {
                 out.push(turn);
             }
         }
-        out.sort_by(|a, b| a.created_at.cmp(&b.created_at));
+        out.sort_by_key(|a| a.created_at);
         Ok(out)
     }
 
@@ -1044,6 +1044,7 @@ impl RuntimeThreadManager {
                 prompt,
                 mode,
                 model.clone(),
+                None,
                 allow_shell,
                 trust_mode,
                 auto_approve,
@@ -2638,7 +2639,7 @@ mod tests {
                         usage: Usage {
                             input_tokens: 10,
                             output_tokens: 12,
-                            server_tool_use: None,
+                            ..Usage::default()
                         },
                         status: TurnOutcomeStatus::Completed,
                         error: None,
@@ -2913,7 +2914,7 @@ mod tests {
                         usage: Usage {
                             input_tokens: 5,
                             output_tokens: 5,
-                            server_tool_use: None,
+                            ..Usage::default()
                         },
                         status: TurnOutcomeStatus::Completed,
                         error: None,
@@ -3144,7 +3145,7 @@ mod tests {
                 usage: Usage {
                     input_tokens: 0,
                     output_tokens: 0,
-                    server_tool_use: None,
+                    ..Usage::default()
                 },
                 status: TurnOutcomeStatus::Completed,
                 error: None,
@@ -3226,7 +3227,7 @@ mod tests {
                 usage: Usage {
                     input_tokens: 0,
                     output_tokens: 0,
-                    server_tool_use: None,
+                    ..Usage::default()
                 },
                 status: TurnOutcomeStatus::Completed,
                 error: None,
@@ -3286,7 +3287,7 @@ mod tests {
                         usage: Usage {
                             input_tokens: 8,
                             output_tokens: 9,
-                            server_tool_use: None,
+                            ..Usage::default()
                         },
                         status: TurnOutcomeStatus::Completed,
                         error: None,
@@ -3395,7 +3396,7 @@ mod tests {
                                 usage: Usage {
                                     input_tokens: 3,
                                     output_tokens: 3,
-                                    server_tool_use: None,
+                                    ..Usage::default()
                                 },
                                 status: TurnOutcomeStatus::Completed,
                                 error: None,
@@ -3425,7 +3426,7 @@ mod tests {
                                 usage: Usage {
                                     input_tokens: 1,
                                     output_tokens: 1,
-                                    server_tool_use: None,
+                                    ..Usage::default()
                                 },
                                 status: TurnOutcomeStatus::Completed,
                                 error: None,
diff --git a/crates/tui/src/session_manager.rs b/crates/tui/src/session_manager.rs
index 013d7eb1..81c8b408 100644
--- a/crates/tui/src/session_manager.rs
+++ b/crates/tui/src/session_manager.rs
@@ -306,7 +306,7 @@ impl SessionManager {
         }
 
         // Sort by updated_at descending (most recent first)
-        sessions.sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
+        sessions.sort_by_key(|s| std::cmp::Reverse(s.updated_at));
 
         Ok(sessions)
     }
diff --git a/crates/tui/src/settings.rs b/crates/tui/src/settings.rs
index a96306fd..0430be7e 100644
--- a/crates/tui/src/settings.rs
+++ b/crates/tui/src/settings.rs
@@ -235,7 +235,7 @@ impl Settings {
 
                 let Some(model) = normalize_model_name(trimmed) else {
                     anyhow::bail!(
-                        "Failed to update setting: invalid model '{value}'. Expected: a DeepSeek model ID (for example deepseek-chat, deepseek-reasoner, deepseek-v4), or none/default."
+                        "Failed to update setting: invalid model '{value}'. Expected: a DeepSeek model ID (for example deepseek-v4-pro, deepseek-v4-flash), or none/default."
                     );
                 };
                 self.default_model = Some(model);
diff --git a/crates/tui/src/task_manager.rs b/crates/tui/src/task_manager.rs
index 6dd60406..de3d33e5 100644
--- a/crates/tui/src/task_manager.rs
+++ b/crates/tui/src/task_manager.rs
@@ -767,7 +767,7 @@ impl TaskManager {
             .values()
             .map(TaskSummary::from)
             .collect::<Vec<_>>();
-        items.sort_by(|a, b| b.created_at.cmp(&a.created_at));
+        items.sort_by_key(|i| std::cmp::Reverse(i.created_at));
         if let Some(limit) = limit {
             items.truncate(limit);
         }
@@ -1483,7 +1483,7 @@ mod tests {
             data_dir: root,
             worker_count: 1,
             default_workspace: PathBuf::from("."),
-            default_model: "deepseek-v3.2".to_string(),
+            default_model: "deepseek-v4-flash".to_string(),
             default_mode: "agent".to_string(),
             allow_shell: false,
             trust_mode: false,
diff --git a/crates/tui/src/tools/review.rs b/crates/tui/src/tools/review.rs
index eb5d4e9a..497567c6 100644
--- a/crates/tui/src/tools/review.rs
+++ b/crates/tui/src/tools/review.rs
@@ -227,6 +227,7 @@ impl ToolSpec for ReviewTool {
             tool_choice: None,
             metadata: None,
             thinking: None,
+            reasoning_effort: None,
             stream: Some(false),
             temperature: Some(0.2),
             top_p: Some(0.9),
diff --git a/crates/tui/src/tools/subagent.rs b/crates/tui/src/tools/subagent.rs
index e28478f2..732f2866 100644
--- a/crates/tui/src/tools/subagent.rs
+++ b/crates/tui/src/tools/subagent.rs
@@ -2301,6 +2301,7 @@ async fn run_subagent(
             tool_choice: Some(json!({ "type": "auto" })),
             metadata: None,
             thinking: None,
+            reasoning_effort: None,
             stream: Some(false),
             temperature: None,
             top_p: None,
@@ -2311,10 +2312,8 @@ async fn run_subagent(
         let mut tool_uses = Vec::new();
         for block in &response.content {
             match block {
-                ContentBlock::Text { text, .. } => {
-                    if !text.trim().is_empty() {
-                        final_result = Some(text.clone());
-                    }
+                ContentBlock::Text { text, .. } if !text.trim().is_empty() => {
+                    final_result = Some(text.clone());
                 }
                 ContentBlock::ToolUse {
                     id, name, input, ..
diff --git a/crates/tui/src/tui/app.rs b/crates/tui/src/tui/app.rs
index d84bec83..95ab05a3 100644
--- a/crates/tui/src/tui/app.rs
+++ b/crates/tui/src/tui/app.rs
@@ -49,6 +49,81 @@ pub enum AppMode {
     Plan,
 }
 
+/// DeepSeek reasoning-effort tier, modeled on the ChatGPT/Claude effort pickers.
+///
+/// The config file accepts all five string values for forward-compat with
+/// providers that expose the full spectrum; DeepSeek currently collapses
+/// `Low`/`Medium` → `high` and `Max` → `max` at the API boundary. The
+/// keyboard cycler (Shift+Tab) walks only the three behaviorally distinct
+/// tiers: `Off` → `High` → `Max` → `Off`.
+#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
+pub enum ReasoningEffort {
+    Off,
+    Low,
+    Medium,
+    High,
+    #[default]
+    Max,
+}
+
+impl ReasoningEffort {
+    /// Parse a config-file string into an effort tier. Unknown values fall
+    /// back to the default (`Max`) rather than erroring out.
+    #[must_use]
+    pub fn from_setting(value: &str) -> Self {
+        match value.trim().to_ascii_lowercase().as_str() {
+            "off" | "disabled" | "none" | "false" => Self::Off,
+            "low" | "minimal" => Self::Low,
+            "medium" | "mid" => Self::Medium,
+            "high" => Self::High,
+            "max" | "maximum" | "xhigh" => Self::Max,
+            _ => Self::default(),
+        }
+    }
+
+    /// Canonical lowercase label used for config storage and UI hints.
+    #[must_use]
+    pub fn as_setting(self) -> &'static str {
+        match self {
+            Self::Off => "off",
+            Self::Low => "low",
+            Self::Medium => "medium",
+            Self::High => "high",
+            Self::Max => "max",
+        }
+    }
+
+    /// Short label for the header chip.
+    #[must_use]
+    pub fn short_label(self) -> &'static str {
+        match self {
+            Self::Off => "off",
+            Self::Low => "low",
+            Self::Medium => "med",
+            Self::High => "high",
+            Self::Max => "max",
+        }
+    }
+
+    /// Value forwarded to the engine/client. Currently always `Some`; a `None`
+    /// would mean "provider default". `Off` still emits `"off"` so the client
+    /// can inject `thinking = {"type": "disabled"}`.
+    #[must_use]
+    pub fn api_value(self) -> Option<&'static str> {
+        Some(self.as_setting())
+    }
+
+    /// Cycle through the three behaviorally distinct tiers.
+    #[must_use]
+    pub fn cycle_next(self) -> Self {
+        match self {
+            Self::Off => Self::High,
+            Self::Low | Self::Medium | Self::High => Self::Max,
+            Self::Max => Self::Off,
+        }
+    }
+}
+
 /// Sidebar content focus mode.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum SidebarFocus {
@@ -302,6 +377,9 @@ pub struct App {
     /// Last status text already promoted from `status_message` into toast state.
     pub last_status_message_seen: Option<String>,
     pub model: String,
+    /// Current reasoning-effort tier for DeepSeek thinking mode.
+    /// Cycled via Shift+Tab; initialized from config at startup.
+    pub reasoning_effort: ReasoningEffort,
     pub workspace: PathBuf,
     pub skills_dir: PathBuf,
     pub use_alt_screen: bool,
@@ -415,6 +493,10 @@ pub struct App {
     pub last_prompt_tokens: Option<u32>,
     /// Last completion token usage
     pub last_completion_tokens: Option<u32>,
+    /// DeepSeek context-cache hit tokens from the last API call.
+    pub last_prompt_cache_hit_tokens: Option<u32>,
+    /// DeepSeek context-cache miss tokens from the last API call.
+    pub last_prompt_cache_miss_tokens: Option<u32>,
     /// Cached git context snapshot for the footer.
     pub workspace_context: Option<String>,
     /// Timestamp for cached workspace context.
@@ -593,6 +675,11 @@ impl App {
             sticky_status: None,
             last_status_message_seen: None,
             model,
+            reasoning_effort: config
+                .reasoning_effort()
+                .map_or_else(ReasoningEffort::default, |s| {
+                    ReasoningEffort::from_setting(s)
+                }),
             workspace,
             skills_dir,
             use_alt_screen,
@@ -673,6 +760,8 @@ impl App {
             runtime_turn_status: None,
             last_prompt_tokens: None,
             last_completion_tokens: None,
+            last_prompt_cache_hit_tokens: None,
+            last_prompt_cache_miss_tokens: None,
             workspace_context: None,
             workspace_context_refreshed_at: None,
             task_panel: Vec::new(),
@@ -763,6 +852,7 @@ impl App {
     }
 
     /// Cycle through modes in reverse: YOLO -> Agent -> Plan
+    #[allow(dead_code)]
     pub fn cycle_mode_reverse(&mut self) {
         let next = match self.mode {
             AppMode::Agent => AppMode::Plan,
@@ -772,6 +862,18 @@ impl App {
         let _ = self.set_mode(next);
     }
 
+    /// Cycle reasoning-effort through the three behaviorally distinct tiers:
+    /// `Off` → `High` → `Max` → `Off`.
+    pub fn cycle_effort(&mut self) {
+        self.reasoning_effort = self.reasoning_effort.cycle_next();
+        self.needs_redraw = true;
+        self.push_status_toast(
+            format!("Thinking: {}", self.reasoning_effort.short_label()),
+            StatusToastLevel::Info,
+            Some(1_500),
+        );
+    }
+
     /// Execute hooks for a specific event with the given context
     pub fn execute_hooks(&self, event: HookEvent, context: &HookContext) -> Vec<HookResult> {
         self.hooks.execute(event, context)
@@ -1538,10 +1640,10 @@ mod tests {
     fn test_update_model_compaction_budget() {
         let mut app = App::new(test_options(false), &Config::default());
         let initial_threshold = app.compact_threshold;
-        app.model = "deepseek-reasoner".to_string();
+        app.model = "deepseek-v3.2-128k".to_string();
         app.update_model_compaction_budget();
         // Threshold may have changed based on model
-        // deepseek-reasoner has 128k context, so threshold should be higher
+        // Explicit 128k DeepSeek model IDs have a higher threshold than unknown models.
         assert!(app.compact_threshold >= initial_threshold);
     }
 
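The `cycle_effort` method added to `App` above relies on `ReasoningEffort::cycle_next` and `short_label`, which are defined outside this patch. A minimal sketch of the rotation its doc comment describes is below. The enum here is a hypothetical three-variant stand-in; the crate's real type may carry more variants (the config docs also accept `low` and `medium`), and its method bodies may differ.

```rust
// Hypothetical stand-in for the crate's `ReasoningEffort`; the variant set and
// method bodies are assumptions inferred from the `cycle_effort` doc comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReasoningEffort {
    Off,
    High,
    Max,
}

impl ReasoningEffort {
    /// Advance to the next tier, wrapping from `Max` back to `Off`.
    fn cycle_next(self) -> Self {
        match self {
            Self::Off => Self::High,
            Self::High => Self::Max,
            Self::Max => Self::Off,
        }
    }

    /// Short label of the kind shown in the status toast and header chip.
    fn short_label(self) -> &'static str {
        match self {
            Self::Off => "off",
            Self::High => "high",
            Self::Max => "max",
        }
    }
}
```

Three consecutive calls to `cycle_next` return to the starting tier, so repeated Shift+Tab presses never get stuck on one value.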
diff --git a/crates/tui/src/tui/history.rs b/crates/tui/src/tui/history.rs
index fd8ff54f..7f31b97a 100644
--- a/crates/tui/src/tui/history.rs
+++ b/crates/tui/src/tui/history.rs
@@ -1432,11 +1432,9 @@ fn status_symbol(started_at: Option<Instant>, status: ToolStatus, low_motion: bo
                 |t| t.elapsed().as_millis(),
             );
             let cycle = u128::from(TOOL_STATUS_SYMBOL_MS);
-            let idx = if cycle == 0 {
-                0
-            } else {
-                (elapsed_ms / cycle) % (TOOL_RUNNING_SYMBOLS.len() as u128)
-            };
+            let idx = elapsed_ms
+                .checked_div(cycle)
+                .map_or(0, |d| d % (TOOL_RUNNING_SYMBOLS.len() as u128));
             TOOL_RUNNING_SYMBOLS[usize::try_from(idx).unwrap_or_default()].to_string()
         }
         ToolStatus::Success => TOOL_DONE_SYMBOL.to_string(),
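The hunk above swaps an explicit `if cycle == 0` branch for `checked_div`. A standalone sketch of the same pattern follows; `spinner_index` and its parameters are hypothetical names standing in for `TOOL_STATUS_SYMBOL_MS` and `TOOL_RUNNING_SYMBOLS.len()`, and the frame count is assumed to be non-zero, as it is for a non-empty constant symbol table.

```rust
/// Pick a spinner frame index from elapsed time.
///
/// `checked_div` returns `None` when `frame_ms` is zero, so the index falls
/// back to 0 instead of panicking on a division by zero.
/// Assumption: `frame_count` is greater than zero.
fn spinner_index(elapsed_ms: u128, frame_ms: u64, frame_count: usize) -> usize {
    let cycle = u128::from(frame_ms);
    let idx = elapsed_ms
        .checked_div(cycle)
        .map_or(0, |d| d % (frame_count as u128));
    // The modulo keeps `idx` below `frame_count`, so the conversion succeeds.
    usize::try_from(idx).unwrap_or_default()
}
```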
diff --git a/crates/tui/src/tui/session_picker.rs b/crates/tui/src/tui/session_picker.rs
index f2170bf6..8ac6481b 100644
--- a/crates/tui/src/tui/session_picker.rs
+++ b/crates/tui/src/tui/session_picker.rs
@@ -83,14 +83,14 @@ impl SessionPickerView {
         match self.sort_mode {
             SortMode::Recent => {
                 self.sessions
-                    .sort_by(|a, b| b.updated_at.cmp(&a.updated_at));
+                    .sort_by_key(|s| std::cmp::Reverse(s.updated_at));
             }
             SortMode::Name => {
                 self.sessions.sort_by(|a, b| a.title.cmp(&b.title));
             }
             SortMode::Size => {
                 self.sessions
-                    .sort_by(|a, b| b.message_count.cmp(&a.message_count));
+                    .sort_by_key(|s| std::cmp::Reverse(s.message_count));
             }
         }
 
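The two replacements above trade `sort_by(|a, b| b.x.cmp(&a.x))` for `sort_by_key` with `std::cmp::Reverse`. Both are stable descending sorts, so ties keep their original order either way. The sketch below uses a hypothetical `Session` struct; the field names mirror the diff, but the types are assumptions. Note that the key closure returns the key by value, so the key type needs to be `Copy` or cloned explicitly.

```rust
use std::cmp::Reverse;

// Hypothetical stand-in for a saved-session record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Session {
    title: String,
    updated_at: u64,
    message_count: usize,
}

/// Most recently updated sessions first.
fn sort_recent(sessions: &mut [Session]) {
    sessions.sort_by_key(|s| Reverse(s.updated_at));
}

/// Sessions with the most messages first.
fn sort_by_size(sessions: &mut [Session]) {
    sessions.sort_by_key(|s| Reverse(s.message_count));
}
```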
diff --git a/crates/tui/src/tui/ui.rs b/crates/tui/src/tui/ui.rs
index 1d2ffa2c..a386fc18 100644
--- a/crates/tui/src/tui/ui.rs
+++ b/crates/tui/src/tui/ui.rs
@@ -148,6 +148,8 @@ pub async fn run_tui(config: &Config, options: TuiOptions) -> Result<()> {
                 app.total_conversation_tokens = app.total_tokens;
                 app.last_prompt_tokens = None;
                 app.last_completion_tokens = None;
+                app.last_prompt_cache_hit_tokens = None;
+                app.last_prompt_cache_miss_tokens = None;
                 if let Some(prompt) = saved.system_prompt {
                     app.system_prompt = Some(SystemPrompt::Text(prompt));
                 }
@@ -521,16 +523,16 @@ async fn run_event_loop(
                             app.total_conversation_tokens.saturating_add(turn_tokens);
                         app.last_prompt_tokens = Some(usage.input_tokens);
                         app.last_completion_tokens = Some(usage.output_tokens);
+                        app.last_prompt_cache_hit_tokens = usage.prompt_cache_hit_tokens;
+                        app.last_prompt_cache_miss_tokens = usage.prompt_cache_miss_tokens;
                         if let Some(error) = error {
                             app.status_message = Some(format!("Turn failed: {error}"));
                         }
 
                         // Update session cost
-                        if let Some(turn_cost) = crate::pricing::calculate_turn_cost(
-                            &app.model,
-                            usage.input_tokens,
-                            usage.output_tokens,
-                        ) {
+                        if let Some(turn_cost) =
+                            crate::pricing::calculate_turn_cost_from_usage(&app.model, &usage)
+                        {
                             app.session_cost += turn_cost;
                         }
 
@@ -907,13 +909,11 @@ async fn run_event_loop(
                         let _ = engine_handle.send(Op::Shutdown).await;
                         return Ok(());
                     }
-                    KeyCode::Esc => {
-                        if app.onboarding == OnboardingState::ApiKey {
-                            app.onboarding = OnboardingState::Welcome;
-                            app.api_key_input.clear();
-                            app.api_key_cursor = 0;
-                            app.status_message = None;
-                        }
+                    KeyCode::Esc if app.onboarding == OnboardingState::ApiKey => {
+                        app.onboarding = OnboardingState::Welcome;
+                        app.api_key_input.clear();
+                        app.api_key_cursor = 0;
+                        app.status_message = None;
                     }
                     KeyCode::Enter => match app.onboarding {
                         OnboardingState::Welcome => {
@@ -1068,20 +1068,26 @@ async fn run_event_loop(
 
             // Global keybindings
             match key.code {
-                KeyCode::Enter if app.input.is_empty() && app.transcript_selection.is_active() => {
-                    if open_pager_for_selection(app) {
-                        continue;
-                    }
+                KeyCode::Enter
+                    if app.input.is_empty()
+                        && app.transcript_selection.is_active()
+                        && open_pager_for_selection(app) =>
+                {
+                    continue;
                 }
-                KeyCode::Char('l') if key.modifiers.is_empty() && app.input.is_empty() => {
-                    if open_pager_for_last_message(app) {
-                        continue;
-                    }
+                KeyCode::Char('l')
+                    if key.modifiers.is_empty()
+                        && app.input.is_empty()
+                        && open_pager_for_last_message(app) =>
+                {
+                    continue;
                 }
-                KeyCode::Char('v') if key.modifiers.is_empty() && app.input.is_empty() => {
-                    if open_tool_details_pager(app) {
-                        continue;
-                    }
+                KeyCode::Char('v')
+                    if key.modifiers.is_empty()
+                        && app.input.is_empty()
+                        && open_tool_details_pager(app) =>
+                {
+                    continue;
                 }
                 KeyCode::Char('1') if key.modifiers.contains(KeyModifiers::ALT) => {
                     if key.modifiers.contains(KeyModifiers::CONTROL) {
@@ -1169,11 +1175,11 @@ async fn run_event_loop(
                         return Ok(());
                     }
                 }
-                KeyCode::Char('d') if key.modifiers.contains(KeyModifiers::CONTROL) => {
-                    if app.input.is_empty() {
-                        let _ = engine_handle.send(Op::Shutdown).await;
-                        return Ok(());
-                    }
+                KeyCode::Char('d')
+                    if key.modifiers.contains(KeyModifiers::CONTROL) && app.input.is_empty() =>
+                {
+                    let _ = engine_handle.send(Op::Shutdown).await;
+                    return Ok(());
                 }
                 KeyCode::Esc => match next_escape_action(app, slash_menu_open) {
                     EscapeAction::CloseSlashMenu => app.close_slash_menu(),
@@ -1193,10 +1199,12 @@ async fn run_event_loop(
                 KeyCode::Up if key.modifiers.contains(KeyModifiers::ALT) => {
                     app.scroll_up(3);
                 }
-                KeyCode::Up if key.modifiers.is_empty() && slash_menu_open => {
-                    if app.slash_menu_selected > 0 {
-                        app.slash_menu_selected = app.slash_menu_selected.saturating_sub(1);
-                    }
+                KeyCode::Up
+                    if key.modifiers.is_empty()
+                        && slash_menu_open
+                        && app.slash_menu_selected > 0 =>
+                {
+                    app.slash_menu_selected = app.slash_menu_selected.saturating_sub(1);
                 }
                 KeyCode::Down if key.modifiers.contains(KeyModifiers::ALT) => {
                     app.scroll_down(3);
@@ -1224,7 +1232,7 @@ async fn run_event_loop(
                     app.cycle_mode();
                 }
                 KeyCode::BackTab => {
-                    app.cycle_mode_reverse();
+                    app.cycle_effort();
                 }
                 KeyCode::Char('g')
                     if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open =>
@@ -1243,18 +1251,20 @@ async fn run_event_loop(
                     app.scroll_to_bottom();
                 }
                 KeyCode::Char('[')
-                    if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open =>
+                    if key.modifiers.is_empty()
+                        && app.input.is_empty()
+                        && !slash_menu_open
+                        && !jump_to_adjacent_tool_cell(app, SearchDirection::Backward) =>
                 {
-                    if !jump_to_adjacent_tool_cell(app, SearchDirection::Backward) {
-                        app.status_message = Some("No previous tool output".to_string());
-                    }
+                    app.status_message = Some("No previous tool output".to_string());
                 }
                 KeyCode::Char(']')
-                    if key.modifiers.is_empty() && app.input.is_empty() && !slash_menu_open =>
+                    if key.modifiers.is_empty()
+                        && app.input.is_empty()
+                        && !slash_menu_open
+                        && !jump_to_adjacent_tool_cell(app, SearchDirection::Forward) =>
                 {
-                    if !jump_to_adjacent_tool_cell(app, SearchDirection::Forward) {
-                        app.status_message = Some("No next tool output".to_string());
-                    }
+                    app.status_message = Some("No next tool output".to_string());
                 }
                 // Input handling
                 KeyCode::Char('j') if key.modifiers.contains(KeyModifiers::CONTROL) => {
@@ -1861,6 +1871,8 @@ async fn dispatch_user_message(
     }
     app.last_prompt_tokens = None;
     app.last_completion_tokens = None;
+    app.last_prompt_cache_hit_tokens = None;
+    app.last_prompt_cache_miss_tokens = None;
     // Persist immediately so abrupt termination can recover this in-flight turn.
     persist_checkpoint(app);
 
@@ -1869,6 +1881,7 @@ async fn dispatch_user_message(
             content,
             mode: app.mode,
             model: app.model.clone(),
+            reasoning_effort: app.reasoning_effort.api_value().map(str::to_string),
             allow_shell: app.allow_shell,
             trust_mode: app.trust_mode,
             auto_approve: app.mode == AppMode::Yolo,
@@ -2324,6 +2337,7 @@ fn render(f: &mut Frame, app: &mut App) {
             .and_then(|value| value.to_str())
             .filter(|value| !value.is_empty())
             .unwrap_or("workspace");
+        let effort_label = app.reasoning_effort.short_label();
         let header_data = HeaderData::new(
             app.mode,
             &app.model,
@@ -2336,7 +2350,8 @@ fn render(f: &mut Frame, app: &mut App) {
             sanitized_context_window,
             app.session_cost,
             sanitized_prompt_tokens,
-        );
+        )
+        .with_reasoning_effort(Some(effort_label));
         let header_widget = HeaderWidget::new(header_data);
         let buf = f.buffer_mut();
         header_widget.render(chunks[0], buf);
@@ -2966,6 +2981,8 @@ fn apply_loaded_session(app: &mut App, session: &SavedSession) {
     app.total_conversation_tokens = app.total_tokens;
     app.last_prompt_tokens = None;
     app.last_completion_tokens = None;
+    app.last_prompt_cache_hit_tokens = None;
+    app.last_prompt_cache_miss_tokens = None;
     app.current_session_id = Some(session.metadata.id.clone());
     app.workspace_context = None;
     app.workspace_context_refreshed_at = None;
@@ -3180,6 +3197,7 @@ fn render_footer(f: &mut Frame, area: Rect, app: &mut App) {
 
 fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec<Span<'static>> {
     let context_spans = footer_context_spans(app);
+    let cache_spans = footer_cache_spans(app);
     let cost_spans = if app.session_cost > 0.001 {
         vec![Span::styled(
             format!("${:.2}", app.session_cost),
@@ -3190,6 +3208,20 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec<Span<'static>> {
     };
 
     let mut candidates = Vec::new();
+    if !context_spans.is_empty() && !cache_spans.is_empty() && !cost_spans.is_empty() {
+        let mut combined = context_spans.clone();
+        combined.push(Span::raw("  "));
+        combined.extend(cache_spans.clone());
+        combined.push(Span::raw("  "));
+        combined.extend(cost_spans.clone());
+        candidates.push(combined);
+    }
+    if !context_spans.is_empty() && !cache_spans.is_empty() {
+        let mut combined = context_spans.clone();
+        combined.push(Span::raw("  "));
+        combined.extend(cache_spans.clone());
+        candidates.push(combined);
+    }
     if !context_spans.is_empty() && !cost_spans.is_empty() {
         let mut combined = context_spans.clone();
         combined.push(Span::raw("  "));
@@ -3199,6 +3231,9 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec<Span<'static>> {
     if !context_spans.is_empty() {
         candidates.push(context_spans);
     }
+    if !cache_spans.is_empty() {
+        candidates.push(cache_spans);
+    }
     if !cost_spans.is_empty() {
         candidates.push(cost_spans);
     }
@@ -3210,6 +3245,23 @@ fn footer_auxiliary_spans(app: &App, max_width: usize) -> Vec<Span<'static>> {
         .unwrap_or_default()
 }
 
+fn footer_cache_spans(app: &App) -> Vec<Span<'static>> {
+    let Some(hit_tokens) = app.last_prompt_cache_hit_tokens else {
+        return Vec::new();
+    };
+    let miss_tokens = app.last_prompt_cache_miss_tokens.unwrap_or(0);
+    let total = hit_tokens.saturating_add(miss_tokens);
+    if total == 0 {
+        return Vec::new();
+    }
+
+    let percent = (f64::from(hit_tokens) / f64::from(total) * 100.0).clamp(0.0, 100.0);
+    vec![Span::styled(
+        format!("cache {:.0}%", percent),
+        Style::default().fg(palette::TEXT_MUTED),
+    )]
+}
+
 fn footer_context_spans(app: &App) -> Vec<Span<'static>> {
     let (_, _, percent) = match context_usage_snapshot(app) {
         Some(snapshot) => snapshot,
@@ -3612,12 +3664,10 @@ fn handle_mouse_event(app: &mut App, mouse: MouseEvent) {
                 app.transcript_selection.head = Some(point);
             }
         }
-        MouseEventKind::Up(MouseButton::Left) => {
-            if app.transcript_selection.dragging {
-                app.transcript_selection.dragging = false;
-                if selection_has_content(app) {
-                    copy_active_selection(app);
-                }
+        MouseEventKind::Up(MouseButton::Left) if app.transcript_selection.dragging => {
+            app.transcript_selection.dragging = false;
+            if selection_has_content(app) {
+                copy_active_selection(app);
             }
         }
         _ => {}
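The `footer_cache_spans` helper added earlier in this file derives a cache-hit percentage from the last call's hit and miss token counts. The arithmetic can be sketched without the TUI types as follows; `cache_hit_label` is a hypothetical name, and returning `Option<String>` instead of styled spans is a simplification for illustration.

```rust
/// Format the cache-hit share of prompt tokens, or `None` when there is
/// nothing to show (no hit count reported, or zero tokens in total).
fn cache_hit_label(hit: Option<u32>, miss: Option<u32>) -> Option<String> {
    let hit = hit?;
    // A missing miss count is treated as zero, as in the patch.
    let miss = miss.unwrap_or(0);
    let total = hit.saturating_add(miss);
    if total == 0 {
        return None;
    }
    let percent = (f64::from(hit) / f64::from(total) * 100.0).clamp(0.0, 100.0);
    Some(format!("cache {percent:.0}%"))
}
```

With 36,000 hit tokens and 12,000 miss tokens the share is 36,000 / 48,000 = 75%, which matches the `cache 75%` string asserted in the new footer test.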
diff --git a/crates/tui/src/tui/ui/tests.rs b/crates/tui/src/tui/ui/tests.rs
index 36746f75..bf51a038 100644
--- a/crates/tui/src/tui/ui/tests.rs
+++ b/crates/tui/src/tui/ui/tests.rs
@@ -322,6 +322,8 @@ fn footer_auxiliary_spans_prioritize_context_when_busy() {
     let mut app = create_test_app();
     app.is_loading = true;
     app.last_prompt_tokens = Some(48_000);
+    app.last_prompt_cache_hit_tokens = Some(36_000);
+    app.last_prompt_cache_miss_tokens = Some(12_000);
     app.session_cost = 12.34;
 
     let compact = spans_text(&footer_auxiliary_spans(&app, 8));
@@ -332,13 +334,27 @@ fn footer_auxiliary_spans_prioritize_context_when_busy() {
     let roomy = spans_text(&footer_auxiliary_spans(&app, 20));
     assert!(roomy.contains("ctx"));
     assert!(roomy.contains('%'));
+    assert!(roomy.contains("cache"));
+}
+
+#[test]
+fn footer_auxiliary_spans_can_display_cache_and_cost_when_roomy() {
+    let mut app = create_test_app();
+    app.last_prompt_tokens = Some(48_000);
+    app.last_prompt_cache_hit_tokens = Some(36_000);
+    app.last_prompt_cache_miss_tokens = Some(12_000);
+    app.session_cost = 12.34;
+
+    let roomy = spans_text(&footer_auxiliary_spans(&app, 32));
+    assert!(roomy.contains("ctx"));
+    assert!(roomy.contains("cache 75%"));
     assert!(roomy.contains("$12.34"));
 }
 
 #[test]
 fn context_usage_snapshot_prefers_estimate_when_reported_exceeds_window() {
     let mut app = create_test_app();
-    app.last_prompt_tokens = Some(320_000);
+    app.last_prompt_tokens = Some(1_200_000);
     app.api_messages = vec![Message {
         role: "user".to_string(),
         content: vec![ContentBlock::Text {
@@ -349,7 +365,7 @@ fn context_usage_snapshot_prefers_estimate_when_reported_exceeds_window() {
 
     let (used, max, percent) =
         context_usage_snapshot(&app).expect("context usage should be available");
-    assert_eq!(max, 128_000);
+    assert_eq!(max, 1_000_000);
     assert!(used > 0);
     assert!(used <= i64::from(max));
     assert!(percent < 100.0);
@@ -372,7 +388,7 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() {
     let (used, max, percent) =
         context_usage_snapshot(&app).expect("context usage should be available");
     assert_eq!(used, estimated);
-    assert_eq!(max, 128_000);
+    assert_eq!(max, 1_000_000);
     assert!(used > i64::from(app.last_prompt_tokens.expect("reported tokens")));
     assert!(percent > 0.0);
 }
@@ -380,7 +396,7 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() {
 #[test]
 fn should_auto_compact_before_send_respects_threshold_and_setting() {
     let mut app = create_test_app();
-    app.last_prompt_tokens = Some(123_000);
+    app.last_prompt_tokens = Some(950_000);
     app.auto_compact = true;
     assert!(should_auto_compact_before_send(&app));
 
diff --git a/crates/tui/src/tui/views/mod.rs b/crates/tui/src/tui/views/mod.rs
index bafa1e75..3c808e92 100644
--- a/crates/tui/src/tui/views/mod.rs
+++ b/crates/tui/src/tui/views/mod.rs
@@ -580,7 +580,7 @@ impl ConfigView {
 fn config_hint_for_key(key: &str) -> &'static str {
     match key {
         "model" => {
-            "deepseek-chat | deepseek-reasoner | deepseek-* (aliases: deepseek-v3, deepseek-v3.2, deepseek-r1)"
+            "deepseek-v4-pro | deepseek-v4-flash | deepseek-* (aliases: deepseek-chat, deepseek-reasoner, deepseek-v3, deepseek-v3.2, deepseek-r1)"
         }
         "approval_mode" => "auto | suggest | never",
         "auto_compact" | "calm_mode" | "low_motion" | "show_thinking" | "show_tool_details"
@@ -591,7 +591,7 @@ fn config_hint_for_key(key: &str) -> &'static str {
         "sidebar_width" => "10..=50",
         "sidebar_focus" => "auto | plan | todos | tasks | agents",
         "max_history" => "integer (0 allowed)",
-        "default_model" => "deepseek-chat | deepseek-reasoner | deepseek-* | none/default",
+        "default_model" => "deepseek-v4-pro | deepseek-v4-flash | deepseek-* | none/default",
         _ => "",
     }
 }
@@ -1256,7 +1256,7 @@ impl ModalView for SubAgentsView {
                 if idx > 0 {
                     summary.push(Span::raw("  ·  "));
                 }
-                summary.extend(part.into_iter());
+                summary.extend(part);
             }
             lines.push(Line::from(summary));
             lines.push(Line::from(Span::styled(
diff --git a/crates/tui/src/tui/widgets/header.rs b/crates/tui/src/tui/widgets/header.rs
index 6a7a0598..6b482a38 100644
--- a/crates/tui/src/tui/widgets/header.rs
+++ b/crates/tui/src/tui/widgets/header.rs
@@ -33,6 +33,9 @@ pub struct HeaderData<'a> {
     pub session_cost: f64,
     /// Input tokens from the most recent API call (current context utilization).
     pub last_prompt_tokens: Option<u32>,
+    /// Short label for the current reasoning-effort tier (e.g. "max", "high",
+    /// "off"). Rendered as a chip when space allows.
+    pub reasoning_effort_label: Option<&'a str>,
 }
 
 impl<'a> HeaderData<'a> {
@@ -55,9 +58,17 @@ impl<'a> HeaderData<'a> {
             context_window: None,
             session_cost: 0.0,
             last_prompt_tokens: None,
+            reasoning_effort_label: None,
         }
     }
 
+    /// Attach a short reasoning-effort label for the header chip.
+    #[must_use]
+    pub fn with_reasoning_effort(mut self, label: Option<&'a str>) -> Self {
+        self.reasoning_effort_label = label;
+        self
+    }
+
     /// Set token/cost fields.
     #[must_use]
     pub fn with_usage(
@@ -191,6 +202,27 @@ impl<'a> HeaderWidget<'a> {
         )]
     }
 
+    fn effort_chip_spans(&self, include_prefix: bool) -> Vec<Span<'static>> {
+        let Some(label) = self.data.reasoning_effort_label else {
+            return Vec::new();
+        };
+        let trimmed = label.trim();
+        if trimmed.is_empty() {
+            return Vec::new();
+        }
+        let color = if trimmed.eq_ignore_ascii_case("off") {
+            palette::TEXT_HINT
+        } else {
+            palette::DEEPSEEK_SKY
+        };
+        let body = if include_prefix {
+            format!("⚡{trimmed}")
+        } else {
+            trimmed.to_string()
+        };
+        vec![Span::styled(body, Style::default().fg(color))]
+    }
+
     fn status_variant(
         &self,
         show_stream_label: bool,
@@ -199,7 +231,16 @@ impl<'a> HeaderWidget<'a> {
     ) -> Vec<Span<'static>> {
         let mut spans = Vec::new();
 
+        let effort_spans = self.effort_chip_spans(true);
+        let has_effort = !effort_spans.is_empty();
+        if has_effort {
+            spans.extend(effort_spans);
+        }
+
         if self.data.is_streaming {
+            if has_effort {
+                spans.push(Span::raw("  "));
+            }
             spans.push(Span::styled(
                 "●",
                 Style::default()
diff --git a/crates/tui/src/working_set.rs b/crates/tui/src/working_set.rs
index 974223ff..d0b7ad2e 100644
--- a/crates/tui/src/working_set.rs
+++ b/crates/tui/src/working_set.rs
@@ -274,7 +274,7 @@ impl WorkingSet {
             .values()
             .map(|entry| (entry.path.clone(), score_entry(entry, self.turn)))
             .collect();
-        ranked.sort_by(|a, b| a.1.cmp(&b.1));
+        ranked.sort_by_key(|a| a.1);
 
         let to_remove = self.entries.len().saturating_sub(max_entries);
         for (path, _) in ranked.into_iter().take(to_remove) {
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 904b6ece..a8495b5d 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -81,18 +81,21 @@ Current boundary note:
 
 ### LLM Integration
 
-- **`client.rs`** - HTTP client for DeepSeek's OpenAI-compatible Responses API (with chat fallback)
+- **`client.rs`** - HTTP client for DeepSeek's documented OpenAI-compatible Chat Completions API
 - **`llm_client.rs`** - Abstract LLM client trait with retry logic
 - **`models.rs`** - Data structures for API requests/responses
 
 #### DeepSeek API Endpoints
 
 DeepSeek exposes OpenAI-compatible endpoints. The CLI uses:
-- `https://api.deepseek.com/v1/responses` - preferred Responses API
-- `https://api.deepseek.com/v1/chat/completions` - fallback if Responses is unavailable
+- `https://api.deepseek.com/v1/chat/completions` - normal and streaming model turns
+- `https://api.deepseek.com/v1/models` - live model discovery and health checks
 
-The engine uses `handle_deepseek_turn()` to drive the agent loop against the
-Responses API (with automatic fallback if needed).
+`https://api.deepseek.com/v1` is accepted for OpenAI SDK compatibility, and
+`https://api.deepseek.com/beta` can be configured for beta-only features such as
+strict tool mode, chat prefix completion, and FIM completion. The public
+DeepSeek docs do not document a Responses API path for this workflow; the engine
+drives turns through Chat Completions.
 
 ### Tool System
 
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 0c3957e2..9a14aae4 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -15,6 +15,11 @@ Overrides:
 
 If both are set, `--config` wins. Environment variable overrides are applied after the file is loaded.
 
+The `deepseek` facade and `deepseek-tui` binary share the same config file for
+DeepSeek auth and model defaults. `deepseek login --api-key ...` writes the
+root `api_key` field that `deepseek-tui` reads directly, and `deepseek --model
+deepseek-chat` is forwarded to the TUI via the `DEEPSEEK_MODEL` environment variable.
+
 To bootstrap MCP and skills directories at their resolved paths, run `deepseek-tui setup`.
 To only scaffold MCP, run `deepseek-tui mcp init`.
 
@@ -30,7 +35,7 @@ You can define multiple profiles in the same file:
 
 ```toml
 api_key = "PERSONAL_KEY"
-default_text_model = "deepseek-reasoner"
+default_text_model = "deepseek-v4-pro"
 
 [profiles.work]
 api_key = "WORK_KEY"
@@ -50,6 +55,7 @@ These override config values:
 
 - `DEEPSEEK_API_KEY`
 - `DEEPSEEK_BASE_URL`
+- `DEEPSEEK_MODEL` or `DEEPSEEK_DEFAULT_TEXT_MODEL`
 - `DEEPSEEK_SKILLS_DIR`
 - `DEEPSEEK_MCP_CONFIG`
 - `DEEPSEEK_NOTES_PATH`
@@ -123,8 +129,9 @@ If you are upgrading from older releases:
 ### Core keys (used by the TUI/engine)
 
 - `api_key` (string, required): must be non-empty (or set `DEEPSEEK_API_KEY`).
-- `base_url` (string, optional): defaults to `https://api.deepseek.com` (OpenAI-compatible Responses API).
-- `default_text_model` (string, optional): defaults to `deepseek-reasoner`. Any valid DeepSeek model ID is accepted (common IDs: `deepseek-reasoner`, `deepseek-chat`). Use `/models` to discover live IDs from your configured endpoint.
+- `base_url` (string, optional): defaults to `https://api.deepseek.com` for DeepSeek's OpenAI-compatible Chat Completions API. `https://api.deepseek.com/v1` is also accepted for SDK compatibility; use `https://api.deepseek.com/beta` only for DeepSeek beta features such as strict tool mode, chat prefix completion, and FIM completion.
+- `default_text_model` (string, optional): defaults to `deepseek-v4-pro`. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash`. Use `/models` or `deepseek models` to discover live IDs from your configured endpoint. `DEEPSEEK_MODEL` overrides this for a single process.
+- `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`. When unset, the TUI uses its built-in default tier. `off` sends `thinking = {"type": "disabled"}`. `low` and `medium` are normalized to `high`; `max` sends `reasoning_effort = "max"`.
 - `allow_shell` (bool, optional): defaults to `true` (sandboxed).
 - `approval_policy` (string, optional): `on-request`, `untrusted`, or `never`. Runtime `approval_mode` editing in `/config` also accepts `on-request` and `untrusted` aliases.
 - `sandbox_mode` (string, optional): `read-only`, `workspace-write`, `danger-full-access`, `external-sandbox`.
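The `reasoning_effort` mapping described in the CONFIGURATION.md hunk above can be sketched as a small normalization function. Everything named here (`EffortRequest`, `normalize_effort`) is hypothetical and not the crate's API. Trimming, case-folding, returning `None` for unknown settings, and `high` sending `reasoning_effort = "high"` are assumptions; the doc text only states the `off`, `low`/`medium`, and `max` behavior.

```rust
/// Request fields a setting maps to, following the CONFIGURATION.md wording.
#[derive(Debug, PartialEq, Eq)]
enum EffortRequest {
    /// Send `thinking = {"type": "disabled"}`.
    ThinkingDisabled,
    /// Send `reasoning_effort = "<value>"`.
    Effort(&'static str),
}

/// Map a config string to the request shape it implies.
/// Unknown values yield `None` (an assumption made for this sketch).
fn normalize_effort(setting: &str) -> Option<EffortRequest> {
    match setting.trim().to_ascii_lowercase().as_str() {
        "off" => Some(EffortRequest::ThinkingDisabled),
        // `low` and `medium` collapse into `high`.
        "low" | "medium" | "high" => Some(EffortRequest::Effort("high")),
        "max" => Some(EffortRequest::Effort("max")),
        _ => None,
    }
}
```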
diff --git a/docs/MODES.md b/docs/MODES.md
index 50100290..f5fd30a8 100644
--- a/docs/MODES.md
+++ b/docs/MODES.md
@@ -73,6 +73,7 @@ See `MCP.md`.
 Run `deepseek --help` for the canonical list. Common flags:
 
 - `-p, --prompt <TEXT>`: one-shot prompt mode (prints and exits)
+- `--model <MODEL>`: when using the `deepseek` facade, forward a DeepSeek model override to the TUI
 - `--workspace <DIR>`: workspace root for file tools
 - `--yolo`: start in YOLO mode
 - `-r, --resume <ID|PREFIX|latest>`: resume a saved session
diff --git a/npm/deepseek-tui/README.md b/npm/deepseek-tui/README.md
index fc0c0fd0..482cb246 100644
--- a/npm/deepseek-tui/README.md
+++ b/npm/deepseek-tui/README.md
@@ -20,6 +20,23 @@ npx deepseek-tui --help
 `postinstall` downloads platform binaries into `bin/downloads/` and exposes
 `deepseek` and `deepseek-tui` commands.
 
+## First run
+
+```bash
+deepseek login --api-key "YOUR_DEEPSEEK_API_KEY"
+deepseek doctor
+deepseek
+```
+
+The `deepseek` facade and `deepseek-tui` binary share `~/.deepseek/config.toml`
+for DeepSeek auth and default model settings. Common TUI commands are available
+directly through the facade, including `deepseek doctor`, `deepseek models`,
+`deepseek sessions`, and `deepseek resume --last`.
+
+The app talks to DeepSeek's documented OpenAI-compatible Chat Completions API.
+Set `DEEPSEEK_BASE_URL` only if you need the China endpoint or DeepSeek beta
+features such as strict tool mode, chat prefix completion, or FIM completion.
+
 ## Supported platforms
 
 - Linux x64
diff --git a/npm/deepseek-tui/package.json b/npm/deepseek-tui/package.json
index de1548ea..003d2490 100644
--- a/npm/deepseek-tui/package.json
+++ b/npm/deepseek-tui/package.json
@@ -1,7 +1,7 @@
 {
   "name": "deepseek-tui",
-  "version": "0.3.33",
-  "deepseekBinaryVersion": "0.3.33",
+  "version": "0.4.0",
+  "deepseekBinaryVersion": "0.4.0",
   "description": "Install and run deepseek and deepseek-tui binaries from GitHub release artifacts.",
   "author": "Hmbown",
   "license": "MIT",
diff --git a/npm/deepseek-tui/scripts/install.js b/npm/deepseek-tui/scripts/install.js
index a325938e..0afae017 100644
--- a/npm/deepseek-tui/scripts/install.js
+++ b/npm/deepseek-tui/scripts/install.js
@@ -150,7 +150,7 @@ async function ensureBinary(targetPath, assetName, version, repo, checksums) {
     }
   }
   const url = releaseAssetUrl(assetName, version, repo);
-  const destination = `${targetPath}.download`;
+  const destination = `${targetPath}.${process.pid}.${Date.now()}.download`;
   await download(url, destination);
   try {
     await verifyChecksum(destination, assetName, checksums);