codewhale

dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	b678058ff0	test: serialise composer_history env mutation + update rlm test for #358 Two test fixes uncovered by the full-suite run: 1. composer_history tests were using a module-local mutex to serialize their HOME env mutation, but other tests in the workspace (config, commands::restore, etc.) ALSO mutate HOME without that lock. Switch to the crate-wide `test_support::lock_test_env()` so all HOME-mutating tests share one mutex. 2. The `prompts::tests::rlm_first_class_guidance_present` test was pinning the OLD "RLM Is First-Class" framing that #358 deliberately reframed as "RLM Is a Specialty Tool". Renamed the test to `rlm_specialty_tool_guidance_present` and updated the assertions to guard the new framing — so a future encouraging-language regression lights up CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:40:57 -05:00
Hunter Bown	735287774f	fix(approval): cache denials per session — ESC on dangerous command stops re-prompting (#360 ) When the user pressed ESC (or Deny / Abort) on an approval prompt, the TUI correctly told the engine to deny the call. But the model would often retry the same command — same name, same args, same approval fingerprint — and the user would see the dialog again, frustrating in the same way the equivalent yes-yes-yes loop would be. Symmetric to the existing `approval_session_approved` "always approve" cache: add `approval_session_denied: HashSet<String>` populated when the user denies (not when the timeout fired — a timeout might mean the user stepped away rather than refused). Subsequent ApprovalRequired events whose approval_key or tool_name match the cache auto-deny via `engine.deny_tool_call(...)` without re-showing the dialog. Logged via `tool.approval.auto_deny_session` so the audit log captures the silent denial. Closes #360. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:35:25 -05:00
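The per-session denial cache described in the commit above can be sketched roughly as follows. This is a minimal illustration and not the project's code: `ApprovalSession`, `DenyReason`, `record_denial`, and `should_auto_deny` are hypothetical names, and only the `approval_session_denied: HashSet<String>` field comes from the commit message.

```rust
use std::collections::HashSet;

/// Why an approval prompt ended without an approval (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DenyReason {
    /// The user pressed ESC, Deny, or Abort.
    UserDenied,
    /// The prompt timed out; the user may simply have stepped away.
    Timeout,
}

/// Hypothetical holder for the per-session denial cache.
#[derive(Default)]
pub struct ApprovalSession {
    approval_session_denied: HashSet<String>,
}

impl ApprovalSession {
    /// Remember a denial, but only when the user actively refused.
    pub fn record_denial(&mut self, approval_key: &str, reason: DenyReason) {
        if reason == DenyReason::UserDenied {
            self.approval_session_denied.insert(approval_key.to_string());
        }
    }

    /// True when a later request matches the cache by key or by tool name.
    pub fn should_auto_deny(&self, approval_key: &str, tool_name: &str) -> bool {
        self.approval_session_denied.contains(approval_key)
            || self.approval_session_denied.contains(tool_name)
    }
}
```

A caller would check `should_auto_deny` before showing the dialog and deny the tool call silently on a hit.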
Hunter Bown	162e2e027c	feat(composer): cross-session input history persistence (#366 ) Pressing Up-arrow at the composer now recalls submissions from previous sessions, not just the current one. Implementation: - New `crates/tui/src/composer_history.rs` module with `load_history()` + `append_history()`. Persists to `~/.deepseek/composer_history.txt` (one entry per line, oldest first). Capped at 1000 entries — entries older than the cap are pruned at append time so the file never grows unboundedly. - `App::new` now seeds `input_history` from the persisted file at startup, so Up-arrow at first launch shows yesterday's prompts. - `App::submit_message` mirrors each non-slash submission to the persisted history. Slash commands and empty/whitespace submissions are skipped — those don't help recall and would pollute the stream. - Consecutive-duplicate dedup so re-submitting the same prompt doesn't bloat the file. The persisted history is global (not per-workspace) — matches the arrow-up recall pattern users expect from shells and Claude Code. Per-workspace scoping is a follow-up if multi-project users find it noisy. Tests: 6 unit tests cover round-trip, slash-skip, empty-skip, consecutive-duplicate dedup, cap-pruning, and missing-file safety. The test module uses an internal Mutex to serialize HOME env mutations so tests can still run in parallel without stomping each other. Closes #366. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:32:45 -05:00
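The append rules listed in the commit above (skip slash commands and blank input, drop consecutive duplicates, prune past the cap) can be sketched in memory like this. It is an assumption-laden illustration: `push_entry` and `MAX_ENTRIES` are hypothetical names, the real `append_history()` persists to a file, and storing the trimmed text is a choice made here, not something the commit states.

```rust
/// Cap taken from the commit message; the constant name is hypothetical.
pub const MAX_ENTRIES: usize = 1000;

/// Apply the append rules to an in-memory history (oldest first).
/// Returns true when the entry was stored.
pub fn push_entry(history: &mut Vec<String>, entry: &str, cap: usize) -> bool {
    let trimmed = entry.trim();
    // Empty or whitespace-only submissions and slash commands are skipped.
    if trimmed.is_empty() || trimmed.starts_with('/') {
        return false;
    }
    // Consecutive-duplicate dedup: compare against the newest entry only.
    if history.last().map(String::as_str) == Some(trimmed) {
        return false;
    }
    history.push(trimmed.to_string());
    // Prune the oldest entries once the cap is exceeded.
    if history.len() > cap {
        let excess = history.len() - cap;
        history.drain(..excess);
    }
    true
}
```

In real use the cap argument would be `MAX_ENTRIES`; a small cap is only convenient for exercising the pruning path.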
Hunter Bown	40f7037d8e	docs(prompts): tell the model to skip markdown tables in terminal output (#372 ) Markdown tables don't render correctly in a terminal — monospace fonts plus variable-width content (especially CJK characters) can't reliably align column borders. Adds an "Output formatting" section to both base.md and base.txt instructing the model to prefer plain prose, bulleted/numbered lists, code blocks, or `- Label: value` pairs over tables. If column-aligned data is genuinely necessary, the guidance asks for narrow, ASCII-only, 2–3 column tables. Closes #372. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:27:54 -05:00
Hunter Bown	1986a15dd5	feat(persistence): schema migration framework — forward upgrade path for ~/.deepseek/ records (#350 ) Every persistence layer in crates/tui/src/ already gates `schema_version > CURRENT_*` to reject newer-than-supported records (good — prevents silent truncation when an older binary tries to load a v3 file with v4 fields). What was missing: the forward upgrade path for older records. When we bump CURRENT_SESSION_SCHEMA_VERSION from 3 to 4 to add a field, every v3 session on disk would silently load with the new field's serde default — which is OK for additions but breaks catastrophically for renames or shape changes. This commit lays down the framework: `crates/tui/src/schema_migration.rs` — new module: - `SchemaMigration` trait. Each persistence domain implements it once with `CURRENT_VERSION`, `DOMAIN`, and an ordered `MIGRATIONS` list of `fn(&mut serde_json::Value) -> Result<(), MigrationError>` steps. Index `i` migrates from version `i+1` to `i+2`. - `SchemaMigration::migrate(value, from_version)` — runs every required step, stamping `value["schema_version"]` after each step so a partial failure leaves a known-state record rather than mixed. - `MigrationError` — typed error with from/to versions + reason. - `backup_before_migrate(path, domain)` — creates a `.bak` copy of the source file before mutation. Errors are warn-logged and ignored (continues because `write_atomic` is itself crash-safe). The `.bak` is left on disk as a manual recovery artifact — no automatic GC. `schema_migration::registry` — submodule that registers every existing persistence domain (session, offline_queue, runtime, task, automation, automation_run) at its current version with an empty MIGRATIONS list. No domain has shipped a schema bump yet, so today's behavior is a no-op. The next bump is now a 4-step recipe: 1. Write the `migrate_<domain>_v<N>_to_v<N+1>` step in this module. 2. Append it to `MIGRATIONS` and bump `CURRENT_VERSION`. 3. Wire `<Domain>Migration::migrate(...)` into the load function in the owning module. 4. Add a fixture-based integration test. Tests: 6 unit tests covering no-op, all-steps, partial migration, newer-than-current rejection, backup creation, and backup-failure robustness. Wiring into individual load sites (session_manager, runtime_threads, task_manager, automation_manager) is intentionally deferred until the first actual schema bump needs it — wiring without migrations would add code paths nothing exercises, and the framework is the part that needs to land before the next bump can ship safely. Closes #350. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:23:33 -05:00
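The stepwise migration loop described above might look roughly like the sketch below. To stay free of external crates it swaps `serde_json::Value` for a plain string map and uses `String` errors inside steps, so `Record`, `Step`, and `rename_name_to_title` are hypothetical stand-ins. Deriving the current version as `steps.len() + 1` is an assumption read off the "index `i` migrates `i+1` to `i+2`" rule, not the real trait's `CURRENT_VERSION` constant.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value` so the sketch needs no external crates.
pub type Record = BTreeMap<String, String>;

/// Typed error carrying the from/to versions and a reason.
#[derive(Debug, PartialEq, Eq)]
pub struct MigrationError {
    pub from: u32,
    pub to: u32,
    pub reason: String,
}

/// One migration step; `steps[i]` upgrades version `i + 1` to `i + 2`.
pub type Step = fn(&mut Record) -> Result<(), String>;

/// Run every step needed to bring `record` from `from_version` to current,
/// stamping `schema_version` after each successful step.
pub fn migrate(record: &mut Record, from_version: u32, steps: &[Step]) -> Result<u32, MigrationError> {
    let current = steps.len() as u32 + 1;
    if from_version == 0 || from_version > current {
        return Err(MigrationError {
            from: from_version,
            to: current,
            reason: "unsupported schema version".to_string(),
        });
    }
    for v in from_version..current {
        let step = steps[(v - 1) as usize];
        step(record).map_err(|reason| MigrationError { from: v, to: v + 1, reason })?;
        // A later failure leaves the record at a known, stamped version.
        record.insert("schema_version".to_string(), (v + 1).to_string());
    }
    Ok(current)
}

/// Example step (hypothetical): a rename, the kind of change serde defaults cannot cover.
pub fn rename_name_to_title(record: &mut Record) -> Result<(), String> {
    let value = record
        .remove("name")
        .ok_or_else(|| "missing `name` field".to_string())?;
    record.insert("title".to_string(), value);
    Ok(())
}
```

Stamping after each step, rather than once at the end, is what lets a failed run be resumed from the last version that was fully applied.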
Hunter Bown	4a282e767b	chore(deps): remove 8 unused dependencies flagged by cargo-machete (#341 ) cargo-machete found 8 direct dependencies that are declared but never used in the source tree. Removing them tightens the dependency graph and shrinks Cargo.lock by 40 lines (transitive crate removals where nothing else pulled them in). Removed: - deepseek-core: tokio (the core scaffold doesn't drive any tasks itself) - deepseek-config: serde_json (TOML-only crate; no JSON serialization) - deepseek-mcp: deepseek-protocol (proxy boundary doesn't consume protocol types) - deepseek-app-server: tracing (no tracing! macros in the transport layer) - deepseek-tui: bytes, csv, deepseek-tui-cli, tokio-stream - bytes: no Bytes-typed I/O paths in the TUI - csv: agent_swarm/spawn_agents_on_csv removed in #336/#357 - deepseek-tui-cli: TUI is the runtime, not the dispatcher; no facade calls - tokio-stream: futures-util::StreamExt is sufficient for our SSE / mpsc paths Verified by grep across each crate's `src/` — no `use` of the dep, no fully-qualified path references. cargo build, cargo clippy -D warnings, and cargo test continue to pass with the slimmed graph. Closes #341. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:18:31 -05:00
Hunter Bown	f0e1a6c63a	docs(prompts): tighten /rlm guidance — specialty tool, not first-class (#358 ) The previous rlm prompt guidance ("Treat rlm as a normal reasoning tool, not a last-resort escape hatch") encouraged the model to reach for rlm in cases where a direct read_file or focused agent_spawn would do better. The "RLM Is First-Class" framing was too encouraging given that rlm is genuinely a specialty tool: it pays off ONLY when the input can't fit in the model's context window. Three audit items from #358 addressed: 1. Reaching for rlm too often. Reframed as "specialty tool" with explicit do-not-use-when guidance front-loaded. The decomposition workflow now says "ONLY when an input genuinely doesn't fit" with a concrete size threshold (~50K tokens / a whole file / a long transcript / a multi-document corpus). 2. Tool description encourages overuse. The rlm tool's description() now leads with "DO NOT use this tool when..." (input fits, grep suffices, short classification, interactive exploration), and only then describes the legitimate use cases. Adds explicit cost/speed caveat. 3. Helpers documented as if they were tools. Both the rlm tool description and base.md/base.txt now state plainly: `llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched` live INSIDE the Python REPL. They are functions the sub-agent uses, NOT separately-callable tools the model invokes. Closes #358. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:16:31 -05:00
Hunter Bown	bc13dbfee7	fix(tui): show live submit disposition in composer hint (#345 ) When the user has typed something into the composer and hits Enter, the message goes to one of four fates depending on engine state: - Immediate (idle + online) — most common, sends right away - Steer (busy + tool execution) — forwards mid-turn - QueueFollowUp (busy + streaming text) — parks for after TurnComplete - Queue (offline) — parks on offline queue Previously the user had no way to tell which would fire BEFORE pressing Enter. The disposition flips with fast-changing internal state (whether the model is currently streaming text vs. running a tool, whether network connectivity has just dropped) and only the post-submit status toast hinted at the result — which is too late if you wanted a different behaviour. Fix: extend the composer's bottom hint line so when the composer has non-empty content, it shows what Enter will do RIGHT NOW. The hint flips live with engine state, so the user sees the real behaviour before pressing Enter: ↵ steer into current turn (sky blue, busy + tool execution) ↵ queue for next turn (muted, busy + streaming) ↵ offline queue (no engine) (warning yellow, offline) The Immediate case stays unhinted — that's the default and surfacing it would be noise. Closes #345. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:13:45 -05:00
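The four submit fates and the live hint described above can be sketched as a pair of pure functions. The variant names and hint strings follow the commit message; `EngineState`, `disposition`, and `composer_hint` are hypothetical, and giving the offline check priority over the busy states is an assumption.

```rust
/// The four fates of a submitted message, as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubmitDisposition {
    Immediate,
    Steer,
    QueueFollowUp,
    Queue,
}

/// Simplified engine state (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineState {
    Idle,
    RunningTool,
    StreamingText,
}

/// Decide what Enter would do right now.
pub fn disposition(online: bool, state: EngineState) -> SubmitDisposition {
    if !online {
        return SubmitDisposition::Queue;
    }
    match state {
        EngineState::Idle => SubmitDisposition::Immediate,
        EngineState::RunningTool => SubmitDisposition::Steer,
        EngineState::StreamingText => SubmitDisposition::QueueFollowUp,
    }
}

/// Hint text for the composer's bottom line; `None` means no hint is shown.
pub fn composer_hint(composer_text: &str, d: SubmitDisposition) -> Option<&'static str> {
    if composer_text.is_empty() {
        return None;
    }
    match d {
        // The default case stays unhinted to avoid noise.
        SubmitDisposition::Immediate => None,
        SubmitDisposition::Steer => Some("↵ steer into current turn"),
        SubmitDisposition::QueueFollowUp => Some("↵ queue for next turn"),
        SubmitDisposition::Queue => Some("↵ offline queue (no engine)"),
    }
}
```

Recomputing both functions on every render is what would make the hint flip live with engine state.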
Hunter Bown	c88b980d52	perf(session): metadata-only path for list_sessions — string-extract metadata block (#337 ) `SessionManager::list_sessions` previously called `serde_json::from_reader` to extract just the `metadata` field, which forced serde to scan every JSON token in the file just to validate structure — including the entire `messages` and `tool_log` arrays we were about to discard. For a user with hundreds of long sessions, a single startup `list_sessions()` was reading and parsing tens of MB of JSON. Optimization: read at most 64 KB up front and string-extract the top-level `metadata` object with a brace-balanced, string-aware scanner. Real metadata blocks are < 1 KB and always appear before the large `messages` payload, so the prefix read covers every realistic case. Falls back to a full-file read only if the metadata block isn't extractable from the prefix (legacy or oddly-formatted file). Net: typical session metadata load goes from O(file size) to O(1 KB) regardless of conversation length, and the disk read is bounded. Tests: - extract_top_level_metadata_skips_huge_messages_array — verifies the scanner correctly extracts metadata from a session whose `messages` array contains the literal string `"metadata"` in a user message. - extract_top_level_metadata_handles_braces_inside_strings — verifies brace-in-string handling so `{` / `}` inside JSON string values don't throw off the depth counter. Closes #337. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:11:37 -05:00
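A brace-balanced, string-aware scanner of the kind this commit describes could look like the sketch below. It is an independent illustration, not the project's implementation: the function name is borrowed from the test names in the commit, while the helpers `skip_string`, `skip_ws`, and `match_object` are hypothetical. The input is assumed to be the UTF-8 prefix already read from disk; a `None` result stands for "fall back to the full-file read".

```rust
/// Return the top-level `"metadata": { ... }` object from a JSON prefix,
/// or `None` if it is absent or cut off by the end of the prefix.
pub fn extract_top_level_metadata(json: &str) -> Option<&str> {
    let bytes = json.as_bytes();
    let mut depth = 0usize;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'"' => {
                let end = skip_string(bytes, i)?;
                // Only a key directly inside the root object counts.
                if depth == 1 && &bytes[i + 1..end] == b"metadata" {
                    let colon = skip_ws(bytes, end + 1);
                    if bytes.get(colon) == Some(&b':') {
                        let open = skip_ws(bytes, colon + 1);
                        if bytes.get(open) == Some(&b'{') {
                            let close = match_object(bytes, open)?;
                            return Some(&json[open..=close]);
                        }
                    }
                }
                i = end + 1;
                continue;
            }
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
        i += 1;
    }
    None
}

/// Given the index of an opening quote, return the index of the closing quote.
fn skip_string(bytes: &[u8], open: usize) -> Option<usize> {
    let mut i = open + 1;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 2, // skip the escaped character
            b'"' => return Some(i),
            _ => i += 1,
        }
    }
    None
}

/// Advance past ASCII whitespace.
fn skip_ws(bytes: &[u8], mut i: usize) -> usize {
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Given the index of `{`, return the index of its matching `}`.
fn match_object(bytes: &[u8], open: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut i = open;
    while i < bytes.len() {
        match bytes[i] {
            b'"' => i = skip_string(bytes, i)?,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

Scanning bytes rather than chars is safe here because every JSON structural character is ASCII and never appears inside a multi-byte UTF-8 sequence.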
Hunter Bown	c5627ebb14	fix(config): /logout + new key now uses the new key (#343 ) After running /logout and entering a new API key, subsequent requests could still be sent with the old key because the resolution path checked the OS keyring before the in-memory override. The keyring still held the old credential, so it shadowed the freshly-typed one. Three changes: 1. `Config::deepseek_api_key()` — explicit override is now path 0. When `self.api_key` is explicitly set (non-empty, non-sentinel), it wins over keyring/env/provider-config. This is what the user just typed, so it should be authoritative. Existing keyring-based flows are unaffected: users who store their key via `auth set` have `self.api_key = None`, so path 1 (keyring) still wins for them. 2. `clear_api_key()` now wipes the keyring + provider-scoped keys. Previously only the legacy root `api_key = ...` line was stripped from config.toml. Now every known provider slot in the OS keyring (deepseek, nvidia-nim, openrouter, novita, fireworks, sglang) is deleted, and every `api_key` line nested in a `[providers.<name>]` table is also stripped. 3. `/logout` clears the in-memory `Config` too. The dispatcher handler in ui.rs::execute_command_input wipes `config.api_key` and every `config.providers.*.api_key` so a future clone of the long-lived Config doesn't leak the stale value. The companion onboarding flow in ui.rs also stamps the new key onto `config` itself rather than only on a one-shot clone, so subsequent /provider switches see the new credential. Test coverage: - `clear_api_key_strips_root_and_provider_scoped_keys` — verifies all three credential locations get wiped from a fixture config.toml. - `deepseek_api_key_prefers_explicit_in_memory_override` — guards the precedence flip. - `deepseek_api_key_ignores_sentinel_placeholder` — confirms the legacy `KEYRING_SENTINEL` placeholder still falls through. Closes #343. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:08:52 -05:00
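The precedence flip in change 1 can be sketched as a small pure function. The names are hypothetical apart from `KEYRING_SENTINEL`, whose actual value the commit does not give (the string below is a placeholder). Collapsing env and provider-config into a single `other` fallback is also a simplification, since their relative order is not stated.

```rust
/// Placeholder value; the real sentinel string is not given in the commit.
pub const KEYRING_SENTINEL: &str = "__KEYRING__";

/// Resolve the API key: explicit in-memory override first (path 0),
/// then the keyring (path 1), then any other configured source.
pub fn resolve_api_key<'a>(
    explicit: Option<&'a str>,
    keyring: Option<&'a str>,
    other: Option<&'a str>,
) -> Option<&'a str> {
    // An empty value or the sentinel placeholder does not count as an override.
    let explicit = explicit.filter(|k| !k.trim().is_empty() && *k != KEYRING_SENTINEL);
    explicit.or(keyring).or(other)
}
```

With this ordering a key typed after `/logout` wins even if a stale keyring entry still exists, while users with no explicit key keep resolving through the keyring.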
Hunter Bown	899c703d81	fix(tui): convert remaining tokio::spawn sites to spawn_supervised + restore terminal on panic (#346 ) Completes the panic-safety work #346 started in `a8be33b3`. Converts every trivial production tokio::spawn site to spawn_supervised so a panicking task writes a crash dump to ~/.deepseek/crashes/ and the parent process stays alive. Sites converted: - tools/rlm.rs:190 — RLM progress drain - tools/subagent/mod.rs:888 — run_subagent_task spawn - tools/subagent/mod.rs:988 — run_subagent_task resume - core/engine.rs:744 — sub-agent mailbox drainer - core/engine.rs:1601 — engine event-loop spawn - lsp/client.rs:127 — LSP writer - lsp/client.rs:129 — LSP reader - lsp/client.rs:135 — LSP dispatcher - rlm/bridge.rs:188 — bridge progress drain - task_manager.rs:790 — task worker loop - automation_manager.rs:822 — automation scheduler Sites left as-is (already panic-safe with their own catch_unwind): - runtime_threads.rs:1242, 1462 — custom AssertUnwindSafe + catch_unwind - mcp.rs:322 — MCP SSE loop with custom catch_unwind Sites that don't need conversion: - runtime_api.rs:287 — axum::serve runs in the parent task, not spawned - runtime_api.rs:1583+ — test-helper spawn_test_server inside #[cfg(test)] - All other spawn calls are in #[cfg(test)] modules where panics are expected to propagate. Also: - main.rs panic hook now restores the terminal (LeaveAlternateScreen + disable_raw_mode) before invoking the original hook, so a panicked TUI doesn't leave the user's shell stuck in alt-screen mode. - Adds spawn_supervised_tests::panicking_task_writes_crash_dump_and_does_not_kill_parent that proves a panicking task produces a dated crash log under ~/.deepseek/crashes/<task>.log and the parent task completes Ok. Closes #346. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:02:28 -05:00
Hunter Bown	0c55c732a2	chore(tools): full SwarmOutcome cascade — delete swarm.rs + event variant + UI handlers (#357 ) Completes the v0.8.5 cleanup #336 started: with the model-callable swarm surface gone, the supporting event/UI/state plumbing has no consumers. - Delete crates/tui/src/tools/swarm.rs (2215 lines, parked under #![allow(dead_code)] since #336) - Drop pub mod swarm from tools/mod.rs - Remove Event::SwarmProgress variant + handler in tui/ui.rs - Remove app.rs swarm fields: pending_swarm_task_count, swarm_jobs, last_swarm_id, swarm_card_index (and SwarmOutcome import + retain) - Remove subagent_routing.rs swarm helpers: seed_fanout_card_from_tool_call, sync_fanout_card_from_tool_result, sync_fanout_card_from_swarm_outcome, worker_slot_from_swarm_task, status_to_lifecycle, swarm_task_status_to_lifecycle - Simplify active_fanout_counts to read directly from the active FanoutCard - Simplify handle_subagent_mailbox is_fanout to only "rlm" dispatches - Strip dead "agent_swarm" / "spawn_agents_on_csv" string match arms in ui.rs (tool dispatch, task panel refresh, ListSubAgents trigger, active-cell skip), tool_card.rs (ToolFamily::Fanout), and tool_routing.rs (extract_fanout_prompts function deleted entirely) - Trim WorkerSlot to id/agent_id/status (label/model/nickname were only populated by worker_slot_from_swarm_task); remove unused with_agent ctor - Remove unused SubAgentManager::max_agents and ::available_slots methods (only swarm.rs called them) - Update widgets/agent_card.rs doc comments to point at rlm + future multi-child dispatch instead of agent_swarm FanoutCard decision: kept. It remains the visual primitive for rlm and for any future multi-child dispatch the parent agent makes via repeated agent_spawn calls. Net: 2698 lines removed, 90 added. Closes #357. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:56:33 -05:00
Hunter Bown	42eea19066	fix(tui): route bracketed paste to provider picker key entry instead of composer (#342 ) Add handle_paste(text) -> ViewAction method to the ModalView trait with a default no-op. ProviderPickerView overrides it in KeyEntry stage to sanitize and append pasted text to api_key_input (rejecting whitespace in the same way as the Char handler). Wire into the Event::Paste handler in ui.rs: before falling through to app.insert_paste_text(), check view_stack.handle_paste(). If the top modal consumes the paste, skip the composer entirely. If a modal is open but does NOT consume the paste, also skip the composer — any modal that receives paste while focused should handle it, not leak into the chat input.	2026-05-02 02:33:26 -05:00
Hunter Bown	47bb91a9b7	fix(commands): wire /config <key> <value> to setter — args no longer silently ignored (#338 ) Add config_command(app, arg) that dispatches three paths: /config (no args) -> opens interactive editor (existing behavior) /config <key> -> shows current value of a single setting /config <key> <value> -> sets value via existing set_config_value Keys like model, approval_mode, locale, auto_compact, calm_mode, show_thinking, mode, max_history, sidebar_width, sidebar_focus, composer_density, composer_border, transcript_spacing are all read live from App state for the /config <key> display path. Unknown keys show a helpful error referencing /help config.	2026-05-02 02:28:25 -05:00
Hunter Bown	e5f56dee82	feat(config): add ApiProvider::DeepseekCN variant for China endpoint (#361 ) Add DeepseekCN as a first-class provider variant with: - Enum variant + parse/as_str/display_name/all methods - DEFAULT_DEEPSEEKCN_BASE_URL (https://api.deepseeki.com) - Auto-detection when base_url contains api.deepseeki.com - Locale-based auto-suggest: if no provider is configured and system locale (LC_ALL/LC_MESSAGES/LANG) starts with 'zh-*', the TUI defaults to DeepseekCN at startup - ProvidersConfig.deepseek_cn for provider-scoped credentials - All match arms updated across config.rs, client.rs, provider_picker.rs, main.rs, and ui.rs - provider_picker tests updated for the 7th provider entry	2026-05-02 02:10:57 -05:00
Hunter Bown	a8be33b35b	fix(tui): panic safety foundations — spawn_supervised wrapper + process panic hook (#346 ) Add spawn_supervised(name, location, future) to utils.rs that wraps futures in AssertUnwindSafe + catch_unwind, logs panics via tracing::error!, and writes crash dumps to ~/.deepseek/crashes/. Add process-level panic hook to main.rs that writes crash dumps before the default hook fires. Convert persistence_actor::spawn_persistence_actor as the first spawn_supervised caller to prove the wiring. Remaining 34 tokio::spawn sites marked as follow-up for a focused PR. Also fix save_mcp_config in main.rs to use write_atomic (missed in #355).	2026-05-02 01:53:50 -05:00
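The core of the wrapper described above, `AssertUnwindSafe` plus `catch_unwind`, can be shown with a synchronous stand-in. This is not the async `spawn_supervised` from the commit: `run_supervised` is a hypothetical helper that runs a closure on the current thread and turns a panic into an error string, with no tokio, tracing, or crash-dump writing.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Run `task`, converting a panic into an `Err` describing it.
/// Assumes the default unwinding panic strategy (not `panic = "abort"`).
pub fn run_supervised<T>(name: &str, task: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(task)).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string());
        format!("task `{name}` panicked: {msg}")
    })
}
```

A supervising caller could log the returned message and write it to a crash file while the parent keeps running; the default panic hook still prints to stderr unless it has been replaced.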
Hunter Bown	5bd63c779a	fix(tui): atomic file writes for ~/.deepseek/ persistence (#355 ) Add write_atomic helper (NamedTempFile + fsync + rename) in utils.rs. Convert all non-append fs::write sites: - session_manager.rs: save_session/save_checkpoint/save_offline_queue_state - workspace_trust.rs: write_trust_file_at - task_manager.rs: write_json_atomic → delegates to write_atomic - runtime_threads.rs: write_json_atomic → delegates to write_atomic - mcp.rs: save_config/init_config/save_legacy - audit.rs: buffered append with flush_and_sync after each event - runtime_threads append_event: add sync_all after flush	2026-05-02 01:50:21 -05:00
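The write-then-rename pattern behind `write_atomic` can be sketched with the standard library alone. The commit's helper uses `NamedTempFile`; the version below substitutes a hand-built temporary name in the same directory, so the naming scheme is an assumption, and it does not fsync the parent directory.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` so readers see either the old file or the new one.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Temporary sibling in the same directory, so the rename stays on one filesystem.
    let mut tmp_name = OsString::from(".");
    tmp_name.push(file_name);
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = dir.join(tmp_name);

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp_path)?;
        file.write_all(contents)?;
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
        fs::rename(&tmp_path, path)
    })();

    // Best-effort cleanup if anything failed before the rename.
    if result.is_err() {
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Keeping the temporary file next to the target matters because a rename across filesystems is not atomic and can fail outright.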
Hunter Bown	aa23182674	chore(tools): remove /swarm command + agent_swarm/spawn_agents_on_csv tool surface; park swarm.rs pending #357 cascade (#336 ) Surface removed: /swarm slash command, agent_swarm, spawn_agents_on_csv, swarm_status, swarm_result, swarm_cancel tools, report_agent_job_result. Prompts/docs/tests updated. swarm.rs parked with #![allow(dead_code)] pending the full cascade in #357. RLM prompt audit tracked in #358.	2026-05-02 01:30:23 -05:00
Hunter Bown	0ca0570a86	fix(tui): preserve composer draft when navigating input history (#283 )	2026-05-02 00:52:08 -05:00
Hunter Bown	f1f601c28b	perf(tui): persistence actor for session save/checkpoint I/O Replaces synchronous disk writes on the UI thread with a dedicated persistence actor task. The UI now try_sends a PersistRequest and returns immediately — keyboard input is never gated on write completion. Changes: - New persistence_actor module with bounded-coalescing actor - Actor spawns at TUI startup; global singleton so no App struct change - All persist_checkpoint/persist_session_snapshot/clear_checkpoint calls replaced with persistence_actor::persist(PersistRequest::...) - Dropped redundant TurnStarted persist (nothing changed between SendMessage's checkpoint and TurnStarted) - Fixed collapsible_if clippy lint This is the P0 fix for the post-send terminal freeze caused by serialising 500KB+ sessions to disk on the UI thread.	2026-05-02 00:13:45 -05:00
Hunter Bown	6659026dc1	Merge feat/swarm-command-phase-a into feat/v0.8.5 Resolved conflict in commands/mod.rs: both branches added entries to the COMMANDS registry; kept the /swarm command entry.	2026-05-02 00:04:43 -05:00
Hunter Bown	24fc5b6de1	feat(i18n): Phase 1c-extra — keybinding descriptions, /home, /settings, /help labels Closes the gate the maintainer set for v0.8.5: every / command, /help, and /settings should look perfect in both English and Chinese before multi-agent work begins. v0.8.4 shipped Phase 1a/b/c (88 MessageIds) but four mixed-language gaps remained: 1. Keybinding descriptions (41 entries) — the help overlay showed translated section labels (Phase 1c) over English description text. `KeybindingEntry` now carries `description_id: MessageId` instead of a raw `&'static str`; all 41 descriptions translated to en/ja/zh-Hans/pt-BR. 2. Settings: header — `Settings::display` now takes a `Locale` and resolves the title via `MessageId::SettingsTitle`. The field-name keys (auto_compact, calm_mode, etc.) intentionally stay English — they are the literal TOML keys users edit. 3. /home dashboard — entirely English before. ~25 lines of section headers, mode tips, and quick-action hints translated. Path interpolations route through `display_path` (privacy invariant). 4. /help <topic> text command — the inline labels `Usage:` and `Aliases:` plus the `Unknown command:` fallback all use tr(). Also adds three buffer-render tests confirming the help overlay / settings / home dashboard render in zh-Hans without missing markers or English bleed-through. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 23:57:35 -05:00
Hunter Bown	87f42656a7	feat(swarm): add /swarm command with sequential\|mixture\|distill\|deliberate modes (Phase A foundation, #303 )	2026-05-01 23:48:24 -05:00
Hunter Bown	359c27437b	feat(i18n): Phase 1c-extra — keybinding descriptions, /home, /settings, /help labels Closes the gate the maintainer set for v0.8.5: every / command, /help, and /settings should look perfect in both English and Chinese before multi-agent work begins. v0.8.4 shipped Phase 1a/b/c (88 MessageIds) but four mixed-language gaps remained: 1. Keybinding descriptions (41 entries) — the help overlay showed translated section labels (Phase 1c) over English description text. `KeybindingEntry` now carries `description_id: MessageId` instead of a raw `&'static str`; all 41 descriptions translated to en/ja/zh-Hans/pt-BR. 2. Settings: header — `Settings::display` now takes a `Locale` and resolves the title via `MessageId::SettingsTitle`. The field-name keys (auto_compact, calm_mode, etc.) intentionally stay English — they are the literal TOML keys users edit. 3. /home dashboard — entirely English before. ~25 lines of section headers, mode tips, and quick-action hints translated. Path interpolations route through `display_path` (privacy invariant). 4. /help <topic> text command — the inline labels `Usage:` and `Aliases:` plus the `Unknown command:` fallback all use tr(). Also adds three buffer-render tests confirming the help overlay / settings / home dashboard render in zh-Hans without missing markers or English bleed-through. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 23:38:27 -05:00
Hunter Bown	3d3ff0c5cf	Release v0.8.4: Phase 1 i18n + cache-prefix stability * fix(pricing): extend V4 Pro 75% discount expiry to 2026-05-31 15:59 UTC DeepSeek extended the promotional discount past the original 2026-05-05 cutoff. Without this update the TUI would have started showing 4× the actual billed cost on May 6. Source: https://api-docs.deepseek.com/quick_start/pricing — "extended until 2026/05/31 15:59 UTC". Adds a regression test pinning the new active window so a future revert to the May 5 date trips the suite immediately. Closes #267 * chore: remove stale TODO(integrate) markers from already-integrated modules Five `// TODO(integrate)` comments and one matching "Not yet integrated" note were misleading anyone grepping for integration work. Each module is in fact wired up: - execpolicy/mod.rs → tools/shell.rs:1322 (load_default_policy) - sandbox/mod.rs → tools/shell.rs:28, main.rs:2647, tui/approval.rs:30 - sandbox/policy.rs → main.rs:2752, tui/approval.rs:30 (SandboxPolicy) - command_safety.rs → tools/shell.rs:1321, tools/tasks.rs:13, tools/approval_cache.rs:26 - tui/streaming/mod.rs → tui/app.rs:38 (StreamingState) The remaining TODO at mcp.rs:1771 covers a separate "wire legacy sync API into CLI subcommands or remove" decision and is left in place. Closes #266 * docs(release): add install + dual-binary template to GitHub Release page Closes #265. The Release page used the auto-generated commit-title body. New users hitting the Release page from Twitter / npm-search had no on-page guidance that the dispatcher (`deepseek`) and the TUI runtime (`deepseek-tui`) ship as two binaries that must coexist; #258 was an external user spending 11 minutes figuring this out and #272 was the follow-on confusion. 
The new body covers: - npm wrapper as the recommended install - `cargo install deepseek-tui-cli deepseek-tui --locked` (both crates) - Manual download with a per-platform table showing both artifacts - sha256 verify using the existing `deepseek-artifacts-sha256.txt` - Changelog link * feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss Step 1 of #263. Without per-turn telemetry the prefix-cache audit is unfounded speculation; the rest of the issue's investigation steps depend on this surface. The DeepSeek API already returns `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` per turn, and we already store the latest on App. This adds a 50-turn ring (`turn_cache_history`) populated at the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]` slash command that renders a fixed-width table of the last N turns with per-turn ratios and a session aggregate. Default count is 10; larger values clamp to the ring size. Edge cases the formatter handles: - No telemetry yet → friendly "no turns recorded" message - `cache_hit_tokens = None` (provider didn't report) → row renders all em-dashes and is excluded from session aggregates so one missing-telemetry turn can't make the average ratio look broken. - `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as `input − hit` and mark the cell with `*`. Footer documents the asterisk. - Ring at cap (50) → push evicts oldest. Tests cover all four paths plus the cap. * test(prompts): add cache-prefix stability harness for #263 step 2 The DeepSeek prefix-cache only hits while the byte prefix of each request matches the prior call. Anything in the cached prefix that varies turn-to-turn for unchanged inputs is a cache buster. Adds a focused harness next to the production surface so the property is regression-guarded: 1. 
`first_divergence(a, b)` helper that returns the first divergent byte position with a `±32 byte` window of context, used by the custom assertion `assert_byte_identical`. Future suspect tests can reuse this to surface "where" rather than just "fail". 2. `compose_prompt_is_byte_stable_across_calls` — sweeps every (mode, personality) pair and pins that two consecutive calls produce identical bytes. Rules out suspect #4 (mode-prompt churn). 3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace` — the call site `engine.rs::build_tool_context` actually invokes, pinned for an empty workspace across all three modes. 4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary` — pins that the surrounding prompt construction faithfully embeds the working_set summary it's given without injecting extra non-determinism. (The actual working_set summary stability lives in `working_set.rs` and is the next investigation target — see issue note in PR description.) Foundation for the suspect-by-suspect bisection in the rest of #263. * fix(secrets): never overwrite the secrets file when load_unlocked errors `FileKeyringStore::set` and `delete` did `self.load_unlocked().unwrap_or_default()`, which wiped every existing secret if the read failed for any reason other than \"file is missing\": - file mode != 0600 (`InsecurePermissions`) — easy on headless / CI environments where a permissive umask got applied - corrupt JSON - transient I/O error In all of those, the next `store_unlocked` overwrote the file with an empty-or-single-entry blob and reset perms to 0600, silently losing every other provider's key. Switch both call sites to `?`. `load_unlocked` already returns `Ok(default)` for a missing file, so the first-write-creates-the-file ergonomic is preserved (covered by the new `file_store_set_still_creates_file_when_missing` test). Adds four regression tests: - set: insecure perms surface InsecurePermissions and leave the file byte-identical. 
- delete: same. - set: corrupt JSON surfaces the parse error and leaves the file byte-identical. - set: missing file path still works (idempotence guard). Closes #281 * fix(cache): make tool catalog byte-stable across calls and sessions DeepSeek's KV prefix cache hits on the longest matching byte prefix of the request. Two places in the tool-array path were silently introducing divergence: 1. `ToolRegistry::to_api_tools()` iterated `self.tools.values()` directly. Rust's default `HashMap` is seeded with `RandomState` per process, so every `deepseek` launch produced a different tool order — the cross- session resume case (the one with the biggest cache wins) never hit. 2. `active_tool_list_from_catalog()` filtered the catalog `Vec` by the active set in catalog order. When ToolSearch activated a previously- deferred tool mid-conversation, the new tool appeared at its catalog index, shifting every later tool's byte offset and busting the cached prefix from there onwards. Fixes: - `to_api_tools()` now sorts by tool name before emitting the API tool array. Stable across calls AND across launches. - `build_model_tool_catalog()` sorts each partition (built-ins first, contiguous; MCP tools after, also alphabetical). Mirrors Claude Code's `assembleToolPool` strategy where they explicitly call out cache stability as the reason: "a flat sort would interleave MCP tools into built-ins and invalidate all downstream cache keys whenever an MCP tool sorts between existing built-ins." - `active_tool_list_from_catalog()` puts always-loaded tools in catalog order at the head and deferred-but-now-active tools at the tail. A deferred-tool activation during ToolSearch no longer shifts earlier tools' positions. Adds three regression tests: - `to_api_tools_emits_alphabetical_order_regardless_of_registration_order` - `model_tool_catalog_sorts_each_partition_for_prefix_cache_stability` - `active_tool_list_pushes_deferred_activations_to_the_tail` Refs #263. 
Findings produced by reading reference Claude Code source side-by-side with our request-building flow; full delta analysis in the PR description. * fix(sandbox): elevate Agent-mode shell sandbox to allow network access The seatbelt-default policy is `WorkspaceWrite { network_access: false }`, which on macOS emits `(deny default)` with no `(allow network-outbound)` / `(allow system-socket)`. Every outbound socket call from a sandboxed shell command — including `getaddrinfo` for DNS — gets denied by the kernel. Symptom: "DNS resolution failed" for any URL the model tries to reach via curl, yt-dlp, package managers, etc. Engine.build_tool_context only elevated the policy in Yolo mode, leaving Agent mode (the default) stuck on the strict default. That's tighter than competitors (Claude Code, Codex) without buying any safety the application-level NetworkPolicy or the approval flow doesn't already provide. Switch the elevation to a `match` so: - Plan → no elevation (read-only investigation; shell tool not registered) - Agent → WorkspaceWrite { network_access: true, … } - Yolo → WorkspaceWrite { network_access: true, … } (unchanged) Adds `agent_and_yolo_modes_elevate_shell_sandbox_to_allow_network` so a future revert to the no-network default trips CI immediately. Closes #273 * fix(skills): treat bare github.com/<owner>/<repo> URLs as GitHubRepo Closes #269. `/skill install https://github.com/obra/superpowers` failed on every platform with `invalid gzip header`. Root cause: `InstallSource::parse` matched any `https://`-prefixed spec as `DirectUrl`, so the installer downloaded the HTML repo page (200 OK, `text/html`) and tried to gzip-decode HTML. The user reported it from Win11 + PowerShell but the parse path is platform-independent. 
Recognize bare GitHub repo URLs in `InstallSource::parse`: - `https://github.com/<owner>/<repo>` - `https://github.com/<owner>/<repo>/` - `https://github.com/<owner>/<repo>.git` - `https://github.com/<owner>/<repo>.git/` - `https://www.github.com/<owner>/<repo>` - `http://github.com/<owner>/<repo>` (legacy) …all route to the existing `GitHubRepo` source, which already produces `https://github.com/<repo>/archive/refs/heads/{main,master}.tar.gz` candidates with proper fallback. URLs with a third path segment (`/archive/...`, `/blob/...`, `/tree/...`) keep going through `DirectUrl` because the user picked that exact path. Adds two regression tests: one asserting the seven recognised forms all canonicalize to `github:obra/superpowers`, and one pinning the sub-resource paths to `DirectUrl`. * fix(cache): drop volatile fields from working_set summary block (#280) (#287) The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed: 1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed. 2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency. Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical. 
The shared `first_divergence` / `assert_byte_identical` helpers (from #279) move from `prompts::tests` into `test_support` so working_set tests can reuse them. Closes #280. * fix(cache): memoise tool catalog so descriptions stay byte-stable (#289) `to_api_tools` previously re-sampled `tool.description()` and `tool.input_schema()` on every call. Native tools return `&'static str` and a `json!` literal, so the bytes were stable in practice — but the `McpToolAdapter` returns `self.tool.description.as_deref()`, which can drift when the upstream MCP server reconnects with a different description string. Any drift mid-session rewrites the tool catalog that lands in the cached prefix and busts every byte that follows. Adds an `api_cache: OnceLock<Vec<Tool>>` field on `ToolRegistry`. The first `to_api_tools` call materialises the catalog; subsequent calls return a clone of the cached vector. Mutations (`register`, `remove`, `clear`) reset the field so the next read rebuilds. Mirrors reference-cc's `getToolSchemaCache` (`utils/api.ts:119–208`). Tests: - `to_api_tools_pins_description_bytes_across_calls` registers a tool whose `description()` advances through a script of pre-built strings on each call. After the cache is populated, the second `to_api_tools` read returns the original description because `description()` is no longer invoked. Without the cache the second read would return the next script entry. - `register_invalidates_api_tools_cache` registers a tool, snapshots, registers another, snapshots again, and asserts the second snapshot reflects both tools (cache rebuilt) and that the varying tool's description advanced (proving the rebuild actually re-sampled). - `remove_and_clear_invalidate_api_tools_cache` covers the other two invalidation paths. 
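A minimal sketch of the memoisation described in the #289 entry above, assuming a simplified registry. `ApiTool`, `Registry`, and the closure-based description source are hypothetical stand-ins; the log only names `ToolRegistry`, `to_api_tools`, and an `api_cache: OnceLock<Vec<Tool>>` field. The cache reset here reassigns a fresh `OnceLock`, which may differ from how the real code resets it.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical stand-in for the API tool entry; the real `Tool` type is not shown in the log.
#[derive(Clone, Debug, PartialEq)]
pub struct ApiTool {
    pub name: String,
    pub description: String,
}

/// Hypothetical registry: each tool is a closure that reports its current description,
/// which lets a test simulate an upstream description that drifts between calls.
#[derive(Default)]
pub struct Registry {
    tools: HashMap<String, Box<dyn Fn() -> String>>,
    api_cache: OnceLock<Vec<ApiTool>>,
}

impl Registry {
    /// Registering a tool mutates the catalog, so the memoised copy is dropped.
    pub fn register(&mut self, name: &str, describe: Box<dyn Fn() -> String>) {
        self.tools.insert(name.to_string(), describe);
        self.api_cache = OnceLock::new();
    }

    /// Removing a tool also drops the memoised copy.
    pub fn remove(&mut self, name: &str) {
        self.tools.remove(name);
        self.api_cache = OnceLock::new();
    }

    /// The first call samples every description and sorts by name;
    /// later calls return a clone of the cached vector without re-sampling.
    pub fn to_api_tools(&self) -> Vec<ApiTool> {
        self.api_cache
            .get_or_init(|| {
                let mut tools: Vec<ApiTool> = self
                    .tools
                    .iter()
                    .map(|(name, describe)| ApiTool {
                        name: name.clone(),
                        description: describe(),
                    })
                    .collect();
                tools.sort_by(|a, b| a.name.cmp(&b.name));
                tools
            })
            .clone()
    }
}
```

In this sketch two consecutive `to_api_tools` reads return identical bytes even when the description source would have changed, and a `register` call forces the next read to re-sample.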
* fix(cache): sort project_tree and summarize_project output (#290) Both helpers walked the workspace via `ignore::WalkBuilder::build()` and emitted entries in the OS readdir order — non-deterministic across filesystems (htree-hash on ext4, insertion-order on APFS, etc.). Their output lands in the fallback branch of the system prompt's project context (when the workspace has no AGENTS.md / CLAUDE.md) and inside the `project_map` tool surface, both of which feed the cached prefix. `summarize_project` now sorts the collected key-files list before the type-detection logic and the fallback `Project with key files: …` join. `project_tree` collects `(rel_path, is_dir)` tuples, sorts by full path, and only then formats the indented tree. Sorting by full path preserves the visual tree shape — `"src" < "src/lib.rs"` because the shorter string compares less — while making siblings deterministic. Tests cover sibling order, parent-before-children invariant, byte stability across two consecutive calls, and the fallback `Project with key files:` branch (the only branch where the joined order escapes into output without further sorting downstream). * fix(client): unique fallback id for parallel streaming tool calls (#291) When a streamed tool_call delta omits the `id` field, the chat-completion decoder used to fall back to the literal string `"tool_call"` for every call. With the V4 API's native parallel tool calls (multiple tool_calls in one delta), every parallel call ended up with the same fallback id — downstream tool-result routing then matched the first call's result twice and the second call hung waiting for an answer that never arrived. The fallback now indexes by the assigned `content_block` position, producing `"call_0"`, `"call_1"`, … within a single response. Upstream- supplied ids are still forwarded verbatim; only the fallback path changes. 
Tests pin both invariants: - `decoder_assigns_unique_fallback_ids_to_parallel_tool_calls_missing_id` feeds two tool calls without `id` in one delta and asserts they get distinct ids. - `decoder_preserves_upstream_tool_call_id_when_present` keeps the forward-as-is path honest. * fix(cache): place handoff and working_set after static prompt blocks (#292) * fix(cache): drop volatile fields from working_set summary block (#280) The working-set summary lands inside the system prompt before the historical conversation, so any byte that drifts there cache-misses everything that follows in DeepSeek's KV prefix cache. Two sources of turn-over-turn drift are removed: 1. The rendered line is now `- {path} ({kind})`. The previous form interpolated `entry.touches` and `self.turn - entry.last_turn`, both of which advance on every user message even when no new paths are observed. 2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC) instead of the turn-aware `sorted_entries`. The recency bonus in `score_entry` crosses bucket boundaries as turns advance, so even without rendering `last seen` the order — and which entries cross the `max_prompt_entries` cutoff — drifted. Compaction pinning still uses `sorted_entries` because it genuinely wants recency. Adds a regression test that observes a fixed message set, calls `summary_block` before and after `next_turn()`, and asserts the two outputs are byte-identical. The shared `first_divergence` / `assert_byte_identical` helpers (from #279) move from `prompts::tests` into `test_support` so working_set tests can reuse them. Closes #280. * fix(cache): place handoff and working_set after static prompt blocks `system_prompt_for_mode_with_context_and_skills` previously interleaved volatile content into the static prefix: 1. mode prompt static 2. project context static 3. working_set_summary ← volatile 4. skills_block static 5. handoff_block ← volatile 6. ## Context Management static 7. 
COMPACT_TEMPLATE static Anything past byte (3) cache-missed every time the working-set drifted or `/compact` rewrote `.deepseek/handoff.md` — including the static `## Context Management` and `## Compaction Handoff` blocks behind them. New order keeps every static block in the cached prefix and pushes the two volatile blocks to the end: 1. mode prompt 2. project context (or fallback automap) 3. skills block 4. ## Context Management (Agent / Yolo only) 5. COMPACT_TEMPLATE ── volatile boundary ── 6. handoff block 7. working-set summary Adds a doc comment on the function describing the volatile-content-last invariant so future contributors don't reintroduce churn into the prefix. Adds two regression tests: - `system_prompt_with_handoff_file_is_byte_stable_when_file_is_unchanged` pins the handoff path with a fixture file. - `handoff_and_working_set_appear_after_static_blocks` asserts the ordering invariant directly so a future reorder fails loudly. Reference: Claude Code's own prompt builder marks this same boundary with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant; we don't introduce the abstraction yet but match the principle. * feat(i18n): localize slash command help (Phase 1a, #285) (#294) Adds 44 new MessageIds, one per slash command, and translations to all four shipped locales (en/ja/zh-Hans/pt-BR). Refactors CommandInfo so the English description now lives in localization.rs (single source of truth) instead of being duplicated on the struct, and threads the active Locale through the three render surfaces: - crates/tui/src/tui/views/help.rs (the ?/F1/Ctrl+/ help overlay) - crates/tui/src/tui/command_palette.rs (Ctrl+K palette) - crates/tui/src/commands/core.rs (the /help text command) Usage strings (e.g. /cache [count]) stay English by design — they're placeholder syntax, not natural language. 
The existing locale-coverage test (`shipped_first_pack_has_no_missing_core_messages`) already iterates ALL_MESSAGE_IDS across Locale::shipped(), so the 44 new IDs are automatically required to be present in all four locale arms or CI fails. This is the first of several incremental Phase 1 PRs. Phase 1b covers the debug commands (/tokens /cost /cache), 1c the footer hints, and 1d doctor output. Phases 2–3 cover onboarding and error surfaces. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(i18n): localize /tokens /cost /cache debug output (Phase 1b, #285) (#295) Adds 13 new MessageIds covering the report templates and the sub-strings shared across them, with translations for all four shipped locales (en/ja/zh-Hans/pt-BR): - CmdTokensReport, CmdTokensContextWithWindow, CmdTokensContextUnknownWindow - CmdTokensCacheBoth, CmdTokensCacheHitOnly, CmdTokensCacheMissOnly - CmdTokensNotReported - CmdCostReport - CmdCacheNoData, CmdCacheHeader, CmdCacheTotals, CmdCacheFootnote, CmdCacheAdvice Each template uses {placeholder} substitution via String::replace rather than format!, since format! requires a literal — the locale-resolved &'static str isn't one. The placeholder convention ({active}, {hit}, {miss}, …) means a translator can re-order or restructure a sentence freely without changing the call site. Helpers `token_count`, `active_context_summary`, `cache_summary`, and `format_cache_history` now take `Locale` so each can resolve their templates from the same source of truth. The English templates byte-match the previous hardcoded format strings so the existing 16 debug-command tests pass unchanged. Column headers in the cache table (`turn in out hit miss …`) are intentionally NOT localized — the body rows are formatted with fixed column widths and translating the header words would break alignment. Numbers, ratios, and the model id stay in English form. 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(i18n): localize footer state + help section labels (Phase 1c, #285) (#296) Adds 11 new MessageIds covering visible footer chrome and the help-overlay section headings, with translations for all four shipped locales: Footer: - FooterWorking — animated `working` / `working.` / … pulse - FooterAgentSingular / FooterAgentsPlural — the sub-agent count chip - FooterPressCtrlCAgain — the quit-confirmation toast Help overlay sections (`?` / `F1` / `Ctrl+/`): - HelpSectionNavigation, HelpSectionEditing, HelpSectionActions, HelpSectionModes, HelpSectionSessions, HelpSectionClipboard, HelpSectionHelp `KeybindingSection::label` now takes Locale and returns tr(locale, …). `footer_working_label` and `footer_agents_chip` likewise take Locale; the two production callsites in tui/ui.rs pass `app.ui_locale`. The mode chip itself (agent / yolo / plan) intentionally stays English — those are brand/acronym labels, and translating them would mean explaining to maintainers what `代理` means in a bug report. The keybinding catalog DESCRIPTIONS (41 entries) are not translated in this PR — those are technical prose that would dwarf the rest of i18n work and can ship in v0.8.5. Section labels are translated so the help overlay groups read as expected in any locale. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(commands): smoke-test that every / command dispatches to a handler (#299) Adds two parallel-safe smoke tests in `crates/tui/src/commands/mod.rs` that iterate the COMMANDS registry and verify every command — and every declared alias — dispatches to a real handler. A dispatch miss surfaces as the fall-through `Unknown command:` error message in `execute`, which used to be invisible until a user typed the command and saw the "did you mean" suggestion fire on a registered command. 
The tests build a workspace-isolated app via `tempfile::TempDir` so side-effecting handlers (`/init` writing AGENTS.md, `/save` and `/export` writing files) do not pollute `crates/tui/` when CI runs from there. `/save` and `/export` get an explicit tempdir-relative path because their no-arg defaults still resolve relative to `cwd`. `/restore` is skipped — it shells out to git for the snapshot repo and its own dedicated tests in `commands/restore.rs` already serialize on the global env mutex via `scoped_home`. The existing coverage there is sufficient. Closes a gap surfaced when verifying that the v0.8.4 i18n refactor (#294, #295, #296) did not silently break any slash-command dispatch. All 44 commands and their aliases pass (16 aliases on top of the 44 names; `/restore` is the only skip). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump version to 0.8.4 (#297) CHANGELOG entry covers the v0.8.4 work landed since 0.8.3: - Localization Phase 1 (#285) — slash command help (#294), debug command output (#295), footer state and help-overlay section labels (#296). Adds 68 new MessageIds across all four shipped locales (en/ja/zh-Hans/pt-BR). - Cache-prefix stability (#263) — five companion fixes (#287, #288→#292, #289, #290, #291) that keep the DeepSeek prefix cache stable across turns. - Plus the items already in [Unreleased]: agent-mode network exec (#272), /skill GitHub URL parsing (#269), and the V4 Pro discount expiry extension (#267). Bumps: - Cargo.toml workspace version 0.8.3 → 0.8.4 - npm/deepseek-tui/package.json version + deepseekBinaryVersion 0.8.3 → 0.8.4 - Cargo.lock regenerated from the new workspace version. Phase 1d (doctor output), Phase 2 (onboarding/init/missing-companion), and Phase 3 (tool errors / sandbox denials / approvals) deferred to v0.8.5. The shipped Phase 1 surfaces (slash commands, debug telemetry, footer chrome) cover the highest-traffic UI paths Chinese users see first. 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(release): bump internal path-dep versions + repair doc link (#301) CI on PR #300 (release feat/v0.8.4 → main) flagged two regressions introduced by the 0.8.4 version bump: 1. Version drift — path-dependency `version = "0.8.3"` references inside the workspace crates (10 crates: agent, app-server, cli, config, core, execpolicy, hooks, mcp, tools, tui) did not move with the workspace `[workspace.package] version = "0.8.4"`. The CI guard `scripts/release/check-versions.sh` requires they match. 2. Broken intra-doc-link `[crate::localization::english]` in the CommandInfo doc comment — `english` is private. Replaced with a reference to the public `description_for` accessor and the public `tr()` function. Verified with: - scripts/release/check-versions.sh — Version state OK. - RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps — green. - cargo fmt + clippy + test all green. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 23:02:38 -05:00
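The tool-ordering part of the cache-stability work in the squashed commit above could look roughly like the sketch below. `ToolEntry` and its fields are hypothetical, and ordering the activated tail by activation order is an assumption; the log only says that built-ins sort first, MCP tools sort after, and deferred activations land at the tail.

```rust
/// Hypothetical tool descriptor; the real catalog entry type is not shown in the log.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToolEntry {
    pub name: String,
    pub is_mcp: bool,
    pub deferred: bool,
}

/// Sort each partition alphabetically while keeping built-ins contiguous at the front.
/// `false < true` for `bool`, so comparing `is_mcp` first puts built-ins ahead of MCP tools.
pub fn build_catalog(mut tools: Vec<ToolEntry>) -> Vec<ToolEntry> {
    tools.sort_by(|a, b| a.is_mcp.cmp(&b.is_mcp).then_with(|| a.name.cmp(&b.name)));
    tools
}

/// Always-loaded tools keep catalog order at the head. Deferred tools that were
/// activated later are appended at the tail (assumed: in activation order), so
/// activating one never shifts the byte offsets of earlier entries.
pub fn active_tool_list(catalog: &[ToolEntry], activated: &[&str]) -> Vec<ToolEntry> {
    let mut out: Vec<ToolEntry> = catalog.iter().filter(|t| !t.deferred).cloned().collect();
    for name in activated {
        if let Some(tool) = catalog.iter().find(|t| t.deferred && t.name == *name) {
            out.push(tool.clone());
        }
    }
    out
}
```

The property that matters for a prefix cache is that the list before an activation is a strict prefix of the list after it.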
Hunter Bown	40ec563f3e	Merge pull request #256 from Hmbown/feat/v0.8.3 feat/v0.8.3: privacy, skills bug fix, palette + schema test coverage	2026-05-01 18:41:36 -05:00
Hunter Bown	997c7f4bcd	chore(release): verify dual registry publish state	2026-05-01 11:06:45 -05:00
Hunter Bown	1042e37fbd	refactor(cli): centralize feature command output	2026-05-01 11:06:42 -05:00
Hunter Bown	e620e75f99	chore: release v0.8.3 Bumps workspace, all internal path-deps, and npm wrapper (version + deepseekBinaryVersion) from 0.8.2 → 0.8.3. Lockfile re-locked offline. CHANGELOG entry summarizing the 0.8.3 lane: skills path bug fix, privacy contraction, helpful missing-companion error (#258), engine decomposition (#227), bridge/persistence/palette test gap closures, crates.io badge, and 10 issue closures. Local v0.8.3 verified at /tmp/deepseek-0.8.3-test/ before publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:46:21 -05:00
Hunter Bown	a7e629ae4d	test(parity): scan engine submodules after decomposition refactor The protocol-recovery contract tests `include_str!`-ed `engine.rs` and asserted the fake-wrapper markers (`[TOOL_CALL]`, `<function_calls>`, …) appeared as string literals in that file. The recent engine decomposition refactor (commits f0fad7aa..a64bc9bb) split engine.rs into `engine/streaming.rs`, `engine/turn_loop.rs`, `engine/dispatch.rs`, `engine/tool_setup.rs`, `engine/tool_execution.rs`, `engine/tool_catalog.rs`, `engine/context.rs`, `engine/approval.rs`, `engine/capacity_flow.rs`, and `engine/lsp_hooks.rs`. The marker literals followed the code into those files, so the original single-file `include_str!` no longer saw them and 4 protocol-recovery tests went red. Switch to an `ENGINE_SOURCES: &[&str]` array of `include_str!`s across engine.rs + every submodule, with a small `any_engine_source_contains` helper. Test bodies are otherwise unchanged. The file-size sanity check on `engine.rs` (>10_000 bytes) still passes — engine.rs is still ~65k bytes after the refactor. Same regression coverage as before; just survives the new file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:34:11 -05:00
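The multi-file scan described in the commit above can be sketched as follows. In the real tests each array entry would presumably be an `include_str!` of an engine source file; the string literals here are invented stand-ins so the sketch compiles on its own, and the helper body is an assumption based only on its name.

```rust
/// Stand-ins for the engine sources. In the real tests each entry would be
/// `include_str!("...")` for `engine.rs` or one of its submodules.
const ENGINE_SOURCES: &[&str] = &[
    "fn stream() { /* no markers in this file */ }",
    "const FAKE_WRAPPER: &str = \"[TOOL_CALL]\";",
];

/// True if any of the scanned sources contains `needle` as a substring.
fn any_engine_source_contains(needle: &str) -> bool {
    ENGINE_SOURCES.iter().any(|src| src.contains(needle))
}
```

A contract test can then assert on the helper instead of a single file, which keeps the assertion valid when code moves between submodules.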
Hunter Bown	bb88ab9129	fix(cli): make missing-companion-binary error actually helpful (#258) @whereiszebra (issue #258) downloaded just `deepseek-macos-arm64` from the GitHub Release, ran it, hit: error: deepseek-tui binary not found at /path/to/deepseek-tui. Build workspace default members to install it, or set DEEPSEEK_TUI_BIN to its absolute path. …spent 11 minutes figuring out they also needed `deepseek-tui-macos-arm64` sitting next to it, and self-closed with: "Release page does not document that both deepseek-macos-arm64 and deepseek-tui-macos-arm64 must be downloaded together." The dispatcher's error was the wrong message for the population that hits it most often: direct GitHub Release downloaders. "Build workspace default members" is meaningless if you didn't clone the repo. `DEEPSEEK_TUI_BIN` is also not what they need. New message lists the three concrete install paths that actually work for a fresh user (npm, cargo, or grab BOTH binaries from the same release page) and keeps the env var override as a final fallback for power users. No logic change; just better text. Existing `locate_sibling_tui_binary_honours_env_override` test still passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:28:28 -05:00
Hunter Bown	a64bc9bbe5	refactor(engine): isolate streaming state helpers	2026-05-01 09:09:09 -05:00
Hunter Bown	8379230ef1	refactor(engine): split tool execution helpers	2026-05-01 08:07:22 -05:00
Hunter Bown	8dd5ed38d7	refactor(engine): extract context helpers	2026-05-01 07:09:30 -05:00
Hunter Bown	f0fad7aa2e	refactor(engine): modularize turn tool setup	2026-05-01 06:07:59 -05:00
Hunter Bown	0887a88465	refactor(engine): extract tool catalog helpers	2026-05-01 05:09:39 -05:00
Hunter Bown	d2c007833f	test(rlm): make bridge client seam mockable	2026-05-01 04:10:59 -05:00
Hunter Bown	2cf0c20c76	Merge main into feat/v0.8.3 Brings in PR #255 (npm wrapper Windows smoke). No file overlap with the 0.8.3 work on this branch.	2026-05-01 03:10:54 -05:00
Hunter Bown	84da3b7fc6	test(rlm): cover bridge batch and depth guard	2026-05-01 03:09:05 -05:00
Hunter Bown	df53a22113	test(utils): gate display_path tests to cfg(unix) The tests set `$HOME` to drive `dirs::home_dir()`. On Unix that's the contract dirs uses; on Windows dirs reads %USERPROFILE% first, so setting HOME has no effect and the tests fail. The `display_path` function itself is platform-identical: it delegates to `dirs::home_dir()` for the home prefix and uses `std::path::MAIN_SEPARATOR` for the separator after the tilde. The contraction logic is exercised on macOS/Linux which is sufficient coverage for an abstraction whose platform detail is delegated. If we want Windows-specific assertion coverage in the future, it should either set USERPROFILE alongside HOME or accept an injected home dir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:59:57 -05:00
Hunter Bown	100efced80	chore: cargo fmt + drop needless borrow flagged by clippy CI on PR #256 flagged two minor lint hits in the privacy lane: - skills/mod.rs: rustfmt wanted the new regression test's `let rendered = …` line collapsed to one chain. - main.rs:1614: `selected_skills_dir` is already `&PathBuf`, so passing `&selected_skills_dir` is a `&&PathBuf` and clippy's `needless_borrow` triggers under `-D warnings`. No behavior change; same coverage and outputs. Re-runs locally: cargo fmt --all -- --check → clean cargo clippy --workspace --all-targets --all-features --locked -- -D warnings → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:53:29 -05:00
Hunter Bown	cad725f411	Merge pull request #255 from Hmbown/codex/npm-wrapper-windows-smoke ci: smoke npm wrapper on Windows	2026-05-01 02:51:00 -05:00
Hunter Bown	1512afae69	feat(privacy): contract `$HOME` to ~ in user-visible display paths Anywhere the TUI, doctor stdout, setup stdout, or onboarding shows a file path, it used to print the absolute form (e.g. /Users/<name>/...). On macOS/Linux the home-directory segment reveals the OS account name, which is often the same as a public handle. That is undesirable for users who share screenshots, screencasts, or paste doctor output into a public help request. Adds `crate::utils::display_path` that contracts a leading $HOME to `~` and falls through unchanged otherwise. Used at every viewer-visible site: doctor: workspace, config.toml, MCP config, all skills dirs, selected skills dir, tools dir, plugins dir setup: workspace, skills/tools/plugins paths and status output TUI: context inspector header, trust-directory onboarding, shell-job cwd (sidebar + detail pager), subagent task header Persisted state, audit log, session checkpoints, and LLM-bound system prompts intentionally keep the absolute path: those need full fidelity to resolve correctly across processes and the LLM provider sees absolute paths anyway by virtue of the workspace summary. `display_path` has 4 tests covering: home contraction, bare-`~` for home itself, untouched-when-unrelated, and a username-prefix regression guard (so `/Users/alice2/...` doesn't get rewritten when $HOME is `/Users/alice`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:46:20 -05:00
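A sketch of the home-contraction logic described in the commit above, written with an injected home directory so it does not depend on environment variables. The function name `display_path_with_home` and its signature are hypothetical; the real `display_path` is said to delegate to `dirs::home_dir()`.

```rust
use std::path::{Path, MAIN_SEPARATOR};

/// Contract a leading `home` prefix to `~`; return the path unchanged otherwise.
/// `Path::strip_prefix` compares whole components, so `/Users/alice2/x` is not
/// treated as living under `/Users/alice`.
pub fn display_path_with_home(path: &Path, home: &Path) -> String {
    match path.strip_prefix(home) {
        // The path is the home directory itself.
        Ok(rest) if rest.as_os_str().is_empty() => "~".to_string(),
        // The path is inside the home directory.
        Ok(rest) => format!("~{}{}", MAIN_SEPARATOR, rest.display()),
        // Unrelated path: show it as-is.
        Err(_) => path.display().to_string(),
    }
}
```

Taking the home directory as a parameter also lets the same assertions run on Windows, where setting `HOME` does not influence `dirs::home_dir()`.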
Hunter Bown	f00bae3bfb	ci: smoke npm wrapper on windows	2026-05-01 02:37:37 -05:00
Hunter Bown	d8acd6e3cb	fix(skills): use real on-disk path in model-visible block, not frontmatter name `render_available_skills_context` rendered each skill's file path as `<skills_dir>/<frontmatter-name>/SKILL.md`. The directory name and the frontmatter `name` can differ — community installs and manually-placed skills routinely have this drift — and when they do, the model is told the file lives at a path that does not exist, so it can't open the SKILL.md it needs to actually use the skill. `Skill` now carries a `path: PathBuf` populated by `discover()` from the real directory entry. Renderer uses it directly. Adds a regression test that creates a skill at `weird-dir-name/SKILL.md` with `name: friendly-name` and asserts the rendered prompt contains the real path and not the fabricated one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:36:56 -05:00
Hunter Bown	eac8d82ab8	test(palette): cover disabled MCP server case (#197) Acceptance list called for tests of no-config / healthy / disabled / failed servers. Healthy and failed already had a single dense test (`command_palette_includes_mcp_discovery_and_failed_servers`); no-config is implicit in the existing call sites that pass `None` for the snapshot. Disabled was the actual gap. Adds one focused case asserting the `[disabled]` state tag appears in the rendered description so users can see disabled servers in the palette without opening the MCP manager. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:36:56 -05:00
Hunter Bown	1089bd3447	docs: add crates.io badge alongside CI + npm Surfaces the deepseek-tui-cli crates.io version next to the existing CI and npm badges in both English and Simplified Chinese READMEs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 02:05:16 -05:00
Hunter Bown	c424f3ec08	test(persistence): cover schema-version rejection in session + runtime_threads (#233) The architecture promises that session_manager, runtime_threads, and task_manager reject persisted state from a newer schema_version on load, so a downgraded binary fails loud instead of silently truncating or corrupting data. Existing tests covered: - session_manager::test_checkpoint_rejects_newer_schema - task_manager (newer task schema rejection) - runtime_threads::store_load_thread_rejects_newer_schema_version Adds the missing coverage for the other persistence paths: - session_manager::test_load_session_rejects_newer_schema - session_manager::test_load_offline_queue_rejects_newer_schema - runtime_threads::store_load_turn_rejects_newer_schema_version - runtime_threads::store_load_item_rejects_newer_schema_version Each writes a JSON file with schema_version = CURRENT + 1 (or 999), loads through the public API, and asserts the error message contains "newer than supported". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:51:35 -05:00
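The rejection rule those tests pin can be sketched as below. `CURRENT_SCHEMA_VERSION`, its value, and `check_schema_version` are hypothetical names; only the "newer than supported" message fragment comes from the commit message.

```rust
/// Hypothetical current schema version; the real constant and its value are not shown in the log.
pub const CURRENT_SCHEMA_VERSION: u32 = 1;

/// Reject persisted state written by a newer binary so that a downgraded
/// binary fails loudly instead of silently dropping fields it cannot read.
pub fn check_schema_version(found: u32) -> Result<(), String> {
    if found > CURRENT_SCHEMA_VERSION {
        return Err(format!(
            "schema_version {} is newer than supported ({})",
            found, CURRENT_SCHEMA_VERSION
        ));
    }
    Ok(())
}
```

Older and equal versions pass in this sketch; whether the real loaders also migrate older versions is not stated in the log.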
Hunter Bown	9a39e69c4d	test(skills): make path assertion portable across separators The render-skills test asserted `rendered.contains("test-skill/SKILL.md")` which only matched on Unix; Windows uses backslashes via Path::display(), so the assertion failed only in CI on windows-latest. Build the expected substring through PathBuf::display() so the assertion matches the platform-correct separator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:49:04 -05:00
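The portability fix in the commit above amounts to building the expected substring from path components instead of hard-coding `/`. A minimal sketch, with `expected_skill_suffix` as a hypothetical helper name:

```rust
use std::path::Path;

/// Build the expected `<dir>/SKILL.md` suffix using the platform's own separator,
/// so the same assertion holds on Unix (`/`) and Windows (`\`).
pub fn expected_skill_suffix(dir_name: &str) -> String {
    Path::new(dir_name).join("SKILL.md").display().to_string()
}
```

A test can then check `rendered.contains(&expected_skill_suffix("test-skill"))` without caring which separator `Path::display()` produced.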
Hunter Bown	bf6d82e4ba	chore: release v0.8.2 — Windows build fix, npm offline, model-visible skills, zh-CN README Bumps workspace, all internal path-deps, and npm wrapper (version + deepseekBinaryVersion) from 0.8.1 → 0.8.2. Lockfile re-locked offline to keep the registry index untouched. Triggers auto-tag.yml on push, which creates v0.8.2 and fires release.yml to build cross-platform binaries and draft the GitHub Release. npm publish remains manual per CLAUDE.md release runbook. Note: npm registry already has 0.8.2 published (with binaryVersion 0.8.1 from an earlier checkpoint). That release keeps working unchanged because v0.8.1 binaries stay on GitHub. Repo state aligns to 0.8.2 so the version-drift gate passes; next npm publish (which will need to be 0.8.3 since 0.8.2 is taken) will pick up binaryVersion=0.8.2 and pull the new binaries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:41:47 -05:00


426 Commits