codewhale

dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	6fb8739feb	feat(composer): prompt stash — Ctrl+S parks, /stash list+pop (#440 ) A stash is a side-channel from history: it holds drafts the user parked deliberately instead of submissions made in the past (which live in `composer_history.rs`). * `crates/tui/src/composer_stash.rs` — JSONL-backed store at `~/.deepseek/composer_stash.jsonl`. One JSON object per line with `ts` (RFC 3339) and `text`. Self-healing parser drops malformed lines instead of poisoning the file. Multi-line drafts round-trip intact via JSON's newline escaping. Capped at 200 entries; oldest pruned at push time. Empty / whitespace-only text is silently dropped. * `crates/tui/src/commands/stash.rs` — `/stash list` renders the stash with one-line previews and timestamps; `/stash pop` restores the most recently parked draft into the composer (LIFO) and rewrites the file. `/park` aliases `/stash`. * Composer Ctrl+S handler in `tui/ui.rs` — pushes the current draft onto the stash, clears the composer, and surfaces a toast confirming the action so the no-op-feel doesn't fool users into thinking nothing happened. Empty composers are a no-op so a stray Ctrl+S can't pollute the file. * New `KbStashDraft` keybinding entry registered in the help overlay; localized in en, ja, zh-Hans, pt-BR. Tests: 7 unit tests in `composer_stash.rs` cover round-trip, LIFO pop, empty-on-pop, drop-empty-text, multi-line preservation, malformed-line resilience, and cap pruning. 4 unit tests in `commands/stash.rs` cover the preview helper's truncation, multi-line first-line behavior, and empty-input handling.	2026-05-03 06:09:35 -05:00
Hunter Bown	99223b148c	docs(prompt): list load_skill in the model's toolbox reference (#434) The new `load_skill` tool was registered into the agent and plan mode tool sets in 0c1699 but the prompt's `## Toolbox` quick-reference still listed only the legacy progressive-disclosure pattern (system prompt → read_file). The model has to read the tool description to know `load_skill` exists, but without a hint in the toolbox it's easy to miss when scanning. Adds a `Skills` line that points at `load_skill` and explains when to prefer it over `read_file` + `list_dir`. Pulls from the existing `## Skills` section above for context, so the model sees one short cross-reference instead of duplicate setup instructions. No code change; prompt-only doc edit. Existing prompt-stability tests pass unchanged because they don't compare prose.	2026-05-03 06:01:15 -05:00
Hunter Bown	0fa042dc99	feat(audit): emit tool.spillover events when output is spilled (#500 polish) The existing `tool.result` audit event records that a tool finished but says nothing about spillover — operators tailing `~/.deepseek/audit.log` couldn't see when 200 KiB of stdout landed under `~/.deepseek/tool_outputs/`. Adds a discrete `tool.spillover` event keyed off `apply_spillover`'s return value, fired in both the sequential and parallel tool paths so the log entry exists regardless of how the tool was scheduled. Each event carries: {"event": "tool.spillover", "tool_id": "...", "tool_name": "exec_shell", "path": "/.../call-abc.txt"} This is a pure observability addition. The model still receives the same truncated head + footer; the UI still renders the inline `full output: <path>` annotation; the spillover writer contract is unchanged. No new tests — `apply_spillover` already has unit-level coverage and the engine paths are exercised by integration runs.	2026-05-03 05:58:02 -05:00
Hunter Bown	6b0a60883a	test(skill): integration tests for the load_skill execute path (#434/#432) The five existing tests cover the helpers (`format_skill_body`, `collect_companion_files`) directly. Adds two integration tests that drive the full `LoadSkillTool::execute` async path: * `execute_finds_skills_in_opencode_dir_via_workspace_discovery` — installs a skill under `<workspace>/.opencode/skills/` and verifies the tool finds it via `discover_in_workspace`, returns the body, and stamps `metadata.skill_path` pointing at the .opencode dir. Pins #432's multi-dir wiring through the actual tool entry point, not just the unit-level helper. * `execute_returns_helpful_error_for_unknown_skill` — verifies the "skill not found" error includes both the missing name and the available skill list so the model can recover without a separate discovery call. Both use `#[tokio::test]` because `ToolSpec::execute` is async. ToolContext is constructed via the existing `ToolContext::new` helper so the test stays hermetic across hosts.	2026-05-03 05:56:29 -05:00
Hunter Bown	d7017b7829	feat(skills): walk workspace .opencode + .claude skill dirs (#432 ) The skills catalogue and `load_skill` tool now scan every candidate directory in the workspace plus the global default, not just the first one that exists: <workspace>/.agents/skills (deepseek-native convention) <workspace>/skills (flat, project-local) <workspace>/.opencode/skills (OpenCode interop) <workspace>/.claude/skills (Claude Code interop) ~/.deepseek/skills (global, user-installed) Skills installed for any AI-tool convention land in the same catalogue without the user having to symlink or duplicate files. Name conflicts resolve first-match-wins per the precedence list above, so workspace-local skills shadow user/global ones — that's the right shadowing for "this repo overrides my defaults". Implementation: * `skills::skills_directories(workspace)` returns the existing candidate dirs in precedence order (host-dependent for the global default). * `skills::discover_in_workspace(workspace)` walks each, merges the discovered skills, and accumulates warnings. * `render_available_skills_context_for_workspace(workspace)` wraps `discover_in_workspace` for `prompts.rs`. The legacy single-dir `render_available_skills_context(skills_dir)` is retained as a fallback so callers that don't have a workspace view (e.g. mcp_server.rs) still work. * `LoadSkillTool` (#434) routes through `discover_in_workspace` so its lookup matches what the system-prompt catalogue advertises. The "skill not found" error message now lists the searched dirs to help the user debug missing installs. Tests: 4 new tests in `skills/mod.rs`: precedence-order resolution, first-wins merge across .agents and .claude, .opencode discovery, system-prompt rendering for cross-tool dirs. The existing 6 single-dir tests pass unchanged.	2026-05-03 05:52:28 -05:00
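The first-match-wins merge described in d7017b7829 can be sketched as follows. This is a minimal illustration, not the code in `skills/mod.rs`: `candidate_skill_dirs` and `merge_first_wins` are hypothetical names, the `home` parameter stands in for however the real code resolves `~`, and skills are reduced to bare names.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Candidate directories in the precedence order listed in the commit message.
/// (Hypothetical helper; the real function is `skills::skills_directories`.)
fn candidate_skill_dirs(workspace: &Path, home: &Path) -> Vec<PathBuf> {
    vec![
        workspace.join(".agents/skills"),
        workspace.join("skills"),
        workspace.join(".opencode/skills"),
        workspace.join(".claude/skills"),
        home.join(".deepseek/skills"),
    ]
}

/// Merge per-directory skill names so that the first directory to define a
/// name wins and later duplicates are ignored.
fn merge_first_wins(per_dir: Vec<(PathBuf, Vec<String>)>) -> Vec<(String, PathBuf)> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for (dir, names) in per_dir {
        for name in names {
            // `insert` returns false when the name was already claimed.
            if seen.insert(name.clone()) {
                merged.push((name, dir.clone()));
            }
        }
    }
    merged
}
```

Because the workspace directories come before the global one, a skill installed in the repository shadows a same-named skill under `~/.deepseek/skills`.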
Hunter Bown	8290b136e1	feat(tui): push DISAMBIGUATE_ESCAPE_CODES on startup (#442 ) Opt into the Kitty keyboard protocol's escape-code disambiguation so terminals that support it (Kitty, Ghostty, Alacritty 0.13+, WezTerm, recent Konsole / xterm) report unambiguous events for Option/Alt-modified keys, plain Esc, and multi-byte sequences. Push happens after `enable_raw_mode` and the alt-screen / mouse-capture / bracketed-paste setup so the order matches shutdown's reverse-order pop. Only the disambiguation tier is pushed — `REPORT_EVENT_TYPES` and the higher tiers emit release events that the existing key handlers would mis-route as duplicate presses. Pop on exit was already wired in main.rs (panic) and ui.rs (normal shutdown) per #443; the recent #443 follow-up extended that to the suspend paths so editor / shell-suspend children inherit a clean keyboard mode. The push + the four pops form a complete pair. Failure to push is logged at debug level and ignored — a quirky terminal can't block startup. On terminals without protocol support the escape sequence is silently discarded and behaviour is identical to today (iTerm2, Terminal.app, Windows 10 conhost). No new dependency; everything runs through crossterm's existing `PushKeyboardEnhancementFlags` command.	2026-05-03 05:45:52 -05:00
Hunter Bown	e8af3cd37d	feat(tools): load_skill model-callable tool (#434 ) Adds a `load_skill` tool that takes a skill id and returns the SKILL.md body plus the sibling companion-file list in one tool call. The existing progressive-disclosure pattern (system prompt lists skills → model `read_file <path>`) still works; this tool is the higher-level affordance for skills that ship with multiple resource files. Implementation: * `LoadSkillTool` lives in `crates/tui/src/tools/skill.rs`. Read- only, auto-approved, parallel-safe. * On call, resolves the active skills directory via the new `skills::resolve_skills_dir` helper, which mirrors `App::new`'s hierarchy: `<workspace>/.agents/skills` → `<workspace>/skills` → `~/.deepseek/skills`. No new plumbing through ToolContext — the workspace is already there. * Returns the skill body wrapped in a self-contained block: description quote, source path, the SKILL.md verbatim, and a `## Companion files` section listing siblings (sorted lex, deterministic for tests). Solo skills skip the companions section entirely so the tool result stays tight. * Errors with a helpful hint when the name is unknown — the hint includes the catalogue ("Available: foo, bar, baz") so the model can recover without an extra discovery call. * Wired into `ToolRegistryBuilder::with_skill_tools` and pulled into both Agent and Plan tool-setup paths. Plan mode benefits because skills are read-only references that planners often need. Tests: 5 unit tests covering: description-headed body, companion enumeration excluding SKILL.md and nested dirs, empty result for solo skills, and the conditional `## Companion files` section.	2026-05-03 05:43:18 -05:00
Hunter Bown	20913b2f17	test(config): pin instructions-array merge semantics (#454 follow-up) Adds four tests that pin the documented contract for the new `instructions = [...]` field added in 0c1699: * Project array replaces the user array wholesale (the typical "merge" pattern is for users who want both — they list ~/global.md inside the project array). * Explicit `instructions = []` clears the user list — a project signalling "this repo doesn't want any of those globals". * Absent project field leaves the user list intact (nothing in the project file → user wins by default). * Empty / whitespace-only entries are filtered out — the user shouldn't get a "could not read instructions file" warning for a stray `""` in the array. These were the semantics promised in the original #454 commit and the `config.example.toml` doc; pinning them with tests prevents regressions.	2026-05-03 05:33:09 -05:00
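The four pinned rules in 20913b2f17 fit in a few lines. The sketch below assumes both layers are already parsed into `Option<Vec<String>>`; `merge_instructions` is a hypothetical name, and the real `merge_project_config` handles many more fields.

```rust
/// Project array replaces the user array wholesale; an absent project field
/// leaves the user list intact; blank entries are dropped in either case.
fn merge_instructions(
    user: Option<Vec<String>>,
    project: Option<Vec<String>>,
) -> Vec<String> {
    let chosen = match project {
        // `Some(vec![])` is meaningful: it clears the user list.
        Some(project_list) => project_list,
        None => user.unwrap_or_default(),
    };
    chosen
        .into_iter()
        .filter(|entry| !entry.trim().is_empty())
        .collect()
}
```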
Hunter Bown	5deaf97253	fix(tui): pop keyboard flags on suspend paths too (#443 follow-up) `main.rs` (process panic) and the normal TUI shutdown both pop keyboard enhancement flags before handing the terminal back to the child shell. The two suspend paths — `pause_terminal` (Ctrl+Z and shell-suspend) and `external_editor::spawn_editor_for_input` (composer `$EDITOR` launch) — were missing the same defensive pop. Today this is dormant: the TUI doesn't push keyboard enhancement flags explicitly, so there's nothing to pop. The fix is defence-in-depth: the day a future code path enables the flags (kitty keyboard protocol for sub-second-precision modifier reporting, say), the suspend handlers won't leak the half-configured input mode to Vim / less / a shell child. Aligns the four terminal-handoff sites (shutdown, panic, suspend, editor) so they all do the same thing.	2026-05-03 05:29:11 -05:00
Hunter Bown	ac0c16996e	feat(config): instructions array merged into system prompt (#454 ) Adds a new optional `instructions = ["./AGENTS.md", "~/.deepseek/global.md"]` config field that's loaded at startup and concatenated into the system prompt, in declared order, above the skills block. * `Config::instructions: Option<Vec<String>>` — raw paths from `~/.deepseek/config.toml` or the per-project overlay. * `Config::instructions_paths()` — `expand_path` each entry, drop empties, return the resolved `Vec<PathBuf>`. * `merge_project_config` — project's array replaces the user-level array wholesale (including `instructions = []` to clear the user list for the current repo). The typical "merge" pattern is for users who want both — they list `~/global.md` inside the project array. * `EngineConfig::instructions: Vec<PathBuf>` — threaded from config through both engine entry points (`Engine::new` for Default and `refresh_system_prompt` for runtime swaps). * `prompts::render_instructions_block(paths)` — loads each file in order, caps each at 100 KiB with a `[…elided]` marker on overflow, skips missing files with a tracing warning. Returns `None` when nothing renders so the caller appends nothing. * `system_prompt_for_mode_with_context_and_skills` gains an `instructions: Option<&[PathBuf]>` parameter. Block lives between the project-context block and the skills block so it benefits from KV prefix caching and per-project overrides apply consistently turn-over-turn. Documentation: * `config.example.toml` documents the field, the wholesale- override semantics, and the size cap. Tests: * 5 new tests in `prompts.rs`: no-op for empty input, skip missing files, declared-order concatenation, skip empty files, truncate oversize files, plus an end-to-end test that the block appears in the assembled system prompt when configured.	2026-05-03 05:25:31 -05:00
Hunter Bown	5e83f073b1	feat(footer): cumulative session-elapsed indicator (#448 ) Adds `App::session_started_at: Instant` (set at construction) and a low-priority `worked Nh Mm` chip in the footer's right cluster that surfaces session age once it crosses 60s. * `footer_worked_chip(elapsed)` returns empty spans for the first minute of a session so a fresh launch doesn't render a noisy ticker. Above the threshold it reuses the multi-day `humanize_duration` helper (#447) so the band promotion stays consistent: `1m`, `3h 12m`, `2d 5h`, `1w 2d`. * The chip slots in last in `auxiliary_spans`, which means under narrow widths it's the first thing the priority-drop loop removes — the existing chips (coherence / agents / replay / cache / mcp) keep their slots. * `FooterProps` carries a captured `worked: Vec<Span<'static>>` built at props-build time (matches the existing `retry` capture pattern). Render stays pure, tests can pin a known state without relying on wall-clock. Tests: 3 new tests in `tui/widgets/footer.rs` — chip hidden under 60s, chip rendered with humanized labels at 60s / 3h 12m / 2d 5h bands. The existing `from_app_idle_state` test gains a `worked.is_empty()` assertion (the test app is freshly constructed, well under the 60s threshold).	2026-05-03 05:17:01 -05:00
Hunter Bown	6dfb10f321	feat(a11y): NO_ANIMATIONS env override + accessibility docs (#450 ) `fancy_animations: false` and `low_motion: true` already exist on the settings struct, but the flag was undocumented and the only ways to opt in were the `/settings` slash command or hand-editing `~/.config/deepseek/settings.toml` — there was no environment- level signal that platform a11y tooling could carry forward. * `NO_ANIMATIONS=1` env var now forces `low_motion = true` and `fancy_animations = false` at startup, regardless of what's on disk. Recognises `1`, `true`, `yes`, `on` (case-insensitive); any other value is treated as unset. * `Settings::apply_env_overrides()` is now called at the end of `Settings::load()`, so every consumer (App::new, /config, the doctor surface) sees the override applied uniformly. The override is a startup-time overlay — changing the env var mid-session has no effect. * New `docs/ACCESSIBILITY.md` documents the existing `low_motion`, `fancy_animations`, `calm_mode`, `show_thinking`, and `show_tool_details` toggles plus the `NO_ANIMATIONS` startup override. Includes guidance for screen-reader users and a link back to this issue for follow-up motion regressions. Tests: 3 new tests in `settings.rs` (force-low-motion-on, override- user-opt-in, truthy-spelling-recognition). All three serialise through a static Mutex so the cargo parallel runner doesn't observe interleaved env mutations.	2026-05-03 05:09:17 -05:00
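A rough sketch of the `NO_ANIMATIONS` override in 6dfb10f321 follows. It passes the raw value in as a parameter so the example needs no environment mutation; the real `Settings::apply_env_overrides()` presumably reads the variable itself. The two-field `Settings` struct, the `env_flag_is_truthy` helper, and the whitespace trimming are assumptions made for illustration.

```rust
/// Hypothetical helper: recognises `1`, `true`, `yes`, `on` case-insensitively.
/// Any other value, or no value, counts as unset.
fn env_flag_is_truthy(raw: Option<&str>) -> bool {
    match raw {
        // Trimming surrounding whitespace is an assumption, not from the commit.
        Some(value) => matches!(
            value.trim().to_ascii_lowercase().as_str(),
            "1" | "true" | "yes" | "on"
        ),
        None => false,
    }
}

/// Cut-down stand-in for the real settings struct.
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    low_motion: bool,
    fancy_animations: bool,
}

impl Settings {
    /// Startup-time overlay: a truthy flag forces the low-motion posture
    /// regardless of what was loaded from disk.
    fn apply_env_overrides(&mut self, no_animations: Option<&str>) {
        if env_flag_is_truthy(no_animations) {
            self.low_motion = true;
            self.fancy_animations = false;
        }
    }
}
```

In a real caller the argument would come from something like `std::env::var("NO_ANIMATIONS").ok().as_deref()`.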
Hunter Bown	3625b887fa	feat(ui): humanize_duration handles hours, days, and weeks (#447 ) Long-running sessions (multi-hour cycles, multi-day automations) were rendering with the seconds/minutes-only formatter, so a two-day session showed as `2880m 0s` and `/goal` status used Rust's Debug Duration form (`188415.234s`). `humanize_duration` now walks through w/d/h/m/s and caps the output at two units so it stays compact in headers and notifications: * `45s`, `1m 12s`, `59m 59s` (existing seconds/minutes path) * `1h`, `2h 2m`, `3h 12m` (was `192m 30s`) * `1d`, `1d 1h`, `2d 5h` (the multi-day case from the issue) * `1w`, `1w 1d`, `3w 2d` (long-running automation) The two-tier rule drops sub-minute precision once you're past the hour boundary; the goal is "is this a couple of hours or days," not stopwatch precision. `/goal` status now wires through this formatter so multi-day goal-elapsed times read as `2d 3h` instead of the previous `188415.234s` Debug form. The notification system was the existing caller and picks up the new format automatically. Tests: 4 test functions in `notifications.rs` covering the four formatting bands (s/m, h/m, d/h, w/d) plus the boundary cases on each unit.	2026-05-03 05:05:30 -05:00
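One way to read the two-unit rule in 3625b887fa is "largest non-zero unit, plus the unit directly below it when that is non-zero". The sketch below implements that reading; it is an interpretation of the commit message, not the code in `notifications.rs`, and the `0s` fallback for a zero duration is an assumption.

```rust
use std::time::Duration;

/// Format a duration with at most two adjacent units from w/d/h/m/s.
fn humanize_duration(d: Duration) -> String {
    const UNITS: [(u64, &str); 5] = [
        (7 * 24 * 3600, "w"),
        (24 * 3600, "d"),
        (3600, "h"),
        (60, "m"),
        (1, "s"),
    ];
    let total = d.as_secs();
    for (i, &(size, label)) in UNITS.iter().enumerate() {
        if total >= size {
            let major = total / size;
            let rest = total % size;
            // Only the unit directly below the major one is considered, so
            // sub-minute precision disappears once the hour band is reached.
            if let Some(&(next_size, next_label)) = UNITS.get(i + 1) {
                let minor = rest / next_size;
                if minor > 0 {
                    return format!("{major}{label} {minor}{next_label}");
                }
            }
            return format!("{major}{label}");
        }
    }
    // Assumed fallback for a zero-length duration.
    "0s".to_string()
}
```

Under this reading, 192 minutes 30 seconds renders as `3h 12m`, matching the band examples listed in the commit.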
Hunter Bown	0b99ad1f25	feat(engine): wire tool-output spillover into the engine and pager (#500 ) The spillover writer (#422) and inline cell annotation (#423) were already in place; this commit makes the pipeline actually fire and gives the user a way to see the elided tail. * `apply_spillover` lives in `tools/truncate.rs` and mutates a `ToolResult` in place: writes the full content to `~/.deepseek/tool_outputs/<id>.txt`, replaces the inline content with a 32 KiB head plus a footer pointing at the file, and stamps `metadata.spillover_path` so downstream renderers can find it. Skips error results so the model still sees the failure verbatim. Preserves prior metadata when present. * `core/engine/turn_loop.rs` calls `apply_spillover` immediately after `execute_tool_with_lock` returns, before the result fans out to the model context (`ContentBlock::ToolResult`) and the UI (`Event::ToolCallComplete`). Both the parallel and sequential tool paths get the same hook so the model and the UI always see the same truncated content. * `tui/ui.rs::open_details_pager_for_cell` now folds the full spillover-file body into the tool-details pager when the focused cell has a `spillover_path`. Truncated head stays at the top (so the user can see what the model received) followed by a `── Full output (spillover) ──` separator and the file body. Missing files render an inline notice instead of silently truncating. * The model's footer ("Use `read_file path=…` if you need the elided tail") teaches the agent how to recover the rest of the payload on its next turn, so spilled output is not lost — just not paid for in context tokens unless the agent decides it actually needs the tail. Tests: 4 new unit tests in `tools/truncate.rs` (no-op below threshold, no-op for errors, truncate + stamp above threshold, preserve prior metadata). 3 new tests in `tui/ui/tests.rs` for the pager helper (no-op without spillover_path, file-load happy path, graceful notice when the file is missing).	
2026-05-03 05:02:11 -05:00
Hunter Bown	637d0f088f	fix(agents): list Implementer/Verifier in agent_spawn + agent_assign schemas (#404) The SubAgentType enum gained `Implementer` and `Verifier` variants in #404, but the JSON-schema `description` strings on AgentSpawnTool::input_schema and DelegateToAgentTool::input_schema still listed the pre-#404 set (general/explore/plan/review/custom). The model only sees those descriptions, so the new roles were effectively hidden behind a docs lookup. Updates both descriptions to the post-#404 surface and references docs/SUBAGENTS.md for posture. Also adds the long-form aliases (builder/validator/tester) to the agent_assign hint so it matches the canonical alias map. Pure copy change — no behaviour delta.	2026-05-03 04:50:51 -05:00
Hunter Bown	de4085304d	feat(tui): inline spillover-path annotation in tool cells (#423) PR #422 (sister commit on this branch) shipped the storage half: `crates/tui/src/tools/truncate.rs` writes large tool outputs to `~/.deepseek/tool_outputs/<id>.txt` and the boot prune drops files older than 7 days. This commit ships the UI half — the inline annotation that surfaces the spilled path in the tool cell so the user (and the model) can find the elided tail. ### What's wired - New `spillover_path: Option<PathBuf>` field on `GenericToolCell`. Threaded through every construction site (production + test fixtures = 28 sites; bulk-updated via a Python regex that preserves indentation per site). - `tool_routing::push_orphan_tool_completion` now reads `ToolResult.metadata.spillover_path` and stamps it on the cell. When tools start writing the metadata field (#500's wiring step), the annotation lights up automatically. - `GenericToolCell::lines_with_mode` emits a one-line muted annotation in `RenderMode::Live` only: full output: /Users/you/.deepseek/tool_outputs/call-abc12.txt Transcript-mode replay omits the annotation because the full output is already inline. - `render_spillover_annotation` truncates the path to fit narrow widths (40-col sidebar friendly) using the existing `truncate_text` helper. ### Why no OSC 8 hyperlink yet The OSC 8 wrap-link helper lives on PR #515's branch (also stacked on `chore/v0.8.8-stabilization`); both PRs land independently to `main`. Once both are in, a follow-up commit can wrap the path in `osc8::wrap_link` so supporting terminals make it Cmd+click-openable. The plain-text path works in every terminal today, so there's no functional regression.
### Tests 4 new tests in `tui::history::tests`: - `render_spillover_annotation_shows_path` — full path appears in the live-mode render - `render_spillover_annotation_omitted_in_transcript_mode` — transcript replay leaves the annotation off - `render_spillover_annotation_omitted_when_no_path_set` — the common case (most tool results don't trigger spillover) is unaffected - `render_spillover_annotation_truncates_to_width` — narrow widths don't overflow the line ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1877 + supporting (was 1873) Closes #423. #500 (preview pane) now has both halves of its prerequisites in place — the bytes are on disk (#422) and the path is surfaced in the cell (#423). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:44:43 -05:00
Hunter Bown	cf616e03bd	feat(tools): spillover-file writer + 7-day boot prune (#422) #500 (tool-output spillover preview pane) explicitly depends on #422 (the storage writer) and #423 (the UI annotation). This ships the storage half so the other two unblock cleanly. ### What's wired New module `crates/tui/src/tools/truncate.rs`: - `spillover_root()` — resolves `~/.deepseek/tool_outputs/`. - `spillover_path(id)` — sanitises the tool call id (ASCII alphanumerics + `-`/`_`, drops `.` so `..` can't escape) and returns `<root>/<id>.txt`. - `write_spillover(id, content)` — atomic write via the existing `utils::write_atomic` helper. Creates parent directory on demand. - `prune_older_than(max_age)` — drops files older than `max_age` by mtime. Returns the count pruned. Per-file errors are logged and skipped, never propagated. - `maybe_spillover(id, content, threshold, head_bytes)` — convenience for the "too long? spill it." pattern. Walks back to the previous UTF-8 char boundary so the head slice is always valid `str`. Constants: - `SPILLOVER_DIR_NAME = "tool_outputs"` - `SPILLOVER_THRESHOLD_BYTES = 100 KiB` (mirrors `MAX_MEMORY_SIZE` for cross-feature consistency) - `SPILLOVER_MAX_AGE = 7 days` (mirrors workspace snapshot prune) Boot wiring in `run_interactive` calls `prune_older_than` unconditionally; non-fatal — any error is logged at WARN and the TUI starts anyway. ### Module-level `#[allow(dead_code)]` The boot-prune is the only live caller today. The storage helpers (`write_spillover`, `maybe_spillover`, `spillover_path`) are intentionally unused outside the module's own tests until #423 / #500 land — those follow-ups need the bytes to exist, and the contracts are pinned by tests so they can't drift before then. Module-level `#![allow(dead_code)]` documents the deferral with a comment pointing at the follow-up issues.
### Tests 8 unit tests in `tools::truncate::tests`: - `sanitise_id` keeps safe chars, drops dangerous ones (`..`, `/etc/passwd` traversal attempts). - `write_spillover` creates the directory and writes content. - `write_spillover` rejects empty / fully-invalid ids. - `maybe_spillover` returns `None` below threshold. - `maybe_spillover` writes + returns the head slice above threshold. - `maybe_spillover` walks back to a char boundary so the head string is never mid-codepoint (regression test using 4-byte whale emojis). - `prune_older_than` is a no-op when the root doesn't exist. - `prune_older_than` keeps fresh files and drops stale ones via a Unix `utimensat` backdating helper. Tests serialize through a static `Mutex` because they share process-global `$HOME`; the `with_test_home` helper documents the SAFETY contract for the env-var override. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1873 + supporting (was 1865) Closes #422 (storage half). #423 and #500 remain open with the bytes now reachable on disk for them to cite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:38:00 -05:00
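Two of the contracts pinned by the tests in cf616e03bd, id sanitisation and the walk back to a UTF-8 character boundary, can be sketched with the standard library alone. The signatures below are assumptions: the commit names `sanitise_id` but does not show its return type, and `head_at_char_boundary` is a hypothetical name for the slicing step inside `maybe_spillover`.

```rust
/// Keep only ASCII alphanumerics plus `-` and `_`. Dropping `.` and `/`
/// means `..` can never survive into the file name. Returns `None` when
/// nothing safe is left (assumed shape; the real code may signal this
/// differently).
fn sanitise_id(id: &str) -> Option<String> {
    let clean: String = id
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect();
    if clean.is_empty() {
        None
    } else {
        Some(clean)
    }
}

/// Return at most `head_bytes` bytes of `content`, shrinking the cut point
/// until it lands on a char boundary so the slice is always valid `str`.
fn head_at_char_boundary(content: &str, head_bytes: usize) -> &str {
    if head_bytes >= content.len() {
        return content;
    }
    let mut end = head_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}
```

Slicing a `str` at a non-boundary index panics in Rust, which is why the regression test with 4-byte emoji matters.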
Hunter Bown	89500e4ebe	feat(commands): /sessions prune <days> slash command (#406 phase-1.5) The previous commit shipped `SessionManager::prune_sessions_older_than` as a bare helper marked `#[allow(dead_code)]` pending phase-2 wiring. This commit wires it into a user-callable slash command so users can clean up stale sessions today, and removes the dead-code allow. ### Surface ``` /sessions → open the picker (existing) /sessions show | list | picker → alias for the picker /sessions prune <days> → drop sessions older than N days ``` `/sessions prune 30` returns "pruned N sessions older than 30d" or "no sessions older than 30d to prune". Errors: - missing arg → usage hint - non-positive / non-integer arg → typed error - unknown subcommand → typed error with usage The prune handler builds a fresh `SessionManager` from `default_location` so it reads the same `~/.deepseek/sessions/` directory the persistence layer writes; doesn't take a lock since it's a one-shot CLI-style operation that runs to completion. ### What changed - `commands::session::sessions` now takes `arg: Option<&str>` and dispatches `show` / `prune` / unknown. - New `prune` private fn parses the days argument, opens `SessionManager::default_location`, calls `prune_sessions_older_than` with the corresponding `Duration`. - `commands::COMMANDS` table updated: usage now reads `/sessions [show|prune <days>]`. - `commands::mod` dispatch arm passes `arg` through. - `SessionManager::prune_sessions_older_than` doc comment updated to reflect the live wiring; `#[allow(dead_code)]` removed.
### Tests 5 new tests in `commands::session::tests`: - existing `test_sessions_pushes_picker_view` updated to the new signature - `test_sessions_show_subcommand_pushes_picker_view` — `/sessions show` is an explicit alias for the picker - `test_sessions_prune_requires_days_argument` — missing arg produces usage hint - `test_sessions_prune_rejects_non_positive_days` — `0`, negative, non-numeric, and decimal inputs are all rejected - `test_sessions_unknown_subcommand_errors` — typo path errors with subcommand list ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1865 + supporting Refines #406 — phase 1.5 (user-callable surface) shipped on top of phase 1 (helper). Phase 2 (boot-prune + retention policy) stays open for v0.8.9 once the policy is decided. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:31:00 -05:00
Hunter Bown	220f1b30c5	feat(sessions): SessionManager::prune_sessions_older_than helper (#406 phase-1) #406 asks for an auto-archive system for old session state. The full design needs prior-art survey + retention-policy decisions that are explicitly out of scope for v0.8.8. This commit ships the building block — a tested public method that removes session files older than a given Duration — so phase 2 can wire it into a config-knob boot prune without re-litigating the implementation. ```rust pub fn prune_sessions_older_than( &self, max_age: std::time::Duration, ) -> std::io::Result<usize> { ... } ``` Behaviour: - Compares against the metadata's `updated_at` (not filesystem mtime — the user may have rsynced `~/.deepseek`; fs mtimes can lie about real session age). - Returns the count pruned; failures on individual files are logged at WARN and skipped, not propagated, so one bad record doesn't block the rest. - Skips the checkpoint subdirectory entirely. Top-level `<session_id>.json` files are the only candidates; `checkpoints/latest.json` and friends are owned by the checkpoint subsystem and live with stricter durability rules. - Marked `#[allow(dead_code)]` with a comment pointing at #406 phase 2 — the helper exists today, the production wiring lands next. ### Tests 5 new tests in `session_manager::tests`: - empty directory returns zero - all-fresh records survive - all-stale records get removed - mixed directory removes only the stale ones - checkpoint subdirectory is left alone (file untouched, count is still 1 for the top-level stale record) ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1861 + supporting Refines #406 — phase 1 (helper + tests) shipped. Issue stays open for the v0.8.9 phase-2 work that decides the retention policy and boot-prune wiring.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:26:35 -05:00
Hunter Bown	1b7939e680	feat(doctor): surface memory feature state in --json output (#489) Operators ask "is memory on?" and "where does it live?" without wanting to boot the TUI. Adds a `memory` block to the JSON doctor report: ```json "memory": { "enabled": false, // honours DEEPSEEK_MEMORY env "path": "/Users/you/.deepseek/memory.md", // expanded path "file_present": false // does the file exist on disk? } ``` The `enabled` field reads `DEEPSEEK_MEMORY` directly so it stays correct on this stabilization branch even though the dedicated `Config::memory_enabled()` accessor lives on the memory-MVP branch (#518). When both PRs land, the duplicated env-parse can collapse to a single method call — TODO comment marks the spot. Verified: - `deepseek doctor --json` shows `enabled: false` by default - `DEEPSEEK_MEMORY=on deepseek doctor --json` shows `enabled: true` - All gates green (1856 main + supporting) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:16:01 -05:00
Hunter Bown	2f9b58b910	fix(agents): include Implementer/Verifier aliases in error message hint (#404) The "Invalid sub-agent type" error message lists the accepted type strings so the model can self-correct. The list still showed the original 5 names plus their aliases — adding the new types' canonical names and aliases keeps the error helpful when the model misspells. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:06:43 -05:00
Hunter Bown	1ae042d56b	feat(agents): add Implementer and Verifier sub-agent roles (#404) The existing taxonomy (General / Explore / Plan / Review / Custom) covered "do something" / "go look" / "think first" / "grade work" / "explicit allowlist" but had no distinct posture for two of the most common patterns: - Implementer: "land this change with the minimum surrounding edit". Distinct from General's flexible posture and Plan's no-execution stance. - Verifier: "run the test suite and report pass/fail with evidence". Distinct from Review's read-and-grade stance. Per the issue body's guidance ("avoid a large undifferentiated role list") this PR adds only those two. Researcher and ReleaseManager stay open as v0.8.9 candidates if user demand surfaces. ### What's wired - Two new `SubAgentType` variants: `Implementer`, `Verifier`. - New prompt constants `IMPLEMENTER_AGENT_PROMPT` and `VERIFIER_AGENT_PROMPT` with role-specific posture (write-the-minimum-edit / run-the-tests-don't-fix-them). - `from_str` accepts the obvious aliases: `implementer` / `implement` / `implementation` / `builder` for Implementer; `verifier` / `verify` / `verification` / `validator` / `tester` for Verifier. Case-insensitive like the existing aliases. - `as_str` round-trips: every variant's label parses back to the same variant via `from_str`. Test pins this so a future role addition can't accidentally drop the round-trip. - Deprecated `allowed_tools()` advisory list updated: Implementer carries write/edit/patch + shell + checklist tools; Verifier carries read + shell + run_tests + diagnostics but no write tools. - `crates/tui/src/tui/views/mod.rs` agent-type sort order extended to include the new variants; `format_agent_type` now delegates to `as_str` so future additions land in one place. 
### Tests - 6 new tests in `tools::subagent::tests`: - alias coverage for Implementer (4 aliases) and Verifier (5) - round-trip via `as_str` for all variants; guards against forgetting to register a new variant in either direction - distinct-prompts guard: catches the copy-paste bug where two new variants would inherit the same prompt as General - Implementer's advisory list contains write tools - Verifier's advisory list contains test-runner tools but NO writes ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1856 + supporting Closes #404 (minimal-taxonomy interpretation per the issue body). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:05:26 -05:00
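The alias parsing and `as_str` round-trip described in 1ae042d56b can be sketched roughly as follows. This is a minimal illustration, not the project's code: the enum is cut down to three variants, the `"general"` label and the `ALL` constant are assumptions, and using the `FromStr` trait (rather than an inherent `from_str`) is a guess.

```rust
use std::str::FromStr;

/// Hypothetical subset of the role enum; the real `SubAgentType` has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubAgentType {
    General,
    Implementer,
    Verifier,
}

impl SubAgentType {
    /// Assumed helper: every variant, so a round-trip test can iterate them.
    pub const ALL: [SubAgentType; 3] = [
        SubAgentType::General,
        SubAgentType::Implementer,
        SubAgentType::Verifier,
    ];

    /// Canonical label; each label must parse back to the same variant.
    pub fn as_str(self) -> &'static str {
        match self {
            SubAgentType::General => "general", // label assumed for illustration
            SubAgentType::Implementer => "implementer",
            SubAgentType::Verifier => "verifier",
        }
    }
}

impl FromStr for SubAgentType {
    type Err = String;

    /// Case-insensitive parse that accepts the aliases listed in the commit message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "general" => Ok(SubAgentType::General),
            "implementer" | "implement" | "implementation" | "builder" => {
                Ok(SubAgentType::Implementer)
            }
            "verifier" | "verify" | "verification" | "validator" | "tester" => {
                Ok(SubAgentType::Verifier)
            }
            other => Err(format!("Invalid sub-agent type: {other}")),
        }
    }
}
```

Iterating `ALL` and asserting `v.as_str().parse() == Ok(v)` is one way to pin the round-trip so a new variant cannot be registered in only one direction.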
Hunter Bown	68ec91999b	feat(tui): clarify Plan panel role + drop empty-state placeholder (#408) The Plan panel used to render a blunt "No active plan" line whenever the model hadn't called `update_plan` yet, even when the panel had a goal or a cycle counter to show above it. That made the panel look broken on every fresh session. Per the audit posted on the issue (option 1 of three), this PR keeps the Plan panel as the single source of truth for the `update_plan` tool's output and drops the placeholder when the panel is fully quiet, replacing it with a one-line hint that explains what the panel tracks. When a goal or cycle counter is already showing above, the empty-steps body collapses entirely so the section doesn't look ambiguous next to populated content. The panel's role is documented in a doc comment on `render_sidebar_plan` so the next person doesn't have to re-derive it from the issue tracker. ### What's wired - `render_sidebar_plan` checks "anything above" (goal + cycle_count) before deciding whether to emit the empty-state hint. If either is showing, the empty steps body adds nothing. - New `plan_panel_empty_hint(width)` helper composes the hint string with proper width-aware truncation. - New module-level doc comment explains the Plan panel's role (update_plan output + /goal + cycle counter) and contrasts it with Todos. ### Tests - 3 new tests in `tui::sidebar::tests`: - hint mentions `update_plan` and `/goal` so the user understands what populates the panel - hint truncates correctly to a 16-column sidebar without overflowing - regression guard: the hint never re-introduces "no active plan" wording ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1850 + supporting Closes #408 (option 1 of the audit). 
Options 2 (merge with todos) and 3 (move to top-row chip) remain open as v0.9.0 design candidates once #504's right-panel work is on the table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:00:47 -05:00
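The empty-state decision in 68ec91999b might look something like the sketch below. The hint wording, the `empty_steps_body` helper, and the "counter shows when non-zero" rule are all assumptions made for illustration; the sketch also counts `char`s rather than terminal display columns, which a real TUI would likely handle with a width-aware crate.

```rust
/// Hypothetical hint text; the commit only says it mentions `update_plan` and `/goal`.
const HINT: &str = "Tracks update_plan steps and /goal";

/// Truncate the hint to at most `width` characters, ending in an ellipsis when cut.
pub fn plan_panel_empty_hint(width: usize) -> String {
    if HINT.chars().count() <= width {
        return HINT.to_string();
    }
    if width == 0 {
        return String::new();
    }
    // Reserve one column for the ellipsis.
    let mut out: String = HINT.chars().take(width - 1).collect();
    out.push('…');
    out
}

/// Assumed helper: emit the hint only when nothing else is showing above the steps.
pub fn empty_steps_body(has_goal: bool, cycle_count: u32, width: usize) -> Option<String> {
    if has_goal || cycle_count > 0 {
        None // a goal or cycle counter is visible, so the empty body adds nothing
    } else {
        Some(plan_panel_empty_hint(width))
    }
}
```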
Hunter Bown	256f59dd33	feat(agents): session-boundary classification for sub-agents (#405) `agent_list` previously surfaced every persisted sub-agent the manager had on disk, including agents from prior sessions still hanging around in `subagents.v1.json`. In long-lived sessions this piled up and the model had to reason past 13 listed agents when only four were live. Now: each `SubAgentManager` assigns a fresh `session_boot_id` at construction. Every agent it spawns is stamped with that id, persisted alongside the existing fields, and reloaded as-is by future managers. At list time the manager classifies any agent whose stamp doesn't match the current id (or that loaded with no stamp at all from pre-#405 records) as `from_prior_session`. `agent_list` defaults to the current-session view: prior-session agents are dropped from the listing unless they're still `Running` (which can happen after a process restart; the manager flagged them as `Interrupted` on load). Pass `include_archived=true` to surface every record, with the `from_prior_session` flag on each result so the model can tell live vs archived apart at a glance. ### What's wired - `SubAgentManager::current_session_boot_id`: UUID-derived, generated in `new`. - `SubAgent::session_boot_id` and `PersistedSubAgent::session_boot_id`: the latter `#[serde(default)]` for backward compat (pre-#405 records load with empty string and classify as prior-session). - `SubAgentResult::from_prior_session`: `#[serde(default, skip_serializing_if = "is_false")]` so today's clients reading archived snapshots see the field, while default-false snapshots serialize without an extra noisy key. - `SubAgentManager::list_filtered(include_archived)`: the new user-facing API. `SubAgent::snapshot()` still defaults the flag to `false`; `snapshot_for_listing` (manager-only) fills it in. - `AgentListTool` accepts `include_archived: bool` (default false) and routes through the filter. 
The model-facing description explains the behaviour. ### Tests - `session_boot_ids_are_unique_per_manager`: each manager mints its own id. - `list_filtered_drops_prior_session_terminals_by_default`: the three-agent matrix (current running / prior completed / prior running) collapses to the right two with the right flags. - `list_filtered_with_include_archived_returns_everything`: archived view returns all records with correct flags. - `agents_with_empty_boot_id_classify_as_prior_session`: pre-#405 records load and behave as expected. - `persist_round_trip_preserves_session_boot_id`: write with one manager, reload with a fresh manager, confirm the agent flips to prior-session in the new manager's view. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1847 + supporting Closes #405 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:57:06 -05:00
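The filtering rule from 256f59dd33 can be illustrated with a small stand-alone model. The `Agent`, `Listed`, and `Manager` types below are hypothetical stand-ins for `SubAgent`, `SubAgentResult`, and `SubAgentManager`; only the classification logic follows the commit message, and it assumes the current boot id is never empty.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Running,
    Completed,
}

/// Hypothetical stand-in for a persisted sub-agent record.
#[derive(Debug, Clone)]
pub struct Agent {
    pub id: String,
    /// Empty for records written before the stamp existed.
    pub session_boot_id: String,
    pub status: Status,
}

/// Hypothetical stand-in for one row of the listing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Listed {
    pub id: String,
    pub from_prior_session: bool,
}

pub struct Manager {
    pub current_session_boot_id: String,
    pub agents: Vec<Agent>,
}

impl Manager {
    /// Default view keeps current-session agents plus prior-session agents
    /// that are still running; `include_archived` keeps everything.
    pub fn list_filtered(&self, include_archived: bool) -> Vec<Listed> {
        self.agents
            .iter()
            .filter_map(|a| {
                // A missing (empty) stamp can never match a freshly minted id.
                let prior = a.session_boot_id != self.current_session_boot_id;
                let keep = include_archived || !prior || a.status == Status::Running;
                keep.then(|| Listed {
                    id: a.id.clone(),
                    from_prior_session: prior,
                })
            })
            .collect()
    }
}
```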
Hunter Bown	b54a708cf7	feat(tui): compact agent_spawn rendering - single line, DelegateCard owns the rest (#409) The transcript previously rendered each `agent_spawn` call as a 3-4 line generic tool block (header + name kv + args summary + output JSON) AND its companion `DelegateCard` (header + live action lines + summary). Four parallel spawns produced ≥16 lines of nearly-identical scaffolding before the model said anything useful. In live mode `agent_spawn` now renders as a single header line, `◐ delegate · agent-abc12 [running]`, with the agent id pulled from the tool's JSON output. The DelegateCard remains the source of truth for live action progress and the final summary; the generic block is no longer fighting it for attention. Transcript-mode replay (used by `/pager`, session export, and the detail pager opened with Alt+V) keeps the full multi-line block so debug history is preserved. ### What's wired - `GenericToolCell::lines_with_mode` early-returns `render_agent_spawn_compact` when `name == "agent_spawn"` and `mode == RenderMode::Live`. - New `render_agent_spawn_compact` builds one header line with the family glyph (Delegate), the spawned agent id (or `…` placeholder while the spawn is in flight), and the tool status. - New `extract_agent_id(output)` parser: deterministic, allocation-free string scan for `"agent_id"` → quoted value. Avoids dragging serde into a render hot path. ### Tests - 4 `extract_agent_id` tests: typical JSON, extra whitespace, missing key (None), empty string id (None). - 4 render tests: live one-liner contains agent id + status with no args/name kv leaking, pending render uses `…` placeholder, transcript mode keeps the full block, non-spawn tools (read_file) still render their normal multi-line block. 
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1842 + supporting Closes #409 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:44:19 -05:00
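An allocation-free scan like the `extract_agent_id(output)` parser described in b54a708cf7 could be written along these lines. This is a sketch under assumptions: the return type is guessed as `Option<&str>`, and escaped quotes inside the id are not handled, on the premise that agent ids are plain tokens.

```rust
/// Find the first `"agent_id"` key and return its quoted value as a borrowed slice.
/// Returns `None` when the key is missing, malformed, or the id is empty.
pub fn extract_agent_id(output: &str) -> Option<&str> {
    const KEY: &str = "\"agent_id\"";
    // Skip past the key itself.
    let after_key = &output[output.find(KEY)? + KEY.len()..];
    // Tolerate whitespace on either side of the colon.
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    // The value must be a quoted string; take everything up to the closing quote.
    let value = after_colon.strip_prefix('"')?;
    let id = &value[..value.find('"')?];
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}
```

Because every step only slices the input, nothing is allocated, which suits a render path that runs on each frame.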
Hunter Bown	c52f2c46f1	feat(tui): concise todo / checklist update rendering (#403) When the model fires `todo_update` / `checklist_update` repeatedly during a long run, the live transcript previously dumped the full checklist card (header + every item + progress) on every call. In sessions with 20+ items and a dozen status flips the same item list appears over and over, drowning the actual work. Now: when a checklist update output starts with the "Updated todo #N to STATUS" prefix the tool already emits, the live renderer shows a compact one-line state-change card, `Todo #N: <title> → STATUS`, plus a `M/N · pct%` summary in the header and a `N items (Alt+V for full list)` affordance underneath. The full item list is still reachable via the existing detail pager. Falls back to the full-card render path for: - `todo_write` / `checklist_write` (no "Updated" prefix; first emission of the list) - transcript-mode replays (the user wants the full snapshot when scrolling history) - malformed prefixes (parse failure → fall through, never crash) ### What's wired - New `parse_update_prefix(output)` parser handles both `Updated todo #N to STATUS` and `Updated checklist #N to STATUS` forms. - New `render_checklist_change_card` builds the compact card. Looks up the title from the snapshot's items array (id is 1-indexed), falls back to `(missing title)` if the id is out of range. - `try_render_as_checklist` calls the change-card path only in `RenderMode::Live` and only when the parser matches. Pre-existing cases (writes, transcript replay) keep the full-card behavior. ### Tests - 4 parser tests: todo form, checklist form, write outputs falling through, malformed prefixes falling through. - 2 renderer tests: compact card shows only the changed item (with assertions that other titles do NOT appear), missing-title path. 
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1834 + supporting Closes #403 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:36:21 -05:00
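The prefix parser and title lookup from c52f2c46f1 might be sketched as below. The return shape `(id, status)`, the `change_line` helper, and the assumption that the status runs to the end of the first line are illustrative guesses; only the two prefix forms, the 1-indexed id, and the `(missing title)` fallback come from the commit message.

```rust
/// Parse `Updated todo #N to STATUS` or `Updated checklist #N to STATUS`
/// from the first line of a tool output. Any mismatch yields `None` so the
/// caller can fall through to the full-card path.
pub fn parse_update_prefix(output: &str) -> Option<(usize, &str)> {
    let first_line = output.lines().next()?;
    let rest = first_line.strip_prefix("Updated ")?;
    let rest = rest
        .strip_prefix("todo #")
        .or_else(|| rest.strip_prefix("checklist #"))?;
    let (id, status) = rest.split_once(" to ")?;
    let id: usize = id.parse().ok()?;
    let status = status.trim();
    // Ids are 1-indexed, so zero is treated as malformed (an assumption).
    if id == 0 || status.is_empty() {
        return None;
    }
    Some((id, status))
}

/// Hypothetical helper: build the one-line state change, looking the title up
/// in a 1-indexed list and falling back when the id is out of range.
pub fn change_line(output: &str, titles: &[&str]) -> Option<String> {
    let (id, status) = parse_update_prefix(output)?;
    let title = titles.get(id - 1).copied().unwrap_or("(missing title)");
    Some(format!("Todo #{id}: {title} → {status}"))
}
```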
Hunter Bown	4d4a9b424c	feat(config): expand per-project overlay to cover provider, sandbox, approval, mcp_path, max_subagents, allow_shell (#485 ) The project-config overlay (`<workspace>/.deepseek/config.toml` merged on top of the user's global `~/.deepseek/config.toml`) was already wired but only carried four string fields: model, api_key, base_url, reasoning_effort. The use cases users actually file under #485 — "this repo wants a different sandbox / approval policy / MCP server set / hard sub-agent cap" — weren't covered. ### What ships Adds the following keys to the project overlay, all merged with identical "non-empty wins" semantics for strings: - `provider` — pick a different backend per repo (e.g. `nvidia-nim` for an enterprise repo, `deepseek-cn` for a CN-team repo). - `approval_policy` — `never` / `on-request` / `untrusted` for repos with strict policies. - `sandbox_mode` — `read-only` / `workspace-write` / `danger-full-access`. - `mcp_config_path` — per-repo MCP server set without touching the user's global file. - `notes_path` — keep notes in-repo for projects where the notes tool is part of the dev workflow. Plus two non-string fields: - `max_subagents` (positive integer; clamped to `1..=MAX_SUBAGENTS=20`). - `allow_shell` (bool). ### What stays user-global `skills_dir`, `hooks`, `[capacity]`, `[retry]`, `[memory]`, etc. — those are user-shaped settings, not repo-shaped. If a future use case demands per-project values for any of them, a follow-up PR can extend the overlay rather than letting the boundary blur. ### Tests - 8 new tests in `project_config_tests` covering: provider+model, approval+sandbox, max_subagents+allow_shell, max_subagents clamping, negative-max_subagents rejection, missing config file pass-through, malformed TOML pass-through, and empty-string no-op. ### Docs - New "Per-project overlay (#485)" section in `docs/CONFIGURATION.md` with a table of supported keys and the rationale for which fields stay user-global. 
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1828 + supporting (was 1820) Closes #485 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:25:43 -05:00
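The overlay merge in 4d4a9b424c ("non-empty wins" for strings, clamping for `max_subagents`) can be sketched as follows. The struct shapes and the `apply_overlay` name are hypothetical, only a few of the listed keys are modelled, and surfacing the negative-value rejection as an `Err` is an assumption; the commit does not say how the rejection is reported or how zero is treated, so zero is simply clamped up to 1 here.

```rust
pub const MAX_SUBAGENTS: i64 = 20;

/// Hypothetical subset of the effective configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Config {
    pub provider: String,
    pub sandbox_mode: String,
    pub max_subagents: usize,
    pub allow_shell: bool,
}

/// Hypothetical subset of `<workspace>/.deepseek/config.toml` after parsing.
#[derive(Debug, Default)]
pub struct ProjectOverlay {
    pub provider: Option<String>,
    pub sandbox_mode: Option<String>,
    pub max_subagents: Option<i64>,
    pub allow_shell: Option<bool>,
}

/// Merge the project overlay on top of the user's global config.
pub fn apply_overlay(mut base: Config, overlay: &ProjectOverlay) -> Result<Config, String> {
    // Strings: an absent or empty overlay value leaves the global value alone.
    if let Some(p) = overlay.provider.as_deref().filter(|s| !s.is_empty()) {
        base.provider = p.to_string();
    }
    if let Some(s) = overlay.sandbox_mode.as_deref().filter(|s| !s.is_empty()) {
        base.sandbox_mode = s.to_string();
    }
    // Integer: reject negatives, clamp everything else into 1..=MAX_SUBAGENTS.
    if let Some(n) = overlay.max_subagents {
        if n < 0 {
            return Err(format!("max_subagents must be positive, got {n}"));
        }
        base.max_subagents = n.clamp(1, MAX_SUBAGENTS) as usize;
    }
    if let Some(flag) = overlay.allow_shell {
        base.allow_shell = flag;
    }
    Ok(base)
}
```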
Hunter Bown	a723ddd63d	feat(tui): MCP server health chip in footer (#502) Adds a small `MCP M/N` chip to the footer's right-side auxiliary cluster so users with MCP servers see at-a-glance health without running `/mcp`. The chip is color-coded by reachability: - all configured servers reachable → success (sky) - some reachable, some failed → warning (amber) - zero reachable but ≥1 configured → error (red) - configured but no live snapshot yet → muted (gray, count only) When zero servers are configured the chip is hidden entirely; users who don't use MCP see no change. ### What's wired - New `App::mcp_configured_count`, populated at app boot from `mcp::load_config(&mcp_config_path)`. Cheap (just reads the JSON config; no server connections), so it doesn't block startup. - `app.mcp_snapshot.servers.iter().filter(|s| s.connected).count()` drives the live-state portion when the user has run `/mcp` at least once. - `FooterProps` gains an `mcp: Vec<Span<'static>>` field built by `footer_mcp_chip(connected, configured)`. Threaded into `auxiliary_spans` so it participates in the priority-drop pipeline at narrow widths. - After any `/mcp` action that returns a snapshot, the count is refreshed from the snapshot so post-edit state is reflected. ### #499 follow-up: pure render path Moves the retry-status capture into `FooterProps` (`pub retry: RetryState`) sampled in `from_app`, instead of pulling from the global surface inside `render`. The render method is now pure with respect to its props, which fixes a parallel-test race where retry-banner tests and unrelated footer tests would observe each other's writes through the process-wide retry surface. ### Tests - 5 unit tests in `footer_mcp_chip`: hidden when zero, count-only when no snapshot, success / warning / error colours by reachability. - Existing retry-banner tests now pin `props.retry` directly rather than mutating the global surface, so no more `test_guard()` needed and no more parallel-runner flakes. 
- All 31 footer tests pass in parallel. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1820 + supporting Closes #502 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:16:27 -05:00
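The reachability rules for the chip in a723ddd63d reduce to a small pure function. The sketch below is an approximation: the real `footer_mcp_chip` is described as producing styled spans, whereas this version returns plain text plus a hypothetical `ChipTone` enum, models "no live snapshot yet" as `None`, and guesses at the count-only label.

```rust
/// Hypothetical colour class for the chip; the real code maps to theme colours.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChipTone {
    Success,
    Warning,
    Error,
    Muted,
}

/// `connected` is `None` until a live snapshot exists (for example before `/mcp` has run).
/// Returns `None` when no servers are configured, so the chip is hidden.
pub fn footer_mcp_chip(connected: Option<usize>, configured: usize) -> Option<(String, ChipTone)> {
    if configured == 0 {
        return None;
    }
    Some(match connected {
        // No snapshot yet: show the configured count only (label format assumed).
        None => (format!("MCP {configured}"), ChipTone::Muted),
        Some(0) => (format!("MCP 0/{configured}"), ChipTone::Error),
        Some(m) if m >= configured => (format!("MCP {m}/{configured}"), ChipTone::Success),
        Some(m) => (format!("MCP {m}/{configured}"), ChipTone::Warning),
    })
}
```

Keeping the function free of global state mirrors the "pure render path" point in the same commit: tests can pin inputs directly and run in parallel.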
Hunter Bown	8680a43298	feat(tui): visual retry/backoff countdown in footer (#499) When the API client retries a 429 / 5xx / network failure, the TUI previously went silent during the backoff sleep. The user saw "thinking" or "ready" with no signal anything was wrong, until the request finally either succeeded or raised. This PR adds a foreground retry banner so the user sees what's happening and how long until the next attempt. ### What ships - New `crates/tui/src/retry_status.rs` module exposing a process-wide `RetryState` (`Idle | Active(banner) | Failed { reason }`) with `start`, `succeeded`, `failed`, and `clear` helpers. The state is process-global because the user-facing TUI is one engine per process; sub-agent retries deliberately don't light up the foreground banner. - `client::send_with_retry` now flips the state in its retry callback (`start(attempt+1, delay, reason)`) and on the final outcome (`succeeded()` on Ok, `failed(reason)` on Err with retries-exhausted, `clear()` on Err with attempts==1 so non-retryable errors don't pin the failure row). - `human_retry_reason` translates the structured `LlmError` into a short label: rate-limit reasons include the `Retry-After` header when the upstream provided one ("rate limited (Retry-After 30s)"). - Footer's `render` checks `retry_status::snapshot()` first; when `Active` it renders `⟳ retry N in Ms — <reason>` in the warning color; when `Failed` it renders `× failed: <reason>` in the error color. Banner takes precedence over the toast and the regular status line. - `Engine::handle_user_message` calls `retry_status::clear()` right after emitting `TurnStarted` so the previous turn's failure row doesn't bleed into a new turn. ### Tests - 4 unit tests in `retry_status::tests` covering idle default, the active → succeeded round-trip, the failed-state pin, and a past-deadline saturation. - 2 footer rendering tests asserting the banner / failure-row text appears in the rendered buffer. 
- All tests touching the global retry surface serialize through `retry_status::test_guard()` so cargo's parallel runner can't observe a torn read. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1815 + supporting (was 1809 on stabilization base) cargo test -p deepseek-tui --bin deepseek-tui --locked retry ✓ 28 passed Closes #499 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:08:53 -05:00
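The "past-deadline saturation" case from 8680a43298 is easiest to see in a value-level model of the retry state. The sketch below is not the process-wide surface the commit describes: it is a plain enum, the `Active` payload is spelled out as fields instead of a banner type, and `start` takes an explicit `now` so the example is deterministic. All of those are assumptions.

```rust
use std::time::{Duration, Instant};

/// Simplified, non-global stand-in for the retry state.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RetryState {
    Idle,
    Active { attempt: u32, deadline: Instant, reason: String },
    Failed { reason: String },
}

impl RetryState {
    /// Begin a backoff window ending `delay` after `now`.
    pub fn start(attempt: u32, delay: Duration, reason: &str, now: Instant) -> Self {
        RetryState::Active {
            attempt,
            deadline: now + delay,
            reason: reason.to_string(),
        }
    }

    /// Whole seconds left until the next attempt; saturates at zero once the
    /// deadline has passed instead of panicking or going negative.
    pub fn remaining_secs(&self, now: Instant) -> Option<u64> {
        match self {
            RetryState::Active { deadline, .. } => {
                Some(deadline.saturating_duration_since(now).as_secs())
            }
            _ => None,
        }
    }
}
```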
Hunter Bown	dd3882e1bf	fix(utils): display_path normalizes home-relative suffix to platform separator `display_path_with_home` joined the `~` prefix with `MAIN_SEPARATOR_STR` but called `rest.display()` for the suffix, which preserves whatever separators the input carried. On Windows that produced mixed-separator output like `~\projects/foo` for any path that came in with forward slashes — visible in the tests that #506 added to lock down the contract (the tests passed locally on Unix but failed on the windows-latest CI runner). Walk `rest.components()` and join each `Normal` component with `MAIN_SEPARATOR_STR`. Pure-Rust, no extra deps, behavior is byte-identical on Unix because the input separator was already `/`. Verified locally: - `cargo test -p deepseek-tui --locked display_path` ✓ (5 passed) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 02:17:50 -05:00
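The separator fix in dd3882e1bf (walk `rest.components()` and join each `Normal` component with `MAIN_SEPARATOR_STR`) can be sketched like this. The signature, in particular passing `home` as a parameter and returning the untouched path when it lies outside the home directory, is an assumption for illustration.

```rust
use std::path::{Component, Path, MAIN_SEPARATOR_STR};

/// Render `path` with the home directory collapsed to `~`, always using the
/// platform separator for the home-relative suffix.
pub fn display_path_with_home(path: &Path, home: &Path) -> String {
    // Paths outside the home directory are shown unchanged (assumed behaviour).
    let Ok(rest) = path.strip_prefix(home) else {
        return path.display().to_string();
    };
    let mut out = String::from("~");
    for component in rest.components() {
        // Joining component by component drops whatever separators the
        // input carried, so mixed output like `~\projects/foo` cannot occur.
        if let Component::Normal(part) = component {
            out.push_str(MAIN_SEPARATOR_STR);
            out.push_str(&part.to_string_lossy());
        }
    }
    out
}
```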
Hunter Bown	ad8064b143	chore(v0.8.8): stabilization batch — sub-agent caps, mutex contention, RLM polish, CI cleanup Bundles the v0.8.8 stabilization fixes that were already implemented in the working tree, plus the workflow/doc reconciliation called out in #507. ### Sub-agent runtime fixes - #509 Default sub-agent cap raised to 10 (configurable via `[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The running-count calculation now ignores non-running, no-handle, and finished handles so completed agents stop counting against the cap. - #510 `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths that previously held a `Mutex` for inspection now take a read lock, eliminating the multi-agent fan-out UI freeze. - #511 `compact_tool_result_for_context` summarizes `agent_result` / `agent_wait` payloads before they are folded into the parent context. - #512 RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not `swarm`. Stale "swarm" wording cleaned in docs/comments/tests. - #513 (foreground stopgap only) Foreground RLM work is visible in the Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the issue stays open with a refined scope. ### TUI / UX fixes - #487 Offline composer queue is now session-scoped; legacy unscoped queues fail closed. - #488 Composer Option+Backspace deletes by word; cross-platform key routing helpers added. - #443/#444 Keyboard enhancement flags pop on normal AND panic exit; the raw-mode startup probe is now bounded by a configurable timeout. - #449 Production footer reads statusline colors from `app.ui_theme` rather than the bespoke palette. - #506 `display_path_with_home` no longer mutates `HOME` in tests; the flake on shared-env CI is gone. ### Self-update / packaging - #503 `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`) instead of the raw Rust constants. The platform-asset selector also rejects `.sha256` siblings as primary binaries. 
Tests now live alongside the source in `mod tests` (the `#[path]`-based integration test was removed because it duplicated test runs and forced a `pub(crate)` helper that no real caller used). - `Max 5 in flight` wording updated in `agent_spawn` description, `prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real default cap (10) and the configuration knob name. ### CI / release docs (#507) - Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`, `publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR. - `release.yml` build job now allows `parity` to be skipped (it only runs on tag push), unblocking `workflow_dispatch` reruns. The job still fails closed on a real parity failure. - `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the manual `scripts/release/publish-crates.sh` flow (no automated workflow); references to the deleted workflows removed. - `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag → release.yml chain (without it, the tag is created but `release.yml` does not fire) and documents the `workflow_dispatch` parity-skip behavior. ### Docs - `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and Codex CLI, gap analysis, and recommended implementation order. ### Verification (this branch) - `cargo fmt --all -- --check` ✓ - `cargo check --workspace --all-targets --locked` ✓ - `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓ - `cargo test --workspace --all-features --locked` ✓ (1809 + supporting) - Parity gates ✓ (snapshot, parity_protocol, parity_state) - `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓ - Lockfile drift guard ✓ - `deepseek doctor --json` clean - `deepseek eval` (offline harness) success=true, 0 tool errors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 01:57:37 -05:00
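The #503 item in ad8064b143 (map Rust's architecture constants to release-asset names and never pick a `.sha256` sibling as the binary) might look roughly like this. The function names and the asset file names used with them are hypothetical; only the `arm64`/`x64` naming and the checksum rejection come from the commit message. The inputs `aarch64` and `x86_64` are the values `std::env::consts::ARCH` reports on those targets.

```rust
/// Map a value of `std::env::consts::ARCH` to the release-asset arch label.
pub fn release_arch(rust_arch: &str) -> Option<&'static str> {
    match rust_arch {
        "aarch64" => Some("arm64"),
        "x86_64" => Some("x64"),
        _ => None, // other targets are out of scope for this sketch
    }
}

/// Pick the first asset matching the platform, skipping checksum siblings.
pub fn select_asset<'a>(names: &[&'a str], os: &str, arch: &str) -> Option<&'a str> {
    names
        .iter()
        .copied()
        .find(|n| n.contains(os) && n.contains(arch) && !n.ends_with(".sha256"))
}
```

A caller would typically chain the two, for example `release_arch(std::env::consts::ARCH).and_then(|arch| select_asset(&names, "macos", arch))`.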
Hunter Bown	15da881e1d	chore(release): bump version to 0.8.7 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 21:56:37 -05:00
Hunter Bown	a4102ccad4	fix(tui): allow selection across all transcript cell types The selection-tightening from `7125172f` restricted copy/select to user and assistant cell bodies only, which made it impossible to copy text from system notes, thinking blocks, or tool output. Drop the body-start gate so the rendered transcript block is selectable in full. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 21:54:57 -05:00
Hunter Bown	7125172f67	fix(tui): tighten selection and live task panels	2026-05-02 21:05:15 -05:00
Hunter Bown	5bfc1feb62	v0.8.6: survivability, UX polish, and release hardening Merge the v0.8.6 feature batch and release hardening. Includes the full #373-#380/#382-#402 milestone scope, version bump to 0.8.6, secure /share temp-file handling, Windows-safe self-update replacement, and CI portability fixes. Remote PR checks passed on the final head before merge.	2026-05-02 20:11:33 -05:00
Hunter Bown	2d61513a9e	v0.8.5: config test fixes + default_model session-apply bugfix (#381 ) * feat: add config UI support for TUI and web modes - Introduced a new `config_ui.rs` module to handle configuration UI for TUI and web. - Updated `TuiOptions` and `App` structures to include `config_path` and `config_profile`. - Implemented functions to build and apply configuration documents. - Added tests to ensure the new configuration UI behaves as expected. - Integrated web configuration session handling into the event loop. - Updated various modules to accommodate the new configuration options and UI. * refactor(tui): remove local path reference for schemaui dependency Remove the local file system path reference for schemaui in favor of using the published crate from the registry. This change updates the Cargo.toml to use only the version specification and adds the source and checksum information to Cargo.lock. * fix: add AGENTS.md guide and improve config error handling - Add comprehensive AGENTS.md file with project instructions for AI assistants, including build commands, dependencies, and GitHub operations guidance - Introduce is_error field to CommandResult struct for better error tracking - Refactor config application logic to properly handle errors using the new is_error flag - Add test utilities for WebConfigSession to support testing - Optimize web config event polling by extracting drain logic into separate function - Add unit tests for session-only config application and engine sync requirements * fix(security): add SSRF protection to fetch_url (#261) Block private, link-local, and cloud metadata IPs in fetch_url HTTP requests. Co-authored-by: JasonOA888 * test(portability): inject paths instead of mutating HOME (Windows fix) CI's `Test (windows-latest)` job failed because both my new tests (composer_history and the spawn_supervised crash-dump test) mutated HOME to redirect `dirs::home_dir()`. 
That works on macOS / Linux but not on Windows, where dirs::home_dir() reads USERPROFILE / queries SHGetKnownFolderPath rather than HOME. Fix: refactor both modules to expose path-injecting helpers so tests never need to touch the env var: - composer_history: split load_history / append_history into thin wrappers around load_history_from(&Path) / append_history_to(&Path). Tests use the _to / _from form with a tempdir path. - utils::write_panic_dump: same pattern — write_panic_dump_to(&Path) takes the crash dir directly. The spawn_supervised end-to-end test splits into two: one verifies panic-doesn't-propagate (no on-disk side effect needed), one verifies write_panic_dump_to writes the expected log format. Production callers continue to use the env-driven default (`HOME`/ `USERPROFILE` via `dirs::home_dir()`) so no behavior change. Tests work identically on every platform now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tui): clear chat area each frame so stale cells don't bleed into sidebar ChatWidget's render path was `Paragraph::new(lines).render(content_area, buf)` with no Block and no Clear — ratatui's Paragraph only writes cells that contain text, leaving any cell the current frame's paragraph doesn't touch holding the previous frame's contents. With wide tool output (`gh pr list`, `git log`) emitting ISO-8601 timestamps like `2026-05-02T07:29:24Z`, then a subsequent shorter-paragraph frame, the old timestamp tails (`:24Z`, `7:29:24Z`, etc.) persisted on the right edge of the chat area, visually colliding with the section headers in the sidebar (`Plan` rendering as `:24Zan`, `Agents` as `:24Zents`). Fix: render `Clear` over the full content_area before drawing the Paragraph. Cheap (one buffer-fill per frame) and guarantees stale cells can never persist into the next frame's render. Reported in v0.8.5 testing right after install. 
The other v0.8.5 bordered widgets (composer, sidebar sections, footer) already render into a Block with a solid background style, so they were never affected — only the chat area used a bare Paragraph. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(theme): vendor + theme schemaui to deepseek navy palette (config UI) The schemaui-0.12.0 crate the contributor brought in via #365 ships hardcoded Color::Gray / Color::DarkGray / Color::White / Color::Yellow references across its rendering components. Visually it clashed with the rest of deepseek-tui — the editor area read as gray-on-black on a TUI that's otherwise navy ink + sky accents. Two ship-day options weren't acceptable: defaulting back to the legacy modal lost the new editor's UX, and living with gray was off-brand. This commit forks schemaui at 0.12.0 into vendor/schemaui-0.12.0 and themes the rendering layer to match deepseek-tui's palette. The patch is wired in via a workspace-level [patch.crates-io] override so the deepseek-tui Cargo.toml continues to depend on `schemaui = "0.12.0"` and would automatically resolve back to crates.io if we ever drop the override (e.g. once upstream lands a ColorTheme API). Changes inside the vendored fork: - New `src/deepseek_palette.rs` with the brand RGB values: SURFACE_INK / SURFACE_RAISED for backgrounds, BORDER_DIM / BORDER_ACTIVE for chrome, TEXT_PRIMARY / TEXT_MUTED / TEXT_DIM, ACCENT_SKY / ACCENT_BLUE / ACCENT_PURPLE, and STATUS_OK / WARN / ERROR. Values mirror crates/tui/src/palette.rs in the workspace. - `src/lib.rs` exposes the palette module under `cfg(feature = "tui")`. - `src/tui/view/frame.rs::draw` paints a navy backdrop across the full frame area before any child widget renders, so any cell that doesn't get explicitly written reads as ink instead of the terminal default. 
- `tabstrip.rs`, `overlay.rs`, `popup.rs`, `body.rs`, `sections.rs`, `footer.rs`, `help.rs`, `fields.rs`: every Color::Gray / DarkGray / White / Yellow / Cyan / Blue / Magenta / Red / Green / LightBlue swapped out for a deepseek_palette token, plus explicit `bg(...)` fills on the top-level Block styles and Paragraph wrappers. - `Cargo.toml` adds an empty `[workspace]` so the vendored crate builds standalone (its dev-deps don't drift into ours). Workspace-level changes: - `Cargo.toml` adds `[patch.crates-io] schemaui = { path = "vendor/schemaui-0.12.0" }`. Production deepseek-tui builds pick up the themed fork transparently. - `.gitignore` excludes `vendor/.../web/ui/node_modules/` (15 MB of npm artefacts the Rust build doesn't need) and the vendored Cargo.lock (regenerated locally per build). Verification: - cargo build --workspace --all-features: clean - cargo clippy --workspace --all-targets --all-features --locked: clean - cargo test --workspace: 1777 passed, 0 failed - /config inside `deepseek` now opens a navy-themed editor matching the rest of the TUI; tabs, body panel, footer, popup, and help overlay all read on brand. Future work tracked separately: upstream a `with_theme(ColorTheme)` builder API to schemaui so we can drop the fork. Until then, sync the fork against new schemaui releases when we want their fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "feat(theme): vendor + theme schemaui to deepseek navy palette" This reverts ed597ccc — vendoring 28,913 lines of schemaui to recolor a config editor was the wrong tradeoff. Maintenance cost for a cosmetic match wasn't worth it, and the recolor wasn't even fully working (terminal-default bg kept bleeding through Style::default() calls in the form fields). 
The simpler path: keep the schemaui-driven editor available as `/config tui` for users who want the form-style UX, but make bare `/config` open the legacy native modal that already matches the deepseek-tui navy chrome by inheritance. No fork, no vendored copy, no ongoing sync burden. Changes: - `git rm -r vendor/schemaui-0.12.0/` (28,913 lines gone) - Drop `[patch.crates-io]` from workspace Cargo.toml — schemaui resolves back to crates.io v0.12.0 unmodified. - Drop the corresponding `.gitignore` exclusions (no more vendor dir to filter). - `config_ui::parse_mode` default mode flipped from `Tui` to `Native`. Bare `/config` → legacy navy modal. Explicit `/config tui` → the contributor's schemaui editor (still available, gray-on-default chrome, but opt-in). `/config web` and `/config <key>` / `/config <key> <value>` unchanged. - Help text updated to list `[native\|tui\|web]` in that order. Verified: cargo build / clippy --workspace --all-features --locked with -D warnings: clean. The contributor's work (#365) ships and gets credit; users discover the alternate editor via the help text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tui): paint chat area with explicit navy ink instead of Clear The Clear-instead-of-fill in 0ae2cead reset cells to the terminal's default background, which read as a brown-gray on most user setups even though the rest of the TUI chrome is navy. Replace the Clear with an explicit Block fill at palette::DEEPSEEK_INK, and pass the same bg through to the Paragraph itself so streamed text cells inherit ink rather than bouncing back to terminal default. Net effect: the chat area visually unifies with the sidebar / composer / footer instead of showing as a contrasting brown-gray panel in the middle of an otherwise navy frame. Stale-cell guarantee from #372-followup is preserved — the Block fills every cell in the area on each frame, so wide tool output (`gh pr list` ISO timestamps, etc.) 
still can't bleed past the current frame's actual text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(config): update tests for Native default + fix default_model override in session-only apply - Update test_show_config_defaults_to_native and execute_config_opens_config_view_action to expect OpenConfigView (Native) instead of OpenConfigEditor(Tui), matching the parse_mode default change from ce98f054. - Fix apply_document bug where default_model was processed in the main key-value loop after model, causing set_config_value('default_model') to overwrite the runtime model. default_model is now only applied when persist=true, preventing session-only edits from being silently reverted. * style: cargo fmt * chore: remove end-of-night report (session artifact) --------- Co-authored-by: unic <yuniqueunic@gmail.com> Co-authored-by: Jason <jason@aveoresearchlabs.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: YuniqueUnic <YuniqueUnic@users.noreply.github.com>	2026-05-02 16:25:03 -05:00
Hunter Bown	07be656412	style: cargo fmt --all sweep Applies the workspace formatter to the v0.8.5 commits — local builds ran without `cargo fmt --check` so a few format inconsistencies slipped through and CI's `parity` job (which runs fmt --check) failed. Mechanical reflow only; no functional changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 14:50:59 -05:00
Hunter Bown	216f6be349	chore(release): bump version to 0.8.5 Workspace + npm wrapper + every internal crate path-dep pin moved from 0.8.4 → 0.8.5. scripts/release/check-versions.sh confirms parity across the three sources. cargo build / clippy / test all clean. Pushing this commit to main is the trigger for auto-tag.yml to create the v0.8.5 tag, which fires release.yml to build the cross-platform matrix and draft the GitHub Release. The npm publish remains a manual follow-up (2FA on every publish, no automation token provisioned). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 14:48:18 -05:00
Hunter Bown	b678058ff0	test: serialise composer_history env mutation + update rlm test for #358 Two test fixes uncovered by the full-suite run: 1. composer_history tests were using a module-local mutex to serialize their HOME env mutation, but other tests in the workspace (config, commands::restore, etc.) ALSO mutate HOME without that lock. Switch to the crate-wide `test_support::lock_test_env()` so all HOME- mutating tests share one mutex. 2. The `prompts::tests::rlm_first_class_guidance_present` test was pinning the OLD "RLM Is First-Class" framing that #358 deliberately reframed as "RLM Is a Specialty Tool". Renamed the test to `rlm_specialty_tool_guidance_present` and updated the assertions to guard the new framing — so a future encouraging-language regression lights up CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:40:57 -05:00
Hunter Bown	735287774f	fix(approval): cache denials per session — ESC on dangerous command stops re-prompting (#360 ) When the user pressed ESC (or Deny / Abort) on an approval prompt, the TUI correctly told the engine to deny the call. But the model would often retry the same command — same name, same args, same approval fingerprint — and the user would see the dialog again, frustrating in the same way the equivalent yes-yes-yes loop would be. Symmetric to the existing `approval_session_approved` "always approve" cache: add `approval_session_denied: HashSet<String>` populated when the user denies (not when the timeout fired — a timeout might mean the user stepped away rather than refused). Subsequent ApprovalRequired events whose approval_key or tool_name match the cache auto-deny via `engine.deny_tool_call(...)` without re-showing the dialog. Logged via `tool.approval.auto_deny_session` so the audit log captures the silent denial. Closes #360. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:35:25 -05:00
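The per-session denial cache described in 735287774f can be sketched roughly as follows. This is a minimal illustration, not the project's code: only the `approval_session_denied` field name and the "user denial counts, timeout does not" rule come from the commit message. `ApprovalCache`, `DenyReason`, `record_denial`, and `should_auto_deny` are hypothetical names.

```rust
use std::collections::HashSet;

/// Why an approval prompt ended without an approval (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DenyReason {
    /// The user pressed ESC / Deny / Abort.
    UserDenied,
    /// The prompt timed out; the user may simply have stepped away.
    TimedOut,
}

/// Hypothetical holder for the session-scoped cache.
#[derive(Default)]
pub struct ApprovalCache {
    /// Field name taken from the commit message.
    approval_session_denied: HashSet<String>,
}

impl ApprovalCache {
    /// Remember a denial, but only when the user actively refused.
    pub fn record_denial(&mut self, approval_key: &str, reason: DenyReason) {
        if reason == DenyReason::UserDenied {
            self.approval_session_denied.insert(approval_key.to_string());
        }
    }

    /// True when a later request matches the cache by approval key or tool name.
    pub fn should_auto_deny(&self, approval_key: &str, tool_name: &str) -> bool {
        self.approval_session_denied.contains(approval_key)
            || self.approval_session_denied.contains(tool_name)
    }
}
```

A caller would check `should_auto_deny` when an approval request arrives and skip the dialog on a hit; the audit logging mentioned in the commit is left out of the sketch.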
Hunter Bown	162e2e027c	feat(composer): cross-session input history persistence (#366 ) Pressing Up-arrow at the composer now recalls submissions from previous sessions, not just the current one. Implementation: - New `crates/tui/src/composer_history.rs` module with `load_history()` + `append_history()`. Persists to `~/.deepseek/composer_history.txt` (one entry per line, oldest first). Capped at 1000 entries — entries older than the cap are pruned at append time so the file never grows unboundedly. - `App::new` now seeds `input_history` from the persisted file at startup, so Up-arrow at first launch shows yesterday's prompts. - `App::submit_message` mirrors each non-slash submission to the persisted history. Slash commands and empty/whitespace submissions are skipped — those don't help recall and would pollute the stream. - Consecutive-duplicate dedup so re-submitting the same prompt doesn't bloat the file. The persisted history is global (not per-workspace) — matches the arrow-up recall pattern users expect from shells and Claude Code. Per- workspace scoping is a follow-up if multi-project users find it noisy. Tests: 6 unit tests cover round-trip, slash-skip, empty-skip, consecutive-duplicate dedup, cap-pruning, and missing-file safety. The test module uses an internal Mutex to serialize HOME env mutations so tests can still run in parallel without stomping each other. Closes #366. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:32:45 -05:00
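The append rules listed in 162e2e027c (skip slash commands and blank input, drop consecutive duplicates, prune past the cap) can be illustrated with an in-memory sketch. It assumes a `Vec<String>` standing in for `~/.deepseek/composer_history.txt`; `append_entry` and its `cap` parameter are hypothetical and are not the signature of the real `append_history()`.

```rust
/// Cap quoted in the commit message.
pub const MAX_ENTRIES: usize = 1000;

/// Append `submission` to `history` (oldest first) under the commit's rules.
/// Returns true when the entry was stored. The Vec is an assumption that
/// stands in for the on-disk file.
pub fn append_entry(history: &mut Vec<String>, submission: &str, cap: usize) -> bool {
    let trimmed = submission.trim();
    // Slash commands and empty / whitespace-only submissions are skipped.
    if trimmed.is_empty() || trimmed.starts_with('/') {
        return false;
    }
    // Consecutive-duplicate dedup: re-submitting the same prompt is a no-op.
    if history.last().map(String::as_str) == Some(trimmed) {
        return false;
    }
    history.push(trimmed.to_string());
    // Prune the oldest entries once the cap is exceeded.
    if history.len() > cap {
        let excess = history.len() - cap;
        history.drain(..excess);
    }
    true
}
```

In real use the cap would be `MAX_ENTRIES`; the parameter exists only so the pruning path is easy to exercise with a small number.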
Hunter Bown	40f7037d8e	docs(prompts): tell the model to skip markdown tables in terminal output (#372 ) Markdown tables don't render correctly in a terminal — monospace fonts plus variable-width content (especially CJK characters) can't reliably align column borders. Adds an "Output formatting" section to both base.md and base.txt instructing the model to prefer plain prose, bulleted/numbered lists, code blocks, or `- Label: value` pairs over tables. If column-aligned data is genuinely necessary, the guidance asks for narrow, ASCII-only, 2–3 column tables. Closes #372. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:27:54 -05:00
Hunter Bown	1986a15dd5	feat(persistence): schema migration framework — forward upgrade path for ~/.deepseek/ records (#350 ) Every persistence layer in crates/tui/src/ already gates `schema_version > CURRENT_` to reject newer-than-supported records (good — prevents silent truncation when an older binary tries to load a v3 file with v4 fields). What was missing: the forward upgrade path for older records. When we bump CURRENT_SESSION_SCHEMA_VERSION from 3 to 4 to add a field, every v3 session on disk would silently load with the new field's serde default — which is OK for additions but breaks catastrophically for renames or shape changes. This commit lays down the framework: `crates/tui/src/schema_migration.rs` — new module: - `SchemaMigration` trait. Each persistence domain implements it once with `CURRENT_VERSION`, `DOMAIN`, and an ordered `MIGRATIONS` list of `fn(&mut serde_json::Value) -> Result<(), MigrationError>` steps. Index `i` migrates from version `i+1` to `i+2`. - `SchemaMigration::migrate(value, from_version)` — runs every required step, stamping `value["schema_version"]` after each step so a partial failure leaves a known-state record rather than mixed. - `MigrationError` — typed error with from/to versions + reason. - `backup_before_migrate(path, domain)` — creates a `.bak` copy of the source file before mutation. Errors are warn-logged and ignored (continues because `write_atomic` is itself crash-safe). The `.bak` is left on disk as a manual recovery artifact — no automatic GC. `schema_migration::registry` — submodule that registers every existing persistence domain (session, offline_queue, runtime, task, automation, automation_run) at its current version with an empty MIGRATIONS list. No domain has shipped a schema bump yet, so today's behavior is a no-op. The next bump is now a 4-step recipe: 1. Write the `migrate_<domain>_v<N>_to_v<N+1>` step in this module. 2. Append it to `MIGRATIONS` and bump `CURRENT_VERSION`. 3. 
Wire `<Domain>Migration::migrate(...)` into the load function in the owning module. 4. Add a fixture-based integration test. Tests: 6 unit tests covering no-op, all-steps, partial migration, newer-than-current rejection, backup creation, and backup-failure robustness. Wiring into individual load sites (session_manager, runtime_threads, task_manager, automation_manager) is intentionally deferred until the first actual schema bump needs it — wiring without migrations would add code paths nothing exercises, and the framework is the part that needs to land before the next bump can ship safely. Closes #350. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:23:33 -05:00
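The trait shape described in 1986a15dd5 might look something like the sketch below. To stay dependency-free it substitutes a `BTreeMap<String, String>` (`Record`) for `serde_json::Value` and lets steps return a plain `String` reason; both are assumptions. `DemoMigration` and `rename_name_to_title` are hypothetical, since the commit states that no real domain has shipped a migration yet.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value` (assumption, keeps the sketch std-only).
pub type Record = BTreeMap<String, String>;

/// Typed error with from/to versions plus a reason, as the commit describes.
#[derive(Debug, PartialEq)]
pub struct MigrationError {
    pub from: u32,
    pub to: u32,
    pub reason: String,
}

/// One migration step; the String error is a simplification.
pub type Step = fn(&mut Record) -> Result<(), String>;

pub trait SchemaMigration {
    const CURRENT_VERSION: u32;
    const DOMAIN: &'static str;
    /// Step at index `i` migrates version `i + 1` to `i + 2`.
    const MIGRATIONS: &'static [Step];

    fn migrate(record: &mut Record, from_version: u32) -> Result<(), MigrationError> {
        let fail = |from: u32, to: u32, reason: String| MigrationError { from, to, reason };
        // Reject version 0 and anything newer than this binary supports.
        if from_version == 0 || from_version > Self::CURRENT_VERSION {
            let reason = format!("{}: unsupported schema version", Self::DOMAIN);
            return Err(fail(from_version, Self::CURRENT_VERSION, reason));
        }
        for version in from_version..Self::CURRENT_VERSION {
            let step = Self::MIGRATIONS
                .get((version - 1) as usize)
                .ok_or_else(|| fail(version, version + 1, "no step registered".to_string()))?;
            step(record).map_err(|reason| fail(version, version + 1, reason))?;
            // Stamp after each step so a later failure leaves a known state.
            record.insert("schema_version".to_string(), (version + 1).to_string());
        }
        Ok(())
    }
}

/// Hypothetical v1 -> v2 step: rename `name` to `title`.
fn rename_name_to_title(record: &mut Record) -> Result<(), String> {
    let name = record.remove("name").ok_or("missing `name` field")?;
    record.insert("title".to_string(), name);
    Ok(())
}

/// Hypothetical domain used only to exercise the trait.
pub struct DemoMigration;

impl SchemaMigration for DemoMigration {
    const CURRENT_VERSION: u32 = 2;
    const DOMAIN: &'static str = "demo";
    const MIGRATIONS: &'static [Step] = &[rename_name_to_title];
}
```

A rename is used for the demo step because the commit singles out renames as the case where serde defaults alone would break. Backups via `backup_before_migrate` are not modelled here.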
Hunter Bown	4a282e767b	chore(deps): remove 8 unused dependencies flagged by cargo-machete (#341 ) cargo-machete found 8 direct dependencies that are declared but never used in the source tree. Removing them tightens the dependency graph and shrinks Cargo.lock by 40 lines (transitive crate removals where nothing else pulled them in). Removed: - deepseek-core: tokio (the core scaffold doesn't drive any tasks itself) - deepseek-config: serde_json (TOML-only crate; no JSON serialization) - deepseek-mcp: deepseek-protocol (proxy boundary doesn't consume protocol types) - deepseek-app-server: tracing (no tracing! macros in the transport layer) - deepseek-tui: bytes, csv, deepseek-tui-cli, tokio-stream - bytes: no Bytes-typed I/O paths in the TUI - csv: agent_swarm/spawn_agents_on_csv removed in #336/#357 - deepseek-tui-cli: TUI is the runtime, not the dispatcher; no facade calls - tokio-stream: futures-util::StreamExt is sufficient for our SSE / mpsc paths Verified by grep across each crate's `src/` — no `use` of the dep, no fully-qualified path references. cargo build, cargo clippy -D warnings, and cargo test continue to pass with the slimmed graph. Closes #341. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:18:31 -05:00
Hunter Bown	f0e1a6c63a	docs(prompts): tighten /rlm guidance — specialty tool, not first-class (#358 ) The previous rlm prompt guidance ("Treat rlm as a normal reasoning tool, not a last-resort escape hatch") encouraged the model to reach for rlm in cases where a direct read_file or focused agent_spawn would do better. The "RLM Is First-Class" framing was too encouraging given that rlm is genuinely a specialty tool: it pays off ONLY when the input can't fit in the model's context window. Three audit items from #358 addressed: 1. Reaching for rlm too often. Reframed as "specialty tool" with explicit do-not-use-when guidance front-loaded. The decomposition workflow now says "ONLY when an input genuinely doesn't fit" with a concrete size threshold (~50K tokens / a whole file / a long transcript / a multi-document corpus). 2. Tool description encourages overuse. The rlm tool's description() now leads with "DO NOT use this tool when..." (input fits, grep suffices, short classification, interactive exploration), and only then describes the legitimate use cases. Adds explicit cost/speed caveat. 3. Helpers documented as if they were tools. Both the rlm tool description and base.md/base.txt now state plainly: `llm_query`, `llm_query_batched`, `rlm_query`, `rlm_query_batched` live INSIDE the Python REPL. They are functions the sub-agent uses, NOT separately-callable tools the model invokes. Closes #358. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:16:31 -05:00
Hunter Bown	bc13dbfee7	fix(tui): show live submit disposition in composer hint (#345 ) When the user has typed something into the composer and hits Enter, the message goes to one of four fates depending on engine state: - Immediate (idle + online) — most common, sends right away - Steer (busy + tool execution) — forwards mid-turn - QueueFollowUp (busy + streaming text) — parks for after TurnComplete - Queue (offline) — parks on offline queue Previously the user had no way to tell which would fire BEFORE pressing Enter. The disposition flips with fast-changing internal state (whether the model is currently streaming text vs. running a tool, whether network connectivity has just dropped) and only the post-submit status toast hinted at the result — which is too late if you wanted a different behaviour. Fix: extend the composer's bottom hint line so when the composer has non-empty content, it shows what Enter will do RIGHT NOW. The hint flips live with engine state, so the user sees the real behaviour before pressing Enter: ↵ steer into current turn (sky blue, busy + tool execution) ↵ queue for next turn (muted, busy + streaming) ↵ offline queue (no engine) (warning yellow, offline) The Immediate case stays unhinted — that's the default and surfacing it would be noise. Closes #345. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:13:45 -05:00
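The four submit fates and their hint strings from bc13dbfee7 map naturally onto an enum and a lookup. The sketch below assumes a simplified four-state `EngineState`; that type and the functions `disposition` and `composer_hint` are hypothetical, while the variant names and hint text are taken from the commit message. Colour styling is omitted.

```rust
/// Simplified engine state (hypothetical; the real state is richer).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineState {
    Idle,
    RunningTool,
    StreamingText,
    Offline,
}

/// The four fates named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubmitDisposition {
    Immediate,
    Steer,
    QueueFollowUp,
    Queue,
}

/// What pressing Enter would do right now.
pub fn disposition(state: EngineState) -> SubmitDisposition {
    match state {
        EngineState::Idle => SubmitDisposition::Immediate,
        EngineState::RunningTool => SubmitDisposition::Steer,
        EngineState::StreamingText => SubmitDisposition::QueueFollowUp,
        EngineState::Offline => SubmitDisposition::Queue,
    }
}

/// Hint text for the composer's bottom line, or None when nothing is shown.
pub fn composer_hint(draft: &str, state: EngineState) -> Option<&'static str> {
    // No hint for an empty composer.
    if draft.is_empty() {
        return None;
    }
    match disposition(state) {
        // The default case stays unhinted to avoid noise.
        SubmitDisposition::Immediate => None,
        SubmitDisposition::Steer => Some("↵ steer into current turn"),
        SubmitDisposition::QueueFollowUp => Some("↵ queue for next turn"),
        SubmitDisposition::Queue => Some("↵ offline queue (no engine)"),
    }
}
```

Recomputing the hint on every frame from the current state is what would make it flip live, as the commit describes.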
Hunter Bown	c88b980d52	perf(session): metadata-only path for list_sessions — string-extract metadata block (#337 ) `SessionManager::list_sessions` previously called `serde_json::from_reader` to extract just the `metadata` field, which forced serde to scan every JSON token in the file just to validate structure — including the entire `messages` and `tool_log` arrays we were about to discard. For a user with hundreds of long sessions, a single startup `list_sessions()` was reading and parsing tens of MB of JSON. Optimization: read at most 64 KB up front and string-extract the top-level `metadata` object with a brace-balanced, string-aware scanner. Real metadata blocks are < 1 KB and always appear before the large `messages` payload, so the prefix read covers every realistic case. Falls back to a full-file read only if the metadata block isn't extractable from the prefix (legacy or oddly-formatted file). Net: typical session metadata load goes from O(file size) to O(1 KB) regardless of conversation length, and the disk read is bounded. Tests: - extract_top_level_metadata_skips_huge_messages_array — verifies the scanner correctly extracts metadata from a session whose `messages` array contains the literal string `"metadata"` in a user message. - extract_top_level_metadata_handles_braces_inside_strings — verifies brace-in-string handling so `{` / `}` inside JSON string values don't throw off the depth counter. Closes #337. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:11:37 -05:00
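A brace-balanced, string-aware scanner of the kind c88b980d52 describes could be written along these lines. This is a sketch under assumptions, not the project's implementation: the function name mirrors the test names in the commit, the helper names are hypothetical, and the 64 KB prefix read and full-file fallback are left to the caller (a `None` result is where the fallback would apply).

```rust
/// Index just past the closing quote of the JSON string that starts at `start`.
fn skip_string(bytes: &[u8], start: usize) -> Option<usize> {
    let mut i = start + 1;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 2, // skip the escaped character
            b'"' => return Some(i + 1),
            _ => i += 1,
        }
    }
    None // string was cut off by the prefix read
}

fn skip_ws(bytes: &[u8], mut i: usize) -> usize {
    while i < bytes.len() && bytes[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Index just past the `}` that closes the object opening at `start`.
fn skip_object(bytes: &[u8], start: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut i = start;
    while i < bytes.len() {
        match bytes[i] {
            b'"' => {
                i = skip_string(bytes, i)?;
                continue;
            }
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            _ => {}
        }
        i += 1;
    }
    None
}

/// Return the text of the top-level `"metadata": { ... }` object, if present.
pub fn extract_top_level_metadata(prefix: &str) -> Option<&str> {
    let bytes = prefix.as_bytes();
    let mut depth = 0usize;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'"' => {
                let end = skip_string(bytes, i)?;
                // Only a key directly inside the root object counts.
                if depth == 1 && &bytes[i + 1..end - 1] == b"metadata" {
                    let colon = skip_ws(bytes, end);
                    if bytes.get(colon) == Some(&b':') {
                        let open = skip_ws(bytes, colon + 1);
                        if bytes.get(open) == Some(&b'{') {
                            let close = skip_object(bytes, open)?;
                            return Some(&prefix[open..close]);
                        }
                    }
                }
                i = end;
                continue;
            }
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
        i += 1;
    }
    None
}
```

Because strings are skipped as a unit, braces and the word `metadata` inside message text never affect the depth counter or the key match, which is the property the two tests named in the commit guard.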
Hunter Bown	c5627ebb14	fix(config): /logout + new key now uses the new key (#343 ) After running /logout and entering a new API key, subsequent requests could still be sent with the old key because the resolution path checked the OS keyring before the in-memory override. The keyring still held the old credential, so it shadowed the freshly-typed one. Three changes: 1. `Config::deepseek_api_key()` — explicit override is now path 0. When `self.api_key` is explicitly set (non-empty, non-sentinel), it wins over keyring/env/provider-config. This is what the user just typed, so it should be authoritative. Existing keyring-based flows are unaffected: users who store their key via `auth set` have `self.api_key = None`, so path 1 (keyring) still wins for them. 2. `clear_api_key()` now wipes the keyring + provider-scoped keys. Previously only the legacy root `api_key = ...` line was stripped from config.toml. Now every known provider slot in the OS keyring (deepseek, nvidia-nim, openrouter, novita, fireworks, sglang) is deleted, and every `api_key` line nested in a `[providers.<name>]` table is also stripped. 3. `/logout` clears the in-memory `Config` too. The dispatcher handler in ui.rs::execute_command_input wipes `config.api_key` and every `config.providers.*.api_key` so a future clone of the long-lived Config doesn't leak the stale value. The companion onboarding flow in ui.rs also stamps the new key onto `config` itself rather than only on a one-shot clone, so subsequent /provider switches see the new credential. Test coverage: - `clear_api_key_strips_root_and_provider_scoped_keys` — verifies all three credential locations get wiped from a fixture config.toml. - `deepseek_api_key_prefers_explicit_in_memory_override` — guards the precedence flip. - `deepseek_api_key_ignores_sentinel_placeholder` — confirms the legacy `KEYRING_SENTINEL` placeholder still falls through. Closes #343. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:08:52 -05:00
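The precedence flip in c5627ebb14 (explicit in-memory key first, then the keyring) can be sketched as a small resolver. Several things here are assumptions: the sentinel's actual value, the placement of the environment variable after the keyring, and the function names `usable` and `resolve_api_key`. Provider-scoped config keys are omitted.

```rust
/// Placeholder meaning "look in the keyring"; the real value is an assumption.
pub const KEYRING_SENTINEL: &str = "__keyring__";

/// An explicit key only counts when it is non-empty and not the sentinel.
fn usable(key: Option<&str>) -> Option<&str> {
    key.map(str::trim)
        .filter(|k| !k.is_empty() && *k != KEYRING_SENTINEL)
}

/// Resolve the key to send: path 0 is the explicit override, path 1 the
/// keyring, then the environment (that last ordering is assumed).
pub fn resolve_api_key<'a>(
    explicit: Option<&'a str>,
    keyring: Option<&'a str>,
    env: Option<&'a str>,
) -> Option<&'a str> {
    usable(explicit).or(keyring).or(env)
}
```

Users who store their key in the keyring pass `None` (or the sentinel) as `explicit`, so the keyring still wins for them, matching the behaviour the commit says is unaffected.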
Hunter Bown	899c703d81	fix(tui): convert remaining tokio::spawn sites to spawn_supervised + restore terminal on panic (#346 ) Completes the panic-safety work #346 started in `a8be33b3`. Converts every trivial production tokio::spawn site to spawn_supervised so a panicking task writes a crash dump to ~/.deepseek/crashes/ and the parent process stays alive. Sites converted: - tools/rlm.rs:190 — RLM progress drain - tools/subagent/mod.rs:888 — run_subagent_task spawn - tools/subagent/mod.rs:988 — run_subagent_task resume - core/engine.rs:744 — sub-agent mailbox drainer - core/engine.rs:1601 — engine event-loop spawn - lsp/client.rs:127 — LSP writer - lsp/client.rs:129 — LSP reader - lsp/client.rs:135 — LSP dispatcher - rlm/bridge.rs:188 — bridge progress drain - task_manager.rs:790 — task worker loop - automation_manager.rs:822 — automation scheduler Sites left as-is (already panic-safe with their own catch_unwind): - runtime_threads.rs:1242, 1462 — custom AssertUnwindSafe + catch_unwind - mcp.rs:322 — MCP SSE loop with custom catch_unwind Sites that don't need conversion: - runtime_api.rs:287 — axum::serve runs in the parent task, not spawned - runtime_api.rs:1583+ — test-helper spawn_test_server inside #[cfg(test)] - All other spawn calls are in #[cfg(test)] modules where panics are expected to propagate. Also: - main.rs panic hook now restores the terminal (LeaveAlternateScreen + disable_raw_mode) before invoking the original hook, so a panicked TUI doesn't leave the user's shell stuck in alt-screen mode. - Adds spawn_supervised_tests::panicking_task_writes_crash_dump_and_does_not_kill_parent that proves a panicking task produces a dated crash log under ~/.deepseek/crashes/<task>.log and the parent task completes Ok. Closes #346. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 10:02:28 -05:00
Hunter Bown	0c55c732a2	chore(tools): full SwarmOutcome cascade — delete swarm.rs + event variant + UI handlers (#357 ) Completes the v0.8.5 cleanup #336 started: with the model-callable swarm surface gone, the supporting event/UI/state plumbing has no consumers. - Delete crates/tui/src/tools/swarm.rs (2215 lines, parked under #![allow(dead_code)] since #336) - Drop pub mod swarm from tools/mod.rs - Remove Event::SwarmProgress variant + handler in tui/ui.rs - Remove app.rs swarm fields: pending_swarm_task_count, swarm_jobs, last_swarm_id, swarm_card_index (and SwarmOutcome import + retain) - Remove subagent_routing.rs swarm helpers: seed_fanout_card_from_tool_call, sync_fanout_card_from_tool_result, sync_fanout_card_from_swarm_outcome, worker_slot_from_swarm_task, status_to_lifecycle, swarm_task_status_to_lifecycle - Simplify active_fanout_counts to read directly from the active FanoutCard - Simplify handle_subagent_mailbox is_fanout to only "rlm" dispatches - Strip dead "agent_swarm" / "spawn_agents_on_csv" string match arms in ui.rs (tool dispatch, task panel refresh, ListSubAgents trigger, active-cell skip), tool_card.rs (ToolFamily::Fanout), and tool_routing.rs (extract_fanout_prompts function deleted entirely) - Trim WorkerSlot to id/agent_id/status (label/model/nickname were only populated by worker_slot_from_swarm_task); remove unused with_agent ctor - Remove unused SubAgentManager::max_agents and ::available_slots methods (only swarm.rs called them) - Update widgets/agent_card.rs doc comments to point at rlm + future multi-child dispatch instead of agent_swarm FanoutCard decision: kept. It remains the visual primitive for rlm and for any future multi-child dispatch the parent agent makes via repeated agent_spawn calls. Net: 2698 lines removed, 90 added. Closes #357. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 09:56:33 -05:00


318 Commits