dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	8a679bf662	chore(hooks): tracing::warn on hook failures (#455 follow-up) Hook failures were silent — the executor returned a `HookResult` with `success=false`, but every call site discards it with `let _ = ...`. Operators tailing `deepseek` had no visibility into hook errors short of running each hook command by hand. Centralizes the logging inside `HookExecutor::execute` so every fire site benefits without sprinkling instrumentation. Logs through `tracing::warn!` with structured fields (`hook`, `event`, `exit_code`, `duration_ms`, `error`, `stderr_head`) so operators can `RUST_LOG=warn deepseek` and immediately see which hooks are misbehaving. Successful runs log nothing — `tool_call_before` / `tool_call_after` fire on every tool dispatch, so per-call success logging would be unreadably noisy. No behavioral change for users with no hooks (the function fast-paths out before reaching this branch). No behavioral change for users with passing hooks. Failed hooks still respect `continue_on_error` and the surrounding loop is unchanged.	2026-05-03 07:10:19 -05:00
Hunter Bown	c0b6c2a1e5	perf(hooks): fast-path skip when no hooks configured (#455 follow-up) Now that `tool_call_before` / `tool_call_after` fire on every tool dispatch, the cost of constructing a `HookContext` (which allocates for `workspace`, `model`, `session_id`, …) shows up on the hot path even when the user has zero hooks configured — the common case. Adds `HookExecutor::has_hooks_for_event(event)` as a cheap boolean gate that callers consult before building the context. The pre-check returns false when: * `config.enabled == false` (globally disabled). * No hook in the config has the given `event`. Wired through every fire site: * `tool_routing.rs::handle_tool_call_started` — `ToolCallBefore`. * `tool_routing.rs::handle_tool_call_complete` — `ToolCallAfter`. Also skips the `result.content.clone()` that the `with_tool_result` builder demands. * `ui.rs::dispatch_user_message` — `MessageSubmit`. * `ui.rs::apply_engine_error_to_app` — `OnError`. Inside `HookExecutor::execute` itself, also short-circuit before calling `context.to_env_vars()` when no hooks match the event — defends against a caller that builds the context but forgets to gate. Tests: 3 new tests cover empty-config / globally-disabled / per-event filtering. The existing 18 hook tests pass unchanged. No behavioral change for users with hooks configured; pure allocation-free fast path otherwise.	2026-05-03 07:07:11 -05:00
Hunter Bown	e569f2ca99	feat(hooks): fire message_submit + on_error too (#455 observer-only) Completes the observer-only slice of #455 by wiring the two remaining `HookEvent` variants that were defined but never fired: * `MessageSubmit` fires from `dispatch_user_message` before the message is handed to the engine. Hook context carries `message` so observers can log every prompt the user submits, redact for compliance audit, or page on `/wipe-database`-style content. Read-only. * `OnError` fires from `apply_engine_error_to_app` before the error cell reaches the transcript. Hook context carries `error`. Useful for paging on auth / billing / invalid-request failures without tailing the audit log. Combined with the prior `tool_call_before` / `tool_call_after` wiring, every `HookEvent` variant now has a live producer: `SessionStart`, `SessionEnd`, `MessageSubmit`, `ToolCallBefore`, `ToolCallAfter`, `ModeChange`, `OnError`. The `/hooks events` listing already enumerates them with their on-fire semantics. Hooks remain read-only observers in this slice. Mutation is v0.8.9 follow-up because it needs a synchronous-gate contract that would change semantics for every hook surface — including the lifecycle events that have shipped for many releases.	2026-05-03 07:01:52 -05:00
Hunter Bown	4310202645	feat(hooks): fire tool_call_before / tool_call_after (#455 observer-only) The `HookEvent::ToolCallBefore` and `HookEvent::ToolCallAfter` enum variants were defined but never fired from production code, so `[[hooks.hooks]]` entries with those events sat dormant. Wires the fires from `tui/tool_routing.rs`: * `handle_tool_call_started` fires `ToolCallBefore` with the hook context populated with `tool_name` and `tool_args`. The fire happens before any UI bookkeeping so observers see the call as early as possible. * `handle_tool_call_complete` fires `ToolCallAfter` after the cell finalization with the result content (or stringified error) + success flag. Stays last in the function so any UI state the hook might want to observe via shell-out is already settled. Hooks remain read-only observers in this slice. Mutation (modifying tool args before execution, or the result before it reaches the model) is a v0.8.9 follow-up that needs a synchronous-gate contract; the existing executor is fire-and-forget and adding mutation would change semantics for every existing hook surface (session_start, mode_change, etc.). Operators can wire `tool_call_before` / `tool_call_after` hooks in `~/.deepseek/config.toml` immediately to log every tool call, page on long shell exec, or audit risky operations. The `/hooks events` listing already enumerates them. No new tests — `tool_routing.rs` has no existing test surface, and the hook execution path is already covered via `hooks::tests::*`. The wiring is mechanically minimal.	2026-05-03 06:59:26 -05:00
Hunter Bown	a2c7c94f5d	test(pr): pin is_command_available contract (#451 follow-up) Adds a tiny test that exercises both branches of the helper used by `deepseek pr <N>` to detect `gh`'s presence: * Positive case — `sh` (POSIX baseline) is reported present. Gated on `cfg(unix)` because Windows runners aren't guaranteed to have `sh.exe` outside git-bash. * Negative case — a deliberately-implausible `this-command-cannot-exist-…ENOENT-marker` returns `false` rather than panicking from the `Command::new` exec failure. Pure additive coverage; no production change.	2026-05-03 06:54:05 -05:00
Hunter Bown	8ed1cb4e68	feat(hooks): /hooks events subcommand for discovery (#460 polish) The shipped `/hooks list` told users WHAT was configured but not WHAT they could configure. Without this, the only way to learn the supported `HookEvent` values is to grep source — not ideal when most users just want to wire up a notification on session_end. Adds `/hooks events` (aliases `event` / `list-events`) which prints every `HookEvent` variant alongside a short descriptive blurb (when it fires, current observability-vs-mutation status). Ordered lifecycle → per-tool → situational so the listing reads naturally and stays stable across releases. Updates `CommandInfo::usage` to `/hooks [list|events]` so the fuzzy autocomplete shows the new subcommand. Tests: 1 new test (`events_subcommand_lists_every_event_variant_in_documented_order`) pins the order, the per-event descriptive blurb format, and exhaustive variant coverage. The existing 6 hooks tests pass unchanged.	2026-05-03 06:51:27 -05:00
Hunter Bown	14931566b5	test(audit): pin emit_tool_audit contract (#500 follow-up) The `tool.spillover` audit emission shipped in 0fa042 added a new caller to `emit_tool_audit` but the function itself had no unit tests pinning its contract — operators relying on `DEEPSEEK_TOOL_AUDIT_LOG` deserve regression coverage on the JSONL writer. Adds 3 tests: * `emit_tool_audit_writes_jsonl_line_when_env_var_set` — verifies each call appends a parseable JSON line, with the expected `event` and `tool_id` keys reaching disk. * `emit_tool_audit_is_noop_when_env_var_unset` — pins the early-return when the env var is missing (no panic, no file side effects). * `emit_tool_audit_creates_parent_directory` — confirms the `create_dir_all(parent)` step works for previously-missing paths so operators can point the env var at a fresh path without a chicken-and-egg setup step. All three serialise through a static Mutex because they mutate process-global `DEEPSEEK_TOOL_AUDIT_LOG`. Cleanup happens on each test under the same guard.	2026-05-03 06:48:59 -05:00
Hunter Bown	a8e0693958	feat(doctor): report spillover dir + composer stash file (#422/#440 polish) The v0.8.8 polish stack added two on-disk surfaces operators might want to inspect — `~/.deepseek/tool_outputs/` for spilled tool output (#422 / #500), and `~/.deepseek/composer_stash.jsonl` for parked composer drafts (#440). Neither showed up in `deepseek doctor`, so users couldn't see at a glance "do I have parked drafts?" or "how much disk has spillover claimed?" Adds a `Storage:` section to the human-readable doctor and a `storage` object to the JSON doctor: * Spillover slot reports the dir's existence and entry count. Pre-creation state ("not yet created") is shown explicitly rather than as a missing dir — the dir is created lazily on first spill, not at boot. * Stash slot reports the file's existence and parked-draft count by re-reading via `composer_stash::load_stash`. Empty / missing stash shows the Ctrl+S hint so the user knows how to use the feature. The JSON schema always emits both nested slots regardless of state (so dashboard schemas stay stable across hosts); the human-readable hides the "not yet created" line for spillover when the dir is missing to keep the report scannable.	2026-05-03 06:46:20 -05:00
Hunter Bown	b1c6e6b173	feat(doctor): report .opencode + .claude skill dirs (#432 follow-up) The cross-tool skill discovery shipped in 432a0c1 walks `.opencode/skills/` and `.claude/skills/` alongside the `.agents/skills/` and `skills/` workspace folders, but the `deepseek doctor` output still only listed the original three slots. Operators staring at "where are my Claude-style skills?" had no way to confirm whether the new dirs were even being checked. Updates both surfaces: * Human-readable doctor — adds two conditionally-printed lines for `.opencode skills dir` and `.claude skills dir`. Empty dirs are omitted to keep the report scannable; the dirs exist on most workspaces only when the user has installed another AI tool's skill catalog there. * JSON doctor (`deepseek doctor --json`) — adds `opencode` and `claude` slots to the `skills` object alongside the existing `global`, `agents`, `local`. Each carries `path`, `present`, and `count`. JSON consumers see all five keys regardless of presence so dashboard schemas stay stable across hosts. The `selected_skills_dir` field still reflects the legacy "highest-precedence single dir" — workspace-aware discovery is done at runtime by `discover_in_workspace`, but `selected` is a useful "where do I install a NEW skill" hint and stays unchanged for backwards compatibility with existing diagnostic tooling.	2026-05-03 06:43:47 -05:00
Hunter Bown	5627d6535b	docs: document NO_ANIMATIONS, instructions array, /hooks, /stash Catches up `docs/CONFIGURATION.md` with the v0.8.8 polish stack so operators have one source of truth for the new surfaces: * `NO_ANIMATIONS` env override (#450) joins the existing environment-variable list, with a cross-reference to `docs/ACCESSIBILITY.md`. * New `### Instruction sources` section documents the `instructions = [...]` config field (#454): expansion rules, 100 KiB per-file cap with `[…elided]` marker, missing-file warning behavior, and the project-wholesale-replaces-user override semantics. * New `### /hooks listing` section documents the read-only slash command (#460 MVP) so users know how to introspect configured lifecycle hooks without `cat`-ing config.toml. * New `### Composer stash` section documents Ctrl+S + `/stash list|pop|clear` (#440) including the 200-entry cap and multiline preservation. Pure documentation; no code changes. Existing prompt-stability and config-loading tests are unaffected.	2026-05-03 06:39:29 -05:00
Hunter Bown	a368dc53b8	feat(commands): /hooks read-only lifecycle hook listing (#460 MVP) Slash command enumerates configured lifecycle hooks from the user's `[hooks]` table, grouped by event. The full picker / persisted enable-disable surface in #460 is still M-sized work; this MVP gives users a no-typing view of what's actually loaded — the most-asked question once hooks start firing. Implementation: * `crates/tui/src/commands/hooks.rs` formats the hook list with per-event headings, hook name (or `(unnamed)`), background marker, timeout, condition summary, and a 60-char shell command preview. * `condition_summary` covers every `HookCondition` variant (Always/ToolName/ToolCategory/Mode/ExitCode/All/Any) so the listing stays informative for compound conditions too. * `event_label` maps each `HookEvent` to its config-file string so the listing matches what the user wrote in TOML. * New `HookExecutor::config()` accessor exposes the underlying `HooksConfig` for read-only callers; doesn't open the door to mutation, which still belongs to the broader #460 work. * Registered in `commands::COMMANDS` with `aliases: &["hook"]`, usage `/hooks [list]`, and `MessageId::CmdHooksDescription` localized in en, ja, zh-Hans, pt-BR. * Wired into `command_palette::command_runs_directly` so pressing Enter from Ctrl+K runs `/hooks list` straight. Tests: 6 unit tests covering preview-cap truncation, newline stripping, condition-summary variants, event-label exhaustiveness, and BTreeMap-grouping ordering.	2026-05-03 06:36:37 -05:00
Hunter Bown	15127046e8	feat(stash): /stash clear subcommand to wipe the stash file (#440 polish) Pairs with `/stash list` and `/stash pop` so the user can fully manage the stash from inside the TUI without reaching for `rm`. * New `composer_stash::clear_stash()` returns the number of entries dropped so the slash command can report it. Atomic-write replaces the file with empty content; missing / empty files return `Ok(0)` without erroring. * `clear` / `wipe` / `drop` are accepted as the subcommand alias. The "unknown subcommand" hint now lists the three live subcommands explicitly. * CommandInfo usage updated to `/stash [list|pop|clear]` so `/help` and the autocomplete reflect the new option. * 3 new tests in `composer_stash`: returns-0 when file absent, returns-0 when file is empty, drops entries and reports count on a populated stash. No new dependency; reuses `crate::utils::write_atomic` for the truncate-and-rewrite.	2026-05-03 06:28:18 -05:00
Hunter Bown	ba871c56f6	feat(cli): deepseek pr <N> — pre-seed TUI with PR context (#451) `deepseek pr 1234` fetches the PR's title, body, base/head, URL, and full diff via `gh`, then launches the interactive TUI with a review prompt already typed in the composer. The user can edit before sending or hit Enter to fire as-is. Falls back gracefully with an actionable error when `gh` is not on PATH. Implementation: * `Commands::Pr { number, repo, checkout }` subcommand. Optional `--repo <owner/name>` mirrors `gh pr view`'s flag. Optional `--checkout` opt-in for `gh pr checkout`; default is to leave the working tree alone since `gh pr checkout` errors out on dirty trees. * `run_pr` helper drives three best-effort gh shell-outs (`pr view --json`, `pr diff`, optional `pr checkout`) and formats a structured prompt: PR header → URL → branches → description → fenced ```diff block. * `format_pr_prompt` caps the diff at 200 KiB with codepoint-safe truncation so a massive PR doesn't blow the model's context window before the user even hits Enter. * New `TuiOptions::initial_input: Option<String>` plumbs the pre-typed text into `App::new` (which now branches its composer-state init around the option). Cursor lands at the end of the seed text. Future callers (welcome screens, share-link landing pages, etc.) can reuse the same channel. * `run_interactive` gains an `initial_input: Option<String>` parameter; existing callers pass `None`. Tests: 3 new tests in `pr_prompt_tests` cover the happy path (title/url/branches/body/diff render correctly), empty-input fallbacks (placeholder for missing title/body/branches/url), and codepoint-safe truncation when the diff exceeds the 200 KiB cap. Bulk update: every other `TuiOptions { ... }` test-builder across the workspace (~21 sites) gains `initial_input: None` so the new field doesn't break the existing test suite.	2026-05-03 06:23:54 -05:00
Hunter Bown	a9222f4b8c	feat(stash): make /stash run directly from the command palette (#440 polish) `/stash` defaults to `list` when invoked without an argument, so in the Ctrl+K command palette it should execute on Enter rather than insert `/stash ` and wait for the user to type `list`. The identical pattern already applies to `/queue`, which has the same optional-arg shape. Adds `"stash"` to the `command_runs_directly` allowlist alongside `queue`. The fuzzy-search rank, label match, and section grouping already pick up `/stash` automatically because they iterate over `commands::COMMANDS` (which gained the entry in `2db4843`). No behavior change on type-then-Enter — only on the hit-Enter-from-the-palette path. The existing 8 command-palette tests pass unchanged.	2026-05-03 06:14:34 -05:00
Hunter Bown	2db48435e8	feat(stash): register /stash in /help and autocomplete (#440 polish) The slash command landed in 6fb87 but only via the dispatch match arm — `/help` and the fuzzy autocomplete consult `COMMANDS: &[CommandInfo]` to enumerate available commands, and without a `CommandInfo` entry the new `/stash` was effectively hidden from discovery. Adds a `CommandInfo` row with `aliases: &["park"]`, a `/stash [list|pop]` usage hint, and a new `MessageId::CmdStashDescription` localized in en, ja, zh-Hans, pt-BR. The description reminds users that Ctrl+S is the matching push entry point — both surfaces should reinforce each other in the help overlay. No behavior change on the dispatch path; this is pure discoverability.	2026-05-03 06:13:06 -05:00
Hunter Bown	6fb8739feb	feat(composer): prompt stash — Ctrl+S parks, /stash list+pop (#440) A stash is a side-channel from history: it holds drafts the user parked deliberately instead of submissions made in the past (which live in `composer_history.rs`). * `crates/tui/src/composer_stash.rs` — JSONL-backed store at `~/.deepseek/composer_stash.jsonl`. One JSON object per line with `ts` (RFC 3339) and `text`. Self-healing parser drops malformed lines instead of poisoning the file. Multi-line drafts round-trip intact via JSON's newline escaping. Capped at 200 entries; oldest pruned at push time. Empty / whitespace-only text is silently dropped. * `crates/tui/src/commands/stash.rs` — `/stash list` renders the stash with one-line previews and timestamps; `/stash pop` restores the most recently parked draft into the composer (LIFO) and rewrites the file. `/park` aliases `/stash`. * Composer Ctrl+S handler in `tui/ui.rs` — pushes the current draft onto the stash, clears the composer, and surfaces a toast confirming the action so the no-op-feel doesn't fool users into thinking nothing happened. Empty composers are a no-op so a stray Ctrl+S can't pollute the file. * New `KbStashDraft` keybinding entry registered in the help overlay; localized in en, ja, zh-Hans, pt-BR. Tests: 7 unit tests in `composer_stash.rs` cover round-trip, LIFO pop, empty-on-pop, drop-empty-text, multi-line preservation, malformed-line resilience, and cap pruning. 4 unit tests in `commands/stash.rs` cover the preview helper's truncation, multi-line first-line behavior, and empty-input handling.	2026-05-03 06:09:35 -05:00
Hunter Bown	99223b148c	docs(prompt): list load_skill in the model's toolbox reference (#434) The new `load_skill` tool was registered into the agent and plan mode tool sets in 0c1699 but the prompt's `## Toolbox` quick-reference still listed only the legacy progressive-disclosure pattern (system prompt → read_file). The model has to read the tool description to know `load_skill` exists, but without a hint in the toolbox it's easy to miss when scanning. Adds a `Skills` line that points at `load_skill` and explains when to prefer it over `read_file` + `list_dir`. Pulls from the existing `## Skills` section above for context, so the model sees one short cross-reference instead of duplicate setup instructions. No code change; prompt-only doc edit. Existing prompt-stability tests pass unchanged because they don't compare prose.	2026-05-03 06:01:15 -05:00
Hunter Bown	0fa042dc99	feat(audit): emit tool.spillover events when output is spilled (#500 polish) The existing `tool.result` audit event records that a tool finished but says nothing about spillover — operators tailing `~/.deepseek/audit.log` couldn't see when 200 KiB of stdout landed under `~/.deepseek/tool_outputs/`. Adds a discrete `tool.spillover` event keyed off `apply_spillover`'s return value, fired in both the sequential and parallel tool paths so the log entry exists regardless of how the tool was scheduled. Each event carries: {"event": "tool.spillover", "tool_id": "...", "tool_name": "exec_shell", "path": "/.../call-abc.txt"} This is a pure observability addition. The model still receives the same truncated head + footer; the UI still renders the inline `full output: <path>` annotation; the spillover writer contract is unchanged. No new tests — `apply_spillover` already has unit-level coverage and the engine paths are exercised by integration runs.	2026-05-03 05:58:02 -05:00
Hunter Bown	6b0a60883a	test(skill): integration tests for the load_skill execute path (#434/#432) The five existing tests cover the helpers (`format_skill_body`, `collect_companion_files`) directly. Adds two integration tests that drive the full `LoadSkillTool::execute` async path: * `execute_finds_skills_in_opencode_dir_via_workspace_discovery` — installs a skill under `<workspace>/.opencode/skills/` and verifies the tool finds it via `discover_in_workspace`, returns the body, and stamps `metadata.skill_path` pointing at the .opencode dir. Pins #432's multi-dir wiring through the actual tool entry point, not just the unit-level helper. * `execute_returns_helpful_error_for_unknown_skill` — verifies the "skill not found" error includes both the missing name and the available skill list so the model can recover without a separate discovery call. Both use `#[tokio::test]` because `ToolSpec::execute` is async. ToolContext is constructed via the existing `ToolContext::new` helper so the test stays hermetic across hosts.	2026-05-03 05:56:29 -05:00
Hunter Bown	d7017b7829	feat(skills): walk workspace .opencode + .claude skill dirs (#432) The skills catalogue and `load_skill` tool now scan every candidate directory in the workspace plus the global default, not just the first one that exists: <workspace>/.agents/skills (deepseek-native convention) <workspace>/skills (flat, project-local) <workspace>/.opencode/skills (OpenCode interop) <workspace>/.claude/skills (Claude Code interop) ~/.deepseek/skills (global, user-installed) Skills installed for any AI-tool convention land in the same catalogue without the user having to symlink or duplicate files. Name conflicts resolve first-match-wins per the precedence list above, so workspace-local skills shadow user/global ones — that's the right shadowing for "this repo overrides my defaults". Implementation: * `skills::skills_directories(workspace)` returns the existing candidate dirs in precedence order (host-dependent for the global default). * `skills::discover_in_workspace(workspace)` walks each, merges the discovered skills, and accumulates warnings. * `render_available_skills_context_for_workspace(workspace)` wraps `discover_in_workspace` for `prompts.rs`. The legacy single-dir `render_available_skills_context(skills_dir)` is retained as a fallback so callers that don't have a workspace view (e.g. mcp_server.rs) still work. * `LoadSkillTool` (#434) routes through `discover_in_workspace` so its lookup matches what the system-prompt catalogue advertises. The "skill not found" error message now lists the searched dirs to help the user debug missing installs. Tests: 4 new tests in `skills/mod.rs`: precedence-order resolution, first-wins merge across .agents and .claude, .opencode discovery, system-prompt rendering for cross-tool dirs. The existing 6 single-dir tests pass unchanged.	2026-05-03 05:52:28 -05:00
Hunter Bown	8290b136e1	feat(tui): push DISAMBIGUATE_ESCAPE_CODES on startup (#442) Opt into the Kitty keyboard protocol's escape-code disambiguation so terminals that support it (Kitty, Ghostty, Alacritty 0.13+, WezTerm, recent Konsole / xterm) report unambiguous events for Option/Alt-modified keys, plain Esc, and multi-byte sequences. Push happens after `enable_raw_mode` and the alt-screen / mouse-capture / bracketed-paste setup so the order matches shutdown's reverse-order pop. Only the disambiguation tier is pushed — `REPORT_EVENT_TYPES` and the higher tiers emit release events that the existing key handlers would mis-route as duplicate presses. Pop on exit was already wired in main.rs (panic) and ui.rs (normal shutdown) per #443; the recent #443 follow-up extended that to the suspend paths so editor / shell-suspend children inherit a clean keyboard mode. The push + the four pops form a complete pair. Failure to push is logged at debug level and ignored — a quirky terminal can't block startup. On terminals without protocol support the escape sequence is silently discarded and behaviour is identical to today (iTerm2, Terminal.app, Windows 10 conhost). No new dependency; everything runs through crossterm's existing `PushKeyboardEnhancementFlags` command.	2026-05-03 05:45:52 -05:00
Hunter Bown	e8af3cd37d	feat(tools): load_skill model-callable tool (#434) Adds a `load_skill` tool that takes a skill id and returns the SKILL.md body plus the sibling companion-file list in one tool call. The existing progressive-disclosure pattern (system prompt lists skills → model `read_file <path>`) still works; this tool is the higher-level affordance for skills that ship with multiple resource files. Implementation: * `LoadSkillTool` lives in `crates/tui/src/tools/skill.rs`. Read-only, auto-approved, parallel-safe. * On call, resolves the active skills directory via the new `skills::resolve_skills_dir` helper, which mirrors `App::new`'s hierarchy: `<workspace>/.agents/skills` → `<workspace>/skills` → `~/.deepseek/skills`. No new plumbing through ToolContext — the workspace is already there. * Returns the skill body wrapped in a self-contained block: description quote, source path, the SKILL.md verbatim, and a `## Companion files` section listing siblings (sorted lex, deterministic for tests). Solo skills skip the companions section entirely so the tool result stays tight. * Errors with a helpful hint when the name is unknown — the hint includes the catalogue ("Available: foo, bar, baz") so the model can recover without an extra discovery call. * Wired into `ToolRegistryBuilder::with_skill_tools` and pulled into both Agent and Plan tool-setup paths. Plan mode benefits because skills are read-only references that planners often need. Tests: 5 unit tests covering: description-headed body, companion enumeration excluding SKILL.md and nested dirs, empty result for solo skills, and the conditional `## Companion files` section.	2026-05-03 05:43:18 -05:00
Hunter Bown	20913b2f17	test(config): pin instructions-array merge semantics (#454 follow-up) Adds four tests that pin the documented contract for the new `instructions = [...]` field added in 0c1699: * Project array replaces the user array wholesale (the typical "merge" pattern is for users who want both — they list ~/global.md inside the project array). * Explicit `instructions = []` clears the user list — a project signalling "this repo doesn't want any of those globals". * Absent project field leaves the user list intact (nothing in the project file → user wins by default). * Empty / whitespace-only entries are filtered out — the user shouldn't get a "could not read instructions file" warning for a stray `""` in the array. These were the semantics promised in the original #454 commit and the `config.example.toml` doc; pinning them with tests prevents regressions.	2026-05-03 05:33:09 -05:00
Hunter Bown	5deaf97253	fix(tui): pop keyboard flags on suspend paths too (#443 follow-up) `main.rs` (process panic) and the normal TUI shutdown both pop keyboard enhancement flags before handing the terminal back to the child shell. The two suspend paths — `pause_terminal` (Ctrl+Z and shell-suspend) and `external_editor::spawn_editor_for_input` (composer `$EDITOR` launch) — were missing the same defensive pop. Today this is dormant: the TUI doesn't push keyboard enhancement flags explicitly, so there's nothing to pop. The fix is defence-in-depth: the day a future code path enables the flags (kitty keyboard protocol for sub-second-precision modifier reporting, say), the suspend handlers won't leak the half-configured input mode to Vim / less / a shell child. Aligns the four terminal-handoff sites (shutdown, panic, suspend, editor) so they all do the same thing.	2026-05-03 05:29:11 -05:00
Hunter Bown	ac0c16996e	feat(config): instructions array merged into system prompt (#454 ) Adds a new optional `instructions = ["./AGENTS.md", "~/.deepseek/global.md"]` config field that's loaded at startup and concatenated into the system prompt, in declared order, above the skills block. * `Config::instructions: Option<Vec<String>>` — raw paths from `~/.deepseek/config.toml` or the per-project overlay. * `Config::instructions_paths()` — `expand_path` each entry, drop empties, return the resolved `Vec<PathBuf>`. * `merge_project_config` — project's array replaces the user-level array wholesale (including `instructions = []` to clear the user list for the current repo). The typical "merge" pattern is for users who want both — they list `~/global.md` inside the project array. * `EngineConfig::instructions: Vec<PathBuf>` — threaded from config through both engine entry points (`Engine::new` for Default and `refresh_system_prompt` for runtime swaps). * `prompts::render_instructions_block(paths)` — loads each file in order, caps each at 100 KiB with a `[…elided]` marker on overflow, skips missing files with a tracing warning. Returns `None` when nothing renders so the caller appends nothing. * `system_prompt_for_mode_with_context_and_skills` gains an `instructions: Option<&[PathBuf]>` parameter. Block lives between the project-context block and the skills block so it benefits from KV prefix caching and per-project overrides apply consistently turn-over-turn. Documentation: * `config.example.toml` documents the field, the wholesale- override semantics, and the size cap. Tests: * 5 new tests in `prompts.rs`: no-op for empty input, skip missing files, declared-order concatenation, skip empty files, truncate oversize files, plus an end-to-end test that the block appears in the assembled system prompt when configured.	2026-05-03 05:25:31 -05:00
Hunter Bown	5e83f073b1	feat(footer): cumulative session-elapsed indicator (#448 ) Adds `App::session_started_at: Instant` (set at construction) and a low-priority `worked Nh Mm` chip in the footer's right cluster that surfaces session age once it crosses 60s. * `footer_worked_chip(elapsed)` returns empty spans for the first minute of a session so a fresh launch doesn't render a noisy ticker. Above the threshold it reuses the multi-day `humanize_duration` helper (#447) so the band promotion stays consistent: `1m`, `3h 12m`, `2d 5h`, `1w 2d`. * The chip slots in last in `auxiliary_spans`, which means under narrow widths it's the first thing the priority-drop loop removes — the existing chips (coherence / agents / replay / cache / mcp) keep their slots. * `FooterProps` carries a captured `worked: Vec<Span<'static>>` built at props-build time (matches the existing `retry` capture pattern). Render stays pure, tests can pin a known state without relying on wall-clock. Tests: 3 new tests in `tui/widgets/footer.rs` — chip hidden under 60s, chip rendered with humanized labels at 60s / 3h 12m / 2d 5h bands. The existing `from_app_idle_state` test gains a `worked.is_empty()` assertion (the test app is freshly constructed, well under the 60s threshold).	2026-05-03 05:17:01 -05:00
Hunter Bown	6dfb10f321	feat(a11y): NO_ANIMATIONS env override + accessibility docs (#450 ) `fancy_animations: false` and `low_motion: true` already exist on the settings struct, but the flag was undocumented and the only ways to opt in were the `/settings` slash command or hand-editing `~/.config/deepseek/settings.toml` — there was no environment- level signal that platform a11y tooling could carry forward. * `NO_ANIMATIONS=1` env var now forces `low_motion = true` and `fancy_animations = false` at startup, regardless of what's on disk. Recognises `1`, `true`, `yes`, `on` (case-insensitive); any other value is treated as unset. * `Settings::apply_env_overrides()` is now called at the end of `Settings::load()`, so every consumer (App::new, /config, the doctor surface) sees the override applied uniformly. The override is a startup-time overlay — changing the env var mid-session has no effect. * New `docs/ACCESSIBILITY.md` documents the existing `low_motion`, `fancy_animations`, `calm_mode`, `show_thinking`, and `show_tool_details` toggles plus the `NO_ANIMATIONS` startup override. Includes guidance for screen-reader users and a link back to this issue for follow-up motion regressions. Tests: 3 new tests in `settings.rs` (force-low-motion-on, override- user-opt-in, truthy-spelling-recognition). All three serialise through a static Mutex so the cargo parallel runner doesn't observe interleaved env mutations.	2026-05-03 05:09:17 -05:00
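The truthy-value rule for `NO_ANIMATIONS` can be sketched as below. `MotionSettings`, `is_truthy`, and `apply_no_animations` are hypothetical names, not the project's API; the sketch takes the env value as a parameter instead of reading the process environment so it stays easy to test, and it assumes no whitespace trimming because the message does not mention any.

```rust
/// Minimal stand-in for the two motion-related fields on the settings struct.
#[derive(Debug, Clone, PartialEq)]
struct MotionSettings {
    low_motion: bool,
    fancy_animations: bool,
}

/// True for `1`, `true`, `yes`, `on` (case-insensitive); any other value counts as unset.
fn is_truthy(value: &str) -> bool {
    matches!(
        value.to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Startup-time overlay: a truthy value forces low motion regardless of what was loaded from disk.
fn apply_no_animations(settings: &mut MotionSettings, env_value: Option<&str>) {
    if matches!(env_value, Some(v) if is_truthy(v)) {
        settings.low_motion = true;
        settings.fancy_animations = false;
    }
}
```

A caller would pass something like `std::env::var("NO_ANIMATIONS").ok().as_deref()` once, at load time, which is consistent with the note that changing the variable mid-session has no effect.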
Hunter Bown	3625b887fa	feat(ui): humanize_duration handles hours, days, and weeks (#447) Long-running sessions (multi-hour cycles, multi-day automations) were rendering with the seconds/minutes-only formatter, so a two-day session showed as `2880m 0s` and `/goal` status used Rust's `Debug` form of `Duration` (`188415.234s`). `humanize_duration` now walks through w/d/h/m/s and caps the output at two units so it stays compact in headers and notifications: * `45s`, `1m 12s`, `59m 59s` (existing seconds/minutes path) * `1h`, `2h 2m`, `3h 12m` (was `192m 30s`) * `1d`, `1d 1h`, `2d 5h` (the multi-day case from the issue) * `1w`, `1w 1d`, `3w 2d` (long-running automation) The two-tier rule drops sub-minute precision once you're past the hour boundary; the goal is "is this a couple of hours or days," not stopwatch precision. `/goal` status now routes through this formatter so multi-day goal-elapsed times read as `2d 4h` instead of the previous `188415.234s` Debug form. The notification system was the existing caller and picks up the new format automatically. Tests: 4 test functions in `notifications.rs` covering the four formatting bands (s/m, h/m, d/h, w/d) plus the boundary cases on each unit.	2026-05-03 05:05:30 -05:00
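One way to get the two-unit output listed above is to pick the largest unit that fits and then append only the next-smaller unit when it is non-zero. This is a sketch of that rule, not the project's implementation: the `UNITS` table and the choice to consider only the adjacent unit are assumptions inferred from the examples (`1h`, `3h 12m`, `2d 5h`, `1w 1d`).

```rust
use std::time::Duration;

/// Units from largest to smallest, with their length in seconds.
const UNITS: [(&str, u64); 5] = [
    ("w", 604_800),
    ("d", 86_400),
    ("h", 3_600),
    ("m", 60),
    ("s", 1),
];

/// Formats a duration with at most two adjacent units, dropping zero-valued tails.
fn humanize_duration(elapsed: Duration) -> String {
    let total = elapsed.as_secs();
    // Index of the largest unit that fits; falls back to seconds for a zero duration.
    let first = UNITS
        .iter()
        .position(|&(_, size)| total >= size)
        .unwrap_or(UNITS.len() - 1);
    let (label, size) = UNITS[first];
    let mut out = format!("{}{}", total / size, label);
    // Second tier: only the next-smaller unit, and only when it is non-zero.
    if let Some(&(next_label, next_size)) = UNITS.get(first + 1) {
        let next = (total % size) / next_size;
        if next > 0 {
            out.push_str(&format!(" {}{}", next, next_label));
        }
    }
    out
}
```

Because only the adjacent unit is considered, `192m 30s` collapses to `3h 12m` and an exact hour renders as `1h` with no trailing `0m`.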
Hunter Bown	0b99ad1f25	feat(engine): wire tool-output spillover into the engine and pager (#500 ) The spillover writer (#422) and inline cell annotation (#423) were already in place; this commit makes the pipeline actually fire and gives the user a way to see the elided tail. * `apply_spillover` lives in `tools/truncate.rs` and mutates a `ToolResult` in place: writes the full content to `~/.deepseek/tool_outputs/<id>.txt`, replaces the inline content with a 32 KiB head plus a footer pointing at the file, and stamps `metadata.spillover_path` so downstream renderers can find it. Skips error results so the model still sees the failure verbatim. Preserves prior metadata when present. * `core/engine/turn_loop.rs` calls `apply_spillover` immediately after `execute_tool_with_lock` returns, before the result fans out to the model context (`ContentBlock::ToolResult`) and the UI (`Event::ToolCallComplete`). Both the parallel and sequential tool paths get the same hook so the model and the UI always see the same truncated content. * `tui/ui.rs::open_details_pager_for_cell` now folds the full spillover-file body into the tool-details pager when the focused cell has a `spillover_path`. Truncated head stays at the top (so the user can see what the model received) followed by a `── Full output (spillover) ──` separator and the file body. Missing files render an inline notice instead of silently truncating. * The model's footer ("Use `read_file path=…` if you need the elided tail") teaches the agent how to recover the rest of the payload on its next turn, so spilled output is not lost — just not paid for in context tokens unless the agent decides it actually needs the tail. Tests: 4 new unit tests in `tools/truncate.rs` (no-op below threshold, no-op for errors, truncate + stamp above threshold, preserve prior metadata). 3 new tests in `tui/ui/tests.rs` for the pager helper (no-op without spillover_path, file-load happy path, graceful notice when the file is missing).	
2026-05-03 05:02:11 -05:00
Hunter Bown	637d0f088f	fix(agents): list Implementer/Verifier in agent_spawn + agent_assign schemas (#404) The SubAgentType enum gained `Implementer` and `Verifier` variants in #404, but the JSON-schema `description` strings on AgentSpawnTool::input_schema and DelegateToAgentTool::input_schema still listed the pre-#404 set (general/explore/plan/review/custom). The model only sees those descriptions, so the new roles were effectively hidden behind a docs lookup. Updates both descriptions to the post-#404 surface and references docs/SUBAGENTS.md for posture. Also adds the long-form aliases (builder/validator/tester) to the agent_assign hint so it matches the canonical alias map. Pure copy change; no behaviour delta.	2026-05-03 04:50:51 -05:00
Hunter Bown	482fcdee7c	docs(changelog): collapse #422 + #423 spillover entry Both halves have now shipped; a combined entry reads more clearly than two separate ones split across the Added section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:45:03 -05:00
Hunter Bown	de4085304d	feat(tui): inline spillover-path annotation in tool cells (#423 ) PR #422 (sister commit on this branch) shipped the storage half: \`crates/tui/src/tools/truncate.rs\` writes large tool outputs to \`~/.deepseek/tool_outputs/<id>.txt\` and the boot prune drops files older than 7 days. This commit ships the UI half — the inline annotation that surfaces the spilled path in the tool cell so the user (and the model) can find the elided tail. ### What's wired - New \`spillover_path: Option<PathBuf>\` field on \`GenericToolCell\`. Threaded through every construction site (production + test fixtures = 28 sites; bulk-updated via a Python regex that preserves indentation per site). - \`tool_routing::push_orphan_tool_completion\` now reads \`ToolResult.metadata.spillover_path\` and stamps it on the cell. When tools start writing the metadata field (#500's wiring step), the annotation lights up automatically. - \`GenericToolCell::lines_with_mode\` emits a one-line muted annotation in \`RenderMode::Live\` only: full output: /Users/you/.deepseek/tool_outputs/call-abc12.txt Transcript-mode replay omits the annotation because the full output is already inline. - \`render_spillover_annotation\` truncates the path to fit narrow widths (40-col sidebar friendly) using the existing \`truncate_text\` helper. ### Why no OSC 8 hyperlink yet The OSC 8 wrap-link helper lives on PR #515's branch (also stacked on \`chore/v0.8.8-stabilization\`); both PRs land independently to \`main\`. Once both are in, a follow-up commit can wrap the path in \`osc8::wrap_link\` so supporting terminals make it Cmd+click-openable. The plain-text path works in every terminal today, so there's no functional regression. 
### Tests 4 new tests in \`tui::history::tests\`: - \`render_spillover_annotation_shows_path\` — full path appears in the live-mode render - \`render_spillover_annotation_omitted_in_transcript_mode\` — transcript replay leaves the annotation off - \`render_spillover_annotation_omitted_when_no_path_set\` — the common case (most tool results don't trigger spillover) is unaffected - \`render_spillover_annotation_truncates_to_width\` — narrow widths don't overflow the line ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1877 + supporting (was 1873) Closes #423. #500 (preview pane) now has both halves of its prerequisites in place — the bytes are on disk (#422) and the path is surfaced in the cell (#423). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:44:43 -05:00
Hunter Bown	cea4617fb4	docs(changelog): record #422 spillover writer in v0.8.8 entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:38:17 -05:00
Hunter Bown	cf616e03bd	feat(tools): spillover-file writer + 7-day boot prune (#422 ) #500 (tool-output spillover preview pane) explicitly depends on #422 (the storage writer) and #423 (the UI annotation). This ships the storage half so the other two unblock cleanly. ### What's wired New module \`crates/tui/src/tools/truncate.rs\`: - \`spillover_root()\` — resolves \`~/.deepseek/tool_outputs/\`. - \`spillover_path(id)\` — sanitises the tool call id (ASCII alphanumerics + \`-\`/\`_\`, drops \`.\` so \`..\` can't escape) and returns \`<root>/<id>.txt\`. - \`write_spillover(id, content)\` — atomic write via the existing \`utils::write_atomic\` helper. Creates parent directory on demand. - \`prune_older_than(max_age)\` — drops files older than \`max_age\` by mtime. Returns the count pruned. Per-file errors are logged and skipped, never propagated. - \`maybe_spillover(id, content, threshold, head_bytes)\` — convenience for the "too long? spill it." pattern. Walks back to the previous UTF-8 char boundary so the head slice is always valid \`str\`. Constants: - \`SPILLOVER_DIR_NAME = "tool_outputs"\` - \`SPILLOVER_THRESHOLD_BYTES = 100 KiB\` (mirrors \`MAX_MEMORY_SIZE\` for cross-feature consistency) - \`SPILLOVER_MAX_AGE = 7 days\` (mirrors workspace snapshot prune) Boot wiring in \`run_interactive\` calls \`prune_older_than\` unconditionally; non-fatal — any error is logged at WARN and the TUI starts anyway. ### Module-level \`#[allow(dead_code)]\` The boot-prune is the only live caller today. The storage helpers (\`write_spillover\`, \`maybe_spillover\`, \`spillover_path\`) are intentionally unused outside the module's own tests until #423 / #500 land — those follow-ups need the bytes to exist, and the contracts are pinned by tests so they can't drift before then. Module-level \`#![allow(dead_code)]\` documents the deferral with a comment pointing at the follow-up issues. 
### Tests 8 unit tests in \`tools::truncate::tests\`: - \`sanitise_id\` keeps safe chars, drops dangerous ones (\`..\`, \`/etc/passwd\` traversal attempts). - \`write_spillover\` creates the directory and writes content. - \`write_spillover\` rejects empty / fully-invalid ids. - \`maybe_spillover\` returns \`None\` below threshold. - \`maybe_spillover\` writes + returns the head slice above threshold. - \`maybe_spillover\` walks back to a char boundary so the head string is never mid-codepoint (regression test using 4-byte whale emojis). - \`prune_older_than\` is a no-op when the root doesn't exist. - \`prune_older_than\` keeps fresh files and drops stale ones via a Unix \`utimensat\` backdating helper. Tests serialize through a static \`Mutex\` because they share process-global \`$HOME\`; the \`with_test_home\` helper documents the SAFETY contract for the env-var override. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1873 + supporting (was 1865) Closes #422 (storage half). #423 and #500 remain open with the bytes now reachable on disk for them to cite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:38:00 -05:00
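The two safety properties called out in this commit (an id that cannot escape the spillover directory, and a head slice that never ends mid-codepoint) can be sketched as follows. `sanitise_id` mirrors the name used in the test list; `head_slice` is a hypothetical helper name, and neither body is taken from the repository.

```rust
/// Keeps ASCII alphanumerics plus `-` and `_`; everything else is dropped,
/// including `.` and `/`, so `..` and path separators cannot survive.
fn sanitise_id(id: &str) -> String {
    id.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect()
}

/// Returns the longest prefix of `content` that is at most `head_bytes` long
/// and ends on a UTF-8 char boundary.
fn head_slice(content: &str, head_bytes: usize) -> &str {
    if content.len() <= head_bytes {
        return content;
    }
    let mut cut = head_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !content.is_char_boundary(cut) {
        cut -= 1;
    }
    &content[..cut]
}
```

An id that sanitises to the empty string would then be rejected by the writer, as the test list for `write_spillover` describes.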
Hunter Bown	01fa11b96f	docs(changelog): note /sessions prune slash command in #406 entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:31:18 -05:00
Hunter Bown	89500e4ebe	feat(commands): /sessions prune <days> slash command (#406 phase-1.5) The previous commit shipped \`SessionManager::prune_sessions_older_than\` as a bare helper marked \`#[allow(dead_code)]\` pending phase-2 wiring. This commit wires it into a user-callable slash command so users can clean up stale sessions today, and removes the dead-code allow. ### Surface \`\`\` /sessions → open the picker (existing) /sessions show \| list \| picker → alias for the picker /sessions prune <days> → drop sessions older than N days \`\`\` \`/sessions prune 30\` returns "pruned N sessions older than 30d" or "no sessions older than 30d to prune". Errors: - missing arg → usage hint - non-positive / non-integer arg → typed error - unknown subcommand → typed error with usage The prune handler builds a fresh \`SessionManager\` from \`default_location\` so it reads the same \`~/.deepseek/sessions/\` directory the persistence layer writes; doesn't take a lock since it's a one-shot CLI-style operation that runs to completion. ### What changed - \`commands::session::sessions\` now takes \`arg: Option<&str>\` and dispatches \`show\` / \`prune\` / unknown. - New \`prune\` private fn parses the days argument, opens \`SessionManager::default_location\`, calls \`prune_sessions_older_than\` with the corresponding \`Duration\`. - \`commands::COMMANDS\` table updated: usage now reads \`/sessions [show\|prune <days>]\`. - \`commands::mod\` dispatch arm passes \`arg\` through. - \`SessionManager::prune_sessions_older_than\` doc comment updated to reflect the live wiring; \`#[allow(dead_code)]\` removed. 
### Tests 5 new tests in \`commands::session::tests\`: - existing \`test_sessions_pushes_picker_view\` updated to the new signature - \`test_sessions_show_subcommand_pushes_picker_view\` — \`/sessions show\` is an explicit alias for the picker - \`test_sessions_prune_requires_days_argument\` — missing arg produces usage hint - \`test_sessions_prune_rejects_non_positive_days\` — \`0\`, negative, non-numeric, and decimal inputs are all rejected - \`test_sessions_unknown_subcommand_errors\` — typo path errors with subcommand list ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1865 + supporting Refines #406 — phase 1.5 (user-callable surface) shipped on top of phase 1 (helper). Phase 2 (boot-prune + retention policy) stays open for v0.8.9 once the policy is decided. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:31:00 -05:00
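The argument rules for `/sessions prune <days>` (missing argument gives a usage hint; zero, negative, decimal, and non-numeric values are rejected) might look like the sketch below. `parse_prune_days` and `days_to_duration` are hypothetical names and the error strings are illustrative, not the command's actual wording.

```rust
use std::time::Duration;

/// Hypothetical parser for the `<days>` argument of `/sessions prune`.
fn parse_prune_days(arg: Option<&str>) -> Result<u64, String> {
    let raw = arg
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .ok_or_else(|| "usage: /sessions prune <days>".to_string())?;
    // Parsing as an unsigned integer rejects negatives, decimals, and text in one step.
    match raw.parse::<u64>() {
        Ok(days) if days > 0 => Ok(days),
        _ => Err(format!("days must be a positive integer, got `{raw}`")),
    }
}

/// Converts whole days to the `Duration` a prune helper would take.
fn days_to_duration(days: u64) -> Duration {
    // Saturate instead of overflowing on absurdly large inputs.
    Duration::from_secs(days.saturating_mul(86_400))
}
```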
Hunter Bown	2fa23c1d74	docs(changelog): record session-prune helper + doctor memory block Two items added in this stabilization pass that weren't yet in the changelog: - SessionManager::prune_sessions_older_than (#406 phase-1) - doctor --json memory block (#489 follow-up) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:27:12 -05:00
Hunter Bown	220f1b30c5	feat(sessions): SessionManager::prune_sessions_older_than helper (#406 phase-1) #406 asks for an auto-archive system for old session state. The full design needs prior-art survey + retention-policy decisions that are explicitly out of scope for v0.8.8. This commit ships the building block — a tested public method that removes session files older than a given Duration — so phase 2 can wire it into a config-knob boot prune without re-litigating the implementation. \`\`\`rust pub fn prune_sessions_older_than( &self, max_age: std::time::Duration, ) -> std::io::Result<usize> { ... } \`\`\` Behaviour: - Compares against the metadata's \`updated_at\` (not filesystem mtime — the user may have rsynced \`~/.deepseek\`; fs mtimes can lie about real session age). - Returns the count pruned; failures on individual files are logged at WARN and skipped, not propagated, so one bad record doesn't block the rest. - Skips the checkpoint subdirectory entirely. Top-level \`<session_id>.json\` files are the only candidates; \`checkpoints/latest.json\` and friends are owned by the checkpoint subsystem and live with stricter durability rules. - Marked \`#[allow(dead_code)]\` with a comment pointing at #406 phase 2 — the helper exists today, the production wiring lands next. ### Tests 5 new tests in \`session_manager::tests\`: - empty directory returns zero - all-fresh records survive - all-stale records get removed - mixed directory removes only the stale ones - checkpoint subdirectory is left alone (file untouched, count is still 1 for the top-level stale record) ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1861 + supporting Refines #406 — phase 1 (helper + tests) shipped. Issue stays open for the v0.8.9 phase-2 work that decides the retention policy and boot-prune wiring. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:26:35 -05:00
Hunter Bown	8e7664bc70	docs(changelog): populate [Unreleased] with v0.8.8 stabilization entries Catalogues the 24 v0.8.8 issues shipped across PRs #514 / #515 / #517 / #518 / #519 in the standard Keep-a-Changelog format, organized into Added / Changed / Fixed buckets with issue cross-references. Captured: - Added (10): memory MVP + remember tool, inline diff, OSC 8, retry banner, MCP chip, project config overlay, Implementer/Verifier roles, two doc files, competitive analysis - Changed (8): sub-agent cap, RwLock, output summarization, agent_list session boundary, concise todos, compact agent_spawn, Plan panel, RLM family - Fixed (8): self-update arch, Option+Backspace word delete, offline queue scope, display_path Windows, footer theme color, panic-exit keyboard flags, CI workflow cleanup, plus the v0.8.8 release-base fix Plus a Releases callout reminding maintainers that the npm wrapper publish stays manual and the GitHub release automation depends on the \`RELEASE_TAG_PAT\` secret. The dated section header lands when the actual version-bump commit fires \`auto-tag.yml\`. This commit just populates the [Unreleased] body so contributors get a clean summary while the PRs are still in review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:19:09 -05:00
Hunter Bown	1b7939e680	feat(doctor): surface memory feature state in --json output (#489) Operators ask "is memory on?" and "where does it live?" without wanting to boot the TUI. Adds a `memory` block to the JSON doctor report:
```json
"memory": {
  "enabled": false,                          // honours DEEPSEEK_MEMORY env
  "path": "/Users/you/.deepseek/memory.md",  // expanded path
  "file_present": false                      // does the file exist on disk?
}
```
The `enabled` field reads `DEEPSEEK_MEMORY` directly so it stays correct on this stabilization branch even though the dedicated `Config::memory_enabled()` accessor lives on the memory-MVP branch (#518). When both PRs land, the duplicated env-parse can collapse to a single method call; a TODO comment marks the spot. Verified: - `deepseek doctor --json` shows `enabled: false` by default - `DEEPSEEK_MEMORY=on deepseek doctor --json` shows `enabled: true` - All gates green (1856 main + supporting) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:16:01 -05:00
Hunter Bown	8071bce319	docs: MEMORY.md — user-facing memory documentation (#489 ) The memory MVP shipped in PR #518 added three surfaces (\`# \` quick-add, \`/memory\` slash command, \`remember\` model tool) plus the opt-in toggle, but the only user-facing reference today is the one-line mention of \`memory_path\` in CONFIGURATION.md and the \`#489\` cross- reference in SUBAGENTS.md. This commit adds a dedicated user-facing doc covering the whole feature. Coverage: - Why opt-in by default - How to enable (env var + config.toml) - What the system prompt block looks like - Three ways to add to memory: 1. \`# foo\` composer prefix (#492) 2. \`/memory\` slash command (#491) — show / path / clear / edit 3. \`remember\` tool (#489) — model-callable, auto-approved - File format — timestamped Markdown bullets, hand-editable - What stays out of memory — secrets / transient state / long instructions / conversation snippets - Privacy and scope — per-user, never uploaded, provider-agnostic - Configuration reference — settings table with defaults and overrides Cross-link added in CONFIGURATION.md so the existing \`memory_path\` mention now points at the full feature doc. No Rust code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:10:55 -05:00
Hunter Bown	d129ab4150	docs: SUBAGENTS.md — role taxonomy, lifecycle, output contract (#404 ) The role taxonomy expansion in #404 added Implementer + Verifier as distinct postures alongside General / Explore / Plan / Review / Custom. The issue body explicitly lists \`docs/AGENTS.md or docs/SUBAGENTS.md\` as a target file; this commit creates that file. Coverage: - Role taxonomy table — stance, write/shell access, typical use per role. - "When to pick which role" — narrative guidance the model can read if the role choice isn't obvious. - Alias map — every accepted spelling routed to a canonical role, matching what \`SubAgentType::from_str\` accepts. - Concurrency cap — the 10-by-default value, the \`[subagents].max_concurrent\` knob, and the running-only semantics (#509). - Lifecycle — Pending → Running → terminal states, plus \`Interrupted\` after a process restart. - Session boundaries (#405) — \`session_boot_id\` mechanics, default current-session filter, \`include_archived=true\` escape hatch, pre-#405 record handling. - Output contract — the SUMMARY/CHANGES/EVIDENCE/RISKS/BLOCKERS format every sub-agent must produce. - Memory + \`remember\` integration (#489) — sub-agents inherit the parent's memory file when memory is enabled and can append durable notes. - Implementation notes — source path, persisted state file, is_running semantics, RwLock pattern. Cross-link added in \`docs/TOOL_SURFACE.md\` so the sub-agent section points to this doc. No Rust code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:09:27 -05:00
Hunter Bown	2f9b58b910	fix(agents): include Implementer/Verifier aliases in error message hint (#404) The "Invalid sub-agent type" error message lists the accepted type strings so the model can self-correct. The list still showed only the original 5 names plus their aliases. Adding the new types' canonical names and aliases keeps the error helpful when the model misspells a type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:06:43 -05:00
Hunter Bown	1ae042d56b	feat(agents): add Implementer and Verifier sub-agent roles (#404 ) The existing taxonomy (General / Explore / Plan / Review / Custom) covered "do something" / "go look" / "think first" / "grade work" / "explicit allowlist" but had no distinct posture for two of the most common patterns: - Implementer — "land this change with the minimum surrounding edit". Distinct from General's flexible posture and Plan's no-execution stance. - Verifier — "run the test suite and report pass/fail with evidence". Distinct from Review's read-and-grade stance. Per the issue body's guidance ("avoid a large undifferentiated role list") this PR adds only those two. Researcher and ReleaseManager stay open as v0.8.9 candidates if user demand surfaces. ### What's wired - Two new \`SubAgentType\` variants: \`Implementer\`, \`Verifier\`. - New prompt constants \`IMPLEMENTER_AGENT_PROMPT\` and \`VERIFIER_AGENT_PROMPT\` with role-specific posture (write-the- minimum-edit / run-the-tests-don't-fix-them). - \`from_str\` accepts the obvious aliases: \`implementer\` / \`implement\` / \`implementation\` / \`builder\` for Implementer; \`verifier\` / \`verify\` / \`verification\` / \`validator\` / \`tester\` for Verifier. Case-insensitive like the existing aliases. - \`as_str\` round-trips: every variant's label parses back to the same variant via \`from_str\`. Test pins this so a future role addition can't accidentally drop the round-trip. - Deprecated \`allowed_tools()\` advisory list updated: Implementer carries write/edit/patch + shell + checklist tools; Verifier carries read + shell + run_tests + diagnostics but no write tools. - \`crates/tui/src/tui/views/mod.rs\` agent-type sort order extended to include the new variants; \`format_agent_type\` now delegates to \`as_str\` so future additions land in one place. 
### Tests - 6 new tests in \`tools::subagent::tests\`: - alias coverage for Implementer (4 aliases) and Verifier (5) - round-trip via \`as_str\` for all variants — guards against forgetting to register a new variant in either direction - distinct-prompts guard: catches the copy-paste bug where two new variants would inherit the same prompt as General - Implementer's advisory list contains write tools - Verifier's advisory list contains test-runner tools but NO writes ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1856 + supporting Closes #404 (minimal-taxonomy interpretation per the issue body). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:05:26 -05:00
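The alias handling and round-trip guarantee described above can be illustrated with a reduced enum. `Role` is a hypothetical stand-in that models only the two new variants; the alias lists are the ones given in the commit message, while the lowercase labels returned by `as_str` and the error text are assumptions.

```rust
use std::str::FromStr;

/// Reduced stand-in for `SubAgentType`; only the two new roles are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Implementer,
    Verifier,
}

impl Role {
    /// Canonical label; must parse back to the same variant via `from_str`.
    fn as_str(self) -> &'static str {
        match self {
            Role::Implementer => "implementer",
            Role::Verifier => "verifier",
        }
    }
}

impl FromStr for Role {
    type Err = String;

    /// Case-insensitive lookup over the canonical name and its aliases.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "implementer" | "implement" | "implementation" | "builder" => Ok(Role::Implementer),
            "verifier" | "verify" | "verification" | "validator" | "tester" => Ok(Role::Verifier),
            other => Err(format!("Invalid sub-agent type: {other}")),
        }
    }
}
```

Keeping the canonical label inside the alias list is what makes the `as_str` to `from_str` round trip hold for every variant.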
Hunter Bown	68ec91999b	feat(tui): clarify Plan panel role + drop empty-state placeholder (#408 ) The Plan panel used to render a blunt "No active plan" line whenever the model hadn't called \`update_plan\` yet — even when the panel had a goal or a cycle counter to show above it. That made the panel look broken on every fresh session. Per the audit posted on the issue (option 1 of three), this PR keeps the Plan panel as the single source of truth for the \`update_plan\` tool's output and drops the placeholder when the panel is fully quiet, replacing it with a one-line hint that explains what the panel tracks. When a goal or cycle counter is already showing above, the empty-steps body collapses entirely so the section doesn't look ambiguous next to populated content. The panel's role is documented in a doc comment on \`render_sidebar_plan\` so the next person doesn't have to re-derive it from the issue tracker. ### What's wired - \`render_sidebar_plan\` checks "anything above" (goal + cycle_count) before deciding whether to emit the empty-state hint. If either is showing, the empty steps body adds nothing. - New \`plan_panel_empty_hint(width)\` helper composes the hint string with proper width-aware truncation. - New module-level doc comment explains the Plan panel's role (update_plan output + /goal + cycle counter) and contrasts it with Todos. ### Tests - 3 new tests in \`tui::sidebar::tests\`: - hint mentions \`update_plan\` and \`/goal\` so the user understands what populates the panel - hint truncates correctly to a 16-column sidebar without overflowing - regression guard: the hint never re-introduces "no active plan" wording ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1850 + supporting Closes #408 (option 1 of the audit). 
Options 2 (merge with todos) and 3 (move to top-row chip) remain open as v0.9.0 design candidates once #504's right-panel work is on the table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:00:47 -05:00
Hunter Bown	256f59dd33	feat(agents): session-boundary classification for sub-agents (#405 ) \`agent_list\` previously surfaced every persisted sub-agent the manager had on disk — including agents from prior sessions still hanging around in \`subagents.v1.json\`. In long-lived sessions this piled up and the model had to reason past 13 listed agents when only four were live. Now: each \`SubAgentManager\` assigns a fresh \`session_boot_id\` at construction. Every agent it spawns is stamped with that id, persisted alongside the existing fields, and reloaded as-is by future managers. At list time the manager classifies any agent whose stamp doesn't match the current id (or that loaded with no stamp at all from pre-#405 records) as \`from_prior_session\`. \`agent_list\` defaults to the current-session view: prior-session agents are dropped from the listing unless they're still \`Running\` (which can happen after a process restart — the manager flagged them as \`Interrupted\` on load). Pass \`include_archived=true\` to surface every record, with the \`from_prior_session\` flag on each result so the model can tell live vs archived apart at a glance. ### What's wired - \`SubAgentManager::current_session_boot_id\` — UUID-derived, generated in \`new\`. - \`SubAgent::session_boot_id\` and \`PersistedSubAgent::session_boot_id\` — the latter \`#[serde(default)]\` for backward compat (pre-#405 records load with empty string and classify as prior-session). - \`SubAgentResult::from_prior_session\` — \`#[serde(default, skip_serializing_if = "is_false")]\` so today's clients reading archived snapshots see the field, while default-false snapshots serialize without an extra noisy key. - \`SubAgentManager::list_filtered(include_archived)\` — the new user-facing API. \`SubAgent::snapshot()\` still defaults the flag to \`false\`; \`snapshot_for_listing\` (manager-only) fills it in. - \`AgentListTool\` accepts \`include_archived: bool\` (default false) and routes through the filter. 
The model-facing description explains the behaviour. ### Tests - \`session_boot_ids_are_unique_per_manager\` — each manager mints its own id. - \`list_filtered_drops_prior_session_terminals_by_default\` — the three-agent matrix (current running / prior completed / prior running) collapses to the right two with the right flags. - \`list_filtered_with_include_archived_returns_everything\` — archived view returns all records with correct flags. - \`agents_with_empty_boot_id_classify_as_prior_session\` — pre-#405 records load and behave as expected. - \`persist_round_trip_preserves_session_boot_id\` — write with one manager, reload with a fresh manager, confirm the agent flips to prior-session in the new manager's view. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1847 + supporting Closes #405 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:57:06 -05:00
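The listing rule in this commit reduces to a small predicate: an agent is "prior session" when its stamp differs from the manager's current id, and prior-session agents are hidden by default unless they are still running. The types below (`AgentRecord`, `Listed`, `Status`) are simplified, hypothetical stand-ins for the real structs, and the free function only borrows the `list_filtered` name for readability.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Completed,
}

#[derive(Debug, Clone)]
struct AgentRecord {
    id: String,
    status: Status,
    /// Empty for records persisted before the stamp existed.
    session_boot_id: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Listed {
    id: String,
    from_prior_session: bool,
}

/// Default view keeps current-session agents plus any prior-session agent still running;
/// `include_archived` returns every record, each flagged.
fn list_filtered(
    agents: &[AgentRecord],
    current_boot_id: &str,
    include_archived: bool,
) -> Vec<Listed> {
    agents
        .iter()
        .filter_map(|a| {
            // A missing (empty) stamp never equals a freshly minted id, so old records count as prior.
            let prior = a.session_boot_id.as_str() != current_boot_id;
            let keep = include_archived || !prior || a.status == Status::Running;
            keep.then(|| Listed {
                id: a.id.clone(),
                from_prior_session: prior,
            })
        })
        .collect()
}
```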
Hunter Bown	b54a708cf7	feat(tui): compact agent_spawn rendering — single line, DelegateCard owns the rest (#409 ) The transcript previously rendered each \`agent_spawn\` call as a 3-4 line generic tool block (header + name kv + args summary + output JSON) AND its companion \`DelegateCard\` (header + live action lines + summary). Four parallel spawns produced ≥16 lines of nearly-identical scaffolding before the model said anything useful. In live mode \`agent_spawn\` now renders as a single header line — \`◐ delegate · agent-abc12 [running]\` — with the agent id pulled from the tool's JSON output. The DelegateCard remains the source of truth for live action progress and the final summary; the generic block is no longer fighting it for attention. Transcript-mode replay (used by \`/pager\`, session export, and the detail pager opened with Alt+V) keeps the full multi-line block so debug history is preserved. ### What's wired - \`GenericToolCell::lines_with_mode\` early-returns \`render_agent_spawn_compact\` when \`name == "agent_spawn"\` and \`mode == RenderMode::Live\`. - New \`render_agent_spawn_compact\` builds one header line with the family glyph (Delegate), the spawned agent id (or \`…\` placeholder while the spawn is in flight), and the tool status. - New \`extract_agent_id(output)\` parser: deterministic, allocation-free string scan for \`"agent_id"\` → quoted value. Avoids dragging serde into a render hot path. ### Tests - 4 \`extract_agent_id\` tests: typical JSON, extra whitespace, missing key (None), empty string id (None). - 4 render tests: live one-liner contains agent id + status with no args/name kv leaking, pending render uses \`…\` placeholder, transcript mode keeps the full block, non-spawn tools (read_file) still render their normal multi-line block. 
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1842 + supporting Closes #409 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:44:19 -05:00
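The allocation-free `"agent_id"` scan described in the commit above could look roughly like the following. This is a minimal sketch, not the repository's code: the signature, the `Option<&str>` return type, and the lack of escape handling are all assumptions made for illustration.

```rust
/// Hypothetical reconstruction of the scan described in the commit message.
/// Finds the first `"agent_id"` key and returns the quoted value that follows,
/// borrowing from `output` so nothing is allocated.
/// Assumption: ids never contain escaped quotes, so no escape handling is done.
fn extract_agent_id(output: &str) -> Option<&str> {
    const KEY: &str = "\"agent_id\"";
    // Position just past the key; `?` returns None when the key is missing.
    let after_key = output.find(KEY)? + KEY.len();
    // Tolerate arbitrary whitespace around the colon.
    let rest = output[after_key..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    // The value must be a quoted string.
    let rest = rest.strip_prefix('"')?;
    let end = rest.find('"')?;
    let id = &rest[..end];
    // An empty id is treated the same as a missing one.
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}
```

Called on `{"agent_id": "agent-abc12", "status": "running"}` the sketch yields `Some("agent-abc12")`; a caller would fall back to the `…` placeholder on `None`.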
Hunter Bown	c52f2c46f1	feat(tui): concise todo / checklist update rendering (#403) When the model fires `todo_update` / `checklist_update` repeatedly during a long run, the live transcript previously dumped the full checklist card (header + every item + progress) on every call. In sessions with 20+ items and a dozen status flips the same item list appears over and over, drowning the actual work. Now: when a checklist update output starts with the "Updated todo #N to STATUS" prefix the tool already emits, the live renderer shows a compact one-line state-change card — `Todo #N: <title> → STATUS` — plus a `M/N · pct%` summary in the header and a `N items (Alt+V for full list)` affordance underneath. The full item list is still reachable via the existing detail pager. Falls back to the full-card render path for: - `todo_write` / `checklist_write` (no "Updated" prefix — first emission of the list) - transcript-mode replays (the user wants the full snapshot when scrolling history) - malformed prefixes (parse failure → fall through, never crash) ### What's wired - New `parse_update_prefix(output)` parser handles both `Updated todo #N to STATUS` and `Updated checklist #N to STATUS` forms. - New `render_checklist_change_card` builds the compact card. Looks up the title from the snapshot's items array (id is 1-indexed), falls back to `(missing title)` if the id is out of range. - `try_render_as_checklist` calls the change-card path only in `RenderMode::Live` and only when the parser matches. Pre-existing cases (writes, transcript replay) keep the full-card behavior. ### Tests - 4 parser tests: todo form, checklist form, write outputs falling through, malformed prefixes falling through. - 2 renderer tests: compact card shows only the changed item (with assertions that other titles do NOT appear), missing-title path.
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1834 + supporting Closes #403 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:36:21 -05:00
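The prefix parse and the 1-indexed title lookup from the commit above might be sketched as follows. The return type, the `title_for` helper, and the assumption that STATUS runs to the end of the first output line are illustrative guesses; the commit only names `parse_update_prefix(output)` and the `(missing title)` fallback.

```rust
/// Hypothetical sketch: parses `Updated todo #N to STATUS` or
/// `Updated checklist #N to STATUS` from the first line of a tool output.
/// Returns None for anything else so the caller can fall through to the
/// full-card path instead of failing.
fn parse_update_prefix(output: &str) -> Option<(usize, &str)> {
    let first_line = output.lines().next()?;
    let rest = first_line.strip_prefix("Updated ")?;
    let rest = rest
        .strip_prefix("todo #")
        .or_else(|| rest.strip_prefix("checklist #"))?;
    let (id, status) = rest.split_once(" to ")?;
    let id: usize = id.parse().ok()?;
    let status = status.trim();
    // Ids are 1-indexed, so 0 is treated as malformed (an assumption).
    if id == 0 || status.is_empty() {
        return None;
    }
    Some((id, status))
}

/// Hypothetical helper: looks up the title for a 1-indexed id and falls
/// back to the placeholder named in the commit when the id is out of range.
fn title_for(items: &[String], id: usize) -> &str {
    id.checked_sub(1)
        .and_then(|index| items.get(index))
        .map(String::as_str)
        .unwrap_or("(missing title)")
}
```

Returning `Option` rather than `Result` matches the "parse failure → fall through, never crash" behaviour the commit describes: a write output such as the first emission of a list simply produces `None`.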
Hunter Bown	4d4a9b424c	feat(config): expand per-project overlay to cover provider, sandbox, approval, mcp_path, max_subagents, allow_shell (#485) The project-config overlay (`<workspace>/.deepseek/config.toml` merged on top of the user's global `~/.deepseek/config.toml`) was already wired but only carried four string fields: model, api_key, base_url, reasoning_effort. The use cases users actually file under #485 — "this repo wants a different sandbox / approval policy / MCP server set / hard sub-agent cap" — weren't covered. ### What ships Adds the following keys to the project overlay, all merged with identical "non-empty wins" semantics for strings: - `provider` — pick a different backend per repo (e.g. `nvidia-nim` for an enterprise repo, `deepseek-cn` for a CN-team repo). - `approval_policy` — `never` / `on-request` / `untrusted` for repos with strict policies. - `sandbox_mode` — `read-only` / `workspace-write` / `danger-full-access`. - `mcp_config_path` — per-repo MCP server set without touching the user's global file. - `notes_path` — keep notes in-repo for projects where the notes tool is part of the dev workflow. Plus two non-string fields: - `max_subagents` (positive integer; clamped to `1..=MAX_SUBAGENTS=20`). - `allow_shell` (bool). ### What stays user-global `skills_dir`, `hooks`, `[capacity]`, `[retry]`, `[memory]`, etc. — those are user-shaped settings, not repo-shaped. If a future use case demands per-project values for any of them, a follow-up PR can extend the overlay rather than letting the boundary blur. ### Tests - 8 new tests in `project_config_tests` covering: provider+model, approval+sandbox, max_subagents+allow_shell, max_subagents clamping, negative-max_subagents rejection, missing config file pass-through, malformed TOML pass-through, and empty-string no-op. ### Docs - New "Per-project overlay (#485)" section in `docs/CONFIGURATION.md` with a table of supported keys and the rationale for which fields stay user-global.
### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1828 + supporting (was 1820) Closes #485 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:25:43 -05:00
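The "non-empty wins" merge and the `max_subagents` clamp from the commit above can be illustrated with plain structs. Everything below is a sketch under stated assumptions: the `Config` and `ProjectOverlay` types, the `apply_overlay` and `merge_string` names, the choice to surface a negative value as `Err`, and the treatment of 0 as clamped to 1 are hypothetical; only the field names and the 1..=20 range come from the commit message. TOML parsing is left out to keep the block dependency-free.

```rust
/// Upper bound quoted in the commit message (`1..=MAX_SUBAGENTS=20`).
const MAX_SUBAGENTS: i64 = 20;

/// Hypothetical stand-in for the resolved user-global configuration.
#[derive(Debug, Clone, Default, PartialEq)]
struct Config {
    provider: String,
    sandbox_mode: String,
    max_subagents: usize,
    allow_shell: bool,
}

/// Hypothetical stand-in for the parsed per-project overlay file.
#[derive(Debug, Default)]
struct ProjectOverlay {
    provider: Option<String>,
    sandbox_mode: Option<String>,
    max_subagents: Option<i64>,
    allow_shell: Option<bool>,
}

/// String fields: an overlay value replaces the base only when non-empty.
fn merge_string(target: &mut String, value: Option<&str>) {
    if let Some(v) = value {
        if !v.trim().is_empty() {
            *target = v.to_string();
        }
    }
}

/// Applies the overlay on top of the base config.
/// Assumption: negative `max_subagents` is reported as an error; how the
/// real code surfaces the rejection is not stated in the commit.
fn apply_overlay(mut base: Config, overlay: &ProjectOverlay) -> Result<Config, String> {
    merge_string(&mut base.provider, overlay.provider.as_deref());
    merge_string(&mut base.sandbox_mode, overlay.sandbox_mode.as_deref());
    if let Some(n) = overlay.max_subagents {
        if n < 0 {
            return Err(format!("max_subagents must be positive, got {n}"));
        }
        base.max_subagents = n.clamp(1, MAX_SUBAGENTS) as usize;
    }
    if let Some(allow) = overlay.allow_shell {
        base.allow_shell = allow;
    }
    Ok(base)
}
```

With a base of `provider = "deepseek"` and an overlay carrying `provider = "nvidia-nim"`, `sandbox_mode = ""`, and `max_subagents = 99`, the sketch keeps the base sandbox mode, switches the provider, and caps the sub-agent count at 20.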
Hunter Bown	a723ddd63d	feat(tui): MCP server health chip in footer (#502) Adds a small `MCP M/N` chip to the footer's right-side auxiliary cluster so users with MCP servers see at-a-glance health without running `/mcp`. The chip is color-coded by reachability: - all configured servers reachable → success (sky) - some reachable, some failed → warning (amber) - zero reachable but ≥1 configured → error (red) - configured but no live snapshot yet → muted (gray, count only) When zero servers are configured the chip is hidden entirely; users who don't use MCP see no change. ### What's wired - New `App::mcp_configured_count`, populated at app boot from `mcp::load_config(&mcp_config_path)`. Cheap (just reads the JSON config; no server connections), so it doesn't block startup. - `app.mcp_snapshot.servers.iter().filter(|s| s.connected).count()` drives the live-state portion when the user has run `/mcp` at least once. - `FooterProps` gains an `mcp: Vec<Span<'static>>` field built by `footer_mcp_chip(connected, configured)`. Threaded into `auxiliary_spans` so it participates in the priority-drop pipeline at narrow widths. - After any `/mcp` action that returns a snapshot, the count is refreshed from the snapshot so post-edit state is reflected. ### #499 follow-up: pure render path Moves the retry-status capture into `FooterProps` (`pub retry: RetryState`) sampled in `from_app`, instead of pulling from the global surface inside `render`. The render method is now pure with respect to its props — fixes a parallel-test race where retry-banner tests and unrelated footer tests would observe each other's writes through the process-wide retry surface. ### Tests - 5 unit tests in `footer_mcp_chip`: hidden when zero, count-only when no snapshot, success / warning / error colours by reachability. - Existing retry-banner tests now pin `props.retry` directly rather than mutating the global surface — no more `test_guard()` needed, no more parallel-runner flakes.
- All 31 footer tests pass in parallel. ### Verification cargo fmt --all -- --check ✓ cargo clippy --workspace --all-targets --all-features --locked -- -D warnings ✓ cargo test --workspace --all-features --locked ✓ 1820 + supporting Closes #502 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 03:16:27 -05:00
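The reachability rules for the MCP chip in the commit above reduce to a small decision function. The sketch below is illustrative only: the real `footer_mcp_chip(connected, configured)` returns `Vec<Span<'static>>`, whereas this version uses a hypothetical `ChipTone` enum and a plain label string so it runs without a TUI crate, and the `MCP N` count-only label format is an assumption.

```rust
/// Hypothetical stand-in for the colour roles named in the commit
/// (success = sky, warning = amber, error = red, muted = gray).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChipTone {
    Success,
    Warning,
    Error,
    Muted,
}

/// Sketch of the chip decision. `connected` is None until a live snapshot
/// exists (the commit says the snapshot appears after `/mcp` has run).
/// Returns None when no servers are configured, i.e. the chip is hidden.
fn mcp_chip_sketch(connected: Option<usize>, configured: usize) -> Option<(String, ChipTone)> {
    if configured == 0 {
        return None;
    }
    let chip = match connected {
        // No snapshot yet: show the configured count only, muted.
        None => (format!("MCP {configured}"), ChipTone::Muted),
        // Every configured server is reachable.
        Some(up) if up >= configured => (format!("MCP {up}/{configured}"), ChipTone::Success),
        // At least one server configured, none reachable.
        Some(0) => (format!("MCP 0/{configured}"), ChipTone::Error),
        // Partial reachability.
        Some(up) => (format!("MCP {up}/{configured}"), ChipTone::Warning),
    };
    Some(chip)
}
```

Keeping the decision a pure function of `(connected, configured)` fits the pure-render approach the same commit adopts for the retry banner: tests can pin inputs directly instead of mutating shared state.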


493 Commits