Commit Graph

492 Commits

Author SHA1 Message Date
Hunter Bown c0b6c2a1e5 perf(hooks): fast-path skip when no hooks configured (#455 follow-up)
Now that `tool_call_before` / `tool_call_after` fire on every
tool dispatch, the cost of constructing a `HookContext` (which
allocates for `workspace`, `model`, `session_id`, …) shows up
on the hot path even when the user has zero hooks configured —
the common case.

Adds `HookExecutor::has_hooks_for_event(event)` as a cheap
boolean gate that callers consult before building the context.
The pre-check returns false when:

* `config.enabled == false` (globally disabled).
* No hook in the config has the given `event`.

Wired through every fire site:

* `tool_routing.rs::handle_tool_call_started` —
  `ToolCallBefore`.
* `tool_routing.rs::handle_tool_call_complete` —
  `ToolCallAfter`. Also skips the `result.content.clone()`
  that the `with_tool_result` builder demands.
* `ui.rs::dispatch_user_message` — `MessageSubmit`.
* `ui.rs::apply_engine_error_to_app` — `OnError`.

Inside `HookExecutor::execute` itself, also short-circuit
before calling `context.to_env_vars()` when no hooks match the
event — defends against a caller that builds the context but
forgets to gate.

Tests:
  3 new tests cover empty-config / globally-disabled /
  per-event filtering. The existing 18 hook tests pass
  unchanged.

No behavioral change for users with hooks configured; pure
allocation-free fast path otherwise.
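
A minimal sketch of the gate described above, under stated
assumptions: only `HookExecutor::has_hooks_for_event` and
`config.enabled` are named in this commit. The `Hook` and
`HooksConfig` field layout below is a hypothetical stand-in, not
the real definitions.

```rust
/// Event names taken from the commit log; the enum layout is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    SessionStart,
    SessionEnd,
    MessageSubmit,
    ToolCallBefore,
    ToolCallAfter,
    ModeChange,
    OnError,
}

/// Hypothetical hook entry: just enough fields for the gate.
pub struct Hook {
    pub event: HookEvent,
    pub command: String,
}

/// Hypothetical config shape; `enabled` is the global switch.
pub struct HooksConfig {
    pub enabled: bool,
    pub hooks: Vec<Hook>,
}

pub struct HookExecutor {
    config: HooksConfig,
}

impl HookExecutor {
    pub fn new(config: HooksConfig) -> Self {
        Self { config }
    }

    /// Cheap boolean gate: false when hooks are globally disabled
    /// or when no configured hook listens for `event`.
    pub fn has_hooks_for_event(&self, event: HookEvent) -> bool {
        self.config.enabled && self.config.hooks.iter().any(|h| h.event == event)
    }
}
```

A caller would consult the gate first and build the (comparatively
expensive) context only when it returns `true`.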
2026-05-03 07:07:11 -05:00
Hunter Bown e569f2ca99 feat(hooks): fire message_submit + on_error too (#455 observer-only)
Completes the observer-only slice of #455 by wiring the two
remaining `HookEvent` variants that were defined but never
fired:

* `MessageSubmit` fires from `dispatch_user_message` before
  the message is handed to the engine. Hook context carries
  `message` so observers can log every prompt the user
  submits, redact for compliance audit, or page on
  `/wipe-database`-style content. Read-only.
* `OnError` fires from `apply_engine_error_to_app` before the
  error cell reaches the transcript. Hook context carries
  `error`. Useful for paging on auth / billing / invalid-
  request failures without tailing the audit log.

Combined with the prior `tool_call_before` / `tool_call_after`
wiring, every `HookEvent` variant now has a live producer:
`SessionStart`, `SessionEnd`, `MessageSubmit`, `ToolCallBefore`,
`ToolCallAfter`, `ModeChange`, `OnError`. The `/hooks events`
listing already enumerates them with their on-fire semantics.
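
For reference, the seven variants can be sketched as an enum with
a label mapping. The variant names come from this commit; the
snake_case labels are the spellings used elsewhere in this log,
and treating them as the exact config-file strings is an
assumption. The `ALL` constant is a hypothetical helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    SessionStart,
    SessionEnd,
    MessageSubmit,
    ToolCallBefore,
    ToolCallAfter,
    ModeChange,
    OnError,
}

impl HookEvent {
    /// Hypothetical helper listing every variant once.
    pub const ALL: [HookEvent; 7] = [
        HookEvent::SessionStart,
        HookEvent::SessionEnd,
        HookEvent::MessageSubmit,
        HookEvent::ToolCallBefore,
        HookEvent::ToolCallAfter,
        HookEvent::ModeChange,
        HookEvent::OnError,
    ];
}

/// Maps a variant to its snake_case label (assumed spelling).
pub fn event_label(event: HookEvent) -> &'static str {
    match event {
        HookEvent::SessionStart => "session_start",
        HookEvent::SessionEnd => "session_end",
        HookEvent::MessageSubmit => "message_submit",
        HookEvent::ToolCallBefore => "tool_call_before",
        HookEvent::ToolCallAfter => "tool_call_after",
        HookEvent::ModeChange => "mode_change",
        HookEvent::OnError => "on_error",
    }
}
```

Because the `match` has no wildcard arm, adding a variant without
a label fails to compile.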

Hooks remain read-only observers in this slice. Mutation is a
v0.8.9 follow-up because it needs a synchronous-gate contract
that would change semantics for every hook surface — including
the lifecycle events that have shipped for many releases.
2026-05-03 07:01:52 -05:00
Hunter Bown 4310202645 feat(hooks): fire tool_call_before / tool_call_after (#455 observer-only)
The `HookEvent::ToolCallBefore` and `HookEvent::ToolCallAfter`
enum variants were defined but never fired from production code,
so `[[hooks.hooks]]` entries with those events sat dormant.

Wires the fires from `tui/tool_routing.rs`:

* `handle_tool_call_started` fires `ToolCallBefore` with the
  hook context populated with `tool_name` and `tool_args`. The
  fire happens before any UI bookkeeping so observers see the
  call as early as possible.
* `handle_tool_call_complete` fires `ToolCallAfter` after the
  cell finalization with the result content (or stringified
  error) + success flag. Stays last in the function so any UI
  state the hook might want to observe via shell-out is
  already settled.

Hooks remain read-only observers in this slice. Mutation
(modifying tool args before execution, or the result before it
reaches the model) is a v0.8.9 follow-up that needs a
synchronous-gate contract; the existing executor is fire-and-
forget and adding mutation would change semantics for every
existing hook surface (session_start, mode_change, etc.).

Operators can wire `tool_call_before` / `tool_call_after`
hooks in `~/.deepseek/config.toml` immediately to log every
tool call, page on long shell exec, or audit risky operations.
The `/hooks events` listing already enumerates them.

No new tests — `tool_routing.rs` has no existing test surface,
and the hook execution path is already covered via
`hooks::tests::*`. The wiring is mechanically minimal.
2026-05-03 06:59:26 -05:00
Hunter Bown a2c7c94f5d test(pr): pin is_command_available contract (#451 follow-up)
Adds a tiny test that exercises both branches of the helper used
by `deepseek pr <N>` to detect `gh`'s presence:

* Positive case — `sh` (POSIX baseline) is reported present.
  Gated on `cfg(unix)` because Windows runners aren't
  guaranteed to have `sh.exe` outside git-bash.
* Negative case — a deliberately-implausible
  `this-command-cannot-exist-…ENOENT-marker` returns `false`
  rather than panicking from the `Command::new` exec failure.

Pure additive coverage; no production change.
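
A sketch of the contract being pinned, not the actual helper: the
real `is_command_available` body is not shown in this log, so the
`--version` probe and the silenced stdio below are assumptions.
The key property is that a failed spawn maps to `false` instead of
a panic.

```rust
use std::process::{Command, Stdio};

/// Returns true when `name` can be spawned from PATH.
/// Assumption: probing with `--version` is harmless for the
/// commands of interest; only the spawn result is inspected,
/// not the exit code.
pub fn is_command_available(name: &str) -> bool {
    Command::new(name)
        .arg("--version")
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .is_ok()
}
```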
2026-05-03 06:54:05 -05:00
Hunter Bown 8ed1cb4e68 feat(hooks): /hooks events subcommand for discovery (#460 polish)
The shipped `/hooks list` told users WHAT was configured but
not WHAT they could configure. Without this, the only way to
learn the supported `HookEvent` values is to grep source — not
ideal when most users just want to wire up a notification on
session_end.

Adds `/hooks events` (aliases `event` / `list-events`) which
prints every `HookEvent` variant alongside a short descriptive
blurb (when it fires, current observability-vs-mutation status).
Ordered lifecycle → per-tool → situational so the listing reads
naturally and stays stable across releases.

Updates `CommandInfo::usage` to `/hooks [list|events]` so the
fuzzy autocomplete shows the new subcommand.

Tests:
  1 new test (`events_subcommand_lists_every_event_variant_in_documented_order`)
  pins the order, the per-event descriptive blurb format, and
  exhaustive variant coverage. The existing 6 hooks tests pass
  unchanged.
2026-05-03 06:51:27 -05:00
Hunter Bown 14931566b5 test(audit): pin emit_tool_audit contract (#500 follow-up)
The `tool.spillover` audit emission shipped in 0fa042 added a
new caller to `emit_tool_audit` but the function itself had no
unit tests pinning its contract — operators relying on
`DEEPSEEK_TOOL_AUDIT_LOG` deserve regression coverage on the
JSONL writer.

Adds 3 tests:

* `emit_tool_audit_writes_jsonl_line_when_env_var_set` —
  verifies each call appends a parseable JSON line, with the
  expected `event` and `tool_id` keys reaching disk.
* `emit_tool_audit_is_noop_when_env_var_unset` — pins the
  early-return when the env var is missing (no panic, no file
  side effects).
* `emit_tool_audit_creates_parent_directory` — confirms the
  `create_dir_all(parent)` step works for previously-missing
  paths so operators can point the env var at a fresh path
  without a chicken-and-egg setup step.

All three serialise through a static Mutex because they mutate
process-global `DEEPSEEK_TOOL_AUDIT_LOG`. Cleanup happens on
each test under the same guard.
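
The three behaviours can be sketched with a small stand-in writer.
This is an illustration only: `append_audit_line` is a
hypothetical name, and it takes the target path as an argument
instead of reading `DEEPSEEK_TOOL_AUDIT_LOG`, which sidesteps the
process-global env mutation that the real tests guard with a
Mutex.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Appends one pre-serialised JSON line to `target`.
/// Returns Ok(false) without touching disk when no target is set
/// (the "env var unset" case), Ok(true) after a successful append.
pub fn append_audit_line(target: Option<&Path>, json_line: &str) -> io::Result<bool> {
    let Some(path) = target else {
        return Ok(false);
    };
    // Create missing parent directories so a fresh path works.
    if let Some(parent) = path.parent() {
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{json_line}")?;
    Ok(true)
}
```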
2026-05-03 06:48:59 -05:00
Hunter Bown a8e0693958 feat(doctor): report spillover dir + composer stash file (#422/#440 polish)
The v0.8.8 polish stack added two on-disk surfaces operators
might want to inspect — `~/.deepseek/tool_outputs/` for spilled
tool output (#422 / #500), and `~/.deepseek/composer_stash.jsonl`
for parked composer drafts (#440). Neither showed up in
`deepseek doctor`, so users couldn't see at a glance "do I have
parked drafts?" or "how much disk has spillover claimed?"

Adds a `Storage:` section to the human-readable doctor and a
`storage` object to the JSON doctor:

* Spillover slot reports the dir's existence and entry count.
  Pre-creation state ("not yet created") is shown explicitly
  rather than as a missing dir — the dir is created lazily on
  first spill, not at boot.
* Stash slot reports the file's existence and parked-draft
  count by re-reading via `composer_stash::load_stash`. Empty /
  missing stash shows the Ctrl+S hint so the user knows how to
  use the feature.

The JSON schema always emits both nested slots regardless of
state (so dashboard schemas stay stable across hosts); the
human-readable hides the "not yet created" line for spillover
when the dir is missing to keep the report scannable.
2026-05-03 06:46:20 -05:00
Hunter Bown b1c6e6b173 feat(doctor): report .opencode + .claude skill dirs (#432 follow-up)
The cross-tool skill discovery shipped in 432a0c1 walks
`.opencode/skills/` and `.claude/skills/` alongside the
`.agents/skills/` and `skills/` workspace folders, but the
`deepseek doctor` output still only listed the original three
slots. Operators staring at "where are my Claude-style skills?"
had no way to confirm whether the new dirs were even being
checked.

Updates both surfaces:

* Human-readable doctor — adds two conditionally-printed lines
  for `.opencode skills dir` and `.claude skills dir`. Empty
  dirs are omitted to keep the report scannable; the dirs
  exist on most workspaces only when the user has installed
  another AI tool's skill catalog there.
* JSON doctor (`deepseek doctor --json`) — adds `opencode` and
  `claude` slots to the `skills` object alongside the existing
  `global`, `agents`, `local`. Each carries `path`, `present`,
  and `count`. JSON consumers see all five keys regardless of
  presence so dashboard schemas stay stable across hosts.

The `selected_skills_dir` field still reflects the legacy
"highest-precedence single dir" — workspace-aware discovery is
done at runtime by `discover_in_workspace`, but `selected` is a
useful "where do I install a NEW skill" hint and stays
unchanged for backwards compatibility with existing diagnostic
tooling.
2026-05-03 06:43:47 -05:00
Hunter Bown 5627d6535b docs: document NO_ANIMATIONS, instructions array, /hooks, /stash
Catches up `docs/CONFIGURATION.md` with the v0.8.8 polish stack so
operators have one source of truth for the new surfaces:

* `NO_ANIMATIONS` env override (#450) joins the existing
  environment-variable list, with a cross-reference to
  `docs/ACCESSIBILITY.md`.
* New `### Instruction sources` section documents the
  `instructions = [...]` config field (#454): expansion rules,
  100 KiB per-file cap with `[…elided]` marker, missing-file
  warning behavior, and the project-wholesale-replaces-user
  override semantics.
* New `### /hooks listing` section documents the read-only
  slash command (#460 MVP) so users know how to introspect
  configured lifecycle hooks without `cat`-ing config.toml.
* New `### Composer stash` section documents Ctrl+S +
  `/stash list|pop|clear` (#440) including the 200-entry cap
  and multiline preservation.

Pure documentation; no code changes. Existing prompt-stability
and config-loading tests are unaffected.
2026-05-03 06:39:29 -05:00
Hunter Bown a368dc53b8 feat(commands): /hooks read-only lifecycle hook listing (#460 MVP)
Slash command enumerates configured lifecycle hooks from the
user's `[hooks]` table, grouped by event. The full picker /
persisted enable-disable surface in #460 is still M-sized work;
this MVP gives users a no-typing view of what's actually loaded
— the most-asked question once hooks start firing.

Implementation:

* `crates/tui/src/commands/hooks.rs` formats the hook list with
  per-event headings, hook name (or `(unnamed)`), background
  marker, timeout, condition summary, and a 60-char shell
  command preview.
* `condition_summary` covers every `HookCondition` variant
  (Always/ToolName/ToolCategory/Mode/ExitCode/All/Any) so the
  listing stays informative for compound conditions too.
* `event_label` maps each `HookEvent` to its config-file string
  so the listing matches what the user wrote in TOML.
* New `HookExecutor::config()` accessor exposes the underlying
  `HooksConfig` for read-only callers; doesn't open the door
  to mutation, which still belongs to the broader #460 work.
* Registered in `commands::COMMANDS` with `aliases: &["hook"]`,
  usage `/hooks [list]`, and `MessageId::CmdHooksDescription`
  localized in en, ja, zh-Hans, pt-BR.
* Wired into `command_palette::command_runs_directly` so
  pressing Enter from Ctrl+K runs `/hooks list` straight.
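
A rough sketch of how a summary over compound conditions might
look. The variant names are the ones listed above; their payload
types and the output wording (`tool=`, `and`, `or`) are
assumptions made for illustration.

```rust
/// Variant names from the commit; payload types are assumed.
pub enum HookCondition {
    Always,
    ToolName(String),
    ToolCategory(String),
    Mode(String),
    ExitCode(i32),
    All(Vec<HookCondition>),
    Any(Vec<HookCondition>),
}

/// Renders a one-line summary, recursing into All/Any.
pub fn condition_summary(condition: &HookCondition) -> String {
    match condition {
        HookCondition::Always => "always".to_string(),
        HookCondition::ToolName(name) => format!("tool={name}"),
        HookCondition::ToolCategory(cat) => format!("category={cat}"),
        HookCondition::Mode(mode) => format!("mode={mode}"),
        HookCondition::ExitCode(code) => format!("exit={code}"),
        HookCondition::All(inner) => join_summaries(inner, " and "),
        HookCondition::Any(inner) => join_summaries(inner, " or "),
    }
}

/// Joins child summaries inside parentheses so nesting stays readable.
fn join_summaries(inner: &[HookCondition], separator: &str) -> String {
    let parts: Vec<String> = inner.iter().map(condition_summary).collect();
    format!("({})", parts.join(separator))
}
```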

Tests:
  6 unit tests covering preview-cap truncation, newline
  stripping, condition-summary variants, event-label
  exhaustiveness, and BTreeMap-grouping ordering.
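
The preview cap and newline stripping could be sketched as below.
`command_preview` is a hypothetical name; collapsing every
whitespace run (not only newlines) and ending a cut preview with
`…` are assumptions, since the commit only states the 60-char cap
and that newlines are stripped.

```rust
/// Flattens a shell command to one line and caps it at `max_chars`
/// characters (counted as Unicode scalar values, not bytes).
pub fn command_preview(command: &str, max_chars: usize) -> String {
    // Collapse newlines and other whitespace runs into single spaces.
    let flat = command.split_whitespace().collect::<Vec<_>>().join(" ");
    if flat.chars().count() <= max_chars {
        return flat;
    }
    // Reserve one character for the ellipsis marker.
    let mut preview: String = flat.chars().take(max_chars.saturating_sub(1)).collect();
    preview.push('…');
    preview
}
```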
2026-05-03 06:36:37 -05:00
Hunter Bown 15127046e8 feat(stash): /stash clear subcommand to wipe the stash file (#440 polish)
Pairs with `/stash list` and `/stash pop` so the user can fully
manage the stash from inside the TUI without reaching for `rm`.

* New `composer_stash::clear_stash()` returns the number of
  entries dropped so the slash command can report it.
  Atomic-write replaces the file with empty content; missing /
  empty files return `Ok(0)` without erroring.
* `clear` / `wipe` / `drop` are accepted as aliases for the
  subcommand. The "unknown subcommand" hint now lists the three live
  subcommands explicitly.
* CommandInfo usage updated to `/stash [list|pop|clear]` so
  `/help` and the autocomplete reflect the new option.
* 3 new tests in `composer_stash`: returns-0 when file absent,
  returns-0 when file is empty, drops entries and reports count
  on a populated stash.

No new dependency; reuses `crate::utils::write_atomic` for the
truncate-and-rewrite.
2026-05-03 06:28:18 -05:00
Hunter Bown ba871c56f6 feat(cli): deepseek pr <N> — pre-seed TUI with PR context (#451)
`deepseek pr 1234` fetches the PR's title, body, base/head, URL,
and full diff via `gh`, then launches the interactive TUI with a
review prompt already typed in the composer. The user can edit
before sending or hit Enter to fire as-is. Falls back gracefully
with an actionable error when `gh` is not on PATH.

Implementation:

* `Commands::Pr { number, repo, checkout }` subcommand. Optional
  `--repo <owner/name>` mirrors `gh pr view`'s flag. Optional
  `--checkout` opt-in for `gh pr checkout`; default is to leave
  the working tree alone since `gh pr checkout` errors out on
  dirty trees.
* `run_pr` helper drives three best-effort gh shell-outs
  (`pr view --json`, `pr diff`, optional `pr checkout`) and
  formats a structured prompt: PR header → URL → branches →
  description → fenced ```diff block.
* `format_pr_prompt` caps the diff at 200 KiB with codepoint-
  safe truncation so a massive PR doesn't blow the model's
  context window before the user even hits Enter.
* New `TuiOptions::initial_input: Option<String>` plumbs the
  pre-typed text into `App::new` (which now branches its
  composer-state init around the option). Cursor lands at the
  end of the seed text. Future callers (welcome screens, share-
  link landing pages, etc.) can reuse the same channel.
* `run_interactive` gains an `initial_input: Option<String>`
  parameter; existing callers pass `None`.

Tests:
  3 new tests in `pr_prompt_tests` cover the happy path
  (title/url/branches/body/diff render correctly), empty-input
  fallbacks (placeholder for missing title/body/branches/url),
  and codepoint-safe truncation when the diff exceeds the
  200 KiB cap.
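
Codepoint-safe truncation can be sketched with
`str::is_char_boundary`. The function name and signature here are
hypothetical, not the ones used by `format_pr_prompt`; for the cap
in this commit the caller would pass `200 * 1024`.

```rust
/// Returns the longest prefix of `text` that fits in `max_bytes`
/// without splitting a multi-byte UTF-8 character.
pub fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Slicing at an arbitrary byte offset would panic when the offset
lands inside a character, which is why the cut walks backwards to
the nearest boundary.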

Bulk update: every other `TuiOptions { ... }` test-builder
across the workspace (~21 sites) gains `initial_input: None`
so the new field doesn't break the existing test suite.
2026-05-03 06:23:54 -05:00
Hunter Bown a9222f4b8c feat(stash): make /stash run directly from the command palette (#440 polish)
`/stash` defaults to `list` when invoked without an argument, so
in the Ctrl+K command palette it should execute on Enter rather
than insert `/stash ` and wait for the user to type `list`. The
identical pattern already applies to `/queue`, which has the same
optional-arg shape.

Adds `"stash"` to the `command_runs_directly` allowlist alongside
`queue`. The fuzzy-search rank, label match, and section grouping
already pick up `/stash` automatically because they iterate over
`commands::COMMANDS` (which gained the entry in 2db4843).

No behavior change on type-then-Enter — only on the
hit-Enter-from-the-palette path. The existing 8 command-palette
tests pass unchanged.
2026-05-03 06:14:34 -05:00
Hunter Bown 2db48435e8 feat(stash): register /stash in /help and autocomplete (#440 polish)
The slash command landed in 6fb87 but only via the dispatch
match arm — `/help` and the fuzzy autocomplete consult
`COMMANDS: &[CommandInfo]` to enumerate available commands, and
without a `CommandInfo` entry the new `/stash` was effectively
hidden from discovery.

Adds a `CommandInfo` row with `aliases: &["park"]`, a
`/stash [list|pop]` usage hint, and a new
`MessageId::CmdStashDescription` localized in en, ja, zh-Hans,
pt-BR. The description reminds users that Ctrl+S is the
matching push entry point — both surfaces should reinforce each
other in the help overlay.

No behavior change on the dispatch path; this is pure
discoverability.
2026-05-03 06:13:06 -05:00
Hunter Bown 6fb8739feb feat(composer): prompt stash — Ctrl+S parks, /stash list+pop (#440)
A stash is a side-channel from history: it holds drafts the user
parked deliberately instead of submissions made in the past
(which live in `composer_history.rs`).

* `crates/tui/src/composer_stash.rs` — JSONL-backed store at
  `~/.deepseek/composer_stash.jsonl`. One JSON object per line
  with `ts` (RFC 3339) and `text`. Self-healing parser drops
  malformed lines instead of poisoning the file. Multi-line
  drafts round-trip intact via JSON's newline escaping. Capped
  at 200 entries; oldest pruned at push time. Empty /
  whitespace-only text is silently dropped.
* `crates/tui/src/commands/stash.rs` — `/stash list` renders the
  stash with one-line previews and timestamps; `/stash pop`
  restores the most recently parked draft into the composer
  (LIFO) and rewrites the file. `/park` aliases `/stash`.
* Composer Ctrl+S handler in `tui/ui.rs` — pushes the current
  draft onto the stash, clears the composer, and surfaces a
  toast confirming the action so the silently cleared composer
  doesn't read as a failed keypress. Empty composers are a
  no-op so a stray Ctrl+S can't pollute the file.
* New `KbStashDraft` keybinding entry registered in the help
  overlay; localized in en, ja, zh-Hans, pt-BR.

Tests:
  7 unit tests in `composer_stash.rs` cover round-trip, LIFO pop,
  empty-on-pop, drop-empty-text, multi-line preservation,
  malformed-line resilience, and cap pruning. 4 unit tests in
  `commands/stash.rs` cover the preview helper's truncation,
  multi-line first-line behavior, and empty-input handling.
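
The push / pop / cap rules can be sketched in memory. This is an
illustration under assumptions: the `Stash` type and its methods
are hypothetical, and JSONL persistence and timestamps are left
out. The 200-entry cap, LIFO pop, oldest-first pruning, and the
empty-text drop come from this commit; the count-returning `clear`
mirrors the later `clear_stash()` commit.

```rust
pub const STASH_CAP: usize = 200;

/// In-memory stand-in for the stash; oldest entry first.
#[derive(Default)]
pub struct Stash {
    entries: Vec<String>,
}

impl Stash {
    /// Parks a draft. Returns false for empty / whitespace-only text.
    pub fn push(&mut self, text: &str) -> bool {
        if text.trim().is_empty() {
            return false;
        }
        self.entries.push(text.to_string());
        // Prune the oldest entries once the cap is exceeded.
        if self.entries.len() > STASH_CAP {
            let excess = self.entries.len() - STASH_CAP;
            self.entries.drain(..excess);
        }
        true
    }

    /// Restores the most recently parked draft (LIFO).
    pub fn pop(&mut self) -> Option<String> {
        self.entries.pop()
    }

    /// Drops everything and reports how many entries were removed.
    pub fn clear(&mut self) -> usize {
        let dropped = self.entries.len();
        self.entries.clear();
        dropped
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```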
2026-05-03 06:09:35 -05:00
Hunter Bown 99223b148c docs(prompt): list load_skill in the model's toolbox reference (#434)
The new `load_skill` tool was registered into the agent and plan
mode tool sets in 0c1699 but the prompt's `## Toolbox`
quick-reference still listed only the legacy progressive-
disclosure pattern (system prompt → read_file). The model has to
read the tool description to know `load_skill` exists, but
without a hint in the toolbox it's easy to miss when scanning.

Adds a `**Skills**` line that points at `load_skill` and explains
when to prefer it over `read_file` + `list_dir`. Pulls from the
existing `## Skills` section above for context, so the model
sees one short cross-reference instead of duplicate setup
instructions.

No code change; prompt-only doc edit. Existing prompt-stability
tests pass unchanged because they don't compare prose.
2026-05-03 06:01:15 -05:00
Hunter Bown 0fa042dc99 feat(audit): emit tool.spillover events when output is spilled (#500 polish)
The existing `tool.result` audit event records that a tool
finished but says nothing about spillover — operators tailing
`~/.deepseek/audit.log` couldn't see when 200 KiB of stdout
landed under `~/.deepseek/tool_outputs/`.

Adds a discrete `tool.spillover` event keyed off
`apply_spillover`'s return value, fired in both the sequential
and parallel tool paths so the log entry exists regardless of
how the tool was scheduled. Each event carries:

  {"event": "tool.spillover", "tool_id": "...",
   "tool_name": "exec_shell", "path": "/.../call-abc.txt"}

This is a pure observability addition. The model still receives
the same truncated head + footer; the UI still renders the
inline `full output: <path>` annotation; the spillover writer
contract is unchanged. No new tests — `apply_spillover` already
has unit-level coverage and the engine paths are exercised by
integration runs.
2026-05-03 05:58:02 -05:00
Hunter Bown 6b0a60883a test(skill): integration tests for the load_skill execute path (#434/#432)
The five existing tests cover the helpers (`format_skill_body`,
`collect_companion_files`) directly. Adds two integration tests
that drive the full `LoadSkillTool::execute` async path:

* `execute_finds_skills_in_opencode_dir_via_workspace_discovery` —
  installs a skill under `<workspace>/.opencode/skills/` and
  verifies the tool finds it via `discover_in_workspace`,
  returns the body, and stamps `metadata.skill_path` pointing
  at the .opencode dir. Pins #432's multi-dir wiring through
  the actual tool entry point, not just the unit-level helper.
* `execute_returns_helpful_error_for_unknown_skill` — verifies
  the "skill not found" error includes both the missing name
  and the available skill list so the model can recover
  without a separate discovery call.

Both use `#[tokio::test]` because `ToolSpec::execute` is async.
ToolContext is constructed via the existing `ToolContext::new`
helper so the test stays hermetic across hosts.
2026-05-03 05:56:29 -05:00
Hunter Bown d7017b7829 feat(skills): walk workspace .opencode + .claude skill dirs (#432)
The skills catalogue and `load_skill` tool now scan every
candidate directory in the workspace plus the global default,
not just the first one that exists:

  <workspace>/.agents/skills    (deepseek-native convention)
  <workspace>/skills            (flat, project-local)
  <workspace>/.opencode/skills  (OpenCode interop)
  <workspace>/.claude/skills    (Claude Code interop)
  ~/.deepseek/skills            (global, user-installed)

Skills installed for any AI-tool convention land in the same
catalogue without the user having to symlink or duplicate
files. Name conflicts resolve first-match-wins per the
precedence list above, so workspace-local skills shadow
user/global ones — that's the right shadowing for "this repo
overrides my defaults".
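
First-match-wins merging across directories can be sketched as
follows. `merge_first_wins` and its tuple-based input are
hypothetical simplifications; the real `discover_in_workspace`
walks the filesystem and also accumulates warnings.

```rust
use std::collections::HashSet;

/// `sources` is a list of (directory, skill names) in precedence
/// order. Returns (skill name, winning directory) pairs; a name
/// seen in an earlier directory shadows later duplicates.
pub fn merge_first_wins<'a>(sources: &[(&'a str, Vec<&'a str>)]) -> Vec<(&'a str, &'a str)> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for (dir, names) in sources {
        for name in names {
            // `insert` returns false when the name was already claimed.
            if seen.insert(*name) {
                merged.push((*name, *dir));
            }
        }
    }
    merged
}
```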

Implementation:

* `skills::skills_directories(workspace)` returns the existing
  candidate dirs in precedence order (host-dependent for the
  global default).
* `skills::discover_in_workspace(workspace)` walks each, merges
  the discovered skills, and accumulates warnings.
* `render_available_skills_context_for_workspace(workspace)`
  wraps `discover_in_workspace` for `prompts.rs`. The legacy
  single-dir `render_available_skills_context(skills_dir)` is
  retained as a fallback so callers that don't have a workspace
  view (e.g. mcp_server.rs) still work.
* `LoadSkillTool` (#434) routes through `discover_in_workspace`
  so its lookup matches what the system-prompt catalogue
  advertises. The "skill not found" error message now lists the
  searched dirs to help the user debug missing installs.

Tests:
  4 new tests in `skills/mod.rs`: precedence-order resolution,
  first-wins merge across .agents and .claude, .opencode
  discovery, system-prompt rendering for cross-tool dirs. The
  existing 6 single-dir tests pass unchanged.
2026-05-03 05:52:28 -05:00
Hunter Bown 8290b136e1 feat(tui): push DISAMBIGUATE_ESCAPE_CODES on startup (#442)
Opt into the Kitty keyboard protocol's escape-code disambiguation
so terminals that support it (Kitty, Ghostty, Alacritty 0.13+,
WezTerm, recent Konsole / xterm) report unambiguous events for
Option/Alt-modified keys, plain Esc, and multi-byte sequences.

Push happens after `enable_raw_mode` and the alt-screen /
mouse-capture / bracketed-paste setup so the order matches
shutdown's reverse-order pop. Only the disambiguation tier is
pushed — `REPORT_EVENT_TYPES` and the higher tiers emit release
events that the existing key handlers would mis-route as
duplicate presses.

Pop on exit was already wired in main.rs (panic) and ui.rs
(normal shutdown) per #443; the recent #443 follow-up extended
that to the suspend paths so editor / shell-suspend children
inherit a clean keyboard mode. The push + the four pops form
a complete pair.

Failure to push is logged at debug level and ignored — a quirky
terminal can't block startup. On terminals without protocol
support the escape sequence is silently discarded and behaviour
is identical to today (iTerm2, Terminal.app, Windows 10 conhost).

No new dependency; everything runs through crossterm's existing
`PushKeyboardEnhancementFlags` command.
2026-05-03 05:45:52 -05:00
Hunter Bown e8af3cd37d feat(tools): load_skill model-callable tool (#434)
Adds a `load_skill` tool that takes a skill id and returns the
SKILL.md body plus the sibling companion-file list in one tool
call. The existing progressive-disclosure pattern (system prompt
lists skills → model `read_file <path>`) still works; this tool
is the higher-level affordance for skills that ship with multiple
resource files.

Implementation:

* `LoadSkillTool` lives in `crates/tui/src/tools/skill.rs`. Read-
  only, auto-approved, parallel-safe.
* On call, resolves the active skills directory via the new
  `skills::resolve_skills_dir` helper, which mirrors
  `App::new`'s hierarchy: `<workspace>/.agents/skills` →
  `<workspace>/skills` → `~/.deepseek/skills`. No new plumbing
  through ToolContext — the workspace is already there.
* Returns the skill body wrapped in a self-contained block:
  description quote, source path, the SKILL.md verbatim, and a
  `## Companion files` section listing siblings (sorted lex,
  deterministic for tests). Solo skills skip the companions
  section entirely so the tool result stays tight.
* Errors with a helpful hint when the name is unknown — the
  hint includes the catalogue ("Available: foo, bar, baz") so
  the model can recover without an extra discovery call.
* Wired into `ToolRegistryBuilder::with_skill_tools` and pulled
  into both Agent and Plan tool-setup paths. Plan mode benefits
  because skills are read-only references that planners often
  need.

Tests:
  5 unit tests covering: description-headed body, companion
  enumeration excluding SKILL.md and nested dirs, empty result
  for solo skills, and the conditional `## Companion files`
  section.
2026-05-03 05:43:18 -05:00
Hunter Bown 20913b2f17 test(config): pin instructions-array merge semantics (#454 follow-up)
Adds four tests that pin the documented contract for the new
`instructions = [...]` field added in 0c1699:

* Project array replaces the user array wholesale. Users who
  want both lists get the "merge" effect by naming `~/global.md`
  inside the project array.
* Explicit `instructions = []` clears the user list — a project
  signalling "this repo doesn't want any of those globals".
* Absent project field leaves the user list intact (nothing
  in the project file → user wins by default).
* Empty / whitespace-only entries are filtered out — the user
  shouldn't get a "could not read instructions file" warning
  for a stray `""` in the array.

These were the semantics promised in the original #454 commit
and the `config.example.toml` doc; pinning them with tests
prevents regressions.
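
The four rules reduce to a small function. This is a sketch with a
hypothetical name and signature; the real logic lives in
`merge_project_config` and `Config::instructions_paths()`, and
path expansion is omitted here.

```rust
/// `None` means the field is absent from that config layer;
/// `Some(vec![])` is an explicit empty array.
pub fn merge_instructions(
    user: Option<Vec<String>>,
    project: Option<Vec<String>>,
) -> Vec<String> {
    project
        .or(user) // a present project array replaces the user array wholesale
        .unwrap_or_default()
        .into_iter()
        .filter(|entry| !entry.trim().is_empty()) // drop empty / whitespace-only entries
        .collect()
}
```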
2026-05-03 05:33:09 -05:00
Hunter Bown 5deaf97253 fix(tui): pop keyboard flags on suspend paths too (#443 follow-up)
`main.rs` (process panic) and the normal TUI shutdown both pop
keyboard enhancement flags before handing the terminal back to
the parent shell. The two suspend paths, `pause_terminal`
(Ctrl+Z and shell-suspend) and
`external_editor::spawn_editor_for_input` (composer `$EDITOR`
launch), were missing the same defensive pop.

Today this is dormant: the TUI doesn't push keyboard
enhancement flags explicitly, so there's nothing to pop. The
fix is defence-in-depth: the day a future code path enables
the flags (kitty keyboard protocol for sub-second-precision
modifier reporting, say), the suspend handlers won't leak the
half-configured input mode to Vim / less / a shell child.

Aligns the four terminal-handoff sites (shutdown, panic,
suspend, editor) so they all do the same thing.
2026-05-03 05:29:11 -05:00
Hunter Bown ac0c16996e feat(config): instructions array merged into system prompt (#454)
Adds a new optional `instructions = ["./AGENTS.md", "~/.deepseek/global.md"]`
config field that's loaded at startup and concatenated into the
system prompt, in declared order, above the skills block.

* `Config::instructions: Option<Vec<String>>` — raw paths from
  `~/.deepseek/config.toml` or the per-project overlay.
* `Config::instructions_paths()` — `expand_path` each entry,
  drop empties, return the resolved `Vec<PathBuf>`.
* `merge_project_config` — project's array replaces the
  user-level array wholesale (including `instructions = []` to
  clear the user list for the current repo). Users who want both
  lists get the "merge" effect by naming `~/global.md` inside the
  project array.
* `EngineConfig::instructions: Vec<PathBuf>` — threaded from
  config through both engine entry points (`Engine::new` for
  Default and `refresh_system_prompt` for runtime swaps).
* `prompts::render_instructions_block(paths)` — loads each file
  in order, caps each at 100 KiB with a `[…elided]` marker on
  overflow, skips missing files with a tracing warning. Returns
  `None` when nothing renders so the caller appends nothing.
* `system_prompt_for_mode_with_context_and_skills` gains an
  `instructions: Option<&[PathBuf]>` parameter. Block lives
  between the project-context block and the skills block so it
  benefits from KV prefix caching and per-project overrides
  apply consistently turn-over-turn.

Documentation:
* `config.example.toml` documents the field, the wholesale-
  override semantics, and the size cap.

Tests:
* 5 new tests in `prompts.rs`: no-op for empty input, skip
  missing files, declared-order concatenation, skip empty files,
  truncate oversize files, plus an end-to-end test that the
  block appears in the assembled system prompt when configured.
2026-05-03 05:25:31 -05:00
Hunter Bown 5e83f073b1 feat(footer): cumulative session-elapsed indicator (#448)
Adds `App::session_started_at: Instant` (set at construction) and
a low-priority `worked Nh Mm` chip in the footer's right cluster
that surfaces session age once it crosses 60s.

* `footer_worked_chip(elapsed)` returns empty spans for the first
  minute of a session so a fresh launch doesn't render a noisy
  ticker. Above the threshold it reuses the multi-day
  `humanize_duration` helper (#447) so the band promotion stays
  consistent: `1m`, `3h 12m`, `2d 5h`, `1w 2d`.
* The chip slots in last in `auxiliary_spans`, which means under
  narrow widths it's the first thing the priority-drop loop
  removes — the existing chips (coherence / agents / replay /
  cache / mcp) keep their slots.
* `FooterProps` carries a captured `worked: Vec<Span<'static>>`
  built at props-build time (matches the existing `retry`
  capture pattern). Render stays pure, tests can pin a known
  state without relying on wall-clock.

Tests:
  3 new tests in `tui/widgets/footer.rs` — chip hidden under 60s,
  chip rendered with humanized labels at 60s / 3h 12m / 2d 5h
  bands. The existing `from_app_idle_state` test gains a
  `worked.is_empty()` assertion (the test app is freshly
  constructed, well under the 60s threshold).
2026-05-03 05:17:01 -05:00
Hunter Bown 6dfb10f321 feat(a11y): NO_ANIMATIONS env override + accessibility docs (#450)
`fancy_animations: false` and `low_motion: true` already exist on
the settings struct, but the flags were undocumented and the only
ways to opt in were the `/settings` slash command or hand-editing
`~/.config/deepseek/settings.toml`. There was no
environment-level signal that platform a11y tooling could carry
forward.

* `NO_ANIMATIONS=1` env var now forces `low_motion = true` and
  `fancy_animations = false` at startup, regardless of what's on
  disk. Recognises `1`, `true`, `yes`, `on` (case-insensitive);
  any other value is treated as unset.
* `Settings::apply_env_overrides()` is now called at the end of
  `Settings::load()`, so every consumer (App::new, /config, the
  doctor surface) sees the override applied uniformly. The
  override is a startup-time overlay — changing the env var
  mid-session has no effect.
* New `docs/ACCESSIBILITY.md` documents the existing `low_motion`,
  `fancy_animations`, `calm_mode`, `show_thinking`, and
  `show_tool_details` toggles plus the `NO_ANIMATIONS` startup
  override. Includes guidance for screen-reader users and a link
  back to this issue for follow-up motion regressions.
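
A minimal sketch of the override logic, assuming a simplified settings struct. `MotionSettings`, `env_flag_is_truthy`, and `apply_no_animations_override` are hypothetical names for illustration; only the accepted spellings and the forced values come from the description above.

```rust
/// Returns true for the spellings listed above: 1, true, yes, on
/// (case-insensitive). Every other value counts as unset.
pub fn env_flag_is_truthy(value: &str) -> bool {
    matches!(
        value.to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Hypothetical stand-in for the two motion fields on the settings struct.
#[derive(Debug, Clone, PartialEq)]
pub struct MotionSettings {
    pub low_motion: bool,
    pub fancy_animations: bool,
}

/// Overlay applied once at load time. `raw` is the value of
/// NO_ANIMATIONS if the variable is set, otherwise None.
pub fn apply_no_animations_override(
    mut settings: MotionSettings,
    raw: Option<&str>,
) -> MotionSettings {
    if raw.map(env_flag_is_truthy).unwrap_or(false) {
        settings.low_motion = true;
        settings.fancy_animations = false;
    }
    settings
}
```

A caller would pass something like `std::env::var("NO_ANIMATIONS").ok().as_deref()` as `raw`. Taking the value as a parameter keeps the function testable without mutating the process environment.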

Tests:
  3 new tests in `settings.rs` (force-low-motion-on, override-
  user-opt-in, truthy-spelling-recognition). All three serialise
  through a static Mutex so the cargo parallel runner doesn't
  observe interleaved env mutations.
2026-05-03 05:09:17 -05:00
Hunter Bown 3625b887fa feat(ui): humanize_duration handles hours, days, and weeks (#447)
Long-running sessions (multi-hour cycles, multi-day automations)
were rendering with the seconds/minutes-only formatter, so a
two-day session showed as `2880m 0s` and `/goal` status used
Rust's Debug Duration form (`188415.234s`).

`humanize_duration` now walks through w/d/h/m/s and caps the
output at two units so it stays compact in headers and
notifications:

* `45s`, `1m 12s`, `59m 59s` (existing seconds/minutes path)
* `1h`, `2h 2m`, `3h 12m` (was `192m 30s`)
* `1d`, `1d 1h`, `2d 5h` (the multi-day case from the issue)
* `1w`, `1w 1d`, `3w 2d` (long-running automation)

The two-tier rule drops sub-minute precision once you're past
the hour boundary; the goal is "is this a couple of hours or
days," not stopwatch precision.
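
The two-unit cap can be sketched as below. This is a reconstruction from the examples in this message, not the code in `notifications.rs`; the `0s` result for a zero duration and the truncation (rather than rounding) of the second unit are assumptions.

```rust
use std::time::Duration;

/// Formats a duration using at most two adjacent units out of w/d/h/m/s.
/// The largest unit that fits comes first; the next smaller unit is
/// appended only when it is non-zero.
pub fn humanize_duration(d: Duration) -> String {
    const UNITS: [(u64, &str); 5] = [
        (604_800, "w"),
        (86_400, "d"),
        (3_600, "h"),
        (60, "m"),
        (1, "s"),
    ];
    let mut remaining = d.as_secs();
    if remaining == 0 {
        return "0s".to_string(); // assumption: zero renders as "0s"
    }
    // `remaining >= 1`, so the seconds entry always matches at the latest.
    let first = UNITS
        .iter()
        .position(|&(size, _)| remaining >= size)
        .unwrap();
    let (size, label) = UNITS[first];
    let mut out = format!("{}{}", remaining / size, label);
    remaining %= size;
    if let Some(&(next_size, next_label)) = UNITS.get(first + 1) {
        let n = remaining / next_size;
        if n > 0 {
            out.push_str(&format!(" {}{}", n, next_label));
        }
    }
    out
}
```

With this shape, anything below the second unit is simply discarded, which is how `192m 30s` collapses to `3h 12m`.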

`/goal` status now wires through this formatter so multi-day
goal-elapsed times read as `2d 4h` instead of the previous
`188415.234s` Debug form. The notification system was the
existing caller and picks up the new format automatically.

Tests:
  4 test functions in `notifications.rs` covering the four
  formatting bands (s/m, h/m, d/h, w/d) plus the boundary cases
  on each unit.
2026-05-03 05:05:30 -05:00
Hunter Bown 0b99ad1f25 feat(engine): wire tool-output spillover into the engine and pager (#500)
The spillover writer (#422) and inline cell annotation (#423) were
already in place; this commit makes the pipeline actually fire and
gives the user a way to see the elided tail.

* `apply_spillover` lives in `tools/truncate.rs` and mutates a
  `ToolResult` in place: writes the full content to
  `~/.deepseek/tool_outputs/<id>.txt`, replaces the inline content
  with a 32 KiB head plus a footer pointing at the file, and stamps
  `metadata.spillover_path` so downstream renderers can find it.
  Skips error results so the model still sees the failure verbatim.
  Preserves prior metadata when present.

* `core/engine/turn_loop.rs` calls `apply_spillover` immediately
  after `execute_tool_with_lock` returns, before the result fans
  out to the model context (`ContentBlock::ToolResult`) and the UI
  (`Event::ToolCallComplete`). Both the parallel and sequential
  tool paths get the same hook so the model and the UI always see
  the same truncated content.

* `tui/ui.rs::open_details_pager_for_cell` now folds the full
  spillover-file body into the tool-details pager when the focused
  cell has a `spillover_path`. Truncated head stays at the top
  (so the user can see what the model received) followed by a
  `── Full output (spillover) ──` separator and the file body.
  Missing files render an inline notice instead of silently
  truncating.

* The model's footer ("Use `read_file path=…` if you need the
  elided tail") teaches the agent how to recover the rest of the
  payload on its next turn, so spilled output is not lost — just
  not paid for in context tokens unless the agent decides it
  actually needs the tail.
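
As a rough sketch of the pager fold described above: the truncated head comes first, then the separator, then either the file body or an inline notice. The function name `pager_body` and the wording of the missing-file notice are assumptions; only the separator text is taken from this message.

```rust
use std::fs;
use std::path::Path;

const SPILLOVER_SEPARATOR: &str = "── Full output (spillover) ──";

/// Builds the text shown in the details pager for one tool cell.
/// Without a spillover path the truncated head is returned unchanged.
pub fn pager_body(truncated_head: &str, spillover_path: Option<&Path>) -> String {
    let Some(path) = spillover_path else {
        return truncated_head.to_string();
    };
    // A missing or unreadable file becomes a visible notice, not silence.
    let tail = match fs::read_to_string(path) {
        Ok(body) => body,
        Err(err) => format!(
            "[spillover file unavailable: {}: {}]",
            path.display(),
            err
        ),
    };
    format!("{}\n{}\n{}", truncated_head, SPILLOVER_SEPARATOR, tail)
}
```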

Tests:
  4 new unit tests in `tools/truncate.rs` (no-op below threshold,
  no-op for errors, truncate + stamp above threshold, preserve
  prior metadata). 3 new tests in `tui/ui/tests.rs` for the pager
  helper (no-op without spillover_path, file-load happy path,
  graceful notice when the file is missing).
2026-05-03 05:02:11 -05:00
Hunter Bown 637d0f088f fix(agents): list Implementer/Verifier in agent_spawn + agent_assign schemas (#404)
The SubAgentType enum gained `Implementer` and `Verifier` variants in #404,
but the JSON-schema `description` strings on AgentSpawnTool::input_schema
and DelegateToAgentTool::input_schema still listed the pre-#404 set
(general/explore/plan/review/custom). The model only sees those
descriptions, so the new roles were effectively hidden behind a
docs lookup.

Updates both descriptions to the post-#404 surface and references
docs/SUBAGENTS.md for posture. Also adds the long-form aliases
(builder/validator/tester) to the agent_assign hint so it matches
the canonical alias map. Pure copy change — no behaviour delta.
2026-05-03 04:50:51 -05:00
Hunter Bown 482fcdee7c docs(changelog): collapse #422 + #423 spillover entry
Both halves now shipped; combined entry reads more clearly than two
separate ones split across Added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:45:03 -05:00
Hunter Bown de4085304d feat(tui): inline spillover-path annotation in tool cells (#423)
PR #422 (sister commit on this branch) shipped the storage half:
\`crates/tui/src/tools/truncate.rs\` writes large tool outputs to
\`~/.deepseek/tool_outputs/<id>.txt\` and the boot prune drops files
older than 7 days. This commit ships the UI half — the inline
annotation that surfaces the spilled path in the tool cell so the
user (and the model) can find the elided tail.

### What's wired

- New \`spillover_path: Option<PathBuf>\` field on \`GenericToolCell\`.
  Threaded through every construction site (production +
  test fixtures = 28 sites; bulk-updated via a Python regex that
  preserves indentation per site).
- \`tool_routing::push_orphan_tool_completion\` now reads
  \`ToolResult.metadata.spillover_path\` and stamps it on the cell.
  When tools start writing the metadata field (#500's wiring step),
  the annotation lights up automatically.
- \`GenericToolCell::lines_with_mode\` emits a one-line muted
  annotation in \`RenderMode::Live\` only:

      full output: /Users/you/.deepseek/tool_outputs/call-abc12.txt

  Transcript-mode replay omits the annotation because the full
  output is already inline.
- \`render_spillover_annotation\` truncates the path to fit narrow
  widths (40-col sidebar friendly) using the existing
  \`truncate_text\` helper.

### Why no OSC 8 hyperlink yet

The OSC 8 wrap-link helper lives on PR #515's branch (also stacked
on \`chore/v0.8.8-stabilization\`); both PRs land independently to
\`main\`. Once both are in, a follow-up commit can wrap the path
in \`osc8::wrap_link\` so supporting terminals make it
Cmd+click-openable. The plain-text path works in every terminal
today, so there's no functional regression.

### Tests

4 new tests in \`tui::history::tests\`:
- \`render_spillover_annotation_shows_path\` — full path appears in
  the live-mode render
- \`render_spillover_annotation_omitted_in_transcript_mode\` —
  transcript replay leaves the annotation off
- \`render_spillover_annotation_omitted_when_no_path_set\` — the
  common case (most tool results don't trigger spillover) is
  unaffected
- \`render_spillover_annotation_truncates_to_width\` — narrow
  widths don't overflow the line

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1877 + supporting (was 1873)

Closes #423. #500 (preview pane) now has both halves of its
prerequisites in place — the bytes are on disk (#422) and the path
is surfaced in the cell (#423).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:44:43 -05:00
Hunter Bown cea4617fb4 docs(changelog): record #422 spillover writer in v0.8.8 entry
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:38:17 -05:00
Hunter Bown cf616e03bd feat(tools): spillover-file writer + 7-day boot prune (#422)
#500 (tool-output spillover preview pane) explicitly depends on
#422 (the storage writer) and #423 (the UI annotation). This ships
the storage half so the other two unblock cleanly.

### What's wired

New module \`crates/tui/src/tools/truncate.rs\`:

- \`spillover_root()\` — resolves \`~/.deepseek/tool_outputs/\`.
- \`spillover_path(id)\` — sanitises the tool call id (ASCII
  alphanumerics + \`-\`/\`_\`, drops \`.\` so \`..\` can't escape) and
  returns \`<root>/<id>.txt\`.
- \`write_spillover(id, content)\` — atomic write via the existing
  \`utils::write_atomic\` helper. Creates parent directory on demand.
- \`prune_older_than(max_age)\` — drops files older than \`max_age\`
  by mtime. Returns the count pruned. Per-file errors are logged
  and skipped, never propagated.
- \`maybe_spillover(id, content, threshold, head_bytes)\` —
  convenience for the "too long? spill it." pattern. Walks back to
  the previous UTF-8 char boundary so the head slice is always
  valid \`str\`.
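
Two of the helpers above lend themselves to a short sketch: the id sanitiser and the char-boundary walk-back. The code below is illustrative and assumes its own signatures (`sanitise_id` returning `Option<String>`, `head_slice` as a free function); the real module may be organised differently.

```rust
/// Keeps ASCII alphanumerics plus '-' and '_'. Everything else, including
/// '.' and '/', is dropped so a `..` sequence cannot escape the root.
/// Returns None when nothing survives.
pub fn sanitise_id(id: &str) -> Option<String> {
    let clean: String = id
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect();
    if clean.is_empty() {
        None
    } else {
        Some(clean)
    }
}

/// Returns at most `max_bytes` of `content`, walking back to the previous
/// UTF-8 char boundary so the slice is always valid `str`.
pub fn head_slice(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}
```

Slicing a `str` at a non-boundary index panics in Rust, which is why the walk-back matters for multi-byte content such as emoji.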

Constants:
- \`SPILLOVER_DIR_NAME = "tool_outputs"\`
- \`SPILLOVER_THRESHOLD_BYTES = 100 KiB\` (mirrors
  \`MAX_MEMORY_SIZE\` for cross-feature consistency)
- \`SPILLOVER_MAX_AGE = 7 days\` (mirrors workspace snapshot prune)

Boot wiring in \`run_interactive\` calls \`prune_older_than\`
unconditionally; non-fatal — any error is logged at WARN and the
TUI starts anyway.
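
A minimal sketch of an mtime-based prune with the same "log and skip" error policy. It assumes the root directory is passed in explicitly and uses `eprintln!` in place of the real logger; the actual `prune_older_than(max_age)` resolves the root itself.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Removes regular files under `root` whose mtime is older than `max_age`.
/// Returns the number of files removed. Per-file errors are reported and
/// skipped; a missing root is treated as "nothing to prune".
pub fn prune_older_than(root: &Path, max_age: Duration) -> usize {
    let entries = match fs::read_dir(root) {
        Ok(entries) => entries,
        Err(_) => return 0,
    };
    let now = SystemTime::now();
    let mut pruned = 0;
    for entry in entries.flatten() {
        let path = entry.path();
        if !path.is_file() {
            continue;
        }
        let modified = match entry.metadata().and_then(|m| m.modified()) {
            Ok(time) => time,
            Err(err) => {
                eprintln!("skipping {}: {}", path.display(), err);
                continue;
            }
        };
        // An mtime in the future counts as age zero.
        let age = now.duration_since(modified).unwrap_or(Duration::ZERO);
        if age > max_age {
            match fs::remove_file(&path) {
                Ok(()) => pruned += 1,
                Err(err) => eprintln!("could not prune {}: {}", path.display(), err),
            }
        }
    }
    pruned
}
```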

### Module-level \`#![allow(dead_code)]\`

The boot-prune is the only live caller today. The storage helpers
(\`write_spillover\`, \`maybe_spillover\`, \`spillover_path\`) are
intentionally unused outside the module's own tests until #423 / #500
land — those follow-ups need the bytes to exist, and the contracts
are pinned by tests so they can't drift before then. Module-level
\`#![allow(dead_code)]\` documents the deferral with a comment
pointing at the follow-up issues.

### Tests

8 unit tests in \`tools::truncate::tests\`:
- \`sanitise_id\` keeps safe chars, drops dangerous ones (\`..\`,
  \`/etc/passwd\` traversal attempts).
- \`write_spillover\` creates the directory and writes content.
- \`write_spillover\` rejects empty / fully-invalid ids.
- \`maybe_spillover\` returns \`None\` below threshold.
- \`maybe_spillover\` writes + returns the head slice above
  threshold.
- \`maybe_spillover\` walks back to a char boundary so the head
  string is never mid-codepoint (regression test using 4-byte
  whale emojis).
- \`prune_older_than\` is a no-op when the root doesn't exist.
- \`prune_older_than\` keeps fresh files and drops stale ones via a
  Unix \`utimensat\` backdating helper.

Tests serialize through a static \`Mutex\` because they share
process-global \`$HOME\`; the \`with_test_home\` helper documents
the SAFETY contract for the env-var override.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1873 + supporting (was 1865)

Closes #422 (storage half). #423 and #500 remain open with the
bytes now reachable on disk for them to cite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:38:00 -05:00
Hunter Bown 01fa11b96f docs(changelog): note /sessions prune slash command in #406 entry
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:31:18 -05:00
Hunter Bown 89500e4ebe feat(commands): /sessions prune <days> slash command (#406 phase-1.5)
The previous commit shipped \`SessionManager::prune_sessions_older_than\`
as a bare helper marked \`#[allow(dead_code)]\` pending phase-2 wiring.
This commit wires it into a user-callable slash command so users can
clean up stale sessions today, and removes the dead-code allow.

### Surface

```
/sessions                      → open the picker (existing)
/sessions show | list | picker → alias for the picker
/sessions prune <days>         → drop sessions older than N days
```

\`/sessions prune 30\` returns "pruned N sessions older than 30d" or
"no sessions older than 30d to prune". Errors:
- missing arg → usage hint
- non-positive / non-integer arg → typed error
- unknown subcommand → typed error with usage

The prune handler builds a fresh \`SessionManager\` from
\`default_location\` so it reads the same \`~/.deepseek/sessions/\`
directory the persistence layer writes. It doesn't take a lock
because it's a one-shot CLI-style operation that runs to completion.
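
The dispatch rules above can be sketched as a small parser. `SessionsCommand` and `parse_sessions_arg` are hypothetical names, and the error strings are placeholders; the real handler returns the project's own command result type.

```rust
#[derive(Debug, PartialEq)]
pub enum SessionsCommand {
    Show,
    Prune { days: u64 },
}

const USAGE: &str = "usage: /sessions [show|prune <days>]";

/// Parses the text after `/sessions`. No argument, `show`, `list`, and
/// `picker` all open the picker; `prune` needs a positive whole number.
pub fn parse_sessions_arg(arg: Option<&str>) -> Result<SessionsCommand, String> {
    let mut words = arg.unwrap_or("").split_whitespace();
    match words.next() {
        None | Some("show") | Some("list") | Some("picker") => Ok(SessionsCommand::Show),
        Some("prune") => {
            // Missing argument: answer with the usage hint.
            let raw = words.next().ok_or_else(|| USAGE.to_string())?;
            match raw.parse::<u64>() {
                Ok(days) if days > 0 => Ok(SessionsCommand::Prune { days }),
                _ => Err(format!("days must be a positive integer, got `{}`", raw)),
            }
        }
        Some(other) => Err(format!("unknown subcommand `{}`; {}", other, USAGE)),
    }
}
```

Parsing into `u64` rejects negative, decimal, and non-numeric input in one step, leaving only the zero case to check by hand.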

### What changed

- \`commands::session::sessions\` now takes \`arg: Option<&str>\`
  and dispatches \`show\` / \`prune\` / unknown.
- New \`prune\` private fn parses the days argument, opens
  \`SessionManager::default_location\`, calls
  \`prune_sessions_older_than\` with the corresponding \`Duration\`.
- \`commands::COMMANDS\` table updated: usage now reads
  \`/sessions [show|prune <days>]\`.
- \`commands::mod\` dispatch arm passes \`arg\` through.
- \`SessionManager::prune_sessions_older_than\` doc comment updated
  to reflect the live wiring; \`#[allow(dead_code)]\` removed.

### Tests

5 tests in \`commands::session::tests\` (4 new, 1 updated):
- existing \`test_sessions_pushes_picker_view\` updated to the new
  signature
- \`test_sessions_show_subcommand_pushes_picker_view\` —
  \`/sessions show\` is an explicit alias for the picker
- \`test_sessions_prune_requires_days_argument\` — missing arg
  produces usage hint
- \`test_sessions_prune_rejects_non_positive_days\` — \`0\`,
  negative, non-numeric, and decimal inputs are all rejected
- \`test_sessions_unknown_subcommand_errors\` — typo path errors
  with subcommand list

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1865 + supporting

Refines #406 — phase 1.5 (user-callable surface) shipped on top of
phase 1 (helper). Phase 2 (boot-prune + retention policy) stays open
for v0.8.9 once the policy is decided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:31:00 -05:00
Hunter Bown 2fa23c1d74 docs(changelog): record session-prune helper + doctor memory block
Two items added in this stabilization pass that weren't yet in the
changelog:
- SessionManager::prune_sessions_older_than (#406 phase-1)
- doctor --json memory block (#489 follow-up)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:27:12 -05:00
Hunter Bown 220f1b30c5 feat(sessions): SessionManager::prune_sessions_older_than helper (#406 phase-1)
#406 asks for an auto-archive system for old session state. The full
design needs prior-art survey + retention-policy decisions that are
explicitly out of scope for v0.8.8. This commit ships the **building
block** — a tested public method that removes session files older
than a given Duration — so phase 2 can wire it into a config-knob
boot prune without re-litigating the implementation.

```rust
pub fn prune_sessions_older_than(
    &self,
    max_age: std::time::Duration,
) -> std::io::Result<usize> { ... }
```

Behaviour:
- Compares against the metadata's \`updated_at\` (not filesystem
  mtime — the user may have rsynced \`~/.deepseek\`; fs mtimes can
  lie about real session age).
- Returns the count pruned; failures on individual files are
  logged at WARN and skipped, not propagated, so one bad record
  doesn't block the rest.
- Skips the checkpoint subdirectory entirely. Top-level
  \`<session_id>.json\` files are the only candidates;
  \`checkpoints/latest.json\` and friends are owned by the
  checkpoint subsystem and live with stricter durability rules.
- Marked \`#[allow(dead_code)]\` with a comment pointing at #406
  phase 2 — the helper exists today, the production wiring lands
  next.
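
The selection rule (age measured from the record's own `updated_at`, not from the filesystem) might look roughly like this. `SessionMeta` and `stale_session_ids` are hypothetical, and representing `updated_at` as `SystemTime` is an assumption; the real metadata type is not shown in this message.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical slice of a session record's metadata.
pub struct SessionMeta {
    pub id: String,
    pub updated_at: SystemTime,
}

/// Returns the ids of records whose `updated_at` is more than `max_age`
/// before `now`. Records stamped in the future are never considered stale.
pub fn stale_session_ids(
    records: &[SessionMeta],
    now: SystemTime,
    max_age: Duration,
) -> Vec<&str> {
    records
        .iter()
        .filter(|r| {
            now.duration_since(r.updated_at)
                .map(|age| age > max_age)
                .unwrap_or(false)
        })
        .map(|r| r.id.as_str())
        .collect()
}
```

Passing `now` in as a parameter keeps the rule testable without depending on the wall clock.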

### Tests

5 new tests in \`session_manager::tests\`:
- empty directory returns zero
- all-fresh records survive
- all-stale records get removed
- mixed directory removes only the stale ones
- checkpoint subdirectory is left alone (file untouched, count is
  still 1 for the top-level stale record)

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1861 + supporting

Refines #406 — phase 1 (helper + tests) shipped. Issue stays open
for the v0.8.9 phase-2 work that decides the retention policy and
boot-prune wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:26:35 -05:00
Hunter Bown 8e7664bc70 docs(changelog): populate [Unreleased] with v0.8.8 stabilization entries
Catalogues the 24 v0.8.8 issues shipped across PRs #514 / #515 /
#517 / #518 / #519 in the standard Keep-a-Changelog format,
organized into Added / Changed / Fixed buckets with issue
cross-references.

Captured:
- Added (10): memory MVP + remember tool, inline diff, OSC 8,
  retry banner, MCP chip, project config overlay,
  Implementer/Verifier roles, two doc files, competitive analysis
- Changed (8): sub-agent cap, RwLock, output summarization,
  agent_list session boundary, concise todos, compact agent_spawn,
  Plan panel, RLM family
- Fixed (8): self-update arch, Option+Backspace word delete,
  offline queue scope, display_path Windows, footer theme color,
  panic-exit keyboard flags, CI workflow cleanup, plus the v0.8.8
  release-base fix

Plus a Releases callout reminding maintainers that the npm wrapper
publish stays manual and the GitHub release automation depends on
the \`RELEASE_TAG_PAT\` secret.

The dated section header lands when the actual version-bump
commit fires \`auto-tag.yml\`. This commit just populates the
[Unreleased] body so contributors get a clean summary while the
PRs are still in review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:19:09 -05:00
Hunter Bown 1b7939e680 feat(doctor): surface memory feature state in --json output (#489)
Operators ask "is memory on?" and "where does it live?" without
wanting to boot the TUI. Adds a \`memory\` block to the JSON doctor
report:

```json
"memory": {
  "enabled": false,                                  // honours DEEPSEEK_MEMORY env
  "path": "/Users/you/.deepseek/memory.md",          // expanded path
  "file_present": false                              // does the file exist on disk?
}
```

The \`enabled\` field reads \`DEEPSEEK_MEMORY\` directly so it stays
correct on this stabilization branch even though the dedicated
\`Config::memory_enabled()\` accessor lives on the memory-MVP branch
(#518). When both PRs land, the duplicated env-parse can collapse to
a single method call — TODO comment marks the spot.

Verified:
- \`deepseek doctor --json\` shows \`enabled: false\` by default
- \`DEEPSEEK_MEMORY=on deepseek doctor --json\` shows \`enabled: true\`
- All gates green (1856 main + supporting)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:16:01 -05:00
Hunter Bown 8071bce319 docs: MEMORY.md — user-facing memory documentation (#489)
The memory MVP shipped in PR #518 added three surfaces (\`# \` quick-add,
\`/memory\` slash command, \`remember\` model tool) plus the opt-in
toggle, but the only user-facing reference today is the one-line
mention of \`memory_path\` in CONFIGURATION.md and the \`#489\` cross-
reference in SUBAGENTS.md. This commit adds a dedicated user-facing
doc covering the whole feature.

Coverage:

- Why opt-in by default
- How to enable (env var + config.toml)
- What the system prompt block looks like
- Three ways to add to memory:
  1. \`# foo\` composer prefix (#492)
  2. \`/memory\` slash command (#491) — show / path / clear / edit
  3. \`remember\` tool (#489) — model-callable, auto-approved
- File format — timestamped Markdown bullets, hand-editable
- What stays out of memory — secrets / transient state / long
  instructions / conversation snippets
- Privacy and scope — per-user, never uploaded, provider-agnostic
- Configuration reference — settings table with defaults and overrides

Cross-link added in CONFIGURATION.md so the existing \`memory_path\`
mention now points at the full feature doc.

No Rust code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:10:55 -05:00
Hunter Bown d129ab4150 docs: SUBAGENTS.md — role taxonomy, lifecycle, output contract (#404)
The role taxonomy expansion in #404 added Implementer + Verifier as
distinct postures alongside General / Explore / Plan / Review /
Custom. The issue body explicitly lists \`docs/AGENTS.md or
docs/SUBAGENTS.md\` as a target file; this commit creates that file.

Coverage:

- Role taxonomy table — stance, write/shell access, typical use per
  role.
- "When to pick which role" — narrative guidance the model can read
  if the role choice isn't obvious.
- Alias map — every accepted spelling routed to a canonical role,
  matching what \`SubAgentType::from_str\` accepts.
- Concurrency cap — the 10-by-default value, the
  \`[subagents].max_concurrent\` knob, and the running-only
  semantics (#509).
- Lifecycle — Pending → Running → terminal states, plus
  \`Interrupted\` after a process restart.
- Session boundaries (#405) — \`session_boot_id\` mechanics,
  default current-session filter, \`include_archived=true\` escape
  hatch, pre-#405 record handling.
- Output contract — the SUMMARY/CHANGES/EVIDENCE/RISKS/BLOCKERS
  format every sub-agent must produce.
- Memory + \`remember\` integration (#489) — sub-agents inherit the
  parent's memory file when memory is enabled and can append durable
  notes.
- Implementation notes — source path, persisted state file,
  is_running semantics, RwLock pattern.

Cross-link added in \`docs/TOOL_SURFACE.md\` so the sub-agent section
points to this doc.

No Rust code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:09:27 -05:00
Hunter Bown 2f9b58b910 fix(agents): include Implementer/Verifier aliases in error message hint (#404)
The "Invalid sub-agent type" error message lists the accepted type
strings so the model can self-correct. The list still showed the
original 5 names plus their aliases — adding the new types' canonical
names and aliases keeps the error helpful when the model misspells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:06:43 -05:00
Hunter Bown 1ae042d56b feat(agents): add Implementer and Verifier sub-agent roles (#404)
The existing taxonomy (General / Explore / Plan / Review / Custom)
covered "do something" / "go look" / "think first" / "grade work" /
"explicit allowlist" but had no distinct posture for two of the most
common patterns:

- **Implementer** — "land this change with the minimum surrounding
  edit". Distinct from General's flexible posture and Plan's
  no-execution stance.
- **Verifier** — "run the test suite and report pass/fail with
  evidence". Distinct from Review's read-and-grade stance.

Per the issue body's guidance ("avoid a large undifferentiated role
list") this PR adds **only those two**. Researcher and ReleaseManager
stay open as v0.8.9 candidates if user demand surfaces.

### What's wired

- Two new \`SubAgentType\` variants: \`Implementer\`, \`Verifier\`.
- New prompt constants \`IMPLEMENTER_AGENT_PROMPT\` and
  \`VERIFIER_AGENT_PROMPT\` with role-specific posture (write-the-
  minimum-edit / run-the-tests-don't-fix-them).
- \`from_str\` accepts the obvious aliases:
  \`implementer\` / \`implement\` / \`implementation\` / \`builder\`
  for Implementer; \`verifier\` / \`verify\` / \`verification\` /
  \`validator\` / \`tester\` for Verifier. Case-insensitive like the
  existing aliases.
- \`as_str\` round-trips: every variant's label parses back to the
  same variant via \`from_str\`. Test pins this so a future role
  addition can't accidentally drop the round-trip.
- Deprecated \`allowed_tools()\` advisory list updated:
  Implementer carries write/edit/patch + shell + checklist tools;
  Verifier carries read + shell + run_tests + diagnostics but
  **no** write tools.
- \`crates/tui/src/tui/views/mod.rs\` agent-type sort order extended
  to include the new variants; \`format_agent_type\` now delegates to
  \`as_str\` so future additions land in one place.
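
A cut-down sketch of the alias parsing and round-trip property for the two new roles. The enum here is a hypothetical `SubAgentRole` with only the new variants, the method is named `parse` rather than `from_str`, and the canonical labels returned by `as_str` are assumed to be the lowercase role names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubAgentRole {
    Implementer,
    Verifier,
}

impl SubAgentRole {
    pub const ALL: [SubAgentRole; 2] = [SubAgentRole::Implementer, SubAgentRole::Verifier];

    /// Canonical label (assumed spelling).
    pub fn as_str(self) -> &'static str {
        match self {
            SubAgentRole::Implementer => "implementer",
            SubAgentRole::Verifier => "verifier",
        }
    }

    /// Case-insensitive alias lookup using the aliases listed above.
    pub fn parse(input: &str) -> Option<Self> {
        match input.to_ascii_lowercase().as_str() {
            "implementer" | "implement" | "implementation" | "builder" => {
                Some(SubAgentRole::Implementer)
            }
            "verifier" | "verify" | "verification" | "validator" | "tester" => {
                Some(SubAgentRole::Verifier)
            }
            _ => None,
        }
    }
}
```

Iterating over `ALL` and checking `parse(role.as_str()) == Some(role)` is one way to pin the round-trip so that a newly added variant cannot be registered in only one direction.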

### Tests

- 6 new tests in \`tools::subagent::tests\`:
  - alias coverage for Implementer (4 aliases) and Verifier (5)
  - round-trip via \`as_str\` for **all** variants — guards against
    forgetting to register a new variant in either direction
  - distinct-prompts guard: catches the copy-paste bug where two new
    variants would inherit the same prompt as General
  - Implementer's advisory list contains write tools
  - Verifier's advisory list contains test-runner tools but NO writes

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1856 + supporting

Closes #404 (minimal-taxonomy interpretation per the issue body).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:05:26 -05:00
Hunter Bown 68ec91999b feat(tui): clarify Plan panel role + drop empty-state placeholder (#408)
The Plan panel used to render a blunt "No active plan" line whenever
the model hadn't called \`update_plan\` yet — even when the panel had a
goal or a cycle counter to show above it. That made the panel look
broken on every fresh session.

Per the audit posted on the issue (option 1 of three), this PR keeps
the Plan panel as the **single source of truth for the \`update_plan\`
tool's output** and drops the placeholder when the panel is fully
quiet, replacing it with a one-line hint that explains what the panel
tracks. When a goal or cycle counter is already showing above, the
empty-steps body collapses entirely so the section doesn't look
ambiguous next to populated content.

The panel's role is documented in a doc comment on
\`render_sidebar_plan\` so the next person doesn't have to re-derive it
from the issue tracker.

### What's wired

- \`render_sidebar_plan\` checks "anything above" (goal +
  cycle_count) before deciding whether to emit the empty-state hint.
  If either is showing, the empty steps body adds nothing.
- New \`plan_panel_empty_hint(width)\` helper composes the hint
  string with proper width-aware truncation.
- New module-level doc comment explains the Plan panel's role
  (update_plan output + /goal + cycle counter) and contrasts it
  with Todos.

### Tests

- 3 new tests in \`tui::sidebar::tests\`:
  - hint mentions \`update_plan\` and \`/goal\` so the user
    understands what populates the panel
  - hint truncates correctly to a 16-column sidebar without
    overflowing
  - regression guard: the hint never re-introduces "no active plan"
    wording

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1850 + supporting

Closes #408 (option 1 of the audit). Options 2 (merge with todos)
and 3 (move to top-row chip) remain open as v0.9.0 design candidates
once #504's right-panel work is on the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:00:47 -05:00
Hunter Bown 256f59dd33 feat(agents): session-boundary classification for sub-agents (#405)
\`agent_list\` previously surfaced every persisted sub-agent the manager
had on disk — including agents from prior sessions still hanging around
in \`subagents.v1.json\`. In long-lived sessions this piled up and the
model had to reason past 13 listed agents when only four were live.

Now: each \`SubAgentManager\` assigns a fresh \`session_boot_id\` at
construction. Every agent it spawns is stamped with that id, persisted
alongside the existing fields, and reloaded as-is by future managers.
At list time the manager classifies any agent whose stamp doesn't match
the current id (or that loaded with no stamp at all from pre-#405
records) as \`from_prior_session\`.

\`agent_list\` defaults to the current-session view: prior-session
agents are dropped from the listing **unless** they're still
\`Running\` (which can happen after a process restart — the manager
flagged them as \`Interrupted\` on load). Pass \`include_archived=true\`
to surface every record, with the \`from_prior_session\` flag on each
result so the model can tell live vs archived apart at a glance.

### What's wired

- \`SubAgentManager::current_session_boot_id\` — UUID-derived,
  generated in \`new\`.
- \`SubAgent::session_boot_id\` and \`PersistedSubAgent::session_boot_id\` —
  the latter \`#[serde(default)]\` for backward compat (pre-#405 records
  load with empty string and classify as prior-session).
- \`SubAgentResult::from_prior_session\` — \`#[serde(default,
  skip_serializing_if = "is_false")]\` so today's clients reading
  archived snapshots see the field, while default-false snapshots
  serialize without an extra noisy key.
- \`SubAgentManager::list_filtered(include_archived)\` — the new
  user-facing API. \`SubAgent::snapshot()\` still defaults the flag
  to \`false\`; \`snapshot_for_listing\` (manager-only) fills it in.
- \`AgentListTool\` accepts \`include_archived: bool\` (default
  false) and routes through the filter. The model-facing description
  explains the behaviour.
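
The listing rule can be sketched with simplified types. `Agent`, `Status`, `Listed`, and this free-function `list_filtered` are stand-ins for illustration; the real manager works on `SubAgent` / `SubAgentResult` and has more states than the two shown.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Running,
    Completed,
}

#[derive(Debug, Clone)]
pub struct Agent {
    pub id: String,
    /// Empty for records written before the stamp existed.
    pub session_boot_id: String,
    pub status: Status,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Listed {
    pub id: String,
    pub from_prior_session: bool,
}

/// Current-session agents are always listed. Prior-session agents are
/// listed only when still running or when `include_archived` is set.
pub fn list_filtered(
    agents: &[Agent],
    current_boot_id: &str,
    include_archived: bool,
) -> Vec<Listed> {
    agents
        .iter()
        .filter_map(|a| {
            // A missing (empty) stamp never equals the current id.
            let prior = a.session_boot_id != current_boot_id;
            let visible = include_archived || !prior || a.status == Status::Running;
            visible.then(|| Listed {
                id: a.id.clone(),
                from_prior_session: prior,
            })
        })
        .collect()
}
```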

### Tests

- \`session_boot_ids_are_unique_per_manager\` — each manager mints
  its own id.
- \`list_filtered_drops_prior_session_terminals_by_default\` — the
  three-agent matrix (current running / prior completed / prior
  running) collapses to the right two with the right flags.
- \`list_filtered_with_include_archived_returns_everything\` —
  archived view returns all records with correct flags.
- \`agents_with_empty_boot_id_classify_as_prior_session\` — pre-#405
  records load and behave as expected.
- \`persist_round_trip_preserves_session_boot_id\` — write with one
  manager, reload with a fresh manager, confirm the agent flips to
  prior-session in the new manager's view.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1847 + supporting

Closes #405

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:57:06 -05:00
Hunter Bown b54a708cf7 feat(tui): compact agent_spawn rendering — single line, DelegateCard owns the rest (#409)
The transcript previously rendered each \`agent_spawn\` call as a 3-4 line
generic tool block (header + name kv + args summary + output JSON) AND
its companion \`DelegateCard\` (header + live action lines + summary).
Four parallel spawns produced ≥16 lines of nearly-identical scaffolding
before the model said anything useful.

In live mode \`agent_spawn\` now renders as a single header line —
\`◐ delegate · agent-abc12  [running]\` — with the agent id pulled from
the tool's JSON output. The DelegateCard remains the source of truth
for live action progress and the final summary; the generic block is no
longer fighting it for attention.

Transcript-mode replay (used by \`/pager\`, session export, and the
detail pager opened with Alt+V) keeps the full multi-line block so
debug history is preserved.

### What's wired

- `GenericToolCell::lines_with_mode` early-returns
  `render_agent_spawn_compact` when `name == "agent_spawn"` and
  `mode == RenderMode::Live`.
- New `render_agent_spawn_compact` builds one header line with the
  family glyph (Delegate), the spawned agent id (or `…`
  placeholder while the spawn is in flight), and the tool status.
- New `extract_agent_id(output)` parser: deterministic, allocation-free
  string scan for `"agent_id"` → quoted value. Avoids dragging serde
  into a render hot path.
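
A minimal sketch of the kind of scan described above, not the code from this commit. It assumes the id contains no escaped quotes and that the key appears once. The header helper `compact_spawn_header` is a hypothetical name used only for illustration.

```rust
/// Find `"agent_id"` in raw tool output and return the quoted value
/// that follows it. The result borrows from `output`, so the scan
/// itself allocates nothing. Assumption: no escaped quotes in the id.
fn extract_agent_id(output: &str) -> Option<&str> {
    const KEY: &str = "\"agent_id\"";
    let after_key = &output[output.find(KEY)? + KEY.len()..];
    // Tolerate whitespace around the colon and before the value.
    let after_colon = after_key.trim_start().strip_prefix(':')?;
    let value = after_colon.trim_start().strip_prefix('"')?;
    let id = &value[..value.find('"')?];
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Hypothetical helper: build the one-line header, using the `…`
/// placeholder while no id is available yet.
fn compact_spawn_header(output: &str, status: &str) -> String {
    let id = extract_agent_id(output).unwrap_or("…");
    format!("◐ delegate · {id}  [{status}]")
}
```

For example, `compact_spawn_header(r#"{"agent_id":"agent-abc12"}"#, "running")` yields the header shown in the commit message, while an empty output falls back to the placeholder.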

### Tests

- 4 `extract_agent_id` tests: typical JSON, extra whitespace, missing
  key (None), empty string id (None).
- 4 render tests: live one-liner contains agent id + status with no
  args/name kv leaking, pending render uses `…` placeholder,
  transcript mode keeps the full block, non-spawn tools (read_file)
  still render their normal multi-line block.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1842 + supporting

Closes #409

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:44:19 -05:00
Hunter Bown c52f2c46f1 feat(tui): concise todo / checklist update rendering (#403)
When the model fires `todo_update` / `checklist_update` repeatedly
during a long run, the live transcript previously dumped the full
checklist card (header + every item + progress) on every call. In
sessions with 20+ items and a dozen status flips the same item list
appears over and over, drowning the actual work.

Now: when a checklist update output starts with the
"Updated todo #N to STATUS" prefix the tool already emits, the live
renderer shows a compact one-line state-change card —
`Todo #N: <title> → STATUS` — plus a `M/N · pct%` summary in the
header and a `N items (Alt+V for full list)` affordance underneath.
The full item list is still reachable via the existing detail pager.

Falls back to the full-card render path for:
- `todo_write` / `checklist_write` (no "Updated" prefix — first
  emission of the list)
- transcript-mode replays (the user wants the full snapshot when
  scrolling history)
- malformed prefixes (parse failure → fall through, never crash)
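
The fall-through behaviour can be sketched as a parser that returns `None` for anything it does not recognise. This is an illustration under assumptions: the `UpdatePrefix` struct is hypothetical, and the status is taken to be the rest of the first output line.

```rust
/// Hypothetical parsed form of the update prefix.
#[derive(Debug, PartialEq, Eq)]
struct UpdatePrefix<'a> {
    id: usize,
    status: &'a str,
}

/// Parse `Updated todo #N to STATUS` or `Updated checklist #N to STATUS`
/// from the first line of the tool output. Any mismatch returns `None`,
/// so the caller can fall back to the full-card render instead of failing.
fn parse_update_prefix(output: &str) -> Option<UpdatePrefix<'_>> {
    let first_line = output.lines().next()?;
    let rest = first_line.strip_prefix("Updated ")?;
    let rest = rest
        .strip_prefix("todo #")
        .or_else(|| rest.strip_prefix("checklist #"))?;
    let (id, status) = rest.split_once(" to ")?;
    let id: usize = id.parse().ok()?;
    let status = status.trim();
    // Ids are 1-indexed, so 0 is treated as malformed (an assumption).
    if id == 0 || status.is_empty() {
        return None;
    }
    Some(UpdatePrefix { id, status })
}
```

A write output such as a freshly emitted list has no "Updated" prefix, so it returns `None` and keeps the full card.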

### What's wired

- New `parse_update_prefix(output)` parser handles both
  `Updated todo #N to STATUS` and `Updated checklist #N to STATUS`
  forms.
- New `render_checklist_change_card` builds the compact card. Looks
  up the title from the snapshot's items array (id is 1-indexed),
  falls back to `(missing title)` if the id is out of range.
- `try_render_as_checklist` calls the change-card path only in
  `RenderMode::Live` and only when the parser matches. Pre-existing
  cases (writes, transcript replay) keep the full-card behavior.
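
The 1-indexed title lookup and its fallback might look roughly like the sketch below. The function names and the plain `&[&str]` title list are assumptions for illustration; the real renderer works on the snapshot's items and produces styled lines. The truncating percentage is also a guess.

```rust
/// Build the one-line state-change text. `id` is 1-indexed; an id of 0
/// or one past the end of `titles` falls back to `(missing title)`.
fn change_card_line(titles: &[&str], id: usize, status: &str) -> String {
    let title = id
        .checked_sub(1)
        .and_then(|index| titles.get(index))
        .copied()
        .unwrap_or("(missing title)");
    format!("Todo #{id}: {title} → {status}")
}

/// Build the `M/N · pct%` header summary. Integer division truncates;
/// the rounding used by the real code is not stated in the source.
fn progress_summary(done: usize, total: usize) -> String {
    let pct = if total == 0 { 0 } else { done * 100 / total };
    format!("{done}/{total} · {pct}%")
}
```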

### Tests

- 4 parser tests: todo form, checklist form, write outputs falling
  through, malformed prefixes falling through.
- 2 renderer tests: compact card shows only the changed item (with
  assertions that other titles do NOT appear), missing-title path.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1834 + supporting

Closes #403

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:36:21 -05:00
Hunter Bown 4d4a9b424c feat(config): expand per-project overlay to cover provider, sandbox, approval, mcp_path, max_subagents, allow_shell (#485)
The project-config overlay (`<workspace>/.deepseek/config.toml` merged
on top of the user's global `~/.deepseek/config.toml`) was already
wired but only carried four string fields: model, api_key, base_url,
reasoning_effort. The use cases users actually file under #485 — "this
repo wants a different sandbox / approval policy / MCP server set / hard
sub-agent cap" — weren't covered.

### What ships

Adds the following keys to the project overlay, all merged with
identical "non-empty wins" semantics for strings:

- `provider` — pick a different backend per repo (e.g. `nvidia-nim` for
  an enterprise repo, `deepseek-cn` for a CN-team repo).
- `approval_policy` — `never` / `on-request` / `untrusted` for repos
  with strict policies.
- `sandbox_mode` — `read-only` / `workspace-write` / `danger-full-access`.
- `mcp_config_path` — per-repo MCP server set without touching the
  user's global file.
- `notes_path` — keep notes in-repo for projects where the notes tool
  is part of the dev workflow.

Plus two non-string fields:

- `max_subagents` (positive integer; clamped to `1..=MAX_SUBAGENTS`,
  where `MAX_SUBAGENTS = 20`).
- `allow_shell` (bool).
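
The merge rules above can be sketched as follows. This is not the project's implementation: the `Settings` and `ProjectOverlay` structs and the `apply_overlay` function are hypothetical, only a few of the keys are shown, and "rejection" of a negative `max_subagents` is modeled as an `Err` even though the real code may handle it differently.

```rust
const MAX_SUBAGENTS: usize = 20;

/// Hypothetical effective settings after loading the global config.
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    provider: String,
    sandbox_mode: String,
    max_subagents: usize,
    allow_shell: bool,
}

/// Hypothetical project overlay; every key is optional.
#[derive(Debug, Default)]
struct ProjectOverlay {
    provider: Option<String>,
    sandbox_mode: Option<String>,
    max_subagents: Option<i64>,
    allow_shell: Option<bool>,
}

/// "Non-empty wins": an absent or empty overlay string leaves the base alone.
fn override_string(target: &mut String, value: Option<String>) {
    if let Some(v) = value.filter(|v| !v.is_empty()) {
        *target = v;
    }
}

fn apply_overlay(mut base: Settings, overlay: ProjectOverlay) -> Result<Settings, String> {
    override_string(&mut base.provider, overlay.provider);
    override_string(&mut base.sandbox_mode, overlay.sandbox_mode);
    if let Some(n) = overlay.max_subagents {
        if n < 0 {
            return Err(format!("max_subagents must not be negative, got {n}"));
        }
        // Values too large for usize saturate, then clamp into 1..=20.
        let n = usize::try_from(n).unwrap_or(usize::MAX);
        base.max_subagents = n.clamp(1, MAX_SUBAGENTS);
    }
    if let Some(flag) = overlay.allow_shell {
        base.allow_shell = flag;
    }
    Ok(base)
}
```

With this shape, an overlay that sets `sandbox_mode = ""` is a no-op for that key, and `max_subagents = 99` ends up as 20.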

### What stays user-global

`skills_dir`, `hooks`, `[capacity]`, `[retry]`, `[memory]`, etc. — those
are user-shaped settings, not repo-shaped. If a future use case
demands per-project values for any of them, a follow-up PR can extend
the overlay rather than letting the boundary blur.

### Tests

- 8 new tests in `project_config_tests` covering: provider+model,
  approval+sandbox, max_subagents+allow_shell, max_subagents
  clamping, negative-max_subagents rejection, missing config file
  pass-through, malformed TOML pass-through, and empty-string
  no-op.

### Docs

- New "Per-project overlay (#485)" section in `docs/CONFIGURATION.md`
  with a table of supported keys and the rationale for which fields
  stay user-global.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1828 + supporting (was 1820)

Closes #485

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:25:43 -05:00
Hunter Bown a723ddd63d feat(tui): MCP server health chip in footer (#502)
Adds a small `MCP M/N` chip to the footer's right-side auxiliary cluster
so users with MCP servers see at-a-glance health without running `/mcp`.
The chip is color-coded by reachability:

- all configured servers reachable     → success (sky)
- some reachable, some failed          → warning (amber)
- zero reachable but ≥1 configured     → error (red)
- configured but no live snapshot yet  → muted (gray, count only)

When zero servers are configured the chip is hidden entirely; users who
don't use MCP see no change.
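
As a rough illustration of the classification above, the sketch below maps the two counts to a label and a tone. It is simplified on purpose: the real `footer_mcp_chip` returns styled spans, the `ChipTone` enum and `mcp_chip_sketch` name are hypothetical, and the exact text of the count-only variant is a guess.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ChipTone {
    Success,
    Warning,
    Error,
    Muted,
}

/// `connected` is `None` until a live snapshot exists.
/// Returns `None` when the chip should be hidden.
fn mcp_chip_sketch(connected: Option<usize>, configured: usize) -> Option<(String, ChipTone)> {
    if configured == 0 {
        return None; // no MCP servers configured: hide the chip
    }
    let chip = match connected {
        // No snapshot yet: show the configured count only.
        None => (format!("MCP {configured}"), ChipTone::Muted),
        Some(0) => (format!("MCP 0/{configured}"), ChipTone::Error),
        Some(n) if n >= configured => (format!("MCP {n}/{configured}"), ChipTone::Success),
        Some(n) => (format!("MCP {n}/{configured}"), ChipTone::Warning),
    };
    Some(chip)
}
```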

### What's wired

- New `App::mcp_configured_count`, populated at app boot from
  `mcp::load_config(&mcp_config_path)`. Cheap (just reads the JSON
  config; no server connections), so it doesn't block startup.
- `app.mcp_snapshot.servers.iter().filter(|s| s.connected).count()`
  drives the live-state portion when the user has run `/mcp` at
  least once.
- `FooterProps` gains an `mcp: Vec<Span<'static>>` field built by
  `footer_mcp_chip(connected, configured)`. Threaded into
  `auxiliary_spans` so it participates in the priority-drop pipeline at
  narrow widths.
- After any `/mcp` action that returns a snapshot, the count is
  refreshed from the snapshot so post-edit state is reflected.

### #499 follow-up: pure render path

Moves the retry-status capture into `FooterProps` (`pub retry:
RetryState`) sampled in `from_app`, instead of pulling from the global
surface inside `render`. The render method is now pure with respect to
its props — fixes a parallel-test race where retry-banner tests and
unrelated footer tests would observe each other's writes through the
process-wide retry surface.
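
The idea can be sketched with a render function that reads only its props. Everything here is a simplified stand-in: the real `FooterProps` has more fields, `Active` is shown carrying pre-formatted banner text, and `render_footer_text` is a hypothetical name.

```rust
#[derive(Debug, Clone, PartialEq)]
enum RetryState {
    Idle,
    Active(String), // simplified: pre-formatted banner text
    Failed { reason: String },
}

/// Simplified props: the retry state is sampled once when the props
/// are built, not read from a global during rendering.
struct FooterProps {
    retry: RetryState,
    status_line: String,
}

/// Pure with respect to `props`: the same props always give the same text,
/// so tests can run in parallel without sharing any global state.
fn render_footer_text(props: &FooterProps) -> String {
    match &props.retry {
        RetryState::Active(banner) => banner.clone(),
        RetryState::Failed { reason } => format!("× failed: {reason}"),
        RetryState::Idle => props.status_line.clone(),
    }
}
```

A test then pins the state directly, for example by building `FooterProps` with `retry: RetryState::Failed { .. }`, with no guard or global mutation involved.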

### Tests

- 5 unit tests in `footer_mcp_chip`: hidden when zero, count-only when
  no snapshot, success / warning / error colours by reachability.
- Existing retry-banner tests now pin `props.retry` directly rather
  than mutating the global surface — no more `test_guard()` needed,
  no more parallel-runner flakes.
- All 31 footer tests pass in parallel.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1820 + supporting

Closes #502

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:16:27 -05:00
Hunter Bown 8680a43298 feat(tui): visual retry/backoff countdown in footer (#499)
When the API client retries a 429 / 5xx / network failure, the TUI
previously went silent during the backoff sleep. The user saw "thinking"
or "ready" with no sign that anything was wrong until the request
either succeeded or returned an error. This PR adds a foreground retry
banner so the user sees what is happening and how long until the next
attempt.

### What ships

- New `crates/tui/src/retry_status.rs` module exposing a process-wide
  `RetryState` (`Idle | Active(banner) | Failed { reason }`) with
  `start`, `succeeded`, `failed`, and `clear` helpers. The state is
  process-global because the user-facing TUI is one engine per process;
  sub-agent retries deliberately don't light up the foreground banner.
- `client::send_with_retry` now flips the state in its retry callback
  (`start(attempt+1, delay, reason)`) and on the final outcome
  (`succeeded()` on Ok, `failed(reason)` on Err with retries-exhausted,
  `clear()` on Err with attempts==1 so non-retryable errors don't pin
  the failure row).
- `human_retry_reason` translates the structured `LlmError` into a
  short label: rate-limit reasons include the `Retry-After` header
  when the upstream provided one ("rate limited (Retry-After 30s)").
- Footer's `render` checks `retry_status::snapshot()` first; when
  `Active` it renders `⟳ retry N in Ms — <reason>` in the warning
  color; when `Failed` it renders `× failed: <reason>` in the error
  color. Banner takes precedence over the toast and the regular
  status line.
- `Engine::handle_user_message` calls `retry_status::clear()` right
  after emitting `TurnStarted` so the previous turn's failure row
  doesn't bleed into a new turn.
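
A minimal sketch of a process-wide retry state with the four helpers named above, assuming a `Mutex`-guarded static. The `RetryBanner` fields and the `remaining_secs` helper are assumptions made for illustration; the actual module may store and expose different data.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical banner payload for the active state.
#[derive(Debug, Clone, PartialEq)]
struct RetryBanner {
    attempt: u32,
    deadline: Instant,
    reason: String,
}

impl RetryBanner {
    /// Whole seconds until the next attempt; saturates at 0 past the deadline.
    fn remaining_secs(&self, now: Instant) -> u64 {
        self.deadline.saturating_duration_since(now).as_secs()
    }
}

#[derive(Debug, Clone, PartialEq)]
enum RetryState {
    Idle,
    Active(RetryBanner),
    Failed { reason: String },
}

static STATE: Mutex<RetryState> = Mutex::new(RetryState::Idle);

fn set(next: RetryState) {
    // Recover from a poisoned lock instead of panicking in the UI path.
    *STATE.lock().unwrap_or_else(|e| e.into_inner()) = next;
}

fn start(attempt: u32, delay: Duration, reason: &str) {
    set(RetryState::Active(RetryBanner {
        attempt,
        deadline: Instant::now() + delay,
        reason: reason.to_string(),
    }));
}

fn succeeded() {
    set(RetryState::Idle);
}

fn failed(reason: &str) {
    set(RetryState::Failed { reason: reason.to_string() });
}

fn clear() {
    set(RetryState::Idle);
}

fn snapshot() -> RetryState {
    STATE.lock().unwrap_or_else(|e| e.into_inner()).clone()
}
```

A retry callback would call `start(attempt + 1, delay, reason)` before sleeping, and the renderer would read `snapshot()` to decide which row to draw.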

### Tests

- 4 unit tests in `retry_status::tests` covering idle default, the
  active → succeeded round-trip, the failed-state pin, and a
  past-deadline saturation.
- 2 footer rendering tests asserting the banner / failure-row text
  appears in the rendered buffer.
- All tests touching the global retry surface serialize through
  `retry_status::test_guard()` so cargo's parallel runner can't observe
  a torn read.

### Verification

cargo fmt --all -- --check                                          ✓
cargo clippy --workspace --all-targets --all-features --locked --   -D warnings   ✓
cargo test --workspace --all-features --locked                      ✓ 1815 + supporting (was 1809 on stabilization base)
cargo test -p deepseek-tui --bin deepseek-tui --locked retry        ✓ 28 passed

Closes #499

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:08:53 -05:00