Commit Graph

101 Commits

Author SHA1 Message Date
20bytes 8aed1bb674 memory: polish help and docs (#569)
- add /memory help and clearer invalid-subcommand guidance
- register /memory in shared slash-command help
- align memory docs with current behavior and config
- add focused tests for help and discovery
2026-05-04 02:25:13 -05:00
Hunter Bown fc1970fa55 fix(auth): use config-backed setup without credential prompts 2026-05-03 23:02:11 -05:00
Hunter Bown 84c55e9022 chore(release): bump version to 0.8.8
- Workspace `version = "0.8.8"` in root `Cargo.toml`.
- 31 internal `deepseek-*` path-dep version pins across the
  9 crates that declare them.
- `npm/deepseek-tui/package.json` `version` and
  `deepseekBinaryVersion` both updated.
- `Cargo.lock` regenerated for the new workspace version.
- `CHANGELOG.md` `[Unreleased]` heading promoted to
  `[0.8.8] - 2026-05-03`.

`scripts/release/check-versions.sh` reports the workspace, npm
wrapper, and lockfile all aligned. Pushing this to `main` should
fire `auto-tag.yml`, which creates the `v0.8.8` tag with
`RELEASE_TAG_PAT`. The tag triggers `release.yml` to build the
matrix and draft the GitHub Release. The npm wrapper publish
remains manual (npm 2FA OTP requirement).

What ships in v0.8.8
====================

The full polish stack already merged via PRs #514 (stabilization),
#515 (OSC 8 hyperlinks), #517 (inline diff render), #518 (user
memory MVP), #519 (foreground polish + per-project overlay +
security + Windows redraw fix), and #508 (Linux ARM64 prebuilts +
install docs). See `CHANGELOG.md` and the README "What's new in
v0.8.8" section for the full list.
2026-05-03 08:55:41 -05:00
Hunter Bown bda30b0fd6 Merge main into feat/v0.8.8-tui-polish + gemini-code-assist feedback
Resolves the post-#514/#517/#518 conflicts:

- CHANGELOG.md: kept both polish-stack and Linux ARM64 entries under
  [Unreleased]; reordered so the ARM64/install-message Changed/Docs
  sections precede the Releases footer.
- config.example.toml: kept both the `instructions = [...]` example
  and the `[memory]` opt-in stanza in sequence.
- crates/tui/src/config.rs: kept both `instructions_paths()` (#454)
  and `memory_enabled()` (#489) on the Config impl.
- crates/tui/src/prompts.rs: extended
  `system_prompt_for_mode_with_context_and_skills` to take BOTH
  `instructions: Option<&[PathBuf]>` and `user_memory_block:
  Option<&str>`. Section 2.5a renders instructions; 2.5b renders the
  memory block — both above the skills block so KV prefix caching
  still wins.
- crates/tui/src/core/engine.rs: thread both args through the two
  call sites.
- crates/tui/src/prompts.rs: update the `system_prompt_for_mode_with_context`
  forwarder and the test caller to pass `None` for the new arg.
- .gitignore: ignore `.claude/*.local.md` and `*.local.json` so
  local ralph / Claude-Code notes can't leak into commits.

Folds in two valid suggestions from the gemini-code-assist review on #519:

- `client.rs`: collapse the duplicated `LlmError → label` match and the
  `human_retry_reason` body into a single
  `retry_reason_label_and_human(err) -> (&'static str, String)` helper.
- `widgets/footer.rs::retry_banner_spans`: merge the two separate
  `match &props.retry` blocks into one that returns both `(label, color)`.

Behavior is unchanged; refactor is a pure DRY win.
2026-05-03 08:29:59 -05:00
Hunter Bown d9701c1dde perf(tui): lock composer height while slash/mention menu is open
User feedback (Windows 10 PowerShell + WSL, Telegram thread): typing
through `/skill` feels visibly laggy because every keystroke shrinks
the matched-entry list, which shrinks the composer panel, which
forces the chat area above to repaint cells. On Unix terminals the
work is invisible; on the Windows console backend the per-cell write
cost makes it noticeable.

Fix: when the slash- or mention-menu is open, `desired_height`
reserves the panel's worst-case envelope (`composer_max_height`) for
the whole menu session instead of tracking the matched-entry count.
The chat-area Rect stays stable, so ratatui's diff renderer skips
the cells above the composer entirely. The menu itself still renders
only the entries that actually match — extra rows are panel padding
inside the same Rect.

`render()` and `cursor_pos` route through the same locked-budget
calculation so the input stays at the top of the panel and the
cursor lands on the row the input is drawn on. New unit test pins
the invariant: 5-match and 1-match menus produce the same composer
height; closing the menu releases the reserved rows.
2026-05-03 08:02:23 -05:00
Hunter Bown f6c7a36076 feat(execpolicy): heredoc body parsing in normalize_command (#419)
`normalize_command` now strips heredoc bodies before shlex tokenization
so a user's `auto_allow = ["cat > file.txt"]` pattern matches the
heredoc form `cat <<EOF > file.txt\nbody\nEOF` cleanly. Recognises the
common forms (`<<DELIM`, `<<-DELIM`, `<<'DELIM'`, `<<"DELIM"`) while
leaving the here-string operator (`<<<`) untouched.

Six unit tests cover: simple body strip, dash form, quoted delimiter,
non-heredoc passthrough, here-string preservation, and the end-to-end
pattern-match path.
2026-05-03 07:44:43 -05:00
Hunter Bown 604edc9f83 feat(tls): honor SSL_CERT_FILE for corporate-CA / MITM proxies (#418)
Corporate users behind TLS-inspecting proxies (Zscaler, Netskope,
Palo Alto, in-house mitmproxy fleets) need to add the proxy's
intermediate CA to the trusted-roots set so the deepseek client
doesn't fail with `unable to get local issuer certificate`.

The reqwest builder already trusts the platform's system store
via native-tls. This adds opt-in support for the conventional
`SSL_CERT_FILE` env var so users can point at their own bundle:

* New `add_extra_root_certs(builder, path)` helper reads the
  file, tries `Certificate::from_pem_bundle` (covers single-cert
  files too), falls back to `from_der` for binary cert files.
* Wired into `build_http_client` when `SSL_CERT_FILE` is set
  and non-empty. Failures log a warning via the existing
  `logging::warn` channel and return the builder unchanged —
  the existing system trust still applies, so a malformed env
  var degrades gracefully instead of bricking the launch.
* Each successful load logs `info` with the cert count so
  operators can confirm their bundle was picked up.

Documented in `docs/CONFIGURATION.md`'s environment-variables
list alongside the existing TLS-related notes.

No new dependency — reqwest's `native-tls` feature already
exposes `Certificate::from_pem_bundle` / `from_der`.
2026-05-03 07:35:23 -05:00
Hunter Bown 6566a59097 feat(security): deny loosest approval/sandbox values at project scope (#417)
Continues #417 by closing the value-level escalation case for
the two pure-loosening values:

* `approval_policy = "auto"` would auto-approve every tool
  call that the user's stricter setting (\`suggest\`, \`never\`,
  etc.) was prompting on. Pure escalation; project should
  never be able to set this.
* `sandbox_mode = "danger-full-access"` exits the workspace
  sandbox entirely. Pure escalation; project should never be
  able to set this.

Both denies are unconditional at project scope — the user's
prior value (or absence) doesn't matter. The denied value
emits a stderr warning so users see the deny.

Sub-tightening comparisons (e.g. user `"never"` → project
`"on-request"` is allowed even though it loosens) stay
v0.8.9 follow-up because they need a richer ordering check
across all `approval_policy` / `sandbox_mode` values.

Tests:

* `project_overlay_denies_approval_auto_and_sandbox_danger_values`
  exercises both escalation values in the same merge and
  confirms a non-escalation field on the same project file
  still applies.
* `project_overlay_preserves_user_strict_value_when_project_tries_to_loosen`
  exercises the belt-and-suspenders case: user has
  `approval_policy = "never"`, project tries `"auto"`, the
  user's strict value survives.
2026-05-03 07:32:08 -05:00
Hunter Bown 926ffcb4f4 feat(security): deny dangerous keys at project-config scope (#417)
A malicious `<workspace>/.deepseek/config.toml` could escalate
privileges via the per-project overlay shipped in #485:

* `api_key` / `base_url` / `provider` — exfiltrate prompts to
  an attacker-controlled endpoint by swapping the user's
  credentials and target host.
* `mcp_config_path` — point the MCP loader at a config that
  spawns arbitrary stdio servers under the user's identity.

Adds a `DENY_AT_PROJECT_SCOPE` allowlist-by-omission to
`merge_project_config`. The four credential / redirect keys
are silently dropped from the overlay; a stderr warning fires
when one is present so a user who *did* expect the override
sees the deny instead of a silent discard:

  warning: project-scope config key `api_key` is ignored —
  set it in `~/.deepseek/config.toml` instead.

The remaining override surface (model, approval_policy,
sandbox_mode, notes_path, reasoning_effort, max_subagents,
allow_shell, instructions array) is unchanged. Note that this
slice does NOT yet block escalation via value comparison — a
project setting `approval_policy = "auto"` still wins over a
user's stricter `"never"`. That richer check is filed as a
v0.8.9 follow-up.

Tests:

* `project_overlay_overrides_model_but_denies_provider`
  replaces the previous test that asserted provider WOULD
  override (now reversed).
* New `project_overlay_denies_dangerous_credentials_and_redirects`
  models the attacker scenario directly: project sets all four
  denied keys, asserts the user's pre-existing values survive
  and the project's are discarded.

CHANGELOG documents the deny-list rationale and lists which
fields remain overridable.
2026-05-03 07:27:44 -05:00
Hunter Bown a4c8cb2514 feat(prompts): structured Markdown compaction template (#429)
Replaces the legacy compaction template with the spec'd
Goal / Constraints / Progress (Done / In Progress / Blocked) /
Key Decisions / Next step structure.

The richer Progress sub-bullets help long resumed sessions
distinguish "what's verified done" from "what's mid-flight" —
useful when the model writes `.deepseek/handoff.md` before a
long break. The previous Active-task / Files-touched /
Key-decisions / Open-blockers / Next-step framing collapsed
"in progress" and "blocked" into a single "open blockers"
heading, which lost the lineage of "I started X, hit Y,
then…" trails.

Backwards compat: existing `.deepseek/handoff.md` files
continue to render fine because the loader
(`prompts.rs::load_handoff_block`) injects them as plain
markdown — the template only guides what NEW handoffs look
like.

The "pinned-tool-output configurability" half of #429's spec
remains a v0.8.9 follow-up because it requires changes to
`cycle_manager.rs` compaction logic itself; the template
restructure is independently shippable and is the bigger UX
delta in practice.

Tests: existing `compact_template_is_included_in_full_prompt`
updated to assert the new section headings and the nested
Progress sub-bullets. All 24 prompt tests pass.
2026-05-03 07:12:45 -05:00
Hunter Bown c0b6c2a1e5 perf(hooks): fast-path skip when no hooks configured (#455 follow-up)
Now that `tool_call_before` / `tool_call_after` fire on every
tool dispatch, the cost of constructing a `HookContext` (which
allocates for `workspace`, `model`, `session_id`, …) shows up
on the hot path even when the user has zero hooks configured —
the common case.

Adds `HookExecutor::has_hooks_for_event(event)` as a cheap
boolean gate that callers consult before building the context.
The pre-check returns false when:

* `config.enabled == false` (globally disabled).
* No hook in the config has the given `event`.

Wired through every fire site:

* `tool_routing.rs::handle_tool_call_started` —
  `ToolCallBefore`.
* `tool_routing.rs::handle_tool_call_complete` —
  `ToolCallAfter`. Also skips the `result.content.clone()`
  that the `with_tool_result` builder demands.
* `ui.rs::dispatch_user_message` — `MessageSubmit`.
* `ui.rs::apply_engine_error_to_app` — `OnError`.

Inside `HookExecutor::execute` itself, also short-circuit
before calling `context.to_env_vars()` when no hooks match the
event — defends against a caller that builds the context but
forgets to gate.

Tests:
  3 new tests cover empty-config / globally-disabled /
  per-event filtering. The existing 18 hook tests pass
  unchanged.

No behavioral change for users with hooks configured; pure
allocation-free fast path otherwise.
2026-05-03 07:07:11 -05:00
Hunter Bown e569f2ca99 feat(hooks): fire message_submit + on_error too (#455 observer-only)
Completes the observer-only slice of #455 by wiring the two
remaining `HookEvent` variants that were defined but never
fired:

* `MessageSubmit` fires from `dispatch_user_message` before
  the message is handed to the engine. Hook context carries
  `message` so observers can log every prompt the user
  submits, redact for compliance audit, or page on
  `/wipe-database`-style content. Read-only.
* `OnError` fires from `apply_engine_error_to_app` before the
  error cell reaches the transcript. Hook context carries
  `error`. Useful for paging on auth / billing / invalid-
  request failures without tailing the audit log.

Combined with the prior `tool_call_before` / `tool_call_after`
wiring, every `HookEvent` variant now has a live producer:
`SessionStart`, `SessionEnd`, `MessageSubmit`, `ToolCallBefore`,
`ToolCallAfter`, `ModeChange`, `OnError`. The `/hooks events`
listing already enumerates them with their on-fire semantics.

Hooks remain read-only observers in this slice. Mutation is
v0.8.9 follow-up because it needs a synchronous-gate contract
that would change semantics for every hook surface — including
the lifecycle events that have shipped for many releases.
2026-05-03 07:01:52 -05:00
Hunter Bown 4310202645 feat(hooks): fire tool_call_before / tool_call_after (#455 observer-only)
The `HookEvent::ToolCallBefore` and `HookEvent::ToolCallAfter`
enum variants were defined but never fired from production code,
so `[[hooks.hooks]]` entries with those events sat dormant.

Wires the fires from `tui/tool_routing.rs`:

* `handle_tool_call_started` fires `ToolCallBefore` with the
  hook context populated with `tool_name` and `tool_args`. The
  fire happens before any UI bookkeeping so observers see the
  call as early as possible.
* `handle_tool_call_complete` fires `ToolCallAfter` after the
  cell finalization with the result content (or stringified
  error) + success flag. Stays last in the function so any UI
  state the hook might want to observe via shell-out is
  already settled.

Hooks remain read-only observers in this slice. Mutation
(modifying tool args before execution, or the result before it
reaches the model) is a v0.8.9 follow-up that needs a
synchronous-gate contract; the existing executor is fire-and-
forget and adding mutation would change semantics for every
existing hook surface (session_start, mode_change, etc.).

Operators can wire `tool_call_before` / `tool_call_after`
hooks in `~/.deepseek/config.toml` immediately to log every
tool call, page on long shell exec, or audit risky operations.
The `/hooks events` listing already enumerates them.

No new tests — `tool_routing.rs` has no existing test surface,
and the hook execution path is already covered via
`hooks::tests::*`. The wiring is mechanically minimal.
2026-05-03 06:59:26 -05:00
Hunter Bown 8ed1cb4e68 feat(hooks): /hooks events subcommand for discovery (#460 polish)
The shipped `/hooks list` told users WHAT was configured but
not WHAT they could configure. Without this, the only way to
learn the supported `HookEvent` values is to grep source — not
ideal when most users just want to wire up a notification on
session_end.

Adds `/hooks events` (aliases `event` / `list-events`) which
prints every `HookEvent` variant alongside a short descriptive
blurb (when it fires, current observability-vs-mutation status).
Ordered lifecycle → per-tool → situational so the listing reads
naturally and stays stable across releases.

Updates `CommandInfo::usage` to `/hooks [list|events]` so the
fuzzy autocomplete shows the new subcommand.

Tests:
  1 new test (`events_subcommand_lists_every_event_variant_in_documented_order`)
  pins the order, the per-event descriptive blurb format, and
  exhaustive variant coverage. The existing 6 hooks tests pass
  unchanged.
2026-05-03 06:51:27 -05:00
Hunter Bown a8e0693958 feat(doctor): report spillover dir + composer stash file (#422/#440 polish)
The v0.8.8 polish stack added two on-disk surfaces operators
might want to inspect — `~/.deepseek/tool_outputs/` for spilled
tool output (#422 / #500), and `~/.deepseek/composer_stash.jsonl`
for parked composer drafts (#440). Neither showed up in
`deepseek doctor`, so users couldn't see at a glance "do I have
parked drafts?" or "how much disk has spillover claimed?"

Adds a `Storage:` section to the human-readable doctor and a
`storage` object to the JSON doctor:

* Spillover slot reports the dir's existence and entry count.
  Pre-creation state ("not yet created") is shown explicitly
  rather than as a missing dir — the dir is created lazily on
  first spill, not at boot.
* Stash slot reports the file's existence and parked-draft
  count by re-reading via `composer_stash::load_stash`. Empty /
  missing stash shows the Ctrl+S hint so the user knows how to
  use the feature.

The JSON schema always emits both nested slots regardless of
state (so dashboard schemas stay stable across hosts); the
human-readable hides the "not yet created" line for spillover
when the dir is missing to keep the report scannable.
2026-05-03 06:46:20 -05:00
Hunter Bown b1c6e6b173 feat(doctor): report .opencode + .claude skill dirs (#432 follow-up)
The cross-tool skill discovery shipped in 432a0c1 walks
`.opencode/skills/` and `.claude/skills/` alongside the
`.agents/skills/` and `skills/` workspace folders, but the
`deepseek doctor` output still only listed the original three
slots. Operators staring at "where are my Claude-style skills?"
had no way to confirm whether the new dirs were even being
checked.

Updates both surfaces:

* Human-readable doctor — adds two conditionally-printed lines
  for `.opencode skills dir` and `.claude skills dir`. Empty
  dirs are omitted to keep the report scannable; the dirs
  exist on most workspaces only when the user has installed
  another AI tool's skill catalog there.
* JSON doctor (`deepseek doctor --json`) — adds `opencode` and
  `claude` slots to the `skills` object alongside the existing
  `global`, `agents`, `local`. Each carries `path`, `present`,
  and `count`. JSON consumers see all five keys regardless of
  presence so dashboard schemas stay stable across hosts.

The `selected_skills_dir` field still reflects the legacy
"highest-precedence single dir" — workspace-aware discovery is
done at runtime by `discover_in_workspace`, but `selected` is a
useful "where do I install a NEW skill" hint and stays
unchanged for backwards compatibility with existing diagnostic
tooling.
2026-05-03 06:43:47 -05:00
Hunter Bown a368dc53b8 feat(commands): /hooks read-only lifecycle hook listing (#460 MVP)
Slash command enumerates configured lifecycle hooks from the
user's `[hooks]` table, grouped by event. The full picker /
persisted enable-disable surface in #460 is still M-sized work;
this MVP gives users a no-typing view of what's actually loaded
— the most-asked question once hooks start firing.

Implementation:

* `crates/tui/src/commands/hooks.rs` formats the hook list with
  per-event headings, hook name (or `(unnamed)`), background
  marker, timeout, condition summary, and a 60-char shell
  command preview.
* `condition_summary` covers every `HookCondition` variant
  (Always/ToolName/ToolCategory/Mode/ExitCode/All/Any) so the
  listing stays informative for compound conditions too.
* `event_label` maps each `HookEvent` to its config-file string
  so the listing matches what the user wrote in TOML.
* New `HookExecutor::config()` accessor exposes the underlying
  `HooksConfig` for read-only callers; doesn't open the door
  to mutation, which still belongs to the broader #460 work.
* Registered in `commands::COMMANDS` with `aliases: &["hook"]`,
  usage `/hooks [list]`, and `MessageId::CmdHooksDescription`
  localized in en, ja, zh-Hans, pt-BR.
* Wired into `command_palette::command_runs_directly` so
  pressing Enter from Ctrl+K runs `/hooks list` straight.

Tests:
  6 unit tests covering preview-cap truncation, newline
  stripping, condition-summary variants, event-label
  exhaustiveness, and BTreeMap-grouping ordering.
2026-05-03 06:36:37 -05:00
Hunter Bown 15127046e8 feat(stash): /stash clear subcommand to wipe the stash file (#440 polish)
Pairs with `/stash list` and `/stash pop` so the user can fully
manage the stash from inside the TUI without reaching for `rm`.

* New `composer_stash::clear_stash()` returns the number of
  entries dropped so the slash command can report it.
  Atomic-write replaces the file with empty content; missing /
  empty files return `Ok(0)` without erroring.
* `clear` / `wipe` / `drop` are accepted as the subcommand
  alias. The "unknown subcommand" hint now lists the three live
  subcommands explicitly.
* CommandInfo usage updated to `/stash [list|pop|clear]` so
  `/help` and the autocomplete reflect the new option.
* 3 new tests in `composer_stash`: returns-0 when file absent,
  returns-0 when file is empty, drops entries and reports count
  on a populated stash.

No new dependency; reuses `crate::utils::write_atomic` for the
truncate-and-rewrite.
2026-05-03 06:28:18 -05:00
Hunter Bown ba871c56f6 feat(cli): deepseek pr <N> — pre-seed TUI with PR context (#451)
`deepseek pr 1234` fetches the PR's title, body, base/head, URL,
and full diff via `gh`, then launches the interactive TUI with a
review prompt already typed in the composer. The user can edit
before sending or hit Enter to fire as-is. Falls back gracefully
with an actionable error when `gh` is not on PATH.

Implementation:

* `Commands::Pr { number, repo, checkout }` subcommand. Optional
  `--repo <owner/name>` mirrors `gh pr view`'s flag. Optional
  `--checkout` opt-in for `gh pr checkout`; default is to leave
  the working tree alone since `gh pr checkout` errors out on
  dirty trees.
* `run_pr` helper drives three best-effort gh shell-outs
  (`pr view --json`, `pr diff`, optional `pr checkout`) and
  formats a structured prompt: PR header → URL → branches →
  description → fenced ```diff block.
* `format_pr_prompt` caps the diff at 200 KiB with codepoint-
  safe truncation so a massive PR doesn't blow the model's
  context window before the user even hits Enter.
* New `TuiOptions::initial_input: Option<String>` plumbs the
  pre-typed text into `App::new` (which now branches its
  composer-state init around the option). Cursor lands at the
  end of the seed text. Future callers (welcome screens, share-
  link landing pages, etc.) can reuse the same channel.
* `run_interactive` gains an `initial_input: Option<String>`
  parameter; existing callers pass `None`.

Tests:
  3 new tests in `pr_prompt_tests` cover the happy path
  (title/url/branches/body/diff render correctly), empty-input
  fallbacks (placeholder for missing title/body/branches/url),
  and codepoint-safe truncation when the diff exceeds the
  200 KiB cap.

Bulk update: every other `TuiOptions { ... }` test-builder
across the workspace (~21 sites) gains `initial_input: None`
so the new field doesn't break the existing test suite.
2026-05-03 06:23:54 -05:00
Hunter Bown 6fb8739feb feat(composer): prompt stash — Ctrl+S parks, /stash list+pop (#440)
A stash is a side-channel from history: it holds drafts the user
parked deliberately instead of submissions made in the past
(which live in `composer_history.rs`).

* `crates/tui/src/composer_stash.rs` — JSONL-backed store at
  `~/.deepseek/composer_stash.jsonl`. One JSON object per line
  with `ts` (RFC 3339) and `text`. Self-healing parser drops
  malformed lines instead of poisoning the file. Multi-line
  drafts round-trip intact via JSON's newline escaping. Capped
  at 200 entries; oldest pruned at push time. Empty /
  whitespace-only text is silently dropped.
* `crates/tui/src/commands/stash.rs` — `/stash list` renders the
  stash with one-line previews and timestamps; `/stash pop`
  restores the most recently parked draft into the composer
  (LIFO) and rewrites the file. `/park` aliases `/stash`.
* Composer Ctrl+S handler in `tui/ui.rs` — pushes the current
  draft onto the stash, clears the composer, and surfaces a
  toast confirming the action so the no-op-feel doesn't fool
  users into thinking nothing happened. Empty composers are a
  no-op so a stray Ctrl+S can't pollute the file.
* New `KbStashDraft` keybinding entry registered in the help
  overlay; localized in en, ja, zh-Hans, pt-BR.

Tests:
  7 unit tests in `composer_stash.rs` cover round-trip, LIFO pop,
  empty-on-pop, drop-empty-text, multi-line preservation,
  malformed-line resilience, and cap pruning. 4 unit tests in
  `commands/stash.rs` cover the preview helper's truncation,
  multi-line first-line behavior, and empty-input handling.
2026-05-03 06:09:35 -05:00
Hunter Bown 0fa042dc99 feat(audit): emit tool.spillover events when output is spilled (#500 polish)
The existing `tool.result` audit event records that a tool
finished but says nothing about spillover — operators tailing
`~/.deepseek/audit.log` couldn't see when 200 KiB of stdout
landed under `~/.deepseek/tool_outputs/`.

Adds a discrete `tool.spillover` event keyed off
`apply_spillover`'s return value, fired in both the sequential
and parallel tool paths so the log entry exists regardless of
how the tool was scheduled. Each event carries:

  {"event": "tool.spillover", "tool_id": "...",
   "tool_name": "exec_shell", "path": "/.../call-abc.txt"}

This is a pure observability addition. The model still receives
the same truncated head + footer; the UI still renders the
inline `full output: <path>` annotation; the spillover writer
contract is unchanged. No new tests — `apply_spillover` already
has unit-level coverage and the engine paths are exercised by
integration runs.
2026-05-03 05:58:02 -05:00
Hunter Bown d7017b7829 feat(skills): walk workspace .opencode + .claude skill dirs (#432)
The skills catalogue and `load_skill` tool now scan every
candidate directory in the workspace plus the global default,
not just the first one that exists:

  <workspace>/.agents/skills    (deepseek-native convention)
  <workspace>/skills            (flat, project-local)
  <workspace>/.opencode/skills  (OpenCode interop)
  <workspace>/.claude/skills    (Claude Code interop)
  ~/.deepseek/skills            (global, user-installed)

Skills installed for any AI-tool convention land in the same
catalogue without the user having to symlink or duplicate
files. Name conflicts resolve first-match-wins per the
precedence list above, so workspace-local skills shadow
user/global ones — that's the right shadowing for "this repo
overrides my defaults".

Implementation:

* `skills::skills_directories(workspace)` returns the existing
  candidate dirs in precedence order (host-dependent for the
  global default).
* `skills::discover_in_workspace(workspace)` walks each, merges
  the discovered skills, and accumulates warnings.
* `render_available_skills_context_for_workspace(workspace)`
  wraps `discover_in_workspace` for `prompts.rs`. The legacy
  single-dir `render_available_skills_context(skills_dir)` is
  retained as a fallback so callers that don't have a workspace
  view (e.g. mcp_server.rs) still work.
* `LoadSkillTool` (#434) routes through `discover_in_workspace`
  so its lookup matches what the system-prompt catalogue
  advertises. The "skill not found" error message now lists the
  searched dirs to help the user debug missing installs.

Tests:
  4 new tests in `skills/mod.rs`: precedence-order resolution,
  first-wins merge across .agents and .claude, .opencode
  discovery, system-prompt rendering for cross-tool dirs. The
  existing 6 single-dir tests pass unchanged.
2026-05-03 05:52:28 -05:00
Hunter Bown 8290b136e1 feat(tui): push DISAMBIGUATE_ESCAPE_CODES on startup (#442)
Opt into the Kitty keyboard protocol's escape-code disambiguation
so terminals that support it (Kitty, Ghostty, Alacritty 0.13+,
WezTerm, recent Konsole / xterm) report unambiguous events for
Option/Alt-modified keys, plain Esc, and multi-byte sequences.

Push happens after `enable_raw_mode` and the alt-screen /
mouse-capture / bracketed-paste setup so the order matches
shutdown's reverse-order pop. Only the disambiguation tier is
pushed — `REPORT_EVENT_TYPES` and the higher tiers emit release
events that the existing key handlers would mis-route as
duplicate presses.

Pop on exit was already wired in main.rs (panic) and ui.rs
(normal shutdown) per #443; the recent #443 follow-up extended
that to the suspend paths so editor / shell-suspend children
inherit a clean keyboard mode. The push + the four pops form
a complete pair.

Failure to push is logged at debug level and ignored — a quirky
terminal can't block startup. On terminals without protocol
support the escape sequence is silently discarded and behaviour
is identical to today (iTerm2, Terminal.app, Windows 10 conhost).

No new dependency; everything runs through crossterm's existing
`PushKeyboardEnhancementFlags` command.
2026-05-03 05:45:52 -05:00
Hunter Bown e8af3cd37d feat(tools): load_skill model-callable tool (#434)
Adds a `load_skill` tool that takes a skill id and returns the
SKILL.md body plus the sibling companion-file list in one tool
call. The existing progressive-disclosure pattern (system prompt
lists skills → model `read_file <path>`) still works; this tool
is the higher-level affordance for skills that ship with multiple
resource files.

Implementation:

* `LoadSkillTool` lives in `crates/tui/src/tools/skill.rs`. Read-
  only, auto-approved, parallel-safe.
* On call, resolves the active skills directory via the new
  `skills::resolve_skills_dir` helper, which mirrors
  `App::new`'s hierarchy: `<workspace>/.agents/skills` →
  `<workspace>/skills` → `~/.deepseek/skills`. No new plumbing
  through ToolContext — the workspace is already there.
* Returns the skill body wrapped in a self-contained block:
  description quote, source path, the SKILL.md verbatim, and a
  `## Companion files` section listing siblings (sorted lex,
  deterministic for tests). Solo skills skip the companions
  section entirely so the tool result stays tight.
* Errors with a helpful hint when the name is unknown — the
  hint includes the catalogue ("Available: foo, bar, baz") so
  the model can recover without an extra discovery call.
* Wired into `ToolRegistryBuilder::with_skill_tools` and pulled
  into both Agent and Plan tool-setup paths. Plan mode benefits
  because skills are read-only references that planners often
  need.

Tests:
  5 unit tests covering: description-headed body, companion
  enumeration excluding SKILL.md and nested dirs, empty result
  for solo skills, and the conditional `## Companion files`
  section.
2026-05-03 05:43:18 -05:00
Hunter Bown 5deaf97253 fix(tui): pop keyboard flags on suspend paths too (#443 follow-up)
`main.rs` (process panic) and the normal TUI shutdown both pop
keyboard enhancement flags before handing the terminal back to
the child shell. The two suspend paths — `pause_terminal`
(Ctrl+Z and shell-suspend) and
`external_editor::spawn_editor_for_input` (composer `$EDITOR`
launch) — were missing the same defensive pop.

Today this is dormant: the TUI doesn't push keyboard
enhancement flags explicitly, so there's nothing to pop. The
fix is defence-in-depth: the day a future code path enables
the flags (kitty keyboard protocol for sub-second-precision
modifier reporting, say), the suspend handlers won't leak the
half-configured input mode to Vim / less / a shell child.

Aligns the four terminal-handoff sites (shutdown, panic,
suspend, editor) so they all do the same thing.
2026-05-03 05:29:11 -05:00
Hunter Bown ac0c16996e feat(config): instructions array merged into system prompt (#454)
Adds a new optional `instructions = ["./AGENTS.md", "~/.deepseek/global.md"]`
config field that's loaded at startup and concatenated into the
system prompt, in declared order, above the skills block.

* `Config::instructions: Option<Vec<String>>` — raw paths from
  `~/.deepseek/config.toml` or the per-project overlay.
* `Config::instructions_paths()` — `expand_path` each entry,
  drop empties, return the resolved `Vec<PathBuf>`.
* `merge_project_config` — project's array replaces the
  user-level array wholesale (including `instructions = []` to
  clear the user list for the current repo). The typical "merge"
  pattern is for users who want both — they list `~/global.md`
  inside the project array.
* `EngineConfig::instructions: Vec<PathBuf>` — threaded from
  config through both engine entry points (`Engine::new` for
  Default and `refresh_system_prompt` for runtime swaps).
* `prompts::render_instructions_block(paths)` — loads each file
  in order, caps each at 100 KiB with a `[…elided]` marker on
  overflow, skips missing files with a tracing warning. Returns
  `None` when nothing renders so the caller appends nothing.
* `system_prompt_for_mode_with_context_and_skills` gains an
  `instructions: Option<&[PathBuf]>` parameter. Block lives
  between the project-context block and the skills block so it
  benefits from KV prefix caching and per-project overrides
  apply consistently turn-over-turn.

Documentation:
* `config.example.toml` documents the field, the wholesale-
  override semantics, and the size cap.

Tests:
* 5 new tests in `prompts.rs`: no-op for empty input, skip
  missing files, declared-order concatenation, skip empty files,
  truncate oversize files, plus an end-to-end test that the
  block appears in the assembled system prompt when configured.
2026-05-03 05:25:31 -05:00
Hunter Bown 5e83f073b1 feat(footer): cumulative session-elapsed indicator (#448)
Adds `App::session_started_at: Instant` (set at construction) and
a low-priority `worked Nh Mm` chip in the footer's right cluster
that surfaces session age once it crosses 60s.

* `footer_worked_chip(elapsed)` returns empty spans for the first
  minute of a session so a fresh launch doesn't render a noisy
  ticker. Above the threshold it reuses the multi-day
  `humanize_duration` helper (#447) so the band promotion stays
  consistent: `1m`, `3h 12m`, `2d 5h`, `1w 2d`.
* The chip slots in last in `auxiliary_spans`, which means under
  narrow widths it's the first thing the priority-drop loop
  removes — the existing chips (coherence / agents / replay /
  cache / mcp) keep their slots.
* `FooterProps` carries a captured `worked: Vec<Span<'static>>`
  built at props-build time (matches the existing `retry`
  capture pattern). Render stays pure, tests can pin a known
  state without relying on wall-clock.

Tests:
  3 new tests in `tui/widgets/footer.rs` — chip hidden under 60s,
  chip rendered with humanized labels at 60s / 3h 12m / 2d 5h
  bands. The existing `from_app_idle_state` test gains a
  `worked.is_empty()` assertion (the test app is freshly
  constructed, well under the 60s threshold).
2026-05-03 05:17:01 -05:00
Hunter Bown 6dfb10f321 feat(a11y): NO_ANIMATIONS env override + accessibility docs (#450)
`fancy_animations: false` and `low_motion: true` already exist on
the settings struct, but the flag was undocumented and the only
ways to opt in were the `/settings` slash command or hand-editing
`~/.config/deepseek/settings.toml` — there was no environment-
level signal that platform a11y tooling could carry forward.

* `NO_ANIMATIONS=1` env var now forces `low_motion = true` and
  `fancy_animations = false` at startup, regardless of what's on
  disk. Recognises `1`, `true`, `yes`, `on` (case-insensitive);
  any other value is treated as unset.
* `Settings::apply_env_overrides()` is now called at the end of
  `Settings::load()`, so every consumer (App::new, /config, the
  doctor surface) sees the override applied uniformly. The
  override is a startup-time overlay — changing the env var
  mid-session has no effect.
* New `docs/ACCESSIBILITY.md` documents the existing `low_motion`,
  `fancy_animations`, `calm_mode`, `show_thinking`, and
  `show_tool_details` toggles plus the `NO_ANIMATIONS` startup
  override. Includes guidance for screen-reader users and a link
  back to this issue for follow-up motion regressions.

Tests:
  3 new tests in `settings.rs` (force-low-motion-on, override-
  user-opt-in, truthy-spelling-recognition). All three serialise
  through a static Mutex so the cargo parallel runner doesn't
  observe interleaved env mutations.
2026-05-03 05:09:17 -05:00
Hunter Bown 3625b887fa feat(ui): humanize_duration handles hours, days, and weeks (#447)
Long-running sessions (multi-hour cycles, multi-day automations)
were rendering with the seconds/minutes-only formatter, so a
two-day session showed as `2880m 0s` and `/goal` status used
Rust's Debug Duration form (`188415.234s`).

`humanize_duration` now walks through w/d/h/m/s and caps the
output at two units so it stays compact in headers and
notifications:

* `45s`, `1m 12s`, `59m 59s` (existing seconds/minutes path)
* `1h`, `2h 2m`, `3h 12m` (was `192m 30s`)
* `1d`, `1d 1h`, `2d 5h` (the multi-day case from the issue)
* `1w`, `1w 1d`, `3w 2d` (long-running automation)

The two-tier rule drops sub-minute precision once you're past
the hour boundary; the goal is "is this a couple of hours or
days," not stopwatch precision.

`/goal` status now wires through this formatter so multi-day
goal-elapsed times read as `2d 3h` instead of the previous
`188415.234s` Debug form. The notification system was the
existing caller and picks up the new format automatically.

Tests:
  4 test functions in `notifications.rs` covering the four
  formatting bands (s/m, h/m, d/h, w/d) plus the boundary cases
  on each unit.
2026-05-03 05:05:30 -05:00
Hunter Bown 0b99ad1f25 feat(engine): wire tool-output spillover into the engine and pager (#500)
The spillover writer (#422) and inline cell annotation (#423) were
already in place; this commit makes the pipeline actually fire and
gives the user a way to see the elided tail.

* `apply_spillover` lives in `tools/truncate.rs` and mutates a
  `ToolResult` in place: writes the full content to
  `~/.deepseek/tool_outputs/<id>.txt`, replaces the inline content
  with a 32 KiB head plus a footer pointing at the file, and stamps
  `metadata.spillover_path` so downstream renderers can find it.
  Skips error results so the model still sees the failure verbatim.
  Preserves prior metadata when present.

* `core/engine/turn_loop.rs` calls `apply_spillover` immediately
  after `execute_tool_with_lock` returns, before the result fans
  out to the model context (`ContentBlock::ToolResult`) and the UI
  (`Event::ToolCallComplete`). Both the parallel and sequential
  tool paths get the same hook so the model and the UI always see
  the same truncated content.

* `tui/ui.rs::open_details_pager_for_cell` now folds the full
  spillover-file body into the tool-details pager when the focused
  cell has a `spillover_path`. Truncated head stays at the top
  (so the user can see what the model received) followed by a
  `── Full output (spillover) ──` separator and the file body.
  Missing files render an inline notice instead of silently
  truncating.

* The model's footer ("Use `read_file path=…` if you need the
  elided tail") teaches the agent how to recover the rest of the
  payload on its next turn, so spilled output is not lost — just
  not paid for in context tokens unless the agent decides it
  actually needs the tail.

Tests:
  4 new unit tests in `tools/truncate.rs` (no-op below threshold,
  no-op for errors, truncate + stamp above threshold, preserve
  prior metadata). 3 new tests in `tui/ui/tests.rs` for the pager
  helper (no-op without spillover_path, file-load happy path,
  graceful notice when the file is missing).
2026-05-03 05:02:11 -05:00
Hunter Bown 637d0f088f fix(agents): list Implementer/Verifier in agent_spawn + agent_assign schemas (#404)
The SubAgentType enum gained `Implementer` and `Verifier` variants in #404,
but the JSON-schema `description` strings on AgentSpawnTool::input_schema
and DelegateToAgentTool::input_schema still listed the pre-#404 set
(general/explore/plan/review/custom). The model only sees those
descriptions, so the new roles were effectively hidden behind a
docs lookup.

Updates both descriptions to the post-#404 surface and references
docs/SUBAGENTS.md for posture. Also adds the long-form aliases
(builder/validator/tester) to the agent_assign hint so it matches
the canonical alias map. Pure copy change — no behaviour delta.
2026-05-03 04:50:51 -05:00
Hunter Bown 482fcdee7c docs(changelog): collapse #422 + #423 spillover entry
Both halves now shipped; combined entry reads more clearly than two
separate ones split across Added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:45:03 -05:00
Hunter Bown cea4617fb4 docs(changelog): record #422 spillover writer in v0.8.8 entry
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:38:17 -05:00
Hunter Bown 01fa11b96f docs(changelog): note /sessions prune slash command in #406 entry
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:31:18 -05:00
Hunter Bown 2fa23c1d74 docs(changelog): record session-prune helper + doctor memory block
Two items added in this stabilization pass that weren't yet in the
changelog:
- SessionManager::prune_sessions_older_than (#406 phase-1)
- doctor --json memory block (#489 follow-up)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:27:12 -05:00
Hunter Bown 8e7664bc70 docs(changelog): populate [Unreleased] with v0.8.8 stabilization entries
Catalogues the 24 v0.8.8 issues shipped across PRs #514 / #515 /
#517 / #518 / #519 in the standard Keep-a-Changelog format,
organized into Added / Changed / Fixed buckets with issue
cross-references.

Captured:
- Added (10): memory MVP + remember tool, inline diff, OSC 8,
  retry banner, MCP chip, project config overlay,
  Implementer/Verifier roles, two doc files, competitive analysis
- Changed (8): sub-agent cap, RwLock, output summarization,
  agent_list session boundary, concise todos, compact agent_spawn,
  Plan panel, RLM family
- Fixed (8): self-update arch, Option+Backspace word delete,
  offline queue scope, display_path Windows, footer theme color,
  panic-exit keyboard flags, CI workflow cleanup, plus the v0.8.8
  release-base fix

Plus a Releases callout reminding maintainers that the npm wrapper
publish stays manual and the GitHub release automation depends on
the \`RELEASE_TAG_PAT\` secret.

The dated section header lands when the actual version-bump
commit fires \`auto-tag.yml\`. This commit just populates the
[Unreleased] body so contributors get a clean summary while the
PRs are still in review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 04:19:09 -05:00
Claude 0e5afe0b01 feat(v0.8.8): linux ARM64 prebuilts + install docs overhaul
Triggered by a Telegram report from a Chinese user trying to deploy
DeepSeek TUI on a HarmonyOS ARM64 thin-and-light: `npm i -g deepseek-tui`
exited with `Unsupported architecture: arm64 on platform linux` because
v0.8.7 only published x64 Linux artifacts. They worked around it with
`cargo install`, but the README never documented that path for ARM users.

This PR closes that gap on three layers:

- **Release workflow** — add `aarch64-unknown-linux-gnu` to the build
  matrix using GitHub's `ubuntu-24.04-arm` runner. v0.8.8 will publish
  `deepseek-linux-arm64` and `deepseek-tui-linux-arm64` alongside the
  existing x64/macOS/Windows assets, plus add the row to the Release
  body's manual-download table.

- **npm wrapper** — uncomment the linux/arm64 row in `ASSET_MATRIX`,
  rewrite the `Unsupported architecture/platform` error to print the
  full `cargo install deepseek-tui-cli deepseek-tui --locked` recipe
  and link to docs/INSTALL.md, and add `DEEPSEEK_TUI_OPTIONAL_INSTALL=1`
  so CI matrices that include unsupported platforms can keep running
  without a binary.

- **Docs** — new docs/INSTALL.md covering every supported platform,
  prebuilt vs. cargo install vs. manual download, cross-compiling x64
  -> ARM64 with `cross` or `gcc-aarch64-linux-gnu`, China mirror setup,
  and a troubleshooting section for the common arm64, MISSING_COMPANION_BINARY,
  and self-update arch-mapping (#503) errors. README and README.zh-CN
  now have an explicit Linux ARM64 quickstart pointing at `cargo install`
  for v0.8.7 today and `npm i -g` for v0.8.8+; the v0.8.7 known-issue
  block is updated to mention both #503 and the missing arm64 prebuilt.

https://claude.ai/code/session_01Fg1FKMtDxVnC4pp6bNBRCS
2026-05-03 04:42:53 +00:00
Hunter Bown 98ab76a99c docs: add v0.8.7 changelog + README release notes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:26:31 -05:00
Hunter Bown 5bfc1feb62 v0.8.6: survivability, UX polish, and release hardening
Merge the v0.8.6 feature batch and release hardening.\n\nIncludes the full #373-#380/#382-#402 milestone scope, version bump to 0.8.6, secure /share temp-file handling, Windows-safe self-update replacement, and CI portability fixes.\n\nRemote PR checks passed on the final head before merge.
2026-05-02 20:11:33 -05:00
Hunter Bown 3d3ff0c5cf Release v0.8.4: Phase 1 i18n + cache-prefix stability
* fix(pricing): extend V4 Pro 75% discount expiry to 2026-05-31 15:59 UTC

DeepSeek extended the promotional discount past the original 2026-05-05
cutoff. Without this update the TUI would have started showing 4× the
actual billed cost on May 6.

Source: https://api-docs.deepseek.com/quick_start/pricing — "extended
until 2026/05/31 15:59 UTC".

Adds a regression test pinning the new active window so a future revert
to the May 5 date trips the suite immediately.

Closes #267

* chore: remove stale TODO(integrate) markers from already-integrated modules

Five `// TODO(integrate)` comments and one matching "Not yet integrated"
note were misleading anyone grepping for integration work. Each module
is in fact wired up:

- execpolicy/mod.rs       → tools/shell.rs:1322 (load_default_policy)
- sandbox/mod.rs          → tools/shell.rs:28, main.rs:2647, tui/approval.rs:30
- sandbox/policy.rs       → main.rs:2752, tui/approval.rs:30 (SandboxPolicy)
- command_safety.rs       → tools/shell.rs:1321, tools/tasks.rs:13,
                            tools/approval_cache.rs:26
- tui/streaming/mod.rs    → tui/app.rs:38 (StreamingState)

The remaining TODO at mcp.rs:1771 covers a separate "wire legacy sync API
into CLI subcommands or remove" decision and is left in place.

Closes #266

* docs(release): add install + dual-binary template to GitHub Release page

Closes #265.

The Release page used the auto-generated commit-title body. New users
hitting the Release page from Twitter / npm-search had no on-page
guidance that the dispatcher (`deepseek`) and the TUI runtime
(`deepseek-tui`) ship as two binaries that must coexist; #258 was an
external user spending 11 minutes figuring this out and #272 was the
follow-on confusion.

The new body covers:
- npm wrapper as the recommended install
- `cargo install deepseek-tui-cli deepseek-tui --locked` (both crates)
- Manual download with a per-platform table showing both artifacts
- sha256 verify using the existing `deepseek-artifacts-sha256.txt`
- Changelog link

* feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss

Step 1 of #263. Without per-turn telemetry the prefix-cache audit is
unfounded speculation; the rest of the issue's investigation steps
depend on this surface.

The DeepSeek API already returns `prompt_cache_hit_tokens` and
`prompt_cache_miss_tokens` per turn, and we already store the *latest*
on App. This adds a 50-turn ring (`turn_cache_history`) populated at
the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]`
slash command that renders a fixed-width table of the last N turns
with per-turn ratios and a session aggregate. Default count is 10;
larger values clamp to the ring size.

Edge cases the formatter handles:

- No telemetry yet → friendly "no turns recorded" message
- `cache_hit_tokens = None` (provider didn't report) → row renders all
  em-dashes and is excluded from session aggregates so one missing-
  telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as
  `input − hit` and mark the cell with `*`. Footer documents the
  asterisk.
- Ring at cap (50) → push evicts oldest.

Tests cover all four paths plus the cap.

* test(prompts): add cache-prefix stability harness for #263 step 2

The DeepSeek prefix-cache only hits while the byte prefix of each
request matches the prior call. Anything in the cached prefix that
varies turn-to-turn for unchanged inputs is a cache buster.

Adds a focused harness next to the production surface so the property
is regression-guarded:

1. `first_divergence(a, b)` helper that returns the first divergent
   byte position with a `±32 byte` window of context, used by the
   custom assertion `assert_byte_identical`. Future suspect tests can
   reuse this to surface "where" rather than just "fail".

2. `compose_prompt_is_byte_stable_across_calls` — sweeps every
   (mode, personality) pair and pins that two consecutive calls
   produce identical bytes. Rules out suspect #4 (mode-prompt churn).

3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace`
   — the call site `engine.rs::build_tool_context` actually invokes,
   pinned for an empty workspace across all three modes.

4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary`
   — pins that the surrounding prompt construction faithfully embeds
   the working_set summary it's given without injecting extra
   non-determinism. (The actual working_set summary stability lives
   in `working_set.rs` and is the next investigation target — see
   issue note in PR description.)

Foundation for the suspect-by-suspect bisection in the rest of #263.

* fix(secrets): never overwrite the secrets file when load_unlocked errors

`FileKeyringStore::set` and `delete` did
`self.load_unlocked().unwrap_or_default()`, which wiped every existing
secret if the read failed for any reason other than \"file is missing\":

- file mode != 0600 (`InsecurePermissions`) — easy on headless / CI
  environments where a permissive umask got applied
- corrupt JSON
- transient I/O error

In all of those, the next `store_unlocked` overwrote the file with an
empty-or-single-entry blob and reset perms to 0600, silently losing
every other provider's key.

Switch both call sites to `?`. `load_unlocked` already returns
`Ok(default)` for a missing file, so the first-write-creates-the-file
ergonomic is preserved (covered by the new
`file_store_set_still_creates_file_when_missing` test).

Adds four regression tests:

- set: insecure perms surface InsecurePermissions and leave the file
  byte-identical.
- delete: same.
- set: corrupt JSON surfaces the parse error and leaves the file
  byte-identical.
- set: missing file path still works (idempotence guard).

Closes #281

* fix(cache): make tool catalog byte-stable across calls and sessions

DeepSeek's KV prefix cache hits on the longest matching byte prefix of
the request. Two places in the tool-array path were silently introducing
divergence:

1. `ToolRegistry::to_api_tools()` iterated `self.tools.values()` directly.
   Rust's default `HashMap` is seeded with `RandomState` per process, so
   every `deepseek` launch produced a different tool order — the cross-
   session resume case (the one with the biggest cache wins) never hit.

2. `active_tool_list_from_catalog()` filtered the catalog `Vec` by the
   active set in catalog order. When ToolSearch activated a previously-
   deferred tool mid-conversation, the new tool appeared at its catalog
   index, shifting every later tool's byte offset and busting the cached
   prefix from there onwards.

Fixes:

- `to_api_tools()` now sorts by tool name before emitting the API tool
  array. Stable across calls AND across launches.
- `build_model_tool_catalog()` sorts each partition (built-ins first,
  contiguous; MCP tools after, also alphabetical). Mirrors Claude Code's
  `assembleToolPool` strategy where they explicitly call out cache
  stability as the reason: "a flat sort would interleave MCP tools into
  built-ins and invalidate all downstream cache keys whenever an MCP
  tool sorts between existing built-ins."
- `active_tool_list_from_catalog()` puts always-loaded tools in catalog
  order at the head and deferred-but-now-active tools at the tail. A
  deferred-tool activation during ToolSearch no longer shifts earlier
  tools' positions.

Adds three regression tests:

- `to_api_tools_emits_alphabetical_order_regardless_of_registration_order`
- `model_tool_catalog_sorts_each_partition_for_prefix_cache_stability`
- `active_tool_list_pushes_deferred_activations_to_the_tail`

Refs #263. Findings produced by reading reference Claude Code source
side-by-side with our request-building flow; full delta analysis in
the PR description.

* fix(sandbox): elevate Agent-mode shell sandbox to allow network access

The seatbelt-default policy is `WorkspaceWrite { network_access: false }`,
which on macOS emits `(deny default)` with no `(allow network-outbound)` /
`(allow system-socket)`. Every outbound socket call from a sandboxed
shell command — including `getaddrinfo` for DNS — gets denied by the
kernel. Symptom: "DNS resolution failed" for any URL the model tries to
reach via curl, yt-dlp, package managers, etc.

Engine.build_tool_context only elevated the policy in Yolo mode, leaving
Agent mode (the default) stuck on the strict default. That's tighter
than competitors (Claude Code, Codex) without buying any safety the
application-level NetworkPolicy or the approval flow doesn't already
provide.

Switch the elevation to a `match` so:

- Plan       → no elevation (read-only investigation; shell tool not registered)
- Agent      → WorkspaceWrite { network_access: true, … }
- Yolo       → WorkspaceWrite { network_access: true, … } (unchanged)

Adds `agent_and_yolo_modes_elevate_shell_sandbox_to_allow_network` so a
future revert to the no-network default trips CI immediately.

Closes #273

* fix(skills): treat bare github.com/<owner>/<repo> URLs as GitHubRepo

Closes #269.

`/skill install https://github.com/obra/superpowers` failed on every
platform with `invalid gzip header`. Root cause: `InstallSource::parse`
matched any `https://`-prefixed spec as `DirectUrl`, so the installer
downloaded the HTML repo page (200 OK, `text/html`) and tried to
gzip-decode HTML. The user reported it from Win11 + PowerShell but the
parse path is platform-independent.

Recognize bare GitHub repo URLs in `InstallSource::parse`:

- `https://github.com/<owner>/<repo>`
- `https://github.com/<owner>/<repo>/`
- `https://github.com/<owner>/<repo>.git`
- `https://github.com/<owner>/<repo>.git/`
- `https://www.github.com/<owner>/<repo>`
- `http://github.com/<owner>/<repo>` (legacy)

…all route to the existing `GitHubRepo` source, which already produces
`https://github.com/<repo>/archive/refs/heads/{main,master}.tar.gz`
candidates with proper fallback. URLs with a third path segment
(`/archive/...`, `/blob/...`, `/tree/...`) keep going through
`DirectUrl` because the user picked that exact path.

Adds two regression tests: one asserting the seven recognised forms
all canonicalize to `github:obra/superpowers`, and one pinning the
sub-resource paths to `DirectUrl`.

* fix(cache): drop volatile fields from working_set summary block (#280) (#287)

The working-set summary lands inside the system prompt before the
historical conversation, so any byte that drifts there cache-misses
everything that follows in DeepSeek's KV prefix cache. Two sources of
turn-over-turn drift are removed:

1. The rendered line is now `- {path} ({kind})`. The previous form
   interpolated `entry.touches` and `self.turn - entry.last_turn`,
   both of which advance on every user message even when no new
   paths are observed.

2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC)
   instead of the turn-aware `sorted_entries`. The recency bonus in
   `score_entry` crosses bucket boundaries as turns advance, so even
   without rendering `last seen` the order — and which entries cross
   the `max_prompt_entries` cutoff — drifted. Compaction pinning
   still uses `sorted_entries` because it genuinely wants recency.

Adds a regression test that observes a fixed message set, calls
`summary_block` before and after `next_turn()`, and asserts the two
outputs are byte-identical. The shared `first_divergence` /
`assert_byte_identical` helpers (from #279) move from `prompts::tests`
into `test_support` so working_set tests can reuse them.

Closes #280.

* fix(cache): memoise tool catalog so descriptions stay byte-stable (#289)

`to_api_tools` previously re-sampled `tool.description()` and
`tool.input_schema()` on every call. Native tools return `&'static str`
and a `json!` literal, so the bytes were stable in practice — but the
`McpToolAdapter` returns `self.tool.description.as_deref()`, which can
drift when the upstream MCP server reconnects with a different
description string. Any drift mid-session rewrites the tool catalog
that lands in the cached prefix and busts every byte that follows.

Adds an `api_cache: OnceLock<Vec<Tool>>` field on `ToolRegistry`. The
first `to_api_tools` call materialises the catalog; subsequent calls
return a clone of the cached vector. Mutations (`register`, `remove`,
`clear`) reset the field so the next read rebuilds. Mirrors
reference-cc's `getToolSchemaCache` (`utils/api.ts:119–208`).

Tests:
- `to_api_tools_pins_description_bytes_across_calls` registers a tool
  whose `description()` advances through a script of pre-built strings
  on each call. After the cache is populated, the second `to_api_tools`
  read returns the original description because `description()` is no
  longer invoked. Without the cache the second read would return the
  next script entry.
- `register_invalidates_api_tools_cache` registers a tool, snapshots,
  registers another, snapshots again, and asserts the second snapshot
  reflects both tools (cache rebuilt) and that the varying tool's
  description advanced (proving the rebuild actually re-sampled).
- `remove_and_clear_invalidate_api_tools_cache` covers the other two
  invalidation paths.

* fix(cache): sort project_tree and summarize_project output (#290)

Both helpers walked the workspace via `ignore::WalkBuilder::build()`
and emitted entries in the OS readdir order — non-deterministic across
filesystems (htree-hash on ext4, insertion-order on APFS, etc.). Their
output lands in the fallback branch of the system prompt's project
context (when the workspace has no AGENTS.md / CLAUDE.md) and inside
the `project_map` tool surface, both of which feed the cached prefix.

`summarize_project` now sorts the collected key-files list before the
type-detection logic and the fallback `Project with key files: …` join.

`project_tree` collects `(rel_path, is_dir)` tuples, sorts by full
path, and only then formats the indented tree. Sorting by full path
preserves the visual tree shape — `"src" < "src/lib.rs"` because the
shorter string compares less — while making siblings deterministic.

Tests cover sibling order, parent-before-children invariant, byte
stability across two consecutive calls, and the fallback `Project
with key files:` branch (the only branch where the joined order
escapes into output without further sorting downstream).

* fix(client): unique fallback id for parallel streaming tool calls (#291)

When a streamed tool_call delta omits the `id` field, the chat-completion
decoder used to fall back to the literal string `"tool_call"` for every
call. With the V4 API's native parallel tool calls (multiple tool_calls
in one delta), every parallel call ended up with the same fallback id —
downstream tool-result routing then matched the first call's result
twice and the second call hung waiting for an answer that never arrived.

The fallback now indexes by the assigned `content_block` position,
producing `"call_0"`, `"call_1"`, … within a single response. Upstream-
supplied ids are still forwarded verbatim; only the fallback path
changes.

Tests pin both invariants:
- `decoder_assigns_unique_fallback_ids_to_parallel_tool_calls_missing_id`
  feeds two tool calls without `id` in one delta and asserts they get
  distinct ids.
- `decoder_preserves_upstream_tool_call_id_when_present` keeps the
  forward-as-is path honest.

* fix(cache): place handoff and working_set after static prompt blocks (#292)

* fix(cache): drop volatile fields from working_set summary block (#280)

The working-set summary lands inside the system prompt before the
historical conversation, so any byte that drifts there cache-misses
everything that follows in DeepSeek's KV prefix cache. Two sources of
turn-over-turn drift are removed:

1. The rendered line is now `- {path} ({kind})`. The previous form
   interpolated `entry.touches` and `self.turn - entry.last_turn`,
   both of which advance on every user message even when no new
   paths are observed.

2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC)
   instead of the turn-aware `sorted_entries`. The recency bonus in
   `score_entry` crosses bucket boundaries as turns advance, so even
   without rendering `last seen` the order — and which entries cross
   the `max_prompt_entries` cutoff — drifted. Compaction pinning
   still uses `sorted_entries` because it genuinely wants recency.

Adds a regression test that observes a fixed message set, calls
`summary_block` before and after `next_turn()`, and asserts the two
outputs are byte-identical. The shared `first_divergence` /
`assert_byte_identical` helpers (from #279) move from `prompts::tests`
into `test_support` so working_set tests can reuse them.

Closes #280.

* fix(cache): place handoff and working_set after static prompt blocks

`system_prompt_for_mode_with_context_and_skills` previously interleaved
volatile content into the static prefix:

  1. mode prompt           static
  2. project context       static
  3. working_set_summary   ← volatile
  4. skills_block          static
  5. handoff_block         ← volatile
  6. ## Context Management static
  7. COMPACT_TEMPLATE      static

Anything past byte (3) cache-missed every time the working-set drifted
or `/compact` rewrote `.deepseek/handoff.md` — including the static
`## Context Management` and `## Compaction Handoff` blocks behind them.

New order keeps every static block in the cached prefix and pushes the
two volatile blocks to the end:

  1. mode prompt
  2. project context (or fallback automap)
  3. skills block
  4. ## Context Management (Agent / Yolo only)
  5. COMPACT_TEMPLATE
  ── volatile boundary ──
  6. handoff block
  7. working-set summary

Adds a doc comment on the function describing the volatile-content-last
invariant so future contributors don't reintroduce churn into the
prefix. Adds two regression tests:

- `system_prompt_with_handoff_file_is_byte_stable_when_file_is_unchanged`
  pins the handoff path with a fixture file.
- `handoff_and_working_set_appear_after_static_blocks` asserts the
  ordering invariant directly so a future reorder fails loudly.

Reference: Claude Code's own prompt builder marks this same boundary
with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant; we don't introduce
the abstraction yet but match the principle.

* feat(i18n): localize slash command help (Phase 1a, #285) (#294)

Adds 44 new MessageIds, one per slash command, and translations to all
four shipped locales (en/ja/zh-Hans/pt-BR). Refactors CommandInfo so the
English description now lives in localization.rs (single source of
truth) instead of being duplicated on the struct, and threads the
active Locale through the three render surfaces:

- crates/tui/src/tui/views/help.rs (the ?/F1/Ctrl+/ help overlay)
- crates/tui/src/tui/command_palette.rs (Ctrl+K palette)
- crates/tui/src/commands/core.rs (the /help text command)

Usage strings (e.g. /cache [count]) stay English by design — they're
placeholder syntax, not natural language.

The existing locale-coverage test
(`shipped_first_pack_has_no_missing_core_messages`) already iterates
ALL_MESSAGE_IDS across Locale::shipped(), so the 44 new IDs are
automatically required to be present in all four locale arms or CI
fails.

This is the first of several incremental Phase 1 PRs. Phase 1b covers
the debug commands (/tokens /cost /cache), 1c the footer hints, and
1d doctor output. Phases 2–3 cover onboarding and error surfaces.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(i18n): localize /tokens /cost /cache debug output (Phase 1b, #285) (#295)

Adds 13 new MessageIds covering the report templates and the
sub-strings shared across them, with translations for all four
shipped locales (en/ja/zh-Hans/pt-BR):

- CmdTokensReport, CmdTokensContextWithWindow, CmdTokensContextUnknownWindow
- CmdTokensCacheBoth, CmdTokensCacheHitOnly, CmdTokensCacheMissOnly
- CmdTokensNotReported
- CmdCostReport
- CmdCacheNoData, CmdCacheHeader, CmdCacheTotals, CmdCacheFootnote, CmdCacheAdvice

Each template uses {placeholder} substitution via String::replace
rather than format!, since format! requires a literal — the
locale-resolved &'static str isn't one. The placeholder convention
({active}, {hit}, {miss}, …) means a translator can re-order or
restructure a sentence freely without changing the call site.

Helpers `token_count`, `active_context_summary`, `cache_summary`, and
`format_cache_history` now take `Locale` so each can resolve their
templates from the same source of truth.

The English templates byte-match the previous hardcoded format strings
so the existing 16 debug-command tests pass unchanged.

Column headers in the cache table (`turn   in    out   hit   miss …`)
are intentionally NOT localized — the body rows are formatted with
fixed column widths and translating the header words would break
alignment. Numbers, ratios, and the model id stay in English form.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(i18n): localize footer state + help section labels (Phase 1c, #285) (#296)

Adds 11 new MessageIds covering visible footer chrome and the help-overlay
section headings, with translations for all four shipped locales:

Footer:
- FooterWorking — animated `working` / `working.` / … pulse
- FooterAgentSingular / FooterAgentsPlural — the sub-agent count chip
- FooterPressCtrlCAgain — the quit-confirmation toast

Help overlay sections (`?` / `F1` / `Ctrl+/`):
- HelpSectionNavigation, HelpSectionEditing, HelpSectionActions,
  HelpSectionModes, HelpSectionSessions, HelpSectionClipboard,
  HelpSectionHelp

`KeybindingSection::label` now takes Locale and returns tr(locale, …).
`footer_working_label` and `footer_agents_chip` likewise take Locale; the
two production callsites in tui/ui.rs pass `app.ui_locale`.

The mode chip itself (agent / yolo / plan) intentionally stays English —
those are brand/acronym labels, and translating them would mean explaining
to maintainers what `代理` means in a bug report.

The keybinding catalog DESCRIPTIONS (41 entries) are not translated in this
PR — those are technical prose that would dwarf the rest of i18n work and
can ship in v0.8.5. Section labels are translated so the help overlay
groups read as expected in any locale.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(commands): smoke-test that every / command dispatches to a handler (#299)

Adds two parallel-safe smoke tests in `crates/tui/src/commands/mod.rs`
that iterate the COMMANDS registry and verify every command — and every
declared alias — dispatches to a real handler. A dispatch miss surfaces
as the fall-through `Unknown command:` error message in `execute`,
which used to be invisible until a user typed the command and saw the
"did you mean" suggestion fire on a registered command.

The tests build a workspace-isolated app via `tempfile::TempDir` so
side-effecting handlers (`/init` writing AGENTS.md, `/save` and
`/export` writing files) do not pollute `crates/tui/` when CI runs from
there. `/save` and `/export` get an explicit tempdir-relative path
because their no-arg defaults still resolve relative to `cwd`.

`/restore` is skipped — it shells out to git for the snapshot repo and
its own dedicated tests in `commands/restore.rs` already serialize on
the global env mutex via `scoped_home`. The existing coverage there is
sufficient.

Closes a gap surfaced when verifying that the v0.8.4 i18n refactor
(#294, #295, #296) did not silently break any slash-command dispatch.
All 44 commands and their aliases pass (16 aliases on top of the
44 names; `/restore` is the only skip).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump version to 0.8.4 (#297)

CHANGELOG entry covers the v0.8.4 work landed since 0.8.3:

- Localization Phase 1 (#285) — slash command help (#294), debug command
  output (#295), footer state and help-overlay section labels (#296).
  Adds 68 new MessageIds across all four shipped locales (en/ja/zh-Hans/pt-BR).

- Cache-prefix stability (#263) — five companion fixes (#287, #288→#292,
  #289, #290, #291) that keep the DeepSeek prefix cache stable across turns.

- Plus the items already in [Unreleased]: agent-mode network exec (#272),
  /skill GitHub URL parsing (#269), and the V4 Pro discount expiry extension
  (#267).

Bumps:
- Cargo.toml workspace version 0.8.3 → 0.8.4
- npm/deepseek-tui/package.json version + deepseekBinaryVersion 0.8.3 → 0.8.4
- Cargo.lock regenerated from the new workspace version.

Phase 1d (doctor output), Phase 2 (onboarding/init/missing-companion),
and Phase 3 (tool errors / sandbox denials / approvals) deferred to v0.8.5.
The shipped Phase 1 surfaces (slash commands, debug telemetry, footer
chrome) cover the highest-traffic UI paths Chinese users see first.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(release): bump internal path-dep versions + repair doc link (#301)

CI on PR #300 (release feat/v0.8.4 → main) flagged two regressions
introduced by the 0.8.4 version bump:

1. Version drift — path-dependency `version = "0.8.3"` references
   inside the workspace crates (10 crates: agent, app-server, cli,
   config, core, execpolicy, hooks, mcp, tools, tui) did not move with
   the workspace `[workspace.package] version = "0.8.4"`. The CI guard
   `scripts/release/check-versions.sh` requires they match.

2. Broken intra-doc-link `[crate::localization::english]` in the
   CommandInfo doc comment — `english` is private. Replaced with a
   reference to the public `description_for` accessor and the public
   `tr()` function.

Verified with:
- scripts/release/check-versions.sh — Version state OK.
- RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps — green.
- cargo fmt + clippy + test all green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:02:38 -05:00
Hunter Bown e620e75f99 chore: release v0.8.3
Bumps workspace, all internal path-deps, and npm wrapper (version +
deepseekBinaryVersion) from 0.8.2 → 0.8.3. Lockfile re-locked offline.
CHANGELOG entry summarizing the 0.8.3 lane: skills path bug fix,
privacy contraction, helpful missing-companion error (#258), engine
decomposition (#227), bridge/persistence/palette test gap closures,
crates.io badge, and 10 issue closures.

Local v0.8.3 verified at /tmp/deepseek-0.8.3-test/ before publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:46:21 -05:00
Hunter Bown bf6d82e4ba chore: release v0.8.2 — Windows build fix, npm offline, model-visible skills, zh-CN README
Bumps workspace, all internal path-deps, and npm wrapper (version +
deepseekBinaryVersion) from 0.8.1 → 0.8.2. Lockfile re-locked offline
to keep the registry index untouched.

Triggers auto-tag.yml on push, which creates v0.8.2 and fires
release.yml to build cross-platform binaries and draft the GitHub
Release. npm publish remains manual per CLAUDE.md release runbook.

Note: npm registry already has 0.8.2 published (with binaryVersion
0.8.1 from an earlier checkpoint). That release keeps working unchanged
because v0.8.1 binaries stay on GitHub. Repo state aligns to 0.8.2 so
the version-drift gate passes; next npm publish (which will need to be
0.8.3 since 0.8.2 is taken) will pick up binaryVersion=0.8.2 and pull
the new binaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:41:47 -05:00
Hunter Bown 5770a5747b fix cargo install packaging for v0.8.1 2026-04-30 23:45:21 -05:00
Hunter Bown 3f24759966 release: stabilize shell handles for v0.8.0
Bumps the workspace/npm wrapper to 0.8.0 and fixes completed background shell jobs retaining live process handles, which could cause Too many open files, checkpoint save failures, shell spawn failures, and lag around send/close/Esc. Also includes Windows REPL bootstrap timeout hardening and Cargo/TUNA mirror install docs.
2026-04-30 21:34:00 -05:00
Hunter Bown 3e8da4b99b chore: bump version to 0.7.9
Includes:
- Post-turn freeze fix (reorder maybe_advance_cycle before TurnComplete)
- Enter/steering fix (QueueFollowUp when model is streaming)
- Esc fanout hardening (idempotent finalize methods)
- cargo fmt pass on new code
- CHANGELOG, README, and version bump across workspace + npm
2026-04-30 20:53:10 -05:00
Hunter Bown d25783fe5b fix(v0.7.8): reconcile swarm state and unicode search 2026-04-30 19:50:01 -05:00
Hunter Bown 820985671d chore: bump version to 0.7.8
- Cargo.toml workspace version: 0.7.7 → 0.7.8
- npm/deepseek-tui/package.json: 0.7.7 → 0.7.8
- deepseekBinaryVersion: 0.7.7 → 0.7.8
- CHANGELOG.md: add v0.7.8 section
2026-04-30 18:13:35 -05:00
Hunter Bown 4a1768001b docs: add v0.7.7 CHANGELOG entry 2026-04-30 10:43:40 -05:00
Hunter Bown 8ba8600155 release: v0.7.6
- Bump workspace version to 0.7.6 (Cargo.toml + all crate internal dep pins)
- Bump npm wrapper version and deepseekBinaryVersion to 0.7.6
- Add v0.7.6 changelog entry: localization, paste burst, history search,
  pending input preview, grouped /config editor, searchable help overlay,
  Alt+↑ edit-last-queued, composer attachment management
- Update README with v0.7.6 features (localization, paste, history search)
- Archive v0.7.5 implementation plan to docs/archive/
- Update Cargo.lock
2026-04-29 17:00:36 -05:00
Hunter Bown 64d1698bde Release 0.7.1 (#156) 2026-04-28 18:38:44 -05:00