Commit Graph

401 Commits

Author SHA1 Message Date
Hunter Bown e98efcf31d fix(engine): drop dead working set prompt marker 2026-05-04 22:08:07 -05:00
Hunter Bown 1a6589c55a perf(tools): anchor tool array with cache control 2026-05-04 22:06:58 -05:00
Hunter Bown b48b68f078 perf(engine): stabilize system prompt and move working set metadata 2026-05-04 22:06:55 -05:00
Hunter Bown a14227edf8 refactor(models): rename legacy DeepSeek context window 2026-05-04 22:06:16 -05:00
Hunter Bown a4dee56fcc fix(compaction): 500K hard floor plus V4 default 2026-05-04 22:06:07 -05:00
Hunter Bown cba5e829fc diag(tui): trace view_stack push/pop for post-mortem black-screen repro
Maintainer-reported (handoff): after spawning a sub-agent in YOLO, the
transcript renders solid black and scroll keys go dead, but footer +
sidebar still render fine. The shape (black + dead input together)
strongly suggests a `View` is on the stack that returns empty layout
AND intercepts key events at the top level. The fix wants a tighter
repro than we have today.

Add `tracing::debug!` to every push / push_boxed / pop on `ViewStack`
and to the implicit pops in `apply_action` (Close + EmitAndClose).
Each line carries the `ModalKind` and post-action depth, so a future
`RUST_LOG=deepseek_tui::view_stack=debug` capture will show exactly
which view stayed pushed when the symptom recurred.

No behavior change. The handoff explicitly suggested this as the
first-look diagnostic step; we ship the diagnostic now so the next
report comes with evidence.

Refs the unresolved sub-agent black-transcript symptom captured in
session-3 handoff. Will surface to a tracking issue once we have a
concrete repro from the maintainer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 20:17:43 -05:00
Hunter Bown f6e4f634d4 Merge pull request #665 from Hmbown/fix/v0.8.11-stop-compacting-so-much
fix(compaction): default off + raise unknown-model floor to 80% (#664, v0.8.11)
2026-05-04 20:05:57 -05:00
Hunter Bown 2640d8c091 Merge pull request #594 from Hmbown/fix/593-keyring-shadow
fix(auth): dual-write API key to keyring + config to stop stale-keyring shadow (#593)
2026-05-04 20:05:52 -05:00
Hunter Bown 68f6d6995d Merge pull request #592 from Hmbown/feat/584-compaction-telemetry
feat(compaction): debug telemetry on summary calls + document framing fork (#584)
2026-05-04 20:05:46 -05:00
Hunter Bown f1764704d8 Merge pull request #590 from Hmbown/fix/588-mirror-user-language
fix(prompts): mirror user's language in reasoning + reply (#588)
2026-05-04 20:05:40 -05:00
Hunter Bown d586ff05a8 Merge pull request #591 from Hmbown/fix/583-windows-bel-default-off
fix(notifications): default Windows Auto fallback to Off, not BEL (#583)
2026-05-04 20:05:34 -05:00
Hunter Bown fc4f1e6564 fix(compaction): default to off + raise unknown-model floor to 80% (#664)
Two coordinated changes that stop the engine from routinely rewriting the
prompt prefix and burning DeepSeek V4's prefix-cache discount:

1. `Settings::default().auto_compact` flips from `true` to `false`. The
   `auto_compact = on` opt-in and the explicit `/compact` slash command
   stay available for users / agents that decide their workload benefits
   from compaction more than from cache stability. With V4's 1M-token
   window the user has plenty of headroom to run long sessions without
   auto-trimming, and aggressive compaction has been the dominant
   cost-spike vector in long sessions (the rewritten prefix invalidates
   ~90% of the cache discount on every compaction event).

2. `DEFAULT_COMPACTION_TOKEN_THRESHOLD` raised from `50_000` to
   `102_400` (80% of `DEFAULT_CONTEXT_WINDOW_TOKENS = 128_000`). This is
   the last-resort threshold used when `context_window_for_model` returns
   `None` — i.e. an unrecognised model id. Pre-v0.8.11 the fallback
   compacted at ~5% of a V4 window when model detection silently fell
   through. Now the fallback inherits the same late-trigger discipline as
   the V4 path, so model-detection drift doesn't quietly burn cache.

Together: the two changes mean compaction never fires automatically by
default, and even when explicitly opted in (or when the runtime-thread /
capacity-flow paths invoke compaction with their own `enabled = true`
config), the threshold is anchored at 80% of the model's context window
(or 80% of the 128K default if the model is unknown), never below.

Tests
=====

- `default_settings_disable_auto_compact_to_protect_v4_prefix_cache` —
  pins the new default and explains the rationale inline.
- `auto_compact_remains_explicitly_configurable` — unchanged; still
  asserts the `set("auto_compact", "on" | "off")` round-trip works.
- `compaction_threshold_scales_with_context_window` — updated to assert
  `compaction_threshold_for_model("unknown-model") == 102_400`.
- `v4_soft_caps_only_apply_to_v4_models` — updated to assert the
  unknown-model + reasoning-effort path also lands on the new floor.

Verification
============

- `cargo fmt --all -- --check` clean.
- `cargo clippy -p deepseek-tui --bin deepseek-tui --all-features
  --locked -- -D warnings` clean.
- `cargo test -p deepseek-tui --bin deepseek-tui --locked` →
  2028 passed, 2 ignored.

Refs #664 (handoff-instead-of-compact pattern, full implementation
deferred). Behaviour-only change for v0.8.11; the larger
agent-aware-handoff mechanism is its own design surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:28:02 -05:00
Hunter Bown 03d72840e6 test(tui): pin Chinese / IME character input contract for the composer
Adds two regression tests to crates/tui/src/tui/paste.rs::tests that
nail down what is currently a working code path but was not previously
covered by name:

* `ime_chinese_chars_route_through_to_composer` — simulates the
  macOS/Windows IME commit pattern (one `KeyCode::Char(c)` event per
  Chinese codepoint with realistic ~50 ms gaps so the paste-burst
  heuristic doesn't false-positive). Asserts that "你好世界" lands in
  `app.input` verbatim and that `cursor_position` advances by one per
  codepoint, not per UTF-8 byte. The non-ASCII branch in
  `handle_paste_burst_key` (paste.rs:42) is the structural anchor;
  this test pins it so a future "filter to ASCII for the paste-burst
  detector" change would surface immediately.

* `bracketed_paste_preserves_chinese_and_mixed_text` — pastes a mix
  of CJK and Latin text ("你好世界 hello 世界 café") through the
  bracketed-paste path (`insert_paste_text` → `normalize_paste_text`
  → `insert_str`) and confirms every codepoint survives plus the
  cursor tracks codepoints, not bytes.

Why these tests, why now: a community report surfaced the question
"can users input Chinese characters" without specifying the exact
failure mode. Code review of the input data path turned up nothing
broken, and these tests confirm the data path is correct end-to-end
for both single-char IME commits and bulk bracketed paste. The tests
serve as evidence (the data path is provably fine) and as a guard
against future regressions to Chinese-input support.

The tests cost nothing at runtime and build under `cfg(test)` only.

If users are still seeing a Chinese-input failure after this lands,
the candidates worth investigating in priority order are: (1) display
layer — `wrap_input_lines` / `cursor_row_col` may be miscounting
double-width CJK cells; (2) terminal-specific delivery — certain
IMEs / terminals don't emit the events crossterm expects; (3) locale
at launch — `LC_ALL=C` in non-interactive shells breaks UTF-8 input
upstream of crossterm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:50:24 -05:00
Hunter Bown 071d23a4b7 fix(auth): dual-write API key to keyring + config so stale keyring stops shadowing onboarding (#593)
Reproduction (from the user who filed #593, also the reporter of #586):

1. At any prior point, the user runs `deepseek auth set --provider deepseek`,
   which writes to the OS keyring under the `deepseek` slot.
2. The key is later rotated, the prior install is replaced, or the user
   moves to a different account.
3. The user opens the TUI, gets the in-TUI onboarding screen, and pastes
   their fresh API key.
4. `submit_api_key` → `save_api_key` writes only to `~/.deepseek/config.toml`.
5. At request time, `Secrets::resolve` follows the documented
   `keyring → env → config-file` precedence, and the **stale** keyring
   entry shadows the fresh config.toml value.
6. API call goes out with the dead key, gets a 401, the TUI shows
   "no response" with no obvious diagnostic.

The fix
=======

`save_api_key` now writes to **both** layers when a keyring backend is
reachable:

* The config file remains the durable, inspectable record of the
  active key (works in npm installs, IDE terminals, headless CI —
  everywhere). v0.8.8 made this the canonical location for a reason.
* The OS keyring entry is rewritten on every onboarding submit so a
  stale credential from a prior install is overwritten in place.

`SavedCredential` gains a new `KeyringAndConfigFile { backend, path }`
variant; the existing `ConfigFile(PathBuf)` variant remains the
fallback when no keyring backend is reachable (or under `cfg(test)`,
so the unit suite never pollutes the host keyring). The onboarding
toast naturally reports the actual outcome via
`SavedCredential::describe`, which now reads
`OS keyring (system keyring) and ~/.deepseek/config.toml` for the
common case.

`save_api_key_for` (the multi-provider entry point) is updated to
extract the path from either variant, so non-DeepSeek providers
(OpenRouter / Novita / Fireworks / NIM / SGLang) continue writing
provider-table entries to config.toml only, with no behavior change.

`deepseek doctor` warning
=========================

`run_doctor` now compares the keyring's `deepseek` slot against the
config file's `api_key` slot. When both are present and differ, the
report surfaces the discrepancy with copy-paste remediation —
`deepseek auth set --provider deepseek` rewrites both layers in one
shot, and the in-TUI onboarding now does the same. The check skips
keyring probes for other providers because they don't write to the
keyring today; probing absent slots only triggers macOS Always-Allow
prompts for nothing.

Why dual-write rather than keyring-only
=======================================

A previous attempt (`4e360274`, never merged to main) swapped the
write path to keyring-only. That hides the key from anyone who
expected to see it under `~/.deepseek/config.toml` and breaks the
"deepseek-tui works in every folder, in npm installs, in IDE
terminals" promise of v0.8.8. Dual-write keeps the inspectable copy
and adds the layered override that defeats stale-shadow without
changing the visible mental model.

Tests
=====

* `saved_credential_describe_lists_both_targets_for_keyring_and_config`
  pins the toast text shape so the user sees both targets after
  onboarding.
* The existing `save_api_key_writes_config_file_under_cfg_test` and
  `test_save_api_key_doesnt_match_similar_keys` continue to pass —
  under `cfg(test)` the keyring path is gated out, so the
  config-only outcome remains the test-time contract.

Verification
============

* `cargo fmt --all -- --check` clean.
* `cargo clippy -p deepseek-tui --bin deepseek-tui --all-features
  --locked -- -D warnings` clean.
* `cargo test -p deepseek-tui --bin deepseek-tui --locked` →
  2029 passed, 2 ignored.

Closes #593.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:55:07 -05:00
Hunter Bown 4e86a0fb8e fix(prompts): expand language-mirroring carve-out + pin reasoning_content anchor
Two small follow-ups to #588's review:

* Gemini-code-assist suggested explicitly listing environment variables,
  command-line flags, and URLs alongside identifiers/tool-names in the
  carve-out clause, since those are exactly the categories an LLM is
  likeliest to "helpfully" translate (e.g. `--verbose` or `DEBUG=true`).
  Adopting verbatim — the additions are non-controversial and the failure
  mode they prevent is real.

* Copilot flagged that the structural test only checked for the `## Language`
  heading. A future edit could keep the heading but silently weaken the
  section to a generic "respond in the user's language" directive,
  dropping the cross-cutting #588 commitment that the model's
  `reasoning_content` field — not just the visible reply — follows the
  user's language. Add a second structural anchor: assert the section
  body mentions `reasoning_content`. This matches the existing rlm test's
  "anchor tokens, not prose" convention (the API field name is the
  feature contract, not a wording choice).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:41:30 -05:00
Hunter Bown a68c8dc974 docs(notifications): only completed turns notify; add Key Reference + WezTerm-on-Windows test
Post-merge review feedback on #583 surfaced four small accuracy gaps:

1. The narrative docs in `docs/CONFIGURATION.md` and the inline comment
   in `config.example.toml` said the notification fires "when a turn
   takes longer than a threshold" — but the call site in
   `tui/ui.rs:928` is gated on `TurnOutcomeStatus::Completed`. Failed
   and cancelled turns are silent on purpose. Spell that out so users
   don't expect alerts on long failures.

2. The `notify_done` rustdoc still summarised `Auto` as "Osc9 for known
   terminals, Bel otherwise" — internally inconsistent with the new
   Windows-aware fallback documented one screen earlier on the
   `Method::Auto` enum and on `resolve_method`. Update the public
   rustdoc to point at the canonical resolution table on
   `resolve_method` and call out the `Off`-on-Windows branch.

3. The `## Key Reference` list in `docs/CONFIGURATION.md` had no entries
   for `[notifications].method`, `[notifications].threshold_secs`, or
   `[notifications].include_summary`. Other features with a dedicated
   subsection (e.g. `[memory].enabled`) are listed there too, so readers
   scanning the canonical key list could not discover the notification
   knobs. Added the three keys with cross-references to the
   Notifications subsection.

4. The Windows-only test only covered the unknown-`TERM_PROGRAM` →
   `Off` fallback. The positive path (known OSC-9 terminal still
   resolves to `Osc9`) was only tested via `iTerm.app`, which is a
   macOS-only program — Windows CI would still pass if the `WezTerm`
   arm of the match disappeared. Added
   `auto_detect_picks_osc9_for_wezterm_on_windows` so the
   WezTerm-on-Windows compatibility guarantee is exercised on the
   Windows runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:38:21 -05:00
Hunter Bown dcadb5d388 fix(compaction): cache-hit % denominator + correct RUST_LOG filter docs
The post-merge review on #584 surfaced two real bugs in the new
summary-call telemetry:

1. The cache-hit percentage used `cache_hit + cache_miss` as the
   denominator. Providers that populate `prompt_cache_hit_tokens` but
   leave `prompt_cache_miss_tokens` as `None` (the rest of the codebase
   already infers misses from `input_tokens` for cost reporting and
   `/cache`) were silently reported as a flat 100% hit rate, masking the
   actual ratio. Switch the denominator to `usage.input_tokens` so the
   ratio matches how the rest of the project reasons about cache spread.
   Extract the calc into a small `summary_cache_hit_percent` helper so
   the invariant is unit-testable.

2. The doc comment on the emit site advertised that
   `RUST_LOG=deepseek_tui::compaction=debug` would also work as a
   filter. It does not — `EnvFilter` matches the explicit target string
   when one is set, so only `RUST_LOG=compaction=debug` activates the
   event. Drop the misleading parenthetical and call out the filter
   semantics explicitly.

The new unit test pins the partial-telemetry guard so a future regress
to `(hit + miss)` denominator would be caught immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:35:05 -05:00
Hunter Bown 7712a37272 feat(compaction): debug telemetry on summary calls + document framing fork
Two follow-ups from the post-#572 cache-aligned compaction review (#584):

1. `should_use_cache_aligned_summary` now carries a doc comment that
   explains why the cache-aligned and fallback summary requests are
   framed differently. Cache-aligned replays the conversation as the
   model's own history under `system: None`; fallback reformats it
   into a `User:/Assistant:` transcript under a "concise summaries"
   system prompt. The fallback's external-transcript framing is more
   conservative for the older / smaller models the cache-aligned path
   explicitly excludes, so dropping the system prompt risks regressing
   those models without a corresponding gain. Unifying the two paths
   is a research question that wants an A/B summary-quality eval, not
   a drive-by cleanup — flagged here for a future PR rather than
   resolved silently.

2. `create_summary` now emits one `tracing::debug!` event per summary
   call carrying which path was chosen, the prompt-token count, and
   the cache-hit / miss split. Filter with `RUST_LOG=compaction=debug`
   (or the full module path
   `RUST_LOG=deepseek_tui::compaction=debug`). This makes the V4
   prefix-cache win from #572 observable post-deploy without adding
   UI surface — the compaction summary call is the request we most
   expect to benefit, and previously we had no per-call signal for it.

No UI surface changes. No model-facing prompt changes. Only adds the
path-choice variable and the debug log; existing compaction tests
(56 across `compaction::*` and `models::*`) still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:54:59 -05:00
Hunter Bown 3636908bb9 fix(notifications): default Windows Auto fallback to Off, not BEL
On Windows, the audio stack maps BEL (`\x07`) to the
`SystemAsterisk` / `MB_OK` chime — the same sound applications use
for error popups. So with the previous `Method::Auto` fallback to
`Bel`, every successful turn-completion notification ended up
sounding identical to a software error.

Reported by a community user who described it as "the popup-error
sound from a CAD program I used to use" (#583).

resolve_method() now returns `Off` instead of `Bel` on Windows for
unknown TERM_PROGRAM values. Known OSC-9-capable terminals
(`iTerm.app`, `Ghostty`, `WezTerm`) still resolve to `Osc9` on
every platform, so users running WezTerm on Windows keep getting
real notifications. macOS and Linux behaviour is unchanged.

Windows users who actively want an audible cue can opt back in by
setting `[notifications].method = "bel"` in `~/.deepseek/config.toml`.

Also:
- Documents `[notifications]` in `docs/CONFIGURATION.md` with an
  explicit Windows note (the schema was previously undocumented).
- Updates the inline comment in `config.example.toml` so users
  reading the seed config see the platform-specific behaviour.
- Splits the existing `auto_detect_picks_bel_for_unknown` test
  into a Unix variant (`#[cfg(not(target_os = "windows"))]`) and
  adds a new Windows-gated test that asserts the `Off` fallback,
  so CI's Windows runner exercises the platform-specific path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:49:03 -05:00
Hunter Bown a239968f5b fix(prompts): mirror user's language in reasoning + reply
DeepSeek V4's `reasoning_content` channel inherits the system
prompt's English bias even when users write in Chinese, so the
visible thinking trace stays in English alongside (sometimes
mixed-language) replies.

Adds a `## Language` section near the top of `base.md` directing
the model to mirror the user's language in *both*
`reasoning_content` and the final reply, with a carve-out so
identifiers, file paths, tool names, and log lines stay in their
original form (translating `read_file` to `读取文件` would break
tool calls). Default remains English when no clear signal is
present, so existing behaviour is preserved.

Includes a structural test in `crates/tui/src/prompts.rs` that
asserts the section ships in every mode (Agent / Yolo / Plan).
Wording is intentionally not asserted on, per the existing test
module's "don't fail on prose" comment.

Reported via the project Telegram community (#588).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:10:23 -05:00
Hunter Bown 6ba6add03d fix(release): switch TUI reqwest from native-tls to rustls
The aarch64-unknown-linux-gnu release build for `deepseek-tui` failed in
release.yml run 25327475634 with:

    openssl-sys v0.9.111: 'openssl/opensslconf.h' file not found

`crates/tui/src/main.rs` was the only crate in the workspace pulling
`reqwest` with `default-features = false, features = ["native-tls", ...]`
— every other crate (including the dispatcher in `crates/cli`) already
inherits the workspace default `["json", "rustls"]`. The aarch64 leg
builds with `cargo zigbuild --target aarch64-unknown-linux-gnu.2.28`,
whose zig sysroot does not ship openssl headers; the matching native-tls
job for v0.8.9 succeeded by chance against an earlier runner image but
the current `ubuntu-24.04-arm` image no longer satisfies openssl-sys's
header probe under zigbuild.

Switching the TUI's reqwest features from `native-tls` to `rustls` brings
it in line with the rest of the workspace and removes nine crates from
the build graph entirely (`openssl`, `openssl-sys`, `openssl-probe`,
`openssl-macros`, `native-tls`, `hyper-tls`, `tokio-native-tls`,
`foreign-types`, `foreign-types-shared`). reqwest 0.13.1 already uses
`rustls-platform-verifier` for OS trust-store integration, so end-user
TLS behavior against api.deepseek.com remains equivalent.

Verified locally:
- cargo clippy --workspace --all-targets --all-features --locked passes
- cargo build --release -p deepseek-tui --locked succeeds
- cargo fmt --all -- --check is clean
- no source code in `crates/` references native-tls / openssl directly

This is a release-pipeline-only fix; no user-visible feature changes.
2026-05-04 11:00:54 -05:00
Hunter Bown a92c449de5 chore(release): bump version to 0.8.10 + CHANGELOG
Picks up the v0.8.10 patch release contents:
* Daemon API quartet for whalescale-desktop integration (#561-#564,
  PR #567).
* Bug cluster: macOS seatbelt cargo registry (#558), MCP SIGTERM
  shutdown (#420), Linux PR_SET_PDEATHSIG (#421).
* npm install on older glibc fix (#555/#560 via #556 + #565).
* Shell cwd workspace-boundary validation (#524).
* Memory help/docs polish (#497 via #569).
* Onboarding language picker (#566).
* Whale nicknames interleaved with Simplified Chinese.

First-time contributors credited in CHANGELOG: @staryxchen,
@shentoumengxin, @Vishnu1837, @20bytes.

Workspace `Cargo.toml`, all 9 internal path-dep version pins, and
`npm/deepseek-tui/package.json` all bumped to 0.8.10. `Cargo.lock`
regenerated and committed alongside.

Verified locally:
* cargo fmt --all -- --check
* cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
* cargo test --workspace --all-features --locked
* bash scripts/release/check-versions.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:13:26 -05:00
Wu Yuxin 6bcf07a479 Update crates/tui/src/tui/markdown_render.rs
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-04 10:01:52 -05:00
Wu Yuxin 08a3a8f5f5 Update crates/tui/src/tui/markdown_render.rs
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-04 09:59:24 -05:00
wuyuxin c8fe367e3d fix(markdown): render tables, bold/italic, and horizontal rules
- Add Block::TableRow and Block::HorizontalRule variants
- Parse | table | rows |, drop separator rows (|---|)
- Parse --- / *** / ___ as horizontal rules
- Rewrite inline span parser to handle **bold** and *italic*
  spanning multiple words, with infinite-loop guard for unclosed markers
- Render table cells with │ separators and equal-width columns
- Apply inline formatting inside table cells

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 09:59:24 -05:00
Hunter Bown 754e8bd468 fix(v0.8.10): cache-aware compaction and onboarding paste 2026-05-04 09:58:05 -05:00
Hunter Bown 874e8b4b78 feat(prompts,tui): cache awareness in agent prompt + slash prefix Enter (#573)
Two related polish items wrapped together because both touch how the
user perceives the model's context behavior.

### Cache awareness in the agent prompt

The system prompt's Context Management section already lives inside
the volatile-content-last invariant — but the model never knew *why*
the prompt is shaped that way, or that it has any agency over keeping
the cache hit rate up.

Added a `### Prompt-cache awareness` subsection (Agent / Yolo modes)
with five concrete dos-and-don'ts:
- Append, don't reorder.
- Don't paraphrase quoted content (refer back by path).
- Use `/compact` as a hard reset, not a tweak.
- Read once, refer back instead of re-reading.
- Watch the `cache hit %` chip — red < 40%, yellow < 80%.

The chip itself already exists in the default footer status set
(`StatusItem::Cache`); the prompt addition closes the loop so the model
treats it as a real signal instead of a passive readout.

### #573 — typing `/mo` + Enter activates the first matching command

Previously a partial slash command + Enter sent the literal `/mo` as a
turn. The popup was already showing `/model` highlighted, so the user
expectation (and the OPENCODE behavior the issue cites) is that Enter
runs the highlight. The fix routes Enter through
`apply_slash_menu_selection` first when the popup is open and the input
starts with `/`. If the popup is empty (no matches) the legacy submit
path still fires — Enter on a non-slash line is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 04:22:38 -05:00
Hunter Bown 4fe3bc37bc feat(tui): file @-mention frecency ranking (#441)
When the user @-mentions a file, score it; on the next mention popup,
re-sort completions so files mentioned often + recently float to the top.
Never-mentioned candidates fall back to the workspace ranker's order
without surprises.

* New `tui/file_frecency.rs` module:
  - `FrecencyRecord { path, count, last_used }`, persisted as a JSONL
    append at `~/.deepseek/file-frecency.jsonl`.
  - `record_mention(path)` bumps the count, stamps the time, appends a
    line, and evicts to a 1000-entry cap (matches the issue's acceptance
    criterion). Eviction drops the lowest-scored entries.
  - `rerank_by_frecency(candidates)` decays each record's score by
    `count * exp(-ln(2) * age / HALF_LIFE)` (7-day half-life — same as
    the OPENCODE source) and stable-sorts the candidate list.
* Wired into `find_file_mention_completions` so the menu shows
  re-ranked entries automatically.
* Wired into both confirmation paths: `apply_mention_menu_selection`
  (Enter / Tab on the popup) and `try_autocomplete_file_mention`'s
  unique-match shortcut.

I/O is best-effort: a missing home directory, a permission failure,
or a corrupt JSONL line gets silently skipped — frecency loss is never
worth blocking the user's autocomplete.

Two unit tests cover the core: rerank floats a hot path above
never-mentioned ones (and preserves the original order for ties), and
score decay drops a stale-but-popular entry below a fresh one after
~8 half-lives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 03:06:04 -05:00
Hunter Bown 59e1dd4e99 feat(tui): stacked toast overlay above footer (#439)
The status-toast bus already typed Info/Success/Warning/Error with
configurable per-toast TTL, a 24-bounded queue, and a sync adapter that
migrates legacy `app.status_message` writes — what was missing was
visibility when several events arrive in quick succession. The footer
showed only the most recent and the rest expired silently.

* New `App::active_status_toasts(limit)` returns up to `limit` currently
  active toasts (sticky pinned first, then queued newest-last so a stack
  reads chronologically). Drains expired toasts off the front as a side
  effect — same cleanup as the single-toast path.
* New `render_toast_stack_overlay` renders up to 2 *additional* toasts
  as a 1-2 line strip directly above the footer when the queue has 2+
  entries. Doesn't touch the layout chunk constraints — it's an
  absolute-position overlay, so the chat area never reflows when toasts
  arrive or expire. Older entries render dimmed in the level color so
  the freshest still draws the eye in the footer line itself.
* `TOAST_STACK_MAX_VISIBLE = 3` (footer line + up to 2 overlay rows).
  Anything beyond that ages out silently as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 02:58:02 -05:00
Hunter Bown af9e651017 feat(hooks): shell_env hook for per-shell-tool env injection (#456)
New `HookEvent::ShellEnv` fires immediately before each `exec_shell`
invocation. The hook's stdout is parsed as `KEY=VALUE\n` lines and the
resolved env vars are merged on top of the spawned process environment.
Useful for ephemeral credentials (`aws-vault export …`), per-skill
PATH adjustments, short-lived tokens.

* `HookExecutor::collect_shell_env(&context)` runs every matching
  `shell_env` hook synchronously, captures stdout, parses it, returns
  the merged map. Later hooks override earlier ones.
* `parse_env_lines` tolerates `export KEY=VAL`, quoted values
  (`"…"` / `'…'`), comments (`#`), blank lines. Lines without `=` are
  silently dropped — easier than failing the whole hook for one stray
  human-friendly line. Values are taken verbatim; we don't run the
  string through a shell to avoid expansion surprises.
* Resolved KEY names (NEVER values) are written to
  `~/.deepseek/audit.log` so a session can be reconciled later
  without leaking the secret material.
* Hook failure / timeout contributes no vars — `exec_shell` is never
  aborted because of a misbehaving env hook.

Plumbing:
* `RuntimeToolServices` gains an optional
  `Arc<HookExecutor>`. Wired in `tui/ui.rs` from the App's existing
  `app.hooks` clone. Test contexts default to `None`.
* `ShellManager::execute_with_options_env` and
  `execute_interactive_with_policy_env` are new variants that accept
  an `extra_env: HashMap<String, String>` and forward it via
  `CommandSpec::with_env` so `prepare()` carries it into `ExecEnv.env`.
* The original `execute_with_options` / `execute_interactive_with_policy`
  call the new variants with an empty map so existing callers
  (including all 5 internal call sites) keep working unchanged.
* `commands/hooks.rs` `event_label` covers the new variant.

Tests cover `parse_env_lines` against realistic hook output (bare
assignments, `export` prefix, quoted values, comments, blanks, malformed
lines). `cargo clippy --workspace --all-targets --all-features --locked --
-D warnings` clean.

`config.example.toml` documents the new event with an `aws-vault`
example and the audit-logging contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 02:52:20 -05:00
Hunter Bown e92403de7a fix(v0.8.10): bug cluster (#558 #420 #421) (#570)
* fix(sandbox): allow ~/.cargo/registry under macOS seatbelt (#558)

Sandboxed shell sessions on macOS were rejecting reads/writes to
~/.cargo/registry/{cache,index,src} and ~/.cargo/git, making
`cargo build`/`cargo publish` unrunnable from inside the TUI's shell tool
(hit while shipping v0.8.9).

* Resolve cargo home via `CARGO_HOME` env (cargo's own override) with a
  `$HOME/.cargo` fallback. New helper `resolve_cargo_home()` is shared by
  the policy generator and the param table to keep them in lockstep —
  emit one without the other and `sandbox-exec` refuses to load the
  profile.
* Always allow read access on `(param "CARGO_HOME")`. Grant write access
  to the `registry/` and `git/` subpaths whenever the policy isn't
  read-only — those directories must be mutable for `cargo` to populate
  them on a cache miss.
* Skip the cargo block entirely when neither `CARGO_HOME` nor `HOME` is
  set so we never reference an undefined `(param ...)`. (Practically
  only fires in stripped CI containers.)

Two tests cover the policy/param sync — one with HOME set, one with
both vars cleared — using a module-local `ENV_LOCK` mutex to serialize
env mutation, mirroring the pattern landed in `main.rs` at d06eaed0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mcp): graceful SIGTERM shutdown for stdio servers (#420)

Stdio MCP child processes were getting SIGKILL'd via tokio's
`kill_on_drop(true)` on TUI exit. The contract calls for SIGTERM so
well-behaved servers can flush pending state before dying.

Changes:
* New `async fn shutdown(&mut self)` on `McpTransport` (default no-op).
  `StdioTransport` overrides it to send SIGTERM via `libc::kill` and
  await child exit up to a 2-second grace window before letting drop
  fire SIGKILL as the backstop. Graceful path on Unix; on Windows the
  `kill_on_drop` (TerminateProcess) path remains unchanged because
  there's no SIGTERM-equivalent.
* New `Drop` on `StdioTransport` sends SIGTERM as a fallback for code
  paths that didn't call `shutdown` explicitly. Drop is sync, so the
  signal arrives microseconds before tokio's own Child drop fires
  SIGKILL, but it still gives MCP servers that handle SIGTERM idempotently
  a chance to start cleanup.
* New `McpPool::shutdown_all` walks every connection, calls the async
  shutdown, and clears the pool.
* The agent engine's run loop calls `shutdown_all` on `Op::Shutdown`
  before the pool drops so graceful exit is the default path. Best-effort
  — if the pool isn't initialized or the lock is contended, the Drop
  fallback still sends SIGTERM.

Test: `stdio_transport_shutdown_terminates_child` spawns a real `cat`
child, calls `shutdown`, asserts the call returns within the grace
window, and confirms the pid is reaped (`kill(pid, 0)` returns ESRCH).
Unix-only — Windows already exercised by the kill_on_drop path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(shell): set PR_SET_PDEATHSIG on Linux to reap orphaned children (#421)

Shell-spawned children survive the TUI on abnormal exit (panic without
unwind, SIGKILL of the parent, OOM). The existing cooperative cancel
path SIGKILLs the whole process group via the cancellation token, but
that only fires when the parent gets to run its drop / cleanup code.
A crashed parent leaves children orphaned to init.

* New `install_parent_death_signal` helper called on every shell
  Command setup. On Linux it adds a `pre_exec` hook that runs
  `prctl(PR_SET_PDEATHSIG, SIGTERM)` immediately after fork — the
  kernel then sends SIGTERM to the child the moment our process exits,
  even on SIGKILL of the TUI itself.
* All three Command spawn sites in `tools/shell.rs` (one-shot, wait,
  interactive) get the same hook.
* Documented the macOS / Windows gap: those platforms have no kernel
  equivalent. The cooperative path still handles normal shutdown;
  abnormal exit there is tracked as a watchdog follow-up per the
  issue's acceptance criteria.

The pre_exec body is `unsafe`-marked because it runs in the post-fork
async-signal-safe window. The closure only calls `libc::prctl` with
stack-allocated constants; no heap, no locks. Errno is surfaced via
`std::io::Error::last_os_error` but the spawn is not aborted — losing
the safety net is strictly less bad than failing the user's command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(subagent): interleave Chinese whale names with English in nickname pool

Sub-agent UI labels rotate through `WHALE_NICKNAMES`. The list was
English-only — every spawn produced "Blue", "Humpback", etc. Adding
Simplified-Chinese names (蓝鲸, 座头鲸, 抹香鲸, …) interleaved with the
English ones doubles the pool size and gives a roughly even mix on
each new spawn, with the same wraparound behavior at index >= 48.

Goal is friendly variety, not strict locale matching — a CN-locale user
still gets some English names and vice versa. Pure cosmetic; no
behavioral or persistence-format change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: cargo fmt for seatbelt cargo home block

* memory: polish help and docs (#569)

- add /memory help and clearer invalid-subcommand guidance
- register /memory in shared slash-command help
- align memory docs with current behavior and config
- add focused tests for help and discovery

* feat(onboarding): language picker step before API key (#566)

First-run users hit Welcome → API key → Trust → Tips with no obvious way
to discover that a Chinese / Japanese / Portuguese UI exists.
Issue #566 surfaced this from a Chinese user. The TUI already has full
translations for `en`, `ja`, `zh-Hans`, `pt-BR` (plus `auto` detection
from `LC_ALL` / `LANG`); the only gap was discoverability.

* New `OnboardingState::Language` variant inserted between Welcome and
  ApiKey. `Welcome → Language → ApiKey/Trust/Tips` is the new flow;
  `Esc` from Language returns to Welcome.
* New `tui/onboarding/language.rs` panel renders the picker with hotkeys
  1-5 for `auto` / `en` / `ja` / `zh-Hans` / `pt-BR`. Each row shows the
  native name (日本語, 简体中文, …) plus an English label so the user
  doesn't have to read the target language to pick it. The currently
  persisted setting is highlighted with a filled bullet.
* Selecting a hotkey calls the new `App::set_locale_from_onboarding`
  which writes through `Settings::set("locale", …)` + `Settings::save`
  and re-resolves `app.ui_locale` immediately so the rest of onboarding
  renders in the chosen language. Pressing Enter keeps the current
  setting (defaults to `auto`).
* `onboarding_step` now reports `1/N` … `N/N` correctly with the new
  step inserted (Welcome=1, Language=2, ApiKey=3 if needed, …).
* Doesn't expand the supported-locale set — the QA-pending list in
  `localization::PLANNED_QA_LOCALES` is unchanged. We only show what
  ships with full coverage today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: 20bytes <133551439+20bytes@users.noreply.github.com>
2026-05-04 02:37:29 -05:00
20bytes 8aed1bb674 memory: polish help and docs (#569)
- add /memory help and clearer invalid-subcommand guidance
- register /memory in shared slash-command help
- align memory docs with current behavior and config
- add focused tests for help and discovery
2026-05-04 02:25:13 -05:00
Hunter Bown 0047b3225b feat(runtime-api): daemon API quartet for whalescale (#561 #562 #563 #564) (#567)
Bridge work to unblock whalescale-desktop's Settings/Composer/Archived-chats
flows without requiring a daemon recompile per dev-port or client-side
aggregation.

#561 / whalescale#255 — CORS allow-list configurable
* Add `[runtime_api] cors_origins` config field, `--cors-origin URL`
  (repeatable) flag on `deepseek serve --http`, and `DEEPSEEK_CORS_ORIGINS`
  env var. User entries stack on top of the built-in defaults
  (localhost:3000, localhost:1420, tauri://localhost). Resolution preserves
  first-seen order and drops empty/duplicate values; invalid HeaderValues
  log a warning and are skipped.
* Refactor `cors_layer()` to read merged origins from `RuntimeApiState`.

#562 / whalescale#256 — `PATCH /v1/threads/{id}` accepts the full editable
field set
* Extend `UpdateThreadRequest` with `allow_shell`, `trust_mode`,
  `auto_approve`, `model`, `mode`, `title`, `system_prompt`. Each is
  optional; missing means no change. Empty-string clears `title`/
  `system_prompt`. Empty `model`/`mode` rejected with 400.
* Add `title: Option<String>` to `ThreadRecord` (additive, no schema bump
  per documented criteria — old readers ignore the field without
  misinterpretation). `list_threads_summary` now returns the user-set title
  when present, falling back to the derived input-summary title.
* `thread.updated` event payload now carries a `changes` map with only the
  fields that actually changed.

#563 / whalescale#260 — list-archived-only filter
* New `archived_only=true` query param on `GET /v1/threads` and
  `GET /v1/threads/summary`. Backed by a new `ThreadListFilter` enum
  (`ActiveOnly` | `IncludeArchived` | `ArchivedOnly`). `archived_only`
  takes precedence over `include_archived`. Default behavior unchanged.

#564 / whalescale#261 — `GET /v1/usage` aggregation
* New `RuntimeThreadManager::aggregate_usage` walks all threads/turns,
  filters by inclusive `since`/`until` RFC 3339 bounds, accumulates token
  totals + cost (via `pricing::calculate_turn_cost_from_usage`), and
  groups by `day` (default), `model`, `provider`, or `thread`.
* New `GET /v1/usage` route. `since`/`until`/`group_by` query params,
  `since > until` and unknown `group_by` rejected with 400. Empty time
  ranges yield empty `buckets` (never 404).

5 new tests cover preflight Allow-Origin echoing for both default and
extra origins, the extended PATCH field set + clear-by-empty + 400 paths,
the archived_only filter on list + summary endpoints, and the
/v1/usage envelope + validation errors. Existing 13 runtime_api tests
continue to pass; the parity gates and full workspace test suite are clean.

`docs/RUNTIME_API.md` and `config.example.toml` updated to document the
new params, body shape, endpoint, and CORS knob.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 02:18:19 -05:00
Zhang Zihan 3e56f3526e fix(shell): validate cwd parameter against workspace boundary (#524)
The shell tool's `cwd` / `working_dir` parameter was accepted raw
without any workspace boundary check, unlike file tools which all go
through `ToolContext::resolve_path()`.  This allowed the AI model to
execute shell commands from arbitrary directories outside the workspace.

Reuse the existing `resolve_path()` validation so that:
- Paths outside the workspace root are rejected with `PathEscape`
- `trust_mode = true` still bypasses the check (consistent behavior)
- `trusted_external_paths` entries are respected automatically
- Default behavior (no cwd argument) remains unchanged

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-04 02:18:16 -05:00
Hunter Bown d06eaed008 fix(tests): serialize env-mutating tests with module mutex
`resolve_api_key_source_reports_env_when_set` and
`resolve_api_key_source_prefers_config_over_env` both mutate
DEEPSEEK_API_KEY in process-global env. With cargo test's default
parallelism they race — one test reads while the other's set is still
active — causing intermittent CI failures on Linux (passes locally).

Fix: module-level `static ENV_LOCK: Mutex<()>`, both tests acquire
before touching env. `unwrap_or_else(|p| p.into_inner())` recovers
from poisoning so a panic in one test doesn't cascade.

Closes the CI failure introduced in the v0.8.9 cut (4511ea76); does
not affect runtime behavior — `Config::default()` is still empty and
`resolve_api_key_source` semantics are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 01:16:44 -05:00
Hunter Bown 4511ea763f chore(release): bump version to 0.8.9 + cargo fmt 2026-05-04 00:56:51 -05:00
Hunter Bown 6ff4db5ba0 feat(v0.8.9): address all issues labeled v0.8.9
#551 — sidebar filters prior-session agents (from_prior_session)
#552 — status messages prioritise ↑ affordance over /queue
#553 — oversized paste consolidation to @mention file (+uuid suffix)
#523 — release.yml: add if: guard so release job doesn't skip on dispatch
#526 — verify cost_status side-channel is fully wired (already in place)
#554 — mouse/trackpad scroll now sets user_scrolled_during_stream
#522 — set RELEASE_TAG_PAT secret for auto-tag → release trigger
#504 — session-context panel (SidebarFocus::Context, config toggle, default off)
#501 — multi-arch Dockerfile (+BUILDPLATFORM pin) + devcontainer + release CI
#484 — docs/RUNTIME_API.md rewritten against actual runtime_api.rs endpoints
#482 — close v0.8.8 planning tracker

Fixes from review:
- RUNTIME_API.md: corrected endpoints (/v1/...), port (7878), doctor JSON schema (flat)
- Dockerfile: added --platform=$BUILDPLATFORM for native multi-arch builds
- docs/DOCKER.md: removed Docker Hub references (GHCR only)
- sidebar.rs: dropped unused _theme variable
- settings.rs: context_panel default changed to false
- app.rs: paste filename now includes 8-char uuid suffix to avoid collision
2026-05-04 00:33:08 -05:00
Hunter Bown fc1970fa55 fix(auth): use config-backed setup without credential prompts 2026-05-03 23:02:11 -05:00
Hunter Bown 449312cf2b fix(sidebar): collapse empty Todos/Tasks/Agents panels in Auto layout
Auto-mode reserved 25% of the sidebar height for each of Plan / Todos
/ Tasks / Agents regardless of content, so on a typical 32-row sidebar
each slot was ~8 rows. With Todos/Tasks/Agents empty (the common case
when a goal is set but no checklist exists), Plan ended up with ~5
content rows of its 8-row slot consumed by header + token bar +
separator, and steps got silently clipped — the user-reported
"sidebar broken / Plan disappearing".

Build the constraint list dynamically: include a slot only for panels
that actually have content. Plan always renders (it owns the
session-wide empty hint). Todos/Tasks/Agents collapse to zero rows
when empty, letting the visible panels share the full height.
2026-05-03 13:53:37 -05:00
Hunter Bown cef095f105 fix(tui): disable bracketed paste + mouse capture in panic hook
The panic hook only popped kitty keyboard flags, disabled raw mode,
and left the alt-screen. Bracketed paste (`\e[?2004h`) and SGR mouse
capture (`\e[?1006h`) stayed on, so any panic would leave the user's
parent shell stuck wrapping pastes in `\e[200~…\e[201~` and printing
`\e[<…M` mouse events. Mirror the clean-shutdown teardown so the
shell is fully restored even when the TUI crashes.
2026-05-03 13:50:36 -05:00
Hunter Bown 68102e600c fix(paste): stop modals swallowing Cmd-V when they don't override handle_paste
`ViewStack::handle_paste` interpreted `ViewAction::None` (the trait
default) as "the modal consumed the paste," so any modal that didn't
override `handle_paste` — command palette, model picker, approval
dialog, pager, etc. — silently dropped every paste while it was on
top. The call site at `tui/ui.rs::Event::Paste` then took the
"consumed" branch and skipped the composer insert.

Switch the trait method to return `bool` (default `false` =
not consumed). `ProviderPickerView::handle_paste` now returns `true`
only when it actually appended to its key-entry buffer. Pin the
default-behavior contract with a regression test.
2026-05-03 13:50:13 -05:00
Hunter Bown 4c7be1f90b fix(render): disable OSC 8 default + strip ANSI from tool output
ratatui's buffer drops the bare ESC byte but happily paints every
other byte of an escape (`[`, `0`, `;`, `m`, OSC payloads, etc.) into
a buffer cell. That drifts columns by the escape-body length and
produces user-reported corruption like `526sOPEN` instead of
`526   OPEN` when shell tools (`gh`, `git` with color forced on, PTY
runs) emit ANSI in stdout.

Two changes:

- Default OSC 8 emission off on every platform until it can be emitted
  out-of-band of the ratatui buffer pipeline. macOS users with a
  conformant terminal can still opt in via `[ui] osc8_links = true`.
- Add `osc8::strip_ansi_into` (handles CSI, OSC, DCS/SOS/PM/APC, and
  standalone two-byte ESC) and apply it in `output_rows` so shell
  tool output is sanitized before it enters the transcript. Raw bytes
  remain available to spillover and the model.

Tests cover SGR stripping, OSC 8 wrappers, control-byte handling, and
preservation of `\n` / `\r` / `\t`.
2026-05-03 13:48:07 -05:00
Hunter Bown 1d315ec3d6 fix(cost): accrue review tool LLM usage 2026-05-03 13:32:14 -05:00
Hunter Bown db2f761120 fix(goal): inject session goal into system prompt
Thread the /goal objective from the TUI into engine prompt assembly so follow-up turns can see the current session objective. Add prompt and engine regression tests that pin the session_goal block and verify empty goals are skipped.
2026-05-03 13:26:00 -05:00
Hunter Bown 12de76b7b5 fix(cost): accrue background-LLM cost via cost_status side-channel (#526)
Same root cause as the RLM gap fixed in the previous commit
(child-token usage falling through the cracks), but for engine-
internal background calls — compaction summaries, seam recompaction,
and cycle briefings. They use `flash_client.create_message` directly
to avoid bloating the engine event channel and never feed
`response.usage` into `App::accrue_session_cost`. A long session
that fired auto-compaction or cycle-restart under-reported cost by
however many tokens those calls consumed.

5 leak sites fixed in this commit:

- `compaction.rs:894` (auto-compaction summary)
- `seam_manager.rs:330,425,518` (3 seam recompaction paths)
- `cycle_manager.rs:384` (cycle briefing turn)

Why a side-channel and not a plumbed callback: the leaky callers
are engine-internal helpers without a direct handle to `App` or
the engine's event channel. A side-channel (`cost_status::report` /
`drain`, mirroring `retry_status`) keeps the change surface tiny —
one new `report` line per call site — and any future background
caller (summarizers, retrieval helpers) gets accrued for free.

Mechanism:

- New `cost_status` module: `OnceLock<Mutex<f64>>` backed pool;
  `report(model, &usage)` adds via `pricing::calculate_turn_cost_from_usage`,
  `drain()` reads-and-zeros.
- TUI render loop drains once per tick (in the same idle-tick spot
  as `tick_quit_armed`) and folds the result into
  `App::accrue_subagent_cost` so the high-water mark stays monotonic.
- Three unit tests pin the contract: report accumulates, drain
  zeros, unknown models are no-ops.

CLI one-shot leakers (`run_review`, `run_one_shot`,
`run_one_shot_json`, doctor health probe) intentionally NOT
patched — they don't run inside an interactive session, so they
don't affect the dashboard. They could be added later for parity
with `deepseek doctor --json` cost-reporting, but that's separate.

Combined with the prior `tool_routing::accrue_child_token_cost_if_any`
fix for `rlm`, this closes every TUI-internal cost-tracking gap I
could find. The dashboard should now match DeepSeek website billing
within the usual rounding (cache-hit vs miss heuristics aside).

Verified
========
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings`
- `cargo test --workspace --all-features --locked`
- 3 new tests for the cost_status module pass.
2026-05-03 13:03:27 -05:00
Hunter Bown 6589ff44aa fix(v0.8.8 hotfix): worked-chip + RLM cost accrual + Windows OSC8 default
Three foreground-visible v0.8.8 regressions surfaced after the
GitHub Release went up. v0.8.8 was taken back down (release
deleted, tag deleted) so this lands cleanly on a re-tag.

1. Worked-chip claimed model work that never happened
=====================================================

`footer_worked_chip` read `App::session_started_at.elapsed()`, so a
TUI that had been open and idle for 4 minutes rendered "worked 4m"
even though no turn had ever fired. The label literally says
"worked" — it should track real model work, not idle uptime.

Fix:
- Add `App::cumulative_turn_duration: Duration`, init to zero.
- Increment on `EngineEvent::TurnComplete` from the just-finished
  turn's elapsed time (the same value already captured for the
  desktop-notification path).
- Drop the now-unused `session_started_at` field.
- `FooterProps::from_app` reads `cumulative_turn_duration`. The
  60s threshold inside `footer_worked_chip` stays — it now means
  "60s of real model work," not "60s since launch."

New regression test pins the invariant: idle app with zero
cumulative turn time → empty chip; 90s of real work → "worked 1m 30s."

2. RLM child-token cost wasn't reaching `session_cost`
=======================================================

A user reported the dashboard showing $0.15 spent for a session
that the DeepSeek website billed at $3+. Sub-agent token usage
already feeds the parent's cost via `MailboxMessage::TokenUsage`
(#166), but the `rlm` tool spawns its own DeepSeek calls under
`child_model` and reports them only in display metadata
(`input_tokens` / `output_tokens`) that nothing consumes for
billing. A session that uses RLM heavily under-reports cost
linearly with the child token count.

Fix: define a contract — tools that spawn their own LLM calls
populate `metadata.child_input_tokens` / `child_output_tokens` /
`child_prompt_cache_hit_tokens` / `child_prompt_cache_miss_tokens`
/ `child_model`. `tool_routing::accrue_child_token_cost_if_any`
runs after every `handle_tool_call_complete`, reads those fields,
and routes the cost through `accrue_subagent_cost`. RLM's
metadata block is updated to populate the contract.

Generic on purpose — future tools that spawn LLM calls (batch
summarizers, retrieval helpers) get accrued for free.

3. OSC 8 hyperlinks corrupting Windows console rendering
========================================================

A Windows user reported the model-name strip showing
"eepseek-v4-flash" (leading `d` consumed) and three overlapping
copies of the composer panel. Likely cause: legacy `cmd.exe` and
pre-Win11 PowerShell consoles don't always honor the OSC 8 string
terminator (`ESC \`) cleanly, and v0.8.8 emitted OSC 8 by default.

Fix: default `osc8_links` to `false` on Windows targets only
(`!cfg!(windows)`). Mac/Linux still default-on. Windows users on
modern terminals (Windows Terminal, Alacritty, WezTerm) can opt
back in via `[ui] osc8_links = true`.

Doesn't address the rest of the rendering corruption — that
needs a Windows machine to reproduce — but the OSC 8 escape was
the most likely culprit and disabling it on Windows is a strict
no-op for terminals that *don't* support it.

Verified
========
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features --locked
   -- -D warnings`
- `cargo test --workspace --all-features --locked`
- New regression test for worked-chip pins the bug.
2026-05-03 12:53:17 -05:00
Hunter Bown 84c55e9022 chore(release): bump version to 0.8.8
- Workspace `version = "0.8.8"` in root `Cargo.toml`.
- 31 internal `deepseek-*` path-dep version pins across the
  9 crates that declare them.
- `npm/deepseek-tui/package.json` `version` and
  `deepseekBinaryVersion` both updated.
- `Cargo.lock` regenerated for the new workspace version.
- `CHANGELOG.md` `[Unreleased]` heading promoted to
  `[0.8.8] - 2026-05-03`.

`scripts/release/check-versions.sh` reports the workspace, npm
wrapper, and lockfile all aligned. Pushing this to `main` should
fire `auto-tag.yml`, which creates the `v0.8.8` tag with
`RELEASE_TAG_PAT`. The tag triggers `release.yml` to build the
matrix and draft the GitHub Release. The npm wrapper publish
remains manual (npm 2FA OTP requirement).

What ships in v0.8.8
====================

The full polish stack already merged via PRs #514 (stabilization),
#515 (OSC 8 hyperlinks), #517 (inline diff render), #518 (user
memory MVP), #519 (foreground polish + per-project overlay +
security + Windows redraw fix), and #508 (Linux ARM64 prebuilts +
install docs). See `CHANGELOG.md` and the README "What's new in
v0.8.8" section for the full list.
2026-05-03 08:55:41 -05:00
Hunter Bown 2cfcca471e fix(truncate): drop dead Windows stub for filetime_set_modified
The previous commit gated `prune_older_than_keeps_fresh_files_drops_stale_ones`
on `#[cfg(unix)]` because the mtime-backdate helper relies on
`utimensat`, which doesn't exist on Windows. That left the
`#[cfg(not(unix))]` stub of `filetime_set_modified` with zero callers
on Windows, and `-D dead-code` (implied by `-D warnings`) refused to
compile the test binary on Windows runners.

Drop the Windows stub entirely. The `cfg(unix)` test is the only
caller; `cfg(not(unix))` builds need nothing in its place.

Restores PR #519 Windows CI to green.
2026-05-03 08:43:52 -05:00
Hunter Bown 6a2d95ba3d fix(truncate): Windows test fixes — path components + cfg(unix) on mtime test
CI surfaced two Windows-only failures in `tools::truncate::tests`:

1. `write_spillover_creates_directory_and_writes_file` asserted
   `path.to_string_lossy().contains(".deepseek/tool_outputs")`. On
   Windows the path separator is `\`, so the substring match never
   matched even though the file lived in the correct directory.
   Replace with a `path.components()` walk that checks for the two
   directory names individually — passes on Windows, Linux, and macOS.

2. `prune_older_than_keeps_fresh_files_drops_stale_ones` relied on
   `filetime_set_modified` to backdate a file by 30 days. The helper
   is implemented with `utimensat` on Unix and is a no-op on Windows,
   which means the prune step had no stale file to drop and the
   `assert_eq!(pruned, 1)` always failed. The mtime invariant is
   already covered by Linux + macOS in CI; gate the test on
   `cfg(unix)` rather than ship a no-op Windows variant that can't
   fail meaningfully.

Restores PR #519 CI to green so the v0.8.8 release can land.
2026-05-03 08:37:53 -05:00
Hunter Bown bda30b0fd6 Merge main into feat/v0.8.8-tui-polish + gemini-code-assist feedback
Resolves the post-#514/#517/#518 conflicts:

- CHANGELOG.md: kept both polish-stack and Linux ARM64 entries under
  [Unreleased]; reordered so the ARM64/install-message Changed/Docs
  sections precede the Releases footer.
- config.example.toml: kept both the `instructions = [...]` example
  and the `[memory]` opt-in stanza in sequence.
- crates/tui/src/config.rs: kept both `instructions_paths()` (#454)
  and `memory_enabled()` (#489) on the Config impl.
- crates/tui/src/prompts.rs: extended
  `system_prompt_for_mode_with_context_and_skills` to take BOTH
  `instructions: Option<&[PathBuf]>` and `user_memory_block:
  Option<&str>`. Section 2.5a renders instructions; 2.5b renders the
  memory block — both above the skills block so KV prefix caching
  still wins.
- crates/tui/src/core/engine.rs: thread both args through the two
  call sites.
- crates/tui/src/prompts.rs: update the `system_prompt_for_mode_with_context`
  forwarder and the test caller to pass `None` for the new arg.
- .gitignore: ignore `.claude/*.local.md` and `*.local.json` so
  local ralph / Claude-Code notes can't leak into commits.

Folds in two valid suggestions from the gemini-code-assist review on #519:

- `client.rs`: collapse the duplicated `LlmError → label` match and the
  `human_retry_reason` body into a single
  `retry_reason_label_and_human(err) -> (&'static str, String)` helper.
- `widgets/footer.rs::retry_banner_spans`: merge the two separate
  `match &props.retry` blocks into one that returns both `(label, color)`.

Behavior is unchanged; refactor is a pure DRY win.
2026-05-03 08:29:59 -05:00