Commit Graph

324 Commits

Author SHA1 Message Date
Hunter Bown 83a1060460 feat(tui): terminal-aware keybinding rendering
Add `KeyBinding` widget at `crates/tui/src/tui/widgets/key_hint.rs` that
renders chord shortcuts in the host platform's notation: `⌥+X` on macOS
and `alt+X` on Linux/Windows. Constructors `plain`, `alt`, `shift`,
`ctrl`, `ctrl_alt` plus `KeyBinding::new` for arbitrary modifier sets.
Both a `Display` impl and `From<KeyBinding> for Span<'static>` are
provided so call sites can use the type in plain `format!` strings as
well as in ratatui line/span builders.
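A minimal sketch of platform-aware chord rendering. It assumes a simplified `Modifiers` struct standing in for crossterm's `KeyModifiers`, and a `macos` flag passed explicitly so both branches are testable; real code would presumably derive it from the target OS. The `⌃`/`⇧` glyphs and the ctrl, alt, shift render order are assumptions. The commit only specifies `⌥+X` vs `alt+X`.

```rust
use std::fmt;

/// Simplified stand-in for crossterm's `KeyModifiers` (hypothetical).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Modifiers {
    pub ctrl: bool,
    pub alt: bool,
    pub shift: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct KeyBinding {
    key: char,
    mods: Modifiers,
    macos: bool, // hypothetical: real code would check the host OS
}

impl KeyBinding {
    pub fn new(key: char, mods: Modifiers, macos: bool) -> Self {
        Self { key, mods, macos }
    }
    pub fn plain(key: char, macos: bool) -> Self {
        Self::new(key, Modifiers::default(), macos)
    }
    pub fn alt(key: char, macos: bool) -> Self {
        Self::new(key, Modifiers { alt: true, ..Modifiers::default() }, macos)
    }
    pub fn shift(key: char, macos: bool) -> Self {
        Self::new(key, Modifiers { shift: true, ..Modifiers::default() }, macos)
    }
    pub fn ctrl(key: char, macos: bool) -> Self {
        Self::new(key, Modifiers { ctrl: true, ..Modifiers::default() }, macos)
    }
    pub fn ctrl_alt(key: char, macos: bool) -> Self {
        Self::new(key, Modifiers { ctrl: true, alt: true, shift: false }, macos)
    }
}

impl fmt::Display for KeyBinding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // (is held, macOS glyph, portable name), rendered in a fixed order.
        let table = [
            (self.mods.ctrl, "⌃", "ctrl"),
            (self.mods.alt, "⌥", "alt"),
            (self.mods.shift, "⇧", "shift"),
        ];
        for (held, mac, other) in table {
            if held {
                write!(f, "{}+", if self.macos { mac } else { other })?;
            }
        }
        write!(f, "{}", self.key)
    }
}
```

Because `Display` is implemented, the same value works in `format!("{}", KeyBinding::alt('↑', true))` and can be converted to a ratatui span from its string form.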

The Help view now renders its keybinding hints through `KeyBinding`
instead of hardcoded `Alt+`/`Shift+`/`Ctrl+` strings, so macOS users see
`⌥+↑` etc. matching every other Mac app.

Windows AltGr handling: `is_altgr()` flags `Ctrl+Alt` (the way
European-layout AltGr keypresses arrive on Windows) and
`has_ctrl_or_alt()` exposes a "real modifier" predicate that returns
`false` for AltGr. The composer's plain-char detector in `ui.rs` now
uses `has_ctrl_or_alt`, so AltGr-typed glyphs (`@`, `\`, `|`, …) on
European keyboards reach the input buffer instead of being swallowed as
modified shortcuts. Distinguishing left-vs-right Alt keys is not
portable across crossterm backends; `Ctrl+Alt+<char>` chords are
unbindable on Windows as a result, and that limitation is documented in
the module comment.
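The AltGr predicates can be sketched as pure functions. This assumes a hypothetical two-flag `Mods` struct in place of crossterm's `KeyModifiers`, and a `windows` flag standing in for the platform branch the tests exercise.

```rust
/// Simplified stand-in for crossterm's `KeyModifiers` (hypothetical).
#[derive(Clone, Copy, Debug, Default)]
pub struct Mods {
    pub ctrl: bool,
    pub alt: bool,
}

/// On Windows an AltGr keypress arrives as Ctrl+Alt, so that exact
/// combination is treated as AltGr there; elsewhere it is a real chord.
pub fn is_altgr(m: Mods, windows: bool) -> bool {
    windows && m.ctrl && m.alt
}

/// True when Ctrl or Alt is held as a genuine shortcut modifier,
/// i.e. not AltGr in disguise.
pub fn has_ctrl_or_alt(m: Mods, windows: bool) -> bool {
    (m.ctrl || m.alt) && !is_altgr(m, windows)
}

/// Composer rule: a character key goes into the input buffer unless a
/// real modifier is held.
pub fn is_plain_char(m: Mods, windows: bool) -> bool {
    !has_ctrl_or_alt(m, windows)
}
```

This also shows why `Ctrl+Alt+<char>` cannot be bound on Windows. Without left/right Alt information it is indistinguishable from AltGr, so it always falls through to text input.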

Tests: 10 unit tests cover plain/alt/shift/ctrl/ctrl_alt rendering,
modifier order, function keys, lowercase normalization, the `Span`
conversion, `is_press` semantics, and AltGr platform branching.

Fixes #89

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:33:56 -05:00
Hunter Bown ec3470fd07 feat(tui): two-tap Ctrl+C quit confirmation with 2s countdown
Add a codex-rs-style "Press Ctrl+C again to quit" prompt so a single
stray Ctrl+C in idle state no longer kills the session.

State is held on `App::quit_armed_until: Option<Instant>` and threaded
through four small helpers (`arm_quit`, `quit_is_armed`, `disarm_quit`,
`tick_quit_armed`). The redraw loop calls `tick_quit_armed` each pass
and caps `event::poll` at the deadline so the prompt expires on time
even when no input arrives.

Behavior:
- First Ctrl+C in idle state: arm a 2s window; footer shows
  "Press Ctrl+C again to quit" (warning color) overriding any active
  status toast.
- Second Ctrl+C inside the window: clean shutdown via `Op::Shutdown`.
- Ctrl+C after the window has expired (e.g. 3 seconds later): re-arms
  instead of quitting.
- Ctrl+C while a turn is in flight (`is_loading`): unchanged — still
  cancels the turn, and explicitly disarms the quit prompt.
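The timer lifecycle above can be sketched with an `Option<Instant>` deadline. This is a minimal sketch, not the actual `App`. The `needs_redraw` flag, `on_ctrl_c_idle`, and `poll_timeout` are hypothetical names, and `now` is passed in explicitly so the window can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Length of the "Press Ctrl+C again to quit" window.
const QUIT_WINDOW: Duration = Duration::from_secs(2);

#[derive(Default)]
pub struct App {
    pub quit_armed_until: Option<Instant>,
    pub needs_redraw: bool, // hypothetical redraw flag
}

impl App {
    pub fn arm_quit(&mut self, now: Instant) {
        self.quit_armed_until = Some(now + QUIT_WINDOW);
        self.needs_redraw = true;
    }

    pub fn quit_is_armed(&self, now: Instant) -> bool {
        self.quit_armed_until.is_some_and(|deadline| now < deadline)
    }

    pub fn disarm_quit(&mut self) {
        if self.quit_armed_until.take().is_some() {
            self.needs_redraw = true;
        }
    }

    /// Called once per redraw-loop pass; clears an expired window.
    pub fn tick_quit_armed(&mut self, now: Instant) {
        if matches!(self.quit_armed_until, Some(deadline) if now >= deadline) {
            self.disarm_quit();
        }
    }

    /// Idle-state Ctrl+C: returns true when the app should shut down.
    pub fn on_ctrl_c_idle(&mut self, now: Instant) -> bool {
        if self.quit_is_armed(now) {
            true
        } else {
            self.arm_quit(now);
            false
        }
    }

    /// Caps the event-poll timeout at the deadline so the prompt expires
    /// on time even when no input arrives.
    pub fn poll_timeout(&self, default: Duration, now: Instant) -> Duration {
        match self.quit_armed_until {
            Some(deadline) => default.min(deadline.saturating_duration_since(now)),
            None => default,
        }
    }
}
```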

Tests cover the timer lifecycle: default-disarmed, arm-sets-2s-window,
disarm-clears-and-redraws, tick-no-op-within-window, tick-clears-after-
expiry, and re-arm-after-expiry-starts-fresh-window.

Fixes #90
2026-04-26 17:32:24 -05:00
Hunter Bown 697fb5de4d feat(tui): external editor support (Ctrl+E)
Bind Ctrl+E in the composer to suspend the TUI, spawn $VISUAL/$EDITOR
(fallback `vi`) on a temp file pre-populated with the composer's current
contents, then read the file back into the composer on save.

- New `crates/tui/src/tui/external_editor.rs`:
  - `spawn_editor_for_input` toggles raw mode + alt-screen + mouse +
    bracketed-paste before/after the edit and forces a `terminal.clear()`
    on return so a SIGWINCH during the edit doesn't leave a stale viewport.
  - `run_editor_raw` is the testable core: writes seed -> spawns editor ->
    reads back -> returns `Edited(new) | Unchanged | Cancelled`. Uses
    `tempfile::NamedTempFile` so the temp file is unlinked on every path
    (success, non-zero exit, missing binary, IO error).
- `tui/ui.rs`: split the existing `End | Ctrl+e` cursor-end keybinding
  so `End` still moves the cursor to the line end, and `Ctrl+E` now
  spawns the external editor. Status-line feedback: "Edited in <editor>",
  "Editor closed (no changes)", or "Editor cancelled".
- 7 new unit tests cover the resolver precedence (VISUAL > EDITOR > vi),
  the no-op / failure / missing-binary / edited paths, and an explicit
  temp-file-cleanup assertion via a captured editor argv.
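A sketch of the VISUAL > EDITOR > vi precedence. The values are taken as parameters so the resolver can be tested without mutating the process environment. Treating empty or whitespace-only values as unset is an assumption, and `resolve_editor`/`resolve_editor_from_env` are hypothetical names.

```rust
/// Picks the editor command: $VISUAL, then $EDITOR, then `vi`.
/// Empty values are treated as unset (an assumption).
pub fn resolve_editor(visual: Option<&str>, editor: Option<&str>) -> String {
    [visual, editor]
        .into_iter()
        .flatten()
        .map(str::trim)
        .find(|s| !s.is_empty())
        .unwrap_or("vi")
        .to_string()
}

/// Convenience wrapper that reads the real environment.
pub fn resolve_editor_from_env() -> String {
    let visual = std::env::var("VISUAL").ok();
    let editor = std::env::var("EDITOR").ok();
    resolve_editor(visual.as_deref(), editor.as_deref())
}
```

Values such as `code --wait` carry arguments, so a real spawner would still need to split the command before building the process.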

Fixes #91
2026-04-26 17:29:05 -05:00
Hunter Bown bf2a1765fa feat(tui): priority-ordered footer hint dropping for narrow terminals
Rework `FooterWidget::status_line_spans` so the footer never wraps
mid-hint at any width. Hints are ranked by priority, and the lowest-priority
hint gives way first:

1. mode label (always visible; truncated only as a last resort)
2. model name (always visible alongside mode; truncated mid-word only
   after status has already been dropped)
3. status label ("working", "draft", "refreshing context", ...): the first
   hint to drop when space is tight

Previously the model name would ellipsize the moment a long status
label crowded the line, even at 60–80 columns. The new tier system
keeps mode + model intact down to ~25 cols and only falls back to
mode-only on extreme narrow widths.
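A sketch of the tier fallback, operating on plain strings rather than ratatui spans. The `" · "` separator, the `…` ellipsis, and the "at least three model chars" cutoff are assumptions. Widths are counted in chars, whereas the real widget should use display width.

```rust
/// Builds the footer line for `width` columns: full line, then drop the
/// status, then truncate the model, then mode-only.
pub fn status_line(width: usize, mode: &str, model: &str, status: &str) -> String {
    let sep = " · "; // hypothetical separator
    let full = format!("{mode}{sep}{model}{sep}{status}");
    if full.chars().count() <= width {
        return full;
    }
    let no_status = format!("{mode}{sep}{model}");
    if no_status.chars().count() <= width {
        return no_status;
    }
    // Truncate the model only if at least three of its chars still fit.
    let fixed = mode.chars().count() + sep.chars().count();
    if width >= fixed + 4 {
        let keep = width - fixed - 1; // leave room for '…'
        let truncated: String = model.chars().take(keep).collect();
        return format!("{mode}{sep}{truncated}…");
    }
    // Last resort: mode only, truncated to fit.
    mode.chars().take(width).collect()
}
```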

Includes snapshot-style tests at widths 40, 60, 80, 100, 120 covering
the full / drop-status / truncate-model / mode-only tiers.

Fixes #88
2026-04-26 17:28:36 -05:00
Hunter Bown ec98a64711 Merge pull request #106 from Hmbown/fix/issue-103-stream-resume
fix(client): TCP/HTTP2 keepalives + stream-error diagnostics (#103 Phase 1+2)
2026-04-26 17:24:05 -05:00
Hunter Bown bbdfb26f3c fix(client): TCP/HTTP2 keepalives + stream-error diagnostics (#103 Phase 1+2)
Two fixes for the persistent "Stream read error: error decoding response
body" we saw mid-turn during long V4-pro thinking sessions.

1) HTTP transport tuning (`crates/tui/src/client.rs`):
   - Drop the blanket 300s request timeout. Long V4 thinking turns
     legitimately exceed the wall-clock window; per-chunk and per-stream
     guards in `engine.rs` already bound how long we wait without progress.
   - Add `tcp_keepalive(30s)` so dead-peer detection happens at the TCP
     layer instead of waiting for the application to notice.
   - Add `http2_keep_alive_interval(15s)` + `http2_keep_alive_timeout(20s)`
     so HTTP/2 connections to DeepSeek's edge don't go silent and get
     killed by an upstream proxy mid-thinking.

2) Stream-error diagnostics (`crates/tui/src/client/chat.rs`):
   - Walk reqwest's `std::error::Error::source()` chain when a chunk read
     errors, so the underlying hyper / h2 / io error is logged. Without
     this the outer "error decoding response body" message tells us
     nothing about WHY the stream died.
   - Track elapsed wall time, bytes received so far, and ms since the
     last successful event; log them alongside the error chain. Lets us
     tell HTTP/2 RST_STREAM mid-idle from chunk-decode-failure on a
     short stream from gzip-corruption mid-burst.
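Walking the `source()` chain needs only `std::error::Error`. The sketch below assumes, as is the case for `reqwest::Error`, that the outer error exposes its cause through `source()`. `DecodeError` is a hypothetical stand-in that wraps an `io::Error` so the walk can be shown without network dependencies.

```rust
use std::error::Error;
use std::fmt;

/// Collects an error and every `source()` beneath it, outermost first.
pub fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        chain.push(e.to_string());
        current = e.source();
    }
    chain
}

/// Hypothetical outer error mimicking "error decoding response body".
#[derive(Debug)]
pub struct DecodeError {
    pub source: std::io::Error,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error decoding response body")
    }
}

impl Error for DecodeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}
```

The joined chain can then be logged alongside elapsed time, bytes received, and ms since the last event.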

Phase 3 (transparent retry with `prefix` continuation) is intentionally
NOT in this PR. Together, the retry-flag plumbing on `MessageRequest`, the
`prefix` wire format in chat.rs, and the retry loop in engine.rs form a
meaningful surface that deserves its own review pass. This PR ships the
diagnostic-and-resilience floor so the harder retry work can land on top
of a more robust network layer.

Refs #103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:19:42 -05:00
Hunter Bown 97c89745fa Merge pull request #105 from Hmbown/fix/issue-101-path-routing
fix: @-mention path routing (#101)
2026-04-26 17:16:55 -05:00
Hunter Bown 98ba62915a Merge pull request #104 from Hmbown/fix/issue-100-rlm-query
fix(rlm_query): Phase 1+2 - Add diagnostic logs and fallback for empty responses
2026-04-26 17:16:52 -05:00
Hunter Bown 6beb2099e4 style(working_set): collapse nested if-let in expand_mention_home
Parity CI's `cargo clippy ... -D warnings` rejects the nested `if let`
pattern in `expand_mention_home` under the new clippy::collapsible_if
lint (rust-clippy 1.95). Use the chained `if let ... && let ...` form
the lint suggests. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:12:38 -05:00
Hunter Bown adb60ba6b0 perf+fix(file_mention): address Gemini and Devin code-review findings
Five issues from the PR #105 review pass:

1. Lazy file-tree index for fuzzy_resolve (HIGH, Gemini)
   The previous `fuzzy_resolve` walked the workspace up to depth 6 on every
   miss. A user typing several non-existent paths in one message could
   trigger multiple disk-intensive walks. Replace with a `OnceLock`-backed
   basename → paths index, built once on first miss and reused thereafter.

2. Cache cwd at construction (MEDIUM, Gemini)
   `std::env::current_dir()` was a syscall on every `resolve` call (up to
   8 mentions per message). Capture once in `Workspace::new()` and store
   in the struct.

3. Include directories in fuzzy match (MEDIUM, Gemini)
   The mention system supports directory listings, but fuzzy_resolve was
   restricted to `is_file()`. Allow directories too.

4. Drop `Path::canonicalize` from the mention loop (MEDIUM, Gemini)
   `Workspace::resolve` already returns absolute paths when the workspace
   root is absolute (always true in TUI use). Removed the per-mention
   `canonicalize` syscall on the message-send hot path. The rare
   symlink-aliasing dedup miss is an acceptable cost.

5. Gate <missing-file> blocks through dedup set (Devin)
   The `Err` arm in `local_context_from_file_mentions` pushed a
   `<missing-file>` block before reaching the `seen.insert` check, so the
   same non-existent mention typed twice produced duplicate blocks and
   wasted prompt tokens. Restructured the loop so all blocks (existing
   AND missing) flow through the dedup gate.
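Item 1's lazy index can be sketched with `std::sync::OnceLock`. The `Workspace` below is a hypothetical simplification, not the real struct. The depth limit of 6 comes from the text. Directories are indexed alongside files (item 3), and `DirEntry::file_type` is used so symlinks are not followed.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Hypothetical workspace holding a lazily built basename -> paths index.
pub struct Workspace {
    root: PathBuf,
    index: OnceLock<HashMap<String, Vec<PathBuf>>>,
}

impl Workspace {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into(), index: OnceLock::new() }
    }

    /// Fuzzy fallback: the first call walks the tree, later calls reuse it.
    pub fn fuzzy_resolve(&self, name: &str) -> Vec<PathBuf> {
        let index = self.index.get_or_init(|| {
            let mut map = HashMap::new();
            walk(&self.root, 0, 6, &mut map);
            map
        });
        index.get(name).cloned().unwrap_or_default()
    }
}

/// Depth-limited walk that records both files and directories by basename.
fn walk(dir: &Path, depth: usize, max_depth: usize, map: &mut HashMap<String, Vec<PathBuf>>) {
    if depth > max_depth {
        return;
    }
    let Ok(entries) = std::fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if let Some(base) = path.file_name().and_then(|n| n.to_str()) {
            map.entry(base.to_string()).or_default().push(path.clone());
        }
        let is_dir = entry.file_type().map(|t| t.is_dir()).unwrap_or(false);
        if is_dir {
            walk(&path, depth + 1, max_depth, map);
        }
    }
}
```

The trade-off is that files created after the first miss are invisible to the fuzzy fallback until the index is rebuilt.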

Bonus: replaced the test that was mutating the global cwd (race-prone
across tests) with one that constructs `Workspace::with_cwd` explicitly.
Added a second test exercising the lazy-index + directory-match paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:08:33 -05:00
Hunter Bown d7b202d996 style(rlm_query): cargo fmt fixes for parity CI
CI's `cargo fmt --all -- --check` rejected two long expressions in
`crates/tui/src/tools/rlm_query.rs` (the per-child structured-log match
arm, and the `Ok(Ok(response))` extract_text invocation). Reformatted to
match rustfmt's preferred multi-line block style. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:58:14 -05:00
Hunter Bown 08706f77d6 fix(file_mention): two-pass resolver for @-mentions (#101)
- Moved resolution logic to  in .
- Two-pass resolution (workspace, then cwd), followed by a fuzzy fallback.
-  now uses cwd if different from workspace.
2026-04-26 16:52:43 -05:00
Hunter Bown 96c979f160 fix(rlm_query): Phase 1+2 - Add diagnostic logs and fallback for empty responses 2026-04-26 16:48:00 -05:00
Hunter Bown c1259eff68 merge: don't enter offline mode on recoverable engine errors (closes #86) 2026-04-26 16:30:21 -05:00
Hunter Bown a8fe5298a2 fix(tui): don't enter offline mode on recoverable engine errors
The TUI's `EngineEvent::Error` handler in `ui.rs` matched on `{ message, .. }`
and unconditionally set `app.offline_mode = true`. This meant any transient
stream-disconnect (e.g., the chunked-transfer connection getting closed during
a long V4 thinking turn with no SSE keepalive) flipped the session offline,
queued the user's next message, and forced them to recover manually — even
though the engine had already classified the error as recoverable.

The engine has been emitting `Event::error(message, recoverable)` with the
correct boolean since the error-taxonomy work in #66. Stream stalls
(engine.rs:2286), max-duration aborts (:2322), max-bytes aborts (:2334), and
upstream stream errors (:2344) all set `recoverable = true`. Hard failures
like sub-agent spawn failures (:1202) and post-recovery context overflows
(:1378, :1559) set `recoverable = false`. The UI just wasn't reading it.

Pull the body out into a `pub(crate) fn apply_engine_error_to_app` helper so
the branch logic is unit-testable from `ui/tests.rs`, then split:

- `recoverable = true`  → status: "Connection interrupted: …"; stay online.
- `recoverable = false` → status: "Engine error; queued messages stay
                          pending: …"; flip into offline mode.
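The split can be sketched as a pure function over a minimal app state. The `App` fields shown are a hypothetical subset; the status strings follow the text above.

```rust
/// Minimal stand-in for the TUI state touched by the error handler.
#[derive(Default)]
pub struct App {
    pub offline_mode: bool,
    pub status_message: Option<String>,
}

/// Recoverable errors keep the session online; hard failures flip it
/// into offline mode so queued messages wait for recovery.
pub fn apply_engine_error_to_app(app: &mut App, message: &str, recoverable: bool) {
    if recoverable {
        app.status_message = Some(format!("Connection interrupted: {message}"));
    } else {
        app.status_message = Some(format!(
            "Engine error; queued messages stay pending: {message}"
        ));
        app.offline_mode = true;
    }
}
```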

Add two regression tests covering both branches.

Fixes #86

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:30:18 -05:00
Hunter Bown 1f00ac6311 chore(release): drop auto npm publish, document manual flow, trim research PDF
- Remove the `publish-npm` job from `release.yml`. It has been failing on
  every release with `npm error code EOTP` because the configured `NPM_TOKEN`
  doesn't bypass 2FA. Manual publish from a developer machine is the actual
  ship path; codify that.
- Update `docs/RELEASE_RUNBOOK.md` "npm Wrapper Release" to describe the
  manual flow (`npm publish --access public` + OTP) and explain why the auto
  path is gone, with a recovery note for future Trusted-Publishing migration.
- Refresh stale cross-reference comment in `publish-npm.yml` (the workflow
  remains as inert plumbing for an eventual Trusted Publishing setup).
- Stop tracking `docs/DeepSeek_V4.pdf` (4.4 MB). It was never referenced
  outside test fixture filenames; the tests synthesize their own fake PDF.
  Add to `.gitignore` so a local copy can sit there without nagging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:26:10 -05:00
Hunter Bown 9758e26349 fix(docs): escape <no body> in mcp doc-comment so rustdoc accepts it
CI Documentation job was failing with `error: unclosed HTML tag 'no'` at
crates/tui/src/mcp.rs:71. Rustdoc parses unescaped `<no body>` as an HTML
tag and `RUSTDOCFLAGS=-Dwarnings` (used in CI) promotes that to a hard
error. Reword the comment to refer to the literal string instead.
2026-04-26 15:56:01 -05:00
Hunter Bown fa99fb5124 fix(engine): 256K output budget + capacity controller off by default
User feedback after v0.6.2 dogfooding: "we'd be better off simplifying and
removing guardrails." Two changes that meaningfully shrink the surface:

1. TURN_MAX_OUTPUT_TOKENS: 32_768 → 262_144 (256K).
   V4 thinking models can produce tens of thousands of reasoning tokens
   on hard prompts before the visible reply, and DeepSeek V4 has a 1M
   context window. 32K was tight for that workload (showed up as the
   model "stopping mid-response" once reasoning exhausted the budget).
   256K is generous enough that the per-turn ceiling effectively never
   bites in normal use.

2. CapacityControllerConfig::enabled: true → false.
   The controller's main intervention, `TargetedContextRefresh`, runs
   `compact_messages_safe` which rewrites the live conversation —
   visually identical to the agent "restarting" mid-turn. The failure
   mode it protects against (context overflow) is rare in practice and
   self-correcting (the model surfaces a clear error). Power users on
   V4 do not need the guardrail; users who do can re-enable it via
   `capacity.enabled = true` in `~/.deepseek/config.toml`.

Tests:
- context_budget_reserves_output_and_headroom: switched fixture model
  to deepseek-v4-pro (1M context) so the 256K reservation doesn't
  saturate the budget to zero.
- cooldown_blocks_repeated_action: explicitly enables the controller
  (the cooldown logic short-circuits when disabled).

cargo clippy --workspace -- -D warnings clean; full test suite green
(990 + adjacent crate tests).
2026-04-26 15:51:58 -05:00
Hunter Bown 6ab2fcc21f fix(tui): rustfmt parity + working-strip stays visible all turn
Two fixes folded into one commit (the parity failure was blocking the
v0.6.2 npm publish, the strip fix is the dogfooding follow-up):

1. cargo fmt --all: subagent/mod.rs (long timeout wrapper) was over the
   line-length budget when committed earlier; rustfmt rewraps it. CI
   parity (`cargo fmt -- --check`) was failing the release pipeline.

2. footer working-strip stays visible for the entire turn: previously
   the strip only animated while `is_loading || is_compacting ||
   running_agents > 0`. Between LLM rounds inside a single turn (tool
   execution, reasoning replay, capacity refresh) `is_loading` flickers
   off — and so the user saw the strip vanish for seconds at a time
   even though the agent was clearly still working. Widen the gate to
   ALSO include `runtime_turn_status == Some("in_progress")`, which
   only clears when `EngineEvent::TurnComplete` fires — so the strip
   now stays lit for the whole turn duration.
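The widened gate reduces to a single boolean. This sketch assumes a hypothetical `FooterState` snapshot carrying the four signals named above.

```rust
/// Hypothetical snapshot of the footer-relevant app state.
pub struct FooterState<'a> {
    pub is_loading: bool,
    pub is_compacting: bool,
    pub running_agents: usize,
    pub runtime_turn_status: Option<&'a str>,
}

/// Lit while any work signal is active, including the whole span of an
/// in-progress turn, which only clears on `TurnComplete`.
pub fn working_strip_active(s: &FooterState<'_>) -> bool {
    s.is_loading
        || s.is_compacting
        || s.running_agents > 0
        || s.runtime_turn_status == Some("in_progress")
}
```

Between LLM rounds `is_loading` is false, but the turn status keeps the strip lit.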
2026-04-26 15:45:13 -05:00
Hunter Bown 0d92eb847b fix(engine): raise output-token + step ceilings (mid-response cutoff)
User repro: V4 thinking on hard prompts (~107s of thinking) randomly
"stops mid-response", more often when starting in Agent mode and
switching to YOLO. Two ceilings were too tight:

1. TURN_MAX_OUTPUT_TOKENS: 4096 → 32768
   `reasoning_content` from V4 thinking can easily exceed 4K tokens on
   hard problems. Once the per-turn output budget exhausts, the API
   closes the SSE stream with `finish_reason: "length"` and the visible
   reply ends up empty — surfaced as the assistant "stopping randomly".
   32K leaves comfortable headroom for thinking plus the visible reply on
   every realistic turn while staying far below DeepSeek V4's 1M-token
   context window.

2. max_steps: 100 → u32::MAX (effectively unlimited)
   100 was hitting the ceiling on long multi-step plans (wide refactors,
   sub-agent orchestration) and presenting as the agent "giving up
   mid-task". V4's 1M context window means there's no good reason to
   cap steps administratively. Users can still interrupt with Ctrl+C /
   Esc; a turn naturally ends when the model stops emitting tool calls.

All 54 turn tests pass; full workspace clippy + fmt remain clean.
2026-04-26 15:38:59 -05:00
Hunter Bown d98cc58028 fix(tui): sidebar padding + capacity controller tuning
Two tuning fixes for issues observed in v0.6.2 dogfooding:

#63 follow-up — sidebar panels still empty in compact terminals:
  `section_padding: Padding::uniform(1)` ate two rows of every sidebar
  panel (one above content, one below). At the 25% layout split, in
  terminals around 12-15 rows tall, Plan/Todos/Tasks each get only
  3 rows total — borders take 2, vertical padding takes 2, leaving
  -1 (saturated to 0) rows for the actual content. Even "No todos" /
  "No active plan" got eaten. Switched to horizontal-only padding so
  the inner row survives.
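The row arithmetic above, written out as a sketch with saturating subtraction. `content_rows` is a hypothetical helper, not the sidebar's actual code.

```rust
/// Rows left for content in a bordered panel `height` rows tall, given
/// `vertical_padding` rows of padding above and below the content.
pub fn content_rows(height: u16, vertical_padding: u16) -> u16 {
    let borders = 2; // top + bottom border
    height
        .saturating_sub(borders)
        .saturating_sub(2 * vertical_padding)
}
```

With `Padding::uniform(1)` a 3-row panel computes 3 - 2 - 2, which saturates to 0. Horizontal-only padding leaves 1 row for "No todos".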

Capacity-controller tuning (user feedback: "refreshing context is overtuned"):
  `apply_targeted_context_refresh` runs `compact_messages_safe` which
  rewrites the conversation history — visually identical to the agent
  "restarting" mid-session. The previous defaults
  (low_risk_max=0.34, refresh_cooldown_turns=2, min_turns=2)
  fired this every couple of turns once p_fail crept above 0.34.
  Bumped:
    - low_risk_max:               0.34 → 0.50
    - refresh_cooldown_turns:     2 → 6
    - min_turns_before_guardrail: 2 → 4
  Still well below the medium-risk ceiling (0.62), so genuine drift
  still triggers; routine noise no longer does.

All 14 capacity tests + workspace clippy + fmt remain clean.
2026-04-26 15:27:22 -05:00
Hunter Bown aa8d0dc73a merge: per-cell transcript line cache + revisions (closes #78)
Resolves conflicts with the #65 resize fix that landed first. Both branches
converged on the same resize-coalescing + display-width truncation fix;
took the perf branch's more detailed inline comments and combined the
transcript bench from #78 with the existing #65 resize regression tests.

Issue #78 baseline (release, 5000-cell synthetic transcript):
  pure scroll, off=0    3549µs → 21µs   (~150x)
  pure scroll, off=2000 3303µs → 19µs   (~170x)
  streaming append      11.6ms → 3.4ms  (~3.4x)
2026-04-26 14:53:04 -05:00
Hunter Bown eee5081ef7 merge: clean resize redraw + display-width truncation (closes #65) 2026-04-26 14:49:42 -05:00
Hunter Bown ab70c40beb perf(tui): cache wrapped transcript lines per-cell (closes #78)
Scrolling far back through a long transcript stalled the entire UI: every
keypress paid the cost of re-wrapping every history cell from index 0 on
every frame. Two bugs combined to defeat the existing per-cell cache:

1. **Uniform cache keys** — `widgets/mod.rs` synthesized
   `cell_revisions = vec![app.history_version; len]`, so a single mutation
   anywhere bumped every cell's revision and busted the entire cache.
2. **Vec-deep-clone on cache hit** — `CachedCell.lines: Vec<Line>` deep-cloned
   on every `prev.clone()` inside `ensure`, so even a fully-cached frame paid
   O(total_lines) per render.

Fix mirrors Codex's chatwidget pattern: track per-cell revisions in
`App.history_revisions`, bump only the cell whose content actually
changed, and store cached lines behind `Arc<Vec<Line>>` so a cache-hit
clone is O(1). The cache reuse path is unchanged; what changed is the
keying.
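A sketch of the keying idea. Each cell is cached under its own revision plus the wrap width, and lines sit behind an `Arc` so a hit is a pointer clone. `TranscriptCache`, `lines_for`, and the naive char-count `wrap` are hypothetical; the real cache stores ratatui `Line`s and wraps by display width.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Cached wrapped lines for one cell, keyed by its revision and width.
struct CachedCell {
    revision: u64,
    width: u16,
    lines: Arc<Vec<String>>,
}

#[derive(Default)]
pub struct TranscriptCache {
    cells: HashMap<usize, CachedCell>,
    pub wraps: usize, // counts re-wraps, for illustration
}

impl TranscriptCache {
    /// Returns cached lines if this cell's revision and width are unchanged;
    /// otherwise re-wraps. A hit clones only the `Arc`.
    pub fn lines_for(&mut self, idx: usize, revision: u64, width: u16, text: &str) -> Arc<Vec<String>> {
        if let Some(c) = self.cells.get(&idx).filter(|c| c.revision == revision && c.width == width) {
            return Arc::clone(&c.lines);
        }
        self.wraps += 1;
        let lines = Arc::new(wrap(text, width as usize));
        self.cells.insert(idx, CachedCell { revision, width, lines: Arc::clone(&lines) });
        lines
    }
}

/// Naive char-count wrap standing in for the real width-aware wrapper.
fn wrap(text: &str, width: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars.chunks(width.max(1)).map(|c| c.iter().collect()).collect()
}
```

If every cell were keyed by one global `history_version`, any mutation would change every key and force a full re-wrap. Per-cell revisions confine the miss to the cell that changed.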

Touchpoints:
* `App::history_revisions` + `next_history_revision` counter, kept in
  lockstep with `history` via `add_message` / `extend_history` /
  `push_history_cell` / `clear_history` / `pop_history` /
  `bump_history_cell` helpers.
* `cell_at_virtual_index_mut` and the `append_streaming_text` path now
  bump only the targeted cell's revision instead of fanning the global
  `history_version` across the whole transcript.
* `TranscriptViewCache::ensure_split` accepts cell shards directly so the
  caller no longer concatenates history + active-cell entries into a
  fresh `Vec<HistoryCell>` every frame.
* `mark_history_updated` resyncs `history_revisions.len()` to
  `history.len()`, preserving correctness for direct callers that bulk
  mutate via `clear`/`extend`.

Bench (release, 5000-cell synthetic transcript, 100×30 area):

| scenario             | before  | after  |
|----------------------|--------:|-------:|
| pure scroll, off=0   | 3549 µs |  23 µs |
| pure scroll, off=100 | 3338 µs |  23 µs |
| pure scroll, off=500 | 3306 µs |  20 µs |
| pure scroll, off=2k  | 3303 µs |  20 µs |
| streaming, off=0     | 11.6 ms | 3.4 ms |
| streaming, off=2k    | 11.6 ms | 3.3 ms |

Pure-scroll renders are now ~150× faster and constant-time vs scroll
offset; streaming cost is ~3.5× lower (the remaining cost is the
per-frame flatten which always rebuilds the line buffer when the cell
count changes — orthogonal follow-up).

Bench is `#[ignore]`'d:
`cargo test -p deepseek-tui --release bench_transcript_scroll -- --ignored --nocapture`

All existing transcript and scroll tests pass; clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:47:17 -05:00
Hunter Bown 033fef6cb2 fix(tui): force clean redraw on resize / bound sidebar labels (closes #65)
After v0.6.1's light-theme removal exposed it more visibly, rapid resizes
left stale glyphs in the right column (sidebar fragments, mid-character
title truncation, duplicated transcript timestamps). Three small fixes:

- Coalesce queued `Event::Resize` events, run a single `terminal.clear()`,
  and immediately draw the new frame instead of waiting for the next event
  loop iteration. Previously the cleared screen could sit blank between
  the resize handler's `continue` and the next draw, so any other event
  arriving in that window would be processed before the repaint.
- `truncate_line_to_width` for budgets `<= 3` was counting codepoints
  instead of display widths, overrunning the cell budget for any
  double-width grapheme. Fix by accumulating display widths consistently.
- Add a `tracing::debug!` log to the resize handler so users hitting this
  in the wild can confirm whether crossterm is delivering the event.
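The coalescing step can be sketched over an already-drained queue. The `Event` enum is a simplified stand-in for crossterm's. In the real loop the queue would presumably be filled via crossterm's `event::poll`/`event::read` before a single `terminal.clear()` and draw.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for crossterm's `Event`.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Resize(u16, u16),
    Key(char),
}

/// Folds all queued resizes into the final size and leaves the other
/// events queued in their original order.
pub fn coalesce_resizes(first: (u16, u16), queue: &mut VecDeque<Event>) -> (u16, u16) {
    let mut size = first;
    queue.retain(|ev| match ev {
        Event::Resize(w, h) => {
            size = (*w, *h);
            false
        }
        _ => true,
    });
    size
}
```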

Adds two regression tests in `tui/widgets` (resize cycle + cache
invalidation on width change) and one in `tui/ui` (truncate semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:42:42 -05:00
Hunter Bown 6d06595b76 merge: animated working strip (closes #61) 2026-04-26 14:22:37 -05:00
Hunter Bown 70ce26e196 merge: rlm_query parallelism verification + per-child UI (closes #60) 2026-04-26 14:22:32 -05:00
Hunter Bown 2b7800885e merge: 'deepseek metrics' CLI (closes #70) 2026-04-26 14:22:27 -05:00
Hunter Bown 49673d2ea3 feat(rlm_query): verify parallel fan-out + per-child prompt rendering (closes #60)
Introduce `RlmChildClient` — a dyn-compatible `#[async_trait]` wrapper around
the single create_message operation — so tests can inject a `MockRlmClient`
without a live API key. This replaces the direct `Arc<DeepSeekClient>` field
with `Arc<dyn RlmChildClient>`, wired transparently via `RlmQueryTool::new`.

Concurrency regression test (`rlm_parallel_fanout_overlaps_not_serialized`):
fires N=4 children each sleeping 50 ms through `join_all`. Asserts total
elapsed < 4×50 ms (serial bound) and that all start timestamps cluster within
50 ms of each other. First run: total_elapsed=54 ms, start_spread=141 µs:
fan-out was already correct; no serialization fix needed.

UI wiring tests (`rlm_query_tool_cell_wired_with_prompts_on_start` etc.) verify
that `handle_tool_call_started` with `rlm_query` populates `GenericToolCell.prompts`
from the `prompts` (array) and `prompt` (singular) input shapes, and that
non-fan-out tools leave `prompts: None`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:21:43 -05:00
Hunter Bown 9804c92c21 feat(cli): add 'deepseek metrics' command (closes #70)
Implement `deepseek metrics` as a dispatcher-handled subcommand (no TUI
binary roundtrip) that reads ~/.deepseek/audit.log, session JSON files,
and tasks runtime JSONL event streams, then prints a human-readable
usage rollup aggregated by tool name, compaction events, sub-agent
spawns, and capacity-controller interventions.

Flags: --json (machine-readable) and --since DURATION (e.g. 7d, 24h,
30m, now-2h, 2h30m). Empty/missing audit log exits 0 with an empty
rollup; malformed lines are skipped silently via tracing::trace!.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:17:58 -05:00
Hunter Bown 7d0450f541 feat(tui): animated water-spout working strip in the footer (closes #61)
Replace the single-spout bounce animation with two independent `╭───╮`
arcs sweeping at different speeds across a calm `─` water surface. Add
`footer_working_label` to pulse `working` → `working...` at 400 ms
cadence while a turn is live. The dot-pulse fires even in low-motion
mode; the arc strip is gated behind `!app.low_motion`. Frame math is
purely deterministic so the test suite can pin specific frames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:17:17 -05:00
Hunter Bown 9bc8eee927 feat(tui): animated water-spout working strip in the footer (closes #61)
Replace the single-spout bounce animation with two independent `╭───╮`
arcs sweeping at different speeds across a calm `─` water surface. Add
`footer_working_label` to pulse `working` → `working...` at 400 ms
cadence while a turn is live. The dot-pulse fires even in low-motion
mode; the arc strip is gated behind `!app.low_motion`. Frame math is
purely deterministic so the test suite can pin specific frames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:17:02 -05:00
Hunter Bown ebc70176ad feat(tui): bracketed-paste config toggle + capacity hot-path bench
Closes #77, refs #75.

#77 — bracketed paste was unconditionally enabled at terminal init. Add a
`bracketed_paste` field to Settings (default true) and propagate it through
TuiOptions → App → run_tui / pause_terminal / resume_terminal so users on
the rare terminal that mishandles `\e[?2004h` can disable it via
`/set bracketed_paste off` or `bracketed_paste = false` in
`~/.config/deepseek/settings.toml`. Modern terminals continue to work as
before. All TuiOptions construction sites updated in one pass.

#75 — added an ignored-test microbench for `compute_profile` in
`crates/tui/src/core/capacity.rs`. Run with:
  cargo test -p deepseek-tui --release bench_compute_profile -- --ignored --nocapture

Baseline (release, M1):
  window=  16  per-call=  48ns
  window=  64  per-call= 126ns
  window= 256  per-call= 385ns
  window=1024  per-call=1438ns

Sub-µs at typical window sizes — no optimization shipped, bench locks in
the regression contract. No new dev-deps (uses std::time::Instant +
black_box, gated as #[ignore]).
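The std-only `#[ignore]` bench pattern described above, as a sketch. `per_call` and `mean` are hypothetical; `mean` stands in for `compute_profile`, whose signature is not shown here. `std::hint::black_box` keeps the optimizer from discarding the measured work in release builds.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Average wall-clock time per call of `f` over `iters` iterations.
pub fn per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters.max(1)
}

/// Hypothetical stand-in for `compute_profile`: a mean over a window.
pub fn mean(window: &[f64]) -> f64 {
    if window.is_empty() {
        return 0.0;
    }
    window.iter().sum::<f64>() / window.len() as f64
}

#[test]
#[ignore] // run with `-- --ignored --nocapture`
fn bench_mean() {
    for n in [16usize, 64, 256, 1024] {
        let window = vec![0.5_f64; n];
        let d = per_call(10_000, || mean(black_box(&window)));
        println!("window={n:>5}  per-call={}ns", d.as_nanos());
    }
}
```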
2026-04-26 14:10:50 -05:00
Hunter Bown 432082e956 docs(agents): document deepseek as the canonical CLI binary
The user-facing entry point for every flow is the `deepseek` dispatcher
(crates/cli), not `deepseek-tui`. Future agent sessions and example
commands should default to `deepseek` / `cargo run --bin deepseek`.
Mirror the same directive in the local CLAUDE.md (gitignored).
2026-04-26 14:01:30 -05:00
Hunter Bown ac1332565c release: v0.6.2
Highlights:
- fix(client): SSE idle-timeout so a stalled stream surfaces a clear error
  instead of hanging the active cell (#76)
- fix(tui): sidebar Agents panel reads live engine progress, not just the
  cached snapshot — matches the footer chip in real time (#63)
- fix(tui): generic tool result preview preserves newlines for diff stats
  / file lists / todo snapshots (#80)
- fix(tui): slash-menu scroll viewport now exercises center-tracking past
  the first 6 entries (#64)
- feat(mcp): connect-failure errors include URL, status, body excerpt,
  transport — credentials masked (#71)
- feat(tools): mark alias tools (spawn_agent, close_agent, send_input,
  delegate_to_agent) with _deprecation metadata; removal slated 0.8.0 (#72)
- feat(capacity): V4 model priors (deepseek-v4-pro/flash) + key
  normalization, plus DEEPSEEK_CAPACITY_PRIOR_V4_* env overrides (#73)
- feat(tools): explain parallel fan-out caps in agent_spawn vs rlm_query
  descriptions and error messages — cost-class table in TOOL_SURFACE.md (#81)
- chore(errors): partial wiring of the error taxonomy — classify_error_message
  helper used in capacity controller, audit log fields pending (#66)
- chore(providers): scaffold OpenRouter and Novita variants end-to-end
  (env keys, default base URLs, model normalization). Modal /provider
  picker UI still pending (#52)

Build hygiene:
- cargo fmt clean, cargo clippy --workspace -- -D warnings clean
- cargo test --workspace passes (979+ tests across crates)
- pre-existing dead-code warnings gated per-item with TODO refs to #61/#66
2026-04-26 13:56:40 -05:00
Hunter Bown 3375fc7285 merge: explain parallel fan-out caps (fixes #81 — was PR #82) 2026-04-26 13:55:21 -05:00
Hunter Bown 1107b723b1 chore: simplify pass + clippy clean for v0.6.2
Cleanup pass after the issue fixes (#64, #71, #80, #63):

Simplifications:
- sidebar.rs: extract `push_agent_row` closure to remove the duplicated
  two-line agent rendering (cached + progress-only paths used the same
  shape with different summary text).
- engine.rs: replace `error_categories.iter().any(|c| c == X)` with
  `.contains(&X)` (clippy::manual_contains).
- widgets/mod.rs: replace `for idx in menu_top..menu_bottom` index loop
  with `.iter().enumerate().take(menu_bottom).skip(menu_top)`
  (clippy::needless_range_loop).

Build hygiene (CI runs `cargo clippy ... -- -D warnings`):
- error_taxonomy.rs: per-item `#[allow(dead_code)]` on `ErrorSeverity`,
  `ErrorEnvelope`, and `ErrorEnvelope::new` with TODO notes referencing
  #66. Keeps deepseek's removal of the file-wide allow but stops the
  scaffold from breaking the build until #66 follows up.
- app.rs: per-field `#[allow(dead_code)]` on `fancy_animations` (pending
  #61 footer animation consumer).
- config/lib.rs: complete the OpenRouter/Novita variant scaffolding so
  `match ProviderKind { ... }` is exhaustive — add api_key/base_url env
  loading (`OPENROUTER_API_KEY`, `NOVITA_API_KEY`, optional `*_BASE_URL`
  overrides), wire `api_key_for` / `base_url_for` arms with the documented
  defaults, and extend `normalize_model_for_provider` so generic V4 model
  names map to each provider's catalog ID. Full /provider picker UI still
  pending #52.

Verified: cargo fmt clean, cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings clean, full test suite passes
(979 + adjacent crate tests).
2026-04-26 13:54:54 -05:00
Hunter Bown 124011a862 fix(tui): sidebar Agents panel reads live progress, not just cache (closes #63)
Repro: spawn 5 sub-agents. The footer chip correctly shows "5 agents" because
running_agent_count() unions app.agent_progress (live engine events) with
app.subagent_cache (settled snapshot from Op::ListSubAgents). The sidebar's
Agents panel only read app.subagent_cache, so it showed "No agents" while
the footer said 5. This is the data-flow bug from the user's screenshot in #63.

Mirror the footer's union here:

- Live progress-only IDs (in agent_progress, not yet in subagent_cache) get a
  one-line "starting" row with the latest progress message — surfaces the
  freshest signal first.
- Cached entries get the full status row (steps taken, role, objective).
- Header shows "{live_running} running / {total}" with both counts unified.
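The union above can be sketched as follows. This is a minimal, hypothetical model: `CachedAgent`, `AgentRow` and `agents_panel` are illustrative names, live progress is assumed to be an id-to-latest-message map, and an agent that appears only in live progress is assumed to count as running. `BTreeMap` keeps row order deterministic.

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for a settled snapshot from Op::ListSubAgents.
pub struct CachedAgent {
    pub running: bool,
    pub steps: u32,
    pub role: String,
    pub objective: String,
}

#[derive(Debug, PartialEq)]
pub enum AgentRow {
    /// Seen in live progress but not yet in the cache.
    Starting { id: String, message: String },
    /// Full status row built from the settled cache.
    Cached { id: String, summary: String },
}

/// Union live progress with the cache, listing progress-only agents first.
/// Returns the header text and the rows.
pub fn agents_panel(
    progress: &BTreeMap<String, String>,
    cache: &BTreeMap<String, CachedAgent>,
) -> (String, Vec<AgentRow>) {
    let mut rows = Vec::new();
    let mut live_running: usize = 0;

    for (id, message) in progress {
        if !cache.contains_key(id) {
            live_running += 1; // not settled yet, so treat it as running
            rows.push(AgentRow::Starting { id: id.clone(), message: message.clone() });
        }
    }
    for (id, agent) in cache {
        if agent.running {
            live_running += 1;
        }
        rows.push(AgentRow::Cached {
            id: id.clone(),
            summary: format!("{} steps | {} | {}", agent.steps, agent.role, agent.objective),
        });
    }
    let header = format!("{} running / {}", live_running, rows.len());
    (header, rows)
}
```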

The Agents panel now stays in sync with the footer chip and never lies
about whether agents are in flight. The Todos panel was already wired correctly
to app.todos (the SharedTodoList lock); only the agents path was out of sync.

Refs #63
2026-04-26 13:48:28 -05:00
Hunter Bown f342d6508e fix(tui): preserve newlines in generic tool result preview (closes #80)
Before, GenericToolCell rendered its `output` through `render_compact_kv`, which
treated the entire string as one logical line and let the wrapper handle
overflow. Multi-line output (git diff --stat, todo snapshots, file lists)
ended up squashed into a single hard-wrapped blob — the screenshot in the
issue showed "Cargo.lock | 1 + crates/cli/Cargo.toml | 1 + crates/cli/src/main.rs"
all on one row.

Switch the result rendering to `render_tool_output_mode` (already used by
ExecCell) which:

- splits on `\n` first, then wraps each line independently;
- caps live view at TOOL_OUTPUT_LINE_LIMIT (= 6) rows with a "+N more lines;
  press v for details" affordance;
- emits the full body in transcript view.
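The split-then-wrap behaviour can be sketched like this. It is a hypothetical simplification of `render_tool_output_mode`: it hard-wraps by character count rather than display width, and it assumes the "+N more lines" count refers to hidden wrapped rows.

```rust
/// Illustrative render modes; the real `RenderMode` may carry more variants.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RenderMode {
    Live,
    Transcript,
}

pub const TOOL_OUTPUT_LINE_LIMIT: usize = 6;

/// Split on newlines first, then hard-wrap each logical line to `width` chars.
fn wrap_lines(output: &str, width: usize) -> Vec<String> {
    let width = width.max(1); // chunks(0) would panic
    let mut rows = Vec::new();
    for line in output.lines() {
        let chars: Vec<char> = line.chars().collect();
        if chars.is_empty() {
            rows.push(String::new()); // keep blank lines as blank rows
            continue;
        }
        for chunk in chars.chunks(width) {
            rows.push(chunk.iter().collect::<String>());
        }
    }
    rows
}

/// Live view is capped with an affordance row; transcript shows everything.
pub fn render_output(output: &str, width: usize, mode: RenderMode) -> Vec<String> {
    let mut rows = wrap_lines(output, width);
    if mode == RenderMode::Live && rows.len() > TOOL_OUTPUT_LINE_LIMIT {
        let hidden = rows.len() - TOOL_OUTPUT_LINE_LIMIT;
        rows.truncate(TOOL_OUTPUT_LINE_LIMIT);
        rows.push(format!("+{} more lines; press v for details", hidden));
    }
    rows
}
```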

Threaded `RenderMode` through `ToolCell::Generic(...)` dispatch and renamed
`GenericToolCell::lines_with_motion` → `lines_with_mode(mode)` (sole caller).

Tests:
- `generic_tool_cell_preserves_multi_line_output_in_transcript` asserts each
  diff-stat file lands on its own row.
- `generic_tool_cell_caps_multi_line_output_in_live_with_affordance` pins the
  live cap + affordance + transcript-includes-everything contract.

Fixes #80
2026-04-26 13:44:51 -05:00
Hunter Bown ec92e535e8 feat(mcp): surface URL, status, body excerpt, transport on connect failure
Before: a failed MCP server connection just said "Failed to connect to SSE: 401" or
"Failed to spawn MCP server 'foo'" — devs had to enable RUST_LOG=debug to see
what actually went wrong.

Now:
- SSE failures show "MCP SSE rejected (transport=http url=... status=401):
  <body excerpt up to 200 bytes>", with userinfo + bearer tokens + api_key
  query params masked.
- stdio spawn failures show "MCP stdio spawn failed (transport=stdio
  server=foo cmd="..." args=[...] env_keys=[...])"; env values stay private
  and only the keys are shown.

Helpers `mask_url_secrets`, `redact_body_preview`, `bounded_body_excerpt` are
covered by 4 unit tests.
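Two of these helpers can be approximated with std only. The bodies below are hypothetical sketches that reuse the commit's helper names: this `mask_url_secrets` handles only userinfo and the `api_key` query parameter (bearer-token redaction is left out), and `bounded_body_excerpt` takes an explicit byte limit, which the error path would call with 200.

```rust
/// Hypothetical sketch: replace `user:pass@` userinfo with `***@` and the
/// value of any `api_key` query parameter with `***`. Fragments are not
/// treated specially.
pub fn mask_url_secrets(url: &str) -> String {
    let (prefix, rest) = match url.split_once("://") {
        Some((scheme, rest)) => (format!("{}://", scheme), rest),
        None => (String::new(), url),
    };
    // The authority (userinfo@host:port) ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(end);
    let authority = match authority.rsplit_once('@') {
        Some((_, host)) => format!("***@{}", host),
        None => authority.to_string(),
    };
    let tail = match tail.split_once('?') {
        Some((path, query)) => {
            let pairs: Vec<String> = query
                .split('&')
                .map(|pair| match pair.split_once('=') {
                    Some(("api_key", _)) => "api_key=***".to_string(),
                    _ => pair.to_string(),
                })
                .collect();
            format!("{}?{}", path, pairs.join("&"))
        }
        None => tail.to_string(),
    };
    format!("{}{}{}", prefix, authority, tail)
}

/// Cut `body` to at most `max_bytes` bytes without splitting a UTF-8 char.
pub fn bounded_body_excerpt(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    &body[..end]
}
```

Backing off to a char boundary matters because slicing a `&str` in the middle of a multi-byte character panics.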

Fixes #71
2026-04-26 13:40:07 -05:00
Hunter Bown 86f59cd2c2 merge: slash-menu scroll viewport fix (fixes #64) 2026-04-26 13:37:30 -05:00
Hunter Bown 320325e419 fix(tui): bump SLASH_MENU_LIMIT to 128 so the scroll viewport works
The composer's render path already paginates with center-tracking, but the
source list was hard-capped at 6 entries, so pressing the Down arrow past
index 5 had no entries to land on. Repro: with ~37 slash commands, pressing
Down repeatedly got stuck at the last visible row.

Bumping the source cap to 128 lets the existing viewport scroll logic
exercise the full filtered command list. No render-path change needed.
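Why the cap alone broke scrolling can be shown with a small model. `capped`, `move_down` and `viewport_top` are hypothetical helpers, and the center-tracking rule (keep the selection about half a viewport from the top, clamped at both ends) is an assumption about how the composer paginates.

```rust
pub const SLASH_MENU_LIMIT: usize = 128;

/// Cap the filtered command list before it reaches the render path.
pub fn capped<'a>(commands: &[&'a str], limit: usize) -> Vec<&'a str> {
    commands.iter().copied().take(limit).collect()
}

/// Move the selection down one row, clamped to the last entry.
pub fn move_down(selected: usize, total: usize) -> usize {
    if total == 0 { 0 } else { (selected + 1).min(total - 1) }
}

/// First visible row for a viewport of `height` rows that keeps the
/// selection near the middle (center-tracking), clamped at both ends.
pub fn viewport_top(selected: usize, total: usize, height: usize) -> usize {
    if total <= height {
        return 0;
    }
    selected.saturating_sub(height / 2).min(total - height)
}
```

With a cap of 6, `total` never exceeds the 6-row viewport, so `viewport_top` always returns 0 and `move_down` stops at index 5; with 128, all 37 commands are reachable and the viewport scrolls.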

Fixes #64
2026-04-26 13:37:29 -05:00
Hunter Bown feb3cf1e0c feat: explain parallel fan-out caps in tool descriptions and error messages (fixes #81) 2026-04-26 13:16:12 -05:00
Hunter Bown 38069700cc chore: wip capacity canonical state + tool alias deprecation 2026-04-26 13:11:57 -05:00
Hunter Bown 2adbe398ba merge: tool alias deprecation metadata (fixes #72) 2026-04-26 12:55:17 -05:00
Hunter Bown 4f18809d74 merge: V4 capacity priors (fixes #73) 2026-04-26 12:53:31 -05:00
Hunter Bown c58d10ded1 feat(tools): mark alias tools with deprecation metadata
Add a `wrap_with_deprecation_notice` helper in the subagent module that
merges a `_deprecation` block into a ToolResult's metadata. It is applied
only to alias invocations:

- `spawn_agent` → use `agent_spawn` (removed in v0.8.0)
- `delegate_to_agent` → use `agent_spawn` (removed in v0.8.0)
- `close_agent` → use `agent_cancel` (removed in v0.8.0)
- `send_input` → use `agent_send_input` (removed in v0.8.0)

Canonical names are unaffected. Each alias invocation also emits a
`tracing::warn` so the deprecation appears in audit logs. Documents
the deprecation schedule in `docs/TOOL_SURFACE.md`. Four unit tests
verify the notice shape and that canonical tools stay clean.
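A std-only sketch of the notice shape follows. The `ToolResult` here is a simplified stand-in (the real one likely carries JSON metadata), the `alias` / `use_instead` / `removed_in` keys are assumptions, and the `tracing::warn` call is omitted to keep the example dependency-free.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified ToolResult with nested string metadata.
#[derive(Debug, Default)]
pub struct ToolResult {
    pub content: String,
    pub metadata: BTreeMap<String, BTreeMap<String, String>>,
}

/// Alias -> (canonical tool, version in which the alias is removed).
fn alias_target(name: &str) -> Option<(&'static str, &'static str)> {
    match name {
        "spawn_agent" | "delegate_to_agent" => Some(("agent_spawn", "v0.8.0")),
        "close_agent" => Some(("agent_cancel", "v0.8.0")),
        "send_input" => Some(("agent_send_input", "v0.8.0")),
        _ => None,
    }
}

/// Merge a `_deprecation` block into the metadata for alias names only;
/// canonical names pass through untouched.
pub fn wrap_with_deprecation_notice(tool_name: &str, mut result: ToolResult) -> ToolResult {
    if let Some((replacement, removed_in)) = alias_target(tool_name) {
        let mut notice = BTreeMap::new();
        notice.insert("alias".to_string(), tool_name.to_string());
        notice.insert("use_instead".to_string(), replacement.to_string());
        notice.insert("removed_in".to_string(), removed_in.to_string());
        result.metadata.insert("_deprecation".to_string(), notice);
    }
    result
}
```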

Refs #72

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:32:26 -05:00
Hunter Bown cf9fdef9d8 fix(capacity): add V4 model priors and key normalization
Add deepseek_v4_pro (3.5) and deepseek_v4_flash (4.2) priors to
CapacityControllerConfig::default() so V4 models are no longer silently
mapped to the generic 3.8 fallback.

Extend normalize_model_prior_key to match v4-pro, v4_pro, v4-flash,
v4_flash, and deepseek-ai/-prefixed NIM identifiers before the
V3/reasoner branches to prevent cross-matches. V3 and reasoner fallbacks
are unchanged.
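The ordering rule can be sketched as below. Only the V4 patterns and the 3.5 / 4.2 / 3.8 priors come from this commit; the reasoner and V3 branches and the key strings are assumptions standing in for the existing code, so `default_prior` returns `None` for keys whose values the commit does not state.

```rust
/// Hypothetical sketch of prior-key normalization. V4 patterns are tested
/// first so a V4 name never falls through to a broader V3/reasoner match.
/// `contains` also covers `deepseek-ai/`-prefixed identifiers.
pub fn normalize_model_prior_key(model: &str) -> &'static str {
    let m = model.to_ascii_lowercase();
    if m.contains("v4-pro") || m.contains("v4_pro") {
        "deepseek_v4_pro"
    } else if m.contains("v4-flash") || m.contains("v4_flash") {
        "deepseek_v4_flash"
    } else if m.contains("reasoner") {
        "deepseek_reasoner" // assumed pre-existing branch
    } else if m.contains("v3") || m.contains("chat") {
        "deepseek_v3" // assumed pre-existing branch
    } else {
        "fallback"
    }
}

/// Priors stated in the commit; V3/reasoner values are not given there.
pub fn default_prior(key: &str) -> Option<f64> {
    match key {
        "deepseek_v4_pro" => Some(3.5),
        "deepseek_v4_flash" => Some(4.2),
        "fallback" => Some(3.8),
        _ => None,
    }
}
```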

Add deepseek_v4_pro_prior / deepseek_v4_flash_prior fields to
CapacityConfig (config.toml) and DEEPSEEK_CAPACITY_PRIOR_V4_PRO /
DEEPSEEK_CAPACITY_PRIOR_V4_FLASH env-var overrides, matching the
existing V3 pattern.
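A possible resolution order for these knobs is sketched below, assuming the env var wins over config.toml, which in turn wins over the built-in default, and assuming an unparseable env value is ignored rather than treated as an error. The environment is passed in as a map so the sketch is testable; real code would read `std::env::var`. `resolve_prior` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Resolve one capacity prior: env override, then config.toml, then default.
pub fn resolve_prior(
    env: &HashMap<String, String>,
    env_key: &str,
    config_value: Option<f64>,
    default: f64,
) -> f64 {
    env.get(env_key)
        .and_then(|v| v.trim().parse::<f64>().ok()) // skip unparseable values
        .or(config_value)
        .unwrap_or(default)
}
```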

Refs #73

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:28:21 -05:00
Hunter Bown e9970fcad3 ci: switch npm publish to NPM_TOKEN + add auto-tag workflow
The OIDC Trusted Publisher path for npm has 404'd on PUT for v0.5.1,
v0.5.2, and v0.6.1, even with valid OIDC tokens. Switch publish-npm and
publish-npm-manual to token auth via the NPM_TOKEN repo secret (a granular
access token scoped to deepseek-tui with publish permission) so future
releases ship reliably.

Also add .github/workflows/auto-tag.yml: when the workspace version on
main changes, push the matching v$VERSION tag automatically so release.yml
fires without a manual tag push. Requires a RELEASE_TAG_PAT secret to
trigger downstream workflows (GITHUB_TOKEN tag pushes don't trigger
on: push: tags by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:22:15 -05:00
Hunter Bown e1ac84ae44 release: v0.6.1 — pricing update, remove light theme + theme setting
- V4 cache-hit input prices cut to 1/10th per DeepSeek pricing update:
  Pro promo 0.03625→0.003625, Pro base 0.145→0.0145, Flash 0.028→0.0028
- Remove the 'light' theme variant (Variant::Light, Theme::light(), test)
- Remove the theme setting entirely — hardcode UI_THEME to whale/dark,
  drop the theme field from Settings, ConfigView, and config command
- Bump workspace version 0.6.0 → 0.6.1 (Cargo.toml, npm pkg, CHANGELOG)
- De-cringe the README: drop emojis, marketing fluff, unverified claims
2026-04-26 11:56:41 -05:00