Adds the true Recursive Language Model (RLM) inference paradigm:
- rlm/mod.rs — module root with public API
- rlm/prompt.rs — RLM system prompt teaching the model to write code
- rlm/turn.rs — Algorithm 1 implementation:
  - P stored as REPL variable (NEVER in LLM context window)
  - Metadata-only context sent to root LLM (constant-size)
  - LLM generates Python code, not free text
  - Code executed in PythonRuntime with llm_query() for recursion
  - FINAL() detection ends the loop
- Op::RlmQuery variant in ops.rs
- /rlm command in the command system
- AppAction::RlmQuery handler in ui.rs
- PythonRuntime::with_state_path made public for RLM integration
- 18 new unit tests for code extraction, metadata building, truncation
Key differences from previous 'RLM-inspired' approach:
✅ P is external (REPL variable), not in LLM context
✅ Only metadata(state) in LLM context (constant-size)
✅ LLM generates code, not free text + tool calls
✅ sub-LLM recursion via llm_query() inside REPL code
✅ FINAL() mechanism for programmatic termination
After the assistant message is persisted, when tool_uses is empty,
check for inline ```repl blocks and execute them via PythonRuntime:
- Extract REPL blocks from assistant text
- Spawn PythonRuntime and execute each block sequentially
- If a round returns FINAL: replace the assistant message text with
the final value and break the turn
- If no FINAL: append truncated stdout/stderr as user feedback and
continue the turn loop for iterative refinement
- Emit status events so the user sees 'REPL round N: ...' in the UI
All 26 REPL tests + RLM tests pass. Release build verified.
Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2
Port of codex-rs's `bottom_pane/pending_input_preview.rs` for the queued /
pending steer / rejected steer surface. Phase 1 ships the widget + 7 unit
tests in isolation so reviewers can evaluate the rendering decisions
without also reviewing the composer-area integration. Phase 2 wires it
into `ui.rs` and threads the `pending_steers` / `rejected_steers` fields
onto `App`.
The widget renders three semantic buckets when any are non-empty:
• Messages to be submitted after next tool call (press Esc to send now)
  ↳ <pending steer>
  ↳ <pending steer>
• Messages to be submitted at end of turn
  ↳ <rejected steer>
• Queued follow-up inputs
  ↳ <queued message>
    Alt+↑ edit last queued message
Items truncate to 3 visible rows with a `…` overflow indicator. Long
URL-like tokens emit on their own row instead of fanning out into junk
ellipsis rows (regression test included). Empty state renders zero rows
so the composer doesn't gain wasted height when nothing is queued.
Refs #85.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported: "the context % at the top is pretty inconsistent — like I
just had a message where it was 31% then I sent another message and it
went to 9%? not sure how that works......"
Root cause: `context_usage_snapshot` preferred `app.last_prompt_tokens`
(reported, from `Event::TurnComplete.usage`) over the estimate computed
from `app.api_messages`. The engine populates that usage via
`turn.add_usage`, which SUMS `input_tokens` across every round in a turn:
```
pub fn add_usage(&mut self, usage: &Usage) {
self.usage.input_tokens += usage.input_tokens;
...
}
```
So a multi-round tool-call turn reports a value much larger than the
actual context window state (e.g., 100k from round 1 + 210k from round 2
= 310k displayed as 31% of 1M), then the next single-round turn drops
back to a single round's input_tokens (e.g., 90k displayed as 9%).
Fix: prefer the estimate, which is computed from the current
`api_messages` and grows monotonically with the conversation. Reported tokens
fall back only when no estimate is available (e.g., immediately after a
session restore). Also clamp `used` to the model's context window so the
ratio never exceeds 100%.
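
A minimal sketch of the selection-and-clamp rule described above. The function name and the plain `u64` token counts are assumptions for illustration; the real `context_usage_snapshot` signature is not shown in this message.

```rust
/// Hypothetical helper: choose the token count to display and turn it into a ratio.
/// `estimated` stands in for the count derived from `api_messages`;
/// `reported` stands in for `last_prompt_tokens`.
fn context_usage_ratio(estimated: Option<u64>, reported: Option<u64>, window: u64) -> Option<f64> {
    if window == 0 {
        return None;
    }
    // Prefer the estimate; use the reported value only when no estimate exists.
    let used = estimated.or(reported)?;
    // Clamp so the ratio never exceeds 100%.
    let used = used.min(window);
    Some(used as f64 / window as f64)
}
```

With an estimate present, an inflated reported value no longer influences the displayed percentage.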
`is_reported_context_inflated` is no longer in the primary path; kept
behind `#[allow(dead_code)]` because existing tests still exercise it
and a future heuristic may want to distinguish "obviously inflated
reported tokens" from healthy reports.
Regression test
`context_usage_does_not_drop_when_reported_shrinks_after_multi_round_turn`
exercises the exact 31% → 9% scenario the user hit.
Fixes #115.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Save clipboard images as PNG under ~/.deepseek/clipboard-images/
instead of PPM in the workspace, and surface dimensions + size in the
composer's [Attached image: WxH PNG (NkB) at <path>] token plus the
post-paste status hint. DeepSeek V4 does not currently accept inline
image input on its Chat Completions endpoint, so we materialize the
bytes to disk and let the model reach them via the existing file
tools rather than base64-embedding them in the request.
Adds the `image` crate (PNG-only feature; already pulled in
transitively via arboard, so no compile-time delta) plus unit tests
covering PNG header round-trip and label formatting.
Fixes #92
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a modal overlay (`FilePickerView`) bound to Ctrl+P from the composer
when no other modal is open and the engine is not streaming.
* Single-pass `WalkBuilder` walk at construction (depth 6, hidden=true,
follow_links=false, .gitignore honored) caches workspace-relative paths
so per-keystroke filtering is fully in-memory.
* Custom subsequence scorer with start/boundary bonuses, consecutive-run
reward, and gap penalty. ~70 lines, no new crate dependency.
* Up/Down + PgUp/PgDn navigate; Backspace and Ctrl+U edit the query;
Enter emits `ViewEvent::FilePickerSelected` which the UI handler
inserts at the composer cursor as `@<path>` (with surrounding spaces
so the existing `@`-mention parser picks it up); Esc closes without
modifying the composer.
* Ten unit tests cover the scorer (subsequence / boundary / case /
empty-query edge cases) and the view (typing narrows, backspace
widens, Enter emits, Esc closes, `.ignore` is honored).
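
For reviewers who want the shape of the scorer without opening the diff, here is a simplified greedy sketch. The weights, the boundary character set, and the name `fuzzy_score` are illustrative assumptions, not the values used in `FilePickerView`.

```rust
/// Greedy subsequence scorer. Returns `None` when `query` is not a
/// case-insensitive subsequence of `candidate`. All weights are illustrative.
fn fuzzy_score(query: &str, candidate: &str) -> Option<i32> {
    let cand: Vec<char> = candidate.chars().collect();
    let mut score = 0i32;
    let mut pos = 0usize; // next candidate index to inspect
    let mut last_match: Option<usize> = None;

    for q in query.chars() {
        // Find the next candidate char matching `q`, ignoring ASCII case.
        let idx = (pos..cand.len()).find(|&i| cand[i].eq_ignore_ascii_case(&q))?;
        score += 1; // base reward per matched char
        if idx == 0 {
            score += 8; // start-of-path bonus
        } else if matches!(cand[idx - 1], '/' | '_' | '-' | '.') {
            score += 5; // word-boundary bonus
        }
        match last_match {
            Some(prev) if idx == prev + 1 => score += 3, // consecutive-run reward
            Some(prev) => score -= (idx - prev - 1) as i32, // gap penalty
            None => {}
        }
        last_match = Some(idx);
        pos = idx + 1;
    }
    Some(score)
}
```

Under these weights `main` ranks `src/main.rs` above `src/domain.rs`, because the match in the first path starts on a `/` boundary.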
Fixes #97
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the chunked-transfer connection to DeepSeek dies mid-stream — the
"Stream read error: error decoding response body" symptom — the engine
previously surfaced the error to the user and ended the turn as Failed,
even when no useful content had been received. The user's only recourse
was to manually re-send the same message.
Phase 3 closes that loop. After the inner stream-consumption loop ends,
detect "stream died with nothing actionable":
- stream_errors > 0 (the stream errored at some point)
- tool_uses.is_empty() (no tool call landed)
- current_text_visible is empty/whitespace
- current_thinking is empty/whitespace
- !pending_message_complete
If all hold AND stream_retry_attempts < MAX_STREAM_RETRIES (3), re-issue
the SAME outer-loop iteration without failing the turn: rebuild the
request from self.session.messages, call create_message_stream again, and
start a fresh inner loop. Surface a "Connection interrupted; retrying
(N/3)" status so the user knows something is happening, but don't trip
the engine-level Error event, so the failure isn't also displayed as a
History cell.
Healthy rounds (stream_errors == 0) reset the retry budget so a single
proxy hiccup doesn't poison subsequent rounds in the same turn.
Crucially: if we got partial output (any tool call, any visible text,
or any thinking), we DON'T retry. Re-running the request would
double-bill the user; ship the partial state to the rest of the turn
pipeline (existing tool execution, content_blocks finalization) and
let the agent loop continue.
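
The retry gate above can be read as a single predicate. This is a sketch only: `RoundState` and its field types are assumed stand-ins for the engine's locals, which this message names but does not define.

```rust
const MAX_STREAM_RETRIES: u32 = 3;

/// Hypothetical snapshot of the per-round state named in the checklist above.
#[derive(Clone, Copy)]
struct RoundState<'a> {
    stream_errors: u32,
    tool_use_count: usize,
    visible_text: &'a str,
    thinking: &'a str,
    pending_message_complete: bool,
    stream_retry_attempts: u32,
}

/// True only when the stream died with nothing actionable and budget remains.
fn should_retry_stream(s: &RoundState) -> bool {
    let nothing_actionable = s.stream_errors > 0
        && s.tool_use_count == 0
        && s.visible_text.trim().is_empty()
        && s.thinking.trim().is_empty()
        && !s.pending_message_complete;
    nothing_actionable && s.stream_retry_attempts < MAX_STREAM_RETRIES
}
```

Any partial output (a tool call, visible text, or thinking) makes the predicate false, which is what keeps a retry from double-billing.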
Combined with #103 Phase 1+2 (TCP/HTTP2 keepalives + diagnostic logging
in client.rs), this should turn the user-visible "Turn failed: Stream
read error" into either a fully-recovered turn OR a clearly-labeled
3-attempts-exhausted failure.
Refs #103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Polish pass on the existing pager search loop. The infrastructure was
already there (`/` opens search, type query, Enter commits, `n`/`N` cycle,
`match X/Y (n/N)` status row gets pushed) but had three rough edges that
kept it short of the codex pager-overlay parity that #96 asks for:
1. **Status row clipped on small popup heights.** `visible_height` was
`popup_area.height - 2` (borders only). With `Padding::uniform(1)` on
the block we actually have 4 rows of overhead, not 2 — so the status
row got pushed past the viewport on shorter pagers and the user never
saw match-count feedback. Subtract 4, then reserve another row for
the status when matches exist.
2. **Matched lines weren't visually distinguished.** Searching jumped the
scroll to the match but the line itself rendered the same as
surrounding rows. Now the current match row gets a Yellow/Black
bold background; other matches get a DarkGray/Yellow background.
Per-substring highlighting (preserving the original spans' styling)
is deferred — the all-row highlight is enough to navigate and avoids
the substring-styling-vs-pre-styled-spans interaction that needs its
own design pass.
3. **Esc in the search prompt left stale matches behind.** Pressing `/`
then Esc to bail now also clears `search_input` / `search_matches` /
`search_index`, returning the pager to a clean, un-highlighted view.
This matches Codex. To resume, the user presses `/` again and retypes
the query.
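
The row arithmetic from item 1, as a small sketch. The function name and signature are assumed for illustration; only the overhead of 4 rows and the extra status row come from the description above.

```rust
/// Rows available for pager content inside a bordered block with
/// `Padding::uniform(1)`: 2 border rows + 2 padding rows of overhead,
/// plus one more row reserved for the status line when matches exist.
fn visible_height(popup_height: u16, has_matches: bool) -> u16 {
    let overhead = 4 + u16::from(has_matches);
    // Saturate so very short popups yield 0 rows instead of underflowing.
    popup_height.saturating_sub(overhead)
}
```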
4 new tests (`search_finds_matches_and_renders_match_counter`,
`esc_in_search_mode_clears_matches`, `n_and_capital_n_cycle_matches_with_wrap`,
`matched_lines_get_highlight_background`). 22/22 pager tests pass.
Fixes #96.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `KeyBinding` widget at `crates/tui/src/tui/widgets/key_hint.rs` that
renders chord shortcuts in the host platform's notation: `⌥+X` on macOS
and `alt+X` on Linux/Windows. Constructors `plain`, `alt`, `shift`,
`ctrl`, `ctrl_alt` plus `KeyBinding::new` for arbitrary modifier sets.
Both a `Display` impl and `From<KeyBinding> for Span<'static>` are
provided so call sites can use the type in plain `format!` strings as
well as in ratatui line/span builders.
The Help view now renders its keybinding hints through `KeyBinding`
instead of hardcoded `Alt+`/`Shift+`/`Ctrl+` strings, so macOS users see
`⌥+↑` etc. matching every other Mac app.
Windows AltGr handling: `is_altgr()` flags `Ctrl+Alt` (the way
European-layout AltGr keypresses arrive on Windows) and
`has_ctrl_or_alt()` exposes a "real modifier" predicate that returns
`false` for AltGr. The composer's plain-char detector in `ui.rs` now
uses `has_ctrl_or_alt`, so AltGr-typed glyphs (`@`, `\`, `|`, …) on
European keyboards reach the input buffer instead of being swallowed as
modified shortcuts. Distinguishing left-vs-right Alt keys is not
portable across crossterm backends; `Ctrl+Alt+<char>` chords are
unbindable on Windows as a result, and that limitation is documented in
the module comment.
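
A sketch of the two predicates, under stated assumptions: `Mods` is a hypothetical stand-in for crossterm's modifier flags, and the platform is passed in as a parameter here so the logic can be exercised on any host (the real code may branch at compile time instead).

```rust
/// Hypothetical stand-in for the key event's modifier flags.
#[derive(Clone, Copy)]
struct Mods {
    ctrl: bool,
    alt: bool,
}

/// On Windows, AltGr keypresses arrive as Ctrl+Alt; elsewhere nothing is flagged.
fn is_altgr(m: Mods, on_windows: bool) -> bool {
    on_windows && m.ctrl && m.alt
}

/// "Real modifier" predicate: false for AltGr, so AltGr-typed glyphs
/// are treated as plain characters by the composer.
fn has_ctrl_or_alt(m: Mods, on_windows: bool) -> bool {
    (m.ctrl || m.alt) && !is_altgr(m, on_windows)
}
```

The trade-off is visible in the second function: on Windows a genuine `Ctrl+Alt+<char>` chord is indistinguishable from AltGr and is therefore treated as text.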
Tests: 10 unit tests cover plain/alt/shift/ctrl/ctrl_alt rendering,
modifier order, function keys, lowercase normalization, the `Span`
conversion, `is_press` semantics, and AltGr platform branching.
Fixes #89
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a codex-rs-style "Press Ctrl+C again to quit" prompt so a single
stray Ctrl+C in idle state no longer kills the session.
State is held on `App::quit_armed_until: Option<Instant>` and threaded
through four small helpers (`arm_quit`, `quit_is_armed`, `disarm_quit`,
`tick_quit_armed`). The redraw loop calls `tick_quit_armed` each pass
and caps `event::poll` at the deadline so the prompt expires on time
even when no input arrives.
Behavior:
- First Ctrl+C in idle state: arm a 2s window; footer shows
"Press Ctrl+C again to quit" (warning color) overriding any active
status toast.
- Second Ctrl+C inside the window: clean shutdown via `Op::Shutdown`.
- Ctrl+C after the window has expired (e.g., 3 seconds later): re-arms
instead of quitting.
- Ctrl+C while a turn is in flight (`is_loading`): unchanged — still
cancels the turn, and explicitly disarms the quit prompt.
Tests cover the timer lifecycle: default-disarmed, arm-sets-2s-window,
disarm-clears-and-redraws, tick-no-op-within-window, tick-clears-after-
expiry, and re-arm-after-expiry-starts-fresh-window.
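
The timer lifecycle can be sketched in isolation. `QuitState` and `on_ctrl_c` are hypothetical names; the real state lives on `App`, and the clock is passed in here as an assumption to keep the sketch deterministic.

```rust
use std::time::{Duration, Instant};

const QUIT_WINDOW: Duration = Duration::from_secs(2);

/// Hypothetical stand-in for `App::quit_armed_until` and its helpers.
#[derive(Default)]
struct QuitState {
    armed_until: Option<Instant>,
}

impl QuitState {
    fn arm(&mut self, now: Instant) {
        self.armed_until = Some(now + QUIT_WINDOW);
    }

    fn is_armed(&self, now: Instant) -> bool {
        self.armed_until.map_or(false, |deadline| now < deadline)
    }

    fn disarm(&mut self) {
        self.armed_until = None;
    }

    /// Called on each redraw pass: clears the prompt once the window has expired.
    fn tick(&mut self, now: Instant) {
        if !self.is_armed(now) {
            self.disarm();
        }
    }

    /// Idle-state Ctrl+C: returns true when this press should quit.
    fn on_ctrl_c(&mut self, now: Instant) -> bool {
        if self.is_armed(now) {
            true
        } else {
            self.arm(now); // first press, or a press after expiry, starts a fresh window
            false
        }
    }
}
```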
Fixes #90
Bind Ctrl+E in the composer to suspend the TUI, spawn $VISUAL/$EDITOR
(fallback `vi`) on a temp file pre-populated with the composer's current
contents, then read the file back into the composer on save.
- New `crates/tui/src/tui/external_editor.rs`:
- `spawn_editor_for_input` toggles raw mode + alt-screen + mouse +
bracketed-paste before/after the edit and forces a `terminal.clear()`
on return so a SIGWINCH during the edit doesn't leave a stale viewport.
- `run_editor_raw` is the testable core: writes seed -> spawns editor ->
reads back -> returns `Edited(new) | Unchanged | Cancelled`. Uses
`tempfile::NamedTempFile` so the temp file is unlinked on every path
(success, non-zero exit, missing binary, IO error).
- `tui/ui.rs`: split the existing `End | Ctrl+e` cursor-end keybinding
so `End` still moves the cursor to the line end, and `Ctrl+E` now
spawns the external editor. Status-line feedback: "Edited in <editor>",
"Editor closed (no changes)", or "Editor cancelled".
- 7 new unit tests cover the resolver precedence (VISUAL > EDITOR > vi),
the no-op / failure / missing-binary / edited paths, and an explicit
temp-file-cleanup assertion via a captured editor argv.
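
The resolver precedence is small enough to sketch. The function name, the choice to take the two values as arguments rather than read the environment, and the treatment of blank values as unset are all assumptions made for this illustration.

```rust
/// Resolve the editor command with precedence VISUAL > EDITOR > `vi`.
/// Blank values are treated as unset (an assumption of this sketch).
fn resolve_editor(visual: Option<&str>, editor: Option<&str>) -> String {
    visual
        .into_iter()
        .chain(editor)
        .map(str::trim)
        .find(|s| !s.is_empty())
        .unwrap_or("vi")
        .to_string()
}
```

A caller would pass `std::env::var("VISUAL").ok().as_deref()` and the `EDITOR` equivalent, which keeps the precedence logic testable without touching process-wide state.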
Fixes #91
Rework `FooterWidget::status_line_spans` so the footer never wraps
mid-hint at any width. Hints now drop in priority order:
1. mode label (always visible; truncated only as a last resort)
2. model name (always visible alongside mode; truncated mid-word only
after status has already been dropped)
3. status label ("working", "draft", "refreshing context", ...) — drops
first when space is tight
Previously the model name would ellipsize the moment a long status
label crowded the line, even at 60–80 columns. The new tier system
keeps mode + model intact down to ~25 cols and only falls back to
mode-only on extreme narrow widths.
Includes snapshot-style tests at widths 40, 60, 80, 100, 120 covering
the full / drop-status / truncate-model / mode-only tiers.
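
A rough sketch of the four tiers, for orientation only. It counts `char`s rather than terminal display width, uses a ` · ` separator, and returns a `String` instead of spans; all three are simplifying assumptions, as are the function names.

```rust
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Truncate to at most `max` chars, ending in an ellipsis when shortened.
fn truncate(s: &str, max: usize) -> String {
    if char_len(s) <= max {
        return s.to_string();
    }
    if max == 0 {
        return String::new();
    }
    let mut out: String = s.chars().take(max - 1).collect();
    out.push('…');
    out
}

/// Tiers, in order: full, drop-status, truncate-model, mode-only.
fn footer_line(mode: &str, model: &str, status: &str, width: usize) -> String {
    let full = format!("{} · {} · {}", mode, model, status);
    if char_len(&full) <= width {
        return full;
    }
    let no_status = format!("{} · {}", mode, model);
    if char_len(&no_status) <= width {
        return no_status;
    }
    let prefix = format!("{} · ", mode);
    let room = width.saturating_sub(char_len(&prefix));
    if room >= 2 {
        // At least one model char plus the ellipsis still fits.
        return format!("{}{}", prefix, truncate(model, room));
    }
    truncate(mode, width)
}
```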
Fixes #88
Two fixes for the persistent "Stream read error: error decoding response
body" we saw mid-turn during long V4-pro thinking sessions.
1) HTTP transport tuning (`crates/tui/src/client.rs`):
- Drop the blanket 300s request timeout. Long V4 thinking turns
legitimately exceed the wall-clock window; per-chunk and per-stream
guards in `engine.rs` already bound how long we wait without progress.
- Add `tcp_keepalive(30s)` so dead-peer detection happens at the TCP
layer instead of waiting for the application to notice.
- Add `http2_keep_alive_interval(15s)` + `http2_keep_alive_timeout(20s)`
so HTTP/2 connections to DeepSeek's edge don't go silent and get
killed by an upstream proxy mid-thinking.
2) Stream-error diagnostics (`crates/tui/src/client/chat.rs`):
- Walk reqwest's `std::error::Error::source()` chain when a chunk read
errors, so the underlying hyper / h2 / io error is logged. Without
this the outer "error decoding response body" message tells us
nothing about WHY the stream died.
- Track elapsed wall time, bytes received so far, and ms since the
last successful event; log them alongside the error chain. Lets us
tell HTTP/2 RST_STREAM mid-idle from chunk-decode-failure on a
short stream from gzip-corruption mid-burst.
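
The source-chain walk relies only on `std::error::Error::source()`. The sketch below uses two toy error types as stand-ins for the reqwest and lower-level layers; `error_chain`, `Outer`, and `Inner` are hypothetical names, not code from `chat.rs`.

```rust
use std::error::Error;
use std::fmt;

/// Collect an error's message followed by every `source()` beneath it.
fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = Vec::new();
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        out.push(e.to_string());
        current = e.source();
    }
    out
}

// Toy error types standing in for the outer decode error and its cause.
#[derive(Debug)]
struct Inner;

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "connection reset")
    }
}

impl Error for Inner {}

#[derive(Debug)]
struct Outer(Inner);

impl fmt::Display for Outer {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "error decoding response body")
    }
}

impl Error for Outer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```

Logging the joined chain (for example with `error_chain(&err).join(": ")`) puts the underlying cause next to the otherwise opaque outer message.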
Phase 3 (transparent retry with `prefix` continuation) is intentionally
NOT in this PR. The retry-flag plumbing on MessageRequest + chat.rs prefix
wire format + engine.rs retry loop is a meaningful surface that deserves
its own review pass; this PR ships the diagnostic-and-resilience floor so
we can land the harder retry work knowing the underlying network state is
better.
Refs #103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parity CI's `cargo clippy ... -D warnings` rejects the nested `if let`
pattern in `expand_mention_home` under the new clippy::collapsible_if
lint (rust-clippy 1.95). Use the chained `if let ... && let ...` form
the lint suggests. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five issues from the PR #105 review pass:
1. Lazy file-tree index for fuzzy_resolve (HIGH, Gemini)
The previous `fuzzy_resolve` walked the workspace up to depth 6 on every
miss. A user typing several non-existent paths in one message could
trigger multiple disk-intensive walks. Replace with a `OnceLock`-backed
basename → paths index, built once on first miss and reused thereafter.
2. Cache cwd at construction (MEDIUM, Gemini)
`std::env::current_dir()` was a syscall on every `resolve` call (up to
8 mentions per message). Capture once in `Workspace::new()` and store
in the struct.
3. Include directories in fuzzy match (MEDIUM, Gemini)
The mention system supports directory listings, but fuzzy_resolve was
restricted to `is_file()`. Allow directories too.
4. Drop `Path::canonicalize` from the mention loop (MEDIUM, Gemini)
`Workspace::resolve` already returns absolute paths when the workspace
root is absolute (always true in TUI use). Removed the per-mention
`canonicalize` syscall on the message-send hot path. The rare
symlink-aliasing dedup miss is an acceptable cost.
5. Gate <missing-file> blocks through dedup set (Devin)
The `Err` arm in `local_context_from_file_mentions` pushed a
`<missing-file>` block before reaching the `seen.insert` check, so the
same non-existent mention typed twice produced duplicate blocks and
wasted prompt tokens. Restructured the loop so all blocks (existing
AND missing) flow through the dedup gate.
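
Item 1's build-once pattern, sketched with `std::sync::OnceLock`. The `FileIndex` type is hypothetical, the path list is supplied up front instead of coming from a workspace walk, and the `builds` counter exists only to make the single build observable.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical index: basename -> every workspace-relative path with that basename.
struct FileIndex {
    paths: Vec<String>, // stands in for the result of the workspace walk
    by_basename: OnceLock<HashMap<String, Vec<String>>>,
    builds: AtomicUsize, // instrumentation for this sketch only
}

impl FileIndex {
    fn new(paths: Vec<String>) -> Self {
        FileIndex {
            paths,
            by_basename: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    /// The map is built on the first lookup and reused on every later one.
    fn lookup(&self, basename: &str) -> &[String] {
        let index = self.by_basename.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            let mut map: HashMap<String, Vec<String>> = HashMap::new();
            for p in &self.paths {
                let base = p.rsplit('/').next().unwrap_or(p.as_str());
                map.entry(base.to_string()).or_default().push(p.clone());
            }
            map
        });
        index.get(basename).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```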
Bonus: replaced the test that was mutating the global cwd (race-prone
across tests) with one that constructs `Workspace::with_cwd` explicitly.
Added a second test exercising the lazy-index + directory-match paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `cargo fmt --all -- --check` rejected two long expressions in
`crates/tui/src/tools/rlm_query.rs` (the per-child structured-log match
arm, and the `Ok(Ok(response))` extract_text invocation). Reformatted to
match rustfmt's preferred multi-line block style. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TUI's `EngineEvent::Error` handler in `ui.rs` matched on `{ message, .. }`
and unconditionally set `app.offline_mode = true`. This meant any transient
stream-disconnect (e.g., the chunked-transfer connection getting closed during
a long V4 thinking turn with no SSE keepalive) flipped the session offline,
queued the user's next message, and forced them to recover manually — even
though the engine had already classified the error as recoverable.
The engine has been emitting `Event::error(message, recoverable)` with the
correct boolean since the error-taxonomy work in #66. Stream stalls
(engine.rs:2286), max-duration aborts (:2322), max-bytes aborts (:2334), and
upstream stream errors (:2344) all set `recoverable = true`. Hard failures
like sub-agent spawn failures (:1202) and post-recovery context overflows
(:1378, :1559) set `recoverable = false`. The UI just wasn't reading it.
Pull the body out into a `pub(crate) fn apply_engine_error_to_app` helper so
the branch logic is unit-testable from `ui/tests.rs`, then split:
- `recoverable = true` → status: "Connection interrupted: …"; stay online.
- `recoverable = false` → status: "Engine error; queued messages stay
pending: …"; flip into offline mode.
Add two regression tests covering both branches.
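
The branch split, sketched against a reduced stand-in for `App`. `AppState` and the exact field names are assumptions; the status strings follow the two bullets above.

```rust
/// Hypothetical reduced stand-in for the parts of `App` the helper touches.
#[derive(Default)]
struct AppState {
    offline_mode: bool,
    status: String,
}

/// Recoverable errors only update the status; hard failures also go offline.
fn apply_engine_error(app: &mut AppState, message: &str, recoverable: bool) {
    if recoverable {
        app.status = format!("Connection interrupted: {}", message);
    } else {
        app.status = format!("Engine error; queued messages stay pending: {}", message);
        app.offline_mode = true;
    }
}
```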
Fixes #86
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the `publish-npm` job from `release.yml`. It has been failing on
every release with `npm error code EOTP` because the configured `NPM_TOKEN`
doesn't bypass 2FA. Manual publish from a developer machine is the actual
ship path; codify that.
- Update `docs/RELEASE_RUNBOOK.md` "npm Wrapper Release" to describe the
manual flow (`npm publish --access public` + OTP) and explain why the auto
path is gone, with a recovery note for future Trusted-Publishing migration.
- Refresh stale cross-reference comment in `publish-npm.yml` (the workflow
remains as inert plumbing for an eventual Trusted Publishing setup).
- Stop tracking `docs/DeepSeek_V4.pdf` (4.4 MB). It was never referenced
outside test fixture filenames; the tests synthesize their own fake PDF.
Add to `.gitignore` so a local copy can sit there without nagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI Documentation job was failing with `error: unclosed HTML tag 'no'` at
crates/tui/src/mcp.rs:71. Rustdoc parses unescaped `<no body>` as an HTML
tag and `RUSTDOCFLAGS=-Dwarnings` (used in CI) promotes that to a hard
error. Reword the comment to refer to the literal string instead.
User feedback after v0.6.2 dogfooding: "we'd be better off simplifying and
removing guardrails." Two changes that meaningfully shrink the surface:
1. TURN_MAX_OUTPUT_TOKENS: 32_768 → 262_144 (256K).
V4 thinking models can produce tens of thousands of reasoning tokens
on hard prompts before the visible reply, and DeepSeek V4 has a 1M
context window. 32K was tight for that workload (showed up as the
model "stopping mid-response" once reasoning exhausted the budget).
256K is generous enough that the per-turn ceiling effectively never
bites in normal use.
2. CapacityControllerConfig::enabled: true → false.
The controller's main intervention, `TargetedContextRefresh`, runs
`compact_messages_safe` which rewrites the live conversation —
visually identical to the agent "restarting" mid-turn. The failure
mode it protects against (context overflow) is rare in practice and
self-correcting (the model surfaces a clear error). Power users on
V4 do not need the guardrail; users who do can re-enable it via
`capacity.enabled = true` in `~/.deepseek/config.toml`.
Tests:
- context_budget_reserves_output_and_headroom: switched fixture model
to deepseek-v4-pro (1M context) so the 256K reservation doesn't
saturate the budget to zero.
- cooldown_blocks_repeated_action: explicitly enables the controller
(the cooldown logic short-circuits when disabled).
`cargo clippy --workspace -- -D warnings` is clean; the full test suite is
green (990 + adjacent crate tests).
Two fixes folded into one commit (the parity failure was blocking the
v0.6.2 npm publish, the strip fix is the dogfooding follow-up):
1. cargo fmt --all: subagent/mod.rs (long timeout wrapper) was over the
line-length budget when committed earlier; rustfmt rewraps it. CI
parity (`cargo fmt -- --check`) was failing the release pipeline.
2. footer working-strip stays visible for the entire turn: previously
the strip only animated while `is_loading || is_compacting ||
running_agents > 0`. Between LLM rounds inside a single turn (tool
execution, reasoning replay, capacity refresh) `is_loading` flickers
off — and so the user saw the strip vanish for seconds at a time
even though the agent was clearly still working. Widen the gate to
ALSO include `runtime_turn_status == Some("in_progress")`, which
only clears when `EngineEvent::TurnComplete` fires — so the strip
now stays lit for the whole turn duration.
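
The widened gate from item 2 reduces to one boolean expression. The free-function form and parameter types below are assumptions; in the TUI these are fields read from app state.

```rust
/// Sketch of the widened visibility gate for the footer working-strip.
fn working_strip_visible(
    is_loading: bool,
    is_compacting: bool,
    running_agents: usize,
    runtime_turn_status: Option<&str>,
) -> bool {
    is_loading
        || is_compacting
        || running_agents > 0
        // New clause: stays true between LLM rounds until the turn completes.
        || runtime_turn_status == Some("in_progress")
}
```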
User repro: V4 thinking on hard prompts (~107s of thinking) randomly
"stops mid-response", more often when starting in Agent mode and
switching to YOLO. Two ceilings were too tight:
1. TURN_MAX_OUTPUT_TOKENS: 4096 → 32768
`reasoning_content` from V4 thinking can easily exceed 4K tokens on
hard problems. Once the per-turn output budget exhausts, the API
closes the SSE stream with `finish_reason: "length"` and the visible
reply ends up empty — surfaced as the assistant "stopping randomly".
32K leaves comfortable headroom for thinking + the visible reply on
every realistic turn while staying well below DeepSeek V4's
1M-context output ceiling.
2. max_steps: 100 → u32::MAX (effectively unlimited)
100 was hitting the ceiling on long multi-step plans (wide refactors,
sub-agent orchestration) and presenting as the agent "giving up
mid-task". V4's 1M context window means there's no good reason to
cap steps administratively. Users can still interrupt with Ctrl+C /
Esc; a turn naturally ends when the model stops emitting tool calls.
All 54 turn tests pass; full workspace clippy + fmt remain clean.
Two tuning fixes for issues observed in v0.6.2 dogfooding:
#63 follow-up — sidebar panels still empty in compact terminals:
`section_padding: Padding::uniform(1)` ate two rows of every sidebar
panel (one above content, one below). At the 25% layout split, in
terminals around 12-15 rows tall, Plan/Todos/Tasks each get only
3 rows total — borders take 2, vertical padding takes 2, leaving
-1 (saturated to 0) rows for the actual content. Even "No todos" /
"No active plan" got eaten. Switched to horizontal-only padding so
the inner row survives.
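
The row budget works out as follows; `content_rows` is a hypothetical helper written only to make the arithmetic above checkable.

```rust
/// Content rows left in a bordered panel: total minus 2 border rows
/// minus the top and bottom padding, saturating at zero.
fn content_rows(panel_rows: u16, vertical_padding: u16) -> u16 {
    let borders = 2;
    panel_rows.saturating_sub(borders + 2 * vertical_padding)
}
```

A 3-row panel yields `content_rows(3, 1) == 0` with uniform padding and `content_rows(3, 0) == 1` with horizontal-only padding.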
Capacity-controller tuning (user feedback: "refreshing context is overtuned"):
`apply_targeted_context_refresh` runs `compact_messages_safe` which
rewrites the conversation history — visually identical to the agent
"restarting" mid-session. The previous defaults
(low_risk_max=0.34, refresh_cooldown_turns=2, min_turns=2)
fired this every couple of turns once p_fail crept above 0.34.
Bumped:
- low_risk_max: 0.34 → 0.50
- refresh_cooldown_turns: 2 → 6
- min_turns_before_guardrail: 2 → 4
Still well below the medium-risk ceiling (0.62), so genuine drift
still triggers; routine noise no longer does.
All 14 capacity tests pass; workspace clippy + fmt remain clean.
Resolves conflicts with the #65 resize fix that landed first. Both branches
converged on the same resize-coalescing + display-width truncation fix;
took the perf branch's more detailed inline comments and combined the
transcript bench from #78 with the existing #65 resize regression tests.
Issue #78 baseline (release, 5000-cell synthetic transcript):
  pure scroll, off=0      3549µs → 21µs   (~150x)
  pure scroll, off=2000   3303µs → 19µs   (~170x)
  streaming append        11.6ms → 3.4ms  (~3.4x)
Scrolling far back through a long transcript stalled the entire UI: every
keypress paid the cost of re-wrapping every history cell from index 0 on
every frame. Two bugs combined to defeat the existing per-cell cache:
1. **Uniform cache keys** — `widgets/mod.rs` synthesized
`cell_revisions = vec![app.history_version; len]`, so a single mutation
anywhere bumped every cell's revision and busted the entire cache.
2. **Vec-deep-clone on cache hit** — `CachedCell.lines: Vec<Line>` deep-cloned
on every `prev.clone()` inside `ensure`, so even a fully-cached frame paid
O(total_lines) per render.
Fix mirrors Codex's chatwidget pattern: track per-cell revisions in
`App.history_revisions`, bump only the cell whose content actually
changed, and store cached lines behind `Arc<Vec<Line>>` so a cache-hit
clone is O(1). The cache reuse path is unchanged; what changed is the
keying.
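
A reduced sketch of the two ideas together: per-cell revisions as cache keys, and `Arc` so a cache hit hands out a shared handle instead of copying lines. `TranscriptCache` here is a toy with `String` cells and lines, not the real `TranscriptViewCache`, and the `renders` counter is instrumentation for the sketch.

```rust
use std::sync::Arc;

struct CachedCell {
    revision: u64,
    lines: Arc<Vec<String>>,
}

#[derive(Default)]
struct TranscriptCache {
    cells: Vec<CachedCell>,
    renders: usize, // counts re-wraps, for this sketch only
}

impl TranscriptCache {
    /// Re-render only cells whose revision changed; cache hits clone an `Arc` (O(1)).
    fn ensure(&mut self, cells: &[String], revisions: &[u64]) -> Vec<Arc<Vec<String>>> {
        self.cells.truncate(cells.len());
        for (i, (text, &rev)) in cells.iter().zip(revisions).enumerate() {
            let stale = self.cells.get(i).map_or(true, |c| c.revision != rev);
            if stale {
                self.renders += 1;
                // Stand-in for wrapping: split the cell text into lines.
                let lines = Arc::new(text.lines().map(String::from).collect::<Vec<_>>());
                let entry = CachedCell { revision: rev, lines };
                if i < self.cells.len() {
                    self.cells[i] = entry;
                } else {
                    self.cells.push(entry);
                }
            }
        }
        self.cells.iter().map(|c| Arc::clone(&c.lines)).collect()
    }
}
```

Bumping one cell's revision re-renders that cell only; every other entry returns the same allocation as the previous frame.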
Touchpoints:
* `App::history_revisions` + `next_history_revision` counter, kept in
lockstep with `history` via `add_message` / `extend_history` /
`push_history_cell` / `clear_history` / `pop_history` /
`bump_history_cell` helpers.
* `cell_at_virtual_index_mut` and the `append_streaming_text` path now
bump only the targeted cell's revision instead of fanning the global
`history_version` across the whole transcript.
* `TranscriptViewCache::ensure_split` accepts cell shards directly so the
caller no longer concatenates history + active-cell entries into a
fresh `Vec<HistoryCell>` every frame.
* `mark_history_updated` resyncs `history_revisions.len()` to
`history.len()`, preserving correctness for direct callers that bulk
mutate via `clear`/`extend`.
Bench (release, 5000-cell synthetic transcript, 100×30 area):
| scenario | before | after |
|----------------------|--------:|-------:|
| pure scroll, off=0 | 3549 µs | 23 µs |
| pure scroll, off=100 | 3338 µs | 23 µs |
| pure scroll, off=500 | 3306 µs | 20 µs |
| pure scroll, off=2k | 3303 µs | 20 µs |
| streaming, off=0 | 11.6 ms | 3.4 ms |
| streaming, off=2k | 11.6 ms | 3.3 ms |
Pure-scroll renders are now ~150× faster and constant-time vs scroll
offset; streaming cost is ~3.5× lower (the remaining cost is the
per-frame flatten which always rebuilds the line buffer when the cell
count changes — orthogonal follow-up).
Bench is `#[ignore]`'d:
`cargo test -p deepseek-tui --release bench_transcript_scroll -- --ignored --nocapture`
All existing transcript and scroll tests pass; clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After v0.6.1's light-theme removal exposed it more visibly, rapid resizes
left stale glyphs in the right column (sidebar fragments, mid-character
title truncation, duplicated transcript timestamps). Three small fixes:
- Coalesce queued `Event::Resize` events, run a single `terminal.clear()`,
and immediately draw the new frame instead of waiting for the next event
loop iteration. Previously the cleared screen could sit blank between
the resize handler's `continue` and the next draw, so any other event
arriving in that window would be processed before the repaint.
- `truncate_line_to_width` for budgets `<= 3` was counting codepoints
instead of display widths, overrunning the cell budget for any
double-width grapheme. Fix by accumulating display widths consistently.
- Add a `tracing::debug!` log to the resize handler so users hitting this
in the wild can confirm whether crossterm is delivering the event.
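
To illustrate the width-accumulation fix, here is a sketch that budgets by display columns rather than codepoints. `toy_width` is a deliberately crude assumption (a few wide ranges count as 2 columns); the real code would rely on a proper Unicode width table, and this sketch omits any ellipsis handling.

```rust
/// Toy display width: 2 columns for a few CJK/fullwidth ranges, 1 otherwise.
/// An illustrative approximation, not a complete Unicode width table.
fn toy_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F
        | 0x2E80..=0xA4CF
        | 0xAC00..=0xD7A3
        | 0xF900..=0xFAFF
        | 0xFF00..=0xFF60 => 2,
        _ => 1,
    }
}

/// Keep as many leading chars as fit in `budget` display columns.
fn truncate_to_width(s: &str, budget: usize) -> String {
    let mut used = 0;
    let mut out = String::new();
    for c in s.chars() {
        let w = toy_width(c);
        if used + w > budget {
            break;
        }
        used += w;
        out.push(c);
    }
    out
}
```

Counting codepoints would keep all three characters of `日本語` for a budget of 3 and overrun the cell budget by three columns; accumulating widths keeps only `日`.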
Adds two regression tests in `tui/widgets` (resize cycle + cache
invalidation on width change) and one in `tui/ui` (truncate semantics).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>