Hunter Bown ab70c40beb perf(tui): cache wrapped transcript lines per-cell (closes #78)
Scrolling far back through a long transcript stalled the entire UI:
every keypress triggered a frame that re-wrapped every history cell,
starting from index 0. Two bugs combined to defeat the existing
per-cell cache:

1. **Uniform cache keys** — `widgets/mod.rs` synthesized
   `cell_revisions = vec![app.history_version; len]`, so a single mutation
   anywhere bumped every cell's revision and busted the entire cache.
2. **Vec-deep-clone on cache hit** — `CachedCell.lines: Vec<Line>` deep-cloned
   on every `prev.clone()` inside `ensure`, so even a fully-cached frame paid
   O(total_lines) per render.

The fix mirrors Codex's chatwidget pattern: track per-cell revisions in
`App.history_revisions`, bump only the cell whose content actually
changed, and store cached lines behind `Arc<Vec<Line>>` so a cache-hit
clone is O(1). The cache reuse path is unchanged; what changed is the
keying and how the cached lines are stored.
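
A minimal sketch of the two ideas above (per-cell revision keys and
`Arc`-shared lines), not the actual `TranscriptViewCache` code. `Line`
is aliased to `String`, and `TranscriptCache`, `wrap`, and the
`rewrapped` counter are hypothetical names introduced only for
illustration:

```rust
use std::sync::Arc;

/// Stand-in for a wrapped display line (the real code uses a TUI `Line` type).
type Line = String;

/// One cache slot: the revision and width it was built for, plus shared lines.
#[derive(Clone)]
struct CachedCell {
    revision: u64,
    width: u16,
    lines: Arc<Vec<Line>>,
}

#[derive(Default)]
struct TranscriptCache {
    cells: Vec<Option<CachedCell>>,
    /// Cells re-wrapped by the last `ensure` call (illustration only).
    rewrapped: usize,
}

/// Naive fixed-width wrap, present only to make the sketch runnable.
fn wrap(text: &str, width: u16) -> Vec<Line> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(width.max(1) as usize)
        .map(|chunk| chunk.iter().collect())
        .collect()
}

impl TranscriptCache {
    /// Re-wrap only cells whose revision or width changed; share the rest.
    fn ensure(&mut self, cells: &[String], revisions: &[u64], width: u16) -> Vec<Arc<Vec<Line>>> {
        assert_eq!(cells.len(), revisions.len());
        self.cells.resize(cells.len(), None);
        self.rewrapped = 0;
        let mut out = Vec::with_capacity(cells.len());
        for (i, (text, &rev)) in cells.iter().zip(revisions).enumerate() {
            let hit = matches!(&self.cells[i], Some(c) if c.revision == rev && c.width == width);
            if !hit {
                self.rewrapped += 1;
                self.cells[i] = Some(CachedCell {
                    revision: rev,
                    width,
                    lines: Arc::new(wrap(text, width)),
                });
            }
            // Cloning the Arc copies a pointer; the wrapped lines are not copied.
            out.push(Arc::clone(&self.cells[i].as_ref().unwrap().lines));
        }
        out
    }
}
```

With a uniform key such as `vec![history_version; len]`, every slot
would miss after any mutation; with one revision per cell, only the
mutated slot misses and the rest are returned by pointer.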

Touchpoints:
* `App::history_revisions` + `next_history_revision` counter, kept in
  lockstep with `history` via `add_message` / `extend_history` /
  `push_history_cell` / `clear_history` / `pop_history` /
  `bump_history_cell` helpers.
* `cell_at_virtual_index_mut` and the `append_streaming_text` path now
  bump only the targeted cell's revision instead of fanning the global
  `history_version` across the whole transcript.
* `TranscriptViewCache::ensure_split` accepts cell shards directly so the
  caller no longer concatenates history + active-cell entries into a
  fresh `Vec<HistoryCell>` every frame.
* `mark_history_updated` resyncs `history_revisions.len()` to
  `history.len()`, preserving correctness for direct callers that bulk
  mutate via `clear`/`extend`.
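
The lockstep bookkeeping in the touchpoints above could look roughly
like the following. This is a sketch under assumptions: `App` is
reduced to three fields, cells are plain `String`s, `fresh_revision`
is a hypothetical helper, and the method signatures are guesses rather
than the real ones. The commit message does not say how
`mark_history_updated` treats indexes that survive a bulk mutation;
the sketch only resyncs the length.

```rust
/// Hypothetical, trimmed-down stand-in for the real `App` state.
#[derive(Default)]
struct App {
    history: Vec<String>,
    history_revisions: Vec<u64>,
    next_history_revision: u64,
}

impl App {
    /// Hand out a fresh, never-reused revision number (hypothetical helper).
    fn fresh_revision(&mut self) -> u64 {
        self.next_history_revision += 1;
        self.next_history_revision
    }

    /// Push a cell and its revision together so the two vectors stay aligned.
    fn push_history_cell(&mut self, cell: String) {
        let rev = self.fresh_revision();
        self.history.push(cell);
        self.history_revisions.push(rev);
    }

    /// Pop a cell and its revision together.
    fn pop_history(&mut self) -> Option<String> {
        self.history_revisions.pop();
        self.history.pop()
    }

    /// Invalidate exactly one cell's cache entry.
    fn bump_history_cell(&mut self, index: usize) {
        if index < self.history_revisions.len() {
            let rev = self.fresh_revision();
            self.history_revisions[index] = rev;
        }
    }

    /// Streaming append touches one cell, so only that cell is bumped.
    fn append_streaming_text(&mut self, index: usize, text: &str) {
        if let Some(cell) = self.history.get_mut(index) {
            cell.push_str(text);
            self.bump_history_cell(index);
        }
    }

    /// Safety net for callers that mutated `history` directly:
    /// bring the revision vector back to the same length.
    fn mark_history_updated(&mut self) {
        self.history_revisions.truncate(self.history.len());
        while self.history_revisions.len() < self.history.len() {
            let rev = self.fresh_revision();
            self.history_revisions.push(rev);
        }
    }
}
```

Drawing every revision from one monotonically increasing counter means
a popped-then-pushed cell never reuses an old number, so a stale cache
slot at the same index cannot be mistaken for a hit.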

Bench (release, 5000-cell synthetic transcript, 100×30 area):

| scenario             | before  | after  |
|----------------------|--------:|-------:|
| pure scroll, off=0   | 3549 µs |  23 µs |
| pure scroll, off=100 | 3338 µs |  23 µs |
| pure scroll, off=500 | 3306 µs |  20 µs |
| pure scroll, off=2k  | 3303 µs |  20 µs |
| streaming, off=0     | 11.6 ms | 3.4 ms |
| streaming, off=2k    | 11.6 ms | 3.3 ms |

Pure-scroll renders are now ~150× faster and independent of scroll
offset; streaming cost is ~3.5× lower. The remaining streaming cost is
the per-frame flatten, which always rebuilds the line buffer when the
cell count changes (orthogonal follow-up).

The bench is marked `#[ignore]`; run it with:
`cargo test -p deepseek-tui --release bench_transcript_scroll -- --ignored --nocapture`

All existing transcript and scroll tests pass; clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:47:17 -05:00