Adds a queue-driven `MockLlmClient` that implements the `LlmClient` trait
by replaying canned per-turn `StreamEvent` vectors and capturing every
outgoing `MessageRequest`. The mock lives at the trait boundary, so it
stays decoupled from the concrete reqwest plumbing inside `DeepSeekClient`.
It also exposes builders (`canned::*`) for the common event shapes (text
delta, thinking delta, tool_use start, input JSON delta, message delta).
Wires a new `--record <DIR>` flag into `deepseek eval` that appends one
JSON Lines fixture line per step to `<DIR>/<scenario>.jsonl`. The format
is documented at the top of `eval.rs` and is the storage shape the mock
will replay from.
`crates/tui/src/llm_client.rs` becomes `crates/tui/src/llm_client/mod.rs`
to host the new submodule cleanly. The trait shape is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
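A minimal sketch of the queue-driven mock pattern, assuming a simplified synchronous trait. The commit does not show the real `LlmClient` signature (which presumably streams asynchronously), so the trait method, the `StreamEvent` variants, the `MessageRequest` fields, and the `canned` helper names below are hypothetical stand-ins:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical subset of the stream events named in the commit.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ThinkingDelta(String),
    ToolUseStart { id: String, name: String },
    InputJsonDelta(String),
    MessageDelta { stop_reason: String },
}

/// Hypothetical request shape: only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub model: String,
    pub prompt: String,
}

/// Simplified, synchronous stand-in for the real `LlmClient` trait.
pub trait LlmClient {
    fn stream(&self, req: MessageRequest) -> Result<Vec<StreamEvent>, String>;
}

/// Replays one canned event vector per turn and records every request.
#[derive(Default)]
pub struct MockLlmClient {
    turns: RefCell<VecDeque<Vec<StreamEvent>>>,
    captured: RefCell<Vec<MessageRequest>>,
}

impl MockLlmClient {
    pub fn push_turn(&self, events: Vec<StreamEvent>) {
        self.turns.borrow_mut().push_back(events);
    }
    pub fn captured(&self) -> Vec<MessageRequest> {
        self.captured.borrow().clone()
    }
}

impl LlmClient for MockLlmClient {
    fn stream(&self, req: MessageRequest) -> Result<Vec<StreamEvent>, String> {
        // Capture first, so tests can inspect requests even when the queue ran dry.
        self.captured.borrow_mut().push(req);
        self.turns
            .borrow_mut()
            .pop_front()
            .ok_or_else(|| "MockLlmClient: no canned turn left".to_string())
    }
}

/// Builders for the common event shapes (names are illustrative).
pub mod canned {
    use super::StreamEvent;
    pub fn text(s: &str) -> StreamEvent {
        StreamEvent::TextDelta(s.to_string())
    }
    pub fn thinking(s: &str) -> StreamEvent {
        StreamEvent::ThinkingDelta(s.to_string())
    }
    pub fn tool_use_start(id: &str, name: &str) -> StreamEvent {
        StreamEvent::ToolUseStart { id: id.to_string(), name: name.to_string() }
    }
    pub fn input_json(s: &str) -> StreamEvent {
        StreamEvent::InputJsonDelta(s.to_string())
    }
    pub fn message_delta(stop: &str) -> StreamEvent {
        StreamEvent::MessageDelta { stop_reason: stop.to_string() }
    }
}
```

Capturing the request before popping the queue means a test can still assert on what was sent even when it forgot to enqueue a turn.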
Windows preserves the user-typed `/` when Path::join() ingests a multi-
component string with forward slashes, producing a mixed-separator path
in the rendered <file> block (e.g. `C:\...\.tmpKxj0Pk\nested/deep/file.md`).
The test compared full paths via display(), so the assertion failed.
Switch to a basename comparison per CLAUDE.md's portability rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
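The portable check boils down to comparing `Path::file_name()`, which treats `/` as a separator on both Unix and Windows. A sketch, with a hypothetical helper name:

```rust
use std::path::Path;

/// Compare two paths by their final component only, so mixed
/// `\` and `/` separators in the rendered path don't matter.
fn same_basename(a: &Path, b: &Path) -> bool {
    match (a.file_name(), b.file_name()) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```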
Removed dead_code allow from error_taxonomy.rs. Event::error now carries
an ErrorEnvelope with category + severity instead of (String, bool); all
~13 engine callsites migrated through a small helper API (ErrorEnvelope::transient
/ ::fatal / ::fatal_auth / ::context_overflow / ::network / ::tool / ::classify).
Capacity controller branches on ErrorCategory instead of substring matches.
TUI renders severity with distinct palette tokens via a new HistoryCell::Error
variant. Audit log carries category + severity fields so downstream tooling
can categorize.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
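A sketch of the envelope and its helper API. Only the helper names (`transient`, `fatal`, `fatal_auth`, `context_overflow`, `network`, `tool`, `classify`) come from the commit; the `ErrorCategory` and `Severity` variants, the severity each helper assigns, and the keyword heuristics in `classify` are assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCategory { Network, Auth, ContextOverflow, Tool, Transient, Internal }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Recoverable, Fatal }

/// Replaces the old (String, bool) payload.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorEnvelope {
    pub message: String,
    pub category: ErrorCategory,
    pub severity: Severity,
}

impl ErrorEnvelope {
    fn new(message: impl Into<String>, category: ErrorCategory, severity: Severity) -> Self {
        Self { message: message.into(), category, severity }
    }
    pub fn transient(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::Transient, Severity::Recoverable) }
    pub fn fatal(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::Internal, Severity::Fatal) }
    pub fn fatal_auth(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::Auth, Severity::Fatal) }
    pub fn context_overflow(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::ContextOverflow, Severity::Recoverable) }
    pub fn network(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::Network, Severity::Recoverable) }
    pub fn tool(m: impl Into<String>) -> Self { Self::new(m, ErrorCategory::Tool, Severity::Recoverable) }

    /// Classify a raw message once, at the boundary (keywords are hypothetical).
    pub fn classify(m: impl Into<String>) -> Self {
        let m = m.into();
        let lower = m.to_lowercase();
        if lower.contains("401") || lower.contains("unauthorized") {
            Self::fatal_auth(m)
        } else if lower.contains("context length") {
            Self::context_overflow(m)
        } else if lower.contains("timed out") || lower.contains("connection") {
            Self::network(m)
        } else {
            Self::transient(m)
        }
    }
}

/// A consumer such as the capacity controller matches on the category,
/// not on substrings of the message.
pub fn should_shrink_context(err: &ErrorEnvelope) -> bool {
    err.category == ErrorCategory::ContextOverflow
}
```

The point of the shape is the last function: consumers branch on a typed category instead of re-grepping message text at every callsite.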
Pager view gains a sticky_to_bottom mode: scroll-up pauses auto-tail,
scrolling back to bottom resumes it. Wrapped lines cached by
(cell_id, width, revision); revisions bumped on live-cell mutation so
resize doesn't reflow the world.
Ctrl+T toggles; Esc returns. Engine continues streaming while overlay open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
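A sketch of the two pager mechanisms under hypothetical names (`Pager`, `WrapCache`, and the method names are not from the commit). Scrolling up clears the sticky flag and reaching the bottom again sets it; the cache key is the `(cell_id, width, revision)` triple the commit describes, with naive character-count wrapping standing in for real width-aware wrapping:

```rust
use std::collections::HashMap;

/// Hypothetical pager scroll state with a sticky-to-bottom mode.
pub struct Pager {
    pub offset: usize,          // first visible line
    pub viewport: usize,        // number of visible rows
    pub sticky_to_bottom: bool, // auto-tail while true
}

impl Pager {
    fn max_offset(&self, total: usize) -> usize {
        total.saturating_sub(self.viewport)
    }
    /// Called when content grows; follows the tail only while sticky.
    pub fn on_content(&mut self, total: usize) {
        if self.sticky_to_bottom {
            self.offset = self.max_offset(total);
        }
    }
    /// Scrolling up pauses auto-tail.
    pub fn scroll_up(&mut self, n: usize) {
        self.offset = self.offset.saturating_sub(n);
        self.sticky_to_bottom = false;
    }
    /// Scrolling back to the bottom resumes it.
    pub fn scroll_down(&mut self, n: usize, total: usize) {
        let max = self.max_offset(total);
        self.offset = (self.offset + n).min(max);
        if self.offset == max {
            self.sticky_to_bottom = true;
        }
    }
}

/// Wrapped-line cache keyed by (cell_id, width, revision).
#[derive(Default)]
pub struct WrapCache {
    entries: HashMap<(u64, u16, u64), Vec<String>>,
}

impl WrapCache {
    pub fn get_or_wrap(&mut self, cell_id: u64, width: u16, revision: u64, text: &str) -> &Vec<String> {
        self.entries.entry((cell_id, width, revision)).or_insert_with(|| {
            // Naive wrapping by character count, enough for the sketch.
            let chars: Vec<char> = text.chars().collect();
            chars.chunks(width.max(1) as usize).map(|c| c.iter().collect()).collect()
        })
    }
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Because a live-cell mutation bumps only that cell's revision, every other cell keeps hitting its cached wrap. This sketch never evicts stale entries, which a real cache would need to do.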
Port of codex-main's PendingInputPreview pattern. Three semantic buckets
render in the composer area: pending steers (Esc submits), rejected
steers (re-fires at end of turn), queued follow-ups (Alt+Up edits last).
Empty state renders zero rows.
Engine populates the new App.pending_steers / rejected_steers fields
through the steer-submission path; existing queued_messages plumbing
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cards consume the #130 mailbox stream and render live in the transcript:
- DelegateCard: last-3-actions tree for active agent_spawn
- FanoutCard: dot-grid + aggregate stats for agent_swarm / rlm fanout
Sidebar demoted to a navigator (count + role); detail lives in the card.
Engine wires SubAgentRuntime::with_mailbox so the primitive actually flows.
Cards re-bind on session resume via runtime_threads agent_ids.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StatusItem enum covers all current + new (rate-limit, ctx %, git branch,
last-tool elapsed) items. /statusline opens a multi-select picker with
live preview; selections persist to config.toml under `tui.status_items`.
Default selection mirrors today's footer so upgraders see no change.
Reflowed approval.rs to a full-screen modal with two stakes-based variants:
benign (single-key approve / always) and destructive (explicit confirm).
Variant routing classifies from tool kind + command-safety so destructive
ops never get a muscle-memory accept.
Existing approval tests still pass; new tests cover variant routing + keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
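A sketch of stakes-based variant routing. The commit only says routing uses tool kind plus command safety; the enums, the rule that unknown shell commands count as destructive, and the `y`/`a` single-key bindings are assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolKind { ReadOnly, FileWrite, Shell, Network }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandSafety { Safe, Unknown, Dangerous }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalVariant { Benign, Destructive }

pub fn route(kind: ToolKind, safety: CommandSafety) -> ApprovalVariant {
    match (kind, safety) {
        // Anything flagged dangerous always needs an explicit confirm.
        (_, CommandSafety::Dangerous) => ApprovalVariant::Destructive,
        (ToolKind::ReadOnly, _) => ApprovalVariant::Benign,
        // Unclassified shell commands are treated conservatively.
        (ToolKind::Shell, CommandSafety::Unknown) => ApprovalVariant::Destructive,
        _ => ApprovalVariant::Benign,
    }
}

/// Only the benign variant accepts a single-key approve / always.
pub fn accepts_single_key(variant: ApprovalVariant, key: char) -> bool {
    matches!((variant, key), (ApprovalVariant::Benign, 'y' | 'a'))
}
```

Keeping the routing a pure function of `(kind, safety)` is what makes the "no muscle-memory accept" property easy to pin in tests.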
Phase 1: log full reqwest error chain + headers + bytes-received at decode site
Phase 2: HTTP/2 keepalive settings + tcp keepalive on the reqwest builder
Phase 3: engine transparently retries when stream errors before any content;
surface error on mid-stream failure (no double-bill); stream_errors
threshold relaxed 3 -> 5 with the new keepalive
Phase 4: unit tests for the four classes of stream failure
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
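A sketch of the Phase 3 decision, assuming the engine tracks whether any content has arrived and counts consecutive stream errors (including the current one). The enum, the function name, and the exact counter semantics are hypothetical; only the retry-before-content rule and the threshold of 5 come from the commit:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StreamFailureAction {
    RetryTransparently,
    Surface,
    GiveUp,
}

/// Relaxed from 3 to 5 alongside the new keepalive settings.
pub const MAX_STREAM_ERRORS: u32 = 5;

pub fn on_stream_error(content_received: bool, consecutive_errors: u32) -> StreamFailureAction {
    if consecutive_errors >= MAX_STREAM_ERRORS {
        StreamFailureAction::GiveUp
    } else if content_received {
        // Retrying now would re-generate (and re-bill) output the user already saw.
        StreamFailureAction::Surface
    } else {
        StreamFailureAction::RetryTransparently
    }
}
```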
New HelpView modal lists all slash commands with descriptions and all
keybindings, with a live substring filter. Bound to `?` when focus is
outside the composer; Esc / `?` toggles.
Slash commands pull from the existing slash_menu registry; keybindings
pull from a new KeybindingCatalog single-source-of-truth so docs can't
drift from the wired handlers.
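A sketch of a single-source-of-truth catalog with the live substring filter, assuming a static slice. The struct shape and function name are hypothetical; the rows reuse bindings mentioned elsewhere in this log:

```rust
/// One catalog row; handlers and the help view would both read from here.
#[derive(Debug, Clone, Copy)]
pub struct Keybinding {
    pub keys: &'static str,
    pub description: &'static str,
}

pub const KEYBINDINGS: &[Keybinding] = &[
    Keybinding { keys: "Ctrl+T", description: "Toggle transcript pager" },
    Keybinding { keys: "Esc", description: "Steer or cancel the running turn" },
    Keybinding { keys: "Alt+Up", description: "Edit last queued follow-up" },
    Keybinding { keys: "?", description: "Open help" },
];

/// Case-insensitive substring match over both columns; an empty query keeps all rows.
pub fn filter<'a>(catalog: &'a [Keybinding], query: &str) -> Vec<&'a Keybinding> {
    let q = query.to_lowercase();
    catalog
        .iter()
        .filter(|k| k.keys.to_lowercase().contains(&q) || k.description.to_lowercase().contains(&q))
        .collect()
}
```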
Two-pass resolution in file_mention::resolve_mention_path, plus a fallback:
try workspace.join first, then std::env::current_dir().join, then a basename
fuzzy match. Extracted the shared resolver into working_set.rs so the
future Ctrl+P fuzzy picker (#97) uses the same logic.
Tests cover the workspace/CWD divergence repro that was masquerading as a
"@ doesn't work" report.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
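A sketch of the resolution order with file-system access injected so it stays testable. The signature is hypothetical, and an exact, unique basename match stands in for the real fuzzy fallback:

```rust
use std::path::{Path, PathBuf};

/// Workspace first, then CWD, then a basename fallback over known files.
pub fn resolve_mention_path(
    mention: &str,
    workspace: &Path,
    cwd: &Path,
    known_files: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    for base in [workspace, cwd] {
        let candidate = base.join(mention);
        if exists(&candidate) {
            return Some(candidate);
        }
    }
    // Fallback: accept a basename match only when it is unambiguous.
    let wanted = Path::new(mention).file_name()?;
    let mut hits = known_files.iter().filter(|p| p.file_name() == Some(wanted));
    match (hits.next(), hits.next()) {
        (Some(only), None) => Some(only.clone()),
        _ => None,
    }
}
```

The CWD pass is what covers the "@ doesn't work" repro: launched from a subdirectory, the user types a path relative to where they are, not to the workspace root.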
Each sub-agent type now has an explicit SUMMARY / EVIDENCE / CHANGES /
RISKS / BLOCKERS output contract, mode-specific guidance (explorer /
planner / reviewer / general), and tool-calling conventions that prefer
the typed tool surface over exec_shell shellouts.
The output format is defined once and referenced from each per-type
prompt, so future tweaks live in one place.
Internal upgrade — public tool surface (agent_spawn, agent_swarm, …) unchanged.
The mailbox primitive replaces ad-hoc mpsc plumbing in the runtime so:
- progress events have monotonic ordering
- subscribers get watch-based backpressure
- close-as-cancel propagates through nested children
Pairs with #128 (in-transcript cards consume the mailbox stream).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Windows CI test for archive_cycle_writes_jsonl_with_header_and_messages
was failing because:
1. The path-suffix assertion used "/1.jsonl" as a literal — Windows uses
backslashes, so the assertion never matched. Replace with file_name()
comparison which is platform-agnostic.
2. set_var("HOME", ...) doesn't redirect dirs::home_dir() on Windows;
that function reads USERPROFILE instead. Set both env vars so the
test redirects portably.
The same HomeGuard pattern in the tools/recall_archive.rs tests was passing on
Windows by luck (each test uses a unique session_id, so writes to the
real ~/.deepseek/sessions/ didn't clash) but was polluting the user's
home directory in CI. Mirror the dual-env-var fix there too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
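A sketch of a guard that redirects both variables and restores them on drop. The struct layout is hypothetical. Mutating the process environment is racy when other threads read it (and `set_var` is `unsafe` from the 2024 edition on), so tests using such a guard usually also need to run serially:

```rust
use std::env;
use std::ffi::OsString;
use std::path::Path;

/// Points HOME and USERPROFILE at `dir` for the guard's lifetime.
pub struct HomeGuard {
    saved: Vec<(&'static str, Option<OsString>)>,
}

impl HomeGuard {
    pub fn new(dir: &Path) -> Self {
        let keys = ["HOME", "USERPROFILE"];
        let saved = keys.iter().map(|k| (*k, env::var_os(k))).collect();
        for k in keys {
            // Sound only while no other thread reads the environment.
            unsafe { env::set_var(k, dir) };
        }
        HomeGuard { saved }
    }
}

impl Drop for HomeGuard {
    fn drop(&mut self) {
        // Restore the previous values, removing vars that were unset before.
        for (k, v) in &self.saved {
            match v {
                Some(val) => unsafe { env::set_var(k, val) },
                None => unsafe { env::remove_var(k) },
            }
        }
    }
}
```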
Run `cargo fmt --all` after the four redesign sub-areas land to settle
attribute placement (`#[allow(dead_code)]` lives after doc comments,
not between them — interleaving was splitting docs from items).
Inline the trailing `let dom = …; dom` in `nearest_ansi16` to satisfy
clippy::let_and_return.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub-area #3 of the v0.6.6 UI redesign (issue #121).
Introduces `crates/tui/src/tui/widgets/tool_card.rs` — a small,
self-contained vocabulary module that owns:
- `ToolFamily` enum (Read / Patch / Run / Find / Delegate / Fanout /
Think / Generic) and the verb-glyph + label per family
(▷ read, ◆ patch, ▶ run, ⌕ find, ◐ delegate, ⋮⋮ fanout, … think)
- `tool_family_for_title` — maps the legacy header titles
(`"Shell"`, `"Patch"`, `"Workspace"`, `"Search"`, `"Diff"`, `"Image"`)
to a family, so existing call sites pick up the new glyph without
re-architecture
- `tool_family_for_name` — maps actual tool names
(`agent_spawn`, `apply_patch`, etc.) for `GenericToolCell`, which
shares the catch-all `"Tool"` title across every model-facing tool
- `CardRail` + `rail_glyph` — the `╭ │ ╰` rail vocabulary, declared
here so any future per-card refactor has the matching glyphs
Wires the verb glyph + label into `render_tool_header` and adds a
`render_tool_header_with_family` overload so `GenericToolCell` can
route by tool name rather than the generic title. The header now reads
`<spinner> <verb-glyph> <verb> <state>` instead of
`<spinner> <Title-Case-Word> <state>`.
Existing parity tests for ExecCell / PlanUpdate are updated to assert
against the new header structure (verb + glyph) — the colour wiring is
unchanged. New tests pin the verb-glyph format end-to-end:
`agent_spawn` → `◐ delegate`, exec → `▶ run`.
Spinner cadence (TOOL_STATUS_SYMBOL_MS = 720 ms) is unchanged — the
spec already matched.
Deferred to a follow-up: full per-card rail (`╭ │ ╰`) refactor that
threads `CardRail` through every cell render path. The vocabulary is
in place; the layout pass is the next bite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
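A sketch of the vocabulary module's core mappings. The family list, verb glyphs, header shape, and `agent_spawn → ◐ delegate` come from the commit. The `Generic` glyph, the `Workspace`/`Diff`/`Image` title rows, and the name rows other than `agent_spawn` are guesses:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolFamily { Read, Patch, Run, Find, Delegate, Fanout, Think, Generic }

impl ToolFamily {
    /// (verb glyph, verb label) per family.
    pub fn verb(self) -> (&'static str, &'static str) {
        match self {
            ToolFamily::Read => ("▷", "read"),
            ToolFamily::Patch => ("◆", "patch"),
            ToolFamily::Run => ("▶", "run"),
            ToolFamily::Find => ("⌕", "find"),
            ToolFamily::Delegate => ("◐", "delegate"),
            ToolFamily::Fanout => ("⋮⋮", "fanout"),
            ToolFamily::Think => ("…", "think"),
            ToolFamily::Generic => ("•", "tool"), // placeholder glyph
        }
    }
}

/// Legacy header titles -> family.
pub fn tool_family_for_title(title: &str) -> ToolFamily {
    match title {
        "Shell" => ToolFamily::Run,
        "Patch" | "Diff" => ToolFamily::Patch,
        "Search" => ToolFamily::Find,
        "Workspace" | "Image" => ToolFamily::Read,
        _ => ToolFamily::Generic,
    }
}

/// Model-facing tool names -> family, for cells sharing the catch-all "Tool" title.
pub fn tool_family_for_name(name: &str) -> ToolFamily {
    match name {
        "agent_spawn" => ToolFamily::Delegate,
        "agent_swarm" | "rlm" => ToolFamily::Fanout,
        "apply_patch" => ToolFamily::Patch,
        "exec_shell" => ToolFamily::Run,
        _ => ToolFamily::Generic,
    }
}

/// `<spinner> <verb-glyph> <verb> <state>`
pub fn render_header(spinner: &str, family: ToolFamily, state: &str) -> String {
    let (glyph, verb) = family.verb();
    format!("{spinner} {glyph} {verb} {state}")
}
```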
Sub-area #5 of the v0.6.6 UI redesign (issue #121).
Spout strip:
- Replace box-drawing arc cups (`╭───╮`) with paired crests (`⌒‿`) over
the existing water-surface (`─`). Crests read as gentle ripples instead
of hard architectural arches, which is calmer on the eye.
- Two crests at independent cadences: A advances every 4 ticks (~600 ms),
B every 6 ticks (~900 ms). Phase jitter every 17 ticks (~2.5 s) keeps
the pattern from settling into a strict beat.
- Frame counter cadence in `ui.rs` retimed from 80 ms to 150 ms so the
4×6×17 tick math lands on the spec'd timings.
Footer left cluster consolidates "what costs you what":
- Cost chip moves from the right-hand parade to the left, between model
and status: `mode · model · cost · status`.
- Priority drop is now status → cost → truncate model → mode-only. Cost
outranks status because it's steady info; status is a transient signal.
- Right cluster shrinks to coherence / agents / replay / cache.
Tests: existing strip determinism / position-advances tests are retuned
for the new tick math (12-tick window covers both crests). New tests pin
the cost slot's order on wide widths and confirm cost survives status
drop in the priority cascade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
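A sketch of the left-cluster priority cascade (status → cost → truncate model → mode-only), assuming widths are measured in `char`s; a real footer would measure display width. The function name and signature are hypothetical:

```rust
/// Lay out `mode · model · cost · status`, degrading to fit `width`.
pub fn footer_left(mode: &str, model: &str, cost: &str, status: &str, width: usize) -> String {
    const SEP: &str = " · ";
    // Progressively shorter layouts: drop status first, then cost.
    let candidates = [
        [mode, model, cost, status].join(SEP),
        [mode, model, cost].join(SEP),
        [mode, model].join(SEP),
    ];
    for c in candidates {
        if c.chars().count() <= width {
            return c;
        }
    }
    // Then truncate the model name with an ellipsis, keeping at least one char.
    let prefix = format!("{}{}", mode, SEP);
    let prefix_len = prefix.chars().count();
    if width >= prefix_len + 2 {
        let keep = width - prefix_len - 1; // one column for the ellipsis
        let truncated: String = model.chars().take(keep).collect();
        return format!("{prefix}{truncated}…");
    }
    // Last resort: mode only.
    mode.to_string()
}
```

Dropping status before cost matches the commit's reasoning: the transient signal goes first, the steady information survives longer.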
While a turn is running, Esc with non-empty composer input now steers:
the typed text is captured, the in-flight HTTP request is cancelled,
and on TurnComplete::Interrupted every accumulated steer is merged
into a single fresh user message that re-enters the engine. Empty-input
Esc still cancels exactly as before.
State machine. SubmitDisposition::{Immediate, Queue, Steer} replaces
the implicit if/else in submit_or_steer_message; truth table preserves
the offline_mode + busy fallback path. App.pending_steers /
rejected_steers / submit_pending_steers_after_interrupt back the new
flow and the queue-visibility widget.
Partial save. Deliberate divergence from openai/codex, which discards
partial output on abort: V4 thinking is expensive, so the streaming Assistant cell is
tagged '[interrupted] …' (or '[interrupted]' when nothing streamed yet)
and its spinner is flipped off. The TurnComplete handler also calls the
helper so Ctrl+C / network failures get the same treatment, idempotent
with the optimistic call in the Esc handler.
Queue visibility. PendingInputPreview already supported all three
buckets; build_pending_input_preview now populates pending_steers and
rejected_steers alongside queued_messages. rejected_steers stays empty
under today's engine paths (no rejection signal yet) but renders if/when
populated.
Recovery. If TurnComplete arrives with Failed instead of Interrupted
while pending_steers is non-empty, the steers are demoted to the
visible queue so they're not silently lost.
Tests. 13 new app-level units cover the disposition truth table,
push/drain semantics, double-Esc idempotency, and the partial-save
helper. 11 new ui-level units cover Esc-action routing, slash-menu
priority, whitespace-only input handling, merge_pending_steers (empty
/ single / multi / skill-instruction), and the three-bucket preview.
Closes #122.
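A sketch of the two string-level helpers behind partial save and steer merging, assuming the tag is a prefix and merged steers are joined with a blank line. Neither the separator nor these exact signatures come from the commit:

```rust
const TAG: &str = "[interrupted]";

/// Tag a partially streamed assistant message. Calling it twice is a
/// no-op, so the Esc handler and the TurnComplete handler can both call it.
pub fn mark_interrupted(text: &mut String) {
    if text.starts_with(TAG) {
        return;
    }
    if text.is_empty() {
        text.push_str(TAG);
    } else {
        text.insert_str(0, &format!("{} ", TAG));
    }
}

/// Merge accumulated steers into one user message, skipping whitespace-only input.
pub fn merge_pending_steers(steers: &[String]) -> Option<String> {
    let parts: Vec<&str> = steers.iter().map(|s| s.trim()).filter(|s| !s.is_empty()).collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n\n"))
    }
}
```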
Sub-area #2 of the v0.6.6 UI redesign (issue #121).
Reasoning is the only deliberately-warm element in the redesigned
transcript. The treatment makes that visible:
- Header opener becomes `…` (slow exhale) instead of the spinner glyph
- Body left rail switches from solid `▏` to dashed `╎` so it visibly
differs from message body and tool output rails
- Body text carries an italic modifier
- Body lines tint with `palette::reasoning_surface_tint(depth)` — 12%
blend of SURFACE_REASONING (#362C1A) over DEEPSEEK_INK. ANSI-16
terminals get no bg (the named-color mapping would distort the warm)
- A trailing `▎` cursor in ACCENT_REASONING_LIVE follows the most recent
body line during streaming, suppressed under low_motion
Wires up palette helpers from the prior commit: `ColorDepth::detect`,
`reasoning_surface_tint`, `blend`. SURFACE_REASONING is no longer
dead-coded. The unused `thinking_symbol` helper is removed since the new
header doesn't spin.
Tests: dashed rail and italic body land on every body line; streaming
cursor appears only when motion is allowed; collapsed-summary affordance
keeps working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
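A sketch of the tint computation: a per-channel linear blend at 12% of `SURFACE_REASONING` (#362C1A) over the base ink, with no background on ANSI-16. The commit's helper takes only `depth`; `ink` is a parameter here because `DEEPSEEK_INK`'s value is not in the commit, and the `Rgb`/`ColorDepth` types are stand-ins:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rgb(pub u8, pub u8, pub u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColorDepth { TrueColor, Ansi256, Ansi16 }

pub const SURFACE_REASONING: Rgb = Rgb(0x36, 0x2C, 0x1A);

/// Alpha compositing per channel: out = fg * alpha + bg * (1 - alpha).
pub fn blend(fg: Rgb, bg: Rgb, alpha: f32) -> Rgb {
    let a = alpha.clamp(0.0, 1.0);
    let mix = |f: u8, b: u8| (f as f32 * a + b as f32 * (1.0 - a)).round() as u8;
    Rgb(mix(fg.0, bg.0), mix(fg.1, bg.1), mix(fg.2, bg.2))
}

/// 12% reasoning tint; None on ANSI-16, where a coarse mapping would distort the warm tone.
pub fn reasoning_surface_tint(depth: ColorDepth, ink: Rgb) -> Option<Rgb> {
    match depth {
        ColorDepth::Ansi16 => None,
        _ => Some(blend(SURFACE_REASONING, ink, 0.12)),
    }
}
```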
Replaces lossy summarization compaction with a checkpoint-restart
architecture (#124). At 110K cumulative tokens (per V4's 128K retrieval
elbow) the engine runs a briefing turn, archives the cycle to JSONL at
~/.deepseek/sessions/<id>/cycles/<n>.jsonl, then resets the in-memory
buffer to a fresh context: original system prompt + structured state
(plan/todos/working-set/sub-agents) + the model-curated <carry_forward>
briefing (~3K token cap, hard-bounded).
The compaction summarizer is now off by default. Per-model thresholds in
[cycle.per_model] let operators tune deepseek-v4-pro vs -flash separately.
Phase guard in should_advance_cycle blocks mid-tool/stream/approval boundaries;
engine only invokes at clean turn-completed events. Sub-agents are not
awaited — their handles are captured in the structured-state block so the
new cycle sees them still running.
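A sketch of the boundary guard, assuming phases are modeled as a flat enum and thresholds as cumulative token counts. The type and field names are hypothetical; the 110K default, the per-model override table, and the clean-boundary rule come from the commit:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase { Idle, Streaming, ToolRunning, AwaitingApproval, TurnCompleted }

pub struct CycleConfig {
    pub default_threshold: u64,          // 110_000 per the commit
    pub per_model: HashMap<String, u64>, // [cycle.per_model] overrides
}

impl CycleConfig {
    pub fn threshold_for(&self, model: &str) -> u64 {
        self.per_model.get(model).copied().unwrap_or(self.default_threshold)
    }
}

/// Advance only at a clean turn-completed boundary, and only once the budget is reached.
pub fn should_advance_cycle(cfg: &CycleConfig, model: &str, cumulative_tokens: u64, phase: Phase) -> bool {
    phase == Phase::TurnCompleted && cumulative_tokens >= cfg.threshold_for(model)
}
```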
Adds the recall_archive tool (#127) — BM25 over message text in archived
cycles, top-N hits with cycle/index/excerpt. Always-loaded across modes
via should_default_defer_tool so the agent doesn't need ToolSearch to
discover it. Children inherit it via with_full_agent_surface.
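The commit only says "BM25 over message text". Here is a textbook Okapi BM25 scorer for illustration; k1 = 1.2 and b = 0.75 are the conventional defaults rather than values taken from recall_archive, and the tokenizer is a naive assumption:

```rust
use std::collections::HashMap;

/// Naive tokenizer: lowercase alphanumeric runs.
fn tokenize(s: &str) -> Vec<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Okapi BM25. Returns up to `n` (doc index, score) pairs with a positive score, best first.
pub fn bm25_top_n(docs: &[&str], query: &str, n: usize) -> Vec<(usize, f64)> {
    const K1: f64 = 1.2;
    const B: f64 = 0.75;
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    if tokenized.is_empty() {
        return Vec::new();
    }
    let n_docs = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n_docs;

    // Document frequency for each query term.
    let terms = tokenize(query);
    let mut df: HashMap<&str, f64> = HashMap::new();
    for term in &terms {
        let count = tokenized.iter().filter(|d| d.contains(term)).count();
        df.insert(term.as_str(), count as f64);
    }

    let mut scored: Vec<(usize, f64)> = tokenized
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let len = doc.len() as f64;
            let score: f64 = terms
                .iter()
                .map(|term| {
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let n_t = df[term.as_str()];
                    // The "+ 1" keeps idf positive even for very common terms.
                    let idf = ((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len / avg_len))
                })
                .sum();
            (i, score)
        })
        .filter(|&(_, s)| s > 0.0)
        .collect();
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    scored.truncate(n);
    scored
}
```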
UI surfaces:
- /cycles, /cycle <n>, /recall <query> slash commands
- Sidebar shows cycle counter once a boundary fires
- CycleAdvanced engine event carries the full briefing so the UI can
populate app.cycle_briefings for /cycle <n>
- runtime_threads schema bumped to v2 (cycle.advanced events appear in
the durable timeline; load rejects future versions)
Tests: 21 cycle_manager + 13 recall_archive + 4 commands::cycle.
All 1168 workspace tests pass. Three parity gates pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub-area #1 of the v0.6.6 UI redesign (issue #121).
User cells now lead with `▎` (solid bar, no animation — input is finished).
Assistant cells lead with `●`. While streaming, the bullet pulses on a
2-second cycle between 30%..100% brightness via `palette::pulse_brightness`;
once a turn completes the bullet sits at full DEEPSEEK_SKY so finished
history reads as solid.
Honors `low_motion`: the pulse is suppressed and the glyph holds full
brightness regardless of streaming state. Pager / clipboard exports
(`transcript_lines`) also skip the pulse so screenshots are stable.
Existing pager titles in `ui.rs` (`history_cell_to_text`) keep the literal
"You" / "Assistant" wording — those drive the modal title bar and read
better as words than as glyphs.
Tests: glyphs replace the literal labels in both User and Assistant cells;
streaming pulse demonstrably dips below source brightness; idle and
low_motion both pin to source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
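A sketch of the 30%..100% sine pulse on a 2 s period. The commit's helper takes `(color, now_ms)`; the `streaming` and `low_motion` flags are folded in here only to show the gating, and the `Rgb` type is a stand-in:

```rust
use std::f64::consts::PI;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rgb(pub u8, pub u8, pub u8);

const PERIOD_MS: u64 = 2_000;
const MIN_BRIGHTNESS: f64 = 0.30;

/// Brightness factor in [0.30, 1.00] following a sine wave with a 2 s period.
pub fn brightness_factor(now_ms: u64) -> f64 {
    let phase = (now_ms % PERIOD_MS) as f64 / PERIOD_MS as f64; // 0.0..1.0
    let wave = (phase * 2.0 * PI).sin() * 0.5 + 0.5;              // 0.0..=1.0
    MIN_BRIGHTNESS + (1.0 - MIN_BRIGHTNESS) * wave
}

/// Scale each channel while streaming; idle cells and low_motion keep the source colour.
pub fn pulse_brightness(color: Rgb, now_ms: u64, streaming: bool, low_motion: bool) -> Rgb {
    if !streaming || low_motion {
        return color;
    }
    let f = brightness_factor(now_ms);
    let scale = |c: u8| (c as f64 * f).round() as u8;
    Rgb(scale(color.0), scale(color.1), scale(color.2))
}
```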
Adds the foundation for the v0.6.6 UI redesign (issue #121):
- ColorDepth enum with detect() reading COLORTERM/TERM
- adapt_color / adapt_bg gates for ANSI-16 terminals (drops bg tints
rather than coarse-mapping them, which would distort the palette)
- blend(fg, bg, alpha) for alpha compositing on RGB
- reasoning_surface_tint() — 12% blend over DEEPSEEK_INK (None on ANSI-16)
- pulse_brightness(color, now_ms) — 30%..100% sine swing on a 2s cycle
- nearest_ansi16() helper for legacy-terminal foreground mapping
Helpers carry #[allow(dead_code)] until the per-area redesign sub-tasks
wire them in (speaker pulse, reasoning treatment, footer cost cluster).
Tests cover envelope bounds and brand-color routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
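The commit doesn't spell out the detection rules. A sketch using the widespread convention: `COLORTERM=truecolor` or `24bit` means 24-bit colour, a `TERM` containing `256color` means 256 colours, and anything else falls back to ANSI-16. The `from_vars` split and the `adapt_bg` signature are assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColorDepth { TrueColor, Ansi256, Ansi16 }

impl ColorDepth {
    /// Detect from the process environment.
    pub fn detect() -> Self {
        Self::from_vars(
            std::env::var("COLORTERM").ok().as_deref(),
            std::env::var("TERM").ok().as_deref(),
        )
    }

    /// Pure core of `detect`, testable without touching env vars.
    pub fn from_vars(colorterm: Option<&str>, term: Option<&str>) -> Self {
        let colorterm = colorterm.unwrap_or("").to_ascii_lowercase();
        let term = term.unwrap_or("").to_ascii_lowercase();
        if colorterm == "truecolor" || colorterm == "24bit" {
            ColorDepth::TrueColor
        } else if term.contains("256color") {
            ColorDepth::Ansi256
        } else {
            ColorDepth::Ansi16
        }
    }
}

/// Drop background tints on ANSI-16 rather than coarse-mapping them.
pub fn adapt_bg(bg: (u8, u8, u8), depth: ColorDepth) -> Option<(u8, u8, u8)> {
    if depth == ColorDepth::Ansi16 { None } else { Some(bg) }
}
```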
- ClientError, StreamError, and their impl blocks in error_taxonomy.rs
- ApprovalCache in approval_cache.rs (pending #66 follow-up wiring)
- Legacy prompt constants in prompts.rs (backward compat)
- with_mcp_tools and McpToolAdapter in registry.rs (pending MCP migration)
- Fix rlm_query → rlm in when_not_to_use_sections_present test
- Wire 120 FPS FrameRateLimiter into run_event_loop via
time_until_next_draw + mark_emitted
- Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL
- Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode
- Add StreamingState::set_low_motion() to propagate to all block policies
- Tool spinner already freezes on first frame when low_motion is set
TODO_BACKEND.md §3, TODO_FIXES.md #4
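A sketch of the limiter, assuming the event loop passes in `now`, which keeps it deterministic under test. `time_until_next_draw`, `mark_emitted`, and `LOW_MOTION_MIN_FRAME_INTERVAL` are names from the commit, but their signatures here, and `MIN_FRAME_INTERVAL`, are assumptions:

```rust
use std::time::{Duration, Instant};

/// Minimum frame intervals: 120 FPS normally, 30 FPS under low_motion.
pub const MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 120);
pub const LOW_MOTION_MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 30);

pub struct FrameRateLimiter {
    last_emitted: Option<Instant>,
    low_motion: bool,
}

impl FrameRateLimiter {
    pub fn new(low_motion: bool) -> Self {
        Self { last_emitted: None, low_motion }
    }

    fn interval(&self) -> Duration {
        if self.low_motion { LOW_MOTION_MIN_FRAME_INTERVAL } else { MIN_FRAME_INTERVAL }
    }

    /// Zero means "draw now"; otherwise how long the event loop may wait.
    pub fn time_until_next_draw(&self, now: Instant) -> Duration {
        match self.last_emitted {
            None => Duration::ZERO,
            Some(last) => (last + self.interval()).saturating_duration_since(now),
        }
    }

    /// Record that a frame was actually drawn.
    pub fn mark_emitted(&mut self, now: Instant) {
        self.last_emitted = Some(now);
    }
}
```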
v0.6.6 — sub-agents inherit the parent's full tool registry,
auto-approve, respect a depth cap, and propagate cancellation.
Adds optional cwd to agent_spawn for parallel-worktree dispatch.
Schema-ready for roles (full library lands in 0.6.7).
Changes:
- New ToolRegistryBuilder::with_full_agent_surface(...) shared by parent and child
- SubAgentToolRegistry::new refactored to use shared builder; per-type
allowlist becomes advisory
- SubAgentRuntime gains auto_approve, spawn_depth, max_spawn_depth, cancel_token
- Depth check at spawn entry; cancellation cascade via CancellationToken::child_token()
- <deepseek:subagent.done> sentinel emitted on child completion
- cwd: Option<PathBuf> on agent_spawn with workspace-boundary validation
- Stream wall-clock cap bumped to 30 min (was 300s)
- max_spawn_depth configurable via EngineConfig (default 3)
- Version bump to 0.6.6
Closes #99.
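A std-only sketch of the depth check and the cancellation cascade. The real runtime uses `CancellationToken::child_token()` from `tokio_util`; the `CancelToken` below only imitates its parent-to-child propagation. The struct shape and the "depth >= max refuses" rule are assumptions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Cancelling a token is observed by all its descendants, never by its ancestors.
#[derive(Clone, Default)]
pub struct CancelToken {
    flag: Arc<AtomicBool>,
    parent: Option<Box<CancelToken>>,
}

impl CancelToken {
    pub fn child_token(&self) -> CancelToken {
        CancelToken { flag: Arc::new(AtomicBool::new(false)), parent: Some(Box::new(self.clone())) }
    }
    pub fn cancel(&self) {
        self.flag.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::SeqCst) || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
    }
}

pub struct SubAgentRuntime {
    pub spawn_depth: u32,
    pub max_spawn_depth: u32, // EngineConfig default: 3
    pub cancel_token: CancelToken,
}

impl SubAgentRuntime {
    /// Depth check at spawn entry; the child gets a child token.
    pub fn spawn_child(&self) -> Result<SubAgentRuntime, String> {
        if self.spawn_depth >= self.max_spawn_depth {
            return Err(format!("max_spawn_depth {} reached", self.max_spawn_depth));
        }
        Ok(SubAgentRuntime {
            spawn_depth: self.spawn_depth + 1,
            max_spawn_depth: self.max_spawn_depth,
            cancel_token: self.cancel_token.child_token(),
        })
    }
}
```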
Two near-duplicate top-level tools made the surface confusing. With
parallel_fanout (formerly rlm_query) removed, there's exactly one RLM
shape: load a long input as `context` in a Python REPL via `rlm`, and
let the sub-agent fan out from inside the REPL via `llm_query_batched`
where it has `context` in scope to chunk against.
For non-RLM parallel work the dispatcher already runs multiple tool
calls per turn concurrently — no separate fan-out tool needed. The
GenericToolCell.prompts rendering hook stays (one-row-per-child for any
future fan-out tool), but no tool currently populates it.
Also drops two stray test artifacts (rlm_catalog.md, rlm_test_doc.md)
the model wrote to repo root during a previous live test session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two top-level tools shared the rlm_ prefix but did completely different
things — rlm_query was a flat parallel-completion fan-out wearing an
RLM-shaped name, and rlm_process was the actual recursive language model.
The overlap was the source of the "our rlm query is completely wrong"
confusion.
rlm_process → rlm # single, honest name for the recursive tool
rlm_query → parallel_fanout # honest name for the flat fanout
Internal renames follow:
Op::RlmQuery → Op::Rlm
AppAction::RlmQuery → AppAction::Rlm
handle_rlm_query → handle_rlm
RlmProcessTool → RlmTool
RlmQueryTool → ParallelFanoutTool
RlmChildClient → FanoutChildClient
with_rlm_process_tool → with_rlm_tool
with_rlm_query_tool → with_parallel_fanout_tool
The REPL helpers `rlm_query` / `rlm_query_batched` / `llm_query` /
`llm_query_batched` keep their names — those are correctly named (they
ARE recursive within the REPL) and the model knows them from the system
prompt and metadata.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The RLM tool used to spawn a fresh `python3 -c "..."` per round and route
sub-LLM calls through a localhost axum sidecar; state persisted only via a
JSON file (lossy: imports and non-JSON values were lost). The model could
also short-circuit by replying with prose and the loop would ship the
prose as if it came from the REPL.
This commit replaces that with one long-lived `python3 -u` subprocess per
turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels.
No more HTTP server, no more port allocation, no more JSON state file —
variables, imports, and any other Python state persist naturally across
rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services
`llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched`
calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs.
The system prompt is tightened: the only legal turn shape is one
` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a
sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination
is gone, replaced by `NoCode` which only surfaces after multiple consecutive
empty rounds. `rlm_process` now returns a per-round trace (code summary,
sub-LLM call count, elapsed) so callers can verify the model actually
engaged with `context` rather than guessing from the preview.
Net: -313 lines. 17 new REPL runtime tests cover variable persistence,
import persistence, RPC round-trips, FINAL capture, and error recovery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
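The commit names the transport but not the wire format. A hypothetical sketch of the host-side stdout classifier: the `@@<nonce>@@ VERB payload` framing, the verbs, and a short hex string standing in for a UUID are all assumptions. The idea it illustrates is that only lines carrying the session's nonce count as control frames, so user code that prints look-alike text can't spoof them:

```rust
#[derive(Debug, PartialEq)]
pub enum Frame {
    Output(String),   // ordinary REPL stdout
    LlmQuery(String), // Python asked the host for a sub-LLM call
    Final(String),    // FINAL(...) captured
    RoundDone,        // the current code block finished
}

/// Classify one stdout line against this session's nonce.
pub fn parse_line(line: &str, nonce: &str) -> Frame {
    let prefix = format!("@@{nonce}@@ ");
    let Some(rest) = line.strip_prefix(prefix.as_str()) else {
        return Frame::Output(line.to_string());
    };
    match rest.split_once(' ') {
        Some(("LLM_QUERY", payload)) => Frame::LlmQuery(payload.to_string()),
        Some(("FINAL", payload)) => Frame::Final(payload.to_string()),
        None if rest == "DONE" => Frame::RoundDone,
        _ => Frame::Output(line.to_string()),
    }
}

/// Collect frames for one round, stopping at the DONE sentinel.
pub fn read_round<'a>(lines: impl IntoIterator<Item = &'a str>, nonce: &str) -> Vec<Frame> {
    let mut frames = Vec::new();
    for line in lines {
        let frame = parse_line(line, nonce);
        let done = frame == Frame::RoundDone;
        frames.push(frame);
        if done {
            break;
        }
    }
    frames
}
```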