Adds `integration_mock_llm.rs` covering the LlmClient trait surface:
- streaming turn loop (text deltas + finish reason)
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the
HTTP 400 path that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multiple tool calls in one turn preserve event ordering
- compaction-style non-streaming `create_message`
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request
Four full-engine tests are `#[ignore]`-marked as BLOCKED on the engine
refactor from concrete `Option<DeepSeekClient>` to `Arc<dyn LlmClient>`.
Once that wiring lands the ignored tests light up with no mock changes.
Adds:
- `tests/support/llm_client.rs` mirrors the trait so the mock can be
brought into the integration test via `#[path]` without dragging in
the rest of the binary's module tree
- `tests/fixtures/.gitkeep` so the `eval --record` output directory
rides the repo
- `tests/README.md` documents both the trait-level mocking strategy
and the `--record` fixture flow
- `record_flag_writes_one_jsonl_line_per_step` in `eval_harness.rs`
exercises the new `--record` flag end-to-end
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a queue-driven `MockLlmClient` that implements the `LlmClient` trait
by replaying canned per-turn `StreamEvent` vectors and capturing every
outgoing `MessageRequest`. The mock lives at the trait boundary so it
stays decoupled from the concrete reqwest plumbing inside `DeepSeekClient`,
and surfaces builders (`canned::*`) for the common event shapes (text
delta, thinking delta, tool_use start, input JSON delta, message delta).
Wires a new `--record <DIR>` flag into `deepseek eval` that appends one
JSON Lines fixture line per step to `<DIR>/<scenario>.jsonl`. The format
is documented at the top of `eval.rs` and is the storage shape the mock
will replay from.
`crates/tui/src/llm_client.rs` becomes `crates/tui/src/llm_client/mod.rs`
to host the new submodule cleanly. The trait shape is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds first-class keyring management on the dispatcher CLI and wires
the TUI to read its DeepSeek key through the same Secrets façade.
Subcommands:
* `auth set --provider <name>` writes to the OS keyring; prompts
on stdin without echo, never prints the key, never touches
`config.toml`. Supports `--api-key` and `--api-key-stdin`.
* `auth get --provider <name>` reports `set` / `not set` plus the
resolving layer (keyring / env / config-file). Never prints the
value.
* `auth clear --provider <name>` deletes from keyring and from any
legacy plaintext slot in `config.toml` for parity.
* `auth list` table of all known providers and
whether each layer holds a key. Non-revealing.
* `auth migrate [--dry-run]` reads `api_key` (root + per-provider
blocks) from `config.toml`, writes them to the keyring, then
strips the entries from disk. Idempotent.
* `auth status` expanded to also report the active
keyring backend and per-provider keyring state.
`doctor` now prints `keyring backend: ...` plus per-provider
`keyring=yes/no, env=yes/no` lines and points users at
`deepseek auth set` when no key resolves.
`Config::deepseek_api_key()` in the TUI is rewritten to consult
`Secrets::auto_detect()` first (keyring -> env), then fall back to
the existing TOML slots with a deprecation warning. Error messages
now lead with `deepseek auth set --provider <name>`.
5 new unit tests cover argument parsing for the new subcommands and
end-to-end auth set/clear/migrate behaviour against an
`InMemoryKeyringStore`, verifying that no plaintext key ever lands
in `config.toml`.
Verified manually on macOS:
$ deepseek auth set --provider deepseek --api-key-stdin
$ security find-generic-password -s deepseek -a deepseek
# entry present
$ deepseek auth migrate
# api_key lines stripped from ~/.deepseek/config.toml
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes `ConfigToml::resolve_runtime_options` through the new
`deepseek_secrets::Secrets` façade so API keys are read from the OS
keyring before any environment variable, with the existing
plaintext-config layer kept as a deprecated last resort. The
precedence is now:
CLI flag -> keyring -> env -> config-file
Reads of an `api_key` value from `~/.deepseek/config.toml` now emit
a one-time `tracing::warn!` directing users to
`deepseek auth set` / `deepseek auth migrate`.
`resolve_runtime_options_with_secrets` is exposed for tests and
process-level injection (the `cfg(test)` default uses an in-memory
store so unit tests never touch the real OS keychain). The
nvidia-nim provider keeps its `DEEPSEEK_API_KEY` env fallback for
back-compat. New tests cover keyring > env > config-file precedence
end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `deepseek-secrets` crate with the OS keyring backend,
in-memory store for tests, and a JSON-on-disk fallback for
headless environments. The Secrets façade collapses keyring -> env
into a single resolver; callers layer on CLI flags above and TOML
config below to preserve the keyring -> env -> config-file precedence.
* `KeyringStore` trait + `DefaultKeyringStore` (keyring 3.6 with
per-platform native features).
* `InMemoryKeyringStore` for unit tests.
* `FileKeyringStore` writes ~/.deepseek/secrets/secrets.json with
mode 0600 on unix; rejects world-readable files at read time.
* `Secrets::auto_detect` probes the OS keyring and falls back to
the file store on headless Linux.
* 9 unit tests covering round-trips, precedence, and 0600 perms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add crates/tui/assets/skills/skill-creator/SKILL.md (ported from
OpenAI codex repo, MIT-licensed; all Codex→deepseek / ~/.codex→~/.deepseek
replacements applied; MIT attribution comment at top).
- Convert crates/tui/src/skills.rs → skills/mod.rs + new skills/system.rs.
install_system_skills() writes the bundled SKILL.md on fresh install or
version bump; respects user-deleted directories; idempotent by design.
Version marker at ~/.deepseek/skills/.system-installed-version.
- Wire install_system_skills() into run_interactive() (main.rs) before the
TUI mounts; errors are non-fatal (logged as warnings).
- Add /skill new as an alias for /skill skill-creator in commands/skills.rs.
- 7 unit tests covering fresh install, idempotence, user-deleted dir, version
bump, uninstall (all pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows preserves the user-typed `/` when Path::join() ingests a multi-
component string with forward slashes, producing a mixed-separator path
in the rendered <file> block (e.g. `C:\...\.tmpKxj0Pk\nested/deep/file.md`).
The test compared full paths via display(), which mismatched.
Switch to a basename comparison per CLAUDE.md's portability rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed dead_code allow from error_taxonomy.rs. Event::error now carries
an ErrorEnvelope with category + severity instead of (String, bool); all
~13 engine callsites migrated through a small helper API (ErrorEnvelope::transient
/ ::fatal / ::fatal_auth / ::context_overflow / ::network / ::tool / ::classify).
Capacity controller branches on ErrorCategory instead of substring matches.
TUI renders severity with distinct palette tokens via a new HistoryCell::Error
variant. Audit log carries category + severity fields so downstream tooling
can categorize.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pager view gains a sticky_to_bottom mode: scroll-up pauses auto-tail,
scrolling back to bottom resumes it. Wrapped lines cached by
(cell_id, width, revision); revisions bumped on live-cell mutation so
resize doesn't reflow the world.
Ctrl+T toggles; Esc returns. Engine continues streaming while overlay open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port of codex-main's PendingInputPreview pattern. Three semantic buckets
render in the composer area: pending steers (Esc submits), rejected
steers (re-fires at end of turn), queued follow-ups (Alt+Up edits last).
Empty state renders zero rows.
Engine populates the new App.pending_steers / rejected_steers fields
through the steer-submission path; existing queued_messages plumbing
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cards consume the #130 mailbox stream and render live in the transcript:
- DelegateCard: last-3-actions tree for active agent_spawn
- FanoutCard: dot-grid + aggregate stats for agent_swarm / rlm fanout
Sidebar demoted to a navigator (count + role); detail lives in the card.
Engine wires SubAgentRuntime::with_mailbox so the primitive actually flows.
Cards re-bind on session resume via runtime_threads agent_ids.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StatusItem enum covers all current + new (rate-limit, ctx %, git branch,
last-tool elapsed) items. /statusline opens a multi-select picker with
live preview; selections persist to config.toml under `tui.status_items`.
Default selection mirrors today's footer so upgraders see no change.
Reflowed approval.rs to a full-screen modal with two stakes-based variants:
benign (single-key approve / always) and destructive (explicit confirm).
Variant routing classifies from tool kind + command-safety so destructive
ops never get a muscle-memory accept.
Existing approval tests still pass; new tests cover variant routing + keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1: log full reqwest error chain + headers + bytes-received at decode site
Phase 2: HTTP/2 keepalive settings + tcp keepalive on the reqwest builder
Phase 3: engine transparently retries when stream errors before any content;
surface error on mid-stream failure (no double-bill); stream_errors
threshold relaxed 3 -> 5 with the new keepalive
Phase 4: unit tests for the four classes of stream failure
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New HelpView modal lists all slash commands with descriptions and all
keybindings, with a live substring filter. Bound to `?` when focus is
outside the composer; Esc / `?` toggles.
Slash commands pull from the existing slash_menu registry; keybindings
pull from a new KeybindingCatalog single-source-of-truth so docs can't
drift from the wired handlers.
Two-pass resolution in file_mention::resolve_mention_path -- try
workspace.join first, then std::env::current_dir().join, then a basename
fuzzy fallback. Extracted the shared resolver into working_set.rs so the
future Ctrl+P fuzzy picker (#97) uses the same logic.
Tests cover the workspace/CWD divergence repro that was masquerading as a
"@ doesn't work" report.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each sub-agent type now has an explicit SUMMARY / EVIDENCE / CHANGES /
RISKS / BLOCKERS output contract, mode-specific guidance (explorer /
planner / reviewer / general), and tool-calling conventions that prefer
the typed tool surface over exec_shell shellouts.
The output format is defined once and referenced from each per-type
prompt, so future tweaks live in one place.
Internal upgrade — public tool surface (agent_spawn, agent_swarm, …) unchanged.
The mailbox primitive replaces ad-hoc mpsc plumbing in the runtime so:
- progress events have monotonic ordering
- subscribers get watch-based backpressure
- close-as-cancel propagates through nested children
Pairs with #128 (in-transcript cards consume the mailbox stream).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Windows CI test for archive_cycle_writes_jsonl_with_header_and_messages
was failing because:
1. The path-suffix assertion used "/1.jsonl" as a literal — Windows uses
backslashes, so the assertion never matched. Replace with file_name()
comparison which is platform-agnostic.
2. set_var("HOME", ...) doesn't redirect dirs::home_dir() on Windows;
that function reads USERPROFILE on Windows. Set both env vars so the
test redirects portably.
Same HomeGuard pattern in tools/recall_archive.rs tests was passing on
Windows by luck (each test uses a unique session_id, so writes to the
real ~/.deepseek/sessions/ didn't clash) but was polluting the user's
home directory in CI. Mirror the dual-env-var fix there too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run `cargo fmt --all` after the four redesign sub-areas land to settle
attribute placement (`#[allow(dead_code)]` lives after doc comments,
not between them — interleaving was splitting docs from items).
Inline the trailing `let dom = …; dom` in `nearest_ansi16` to satisfy
clippy::let_and_return.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub-area #3 of the v0.6.6 UI redesign (issue #121).
Introduces `crates/tui/src/tui/widgets/tool_card.rs` — a small,
self-contained vocabulary module that owns:
- `ToolFamily` enum (Read / Patch / Run / Find / Delegate / Fanout /
Think / Generic) and the verb-glyph + label per family
(▷ read, ◆ patch, ▶ run, ⌕ find, ◐ delegate, ⋮⋮ fanout, … think)
- `tool_family_for_title` — maps the legacy header titles
(`"Shell"`, `"Patch"`, `"Workspace"`, `"Search"`, `"Diff"`, `"Image"`)
to a family, so existing call sites pick up the new glyph without
re-architecture
- `tool_family_for_name` — maps actual tool names
(`agent_spawn`, `apply_patch`, etc.) for `GenericToolCell`, which
shares the catch-all `"Tool"` title across every model-facing tool
- `CardRail` + `rail_glyph` — the `╭ │ ╰` rail vocabulary, declared
here so any future per-card refactor has the matching glyphs
Wires the verb glyph + label into `render_tool_header` and adds a
`render_tool_header_with_family` overload so `GenericToolCell` can
route by tool name rather than the generic title. The header now reads
`<spinner> <verb-glyph> <verb> <state>` instead of
`<spinner> <Title-Case-Word> <state>`.
Existing parity tests for ExecCell / PlanUpdate are updated to assert
against the new header structure (verb + glyph) — the colour wiring is
unchanged. New tests pin the verb-glyph format end-to-end:
`agent_spawn` → `◐ delegate`, exec → `▶ run`.
Spinner cadence (TOOL_STATUS_SYMBOL_MS = 720 ms) is unchanged — the
spec already matched.
Deferred to a follow-up: full per-card rail (`╭ │ ╰`) refactor that
threads `CardRail` through every cell render path. The vocabulary is
in place; the layout pass is the next bite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub-area #5 of the v0.6.6 UI redesign (issue #121).
Spout strip:
- Replace box-drawing arc cups (`╭───╮`) with paired crests (`⌒‿`) over
the existing water-surface (`─`). Crests read as gentle ripples instead
of hard architectural arches — calmer eye-feel.
- Two crests at independent cadences: A advances every 4 ticks (~600 ms),
B every 6 ticks (~900 ms). Phase jitter every 17 ticks (~2.5 s) keeps
the pattern from settling into a strict beat.
- Frame counter cadence in `ui.rs` retimed from 80 ms to 150 ms so the
4×6×17 tick math lands on the spec'd timings.
Footer left cluster consolidates "what costs you what":
- Cost chip moves from the right-hand parade to the left, between model
and status: `mode · model · cost · status`.
- Priority drop is now status → cost → truncate model → mode-only. Cost
outranks status because it's steady info; status is a transient signal.
- Right cluster shrinks to coherence / agents / replay / cache.
Tests: existing strip determinism / position-advances tests are retuned
for the new tick math (12-tick window covers both crests). New tests pin
the cost slot's order on wide widths and confirm cost survives status
drop in the priority cascade.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
While a turn is running, Esc with non-empty composer input now steers:
the typed text is captured, the in-flight HTTP request is cancelled,
and on TurnComplete::Interrupted every accumulated steer is merged
into a single fresh user message that re-enters the engine. Empty-input
Esc still cancels exactly as before.
State machine. SubmitDisposition::{Immediate, Queue, Steer} replaces
the implicit if/else in submit_or_steer_message; truth table preserves
the offline_mode + busy fallback path. App.pending_steers /
rejected_steers / submit_pending_steers_after_interrupt back the new
flow and the queue-visibility widget.
Partial save. Deliberate divergence from openai/codex which discards
on abort: V4 thinking is expensive, so the streaming Assistant cell is
tagged '[interrupted] …' (or '[interrupted]' when nothing streamed yet)
and its spinner is flipped off. The TurnComplete handler also calls the
helper so Ctrl+C / network failures get the same treatment, idempotent
with the optimistic call in the Esc handler.
Queue visibility. PendingInputPreview already supported all three
buckets; build_pending_input_preview now populates pending_steers and
rejected_steers alongside queued_messages. rejected_steers stays empty
under today's engine paths (no rejection signal yet) but renders if/when
populated.
Recovery. If TurnComplete arrives with Failed instead of Interrupted
while pending_steers is non-empty, the steers are demoted to the
visible queue so they're not silently lost.
Tests. 13 new app-level units cover the disposition truth table,
push/drain semantics, double-Esc idempotency, and the partial-save
helper. 11 new ui-level units cover Esc-action routing, slash-menu
priority, whitespace-only input handling, merge_pending_steers (empty
/ single / multi / skill-instruction), and the three-bucket preview.
Closes#122.
Sub-area #2 of the v0.6.6 UI redesign (issue #121).
Reasoning is the only deliberately-warm element in the redesigned
transcript. The treatment makes that visible:
- Header opener becomes `…` (slow exhale) instead of the spinner glyph
- Body left rail switches from solid `▏` to dashed `╎` so it visibly
differs from message body and tool output rails
- Body text carries an italic modifier
- Body lines tint with `palette::reasoning_surface_tint(depth)` — 12%
blend of SURFACE_REASONING (#362C1A) over DEEPSEEK_INK. ANSI-16
terminals get no bg (the named-color mapping would distort the warm)
- A trailing `▎` cursor in ACCENT_REASONING_LIVE follows the most recent
body line during streaming, suppressed under low_motion
Wires up palette helpers from the prior commit: `ColorDepth::detect`,
`reasoning_surface_tint`, `blend`. SURFACE_REASONING is no longer
dead-coded. The unused `thinking_symbol` helper is removed since the new
header doesn't spin.
Tests: dashed rail and italic body land on every body line; streaming
cursor appears only when motion is allowed; collapsed-summary affordance
keeps working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces lossy summarization compaction with a checkpoint-restart
architecture (#124). At 110K cumulative tokens (per V4's 128K retrieval
elbow) the engine runs a briefing turn, archives the cycle to JSONL at
~/.deepseek/sessions/<id>/cycles/<n>.jsonl, then resets the in-memory
buffer to a fresh context: original system prompt + structured state
(plan/todos/working-set/sub-agents) + the model-curated <carry_forward>
briefing (~3K token cap, hard-bounded).
The compaction summarizer is now off by default. Per-model thresholds in
[cycle.per_model] let operators tune deepseek-v4-pro vs -flash separately.
Phase guard in should_advance_cycle blocks mid-tool/stream/approval boundaries;
engine only invokes at clean turn-completed events. Sub-agents are not
awaited — their handles are captured in the structured-state block so the
new cycle sees them still running.
Adds the recall_archive tool (#127) — BM25 over message text in archived
cycles, top-N hits with cycle/index/excerpt. Always-loaded across modes
via should_default_defer_tool so the agent doesn't need ToolSearch to
discover it. Children inherit it via with_full_agent_surface.
UI surfaces:
- /cycles, /cycle <n>, /recall <query> slash commands
- Sidebar shows cycle counter once a boundary fires
- CycleAdvanced engine event carries the full briefing so the UI can
populate app.cycle_briefings for /cycle <n>
- runtime_threads schema bumped to v2 (cycle.advanced events appear in
the durable timeline; load rejects future versions)
Tests: 21 cycle_manager + 13 recall_archive + 4 commands::cycle.
All 1168 workspace tests pass. Three parity gates pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub-area #1 of the v0.6.6 UI redesign (issue #121).
User cells now lead with `▎` (solid bar, no animation — input is finished).
Assistant cells lead with `●`. While streaming, the bullet pulses on a
2-second cycle between 30%..100% brightness via `palette::pulse_brightness`;
once a turn completes the bullet sits at full DEEPSEEK_SKY so finished
history reads as solid.
Honors `low_motion`: the pulse is suppressed and the glyph holds full
brightness regardless of streaming state. Pager / clipboard exports
(`transcript_lines`) also skip the pulse so screenshots are stable.
Existing pager titles in `ui.rs` (`history_cell_to_text`) keep the literal
"You" / "Assistant" wording — those drive the modal title bar and read
better as words than as glyphs.
Tests: glyphs replace the literal labels in both User and Assistant cells;
streaming pulse demonstrably dips below source brightness; idle and
low_motion both pin to source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the foundation for the v0.6.6 UI redesign (issue #121):
- ColorDepth enum with detect() reading COLORTERM/TERM
- adapt_color / adapt_bg gates for ANSI-16 terminals (drops bg tints
rather than coarse-mapping them, which would distort the palette)
- blend(fg, bg, alpha) for alpha compositing on RGB
- reasoning_surface_tint() — 12% blend over DEEPSEEK_INK (None on ANSI-16)
- pulse_brightness(color, now_ms) — 30%..100% sine swing on a 2s cycle
- nearest_ansi16() helper for legacy-terminal foreground mapping
Helpers carry #[allow(dead_code)] until the per-area redesign sub-tasks
wire them in (speaker pulse, reasoning treatment, footer cost cluster).
Tests cover envelope bounds and brand-color routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ClientError, StreamError, and their impl blocks in error_taxonomy.rs
- ApprovalCache in approval_cache.rs (pending #66 follow-up wiring)
- Legacy prompt constants in prompts.rs (backward compat)
- with_mcp_tools and McpToolAdapter in registry.rs (pending MCP migration)
- Fix rlm_query → rlm in when_not_to_use_sections_present test
- Wire 120 FPS FrameRateLimiter into run_event_loop via
time_until_next_draw + mark_emitted
- Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL
- Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode
- Add StreamingState::set_low_motion() to propagate to all block policies
- Tool spinner already freezes on first frame when low_motion is set
TODO_BACKEND.md §3, TODO_FIXES.md #4
v0.6.6 — sub-agents inherit the parent's full tool registry,
auto-approve, respect a depth cap, and propagate cancellation.
Adds optional cwd to agent_spawn for parallel-worktree dispatch.
Schema-ready for roles (full library lands in 0.6.7).
Changes:
- New ToolRegistryBuilder::with_full_agent_surface(...) shared by parent and child
- SubAgentToolRegistry::new refactored to use shared builder; per-type
allowlist becomes advisory
- SubAgentRuntime gains auto_approve, spawn_depth, max_spawn_depth, cancel_token
- Depth check at spawn entry; cancellation cascade via CancellationToken::child_token()
- <deepseek:subagent.done> sentinel emitted on child completion
- cwd: Option<PathBuf> on agent_spawn with workspace-boundary validation
- Stream wall-clock cap bumped to 30 min (was 300s)
- max_spawn_depth configurable via EngineConfig (default 3)
- Version bump to 0.6.6
Closes#99.