- README: add a "Publishing your own skill" section explaining the
`github:owner/repo` install path, the multi-skill `skills/<name>/`
layout, and how to submit to the curated registry.
- config.example.toml: document `[skills] registry_url` /
`max_install_size_bytes` next to the existing `[network]` section so
users see the network-gate dependency in context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slash-command surface for the community-skill installer:
- `/skill install <github:owner/repo|https://...|<registry-name>>` parses
the spec via `InstallSource::parse`, calls `install_with_registry`, and
surfaces `NeedsApproval`/`NetworkDenied` with actionable messages
pointing at `[network]` config (we deliberately don't dispatch a modal
from the sync slash-command path; the underlying installer returns the
outcome so a future approval wiring can reuse it).
- `/skill update <name>` re-fetches and prints "no upstream change" when
the checksum matches.
- `/skill uninstall <name>` and `/skill trust <name>` both refuse to
touch system skills (no `.installed-from` marker).
- `/skills --remote` (or `/skills remote`) fetches the curated registry
through the same network gate and prints `name — description (source)`.
Internals:
- Sub-command dispatch happens in `run_skill` before activation lookup,
so a user can't accidentally activate a skill literally named
`install`. Async install/update/uninstall are plumbed through
`tokio::task::block_in_place` + `Handle::current().block_on`, matching
the existing pattern in `commands/cycle.rs`.
- `installer_settings` loads `Config` on demand — `App` doesn't carry a
`Config` reference, and the cost of a single TOML parse is negligible
next to the network round-trip the install will make.
Config:
- New `[skills]` section in both `crates/tui/src/config.rs::Config` and
the workspace `crates/config/src/lib.rs::ConfigToml` with
`registry_url` (default: bundled raw GitHub index) and
`max_install_size_bytes` (default: 5 MiB).
- `merge_config` propagates the new field; default impls cover the
unset case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `crates/tui/src/skills/install.rs` — async installer that pulls
user-authored skills from GitHub repos, raw tarball URLs, or a curated
`index.json` registry. The whole pipeline is gated by the per-domain
`NetworkPolicy` (#135), validated against path-traversal / size / symlink
attacks before any bytes hit the destination, and atomic-renamed into place
so a half-installed skill cannot survive a failure mid-extract.
Public surface:
- `InstallSource::{GitHubRepo,DirectUrl,Registry}` with `parse(spec)`.
- `install` / `install_with_registry` returning
`InstallOutcome::{Installed,NeedsApproval,NetworkDenied}`.
- `update` / `update_with_registry` returning
`UpdateResult::{NoChange,Updated,NeedsApproval,NetworkDenied}` — uses a
SHA-256 over the downloaded tarball to short-circuit no-op fetches.
- `uninstall` / `trust` — both refuse to touch directories without an
`.installed-from` marker, so the bundled `skill-creator` system skill is
protected.
- `fetch_registry` — typed loader for the curated `index.json`.
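A minimal sketch of how a spec string might map onto the three `InstallSource` variants listed above. The variant payloads, the `Option` return type, and the rule that a registry name contains no `/`, `:`, or whitespace are assumptions for illustration; the real `parse` signature is not shown in this message.

```rust
/// Hypothetical mirror of the three source kinds named in the public surface.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InstallSource {
    GitHubRepo { owner: String, repo: String },
    DirectUrl(String),
    Registry(String),
}

impl InstallSource {
    /// Parse a user-supplied spec; returns `None` for malformed input.
    pub fn parse(spec: &str) -> Option<Self> {
        let spec = spec.trim();
        if spec.is_empty() {
            return None;
        }
        if let Some(rest) = spec.strip_prefix("github:") {
            // Require exactly `owner/repo`, with both halves non-empty.
            let (owner, repo) = rest.split_once('/')?;
            if owner.is_empty() || repo.is_empty() || repo.contains('/') {
                return None;
            }
            return Some(InstallSource::GitHubRepo {
                owner: owner.to_string(),
                repo: repo.to_string(),
            });
        }
        if spec.starts_with("https://") {
            return Some(InstallSource::DirectUrl(spec.to_string()));
        }
        // Assumption: anything else is a bare registry name.
        if spec.contains('/') || spec.contains(':') || spec.contains(char::is_whitespace) {
            return None;
        }
        Some(InstallSource::Registry(spec.to_string()))
    }
}
```

Rejecting plain `http://` specs here is a choice made for the sketch, not something the commit states.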
Validation hard rules (each covered by an integration test):
- `..` segments and absolute paths in tar entries are rejected.
- Symlinks / hardlinks in tar entries are rejected outright.
- Uncompressed total size is bounded by `max_size` (default 5 MiB).
- SKILL.md must exist at the archive root or under `skills/<name>/`.
- Frontmatter must carry both `name` and `description`.
- `install` with an existing destination requires `update = true`.
- `update` re-fetches and only replaces the on-disk install when the
checksum changes; no-change paths skip the rename entirely.
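The path and size rules above can be sketched with the standard library alone. The function names below are hypothetical and the real installer works on `tar` entries directly; this only illustrates the checks themselves.

```rust
use std::path::{Component, Path};

/// Returns true when an archive entry path is safe to join onto the
/// destination directory: no `..`, no root, no drive prefix, not empty.
pub fn is_safe_entry_path(path: &Path) -> bool {
    let mut saw_normal = false;
    for component in path.components() {
        match component {
            Component::Normal(_) => saw_normal = true,
            Component::CurDir => {}
            // `..`, a leading `/`, or a Windows prefix could escape the destination.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return false,
        }
    }
    saw_normal
}

/// Returns true when the summed uncompressed entry sizes stay within `max_size`.
/// Uses checked addition so a hostile header cannot overflow the counter.
pub fn within_size_budget(entry_sizes: &[u64], max_size: u64) -> bool {
    let mut total: u64 = 0;
    for &size in entry_sizes {
        total = match total.checked_add(size) {
            Some(t) => t,
            None => return false,
        };
        if total > max_size {
            return false;
        }
    }
    true
}
```

Running both checks over the archive headers before extracting anything is what lets a rejection happen "before any bytes hit the destination".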
Adds `tar`, `flate2`, and `sha2` to `crates/tui/Cargo.toml` and propagates
the resulting lockfile drift to `Cargo.lock`.
Tests: 11 colocated unit tests in `install.rs` + 11 integration tests in
`crates/tui/tests/skill_install.rs` driving a `tiny_http`-based server so
the network gate, download cap, validation pipeline, and atomic rename
all run end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Connects the new BacktrackState to the live UI:
- App: holds a `backtrack: BacktrackState` and a new
`truncate_history_to(new_len)` helper that keeps `tool_cells`,
`tool_details_by_cell`, and the sub-agent card index consistent.
- live_transcript: gains a `Mode::BacktrackPreview { selected_idx }`
that highlights the Nth-from-tail HistoryCell::User with a `▶` marker
and reverse-video styling. Cache stays valid across mode flips —
decoration is applied post-wrap. Left/Right/Enter/Esc emit new
`ViewEvent::Backtrack{Step,Confirm,Cancel}` events.
- ui.rs: routes Esc through `BacktrackState::handle_esc` only when
no popup is open and not streaming, opens the preview overlay on
the second Esc, and on confirm trims `app.history` /
`app.api_messages` and refills the composer with the dropped user
input. Streaming and existing popup paths preserve their original
Esc behaviour.
- keybindings: documents the `Esc Esc` chord in the help catalog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces `tui::backtrack::BacktrackState` — a small Inactive/Primed/
Selecting state machine for the two-step Esc chord. The module owns
nothing beyond its phase enum; transcript snapshots, popup detection,
and fork side-effects all stay in the UI layer so the state machine is
trivially unit-testable.
`handle_esc(total_user_messages)` returns one of `None | Prime |
Cancel | OpenOverlay`, `step(Direction)` walks the selection in
`Selecting`, and `confirm()` yields the depth-from-tail and resets to
`Inactive`. 15 unit tests cover every transition including bounds
clamping, empty-transcript short-circuit, and defensive Esc routing.
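A rough sketch of the three-phase machine described above. The type names `EscAction` and `Direction`, the internal `Phase` layout, and the exact transition table (first Esc primes, second opens the overlay, Esc while selecting cancels) are assumptions inferred from this message, not the module's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscAction { None, Prime, Cancel, OpenOverlay }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction { Older, Newer }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Inactive,
    Primed,
    // `depth` is the selected depth-from-tail; `total` bounds the selection.
    Selecting { depth: usize, total: usize },
}

#[derive(Debug)]
pub struct BacktrackState { phase: Phase }

impl BacktrackState {
    pub fn new() -> Self { BacktrackState { phase: Phase::Inactive } }

    pub fn handle_esc(&mut self, total_user_messages: usize) -> EscAction {
        // Empty transcript: nothing to backtrack to, so short-circuit.
        if total_user_messages == 0 {
            self.phase = Phase::Inactive;
            return EscAction::None;
        }
        let (next, action) = match self.phase {
            Phase::Inactive => (Phase::Primed, EscAction::Prime),
            Phase::Primed => (
                Phase::Selecting { depth: 0, total: total_user_messages },
                EscAction::OpenOverlay,
            ),
            Phase::Selecting { .. } => (Phase::Inactive, EscAction::Cancel),
        };
        self.phase = next;
        action
    }

    /// Moves the selection, clamped to `0..total`; a no-op outside `Selecting`.
    pub fn step(&mut self, dir: Direction) {
        if let Phase::Selecting { depth, total } = &mut self.phase {
            *depth = match dir {
                Direction::Older => (*depth + 1).min(total.saturating_sub(1)),
                Direction::Newer => depth.saturating_sub(1),
            };
        }
    }

    /// Yields the selected depth-from-tail (if any) and resets to `Inactive`.
    pub fn confirm(&mut self) -> Option<usize> {
        let out = match self.phase {
            Phase::Selecting { depth, .. } => Some(depth),
            _ => None,
        };
        self.phase = Phase::Inactive;
        out
    }
}
```

Because the struct holds only its phase, each transition can be tested without a terminal or transcript.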
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `RuntimeThreadManager::fork_at_user_message(id, depth_from_tail)` —
a sibling of the existing `fork_thread` that drops every turn from the
Nth-from-tail user message onward and returns the dropped user input so
the caller can pre-populate the composer.
The existing `fork_thread` is left untouched. The new helper mirrors its
copy loop but stops short of the cutoff turn, emitting a
`thread.forked` event with backtrack provenance fields. Includes unit
tests covering depth=0, depth=1, out-of-range error, and source-thread
non-mutation.
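The cutoff rule can be illustrated on a flat message list. This is a simplified sketch: `Role`, `Message`, and the free-function form are hypothetical, the real helper operates on thread turns inside `RuntimeThreadManager`, and treating `depth_from_tail = 0` as the most recent user message is an assumption.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Role { User, Assistant }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message { pub role: Role, pub text: String }

/// Returns the kept prefix and the dropped user input, leaving `source` untouched.
pub fn fork_at_user_message(
    source: &[Message],
    depth_from_tail: usize,
) -> Result<(Vec<Message>, String), String> {
    // Positions of user messages, oldest first.
    let user_positions: Vec<usize> = source
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == Role::User)
        .map(|(i, _)| i)
        .collect();
    if depth_from_tail >= user_positions.len() {
        return Err(format!(
            "depth {} out of range ({} user messages)",
            depth_from_tail,
            user_positions.len()
        ));
    }
    // Everything from the selected user message onward is dropped.
    let cut = user_positions[user_positions.len() - 1 - depth_from_tail];
    Ok((source[..cut].to_vec(), source[cut].text.clone()))
}
```

Copying the prefix into a new `Vec` rather than truncating in place mirrors the source-thread non-mutation property the tests check.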
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- crates/tui/src/tui/ui.rs — new Alt+V/Alt+Shift+V arm next to Alt+A/Y/P family, no empty-input gate
- crates/tui/src/tui/history.rs — 5 hint strings + 4 test assertions updated to "Alt+V for details"
- crates/tui/src/tui/keybindings.rs — entry under Submission so ? overlay lists it
- Bare 'v' handler unchanged (legacy muscle memory)
Adds `integration_mock_llm.rs` covering the LlmClient trait surface:
- streaming turn loop (text deltas + finish reason)
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the
HTTP 400 path that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multiple tool calls in one turn preserve event ordering
- compaction-style non-streaming `create_message`
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request
Four full-engine tests are `#[ignore]`-marked as BLOCKED on the engine
refactor from concrete `Option<DeepSeekClient>` to `Arc<dyn LlmClient>`.
Once that wiring lands the ignored tests light up with no mock changes.
Adds:
- `tests/support/llm_client.rs` mirrors the trait so the mock can be
brought into the integration test via `#[path]` without dragging in
the rest of the binary's module tree
- `tests/fixtures/.gitkeep` so the `eval --record` output directory
is tracked in the repo
- `tests/README.md` documents both the trait-level mocking strategy
and the `--record` fixture flow
- `record_flag_writes_one_jsonl_line_per_step` in `eval_harness.rs`
exercises the new `--record` flag end-to-end
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a queue-driven `MockLlmClient` that implements the `LlmClient` trait
by replaying canned per-turn `StreamEvent` vectors and capturing every
outgoing `MessageRequest`. The mock lives at the trait boundary so it
stays decoupled from the concrete reqwest plumbing inside `DeepSeekClient`,
and surfaces builders (`canned::*`) for the common event shapes (text
delta, thinking delta, tool_use start, input JSON delta, message delta).
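The queue-driven replay idea, reduced to a synchronous sketch. The real `LlmClient` trait, `StreamEvent`, and `MessageRequest` are richer (and presumably async); the simplified definitions below are stand-ins invented for illustration.

```rust
use std::collections::VecDeque;

// Simplified stand-ins for the real types; only the shape matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamEvent { TextDelta(String), MessageStop }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MessageRequest { pub prompt: String }

pub trait LlmClient {
    fn stream(&mut self, request: MessageRequest) -> Result<Vec<StreamEvent>, String>;
}

/// Replays one canned event vector per turn and records every request it sees.
#[derive(Default)]
pub struct MockLlmClient {
    turns: VecDeque<Vec<StreamEvent>>,
    pub captured: Vec<MessageRequest>,
}

impl MockLlmClient {
    pub fn push_turn(&mut self, events: Vec<StreamEvent>) {
        self.turns.push_back(events);
    }
}

impl LlmClient for MockLlmClient {
    fn stream(&mut self, request: MessageRequest) -> Result<Vec<StreamEvent>, String> {
        // Capture first so assertions can inspect a request even on exhaustion.
        self.captured.push(request);
        self.turns
            .pop_front()
            .ok_or_else(|| "mock queue exhausted".to_string())
    }
}
```

A test queues one vector per expected turn, drives the code under test, then asserts on `captured`.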
Wires a new `--record <DIR>` flag into `deepseek eval` that appends one
JSON Lines fixture line per step to `<DIR>/<scenario>.jsonl`. The format
is documented at the top of `eval.rs` and is the storage shape the mock
will replay from.
`crates/tui/src/llm_client.rs` becomes `crates/tui/src/llm_client/mod.rs`
to host the new submodule cleanly. The trait shape is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threads the optional `NetworkPolicyDecider` from `EngineConfig` through to
`ToolContext.network_policy` and `McpPool::with_network_policy`. Each gate
point follows the same pattern: extract the host, call `decider.evaluate`,
then `Allow` proceeds, `Deny` returns a structured permission-denied error,
and `Prompt` falls through to the same denial with a hint pointing to
`/network allow <host>` (full modal flow lands in a follow-up).
* `fetch_url` — gates on the parsed URL host.
* `web_search` — gates DuckDuckGo (`html.duckduckgo.com`) and the Bing
fallback (`www.bing.com`) independently so a deny on one engine doesn't
silently let the other through.
* MCP — only the HTTP/SSE transport is gated; STDIO MCP servers are
unaffected. `McpConnection::connect_with_policy` replaces the old
`connect` (no external callers existed).
The session cache short-circuits `evaluate` once a host is approved, so
the existing `approve_session` hook is enough to wire the prompt-once
flow when the approval modal lands.
`NetworkPolicyDecider::with_default_audit` materializes the auditor at
`~/.deepseek/audit.log` when the config has `audit = true`.
Includes one tool-level test asserting `fetch_url` denies a blocked host
through the policy gate.
Adds the `[network]` table to both the workspace config crate (`ConfigToml`)
and the live tui config (`Config`), plus a documented example block in
`config.example.toml`. Schema:
```toml
[network]
default = "prompt" # allow | deny | prompt
allow = ["api.deepseek.com", "github.com"]
deny = []
audit = true
```
`NetworkPolicyToml::into_runtime()` builds a runtime `NetworkPolicy` so the
engine can construct a `NetworkPolicyDecider` without reaching across crate
boundaries. Defaults preserve pre-v0.7.0 behavior: when the section is
absent, no policy is enforced.
Introduces `network_policy::{Decision, NetworkPolicy, NetworkPolicyDecider,
NetworkAuditor, NetworkSessionCache, NetworkDenied}` for gating outbound
network calls.
Deny-wins precedence: a host listed in both `allow` and `deny` is denied.
Subdomain wildcard via leading-dot entries (`.example.com` matches
`api.example.com` but not the apex). Audit log writes one plaintext line
per terminal decision to `~/.deepseek/audit.log` in the format
`<RFC3339> network <host> <tool> <Allow|Deny|Prompt-Approved|Prompt-Denied>`.
Approve-once-for-session caching is implemented in `NetworkSessionCache`;
`approve_persistent` mutates the policy's allow list so callers can write
back to config later.
19 unit tests cover deny-wins precedence, subdomain matching, audit
logging, session-cache short-circuit, and `NetworkDenied` shape.
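The matching rules (deny wins, leading-dot entries match subdomains but not the apex) can be sketched as follows. The struct layout, the `default_decision` field name, and case-insensitive comparison are assumptions; the session cache and audit log are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision { Allow, Deny, Prompt }

pub struct NetworkPolicy {
    pub default_decision: Decision,
    pub allow: Vec<String>,
    pub deny: Vec<String>,
}

/// `.example.com` matches strict subdomains only; any other entry must equal the host.
fn entry_matches(entry: &str, host: &str) -> bool {
    // Assumption: hosts are compared case-insensitively.
    let entry = entry.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    if entry.starts_with('.') {
        host.len() > entry.len() && host.ends_with(&entry)
    } else {
        host == entry
    }
}

impl NetworkPolicy {
    pub fn evaluate(&self, host: &str) -> Decision {
        // Deny is checked first, so a host on both lists is denied.
        if self.deny.iter().any(|e| entry_matches(e, host)) {
            return Decision::Deny;
        }
        if self.allow.iter().any(|e| entry_matches(e, host)) {
            return Decision::Allow;
        }
        self.default_decision
    }
}
```

Keeping the leading dot inside the suffix comparison prevents `.example.com` from matching `badexample.com`.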
Adds first-class keyring management on the dispatcher CLI and wires
the TUI to read its DeepSeek key through the same Secrets façade.
Subcommands:
* `auth set --provider <name>` writes to the OS keyring; prompts
on stdin without echo, never prints the key, never touches
`config.toml`. Supports `--api-key` and `--api-key-stdin`.
* `auth get --provider <name>` reports `set` / `not set` plus the
resolving layer (keyring / env / config-file). Never prints the
value.
* `auth clear --provider <name>` deletes from keyring and from any
legacy plaintext slot in `config.toml` for parity.
* `auth list` prints a table of all known providers and
whether each layer holds a key. Non-revealing.
* `auth migrate [--dry-run]` reads `api_key` (root + per-provider
blocks) from `config.toml`, writes them to the keyring, then
strips the entries from disk. Idempotent.
* `auth status` is expanded to also report the active
keyring backend and per-provider keyring state.
`doctor` now prints `keyring backend: ...` plus per-provider
`keyring=yes/no, env=yes/no` lines and points users at
`deepseek auth set` when no key resolves.
`Config::deepseek_api_key()` in the TUI is rewritten to consult
`Secrets::auto_detect()` first (keyring -> env), then fall back to
the existing TOML slots with a deprecation warning. Error messages
now lead with `deepseek auth set --provider <name>`.
5 new unit tests cover argument parsing for the new subcommands and
end-to-end auth set/clear/migrate behaviour against an
`InMemoryKeyringStore`, verifying that no plaintext key ever lands
in `config.toml`.
Verified manually on macOS:
    $ deepseek auth set --provider deepseek --api-key-stdin
    $ security find-generic-password -s deepseek -a deepseek
    # entry present
    $ deepseek auth migrate
    # api_key lines stripped from ~/.deepseek/config.toml
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes `ConfigToml::resolve_runtime_options` through the new
`deepseek_secrets::Secrets` façade so API keys are read from the OS
keyring before any environment variable, with the existing
plaintext-config layer kept as a deprecated last resort. The
precedence is now:
    CLI flag -> keyring -> env -> config-file
Reads of an `api_key` value from `~/.deepseek/config.toml` now emit
a one-time `tracing::warn!` directing users to
`deepseek auth set` / `deepseek auth migrate`.
`resolve_runtime_options_with_secrets` is exposed for tests and
process-level injection (the `cfg(test)` default uses an in-memory
store so unit tests never touch the real OS keychain). The
nvidia-nim provider keeps its `DEEPSEEK_API_KEY` env fallback for
back-compat. New tests cover keyring > env > config-file precedence
end-to-end.
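The `CLI flag -> keyring -> env -> config-file` ordering amounts to a first-match walk over the layers. A sketch with hypothetical names follows; the real resolver goes through the `Secrets` façade, and treating a blank value as unset is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeySource { CliFlag, Keyring, Env, ConfigFile }

/// Returns the first non-empty key in precedence order, with the layer it came from.
pub fn resolve_api_key(
    cli_flag: Option<&str>,
    keyring: Option<&str>,
    env: Option<&str>,
    config_file: Option<&str>,
) -> Option<(String, KeySource)> {
    let layers = [
        (cli_flag, KeySource::CliFlag),
        (keyring, KeySource::Keyring),
        (env, KeySource::Env),
        (config_file, KeySource::ConfigFile),
    ];
    layers.iter().find_map(|&(value, source)| {
        value
            .map(str::trim)
            // Assumption: a blank value counts as "not set" and falls through.
            .filter(|v| !v.is_empty())
            .map(|v| (v.to_string(), source))
    })
}
```

Returning the winning `KeySource` lets a caller emit the deprecation warning only when the key came from `ConfigFile`.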
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `deepseek-secrets` crate with the OS keyring backend,
in-memory store for tests, and a JSON-on-disk fallback for
headless environments. The Secrets façade collapses keyring -> env
into a single resolver; callers layer on CLI flags above and TOML
config below to preserve the keyring -> env -> config-file precedence.
* `KeyringStore` trait + `DefaultKeyringStore` (keyring 3.6 with
per-platform native features).
* `InMemoryKeyringStore` for unit tests.
* `FileKeyringStore` writes ~/.deepseek/secrets/secrets.json with
mode 0600 on unix; rejects world-readable files at read time.
* `Secrets::auto_detect` probes the OS keyring and falls back to
the file store on headless Linux.
* 9 unit tests covering round-trips, precedence, and 0600 perms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds crates/tui/src/tui/notifications.rs with Method enum (Auto/Osc9/Bel/Off),
notify_done / notify_done_to helpers, tmux DCS passthrough, and 9 unit tests.
Wires the hook at the TurnComplete event in tui/ui.rs so turns >= threshold_secs
(default 30 s) emit an escape to stdout; method auto-detects iTerm.app/Ghostty/
WezTerm for OSC 9 and falls back to BEL. Config exposed under [notifications] in
config.toml and documented in config.example.toml.
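For reference, the escape sequences involved look roughly like this. The sketch omits the `Auto` variant in favour of a separate detection helper, and the `TERM_PROGRAM` strings, function names, and control-character stripping are assumptions rather than what `notifications.rs` actually does.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method { Osc9, Bel, Off }

/// Picks OSC 9 for terminals assumed to support it, otherwise BEL.
/// Assumption: detection keys off a TERM_PROGRAM-style string.
pub fn detect_method(term_program: Option<&str>) -> Method {
    let name = term_program.unwrap_or("").to_ascii_lowercase();
    match name.as_str() {
        "iterm.app" | "ghostty" | "wezterm" => Method::Osc9,
        _ => Method::Bel,
    }
}

/// Wraps a sequence in tmux DCS passthrough: ESC P tmux; <payload> ESC \
/// with every ESC inside the payload doubled.
fn wrap_for_tmux(seq: &str) -> String {
    format!("\x1bPtmux;{}\x1b\\", seq.replace('\x1b', "\x1b\x1b"))
}

/// Builds the bytes to write to stdout for a "turn done" notification.
pub fn notification_sequence(method: Method, message: &str, in_tmux: bool) -> String {
    match method {
        Method::Off => String::new(),
        Method::Bel => "\x07".to_string(),
        Method::Osc9 => {
            // Strip control characters so the message cannot end the sequence early.
            let clean: String = message.chars().filter(|c| !c.is_control()).collect();
            let seq = format!("\x1b]9;{}\x07", clean);
            if in_tmux { wrap_for_tmux(&seq) } else { seq }
        }
    }
}
```

BEL is left unwrapped under tmux in this sketch because tmux handles bells itself.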
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add crates/tui/assets/skills/skill-creator/SKILL.md (ported from
OpenAI codex repo, MIT-licensed; all Codex→deepseek / ~/.codex→~/.deepseek
replacements applied; MIT attribution comment at top).
- Convert crates/tui/src/skills.rs → skills/mod.rs + new skills/system.rs.
install_system_skills() writes the bundled SKILL.md on fresh install or
version bump; respects user-deleted directories; idempotent by design.
Version marker at ~/.deepseek/skills/.system-installed-version.
- Wire install_system_skills() into run_interactive() (main.rs) before the
TUI mounts; errors are non-fatal (logged as warnings).
- Add /skill new as an alias for /skill skill-creator in commands/skills.rs.
- 7 unit tests covering fresh install, idempotence, user-deleted dir, version
bump, uninstall (all pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Alt+V keybinding that opens the tool-details pager regardless of
composer state, fixing the broken "press v for details" hint. Update
all 5 transcript hint strings to show "Alt+V for details". Bare v with
empty composer is preserved for legacy muscle memory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Windows preserves the user-typed `/` when Path::join() ingests a multi-
component string with forward slashes, producing a mixed-separator path
in the rendered <file> block (e.g. `C:\...\.tmpKxj0Pk\nested/deep/file.md`).
The test compared full paths via display(), which mismatched.
Switch to a basename comparison per CLAUDE.md's portability rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed dead_code allow from error_taxonomy.rs. Event::error now carries
an ErrorEnvelope with category + severity instead of (String, bool); all
~13 engine callsites migrated through a small helper API (ErrorEnvelope::transient
/ ::fatal / ::fatal_auth / ::context_overflow / ::network / ::tool / ::classify).
Capacity controller branches on ErrorCategory instead of substring matches.
TUI renders severity with distinct palette tokens via a new HistoryCell::Error
variant. Audit log carries category + severity fields so downstream tooling
can categorize.
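To illustrate branching on a typed category instead of message substrings, here is a heavily simplified sketch. The enum variants, the severity assigned by each constructor, and the `CapacityAction` type are all hypothetical; only the constructor names `network`, `fatal_auth`, and `context_overflow` come from this message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCategory { Network, Auth, ContextOverflow }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Transient, Fatal }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ErrorEnvelope {
    pub category: ErrorCategory,
    pub severity: Severity,
    pub message: String,
}

impl ErrorEnvelope {
    fn new(category: ErrorCategory, severity: Severity, message: impl Into<String>) -> Self {
        ErrorEnvelope { category, severity, message: message.into() }
    }
    // The severities chosen below are assumptions for the sketch.
    pub fn network(message: impl Into<String>) -> Self {
        Self::new(ErrorCategory::Network, Severity::Transient, message)
    }
    pub fn fatal_auth(message: impl Into<String>) -> Self {
        Self::new(ErrorCategory::Auth, Severity::Fatal, message)
    }
    pub fn context_overflow(message: impl Into<String>) -> Self {
        Self::new(ErrorCategory::ContextOverflow, Severity::Fatal, message)
    }
}

/// Hypothetical controller reaction, chosen by category rather than by text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapacityAction { Compact, Backoff, Abort }

pub fn capacity_action(err: &ErrorEnvelope) -> CapacityAction {
    match (err.category, err.severity) {
        (ErrorCategory::ContextOverflow, _) => CapacityAction::Compact,
        (_, Severity::Transient) => CapacityAction::Backoff,
        (_, Severity::Fatal) => CapacityAction::Abort,
    }
}
```

The `match` is exhaustive, so adding a category or severity forces the controller to be revisited at compile time, which substring checks cannot guarantee.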
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pager view gains a sticky_to_bottom mode: scroll-up pauses auto-tail,
scrolling back to bottom resumes it. Wrapped lines cached by
(cell_id, width, revision); revisions bumped on live-cell mutation so
resize doesn't reflow the world.
Ctrl+T toggles; Esc returns. Engine continues streaming while overlay open.
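A sketch of a wrap cache keyed by `(cell_id, width, revision)`. The struct names, the `misses` counter, and the naive fixed-width chunking are illustrative assumptions; the real pager presumably wraps on word boundaries and evicts stale entries.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct WrapKey { pub cell_id: u64, pub width: u16, pub revision: u64 }

pub struct WrapCache {
    entries: HashMap<WrapKey, Vec<String>>,
    /// Number of times wrapping actually ran (exposed for the example).
    pub misses: usize,
}

/// Naive stand-in for real wrapping: split into chunks of `width` chars.
fn wrap(text: &str, width: usize) -> Vec<String> {
    let width = width.max(1);
    let chars: Vec<char> = text.chars().collect();
    chars.chunks(width).map(|c| c.iter().collect()).collect()
}

impl WrapCache {
    pub fn new() -> Self {
        WrapCache { entries: HashMap::new(), misses: 0 }
    }

    /// Returns cached lines, wrapping only when this exact key is new.
    pub fn get_or_wrap(&mut self, key: WrapKey, text: &str) -> &[String] {
        if !self.entries.contains_key(&key) {
            self.misses += 1;
            self.entries.insert(key, wrap(text, key.width as usize));
        }
        &self.entries[&key]
    }
}
```

With the revision in the key, a mutated live cell misses the cache once under its new revision while untouched cells keep hitting their existing entries.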
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port of codex-main's PendingInputPreview pattern. Three semantic buckets
render in the composer area: pending steers (Esc submits), rejected
steers (re-fires at end of turn), queued follow-ups (Alt+Up edits last).
Empty state renders zero rows.
Engine populates the new App.pending_steers / rejected_steers fields
through the steer-submission path; existing queued_messages plumbing
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cards consume the #130 mailbox stream and render live in the transcript:
- DelegateCard: last-3-actions tree for active agent_spawn
- FanoutCard: dot-grid + aggregate stats for agent_swarm / rlm fanout
Sidebar demoted to a navigator (count + role); detail lives in the card.
Engine wires SubAgentRuntime::with_mailbox so the primitive actually flows.
Cards re-bind on session resume via runtime_threads agent_ids.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
StatusItem enum covers all current + new (rate-limit, ctx %, git branch,
last-tool elapsed) items. /statusline opens a multi-select picker with
live preview; selections persist to config.toml under `tui.status_items`.
Default selection mirrors today's footer so upgraders see no change.
Reflowed approval.rs to a full-screen modal with two stakes-based variants:
benign (single-key approve / always) and destructive (explicit confirm).
Variant routing classifies from tool kind + command-safety so destructive
ops never get a muscle-memory accept.
Existing approval tests still pass; new tests cover variant routing + keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1: log full reqwest error chain + headers + bytes-received at decode site
Phase 2: HTTP/2 keepalive settings + tcp keepalive on the reqwest builder
Phase 3: engine transparently retries when stream errors before any content;
surface error on mid-stream failure (no double-bill); stream_errors
threshold relaxed 3 -> 5 with the new keepalive
Phase 4: unit tests for the four classes of stream failure
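The Phase 3 rule (retry only when nothing has been received yet) reduces to a small decision function. The names and the `max_attempts` parameter are hypothetical and do not reflect the engine's real retry accounting.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamFailureAction {
    /// Safe to re-issue the request: the user has seen no output yet.
    Retry,
    /// Content was already delivered (or retries ran out); report the error.
    Surface,
}

/// Decides what to do when the response stream fails.
/// `content_events_received` counts content deltas seen before the error.
pub fn on_stream_error(
    content_events_received: usize,
    attempts_so_far: u32,
    max_attempts: u32,
) -> StreamFailureAction {
    if content_events_received == 0 && attempts_so_far < max_attempts {
        StreamFailureAction::Retry
    } else {
        StreamFailureAction::Surface
    }
}
```

Refusing to retry after the first content event is what avoids the double-bill case: a replayed request would generate, and charge for, output the user has already partly seen.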
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New HelpView modal lists all slash commands with descriptions and all
keybindings, with a live substring filter. Bound to `?` when focus is
outside the composer; Esc / `?` toggles.
Slash commands pull from the existing slash_menu registry; keybindings
pull from a new KeybindingCatalog single-source-of-truth so docs can't
drift from the wired handlers.