Adds riscv64 to build pipelines so CodeWhale ships prebuilt binaries
and npm wrappers for 64-bit RISC-V Linux (glibc) systems.
Changes:
**CI / build**
- release.yml: +2 build matrix entries (codewhale + codewhale-tui for
riscv64gc-unknown-linux-gnu), cross-compilation toolchain step using
a dedicated DEB822-format apt source for ports.ubuntu.com, bundle
step, and release-notes table row.
- nightly.yml: +2 matrix entries, matching cross-compilation setup.
- resolve job: handle workflow_dispatch when the target tag does not
yet exist (fall back to HEAD SHA).
**Packaging**
- npm/codewhale/scripts/artifacts.js: add riscv64 to ASSET_MATRIX
under linux so npm install -g codewhale resolves on RISC-V.
**Docs**
- docs/INSTALL.md: add riscv64 row to supported platforms table;
replace with clearer 'other architectures' wording.
Build strategy: cross-compile from ubuntu-latest (x86_64) using
gcc-riscv64-linux-gnu. The dbus runtime dependency (from the keyring
crate's secret-service backend) is satisfied via ports.ubuntu.com.
PKG_CONFIG_ALLOW_CROSS and a cross-target libdir are set so the
keyring crate finds dbus-1 during cross-compilation.
Docker support for linux/riscv64 is intentionally not added here:
GitHub Actions does not yet provide the infrastructure to build or
emulate riscv64 containers. The Dockerfile changes will follow when
the hosted CI surface supports it.
* docs: v0.8.46 CHANGELOG — platform archives, palette, sub-agents, sandbox, web install, search fixes
Closes#2188
* feat(v0.8.46): quick fixes — palette, model picker Esc, sub-agent sidebar, shell chip, model name casing, CVE bump (#2212)
* fix: bump qs to >=6.15.2 for CVE-2026-8723
Add qs override in feishu-bridge package.json to force transitive
dependency resolution to >=6.15.2, addressing CVE-2026-8723.
Refs: #2198
* fix: Esc in model picker applies last-highlighted choice
Previously Esc reverted to the initial model when the user hadn't
moved the selection. Now Esc always applies the currently highlighted
model and thinking-effort tier, making Esc consistent with Enter.
Also updates the picker footer hint from 'Esc cancel' to 'Esc apply'.
Refs: #2196
* feat: show '⏳ shell running' chip in TUI footer
Adds a footer_shell_chip function that displays a '⏳ shell running'
status chip in the footer's right cluster whenever a foreground shell
command is active via exec_shell. The chip is always visible regardless
of user-configured status items.
Refs: #2194
* feat: auto-collapse finished sub-agents in sidebar
When a sub-agent completes (status = 'done'), its detail lines
(id, steps, duration, progress) are now hidden in the sidebar agents
panel. Only the summary label line is shown, keeping the sidebar
compact. Running agents still show full detail.
Refs: #2195
* feat: refresh Whale dark palette for better contrast
Improve contrast and layer separation in the Whale dark theme:
- Deepen base background for more depth (10,17,32)
- Lighten panel (22,34,56) for clearer distinction from bg
- Lighten elevated surface (36,52,78) for better elevation
- Lighten selection (48,68,100) for clearer selected state
- Boost text hint (138,150,174) and dim (118,130,156) readability
- Brighter border (52,88,145) for better edge definition
- Update tool surface colors for consistency
Refs: #2197
* fix: preserve model name casing in normalize_model_name_for_provider
When the user enters a model name like 'DeepSeek-V4-Flash', the
normalizer was lowercasing it to 'deepseek-v4-flash' via the
canonical_official_deepseek_model_id function. Now the normalizer
preserves the caller's casing when the input already matches a known
model id case-insensitively. Compact aliases like 'deepseek-v4pro'
are still rewritten to 'deepseek-v4-pro'.
Refs: #2109
* feat(web): install download tile with arch detection, SHA256, China mirrors + companion binary fix (#2213)
* fix(web): download both codewhale and codewhale-tui binaries in install snippets
The SNIPPETS map only fetched one binary per platform, causing the
dispatcher to fail with MISSING_COMPANION_BINARY. Every arch now
downloads both codewhale AND codewhale-tui side-by-side.
- macOS/Linux: added second curl + combined chmod/xattr/mv for tui
- Windows: added second Invoke-WebRequest for codewhale-tui.exe
- VERIFY: PowerShell now hashes both binaries; Unix --ignore-missing
covers all present binaries in a single sha256sum pass
* feat(web): add install download tile with arch detection, SHA256, and China mirrors (#2192)
* feat(sandbox/linux): process hardening — PR_SET_DUMPABLE, NO_NEW_PRIVS, RLIMIT_CORE (#2214)
* feat(sandbox/linux): add process hardening module — PR_SET_DUMPABLE, NO_NEW_PRIVS, RLIMIT_CORE (#2183)
* feat(sandbox/linux): seccomp filter + bwrap passthrough
- seccomp: BPF filter whitelisting safe syscalls, denying ptrace/mount/kexec
and other dangerous syscalls. Uses raw BPF instructions via libc prctl to
avoid external dependencies (#2182).
- bwrap: optional bubblewrap passthrough when /usr/bin/bwrap is present
and [sandbox] prefer_bwrap=true in config. Creates read-only rootfs with
write access limited to the working directory (#2184).
- landlock detect_denial extended to recognize seccomp SIGSYS/"Bad system
call" patterns alongside existing Landlock EACCES/EPERM detection.
- SandboxManager gains prefer_bwrap field; set_prefer_bwrap on ShellManager.
- EngineConfig gains prefer_bwrap field, wired through main/ui/runtime_threads.
- Diagnostics now reports bwrap_available and cgroup_version.
- config.example.toml documents the prefer_bwrap key.
Pre-existing clippy fixes picked up in the same build:
- collapsible_if in ui.rs version-check
- cmp_owned in goal.rs test
- consecutive str::replace in normalize_auth_mode
Closes#2182, closes#2184
* docs: add cross-links to issue and PR templates in CONTRIBUTING.md (#2215)
- Link .github/ISSUE_TEMPLATE/bug_report.md and feature_request.md from
the Reporting Issues section
- Link .github/PULL_REQUEST_TEMPLATE.md from the Pull Request Guidelines
section
* feat(release): bundle platform archives with install scripts (#2216)
- Add bundle job to release workflow that creates per-platform archives
(tar.gz for Linux/macOS, .zip for Windows) containing both codewhale
and codewhale-tui binaries plus install scripts
- Create install.bat (Windows) — copies binaries to %USERPROFILE%\bin
- Create install.sh (Unix) — copies binaries to ~/.local/bin
- Windows gets a portable .zip variant without install script
- Release notes updated to promote archives as primary download method
- Individual binaries retained for npm wrapper and scripting
Closes#2193
* fix(web_search): fall back to DuckDuckGo when Bing returns zero results (#2130)
When the configured search provider is Bing and the query returns zero
results (common for technical/compound queries), fall through to the
DuckDuckGo path instead of reporting empty. A provenance message is
surfaced: "Bing returned no results; used DuckDuckGo fallback".
Also adds Security and Code of Conduct cross-links to CONTRIBUTING.md
per the sub-agent renovation (#2203).
* docs: SANDBOX.md threat model + RFCs for persistence and MCP + SandboxExecutor trait
- docs/SANDBOX.md: complete threat model describing each platform's sandbox
(Seatbelt, Landlock, seccomp, process hardening, bwrap, Windows v1).
Covers defense-in-depth layering, config keys, denial detection, limitations.
- docs/rfcs/2189-persistence-sqlite.md: RFC for SQLite migration (drafted by sub-agent)
- docs/rfcs/2190-mcp-modularization.md: RFC for MCP crate split into
protocol/client/server with OAuth support
- crates/tui/src/sandbox/policy.rs: SandboxExecutor trait definition and
SafetyLevel→SandboxPolicyBehavior mapping function with tests
Closes#2180, closes#2186, closes#2189, closes#2190
* feat: sandbox parity tests + remove sub-agent 100-turn cap
- Add sandbox parity tests covering platform detection, denial patterns,
bwrap preference, and policy consistency across modes (#2187)
- Remove arbitrary 100-turn sub-agent cap: DEFAULT_MAX_STEPS changed
from 100 to u32::MAX. Sub-agents now run until they produce a final
text response, are cancelled by the parent, or hit a configured
explicit budget (#2034)
Closes#2187, closes#2034
Rename the canonical binaries:
- `deepseek` → `codewhale` (CLI dispatcher)
- `deepseek-tui` → `codewhale-tui` (TUI runtime)
Both legacy names continue to ship as tiny deprecation shims that print
a one-line warning to stderr and forward argv to the new binary. The
shims are produced by two new `[[bin]]` entries in `crates/cli/Cargo.toml`
and `crates/tui/Cargo.toml` pointing at small source files under
`src/bin/`. They will be removed in v0.9.0.
Touchpoints:
- Cargo bin entries + new shim source files.
- clap `name`/`bin_name`/usage strings flip to `codewhale`.
- Dispatcher's sibling-binary discovery looks for `codewhale-tui` and
reports `codewhale` in its error/help prose. `DEEPSEEK_TUI_BIN` env
var stays — env vars are explicitly anti-scope.
- `update.rs` now downloads `codewhale-*` assets and verifies them
against `codewhale-artifacts-sha256.txt`. Legacy `deepseek-*` assets
and `deepseek-artifacts-sha256.txt` are still produced by the release
matrix so v0.8.40's `deepseek update` keeps working through one
transition release.
- `ci.yml`, `nightly.yml`, `release.yml` updated to build/upload the new
canonical binaries; `release.yml`'s matrix doubles to also ship the
legacy shim binaries so v0.8.40 update clients land on the shim.
- `scripts/release/crates.sh` and `check-versions.sh` updated for the
renamed crate names from R1.
Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail), and `cargo build
--release` produces all four binaries:
- target/release/codewhale (canonical dispatcher)
- target/release/codewhale-tui (canonical TUI)
- target/release/deepseek (legacy shim, forwards to codewhale)
- target/release/deepseek-tui (legacy shim, forwards to codewhale-tui)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a release follow-up job that updates the Homebrew tap from the checksum manifest when a tap token is configured.
The job now skips before checkout/download/update when neither HOMEBREW_TAP_PAT nor RELEASE_TAG_PAT is configured, so missing tap credentials do not fail an otherwise successful release.
Closes#1602.
Co-authored-by: Zhiping <2716057626@qq.com>
Co-authored-by: Oliver-ZPLiu <47081637+Oliver-ZPLiu@users.noreply.github.com>
* fix(config): keep DeepSeek beta endpoint for legacy cn alias
* fix(ci): filter download-artifact to deepseek* pattern
Prevents the release aggregation job from picking up non-binary
artifacts (e.g. Docker .dockerbuild cache layers) that cause the
checksum manifest to include spurious entries and the Release to
carry files it shouldn't.
* fix(tui): enable focus events to restore IME after app-switch
On macOS, switching away (Cmd+Tab) and back suspends the IME compositor.
Without focus-event handling, the TUI never signals readiness to the
terminal, so CJK input methods (Pinyin, Zhuyin, etc.) stop working.
- EnableFocusChange on startup so the terminal reports FocusGained/FocusLost
- Re-push KeyboardEnhancementFlags on FocusGained (some terminals reset
the enhanced keyboard mode on focus-loss)
- DisableFocusChange on shutdown for clean terminal handoff
* chore: cargo fmt
* docs: add DataWhale and DeepSeek to acknowledgments
* docs: fix DeepSeek name etymology in acknowledgments
* fix(tui): recapture viewport on focus restore
* docs: thank DeepSeek and DataWhale bilingually
Closes #944\n\n## Summary\n- mark Docker/GHCR publishing as experimental while the package is not publicly readable\n- align installer and release docs with the live npm/Scoop state\n- keep package-channel verification explicit for release triage\n\n## Test plan\n- ruby -e 'require "yaml"; YAML.load_file(".github/workflows/release.yml"); puts "release.yml ok"'\n- cargo test -p deepseek-tui-cli update::tests::test_asset_matching_accepts_binary_assets_and_rejects_checksums --locked\n- cargo fmt --all -- --check\n- git diff --check origin/main...HEAD\n- CI: Version drift, Lint, Test (ubuntu-latest), Test (macos-latest), Test (windows-latest), npm wrapper smoke
PR #646 imported `MessageBeep` from `windows::Win32::UI::WindowsAndMessaging`,
but in `windows` crate 0.60 the function lives at
`windows::Win32::System::Diagnostics::Debug::MessageBeep` and now takes a
typed `MESSAGEBOX_STYLE` returning `Result<()>`. The wrong import broke
every Windows build (Test, npm wrapper smoke, and the windows-msvc release
matrix entry). Fix the import path, enable the `Win32_System_Diagnostics_Debug`
feature, pass `MESSAGEBOX_STYLE(0)` for MB_OK, and discard the Result.
The v0.8.12 release also tripped on a transient `apt-get update` mirror
sync error on the ubuntu-24.04-arm runner, cascading via fail-fast. Wrap
every apt-get update in CI/release with a 5-attempt retry so flaky
ports.ubuntu.com mirrors no longer take down the binary matrix.
Verified: cargo check --target x86_64-pc-windows-gnu compiles cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The aarch64 deepseek build in release.yml run 25329602631 succeeded in
4m 53s but the rename step failed:
cp: cannot stat 'target/aarch64-unknown-linux-gnu.2.28/release/deepseek'
cargo zigbuild parses `aarch64-unknown-linux-gnu.2.28` by passing
`aarch64-unknown-linux-gnu` to cargo and the `.2.28` glibc minimum to
zig's CC. The cargo target output dir is therefore
`target/aarch64-unknown-linux-gnu/release/`, never the
glibc-versioned form.
v0.8.9 release.yml hard-coded the rust triple in the rename step and
worked. v0.8.10 added `target_zig: <triple>.<glibc>` to the matrix and
switched the rename step to `${{ matrix.target_zig || matrix.target }}`,
which silently became wrong for every zigbuild matrix leg.
This commit:
- Always uses `matrix.target` (rust triple) for the copy source path.
- Adds a defensive `find target -name "${binary}"` debug listing if
the expected binary isn't at the rust-target path, so future
cargo-zigbuild output-dir changes are visible in the build log
rather than just "No such file".
Replace `cargo build` with `cargo zigbuild` for Linux release binaries,
targeting `x86_64-unknown-linux-gnu.2.28` and `aarch64-unknown-linux-gnu.2.28`
so prebuilt binaries run on distributions with glibc ≥ 2.28 (RHEL 8+, CentOS 8+,
TencentOS 3, Debian 10+, Ubuntu 20.04+) instead of requiring glibc ≥ 2.39.
Fixes#555.
Signed-off-by: staryxchen <staryxchen@tencent.com>
Bundles the v0.8.8 stabilization fixes that were already implemented in the
working tree, plus the workflow/doc reconciliation called out in #507.
### Sub-agent runtime fixes
- **#509** Default sub-agent cap raised to 10 (configurable via
`[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The
running-count calculation now ignores non-running, no-handle, and finished
handles so completed agents stop counting against the cap.
- **#510** `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths
that previously held a `Mutex` for inspection now take a read lock,
eliminating the multi-agent fan-out UI freeze.
- **#511** `compact_tool_result_for_context` summarizes `agent_result` /
`agent_wait` payloads before they are folded into the parent context.
- **#512** RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not
`swarm`. Stale "swarm" wording cleaned in docs/comments/tests.
- **#513** (foreground stopgap only) Foreground RLM work is visible in the
Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the
issue stays open with a refined scope.
### TUI / UX fixes
- **#487** Offline composer queue is now session-scoped; legacy unscoped
queues fail closed.
- **#488** Composer Option+Backspace deletes by word; cross-platform key
routing helpers added.
- **#443/#444** Keyboard enhancement flags pop on normal AND panic exit; the
raw-mode startup probe is now bounded by a configurable timeout.
- **#449** Production footer reads statusline colors from `app.ui_theme`
rather than the bespoke palette.
- **#506** `display_path_with_home` no longer mutates `HOME` in tests; the
flake on shared-env CI is gone.
### Self-update / packaging
- **#503** `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`)
instead of the raw Rust constants. The platform-asset selector also rejects
`.sha256` siblings as primary binaries. Tests now live alongside the source
in `mod tests` (the `#[path]`-based integration test was removed because it
duplicated test runs and forced a `pub(crate)` helper that no real caller
used).
- **`Max 5 in flight` wording updated** in `agent_spawn` description,
`prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real
default cap (10) and the configuration knob name.
### CI / release docs (#507)
- Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`,
`publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR.
- `release.yml` build job now allows `parity` to be skipped (it only runs on
tag push), unblocking `workflow_dispatch` reruns. The job still fails
closed on a real parity failure.
- `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the
manual `scripts/release/publish-crates.sh` flow (no automated workflow);
references to the deleted workflows removed.
- `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag →
release.yml chain (without it, the tag is created but `release.yml` does
not fire) and documents the `workflow_dispatch` parity-skip behavior.
### Docs
- `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and
Codex CLI, gap analysis, and recommended implementation order.
### Verification (this branch)
- `cargo fmt --all -- --check` ✓
- `cargo check --workspace --all-targets --locked` ✓
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓
- `cargo test --workspace --all-features --locked` ✓ (1809 + supporting)
- Parity gates ✓ (snapshot, parity_protocol, parity_state)
- `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓
- Lockfile drift guard ✓
- `deepseek doctor --json` clean
- `deepseek eval` (offline harness) success=true, 0 tool errors
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Triggered by a Telegram report from a Chinese user trying to deploy
DeepSeek TUI on a HarmonyOS ARM64 thin-and-light: `npm i -g deepseek-tui`
exited with `Unsupported architecture: arm64 on platform linux` because
v0.8.7 only published x64 Linux artifacts. They worked around it with
`cargo install`, but the README never documented that path for ARM users.
This PR closes that gap on three layers:
- **Release workflow** — add `aarch64-unknown-linux-gnu` to the build
matrix using GitHub's `ubuntu-24.04-arm` runner. v0.8.8 will publish
`deepseek-linux-arm64` and `deepseek-tui-linux-arm64` alongside the
existing x64/macOS/Windows assets, plus add the row to the Release
body's manual-download table.
- **npm wrapper** — uncomment the linux/arm64 row in `ASSET_MATRIX`,
rewrite the `Unsupported architecture/platform` error to print the
full `cargo install deepseek-tui-cli deepseek-tui --locked` recipe
and link to docs/INSTALL.md, and add `DEEPSEEK_TUI_OPTIONAL_INSTALL=1`
so CI matrices that include unsupported platforms can keep running
without a binary.
- **Docs** — new docs/INSTALL.md covering every supported platform,
prebuilt vs. cargo install vs. manual download, cross-compiling x64
-> ARM64 with `cross` or `gcc-aarch64-linux-gnu`, China mirror setup,
and a troubleshooting section for the common arm64, MISSING_COMPANION_BINARY,
and self-update arch-mapping (#503) errors. README and README.zh-CN
now have an explicit Linux ARM64 quickstart pointing at `cargo install`
for v0.8.7 today and `npm i -g` for v0.8.8+; the v0.8.7 known-issue
block is updated to mention both #503 and the missing arm64 prebuilt.
https://claude.ai/code/session_01Fg1FKMtDxVnC4pp6bNBRCS
* fix(pricing): extend V4 Pro 75% discount expiry to 2026-05-31 15:59 UTC
DeepSeek extended the promotional discount past the original 2026-05-05
cutoff. Without this update the TUI would have started showing 4× the
actual billed cost on May 6.
Source: https://api-docs.deepseek.com/quick_start/pricing — "extended
until 2026/05/31 15:59 UTC".
Adds a regression test pinning the new active window so a future revert
to the May 5 date trips the suite immediately.
Closes#267
* chore: remove stale TODO(integrate) markers from already-integrated modules
Five `// TODO(integrate)` comments and one matching "Not yet integrated"
note were misleading anyone grepping for integration work. Each module
is in fact wired up:
- execpolicy/mod.rs → tools/shell.rs:1322 (load_default_policy)
- sandbox/mod.rs → tools/shell.rs:28, main.rs:2647, tui/approval.rs:30
- sandbox/policy.rs → main.rs:2752, tui/approval.rs:30 (SandboxPolicy)
- command_safety.rs → tools/shell.rs:1321, tools/tasks.rs:13,
tools/approval_cache.rs:26
- tui/streaming/mod.rs → tui/app.rs:38 (StreamingState)
The remaining TODO at mcp.rs:1771 covers a separate "wire legacy sync API
into CLI subcommands or remove" decision and is left in place.
Closes#266
* docs(release): add install + dual-binary template to GitHub Release page
Closes#265.
The Release page used the auto-generated commit-title body. New users
hitting the Release page from Twitter / npm-search had no on-page
guidance that the dispatcher (`deepseek`) and the TUI runtime
(`deepseek-tui`) ship as two binaries that must coexist; #258 was an
external user spending 11 minutes figuring this out and #272 was the
follow-on confusion.
The new body covers:
- npm wrapper as the recommended install
- `cargo install deepseek-tui-cli deepseek-tui --locked` (both crates)
- Manual download with a per-platform table showing both artifacts
- sha256 verify using the existing `deepseek-artifacts-sha256.txt`
- Changelog link
* feat(debug): add /cache command surfacing per-turn DeepSeek cache hit/miss
Step 1 of #263. Without per-turn telemetry the prefix-cache audit is
unfounded speculation; the rest of the issue's investigation steps
depend on this surface.
The DeepSeek API already returns `prompt_cache_hit_tokens` and
`prompt_cache_miss_tokens` per turn, and we already store the *latest*
on App. This adds a 50-turn ring (`turn_cache_history`) populated at
the same site as `last_prompt_cache_*_tokens`, plus a `/cache [count]`
slash command that renders a fixed-width table of the last N turns
with per-turn ratios and a session aggregate. Default count is 10;
larger values clamp to the ring size.
Edge cases the formatter handles:
- No telemetry yet → friendly "no turns recorded" message
- `cache_hit_tokens = None` (provider didn't report) → row renders all
em-dashes and is excluded from session aggregates so one missing-
telemetry turn can't make the average ratio look broken.
- `cache_hit_tokens = Some, cache_miss_tokens = None` → infer miss as
`input − hit` and mark the cell with `*`. Footer documents the
asterisk.
- Ring at cap (50) → push evicts oldest.
Tests cover all four paths plus the cap.
* test(prompts): add cache-prefix stability harness for #263 step 2
The DeepSeek prefix-cache only hits while the byte prefix of each
request matches the prior call. Anything in the cached prefix that
varies turn-to-turn for unchanged inputs is a cache buster.
Adds a focused harness next to the production surface so the property
is regression-guarded:
1. `first_divergence(a, b)` helper that returns the first divergent
byte position with a `±32 byte` window of context, used by the
custom assertion `assert_byte_identical`. Future suspect tests can
reuse this to surface "where" rather than just "fail".
2. `compose_prompt_is_byte_stable_across_calls` — sweeps every
(mode, personality) pair and pins that two consecutive calls
produce identical bytes. Rules out suspect #4 (mode-prompt churn).
3. `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace`
— the call site `engine.rs::build_tool_context` actually invokes,
pinned for an empty workspace across all three modes.
4. `system_prompt_with_working_set_summary_is_byte_stable_for_constant_summary`
— pins that the surrounding prompt construction faithfully embeds
the working_set summary it's given without injecting extra
non-determinism. (The actual working_set summary stability lives
in `working_set.rs` and is the next investigation target — see
issue note in PR description.)
Foundation for the suspect-by-suspect bisection in the rest of #263.
* fix(secrets): never overwrite the secrets file when load_unlocked errors
`FileKeyringStore::set` and `delete` did
`self.load_unlocked().unwrap_or_default()`, which wiped every existing
secret if the read failed for any reason other than \"file is missing\":
- file mode != 0600 (`InsecurePermissions`) — easy on headless / CI
environments where a permissive umask got applied
- corrupt JSON
- transient I/O error
In all of those, the next `store_unlocked` overwrote the file with an
empty-or-single-entry blob and reset perms to 0600, silently losing
every other provider's key.
Switch both call sites to `?`. `load_unlocked` already returns
`Ok(default)` for a missing file, so the first-write-creates-the-file
ergonomic is preserved (covered by the new
`file_store_set_still_creates_file_when_missing` test).
Adds four regression tests:
- set: insecure perms surface InsecurePermissions and leave the file
byte-identical.
- delete: same.
- set: corrupt JSON surfaces the parse error and leaves the file
byte-identical.
- set: missing file path still works (idempotence guard).
Closes#281
* fix(cache): make tool catalog byte-stable across calls and sessions
DeepSeek's KV prefix cache hits on the longest matching byte prefix of
the request. Two places in the tool-array path were silently introducing
divergence:
1. `ToolRegistry::to_api_tools()` iterated `self.tools.values()` directly.
Rust's default `HashMap` is seeded with `RandomState` per process, so
every `deepseek` launch produced a different tool order — the cross-
session resume case (the one with the biggest cache wins) never hit.
2. `active_tool_list_from_catalog()` filtered the catalog `Vec` by the
active set in catalog order. When ToolSearch activated a previously-
deferred tool mid-conversation, the new tool appeared at its catalog
index, shifting every later tool's byte offset and busting the cached
prefix from there onwards.
Fixes:
- `to_api_tools()` now sorts by tool name before emitting the API tool
array. Stable across calls AND across launches.
- `build_model_tool_catalog()` sorts each partition (built-ins first,
contiguous; MCP tools after, also alphabetical). Mirrors Claude Code's
`assembleToolPool` strategy where they explicitly call out cache
stability as the reason: "a flat sort would interleave MCP tools into
built-ins and invalidate all downstream cache keys whenever an MCP
tool sorts between existing built-ins."
- `active_tool_list_from_catalog()` puts always-loaded tools in catalog
order at the head and deferred-but-now-active tools at the tail. A
deferred-tool activation during ToolSearch no longer shifts earlier
tools' positions.
Adds three regression tests:
- `to_api_tools_emits_alphabetical_order_regardless_of_registration_order`
- `model_tool_catalog_sorts_each_partition_for_prefix_cache_stability`
- `active_tool_list_pushes_deferred_activations_to_the_tail`
Refs #263. Findings produced by reading reference Claude Code source
side-by-side with our request-building flow; full delta analysis in
the PR description.
* fix(sandbox): elevate Agent-mode shell sandbox to allow network access
The seatbelt-default policy is `WorkspaceWrite { network_access: false }`,
which on macOS emits `(deny default)` with no `(allow network-outbound)` /
`(allow system-socket)`. Every outbound socket call from a sandboxed
shell command — including `getaddrinfo` for DNS — gets denied by the
kernel. Symptom: "DNS resolution failed" for any URL the model tries to
reach via curl, yt-dlp, package managers, etc.
Engine.build_tool_context only elevated the policy in Yolo mode, leaving
Agent mode (the default) stuck on the strict default. That's tighter
than competitors (Claude Code, Codex) without buying any safety the
application-level NetworkPolicy or the approval flow doesn't already
provide.
Switch the elevation to a `match` so:
- Plan → no elevation (read-only investigation; shell tool not registered)
- Agent → WorkspaceWrite { network_access: true, … }
- Yolo → WorkspaceWrite { network_access: true, … } (unchanged)
Adds `agent_and_yolo_modes_elevate_shell_sandbox_to_allow_network` so a
future revert to the no-network default trips CI immediately.
Closes#273
* fix(skills): treat bare github.com/<owner>/<repo> URLs as GitHubRepo
Closes#269.
`/skill install https://github.com/obra/superpowers` failed on every
platform with `invalid gzip header`. Root cause: `InstallSource::parse`
matched any `https://`-prefixed spec as `DirectUrl`, so the installer
downloaded the HTML repo page (200 OK, `text/html`) and tried to
gzip-decode HTML. The user reported it from Win11 + PowerShell but the
parse path is platform-independent.
Recognize bare GitHub repo URLs in `InstallSource::parse`:
- `https://github.com/<owner>/<repo>`
- `https://github.com/<owner>/<repo>/`
- `https://github.com/<owner>/<repo>.git`
- `https://github.com/<owner>/<repo>.git/`
- `https://www.github.com/<owner>/<repo>`
- `http://github.com/<owner>/<repo>` (legacy)
…all route to the existing `GitHubRepo` source, which already produces
`https://github.com/<repo>/archive/refs/heads/{main,master}.tar.gz`
candidates with proper fallback. URLs with a third path segment
(`/archive/...`, `/blob/...`, `/tree/...`) keep going through
`DirectUrl` because the user picked that exact path.
Adds two regression tests: one asserting the seven recognised forms
all canonicalize to `github:obra/superpowers`, and one pinning the
sub-resource paths to `DirectUrl`.
* fix(cache): drop volatile fields from working_set summary block (#280) (#287)
The working-set summary lands inside the system prompt before the
historical conversation, so any byte that drifts there cache-misses
everything that follows in DeepSeek's KV prefix cache. Two sources of
turn-over-turn drift are removed:
1. The rendered line is now `- {path} ({kind})`. The previous form
interpolated `entry.touches` and `self.turn - entry.last_turn`,
both of which advance on every user message even when no new
paths are observed.
2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC)
instead of the turn-aware `sorted_entries`. The recency bonus in
`score_entry` crosses bucket boundaries as turns advance, so even
without rendering `last seen` the order — and which entries cross
the `max_prompt_entries` cutoff — drifted. Compaction pinning
still uses `sorted_entries` because it genuinely wants recency.
Adds a regression test that observes a fixed message set, calls
`summary_block` before and after `next_turn()`, and asserts the two
outputs are byte-identical. The shared `first_divergence` /
`assert_byte_identical` helpers (from #279) move from `prompts::tests`
into `test_support` so working_set tests can reuse them.
Closes#280.
* fix(cache): memoise tool catalog so descriptions stay byte-stable (#289)
`to_api_tools` previously re-sampled `tool.description()` and
`tool.input_schema()` on every call. Native tools return `&'static str`
and a `json!` literal, so the bytes were stable in practice — but the
`McpToolAdapter` returns `self.tool.description.as_deref()`, which can
drift when the upstream MCP server reconnects with a different
description string. Any drift mid-session rewrites the tool catalog
that lands in the cached prefix and busts every byte that follows.
Adds an `api_cache: OnceLock<Vec<Tool>>` field on `ToolRegistry`. The
first `to_api_tools` call materialises the catalog; subsequent calls
return a clone of the cached vector. Mutations (`register`, `remove`,
`clear`) reset the field so the next read rebuilds. Mirrors
reference-cc's `getToolSchemaCache` (`utils/api.ts:119–208`).
Tests:
- `to_api_tools_pins_description_bytes_across_calls` registers a tool
whose `description()` advances through a script of pre-built strings
on each call. After the cache is populated, the second `to_api_tools`
read returns the original description because `description()` is no
longer invoked. Without the cache the second read would return the
next script entry.
- `register_invalidates_api_tools_cache` registers a tool, snapshots,
registers another, snapshots again, and asserts the second snapshot
reflects both tools (cache rebuilt) and that the varying tool's
description advanced (proving the rebuild actually re-sampled).
- `remove_and_clear_invalidate_api_tools_cache` covers the other two
invalidation paths.
* fix(cache): sort project_tree and summarize_project output (#290)
Both helpers walked the workspace via `ignore::WalkBuilder::build()`
and emitted entries in the OS readdir order — non-deterministic across
filesystems (htree-hash on ext4, insertion-order on APFS, etc.). Their
output lands in the fallback branch of the system prompt's project
context (when the workspace has no AGENTS.md / CLAUDE.md) and inside
the `project_map` tool surface, both of which feed the cached prefix.
`summarize_project` now sorts the collected key-files list before the
type-detection logic and the fallback `Project with key files: …` join.
`project_tree` collects `(rel_path, is_dir)` tuples, sorts by full
path, and only then formats the indented tree. Sorting by full path
preserves the visual tree shape — `"src" < "src/lib.rs"` because the
shorter string compares less — while making siblings deterministic.
Tests cover sibling order, parent-before-children invariant, byte
stability across two consecutive calls, and the fallback `Project
with key files:` branch (the only branch where the joined order
escapes into output without further sorting downstream).
* fix(client): unique fallback id for parallel streaming tool calls (#291)
When a streamed tool_call delta omits the `id` field, the chat-completion
decoder used to fall back to the literal string `"tool_call"` for every
call. With the V4 API's native parallel tool calls (multiple tool_calls
in one delta), every parallel call ended up with the same fallback id —
downstream tool-result routing then matched the first call's result
twice and the second call hung waiting for an answer that never arrived.
The fallback now indexes by the assigned `content_block` position,
producing `"call_0"`, `"call_1"`, … within a single response. Upstream-
supplied ids are still forwarded verbatim; only the fallback path
changes.
Tests pin both invariants:
- `decoder_assigns_unique_fallback_ids_to_parallel_tool_calls_missing_id`
feeds two tool calls without `id` in one delta and asserts they get
distinct ids.
- `decoder_preserves_upstream_tool_call_id_when_present` keeps the
forward-as-is path honest.
* fix(cache): place handoff and working_set after static prompt blocks (#292)
* fix(cache): drop volatile fields from working_set summary block (#280)
The working-set summary lands inside the system prompt before the
historical conversation, so any byte that drifts there cache-misses
everything that follows in DeepSeek's KV prefix cache. Two sources of
turn-over-turn drift are removed:
1. The rendered line is now `- {path} ({kind})`. The previous form
interpolated `entry.touches` and `self.turn - entry.last_turn`,
both of which advance on every user message even when no new
paths are observed.
2. A new `sorted_for_prompt` helper sorts by (touches DESC, path ASC)
instead of the turn-aware `sorted_entries`. The recency bonus in
`score_entry` crosses bucket boundaries as turns advance, so even
without rendering `last seen` the order — and which entries cross
the `max_prompt_entries` cutoff — drifted. Compaction pinning
still uses `sorted_entries` because it genuinely wants recency.
Adds a regression test that observes a fixed message set, calls
`summary_block` before and after `next_turn()`, and asserts the two
outputs are byte-identical. The shared `first_divergence` /
`assert_byte_identical` helpers (from #279) move from `prompts::tests`
into `test_support` so working_set tests can reuse them.
Closes#280.
* fix(cache): place handoff and working_set after static prompt blocks
`system_prompt_for_mode_with_context_and_skills` previously interleaved
volatile content into the static prefix:
1. mode prompt static
2. project context static
3. working_set_summary ← volatile
4. skills_block static
5. handoff_block ← volatile
6. ## Context Management static
7. COMPACT_TEMPLATE static
Anything past byte (3) cache-missed every time the working-set drifted
or `/compact` rewrote `.deepseek/handoff.md` — including the static
`## Context Management` and `## Compaction Handoff` blocks behind them.
New order keeps every static block in the cached prefix and pushes the
two volatile blocks to the end:
1. mode prompt
2. project context (or fallback automap)
3. skills block
4. ## Context Management (Agent / Yolo only)
5. COMPACT_TEMPLATE
── volatile boundary ──
6. handoff block
7. working-set summary
Adds a doc comment on the function describing the volatile-content-last
invariant so future contributors don't reintroduce churn into the
prefix. Adds two regression tests:
- `system_prompt_with_handoff_file_is_byte_stable_when_file_is_unchanged`
pins the handoff path with a fixture file.
- `handoff_and_working_set_appear_after_static_blocks` asserts the
ordering invariant directly so a future reorder fails loudly.
Reference: Claude Code's own prompt builder marks this same boundary
with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant; we don't introduce
the abstraction yet but match the principle.
* feat(i18n): localize slash command help (Phase 1a, #285) (#294)
Adds 44 new MessageIds, one per slash command, and translations to all
four shipped locales (en/ja/zh-Hans/pt-BR). Refactors CommandInfo so the
English description now lives in localization.rs (single source of
truth) instead of being duplicated on the struct, and threads the
active Locale through the three render surfaces:
- crates/tui/src/tui/views/help.rs (the ?/F1/Ctrl+/ help overlay)
- crates/tui/src/tui/command_palette.rs (Ctrl+K palette)
- crates/tui/src/commands/core.rs (the /help text command)
Usage strings (e.g. /cache [count]) stay English by design — they're
placeholder syntax, not natural language.
The existing locale-coverage test
(`shipped_first_pack_has_no_missing_core_messages`) already iterates
ALL_MESSAGE_IDS across Locale::shipped(), so the 44 new IDs are
automatically required to be present in all four locale arms or CI
fails.
This is the first of several incremental Phase 1 PRs. Phase 1b covers
the debug commands (/tokens /cost /cache), 1c the footer hints, and
1d doctor output. Phases 2–3 cover onboarding and error surfaces.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(i18n): localize /tokens /cost /cache debug output (Phase 1b, #285) (#295)
Adds 13 new MessageIds covering the report templates and the
sub-strings shared across them, with translations for all four
shipped locales (en/ja/zh-Hans/pt-BR):
- CmdTokensReport, CmdTokensContextWithWindow, CmdTokensContextUnknownWindow
- CmdTokensCacheBoth, CmdTokensCacheHitOnly, CmdTokensCacheMissOnly
- CmdTokensNotReported
- CmdCostReport
- CmdCacheNoData, CmdCacheHeader, CmdCacheTotals, CmdCacheFootnote, CmdCacheAdvice
Each template uses {placeholder} substitution via String::replace
rather than format!, since format! requires a literal — the
locale-resolved &'static str isn't one. The placeholder convention
({active}, {hit}, {miss}, …) means a translator can re-order or
restructure a sentence freely without changing the call site.
Helpers `token_count`, `active_context_summary`, `cache_summary`, and
`format_cache_history` now take `Locale` so each can resolve their
templates from the same source of truth.
The English templates byte-match the previous hardcoded format strings
so the existing 16 debug-command tests pass unchanged.
Column headers in the cache table (`turn in out hit miss …`)
are intentionally NOT localized — the body rows are formatted with
fixed column widths and translating the header words would break
alignment. Numbers, ratios, and the model id stay in English form.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(i18n): localize footer state + help section labels (Phase 1c, #285) (#296)
Adds 11 new MessageIds covering visible footer chrome and the help-overlay
section headings, with translations for all four shipped locales:
Footer:
- FooterWorking — animated `working` / `working.` / … pulse
- FooterAgentSingular / FooterAgentsPlural — the sub-agent count chip
- FooterPressCtrlCAgain — the quit-confirmation toast
Help overlay sections (`?` / `F1` / `Ctrl+/`):
- HelpSectionNavigation, HelpSectionEditing, HelpSectionActions,
HelpSectionModes, HelpSectionSessions, HelpSectionClipboard,
HelpSectionHelp
`KeybindingSection::label` now takes Locale and returns tr(locale, …).
`footer_working_label` and `footer_agents_chip` likewise take Locale; the
two production callsites in tui/ui.rs pass `app.ui_locale`.
The mode chip itself (agent / yolo / plan) intentionally stays English —
those are brand/acronym labels, and translating them would mean explaining
to maintainers what `代理` means in a bug report.
The keybinding catalog DESCRIPTIONS (41 entries) are not translated in this
PR — those are technical prose that would dwarf the rest of i18n work and
can ship in v0.8.5. Section labels are translated so the help overlay
groups read as expected in any locale.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(commands): smoke-test that every / command dispatches to a handler (#299)
Adds two parallel-safe smoke tests in `crates/tui/src/commands/mod.rs`
that iterate the COMMANDS registry and verify every command — and every
declared alias — dispatches to a real handler. A dispatch miss surfaces
as the fall-through `Unknown command:` error message in `execute`,
which used to be invisible until a user typed the command and saw the
"did you mean" suggestion fire on a registered command.
The tests build a workspace-isolated app via `tempfile::TempDir` so
side-effecting handlers (`/init` writing AGENTS.md, `/save` and
`/export` writing files) do not pollute `crates/tui/` when CI runs from
there. `/save` and `/export` get an explicit tempdir-relative path
because their no-arg defaults still resolve relative to `cwd`.
`/restore` is skipped — it shells out to git for the snapshot repo and
its own dedicated tests in `commands/restore.rs` already serialize on
the global env mutex via `scoped_home`. The existing coverage there is
sufficient.
Closes a gap surfaced when verifying that the v0.8.4 i18n refactor
(#294, #295, #296) did not silently break any slash-command dispatch.
All 44 commands and their aliases pass (16 aliases on top of the
44 names; `/restore` is the only skip).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): bump version to 0.8.4 (#297)
CHANGELOG entry covers the v0.8.4 work landed since 0.8.3:
- Localization Phase 1 (#285) — slash command help (#294), debug command
output (#295), footer state and help-overlay section labels (#296).
Adds 68 new MessageIds across all four shipped locales (en/ja/zh-Hans/pt-BR).
- Cache-prefix stability (#263) — five companion fixes (#287, #288→#292,
#289, #290, #291) that keep the DeepSeek prefix cache stable across turns.
- Plus the items already in [Unreleased]: agent-mode network exec (#272),
/skill GitHub URL parsing (#269), and the V4 Pro discount expiry extension
(#267).
Bumps:
- Cargo.toml workspace version 0.8.3 → 0.8.4
- npm/deepseek-tui/package.json version + deepseekBinaryVersion 0.8.3 → 0.8.4
- Cargo.lock regenerated from the new workspace version.
Phase 1d (doctor output), Phase 2 (onboarding/init/missing-companion),
and Phase 3 (tool errors / sandbox denials / approvals) deferred to v0.8.5.
The shipped Phase 1 surfaces (slash commands, debug telemetry, footer
chrome) cover the highest-traffic UI paths Chinese users see first.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(release): bump internal path-dep versions + repair doc link (#301)
CI on PR #300 (release feat/v0.8.4 → main) flagged two regressions
introduced by the 0.8.4 version bump:
1. Version drift — path-dependency `version = "0.8.3"` references
inside the workspace crates (10 crates: agent, app-server, cli,
config, core, execpolicy, hooks, mcp, tools, tui) did not move with
the workspace `[workspace.package] version = "0.8.4"`. The CI guard
`scripts/release/check-versions.sh` requires they match.
2. Broken intra-doc-link `[crate::localization::english]` in the
CommandInfo doc comment — `english` is private. Replaced with a
reference to the public `description_for` accessor and the public
`tr()` function.
Verified with:
- scripts/release/check-versions.sh — Version state OK.
- RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps — green.
- cargo fmt + clippy + test all green.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the `publish-npm` job from `release.yml`. It has been failing on
every release with `npm error code EOTP` because the configured `NPM_TOKEN`
doesn't bypass 2FA. Manual publish from a developer machine is the actual
ship path; codify that.
- Update `docs/RELEASE_RUNBOOK.md` "npm Wrapper Release" to describe the
manual flow (`npm publish --access public` + OTP) and explain why the auto
path is gone, with a recovery note for future Trusted-Publishing migration.
- Refresh stale cross-reference comment in `publish-npm.yml` (the workflow
remains as inert plumbing for an eventual Trusted Publishing setup).
- Stop tracking `docs/DeepSeek_V4.pdf` (4.4 MB). It was never referenced
outside test fixture filenames; the tests synthesize their own fake PDF.
Add to `.gitignore` so a local copy can sit there without nagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The OIDC Trusted Publisher path for npm has 404'd on PUT for v0.5.1,
v0.5.2, and v0.6.1, even with valid OIDC tokens. Switch publish-npm and
publish-npm-manual to a classic NPM_TOKEN automation token (set the
NPM_TOKEN repo secret to a granular access token scoped to deepseek-tui
with publish permission) so future releases ship reliably.
Also add .github/workflows/auto-tag.yml: when the workspace version on
main changes, push the matching v$VERSION tag automatically so release.yml
fires without a manual tag push. Requires a RELEASE_TAG_PAT secret to
trigger downstream workflows (GITHUB_TOKEN tag pushes don't trigger
on: push: tags by design).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds first-class DeepSeek V4 Pro and Flash support, updates the default model to deepseek-v4-pro, aligns legacy aliases with the current V4 1M context behavior, and fixes thinking-mode request handling.
Key fixes:
- Send DeepSeek's raw Chat Completions `thinking` parameter at the top level instead of SDK-only `extra_body`.
- Preserve assistant `reasoning_content` for all prior thinking-mode tool-call turns so subsequent requests satisfy DeepSeek V4's replay requirement.
- Fix npm wrapper concurrent first-run downloads by using per-process temporary download paths.
- Add `.mailmap` so historical bot-attributed commits aggregate under Hunter Bown where mailmap is honored.
Verified with the full local Rust gate, live DeepSeek V4 smoke, npm wrapper temp-install smoke, and green PR CI across Linux, macOS, and Windows.
- Move src/* into crates/tui/src/ to create a proper workspace structure
- Add .claude/ and .trimtab/ directories for Trimtab closed-loop workflow
- Add DEPENDENCY_GRAPH.md and update documentation
- Update Cargo.toml files to reflect new crate dependencies
- Update CI workflows and npm package scripts
- All tests pass, release build works
- Convert root to Cargo workspace with crates/ layout
- Add deepseek-* crates mirroring Codex architecture
- Add parity CI workflow with snapshot/protocol/state tests
- Update release workflow to build both deepseek and deepseek-tui binaries
- Bump version to 0.3.28
The dtolnay/rust-toolchain action already specifies the toolchain version
in the action reference (@stable). Adding toolchain: stable as an input
was causing 'toolchain is a required input' errors.
Fixed in:
- ci.yml
- release.yml
- crates-publish.yml