- Match actual Harbor CLI interface (no invented flags)
- Proper BaseInstalledAgent subclass for Codex
- Robust token extraction from stream JSONL + transcript parsing
- Heuristic answer_len extraction (## Final Answer markers)
- Metadata capture: versions, git commit, platform, timestamp
- --regenerate walks existing run directories
- All missing fields explicit null, never zero
- Support multiple runs per task with run_idx tracking
The harness is designed to run:
harbor run --dataset terminal-bench@2.0:<task> --agent ... --model ...
for both codex and codewhale agents, then normalize the results.
Split SiliconflowCN into its own [providers.siliconflow_cn] TOML section
instead of silently ignoring [providers.siliconflow-CN] config.
- ProvidersToml / ProvidersConfig: add siliconflow_cn field with serde alias
- for_provider / for_provider_mut / provider_config_for: route SiliconflowCN
to the new field
- resolve_runtime_options_with_secrets: fallback siliconflow_cn → siliconflow
for api_key / base_url / model when unset
- deepseek_api_key: add config-file fallback for SiliconflowCn
- provider_config_key: update metadata to "siliconflow_cn"
- save_api_key_for: write SiliconflowCn keys to providers.siliconflow_cn
- docs/PROVIDERS.md, config.example.toml, scripts/check-provider-registry.py
- setup-vm.sh: bump RELEASE_TAG default to v0.8.57, add gh CLI install
step (official APT repo) and 4G swapfile creation (idempotent)
- agent-session.sh: new sourceable helper that exports the provider key
from /etc/codewhale/runtime.env for interactive agent sessions
- README.md: update version refs, add agent-session.sh to layout, add
Autonomous agent loop section with full pick->PR commands
The droplet ops (binary upgrade, PAT setup, first end-to-end issue run)
are documented as the next steps for the operator.
The workflow hardcoded install boilerplate plus a contributor list that
had already drifted (v0.8.56's release thanked people 'for shaping
v0.9.0'). The body now comes from scripts/release/generate-release-body.sh:
static install/verify sections plus the tagged version's changelog section,
which already carries the per-release credits.
- scripts/release/prepare-release.sh bumps workspace + crate pins + npm
wrapper + README install tags, refreshes Cargo.lock, regenerates the
TUI changelog slice and web facts, then runs check-versions.sh
- check-versions.sh now also gates web/lib/facts.generated.ts and the
README install-tag examples (both drifted silently before)
- .cnb.yml validates the pushed tag against Cargo.toml before generating
mirror release notes
- RELEASE_CHECKLIST/RUNBOOK updated accordingly (v0.8.56 needed 9 fix
commits for exactly these sync points)
The packaged changelog is now a recent-releases slice produced by
scripts/sync-changelog.sh (which gains a --check mode); also restore the
SECURITY.md contact line the version gate guards, and finish the stale
binary-name sweep (--bin codewhale examples, qa harness doc).
The /change command embeds this file into every binary via include_str!;
it now carries only recent releases (regenerated by scripts/sync-changelog.sh,
wired into the release checklist). The explicit-version test derives its
fixture versions from the embedded slice instead of hardcoding old ones.
Harvested from PR #2482 by @AdityaVG13, preserving the typed WhaleFlow config and deterministic planner direction without exposing the runtime workflow_run tool yet.
Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
Two bugs from the initial run:
1. workspace_files format is [{source, dest}] not {path, content} —
files live in PinchBench's assets/ directory, not tasks/. Now checks
both tasks/ and assets/ directories.
2. LLM judge tasks (writing, research) scored 0% because the judge
wasn't implemented. Now uses codewhale exec as the judge — sends
the rubric + workspace contents and parses a JSON score response.
Also strips ANSI escape codes and control characters from judge output
to prevent JSON parse failures.
Runs PinchBench tasks directly through codewhale exec --auto instead
of going through OpenClaw. Loads task markdown, creates workspace,
runs the prompt, and grades using PinchBench's embedded automated
checks.
No external agent framework dependency — just codewhale + pyyaml.
PinchBench runner now defaults to direct Xiaomi API (no OpenRouter).
Reads API key from ~/.codewhale/config.toml [providers.xiaomi_mimo]
when XIAOMI_MIMO_API_KEY env var is not set. --openrouter flag for
the old OpenRouter path.
PinchBench runner now defaults to openrouter/xiaomi/mimo-v2.5-pro instead
of deepseek/deepseek-chat. Adds --direct-mimo flag for routing through
Xiaomi's API directly (bypasses OpenRouter), with tp-/sk- key type
detection and endpoint mismatch warnings.
Harbor adapter gains --provider CLI flag for MiMo provider routing.
Known issues documented in docs/MIMO_BENCHMARK_ISSUES.md:
- PinchBench model validation requires OpenRouter prefix
- OPENROUTER_API_KEY needed even for some direct-provider paths
- Token Plan vs pay-as-you-go key/endpoint mismatch
- PinchBench runs through OpenClaw, not CodeWhale
Benchmark harness for evaluating CodeWhale against three external
benchmarks:
- SWE-bench: batch driver wrapping existing codewhale swebench commands
- Terminal-Bench: Harbor adapter (BaseInstalledAgent) for container eval
- PinchBench: runner with auto-install for real-world agent tasks
Includes docs/BENCHMARKS.md umbrella doc with setup, usage, and
reproducibility checklist. Scripts record version/commit/timestamp
metadata for each run.
Branch: codex/v0.8.53-benchmarks (based on v0.8.53)
Add AUTHOR_MAP plus a lightweight co-author trailer checker so harvested commits use numeric GitHub noreply identities, reject bot/tool trailers, and require machine-readable credit when a commit says it was harvested from a PR.
Also normalize the local unpushed v0.9 harvest range so existing contributor authors/trailers for HUQIANTAO, Implementist, jrcjrcc, xyuai, cyq1017, idling11, and shenjackyuanjie use GitHub-mappable identities before the branch is published.
Validation: python3 scripts/check-coauthor-trailers.py --author-map .github/AUTHOR_MAP --range origin/main..HEAD --check-authors; python3 -m py_compile scripts/check-coauthor-trailers.py; ruby -e 'require "yaml"; YAML.load_file(".github/workflows/ci.yml")'; git diff --check; negative in-process validation for raw email, missing harvested credit, and bot author cases.
Harvest the HarmonyOS/OpenHarmony port from PR #2634 and make it publish-safe by target-gating unsupported host dependencies out of the OHOS TUI graph. Self-update is disabled on OHOS, PTY shell mode reports unsupported, and Starlark execpolicy parsing returns an explicit unsupported-platform error until upstream starlark/rustyline/nix support catches up.
Add OHOS SDK setup docs and launcher scripts, install the rustls ring provider for rustls-no-provider entrypoints, and keep the packaged codewhale-tui OHOS graph free of starlark, rustyline, nix@0.28, portable-pty, and arboard.
Validation: cargo fmt --all -- --check; git diff --check; git diff --cached --check; cargo check -p codewhale-cli --locked; cargo check -p codewhale-app-server --locked; cargo check -p codewhale-tui --locked; cargo test -p codewhale-cli --locked update::tests::; cargo test -p codewhale-release --locked; cargo test -p codewhale-tui --locked background_tty_command_has_controlling_terminal; cargo test -p codewhale-tui --locked clipboard; cargo package -p codewhale-tui --allow-dirty --no-verify --locked; packaged OHOS cargo tree checks. OHOS target check still requires a loaded OpenHarmony SDK/sysroot and currently stops in ring with missing assert.h when CC/CFLAGS/linker are unset.
Harvested from PR #2634 by @shenjackyuanjie.
Co-authored-by: shenjackyuanjie <54507071+shenjackyuanjie@users.noreply.github.com>
- Define UnStrStr macro for uninstaller string functions
- Use un.StrStr instead of StrStr in uninstaller context
- Rewrite un.RemoveFromPath with correct offset calculations
and semicolon handling to prevent PATH corruption
- Use dynamic version fetch from GitHub API in CLASSROOM_INSTALL.md
Closes#1983
- Add scripts/installer/codewhale.nsi: NSIS installer that installs both
codewhale.exe and codewhale-tui.exe to %LOCALAPPDATA%\Programs\CodeWhale\bin,
adds to current-user PATH, and includes an uninstaller that cleans PATH
- Add docs/CLASSROOM_INSTALL.md: step-by-step checklist for IT admins
deploying CodeWhale in labs/classrooms, covering silent install, manual
fallback, API key provisioning, imaging notes, and troubleshooting
- Update docs/INSTALL.md: add Windows NSIS Installer section referencing
the new installer and classroom checklist
Harvested from #2415 with thanks to @axobase001.
Keeps the denser mobile QR renderer and replaces the fixed binding-warning sleep with health polling plus an explicit timeout failure path, so slow starts fail with the useful cause instead of drifting into misleading assertions.
Follow-up to #2403.
* feat: add mobile smoke tests and QR code for mobile URL
#2396: Add scripts/mobile-smoke.sh that launches the compiled binary on
loopback ports and verifies the mobile surface through real HTTP requests:
- Token auth (401/200, Bearer, query param, approval 404)
- Insecure mode (no token required)
- Binding warnings (0.0.0.0, LAN URL hint)
Add mobile-smoke job to CI workflow.
#2397: Add --qr flag to 'codewhale serve --mobile' that renders a
terminal QR code for the mobile URL. Uses the LAN IP when available,
falls back to 127.0.0.1. Adds qrcode crate (pure Rust, no C deps).
* fix: address review feedback on mobile smoke tests
- Fix Test Group 3 subprocess capture: use temp file instead of command
substitution to avoid hanging and subshell variable isolation
- Allow BINARY path to be overridden via BINARY env var
- Add libdbus-1-dev system dependency to CI job for ubuntu build
* fix: pass auth header in mobile smoke status helper
* fix: send approval JSON in mobile smoke
---------
Co-authored-by: Hu Qiantao <huqiantao@HudeMacBook-Air.local>
Co-authored-by: Hunter B <hmbown@gmail.com>
* fix: Windows .bat launcher with correct JS escaping and codewhale rebrand
* fix: complete Windows bat release asset handling
---------
Co-authored-by: cy2311 <cy2311@users.noreply.github.com>
Co-authored-by: Hunter B <hmbown@gmail.com>
* docs: v0.8.46 CHANGELOG — platform archives, palette, sub-agents, sandbox, web install, search fixes
Closes#2188
* feat(v0.8.46): quick fixes — palette, model picker Esc, sub-agent sidebar, shell chip, model name casing, CVE bump (#2212)
* fix: bump qs to >=6.15.2 for CVE-2026-8723
Add qs override in feishu-bridge package.json to force transitive
dependency resolution to >=6.15.2, addressing CVE-2026-8723.
Refs: #2198
* fix: Esc in model picker applies last-highlighted choice
Previously Esc reverted to the initial model when the user hadn't
moved the selection. Now Esc always applies the currently highlighted
model and thinking-effort tier, making Esc consistent with Enter.
Also updates the picker footer hint from 'Esc cancel' to 'Esc apply'.
Refs: #2196
* feat: show '⏳ shell running' chip in TUI footer
Adds a footer_shell_chip function that displays a '⏳ shell running'
status chip in the footer's right cluster whenever a foreground shell
command is active via exec_shell. The chip is always visible regardless
of user-configured status items.
Refs: #2194
* feat: auto-collapse finished sub-agents in sidebar
When a sub-agent completes (status = 'done'), its detail lines
(id, steps, duration, progress) are now hidden in the sidebar agents
panel. Only the summary label line is shown, keeping the sidebar
compact. Running agents still show full detail.
Refs: #2195
* feat: refresh Whale dark palette for better contrast
Improve contrast and layer separation in the Whale dark theme:
- Deepen base background for more depth (10,17,32)
- Lighten panel (22,34,56) for clearer distinction from bg
- Lighten elevated surface (36,52,78) for better elevation
- Lighten selection (48,68,100) for clearer selected state
- Boost text hint (138,150,174) and dim (118,130,156) readability
- Brighter border (52,88,145) for better edge definition
- Update tool surface colors for consistency
Refs: #2197
* fix: preserve model name casing in normalize_model_name_for_provider
When the user enters a model name like 'DeepSeek-V4-Flash', the
normalizer was lowercasing it to 'deepseek-v4-flash' via the
canonical_official_deepseek_model_id function. Now the normalizer
preserves the caller's casing when the input already matches a known
model id case-insensitively. Compact aliases like 'deepseek-v4pro'
are still rewritten to 'deepseek-v4-pro'.
Refs: #2109
* feat(web): install download tile with arch detection, SHA256, China mirrors + companion binary fix (#2213)
* fix(web): download both codewhale and codewhale-tui binaries in install snippets
The SNIPPETS map only fetched one binary per platform, causing the
dispatcher to fail with MISSING_COMPANION_BINARY. Every arch now
downloads both codewhale AND codewhale-tui side-by-side.
- macOS/Linux: added second curl + combined chmod/xattr/mv for tui
- Windows: added second Invoke-WebRequest for codewhale-tui.exe
- VERIFY: PowerShell now hashes both binaries; Unix --ignore-missing
covers all present binaries in a single sha256sum pass
* feat(web): add install download tile with arch detection, SHA256, and China mirrors (#2192)
* feat(sandbox/linux): process hardening — PR_SET_DUMPABLE, NO_NEW_PRIVS, RLIMIT_CORE (#2214)
* feat(sandbox/linux): add process hardening module — PR_SET_DUMPABLE, NO_NEW_PRIVS, RLIMIT_CORE (#2183)
* feat(sandbox/linux): seccomp filter + bwrap passthrough
- seccomp: BPF filter whitelisting safe syscalls, denying ptrace/mount/kexec
and other dangerous syscalls. Uses raw BPF instructions via libc prctl to
avoid external dependencies (#2182).
- bwrap: optional bubblewrap passthrough when /usr/bin/bwrap is present
and [sandbox] prefer_bwrap=true in config. Creates read-only rootfs with
write access limited to the working directory (#2184).
- landlock detect_denial extended to recognize seccomp SIGSYS/"Bad system
call" patterns alongside existing Landlock EACCES/EPERM detection.
- SandboxManager gains prefer_bwrap field; set_prefer_bwrap on ShellManager.
- EngineConfig gains prefer_bwrap field, wired through main/ui/runtime_threads.
- Diagnostics now reports bwrap_available and cgroup_version.
- config.example.toml documents the prefer_bwrap key.
Pre-existing clippy fixes picked up in the same build:
- collapsible_if in ui.rs version-check
- cmp_owned in goal.rs test
- consecutive str::replace in normalize_auth_mode
Closes#2182, closes#2184
* docs: add cross-links to issue and PR templates in CONTRIBUTING.md (#2215)
- Link .github/ISSUE_TEMPLATE/bug_report.md and feature_request.md from
the Reporting Issues section
- Link .github/PULL_REQUEST_TEMPLATE.md from the Pull Request Guidelines
section
* feat(release): bundle platform archives with install scripts (#2216)
- Add bundle job to release workflow that creates per-platform archives
(tar.gz for Linux/macOS, .zip for Windows) containing both codewhale
and codewhale-tui binaries plus install scripts
- Create install.bat (Windows) — copies binaries to %USERPROFILE%\bin
- Create install.sh (Unix) — copies binaries to ~/.local/bin
- Windows gets a portable .zip variant without install script
- Release notes updated to promote archives as primary download method
- Individual binaries retained for npm wrapper and scripting
Closes#2193
* fix(web_search): fall back to DuckDuckGo when Bing returns zero results (#2130)
When the configured search provider is Bing and the query returns zero
results (common for technical/compound queries), fall through to the
DuckDuckGo path instead of reporting empty. A provenance message is
surfaced: "Bing returned no results; used DuckDuckGo fallback".
Also adds Security and Code of Conduct cross-links to CONTRIBUTING.md
per the sub-agent renovation (#2203).
* docs: SANDBOX.md threat model + RFCs for persistence and MCP + SandboxExecutor trait
- docs/SANDBOX.md: complete threat model describing each platform's sandbox
(Seatbelt, Landlock, seccomp, process hardening, bwrap, Windows v1).
Covers defense-in-depth layering, config keys, denial detection, limitations.
- docs/rfcs/2189-persistence-sqlite.md: RFC for SQLite migration (drafted by sub-agent)
- docs/rfcs/2190-mcp-modularization.md: RFC for MCP crate split into
protocol/client/server with OAuth support
- crates/tui/src/sandbox/policy.rs: SandboxExecutor trait definition and
SafetyLevel→SandboxPolicyBehavior mapping function with tests
Closes#2180, closes#2186, closes#2189, closes#2190
* feat: sandbox parity tests + remove sub-agent 100-turn cap
- Add sandbox parity tests covering platform detection, denial patterns,
bwrap preference, and policy consistency across modes (#2187)
- Remove arbitrary 100-turn sub-agent cap: DEFAULT_MAX_STEPS changed
from 100 to u32::MAX. Sub-agents now run until they produce a final
text response, are cancelled by the parent, or hit a configured
explicit budget (#2034)
Closes#2187, closes#2034
A small cleanup pass to catch brand mentions that the R5 sweep missed
because they hid in:
- HTTP User-Agent format strings (`Mozilla/5.0 (compatible; deepseek-tui/`
in `client.rs` and `fetch_url.rs`).
- Multi-line error messages whose phrase boundary straddled a line break
("…restart\n deepseek-tui." in `js_execution.rs`,
`tool_catalog.rs`, `repl/runtime.rs`).
- Doc comments mentioning `deepseek-tui` as a binary (`config/src/lib.rs`,
`core/capacity.rs`, `tui/streaming/chunking.rs`, `features.rs`).
- Skill descriptions shipped in `crates/tui/assets/skills/*/SKILL.md`.
- Test fixtures with placeholder paths / git emails
(`tui/external_editor.rs`, `snapshot/repo.rs`).
- `task_manager.rs`'s `cargo test -p deepseek-tui --lib` example.
- `scripts/tencent-lighthouse/doctor.sh` info-line prefix.
The remaining `deepseek-tui` mentions in the codebase are intentional
(the legacy `[[bin]]` entry in `crates/tui/Cargo.toml`, the legacy
`npm/deepseek-tui/` deprecation shim package, the CNB mirror namespace,
the security email, the legacy bin's shim source file, and historical
CHANGELOG entries) and were preserved per the rebrand anti-scope.
Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the npm wrapper directory and package from `deepseek-tui` to
`codewhale`. Move under `npm/codewhale/`:
- `package.json` renamed (name, bin, internal field) — keeps a
`deepseekBinaryVersion` fallback so old metadata still works.
- Bin entry points renamed to `bin/codewhale.js` and
`bin/codewhale-tui.js`; they spawn the corresponding canonical
binaries via the wrapper.
- `scripts/artifacts.js` switches to the canonical asset-name matrix
(`codewhale-*`, `codewhale-tui-*`) and `codewhale-artifacts-sha256.txt`.
- `scripts/run.js` exports `runCodewhale` and `runCodewhaleTui`; the
legacy `runDeepseek` exports are gone since nothing else inside the
package depended on them.
- `scripts/install.js`, `verify-release-assets.js`, `preflight-glibc.js`
update brand-mention strings + User-Agent headers. Env vars
(`DEEPSEEK_TUI_*`, `DEEPSEEK_*`) are explicitly anti-scope and are
left in place.
- Tests retargeted at the canonical asset names; all 19 still pass.
- README rewritten with the new install command and a deprecation
note about the old package.
Create a one-release deprecation shim at `npm/deepseek-tui/`:
- `package.json` with no `bin`, just a postinstall script that
prints a clear message telling the user to install `codewhale`
instead.
- `README.md` with the same migration note.
- Will be removed in v0.9.0 (or whenever Hunter retires the shims).
Release-side scripts in `scripts/release/` follow the rename:
- `prepare-local-release-assets.js` now requires `npm/codewhale/...`
and copies the canonical `codewhale*` binaries.
- `npm-wrapper-smoke.js` smokes the renamed package.
- `check-versions.sh` reads `npm/codewhale/package.json` for the
primary check and additionally pins the legacy shim package to
the same version.
- `check-published.sh` queries `codewhale@<version>` (with
`codewhaleBinaryVersion` lookup that falls back to the legacy
`deepseekBinaryVersion` field).
- `.github/workflows/auto-tag.yml` watches both `npm/codewhale/` and
`npm/deepseek-tui/` package.json for auto-tag triggers.
Verified:
- `npm test` inside `npm/codewhale/` passes 19/19.
- `npm install --dry-run --ignore-scripts` succeeds for both
`npm/codewhale/` and `npm/deepseek-tui/`.
- `scripts/release/check-versions.sh` reports OK.
- Rust gates re-run: `cargo check`, `cargo fmt --check`,
`cargo clippy -- -D warnings`, all clean.
No `npm publish` is run from this change — Hunter publishes manually
when the rebrand is ready to ship.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rename the canonical binaries:
- `deepseek` → `codewhale` (CLI dispatcher)
- `deepseek-tui` → `codewhale-tui` (TUI runtime)
Both legacy names continue to ship as tiny deprecation shims that print
a one-line warning to stderr and forward argv to the new binary. The
shims are produced by two new `[[bin]]` entries in `crates/cli/Cargo.toml`
and `crates/tui/Cargo.toml` pointing at small source files under
`src/bin/`. They will be removed in v0.9.0.
Touchpoints:
- Cargo bin entries + new shim source files.
- clap `name`/`bin_name`/usage strings flip to `codewhale`.
- Dispatcher's sibling-binary discovery looks for `codewhale-tui` and
reports `codewhale` in its error/help prose. `DEEPSEEK_TUI_BIN` env
var stays — env vars are explicitly anti-scope.
- `update.rs` now downloads `codewhale-*` assets and verifies them
against `codewhale-artifacts-sha256.txt`. Legacy `deepseek-*` assets
and `deepseek-artifacts-sha256.txt` are still produced by the release
matrix so v0.8.40's `deepseek update` keeps working through one
transition release.
- `ci.yml`, `nightly.yml`, `release.yml` updated to build/upload the new
canonical binaries; `release.yml`'s matrix doubles to also ship the
legacy shim binaries so v0.8.40 update clients land on the shim.
- `scripts/release/crates.sh` and `check-versions.sh` updated for the
renamed crate names from R1.
Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail), and `cargo build
--release` produces all four binaries:
- target/release/codewhale (canonical dispatcher)
- target/release/codewhale-tui (canonical TUI)
- target/release/deepseek (legacy shim, forwards to codewhale)
- target/release/deepseek-tui (legacy shim, forwards to codewhale-tui)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the Feishu/Lark long-connection bridge, Tencent Lighthouse runbooks, CNB mirror guidance, CNB tag release pipeline, and China-friendly update fallback documentation for the v0.8.37 line.