- scripts/release/prepare-release.sh bumps workspace + crate pins + npm
wrapper + README install tags, refreshes Cargo.lock, regenerates the
TUI changelog slice and web facts, then runs check-versions.sh
- check-versions.sh now also gates web/lib/facts.generated.ts and the
README install-tag examples (both drifted silently before)
- .cnb.yml validates the pushed tag against Cargo.toml before generating
mirror release notes
- RELEASE_CHECKLIST/RUNBOOK updated accordingly (v0.8.56 needed 9 fix
commits for exactly these sync points)
The /change command embeds this file into every binary via include_str!;
it now carries only recent releases (regenerated by scripts/sync-changelog.sh,
wired into the release checklist). The explicit-version test derives its
fixture versions from the embedded slice instead of hardcoding old ones.
Completes the in-progress OpenAI Codex provider and bumps the workspace to
0.8.55. Builds on the committed Together AI provider + model catalog work.
OpenAI Codex (ChatGPT) provider — experimental:
- Wire the previously-dead OAuth module into credential resolution. The TUI
config now resolves the access token via the Codex CLI login in
~/.codex/auth.json (env overrides OPENAI_CODEX_ACCESS_TOKEN/CODEX_ACCESS_TOKEN),
refreshing expired tokens synchronously via the OpenAI token endpoint —
mirroring the existing Kimi OAuth flow rather than introducing a new pattern.
- Send the ChatGPT backend's required headers from the Responses client
(chatgpt-account-id, OpenAI-Beta: responses=experimental, originator) and
stop duplicating the Authorization header already installed on the client.
- Fix the cli crate's non-exhaustive ProviderKind matches (compile blocker).
Consistency / de-slop pass (so the provider fits the whole app, not one path):
- has_api_key_for / active_provider_has_config_api_key now detect the Codex
OAuth login on disk, the same way they detect Kimi OAuth — a `codex login`
user is no longer reported as unauthenticated.
- Replace the bogus OPENAI_CODEX_API_KEY hint (which exists nowhere else) with
the real OPENAI_CODEX_ACCESS_TOKEN/CODEX_ACCESS_TOKEN in the auth-error and
picker surfaces.
- Drop dead state in the Responses stream parser (unused ToolCallState fields /
imports); tool-call data is streamed live.
- Update docs/PROVIDERS.md, config.example.toml, and the provider-metadata wire
test for the Responses wire format.
Release:
- Bump workspace + crates + npm package to 0.8.55; update CHANGELOG.md and
crates/tui/CHANGELOG.md.
Note: the live Responses round-trip has not been exercised against the
production ChatGPT backend in this environment; the provider ships as preview.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harvested from PR #2885 by @greyfreedom. Wires ask-rules into the
app-server and core ExecPolicyEngine (previously inert). Removes the
original PR's NeedsApproval arm that incorrectly allow-listed the
working directory as a network host.
Co-Authored-By: greyfreedom <11493871+greyfreedom@users.noreply.github.com>
Add recorded mock-trace replay coverage for workflows/rlm_cache_change.star and prove missing dogfood records produce ReplayDiverged instead of live fallback.\n\nVerification:\n- cargo test -p codewhale-whaleflow rlm_cache_change --locked\n- cargo fmt --all --check\n- git diff --check\n- cmp -s CHANGELOG.md crates/tui/CHANGELOG.md\n- ./scripts/release/check-versions.sh\n- ./scripts/release/check-ohos-deps.sh
Harvests provider-scoped TLS skip-verify from #1893 by @wavezhang. Disabled by default, active-provider-only, doctor-reported, and keeps SSL_CERT_FILE as the preferred custom CA path.
Harvested from PR #2482 by @AdityaVG13, preserving the typed WhaleFlow config and deterministic planner direction without exposing the runtime workflow_run tool yet.
Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
Add /hf and /huggingface command routing for Hugging Face MCP setup/status plus a concepts explainer for provider, MCP, and Hub workflows.
Document the settings-generated Hugging Face MCP configuration path and keep the slice offline: no Hub search command, no direct Hugging Face HTTP requests, and no custom URL encoding.
Refs #2709
Harvested from PR #2782 by @idling11
Add completion_sound = "file" with [notifications].sound_file for Windows custom WAV completion sounds without changing the global Windows sound scheme.
The Windows path uses PlaySoundW asynchronously with no default fallback. Non-Windows file mode warns and no-ops, missing paths warn once, and setting a valid path resets the missing-path warning latch so later misconfiguration is visible again.
Fixes#2484
Reported by @LHqweasd
Harvested from PR #2512 by @cyq1017
Co-authored-by: cyq1017 <61975706+cyq1017@users.noreply.github.com>
Add optional [search].base_url support for DuckDuckGo-compatible private search endpoints, including a preferred CODEWHALE_SEARCH_BASE_URL env override and the legacy DEEPSEEK_SEARCH_BASE_URL alias.
Network policy now gates the configured endpoint host, custom endpoints do not fall back to public Bing, non-DuckDuckGo provider/base_url combinations and challenge pages return explicit errors, and custom endpoint results report the configured host as their source.
Fixes#2436
Reported by @Artenx
Harvested from PR #2510 by @cyq1017
Co-authored-by: cyq1017 <61975706+cyq1017@users.noreply.github.com>
PinchBench runner now defaults to openrouter/xiaomi/mimo-v2.5-pro instead
of deepseek/deepseek-chat. Adds --direct-mimo flag for routing through
Xiaomi's API directly (bypasses OpenRouter), with tp-/sk- key type
detection and endpoint mismatch warnings.
Harbor adapter gains --provider CLI flag for MiMo provider routing.
Known issues documented in docs/MIMO_BENCHMARK_ISSUES.md:
- PinchBench model validation requires OpenRouter prefix
- OPENROUTER_API_KEY needed even for some direct-provider paths
- Token Plan vs pay-as-you-go key/endpoint mismatch
- PinchBench runs through OpenClaw, not CodeWhale
Benchmark harness for evaluating CodeWhale against three external
benchmarks:
- SWE-bench: batch driver wrapping existing codewhale swebench commands
- Terminal-Bench: Harbor adapter (BaseInstalledAgent) for container eval
- PinchBench: runner with auto-install for real-world agent tasks
Includes docs/BENCHMARKS.md umbrella doc with setup, usage, and
reproducibility checklist. Scripts record version/commit/timestamp
metadata for each run.
Branch: codex/v0.8.53-benchmarks (based on v0.8.53)
Harvested from PR #2760 by @sximelon
Fixes#2758
Show the canonical 'codewhale resume <session-id>' subcommand in the sessions footer instead of the invalid dispatcher form, and add a parser/footer regression test tying the hint to the actual Resume command.
Verification: cargo fmt --all -- --check; git diff --check; ./scripts/release/check-versions.sh; cargo test -p codewhale-tui --bin codewhale-tui --locked sessions_footer_points_to_resume_subcommand -- --nocapture; cargo clippy -p codewhale-tui --bin codewhale-tui --locked -- -D warnings.
Co-authored-by: sximelon <15710511+sximelon@users.noreply.github.com>