chore(release): merge v0.9.0-stewardship into v0.8.54

Includes Paulo's command parity and Gherkin E2E harnesses,
HUQIANTAO's concurrency/security fixes, LeoAlex0's runtime_prompt
slim, reidliu41's hotbar persistence, HarmonyOS scaffolding,
Whaleflow foundation crate, and all v0.9.0 stabilization work.
Hunter B
2026-06-08 06:54:09 -07:00
237 changed files with 41229 additions and 4498 deletions
+56
@@ -0,0 +1,56 @@
# Agent Ethos
CodeWhale is maintained with agents, but it is not maintained by automation
alone. Treat community reports and patches as real collaboration: people are
bringing us machines, providers, regions, shells, packages, and edge cases we
could not cover by ourselves.
## Stewardship
- Verify live truth before acting. Check the current branch, release state,
registry state, CI, and linked issues instead of trusting a handoff.
- Issues are intake, not a privilege boundary. Do not auto-close good-faith
issues because the reporter is not allowlisted. Ask for missing reproduction
detail and leave room for maintainer triage.
- PR gates exist for code review, CI load, and trust-boundary safety. They are
not a quality judgment on the contributor. Keep dry-run mode unless a
maintainer deliberately enables enforcement, and use warm copy when the gate
comments.
- Be generous with recurring contributors. When someone repeatedly brings
useful reports or patches, use `/lgtmi` for issue access or `/lgtm` for PR
access so the automation gets out of their way.
- Preserve contributor credit. When harvesting work, inspect the PR and linked
issues, keep author/co-author attribution where possible, add
`Harvested from PR #N by @handle`, and credit the contributor in the
changelog or release notes.
- Make credit machine-readable. If a harvested commit cannot preserve the
contributor as the author, add a `Co-authored-by` trailer with the GitHub
numeric noreply address from `.github/AUTHOR_MAP` or
`gh api users/<login> --jq '"\(.id)+\(.login)@users.noreply.github.com"'`.
Do not use `.local`, placeholder, bot/tool, or raw third-party emails for
human contributor credit.
- Deferral is a maintainer action, not a dismissal. If a PR or issue is not
ready, say what is blocked, what evidence would change the decision, and
which part of the work remains valuable.
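As a minimal sketch of the machine-readable credit rule above, the following Python builds a `Co-authored-by` trailer from a GitHub numeric ID and login, matching the shape produced by the `gh api ... --jq` expression. The function names, the validation regex, and the sample ID and login are illustrative assumptions, not part of CodeWhale's tooling.

```python
import re

# Assumed shape of the numeric noreply address: "<id>+<login>@users.noreply.github.com".
_NOREPLY = re.compile(r"\d+\+[A-Za-z0-9-]+@users\.noreply\.github\.com")


def noreply_address(user_id: int, login: str) -> str:
    """Mirror the jq expression: "\\(.id)+\\(.login)@users.noreply.github.com"."""
    return f"{user_id}+{login}@users.noreply.github.com"


def is_acceptable_credit_email(email: str) -> bool:
    """Accept only the numeric noreply form; `.local`, placeholder, and bot
    addresses fail this check (hypothetical helper)."""
    return _NOREPLY.fullmatch(email) is not None


def co_author_trailer(name: str, user_id: int, login: str) -> str:
    """Format a commit trailer line for a human contributor."""
    return f"Co-authored-by: {name} <{noreply_address(user_id, login)}>"
```

A harvesting script could call `co_author_trailer("Example User", 1234567, "example-user")` (placeholder values) and append the result to the commit message after a blank line.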
## Agent Workflow
- Use sub-agents for exploration, review, and verification, but keep a human
maintainer posture in the parent session. Sub-agent output is evidence; the
parent is responsible for the final decision.
- Personally review community PRs before merging, harvesting, closing, or
deferring them. Do not close work based only on title, labels, or an agent's
summary.
- Prefer narrow, reversible changes that match the existing codebase. Avoid
drive-by refactors while harvesting community work.
- Run the smallest meaningful validation first, then broaden tests when a
change touches shared behavior, release plumbing, auth, sandboxing,
providers, or UI workflows.
- Do not tag, publish, push release artifacts, or create GitHub releases
without explicit maintainer approval.
## Product Tone
CodeWhale should feel like a capable coding harness with a public community,
not a closed queue. Automation should reduce maintainer load while making
contributors feel seen, credited, and able to keep helping.
+155 -14
@@ -209,6 +209,25 @@ legacy top-level `base_url`, so the OpenAI-compatible provider receives it.
provider tables in one config, `[providers.openai].model` can be used as the
OpenAI-provider-specific override.
If the gateway accepts `POST /chat/completions` but rejects
`/v1/chat/completions`, set a provider-local `path_suffix`:
```toml
[providers.openai]
base_url = "https://your-gateway.example/v1"
path_suffix = "/chat/completions"
```
The suffix applies only to chat-completion requests. Model listing and
DeepSeek beta paths keep their built-in routing so a generic gateway override
does not accidentally rewrite `/models` or `/beta/completions`.
For private gateways with broken or intercepted certificates, prefer
`SSL_CERT_FILE` with a trusted CA bundle. As a last resort, a provider table can
set `insecure_skip_tls_verify = true`; this disables certificate verification
only for the active LLM provider client, leaves other HTTP clients unchanged,
and is reported by `codewhale doctor`.
Local HTTP endpoints such as Ollama, SGLang, and vLLM are allowed by default
when they use localhost or loopback addresses. For a non-local `http://`
gateway, launch with `DEEPSEEK_ALLOW_INSECURE_HTTP=1` only on a trusted network:
@@ -246,6 +265,13 @@ api_key = "YOUR_XIAOMI_KEY"
base_url = "https://api.xiaomimimo.com/v1"
```
The example above uses Xiaomi MiMo's pay-as-you-go OpenAI-compatible endpoint.
If you are using a Token Plan key (`tp-...`) for `[vision_model]`, you must set
`base_url` explicitly because this generic OpenAI-compatible block does not
auto-select MiMo endpoints. Use
`https://token-plan-sgp.xiaomimimo.com/v1` for Singapore accounts or
`https://token-plan-cn.xiaomimimo.com/v1` for China-region accounts.
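Because the generic block does not choose an endpoint for you, the choice of explicit `base_url` can be sketched as a small lookup. This is a hypothetical helper for illustration only; the `sgp` and `cn` region labels are assumptions taken from the hostnames above, and the function is not CodeWhale code.

```python
# Endpoints quoted from the documentation above.
PAY_AS_YOU_GO_URL = "https://api.xiaomimimo.com/v1"
TOKEN_PLAN_URLS = {
    "sgp": "https://token-plan-sgp.xiaomimimo.com/v1",  # Singapore accounts
    "cn": "https://token-plan-cn.xiaomimimo.com/v1",    # China-region accounts
}


def mimo_base_url(api_key: str, region: str = "sgp") -> str:
    """Pick the base_url to write into the config for a given key.

    Token Plan keys start with "tp-"; everything else is treated as
    pay-as-you-go. `region` is a hypothetical label, not a config key.
    """
    if api_key.startswith("tp-"):
        return TOKEN_PLAN_URLS[region]
    return PAY_AS_YOU_GO_URL
```

For example, a China-region Token Plan key would map to the `token-plan-cn` host, which is the value to set as `base_url`.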
To bootstrap MCP and skills directories at their resolved paths, run `codewhale-tui setup`.
To only scaffold MCP, run `codewhale-tui mcp init`.
@@ -352,6 +378,35 @@ Select a profile with:
If a profile is selected but missing, codewhale exits with an error listing available profiles.
## Harness Profiles
v0.9 adds a config data model for model-specific harness posture. This is a
preview schema: it can be parsed and tested, but runtime provider/model
selection and prompt/tool behavior are wired in later v0.9 slices.
When no configured profile matches, the resolver falls back to built-in seed
profiles for the model families listed in the cutline doc. Configured profiles
always take precedence over those seeds.
```toml
[[harness_profiles]]
provider_route = "deepseek"
model_pattern = "deepseek-v4.*"
[harness_profiles.posture]
kind = "cache-heavy" # standard | cache-heavy | lean | custom
max_subagents = 10 # 0 means runtime default
prefer_codebase_search = false
compaction_strategy = "prefix-cache" # default | prefix-cache | aggressive
tool_surface = "full" # full | read-only | auto
safety_posture = "standard" # standard | strict | permissive
```
Unknown posture names or unknown keys inside a harness profile fail config
deserialization instead of silently becoming `custom`. That is intentional:
once runtime wiring consumes these profiles, a typo should be visible.
The v0.9 implementation order and automatic-creator boundary are documented in
[`HARNESS_PROFILE_CUTLINE.md`](HARNESS_PROFILE_CUTLINE.md).
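The precedence and strictness rules in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Rust resolver: it assumes `model_pattern` is a regular expression matched against the whole model ID, and the `Profile` class and `resolve` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

ALLOWED_KINDS = {"standard", "cache-heavy", "lean", "custom"}


@dataclass(frozen=True)
class Profile:
    provider_route: str
    model_pattern: str  # assumed to be a regex over the full model ID
    kind: str

    def __post_init__(self) -> None:
        # Unknown posture names fail loudly instead of becoming "custom".
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown posture kind: {self.kind!r}")


def resolve(
    provider: str,
    model: str,
    configured: Sequence[Profile],
    seeds: Sequence[Profile],
) -> Optional[Profile]:
    """Return the first matching configured profile, else the first matching seed."""
    for source in (configured, seeds):  # configured profiles always win
        for profile in source:
            if profile.provider_route == provider and re.fullmatch(
                profile.model_pattern, model
            ):
                return profile
    return None
```

Passing the configured list before the seed list keeps the lookup deterministic: the same inputs always yield the same profile, and a typo in `kind` surfaces at construction time rather than at resolution time.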
## Environment Variables
Most runtime environment variables override config values. API-key variables are
@@ -390,13 +445,17 @@ Remaining variables:
- `VOLCENGINE_MODEL` or `VOLCENGINE_ARK_MODEL`
- `OPENROUTER_API_KEY`
- `OPENROUTER_BASE_URL`
- `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, or `MIMO_API_KEY`
- `XIAOMI_MIMO_TOKEN_PLAN_API_KEY`, `MIMO_TOKEN_PLAN_API_KEY`, `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, or `MIMO_API_KEY`
- `XIAOMI_MIMO_BASE_URL` or `MIMO_BASE_URL`
- `XIAOMI_MIMO_MODEL` or `MIMO_MODEL`
- `XIAOMI_MIMO_MODE` or `MIMO_MODE`
- `NOVITA_API_KEY`
- `NOVITA_BASE_URL`
- `FIREWORKS_API_KEY`
- `FIREWORKS_BASE_URL`
- `HUGGINGFACE_API_KEY` or `HF_TOKEN` (`HF_TOKEN` is a fallback alias accepted when provider is `huggingface`)
- `HUGGINGFACE_BASE_URL` or `HF_BASE_URL`
- `HUGGINGFACE_MODEL` or `HF_MODEL`
- `SILICONFLOW_API_KEY`
- `SILICONFLOW_BASE_URL`
- `SILICONFLOW_MODEL`
@@ -555,6 +614,61 @@ the message. Existing environment variables remain available.
`shell_env` hooks keep their existing `KEY=VALUE` stdout contract;
the JSON stdout contract applies only to `message_submit`.
### Turn-end observer hooks
`turn_end` hooks observe the end of each model turn after post-turn
state, usage totals, cost accounting, notifications, receipts, and
queue recovery have been updated. They receive JSON on stdin and are
observer-only: stdout is ignored, failures are logged as warnings, and
the hook cannot block user input, mutate the transcript, or change the
next queued follow-up.
```toml
[[hooks.hooks]]
event = "turn_end"
command = "~/.codewhale/hooks/turn-audit.sh"
timeout_secs = 2
continue_on_error = true
```
The payload includes common hook metadata plus post-turn accounting:
```json
{
"event": "turn_end",
"session_id": "sess_12345678",
"workspace": "/path/to/workspace",
"mode": "agent",
"model": "deepseek-chat",
"turn_id": "turn_12345678",
"status": "completed",
"error": null,
"duration_ms": 1834,
"usage": {
"input_tokens": 1200,
"output_tokens": 180,
"prompt_cache_hit_tokens": 900,
"prompt_cache_miss_tokens": 300,
"reasoning_tokens": null,
"reasoning_replay_tokens": null
},
"totals": {
"session_tokens": 1380,
"conversation_tokens": 1380,
"input_tokens": 1200,
"output_tokens": 180
},
"tool_count": 2,
"queued_message_count": 1,
"stop_hook_active": false
}
```
For `interrupted` or `failed` turns, `status` reflects that terminal
state and `error` carries the engine error string when one is available.
`stop_hook_active` is reserved for future re-entry protection and is
currently always `false`.
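A hook body for this payload might look like the Python sketch below. It is an assumption-laden example rather than a shipped script: the one-line log format and the function names are hypothetical, and only fields shown in the payload above are read.

```python
import json
from typing import TextIO


def summarize(payload: dict) -> str:
    """Condense a turn_end payload into a single audit line."""
    usage = payload.get("usage") or {}
    hit = usage.get("prompt_cache_hit_tokens") or 0
    miss = usage.get("prompt_cache_miss_tokens") or 0
    total = hit + miss
    ratio = f"{hit / total:.0%}" if total else "n/a"
    return (
        f"{payload.get('turn_id')} {payload.get('status')} "
        f"{payload.get('duration_ms')}ms cache_hit={ratio}"
    )


def run(stdin: TextIO, log: TextIO) -> int:
    """Read one JSON payload from stdin and append a summary line to `log`.

    The hook is observer-only, so malformed input is ignored and the
    function always returns 0.
    """
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        return 0
    log.write(summarize(payload) + "\n")
    return 0
```

To use it as the `command` target, a wrapper script would call `run(sys.stdin, open(path, "a"))` with a log path of your choosing. Since stdout is ignored, anything worth keeping has to go to a file or another sink.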
### Sub-agent lifecycle hooks
`subagent_spawn` and `subagent_complete` hooks observe sub-agent lifecycle
@@ -741,9 +855,11 @@ If you are upgrading from older releases:
### Core keys (used by the TUI/engine)
- `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. `nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint at `https://api.xiaomimimo.com/v1`; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `siliconflow-CN` targets the SiliconFlow China regional endpoint while sharing `[providers.siliconflow]`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`.
- `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. `nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint, using `https://token-plan-sgp.xiaomimimo.com/v1` by default for Token Plan keys (`tp-...`) and `https://api.xiaomimimo.com/v1` for pay-as-you-go keys; set `base_url` explicitly if your Token Plan account uses the China region; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `siliconflow-CN` targets the SiliconFlow China regional endpoint while sharing `[providers.siliconflow]`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` 
targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`.
- `api_key` (string, required for hosted providers): must be non-empty for DeepSeek/hosted providers (or set the provider API key env var). Self-hosted SGLang, vLLM, and Ollama can omit it.
- `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://api.xiaomimimo.com/v1` for `xiaomi-mimo`, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.siliconflow.cn/v1` for `siliconflow-CN`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features.
- `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://token-plan-sgp.xiaomimimo.com/v1` for `xiaomi-mimo` when the API key starts with `tp-...` and `https://api.xiaomimimo.com/v1` otherwise, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.siliconflow.cn/v1` for `siliconflow-CN`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `base_url = "https://token-plan-cn.xiaomimimo.com/v1"` explicitly if your Xiaomi MiMo Token Plan account is provisioned in the China region. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features.
- `path_suffix` (string, optional provider-table key): override the chat-completions path for OpenAI-compatible gateways that do not serve `/v1/chat/completions`. For example, `[providers.openai] path_suffix = "/chat/completions"` sends chat requests to the unversioned base URL plus `/chat/completions`; `models` and `beta/*` requests keep their normal routing.
- `insecure_skip_tls_verify` (bool, optional provider-table key): disabled by default. When true on the active provider table, only the LLM provider HTTP client skips TLS certificate verification. Prefer `SSL_CERT_FILE` for corporate or private CA bundles; `codewhale doctor` reports this setting when enabled.
- `default_text_model` (string, optional): defaults to `deepseek-v4-pro` for DeepSeek and generic OpenAI-compatible endpoints, `deepseek-ai/deepseek-v4-pro` for NVIDIA NIM, `deepseek-ai/deepseek-v4-flash` for AtlasCloud, `deepseek-reasoner` for Wanjie Ark, `DeepSeek-V4-Pro` for Volcengine Ark, `deepseek/deepseek-v4-pro` for OpenRouter and Novita, `mimo-v2.5-pro` for Xiaomi MiMo, `accounts/fireworks/models/deepseek-v4-pro` for Fireworks, `deepseek-ai/DeepSeek-V4-Pro` for SiliconFlow, `trinity-large-thinking` for Arcee AI, `kimi-k2.6` for Moonshot, `deepseek-ai/DeepSeek-V4-Pro` for SGLang/vLLM, and `deepseek-coder:1.3b` for Ollama. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows, 384K max output, and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash` until July 24, 2026, except SiliconFlow maps `deepseek-reasoner` and `deepseek-r1` to its Pro model while `deepseek-chat` and `deepseek-v3` map to Flash. Provider-specific mappings translate `deepseek-v4-pro` / `deepseek-v4-flash` to each provider's model ID where supported. OpenRouter also recognizes recent large IDs such as `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, and `moonshotai/kimi-k2.6`; direct Arcee uses bare IDs such as `trinity-large-thinking` and `trinity-large-preview`; direct Xiaomi MiMo recognizes chat IDs `mimo-v2.5-pro` and `mimo-v2.5`, while TTS IDs are selected through `codewhale speech` / `tts`. Generic `openai`, `atlascloud`, `wanjie-ark`, `xiaomi-mimo`, `arcee`, and Ollama model IDs are passed through unchanged after known aliases are normalized. 
OpenRouter and SiliconFlow provider configs with a custom `base_url` also preserve explicit model values, which lets OpenAI-compatible gateways accept bare model IDs. Use `/models` or `codewhale models` to discover live IDs from your configured endpoint. `CODEWHALE_MODEL` overrides this for a single process; `DEEPSEEK_MODEL` is the legacy alias.
- `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`; defaults to the configured UI tier. DeepSeek Platform receives top-level `thinking` / `reasoning_effort` fields. NVIDIA NIM receives equivalent settings through `chat_template_kwargs`.
- `allow_shell` (bool, optional): defaults to `false`; shell tools must be explicitly enabled.
@@ -759,8 +875,10 @@ If you are upgrading from older releases:
records loaded next to `config.toml`, for example
`~/.codewhale/permissions.toml`. This schema foundation accepts
`[[rules]]` entries with `tool` plus optional `command` or `path` fields.
It intentionally does not accept typed allow/deny records or provide approval
UI persistence yet.
Loaded rules feed the execution policy engine and force approval in approval
modes that can ask; under `approval_policy = "never"`, matching ask rules are
rejected because no prompt can be shown. This intentionally does not accept
typed allow/deny records, glob expansion, or approval UI persistence yet.
- `managed_config_path` (string, optional): managed config file loaded after user/env config.
- `requirements_path` (string, optional): requirements file used to enforce allowed approval/sandbox values.
- `max_subagents` (int, optional): defaults to `10` and is clamped to `1..=20`.
@@ -851,18 +969,22 @@ If you are upgrading from older releases:
turns whose elapsed time meets `threshold_secs`; failed and cancelled
turns are silent. `auto` resolves to `osc9` for `iTerm.app`, `Ghostty`,
and `WezTerm` (detected via `$TERM_PROGRAM`). Otherwise the fallback is
`bel` on macOS / Linux and `off` on Windows (where BEL maps to the
system error chime — see the [Notifications](#notifications) section
for the full rationale, #583).
`bel`; on Windows the BEL path is routed through `MessageBeep(MB_OK)`.
- `[notifications].threshold_secs` (int, optional): defaults to `30`.
Only completed turns whose elapsed time meets or exceeds this threshold
trigger a notification.
- `[notifications].include_summary` (bool, optional): defaults to
`false`. When `true`, the notification body includes the elapsed
duration and the turn's cost in the configured display currency.
- `[notifications].completion_sound` (string, optional): `off`, `beep`,
`bell`, or `file`. Defaults to `beep`. On Windows, `file` plays the WAV file
named by `[notifications].sound_file`.
- `[notifications].sound_file` (path, optional): path to a custom WAV file
used when `completion_sound = "file"`.
- `tui.alternate_screen` (string, optional): `auto`, `always`, or `never`. This is retained for config compatibility, but interactive sessions now always use the TUI-owned alternate screen so host terminal scrollback cannot hijack the viewport.
- `tui.mouse_capture` (bool, optional, default `true` on non-Windows terminals and on Windows Terminal/ConEmu/Cmder when the alternate screen is active; `false` on legacy Windows console and inside JetBrains JediTerm — PyCharm/IDEA/CLion/etc. — where mouse-event escapes leak into the input stream as garbled text, see #878 / #898): enable internal mouse scrolling, transcript selection, right-click context actions, and transcript scrollbar dragging. TUI-owned drag selection copies only transcript text, removes visual wrap-column line breaks from paragraphs, and keeps selection scoped to the transcript pane. Set this to `false` or run with `--no-mouse-capture` for raw terminal selection; set it to `true` or run with `--mouse-capture` to opt in anywhere it's defaulted off. On raw terminal selection, especially on legacy Windows console or when mouse capture is disabled, selection may cross the right sidebar and include visual wraps because the terminal, not the TUI, owns the selection.
- `tui.terminal_probe_timeout_ms` (int, optional, default `500`): startup terminal-mode probe timeout in milliseconds. Values are clamped to `100..=5000`; timeout emits a warning and aborts startup instead of hanging indefinitely.
- `tui.stream_chunk_timeout_secs` (int, optional, default `300`): per-SSE-chunk idle timeout for streamed model responses. Slow local or compatible servers can raise this with `/config stream_chunk_timeout_secs <seconds>`; `0` maps to the default and explicit values must be `1..=3600`. The legacy `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` env var is still honored when this key is omitted.
- `tui.osc8_links` (bool, optional, default `true`): emit OSC 8 escape sequences around URLs in transcript output so terminals that support them (iTerm2, Terminal.app 13+, Ghostty, Kitty, WezTerm, Alacritty, recent gnome-terminal/konsole) render them as Cmd+click hyperlinks. Terminals without OSC 8 support render the plain URL and ignore the escape. Set `false` for terminals that misrender the sequence; selection/clipboard output always strips the escapes.
- `hooks` (optional): lifecycle hooks configuration (see `config.example.toml`).
- `features.*` (optional): feature flag overrides (see below).
@@ -922,16 +1044,22 @@ The TUI can emit a desktop notification (OSC 9 escape or plain BEL) when a turn
method = "auto" # auto | osc9 | bel | off
threshold_secs = 30 # only notify when the turn took >= this many seconds
include_summary = false # include elapsed time + cost in the notification body
completion_sound = "beep" # off | beep | bell | file
sound_file = "E:\\google\\downloads\\notify.wav" # for completion_sound = "file"
```
Method semantics:
- `auto` (default) — picks `osc9` for `iTerm.app`, `Ghostty`, and `WezTerm` (detected via `$TERM_PROGRAM`). On macOS and Linux it falls back to `bel`. **On Windows the fallback is `off`** instead of `bel`, because the Windows audio stack maps `\x07` to the `SystemAsterisk` / `MB_OK` chime — the same sound application error popups use, so a successful-turn notification ends up sounding like an error (#583).
- `auto` (default) — picks `osc9` for `iTerm.app`, `Ghostty`, and `WezTerm` (detected via `$TERM_PROGRAM`). Otherwise it falls back to `bel`; on Windows that BEL path is routed through `MessageBeep(MB_OK)`.
- `osc9` — emit `\x1b]9;<msg>\x07`. Inside tmux the sequence is wrapped in DCS passthrough so it reaches the outer terminal.
- `bel` — emit a single `\x07` byte. Use this on Windows only if you actively want the chime back.
- `off` — disable post-turn notifications entirely.
Windows users who run inside a known OSC-9 terminal (e.g. WezTerm on Windows) keep getting OSC-9 notifications; the `off` fallback only applies when no recognised `TERM_PROGRAM` is detected.
Windows users who run inside a known OSC-9 terminal (e.g. WezTerm on Windows) keep getting OSC-9 notifications. Set `method = "off"` to disable threshold-based desktop notifications entirely.
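The updated method semantics can be sketched as follows. This is a simplified Python illustration, not the TUI's implementation: the function names are hypothetical, and the tmux wrapper uses the standard tmux DCS passthrough form, which is assumed to match what CodeWhale emits.

```python
from typing import Optional

# Terminals that `auto` maps to OSC 9, detected via $TERM_PROGRAM.
OSC9_TERMINALS = {"iTerm.app", "Ghostty", "WezTerm"}


def resolve_method(method: str, term_program: Optional[str]) -> str:
    """Resolve `auto` to a concrete method; explicit values pass through."""
    if method != "auto":
        return method
    return "osc9" if term_program in OSC9_TERMINALS else "bel"


def osc9(message: str, in_tmux: bool = False) -> str:
    """Build the OSC 9 notification sequence, optionally wrapped for tmux."""
    seq = f"\x1b]9;{message}\x07"
    if in_tmux:
        # tmux passthrough: ESC P tmux; <payload with each ESC doubled> ESC \
        return "\x1bPtmux;" + seq.replace("\x1b", "\x1b\x1b") + "\x1b\\"
    return seq
```

For instance, `resolve_method("auto", os.environ.get("TERM_PROGRAM"))` would pick the method for the current terminal, and `bel` corresponds to writing a single `\x07` byte.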
`completion_sound = "file"` is for Windows users who want a per-application
completion sound without changing the global Windows sound scheme. It plays the
configured WAV `sound_file` asynchronously via the native Windows audio API.
### Parsed but currently unused (reserved for future versions)
@@ -978,8 +1106,15 @@ Use `codewhale-tui features list` to inspect known flags and their effective sta
`web_search` uses DuckDuckGo by default and does not require an API key. The
DuckDuckGo path keeps a Bing fallback when DDG returns a bot challenge or no
parseable results. Bing remains selectable for users who explicitly want it,
and Tavily, Bocha, Metaso, or Baidu can be selected when an API-backed provider
is preferred.
and Tavily, Bocha, Metaso, Baidu, Volcengine, or Sofya can be selected when an
API-backed provider is preferred.
For a private/internal search service that serves DuckDuckGo-compatible HTML,
keep `provider = "duckduckgo"` and set `base_url`; CodeWhale appends the `q`
query parameter to that endpoint and applies network policy to its host.
Custom endpoints do not fall back to public Bing. `CODEWHALE_SEARCH_BASE_URL`
can override this per process; `DEEPSEEK_SEARCH_BASE_URL` remains accepted as
the legacy alias.
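To make the custom-endpoint behavior concrete, here is a rough Python sketch of appending `q` to a configured `base_url` and applying the per-process override. The function names are hypothetical, and checking `CODEWHALE_SEARCH_BASE_URL` before the legacy alias is an assumption; the source only states that both are accepted.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def effective_base_url(configured: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Environment overrides win over the config value (ordering between the
    two variable names is assumed, new name first)."""
    return (
        env.get("CODEWHALE_SEARCH_BASE_URL")
        or env.get("DEEPSEEK_SEARCH_BASE_URL")
        or configured
    )


def search_url(base_url: str, query: str) -> str:
    """Append the `q` parameter, keeping any other query parameters intact."""
    parts = urlsplit(base_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "q"  # replace a pre-existing q rather than duplicating it
    ]
    params.append(("q", query))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

With `base_url = "https://search.example/html/"`, a query such as `rust async` would be requested from that host only, which is why network policy can be applied to a single known hostname.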
**Metaso** ([metaso.cn](https://metaso.cn)) has a 100 searches/day free quota;
set `METASO_API_KEY` or `[search] api_key` for a higher quota.
@@ -989,10 +1124,16 @@ set `METASO_API_KEY` or `[search] api_key` for a higher quota.
`BAIDU_SEARCH_API_KEY` or `[search] api_key`. This is a search-tool backend
only; it does not add a Baidu model provider.
**Sofya** ([sofya.co](https://sofya.co)) returns full extracted page content
rather than snippets. Set `[search] api_key` to your `ay_live_...` key, or the
`SOFYA_API_KEY` env var. This is a search-tool backend only; it does not add a
Sofya model provider.
```toml
[search]
provider = "baidu" # duckduckgo | bing | tavily | bocha | metaso | baidu
# api_key = "YOUR_KEY" # required for tavily, bocha, and baidu; optional for metaso
provider = "baidu" # duckduckgo | bing | tavily | bocha | metaso | baidu | volcengine | sofya
# base_url = "https://search.example/html/" # optional with provider = "duckduckgo"
# api_key = "YOUR_KEY" # required for tavily, bocha, baidu, volcengine, and sofya; optional for metaso
```
## Local Media Attachments
+5
@@ -232,6 +232,11 @@ Or switch directly:
Plan mode is the safest place to start in an unfamiliar repository. It is for
inspection and decision-making, not file edits.
For non-trivial work, Plan mode's confirmation prompt can show a grounded
PlanArtifact: objective, context, sources used, critical files, constraints,
approach, verification plan, risks, and handoff notes. Empty sections are
visible when the agent uses the rich artifact shape, so you can ask for a
revision instead of accepting an under-specified plan.
Agent mode is the default for most contribution work. It lets CodeWhale read,
run checks, and edit files while keeping risky actions behind approval gates.
+79
@@ -0,0 +1,79 @@
# Harness Profile Cutline
This note defines the v0.9.0 order for HarnessProfile work. The automatic
Harness Creator must not run before the profile schema, resolver, seed
profiles, and user-visible status surfaces are explicit and tested.
## Decision
For v0.9.0, CodeWhale should treat harness profiles as typed policy data first.
Automatic profile evolution is deferred until replay evidence, candidate
manifests, and promotion gates exist.
The first implementation lane stops at:
1. `HarnessPosture` enum and policy knobs.
2. `HarnessProfile` schema and registry.
3. Deterministic profile resolver.
4. Seed profiles for common model families.
5. Repo constitution overlay input.
6. Status/UX display of the resolved provider, model, profile, and repo law.
Only after those surfaces are visible and tested should CodeWhale add evidence
stores, candidate manifests, promotion gates, or an agentic Harness Creator.
## Required Seed Profiles
| Model family | Intended posture | Notes |
| --- | --- | --- |
| DeepSeek V4 Pro / Flash | cache-heavy | Preserve prefix stability and large-context continuity. |
| Xiaomi MiMo v2.5 Pro / Flash | cache-heavy | Similar long-context/cache posture, but route and auth remain distinct from DeepSeek. |
| Arcee Trinity Thinking | cache-heavy or explicit Arcee profile | Direct Arcee IDs such as `trinity-large-thinking` must not be hidden behind OpenRouter aliases. |
| Hugging Face / local / open-weight routes | lean | Prefer smaller context packs, stricter tool surfaces, and subagent-oriented decomposition. |
| Generic OpenAI-compatible gateways | standard unless matched | Do not infer provider-specific posture from a bare endpoint alone. |
Provider route, endpoint, model id, HarnessProfile, and repo constitution must be
separately visible. A profile resolver may choose a profile, but it must not
silently change provider auth, base URLs, model IDs, tool allowlists, or repo
permissions.
## Repo Constitution Boundary
`.codewhale/constitution.json` is local repo law, not another provider profile.
The resolver may read it as an input after project trust checks, but profile
selection must show both:
- the model-facing posture, such as `cache-heavy` or `lean`;
- the repo-law source, such as `.codewhale/constitution.json` or none.
## Automatic Evolution Boundary
AHE/GEPA-style profile evolution is future work. It can be referenced as
inspiration only after the text distinguishes these stages:
1. candidate proposal from recorded evidence;
2. replay/eval against a weaker or constrained student;
3. promotion-gate decision with required tests and policy checks;
4. inspectable overlay update or rollback.
No v0.9.0 harness profile should be silently promoted, mutated, or written to a
cached-main overlay by the schema/resolver/display lane.
## Smoke Evidence
Before v0.9.0 ships with HarnessProfile runtime behavior beyond schema parsing
and pure resolver checks, the acceptance matrix should record evidence for:
- DeepSeek V4 resolving to a cache-heavy profile;
- Xiaomi MiMo resolving to a cache-heavy profile without sharing DeepSeek auth;
- Arcee direct `trinity-large-thinking` resolving through the direct `arcee`
route, not the OpenRouter `arcee-ai/trinity-large-thinking` alias;
- a generic/HF/local model resolving to a lean or standard profile;
- the TUI or runtime status surface showing provider, model, profile, and repo
constitution separately;
- no automatic profile mutation during normal Agent or WhaleFlow runs.
For v0.9.0, pure resolver tests may satisfy the profile-selection evidence, but
status display and runtime use remain deferred until separate PRs wire those
surfaces deliberately. Release notes should still call HarnessProfile a typed
schema/resolver foundation rather than an automatic harness creator.
+92
@@ -0,0 +1,92 @@
# HarmonyOS and OpenHarmony
This page covers CodeWhale on HarmonyOS PC and OpenHarmony cross-build setups.
## Running On HarmonyOS PC
HarmonyOS PC can use the normal Linux ARM64 package when its userspace is
glibc-compatible:
```bash
npm i -g codewhale
codewhale --version
```
You can also download `codewhale-linux-arm64` and
`codewhale-tui-linux-arm64` from the GitHub Releases page and place both
binaries on `PATH`.
## Cross-Compiling To OpenHarmony
The repository does not check in machine-specific SDK paths. Set
`OHOS_NATIVE_SDK` to the OpenHarmony native SDK directory, that is, the directory
that contains `llvm/bin`, `sysroot`, and `build/cmake/ohos.toolchain.cmake`.
On Windows PowerShell:
```powershell
$env:OHOS_NATIVE_SDK="<path-to-openharmony-native-sdk>"
. .\scripts\ohos-env.ps1
rustup target add aarch64-unknown-linux-ohos
cargo build --target aarch64-unknown-linux-ohos -p codewhale-cli
```
On Linux or macOS:
```bash
export OHOS_NATIVE_SDK=/path/to/openharmony/native
. ./scripts/ohos-env.sh
rustup target add aarch64-unknown-linux-ohos
cargo build --target aarch64-unknown-linux-ohos -p codewhale-cli
```
The setup scripts export Cargo's target-specific `linker`, `AR`, `CC`, `CXX`,
`CFLAGS`, `CXXFLAGS`, `CARGO_ENCODED_RUSTFLAGS`, `CC_SHELL_ESCAPED_FLAGS`, and
CMake toolchain variables for `aarch64-unknown-linux-ohos`.
## Compiler Wrappers
For ad-hoc compiler calls, use the root wrappers. They read the same
`OHOS_NATIVE_SDK` variable and do not contain local paths.
Windows PowerShell:
```powershell
.\ohos-clang.ps1 --version
.\ohos-clangxx.ps1 --version
```
Linux or macOS:
```bash
sh ./ohos-clang.sh --version
sh ./ohos-clangxx.sh --version
```
If you want to run the POSIX wrappers directly as `./ohos-clang.sh`, make them
executable first:
```bash
chmod +x ./ohos-clang.sh ./ohos-clangxx.sh
```
## Cargo Config
`.cargo/config.toml` intentionally does not set a checked-in linker path.
Cargo cannot expand environment variables inside `linker` or CMake toolchain
path values there, so those values are exported by `scripts/ohos-env.ps1` and
`scripts/ohos-env.sh` instead.
## Dependency Guard
Release prep runs a no-SDK dependency check:
```bash
./scripts/release/check-ohos-deps.sh
```
The guard resolves the `codewhale-tui` dependency graph for
`aarch64-unknown-linux-ohos` and fails if unsupported host/UI crates re-enter
the target graph: `nix` 0.28/0.29, `portable-pty`, `starlark`, `arboard`, or
`keyring`. This does not replace a real SDK/sysroot build, but it catches the
known `starlark -> rustyline -> nix` and PTY/keyring regressions before release.
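
The rule the guard enforces can be sketched as a small filter over resolved `(name, version)` pairs. This is only an illustration of the rule as described above; how the real script obtains the target graph is not shown here, and `find_unsupported` is a hypothetical name.

```python
# Crates that must not appear in the OpenHarmony target graph at any version.
BANNED_ANY_VERSION = {"portable-pty", "starlark", "arboard", "keyring"}
# Only these nix release lines are treated as unsupported.
BANNED_NIX_PREFIXES = ("0.28.", "0.29.")


def find_unsupported(crates):
    """Return the (name, version) pairs that would fail the guard."""
    hits = []
    for name, version in crates:
        if name in BANNED_ANY_VERSION:
            hits.append((name, version))
        elif name == "nix" and version.startswith(BANNED_NIX_PREFIXES):
            hits.append((name, version))
    return hits
```

An empty result corresponds to a passing check; any hit corresponds to a crate that re-entered the graph.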
+34
@@ -44,6 +44,8 @@ systems such as Alpine should use [Build from source](#7-build-from-source).
> and `codewhale-tui-linux-arm64`, so a plain `npm i -g codewhale` works
> on any glibc-based ARM64 Linux. If you're stuck on v0.8.7, jump to
> [Build from source](#7-build-from-source) — `cargo install` works fine.
> For HarmonyOS PC and OpenHarmony cross-build setup, see
> [HarmonyOS and OpenHarmony](HarmonyOS.md).
---
@@ -285,6 +287,38 @@ curl -L -o /tmp/codewhale-artifacts-sha256.txt \
(Use `shasum -a 256 -c` instead of `sha256sum` on macOS.)
### Roll back to a previous release
If a new release is bad on your machine, install the last known-good version
explicitly. Replace `X.Y.Z` with the version you want to restore.
```bash
# npm wrapper, including the matching GitHub release binaries
npm install -g codewhale@X.Y.Z
# Cargo install path; both crates are required
cargo install codewhale-cli --version X.Y.Z --locked --force
cargo install codewhale-tui --version X.Y.Z --locked --force
```
For manual installs, download both binaries or the platform archive from the
exact release tag and verify the matching checksum manifest from that same tag:
```bash
# individual binaries
curl -L -o codewhale-artifacts-sha256.txt \
https://github.com/Hmbown/CodeWhale/releases/download/vX.Y.Z/codewhale-artifacts-sha256.txt
# platform archives
curl -L -o codewhale-bundles-sha256.txt \
https://github.com/Hmbown/CodeWhale/releases/download/vX.Y.Z/codewhale-bundles-sha256.txt
```
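
If `sha256sum` or `shasum` is not available, the same check can be sketched in Python. This assumes the manifest uses the conventional `sha256sum` line format (`<hex digest>  <filename>`), which is an assumption based on the `sha256sum -c` usage above; `parse_manifest` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Map file names to lowercase hex digests from sha256sum-style lines."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # sha256sum marks binary mode with a leading '*' on the file name.
        entries[name.lstrip("*").strip()] = digest.lower()
    return entries


def verify(path, manifest_text):
    """Return True only if the file is listed and its digest matches."""
    expected = parse_manifest(manifest_text).get(Path(path).name)
    if expected is None:
        return False
    actual = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return actual == expected
```

A file that is missing from the manifest is treated as a failure, so a binary from one tag cannot silently pass against a manifest from another.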
Inside a CodeWhale workspace, `/restore list [N]` lists side-git file snapshots
and `/restore <N>` restores files from the chosen snapshot. That workspace
rollback does not change your installed binary version and does not rewrite
conversation history.
### Windows Scoop
The `codewhale` package is listed in Scoop's main bucket:
+50
@@ -61,6 +61,56 @@ manager snapshot. Config edits made from the TUI are written immediately, but
the model-visible MCP tool pool is not hot-reloaded; the manager marks this as
restart-required until the TUI is restarted.
## Hugging Face MCP
Hugging Face provides a hosted MCP server for Hub resources, documentation,
datasets, Spaces, and community tools. CodeWhale does not call Hugging Face's
Hub HTTP APIs from `/hf`; it only helps you inspect and set up the MCP config
that the regular MCP manager will load.
The recommended setup path is Hugging Face's settings-generated configuration:
1. Visit <https://huggingface.co/settings/mcp> while signed in.
2. Choose the MCP client closest to your CodeWhale config shape and copy the
generated server snippet.
3. Paste the Hugging Face server entry into your resolved MCP config file.
4. Restart CodeWhale. Alternatively, run `/mcp reload` to refresh the manager
   snapshot, then restart if the model-visible tool pool still needs to rebuild.
CodeWhale reads both `servers` and `mcpServers`, so settings-generated snippets
can be adapted without changing the rest of the MCP file. A placeholder-only
shape looks like this:
```json
{
  "servers": {
    "huggingface": {
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer ${HF_TOKEN}"
      }
    }
  }
}
```
The placeholder above is not a runnable secret. Use the settings-generated
value in your private MCP config and never commit real Hugging Face tokens.
Interactive helpers:
```text
/hf mcp status
/hf mcp setup
/hf concepts
```
`/hf mcp status` checks the configured MCP file for common Hugging Face server
names or Hugging Face MCP URLs. `/hf concepts` explains the difference between
the Hugging Face provider route, Hugging Face MCP, and explicit Hub workflows.
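
As a rough illustration of that kind of check, the sketch below scans a config for entries under either `servers` or `mcpServers`. It is not the `/hf mcp status` implementation: the matching rules, the simple union of both keys, and the `find_huggingface_servers` name are all assumptions.

```python
import json


def find_huggingface_servers(config_text):
    """Return sorted names of entries that look like a Hugging Face MCP server."""
    config = json.loads(config_text)
    merged = {}
    # Assumption: union both top-level keys; real precedence may differ.
    for key in ("servers", "mcpServers"):
        section = config.get(key)
        if isinstance(section, dict):
            merged.update(section)
    matches = []
    for name, entry in merged.items():
        url = entry.get("url", "") if isinstance(entry, dict) else ""
        # Match on a Hugging Face-looking name or the hosted MCP URL.
        if "huggingface" in name.lower() or "huggingface.co/mcp" in str(url):
            matches.append(name)
    return sorted(matches)
```

The function only reads the config text; it never resolves `${HF_TOKEN}` or touches the network.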
Official docs: <https://huggingface.co/docs/hub/hf-mcp-server>
## Config File Location
Default path:
+13 -11
@@ -15,10 +15,10 @@ implemented today.
- DeepSeek is the first-class default provider today, with `deepseek-v4-pro`,
`deepseek-v4-flash`, streaming thinking blocks, Fin routing, `DEEPSEEK_*`
environment variables, and `~/.deepseek` config compatibility.
- OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, generic
OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider
paths where their IDs appear in `/provider`, `codewhale --provider`, or
`codewhale models`.
- OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, Hugging
Face Inference Providers, generic OpenAI-compatible endpoints, SGLang, vLLM,
and Ollama are supported provider paths where their IDs appear in
`/provider`, `codewhale --provider`, or `codewhale models`.
- Model auto-routing chooses a concrete DeepSeek model and thinking level per
turn. It is not a TUI mode.
- Fin is the fast `deepseek-v4-flash` thinking-off path for routing,
@@ -29,9 +29,11 @@ implemented today.
## Not Implemented Yet
- A native Hugging Face provider or Hub browser.
- Built-in Hugging Face model card, dataset, adapter, safetensors, or Jobs
workflows.
- A native Hugging Face Hub browser, model passport picker, or direct Hub search
workflow. The OpenAI-compatible Hugging Face Inference Providers route is
implemented separately as a chat provider.
- Built-in Hugging Face model card, dataset, adapter, safetensors, Spaces, or
Jobs workflows.
- Native Unsloth, NeMo, or Arcee integrations.
- A dedicated Model Lab UI tab.
- Built-in benchmark suites, eval leaderboards, hosted observability, or
@@ -62,13 +64,13 @@ Planned scope:
- Hub API auth and model discovery.
- Model cards, licenses, tags, safetensors metadata, adapters, and dataset
links surfaced in a terminal-friendly way.
- Inference Providers as explicit provider choices when the user configures
them.
- Native Hub browser and model-passport metadata on top of the already separate
Hugging Face Inference Providers chat route.
- Hugging Face Jobs as an optional remote execution path for user-approved
experiments.
Non-goal for now: claiming a native Hugging Face provider exists before it is
implemented in code.
Non-goal for now: claiming native Hub search, model passports, Spaces/Jobs, or
Model Lab UI exists before those surfaces are implemented in code.
## Unsloth Workset
+2 -1
@@ -137,7 +137,8 @@ DeepSeek-TUI has three related but intentionally separate recovery paths:
- Esc-Esc backtrack rewinds the live transcript to a previous user prompt and
restores that prompt into the composer for editing.
- `/restore` and the `revert_turn` tool restore workspace files from side-git
snapshots. They do not rewrite conversation history.
snapshots. `/restore list [N]` lists additional snapshots to choose from before
rolling back. Neither `/restore` nor `revert_turn` rewrites conversation history.
A Pi-style in-file tree browser is a larger UI/data-model project. v0.8.40
ships the bounded fork/backtrack primitives and explicit lineage metadata.
+33 -1
@@ -102,6 +102,12 @@ base_url = "https://gateway.example/v1"
model = "your-deepseek-compatible-model"
```
Private gateways with broken or intercepted certificates should use
`SSL_CERT_FILE` with a trusted CA bundle. As a last resort,
`insecure_skip_tls_verify = true` can be set on the active `[providers.*]`
table; it applies only to the LLM provider client and is shown by
`codewhale doctor`.
Keep `provider`, `api_key`, and `base_url` in user config or process
environment. Project-local config overlays intentionally cannot set those keys,
so a repository cannot silently redirect prompts or credentials to another
@@ -118,7 +124,7 @@ endpoint.
| `wanjie-ark` | `[providers.wanjie_ark]` | `WANJIE_ARK_API_KEY`, `WANJIE_API_KEY`, `WANJIE_MAAS_API_KEY` | `WANJIE_ARK_BASE_URL`, `WANJIE_BASE_URL`, `WANJIE_MAAS_BASE_URL`; default `https://maas-openapi.wanjiedata.com/api/v1` | `deepseek-reasoner` | OpenAI-compatible hosted route. `WANJIE_ARK_MODEL`, `WANJIE_MODEL`, and `WANJIE_MAAS_MODEL` are accepted. |
| `volcengine` | `[providers.volcengine]` | `VOLCENGINE_API_KEY`, `VOLCENGINE_ARK_API_KEY`, `ARK_API_KEY` | `VOLCENGINE_BASE_URL`, `VOLCENGINE_ARK_BASE_URL`, `ARK_BASE_URL`; default `https://ark.cn-beijing.volces.com/api/coding/v3` | `DeepSeek-V4-Pro`, `DeepSeek-V4-Flash` | Volcengine/Volcano Engine Ark OpenAI-compatible coding endpoint. `VOLCENGINE_MODEL` and `VOLCENGINE_ARK_MODEL` are accepted. |
| `openrouter` | `[providers.openrouter]` | `OPENROUTER_API_KEY` | `OPENROUTER_BASE_URL`; default `https://openrouter.ai/api/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`; recent large IDs include `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, `z-ai/glm-5.1`, `moonshotai/kimi-k2.6` | Additive open-model routing layer. It does not replace DeepSeek; it lets users route supported model IDs through OpenRouter when they choose it. |
| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`; default `https://token-plan-sgp.xiaomimimo.com/v1` | Chat: `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS: `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. Token Plan keys (`tp-...`) use the token-plan endpoint by default; pay-as-you-go keys can set `base_url = "https://api.xiaomimimo.com/v1"`. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. |
| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_TOKEN_PLAN_API_KEY`, `MIMO_TOKEN_PLAN_API_KEY`, `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`, `XIAOMI_MIMO_MODE`, `MIMO_MODE`; default `https://token-plan-sgp.xiaomimimo.com/v1` | Chat: `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS: `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. Token Plan keys (`tp-...`) use `api-key` auth and the token-plan endpoint by default; pay-as-you-go mode uses standard API keys (`sk-...`) and `https://api.xiaomimimo.com/v1`. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. |
| `novita` | `[providers.novita]` | `NOVITA_API_KEY` | `NOVITA_BASE_URL`; default `https://api.novita.ai/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash` | OpenAI-compatible hosted route for DeepSeek model IDs. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. |
| `fireworks` | `[providers.fireworks]` | `FIREWORKS_API_KEY` | `FIREWORKS_BASE_URL`; default `https://api.fireworks.ai/inference/v1` | `accounts/fireworks/models/deepseek-v4-pro` | OpenAI-compatible hosted route. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. |
| `siliconflow` | `[providers.siliconflow]` | `SILICONFLOW_API_KEY` | `SILICONFLOW_BASE_URL`; default `https://api.siliconflow.com/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | OpenAI-compatible hosted route. Official docs use the `.com` endpoint. `SILICONFLOW_MODEL` is accepted. Reasoning aliases `deepseek-reasoner` and `deepseek-r1` map to Pro; `deepseek-chat` and `deepseek-v3` map to Flash. |
@@ -130,6 +136,24 @@ endpoint.
| `ollama` | `[providers.ollama]` | Optional `OLLAMA_API_KEY` | `OLLAMA_BASE_URL`; default `http://localhost:11434/v1` | `deepseek-coder:1.3b`; provider-hinted custom tags pass through | Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. `OLLAMA_MODEL` is accepted. |
| `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible route. Org-prefixed model IDs pass through. |
### Hugging Face Provider vs MCP vs Hub
CodeWhale's `huggingface` provider ID is only the OpenAI-compatible chat
inference route through Hugging Face Inference Providers. It is selected with
`/provider huggingface`, `CODEWHALE_PROVIDER=huggingface`, or
`provider = "huggingface"`.
Hugging Face MCP is a separate external-tool route. Configure it through the
MCP config described in `docs/MCP.md`, preferably using the settings-generated
snippet from <https://huggingface.co/settings/mcp>. In the TUI, `/hf mcp status`
checks whether the Hugging Face MCP server appears in the resolved MCP config,
`/hf mcp setup` prints the settings workflow and a placeholder-only shape, and
`/hf concepts` explains the provider/MCP/Hub distinction.
Hub publishing or repository management remains explicit user action through
Hub-native tooling such as `huggingface_hub` or git. The `/hf` helper does not
upload to Hugging Face and does not perform direct Hugging Face Hub HTTP search.
### Xiaomi MiMo Notes
`xiaomi-mimo` defaults to `mimo-v2.5-pro` for long-context reasoning and coding
@@ -137,6 +161,14 @@ work. The chat picker also exposes the latest Omni model `mimo-v2.5`. Xiaomi MiM
TTS is available through `codewhale --provider xiaomi-mimo speech "text"
--model tts` (or the `tts` alias) plus model-visible `speech` / `tts` tools in
Agent/YOLO mode.
Token Plan keys default to the Singapore endpoint
`https://token-plan-sgp.xiaomimimo.com/v1`. If your MiMo account is provisioned
for the China region, set `base_url = "https://token-plan-cn.xiaomimimo.com/v1"`
explicitly in `[providers.xiaomi_mimo]` or set `mode = "token-plan-cn"`. Europe
Token Plan accounts can use `mode = "token-plan-ams"`; `mode = "pay-as-you-go"`
selects the standard API endpoint and standard MiMo key family.
Voice-design and voice-clone shorthands map to `mimo-v2.5-tts-voicedesign` and
`mimo-v2.5-tts-voiceclone`. Xiaomi's current
[image-understanding guide](https://platform.xiaomimimo.com/docs/en-US/usage-guide/multimodal-understanding/image-understanding)
+11 -12
@@ -69,10 +69,10 @@ Anything that targets the DeepSeek provider API stays exactly as it was:
Docker, or direct downloads.
- **Docker image**: `ghcr.io/hmbown/codewhale`.
## Deprecation shims (through v0.8.x)
## Deprecation shims (removed in v0.9.0)
To keep existing shell aliases, scripts, and CI working through the rename,
v0.8.41 and later v0.8.x releases ship **deprecation shims**:
v0.8.41 and later v0.8.x releases shipped **deprecation shims**:
- A `deepseek` binary that prints a one-line warning to stderr and forwards
argv to `codewhale`.
@@ -80,7 +80,9 @@ v0.8.41 and later v0.8.x releases ship **deprecation shims**:
- The legacy `deepseek-tui` npm package is deprecated and no longer receives
new releases. Install the `codewhale` npm package instead.
These shims will be removed in **v0.9.0**. Please migrate before then.
These binary shims are removed in **v0.9.0**. DeepSeek provider support, model
IDs, `DEEPSEEK_*` environment variables, and legacy `~/.deepseek/` state
fallbacks remain supported.
## Migrating in practice
@@ -114,15 +116,12 @@ downloads until the formula and tap repo are renamed.
### Manual / GitHub Releases
`v0.8.41` Releases attach **both** the canonical `codewhale-*` /
`codewhale-tui-*` assets and the legacy `deepseek-*` / `deepseek-tui-*`
shim assets. Existing `deepseek update` invocations on v0.8.40 keep working;
they land you on the deprecation shim, which then prompts the install of
`codewhale`.
A second checksum manifest, `deepseek-artifacts-sha256.txt`, is attached as
an alias of `codewhale-artifacts-sha256.txt` so v0.8.40's hardcoded lookup
still verifies.
`v0.8.41` through `v0.8.x` Releases attached both the canonical
`codewhale-*` / `codewhale-tui-*` assets and compatibility-only
`deepseek-*` / `deepseek-tui-*` shim assets. Starting in v0.9.0, Releases attach
only the canonical `codewhale-*` / `codewhale-tui-*` assets and the canonical
`codewhale-artifacts-sha256.txt` checksum manifest. Install or update through
`codewhale` before moving to v0.9.0.
### Sessions, skills, and manual workspaces
+8
@@ -6,6 +6,11 @@ Step through this in order from a clean worktree on the release branch
For deeper context on the underlying tools (preflight scripts, npm smoke,
publish-crates), see [`RELEASE_RUNBOOK.md`](RELEASE_RUNBOOK.md).
For v0.9.0, also complete the dedicated
[`V0_9_0_RELEASE_ACCEPTANCE.md`](V0_9_0_RELEASE_ACCEPTANCE.md) matrix before
tagging; it covers provider routes, WhaleFlow feature gates, GUI/runtime smoke,
remote workbench decisions, and credit hygiene that the generic checklist does
not enumerate.
## 1. CHANGELOG entry exists for the version
@@ -39,6 +44,9 @@ publish-crates), see [`RELEASE_RUNBOOK.md`](RELEASE_RUNBOOK.md).
- [ ] `Cargo.lock` is refreshed (`cargo update --workspace --offline`).
- [ ] `./scripts/release/check-versions.sh` reports
`Version state OK: workspace=X.Y.Z, npm=X.Y.Z, lockfile in sync.`
- [ ] `./scripts/release/check-ohos-deps.sh` reports that the OpenHarmony
target graph does not pull the unsupported `nix` 0.28/0.29,
`portable-pty`, `starlark`, `arboard`, or `keyring` crates.
## 3. Preflight gates
+21 -6
@@ -25,6 +25,7 @@ Current packaging note:
- `codewhale-core`
- `codewhale-app-server`
- `codewhale-tui-core`
- `codewhale-whaleflow`
## Version Coordination
@@ -119,20 +120,22 @@ configured.
`main` and letting `auto-tag.yml` create the tag — see the npm wrapper
release section below for the `RELEASE_TAG_PAT` requirement).
4. Publish crates in this order with `./scripts/release/publish-crates.sh publish`:
- `codewhale-secrets`
- `codewhale-config`
- `codewhale-mcp`
- `codewhale-protocol`
- `codewhale-release`
- `codewhale-secrets`
- `codewhale-state`
- `codewhale-agent`
- `codewhale-tui-core`
- `codewhale-whaleflow`
- `codewhale-execpolicy`
- `codewhale-hooks`
- `codewhale-mcp`
- `codewhale-tools`
- `codewhale-config`
- `codewhale-agent`
- `codewhale-tui`
- `codewhale-core`
- `codewhale-app-server`
- `codewhale-tui-core`
- `codewhale-cli`
- `codewhale-tui`
5. Wait for each published crate version to appear on crates.io before publishing dependents.
The publish helper is idempotent for reruns: already-published crate versions are skipped.
@@ -202,6 +205,18 @@ remote add cnb …`, then `git push cnb vX.Y.Z`).
## Recovery and Rollback
- User-facing rollback:
- npm: `npm install -g codewhale@X.Y.Z`
- Cargo: `cargo install codewhale-cli --version X.Y.Z --locked --force`
and `cargo install codewhale-tui --version X.Y.Z --locked --force`
- manual assets: download binaries or the platform archive plus the matching
`codewhale-artifacts-sha256.txt` or `codewhale-bundles-sha256.txt`
manifest from `https://github.com/Hmbown/CodeWhale/releases/tag/vX.Y.Z`
- workspace files: use `/restore list [N]` and `/restore <N>` for side-git
snapshots; this does not change the installed binary version or rewrite
conversation history
- keep [docs/INSTALL.md](INSTALL.md#roll-back-to-a-previous-release) in sync
with these commands
- Crates publish partially:
- rerun `./scripts/release/publish-crates.sh publish`
- already-published crate versions will be skipped
+52
@@ -178,6 +178,36 @@ fronting layer.
- `POST /v1/threads/{id}/resume`
- `POST /v1/threads/{id}/fork`
`GET /v1/threads/summary` is the read-only summary surface used by the VS Code
Agent View. Each item includes `id`, `title`, `preview`, `model`, `mode`,
`archived`, `updated_at`, `latest_turn_id`, `latest_turn_status`, plus
workspace metadata:
```json
{
  "id": "thread_...",
  "title": "Implement MCP status count",
  "preview": "The TUI footer should count project MCP servers...",
  "model": "deepseek-v4-pro",
  "mode": "agent",
  "branch": "feature/runtime-api",
  "head": "abc1234",
  "dirty": false,
  "workspace": "/Users/you/projects/codewhale",
  "archived": false,
  "updated_at": "2026-06-06T05:43:00Z",
  "latest_turn_id": "turn_...",
  "latest_turn_status": "completed"
}
```
`branch` is resolved from the thread workspace at request time and may be
`null` when the workspace is not a Git repository or the branch cannot be read.
`head` is the current short Git commit for that workspace when available.
`dirty` is true when the workspace has staged, unstaged, or untracked changes.
`workspace` is included so editor clients can show when an agent lane is working
outside the current VS Code folder.
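
An editor client might turn one summary item into a lane label along these lines. This is a sketch under assumptions: `lane_label` is a hypothetical helper, POSIX-style paths are assumed, and `head` is treated as possibly missing in the same way as `branch`.

```python
from pathlib import PurePosixPath


def lane_label(summary, current_folder):
    """Build a short label and flag lanes working outside the open folder."""
    branch = summary.get("branch") or "no branch"   # branch may be null
    head = summary.get("head") or "unknown"
    dirty = "*" if summary.get("dirty") else ""
    workspace = PurePosixPath(summary["workspace"])
    folder = PurePosixPath(current_folder)
    # Outside means the workspace is neither the folder nor nested under it.
    outside = workspace != folder and folder not in workspace.parents
    suffix = f" (outside: {workspace})" if outside else ""
    return f"{summary['title']} [{branch}@{head}{dirty}]{suffix}"
```

The helper only reads the summary; it matches the read-only intent of `GET /v1/threads/summary`.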
Thread forks are sibling runtime threads, not an in-place tree projection.
`thread.forked` events include `source_thread_id`; internal backtrack-aware
forks may also include `backtrack_depth_from_tail` and `dropped_turn_id`.
@@ -219,6 +249,28 @@ accept an empty string to clear a previously-set value. Added in v0.8.10 (#562):
**Events** (SSE replay + live stream)
- `GET /v1/threads/{id}/events?since_seq=<u64>`
**Snapshots** (read-only side-git restore point listing)
- `GET /v1/snapshots?limit=20`
`/v1/snapshots` lists recent side-git restore points for the runtime workspace.
It is read-only and does not restore files. `limit` defaults to `20` and must be
between `1` and `100`.
```json
[
  {
    "id": "snap_...",
    "label": "post-turn:1",
    "timestamp": 1780730580
  }
]
```
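
A client can enforce the documented `limit` range before sending the request. The sketch below only builds the URL; `snapshots_url` is a hypothetical helper, and the base URL passed in is whatever address your runtime API listens on.

```python
from urllib.parse import urlencode


def snapshots_url(base_url, limit=20):
    """Build the read-only snapshot listing URL, validating limit locally."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 100:
        raise ValueError("limit must be an integer between 1 and 100")
    return f"{base_url.rstrip('/')}/v1/snapshots?{urlencode({'limit': limit})}"
```

Validating locally keeps an out-of-range value from ever reaching the server, whose handling of such values is not specified here.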
Runtime API restore/retry/undo/editor-apply mutation endpoints are intentionally
deferred. GUI clients should treat thread summaries and snapshots as inspection
surfaces until atomic filesystem + conversation-state mutation semantics are
specified and tested.
**Receipts** (future read-only audit export)
- Proposed only: `GET /v1/threads/{thread_id}/turns/{turn_id}/receipt`
+13
@@ -18,6 +18,19 @@ The `type` field on `agent_open` selects a system-prompt posture for the child
(`agent_type` is accepted as a compatibility alias). Each role is a distinct
stance toward the work — not just a different label.
## Maintainer posture
Sub-agents help CodeWhale move faster, but the parent agent still owns the
maintainer decision. Use children to gather evidence, review patches, and run
verification while keeping the community posture in
[`AGENT_ETHOS.md`](AGENT_ETHOS.md): issues are open intake, PR gates are
review-load controls, and harvested work needs clear contributor credit.
When a child reviews community work, the parent should still inspect the PR
diff, linked issues, tests, and CI before merging, harvesting, closing, or
deferring it. A sub-agent's result is a working set, not a substitute for
stewardship.
| Role | Stance | Writes? | Shell posture | Typical use |
|---------------|----------------------------------------|---------|---------------|----------------------------------------------|
| `general` | flexible; do whatever the parent says | yes | yes | the default; multi-step tasks |
+36 -1
@@ -110,9 +110,24 @@ to the model, such as `mcp_<server>_<tool>`.
| `task_cancel` | Cancel a queued or running durable task. Approval-required. |
| `checklist_write` | Granular progress under the active thread/task. Checklist state is subordinate to the durable task. |
| `checklist_add` / `checklist_update` / `checklist_list` | Single-item checklist operations. |
| `todo_write` / `todo_add` / `todo_update` / `todo_list` | Compatibility aliases for the checklist tools. Existing sessions keep working, but new prompts should use `checklist_*`. |
| `note` | One-off important fact for later. |
The legacy `todo_write`, `todo_add`, `todo_update`, and `todo_list` names are
hidden compatibility aliases for saved transcript replay. They remain callable
by exact name, but they are not part of the model-visible catalog; compatibility
results include `_deprecation.use_instead = checklist_*` and
`_deprecation.removed_in = 0.9.0`.
`update_plan` accepts both the legacy shape (`explanation` plus `plan` steps)
and a richer PlanArtifact shape for Plan mode review. The richer fields are
optional and should be filled only when grounded in evidence: `title`,
`objective`, `context_summary`, `sources_used`, `critical_files`,
`constraints`, `recommended_approach`, `verification_plan`,
`risks_and_unknowns`, and `handoff_packet`. The transcript card, Plan-mode
confirmation prompt, `/relay`, and fork-state handoff all render the same
artifact so a plan can be reviewed, accepted, revised, replayed, or delegated
without losing its source context.
### Verification gates and artifacts
| Tool | Niche |
@@ -228,6 +243,12 @@ Aliases: `/batonpass`, `/接力`.
Use it before a long break, compaction, or moving work to a fresh session. The
relay should preserve the goal, current Work checklist item, changed files,
decisions, verification state, and one concrete next action.
Treat it as the deliberate counterpart to automatic compaction: both exist to
preserve continuity for the next session or sub-agent, but `/relay` lets the
current agent inspect live evidence and choose the durable handoff facts
explicitly. When `update_plan` has a rich PlanArtifact, `/relay` includes that
strategy metadata so manual relay, fork-state, and compacted continuity do not
drift into separate stories.
### Parallel fan-out: cost-class caps
@@ -257,6 +278,20 @@ prompting and tool catalogs. Do not use these names in new active guidance:
The old one-shot `rlm` model-facing tool is also replaced by persistent
`rlm_open` / `rlm_eval` / `rlm_configure` / `rlm_close` sessions.
v0.9.0 adds the following hidden-compat aliases (#2682, #2683):
| Hidden alias | Canonical replacement | Status |
|---|---|---|
| `todo_write` | `checklist_write` | Hidden, returns `_deprecation` metadata |
| `todo_add` | `checklist_add` | Hidden, returns `_deprecation` metadata |
| `todo_update` | `checklist_update` | Hidden, returns `_deprecation` metadata |
| `todo_list` | `checklist_list` | Hidden, returns `_deprecation` metadata |
| `exec_wait` | `exec_shell_wait` | Hidden, callable for replay |
| `exec_interact` | `exec_shell_interact` | Hidden, callable for replay |
All hidden aliases remain registered and callable so saved transcripts can
replay without teaching new sessions the deprecated spelling.
Historical compatibility results may include a `_deprecation` block shaped
like this:
+98
@@ -0,0 +1,98 @@
# v0.9.0 Release Acceptance Matrix
This matrix is the pre-tag gate for v0.9.0. Do not tag or publish v0.9.0 until
each row is checked off or has an explicit defer decision with an owner.
For every manual smoke, record the date, OS, provider/model, command, redacted
config source, result, and follow-up issue or PR.
## Core Build And Packaging
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| `cargo fmt --all -- --check` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
| `cargo check --workspace --all-targets --locked` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
| `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
| `cargo test --workspace --all-features --locked` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df` (`4254 passed, 0 failed, 4 ignored` in `codewhale-tui`; package integration and doctest suites also passed). An earlier full run hit one transient localhost SSE reset in `mcp::tests::legacy_sse_closed_stream_reconnects_and_retries_tool_call`; the exact test passed serially before the full rerun. |
| `./scripts/release/check-versions.sh` | release steward | ship | Passed locally during #2845 (`e22a7da53`) and remains part of the PR-local release gate for each stewardship slice. |
| `./scripts/release/check-ohos-deps.sh` | release steward | ship | Passed locally during #2845 (`e22a7da53`); OHOS dependency graph stayed compatible for `codewhale-tui` on `aarch64-unknown-linux-ohos`. |
| `./scripts/release/publish-crates.sh dry-run` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. The script performed full `cargo publish --dry-run` for crates without unpublished workspace dependencies and package-content verification for dependent workspace crates; expected 0.8.53 already-published warnings were observed. |
| `node scripts/release/npm-wrapper-smoke.js` after release build | release steward | ship | Passed locally on 2026-06-06 at `2561a54df` after `cargo build --release --locked -p codewhale-cli -p codewhale-tui`. The harness packed `codewhale-0.8.53.tgz`, served local release assets, and verified `npx --no-install codewhale doctor --help` plus `npx --no-install codewhale-tui --help`. |
| GitHub release asset verification before npm publish | release steward | post-tag/pre-npm gate | The live v0.9.0 GitHub Release does not exist yet. After tagging and before `npm publish`, verify the Release contains the expected platform archives, individual binaries, Windows installer/portable assets, `codewhale-artifacts-sha256.txt`, and `codewhale-bundles-sha256.txt`; `npm/codewhale/scripts/verify-release-assets.js` remains the npm prepublish asset guard. |
## Provider, Model, And Auth
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| DeepSeek V4 direct provider smoke | provider steward | ship | Passed locally on 2026-06-06 at `7bd68279e` using macOS 26.1 arm64 release binary: `./target/release/codewhale --provider deepseek --model deepseek-v4-flash exec "Reply exactly CODEWHALE_V09_SMOKE_OK and nothing else."` returned `CODEWHALE_V09_SMOKE_OK`. Redacted auth source: `codewhale auth status --provider deepseek` reported config-backed DeepSeek API key present, env unset, with no secret value printed. |
| Xiaomi MiMo token-plan and pay-as-you-go config smoke | provider steward | ship config evidence / require live smoke before tag if claiming provider availability | Config coverage exercises token-plan and pay-as-you-go env behavior in `crates/config/src/lib.rs` (`xiaomi_mimo_env_token_plan_mode_uses_token_plan_key_and_endpoint`, `xiaomi_mimo_env_pay_as_you_go_mode_prefers_standard_key`) and mirrors the TUI config path in `crates/tui/src/config.rs`; `docs/PROVIDERS.md` documents Token Plan regions and pay-as-you-go mode. This is config evidence only, not a live Xiaomi call. |
| Arcee Trinity Thinking route smoke or explicit defer | provider steward | defer live smoke / ship static route metadata | Static provider/model metadata exists in `docs/PROVIDERS.md`, `crates/agent/src/lib.rs`, and `crates/tui/src/config.rs`, but no live Arcee credential smoke has been recorded. Do not claim live Arcee route readiness in v0.9 release notes unless a dated manual smoke is added. |
| Hugging Face provider route and MCP concept helpers ship; native Hub search/passports are deferred | model-lab steward | ship foundation / defer native search-passport runtime | `ProviderKind::Huggingface`, env aliases, picker/docs, and `/hf concepts` / `/hf mcp status` distinguish the chat provider route from Hugging Face MCP and explicit Hub tooling. `docs/PROVIDERS.md` states native Hub HTTP search/passport picker metadata are not shipped behavior in this checkout; #2705/#2707/#2712 remain open for native Model Lab work. |
| OpenRouter, Novita, Fireworks, and Volcengine env behavior smoke | provider steward | ship config evidence / require live smoke before claiming live route coverage | Env/config tests cover OpenRouter, Novita, Fireworks, and Volcengine key/base-url/model override behavior in `crates/config/src/lib.rs`; TUI provider defaults and Volcengine env override are covered in `crates/tui/src/config.rs`, and `docs/PROVIDERS.md` documents the env/default behavior. This is env behavior evidence only, not live provider traffic. |
| Provider registry drift check covers aliases/default env keys | provider steward | ship | #2820 (`5d491bc68`) added the metadata-only provider registry and `scripts/check-provider-registry.py`; verification included `python3 scripts/check-provider-registry.py` and `cargo test -p codewhale-config provider_ -- --nocapture`. |
| Provider-scoped TLS skip-verify remains default-off and doctor-visible | security steward | ship | #2834 (`190e9f35e`, `6269cb91f`) landed provider-scoped TLS skip verify with default-off config, doctor warnings, docs, and CLI/runtime option tests. |
## Runtime Stability
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| Windows input/render smoke or documented manual verification | runtime steward | manual smoke required before tag | No dated Windows input/render smoke has been recorded on this matrix yet. Unit/shell-dispatcher tests are not a substitute for Windows ConPTY/manual input verification. |
| macOS and Linux TUI startup smoke | runtime steward | ship | macOS 26.1 arm64 evidence from 2026-06-06: release binaries built from the stewardship line reported `codewhale-tui 0.8.53 (2561a54df0ed)` and `codewhale 0.8.53 (2561a54df0ed)`, and `cargo test -p codewhale-tui --test qa_pty --locked` passed 6/6 startup/composer/keystroke PTY scenarios. Linux evidence from 2026-06-06: a streamed source archive built inside a Debian Bookworm arm64 `rust:1.88-bookworm` container with `libdbus-1-dev` / `pkg-config`; `cargo build --release --locked -p codewhale-cli -p codewhale-tui` passed and `./target/release/codewhale --version` / `./target/release/codewhale-tui --version` both ran successfully. |
| Large-repo startup smoke | runtime steward | defer full smoke / ship bounded-context mitigation evidence | Bounded project-context tests and changelog evidence cover the mitigation slice, but live large-workspace reports #697 and #1827 remain open. Do not close those issues or claim a full large-repo startup smoke without a dated manual run. |
| Sub-agent timeout/completion smoke | subagent steward | ship timeout/completion slice | `docs/SUBAGENTS.md` documents per-step timeout and heartbeat behavior; `crates/tui/src/tools/subagent/tests.rs` covers `api_timeout_preserves_checkpoint_and_agent_eval_continues_from_it`, parent completion ordering, and timeout propagation. Broader hung-agent issues #1806/#2614 remain open. |
| Long-running command live-state smoke | runtime steward | defer root-cause live-state smoke / ship shell-routing tests | Shell tests cover timeout/background/wait/cancel behavior and `shell_job_routing.rs` distinguishes live from stale process state, but #1786 remains open for shell PID/task-flow hangs and premature LIVE-state exit. |
| Runtime API remains token-protected for GUI clients | GUI steward | ship | #2811/#2814 documented and consumed the existing runtime token flow from the official VS Code extension; #2822 (`bb8835812`) added `GET /v1/snapshots` behind the same runtime API token middleware. |
| Snapshot/restore surfaces are read-only unless mutation semantics are tested | GUI steward | ship | #2822 (`bb8835812`) and #2828 (`293643e27`) expose restore points as read-only listing/Agent View metadata only; #2808 restore/retry/patch-undo mutation endpoints remain unmerged pending atomicity tests. |
## UI And Workflow UX
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| First-look screen included or explicitly deferred | UX steward | defer v0.9 redesign / keep existing onboarding | The existing onboarding welcome remains covered by `first_run_user_always_starts_at_welcome`; the opinionated v0.9 first-look/home redesign remains deferred to #2713 so release notes should not imply a new home screen. |
| Slash picker readability smoke | UX steward | ship | Focused slash-menu coverage exercises visibility/hide state, removed-command filtering, Up/Down wrap behavior, argument spacing, skill command insertion, inline skill mentions, Esc priority, and locked composer height while match counts change. Verification: `cargo test -p codewhale-tui slash_menu --locked`, `cargo test -p codewhale-tui try_autocomplete_slash_command_completes_skill_argument --locked`, and `cargo test -p codewhale-tui next_escape_action_slash_menu_takes_priority --locked`. |
| Transcript tool-collapse smoke or explicit defer | UX steward | ship | #2776 (`c76ec4752`) landed dense successful tool-run collapse with guardrails for failed/running/shell/patch/review/diff cells; focused widget coverage includes `chat_widget_collapses_dense_tool_runs_by_default`, `chat_widget_expands_dense_tool_runs_on_demand`, and `chat_widget_expanded_mode_leaves_dense_tool_runs_visible`. |
| Sidebar detail popovers smoke or explicit defer | UX steward | ship | #2778 (`3cb49233e`) added row-level hover metadata and wrapping detail popovers for truncated Work/Tasks/Agents rows; #2806 (`19f5c7aa6`) preserved current sub-agent progress in the sidebar hover text. Focused coverage includes `sidebar_hover_rows_mark_source_text_diff_as_truncated` and `subagent_hover_text_preserves_full_agent_id_and_progress`. |
| Plan review/handoff artifact smoke | Plan steward | ship | #2770 (`7ac8063b6`) added rich PlanArtifact sections through the transcript/Plan prompt path; focused coverage includes `plan_update_cell_renders_rich_artifact_metadata` and `plan_prompt_renders_rich_plan_artifact_sections`. |
| VS Code Agent View branch/workspace visibility smoke | GUI steward | ship | #2825 (`1bacaf763`) added `workspace` / `branch` metadata to `/v1/threads/summary`; #2832 (`50b773f1d`) added read-only auto-refresh so branch/workspace changes can appear without manual refresh. The current stewardship slice extends the same read-only metadata with current Git `head` and `dirty` worktree state for editor/agent-lane visibility. |
## v0.9.0 Feature Gates
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| WhaleFlow typed IR, mock executor, replay, TeacherReview, StudentReplay, and cutline docs are tested | WhaleFlow steward | ship | #2821/#2824/#2831/#2833/#2839/#2840/#2841 plus focused local `cargo test -p codewhale-whaleflow --locked`; #2670 closed after `cargo test -p codewhale-whaleflow starlark --locked` passed 7/7 on current stewardship head. The `rlm_cache_change.star` dogfood workflow now has recorded mock-trace replay coverage, including a missing-record divergence check. |
| Live `workflow_run`, worktree application, provider calls, and TraceStore writes are deferred until cancellation/replay/atomicity semantics pass | WhaleFlow steward | defer | #2669 and #2679 remain open for live runtime execution, provider calls, TraceStore writes, Arcee/student replay, and CLI/TUI workflow mode; current v0.9 branch ships mock executor/replay foundations only. |
| Model Lab / Hugging Face MVP is included or deferred with release-note wording | model-lab steward | ship provider/MCP docs foundation / defer native Model Lab MVP | v0.9 ships the Hugging Face chat-provider route, provider docs, and `/hf` concept/MCP status helpers only. Native Hub search, model passports, Spaces/Jobs workflows, and Model Lab eval/export surfaces remain deferred to #2705/#2707/#2710/#2712/#2727. |
| HarnessProfile runtime MVP is deferred; schema/resolver foundation ships with release-note wording | harness steward | ship foundation / defer runtime | #2844 (`efbcc681a`) documents the cutline; `HarnessPosture` / `HarnessProfile` config schema and strict validation are present; a pure resolver matches provider/model routes without changing runtime behavior; seed-profile runtime selection, telemetry, and status display remain follow-up work. |
| `codebase_search` MVP is included or deferred with release-note wording | search steward | defer runtime / ship design doc | `docs/CODEBASE_SEARCH_DESIGN.md` is explicitly doc-only and says no catalog code ships in this cycle; runtime tool registration, index/eval fixtures, and search implementation remain deferred to #2680. |
| External memory remains explicit/optional per `WHALEFLOW_EXTERNAL_MEMORY.md` | memory steward | ship | #2842 (`a7052751e`) added the external-memory cutline: optional/explicit workflow node/plugin only, visible state/owner/storage/scope, and no hidden default context substrate. |
## Remote Workbench
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| Remote workbench is marked included, experimental, or deferred | remote steward | defer runtime / ship setup docs only | `docs/REMOTE_VM_US.md`, `docs/REMOTE_SETUP_DESIGN.md`, and `docs/TENCENT_LIGHTHOUSE_HK.md` document possible VM/Telegram/Lark setup patterns, but no v0.9 remote workbench runtime is included. |
| If included: VM install smoke passes | remote steward | defer | Not applicable while remote workbench runtime is deferred; no v0.9 VM install smoke is required before tagging. |
| If included: Telegram bridge smoke passes | remote steward | defer | Not applicable while remote workbench runtime is deferred; Telegram bridge docs remain design/setup guidance only. |
| If deferred: release notes avoid implying remote workbench availability | remote steward | ship | Acceptance matrix and changelog wording must say setup/design docs only, not a shipped remote workbench feature. |
## Docs, Migration, And Rollback
| Gate | Owner | Ship/defer decision | Evidence |
| --- | --- | --- | --- |
| README, configuration docs, provider docs, and changelog agree | docs steward | ship | #2845 (`e22a7da53`) aligned README/config example/changelogs with the HarnessProfile cutline and removed stale `V0_9_0_EXECUTION_MAP` links. |
| Breaking changes, deprecations, and deferred v0.9 gates are listed in release notes | release steward | ship | Changelog and this matrix list deferred Model Lab/Hugging Face native Hub work, `codebase_search`, remote workbench runtime, WhaleFlow live runtime execution, HarnessProfile runtime selection, large-repo startup smoke, long-running command live-state smoke, and Arcee live smoke. `.github/workflows/release.yml` release-body text avoids stale v0.8.x-only shim wording and keeps CodeWhale as the canonical package/asset name. |
| Upgrade steps exist for users coming from `deepseek-tui` | docs steward | ship | `docs/REBRAND.md` documents npm/Cargo migration commands, legacy state fallback, binary/package/asset naming, and the v0.9.0 compatibility cutline. |
| Rollback steps exist for npm wrapper, Cargo install, and side-git restore | release steward | ship | `docs/INSTALL.md#roll-back-to-a-previous-release` and `docs/RELEASE_RUNBOOK.md#recovery-and-rollback` document pinned npm rollback, pinned Cargo rollback for both crates, exact-tag manual asset restore with checksums, and side-git `/restore list [N]` / `/restore <N>` workspace rollback. |
| Live GitHub Release body has its own contributor/credit section | release steward | post-tag/pre-npm gate | `.github/workflows/release.yml` now creates a dedicated `## Contributors` release-body section with v0.9 contributor, reporter, helper, and harvested-PR credits. The live v0.9.0 Release does not exist yet, so this remains a release-time verification gate before npm publish or completion. |
| Contributors/reporters/helpers from harvested PRs and linked issues are credited | release steward | ship local changelog / verify live body at release time | Changelog credits include harvested PR authors, issue reporters/helpers, and external/co-authored work including @Implementist, @jrcjrcc, and @punkcanyang. `python3 scripts/check-coauthor-trailers.py --author-map .github/AUTHOR_MAP --range origin/main..HEAD --check-authors` remains the local co-author-map gate; live release-body credits are covered by the row above. |
## Before Tagging
- [ ] Every `ship` row has evidence.
- [ ] Every `decide` row is changed to either `ship` with evidence or `defer`
with an owner and linked follow-up.
- [ ] Every `manual smoke required` row has dated smoke evidence, or is changed
to an explicit defer decision with an owner and linked follow-up.
- [ ] Draft integration PR CI is green on the exact commit that will be tagged.
- [ ] The release prompt points new agents to this matrix before any tag,
publish, or GitHub Release action.
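The first three checklist items can be approximated mechanically. The following is a minimal sketch, assuming each matrix row is a four-cell Markdown table line (`Gate | Owner | decision | Evidence`) with no literal `|` inside a cell. The function name `tagging_blockers` is hypothetical and not part of the repository; a row flagged here still needs human review, since "dated smoke evidence" cannot be judged by string matching.

```rust
/// Hypothetical helper: scan acceptance-matrix rows and list the ones that
/// would block tagging under the checklist above.
/// Assumes four cells per row and no `|` characters inside a cell.
fn tagging_blockers(matrix: &str) -> Vec<String> {
    let mut blockers = Vec::new();
    for line in matrix.lines() {
        let row = line.trim();
        // Only consider lines shaped like `| ... |`.
        let Some(inner) = row.strip_prefix('|').and_then(|r| r.strip_suffix('|')) else {
            continue;
        };
        let cells: Vec<&str> = inner.split('|').map(str::trim).collect();
        // Skip malformed rows, the header row, and the `---` separator row.
        if cells.len() != 4 || cells[0] == "Gate" || cells[0].starts_with("---") {
            continue;
        }
        let (gate, decision, evidence) = (cells[0], cells[2], cells[3]);
        if decision == "decide" {
            blockers.push(format!("{gate}: still `decide`"));
        } else if decision.starts_with("manual smoke required") {
            blockers.push(format!("{gate}: needs dated smoke evidence or a defer decision"));
        } else if decision == "ship" && evidence.is_empty() {
            blockers.push(format!("{gate}: `ship` without evidence"));
        }
    }
    blockers
}
```

An empty result only means no row tripped these string checks; the CI and release-prompt items still have to be verified by hand.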
+3 -1
@@ -280,7 +280,9 @@ clearable, and scoped**:
`finalize`/`FINAL` is an *in-kernel Python function*, not a tool).
6. **Cached-main overlay** — promoted lessons from the cached main branch
(`/overlay`, §9).
7. **External memory (Aleph)** — large local data via the `aleph` skill.
7. **External memory (Aleph)** — large local data via the `aleph` skill;
see `docs/WHALEFLOW_EXTERNAL_MEMORY.md` for the v0.9.0 cutline that keeps
this optional, explicit, inspectable, and out of the default path.
**Why it helps weaker models.** The model never has to *guess* where a fact
should live or *re-derive* context it already established. Each layer has a
+72
@@ -0,0 +1,72 @@
# WhaleFlow External Memory Cutline
This note resolves the v0.9.0 cutline for Aleph-style external memory in
WhaleFlow. It is a design boundary, not a runtime implementation.
## Decision
External memory should be optional and explicit for v0.9.0. Normal CodeWhale
operation must not depend on it, and WhaleFlow must not silently enable it for
long-running runs.
For v0.9.0, external memory can appear only as:
- an explicit workflow node whose inputs, outputs, scope, and permissions are
visible in the typed WhaleFlow IR;
- an optional plugin or skill-backed tool that the user enables deliberately;
- a documented experiment whose state can be inspected, cleared, and exported.
It should not be a hidden context substrate, a replacement for repo search, or a
default backing store for every workflow run.
## Layer Boundaries
External memory is separate from the existing memory and replay layers:
| Layer | Scope | v0.9.0 rule |
| --- | --- | --- |
| User memory | Small durable user preferences and facts surfaced by `/memory` | Opt-in, user-owned, not workflow evidence |
| Repo search / codemap | Derived repo structure and search results | Rebuildable from the workspace; not a memory log |
| ARMH/RLM memo | In-session working memory and exact-context memoization | Visible hit/miss telemetry; not durable replay evidence |
| TraceStore | Recorded workflow, branch, leaf, and control results | Source of deterministic replay; no live model calls during replay |
| Cached-main overlay | Promoted lessons after review and replay | Inspectable and reversible; never mutates Git main |
| External memory | Large local or plugin-backed data outside normal context | Explicit node/plugin only; visible state and clear/export required |
## Visibility Requirements
Any future external-memory implementation must show:
- when it is active;
- which workflow node or plugin owns it;
- where its state is stored;
- what repo or run scope it can read;
- whether it is included in replay, export, or promotion evidence;
- how to inspect, clear, pin, and export it.
The UI should treat this like an active context layer, not like invisible model
intuition. If a run cannot explain why a fact came from external memory, the
feature is not ready for default use.
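One way to make the visibility list checkable is a status record whose unanswered fields can be enumerated. This is an illustrative sketch only: `ExternalMemoryStatus`, its fields, and `visibility_gaps` are hypothetical names, and no such type ships in v0.9.0.

```rust
/// Hypothetical status record for an external-memory layer.
/// Each field mirrors one visibility requirement from the list above.
struct ExternalMemoryStatus {
    /// Whether the layer is active for the current run.
    active: bool,
    /// Workflow node or plugin that owns the layer.
    owner: Option<String>,
    /// Where the state is stored.
    storage: Option<String>,
    /// Repo or run scope the layer can read.
    read_scope: Option<String>,
    /// Whether it is included in replay, export, or promotion evidence.
    evidence_policy: Option<String>,
    /// Whether inspect, clear, pin, and export are all available.
    supports_inspect_clear_pin_export: bool,
}

/// Returns the requirements an active layer cannot yet answer.
/// An inactive layer has nothing to explain, so it reports no gaps.
fn visibility_gaps(status: &ExternalMemoryStatus) -> Vec<&'static str> {
    if !status.active {
        return Vec::new();
    }
    let mut gaps = Vec::new();
    if status.owner.is_none() {
        gaps.push("owner");
    }
    if status.storage.is_none() {
        gaps.push("storage");
    }
    if status.read_scope.is_none() {
        gaps.push("read_scope");
    }
    if status.evidence_policy.is_none() {
        gaps.push("evidence_policy");
    }
    if !status.supports_inspect_clear_pin_export {
        gaps.push("inspect/clear/pin/export");
    }
    gaps
}
```

Under this sketch, a UI could refuse to render the layer as ready while `visibility_gaps` returns anything.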
## Permissions And Privacy
External memory must inherit the strictest relevant scope:
- it must not cross repo/workspace boundaries without explicit approval;
- project-local config must not silently enable broad external-memory reads;
- replay must record external-memory inputs as evidence or mark replay as
unavailable/diverged;
- exports must make external-memory references visible without dumping private
raw state by default.
## Deferred Work
The following remain out of scope for the v0.9.0 cutline:
- default-on Aleph-style memory for all WhaleFlow runs;
- automatic promotion from external memory into cached-main overlay;
- hidden retrieval behind ordinary prompts;
- hosted or shared external-memory services;
- treating external memory as a substitute for TraceStore replay.
Future implementation should start with a read-only typed workflow node and a
mock replay fixture before adding any plugin-backed or live retrieval path.
+10 -1
@@ -64,6 +64,13 @@ Non-goals:
- no blocking of user input
- no transcript mutation from `turn_end`
Implementation note for the v0.9 branch: the narrow #2578 harvest uses the
shared structured observer path introduced for sub-agent lifecycle hooks. It
fires before queued follow-up dispatch, after queue-recovery state is known, so
the payload can report the queued-message count without letting a hook change
what gets sent next. Stdout is ignored for `turn_end`; only `message_submit`
has a stdout mutation contract.
### PR 3: Subagent lifecycle observer hooks
Expose subagent start and completion as observer-only hook events.
@@ -251,7 +258,9 @@ transcript content in the first version.
- Existing observer-only hooks keep working.
- Existing env vars remain available.
- `shell_env` keeps its existing stdout `KEY=VALUE` contract.
- Structured stdout is interpreted only by `message_submit` in PR 1.
- Structured stdout is interpreted only by `message_submit` in PR 1. Structured
observer hooks such as `turn_end`, `subagent_spawn`, and `subagent_complete`
receive JSON on stdin, but their stdout is ignored by the caller.
## 6. Review checkpoints
+167
@@ -0,0 +1,167 @@
# RFC: Provider Fallback Chain
**Issue:** #2574
**Reporter:** @hsdbeebou
**Design source:** #2581 by @idling11
**Status:** Draft for the v0.9 provider-routing lane
**Date:** 2026-06-04
## Problem
CodeWhale can store credentials and defaults for several providers, but a
running session uses one active provider route at a time. When that provider
hits a rate limit, temporary outage, or transport failure, the user must notice
the failure, run `/provider`, choose another route, and resubmit the turn.
That manual switch is especially disruptive during long-running agentic work.
A provider fallback chain can keep work moving, but it also changes billing
source, model behavior, tool support, context-window limits, and vendor
expectations. The design must make that switch explicit and capability-aware.
## Principles
- Fallback is opt-in. No provider switch happens unless the user configured a
fallback chain.
- Billing and vendor changes are visible in the transcript and status UI.
- Normal retry policy runs before fallback.
- Fallback is allowed only before assistant content or tool calls have started
streaming for the failing request.
- Fallback candidates must support the request shape for the current turn.
- Authentication, authorization, malformed request, and model-not-found errors
do not silently switch providers by default.
## Proposed Config Shape
Keep the existing root `provider = "..."` setting as the primary route. Add an
ordered fallback list and a small policy section:
```toml
provider = "nvidia-nim"
fallback_providers = ["deepseek", "openrouter"]
[provider_fallback]
enabled = true
reset_on_new_session = true
```
Rules:
- `fallback_providers` is ordered and contains provider IDs already accepted by
the provider parser.
- The primary provider is not repeated in the fallback list.
- Duplicate fallback providers are rejected.
- Missing credentials produce a startup warning and make that fallback entry
inactive until credentials appear.
- If `provider_fallback.enabled` is absent, the presence of a non-empty
`fallback_providers` list enables fallback.
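The rules above can be sketched as a pure validation step. This is a minimal illustration under stated assumptions: `FallbackChain`, `validate_fallback_chain`, and `fallback_enabled` are hypothetical names, the `known` slice stands in for the real provider parser, and the `has_credentials` closure stands in for the real credential lookup.

```rust
use std::collections::HashSet;

/// Hypothetical result of validating `fallback_providers`.
#[derive(Debug, PartialEq)]
struct FallbackChain {
    /// Entries with credentials, in configured order.
    active: Vec<String>,
    /// One startup warning per entry that is inactive for now.
    warnings: Vec<String>,
}

/// Applies the rules above: known IDs only, no primary self-reference,
/// no duplicates, and missing credentials downgrade an entry to inactive.
fn validate_fallback_chain(
    primary: &str,
    fallbacks: &[&str],
    known: &[&str],
    has_credentials: impl Fn(&str) -> bool,
) -> Result<FallbackChain, String> {
    let mut seen = HashSet::new();
    let mut chain = FallbackChain { active: Vec::new(), warnings: Vec::new() };
    for &id in fallbacks {
        if !known.contains(&id) {
            return Err(format!("unknown fallback provider `{id}`"));
        }
        if id == primary {
            return Err(format!("primary provider `{id}` repeated in fallback_providers"));
        }
        if !seen.insert(id) {
            return Err(format!("duplicate fallback provider `{id}`"));
        }
        if has_credentials(id) {
            chain.active.push(id.to_string());
        } else {
            chain.warnings.push(format!("fallback `{id}` inactive: missing credentials"));
        }
    }
    Ok(chain)
}

/// An absent `provider_fallback.enabled` defers to whether the list is non-empty.
fn fallback_enabled(enabled: Option<bool>, fallbacks: &[&str]) -> bool {
    enabled.unwrap_or(!fallbacks.is_empty())
}
```

Treating missing credentials as a warning rather than an error keeps startup working when a key is added later, which matches the "inactive until credentials appear" rule.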
## Fallback Eligibility
| Failure | Fallback by default? | Notes |
| --- | --- | --- |
| HTTP 429 | Yes | Rate limit or quota exhaustion on the active route. |
| HTTP 502, 503, 504 | Yes | Temporary upstream failure after normal retries. |
| Connect timeout / DNS failure | Yes | Transport path failed before content streamed. |
| HTTP 401 / 403 | No | Usually bad credentials or account permissions. |
| HTTP 400 | No | Usually client request shape or model parameter issue. |
| Model not found | No | Avoid silently switching model families unless a future policy explicitly opts in. |
| Stream interrupted after content | No | The transcript may already contain partial assistant content or tool-call deltas. |
The first implementation should classify errors in one central place and cover
each case in the table with tests before any fallback execution is wired into
the turn loop.
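A central classifier for the table above might look like the following sketch. The `ProviderFailure` enum and `fallback_reason` function are hypothetical, and only the `rate_limit` label appears in this RFC; the other reason labels are placeholders. HTTP statuses not listed in the table are treated as non-eligible here, which is an assumption rather than a stated rule.

```rust
/// Hypothetical failure kinds for a single provider request.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderFailure {
    Http(u16),
    ConnectTimeout,
    DnsFailure,
    ModelNotFound,
    StreamInterrupted,
}

/// Returns a reason label when the failure is fallback-eligible by default,
/// or `None` when the original error should be surfaced instead.
fn fallback_reason(failure: ProviderFailure, content_streamed: bool) -> Option<&'static str> {
    // Never fall back once assistant content or tool-call deltas have streamed.
    if content_streamed {
        return None;
    }
    match failure {
        ProviderFailure::Http(429) => Some("rate_limit"),
        ProviderFailure::Http(502 | 503 | 504) => Some("upstream_unavailable"),
        ProviderFailure::ConnectTimeout | ProviderFailure::DnsFailure => Some("transport"),
        // 400/401/403 and any unlisted status stay on the active provider.
        ProviderFailure::Http(_)
        | ProviderFailure::ModelNotFound
        | ProviderFailure::StreamInterrupted => None,
    }
}
```

Keeping the classifier a pure function makes each table row a one-line unit test.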
## Capability Gate
Before switching to a fallback provider/model, CodeWhale checks that the
candidate can support the current request shape:
| Requirement | Gate |
| --- | --- |
| Tool calls | Candidate provider/model must support tool calling. |
| Reasoning effort | Candidate must support the requested thinking mode, or the switch is blocked. |
| Context size | Candidate context window must fit the estimated current request. |
| Image inputs | Candidate must support vision if the turn includes images. |
| Provider-specific headers | Candidate request must be rebuilt from that provider's own auth/base-url/header rules. |
If no fallback candidate passes the gate, CodeWhale surfaces the original
provider error with a clear "fallback chain exhausted or incompatible" note.
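The gate can be sketched as a comparison between the current request shape and a candidate's capabilities. `RequestShape`, `Capabilities`, and `capability_gate` are hypothetical names. Thinking-mode support is simplified to a boolean here, and the provider-specific header row is not modeled because it describes how the request is rebuilt rather than a yes/no check.

```rust
/// Hypothetical summary of what the current turn needs.
struct RequestShape {
    needs_tools: bool,
    needs_thinking: bool,
    has_images: bool,
    /// Estimated size of the current request, in tokens.
    estimated_tokens: u32,
}

/// Hypothetical capabilities of a candidate provider/model.
struct Capabilities {
    tools: bool,
    thinking: bool,
    vision: bool,
    /// Context window, in tokens.
    context_window: u32,
}

/// Returns `Ok(())` when the candidate can serve the request,
/// or the first unmet requirement as the reason the switch is blocked.
fn capability_gate(req: &RequestShape, cand: &Capabilities) -> Result<(), &'static str> {
    if req.needs_tools && !cand.tools {
        return Err("tool calls unsupported");
    }
    if req.needs_thinking && !cand.thinking {
        return Err("requested thinking mode unsupported");
    }
    if req.estimated_tokens > cand.context_window {
        return Err("request exceeds context window");
    }
    if req.has_images && !cand.vision {
        return Err("image inputs unsupported");
    }
    Ok(())
}
```

Returning the blocking reason, rather than a bare `false`, gives the "incompatible" note something concrete to report.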
## Runtime Behavior
1. Build the request for the active provider.
2. Run existing retry policy for that provider.
3. If retries exhaust with a fallback-eligible failure and no assistant content
has streamed, evaluate the next fallback provider.
4. Rebuild the request with the fallback provider's model, base URL, auth, and
provider-specific headers.
5. Add a visible transcript marker and status event before the fallback request
starts.
6. Continue through the chain until a provider succeeds, the chain is
exhausted, or a non-eligible failure occurs.
Suggested transcript marker:
```text
[provider fallback: nvidia-nim -> deepseek, reason: rate_limit]
```
Suggested status text:
```text
NVIDIA NIM unavailable; switched to DeepSeek fallback
```
For multi-request turns, such as tool-call result follow-ups, fallback can be
considered for a later request only if that later request has not started
streaming assistant content yet. The transcript marker must identify that the
turn changed provider between requests.
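The numbered steps can be sketched as a loop over the primary route followed by its fallbacks. Everything here is hypothetical: `Attempt`, `TurnResult`, and `run_with_fallback` are illustrative names, and the `attempt` closure stands in for "build the request and run that provider's own retry policy". The capability gate is omitted for brevity; a fuller version would skip candidates that fail it.

```rust
/// Hypothetical outcome of one provider attempt, after its own retries.
enum Attempt {
    Success,
    /// Fallback-eligible failure with a reason label; no content streamed.
    Eligible(&'static str),
    /// Non-eligible failure: auth, bad request, or content already streamed.
    Fatal(String),
}

/// Hypothetical record of what happened, suitable for receipts.
#[derive(Debug, PartialEq)]
struct TurnResult {
    /// Provider that actually handled the request, if any.
    selected: Option<String>,
    /// Every provider tried, in order.
    attempted: Vec<String>,
    /// Transcript markers, each emitted before a fallback request starts.
    markers: Vec<String>,
    error: Option<String>,
}

/// `route` is the primary provider followed by the active fallbacks.
fn run_with_fallback(route: &[&str], mut attempt: impl FnMut(&str) -> Attempt) -> TurnResult {
    let mut result = TurnResult { selected: None, attempted: vec![], markers: vec![], error: None };
    // (failed provider, reason) from the previous eligible failure, if any.
    let mut pending: Option<(&str, &'static str)> = None;
    for &provider in route {
        if let Some((from, reason)) = pending.take() {
            result.markers.push(format!(
                "[provider fallback: {from} -> {provider}, reason: {reason}]"
            ));
        }
        result.attempted.push(provider.to_string());
        match attempt(provider) {
            Attempt::Success => {
                result.selected = Some(provider.to_string());
                return result;
            }
            Attempt::Eligible(reason) => pending = Some((provider, reason)),
            Attempt::Fatal(err) => {
                result.error = Some(err);
                return result;
            }
        }
    }
    let reason = pending.map(|(_, r)| r).unwrap_or("no providers configured");
    result.error = Some(format!("{reason} (fallback chain exhausted or incompatible)"));
    result
}
```

Recording `attempted` separately from `selected` is what lets session receipts stay truthful about which provider was tried and which one answered.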
## UI and Commands
- `/provider` should show the primary route and the current fallback position.
- `/provider reset` should return to the primary provider for future requests in
the current session.
- The footer/statusline should surface the concrete provider/model that actually
handled the latest request.
- Session receipts should record both attempted provider and successful
provider so cost and debugging information stay truthful.
## Implementation Slices
1. Config schema and validation:
- parse `fallback_providers` and `[provider_fallback]`
- validate known providers, duplicates, missing credentials, and primary
self-reference
- document the config surface
2. Error classification:
- define fallback-eligible error kinds
- add unit tests for HTTP and transport failures
3. Request-shape capability gate:
- evaluate tool, thinking, context, and image requirements
- add tests for incompatible fallbacks
4. Fallback execution:
- run retries per provider before moving to the next provider
- rebuild auth/base-url/header state for each candidate
- block fallback after partial streaming
5. UI/receipt integration:
- status event
- transcript marker
- `/provider reset`
- receipt fields for attempted and selected provider
## Non-goals
- No automatic cost optimization or weighted provider selection.
- No silent fallback when authentication or permissions fail.
- No fallback after partial assistant content or tool-call deltas have streamed.
- No provider/model capability downgrades without an explicit future policy.
- No sub-agent-specific fallback policy in the first implementation; sub-agents
inherit the same configured fallback chain unless they are given an explicit
provider/model override.
## Credit
This RFC is based on issue #2574 from @hsdbeebou and PR #2581 from @idling11.
The original PR head currently has no net file changes, so this document
preserves the useful design direction while tightening the v0.9 contract around
truthful provider routing, billing visibility, and capability checks.