chore(release): merge v0.9.0-stewardship into v0.8.54

Includes Paulo's command parity and Gherkin E2E harnesses, HUQIANTAO's concurrency/security fixes, LeoAlex0's runtime_prompt slim, reidliu41's hotbar persistence, HarmonyOS scaffolding, Whaleflow foundation crate, and all v0.9.0 stabilization work.
2026-06-08 06:54:09 -07:00
parent edd28066e1 f88528a5a3
commit 78ae354fa4
237 changed files with 41229 additions and 4498 deletions
@@ -0,0 +1,56 @@
+# Agent Ethos
+
+CodeWhale is maintained with agents, but it is not maintained by automation
+alone. Treat community reports and patches as real collaboration: people are
+bringing us machines, providers, regions, shells, packages, and edge cases we
+could not cover by ourselves.
+
+## Stewardship
+
+- Verify live truth before acting. Check the current branch, release state,
+  registry state, CI, and linked issues instead of trusting a handoff.
+- Issues are intake, not a privilege boundary. Do not auto-close good-faith
+  issues because the reporter is not allowlisted. Ask for missing reproduction
+  detail and leave room for maintainer triage.
+- PR gates exist for code review, CI load, and trust-boundary safety. They are
+  not a quality judgment on the contributor. Keep dry-run mode unless a
+  maintainer deliberately enables enforcement, and use warm copy when the gate
+  comments.
+- Be generous with recurring contributors. When someone repeatedly brings
+  useful reports or patches, use `/lgtmi` for issue access or `/lgtm` for PR
+  access so the automation gets out of their way.
+- Preserve contributor credit. When harvesting work, inspect the PR and linked
+  issues, keep author/co-author attribution where possible, add
+  `Harvested from PR #N by @handle`, and credit the contributor in the
+  changelog or release notes.
+- Make credit machine-readable. If a harvested commit cannot preserve the
+  contributor as the author, add a `Co-authored-by` trailer with the GitHub
+  numeric noreply address from `.github/AUTHOR_MAP` or
+  `gh api users/<login> --jq '"\(.id)+\(.login)@users.noreply.github.com"'`.
+  Do not use `.local`, placeholder, bot/tool, or raw third-party emails for
+  human contributor credit.
+- Deferral is a maintainer action, not a dismissal. If a PR or issue is not
+  ready, say what is blocked, what evidence would change the decision, and
+  which part of the work remains valuable.
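The machine-readable credit rule above can be sketched as a small helper. This is an illustration only: the function names are hypothetical, and the numeric ID and login in the usage note are placeholders, not real accounts. The address shape follows the `gh api` expression quoted in the list.

```python
import re

# Shape of GitHub's numeric noreply address: <id>+<login>@users.noreply.github.com
_NOREPLY = re.compile(r"\d+\+[A-Za-z0-9-]+@users\.noreply\.github\.com")


def noreply_email(user_id: int, login: str) -> str:
    """Build the numeric noreply address for a GitHub user."""
    return f"{user_id}+{login}@users.noreply.github.com"


def is_numeric_noreply(email: str) -> bool:
    """Return True only for addresses in the numeric noreply shape."""
    return _NOREPLY.fullmatch(email) is not None


def co_author_trailer(name: str, user_id: int, login: str) -> str:
    """Format a Co-authored-by trailer line for a harvested commit."""
    return f"Co-authored-by: {name} <{noreply_email(user_id, login)}>"
```

For example, `co_author_trailer("Example Dev", 12345, "example-dev")` yields a trailer ending in `<12345+example-dev@users.noreply.github.com>`, while `is_numeric_noreply("dev@laptop.local")` is `False`, matching the rule against `.local` addresses.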
+
+## Agent Workflow
+
+- Use sub-agents for exploration, review, and verification, but keep a human
+  maintainer posture in the parent session. Sub-agent output is evidence; the
+  parent is responsible for the final decision.
+- Personally review community PRs before merging, harvesting, closing, or
+  deferring them. Do not close work based only on title, labels, or an agent's
+  summary.
+- Prefer narrow, reversible changes that match the existing codebase. Avoid
+  drive-by refactors while harvesting community work.
+- Run the smallest meaningful validation first, then broaden tests when a
+  change touches shared behavior, release plumbing, auth, sandboxing,
+  providers, or UI workflows.
+- Do not tag, publish, push release artifacts, or create GitHub releases
+  without explicit maintainer approval.
+
+## Product Tone
+
+CodeWhale should feel like a capable coding harness with a public community,
+not a closed queue. Automation should reduce maintainer load while making
+contributors feel seen, credited, and able to keep helping.
@@ -209,6 +209,25 @@ legacy top-level `base_url`, so the OpenAI-compatible provider receives it.
 provider tables in one config, `[providers.openai].model` can be used as the
 OpenAI-provider-specific override.

+If the gateway accepts `POST /chat/completions` but rejects
+`/v1/chat/completions`, set a provider-local `path_suffix`:
+
+```toml
+[providers.openai]
+base_url = "https://your-gateway.example/v1"
+path_suffix = "/chat/completions"
+```
+
+The suffix applies only to chat-completion requests. Model listing and
+DeepSeek beta paths keep their built-in routing so a generic gateway override
+does not accidentally rewrite `/models` or `/beta/completions`.
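A rough sketch of the routing rule described above, not CodeWhale's implementation. The function name, the request kinds, and the built-in path table are assumptions made for illustration; only the rule itself (the suffix affects chat-completion requests and nothing else) comes from the text.

```python
from typing import Optional

# Hypothetical built-in paths, keyed by request kind (assumed for this sketch).
BUILTIN_PATHS = {
    "chat": "/v1/chat/completions",
    "models": "/models",
    "beta": "/beta/completions",
}


def request_path(kind: str, path_suffix: Optional[str] = None) -> str:
    """Pick the request path; the override applies to chat requests only."""
    if kind == "chat" and path_suffix:
        return path_suffix
    return BUILTIN_PATHS[kind]
```

With `path_suffix="/chat/completions"`, `request_path("chat", ...)` returns the override, while `request_path("models", ...)` and `request_path("beta", ...)` still return their built-in paths.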
+
+For private gateways with broken or intercepted certificates, prefer
+`SSL_CERT_FILE` with a trusted CA bundle. As a last resort, a provider table can
+set `insecure_skip_tls_verify = true`; this disables certificate verification
+only for the active LLM provider client, leaves other HTTP clients unchanged,
+and is reported by `codewhale doctor`.
+
 Local HTTP endpoints such as Ollama, SGLang, and vLLM are allowed by default
 when they use localhost or loopback addresses. For a non-local `http://`
 gateway, launch with `DEEPSEEK_ALLOW_INSECURE_HTTP=1` only on a trusted network:
@@ -246,6 +265,13 @@ api_key = "YOUR_XIAOMI_KEY"
 base_url = "https://api.xiaomimimo.com/v1"
 ```

+The example above uses Xiaomi MiMo's pay-as-you-go OpenAI-compatible endpoint.
+If you are using a Token Plan key (`tp-...`) for `[vision_model]`, you must set
+`base_url` explicitly because this generic OpenAI-compatible block does not
+auto-select MiMo endpoints. Use
+`https://token-plan-sgp.xiaomimimo.com/v1` for Singapore accounts or
+`https://token-plan-cn.xiaomimimo.com/v1` for China-region accounts.
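To make the choice concrete, here is a small sketch that picks the `base_url` value to write into the generic block. The helper name and the `"sgp"` / `"cn"` region labels are hypothetical; the URLs and the `tp-` prefix come from the text above.

```python
PAYG_BASE_URL = "https://api.xiaomimimo.com/v1"

# Region labels are illustrative keys, not CodeWhale config values.
TOKEN_PLAN_BASE_URLS = {
    "sgp": "https://token-plan-sgp.xiaomimimo.com/v1",
    "cn": "https://token-plan-cn.xiaomimimo.com/v1",
}


def mimo_base_url(api_key: str, region: str = "sgp") -> str:
    """Return the endpoint to set explicitly for a given key type and region."""
    if api_key.startswith("tp-"):
        return TOKEN_PLAN_BASE_URLS[region]
    return PAYG_BASE_URL
```

A Token Plan key with `region="cn"` maps to the China endpoint; any key without the `tp-` prefix maps to the pay-as-you-go endpoint regardless of region.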
+
 To bootstrap MCP and skills directories at their resolved paths, run `codewhale-tui setup`.
 To only scaffold MCP, run `codewhale-tui mcp init`.

@@ -352,6 +378,35 @@ Select a profile with:

 If a profile is selected but missing, codewhale exits with an error listing available profiles.

+## Harness Profiles
+
+v0.9 adds a config data model for model-specific harness posture. This is a
+preview schema: it can be parsed and tested, but runtime provider/model
+selection and prompt/tool behavior are wired in later v0.9 slices.
+When no configured profile matches, the resolver falls back to built-in seed
+profiles for the model families listed in the cutline doc. Configured profiles
+always take precedence over those seeds.
+
+```toml
+[[harness_profiles]]
+provider_route = "deepseek"
+model_pattern = "deepseek-v4.*"
+
+[harness_profiles.posture]
+kind = "cache-heavy"          # standard | cache-heavy | lean | custom
+max_subagents = 10            # 0 means runtime default
+prefer_codebase_search = false
+compaction_strategy = "prefix-cache" # default | prefix-cache | aggressive
+tool_surface = "full"              # full | read-only | auto
+safety_posture = "standard"        # standard | strict | permissive
+```
+
+Unknown posture names or unknown keys inside a harness profile fail config
+deserialization instead of silently becoming `custom`. That is intentional:
+once runtime wiring consumes these profiles, a typo should be visible.
+The v0.9 implementation order and automatic-creator boundary are documented in
+[`HARNESS_PROFILE_CUTLINE.md`](HARNESS_PROFILE_CUTLINE.md).
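The strict-parsing behavior can be illustrated with a short sketch. It is not the actual deserializer: the function name is hypothetical and the allowed values are copied from the comments in the TOML example above.

```python
# Allowed enum values, taken from the comments in the example profile.
ALLOWED_VALUES = {
    "kind": {"standard", "cache-heavy", "lean", "custom"},
    "compaction_strategy": {"default", "prefix-cache", "aggressive"},
    "tool_surface": {"full", "read-only", "auto"},
    "safety_posture": {"standard", "strict", "permissive"},
}
# Keys that carry numbers or booleans rather than enum names.
OTHER_KEYS = {"max_subagents", "prefer_codebase_search"}


def validate_posture(posture: dict) -> dict:
    """Reject unknown keys and unknown enum names instead of coercing them."""
    unknown = set(posture) - set(ALLOWED_VALUES) - OTHER_KEYS
    if unknown:
        raise ValueError(f"unknown posture keys: {sorted(unknown)}")
    for key, allowed in ALLOWED_VALUES.items():
        if key in posture and posture[key] not in allowed:
            raise ValueError(f"unknown value for {key}: {posture[key]!r}")
    return posture
```

In this sketch a misspelled key such as `tool_surfce`, or a posture name such as `cache_heavy`, raises an error rather than quietly falling back to `custom`.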
+
 ## Environment Variables

 Most runtime environment variables override config values. API-key variables are
@@ -390,13 +445,17 @@ Remaining variables:
 - `VOLCENGINE_MODEL` or `VOLCENGINE_ARK_MODEL`
 - `OPENROUTER_API_KEY`
 - `OPENROUTER_BASE_URL`
-- `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, or `MIMO_API_KEY`
+- `XIAOMI_MIMO_TOKEN_PLAN_API_KEY`, `MIMO_TOKEN_PLAN_API_KEY`, `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, or `MIMO_API_KEY`
 - `XIAOMI_MIMO_BASE_URL` or `MIMO_BASE_URL`
 - `XIAOMI_MIMO_MODEL` or `MIMO_MODEL`
+- `XIAOMI_MIMO_MODE` or `MIMO_MODE`
 - `NOVITA_API_KEY`
 - `NOVITA_BASE_URL`
 - `FIREWORKS_API_KEY`
 - `FIREWORKS_BASE_URL`
+- `HUGGINGFACE_API_KEY` or `HF_TOKEN` (`HF_TOKEN` is a fallback alias accepted when provider is `huggingface`)
+- `HUGGINGFACE_BASE_URL` or `HF_BASE_URL`
+- `HUGGINGFACE_MODEL` or `HF_MODEL`
 - `SILICONFLOW_API_KEY`
 - `SILICONFLOW_BASE_URL`
 - `SILICONFLOW_MODEL`
@@ -555,6 +614,61 @@ the message. Existing environment variables remain available.
 `shell_env` hooks keep their existing `KEY=VALUE` stdout contract;
 the JSON stdout contract applies only to `message_submit`.

+### Turn-end observer hooks
+
+`turn_end` hooks observe the end of each model turn after post-turn
+state, usage totals, cost accounting, notifications, receipts, and
+queue recovery have been updated. They receive JSON on stdin and are
+observer-only: stdout is ignored, failures are logged as warnings, and
+the hook cannot block user input, mutate the transcript, or change the
+next queued follow-up.
+
+```toml
+[[hooks.hooks]]
+event = "turn_end"
+command = "~/.codewhale/hooks/turn-audit.sh"
+timeout_secs = 2
+continue_on_error = true
+```
+
+The payload includes common hook metadata plus post-turn accounting:
+
+```json
+{
+  "event": "turn_end",
+  "session_id": "sess_12345678",
+  "workspace": "/path/to/workspace",
+  "mode": "agent",
+  "model": "deepseek-chat",
+  "turn_id": "turn_12345678",
+  "status": "completed",
+  "error": null,
+  "duration_ms": 1834,
+  "usage": {
+    "input_tokens": 1200,
+    "output_tokens": 180,
+    "prompt_cache_hit_tokens": 900,
+    "prompt_cache_miss_tokens": 300,
+    "reasoning_tokens": null,
+    "reasoning_replay_tokens": null
+  },
+  "totals": {
+    "session_tokens": 1380,
+    "conversation_tokens": 1380,
+    "input_tokens": 1200,
+    "output_tokens": 180
+  },
+  "tool_count": 2,
+  "queued_message_count": 1,
+  "stop_hook_active": false
+}
+```
+
+For `interrupted` or `failed` turns, `status` reflects that terminal
+state and `error` carries the engine error string when one is available.
+`stop_hook_active` is reserved for future re-entry protection and is
+currently always `false`.
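As an illustration of consuming this payload, the sketch below condenses one turn into a single log line. It is a hypothetical Python alternative to the shell script named in the config example; the function names and the log format are assumptions. Because stdout is ignored for `turn_end`, the line goes to a stream the caller chooses.

```python
import json


def summarize_turn(payload: dict) -> str:
    """Condense a turn_end payload into one log line."""
    usage = payload.get("usage") or {}
    hit = usage.get("prompt_cache_hit_tokens") or 0
    miss = usage.get("prompt_cache_miss_tokens") or 0
    total = hit + miss
    # Cache hit ratio is only meaningful when the provider reports both counts.
    ratio = f"{hit / total:.0%}" if total else "n/a"
    return (
        f"{payload.get('turn_id')} {payload.get('status')} "
        f"{payload.get('duration_ms')}ms "
        f"tools={payload.get('tool_count')} cache_hit={ratio}"
    )


def run_hook(stdin, log) -> None:
    """Read one JSON payload from stdin and append a summary line to log."""
    payload = json.load(stdin)
    log.write(summarize_turn(payload) + "\n")
```

A script entry point would call `run_hook(sys.stdin, sys.stderr)` or pass an opened audit file as `log`. For the sample payload above, the summary reads `turn_12345678 completed 1834ms tools=2 cache_hit=75%`.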
+
 ### Sub-agent lifecycle hooks

 `subagent_spawn` and `subagent_complete` hooks observe sub-agent lifecycle
@@ -741,9 +855,11 @@ If you are upgrading from older releases:

 ### Core keys (used by the TUI/engine)

- `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. `nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint at `https://api.xiaomimimo.com/v1`; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `siliconflow-CN` targets the SiliconFlow China regional endpoint while sharing `[providers.siliconflow]`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`.
+- `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. `nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint, using `https://token-plan-sgp.xiaomimimo.com/v1` by default for Token Plan keys (`tp-...`) and `https://api.xiaomimimo.com/v1` for pay-as-you-go keys; set `base_url` explicitly if your Token Plan account uses the China region; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `siliconflow-CN` targets the SiliconFlow China regional endpoint while sharing `[providers.siliconflow]`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`.
 - `api_key` (string, required for hosted providers): must be non-empty for DeepSeek/hosted providers (or set the provider API key env var). Self-hosted SGLang, vLLM, and Ollama can omit it.
- `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://api.xiaomimimo.com/v1` for `xiaomi-mimo`, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.siliconflow.cn/v1` for `siliconflow-CN`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features.
+- `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://token-plan-sgp.xiaomimimo.com/v1` for `xiaomi-mimo` when the API key starts with `tp-...` and `https://api.xiaomimimo.com/v1` otherwise, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.siliconflow.cn/v1` for `siliconflow-CN`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `base_url = "https://token-plan-cn.xiaomimimo.com/v1"` explicitly if your Xiaomi MiMo Token Plan account is provisioned in the China region. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features.
+- `path_suffix` (string, optional provider-table key): override the chat-completions path for OpenAI-compatible gateways that do not serve `/v1/chat/completions`. For example, `[providers.openai] path_suffix = "/chat/completions"` sends chat requests to the unversioned base URL plus `/chat/completions`; `models` and `beta/*` requests keep their normal routing.
+- `insecure_skip_tls_verify` (bool, optional provider-table key): disabled by default. When true on the active provider table, only the LLM provider HTTP client skips TLS certificate verification. Prefer `SSL_CERT_FILE` for corporate or private CA bundles; `codewhale doctor` reports this setting when enabled.
 - `default_text_model` (string, optional): defaults to `deepseek-v4-pro` for DeepSeek and generic OpenAI-compatible endpoints, `deepseek-ai/deepseek-v4-pro` for NVIDIA NIM, `deepseek-ai/deepseek-v4-flash` for AtlasCloud, `deepseek-reasoner` for Wanjie Ark, `DeepSeek-V4-Pro` for Volcengine Ark, `deepseek/deepseek-v4-pro` for OpenRouter and Novita, `mimo-v2.5-pro` for Xiaomi MiMo, `accounts/fireworks/models/deepseek-v4-pro` for Fireworks, `deepseek-ai/DeepSeek-V4-Pro` for SiliconFlow, `trinity-large-thinking` for Arcee AI, `kimi-k2.6` for Moonshot, `deepseek-ai/DeepSeek-V4-Pro` for SGLang/vLLM, and `deepseek-coder:1.3b` for Ollama. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows, 384K max output, and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash` until July 24, 2026, except SiliconFlow maps `deepseek-reasoner` and `deepseek-r1` to its Pro model while `deepseek-chat` and `deepseek-v3` map to Flash. Provider-specific mappings translate `deepseek-v4-pro` / `deepseek-v4-flash` to each provider's model ID where supported. OpenRouter also recognizes recent large IDs such as `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, and `moonshotai/kimi-k2.6`; direct Arcee uses bare IDs such as `trinity-large-thinking` and `trinity-large-preview`; direct Xiaomi MiMo recognizes chat IDs `mimo-v2.5-pro` and `mimo-v2.5`, while TTS IDs are selected through `codewhale speech` / `tts`. Generic `openai`, `atlascloud`, `wanjie-ark`, `xiaomi-mimo`, `arcee`, and Ollama model IDs are passed through unchanged after known aliases are normalized. OpenRouter and SiliconFlow provider configs with a custom `base_url` also preserve explicit model values, which lets OpenAI-compatible gateways accept bare model IDs. Use `/models` or `codewhale models` to discover live IDs from your configured endpoint. `CODEWHALE_MODEL` overrides this for a single process; `DEEPSEEK_MODEL` is the legacy alias.
 - `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`; defaults to the configured UI tier. DeepSeek Platform receives top-level `thinking` / `reasoning_effort` fields. NVIDIA NIM receives equivalent settings through `chat_template_kwargs`.
 - `allow_shell` (bool, optional): defaults to `false`; shell tools must be explicitly enabled.
@@ -759,8 +875,10 @@ If you are upgrading from older releases:
  records loaded next to `config.toml`, for example
  `~/.codewhale/permissions.toml`. This schema foundation accepts
  `[[rules]]` entries with `tool` plus optional `command` or `path` fields.
-  It intentionally does not accept typed allow/deny records or provide approval
-  UI persistence yet.
+  Loaded rules feed the execution policy engine and force approval in approval
+  modes that can ask; under `approval_policy = "never"`, matching ask rules are
+  rejected because no prompt can be shown. This intentionally does not accept
+  typed allow/deny records, glob expansion, or approval UI persistence yet.
 - `managed_config_path` (string, optional): managed config file loaded after user/env config.
 - `requirements_path` (string, optional): requirements file used to enforce allowed approval/sandbox values.
 - `max_subagents` (int, optional): defaults to `10` and is clamped to `1..=20`.
@@ -851,18 +969,22 @@ If you are upgrading from older releases:
  turns whose elapsed time meets `threshold_secs`; failed and cancelled
  turns are silent. `auto` resolves to `osc9` for `iTerm.app`, `Ghostty`,
  and `WezTerm` (detected via `$TERM_PROGRAM`). Otherwise the fallback is
-  `bel` on macOS / Linux and `off` on Windows (where BEL maps to the
-  system error chime — see the [Notifications](#notifications) section
-  for the full rationale, #583).
+  `bel`; on Windows the BEL path is routed through `MessageBeep(MB_OK)`.
 - `[notifications].threshold_secs` (int, optional): defaults to `30`.
  Only completed turns whose elapsed time meets or exceeds this fire a
  notification.
 - `[notifications].include_summary` (bool, optional): defaults to
  `false`. When `true`, the notification body includes the elapsed
  duration and the turn's cost in the configured display currency.
+- `[notifications].completion_sound` (string, optional): `off`, `beep`,
+  `bell`, or `file`. Defaults to `beep`. On Windows, `file` plays the WAV file
+  at the path set in `[notifications].sound_file`.
+- `[notifications].sound_file` (path, optional): path to a custom WAV file
+  used when `completion_sound = "file"`.
 - `tui.alternate_screen` (string, optional): `auto`, `always`, or `never`. This is retained for config compatibility, but interactive sessions now always use the TUI-owned alternate screen so host terminal scrollback cannot hijack the viewport.
 - `tui.mouse_capture` (bool, optional, default `true` on non-Windows terminals and on Windows Terminal/ConEmu/Cmder when the alternate screen is active; `false` on legacy Windows console and inside JetBrains JediTerm — PyCharm/IDEA/CLion/etc. — where mouse-event escapes leak into the input stream as garbled text, see #878 / #898): enable internal mouse scrolling, transcript selection, right-click context actions, and transcript scrollbar dragging. TUI-owned drag selection copies only transcript text, removes visual wrap-column line breaks from paragraphs, and keeps selection scoped to the transcript pane. Set this to `false` or run with `--no-mouse-capture` for raw terminal selection; set it to `true` or run with `--mouse-capture` to opt in anywhere it's defaulted off. On raw terminal selection, especially on legacy Windows console or when mouse capture is disabled, selection may cross the right sidebar and include visual wraps because the terminal, not the TUI, owns the selection.
 - `tui.terminal_probe_timeout_ms` (int, optional, default `500`): startup terminal-mode probe timeout in milliseconds. Values are clamped to `100..=5000`; timeout emits a warning and aborts startup instead of hanging indefinitely.
+- `tui.stream_chunk_timeout_secs` (int, optional, default `300`): per-SSE-chunk idle timeout for streamed model responses. For slow local or OpenAI-compatible servers, raise it with `/config stream_chunk_timeout_secs <seconds>`; `0` maps to the default, and explicit values must be in `1..=3600`. The legacy `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` env var is still honored when this key is omitted.
 - `tui.osc8_links` (bool, optional, default `true`): emit OSC 8 escape sequences around URLs in transcript output so terminals that support them (iTerm2, Terminal.app 13+, Ghostty, Kitty, WezTerm, Alacritty, recent gnome-terminal/konsole) render them as Cmd+click hyperlinks. Terminals without OSC 8 support render the plain URL and ignore the escape. Set `false` for terminals that misrender the sequence; selection/clipboard output always strips the escapes.
 - `hooks` (optional): lifecycle hooks configuration (see `config.example.toml`).
 - `features.*` (optional): feature flag overrides (see below).
@@ -922,16 +1044,22 @@ The TUI can emit a desktop notification (OSC 9 escape or plain BEL) when a turn
 method          = "auto"  # auto | osc9 | bel | off
 threshold_secs  = 30      # only notify when the turn took >= this many seconds
 include_summary = false   # include elapsed time + cost in the notification body
+completion_sound = "beep" # off | beep | bell | file
+sound_file = "E:\\google\\downloads\\notify.wav" # for completion_sound = "file"
 ```

 Method semantics:

- `auto` (default) — picks `osc9` for `iTerm.app`, `Ghostty`, and `WezTerm` (detected via `$TERM_PROGRAM`). On macOS and Linux it falls back to `bel`. **On Windows the fallback is `off`** instead of `bel`, because the Windows audio stack maps `\x07` to the `SystemAsterisk` / `MB_OK` chime — the same sound application error popups use, so a successful-turn notification ends up sounding like an error (#583).
+- `auto` (default) — picks `osc9` for `iTerm.app`, `Ghostty`, and `WezTerm` (detected via `$TERM_PROGRAM`). Otherwise it falls back to `bel`; on Windows that BEL path is routed through `MessageBeep(MB_OK)`.
 - `osc9` — emit `\x1b]9;<msg>\x07`. Inside tmux the sequence is wrapped in DCS passthrough so it reaches the outer terminal.
 - `bel` — emit a single `\x07` byte. Use this on Windows only if you actively want the chime back.
 - `off` — disable post-turn notifications entirely.

-Windows users who run inside a known OSC-9 terminal (e.g. WezTerm on Windows) keep getting OSC-9 notifications; the `off` fallback only applies when no recognised `TERM_PROGRAM` is detected.
+Windows users who run inside a known OSC-9 terminal (e.g. WezTerm on Windows) keep getting OSC-9 notifications. Set `method = "off"` to disable threshold-based desktop notifications entirely.
+
+`completion_sound = "file"` is for Windows users who want a per-application
+completion sound without changing the global Windows sound scheme. It plays the
+configured WAV `sound_file` asynchronously via the native Windows audio API.
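The method semantics above can be summarized in a short sketch. The function names are hypothetical, and the sketch deliberately leaves out the tmux DCS passthrough wrapping, the Windows `MessageBeep` routing, and `completion_sound` handling.

```python
# Terminals that `auto` maps to OSC 9, as listed in the method semantics.
OSC9_TERMINALS = {"iTerm.app", "Ghostty", "WezTerm"}


def resolve_method(method: str, term_program: str) -> str:
    """Resolve `auto` to a concrete method based on $TERM_PROGRAM."""
    if method != "auto":
        return method
    return "osc9" if term_program in OSC9_TERMINALS else "bel"


def should_notify(status: str, elapsed_secs: float, threshold_secs: int = 30) -> bool:
    """Only completed turns that meet the threshold fire a notification."""
    return status == "completed" and elapsed_secs >= threshold_secs


def notification_bytes(method: str, message: str) -> bytes:
    """Bytes written to the terminal for a resolved method."""
    if method == "osc9":
        return f"\x1b]9;{message}\x07".encode()
    if method == "bel":
        return b"\x07"
    return b""  # "off" emits nothing
```

For instance, `resolve_method("auto", "WezTerm")` gives `osc9`, an unrecognised terminal gives `bel`, and a failed turn never notifies no matter how long it ran.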

 ### Parsed but currently unused (reserved for future versions)

@@ -978,8 +1106,15 @@ Use `codewhale-tui features list` to inspect known flags and their effective sta
 `web_search` uses DuckDuckGo by default and does not require an API key. The
 DuckDuckGo path keeps a Bing fallback when DDG returns a bot challenge or no
 parseable results. Bing remains selectable for users who explicitly want it,
-and Tavily, Bocha, Metaso, or Baidu can be selected when an API-backed provider
-is preferred.
+and Tavily, Bocha, Metaso, Baidu, Volcengine, or Sofya can be selected when an
+API-backed provider is preferred.
+
+For a private/internal search service that serves DuckDuckGo-compatible HTML,
+keep `provider = "duckduckgo"` and set `base_url`; CodeWhale appends the `q`
+query parameter to that endpoint and applies network policy to its host.
+Custom endpoints do not fall back to public Bing. `CODEWHALE_SEARCH_BASE_URL`
+can override this per process; `DEEPSEEK_SEARCH_BASE_URL` remains accepted as
+the legacy alias.
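A minimal sketch of appending the `q` parameter to a custom endpoint. The helper name is hypothetical, and preserving any query parameters already present on `base_url` is an assumption of this sketch rather than documented behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_search_url(base_url: str, query: str) -> str:
    """Append q=<query> to the endpoint, keeping any existing parameters."""
    parts = urlsplit(base_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("q", query))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

With the endpoint from the config comment in this section, `build_search_url("https://search.example/html/", "rust async")` produces `https://search.example/html/?q=rust+async`.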

 **Metaso** ([metaso.cn](https://metaso.cn)) has a 100 searches/day free quota;
 set `METASO_API_KEY` or `[search] api_key` for a higher quota.
@@ -989,10 +1124,16 @@ set `METASO_API_KEY` or `[search] api_key` for a higher quota.
 `BAIDU_SEARCH_API_KEY` or `[search] api_key`. This is a search-tool backend
 only; it does not add a Baidu model provider.

+**Sofya** ([sofya.co](https://sofya.co)) returns full extracted page content
+rather than snippets. Set `[search] api_key` to your `ay_live_...` key, or the
+`SOFYA_API_KEY` env var. This is a search-tool backend only; it does not add a
+Sofya model provider.
+
 ```toml
 [search]
-provider = "baidu" # duckduckgo | bing | tavily | bocha | metaso | baidu
-# api_key = "YOUR_KEY" # required for tavily, bocha, and baidu; optional for metaso
+provider = "baidu" # duckduckgo | bing | tavily | bocha | metaso | baidu | volcengine | sofya
+# base_url = "https://search.example/html/" # optional with provider = "duckduckgo"
+# api_key = "YOUR_KEY" # required for tavily, bocha, baidu, volcengine, and sofya; optional for metaso
 ```

 ## Local Media Attachments
@@ -232,6 +232,11 @@ Or switch directly:

 Plan mode is the safest place to start in an unfamiliar repository. It is for
 inspection and decision-making, not file edits.
+For non-trivial work, Plan mode's confirmation prompt can show a grounded
+PlanArtifact: objective, context, sources used, critical files, constraints,
+approach, verification plan, risks, and handoff notes. Empty sections are
+visible when the agent uses the rich artifact shape, so you can ask for a
+revision instead of accepting an under-specified plan.

 Agent mode is the default for most contribution work. It lets CodeWhale read,
 run checks, and edit files while keeping risky actions behind approval gates.
@@ -0,0 +1,79 @@
+# Harness Profile Cutline
+
+This note defines the v0.9.0 order for HarnessProfile work. The automatic
+Harness Creator must not run before the profile schema, resolver, seed
+profiles, and user-visible status surfaces are explicit and tested.
+
+## Decision
+
+For v0.9.0, CodeWhale should treat harness profiles as typed policy data first.
+Automatic profile evolution is deferred until replay evidence, candidate
+manifests, and promotion gates exist.
+
+The first implementation lane stops at:
+
+1. `HarnessPosture` enum and policy knobs.
+2. `HarnessProfile` schema and registry.
+3. Deterministic profile resolver.
+4. Seed profiles for common model families.
+5. Repo constitution overlay input.
+6. Status/UX display of the resolved provider, model, profile, and repo law.
+
+Only after those surfaces are visible and tested should CodeWhale add evidence
+stores, candidate manifests, promotion gates, or an agentic Harness Creator.
+
+## Required Seed Profiles
+
+| Model family | Intended posture | Notes |
+| --- | --- | --- |
+| DeepSeek V4 Pro / Flash | cache-heavy | Preserve prefix stability and large-context continuity. |
+| Xiaomi MiMo v2.5 Pro / Flash | cache-heavy | Similar long-context/cache posture, but route and auth remain distinct from DeepSeek. |
+| Arcee Trinity Thinking | cache-heavy or explicit Arcee profile | Direct Arcee IDs such as `trinity-large-thinking` must not be hidden behind OpenRouter aliases. |
+| Hugging Face / local / open-weight routes | lean | Prefer smaller context packs, stricter tool surfaces, and subagent-oriented decomposition. |
+| Generic OpenAI-compatible gateways | standard unless matched | Do not infer provider-specific posture from a bare endpoint alone. |
+
+Provider route, endpoint, model id, HarnessProfile, and repo constitution must be
+separately visible. A profile resolver may choose a profile, but it must not
+silently change provider auth, base URLs, model IDs, tool allowlists, or repo
+permissions.
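One way to picture a resolver that respects this boundary is the sketch below. Everything in it is illustrative: the `Profile` type, the seed patterns, and the use of regular expressions for `model_pattern` are assumptions, not the shipped schema. The point is that the resolver returns only a posture name and never touches auth, base URLs, or model IDs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    provider_route: str
    model_pattern: str  # treated as a regular expression in this sketch
    kind: str


# Hypothetical seeds loosely following the table above.
SEED_PROFILES = (
    Profile("deepseek", r"deepseek-v4.*", "cache-heavy"),
    Profile("xiaomi-mimo", r"mimo-v2\.5.*", "cache-heavy"),
    Profile("arcee", r"trinity-.*", "cache-heavy"),
    Profile("huggingface", r".*", "lean"),
)


def resolve_profile(provider_route: str, model: str, configured=()) -> str:
    """Return a posture name; configured profiles are checked before seeds."""
    for profile in (*configured, *SEED_PROFILES):
        if profile.provider_route == provider_route and re.fullmatch(
            profile.model_pattern, model
        ):
            return profile.kind
    return "standard"  # generic gateways stay standard unless matched
```

Because matching requires both the route and the model, a DeepSeek model ID served through a generic `openai` route resolves to `standard`, and `arcee-ai/trinity-large-thinking` on the `openrouter` route does not pick up the direct Arcee seed.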
+
+## Repo Constitution Boundary
+
+`.codewhale/constitution.json` is local repo law, not another provider profile.
+The resolver may read it as an input after project trust checks, but profile
+selection must show both:
+
+- the model-facing posture, such as `cache-heavy` or `lean`;
+- the repo-law source, such as `.codewhale/constitution.json` or none.
+
+## Automatic Evolution Boundary
+
+AHE/GEPA-style profile evolution is future work. It can be referenced as
+inspiration only after the text distinguishes these stages:
+
+1. candidate proposal from recorded evidence;
+2. replay/eval against a weaker or constrained student;
+3. promotion-gate decision with required tests and policy checks;
+4. inspectable overlay update or rollback.
+
+No v0.9.0 harness profile should be silently promoted, mutated, or written to a
+cached-main overlay by the schema/resolver/display lane.
+
+## Smoke Evidence
+
+Before v0.9.0 ships with HarnessProfile runtime behavior beyond schema parsing
+and pure resolver checks, the acceptance matrix should record evidence for:
+
+- DeepSeek V4 resolving to a cache-heavy profile;
+- Xiaomi MiMo resolving to a cache-heavy profile without sharing DeepSeek auth;
+- Arcee direct `trinity-large-thinking` resolving through the direct `arcee`
+  route, not the OpenRouter `arcee-ai/trinity-large-thinking` alias;
+- a generic/HF/local model resolving to a lean or standard profile;
+- the TUI or runtime status surface showing provider, model, profile, and repo
+  constitution separately;
+- no automatic profile mutation during normal Agent or WhaleFlow runs.
+
+For v0.9.0, pure resolver tests may satisfy the profile-selection evidence, but
+status display and runtime use remain deferred until separate PRs wire those
+surfaces deliberately. Release notes should still call HarnessProfile a typed
+schema/resolver foundation rather than an automatic harness creator.
@@ -0,0 +1,92 @@
+# HarmonyOS and OpenHarmony
+
+This page covers CodeWhale on HarmonyOS PC and OpenHarmony cross-build setups.
+
+## Running On HarmonyOS PC
+
+HarmonyOS PC can use the normal Linux ARM64 package when its userspace is
+glibc-compatible:
+
+```bash
+npm i -g codewhale
+codewhale --version
+```
+
+You can also download `codewhale-linux-arm64` and
+`codewhale-tui-linux-arm64` from the GitHub Releases page and place both
+binaries on `PATH`.
+
+## Cross-Compiling To OpenHarmony
+
+The repository does not check in machine-specific SDK paths. Set
+`OHOS_NATIVE_SDK` to the OpenHarmony native SDK directory, that is, the
+directory that contains `llvm/bin`, `sysroot`, and
+`build/cmake/ohos.toolchain.cmake`.
+
+On Windows PowerShell:
+
+```powershell
+$env:OHOS_NATIVE_SDK="<path-to-openharmony-native-sdk>"
+. .\scripts\ohos-env.ps1
+rustup target add aarch64-unknown-linux-ohos
+cargo build --target aarch64-unknown-linux-ohos -p codewhale-cli
+```
+
+On Linux or macOS:
+
+```bash
+export OHOS_NATIVE_SDK=/path/to/openharmony/native
+. ./scripts/ohos-env.sh
+rustup target add aarch64-unknown-linux-ohos
+cargo build --target aarch64-unknown-linux-ohos -p codewhale-cli
+```
+
+The setup scripts export Cargo's target-specific `linker`, `AR`, `CC`, `CXX`,
+`CFLAGS`, `CXXFLAGS`, `CARGO_ENCODED_RUSTFLAGS`, `CC_SHELL_ESCAPED_FLAGS`, and
+CMake toolchain variables for `aarch64-unknown-linux-ohos`.
+
+## Compiler Wrappers
+
+For ad-hoc compiler calls, use the root wrappers. They read the same
+`OHOS_NATIVE_SDK` variable and do not contain local paths.
+
+Windows PowerShell:
+
+```powershell
+.\ohos-clang.ps1 --version
+.\ohos-clangxx.ps1 --version
+```
+
+Linux or macOS:
+
+```bash
+sh ./ohos-clang.sh --version
+sh ./ohos-clangxx.sh --version
+```
+
+If you want to run the POSIX wrappers directly as `./ohos-clang.sh`, make them
+executable first:
+
+```bash
+chmod +x ./ohos-clang.sh ./ohos-clangxx.sh
+```
+
+## Cargo Config
+
+`.cargo/config.toml` intentionally does not set a checked-in linker path.
+Cargo cannot expand environment variables inside `linker` or CMake toolchain
+path values there, so those values are exported by `scripts/ohos-env.ps1` and
+`scripts/ohos-env.sh` instead.
+
+## Dependency Guard
+
+Release prep runs a no-SDK dependency check:
+
+```bash
+./scripts/release/check-ohos-deps.sh
+```
+
+The guard resolves the `codewhale-tui` dependency graph for
+`aarch64-unknown-linux-ohos` and fails if unsupported host/UI crates re-enter
+the target graph: `nix` 0.28/0.29, `portable-pty`, `starlark`, `arboard`, or
+`keyring`. This does not replace a real SDK/sysroot build, but it catches the
+known `starlark -> rustyline -> nix` and PTY/keyring regressions before release.
@@ -44,6 +44,8 @@ systems such as Alpine should use [Build from source](#7-build-from-source).
 > and `codewhale-tui-linux-arm64`, so a plain `npm i -g codewhale` works
 > on any glibc-based ARM64 Linux. If you're stuck on v0.8.7, jump to
 > [Build from source](#7-build-from-source) — `cargo install` works fine.
+> For HarmonyOS PC and OpenHarmony cross-build setup, see
+> [HarmonyOS and OpenHarmony](HarmonyOS.md).

 ---

@@ -285,6 +287,38 @@ curl -L -o /tmp/codewhale-artifacts-sha256.txt \

 (Use `shasum -a 256 -c` instead of `sha256sum` on macOS.)

+### Roll back to a previous release
+
+If a new release is bad on your machine, install the last known-good version
+explicitly. Replace `X.Y.Z` with the version you want to restore.
+
+```bash
+# npm wrapper, including the matching GitHub release binaries
+npm install -g codewhale@X.Y.Z
+
+# Cargo install path; both crates are required
+cargo install codewhale-cli --version X.Y.Z --locked --force
+cargo install codewhale-tui --version X.Y.Z --locked --force
+```
+
+For manual installs, download both binaries or the platform archive from the
+exact release tag and verify the matching checksum manifest from that same tag:
+
+```bash
+# individual binaries
+curl -L -o codewhale-artifacts-sha256.txt \
+  https://github.com/Hmbown/CodeWhale/releases/download/vX.Y.Z/codewhale-artifacts-sha256.txt
+
+# platform archives
+curl -L -o codewhale-bundles-sha256.txt \
+  https://github.com/Hmbown/CodeWhale/releases/download/vX.Y.Z/codewhale-bundles-sha256.txt
+```
+
+Inside a CodeWhale workspace, `/restore list [N]` lists side-git file snapshots
+and `/restore <N>` restores files from the chosen snapshot. That workspace
+rollback does not change your installed binary version and does not rewrite
+conversation history.
+
 ### Windows Scoop

 The `codewhale` package is listed in Scoop's main bucket:
@@ -61,6 +61,56 @@ manager snapshot. Config edits made from the TUI are written immediately, but
 the model-visible MCP tool pool is not hot-reloaded; the manager marks this as
 restart-required until the TUI is restarted.

+## Hugging Face MCP
+
+Hugging Face provides a hosted MCP server for Hub resources, documentation,
+datasets, Spaces, and community tools. CodeWhale does not call Hugging Face's
+Hub HTTP APIs from `/hf`; it only helps you inspect and set up the MCP config
+that the regular MCP manager will load.
+
+The recommended setup path is Hugging Face's settings-generated configuration:
+
+1. Visit <https://huggingface.co/settings/mcp> while signed in.
+2. Choose the MCP client closest to your CodeWhale config shape and copy the
+   generated server snippet.
+3. Paste the Hugging Face server entry into your resolved MCP config file.
+4. Restart CodeWhale. Alternatively, run `/mcp reload` to refresh the manager
+   snapshot, then restart if the model-visible tool pool still needs to rebuild.
+
+CodeWhale reads both `servers` and `mcpServers`, so settings-generated snippets
+can be adapted without changing the rest of the MCP file. A placeholder-only
+shape looks like this:
+
+```json
+{
+  "servers": {
+    "huggingface": {
+      "url": "https://huggingface.co/mcp",
+      "headers": {
+        "Authorization": "Bearer ${HF_TOKEN}"
+      }
+    }
+  }
+}
+```
+
+The placeholder above is not a runnable secret. Use the settings-generated
+value in your private MCP config and never commit real Hugging Face tokens.
+
+Interactive helpers:
+
+```text
+/hf mcp status
+/hf mcp setup
+/hf concepts
+```
+
+`/hf mcp status` checks the configured MCP file for common Hugging Face server
+names or Hugging Face MCP URLs. `/hf mcp setup` prints the settings workflow
+and a placeholder-only config shape. `/hf concepts` explains the difference
+between the Hugging Face provider route, Hugging Face MCP, and explicit Hub
+workflows.
+
+Official docs: <https://huggingface.co/docs/hub/hf-mcp-server>
+
 ## Config File Location

 Default path:
@@ -15,10 +15,10 @@ implemented today.
 - DeepSeek is the first-class default provider today, with `deepseek-v4-pro`,
  `deepseek-v4-flash`, streaming thinking blocks, Fin routing, `DEEPSEEK_*`
  environment variables, and `~/.deepseek` config compatibility.
- OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, generic
-  OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider
-  paths where their IDs appear in `/provider`, `codewhale --provider`, or
-  `codewhale models`.
+- OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, Hugging
+  Face Inference Providers, generic OpenAI-compatible endpoints, SGLang, vLLM,
+  and Ollama are supported provider paths where their IDs appear in
+  `/provider`, `codewhale --provider`, or `codewhale models`.
 - Model auto-routing chooses a concrete DeepSeek model and thinking level per
  turn. It is not a TUI mode.
 - Fin is the fast `deepseek-v4-flash` thinking-off path for routing,
@@ -29,9 +29,11 @@ implemented today.

 ## Not Implemented Yet

- A native Hugging Face provider or Hub browser.
- Built-in Hugging Face model card, dataset, adapter, safetensors, or Jobs
-  workflows.
+- A native Hugging Face Hub browser, model passport picker, or direct Hub search
+  workflow. The OpenAI-compatible Hugging Face Inference Providers route is
+  implemented separately as a chat provider.
+- Built-in Hugging Face model card, dataset, adapter, safetensors, Spaces, or
+  Jobs workflows.
 - Native Unsloth, NeMo, or Arcee integrations.
 - A dedicated Model Lab UI tab.
 - Built-in benchmark suites, eval leaderboards, hosted observability, or
@@ -62,13 +64,13 @@ Planned scope:
 - Hub API auth and model discovery.
 - Model cards, licenses, tags, safetensors metadata, adapters, and dataset
  links surfaced in a terminal-friendly way.
- Inference Providers as explicit provider choices when the user configures
-  them.
+- Native Hub browser and model-passport metadata on top of the already separate
+  Hugging Face Inference Providers chat route.
 - Hugging Face Jobs as an optional remote execution path for user-approved
  experiments.

-Non-goal for now: claiming a native Hugging Face provider exists before it is
-implemented in code.
+Non-goal for now: claiming native Hub search, model passports, Spaces/Jobs, or
+Model Lab UI exists before those surfaces are implemented in code.

 ## Unsloth Workset

@@ -137,7 +137,8 @@ DeepSeek-TUI has three related but intentionally separate recovery paths:
 - Esc-Esc backtrack rewinds the live transcript to a previous user prompt and
  restores that prompt into the composer for editing.
 - `/restore` and the `revert_turn` tool restore workspace files from side-git
-  snapshots. They do not rewrite conversation history.
+  snapshots. `/restore list [N]` lists more snapshot options so you can choose
+  a rollback point. Neither `/restore` nor `revert_turn` rewrites conversation
+  history.

 A Pi-style in-file tree browser is a larger UI/data-model project. v0.8.40
 ships the bounded fork/backtrack primitives and explicit lineage metadata.
@@ -102,6 +102,12 @@ base_url = "https://gateway.example/v1"
 model = "your-deepseek-compatible-model"
 ```

+Private gateways with broken or intercepted certificates should use
+`SSL_CERT_FILE` with a trusted CA bundle. As a last resort,
+`insecure_skip_tls_verify = true` can be set on the active `[providers.*]`
+table; it applies only to the LLM provider client and is shown by
+`codewhale doctor`.
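+
+A minimal sketch of that last-resort setting, assuming OpenRouter is the active
+provider and using a hypothetical gateway URL. Substitute the `[providers.*]`
+table and `base_url` you actually use:
+
+```toml
+[providers.openrouter]
+# Hypothetical private gateway; replace with your own endpoint.
+base_url = "https://gateway.example/v1"
+# Last resort only: skips TLS certificate verification for the LLM provider
+# client. Prefer SSL_CERT_FILE with a trusted CA bundle.
+insecure_skip_tls_verify = true
+```
+
+Because `codewhale doctor` shows the setting, running it after the edit is a
+quick way to spot an override that was left on by accident.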
+
 Keep `provider`, `api_key`, and `base_url` in user config or process
 environment. Project-local config overlays intentionally cannot set those keys,
 so a repository cannot silently redirect prompts or credentials to another
@@ -118,7 +124,7 @@ endpoint.
 | `wanjie-ark` | `[providers.wanjie_ark]` | `WANJIE_ARK_API_KEY`, `WANJIE_API_KEY`, `WANJIE_MAAS_API_KEY` | `WANJIE_ARK_BASE_URL`, `WANJIE_BASE_URL`, `WANJIE_MAAS_BASE_URL`; default `https://maas-openapi.wanjiedata.com/api/v1` | `deepseek-reasoner` | OpenAI-compatible hosted route. `WANJIE_ARK_MODEL`, `WANJIE_MODEL`, and `WANJIE_MAAS_MODEL` are accepted. |
 | `volcengine` | `[providers.volcengine]` | `VOLCENGINE_API_KEY`, `VOLCENGINE_ARK_API_KEY`, `ARK_API_KEY` | `VOLCENGINE_BASE_URL`, `VOLCENGINE_ARK_BASE_URL`, `ARK_BASE_URL`; default `https://ark.cn-beijing.volces.com/api/coding/v3` | `DeepSeek-V4-Pro`, `DeepSeek-V4-Flash` | Volcengine/Volcano Engine Ark OpenAI-compatible coding endpoint. `VOLCENGINE_MODEL` and `VOLCENGINE_ARK_MODEL` are accepted. |
 | `openrouter` | `[providers.openrouter]` | `OPENROUTER_API_KEY` | `OPENROUTER_BASE_URL`; default `https://openrouter.ai/api/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`; recent large IDs include `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, `z-ai/glm-5.1`, `moonshotai/kimi-k2.6` | Additive open-model routing layer. It does not replace DeepSeek; it lets users route supported model IDs through OpenRouter when they choose it. |
-| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`; default `https://token-plan-sgp.xiaomimimo.com/v1` | Chat: `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS: `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. Token Plan keys (`tp-...`) use the token-plan endpoint by default; pay-as-you-go keys can set `base_url = "https://api.xiaomimimo.com/v1"`. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. |
+| `xiaomi-mimo` | `[providers.xiaomi_mimo]` | `XIAOMI_MIMO_TOKEN_PLAN_API_KEY`, `MIMO_TOKEN_PLAN_API_KEY`, `XIAOMI_MIMO_API_KEY`, `XIAOMI_API_KEY`, `MIMO_API_KEY` | `XIAOMI_MIMO_BASE_URL`, `MIMO_BASE_URL`, `XIAOMI_MIMO_MODE`, `MIMO_MODE`; default `https://token-plan-sgp.xiaomimimo.com/v1` | Chat: `mimo-v2.5-pro`, `mimo-v2.5`; speech/TTS: `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Xiaomi MiMo OpenAI-compatible chat completions route. Token Plan keys (`tp-...`) use `api-key` auth and the token-plan endpoint by default; pay-as-you-go mode uses standard API keys (`sk-...`) and `https://api.xiaomimimo.com/v1`. It sends `max_completion_tokens` and uses MiMo's `thinking` field for reasoning control. `codewhale speech` / `tts` uses the TTS models. |
 | `novita` | `[providers.novita]` | `NOVITA_API_KEY` | `NOVITA_BASE_URL`; default `https://api.novita.ai/v1` | `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash` | OpenAI-compatible hosted route for DeepSeek model IDs. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. |
 | `fireworks` | `[providers.fireworks]` | `FIREWORKS_API_KEY` | `FIREWORKS_BASE_URL`; default `https://api.fireworks.ai/inference/v1` | `accounts/fireworks/models/deepseek-v4-pro` | OpenAI-compatible hosted route. Use config or `CODEWHALE_MODEL` / `DEEPSEEK_MODEL` for model overrides. |
 | `siliconflow` | `[providers.siliconflow]` | `SILICONFLOW_API_KEY` | `SILICONFLOW_BASE_URL`; default `https://api.siliconflow.com/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | OpenAI-compatible hosted route. Official docs use the `.com` endpoint. `SILICONFLOW_MODEL` is accepted. Reasoning aliases `deepseek-reasoner` and `deepseek-r1` map to Pro; `deepseek-chat` and `deepseek-v3` map to Flash. |
@@ -130,6 +136,24 @@ endpoint.
 | `ollama` | `[providers.ollama]` | Optional `OLLAMA_API_KEY` | `OLLAMA_BASE_URL`; default `http://localhost:11434/v1` | `deepseek-coder:1.3b`; provider-hinted custom tags pass through | Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. `OLLAMA_MODEL` is accepted. |
 | `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible route. Org-prefixed model IDs pass through. |

+### Hugging Face Provider vs MCP vs Hub
+
+CodeWhale's `huggingface` provider ID is only the OpenAI-compatible chat
+inference route through Hugging Face Inference Providers. It is selected with
+`/provider huggingface`, `CODEWHALE_PROVIDER=huggingface`, or
+`provider = "huggingface"`.
+
+Hugging Face MCP is a separate external-tool route. Configure it through the
+MCP config described in `docs/MCP.md`, preferably using the settings-generated
+snippet from <https://huggingface.co/settings/mcp>. In the TUI, `/hf mcp status`
+checks whether the Hugging Face MCP server appears in the resolved MCP config,
+`/hf mcp setup` prints the settings workflow and a placeholder-only shape, and
+`/hf concepts` explains the provider/MCP/Hub distinction.
+
+Hub publishing or repository management remains an explicit user action through
+Hub-native tooling such as `huggingface_hub` or git. The `/hf` helper does not
+upload to Hugging Face and does not perform direct Hugging Face Hub HTTP search.
+
 ### Xiaomi MiMo Notes

 `xiaomi-mimo` defaults to `mimo-v2.5-pro` for long-context reasoning and coding
@@ -137,6 +161,14 @@ work. The chat picker also exposes the latest Omni model `mimo-v2.5`. Xiaomi MiM
 TTS is available through `codewhale --provider xiaomi-mimo speech "text"
 --model tts` (or the `tts` alias) plus model-visible `speech` / `tts` tools in
 Agent/YOLO mode.
+
+Token Plan keys default to the Singapore endpoint
+`https://token-plan-sgp.xiaomimimo.com/v1`. If your MiMo account is provisioned
+for the China region, set `base_url = "https://token-plan-cn.xiaomimimo.com/v1"`
+explicitly in `[providers.xiaomi_mimo]` or set `mode = "token-plan-cn"`. Europe
+Token Plan accounts can use `mode = "token-plan-ams"`; `mode = "pay-as-you-go"`
+selects the standard API endpoint and standard MiMo key family.
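+
+As a sketch, a China-region Token Plan account could use either form below in
+user config. This assumes that `mode` is set in the same
+`[providers.xiaomi_mimo]` table and selects the same endpoint as the explicit
+`base_url`:
+
+```toml
+[providers.xiaomi_mimo]
+# Option 1: select the region by mode.
+mode = "token-plan-cn"
+# Option 2: set the endpoint explicitly instead of using mode.
+# base_url = "https://token-plan-cn.xiaomimimo.com/v1"
+```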
+
 Voice-design and voice-clone shorthands map to `mimo-v2.5-tts-voicedesign` and
 `mimo-v2.5-tts-voiceclone`. Xiaomi's current
 [image-understanding guide](https://platform.xiaomimimo.com/docs/en-US/usage-guide/multimodal-understanding/image-understanding)
@@ -69,10 +69,10 @@ Anything that targets the DeepSeek provider API stays exactly as it was:
  Docker, or direct downloads.
 - **Docker image**: `ghcr.io/hmbown/codewhale`.

-## Deprecation shims (through v0.8.x)
+## Deprecation shims (removed in v0.9.0)

 To keep existing shell aliases, scripts, and CI working through the rename,
-v0.8.41 and later v0.8.x releases ship **deprecation shims**:
+v0.8.41 and later v0.8.x releases shipped **deprecation shims**:

 - A `deepseek` binary that prints a one-line warning to stderr and forwards
  argv to `codewhale`.
@@ -80,7 +80,9 @@ v0.8.41 and later v0.8.x releases ship **deprecation shims**:
 - The legacy `deepseek-tui` npm package is deprecated and no longer receives
  new releases. Install the `codewhale` npm package instead.

-These shims will be removed in **v0.9.0**. Please migrate before then.
+These binary shims are removed in **v0.9.0**. DeepSeek provider support, model
+IDs, `DEEPSEEK_*` environment variables, and legacy `~/.deepseek/` state
+fallbacks remain supported.

 ## Migrating in practice

@@ -114,15 +116,12 @@ downloads until the formula and tap repo are renamed.

 ### Manual / GitHub Releases

-`v0.8.41` Releases attach **both** the canonical `codewhale-*` /
-`codewhale-tui-*` assets and the legacy `deepseek-*` / `deepseek-tui-*`
-shim assets. Existing `deepseek update` invocations on v0.8.40 keep working;
-they land you on the deprecation shim, which then prompts the install of
-`codewhale`.
-
-A second checksum manifest, `deepseek-artifacts-sha256.txt`, is attached as
-an alias of `codewhale-artifacts-sha256.txt` so v0.8.40's hardcoded lookup
-still verifies.
+`v0.8.41` through `v0.8.x` Releases attached both the canonical
+`codewhale-*` / `codewhale-tui-*` assets and compatibility-only
+`deepseek-*` / `deepseek-tui-*` shim assets. Starting in v0.9.0, Releases attach
+only the canonical `codewhale-*` / `codewhale-tui-*` assets and the canonical
+`codewhale-artifacts-sha256.txt` checksum manifest. Install or update through
+`codewhale` before moving to v0.9.0.

 ### Sessions, skills, and manual workspaces

@@ -6,6 +6,11 @@ Step through this in order from a clean worktree on the release branch

 For deeper context on the underlying tools (preflight scripts, npm smoke,
 publish-crates), see [`RELEASE_RUNBOOK.md`](RELEASE_RUNBOOK.md).
+For v0.9.0, also complete the dedicated
+[`V0_9_0_RELEASE_ACCEPTANCE.md`](V0_9_0_RELEASE_ACCEPTANCE.md) matrix before
+tagging; it covers provider routes, WhaleFlow feature gates, GUI/runtime smoke,
+remote workbench decisions, and credit hygiene that the generic checklist does
+not enumerate.

 ## 1. CHANGELOG entry exists for the version

@@ -39,6 +44,9 @@ publish-crates), see [`RELEASE_RUNBOOK.md`](RELEASE_RUNBOOK.md).
 - [ ] `Cargo.lock` is refreshed (`cargo update --workspace --offline`).
 - [ ] `./scripts/release/check-versions.sh` reports
      `Version state OK: workspace=X.Y.Z, npm=X.Y.Z, lockfile in sync.`
+- [ ] `./scripts/release/check-ohos-deps.sh` reports that the OpenHarmony
+      target graph does not pull the unsupported `nix` 0.28/0.29,
+      `portable-pty`, `starlark`, `arboard`, or `keyring` crates.

 ## 3. Preflight gates

@@ -25,6 +25,7 @@ Current packaging note:
  - `codewhale-core`
  - `codewhale-app-server`
  - `codewhale-tui-core`
+  - `codewhale-whaleflow`

 ## Version Coordination

@@ -119,20 +120,22 @@ configured.
   `main` and letting `auto-tag.yml` create the tag — see the npm wrapper
   release section below for the `RELEASE_TAG_PAT` requirement).
 4. Publish crates in this order with `./scripts/release/publish-crates.sh publish`:
-   - `codewhale-secrets`
-   - `codewhale-config`
+   - `codewhale-mcp`
   - `codewhale-protocol`
+   - `codewhale-release`
+   - `codewhale-secrets`
   - `codewhale-state`
-   - `codewhale-agent`
+   - `codewhale-tui-core`
+   - `codewhale-whaleflow`
   - `codewhale-execpolicy`
   - `codewhale-hooks`
-   - `codewhale-mcp`
   - `codewhale-tools`
+   - `codewhale-config`
+   - `codewhale-agent`
+   - `codewhale-tui`
   - `codewhale-core`
   - `codewhale-app-server`
-   - `codewhale-tui-core`
   - `codewhale-cli`
-   - `codewhale-tui`
 5. Wait for each published crate version to appear on crates.io before publishing dependents.

 The publish helper is idempotent for reruns: already-published crate versions are skipped.
@@ -202,6 +205,18 @@ remote add cnb …`, then `git push cnb vX.Y.Z`).

 ## Recovery and Rollback

+- User-facing rollback:
+  - npm: `npm install -g codewhale@X.Y.Z`
+  - Cargo: `cargo install codewhale-cli --version X.Y.Z --locked --force`
+    and `cargo install codewhale-tui --version X.Y.Z --locked --force`
+  - manual assets: download binaries or the platform archive plus the matching
+    `codewhale-artifacts-sha256.txt` or `codewhale-bundles-sha256.txt`
+    manifest from `https://github.com/Hmbown/CodeWhale/releases/tag/vX.Y.Z`
+  - workspace files: use `/restore list [N]` and `/restore <N>` for side-git
+    snapshots; this does not change the installed binary version or rewrite
+    conversation history
+  - keep [docs/INSTALL.md](INSTALL.md#roll-back-to-a-previous-release) in sync
+    with these commands
 - Crates publish partially:
  - rerun `./scripts/release/publish-crates.sh publish`
  - already-published crate versions will be skipped
@@ -178,6 +178,36 @@ fronting layer.
 - `POST /v1/threads/{id}/resume`
 - `POST /v1/threads/{id}/fork`

+`GET /v1/threads/summary` is the read-only summary surface used by the VS Code
+Agent View. Each item includes `id`, `title`, `preview`, `model`, `mode`,
+`archived`, `updated_at`, `latest_turn_id`, `latest_turn_status`, plus
+workspace metadata:
+
+```json
+{
+  "id": "thread_...",
+  "title": "Implement MCP status count",
+  "preview": "The TUI footer should count project MCP servers...",
+  "model": "deepseek-v4-pro",
+  "mode": "agent",
+  "branch": "feature/runtime-api",
+  "head": "abc1234",
+  "dirty": false,
+  "workspace": "/Users/you/projects/codewhale",
+  "archived": false,
+  "updated_at": "2026-06-06T05:43:00Z",
+  "latest_turn_id": "turn_...",
+  "latest_turn_status": "completed"
+}
+```
+
+`branch` is resolved from the thread workspace at request time and may be
+`null` when the workspace is not a Git repository or the branch cannot be read.
+`head` is the current short Git commit for that workspace when available.
+`dirty` is true when the workspace has staged, unstaged, or untracked changes.
+`workspace` is included so editor clients can show when an agent lane is working
+outside the current VS Code folder.
+
 Thread forks are sibling runtime threads, not an in-place tree projection.
 `thread.forked` events include `source_thread_id`; internal backtrack-aware
 forks may also include `backtrack_depth_from_tail` and `dropped_turn_id`.
@@ -219,6 +249,28 @@ accept an empty string to clear a previously-set value. Added in v0.8.10 (#562):
 **Events** (SSE replay + live stream)
 - `GET /v1/threads/{id}/events?since_seq=<u64>`

+**Snapshots** (read-only side-git restore point listing)
+- `GET /v1/snapshots?limit=20`
+
+`/v1/snapshots` lists recent side-git restore points for the runtime workspace.
+It is read-only and does not restore files. `limit` defaults to `20` and must be
+between `1` and `100`.
+
+```json
+[
+  {
+    "id": "snap_...",
+    "label": "post-turn:1",
+    "timestamp": 1780730580
+  }
+]
+```
+
+Runtime API restore/retry/undo/editor-apply mutation endpoints are intentionally
+deferred. GUI clients should treat thread summaries and snapshots as inspection
+surfaces until atomic filesystem + conversation-state mutation semantics are
+specified and tested.
+
 **Receipts** (future read-only audit export)
 - Proposed only: `GET /v1/threads/{thread_id}/turns/{turn_id}/receipt`

@@ -18,6 +18,19 @@ The `type` field on `agent_open` selects a system-prompt posture for the child
 (`agent_type` is accepted as a compatibility alias). Each role is a distinct
 stance toward the work — not just a different label.

+## Maintainer posture
+
+Sub-agents help CodeWhale move faster, but the parent agent still owns the
+maintainer decision. Use children to gather evidence, review patches, and run
+verification while keeping the community posture in
+[`AGENT_ETHOS.md`](AGENT_ETHOS.md): issues are open intake, PR gates are
+review-load controls, and harvested work needs clear contributor credit.
+
+When a child reviews community work, the parent should still inspect the PR
+diff, linked issues, tests, and CI before merging, harvesting, closing, or
+deferring it. A sub-agent's result is a working set, not a substitute for
+stewardship.
+
 | Role          | Stance                                 | Writes? | Shell posture | Typical use                                  |
 |---------------|----------------------------------------|---------|---------------|----------------------------------------------|
 | `general`     | flexible; do whatever the parent says  | yes     | yes           | the default; multi-step tasks                |
@@ -110,9 +110,24 @@ to the model, such as `mcp_<server>_<tool>`.
 | `task_cancel` | Cancel a queued or running durable task. Approval-required. |
 | `checklist_write` | Granular progress under the active thread/task. Checklist state is subordinate to the durable task. |
 | `checklist_add` / `checklist_update` / `checklist_list` | Single-item checklist operations. |
-| `todo_write` / `todo_add` / `todo_update` / `todo_list` | Compatibility aliases for the checklist tools. Existing sessions keep working, but new prompts should use `checklist_*`. |
 | `note` | One-off important fact for later. |

+The legacy `todo_write`, `todo_add`, `todo_update`, and `todo_list` names are
+hidden compatibility aliases for saved transcript replay. They remain callable
+by exact name, but they are not part of the model-visible catalog; compatibility
+results include `_deprecation.use_instead = checklist_*` and
+`_deprecation.removed_in = 0.9.0`.
+
+`update_plan` accepts both the legacy shape (`explanation` plus `plan` steps)
+and a richer PlanArtifact shape for Plan mode review. The richer fields are
+optional and should be filled only when grounded in evidence: `title`,
+`objective`, `context_summary`, `sources_used`, `critical_files`,
+`constraints`, `recommended_approach`, `verification_plan`,
+`risks_and_unknowns`, and `handoff_packet`. The transcript card, Plan-mode
+confirmation prompt, `/relay`, and fork-state handoff all render the same
+artifact so a plan can be reviewed, accepted, revised, replayed, or delegated
+without losing its source context.
+
 ### Verification gates and artifacts

 | Tool | Niche |
@@ -228,6 +243,12 @@ Aliases: `/batonpass`, `/接力`.
 Use it before a long break, compaction, or moving work to a fresh session. The
 relay should preserve the goal, current Work checklist item, changed files,
 decisions, verification state, and one concrete next action.
+Treat it as the deliberate counterpart to automatic compaction: both exist to
+preserve continuity for the next session or sub-agent, but `/relay` lets the
+current agent inspect live evidence and choose the durable handoff facts
+explicitly. When `update_plan` has a rich PlanArtifact, `/relay` includes that
+strategy metadata so manual relay, fork-state, and compacted continuity do not
+drift into separate stories.

 ### Parallel fan-out: cost-class caps

@@ -257,6 +278,20 @@ prompting and tool catalogs. Do not use these names in new active guidance:
 The old one-shot `rlm` model-facing tool is also replaced by persistent
 `rlm_open` / `rlm_eval` / `rlm_configure` / `rlm_close` sessions.

+v0.9.0 adds the following hidden-compat aliases (#2682, #2683):
+
+| Hidden alias | Canonical replacement | Status |
+|---|---|---|
+| `todo_write` | `checklist_write` | Hidden, returns `_deprecation` metadata |
+| `todo_add` | `checklist_add` | Hidden, returns `_deprecation` metadata |
+| `todo_update` | `checklist_update` | Hidden, returns `_deprecation` metadata |
+| `todo_list` | `checklist_list` | Hidden, returns `_deprecation` metadata |
+| `exec_wait` | `exec_shell_wait` | Hidden, callable for replay |
+| `exec_interact` | `exec_shell_interact` | Hidden, callable for replay |
+
+All hidden aliases remain registered and callable so saved transcripts can
+replay without teaching new sessions the deprecated spelling.
+
 Historical compatibility results may include a `_deprecation` block shaped
 like this:

@@ -0,0 +1,98 @@
+# v0.9.0 Release Acceptance Matrix
+
+This matrix is the pre-tag gate for v0.9.0. Do not tag or publish v0.9.0 until
+each row is checked off or has an explicit defer decision with an owner.
+
+For every manual smoke, record the date, OS, provider/model, command, redacted
+config source, result, and follow-up issue or PR.
+
+## Core Build And Packaging
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| `cargo fmt --all -- --check` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
+| `cargo check --workspace --all-targets --locked` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
+| `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. |
+| `cargo test --workspace --all-features --locked` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df` (`4254 passed, 0 failed, 4 ignored` in `codewhale-tui`; package integration and doctest suites also passed). An earlier full run hit one transient localhost SSE reset in `mcp::tests::legacy_sse_closed_stream_reconnects_and_retries_tool_call`; the exact test passed serially before the full rerun. |
+| `./scripts/release/check-versions.sh` | release steward | ship | Passed locally during #2845 (`e22a7da53`) and remains part of the PR-local release gate for each stewardship slice. |
+| `./scripts/release/check-ohos-deps.sh` | release steward | ship | Passed locally during #2845 (`e22a7da53`); OHOS dependency graph stayed compatible for `codewhale-tui` on `aarch64-unknown-linux-ohos`. |
+| `./scripts/release/publish-crates.sh dry-run` | release steward | ship | Passed locally on 2026-06-06 at `2561a54df`. The script performed full `cargo publish --dry-run` for crates without unpublished workspace dependencies and package-content verification for dependent workspace crates; expected 0.8.53 already-published warnings were observed. |
+| `node scripts/release/npm-wrapper-smoke.js` after release build | release steward | ship | Passed locally on 2026-06-06 at `2561a54df` after `cargo build --release --locked -p codewhale-cli -p codewhale-tui`. The harness packed `codewhale-0.8.53.tgz`, served local release assets, and verified `npx --no-install codewhale doctor --help` plus `npx --no-install codewhale-tui --help`. |
+| GitHub release asset verification before npm publish | release steward | post-tag/pre-npm gate | The live v0.9.0 GitHub Release does not exist yet. After tagging and before `npm publish`, verify the Release contains the expected platform archives, individual binaries, Windows installer/portable assets, `codewhale-artifacts-sha256.txt`, and `codewhale-bundles-sha256.txt`; `npm/codewhale/scripts/verify-release-assets.js` remains the npm prepublish asset guard. |
+
+## Provider, Model, And Auth
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| DeepSeek V4 direct provider smoke | provider steward | ship | Passed locally on 2026-06-06 at `7bd68279e` using macOS 26.1 arm64 release binary: `./target/release/codewhale --provider deepseek --model deepseek-v4-flash exec "Reply exactly CODEWHALE_V09_SMOKE_OK and nothing else."` returned `CODEWHALE_V09_SMOKE_OK`. Redacted auth source: `codewhale auth status --provider deepseek` reported config-backed DeepSeek API key present, env unset, with no secret value printed. |
+| Xiaomi MiMo token-plan and pay-as-you-go config smoke | provider steward | ship config evidence / require live smoke before tag if claiming provider availability | Config coverage exercises token-plan and pay-as-you-go env behavior in `crates/config/src/lib.rs` (`xiaomi_mimo_env_token_plan_mode_uses_token_plan_key_and_endpoint`, `xiaomi_mimo_env_pay_as_you_go_mode_prefers_standard_key`) and mirrors the TUI config path in `crates/tui/src/config.rs`; `docs/PROVIDERS.md` documents Token Plan regions and pay-as-you-go mode. This is config evidence only, not a live Xiaomi call. |
+| Arcee Trinity Thinking route smoke or explicit defer | provider steward | defer live smoke / ship static route metadata | Static provider/model metadata exists in `docs/PROVIDERS.md`, `crates/agent/src/lib.rs`, and `crates/tui/src/config.rs`, but no live Arcee credential smoke has been recorded. Do not claim live Arcee route readiness in v0.9 release notes unless a dated manual smoke is added. |
+| Hugging Face provider route and MCP concept helpers ship; native Hub search/passports are deferred | model-lab steward | ship foundation / defer native search-passport runtime | `ProviderKind::Huggingface`, env aliases, picker/docs, and `/hf concepts` / `/hf mcp status` distinguish the chat provider route from Hugging Face MCP and explicit Hub tooling. `docs/PROVIDERS.md` states native Hub HTTP search/passport picker metadata are not shipped behavior in this checkout; #2705/#2707/#2712 remain open for native Model Lab work. |
+| OpenRouter, Novita, Fireworks, and Volcengine env behavior smoke | provider steward | ship config evidence / require live smoke before claiming live route coverage | Env/config tests cover OpenRouter, Novita, Fireworks, and Volcengine key/base-url/model override behavior in `crates/config/src/lib.rs`; TUI provider defaults and Volcengine env override are covered in `crates/tui/src/config.rs`, and `docs/PROVIDERS.md` documents the env/default behavior. This is env behavior evidence only, not live provider traffic. |
+| Provider registry drift check covers aliases/default env keys | provider steward | ship | #2820 (`5d491bc68`) added the metadata-only provider registry and `scripts/check-provider-registry.py`; verification included `python3 scripts/check-provider-registry.py` and `cargo test -p codewhale-config provider_ -- --nocapture`. |
+| Provider-scoped TLS skip-verify remains default-off and doctor-visible | security steward | ship | #2834 (`190e9f35e`, `6269cb91f`) landed provider-scoped TLS skip verify with default-off config, doctor warnings, docs, and CLI/runtime option tests. |
+
+## Runtime Stability
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| Windows input/render smoke or documented manual verification | runtime steward | manual smoke required before tag | No dated Windows input/render smoke has been recorded on this matrix yet. Unit/shell-dispatcher tests are not a substitute for Windows ConPTY/manual input verification. |
+| macOS and Linux TUI startup smoke | runtime steward | ship | macOS 26.1 arm64 evidence from 2026-06-06: release binaries built from the stewardship line reported `codewhale-tui 0.8.53 (2561a54df0ed)` and `codewhale 0.8.53 (2561a54df0ed)`, and `cargo test -p codewhale-tui --test qa_pty --locked` passed 6/6 startup/composer/keystroke PTY scenarios. Linux evidence from 2026-06-06: a streamed source archive built inside a Debian Bookworm arm64 `rust:1.88-bookworm` container with `libdbus-1-dev` / `pkg-config`; `cargo build --release --locked -p codewhale-cli -p codewhale-tui` passed and `./target/release/codewhale --version` / `./target/release/codewhale-tui --version` both ran successfully. |
+| Large-repo startup smoke | runtime steward | defer full smoke / ship bounded-context mitigation evidence | Bounded project-context tests and changelog evidence cover the mitigation slice, but live large-workspace reports #697 and #1827 remain open. Do not close those issues or claim a full large-repo startup smoke without a dated manual run. |
+| Sub-agent timeout/completion smoke | subagent steward | ship timeout/completion slice | `docs/SUBAGENTS.md` documents per-step timeout and heartbeat behavior; `crates/tui/src/tools/subagent/tests.rs` covers `api_timeout_preserves_checkpoint_and_agent_eval_continues_from_it`, parent completion ordering, and timeout propagation. Broader hung-agent issues #1806/#2614 remain open. |
+| Long-running command live-state smoke | runtime steward | defer root-cause live-state smoke / ship shell-routing tests | Shell tests cover timeout/background/wait/cancel behavior and `shell_job_routing.rs` distinguishes live from stale process state, but #1786 remains open for shell PID/task-flow hangs and premature LIVE-state exit. |
+| Runtime API remains token-protected for GUI clients | GUI steward | ship | #2811/#2814 documented and consumed the existing runtime token flow from the official VS Code extension; #2822 (`bb8835812`) added `GET /v1/snapshots` behind the same runtime API token middleware. |
+| Snapshot/restore surfaces are read-only unless mutation semantics are tested | GUI steward | ship | #2822 (`bb8835812`) and #2828 (`293643e27`) expose restore points as read-only listing/Agent View metadata only; #2808 restore/retry/patch-undo mutation endpoints remain unmerged pending atomicity tests. |
+
+## UI And Workflow UX
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| First-look screen included or explicitly deferred | UX steward | defer v0.9 redesign / keep existing onboarding | The existing onboarding welcome remains covered by `first_run_user_always_starts_at_welcome`; the opinionated v0.9 first-look/home redesign remains deferred to #2713 so release notes should not imply a new home screen. |
+| Slash picker readability smoke | UX steward | ship | Focused slash-menu coverage exercises visibility/hide state, removed-command filtering, Up/Down wrap behavior, argument spacing, skill command insertion, inline skill mentions, Esc priority, and locked composer height while match counts change. Verification: `cargo test -p codewhale-tui slash_menu --locked`, `cargo test -p codewhale-tui try_autocomplete_slash_command_completes_skill_argument --locked`, and `cargo test -p codewhale-tui next_escape_action_slash_menu_takes_priority --locked`. |
+| Transcript tool-collapse smoke or explicit defer | UX steward | ship | #2776 (`c76ec4752`) landed dense successful tool-run collapse with guardrails for failed/running/shell/patch/review/diff cells; focused widget coverage includes `chat_widget_collapses_dense_tool_runs_by_default`, `chat_widget_expands_dense_tool_runs_on_demand`, and `chat_widget_expanded_mode_leaves_dense_tool_runs_visible`. |
+| Sidebar detail popovers smoke or explicit defer | UX steward | ship | #2778 (`3cb49233e`) added row-level hover metadata and wrapping detail popovers for truncated Work/Tasks/Agents rows; #2806 (`19f5c7aa6`) preserved current sub-agent progress in the sidebar hover text. Focused coverage includes `sidebar_hover_rows_mark_source_text_diff_as_truncated` and `subagent_hover_text_preserves_full_agent_id_and_progress`. |
+| Plan review/handoff artifact smoke | Plan steward | ship | #2770 (`7ac8063b6`) added rich PlanArtifact sections through the transcript/Plan prompt path; focused coverage includes `plan_update_cell_renders_rich_artifact_metadata` and `plan_prompt_renders_rich_plan_artifact_sections`. |
+| VS Code Agent View branch/workspace visibility smoke | GUI steward | ship | #2825 (`1bacaf763`) added `workspace` / `branch` metadata to `/v1/threads/summary`; #2832 (`50b773f1d`) added read-only auto-refresh so branch/workspace changes can appear without manual refresh. The current stewardship slice extends the same read-only metadata with current Git `head` and `dirty` worktree state for editor/agent-lane visibility. |
+
+## v0.9.0 Feature Gates
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| WhaleFlow typed IR, mock executor, replay, TeacherReview, StudentReplay, and cutline docs are tested | WhaleFlow steward | ship | #2821/#2824/#2831/#2833/#2839/#2840/#2841 plus focused local `cargo test -p codewhale-whaleflow --locked`; #2670 closed after `cargo test -p codewhale-whaleflow starlark --locked` passed 7/7 on the current stewardship head. The `rlm_cache_change.star` dogfood workflow now has recorded mock-trace replay coverage, including a missing-record divergence check. |
+| Live `workflow_run`, worktree application, provider calls, and TraceStore writes are deferred until cancellation/replay/atomicity semantics pass | WhaleFlow steward | defer | #2669 and #2679 remain open for live runtime execution, provider calls, TraceStore writes, Arcee/student replay, and CLI/TUI workflow mode; the current v0.9 branch ships mock executor/replay foundations only. |
+| Model Lab / Hugging Face MVP is included or deferred with release-note wording | model-lab steward | ship provider/MCP docs foundation / defer native Model Lab MVP | v0.9 ships the Hugging Face chat-provider route, provider docs, and `/hf` concept/MCP status helpers only. Native Hub search, model passports, Spaces/Jobs workflows, and Model Lab eval/export surfaces remain deferred to #2705/#2707/#2710/#2712/#2727. |
+| HarnessProfile runtime MVP is deferred; schema/resolver foundation ships with release-note wording | harness steward | ship foundation / defer runtime | #2844 (`efbcc681a`) documents the cutline; `HarnessPosture` / `HarnessProfile` config schema and strict validation are present; a pure resolver matches provider/model routes without changing runtime behavior; seed-profile runtime selection, telemetry, and status display remain follow-up work. |
+| `codebase_search` MVP is included or deferred with release-note wording | search steward | defer runtime / ship design doc | `docs/CODEBASE_SEARCH_DESIGN.md` is explicitly doc-only and says no catalog code ships in this cycle; runtime tool registration, index/eval fixtures, and search implementation remain deferred to #2680. |
+| External memory remains explicit/optional per `WHALEFLOW_EXTERNAL_MEMORY.md` | memory steward | ship | #2842 (`a7052751e`) added the external-memory cutline: optional/explicit workflow node/plugin only, visible state/owner/storage/scope, and no hidden default context substrate. |
+
+## Remote Workbench
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| Remote workbench is marked included, experimental, or deferred | remote steward | defer runtime / ship setup docs only | `docs/REMOTE_VM_US.md`, `docs/REMOTE_SETUP_DESIGN.md`, and `docs/TENCENT_LIGHTHOUSE_HK.md` document possible VM/Telegram/Lark setup patterns, but no v0.9 remote workbench runtime is included. |
+| If included: VM install smoke passes | remote steward | defer | Not applicable while remote workbench runtime is deferred; no v0.9 VM install smoke is required before tagging. |
+| If included: Telegram bridge smoke passes | remote steward | defer | Not applicable while remote workbench runtime is deferred; Telegram bridge docs remain design/setup guidance only. |
+| If deferred: release notes avoid implying remote workbench availability | remote steward | ship | Acceptance matrix and changelog wording must say setup/design docs only, not a shipped remote workbench feature. |
+
+## Docs, Migration, And Rollback
+
+| Gate | Owner | Ship/defer decision | Evidence |
+| --- | --- | --- | --- |
+| README, configuration docs, provider docs, and changelog agree | docs steward | ship | #2845 (`e22a7da53`) aligned README/config example/changelogs with the HarnessProfile cutline and removed stale `V0_9_0_EXECUTION_MAP` links. |
+| Breaking changes, deprecations, and deferred v0.9 gates are listed in release notes | release steward | ship | Changelog and this matrix list deferred Model Lab/Hugging Face native Hub work, `codebase_search`, remote workbench runtime, WhaleFlow live runtime execution, HarnessProfile runtime selection, large-repo startup smoke, long-running command live-state smoke, and Arcee live smoke. `.github/workflows/release.yml` release-body text avoids stale v0.8.x-only shim wording and keeps CodeWhale as the canonical package/asset name. |
+| Upgrade steps exist for users coming from `deepseek-tui` | docs steward | ship | `docs/REBRAND.md` documents npm/Cargo migration commands, legacy state fallback, binary/package/asset naming, and the v0.9.0 compatibility cutline. |
+| Rollback steps exist for npm wrapper, Cargo install, and side-git restore | release steward | ship | `docs/INSTALL.md#roll-back-to-a-previous-release` and `docs/RELEASE_RUNBOOK.md#recovery-and-rollback` document pinned npm rollback, pinned Cargo rollback for both crates, exact-tag manual asset restore with checksums, and side-git `/restore list [N]` / `/restore <N>` workspace rollback. |
+| Live GitHub Release body has its own contributor/credit section | release steward | post-tag/pre-npm gate | `.github/workflows/release.yml` now creates a dedicated `## Contributors` release-body section with v0.9 contributor, reporter, helper, and harvested-PR credits. The live v0.9.0 Release does not exist yet, so this remains a release-time verification gate before npm publish or before the release is declared complete. |
+| Contributors/reporters/helpers from harvested PRs and linked issues are credited | release steward | ship local changelog / verify live body at release time | Changelog credits include harvested PR authors, issue reporters/helpers, and external/co-authored work including @Implementist, @jrcjrcc, and @punkcanyang. `python3 scripts/check-coauthor-trailers.py --author-map .github/AUTHOR_MAP --range origin/main..HEAD --check-authors` remains the local co-author-map gate; live release-body credits are covered by the row above. |
+
+## Before Tagging
+
+- [ ] Every `ship` row has evidence.
+- [ ] Every `decide` row is changed to either `ship` with evidence or `defer`
+      with an owner and linked follow-up.
+- [ ] Every `manual smoke required` row has dated smoke evidence, or is changed
+      to an explicit defer decision with an owner and linked follow-up.
+- [ ] Draft integration PR CI is green on the exact commit that will be tagged.
+- [ ] The release prompt points new agents to this matrix before any tag,
+      publish, or GitHub Release action.
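+
+The checklist above can be partly mechanized. The following is a minimal
+Python sketch, not a script that exists in this repository: `parse_rows` and
+`blocking_gates` are hypothetical helpers, the ISO-date regex is only a crude
+proxy for "dated smoke evidence", and the parser assumes no cell contains a
+literal `|`.
+
+```python
+import re
+
+ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
+
+
+def parse_rows(markdown: str) -> list[tuple[str, ...]]:
+    """Collect (gate, owner, decision, evidence) from four-column table rows."""
+    rows = []
+    for line in markdown.splitlines():
+        if not line.startswith("|"):
+            continue
+        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
+        is_header = cells[0] == "Gate"
+        is_separator = all(set(cell) <= {"-", ":"} for cell in cells)
+        if len(cells) == 4 and not is_header and not is_separator:
+            rows.append(tuple(cells))
+    return rows
+
+
+def blocking_gates(rows: list[tuple[str, ...]]) -> list[str]:
+    """Return gates that still block tagging under the checklist rules."""
+    blocked = []
+    for gate, _owner, decision, evidence in rows:
+        undecided = decision.startswith("decide")
+        undated_smoke = (
+            "manual smoke required" in decision and not ISO_DATE.search(evidence)
+        )
+        missing_evidence = decision.startswith("ship") and not evidence
+        if undecided or undated_smoke or missing_evidence:
+            blocked.append(gate)
+    return blocked
+```
+
+A check like this can only flag obvious gaps; whether the recorded evidence is
+sufficient remains a steward's judgment.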
@@ -280,7 +280,9 @@ clearable, and scoped**:
   `finalize`/`FINAL` is an *in-kernel Python function*, not a tool).
 6. **Cached-main overlay** — promoted lessons from the cached main branch
   (`/overlay`, §9).
-7. **External memory (Aleph)** — large local data via the `aleph` skill.
+7. **External memory (Aleph)** — large local data via the `aleph` skill;
+   see `docs/WHALEFLOW_EXTERNAL_MEMORY.md` for the v0.9.0 cutline that keeps
+   this optional, explicit, inspectable, and out of the default path.

 **Why it helps weaker models.** The model never has to *guess* where a fact
 should live or *re-derive* context it already established. Each layer has a
@@ -0,0 +1,72 @@
+# WhaleFlow External Memory Cutline
+
+This note resolves the v0.9.0 cutline for Aleph-style external memory in
+WhaleFlow. It is a design boundary, not a runtime implementation.
+
+## Decision
+
+External memory should be optional and explicit for v0.9.0. Normal CodeWhale
+operation must not depend on it, and WhaleFlow must not silently enable it for
+long-running runs.
+
+For v0.9.0, external memory can appear only as:
+
+- an explicit workflow node whose inputs, outputs, scope, and permissions are
+  visible in the typed WhaleFlow IR;
+- an optional plugin or skill-backed tool that the user enables deliberately;
+- a documented experiment whose state can be inspected, cleared, and exported.
+
+It should not be a hidden context substrate, a replacement for repo search, or a
+default backing store for every workflow run.
+
+## Layer Boundaries
+
+External memory is separate from the existing memory and replay layers:
+
+| Layer | Scope | v0.9.0 rule |
+| --- | --- | --- |
+| User memory | Small durable user preferences and facts surfaced by `/memory` | Opt-in, user-owned, not workflow evidence |
+| Repo search / codemap | Derived repo structure and search results | Rebuildable from the workspace; not a memory log |
+| ARMH/RLM memo | In-session working memory and exact-context memoization | Visible hit/miss telemetry; not durable replay evidence |
+| TraceStore | Recorded workflow, branch, leaf, and control results | Source of deterministic replay; no live model calls during replay |
+| Cached-main overlay | Promoted lessons after review and replay | Inspectable and reversible; never mutates Git main |
+| External memory | Large local or plugin-backed data outside normal context | Explicit node/plugin only; visible state and clear/export required |
+
+## Visibility Requirements
+
+Any future external-memory implementation must show:
+
+- when it is active;
+- which workflow node or plugin owns it;
+- where its state is stored;
+- what repo or run scope it can read;
+- whether it is included in replay, export, or promotion evidence;
+- how to inspect, clear, pin, and export it.
+
+The UI should treat this like an active context layer, not like invisible model
+intuition. If a run cannot explain why a fact came from external memory, the
+feature is not ready for default use.
+
+## Permissions And Privacy
+
+External memory must inherit the strictest relevant scope:
+
+- it must not cross repo/workspace boundaries without explicit approval;
+- project-local config must not silently enable broad external-memory reads;
+- replay must record external-memory inputs as evidence or mark replay as
+  unavailable/diverged;
+- exports must make external-memory references visible without dumping private
+  raw state by default.
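+
+The first rule above can be sketched as a lexical scope check. This is a
+minimal Python illustration, not the v0.9.0 implementation: `is_read_allowed`
+and its parameters are hypothetical names, and the check assumes absolute
+POSIX-style paths without resolving symlinks.
+
+```python
+import posixpath
+from typing import Iterable
+
+
+def is_read_allowed(
+    path: str, workspace_root: str, approved_roots: Iterable[str] = ()
+) -> bool:
+    """True when `path` sits inside the workspace or an explicitly approved root."""
+    if not posixpath.isabs(path):
+        raise ValueError("path must be absolute")
+    # normpath collapses `..` segments, so traversal out of the root is caught.
+    target = posixpath.normpath(path)
+    for root in (workspace_root, *approved_roots):
+        normalized_root = posixpath.normpath(root)
+        if posixpath.commonpath([target, normalized_root]) == normalized_root:
+            return True
+    return False
+```
+
+A read outside the workspace succeeds only when its root was passed in
+`approved_roots`, which stands in for the explicit approval required above.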
+
+## Deferred Work
+
+The following remain out of scope for the v0.9.0 cutline:
+
+- default-on Aleph-style memory for all WhaleFlow runs;
+- automatic promotion from external memory into cached-main overlay;
+- hidden retrieval behind ordinary prompts;
+- hosted or shared external-memory services;
+- treating external memory as a substitute for TraceStore replay.
+
+Future implementation should start with a read-only typed workflow node and a
+mock replay fixture before adding any plugin-backed or live retrieval path.
@@ -64,6 +64,13 @@ Non-goals:
 - no blocking of user input
 - no transcript mutation from `turn_end`

+Implementation note for the v0.9 branch: the narrow #2578 harvest uses the
+shared structured observer path introduced for sub-agent lifecycle hooks. It
+fires before queued follow-up dispatch, after queue-recovery state is known, so
+the payload can report the queued-message count without letting a hook change
+what gets sent next. Stdout is ignored for `turn_end`; only `message_submit`
+has a stdout mutation contract.
+
 ### PR 3: Subagent lifecycle observer hooks

 Expose subagent start and completion as observer-only hook events.
@@ -251,7 +258,9 @@ transcript content in the first version.
 - Existing observer-only hooks keep working.
 - Existing env vars remain available.
 - `shell_env` keeps its existing stdout `KEY=VALUE` contract.
-- Structured stdout is interpreted only by `message_submit` in PR 1.
+- Structured stdout is interpreted only by `message_submit` in PR 1. Structured
+  observer hooks such as `turn_end`, `subagent_spawn`, and `subagent_complete`
+  receive JSON on stdin, but their stdout is ignored by the caller.

 ## 6. Review checkpoints

@@ -0,0 +1,167 @@
+# RFC: Provider Fallback Chain
+
+**Issue:** #2574
+**Reporter:** @hsdbeebou
+**Design source:** #2581 by @idling11
+**Status:** Draft for the v0.9 provider-routing lane
+**Date:** 2026-06-04
+
+## Problem
+
+CodeWhale can store credentials and defaults for several providers, but a
+running session uses one active provider route at a time. When that provider
+hits a rate limit, temporary outage, or transport failure, the user must notice
+the failure, run `/provider`, choose another route, and resubmit the turn.
+
+That manual switch is especially disruptive during long-running agentic work.
+A provider fallback chain can keep work moving, but it also changes billing
+source, model behavior, tool support, context-window limits, and vendor
+expectations. The design must make that switch explicit and capability-aware.
+
+## Principles
+
+- Fallback is opt-in. No provider switch happens unless the user configured a
+  fallback chain.
+- Billing and vendor changes are visible in the transcript and status UI.
+- Normal retry policy runs before fallback.
+- Fallback is allowed only before assistant content or tool calls have started
+  streaming for the failing request.
+- Fallback candidates must support the request shape for the current turn.
+- Authentication, authorization, malformed request, and model-not-found errors
+  do not silently switch providers by default.
+
+## Proposed Config Shape
+
+Keep the existing root `provider = "..."` setting as the primary route. Add an
+ordered fallback list and a small policy section:
+
+```toml
+provider = "nvidia-nim"
+fallback_providers = ["deepseek", "openrouter"]
+
+[provider_fallback]
+enabled = true
+reset_on_new_session = true
+```
+
+Rules:
+
+- `fallback_providers` is ordered and contains provider IDs already accepted by
+  the provider parser.
+- The primary provider is not repeated in the fallback list.
+- Duplicate fallback providers are rejected.
+- Missing credentials produce a startup warning and make that fallback entry
+  inactive until credentials appear.
+- If `provider_fallback.enabled` is absent, the presence of a non-empty
+  `fallback_providers` list enables fallback.
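+
+The rules above can be sketched as a small validation pass. This is a minimal
+Python illustration, not the CodeWhale config code: `KNOWN_PROVIDERS`,
+`validate_fallback_config`, and the `has_credentials` callback are
+hypothetical names, the provider set is an illustrative subset, and treating a
+repeated primary as a hard error is an assumption.
+
+```python
+from typing import Callable, Optional
+
+# Illustrative subset only; the real provider parser owns the accepted IDs.
+KNOWN_PROVIDERS = {"nvidia-nim", "deepseek", "openrouter"}
+
+
+def validate_fallback_config(
+    primary: str,
+    fallback_providers: list[str],
+    enabled: Optional[bool],
+    has_credentials: Callable[[str], bool],
+) -> tuple[bool, list[str], list[str]]:
+    """Return (fallback_enabled, active_chain, warnings); raise on hard errors."""
+    seen: set[str] = set()
+    for provider in fallback_providers:
+        if provider not in KNOWN_PROVIDERS:
+            raise ValueError(f"unknown fallback provider: {provider}")
+        if provider == primary:
+            raise ValueError(f"primary provider repeated in fallback list: {provider}")
+        if provider in seen:
+            raise ValueError(f"duplicate fallback provider: {provider}")
+        seen.add(provider)
+
+    # An absent `enabled` key defers to whether the list is non-empty.
+    fallback_enabled = bool(fallback_providers) if enabled is None else enabled
+
+    warnings: list[str] = []
+    active_chain: list[str] = []
+    for provider in fallback_providers:
+        if has_credentials(provider):
+            active_chain.append(provider)
+        else:
+            # Missing credentials deactivate the entry but are not fatal.
+            warnings.append(f"fallback provider {provider} has no credentials; inactive")
+    return fallback_enabled, active_chain, warnings
+```
+
+With the example config, a missing OpenRouter key would yield an active chain
+of `["deepseek"]` plus one startup warning.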
+
+## Fallback Eligibility
+
+| Failure | Fallback by default? | Notes |
+| --- | --- | --- |
+| HTTP 429 | Yes | Rate limit or quota exhaustion on the active route. |
+| HTTP 502, 503, 504 | Yes | Temporary upstream failure after normal retries. |
+| Connect timeout / DNS failure | Yes | Transport path failed before content streamed. |
+| HTTP 401 / 403 | No | Usually bad credentials or account permissions. |
+| HTTP 400 | No | Usually a client request-shape or model-parameter issue. |
+| Model not found | No | Avoid silently switching model families unless a future policy explicitly opts in. |
+| Stream interrupted after content | No | The transcript may already contain partial assistant content or tool-call deltas. |
+
+The first implementation should classify errors centrally and cover each case
+with tests before any fallback execution is wired into the turn loop.
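+
+The eligibility table can be sketched as one central classifier. This is a
+minimal Python illustration under stated assumptions: `FailureKind`,
+`classify_failure`, and `is_fallback_eligible` are hypothetical names, "model
+not found" is passed in as a flag because providers signal it differently, and
+statuses the table does not list are conservatively treated as ineligible.
+
+```python
+from enum import Enum
+from typing import Optional
+
+
+class FailureKind(Enum):
+    RATE_LIMIT = "rate_limit"
+    UPSTREAM_UNAVAILABLE = "upstream_unavailable"
+    TRANSPORT = "transport"
+    AUTH = "auth"
+    BAD_REQUEST = "bad_request"
+    MODEL_NOT_FOUND = "model_not_found"
+    STREAM_INTERRUPTED = "stream_interrupted"
+    OTHER = "other"
+
+
+FALLBACK_ELIGIBLE = {
+    FailureKind.RATE_LIMIT,
+    FailureKind.UPSTREAM_UNAVAILABLE,
+    FailureKind.TRANSPORT,
+}
+
+
+def classify_failure(
+    http_status: Optional[int],
+    transport_error: bool = False,
+    model_not_found: bool = False,
+    content_streamed: bool = False,
+) -> FailureKind:
+    """Map one failed request onto a row of the eligibility table."""
+    if content_streamed:
+        # Partial output already reached the transcript; never fall back.
+        return FailureKind.STREAM_INTERRUPTED
+    if model_not_found:
+        return FailureKind.MODEL_NOT_FOUND
+    if transport_error:
+        return FailureKind.TRANSPORT
+    if http_status == 429:
+        return FailureKind.RATE_LIMIT
+    if http_status in (502, 503, 504):
+        return FailureKind.UPSTREAM_UNAVAILABLE
+    if http_status in (401, 403):
+        return FailureKind.AUTH
+    if http_status == 400:
+        return FailureKind.BAD_REQUEST
+    return FailureKind.OTHER
+
+
+def is_fallback_eligible(kind: FailureKind) -> bool:
+    return kind in FALLBACK_ELIGIBLE
+```
+
+Keeping the decision in one function gives the unit tests a single surface to
+cover, one case per table row.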
+
+## Capability Gate
+
+Before switching to a fallback provider/model, CodeWhale checks that the
+candidate can support the current request shape:
+
+| Requirement | Gate |
+| --- | --- |
+| Tool calls | Candidate provider/model must support tool calling. |
+| Reasoning effort | Candidate must support the requested thinking mode, or the switch is blocked. |
+| Context size | Candidate context window must fit the estimated current request. |
+| Image inputs | Candidate must support vision if the turn includes images. |
+| Provider-specific headers | Candidate request must be rebuilt from that provider's own auth/base-url/header rules. |
+
+If no fallback candidate passes the gate, CodeWhale surfaces the original
+provider error with a clear "fallback chain exhausted or incompatible" note.
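+
+The gate can be sketched as a pure function over the request shape and a
+candidate's declared capabilities. This is a minimal Python illustration:
+`RequestShape`, `Candidate`, `gate_failures`, and `first_compatible` are
+hypothetical names, and the header row of the table is left out because it
+concerns request rebuilding rather than a yes/no capability.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class RequestShape:
+    needs_tools: bool
+    thinking_mode: Optional[str]  # None when no reasoning effort is requested
+    estimated_tokens: int
+    has_images: bool
+
+
+@dataclass(frozen=True)
+class Candidate:
+    provider: str
+    supports_tools: bool
+    thinking_modes: frozenset
+    context_window: int
+    supports_vision: bool
+
+
+def gate_failures(shape: RequestShape, candidate: Candidate) -> list[str]:
+    """List every requirement the candidate cannot meet; empty means it passes."""
+    failures = []
+    if shape.needs_tools and not candidate.supports_tools:
+        failures.append("tool calls")
+    if shape.thinking_mode is not None and shape.thinking_mode not in candidate.thinking_modes:
+        failures.append("reasoning effort")
+    if shape.estimated_tokens > candidate.context_window:
+        failures.append("context size")
+    if shape.has_images and not candidate.supports_vision:
+        failures.append("image inputs")
+    return failures
+
+
+def first_compatible(
+    shape: RequestShape, candidates: list[Candidate]
+) -> Optional[Candidate]:
+    """Return the first candidate, in chain order, that passes every gate."""
+    for candidate in candidates:
+        if not gate_failures(shape, candidate):
+            return candidate
+    return None
+```
+
+Returning the failed requirements, rather than a bare boolean, lets the
+"incompatible" note say which gate blocked each candidate.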
+
+## Runtime Behavior
+
+1. Build the request for the active provider.
+2. Run the existing retry policy for that provider.
+3. If retries are exhausted with a fallback-eligible failure and no assistant
+   content or tool-call deltas have streamed, evaluate the next fallback
+   provider.
+4. Rebuild the request with the fallback provider's model, base URL, auth, and
+   provider-specific headers.
+5. Add a visible transcript marker and status event before the fallback request
+   starts.
+6. Continue through the chain until a provider succeeds, the chain is
+   exhausted, or a non-eligible failure occurs.
+
+Suggested transcript marker:
+
+```text
+[provider fallback: nvidia-nim -> deepseek, reason: rate_limit]
+```
+
+Suggested status text:
+
+```text
+NVIDIA NIM unavailable; switched to DeepSeek fallback
+```
+
+For multi-request turns, such as tool-call result follow-ups, fallback can be
+considered for a later request only if that request has not yet started
+streaming assistant content. The transcript marker must make clear that the
+turn changed providers between requests.
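+
+The numbered steps above can be sketched as a loop over the chain. This is a
+minimal Python illustration, not the turn-loop code: `Attempt`,
+`run_with_fallback`, and the `send`, `is_compatible`, and `emit_marker`
+callbacks are hypothetical, and `send` is assumed to rebuild the request and
+run the normal retry policy for one provider.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Optional
+
+ELIGIBLE_REASONS = {"rate_limit", "upstream_unavailable", "transport"}
+
+
+@dataclass
+class Attempt:
+    ok: bool
+    reason: Optional[str] = None
+    content_streamed: bool = False
+
+
+def run_with_fallback(
+    primary: str,
+    chain: list[str],
+    send: Callable[[str], Attempt],
+    is_compatible: Callable[[str], bool],
+    emit_marker: Callable[[str], None],
+) -> tuple[Optional[str], list[str]]:
+    """Return (successful_provider_or_None, attempted_providers)."""
+    attempted: list[str] = []
+    remaining = list(chain)
+    current = primary
+    while True:
+        attempted.append(current)
+        attempt = send(current)
+        if attempt.ok:
+            return current, attempted
+        # Stop on non-eligible failures or once any content has streamed.
+        if attempt.content_streamed or attempt.reason not in ELIGIBLE_REASONS:
+            return None, attempted
+        next_provider = None
+        while remaining:
+            candidate = remaining.pop(0)
+            if is_compatible(candidate):
+                next_provider = candidate
+                break
+        if next_provider is None:
+            return None, attempted  # chain exhausted or incompatible
+        # The marker is emitted before the fallback request starts.
+        emit_marker(
+            f"[provider fallback: {current} -> {next_provider}, reason: {attempt.reason}]"
+        )
+        current = next_provider
+```
+
+The returned pair maps onto the receipt fields discussed in this RFC: every
+provider that was attempted, and the one that finally handled the request.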
+
+## UI and Commands
+
+- `/provider` should show the primary route and the current fallback position.
+- `/provider reset` should return to the primary provider for future requests in
+  the current session.
+- The footer/statusline should surface the concrete provider/model that actually
+  handled the latest request.
+- Session receipts should record both the attempted providers and the provider
+  that succeeded, so cost and debugging information stay truthful.
+
+## Implementation Slices
+
+1. Config schema and validation:
+   - parse `fallback_providers` and `[provider_fallback]`
+   - validate known providers, duplicates, missing credentials, and primary
+     self-reference
+   - document the config surface
+2. Error classification:
+   - define fallback-eligible error kinds
+   - add unit tests for HTTP and transport failures
+3. Request-shape capability gate:
+   - evaluate tool, thinking, context, and image requirements
+   - add tests for incompatible fallbacks
+4. Fallback execution:
+   - run retries per provider before moving to the next provider
+   - rebuild auth/base-url/header state for each candidate
+   - block fallback after partial streaming
+5. UI/receipt integration:
+   - status event
+   - transcript marker
+   - `/provider reset`
+   - receipt fields for attempted and selected provider
+
+## Non-goals
+
+- No automatic cost optimization or weighted provider selection.
+- No silent fallback when authentication or permissions fail.
+- No fallback after partial assistant content or tool-call deltas have streamed.
+- No provider/model capability downgrades without an explicit future policy.
+- No sub-agent-specific fallback policy in the first implementation; sub-agents
+  inherit the same configured fallback chain unless they are given an explicit
+  provider/model override.
+
+## Credit
+
+This RFC is based on issue #2574 from @hsdbeebou and PR #2581 from @idling11.
+The original PR head currently has no net file changes, so this document
+preserves the useful design direction while tightening the v0.9 contract around
+truthful provider routing, billing visibility, and capability checks.