Files
codewhale/docs/V0_9_0_RELEASE_ACCEPTANCE.md
T
2026-06-06 02:41:09 -07:00

17 KiB

v0.9.0 Release Acceptance Matrix

This matrix is the pre-tag gate for v0.9.0. Do not tag or publish v0.9.0 until each row is checked off or has an explicit defer decision with an owner.

For every manual smoke, record the date, OS, provider/model, command, redacted config source, result, and follow-up issue or PR.

Core Build And Packaging

Gate Owner Ship/defer decision Evidence
cargo fmt --all -- --check release steward ship Passed locally on 2026-06-06 at 2561a54df.
cargo check --workspace --all-targets --locked release steward ship Passed locally on 2026-06-06 at 2561a54df.
cargo clippy --workspace --all-targets --all-features --locked -- -D warnings release steward ship Passed locally on 2026-06-06 at 2561a54df.
cargo test --workspace --all-features --locked release steward ship Passed locally on 2026-06-06 at 2561a54df (4254 passed, 0 failed, 4 ignored in codewhale-tui; package integration and doctest suites also passed). An earlier full run hit one transient localhost SSE reset in mcp::tests::legacy_sse_closed_stream_reconnects_and_retries_tool_call; the exact test passed serially before the full rerun.
./scripts/release/check-versions.sh release steward ship Passed locally during #2845 (e22a7da53) and remains part of the PR-local release gate for each stewardship slice.
./scripts/release/check-ohos-deps.sh release steward ship Passed locally during #2845 (e22a7da53); OHOS dependency graph stayed compatible for codewhale-tui on aarch64-unknown-linux-ohos.
./scripts/release/publish-crates.sh dry-run release steward ship Passed locally on 2026-06-06 at 2561a54df. The script performed full cargo publish --dry-run for crates without unpublished workspace dependencies and package-content verification for dependent workspace crates; expected 0.8.53 already-published warnings were observed.
node scripts/release/npm-wrapper-smoke.js after release build release steward ship Passed locally on 2026-06-06 at 2561a54df after cargo build --release --locked -p codewhale-cli -p codewhale-tui. The harness packed codewhale-0.8.53.tgz, served local release assets, and verified npx --no-install codewhale doctor --help plus npx --no-install codewhale-tui --help.
GitHub release asset verification before npm publish release steward post-tag/pre-npm gate The live v0.9.0 GitHub Release does not exist yet. After tagging and before npm publish, verify the Release contains the expected platform archives, individual binaries, Windows installer/portable assets, codewhale-artifacts-sha256.txt, and codewhale-bundles-sha256.txt; npm/codewhale/scripts/verify-release-assets.js remains the npm prepublish asset guard.

Provider, Model, And Auth

Gate Owner Ship/defer decision Evidence
DeepSeek V4 direct provider smoke provider steward ship Passed locally on 2026-06-06 at 7bd68279e using macOS 26.1 arm64 release binary: ./target/release/codewhale --provider deepseek --model deepseek-v4-flash exec "Reply exactly CODEWHALE_V09_SMOKE_OK and nothing else." returned CODEWHALE_V09_SMOKE_OK. Redacted auth source: codewhale auth status --provider deepseek reported config-backed DeepSeek API key present, env unset, with no secret value printed.
Xiaomi MiMo token-plan and pay-as-you-go config smoke provider steward ship config evidence / require live smoke before tag if claiming provider availability Config coverage exercises token-plan and pay-as-you-go env behavior in crates/config/src/lib.rs (xiaomi_mimo_env_token_plan_mode_uses_token_plan_key_and_endpoint, xiaomi_mimo_env_pay_as_you_go_mode_prefers_standard_key) and mirrors the TUI config path in crates/tui/src/config.rs; docs/PROVIDERS.md documents Token Plan regions and pay-as-you-go mode. This is config evidence only, not a live Xiaomi call.
Arcee Trinity Thinking route smoke or explicit defer provider steward defer live smoke / ship static route metadata Static provider/model metadata exists in docs/PROVIDERS.md, crates/agent/src/lib.rs, and crates/tui/src/config.rs, but no live Arcee credential smoke has been recorded. Do not claim live Arcee route readiness in v0.9 release notes unless a dated manual smoke is added.
Hugging Face provider route and MCP concept helpers ship; native Hub search/passports are deferred model-lab steward ship foundation / defer native search-passport runtime ProviderKind::Huggingface, env aliases, picker/docs, and /hf concepts / /hf mcp status distinguish the chat provider route from Hugging Face MCP and explicit Hub tooling. docs/PROVIDERS.md states native Hub HTTP search/passport picker metadata are not shipped behavior in this checkout; #2705/#2707/#2712 remain open for native Model Lab work.
OpenRouter, Novita, Fireworks, and Volcengine env behavior smoke provider steward ship config evidence / require live smoke before claiming live route coverage Env/config tests cover OpenRouter, Novita, Fireworks, and Volcengine key/base-url/model override behavior in crates/config/src/lib.rs; TUI provider defaults and Volcengine env override are covered in crates/tui/src/config.rs, and docs/PROVIDERS.md documents the env/default behavior. This is env behavior evidence only, not live provider traffic.
Provider registry drift check covers aliases/default env keys provider steward ship #2820 (5d491bc68) added the metadata-only provider registry and scripts/check-provider-registry.py; verification included python3 scripts/check-provider-registry.py and cargo test -p codewhale-config provider_ -- --nocapture.
Provider-scoped TLS skip-verify remains default-off and doctor-visible security steward ship #2834 (190e9f35e, 6269cb91f) landed provider-scoped TLS skip verify with default-off config, doctor warnings, docs, and CLI/runtime option tests.

Runtime Stability

Gate Owner Ship/defer decision Evidence
Windows input/render smoke or documented manual verification runtime steward manual smoke required before tag No dated Windows input/render smoke has been recorded on this matrix yet. Unit/shell-dispatcher tests are not a substitute for Windows ConPTY/manual input verification.
macOS and Linux TUI startup smoke runtime steward partial macOS evidence recorded / Linux startup smoke required before tag On macOS 26.1 arm64 on 2026-06-06, release binaries built from the current stewardship line reported codewhale-tui 0.8.53 (2561a54df0ed) and codewhale 0.8.53 (2561a54df0ed), and cargo test -p codewhale-tui --test qa_pty --locked passed 6/6 startup/composer/keystroke PTY scenarios. This supports macOS startup confidence only; Linux release-startup evidence still must be recorded before tagging, and CI does not replace manual startup smoke.
Large-repo startup smoke runtime steward defer full smoke / ship bounded-context mitigation evidence Bounded project-context tests and changelog evidence cover the mitigation slice, but live large-workspace reports #697 and #1827 remain open. Do not close those issues or claim a full large-repo startup smoke without a dated manual run.
Sub-agent timeout/completion smoke subagent steward ship timeout/completion slice docs/SUBAGENTS.md documents per-step timeout and heartbeat behavior; crates/tui/src/tools/subagent/tests.rs covers api_timeout_preserves_checkpoint_and_agent_eval_continues_from_it, parent completion ordering, and timeout propagation. Broader hung-agent issues #1806/#2614 remain open.
Long-running command live-state smoke runtime steward defer root-cause live-state smoke / ship shell-routing tests Shell tests cover timeout/background/wait/cancel behavior and shell_job_routing.rs distinguishes live from stale process state, but #1786 remains open for shell PID/task-flow hangs and premature LIVE-state exit.
Runtime API remains token-protected for GUI clients GUI steward ship #2811/#2814 documented and consumed the existing runtime token flow from the official VS Code extension; #2822 (bb8835812) added GET /v1/snapshots behind the same runtime API token middleware.
Snapshot/restore surfaces are read-only unless mutation semantics are tested GUI steward ship #2822 (bb8835812) and #2828 (293643e27) expose restore points as read-only listing/Agent View metadata only; #2808 restore/retry/patch-undo mutation endpoints remain unmerged pending atomicity tests.

UI And Workflow UX

Gate Owner Ship/defer decision Evidence
First-look screen included or explicitly deferred UX steward defer v0.9 redesign / keep existing onboarding The existing onboarding welcome remains covered by first_run_user_always_starts_at_welcome; the opinionated v0.9 first-look/home redesign remains deferred to #2713 so release notes should not imply a new home screen.
Slash picker readability smoke UX steward ship Focused slash-menu coverage exercises visibility/hide state, removed-command filtering, Up/Down wrap behavior, argument spacing, skill command insertion, inline skill mentions, Esc priority, and locked composer height while match counts change. Verification: cargo test -p codewhale-tui slash_menu --locked, cargo test -p codewhale-tui try_autocomplete_slash_command_completes_skill_argument --locked, and cargo test -p codewhale-tui next_escape_action_slash_menu_takes_priority --locked.
Transcript tool-collapse smoke or explicit defer UX steward ship #2776 (c76ec4752) landed dense successful tool-run collapse with guardrails for failed/running/shell/patch/review/diff cells; focused widget coverage includes chat_widget_collapses_dense_tool_runs_by_default, chat_widget_expands_dense_tool_runs_on_demand, and chat_widget_expanded_mode_leaves_dense_tool_runs_visible.
Sidebar detail popovers smoke or explicit defer UX steward ship #2778 (3cb49233e) added row-level hover metadata and wrapping detail popovers for truncated Work/Tasks/Agents rows; #2806 (19f5c7aa6) preserved current sub-agent progress in the sidebar hover text. Focused coverage includes sidebar_hover_rows_mark_source_text_diff_as_truncated and subagent_hover_text_preserves_full_agent_id_and_progress.
Plan review/handoff artifact smoke Plan steward ship #2770 (7ac8063b6) added rich PlanArtifact sections through the transcript/Plan prompt path; focused coverage includes plan_update_cell_renders_rich_artifact_metadata and plan_prompt_renders_rich_plan_artifact_sections.
VS Code Agent View branch/workspace visibility smoke GUI steward ship #2825 (1bacaf763) added workspace / branch metadata to /v1/threads/summary; #2832 (50b773f1d) added read-only auto-refresh so branch/workspace changes can appear without manual refresh.

v0.9.0 Feature Gates

Gate Owner Ship/defer decision Evidence
WhaleFlow typed IR, mock executor, replay, TeacherReview, StudentReplay, and cutline docs are tested WhaleFlow steward ship #2821/#2824/#2831/#2833/#2839/#2840/#2841 plus focused local cargo test -p codewhale-whaleflow --locked; #2670 closed after cargo test -p codewhale-whaleflow starlark --locked passed 7/7 on current stewardship head. The rlm_cache_change.star dogfood workflow now has recorded mock-trace replay coverage, including a missing-record divergence check.
Live workflow_run, worktree application, provider calls, and TraceStore writes are deferred until cancellation/replay/atomicity semantics pass WhaleFlow steward defer #2669 and #2679 remain open for live runtime execution, provider calls, TraceStore writes, Arcee/student replay, and CLI/TUI workflow mode; current v0.9 branch ships mock executor/replay foundations only.
Model Lab / Hugging Face MVP is included or deferred with release-note wording model-lab steward ship provider/MCP docs foundation / defer native Model Lab MVP v0.9 ships the Hugging Face chat-provider route, provider docs, and /hf concept/MCP status helpers only. Native Hub search, model passports, Spaces/Jobs workflows, and Model Lab eval/export surfaces remain deferred to #2705/#2707/#2710/#2712/#2727.
HarnessProfile runtime MVP is deferred; schema/resolver foundation ships with release-note wording harness steward ship foundation / defer runtime #2844 (efbcc681a) documents the cutline; HarnessPosture / HarnessProfile config schema and strict validation are present; a pure resolver matches provider/model routes without changing runtime behavior; seed-profile runtime selection, telemetry, and status display remain follow-up work.
codebase_search MVP is included or deferred with release-note wording search steward defer runtime / ship design doc docs/CODEBASE_SEARCH_DESIGN.md is explicitly doc-only and says no catalog code ships in this cycle; runtime tool registration, index/eval fixtures, and search implementation remain deferred to #2680.
External memory remains explicit/optional per WHALEFLOW_EXTERNAL_MEMORY.md memory steward ship #2842 (a7052751e) added the external-memory cutline: optional/explicit workflow node/plugin only, visible state/owner/storage/scope, and no hidden default context substrate.

Remote Workbench

Gate Owner Ship/defer decision Evidence
Remote workbench is marked included, experimental, or deferred remote steward defer runtime / ship setup docs only docs/REMOTE_VM_US.md, docs/REMOTE_SETUP_DESIGN.md, and docs/TENCENT_LIGHTHOUSE_HK.md document possible VM/Telegram/Lark setup patterns, but no v0.9 remote workbench runtime is included.
If included: VM install smoke passes remote steward defer Not applicable while remote workbench runtime is deferred; no v0.9 VM install smoke is required before tagging.
If included: Telegram bridge smoke passes remote steward defer Not applicable while remote workbench runtime is deferred; Telegram bridge docs remain design/setup guidance only.
If deferred: release notes avoid implying remote workbench availability remote steward ship Acceptance matrix and changelog wording must say setup/design docs only, not a shipped remote workbench feature.

Docs, Migration, And Rollback

Gate Owner Ship/defer decision Evidence
README, configuration docs, provider docs, and changelog agree docs steward ship #2845 (e22a7da53) aligned README/config example/changelogs with the HarnessProfile cutline and removed stale V0_9_0_EXECUTION_MAP links.
Breaking changes, deprecations, and deferred v0.9 gates are listed in release notes release steward ship Changelog and this matrix list deferred Model Lab/Hugging Face native Hub work, codebase_search, remote workbench runtime, WhaleFlow live runtime execution, HarnessProfile runtime selection, large-repo startup smoke, long-running command live-state smoke, and Arcee live smoke. .github/workflows/release.yml release-body text avoids stale v0.8.x-only shim wording and keeps CodeWhale as the canonical package/asset name.
Upgrade steps exist for users coming from deepseek-tui docs steward ship docs/REBRAND.md documents npm/Cargo migration commands, legacy state fallback, binary/package/asset naming, and the v0.9.0 compatibility cutline.
Rollback steps exist for npm wrapper, Cargo install, and side-git restore release steward ship docs/INSTALL.md#roll-back-to-a-previous-release and docs/RELEASE_RUNBOOK.md#recovery-and-rollback document pinned npm rollback, pinned Cargo rollback for both crates, exact-tag manual asset restore with checksums, and side-git /restore list [N] / /restore <N> workspace rollback.
Live GitHub Release body has its own contributor/credit section release steward post-tag/pre-npm gate .github/workflows/release.yml now creates a dedicated ## Contributors release-body section with v0.9 contributor, reporter, helper, and harvested-PR credits. The live v0.9.0 Release does not exist yet, so this remains a release-time verification gate before npm publish or completion.
Contributors/reporters/helpers from harvested PRs and linked issues are credited release steward ship local changelog / verify live body at release time Changelog credits include harvested PR authors, issue reporters/helpers, and external/co-authored work including @Implementist, @jrcjrcc, and @punkcanyang. python3 scripts/check-coauthor-trailers.py --author-map .github/AUTHOR_MAP --range origin/main..HEAD --check-authors remains the local co-author-map gate; live release-body credits are covered by the row above.

Before Tagging

  • Every ship row has evidence.
  • Every decide row is changed to either ship with evidence or defer with an owner and linked follow-up.
  • Every manual smoke required row has dated smoke evidence, or is changed to an explicit defer decision with an owner and linked follow-up.
  • Draft integration PR CI is green on the exact commit that will be tagged.
  • The release prompt points new agents to this matrix before any tag, publish, or GitHub Release action.