chore(release): merge v0.9.0-stewardship into v0.8.54

Includes Paulo's command parity and Gherkin E2E harnesses, HUQIANTAO's concurrency/security fixes, LeoAlex0's runtime_prompt slim, reidliu41's hotbar persistence, HarmonyOS scaffolding, Whaleflow foundation crate, and all v0.9.0 stabilization work.
2026-06-08 06:54:09 -07:00
parent edd28066e1 f88528a5a3
commit 78ae354fa4
237 changed files with 41229 additions and 4498 deletions
@@ -0,0 +1,18 @@
+# HarmonyOS/OpenHarmony cross-build paths are intentionally not configured
+# here. Cargo does not expand environment variables inside target linker paths
+# or CMake toolchain paths, so checked-in absolute SDK paths make the workspace
+# machine-specific.
+#
+# See docs/HarmonyOS.md for setup details.
+#
+# Set OHOS_NATIVE_SDK to the OpenHarmony native SDK directory, then load one of:
+#
+#   PowerShell:
+#     . .\scripts\ohos-env.ps1
+#
+#   Linux/macOS:
+#     . ./scripts/ohos-env.sh
+#
+# The setup scripts export Cargo's target-specific linker, AR, CC, CXX, CFLAGS,
+# CXXFLAGS, CARGO_ENCODED_RUSTFLAGS, CC_SHELL_ESCAPED_FLAGS, and
+# CMAKE_TOOLCHAIN_FILE variables for aarch64-unknown-linux-ohos.
@@ -38,6 +38,7 @@
      script: |
        set -eu
        ./scripts/release/check-versions.sh
+        ./scripts/release/check-ohos-deps.sh
        cargo fmt --all -- --check
        cargo check --workspace --all-targets --locked
        cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
@@ -75,6 +76,7 @@
      script: |
        set -eu
        ./scripts/release/check-versions.sh
+        ./scripts/release/check-ohos-deps.sh
        cargo fmt --all -- --check
        cargo check --workspace --all-targets --locked
        cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
@@ -123,6 +125,7 @@ $:
            apt-get install -y git libdbus-1-dev nodejs pkg-config

            ./scripts/release/check-versions.sh
+            ./scripts/release/check-ohos-deps.sh
            cargo build --release --locked -p codewhale-cli -p codewhale-tui

            mkdir -p target/cnb-release
@@ -3,5 +3,11 @@
 # produces different compiled binaries on Windows vs Linux/macOS.
 crates/tui/src/prompts/*.md text eol=lf

+# Rustfmt writes LF; keep Rust sources stable across Windows/Linux/macOS.
+*.rs text eol=lf
+
+# Keep repository attributes themselves stable on every platform.
+.gitattributes text eol=lf
+
 # Everything else auto-detects (default).
 * text=auto
@@ -9,3 +9,56 @@
 #   issue:username
 #   all:username
 all:hmbown
+all:reidliu41
+all:ousamabenyounes
+all:ljm3790865
+all:HUQIANTAO
+all:xyuai
+all:merchloubna70-dot
+all:h3c-hexin
+all:axobase001
+all:donglovejava
+all:Oliver-ZPLiu
+all:idling11
+all:angziii
+all:aboimpinto
+all:encyc
+all:Duducoco
+all:cyq1017
+all:zlh124
+all:THINKER-ONLY
+all:nightt5879
+all:Liu-Vince
+all:JiarenWang
+all:wdw8276
+all:pengyou200902
+all:linzhiqin2003
+all:LING71671
+all:JasonOA888
+all:Inference1
+all:hongqitai
+all:gordonlu
+all:gaord
+all:zhuangbiaowei
+all:yuanchenglu
+all:Vishnu1837
+all:sximelon
+all:Sskift
+all:New2Niu
+all:shenjackyuanjie
+all:AdityaVG13
+all:mvanhorn
+all:MengZ-super
+all:membphis
+all:LeoAlex0
+all:Lee-take
+all:lbcheng888
+all:Implementist
+all:jrcjrcc
+all:yusufgurdogan
+all:kunpeng-ai-lab
+all:elowen53
+all:CrepuscularIRIS
+all:chnjames
+all:ChaceLyee2101
+all:AresNing
@@ -0,0 +1,106 @@
+# Contributor credit identity map.
+#
+# Format:
+#   alias = Display Name <id+login@users.noreply.github.com>
+#
+# The right-hand side must use GitHub's numeric noreply address so harvested
+# co-author credit lands in the contributor graph. The left-hand side may be a
+# GitHub login, old-style noreply address, raw email from a contributor commit,
+# or local machine email seen in older harvested history.
+
+hmbown = Hmbown <101357273+Hmbown@users.noreply.github.com>
+reidliu41 = reidliu41 <61492567+reidliu41@users.noreply.github.com>
+reid201711@gmail.com = reidliu41 <61492567+reidliu41@users.noreply.github.com>
+ousamabenyounes = Ben Younes <2910651+ousamabenyounes@users.noreply.github.com>
+benyounes.ousama@gmail.com = Ben Younes <2910651+ousamabenyounes@users.noreply.github.com>
+ljm3790865 = ljm3790865 <263429444+ljm3790865@users.noreply.github.com>
+HUQIANTAO = HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
+Hu Qiantao = HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
+huqiantao@users.noreply.github.com = HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
+huqiantao@HudeMacBook-Air.local = HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
+tom_huu@qq.com = HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
+punkcanyang = Punkcan Yang <36871858+punkcanyang@users.noreply.github.com>
+Punkcan Yang = Punkcan Yang <36871858+punkcanyang@users.noreply.github.com>
+bucunzai@gmail.com = Punkcan Yang <36871858+punkcanyang@users.noreply.github.com>
+merchloubna70-dot = merchloubna70-dot <258170091+merchloubna70-dot@users.noreply.github.com>
+h3c-hexin = h3c-hexin <13790929+h3c-hexin@users.noreply.github.com>
+he.xin@h3c.com = h3c-hexin <13790929+h3c-hexin@users.noreply.github.com>
+axobase001 = axobase001 <138223345+axobase001@users.noreply.github.com>
+donglovejava = donglovejava <211940267+donglovejava@users.noreply.github.com>
+Oliver-ZPLiu = Oliver-ZPLiu <47081637+Oliver-ZPLiu@users.noreply.github.com>
+idling11 = idling11 <8055620+idling11@users.noreply.github.com>
+Hanmiao Li = idling11 <8055620+idling11@users.noreply.github.com>
+894876246@qq.com = idling11 <8055620+idling11@users.noreply.github.com>
+angziii = angziii <177907677+angziii@users.noreply.github.com>
+aboimpinto = aboimpinto <1231687+aboimpinto@users.noreply.github.com>
+Paulo Aboim Pinto = aboimpinto <1231687+aboimpinto@users.noreply.github.com>
+aboimpinto@gmail.com = aboimpinto <1231687+aboimpinto@users.noreply.github.com>
+encyc = encyc <62669951+encyc@users.noreply.github.com>
+Duducoco = Duducoco <69681789+Duducoco@users.noreply.github.com>
+cyq1017 = cyq1017 <61975706+cyq1017@users.noreply.github.com>
+cyq = cyq1017 <61975706+cyq1017@users.noreply.github.com>
+15000851237@163.com = cyq1017 <61975706+cyq1017@users.noreply.github.com>
+zlh124 = zlh124 <56312993+zlh124@users.noreply.github.com>
+THINKER-ONLY = THINKER-ONLY <181556007+THINKER-ONLY@users.noreply.github.com>
+nightt5879 = nightt5879 <87569709+nightt5879@users.noreply.github.com>
+Liu-Vince = Liu-Vince <56624166+Liu-Vince@users.noreply.github.com>
+Vince = Liu-Vince <56624166+Liu-Vince@users.noreply.github.com>
+liuwenchang.x@qq.com = Liu-Vince <56624166+Liu-Vince@users.noreply.github.com>
+JiarenWang = JiarenWang <33421508+JiarenWang@users.noreply.github.com>
+wdw8276 = wdw8276 <3972439+wdw8276@users.noreply.github.com>
+pengyou200902 = pengyou200902 <35026241+pengyou200902@users.noreply.github.com>
+linzhiqin2003 = linzhiqin2003 <123250980+linzhiqin2003@users.noreply.github.com>
+LING71671 = LING71671 <231181387+LING71671@users.noreply.github.com>
+JasonOA888 = JasonOA888 <101583541+JasonOA888@users.noreply.github.com>
+Inference1 = Inference1 <68734681+Inference1@users.noreply.github.com>
+hongqitai = hongqitai <188678175+hongqitai@users.noreply.github.com>
+gordonlu = gordonlu <3125629+gordonlu@users.noreply.github.com>
+gaord = gaord <9567937+gaord@users.noreply.github.com>
+Ben Gao = gaord <9567937+gaord@users.noreply.github.com>
+bengao168@msn.com = gaord <9567937+gaord@users.noreply.github.com>
+zhuangbiaowei = zhuangbiaowei <93194+zhuangbiaowei@users.noreply.github.com>
+yuanchenglu = yuanchenglu <4088730+yuanchenglu@users.noreply.github.com>
+Vishnu1837 = Vishnu1837 <104626273+Vishnu1837@users.noreply.github.com>
+sximelon = sximelon <15710511+sximelon@users.noreply.github.com>
+Sskift = Sskift <163287349+Sskift@users.noreply.github.com>
+New2Niu = New2Niu <19551155+New2Niu@users.noreply.github.com>
+mvanhorn = mvanhorn <455140+mvanhorn@users.noreply.github.com>
+MengZ-super = MengZ-super <121712068+MengZ-super@users.noreply.github.com>
+membphis = membphis <6814606+membphis@users.noreply.github.com>
+LeoAlex0 = LeoAlex0 <31839998+LeoAlex0@users.noreply.github.com>
+Lee-take = Lee-take <210963840+Lee-take@users.noreply.github.com>
+lbcheng888 = lbcheng888 <6716643+lbcheng888@users.noreply.github.com>
+kunpeng-ai-lab = kunpeng-ai-lab <16793595+kunpeng-ai-lab@users.noreply.github.com>
+elowen53 = elowen53 <88364845+elowen53@users.noreply.github.com>
+Elowen = elowen53 <88364845+elowen53@users.noreply.github.com>
+xrnc@outlook.com = elowen53 <88364845+elowen53@users.noreply.github.com>
+CrepuscularIRIS = CrepuscularIRIS <126939795+CrepuscularIRIS@users.noreply.github.com>
+chnjames = chnjames <44110547+chnjames@users.noreply.github.com>
+ChaceLyee2101 = ChaceLyee2101 <95995339+ChaceLyee2101@users.noreply.github.com>
+ci4ic4 = ci4ic4 <6495973+ci4ic4@users.noreply.github.com>
+Chavdar Ivanov = ci4ic4 <6495973+ci4ic4@users.noreply.github.com>
+ci4ic4@gmail.com = ci4ic4 <6495973+ci4ic4@users.noreply.github.com>
+yusufgurdogan = yusufgurdogan <13736056+yusufgurdogan@users.noreply.github.com>
+Yusuf Gurdogan = yusufgurdogan <13736056+yusufgurdogan@users.noreply.github.com>
+hotelswith = yusufgurdogan <13736056+yusufgurdogan@users.noreply.github.com>
+contact@hotelswith.com = yusufgurdogan <13736056+yusufgurdogan@users.noreply.github.com>
+AresNing = AresNing <49557311+AresNing@users.noreply.github.com>
+
+shenjackyuanjie = shenjackyuanjie <54507071+shenjackyuanjie@users.noreply.github.com>
+shenjack = shenjackyuanjie <54507071+shenjackyuanjie@users.noreply.github.com>
+3695888@qq.com = shenjackyuanjie <54507071+shenjackyuanjie@users.noreply.github.com>
+xyuai = xyuai <281015099+xyuai@users.noreply.github.com>
+AdityaVG13 = AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
+adityavgcode@gmail.com = AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
+Implementist = Implementist <24910011+Implementist@users.noreply.github.com>
+implecao = Implementist <24910011+Implementist@users.noreply.github.com>
+yuyuyu4993@qq.com = Implementist <24910011+Implementist@users.noreply.github.com>
+jrcjrcc = jrcjrcc <192965070+jrcjrcc@users.noreply.github.com>
+jrcjrcc@users.noreply.github.com = jrcjrcc <192965070+jrcjrcc@users.noreply.github.com>
+RefuseOdd = RefuseOdd <192543033+RefuseOdd@users.noreply.github.com>
+wywsoor = wywsoor <26341601+wywsoor@users.noreply.github.com>
+hsdbeebou = hsdbeebou <284843096+hsdbeebou@users.noreply.github.com>
+tdccccc = tdccccc <79492752+tdccccc@users.noreply.github.com>
+greyfreedom = greyfreedom <11493871+greyfreedom@users.noreply.github.com>
+greyfreedom@163.com = greyfreedom <11493871+greyfreedom@users.noreply.github.com>
+puneetdixit200 = puneetdixit200 <236133619+puneetdixit200@users.noreply.github.com>
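For illustration, the `alias = Display Name <id+login@users.noreply.github.com>` format described in the header comment above could be parsed roughly as follows. This is a minimal sketch, not the logic in `scripts/check-coauthor-trailers.py` (which this diff does not show); `parse_author_map`, `NUMERIC_NOREPLY`, and the exact regular expression are hypothetical and assumed for the example.

```python
import re

# Assumed shape of the right-hand side:
#   Display Name <id+login@users.noreply.github.com>
NUMERIC_NOREPLY = re.compile(
    r"^(?P<name>.+?) <(?P<id>\d+)\+(?P<login>[A-Za-z0-9-]+)@users\.noreply\.github\.com>$"
)


def parse_author_map(text):
    """Return {alias: (display_name, numeric_id, login)}; raise ValueError on bad lines."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        alias, sep, identity = line.partition(" = ")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'alias = Name <email>'")
        match = NUMERIC_NOREPLY.match(identity.strip())
        if match is None:
            raise ValueError(f"line {lineno}: not a numeric noreply identity")
        entries[alias.strip()] = (match["name"], int(match["id"]), match["login"])
    return entries
```

Splitting on the first ` = ` keeps aliases that contain spaces or `@` (display names, raw emails) intact, while the right-hand side is held to the numeric noreply shape the header comment requires.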
@@ -11,3 +11,4 @@
 - [ ] Updated docs or comments as needed
 - [ ] Added or updated tests where relevant
 - [ ] Verified TUI behavior manually if UI changes
+- [ ] Harvested/co-authored credit uses a GitHub numeric noreply address
@@ -3,7 +3,7 @@
 #
 # Expected environment:
 #   TAG       – git tag, e.g. "v0.8.31"
-#   MANIFEST  – path to deepseek-artifacts-sha256.txt
+#   MANIFEST  – path to codewhale-artifacts-sha256.txt
 #   TAP_REPO  – owner/repo of the Homebrew tap
 #   TOKEN     – PAT with contents:write on TAP_REPO (optional; skips if unset)

@@ -43,15 +43,6 @@ readonly SHA_COD_LINUX_ARM="$(sha codewhale-linux-arm64)"
 readonly SHA_TUI_LINUX_ARM="$(sha codewhale-tui-linux-arm64)"
 readonly SHA_COD_LINUX_X64="$(sha codewhale-linux-x64)"
 readonly SHA_TUI_LINUX_X64="$(sha codewhale-tui-linux-x64)"
-# Legacy shims (removed in v0.9.0)
-readonly SHA_LEG_MACOS_ARM="$(sha deepseek-macos-arm64)"
-readonly SHA_LEG_TUI_MACOS_ARM="$(sha deepseek-tui-macos-arm64)"
-readonly SHA_LEG_MACOS_X64="$(sha deepseek-macos-x64)"
-readonly SHA_LEG_TUI_MACOS_X64="$(sha deepseek-tui-macos-x64)"
-readonly SHA_LEG_LINUX_ARM="$(sha deepseek-linux-arm64)"
-readonly SHA_LEG_TUI_LINUX_ARM="$(sha deepseek-tui-linux-arm64)"
-readonly SHA_LEG_LINUX_X64="$(sha deepseek-linux-x64)"
-readonly SHA_LEG_TUI_LINUX_X64="$(sha deepseek-tui-linux-x64)"

 # --- temp dirs --------------------------------------------------------

@@ -78,14 +69,6 @@ class DeepseekTui < Formula
        url "${BASE_URL}/codewhale-tui-macos-arm64", using: :nounzip
        sha256 "${SHA_TUI_MACOS_ARM}"
      end
-      resource "legacy-shim" do
-        url "${BASE_URL}/deepseek-macos-arm64", using: :nounzip
-        sha256 "${SHA_LEG_MACOS_ARM}"
-      end
-      resource "legacy-tui-shim" do
-        url "${BASE_URL}/deepseek-tui-macos-arm64", using: :nounzip
-        sha256 "${SHA_LEG_TUI_MACOS_ARM}"
-      end
    else
      url "${BASE_URL}/codewhale-macos-x64", using: :nounzip
      sha256 "${SHA_COD_MACOS_X64}"
@@ -93,14 +76,6 @@ class DeepseekTui < Formula
        url "${BASE_URL}/codewhale-tui-macos-x64", using: :nounzip
        sha256 "${SHA_TUI_MACOS_X64}"
      end
-      resource "legacy-shim" do
-        url "${BASE_URL}/deepseek-macos-x64", using: :nounzip
-        sha256 "${SHA_LEG_MACOS_X64}"
-      end
-      resource "legacy-tui-shim" do
-        url "${BASE_URL}/deepseek-tui-macos-x64", using: :nounzip
-        sha256 "${SHA_LEG_TUI_MACOS_X64}"
-      end
    end
  end

@@ -112,14 +87,6 @@ class DeepseekTui < Formula
        url "${BASE_URL}/codewhale-tui-linux-arm64", using: :nounzip
        sha256 "${SHA_TUI_LINUX_ARM}"
      end
-      resource "legacy-shim" do
-        url "${BASE_URL}/deepseek-linux-arm64", using: :nounzip
-        sha256 "${SHA_LEG_LINUX_ARM}"
-      end
-      resource "legacy-tui-shim" do
-        url "${BASE_URL}/deepseek-tui-linux-arm64", using: :nounzip
-        sha256 "${SHA_LEG_TUI_LINUX_ARM}"
-      end
    else
      url "${BASE_URL}/codewhale-linux-x64", using: :nounzip
      sha256 "${SHA_COD_LINUX_X64}"
@@ -127,22 +94,12 @@ class DeepseekTui < Formula
        url "${BASE_URL}/codewhale-tui-linux-x64", using: :nounzip
        sha256 "${SHA_TUI_LINUX_X64}"
      end
-      resource "legacy-shim" do
-        url "${BASE_URL}/deepseek-linux-x64", using: :nounzip
-        sha256 "${SHA_LEG_LINUX_X64}"
-      end
-      resource "legacy-tui-shim" do
-        url "${BASE_URL}/deepseek-tui-linux-x64", using: :nounzip
-        sha256 "${SHA_LEG_TUI_LINUX_X64}"
-      end
    end
  end

  def install
    bin.install Dir["*"].first => "codewhale"
    resource("tui").stage { bin.install Dir["*"].first => "codewhale-tui" }
-    resource("legacy-shim").stage { bin.install Dir["*"].first => "deepseek" }
-    resource("legacy-tui-shim").stage { bin.install Dir["*"].first => "deepseek-tui" }
  end

  test do
@@ -27,12 +27,16 @@ jobs:
          node-version: 20
      - name: Check version drift
        run: ./scripts/release/check-versions.sh
+      - name: Check OHOS dependency graph
+        run: ./scripts/release/check-ohos-deps.sh

  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt, clippy
@@ -50,6 +54,22 @@ jobs:
        run: cargo clippy --workspace --all-features --locked -- -D warnings
      - name: Check provider registry drift
        run: python3 scripts/check-provider-registry.py
+      - name: Check harvested contributor credit
+        if: github.event_name != 'schedule'
+        shell: bash
+        run: |
+          if [[ "${{ github.event_name }}" == "pull_request" ]]; then
+            git fetch --no-tags origin "${{ github.base_ref }}"
+            RANGE="origin/${{ github.base_ref }}..HEAD"
+          elif [[ "${{ github.event.before }}" != "0000000000000000000000000000000000000000" ]]; then
+            RANGE="${{ github.event.before }}..${{ github.sha }}"
+          else
+            RANGE="HEAD~1..HEAD"
+          fi
+          python3 scripts/check-coauthor-trailers.py \
+            --author-map .github/AUTHOR_MAP \
+            --range "$RANGE" \
+            --check-authors
      - name: Linux clippy location
        run: echo "Linux clippy/test gates run on CNB for mirrored fix/*, rebrand/*, work/v*, and main branches."
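The commit-range selection in the "Check harvested contributor credit" step above has three branches. As a hedged sketch, the same decision can be written as a pure function; `commit_range` and `ZERO_SHA` are hypothetical names, and the function only mirrors the shell branches shown in the hunk (it does not run `git fetch`).

```python
# Git's all-zero object id; a push event reports it as `before` for a newly created ref.
ZERO_SHA = "0" * 40


def commit_range(event_name, base_ref, before, sha):
    """Mirror the three branches of the workflow step's RANGE selection."""
    if event_name == "pull_request":
        # Compare the PR head against the fetched base branch.
        return f"origin/{base_ref}..HEAD"
    if before != ZERO_SHA:
        # Ordinary push: check only the commits added by this push.
        return f"{before}..{sha}"
    # No previous commit on the ref: fall back to the single latest commit.
    return "HEAD~1..HEAD"
```

The `fetch-depth: 0` added to the checkout step matters here, since a shallow clone may not contain the commit on the left-hand side of the range.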

@@ -1,4 +1,4 @@
-name: Contribution gate - issues
+name: Contribution intake - issues

 on:
  issues:
@@ -8,16 +8,11 @@ permissions:
  contents: read
  issues: write

-env:
-  # Keep new gates observable first. Switch to "enforce" only after maintainers
-  # have seeded active contributors and reviewed the dry-run signal.
-  CONTRIBUTION_GATE_MODE: dry-run
-
 jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
-      - name: Gate unapproved external issues
+      - name: Welcome new external issue reporters
        uses: actions/github-script@v7
        with:
          script: |
@@ -25,12 +20,6 @@ jobs:
            const owner = context.repo.owner;
            const repo = context.repo.repo;
            const privileged = new Set(['OWNER', 'MEMBER', 'COLLABORATOR']);
-            const gateMode = (process.env.CONTRIBUTION_GATE_MODE || 'dry-run').trim().toLowerCase();
-            const enforceGate = gateMode === 'enforce';
-
-            if (!['dry-run', 'enforce'].includes(gateMode)) {
-              core.warning(`Unknown CONTRIBUTION_GATE_MODE "${gateMode}"; defaulting to dry-run.`);
-            }

            if (privileged.has(issue.author_association)) return;
            if (issue.user.login === 'github-actions[bot]') return;
@@ -71,29 +60,25 @@ jobs:
              return;
            }

-            const gateMessage = enforceGate
-              ? 'This repository currently uses a maintainer-managed contribution gate, so issues from contributors who are not listed in `.github/APPROVED_CONTRIBUTORS` are closed automatically.'
-              : 'This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this issue is staying open. When enforcement is enabled, issues from contributors who are not listed in `.github/APPROVED_CONTRIBUTORS` will be closed automatically.';
+            const marker = '<!-- codewhale-issue-intake -->';
+            const { data: comments } = await github.rest.issues.listComments({
+              owner,
+              repo,
+              issue_number: issue.number,
+              per_page: 100,
+            });
+            if (comments.some(comment => (comment.body || '').includes(marker))) return;

            await github.rest.issues.createComment({
              owner,
              repo,
              issue_number: issue.number,
              body: [
+                marker,
                `Thanks @${issue.user.login} for the report.`,
                '',
-                gateMessage,
+                'This issue is staying open for maintainer triage. CodeWhale gets better because people bring us real edge cases from real machines, providers, regions, and workflows.',
                '',
-                'Please read `CONTRIBUTING.md` for the expected issue shape. A maintainer can grant issue access by commenting `/lgtmi` on an issue.',
+                'If you can add a reproduction, logs, version output, screenshots, or the provider/model involved, that makes it much easier for us to verify and harvest the fix. Maintainers may comment `/lgtmi` to mark recurring issue reporters as approved so this intake note is skipped next time.',
              ].join('\n'),
            });
-
-            if (!enforceGate) return;
-
-            await github.rest.issues.update({
-              owner,
-              repo,
-              issue_number: issue.number,
-              state: 'closed',
-              state_reason: 'not_planned',
-            });
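The hidden HTML-comment marker introduced in this hunk makes the intake note idempotent: the workflow posts only when no existing comment already contains the marker. A minimal sketch of that check, with the hypothetical helper `needs_intake_comment` standing in for the inline `comments.some(...)` test:

```python
MARKER = "<!-- codewhale-issue-intake -->"


def needs_intake_comment(comment_bodies, marker=MARKER):
    """Return True when none of the given comment bodies carries the marker.

    A body may be None (the workflow guards with `comment.body || ''`),
    so None is treated as an empty string here as well.
    """
    return not any(marker in (body or "") for body in comment_bodies)
```

Like the workflow, which requests a single page of up to 100 comments, this only considers the bodies it is given; an issue with more comments than one page would need pagination before the check is exhaustive.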
@@ -73,21 +73,32 @@ jobs:
            }

            const gateMessage = enforceGate
-              ? 'This repository currently uses a maintainer-managed contribution gate, so pull requests from contributors who are not listed in `.github/APPROVED_CONTRIBUTORS` are closed automatically.'
-              : 'This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in `.github/APPROVED_CONTRIBUTORS` will be closed automatically.';
+              ? 'This repository currently limits automated PR intake to contributors listed in `.github/APPROVED_CONTRIBUTORS`. This is a maintainer-safety control for code review and CI load, not a judgment on the contribution. A maintainer can grant recurring PR access with `/lgtm` after review; once the generated allowlist PR is merged, this pull request can be reopened or resubmitted.'
+              : 'This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered.';

-            await github.rest.issues.createComment({
+            const marker = '<!-- codewhale-pr-gate -->';
+            const { data: comments } = await github.rest.issues.listComments({
              owner,
              repo,
              issue_number: pr.number,
-              body: [
-                `Thanks @${pr.user.login} for taking the time to contribute.`,
-                '',
-                gateMessage,
-                '',
-                'Please read `CONTRIBUTING.md` for the expected contribution shape. A maintainer can grant PR access by commenting `/lgtm` on a pull request.',
-              ].join('\n'),
+              per_page: 100,
            });
+            const alreadyNoted = comments.some(comment => (comment.body || '').includes(marker));
+            if (!alreadyNoted) {
+              await github.rest.issues.createComment({
+                owner,
+                repo,
+                issue_number: pr.number,
+                body: [
+                  marker,
+                  `Thanks @${pr.user.login} for taking the time to contribute.`,
+                  '',
+                  gateMessage,
+                  '',
+                  'Please read `CONTRIBUTING.md` for the expected contribution shape. A maintainer can grant recurring PR access by commenting `/lgtm` on a pull request.',
+                ].join('\n'),
+              });
+            }

            if (!enforceGate) return;

@@ -42,6 +42,8 @@ jobs:
        run: cargo fmt --all -- --check
      - name: Compile check
        run: cargo check --workspace --all-targets --locked
+      - name: OHOS dependency graph
+        run: ./scripts/release/check-ohos-deps.sh
      - name: Clippy
        run: cargo clippy --workspace --all-targets --all-features --locked -- -D warnings
      - name: Workspace tests
@@ -157,48 +159,6 @@ jobs:
            target: x86_64-pc-windows-msvc
            binary: codewhale-tui.exe
            artifact_name: codewhale-tui-windows-x64.exe
-          # --- deepseek (legacy dispatcher shim; removed in v0.9.0) ---
-          - os: ubuntu-latest
-            target: x86_64-unknown-linux-gnu
-            binary: deepseek
-            artifact_name: deepseek-linux-x64
-          - os: ubuntu-latest
-            target: aarch64-unknown-linux-gnu
-            binary: deepseek
-            artifact_name: deepseek-linux-arm64
-          - os: macos-latest
-            target: x86_64-apple-darwin
-            binary: deepseek
-            artifact_name: deepseek-macos-x64
-          - os: macos-latest
-            target: aarch64-apple-darwin
-            binary: deepseek
-            artifact_name: deepseek-macos-arm64
-          - os: windows-latest
-            target: x86_64-pc-windows-msvc
-            binary: deepseek.exe
-            artifact_name: deepseek-windows-x64.exe
-          # --- deepseek-tui (legacy TUI shim; removed in v0.9.0) ---
-          - os: ubuntu-latest
-            target: x86_64-unknown-linux-gnu
-            binary: deepseek-tui
-            artifact_name: deepseek-tui-linux-x64
-          - os: ubuntu-latest
-            target: aarch64-unknown-linux-gnu
-            binary: deepseek-tui
-            artifact_name: deepseek-tui-linux-arm64
-          - os: macos-latest
-            target: x86_64-apple-darwin
-            binary: deepseek-tui
-            artifact_name: deepseek-tui-macos-x64
-          - os: macos-latest
-            target: aarch64-apple-darwin
-            binary: deepseek-tui
-            artifact_name: deepseek-tui-macos-arm64
-          - os: windows-latest
-            target: x86_64-pc-windows-msvc
-            binary: deepseek-tui.exe
-            artifact_name: deepseek-tui-windows-x64.exe
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
@@ -502,8 +462,6 @@ jobs:
      - uses: actions/download-artifact@v4
        with:
          path: artifacts
-          # Match both the canonical `codewhale*` artifacts and the legacy
-          # `deepseek*` shim artifacts that ship for the transition release.
          pattern: '*'
      - name: Generate Windows npm launcher asset
        shell: bash
@@ -535,10 +493,6 @@ jobs:
            base="$(basename "${file}")"
            printf '%s  %s\n' "${hash}" "${base}" >> "${manifest}"
          done < <(find artifacts -type f ! -path 'artifacts/checksums/*' -print0 | sort -z)
-          # Legacy alias manifest so v0.8.40 `deepseek update` clients can
-          # still find a manifest by their hardcoded name. Same content; will
-          # be removed once the legacy shim binaries are retired in v0.9.0.
-          cp "${manifest}" "artifacts/checksums/deepseek-artifacts-sha256.txt"
          cat "${manifest}"
      - uses: softprops/action-gh-release@v1
        with:
@@ -546,13 +500,11 @@ jobs:
          files: artifacts/*/*
          prerelease: false
          body: |
-            > This release renames the project to **CodeWhale**. The legacy
-            > `deepseek` and `deepseek-tui` binaries continue to ship as
-            > compatibility-only deprecation shims during v0.8.x; they print a
-            > one-line warning and forward to `codewhale` / `codewhale-tui`.
-            > They will be removed in v0.9.0. The legacy npm package
-            > `deepseek-tui` is deprecated and receives no further releases.
-            > See `docs/REBRAND.md` for the full migration story.
+            > **CodeWhale** is the canonical project, command, npm package, and
+            > release-asset name. The legacy npm package `deepseek-tui` is
+            > deprecated and receives no further releases. Users coming from
+            > v0.8.x legacy `deepseek` / `deepseek-tui` names should migrate
+            > with `docs/REBRAND.md`.

            ## Install

@@ -573,7 +525,7 @@ jobs:
              ghcr.io/hmbown/codewhale:${{ needs.resolve.outputs.tag }}
            ```

-            The image ships the `codewhale` dispatcher and `codewhale-tui` runtime (plus the legacy `deepseek` / `deepseek-tui` shims during the transition). The `latest` tag is also updated on release.
+            The image ships the `codewhale` dispatcher and `codewhale-tui` runtime. The `latest` tag is also updated on release.

            ### Cargo (Linux / macOS)

@@ -613,7 +565,7 @@ jobs:

            The **portable** Windows archive skips the install script — extract and run from any directory. The NSIS installer is currently unsigned and may trigger Windows SmartScreen until a signing certificate is wired into the release pipeline.

-            Individual binaries are also attached below for scripting and the npm wrapper. Legacy `deepseek-*` and `deepseek-tui-*` assets are compatibility-only deprecation shims for v0.8.x so that existing `deepseek update` invocations on v0.8.40 keep working; they forward to the canonical binaries. The legacy npm package `deepseek-tui` is deprecated and is not republished.
+            Individual binaries are also attached below for scripting and the npm wrapper. The legacy npm package `deepseek-tui` is deprecated and is not republished. For migration from v0.8.x legacy binary names, see `docs/REBRAND.md`.

            ### Verify (recommended)

@@ -631,7 +583,19 @@ jobs:
            shasum -a 256 -c codewhale-artifacts-sha256.txt
            ```

-            The legacy `deepseek-artifacts-sha256.txt` is also attached for backward compatibility and contains the same hashes as the canonical manifest.
+            ## Contributors
+
+            Thanks to @sximelon, @cyq1017, @Artenx, @LHqweasd, @wywsoor,
+            @hsdbeebou, @mserrano11, @Dr3259, @yekern, @lioryx,
+            @puneetdixit200, @HUQIANTAO, @xyuai, @gaord, @shenjackyuanjie,
+            @AdityaVG13, @aboimpinto, @ousamabenyounes, @reidliu41,
+            @ljm3790865, @idling11, @h3c-hexin, @AresNing, @tdccccc,
+            @qiyuanlicn, @bevis-wong, @shuxiangxuebiancheng, @hongqitai,
+            @NASLXTO, @wuxixing, @linzhiqin2003, @merchloubna70-dot,
+            @mvanhorn, @Implementist, @jrcjrcc, @punkcanyang,
+            @yusufgurdogan, @LeoAlex0, @mo-vic, @AiurArtanis, @nasus9527,
+            and @lbcheng888 for reports, PRs, reviews, reproductions,
+            design direction, and harvested work that shaped v0.9.0.

            ## Changelog

@@ -668,13 +632,13 @@ jobs:
        run: |
          gh release download ${{ needs.resolve.outputs.tag }} \
            --repo ${{ github.repository }} \
-            --pattern 'deepseek-artifacts-sha256.txt' \
+            --pattern 'codewhale-artifacts-sha256.txt' \
            --dir /tmp
      - name: Update Homebrew tap
        if: steps.homebrew-token.outputs.available == 'true'
        env:
          TAG: ${{ needs.resolve.outputs.tag }}
-          MANIFEST: /tmp/deepseek-artifacts-sha256.txt
+          MANIFEST: /tmp/codewhale-artifacts-sha256.txt
          TAP_REPO: Hmbown/homebrew-deepseek-tui
          TOKEN: ${{ secrets.HOMEBREW_TAP_PAT || secrets.RELEASE_TAG_PAT }}
        run: bash .github/scripts/update-homebrew-tap.sh
@@ -50,6 +50,8 @@ docs/*.pdf
 # Local dev scripts and temp files
 *.sh
 *.cmd
+!ohos-clang.sh
+!ohos-clangxx.sh
 !scripts/**
 !.github/scripts/**
 test.txt
@@ -0,0 +1,22 @@
+# Repository Agent Guidance
+
+## CodeWhale Stewardship
+
+- Treat community contributors as partners. Good-faith PRs, issue reports,
+  repros, logs, reviews, and verification comments are maintainer evidence,
+  not queue noise.
+- Keep gates warm and dry-run unless Hunter explicitly approves enforcement.
+  Gate copy should guide contributors clearly and respectfully.
+- Credit every harvested PR, issue report, or comment that materially shaped a
+  fix. Preserve authorship when possible; otherwise use mappable GitHub
+  noreply `Co-authored-by` trailers from `.github/AUTHOR_MAP`.
+- Do not tag, publish, create a GitHub Release, or push release artifacts
+  without Hunter approval.
+- Use CodeWhale branding while keeping DeepSeek support first-class. Retiring
+  legacy `deepseek-tui` names must never read as deprecating DeepSeek models or
+  provider support.
+- Review PRs from code, tests, linked issues, comments, and check results.
+  Never merge, close, harvest, or defer community work from title or labels
+  alone.
+- Respect concurrent work in the tree. Do not revert or rewrite unrelated
+  edits by other people or agents.
@@ -13,11 +13,442 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - **Benchmark harness runners.** Added CodeWhale-native benchmark entry points for SWE-bench, Terminal-Bench, and PinchBench, plus a local PinchBench runner that can grade tool-use traces with an LLM judge.
 - **Direct MiMo benchmark routing.** The benchmark runner now defaults to direct Xiaomi MiMo v2.5 Pro routing when configured, while keeping provider/model selection explicit.
+- Added `/restore list [N]` so users can inspect more side-git rollback
+  snapshots with UTC timestamps before choosing a restore point. Plain
+  `/restore` now shows the 20 most recent snapshots, numeric restore targets can
+  reach beyond that default listing up to a bounded index, and list requests
+  above the visible cap fail explicitly instead of silently truncating.
+- Added HarmonyOS/OpenHarmony support scaffolding: environment-driven
+  `OHOS_NATIVE_SDK` setup scripts and compiler wrappers, platform docs,
+  explicit Rustls ring-provider installation for the no-provider TLS build, and
+  OHOS fallbacks for unsupported keyring, clipboard, sandbox, browser-open, TTY,
+  execpolicy Starlark parsing, and self-update surfaces.
+- Added `scripts/release/check-ohos-deps.sh` and wired it into CI/release
+  preflight so the OpenHarmony target graph fails if unsupported `nix`,
+  `portable-pty`, `starlark`, `arboard`, or `keyring` dependencies re-enter.
+- Added `.github/AUTHOR_MAP` and a CI co-author credit check so harvested
+  commits use GitHub-mappable numeric noreply identities instead of `.local`,
+  placeholder, bot/tool, or raw third-party emails.
+- Added a `turn_end` observer hook that fires after post-turn TUI state and
+  token totals are updated. Hooks receive structured JSON with status, usage,
+  totals, duration, tool count, and queued-message count on stdin; stdout is
+  ignored and failures are warn-only (#1364, #2578).
+- Added provider-scoped `insecure_skip_tls_verify` for private
+  OpenAI-compatible gateways that cannot use a trusted CA bundle. The setting is
+  disabled by default, applies only to the active LLM provider HTTP client, and
+  is surfaced by `codewhale doctor`; `SSL_CERT_FILE` remains the preferred path
+  for corporate or private CA roots. Thanks @wavezhang for the original #1893
+  direction.
+- Added a default-disabled hard-compaction planner that can identify the
+  summarizable middle of a long conversation while preserving the recent tail,
+  existing tool-call/result pair guarantees, and working-set pinning. This
+  harvests the safe planning layer from #2522 without enabling hard compaction
+  or adding a message-rewrite execution path yet. Thanks @HUQIANTAO for the
+  proposal.
+- Added rich PlanArtifact support to `update_plan`: Plan mode can now carry
+  grounded objectives, context, sources, critical files, constraints,
+  verification, risks, and handoff notes through the transcript card, Plan
+  confirmation prompt, `/relay`, fork-state, and saved-session replay.
+- Added the first `codewhale-whaleflow` foundation crate with typed workflow
+  config/IR validation and deterministic phase ordering tests. This preserves
+  the WhaleFlow direction from #2482/#2486 without exposing a runtime
+  `workflow_run` tool until cancellation, replay, and worktree semantics are
+  release-safe. The foundation now includes explicit `WorkflowSpec`,
+  `WorkflowNode`, branch/leaf/policy metadata structs, plus serializable branch,
+  leaf, and control-node result records toward the #2668 TraceStore contract.
+  It also adds a crate-local mock executor skeleton for Sequence, BranchSet,
+  Leaf, Reduce, LoopUntil, Cond, Expand, BranchTournament, and ParetoFrontier
+  control flow so #2669 can progress without spawning agents, applying
+  worktrees, or exposing a `workflow_run` runtime tool yet. A first Starlark
+  authoring layer now compiles fail-closed model-authored workflow files into
+  that typed IR, with `rlm_cache_change.star` and `issue_fix_tournament.star`
+  examples plus a one-pass repair for common `ctx.*` authoring aliases (#2670).
+  Leaf, branch, and workflow execution results now carry deterministic token
+  and cost telemetry fields that the mock executor can aggregate without live
+  provider calls or runtime sub-agent fanout (#2486). The mock executor now
+  carries crate-local cancellation and budget-exhaustion status markers so the
+  branch/leaf runtime contract can be tested before live workflow execution is
+  exposed (#2669). A crate-only replay executor now evaluates workflows from
+  recorded leaf/control records, computes
+  stable SHA-256 leaf input hashes, and marks missing records as
+  `replay_diverged` instead of calling models again (#2673); the runtime replay
+  command and live-provider replay fallback remain deferred. The crate also now
+  has a model-agnostic role/capability registry with mock provider plumbing and
+  fail-closed JSON repair parsing, so WhaleFlow can choose capable models for
+  roles without hardcoding provider-specific runtime paths (#2672). The
+  `rlm_cache_change.star` dogfood workflow now exercises candidate branches,
+  LoopUntil verification, tournament selection, teacher review, and mock
+  execution in CI-oriented crate tests (#2679). Leaf, branch, and workflow
+  results now also carry separate ARMH/shared-memo and provider prompt-cache
+  telemetry counters, with mock aggregation tests, so #2671 can progress
+  without wiring live RLM calls or billing-affecting provider behavior yet. The
+  Starlark and typed-IR gates now also reject unknown leaf dependencies,
+  reducer inputs, and teacher-review candidates before mock execution or replay,
+  keeping generated workflows fail-closed while runtime/worktree semantics stay
+  deferred. TeacherReview now has serializable GEPA-style candidate artifacts
+  for notes, workflow recipes, skills, regression tests, cache policy, branch
+  heuristics, and Starlark authoring prompt patches, plus an offline helper
+  that proposes candidates from recorded execution traces without promoting
+  them or training model weights (#2674). StudentReplay results can now be
+  stored on teacher candidates, and a deterministic PromotionGate compares
+  baseline-vs-candidate replay deltas, required tests, policy violations,
+  staleness, and cost constraints before marking a candidate promotable (#2675).
+  The external-memory cutline now documents that Aleph-style memory stays
+  optional, explicit, visible, and clear/export-capable for v0.9.0 rather than
+  becoming a hidden default context substrate (#2677).
+  A dedicated v0.9.0 release acceptance matrix now tracks provider, runtime,
+  UI, WhaleFlow, Model Lab, remote-workbench, docs, rollback, and credit gates
+  that must be checked or explicitly deferred before tagging (#2729).
+  HarnessProfile docs now pin the v0.9.0 order: posture/schema/resolver/seed
+  profiles/status display must precede evidence stores, promotion gates, or any
+  automatic Harness Creator, with DeepSeek, MiMo, Arcee, and generic/HF/local
+  posture expectations called out separately (#2728).
+  Hugging Face / Model Lab and `codebase_search` release gates now explicitly
+  ship only the provider/MCP/docs/design foundation in v0.9; native Hub search,
+  model passports, Spaces/Jobs workflows, eval/export surfaces, and runtime
+  `codebase_search` registration remain deferred (#2705, #2680, #2727).
+  Remote workbench acceptance is also marked docs/setup-only for v0.9 so release
+  notes do not imply a shipped VM or Telegram bridge runtime (#2724).
+  Release-facing HarnessProfile docs now match the current implementation:
+  v0.9 ships the typed schema/config foundation and defers runtime resolver,
+  telemetry, seed-profile selection, and status-display behavior until later
+  verified slices. `config.example.toml` includes a commented dormant
+  harness-profile example, and README links point at the real acceptance matrix
+  and HarnessProfile cutline docs.
+  The release acceptance matrix now records evidence for already-landed gates:
+  provider-registry drift checks, provider-scoped TLS skip verify, read-only
+  GUI runtime/restore-point surfaces, VS Code Agent View branch visibility,
+  WhaleFlow mock/runtime foundations, explicit external-memory boundaries, and
+  docs alignment. Live workflow execution, provider calls, TraceStore writes,
+  and mutation-oriented GUI endpoints remain deferred until their atomicity and
+  replay contracts are tested. The `rlm_cache_change.star` dogfood workflow can
+  now be replayed from recorded mock leaf/control records, and missing dogfood
+  records produce `ReplayDiverged` instead of falling back to live execution
+  (#2679). The UI/workflow UX rows now also distinguish shipped transcript
+  tool-run collapse, sidebar detail popovers, and PlanArtifact review/handoff
+  evidence from the deferred first-look/home redesign, and record focused
+  slash-picker readability smoke coverage for visibility, selection, skill
+  insertion, Esc priority, and stable composer height (#2692, #2694, #2691,
+  #2713).
+  Thanks @AdityaVG13 for the WhaleFlow draft and cost-tracking direction.
+- Added a state-store v2 schema migration for WhaleFlow trace tables covering
+  workflow, branch, leaf, control-node, and teacher-candidate runs. The
+  migration creates persistence shape only; workflow execution and replay
+  remain deferred until the runtime semantics are safe (#2668).
+- Added an official VS Code extension Phase 0 scaffold with terminal launch,
+  local runtime attach checks, status bar state, and a read-only Agent View
+  preview backed by recent runtime thread summaries, plus a read-only
+  `GET /v1/snapshots` endpoint for GUI clients to inspect side-git restore
+  points. The extension now renders those restore points read-only in its Agent
+  View, and thread summaries include read-only workspace, branch, current Git
+  head, and dirty-state metadata so the VS Code Agent View can show when a
+  thread or agent lane is on another branch or has changed worktree state. Agent
+  View and restore-point data now auto-refresh on a configurable
+  read-only interval so branch/workspace/status changes become visible without a
+  manual refresh. Agent View refreshes keep thread branch/workspace rows
+  independent from restore-point loading, so a snapshot-listing failure no
+  longer clears already-available thread metadata. This answers the VS Code GUI
+  lane without exposing chat webviews, inline edits, or retry/undo/restore
+  runtime mutation endpoints yet
+  (#461, #462, #480, #1217, #2341, #1584, #2327, #2580, #2808). Thanks @AiurArtanis
+  for the Agent View prompt, @lbcheng888 for the earlier scaffold, @gaord for
+  the GUI runtime API direction, @douglarek, @caeserchen, and @nightt5879 for
+  the branch visibility trail, and @BigBenLabs, @lzx1545642258, @yangdaowan,
+  @mangdehuang, @VerrPower, @hejia-v, @nasus9527, and @ygzhang-cn for the
+  GUI/VS Code demand and validation trail.
+- Added inline live-output refresh for background shell Exec cards keyed by the
+  exact shell task id, so long-running commands can show bounded stdout/stderr
+  tails without consuming deltas or matching by command text. Thanks
+  @donglovejava for the live shell-output direction in #2048.
+- Added a static prompt composer override for embedders that need to replace
+  the byte-stable base/personality prompt segment while leaving mode metadata,
+  approval policy, tool taxonomy, Context Management, and the Compaction Relay
+  under CodeWhale's runtime prompt assembly. This refines the embedder prompt
+  customization path from #2786 without weakening prompt-continuity safeguards.
+  Thanks @h3c-hexin.
+- Added `POST /v1/sessions` for runtime clients to save a completed thread as a
+  managed session. The endpoint preserves thread title/model/mode/workspace
+  metadata, maps missing threads to 404, and returns 409 instead of snapshotting
+  queued or active turns.
+- Added cost-estimate pricing for the Xiaomi MiMo primary chat models, which
+  were previously unpriced: `mimo-v2.5-pro` / `xiaomi/mimo-v2.5-pro` reuse the
+  DeepSeek V4-Pro rate table and `mimo-v2.5` / `xiaomi/mimo-v2.5` reuse the
+  DeepSeek V4-Flash rates. Existing DeepSeek pricing is unchanged (#2731, #2750).
+- Added a metadata-only `codewhale-config` provider registry with canonical
+  lookup, alias-aware resolution, provider defaults, config-table keys, and
+  API-key env candidates. Runtime routing remains unchanged and fallback
+  providers stay dormant; this harvests the safe provider-trait foundation from
+  #2479 toward #2075. Thanks @sximelon.
+- Added optional `[search].base_url` / `CODEWHALE_SEARCH_BASE_URL` support for
+  DuckDuckGo-compatible private search endpoints, while keeping
+  `DEEPSEEK_SEARCH_BASE_URL` as a legacy alias. Custom endpoints are gated by
+  their configured host, do not fall back to public Bing, and report the custom
+  host as the result source for diagnostics (#2436, #2510).
+- Added `completion_sound = "file"` with `[notifications].sound_file` so
+  Windows users can play a custom WAV file for turn-completion sounds without
+  changing the global Windows sound scheme (#2484, #2512).
+- Added `[tui].stream_chunk_timeout_secs` and `/config stream_chunk_timeout_secs`
+  so slow local or OpenAI-compatible model servers can extend the SSE idle
+  timeout without mutating process environment. The legacy
+  `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` env var remains a fallback (#2365, #2507).
+- Added dormant `fallback_providers = [...]` config parsing plus a provider-chain
+  helper for future fallback routing. This preserves the requested contract
+  without enabling silent runtime provider switches yet (#2574, #2777). Thanks
+  @hsdbeebou for the request and @idling11 for the data-model draft.
+- Added `/hf` with `/huggingface` alias for Hugging Face MCP status/setup
+  helpers and `/hf concepts` provider/MCP/Hub guidance. The helper points users
+  to Hugging Face's settings-generated MCP configuration and intentionally does
+  not include Hub search, direct Hugging Face HTTP requests, or upload behavior
+  (#2709, #2782). Thanks @idling11 for the original Hugging Face MCP draft.
+- Added an in-process response cache for deterministic non-streaming,
+  tool-free chat requests. The cache is keyed by provider, base URL, path
+  suffix, API-key fingerprint, and final wire body, and zeroes usage on hits so
+  local spend counters are not double-counted (#2501). Thanks @HUQIANTAO for
+  the response-cache proposal and canonical-body key update.
+- Added `/sidebar` so users can toggle, show, hide, and optionally persist the
+  TUI sidebar from the command line instead of relying on copy-hostile sidebar
+  state during long transcript work (#2766, #2788). Thanks @mo-vic for the
+  detailed report and @aboimpinto for the fix.
+- Added a pausable custom slash-command MVP: commands with `pausable: true`
+  can pause before further tool execution, preserve the paused command while
+  separate messages are handled, and resume only on explicit continue/resume
+  wording. Harvested from #2732 with thanks to @aboimpinto.
+- Added Sofya (`provider = "sofya"`) as a search-tool backend with
+  `SOFYA_API_KEY` fallback, while keeping Sofya scoped to web search rather
+  than model-provider routing (#2790). Thanks @yusufgurdogan for the
+  implementation.
+- Added Xiaomi MiMo `mode` / `XIAOMI_MIMO_MODE` / `MIMO_MODE` selection for
+  Token Plan region endpoints and pay-as-you-go routing, plus dedicated Token
+  Plan env keys for `tp-*` subscriptions (#2621, #2627). Thanks @springeye for
+  the request and @xyuai for the implementation.
+- Added the first TUI hotbar action registry foundation so future UI controls
+  can dispatch typed app actions instead of growing another command match
+  surface (#2866). Thanks @reidliu41 for the implementation.
+- Added the narrow multi-tab core and persistence foundation, including tab
+  manager snapshots, delegation/group restore counters, mention parsing,
+  cross-tab events, and corruption-tolerant persisted state, while leaving the
+  broader collaboration UI wiring to follow-up work (#2864). Thanks
+  @ljm3790865 for the tab-core implementation and #2753 direction.
+- The VS Code Agent View now renders the runtime thread summary's Git `head`
+  and dirty-worktree flag alongside branch metadata, keeping branch switches
+  visible without adding retry/undo/restore mutation endpoints yet (#2580,
+  #2862). Thanks @AiurArtanis and @nasus9527 for the IDE/agent-view requests
+  and @gaord for the runtime metadata direction.
+
+### Changed
+
+- Removed the deprecated `deepseek` and `deepseek-tui` binary shims from the
+  v0.9.0 Cargo crates and GitHub release artifact matrix. The canonical
+  `codewhale`, `codew`, and `codewhale-tui` entry points remain, the private
+  deprecated `npm/deepseek-tui` notice package stays unpublished, and DeepSeek
+  provider/model/env/config compatibility remains first-class.
+- Command-adjacent config persistence and auto model routing now live in
+  neutral TUI modules instead of command-owned files, reducing command-boundary
+  coupling while preserving current `/config`, `/model`, UI, runtime, and
+  sub-agent behavior (#2871). Thanks @aboimpinto for landing this first staged
+  command-boundary layer from the broader #2851/#2791 design direction.
+- `/config` now reports the canonical `~/.codewhale/settings.toml` path for TUI
+  settings while still reading legacy DeepSeek-branded settings fallbacks and
+  migrating them into the CodeWhale home on load.
+- Provider switches now roll back transactionally when the first request to a
+  newly selected provider fails authentication: CodeWhale restores the previous
+  provider/model, model-ID passthrough, onboarding/API-key state, runtime
+  config, persisted provider selection, and engine handle so users can return
+  to DeepSeek after a failed Moonshot/Kimi switch (#2754, #2755). Thanks
+  @Dr3259 for the Windows repro and @cyq1017 for the draft fix.
+- `PATCH /v1/threads/{id}` can now update a thread's persisted workspace for
+  GUI/runtime clients. Workspace changes reject active turns and evict idle
+  cached engines so the next turn starts in the new workspace.
+- Split `web_run` session/page cache state so cached page reads use shared
+  page handles and do not serialize through the mutation path. The harvest also
+  adds panic-safe state write-back and serializes cache-mutating unit tests so
+  the global web cache remains stable under normal Cargo test parallelism.
+- Appended volatile `<turn_meta>` blocks after user text in outgoing user
+  message content arrays so provider prefix caches can keep matching the stable
+  user-input prefix across date, route, and working-set changes.
+- Projected mode, approval, and tool-taxonomy prompt metadata per request
+  instead of mutating stored system prompts, keeping provider prefix-cache
+  inputs byte-stable while preserving mode-specific instructions (#2687).
+  Thanks @LeoAlex0 for the implementation.
+- Softened contribution intake automation: external issues now receive a warm
+  triage note and are never auto-closed by the contribution gate, while the PR
+  gate copy makes clear that dry-run observations are about maintainer safety,
+  not contributor quality.
+- Added a PR gate marker guard so reopened unapproved PRs do not get duplicate
+  intake comments, and clarified that PR reopening should happen after
+  allowlist approval is merged.
+- Ollama `/model` completions no longer show hosted DeepSeek API model IDs.
+  The picker preserves the current or saved local Ollama tag, and users can
+  still fetch installed model IDs through `/models` instead of relying on a
+  stale static default (#2742). Thanks @reidliu41 for the focused report and
+  draft fix.
+- MCP runtime API tool listings and approval summaries no longer split
+  underscored MCP server names at the first `_`. Tool-call routing already used
+  the longest registered server name; the list endpoint now reuses that parser,
+  and approval cards show the full MCP target route instead of a guessed server
+  segment (#2744). Thanks @lioryx, @cyq1017, and @puneetdixit200 for the report
+  and matching fixes.
+- Documented the agent and sub-agent stewardship ethos so future automation
+  preserves human issue intake, careful PR review, and contributor credit.
+- Moved the TUI Starlark execpolicy parser and PTY support behind non-OHOS
+  target dependencies so published OpenHarmony builds no longer pull `nix` 0.28
+  through `rustyline` or `portable-pty`.
+- Explicit `skills_dir` configuration is now unioned with workspace skill
+  discovery instead of being shadowed by workspace-local skills, and configured
+  skills take precedence over global defaults when prompt space is constrained.
+- Tool-agent sub-agent routing now inherits the parent session model, or an
+  explicit tool-agent override, instead of hard-coding `deepseek-v4-flash`;
+  the fast lane still disables thinking through provider-aware request shaping.
+- Dense successful read/search/list tool runs now collapse into a single
+  expandable transcript row by default, while running, failed, shell, patch,
+  review, diff, and other risky tool cells remain visible. The setting
+  `tool_collapse = "compact" | "expanded" | "calm"` controls the behavior.
+- Pending-input preview rows now label delivery mode explicitly as steer
+  pending, rejected steer, or queued follow-up, with wrapped continuation rows
+  aligned under the label so busy-turn input state is easier to read (#2054).
+- Editing a queued follow-up is now an explicit pending-input state. Pressing
+  `Esc` while editing a queued follow-up restores the original queued message
+  instead of cancelling the active turn or silently dropping the queued work
+  (#2054).
+- Approval prompts now render prominent command, directory, file, path, or
+  target rows before falling back to raw JSON params. Shell approvals preserve
+  long command tails, split common shell chains for review, and show compact
+  `printf > file` previews while keeping intent summaries visible (#1991,
+  #2269).
+- Sidebar hover details now use row-level metadata for truncated Work, Tasks,
+  and Agents rows. Mouse hover opens a bordered, wrapping popover with the full
+  underlying row text, long turn/agent ids, and current sub-agent progress
+  instead of repeating the already-ellipsized sidebar label (#2694, #2734).
+- Sub-agents now preserve checkpoint metadata around long model calls. A
+  per-step API timeout marks the child as interrupted with a continuable
+  checkpoint instead of ending as a null failed result, and `agent_eval` can
+  explicitly continue a live checkpointed interrupted child while normal
+  completed/failed/cancelled follow-up behavior stays unchanged (#2029).
+- Durable task recovery no longer requeues tasks that were `running` when the
+  previous CodeWhale process exited. On restart those records are marked failed
+  with a recovery note, and any running tool-call summaries are marked failed
+  too, so stale shell/task state cannot silently become live work again (#1786).
+- Auto-generated project instructions now reuse the bounded Project Context
+  Pack data instead of running an unbounded summary/tree scan when no
+  `.codewhale/instructions.md` file exists. The fallback keeps later
+  top-level folders visible in noisy large workspaces while the dynamic
+  `<project_context_pack>` marker remains controlled by its own setting
+  (#697, #1827).
+- Project context loading now uses a bounded process-local content-signature
+  cache for repeated hot-path loads. The cache covers workspace/parent
+  instructions, global AGENTS/WHALE fallbacks, repo constitution files,
+  generated-context targets, trust markers, and trust config paths, and it
+  stores post-load signatures so auto-generated context deletion/regeneration
+  stays correct (#2636).
+- Configuration docs now show the provider-local `path_suffix` escape hatch
+  for OpenAI-compatible gateways that accept `/chat/completions` but reject
+  `/v1/chat/completions`, while making clear that model listing and DeepSeek
+  beta routes keep their built-in paths (#1874).
+- The config crate now carries the v0.9 HarnessPosture data model:
+  `HarnessPosture`, `HarnessProfile`, and typed posture/compaction/tool/safety
+  enums. The schema rejects misspelled posture names or unknown profile keys
+  instead of silently falling back to `custom`; a pure resolver can match
+  provider/model routes for tests and future status plumbing, while runtime
+  provider/model posture selection remains a follow-up (#2693, #2741, #2728).

 ### Fixed

 - **Benchmark workspace copying.** Fixed benchmark workspace file copying so local benchmark tasks can preserve their intended file layout during agent runs.
 - **MiMo default tests.** Guarded Xiaomi MiMo default-model tests against ambient CI provider environment variables.
+- Stream/body decode failures such as `Stream read error: error decoding
+  response body` are now classified as recoverable network interruptions
+  instead of generic internal errors, keeping the transcript and triage metadata
+  aligned with the existing stream retry path (#2847). Thanks
+  @qamranmushtaq-collab for the Windows/npx DeepSeek report.
+- The TUI footer, `/status`, `/mcp` manager, and command-palette MCP entries
+  now count trusted workspace-local `.codewhale/mcp.json` servers together with
+  the global MCP config, matching `codewhale mcp list` for merged global +
+  project setups (#2787). Thanks @yekern for the detailed reproduction.
+- AltGr key chords in the composer no longer get swallowed by sidebar shortcuts
+  on AZERTY and other international layouts, so characters such as `@`, `#`,
+  `$`, `!`, and `%` can be entered normally (#2863, #2867). Thanks
+  @ousamabenyounes for the fix and report.
+- Sub-agent shell completions now refresh the workspace branch/status chip
+  immediately, and `/subagents` plus the Agents sidebar show each sub-agent's
+  current workspace branch when it is running in a child worktree.
+- Authentication failures now include redacted request context such as provider,
+  base URL authority, model, key source, key type, and key fingerprint, making
+  stale provider, endpoint, or API-key state diagnosable without exposing the
+  secret (#2665, #2792). Thanks @mvanhorn for the implementation.
+- Browser-opening actions now compile on non-desktop targets by delegating the
+  unsupported-platform error to the shared URL opener instead of hiding the TUI
+  wrapper behind a narrower macOS/Linux/Windows cfg. Thanks @ci4ic4 for the
+  NetBSD/pkgsrc packaging report and fix (#2789).
+- MCP tool routing now preserves server names that contain underscores.
+  `parse_prefixed_name` matches the qualified `mcp_<server>_<tool>` name against
+  the set of registered server names and prefers the longest match, so tools on
+  a server like `my_db` are reachable and an overlapping `my` / `my_db` pair
+  routes correctly. Parsing falls back to the legacy first-underscore split
+  when no registered server matches (#2744).
+- Schema-hydrated deferred tools no longer render as a completed run. The first
+  use of a deferred tool returns a schema-hydration result instead of executing;
+  the transcript and sidebar now show "tool loaded — retry required" via a
+  dedicated hydrated status, so it is no longer indistinguishable from a real
+  successful execution. A hydrated row also ranks with active work rather than
+  completed successes (#2648).
+- `codewhale sessions` now shows `codewhale resume <session-id>` in the footer
+  instead of the invalid dispatcher command `codewhale --resume <session-id>`
+  (#2758, #2760).
+- TUI HTTP clients now install the Rustls ring crypto provider before building
+  `reqwest` clients, covering engine, runtime API, tool, MCP, config, and skill
+  download paths. This keeps the no-provider TLS build from panicking during
+  tests or embedded startup paths that do not enter through the main binary.
+- Prompt byte-stability tests now pin their temporary home and skills
+  environment under the shared test-env lock so global skill directories cannot
+  perturb deterministic prompt bytes during parallel test runs.
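
The longest-match rule described in the MCP routing entry above (#2744) can be sketched as follows. This is a minimal illustration only: the function name is taken from the changelog entry, but the signature, the slice-of-names parameter, and the return type are assumptions, not the actual CodeWhale implementation.

```rust
/// Split a qualified `mcp_<server>_<tool>` name into `(server, tool)`.
///
/// Hypothetical signature: the real `parse_prefixed_name` may take a
/// different registry type and return a different shape.
fn parse_prefixed_name<'a>(
    qualified: &'a str,
    registered_servers: &[&str],
) -> Option<(&'a str, &'a str)> {
    let rest = qualified.strip_prefix("mcp_")?;

    // Keep only registered servers that are followed by `_` and a non-empty
    // tool name, then prefer the longest one so `my_db` wins over `my`.
    let best = registered_servers
        .iter()
        .filter(|server| !server.is_empty())
        .filter(|server| {
            rest.strip_prefix(**server)
                .and_then(|tail| tail.strip_prefix('_'))
                .is_some_and(|tool| !tool.is_empty())
        })
        .max_by_key(|server| server.len());

    if let Some(server) = best {
        let server_len = server.len();
        // `rest` starts with `server` followed by a one-byte `_`.
        return Some((&rest[..server_len], &rest[server_len + 1..]));
    }

    // Legacy fallback: split at the first underscore when no registered
    // server matches.
    let (server, tool) = rest.split_once('_')?;
    if server.is_empty() || tool.is_empty() {
        return None;
    }
    Some((server, tool))
}
```

With servers `my` and `my_db` both registered, `mcp_my_db_query` resolves to the `my_db` server and the `query` tool, while `mcp_my_list` still resolves to `my`. An unregistered server such as `other` in `mcp_other_tool_x` takes the legacy split.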
+
+### Community
+
+Thanks to **@sximelon** for reporting and fixing the saved-session resume
+footer hint (#2758, #2760), **@cyq1017** for the custom
+DuckDuckGo-compatible search endpoint, custom completion sound file support,
+restore-listing implementation, and pending-input delivery-mode label work
+(#2510, #2512, #2513, #2532, #2054),
+**@Artenx** for the private-search endpoint report (#2436),
+**@LHqweasd** for the Windows custom notification sound request (#2484),
+**@wywsoor** for the broader macOS/iTerm rollback UX report (#2494),
+**@HUQIANTAO** for the `web_run` lock-splitting work (#2502), turn-metadata
+prefix-cache stability work (#2517), and project-context cache direction
+(#2636), **@xyuai** for canonical CodeWhale
+settings-path migration work (#2730), **@gaord** for the runtime thread
+workspace update and completed-thread save APIs (#2640, #2639),
+**@shenjackyuanjie** for the
+HarmonyOS/OpenHarmony port and MatePad Edge validation trail (#2634),
+**@ousamabenyounes** for the AZERTY AltGr composer shortcut fix (#2863,
+#2867), **@reidliu41** for the hotbar action-registry foundation (#2866),
+**@ljm3790865** for the multi-tab core/persistence foundation and broader
+collaboration direction (#2864, #2753),
+**@aboimpinto** for the direct command-support boundary cleanup in #2871 and
+the broader #2851/#2791 command-layer design direction,
+**@idling11** for the PlanArtifact direction in Plan mode (#2733), the dense
+tool-call transcript collapse/sidebar detail direction (#2738, #2734, #2692,
+#2694), and the HarnessPosture config model for provider/model posture (#2741,
+#2693),
+**@h3c-hexin** for the tool-agent model inheritance and configured
+`skills_dir` fixes (#2736, #2737), **@AresNing** for the turn-end observer hook
+work (#2578), and **@tdccccc** for the approval key-detail and shell-preview
+work (#1991, #2269). Thanks also to **@qiyuanlicn** for the
+checkpoint/resume report that shaped the sub-agent recovery slice (#2029),
+**@bevis-wong** for the long-running shell/task liveness report (#1786),
+**@shuxiangxuebiancheng** for the third-party OpenAI-compatible path report
+(#1874), **@hongqitai** and **@cyq1017** for the follow-up path-suffix PR
+review trail (#2508, #2506), **@NASLXTO** and **@wuxixing** for the
+large-workspace startup reports (#697, #1827), and **@linzhiqin2003** and
+**@merchloubna70-dot** for earlier context-cap and startup-diagnosis work that
+shaped this bounded fallback. Thanks also to **@cyq1017** for the MCP
+underscore-server-name fix and Xiaomi MiMo pricing (#2747, #2744, #2750, #2731),
+**@puneetdixit200** for independently diagnosing and fixing the same MCP
+underscore issue (#2746, #2744), **@mvanhorn** for the hydrated deferred-tool
+render fix (#2757, #2648), and **@xyuai** for the Xiaomi MiMo Token Plan region
+documentation (#2756, #2735). Additional thanks to **@Implementist** for Plan
+prompt scrolling, wrapping, and display-width fixes, **@jrcjrcc** for the
+Windows sub-agent completion render-width fix, and **@punkcanyang** for the
+original `/init` implementation harvested through #2771/#2745.

 ## [0.8.53] - 2026-06-03

@@ -98,8 +98,12 @@ When this happens:
 - If the maintainer copies or adapts your code, the harvested commit also
  keeps attribution with the original author identity when possible: either by
  preserving the commit author on a cherry-pick or by adding a
-  `Co-authored-by: Name <email>` trailer from the original PR commit. This is
+  `Co-authored-by: Name <id+login@users.noreply.github.com>` trailer. This is
  what lets GitHub's contribution surfaces recognize more than prose credit.
+  Maintainers should use `.github/AUTHOR_MAP`, or run
+  `gh api users/<login> --jq '"\(.id)+\(.login)@users.noreply.github.com"'`,
+  rather than copying raw, `.local`, or old-style noreply emails from a
+  contributor's machine.
 - The `CHANGELOG.md` entry for the next release credits you by handle.
 - The auto-close workflow closes your PR with a templated thank-you and
  a link to the commit on `main`.
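
The numeric noreply trailer format named in the attribution bullet above (`id+login@users.noreply.github.com`) can be sketched as follows. The helper names are hypothetical and exist only for illustration; in practice the id and login come from `.github/AUTHOR_MAP` or the `gh api users/<login>` call shown above.

```rust
/// Build the GitHub-mappable noreply address for a user.
/// `id` is the numeric GitHub user id and `login` is the handle.
fn noreply_email(id: u64, login: &str) -> String {
    format!("{id}+{login}@users.noreply.github.com")
}

/// Build a full `Co-authored-by` trailer line for a harvested commit.
/// Hypothetical helper, not part of the repository's tooling.
fn co_authored_by(name: &str, id: u64, login: &str) -> String {
    format!("Co-authored-by: {name} <{}>", noreply_email(id, login))
}
```

For a placeholder contributor with id `12345` and login `example-user`, `co_authored_by("Example User", 12345, "example-user")` yields `Co-authored-by: Example User <12345+example-user@users.noreply.github.com>`.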
@@ -172,16 +176,24 @@ Validation:
 CodeWhale uses a maintainer-managed contribution gate for the community front
 door. Maintainers and collaborators bypass this gate automatically. The gate
 workflows default to dry-run / comment-only mode so maintainers can observe the
-signal before closing contributor work. In dry-run mode, unapproved external
-issues and pull requests receive a short thank-you / CONTRIBUTING pointer and
-remain open.
+signal before changing contributor flow.

-When maintainers are ready to enforce the gate, set
-`CONTRIBUTION_GATE_MODE: enforce` in the PR and issue gate workflows. In enforce
-mode, external contributors must be listed in
-`.github/APPROVED_CONTRIBUTORS` before their issues or pull requests remain
-open. Before enabling enforcement, seed the allowlist broadly enough for active
-external contributors who should not be interrupted by the rollout.
+The maintainer posture is documented in
+[docs/AGENT_ETHOS.md](docs/AGENT_ETHOS.md): automation should reduce load while
+keeping good-faith contributors seen, credited, and able to keep helping.
+
+Issues are never auto-closed by the contribution gate. Unapproved external
+issues receive a short welcome note that asks for reproduction details and then
+remain open for maintainer triage. CodeWhale depends on real edge cases from
+real users, so issue intake should stay warm and open.
+
+Pull requests are different because they can touch code, CI, release plumbing,
+auth, sandboxing, provider policy, and other trust-boundary surfaces. The PR
+gate can be switched from dry-run to enforcement when maintainers decide they
+need that safety control, but it should be treated as a review-load control,
+not a judgment on contributor quality. Before enabling PR enforcement, seed the
+allowlist broadly enough for active external contributors who should not be
+interrupted by the rollout.

 The allowlist is scoped:

@@ -198,11 +210,10 @@ discussion.
 Approvals do not edit `main` directly. The approval workflow opens a small
 allowlist update PR so the new entry is reviewable before it takes effect.

-If the gate fires on a good contributor incorrectly, use the same approval flow
-to restore them: comment `/lgtm` or `/lgtmi`, merge the generated allowlist PR,
-then reopen the affected issue or pull request. If GitHub will not allow the
-closed item to be reopened, ask the contributor to resubmit after the allowlist
-PR is merged.
+If the PR gate fires on a good contributor incorrectly, use the same approval
+flow to restore them: comment `/lgtm`, merge the generated allowlist PR, then
+reopen the affected pull request. If GitHub will not allow the closed PR to be
+reopened, ask the contributor to resubmit after the allowlist PR is merged.

 ## Agent-Assisted Improvements

@@ -213,6 +224,11 @@ from a fresh fork or branch, let the agent find exactly one small friction point
 and stop after one patch. DeepSeek V4 Pro is the first-class path for this loop
 today, but the review shape matters more than the provider.

+Agents and maintainers should follow the stewardship posture in
+[docs/AGENT_ETHOS.md](docs/AGENT_ETHOS.md): use automation for evidence,
+verification, and narrow patches while keeping the final community decision
+human-reviewed.
+
 The useful output is not "ideas for improvement." The useful output is a
 specific reproduction, a minimal diff, focused checks, and a PR description that
 explains the trade-off. Do not use an agent to touch auth, credentials, sandbox
@@ -15,6 +15,7 @@ members = [
    "crates/tools",
    "crates/tui",
    "crates/tui-core",
+    "crates/whaleflow",
 ]
 default-members = ["crates/cli", "crates/app-server", "crates/tui"]
 resolver = "2"
@@ -38,7 +39,8 @@ chrono = { version = "0.4.43", features = ["serde"] }
 clap = { version = "4.5.54", features = ["derive"] }
 clap_complete = "4.5"
 dirs = "6.0.0"
-reqwest = { version = "0.13.1", default-features = false, features = ["json", "rustls", "socks"] }
+reqwest = { version = "0.13.1", default-features = false, features = ["json", "rustls-no-provider", "socks"] }
+rustls = { version = "0.23.36", default-features = false, features = ["ring", "std", "tls12"] }
 rusqlite = { version = "0.32.1", features = ["bundled"] }
 serde = { version = "1.0.228", features = ["derive"] }
 serde_json = "1.0.149"
@@ -143,6 +143,8 @@ codewhale doctor                         # セットアップを検証

 `npm i -g codewhale` は v0.8.8 以降、glibc ベースの ARM64 Linux で動作します。[Releases ページ](https://github.com/Hmbown/CodeWhale/releases) からビルド済みバイナリをダウンロードし、`PATH` 上に並べて配置することもできます。

+HarmonyOS PC と OpenHarmony クロスビルドの設定は [docs/HarmonyOS.md](docs/HarmonyOS.md) を参照してください。
+
 ### 中国 / ミラーフレンドリーなインストール

 中国本土から GitHub または npm のダウンロードが遅い場合は、Cargo レジストリのミラーを利用してください:
@@ -1,11 +1,102 @@
 # CodeWhale

-> Terminal coding agent for DeepSeek V4. It runs from the `codewhale` command, streams reasoning blocks, edits local workspaces with approval gates, and includes an auto mode that chooses both model and thinking level per turn.
+> DeepSeek-first terminal coding agent with a durable harness: approval-gated
+> local edits, sub-agents, provider/model routing, live verification, rollback,
+> relay/continuity handoffs, and a v0.9 track for typed WhaleFlow workflows.

 [简体中文 README](README.zh-CN.md)
 [日本語 README](README.ja-JP.md)
 [Tiếng Việt README](README.vi.md)

+[![CI](https://github.com/Hmbown/CodeWhale/actions/workflows/ci.yml/badge.svg)](https://github.com/Hmbown/CodeWhale/actions/workflows/ci.yml)
+[![npm](https://img.shields.io/npm/v/codewhale)](https://www.npmjs.com/package/codewhale)
+[![crates.io](https://img.shields.io/crates/v/codewhale-cli?label=crates.io)](https://crates.io/crates/codewhale-cli)
+[DeepWiki project index](https://deepwiki.com/Hmbown/CodeWhale)
+
+![codewhale screenshot](assets/screenshot.png)
+
+## What CodeWhale Does
+
+CodeWhale is a terminal-native coding harness for agentic model work. It gives
+the model a durable prompt constitution, a typed tool surface, approval gates,
+side-git rollback, LSP feedback after edits, cost/cache telemetry, and
+concurrent sub-agents that can investigate or implement without blocking the
+parent turn.
+
+It is DeepSeek-first, not DeepSeek-only. The default path targets DeepSeek V4,
+while provider routes such as OpenRouter, NVIDIA NIM, Arcee, Xiaomi MiMo,
+SiliconFlow, Fireworks, OpenAI-compatible gateways, self-hosted SGLang/vLLM, and
+Hugging Face stay explicit. Provider, model, base URL, and credentials are
+separate choices so direct-provider APIs do not get blurred with OpenRouter
+aliases.
+
+The product goal is practical continuity. A long CodeWhale task should survive
+model routing, compaction, shell noise, branch experiments, contributor review,
+and a fresh maintainer session without losing the reason the work started or
+who helped move it forward.
+
+## Active v0.9 Track
+
+v0.9.0 is not released yet. The current branch is a stewardship lane for making
+long-running CodeWhale work easier to continue, review, and hand off without
+turning the README into release notes.
+
+The v0.9 track keeps the same DeepSeek-first harness and adds work in these
+areas:
+
+| Track | What is changing |
+| --- | --- |
+| Relay and continuity | `/relay`, fork-state handoff, and rich PlanArtifact context preserve the goal, why it matters, evidence, constraints, blockers, changed files, verification state, and the next action. |
+| Transcript calmness | Dense read/search/list-style tool runs can collapse into expandable groups, while failures, running work, shell commands, writes, diffs, plans, and reviews stay visible. |
+| Runtime sessions and workspaces | Branch work extends session/thread runtime APIs, including workspace-aware thread updates, completed-thread session saves, and safer guards around active turns. Treat this as v0.9-track capability until the release ships. |
+| Sub-agent recovery | Live per-step timeout recovery can preserve checkpoint metadata and let `agent_eval { continue: true }` resume an interrupted child in the same runtime. Cold-restart continuation is still a follow-up; persisted child tasks are not rehydrated yet. |
+| Project context stability | Bounded project-context packs and generated instructions keep large/noisy repositories from turning the first turn into an unbounded filesystem walk. |
+| HarmonyOS / OHOS | The lane carries safe OpenHarmony setup, OHOS platform guards, self-update disablement on OHOS, and target gating for PTY and Starlark execpolicy paths. Full OHOS target builds still require a host with the OpenHarmony native SDK configured. |
+| Nix and Starlark compatibility | Dependency stewardship keeps OHOS builds from pulling incompatible Nix-chain crates through PTY or Starlark paths where those features are gated. |
+| HarnessProfile | The branch carries the typed `HarnessPosture` / `HarnessProfile` config data model, strict schema validation, and a documented [profile cutline](docs/HARNESS_PROFILE_CUTLINE.md). Provider/model posture resolution, prompt/tool/runtime behavior, telemetry, and status display remain follow-up work. |
+| Contributor stewardship | Harvested PRs stay credited, contributor identity mapping is machine-readable, and community gates remain dry-run and human-toned while the branch is reviewed. |
+| WhaleFlow | Typed branch/leaf workflows, deterministic replay, pod-style workflow monitoring, provider/model posture, and evidence-backed profile evolution remain the larger v0.9 workbench goal. |
+
+The current release acceptance matrix lives in
+[docs/V0_9_0_RELEASE_ACCEPTANCE.md](docs/V0_9_0_RELEASE_ACCEPTANCE.md), with
+the HarnessProfile runtime boundary documented in
+[docs/HARNESS_PROFILE_CUTLINE.md](docs/HARNESS_PROFILE_CUTLINE.md).
+
+## Release Status
+
+The latest published release line is still separate from the v0.9 integration
+branch. v0.9.0 work in this README describes the current integration track, not
+a published release artifact. Release-specific detail belongs in
+[CHANGELOG.md](CHANGELOG.md); this README summarizes the current user-facing
+surface and links to deeper docs.
+
+Release channels can lag each other. Before making release claims, verify the
+intended surface directly: GitHub Releases and checksums, npm `codewhale`,
+Cargo crates, Docker/GHCR images, CNB mirrors, and any legacy Homebrew formula.
+No tag, GitHub Release, npm/Cargo publish, Docker publish, or release artifact
+push should happen without explicit maintainer approval.
+
+## Quickstart
+
+```bash
+npm install -g codewhale
+codewhale --version
+codewhale --model auto
+```
+
+On first launch, CodeWhale prompts for a DeepSeek API key and saves it to
+`~/.codewhale/config.toml`; the legacy `~/.deepseek/config.toml` path is still
+read for compatibility. You can also set credentials directly:
+
+```bash
+codewhale auth set --provider deepseek
+codewhale auth status
+codewhale doctor
+```
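To see which of the two config locations is present on a machine, a lookup along these lines may help. It is a sketch under stated assumptions: the helper name `find_codewhale_config` is hypothetical, preferring the current path over the legacy one is an assumption rather than documented behavior, and keyring or environment credential sources are ignored.

```bash
#!/bin/sh
# Hypothetical helper: print the first config file that exists under the
# given home directory. Checking ~/.codewhale before ~/.deepseek is an
# assumption about precedence, not something the README states.
find_codewhale_config() {
  home_dir=${1:-$HOME}
  for candidate in "$home_dir/.codewhale/config.toml" "$home_dir/.deepseek/config.toml"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

find_codewhale_config || echo "no config file found; first launch will prompt for a key"
```

`codewhale auth status` remains the authoritative way to check the active credential source.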
+
+Use `/provider`, `/model`, `/config`, `/statusline`, `/skills`, and `/restore`
+inside the TUI. Prefix a composer line with `!` to run a shell command through
+the normal approval and sandbox path, for example `! cargo test -p codewhale-tui`.

 ## Install

@@ -67,177 +158,131 @@ cargo install codewhale-cli --locked --force
 cargo install codewhale-tui     --locked --force
 ```

-> codewhale update now supports --proxy, update through a proxy
-> eg: codewhale update --proxy https://localhost:7897
-
-[![CI](https://github.com/Hmbown/CodeWhale/actions/workflows/ci.yml/badge.svg)](https://github.com/Hmbown/CodeWhale/actions/workflows/ci.yml)
-[![npm](https://img.shields.io/npm/v/codewhale)](https://www.npmjs.com/package/codewhale)
-[![crates.io](https://img.shields.io/crates/v/codewhale-cli?label=crates.io)](https://crates.io/crates/codewhale-cli)
-[DeepWiki project index](https://deepwiki.com/Hmbown/CodeWhale)
-
-![codewhale screenshot](assets/screenshot.png)
+`codewhale update --proxy https://localhost:7897` routes update checks and
+downloads through a proxy.

 ---

-## What Is It?
+## Harness Model

-A model answers a question. An agent finishes a task. The difference is
-the harness — a system of rules, evidence, and feedback that keeps the
-model oriented instead of drifting.
+A model answers a question. An agent finishes a task. The difference is the
+harness: the rules, tools, evidence, and feedback that keep the model oriented
+when user intent, repo instructions, tool output, stale memory, and prior
+handoffs all compete inside one turn.

-CodeWhale is that harness, built around DeepSeek V4 and guided by three ideas:
+CodeWhale's harness has four practical parts:

-| Principle | How it works |
-|---|---|
-| **Start with trust** | Every turn begins with "A" — possibility before certainty, craft before convenience |
-| **Clear jurisdiction** | A written Constitution with nine tiers of authority. User intent outranks stale instructions. Verification outranks confidence. |
-| **Recursive improvement** | V4 helped write the harness. As the harness improves, V4 becomes more effective — and helps improve the harness further. Each turn starts stronger. |
+| Part | What it does |
+| --- | --- |
+| Prompt constitution | `crates/tui/src/prompts/base.md` gives the model a stable authority hierarchy: live user intent beats stale instructions, live tool output beats assumptions, and verification beats confidence. |
+| Typed tool surface | Shell, file, git, web, MCP, RLM, image, and sub-agent tools are registered with explicit schemas, visibility rules, and compatibility aliases. |
+| Runtime evidence loop | Side-git snapshots, LSP diagnostics, command output, cost/cache accounting, and task state are fed back into the transcript instead of hidden behind the UI. |
+| Approval and sandbox posture | Plan is read-only, Agent uses approval gates, and YOLO auto-approves in trusted workspaces. macOS Seatbelt is enforced; Linux Landlock is detected but not yet enforced; Windows sandboxing is not advertised. |

-It's open source, terminal-native, and packaged as a matched `codewhale` /
-`codewhale-tui` Rust binary pair.
+### Relay And Continuity

-## How the Harness Works
+Relay is intentional compaction for human and agent handoff. Use `/relay` before
+a long break, a fresh thread, a fork, or a handoff to another agent. It keeps the
+important story small: the objective, why the work is being done, current state,
+changed files, evidence checked, constraints, blockers, and the next concrete
+action.

-Agentic models deal with conflicting information at scale: user intent,
-project rules, system defaults, tool output, and stale memory all compete
-for authority in a single turn. LLM-as-a-judge needs jurisdiction — which
-source wins when they disagree?
+Automatic compaction protects context windows. Relay protects continuity. In
+the v0.9 track, rich PlanArtifact fields feed the transcript card, Plan-mode
+confirmation, `/relay`, fork-state handoff, and saved-session replay so the
+plan, the evidence, and the next step do not become separate stories.

-CodeWhale answers this with a **Constitution** (`prompts/base.md`). It's a
-formal hierarchy of law — Article VII ranks nine tiers from the
-Constitution's own articles down to prior-session handoffs. The user's
-current message outranks stale project instructions. Live tool output
-outranks assumptions. Verification outranks confidence. The model inherits
-a clear chain of authority every turn and never has to guess which
-directive to follow.
+`codewhale` is the dispatcher CLI. `codewhale-tui` is the companion runtime
+binary it launches for interactive sessions. The TUI talks to an async engine,
+an OpenAI-compatible streaming client, the tool registry, the durable task
+queue, the LSP subsystem, and optional HTTP/SSE or ACP servers. See
+[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full walkthrough.

-Six Articles define the model's identity, duties, and agency (Article VII
-is the hierarchy itself): a verification mandate (Article V — every action
-leaves evidence, never declare success on faith), a coordination legacy
-(Article VI — leave the workspace cleaner and the handoff truthful for the
-next intelligence), and a primacy-of-truth clause (Article II —
-non-negotiable; not even a user request may override the duty of truth).
+### Auto Model Routing

-DeepSeek V4's prefix caching makes this practical. The Constitution is long
-and detailed, but once cached it costs roughly 100× less per turn than a
-cold read. The model references it recursively — peeking, scanning, and
-querying through RLM sessions — revisiting information on demand rather
-than relying on a single memorized pass. It performs more like an
-open-book test than a closed one.
-
-Because the authority structure is explicit, failure isn't hidden. Non-zero
-exit codes, type errors from rust-analyzer arriving between turns, sandbox
-denials — these are fed back as correction vectors. The model uses its own
-drift to self-correct.
-
-Three modes control the action space. Plan is read-only. Agent gates
-destructive operations behind approval. YOLO auto-approves in trusted
-workspaces. macOS Seatbelt is the active sandbox; Linux Landlock is
-detected but not yet enforced; Windows sandboxing is not yet advertised.
-
-Fin — a cheap Flash call with thinking off — handles model auto-routing per
-turn. `--model auto` is the default.
-
-Every turn records a side-git snapshot outside your repo's `.git`.
-`/restore` and `revert_turn` roll back the workspace.
-
-Sub-agents run concurrently (up to 20). `agent_open` returns immediately;
-results arrive inline as completion sentinels with a summary. Full
-transcripts stay behind bounded handles through `agent_eval`. See
-[docs/SUBAGENTS.md](docs/SUBAGENTS.md).
-
-The rest of the surface: LSP diagnostics after every edit (rust-analyzer,
-pyright, typescript-language-server, gopls, clangd, jdtls,
-vue-language-server), RLM sessions for batched analysis, MCP protocol,
-HTTP/SSE runtime API, persistent task queue, ACP adapter for Zed,
-SWE-bench export, and live cost tracking with cache hit/miss breakdowns.
-
---
-
-## The Harness
-
-`codewhale` (dispatcher CLI) → `codewhale-tui` (companion binary) → ratatui interface ↔ async engine ↔ OpenAI-compatible streaming client. Tool calls route through a typed registry (shell, file ops, git, web, sub-agents, MCP, RLM) and results stream back into the transcript. The engine manages session state, turn tracking, the durable task queue, and an LSP subsystem that feeds post-edit diagnostics into the model's context before the next reasoning step.
-
-See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full walkthrough.
-
-### Sub-agents: Concurrent Background Execution
-
-CodeWhale can dispatch multiple sub-agents that run in parallel — like a concurrent task queue:
-
- **Non-blocking launch.** `agent_open` returns immediately. The child gets its own fresh context and tool registry and runs independently. The parent keeps working.
- **Background execution.** Sub-agents execute concurrently (default cap: 10, configurable to 20). The engine manages the pool — no polling loop needed.
- **Completion notification.** When a sub-agent finishes, the runtime injects a `<codewhale:subagent.done>` sentinel into the parent's transcript. The human-readable summary — including the child's findings, changed files, and any risks — sits on the line immediately before the sentinel. The parent model reads that summary and integrates findings without an extra tool call.
- **Bounded result retrieval.** The full child transcript lives behind a `transcript_handle` accessible through `agent_eval`. When the summary isn't enough, the parent calls `handle_read` for slices, line ranges, or JSONPath projections — keeping the parent context lean without losing access to the details.
-
-See [docs/SUBAGENTS.md](docs/SUBAGENTS.md) for the full sub-agent reference.
-
---
-
-## Quickstart
-
-```bash
-npm install -g codewhale
-codewhale --version
-codewhale --model auto
-```
-
-Prebuilt binary pairs and platform archives are published for **Linux x64**, **Linux ARM64** (v0.8.8+), **macOS x64**, **macOS ARM64**, and **Windows x64**. For other targets (musl, riscv64, FreeBSD, etc.), see [Install from source](#install-from-source) or [docs/INSTALL.md](docs/INSTALL.md).
-
-On first launch you'll be prompted for your [DeepSeek API key](https://platform.deepseek.com/api_keys). The key is saved to `~/.codewhale/config.toml` (legacy `~/.deepseek/config.toml` also supported) so it works from any directory without OS credential prompts.
-
-You can also set it ahead of time:
-
-```bash
-codewhale auth set --provider deepseek   # saves to ~/.codewhale/config.toml
-codewhale auth status                    # shows the active credential source
-
-export DEEPSEEK_API_KEY="YOUR_KEY"      # env var alternative; use ~/.zshenv for non-interactive shells
-codewhale
-
-codewhale doctor                         # verify setup
-```
-
-If `codewhale doctor` says the rejected key came from `DEEPSEEK_API_KEY`, remove
-the stale export from your shell startup file, open a fresh shell, or run
-`codewhale auth set --provider deepseek`. Use `codewhale auth status` to see the
-config, keyring, and env-var source state without printing the key. Saved config
-keys take precedence over the keyring and environment and are easier to rotate.
-
-> To rotate or remove a saved key: `codewhale auth clear --provider deepseek`.
-
-### Tencent Cloud / CNB Remote-First Path
-
-For an always-on workspace you can control from a phone, use the Tencent-native
-path: CNB mirror/source, Tencent Lighthouse HK, a Feishu/Lark long-connection
-bridge, and optional EdgeOne for a deliberate public HTTPS edge. The runtime API
-stays bound to localhost; EdgeOne is not used to expose `/v1/*`.
-
-Start with [docs/TENCENT_CLOUD_REMOTE_FIRST.md](docs/TENCENT_CLOUD_REMOTE_FIRST.md),
-then use [docs/TENCENT_LIGHTHOUSE_HK.md](docs/TENCENT_LIGHTHOUSE_HK.md) for the
-server runbook.
-
-### Auto Mode
-
-Use `codewhale --model auto` or `/model auto` when you want codewhale to decide how much model and reasoning power a turn needs.
-
-Auto mode controls two settings together:
+`--model auto` is the default. Before the real turn is sent, CodeWhale makes a
+small `deepseek-v4-flash` routing call with thinking off. That local router
+selects the concrete model and thinking level for the real request:

 - Model: `deepseek-v4-flash` or `deepseek-v4-pro`
 - Thinking: `off`, `high`, or `max`

-Before the real turn is sent, the app makes a small `deepseek-v4-flash` routing call with thinking off. That router looks at the latest request and recent context, then selects a concrete model and thinking level for the real request. Short/simple turns can stay on Flash with thinking off; coding, debugging, release work, architecture, security review, or ambiguous multi-step tasks can move up to Pro and/or higher thinking.
+The upstream API never receives `model: "auto"`; it receives the concrete route
+chosen for that turn. Use a fixed model or thinking level for repeatable
+benchmarking, strict cost ceilings, or exact provider/model mapping.
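Resolving `auto` means a request is sent only once it names one of the concrete models and thinking levels listed above. How CodeWhale performs that check internally is not shown here; the following is a minimal sketch with a hypothetical helper, `is_concrete_route`, that accepts exactly those values.

```bash
#!/bin/sh
# Hypothetical check: succeed only when both the model and the thinking
# level are concrete values from the lists above. "auto" is rejected
# because it must be resolved locally before the request goes upstream.
is_concrete_route() {
  case $1 in
    deepseek-v4-flash|deepseek-v4-pro) ;;
    *) return 1 ;;
  esac
  case $2 in
    off|high|max) return 0 ;;
    *) return 1 ;;
  esac
}

is_concrete_route deepseek-v4-pro high && echo "route ok"
is_concrete_route auto max || echo "auto must be resolved before the request is sent"
```

A check like this is also a natural point to fall back to a fixed route when the router's answer is unusable.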

-`auto` is local to codewhale. The upstream API never receives `model: "auto"`; it receives the concrete model and thinking setting chosen for that turn. The TUI shows the selected route, and cost tracking is charged against the model that actually ran. If the router call fails or returns an invalid answer, the app falls back to a local heuristic. Sub-agents inherit auto mode unless you assign them an explicit model.
+### Sub-agents

-Use a fixed model or fixed thinking level when you want repeatable benchmarking, a strict cost ceiling, or a specific provider/model mapping.
+Sub-agents run concurrently in the background. `agent_open` returns immediately;
+the child receives its own context and tool registry, then reports back with a
+completion sentinel and a human-readable summary. The full child transcript
+stays behind a bounded handle that the parent can inspect through `agent_eval`.

-### Linux ARM64 (Raspberry Pi, Asahi, Graviton, HarmonyOS PC)
+Default concurrency is 10 and configurable up to 20. See
+[docs/SUBAGENTS.md](docs/SUBAGENTS.md) for role taxonomy, lifecycle, wait/eval
+tools, and transcript-handle details.

-`npm i -g codewhale` works on glibc-based ARM64 Linux from v0.8.8 onward. You can also download prebuilt binaries from the [Releases page](https://github.com/Hmbown/CodeWhale/releases) and place them side by side on your `PATH`.
+## Provider Routes
+
+For the full provider registry, model IDs, auth variables, base URLs, and
+capability boundaries, see [docs/PROVIDERS.md](docs/PROVIDERS.md).
+
+Provider and model are deliberately separate choices. `provider` is the route,
+account, endpoint, and credential source; `model` is the model ID on that route.
+That distinction matters when the same model family appears through direct APIs
+and OpenRouter aliases.
+
+| Provider | Typical model IDs | Notes |
+| --- | --- | --- |
+| `deepseek` | `deepseek-v4-pro`, `deepseek-v4-flash` | Default direct DeepSeek route. |
+| `openrouter` | `deepseek/deepseek-v4-pro`, `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3` | OpenRouter route; keep these IDs distinct from direct provider IDs. |
+| `arcee` | `trinity-large-thinking`, `trinity-large-preview`, `trinity-mini` | Direct Arcee API at `https://api.arcee.ai/api/v1`. |
+| `xiaomi-mimo` | `mimo-v2.5-pro`, `mimo-v2.5`, TTS IDs | Token Plan keys (`tp-...`) use `api-key` auth and default to the Token Plan endpoint; pay-as-you-go keys can set the MiMo API endpoint explicitly. |
+| `nvidia-nim` | `deepseek-ai/deepseek-v4-pro` | Uses NVIDIA account terms and model IDs. |
+| `siliconflow` / `siliconflow-CN` | `deepseek-ai/DeepSeek-V4-Pro` | SiliconFlow global and China routes. |
+| `fireworks` | `accounts/fireworks/models/deepseek-v4-pro` | Fireworks route. |
+| `openai` | Your gateway's model ID | Generic OpenAI-compatible endpoint. |
+| `huggingface` | `deepseek-ai/DeepSeek-V4-Pro` | Hugging Face router route. |
+| `sglang`, `vllm`, `ollama` | Local model IDs/tags | Self-hosted routes. |
+
+```bash
+codewhale auth set --provider openrouter --api-key "YOUR_OPENROUTER_API_KEY"
+codewhale --provider openrouter --model deepseek/deepseek-v4-pro
+
+codewhale auth set --provider arcee --api-key "YOUR_ARCEE_API_KEY"
+codewhale --provider arcee --model trinity-large-thinking
+
+codewhale auth set --provider xiaomi-mimo --api-key "YOUR_XIAOMI_KEY"
+codewhale --provider xiaomi-mimo --model mimo-v2.5-pro
+codewhale --provider xiaomi-mimo speech "Hello from MiMo" --model tts -o hello.wav
+XIAOMI_MIMO_TOKEN_PLAN_API_KEY="tp-..." XIAOMI_MIMO_MODE="token-plan-sgp" \
+  codewhale --provider xiaomi-mimo --model mimo-v2.5-pro
+
+codewhale auth set --provider openai --api-key "YOUR_OPENAI_COMPATIBLE_API_KEY"
+OPENAI_BASE_URL="https://openai-compatible.example/v4" \
+  codewhale --provider openai --model glm-5
+
+SGLANG_BASE_URL="http://localhost:30000/v1" \
+  codewhale --provider sglang --model deepseek-v4-flash
+```
+
+Inside the TUI, `/provider` opens the provider picker and `/model` opens the
+model/thinking picker. `/models` fetches live API model lists when the active
+provider supports listing.
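Because the same DeepSeek V4 Pro model is addressed by a different ID on each route, a small lookup can keep scripts from mixing them up. This is a sketch only: the helper name `deepseek_pro_id_for` is hypothetical, the IDs are copied from the table above, and the script prints the command instead of running it.

```bash
#!/bin/sh
# Hypothetical helper: map a provider route to the DeepSeek V4 Pro model ID
# listed for it in the provider table. Unknown routes return non-zero.
deepseek_pro_id_for() {
  case $1 in
    deepseek) echo "deepseek-v4-pro" ;;
    openrouter) echo "deepseek/deepseek-v4-pro" ;;
    nvidia-nim) echo "deepseek-ai/deepseek-v4-pro" ;;
    siliconflow|siliconflow-CN|huggingface) echo "deepseek-ai/DeepSeek-V4-Pro" ;;
    fireworks) echo "accounts/fireworks/models/deepseek-v4-pro" ;;
    *) return 1 ;;
  esac
}

provider=openrouter
if model=$(deepseek_pro_id_for "$provider"); then
  # Print the invocation rather than executing it.
  echo "codewhale --provider $provider --model $model"
fi
```

For gateways and self-hosted routes the model ID depends on the deployment, so those are deliberately left out of the lookup.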
+
+## Platform Notes
+
+Prebuilt binary pairs and platform archives are published for Linux x64, Linux
+ARM64, macOS x64, macOS ARM64, and Windows x64. For other targets, see
+[docs/INSTALL.md](docs/INSTALL.md).
+
+For HarmonyOS PC and OpenHarmony cross-build setup, see [docs/HarmonyOS.md](docs/HarmonyOS.md).

 ### China / Mirror-friendly Installation

-If GitHub or npm downloads are slow from mainland China, use a Cargo registry mirror:
+If GitHub or npm downloads are slow from mainland China, use
+`npm install -g codewhale --registry=https://registry.npmmirror.com`, download
+from GitHub Releases, or configure a Cargo registry mirror:

 ```toml
 # ~/.cargo/config.toml
@@ -248,37 +293,38 @@ replace-with = "tuna"
 registry = "sparse+https://mirrors.tuna.tsinghua.edu.cn/crates.io-index/"
 ```

-Then install both binaries (the dispatcher delegates to the TUI at runtime):
+Then install both binaries:

 ```bash
-cargo install codewhale-cli --locked   # provides `codewhale`
-cargo install codewhale-tui     --locked   # provides `codewhale-tui`
+cargo install codewhale-cli --locked
+cargo install codewhale-tui --locked
 codewhale --version
 ```

-Prebuilt binaries can also be downloaded from [GitHub Releases](https://github.com/Hmbown/CodeWhale/releases). Use `DEEPSEEK_TUI_RELEASE_BASE_URL` for mirrored release assets.
+To download release assets from a mirror, set `DEEPSEEK_TUI_RELEASE_BASE_URL`
+to the mirror's base URL.

-### Windows (Scoop)
+### Windows

-[Scoop](https://scoop.sh) is a Windows package manager. The `codewhale` package is listed
-in Scoop's main bucket, but that manifest updates independently and can lag the
-GitHub/npm/Cargo release. Run `scoop update` first, then verify the installed
-version with `codewhale --version`:
+The Scoop `codewhale` manifest can lag GitHub/npm/Cargo releases. Run
+`scoop update` first, then verify with `codewhale --version`. Use npm or direct
+GitHub release downloads when you need the newest release immediately.

-```bash
-scoop update
-scoop install codewhale
-codewhale --version
-```
+### Remote-first Workspaces

-Use npm or direct GitHub release downloads when you need the newest release
-before Scoop's manifest catches up.
+For an always-on workspace you can control from a phone, use the Tencent-native
+path: CNB mirror/source, Tencent Lighthouse HK, a Feishu/Lark long-connection
+bridge, and optional EdgeOne for a deliberate public HTTPS edge. The runtime API
+stays bound to localhost; EdgeOne is not used to expose `/v1/*`.

+Start with [docs/TENCENT_CLOUD_REMOTE_FIRST.md](docs/TENCENT_CLOUD_REMOTE_FIRST.md),
+then use [docs/TENCENT_LIGHTHOUSE_HK.md](docs/TENCENT_LIGHTHOUSE_HK.md) for the
+server runbook.

 <details id="install-from-source">
 <summary>Install from source</summary>

-Works on any Tier-1 Rust target — including musl, riscv64, FreeBSD, and older ARM64 distros.
+Works on Rust targets beyond the prebuilt archives, including musl, riscv64,
+FreeBSD, and older ARM64 distros.

 ```bash
 # Linux build deps (Debian/Ubuntu/RHEL):
@@ -288,137 +334,15 @@ Works on any Tier-1 Rust target — including musl, riscv64, FreeBSD, and older
 git clone https://github.com/Hmbown/CodeWhale.git
 cd CodeWhale

-cargo install --path crates/cli --locked   # requires Rust 1.88+; provides `codewhale`
-cargo install --path crates/tui --locked   # provides `codewhale-tui`
+cargo install --path crates/cli --locked
+cargo install --path crates/tui --locked
 ```

-Both binaries are required. Cross-compilation and platform-specific notes: [docs/INSTALL.md](docs/INSTALL.md).
+Both binaries are required. Building from source requires Rust 1.88 or newer;
+the crates use the 2024 edition.

 </details>

-### Other API Providers
-
-For the full shipped provider registry, including model IDs, auth variables,
-base URLs, and capability boundaries, see [docs/PROVIDERS.md](docs/PROVIDERS.md).
-
-Think of provider and model as separate choices: `provider` is the route,
-account, and endpoint; `model` is the model ID on that route. DeepSeek-family
-models can be reached through several routes, so `/config` exposes both
-`provider` and `provider_url`.
-
-| Route | Typical DeepSeek model ID |
-|-------|---------------------------|
-| `deepseek` | `deepseek-v4-pro` |
-| `nvidia-nim` | `deepseek-ai/deepseek-v4-pro` |
-| `openrouter` | `deepseek/deepseek-v4-pro` |
-| `fireworks` | `accounts/fireworks/models/deepseek-v4-pro` |
-| `siliconflow` | `deepseek-ai/DeepSeek-V4-Pro` |
-| `openai` | Your gateway's model ID |
-| `huggingface` | `deepseek-ai/DeepSeek-V4-Pro` |
-
-```bash
-# NVIDIA NIM
-codewhale auth set --provider nvidia-nim --api-key "YOUR_NVIDIA_API_KEY"
-codewhale --provider nvidia-nim
-
-# AtlasCloud
-codewhale auth set --provider atlascloud --api-key "YOUR_ATLASCLOUD_API_KEY"
-codewhale --provider atlascloud
-codewhale --provider atlascloud --model vendor/model-id
-
-# Wanjie Ark
-codewhale auth set --provider wanjie-ark --api-key "YOUR_WANJIE_API_KEY"
-codewhale --provider wanjie-ark --model deepseek-reasoner
-
-# OpenRouter
-codewhale auth set --provider openrouter --api-key "YOUR_OPENROUTER_API_KEY"
-codewhale --provider openrouter --model deepseek/deepseek-v4-pro
-codewhale --provider openrouter --model arcee-ai/trinity-large-thinking
-codewhale --provider openrouter --model minimax/minimax-m3
-
-Arcee AI offers direct API access to its powerful Trinity models, including the reasoning-capable Trinity-Large Thinking. This section provides comprehensive setup instructions and model comparisons.
-
-## Configuration
-
-### API Key
-The primary authentication method is the `ARCEE_API_KEY` environment variable or the `[providers.arcee]` configuration section in `~/.codewhale/config.toml`:
-
-```toml
-[providers.arcee]
-# api_key = "your-arcee-api-key"
-# base_url = "https://api.arcee.ai/api/v1"
-# model = "trinity-large-thinking"  # or "trinity-large-preview", "trinity-mini"
-```
-
-### Environment Variables
-
-- `ARCEE_API_KEY`: Your Arcee API key (required)
-- `ARCEE_BASE_URL`: Custom base URL (optional, defaults to `https://api.arcee.ai/api/v1`)
-- `ARCEE_MODEL`: Default model to use (optional, defaults to `trinity-large-thinking`)
-
-### Model Support
-
-CodeWhale supports three Arcee models:
-
-| Model | Reasoning | Context Window | Max Output | Best For |
-|--------|-----------|----------------|------------|----------|
-| `trinity-large-thinking` | ✅ Yes | 262,144 tokens | 262,144 tokens | Complex reasoning, coding, math |
-| `trinity-large-preview` | ❌ No | 262,144 tokens | 4,096 tokens | High-accuracy non-reasoning tasks |
-| `trinity-mini` | ❌ No | 128,000 tokens | 4,096 tokens | Faster, cost-effective tasks |
-
-**Note:** The `trinity-large-thinking` model supports reasoning (thinking mode) and can handle very large contexts, making it ideal for complex programming tasks. The other models do not support reasoning but offer larger context windows than many other providers.
-codewhale auth set --provider arcee --api-key "YOUR_ARCEE_API_KEY"
-codewhale --provider arcee --model trinity-large-thinking
-codewhale --provider arcee --model trinity-large-preview
-
-# Xiaomi MiMo
-codewhale auth set --provider xiaomi-mimo --api-key "YOUR_XIAOMI_KEY"
-# Token Plan (`tp-...`) keys default to https://token-plan-sgp.xiaomimimo.com/v1.
-# To force a provider endpoint: /config provider_url token-plan --save
-# or /config provider_url pay-as-you-go --save.
-codewhale --provider xiaomi-mimo --model mimo-v2.5-pro
-codewhale --provider xiaomi-mimo --model mimo-v2.5
-codewhale --provider xiaomi-mimo speech "Hello from MiMo" --model tts -o hello.wav
-
-# Novita
-codewhale auth set --provider novita --api-key "YOUR_NOVITA_API_KEY"
-codewhale --provider novita --model deepseek/deepseek-v4-pro
-
-# Fireworks
-codewhale auth set --provider fireworks --api-key "YOUR_FIREWORKS_API_KEY"
-codewhale --provider fireworks --model deepseek-v4-pro
-
-# SiliconFlow
-codewhale auth set --provider siliconflow --api-key "YOUR_SILICONFLOW_API_KEY"
-codewhale --provider siliconflow --model deepseek-ai/DeepSeek-V4-Pro
-
-# Generic OpenAI-compatible endpoint
-codewhale auth set --provider openai --api-key "YOUR_OPENAI_COMPATIBLE_API_KEY"
-OPENAI_BASE_URL="https://openai-compatible.example/v4" codewhale --provider openai --model glm-5
-
-# Custom DeepSeek-compatible endpoint
-DEEPSEEK_BASE_URL="https://your-provider.example/v1" \
-  DEEPSEEK_MODEL="deepseek-ai/DeepSeek-V4-Pro" \
-  codewhale --provider deepseek
-
-# Self-hosted SGLang
-SGLANG_BASE_URL="http://localhost:30000/v1" codewhale --provider sglang --model deepseek-v4-flash
-
-# Self-hosted vLLM
-VLLM_BASE_URL="http://localhost:8000/v1" codewhale --provider vllm --model deepseek-v4-flash
-# Trusted LAN vLLM over HTTP
-DEEPSEEK_ALLOW_INSECURE_HTTP=1 VLLM_BASE_URL="http://192.168.0.110:8000/v1" codewhale --provider vllm --model deepseek-v4-flash
-
-# Self-hosted Ollama
-ollama pull codewhale-coder:1.3b
-codewhale --provider ollama --model codewhale-coder:1.3b
-```
-
-Inside the TUI, `/provider` opens the provider picker and `/model` opens the
-local model/thinking picker. `/provider openrouter` and `/model <id>` switch
-directly, while `/models` explicitly fetches and lists live API models when the
-active provider supports model listing.
-
 ---

 ## Release Notes
@@ -499,7 +423,7 @@ volume ownership notes, and non-interactive pipeline usage.

 ### Zed / ACP

-DeepSeek can run as a custom Agent Client Protocol server for editors that
+CodeWhale can run as a custom Agent Client Protocol server for editors that
 spawn local ACP agents over stdio. In Zed, add a custom agent server:

 ```json
@@ -578,18 +502,18 @@ Key environment variables:
 | `DEEPSEEK_BASE_URL` | API base URL |
 | `DEEPSEEK_HTTP_HEADERS` | Optional custom model request headers, e.g. `X-Model-Provider-Id=your-model-provider` |
 | `DEEPSEEK_MODEL` | Default model |
-| `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` | Stream idle timeout in seconds, default `300`, clamped to `1..=3600` |
+| `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` | Legacy stream idle timeout env override, default `300`, clamped to `1..=3600`; `[tui].stream_chunk_timeout_secs` takes precedence when configured |
 | `CODEWHALE_PROVIDER` / `DEEPSEEK_PROVIDER` | `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, `ollama`, `huggingface` |
 | `DEEPSEEK_PROFILE` | Config profile name |
 | `DEEPSEEK_MEMORY` | Set to `on` to enable user memory |
 | `DEEPSEEK_ALLOW_INSECURE_HTTP=1` | Allow non-local `http://` API base URLs on trusted networks |
-| `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `ATLASCLOUD_API_KEY` / `WANJIE_ARK_API_KEY` / `VOLCENGINE_API_KEY` / `VOLCENGINE_ARK_API_KEY` / `ARK_API_KEY` / `OPENROUTER_API_KEY` / `XIAOMI_MIMO_API_KEY` / `XIAOMI_API_KEY` / `MIMO_API_KEY` / `NOVITA_API_KEY` / `FIREWORKS_API_KEY` / `SILICONFLOW_API_KEY` / `ARCEE_API_KEY` / `MOONSHOT_API_KEY` / `KIMI_API_KEY` / `SGLANG_API_KEY` / `VLLM_API_KEY` / `OLLAMA_API_KEY` / `HUGGINGFACE_API_KEY` / `HF_TOKEN` | Provider auth |
+| `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `ATLASCLOUD_API_KEY` / `WANJIE_ARK_API_KEY` / `VOLCENGINE_API_KEY` / `VOLCENGINE_ARK_API_KEY` / `ARK_API_KEY` / `OPENROUTER_API_KEY` / `XIAOMI_MIMO_TOKEN_PLAN_API_KEY` / `MIMO_TOKEN_PLAN_API_KEY` / `XIAOMI_MIMO_API_KEY` / `XIAOMI_API_KEY` / `MIMO_API_KEY` / `NOVITA_API_KEY` / `FIREWORKS_API_KEY` / `SILICONFLOW_API_KEY` / `ARCEE_API_KEY` / `MOONSHOT_API_KEY` / `KIMI_API_KEY` / `SGLANG_API_KEY` / `VLLM_API_KEY` / `OLLAMA_API_KEY` / `HUGGINGFACE_API_KEY` / `HF_TOKEN` | Provider auth |
 | `OPENAI_BASE_URL` / `OPENAI_MODEL` | Generic OpenAI-compatible endpoint and model ID |
 | `ATLASCLOUD_BASE_URL` / `ATLASCLOUD_MODEL` | AtlasCloud endpoint and model override |
 | `WANJIE_ARK_BASE_URL` / `WANJIE_ARK_MODEL` | Wanjie Ark endpoint and model override |
 | `VOLCENGINE_BASE_URL` / `VOLCENGINE_ARK_BASE_URL` / `ARK_BASE_URL` / `VOLCENGINE_MODEL` / `VOLCENGINE_ARK_MODEL` | Volcengine Ark endpoint and model override |
 | `OPENROUTER_BASE_URL` | OpenRouter endpoint override |
-| `XIAOMI_MIMO_BASE_URL` / `MIMO_BASE_URL` / `XIAOMI_MIMO_MODEL` / `MIMO_MODEL` | Xiaomi MiMo endpoint and model override; Token Plan default is `https://token-plan-sgp.xiaomimimo.com/v1` |
+| `XIAOMI_MIMO_BASE_URL` / `MIMO_BASE_URL` / `XIAOMI_MIMO_MODEL` / `MIMO_MODEL` / `XIAOMI_MIMO_MODE` / `MIMO_MODE` | Xiaomi MiMo endpoint, model, and Token Plan mode override; Token Plan default is `https://token-plan-sgp.xiaomimimo.com/v1` |
 | `NOVITA_BASE_URL` | Novita endpoint override |
 | `FIREWORKS_BASE_URL` | Fireworks endpoint override |
 | `SILICONFLOW_BASE_URL` / `SILICONFLOW_MODEL` | SiliconFlow endpoint and model override |
@@ -602,25 +526,30 @@ Key environment variables:
 | `OLLAMA_MODEL` | Self-hosted Ollama model tag |
 | `HUGGINGFACE_API_KEY` / `HF_TOKEN` / `HUGGINGFACE_BASE_URL` / `HUGGINGFACE_MODEL` | Hugging Face endpoint and model override |
 | `NO_ANIMATIONS=1` | Force accessibility mode at startup |
-| `SSL_CERT_FILE` | Custom CA bundle for corporate proxies |
+| `SSL_CERT_FILE` | Custom CA bundle for corporate proxies; prefer this over provider-local `insecure_skip_tls_verify` |

 Set `locale` in `settings.toml`, use `/config locale zh-Hans`, or rely on `LC_ALL`/`LANG` to choose UI chrome and the fallback language sent to V4 models. The latest user message still wins for natural-language reasoning and replies, so Chinese user turns stay Chinese even on an English system locale. See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) and [docs/MCP.md](docs/MCP.md).

 ---

-## Models & Pricing
+## Models & Cost Tracking

-| Model | Context | Input (cache hit) | Input (cache miss) | Output |
-|---|---|---|---|---|
-| `deepseek-v4-pro` | 1M | $0.003625 / 1M | $0.435 / 1M | $0.87 / 1M |
-| `deepseek-v4-flash` | 1M | $0.0028 / 1M | $0.14 / 1M | $0.28 / 1M |
+CodeWhale tracks the provider route, concrete model, prompt-cache hit/miss
+estimate, input tokens, and output tokens for the turn that actually ran. Auto
+mode is resolved before the upstream request, so the footer and session summary
+charge against `deepseek-v4-flash`, `deepseek-v4-pro`, or the explicit provider
+model selected for that turn.

-DeepSeek Platform defaults to `https://api.deepseek.com/beta` so beta-gated API features can be tested without extra setup. Set `base_url = "https://api.deepseek.com"` to opt out.
+Pricing changes over time and can vary by account, region, provider route, and
+promotion. Use [docs/PROVIDERS.md](docs/PROVIDERS.md) for supported model IDs
+and the provider's official pricing pages for billing decisions. Treat the TUI
+cost display as a local estimate, not a receipt.

-Legacy aliases `deepseek-chat` / `deepseek-reasoner` map to `deepseek-v4-flash` and retire after July 24, 2026. NVIDIA NIM variants use your NVIDIA account terms.
-
-> [!Note]
-> DeepSeek's pricing page now lists the V4 Pro rates above as the permanent prices: the previous 75% promotional discount has been folded into a one-quarter base-rate adjustment as the promotion window closes on 15:59 UTC on 31 May 2026. The TUI cost estimator already uses these values, so no behavioural change is required. For any future price changes, consult the official [DeepSeek pricing page](https://api-docs.deepseek.com/zh-cn/quick_start/pricing).
+DeepSeek Platform defaults to `https://api.deepseek.com/beta` so beta-gated API
+features can be tested without extra setup. Set `base_url =
+"https://api.deepseek.com"` to opt out. Legacy aliases `deepseek-chat` /
+`deepseek-reasoner` remain compatibility shims; prefer V4 model IDs for new
+config. NVIDIA NIM variants use your NVIDIA account terms.

 ---
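
The cost-tracking paragraphs above describe pricing cache-hit input, cache-miss input, and output tokens separately for the turn that ran. A minimal Python sketch of that arithmetic follows. The `Rates` dataclass, the `estimate_turn_cost` function, and any rates passed in are hypothetical placeholders for illustration; they are not CodeWhale's estimator or any provider's actual prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token rates in one currency (hypothetical; caller-supplied)."""
    input_cache_hit: float
    input_cache_miss: float
    output: float


def estimate_turn_cost(rates, cache_hit_tokens, cache_miss_tokens, output_tokens):
    """Return a local cost estimate for one turn, not a billing receipt."""
    tokens_per_unit = 1_000_000  # rates are quoted per one million tokens
    total = (
        cache_hit_tokens * rates.input_cache_hit
        + cache_miss_tokens * rates.input_cache_miss
        + output_tokens * rates.output
    )
    return total / tokens_per_unit
```

Keeping the rates as an argument, rather than hard-coding them, matches the note that pricing varies by account, region, provider route, and promotion.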

@@ -673,11 +602,15 @@ without recreating skills the user deliberately deleted.
 | [TENCENT_CLOUD_REMOTE_FIRST.md](docs/TENCENT_CLOUD_REMOTE_FIRST.md) | Tencent/CNB/Lighthouse/Feishu remote-first path |
 | [TENCENT_LIGHTHOUSE_HK.md](docs/TENCENT_LIGHTHOUSE_HK.md) | Lighthouse Hong Kong server setup |
 | [MEMORY.md](docs/MEMORY.md) | User memory feature guide |
+| [AGENT_ETHOS.md](docs/AGENT_ETHOS.md) | Maintainer and agent stewardship posture |
 | [SUBAGENTS.md](docs/SUBAGENTS.md) | Sub-agent role taxonomy and lifecycle |
 | [KEYBINDINGS.md](docs/KEYBINDINGS.md) | Full shortcut catalog |
 | [RELEASE_RUNBOOK.md](docs/RELEASE_RUNBOOK.md) | Release process |
 | [LOCALIZATION.md](docs/LOCALIZATION.md) | UI locale matrix & switching |
 | [OPERATIONS_RUNBOOK.md](docs/OPERATIONS_RUNBOOK.md) | Ops & recovery |
+| [V0_9_0_RELEASE_ACCEPTANCE.md](docs/V0_9_0_RELEASE_ACCEPTANCE.md) | v0.9.0 pre-tag acceptance matrix and release gates |
+| [HARNESS_PROFILE_CUTLINE.md](docs/HARNESS_PROFILE_CUTLINE.md) | HarnessProfile schema, resolver, and runtime boundary for v0.9 |
+| [2574-provider-fallback-chain.md](docs/rfcs/2574-provider-fallback-chain.md) | Provider fallback chain RFC |

 Full Changelog: [CHANGELOG.md](CHANGELOG.md).

@@ -690,7 +623,80 @@ Full Changelog: [CHANGELOG.md](CHANGELOG.md).
 - **[OpenWarp](https://github.com/zerx-lab/warp)** — thank you for prioritizing codewhale support and for collaborating on a better terminal-agent experience.
 - **[Open Design](https://github.com/nexu-io/open-design)** — thank you for support and collaboration around design-forward agent workflows.

-This project ships with help from a growing community of contributors:
+This project ships with help from a growing community of contributors. The
+maintainer rule is simple: reports and PRs are real project work, even when the
+final patch has to be narrowed, delayed, or harvested into a maintainer branch.
+
+For the v0.9 track, harvested PRs should keep visible credit in the commit or
+PR body, changelog or release notes, and relevant issue/PR comments. Contributor
+credit should use mappable GitHub identities from `.github/AUTHOR_MAP` or
+numeric noreply addresses, not placeholder local emails. The contribution gate
+is kept in dry-run mode unless a maintainer deliberately enables enforcement;
+when it comments, the tone should be warm and practical rather than treating
+the reporter as the problem. Recurring contributors should be recognized so the
+automation gets out of their way and the public record shows their repeated
+help.
+
+Current v0.9 track credits:
+
+- **[xyuai](https://github.com/xyuai)** — canonical CodeWhale settings path,
+  provider persistence, provider picker, logout-scope, and MiMo auth cleanup
+  work (#2730, #2714, #2715, #2717, #2718)
+- **[shenjackyuanjie](https://github.com/shenjackyuanjie)** — HarmonyOS /
+  OpenHarmony porting work and MatePad Edge validation trail (#2634)
+- **[ousamabenyounes](https://github.com/ousamabenyounes)** — AZERTY/AltGr
+  composer shortcut fix for Windows keyboard layouts (#2863, #2867)
+- **[reidliu41](https://github.com/reidliu41)** — hotbar action-registry
+  foundation and Ollama model-completion cleanup for the v0.9 track (#2866,
+  #2742)
+- **[ljm3790865](https://github.com/ljm3790865)** — multi-tab
+  core/persistence foundation and broader tab collaboration direction (#2864,
+  #2753)
+- **[sximelon](https://github.com/sximelon)** — saved-session resume footer
+  hint work plus provider-trait metadata registry direction reviewed and
+  harvested for the v0.9 track (#2758, #2760, #2479)
+- **[aboimpinto](https://github.com/aboimpinto)** — sidebar command polish and
+  pausable custom-command lifecycle direction harvested into the v0.9 track,
+  plus the directly merged command-support boundary cleanup and broader command
+  layer design direction (#2788, #2732, #2871, #2851, #2791)
+- **[AdityaVG13](https://github.com/AdityaVG13)** — WhaleFlow orchestration and
+  cost-tracking drafts that shaped the maintained v0.9 WhaleFlow IR and
+  TraceStore foundation (#2482, #2486)
+- **[lbcheng888](https://github.com/lbcheng888)**,
+  **[AiurArtanis](https://github.com/AiurArtanis)**, and
+  **[nasus9527](https://github.com/nasus9527)** — VS Code extension scaffold
+  direction, Agent View request, and IDE plugin request that shaped the
+  official Phase 0 extension (#1022, #1584, #2580)
+- **[HUQIANTAO](https://github.com/HUQIANTAO)** — `web_run` cache-state
+  lock-splitting, turn-metadata prefix-cache stability, and project-context
+  cache work (#2502, #2517, #2636)
+- **[idling11](https://github.com/idling11)** — PlanArtifact continuity,
+  dense tool-call transcript collapse, sidebar detail popovers, and
+  HarnessPosture provider/model policy direction (#2733, #2738, #2734,
+  #2741, #2692, #2694, #2693)
+- **[h3c-hexin](https://github.com/h3c-hexin)** — sub-agent model inheritance,
+  configured `skills_dir` discovery, prompt-environment stability, and static
+  prompt composer direction (#2736, #2737, #2786)
+- **[gaord](https://github.com/gaord)** — runtime thread workspace updates and
+  completed-thread saved-session API work (#2640, #2639)
+- **[cyq1017](https://github.com/cyq1017)** — trusted workspace MCP config,
+  provider auth rollback, custom search endpoint, custom completion sound,
+  restore-listing, and pending-input delivery-mode label work (#2751, #2755,
+  #2510, #2512, #2513, #2532, #2054)
+- **[yusufgurdogan](https://github.com/yusufgurdogan)** — Sofya search
+  provider implementation harvested as a non-default search backend (#2790)
+- **[LeoAlex0](https://github.com/LeoAlex0)** — runtime prompt metadata cache
+  direction harvested into the v0.9 prompt/cache path (#2687)
+- **[NASLXTO](https://github.com/NASLXTO)** and
+  **[wuxixing](https://github.com/wuxixing)** — large-workspace startup
+  reports that shaped the bounded project-context fallback (#697, #1827)
+- **[shuxiangxuebiancheng](https://github.com/shuxiangxuebiancheng)**,
+  **[hongqitai](https://github.com/hongqitai)**, and
+  **[cyq1017](https://github.com/cyq1017)** — third-party
+  OpenAI-compatible path-suffix report and follow-up review trail (#1874,
+  #2508, #2506)
+
+Current and recurring contributors include:

 - **[merchloubna70-dot](https://github.com/merchloubna70-dot)** — 28 PRs spanning features, fixes, and VS Code extension scaffolding (#645–#681)
 - **[WyxBUPT-22](https://github.com/WyxBUPT-22)** — Markdown rendering for tables, bold/italic, and horizontal rules (#579)
@@ -742,7 +748,10 @@ This project ships with help from a growing community of contributors:
 - **[Aitensa](https://github.com/Aitensa)** — CJK wrapping propagation for diff and pager output (#1622)
 - **[qiyan233](https://github.com/qiyan233)** — legacy DeepSeek CN provider alias compatibility (#1645)
 - **[zlh124](https://github.com/zlh124)** — WSL2/headless startup report, clipboard-init fix, CodeWhale tab-title polish, localized context-menu labels, and approval-dialog fixes (#1772, #1773, #2319, #2320, #2325)
-- **[aboimpinto](https://github.com/aboimpinto)** — Windows alt-screen logging, Home/End composer, and runtime log follow-ups (#1774, #1776, #1748, #1749, #1782, #1783)
+- **[aboimpinto](https://github.com/aboimpinto)** — Windows alt-screen
+  logging, Home/End composer, runtime log follow-ups, sidebar command polish,
+  and pausable command lifecycle work (#1774, #1776, #1748, #1749, #1782,
+  #1783, #2788, #2732)
 - **[LeoLin990405](https://github.com/LeoLin990405)** — provider model passthrough, reasoning replay, thinking-only turn, and Windows quoting fixes (#1740, #1743, #1742, #1744)
 - **[nightt5879](https://github.com/nightt5879)** — Ctrl+C prompt restore, provider registry drift docs, tool-search defaults, footer git branch display, and startup prompt interactivity (#1764, #2274, #2344, #2347, #2373)
 - **[donglovejava](https://github.com/donglovejava)** — paste @file consolidation, CJK panic fix, user feedback, RLM routing, edit_file retry, hidden-worktree discovery skip, IME composer routing, and eager shell companion tools (#2154-#2168, #2302, #2329, #2330, #2331)
@@ -764,7 +773,8 @@ This project ships with help from a growing community of contributors:
 - **[yuanchenglu](https://github.com/yuanchenglu)** — Feishu per-chat model switching (#2149)
 - **[HUQIANTAO](https://github.com/HUQIANTAO)** — Xiaomi balance/status work, stalled-turn recovery, approval intent summaries, mobile smoke/QR support, Claude theme, and broad docs/test/CI coverage (#2257, #2267, #2283, #2384, #2385, #2389, #2403, #2440-#2458, #2460)
 - **[h3c-hexin](https://github.com/h3c-hexin)** — web-search URL decoding, prompt/instructions override hooks, sub-agent guidance, SSRF fake-IP trust configuration, and prompt-cache-friendly environment placement (#2245, #2311, #2313, #2314, #2354, #2355, #2356)
-- **[AresNing](https://github.com/AresNing)** — first-run guide and message-submit hook transform design harvested into the maintained hooks path (#2278, #2318, #2434)
+- **[tdccccc](https://github.com/tdccccc)** — approval prompt key-detail and shell-preview work harvested into the maintained approval path (#1991, #2269)
+- **[AresNing](https://github.com/AresNing)** — first-run guide, message-submit hook transform design, and turn-end observer hook work harvested into the maintained hooks path (#2278, #2318, #2434, #2578)
 - **[Implementist](https://github.com/Implementist)** — Volcengine Ark search provider and reliability hardening (#2426, #2429, #2439)
 - **[lihuan215](https://github.com/lihuan215)** — Unix socket hook sink design harvested into the opt-in hook event path (#2333, #2430)
 - **[AdityaVG13](https://github.com/AdityaVG13)** — Xiaomi MiMo provider support (#2246)
@@ -802,6 +812,21 @@ credit: **[@buko](https://github.com/buko)**, **[@yyyCode](https://github.com/yy

 See [CONTRIBUTING.md](CONTRIBUTING.md). Pull requests welcome — check the [open issues](https://github.com/Hmbown/CodeWhale/issues) for good first contributions.

+CodeWhale gets a lot of good reports and PRs. The maintainer posture is to keep
+that door open while protecting release quality:
+
+- Issues should stay human-readable and actionable. Intake automation is
+  advisory unless a maintainer deliberately enables enforcement.
+- PRs are reviewed from code, tests, linked issues, and runtime behavior, not
+  from title alone.
+- If a PR is too broad to merge directly, maintainers may harvest the safe part
+  into a narrower branch, then credit the author and explain what landed.
+- Co-author trailers should use mappable GitHub noreply identities from
+  `.github/AUTHOR_MAP`; reporters and repro authors should be thanked in
+  changelogs, release notes, and closure comments.
+- Recurring contributors can be added to `.github/APPROVED_CONTRIBUTORS` so
+  dry-run gates stay out of their way.
+
 Support: [Buy me a coffee](https://www.buymeacoffee.com/hmbown).

 > [!Note]
@@ -183,6 +183,8 @@ Hãy chỉ định mô hình hoặc cấp độ suy nghĩ cố định nếu b

 The `npm i -g codewhale` install command works on glibc-based Linux ARM64 from v0.8.8 onward. You can also download the prebuilt binaries directly from the [Releases page](https://github.com/Hmbown/CodeWhale/releases) and place them side by side in a directory on your `PATH`.

+See [docs/HarmonyOS.md](docs/HarmonyOS.md) for HarmonyOS PC setup and OpenHarmony cross-builds.
+
 ### Mirror-Friendly Install (in China)

 If downloads from GitHub or npm are slow from mainland China, use a registry mirror for Cargo:
@@ -186,6 +186,8 @@ Auto 模式同时控制两个设置：

 Starting with v0.8.8, `npm i -g codewhale` supports glibc-based ARM64 Linux directly. You can also download prebuilt binaries from the [Releases page](https://github.com/Hmbown/CodeWhale/releases) and place them in a directory on your `PATH`.

+See [docs/HarmonyOS.md](docs/HarmonyOS.md) for running on HarmonyOS PC and for OpenHarmony cross-compilation setup.
+
 ### Mainland China / Mirror-Friendly Install

 If GitHub or npm downloads are slow from mainland China, you can install through a Cargo registry mirror:
@@ -270,6 +272,8 @@ codewhale --provider openrouter --model qwen/qwen3.7-max
 codewhale auth set --provider xiaomi-mimo --api-key "YOUR_XIAOMI_MIMO_API_KEY"
 codewhale --provider xiaomi-mimo --model mimo-v2.5-pro
 codewhale --provider xiaomi-mimo speech "Hello from MiMo" --model tts -o hello.wav
+XIAOMI_MIMO_TOKEN_PLAN_API_KEY="tp-..." XIAOMI_MIMO_MODE="token-plan-sgp" \
+  codewhale --provider xiaomi-mimo --model mimo-v2.5-pro

 # Novita
 codewhale auth set --provider novita --api-key "YOUR_NOVITA_API_KEY"
@@ -425,13 +429,13 @@ DeepSeek 可作为自定义 Agent Client Protocol 服务器运行，供 Zed 等
 | `DEEPSEEK_PROFILE` | Config profile name |
 | `DEEPSEEK_MEMORY` | Set to `on` to enable user memory |
 | `DEEPSEEK_ALLOW_INSECURE_HTTP=1` | Allow non-local `http://` API base URLs on trusted networks |
-| `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `ATLASCLOUD_API_KEY` / `WANJIE_ARK_API_KEY` / `VOLCENGINE_API_KEY` / `ARK_API_KEY` / `OPENROUTER_API_KEY` / `XIAOMI_MIMO_API_KEY` / `MIMO_API_KEY` / `NOVITA_API_KEY` / `FIREWORKS_API_KEY` / `SILICONFLOW_API_KEY` / `MOONSHOT_API_KEY` / `KIMI_API_KEY` / `SGLANG_API_KEY` / `VLLM_API_KEY` / `OLLAMA_API_KEY` / `HUGGINGFACE_API_KEY` / `HF_TOKEN` | Provider auth |
+| `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `ATLASCLOUD_API_KEY` / `WANJIE_ARK_API_KEY` / `VOLCENGINE_API_KEY` / `ARK_API_KEY` / `OPENROUTER_API_KEY` / `XIAOMI_MIMO_TOKEN_PLAN_API_KEY` / `MIMO_TOKEN_PLAN_API_KEY` / `XIAOMI_MIMO_API_KEY` / `MIMO_API_KEY` / `NOVITA_API_KEY` / `FIREWORKS_API_KEY` / `SILICONFLOW_API_KEY` / `MOONSHOT_API_KEY` / `KIMI_API_KEY` / `SGLANG_API_KEY` / `VLLM_API_KEY` / `OLLAMA_API_KEY` / `HUGGINGFACE_API_KEY` / `HF_TOKEN` | Provider auth |
 | `OPENAI_BASE_URL` / `OPENAI_MODEL` | Generic OpenAI-compatible endpoint and model ID |
 | `ATLASCLOUD_BASE_URL` / `ATLASCLOUD_MODEL` | AtlasCloud endpoint and model override |
 | `WANJIE_ARK_BASE_URL` / `WANJIE_ARK_MODEL` | Wanjie Ark endpoint and model override |
 | `VOLCENGINE_BASE_URL` / `ARK_BASE_URL` / `VOLCENGINE_MODEL` / `ARK_MODEL` | Volcengine Ark endpoint and model override |
 | `OPENROUTER_BASE_URL` | OpenRouter endpoint override |
-| `XIAOMI_MIMO_BASE_URL` / `MIMO_BASE_URL` / `XIAOMI_MIMO_MODEL` / `MIMO_MODEL` | Xiaomi MiMo endpoint and model override |
+| `XIAOMI_MIMO_BASE_URL` / `MIMO_BASE_URL` / `XIAOMI_MIMO_MODEL` / `MIMO_MODEL` / `XIAOMI_MIMO_MODE` / `MIMO_MODE` | Xiaomi MiMo endpoint, model, and Token Plan mode override |
 | `NOVITA_BASE_URL` | Novita endpoint override |
 | `FIREWORKS_BASE_URL` | Fireworks endpoint override |
 | `SILICONFLOW_BASE_URL` / `SILICONFLOW_MODEL` | SiliconFlow endpoint and model override |
@@ -88,6 +88,26 @@ cost_currency = "usd" # usd | cny
 check_for_updates = true
 # update_uri = "https://internal.mirror.example/codewhale/releases/latest"

+# ─────────────────────────────────────────────────────────────────────────────────
+# Hotbar slots (#2061 / #2064)
+# ─────────────────────────────────────────────────────────────────────────────────
+# Optional 1-8 sidebar hotbar bindings. When no [[hotbar]] tables are present,
+# the TUI uses built-in defaults:
+#   1 voice.toggle      2 session.compact   3 mode.plan        4 mode.agent
+#   5 mode.yolo         6 palette.open      7 sidebar.toggle   8 trust.toggle
+#
+# Invalid slots are skipped with a warning, duplicate slots use the last entry,
+# and unknown actions are preserved so the UI can show a disabled placeholder.
+#
+# [[hotbar]]
+# slot = 1
+# label = "voice"
+# action = "voice.toggle"
+#
+# [[hotbar]]
+# slot = 2
+# action = "session.compact"
+
 # ─────────────────────────────────────────────────────────────────────────────────
 # Paths
 # ─────────────────────────────────────────────────────────────────────────────────
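
The hotbar comments above state three rules: invalid slots are skipped with a warning, the last duplicate wins, and unknown actions are preserved so the UI can show a disabled placeholder. A minimal Python sketch of that resolution follows, assuming the `[[hotbar]]` tables are already parsed into dictionaries. The names `resolve_hotbar`, `KNOWN_ACTIONS`, and the `enabled` flag are illustrative, and the choice to bind only the listed slots once any table is present is an assumption rather than confirmed CodeWhale behavior.

```python
import warnings

# Built-in defaults as listed in the config comments (slot -> action).
DEFAULT_HOTBAR = {
    1: "voice.toggle", 2: "session.compact", 3: "mode.plan", 4: "mode.agent",
    5: "mode.yolo", 6: "palette.open", 7: "sidebar.toggle", 8: "trust.toggle",
}
# Assumption: the known actions are exactly the default ones.
KNOWN_ACTIONS = frozenset(DEFAULT_HOTBAR.values())


def resolve_hotbar(entries):
    """Map parsed [[hotbar]] tables to {slot: {"action", "label", "enabled"}}."""
    if not entries:
        # No [[hotbar]] tables present: use the built-in defaults.
        return {
            slot: {"action": action, "label": None, "enabled": True}
            for slot, action in DEFAULT_HOTBAR.items()
        }
    resolved = {}
    for entry in entries:
        slot = entry.get("slot")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(slot, bool) or not isinstance(slot, int) or not 1 <= slot <= 8:
            warnings.warn(f"skipping hotbar entry with invalid slot {slot!r}")
            continue
        action = entry.get("action", "")
        # Later entries overwrite earlier ones, so the last duplicate wins.
        resolved[slot] = {
            "action": action,
            "label": entry.get("label"),
            # Unknown actions are kept but flagged as a disabled placeholder.
            "enabled": action in KNOWN_ACTIONS,
        }
    return resolved
```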
@@ -144,11 +164,12 @@ memory_path = "~/.codewhale/memory.md"
 allow_shell = true
 approval_policy = "on-request" # on-request | untrusted | never
 sandbox_mode = "workspace-write" # read-only | workspace-write | danger-full-access | external-sandbox
+# prompt_suggestion = true  # opt-in: show ghost-text follow-up question in composer after each turn

 # Typed permission rules live in a sibling `permissions.toml` file, not in
-# config.toml. This schema slice is ask-only and is parsed for follow-up
-# approval-flow wiring; allow/deny records and UI persistence are intentionally
-# out of scope here.
+# config.toml. This shape is ask-only and feeds the execution policy engine;
+# allow/deny records, glob expansion, and UI persistence are intentionally out
+# of scope here.
 #
 # Example ~/.codewhale/permissions.toml:
 #
@@ -239,7 +260,7 @@ max_subagents = 10 # optional (1-20)
 #   Volcengine Ark: VOLCENGINE_API_KEY (or VOLCENGINE_ARK_API_KEY / ARK_API_KEY), VOLCENGINE_BASE_URL, VOLCENGINE_MODEL
 #   OpenRouter: OPENROUTER_API_KEY, OPENROUTER_BASE_URL, OPENROUTER_MODEL
 #   Xiaomi MiMo: XIAOMI_MIMO_API_KEY (or XIAOMI_API_KEY / MIMO_API_KEY), XIAOMI_MIMO_BASE_URL, XIAOMI_MIMO_MODEL
-#                Token Plan keys (`tp-...`) default to https://token-plan-sgp.xiaomimimo.com/v1.
+#                Token Plan: XIAOMI_MIMO_TOKEN_PLAN_API_KEY (or MIMO_TOKEN_PLAN_API_KEY), XIAOMI_MIMO_MODE/MIMO_MODE
 #   Novita:     NOVITA_API_KEY, NOVITA_BASE_URL, NOVITA_MODEL
 #   Fireworks:  FIREWORKS_API_KEY, FIREWORKS_BASE_URL
 #   SiliconFlow: SILICONFLOW_API_KEY, SILICONFLOW_BASE_URL, SILICONFLOW_MODEL
@@ -248,7 +269,7 @@ max_subagents = 10 # optional (1-20)
 #   SGLang:    SGLANG_BASE_URL, SGLANG_MODEL, optional SGLANG_API_KEY
 #   vLLM:      VLLM_BASE_URL, VLLM_MODEL, optional VLLM_API_KEY
 #   Ollama:    OLLAMA_BASE_URL, OLLAMA_MODEL, optional OLLAMA_API_KEY
-#   Hugging Face: HUGGINGFACE_API_KEY (or HF_TOKEN), HUGGINGFACE_BASE_URL, HUGGINGFACE_MODEL
+#   Hugging Face: HUGGINGFACE_API_KEY (or HF_TOKEN), HUGGINGFACE_BASE_URL (or HF_BASE_URL), HUGGINGFACE_MODEL (or HF_MODEL)
 #
 # Custom DeepSeek-compatible APIs usually do not need a new provider table:
 # set `provider = "deepseek"` and override [providers.deepseek].base_url/model.
@@ -274,6 +295,7 @@ max_subagents = 10 # optional (1-20)
 # model = "deepseek-ai/DeepSeek-V4-Pro"
 # http_headers = { "X-Model-Provider-Id" = "your-model-provider" } # optional custom request headers
 # path_suffix = "/chat/completions" # override the API path; skips /v1 versioning when set
+# insecure_skip_tls_verify = true # last resort for private gateways; prefer SSL_CERT_FILE

 # NVIDIA NIM-hosted DeepSeek V4 (https://build.nvidia.com)
 [providers.nvidia_nim]
@@ -292,6 +314,7 @@ max_subagents = 10 # optional (1-20)
 # Gateway example:
 # base_url = "https://gateway.example/v1"
 # model = "your-deepseek-compatible-model"
+# insecure_skip_tls_verify = true # last resort for private gateways; prefer SSL_CERT_FILE

 # AtlasCloud OpenAI-compatible endpoint (https://www.atlascloud.ai/docs/models/llm)
 [providers.atlascloud]
@@ -329,6 +352,11 @@ max_subagents = 10 # optional (1-20)
 # # base_url = "https://api.xiaomimimo.com/v1"           # Pay-as-you-go / sk- keys
 # model = "mimo-v2.5-pro"              # chat/reasoning
 # Chat model IDs: mimo-v2.5-pro, mimo-v2.5
+# Token Plan subscriptions use separate tp-* API keys plus api-key auth.
+# mode = "token-plan-sgp"              # default Token Plan endpoint
+# mode = "token-plan-cn"               # China cluster
+# mode = "token-plan-ams"              # Europe cluster
+# mode = "pay-as-you-go"               # standard API / sk- keys
 # TTS aliases are also accepted by `codewhale speech`: tts, voice-design, voice-clone
 # TTS model IDs: mimo-v2.5-tts, mimo-v2.5-tts-voicedesign, mimo-v2.5-tts-voiceclone, mimo-v2-tts

@@ -385,6 +413,8 @@ max_subagents = 10 # optional (1-20)
 # model = "deepseek-coder:1.3b"             # or any local Ollama tag

 # Hugging Face Inference Providers (https://huggingface.co/docs/api-inference)
+# Env var aliases: HUGGINGFACE_API_KEY / HF_TOKEN, HUGGINGFACE_BASE_URL / HF_BASE_URL,
+#                  HUGGINGFACE_MODEL / HF_MODEL
 [providers.huggingface]
 # api_key = "YOUR_HF_TOKEN"
 # base_url = "https://router.huggingface.co/v1"
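
The alias lists above (for example `HUGGINGFACE_API_KEY` / `HF_TOKEN`) imply a lookup that tries each name in turn. A minimal Python sketch follows, assuming the first listed name wins and that empty values count as unset. Both the order and the helper name `first_env` are assumptions for illustration, not confirmed CodeWhale behavior.

```python
import os


def first_env(*names, env=None):
    """Return the first non-empty value among the given env var names, else None."""
    source = os.environ if env is None else env
    for name in names:
        value = source.get(name, "").strip()
        if value:
            return value
    return None
```

For example, `first_env("HUGGINGFACE_API_KEY", "HF_TOKEN")` would prefer the long name and fall back to `HF_TOKEN` only when the first is unset or blank.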
@@ -399,7 +429,7 @@ max_subagents = 10 # optional (1-20)
 # API-backed search.
 #
 # [search]
-# provider = "duckduckgo"  # duckduckgo | bing | tavily | bocha | metaso | baidu | volcengine
+# provider = "duckduckgo"  # duckduckgo | bing | tavily | bocha | metaso | baidu | volcengine | sofya
 #                            # duckduckgo: HTML scrape with Bing fallback
 #                            # bing:       HTML scrape, no API key
 #                            # tavily:     https://tavily.com — AI search, needs api_key
@@ -409,15 +439,22 @@ max_subagents = 10 # optional (1-20)
 #                            # baidu:      Baidu AI Search via qianfan.baidubce.com, needs api_key
 #                            # volcengine: Volcengine Ark web_search (free tier: 20,000 calls/month), needs api_key
 #                            #             also falls back to the VOLCENGINE_API_KEY / VOLCENGINE_ARK_API_KEY / ARK_API_KEY env vars
-# api_key = "YOUR_SEARCH_KEY" # required for tavily, bocha, and baidu; optional for metaso
+#                            # sofya:      https://sofya.co — AI search returning full page
+#                            #             content (not snippets), needs api_key (ay_live_...);
+#                            #             also falls back to the SOFYA_API_KEY env var
+# base_url = "https://search.example/html/" # optional DuckDuckGo-compatible HTML endpoint
+# api_key = "YOUR_SEARCH_KEY" # required for tavily, bocha, baidu, volcengine, and sofya; optional for metaso
 #                             # WARNING: treat config.toml like a secret file when
 #                             # storing API keys. Prefer env vars for local smoke tests.
 #
 # Env-var overrides:
 #   DEEPSEEK_SEARCH_PROVIDER → search.provider
 #   DEEPSEEK_SEARCH_API_KEY  → search.api_key
+#   CODEWHALE_SEARCH_BASE_URL → search.base_url
+#   DEEPSEEK_SEARCH_BASE_URL  → search.base_url (legacy alias)
 #   METASO_API_KEY           → metaso key fallback
 #   BAIDU_SEARCH_API_KEY     → baidu key fallback
+#   SOFYA_API_KEY            → sofya key fallback

 # ─────────────────────────────────────────────────────────────────────────────────
 # Network Policy (#135)
@@ -469,6 +506,7 @@ max_subagents = 10 # optional (1-20)
 alternate_screen = "auto"   # auto/always use the TUI screen; never uses terminal scrollback
 mouse_capture = true        # true copies only transcript user/assistant text; false uses raw terminal selection/copy
 terminal_probe_timeout_ms = 500 # optional startup terminal-mode timeout (100-5000ms)
+stream_chunk_timeout_secs = 300 # optional SSE idle timeout per chunk (0 = default, 1-3600)
 osc8_links = true            # emit OSC 8 escapes around URLs (Cmd+click in iTerm2/Ghostty/Kitty/WezTerm/Terminal.app 13+); set false for terminals that misrender
 # Ordered footer chips shown in the TUI status line. Omit the key to use the
 # built-in default; set [] to hide all configurable chips. You can also edit
@@ -588,6 +626,26 @@ deepseek_v4_pro_prior = 3.5
 deepseek_v4_flash_prior = 4.2
 fallback_default_prior = 3.8

+# ─────────────────────────────────────────────────────────────────────────────────
+# Harness Profiles (preview schema; runtime consumption follows later)
+# ─────────────────────────────────────────────────────────────────────────────────
+# Harness profiles let future CodeWhale runtime slices select model-specific
+# prompt, context, tool, and subagent posture. v0.9 parses, validates, and can
+# resolve profiles for tests/status plumbing, but normal Agent and WhaleFlow
+# runs do not silently promote or mutate behavior from these profiles yet.
+#
+# [[harness_profiles]]
+# provider_route = "deepseek"
+# model_pattern = "deepseek-v4.*"
+#
+# [harness_profiles.posture]
+# kind = "cache-heavy"          # standard | cache-heavy | lean | custom
+# max_subagents = 10            # 0 means runtime default
+# prefer_codebase_search = false
+# compaction_strategy = "prefix-cache" # default | prefix-cache | aggressive
+# tool_surface = "full"              # full | read-only | auto
+# safety_posture = "standard"        # standard | strict | permissive
+
 # ─────────────────────────────────────────────────────────────────────────────────
 # Profile Example (for multiple environments)
 # ─────────────────────────────────────────────────────────────────────────────────
@@ -617,21 +675,20 @@ default_text_model = "deepseek-ai/deepseek-v4-pro"
 # method        = "auto"   # auto | osc9 | bel | off
 #                 auto: OSC 9 for iTerm.app / Ghostty / WezTerm.
 #                       On macOS / Linux, falls back to BEL.
-#                       On Windows, falls back to "off" — BEL maps to the
-#                       system error chime (SystemAsterisk / MB_OK), which
-#                       sounds like an error popup. Set method = "bel"
-#                       explicitly to opt back in (#583).
+#                       On Windows, BEL is routed through MessageBeep(MB_OK).
 #                 osc9: \x1b]9;<msg>\x07 (iTerm2-style; shows macOS notification)
 #                 bel:  plain \x07 beep
 #                 off:  disable entirely
 # threshold_secs = 30      # only notify when the turn took >= this many seconds
 # include_summary = false  # include elapsed time + cost in the notification body
-# completion_sound = "beep" # off | beep | bell — sound on turn completion (✅ marker)
+# completion_sound = "beep" # off | beep | bell | file — sound on turn completion (✅ marker)
+# sound_file = "E:\\google\\downloads\\notify.wav" # WAV used when completion_sound = "file" (Windows)
 [notifications]
 # method = "auto"
 # threshold_secs = 30
 # include_summary = false
 # completion_sound = "beep"
+# sound_file = "E:\\google\\downloads\\notify.wav"

 # ─────────────────────────────────────────────────────────────────────────────────
 # Workspace Snapshots (#137)
@@ -21,8 +21,10 @@ codewhale-state = { path = "../state", version = "0.8.54" }
 codewhale-tools = { path = "../tools", version = "0.8.54" }
 serde.workspace = true
 serde_json.workspace = true
+rustls.workspace = true
 tokio.workspace = true
 tower-http.workspace = true
+tracing.workspace = true
 uuid.workspace = true

 [dev-dependencies]
@@ -12,7 +12,6 @@ use axum::{Json, Router};
 use codewhale_agent::ModelRegistry;
 use codewhale_config::{CliRuntimeOverrides, ConfigStore};
 use codewhale_core::Runtime;
-use codewhale_execpolicy::ExecPolicyEngine;
 use codewhale_hooks::{HookDispatcher, JsonlHookSink, StdoutHookSink, UnixSocketHookSink};
 use codewhale_mcp::McpManager;
 use codewhale_protocol::{
@@ -277,14 +276,19 @@ async fn tool_handler(
    let cwd = req
        .cwd
        .unwrap_or_else(|| std::env::current_dir().unwrap_or_else(|_| PathBuf::from(".")));
-    match runtime
-        .invoke_tool(
-            req.call,
-            codewhale_execpolicy::AskForApproval::OnRequest,
-            &cwd,
-        )
-        .await
-    {
+    // Resolve approval policy from config instead of hardcoding.
+    let approval_mode = {
+        let cfg = state.config.read().await;
+        cfg.approval_policy
+            .as_deref()
+            .and_then(|p| match p.trim().to_ascii_lowercase().as_str() {
+                "auto" | "yolo" => Some(codewhale_execpolicy::AskForApproval::UnlessTrusted),
+                "never" | "deny" => Some(codewhale_execpolicy::AskForApproval::Never),
+                _ => None,
+            })
+            .unwrap_or(codewhale_execpolicy::AskForApproval::OnRequest)
+    };
+    match runtime.invoke_tool(req.call, approval_mode, &cwd).await {
        Ok(value) => Json(value),
        Err(err) => Json(json!({ "ok": false, "error": err.to_string() })),
    }
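Reviewer note: the approval-policy mapping in the hunk above can be sketched in isolation as below. This is a minimal illustration, not the crate's code: `ApprovalMode` is a hypothetical stand-in for `codewhale_execpolicy::AskForApproval`, and `approval_mode_from_config` is an illustrative helper name. It assumes the config value is an optional string, as the hunk suggests.

```rust
/// Hypothetical stand-in for `codewhale_execpolicy::AskForApproval`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalMode {
    OnRequest,
    UnlessTrusted,
    Never,
}

/// Maps an optional config string to an approval mode.
/// Matching ignores case and surrounding whitespace; a missing or
/// unrecognised value falls back to `OnRequest`.
fn approval_mode_from_config(policy: Option<&str>) -> ApprovalMode {
    policy
        .and_then(|p| match p.trim().to_ascii_lowercase().as_str() {
            "auto" | "yolo" => Some(ApprovalMode::UnlessTrusted),
            "never" | "deny" => Some(ApprovalMode::Never),
            _ => None,
        })
        .unwrap_or(ApprovalMode::OnRequest)
}
```

One consequence of this shape is that a typo in the config value does not produce an error; it quietly selects `OnRequest`, the same mode that was hardcoded before this change.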
@@ -314,6 +318,7 @@ async fn app_handler(
 fn build_state(config_path: Option<PathBuf>, auth_token: Option<String>) -> Result<AppState> {
    let store = ConfigStore::load(config_path.clone())?;
    let config = store.config.clone();
+    let exec_policy = store.exec_policy_engine();
    let registry = ModelRegistry::default();

    let state_db_path = config_path
@@ -344,7 +349,7 @@ fn build_state(config_path: Option<PathBuf>, auth_token: Option<String>) -> Resu
        state_store,
        Arc::new(ToolRegistry::default()),
        Arc::new(McpManager::default()),
-        ExecPolicyEngine::new(Vec::new(), Vec::new()),
+        exec_policy,
        hooks,
    );

@@ -879,7 +884,9 @@ async fn process_app_request(
            let message = result.err().map(|e| e.to_string());
            let snapshot = cfg.clone();
            drop(cfg);
-            let _ = persist_config(state, snapshot).await;
+            if let Err(e) = persist_config(state, snapshot).await {
+                tracing::error!("Failed to persist config after set: {e}");
+            }
            AppResponse {
                ok,
                data: json!({ "key": key, "value": value, "error": message }),
@@ -893,7 +900,9 @@ async fn process_app_request(
            let message = result.err().map(|e| e.to_string());
            let snapshot = cfg.clone();
            drop(cfg);
-            let _ = persist_config(state, snapshot).await;
+            if let Err(e) = persist_config(state, snapshot).await {
+                tracing::error!("Failed to persist config after unset: {e}");
+            }
            AppResponse {
                ok,
                data: json!({ "key": key, "error": message }),
@@ -1048,6 +1057,43 @@ mod tests {
        );
    }

+    #[tokio::test]
+    async fn build_state_loads_permissions_into_runtime_policy() {
+        let tmp = tempfile::tempdir().expect("tempdir");
+        let config_path = tmp.path().join("config.toml");
+        fs::write(&config_path, "api_key = \"sk-deepseek-secret\"\n").expect("write config");
+        fs::write(
+            tmp.path().join("permissions.toml"),
+            r#"
+            [[rules]]
+            tool = "exec_shell"
+            command = "cargo test"
+            "#,
+        )
+        .expect("write permissions");
+
+        let state = build_state(Some(config_path), None).expect("state");
+        let runtime = state.runtime.lock().await;
+        let decision = runtime
+            .exec_policy
+            .check(codewhale_execpolicy::ExecPolicyContext {
+                command: "cargo test --workspace",
+                cwd: "/workspace",
+                tool: Some("exec_shell"),
+                path: None,
+                ask_for_approval: codewhale_execpolicy::AskForApproval::UnlessTrusted,
+                sandbox_mode: Some("workspace-write"),
+            })
+            .expect("policy check");
+
+        assert!(decision.allow);
+        assert!(decision.requires_approval);
+        assert_eq!(
+            decision.matched_rule.as_deref(),
+            Some("tool=exec_shell command=cargo test")
+        );
+    }
+
    #[test]
    fn non_loopback_bind_without_auth_fails_fast() {
        let options = AppServerOptions {
@@ -1067,7 +1113,10 @@ mod tests {

    #[tokio::test]
    async fn stdio_transport_keeps_raw_config_get_for_legacy_clients() {
-        let state = build_state(None, None).expect("state");
+        let tmp = tempfile::tempdir().expect("tempdir");
+        let config_path = tmp.path().join("config.toml");
+        fs::write(&config_path, "").expect("write config");
+        let state = build_state(Some(config_path), None).expect("state");
        {
            let mut cfg = state.config.write().await;
            cfg.api_key = Some("sk-deepseek-secret".to_string());
@@ -27,6 +27,8 @@ struct Cli {

 #[tokio::main]
 async fn main() -> Result<()> {
+    install_rustls_crypto_provider();
+
    let cli = Cli::parse();
    let listen: SocketAddr = format!("{}:{}", cli.host, cli.port)
        .parse()
@@ -41,6 +43,10 @@ async fn main() -> Result<()> {
    .await
 }

+fn install_rustls_crypto_provider() {
+    let _ = rustls::crypto::ring::default_provider().install_default();
+}
+
 fn app_server_token_from_env() -> Option<String> {
    std::env::var("CODEWHALE_APP_SERVER_TOKEN")
        .ok()
@@ -15,12 +15,6 @@ path = "src/main.rs"
 name = "codew"
 path = "src/bin/codew_legacy_shim.rs"

-# Legacy alias — forwards to `codewhale` and prints a deprecation notice.
-# Will be removed in v0.9.0.
-[[bin]]
-name = "deepseek"
-path = "src/bin/deepseek_legacy_shim.rs"
-
 [dependencies]
 anyhow.workspace = true
 clap.workspace = true
@@ -38,6 +32,7 @@ dirs.workspace = true
 serde.workspace = true
 serde_json.workspace = true
 reqwest = { workspace = true, features = ["blocking"] }
+rustls.workspace = true
 semver.workspace = true
 tokio.workspace = true
 sha2.workspace = true
@@ -1,61 +0,0 @@
-//! Legacy `deepseek` alias.
-//!
-//! Forwards argv to the `codewhale` dispatcher and prints a one-line
-//! deprecation notice to stderr on each invocation. This binary exists
-//! for one release cycle to give existing installs a smooth path to the
-//! new name; it will be removed in v0.9.0. See `docs/REBRAND.md` for the
-//! full migration story.
-
-use std::env;
-use std::process::Command;
-
-fn main() {
-    eprintln!(
-        "warning: `deepseek` is deprecated; run `codewhale` instead. \
-         This alias will be removed in v0.9.0."
-    );
-    let args: Vec<String> = env::args_os()
-        .skip(1)
-        .map(|a| a.to_string_lossy().into_owned())
-        .collect();
-
-    let status = match spawn_codewhale(&args) {
-        Ok(s) => s,
-        Err(e) => {
-            eprintln!(
-                "error: failed to spawn `codewhale`: {e}. Is it on PATH? \
-                 Install with `cargo install codewhale-cli` or via npm/Homebrew."
-            );
-            std::process::exit(127);
-        }
-    };
-    std::process::exit(status.code().unwrap_or(1));
-}
-
-fn spawn_codewhale(args: &[String]) -> std::io::Result<std::process::ExitStatus> {
-    // Try PATH first.
-    match Command::new("codewhale").args(args).status() {
-        Ok(s) => return Ok(s),
-        Err(e) if e.kind() == std::io::ErrorKind::NotFound => {}
-        Err(e) => return Err(e),
-    }
-
-    // On Windows, after an update the sibling `codewhale.exe` may be in the
-    // same directory as this shim but not on PATH (#2006).
-    #[cfg(windows)]
-    {
-        if let Ok(exe_path) = env::current_exe()
-            && let Some(dir) = exe_path.parent()
-        {
-            let sibling = dir.join("codewhale.exe");
-            if sibling.is_file() {
-                return Command::new(sibling).args(args).status();
-            }
-        }
-    }
-
-    Err(std::io::Error::new(
-        std::io::ErrorKind::NotFound,
-        "codewhale not found on PATH or in sibling directory",
-    ))
-}
@@ -471,7 +471,13 @@ struct AppServerArgs {

 const MCP_SERVER_DEFINITIONS_KEY: &str = "mcp.server_definitions";

+fn install_rustls_crypto_provider() {
+    let _ = rustls::crypto::ring::default_provider().install_default();
+}
+
 pub fn run_cli() -> std::process::ExitCode {
+    install_rustls_crypto_provider();
+
    match run() {
        Ok(()) => std::process::ExitCode::SUCCESS,
        Err(err) => {
@@ -2965,6 +2971,7 @@ mod tests {
            api_key_source: Some(RuntimeApiKeySource::Keyring),
            base_url: "https://openai-compatible.example/v4".to_string(),
            auth_mode: Some("api_key".to_string()),
+            insecure_skip_tls_verify: false,
            output_mode: None,
            log_level: None,
            telemetry: false,
@@ -3024,6 +3031,7 @@ mod tests {
            api_key_source: Some(RuntimeApiKeySource::ConfigFile),
            base_url: "https://api.deepseek.com/beta".to_string(),
            auth_mode: Some("api_key".to_string()),
+            insecure_skip_tls_verify: false,
            output_mode: None,
            log_level: None,
            telemetry: false,
@@ -3079,6 +3087,7 @@ mod tests {
            api_key_source: Some(RuntimeApiKeySource::Keyring),
            base_url: "https://api.moonshot.ai/v1".to_string(),
            auth_mode: Some("api_key".to_string()),
+            insecure_skip_tls_verify: false,
            output_mode: None,
            log_level: None,
            telemetry: false,
@@ -3145,6 +3154,7 @@ mod tests {
            api_key_source: None,
            base_url: "https://openai-compatible.example/v4".to_string(),
            auth_mode: None,
+            insecure_skip_tls_verify: false,
            output_mode: None,
            log_level: None,
            telemetry: false,
@@ -3240,6 +3250,7 @@ mod tests {
                api_key_source: Some(RuntimeApiKeySource::Keyring),
                base_url: "http://localhost:8000/v1".to_string(),
                auth_mode: Some("api_key".to_string()),
+                insecure_skip_tls_verify: false,
                output_mode: None,
                log_level: None,
                telemetry: false,
@@ -20,6 +20,12 @@ use std::io::Write;

 /// Run the self-update workflow.
 pub fn run_update(beta: bool, check_only: bool, proxy_arg: Option<String>) -> Result<()> {
+    #[cfg(target_env = "ohos")]
+    {
+        let _ = (beta, check_only, proxy_arg);
+        bail!("self-update is not supported on HarmonyOS/OpenHarmony yet");
+    }
+
    let current_exe =
        std::env::current_exe().context("failed to determine current executable path")?;
    let targets = update_targets_for_exe(&current_exe);
@@ -353,6 +359,8 @@ pub(crate) fn validate_and_build_proxy(proxy_str: &str) -> Result<Proxy> {
 }

 fn update_http_client(proxy: Option<&Proxy>) -> Result<reqwest::blocking::Client> {
+    let _ = rustls::crypto::ring::default_provider().install_default();
+
    let mut builder = reqwest::blocking::Client::builder();
    if let Some(proxy) = proxy {
        builder = builder.proxy(proxy.clone());
@@ -0,0 +1,363 @@
+//! Built-in provider metadata.
+//!
+//! This module is a metadata foundation for collapsing provider drift over
+//! time. It deliberately does not mutate request bodies or choose fallback
+//! providers; runtime routing remains in `ConfigToml::resolve_runtime_options`.
+
+use super::{
+    DEFAULT_ARCEE_BASE_URL, DEFAULT_ARCEE_MODEL, DEFAULT_ATLASCLOUD_BASE_URL,
+    DEFAULT_ATLASCLOUD_MODEL, DEFAULT_DEEPSEEK_BASE_URL, DEFAULT_DEEPSEEK_MODEL,
+    DEFAULT_FIREWORKS_BASE_URL, DEFAULT_FIREWORKS_MODEL, DEFAULT_HUGGINGFACE_BASE_URL,
+    DEFAULT_HUGGINGFACE_MODEL, DEFAULT_MOONSHOT_BASE_URL, DEFAULT_MOONSHOT_MODEL,
+    DEFAULT_NOVITA_BASE_URL, DEFAULT_NOVITA_MODEL, DEFAULT_NVIDIA_NIM_BASE_URL,
+    DEFAULT_NVIDIA_NIM_MODEL, DEFAULT_OLLAMA_BASE_URL, DEFAULT_OLLAMA_MODEL,
+    DEFAULT_OPENAI_BASE_URL, DEFAULT_OPENAI_MODEL, DEFAULT_OPENROUTER_BASE_URL,
+    DEFAULT_OPENROUTER_MODEL, DEFAULT_SGLANG_BASE_URL, DEFAULT_SGLANG_MODEL,
+    DEFAULT_SILICONFLOW_BASE_URL, DEFAULT_SILICONFLOW_CN_BASE_URL, DEFAULT_SILICONFLOW_MODEL,
+    DEFAULT_VLLM_BASE_URL, DEFAULT_VLLM_MODEL, DEFAULT_VOLCENGINE_BASE_URL,
+    DEFAULT_VOLCENGINE_MODEL, DEFAULT_WANJIE_ARK_BASE_URL, DEFAULT_WANJIE_ARK_MODEL,
+    DEFAULT_XIAOMI_MIMO_BASE_URL, DEFAULT_XIAOMI_MIMO_MODEL, ProviderKind,
+};
+
+/// Wire protocol spoken by a provider.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum WireFormat {
+    /// OpenAI-compatible `/v1/chat/completions` style payloads.
+    ChatCompletions,
+}
+
+/// Static metadata for a built-in model provider.
+pub trait Provider: Send + Sync {
+    /// Provider enum variant represented by this entry.
+    fn kind(&self) -> ProviderKind;
+
+    /// Canonical provider identifier.
+    fn id(&self) -> &'static str {
+        self.kind().as_str()
+    }
+
+    /// Human-readable provider label for UIs and diagnostics.
+    fn display_name(&self) -> &'static str;
+
+    /// Default base URL used when no config/env/CLI override is present.
+    fn default_base_url(&self) -> &'static str;
+
+    /// Default model used when no config/env/CLI override is present.
+    fn default_model(&self) -> &'static str;
+
+    /// Environment variable candidates used for this provider's API key.
+    fn env_vars(&self) -> &'static [&'static str];
+
+    /// TOML table key under `[providers.<key>]`.
+    fn provider_config_key(&self) -> &'static str;
+
+    /// Wire format used by the provider.
+    fn wire(&self) -> WireFormat {
+        WireFormat::ChatCompletions
+    }
+}
+
+macro_rules! provider {
+    (
+        $struct_name:ident,
+        $kind:ident,
+        $display_name:literal,
+        $base_url:ident,
+        $model:ident,
+        [$($env_var:literal),* $(,)?],
+        $config_key:literal
+    ) => {
+        /// Zero-sized metadata entry for this built-in provider.
+        pub struct $struct_name;
+
+        impl Provider for $struct_name {
+            fn kind(&self) -> ProviderKind {
+                ProviderKind::$kind
+            }
+
+            fn display_name(&self) -> &'static str {
+                $display_name
+            }
+
+            fn default_base_url(&self) -> &'static str {
+                $base_url
+            }
+
+            fn default_model(&self) -> &'static str {
+                $model
+            }
+
+            fn env_vars(&self) -> &'static [&'static str] {
+                &[$($env_var),*]
+            }
+
+            fn provider_config_key(&self) -> &'static str {
+                $config_key
+            }
+        }
+    };
+}
+
+provider!(
+    Deepseek,
+    Deepseek,
+    "DeepSeek",
+    DEFAULT_DEEPSEEK_BASE_URL,
+    DEFAULT_DEEPSEEK_MODEL,
+    ["DEEPSEEK_API_KEY"],
+    "deepseek"
+);
+provider!(
+    NvidiaNim,
+    NvidiaNim,
+    "NVIDIA NIM",
+    DEFAULT_NVIDIA_NIM_BASE_URL,
+    DEFAULT_NVIDIA_NIM_MODEL,
+    ["NVIDIA_API_KEY", "NVIDIA_NIM_API_KEY", "DEEPSEEK_API_KEY"],
+    "nvidia_nim"
+);
+provider!(
+    Openai,
+    Openai,
+    "OpenAI-compatible",
+    DEFAULT_OPENAI_BASE_URL,
+    DEFAULT_OPENAI_MODEL,
+    ["OPENAI_API_KEY"],
+    "openai"
+);
+provider!(
+    Atlascloud,
+    Atlascloud,
+    "AtlasCloud",
+    DEFAULT_ATLASCLOUD_BASE_URL,
+    DEFAULT_ATLASCLOUD_MODEL,
+    ["ATLASCLOUD_API_KEY"],
+    "atlascloud"
+);
+provider!(
+    WanjieArk,
+    WanjieArk,
+    "Wanjie Ark",
+    DEFAULT_WANJIE_ARK_BASE_URL,
+    DEFAULT_WANJIE_ARK_MODEL,
+    [
+        "WANJIE_ARK_API_KEY",
+        "WANJIE_API_KEY",
+        "WANJIE_MAAS_API_KEY"
+    ],
+    "wanjie_ark"
+);
+provider!(
+    Volcengine,
+    Volcengine,
+    "Volcengine Ark",
+    DEFAULT_VOLCENGINE_BASE_URL,
+    DEFAULT_VOLCENGINE_MODEL,
+    [
+        "VOLCENGINE_API_KEY",
+        "VOLCENGINE_ARK_API_KEY",
+        "ARK_API_KEY"
+    ],
+    "volcengine"
+);
+provider!(
+    Openrouter,
+    Openrouter,
+    "OpenRouter",
+    DEFAULT_OPENROUTER_BASE_URL,
+    DEFAULT_OPENROUTER_MODEL,
+    ["OPENROUTER_API_KEY"],
+    "openrouter"
+);
+provider!(
+    XiaomiMimo,
+    XiaomiMimo,
+    "Xiaomi MiMo",
+    DEFAULT_XIAOMI_MIMO_BASE_URL,
+    DEFAULT_XIAOMI_MIMO_MODEL,
+    [
+        "XIAOMI_MIMO_TOKEN_PLAN_API_KEY",
+        "MIMO_TOKEN_PLAN_API_KEY",
+        "XIAOMI_MIMO_API_KEY",
+        "XIAOMI_API_KEY",
+        "MIMO_API_KEY",
+    ],
+    "xiaomi_mimo"
+);
+provider!(
+    Novita,
+    Novita,
+    "Novita",
+    DEFAULT_NOVITA_BASE_URL,
+    DEFAULT_NOVITA_MODEL,
+    ["NOVITA_API_KEY"],
+    "novita"
+);
+provider!(
+    Fireworks,
+    Fireworks,
+    "Fireworks",
+    DEFAULT_FIREWORKS_BASE_URL,
+    DEFAULT_FIREWORKS_MODEL,
+    ["FIREWORKS_API_KEY"],
+    "fireworks"
+);
+provider!(
+    Siliconflow,
+    Siliconflow,
+    "SiliconFlow",
+    DEFAULT_SILICONFLOW_BASE_URL,
+    DEFAULT_SILICONFLOW_MODEL,
+    ["SILICONFLOW_API_KEY"],
+    "siliconflow"
+);
+provider!(
+    SiliconflowCN,
+    SiliconflowCN,
+    "SiliconFlow CN",
+    DEFAULT_SILICONFLOW_CN_BASE_URL,
+    DEFAULT_SILICONFLOW_MODEL,
+    ["SILICONFLOW_API_KEY"],
+    "siliconflow"
+);
+provider!(
+    Arcee,
+    Arcee,
+    "Arcee",
+    DEFAULT_ARCEE_BASE_URL,
+    DEFAULT_ARCEE_MODEL,
+    ["ARCEE_API_KEY"],
+    "arcee"
+);
+provider!(
+    Moonshot,
+    Moonshot,
+    "Moonshot",
+    DEFAULT_MOONSHOT_BASE_URL,
+    DEFAULT_MOONSHOT_MODEL,
+    ["MOONSHOT_API_KEY", "KIMI_API_KEY"],
+    "moonshot"
+);
+provider!(
+    Sglang,
+    Sglang,
+    "SGLang",
+    DEFAULT_SGLANG_BASE_URL,
+    DEFAULT_SGLANG_MODEL,
+    ["SGLANG_API_KEY"],
+    "sglang"
+);
+provider!(
+    Vllm,
+    Vllm,
+    "vLLM",
+    DEFAULT_VLLM_BASE_URL,
+    DEFAULT_VLLM_MODEL,
+    ["VLLM_API_KEY"],
+    "vllm"
+);
+provider!(
+    Ollama,
+    Ollama,
+    "Ollama",
+    DEFAULT_OLLAMA_BASE_URL,
+    DEFAULT_OLLAMA_MODEL,
+    ["OLLAMA_API_KEY"],
+    "ollama"
+);
+provider!(
+    Huggingface,
+    Huggingface,
+    "Hugging Face",
+    DEFAULT_HUGGINGFACE_BASE_URL,
+    DEFAULT_HUGGINGFACE_MODEL,
+    ["HUGGINGFACE_API_KEY", "HF_TOKEN"],
+    "huggingface"
+);
+
+static DEEPSEEK: Deepseek = Deepseek;
+static NVIDIA_NIM: NvidiaNim = NvidiaNim;
+static OPENAI: Openai = Openai;
+static ATLASCLOUD: Atlascloud = Atlascloud;
+static WANJIE_ARK: WanjieArk = WanjieArk;
+static VOLCENGINE: Volcengine = Volcengine;
+static OPENROUTER: Openrouter = Openrouter;
+static XIAOMI_MIMO: XiaomiMimo = XiaomiMimo;
+static NOVITA: Novita = Novita;
+static FIREWORKS: Fireworks = Fireworks;
+static SILICONFLOW: Siliconflow = Siliconflow;
+static SILICONFLOW_CN: SiliconflowCN = SiliconflowCN;
+static ARCEE: Arcee = Arcee;
+static MOONSHOT: Moonshot = Moonshot;
+static SGLANG: Sglang = Sglang;
+static VLLM: Vllm = Vllm;
+static OLLAMA: Ollama = Ollama;
+static HUGGINGFACE: Huggingface = Huggingface;
+
+static PROVIDER_REGISTRY: [&dyn Provider; 18] = [
+    &DEEPSEEK,
+    &NVIDIA_NIM,
+    &OPENAI,
+    &ATLASCLOUD,
+    &WANJIE_ARK,
+    &VOLCENGINE,
+    &OPENROUTER,
+    &XIAOMI_MIMO,
+    &NOVITA,
+    &FIREWORKS,
+    &SILICONFLOW,
+    &SILICONFLOW_CN,
+    &ARCEE,
+    &MOONSHOT,
+    &SGLANG,
+    &VLLM,
+    &OLLAMA,
+    &HUGGINGFACE,
+];
+
+/// Return all built-in provider metadata entries in `ProviderKind::ALL` order.
+#[must_use]
+pub fn all_providers() -> &'static [&'static dyn Provider] {
+    &PROVIDER_REGISTRY
+}
+
+/// Find a provider by canonical id only.
+#[must_use]
+pub fn lookup_provider(id: &str) -> Option<&'static dyn Provider> {
+    let id = id.trim();
+    all_providers()
+        .iter()
+        .copied()
+        .find(|provider| provider.id() == id)
+}
+
+/// Resolve a provider by canonical id or supported legacy alias.
+#[must_use]
+pub fn resolve_provider(id_or_alias: &str) -> Option<&'static dyn Provider> {
+    ProviderKind::parse(id_or_alias).map(provider_for_kind)
+}
+
+/// Return metadata for a known provider kind.
+#[must_use]
+pub fn provider_for_kind(kind: ProviderKind) -> &'static dyn Provider {
+    match kind {
+        ProviderKind::Deepseek => &DEEPSEEK,
+        ProviderKind::NvidiaNim => &NVIDIA_NIM,
+        ProviderKind::Openai => &OPENAI,
+        ProviderKind::Atlascloud => &ATLASCLOUD,
+        ProviderKind::WanjieArk => &WANJIE_ARK,
+        ProviderKind::Volcengine => &VOLCENGINE,
+        ProviderKind::Openrouter => &OPENROUTER,
+        ProviderKind::XiaomiMimo => &XIAOMI_MIMO,
+        ProviderKind::Novita => &NOVITA,
+        ProviderKind::Fireworks => &FIREWORKS,
+        ProviderKind::Siliconflow => &SILICONFLOW,
+        ProviderKind::SiliconflowCN => &SILICONFLOW_CN,
+        ProviderKind::Arcee => &ARCEE,
+        ProviderKind::Moonshot => &MOONSHOT,
+        ProviderKind::Sglang => &SGLANG,
+        ProviderKind::Vllm => &VLLM,
+        ProviderKind::Ollama => &OLLAMA,
+        ProviderKind::Huggingface => &HUGGINGFACE,
+    }
+}
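Reviewer note: the registry pattern in the new file (zero-sized structs, a shared trait, a static array of trait objects, and an id lookup) can be reduced to the sketch below. The `Alpha` and `Beta` providers, their ids, and their env var names are invented for illustration and do not correspond to any built-in provider; the trait is trimmed to two methods.

```rust
/// Trimmed-down metadata trait; `Send + Sync` lets trait objects live in a `static`.
trait Provider: Send + Sync {
    fn id(&self) -> &'static str;
    fn env_vars(&self) -> &'static [&'static str];
}

/// Hypothetical zero-sized provider entries.
struct Alpha;
struct Beta;

impl Provider for Alpha {
    fn id(&self) -> &'static str {
        "alpha"
    }
    fn env_vars(&self) -> &'static [&'static str] {
        &["ALPHA_API_KEY"]
    }
}

impl Provider for Beta {
    fn id(&self) -> &'static str {
        "beta"
    }
    fn env_vars(&self) -> &'static [&'static str] {
        &["BETA_API_KEY", "BETA_TOKEN"]
    }
}

static ALPHA: Alpha = Alpha;
static BETA: Beta = Beta;

/// Fixed-size registry of trait objects; the length is part of the type,
/// so adding an entry without updating the count fails to compile.
static REGISTRY: [&dyn Provider; 2] = [&ALPHA, &BETA];

/// Finds a provider by exact id after trimming whitespace.
fn lookup(id: &str) -> Option<&'static dyn Provider> {
    let id = id.trim();
    REGISTRY.iter().copied().find(|provider| provider.id() == id)
}
```

Because the entries are zero-sized and the registry is static, lookups allocate nothing. The cost is that each new provider must be added in three places (struct, static, array), which is why the real file also keeps an exhaustive `match` that the compiler can check.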
@@ -18,4 +18,5 @@ codewhale-protocol = { path = "../protocol", version = "0.8.54" }
 codewhale-state = { path = "../state", version = "0.8.54" }
 codewhale-tools = { path = "../tools", version = "0.8.54" }
 serde_json.workspace = true
+tracing.workspace = true
 uuid.workspace = true
@@ -748,7 +748,9 @@ impl Runtime {
        hooks: HookDispatcher,
    ) -> Self {
        let mut jobs = JobManager::default();
-        let _ = jobs.load_from_store(&state);
+        if let Err(e) = jobs.load_from_store(&state) {
+            tracing::warn!("Failed to load job store, starting with empty job list: {e}");
+        }
        Self {
            config,
            model_registry,
@@ -1095,11 +1097,12 @@ impl Runtime {
            ToolPayload::LocalShell { .. } => "exec_shell",
            _ => call.name.as_str(),
        };
+        let policy_path = permission_path_for_call(&call);
        let decision = self.exec_policy.check(ExecPolicyContext {
            command: &command,
            cwd: &policy_cwd,
            tool: Some(policy_tool),
-            path: None,
+            path: policy_path.as_deref(),
            ask_for_approval: approval_mode,
            sandbox_mode: None,
        })?;
@@ -1500,6 +1503,24 @@ fn preview_from_initial_history(initial_history: &InitialHistory) -> String {
    }
 }

+fn permission_path_for_call(call: &ToolCall) -> Option<String> {
+    match &call.payload {
+        ToolPayload::Function { arguments } => serde_json::from_str::<Value>(arguments)
+            .ok()
+            .and_then(|value| {
+                value
+                    .get("path")
+                    .and_then(Value::as_str)
+                    .map(str::to_string)
+            }),
+        ToolPayload::Mcp { raw_arguments, .. } => raw_arguments
+            .get("path")
+            .and_then(Value::as_str)
+            .map(str::to_string),
+        ToolPayload::Custom { .. } | ToolPayload::LocalShell { .. } => None,
+    }
+}
+
 fn truncate_preview(value: &str) -> String {
    value.chars().take(120).collect()
 }
@@ -1806,9 +1827,65 @@ fn job_state_status_to_runtime(status: JobStateStatus) -> JobStatus {
 #[cfg(test)]
 mod tests {
    use super::*;
+    use codewhale_tools::ToolCallSource;

    // ── JobManager: lifecycle ──────────────────────────────────────────

+    #[test]
+    fn permission_path_for_call_extracts_function_path_argument() {
+        let call = ToolCall {
+            name: "read_file".to_string(),
+            payload: ToolPayload::Function {
+                arguments: json!({ "path": "README.md" }).to_string(),
+            },
+            source: ToolCallSource::Direct,
+            raw_tool_call_id: None,
+        };
+
+        assert_eq!(
+            permission_path_for_call(&call).as_deref(),
+            Some("README.md")
+        );
+    }
+
+    #[test]
+    fn permission_path_for_call_extracts_mcp_path_argument() {
+        let call = ToolCall {
+            name: "mcp_fs_read".to_string(),
+            payload: ToolPayload::Mcp {
+                server: "fs".to_string(),
+                tool: "read".to_string(),
+                raw_arguments: json!({ "path": "secrets/token.txt" }),
+                raw_tool_call_id: None,
+            },
+            source: ToolCallSource::Direct,
+            raw_tool_call_id: None,
+        };
+
+        assert_eq!(
+            permission_path_for_call(&call).as_deref(),
+            Some("secrets/token.txt")
+        );
+    }
+
+    #[test]
+    fn permission_path_for_call_ignores_shell_payload() {
+        let call = ToolCall {
+            name: "exec_shell".to_string(),
+            payload: ToolPayload::LocalShell {
+                params: codewhale_protocol::LocalShellParams {
+                    command: "cargo test".to_string(),
+                    cwd: None,
+                    timeout_ms: None,
+                },
+            },
+            source: ToolCallSource::Direct,
+            raw_tool_call_id: None,
+        };
+
+        assert_eq!(permission_path_for_call(&call), None);
+    }
+
    #[test]
    fn enqueue_creates_queued_job_with_zero_progress() {
        let mut jm = JobManager::default();
@@ -359,8 +359,9 @@ impl BashArityDict {
            return true;
        }

-        // Fallback: plain normalised prefix match for patterns not in the table
-        // (preserves backward compatibility with exact-match allow rules).
+        // Fallback: word-boundary prefix match for patterns not in the arity table.
+        // Matches the exact pattern or the pattern followed by a space (i.e., at
+        // word boundary), so "ls" matches "ls" and "ls -la" but NOT "lsof".
        let command_lower = command.trim().to_ascii_lowercase();
        // Normalise whitespace in both sides before comparing.
        let pattern_norm: String = pattern_lower
@@ -371,7 +372,9 @@ impl BashArityDict {
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ");
-        command_norm == pattern_norm || command_norm.starts_with(&format!("{pattern_norm} "))
+        command_norm == pattern_norm
+            || (command_norm.starts_with(&pattern_norm)
+                && command_norm.as_bytes().get(pattern_norm.len()) == Some(&b' '))
    }

    /// Iterate over all entries in the dictionary.
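Reviewer note: the word-boundary prefix rule described in the comment above can be checked in isolation with the sketch below. The helper names `normalize` and `matches_at_word_boundary` are illustrative and not part of the crate; the normalisation (trim, lowercase, collapse whitespace) is an assumption modelled on the surrounding hunk.

```rust
/// Lowercases and collapses runs of whitespace into single spaces.
fn normalize(input: &str) -> String {
    input
        .trim()
        .to_ascii_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true when `command` equals `pattern`, or starts with `pattern`
/// immediately followed by a space (both compared after normalisation).
fn matches_at_word_boundary(command: &str, pattern: &str) -> bool {
    let command = normalize(command);
    let pattern = normalize(pattern);
    command == pattern
        || (command.starts_with(&pattern)
            && command.as_bytes().get(pattern.len()) == Some(&b' '))
}
```

Checking the byte at `pattern.len()` avoids allocating a `"{pattern} "` string for every comparison, while keeping the same result as the earlier `starts_with(&format!(...))` form.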
@@ -313,21 +313,26 @@ impl ExecPolicyEngine {

        self.rulesets
            .iter()
-            .flat_map(|ruleset| ruleset.ask_rules.iter())
-            .filter(|rule| rule.tool == tool)
-            .filter(|rule| match rule.command.as_deref() {
+            .flat_map(|ruleset| {
+                ruleset
+                    .ask_rules
+                    .iter()
+                    .map(move |rule| (ruleset.layer, rule))
+            })
+            .filter(|(_, rule)| rule.tool == tool)
+            .filter(|(_, rule)| match rule.command.as_deref() {
                Some(command) => self.arity_dict.allow_rule_matches(command, ctx.command),
                None => true,
            })
-            .filter(|rule| match (rule.path.as_deref(), ctx.path) {
+            .filter(|(_, rule)| match (rule.path.as_deref(), ctx.path) {
                (Some(pattern), Some(path)) => {
                    normalize_path_value(pattern) == normalize_path_value(path)
                }
                (Some(_), None) => false,
                (None, _) => true,
            })
-            .max_by_key(|rule| ask_rule_specificity(rule))
-            .cloned()
+            .max_by_key(|(layer, rule)| (*layer, ask_rule_specificity(rule)))
+            .map(|(_, rule)| rule.clone())
    }

    /// Records an approval key for the current session so subsequent checks skip approval.
@@ -347,11 +352,15 @@ impl ExecPolicyEngine {
    pub fn check(&self, ctx: ExecPolicyContext<'_>) -> Result<ExecPolicyDecision> {
        let normalized = normalize_command(ctx.command);
        let (trusted_prefixes, denied_prefixes) = self.resolve_prefixes();
-        // Deny rules use simple prefix matching (no arity semantics needed).
-        if let Some(rule) = denied_prefixes
-            .iter()
-            .find(|rule| normalized.starts_with(&normalize_command(rule)))
-        {
+        // Deny rules use word-boundary prefix matching: the command must either
+        // equal the rule or start with the rule followed by a space, so "rm"
+        // blocks "rm -rf /" but NOT "rmdir" or "rmview".
+        if let Some(rule) = denied_prefixes.iter().find(|rule| {
+            let norm_rule = normalize_command(rule);
+            normalized == norm_rule
+                || (normalized.starts_with(&norm_rule)
+                    && normalized.as_bytes().get(norm_rule.len()) == Some(&b' '))
+        }) {
            return Ok(ExecPolicyDecision {
                allow: false,
                requires_approval: false,
@@ -373,51 +382,82 @@ impl ExecPolicyEngine {

        let ask_rule = self.matching_ask_rule(&ctx);

-        let requirement = match &ctx.ask_for_approval {
-            AskForApproval::Never => {
-                if let Some(rule) = &ask_rule {
-                    ExecApprovalRequirement::Forbidden {
-                        reason: format!(
-                            "Typed ask rule '{}' requires approval, but approval policy is never.",
-                            rule.label()
-                        ),
-                    }
-                } else {
-                    ExecApprovalRequirement::Skip {
-                        bypass_sandbox: false,
-                        proposed_execpolicy_amendment: None,
+        let mut matched_ask_rule = None;
+        // Resolve a matching typed ask-rule first. Ask-rules take precedence over
+        // mode-based handling for everything except `Never` (which forbids,
+        // because no prompt can be shown) and `Reject { rules: true }` (which
+        // explicitly rejects rule-exceptions). This ordering was checked against
+        // the experimental `if let` match guard used by the original PR and is
+        // reproduced here with plain control flow for stable Rust (edition 2024).
+        let ask_rule_requirement = match &ctx.ask_for_approval {
+            AskForApproval::Never | AskForApproval::Reject { rules: true, .. } => None,
+            _ => ask_rule.as_ref().map(|rule| {
+                matched_ask_rule = Some(rule.label());
+                ExecApprovalRequirement::NeedsApproval {
+                    reason: format!("Typed ask rule '{}' requires approval.", rule.label()),
+                    proposed_execpolicy_amendment: None,
+                    // A typed ask-rule approval (exec/fn/MCP) must not touch
+                    // network policy. The original PR allow-listed `ctx.cwd` as a
+                    // network host here, which is incorrect and security-relevant:
+                    // approving e.g. an exec rule should never create a network
+                    // allow-entry. Emit no network amendments for ask-rule prompts.
+                    proposed_network_policy_amendments: Vec::new(),
+                }
+            }),
+        };
+
+        let requirement = if let Some(req) = ask_rule_requirement {
+            req
+        } else {
+            match &ctx.ask_for_approval {
+                AskForApproval::Never => {
+                    if let Some(rule) = &ask_rule {
+                        matched_ask_rule = Some(rule.label());
+                        ExecApprovalRequirement::Forbidden {
+                            reason: format!(
+                                "Typed ask rule '{}' requires approval, but approval policy is never.",
+                                rule.label()
+                            ),
+                        }
+                    } else {
+                        ExecApprovalRequirement::Skip {
+                            bypass_sandbox: false,
+                            proposed_execpolicy_amendment: None,
+                        }
                    }
                }
+                AskForApproval::Reject { rules, .. } if *rules => {
+                    ExecApprovalRequirement::Forbidden {
+                        reason: "Policy is configured to reject rule-exceptions.".to_string(),
+                    }
+                }
+                AskForApproval::UnlessTrusted if is_trusted => ExecApprovalRequirement::Skip {
+                    bypass_sandbox: false,
+                    proposed_execpolicy_amendment: None,
+                },
+                AskForApproval::OnFailure => ExecApprovalRequirement::Skip {
+                    bypass_sandbox: false,
+                    proposed_execpolicy_amendment: None,
+                },
+                _ => ExecApprovalRequirement::NeedsApproval {
+                    reason: if is_trusted {
+                        "Approval requested by policy mode.".to_string()
+                    } else {
+                        "Unmatched command prefix requires approval.".to_string()
+                    },
+                    proposed_execpolicy_amendment: if is_trusted {
+                        None
+                    } else {
+                        Some(ExecPolicyAmendment {
+                            prefixes: vec![first_token(ctx.command)],
+                        })
+                    },
+                    proposed_network_policy_amendments: vec![NetworkPolicyAmendment {
+                        host: ctx.cwd.to_string(),
+                        action: NetworkPolicyRuleAction::Allow,
+                    }],
+                },
            }
-            AskForApproval::UnlessTrusted if is_trusted => ExecApprovalRequirement::Skip {
-                bypass_sandbox: false,
-                proposed_execpolicy_amendment: None,
-            },
-            AskForApproval::OnFailure => ExecApprovalRequirement::Skip {
-                bypass_sandbox: false,
-                proposed_execpolicy_amendment: None,
-            },
-            AskForApproval::Reject { rules, .. } if *rules => ExecApprovalRequirement::Forbidden {
-                reason: "Policy is configured to reject rule-exceptions.".to_string(),
-            },
-            _ => ExecApprovalRequirement::NeedsApproval {
-                reason: if is_trusted {
-                    "Approval requested by policy mode.".to_string()
-                } else {
-                    "Unmatched command prefix requires approval.".to_string()
-                },
-                proposed_execpolicy_amendment: if is_trusted {
-                    None
-                } else {
-                    Some(ExecPolicyAmendment {
-                        prefixes: vec![first_token(ctx.command)],
-                    })
-                },
-                proposed_network_policy_amendments: vec![NetworkPolicyAmendment {
-                    host: ctx.cwd.to_string(),
-                    action: NetworkPolicyRuleAction::Allow,
-                }],
-            },
        };

        let (allow, requires_approval) = match requirement {
@@ -426,12 +466,6 @@ impl ExecPolicyEngine {
            ExecApprovalRequirement::Forbidden { .. } => (false, false),
        };

-        let matched_ask_rule = if matches!(&ctx.ask_for_approval, AskForApproval::Never) {
-            ask_rule.map(|rule| rule.label())
-        } else {
-            None
-        };
-
        Ok(ExecPolicyDecision {
            allow,
            requires_approval,
@@ -442,7 +476,13 @@ impl ExecPolicyEngine {
 }

 fn normalize_command(value: &str) -> String {
-    value.trim().to_ascii_lowercase()
+    // Normalize: trim, collapse whitespace runs to single spaces, then lowercase.
+    // This prevents bypass via "git  status" (double space) vs "git status".
+    value
+        .split_whitespace()
+        .collect::<Vec<_>>()
+        .join(" ")
+        .to_ascii_lowercase()
 }

 fn first_token(command: &str) -> String {
@@ -629,7 +669,7 @@ mod tests {
    }

    #[test]
-    fn typed_ask_rule_is_ignored_outside_never_mode_for_now() {
+    fn typed_ask_rule_requires_approval_under_unless_trusted() {
        let engine = ExecPolicyEngine::with_rulesets(vec![
            Ruleset::user(vec![], vec![])
                .with_ask_rules(vec![ToolAskRule::exec_shell("cargo test")]),
@@ -641,18 +681,49 @@ mod tests {

        assert!(decision.allow);
        assert!(decision.requires_approval);
-        assert_eq!(decision.matched_rule, None);
+        assert_eq!(
+            decision.matched_rule.as_deref(),
+            Some("tool=exec_shell command=cargo test")
+        );
        match decision.requirement {
            ExecApprovalRequirement::NeedsApproval {
-                proposed_execpolicy_amendment: Some(amendment),
+                proposed_execpolicy_amendment,
+                proposed_network_policy_amendments,
                ..
-            } => assert_eq!(amendment.prefixes, vec!["cargo"]),
-            other => panic!("expected unchanged approval behavior, got {other:?}"),
+            } => {
+                assert_eq!(proposed_execpolicy_amendment, None);
+                // A typed ask-rule approval must not allow-list the cwd (or
+                // anything else) as a network host. See the NeedsApproval arm.
+                assert!(
+                    proposed_network_policy_amendments.is_empty(),
+                    "ask-rule approval must not propose network amendments, got {proposed_network_policy_amendments:?}"
+                );
+            }
+            other => panic!("expected typed ask approval, got {other:?}"),
        }
    }

    #[test]
-    fn typed_ask_rule_does_not_change_allow_deny_precedence() {
+    fn typed_ask_rule_requires_approval_under_on_failure() {
+        let engine = ExecPolicyEngine::with_rulesets(vec![
+            Ruleset::user(vec![], vec![])
+                .with_ask_rules(vec![ToolAskRule::exec_shell("cargo test")]),
+        ]);
+
+        let decision = engine
+            .check(ctx("cargo test --workspace", AskForApproval::OnFailure))
+            .unwrap();
+
+        assert!(decision.allow);
+        assert!(decision.requires_approval);
+        assert_eq!(
+            decision.reason(),
+            "Typed ask rule 'tool=exec_shell command=cargo test' requires approval."
+        );
+    }
+
+    #[test]
+    fn typed_ask_rule_overrides_trusted_but_not_deny() {
        let engine = ExecPolicyEngine::with_rulesets(vec![
            Ruleset::user(
                vec!["cargo test".to_string()],
@@ -665,8 +736,11 @@ mod tests {
            .check(ctx("cargo test --workspace", AskForApproval::UnlessTrusted))
            .unwrap();
        assert!(trusted.allow);
-        assert!(!trusted.requires_approval);
-        assert_eq!(trusted.matched_rule.as_deref(), Some("cargo test"));
+        assert!(trusted.requires_approval);
+        assert_eq!(
+            trusted.matched_rule.as_deref(),
+            Some("tool=exec_shell command=cargo test")
+        );

        let denied = engine
            .check(ctx("cargo test --danger", AskForApproval::Never))
@@ -680,6 +754,56 @@ mod tests {
        );
    }

+    #[test]
+    fn typed_ask_rule_prefers_higher_layer_before_specificity() {
+        let engine = ExecPolicyEngine::with_rulesets(vec![
+            Ruleset::agent(vec![], vec![])
+                .with_ask_rules(vec![ToolAskRule::exec_shell("cargo test --workspace")]),
+            Ruleset::user(vec![], vec![])
+                .with_ask_rules(vec![ToolAskRule::exec_shell("cargo test")]),
+        ]);
+
+        let decision = engine
+            .check(ctx(
+                "cargo test --workspace --all-features",
+                AskForApproval::UnlessTrusted,
+            ))
+            .unwrap();
+
+        assert!(decision.requires_approval);
+        assert_eq!(
+            decision.matched_rule.as_deref(),
+            Some("tool=exec_shell command=cargo test")
+        );
+    }
+
+    #[test]
+    fn reject_rules_mode_still_forbids_matching_ask_rule() {
+        let engine = ExecPolicyEngine::with_rulesets(vec![
+            Ruleset::user(vec![], vec![])
+                .with_ask_rules(vec![ToolAskRule::exec_shell("cargo test")]),
+        ]);
+
+        let decision = engine
+            .check(ctx(
+                "cargo test --workspace",
+                AskForApproval::Reject {
+                    sandbox_approval: false,
+                    rules: true,
+                    mcp_elicitations: false,
+                },
+            ))
+            .unwrap();
+
+        assert!(!decision.allow);
+        assert!(!decision.requires_approval);
+        assert_eq!(decision.matched_rule, None);
+        assert_eq!(
+            decision.reason(),
+            "Policy is configured to reject rule-exceptions."
+        );
+    }
+
    #[test]
    fn typed_ask_rule_label_wins_when_never_blocks_trusted_command() {
        let engine = ExecPolicyEngine::with_rulesets(vec![
@@ -9,6 +9,7 @@ description = "Shared CodeWhale release discovery and version comparison helpers
 [dependencies]
 anyhow.workspace = true
 reqwest = { workspace = true, features = ["blocking"] }
+rustls.workspace = true
 semver.workspace = true
 serde.workspace = true
 serde_json.workspace = true
@@ -19,7 +19,7 @@ keyring = { version = "3", features = ["apple-native"] }
 [target.'cfg(target_os = "windows")'.dependencies]
 keyring = { version = "3", features = ["windows-native"] }

-[target.'cfg(target_os = "linux")'.dependencies]
+[target.'cfg(all(target_os = "linux", not(target_env = "ohos")))'.dependencies]
 keyring = { version = "3", features = ["linux-native-sync-persistent", "crypto-rust"] }

 [dev-dependencies]
@@ -92,7 +92,7 @@ pub trait KeyringStore: Send + Sync {
 /// Wraps the platform credential store:
 /// - **macOS**: Keychain (via `security` framework)
 /// - **Windows**: Credential Manager
-/// - **Linux**: Secret Service (GNOME Keyring / kwallet via dbus)
+/// - **Linux**: Secret Service (GNOME Keyring / kwallet via dbus), excluding OHOS
 ///
 /// This backend is opt-in -- set the [`SECRET_BACKEND_ENV`] environment
 /// variable to `system` or `keyring` to activate it. On platforms without
@@ -124,7 +124,11 @@ impl DefaultKeyringStore {
    /// Probe the OS keyring without writing anything. Returns `Ok(())` if
    /// a backend is reachable, otherwise an error describing why not.
    pub fn probe(&self) -> Result<(), SecretsError> {
-        #[cfg(any(target_os = "macos", target_os = "windows", target_os = "linux"))]
+        #[cfg(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        ))]
        {
            // `Entry::new` is enough to validate the native macOS/Windows
            // backend path. Avoid a dummy read there because it can trigger
@@ -149,7 +153,11 @@ impl DefaultKeyringStore {
                Err(other) => Err(SecretsError::Keyring(other.to_string())),
            }
        }
-        #[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
+        #[cfg(not(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        )))]
        {
            let _ = &self.service;
            Err(SecretsError::Keyring(unsupported_keyring_message()))
@@ -159,7 +167,11 @@ impl DefaultKeyringStore {

 impl KeyringStore for DefaultKeyringStore {
    fn get(&self, key: &str) -> Result<Option<String>, SecretsError> {
-        #[cfg(any(target_os = "macos", target_os = "windows", target_os = "linux"))]
+        #[cfg(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        ))]
        {
            let entry = keyring::Entry::new(&self.service, key)
                .map_err(|err| SecretsError::Keyring(err.to_string()))?;
@@ -169,7 +181,11 @@ impl KeyringStore for DefaultKeyringStore {
                Err(err) => Err(SecretsError::Keyring(err.to_string())),
            }
        }
-        #[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
+        #[cfg(not(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        )))]
        {
            let _ = key;
            Err(SecretsError::Keyring(unsupported_keyring_message()))
@@ -177,7 +193,11 @@ impl KeyringStore for DefaultKeyringStore {
    }

    fn set(&self, key: &str, value: &str) -> Result<(), SecretsError> {
-        #[cfg(any(target_os = "macos", target_os = "windows", target_os = "linux"))]
+        #[cfg(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        ))]
        {
            let entry = keyring::Entry::new(&self.service, key)
                .map_err(|err| SecretsError::Keyring(err.to_string()))?;
@@ -185,7 +205,11 @@ impl KeyringStore for DefaultKeyringStore {
                .set_password(value)
                .map_err(|err| SecretsError::Keyring(err.to_string()))
        }
-        #[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
+        #[cfg(not(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        )))]
        {
            let _ = (key, value);
            Err(SecretsError::Keyring(unsupported_keyring_message()))
@@ -193,7 +217,11 @@ impl KeyringStore for DefaultKeyringStore {
    }

    fn delete(&self, key: &str) -> Result<(), SecretsError> {
-        #[cfg(any(target_os = "macos", target_os = "windows", target_os = "linux"))]
+        #[cfg(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        ))]
        {
            let entry = keyring::Entry::new(&self.service, key)
                .map_err(|err| SecretsError::Keyring(err.to_string()))?;
@@ -202,7 +230,11 @@ impl KeyringStore for DefaultKeyringStore {
                Err(err) => Err(SecretsError::Keyring(err.to_string())),
            }
        }
-        #[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
+        #[cfg(not(any(
+            target_os = "macos",
+            target_os = "windows",
+            all(target_os = "linux", not(target_env = "ohos"))
+        )))]
        {
            let _ = key;
            Err(SecretsError::Keyring(unsupported_keyring_message()))
@@ -214,7 +246,11 @@ impl KeyringStore for DefaultKeyringStore {
    }
 }

-#[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
+#[cfg(not(any(
+    target_os = "macos",
+    target_os = "windows",
+    all(target_os = "linux", not(target_env = "ohos"))
+)))]
 fn unsupported_keyring_message() -> String {
    "system keyring backend is unsupported on this platform".to_string()
 }
@@ -267,7 +267,7 @@ impl StateStore {

    fn init_schema(&self) -> Result<()> {
        let conn = self.conn()?;
-        let user_version: u32 = conn.query_row("PRAGMA user_version;", [], |row| row.get(0))?;
+        let mut user_version: u32 = conn.query_row("PRAGMA user_version;", [], |row| row.get(0))?;
        if user_version == 0 {
            conn.execute_batch(
                r#"
@@ -376,6 +376,104 @@ impl StateStore {
                "#,
            )
            .context("failed to initialize thread schema")?;
+            user_version = 1;
+        }
+        if user_version < 2 {
+            conn.execute_batch(
+                r#"
+                BEGIN;
+                CREATE TABLE IF NOT EXISTS workflow_runs (
+                    id TEXT PRIMARY KEY,
+                    workflow_id TEXT NOT NULL,
+                    goal TEXT NOT NULL,
+                    status TEXT NOT NULL,
+                    input_hash TEXT,
+                    started_at INTEGER NOT NULL,
+                    completed_at INTEGER,
+                    metadata_json TEXT NOT NULL DEFAULT '{}'
+                );
+                CREATE INDEX IF NOT EXISTS idx_workflow_runs_status_started_at
+                    ON workflow_runs(status, started_at DESC);
+                CREATE INDEX IF NOT EXISTS idx_workflow_runs_workflow_started_at
+                    ON workflow_runs(workflow_id, started_at DESC);
+
+                CREATE TABLE IF NOT EXISTS branch_runs (
+                    id TEXT PRIMARY KEY,
+                    workflow_run_id TEXT NOT NULL,
+                    branch_id TEXT NOT NULL,
+                    node_id TEXT NOT NULL,
+                    status TEXT NOT NULL,
+                    started_at INTEGER NOT NULL,
+                    completed_at INTEGER,
+                    result_json TEXT NOT NULL DEFAULT '{}',
+                    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
+                );
+                CREATE INDEX IF NOT EXISTS idx_branch_runs_workflow_run_id
+                    ON branch_runs(workflow_run_id);
+                CREATE INDEX IF NOT EXISTS idx_branch_runs_branch_id
+                    ON branch_runs(branch_id);
+
+                CREATE TABLE IF NOT EXISTS leaf_runs (
+                    id TEXT PRIMARY KEY,
+                    workflow_run_id TEXT NOT NULL,
+                    branch_run_id TEXT,
+                    leaf_id TEXT NOT NULL,
+                    task_id TEXT NOT NULL,
+                    input_hash TEXT,
+                    status TEXT NOT NULL,
+                    output_json TEXT NOT NULL DEFAULT '{}',
+                    artifacts_json TEXT NOT NULL DEFAULT '[]',
+                    started_at INTEGER NOT NULL,
+                    completed_at INTEGER,
+                    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE,
+                    FOREIGN KEY(branch_run_id) REFERENCES branch_runs(id) ON DELETE SET NULL
+                );
+                CREATE INDEX IF NOT EXISTS idx_leaf_runs_workflow_run_id
+                    ON leaf_runs(workflow_run_id);
+                CREATE INDEX IF NOT EXISTS idx_leaf_runs_replay_lookup
+                    ON leaf_runs(workflow_run_id, leaf_id, input_hash);
+
+                CREATE TABLE IF NOT EXISTS control_node_runs (
+                    id TEXT PRIMARY KEY,
+                    workflow_run_id TEXT NOT NULL,
+                    node_id TEXT NOT NULL,
+                    kind TEXT NOT NULL,
+                    status TEXT NOT NULL,
+                    selected_children_json TEXT NOT NULL DEFAULT '[]',
+                    result_json TEXT NOT NULL DEFAULT '{}',
+                    started_at INTEGER NOT NULL,
+                    completed_at INTEGER,
+                    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE
+                );
+                CREATE INDEX IF NOT EXISTS idx_control_node_runs_workflow_run_id
+                    ON control_node_runs(workflow_run_id);
+                CREATE INDEX IF NOT EXISTS idx_control_node_runs_node_id
+                    ON control_node_runs(node_id);
+
+                CREATE TABLE IF NOT EXISTS teacher_candidates (
+                    id TEXT PRIMARY KEY,
+                    workflow_run_id TEXT NOT NULL,
+                    control_node_run_id TEXT NOT NULL,
+                    candidate_id TEXT NOT NULL,
+                    branch_run_id TEXT,
+                    score REAL,
+                    passed INTEGER,
+                    rationale_json TEXT NOT NULL DEFAULT '{}',
+                    created_at INTEGER NOT NULL,
+                    FOREIGN KEY(workflow_run_id) REFERENCES workflow_runs(id) ON DELETE CASCADE,
+                    FOREIGN KEY(control_node_run_id) REFERENCES control_node_runs(id) ON DELETE CASCADE,
+                    FOREIGN KEY(branch_run_id) REFERENCES branch_runs(id) ON DELETE SET NULL
+                );
+                CREATE INDEX IF NOT EXISTS idx_teacher_candidates_workflow_run_id
+                    ON teacher_candidates(workflow_run_id);
+                CREATE INDEX IF NOT EXISTS idx_teacher_candidates_control_node_run_id
+                    ON teacher_candidates(control_node_run_id);
+
+                PRAGMA user_version = 2;
+                COMMIT;
+                "#,
+            )
+            .context("failed to initialize workflow trace schema")?;
        }
        Ok(())
    }
@@ -12,6 +12,30 @@ fn temp_state_path(label: &str) -> PathBuf {
    ))
 }

+fn assert_workflow_trace_schema(conn: &Connection) {
+    let user_version: u32 = conn
+        .query_row("PRAGMA user_version;", [], |row| row.get(0))
+        .expect("read user_version");
+    assert_eq!(user_version, 2);
+
+    for table in [
+        "workflow_runs",
+        "branch_runs",
+        "leaf_runs",
+        "control_node_runs",
+        "teacher_candidates",
+    ] {
+        let exists: bool = conn
+            .query_row(
+                "SELECT EXISTS(SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?1)",
+                [table],
+                |row| row.get(0),
+            )
+            .unwrap_or_else(|err| panic!("read sqlite_master for {table}: {err}"));
+        assert!(exists, "missing workflow trace table {table}");
+    }
+}
+
 #[test]
 fn upsert_and_resume_thread_metadata() {
    let path = temp_state_path("upsert_resume");
@@ -157,6 +181,102 @@ fn init_schema_migration() {
    StateStore::open(Some(path.clone())).expect("open state store");
 }

+#[test]
+fn fresh_schema_includes_workflow_trace_tables() {
+    let path = temp_state_path("fresh_schema_includes_workflow_trace_tables");
+
+    StateStore::open(Some(path.clone())).expect("open state store");
+
+    let conn = Connection::open(&path).expect("open state db");
+    assert_workflow_trace_schema(&conn);
+}
+
+#[test]
+fn v1_schema_migrates_workflow_trace_tables() {
+    let path = temp_state_path("v1_schema_migrates_workflow_trace_tables");
+    let conn = Connection::open(&path).expect("open state db");
+    conn.execute_batch(
+        r#"
+        CREATE TABLE threads (
+            id TEXT PRIMARY KEY,
+            rollout_path TEXT,
+            preview TEXT NOT NULL,
+            ephemeral INTEGER NOT NULL,
+            model_provider TEXT NOT NULL,
+            created_at INTEGER NOT NULL,
+            updated_at INTEGER NOT NULL,
+            status TEXT NOT NULL,
+            path TEXT,
+            cwd TEXT NOT NULL,
+            cli_version TEXT NOT NULL,
+            source TEXT NOT NULL,
+            title TEXT,
+            sandbox_policy TEXT,
+            approval_mode TEXT,
+            archived INTEGER NOT NULL DEFAULT 0,
+            archived_at INTEGER,
+            git_sha TEXT,
+            git_branch TEXT,
+            git_origin_url TEXT,
+            memory_mode TEXT,
+            current_leaf_id INTEGER
+        );
+        CREATE TABLE messages (
+            id INTEGER PRIMARY KEY AUTOINCREMENT,
+            thread_id TEXT NOT NULL,
+            role TEXT NOT NULL,
+            content TEXT NOT NULL,
+            item_json TEXT,
+            created_at INTEGER NOT NULL,
+            parent_entry_id INTEGER
+        );
+        CREATE TABLE checkpoints (
+            thread_id TEXT NOT NULL,
+            checkpoint_id TEXT NOT NULL,
+            state_json TEXT NOT NULL,
+            created_at INTEGER NOT NULL,
+            PRIMARY KEY(thread_id, checkpoint_id)
+        );
+        CREATE TABLE jobs (
+            id TEXT PRIMARY KEY,
+            name TEXT NOT NULL,
+            status TEXT NOT NULL,
+            progress INTEGER,
+            detail TEXT,
+            created_at INTEGER NOT NULL,
+            updated_at INTEGER NOT NULL
+        );
+        CREATE TABLE thread_dynamic_tools (
+            thread_id TEXT NOT NULL,
+            position INTEGER NOT NULL,
+            name TEXT NOT NULL,
+            description TEXT,
+            input_schema TEXT NOT NULL,
+            PRIMARY KEY (thread_id, position)
+        );
+        INSERT INTO threads (
+            id, preview, ephemeral, model_provider, created_at, updated_at, status, cwd, cli_version, source, archived
+        )
+        VALUES (
+            'thread-test-1', 'hello', false, 'deepseek', 0, 0, 'running', '/tmp/project', '0.0.0-test', 'interactive', false
+        );
+        PRAGMA user_version = 1;
+        "#,
+    )
+    .expect("create v1 schema");
+    drop(conn);
+
+    let store = StateStore::open(Some(path.clone())).expect("open state store");
+    let thread = store
+        .get_thread("thread-test-1")
+        .expect("read thread")
+        .expect("thread survives migration");
+    assert_eq!(thread.preview, "hello");
+
+    let conn = Connection::open(&path).expect("open state db");
+    assert_workflow_trace_schema(&conn);
+}
+
 #[test]
 fn init_schema_migration_same_second_messages() {
    let path = temp_state_path("init_schema_migration_same_second_messages");
@@ -13,11 +13,442 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - **Benchmark harness runners.** Added CodeWhale-native benchmark entry points for SWE-bench, Terminal-Bench, and PinchBench, plus a local PinchBench runner that can grade tool-use traces with an LLM judge.
 - **Direct MiMo benchmark routing.** The benchmark runner now defaults to direct Xiaomi MiMo v2.5 Pro routing when configured, while keeping provider/model selection explicit.
+- Added `/restore list [N]` so users can inspect more side-git rollback
+  snapshots with UTC timestamps before choosing a restore point. Plain
+  `/restore` now shows the 20 most recent snapshots, numeric restore targets can
+  reach beyond that default listing up to a bounded index, and list requests
+  above the visible cap fail explicitly instead of silently truncating.
+- Added HarmonyOS/OpenHarmony support scaffolding: environment-driven
+  `OHOS_NATIVE_SDK` setup scripts and compiler wrappers, platform docs,
+  explicit Rustls ring-provider installation for the no-provider TLS build, and
+  OHOS fallbacks for unsupported keyring, clipboard, sandbox, browser-open, TTY,
+  execpolicy Starlark parsing, and self-update surfaces.
+- Added `scripts/release/check-ohos-deps.sh` and wired it into CI/release
+  preflight so the OpenHarmony target graph fails if unsupported `nix`,
+  `portable-pty`, `starlark`, `arboard`, or `keyring` dependencies re-enter.
+- Added `.github/AUTHOR_MAP` and a CI co-author credit check so harvested
+  commits use GitHub-mappable numeric noreply identities instead of `.local`,
+  placeholder, bot/tool, or raw third-party emails.
+- Added a `turn_end` observer hook that fires after post-turn TUI state and
+  token totals are updated. Hooks receive structured JSON with status, usage,
+  totals, duration, tool count, and queued-message count on stdin; stdout is
+  ignored and failures are warn-only (#1364, #2578).
+- Added provider-scoped `insecure_skip_tls_verify` for private
+  OpenAI-compatible gateways that cannot use a trusted CA bundle. The setting is
+  disabled by default, applies only to the active LLM provider HTTP client, and
+  is surfaced by `codewhale doctor`; `SSL_CERT_FILE` remains the preferred path
+  for corporate or private CA roots. Thanks @wavezhang for the original #1893
+  direction.
+- Added a default-disabled hard-compaction planner that can identify the
+  summarizable middle of a long conversation while preserving the recent tail,
+  existing tool-call/result pair guarantees, and working-set pinning. This
+  harvests the safe planning layer from #2522 without enabling hard compaction
+  or adding a message-rewrite execution path yet. Thanks @HUQIANTAO for the
+  proposal.
+- Added rich PlanArtifact support to `update_plan`: Plan mode can now carry
+  grounded objectives, context, sources, critical files, constraints,
+  verification, risks, and handoff notes through the transcript card, Plan
+  confirmation prompt, `/relay`, fork-state, and saved-session replay.
+- Added the first `codewhale-whaleflow` foundation crate with typed workflow
+  config/IR validation and deterministic phase ordering tests. This preserves
+  the WhaleFlow direction from #2482/#2486 without exposing a runtime
+  `workflow_run` tool until cancellation, replay, and worktree semantics are
+  release-safe. The foundation now includes explicit `WorkflowSpec`,
+  `WorkflowNode`, branch/leaf/policy metadata structs, plus serializable branch,
+  leaf, and control-node result records toward the #2668 TraceStore contract.
+  It also adds a crate-local mock executor skeleton for Sequence, BranchSet,
+  Leaf, Reduce, LoopUntil, Cond, Expand, BranchTournament, and ParetoFrontier
+  control flow so #2669 can progress without spawning agents, applying
+  worktrees, or exposing a `workflow_run` runtime tool yet. A first Starlark
+  authoring layer now compiles fail-closed model-authored workflow files into
+  that typed IR, with `rlm_cache_change.star` and `issue_fix_tournament.star`
+  examples plus a one-pass repair for common `ctx.*` authoring aliases (#2670).
+  Leaf, branch, and workflow execution results now carry deterministic token
+  and cost telemetry fields that the mock executor can aggregate without live
+  provider calls or runtime sub-agent fanout (#2486). The mock executor now
+  carries crate-local cancellation and budget-exhaustion status markers so the
+  branch/leaf runtime contract can be tested before live workflow execution is
+  exposed (#2669). A crate-only replay executor now evaluates workflows from
+  recorded leaf/control records, computes
+  stable SHA-256 leaf input hashes, and marks missing records as
+  `replay_diverged` instead of calling models again (#2673); the runtime replay
+  command and live-provider replay fallback remain deferred. The crate also now
+  has a model-agnostic role/capability registry with mock provider plumbing and
+  fail-closed JSON repair parsing, so WhaleFlow can choose capable models for
+  roles without hardcoding provider-specific runtime paths (#2672). The
+  `rlm_cache_change.star` dogfood workflow now exercises candidate branches,
+  LoopUntil verification, tournament selection, teacher review, and mock
+  execution in CI-oriented crate tests (#2679). Leaf, branch, and workflow
+  results now also carry separate ARMH/shared-memo and provider prompt-cache
+  telemetry counters, with mock aggregation tests, so #2671 can progress
+  without wiring live RLM calls or billing-affecting provider behavior yet. The
+  Starlark and typed-IR gates now also reject unknown leaf dependencies,
+  reducer inputs, and teacher-review candidates before mock execution or replay,
+  keeping generated workflows fail-closed while runtime/worktree semantics stay
+  deferred. TeacherReview now has serializable GEPA-style candidate artifacts
+  for notes, workflow recipes, skills, regression tests, cache policy, branch
+  heuristics, and Starlark authoring prompt patches, plus an offline helper
+  that proposes candidates from recorded execution traces without promoting
+  them or training model weights (#2674). StudentReplay results can now be
+  stored on teacher candidates, and a deterministic PromotionGate compares
+  baseline-vs-candidate replay deltas, required tests, policy violations,
+  staleness, and cost constraints before marking a candidate promotable (#2675).
+  The external-memory cutline now documents that Aleph-style memory stays
+  optional, explicit, visible, and clear/export-capable for v0.9.0 rather than
+  becoming a hidden default context substrate (#2677).
+  A dedicated v0.9.0 release acceptance matrix now tracks provider, runtime,
+  UI, WhaleFlow, Model Lab, remote-workbench, docs, rollback, and credit gates
+  that must be checked or explicitly deferred before tagging (#2729).
+  HarnessProfile docs now pin the v0.9.0 order: posture/schema/resolver/seed
+  profiles/status display must precede evidence stores, promotion gates, or any
+  automatic Harness Creator, with DeepSeek, MiMo, Arcee, and generic/HF/local
+  posture expectations called out separately (#2728).
+  Hugging Face / Model Lab and `codebase_search` release gates now explicitly
+  ship only the provider/MCP/docs/design foundation in v0.9; native Hub search,
+  model passports, Spaces/Jobs workflows, eval/export surfaces, and runtime
+  `codebase_search` registration remain deferred (#2705, #2680, #2727).
+  Remote workbench acceptance is also marked docs/setup-only for v0.9 so release
+  notes do not imply a shipped VM or Telegram bridge runtime (#2724).
+  Release-facing HarnessProfile docs now match the current implementation:
+  v0.9 ships the typed schema/config foundation and defers runtime resolver,
+  telemetry, seed-profile selection, and status-display behavior until later
+  verified slices. `config.example.toml` includes a commented dormant
+  harness-profile example, and README links point at the real acceptance matrix
+  and HarnessProfile cutline docs.
+  The release acceptance matrix now records evidence for already-landed gates:
+  provider-registry drift checks, provider-scoped TLS skip verify, read-only
+  GUI runtime/restore-point surfaces, VS Code Agent View branch visibility,
+  WhaleFlow mock/runtime foundations, explicit external-memory boundaries, and
+  docs alignment. Live workflow execution, provider calls, TraceStore writes,
+  and mutation-oriented GUI endpoints remain deferred until their atomicity and
+  replay contracts are tested. The `rlm_cache_change.star` dogfood workflow can
+  now be replayed from recorded mock leaf/control records, and missing dogfood
+  records produce `ReplayDiverged` instead of falling back to live execution
+  (#2679). The UI/workflow UX rows now also distinguish shipped transcript
+  tool-run collapse, sidebar detail popovers, and PlanArtifact review/handoff
+  evidence from the deferred first-look/home redesign, and record focused
+  slash-picker readability smoke coverage for visibility, selection, skill
+  insertion, Esc priority, and stable composer height (#2692, #2694, #2691,
+  #2713).
+  Thanks @AdityaVG13 for the WhaleFlow draft and cost-tracking direction.
+- Added a state-store v2 schema migration for WhaleFlow trace tables covering
+  workflow, branch, leaf, control-node, and teacher-candidate runs. The
+  migration creates persistence shape only; workflow execution and replay
+  remain deferred until the runtime semantics are safe (#2668).
+- Added an official VS Code extension Phase 0 scaffold with terminal launch,
+  local runtime attach checks, status bar state, and a read-only Agent View
+  preview backed by recent runtime thread summaries, plus a read-only
+  `GET /v1/snapshots` endpoint for GUI clients to inspect side-git restore
+  points. The extension now renders those restore points read-only in its Agent
+  View, and thread summaries include read-only workspace, branch, current Git
+  head, and dirty-state metadata so the VS Code Agent View can show when a
+  thread or agent lane is on another branch or has changed worktree state. Agent
+  View and restore-point data now auto-refresh on a configurable
+  read-only interval so branch/workspace/status changes become visible without a
+  manual refresh. Agent View refreshes keep thread branch/workspace rows
+  independent from restore-point loading, so a snapshot-listing failure no
+  longer clears already-available thread metadata. This answers the VS Code GUI
+  lane without exposing chat webviews, inline edits, or retry/undo/restore
+  runtime mutation endpoints yet
+  (#461, #462, #480, #1217, #2341, #1584, #2327, #2580, #2808). Thanks @AiurArtanis
+  for the Agent View prompt, @lbcheng888 for the earlier scaffold, @gaord for
+  the GUI runtime API direction, @douglarek, @caeserchen, and @nightt5879 for
+  the branch visibility trail, and @BigBenLabs, @lzx1545642258, @yangdaowan,
+  @mangdehuang, @VerrPower, @hejia-v, @nasus9527, and @ygzhang-cn for the
+  GUI/VS Code demand and validation trail.
+- Added inline live-output refresh for background shell Exec cards keyed by the
+  exact shell task id, so long-running commands can show bounded stdout/stderr
+  tails without consuming deltas or matching by command text. Thanks
+  @donglovejava for the live shell-output direction in #2048.
+- Added a static prompt composer override for embedders that need to replace
+  the byte-stable base/personality prompt segment while leaving mode metadata,
+  approval policy, tool taxonomy, Context Management, and the Compaction Relay
+  under CodeWhale's runtime prompt assembly. This refines the embedder prompt
+  customization path from #2786 without weakening prompt-continuity safeguards.
+  Thanks @h3c-hexin.
+- Added `POST /v1/sessions` for runtime clients to save a completed thread as a
+  managed session. The endpoint preserves thread title/model/mode/workspace
+  metadata, maps missing threads to 404, and returns 409 instead of snapshotting
+  queued or active turns.
+- Added cost-estimate pricing for the Xiaomi MiMo primary chat models, which
+  were previously unpriced: `mimo-v2.5-pro` / `xiaomi/mimo-v2.5-pro` reuse the
+  DeepSeek V4-Pro rate table and `mimo-v2.5` / `xiaomi/mimo-v2.5` reuse the
+  DeepSeek V4-Flash rates. Existing DeepSeek pricing is unchanged (#2731, #2750).
+- Added a metadata-only `codewhale-config` provider registry with canonical
+  lookup, alias-aware resolution, provider defaults, config-table keys, and
+  API-key env candidates. Runtime routing remains unchanged and fallback
+  providers stay dormant; this harvests the safe provider-trait foundation from
+  #2479 toward #2075. Thanks @sximelon.
+- Added optional `[search].base_url` / `CODEWHALE_SEARCH_BASE_URL` support for
+  DuckDuckGo-compatible private search endpoints, while keeping
+  `DEEPSEEK_SEARCH_BASE_URL` as a legacy alias. Custom endpoints are gated by
+  their configured host, do not fall back to public Bing, and report the custom
+  host as the result source for diagnostics (#2436, #2510).
+- Added `completion_sound = "file"` with `[notifications].sound_file` so
+  Windows users can play a custom WAV file for turn-completion sounds without
+  changing the global Windows sound scheme (#2484, #2512).
+- Added `[tui].stream_chunk_timeout_secs` and `/config stream_chunk_timeout_secs`
+  so slow local or OpenAI-compatible model servers can extend the SSE idle
+  timeout without mutating process environment. The legacy
+  `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` env var remains a fallback (#2365, #2507).
+- Added dormant `fallback_providers = [...]` config parsing plus a provider-chain
+  helper for future fallback routing. This preserves the requested contract
+  without enabling silent runtime provider switches yet (#2574, #2777). Thanks
+  @hsdbeebou for the request and @idling11 for the data-model draft.
+- Added `/hf` with `/huggingface` alias for Hugging Face MCP status/setup
+  helpers and `/hf concepts` provider/MCP/Hub guidance. The helper points users
+  to Hugging Face's settings-generated MCP configuration and intentionally does
+  not include Hub search, direct Hugging Face HTTP requests, or upload behavior
+  (#2709, #2782). Thanks @idling11 for the original Hugging Face MCP draft.
+- Added an in-process response cache for deterministic non-streaming,
+  tool-free chat requests. The cache is keyed by provider, base URL, path
+  suffix, API-key fingerprint, and final wire body, and zeroes usage on hits so
+  local spend counters are not double-counted (#2501). Thanks @HUQIANTAO for
+  the response-cache proposal and canonical-body key update.
+- Added `/sidebar` so users can toggle, show, hide, and optionally persist the
+  TUI sidebar from the command line instead of relying on copy-hostile sidebar
+  state during long transcript work (#2766, #2788). Thanks @mo-vic for the
+  detailed report and @aboimpinto for the fix.
+- Added a pausable custom slash-command MVP: commands with `pausable: true`
+  can pause before further tool execution, preserve the paused command while
+  separate messages are handled, and resume only on explicit continue/resume
+  wording. Harvested from #2732 with thanks to @aboimpinto.
+- Added Sofya (`provider = "sofya"`) as a search-tool backend with
+  `SOFYA_API_KEY` fallback, while keeping Sofya scoped to web search rather
+  than model-provider routing (#2790). Thanks @yusufgurdogan for the
+  implementation.
+- Added Xiaomi MiMo `mode` / `XIAOMI_MIMO_MODE` / `MIMO_MODE` selection for
+  Token Plan region endpoints and pay-as-you-go routing, plus dedicated Token
+  Plan env keys for `tp-*` subscriptions (#2621, #2627). Thanks @springeye for
+  the request and @xyuai for the implementation.
+- Added the first TUI hotbar action registry foundation so future UI controls
+  can dispatch typed app actions instead of growing another command match
+  surface (#2866). Thanks @reidliu41 for the implementation.
+- Added the narrow multi-tab core and persistence foundation, including tab
+  manager snapshots, delegation/group restore counters, mention parsing,
+  cross-tab events, and corruption-tolerant persisted state, while leaving the
+  broader collaboration UI wiring to follow-up work (#2864). Thanks
+  @ljm3790865 for the tab-core implementation and #2753 direction.
+- The VS Code Agent View now renders the runtime thread summary's Git `head`
+  and dirty-worktree flag alongside branch metadata, keeping branch switches
+  visible without adding retry/undo/restore mutation endpoints yet (#2580,
+  #2862). Thanks @AiurArtanis and @nasus9527 for the IDE/agent-view requests
+  and @gaord for the runtime metadata direction.
+
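The response-cache entry above names the key fields (provider, base URL, path suffix, API-key fingerprint, final wire body) and says usage is zeroed on hits. A minimal sketch of that shape might look like the following. Every type and method name here (`CacheKey`, `Usage`, `CachedResponse`, `ResponseCache`) is hypothetical and chosen for illustration; the crate's real cache may bound its size, hash the body, or differ in other ways.

```rust
use std::collections::HashMap;

/// Hypothetical key built from the fields the changelog entry lists.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    provider: String,
    base_url: String,
    path_suffix: String,
    api_key_fingerprint: String,
    /// Final serialized request body as it would go on the wire.
    wire_body: String,
}

/// Hypothetical token-usage record.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct CachedResponse {
    text: String,
    usage: Usage,
}

/// Unbounded in-process map; an assumption made to keep the sketch short.
#[derive(Default)]
struct ResponseCache {
    entries: HashMap<CacheKey, CachedResponse>,
}

impl ResponseCache {
    fn insert(&mut self, key: CacheKey, response: CachedResponse) {
        self.entries.insert(key, response);
    }

    /// Returns a copy of the cached response with usage reset to zero,
    /// so a hit adds nothing to local spend counters.
    fn get(&self, key: &CacheKey) -> Option<CachedResponse> {
        self.entries.get(key).map(|hit| CachedResponse {
            text: hit.text.clone(),
            usage: Usage::default(),
        })
    }
}
```

Because the full wire body is part of the key, any change to the request (model, messages, sampling parameters) produces a miss rather than a stale hit.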
+### Changed
+
+- Removed the deprecated `deepseek` and `deepseek-tui` binary shims from the
+  v0.9.0 Cargo crates and GitHub release artifact matrix. The canonical
+  `codewhale`, `codew`, and `codewhale-tui` entry points remain, the private
+  deprecated `npm/deepseek-tui` notice package stays unpublished, and DeepSeek
+  provider/model/env/config compatibility remains first-class.
+- Command-adjacent config persistence and auto model routing now live in
+  neutral TUI modules instead of command-owned files, reducing command-boundary
+  coupling while preserving current `/config`, `/model`, UI, runtime, and
+  sub-agent behavior (#2871). Thanks @aboimpinto for landing this first staged
+  command-boundary layer from the broader #2851/#2791 design direction.
+- `/config` now reports the canonical `~/.codewhale/settings.toml` path for TUI
+  settings while still reading legacy DeepSeek-branded settings fallbacks and
+  migrating them into the CodeWhale home on load.
+- Provider switches now roll back transactionally when the first request to a
+  newly selected provider fails authentication: CodeWhale restores the previous
+  provider/model, model-ID passthrough, onboarding/API-key state, runtime
+  config, persisted provider selection, and engine handle so users can return
+  to DeepSeek after a failed Moonshot/Kimi switch (#2754, #2755). Thanks
+  @Dr3259 for the Windows repro and @cyq1017 for the draft fix.
+- `PATCH /v1/threads/{id}` can now update a thread's persisted workspace for
+  GUI/runtime clients. Workspace changes reject active turns and evict idle
+  cached engines so the next turn starts in the new workspace.
+- Split `web_run` session/page cache state so cached page reads use shared
+  page handles and do not serialize through the mutation path. The harvest also
+  adds panic-safe state write-back and serializes cache-mutating unit tests so
+  the global web cache remains stable under normal Cargo test parallelism.
+- Appended volatile `<turn_meta>` blocks after user text in outgoing user
+  message content arrays so provider prefix caches can keep matching the stable
+  user-input prefix across date, route, and working-set changes.
+- Projected mode, approval, and tool-taxonomy prompt metadata per request
+  instead of mutating stored system prompts, keeping provider prefix-cache
+  inputs byte-stable while preserving mode-specific instructions (#2687).
+  Thanks @LeoAlex0 for the implementation.
+- Softened contribution intake automation: external issues now receive a warm
+  triage note and are never auto-closed by the contribution gate, while the PR
+  gate copy makes clear that dry-run observations are about maintainer safety,
+  not contributor quality.
+- Added a PR gate marker guard so reopened unapproved PRs do not get duplicate
+  intake comments, and clarified that PR reopening should happen after
+  allowlist approval is merged.
+- Ollama `/model` completions no longer show hosted DeepSeek API model IDs.
+  The picker preserves the current or saved local Ollama tag, and users can
+  still fetch installed model IDs through `/models` instead of relying on a
+  stale static default (#2742). Thanks @reidliu41 for the focused report and
+  draft fix.
+- MCP runtime API tool listings and approval summaries no longer split
+  underscored MCP server names at the first `_`. Tool-call routing already used
+  the longest registered server name; the list endpoint now reuses that parser,
+  and approval cards show the full MCP target route instead of a guessed server
+  segment (#2744). Thanks @lioryx, @cyq1017, and @puneetdixit200 for the report
+  and matching fixes.
+- Documented the agent and sub-agent stewardship ethos so future automation
+  preserves human issue intake, careful PR review, and contributor credit.
+- Moved the TUI Starlark execpolicy parser and PTY support behind non-OHOS
+  target dependencies so published OpenHarmony builds no longer pull `nix` 0.28
+  through `rustyline` or `portable-pty`.
+- Explicit `skills_dir` configuration is now unioned with workspace skill
+  discovery instead of being shadowed by workspace-local skills, and configured
+  skills take precedence over global defaults when prompt space is constrained.
+- Tool-agent sub-agent routing now inherits the parent session model, or an
+  explicit tool-agent override, instead of hard-coding `deepseek-v4-flash`;
+  the fast lane still disables thinking through provider-aware request shaping.
+- Dense successful read/search/list tool runs now collapse into a single
+  expandable transcript row by default, while running, failed, shell, patch,
+  review, diff, and other risky tool cells remain visible. The setting
+  `tool_collapse = "compact" | "expanded" | "calm"` controls the behavior.
+- Pending-input preview rows now label delivery mode explicitly as steer
+  pending, rejected steer, or queued follow-up, with wrapped continuation rows
+  aligned under the label so busy-turn input state is easier to read (#2054).
+- Editing a queued follow-up is now an explicit pending-input state. Pressing
+  `Esc` while editing a queued follow-up restores the original queued message
+  instead of cancelling the active turn or silently dropping the queued work
+  (#2054).
+- Approval prompts now render prominent command, directory, file, path, or
+  target rows before falling back to raw JSON params. Shell approvals preserve
+  long command tails, split common shell chains for review, and show compact
+  `printf > file` previews while keeping intent summaries visible (#1991,
+  #2269).
+- Sidebar hover details now use row-level metadata for truncated Work, Tasks,
+  and Agents rows. Mouse hover opens a bordered, wrapping popover with the full
+  underlying row text, long turn/agent ids, and current sub-agent progress
+  instead of repeating the already-ellipsized sidebar label (#2694, #2734).
+- Sub-agents now preserve checkpoint metadata around long model calls. A
+  per-step API timeout marks the child as interrupted with a continuable
+  checkpoint instead of ending as a null failed result, and `agent_eval` can
+  explicitly continue a live checkpointed interrupted child while normal
+  completed/failed/cancelled follow-up behavior stays unchanged (#2029).
+- Durable task recovery no longer requeues tasks that were `running` when the
+  previous CodeWhale process exited. On restart those records are marked failed
+  with a recovery note, and any running tool-call summaries are marked failed
+  too, so stale shell/task state cannot silently become live work again (#1786).
+- Auto-generated project instructions now reuse the bounded Project Context
+  Pack data instead of running an unbounded summary/tree scan when no
+  `.codewhale/instructions.md` file exists. The fallback keeps later
+  top-level folders visible in noisy large workspaces while the dynamic
+  `<project_context_pack>` marker remains controlled by its own setting
+  (#697, #1827).
+- Project context loading now uses a bounded process-local content-signature
+  cache for repeated hot-path loads. The cache covers workspace/parent
+  instructions, global AGENTS/WHALE fallbacks, repo constitution files,
+  generated-context targets, trust markers, and trust config paths, and it
+  stores post-load signatures so auto-generated context deletion/regeneration
+  stays correct (#2636).
+- Configuration docs now show the provider-local `path_suffix` escape hatch
+  for OpenAI-compatible gateways that accept `/chat/completions` but reject
+  `/v1/chat/completions`, while making clear that model listing and DeepSeek
+  beta routes keep their built-in paths (#1874).
+- The config crate now carries the v0.9 HarnessPosture data model:
+  `HarnessPosture`, `HarnessProfile`, and typed posture/compaction/tool/safety
+  enums. The schema rejects misspelled posture names or unknown profile keys
+  instead of silently falling back to `custom`; a pure resolver can match
+  provider/model routes for tests and future status plumbing, while runtime
+  provider/model posture selection remains a follow-up (#2693, #2741, #2728).
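The durable task recovery entry above describes a restart rule: records left in `running` are marked failed with a recovery note instead of being requeued, and their running tool-call summaries are marked failed too. A minimal sketch of that rule, with hypothetical names (`TaskStatus`, `TaskRecord`, `recover_after_restart`) and a placeholder note string that are not taken from the codebase:

```rust
/// Hypothetical status enum shared by tasks and tool-call summaries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TaskStatus {
    Queued,
    Running,
    Completed,
    Failed,
}

/// Hypothetical persisted task record.
#[derive(Debug)]
struct TaskRecord {
    id: String,
    status: TaskStatus,
    note: Option<String>,
    /// Statuses of the task's tool-call summaries.
    tool_calls: Vec<TaskStatus>,
}

/// Marks every task that was `Running` at shutdown as `Failed`, attaches a
/// recovery note, and fails its still-running tool-call summaries.
/// Returns how many tasks were changed. Nothing is requeued.
fn recover_after_restart(tasks: &mut [TaskRecord]) -> usize {
    let mut recovered = 0;
    for task in tasks.iter_mut() {
        if task.status != TaskStatus::Running {
            continue;
        }
        task.status = TaskStatus::Failed;
        // Placeholder wording; the real note text is not shown in the changelog.
        task.note = Some("interrupted by process exit; not requeued".to_string());
        for call in task.tool_calls.iter_mut() {
            if *call == TaskStatus::Running {
                *call = TaskStatus::Failed;
            }
        }
        recovered += 1;
    }
    recovered
}
```

Running the function a second time changes nothing, since no record is left in `Running` after the first pass.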

 ### Fixed

 - **Benchmark workspace copying.** Fixed benchmark workspace file copying so local benchmark tasks can preserve their intended file layout during agent runs.
 - **MiMo default tests.** Guarded Xiaomi MiMo default-model tests against ambient CI provider environment variables.
+- Stream/body decode failures such as `Stream read error: error decoding
+  response body` are now classified as recoverable network interruptions
+  instead of generic internal errors, keeping the transcript and triage metadata
+  aligned with the existing stream retry path (#2847). Thanks
+  @qamranmushtaq-collab for the Windows/npx DeepSeek report.
+- The TUI footer, `/status`, `/mcp` manager, and command-palette MCP entries
+  now count trusted workspace-local `.codewhale/mcp.json` servers together with
+  the global MCP config, matching `codewhale mcp list` for merged global +
+  project setups (#2787). Thanks @yekern for the detailed reproduction.
+- AltGr key chords in the composer no longer get swallowed by sidebar shortcuts
+  on AZERTY and other international layouts, so characters such as `@`, `#`,
+  `$`, `!`, and `%` can be entered normally (#2863, #2867). Thanks
+  @ousamabenyounes for the fix and report.
+- Sub-agent shell completions now refresh the workspace branch/status chip
+  immediately, and `/subagents` plus the Agents sidebar show each sub-agent's
+  current workspace branch when it is running in a child worktree.
+- Authentication failures now include redacted request context such as provider,
+  base URL authority, model, key source, key type, and key fingerprint, making
+  stale provider, endpoint, or API-key state diagnosable without exposing the
+  secret (#2665, #2792). Thanks @mvanhorn for the implementation.
+- Browser-opening actions now compile on non-desktop targets by delegating the
+  unsupported-platform error to the shared URL opener instead of hiding the TUI
+  wrapper behind a narrower macOS/Linux/Windows cfg. Thanks @ci4ic4 for the
+  NetBSD/pkgsrc packaging report and fix (#2789).
+- MCP tool routing now preserves server names that contain underscores.
+  `parse_prefixed_name` matches the qualified `mcp_<server>_<tool>` name against
+  the set of registered server names and prefers the longest match, so tools on
+  a server like `my_db` are reachable and an overlapping `my` / `my_db` pair
+  routes correctly. Routing falls back to the legacy first-underscore split when
+  no registered server matches (#2744).
+- Schema-hydrated deferred tools no longer render as a completed run. The first
+  use of a deferred tool returns a schema-hydration result instead of executing;
+  the transcript and sidebar now show "tool loaded — retry required" via a
+  dedicated hydrated status, so it is no longer indistinguishable from a real
+  successful execution. A hydrated row also ranks with active work rather than
+  completed successes (#2648).
+- `codewhale sessions` now shows `codewhale resume <session-id>` in the footer
+  instead of the invalid dispatcher command `codewhale --resume <session-id>`
+  (#2758, #2760).
+- TUI HTTP clients now install the Rustls ring crypto provider before building
+  `reqwest` clients, covering engine, runtime API, tool, MCP, config, and skill
+  download paths. This keeps the no-provider TLS build from panicking during
+  tests or embedded startup paths that do not enter through the main binary.
+- Prompt byte-stability tests now pin their temporary home and skills
+  environment under the shared test-env lock so global skill directories cannot
+  perturb deterministic prompt bytes during parallel test runs.
+
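The MCP routing fix above describes a longest-match rule over registered server names, with the legacy first-underscore split as a fallback. The sketch below illustrates that rule only. It borrows the name `parse_prefixed_name` from the entry, but the signature, the return type, and the treatment of empty segments are assumptions, not the project's actual code.

```rust
/// Splits a qualified `mcp_<server>_<tool>` name into `(server, tool)`.
///
/// Assumed behavior for this sketch:
/// 1. Among registered servers, pick the longest one that is followed by
///    `_` and a non-empty tool name.
/// 2. If none match, fall back to splitting at the first underscore.
fn parse_prefixed_name<'a>(
    qualified: &'a str,
    servers: &[&str],
) -> Option<(&'a str, &'a str)> {
    let rest = qualified.strip_prefix("mcp_")?;

    let mut best: Option<(&'a str, &'a str)> = None;
    for server in servers {
        let tool = rest
            .strip_prefix(*server)
            .and_then(|tail| tail.strip_prefix('_'));
        if let Some(tool) = tool {
            let longer = best.map_or(true, |(s, _)| server.len() > s.len());
            if !tool.is_empty() && longer {
                // Slice from `rest` so the result borrows from the input name.
                best = Some((&rest[..server.len()], tool));
            }
        }
    }

    // Legacy fallback: split at the first underscore.
    best.or_else(|| {
        rest.split_once('_')
            .filter(|(s, t)| !s.is_empty() && !t.is_empty())
    })
}
```

With servers `my` and `my_db` registered, `mcp_my_db_query` resolves to server `my_db` and tool `query`, while `mcp_my_list` still resolves to server `my`.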
+### Community
+
+Thanks to **@sximelon** for reporting and fixing the saved-session resume
+footer hint (#2758, #2760), **@cyq1017** for the custom
+DuckDuckGo-compatible search endpoint, custom completion sound file support,
+restore-listing implementation, and pending-input delivery-mode label work
+(#2510, #2512, #2513, #2532, #2054),
+**@Artenx** for the private-search endpoint report (#2436),
+**@LHqweasd** for the Windows custom notification sound request (#2484),
+**@wywsoor** for the broader macOS/iTerm rollback UX report (#2494),
+**@HUQIANTAO** for the `web_run` lock-splitting work (#2502), turn-metadata
+prefix-cache stability work (#2517), and project-context cache direction
+(#2636), **@xyuai** for canonical CodeWhale
+settings-path migration work (#2730), **@gaord** for the runtime thread
+workspace update and completed-thread save APIs (#2640, #2639),
+**@shenjackyuanjie** for the
+HarmonyOS/OpenHarmony port and MatePad Edge validation trail (#2634),
+**@ousamabenyounes** for the AZERTY AltGr composer shortcut fix (#2863,
+#2867), **@reidliu41** for the hotbar action-registry foundation (#2866),
+**@ljm3790865** for the multi-tab core/persistence foundation and broader
+collaboration direction (#2864, #2753),
+**@aboimpinto** for the direct command-support boundary cleanup in #2871 and
+the broader #2851/#2791 command-layer design direction,
+**@idling11** for the PlanArtifact direction in Plan mode (#2733), the dense
+tool-call transcript collapse/sidebar detail direction (#2738, #2734, #2692,
+#2694), and the HarnessPosture config model for provider/model posture (#2741,
+#2693), **@h3c-hexin** for the tool-agent model inheritance and configured
+`skills_dir` fixes (#2736, #2737), **@AresNing** for the turn-end observer hook
+work (#2578), and **@tdccccc** for the approval key-detail and shell-preview
+work (#1991, #2269). Thanks also to **@qiyuanlicn** for the
+checkpoint/resume report that shaped the sub-agent recovery slice (#2029),
+**@bevis-wong** for the long-running shell/task liveness report (#1786),
+**@shuxiangxuebiancheng** for the third-party OpenAI-compatible path report
+(#1874), **@hongqitai** and **@cyq1017** for the follow-up path-suffix PR
+review trail (#2508, #2506), **@NASLXTO** and **@wuxixing** for the
+large-workspace startup reports (#697, #1827), and **@linzhiqin2003** and
+**@merchloubna70-dot** for earlier context-cap and startup-diagnosis work that
+shaped this bounded fallback. Thanks also to **@cyq1017** for the MCP
+underscore-server-name fix and Xiaomi MiMo pricing (#2747, #2744, #2750, #2731),
+**@puneetdixit200** for independently diagnosing and fixing the same MCP
+underscore issue (#2746, #2744), **@mvanhorn** for the hydrated deferred-tool
+render fix (#2757, #2648), and **@xyuai** for the Xiaomi MiMo Token Plan region
+documentation (#2756, #2735). Additional thanks to **@Implementist** for Plan
+prompt scrolling, wrapping, and display-width fixes, **@jrcjrcc** for the
+Windows sub-agent completion render-width fix, and **@punkcanyang** for the
+original `/init` implementation harvested through #2771/#2745.

 ## [0.8.53] - 2026-06-03

@@ -18,15 +18,8 @@ toml = ["schemaui/toml"]
 name = "codewhale-tui"
 path = "src/main.rs"

-# Legacy alias — forwards to `codewhale-tui` and prints a deprecation
-# notice. Will be removed in v0.9.0.
-[[bin]]
-name = "deepseek-tui"
-path = "src/bin/deepseek_tui_legacy_shim.rs"
-
 [dependencies]
 anyhow = "1.0.100"
-arboard = "3.4"
 codewhale-config = { path = "../config", version = "0.8.54" }
 codewhale-protocol = { path = "../protocol", version = "0.8.54" }
 codewhale-release = { path = "../release", version = "0.8.54" }
@@ -47,10 +40,10 @@ fd-lock = "4.0.4"
 futures-util = "0.3.31"
 ratatui = "0.30"
 regex = "1.11"
-reqwest = { version = "0.13.1", default-features = false, features = ["blocking", "json", "stream", "multipart", "form", "rustls", "http2", "gzip", "brotli"] }
+reqwest = { version = "0.13.1", default-features = false, features = ["blocking", "json", "stream", "multipart", "form", "rustls-no-provider", "http2", "gzip", "brotli"] }
+rustls.workspace = true
 qrcode = { version = "0.14", default-features = false }
 similar = "2"
-rustyline = "15.0.0"
 serde = { version = "1.0.228", features = ["derive"] }
 serde_json = { version = "1.0.149", features = ["preserve_order"] }
 schemars = { version = "1.2.1", features = ["derive", "preserve_order"] }
@@ -71,18 +64,19 @@ tower-http = { version = "0.6", features = ["cors"] }
 wait-timeout = "0.2"
 multimap = "0.10.0"
 shlex = "1.3.0"
-starlark = "0.13.0"
 tiny_http = "0.12"
-portable-pty = "0.9"
 zeroize = "1.8.2"
 ignore = "0.4"
 image = { version = "0.25", default-features = false, features = ["png"] }
+lru = "0.16"
+parking_lot = "0.12"
 pdf-extract = "0.7"
 tar = "0.4"
 flate2 = "1.1"
 sha2 = "0.10"

 [dev-dependencies]
+cucumber = "0.23.0"
 wiremock = "0.6"
 pretty_assertions = "1.4"
 vt100 = "0.15"
@@ -90,9 +84,16 @@ vt100 = "0.15"
 [target.'cfg(unix)'.dependencies]
 libc = "0.2"

+[target.'cfg(any(target_os = "macos", target_os = "windows", all(target_os = "linux", not(target_env = "ohos"))))'.dependencies]
+arboard = "3.4"
+
+[target.'cfg(not(target_env = "ohos"))'.dependencies]
+portable-pty = "0.9"
+starlark = "0.13.0"
+
 [target.'cfg(target_os = "macos")'.dependencies]
 objc2 = "0.6.3"
 objc2-foundation = { version = "0.3.2", default-features = false, features = ["std", "NSArray", "NSDictionary", "NSError", "NSObject", "NSString", "NSURL"] }

 [target.'cfg(target_os = "windows")'.dependencies]
-windows = { version = "0.60", features = ["Win32_Foundation", "Win32_Security", "Win32_System_Console", "Win32_System_Diagnostics_Debug", "Win32_System_JobObjects", "Win32_System_Threading", "Win32_UI_WindowsAndMessaging"] }
+windows = { version = "0.60", features = ["Win32_Foundation", "Win32_Media_Audio", "Win32_Security", "Win32_System_Console", "Win32_System_Diagnostics_Debug", "Win32_System_JobObjects", "Win32_System_Threading", "Win32_UI_WindowsAndMessaging"] }
@@ -121,11 +121,9 @@ fn configure_windows_stack() {
    match std::env::var("CARGO_CFG_TARGET_ENV").as_deref() {
        Ok("msvc") => {
            println!("cargo:rustc-link-arg-bin=codewhale-tui=/STACK:8388608");
-            println!("cargo:rustc-link-arg-bin=deepseek-tui=/STACK:8388608");
        }
        Ok("gnu") => {
            println!("cargo:rustc-link-arg-bin=codewhale-tui=-Wl,--stack,8388608");
-            println!("cargo:rustc-link-arg-bin=deepseek-tui=-Wl,--stack,8388608");
        }
        _ => {}
    }
@@ -1,32 +0,0 @@
-//! Legacy `deepseek-tui` alias.
-//!
-//! Forwards argv to the `codewhale-tui` runtime and prints a one-line
-//! deprecation notice to stderr on each invocation. This binary exists
-//! for one release cycle to give existing installs a smooth path to the
-//! new name; it will be removed in v0.9.0. See `docs/REBRAND.md` for the
-//! full migration story.
-
-use std::env;
-use std::process::Command;
-
-fn main() {
-    eprintln!(
-        "warning: `deepseek-tui` is deprecated; run `codewhale-tui` (or `codewhale`) instead. \
-         This alias will be removed in v0.9.0."
-    );
-    let args: Vec<String> = env::args_os()
-        .skip(1)
-        .map(|a| a.to_string_lossy().into_owned())
-        .collect();
-    let status = match Command::new("codewhale-tui").args(&args).status() {
-        Ok(s) => s,
-        Err(e) => {
-            eprintln!(
-                "error: failed to spawn `codewhale-tui`: {e}. Is it on PATH? \
-                 Install with `cargo install codewhale-tui` or via npm/Homebrew."
-            );
-            std::process::exit(127);
-        }
-    };
-    std::process::exit(status.code().unwrap_or(1));
-}
@@ -62,6 +62,7 @@ where
    }
 }

+#[cfg(not(target_env = "ohos"))]
 pub fn apply_to_pty_command<I, K, V>(cmd: &mut portable_pty::CommandBuilder, overrides: I)
 where
    I: IntoIterator<Item = (K, V)>,
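Reviewer note: the `#[cfg(not(target_env = "ohos"))]` gate above pairs with the manifest change that moves `portable-pty` into a target-specific dependency table. A minimal sketch of the same pattern, assuming a hypothetical `pty_backend` helper that is not part of this commit, might look like this:

```rust
// Hypothetical helper: reports which PTY backend is compiled in.
// On OpenHarmony targets the dependency is absent, so the gated
// variant returns None and callers need no cfg checks of their own.
#[cfg(not(target_env = "ohos"))]
fn pty_backend() -> Option<&'static str> {
    Some("portable-pty")
}

#[cfg(target_env = "ohos")]
fn pty_backend() -> Option<&'static str> {
    None
}
```

Providing both variants keeps call sites free of `cfg` attributes. The commit itself simply omits the function on OHOS.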
@@ -61,7 +61,10 @@ pub(super) fn from_api_tool_name(name: &str) -> String {
                    break;
                }
            }
-            if let Ok(code) = u32::from_str_radix(&hex, 16)
+            // Only decode if we got exactly 6 hex digits (matching encoder output).
+            // Fewer digits means a truncated/malformed sequence — pass through as-is.
+            if hex.len() == 6
+                && let Ok(code) = u32::from_str_radix(&hex, 16)
                && let Some(decoded) = std::char::from_u32(code)
            {
                if let Some('-') = iter.peek().copied() {
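Reviewer note: the new `hex.len() == 6` guard rejects truncated escape sequences before they are decoded. The sketch below isolates that rule in a hypothetical `decode_fixed_width_hex` helper. It assumes, as the added comment states, that the encoder always emits exactly six hex digits; the surrounding escape syntax is not reproduced here.

```rust
/// Decodes one escaped code point, accepting only the fixed-width form.
/// Hypothetical helper for illustration; not a function from this commit.
fn decode_fixed_width_hex(hex: &str) -> Option<char> {
    // Reject anything that is not exactly six ASCII hex digits. The digit
    // check also excludes a leading '+', which `from_str_radix` would accept.
    if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let code = u32::from_str_radix(hex, 16).ok()?;
    // `char::from_u32` returns None for surrogates and out-of-range values.
    char::from_u32(code)
}
```

Under these assumptions `"00002e"` decodes to `'.'`, while the shorter `"2e"` is passed over rather than decoded.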
@@ -158,6 +161,7 @@ pub struct DeepSeekClient {
    connection_health: Arc<AsyncMutex<ConnectionHealth>>,
    rate_limiter: Arc<AsyncMutex<TokenBucket>>,
    path_suffix: Option<String>,
+    pub(super) stream_idle_timeout: Duration,
 }

 const CONNECTION_FAILURE_THRESHOLD: u32 = 2;
@@ -325,6 +329,7 @@ impl Clone for DeepSeekClient {
            connection_health: self.connection_health.clone(),
            rate_limiter: self.rate_limiter.clone(),
            path_suffix: self.path_suffix.clone(),
+            stream_idle_timeout: self.stream_idle_timeout,
        }
    }
 }
@@ -581,7 +586,9 @@ impl DeepSeekClient {
        validate_base_url_security(&base_url)?;
        let retry = config.retry_policy();
        let default_model = config.default_model();
+        let stream_idle_timeout = Duration::from_secs(config.stream_chunk_timeout_secs());
        let http_headers = config.http_headers();
+        let insecure_skip_tls_verify = config.insecure_skip_tls_verify();
        let path_suffix = config
            .provider_config_for(api_provider)
            .and_then(|p| p.path_suffix.clone());
@@ -597,12 +604,24 @@ impl DeepSeekClient {
                http_headers.len()
            ));
        }
+        if insecure_skip_tls_verify {
+            logging::warn(format!(
+                "TLS certificate verification is disabled for provider {}; prefer SSL_CERT_FILE with a trusted custom CA bundle when possible",
+                api_provider.as_str()
+            ));
+        }
        logging::info(format!(
            "Retry policy: enabled={}, max_retries={}, initial_delay={}s, max_delay={}s",
            retry.enabled, retry.max_retries, retry.initial_delay, retry.max_delay
        ));

-        let http_client = Self::build_http_client(&api_key, &http_headers)?;
+        let http_client = Self::build_http_client(
+            &api_key,
+            &http_headers,
+            api_provider,
+            &base_url,
+            insecure_skip_tls_verify,
+        )?;

        Ok(Self {
            http_client,
@@ -614,15 +633,19 @@ impl DeepSeekClient {
            connection_health: Arc::new(AsyncMutex::new(ConnectionHealth::default())),
            rate_limiter: Arc::new(AsyncMutex::new(TokenBucket::from_env())),
            path_suffix,
+            stream_idle_timeout,
        })
    }

    fn build_http_client(
        api_key: &str,
        extra_headers: &HashMap<String, String>,
+        api_provider: ApiProvider,
+        base_url: &str,
+        insecure_skip_tls_verify: bool,
    ) -> Result<reqwest::Client> {
-        let headers = build_default_headers(api_key, extra_headers)?;
-        let mut builder = reqwest::Client::builder()
+        let headers = build_default_headers(api_key, extra_headers, api_provider, base_url)?;
+        let mut builder = crate::tls::reqwest_client_builder()
            .default_headers(headers)
            .user_agent(concat!(
                "Mozilla/5.0 (compatible; codewhale/",
@@ -643,6 +666,9 @@ impl DeepSeekClient {
        {
            builder = add_extra_root_certs(builder, &cert_path);
        }
+        if insecure_skip_tls_verify {
+            builder = builder.danger_accept_invalid_certs(true);
+        }
        builder.build().map_err(Into::into)
    }

@@ -651,21 +677,52 @@ impl DeepSeekClient {
        api_key: &str,
        extra_headers: &HashMap<String, String>,
    ) -> Result<HeaderMap> {
-        build_default_headers(api_key, extra_headers)
+        build_default_headers(
+            api_key,
+            extra_headers,
+            ApiProvider::Deepseek,
+            crate::config::DEFAULT_DEEPSEEK_BASE_URL,
+        )
+    }
+
+    #[cfg(test)]
+    fn default_headers_for_provider(
+        api_key: &str,
+        extra_headers: &HashMap<String, String>,
+        api_provider: ApiProvider,
+        base_url: &str,
+    ) -> Result<HeaderMap> {
+        build_default_headers(api_key, extra_headers, api_provider, base_url)
    }
 }

 fn build_default_headers(
    api_key: &str,
    extra_headers: &HashMap<String, String>,
+    api_provider: ApiProvider,
+    base_url: &str,
 ) -> Result<HeaderMap> {
    let mut headers = HeaderMap::new();
    headers.insert(CONTENT_TYPE, HeaderValue::from_static("application/json"));
-    if !api_key.trim().is_empty() {
-        headers.insert(
-            AUTHORIZATION,
-            HeaderValue::from_str(&format!("Bearer {api_key}"))?,
-        );
+    let api_key = api_key.trim();
+    let auth_header_name = if !api_key.is_empty()
+        && api_provider == ApiProvider::XiaomiMimo
+        && (xiaomi_mimo_base_url_uses_token_plan(base_url)
+            || xiaomi_mimo_api_key_uses_token_plan(api_key))
+    {
+        Some(HeaderName::from_static("api-key"))
+    } else if !api_key.is_empty() {
+        Some(AUTHORIZATION)
+    } else {
+        None
+    };
+    if let Some(header_name) = auth_header_name.as_ref() {
+        let header_value = if *header_name == AUTHORIZATION {
+            HeaderValue::from_str(&format!("Bearer {api_key}"))?
+        } else {
+            HeaderValue::from_str(api_key)?
+        };
+        headers.insert(header_name.clone(), header_value);
    }
    for (name, value) in extra_headers {
        let name = name.trim();
@@ -674,7 +731,10 @@ fn build_default_headers(
            continue;
        }
        let header_name = HeaderName::from_bytes(name.as_bytes())?;
-        if header_name == AUTHORIZATION || header_name == CONTENT_TYPE {
+        if header_name == AUTHORIZATION
+            || header_name == CONTENT_TYPE
+            || auth_header_name.as_ref() == Some(&header_name)
+        {
            continue;
        }
        headers.insert(header_name, HeaderValue::from_str(value)?);
@@ -682,6 +742,24 @@ fn build_default_headers(
    Ok(headers)
 }

+fn xiaomi_mimo_base_url_uses_token_plan(base_url: &str) -> bool {
+    let normalized = base_url.trim().to_ascii_lowercase();
+    let without_scheme = normalized
+        .strip_prefix("https://")
+        .or_else(|| normalized.strip_prefix("http://"))
+        .unwrap_or(&normalized);
+    let host = without_scheme
+        .split(['/', '?', '#'])
+        .next()
+        .unwrap_or_default();
+    let host = host.split(':').next().unwrap_or(host);
+    host.starts_with("token-plan-") && host.ends_with(".xiaomimimo.com")
+}
+
+fn xiaomi_mimo_api_key_uses_token_plan(api_key: &str) -> bool {
+    api_key.trim_start().starts_with("tp-")
+}
+
 impl DeepSeekClient {
    /// Returns the API base URL used by this client.
    pub fn base_url(&self) -> &str {
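Reviewer note: `xiaomi_mimo_base_url_uses_token_plan` extracts the host by string splitting rather than with a URL parser. The standalone sketch below mirrors that logic so it can be exercised in isolation. The names `host_of` and `uses_token_plan_host` are hypothetical, and so is the example host `token-plan-cn.xiaomimimo.com`.

```rust
/// Extracts the lowercase host from a base URL without a URL-parsing crate.
fn host_of(base_url: &str) -> String {
    let normalized = base_url.trim().to_ascii_lowercase();
    // Drop the scheme if present; otherwise treat the input as host-first.
    let without_scheme = normalized
        .strip_prefix("https://")
        .or_else(|| normalized.strip_prefix("http://"))
        .unwrap_or(&normalized);
    // Everything before the first path, query, or fragment delimiter.
    let authority = without_scheme
        .split(['/', '?', '#'])
        .next()
        .unwrap_or_default();
    // Strip an optional port.
    let host = authority.split(':').next().unwrap_or(authority);
    host.to_string()
}

/// Same prefix/suffix test as the function in the diff above.
fn uses_token_plan_host(base_url: &str) -> bool {
    let host = host_of(base_url);
    host.starts_with("token-plan-") && host.ends_with(".xiaomimimo.com")
}
```

This approach does not handle userinfo (`user@host`) or bracketed IPv6 literals. That is likely acceptable for a fixed vendor hostname check, but worth keeping in mind if the helper is reused.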
@@ -852,7 +930,10 @@ impl DeepSeekClient {
            anyhow::bail!("Speech synthesis failed: HTTP {status}: {error_text}");
        }

-        let response_text = response.text().await.unwrap_or_default();
+        let response_text = response
+            .text()
+            .await
+            .context("Failed to read speech synthesis response body")?;
        let payload: Value = serde_json::from_str(&response_text)
            .context("Failed to parse speech synthesis response JSON")?;
        let (audio_bytes, transcript) = parse_speech_audio_response(&payload)?;
@@ -904,6 +985,8 @@ impl DeepSeekClient {
        let probe = self.http_client.get(health_url).send().await;
        match probe {
            Ok(resp) if resp.status().is_success() => {
+                // Consume the response body so the connection can be returned to the pool.
+                let _ = resp.text().await;
                self.mark_request_success().await;
                logging::info("Recovery probe succeeded");
            }
@@ -1021,6 +1104,8 @@ impl LlmClient for DeepSeekClient {
        let response = self.http_client.get(health_url).send().await;
        match response {
            Ok(resp) if resp.status().is_success() => {
+                // Consume the response body so the connection can be returned to the pool.
+                let _ = resp.text().await;
                self.mark_request_success().await;
                Ok(true)
            }
@@ -1320,8 +1405,8 @@ pub(super) fn parse_usage(usage: Option<&Value>) -> Usage {
    });

    Usage {
-        input_tokens: input_tokens as u32,
-        output_tokens: output_tokens as u32,
+        input_tokens: input_tokens.min(u64::from(u32::MAX)) as u32,
+        output_tokens: output_tokens.min(u64::from(u32::MAX)) as u32,
        prompt_cache_hit_tokens,
        prompt_cache_miss_tokens,
        reasoning_tokens,
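Reviewer note: the change from `as u32` to `.min(u64::from(u32::MAX)) as u32` replaces silent wrap-around with saturation. An equivalent formulation, shown here as a hypothetical `saturate_to_u32` helper, uses `u32::try_from`:

```rust
/// Converts a token count to u32, clamping instead of wrapping.
/// Hypothetical helper; the commit inlines the `.min(...)` form instead.
fn saturate_to_u32(tokens: u64) -> u32 {
    u32::try_from(tokens).unwrap_or(u32::MAX)
}
```

For comparison, `(u64::from(u32::MAX) + 1) as u32` evaluates to `0`, which is the failure mode the original cast allowed.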
@@ -1360,7 +1445,10 @@ impl DeepSeekClient {
            );
            anyhow::bail!("FIM API error: HTTP {status}: {error_text}");
        }
-        let response_text = response.text().await.unwrap_or_default();
+        let response_text = response
+            .text()
+            .await
+            .context("Failed to read FIM API response body")?;
        let value: serde_json::Value =
            serde_json::from_str(&response_text).context("Failed to parse FIM API response")?;
        let text = value
@@ -1628,6 +1716,109 @@ mod tests {
        assert!(headers.get("x-blank").is_none());
    }

+    #[test]
+    fn build_http_client_accepts_default_tls_verification() {
+        let client = DeepSeekClient::build_http_client(
+            "sk-test",
+            &HashMap::new(),
+            ApiProvider::Deepseek,
+            crate::config::DEFAULT_DEEPSEEK_BASE_URL,
+            false,
+        );
+
+        assert!(client.is_ok());
+    }
+
+    #[test]
+    fn build_http_client_accepts_provider_scoped_tls_skip_verify() {
+        let client = DeepSeekClient::build_http_client(
+            "sk-test",
+            &HashMap::new(),
+            ApiProvider::Openai,
+            crate::config::DEFAULT_OPENAI_BASE_URL,
+            true,
+        );
+
+        assert!(client.is_ok());
+    }
+
+    #[test]
+    fn client_stream_idle_timeout_uses_tui_config() {
+        let client = DeepSeekClient::new(&Config {
+            api_key: Some("sk-test".to_string()),
+            tui: Some(crate::config::TuiConfig {
+                stream_chunk_timeout_secs: Some(777),
+                ..crate::config::TuiConfig::default()
+            }),
+            ..Config::default()
+        })
+        .expect("client");
+
+        assert_eq!(client.stream_idle_timeout, Duration::from_secs(777));
+    }
+
+    #[test]
+    fn xiaomi_mimo_token_plan_endpoint_uses_api_key_header() {
+        let headers = DeepSeekClient::default_headers_for_provider(
+            "tp-test",
+            &HashMap::new(),
+            ApiProvider::XiaomiMimo,
+            crate::config::DEFAULT_XIAOMI_MIMO_BASE_URL,
+        )
+        .expect("headers");
+
+        assert_eq!(
+            headers.get("api-key").and_then(|value| value.to_str().ok()),
+            Some("tp-test")
+        );
+        assert!(
+            headers.get(AUTHORIZATION).is_none(),
+            "Token Plan requires api-key instead of Authorization Bearer"
+        );
+    }
+
+    #[test]
+    fn xiaomi_mimo_tp_key_uses_api_key_header_with_custom_base_url() {
+        let mut extra = HashMap::new();
+        extra.insert("api-key".to_string(), "wrong".to_string());
+        extra.insert("Authorization".to_string(), "Bearer wrong".to_string());
+        let headers = DeepSeekClient::default_headers_for_provider(
+            "tp-custom",
+            &extra,
+            ApiProvider::XiaomiMimo,
+            "https://proxy.example.test/mimo/v1",
+        )
+        .expect("headers");
+
+        assert_eq!(
+            headers.get("api-key").and_then(|value| value.to_str().ok()),
+            Some("tp-custom")
+        );
+        assert!(
+            headers.get(AUTHORIZATION).is_none(),
+            "tp-* Token Plan keys should use api-key auth even through custom gateways"
+        );
+    }
+
+    #[test]
+    fn xiaomi_mimo_pay_as_you_go_endpoint_keeps_bearer_header() {
+        let headers = DeepSeekClient::default_headers_for_provider(
+            "sk-test",
+            &HashMap::new(),
+            ApiProvider::XiaomiMimo,
+            crate::config::XIAOMI_MIMO_PAY_AS_YOU_GO_BASE_URL,
+        )
+        .expect("headers");
+
+        assert_eq!(
+            headers
+                .get(AUTHORIZATION)
+                .and_then(|value| value.to_str().ok()),
+            Some("Bearer sk-test")
+        );
+        assert!(headers.get("api-key").is_none());
+    }
+
    #[test]
    fn chat_messages_keep_current_turn_reasoning_content() {
        let message = Message {
@@ -2320,6 +2511,29 @@ mod tests {
        assert!(body.get("extra_body").is_none());
    }

+    #[test]
+    fn reasoning_effort_off_is_omitted_for_strict_openai_like_providers() {
+        for provider in [
+            ApiProvider::Openai,
+            ApiProvider::Atlascloud,
+            ApiProvider::WanjieArk,
+            ApiProvider::Arcee,
+            ApiProvider::Huggingface,
+            ApiProvider::Moonshot,
+            ApiProvider::Ollama,
+            ApiProvider::Fireworks,
+        ] {
+            let mut body = json!({});
+            apply_reasoning_effort(&mut body, Some("off"), provider);
+
+            assert_eq!(
+                body,
+                json!({}),
+                "provider {provider:?} should not receive unsupported reasoning-off fields"
+            );
+        }
+    }
+
    #[test]
    fn reasoning_effort_uses_nvidia_nim_chat_template_kwargs() {
        let mut body = json!({});
@@ -16,11 +16,6 @@ use tokio::time::timeout as tokio_timeout;

 use crate::config::wire_model_for_provider;

-/// Default idle timeout for SSE stream reads (300 seconds = 5 minutes).
-/// After this period with no data, the stream is considered stalled and
-/// yields a recoverable error so the caller can retry.
-const DEFAULT_STREAM_IDLE_TIMEOUT: Duration = Duration::from_secs(300);
-
 /// Default timeout for the initial streaming response headers.
 ///
 /// `doctor` uses a bounded non-streaming request, but normal TUI turns first
@@ -48,17 +43,6 @@ fn stream_open_timeout_from_env(value: Option<&str>) -> Duration {
    Duration::from_secs(secs)
 }

-/// Reads the `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` env var, falling back to
-/// the default 300s. The parsed value is clamped to [1, 3600] seconds.
-fn stream_idle_timeout() -> Duration {
-    let secs = std::env::var("DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS")
-        .ok()
-        .and_then(|v| v.parse::<u64>().ok())
-        .unwrap_or(DEFAULT_STREAM_IDLE_TIMEOUT.as_secs())
-        .clamp(1, 3600);
-    Duration::from_secs(secs)
-}
-
 use crate::config::ApiProvider;
 use crate::llm_client::StreamEventBox;
 use crate::llm_client::sanitize_http_error_body;
@@ -91,6 +75,7 @@ impl DeepSeekClient {
        &self,
        request: &MessageRequest,
    ) -> Result<MessageResponse> {
+        let cacheable = crate::llm_response_cache::request_is_cacheable(request);
        let messages = build_chat_messages_for_request_and_provider(request, self.api_provider);
        let model = wire_model_for_provider(self.api_provider, &request.model);
        let mut body = json!({
@@ -137,6 +122,24 @@ impl DeepSeekClient {
            self.api_provider,
        );

+        let response_cache_key = if cacheable {
+            let wire_body =
+                serde_json::to_vec(&body).context("Failed to serialize Chat API cache key")?;
+            let key = crate::llm_response_cache::ResponseCache::make_key(
+                self.api_provider.as_str(),
+                &self.base_url,
+                self.path_suffix.as_deref(),
+                &self.api_key,
+                &wire_body,
+            );
+            if let Some(cached) = crate::llm_response_cache::response_cache().get(&key) {
+                return Ok(cached);
+            }
+            Some(key)
+        } else {
+            None
+        };
+
        let url = api_url_with_suffix(
            &self.base_url,
            "chat/completions",
@@ -174,7 +177,11 @@ impl DeepSeekClient {
        let response_text = response.text().await.unwrap_or_default();
        let value: Value =
            serde_json::from_str(&response_text).context("Failed to parse Chat API JSON")?;
-        parse_chat_message(&value)
+        let parsed = parse_chat_message(&value)?;
+        if let Some(key) = response_cache_key {
+            crate::llm_response_cache::response_cache().put(key, parsed.clone());
+        }
+        Ok(parsed)
    }
 }
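Reviewer note: the hunks above build a cache key from the provider, base URL, path suffix, API key, and serialized wire body, then do a lookup before the request and a store after a successful parse. The sketch below illustrates that flow with the standard library only. `ResponseCacheSketch` is hypothetical: the real `llm_response_cache` module is not shown in this section, so the hashing scheme, eviction policy, and value type here are all assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Unbounded in-memory cache; a stand-in for whatever the real module does.
struct ResponseCacheSketch {
    entries: HashMap<u64, String>,
}

impl ResponseCacheSketch {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Folds every input that can change the response into one key, so a
    /// different credential or endpoint never reuses another's entry.
    fn make_key(
        provider: &str,
        base_url: &str,
        path_suffix: Option<&str>,
        api_key: &str,
        wire_body: &[u8],
    ) -> u64 {
        let mut hasher = DefaultHasher::new();
        provider.hash(&mut hasher);
        base_url.hash(&mut hasher);
        path_suffix.hash(&mut hasher);
        api_key.hash(&mut hasher);
        wire_body.hash(&mut hasher);
        hasher.finish()
    }

    fn get(&self, key: u64) -> Option<String> {
        self.entries.get(&key).cloned()
    }

    fn put(&mut self, key: u64, response: String) {
        self.entries.insert(key, response);
    }
}
```

Including the API key in the key material means responses are not shared across credentials. `DefaultHasher` is not collision-resistant, so it is used here only to keep the sketch dependency-free.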

@@ -283,6 +290,7 @@ impl DeepSeekClient {
        // gzip-compressor failure when investigating #103.
        let response_headers = format_stream_headers(response.headers());
        let byte_stream = response.bytes_stream();
+        let stream_idle_timeout = self.stream_idle_timeout;

        let stream = async_stream::stream! {
            use futures_util::StreamExt;
@@ -315,7 +323,7 @@ impl DeepSeekClient {
            let is_reasoning_model = is_reasoning_model_for_stream(api_provider, &model);

            let mut byte_stream = std::pin::pin!(byte_stream);
-            let idle = stream_idle_timeout();
+            let idle = stream_idle_timeout;

            // Telemetry for #103 stream-decode diagnostics: bytes received
            // since the start of this stream and last successful event time.
@@ -1982,6 +1990,8 @@ fn provider_accepts_reasoning_content(provider: ApiProvider) -> bool {
            | ApiProvider::Novita
            | ApiProvider::Fireworks
            | ApiProvider::Siliconflow
+            | ApiProvider::SiliconflowCn
+            | ApiProvider::Volcengine
            | ApiProvider::Arcee
            | ApiProvider::Sglang
    )
@@ -3062,6 +3072,22 @@ mod stream_decoder_tests {
        }
    }

+    fn user_message_with_tail_turn_meta(task: &str, turn_meta: &str) -> Message {
+        Message {
+            role: "user".to_string(),
+            content: vec![
+                ContentBlock::Text {
+                    text: task.to_string(),
+                    cache_control: None,
+                },
+                ContentBlock::Text {
+                    text: turn_meta.to_string(),
+                    cache_control: None,
+                },
+            ],
+        }
+    }
+
    fn tool_message_content(messages: &[Value], index: usize) -> &str {
        messages
            .iter()
@@ -3128,6 +3154,30 @@ mod stream_decoder_tests {
        );
    }

+    #[test]
+    fn request_builder_keeps_tail_turn_meta_after_user_text_for_wire() {
+        let turn_meta = "<turn_meta>\nCurrent local date: 2026-05-09\n</turn_meta>";
+        let messages = vec![
+            user_message_with_tail_turn_meta("first task", turn_meta),
+            Message {
+                role: "assistant".to_string(),
+                content: vec![ContentBlock::Text {
+                    text: "first answer".to_string(),
+                    cache_control: None,
+                }],
+            },
+            user_message_with_tail_turn_meta("second task", turn_meta),
+        ];
+
+        let built = build_chat_messages(None, &messages, "deepseek-v4-flash");
+        let first = user_message_content(&built, 0);
+        let second = user_message_content(&built, 1);
+        let expected_ref = "<turn_meta_unchanged />";
+
+        assert_eq!(first, format!("first task\n{turn_meta}"));
+        assert_eq!(second, format!("second task\n{expected_ref}"));
+    }
+
    #[test]
    fn request_builder_keeps_changed_turn_meta_full_and_updates_recent_hash() {
        let first_meta = "<turn_meta>\nCurrent local date: 2026-05-09\n</turn_meta>";
@@ -1,5 +1,3 @@
-#![allow(dead_code)]
-
 //! Command safety analysis for shell execution
 //!
 //! This module provides pre-execution analysis of shell commands to detect
@@ -374,43 +372,38 @@ pub enum SafetyLevel {
 #[derive(Debug, Clone)]
 pub struct SafetyAnalysis {
    pub level: SafetyLevel,
-    pub command: String,
    pub reasons: Vec<String>,
    pub suggestions: Vec<String>,
 }

 impl SafetyAnalysis {
-    pub fn safe(command: &str) -> Self {
+    pub fn safe(_command: &str) -> Self {
        Self {
            level: SafetyLevel::Safe,
-            command: command.to_string(),
            reasons: vec!["Command is read-only".to_string()],
            suggestions: vec![],
        }
    }

-    pub fn workspace_safe(command: &str, reason: &str) -> Self {
+    pub fn workspace_safe(_command: &str, reason: &str) -> Self {
        Self {
            level: SafetyLevel::WorkspaceSafe,
-            command: command.to_string(),
            reasons: vec![reason.to_string()],
            suggestions: vec![],
        }
    }

-    pub fn requires_approval(command: &str, reasons: Vec<String>) -> Self {
+    pub fn requires_approval(_command: &str, reasons: Vec<String>) -> Self {
        Self {
            level: SafetyLevel::RequiresApproval,
-            command: command.to_string(),
            reasons,
            suggestions: vec![],
        }
    }

-    pub fn dangerous(command: &str, reasons: Vec<String>, suggestions: Vec<String>) -> Self {
+    pub fn dangerous(_command: &str, reasons: Vec<String>, suggestions: Vec<String>) -> Self {
        Self {
            level: SafetyLevel::Dangerous,
-            command: command.to_string(),
            reasons,
            suggestions,
        }
@@ -1012,72 +1005,6 @@ fn is_workspace_safe_command(command: &str) -> bool {
    false
 }

-/// Check if a path escapes the workspace
-pub fn path_escapes_workspace(path: &str, workspace: &str) -> bool {
-    let path_lower = normalize_safety_path(path);
-    let workspace_lower = normalize_safety_path(workspace);
-
-    // Check for obvious escape patterns
-    if path_lower.starts_with("~/") || path_lower.starts_with("$home") {
-        return true;
-    }
-
-    if is_absolute_safety_path(&path_lower) {
-        let path_components = lexical_components(&path_lower);
-        let workspace_components = lexical_components(&workspace_lower);
-        return !components_start_with(&path_components, &workspace_components);
-    }
-
-    // Walk the path components. Track depth relative to the workspace root:
-    // non-`..` components increment depth, `..` components decrement it.
-    // If depth ever goes negative, the path escapes the workspace boundary.
-    // This correctly distinguishes genuine traversal like `../outside` from
-    // names that happen to contain consecutive dots like `foo..bar`.
-    let mut depth: i32 = 0;
-    for component in path_lower.split('/') {
-        match component {
-            "" | "." => {}
-            ".." => depth -= 1,
-            _ => depth += 1,
-        }
-        if depth < 0 {
-            return true;
-        }
-    }
-
-    false
-}
-
-fn normalize_safety_path(path: &str) -> String {
-    path.trim().replace('\\', "/").to_lowercase()
-}
-
-fn is_absolute_safety_path(path: &str) -> bool {
-    path.starts_with('/')
-        || path
-            .as_bytes()
-            .get(1..3)
-            .is_some_and(|bytes| bytes[0] == b':' && bytes[1] == b'/')
-}
-
-fn lexical_components(path: &str) -> Vec<&str> {
-    let mut components = Vec::new();
-    for component in path.split('/') {
-        match component {
-            "" | "." => {}
-            ".." => {
-                components.pop();
-            }
-            _ => components.push(component),
-        }
-    }
-    components
-}
-
-fn components_start_with(path: &[&str], prefix: &[&str]) -> bool {
-    path.len() >= prefix.len() && path.iter().zip(prefix.iter()).all(|(a, b)| a == b)
-}
-
 /// Parse a command and extract the primary command name
 pub fn extract_primary_command(command: &str) -> Option<&str> {
    let trimmed = command.trim();
@@ -1093,56 +1020,6 @@ pub fn extract_primary_command(command: &str) -> Option<&str> {
    }
 }

-/// Categorize commands into groups
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum CommandCategory {
-    FileSystem,
-    Network,
-    Process,
-    Package,
-    Git,
-    Build,
-    System,
-    Shell,
-    Other,
-}
-
-/// Get the category of a command
-pub fn categorize_command(command: &str) -> CommandCategory {
-    let primary = match extract_primary_command(command) {
-        Some(cmd) => cmd.to_lowercase(),
-        None => return CommandCategory::Other,
-    };
-
-    match primary.as_str() {
-        "ls" | "dir" | "cat" | "head" | "tail" | "less" | "more" | "cp" | "mv" | "rm" | "mkdir"
-        | "rmdir" | "touch" | "chmod" | "chown" | "ln" | "find" | "fd" | "locate" | "stat"
-        | "file" => CommandCategory::FileSystem,
-
-        "curl" | "wget" | "fetch" | "nc" | "netcat" | "ssh" | "scp" | "sftp" | "rsync" | "ftp"
-        | "ping" | "traceroute" | "nslookup" | "dig" | "host" | "nmap" => CommandCategory::Network,
-
-        "ps" | "top" | "htop" | "kill" | "killall" | "pkill" | "pgrep" | "nice" | "renice"
-        | "nohup" | "timeout" => CommandCategory::Process,
-
-        "npm" | "yarn" | "pnpm" | "pip" | "pip3" | "brew" | "apt" | "apt-get" | "yum" | "dnf"
-        | "pacman" => CommandCategory::Package,
-
-        "git" | "gh" | "hub" => CommandCategory::Git,
-
-        "make" | "cmake" | "ninja" | "meson" | "cargo" | "go" | "gcc" | "g++" | "clang"
-        | "rustc" | "javac" | "tsc" => CommandCategory::Build,
-
-        "sudo" | "su" | "systemctl" | "service" | "shutdown" | "reboot" | "mount" | "umount"
-        | "fdisk" | "parted" => CommandCategory::System,
-
-        "bash" | "sh" | "zsh" | "fish" | "csh" | "tcsh" | "dash" | "source" | "." | "exec"
-        | "eval" => CommandCategory::Shell,
-
-        _ => CommandCategory::Other,
-    }
-}
-
 // === Unit Tests ===

 #[cfg(test)]
@@ -1321,62 +1198,6 @@ mod tests {
        );
    }

-    #[test]
-    fn test_path_escapes_workspace() {
-        assert!(path_escapes_workspace("/etc/passwd", "/home/user/project"));
-        assert!(path_escapes_workspace("~/secret", "/home/user/project"));
-        assert!(!path_escapes_workspace(
-            "./src/main.rs",
-            "/home/user/project"
-        ));
-    }
-
-    #[test]
-    fn test_path_escapes_workspace_doesnt_flag_double_dot_in_names() {
-        // Names like `foo..bar` should NOT be flagged as path traversal
-        assert!(!path_escapes_workspace(
-            "some..file.txt",
-            "/home/user/project"
-        ));
-        assert!(!path_escapes_workspace(
-            "./dir..name/file.txt",
-            "/home/user/project"
-        ));
-    }
-
-    #[test]
-    fn test_path_escapes_workspace_detects_genuine_traversal() {
-        assert!(path_escapes_workspace("../outside", "/home/user/project"));
-        assert!(path_escapes_workspace(
-            "..\\outside",
-            "C:\\Users\\me\\project"
-        ));
-        assert!(path_escapes_workspace(
-            "./subdir/../../etc/passwd",
-            "/home/user/project"
-        ));
-        assert!(path_escapes_workspace(
-            "/home/user/project/../secret",
-            "/home/user/project"
-        ));
-        assert!(path_escapes_workspace(
-            "C:\\Users\\me\\project\\..\\secret",
-            "C:\\Users\\me\\project"
-        ));
-    }
-
-    #[test]
-    fn test_path_escapes_workspace_allows_absolute_workspace_children() {
-        assert!(!path_escapes_workspace(
-            "/home/user/project/src/main.rs",
-            "/home/user/project"
-        ));
-        assert!(!path_escapes_workspace(
-            "C:\\Users\\me\\project\\src\\main.rs",
-            "C:\\Users\\me\\project"
-        ));
-    }
-
    #[test]
    fn test_extract_primary_command() {
        assert_eq!(extract_primary_command("ls -la"), Some("ls"));
@@ -1387,21 +1208,6 @@ mod tests {
        assert_eq!(extract_primary_command("  git status  "), Some("git"));
    }

-    #[test]
-    fn test_categorize_command() {
-        assert_eq!(categorize_command("ls -la"), CommandCategory::FileSystem);
-        assert_eq!(
-            categorize_command("curl https://example.com"),
-            CommandCategory::Network
-        );
-        assert_eq!(categorize_command("git status"), CommandCategory::Git);
-        assert_eq!(categorize_command("npm install"), CommandCategory::Package);
-        assert_eq!(
-            categorize_command("sudo apt update"),
-            CommandCategory::System
-        );
-    }
-
    // ── classify_command tests ────────────────────────────────────────────────

    /// Helper: split a string on whitespace into a `Vec<&str>` and call
@@ -1183,6 +1183,22 @@ mod tests {
        let _spill_guard = crate::tools::truncate::TEST_SPILLOVER_GUARD
            .lock()
            .unwrap_or_else(|err| err.into_inner());
+        // Set a temporary spillover root so wire-dedup can persist
+        // SHA-addressed tool-result files without depending on a
+        // writable $HOME (nix sandboxes have a read-only home tree).
+        let tmp = tempfile::tempdir().expect("tempdir");
+        let _restore = {
+            let prior = crate::tools::truncate::set_test_spillover_root(Some(
+                tmp.path().join(".deepseek").join("tool_outputs"),
+            ));
+            struct Restore(Option<std::path::PathBuf>);
+            impl Drop for Restore {
+                fn drop(&mut self) {
+                    crate::tools::truncate::set_test_spillover_root(self.0.take());
+                }
+            }
+            Restore(prior)
+        };
        let mut app = create_test_app();
        let long_output = format!("{}{}", "A".repeat(7_000), "Z".repeat(7_000));
        app.api_messages.push(Message {
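Reviewer note: the test above installs a temporary spillover root and restores the previous value through a local `Restore` type with a `Drop` impl. The same drop-guard pattern is sketched below against a hypothetical process-wide override. `SPILLOVER_ROOT`, `set_root`, `current_root`, and `override_root` are illustrative names, not the crate's API.

```rust
use std::path::PathBuf;
use std::sync::Mutex;

// Hypothetical global override, standing in for the crate's test hook.
static SPILLOVER_ROOT: Mutex<Option<PathBuf>> = Mutex::new(None);

/// Installs a new root and returns the one that was set before.
fn set_root(root: Option<PathBuf>) -> Option<PathBuf> {
    let mut guard = SPILLOVER_ROOT.lock().unwrap_or_else(|e| e.into_inner());
    std::mem::replace(&mut *guard, root)
}

fn current_root() -> Option<PathBuf> {
    SPILLOVER_ROOT
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .clone()
}

/// Restores the prior root when dropped, including during a panic unwind.
struct RestoreRoot(Option<PathBuf>);

impl Drop for RestoreRoot {
    fn drop(&mut self) {
        set_root(self.0.take());
    }
}

/// Overrides the root for as long as the returned guard is alive.
fn override_root(root: PathBuf) -> RestoreRoot {
    RestoreRoot(set_root(Some(root)))
}
```

Binding the guard to a named variable such as `_restore` matters: binding it to `_` would drop it immediately and undo the override before the test body runs.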
@@ -1225,10 +1241,25 @@ mod tests {
        let result = cache(&mut app, Some("inspect"));
        let msg = result.message.expect("inspect output");

-        assert!(msg.contains("original_chars=14000"), "got: {msg}");
-        assert!(msg.contains("truncated=true"), "got: {msg}");
-        assert!(msg.contains("deduplicated=false"), "got: {msg}");
-        assert!(msg.contains("deduplicated=true"), "got: {msg}");
+        let tool_budget_lines: Vec<_> = msg
+            .lines()
+            .filter(|line| line.contains("original_chars=14000"))
+            .collect();
+        assert_eq!(tool_budget_lines.len(), 2, "got: {msg}");
+
+        let first_sighting = tool_budget_lines
+            .iter()
+            .find(|line| line.contains("deduplicated=false"))
+            .expect("first tool-result sighting should report non-dedup metadata");
+        assert!(first_sighting.contains("sent_chars="), "got: {msg}");
+        assert!(first_sighting.contains("truncated=true"), "got: {msg}");
+
+        let repeat_sighting = tool_budget_lines
+            .iter()
+            .find(|line| line.contains("deduplicated=true"))
+            .expect("repeat tool-result sighting should report dedup metadata");
+        assert!(repeat_sighting.contains("sent_chars="), "got: {msg}");
+        assert!(repeat_sighting.contains("truncated=false"), "got: {msg}");
    }

    #[test]
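
Review note: the `Restore` guard added in the hunk above is a scoped-override pattern. The test swaps in a temporary value, and a `Drop` impl puts the previous one back even if an assertion panics. A minimal standalone sketch follows. `SPILLOVER_ROOT`, `set_spillover_root`, `current_spillover_root`, `RestoreRoot`, and `with_temporary_root` are hypothetical stand-ins, not the crate's real `set_test_spillover_root` API, whose internals this diff does not show.

```rust
use std::path::PathBuf;
use std::sync::Mutex;

// Hypothetical global override slot; the real crate keeps its own state.
static SPILLOVER_ROOT: Mutex<Option<PathBuf>> = Mutex::new(None);

/// Install a new override and return whatever was there before.
fn set_spillover_root(root: Option<PathBuf>) -> Option<PathBuf> {
    let mut slot = SPILLOVER_ROOT
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    std::mem::replace(&mut *slot, root)
}

/// Read the current override (assumed helper, for illustration only).
fn current_spillover_root() -> Option<PathBuf> {
    SPILLOVER_ROOT
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
        .clone()
}

/// Holds the previous override and restores it on drop.
struct RestoreRoot(Option<PathBuf>);

impl Drop for RestoreRoot {
    fn drop(&mut self) {
        // `take()` moves the saved value out so it is restored exactly once.
        set_spillover_root(self.0.take());
    }
}

/// Swap in `root` for as long as the returned guard is alive.
#[must_use]
fn with_temporary_root(root: PathBuf) -> RestoreRoot {
    RestoreRoot(set_spillover_root(Some(root)))
}
```

Binding the guard to a named variable such as `_restore` (rather than `_`) matters here: a bare `_` pattern would drop the guard immediately and undo the override before the test body runs.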
@@ -0,0 +1,249 @@
+//! `/hf` - Hugging Face MCP and provider concept helpers.
+
+use crate::mcp::{McpConfig, McpServerConfig};
+use crate::tui::app::App;
+
+use super::CommandResult;
+
+const HF_MCP_SETTINGS_URL: &str = "https://huggingface.co/settings/mcp";
+const HF_MCP_DOCS_URL: &str = "https://huggingface.co/docs/hub/hf-mcp-server";
+const HF_MCP_SERVER_URL: &str = "https://huggingface.co/mcp";
+
+const HF_MCP_CONFIG_SKELETON: &str = r#"{
+  "servers": {
+    "huggingface": {
+      "url": "https://huggingface.co/mcp",
+      "headers": {
+        "Authorization": "Bearer ${HF_TOKEN}"
+      }
+    }
+  }
+}"#;
+
+/// Explainer shown by `/hf concepts` (alias: `/hf explain`).
+const HF_CONCEPTS: &str = "\
+CodeWhale has three distinct Hugging Face surfaces:
+
+1. Hugging Face provider route - chat inference
+   Switch the active LLM backend to Hugging Face Inference Providers.
+   Use: /provider huggingface
+   Config: provider = \"huggingface\" or [providers.huggingface]
+   Auth: HF_TOKEN or HUGGINGFACE_API_KEY
+
+2. Hugging Face MCP - Hub, docs, datasets, Spaces, and community tools
+   Connect CodeWhale to Hugging Face's MCP server through mcp.json.
+   Use: /hf mcp status or /hf mcp setup
+   Then: /mcp validate or restart CodeWhale so model-visible tools reload.
+
+3. Hugging Face Hub workflows - publish, upload, or manage repositories
+   Use explicit Hub tooling such as huggingface_hub or git-based flows.
+   CodeWhale does not upload to the Hub through /hf.";
+
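+/// Entry point for `/hf`: dispatches the `mcp` and `concepts` (alias
+/// `explain`) subcommands, and returns the usage text when no argument is given.
+pub fn hf(app: &mut App, args: Option<&str>) -> CommandResult {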
+    let raw = args.unwrap_or("").trim();
+    if raw.is_empty() {
+        return usage();
+    }
+
+    let mut parts = raw.split_whitespace();
+    let subcommand = parts.next().unwrap_or_default().to_ascii_lowercase();
+    match subcommand.as_str() {
+        "mcp" => hf_mcp(app, parts.next()),
+        "concepts" | "explain" => CommandResult::message(HF_CONCEPTS),
+        _ => CommandResult::error(format!(
+            "Unknown /hf subcommand: {subcommand}. Use /hf mcp <status|setup> or /hf concepts."
+        )),
+    }
+}
+
+fn usage() -> CommandResult {
+    CommandResult::message(
+        "Usage: /hf mcp <status|setup>\n\
+         /hf concepts\n\n\
+         Hugging Face MCP settings: https://huggingface.co/settings/mcp",
+    )
+}
+
+fn hf_mcp(app: &mut App, action: Option<&str>) -> CommandResult {
+    match action.unwrap_or("status").to_ascii_lowercase().as_str() {
+        "status" => hf_mcp_status(app),
+        "setup" => CommandResult::message(hf_mcp_setup_message(app)),
+        other => CommandResult::error(format!(
+            "Unknown /hf mcp subcommand: {other}. Use status or setup."
+        )),
+    }
+}
+
+fn hf_mcp_status(app: &App) -> CommandResult {
+    match crate::mcp::load_config(&app.mcp_config_path) {
+        Ok(config) => {
+            if let Some(server_name) = configured_hf_mcp_server(&config) {
+                CommandResult::message(format!(
+                    "Hugging Face MCP appears configured as `{server_name}` in {}.\n\
+                     Run /mcp validate or restart CodeWhale if tools are not visible yet.",
+                    app.mcp_config_path.display()
+                ))
+            } else {
+                CommandResult::message(format!(
+                    "Hugging Face MCP is not configured in {}.\n\
+                     Run /hf mcp setup for the settings-generated config workflow.",
+                    app.mcp_config_path.display()
+                ))
+            }
+        }
+        Err(err) => CommandResult::error(format!(
+            "Could not read MCP config {}: {err}",
+            app.mcp_config_path.display()
+        )),
+    }
+}
+
+fn hf_mcp_setup_message(app: &App) -> String {
+    format!(
+        "Use Hugging Face's settings-generated MCP configuration when available:\n\
+         1. Open {HF_MCP_SETTINGS_URL} while signed in.\n\
+         2. Choose your MCP client and copy the generated configuration snippet.\n\
+         3. Paste the Hugging Face server entry into {}.\n\
+         4. Restart CodeWhale, or run /mcp reload for the TUI manager snapshot.\n\n\
+         CodeWhale-compatible placeholder shape:\n\n\
+         ```json\n{HF_MCP_CONFIG_SKELETON}\n```\n\n\
+         The placeholder is intentionally not runnable until your private MCP config has a real token value. \
+         Do not commit real Hugging Face tokens.\n\n\
+         Docs: {HF_MCP_DOCS_URL}\n\
+         Server: {HF_MCP_SERVER_URL}",
+        app.mcp_config_path.display()
+    )
+}
+
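+/// Returns the name of a configured server that looks like a Hugging Face MCP
+/// entry, if any.
+fn configured_hf_mcp_server(config: &McpConfig) -> Option<&str> {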
+    config
+        .servers
+        .iter()
+        .find(|(name, server)| looks_like_hf_mcp_server(name, server))
+        .map(|(name, _)| name.as_str())
+}
+
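+/// Heuristic match for a Hugging Face MCP entry: either the server name,
+/// reduced to lowercase ASCII alphanumerics, is a known spelling, or the URL
+/// points at a huggingface.co MCP endpoint.
+fn looks_like_hf_mcp_server(name: &str, server: &McpServerConfig) -> bool {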
+    let compact_name: String = name
+        .chars()
+        .filter(|ch| ch.is_ascii_alphanumeric())
+        .map(|ch| ch.to_ascii_lowercase())
+        .collect();
+    if matches!(
+        compact_name.as_str(),
+        "huggingface" | "huggingfacemcp" | "hfmcp" | "hfmcpserver"
+    ) {
+        return true;
+    }
+
+    server.url.as_deref().is_some_and(|url| {
+        let url = url.to_ascii_lowercase();
+        url.contains("huggingface.co/mcp") || url.contains("huggingface.co/api/mcp")
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use std::fs;
+    use std::path::PathBuf;
+
+    use crate::config::Config;
+    use crate::tui::app::TuiOptions;
+    use tempfile::tempdir;
+
+    use super::*;
+
+    fn app_with_mcp_path(mcp_config_path: PathBuf) -> App {
+        App::new(
+            TuiOptions {
+                model: "deepseek-v4-pro".to_string(),
+                workspace: PathBuf::from("."),
+                config_path: None,
+                config_profile: None,
+                allow_shell: false,
+                use_alt_screen: false,
+                use_mouse_capture: false,
+                use_bracketed_paste: true,
+                max_subagents: 2,
+                skills_dir: PathBuf::from("."),
+                memory_path: PathBuf::from("memory.md"),
+                notes_path: PathBuf::from("notes.txt"),
+                mcp_config_path,
+                use_memory: false,
+                start_in_agent_mode: false,
+                skip_onboarding: true,
+                yolo: false,
+                resume_session_id: None,
+                initial_input: None,
+            },
+            &Config::default(),
+        )
+    }
+
+    #[test]
+    fn hf_mcp_config_skeleton_keeps_token_placeholder_only() {
+        assert!(HF_MCP_CONFIG_SKELETON.contains("${HF_TOKEN}"));
+        assert!(!HF_MCP_CONFIG_SKELETON.contains("hf_"));
+        assert!(!HF_MCP_CONFIG_SKELETON.contains("Bearer hf_"));
+        serde_json::from_str::<serde_json::Value>(HF_MCP_CONFIG_SKELETON)
+            .expect("skeleton should be valid JSON");
+    }
+
+    #[test]
+    fn hf_concepts_explains_provider_mcp_and_hub_surfaces() {
+        assert!(HF_CONCEPTS.contains("provider route"));
+        assert!(HF_CONCEPTS.contains("Hugging Face MCP"));
+        assert!(HF_CONCEPTS.contains("Hub workflows"));
+        assert!(HF_CONCEPTS.contains("/provider huggingface"));
+        assert!(HF_CONCEPTS.contains("/hf mcp"));
+    }
+
+    #[test]
+    fn hf_mcp_status_detects_settings_named_server() {
+        let dir = tempdir().expect("tempdir");
+        let path = dir.path().join("mcp.json");
+        fs::write(
+            &path,
+            r#"{"mcpServers":{"hf-mcp-server":{"url":"https://huggingface.co/mcp"}}}"#,
+        )
+        .expect("write mcp config");
+        let app = app_with_mcp_path(path);
+
+        let result = hf_mcp_status(&app);
+
+        assert!(!result.is_error);
+        let message = result.message.expect("status message");
+        assert!(message.contains("appears configured"));
+        assert!(message.contains("hf-mcp-server"));
+    }
+
+    #[test]
+    fn hf_mcp_status_reports_missing_server_without_network() {
+        let dir = tempdir().expect("tempdir");
+        let path = dir.path().join("mcp.json");
+        fs::write(&path, r#"{"servers":{"local":{"command":"node"}}}"#).expect("write mcp config");
+        let app = app_with_mcp_path(path);
+
+        let result = hf_mcp_status(&app);
+
+        assert!(!result.is_error);
+        assert!(
+            result
+                .message
+                .as_deref()
+                .unwrap_or_default()
+                .contains("not configured")
+        );
+    }
+
+    #[test]
+    fn hf_usage_and_setup_do_not_advertise_hub_search() {
+        let app = app_with_mcp_path(PathBuf::from("mcp.json"));
+        let usage = usage().message.expect("usage");
+        let setup = hf_mcp_setup_message(&app);
+
+        assert!(!usage.contains("/hf search"));
+        assert!(!setup.contains("/hf search"));
+        assert!(setup.contains(HF_MCP_SETTINGS_URL));
+    }
+}
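
Review note: the detection in `looks_like_hf_mcp_server` is a local heuristic (name normalization plus a URL substring check) and makes no network call. The sketch below reproduces that logic in isolation so the accepted spellings are easy to see. `ServerEntry` and `looks_like_hf_server` are hypothetical, trimmed-down stand-ins. The real `McpServerConfig` is assumed to have more fields than the single `url` modeled here.

```rust
/// Hypothetical stand-in for `McpServerConfig`; only the field the heuristic
/// reads is modeled.
struct ServerEntry {
    url: Option<String>,
}

/// Returns true when the entry's name or URL suggests a Hugging Face MCP server.
fn looks_like_hf_server(name: &str, server: &ServerEntry) -> bool {
    // Drop separators such as '-' and '_' and fold case, so that
    // "hf-mcp-server", "HF_MCP_Server", and "hfmcpserver" all compare equal.
    let compact: String = name
        .chars()
        .filter(|ch| ch.is_ascii_alphanumeric())
        .map(|ch| ch.to_ascii_lowercase())
        .collect();
    if matches!(
        compact.as_str(),
        "huggingface" | "huggingfacemcp" | "hfmcp" | "hfmcpserver"
    ) {
        return true;
    }

    // Fall back to the URL: any entry pointing at the hosted MCP endpoint counts.
    server.url.as_deref().is_some_and(|url| {
        let url = url.to_ascii_lowercase();
        url.contains("huggingface.co/mcp") || url.contains("huggingface.co/api/mcp")
    })
}
```

Because the name check runs first, an entry named `huggingface` is reported as configured even when its URL points elsewhere, which is presumably why the status message says "appears configured" rather than asserting it.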
@@ -43,6 +43,10 @@ fn events() -> CommandResult {
    let ordered = [
        (HookEvent::SessionStart, "fires once when the TUI launches"),
        (HookEvent::SessionEnd, "fires once on graceful shutdown"),
+        (
+            HookEvent::TurnEnd,
+            "fires after a turn completes (observer-only)",
+        ),
        (
            HookEvent::MessageSubmit,
            "fires before model dispatch; can transform or block submitted text",
@@ -146,6 +150,7 @@ fn event_label(event: HookEvent) -> &'static str {
        HookEvent::ToolCallAfter => "tool_call_after",
        HookEvent::ModeChange => "mode_change",
        HookEvent::OnError => "on_error",
+        HookEvent::TurnEnd => "turn_end",
        HookEvent::SubagentSpawn => "subagent_spawn",
        HookEvent::SubagentComplete => "subagent_complete",
        HookEvent::ShellEnv => "shell_env",
@@ -266,6 +271,7 @@ mod tests {
        let positions: Vec<(usize, &str)> = [
            "session_start",
            "session_end",
+            "turn_end",
            "message_submit",
            "tool_call_before",
            "tool_call_after",
@@ -310,6 +316,7 @@ mod tests {
        assert_eq!(event_label(HookEvent::MessageSubmit), "message_submit");
        assert_eq!(event_label(HookEvent::ModeChange), "mode_change");
        assert_eq!(event_label(HookEvent::OnError), "on_error");
+        assert_eq!(event_label(HookEvent::TurnEnd), "turn_end");
        assert_eq!(event_label(HookEvent::SubagentSpawn), "subagent_spawn");
        assert_eq!(
            event_label(HookEvent::SubagentComplete),
@@ -12,6 +12,7 @@ mod core;
 mod debug;
 mod feedback;
 mod goal;
+mod hf;
 mod hooks;
 mod init;
 mod jobs;
@@ -77,7 +78,6 @@ impl CommandResult {
    }

    /// Create a result with both message and action
-    #[allow(dead_code)]
    pub fn with_message_and_action(msg: impl Into<String>, action: AppAction) -> Self {
        Self {
            message: Some(msg.into()),
@@ -224,6 +224,12 @@ pub const COMMANDS: &[CommandInfo] = &[
        usage: "/feedback [bug|feature|security]",
        description_id: MessageId::CmdFeedbackDescription,
    },
+    CommandInfo {
+        name: "hf",
+        aliases: &["huggingface"],
+        usage: "/hf [mcp <status|setup>|concepts]",
+        description_id: MessageId::CmdHfDescription,
+    },
    CommandInfo {
        name: "home",
        aliases: &["stats", "overview", "zhuye", "shouye"],
@@ -352,6 +358,12 @@ pub const COMMANDS: &[CommandInfo] = &[
        usage: "/config",
        description_id: MessageId::CmdConfigDescription,
    },
+    CommandInfo {
+        name: "sidebar",
+        aliases: &[],
+        usage: "/sidebar [on|off|auto|work|tasks|agents|context] [--save]",
+        description_id: MessageId::CmdSidebarDescription,
+    },
    CommandInfo {
        name: "mode",
        aliases: &["jihua", "zidong"],
@@ -495,7 +507,7 @@ pub const COMMANDS: &[CommandInfo] = &[
    CommandInfo {
        name: "restore",
        aliases: &[],
-        usage: "/restore [N]",
+        usage: "/restore [N|list [N]]",
        description_id: MessageId::CmdRestoreDescription,
    },
    // RLM command
@@ -571,6 +583,7 @@ pub fn execute(cmd: &str, app: &mut App) -> CommandResult {
        "agent" | "daili" => agent(app, arg),
        "links" | "dashboard" | "api" | "lianjie" => core::deepseek_links(app),
        "feedback" => feedback::feedback(app, arg),
+        "hf" | "huggingface" => hf::hf(app, arg),
        "home" | "stats" | "overview" | "zhuye" | "shouye" => core::home_dashboard(app),
        "workspace" | "cwd" => core::workspace_switch(app, arg),
        "note" => note::note(app, arg),
@@ -595,6 +608,7 @@ pub fn execute(cmd: &str, app: &mut App) -> CommandResult {

        // Config commands
        "config" => config::config_command(app, arg),
+        "sidebar" => config::sidebar(app, arg),
        "settings" => config::show_settings(app),
        "status" => status::status(app),
        "statusline" => config::status_line(app),
@@ -668,8 +682,8 @@ pub fn execute(cmd: &str, app: &mut App) -> CommandResult {
        _ => {
            // Third source: skills (lowest precedence after native and user-config).
            // Try to run a skill whose name matches the command.
-            if skills::run_skill_by_name(app, command, arg).is_some() {
-                return skills::run_skill_by_name(app, command, arg).unwrap();
+            if let Some(result) = skills::run_skill_by_name(app, command, arg) {
+                return result;
            }
            let suggestions = suggest_command_names(command, 3);
            if suggestions.is_empty() {
@@ -695,37 +709,9 @@ pub fn set_config_value(app: &mut App, key: &str, value: &str, persist: bool) ->
    config::set_config_value(app, key, value, persist)
 }

-/// Persist the user's chosen footer items to `~/.deepseek/config.toml` under
-/// `tui.status_items`. See [`config::persist_status_items`] for details.
-pub fn persist_status_items(
-    items: &[crate::config::StatusItem],
-) -> anyhow::Result<std::path::PathBuf> {
-    config::persist_status_items(items)
-}
-
-/// Persist a root-level string key in `config.toml`.
-pub fn persist_root_string_key(
-    config_path: Option<&std::path::Path>,
-    key: &str,
-    value: &str,
-) -> anyhow::Result<std::path::PathBuf> {
-    config::persist_root_string_key(config_path, key, value)
-}
-
 pub fn switch_mode(app: &mut App, mode: crate::tui::app::AppMode) -> String {
    config::switch_mode(app, mode)
 }
-
-/// Auto-select a model based on request complexity.
-pub fn auto_model_heuristic(input: &str, current_model: &str) -> String {
-    config::auto_model_heuristic(input, current_model)
-}
-
-pub use config::{
-    AutoRouteRecommendation, AutoRouteSelection, normalize_auto_route_effort,
-    parse_auto_route_recommendation, resolve_auto_route_with_flash,
-};
-
 /// Execute a Recursive Language Model (RLM) turn — Algorithm 1 from
 /// Zhang et al. (arXiv:2512.24601).
 ///
@@ -854,11 +840,35 @@ fn build_relay_instruction(app: &App, focus: Option<&str>) -> String {

    if let Ok(plan) = app.plan_state.try_lock() {
        let snapshot = plan.snapshot();
-        if snapshot.explanation.is_some() || !snapshot.items.is_empty() {
+        if !snapshot.is_empty() {
            let _ = writeln!(out, "\nOptional strategy metadata from update_plan:");
-            if let Some(explanation) = snapshot.explanation.as_deref() {
-                let _ = writeln!(out, "- Explanation: {explanation}");
-            }
+            write_plan_field(&mut out, "Title", snapshot.title.as_deref());
+            write_plan_field(&mut out, "Objective", snapshot.objective.as_deref());
+            write_plan_field(&mut out, "Context", snapshot.context_summary.as_deref());
+            write_plan_field(&mut out, "Explanation", snapshot.explanation.as_deref());
+            write_plan_list(&mut out, "Source", &snapshot.sources_used);
+            write_plan_list(&mut out, "Critical file", &snapshot.critical_files);
+            write_plan_list(&mut out, "Constraint", &snapshot.constraints);
+            write_plan_field(
+                &mut out,
+                "Recommended approach",
+                snapshot.recommended_approach.as_deref(),
+            );
+            write_plan_field(
+                &mut out,
+                "Verification plan",
+                snapshot.verification_plan.as_deref(),
+            );
+            write_plan_field(
+                &mut out,
+                "Risks and unknowns",
+                snapshot.risks_and_unknowns.as_deref(),
+            );
+            write_plan_field(
+                &mut out,
+                "Handoff packet",
+                snapshot.handoff_packet.as_deref(),
+            );
            for item in snapshot.items {
                let _ = writeln!(out, "- [{}] {}", plan_status_label(&item.status), item.step);
            }
@@ -904,6 +914,21 @@ fn build_relay_instruction(app: &App, focus: Option<&str>) -> String {
    out
 }

+fn write_plan_field(out: &mut String, label: &str, value: Option<&str>) {
+    if let Some(value) = value.map(str::trim).filter(|value| !value.is_empty()) {
+        let _ = writeln!(out, "- {label}: {value}");
+    }
+}
+
+fn write_plan_list(out: &mut String, label: &str, values: &[String]) {
+    for value in values {
+        let value = value.trim();
+        if !value.is_empty() {
+            let _ = writeln!(out, "- {label}: {value}");
+        }
+    }
+}
+
 fn plan_status_label(status: &crate::tools::plan::StepStatus) -> &'static str {
    match status {
        crate::tools::plan::StepStatus::Pending => "pending",
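
Review note: `write_plan_field` and `write_plan_list` skip blank values, so the relay prompt only lists plan metadata that is actually set. A minimal sketch of the same formatting rule, runnable on its own, is below. `write_field`, `write_list`, and `render_plan_metadata` are hypothetical names, and the two-argument shape is an assumption for illustration; the real snapshot carries many more fields.

```rust
use std::fmt::Write as _;

/// Emit "- label: value" only when the trimmed value is non-empty.
fn write_field(out: &mut String, label: &str, value: Option<&str>) {
    if let Some(value) = value.map(str::trim).filter(|value| !value.is_empty()) {
        // Writing to a String cannot fail, so the Result is ignored.
        let _ = writeln!(out, "- {label}: {value}");
    }
}

/// Emit one "- label: value" line per non-blank entry.
fn write_list(out: &mut String, label: &str, values: &[String]) {
    for value in values {
        let value = value.trim();
        if !value.is_empty() {
            let _ = writeln!(out, "- {label}: {value}");
        }
    }
}

/// Hypothetical helper: render just an objective and a constraint list.
fn render_plan_metadata(objective: Option<&str>, constraints: &[String]) -> String {
    let mut out = String::new();
    write_field(&mut out, "Objective", objective);
    write_list(&mut out, "Constraint", constraints);
    out
}
```

Each list entry repeats the singular label ("Constraint", "Source", "Critical file"), which matches the assertions in the relay test such as `Constraint: Do not invent verification`.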
@@ -952,45 +977,6 @@ pub fn get_command_info(name: &str) -> Option<&'static CommandInfo> {
        .find(|cmd| cmd.name == name || cmd.aliases.contains(&name))
 }

-/// Get all command names matching a prefix, including both built-in
-/// static commands and user-defined commands, formatted as `/name`.
-///
-/// `workspace` is used to also scan workspace-local command directories;
-/// pass `None` when no workspace context is available.
-#[allow(dead_code)]
-pub fn all_command_names_matching(
-    prefix: &str,
-    workspace: Option<&std::path::Path>,
-) -> Vec<String> {
-    let prefix = prefix.strip_prefix('/').unwrap_or(prefix).to_lowercase();
-    let mut result: Vec<String> = COMMANDS
-        .iter()
-        .filter(|cmd| {
-            cmd.name.starts_with(&prefix) || cmd.aliases.iter().any(|a| a.starts_with(&prefix))
-        })
-        .map(|cmd| format!("/{}", cmd.name))
-        .collect();
-
-    // Add user-defined commands
-    result.extend(user_commands::user_commands_matching(&prefix, workspace));
-
-    result.sort();
-    result.dedup();
-    result
-}
-
-/// Get all commands matching a prefix (for autocomplete)
-#[allow(dead_code)]
-pub fn commands_matching(prefix: &str) -> Vec<&'static CommandInfo> {
-    let prefix = prefix.strip_prefix('/').unwrap_or(prefix).to_lowercase();
-    COMMANDS
-        .iter()
-        .filter(|cmd| {
-            cmd.name.starts_with(&prefix) || cmd.aliases.iter().any(|a| a.starts_with(&prefix))
-        })
-        .collect()
-}
-
 fn edit_distance(a: &str, b: &str) -> usize {
    if a == b {
        return 0;
@@ -1078,7 +1064,7 @@ mod tests {
    use crate::config::{ApiProvider, Config};
    use crate::tools::plan::{PlanItemArg, StepStatus, UpdatePlanArgs};
    use crate::tools::todo::TodoStatus;
-    use crate::tui::app::{App, AppAction, TuiOptions};
+    use crate::tui::app::{App, AppAction, SidebarFocus, TuiOptions};
    use std::ffi::OsString;
    use std::path::{Path, PathBuf};
    use std::sync::MutexGuard;
@@ -1112,7 +1098,24 @@ mod tests {
    #[test]
    fn command_registry_contains_config_and_links_but_not_set_or_deepseek() {
        assert!(COMMANDS.iter().any(|cmd| cmd.name == "config"));
+        let sidebar = COMMANDS
+            .iter()
+            .find(|cmd| cmd.name == "sidebar")
+            .expect("sidebar command should exist");
+        assert_eq!(sidebar.description_id, MessageId::CmdSidebarDescription);
+        assert!(
+            sidebar
+                .description_for(Locale::En)
+                .contains("right sidebar")
+        );
        assert!(COMMANDS.iter().any(|cmd| cmd.name == "links"));
+        let hf = COMMANDS
+            .iter()
+            .find(|cmd| cmd.name == "hf")
+            .expect("hf command should exist");
+        assert_eq!(hf.aliases, &["huggingface"]);
+        assert_eq!(hf.description_id, MessageId::CmdHfDescription);
+        assert!(hf.description_for(Locale::En).contains("Hugging Face"));
        assert!(COMMANDS.iter().any(|cmd| cmd.name == "memory"));
        assert!(!COMMANDS.iter().any(|cmd| cmd.name == "set"));
        assert!(!COMMANDS.iter().any(|cmd| cmd.name == "deepseek"));
@@ -1127,6 +1130,17 @@ mod tests {
        assert_eq!(links.aliases, &["dashboard", "api", "lianjie"]);
    }

+    #[test]
+    fn hf_alias_dispatches_to_concepts_helper() {
+        let mut app = create_test_app();
+        let result = execute("/huggingface concepts", &mut app);
+        assert!(!result.is_error);
+        let message = result.message.expect("concepts message");
+        assert!(message.contains("Hugging Face provider route"));
+        assert!(message.contains("Hugging Face MCP"));
+        assert!(message.contains("Hub workflows"));
+    }
+
    #[test]
    fn rlm_slash_command_routes_to_persistent_tool_instruction() {
        let mut app = create_test_app();
@@ -1166,11 +1180,18 @@ mod tests {
        {
            let mut plan = app.plan_state.try_lock().expect("plan lock");
            plan.update(UpdatePlanArgs {
+                objective: Some("Keep relays grounded".to_string()),
                explanation: Some("RLM-style strategy".to_string()),
+                sources_used: vec!["transcript context".to_string()],
+                critical_files: vec!["crates/tui/src/commands/mod.rs".to_string()],
+                constraints: vec!["Do not invent verification".to_string()],
+                verification_plan: Some("Check relay prompt assertions".to_string()),
+                handoff_packet: Some("Next thread should read the Work checklist".to_string()),
                plan: vec![PlanItemArg {
                    step: "keep checklist primary".to_string(),
                    status: StepStatus::InProgress,
                }],
+                ..UpdatePlanArgs::default()
            });
        }

@@ -1197,7 +1218,13 @@ mod tests {
        assert!(message.contains("#1 [completed] inspect workspace"));
        assert!(message.contains("#2 [in_progress] patch relay command"));
        assert!(message.contains("Optional strategy metadata from update_plan"));
+        assert!(message.contains("Objective: Keep relays grounded"));
        assert!(message.contains("Explanation: RLM-style strategy"));
+        assert!(message.contains("Source: transcript context"));
+        assert!(message.contains("Critical file: crates/tui/src/commands/mod.rs"));
+        assert!(message.contains("Constraint: Do not invent verification"));
+        assert!(message.contains("Verification plan: Check relay prompt assertions"));
+        assert!(message.contains("Handoff packet: Next thread should read the Work checklist"));
        assert!(message.contains("[in_progress] keep checklist primary"));
    }

@@ -1243,6 +1270,127 @@ mod tests {
        }
    }

+    #[test]
+    fn command_registry_metadata_is_complete_and_palette_safe() {
+        for command in COMMANDS {
+            assert!(!command.name.is_empty(), "command name must not be empty");
+            assert_eq!(
+                command.name.trim(),
+                command.name,
+                "/{} command name must not need trimming",
+                command.name
+            );
+            assert!(
+                command
+                    .name
+                    .chars()
+                    .all(|ch| ch.is_ascii_lowercase() || ch.is_ascii_digit()),
+                "/{} command names must stay lowercase ASCII",
+                command.name
+            );
+
+            let expected_usage_prefix = format!("/{}", command.name);
+            assert!(
+                command.usage.starts_with(&expected_usage_prefix),
+                "/{} usage must start with its canonical slash command, got {:?}",
+                command.name,
+                command.usage
+            );
+
+            let description = command.description_for(Locale::En);
+            assert!(
+                !description.trim().is_empty(),
+                "/{} must have non-empty English help text",
+                command.name
+            );
+
+            let palette_command = command.palette_command();
+            assert!(
+                palette_command.starts_with(&expected_usage_prefix),
+                "/{} palette command must use the canonical command, got {:?}",
+                command.name,
+                palette_command
+            );
+            assert_eq!(
+                palette_command.ends_with(' '),
+                command.requires_argument(),
+                "/{} palette command spacing must match argument requirement",
+                command.name
+            );
+
+            for &alias in command.aliases {
+                assert!(
+                    !alias.trim().is_empty(),
+                    "/{} alias must not be empty",
+                    command.name
+                );
+                assert_eq!(
+                    alias.trim(),
+                    alias,
+                    "/{} alias /{alias} must not need trimming",
+                    command.name
+                );
+                assert!(
+                    !alias.starts_with('/'),
+                    "/{} alias /{alias} must be stored without a slash",
+                    command.name
+                );
+                assert!(
+                    !alias.chars().any(char::is_whitespace),
+                    "/{} alias /{alias} must not contain whitespace",
+                    command.name
+                );
+            }
+        }
+    }
+
+    #[test]
+    fn command_info_resolves_canonical_names_and_aliases() {
+        for command in COMMANDS {
+            for lookup in [command.name.to_string(), format!("/{}", command.name)] {
+                let resolved = get_command_info(&lookup)
+                    .unwrap_or_else(|| panic!("{lookup:?} should resolve to /{}", command.name));
+                assert_eq!(resolved.name, command.name);
+            }
+
+            for &alias in command.aliases {
+                for lookup in [alias.to_string(), format!("/{alias}")] {
+                    let resolved = get_command_info(&lookup).unwrap_or_else(|| {
+                        panic!("{lookup:?} should resolve to /{}", command.name)
+                    });
+                    assert_eq!(resolved.name, command.name);
+                }
+            }
+        }
+    }
+
+    #[test]
+    fn every_registered_command_has_a_help_topic() {
+        let mut app = create_test_app();
+        for command in COMMANDS {
+            let result = execute(&format!("/help {}", command.name), &mut app);
+            assert!(
+                !result.is_error,
+                "/help {} returned an error: {result:?}",
+                command.name
+            );
+            let message = result
+                .message
+                .unwrap_or_else(|| panic!("/help {} should return text", command.name));
+            assert!(
+                message.contains(command.name),
+                "/help {} should mention the command name, got {message:?}",
+                command.name
+            );
+            assert!(
+                message.contains(command.usage),
+                "/help {} should include usage {:?}, got {message:?}",
+                command.name,
+                command.usage
+            );
+        }
+    }
+
    #[test]
    fn context_command_opens_inspector_and_keeps_ctx_alias() {
        let context = COMMANDS
@@ -1303,6 +1451,68 @@ mod tests {
        assert!(result.message.unwrap().contains("off"));
    }

+    #[test]
+    fn execute_sidebar_toggles_visibility() {
+        let mut app = create_test_app();
+        app.set_sidebar_focus(SidebarFocus::Auto);
+
+        let result = execute("/sidebar", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Hidden);
+        assert!(app.status_message.is_none());
+        assert_eq!(result.message.as_deref(), Some("Sidebar is hidden"));
+
+        let result = execute("/sidebar", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Auto);
+        assert!(app.status_message.is_none());
+        assert_eq!(result.message.as_deref(), Some("Sidebar is visible"));
+    }
+
+    #[test]
+    fn execute_sidebar_accepts_explicit_focus_targets() {
+        let mut app = create_test_app();
+
+        let result = execute("/sidebar tasks", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Tasks);
+        assert!(app.status_message.is_none());
+
+        let result = execute("/sidebar off", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Hidden);
+        assert!(app.status_message.is_none());
+
+        let result = execute("/sidebar closed", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Hidden);
+        assert!(app.status_message.is_none());
+
+        let result = execute("/sidebar none", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Hidden);
+        assert!(app.status_message.is_none());
+
+        let result = execute("/sidebar on", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.sidebar_focus, SidebarFocus::Auto);
+        assert!(app.status_message.is_none());
+    }
+
+    #[test]
+    fn execute_sidebar_rejects_invalid_args() {
+        let mut app = create_test_app();
+        let result = execute("/sidebar maybe", &mut app);
+        assert!(result.is_error);
+        assert!(
+            result
+                .message
+                .as_deref()
+                .unwrap_or_default()
+                .contains("Usage: /sidebar")
+        );
+    }
+
    #[test]
    fn execute_links_and_aliases_return_links_message() {
        let mut app = create_test_app();
@@ -1445,6 +1655,86 @@ mod tests {
        name == "restore"
    }

+    #[test]
+    fn slash_parser_preserves_arguments_after_the_command_name() {
+        let mut app = create_test_app();
+        let result = execute("/agent 2 review   this   carefully", &mut app);
+        assert!(!result.is_error);
+        let Some(AppAction::SendMessage(message)) = result.action else {
+            panic!("expected /agent to send a model instruction");
+        };
+        assert!(message.contains(r#"prompt: "review   this   carefully""#));
+        assert!(message.contains("max_depth: 2"));
+
+        let mut app = create_test_app();
+        let result = execute("   /relay   ship   command   harness   ", &mut app);
+        assert!(!result.is_error);
+        let Some(AppAction::SendMessage(message)) = result.action else {
+            panic!("expected /relay to send a model instruction");
+        };
+        assert!(message.contains("Requested relay focus: ship   command   harness"));
+
+        let mut app = create_test_app();
+        let result = execute("/rlm 3 inspect   this   corpus", &mut app);
+        assert!(!result.is_error);
+        let Some(AppAction::SendMessage(message)) = result.action else {
+            panic!("expected /rlm to send a model instruction");
+        };
+        assert!(message.contains(r#"content: "inspect   this   corpus""#));
+        assert!(message.contains("sub_rlm_max_depth: 3"));
+    }
+
+    #[test]
+    fn representative_command_groups_keep_dispatch_surfaces() {
+        let mut app = create_test_app();
+        let help = execute("/help clear", &mut app)
+            .message
+            .expect("/help clear should return text");
+        assert!(help.contains("clear"));
+        assert!(help.contains("/clear"));
+
+        let mut app = create_test_app();
+        let result = execute("/config", &mut app);
+        assert!(matches!(result.action, Some(AppAction::OpenConfigView)));
+
+        let mut app = create_test_app();
+        let result = execute("/relay command boundary", &mut app);
+        assert!(!result.is_error);
+        assert!(matches!(
+            result.action,
+            Some(AppAction::SendMessage(message))
+                if message.contains("Requested relay focus: command boundary")
+        ));
+
+        let mut app = create_test_app();
+        let note_help = execute("/note help", &mut app)
+            .message
+            .expect("/note help should return text");
+        assert!(note_help.contains("Usage: /note"));
+
+        let mut app = create_test_app();
+        let result = execute("/hunt ship layer 2 | budget: 100", &mut app);
+        assert!(!result.is_error);
+        assert_eq!(app.hunt.quarry.as_deref(), Some("ship layer 2"));
+        assert_eq!(app.hunt.token_budget, Some(100));
+
+        let (mut app, _tmpdir, _guard) = create_isolated_test_app();
+        let skills = execute("/skills", &mut app)
+            .message
+            .expect("/skills should return text");
+        assert!(skills.contains("Skills location:"));
+
+        let mut app = create_test_app();
+        let result = execute("/task list", &mut app);
+        assert!(matches!(result.action, Some(AppAction::TaskList)));
+
+        let mut app = create_test_app();
+        let tokens = execute("/tokens", &mut app)
+            .message
+            .expect("/tokens should return text");
+        assert!(tokens.contains("deepseek-v4-pro"));
+    }
+
    /// Smoke test: every entry in `COMMANDS` must dispatch to a real handler.
    /// A dispatch miss surfaces as the fall-through `Unknown command:` error
    /// message in `execute`. This catches the case where a new command is
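Review note: `slash_parser_preserves_arguments_after_the_command_name` pins down that only the outer whitespace is trimmed and that runs of spaces inside the argument text survive. The parser itself is not part of this hunk, so the following is only a minimal sketch of that contract. `split_slash_command` is a hypothetical helper, not the project's function.

```rust
/// Hypothetical helper (not the project's parser): split a slash-command
/// line into `(name, raw_args)`, keeping inner whitespace in `raw_args`.
fn split_slash_command(input: &str) -> Option<(&str, &str)> {
    // Outer whitespace is dropped; anything not starting with '/' is not a command.
    let rest = input.trim().strip_prefix('/')?;
    // The command name ends at the first whitespace character.
    match rest.split_once(char::is_whitespace) {
        // "/ foo" has an empty command name.
        Some((name, _)) if name.is_empty() => None,
        // Only the gap between name and arguments is trimmed.
        Some((name, args)) => Some((name, args.trim_start())),
        // A bare "/" is not a command.
        None if rest.is_empty() => None,
        // A command without arguments, e.g. "/config".
        None => Some((rest, "")),
    }
}
```

Splitting once, rather than calling `split_whitespace` and re-joining, is what keeps `review   this   carefully` intact in this sketch.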
@@ -70,7 +70,7 @@ enum NetworkEdit {
 }

 fn list_policy() -> anyhow::Result<String> {
-    let path = super::config::config_toml_path(None)?;
+    let path = crate::config_persistence::config_toml_path(None)?;
    let doc = load_config_doc(&path)?;
    let network = doc.get("network").and_then(Value::as_table);
    let default = network
@@ -97,7 +97,7 @@ fn list_policy() -> anyhow::Result<String> {
 }

 fn update_host(edit: NetworkEdit, host: &str) -> anyhow::Result<String> {
-    let path = super::config::config_toml_path(None)?;
+    let path = crate::config_persistence::config_toml_path(None)?;
    let mut doc = load_config_doc(&path)?;
    let network = network_table_mut(&mut doc)?;

@@ -136,7 +136,7 @@ fn update_default(value: &str) -> anyhow::Result<String> {
        _ => bail!("Usage: /network default <allow|deny|prompt>"),
    };

-    let path = super::config::config_toml_path(None)?;
+    let path = crate::config_persistence::config_toml_path(None)?;
    let mut doc = load_config_doc(&path)?;
    let network = network_table_mut(&mut doc)?;
    network.insert("default".to_string(), Value::String(normalized.to_string()));
@@ -1,19 +1,23 @@
 //! `/restore` slash command — roll back the workspace to a prior snapshot.
 //!
-//! `/restore` (no arg) lists the most recent snapshots so the user can
-//! see what's available. `/restore <N>` restores the *N*th-most-recent
-//! snapshot, where `N=1` is the newest. In non-YOLO mode we refuse to
-//! mutate files unless the user has explicitly trusted the workspace
-//! (`/trust on` or YOLO) — the user can always view the list, just not
-//! one-shot revert without a safety net.
+//! `/restore` (no arg) lists the 20 most recent snapshots so the user can
+//! see what's available. `/restore list [N]` lists the `N` most recent
+//! snapshots, capped at 100. `/restore <N>` restores the *N*th-most-recent
+//! snapshot, where `N=1` is the newest and `N` is capped at 1000. Outside
+//! YOLO mode we refuse to mutate files unless the user has explicitly
+//! trusted the workspace (`/trust on`); the user can always view the list,
+//! just not one-shot revert without a safety net.

 use super::CommandResult;
-use crate::snapshot::SnapshotRepo;
+use crate::snapshot::{Snapshot, SnapshotRepo};
 use crate::tui::app::App;
+use chrono::TimeZone;

-const LIST_LIMIT: usize = 10;
+const DEFAULT_LIST_LIMIT: usize = 20;
+const MAX_LIST_LIMIT: usize = 100;
+const MAX_RESTORE_INDEX: usize = 1000;

-/// Entry point for `/restore [N]`.
+/// Entry point for `/restore [N|list [N]]`.
 pub fn restore(app: &mut App, arg: Option<&str>) -> CommandResult {
    let workspace = app.workspace.clone();
    let repo = match SnapshotRepo::open_or_init(&workspace) {
@@ -26,29 +30,51 @@ pub fn restore(app: &mut App, arg: Option<&str>) -> CommandResult {
        }
    };

-    let snapshots = match repo.list(LIST_LIMIT) {
-        Ok(s) => s,
-        Err(e) => return CommandResult::error(format!("Failed to list snapshots: {e}")),
-    };
-
-    if snapshots.is_empty() {
-        return CommandResult::message(
-            "No snapshots yet. Send a message to create the first pre-turn snapshot.",
-        );
-    }
-
    let Some(arg) = arg.map(str::trim).filter(|s| !s.is_empty()) else {
+        let snapshots = match repo.list(DEFAULT_LIST_LIMIT) {
+            Ok(s) => s,
+            Err(e) => return CommandResult::error(format!("Failed to list snapshots: {e}")),
+        };
+        if snapshots.is_empty() {
+            return no_snapshots_message();
+        }
        return CommandResult::message(format_listing(&snapshots));
    };

+    if let Some(limit) = match parse_list_arg(arg) {
+        Ok(limit) => limit,
+        Err(message) => return CommandResult::error(message),
+    } {
+        let snapshots = match repo.list(limit) {
+            Ok(s) => s,
+            Err(e) => return CommandResult::error(format!("Failed to list snapshots: {e}")),
+        };
+        if snapshots.is_empty() {
+            return no_snapshots_message();
+        }
+        return CommandResult::message(format_listing(&snapshots));
+    }
+
    let n: usize = match arg.parse() {
-        Ok(n) if n >= 1 => n,
+        Ok(n) if (1..=MAX_RESTORE_INDEX).contains(&n) => n,
+        Ok(n) if n > MAX_RESTORE_INDEX => {
+            return CommandResult::error(format!(
+                "Restore index must be <= {MAX_RESTORE_INDEX}; got {n}. Use /restore list [N] to inspect snapshots first.",
+            ));
+        }
        _ => {
            return CommandResult::error(format!(
-                "Usage: /restore <N>  (N is 1-based; got '{arg}')",
+                "Usage: /restore <N> or /restore list [N]  (N is 1-based; got '{arg}')",
            ));
        }
    };
+    let snapshots = match repo.list(n.max(DEFAULT_LIST_LIMIT)) {
+        Ok(s) => s,
+        Err(e) => return CommandResult::error(format!("Failed to list snapshots: {e}")),
+    };
+    if snapshots.is_empty() {
+        return no_snapshots_message();
+    }

    if n > snapshots.len() {
        return CommandResult::error(format!(
@@ -81,12 +107,49 @@ pub fn restore(app: &mut App, arg: Option<&str>) -> CommandResult {
    ))
 }

-fn format_listing(snapshots: &[crate::snapshot::Snapshot]) -> String {
-    let mut out = String::from("Recent snapshots (newest first; pass /restore <N> to revert):\n");
+fn parse_list_arg(arg: &str) -> Result<Option<usize>, String> {
+    let mut parts = arg.split_whitespace();
+    let action = match parts.next() {
+        Some(action) => action,
+        None => return Ok(None),
+    };
+    if action != "list" {
+        return Ok(None);
+    }
+    let Some(value) = parts.next() else {
+        return Ok(Some(DEFAULT_LIST_LIMIT));
+    };
+    if parts.next().is_some() {
+        return Err(format!(
+            "Usage: /restore list [N]  (got extra arguments in '{arg}')",
+        ));
+    }
+    match value.parse::<usize>() {
+        Ok(limit @ 1..=MAX_LIST_LIMIT) => Ok(Some(limit)),
+        Ok(limit) if limit > MAX_LIST_LIMIT => Err(format!(
+            "Restore list limit must be <= {MAX_LIST_LIMIT}; got {limit}.",
+        )),
+        _ => Err(format!(
+            "Usage: /restore list [N]  (N must be >= 1; got '{value}')",
+        )),
+    }
+}
+
+fn no_snapshots_message() -> CommandResult {
+    CommandResult::message(
+        "No snapshots yet. Send a message to create the first pre-turn snapshot.",
+    )
+}
+
+fn format_listing(snapshots: &[Snapshot]) -> String {
+    let mut out = String::from(
+        "Recent snapshots (newest first; pass /restore <N> to revert; /restore list 50 shows more):\n",
+    );
    for (i, s) in snapshots.iter().enumerate() {
        out.push_str(&format!(
-            "  #{:<2}  {}  {}\n",
+            "  #{:<2}  {}  {}  {}\n",
            i + 1,
+            format_snapshot_time(s.timestamp),
            short_sha(s.id.as_str()),
            s.label,
        ));
@@ -94,6 +157,13 @@ fn format_listing(snapshots: &[crate::snapshot::Snapshot]) -> String {
    out
 }

+fn format_snapshot_time(timestamp: i64) -> String {
+    match chrono::Utc.timestamp_opt(timestamp, 0).single() {
+        Some(dt) => dt.format("%Y-%m-%d %H:%M UTC").to_string(),
+        None => "unknown time".to_string(),
+    }
+}
+
 fn short_sha(sha: &str) -> &str {
    &sha[..sha.len().min(8)]
 }
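Review note: after this change `/restore` accepts three argument shapes: no argument, `list [N]`, and a bare index. The sketch below restates that classification as one pure function so the bounds are easy to see. The constants mirror the ones in the diff. `RestoreRequest`, `classify_restore_arg`, and the error strings are illustrative assumptions, not code from the crate.

```rust
const DEFAULT_LIST_LIMIT: usize = 20;
const MAX_LIST_LIMIT: usize = 100;
const MAX_RESTORE_INDEX: usize = 1000;

/// Hypothetical summary type: what the user asked `/restore` to do.
#[derive(Debug, PartialEq, Eq)]
enum RestoreRequest {
    List(usize),
    Restore(usize),
}

/// Hypothetical classifier covering the same three shapes as the diff.
fn classify_restore_arg(arg: Option<&str>) -> Result<RestoreRequest, String> {
    // No argument (or only whitespace) means "show the default listing".
    let Some(arg) = arg.map(str::trim).filter(|s| !s.is_empty()) else {
        return Ok(RestoreRequest::List(DEFAULT_LIST_LIMIT));
    };

    let mut parts = arg.split_whitespace();
    if parts.next() == Some("list") {
        // `list` alone falls back to the default limit.
        let Some(value) = parts.next() else {
            return Ok(RestoreRequest::List(DEFAULT_LIST_LIMIT));
        };
        if parts.next().is_some() {
            return Err(format!("extra arguments in '{arg}'"));
        }
        return match value.parse::<usize>() {
            Ok(limit) if (1..=MAX_LIST_LIMIT).contains(&limit) => {
                Ok(RestoreRequest::List(limit))
            }
            Ok(limit) if limit > MAX_LIST_LIMIT => {
                Err(format!("list limit must be <= {MAX_LIST_LIMIT}; got {limit}"))
            }
            // Zero and non-numeric values land here.
            _ => Err(format!("list limit must be >= 1; got '{value}'")),
        };
    }

    // Anything else must be a 1-based restore index within the cap.
    match arg.parse::<usize>() {
        Ok(n) if (1..=MAX_RESTORE_INDEX).contains(&n) => Ok(RestoreRequest::Restore(n)),
        Ok(n) if n > MAX_RESTORE_INDEX => {
            Err(format!("restore index must be <= {MAX_RESTORE_INDEX}; got {n}"))
        }
        _ => Err(format!("expected <N> or list [N]; got '{arg}'")),
    }
}
```

Capping both numbers before touching the repository is what keeps a typo such as `/restore 100000` from turning into an unbounded snapshot query.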
@@ -195,6 +265,117 @@ mod tests {
        assert!(msg.contains("#2"));
    }

+    #[test]
+    fn restore_lists_more_than_ten_snapshots_by_default() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+        let repo = SnapshotRepo::open_or_init(&app.workspace).unwrap();
+        for i in 0..12 {
+            std::fs::write(app.workspace.join("a.txt"), format!("v{i}")).unwrap();
+            repo.snapshot(&format!("turn:{i}")).unwrap();
+        }
+
+        let result = restore(&mut app, None);
+        let msg = result.message.expect("expected message");
+        assert!(msg.contains("#12"), "{msg}");
+        assert!(msg.contains("turn:0"), "{msg}");
+    }
+
+    #[test]
+    fn restore_listing_includes_snapshot_utc_time() {
+        let snapshots = [Snapshot {
+            id: crate::snapshot::SnapshotId("abcdef123456".to_string()),
+            label: "turn:demo".to_string(),
+            timestamp: 1_700_000_000,
+        }];
+
+        let msg = format_listing(&snapshots);
+
+        assert!(msg.contains("2023-11-14 22:13 UTC"), "{msg}");
+        assert!(msg.contains("abcdef12"), "{msg}");
+        assert!(msg.contains("turn:demo"), "{msg}");
+    }
+
+    #[test]
+    fn restore_list_subcommand_accepts_explicit_limit() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+        let repo = SnapshotRepo::open_or_init(&app.workspace).unwrap();
+        for i in 0..15 {
+            std::fs::write(app.workspace.join("a.txt"), format!("v{i}")).unwrap();
+            repo.snapshot(&format!("turn:{i}")).unwrap();
+        }
+
+        let result = restore(&mut app, Some("list 12"));
+        let msg = result.message.expect("expected message");
+        assert!(msg.contains("#12"), "{msg}");
+        assert!(!msg.contains("#13"), "{msg}");
+    }
+
+    #[test]
+    fn restore_list_subcommand_rejects_invalid_limit() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+
+        let result = restore(&mut app, Some("list nope"));
+        assert!(result.is_error);
+        assert!(result.message.unwrap().contains("Usage: /restore list [N]"));
+    }
+
+    #[test]
+    fn restore_list_subcommand_rejects_limit_above_cap() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+
+        let result = restore(&mut app, Some("list 101"));
+        assert!(result.is_error);
+        assert!(
+            result
+                .message
+                .unwrap()
+                .contains("Restore list limit must be <= 100")
+        );
+    }
+
+    #[test]
+    fn restore_numeric_index_can_target_beyond_default_listing() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+        let repo = SnapshotRepo::open_or_init(&app.workspace).unwrap();
+        let f = app.workspace.join("a.txt");
+        for i in 0..12 {
+            std::fs::write(&f, format!("v{i}")).unwrap();
+            repo.snapshot(&format!("turn:{i}")).unwrap();
+        }
+        std::fs::write(&f, "changed").unwrap();
+
+        let result = restore(&mut app, Some("12"));
+        assert!(result.message.unwrap().contains("Restored"));
+        assert_eq!(std::fs::read_to_string(&f).unwrap(), "v0");
+    }
+
+    #[test]
+    fn restore_numeric_index_rejects_unbounded_query() {
+        let tmp = TempDir::new().unwrap();
+        let _home = scoped_home(&tmp);
+        let mut app = make_app(&tmp, true);
+
+        let result = restore(&mut app, Some("1001"));
+
+        assert!(result.is_error);
+        assert!(
+            result
+                .message
+                .unwrap()
+                .contains("Restore index must be <= 1000")
+        );
+    }
+
    #[test]
    fn restore_in_yolo_reverts_workspace() {
        let tmp = TempDir::new().unwrap();
@@ -13,10 +13,32 @@ use crate::tui::history::HistoryCell;

 use super::CommandResult;

+#[cfg(test)]
+thread_local! {
+    static TEST_HOME_DIR: std::cell::RefCell<Option<std::path::PathBuf>> =
+        const { std::cell::RefCell::new(None) };
+}
+
+#[cfg(not(test))]
 fn discover_visible_skills(app: &App) -> SkillRegistry {
    crate::skills::discover_for_workspace_and_dir(&app.workspace, &app.skills_dir)
 }

+#[cfg(test)]
+fn discover_visible_skills(app: &App) -> SkillRegistry {
+    TEST_HOME_DIR.with(|home| {
+        if let Some(home) = home.borrow().as_deref() {
+            crate::skills::discover_for_workspace_and_dir_with_home(
+                &app.workspace,
+                &app.skills_dir,
+                Some(home),
+            )
+        } else {
+            crate::skills::discover_for_workspace_and_dir(&app.workspace, &app.skills_dir)
+        }
+    })
+}
+
 fn render_skill_warnings(registry: &SkillRegistry) -> String {
    if registry.warnings().is_empty() {
        return String::new();
@@ -601,6 +623,7 @@ mod tests {
        _lock: std::sync::MutexGuard<'static, ()>,
        home_prev: Option<OsString>,
        userprofile_prev: Option<OsString>,
+        test_home_prev: Option<std::path::PathBuf>,
    }

    impl IsolatedHome {
@@ -616,10 +639,12 @@ mod tests {
                std::env::set_var("HOME", &home);
                std::env::set_var("USERPROFILE", &home);
            }
+            let test_home_prev = TEST_HOME_DIR.with(|slot| slot.replace(Some(home)));
            Self {
                _lock: lock,
                home_prev,
                userprofile_prev,
+                test_home_prev,
            }
        }

@@ -634,6 +659,9 @@ mod tests {

    impl Drop for IsolatedHome {
        fn drop(&mut self) {
+            TEST_HOME_DIR.with(|slot| {
+                *slot.borrow_mut() = self.test_home_prev.take();
+            });
            // SAFETY: the shared test env mutex is still held while Drop runs.
            unsafe {
                Self::restore_var("HOME", self.home_prev.take());
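Review note: the `TEST_HOME_DIR` slot added here follows a common pattern: a thread-local override that a guard installs on construction and restores on drop. The standalone sketch below shows that pattern with only the standard library. `HOME_OVERRIDE`, `HomeOverrideGuard`, and `effective_home` are hypothetical names chosen for illustration.

```rust
use std::cell::RefCell;
use std::path::{Path, PathBuf};

thread_local! {
    // Per-thread override; `None` means "use the real home directory".
    static HOME_OVERRIDE: RefCell<Option<PathBuf>> = const { RefCell::new(None) };
}

/// Hypothetical guard: installs an override and remembers the previous value.
struct HomeOverrideGuard {
    previous: Option<PathBuf>,
}

impl HomeOverrideGuard {
    fn install(home: PathBuf) -> Self {
        // `RefCell::replace` swaps in the new value and returns the old one.
        let previous = HOME_OVERRIDE.with(|slot| slot.replace(Some(home)));
        Self { previous }
    }
}

impl Drop for HomeOverrideGuard {
    fn drop(&mut self) {
        // Put back whatever was there before, so nested guards unwind in order.
        HOME_OVERRIDE.with(|slot| {
            *slot.borrow_mut() = self.previous.take();
        });
    }
}

/// Resolve the home directory, preferring the thread-local override.
fn effective_home(fallback: &Path) -> PathBuf {
    HOME_OVERRIDE.with(|slot| {
        slot.borrow()
            .clone()
            .unwrap_or_else(|| fallback.to_path_buf())
    })
}
```

Because the slot is thread-local, an override installed by one test thread is invisible to the others, which process-wide environment variables cannot offer.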
@@ -6,7 +6,7 @@
 //! `/name`, the file contents are sent as a user message.
 //!
 //! Files may include optional YAML-like frontmatter between `---` markers.
-//! Supported fields are `description`, `argument-hint`, and `allowed-tools`.
+//! Supported fields are `description`, `argument-hint`, `allowed-tools`, and `pausable`.
 //! Frontmatter is stripped before the command body is sent to the model.
 //!
 //! ## Precedence
@@ -206,6 +206,9 @@ pub fn try_dispatch_user_command(app: &mut App, input: &str) -> Option<CommandRe
            app.hunt.verdict = HuntVerdict::Hunting;
            app.hunt.token_budget = None;
            app.active_allowed_tools = None;
+            app.pausable = false;
+            app.paused = false;
+            app.paused_quarry = None;
            for (key, value) in &metadata {
                match key.as_str() {
                    "description" => {
@@ -215,6 +218,9 @@ pub fn try_dispatch_user_command(app: &mut App, input: &str) -> Option<CommandRe
                    "allowed-tools" => {
                        app.active_allowed_tools = Some(parse_allowed_tools(value));
                    }
+                    "pausable" => {
+                        app.pausable = value.trim().eq_ignore_ascii_case("true");
+                    }
                    _ => {}
                }
            }
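Review note: the dispatch path now resets the pause fields before reading frontmatter, so a command without `pausable: true` cannot inherit state from the previous one. A reduced sketch of that reset-then-apply order follows. `PauseState` and `apply_frontmatter` are hypothetical stand-ins for the `App` fields and the dispatch loop.

```rust
/// Hypothetical stand-in for the three pause-related `App` fields.
#[derive(Debug, Default, PartialEq, Eq)]
struct PauseState {
    pausable: bool,
    paused: bool,
    paused_quarry: Option<String>,
}

/// Reset pause state, then apply any `pausable` key found in the metadata.
fn apply_frontmatter(state: &mut PauseState, metadata: &[(&str, &str)]) {
    // Clearing first means stale state never survives a new command.
    *state = PauseState::default();
    for (key, value) in metadata {
        if *key == "pausable" {
            // Only a case-insensitive "true" opts in; anything else stays false.
            state.pausable = value.trim().eq_ignore_ascii_case("true");
        }
    }
}
```

Treating every value other than `true` as `false` keeps a typo in the frontmatter from silently enabling pausing.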
@@ -226,22 +232,6 @@ pub fn try_dispatch_user_command(app: &mut App, input: &str) -> Option<CommandRe
    None
 }

-/// Get user command names that match a given prefix (for autocomplete).
-///
-/// The prefix should be the command name portion only (after `/`).
-/// Returns entries formatted as `/name`.
-///
-/// `workspace` is used to also scan workspace-local command directories;
-/// pass `None` when no workspace context is available.
-pub fn user_commands_matching(prefix: &str, workspace: Option<&Path>) -> Vec<String> {
-    let prefix = prefix.to_lowercase();
-    load_user_commands(workspace)
-        .into_iter()
-        .filter(|(name, _)| name.starts_with(&prefix))
-        .map(|(name, _)| format!("/{name}"))
-        .collect()
-}
-
 #[cfg(test)]
 mod tests {
    use super::*;
@@ -301,12 +291,6 @@ mod tests {
        assert!(result.is_none());
    }

-    #[test]
-    fn test_user_commands_matching_with_prefix_no_workspace() {
-        let matches = user_commands_matching("zzzznotfound", None);
-        assert!(matches.is_empty());
-    }
-
    // ── Workspace-local commands tests ─────────────────────────────────

    fn write_command(dir: &Path, name: &str, body: &str) {
@@ -468,23 +452,6 @@ mod tests {
        }
    }

-    #[test]
-    fn user_commands_matching_with_workspace() {
-        let tmp = TempDir::new().unwrap();
-        let ws = tmp.path();
-        write_command(
-            &ws.join(".deepseek").join("commands"),
-            "project-cmd",
-            "body",
-        );
-
-        let matches = user_commands_matching("project", Some(ws));
-        assert!(
-            matches.contains(&"/project-cmd".to_string()),
-            "got: {matches:?}"
-        );
-    }
-
    #[test]
    fn frontmatter_is_stripped_before_dispatch() {
        use crate::config::Config;
@@ -561,6 +528,84 @@ mod tests {
        );
    }

+    #[test]
+    fn pausable_frontmatter_sets_app_state_without_worktree_mutation() {
+        use crate::config::Config;
+
+        if std::process::Command::new("git")
+            .arg("--version")
+            .output()
+            .is_err()
+        {
+            return;
+        }
+
+        let tmp = TempDir::new().unwrap();
+        let ws = tmp.path().to_path_buf();
+        let init = std::process::Command::new("git")
+            .args(["-C", ws.to_str().unwrap(), "init"])
+            .output()
+            .expect("git init");
+        assert!(
+            init.status.success(),
+            "git init failed: {}",
+            String::from_utf8_lossy(&init.stderr)
+        );
+        std::fs::write(ws.join("user-work.txt"), "untracked user work").unwrap();
+        write_command(
+            &ws.join(".codewhale").join("commands"),
+            "pause-scan",
+            "---\ndescription: Scan repos\npausable: true\n---\nscan",
+        );
+
+        let mut app = App::new(test_options(ws.clone()), &Config::default());
+        let _ = try_dispatch_user_command(&mut app, "/pause-scan").unwrap();
+
+        assert!(app.pausable);
+        assert!(!app.paused);
+        assert!(app.paused_quarry.is_none());
+        assert!(ws.join("user-work.txt").exists());
+        let stash = std::process::Command::new("git")
+            .args(["-C", ws.to_str().unwrap(), "stash", "list"])
+            .output()
+            .expect("git stash list");
+        assert!(
+            stash.status.success(),
+            "git stash list failed: {}",
+            String::from_utf8_lossy(&stash.stderr)
+        );
+        assert!(
+            String::from_utf8_lossy(&stash.stdout).trim().is_empty(),
+            "pausable dispatch must not create git stash entries"
+        );
+    }
+
+    #[test]
+    fn new_user_command_clears_stale_paused_state() {
+        use crate::config::Config;
+
+        let tmp = TempDir::new().unwrap();
+        let ws = tmp.path().to_path_buf();
+        let commands_dir = ws.join(".codewhale").join("commands");
+        write_command(
+            &commands_dir,
+            "pause-scan",
+            "---\ndescription: Scan repos\npausable: true\n---\nscan",
+        );
+        write_command(&commands_dir, "plain", "plain command");
+
+        let mut app = App::new(test_options(ws), &Config::default());
+        let _ = try_dispatch_user_command(&mut app, "/pause-scan").unwrap();
+        app.paused = true;
+        app.paused_quarry = Some("Scan repos".to_string());
+
+        let _ = try_dispatch_user_command(&mut app, "/plain").unwrap();
+
+        assert!(!app.pausable);
+        assert!(!app.paused);
+        assert!(app.paused_quarry.is_none());
+    }
+
    #[test]
    fn review_regression_empty_allowed_tools_blocks_all_tools() {
        use crate::config::Config;
@@ -60,6 +60,8 @@ impl Default for CompactionConfig {
 }

 pub const KEEP_RECENT_MESSAGES: usize = 4;
+#[allow(dead_code)]
+pub const HARD_COMPACT_KEEP_RECENT: usize = 8;
 const RECENT_WORKING_SET_WINDOW: usize = 12;
 const MAX_WORKING_SET_PATHS: usize = 24;
 const MIN_SUMMARIZE_MESSAGES: usize = 6;
@@ -121,6 +123,29 @@ pub struct CompactionPlan {
    pub summarize_indices: Vec<usize>,
 }

+#[derive(Debug, Clone, PartialEq, Eq)]
+#[allow(dead_code)]
+pub struct HardCompactionConfig {
+    pub enabled: bool,
+    pub keep_recent: usize,
+}
+
+impl Default for HardCompactionConfig {
+    fn default() -> Self {
+        Self {
+            enabled: false,
+            keep_recent: HARD_COMPACT_KEEP_RECENT,
+        }
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+#[allow(dead_code)]
+pub struct HardCompactionPlan {
+    pub summarize_indices: Vec<usize>,
+    pub preserved_indices: Vec<usize>,
+}
+
 fn path_regex() -> &'static Regex {
    static PATH_RE: OnceLock<Regex> = OnceLock::new();
    PATH_RE.get_or_init(|| {
@@ -450,6 +475,32 @@ pub fn plan_compaction(
    }
 }

+#[allow(dead_code)]
+pub fn plan_hard_compaction(
+    messages: &[Message],
+    workspace: Option<&Path>,
+    keep_recent: usize,
+) -> Option<HardCompactionPlan> {
+    if keep_recent == 0 || messages.len() < keep_recent.saturating_add(MIN_SUMMARIZE_MESSAGES) {
+        return None;
+    }
+
+    let soft_plan = plan_compaction(messages, workspace, keep_recent, None, None);
+    if soft_plan.summarize_indices.len() < MIN_SUMMARIZE_MESSAGES {
+        return None;
+    }
+
+    let summarized: BTreeSet<_> = soft_plan.summarize_indices.iter().copied().collect();
+    let preserved_indices = (0..messages.len())
+        .filter(|idx| !summarized.contains(idx))
+        .collect();
+
+    Some(HardCompactionPlan {
+        summarize_indices: soft_plan.summarize_indices,
+        preserved_indices,
+    })
+}
+
 fn enforce_tool_call_pairs(messages: &[Message], pinned_indices: &mut BTreeSet<usize>) {
    if pinned_indices.is_empty() {
        return;
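Review note: `plan_hard_compaction` reuses the soft plan and derives `preserved_indices` as the complement of `summarize_indices`, after an early size check. The sketch below isolates those two steps. `MIN_SUMMARIZE_MESSAGES` is copied from the surrounding file, while `enough_messages_for_hard_plan` and `preserved_indices` are hypothetical helper names.

```rust
use std::collections::BTreeSet;

const MIN_SUMMARIZE_MESSAGES: usize = 6;

/// Early-exit check: a hard plan needs the kept tail plus enough messages
/// left over to be worth summarizing.
fn enough_messages_for_hard_plan(message_count: usize, keep_recent: usize) -> bool {
    // `saturating_add` avoids overflow for absurdly large `keep_recent` values.
    keep_recent != 0 && message_count >= keep_recent.saturating_add(MIN_SUMMARIZE_MESSAGES)
}

/// Every index in `0..message_count` that is not scheduled for summarization.
fn preserved_indices(message_count: usize, summarize: &[usize]) -> Vec<usize> {
    let summarized: BTreeSet<usize> = summarize.iter().copied().collect();
    (0..message_count)
        .filter(|idx| !summarized.contains(idx))
        .collect()
}
```

Deriving the preserved set as a complement means the two index lists always partition the message range, whatever pinning the soft plan applied.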
@@ -2100,6 +2151,80 @@ mod tests {
        assert!(plan.pinned_indices.contains(&1));
    }

+    #[test]
+    fn plan_hard_compaction_returns_none_when_too_few_messages() {
+        let messages = vec![
+            msg("user", "hello"),
+            msg("assistant", "hi"),
+            msg("user", "how are you"),
+            msg("assistant", "good"),
+        ];
+
+        assert!(plan_hard_compaction(&messages, None, HARD_COMPACT_KEEP_RECENT).is_none());
+    }
+
+    #[test]
+    fn plan_hard_compaction_preserves_recent_tail() {
+        let messages: Vec<Message> = (0..20)
+            .map(|i| {
+                msg(
+                    if i % 2 == 0 { "user" } else { "assistant" },
+                    &format!("message {i}"),
+                )
+            })
+            .collect();
+
+        let plan =
+            plan_hard_compaction(&messages, None, HARD_COMPACT_KEEP_RECENT).expect("hard plan");
+
+        let expected_recent: Vec<usize> = (20 - HARD_COMPACT_KEEP_RECENT..20).collect();
+        for idx in expected_recent {
+            assert!(plan.preserved_indices.contains(&idx));
+            assert!(!plan.summarize_indices.contains(&idx));
+        }
+        assert_eq!(plan.summarize_indices, (0..12).collect::<Vec<_>>());
+    }
+
+    #[test]
+    fn plan_hard_compaction_keeps_tool_pairs_across_tail_boundary() {
+        let mut messages: Vec<Message> = (0..8)
+            .map(|i| msg("user", &format!("summarizable noise {i}")))
+            .collect();
+        messages.push(Message {
+            role: "assistant".to_string(),
+            content: vec![ContentBlock::ToolUse {
+                id: "tail-call".to_string(),
+                name: "read_file".to_string(),
+                input: json!({"path": "crates/tui/src/compaction.rs"}),
+                caller: None,
+            }],
+        });
+        messages.push(Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::ToolResult {
+                tool_use_id: "tail-call".to_string(),
+                content: "file contents".to_string(),
+                is_error: None,
+                content_blocks: None,
+            }],
+        });
+
+        let plan = plan_hard_compaction(&messages, None, 1).expect("hard plan");
+
+        assert!(plan.preserved_indices.contains(&8));
+        assert!(plan.preserved_indices.contains(&9));
+        assert!(!plan.summarize_indices.contains(&8));
+        assert!(!plan.summarize_indices.contains(&9));
+    }
+
+    #[test]
+    fn hard_compaction_config_defaults_to_disabled() {
+        let config = HardCompactionConfig::default();
+
+        assert!(!config.enabled);
+        assert_eq!(config.keep_recent, HARD_COMPACT_KEEP_RECENT);
+    }
+
    #[test]
    fn should_compact_ignores_fully_pinned_context() {
        let config = CompactionConfig {
@@ -0,0 +1,461 @@
+//! Config file path resolution and TOML persistence helpers.
+//!
+//! These helpers are used by command handlers and non-command UI code, so
+//! persistence lives outside the command tree.
+
+use std::path::{Path, PathBuf};
+
+use crate::config::{ApiProvider, StatusItem, effective_home_dir, expand_path};
+
+pub(crate) fn persist_status_items(items: &[StatusItem]) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+    use std::fs;
+
+    let path = config_toml_path(None)?;
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)
+            .with_context(|| format!("failed to create config directory {}", parent.display()))?;
+    }
+
+    let mut doc: toml::Value = if path.exists() {
+        let raw = fs::read_to_string(&path)
+            .with_context(|| format!("failed to read config at {}", path.display()))?;
+        toml::from_str(&raw)
+            .with_context(|| format!("failed to parse config at {}", path.display()))?
+    } else {
+        toml::Value::Table(toml::value::Table::new())
+    };
+
+    let table = doc
+        .as_table_mut()
+        .context("config.toml root must be a table")?;
+    let tui_entry = table
+        .entry("tui".to_string())
+        .or_insert_with(|| toml::Value::Table(toml::value::Table::new()));
+    let tui_table = tui_entry
+        .as_table_mut()
+        .context("`tui` section in config.toml must be a table")?;
+    let array = items
+        .iter()
+        .map(|item| toml::Value::String(item.key().to_string()))
+        .collect::<Vec<_>>();
+    tui_table.insert("status_items".to_string(), toml::Value::Array(array));
+
+    let body = toml::to_string_pretty(&doc).context("failed to serialize config.toml")?;
+    fs::write(&path, body)
+        .with_context(|| format!("failed to write config at {}", path.display()))?;
+    Ok(path)
+}
+
+pub(crate) fn persist_root_string_key(
+    config_path: Option<&Path>,
+    key: &str,
+    value: &str,
+) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+    use std::fs;
+
+    let path = config_toml_path(config_path)?;
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)
+            .with_context(|| format!("failed to create config directory {}", parent.display()))?;
+    }
+
+    let mut doc: toml::Value = if path.exists() {
+        let raw = fs::read_to_string(&path)
+            .with_context(|| format!("failed to read config at {}", path.display()))?;
+        toml::from_str(&raw)
+            .with_context(|| format!("failed to parse config at {}", path.display()))?
+    } else {
+        toml::Value::Table(toml::value::Table::new())
+    };
+    let table = doc
+        .as_table_mut()
+        .context("config.toml root must be a table")?;
+    table.insert(key.to_string(), toml::Value::String(value.to_string()));
+    let body = toml::to_string_pretty(&doc).context("failed to serialize config.toml")?;
+    fs::write(&path, body)
+        .with_context(|| format!("failed to write config at {}", path.display()))?;
+    Ok(path)
+}
+
+pub(crate) fn persist_root_bool_key(
+    config_path: Option<&Path>,
+    key: &str,
+    value: bool,
+) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+    use std::fs;
+
+    let path = config_toml_path(config_path)?;
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)
+            .with_context(|| format!("failed to create config directory {}", parent.display()))?;
+    }
+
+    let mut doc: toml::Value = if path.exists() {
+        let raw = fs::read_to_string(&path)
+            .with_context(|| format!("failed to read config at {}", path.display()))?;
+        toml::from_str(&raw)
+            .with_context(|| format!("failed to parse config at {}", path.display()))?
+    } else {
+        toml::Value::Table(toml::value::Table::new())
+    };
+    let table = doc
+        .as_table_mut()
+        .context("config.toml root must be a table")?;
+    table.insert(key.to_string(), toml::Value::Boolean(value));
+    let body = toml::to_string_pretty(&doc).context("failed to serialize config.toml")?;
+    fs::write(&path, body)
+        .with_context(|| format!("failed to write config at {}", path.display()))?;
+    Ok(path)
+}
+
+pub(crate) fn persist_tui_integer_key(
+    config_path: Option<&Path>,
+    key: &str,
+    value: u64,
+) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+    use std::fs;
+
+    let path = config_toml_path(config_path)?;
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)
+            .with_context(|| format!("failed to create config directory {}", parent.display()))?;
+    }
+
+    let mut doc: toml::Value = if path.exists() {
+        let raw = fs::read_to_string(&path)
+            .with_context(|| format!("failed to read config at {}", path.display()))?;
+        toml::from_str(&raw)
+            .with_context(|| format!("failed to parse config at {}", path.display()))?
+    } else {
+        toml::Value::Table(toml::value::Table::new())
+    };
+    let table = doc
+        .as_table_mut()
+        .context("config.toml root must be a table")?;
+    let tui_entry = table
+        .entry("tui".to_string())
+        .or_insert_with(|| toml::Value::Table(toml::value::Table::new()));
+    let tui_table = tui_entry
+        .as_table_mut()
+        .context("`tui` section in config.toml must be a table")?;
+    let value = i64::try_from(value).context("integer value is too large for TOML")?;
+    tui_table.insert(key.to_string(), toml::Value::Integer(value));
+    let body = toml::to_string_pretty(&doc).context("failed to serialize config.toml")?;
+    fs::write(&path, body)
+        .with_context(|| format!("failed to write config at {}", path.display()))?;
+    Ok(path)
+}
+
+pub(crate) fn persist_provider_base_url_key(
+    config_path: Option<&Path>,
+    provider: ApiProvider,
+    value: &str,
+) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+    use std::fs;
+
+    let path = config_toml_path(config_path)?;
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)
+            .with_context(|| format!("failed to create config directory {}", parent.display()))?;
+    }
+
+    let mut doc: toml::Value = if path.exists() {
+        let raw = fs::read_to_string(&path)
+            .with_context(|| format!("failed to read config at {}", path.display()))?;
+        toml::from_str(&raw)
+            .with_context(|| format!("failed to parse config at {}", path.display()))?
+    } else {
+        toml::Value::Table(toml::value::Table::new())
+    };
+    let table = doc
+        .as_table_mut()
+        .context("config.toml root must be a table")?;
+    let providers = table
+        .entry("providers".to_string())
+        .or_insert_with(|| toml::Value::Table(toml::value::Table::new()))
+        .as_table_mut()
+        .context("`providers` must be a table")?;
+    let provider_key = provider_base_url_table_key(provider)?;
+    let entry = providers
+        .entry(provider_key.to_string())
+        .or_insert_with(|| toml::Value::Table(toml::value::Table::new()))
+        .as_table_mut()
+        .with_context(|| format!("`providers.{provider_key}` must be a table"))?;
+    entry.insert(
+        "base_url".to_string(),
+        toml::Value::String(value.to_string()),
+    );
+
+    let body = toml::to_string_pretty(&doc).context("failed to serialize config.toml")?;
+    fs::write(&path, body)
+        .with_context(|| format!("failed to write config at {}", path.display()))?;
+    Ok(path)
+}
+
+fn provider_base_url_table_key(provider: ApiProvider) -> anyhow::Result<&'static str> {
+    match provider {
+        ApiProvider::Deepseek | ApiProvider::DeepseekCN => {
+            anyhow::bail!("DeepSeek uses the root base_url setting")
+        }
+        ApiProvider::NvidiaNim => Ok("nvidia_nim"),
+        ApiProvider::Openai => Ok("openai"),
+        ApiProvider::Atlascloud => Ok("atlascloud"),
+        ApiProvider::WanjieArk => Ok("wanjie_ark"),
+        ApiProvider::Volcengine => Ok("volcengine"),
+        ApiProvider::Openrouter => Ok("openrouter"),
+        ApiProvider::XiaomiMimo => Ok("xiaomi_mimo"),
+        ApiProvider::Novita => Ok("novita"),
+        ApiProvider::Fireworks => Ok("fireworks"),
+        ApiProvider::Siliconflow | ApiProvider::SiliconflowCn => Ok("siliconflow"),
+        ApiProvider::Arcee => Ok("arcee"),
+        ApiProvider::Huggingface => Ok("huggingface"),
+        ApiProvider::Moonshot => Ok("moonshot"),
+        ApiProvider::Sglang => Ok("sglang"),
+        ApiProvider::Vllm => Ok("vllm"),
+        ApiProvider::Ollama => Ok("ollama"),
+    }
+}
+
+pub(crate) fn config_toml_path(config_path: Option<&Path>) -> anyhow::Result<PathBuf> {
+    use anyhow::Context;
+
+    if let Some(path) = config_path {
+        return Ok(expand_path(path.to_string_lossy().as_ref()));
+    }
+    if let Ok(env) = std::env::var("CODEWHALE_CONFIG_PATH") {
+        let trimmed = env.trim();
+        if !trimmed.is_empty() {
+            return Ok(PathBuf::from(trimmed));
+        }
+    }
+    if let Ok(env) = std::env::var("DEEPSEEK_CONFIG_PATH") {
+        let trimmed = env.trim();
+        if !trimmed.is_empty() {
+            return Ok(PathBuf::from(trimmed));
+        }
+    }
+    let home =
+        effective_home_dir().context("failed to resolve home directory for config.toml path")?;
+    let primary = home.join(".codewhale").join("config.toml");
+    if primary.exists() {
+        return Ok(primary);
+    }
+    let legacy = home.join(".deepseek").join("config.toml");
+    if legacy.exists() {
+        return Ok(legacy);
+    }
+    Ok(primary)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::env;
+    use std::ffi::OsString;
+    use std::fs;
+    use std::path::Path;
+    use std::time::{SystemTime, UNIX_EPOCH};
+
+    struct EnvGuard {
+        home: Option<OsString>,
+        userprofile: Option<OsString>,
+        codewhale_config_path: Option<OsString>,
+        deepseek_config_path: Option<OsString>,
+        _lock: std::sync::MutexGuard<'static, ()>,
+    }
+
+    impl EnvGuard {
+        fn new(home: &Path) -> Self {
+            let lock = crate::test_support::lock_test_env();
+            let home_str = OsString::from(home.as_os_str());
+            let config_path = home.join(".deepseek").join("config.toml");
+            let config_str = OsString::from(config_path.as_os_str());
+            let home_prev = env::var_os("HOME");
+            let userprofile_prev = env::var_os("USERPROFILE");
+            let codewhale_config_prev = env::var_os("CODEWHALE_CONFIG_PATH");
+            let deepseek_config_prev = env::var_os("DEEPSEEK_CONFIG_PATH");
+
+            // Safety: test-only environment mutation guarded by a process-wide mutex.
+            unsafe {
+                env::set_var("HOME", &home_str);
+                env::set_var("USERPROFILE", &home_str);
+                env::remove_var("CODEWHALE_CONFIG_PATH");
+                env::set_var("DEEPSEEK_CONFIG_PATH", &config_str);
+            }
+
+            Self {
+                home: home_prev,
+                userprofile: userprofile_prev,
+                codewhale_config_path: codewhale_config_prev,
+                deepseek_config_path: deepseek_config_prev,
+                _lock: lock,
+            }
+        }
+    }
+
+    impl Drop for EnvGuard {
+        fn drop(&mut self) {
+            if let Some(value) = self.home.take() {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::set_var("HOME", value);
+                }
+            } else {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::remove_var("HOME");
+                }
+            }
+
+            if let Some(value) = self.userprofile.take() {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::set_var("USERPROFILE", value);
+                }
+            } else {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::remove_var("USERPROFILE");
+                }
+            }
+
+            if let Some(value) = self.codewhale_config_path.take() {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::set_var("CODEWHALE_CONFIG_PATH", value);
+                }
+            } else {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::remove_var("CODEWHALE_CONFIG_PATH");
+                }
+            }
+
+            if let Some(value) = self.deepseek_config_path.take() {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::set_var("DEEPSEEK_CONFIG_PATH", value);
+                }
+            } else {
+                // Safety: test-only environment mutation guarded by a global mutex.
+                unsafe {
+                    env::remove_var("DEEPSEEK_CONFIG_PATH");
+                }
+            }
+        }
+    }
+
+    fn temp_root(prefix: &str) -> std::path::PathBuf {
+        let nanos = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_nanos();
+        env::temp_dir().join(format!("{prefix}-{}-{nanos}", std::process::id()))
+    }
+
+    #[test]
+    fn persist_status_items_writes_tui_section_to_config_toml() {
+        let temp_root = temp_root("codewhale-statusline-persist");
+        fs::create_dir_all(&temp_root).unwrap();
+        let _guard = EnvGuard::new(&temp_root);
+
+        let items = vec![
+            crate::config::StatusItem::Mode,
+            crate::config::StatusItem::Model,
+            crate::config::StatusItem::Cost,
+        ];
+
+        let path = persist_status_items(&items).expect("persist should succeed");
+        let body = fs::read_to_string(&path).expect("written file should be readable");
+        assert!(body.contains("[tui]"), "expected [tui] section in {body}");
+        assert!(
+            body.contains("status_items"),
+            "expected status_items key in {body}"
+        );
+        assert!(body.contains("\"mode\""), "expected mode item in {body}");
+        assert!(body.contains("\"cost\""), "expected cost item in {body}");
+    }
+
+    #[test]
+    fn config_toml_path_uses_codewhale_home_for_fresh_installs() {
+        let temp_root = temp_root("codewhale-config-path-fresh");
+        fs::create_dir_all(&temp_root).unwrap();
+        let _guard = EnvGuard::new(&temp_root);
+
+        // Safety: test-only environment mutation; `_guard` holds the env lock.
+        unsafe {
+            env::remove_var("DEEPSEEK_CONFIG_PATH");
+        }
+
+        assert_eq!(
+            config_toml_path(None).unwrap(),
+            temp_root.join(".codewhale").join("config.toml")
+        );
+    }
+
+    #[test]
+    fn config_toml_path_preserves_legacy_config_when_it_exists() {
+        let temp_root = temp_root("codewhale-config-path-legacy");
+        let legacy_config = temp_root.join(".deepseek").join("config.toml");
+        fs::create_dir_all(legacy_config.parent().unwrap()).unwrap();
+        fs::write(&legacy_config, "").unwrap();
+        let _guard = EnvGuard::new(&temp_root);
+
+        // Safety: test-only environment mutation; `_guard` holds the env lock.
+        unsafe {
+            env::remove_var("DEEPSEEK_CONFIG_PATH");
+        }
+
+        assert_eq!(config_toml_path(None).unwrap(), legacy_config);
+    }
+
+    #[test]
+    fn config_toml_path_prefers_codewhale_env_over_legacy_env() {
+        let temp_root = temp_root("codewhale-config-path-env");
+        fs::create_dir_all(&temp_root).unwrap();
+        let _guard = EnvGuard::new(&temp_root);
+        let preferred = temp_root.join("preferred.toml");
+        let legacy = temp_root.join("legacy.toml");
+
+        // Safety: test-only environment mutation; `_guard` holds the env lock.
+        unsafe {
+            env::set_var("CODEWHALE_CONFIG_PATH", &preferred);
+            env::set_var("DEEPSEEK_CONFIG_PATH", &legacy);
+        }
+
+        assert_eq!(config_toml_path(None).unwrap(), preferred);
+    }
+
+    #[test]
+    fn persist_status_items_preserves_existing_unrelated_keys() {
+        let temp_root = temp_root("codewhale-statusline-preserve");
+        fs::create_dir_all(&temp_root).unwrap();
+        let _guard = EnvGuard::new(&temp_root);
+
+        let path = temp_root.join(".deepseek").join("config.toml");
+        fs::create_dir_all(path.parent().unwrap()).unwrap();
+        fs::write(
+            &path,
+            "api_key = \"sentinel-key\"\nmodel = \"deepseek-v4-pro\"\n",
+        )
+        .unwrap();
+
+        let written = persist_status_items(&[crate::config::StatusItem::Mode])
+            .expect("persist should succeed");
+        let body = fs::read_to_string(&written).expect("written file should be readable");
+        assert!(
+            body.contains("api_key = \"sentinel-key\""),
+            "round-trip lost api_key: {body}"
+        );
+        assert!(
+            body.contains("model = \"deepseek-v4-pro\""),
+            "round-trip lost model: {body}"
+        );
+        assert!(
+            body.contains("status_items"),
+            "expected status_items in {body}"
+        );
+    }
+}
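
Review note: the lookup order in `config_toml_path` (explicit path, then `CODEWHALE_CONFIG_PATH`, then `DEEPSEEK_CONFIG_PATH`, then `~/.codewhale`, then an existing `~/.deepseek`) can be restated as a pure function, which makes the precedence easy to check without touching the process environment. The sketch below is only an illustration: `resolve_config_path` and its parameters are hypothetical names, and it skips the `expand_path` step that the real function applies to an explicit path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical pure restatement of the lookup order; every input that the
/// real function reads from the environment is passed in explicitly.
fn resolve_config_path(
    explicit: Option<&Path>,
    codewhale_env: Option<&str>,
    deepseek_env: Option<&str>,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // 1. An explicit path always wins (no `~` expansion in this sketch).
    if let Some(path) = explicit {
        return path.to_path_buf();
    }
    // 2. Environment overrides, new name before legacy name; blank values are ignored.
    for env in [codewhale_env, deepseek_env].into_iter().flatten() {
        let trimmed = env.trim();
        if !trimmed.is_empty() {
            return PathBuf::from(trimmed);
        }
    }
    // 3. Fall back to the home directory: keep a legacy file only when the
    //    new location does not exist yet; fresh installs get the new location.
    let primary = home.join(".codewhale").join("config.toml");
    let legacy = home.join(".deepseek").join("config.toml");
    if !exists(&primary) && exists(&legacy) {
        return legacy;
    }
    primary
}
```

Passing the existence check as a closure is a design choice for the sketch only; it lets the same precedence cases covered by the tests above run without `EnvGuard` or temporary directories.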
@@ -405,7 +405,7 @@ pub async fn start_web_editor(app: &App, config: &Config) -> Result<WebConfigSes
        let poll_tx = tx.clone();
        let poll_url = format!("{url}/api/session");
        let poll_task = tokio::spawn(async move {
-            let client = reqwest::Client::new();
+            let client = crate::tls::reqwest_client();
            let mut last: Option<ConfigUiDocument> = Some(app_snapshot);
            loop {
                tokio::time::sleep(Duration::from_millis(750)).await;
@@ -596,7 +596,7 @@ pub fn apply_document(
        app.status_items = new_status_items.clone();
        app.needs_redraw = true;
        if persist {
-            let path = commands::persist_status_items(&new_status_items)?;
+            let path = crate::config_persistence::persist_status_items(&new_status_items)?;
            notes.push(format!("status_items saved to {}", path.display()));
        } else {
            notes.push("status_items updated for this session".to_string());
@@ -685,7 +685,7 @@ fn apply_reasoning_effort(
    app.last_effective_reasoning_effort = None;
    app.update_model_compaction_budget();
    if persist {
-        commands::persist_root_string_key(
+        crate::config_persistence::persist_root_string_key(
            app.config_path.as_deref(),
            "reasoning_effort",
            effort.as_setting(),
@@ -168,9 +168,29 @@ impl StructuredState {

        if let Some(plan) = self.plan_snapshot.as_ref() {
            out.push_str("\nStrategy metadata\n");
-            if let Some(explanation) = plan.explanation.as_ref() {
-                out.push_str(&format!("{explanation}\n\n"));
-            }
+            append_plan_field(&mut out, "Title", plan.title.as_deref());
+            append_plan_field(&mut out, "Objective", plan.objective.as_deref());
+            append_plan_field(&mut out, "Context", plan.context_summary.as_deref());
+            append_plan_field(&mut out, "Explanation", plan.explanation.as_deref());
+            append_plan_list(&mut out, "Source", &plan.sources_used);
+            append_plan_list(&mut out, "Critical file", &plan.critical_files);
+            append_plan_list(&mut out, "Constraint", &plan.constraints);
+            append_plan_field(
+                &mut out,
+                "Recommended approach",
+                plan.recommended_approach.as_deref(),
+            );
+            append_plan_field(
+                &mut out,
+                "Verification plan",
+                plan.verification_plan.as_deref(),
+            );
+            append_plan_field(
+                &mut out,
+                "Risks and unknowns",
+                plan.risks_and_unknowns.as_deref(),
+            );
+            append_plan_field(&mut out, "Handoff packet", plan.handoff_packet.as_deref());
            for item in &plan.items {
                let marker = match item.status {
                    crate::tools::plan::StepStatus::Pending => "[ ]",
@@ -204,6 +224,21 @@ impl StructuredState {
    }
 }

+fn append_plan_field(out: &mut String, label: &str, value: Option<&str>) {
+    if let Some(value) = value.map(str::trim).filter(|value| !value.is_empty()) {
+        out.push_str(&format!("- {label}: {value}\n"));
+    }
+}
+
+fn append_plan_list(out: &mut String, label: &str, values: &[String]) {
+    for value in values {
+        let value = value.trim();
+        if !value.is_empty() {
+            out.push_str(&format!("- {label}: {value}\n"));
+        }
+    }
+}
+
 // === Types ===

 /// Configuration for the engine
@@ -309,11 +344,17 @@ pub struct EngineConfig {
    /// Metaso also falls back to `METASO_API_KEY` env var, then a built-in key.
    /// Baidu also falls back to `BAIDU_SEARCH_API_KEY`.
    pub search_api_key: Option<String>,
+    /// Optional DuckDuckGo-compatible HTML endpoint override.
+    pub search_base_url: Option<String>,
    /// Per-step DeepSeek API timeout for sub-agent `create_message` requests.
    /// Resolved from `[subagents] api_timeout_secs` (clamped to 1..=1800)
    /// once at engine construction, then threaded onto every
    /// `SubAgentRuntime` the engine builds (#1806, #1808).
    pub subagent_api_timeout: Duration,
+    /// Per-SSE-chunk idle timeout for streamed model responses.
+    /// Resolved from `[tui].stream_chunk_timeout_secs` (or the legacy
+    /// `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS`) and updated live by `/config`.
+    pub stream_chunk_timeout: Duration,
    /// No-progress heartbeat timeout for live sub-agents. Used by the manager
    /// and parent wait loop to auto-cancel stuck children before they exhaust
    /// the sub-agent slot pool indefinitely (#2614).
@@ -373,9 +414,13 @@ impl Default for EngineConfig {
            workshop: None,
            search_provider: crate::config::SearchProvider::default(),
            search_api_key: None,
+            search_base_url: None,
            subagent_api_timeout: Duration::from_secs(
                crate::config::DEFAULT_SUBAGENT_API_TIMEOUT_SECS,
            ),
+            stream_chunk_timeout: Duration::from_secs(
+                crate::config::DEFAULT_STREAM_CHUNK_TIMEOUT_SECS,
+            ),
            subagent_heartbeat_timeout: Duration::from_secs(
                crate::config::DEFAULT_SUBAGENT_HEARTBEAT_TIMEOUT_SECS,
            ),
@@ -439,6 +484,8 @@ pub struct EngineHandle {
    tx_user_input: mpsc::Sender<UserInputDecision>,
    /// Send steer input for an in-flight turn.
    tx_steer: mpsc::Sender<String>,
+    /// Shared pause flag set by the TUI and read by the turn loop.
+    shared_paused: Arc<StdMutex<bool>>,
 }

 // `impl EngineHandle { ... }` moved to `engine/handle.rs` so the
@@ -505,6 +552,15 @@ pub struct Engine {
    slop_ledger_gate_cache: Option<(Option<SystemTime>, Option<String>)>,
    /// Current operating mode. Updated on `ChangeMode` and `SendMessage`.
    current_mode: AppMode,
+    /// Process-local cache for `estimated_input_tokens`. Memoizes the most
+    /// recent token estimate, keyed on `(session.messages_revision,
+    /// system_prompt_fingerprint)`. Five call sites per turn consult it
+    /// (engine capacity checkpoints, seam manager, trim budget, etc.), plus
+    /// four TUI / command consumers; the cache replaces N separate
+    /// O(messages) walks with a single recompute per content change.
+    token_estimate_cache: TokenEstimateCache,
+    /// Shared pause flag set by the TUI and read before tool execution.
+    shared_paused: Arc<StdMutex<bool>>,
 }

 // === Internal tool helpers ===
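
Review note: the doc comment on `token_estimate_cache` describes a single-entry memo keyed on the message revision and a system-prompt fingerprint. A minimal sketch of that idea follows; it is not the `TokenEstimateCache` from this commit (its body is not shown here), and `EstimateCache`, `recomputes`, and the closure-based `compute` parameter are hypothetical.

```rust
/// Hypothetical single-slot memo keyed on (messages revision, prompt fingerprint).
struct EstimateCache {
    /// Last key and the value computed for it.
    slot: Option<((u64, u64), usize)>,
    /// Counts how often the expensive path ran (for illustration only).
    recomputes: usize,
}

impl EstimateCache {
    fn new() -> Self {
        Self { slot: None, recomputes: 0 }
    }

    /// Returns the cached estimate when the key is unchanged; otherwise runs
    /// `compute` once and stores the result under the new key.
    fn lookup_or_compute(
        &mut self,
        revision: u64,
        fingerprint: u64,
        compute: impl FnOnce() -> usize,
    ) -> usize {
        let key = (revision, fingerprint);
        if let Some((cached_key, value)) = self.slot {
            if cached_key == key {
                return value;
            }
        }
        let value = compute();
        self.recomputes += 1;
        self.slot = Some((key, value));
        value
    }
}
```

A single slot is enough when callers always ask about the current session state; any mutation that skips the revision bump would leave a stale value in place, which is presumably why the truncation path in this commit calls `bump_messages_revision()`.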
@@ -528,6 +584,10 @@ impl Engine {
            Ok(mut slot) => *slot = None,
            Err(poisoned) => *poisoned.into_inner() = None,
        }
+        match self.shared_paused.lock() {
+            Ok(mut paused) => *paused = false,
+            Err(poisoned) => *poisoned.into_inner() = false,
+        }
    }

    fn env_only_api_key_recovery_hint(api_config: &Config) -> Option<String> {
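
Review note: the new `shared_paused` reset follows the same pattern as the cancel-reason reset just above it, writing through the guard even when the mutex is poisoned. The standalone sketch below shows that pattern with a hypothetical helper name, `clear_pause`; it assumes the flag is a plain `Arc<Mutex<bool>>` as in the diff.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical helper: reset the pause flag even if a previous holder of the
/// lock panicked and left the mutex poisoned.
fn clear_pause(flag: &Arc<Mutex<bool>>) {
    match flag.lock() {
        Ok(mut paused) => *paused = false,
        // A poisoned lock still hands back the guard via `into_inner`,
        // so the flag can be overwritten instead of propagating the panic.
        Err(poisoned) => *poisoned.into_inner() = false,
    }
}
```

Recovering the guard is reasonable here because a `bool` has no invariant that a panic could leave half-updated.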
@@ -579,6 +639,8 @@ impl Engine {

    /// Create a new engine with the given configuration
    pub fn new(config: EngineConfig, api_config: &Config) -> (Self, EngineHandle) {
+        crate::tls::ensure_rustls_crypto_provider();
+
        if let Some(objective) = normalized_goal_objective(config.goal_objective.as_deref()) {
            sync_goal_state_from_host(&config.goal_state, Some(&objective), None, false);
        }
@@ -592,6 +654,7 @@ impl Engine {
        let cancel_token = CancellationToken::new();
        let shared_cancel_token = Arc::new(StdMutex::new(cancel_token.clone()));
        let cancel_reason: Arc<StdMutex<Option<CancelReason>>> = Arc::new(StdMutex::new(None));
+        let shared_paused = Arc::new(StdMutex::new(false));
        let tool_exec_lock = Arc::new(RwLock::new(()));

        // Create clients for both providers
@@ -633,7 +696,6 @@ impl Engine {
                    show_thinking: config.show_thinking,
                    allow_shell: config.allow_shell,
                },
-                session.approval_mode,
            );
        let stable_prompt = Some(system_prompt);
        session.last_system_prompt_hash = Some(system_prompt_hash(stable_prompt.as_ref()));
@@ -754,6 +816,8 @@ impl Engine {
            workshop_vars,
            sandbox_backend,
            current_mode: AppMode::Agent,
+            token_estimate_cache: TokenEstimateCache::new(),
+            shared_paused: shared_paused.clone(),
        };
        engine.rehydrate_latest_canonical_state();

@@ -765,6 +829,7 @@ impl Engine {
            tx_approval,
            tx_user_input,
            tx_steer,
+            shared_paused,
        };

        (engine, handle)
@@ -798,11 +863,12 @@ impl Engine {
        self.session.trust_mode = trust_mode;
        self.config.trust_mode = trust_mode;
        self.session.auto_approve = auto_approve;
-        self.session.approval_mode = if auto_approve {
-            crate::tui::approval::ApprovalMode::Auto
-        } else {
-            approval_mode
-        };
+        let agent_approval_mode = agent_approval_mode_for_turn(auto_approve, approval_mode);
+        // Only track the Agent-mode approval — Yolo/Plan have fixed
+        // approval policies that are derived from the mode itself.
+        if mode == AppMode::Agent {
+            self.session.approval_mode = agent_approval_mode;
+        }

        let _ = self
            .tx_event
@@ -1179,17 +1245,8 @@ impl Engine {
                    let _ = self.tx_event.send(Event::AgentList { agents }).await;
                }
                Op::ChangeMode { mode } => {
-                    let previous_mode = self.current_mode;
                    self.current_mode = mode;
-                    self.refresh_system_prompt(mode);
                    self.emit_session_updated().await;
-                    // Notify the agent that the mode has changed so it can re-evaluate
-                    // any operations that were blocked by the previous mode's policy.
-                    if previous_mode != mode {
-                        let msg = Self::mode_change_runtime_message(previous_mode, mode);
-                        self.session.add_message(msg);
-                        self.emit_session_updated().await;
-                    }
                    let _ = self
                        .tx_event
                        .send(Event::status(format!(
@@ -1198,11 +1255,11 @@ impl Engine {
                        )))
                        .await;
                }
-                Op::SetModel { model, mode } => {
+                Op::SetModel { model, mode: _ } => {
                    self.session.auto_model = model.trim().eq_ignore_ascii_case("auto");
                    self.session.model = model;
                    self.config.model.clone_from(&self.session.model);
-                    self.refresh_system_prompt(mode);
+                    self.refresh_system_prompt();
                    self.emit_session_updated().await;
                    let _ = self
                        .tx_event
@@ -1223,6 +1280,15 @@ impl Engine {
                        )))
                        .await;
                }
+                Op::SetStreamChunkTimeout { timeout_secs } => {
+                    self.config.stream_chunk_timeout = Duration::from_secs(timeout_secs);
+                    let _ = self
+                        .tx_event
+                        .send(Event::status(format!(
+                            "Stream chunk timeout set to {timeout_secs}s"
+                        )))
+                        .await;
+                }
                Op::SyncSession {
                    session_id,
                    messages,
@@ -1240,6 +1306,10 @@ impl Engine {
                    self.session.compaction_summary_prompt =
                        extract_compaction_summary_prompt(system_prompt.clone());
                    self.session.system_prompt = system_prompt;
+                    self.session.last_system_prompt_hash =
+                        Some(system_prompt_hash(self.session.system_prompt.as_ref()));
+                    // Host-supplied prompts are persisted prefixes. Keep them
+                    // byte-stable; mode/runtime state is projected per request.
                    self.session.system_prompt_override =
                        system_prompt_override && self.session.system_prompt.is_some();
                    self.session.auto_model = model.trim().eq_ignore_ascii_case("auto");
@@ -1282,6 +1352,7 @@ impl Engine {
                    }
                    if let Some(idx) = cut {
                        self.session.messages.truncate(idx);
+                        self.session.bump_messages_revision();
                    }
                    // Now dispatch the new message as a normal send,
                    // reusing the engine's stored mode/model config.
@@ -1335,47 +1406,6 @@ impl Engine {
            .await;
    }

-    /// Build a runtime event message notifying the agent that the operating mode has changed.
-    /// This lets the agent re-evaluate any operations that were blocked by the previous mode.
-    fn mode_change_runtime_message(previous_mode: AppMode, new_mode: AppMode) -> Message {
-        let (policy_note, re_eval_hint) = match new_mode {
-            AppMode::Yolo => (
-                "all operations run automatically without requiring user approval",
-                "Re-evaluate any previously blocked write, patch, or shell operations \
-                 — they are now auto-approved.",
-            ),
-            AppMode::Agent => (
-                "read-only operations run silently; writes, patches, and shell \
-                 commands require user approval",
-                "Any operations you ran automatically under YOLO mode now require \
-                 explicit user approval before executing.",
-            ),
-            AppMode::Plan => (
-                "all writes and patches are blocked; shell and code execution are unavailable",
-                "Any previously planned operations that require writes or shell access \
-                 must wait until the mode changes back to Agent or YOLO.",
-            ),
-        };
-        Message {
-            role: "user".to_string(),
-            content: vec![ContentBlock::Text {
-                text: format!(
-                    "<codewhale:runtime_event kind=\"mode_change\" visibility=\"internal\">\n\
-This is an internal runtime event, not user input. The operating mode has changed \
-from {previous} mode to {new} mode.\n\n\
-In {new} mode: {policy}\n\n\
-{re_eval}\n\
-</codewhale:runtime_event>",
-                    previous = previous_mode.description(),
-                    new = new_mode.description(),
-                    policy = policy_note,
-                    re_eval = re_eval_hint,
-                ),
-                cache_control: None,
-            }],
-        }
-    }
-
    async fn add_session_message(&mut self, message: Message) {
        self.session.add_message(message);
        self.emit_session_updated().await;
@@ -1420,6 +1450,18 @@ In {new} mode: {policy}\n\n\
        }
    }

+    fn runtime_prompt_message(&self) -> Message {
+        let mode = self.current_mode;
+        let approval_mode = approval_mode_for(mode, self.session.approval_mode);
+        Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::Text {
+                text: runtime_prompt_text(mode, approval_mode),
+                cache_control: None,
+            }],
+        }
+    }
+
    fn user_text_message_with_turn_metadata(&self, text: String) -> Message {
        self.user_text_message_with_turn_metadata_for_route(
            text,
@@ -1440,9 +1482,21 @@ In {new} mode: {policy}\n\n\
        reasoning_effort: Option<&str>,
        reasoning_effort_auto: bool,
    ) -> Message {
+        // Place the user text first and turn_meta last so that the leading
+        // bytes of each user message stay stable across date / model-route /
+        // working-set changes. DeepSeek's KV prefix cache matches byte
+        // sequences from the start of each message; when turn_meta (which
+        // contains the current date) sits at position 0 the entire user
+        // message prefix is invalidated at every date boundary. Moving it
+        // to the tail preserves the user-input prefix and limits cache
+        // invalidation to the trailing metadata block.
        Message {
            role: "user".to_string(),
            content: vec![
+                ContentBlock::Text {
+                    text,
+                    cache_control: None,
+                },
                self.turn_metadata_block(
                    routed_model,
                    mode,
@@ -1450,10 +1504,6 @@ In {new} mode: {policy}\n\n\
                    reasoning_effort,
                    reasoning_effort_auto,
                ),
-                ContentBlock::Text {
-                    text,
-                    cache_control: None,
-                },
            ],
        }
    }
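
Review note: the comment above argues that putting the user text first keeps the leading bytes of each user message stable when the date in `turn_meta` changes. The effect can be illustrated by measuring the shared byte prefix of two renderings that differ only in the date. This is a toy model: `render`, `common_prefix_len`, and the `<turn_meta .../>` string are hypothetical, and how the provider actually serializes and caches messages is an assumption taken from that comment.

```rust
/// Number of leading bytes two strings have in common.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

/// Hypothetical flattening of a two-block user message into one string.
fn render(user_text: &str, turn_meta: &str, meta_first: bool) -> String {
    if meta_first {
        format!("{turn_meta}\n{user_text}")
    } else {
        format!("{user_text}\n{turn_meta}")
    }
}
```

With the metadata first, two renderings of the same user text on consecutive days diverge inside the metadata block; with the metadata last, the whole user text stays in the shared prefix.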
@@ -1560,6 +1610,14 @@ In {new} mode: {policy}\n\n\
            .observe_user_message(&content, &self.session.workspace);
        let force_update_plan_first = should_force_update_plan_first(mode, &content);

+        let agent_approval_mode = agent_approval_mode_for_turn(auto_approve, approval_mode);
+        self.session.auto_approve = auto_approve;
+        // Only track the Agent-mode approval — Yolo/Plan have fixed
+        // approval policies that are derived from the mode itself.
+        if mode == AppMode::Agent {
+            self.session.approval_mode = agent_approval_mode;
+        }
+
        // Add user message to session
        let user_msg = self.user_text_message_with_turn_metadata_for_route(
            content,
@@ -1597,15 +1655,10 @@ In {new} mode: {policy}\n\n\
        self.config.trust_mode = trust_mode;
        self.config.translation_enabled = translation_enabled;
        self.config.show_thinking = show_thinking;
-        self.session.auto_approve = auto_approve;
-        self.session.approval_mode = if auto_approve {
-            crate::tui::approval::ApprovalMode::Auto
-        } else {
-            approval_mode
-        };

-        // Update system prompt to match current mode and include persisted compaction context.
-        self.refresh_system_prompt(mode);
+        // Refresh stable prompt context. Current mode is carried by the
+        // request-time runtime prompt projection.
+        self.refresh_system_prompt();
        self.emit_session_updated().await;

        // Build tool registry and tool list for the current mode
@@ -1708,14 +1761,21 @@ In {new} mode: {policy}\n\n\
                    } else {
                        None
                    };
-                    Some(
-                        builder
-                            .with_subagent_tools(
-                                self.subagent_manager.clone(),
-                                runtime.expect("sub-agent runtime should exist with active client"),
-                            )
-                            .build(tool_context),
-                    )
+                    if let Some(subagent_runtime) = runtime {
+                        Some(
+                            builder
+                                .with_subagent_tools(
+                                    self.subagent_manager.clone(),
+                                    subagent_runtime,
+                                )
+                                .build(tool_context),
+                        )
+                    } else {
+                        tracing::warn!(
+                            "Sub-agents enabled but no API client available, falling back to basic tool set"
+                        );
+                        Some(builder.build(tool_context))
+                    }
                } else {
                    Some(builder.build(tool_context))
                }
@@ -2011,10 +2071,15 @@ In {new} mode: {policy}\n\n\
            .await;
    }

-    fn estimated_input_tokens(&self) -> usize {
-        estimate_input_tokens_conservative(
-            &self.session.messages,
+    fn estimated_input_tokens(&mut self) -> usize {
+        // Memoized on (session.messages_revision, system-prompt fingerprint).
+        // The cache invalidates as soon as either input changes; until then
+        // repeated calls (capacity checkpoints, /status, context inspector,
+        // TUI footer) all hit the cached value.
+        self.token_estimate_cache.lookup_or_compute(
+            self.session.messages_revision,
            self.session.system_prompt.as_ref(),
+            &self.session.messages,
        )
    }

@@ -2024,6 +2089,7 @@ In {new} mode: {policy}\n\n\
            && self.estimated_input_tokens() > target_input_budget
        {
            self.session.messages.remove(0);
+            self.session.bump_messages_revision();
            removed = removed.saturating_add(1);
        }
        removed
@@ -2191,6 +2257,7 @@ In {new} mode: {policy}\n\n\
        // Wire search provider config.
        ctx.search_provider = self.config.search_provider;
        ctx.search_api_key = self.config.search_api_key.clone();
+        ctx.search_base_url = self.config.search_base_url.clone();

        let policy = sandbox_policy_for_mode(mode, &self.session.workspace);
        let mut ctx = ctx.with_elevated_sandbox_policy(policy);
@@ -2206,8 +2273,11 @@ In {new} mode: {policy}\n\n\
        if let Some(pool) = self.mcp_pool.as_ref() {
            return Ok(Arc::clone(pool));
        }
-        let mut pool = McpPool::from_config_path(&self.session.mcp_config_path)
-            .map_err(|e| ToolError::execution_failed(format!("Failed to load MCP config: {e}")))?;
+        let mut pool = McpPool::from_config_path_with_workspace(
+            &self.session.mcp_config_path,
+            &self.session.workspace,
+        )
+        .map_err(|e| ToolError::execution_failed(format!("Failed to load MCP config: {e}")))?;
        if let Some(decider) = self.config.network_policy.as_ref() {
            pool = pool.with_network_policy(decider.clone());
        }
@@ -2220,7 +2290,7 @@ In {new} mode: {policy}\n\n\
        let pool = match self.ensure_mcp_pool().await {
            Ok(pool) => pool,
            Err(err) => {
-                let _ = self.tx_event.send(Event::status(err.to_string())).await;
+                let _ = self.tx_event.send(Event::status(format!("{err:#}"))).await;
                return Vec::new();
            }
        };
@@ -2247,15 +2317,20 @@ In {new} mode: {policy}\n\n\
    /// assistant message. Called from `handle_deepseek_turn` before each API
    /// request so the model always has the latest navigation aids.
    async fn layered_context_checkpoint(&mut self) {
-        let Some(ref seam_mgr) = self.seam_manager else {
+        if self.seam_manager.is_none() {
            return;
-        };
-        if !seam_mgr.config().enabled {
+        }
+        if !self.seam_manager.as_ref().unwrap().config().enabled {
            return;
        }

+        // Compute the estimated token count *before* taking a long-lived
+        // `&SeamManager` borrow — `estimated_input_tokens` mutates the
+        // engine's token-estimate cache, which would conflict.
+        let estimated_tokens = self.estimated_input_tokens();
+        let seam_mgr = self.seam_manager.as_ref().unwrap();
        let highest = seam_mgr.highest_level().await;
-        let Some(level) = seam_mgr.seam_level_for(self.estimated_input_tokens(), highest) else {
+        let Some(level) = seam_mgr.seam_level_for(estimated_tokens, highest) else {
            return;
        };

@@ -2342,8 +2417,8 @@ In {new} mode: {policy}\n\n\
            )))
            .await;
    }
-    /// Refresh the system prompt based on current mode and context.
-    fn refresh_system_prompt(&mut self, mode: AppMode) {
+    /// Refresh the stable system prompt based on current non-mode context.
+    fn refresh_system_prompt(&mut self) {
        let user_memory_block =
            crate::memory::compose_block(self.config.memory_enabled, &self.config.memory_path);
        let prompt_goal_objective = goal_objective_for_prompt(
@@ -2351,7 +2426,7 @@ In {new} mode: {policy}\n\n\
            &self.config.goal_state,
        );
        let base = prompts::system_prompt_for_mode_with_context_skills_session_and_approval(
-            mode,
+            AppMode::Agent,
            &self.config.workspace,
            None,
            Some(&self.config.skills_dir),
@@ -2366,7 +2441,6 @@ In {new} mode: {policy}\n\n\
                show_thinking: self.config.show_thinking,
                allow_shell: self.session.allow_shell,
            },
-            self.session.approval_mode,
        );
        let mut stable_prompt =
            merge_system_prompts(Some(&base), self.session.compaction_summary_prompt.clone());
@@ -2384,7 +2458,6 @@ In {new} mode: {policy}\n\n\

        let stable_hash = system_prompt_hash(stable_prompt.as_ref());
        if self.session.system_prompt_override {
-            self.session.last_system_prompt_hash = Some(stable_hash);
            return;
        }
        if self.session.last_system_prompt_hash != Some(stable_hash) {
@@ -2532,13 +2605,10 @@ fn goal_objective_for_prompt(
 ) -> Option<String> {
    match goal_state.lock() {
        Ok(state) => {
-            if state.objective().is_some() {
-                return state.is_active().then(|| {
-                    state
-                        .objective()
-                        .expect("checked goal objective")
-                        .to_string()
-                });
+            if let Some(objective) = state.objective() {
+                // Preserve original behavior: return None (not fallback) when
+                // objective exists but goal is inactive.
+                return state.is_active().then(|| objective.to_string());
            }
        }
        Err(err) => tracing::warn!("goal state lock poisoned while building prompt: {err}"),
@@ -2546,6 +2616,59 @@ fn goal_objective_for_prompt(
    normalized_goal_objective(configured_goal)
 }

+// ── Mode & approval prompts as request-time runtime metadata ─────────
+//
+// Mode contracts and approval policies are not persisted in the session
+// history and are not sent as extra system messages. Instead, each API
+// request projects a transient user-role runtime metadata message at the
+// tail. The stable system prompt remains byte-stable, stored history remains
+// byte-stable, and strict chat-template providers never see a system message
+// outside messages[0].
+
+fn approval_mode_for(
+    mode: AppMode,
+    session_approval: crate::tui::approval::ApprovalMode,
+) -> crate::tui::approval::ApprovalMode {
+    match mode {
+        AppMode::Yolo => crate::tui::approval::ApprovalMode::Auto,
+        AppMode::Plan => crate::tui::approval::ApprovalMode::Never,
+        AppMode::Agent => session_approval,
+    }
+}
+
+fn agent_approval_mode_for_turn(
+    auto_approve: bool,
+    approval_mode: crate::tui::approval::ApprovalMode,
+) -> crate::tui::approval::ApprovalMode {
+    if auto_approve {
+        crate::tui::approval::ApprovalMode::Auto
+    } else {
+        approval_mode
+    }
+}
+
+/// Produce a minimal runtime-policy tag for the per-turn transient user message.
+///
+/// All mode and approval policy descriptions live in the frozen system-prompt
+/// prefix (`render_runtime_policy_reference()`). This tag is only a pointer:
+/// the model looks up the corresponding rules in the system prompt. This
+/// reduces per-request overhead from ~500 tokens to ~12 tokens.
+fn runtime_prompt_text(mode: AppMode, approval_mode: crate::tui::approval::ApprovalMode) -> String {
+    let mode_str = match mode {
+        AppMode::Agent => "agent",
+        AppMode::Plan => "plan",
+        AppMode::Yolo => "yolo",
+    };
+    let approval_str = match approval_mode {
+        crate::tui::approval::ApprovalMode::Auto => "auto",
+        crate::tui::approval::ApprovalMode::Suggest => "suggest",
+        crate::tui::approval::ApprovalMode::Never => "never",
+    };
+    format!(
+        "<runtime_prompt visibility=\"internal\" mode=\"{mode_str}\" approval=\"{approval_str}\"/>"
+    )
+}
+
 /// Spawn the engine in a background task
 pub fn spawn_engine(config: EngineConfig, api_config: &Config) -> EngineHandle {
    let (engine, handle) = Engine::new(config, api_config);
@@ -2609,6 +2732,7 @@ pub(crate) fn mock_engine_handle() -> MockEngineHandle {
    let cancel_token = CancellationToken::new();
    let shared_cancel_token = Arc::new(StdMutex::new(cancel_token.clone()));
    let cancel_reason: Arc<StdMutex<Option<CancelReason>>> = Arc::new(StdMutex::new(None));
+    let shared_paused = Arc::new(StdMutex::new(false));
    let handle = EngineHandle {
        tx_op,
        rx_event: Arc::new(RwLock::new(rx_event)),
@@ -2617,6 +2741,7 @@ pub(crate) fn mock_engine_handle() -> MockEngineHandle {
        tx_approval,
        tx_user_input,
        tx_steer,
+        shared_paused,
    };

    MockEngineHandle {
@@ -2636,17 +2761,19 @@ mod handle;
 pub(crate) use context::compact_tool_result_for_context;
 use context::{
    COMPACTION_SUMMARY_MARKER, MAX_CONTEXT_RECOVERY_ATTEMPTS, MIN_RECENT_MESSAGES_TO_KEEP,
-    context_input_budget, effective_max_output_tokens, estimate_input_tokens_conservative,
-    extract_compaction_summary_prompt, is_context_length_error_message, summarize_text,
+    context_input_budget, effective_max_output_tokens, extract_compaction_summary_prompt,
+    is_context_length_error_message, summarize_text,
 };
 mod dispatch;
 mod loop_guard;
 mod lsp_hooks;
 mod streaming;
+mod token_estimate_cache;
 mod tool_catalog;
 mod tool_execution;
 mod tool_setup;
 mod turn_loop;
+pub(crate) use token_estimate_cache::TokenEstimateCache;

 pub(crate) fn default_active_native_tool_names() -> &'static [&'static str] {
    tool_catalog::DEFAULT_ACTIVE_NATIVE_TOOLS
@@ -2671,7 +2798,7 @@ use self::streaming::{
    ContentBlockKind, FAKE_WRAPPER_NOTICE, MAX_STREAM_ERRORS_BEFORE_FAIL,
    MAX_TRANSPARENT_STREAM_RETRIES, STREAM_MAX_CONTENT_BYTES, STREAM_MAX_DURATION_SECS,
    ToolUseState, contains_fake_tool_wrapper, filter_tool_call_delta,
-    should_transparently_retry_stream, stream_chunk_timeout_secs,
+    should_transparently_retry_stream,
 };
 use self::tool_catalog::{
    CODE_EXECUTION_TOOL_NAME, JS_EXECUTION_TOOL_NAME, MULTI_TOOL_PARALLEL_NAME,
@@ -16,9 +16,8 @@ impl Engine {
        client: Option<&DeepSeekClient>,
        mode: AppMode,
    ) -> bool {
-        let snapshot = self
-            .capacity_controller
-            .observe_pre_turn(self.capacity_observation(turn));
+        let observation = self.capacity_observation(turn);
+        let snapshot = self.capacity_controller.observe_pre_turn(observation);
        let decision = self
            .capacity_controller
            .decide(self.turn_counter, snapshot.as_ref());
@@ -37,16 +36,15 @@ impl Engine {
    pub(super) async fn run_capacity_post_tool_checkpoint(
        &mut self,
        turn: &TurnContext,
-        mode: AppMode,
+
        tool_registry: Option<&crate::tools::ToolRegistry>,
        tool_exec_lock: Arc<RwLock<()>>,
        mcp_pool: Option<Arc<AsyncMutex<McpPool>>>,
        _step_error_count: usize,
        _consecutive_tool_error_steps: u32,
    ) -> bool {
-        let snapshot = self
-            .capacity_controller
-            .observe_post_tool(self.capacity_observation(turn));
+        let observation = self.capacity_observation(turn);
+        let snapshot = self.capacity_controller.observe_post_tool(observation);
        let decision = self
            .capacity_controller
            .decide(self.turn_counter, snapshot.as_ref());
@@ -58,7 +56,6 @@ impl Engine {
                let _ = self
                    .apply_verify_with_tool_replay(
                        turn,
-                        mode,
                        snapshot.as_ref(),
                        tool_registry,
                        tool_exec_lock,
@@ -68,7 +65,7 @@ impl Engine {
                false
            }
            GuardrailAction::VerifyAndReplan => {
-                self.apply_verify_and_replan(turn, mode, snapshot.as_ref(), "high_risk_post_tool")
+                self.apply_verify_and_replan(turn, snapshot.as_ref(), "high_risk_post_tool")
                    .await
            }
            GuardrailAction::NoIntervention | GuardrailAction::TargetedContextRefresh => false,
@@ -78,7 +75,7 @@ impl Engine {
    pub(super) async fn run_capacity_error_escalation_checkpoint(
        &mut self,
        turn: &TurnContext,
-        mode: AppMode,
+
        step_error_count: usize,
        consecutive_tool_error_steps: u32,
        error_categories: &[ErrorCategory],
@@ -111,8 +108,8 @@ impl Engine {
            .last_snapshot()
            .cloned()
            .or_else(|| {
-                self.capacity_controller
-                    .observe_pre_turn(self.capacity_observation(turn))
+                let observation = self.capacity_observation(turn);
+                self.capacity_controller.observe_pre_turn(observation)
            });
        let Some(snapshot) = snapshot else {
            return false;
@@ -138,7 +135,6 @@ impl Engine {
        let category_labels: Vec<String> = error_categories.iter().map(|c| c.to_string()).collect();
        self.apply_verify_and_replan(
            turn,
-            mode,
            Some(&forced),
            &format!(
                "error_escalation: step_errors={}, consecutive_steps={}, categories={}",
@@ -150,7 +146,7 @@ impl Engine {
        .await
    }

-    pub(super) fn capacity_observation(&self, turn: &TurnContext) -> CapacityObservationInput {
+    pub(super) fn capacity_observation(&mut self, turn: &TurnContext) -> CapacityObservationInput {
        let message_window = self.config.capacity.profile_window.max(8) * 3;
        let action_count_this_turn = usize::try_from(turn.step)
            .unwrap_or(usize::MAX)
@@ -387,7 +383,7 @@ impl Engine {
        &mut self,
        turn: &TurnContext,
        client: Option<&DeepSeekClient>,
-        mode: AppMode,
+        _mode: AppMode,
        snapshot: Option<&CapacitySnapshot>,
    ) -> bool {
        let before_tokens = self.estimated_input_tokens();
@@ -467,7 +463,7 @@ impl Engine {
            GuardrailAction::TargetedContextRefresh,
            None,
        )));
-        self.refresh_system_prompt(mode);
+        self.refresh_system_prompt();
        self.emit_session_updated().await;

        let after_tokens = self.estimated_input_tokens();
@@ -489,7 +485,6 @@ impl Engine {
    pub(super) async fn apply_verify_with_tool_replay(
        &mut self,
        turn: &TurnContext,
-        mode: AppMode,
        snapshot: Option<&CapacitySnapshot>,
        tool_registry: Option<&crate::tools::ToolRegistry>,
        tool_exec_lock: Arc<RwLock<()>>,
@@ -619,7 +614,7 @@ impl Engine {
            GuardrailAction::VerifyWithToolReplay,
            Some(&verification_note),
        )));
-        self.refresh_system_prompt(mode);
+        self.refresh_system_prompt();
        self.emit_session_updated().await;

        let after_tokens = self.estimated_input_tokens();
@@ -640,7 +635,6 @@ impl Engine {
    pub(super) async fn apply_verify_and_replan(
        &mut self,
        turn: &TurnContext,
-        mode: AppMode,
        snapshot: Option<&CapacitySnapshot>,
        reason: &str,
    ) -> bool {
@@ -659,34 +653,18 @@ impl Engine {
            .persist_capacity_record(turn, GuardrailAction::VerifyAndReplan, &record)
            .await;

-        let latest_user = self
-            .session
-            .messages
-            .iter()
-            .rev()
-            .find(|msg| {
-                msg.role == "user"
-                    && msg
-                        .content
-                        .iter()
-                        .any(|block| matches!(block, ContentBlock::Text { .. }))
-            })
-            .cloned();
-        let latest_verified = self
-            .session
-            .messages
-            .iter()
-            .rev()
-            .find(|msg| {
-                msg.role == "user"
-                    && msg.content.iter().any(|block| match block {
-                        ContentBlock::ToolResult { content, .. } => {
-                            content.contains("[verification replay]")
-                        }
-                        _ => false,
-                    })
-            })
-            .cloned();
+        // The replan path needs the *full* messages, not summaries.
+        // `scan_canonical_inputs` locates both indices in a single reverse
+        // pass; clone each message from the live `messages` slice once.
+        // Pass `true` because the replan path consumes
+        // `latest_verified_user_idx` below.
+        let scan = scan_canonical_inputs(&self.session.messages, true);
+        let latest_user = scan
+            .latest_user_text_idx
+            .and_then(|idx| self.session.messages.get(idx).cloned());
+        let latest_verified = scan
+            .latest_verified_user_idx
+            .and_then(|idx| self.session.messages.get(idx).cloned());

        self.session.messages.clear();
        if let Some(msg) = latest_user {
@@ -695,6 +673,7 @@ impl Engine {
        if let Some(msg) = latest_verified {
            self.session.messages.push(msg);
        }
+        self.session.bump_messages_revision();

        self.merge_compaction_summary(Some(self.canonical_prompt(
            &canonical,
@@ -702,7 +681,7 @@ impl Engine {
            GuardrailAction::VerifyAndReplan,
            Some("Replan now from canonical state. Keep steps minimal and verifiable."),
        )));
-        self.refresh_system_prompt(mode);
+        self.refresh_system_prompt();
        self.emit_session_updated().await;

        let _ = self
@@ -765,20 +744,18 @@ impl Engine {
        turn: &TurnContext,
        note: Option<&str>,
    ) -> CanonicalState {
-        let goal = self
-            .session
-            .messages
-            .iter()
-            .rev()
-            .find_map(|msg| {
-                if msg.role != "user" {
-                    return None;
-                }
-                msg.content.iter().find_map(|block| match block {
-                    ContentBlock::Text { text, .. } => Some(summarize_text(text, 220)),
-                    _ => None,
-                })
-            })
+        // A single reverse scan of session.messages collects the goal and
+        // the confirmed facts (capped at 4). Previously this function
+        // walked the history twice: one `.iter().rev().find_map()` pass
+        // for the goal and a second reverse loop for the facts. The
+        // dedicated scan below replaces both with one pass that also
+        // early-exits once every requested collector is satisfied. We pass
+        // `false` here because build_canonical_state does not consume
+        // `latest_verified_user_idx`, so we don't need the scan to keep
+        // looking for it.
+        let scan = scan_canonical_inputs(&self.session.messages, false);
+        let goal = scan
+            .goal
            .unwrap_or_else(|| "Continue current task from compact state".to_string());

        let mut constraints = vec![
@@ -789,24 +766,6 @@ impl Engine {
            constraints.push(summarize_text(note, 180));
        }

-        let mut confirmed_facts = Vec::new();
-        for msg in self.session.messages.iter().rev() {
-            for block in &msg.content {
-                if let ContentBlock::ToolResult { content, .. } = block {
-                    if content.starts_with("Error:") {
-                        continue;
-                    }
-                    confirmed_facts.push(summarize_text(content, 180));
-                    if confirmed_facts.len() >= 4 {
-                        break;
-                    }
-                }
-            }
-            if confirmed_facts.len() >= 4 {
-                break;
-            }
-        }
-
        let open_loops: Vec<String> = turn
            .tool_calls
            .iter()
@@ -837,7 +796,7 @@ impl Engine {
        CanonicalState {
            goal,
            constraints,
-            confirmed_facts,
+            confirmed_facts: scan.confirmed_facts,
            open_loops,
            pending_actions,
            critical_refs,
@@ -975,3 +934,243 @@ impl Engine {
        self.merge_compaction_summary(Some(prompt));
    }
 }
+
+/// Maximum number of confirmed-fact snippets retained by the canonical-state
+/// scan. Matches the prior `build_canonical_state` behavior — only the
+/// four most recent non-error tool results are surfaced.
+const CANONICAL_SCAN_MAX_FACTS: usize = 4;
+
+/// Output of [`scan_canonical_inputs`]: everything `build_canonical_state`
+/// and `apply_verify_and_replan` need to know about the session's recent
+/// history, collected in a single reverse pass over `session.messages`.
+///
+/// Index fields (`latest_user_text_idx`, `latest_verified_user_idx`) point
+/// into the original `messages` slice so the caller can clone the full
+/// `Message` value when the re-plan path needs to keep it across a
+/// `messages.clear()`.
+#[derive(Debug, Default)]
+struct CanonicalStateScan {
+    /// Most recent user-text block, summarized to ≤220 chars. `None` when
+    /// no user message with a Text block exists.
+    goal: Option<String>,
+    /// Index of the most recent user message containing at least one
+    /// `Text` content block. Used by the re-plan path to keep the
+    /// latest user request across a `messages.clear()`.
+    latest_user_text_idx: Option<usize>,
+    /// Index of the most recent user message whose content includes a
+    /// `[verification replay]` tool result. Used by the re-plan path.
+    latest_verified_user_idx: Option<usize>,
+    /// Up to [`CANONICAL_SCAN_MAX_FACTS`] most recent non-error
+    /// `ToolResult` snippets, newest first.
+    confirmed_facts: Vec<String>,
+    /// Running count of facts collected so far; lets the early-exit
+    /// condition avoid an extra `Vec::len()` call per message.
+    facts_collected: usize,
+}
+
+impl CanonicalStateScan {
+    /// `true` once every collector the caller actually needs is satisfied.
+    ///
+    /// `find_verified` controls whether `latest_verified_user_idx` is part
+    /// of the early-exit gate. The build_canonical_state path does not
+    /// consume that field, so passing `false` lets the scan stop as soon
+    /// as the goal and `CANONICAL_SCAN_MAX_FACTS` facts are found, instead
+    /// of walking a long history that contains no verification replay.
+    fn is_complete(&self, find_verified: bool) -> bool {
+        self.goal.is_some()
+            && (!find_verified || self.latest_verified_user_idx.is_some())
+            && self.facts_collected >= CANONICAL_SCAN_MAX_FACTS
+    }
+}
+
+/// Walk `messages` once (in reverse) and collect everything the canonical
+/// state and re-plan paths need. Replaces the previous pattern of three
+/// independent reverse scans: one for the goal, one for confirmed facts,
+/// and one for the latest verified user message.
+///
+/// `find_verified` controls whether the scan bothers locating the
+/// latest verified user message. Callers that don't need it (e.g.
+/// `build_canonical_state`) should pass `false` so the early-exit
+/// condition can fire as soon as the goal + facts are gathered.
+fn scan_canonical_inputs(messages: &[Message], find_verified: bool) -> CanonicalStateScan {
+    let mut scan = CanonicalStateScan::default();
+    for (idx, msg) in messages.iter().enumerate().rev() {
+        if msg.role == "user" {
+            if scan.goal.is_none()
+                && let Some(text) = msg.content.iter().find_map(|b| match b {
+                    ContentBlock::Text { text, .. } => Some(text.as_str()),
+                    _ => None,
+                })
+            {
+                scan.goal = Some(summarize_text(text, 220));
+                scan.latest_user_text_idx = Some(idx);
+            }
+            if find_verified && scan.latest_verified_user_idx.is_none() {
+                let verified = msg.content.iter().any(|b| match b {
+                    ContentBlock::ToolResult { content, .. } => {
+                        content.contains("[verification replay]")
+                    }
+                    _ => false,
+                });
+                if verified {
+                    scan.latest_verified_user_idx = Some(idx);
+                }
+            }
+        }
+        if scan.facts_collected < CANONICAL_SCAN_MAX_FACTS {
+            for block in &msg.content {
+                if let ContentBlock::ToolResult { content, .. } = block
+                    && !content.starts_with("Error:")
+                {
+                    scan.confirmed_facts.push(summarize_text(content, 180));
+                    scan.facts_collected = scan.facts_collected.saturating_add(1);
+                    if scan.facts_collected >= CANONICAL_SCAN_MAX_FACTS {
+                        break;
+                    }
+                }
+            }
+        }
+        if scan.is_complete(find_verified) {
+            break;
+        }
+    }
+    scan
+}
+
+#[cfg(test)]
+mod canonical_scan_tests {
+    use super::*;
+    use crate::models::ContentBlock;
+
+    fn user_text_msg(text: &str) -> Message {
+        Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::Text {
+                text: text.to_string(),
+                cache_control: None,
+            }],
+        }
+    }
+
+    fn user_with_verified_replay(text: &str) -> Message {
+        Message {
+            role: "user".to_string(),
+            content: vec![
+                ContentBlock::Text {
+                    text: text.to_string(),
+                    cache_control: None,
+                },
+                ContentBlock::ToolResult {
+                    tool_use_id: "x".to_string(),
+                    content: "[verification replay] pass=true".to_string(),
+                    is_error: None,
+                    content_blocks: None,
+                },
+            ],
+        }
+    }
+
+    fn tool_result_msg(content: &str) -> Message {
+        Message {
+            role: "tool".to_string(),
+            content: vec![ContentBlock::ToolResult {
+                tool_use_id: "x".to_string(),
+                content: content.to_string(),
+                is_error: None,
+                content_blocks: None,
+            }],
+        }
+    }
+
+    #[test]
+    fn scan_returns_goal_for_latest_user_text() {
+        let messages = vec![
+            user_text_msg("first"),
+            tool_result_msg("a"),
+            user_text_msg("second"),
+            tool_result_msg("b"),
+            user_text_msg("third"),
+        ];
+        let scan = scan_canonical_inputs(&messages, false);
+        // Goal should be the most recent user text.
+        let goal = scan.goal.expect("goal");
+        assert!(
+            goal.contains("third"),
+            "expected the most recent, got {goal}"
+        );
+        assert_eq!(scan.latest_user_text_idx, Some(4));
+    }
+
+    #[test]
+    fn scan_collects_up_to_four_facts_newest_first() {
+        let messages = vec![
+            tool_result_msg("fact-A"),
+            tool_result_msg("fact-B"),
+            tool_result_msg("fact-C"),
+            tool_result_msg("fact-D"),
+            tool_result_msg("fact-E"),
+        ];
+        let scan = scan_canonical_inputs(&messages, false);
+        assert_eq!(scan.confirmed_facts.len(), 4);
+        // The four most recent (newest first) are E, D, C, B.
+        assert!(scan.confirmed_facts[0].contains("fact-E"));
+        assert!(scan.confirmed_facts[1].contains("fact-D"));
+        assert!(scan.confirmed_facts[2].contains("fact-C"));
+        assert!(scan.confirmed_facts[3].contains("fact-B"));
+    }
+
+    #[test]
+    fn scan_skips_error_results() {
+        let messages = vec![
+            tool_result_msg("good-A"),
+            tool_result_msg("Error: bad"),
+            tool_result_msg("good-B"),
+        ];
+        let scan = scan_canonical_inputs(&messages, false);
+        assert_eq!(scan.confirmed_facts.len(), 2);
+        assert!(scan.confirmed_facts[0].contains("good-B"));
+        assert!(scan.confirmed_facts[1].contains("good-A"));
+    }
+
+    #[test]
+    fn scan_finds_latest_verified_user_message() {
+        let messages = vec![
+            user_text_msg("first"),
+            user_with_verified_replay("verified"),
+            user_text_msg("third"),
+        ];
+        let scan = scan_canonical_inputs(&messages, true);
+        // The verified marker is on the *middle* message, not the most
+        // recent. The scan should report its actual position.
+        assert_eq!(scan.latest_verified_user_idx, Some(1));
+        // The goal still points at the most recent user text.
+        assert!(scan.goal.as_deref().unwrap_or("").contains("third"));
+    }
+
+    #[test]
+    fn scan_handles_empty_input() {
+        let scan = scan_canonical_inputs(&[], false);
+        assert!(scan.goal.is_none());
+        assert!(scan.latest_verified_user_idx.is_none());
+        assert!(scan.latest_user_text_idx.is_none());
+        assert!(scan.confirmed_facts.is_empty());
+    }
+
+    #[test]
+    fn scan_early_exits_when_complete() {
+        // 1000 tool results — the scan should stop walking once the
+        // first 4 facts and a goal are found. We can't directly assert
+        // "didn't visit every element" without instrumentation, but the
+        // call must return promptly with the expected result. We pass
+        // `find_verified=false` so the scan does not have to keep
+        // walking looking for a verified user message that isn't there.
+        let mut messages: Vec<Message> = (0..1000)
+            .map(|i| tool_result_msg(&format!("fact-{i}")))
+            .collect();
+        // Most recent user message comes last.
+        messages.push(user_text_msg("goal"));
+        let scan = scan_canonical_inputs(&messages, false);
+        assert!(scan.goal.as_deref().unwrap_or("").contains("goal"));
+        assert_eq!(scan.confirmed_facts.len(), 4);
+    }
+}
@@ -525,10 +525,12 @@ pub(super) fn extract_compaction_summary_prompt(
    }
 }

+#[allow(dead_code)] // exposed for future engine-side callers; current call path goes through compaction::estimate_input_tokens_conservative via token_estimate_cache.
 fn estimate_text_tokens_conservative(text: &str) -> usize {
    text.chars().count().div_ceil(3)
 }

+#[allow(dead_code)] // see estimate_text_tokens_conservative above
 fn estimate_system_tokens_conservative(system: Option<&SystemPrompt>) -> usize {
    match system {
        Some(SystemPrompt::Text(text)) => estimate_text_tokens_conservative(text),
@@ -540,6 +542,7 @@ fn estimate_system_tokens_conservative(system: Option<&SystemPrompt>) -> usize {
    }
 }

+#[allow(dead_code)] // see estimate_text_tokens_conservative above
 pub(super) fn estimate_input_tokens_conservative(
    messages: &[Message],
    system: Option<&SystemPrompt>,
@@ -51,6 +51,24 @@ impl EngineHandle {
        }
    }

+    /// Pause or resume the current pausable command.
+    pub fn set_paused(&self, paused: bool) {
+        match self.shared_paused.lock() {
+            Ok(mut slot) => *slot = paused,
+            Err(poisoned) => *poisoned.into_inner() = paused,
+        }
+    }
+
+    /// Check whether the engine pause gate is set.
+    #[cfg(test)]
+    #[must_use]
+    pub fn is_paused(&self) -> bool {
+        match self.shared_paused.lock() {
+            Ok(slot) => *slot,
+            Err(poisoned) => *poisoned.into_inner(),
+        }
+    }
+
    /// Approve a pending tool call
    pub async fn approve_tool_call(&self, id: impl Into<String>) -> Result<()> {
        self.tx_approval
@@ -22,26 +22,6 @@ pub(super) struct ToolUseState {
    pub(super) input_buffer: String,
 }

-/// Default maximum time to wait for a single stream chunk before assuming a stall.
-/// **This is the idle timeout** — it resets on every SSE chunk, so long
-/// thinking turns that ARE producing reasoning_content stay alive. Only a
-/// genuine `chunk_timeout` window of silence kills the stream.
-const DEFAULT_STREAM_CHUNK_TIMEOUT_SECS: u64 = 300;
-const MIN_STREAM_CHUNK_TIMEOUT_SECS: u64 = 1;
-const MAX_STREAM_CHUNK_TIMEOUT_SECS: u64 = 3600;
-const STREAM_IDLE_TIMEOUT_ENV: &str = "DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS";
-
-/// Reads the shared stream idle-timeout override used by the SSE client.
-pub(super) fn stream_chunk_timeout_secs() -> u64 {
-    stream_chunk_timeout_secs_from_env(std::env::var(STREAM_IDLE_TIMEOUT_ENV).ok().as_deref())
-}
-
-fn stream_chunk_timeout_secs_from_env(value: Option<&str>) -> u64 {
-    value
-        .and_then(|v| v.parse::<u64>().ok())
-        .unwrap_or(DEFAULT_STREAM_CHUNK_TIMEOUT_SECS)
-        .clamp(MIN_STREAM_CHUNK_TIMEOUT_SECS, MAX_STREAM_CHUNK_TIMEOUT_SECS)
-}
 /// Maximum total bytes of text/thinking content before aborting the stream.
 pub(super) const STREAM_MAX_CONTENT_BYTES: usize = 10 * 1024 * 1024; // 10 MB
 /// Sanity backstop for total stream wall-clock duration. **Not** a routine
@@ -150,20 +130,3 @@ pub(crate) fn filter_tool_call_delta(delta: &str, in_tool_call: &mut bool) -> St

    output
 }
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn stream_chunk_timeout_defaults_and_clamps_env_values() {
-        assert_eq!(stream_chunk_timeout_secs_from_env(None), 300);
-        assert_eq!(
-            stream_chunk_timeout_secs_from_env(Some("not-a-number")),
-            300
-        );
-        assert_eq!(stream_chunk_timeout_secs_from_env(Some("0")), 1);
-        assert_eq!(stream_chunk_timeout_secs_from_env(Some("90")), 90);
-        assert_eq!(stream_chunk_timeout_secs_from_env(Some("99999")), 3600);
-    }
-}
@@ -3,6 +3,7 @@ use super::*;
 use super::context::TURN_MAX_OUTPUT_TOKENS;
 use crate::models::SystemBlock;
 use crate::test_support::lock_test_env;
+use crate::tools::plan::{PlanItemArg, PlanSnapshot, StepStatus};
 use crate::tools::spec::ToolCapability;
 use serde_json::json;
 use std::collections::{HashMap, HashSet};
@@ -84,6 +85,45 @@ fn build_engine_with_capacity(capacity: CapacityControllerConfig) -> Engine {
    engine
 }

+#[test]
+fn structured_state_block_includes_rich_plan_artifact() {
+    let state = StructuredState {
+        mode_label: "Plan".to_string(),
+        workspace: PathBuf::from("/workspace/codewhale"),
+        cwd: None,
+        working_set_summary: None,
+        todo_snapshot: None,
+        plan_snapshot: Some(PlanSnapshot {
+            objective: Some("Make Plan mode reviewable".to_string()),
+            context_summary: Some("Grounded in issue #2691".to_string()),
+            sources_used: vec!["gh issue view 2691".to_string()],
+            critical_files: vec!["crates/tui/src/tools/plan.rs".to_string()],
+            constraints: vec!["Preserve legacy payloads".to_string()],
+            recommended_approach: Some("Enrich update_plan".to_string()),
+            verification_plan: Some("Run focused tests".to_string()),
+            risks_and_unknowns: Some("Replay may drift".to_string()),
+            handoff_packet: Some("Next agent should inspect replay".to_string()),
+            items: vec![PlanItemArg {
+                step: "Render rich artifact".to_string(),
+                status: StepStatus::InProgress,
+            }],
+            ..PlanSnapshot::default()
+        }),
+        subagent_snapshots: Vec::new(),
+    };
+
+    let block = state.to_system_block().expect("fork state block");
+
+    assert!(block.contains("Objective: Make Plan mode reviewable"));
+    assert!(block.contains("Context: Grounded in issue #2691"));
+    assert!(block.contains("Source: gh issue view 2691"));
+    assert!(block.contains("Critical file: crates/tui/src/tools/plan.rs"));
+    assert!(block.contains("Constraint: Preserve legacy payloads"));
+    assert!(block.contains("Verification plan: Run focused tests"));
+    assert!(block.contains("Handoff packet: Next agent should inspect replay"));
+    assert!(block.contains("- [~] Render rich artifact"));
+}
+
 #[test]
 fn env_only_auth_error_gets_recovery_hint() {
    let _guard = lock_test_env();
@@ -263,7 +303,7 @@ fn refresh_system_prompt_uses_runtime_goal_state() {
        goal.create("Close the runtime goal loop".to_string(), None);
    }

-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();
    let prompt = match engine.session.system_prompt {
        Some(SystemPrompt::Text(text)) => text,
        Some(SystemPrompt::Blocks(blocks)) => blocks
@@ -465,116 +505,36 @@ fn tool_exec_outcome_tracks_duration() {
 #[test]
 fn core_native_tools_stay_loaded_in_yolo_mode() {
    let always_load = HashSet::new();
-    assert!(!should_default_defer_tool(
-        "exec_shell",
-        AppMode::Yolo,
-        &always_load
-    ));
+    assert!(!should_default_defer_tool("exec_shell", &always_load));
    // git_blame remains deferred (read-only git history beyond log/show/diff).
-    assert!(should_default_defer_tool(
-        "git_blame",
-        AppMode::Yolo,
-        &always_load
-    ));
+    assert!(should_default_defer_tool("git_blame", &always_load));
 }

 #[test]
 fn non_yolo_mode_retains_default_defer_policy() {
    let always_load = HashSet::new();
-    assert!(!should_default_defer_tool(
-        "exec_shell",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "edit_file",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "apply_patch",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "fetch_url",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "git_diff",
-        AppMode::Agent,
-        &always_load
-    ));
+    assert!(!should_default_defer_tool("exec_shell", &always_load));
+    assert!(!should_default_defer_tool("edit_file", &always_load));
+    assert!(!should_default_defer_tool("apply_patch", &always_load));
+    assert!(!should_default_defer_tool("fetch_url", &always_load));
+    assert!(!should_default_defer_tool("git_diff", &always_load));
    // #2654: read-only git history joins the active set.
-    assert!(!should_default_defer_tool(
-        "git_log",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "git_show",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "git_status",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "run_tests",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "agent_open",
-        AppMode::Agent,
-        &always_load
-    ));
+    assert!(!should_default_defer_tool("git_log", &always_load));
+    assert!(!should_default_defer_tool("git_show", &always_load));
+    assert!(!should_default_defer_tool("git_status", &always_load));
+    assert!(!should_default_defer_tool("run_tests", &always_load));
+    assert!(!should_default_defer_tool("agent_open", &always_load));
    // #2605: the fetch/close side of the sub-agent surface must also stay
    // active so a first `agent_eval`/`agent_close` executes instead of
    // hydrating its schema and forcing a double-invoke.
-    assert!(!should_default_defer_tool(
-        "agent_eval",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "agent_close",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "read_file",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "web_search",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "write_file",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "task_shell_start",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(!should_default_defer_tool(
-        "task_shell_wait",
-        AppMode::Agent,
-        &always_load
-    ));
-    assert!(should_default_defer_tool(
-        "git_blame",
-        AppMode::Agent,
-        &always_load
-    ));
+    assert!(!should_default_defer_tool("agent_eval", &always_load));
+    assert!(!should_default_defer_tool("agent_close", &always_load));
+    assert!(!should_default_defer_tool("read_file", &always_load));
+    assert!(!should_default_defer_tool("web_search", &always_load));
+    assert!(!should_default_defer_tool("write_file", &always_load));
+    assert!(!should_default_defer_tool("task_shell_start", &always_load));
+    assert!(!should_default_defer_tool("task_shell_wait", &always_load));
+    assert!(should_default_defer_tool("git_blame", &always_load));
 }

 #[test]
@@ -775,11 +735,7 @@ fn agent_catalog_keeps_edit_file_loaded_when_fuzz_is_omitted() {
 #[test]
 fn tools_always_load_overrides_default_native_deferral() {
    let always_load = HashSet::from(["git_blame".to_string()]);
-    assert!(!should_default_defer_tool(
-        "git_blame",
-        AppMode::Agent,
-        &always_load
-    ));
+    assert!(!should_default_defer_tool("git_blame", &always_load));
 }

 #[test]
@@ -1755,15 +1711,20 @@ async fn change_mode_refreshes_session_prompt_and_updates_session() {
        .await
        .expect("send change mode");

-    let prompt = {
+    let (_prompt, messages) = {
        let mut rx = handle.rx_event.write().await;
        loop {
            let event = tokio::time::timeout(std::time::Duration::from_secs(1), rx.recv())
                .await
                .expect("session update after mode switch")
                .expect("event");
-            if let Event::SessionUpdated { system_prompt, .. } = event {
-                break match system_prompt.expect("system prompt") {
+            if let Event::SessionUpdated {
+                system_prompt,
+                messages,
+                ..
+            } = event
+            {
+                let prompt = match system_prompt.expect("system prompt") {
                    SystemPrompt::Text(text) => text,
                    SystemPrompt::Blocks(blocks) => blocks
                        .into_iter()
@@ -1771,17 +1732,102 @@ async fn change_mode_refreshes_session_prompt_and_updates_session() {
                        .collect::<Vec<_>>()
                        .join("\n"),
                };
+                break (prompt, messages);
            }
        }
    };
    run.abort();

-    assert!(prompt.contains("Mode: YOLO"));
-    assert!(prompt.contains("Approval Policy: Auto"));
+    assert!(
+        messages.iter().all(|message| message.role != "system"),
+        "mode switch must not persist appended system messages: {messages:?}"
+    );
+    assert!(
+        messages.iter().all(|message| {
+            message.content.iter().all(|block| {
+                !matches!(
+                    block,
+                    ContentBlock::Text { text, .. }
+                        if text.contains("<runtime_prompt")
+                )
+            })
+        }),
+        "runtime prompt tags should be request-time metadata, not session history"
+    );
+}
+
+#[test]
+fn turn_approval_mode_prefers_auto_approve_flag() {
+    use crate::tui::approval::ApprovalMode;
+
+    assert_eq!(
+        agent_approval_mode_for_turn(true, ApprovalMode::Suggest),
+        ApprovalMode::Auto
+    );
+    assert_eq!(
+        approval_mode_for(
+            AppMode::Agent,
+            agent_approval_mode_for_turn(true, ApprovalMode::Never),
+        ),
+        ApprovalMode::Auto
+    );
+    assert_eq!(
+        approval_mode_for(AppMode::Yolo, ApprovalMode::Suggest),
+        ApprovalMode::Auto
+    );
+    assert_eq!(
+        approval_mode_for(AppMode::Plan, ApprovalMode::Auto),
+        ApprovalMode::Never
+    );
+}
+
+#[test]
+fn runtime_prompt_is_projected_without_persisting_to_session_messages() {
+    use crate::tui::approval::ApprovalMode;
+
+    let tmp = tempdir().expect("tempdir");
+    let config = EngineConfig {
+        workspace: tmp.path().to_path_buf(),
+        ..Default::default()
+    };
+    let (mut engine, _handle) = Engine::new(config, &Config::default());
+    engine.current_mode = AppMode::Plan;
+    engine.session.approval_mode = ApprovalMode::Suggest;
+    engine.session.messages = vec![Message {
+        role: "user".to_string(),
+        content: vec![ContentBlock::Text {
+            text: "summary after compaction".to_string(),
+            cache_control: None,
+        }],
+    }];
+    let stored = engine.session.messages.clone();
+
+    let request_messages = engine.messages_with_turn_metadata();
+
+    assert_eq!(engine.session.messages, stored);
+    assert_eq!(request_messages.len(), stored.len() + 1);
+    assert!(
+        request_messages
+            .iter()
+            .all(|message| message.role != "system"),
+        "runtime prompts must not create appended system messages"
+    );
+    let runtime = request_messages.last().expect("runtime prompt message");
+    assert_eq!(runtime.role, "user");
+    let ContentBlock::Text { text, .. } = runtime.content.first().expect("runtime prompt text")
+    else {
+        panic!("expected text runtime prompt");
+    };
+    assert!(text.contains("<runtime_prompt"));
+    assert!(text.contains("mode=\"plan\""));
+    assert!(
+        text.contains("approval=\"never\""),
+        "Plan mode should project its fixed never-approval policy: {text}"
+    );
 }

 #[tokio::test]
-async fn change_mode_op_injects_runtime_event_into_session_messages() {
+async fn change_mode_op_updates_current_mode_and_emits_status() {
    let tmp = tempdir().expect("tempdir");
    let config = EngineConfig {
        workspace: tmp.path().to_path_buf(),
@@ -1791,7 +1837,6 @@ async fn change_mode_op_injects_runtime_event_into_session_messages() {
    let (engine, handle) = Engine::new(config, &Config::default());

    let run = tokio::spawn(engine.run());
-    // Switch from default Agent → YOLO
    handle
        .send(Op::ChangeMode {
            mode: AppMode::Yolo,
@@ -1799,40 +1844,41 @@ async fn change_mode_op_injects_runtime_event_into_session_messages() {
        .await
        .expect("send change mode");

-    // Collect session-updated events until we see the injected message
-    let messages = {
-        let mut rx = handle.rx_event.write().await;
-        loop {
-            let event = tokio::time::timeout(std::time::Duration::from_secs(2), rx.recv())
-                .await
-                .expect("session update after mode switch")
-                .expect("event");
-            if let Event::SessionUpdated { messages, .. } = event {
-                // The last message should be our runtime event
-                if let Some(last) = messages.last()
-                    && let ContentBlock::Text { text, .. } =
-                        last.content.first().expect("text block")
-                    && text.contains("kind=\"mode_change\"")
-                {
-                    break messages;
-                }
-            }
-        }
+    // Expect a SessionUpdated event confirming the mode change (the
+    // per-turn <runtime_prompt> tag carries the mode in every request,
+    // so no separate persistence of a mode_change runtime event is needed).
+    let mut rx = handle.rx_event.write().await;
+    let session_updated = tokio::time::timeout(std::time::Duration::from_secs(2), rx.recv())
+        .await
+        .expect("session update after mode switch")
+        .expect("event");
+    let Event::SessionUpdated { messages, .. } = session_updated else {
+        panic!("should emit SessionUpdated after mode change, got: {session_updated:?}");
    };
-    run.abort();
+    assert!(
+        messages.iter().all(|message| {
+            message.content.iter().all(|block| {
+                !matches!(
+                    block,
+                    ContentBlock::Text { text, .. }
+                        if text.contains("<runtime_prompt")
+                )
+            })
+        }),
+        "runtime prompt tags must not be persisted into session messages after mode change"
+    );

-    let last = messages.last().expect("at least one message");
-    let ContentBlock::Text { text, .. } = last.content.first().expect("text block") else {
-        panic!("expected text block");
-    };
+    // A Status event should follow the SessionUpdated event.
+    let status = tokio::time::timeout(std::time::Duration::from_secs(2), rx.recv())
+        .await
+        .expect("status after mode switch")
+        .expect("event");
    assert!(
-        text.contains("Agent mode") && text.contains("YOLO mode"),
-        "should contain both previous and new mode: {text}"
-    );
-    assert!(
-        text.contains("Re-evaluate"),
-        "should tell agent to re-evaluate: {text}"
+        matches!(status, Event::Status { .. }),
+        "should emit Status after mode change, got: {status:?}"
    );
+
+    run.abort();
 }

 #[test]
@@ -2176,7 +2222,7 @@ fn refresh_system_prompt_leaves_working_set_out_of_system_prompt() {
        .working_set
        .observe_user_message("please inspect src/lib.rs", tmp.path());

-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();

    let prompt = match &engine.session.system_prompt {
        Some(SystemPrompt::Text(text)) => text.clone(),
@@ -2210,11 +2256,11 @@ fn working_set_reaches_model_as_turn_metadata() {
    engine.session.add_message(user_msg);

    let messages = engine.messages_with_turn_metadata();
-    let first_block = messages
-        .last()
-        .and_then(|message| message.content.first())
+    let last_block = messages
+        .first()
+        .and_then(|message| message.content.last())
        .expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };
    assert!(text.starts_with("<turn_meta>\n"));
@@ -2235,11 +2281,11 @@ fn turn_metadata_includes_current_local_date_without_working_set() {
    engine.session.add_message(user_msg);

    let messages = engine.messages_with_turn_metadata();
-    let first_block = messages
-        .last()
-        .and_then(|message| message.content.first())
+    let last_block = messages
+        .first()
+        .and_then(|message| message.content.last())
        .expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };

@@ -2266,8 +2312,8 @@ fn turn_metadata_includes_auto_model_route() {
        Some("max"),
        true,
    );
-    let first_block = user_msg.content.first().expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    let last_block = user_msg.content.last().expect("turn metadata block");
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };

@@ -2294,8 +2340,11 @@ fn turn_metadata_includes_current_mode() {
        None,
        false,
    );
-    let first_block = user_msg.content.first().expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    // turn_meta was relocated to the tail of the user message in #2517
+    // to keep the leading bytes (user input) stable across date / model
+    // route / working-set changes.
+    let last_block = user_msg.content.last().expect("turn metadata block");
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };

@@ -2314,10 +2363,11 @@ fn turn_metadata_mode_updates_with_change_mode_op() {
    };
    let (mut engine, _handle) = Engine::new(config, &Config::default());

-    // In agent mode by default
+    // In agent mode by default. The turn_meta block now sits at the
+    // *tail* of the user message (see #2517) so we read `content.last()`.
    let msg = engine.user_text_message_with_turn_metadata("hello".to_string());
-    let first_block = msg.content.first().expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    let last_block = msg.content.last().expect("turn metadata block");
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };
    assert!(
@@ -2328,8 +2378,8 @@ fn turn_metadata_mode_updates_with_change_mode_op() {
    // Switch to YOLO — user_text_message_with_turn_metadata should reflect the new mode
    engine.current_mode = AppMode::Yolo;
    let msg = engine.user_text_message_with_turn_metadata("hello again".to_string());
-    let first_block = msg.content.first().expect("turn metadata block");
-    let ContentBlock::Text { text, .. } = first_block else {
+    let last_block = msg.content.last().expect("turn metadata block");
+    let ContentBlock::Text { text, .. } = last_block else {
        panic!("expected text metadata block");
    };
    assert!(
@@ -2339,29 +2389,54 @@ fn turn_metadata_mode_updates_with_change_mode_op() {
 }

 #[test]
-fn mode_change_runtime_message_format() {
-    let msg = Engine::mode_change_runtime_message(AppMode::Agent, AppMode::Yolo);
-
-    assert_eq!(msg.role, "user");
-    let ContentBlock::Text { text, .. } = msg.content.first().expect("text block") else {
-        panic!("expected text block");
+fn current_mode_field_assignment_takes_effect_synchronously() {
+    // Basic unit-level invariant: the current_mode field mutates as expected
+    // and the per-turn <runtime_prompt> tag reflects the current mode.
+    // Op::ChangeMode dispatch through the run loop is exercised by the
+    // integration test change_mode_op_updates_current_mode_and_emits_status.
+    let tmp = tempdir().expect("tempdir");
+    let config = EngineConfig {
+        workspace: tmp.path().to_path_buf(),
+        model: "deepseek-v4-pro".to_string(),
+        ..Default::default()
    };
+    let (mut engine, _handle) = Engine::new(config, &Config::default());
+    assert_eq!(engine.current_mode, AppMode::Agent);

+    // Verify the runtime tag in Agent mode.
+    let agent_messages = engine.messages_with_turn_metadata();
+    let agent_tag = agent_messages.last().expect("runtime tag message");
+    let ContentBlock::Text {
+        text: agent_text, ..
+    } = agent_tag.content.first().expect("text block")
+    else {
+        panic!("expected text runtime tag in Agent mode");
+    };
    assert!(
-        text.contains("codewhale:runtime_event"),
-        "should be a runtime event message"
+        agent_text.contains("mode=\"agent\""),
+        "Agent mode should produce runtime tag with mode=\"agent\", got: {agent_text}"
+    );
+
+    // Switch to YOLO
+    engine.current_mode = AppMode::Yolo;
+    assert_eq!(engine.current_mode, AppMode::Yolo);
+
+    // Verify runtime tag reflects the YOLO mode with auto approval
+    let yolo_messages = engine.messages_with_turn_metadata();
+    let yolo_tag = yolo_messages.last().expect("runtime tag message");
+    let ContentBlock::Text {
+        text: yolo_text, ..
+    } = yolo_tag.content.first().expect("text block")
+    else {
+        panic!("expected text runtime tag in YOLO mode");
+    };
+    assert!(
+        yolo_text.contains("mode=\"yolo\""),
+        "YOLO mode should produce runtime tag with mode=\"yolo\", got: {yolo_text}"
    );
    assert!(
-        text.contains("kind=\"mode_change\""),
-        "should have mode_change kind"
-    );
-    assert!(
-        text.contains("Agent mode") && text.contains("YOLO mode"),
-        "should mention both previous and new mode: {text}"
-    );
-    assert!(
-        text.contains("Re-evaluate"),
-        "should tell agent to re-evaluate blocked operations: {text}"
+        yolo_text.contains("approval=\"auto\""),
+        "YOLO mode should project auto approval in runtime tag, got: {yolo_text}"
    );
 }

@@ -2377,10 +2452,10 @@ fn user_text_message_keeps_current_turn_input_after_turn_metadata() {
    let user_msg =
        engine.user_text_message_with_turn_metadata("explain the cache metrics".to_string());

-    let last_text = user_msg
+    // User text is now at position 0, turn_meta at position 1.
+    let first_text = user_msg
        .content
        .iter()
-        .rev()
        .find_map(|block| {
            if let ContentBlock::Text { text, .. } = block {
                Some(text.as_str())
@@ -2389,7 +2464,7 @@ fn user_text_message_keeps_current_turn_input_after_turn_metadata() {
            }
        })
        .expect("user text block");
-    assert_eq!(last_text, "explain the cache metrics");
+    assert_eq!(first_text, "explain the cache metrics");
 }

 #[test]
@@ -2411,7 +2486,16 @@ fn messages_with_turn_metadata_preserves_stored_messages_for_prefix_cache() {
    let first_user = engine.user_text_message_with_turn_metadata("inspect src/lib.rs".to_string());
    engine.session.add_message(first_user.clone());
    let first_request = engine.messages_with_turn_metadata();
-    assert_eq!(first_request, engine.session.messages);
+    assert_eq!(
+        &first_request[..engine.session.messages.len()],
+        engine.session.messages.as_slice()
+    );
+    assert_eq!(first_request.len(), engine.session.messages.len() + 1);
+    assert_eq!(first_request.first(), Some(&first_user));
+    assert_eq!(
+        first_request.last().map(|message| message.role.as_str()),
+        Some("user")
+    );

    engine.session.add_message(Message {
        role: "assistant".to_string(),
@@ -2428,14 +2512,24 @@ fn messages_with_turn_metadata_preserves_stored_messages_for_prefix_cache() {
    engine.session.add_message(second_user);

    let second_request = engine.messages_with_turn_metadata();
-    assert_eq!(second_request, engine.session.messages);
+    assert_eq!(
+        &second_request[..engine.session.messages.len()],
+        engine.session.messages.as_slice()
+    );
+    assert_eq!(second_request.len(), engine.session.messages.len() + 1);
    assert_eq!(second_request.first(), Some(&first_user));
+    let runtime = second_request.last().expect("runtime prompt");
+    let ContentBlock::Text { text, .. } = runtime.content.first().expect("runtime prompt text")
+    else {
+        panic!("expected runtime prompt text");
+    };
+    assert!(text.contains("<runtime_prompt"));
 }

 /// v0.8.11 regression: tool-result messages serialize to role="tool" on
 /// the wire but are stored as role="user" internally. `<turn_meta>` must
-/// be stored only on actual user-text messages, not retroactively added
-/// to tool-result messages at request time.
+/// be stored only on actual user-text messages. Request-time runtime metadata
+/// is appended separately and must not mutate tool-result messages.
 #[test]
 fn turn_metadata_skips_tool_result_messages() {
    let tmp = tempdir().expect("tempdir");
@@ -2478,9 +2572,11 @@ fn turn_metadata_skips_tool_result_messages() {

    let messages = engine.messages_with_turn_metadata();

-    // The trailing message is the tool result and MUST be untouched —
+    // The stored trailing message is the tool result and MUST be untouched —
    // no Text block sneaking in front of the ToolResult block.
-    let trailing = messages.last().expect("trailing message");
+    let trailing = messages
+        .get(messages.len().saturating_sub(2))
+        .expect("stored trailing message");
    assert_eq!(trailing.role, "user");
    assert_eq!(trailing.content.len(), 1);
    assert!(matches!(
@@ -2488,20 +2584,72 @@ fn turn_metadata_skips_tool_result_messages() {
        Some(ContentBlock::ToolResult { .. })
    ));

-    // The earlier real user message already carries the turn_meta prefix.
+    // The earlier real user message carries user text first, turn_meta last.
    let real_user = messages.first().expect("first user message");
    assert_eq!(real_user.role, "user");
    let ContentBlock::Text { text, .. } = real_user.content.first().expect("user text content")
    else {
        panic!("expected Text block on real user message");
    };
-    assert!(text.starts_with("<turn_meta>\n"));
-    assert!(text.contains("src/lib.rs"));
+    assert_eq!(text, "inspect src/lib.rs");
+    // turn_meta is at the tail of the content array.
+    let last_block = real_user.content.last().expect("turn_meta block");
+    let ContentBlock::Text { text: meta, .. } = last_block else {
+        panic!("expected Text block for turn_meta at tail");
+    };
+    assert!(meta.starts_with("<turn_meta>\n"));
+    assert!(meta.contains("src/lib.rs"));
+    assert!(
+        matches!(
+            messages.last().and_then(|message| message.content.first()),
+            Some(ContentBlock::Text { text, .. }) if text.contains("<runtime_prompt")
+        ),
+        "request projection should append transient runtime metadata"
+    );
+}
+
+/// User text must appear before turn_meta in the content array so that
+/// the leading bytes of each user message stay stable across date changes.
+/// DeepSeek's KV prefix cache matches byte sequences from the start of
+/// each message; placing the volatile date-bearing turn_meta at position
+/// 0 would invalidate the entire user message prefix at every date
+/// boundary. Moving it to the tail preserves the user-input prefix.
+#[test]
+fn user_message_turn_meta_is_appended_not_prepended() {
+    let tmp = tempdir().expect("tempdir");
+    let config = EngineConfig {
+        workspace: tmp.path().to_path_buf(),
+        ..Default::default()
+    };
+    let (engine, _handle) = Engine::new(config, &Config::default());
+
+    let msg = engine.user_text_message_with_turn_metadata("hello world".to_string());
+    assert_eq!(msg.role, "user");
+    assert_eq!(msg.content.len(), 2);
+
+    // First content block: user text.
+    let ContentBlock::Text { text, .. } = &msg.content[0] else {
+        panic!("expected Text block at position 0");
+    };
+    assert_eq!(text, "hello world");
+
+    // Second content block: turn_meta.
+    let ContentBlock::Text { text: meta, .. } = &msg.content[1] else {
+        panic!("expected Text block for turn_meta at position 1");
+    };
+    assert!(
+        meta.starts_with("<turn_meta>\n"),
+        "turn_meta must be at the tail"
+    );
+    assert!(
+        meta.contains("Current local date:"),
+        "turn_meta must contain the date"
+    );
 }

 /// When the turn is mid-execution and the trailing user message is a
-/// tool result, no turn_meta is injected at request time. The working_set
-/// surfaces again on the next stored user-text message.
+/// tool result, no turn_meta is injected into that tool-result message. The
+/// working_set surfaces again on the next stored user-text message.
 #[test]
 fn turn_metadata_skips_when_only_tool_results_trail() {
    let tmp = tempdir().expect("tempdir");
@@ -2534,14 +2682,21 @@ fn turn_metadata_skips_when_only_tool_results_trail() {

    let messages = engine.messages_with_turn_metadata();

-    // Returned unchanged: the single tool-result message, no Text
-    // prefix, content length == 1.
-    let only = messages.last().expect("trailing message");
+    // Stored tool-result message is unchanged: no Text prefix, content length == 1.
+    let only = messages.first().expect("stored tool result message");
    assert_eq!(only.content.len(), 1);
    assert!(matches!(
        only.content.first(),
        Some(ContentBlock::ToolResult { .. })
    ));
+    assert_eq!(messages.len(), 2);
+    assert!(
+        matches!(
+            messages.last().and_then(|message| message.content.first()),
+            Some(ContentBlock::Text { text, .. }) if text.contains("<runtime_prompt")
+        ),
+        "request projection should still append transient runtime metadata"
+    );
 }

 #[test]
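
The tail-placement rationale documented on `user_message_turn_meta_is_appended_not_prepended` can be illustrated with a minimal sketch. It assumes a flattened string rendering of a user message; the real engine stores separate content blocks. The `render` and `shared_prefix` helpers, the closing `</turn_meta>` tag, and the sample dates are hypothetical and exist only for this illustration.

```rust
/// Number of leading bytes two strings have in common.
fn common_prefix_len(a: &str, b: &str) -> usize {
    a.bytes().zip(b.bytes()).take_while(|(x, y)| x == y).count()
}

/// Hypothetical flattened rendering of a user message. The layout of the
/// metadata block is an assumption made for this sketch.
fn render(user_text: &str, date: &str, meta_first: bool) -> String {
    let meta = format!("<turn_meta>\nCurrent local date: {date}\n</turn_meta>");
    if meta_first {
        format!("{meta}\n{user_text}")
    } else {
        format!("{user_text}\n{meta}")
    }
}

/// Shared prefix length of the same user text rendered on two different days.
fn shared_prefix(user_text: &str, meta_first: bool) -> usize {
    let day_one = render(user_text, "2026-06-07", meta_first);
    let day_two = render(user_text, "2026-06-08", meta_first);
    common_prefix_len(&day_one, &day_two)
}
```

With the metadata at the tail, the shared prefix covers the whole user text; with the metadata at the head, the match ends inside the date, before any user text is reached.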
@@ -2553,10 +2708,10 @@ fn refresh_system_prompt_is_noop_when_unchanged() {
    };
    let (mut engine, _handle) = Engine::new(config, &Config::default());

-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();
    let first_hash = engine.session.last_system_prompt_hash;
    let first_prompt = engine.session.system_prompt.clone();
-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();

    assert_eq!(engine.session.last_system_prompt_hash, first_hash);
    assert_eq!(engine.session.system_prompt, first_prompt);
@@ -2603,7 +2758,7 @@ fn text_system_prompt_override_via_runtime_sync_survives_refresh() {
    let expected = Some(prompt.clone());

    sync_runtime_system_prompt_override(&mut engine, prompt);
-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();

    assert_eq!(engine.session.system_prompt, expected);
 }
@@ -2624,7 +2779,7 @@ fn blocks_system_prompt_override_via_runtime_sync_survives_mode_change_refresh()
    let expected = Some(prompt.clone());

    sync_runtime_system_prompt_override(&mut engine, prompt);
-    engine.refresh_system_prompt(AppMode::Plan);
+    engine.refresh_system_prompt();

    assert_eq!(engine.session.system_prompt, expected);
 }
@@ -2644,7 +2799,7 @@ fn compaction_summary_stays_in_stable_system_prompt() {
        .session
        .working_set
        .observe_user_message("continue in src/main.rs", tmp.path());
-    engine.refresh_system_prompt(AppMode::Agent);
+    engine.refresh_system_prompt();
    engine.merge_compaction_summary(Some(SystemPrompt::Blocks(vec![SystemBlock {
        block_type: "text".to_string(),
        text: format!("{COMPACTION_SUMMARY_MARKER}\nsummary"),
@@ -2797,7 +2952,6 @@ async fn post_tool_replay_invoked_when_high_non_severe_risk() {
    let restarted = engine
        .run_capacity_post_tool_checkpoint(
            &turn,
-            AppMode::Agent,
            Some(&registry),
            Arc::new(RwLock::new(())),
            None,
@@ -2858,7 +3012,7 @@ async fn error_escalation_triggers_replan_when_severe_or_repeated_failures() {
    let before_len = engine.session.messages.len();
    let turn = TurnContext::new(10);
    let restarted = engine
-        .run_capacity_error_escalation_checkpoint(&turn, AppMode::Agent, 2, 2, &[])
+        .run_capacity_error_escalation_checkpoint(&turn, 2, 2, &[])
        .await;

    assert!(restarted);
@@ -2916,7 +3070,7 @@ async fn capacity_disabled_by_default_keeps_messages_intact() {
    let before_len = engine.session.messages.len();
    let turn = TurnContext::new(10);
    let restarted = engine
-        .run_capacity_error_escalation_checkpoint(&turn, AppMode::Agent, 2, 2, &[])
+        .run_capacity_error_escalation_checkpoint(&turn, 2, 2, &[])
        .await;

    // Capacity is disabled → no replan, no message clear.
@@ -3747,9 +3901,10 @@ async fn post_edit_hook_injects_diagnostics_message_before_next_request() {

    let last = engine.session.messages.last().expect("message appended");
    assert_eq!(last.role, "user");
-    let meta = match &last.content[0] {
-        crate::models::ContentBlock::Text { text, .. } => text.clone(),
-        other => panic!("expected text block, got {other:?}"),
+    // turn_meta is now at the tail of the content array (PR #2517).
+    let meta = match last.content.last() {
+        Some(crate::models::ContentBlock::Text { text, .. }) => text.clone(),
+        other => panic!("expected text block at tail, got {other:?}"),
    };
    assert!(meta.starts_with("<turn_meta>\n"));
    let diagnostic_text = last
@@ -0,0 +1,312 @@
+//! Process-local memoization for [`crate::compaction::estimate_input_tokens_conservative`].
+//!
+//! The token estimator walks the full [`crate::models::Message`] history and the
+//! active system prompt, which is by far the most expensive per-turn CPU cost
+//! in the engine hot path. The same input data is queried from at least five
+//! sites per turn: capacity pre/post tool checkpoints, error escalation,
+//! the seam manager, and the trimmed-message budget check, plus four more
+//! from the TUI footer, `/status`, `/debug`, and the context inspector.
+//!
+//! Without memoization, a 200-message history with 5 KB of tool results costs
+//! ~2 ms per call; across the nine-plus call sites listed above that adds up
+//! to roughly 18 ms per turn, almost all of it redundant work. The estimator
+//! itself is a pure function of `(messages, system_prompt)`, so a
+//! content-versioned cache is safe: the caller bumps `messages_revision`
+//! on every mutation, and we also include a fast fingerprint of the system
+//! prompt as part of the key.
+//!
+//! The cache is process-local only — cross-session persistence is intentionally
+//! out of scope (see PR #2520 for the cross-session prompt-base disk cache).
+
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+use crate::compaction::estimate_input_tokens_conservative;
+use crate::models::{Message, SystemPrompt};
+
+/// Default capacity for the rolling audit ring. Sized so a 64-entry window
+/// covers a full capacity controller observation cycle without unbounded
+/// growth on long-running sessions.
+const AUDIT_RING_CAPACITY: usize = 64;
+
+/// Process-local memoization for `estimate_input_tokens_conservative`.
+///
+/// The cache is keyed on the `(messages_revision, system_fingerprint)`
+/// pair. The engine bumps `messages_revision` on every message mutation;
+/// the fingerprint is recomputed from the system prompt on each lookup. On a
+/// hit the previously stored token estimate is returned without re-walking
+/// the message list. On a miss, the estimator runs and the result is stored
+/// alongside the audit ring entry.
+#[derive(Debug, Default, Clone)]
+pub struct TokenEstimateCache {
+    /// Monotonic counter bumped by the engine on every message mutation.
+    messages_revision: u64,
+    /// 64-bit hash of the system prompt text that produced `cached_tokens`.
+    /// `lookup_or_compute` recomputes the fingerprint on every call and
+    /// compares it against this stored value.
+    system_fingerprint: u64,
+    /// Cached token count, valid iff both keys match the current inputs.
+    cached_tokens: Option<usize>,
+    /// Audit ring of recent (revision, tokens) pairs. The most recent entry
+    /// is the tail; the oldest is dropped when capacity is exceeded. Used by
+    /// observability to surface cache effectiveness to `/status`.
+    audit_ring: Vec<(u64, usize)>,
+    /// Number of cache hits since the cache was last cleared. Saturates at
+    /// `u64::MAX` (effectively never in practice).
+    hits: u64,
+    /// Number of cache misses since the cache was last cleared.
+    misses: u64,
+}
+
+impl TokenEstimateCache {
+    /// Construct a fresh, empty cache. `messages_revision` defaults to 0; the
+    /// engine must call [`bump_messages_revision`](Self::bump_messages_revision)
+    /// whenever a mutation occurs so the next lookup correctly invalidates.
+    #[must_use]
+    pub fn new() -> Self {
+        Self::default()
+    }
+
+    /// Returns the cached token estimate, recomputing on miss.
+    ///
+    /// `messages_revision` is the engine's monotonic counter; bump it on
+    /// every add/remove/clear. `system_prompt` may be `None`. `messages` is
+    /// borrowed for the duration of the call so a miss can re-tokenize.
+    pub fn lookup_or_compute(
+        &mut self,
+        messages_revision: u64,
+        system_prompt: Option<&SystemPrompt>,
+        messages: &[Message],
+    ) -> usize {
+        let system_fingerprint = fingerprint_system_prompt(system_prompt);
+
+        if self.messages_revision == messages_revision
+            && self.system_fingerprint == system_fingerprint
+            && let Some(tokens) = self.cached_tokens
+        {
+            self.hits = self.hits.saturating_add(1);
+            return tokens;
+        }
+
+        let tokens = estimate_input_tokens_conservative(messages, system_prompt);
+        self.messages_revision = messages_revision;
+        self.system_fingerprint = system_fingerprint;
+        self.cached_tokens = Some(tokens);
+        self.misses = self.misses.saturating_add(1);
+        self.push_audit(messages_revision, tokens);
+        tokens
+    }
+
+    /// Record a messages-revision bump. Intended to be called by the engine
+    /// whenever `session.messages` is mutated. Calling it with a value smaller
+    /// than or equal to the current value is a no-op (the cache is monotonic).
+    #[allow(dead_code)] // exposed for future wiring of /clear and reset paths; tests exercise it
+    pub fn bump_messages_revision(&mut self, revision: u64) {
+        if revision > self.messages_revision {
+            self.messages_revision = revision;
+            self.cached_tokens = None;
+        }
+    }
+
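+    /// Forget the cached estimate, fingerprint, audit ring, and hit/miss
+    /// counters; `messages_revision` is left untouched. Intended for `/clear`
+    /// and session reset paths.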
+    #[allow(dead_code)] // exposed for future wiring of /clear and reset paths; tests exercise it
+    pub fn invalidate(&mut self) {
+        self.cached_tokens = None;
+        self.system_fingerprint = 0;
+        self.audit_ring.clear();
+        self.hits = 0;
+        self.misses = 0;
+    }
+
+    /// Returns `(hits, misses)` counters since the last `invalidate` call.
+    #[allow(dead_code)] // surfaced via /status in a follow-up; tests exercise it
+    #[must_use]
+    pub fn stats(&self) -> (u64, u64) {
+        (self.hits, self.misses)
+    }
+
+    /// Returns the most recent `(revision, tokens)` audit entries, oldest
+    /// first (the newest entry is the last element). Bounded by
+    /// [`AUDIT_RING_CAPACITY`].
+    #[allow(dead_code)] // surfaced via /status in a follow-up; tests exercise it
+    #[must_use]
+    pub fn recent_audit(&self) -> &[(u64, usize)] {
+        &self.audit_ring
+    }
+
+    fn push_audit(&mut self, revision: u64, tokens: usize) {
+        if self.audit_ring.len() >= AUDIT_RING_CAPACITY {
+            self.audit_ring.remove(0);
+        }
+        self.audit_ring.push((revision, tokens));
+    }
+}
+
+/// 64-bit hash of the system prompt text, stable within a single process
+/// (`DefaultHasher` output is not guaranteed to match across Rust releases).
+/// Walks the same shape the estimator consumes: a `Text` variant or a list of
+/// `Blocks`. Returns 0 for `None` so the empty case is distinguishable but
+/// cheap to compare.
+fn fingerprint_system_prompt(system: Option<&SystemPrompt>) -> u64 {
+    let Some(system) = system else {
+        return 0;
+    };
+    let mut hasher = DefaultHasher::new();
+    match system {
+        SystemPrompt::Text(text) => {
+            "text".hash(&mut hasher);
+            text.hash(&mut hasher);
+        }
+        SystemPrompt::Blocks(blocks) => {
+            "blocks".hash(&mut hasher);
+            blocks.len().hash(&mut hasher);
+            for block in blocks {
+                block.block_type.hash(&mut hasher);
+                block.text.hash(&mut hasher);
+            }
+        }
+    }
+    hasher.finish()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::models::{ContentBlock, SystemBlock};
+
+    fn user_text(s: &str) -> Message {
+        Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::Text {
+                text: s.to_string(),
+                cache_control: None,
+            }],
+        }
+    }
+
+    fn sys_text(s: &str) -> SystemPrompt {
+        SystemPrompt::Text(s.to_string())
+    }
+
+    #[test]
+    fn first_call_is_a_miss() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hello world")];
+        let tokens = cache.lookup_or_compute(1, None, &messages);
+        let (hits, misses) = cache.stats();
+        assert!(tokens > 0);
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 1);
+    }
+
+    #[test]
+    fn repeated_call_with_same_revision_is_a_hit() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hello world")];
+        let _ = cache.lookup_or_compute(1, None, &messages);
+        let _ = cache.lookup_or_compute(1, None, &messages);
+        let (hits, misses) = cache.stats();
+        assert_eq!(hits, 1);
+        assert_eq!(misses, 1);
+    }
+
+    #[test]
+    fn revision_bump_invalidates() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hi")];
+        let a = cache.lookup_or_compute(1, None, &messages);
+        let b = cache.lookup_or_compute(2, None, &messages);
+        let (hits, misses) = cache.stats();
+        // Both calls were misses (different revisions), neither hit the cache.
+        assert_eq!(a, b);
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 2);
+    }
+
+    #[test]
+    fn system_prompt_change_invalidates() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hi")];
+        let _ = cache.lookup_or_compute(1, Some(&sys_text("alpha")), &messages);
+        let _ = cache.lookup_or_compute(1, Some(&sys_text("beta")), &messages);
+        let (hits, misses) = cache.stats();
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 2);
+    }
+
+    #[test]
+    fn bump_messages_revision_clears_cache() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("x")];
+        let _ = cache.lookup_or_compute(1, None, &messages);
+        cache.bump_messages_revision(2);
+        let _ = cache.lookup_or_compute(2, None, &messages);
+        let (hits, misses) = cache.stats();
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 2);
+    }
+
+    #[test]
+    fn bump_to_smaller_revision_is_noop() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("x")];
+        let _ = cache.lookup_or_compute(5, None, &messages);
+        cache.bump_messages_revision(2);
+        // revision went down, cache should still be valid for revision 5
+        let _ = cache.lookup_or_compute(5, None, &messages);
+        let (hits, _) = cache.stats();
+        assert_eq!(hits, 1, "downward revision bumps must not invalidate");
+    }
+
+    #[test]
+    fn invalidate_resets_state() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("x")];
+        let _ = cache.lookup_or_compute(1, None, &messages);
+        let _ = cache.lookup_or_compute(1, None, &messages);
+        cache.invalidate();
+        let (hits, misses) = cache.stats();
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 0);
+    }
+
+    #[test]
+    fn blocks_system_prompt_yields_distinct_fingerprint() {
+        let blocks_a = SystemPrompt::Blocks(vec![SystemBlock {
+            block_type: "text".to_string(),
+            text: "alpha".to_string(),
+            cache_control: None,
+        }]);
+        let blocks_b = SystemPrompt::Blocks(vec![SystemBlock {
+            block_type: "text".to_string(),
+            text: "beta".to_string(),
+            cache_control: None,
+        }]);
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hi")];
+        let _ = cache.lookup_or_compute(1, Some(&blocks_a), &messages);
+        let _ = cache.lookup_or_compute(1, Some(&blocks_b), &messages);
+        let (hits, misses) = cache.stats();
+        assert_eq!(hits, 0);
+        assert_eq!(misses, 2);
+    }
+
+    #[test]
+    fn audit_ring_records_recent_pairs() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hi")];
+        for rev in 1..=5 {
+            let _ = cache.lookup_or_compute(rev, None, &messages);
+        }
+        let ring = cache.recent_audit();
+        assert_eq!(ring.len(), 5);
+        // The newest entry sits at the tail and carries the last revision.
+        assert_eq!(ring.last().unwrap().0, 5);
+    }
+
+    #[test]
+    fn audit_ring_bounded_by_capacity() {
+        let mut cache = TokenEstimateCache::new();
+        let messages = vec![user_text("hi")];
+        for rev in 1..=(AUDIT_RING_CAPACITY + 10) as u64 {
+            let _ = cache.lookup_or_compute(rev, None, &messages);
+        }
+        let ring = cache.recent_audit();
+        assert_eq!(ring.len(), AUDIT_RING_CAPACITY);
+        // newest entry should be the most recent revision we asked for
+        assert_eq!(ring.last().unwrap().0, (AUDIT_RING_CAPACITY + 10) as u64);
+    }
+}
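
The revision-keyed memoization in this file can be reduced to a minimal sketch. It assumes a stand-in estimator (about four bytes per token) and plain `String` messages; `RevisionCache`, `estimate_tokens`, and `simulate_turn` are hypothetical names for illustration, not the crate's API. The system-prompt fingerprint and the audit ring are left out to keep the idea visible.

```rust
/// Hypothetical stand-in for the real estimator: about 4 bytes per token.
fn estimate_tokens(messages: &[String]) -> usize {
    messages.iter().map(|m| (m.len() + 3) / 4).sum()
}

/// Single-entry cache keyed on a caller-supplied revision counter.
#[derive(Default)]
struct RevisionCache {
    revision: u64,
    cached: Option<usize>,
    hits: u64,
    misses: u64,
}

impl RevisionCache {
    fn lookup_or_compute(&mut self, revision: u64, messages: &[String]) -> usize {
        // Hit: same revision and a stored value, so skip the walk.
        if self.revision == revision {
            if let Some(tokens) = self.cached {
                self.hits += 1;
                return tokens;
            }
        }
        // Miss: recompute and remember the key that produced the value.
        let tokens = estimate_tokens(messages);
        self.revision = revision;
        self.cached = Some(tokens);
        self.misses += 1;
        tokens
    }
}

/// Simulates one turn: five call sites read the same history, then a
/// mutation bumps the revision and one more call site reads it.
fn simulate_turn() -> (u64, u64) {
    let mut cache = RevisionCache::default();
    let mut messages = vec!["hello world".to_string()];
    let mut revision = 1u64;
    for _ in 0..5 {
        cache.lookup_or_compute(revision, &messages);
    }
    messages.push("tool result".to_string());
    revision += 1;
    cache.lookup_or_compute(revision, &messages);
    (cache.hits, cache.misses)
}
```

The cache is only correct if every mutation bumps the revision, which is why the diff routes mutations through `Session::add_message`, `Session::replace_messages`, and `Session::bump_messages_revision`.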
@@ -67,11 +67,7 @@ pub(super) const DEFAULT_ACTIVE_NATIVE_TOOLS: &[&str] = &[
    "write_file",
 ];

-pub(super) fn should_default_defer_tool(
-    name: &str,
-    _mode: AppMode,
-    always_load: &HashSet<String>,
-) -> bool {
+pub(super) fn should_default_defer_tool(name: &str, always_load: &HashSet<String>) -> bool {
    if always_load.contains(name) {
        return false;
    }
@@ -85,13 +81,9 @@ pub(super) fn should_default_defer_tool(
        .any(|core_tool| core_tool == &name)
 }

-pub(super) fn apply_native_tool_deferral(
-    catalog: &mut [Tool],
-    mode: AppMode,
-    always_load: &HashSet<String>,
-) {
+pub(super) fn apply_native_tool_deferral(catalog: &mut [Tool], always_load: &HashSet<String>) {
    for tool in catalog {
-        tool.defer_loading = Some(should_default_defer_tool(&tool.name, mode, always_load));
+        tool.defer_loading = Some(should_default_defer_tool(&tool.name, always_load));
    }
 }

@@ -185,7 +177,7 @@ pub(super) fn build_model_tool_catalog(
    mode: AppMode,
    always_load: &HashSet<String>,
 ) -> Vec<Tool> {
-    apply_native_tool_deferral(&mut native_tools, mode, always_load);
+    apply_native_tool_deferral(&mut native_tools, always_load);
    apply_mcp_tool_deferral(&mut mcp_tools, mode);
    // Sort each partition by name for prefix-cache stability (#263). The
    // upstream `to_api_tools()` already sorts the registry's HashMap output;
@@ -229,7 +221,6 @@ pub(super) fn ensure_advanced_tooling(
            allowed_callers: Some(vec!["direct".to_string()]),
            defer_loading: Some(should_default_defer_tool(
                CODE_EXECUTION_TOOL_NAME,
-                mode,
                always_load,
            )),
            input_examples: None,
@@ -248,7 +239,7 @@ pub(super) fn ensure_advanced_tooling(
        && crate::dependencies::resolve_node().is_some()
    {
        let mut tool = crate::tools::js_execution::js_execution_tool_definition();
-        tool.defer_loading = Some(should_default_defer_tool(&tool.name, mode, always_load));
+        tool.defer_loading = Some(should_default_defer_tool(&tool.name, always_load));
        catalog.push(tool);
    }

@@ -125,14 +125,30 @@ pub(super) fn emit_tool_audit(event: serde_json::Value) {
    };
    let line = match serde_json::to_string(&event) {
        Ok(line) => line,
-        Err(_) => return,
+        Err(e) => {
+            tracing::error!("Failed to serialize tool audit event: {e}");
+            return;
+        }
    };
    let path = PathBuf::from(path);
-    if let Some(parent) = path.parent() {
-        let _ = std::fs::create_dir_all(parent);
+    if let Some(parent) = path.parent()
+        && let Err(e) = std::fs::create_dir_all(parent)
+    {
+        tracing::error!(
+            "Failed to create audit log directory {}: {e}",
+            parent.display()
+        );
+        return;
    }
-    if let Ok(mut file) = OpenOptions::new().create(true).append(true).open(path) {
-        let _ = writeln!(file, "{line}");
+    match OpenOptions::new().create(true).append(true).open(&path) {
+        Ok(mut file) => {
+            if let Err(e) = writeln!(file, "{line}") {
+                tracing::error!("Failed to write to audit log {}: {e}", path.display());
+            }
+        }
+        Err(e) => {
+            tracing::error!("Failed to open audit log {}: {e}", path.display());
+        }
    }
 }

@@ -105,7 +105,7 @@ impl Engine {
            }

            // Ensure system prompt is up to date with latest session states
-            self.refresh_system_prompt(mode);
+            self.refresh_system_prompt();

            if turn.at_max_steps() {
                let _ = self
@@ -469,8 +469,7 @@ impl Engine {
            // budget restarts with the fresh stream.
            let mut stream_start = Instant::now();
            let mut stream_content_bytes: usize = 0;
-            let chunk_timeout_secs = stream_chunk_timeout_secs();
-            let chunk_timeout = Duration::from_secs(chunk_timeout_secs);
+            let (chunk_timeout_secs, chunk_timeout) = stream_chunk_timeout_budget(&self.config);
            let max_duration = Duration::from_secs(STREAM_MAX_DURATION_SECS);

            // Process stream events
@@ -1260,6 +1259,14 @@ impl Engine {
            }

            // Execute tools
+            if self.shared_paused.lock().is_ok_and(|paused| *paused) {
+                let _ = self
+                    .tx_event
+                    .send(Event::status("Request was Paused"))
+                    .await;
+                return (TurnOutcomeStatus::Interrupted, None);
+            }
+
            let tool_exec_lock = self.tool_exec_lock.clone();
            let mcp_pool = if tool_uses
                .iter()
@@ -2150,7 +2157,6 @@ impl Engine {
            if self
                .run_capacity_post_tool_checkpoint(
                    turn,
-                    mode,
                    tool_registry,
                    tool_exec_lock.clone(),
                    mcp_pool.clone(),
@@ -2182,7 +2188,6 @@ impl Engine {
            if self
                .run_capacity_error_escalation_checkpoint(
                    turn,
-                    mode,
                    step_error_count,
                    consecutive_tool_error_steps,
                    &step_error_categories,
@@ -2255,11 +2260,15 @@ impl Engine {
    }

    pub(super) fn messages_with_turn_metadata(&self) -> Vec<Message> {
-        // `<turn_meta>` is stored on user-text messages when the message is
-        // appended. Do not rewrite historical messages at request time: doing
-        // so makes the API prefix differ from the bytes sent in earlier turns
-        // and destroys DeepSeek's KV prefix cache reuse.
-        self.session.messages.clone()
+        // Keep stored history byte-stable and provider-compatible: runtime
+        // mode/approval contracts are projected as a transient user message
+        // at request time instead of being persisted as appended system
+        // messages. This preserves the stable prefix through all stored
+        // messages while avoiding strict chat templates that only allow
+        // system messages at messages[0].
+        let mut messages = self.session.messages.clone();
+        messages.push(self.runtime_prompt_message());
+        messages
    }
 }

@@ -2293,6 +2302,29 @@ fn should_hold_turn_for_subagents(queued_completions: usize, running_children: u
    queued_completions > 0 || running_children > 0
 }

+fn stream_chunk_timeout_budget(config: &EngineConfig) -> (u64, Duration) {
+    let secs = config.stream_chunk_timeout.as_secs();
+    (secs, Duration::from_secs(secs))
+}
+
+#[cfg(test)]
+mod stream_timeout_tests {
+    use super::*;
+
+    #[test]
+    fn stream_chunk_timeout_budget_uses_engine_config() {
+        let config = EngineConfig {
+            stream_chunk_timeout: Duration::from_secs(42),
+            ..EngineConfig::default()
+        };
+
+        assert_eq!(
+            stream_chunk_timeout_budget(&config),
+            (42, Duration::from_secs(42))
+        );
+    }
+}
+
 fn command_allows_tool(allowed_tools: Option<&[String]>, tool_name: &str) -> bool {
    let Some(allowed_tools) = allowed_tools else {
        return true;
@@ -77,13 +77,16 @@ pub enum Op {
    #[allow(dead_code)]
    ChangeMode { mode: AppMode },

-    /// Update the model being used and refresh the prompt for the current mode.
+    /// Update the model being used and refresh stable prompt context.
    #[allow(dead_code)]
    SetModel { model: String, mode: AppMode },

    /// Update auto-compaction settings
    SetCompaction { config: CompactionConfig },

+    /// Update the SSE idle timeout used for subsequent streamed turns.
+    SetStreamChunkTimeout { timeout_secs: u64 },
+
    /// Sync engine session state (used for resume/load)
    SyncSession {
        session_id: Option<String>,
@@ -31,8 +31,8 @@ pub struct Session {

    /// System prompt (optional)
    pub system_prompt: Option<SystemPrompt>,
-    /// True when `system_prompt` came from an explicit runtime API override
-    /// and should not be replaced by mode/context refreshes.
+    /// True when `system_prompt` is a persisted/runtime-supplied prefix that
+    /// should not be replaced by mode/context refreshes.
    pub system_prompt_override: bool,
    /// Hash of the last assembled stable system prompt. Used to avoid
    /// replacing `system_prompt` when unchanged.
@@ -82,6 +82,14 @@ pub struct Session {
    /// request of the session; verified against the current system+tool
    /// state before every subsequent request. None until the first turn.
    pub frozen_prefix: Option<FrozenPrefix>,
+
+    /// Monotonic counter bumped on every direct mutation of `messages`.
+    /// Consumed by [`crate::core::engine::token_estimate_cache::TokenEstimateCache`]
+    /// to memoize the per-turn token estimate without re-walking the message
+    /// list. Defaults to 0; bumped in [`Session::add_message`],
+    /// [`Session::replace_messages`], and at every other mutation site in
+    /// `core/engine.rs` / `core/engine/capacity_flow.rs`.
+    pub messages_revision: u64,
 }

 /// Cumulative usage statistics for a session.
@@ -155,12 +163,33 @@ impl Session {
            working_set: WorkingSet::default(),
            prefix_stability: None,
            frozen_prefix: None,
+            messages_revision: 0,
        }
    }

    /// Add a message to the conversation
    pub fn add_message(&mut self, message: Message) {
        self.messages.push(message);
+        self.messages_revision = self.messages_revision.saturating_add(1);
+    }
+
+    /// Replace the entire message history. Used by session resume and
+    /// capacity interventions. Bumps `messages_revision` exactly once even
+    /// when the new history has a different length, so downstream caches
+    /// invalidate atomically.
+    #[allow(dead_code)]
+    pub fn replace_messages(&mut self, messages: Vec<Message>) {
+        self.messages = messages;
+        self.messages_revision = self.messages_revision.saturating_add(1);
+    }
+
+    /// Bump `messages_revision` without touching the message list itself.
+    /// Reserved for sites that have mutated the message list in place (e.g. an
+    /// in-place rewrite of a content block). Most call sites do not need
+    /// this — prefer [`add_message`](Self::add_message) and
+    /// [`replace_messages`](Self::replace_messages).
+    pub fn bump_messages_revision(&mut self) {
+        self.messages_revision = self.messages_revision.saturating_add(1);
    }

    /// Rebuild the working set from current messages (best effort).
@@ -42,19 +42,20 @@ pub fn report(model: &str, usage: &Usage) {
    if !cost.is_positive() {
        return;
    }
-    if let Ok(mut pending) = cell().lock() {
-        pending.usd += cost.usd;
-        pending.cny += cost.cny;
-    }
+    // Recover from poisoned lock — a previous holder panicked but the
+    // accumulated data is still valid.
+    let mut pending = cell().lock().unwrap_or_else(|e| e.into_inner());
+    pending.usd += cost.usd;
+    pending.cny += cost.cny;
 }

 /// Drain the pending cost. Returns the accumulated amount and resets
 /// the pool to zero. Called by the TUI render / event loop on each
 /// frame; any non-zero result gets folded into `accrue_subagent_cost_estimate`.
 pub fn drain() -> CostEstimate {
-    let Ok(mut pending) = cell().lock() else {
-        return CostEstimate::default();
-    };
+    // Recover from poisoned lock — a previous holder panicked but the
+    // accumulated data is still valid.
+    let mut pending = cell().lock().unwrap_or_else(|e| e.into_inner());
    std::mem::take(&mut *pending)
 }

@@ -63,9 +64,8 @@ pub fn drain() -> CostEstimate {
 /// state. Production code should always use [`drain`].
 #[cfg(test)]
 pub fn reset_for_tests() {
-    if let Ok(mut pending) = cell().lock() {
-        *pending = CostEstimate::default();
-    }
+    let mut pending = cell().lock().unwrap_or_else(|e| e.into_inner());
+    *pending = CostEstimate::default();
 }

 #[cfg(test)]
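
The poisoned-lock recovery used in `report` and `drain` can be shown in isolation. This is a minimal sketch using only `std`; the `add_after_poison` helper and the `u64` counter are hypothetical stand-ins for the cost pool. A thread panics while holding the guard, which poisons the mutex, and the later caller still reads and updates the value through `PoisonError::into_inner`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons a mutex on purpose, then keeps accumulating into it.
fn add_after_poison() -> u64 {
    let pool = Arc::new(Mutex::new(10u64));

    // A worker panics while holding the guard; this marks the mutex poisoned.
    let poisoner = Arc::clone(&pool);
    let _ = thread::spawn(move || {
        let _guard = poisoner.lock().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();

    // `lock()` now returns `Err(PoisonError)`. `into_inner` hands back the
    // guard so the data written before the panic stays usable.
    let mut pending = pool.lock().unwrap_or_else(|e| e.into_inner());
    *pending += 5;
    *pending
}
```

This recovery is only appropriate when the protected data cannot be left half-updated by a panic, as with a pair of additive counters.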
@@ -182,6 +182,7 @@ impl Theme {
        match status {
            ToolStatus::Running => self.tool_running_accent,
            ToolStatus::Success => self.tool_success_accent,
+            ToolStatus::Hydrated => self.tool_running_accent,
            ToolStatus::Failed => self.tool_failed_accent,
        }
    }
@@ -278,6 +279,10 @@ mod tests {
            theme.tool_status_color(ToolStatus::Success),
            theme.tool_success_accent
        );
+        assert_eq!(
+            theme.tool_status_color(ToolStatus::Hydrated),
+            theme.tool_running_accent
+        );
        assert_eq!(
            theme.tool_status_color(ToolStatus::Failed),
            theme.tool_failed_accent
@@ -234,12 +234,12 @@ impl From<LlmError> for ErrorEnvelope {
                "llm_timeout",
                format!("Request timed out after {duration:?}"),
            ),
-            LlmError::AuthenticationError(message) => Self::new(
+            LlmError::AuthenticationError(auth) => Self::new(
                ErrorCategory::Authentication,
                ErrorSeverity::Critical,
                false,
                "llm_auth_error",
-                message,
+                auth.to_user_message(),
            ),
            LlmError::AuthorizationError(message) => Self::new(
                ErrorCategory::Authorization,
@@ -342,6 +342,10 @@ pub fn classify_error_message(message: &str) -> ErrorCategory {
    if lower.contains("network")
        || lower.contains("connection")
        || lower.contains("dns")
+        || lower.contains("stream read error")
+        || lower.contains("error decoding response body")
+        || lower.contains("chunk decode error")
+        || lower.contains("body decode")
        || lower.contains("temporarily unavailable")
        || lower.contains(" 502 ")
        || lower.contains(" 503 ")
@@ -548,6 +552,22 @@ mod tests {
        );
    }

+    #[test]
+    fn network_catches_stream_body_decode_failures() {
+        for msg in [
+            "Warn Stream read error: error decoding response body",
+            "Stream read error: error decoding response body",
+            "chunk decode error",
+            "provider body decode failed mid-stream",
+        ] {
+            assert_eq!(
+                classify(msg),
+                ErrorCategory::Network,
+                "expected Network for `{msg}`",
+            );
+        }
+    }
+
    #[test]
    fn authentication_beats_authorization_when_api_key_phrasing_is_used() {
        // "api key" landing on Authentication (not Authorization) keeps
@@ -566,6 +586,35 @@ mod tests {
        }
    }

+    #[test]
+    fn llm_auth_error_envelope_renders_context_without_secret() {
+        let api_key = "tp-secret-token-plan-value";
+        let env = ErrorEnvelope::from(LlmError::from_http_response_with_request_context(
+            401,
+            &format!("Invalid API Key: {api_key}"),
+            Some("Xiaomi MiMo"),
+            Some("https://token-plan-sgp.xiaomimimo.com/v1"),
+            Some("mimo-v2.5"),
+            Some("env"),
+            Some(api_key),
+        ));
+
+        assert_eq!(env.category, ErrorCategory::Authentication);
+        assert_eq!(env.severity, ErrorSeverity::Critical);
+        assert!(!env.recoverable);
+        assert!(env.message.contains("provider: Xiaomi MiMo"));
+        assert!(
+            env.message
+                .contains("base URL authority: token-plan-sgp.xiaomimimo.com")
+        );
+        assert!(env.message.contains("model: mimo-v2.5"));
+        assert!(env.message.contains("key source: env"));
+        assert!(env.message.contains("key fingerprint: tp-... (len=26)"));
+        assert!(env.message.contains("key type: Xiaomi MiMo Token Plan key"));
+        assert!(!env.message.contains(api_key));
+        assert!(!env.message.contains("secret-token-plan-value"));
+    }
+
    #[test]
    fn authorization_catches_forbidden_and_denied() {
        for msg in [
@@ -1,3 +1,4 @@
+#[cfg(not(target_env = "ohos"))]
 use starlark::Error as StarlarkError;
 use thiserror::Error;

@@ -23,6 +24,9 @@ pub enum Error {
    },
    #[error("expected example to not match rule `{rule}`: {example}")]
    ExampleDidMatch { rule: String, example: String },
+    #[error("{0}")]
+    UnsupportedPlatform(String),
    #[error("starlark error: {0}")]
+    #[cfg(not(target_env = "ohos"))]
    Starlark(StarlarkError),
 }
@@ -6,7 +6,10 @@ pub mod decision;
 pub mod error;
 pub mod execpolicycheck;
 pub mod matcher;
+#[cfg(not(target_env = "ohos"))]
 pub mod parser;
+#[cfg(target_env = "ohos")]
+pub mod parser_ohos;
 pub mod policy;
 pub mod rule;
 pub mod rules;
@@ -17,7 +20,10 @@ pub use decision::Decision;
 pub use error::Error;
 pub use error::Result;
 pub use execpolicycheck::ExecPolicyCheckCommand;
+#[cfg(not(target_env = "ohos"))]
 pub use parser::PolicyParser;
+#[cfg(target_env = "ohos")]
+pub use parser_ohos::PolicyParser;
 pub use policy::Evaluation;
 pub use policy::Policy;
 pub use rule::Rule;
@@ -0,0 +1,26 @@
+use super::error::Error;
+use super::error::Result;
+
+pub struct PolicyParser;
+
+impl Default for PolicyParser {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl PolicyParser {
+    pub fn new() -> Self {
+        Self
+    }
+
+    pub fn parse(&mut self, _policy_identifier: &str, _policy_file_contents: &str) -> Result<()> {
+        Err(Error::UnsupportedPlatform(
+            "Starlark execpolicy files are not supported on HarmonyOS/OpenHarmony yet because upstream starlark-rust still depends on a rustyline/nix chain that does not compile for OHOS.".to_string(),
+        ))
+    }
+
+    pub fn build(self) -> super::policy::Policy {
+        super::policy::Policy::empty()
+    }
+}
@@ -7,6 +7,7 @@
 //! - Mode changes
 //! - Message submission
 //! - Error events
+//! - Turn completion
 //!
 //! Configuration is done via `[[hooks.hooks]]` in config.toml.

@@ -41,6 +42,8 @@ pub enum HookEvent {
    ModeChange,
    /// Triggered when an error occurs
    OnError,
+    /// Triggered after a turn completes and post-turn state has been updated
+    TurnEnd,
    /// Triggered when a sub-agent is spawned
    SubagentSpawn,
    /// Triggered when a sub-agent reaches a terminal state
@@ -66,6 +69,7 @@ impl HookEvent {
            HookEvent::ToolCallAfter => "tool_call_after",
            HookEvent::ModeChange => "mode_change",
            HookEvent::OnError => "on_error",
+            HookEvent::TurnEnd => "turn_end",
            HookEvent::SubagentSpawn => "subagent_spawn",
            HookEvent::SubagentComplete => "subagent_complete",
            HookEvent::ShellEnv => "shell_env",
@@ -480,6 +484,28 @@ enum MessageSubmitStdout {
    Invalid(String),
 }

+/// Post-turn accumulated totals included in the `turn_end` observer payload.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct TurnEndTotals {
+    pub session_tokens: u32,
+    pub conversation_tokens: u32,
+    pub input_tokens: u32,
+    pub output_tokens: u32,
+}
+
+/// Input used to build the structured `turn_end` observer payload.
+pub struct TurnEndPayloadInput<'a> {
+    pub context: &'a HookContext,
+    pub turn_id: Option<&'a str>,
+    pub status: &'a str,
+    pub error: Option<&'a str>,
+    pub duration: Duration,
+    pub usage: &'a crate::models::Usage,
+    pub totals: TurnEndTotals,
+    pub tool_count: usize,
+    pub queued_message_count: usize,
+}
+
 /// Executor for running hooks
 #[derive(Debug, Clone)]
 pub struct HookExecutor {
@@ -1051,7 +1077,7 @@ impl HookExecutor {
        let env = env_vars.clone();
        let wd = working_dir.clone();

-        // Spawn in a detached thread
+        // Spawn in a detached thread (fire-and-forget hook execution).
        std::thread::spawn(move || {
            let mut command = HookExecutor::build_shell_command(&cmd);
            command
@@ -1121,6 +1147,41 @@ fn message_submit_payload(context: &HookContext, text: &str) -> serde_json::Valu
    })
 }

+pub fn turn_end_payload(input: TurnEndPayloadInput<'_>) -> serde_json::Value {
+    json!({
+        "event": HookEvent::TurnEnd.as_str(),
+        "session_id": input.context.session_id.as_deref(),
+        "workspace": input.context.workspace.as_ref().map(|path| path.display().to_string()),
+        "mode": input.context.mode.as_deref(),
+        "model": input.context.model.as_deref(),
+        "turn_id": input.turn_id,
+        "status": input.status,
+        "error": input.error,
+        "duration_ms": duration_ms_saturating(input.duration),
+        "usage": {
+            "input_tokens": input.usage.input_tokens,
+            "output_tokens": input.usage.output_tokens,
+            "prompt_cache_hit_tokens": input.usage.prompt_cache_hit_tokens,
+            "prompt_cache_miss_tokens": input.usage.prompt_cache_miss_tokens,
+            "reasoning_tokens": input.usage.reasoning_tokens,
+            "reasoning_replay_tokens": input.usage.reasoning_replay_tokens,
+        },
+        "totals": {
+            "session_tokens": input.totals.session_tokens,
+            "conversation_tokens": input.totals.conversation_tokens,
+            "input_tokens": input.totals.input_tokens,
+            "output_tokens": input.totals.output_tokens,
+        },
+        "tool_count": input.tool_count,
+        "queued_message_count": input.queued_message_count,
+        "stop_hook_active": false,
+    })
+}
+
+fn duration_ms_saturating(duration: Duration) -> u64 {
+    u64::try_from(duration.as_millis()).unwrap_or(u64::MAX)
+}
+
 fn parse_message_submit_stdout(stdout: &str) -> MessageSubmitStdout {
    let trimmed = stdout.trim();
    if trimmed.is_empty() {
@@ -1343,10 +1404,70 @@ NOEQUAL line dropped
        assert_eq!(HookEvent::SessionStart.as_str(), "session_start");
        assert_eq!(HookEvent::ToolCallAfter.as_str(), "tool_call_after");
        assert_eq!(HookEvent::ModeChange.as_str(), "mode_change");
+        assert_eq!(HookEvent::TurnEnd.as_str(), "turn_end");
        assert_eq!(HookEvent::SubagentSpawn.as_str(), "subagent_spawn");
        assert_eq!(HookEvent::SubagentComplete.as_str(), "subagent_complete");
    }

+    #[test]
+    fn turn_end_payload_contains_post_turn_observer_fields() {
+        let context = HookContext::new()
+            .with_session_id("sess_test")
+            .with_workspace(PathBuf::from("/tmp/codewhale"))
+            .with_mode("agent")
+            .with_model("deepseek-v4")
+            .with_tokens(125);
+        let usage = crate::models::Usage {
+            input_tokens: 40,
+            output_tokens: 9,
+            prompt_cache_hit_tokens: Some(10),
+            prompt_cache_miss_tokens: Some(30),
+            reasoning_tokens: Some(4),
+            reasoning_replay_tokens: Some(2),
+            server_tool_use: None,
+        };
+
+        let payload = super::turn_end_payload(TurnEndPayloadInput {
+            context: &context,
+            turn_id: Some("turn_123"),
+            status: "completed",
+            error: None,
+            duration: Duration::from_millis(321),
+            usage: &usage,
+            totals: TurnEndTotals {
+                session_tokens: 125,
+                conversation_tokens: 100,
+                input_tokens: 100,
+                output_tokens: 25,
+            },
+            tool_count: 2,
+            queued_message_count: 1,
+        });
+
+        assert_eq!(payload["event"], "turn_end");
+        assert_eq!(payload["session_id"], "sess_test");
+        assert_eq!(payload["workspace"], "/tmp/codewhale");
+        assert_eq!(payload["mode"], "agent");
+        assert_eq!(payload["model"], "deepseek-v4");
+        assert_eq!(payload["turn_id"], "turn_123");
+        assert_eq!(payload["status"], "completed");
+        assert_eq!(payload["error"], serde_json::Value::Null);
+        assert_eq!(payload["duration_ms"], 321);
+        assert_eq!(payload["usage"]["input_tokens"], 40);
+        assert_eq!(payload["usage"]["output_tokens"], 9);
+        assert_eq!(payload["usage"]["prompt_cache_hit_tokens"], 10);
+        assert_eq!(payload["usage"]["prompt_cache_miss_tokens"], 30);
+        assert_eq!(payload["usage"]["reasoning_tokens"], 4);
+        assert_eq!(payload["usage"]["reasoning_replay_tokens"], 2);
+        assert_eq!(payload["totals"]["session_tokens"], 125);
+        assert_eq!(payload["totals"]["conversation_tokens"], 100);
+        assert_eq!(payload["totals"]["input_tokens"], 100);
+        assert_eq!(payload["totals"]["output_tokens"], 25);
+        assert_eq!(payload["tool_count"], 2);
+        assert_eq!(payload["queued_message_count"], 1);
+        assert_eq!(payload["stop_hook_active"], false);
+    }
+
    #[test]
    fn test_hook_context_to_env_vars() {
        let ctx = HookContext::new()
@@ -1578,6 +1699,76 @@ cat > "{}"
        assert_eq!(captured["prompt_truncated"], false);
    }

+    #[cfg(not(windows))]
+    #[test]
+    fn turn_end_observer_hook_receives_stdin_json_and_ignores_stdout_contract() {
+        let dir = tempfile::tempdir().expect("tempdir");
+        let out = dir.path().join("turn_end.json");
+        let command = write_hook_script(
+            &dir,
+            "capture_turn_end.sh",
+            &format!(
+                r#"#!/bin/sh
+cat > "{}"
+printf '%s\n' '{{"text":"stdout is not a mutation contract"}}'
+"#,
+                out.display()
+            ),
+        );
+        let executor = HookExecutor::new(
+            HooksConfig {
+                enabled: true,
+                hooks: vec![Hook::new(HookEvent::TurnEnd, &command)],
+                ..Default::default()
+            },
+            dir.path().to_path_buf(),
+        );
+        let usage = crate::models::Usage {
+            input_tokens: 12,
+            output_tokens: 3,
+            prompt_cache_hit_tokens: None,
+            prompt_cache_miss_tokens: None,
+            reasoning_tokens: None,
+            reasoning_replay_tokens: None,
+            server_tool_use: None,
+        };
+        let context = submit_context(&dir).with_tokens(15);
+        let payload = super::turn_end_payload(TurnEndPayloadInput {
+            context: &context,
+            turn_id: Some("turn_observed"),
+            status: "completed",
+            error: None,
+            duration: Duration::from_millis(7),
+            usage: &usage,
+            totals: TurnEndTotals {
+                session_tokens: 15,
+                conversation_tokens: 15,
+                input_tokens: 12,
+                output_tokens: 3,
+            },
+            tool_count: 0,
+            queued_message_count: 0,
+        });
+
+        let results = executor.execute_json_observer(HookEvent::TurnEnd, &context, &payload);
+
+        assert_eq!(results.len(), 1);
+        assert!(results[0].success);
+        assert!(
+            results[0]
+                .stdout
+                .contains("stdout is not a mutation contract"),
+            "stdout is still captured for diagnostics"
+        );
+        let captured: serde_json::Value =
+            serde_json::from_str(&std::fs::read_to_string(out).expect("payload written"))
+                .expect("valid JSON payload");
+        assert_eq!(captured["event"], "turn_end");
+        assert_eq!(captured["turn_id"], "turn_observed");
+        assert_eq!(captured["totals"]["input_tokens"], 12);
+        assert_eq!(captured["totals"]["output_tokens"], 3);
+    }
+
    #[cfg(not(windows))]
    #[test]
    fn json_observer_hook_failure_does_not_stop_later_hooks() {
@@ -1912,6 +2103,7 @@ exit 7
            HookEvent::ToolCallAfter,
            HookEvent::ModeChange,
            HookEvent::OnError,
+            HookEvent::TurnEnd,
            HookEvent::SubagentSpawn,
            HookEvent::SubagentComplete,
        ] {
@@ -82,6 +82,194 @@ pub trait RetryConfigurable {
    fn set_retry_config(&mut self, config: RetryConfig);
 }

+// === Authentication diagnostics ===
+
+#[derive(Debug, Clone, PartialEq, Eq, Default)]
+pub struct AuthenticationErrorContext {
+    pub provider: Option<String>,
+    pub base_url_authority: Option<String>,
+    pub model: Option<String>,
+    pub key_source: Option<String>,
+    pub key_fingerprint: Option<String>,
+    pub key_kind: Option<String>,
+}
+
+impl AuthenticationErrorContext {
+    #[must_use]
+    pub fn new(
+        provider: &str,
+        base_url: &str,
+        model: &str,
+        key_source: &str,
+        api_key: &str,
+    ) -> Self {
+        Self::from_parts(
+            Some(provider),
+            Some(base_url),
+            Some(model),
+            Some(key_source),
+            Some(api_key),
+        )
+    }
+
+    #[must_use]
+    pub fn from_parts(
+        provider: Option<&str>,
+        base_url: Option<&str>,
+        model: Option<&str>,
+        key_source: Option<&str>,
+        api_key: Option<&str>,
+    ) -> Self {
+        let api_key = api_key.and_then(non_empty_trimmed);
+        Self {
+            provider: provider.and_then(non_empty_trimmed).map(str::to_string),
+            base_url_authority: base_url.and_then(base_url_authority),
+            model: model.and_then(non_empty_trimmed).map(str::to_string),
+            key_source: key_source.and_then(non_empty_trimmed).map(str::to_string),
+            key_fingerprint: api_key.map(redacted_key_fingerprint),
+            key_kind: api_key.map(classify_api_key_prefix).map(str::to_string),
+        }
+    }
+
+    fn is_empty(&self) -> bool {
+        self.provider.is_none()
+            && self.base_url_authority.is_none()
+            && self.model.is_none()
+            && self.key_source.is_none()
+            && self.key_fingerprint.is_none()
+            && self.key_kind.is_none()
+    }
+
+    fn detail_segments(&self) -> Vec<String> {
+        let mut segments = Vec::new();
+        if let Some(provider) = self.provider.as_deref() {
+            segments.push(format!("provider: {provider}"));
+        }
+        if let Some(authority) = self.base_url_authority.as_deref() {
+            segments.push(format!("base URL authority: {authority}"));
+        }
+        if let Some(model) = self.model.as_deref() {
+            segments.push(format!("model: {model}"));
+        }
+        if let Some(source) = self.key_source.as_deref() {
+            segments.push(format!("key source: {source}"));
+        }
+        if let Some(fingerprint) = self.key_fingerprint.as_deref() {
+            segments.push(format!("key fingerprint: {fingerprint}"));
+        }
+        if let Some(kind) = self.key_kind.as_deref() {
+            segments.push(format!("key type: {kind}"));
+        }
+        segments
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct AuthenticationErrorDetail {
+    message: String,
+    context: Option<AuthenticationErrorContext>,
+}
+
+impl AuthenticationErrorDetail {
+    #[must_use]
+    pub fn new(message: impl Into<String>) -> Self {
+        Self {
+            message: message.into(),
+            context: None,
+        }
+    }
+
+    #[must_use]
+    pub fn with_context(
+        message: impl Into<String>,
+        context: Option<AuthenticationErrorContext>,
+    ) -> Self {
+        let context = context.filter(|context| !context.is_empty());
+        Self {
+            message: message.into(),
+            context,
+        }
+    }
+
+    #[must_use]
+    pub fn message(&self) -> &str {
+        &self.message
+    }
+
+    #[must_use]
+    pub fn to_user_message(&self) -> String {
+        let Some(context) = self.context.as_ref() else {
+            return self.message.clone();
+        };
+        let segments = context.detail_segments();
+        if segments.is_empty() {
+            self.message.clone()
+        } else {
+            format!("{} ({})", self.message, segments.join(", "))
+        }
+    }
+}
+
+impl From<String> for AuthenticationErrorDetail {
+    fn from(message: String) -> Self {
+        Self::new(message)
+    }
+}
+
+impl From<&str> for AuthenticationErrorDetail {
+    fn from(message: &str) -> Self {
+        Self::new(message)
+    }
+}
+
+#[must_use]
+pub fn classify_api_key_prefix(api_key: &str) -> &'static str {
+    if api_key.starts_with("tp-") {
+        "Xiaomi MiMo Token Plan key"
+    } else {
+        "API key"
+    }
+}
+
+fn non_empty_trimmed(value: &str) -> Option<&str> {
+    let value = value.trim();
+    if value.is_empty() { None } else { Some(value) }
+}
+
+fn base_url_authority(base_url: &str) -> Option<String> {
+    let base_url = non_empty_trimmed(base_url)?;
+    let without_scheme = base_url
+        .split_once("://")
+        .map_or(base_url, |(_, rest)| rest);
+    let authority = without_scheme.split('/').next().unwrap_or(without_scheme);
+    let authority = authority
+        .rsplit_once('@')
+        .map_or(authority, |(_, authority)| authority);
+    non_empty_trimmed(authority).map(str::to_string)
+}
+
+fn redacted_key_fingerprint(api_key: &str) -> String {
+    let api_key = api_key.trim();
+    let len = api_key.chars().count();
+    match public_key_prefix(api_key) {
+        Some(prefix) => format!("{prefix}... (len={len})"),
+        None => format!("unprefixed (len={len})"),
+    }
+}
+
+fn public_key_prefix(api_key: &str) -> Option<&str> {
+    ["tp-", "sk-", "hf_", "hf-", "ak-", "rk-"]
+        .into_iter()
+        .find(|prefix| api_key.starts_with(prefix))
+}
+
+fn redact_api_key_from_message(message: &str, api_key: Option<&str>) -> String {
+    let Some(api_key) = api_key.and_then(non_empty_trimmed) else {
+        return message.to_string();
+    };
+    message.replace(api_key, "[redacted API key]")
+}
+
 // === LlmError - Classified Error Types ===

 /// Classified LLM errors with retryability information.
@@ -107,8 +295,8 @@ pub enum LlmError {
    /// Request timed out
    Timeout(Duration),

-    /// Authentication failed (HTTP 401, 403)
-    AuthenticationError(String),
+    /// Authentication failed (HTTP 401, or HTTP 403 when the body looks like an authentication failure)
+    AuthenticationError(AuthenticationErrorDetail),

    /// Authorization or provider-side blocking failed (HTTP 403)
    AuthorizationError(String),
@@ -141,7 +329,9 @@ impl std::fmt::Display for LlmError {
            }
            LlmError::NetworkError(msg) => write!(f, "Network error: {msg}"),
            LlmError::Timeout(d) => write!(f, "Request timed out after {d:?}"),
-            LlmError::AuthenticationError(msg) => write!(f, "Authentication failed: {msg}"),
+            LlmError::AuthenticationError(auth) => {
+                write!(f, "Authentication failed: {}", auth.to_user_message())
+            }
            LlmError::AuthorizationError(msg) => write!(f, "Authorization failed: {msg}"),
            LlmError::InvalidRequest { status, message } => {
                write!(f, "Invalid request ({status}): {message}")
@@ -203,10 +393,10 @@ impl LlmError {
                message: body.to_string(),
                retry_after: None,
            },
-            401 => LlmError::AuthenticationError(body.to_string()),
+            401 => Self::authentication_error(body),
            403 => {
                if looks_like_authentication_failure(body) {
-                    LlmError::AuthenticationError(body.to_string())
+                    Self::authentication_error(body)
                } else {
                    LlmError::AuthorizationError(body.to_string())
                }
@@ -262,6 +452,62 @@ impl LlmError {
        }
    }

+    #[must_use]
+    pub fn authentication_error(message: impl Into<String>) -> Self {
+        LlmError::AuthenticationError(AuthenticationErrorDetail::new(message))
+    }
+
+    #[must_use]
+    pub fn authentication_error_with_context(
+        message: impl Into<String>,
+        context: Option<AuthenticationErrorContext>,
+    ) -> Self {
+        LlmError::AuthenticationError(AuthenticationErrorDetail::with_context(message, context))
+    }
+
+    /// Constructs an `LlmError` from HTTP response data plus request context
+    /// that is safe to display when authentication fails.
+    #[must_use]
+    pub fn from_http_response_with_request_context(
+        status: u16,
+        body: &str,
+        provider: Option<&str>,
+        base_url: Option<&str>,
+        model: Option<&str>,
+        key_source: Option<&str>,
+        api_key: Option<&str>,
+    ) -> Self {
+        let body = redact_api_key_from_message(body, api_key);
+        let context =
+            AuthenticationErrorContext::from_parts(provider, base_url, model, key_source, api_key);
+        Self::from_http_response_with_auth_context(status, &body, Some(context))
+    }
+
+    /// Constructs an `LlmError` from HTTP status code and response body, with
+    /// optional structured details for authentication failures.
+    ///
+    /// The `body` passed here must already be safe for user display. Prefer
+    /// [`Self::from_http_response_with_request_context`] when the raw API key is
+    /// available so the response body can be redacted before rendering.
+    #[must_use]
+    pub fn from_http_response_with_auth_context(
+        status: u16,
+        body: &str,
+        auth_context: Option<AuthenticationErrorContext>,
+    ) -> Self {
+        match status {
+            401 => Self::authentication_error_with_context(body, auth_context),
+            403 => {
+                if looks_like_authentication_failure(body) {
+                    Self::authentication_error_with_context(body, auth_context)
+                } else {
+                    LlmError::AuthorizationError(body.to_string())
+                }
+            }
+            _ => Self::from_http_response(status, body),
+        }
+    }
+
    /// Constructs an `LlmError` from HTTP status code, body, and optional Retry-After header.
    pub fn from_http_response_with_retry_after(
        status: u16,
@@ -898,6 +1144,13 @@ mod tests {
        );
    }

+    fn auth_user_message(error: LlmError) -> String {
+        match error {
+            LlmError::AuthenticationError(auth) => auth.to_user_message(),
+            other => panic!("expected authentication error, got {other}"),
+        }
+    }
+
    #[test]
    fn test_retry_config_defaults() {
        let config = RetryConfig::default();
@@ -1014,7 +1267,7 @@ mod tests {
        assert!(LlmError::Timeout(Duration::from_secs(30)).is_retryable());

        // Non-retryable errors
-        assert!(!LlmError::AuthenticationError("invalid key".to_string()).is_retryable());
+        assert!(!LlmError::authentication_error("invalid key").is_retryable());
        assert!(!LlmError::AuthorizationError("blocked".to_string()).is_retryable());
        assert!(
            !LlmError::InvalidRequest {
@@ -1071,6 +1324,109 @@ mod tests {
        assert!(matches!(err, LlmError::InvalidRequest { status: 400, .. }));
    }

+    #[test]
+    fn auth_error_with_context_includes_provider_authority_model_and_key_source() {
+        let err = LlmError::from_http_response_with_request_context(
+            401,
+            "Invalid API Key",
+            Some("Xiaomi MiMo"),
+            Some("https://token-plan-sgp.xiaomimimo.com/v1"),
+            Some("mimo-v2.5"),
+            Some("env"),
+            Some("tp-secret-token-plan-value"),
+        );
+        let message = auth_user_message(err);
+
+        assert!(message.contains("Invalid API Key"));
+        assert!(message.contains("provider: Xiaomi MiMo"));
+        assert!(message.contains("base URL authority: token-plan-sgp.xiaomimimo.com"));
+        assert!(message.contains("model: mimo-v2.5"));
+        assert!(message.contains("key source: env"));
+        assert!(message.contains("key fingerprint: tp-... (len=26)"));
+    }
+
+    #[test]
+    fn auth_error_redacts_full_api_key_from_body_and_context() {
+        let api_key = "tp-secret-token-plan-value";
+        let err = LlmError::from_http_response_with_request_context(
+            401,
+            &format!("Invalid API Key: {api_key}"),
+            Some("Xiaomi MiMo"),
+            Some("https://token-plan-sgp.xiaomimimo.com/v1"),
+            Some("mimo-v2.5"),
+            Some("config-file"),
+            Some(api_key),
+        );
+        let message = auth_user_message(err);
+
+        assert!(!message.contains(api_key));
+        assert!(!message.contains("secret-token-plan-value"));
+        assert!(message.contains("[redacted API key]"));
+        assert!(message.contains("key fingerprint: tp-... (len=26)"));
+    }
+
+    #[test]
+    fn auth_error_classifies_xiaomi_token_plan_key_prefix() {
+        let token_plan = AuthenticationErrorContext::from_parts(
+            None,
+            None,
+            None,
+            Some("session"),
+            Some("tp-secret-token-plan-value"),
+        );
+        let generic = AuthenticationErrorContext::from_parts(
+            None,
+            None,
+            None,
+            Some("session"),
+            Some("sk-other"),
+        );
+        let unprefixed = AuthenticationErrorContext::from_parts(
+            None,
+            None,
+            None,
+            Some("session"),
+            Some("plainsecretvalue"),
+        );
+
+        assert_eq!(
+            token_plan.key_kind.as_deref(),
+            Some("Xiaomi MiMo Token Plan key")
+        );
+        assert_eq!(generic.key_kind.as_deref(), Some("API key"));
+        assert_eq!(unprefixed.key_kind.as_deref(), Some("API key"));
+        assert_eq!(
+            unprefixed.key_fingerprint.as_deref(),
+            Some("unprefixed (len=16)")
+        );
+    }
+
+    #[test]
+    fn authorization_403_is_not_reclassified_by_auth_context() {
+        let err = LlmError::from_http_response_with_request_context(
+            403,
+            "forbidden",
+            Some("Arcee AI"),
+            Some("https://api.arcee.ai/v1"),
+            Some("auto"),
+            Some("env"),
+            Some("sk-arcee-secret"),
+        );
+
+        assert!(matches!(err, LlmError::AuthorizationError(_)));
+    }
+
+    #[test]
+    fn auth_error_without_context_preserves_bare_message() {
+        let err = LlmError::from_http_response_with_auth_context(
+            401,
+            "Invalid API Key",
+            Some(AuthenticationErrorContext::default()),
+        );
+
+        assert_eq!(auth_user_message(err), "Invalid API Key");
+    }
+
    #[test]
    fn cloudflare_html_error_is_summarized_without_raw_markup() {
        let body = r#"<!DOCTYPE html><html><head><title>Access Denied</title><style>
@@ -1247,7 +1603,7 @@ mod tests {
            &config,
            || {
                call_count += 1;
-                async { Err(LlmError::AuthenticationError("bad key".to_string())) }
+                async { Err(LlmError::authentication_error("bad key")) }
            },
            None,
        )
@@ -0,0 +1,239 @@
+//! Small in-process cache for deterministic non-streaming chat responses.
+
+use std::num::NonZeroUsize;
+use std::sync::{Mutex, OnceLock};
+
+use lru::LruCache;
+use sha2::{Digest, Sha256};
+
+use crate::models::{MessageRequest, MessageResponse, Usage};
+
+const DEFAULT_CAPACITY: usize = 256;
+
+static RESPONSE_CACHE: OnceLock<ResponseCache> = OnceLock::new();
+
+pub(crate) fn response_cache() -> &'static ResponseCache {
+    RESPONSE_CACHE.get_or_init(ResponseCache::new)
+}
+
+pub(crate) fn request_is_cacheable(request: &MessageRequest) -> bool {
+    request.stream != Some(true)
+        && request.tools.as_ref().is_none_or(Vec::is_empty)
+        && request.tool_choice.is_none()
+        && request.temperature == Some(0.0)
+        && request.top_p.is_none_or(|top_p| top_p == 1.0)
+}
+
+pub(crate) struct ResponseCache {
+    inner: Mutex<LruCache<[u8; 32], MessageResponse>>,
+}
+
+impl ResponseCache {
+    fn new() -> Self {
+        Self::with_capacity(NonZeroUsize::new(DEFAULT_CAPACITY).expect("non-zero capacity"))
+    }
+
+    fn with_capacity(capacity: NonZeroUsize) -> Self {
+        Self {
+            inner: Mutex::new(LruCache::new(capacity)),
+        }
+    }
+
+    pub(crate) fn make_key(
+        provider: &str,
+        base_url: &str,
+        path_suffix: Option<&str>,
+        api_key: &str,
+        wire_body: &[u8],
+    ) -> [u8; 32] {
+        let mut hasher = Sha256::new();
+        update_field(&mut hasher, provider.as_bytes());
+        update_field(&mut hasher, base_url.as_bytes());
+        update_field(&mut hasher, path_suffix.unwrap_or("").as_bytes());
+        update_field(&mut hasher, &Sha256::digest(api_key.as_bytes()));
+        update_field(&mut hasher, wire_body);
+        hasher.finalize().into()
+    }
+
+    pub(crate) fn get(&self, key: &[u8; 32]) -> Option<MessageResponse> {
+        let mut cache = self.inner.lock().ok()?;
+        cache.get(key).cloned().map(|mut response| {
+            response.usage = Usage::default();
+            response
+        })
+    }
+
+    pub(crate) fn put(&self, key: [u8; 32], value: MessageResponse) {
+        if let Ok(mut cache) = self.inner.lock() {
+            cache.put(key, value);
+        }
+    }
+}
+
+fn update_field(hasher: &mut Sha256, bytes: &[u8]) {
+    hasher.update((bytes.len() as u64).to_le_bytes());
+    hasher.update(bytes);
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn response_with_usage(id: &str) -> MessageResponse {
+        MessageResponse {
+            id: id.to_string(),
+            r#type: "message".to_string(),
+            role: "assistant".to_string(),
+            content: Vec::new(),
+            model: "test-model".to_string(),
+            stop_reason: Some("end_turn".to_string()),
+            stop_sequence: None,
+            container: None,
+            usage: Usage {
+                input_tokens: 42,
+                output_tokens: 7,
+                prompt_cache_hit_tokens: Some(3),
+                prompt_cache_miss_tokens: Some(39),
+                reasoning_tokens: Some(5),
+                reasoning_replay_tokens: Some(2),
+                server_tool_use: None,
+            },
+        }
+    }
+
+    fn request() -> MessageRequest {
+        MessageRequest {
+            model: "test-model".to_string(),
+            messages: Vec::new(),
+            max_tokens: 16,
+            system: None,
+            tools: None,
+            tool_choice: None,
+            metadata: None,
+            thinking: None,
+            reasoning_effort: None,
+            stream: None,
+            temperature: Some(0.0),
+            top_p: None,
+        }
+    }
+
+    #[test]
+    fn cache_key_separates_provider_route_account_and_wire_body() {
+        let base = ResponseCache::make_key(
+            "deepseek",
+            "https://api.example.com/v1",
+            None,
+            "key-a",
+            br#"{"model":"m","messages":[]}"#,
+        );
+
+        assert_ne!(
+            base,
+            ResponseCache::make_key(
+                "openai",
+                "https://api.example.com/v1",
+                None,
+                "key-a",
+                br#"{"model":"m","messages":[]}"#
+            )
+        );
+        assert_ne!(
+            base,
+            ResponseCache::make_key(
+                "deepseek",
+                "https://proxy.example.com/v1",
+                None,
+                "key-a",
+                br#"{"model":"m","messages":[]}"#
+            )
+        );
+        assert_ne!(
+            base,
+            ResponseCache::make_key(
+                "deepseek",
+                "https://api.example.com/v1",
+                Some("responses"),
+                "key-a",
+                br#"{"model":"m","messages":[]}"#
+            )
+        );
+        assert_ne!(
+            base,
+            ResponseCache::make_key(
+                "deepseek",
+                "https://api.example.com/v1",
+                None,
+                "key-b",
+                br#"{"model":"m","messages":[]}"#
+            )
+        );
+        assert_ne!(
+            base,
+            ResponseCache::make_key(
+                "deepseek",
+                "https://api.example.com/v1",
+                None,
+                "key-a",
+                br#"{"model":"m","messages":[],"reasoning_effort":"high"}"#
+            )
+        );
+    }
+
+    #[test]
+    fn cache_hit_zeroes_usage_to_avoid_fake_spend() {
+        let cache = ResponseCache::with_capacity(NonZeroUsize::new(2).unwrap());
+        let key =
+            ResponseCache::make_key("deepseek", "https://api.example.com", None, "key", b"{}");
+
+        cache.put(key, response_with_usage("cached"));
+
+        let hit = cache.get(&key).expect("cache hit");
+        assert_eq!(hit.id, "cached");
+        assert_eq!(hit.usage, Usage::default());
+    }
+
+    #[test]
+    fn capacity_evicts_oldest_entry() {
+        let cache = ResponseCache::with_capacity(NonZeroUsize::new(2).unwrap());
+        let key1 =
+            ResponseCache::make_key("deepseek", "https://api.example.com", None, "key", b"one");
+        let key2 =
+            ResponseCache::make_key("deepseek", "https://api.example.com", None, "key", b"two");
+        let key3 =
+            ResponseCache::make_key("deepseek", "https://api.example.com", None, "key", b"three");
+
+        cache.put(key1, response_with_usage("one"));
+        cache.put(key2, response_with_usage("two"));
+        cache.put(key3, response_with_usage("three"));
+
+        assert!(cache.get(&key1).is_none());
+        assert!(cache.get(&key2).is_some());
+        assert!(cache.get(&key3).is_some());
+    }
+
+    #[test]
+    fn cacheability_requires_deterministic_tool_free_non_streaming_request() {
+        let mut req = request();
+        assert!(request_is_cacheable(&req));
+
+        req.temperature = None;
+        assert!(!request_is_cacheable(&req));
+
+        req = request();
+        req.temperature = Some(0.2);
+        assert!(!request_is_cacheable(&req));
+
+        req = request();
+        req.stream = Some(true);
+        assert!(!request_is_cacheable(&req));
+
+        req = request();
+        req.top_p = Some(0.5);
+        assert!(!request_is_cacheable(&req));
+
+        req = request();
+        req.tool_choice = Some(serde_json::json!("auto"));
+        assert!(!request_is_cacheable(&req));
+    }
+}
@@ -268,6 +268,7 @@ pub enum MessageId {
    CmdExitDescription,
    CmdExportDescription,
    CmdFeedbackDescription,
+    CmdHfDescription,
    CmdHelpDescription,
    CmdHomeDescription,
    CmdHooksDescription,
@@ -314,6 +315,7 @@ pub enum MessageId {
    CmdNewDescription,
    CmdSessionsDescription,
    CmdSettingsDescription,
+    CmdSidebarDescription,
    CmdSkillDescription,
    CmdSkillsDescription,
    CmdSlopDescription,
@@ -594,6 +596,7 @@ pub const ALL_MESSAGE_IDS: &[MessageId] = &[
    MessageId::CmdExitDescription,
    MessageId::CmdExportDescription,
    MessageId::CmdFeedbackDescription,
+    MessageId::CmdHfDescription,
    MessageId::CmdHelpDescription,
    MessageId::CmdHomeDescription,
    MessageId::CmdHooksDescription,
@@ -637,6 +640,7 @@ pub const ALL_MESSAGE_IDS: &[MessageId] = &[
    MessageId::CmdNewDescription,
    MessageId::CmdSessionsDescription,
    MessageId::CmdSettingsDescription,
+    MessageId::CmdSidebarDescription,
    MessageId::CmdSkillDescription,
    MessageId::CmdSkillsDescription,
    MessageId::CmdSlopDescription,
@@ -1121,6 +1125,7 @@ fn english(id: MessageId) -> &'static str {
        MessageId::CmdExitDescription => "Exit the application",
        MessageId::CmdExportDescription => "Export conversation to markdown",
        MessageId::CmdFeedbackDescription => "Generate a GitHub feedback URL",
+        MessageId::CmdHfDescription => "Inspect Hugging Face MCP setup and concepts",
        MessageId::CmdHelpDescription => "Show help information",
        MessageId::CmdHomeDescription => "Show home dashboard with stats and quick actions",
        MessageId::CmdHooksDescription => "List configured lifecycle hooks (read-only)",
@@ -1181,6 +1186,7 @@ fn english(id: MessageId) -> &'static str {
        MessageId::CmdNewDescription => "Start a fresh saved session",
        MessageId::CmdSessionsDescription => "Open session history picker",
        MessageId::CmdSettingsDescription => "Show persistent settings",
+        MessageId::CmdSidebarDescription => "Toggle or focus the right sidebar",
        MessageId::CmdSkillDescription => {
            "Activate a skill, or install/update/uninstall/trust a community skill"
        }
@@ -1587,6 +1593,7 @@ fn vietnamese(id: MessageId) -> Option<&'static str> {
        MessageId::CmdExitDescription => "Thoát ứng dụng",
        MessageId::CmdExportDescription => "Xuất cuộc trò chuyện sang định dạng Markdown",
        MessageId::CmdFeedbackDescription => "Tạo một URL để gửi phản hồi trên GitHub",
+        MessageId::CmdHfDescription => "Kiểm tra thiết lập và khái niệm Hugging Face MCP",
        MessageId::CmdHelpDescription => "Hiển thị thông tin trợ giúp",
        MessageId::CmdHomeDescription => {
            "Hiển thị bảng điều khiển trang chủ với số liệu thống kê và hành động nhanh"
@@ -1661,6 +1668,7 @@ fn vietnamese(id: MessageId) -> Option<&'static str> {
        MessageId::CmdNewDescription => "Bắt đầu một phiên lưu mới",
        MessageId::CmdSessionsDescription => "Mở bảng chọn lịch sử phiên làm việc",
        MessageId::CmdSettingsDescription => "Hiển thị các cài đặt liên tục",
+        MessageId::CmdSidebarDescription => "Bật/tắt hoặc chuyển tiêu điểm đến thanh bên phải",
        MessageId::CmdSkillDescription => {
            "Kích hoạt một kỹ năng, hoặc cài đặt/cập nhật/gỡ bỏ/tin cậy một kỹ năng cộng đồng"
        }
@@ -2138,6 +2146,7 @@ fn japanese(id: MessageId) -> Option<&'static str> {
        MessageId::CmdExitDescription => "アプリを終了",
        MessageId::CmdExportDescription => "会話を Markdown にエクスポート",
        MessageId::CmdFeedbackDescription => "GitHub フィードバック URL を生成",
+        MessageId::CmdHfDescription => "Hugging Face MCP の設定と概念を確認",
        MessageId::CmdHelpDescription => "ヘルプを表示",
        MessageId::CmdHomeDescription => "統計とクイックアクション付きのホームダッシュボードを表示",
        MessageId::CmdHooksDescription => {
@@ -2204,6 +2213,7 @@ fn japanese(id: MessageId) -> Option<&'static str> {
        MessageId::CmdNewDescription => "新しい保存済みセッションを開始",
        MessageId::CmdSessionsDescription => "セッション履歴ピッカーを開く",
        MessageId::CmdSettingsDescription => "永続化された設定を表示",
+        MessageId::CmdSidebarDescription => "右サイドバーの表示切り替えまたはフォーカス",
        MessageId::CmdSkillDescription => {
            "スキルを有効化、またはコミュニティスキルをインストール／更新／アンインストール／信頼"
        }
@@ -2588,6 +2598,7 @@ fn chinese_simplified(id: MessageId) -> Option<&'static str> {
        MessageId::CmdExitDescription => "退出应用",
        MessageId::CmdExportDescription => "将对话导出为 Markdown",
        MessageId::CmdFeedbackDescription => "生成 GitHub 反馈链接",
+        MessageId::CmdHfDescription => "检查 Hugging Face MCP 设置和概念",
        MessageId::CmdHelpDescription => "显示帮助信息",
        MessageId::CmdHomeDescription => "显示主页面板，含统计与快捷操作",
        MessageId::CmdHooksDescription => "列出已配置的生命周期钩子（只读）",
@@ -2645,6 +2656,7 @@ fn chinese_simplified(id: MessageId) -> Option<&'static str> {
        MessageId::CmdSessionsDescription => "打开会话历史选择器",
        MessageId::CmdSettingsDescription => "显示持久化设置",
        MessageId::CmdSkillDescription => "激活技能，或安装/更新/卸载/信任社区技能",
+        MessageId::CmdSidebarDescription => "切换或聚焦右侧边栏",
        MessageId::CmdSkillsDescription => {
            "列出本地技能（用 `/skills <prefix>` 按名称前缀过滤，--remote 浏览精选注册表）"
        }
@@ -3000,6 +3012,7 @@ fn portuguese_brazil(id: MessageId) -> Option<&'static str> {
        MessageId::CmdExitDescription => "Sair do aplicativo",
        MessageId::CmdExportDescription => "Exportar a conversa para markdown",
        MessageId::CmdFeedbackDescription => "Gerar uma URL de feedback no GitHub",
+        MessageId::CmdHfDescription => "Inspecionar configuração e conceitos do Hugging Face MCP",
        MessageId::CmdHelpDescription => "Exibir informações de ajuda",
        MessageId::CmdHomeDescription => "Exibir o painel inicial com estatísticas e ações rápidas",
        MessageId::CmdHooksDescription => {
@@ -3072,6 +3085,7 @@ fn portuguese_brazil(id: MessageId) -> Option<&'static str> {
        MessageId::CmdNewDescription => "Iniciar uma nova sessão salva",
        MessageId::CmdSessionsDescription => "Abrir seletor de histórico de sessões",
        MessageId::CmdSettingsDescription => "Exibir as configurações persistidas",
+        MessageId::CmdSidebarDescription => "Alternar ou focar a barra lateral direita",
        MessageId::CmdSkillDescription => {
            "Ativar uma skill, ou instalar/atualizar/desinstalar/confiar em uma skill da comunidade"
        }
@@ -3484,6 +3498,7 @@ fn spanish_latin_america(id: MessageId) -> Option<&'static str> {
        MessageId::CmdExitDescription => "Salir de la aplicación",
        MessageId::CmdExportDescription => "Exportar la conversación a markdown",
        MessageId::CmdFeedbackDescription => "Generar una URL de feedback en GitHub",
+        MessageId::CmdHfDescription => "Inspeccionar configuración y conceptos de Hugging Face MCP",
        MessageId::CmdHelpDescription => "Mostrar información de ayuda",
        MessageId::CmdHomeDescription => {
            "Mostrar el panel inicial con estadísticas y acciones rápidas"
@@ -3564,6 +3579,7 @@ fn spanish_latin_america(id: MessageId) -> Option<&'static str> {
        MessageId::CmdNewDescription => "Iniciar una nueva sesión guardada",
        MessageId::CmdSessionsDescription => "Abrir el selector de sesiones",
        MessageId::CmdSettingsDescription => "Mostrar las configuraciones persistidas",
+        MessageId::CmdSidebarDescription => "Alternar o enfocar la barra lateral derecha",
        MessageId::CmdSkillDescription => {
            "Activar una skill, o instalar/actualizar/desinstalar/confiar en una skill de la comunidad"
        }
@@ -27,6 +27,7 @@ mod compaction;
 mod composer_history;
 mod composer_stash;
 mod config;
+mod config_persistence;
 mod config_ui;
 mod core;
 mod cost_status;
@@ -39,18 +40,21 @@ mod features;
 mod handoff;
 mod hooks;
 mod llm_client;
+mod llm_response_cache;
 mod localization;
 mod logging;
 mod lsp;
 mod mcp;
 mod mcp_server;
 mod memory;
+mod model_routing;
 mod models;
 mod network_policy;
 mod palette;
 mod prefix_cache;
 mod pricing;
 mod project_context;
+mod project_context_cache;
 mod project_doc;
 mod prompt_zones;
 mod prompts;
@@ -77,6 +81,7 @@ mod task_manager;
 #[cfg(test)]
 mod test_support;
 mod theme_qa_audit;
+mod tls;
 mod tool_output_receipts;
 mod tools;
 mod tui;
@@ -109,6 +114,10 @@ fn configure_windows_console_utf8() {
 #[cfg(not(windows))]
 fn configure_windows_console_utf8() {}

+fn install_rustls_crypto_provider() {
+    crate::tls::ensure_rustls_crypto_provider();
+}
+
 #[derive(Parser, Debug)]
 #[command(
    name = "codewhale-tui",
@@ -846,6 +855,7 @@ enum SandboxCommand {
 #[tokio::main]
 async fn main() -> Result<()> {
    configure_windows_console_utf8();
+    install_rustls_crypto_provider();

    // ── Process hardening (#2183) ─────────────────────────────────────────
    // MUST run before Tokio is booted and before any threads are spawned.
@@ -1020,7 +1030,8 @@ async fn main() -> Result<()> {
            Commands::Eval(args) => run_eval(args),
            Commands::Mcp { command } => {
                let config = load_config_from_cli(&cli)?;
-                run_mcp_command(&config, command).await
+                let workspace = resolve_workspace(&cli);
+                run_mcp_command(&config, &workspace, command).await
            }
            Commands::Execpolicy(command) => {
                let config = load_config_from_cli(&cli)?;
@@ -1533,6 +1544,7 @@ fn mcp_template_json() -> Result<String> {
            command: Some("node".to_string()),
            args: vec!["./path/to/your-mcp-server.js".to_string()],
            env: std::collections::HashMap::new(),
+            cwd: None,
            url: None,
            transport: None,
            connect_timeout: None,
@@ -2071,14 +2083,21 @@ fn run_setup_status(config: &Config, workspace: &Path) -> Result<()> {
    println!("  · default_text_model: {model}");

    let mcp_path = config.mcp_config_path();
-    let mcp_count = match load_mcp_config(&mcp_path) {
+    let project_mcp_path = crate::mcp::workspace_mcp_config_path(workspace);
+    let mcp_count = match crate::mcp::load_config_with_workspace(&mcp_path, workspace) {
        Ok(cfg) => cfg.servers.len(),
        Err(_) => 0,
    };
    let mcp_present = if mcp_path.exists() { "" } else { "  (missing)" };
+    let project_mcp_present = if project_mcp_path.exists() {
+        ""
+    } else {
+        "  (missing)"
+    };
    println!(
-        "  · mcp servers: {mcp_count} at {}{mcp_present}",
-        mcp_path.display()
+        "  · mcp servers: {mcp_count} from {}{mcp_present} + {}{project_mcp_present}",
+        mcp_path.display(),
+        project_mcp_path.display()
    );

    let skills_dir = config.skills_dir();
@@ -2473,6 +2492,11 @@ async fn run_doctor(config: &Config, workspace: &Path, config_path_override: Opt
    println!("  · provider: {}", api_target.provider);
    println!("  · base_url: {}", api_target.base_url);
    println!("  · model: {}", api_target.model);
+    let tls_status = doctor_tls_status(config);
+    if !tls_status.certificate_verification {
+        println!("  ! {}", tls_status.message);
+        println!("    Prefer SSL_CERT_FILE with a trusted custom CA bundle when possible.");
+    }
    let strict_tool_mode = doctor_strict_tool_mode_status(config);
    let strict_icon = match strict_tool_mode.status {
        "ready" => "✓".truecolor(aqua_r, aqua_g, aqua_b),
@@ -2568,68 +2592,85 @@ async fn run_doctor(config: &Config, workspace: &Path, config_path_override: Opt
    }

    let mcp_config_path = config.mcp_config_path();
+    let project_mcp_config_path = crate::mcp::workspace_mcp_config_path(workspace);
    if mcp_config_path.exists() {
        println!(
            "  {} MCP config found at {}",
            "✓".truecolor(aqua_r, aqua_g, aqua_b),
            crate::utils::display_path(&mcp_config_path)
        );
-        match load_mcp_config(&mcp_config_path) {
-            Ok(cfg) if cfg.servers.is_empty() => {
-                println!("  {} 0 server(s) configured", "·".dimmed());
-            }
-            Ok(cfg) => {
-                println!(
-                    "  {} {} server(s) configured",
-                    "·".dimmed(),
-                    cfg.servers.len()
-                );
-                for (name, server) in &cfg.servers {
-                    let status = doctor_check_mcp_server(server);
-                    let icon = match status {
-                        McpServerDoctorStatus::Ok(ref detail) => {
-                            format!(
-                                "  {} {name}: {}",
-                                "✓".truecolor(aqua_r, aqua_g, aqua_b),
-                                detail
-                            )
-                        }
-                        McpServerDoctorStatus::Warning(ref detail) => {
-                            format!(
-                                "  {} {name}: {}",
-                                "!".truecolor(sky_r, sky_g, sky_b),
-                                detail
-                            )
-                        }
-                        McpServerDoctorStatus::Error(ref detail) => {
-                            format!(
-                                "  {} {name}: {}",
-                                "✗".truecolor(red_r, red_g, red_b),
-                                detail
-                            )
-                        }
-                    };
-                    println!("{icon}");
-                    if !server.enabled {
-                        println!("      (disabled)");
-                    }
-                }
-            }
-            Err(err) => {
-                println!(
-                    "  {} MCP config parse error: {}",
-                    "✗".truecolor(red_r, red_g, red_b),
-                    err
-                );
-            }
-        }
    } else {
        println!(
            "  {} MCP config not found at {}",
            "·".dimmed(),
            crate::utils::display_path(&mcp_config_path)
        );
-        println!("    Run `codewhale mcp init` or `codewhale setup --mcp`.");
+    }
+    if project_mcp_config_path.exists() {
+        println!(
+            "  {} Project MCP config found at {}",
+            "✓".truecolor(aqua_r, aqua_g, aqua_b),
+            crate::utils::display_path(&project_mcp_config_path)
+        );
+    } else {
+        println!(
+            "  {} Project MCP config not found at {}",
+            "·".dimmed(),
+            crate::utils::display_path(&project_mcp_config_path)
+        );
+    }
+
+    match crate::mcp::load_config_with_workspace(&mcp_config_path, workspace) {
+        Ok(cfg) if cfg.servers.is_empty() => {
+            println!("  {} 0 merged server(s) configured", "·".dimmed());
+            if !mcp_config_path.exists() && !project_mcp_config_path.exists() {
+                println!("    Run `codewhale mcp init` or add `.codewhale/mcp.json`.");
+            }
+        }
+        Ok(cfg) => {
+            println!(
+                "  {} {} merged server(s) configured",
+                "·".dimmed(),
+                cfg.servers.len()
+            );
+            for (name, server) in &cfg.servers {
+                let status = doctor_check_mcp_server(server);
+                let icon = match status {
+                    McpServerDoctorStatus::Ok(ref detail) => {
+                        format!(
+                            "  {} {name}: {}",
+                            "✓".truecolor(aqua_r, aqua_g, aqua_b),
+                            detail
+                        )
+                    }
+                    McpServerDoctorStatus::Warning(ref detail) => {
+                        format!(
+                            "  {} {name}: {}",
+                            "!".truecolor(sky_r, sky_g, sky_b),
+                            detail
+                        )
+                    }
+                    McpServerDoctorStatus::Error(ref detail) => {
+                        format!(
+                            "  {} {name}: {}",
+                            "✗".truecolor(red_r, red_g, red_b),
+                            detail
+                        )
+                    }
+                };
+                println!("{icon}");
+                if !server.enabled {
+                    println!("      (disabled)");
+                }
+            }
+        }
+        Err(err) => {
+            println!(
+                "  {} MCP config parse error: {}",
+                "✗".truecolor(red_r, red_g, red_b),
+                err
+            );
+        }
    }

    // Skills configuration
@@ -3137,8 +3178,10 @@ fn run_doctor_json(
    };

    let mcp_config_path = config.mcp_config_path();
+    let project_mcp_config_path = crate::mcp::workspace_mcp_config_path(workspace);
    let mcp_present = mcp_config_path.exists();
-    let mcp_summary = match load_mcp_config(&mcp_config_path) {
+    let project_mcp_present = project_mcp_config_path.exists();
+    let mcp_summary = match crate::mcp::load_config_with_workspace(&mcp_config_path, workspace) {
        Ok(cfg) => {
            let servers: Vec<serde_json::Value> = cfg
                .servers
@@ -3161,12 +3204,16 @@ fn run_doctor_json(
            json!({
                "config_path": mcp_config_path.display().to_string(),
                "present": mcp_present,
+                "project_config_path": project_mcp_config_path.display().to_string(),
+                "project_present": project_mcp_present,
                "servers": servers,
            })
        }
        Err(err) => json!({
            "config_path": mcp_config_path.display().to_string(),
            "present": mcp_present,
+            "project_config_path": project_mcp_config_path.display().to_string(),
+            "project_present": project_mcp_present,
            "servers": [],
            "error": err.to_string(),
        }),
@@ -3241,6 +3288,7 @@ fn run_doctor_json(
    });
    let api_target = doctor_api_target(config);
    let strict_tool_mode = doctor_strict_tool_mode_status(config);
+    let tls_status = doctor_tls_status(config);

    let report = json!({
        "version": env!("CARGO_PKG_VERSION"),
@@ -3259,6 +3307,12 @@ fn run_doctor_json(
            "message": strict_tool_mode.message,
            "recommended_base_url": strict_tool_mode.recommended_base_url,
        },
+        "tls": {
+            "certificate_verification": tls_status.certificate_verification,
+            "insecure_skip_tls_verify": tls_status.insecure_skip_tls_verify,
+            "provider": tls_status.provider,
+            "message": tls_status.message,
+        },
        "search_provider": doctor_search_provider_json(config),
        "memory": memory_summary,
        "mcp": mcp_summary,
@@ -3468,6 +3522,29 @@ fn doctor_strict_tool_mode_status(config: &Config) -> DoctorStrictToolModeStatus
    }
 }

+#[derive(Debug, Clone, PartialEq, Eq)]
+struct DoctorTlsStatus {
+    certificate_verification: bool,
+    insecure_skip_tls_verify: bool,
+    provider: &'static str,
+    message: String,
+}
+
+fn doctor_tls_status(config: &Config) -> DoctorTlsStatus {
+    let provider = config.api_provider().as_str();
+    let insecure_skip_tls_verify = config.insecure_skip_tls_verify();
+    DoctorTlsStatus {
+        certificate_verification: !insecure_skip_tls_verify,
+        insecure_skip_tls_verify,
+        provider,
+        message: if insecure_skip_tls_verify {
+            format!("TLS certificate verification disabled for provider {provider}")
+        } else {
+            "TLS certificate verification enabled".to_string()
+        },
+    }
+}
+
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 enum DeepSeekBaseUrlKind {
    Beta,
@@ -3820,6 +3897,10 @@ fn rustc_version() -> String {
 }

 /// List saved sessions
+fn sessions_resume_command() -> &'static str {
+    "codewhale resume"
+}
+
 fn list_sessions(limit: usize, search: Option<String>) -> Result<()> {
    use crate::palette;
    use colored::Colorize;
@@ -3874,7 +3955,7 @@ fn list_sessions(limit: usize, search: Option<String>) -> Result<()> {
    println!();
    println!(
        "Resume with: {} {}",
-        "codewhale --resume".truecolor(blue_r, blue_g, blue_b),
+        sessions_resume_command().truecolor(blue_r, blue_g, blue_b),
        "<session-id>".dimmed()
    );
    println!(
@@ -4429,7 +4510,7 @@ fn read_patch_from_stdin() -> Result<String> {
    Ok(buffer)
 }

-async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
+async fn run_mcp_command(config: &Config, workspace: &Path, command: McpCommand) -> Result<()> {
    let config_path = config.mcp_config_path();
    match command {
        McpCommand::Init { force } => {
@@ -4452,9 +4533,13 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
            Ok(())
        }
        McpCommand::List => {
-            let cfg = load_mcp_config(&config_path)?;
+            let cfg = crate::mcp::load_config_with_workspace(&config_path, workspace)?;
            if cfg.servers.is_empty() {
-                println!("No MCP servers configured in {}", config_path.display());
+                println!(
+                    "No MCP servers configured in {} or {}",
+                    config_path.display(),
+                    crate::mcp::workspace_mcp_config_path(workspace).display()
+                );
                return Ok(());
            }
            println!("MCP servers ({}):", cfg.servers.len());
@@ -4482,7 +4567,7 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
            Ok(())
        }
        McpCommand::Connect { server } => {
-            let mut pool = McpPool::from_config_path(&config_path)?;
+            let mut pool = McpPool::from_config_path_with_workspace(&config_path, workspace)?;
            if let Some(name) = server {
                pool.get_or_connect(&name).await?;
                println!("Connected to MCP server: {name}");
@@ -4499,7 +4584,7 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
            Ok(())
        }
        McpCommand::Tools { server } => {
-            let mut pool = McpPool::from_config_path(&config_path)?;
+            let mut pool = McpPool::from_config_path_with_workspace(&config_path, workspace)?;
            if let Some(name) = server {
                let conn = pool.get_or_connect(&name).await?;
                if conn.tools().is_empty() {
@@ -4558,6 +4643,7 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
                    command,
                    args,
                    env: std::collections::HashMap::new(),
+                    cwd: None,
                    url,
                    transport,
                    connect_timeout: None,
@@ -4609,7 +4695,7 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
            Ok(())
        }
        McpCommand::Validate => {
-            let mut pool = McpPool::from_config_path(&config_path)?;
+            let mut pool = McpPool::from_config_path_with_workspace(&config_path, workspace)?;
            let errors = pool.connect_all().await;
            if errors.is_empty() {
                println!("MCP config is valid. All enabled servers connected.");
@@ -4645,6 +4731,7 @@ async fn run_mcp_command(config: &Config, command: McpCommand) -> Result<()> {
                    command: Some(exe_str.clone()),
                    args,
                    env: std::collections::HashMap::new(),
+                    cwd: None,
                    url: None,
                    transport: None,
                    connect_timeout: None,
@@ -5420,7 +5507,7 @@ struct CliAutoRoute {
 async fn resolve_cli_auto_route(config: &Config, model: &str, prompt: &str) -> CliAutoRoute {
    if model.trim().eq_ignore_ascii_case("auto") {
        let selection =
-            commands::resolve_auto_route_with_flash(config, prompt, "", "auto", "auto").await;
+            model_routing::resolve_auto_route_with_flash(config, prompt, "", "auto", "auto").await;
        CliAutoRoute {
            model: selection.model,
            reasoning_effort: selection.reasoning_effort,
@@ -5543,6 +5630,9 @@ struct ExecStreamMeta {
    input_tokens: u32,
    output_tokens: u32,
    session_id: String,
+    resume_command: String,
+    workspace: String,
+    message_count: usize,
    status: Option<String>,
 }

@@ -5578,6 +5668,14 @@ fn emit_exec_stream_event(event: &ExecStreamEvent) -> Result<()> {
    Ok(())
 }

+fn exec_resume_command(session_id: &str) -> String {
+    if session_id.trim().is_empty() {
+        String::new()
+    } else {
+        format!("codewhale exec --resume {session_id}")
+    }
+}
+
 fn persist_exec_session(
    messages: &[Message],
    model: &str,
@@ -5716,6 +5814,7 @@ async fn run_exec_agent(
        runtime_services: crate::tools::spec::RuntimeToolServices::default(),
        subagent_model_overrides: config.subagent_model_overrides(),
        subagent_api_timeout: std::time::Duration::from_secs(config.subagent_api_timeout_secs()),
+        stream_chunk_timeout: std::time::Duration::from_secs(config.stream_chunk_timeout_secs()),
        subagent_heartbeat_timeout: std::time::Duration::from_secs(
            config.subagent_heartbeat_timeout_secs(),
        ),
@@ -5734,6 +5833,7 @@ async fn run_exec_agent(
        workshop: config.workshop.clone(),
        search_provider: config.search_provider(),
        search_api_key: config.search.as_ref().and_then(|s| s.api_key.clone()),
+        search_base_url: config.search.as_ref().and_then(|s| s.base_url.clone()),
        tools_always_load: config.tools_always_load(),
        tools: config.tools.clone(),
    };
@@ -6044,7 +6144,13 @@ async fn run_exec_agent(
                            model: latest_model.clone(),
                            input_tokens: usage.input_tokens,
                            output_tokens: usage.output_tokens,
+                            resume_command: saved_session_id
+                                .as_deref()
+                                .map(exec_resume_command)
+                                .unwrap_or_default(),
                            session_id: saved_session_id.unwrap_or_default(),
+                            workspace: latest_workspace.display().to_string(),
+                            message_count: latest_messages.len(),
                            status: summary.status.clone(),
                        },
                    })?;
@@ -6245,6 +6351,34 @@ mod doctor_endpoint_tests {
        assert!(status.message.contains("custom endpoint"));
    }

+    #[test]
+    fn doctor_tls_status_reports_verification_enabled_by_default() {
+        let status = doctor_tls_status(&Config::default());
+
+        assert!(status.certificate_verification);
+        assert!(!status.insecure_skip_tls_verify);
+        assert_eq!(status.provider, "deepseek");
+        assert!(status.message.contains("enabled"));
+    }
+
+    #[test]
+    fn doctor_tls_status_warns_when_active_provider_skips_verification() {
+        let mut providers = crate::config::ProvidersConfig::default();
+        providers.openai.insecure_skip_tls_verify = Some(true);
+        let config = Config {
+            provider: Some("openai".to_string()),
+            providers: Some(providers),
+            ..Default::default()
+        };
+
+        let status = doctor_tls_status(&config);
+
+        assert!(!status.certificate_verification);
+        assert!(status.insecure_skip_tls_verify);
+        assert_eq!(status.provider, "openai");
+        assert!(status.message.contains("disabled"));
+    }
+
    #[test]
    fn provider_capability_report_exposes_alias_deprecation_for_deepseek_chat() {
        let config = Config {
@@ -6306,6 +6440,7 @@ mod doctor_endpoint_tests {
        let config = Config {
            search: Some(crate::config::SearchConfig {
                provider: Some(crate::config::SearchProvider::DuckDuckGo),
+                base_url: None,
                api_key: None,
            }),
            ..Default::default()
@@ -6345,6 +6480,7 @@ mod doctor_endpoint_tests {
        let config = Config {
            search: Some(crate::config::SearchConfig {
                provider: Some(crate::config::SearchProvider::Bing),
+                base_url: None,
                api_key: None,
            }),
            ..Default::default()
@@ -6506,6 +6642,19 @@ mod terminal_mode_tests {
        assert!(args.continue_session);
    }

+    #[test]
+    fn sessions_footer_points_to_resume_subcommand() {
+        let cli = parse_cli(&["codewhale", "resume", "abc123"]);
+        let Some(Commands::Resume { session_id, last }) = cli.command else {
+            panic!("expected resume command");
+        };
+
+        assert_eq!(session_id.as_deref(), Some("abc123"));
+        assert!(!last);
+        assert_eq!(sessions_resume_command(), "codewhale resume");
+        assert!(!sessions_resume_command().contains("--resume"));
+    }
+
    #[test]
    fn swebench_run_accepts_instance_issue_and_prediction_path() {
        let cli = parse_cli(&[
@@ -6579,6 +6728,12 @@ mod terminal_mode_tests {
            .args(["config", "user.email", "codewhale@example.invalid"])
            .status()
            .expect("git config user.email");
+        std::process::Command::new("git")
+            .arg("-C")
+            .arg(repo)
+            .args(["config", "core.autocrlf", "false"])
+            .status()
+            .expect("git config core.autocrlf");
        std::fs::write(
            repo.join("math_utils.py"),
            "def add(a, b):\n    return a - b\n",
@@ -6654,6 +6809,34 @@ mod terminal_mode_tests {
        assert_eq!(parsed["type"], "tool_result");
    }

+    #[test]
+    fn exec_stream_metadata_includes_resume_breadcrumbs() {
+        let event = ExecStreamEvent::Metadata {
+            meta: ExecStreamMeta {
+                model: "deepseek-v4-flash".to_string(),
+                input_tokens: 123,
+                output_tokens: 45,
+                session_id: "abc123".to_string(),
+                resume_command: exec_resume_command("abc123"),
+                workspace: "/tmp/work".to_string(),
+                message_count: 4,
+                status: Some("completed".to_string()),
+            },
+        };
+
+        let json = serde_json::to_string(&event).expect("serializes");
+        assert!(!json.contains('\n'));
+        let parsed: serde_json::Value = serde_json::from_str(&json).expect("valid json");
+        assert_eq!(parsed["type"], "metadata");
+        assert_eq!(parsed["meta"]["session_id"], "abc123");
+        assert_eq!(
+            parsed["meta"]["resume_command"],
+            "codewhale exec --resume abc123"
+        );
+        assert_eq!(parsed["meta"]["workspace"], "/tmp/work");
+        assert_eq!(parsed["meta"]["message_count"], 4);
+    }
+
    #[test]
    fn alternate_screen_defaults_on_in_auto_mode() {
        let cli = parse_cli(&["codewhale"]);
@@ -6678,6 +6861,7 @@ mod terminal_mode_tests {
                alternate_screen: Some("never".to_string()),
                mouse_capture: None,
                terminal_probe_timeout_ms: None,
+                stream_chunk_timeout_secs: None,
                status_items: None,
                osc8_links: None,
                composer_arrows_scroll: None,
@@ -6771,6 +6955,7 @@ mod terminal_mode_tests {
                alternate_screen: None,
                mouse_capture: Some(false),
                terminal_probe_timeout_ms: None,
+                stream_chunk_timeout_secs: None,
                status_items: None,
                osc8_links: None,
                composer_arrows_scroll: None,
@@ -6802,6 +6987,7 @@ mod terminal_mode_tests {
                alternate_screen: None,
                mouse_capture: Some(true),
                terminal_probe_timeout_ms: None,
+                stream_chunk_timeout_secs: None,
                status_items: None,
                osc8_links: None,
                composer_arrows_scroll: None,
@@ -6887,6 +7073,7 @@ mod terminal_mode_tests {
                alternate_screen: None,
                mouse_capture: Some(true),
                terminal_probe_timeout_ms: None,
+                stream_chunk_timeout_secs: None,
                status_items: None,
                osc8_links: None,
                composer_arrows_scroll: None,
@@ -7445,6 +7632,7 @@ mod doctor_mcp_tests {
            command: command.map(String::from),
            args: args.iter().map(|s| s.to_string()).collect(),
            env: std::collections::HashMap::new(),
+            cwd: None,
            url: url.map(String::from),
            transport: None,
            connect_timeout: None,
@@ -381,7 +381,7 @@ impl McpServer {
        let messages = if internal_name == "deepseek" {
            vec![user_message]
        } else {
-            let thread = self.threads.lock().unwrap();
+            let thread = self.threads.lock().unwrap_or_else(|e| e.into_inner());
            let mut existing = thread.get(&thread_id).cloned().ok_or_else(|| RpcError {
                code: -32602,
                message: format!("Thread not found: {thread_id}"),
@@ -431,7 +431,7 @@ impl McpServer {

        // Store the assistant response in the thread
        {
-            let mut thread = self.threads.lock().unwrap();
+            let mut thread = self.threads.lock().unwrap_or_else(|e| e.into_inner());
            let convo = thread.entry(thread_id.clone()).or_default();
            // If deepseek, we already have just the user message; if deepseek-reply,
            // the user message was appended to the cloned messages above but we need
@@ -0,0 +1,569 @@
+//! Model selection and auto-routing.
+//!
+//! The CLI, TUI, runtime threads, subagents, and command handlers all need
+//! this behavior, so it intentionally lives outside the command tree.
+
+use std::time::Duration;
+
+use anyhow::Result;
+
+use crate::client::DeepSeekClient;
+use crate::config::Config;
+use crate::llm_client::LlmClient;
+use crate::models::{ContentBlock, Message, MessageRequest, MessageResponse, SystemPrompt};
+use crate::tui::app::ReasoningEffort;
+
+/// Auto-select a model based on request complexity.
+///
+/// Requests with a complexity keyword go to Pro regardless of length. Otherwise
+/// messages under 100 chars go to Flash, over 500 go to Pro; the fallback is Flash.
+pub(crate) fn auto_model_heuristic(input: &str, current_model: &str) -> String {
+    auto_model_heuristic_with_bias(input, current_model, false)
+}
+
+fn auto_model_heuristic_with_bias(input: &str, current_model: &str, cost_saving: bool) -> String {
+    auto_model_heuristic_selection_with_bias(input, current_model, cost_saving).model
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum AutoModelHeuristicConfidence {
+    Decisive,
+    Ambiguous,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+struct AutoModelHeuristicSelection {
+    model: String,
+    confidence: AutoModelHeuristicConfidence,
+}
+
+fn auto_model_heuristic_selection_with_bias(
+    input: &str,
+    _current_model: &str,
+    cost_saving: bool,
+) -> AutoModelHeuristicSelection {
+    let len = input.chars().count();
+    let lower = input.to_lowercase();
+    let borderline_pro_keywords: &[&str] = &[
+        "implement",
+        "analyze",
+        "\u{5b9e}\u{73b0}",
+        "\u{5206}\u{6790}",
+        "\u{5be6}\u{73fe}",
+    ];
+    let strong_match = COMPLEX_KEYWORDS
+        .iter()
+        .any(|kw| !borderline_pro_keywords.contains(kw) && lower.contains(kw));
+    let borderline_match = borderline_pro_keywords.iter().any(|kw| lower.contains(kw));
+    let pro_match = strong_match || (!cost_saving && borderline_match);
+    if pro_match {
+        return AutoModelHeuristicSelection {
+            model: "deepseek-v4-pro".to_string(),
+            confidence: AutoModelHeuristicConfidence::Decisive,
+        };
+    }
+    if len < 100 {
+        return AutoModelHeuristicSelection {
+            model: "deepseek-v4-flash".to_string(),
+            confidence: AutoModelHeuristicConfidence::Decisive,
+        };
+    }
+    let long_threshold = if cost_saving { 1_000 } else { 500 };
+    if len > long_threshold {
+        return AutoModelHeuristicSelection {
+            model: "deepseek-v4-pro".to_string(),
+            confidence: AutoModelHeuristicConfidence::Decisive,
+        };
+    }
+
+    AutoModelHeuristicSelection {
+        model: "deepseek-v4-flash".to_string(),
+        confidence: AutoModelHeuristicConfidence::Ambiguous,
+    }
+}
+
+const COMPLEX_KEYWORDS: &[&str] = &[
+    "refactor",
+    "architecture",
+    "design",
+    "debug",
+    "security",
+    "review",
+    "audit",
+    "migrate",
+    "optimize",
+    "rewrite",
+    "implement",
+    "analyze",
+    "\u{91cd}\u{6784}",
+    "\u{67b6}\u{6784}",
+    "\u{8bbe}\u{8ba1}",
+    "\u{8c03}\u{8bd5}",
+    "\u{5b89}\u{5168}",
+    "\u{5ba1}\u{67e5}",
+    "\u{5ba1}\u{8ba1}",
+    "\u{8fc1}\u{79fb}",
+    "\u{4f18}\u{5316}",
+    "\u{91cd}\u{5199}",
+    "\u{5b9e}\u{73b0}",
+    "\u{5206}\u{6790}",
+    "\u{91cd}\u{69cb}",
+    "\u{67b6}\u{69cb}",
+    "\u{8a2d}\u{8a08}",
+    "\u{8abf}\u{8a66}",
+    "\u{5be9}\u{67e5}",
+    "\u{5be9}\u{8a08}",
+    "\u{9077}\u{79fb}",
+    "\u{512a}\u{5316}",
+    "\u{91cd}\u{5beb}",
+    "\u{5be6}\u{73fe}",
+];
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub(crate) struct AutoRouteRecommendation {
+    pub(crate) model: String,
+    pub(crate) reasoning_effort: Option<ReasoningEffort>,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub(crate) enum AutoRouteSource {
+    FlashRouter,
+    Heuristic,
+}
+
+impl AutoRouteSource {
+    #[must_use]
+    pub(crate) fn label(self) -> &'static str {
+        match self {
+            AutoRouteSource::FlashRouter => "flash-router",
+            AutoRouteSource::Heuristic => "heuristic",
+        }
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub(crate) struct AutoRouteSelection {
+    pub(crate) model: String,
+    pub(crate) reasoning_effort: Option<ReasoningEffort>,
+    pub(crate) source: AutoRouteSource,
+}
+
+const AUTO_MODEL_ROUTER_SYSTEM_PROMPT: &str = "\
+You are the codewhale auto-routing classifier. Return only compact JSON: \
+{\"model\":\"deepseek-v4-flash|deepseek-v4-pro\",\"thinking\":\"off|high|max\"}. \
+Use deepseek-v4-flash for trivial, conversational, status, or single-step work. \
+Use deepseek-v4-pro for coding, debugging, release work, multi-step tasks, high-risk decisions, \
+tool-heavy work, ambiguous requests, or anything that benefits from deeper reasoning. \
+Use thinking off only for trivial no-tool answers, high for ordinary reasoning, and max for \
+agentic, coding, multi-file, release, architecture, debugging, security, tool-heavy, or uncertain work.";
+
+const AUTO_MODEL_ROUTER_COST_SAVING_ADDENDUM: &str = "\
+\n\nCost-saving mode is ON. Prefer deepseek-v4-flash for any request that is \
+not unmistakably agentic, multi-step, architecture/design, security review, \
+debugging, or otherwise clearly out of Flash's capability. Resolve ambiguous \
+cases in favour of deepseek-v4-flash, not deepseek-v4-pro.";
+
+pub(crate) fn parse_auto_route_recommendation(raw: &str) -> Option<AutoRouteRecommendation> {
+    let json = extract_first_json_object(raw)?;
+    let value: serde_json::Value = serde_json::from_str(json).ok()?;
+    let model = value.get("model").and_then(serde_json::Value::as_str)?;
+    let model = normalize_auto_route_model(model)?;
+    let reasoning_effort = value
+        .get("thinking")
+        .or_else(|| value.get("reasoning_effort"))
+        .or_else(|| value.get("effort"))
+        .and_then(serde_json::Value::as_str)
+        .and_then(parse_auto_route_reasoning_effort);
+
+    Some(AutoRouteRecommendation {
+        model: model.to_string(),
+        reasoning_effort,
+    })
+}
+
+fn extract_first_json_object(raw: &str) -> Option<&str> {
+    let start = raw.find('{')?;
+    let end = raw.rfind('}')?;
+    (end >= start).then(|| &raw[start..=end])
+}
+
+fn normalize_auto_route_model(model: &str) -> Option<&'static str> {
+    match model.trim().to_ascii_lowercase().as_str() {
+        "deepseek-v4-pro" | "v4-pro" | "pro" => Some("deepseek-v4-pro"),
+        "deepseek-v4-flash" | "v4-flash" | "flash" => Some("deepseek-v4-flash"),
+        _ => None,
+    }
+}
+
+fn parse_auto_route_reasoning_effort(effort: &str) -> Option<ReasoningEffort> {
+    match effort.trim().to_ascii_lowercase().as_str() {
+        "off" | "disabled" | "none" | "false" => Some(ReasoningEffort::Off),
+        "low" | "minimal" | "medium" | "mid" => Some(ReasoningEffort::High),
+        "high" => Some(ReasoningEffort::High),
+        "max" | "maximum" | "xhigh" => Some(ReasoningEffort::Max),
+        _ => None,
+    }
+}
+
+#[must_use]
+pub(crate) fn normalize_auto_route_effort(effort: ReasoningEffort) -> ReasoningEffort {
+    match effort {
+        ReasoningEffort::Low | ReasoningEffort::Medium => ReasoningEffort::High,
+        other => other,
+    }
+}
+
+pub(crate) async fn resolve_auto_route_with_flash(
+    config: &Config,
+    latest_request: &str,
+    recent_context: &str,
+    selected_model_mode: &str,
+    selected_thinking_mode: &str,
+) -> AutoRouteSelection {
+    let cost_saving = config.auto_cost_saving();
+    let heuristic =
+        auto_model_heuristic_selection_with_bias(latest_request, selected_model_mode, cost_saving);
+    if heuristic.confidence == AutoModelHeuristicConfidence::Decisive {
+        return auto_route_from_heuristic(latest_request, heuristic);
+    }
+
+    match auto_route_flash_recommendation(
+        config,
+        latest_request,
+        recent_context,
+        selected_model_mode,
+        selected_thinking_mode,
+    )
+    .await
+    {
+        Ok(Some(recommendation)) => AutoRouteSelection {
+            model: recommendation.model,
+            reasoning_effort: recommendation.reasoning_effort,
+            source: AutoRouteSource::FlashRouter,
+        },
+        Ok(None) | Err(_) => auto_route_from_heuristic(latest_request, heuristic),
+    }
+}
+
+fn auto_route_from_heuristic(
+    latest_request: &str,
+    heuristic: AutoModelHeuristicSelection,
+) -> AutoRouteSelection {
+    AutoRouteSelection {
+        model: heuristic.model,
+        reasoning_effort: Some(normalize_auto_route_effort(crate::auto_reasoning::select(
+            false,
+            latest_request,
+        ))),
+        source: AutoRouteSource::Heuristic,
+    }
+}
+
+async fn auto_route_flash_recommendation(
+    config: &Config,
+    latest_request: &str,
+    recent_context: &str,
+    selected_model_mode: &str,
+    selected_thinking_mode: &str,
+) -> Result<Option<AutoRouteRecommendation>> {
+    if cfg!(test) {
+        return Ok(None);
+    }
+
+    let client = DeepSeekClient::new(config)?;
+    let mut router_system = AUTO_MODEL_ROUTER_SYSTEM_PROMPT.to_string();
+    if config.auto_cost_saving() {
+        router_system.push_str(AUTO_MODEL_ROUTER_COST_SAVING_ADDENDUM);
+    }
+    let request = MessageRequest {
+        model: "deepseek-v4-flash".to_string(),
+        messages: vec![Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::Text {
+                text: auto_route_prompt(
+                    latest_request,
+                    recent_context,
+                    selected_model_mode,
+                    selected_thinking_mode,
+                ),
+                cache_control: None,
+            }],
+        }],
+        max_tokens: 96,
+        system: Some(SystemPrompt::Text(router_system)),
+        tools: None,
+        tool_choice: None,
+        metadata: None,
+        thinking: None,
+        reasoning_effort: Some("off".to_string()),
+        stream: Some(false),
+        temperature: Some(0.0),
+        top_p: None,
+    };
+
+    let response =
+        tokio::time::timeout(Duration::from_secs(4), client.create_message(request)).await??;
+    Ok(parse_auto_route_recommendation(&message_response_text(
+        &response,
+    )))
+}
+
+fn auto_route_prompt(
+    latest_request: &str,
+    recent_context: &str,
+    selected_model_mode: &str,
+    selected_thinking_mode: &str,
+) -> String {
+    format!(
+        "Session mode: agent\nSelected model mode: {}\nSelected thinking mode: {}\n\nRecent context:\n{}\n\nLatest user request:\n{}\n\nReturn JSON only.",
+        selected_model_mode,
+        selected_thinking_mode,
+        if recent_context.trim().is_empty() {
+            "No prior context."
+        } else {
+            recent_context
+        },
+        truncate_for_auto_router(latest_request, 4_000)
+    )
+}
+
+fn message_response_text(response: &MessageResponse) -> String {
+    let mut out = String::new();
+    for block in &response.content {
+        match block {
+            ContentBlock::Text { text, .. } | ContentBlock::ToolResult { content: text, .. } => {
+                append_router_text(&mut out, text);
+            }
+            ContentBlock::Thinking { thinking } => {
+                append_router_text(&mut out, thinking);
+            }
+            ContentBlock::ToolUse { name, .. } => {
+                append_router_text(&mut out, &format!("[tool call: {name}]"));
+            }
+            _ => {}
+        }
+    }
+    out
+}
+
+fn append_router_text(out: &mut String, text: &str) {
+    if !out.is_empty() {
+        out.push('\n');
+    }
+    out.push_str(text);
+}
+
+fn truncate_for_auto_router(text: &str, max_chars: usize) -> String {
+    let mut chars = text.chars();
+    let truncated: String = chars.by_ref().take(max_chars).collect();
+    if chars.next().is_some() {
+        format!("{truncated}...")
+    } else {
+        truncated
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn auto_model_heuristic_chinese_keywords_route_to_pro() {
+        for msg in [
+            "\u{5e2e}\u{6211}\u{91cd}\u{6784}\u{8fd9}\u{4e2a}\u{6a21}\u{5757}",
+            "\u{8bbe}\u{8ba1}\u{6570}\u{636e}\u{5e93}\u{67b6}\u{6784}",
+            "\u{8c03}\u{8bd5}\u{5d29}\u{6e83}\u{95ee}\u{9898}",
+            "\u{5ba1}\u{8ba1}\u{5b89}\u{5168}\u{6f0f}\u{6d1e}",
+            "\u{8fc1}\u{79fb}\u{5230}\u{65b0}\u{6846}\u{67b6}",
+            "\u{4f18}\u{5316}\u{6027}\u{80fd}\u{74f6}\u{9888}",
+            "\u{5206}\u{6790}\u{8fd9}\u{6bb5}\u{4ee3}\u{7801}",
+        ] {
+            assert_eq!(
+                auto_model_heuristic(msg, "auto"),
+                "deepseek-v4-pro",
+                "expected Pro for `{msg}`",
+            );
+        }
+    }
+
+    #[test]
+    fn auto_model_heuristic_traditional_chinese_keywords_route_to_pro() {
+        for msg in [
+            "\u{8acb}\u{91cd}\u{69cb}\u{6b64}\u{6a21}\u{7d44}",
+            "\u{67b6}\u{69cb}\u{8a2d}\u{8a08}",
+            "\u{4ee3}\u{78bc}\u{8abf}\u{8a66}",
+            "\u{5be9}\u{8a08}\u{6f0f}\u{6d1e}",
+            "\u{9077}\u{79fb}\u{5230}\u{65b0}\u{67b6}\u{69cb}",
+            "\u{512a}\u{5316}\u{6027}\u{80fd}",
+            "\u{91cd}\u{5beb}\u{4ee3}\u{78bc}",
+            "\u{5be6}\u{73fe}\u{65b0}\u{529f}\u{80fd}",
+        ] {
+            assert_eq!(
+                auto_model_heuristic(msg, "auto"),
+                "deepseek-v4-pro",
+                "expected Pro for `{msg}`",
+            );
+        }
+    }
+
+    #[test]
+    fn auto_model_heuristic_short_chinese_chat_stays_on_flash() {
+        assert_eq!(
+            auto_model_heuristic("\u{4f60}\u{597d}", "auto"),
+            "deepseek-v4-flash",
+        );
+    }
+
+    #[test]
+    fn auto_heuristic_selection_marks_short_and_complex_routes_decisive() {
+        let short = auto_model_heuristic_selection_with_bias("yes", "auto", false);
+        assert_eq!(short.model, "deepseek-v4-flash");
+        assert_eq!(
+            short.confidence,
+            AutoModelHeuristicConfidence::Decisive,
+            "trivial replies should skip the Flash router"
+        );
+
+        let complex = auto_model_heuristic_selection_with_bias(
+            "Please review the auth migration",
+            "auto",
+            false,
+        );
+        assert_eq!(complex.model, "deepseek-v4-pro");
+        assert_eq!(
+            complex.confidence,
+            AutoModelHeuristicConfidence::Decisive,
+            "strong complexity keywords should skip the Flash router"
+        );
+    }
+
+    #[test]
+    fn auto_heuristic_selection_leaves_default_branch_ambiguous_for_router() {
+        let request =
+            "Please update the configuration notes so each option has a clearer label. ".repeat(3);
+        assert!(
+            (100..500).contains(&request.chars().count()),
+            "test request must stay in the default grey zone"
+        );
+
+        let selection = auto_model_heuristic_selection_with_bias(&request, "auto", false);
+        assert_eq!(selection.model, "deepseek-v4-flash");
+        assert_eq!(
+            selection.confidence,
+            AutoModelHeuristicConfidence::Ambiguous,
+            "only the grey-zone default branch should invoke the Flash router"
+        );
+    }
+
+    #[test]
+    fn auto_route_recommendation_parses_strict_json() {
+        let rec =
+            parse_auto_route_recommendation(r#"{"model":"deepseek-v4-pro","thinking":"max"}"#)
+                .expect("valid router response should parse");
+
+        assert_eq!(rec.model, "deepseek-v4-pro");
+        assert_eq!(rec.reasoning_effort, Some(ReasoningEffort::Max));
+    }
+
+    #[test]
+    fn auto_route_recommendation_accepts_wrapped_json_aliases() {
+        let rec =
+            parse_auto_route_recommendation(r#"route: {"model":"flash","reasoning_effort":"off"}"#)
+                .expect("wrapped router response should parse");
+
+        assert_eq!(rec.model, "deepseek-v4-flash");
+        assert_eq!(rec.reasoning_effort, Some(ReasoningEffort::Off));
+    }
+
+    #[test]
+    fn auto_route_recommendation_normalizes_legacy_low_medium_to_high() {
+        let rec = parse_auto_route_recommendation(
+            r#"{"model":"deepseek-v4-pro","reasoning_effort":"medium"}"#,
+        )
+        .expect("medium should parse for back-compat");
+
+        assert_eq!(rec.model, "deepseek-v4-pro");
+        assert_eq!(rec.reasoning_effort, Some(ReasoningEffort::High));
+    }
+
+    #[test]
+    fn auto_route_recommendation_rejects_unknown_model() {
+        assert!(
+            parse_auto_route_recommendation(r#"{"model":"some-other-model","thinking":"max"}"#,)
+                .is_none()
+        );
+    }
+
+    #[test]
+    fn auto_heuristic_default_routes_implement_to_pro() {
+        assert_eq!(
+            auto_model_heuristic_with_bias("Please implement a binary search", "auto", false),
+            "deepseek-v4-pro"
+        );
+    }
+
+    #[test]
+    fn auto_heuristic_cost_saving_keeps_borderline_keywords_on_flash() {
+        assert_eq!(
+            auto_model_heuristic_with_bias("Please implement a binary search", "auto", true),
+            "deepseek-v4-flash"
+        );
+        assert_eq!(
+            auto_model_heuristic_with_bias("analyze this snippet", "auto", true),
+            "deepseek-v4-flash"
+        );
+    }
+
+    #[test]
+    fn auto_heuristic_strong_keywords_still_route_to_pro_under_cost_saving() {
+        for kw in [
+            "refactor",
+            "architecture",
+            "design",
+            "debug",
+            "security",
+            "review",
+            "audit",
+            "migrate",
+            "optimize",
+            "rewrite",
+        ] {
+            let req = format!("Please {kw} this module");
+            assert_eq!(
+                auto_model_heuristic_with_bias(&req, "auto", true),
+                "deepseek-v4-pro",
+                "expected Pro for strong keyword `{kw}` even in cost-saving mode"
+            );
+        }
+    }
+
+    #[test]
+    fn auto_heuristic_cost_saving_raises_long_message_threshold() {
+        let body = "filler sentence. ".repeat(40);
+        assert_eq!(
+            auto_model_heuristic_with_bias(&body, "auto", false),
+            "deepseek-v4-pro"
+        );
+        assert_eq!(
+            auto_model_heuristic_with_bias(&body, "auto", true),
+            "deepseek-v4-flash"
+        );
+    }
+
+    #[test]
+    fn config_auto_cost_saving_defaults_to_false() {
+        let cfg = Config::default();
+        assert!(!cfg.auto_cost_saving());
+    }
+
+    #[test]
+    fn config_auto_cost_saving_reads_table() {
+        let cfg = Config {
+            auto: Some(crate::config::AutoConfig {
+                cost_saving: Some(true),
+            }),
+            ..Default::default()
+        };
+        assert!(cfg.auto_cost_saving());
+    }
+}
@@ -29,6 +29,11 @@
 //! └─────────────────────────────────────────┘
 //! ```

+use std::collections::hash_map::DefaultHasher;
+use std::collections::{HashMap, VecDeque};
+use std::hash::{Hash, Hasher};
+use std::sync::Arc;
+
 use serde::{Deserialize, Serialize};
 use sha2::{Digest, Sha256};

@@ -56,23 +61,45 @@ impl PrefixFingerprint {
    /// lexicographically by JSON text, then SHA-256 hashed. This catches
    /// schema/description drift that actually affects the API prefix,
    /// while ignoring internal-only fields like `allowed_callers` (#2264).
+    ///
+    /// This test-only entry point builds a fresh [`ToolCatalogCache`] on
+    /// every call, so it always runs the per-tool JSON serialization and
+    /// the sort/join. Callers that want a stable tool set (the common case
+    /// after the first turn of a session) to skip that work should hold
+    /// their own cache, as [`PrefixStabilityManager`] does, and call
+    /// [`Self::compute_with_tool_cache`] so repeated calls reuse the
+    /// cached digest.
+    #[cfg(test)]
    pub fn compute(system_text: &str, tools: Option<&[Tool]>) -> Self {
+        let mut cache = ToolCatalogCache::new();
+        Self::compute_with_tool_cache(system_text, tools, &mut cache)
+    }
+
+    /// Compute a fingerprint while reusing a [`ToolCatalogCache`] for the
+    /// tool-side work. The cache holds the joined+sorted+SHA-256'd catalog
+    /// under a content-derived identity so the per-tool JSON serialization
+    /// and the sort/join only run on the first call for a given tool set.
+    ///
+    /// On a cache hit this function avoids the entire tool serialization
+    /// path, which can be 100+ microseconds for a 60-tool catalog.
+    pub fn compute_with_tool_cache(
+        system_text: &str,
+        tools: Option<&[Tool]>,
+        cache: &mut ToolCatalogCache,
+    ) -> Self {
        let system_sha256 = sha256_hex(system_text.as_bytes());

        let tools_sha256 = match tools {
            Some(tools) if !tools.is_empty() => {
-                let mut serialized: Vec<String> =
-                    tools.iter().filter_map(tool_to_api_json).collect();
-                serialized.sort();
-                let joined = serialized.join("\n");
-                sha256_hex(joined.as_bytes())
+                // `fingerprint_for` consults the cache first; on a hit
+                // it returns the pre-computed hex digest directly.
+                cache.fingerprint_for(tools).sha256_hex
            }
            _ => sha256_hex(b""),
        };

        let combined = format!("{system_sha256}:{tools_sha256}");
        let combined_sha256 = sha256_hex(combined.as_bytes());
-
        Self {
            system_sha256,
            tools_sha256,
@@ -153,19 +180,224 @@ pub struct PrefixStabilityManager {
    change_count: u64,
    /// Total number of stability checks performed.
    check_count: u64,
+    /// Per-manager cache for the tool-catalog JSON serialization. Avoids
+    /// re-running `tool_to_api_json` + sort + join on every `check_and_update`
+    /// when the tool set is unchanged (the common case once tools are
+    /// registered at session start).
+    tool_catalog_cache: ToolCatalogCache,
 }

+/// Default capacity for the tool-catalog serialization cache. Sized for
+/// "session + 1 or 2 forked subagent catalogs" without unbounded growth.
+const TOOL_CATALOG_CACHE_CAPACITY: usize = 8;
+
+/// Bounded FIFO cache of `(tool_set_identity) -> (sha256_hex, joined_string)`.
+///
+/// The cache key is a content-derived `u64` hash of the tool list (length +
+/// per-tool `name` + `description` + `strict` + `input_schema`). On a hit,
+/// `PrefixFingerprint::compute_with_tool_cache` skips the per-tool JSON
+/// serialization, the sort, and the join, which can cost 100+ microseconds
+/// for a 60-tool catalog. On a miss, the work runs once and is stored.
+///
+/// The cache deliberately holds only the tool-side catalog and its digest,
+/// not a whole `PrefixFingerprint`: the system-prompt hash and the combined
+/// hash are cheap and are recomputed on every call.
+#[derive(Debug, Default, Clone)]
+pub struct ToolCatalogCache {
+    by_identity: HashMap<u64, CachedCatalog>,
+    insertion_order: VecDeque<u64>,
+    capacity: usize,
+}
+
+/// One entry in [`ToolCatalogCache`]. Stores the joined JSON catalog plus
+/// the pre-computed SHA-256 hex digest so [`PrefixFingerprint::compute`]
+/// does not need to re-hash on the hot path.
+#[derive(Debug, Clone)]
+pub struct CachedCatalog {
+    /// The newline-joined, sorted tool-catalog JSON. Wrapped in an `Arc` so
+    /// multiple cache consumers can hold the same allocation. Exposed for
+    /// observability (debug builds, `/status` chip) and for tests that
+    /// need to assert byte-stability of the joined catalog.
+    #[allow(dead_code)] // observability + tests; not consumed on the hot path
+    pub joined: Arc<String>,
+    /// SHA-256 hex digest of `joined`, computed once on cache miss.
+    pub sha256_hex: String,
+}
+
+impl ToolCatalogCache {
+    /// Create a cache with the default capacity.
+    #[must_use]
+    pub fn new() -> Self {
+        Self::with_capacity(TOOL_CATALOG_CACHE_CAPACITY)
+    }
+
+    /// Create a cache that holds at most `capacity` tool-set entries.
+    /// Smaller values save memory at the cost of more cache misses.
+    #[must_use]
+    pub fn with_capacity(capacity: usize) -> Self {
+        let cap = capacity.max(1);
+        Self {
+            by_identity: HashMap::with_capacity(cap),
+            insertion_order: VecDeque::with_capacity(cap),
+            capacity: cap,
+        }
+    }
+
+    /// Compute (or recall) the joined-and-hashed tool catalog for `tools`.
+    /// The cache is keyed on a content-derived `u64` identity so two `&[Tool]`
+    /// slices with the same payloads — in the same order — hit the same entry.
+    pub fn fingerprint_for(&mut self, tools: &[Tool]) -> CachedCatalog {
+        let identity = tool_set_identity(tools);
+        if let Some(cached) = self.by_identity.get(&identity) {
+            // Hit: clone the `Arc` so the caller can hold the joined string
+            // without keeping a reference to the cache.
+            return cached.clone();
+        }
+
+        // Miss: serialize, sort, join, hash. Store the joined string in an
+        // `Arc` so a later hit can return the same allocation.
+        let mut serialized: Vec<String> = tools.iter().filter_map(tool_to_api_json).collect();
+        serialized.sort();
+        let joined = Arc::new(serialized.join("\n"));
+        let sha256_hex = sha256_hex(joined.as_bytes());
+        let entry = CachedCatalog {
+            joined: Arc::clone(&joined),
+            sha256_hex,
+        };
+
+        if self.by_identity.len() >= self.capacity
+            && let Some(oldest) = self.insertion_order.pop_front()
+        {
+            self.by_identity.remove(&oldest);
+        }
+        self.by_identity.insert(identity, entry.clone());
+        self.insertion_order.push_back(identity);
+        entry
+    }
+
+    /// Drop every cached entry. Used by tool-registry mutation paths
+    /// (e.g. plugin hot-reload, MCP attach) when the caller cannot
+    /// easily prove the tool set is unchanged.
+    #[allow(dead_code)] // observability; called by /cache flush and tests
+    pub fn invalidate(&mut self) {
+        self.by_identity.clear();
+        self.insertion_order.clear();
+    }
+
+    /// Returns the number of cached entries.
+    #[must_use]
+    pub fn len(&self) -> usize {
+        self.by_identity.len()
+    }
+
+    /// Returns `true` if the cache has no entries.
+    #[allow(dead_code)] // observability; surfaced via /status
+    #[must_use]
+    pub fn is_empty(&self) -> bool {
+        self.by_identity.is_empty()
+    }
+
+    /// Returns `(current_entries, capacity)` for observability. Surfaced via
+    /// the `/status` chip in a follow-up; tests exercise the path.
+    #[allow(dead_code)] // surfaced via /status in a follow-up; tests exercise it
+    #[must_use]
+    pub fn stats(&self) -> (usize, usize) {
+        (self.len(), self.capacity)
+    }
+}
+
+/// Content-derived identity for a tool slice. Order-sensitive: two slices
+/// with the same tools in different orders produce different identities.
+/// (The downstream fingerprint itself is order-insensitive — the sort in
+/// `fingerprint_for` takes care of that — but the cache key matches the
+/// input order so re-registration of the same set in the same order hits.)
+fn tool_set_identity(tools: &[Tool]) -> u64 {
+    let mut hasher = DefaultHasher::new();
+    tools.len().hash(&mut hasher);
+    for tool in tools {
+        tool.name.hash(&mut hasher);
+        tool.description.hash(&mut hasher);
+        // `strict` participates in `tool_to_api_json` output (it is part of
+        // the wire-format the chat API receives), so it MUST be part of the
+        // identity. Omitting it lets two semantically different catalogs
+        // collide and serve a stale fingerprint.
+        tool.strict.hash(&mut hasher);
+        // Walk the schema JSON directly instead of materializing it as a
+        // String. For a 60-tool catalog this saves ~25-40 KB of allocation
+        // on every lookup, because the identity is computed before the
+        // cache is consulted (on hits as well as misses).
+        hash_json_value(&tool.input_schema, &mut hasher);
+    }
+    hasher.finish()
+}
+
+/// Fold a `serde_json::Value` into the hasher without allocating a
+/// `String`. Integers are hashed by value and floats by bit pattern, so
+/// `1` and `1.0` produce distinct identities (matching how `serde_json`
+/// itself distinguishes integer and float numbers).
+fn hash_json_value<H: Hasher>(value: &serde_json::Value, state: &mut H) {
+    match value {
+        serde_json::Value::Null => 0u8.hash(state),
+        serde_json::Value::Bool(b) => {
+            1u8.hash(state);
+            b.hash(state);
+        }
+        serde_json::Value::Number(n) => {
+            2u8.hash(state);
+            if let Some(i) = n.as_i64() {
+                i.hash(state);
+            } else if let Some(u) = n.as_u64() {
+                u.hash(state);
+            } else if let Some(f) = n.as_f64() {
+                f.to_bits().hash(state);
+            }
+        }
+        serde_json::Value::String(s) => {
+            3u8.hash(state);
+            s.hash(state);
+        }
+        serde_json::Value::Array(arr) => {
+            4u8.hash(state);
+            arr.len().hash(state);
+            for v in arr {
+                hash_json_value(v, state);
+            }
+        }
+        serde_json::Value::Object(obj) => {
+            5u8.hash(state);
+            obj.len().hash(state);
+            // Iterate in sorted key order so `{"a":1,"b":2}` and
+            // `{"b":2,"a":1}` hash to the same identity. `serde_json`'s
+            // default `Map` already iterates in key order, but sorting
+            // explicitly future-proofs against maps that preserve
+            // declaration order (e.g. the `preserve_order` feature).
+            let mut entries: Vec<(&String, &serde_json::Value)> = obj.iter().collect();
+            entries.sort_by(|a, b| a.0.cmp(b.0));
+            for (k, v) in entries {
+                k.hash(state);
+                hash_json_value(v, state);
+            }
+        }
+    }
+}
+
+/// Process-local fallback cache used by [`PrefixFingerprint::compute`]
+/// (when available). Callers that maintain their own cache (e.g.
+/// [`PrefixStabilityManager`]) should prefer
+/// [`PrefixFingerprint::compute_with_tool_cache`] and pass the cache in
+/// directly, both to share state and to avoid the thread-local lookup
+/// on the hot path.
 #[allow(dead_code)]
 impl PrefixStabilityManager {
    /// Create a new manager and immediately pin the first fingerprint.
    pub fn new(system_text: &str, tools: Option<&[Tool]>) -> Self {
-        let fp = PrefixFingerprint::compute(system_text, tools);
+        let mut cache = ToolCatalogCache::new();
+        let fp = PrefixFingerprint::compute_with_tool_cache(system_text, tools, &mut cache);
        Self {
            pinned: Some(fp.clone()),
            current: Some(fp),
            last_change: None,
            change_count: 0,
            check_count: 0,
+            tool_catalog_cache: cache,
        }
    }

@@ -178,6 +410,7 @@ impl PrefixStabilityManager {
            last_change: None,
            change_count: 0,
            check_count: 0,
+            tool_catalog_cache: ToolCatalogCache::new(),
        }
    }

@@ -186,7 +419,11 @@ impl PrefixStabilityManager {
    /// Note: does NOT increment `check_count` — that counter is reserved
    /// for `check_and_update` calls so `stability_ratio()` stays accurate.
    pub fn pin(&mut self, system_text: &str, tools: Option<&[Tool]>) -> bool {
-        let fp = PrefixFingerprint::compute(system_text, tools);
+        let fp = PrefixFingerprint::compute_with_tool_cache(
+            system_text,
+            tools,
+            &mut self.tool_catalog_cache,
+        );
        let was_unpinned = self.pinned.is_none();
        self.pinned = Some(fp.clone());
        self.current = Some(fp);
@@ -205,7 +442,16 @@ impl PrefixStabilityManager {
        system_text: &str,
        tools: Option<&[Tool]>,
    ) -> Result<bool, Box<PrefixChange>> {
-        let fp = PrefixFingerprint::compute(system_text, tools);
+        // Use the cached tool-catalog fingerprint path so a stable tool set
+        // (the common case after the first turn) does not re-serialize the
+        // full tool list. The system-prompt side is hashed on every call
+        // because the system prompt changes more often (mode flips,
+        // project-context refreshes, canonical state overlays).
+        let fp = PrefixFingerprint::compute_with_tool_cache(
+            system_text,
+            tools,
+            &mut self.tool_catalog_cache,
+        );
        let old_fp = self.current.replace(fp.clone());
        self.check_count += 1;

@@ -531,4 +777,126 @@ mod tests {
    fn system_prompt_text_returns_empty_for_none() {
        assert_eq!(system_prompt_text(None), "");
    }
+
+    // ── ToolCatalogCache tests ──────────────────────────────────
+
+    #[test]
+    fn tool_catalog_cache_miss_then_hit_returns_same_arc() {
+        let mut cache = ToolCatalogCache::new();
+        let tools = vec![make_tool("read_file"), make_tool("write_file")];
+
+        let first = cache.fingerprint_for(&tools);
+        assert_eq!(cache.len(), 1);
+
+        let second = cache.fingerprint_for(&tools);
+        assert_eq!(cache.len(), 1, "second call should be a cache hit");
+        assert!(Arc::ptr_eq(&first.joined, &second.joined));
+        assert_eq!(first.sha256_hex, second.sha256_hex);
+    }
+
+    #[test]
+    fn tool_catalog_cache_different_tool_sets_dont_collide() {
+        let mut cache = ToolCatalogCache::new();
+        let a = vec![make_tool("read_file")];
+        let b = vec![make_tool("write_file")];
+
+        let entry_a = cache.fingerprint_for(&a);
+        let entry_b = cache.fingerprint_for(&b);
+        assert_eq!(cache.len(), 2);
+        assert_ne!(entry_a.sha256_hex, entry_b.sha256_hex);
+        assert!(!Arc::ptr_eq(&entry_a.joined, &entry_b.joined));
+    }
+
+    #[test]
+    fn tool_catalog_cache_pinned_by_input_order() {
+        // The identity hash includes the input order so re-registering the
+        // same set with a different permutation produces a separate cache
+        // entry. The sorted-and-joined digest still matches the order-
+        // independent fingerprint that the chat API sees.
+        let mut cache = ToolCatalogCache::new();
+        let a = vec![make_tool("read_file"), make_tool("write_file")];
+        let b = vec![make_tool("write_file"), make_tool("read_file")];
+        let entry_a = cache.fingerprint_for(&a);
+        let entry_b = cache.fingerprint_for(&b);
+        // Joined output is the same (sorted) but the two cache entries are
+        // distinct because their identities differ.
+        assert_eq!(entry_a.joined.as_str(), entry_b.joined.as_str());
+        assert_eq!(cache.len(), 2);
+    }
+
+    #[test]
+    fn tool_catalog_cache_detects_schema_change() {
+        let mut cache = ToolCatalogCache::new();
+        let tool_v1 = make_tool("t");
+        let mut tool_v2 = make_tool("t");
+        tool_v2.description = "updated".to_string();
+
+        let entry_v1 = cache.fingerprint_for(&[tool_v1]);
+        let entry_v2 = cache.fingerprint_for(&[tool_v2]);
+        assert_ne!(entry_v1.sha256_hex, entry_v2.sha256_hex);
+        assert_eq!(cache.len(), 2);
+    }
+
+    #[test]
+    fn tool_catalog_cache_respects_capacity() {
+        let mut cache = ToolCatalogCache::with_capacity(2);
+        cache.fingerprint_for(&[make_tool("a")]);
+        cache.fingerprint_for(&[make_tool("b")]);
+        cache.fingerprint_for(&[make_tool("c")]);
+        assert_eq!(cache.len(), 2);
+        // The first entry was evicted; a re-query for it should miss.
+        let re_entry = cache.fingerprint_for(&[make_tool("a")]);
+        // Re-inserting `a` evicts the oldest remaining entry (`b`), so the
+        // cache stays at capacity 2 and now holds {c, a}.
+        assert_eq!(cache.len(), 2);
+        // A second query for `a` is now a hit and returns the same `Arc`.
+        let fresh = cache.fingerprint_for(&[make_tool("a")]);
+        assert!(Arc::ptr_eq(&re_entry.joined, &fresh.joined));
+    }
+
+    #[test]
+    fn tool_catalog_cache_invalidate_clears_all() {
+        let mut cache = ToolCatalogCache::new();
+        cache.fingerprint_for(&[make_tool("a")]);
+        cache.fingerprint_for(&[make_tool("b")]);
+        cache.invalidate();
+        assert!(cache.is_empty());
+        assert_eq!(cache.len(), 0);
+    }
+
+    #[test]
+    fn tool_catalog_cache_empty_slice_uses_zero_capacity_path() {
+        // Empty input is fine — should produce a stable, non-empty digest.
+        let mut cache = ToolCatalogCache::new();
+        let entry = cache.fingerprint_for(&[]);
+        assert!(!entry.sha256_hex.is_empty());
+        let again = cache.fingerprint_for(&[]);
+        assert!(Arc::ptr_eq(&entry.joined, &again.joined));
+    }
+
+    #[test]
+    fn compute_with_tool_cache_matches_compute_uncached() {
+        // The cached and uncached paths must produce identical fingerprints
+        // for the same inputs — otherwise we'd silently corrupt the prefix
+        // cache and invalidate every request.
+        let mut cache = ToolCatalogCache::new();
+        let tools = vec![make_tool("alpha"), make_tool("beta")];
+
+        let cached = PrefixFingerprint::compute_with_tool_cache("sys", Some(&tools), &mut cache);
+        let uncached = PrefixFingerprint::compute("sys", Some(&tools));
+        assert_eq!(cached.combined_sha256, uncached.combined_sha256);
+        assert_eq!(cached.tools_sha256, uncached.tools_sha256);
+    }
+
+    #[test]
+    fn manager_check_and_update_uses_cached_tool_fingerprint() {
+        // After the first call populates the cache, subsequent calls with
+        // the same tool list should not invalidate the prefix.
+        let tools = vec![make_tool("t1")];
+        let mut mgr = PrefixStabilityManager::new("sys", Some(&tools));
+        assert!(mgr.check_and_update("sys", Some(&tools)).is_ok());
+        assert!(mgr.check_and_update("sys", Some(&tools)).is_ok());
+        assert_eq!(mgr.change_count(), 0);
+    }
 }
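Reviewer note: the eviction behavior exercised by the `ToolCatalogCache` tests above can be reproduced in isolation. The following is a minimal sketch, not the code in this commit: `FifoCache`, `joined_for`, and `identity_of` are hypothetical names, plain `&str` names stand in for `Tool`, and the SHA-256 step is omitted. It assumes the same shape as the diff (an order-sensitive `u64` key, a `HashMap` for lookup, and a `VecDeque` recording insertion order).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for the real cache: maps an order-sensitive
/// identity of the input slice to its sorted, newline-joined form.
pub struct FifoCache {
    by_identity: HashMap<u64, Arc<String>>,
    insertion_order: VecDeque<u64>,
    capacity: usize,
}

impl FifoCache {
    /// A capacity of 0 is clamped to 1, as in the diff above.
    pub fn with_capacity(capacity: usize) -> Self {
        let cap = capacity.max(1);
        Self {
            by_identity: HashMap::with_capacity(cap),
            insertion_order: VecDeque::with_capacity(cap),
            capacity: cap,
        }
    }

    pub fn joined_for(&mut self, names: &[&str]) -> Arc<String> {
        let identity = identity_of(names);
        if let Some(hit) = self.by_identity.get(&identity) {
            // Hit: hand back the shared allocation. The entry's position
            // in `insertion_order` is deliberately left untouched (FIFO).
            return Arc::clone(hit);
        }

        // Miss: sort so the joined output is order-insensitive.
        let mut sorted: Vec<&str> = names.to_vec();
        sorted.sort_unstable();
        let joined = Arc::new(sorted.join("\n"));

        // Evict the oldest inserted entry once the cache is full.
        if self.by_identity.len() >= self.capacity {
            if let Some(oldest) = self.insertion_order.pop_front() {
                self.by_identity.remove(&oldest);
            }
        }
        self.by_identity.insert(identity, Arc::clone(&joined));
        self.insertion_order.push_back(identity);
        joined
    }

    pub fn len(&self) -> usize {
        self.by_identity.len()
    }
}

/// Order-sensitive identity: the same names in a different order hash
/// differently, even though their joined output is identical.
fn identity_of(names: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    names.len().hash(&mut hasher);
    for name in names {
        name.hash(&mut hasher);
    }
    hasher.finish()
}
```

One design consequence worth keeping in mind: because hits do not move an entry to the back of the queue, a frequently used tool set can still be evicted once enough distinct sets have been inserted after it. That is simpler than LRU and is probably adequate when the tool set rarely changes.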
@@ -117,39 +117,52 @@ fn pricing_for_model_at(model: &str, _now: DateTime<Utc>) -> Option<ModelPricing
        // DeepSeek Platform pricing. Avoid showing misleading DeepSeek costs.
        return None;
    }
-    if !lower.contains("deepseek") {
-        return None;
+    match lower.as_str() {
+        "xiaomi/mimo-v2.5-pro" | "mimo-v2.5-pro" => return Some(deepseek_v4_pro_pricing()),
+        "xiaomi/mimo-v2.5" | "mimo-v2.5" => return Some(deepseek_v4_flash_pricing()),
+        _ => {}
    }
-    if lower.contains("v4-pro") || lower.contains("v4pro") {
-        // DeepSeek's pricing page says the V4-Pro promotional 75% discount
-        // becomes the official one-quarter base price after 2026-05-31 15:59
-        // UTC. Keep using the adjusted rate after that cutoff (#2489).
-        Some(ModelPricing {
-            usd: CurrencyPricing {
-                input_cache_hit_per_million: 0.003625,
-                input_cache_miss_per_million: 0.435,
-                output_per_million: 0.87,
-            },
-            cny: CurrencyPricing {
-                input_cache_hit_per_million: 0.025,
-                input_cache_miss_per_million: 3.0,
-                output_per_million: 6.0,
-            },
-        })
+    if lower.contains("deepseek") {
+        if lower.contains("v4-pro") || lower.contains("v4pro") {
+            // DeepSeek's pricing page says the V4-Pro promotional 75% discount
+            // becomes the official one-quarter base price after 2026-05-31 15:59
+            // UTC. Keep using the adjusted rate after that cutoff (#2489).
+            Some(deepseek_v4_pro_pricing())
+        } else {
+            Some(deepseek_v4_flash_pricing())
+        }
    } else {
-        // deepseek-v4-flash pricing.
-        Some(ModelPricing {
-            usd: CurrencyPricing {
-                input_cache_hit_per_million: 0.0028,
-                input_cache_miss_per_million: 0.14,
-                output_per_million: 0.28,
-            },
-            cny: CurrencyPricing {
-                input_cache_hit_per_million: 0.02,
-                input_cache_miss_per_million: 1.0,
-                output_per_million: 2.0,
-            },
-        })
+        None
+    }
+}
+
+fn deepseek_v4_pro_pricing() -> ModelPricing {
+    ModelPricing {
+        usd: CurrencyPricing {
+            input_cache_hit_per_million: 0.003625,
+            input_cache_miss_per_million: 0.435,
+            output_per_million: 0.87,
+        },
+        cny: CurrencyPricing {
+            input_cache_hit_per_million: 0.025,
+            input_cache_miss_per_million: 3.0,
+            output_per_million: 6.0,
+        },
+    }
+}
+
+fn deepseek_v4_flash_pricing() -> ModelPricing {
+    ModelPricing {
+        usd: CurrencyPricing {
+            input_cache_hit_per_million: 0.0028,
+            input_cache_miss_per_million: 0.14,
+            output_per_million: 0.28,
+        },
+        cny: CurrencyPricing {
+            input_cache_hit_per_million: 0.02,
+            input_cache_miss_per_million: 1.0,
+            output_per_million: 2.0,
+        },
    }
 }
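Reviewer note: for readers checking the per-million rates in the two pricing helpers above, the arithmetic can be sketched as follows. This is an illustration only. `Rates` and `estimate_cost` are hypothetical names, and the split between cache-hit and cache-miss input tokens is an assumption; the repository's `calculate_turn_cost_estimate` may apportion tokens differently.

```rust
/// Hypothetical per-million-token rates for a single currency, mirroring
/// the three fields of `CurrencyPricing` in the diff above.
#[derive(Debug, Clone, Copy)]
pub struct Rates {
    pub input_cache_hit_per_million: f64,
    pub input_cache_miss_per_million: f64,
    pub output_per_million: f64,
}

/// Cost = sum over token classes of (tokens / 1_000_000) * rate.
pub fn estimate_cost(
    rates: Rates,
    cache_hit_tokens: u64,
    cache_miss_tokens: u64,
    output_tokens: u64,
) -> f64 {
    const PER_MILLION: f64 = 1_000_000.0;
    (cache_hit_tokens as f64 / PER_MILLION) * rates.input_cache_hit_per_million
        + (cache_miss_tokens as f64 / PER_MILLION) * rates.input_cache_miss_per_million
        + (output_tokens as f64 / PER_MILLION) * rates.output_per_million
}

/// The USD flash rates copied from `deepseek_v4_flash_pricing` above.
pub fn flash_usd_rates() -> Rates {
    Rates {
        input_cache_hit_per_million: 0.0028,
        input_cache_miss_per_million: 0.14,
        output_per_million: 0.28,
    }
}
```

With these rates, one million cache-miss input tokens plus half a million output tokens comes to 0.14 + 0.14 = 0.28 USD; the same input served entirely from cache would cost 0.0028 + 0.14 = 0.1428 USD.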

@@ -340,6 +353,27 @@ mod tests {
        assert_eq!(pricing.cny.output_per_million, 2.0);
    }

+    #[test]
+    fn xiaomi_mimo_primary_models_use_matching_deepseek_v4_rates() {
+        let now = Utc.with_ymd_and_hms(2026, 6, 4, 0, 0, 0).single().unwrap();
+
+        let pro_pricing = pricing_for_model_at("mimo-v2.5-pro", now).unwrap();
+        assert_eq!(pro_pricing.usd.input_cache_hit_per_million, 0.003625);
+        assert_eq!(pro_pricing.usd.input_cache_miss_per_million, 0.435);
+        assert_eq!(pro_pricing.usd.output_per_million, 0.87);
+        assert_eq!(pro_pricing.cny.input_cache_hit_per_million, 0.025);
+        assert_eq!(pro_pricing.cny.input_cache_miss_per_million, 3.0);
+        assert_eq!(pro_pricing.cny.output_per_million, 6.0);
+
+        let flash_pricing = pricing_for_model_at("xiaomi/mimo-v2.5", now).unwrap();
+        assert_eq!(flash_pricing.usd.input_cache_hit_per_million, 0.0028);
+        assert_eq!(flash_pricing.usd.input_cache_miss_per_million, 0.14);
+        assert_eq!(flash_pricing.usd.output_per_million, 0.28);
+        assert_eq!(flash_pricing.cny.input_cache_hit_per_million, 0.02);
+        assert_eq!(flash_pricing.cny.input_cache_miss_per_million, 1.0);
+        assert_eq!(flash_pricing.cny.output_per_million, 2.0);
+    }
+
    #[test]
    fn cost_estimate_calculates_usd_and_cny() {
        let estimate = calculate_turn_cost_estimate("deepseek-v4-flash", 1_000_000, 500_000)
@@ -359,6 +359,22 @@ struct ReadmePack {
 /// sorted entries, bounded README text, and sorted JSON object fields. It does
 /// not include timestamps, random ids, absolute temp paths, or live git state.
 pub fn generate_project_context_pack(workspace: &Path) -> Option<String> {
+    let pack = build_project_context_pack(workspace)?;
+    let json = serde_json::to_string_pretty(&pack).ok()?;
+    Some(format!(
+        "## Project Context Pack\n\n<project_context_pack>\n{json}\n</project_context_pack>"
+    ))
+}
+
+fn generate_bounded_project_overview(workspace: &Path) -> Option<String> {
+    let pack = build_project_context_pack(workspace)?;
+    let json = serde_json::to_string_pretty(&pack).ok()?;
+    Some(format!(
+        "## Bounded Project Overview\n\n```json\n{json}\n```"
+    ))
+}
+
+fn build_project_context_pack(workspace: &Path) -> Option<ProjectContextPack> {
    let mut entries = Vec::new();
    collect_pack_entries(workspace, workspace, 0, &mut entries);
    sort_pack_paths(&mut entries);
@@ -386,7 +402,7 @@ pub fn generate_project_context_pack(workspace: &Path) -> Option<String> {
    counts.insert("directory_entries".to_string(), entries.len());
    counts.insert("key_source_files".to_string(), key_source_files.len());

-    let pack = ProjectContextPack {
+    Some(ProjectContextPack {
        project_name: workspace
            .file_name()
            .and_then(|name| name.to_str())
@@ -397,12 +413,7 @@ pub fn generate_project_context_pack(workspace: &Path) -> Option<String> {
        config_files,
        key_source_files,
        counts,
-    };
-
-    let json = serde_json::to_string_pretty(&pack).ok()?;
-    Some(format!(
-        "## Project Context Pack\n\n<project_context_pack>\n{json}\n</project_context_pack>"
-    ))
+    })
 }

 fn collect_pack_entries(root: &Path, dir: &Path, depth: usize, out: &mut Vec<String>) {
@@ -649,20 +660,45 @@ pub fn load_project_context(workspace: &Path) -> ProjectContext {
 ///
 /// This allows for monorepo setups where a root AGENTS.md applies to all subdirectories.
 pub fn load_project_context_with_parents(workspace: &Path) -> ProjectContext {
-    load_project_context_with_parents_and_home(workspace, dirs::home_dir().as_deref())
+    load_project_context_with_parents_cached_and_home(workspace, dirs::home_dir().as_deref())
+}
+
+fn load_project_context_with_parents_cached_and_home(
+    workspace: &Path,
+    home_dir: Option<&Path>,
+) -> ProjectContext {
+    let workspace = canonicalize_workspace_or_keep(workspace);
+    let pre_load_key = crate::project_context_cache::compute_cache_key(&workspace, home_dir);
+    if let Some(ctx) = crate::project_context_cache::lookup(&pre_load_key) {
+        return ctx;
+    }
+
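+    let ctx = load_project_context_with_parents_and_home(&workspace, home_dir);
+    // Recompute the key after loading, since the loader can have side
+    // effects (such as auto-generating `.codewhale/instructions.md`) that
+    // change the files the key is derived from.
+    let post_load_key = crate::project_context_cache::compute_cache_key(&workspace, home_dir);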
+    crate::project_context_cache::store(post_load_key, ctx.clone());
+    ctx
 }

 fn load_project_context_with_parents_and_home(
    workspace: &Path,
    home_dir: Option<&Path>,
 ) -> ProjectContext {
+    let workspace_canonical = canonicalize_workspace_or_keep(workspace);
    let mut ctx = load_project_context(workspace);
+    let parent_search_stop = project_context_parent_search_stop_dir();

    // If no context found in workspace, check parent directories
    if !ctx.has_instructions() {
-        let mut current = workspace.parent();
+        let mut current = workspace_canonical.parent();

        while let Some(parent) = current {
+            if parent_search_stop
+                .as_deref()
+                .is_some_and(|stop| parent == stop)
+            {
+                break;
+            }
+
            let parent_ctx = load_project_context(parent);
            ctx.warnings.extend(parent_ctx.warnings.iter().cloned());
            if parent_ctx.has_instructions() {
@@ -704,7 +740,7 @@ fn load_project_context_with_parents_and_home(
        }
    }

-    // Auto-generate .deepseek/instructions.md when no context file exists anywhere.
+    // Auto-generate .codewhale/instructions.md when no context file exists anywhere.
    // This avoids the per-turn filesystem scan fallback in prompts.rs that
    // breaks KV prefix cache stability.
    if !ctx.has_instructions()
@@ -735,6 +771,92 @@ fn load_project_context_with_parents_and_home(
    ctx
 }

+pub(crate) fn project_context_cache_candidate_paths(
+    workspace: &Path,
+    home_dir: Option<&Path>,
+) -> Vec<PathBuf> {
+    let workspace = canonicalize_workspace_or_keep(workspace);
+    let mut paths = Vec::new();
+    let parent_search_stop = project_context_parent_search_stop_dir();
+
+    let mut current = Some(workspace.as_path());
+    while let Some(dir) = current {
+        if parent_search_stop
+            .as_deref()
+            .is_some_and(|stop| dir == stop)
+        {
+            break;
+        }
+
+        for filename in PROJECT_CONTEXT_FILES {
+            paths.push(dir.join(filename));
+        }
+        current = dir.parent();
+    }
+
+    if let Some(home) = home_dir {
+        for candidate in global_context_relative_paths() {
+            paths.push(join_relative_components(home, candidate));
+        }
+    }
+
+    paths.extend(repo_constitution_candidate_paths(&workspace));
+    paths.push(workspace.join(".deepseek").join("trusted"));
+    paths.push(workspace.join(".deepseek").join("trust.json"));
+    paths.extend(crate::config::workspace_trust_config_candidate_paths());
+
+    paths
+}
+
+fn repo_constitution_candidate_paths(workspace: &Path) -> Vec<PathBuf> {
+    let git_root = crate::project_doc::find_git_root(workspace);
+    let mut current = workspace.to_path_buf();
+    let mut paths = Vec::new();
+    loop {
+        paths.push(join_relative_components(
+            &current,
+            REPO_CONSTITUTION_RELATIVE_PATH,
+        ));
+        if let Some(ref root) = git_root
+            && current == *root
+        {
+            break;
+        }
+        match current.parent() {
+            Some(parent) if parent != current => current = parent.to_path_buf(),
+            _ => break,
+        }
+    }
+    paths
+}
+
+fn global_context_relative_paths() -> [&'static [&'static str]; 6] {
+    [
+        GLOBAL_AGENTS_RELATIVE_PATH,
+        GLOBAL_AGENTS_VENDOR_NEUTRAL_PATH,
+        GLOBAL_AGENTS_LEGACY_PATH,
+        GLOBAL_WHALE_RELATIVE_PATH,
+        GLOBAL_WHALE_VENDOR_NEUTRAL_PATH,
+        GLOBAL_WHALE_LEGACY_PATH,
+    ]
+}
+
+fn join_relative_components(base: &Path, relative: &[&str]) -> PathBuf {
+    let mut path = base.to_path_buf();
+    for component in relative {
+        path.push(component);
+    }
+    path
+}
+
+fn canonicalize_workspace_or_keep(workspace: &Path) -> PathBuf {
+    fs::canonicalize(workspace).unwrap_or_else(|_| workspace.to_path_buf())
+}
+
+fn project_context_parent_search_stop_dir() -> Option<PathBuf> {
+    dirs::home_dir().map(|home| canonicalize_workspace_or_keep(&home))
+}
+
 /// Combine global user-wide preferences with a project-local
 /// AGENTS.md/CLAUDE.md/instructions.md. Global comes first so
 /// workspace-specific rules can override it — the model reads in declared
@@ -765,22 +887,10 @@ fn load_global_agents_context(workspace: &Path, home_dir: Option<&Path>) -> Opti
    // 4. ~/.codewhale/WHALE.md      (deprecated, legacy fallback)
    // 5. ~/.agents/WHALE.md         (deprecated, vendor-neutral legacy)
    // 6. ~/.deepseek/WHALE.md       (deprecated, legacy)
-    let candidates: &[&[&str]] = &[
-        GLOBAL_AGENTS_RELATIVE_PATH,
-        GLOBAL_AGENTS_VENDOR_NEUTRAL_PATH,
-        GLOBAL_AGENTS_LEGACY_PATH,
-        GLOBAL_WHALE_RELATIVE_PATH,
-        GLOBAL_WHALE_VENDOR_NEUTRAL_PATH,
-        GLOBAL_WHALE_LEGACY_PATH,
-    ];
-
    let mut warnings = Vec::new();

-    for candidate in candidates {
-        let mut path = home.to_path_buf();
-        for component in *candidate {
-            path.push(component);
-        }
+    for candidate in global_context_relative_paths() {
+        let path = join_relative_components(home, candidate);

        if path.exists() && path.is_file() {
            match load_context_file(&path) {
@@ -823,15 +933,13 @@ fn auto_generate_context(workspace: &Path) -> Option<String> {
        return None;
    }

-    let summary = crate::utils::summarize_project(workspace);
-    let tree = crate::utils::project_tree(workspace, 2);
+    let overview = generate_bounded_project_overview(workspace)?;

    let content = format!(
-        "# Project Structure (Auto-generated)\n\n\
+        "# Project Context (Auto-generated)\n\n\
         > This file was automatically generated by CodeWhale.\n\
         > You can edit or delete it at any time.\n\n\
-         **Summary:** {summary}\n\n\
-         **Tree:**\n```\n{tree}\n```"
+         {overview}"
    );

    // Create .codewhale/ directory
@@ -1379,6 +1487,178 @@ mod tests {
        );
    }

+    #[test]
+    fn auto_generated_context_is_bounded_for_many_file_workspace() {
+        let workspace = tempdir().expect("workspace tempdir");
+        let home = tempdir().expect("home tempdir");
+        let noisy = workspace.path().join("aaa-many-files");
+        fs::create_dir_all(&noisy).expect("mkdir noisy");
+        for i in 0..1000 {
+            fs::write(noisy.join(format!("file-{i:04}.rs")), "fn noisy() {}").expect("write noisy");
+        }
+        fs::create_dir_all(workspace.path().join("zzz-important")).expect("mkdir important");
+        fs::write(
+            workspace.path().join("zzz-important").join("main.rs"),
+            "fn important() {}",
+        )
+        .expect("write important");
+
+        let start = std::time::Instant::now();
+        let ctx = load_project_context_with_parents_and_home(workspace.path(), Some(home.path()));
+        let elapsed = start.elapsed();
+        assert!(
+            elapsed < std::time::Duration::from_secs(2),
+            "auto-generated context should stay bounded, took {elapsed:?}"
+        );
+        assert!(ctx.has_instructions());
+
+        let generated_path = workspace.path().join(".codewhale").join("instructions.md");
+        assert_eq!(ctx.source_path.as_deref(), Some(generated_path.as_path()));
+        let generated = fs::read_to_string(&generated_path).expect("read generated");
+        assert!(generated.contains("Project Context (Auto-generated)"));
+        assert!(generated.contains("Bounded Project Overview"));
+        assert!(!generated.contains("<project_context_pack>"));
+        assert!(
+            generated.contains("\"zzz-important/\""),
+            "later top-level project areas should remain visible:\n{generated}"
+        );
+        let noisy_count = generated.matches("aaa-many-files/file-").count();
+        assert!(
+            noisy_count < 300,
+            "generated context should not list the whole noisy directory; saw {noisy_count}"
+        );
+        assert!(
+            !generated.contains("file-0999.rs"),
+            "bounded context should omit the tail of the noisy directory"
+        );
+    }
+
+    #[test]
+    fn cached_context_reflects_overwritten_agents_md() {
+        crate::project_context_cache::clear();
+        let workspace = tempdir().expect("workspace tempdir");
+        let home = tempdir().expect("home tempdir");
+        let agents = workspace.path().join("AGENTS.md");
+        fs::write(&agents, "alpha").expect("write alpha");
+
+        let first =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(
+            first
+                .instructions
+                .as_deref()
+                .is_some_and(|s| s.contains("alpha")),
+            "expected alpha instructions: {:?}",
+            first.instructions
+        );
+
+        fs::write(&agents, "bravo").expect("write bravo");
+        let second =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+
+        assert!(
+            second
+                .instructions
+                .as_deref()
+                .is_some_and(|s| s.contains("bravo")),
+            "cache must invalidate on same-length content overwrite: {:?}",
+            second.instructions
+        );
+    }
+
+    #[test]
+    fn cached_context_reflects_constitution_json_change() {
+        crate::project_context_cache::clear();
+        let workspace = tempdir().expect("workspace tempdir");
+        let home = tempdir().expect("home tempdir");
+        fs::create_dir(workspace.path().join(".git")).expect("mkdir git");
+        fs::create_dir(workspace.path().join(".codewhale")).expect("mkdir codewhale");
+        let constitution = workspace
+            .path()
+            .join(".codewhale")
+            .join("constitution.json");
+        fs::write(
+            &constitution,
+            r#"{"schema_version":1,"authority":["alpha authority"]}"#,
+        )
+        .expect("write alpha constitution");
+
+        let first =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(
+            first
+                .constitution_block
+                .as_deref()
+                .is_some_and(|s| s.contains("alpha authority")),
+            "expected alpha constitution block: {:?}",
+            first.constitution_block
+        );
+
+        fs::write(
+            &constitution,
+            r#"{"schema_version":1,"authority":["bravo authority"]}"#,
+        )
+        .expect("write bravo constitution");
+        let second =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+
+        assert!(
+            second
+                .constitution_block
+                .as_deref()
+                .is_some_and(|s| s.contains("bravo authority")),
+            "cache must invalidate when constitution changes: {:?}",
+            second.constitution_block
+        );
+    }
+
+    #[test]
+    fn cached_context_regenerates_after_auto_generated_context_is_deleted() {
+        crate::project_context_cache::clear();
+        let workspace = tempdir().expect("workspace tempdir");
+        let home = tempdir().expect("home tempdir");
+
+        let first =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(first.has_instructions());
+        let generated_path = workspace.path().join(".codewhale").join("instructions.md");
+        assert!(generated_path.is_file(), "expected generated instructions");
+
+        fs::remove_file(&generated_path).expect("remove generated instructions");
+        assert!(!generated_path.exists());
+
+        let second =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(second.has_instructions());
+        assert!(
+            generated_path.is_file(),
+            "cache hit under the missing-file signature would skip regeneration"
+        );
+    }
+
+    #[test]
+    fn cached_context_reflects_trust_marker_created() {
+        crate::project_context_cache::clear();
+        let workspace = tempdir().expect("workspace tempdir");
+        let home = tempdir().expect("home tempdir");
+        fs::write(workspace.path().join("AGENTS.md"), "instructions").expect("write agents");
+
+        let first =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(!first.is_trusted);
+
+        let trust_dir = workspace.path().join(".deepseek");
+        fs::create_dir(&trust_dir).expect("mkdir trust dir");
+        fs::write(trust_dir.join("trusted"), "").expect("write trust marker");
+
+        let second =
+            load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
+        assert!(
+            second.is_trusted,
+            "cache must invalidate when trust marker appears"
+        );
+    }
+
    #[test]
    fn project_context_pack_sort_is_cross_platform_and_priority_aware() {
        let mut unix_paths = vec![
@@ -1657,7 +1937,7 @@ mod tests {
            ctx.instructions
                .as_ref()
                .unwrap()
-                .contains("Project Structure (Auto-generated)")
+                .contains("Project Context (Auto-generated)")
        );
    }
 }
@@ -0,0 +1,220 @@
+//! In-process, per-thread cache for project context loading.
+//!
+//! The project-context loader sits on prompt/session hot paths and repeatedly
+//! checks the same workspace, parent, global, constitution, and trust files.
+//! This cache avoids rereading unchanged context while keeping the signature
+//! broad enough for the loader's side effects and authority surfaces.
+
+use std::cell::RefCell;
+use std::collections::{HashMap, VecDeque};
+use std::path::{Path, PathBuf};
+
+use sha2::{Digest, Sha256};
+
+use crate::project_context::ProjectContext;
+
+const DEFAULT_CAPACITY: usize = 8;
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+pub(crate) struct CacheKey {
+    workspace: PathBuf,
+    signature: ContentSignature,
+}
+
+#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
+struct ContentSignature {
+    entries: Vec<ContentEntry>,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash)]
+struct ContentEntry {
+    path: PathBuf,
+    fingerprint: Option<String>,
+}
+
+#[derive(Debug, Default)]
+struct WorkspaceCache {
+    by_key: HashMap<CacheKey, ProjectContext>,
+    order: VecDeque<CacheKey>,
+}
+
+thread_local! {
+    static CACHE: RefCell<WorkspaceCache> = RefCell::new(WorkspaceCache::default());
+}
+
+pub(crate) fn lookup(key: &CacheKey) -> Option<ProjectContext> {
+    CACHE.with(|cache| cache.borrow().by_key.get(key).cloned())
+}
+
+pub(crate) fn store(key: CacheKey, value: ProjectContext) {
+    CACHE.with(|cache| {
+        let mut cache = cache.borrow_mut();
+        if cache.by_key.insert(key.clone(), value).is_none() {
+            cache.order.push_back(key);
+        }
+        while cache.by_key.len() > DEFAULT_CAPACITY {
+            let Some(oldest) = cache.order.pop_front() else {
+                break;
+            };
+            cache.by_key.remove(&oldest);
+        }
+    });
+}
+
+#[cfg(test)]
+pub(crate) fn clear() {
+    CACHE.with(|cache| {
+        let mut cache = cache.borrow_mut();
+        cache.by_key.clear();
+        cache.order.clear();
+    });
+}
+
+#[must_use]
+pub(crate) fn compute_cache_key(workspace: &Path, home_dir: Option<&Path>) -> CacheKey {
+    let workspace = canonicalize_or_keep(workspace);
+    CacheKey {
+        signature: ContentSignature::for_loader(&workspace, home_dir),
+        workspace,
+    }
+}
+
+impl ContentSignature {
+    fn for_loader(workspace: &Path, home_dir: Option<&Path>) -> Self {
+        let mut entries: Vec<ContentEntry> =
+            crate::project_context::project_context_cache_candidate_paths(workspace, home_dir)
+                .into_iter()
+                .map(|path| ContentEntry {
+                    fingerprint: file_fingerprint(&path),
+                    path,
+                })
+                .collect();
+
+        entries.sort_by(|a, b| a.path.cmp(&b.path));
+        entries.dedup_by(|a, b| a.path == b.path);
+
+        Self { entries }
+    }
+}
+
+fn file_fingerprint(path: &Path) -> Option<String> {
+    let metadata = std::fs::metadata(path).ok()?;
+    if !metadata.is_file() {
+        return Some("non-file".to_string());
+    }
+
+    match std::fs::read(path) {
+        Ok(bytes) => {
+            let mut hasher = Sha256::new();
+            hasher.update(&bytes);
+            Some(format!("sha256:{}", to_hex(&hasher.finalize())))
+        }
+        Err(error) => {
+            let modified = metadata
+                .modified()
+                .ok()
+                .and_then(|mtime| mtime.duration_since(std::time::UNIX_EPOCH).ok())
+                .map(|duration| format!("{}:{}", duration.as_secs(), duration.subsec_nanos()))
+                .unwrap_or_else(|| "unknown".to_string());
+            Some(format!(
+                "unreadable:{}:{}:{error}",
+                metadata.len(),
+                modified
+            ))
+        }
+    }
+}
+
+fn canonicalize_or_keep(path: &Path) -> PathBuf {
+    std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
+}
+
+fn to_hex(bytes: &[u8]) -> String {
+    let mut out = String::with_capacity(bytes.len() * 2);
+    for byte in bytes {
+        use std::fmt::Write as _;
+        let _ = write!(&mut out, "{byte:02x}");
+    }
+    out
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::fs;
+    use tempfile::tempdir;
+
+    #[test]
+    fn cache_round_trip() {
+        clear();
+        let key = CacheKey {
+            workspace: PathBuf::from("/tmp/context-cache-round-trip"),
+            signature: ContentSignature::default(),
+        };
+        let ctx = ProjectContext::empty(PathBuf::from("/tmp/context-cache-round-trip"));
+
+        store(key.clone(), ctx.clone());
+
+        let got = lookup(&key).expect("cache hit");
+        assert_eq!(got.project_root, ctx.project_root);
+    }
+
+    #[test]
+    fn store_does_not_grow_unbounded() {
+        clear();
+        for i in 0..(DEFAULT_CAPACITY + 4) {
+            let key = CacheKey {
+                workspace: PathBuf::from(format!("/tmp/workspace-{i}")),
+                signature: ContentSignature::default(),
+            };
+            store(key, ProjectContext::empty(PathBuf::from("/tmp")));
+        }
+
+        let count = CACHE.with(|cache| cache.borrow().by_key.len());
+        assert!(count <= DEFAULT_CAPACITY, "cache held {count} entries");
+    }
+
+    #[test]
+    fn cache_key_canonicalizes_equivalent_workspace_paths() {
+        let workspace = tempdir().expect("workspace");
+        let home = tempdir().expect("home");
+        let plain = compute_cache_key(workspace.path(), Some(home.path()));
+        let dotted = compute_cache_key(&workspace.path().join("."), Some(home.path()));
+
+        assert_eq!(plain.workspace, dotted.workspace);
+    }
+
+    #[test]
+    fn signature_changes_when_agents_md_is_overwritten_same_length() {
+        let workspace = tempdir().expect("workspace");
+        let home = tempdir().expect("home");
+        fs::write(workspace.path().join("AGENTS.md"), "alpha").expect("write alpha");
+        let before = compute_cache_key(workspace.path(), Some(home.path()));
+
+        fs::write(workspace.path().join("AGENTS.md"), "bravo").expect("write bravo");
+        let after = compute_cache_key(workspace.path(), Some(home.path()));
+
+        assert_ne!(before, after);
+    }
+
+    #[test]
+    fn signature_changes_when_constitution_json_changes() {
+        let workspace = tempdir().expect("workspace");
+        let home = tempdir().expect("home");
+        fs::create_dir(workspace.path().join(".git")).expect("mkdir git");
+        fs::create_dir(workspace.path().join(".codewhale")).expect("mkdir codewhale");
+        let constitution = workspace
+            .path()
+            .join(".codewhale")
+            .join("constitution.json");
+        fs::write(&constitution, r#"{"schema_version":1,"authority":["a"]}"#)
+            .expect("write constitution a");
+        let before = compute_cache_key(workspace.path(), Some(home.path()));
+
+        fs::write(&constitution, r#"{"schema_version":1,"authority":["b"]}"#)
+            .expect("write constitution b");
+        let after = compute_cache_key(workspace.path(), Some(home.path()));
+
+        assert_ne!(before, after);
+    }
+}
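Review note: the `store` function in the cache module above bounds memory by evicting in insertion order. The following is a minimal sketch of that eviction policy, assuming string keys and values in place of `CacheKey` and `ProjectContext`; `FifoCache` and its methods are hypothetical names used only for illustration, not part of the crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the diff's `WorkspaceCache`. Keys and values are
/// plain strings so the sketch needs only the standard library.
#[derive(Debug)]
pub struct FifoCache {
    by_key: HashMap<String, String>,
    order: VecDeque<String>,
    capacity: usize,
}

impl FifoCache {
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            by_key: HashMap::new(),
            order: VecDeque::new(),
            capacity,
        }
    }

    /// Insert or overwrite. Only a brand-new key is appended to `order`, so
    /// overwriting an entry does not refresh its position (FIFO, not LRU).
    pub fn store(&mut self, key: &str, value: &str) {
        if self
            .by_key
            .insert(key.to_string(), value.to_string())
            .is_none()
        {
            self.order.push_back(key.to_string());
        }
        // Evict the oldest inserted keys until the map fits the capacity.
        while self.by_key.len() > self.capacity {
            let Some(oldest) = self.order.pop_front() else {
                break;
            };
            self.by_key.remove(&oldest);
        }
    }

    pub fn lookup(&self, key: &str) -> Option<&str> {
        self.by_key.get(key).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.by_key.len()
    }
}
```

One consequence worth keeping in mind: because lookups and overwrites never reorder the queue, a frequently used key is evicted as soon as enough newer keys arrive. That is probably acceptable here, since a changed file signature produces a new key anyway.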
@@ -2,7 +2,8 @@
 //! System prompts for different modes.
 //!
 //! Prompts are assembled from composable layers loaded at compile time:
-//!   tool taxonomy → base.md → personality overlay → mode delta → approval policy
+//!   base.md + personality overlay → message[0] (byte-stable).
+//!   mode delta + tool taxonomy + approval policy → request-time runtime metadata.
 //!
 //! This keeps each concern in its own file and makes prompt tuning
 //! a single-file operation.
@@ -298,6 +299,31 @@ static LOCALE_CLOSER_JA_OVERRIDE: std::sync::OnceLock<String> = std::sync::OnceL
 static LOCALE_CLOSER_PT_BR_OVERRIDE: std::sync::OnceLock<String> = std::sync::OnceLock::new();
 static LOCALE_CLOSER_VI_OVERRIDE: std::sync::OnceLock<String> = std::sync::OnceLock::new();
 static AUTHORITY_RECAP_OVERRIDE: std::sync::OnceLock<String> = std::sync::OnceLock::new();
+static STATIC_PROMPT_COMPOSER: std::sync::OnceLock<Box<StaticPromptComposer>> =
+    std::sync::OnceLock::new();
+
+/// Context passed to an embedder-provided static prompt composer.
+///
+/// This hook only replaces the byte-stable base/personality prompt segment.
+/// Mode deltas, approval policy, tool taxonomy, Context Management, and the
+/// Compaction Relay stay owned by CodeWhale's runtime prompt assembly.
+#[non_exhaustive]
+#[derive(Debug)]
+pub struct StaticPromptCtx<'a> {
+    /// Active model identifier after caller-side routing.
+    pub model_id: &'a str,
+    /// Personality overlay requested for the base static prompt.
+    pub personality: Personality,
+    /// Whether shell tools are present in the runtime tool catalog.
+    pub shell_tools_available: bool,
+    /// Default base/personality prompt layers that would be used without an
+    /// override.
+    pub default_layers: &'a str,
+}
+
+/// Embedder hook for replacing CodeWhale's byte-stable base/personality prompt
+/// segment.
+pub type StaticPromptComposer = dyn Fn(&StaticPromptCtx<'_>) -> String + Send + Sync + 'static;

 /// Replace `BASE_PROMPT` for all subsequent prompt composition. First call
 /// wins; later calls return the rejected string. Set before spawning any
@@ -351,10 +377,26 @@ pub fn set_authority_recap_override(s: String) -> Result<(), String> {
    set_prompt_override(&AUTHORITY_RECAP_OVERRIDE, s)
 }

+/// Replace the byte-stable base/personality prompt segment for subsequent
+/// prompt composition. First call wins; later calls return the rejected
+/// composer so embedders can preserve ownership.
+pub fn set_static_prompt_composer_override(
+    f: Box<StaticPromptComposer>,
+) -> Result<(), Box<StaticPromptComposer>> {
+    set_static_prompt_composer(&STATIC_PROMPT_COMPOSER, f)
+}
+
 fn set_prompt_override(cell: &std::sync::OnceLock<String>, s: String) -> Result<(), String> {
    cell.set(s)
 }

+fn set_static_prompt_composer(
+    cell: &std::sync::OnceLock<Box<StaticPromptComposer>>,
+    f: Box<StaticPromptComposer>,
+) -> Result<(), Box<StaticPromptComposer>> {
+    cell.set(f)
+}
+
 fn effective_prompt_override<'a>(
    cell: &'a std::sync::OnceLock<String>,
    fallback: &'static str,
@@ -366,6 +408,10 @@ fn effective_base_prompt() -> &'static str {
    effective_prompt_override(&BASE_PROMPT_OVERRIDE, BASE_PROMPT)
 }

+fn effective_static_prompt_composer() -> Option<&'static StaticPromptComposer> {
+    STATIC_PROMPT_COMPOSER.get().map(Box::as_ref)
+}
+
 fn effective_locale_preamble_zh_hans() -> &'static str {
    effective_prompt_override(&LOCALE_PREAMBLE_ZH_HANS_OVERRIDE, LOCALE_PREAMBLE_ZH_HANS)
 }
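Review note: the composer hook added in the hunks above follows a first-call-wins pattern built on `std::sync::OnceLock`. The sketch below shows that pattern in isolation, under simplifying assumptions: the hypothetical `Composer` type takes only the default layers as `&str` (the real `StaticPromptComposer` takes a `&StaticPromptCtx`), and `install` and `compose` are illustrative names, not the crate's API.

```rust
use std::sync::OnceLock;

/// Hypothetical, simplified hook type. The closure only sees the default
/// layers rather than a full context struct.
pub type Composer = dyn Fn(&str) -> String + Send + Sync + 'static;

/// First call wins. A later call gets its own boxed closure back in `Err`,
/// because `OnceLock::set` returns the rejected value.
pub fn install(
    cell: &OnceLock<Box<Composer>>,
    f: Box<Composer>,
) -> Result<(), Box<Composer>> {
    cell.set(f)
}

/// Run the installed composer, or return the default layers unchanged when
/// no composer has been installed.
pub fn compose(cell: &OnceLock<Box<Composer>>, default_layers: &str) -> String {
    match cell.get() {
        Some(composer) => composer(default_layers),
        None => default_layers.to_string(),
    }
}
```

Since a boxed closure does not implement `Debug`, callers of a function shaped like `install` cannot use `unwrap()` on the result; checking `is_ok()` or matching on `Err(f)` works instead, which is also what the tests in this commit do.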
@@ -655,32 +701,65 @@ impl Personality {

 // ── Composition ───────────────────────────────────────────────────────

-fn mode_prompt(mode: AppMode) -> &'static str {
-    match mode {
-        AppMode::Agent => AGENT_MODE,
-        AppMode::Yolo => YOLO_MODE,
-        AppMode::Plan => PLAN_MODE,
-    }
-}
+/// Generate a static reference block containing all mode and approval policy
+/// descriptions. This lives in the frozen system-prompt prefix (sent once per
+/// session) so the per-turn `<runtime_prompt>` tag can be a minimal pointer
+/// (`<runtime_prompt visibility="internal" mode="yolo" approval="auto"/>`)
+/// instead of repeating the full policy text on every API request.
+pub(crate) fn render_runtime_policy_reference() -> String {
+    let taxonomy_agent = render_core_tool_taxonomy_body(AppMode::Agent);
+    let taxonomy_plan = render_core_tool_taxonomy_body(AppMode::Plan);
+    let taxonomy_yolo = render_core_tool_taxonomy_body(AppMode::Yolo);

-fn default_approval_mode_for_mode(mode: AppMode) -> ApprovalMode {
-    match mode {
-        AppMode::Agent => ApprovalMode::Suggest,
-        AppMode::Yolo => ApprovalMode::Auto,
-        AppMode::Plan => ApprovalMode::Never,
-    }
-}
+    let mut out = String::with_capacity(8192);
+    out.push_str("## Runtime Policy Reference\n\n");

-fn approval_prompt_for_mode(mode: AppMode, approval_mode: ApprovalMode) -> &'static str {
-    match mode {
-        AppMode::Yolo => AUTO_APPROVAL,
-        AppMode::Plan => NEVER_APPROVAL,
-        AppMode::Agent => match approval_mode {
-            ApprovalMode::Auto => AUTO_APPROVAL,
-            ApprovalMode::Suggest => SUGGEST_APPROVAL,
-            ApprovalMode::Never => NEVER_APPROVAL,
-        },
-    }
+    // Protocol explanation — how the per-turn tag maps to this reference.
+    out.push_str(
+        "Each turn, the latest message in the transcript will contain a \
+         `<runtime_prompt>` tag that specifies the currently active mode and \
+         approval policy. When you see this tag, look up the corresponding \
+         rules below and apply them for the current turn.\n\n\
+         The tag format is:\n\
+         `<runtime_prompt visibility=\"internal\" mode=\"<mode>\" approval=\"<approval>\"/>`\n\n",
+    );
+
+    // ── Mode reference ─────────────────────────────────────────────────
+    out.push_str("### Modes\n\n");
+
+    out.push_str("#### agent\n\n");
+    out.push_str(&taxonomy_agent);
+    out.push_str("\n\n");
+    out.push_str(AGENT_MODE.trim());
+    out.push_str("\n\n");
+
+    out.push_str("#### plan\n\n");
+    out.push_str(&taxonomy_plan);
+    out.push_str("\n\n");
+    out.push_str(PLAN_MODE.trim());
+    out.push_str("\n\n");
+
+    out.push_str("#### yolo\n\n");
+    out.push_str(&taxonomy_yolo);
+    out.push_str("\n\n");
+    out.push_str(YOLO_MODE.trim());
+    out.push_str("\n\n");
+
+    // ── Approval policy reference ──────────────────────────────────────
+    out.push_str("### Approval Policies\n\n");
+
+    out.push_str("#### auto\n\n");
+    out.push_str(AUTO_APPROVAL.trim());
+    out.push_str("\n\n");
+
+    out.push_str("#### suggest\n\n");
+    out.push_str(SUGGEST_APPROVAL.trim());
+    out.push_str("\n\n");
+
+    out.push_str("#### never\n\n");
+    out.push_str(NEVER_APPROVAL.trim());
+
+    out
 }
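Review note: the reference block rendered above is keyed by lowercase mode and approval labels, and the per-turn tag only has to name one of each. The per-turn renderer is not part of this hunk, so the following is a sketch under assumptions: `Mode`, `Approval`, `label`, and `render_runtime_prompt_tag` are hypothetical stand-ins for the crate's `AppMode`, `ApprovalMode`, and whatever function actually emits the tag.

```rust
/// Local stand-in for the crate's `AppMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Agent,
    Plan,
    Yolo,
}

/// Local stand-in for the crate's `ApprovalMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Approval {
    Auto,
    Suggest,
    Never,
}

impl Mode {
    /// Lowercase label matching the `#### agent`, `#### plan`, and
    /// `#### yolo` headings of the reference block.
    pub fn label(self) -> &'static str {
        match self {
            Mode::Agent => "agent",
            Mode::Plan => "plan",
            Mode::Yolo => "yolo",
        }
    }
}

impl Approval {
    /// Lowercase label matching the `#### auto`, `#### suggest`, and
    /// `#### never` headings of the reference block.
    pub fn label(self) -> &'static str {
        match self {
            Approval::Auto => "auto",
            Approval::Suggest => "suggest",
            Approval::Never => "never",
        }
    }
}

/// Hypothetical renderer for the per-turn pointer tag, using the tag format
/// quoted in the reference block's protocol explanation.
pub fn render_runtime_prompt_tag(mode: Mode, approval: Approval) -> String {
    format!(
        "<runtime_prompt visibility=\"internal\" mode=\"{}\" approval=\"{}\"/>",
        mode.label(),
        approval.label()
    )
}
```

Keeping the labels in one place matters for this design: if a tag label ever drifts from the heading text in the reference block, the model has nothing to look up.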

 /// Compose the full system prompt in deterministic order:
@@ -705,7 +784,10 @@ const TOOL_TAXONOMY_DISCOVERY: &[&str] = &["grep_files", "file_search"];
 const TOOL_TAXONOMY_GIT: &[&str] = &["git_status", "git_diff"];
 const TOOL_TAXONOMY_VERIFICATION: &[&str] = &["run_tests", "run_verifiers"];

-fn render_core_tool_taxonomy_block(mode: AppMode) -> String {
+/// Return the core tool taxonomy body **without** a markdown heading.
+/// Suitable for embedding under a mode-specific sub-heading in the
+/// Runtime Policy Reference without producing a broken heading hierarchy.
+pub(crate) fn render_core_tool_taxonomy_body(mode: AppMode) -> String {
    let core_tools = core_taxonomy_tools_for_mode(mode);
    let mut sentences = Vec::new();

@@ -723,7 +805,7 @@ fn render_core_tool_taxonomy_block(mode: AppMode) -> String {
        !sentences.is_empty(),
        "core tool taxonomy has no active tool groups"
    );
-    format!("## Core Tool Taxonomy\n\n{}", sentences.join(" "))
+    sentences.join(" ")
 }

 fn core_taxonomy_tools_for_mode(mode: AppMode) -> Vec<&'static str> {
@@ -762,15 +844,11 @@ context are subordinate to the Constitution, the Statutes, and the user's
 current request. When in doubt, consult Article VII: The Hierarchy of Law.";

 pub fn compose_prompt(mode: AppMode, personality: Personality) -> String {
-    compose_prompt_with_approval(mode, personality, default_approval_mode_for_mode(mode))
+    compose_prompt_with_approval(mode, personality)
 }

-pub fn compose_prompt_with_approval(
-    mode: AppMode,
-    personality: Personality,
-    approval_mode: ApprovalMode,
-) -> String {
-    compose_prompt_with_approval_and_model(mode, personality, approval_mode, "codewhale")
+pub fn compose_prompt_with_approval(mode: AppMode, personality: Personality) -> String {
+    compose_prompt_with_approval_and_model(mode, personality, "codewhale")
 }

 /// Compose with explicit model ID for dynamic identity injection.
@@ -778,33 +856,40 @@ pub fn compose_prompt_with_approval(
 pub fn compose_prompt_with_approval_and_model(
    mode: AppMode,
    personality: Personality,
-    approval_mode: ApprovalMode,
    model_id: &str,
 ) -> String {
-    compose_prompt_with_approval_model_and_shell(mode, personality, approval_mode, model_id, true)
+    compose_prompt_with_approval_model_and_shell(mode, personality, model_id, true)
 }

 fn compose_prompt_with_approval_model_and_shell(
    mode: AppMode,
    personality: Personality,
-    approval_mode: ApprovalMode,
    model_id: &str,
    allow_shell: bool,
 ) -> String {
-    let tool_taxonomy = render_core_tool_taxonomy_block(mode);
    let shell_tools_available = allow_shell && mode != AppMode::Plan;
+    let default_layers =
+        compose_default_static_layers(personality, model_id, shell_tools_available);
+    apply_static_prompt_composer(
+        effective_static_prompt_composer(),
+        personality,
+        model_id,
+        shell_tools_available,
+        &default_layers,
+    )
+}
+
+fn compose_default_static_layers(
+    personality: Personality,
+    model_id: &str,
+    shell_tools_available: bool,
+) -> String {
    let base_prompt = render_base_prompt_for_tool_availability(
        effective_base_prompt().trim(),
        model_id,
        shell_tools_available,
    );
-    let parts: [&str; 5] = [
-        tool_taxonomy.as_str(),
-        base_prompt.as_str(),
-        personality.prompt().trim(),
-        mode_prompt(mode).trim(),
-        approval_prompt_for_mode(mode, approval_mode).trim(),
-    ];
+    let parts: [&str; 2] = [base_prompt.as_str(), personality.prompt().trim()];

    let mut out =
        String::with_capacity(parts.iter().map(|p| p.len()).sum::<usize>() + (parts.len() - 1) * 2);
@@ -818,6 +903,24 @@ fn compose_prompt_with_approval_model_and_shell(
    out
 }

+fn apply_static_prompt_composer(
+    composer: Option<&StaticPromptComposer>,
+    personality: Personality,
+    model_id: &str,
+    shell_tools_available: bool,
+    default_layers: &str,
+) -> String {
+    match composer {
+        Some(composer) => composer(&StaticPromptCtx {
+            model_id,
+            personality,
+            shell_tools_available,
+            default_layers,
+        }),
+        None => default_layers.to_string(),
+    }
+}
+
 fn render_base_prompt_for_tool_availability(
    prompt: &str,
    model_id: &str,
@@ -883,22 +986,16 @@ fn compose_mode_prompt(mode: AppMode) -> String {
    compose_prompt(mode, Personality::Calm)
 }

-fn compose_mode_prompt_with_approval(mode: AppMode, approval_mode: ApprovalMode) -> String {
-    compose_prompt_with_approval(mode, Personality::Calm, approval_mode)
+fn compose_mode_prompt_with_approval(mode: AppMode) -> String {
+    compose_prompt_with_approval(mode, Personality::Calm)
 }

 fn compose_mode_prompt_with_approval_and_model(
    mode: AppMode,
-    approval_mode: ApprovalMode,
+    _approval_mode: ApprovalMode,
    model_id: &str,
 ) -> String {
-    compose_prompt_with_approval_model_and_shell(
-        mode,
-        Personality::Calm,
-        approval_mode,
-        model_id,
-        true,
-    )
+    compose_prompt_with_approval_model_and_shell(mode, Personality::Calm, model_id, true)
 }

 // ── Public API ────────────────────────────────────────────────────────
@@ -991,7 +1088,6 @@ pub fn system_prompt_for_mode_with_context_skills_and_session(
        skills_dir,
        instructions,
        session_context,
-        default_approval_mode_for_mode(mode),
    )
 }

@@ -1002,12 +1098,10 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
    skills_dir: Option<&Path>,
    instructions: Option<&[InstructionSource]>,
    session_context: PromptSessionContext<'_>,
-    approval_mode: ApprovalMode,
 ) -> SystemPrompt {
    let mode_prompt = compose_prompt_with_approval_model_and_shell(
        mode,
        Personality::Calm,
-        approval_mode,
        session_context.model_id,
        session_context.allow_shell,
    );
@@ -1068,13 +1162,16 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
    // skills directory (`.agents/skills`, `skills`,
    // `.opencode/skills`, `.claude/skills`, `.cursor/skills`) plus global
    // `~/.agents/skills` / `~/.deepseek/skills` so skills installed for any
-    // AI-tool convention show up in the catalogue. The legacy
-    // single-`skills_dir` path is
-    // honoured as a fallback for callers that don't supply a
-    // workspace-aware view; it falls through to the same merged
-    // registry when available.
-    let skills_block = crate::skills::render_available_skills_context_for_workspace(workspace)
-        .or_else(|| skills_dir.and_then(crate::skills::render_available_skills_context));
+    // AI-tool convention show up in the catalogue. When an explicit
+    // `skills_dir` is configured, union it with the workspace view instead of
+    // treating it as a fallback; the workspace view often returns Some and
+    // would otherwise shadow the configured directory entirely.
+    let skills_block = match skills_dir {
+        Some(dir) => {
+            crate::skills::render_available_skills_context_for_workspace_and_dir(workspace, dir)
+        }
+        None => crate::skills::render_available_skills_context_for_workspace(workspace),
+    };
    if let Some(block) = skills_block {
        full_prompt = format!("{full_prompt}\n\n{block}");
    }
@@ -1104,6 +1201,13 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
    full_prompt.push_str("\n\n");
    full_prompt.push_str(COMPACT_TEMPLATE);

+    // 5a. Runtime policy reference — all mode and approval policy descriptions
+    //     live here in the frozen prefix so the per-turn <runtime_prompt> tag
+    //     can be a minimal pointer instead of repeating the full policy text
+    //     on every API request (up to ~500 tokens saved per turn).
+    full_prompt.push_str("\n\n");
+    full_prompt.push_str(&render_runtime_policy_reference());
+
    // ── Volatile-content boundary ─────────────────────────────────────────
    // Everything below drifts mid-session and busts the prefix cache for
    // bytes that follow. All static layers (mode, project context, env,
@@ -1235,6 +1339,79 @@ mod tests {
        assert_eq!(effective_prompt_override(&cell, "fallback"), "first");
    }

+    #[test]
+    fn static_prompt_composer_storage_returns_rejected_composer() {
+        let cell = std::sync::OnceLock::new();
+        let first: Box<StaticPromptComposer> =
+            Box::new(|ctx| format!("first:{}", ctx.default_layers.len()));
+        let second: Box<StaticPromptComposer> =
+            Box::new(|ctx| format!("second:{}", ctx.default_layers.len()));
+
+        assert!(set_static_prompt_composer(&cell, first).is_ok());
+        let rejected = set_static_prompt_composer(&cell, second)
+            .expect_err("second composer should be rejected");
+        let ctx = StaticPromptCtx {
+            model_id: "deepseek-v4-pro",
+            personality: Personality::Calm,
+            shell_tools_available: true,
+            default_layers: "fallback",
+        };
+
+        assert_eq!(rejected(&ctx), "second:8");
+        assert_eq!(
+            cell.get().expect("first composer retained")(&ctx),
+            "first:8"
+        );
+    }
+
+    #[test]
+    fn static_prompt_composer_unset_keeps_default_layers_byte_identical() {
+        for personality in [Personality::Calm, Personality::Playful] {
+            for shell_tools_available in [true, false] {
+                let default_layers = compose_default_static_layers(
+                    personality,
+                    "deepseek-v4-flash",
+                    shell_tools_available,
+                );
+                let composed = apply_static_prompt_composer(
+                    None,
+                    personality,
+                    "deepseek-v4-flash",
+                    shell_tools_available,
+                    &default_layers,
+                );
+
+                assert_byte_identical("unset static prompt composer", &default_layers, &composed);
+            }
+        }
+    }
+
+    #[test]
+    fn static_prompt_composer_receives_context_and_replaces_layers() {
+        let default_layers =
+            compose_default_static_layers(Personality::Calm, "deepseek-v4-pro", false);
+        let composer: Box<StaticPromptComposer> = Box::new(|ctx| {
+            assert_eq!(ctx.model_id, "deepseek-v4-pro");
+            assert_eq!(ctx.personality, Personality::Calm);
+            assert!(!ctx.shell_tools_available);
+            assert!(ctx.default_layers.contains("You are deepseek-v4-pro"));
+            assert!(ctx.default_layers.contains("Personality: Calm"));
+            assert!(!ctx.default_layers.contains("## Core Tool Taxonomy"));
+            assert!(!ctx.default_layers.contains("Approval Policy"));
+            "embedder static prompt".to_string()
+        });
+
+        let composed = apply_static_prompt_composer(
+            Some(composer.as_ref()),
+            Personality::Calm,
+            "deepseek-v4-pro",
+            false,
+            &default_layers,
+        );
+
+        assert_eq!(composed, "embedder static prompt");
+    }
+
    fn contains_cjk(text: &str) -> bool {
        text.chars().any(|ch| {
            matches!(
@@ -1332,7 +1509,6 @@ mod tests {
        let prompt = compose_prompt_with_approval_and_model(
            AppMode::Agent,
            Personality::Calm,
-            ApprovalMode::Suggest,
            "deepseek-v4-flash",
        );
        assert!(
@@ -1350,7 +1526,6 @@ mod tests {
        let prompt = compose_prompt_with_approval_model_and_shell(
            AppMode::Agent,
            Personality::Calm,
-            ApprovalMode::Suggest,
            "deepseek-v4-pro",
            true,
        );
@@ -1366,7 +1541,6 @@ mod tests {
        let prompt = compose_prompt_with_approval_model_and_shell(
            AppMode::Agent,
            Personality::Calm,
-            ApprovalMode::Suggest,
            "deepseek-v4-pro",
            false,
        );
@@ -1400,47 +1574,39 @@ mod tests {
    }

    #[test]
-    fn composed_prompt_starts_with_core_tool_taxonomy() {
+    fn composed_prompt_no_longer_inlines_tool_taxonomy() {
        let prompt = compose_prompt_with_approval_and_model(
            AppMode::Agent,
            Personality::Calm,
-            ApprovalMode::Suggest,
            "deepseek-v4-pro",
        );
-        let expected_taxonomy = render_core_tool_taxonomy_block(AppMode::Agent);
-
-        assert!(
-            prompt.starts_with(&expected_taxonomy),
-            "composed prompt should start with the compact generated tool taxonomy"
-        );
+        // The core tool taxonomy (grep_files / git_status / run_tests hints)
+        // is no longer prepended as a standalone "## Core Tool Taxonomy" block.
+        // It now lives inside the "## Runtime Policy Reference" section of the
+        // system prompt, scoped under each mode sub-heading.
+        // (The "## Toolbox" section from the Constitutional preamble remains.)
+        assert!(!prompt.contains("## Core Tool Taxonomy"));
+        assert!(prompt.contains("You are deepseek-v4-pro"));
    }

    #[test]
    fn plan_prompt_taxonomy_omits_run_tests() {
-        let prompt = compose_prompt_with_approval_and_model(
-            AppMode::Plan,
-            Personality::Calm,
-            ApprovalMode::Never,
-            "deepseek-v4-pro",
-        );
-        let expected_taxonomy = render_core_tool_taxonomy_block(AppMode::Plan);
-
+        let taxonomy = render_core_tool_taxonomy_body(AppMode::Plan);
+        // The Plan taxonomy must omit execution tools; assert on the rendered body directly.
        assert!(
-            prompt.starts_with(&expected_taxonomy),
-            "Plan prompt should start with its mode-specific tool taxonomy"
-        );
-        assert!(
-            expected_taxonomy.contains("for discovery")
-                && expected_taxonomy.contains("for git inspection"),
+            taxonomy.contains("for discovery") && taxonomy.contains("for git inspection"),
            "Plan taxonomy should keep read-only discovery and git guidance"
        );
        assert!(
-            !expected_taxonomy.contains("run_tests")
-                && !expected_taxonomy.contains("run_verifiers")
-                && !expected_taxonomy.contains("for verification")
-                && !expected_taxonomy.contains("Use  "),
-            "Plan taxonomy must not advertise unavailable verification tools: {expected_taxonomy:?}"
+            !taxonomy.contains("run_tests")
+                && !taxonomy.contains("run_verifiers")
+                && !taxonomy.contains("exec_shell"),
+            "Plan taxonomy must not mention run_tests, run_verifiers, or exec_shell"
        );
+        // The taxonomy block is rendered correctly but no longer inlined
+        // into the base system prompt — it lives inside the
+        // "## Runtime Policy Reference" section of the system prompt,
+        // scoped under each mode sub-heading.
    }

    #[test]
@@ -1468,7 +1634,6 @@ mod tests {
            None,
            None,
            PromptSessionContext::default(),
-            ApprovalMode::Suggest,
        ) {
            SystemPrompt::Text(text) => text,
            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
@@ -1483,6 +1648,135 @@ mod tests {
        );
    }

+    #[test]
+    fn runtime_policy_reference_is_included_in_full_prompt() {
+        let tmp = tempdir().expect("tempdir");
+        let text = match system_prompt_for_mode_with_context_skills_session_and_approval(
+            AppMode::Agent,
+            tmp.path(),
+            None,
+            None,
+            None,
+            PromptSessionContext::default(),
+        ) {
+            SystemPrompt::Text(text) => text,
+            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
+        };
+
+        assert!(
+            text.contains("## Runtime Policy Reference"),
+            "full system prompt must contain the Runtime Policy Reference lookup table"
+        );
+        assert!(
+            text.contains(
+                "<runtime_prompt visibility=\"internal\" mode=\"<mode>\" approval=\"<approval>\"/>"
+            ),
+            "Runtime Policy Reference must explain the per-turn tag format"
+        );
+        assert!(
+            text.contains("### Modes"),
+            "Runtime Policy Reference must contain the Modes section"
+        );
+        assert!(
+            text.contains("#### agent"),
+            "Runtime Policy Reference must document Agent mode"
+        );
+        assert!(
+            text.contains("#### plan"),
+            "Runtime Policy Reference must document Plan mode"
+        );
+        assert!(
+            text.contains("#### yolo"),
+            "Runtime Policy Reference must document YOLO mode"
+        );
+        assert!(
+            text.contains("### Approval Policies"),
+            "Runtime Policy Reference must contain the Approval Policies section"
+        );
+        assert!(
+            text.contains("#### auto"),
+            "Runtime Policy Reference must document auto approval"
+        );
+        assert!(
+            text.contains("#### suggest"),
+            "Runtime Policy Reference must document suggest approval"
+        );
+        assert!(
+            text.contains("#### never"),
+            "Runtime Policy Reference must document never approval"
+        );
+    }
+
+    #[test]
+    fn system_prompt_merges_workspace_and_configured_skills_dir() {
+        let _env_guard = crate::test_support::lock_test_env();
+        let tmp = tempdir().expect("tempdir");
+        let _home = ScopedHome::set(tmp.path().join("home"));
+        let workspace = tmp.path().join("workspace");
+        let configured_dir = tmp.path().join("configured-skills");
+        write_test_skill(
+            &workspace.join(".claude").join("skills"),
+            "workspace-skill",
+            "workspace skill",
+        );
+        write_test_skill(&configured_dir, "configured-skill", "configured skill");
+
+        let text = match system_prompt_for_mode_with_context_and_skills(
+            AppMode::Plan,
+            &workspace,
+            None,
+            Some(&configured_dir),
+            None,
+            None,
+        ) {
+            SystemPrompt::Text(text) => text,
+            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
+        };
+
+        assert!(text.contains("workspace-skill"));
+        assert!(text.contains("configured-skill"));
+    }
+
+    struct ScopedHome {
+        previous: Option<std::ffi::OsString>,
+    }
+
+    impl ScopedHome {
+        fn set(path: std::path::PathBuf) -> Self {
+            let previous = std::env::var_os("HOME");
+            // Safety: this test serializes environment access with
+            // lock_test_env and restores HOME in Drop.
+            unsafe {
+                std::env::set_var("HOME", path);
+            }
+            Self { previous }
+        }
+    }
+
+    impl Drop for ScopedHome {
+        fn drop(&mut self) {
+            // Safety: the owning test holds lock_test_env while this guard
+            // is alive, so restoring HOME here cannot race with other tests.
+            unsafe {
+                if let Some(previous) = self.previous.take() {
+                    std::env::set_var("HOME", previous);
+                } else {
+                    std::env::remove_var("HOME");
+                }
+            }
+        }
+    }
+
+    fn write_test_skill(root: &std::path::Path, name: &str, description: &str) {
+        let dir = root.join(name);
+        std::fs::create_dir_all(&dir).expect("skill dir");
+        std::fs::write(
+            dir.join("SKILL.md"),
+            format!("---\nname: {name}\ndescription: {description}\n---\n\n# {name}\n"),
+        )
+        .expect("skill file");
+    }
+
    #[test]
    fn calm_personality_declares_tier_8_subordination() {
        assert!(
@@ -1604,7 +1898,6 @@ mod tests {
                show_thinking: true,
                allow_shell: true,
            },
-            ApprovalMode::Suggest,
        ) {
            SystemPrompt::Text(text) => text,
            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
@@ -1676,7 +1969,6 @@ mod tests {
                show_thinking: true,
                allow_shell: true,
            },
-            ApprovalMode::Suggest,
        ) {
            SystemPrompt::Text(text) => text,
            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
@@ -1721,7 +2013,6 @@ mod tests {
                show_thinking: false,
                allow_shell: true,
            },
-            ApprovalMode::Suggest,
        ) {
            SystemPrompt::Text(text) => text,
            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
@@ -1776,7 +2067,6 @@ mod tests {
                show_thinking: true,
                allow_shell: true,
            },
-            ApprovalMode::Suggest,
        ) {
            SystemPrompt::Text(text) => text,
            SystemPrompt::Blocks(_) => panic!("expected text system prompt"),
@@ -1811,7 +2101,7 @@ mod tests {
            "base prompt must not contain static CJK priming tokens"
        );
        for mode in [AppMode::Agent, AppMode::Plan, AppMode::Yolo] {
-            let taxonomy = render_core_tool_taxonomy_block(mode);
+            let taxonomy = render_core_tool_taxonomy_body(mode);
            assert!(
                !contains_cjk(&taxonomy),
                "tool taxonomy must not contain static CJK priming tokens: {taxonomy:?}"
@@ -2102,10 +2392,10 @@ mod tests {
        assert!(prompt.contains("You are codewhale"));
        // Personality layer
        assert!(prompt.contains("Personality: Calm"));
-        // Mode layer
-        assert!(prompt.contains("Mode: Agent"));
-        // Approval layer
-        assert!(prompt.contains("Approval Policy: Suggest"));
+        // Mode and approval are no longer inlined — they travel as
+        // request-time runtime metadata.
+        assert!(!prompt.contains("Mode: Agent"));
+        assert!(!prompt.contains("Approval Policy:"));
    }

    /// Gate against shipping a release with a missing CHANGELOG entry — which
@@ -2160,32 +2450,37 @@ mod tests {
        let prompt = compose_prompt(AppMode::Yolo, Personality::Calm);
        let base_pos = prompt.find("You are codewhale").unwrap();
        let personality_pos = prompt.find("Personality: Calm").unwrap();
-        let mode_pos = prompt.find("Mode: YOLO").unwrap();
-        let approval_pos = prompt.find("Approval Policy: Auto").unwrap();

        assert!(base_pos < personality_pos);
-        assert!(personality_pos < mode_pos);
-        assert!(mode_pos < approval_pos);
+        // Mode and approval text are no longer inlined — they travel as
+        // request-time runtime metadata.
    }

    #[test]
-    fn each_mode_gets_correct_approval() {
-        assert!(
-            compose_prompt(AppMode::Agent, Personality::Calm).contains("Approval Policy: Suggest")
-        );
-        assert!(compose_prompt(AppMode::Yolo, Personality::Calm).contains("Approval Policy: Auto"));
-        assert!(
-            compose_prompt(AppMode::Plan, Personality::Calm).contains("Approval Policy: Never")
-        );
+    fn base_prompt_is_mode_agnostic() {
+        // Mode and approval text are no longer inlined into compose_prompt —
+        // they travel as request-time runtime metadata.
+        let agent_prompt = compose_prompt(AppMode::Agent, Personality::Calm);
+        let yolo_prompt = compose_prompt(AppMode::Yolo, Personality::Calm);
+        let plan_prompt = compose_prompt(AppMode::Plan, Personality::Calm);
+        assert!(!agent_prompt.contains("Mode: Agent"));
+        assert!(!yolo_prompt.contains("Mode: YOLO"));
+        assert!(!plan_prompt.contains("Mode: Plan"));
+        assert!(!agent_prompt.contains("Approval Policy:"));
+        assert!(!yolo_prompt.contains("Approval Policy:"));
+        assert!(!plan_prompt.contains("Approval Policy:"));
+        // Base prompt still contains Constitutional preamble and personality
+        assert!(agent_prompt.contains("You are codewhale"));
+        assert!(agent_prompt.contains("Personality: Calm"));
    }

    #[test]
-    fn agent_prompt_can_reflect_never_approval_policy() {
-        let prompt =
-            compose_prompt_with_approval(AppMode::Agent, Personality::Calm, ApprovalMode::Never);
-        assert!(prompt.contains("Mode: Agent"));
-        assert!(prompt.contains("Approval Policy: Never"));
-        assert!(prompt.contains("/config approval_mode suggest"));
+    fn approval_policy_no_longer_inlined_in_base_prompt() {
+        let prompt = compose_prompt_with_approval(AppMode::Agent, Personality::Calm);
+        assert!(!prompt.contains("Mode: Agent"));
+        assert!(!prompt.contains("Approval Policy:"));
+        // Constitutional preamble is still present
+        assert!(prompt.contains("You are codewhale"));
    }

    #[test]
@@ -2493,7 +2788,7 @@ mod tests {
    // in the cached prefix must produce identical bytes given identical
    // inputs across calls.

-    use crate::test_support::assert_byte_identical;
+    use crate::test_support::{EnvVarGuard, assert_byte_identical};

    #[test]
    fn compose_prompt_is_byte_stable_across_calls() {
@@ -2519,8 +2814,13 @@ mod tests {
        // identical bytes. This pins the most representative production
        // surface (engine.rs builds the system prompt via this fn or
        // its sibling _and_skills variant on every turn).
-        let tmp = tempdir().expect("tempdir");
-        let workspace = tmp.path();
+        let _env_guard = crate::test_support::lock_test_env();
+        let workspace_tmp = tempdir().expect("workspace tempdir");
+        let home_tmp = tempdir().expect("home tempdir");
+        let _home = EnvVarGuard::set("HOME", home_tmp.path().as_os_str());
+        let _userprofile = EnvVarGuard::set("USERPROFILE", home_tmp.path().as_os_str());
+        let _skills_dir = EnvVarGuard::remove("DEEPSEEK_SKILLS_DIR");
+        let workspace = workspace_tmp.path();

        for mode in [AppMode::Agent, AppMode::Yolo, AppMode::Plan] {
            let a = match system_prompt_for_mode_with_context(mode, workspace, None) {
@@ -2544,7 +2844,12 @@ mod tests {
        // Working-set metadata is now injected into the latest user message
        // per turn. The legacy argument remains for call-site compatibility
        // but must not reintroduce volatile bytes into the system prompt.
+        let _env_guard = crate::test_support::lock_test_env();
        let tmp = tempdir().expect("tempdir");
+        let home_tmp = tempdir().expect("home tempdir");
+        let _home = EnvVarGuard::set("HOME", home_tmp.path().as_os_str());
+        let _userprofile = EnvVarGuard::set("USERPROFILE", home_tmp.path().as_os_str());
+        let _skills_dir = EnvVarGuard::remove("DEEPSEEK_SKILLS_DIR");
        let workspace = tmp.path();
        let summary = "## Repo Working Set\nWorkspace: /tmp/x\n";

@@ -2575,7 +2880,12 @@ mod tests {
        // rendered prompt must produce identical bytes. The relay block
        // lands below the static boundary in
        // `system_prompt_for_mode_with_context_and_skills`.
+        let _env_guard = crate::test_support::lock_test_env();
        let tmp = tempdir().expect("tempdir");
+        let home_tmp = tempdir().expect("home tempdir");
+        let _home = EnvVarGuard::set("HOME", home_tmp.path().as_os_str());
+        let _userprofile = EnvVarGuard::set("USERPROFILE", home_tmp.path().as_os_str());
+        let _skills_dir = EnvVarGuard::remove("DEEPSEEK_SKILLS_DIR");
        let workspace = tmp.path();
        let handoff_dir = workspace.join(".deepseek");
        std::fs::create_dir_all(&handoff_dir).unwrap();
@@ -1,4 +1,4 @@
-## Approval Policy: Auto — Tier 2 (Statute)
+##### Approval Policy: Auto — Tier 2 (Statute)

 All tool calls are pre-approved. You will not see approval prompts — your actions execute immediately.

@@ -1,4 +1,4 @@
-## Approval Policy: Never — Tier 2 (Statute)
+##### Approval Policy: Never — Tier 2 (Statute)

 All write operations are blocked. You can read, search, and investigate, but you cannot modify the workspace.

@@ -1,4 +1,4 @@
-## Approval Policy: Suggest — Tier 2 (Statute)
+##### Approval Policy: Suggest — Tier 2 (Statute)

 Read-only operations run silently. Write operations (file edits, patches, shell execution, sub-agent spawns, CSV batches) require user approval before executing.

@@ -242,7 +242,7 @@ When context is deep (past a soft seam): cache reasoning conclusions in concise

 ## Toolbox (fast reference — tool descriptions are authoritative)

- **Planning / tracking**: `checklist_write` (primary Work progress under the active task/thread), `checklist_add` / `checklist_update` / `checklist_list`, `update_plan` (optional high-level strategy metadata for complex initiatives), `task_create` / `task_list` / `task_read` / `task_cancel` (durable work objects), `todo_*` aliases (legacy compatibility), `note` (persistent memory).
+- **Planning / tracking**: `checklist_write` (primary Work progress under the active task/thread), `checklist_add` / `checklist_update` / `checklist_list`, `update_plan` (optional high-level strategy metadata for complex initiatives), `task_create` / `task_list` / `task_read` / `task_cancel` (durable work objects), `note` (persistent memory).
 - **File I/O**: `read_file` (PDFs auto-extracted), `list_dir`, `write_file`, `edit_file`, `apply_patch`, `retrieve_tool_result` for prior spilled large tool outputs.
 - **Shell**: `task_shell_start` + `task_shell_wait` for long-running commands, diagnostics, tests, searches, and servers; `exec_shell` for bounded cancellable foreground commands; `exec_shell_wait`, `exec_shell_interact`. If foreground `exec_shell` times out, the process was killed; rerun long work with `task_shell_start` or `exec_shell` using `background: true`, then poll/wait.
 - **Task evidence**: `task_gate_run` for verification gates; `pr_attempt_record` / `pr_attempt_list` / `pr_attempt_read` / `pr_attempt_preflight`; for GitHub issue/PR/release triage, prefer the native `gh ... --json` CLI through shell because it is authenticated, structured, and reproducible; `github_issue_context` / `github_pr_context` are read-only fallbacks when the CLI route is unavailable; `github_comment` / `github_close_issue` require approval + evidence; `automation_*` scheduling tools.
@@ -37,7 +37,7 @@ Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`

 ## Toolbox (fast reference — tool descriptions are authoritative)

- **Planning / tracking**: `checklist_write` (primary Work progress under the active task/thread), `checklist_add` / `checklist_update` / `checklist_list`, `update_plan` (optional high-level strategy metadata for complex initiatives), `task_create` / `task_list` / `task_read` / `task_cancel` (durable work objects), `todo_*` aliases (legacy compatibility), `note` (persistent memory).
+- **Planning / tracking**: `checklist_write` (primary Work progress under the active task/thread), `checklist_add` / `checklist_update` / `checklist_list`, `update_plan` (optional high-level strategy metadata for complex initiatives), `task_create` / `task_list` / `task_read` / `task_cancel` (durable work objects), `note` (persistent memory).
 - **File I/O**: `read_file` (PDFs auto-extracted), `list_dir`, `write_file`, `edit_file`, `apply_patch`, `retrieve_tool_result` for prior spilled large tool outputs.
 - **Shell**: `task_shell_start` + `task_shell_wait` for long-running commands, diagnostics, tests, searches, and servers; `exec_shell` for bounded cancellable foreground commands; `exec_shell_wait`, `exec_shell_interact`.
 - **Task evidence**: `task_gate_run` for verification gates; `pr_attempt_record` / `pr_attempt_list` / `pr_attempt_read` / `pr_attempt_preflight`; for GitHub issue/PR/release triage, prefer the native `gh ... --json` CLI through shell because it is authenticated, structured, and reproducible; `github_issue_context` / `github_pr_context` are read-only fallbacks when the CLI route is unavailable; `github_comment` / `github_close_issue` require approval + evidence; `automation_*` scheduling tools.
@@ -1,4 +1,4 @@
-## Mode: Agent
+##### Mode: Agent

 You are running in Agent mode — autonomous task execution with tool access.

@@ -12,7 +12,7 @@ For simple writes, state the direct edit and proceed through the normal approval

 For multi-step initiatives, keep `checklist_write` current. Add `update_plan` only for genuinely useful strategy.

-## Efficient Approvals
+###### Efficient Approvals

 When your plan includes multiple writes, present them together:
 1. Show `checklist_write` with all write steps listed so the user sees the full scope
@@ -21,7 +21,7 @@ When your plan includes multiple writes, present them together:

 Don't sequence approvals one at a time — the user wants context, not interruption. A clear plan with visible checklist items gets approved faster than a series of surprise approval prompts.

-## Session Longevity
+###### Session Longevity

 Long sessions accumulate context. To stay fast:
 - Open sub-agent sessions for independent work instead of doing everything sequentially
@@ -1,10 +1,14 @@
-## Mode: Plan
+##### Mode: Plan

 You are running in Plan mode — design before implementing.

 Investigate first, act later. Use `checklist_write` for visible, granular progress on multi-step
 investigations. When you are ready to present the implementation plan, call `update_plan` with
 the final plan; that is the handoff signal that lets the UI show the accept / revise / exit prompt.
+For non-trivial work, make the plan artifact grounded: include the objective, a short context
+summary, sources used, critical files, constraints, recommended approach, verification plan,
+risks or unknowns, and any concise handoff packet another agent would need. Do not include
+secrets in sources, file lists, or handoff text.
 All writes and patches are blocked — you can read the world but you
 can't change it. Shell and code execution are unavailable.
