Three changes to fix provider capability reporting:
1. Delete the Openai/Atlascloud/Moonshot early-return arm in
provider_capability() so these providers use the generic model-based
path. Moonshot models now correctly report 262,144 context window
and thinking_supported: true (via models.rs lookups).
2. Delete the Ollama hardcoded arm so Ollama also uses model-based
lookups. The generic fallback now uses 8192 for Ollama (conservative
for small local models) instead of the 128K default.
3. Ollama fallback: when context_window_for_model returns None and
the provider is Ollama, default to 8192 instead of
LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS (128K).
- prompt_persist tests wrote through the developer's real
~/.codewhale/prompt_cache and raced each other (the eviction test could
delete another test's entry). cache_dir() now honors
CODEWHALE_PROMPT_CACHE_DIR and the tests run serialized against private
tempdirs.
- The raw TCP mock servers in mcp tests answered one request per socket
but never advertised 'Connection: close', so reqwest pooled dead
keep-alive connections and retried POSTs failed under parallel-suite
load with 'connection closed before message completed' (~50% failure
rate on the full suite).
The workflow hardcoded install boilerplate plus a contributor list that
had already drifted (v0.8.56's release thanked people 'for shaping
v0.9.0'). The body now comes from scripts/release/generate-release-body.sh:
static install/verify sections plus the tagged version's changelog section,
which already carries the per-release credits.
- scripts/release/prepare-release.sh bumps workspace + crate pins + npm
wrapper + README install tags, refreshes Cargo.lock, regenerates the
TUI changelog slice and web facts, then runs check-versions.sh
- check-versions.sh now also gates web/lib/facts.generated.ts and the
README install-tag examples (both drifted silently before)
- .cnb.yml validates the pushed tag against Cargo.toml before generating
mirror release notes
- RELEASE_CHECKLIST/RUNBOOK updated accordingly (v0.8.56 needed 9 fix
commits for exactly these sync points)
When the host sleeps while a model response is streaming, the connection
dies on wake with 'Stream read error: error decoding response body' and
the turn was lost. The engine now stamps every stream chunk with both
monotonic and wall-clock time; Instant pauses across a suspend while
SystemTime does not, so a >10s divergence on a stream error identifies a
sleep/wake cycle. In that case the partial output is discarded and the
identical request is re-issued (sharing the existing MAX_STREAM_RETRIES=3
budget) instead of failing the turn. Ordinary network flakes keep the
deliberate no-retry-after-content policy from #103.
- system prompt environment key deepseek_version -> codewhale_version
- drop legacy .deepseek/instructions.md from the Local Law prompt tier
(the engine still reads it for back-compat)
- instructions-file truncation marker now states how many bytes were
omitted so the model knows what it is missing
- CODEWHALE_CHANGELOG const + user-facing /change strings
- codewhale metrics doc headers
The packaged changelog is now a recent-releases slice produced by
scripts/sync-changelog.sh (which gains a --check mode); also restore the
SECURITY.md contact line the version gate guards, and finish the stale
binary-name sweep (--bin codewhale examples, qa harness doc).
The /change command embeds this file into every binary via include_str!;
it now carries only recent releases (regenerated by scripts/sync-changelog.sh,
wired into the release checklist). The explicit-version test derives its
fixture versions from the embedded slice instead of hardcoding old ones.
Loads resolved hotbar slot bindings into app state and renders the Hotbar panel at the bottom of the sidebar (render layer only; key dispatch is a follow-up). Part of #2061, ref #2065.
Prompt, exec_shell description, and background schema now direct work expected to take >5 seconds (builds, test suites, servers, polling, sleeps) to task_shell_start/background=true. Addresses #2939.
allow_shell now rides the per-turn <runtime_prompt> tag alongside mode/approval; message[0] stays byte-stable across shell toggles and mode switches, preserving the DeepSeek prefix cache. Removes the shell-guidance line-filtering machinery; adds a static Shell Policy reference section.
When the CLI dispatcher forwards --api-key with DEEPSEEK_API_KEY_SOURCE=cli, that explicit override now wins over the saved root key for DeepSeek providers.
Event-driven task panel refresh on exec_shell_cancel/exec_shell_wait/task_cancel plus an immediate refresh after ShellJob actions, so the Tasks sidebar reflects cancellations without waiting for the periodic refresh. Addresses #2937.
The <runtime_prompt> tag includes a visibility="internal" attribute that
was listed in the tag format but never explained. Models sometimes
interpreted this as an instruction to announce or restate the current
mode to the user, leading to repetitive YOLO-mode confirmations before
every tool call (#2922).
Add a one-sentence explanation: the attribute marks this tag as a
runtime instruction for the model (not user input), and the model should
apply the referenced rules silently without announcing the mode.
Closes#2922
Move allow_shell from message[0] (static system prompt) to the per-turn
<runtime_prompt> tag alongside mode and approval. This preserves the
DeepSeek prefix cache across shell-access toggles and mode switches,
which previously forced YOLO entry/exit to rebuild the entire prompt.
Changes:
- Delete remove_shell_tool_guidance and 3 other dead functions (~60 lines)
- Remove allow_shell field from PromptSessionContext and StaticPromptCtx
- Remove shell_tools_available dead parameter from compose functions
- Add Shell Policy section to Runtime Policy Reference (static text)
- Extend <runtime_prompt> tag with allow_shell="true|false" attribute
- Update tests: omits→always_keeps, 83/83 prompts tests pass
- Drop dead compose_mode_prompt_with_approval_and_model
Net: message[0] bytes are now stable regardless of shell-access state.
Mode/approval/shell flags all use the same per-turn tag pattern.
Render configured/default hotbar slots at the bottom of the sidebar.
Load resolved hotbar bindings into app state, display them as compact sidebar rows, highlight active slots, and preserve unknown
actions visibly. Keep narrow sidebar rows within the available width so slots do not wrap or disappear.
Add focused sidebar hotbar render and layout tests.
Background shell task cancellation was unreliable because the Tasks
sidebar panel was not refreshed immediately after cancel actions.
Root cause:
- ShellJobAction::Cancel/CancelAll killed the process in ShellManager
but did not trigger a task_panel refresh, leaving stale "running"
entries until the next 2.5 s periodic poll.
- The tool-name refresh list at line 1734 missed exec_shell_cancel,
exec_shell_wait, and task_cancel.
Fix:
- Add refresh_active_task_panel() call after ShellJobAction dispatch.
- Add exec_shell_cancel, exec_shell_wait, task_cancel to the
immediate-refresh tool name list.
Tests:
- shell_manager_cancel_transitions_task_to_not_running
- task_panel_entry_roundtrips_status