Commit Graph

2456 Commits

Author SHA1 Message Date
hongchen1993 c84292235f feat(config): prefer dispatcher-provided API key over saved DeepSeek key when source is cli
When the CLI dispatcher launches the interactive TUI with an explicit
`--api-key` argument (e.g. for a DeepSeek-compatible subscription
endpoint), the environment variable `DEEPSEEK_API_KEY` carries the
intended key with `DEEPSEEK_API_KEY_SOURCE=cli`. Previously the
saved root `api_key` in config.toml always won over this env override
for the DeepSeek provider, blocking users from running:

    codewhale --provider deepseek \
      --api-key ark-... --base-url https://... --model auto

This change gives the dispatcher-supplied env key priority when the
source marker is `cli` and keeps full backward compatibility for the
normal config-file and keyring paths. It also cleans up a `***`
literal in an unrelated test.
2026-06-09 11:48:59 +08:00
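The precedence rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `resolve_deepseek_api_key` and its parameters are hypothetical, and the non-`cli` fallback order (saved key first, then env key) is an assumption based on the "previously the saved key always won" wording.

```rust
/// Hypothetical helper: chooses which DeepSeek API key to use.
///
/// - `env_key` stands in for the value of `DEEPSEEK_API_KEY`
/// - `env_source` stands in for the value of `DEEPSEEK_API_KEY_SOURCE`
/// - `saved_key` stands in for the root `api_key` in config.toml
pub fn resolve_deepseek_api_key(
    env_key: Option<&str>,
    env_source: Option<&str>,
    saved_key: Option<&str>,
) -> Option<String> {
    // Treat empty strings as "not set" (an assumption of this sketch).
    let env_key = env_key.filter(|k| !k.is_empty());
    let saved_key = saved_key.filter(|k| !k.is_empty());

    let chosen = if env_source == Some("cli") {
        // Dispatcher-supplied key wins when the source marker is `cli`.
        env_key.or(saved_key)
    } else {
        // Otherwise keep the earlier behavior: the saved key wins.
        saved_key.or(env_key)
    };
    chosen.map(str::to_owned)
}
```

With this shape, `resolve_deepseek_api_key(Some("ark-cli"), Some("cli"), Some("saved"))` yields the CLI key, while the same call without the `cli` marker yields the saved key.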
Hunter B 9463266cb1 ci(web): make Cloudflare deploy manual 2026-06-08 08:33:26 -07:00
Hunter B 0854425dc6 ci(web): deploy public site from main 2026-06-08 08:30:00 -07:00
Hunter B 7344b88eac fix(web): sync frontend lockfile for CI 2026-06-08 08:28:10 -07:00
Hunter B 3d503a0a24 docs: bring public surface and npm-deferred install copy 2026-06-08 08:01:18 -07:00
Hunter B c4ff9e5345 fix(release): allow asset publication despite docker failure 2026-06-08 07:47:48 -07:00
Hunter B 533b0f5766 fix(release): regenerate Cargo.lock for 0.8.54 workspace versions 2026-06-08 07:00:50 -07:00
Hunter B 78ae354fa4 chore(release): merge v0.9.0-stewardship into v0.8.54
Includes Paulo's command parity and Gherkin E2E harnesses,
HUQIANTAO's concurrency/security fixes, LeoAlex0's runtime_prompt
slim, reidliu41's hotbar persistence, HarmonyOS scaffolding,
Whaleflow foundation crate, and all v0.9.0 stabilization work.
2026-06-08 06:54:09 -07:00
Hunter B edd28066e1 chore(release): v0.8.54 — benchmark harness runners, MiMo routing 2026-06-08 06:47:21 -07:00
Hunter B f88528a5a3 test(subagent): de-flake touch_refreshes_stale_running_agent_heartbeat
The 1ms heartbeat timeout raced the synchronous touch()->cleanup()
gap on loaded CI runners (Windows scheduler can deschedule >1ms),
intermittently reaping the just-touched agent so cleanup() returned 1.
Widen the timeout to 50ms and the staleness sleep to 150ms to keep the
logic exercised without the timing race. Addresses CI flakiness under
the v0.9.0 stabilization gate (#2721).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 10:49:36 -07:00
greyfreedom 17dbed13c7 feat(execpolicy): wire permissions.toml ask-rules into runtime
Harvested from PR #2885 by @greyfreedom. Wires ask-rules into the
app-server and core ExecPolicyEngine (previously inert). Removes the
original PR's NeedsApproval arm that incorrectly allow-listed the
working directory as a network host.

Co-Authored-By: greyfreedom <11493871+greyfreedom@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B 4e3184eae9 fix(client): consume probe response body to return connection to pool
Harvested from PR #2884 by @ousamabenyounes. Drops the orphan
desktop tray.rs module (dead code, never wired) from that PR.

Co-Authored-By: Ben Younes <2910651+ousamabenyounes@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B e2b7d5e197 fix: harvest safe bug fixes from PR #2880
Harvests 7 safe fixes from PR #2880 by @HUQIANTAO: tool-name hex-digit
guard, token-usage u32 clamp, read-file line usize::try_from, grep
context-lines cap, UTF-8 PDF trim, run_skill dedup, and
Volcengine/SiliconflowCn reasoning_content support. Excludes the
DeepSeek stream-stop change and the unwired prompt_persist module
(deferred for separate review).

Co-Authored-By: HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B ab65495b0e Merge PR #2781 from punkcanyang: opt-in ghost-text follow-up suggestion 2026-06-07 10:21:01 -07:00
Hunter B 8d329a434c Merge PR #2869 from ousamabenyounes: list saved models from all providers in /model picker 2026-06-07 10:21:00 -07:00
Hunter B b39e00e72b Merge PR #2883 from HUQIANTAO: concurrency hardening (mutex recovery, join handles) 2026-06-07 10:21:00 -07:00
Hunter B 1a9549babd Merge PR #2881 from HUQIANTAO: log instead of swallowing errors 2026-06-07 10:21:00 -07:00
Hunter B 4caa28772b Merge PR #2882 from HUQIANTAO: security fixes in execution policy and approval mapping 2026-06-07 10:21:00 -07:00
Hunter B face4dc27a Merge PR #2877 from LeoAlex0: cache_inspect test spillover root 2026-06-07 10:21:00 -07:00
Hunter B a54d08f28d chore(fmt): rustfmt engine tests from PR #2874
Mechanical rustfmt of the runtime_prompt tests rewritten in PR #2874
(LeoAlex0). No logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 10:10:11 -07:00
Hunter B 3619962507 Merge PR #2874 from LeoAlex0: slim runtime_prompt to minimal tag 2026-06-07 10:09:21 -07:00
Hunter B a42e9115b1 Merge PR #2873 from reidliu41: hotbar slot persistence 2026-06-07 10:09:21 -07:00
Hunter B 2c56f7761e Merge PR #2887 from aboimpinto: Gherkin acceptance E2E harness 2026-06-07 10:04:12 -07:00
Hunter B b0d9c3196b Merge PR #2878 from aboimpinto: Layer 2 command parity harness 2026-06-07 10:04:08 -07:00
Ousama Ben Younes 97f6e0b2e5 fix(tui): use sort_by_key to satisfy clippy::unnecessary_sort_by 2026-06-07 15:17:13 +00:00
Paulo Aboim Pinto c25f7af219 Address acceptance harness review feedback 2026-06-07 16:29:40 +02:00
Paulo Aboim Pinto d90031f06f Add Gherkin acceptance E2E harness example 2026-06-07 16:12:12 +02:00
huqiantao bdf7b15bd7 revert: use std::thread::spawn for fire-and-forget hooks
tokio::task::spawn_blocking requires a running tokio runtime, which
breaks tests that call hook functions outside a tokio context. Since
hooks are fire-and-forget (no JoinHandle needed), std::thread::spawn
is the correct choice.
2026-06-07 19:59:17 +08:00
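As a rough illustration of the fire-and-forget pattern this revert returns to, the sketch below spawns a detached OS thread with no tokio runtime involved. The name `run_hook_detached` is hypothetical and not taken from hooks.rs.

```rust
use std::thread;

/// Hypothetical fire-and-forget runner: starts `hook` on its own OS thread
/// and returns immediately. It works whether or not a tokio runtime exists.
pub fn run_hook_detached<F>(hook: F)
where
    F: FnOnce() + Send + 'static,
{
    // Dropping the JoinHandle detaches the thread; nobody waits on the result.
    drop(thread::spawn(hook));
}
```

The trade-off noted in the earlier concurrency commit still applies: each call creates a new OS thread, so this suits infrequent hooks rather than hot paths.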
huqiantao 5cab1517a4 fix: add tracing dependency to app-server crate
Required for tracing::error! in persist_config error handling.
2026-06-07 19:56:06 +08:00
huqiantao 5e761a616c fix: collapse nested if-let to satisfy clippy::collapsible_if lint 2026-06-07 19:55:38 +08:00
huqiantao 3c197d707b fix: add sse_task field to SseTransport test initializer
The test at line 4768 was missing the new sse_task field added to
SseTransport. Add a dummy tokio::spawn task for the test.
2026-06-07 19:48:09 +08:00
huqiantao 9aa71e24c0 chore: update Cargo.lock for tracing dependency in core crate 2026-06-07 19:47:36 +08:00
huqiantao 4dd0a47c05 style: apply cargo fmt formatting 2026-06-07 19:46:24 +08:00
huqiantao 265b8ee142 fix: add tracing dependency to core crate and apply cargo fmt
- Add tracing.workspace = true to crates/core/Cargo.toml
  (required for tracing::warn! in lib.rs:752)
- Apply cargo fmt formatting to engine.rs, mcp.rs, tool_execution.rs, config/lib.rs
2026-06-07 19:46:02 +08:00
huqiantao 27ca87251e fix: use Box<dyn Write + Send> for cross-platform tracing writer
Replace platform-specific std::os::unix::io::FromRawFd with
Box<dyn std::io::Write + Send> return type. This compiles on
Windows, macOS, and Linux without unsafe code.

The closure now returns a boxed writer that is either:
- The cloned file handle (success case)
- A reopened file handle (clone failed)
- stderr (last resort, prevents panic)
2026-06-07 19:35:59 +08:00
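The three-step fallback listed in the commit above might look something like this sketch. The function name `make_log_writer` and its parameters are assumptions for illustration; only standard-library calls are used.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical writer factory: returns a boxed writer without panicking.
/// Order of preference: cloned handle, reopened file, then stderr.
pub fn make_log_writer(file: &File, path: &Path) -> Box<dyn Write + Send> {
    // 1. Success case: duplicate the existing file handle.
    if let Ok(cloned) = file.try_clone() {
        return Box::new(cloned);
    }
    // 2. Clone failed: try to reopen the log file in append mode.
    if let Ok(reopened) = OpenOptions::new().create(true).append(true).open(path) {
        return Box::new(reopened);
    }
    // 3. Last resort: write to stderr instead of panicking.
    Box::new(io::stderr())
}
```

Returning `Box<dyn Write + Send>` keeps the signature identical on every platform, since `File` and `Stderr` both implement `Write` and `Send`.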
huqiantao 75593a0eac fix: address security review comments
1. Fix whitespace bypass in normalize_command (execpolicy/lib.rs:446)
   - Collapse internal whitespace to prevent 'git  status' bypassing 'git status'
   - split_whitespace().join(' ') normalizes all whitespace

2. Fix 'never'/'deny' approval mapping (app-server/lib.rs:287)
   - Map to AskForApproval::Never instead of OnRequest
   - 'never'/'deny' should forbid commands, not prompt for approval

3. Optimize prefix matching (execpolicy/lib.rs:355, bash_arity.rs:375)
   - Avoid format! allocation on every check
   - Use byte comparison for space boundary check
2026-06-07 19:35:20 +08:00
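The whitespace normalization from item 1 above can be sketched with the standard library alone. This is an illustrative stand-in for `normalize_command`, not the code in execpolicy/lib.rs; with plain `std`, the pieces are collected into a `Vec` before joining.

```rust
/// Sketch of whitespace normalization: trims the ends and collapses every
/// run of whitespace (spaces, tabs, newlines) into a single space.
pub fn normalize_command(command: &str) -> String {
    command.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

After normalization, `"git  status"` and `"git status"` compare equal, so a rule written for one cannot be bypassed with the other.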
huqiantao eb3a989eeb fix: address review comments on engine.rs
1. Replace let-else with if-let-Some to avoid compilation error
   - let-else with return would return from the entire function
   - if-let-Some correctly assigns to tool_registry and continues

2. Preserve original goal_objective_for_prompt behavior
   - Return None (not fallback) when objective exists but goal is inactive
   - Use state.is_active().then() to match original semantics
2026-06-07 19:33:52 +08:00
huqiantao 4304c89d65 fix: concurrency bugs - mutex handling, thread spawning, and resource management
1. Fix Mutex lock().unwrap() in MCP server (mcp_server.rs:384,434)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a single panic while holding the lock would cascade to all threads

2. Fix std::thread::spawn in async code (hooks.rs:1055)
   - Replace std::thread::spawn with tokio::task::spawn_blocking
   - Respects tokio's thread pool limits instead of creating unbounded OS threads
   - Fire-and-forget hook execution now properly managed by tokio runtime

3. Fix dropped JoinHandle in SSE loop (mcp.rs:647)
   - Store the JoinHandle in SseTransport struct
   - Enables detection of SSE loop termination
   - Prevents silent connection loss without structured error reporting

4. Fix std::sync::Mutex poison handling in cost_status (cost_status.rs:28-58)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a panic while holding the lock silently lost all subsequent cost data
   - Cost tracking now survives mutex poisoning

5. Fix .expect() in tracing writer (runtime_log.rs:162)
   - Replace expect() with fallback chain: try_clone -> reopen file -> stderr
   - Prevents panicking inside tracing subscriber on fd exhaustion
   - Previously, EMFILE during logging would crash the application
2026-06-07 19:18:19 +08:00
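The poisoned-lock recovery used in items 1 and 4 above follows a common standard-library pattern. A minimal sketch, with the helper name `lock_recovering` chosen here for illustration:

```rust
use std::sync::{Mutex, MutexGuard};

/// Locks `mutex`, recovering the guard even if a previous holder panicked.
/// `PoisonError::into_inner` hands back the guard that the error wraps.
pub fn lock_recovering<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Recovery means later callers keep working after a panic, but the protected data may have been left mid-update by the panicking thread, so this fits state such as counters where a partially applied update is tolerable.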
huqiantao 27fac5d704 fix: security bugs in execpolicy, app-server, and tools
1. Fix deny rule prefix matching without word boundary (execpolicy/lib.rs:351-353)
   - Deny rule 'rm' now blocks 'rm -rf /' but NOT 'rmdir' or 'rmview'
   - Previously used bare starts_with which matched any command starting with 'rm'
   - Add word-boundary check: command must equal rule or start with rule+space

2. Fix fallback prefix match clarity (execpolicy/bash_arity.rs:362-374)
   - Improve comment to clarify word-boundary matching behavior
   - The trailing space in starts_with already provides word boundary

3. Fix hardcoded AskForApproval::OnRequest in HTTP API (app-server/lib.rs:283)
   - Read approval_policy from config instead of hardcoding OnRequest
   - Users with 'auto'/'yolo' policy now get UnlessTrusted for API calls
   - Previously ignored user's configured security posture

4. Fix fuzzy indentation search destroying preceding text (tools/file.rs:714-735)
   - When match starts mid-line after whitespace stripping, use exact position
   - Previously always expanded to line start, destroying preceding content
   - Now only expands to line start when match is at a line boundary

5. Fix potential underflow in apply_hunk start index (tools/apply_patch.rs:1110-1115)
   - Use checked_add_signed to safely handle negative cumulative_offset
   - Prevents isize overflow on adversarial patch input
   - Clamp to lines.len() instead of relying on .max(0) cast
2026-06-07 19:13:43 +08:00
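The word-boundary check from item 1 above can be sketched as below. The function name `deny_rule_matches` is hypothetical, and treating an empty rule as matching nothing is an assumption of this sketch rather than documented behavior.

```rust
/// Sketch of a word-boundary prefix match: the command must equal the rule
/// or continue with a space right after it.
pub fn deny_rule_matches(command: &str, rule: &str) -> bool {
    // Assumption: an empty rule never matches anything.
    if rule.is_empty() {
        return false;
    }
    match command.strip_prefix(rule) {
        // Exact match, or the next byte after the rule is a space.
        Some(rest) => rest.is_empty() || rest.as_bytes().first() == Some(&b' '),
        // The command does not start with the rule at all.
        None => false,
    }
}
```

Under this check a rule of `rm` matches `rm` and `rm -rf /` but not `rmdir` or `rmview`, and no `format!` allocation is needed to build a `rule + " "` string.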
huqiantao ef4dc5ca61 fix: error handling bugs - log instead of silently swallowing errors
1. Fix swallowed persist_config errors (app-server/lib.rs:882,896)
   - Log errors when config persistence fails after set/unset
   - Users previously got success response even when disk write failed

2. Fix swallowed job store load error (core/lib.rs:751)
   - Add warning log when job store fails to load at startup
   - Previously silently started with empty job list on corruption

3. Fix silent config parse failures (config/lib.rs:1590)
   - Log warning when project config TOML is malformed
   - Previously returned None indistinguishable from 'no config file'

4. Fix MCP connect_all errors swallowed (mcp.rs:2151,2189)
   - Log warnings for each server that fails to connect
   - Previously returned incomplete resource list with no indication

5. Fix error context stripped in engine status (core/engine.rs:2223)
   - Use {err:#} format to include full error chain
   - Was inconsistent with line 2234 which already used {err:#}

6. Fix tool audit log failures silently dropped (tool_execution.rs:122-136)
   - Log each failure: serialization, directory creation, file open, write
   - Previously silently dropped all errors for security audit trail

7. Fix Err(_) arms discarding error info (runtime_log.rs:179, runtime_threads.rs:828)
   - Log stderr redirect failures on Windows
   - Log poisoned mutex in pending_approvals

8. Fix env var parsing errors silently ignored (config/lib.rs:2519-2530)
   - Warn when DEEPSEEK_TELEMETRY, DEEPSEEK_YOLO, DEEPSEEK_HTTP_HEADERS
     have invalid values instead of silently treating them as unset

9. Fix MCP config reload errors swallowed (mcp.rs:2011)
   - Log config reload errors instead of failing silently

10. Fix .expect() on sub-agent runtime (core/engine.rs:1715)
    - Gracefully fall back to basic tool set when API client missing
    - Previously panicked if subagents enabled but no client configured

11. Fix .expect() on goal objective (core/engine.rs:2543)
    - Use safe if-let pattern instead of check+expect
    - Prevents panic if refactoring changes control flow
2026-06-07 19:04:47 +08:00
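Item 8 above (warn on invalid env values rather than treating them as unset) could take roughly this shape. The helper name `parse_bool_env` and the accepted spellings are assumptions; `eprintln!` stands in for the `tracing::warn!` call a real crate would likely use.

```rust
/// Hypothetical boolean env parser. `raw` is the variable's value, if set.
/// Returns None for unset or invalid input, but warns in the invalid case
/// so that a typo is not silently treated as "unset".
pub fn parse_bool_env(name: &str, raw: Option<&str>) -> Option<bool> {
    let raw = raw?;
    match raw.trim().to_ascii_lowercase().as_str() {
        // Accepted spellings are an assumption of this sketch.
        "1" | "true" | "yes" | "on" => Some(true),
        "0" | "false" | "no" | "off" => Some(false),
        other => {
            eprintln!("warning: ignoring invalid value {other:?} for {name}");
            None
        }
    }
}
```

The return value is the same for "unset" and "invalid", so callers behave as before; only the log output distinguishes the two cases.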
zLeoAlex 55d7499408 test: add runtime_policy_reference composition test, strengthen ChangeMode tests, fix outdated comments
- Add runtime_policy_reference_is_included_in_full_prompt test to verify
  that render_runtime_policy_reference() output lands in the composed
  system prompt. Guards against silent breakage if the push_str() call
  is accidentally removed (all existing tests would still pass).

- Strengthen change_mode_op_updates_current_mode_and_emits_status:
  destructure SessionUpdated to assert that session messages do NOT
  contain <runtime_prompt> tags after mode change — verifying the core
  invariant that Op::ChangeMode does not write session history.

- Extend current_mode_field_assignment_takes_effect_synchronously:
  now also verifies that messages_with_turn_metadata() produces the
  correct runtime tag (mode="yolo" approval="auto") after a mode
  switch, covering the tag-generation mechanism end-to-end.

- Fix outdated comments in composed_prompt_no_longer_inlines_tool_taxonomy
  and plan_prompt_taxonomy_omits_run_tests: replace stale references to
  deleted <mode_prompt> metadata with accurate descriptions of the
  ## Runtime Policy Reference section.
2026-06-07 18:31:36 +08:00
Paulo Aboim Pinto acaae1c2e5 test(tui): address command harness review 2026-06-07 12:24:13 +02:00
Paulo Aboim Pinto 96bff65797 test(tui): add command parity harness 2026-06-07 11:43:57 +02:00
zLeoAlex 256f34c621 fix(cache): set temp spillover root in cache_inspect test to survive nix sandbox
The test cache_inspect_displays_tool_result_budget_metadata relied on a
writable $HOME/.codewhale/tool_outputs/ for tool-result wire-dedup
persistence. Nix build sandboxes have a read-only home tree, so the
first tool-result SHA spillover write failed, the dedup hash table was
never populated, and the second identical tool result was not marked
deduplicated — causing the expect("repeat tool-result sighting should
report dedup metadata") assertion to fail.

Set TEST_SPILLOVER_ROOT to a tempdir inside the test (matching the
with_tool_result_sha_spillover_root pattern in chat.rs), so the
wire-dedup path works in any environment without depending on $HOME.
2026-06-07 16:06:38 +08:00
zLeoAlex 7b900b8699 test(cache): rename misleading test — does not exercise Op::ChangeMode dispatch
- Rename mode_change_op_updates_current_mode_and_emits_session_updated
  to current_mode_field_assignment_takes_effect_synchronously.
- The test directly mutates engine.current_mode, not through Op::ChangeMode.
  The dispatch path is separately covered by
  change_mode_op_updates_current_mode_and_emits_status.
2026-06-07 15:26:54 +08:00
zLeoAlex c6c3d2cc4d refactor(cache): inline single-call helpers, remove dead code
- Inline mode_prompt_marker_value and approval_prompt_marker_value into
  runtime_prompt_text (each called exactly once).
- Remove default_approval_mode_for_mode — zero callers.
2026-06-07 15:22:53 +08:00
zLeoAlex 039abb2ae6 refactor(cache): remove render_core_tool_taxonomy_block, inline to body variant
- Replace the 2 remaining test callers with render_core_tool_taxonomy_body
  (neither test depends on the ## heading — they check content only).
- Delete render_core_tool_taxonomy_block — zero production callers after
  the previous refactor.
2026-06-07 15:20:51 +08:00
zLeoAlex 12167b39c3 refactor(cache): replace taxonomy_body strip hack with source-level render_core_tool_taxonomy_body
- Add render_core_tool_taxonomy_body(mode) that generates the tool
  taxonomy text without the ## Core Tool Taxonomy heading.
- Refactor render_core_tool_taxonomy_block to use the body function
  internally (DRY).
- Delete taxonomy_body() — a downstream strip_prefix hack that
  worked around the source format instead of fixing it.
- Also removes the now-unnecessary debug_assert! (over-defensive,
  since the two functions are co-located in the same file).
2026-06-07 15:19:27 +08:00
zLeoAlex 0b5d574e63 fix(cache): address CR feedback — blank lines, heading hierarchy, debug_assert
- Add proper blank lines (\n\n) before mode headings in
  render_runtime_policy_reference (CommonMark/GFM compliance).
- Demote subheadings in agent.md from ##### to ###### so they
  nest correctly under the demoted main heading.
- Add debug_assert! in taxonomy_body() to loudly fail when
  render_core_tool_taxonomy_block format changes, preventing
  silent heading-hierarchy breakage.
2026-06-07 15:15:12 +08:00
zLeoAlex 427bd5d52f feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt
- Add render_runtime_policy_reference() in prompts.rs containing all
  mode and approval policy descriptions in the frozen system-prompt
  prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token
  self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and
  prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(),
  approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event
  (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on #2801. Reduces per-request runtime prompt overhead by 97%
(~471 tokens saved per API call). System prompt grows by ~1325 tokens
in the frozen prefix (one-time miss cost); break-even at 3 API calls.
2026-06-07 15:03:43 +08:00
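To make the slimmed tag and the break-even figure in the commit above concrete, here is a small sketch. The tag layout follows the commit text; the function bodies are illustrative, attribute values are not escaped (an assumption that modes and approvals are simple identifiers), and `break_even_calls` is a hypothetical helper.

```rust
/// Sketch of the minimal per-turn tag described in the commit message.
/// Assumes `mode` and `approval` need no XML escaping.
pub fn runtime_prompt_text(mode: &str, approval: &str) -> String {
    format!(r#"<runtime_prompt visibility="internal" mode="{mode}" approval="{approval}"/>"#)
}

/// Hypothetical helper: number of API calls after which the per-call saving
/// has paid back the one-time cost. Panics if `saved_per_call` is zero.
pub fn break_even_calls(one_time_cost: u32, saved_per_call: u32) -> u32 {
    one_time_cost.div_ceil(saved_per_call)
}
```

Plugging in the quoted figures, `break_even_calls(1325, 471)` rounds 1325 / 471 ≈ 2.81 up to 3, which matches the stated break-even at 3 API calls.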