Commit Graph

2441 Commits

Author SHA1 Message Date
greyfreedom 17dbed13c7 feat(execpolicy): wire permissions.toml ask-rules into runtime
Harvested from PR #2885 by @greyfreedom. Wires ask-rules into the
app-server and core ExecPolicyEngine (previously inert). Removes the
original PR's NeedsApproval arm that incorrectly allow-listed the
working directory as a network host.

Co-Authored-By: greyfreedom <11493871+greyfreedom@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B 4e3184eae9 fix(client): consume probe response body to return connection to pool
Harvested from PR #2884 by @ousamabenyounes. Drops the orphan
desktop tray.rs module (dead code, never wired) from that PR.

Co-Authored-By: Ben Younes <2910651+ousamabenyounes@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B e2b7d5e197 fix: harvest safe bug fixes from PR #2880
Harvests 7 safe fixes from PR #2880 by @HUQIANTAO: tool-name hex-digit
guard, token-usage u32 clamp, read-file line usize::try_from, grep
context-lines cap, UTF-8 PDF trim, run_skill dedup, and
Volcengine/SiliconflowCn reasoning_content support. Excludes the
DeepSeek stream-stop change and the unwired prompt_persist module
(deferred for separate review).

Co-Authored-By: HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
2026-06-07 10:49:36 -07:00
Hunter B ab65495b0e Merge PR #2781 from punkcanyang: opt-in ghost-text follow-up suggestion 2026-06-07 10:21:01 -07:00
Hunter B 8d329a434c Merge PR #2869 from ousamabenyounes: list saved models from all providers in /model picker 2026-06-07 10:21:00 -07:00
Hunter B b39e00e72b Merge PR #2883 from HUQIANTAO: concurrency hardening (mutex recovery, join handles) 2026-06-07 10:21:00 -07:00
Hunter B 1a9549babd Merge PR #2881 from HUQIANTAO: log instead of swallowing errors 2026-06-07 10:21:00 -07:00
Hunter B 4caa28772b Merge PR #2882 from HUQIANTAO: security fixes in execution policy and approval mapping 2026-06-07 10:21:00 -07:00
Hunter B face4dc27a Merge PR #2877 from LeoAlex0: cache_inspect test spillover root 2026-06-07 10:21:00 -07:00
Hunter B a54d08f28d chore(fmt): rustfmt engine tests from PR #2874
Mechanical rustfmt of the runtime_prompt tests rewritten in PR #2874
(LeoAlex0). No logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 10:10:11 -07:00
Hunter B 3619962507 Merge PR #2874 from LeoAlex0: slim runtime_prompt to minimal tag 2026-06-07 10:09:21 -07:00
Hunter B a42e9115b1 Merge PR #2873 from reidliu41: hotbar slot persistence 2026-06-07 10:09:21 -07:00
Hunter B 2c56f7761e Merge PR #2887 from aboimpinto: Gherkin acceptance E2E harness 2026-06-07 10:04:12 -07:00
Hunter B b0d9c3196b Merge PR #2878 from aboimpinto: Layer 2 command parity harness 2026-06-07 10:04:08 -07:00
Ousama Ben Younes 97f6e0b2e5 fix(tui): use sort_by_key to satisfy clippy::unnecessary_sort_by 2026-06-07 15:17:13 +00:00
Paulo Aboim Pinto c25f7af219 Address acceptance harness review feedback 2026-06-07 16:29:40 +02:00
Paulo Aboim Pinto d90031f06f Add Gherkin acceptance E2E harness example 2026-06-07 16:12:12 +02:00
huqiantao bdf7b15bd7 revert: use std::thread::spawn for fire-and-forget hooks
tokio::task::spawn_blocking requires a running tokio runtime, which
breaks tests that call hook functions outside a tokio context. Since
hooks are fire-and-forget (no JoinHandle needed), std::thread::spawn
is the correct choice.
2026-06-07 19:59:17 +08:00
huqiantao 5cab1517a4 fix: add tracing dependency to app-server crate
Required for tracing::error! in persist_config error handling.
2026-06-07 19:56:06 +08:00
huqiantao 5e761a616c fix: collapse nested if-let to satisfy clippy::collapsible_if lint 2026-06-07 19:55:38 +08:00
huqiantao 3c197d707b fix: add sse_task field to SseTransport test initializer
The test at line 4768 was missing the new sse_task field added to
SseTransport. Add a dummy tokio::spawn task for the test.
2026-06-07 19:48:09 +08:00
huqiantao 9aa71e24c0 chore: update Cargo.lock for tracing dependency in core crate 2026-06-07 19:47:36 +08:00
huqiantao 4dd0a47c05 style: apply cargo fmt formatting 2026-06-07 19:46:24 +08:00
huqiantao 265b8ee142 fix: add tracing dependency to core crate and apply cargo fmt
- Add tracing.workspace = true to crates/core/Cargo.toml
  (required for tracing::warn! in lib.rs:752)
- Apply cargo fmt formatting to engine.rs, mcp.rs, tool_execution.rs, config/lib.rs
2026-06-07 19:46:02 +08:00
huqiantao 27ca87251e fix: use Box<dyn Write + Send> for cross-platform tracing writer
Replace platform-specific std::os::unix::io::FromRawFd with
Box<dyn std::io::Write + Send> return type. This compiles on
Windows, macOS, and Linux without unsafe code.

The closure now returns a boxed writer that is either:
- The cloned file handle (success case)
- A reopened file handle (clone failed)
- stderr (last resort, prevents panic)
2026-06-07 19:35:59 +08:00
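Note on 27ca87251e: the three-step fallback listed in this commit can be sketched with the standard library alone. This is a minimal illustration, not the code from runtime_log.rs; the function name `fallback_writer` and its parameters are assumptions made for the example.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (name and signature are assumptions): returns a
/// writer for one log event without ever panicking.
/// Fallback order: cloned handle -> reopened file -> stderr.
fn fallback_writer(file: &File, path: &Path) -> Box<dyn Write + Send> {
    // Success case: duplicate the existing file handle.
    if let Ok(clone) = file.try_clone() {
        return Box::new(clone);
    }
    // Clone failed (for example on fd exhaustion): try to reopen by path.
    match OpenOptions::new().create(true).append(true).open(path) {
        Ok(reopened) => Box::new(reopened),
        // Last resort: write to stderr instead of panicking.
        Err(_) => Box::new(io::stderr()),
    }
}
```

Because every branch returns the same boxed trait object, no platform-specific raw file descriptor handling is needed, which is the portability point the commit message makes.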
huqiantao 75593a0eac fix: address security review comments
1. Fix whitespace bypass in normalize_command (execpolicy/lib.rs:446)
   - Collapse internal whitespace to prevent 'git  status' bypassing 'git status'
   - split_whitespace().join(' ') normalizes all whitespace

2. Fix 'never'/'deny' approval mapping (app-server/lib.rs:287)
   - Map to AskForApproval::Never instead of OnRequest
   - 'never'/'deny' should forbid commands, not prompt for approval

3. Optimize prefix matching (execpolicy/lib.rs:355, bash_arity.rs:375)
   - Avoid format! allocation on every check
   - Use byte comparison for space boundary check
2026-06-07 19:35:20 +08:00
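Note on 75593a0eac: items 1 and 3 of this commit describe whitespace normalization and an allocation-free boundary check. The sketch below shows one way both could look. `normalize_command` is named in the commit, but this body is only an assumed illustration, and `matches_rule` is a hypothetical name.

```rust
/// Collapse every run of whitespace to a single space and trim both ends.
/// (Illustrative body; the real function in execpolicy may differ.)
fn normalize_command(command: &str) -> String {
    command.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: true when `command` equals `rule`, or continues
/// with a space directly after it. Checks a single byte instead of
/// allocating a `format!("{rule} ")` string on every call.
fn matches_rule(command: &str, rule: &str) -> bool {
    command.starts_with(rule)
        && (command.len() == rule.len() || command.as_bytes()[rule.len()] == b' ')
}
```

With these definitions, `matches_rule(&normalize_command("git  status"), "git status")` is true, while `matches_rule("rmdir", "rm")` is false because the byte after the prefix is not a space.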
huqiantao eb3a989eeb fix: address review comments on engine.rs
1. Replace let-else with if-let-Some to avoid compilation error
   - let-else with return would return from the entire function
   - if-let-Some correctly assigns to tool_registry and continues

2. Preserve original goal_objective_for_prompt behavior
   - Return None (not fallback) when objective exists but goal is inactive
   - Use state.is_active().then() to match original semantics
2026-06-07 19:33:52 +08:00
huqiantao 4304c89d65 fix: concurrency bugs - mutex handling, thread spawning, and resource management
1. Fix Mutex lock().unwrap() in MCP server (mcp_server.rs:384,434)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a single panic while holding the lock would cascade to all threads

2. Fix std::thread::spawn in async code (hooks.rs:1055)
   - Replace std::thread::spawn with tokio::task::spawn_blocking
   - Respects tokio's thread pool limits instead of creating unbounded OS threads
   - Fire-and-forget hook execution now properly managed by tokio runtime

3. Fix dropped JoinHandle in SSE loop (mcp.rs:647)
   - Store the JoinHandle in SseTransport struct
   - Enables detection of SSE loop termination
   - Prevents silent connection loss without structured error reporting

4. Fix std::sync::Mutex poison handling in cost_status (cost_status.rs:28-58)
   - Use unwrap_or_else(|e| e.into_inner()) to recover from poisoned locks
   - Previously, a panic while holding the lock silently lost all subsequent cost data
   - Cost tracking now survives mutex poisoning

5. Fix .expect() in tracing writer (runtime_log.rs:162)
   - Replace expect() with fallback chain: try_clone -> reopen file -> stderr
   - Prevents panicking inside tracing subscriber on fd exhaustion
   - Previously, EMFILE during logging would crash the application
2026-06-07 19:18:19 +08:00
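Note on 4304c89d65: items 1 and 4 of this commit rely on recovering the guard from a poisoned `std::sync::Mutex`. A minimal sketch of that pattern follows. The `add_cost` and `poison` functions are hypothetical stand-ins, not code from mcp_server.rs or cost_status.rs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical accumulator standing in for shared state such as cost totals.
fn add_cost(total: &Mutex<u64>, amount: u64) -> u64 {
    // A poisoned lock still guards the data; take the inner guard and go on
    // instead of propagating the earlier panic to this caller.
    let mut guard = total.lock().unwrap_or_else(|e| e.into_inner());
    *guard += amount;
    *guard
}

/// Demonstration only: poisons the mutex by panicking in another thread
/// while that thread holds the lock.
fn poison(total: &Arc<Mutex<u64>>) {
    let shared = Arc::clone(total);
    let _ = thread::spawn(move || {
        let _guard = shared.lock().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();
}
```

Recovering this way is only sound when the protected value cannot be left half-updated by the panicking thread. A plain counter meets that condition, but more complex state may need to be validated after recovery.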
huqiantao 27fac5d704 fix: security bugs in execpolicy, app-server, and tools
1. Fix deny rule prefix matching without word boundary (execpolicy/lib.rs:351-353)
   - Deny rule 'rm' now blocks 'rm -rf /' but NOT 'rmdir' or 'rmview'
   - Previously used bare starts_with which matched any command starting with 'rm'
   - Add word-boundary check: command must equal rule or start with rule+space

2. Fix fallback prefix match clarity (execpolicy/bash_arity.rs:362-374)
   - Improve comment to clarify word-boundary matching behavior
   - The trailing space in starts_with already provides word boundary

3. Fix hardcoded AskForApproval::OnRequest in HTTP API (app-server/lib.rs:283)
   - Read approval_policy from config instead of hardcoding OnRequest
   - Users with 'auto'/'yolo' policy now get UnlessTrusted for API calls
   - Previously ignored user's configured security posture

4. Fix fuzzy indentation search destroying preceding text (tools/file.rs:714-735)
   - When match starts mid-line after whitespace stripping, use exact position
   - Previously always expanded to line start, destroying preceding content
   - Now only expands to line start when match is at a line boundary

5. Fix potential underflow in apply_hunk start index (tools/apply_patch.rs:1110-1115)
   - Use checked_add_signed to safely handle negative cumulative_offset
   - Prevents isize overflow on adversarial patch input
   - Clamp to lines.len() instead of relying on .max(0) cast
2026-06-07 19:13:43 +08:00
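Note on 27fac5d704: item 5 of this commit replaces a signed cast with `checked_add_signed`. The sketch below shows the general shape under stated assumptions: `hunk_start` is a hypothetical name, and returning `None` on underflow is one possible policy, since the log does not say how apply_patch.rs handles that case.

```rust
/// Hypothetical helper: shift a 0-based hunk start by a signed offset.
/// Returns None if the shift would go below zero or overflow usize;
/// otherwise clamps the result to `len` (the number of lines).
fn hunk_start(start: usize, cumulative_offset: isize, len: usize) -> Option<usize> {
    start
        .checked_add_signed(cumulative_offset)
        .map(|index| index.min(len))
}
```

For example, `hunk_start(10, -3, 100)` yields `Some(7)`, and `hunk_start(2, -5, 100)` yields `None` rather than wrapping around to a huge index.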
huqiantao ef4dc5ca61 fix: error handling bugs - log instead of silently swallowing errors
1. Fix swallowed persist_config errors (app-server/lib.rs:882,896)
   - Log errors when config persistence fails after set/unset
   - Users previously got success response even when disk write failed

2. Fix swallowed job store load error (core/lib.rs:751)
   - Add warning log when job store fails to load at startup
   - Previously silently started with empty job list on corruption

3. Fix silent config parse failures (config/lib.rs:1590)
   - Log warning when project config TOML is malformed
   - Previously returned None indistinguishable from 'no config file'

4. Fix MCP connect_all errors swallowed (mcp.rs:2151,2189)
   - Log warnings for each server that fails to connect
   - Previously returned incomplete resource list with no indication

5. Fix error context stripped in engine status (core/engine.rs:2223)
   - Use {err:#} format to include full error chain
   - Was inconsistent with line 2234 which already used {err:#}

6. Fix tool audit log failures silently dropped (tool_execution.rs:122-136)
   - Log each failure: serialization, directory creation, file open, write
   - Previously silently dropped all errors for security audit trail

7. Fix Err(_) arms discarding error info (runtime_log.rs:179, runtime_threads.rs:828)
   - Log stderr redirect failures on Windows
   - Log poisoned mutex in pending_approvals

8. Fix env var parsing errors silently ignored (config/lib.rs:2519-2530)
   - Warn when DEEPSEEK_TELEMETRY, DEEPSEEK_YOLO, DEEPSEEK_HTTP_HEADERS
     have invalid values instead of silently treating as unset

9. Fix MCP config reload errors swallowed (mcp.rs:2011)
   - Log config reload errors instead of complete silence

10. Fix .expect() on sub-agent runtime (core/engine.rs:1715)
    - Gracefully fall back to basic tool set when API client missing
    - Previously panicked if subagents enabled but no client configured

11. Fix .expect() on goal objective (core/engine.rs:2543)
    - Use safe if-let pattern instead of check+expect
    - Prevents panic if refactoring changes control flow
2026-06-07 19:04:47 +08:00
zLeoAlex 55d7499408 test: add runtime_policy_reference composition test, strengthen ChangeMode tests, fix outdated comments
- Add runtime_policy_reference_is_included_in_full_prompt test to verify
  that render_runtime_policy_reference() output lands in the composed
  system prompt. Guards against silent breakage if the push_str() call
  is accidentally removed (all existing tests would still pass).

- Strengthen change_mode_op_updates_current_mode_and_emits_status:
  destructure SessionUpdated to assert that session messages do NOT
  contain <runtime_prompt> tags after mode change — verifying the core
  invariant that Op::ChangeMode does not write session history.

- Extend current_mode_field_assignment_takes_effect_synchronously:
  now also verifies that messages_with_turn_metadata() produces the
  correct runtime tag (mode="yolo" approval="auto") after a mode
  switch, covering the tag-generation mechanism end-to-end.

- Fix outdated comments in composed_prompt_no_longer_inlines_tool_taxonomy
  and plan_prompt_taxonomy_omits_run_tests: replace stale references to
  deleted <mode_prompt> metadata with accurate descriptions of the
  ## Runtime Policy Reference section.
2026-06-07 18:31:36 +08:00
Paulo Aboim Pinto acaae1c2e5 test(tui): address command harness review 2026-06-07 12:24:13 +02:00
Paulo Aboim Pinto 96bff65797 test(tui): add command parity harness 2026-06-07 11:43:57 +02:00
zLeoAlex 256f34c621 fix(cache): set temp spillover root in cache_inspect test to survive nix sandbox
The test cache_inspect_displays_tool_result_budget_metadata relied on a
writable $HOME/.codewhale/tool_outputs/ for tool-result wire-dedup
persistence.  nix build sandboxes have a read-only home tree, so the
first tool-result SHA spillover write failed, the dedup hash table was
never populated, and the second identical tool result was not marked
deduplicated — causing the expect("repeat tool-result sighting should
report dedup metadata") assertion to fail.

Set TEST_SPILLOVER_ROOT to a tempdir inside the test (matching the
with_tool_result_sha_spillover_root pattern in chat.rs), so the
wire-dedup path works in any environment without depending on $HOME.
2026-06-07 16:06:38 +08:00
zLeoAlex 7b900b8699 test(cache): rename misleading test — does not exercise Op::ChangeMode dispatch
- Rename mode_change_op_updates_current_mode_and_emits_session_updated
  to current_mode_field_assignment_takes_effect_synchronously.
- The test directly mutates engine.current_mode, not through Op::ChangeMode.
  The dispatch path is separately covered by
  change_mode_op_updates_current_mode_and_emits_status.
2026-06-07 15:26:54 +08:00
zLeoAlex c6c3d2cc4d refactor(cache): inline single-call helpers, remove dead code
- Inline mode_prompt_marker_value and approval_prompt_marker_value into
  runtime_prompt_text (each called exactly once).
- Remove default_approval_mode_for_mode — zero callers.
2026-06-07 15:22:53 +08:00
zLeoAlex 039abb2ae6 refactor(cache): remove render_core_tool_taxonomy_block, inline to body variant
- Replace the 2 remaining test callers with render_core_tool_taxonomy_body
  (neither test depends on the ## heading — they check content only).
- Delete render_core_tool_taxonomy_block — zero production callers after
  the previous refactor.
2026-06-07 15:20:51 +08:00
zLeoAlex 12167b39c3 refactor(cache): replace taxonomy_body strip hack with source-level render_core_tool_taxonomy_body
- Add render_core_tool_taxonomy_body(mode) that generates the tool
  taxonomy text without the ## Core Tool Taxonomy heading.
- Refactor render_core_tool_taxonomy_block to use the body function
  internally (DRY).
- Delete taxonomy_body() — a downstream strip_prefix hack that
  worked around the source format instead of fixing it.
- Also removes the now-unnecessary debug_assert! (over-defensive,
  since the two functions are co-located in the same file).
2026-06-07 15:19:27 +08:00
zLeoAlex 0b5d574e63 fix(cache): address CR feedback — blank lines, heading hierarchy, debug_assert
- Add proper blank lines (\n\n) before mode headings in
  render_runtime_policy_reference (CommonMark/GFM compliance).
- Demote subheadings in agent.md from ##### to ###### so they
  nest correctly under the demoted main heading.
- Add debug_assert! in taxonomy_body() to loudly fail when
  render_core_tool_taxonomy_block format changes, preventing
  silent heading-hierarchy breakage.
2026-06-07 15:15:12 +08:00
zLeoAlex 427bd5d52f feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt
- Add render_runtime_policy_reference() in prompts.rs containing all
  mode and approval policy descriptions in the frozen system-prompt
  prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token
  self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and
  prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(),
  approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event
  (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on #2801. Reduces per-request runtime prompt overhead by 97%
(~471 tokens saved per API call). System prompt grows by ~1325 tokens
in the frozen prefix (one-time miss cost); break-even at 3 API calls.
2026-06-07 15:03:43 +08:00
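Note on 427bd5d52f: the break-even figure in this commit follows from a ceiling division of the one-time cost by the per-call saving. The sketch below reproduces that arithmetic with the approximate numbers from the message. It treats the prefix growth as a single up-front cost and ignores any pricing difference for cached tokens, which is a simplifying assumption. The function name is hypothetical.

```rust
/// Number of API calls after which the per-call saving covers the one-time
/// cost. Panics if `saved_per_call` is zero (division by zero).
fn break_even_calls(one_time_cost: u32, saved_per_call: u32) -> u32 {
    one_time_cost.div_ceil(saved_per_call)
}
```

With the figures quoted above, `break_even_calls(1325, 471)` gives 3: two calls save about 942 tokens, which is less than 1325, and three calls save about 1413.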
reidliu41 00407b5bf8 feat(config): add hotbar slot persistence
Add durable [[hotbar]] config bindings for slots 1-8, including default
bindings when no hotbar config is present.

Validate bindings without panicking: skip out-of-range slots, use the last
duplicate slot, and preserve unknown actions so future UI layers can show
disabled placeholders.
2026-06-07 14:42:52 +08:00
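Note on 00407b5bf8: the validation rules in this commit (skip out-of-range slots, last duplicate wins, keep unknown actions) can be sketched as follows. The `HotbarBinding` struct and `resolve_hotbar` function are assumptions for illustration; the real config types are not shown in the log, and default bindings for a missing config are not modeled here.

```rust
use std::collections::BTreeMap;

/// Hypothetical binding shape (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct HotbarBinding {
    slot: u8,
    action: String,
}

/// Keeps only slots 1..=8, lets the last duplicate win, and leaves action
/// names untouched so unrecognized ones are preserved rather than dropped.
fn resolve_hotbar(bindings: &[HotbarBinding]) -> BTreeMap<u8, String> {
    let mut slots = BTreeMap::new();
    for binding in bindings {
        if (1..=8).contains(&binding.slot) {
            // `insert` overwrites any earlier binding for the same slot.
            slots.insert(binding.slot, binding.action.clone());
        }
    }
    slots
}
```

Nothing in this path can panic: invalid slots are skipped, and action strings are never interpreted, so a later UI layer remains free to decide how to render unknown ones.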
Hunter B 3d676c2509 chore(tui): harden exec harness signals 2026-06-06 22:55:23 -07:00
Hunter B fde931ee89 chore(release): allow trusted v0.9 contributors 2026-06-06 19:56:11 -07:00
Hunter B f2159b7827 docs(release): honor v0.9 contributor credits 2026-06-06 19:45:28 -07:00
Hunter B 9b500a7b91 Prepare v0.9.0 release build 2026-06-06 19:39:02 -07:00
Hunter Bown 59d12f3b6a Merge pull request #2871 from aboimpinto/feat/2791-command-parity-harness
Layer 1: clean command support boundaries
2026-06-06 19:24:31 -07:00
Paulo Aboim Pinto fefd63f30b fix: address command layer review feedback 2026-06-07 03:19:45 +02:00
Paulo Aboim Pinto 18df8db056 refactor: extract neutral command support 2026-06-07 02:44:29 +02:00
Paulo Aboim Pinto 8e8b45a20e test: make command-adjacent tests hermetic 2026-06-07 02:44:15 +02:00
Paulo Aboim Pinto 5300dc484e chore: enforce lf for rust sources 2026-06-07 02:44:08 +02:00