2413 Commits

Author SHA1 Message Date
Hunter B face4dc27a Merge PR #2877 from LeoAlex0: cache_inspect test spillover root 2026-06-07 10:21:00 -07:00
Hunter B a54d08f28d chore(fmt): rustfmt engine tests from PR #2874
Mechanical rustfmt of the runtime_prompt tests rewritten in PR #2874
(LeoAlex0). No logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 10:10:11 -07:00
Hunter B 3619962507 Merge PR #2874 from LeoAlex0: slim runtime_prompt to minimal tag 2026-06-07 10:09:21 -07:00
Hunter B a42e9115b1 Merge PR #2873 from reidliu41: hotbar slot persistence 2026-06-07 10:09:21 -07:00
Hunter B 2c56f7761e Merge PR #2887 from aboimpinto: Gherkin acceptance E2E harness 2026-06-07 10:04:12 -07:00
Hunter B b0d9c3196b Merge PR #2878 from aboimpinto: Layer 2 command parity harness 2026-06-07 10:04:08 -07:00
Paulo Aboim Pinto c25f7af219 Address acceptance harness review feedback 2026-06-07 16:29:40 +02:00
Paulo Aboim Pinto d90031f06f Add Gherkin acceptance E2E harness example 2026-06-07 16:12:12 +02:00
zLeoAlex 55d7499408 test: add runtime_policy_reference composition test, strengthen ChangeMode tests, fix outdated comments
- Add runtime_policy_reference_is_included_in_full_prompt test to verify
  that render_runtime_policy_reference() output lands in the composed
  system prompt. Guards against silent breakage if the push_str() call
  is accidentally removed (all existing tests would still pass).

- Strengthen change_mode_op_updates_current_mode_and_emits_status:
  destructure SessionUpdated to assert that session messages do NOT
  contain <runtime_prompt> tags after mode change — verifying the core
  invariant that Op::ChangeMode does not write session history.

- Extend current_mode_field_assignment_takes_effect_synchronously:
  now also verifies that messages_with_turn_metadata() produces the
  correct runtime tag (mode="yolo" approval="auto") after a mode
  switch, covering the tag-generation mechanism end-to-end.

- Fix outdated comments in composed_prompt_no_longer_inlines_tool_taxonomy
  and plan_prompt_taxonomy_omits_run_tests: replace stale references to
  deleted <mode_prompt> metadata with accurate descriptions of the
  ## Runtime Policy Reference section.
2026-06-07 18:31:36 +08:00
Paulo Aboim Pinto acaae1c2e5 test(tui): address command harness review 2026-06-07 12:24:13 +02:00
Paulo Aboim Pinto 96bff65797 test(tui): add command parity harness 2026-06-07 11:43:57 +02:00
zLeoAlex 256f34c621 fix(cache): set temp spillover root in cache_inspect test to survive nix sandbox
The test cache_inspect_displays_tool_result_budget_metadata relied on a
writable $HOME/.codewhale/tool_outputs/ for tool-result wire-dedup
persistence.  nix build sandboxes have a read-only home tree, so the
first tool-result SHA spillover write failed, the dedup hash table was
never populated, and the second identical tool result was not marked
deduplicated — causing the expect("repeat tool-result sighting should
report dedup metadata") assertion to fail.

Set TEST_SPILLOVER_ROOT to a tempdir inside the test (matching the
with_tool_result_sha_spillover_root pattern in chat.rs), so the
wire-dedup path works in any environment without depending on $HOME.
2026-06-07 16:06:38 +08:00
zLeoAlex 7b900b8699 test(cache): rename misleading test — does not exercise Op::ChangeMode dispatch
- Rename mode_change_op_updates_current_mode_and_emits_session_updated
  to current_mode_field_assignment_takes_effect_synchronously.
- The test directly mutates engine.current_mode, not through Op::ChangeMode.
  The dispatch path is separately covered by
  change_mode_op_updates_current_mode_and_emits_status.
2026-06-07 15:26:54 +08:00
zLeoAlex c6c3d2cc4d refactor(cache): inline single-call helpers, remove dead code
- Inline mode_prompt_marker_value and approval_prompt_marker_value into
  runtime_prompt_text (each called exactly once).
- Remove default_approval_mode_for_mode — zero callers.
2026-06-07 15:22:53 +08:00
zLeoAlex 039abb2ae6 refactor(cache): remove render_core_tool_taxonomy_block, inline to body variant
- Replace the 2 remaining test callers with render_core_tool_taxonomy_body
  (neither test depends on the ## heading — they check content only).
- Delete render_core_tool_taxonomy_block — zero production callers after
  the previous refactor.
2026-06-07 15:20:51 +08:00
zLeoAlex 12167b39c3 refactor(cache): replace taxonomy_body strip hack with source-level render_core_tool_taxonomy_body
- Add render_core_tool_taxonomy_body(mode) that generates the tool
  taxonomy text without the ## Core Tool Taxonomy heading.
- Refactor render_core_tool_taxonomy_block to use the body function
  internally (DRY).
- Delete taxonomy_body() — a downstream strip_prefix hack that
  worked around the source format instead of fixing it.
- Also removes the now-unnecessary debug_assert! (over-defensive,
  since the two functions are co-located in the same file).
2026-06-07 15:19:27 +08:00
zLeoAlex 0b5d574e63 fix(cache): address CR feedback — blank lines, heading hierarchy, debug_assert
- Add proper blank lines (\n\n) before mode headings in
  render_runtime_policy_reference (CommonMark/GFM compliance).
- Demote subheadings in agent.md from ##### to ###### so they
  nest correctly under the demoted main heading.
- Add debug_assert! in taxonomy_body() to loudly fail when
  render_core_tool_taxonomy_block format changes, preventing
  silent heading-hierarchy breakage.
2026-06-07 15:15:12 +08:00
zLeoAlex 427bd5d52f feat(cache): slim runtime_prompt to minimal tag, move policy descriptions to system prompt
- Add render_runtime_policy_reference() in prompts.rs containing all
  mode and approval policy descriptions in the frozen system-prompt
  prefix (sent once per session, cache-hit thereafter).
- Simplify runtime_prompt_text() from ~500-token XML block to a ~16-token
  self-closing tag (<runtime_prompt visibility="internal" mode="..." approval="..."/>).
- Fix markdown heading hierarchy in all prompts/modes/*.md and
  prompts/approvals/*.md (## → #####) to nest correctly under ####.
- Remove now-unused legacy functions: mode_prompt(),
  approval_prompt_for_mode(), mode_change_runtime_message().
- Simplify Op::ChangeMode: no longer persists a mode_change event
  (next turn tag carries the current mode).
- Update and rename affected tests.

Builds on #2801. Reduces per-request runtime prompt overhead by 97%
(~471 tokens saved per API call). System prompt grows by ~1325 tokens
in the frozen prefix (one-time miss cost); break-even at 3 API calls.
2026-06-07 15:03:43 +08:00
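
The token accounting in commit 427bd5d52f can be checked with a small sketch. The function names and signatures below are hypothetical (the log does not show the real runtime_prompt_text() signature), and the break-even figure assumes the frozen prefix costs nothing extra once it is cached, which is a simplification. The token counts are the approximate ones quoted in the commit message.

```rust
/// Hypothetical sketch of the slimmed per-turn tag. Attribute values are
/// assumed to be fixed identifiers, so no escaping is applied.
fn runtime_prompt_tag(mode: &str, approval: &str) -> String {
    format!(r#"<runtime_prompt visibility="internal" mode="{mode}" approval="{approval}"/>"#)
}

/// Smallest number of API calls after which the per-call saving covers a
/// one-time cost, using integer ceiling division.
/// Returns None when nothing is saved per call (no break-even point).
fn break_even_calls(one_time_cost_tokens: u32, saved_per_call_tokens: u32) -> Option<u32> {
    if saved_per_call_tokens == 0 {
        return None;
    }
    Some(one_time_cost_tokens.div_ceil(saved_per_call_tokens))
}
```

With the quoted figures, break_even_calls(1325, 471) gives Some(3), since two calls save 942 tokens and three save 1413.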
reidliu41 00407b5bf8 feat(config): add hotbar slot persistence
Add durable [[hotbar]] config bindings for slots 1-8, including default
  bindings when no hotbar config is present.

  Validate bindings without panicking: skip out-of-range slots, use the last
  duplicate slot, and preserve unknown actions so future UI layers can show
  disabled placeholders.
2026-06-07 14:42:52 +08:00
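
The validation rules in commit 00407b5bf8 (skip out-of-range slots, let the last duplicate win, keep unknown actions) might look roughly like the sketch below. HotbarBinding and normalize_bindings are hypothetical names, not the types from the actual change, and the default bindings used when no hotbar config is present are not modelled.

```rust
use std::collections::BTreeMap;

/// Hypothetical binding shape; the real config type is not shown in the log.
#[derive(Debug, Clone, PartialEq)]
struct HotbarBinding {
    slot: u8,
    action: String,
}

const MIN_SLOT: u8 = 1;
const MAX_SLOT: u8 = 8;

/// Normalizes raw bindings without panicking:
/// - bindings whose slot is outside 1..=8 are skipped,
/// - a later binding for the same slot replaces an earlier one,
/// - action names are kept verbatim, even if no handler is known for them.
fn normalize_bindings(raw: &[HotbarBinding]) -> BTreeMap<u8, String> {
    let mut by_slot = BTreeMap::new();
    for binding in raw {
        if !(MIN_SLOT..=MAX_SLOT).contains(&binding.slot) {
            continue; // out-of-range slot: ignore instead of failing
        }
        // insert() overwrites, so the last duplicate wins.
        by_slot.insert(binding.slot, binding.action.clone());
    }
    by_slot
}
```

Keeping the action as an uninterpreted string is what lets a later UI layer decide how to present an action it does not recognise.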
Hunter B 3d676c2509 chore(tui): harden exec harness signals 2026-06-06 22:55:23 -07:00
Hunter B fde931ee89 chore(release): allow trusted v0.9 contributors 2026-06-06 19:56:11 -07:00
Hunter B f2159b7827 docs(release): honor v0.9 contributor credits 2026-06-06 19:45:28 -07:00
Hunter B 9b500a7b91 Prepare v0.9.0 release build 2026-06-06 19:39:02 -07:00
Hunter Bown 59d12f3b6a Merge pull request #2871 from aboimpinto/feat/2791-command-parity-harness
Layer 1: clean command support boundaries
2026-06-06 19:24:31 -07:00
Paulo Aboim Pinto fefd63f30b fix: address command layer review feedback 2026-06-07 03:19:45 +02:00
Paulo Aboim Pinto 18df8db056 refactor: extract neutral command support 2026-06-07 02:44:29 +02:00
Paulo Aboim Pinto 8e8b45a20e test: make command-adjacent tests hermetic 2026-06-07 02:44:15 +02:00
Paulo Aboim Pinto 5300dc484e chore: enforce lf for rust sources 2026-06-07 02:44:08 +02:00
Hunter Bown ad70739b6a Merge pull request #2868 from Hmbown/codex/v090-vscode-git-meta
feat(vscode): show thread git metadata
2026-06-06 10:51:43 -07:00
Hunter B ce17f06db5 feat(vscode): show thread git metadata 2026-06-06 10:50:48 -07:00
Hunter B 6b1de930af chore(release): credit direct v0.9 community merges 2026-06-06 10:49:25 -07:00
Hunter Bown 0b96e8923a Merge pull request #2864 from ljm3790865/feat/tab-core-narrow
feat(tui): add multi-tab system core (manager + persistence)
2026-06-06 10:41:50 -07:00
Hunter Bown a7c1c034ab Merge pull request #2866 from reidliu41/feat/hotbar-action-registry
feat(tui): add hotbar action registry foundation
2026-06-06 10:40:23 -07:00
Hunter Bown 461c22f327 Merge pull request #2867 from ousamabenyounes/fix/azerty-altgr-at-key-conflict
fix(tui): prevent AltGr from swallowing @/#/$/!/%/ characters in composer
2026-06-06 10:37:48 -07:00
Hunter B ffaf110957 fix(tui): advance tab restore counters 2026-06-06 10:34:49 -07:00
Hunter B c9ce6c920b fix(tui): harden hotbar action dispatch 2026-06-06 10:32:18 -07:00
Hunter B 700a36edf1 style(tui): format AltGr sidebar shortcut guards 2026-06-06 10:30:17 -07:00
Ousama Ben Younes da6b8141ad fix(tui): prevent AltGr from swallowing @/#/$/!/%/ characters in composer
On Windows, AltGr is delivered as Ctrl+Alt by crossterm. European keyboard
layouts (French AZERTY, German QWERTZ, etc.) use AltGr to type characters
like @ (AltGr+0), # (AltGr+3), etc. The sidebar-focus shortcuts
(Alt+@/Alt+!/Alt+#/Alt+$/Alt+%) were matching on "contains ALT" alone,
swallowing these AltGr-typed characters instead of inserting them into
the composer.

Exclude the Ctrl modifier from these sidebar-focus shortcut guards so
AltGr-typed glyphs fall through to the  catch-all and
are inserted as text. This is consistent with the has_ctrl_or_alt /
is_altgr philosophy in key_hint.rs, which already treats Ctrl+Alt as
AltGr to preserve European keyboard input.

Closes #2863
2026-06-06 15:48:30 +00:00
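
The guard change in commit da6b8141ad can be illustrated without pulling in crossterm. The Modifiers struct and both functions below are hypothetical stand-ins, not code from the change. They only model the rule the message describes: a bare "contains ALT" check also matches Ctrl+Alt, which is how AltGr arrives on Windows.

```rust
/// Hypothetical stand-in for the modifier set of a key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Modifiers {
    ctrl: bool,
    alt: bool,
}

/// Old guard as described in the commit: any event containing Alt matched,
/// so AltGr (delivered as Ctrl+Alt) was swallowed by the shortcut.
fn matches_sidebar_shortcut_old(m: Modifiers) -> bool {
    m.alt
}

/// Revised guard: Alt must be held without Ctrl, so Ctrl+Alt events fall
/// through and the typed glyph can be inserted as text.
fn matches_sidebar_shortcut(m: Modifiers) -> bool {
    m.alt && !m.ctrl
}
```

A plain Alt chord still triggers the shortcut under both guards; only the Ctrl+Alt case changes.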
reidliu41 1f99fcbd97 feat(tui): add hotbar action registry foundation
Introduce the hotbar action trait and registry, and register the built-in app
  actions needed by the first hotbar slice.
2026-06-06 23:23:48 +08:00
G1 Agent 7fcd7d7469 chore(tui): address Phase 2 bot review on narrow tab-core harvest
Six fixes for the bot review comments landed on PR #2864 head
649d3990. See phase2-playbook.md §7 for the triage rationale.

* persistence.rs: oversized state file now surfaces an InvalidData
  error instead of silently returning a default. The old behaviour
  would let the next save overwrite the oversized file and destroy
  the user's data. Test updated to expect the error.

* persistence.rs: PersistedDelegation gains a `status` field so
  in-flight `InProgress` delegations aren't silently demoted to
  `Pending` on restart. The snapshot now writes the live status
  and restore_from_snapshot honours it. Adds a regression test.

* mention.rs: resolve_tab_mention no longer sorts its input — tab
  mentions (@Tab2) must map to the visual order in the tab bar,
  not to an arbitrary ID sort. Test updated.

* manager.rs: `pending_tasks` renamed to `completed_delegations`
  because the getter returns completed DelegationResults, not
  in-flight tasks. Docstring points to the in-flight getter
  `pending_delegations` to avoid the same confusion recurring.

* manager.rs: delegate_task and start_meeting now validate that
  the from/to tab IDs (or all participant IDs) currently exist
  in the manager. Returns `None` on any unknown ID, preventing
  orphaned tasks / meetings with stale tab references. Two new
  regression tests cover both methods.

Local CI matrix (Windows runner, flags matching ci.yml):
- cargo fmt --all -- --check: exit 0
- cargo clippy --workspace --all-features --locked -- -D warnings: exit 0
- cargo test --workspace --all-features --locked: 4282 pass, 6 fail
  (the 6 failures are pre-existing on the baseline; not caused
  by this PR)
- git diff --exit-code -- Cargo.lock: exit 0

The three deferred threads (#1 close_tab cleanup, #5 cross_tab_links
snapshot, #8 TabGroup::new collision risk) are explicitly out of
scope here; they belong to the follow-up collab/UI PR per the
narrow-harvest promise to Hmbown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 20:23:51 +08:00
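
The first fix in commit 7fcd7d7469 (an oversized state file surfaces an InvalidData error instead of a silent default) could be sketched as a size check in front of the load. MAX_STATE_BYTES and ensure_state_size are hypothetical; the real limit and function names in persistence.rs are not shown in the log.

```rust
use std::io;

/// Hypothetical size limit for the persisted state file.
const MAX_STATE_BYTES: u64 = 1024 * 1024;

/// Rejects an oversized state file with an InvalidData error rather than
/// returning a default state, so a later save cannot overwrite the file
/// and destroy the user's data.
fn ensure_state_size(len: u64) -> io::Result<()> {
    if len > MAX_STATE_BYTES {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("state file is {len} bytes, limit is {MAX_STATE_BYTES}"),
        ));
    }
    Ok(())
}
```

A caller would typically pass the length from the file's metadata and propagate the error with `?` before attempting to parse anything.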
G1 Agent 649d3990d6 feat(tui): add multi-tab system core (manager + persistence)
Sister PR to #2753, scoped to the narrow tab-core/persistence slice
Hmbown asked for in the v0.9 stewardship review. Adds the `tab`
module under `crates/tui/src/tui/` and a one-line module registration
in `tui/mod.rs`. Nothing else in the host changes here — the
switcher / picker / meeting UI pass and the host wiring
(`App::tab_manager`, keyboard shortcuts, mouse menu, tab-bar layout
in `ui.rs`) live on #2753 and land in a follow-up PR.

Scope:

* `tab::TabManager` with monotonic `next_tab_id`, max 9 tabs by
  default, snapshot/restore round-trip, group assignment,
  cross-tab event/links, and persistence integration
* `tab::delegator::TaskDelegator` — bounded pending queue with
  `MAX_COMPLETED_RESULTS` auto-prune; `take_pending_for_tab` marks
  InProgress in place and returns a clone so subsequent
  `start_task` / `complete` / `fail_task` / `cancel_task` can still
  find the task (the previous `swap_remove` would have dropped it
  on the first call)
* `tab::meeting::MeetingManager` — participants, messages by
  type (Regular / Question / Answer / Proposal / Agreement /
  Objection / Summary), decisions
* `tab::cross_tab` — `CrossTabEvent` (TaskDelegation / ReviewRequest
  / MeetingInvite / ContextSync / ResultReturn) and `SharedContext`
* `tab::group` — `TabGroup` / `TabGroupManager` /
  `GroupColor` (Red/Orange/Yellow/Green/Cyan/Blue/Magenta/Gray)
* `tab::mention` — `@Tab<N>`, `@N`, `@tab<n>` (case-insensitive)
  parser, with `resolve_tab_mention` for 1-indexed tab lookups
* `tab::persistence` — JSON file in the user's data dir,
  `PersistedTabState` / `PersistedTab` / `PersistedDelegation` /
  `PersistedGroup` with schema-version header, atomic save,
  bounded file size, corruption-tolerant load
* `tab::benches` and `tab::key_e2e` regression suites (roundtrip,
  save/load, end-to-end save→load with keymap)

Lint posture:

* `#![allow(dead_code, unused_imports)]` on `tab/mod.rs` because
  the collab/UI pass is not on this branch; the public surface is
  intentionally exposed for the follow-up wiring in #2753
* One pre-existing `tools/shell.rs` fix piggy-backed: drop the
  redundant `as *mut c_void` cast on `child.as_raw_handle()` (the
  return type is already `*mut c_void`). Pre-existing on
  `codex/v0.9.0-stewardship`; promoted to `clippy::unnecessary_cast`
  by rust 1.95

Local CI matrix on this branch (Windows runner, same flags as
`.github/workflows/ci.yml`):

* `cargo fmt --all -- --check` — pass
* `cargo clippy --workspace --all-features --locked -- -D warnings`
  — pass, 0 errors
* `cargo test --bin codewhale-tui tab::` — 63/63 pass
* `git diff --exit-code -- Cargo.lock` — clean

GitHub Actions will run on the standard fork-PR approval gate; the
`on.pull_request` branch filter on this repo matches
`codex/v0.9.0-stewardship` so the same matrix will run on
ubuntu-latest, macos-latest, and windows-latest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 19:06:38 +08:00
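
The mention forms listed for `tab::mention` in commit 649d3990d6 (`@Tab<N>`, `@N`, `@tab<n>`, case-insensitive, 1-indexed) can be sketched as below. The signatures are assumptions; only the name resolve_tab_mention comes from the log. The lookup takes tab IDs in tab-bar order, matching the follow-up fix that stopped sorting the input.

```rust
/// Parses one mention token such as "@Tab2", "@tab2", or "@2" into its
/// 1-indexed tab number. Returns None for anything else, including "@0".
fn parse_tab_mention(token: &str) -> Option<usize> {
    let rest = token.strip_prefix('@')?;
    // Optional, case-insensitive "tab" prefix before the digits.
    let digits = match rest.get(..3) {
        Some(prefix) if prefix.eq_ignore_ascii_case("tab") => &rest[3..],
        _ => rest,
    };
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: usize = digits.parse().ok()?;
    if n == 0 {
        None
    } else {
        Some(n)
    }
}

/// Resolves a 1-indexed mention against tab IDs given in visual (tab bar)
/// order. The slice is used as-is and never sorted.
fn resolve_tab_mention(n: usize, tabs_in_visual_order: &[u64]) -> Option<u64> {
    n.checked_sub(1)
        .and_then(|index| tabs_in_visual_order.get(index).copied())
}
```

With tabs shown in the order [30, 10, 20], "@Tab2" resolves to tab ID 10, which is the second tab on screen rather than the second-smallest ID.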
Hunter Bown 5bd2f6a99b feat(runtime-api): expose git status metadata for agent view (#2862) 2026-06-06 02:51:21 -07:00
Hunter Bown cc3cbc823c docs(release): record Linux startup evidence (#2861) 2026-06-06 02:45:11 -07:00
Hunter Bown 137d65c31a docs(release): record DeepSeek v4 live smoke (#2860) 2026-06-06 02:41:09 -07:00
Hunter Bown 7bd68279e7 docs(release): record macOS startup evidence (#2859) 2026-06-06 02:37:22 -07:00
Hunter Bown b2e1ba13df docs(release): mark asset verification as pre-npm gate (#2858) 2026-06-06 02:33:51 -07:00
Hunter Bown ab8e3a12ca docs(release): record v0.9 core gate evidence (#2857) 2026-06-06 02:32:29 -07:00
Hunter Bown 2561a54df0 docs(release): close v0.9 credit rollback gates (#2856) 2026-06-06 02:24:16 -07:00
Hunter Bown a5a6b0a2d0 docs(release): record slash picker v0.9 evidence 2026-06-06 02:14:02 -07:00
Hunter Bown e69ea4539a docs(release): resolve v0.9 UI acceptance cutline 2026-06-06 02:11:38 -07:00