Commit Graph

1671 Commits

Author SHA1 Message Date
Hunter B fb86737a8c test(settings): assert migrated settings display canonical path
Extend the #2730 settings migration harvest with the missing platform-config fallback display assertion from review, and keep the v0.9 execution map/changelog credit current.

Validation: cargo fmt --all -- --check; git diff --check; cargo test -p codewhale-tui --bin codewhale-tui --locked settings_ -- --nocapture; cargo test -p codewhale-tui --bin codewhale-tui --locked display_localizes_header_and_config_file_label -- --nocapture.

Harvested from PR #2730 by @xyuai.

Co-authored-by: xyuai <281015099+xyuai@users.noreply.github.com>
2026-06-03 21:02:46 -07:00
Hunter B 23c9481af1 feat: add HarmonyOS OpenHarmony support
Harvest the HarmonyOS/OpenHarmony port from PR #2634 and make it publish-safe by target-gating unsupported host dependencies out of the OHOS TUI graph. Self-update is disabled on OHOS, PTY shell mode reports unsupported, and Starlark execpolicy parsing returns an explicit unsupported-platform error until upstream starlark/rustyline/nix support catches up.

Add OHOS SDK setup docs and launcher scripts, install the rustls ring provider for rustls-no-provider entrypoints, and keep the packaged codewhale-tui OHOS graph free of starlark, rustyline, nix@0.28, portable-pty, and arboard.

Validation: cargo fmt --all -- --check; git diff --check; git diff --cached --check; cargo check -p codewhale-cli --locked; cargo check -p codewhale-app-server --locked; cargo check -p codewhale-tui --locked; cargo test -p codewhale-cli --locked update::tests::; cargo test -p codewhale-release --locked; cargo test -p codewhale-tui --locked background_tty_command_has_controlling_terminal; cargo test -p codewhale-tui --locked clipboard; cargo package -p codewhale-tui --allow-dirty --no-verify --locked; packaged OHOS cargo tree checks. OHOS target check still requires a loaded OpenHarmony SDK/sysroot and currently stops in ring with missing assert.h when CC/CFLAGS/linker are unset.

Harvested from PR #2634 by @shenjackyuanjie.

Co-authored-by: shenjackyuanjie <54507071+shenjackyuanjie@users.noreply.github.com>
2026-06-03 21:02:46 -07:00
HUQIANTAO 98edba3683 refactor(engine): append turn metadata after user text
Place user text before volatile turn metadata in outgoing user-message content arrays so provider prefix caches can continue matching the stable user-input prefix across date, model-route, and working-set changes.

Also adds wire-level coverage proving tail-positioned turn metadata serializes after user text while preserving turn-meta deduplication.

Harvested from PR #2517 by @HUQIANTAO

Co-authored-by: HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
2026-06-03 21:02:45 -07:00
Hunter B 60f8e7d62e refactor(web_run): split cache locks for page reads
Harvested from PR #2502 by @HUQIANTAO

Co-authored-by: HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
2026-06-03 21:02:45 -07:00
Hunter B 311eb4002b feat(tui): add bounded restore snapshot listing
Harvested from PR #2513 by @cyq1017.

Co-authored-by: cyq1017 <61975706+cyq1017@users.noreply.github.com>
2026-06-03 21:02:45 -07:00
Hunter B 8502784218 fix(xiaomi-mimo): use token-plan api-key auth
Harvests the MiMo Token Plan auth-header behavior from #2627 while keeping Xiaomi env-key precedence unchanged so standard endpoints do not accidentally receive a Token Plan key.

Harvested from PR #2627 by @xyuai.

Co-authored-by: xyuai <281015099+xyuai@users.noreply.github.com>
2026-06-03 21:02:45 -07:00
Hunter B 159f509dd6 fix(tui): invalidate fanout card rows on sibling starts 2026-06-03 21:01:39 -07:00
jrcjrcc 7b2a7e513d fix: Windows sub-agent completion halves TUI render width
Root cause: AgentComplete unconditionally calls resume_terminal()
even when the terminal was never paused, causing a secondary
EnterAlternateScreen on Windows that creates a new buffer whose
width may differ from the window width. Additionally,
ColorCompatBackend had no terminal_size cache, so size() fell
through to crossterm::terminal::size() which on Windows returns
the WinAPI buffer width rather than the window width.

Changes:
- AgentComplete: add event_broker.is_paused() guard
- resume_terminal(): cache real terminal size before reset_viewport
- Resize handler: also set terminal_size alongside forced_size
- subagent_routing: 3x mark_history_updated -> bump_history_cell(idx)
- color_compat: add terminal_size field, set_terminal_size(), fix
  size() fallback priority (forced_size > terminal_size)
- tests: 3 unit tests for size() fallback chain

Review feedback addressed:
- forced_size now takes priority over terminal_size (gemini-code-assist)
- Redundant map lookups removed in subagent_routing (both bots)
- set_terminal_size moved before reset_terminal_viewport (greptile-apps)

(cherry picked from commit 4463c46644a6e485e7e20dc2b19c29c2e8eb3c5c)
2026-06-03 21:01:39 -07:00
Hunter B 4401f7a2e5 feat(tools): hide legacy subagent and shell aliases from model catalog (#2683)
Subagent aliases:
- Legacy names (agent_spawn, agent_result, agent_cancel, resume_agent,
  agent_list, agent_send_input, agent_assign, agent_wait,
  delegate_to_agent) are already NOT registered — they exist as dead code
  with #[allow(dead_code)] since v0.8.33
- Add test verifying model catalog only advertises canonical subagent
  tools: agent_open, agent_eval, agent_close, tool_agent

Shell aliases:
- Hide exec_wait from model catalog (legacy alias for exec_shell_wait)
- Hide exec_interact from model catalog (legacy alias for
  exec_shell_interact)
- Both remain callable for saved transcript replay
- Add test verifying shell aliases are hidden but callable

Verification: cargo test -p codewhale-tui --locked (4040 passed),
cargo clippy -D warnings
2026-06-03 21:01:38 -07:00
Implementist 88422f3ad3 fix(plan_prompt): pre-wrap CJK+Latin mixed text to avoid forced line-breaks at script boundaries
Wrap plan steps via wrap_text() before rendering, breaking only on display-width overflow, not on Latin/CJK Unicode word boundaries. Switch main render path from Wrap { trim: true } to Wrap { trim: false } since all content is pre-wrapped. Replace wrapped_line_count() with lines.len() for accurate scroll bounds. Keep confirm-exit dialog on Wrap { trim: true } (English-only, no risk).
2026-06-03 21:01:38 -07:00
Implementist 966b5cf1fb refactor(plan_prompt): use display-width in wrap_text, skip wasted render work
- wrap_text: replace chars().count() with UnicodeWidthStr::width() so
  CJK text is wrapped by display columns, consistent with
  wrapped_line_count and ratatui's Paragraph::wrap.  Also fix the
  hard-split loop to use exclusive byte ranges (..end) instead of
  inclusive (..=i) so multi-byte UTF-8 prefixes are always valid.
- render: hoist the confirming_exit branch to an early return so the
  plan-content construction (lines, scroll bounds, footer) is skipped
  entirely when the confirmation dialog is visible.
2026-06-03 21:01:38 -07:00
Implementist e3a52555eb fix(plan_prompt): clear pending_g on Esc, deduplicate render_modal_chrome
- Clear pending_g when Esc triggers the exit-confirmation prompt so a
  stray 'g' press does not leak into and survive the confirmation dialog.
- Move render_modal_chrome into the else branch so only one call fires
  per render pass, eliminating a shadow artifact when confirming_exit
  is active.
2026-06-03 21:01:38 -07:00
Implementist 47c071a0d5 chore: apply cargo fmt fix to plan_prompt.rs 2026-06-03 21:01:38 -07:00
Implementist 6d79d55b6c fix(plan_prompt): use display-width for leading spaces and de-hardcode wrap width
- wrapped_line_count: compute leading-space width via UnicodeWidthStr
  instead of byte length, so non-ASCII leading whitespace is measured
  correctly.
- render: hoist popup_area / content_width computation above plan
  rendering so wrap_text can share the same content_width derived
  from the actual popup geometry instead of a magic 68.
2026-06-03 21:01:38 -07:00
Implementist 11c448d66e fix(plan_prompt): remove step truncation to allow content overflow into scroll region 2026-06-03 21:01:38 -07:00
Implementist 537a8bccf3 fix(tui): replace manual div_ceil with usize::div_ceil to satisfy clippy lint 2026-06-03 21:01:38 -07:00
Implementist 14db9b2466 fix(tui): avoid spurious exit-confirmation on short plan after scroll key
Use clamped (effective) scroll instead of raw `self.scroll` in the Esc
handler so a short plan that fits entirely (max_scroll == 0) never
triggers the "exit without implementing?" dialog when the user pressed
a scroll key (PgDn/Ctrl-D/G/End) beforehand.
2026-06-03 21:01:38 -07:00
Implementist 1669e3c12e fix(tui): address code review feedback on plan prompt modal
- Use word-wrapping-aware line count to prevent underestimating scroll range
  (gemini-code-assist / greptile-apps)
- Merge PLAN_OPTIONS, PLAN_SHORTCUTS, PLAN_SHORT_LABELS into PlanOption struct
  (gemini-code-assist)
- Remove dead Esc code in handle_key (greptile-apps)
- Guard gg/G with modifier checks (gemini-code-assist)
- Increase PgUp/PgDn scroll amount from 6 to 12 (greptile-apps)
- Use u16::try_from for scroll value to avoid silent truncation (greptile-apps)
- Update related unit tests for new scroll values
2026-06-03 21:01:38 -07:00
Implementist 68784cff52 fix(tui): add scroll support to plan prompt modal
- Add scroll state field to PlanPromptView with PgUp/PgDn, Ctrl+U/D/F/B,
  Home/End, gg/G vim-style keybindings
- Show scroll indicator footer when content overflows the popup
- Add confirming_exit state: Esc while scrolled asks for confirmation
  before discarding, preventing accidental exits on long plans
- Clamp scroll in render() so overscroll doesn't hide bottom options
- Use wrapped_line_count() with UnicodeWidthStr for accurate overflow
  detection with CJK characters
- Add 11 unit tests covering scroll, keybindings, and exit confirmation
2026-06-03 21:01:38 -07:00
HUQIANTAO 863f55cc68 perf(history): simplify output_rows cache API and switch to FNV-1a
Three follow-ups to the previous perf commit:

1. Drop the rows_hash field on CacheEntry. The field was computed and
   stored but never read on the hot path; tests exercised it only to
   assert the cache returned a stable hash. After this change
   get_or_compute_rows returns just Vec<OutputRow>, halving the
   tuple-return ABI and removing one DefaultHasher::write pass on
   every cache miss.

2. Replace DefaultHasher (SipHash) with a hand-rolled FNV-1a 64-bit
   hash. SipHash is per-process-keyed and ~5-10x slower than FNV on
   the small-to-medium tool output strings we see at 120 FPS. FNV-1a
   has no per-process key, fits in 20 lines of pure-Rust, and a 64-bit
   collision space is more than wide enough for the per-process LRU's
   expected <= a few hundred entries. The cache is a correctness
   optimization, not a security boundary; collisions only cause a
   false miss, never wrong data.

3. Caller in tui::history::render_preserved_output_mode updated to
   the new Vec<OutputRow>-only signature. Two new tests cover the
   FNV-1a properties (length-suffix sensitivity, empty-input
   stability).
2026-06-03 21:01:38 -07:00
HUQIANTAO 3b0ef3f63c perf(history): cache output_rows and selected_output_indices per cell
output_rows (in tui::history) walks the raw tool output, ANSI-strips
each line, classifies path/URL-like rows, and wraps the rest to the
current viewport width. selected_output_indices then computes the
head/tail/importance subset that the compact Live view shows. Both
functions are pure, but they are called on every render frame for
every visible tool cell. For a 4 KB tool output on a 120 FPS render
loop that is 2-6 redundant walks per frame, per cell, and the
function is called from a non-trivial number of cells across
exec, tool, command, and review history.

Add tui::output_rows_cache, a thread-local, content-addressed cache
keyed on (content_hash, width) for the rows and (content_hash, width,
line_limit) for the indices. The cache stores the wrapped
Vec<OutputRow> plus a per-line-limit map of selected indices on a
single entry, so a single key lookup satisfies both render steps.

render_preserved_output_mode now consults the cache for both the
rows and the indices; on a hit, neither the per-line ANSI strip nor
the importance-ranking pass runs. The cache is bounded (default
capacity 256) with insertion-order eviction. The OutputRow struct
gains PartialEq + Eq + pub fields so the cache module can store and
hash it without exposing private internals.

Tests: 6 new unit tests cover the hit/miss path, width invalidation,
content invalidation, indices per-line_limit caching, capacity
eviction, and hash stability. The wider tui::history test suite (68
tests) still passes.
2026-06-03 21:01:38 -07:00
HUQIANTAO c0b36824c2 perf(capacity): let scan_canonical_inputs early-exit without verified-user lookup
The build_canonical_state path never reads
CanonicalStateScan::latest_verified_user_idx, but the previous patch
required is_complete() to find a verified user message before it would
short-circuit. On a long history with no verification replay — the
common case — the scan walked the entire message list looking for a
match that could not exist.

Add a find_verified: bool parameter to scan_canonical_inputs and
CanonicalStateScan::is_complete. build_canonical_state now passes
false, so the loop stops as soon as the goal and CANONICAL_SCAN_MAX_FACTS
facts are found. The replan path (apply_verify_and_replan) keeps the
existing true behavior so it still locates the latest verified user
message.

Test calls are updated to match; no behavior change for any test.
2026-06-03 21:01:38 -07:00
HUQIANTAO 837a6f8c54 perf(capacity): collapse build_canonical_state's reverse scans to one pass
build_canonical_state previously did two independent reverse walks of
session.messages — one to extract the most recent user goal, and one
to collect up to four confirmed-fact snippets. apply_verify_and_replan
then added a third and fourth reverse scan to locate the latest user
message and the latest [verification replay] user message for the
re-plan path.

All four reverse scans collect disjoint facts about the same most-
recent-first view of the conversation. This PR folds them into a
single helper, scan_canonical_inputs, that walks messages once in
reverse, fills a CanonicalStateScan, and short-circuits as soon as
every collector is satisfied. The helper exposes the latest-message
indices so apply_verify_and_replan can clone the full Message values
after the scan (eliminating the two independent find().cloned() walks).

The output CanonicalState is byte-identical to the prior
implementation: same goal, same confirmed facts (newest first, errors
filtered), same fallback string when no user text exists. The re-plan
path's keep-messages set is identical: latest user + latest verified.

Tests: 6 new unit tests cover the goal lookup, fact cap, error-result
filter, verified-marker scan, empty input, and the early-exit
condition. The full engine test suite (153 tests) still passes.
2026-06-03 21:01:38 -07:00
HUQIANTAO e3adc98baf perf(prefix-cache): fold tool.strict into identity hash, share cache with PrefixFingerprint::compute
Three follow-ups to the previous perf commit:

1. Correctness: tool.strict participates in the wire format emitted by
   tool_to_api_json, so it MUST participate in the cache identity. Two
   catalogs that differ only in strict would otherwise collide and serve
   a stale SHA-256, silently busting prefix-cache stability on the wire.

2. Allocation: replace the per-tool serde_json::to_string in
   tool_set_identity with a hash_json_value helper that walks the JSON
   tree directly. For a 60-tool catalog this drops ~25-40 KB of
   transient allocation per cache miss.

3. Dead code: the previous patch introduced PrefixFingerprint::compute,
   CachedCatalog::joined, ToolCatalogCache::{invalidate,is_empty}, and a
   thread-local cache helper that were not used outside tests. With
   -D warnings in CI all four triggered dead-code errors. The compute
   helper is now only built in cfg(test); the rest are marked
   #[allow(dead_code)] with comments explaining their observability and
   test-only use.
2026-06-03 21:01:37 -07:00
HUQIANTAO baef5ba95d perf(prefix-cache): cache tool-catalog JSON serialization across checks
PrefixFingerprint::compute is called once per turn by the turn loop
prefix-stability check. The tool-side work serializes every tool to the
chat-API JSON shape, sorts the resulting strings, joins with newlines,
and SHA-256s the result. For a 60-tool catalog that is ~25-40 KB of
allocation plus a sort, all of which produces a byte-identical output
once the tool set is stable across turns (the common case after the
first turn of a session).

Introduce a process-local ToolCatalogCache that stores the joined+sorted
catalog under a content-derived u64 identity (length + per-tool name +
description + serialized input_schema). On a hit, the per-tool JSON
serialization, sort, and join are skipped entirely — the pre-computed
SHA-256 hex digest is returned directly.

The cache lives on PrefixStabilityManager (per-session ownership) and
backs a new PrefixFingerprint::compute_with_tool_cache entry point.
check_and_update, PrefixStabilityManager::new, and pin() all use the
cached path. The original compute() is kept as a fallback for callers
that do not have a cache in hand (e.g. CLI tools that build a one-shot
fingerprint).

The cache is bounded (default capacity = 8) and uses insertion-order
eviction, matching the eviction strategy already in
transcript_cache.rs. invalidate() is exposed for tool-registry hot-reload
and MCP attach paths.

Tests: 8 new unit tests cover the miss/hit path (pointer-equal Arc on
hit), identity collisions, schema change detection, capacity eviction,
invalidate, empty slice, and the equivalence between cached and uncached
fingerprints. The full 30-test prefix_cache suite passes; the wider
prefix-cache contract tests in settings, prompts, and
core::engine::tests continue to pass.
2026-06-03 21:01:37 -07:00
HUQIANTAO 3de07a99ed perf(engine): memoize estimated_input_tokens via content-keyed cache
The token estimator walks the full session.messages and the active system
prompt. Five call sites per turn in the engine (capacity pre/post tool
checkpoints, error escalation, the seam manager, the trim budget check)
plus four TUI/command consumers (footer, /status, /debug, context
inspector) all re-walked the same data independently. On a 200-message
history with 5 KB of tool results that is roughly 2 ms per call, or
~20 ms of pure waste on a single turn.

Introduce a process-local TokenEstimateCache keyed on
(session.messages_revision, system_prompt_fingerprint). Repeated calls
with the same inputs return the cached value without re-walking the
message list. The cache invalidates as soon as either input changes:
  * session.messages_revision is a monotonic counter bumped in
    Session::add_message, Session::replace_messages, the new
    Session::bump_messages_revision helper, and at every direct
    session.messages mutation site in core/engine.rs and
    core/engine/capacity_flow.rs.
  * system_prompt_fingerprint is a stable 64-bit hash of the
    SystemPrompt::Text or SystemPrompt::Blocks payload.

Also restructures layered_context_checkpoint to compute the estimated
token count before taking a long-lived &SeamManager borrow, and
re-routes the capacity pre/post tool checkpoints to compute the
observation into a local before calling
capacity_controller.observe_*. Both refactors are required to satisfy
the borrow checker once estimated_input_tokens requires &mut self.

Tests: 10 new unit tests cover the miss/hit path, revision bumps,
system-prompt changes, audit-ring capacity, and downward-revision
no-ops. The full 157-test engine suite still passes.
2026-06-03 21:01:37 -07:00
xyuai 9e15805f64 fix(settings): tighten legacy path migration coverage 2026-06-03 19:24:37 -07:00
Hunter B f7a602cd20 feat(tools): hide todo_* aliases from model catalog, add deprecation metadata (#2682)
- Add model_visible() hook to ToolSpec trait (default true)
- Override model_visible() -> false on todo_write, todo_add, todo_update, todo_list
- Checklist variants remain model-visible as the canonical surface
- Legacy todo_* calls still work for saved transcript replay
- Return _deprecation metadata with use_instead and removed_in=0.9.0
- Update prompts to recommend checklist_* only
- Update TOOL_SURFACE.md with v0.9.0 deprecation notes
- Add tests for hidden catalog, compat alias behavior, and metadata

Verification: cargo test -p codewhale-tui -- todo, cargo clippy -D warnings
2026-06-03 19:20:23 -07:00
Hunter Bown 8dff2f7525 fix(tui): guard xiaomi mimo defaults test against CI env vars 2026-06-03 16:25:04 -07:00
Hunter Bown 772ec46c98 chore(release): v0.8.53 — Arcee support, telegram bridge, provider fixes
- Fix Rust syntax/clippy fallout in client.rs, cli/src/lib.rs, web_search.rs
- Fix 0.8.53 release metadata: changelog links, TUI changelog, npm wrapper
- Update visible help copy for multi-provider support
- Add telegram-bridge integration with deploy configs
- Add US remote VM quickstart doc
- Update Tencent Cloud deploy scripts and docs
- Bump npm wrapper to 0.8.53
2026-06-03 16:12:38 -07:00
RefuseOdd 8b0e1cc3c0 Limit path suffix to chat completions 2026-06-03 15:34:24 -07:00
RefuseOdd d2999bb402 Add path_suffix to ProviderConfigToml and ProviderConfig
Adds an optional path_suffix field that lets users override the API path
for OpenAI-compatible endpoints. When set, the suffix replaces the default
/v1/<path> pattern, enabling use with endpoints that don't accept /v1/
prefixes (e.g. /chat/completions instead of /v1/chat/completions).

Changes:
- ProviderConfigToml (config crate): path_suffix field
- ProviderConfig (tui crate): path_suffix field
- merge_provider_config: propagates path_suffix
- merge_project_provider_config: propagates path_suffix
- api_url: delegates to new api_url_with_suffix function
- api_url_with_suffix: uses suffix when present, skips /v1 versioning
- DeepSeekClient: reads path_suffix from config, passes to URL builder
- config.example.toml: documents the new option
- Tests for the new URL building behavior

Closes #2089
2026-06-03 15:34:24 -07:00
cyq 45562822f0 feat(agent): classify model families 2026-06-03 15:34:12 -07:00
reidliu41 195dd6b9ab fix(tui): hide shell prompt guidance when shell is disabled
Thread allow_shell into system prompt composition and remove shell-only guidance
  when shell tools are not available.

  This keeps the prompt aligned with the runtime tool catalog and prevents the
  model from trying exec_shell or task_shell_* after allow_shell = false.
2026-06-03 15:28:29 -07:00
xyuai dba332e8d5 fix(tui): persist provider switches to config 2026-06-03 15:28:17 -07:00
Hunter Bown 260ee737b0 style: cargo fmt 2026-06-03 15:18:19 -07:00
Hunter Bown be7a3e7e69 fix(tui): provider picker r shortcut with modifier guard
- add r/R shortcut to re-enter API key for any provider in picker
- guard against Ctrl/Alt/Meta modifiers (only plain r triggers)
- dynamic footer: 'apply' when key exists, 'set key' otherwise
- add 'R edit key' hint to picker footer
- add route/model to scoped auth status output
- add tests for r shortcut, ctrl-r guard, footer text, and route/model

Ports #2717 with review fix. Fixes #2662.
2026-06-03 15:14:39 -07:00
Hunter Bown 5719301d1e fix(auth): all-provider auth status and scoped logout
- auth status shows every known provider with config/keyring/env status
- auth status --provider <id> shows detailed single-provider info
- auth list now probes keyring for all providers (was only active)
- /logout clears only the active provider's key (was clearing all)
- add clear_active_provider_api_key for scoped TOML key removal
- add Huggingface to ProviderArg enum
- add auth status tests for all-provider and scoped views

Fixes #2716
2026-06-03 15:08:28 -07:00
Hunter Bown d9ca5fbbff docs(tui): mirror v0.8.53 changelog 2026-06-03 14:43:08 -07:00
Hunter Bown 28a0f19c13 fix(provider): polish v0.8.53 routing and shell gating 2026-06-03 14:40:25 -07:00
Hunter Bown 5786584767 chore(release): bump workspace to 0.8.53 2026-06-03 12:39:01 -07:00
Hunter Bown ed4ec3f799 Merge branch 'codex/v0.8.53-deprecate-whale-md' into codex/v0.8.53 2026-06-03 12:38:00 -07:00
Hunter Bown 8bc994e492 Merge branch 'codex/v0.8.53-tool-deferred-ux' into codex/v0.8.53 2026-06-03 12:37:53 -07:00
Hunter Bown a10e17a62a fix(context): prefer global AGENTS over WHALE 2026-06-03 12:37:39 -07:00
Hunter Bown f5c8d7e5c5 fix(subagent): align advertised role aliases 2026-06-03 12:37:39 -07:00
Hunter Bown 025089494b fix(rlm): include session object in source hints 2026-06-03 12:37:39 -07:00
Hunter Bown fc8ad7b3a8 feat(project): enrich repo constitution (invariants, branch policy, escalation)
Per the layered-authority clarification (base myth → global Constitution → repo
constitution = local law → task packet → runtime policy), extend
.codewhale/constitution.json beyond authority+verification with optional:

- protected_invariants — repo invariants the agent must not break
- branch_policy — branch/release policy in effect
- escalate_when — conditions to stop and escalate to the user

All optional; rendered as concise model-facing prose. The global Brother Whale
identity anchor and Constitution in prompts/base.md are unchanged (verified
untouched on this branch). Dogfood constitution.json filled with CodeWhale's
real invariants (prefix-cache byte-stability, transcript replay, stable Rust,
cli/tui parity), branch policy (codex/v0.8.53), and escalation rules. Docs note
the layered hierarchy.

cargo test -p codewhale-tui --bins → 3946 passed; clippy clean.
2026-06-03 12:16:06 -07:00
Hunter Bown 9d9616e898 feat(project): deprecate WHALE.md; add .codewhale/constitution.json authority layer
Splits repo-level guidance into two clear artifacts and deprecates the
confusing WHALE.md concept (overlapped with AGENTS.md):

- AGENTS.md is the canonical cross-agent project-instructions file.
- .codewhale/constitution.json is the CodeWhale-specific repo authority /
  prioritization policy (when local sources conflict, which to trust first; what
  to verify before claiming done). Rendered into the system prompt as a
  higher-authority <codewhale_repo_constitution> block; takes precedence over a
  legacy WHALE.md.

WHALE.md migration (compat-preserving):
- AGENTS.md now ranks above WHALE.md in both project and global discovery; with
  both present, AGENTS.md wins.
- WHALE.md is still read as a legacy fallback, but now emits a deprecation
  warning and is never created or recommended (init.rs no longer suggests it).
- Discovery/docs updated; the global CodeWhale Constitution in prompts/base.md
  is unaffected (different thing).

constitution.json:
- New RepoConstitution (serde, all fields optional, unknown fields ignored,
  schema_version checked). Discovered at .codewhale/constitution.json in the
  workspace or any parent up to the git root. Malformed JSON warns, never panics.
- Loaded after the auto-generate fallback so it can't be clobbered.

.gitignore: ignore .codewhale/ contents at any depth EXCEPT the committed
constitution.json (a directory exclude can't be negated, so **/.codewhale/* +
negation). init.rs writes the same pattern for new repos. Dogfood: this repo's
.codewhale/constitution.json added.

find_git_root made pub(crate) and reused (no duplicate loader).

Tests: AGENTS-over-WHALE precedence, WHALE legacy-read-with-warning,
constitution render + system-block surfacing, malformed-constitution warning,
gitignore-keeps-constitution. cargo test -p codewhale-tui --bins → 3946 passed;
clippy clean.

Targets codex/v0.8.53.
2026-06-03 12:12:34 -07:00
Hunter Bown 7bbc6b78e4 fix(tools): activate read-only git history + actionable RLM/field errors
v0.8.53 tool/deferred/error UX (PR group 4), low-risk subset:

- #2654: add git_log and git_show to DEFAULT_ACTIVE_NATIVE_TOOLS so read-only
  git history joins git_diff/git_status in the active partition (kept
  alphabetical → prefix-cache head stays sorted/byte-stable). git_blame and
  other history tools remain deferred.
- #2655: rlm_open's source-count error now echoes common misnamed fields with a
  "did you mean file_path/content/url" hint; rlm_eval's missing-`code` error
  explains it runs raw Python and shows an example. Schema descriptions for
  rlm_eval name/code sharpened.
- #2659: likely_field_corrections gains RLM source-field rename hints (the
  role/type vocabulary change itself lives in the WS3 PR #2684 to avoid a
  double-edit of normalize_role_alias).

Deferred to the medium-risk batch: #2648 (render deferred-tool hydration
distinctly from "done") — needs a ToolStatus/cell-build change with wider
render blast radius than this low-risk PR.

Verification: cargo test -p codewhale-tui --bins → 3944 passed, 0 failed
(incl. prefix-cache sort invariant); cargo clippy clean.
Targets codex/v0.8.53.
2026-06-03 11:31:33 -07:00
Hunter Bown 725abeb603 fix(subagent): clearer role vocab, lifecycle signals, and eval ergonomics
Make the sub-agent surface easier for less-capable models to drive:

- Unify role/type vocabulary (#2649): normalize_role_alias now accepts the
  full set SubAgentType::from_str accepts (reviewer/implementer/verifier/...),
  and SubAgentType::from_str learns `planner`, so the dual-validation pass no
  longer rejects natural roles with a stale four-value hint. Error strings and
  schema descriptions now enumerate the real accepted aliases.
- agent_eval/agent_close always active (#2605) so a first call executes instead
  of hydrating its schema and forcing a double-invoke; both accept an
  `agent_name` session alias (#2650).
- Self-diagnosing name conflicts (#2656): the duplicate-name error names the
  conflicting agent_id and its status.
- Self-describing completion sentinels (#2658): subagent.done now carries
  result_clipped / summary_complete / next_action so the parent knows whether
  to trust the previous-line summary or call agent_eval.
- Actionable child-model-unavailable diagnostics (#2653): a provider 403/404
  is annotated with the model id and recovery path instead of a bare error.

Tests: role vocabulary acceptance + error wording, agent_name resolution,
duplicate-name diagnostics, clipped-result sentinel, child-model annotation,
agent_eval/agent_close default-active. Full tui suite green (3948), clippy clean.

Targets codex/v0.8.53 (v0.8.53 stabilization).
2026-06-03 11:22:56 -07:00