PinchBench runner now defaults to openrouter/xiaomi/mimo-v2.5-pro instead
of deepseek/deepseek-chat. Adds --direct-mimo flag for routing through
Xiaomi's API directly (bypasses OpenRouter), with tp-/sk- key type
detection and endpoint mismatch warnings.
Harbor adapter gains --provider CLI flag for MiMo provider routing.
Known issues documented in docs/MIMO_BENCHMARK_ISSUES.md:
- PinchBench model validation requires OpenRouter prefix
- OPENROUTER_API_KEY needed even for some direct-provider paths
- Token Plan vs pay-as-you-go key/endpoint mismatch
- PinchBench runs through OpenClaw, not CodeWhale
Benchmark harness for evaluating CodeWhale against three external
benchmarks:
- SWE-bench: batch driver wrapping existing codewhale swebench commands
- Terminal-Bench: Harbor adapter (BaseInstalledAgent) for container eval
- PinchBench: runner with auto-install for real-world agent tasks
Includes docs/BENCHMARKS.md umbrella doc with setup, usage, and
reproducibility checklist. Scripts record version/commit/timestamp
metadata for each run.
Branch: codex/v0.8.53-benchmarks (based on v0.8.53)
Per the layered-authority clarification (base myth → global Constitution → repo
constitution = local law → task packet → runtime policy), extend
.codewhale/constitution.json beyond authority+verification with optional:
- protected_invariants — repo invariants the agent must not break
- branch_policy — branch/release policy in effect
- escalate_when — conditions to stop and escalate to the user
All optional; rendered as concise model-facing prose. The global Brother Whale
identity anchor and Constitution in prompts/base.md are unchanged (verified
untouched on this branch). Dogfood constitution.json filled with CodeWhale's
real invariants (prefix-cache byte-stability, transcript replay, stable Rust,
cli/tui parity), branch policy (codex/v0.8.53), and escalation rules. Docs note
the layered hierarchy.
cargo test -p codewhale-tui --bins → 3946 passed; clippy clean.
Splits repo-level guidance into two clear artifacts and deprecates the
confusing WHALE.md concept (overlapped with AGENTS.md):
- AGENTS.md is the canonical cross-agent project-instructions file.
- .codewhale/constitution.json is the CodeWhale-specific repo authority /
prioritization policy (when local sources conflict, which to trust first; what
to verify before claiming done). Rendered into the system prompt as a
higher-authority <codewhale_repo_constitution> block; takes precedence over a
legacy WHALE.md.
WHALE.md migration (compat-preserving):
- AGENTS.md now ranks above WHALE.md in both project and global discovery; with
both present, AGENTS.md wins.
- WHALE.md is still read as a legacy fallback, but now emits a deprecation
warning and is never created or recommended (init.rs no longer suggests it).
- Discovery/docs updated; the global CodeWhale Constitution in prompts/base.md
is unaffected (different thing).
constitution.json:
- New RepoConstitution (serde, all fields optional, unknown fields ignored,
schema_version checked). Discovered at .codewhale/constitution.json in the
workspace or any parent up to the git root. Malformed JSON warns, never panics.
- Loaded after the auto-generate fallback so it can't be clobbered.
.gitignore: ignore .codewhale/ contents at any depth EXCEPT the committed
constitution.json (a directory exclude can't be negated, so **/.codewhale/* +
negation). init.rs writes the same pattern for new repos. Dogfood: this repo's
.codewhale/constitution.json added.
find_git_root made pub(crate) and reused (no duplicate loader).
Tests: AGENTS-over-WHALE precedence, WHALE legacy-read-with-warning,
constitution render + system-block surfacing, malformed-constitution warning,
gitignore-keeps-constitution. cargo test -p codewhale-tui --bins → 3946 passed;
clippy clean.
Targets codex/v0.8.53.
Design-only deliverables for the v0.8.53 "tool surface diet / canonical
surfaces" cutover (no catalog code in this cycle). Grounded in a verified
inventory of the actual tool registry.
- docs/TOOL_LIFECYCLE.md (#2681): the umbrella policy. Five lifecycle states
(active / deferred / hidden-compatibility / deprecated / removed) modeled as
const name-sets + an alias table in tool_catalog.rs (not a per-ToolSpec
field), so registration stays untouched and old transcripts always replay.
Includes the deprecation manifest (exec_wait/exec_interact/tts →
hidden-compat; todo_* → checklist_* deprecated; 11 legacy subagent names are
already non-visible dead code → cleanup + guardrail), per-mode/per-provider
active-catalog budget (incl. Arcee's 8-tool first-turn set), prefix-cache
safety rules, and the tool_agent decision: canonical but DeepSeek-V4-gated.
- docs/CODEBASE_SEARCH_DESIGN.md (#2680, v0.9.0): local-first FTS5/BM25 +
symbol/path ranking + RRF hybrid; rusqlite storage; mtime/branch/vendor
invalidation; an explainable tool contract returning reasons[]; and a real
CodeWhale query eval set. Complements grep_files/file_search, never replaces.
- docs/SKILL_INVOCATION_DESIGN.md (0.9.0): the $<skill-name> inline invocation
syntax (the token IS the skill name), namespaced resolution, ambiguity-
suggests-not-guesses, visible activation line, and a smallest-viable slice.
- docs/VISION_NORTH_STAR.md (0.9.0+): intent router, hybrid codebase
intelligence, WhaleFlow typed workflow IR, skills/rules runtime, the layered
context-memory stack, tool repair/autoload, the evaluation loop, and the
command-surface taxonomy (/memory small · /context dashboard · /rules ·
/workflow · /overlay · $<skill> · codebase_search). Marked DIRECTION, not
committed 0.8.53 work; also records the deferred-not-done diet items.
Targets codex/v0.8.53.
Make the sub-agent surface easier for less-capable models to drive:
- Unify role/type vocabulary (#2649): normalize_role_alias now accepts the
full set SubAgentType::from_str accepts (reviewer/implementer/verifier/...),
and SubAgentType::from_str learns `planner`, so the dual-validation pass no
longer rejects natural roles with a stale four-value hint. Error strings and
schema descriptions now enumerate the real accepted aliases.
- agent_eval/agent_close always active (#2605) so a first call executes instead
of hydrating its schema and forcing a double-invoke; both accept an
`agent_name` session alias (#2650).
- Self-diagnosing name conflicts (#2656): the duplicate-name error names the
conflicting agent_id and its status.
- Self-describing completion sentinels (#2658): subagent.done now carries
result_clipped / summary_complete / next_action so the parent knows whether
to trust the previous-line summary or call agent_eval.
- Actionable child-model-unavailable diagnostics (#2653): a provider 403/404
is annotated with the model id and recovery path instead of a bare error.
Tests: role vocabulary acceptance + error wording, agent_name resolution,
duplicate-name diagnostics, clipped-result sentinel, child-model annotation,
agent_eval/agent_close default-active. Full tui suite green (3948), clippy clean.
Targets codex/v0.8.53 (v0.8.53 stabilization).
Refs #2569
Harvests the safe part of PR #2569 by allowing AtlasCloud provider-hinted namespaced model IDs to route exactly as requested, without freezing a volatile provider model catalog in the static registry.
Co-authored-by: lucaszhu-hue <lucas.zhu@atlascloud.ai>
- Define UnStrStr macro for uninstaller string functions
- Use un.StrStr instead of StrStr in uninstaller context
- Rewrite un.RemoveFromPath with correct offset calculations
and semicolon handling to prevent PATH corruption
- Use dynamic version fetch from GitHub API in CLASSROOM_INSTALL.md
Closes#1983
- Add scripts/installer/codewhale.nsi: NSIS installer that installs both
codewhale.exe and codewhale-tui.exe to %LOCALAPPDATA%\Programs\CodeWhale\bin,
adds to current-user PATH, and includes an uninstaller that cleans PATH
- Add docs/CLASSROOM_INSTALL.md: step-by-step checklist for IT admins
deploying CodeWhale in labs/classrooms, covering silent install, manual
fallback, API key provisioning, imaging notes, and troubleshooting
- Update docs/INSTALL.md: add Windows NSIS Installer section referencing
the new installer and classroom checklist
Support `! <command>` and `!command` in the TUI composer to run shell commands through the existing exec_shell path.
The shortcut keeps normal approval, sandbox, policy, transcript, and work-panel handling, while avoiding model context
pollution from local-only tool results.
Refs #1546
Refs #1722
Preserves auto_compact as opt-in, adds the saved threshold setting, keeps the 500K hard floor, and wires Ctrl+L as a manual compaction shortcut for context-pressure recovery.
Harvested from PR #1723 by @aboimpinto
Co-authored-by: Paulo Aboim Pinto <aboimpinto@gmail.com>
- Fix false 'Turn stalled' during long active turns with running tools.
Add turn_last_activity_at tracking and active-tool awareness to
reconcile_turn_liveness(). Three new tests cover the fix.
- Remove Qwen 3.7 Max OpenRouter preset from registry, picker, docs,
and tests. Qwen 3.7 Max is a hosted model; the preset will return
when an open-weight Qwen 3.7 release ships. MiniMax M3 remains as
a full 1M-context multimodal route.
- Sync root CHANGELOG to crates/tui/CHANGELOG for crates.io packaging.
Update docs/CONFIGURATION.md, docs/PROVIDERS.md, and README to
reflect the Qwen 3.7 removal. Regenerate web facts timestamp.