Commit Graph

2188 Commits

Author SHA1 Message Date
Hunter B edd28066e1 chore(release): v0.8.54 — benchmark harness runners, MiMo routing 2026-06-08 06:47:21 -07:00
Hunter B ce46e29e38 fix(benchmarks): fix workspace file copying and add LLM judge grading
Two bugs from the initial run:
1. workspace_files format is [{source, dest}] not {path, content} —
   files live in PinchBench's assets/ directory, not tasks/. Now checks
   both tasks/ and assets/ directories.
2. LLM judge tasks (writing, research) scored 0% because the judge
   wasn't implemented. Now uses codewhale exec as the judge — sends
   the rubric + workspace contents and parses a JSON score response.

Also strips ANSI escape codes and control characters from judge output
to prevent JSON parse failures.
2026-06-05 15:57:06 -07:00
Hunter B c8fcef7f1e feat(benchmarks): add CodeWhale-native PinchBench runner
Runs PinchBench tasks directly through codewhale exec --auto instead
of going through OpenClaw. Loads task markdown, creates workspace,
runs the prompt, and grades using PinchBench's embedded automated
checks.

No external agent framework dependency — just codewhale + pyyaml.
2026-06-04 20:26:05 -07:00
Hunter B b7798ba0f6 feat(benchmarks): default PinchBench to direct MiMo routing, auto-read config
PinchBench runner now defaults to direct Xiaomi API (no OpenRouter).
Reads API key from ~/.codewhale/config.toml [providers.xiaomi_mimo]
when XIAOMI_MIMO_API_KEY env var is not set. --openrouter flag for
the old OpenRouter path.
2026-06-04 19:38:46 -07:00
Hunter B a5f27aae3a feat(benchmarks): default PinchBench to MiMo v2.5 Pro, add direct-mimo routing
PinchBench runner now defaults to openrouter/xiaomi/mimo-v2.5-pro instead
of deepseek/deepseek-chat. Adds --direct-mimo flag for routing through
Xiaomi's API directly (bypasses OpenRouter), with tp-/sk- key type
detection and endpoint mismatch warnings.

Harbor adapter gains --provider CLI flag for MiMo provider routing.

Known issues documented in docs/MIMO_BENCHMARK_ISSUES.md:
- PinchBench model validation requires OpenRouter prefix
- OPENROUTER_API_KEY needed even for some direct-provider paths
- Token Plan vs pay-as-you-go key/endpoint mismatch
- PinchBench runs through OpenClaw, not CodeWhale
2026-06-04 19:33:43 -07:00
Hunter B b329a532f5 feat(benchmarks): add SWE-bench, Terminal-Bench, and PinchBench integration
Benchmark harness for evaluating CodeWhale against three external
benchmarks:

- SWE-bench: batch driver wrapping existing codewhale swebench commands
- Terminal-Bench: Harbor adapter (BaseInstalledAgent) for container eval
- PinchBench: runner with auto-install for real-world agent tasks

Includes docs/BENCHMARKS.md umbrella doc with setup, usage, and
reproducibility checklist. Scripts record version/commit/timestamp
metadata for each run.

Branch: codex/v0.8.53-benchmarks (based on v0.8.53)
2026-06-04 19:22:06 -07:00
Hunter Bown 8dff2f7525 fix(tui): guard xiaomi mimo defaults test against CI env vars 2026-06-03 16:25:04 -07:00
Hunter Bown 772ec46c98 chore(release): v0.8.53 — Arcee support, telegram bridge, provider fixes
- Fix Rust syntax/clippy fallout in client.rs, cli/src/lib.rs, web_search.rs
- Fix 0.8.53 release metadata: changelog links, TUI changelog, npm wrapper
- Update visible help copy for multi-provider support
- Add telegram-bridge integration with deploy configs
- Add US remote VM quickstart doc
- Update Tencent Cloud deploy scripts and docs
- Bump npm wrapper to 0.8.53
2026-06-03 16:12:38 -07:00
Hunter Bown f884ceb6af docs(readme): credit xyuai and RefuseOdd for v0.8.53 contributions 2026-06-03 15:43:05 -07:00
RefuseOdd 8b0e1cc3c0 Limit path suffix to chat completions 2026-06-03 15:34:24 -07:00
RefuseOdd d2999bb402 Add path_suffix to ProviderConfigToml and ProviderConfig
Adds an optional path_suffix field that lets users override the API path
for OpenAI-compatible endpoints. When set, the suffix replaces the default
/v1/<path> pattern, enabling use with endpoints that don't accept /v1/
prefixes (e.g. /chat/completions instead of /v1/chat/completions).

Changes:
- ProviderConfigToml (config crate): path_suffix field
- ProviderConfig (tui crate): path_suffix field
- merge_provider_config: propagates path_suffix
- merge_project_provider_config: propagates path_suffix
- api_url: delegates to new api_url_with_suffix function
- api_url_with_suffix: uses suffix when present, skips /v1 versioning
- DeepSeekClient: reads path_suffix from config, passes to URL builder
- config.example.toml: documents the new option
- Tests for the new URL building behavior

Closes #2089
2026-06-03 15:34:24 -07:00
cyq 45562822f0 feat(agent): classify model families 2026-06-03 15:34:12 -07:00
reidliu41 195dd6b9ab fix(tui): hide shell prompt guidance when shell is disabled
Thread allow_shell into system prompt composition and remove shell-only guidance
  when shell tools are not available.

  This keeps the prompt aligned with the runtime tool catalog and prevents the
  model from trying exec_shell or task_shell_* after allow_shell = false.
2026-06-03 15:28:29 -07:00
xyuai dba332e8d5 fix(tui): persist provider switches to config 2026-06-03 15:28:17 -07:00
Hunter Bown 260ee737b0 style: cargo fmt 2026-06-03 15:18:19 -07:00
Hunter Bown be7a3e7e69 fix(tui): provider picker r shortcut with modifier guard
- add r/R shortcut to re-enter API key for any provider in picker
- guard against Ctrl/Alt/Meta modifiers (only plain r triggers)
- dynamic footer: 'apply' when key exists, 'set key' otherwise
- add 'R edit key' hint to picker footer
- add route/model to scoped auth status output
- add tests for r shortcut, ctrl-r guard, footer text, and route/model

Ports #2717 with review fix. Fixes #2662.
2026-06-03 15:14:39 -07:00
Hunter Bown 3f8e02d6cf docs(readme): add Hugging Face provider to all localized READMEs 2026-06-03 15:11:31 -07:00
Hunter Bown 4b990e190c docs(rfc): file decomposition plan for v0.9.0 2026-06-03 15:08:31 -07:00
Hunter Bown 5719301d1e fix(auth): all-provider auth status and scoped logout
- auth status shows every known provider with config/keyring/env status
- auth status --provider <id> shows detailed single-provider info
- auth list now probes keyring for all providers (was only active)
- /logout clears only the active provider's key (was clearing all)
- add clear_active_provider_api_key for scoped TOML key removal
- add Huggingface to ProviderArg enum
- add auth status tests for all-provider and scoped views

Fixes #2716
2026-06-03 15:08:28 -07:00
Hunter Bown d9ca5fbbff docs(tui): mirror v0.8.53 changelog 2026-06-03 14:43:08 -07:00
Hunter Bown 28a0f19c13 fix(provider): polish v0.8.53 routing and shell gating 2026-06-03 14:40:25 -07:00
Hunter Bown 5786584767 chore(release): bump workspace to 0.8.53 2026-06-03 12:39:01 -07:00
Hunter Bown ed4ec3f799 Merge branch 'codex/v0.8.53-deprecate-whale-md' into codex/v0.8.53 2026-06-03 12:38:00 -07:00
Hunter Bown d5c6856754 Merge branch 'codex/v0.8.53-toolsurface-design-docs' into codex/v0.8.53 2026-06-03 12:37:57 -07:00
Hunter Bown 8bc994e492 Merge branch 'codex/v0.8.53-tool-deferred-ux' into codex/v0.8.53 2026-06-03 12:37:53 -07:00
Hunter Bown a10e17a62a fix(context): prefer global AGENTS over WHALE 2026-06-03 12:37:39 -07:00
Hunter Bown aa4c734602 docs: align v0.8.53 tool surface notes 2026-06-03 12:37:39 -07:00
Hunter Bown f5c8d7e5c5 fix(subagent): align advertised role aliases 2026-06-03 12:37:39 -07:00
Hunter Bown 025089494b fix(rlm): include session object in source hints 2026-06-03 12:37:39 -07:00
Hunter Bown fc8ad7b3a8 feat(project): enrich repo constitution (invariants, branch policy, escalation)
Per the layered-authority clarification (base myth → global Constitution → repo
constitution = local law → task packet → runtime policy), extend
.codewhale/constitution.json beyond authority+verification with optional:

- protected_invariants — repo invariants the agent must not break
- branch_policy — branch/release policy in effect
- escalate_when — conditions to stop and escalate to the user

All optional; rendered as concise model-facing prose. The global Brother Whale
identity anchor and Constitution in prompts/base.md are unchanged (verified
untouched on this branch). Dogfood constitution.json filled with CodeWhale's
real invariants (prefix-cache byte-stability, transcript replay, stable Rust,
cli/tui parity), branch policy (codex/v0.8.53), and escalation rules. Docs note
the layered hierarchy.

cargo test -p codewhale-tui --bins → 3946 passed; clippy clean.
2026-06-03 12:16:06 -07:00
Hunter Bown 9d9616e898 feat(project): deprecate WHALE.md; add .codewhale/constitution.json authority layer
Splits repo-level guidance into two clear artifacts and deprecates the
confusing WHALE.md concept (overlapped with AGENTS.md):

- AGENTS.md is the canonical cross-agent project-instructions file.
- .codewhale/constitution.json is the CodeWhale-specific repo authority /
  prioritization policy (when local sources conflict, which to trust first; what
  to verify before claiming done). Rendered into the system prompt as a
  higher-authority <codewhale_repo_constitution> block; takes precedence over a
  legacy WHALE.md.

WHALE.md migration (compat-preserving):
- AGENTS.md now ranks above WHALE.md in both project and global discovery; with
  both present, AGENTS.md wins.
- WHALE.md is still read as a legacy fallback, but now emits a deprecation
  warning and is never created or recommended (init.rs no longer suggests it).
- Discovery/docs updated; the global CodeWhale Constitution in prompts/base.md
  is unaffected (different thing).

constitution.json:
- New RepoConstitution (serde, all fields optional, unknown fields ignored,
  schema_version checked). Discovered at .codewhale/constitution.json in the
  workspace or any parent up to the git root. Malformed JSON warns, never panics.
- Loaded after the auto-generate fallback so it can't be clobbered.

.gitignore: ignore .codewhale/ contents at any depth EXCEPT the committed
constitution.json (a directory exclude can't be negated, so **/.codewhale/* +
negation). init.rs writes the same pattern for new repos. Dogfood: this repo's
.codewhale/constitution.json added.

find_git_root made pub(crate) and reused (no duplicate loader).

Tests: AGENTS-over-WHALE precedence, WHALE legacy-read-with-warning,
constitution render + system-block surfacing, malformed-constitution warning,
gitignore-keeps-constitution. cargo test -p codewhale-tui --bins → 3946 passed;
clippy clean.

Targets codex/v0.8.53.
2026-06-03 12:12:34 -07:00
Hunter Bown 8cb4f94f30 docs: v0.8.53 tool-surface-diet design + north-star direction
Design-only deliverables for the v0.8.53 "tool surface diet / canonical
surfaces" cutover (no catalog code in this cycle). Grounded in a verified
inventory of the actual tool registry.

- docs/TOOL_LIFECYCLE.md (#2681): the umbrella policy. Five lifecycle states
  (active / deferred / hidden-compatibility / deprecated / removed) modeled as
  const name-sets + an alias table in tool_catalog.rs (not a per-ToolSpec
  field), so registration stays untouched and old transcripts always replay.
  Includes the deprecation manifest (exec_wait/exec_interact/tts →
  hidden-compat; todo_* → checklist_* deprecated; 11 legacy subagent names are
  already non-visible dead code → cleanup + guardrail), per-mode/per-provider
  active-catalog budget (incl. Arcee's 8-tool first-turn set), prefix-cache
  safety rules, and the tool_agent decision: canonical but DeepSeek-V4-gated.
- docs/CODEBASE_SEARCH_DESIGN.md (#2680, v0.9.0): local-first FTS5/BM25 +
  symbol/path ranking + RRF hybrid; rusqlite storage; mtime/branch/vendor
  invalidation; an explainable tool contract returning reasons[]; and a real
  CodeWhale query eval set. Complements grep_files/file_search, never replaces.
- docs/SKILL_INVOCATION_DESIGN.md (0.9.0): the $<skill-name> inline invocation
  syntax (the token IS the skill name), namespaced resolution, ambiguity-
  suggests-not-guesses, visible activation line, and a smallest-viable slice.
- docs/VISION_NORTH_STAR.md (0.9.0+): intent router, hybrid codebase
  intelligence, WhaleFlow typed workflow IR, skills/rules runtime, the layered
  context-memory stack, tool repair/autoload, the evaluation loop, and the
  command-surface taxonomy (/memory small · /context dashboard · /rules ·
  /workflow · /overlay · $<skill> · codebase_search). Marked DIRECTION, not
  committed 0.8.53 work; also records the deferred-not-done diet items.

Targets codex/v0.8.53.
2026-06-03 11:47:29 -07:00
Hunter Bown 7bbc6b78e4 fix(tools): activate read-only git history + actionable RLM/field errors
v0.8.53 tool/deferred/error UX (PR group 4), low-risk subset:

- #2654: add git_log and git_show to DEFAULT_ACTIVE_NATIVE_TOOLS so read-only
  git history joins git_diff/git_status in the active partition (kept
  alphabetical → prefix-cache head stays sorted/byte-stable). git_blame and
  other history tools remain deferred.
- #2655: rlm_open's source-count error now echoes common misnamed fields with a
  "did you mean file_path/content/url" hint; rlm_eval's missing-`code` error
  explains it runs raw Python and shows an example. Schema descriptions for
  rlm_eval name/code sharpened.
- #2659: likely_field_corrections gains RLM source-field rename hints (the
  role/type vocabulary change itself lives in the WS3 PR #2684 to avoid a
  double-edit of normalize_role_alias).

Deferred to the medium-risk batch: #2648 (render deferred-tool hydration
distinctly from "done") — needs a ToolStatus/cell-build change with wider
render blast radius than this low-risk PR.

Verification: cargo test -p codewhale-tui --bins → 3944 passed, 0 failed
(incl. prefix-cache sort invariant); cargo clippy clean.
Targets codex/v0.8.53.
2026-06-03 11:31:33 -07:00
Hunter Bown 725abeb603 fix(subagent): clearer role vocab, lifecycle signals, and eval ergonomics
Make the sub-agent surface easier for less-capable models to drive:

- Unify role/type vocabulary (#2649): normalize_role_alias now accepts the
  full set SubAgentType::from_str accepts (reviewer/implementer/verifier/...),
  and SubAgentType::from_str learns `planner`, so the dual-validation pass no
  longer rejects natural roles with a stale four-value hint. Error strings and
  schema descriptions now enumerate the real accepted aliases.
- agent_eval/agent_close always active (#2605) so a first call executes instead
  of hydrating its schema and forcing a double-invoke; both accept an
  `agent_name` session alias (#2650).
- Self-diagnosing name conflicts (#2656): the duplicate-name error names the
  conflicting agent_id and its status.
- Self-describing completion sentinels (#2658): subagent.done now carries
  result_clipped / summary_complete / next_action so the parent knows whether
  to trust the previous-line summary or call agent_eval.
- Actionable child-model-unavailable diagnostics (#2653): a provider 403/404
  is annotated with the model id and recovery path instead of a bare error.

Tests: role vocabulary acceptance + error wording, agent_name resolution,
duplicate-name diagnostics, clipped-result sentinel, child-model annotation,
agent_eval/agent_close default-active. Full tui suite green (3948), clippy clean.

Targets codex/v0.8.53 (v0.8.53 stabilization).
2026-06-03 11:22:56 -07:00
Hunter Bown 03d1bba538 Merge pull request #2630 from Hmbown/codex/v0.8.52-home-cost-fixes
fix(release): tighten 0.8.52 home and cost accounting
2026-06-03 03:44:40 -07:00
Hunter B b965d2ecd5 fix(release): tighten 0.8.52 home and cost accounting 2026-06-03 03:35:46 -07:00
Hunter Bown c8ce2b8e92 Merge pull request #2626 from Hmbown/codex/v0.8.52-stabilization
fix(release): stabilize v0.8.52
2026-06-03 03:07:40 -07:00
Hunter B 32e6aa5e17 fix(tui): keep work panel summary during lock misses
Co-authored-by: Hanmiao Li <894876246@qq.com>
2026-06-03 02:59:17 -07:00
Hunter B 14c882be53 fix(provider): expose siliconflow-cn registry coverage 2026-06-03 02:51:42 -07:00
Hunter B 54446e6c07 fix(release): stabilize v0.8.52 2026-06-03 02:39:45 -07:00
Hunter Bown 25340d17a7 feat(provider): add SiliconFlow China region (siliconflow-CN) (#2615)
Adds SiliconFlow China regional endpoint (api.siliconflow.cn) as new provider variant.

Credit: @Raid10Without1 (PR #2588)

Co-authored-by: Raid10没有1 <88494433+Raid10Without1@users.noreply.github.com>
2026-06-02 21:27:40 -07:00
Hunter Bown dd26114697 feat(tui): send /attach images as multimodal content (#2584, #2587) (#2607)
Adds OpenAI-compatible image_url content blocks to the chat message
model, wiring attached images through build_chat_messages_with_reasoning
as multimodal user-content arrays. When images are present, user
messages emit a content array of text + image_url parts instead of a
plain string, matching the OpenAI vision API shape.

- models.rs: new ImageUrlContent struct, ContentBlock::ImageUrl variant
- client/chat.rs: image_parts collection, multimodal wire format for
  user messages, image-aware message inspection, stream-event no-op
- Exhaustiveness arms added across 10 files (compaction, seam_manager,
  capacity_flow, purge, notifications, session_picker, utils,
  working_set, rlm/session, runtime_api)
- Test: request_builder_emits_openai_image_url_parts_for_user_images

Credit: @xyuai (PR #2587 — root cause + initial implementation)
Closes: #2584

Co-authored-by: xyuai <xyuai@users.noreply.github.com>
2026-06-02 21:27:31 -07:00
AresNing 8981d5c5fd feat: add subagent lifecycle hooks
Add subagent lifecycle hooks for better control over subagent initialization and teardown.
2026-06-02 20:48:09 -07:00
Gordon b4691bc082 feat(i18n): localize context-inspector surface across 7 locales
Localize the context-inspector surface across 7 locales for improved internationalization support.
2026-06-02 20:47:53 -07:00
Justin Gao 29acb87a9d feat(engine): inject mode-change runtime message and include mode in turn metadata
Inject a mode-change runtime message into the engine and include mode information in turn metadata for better tracking.
2026-06-02 20:47:36 -07:00
Hanmiao Li 1781312c7a feat(tui): add drag-to-resize sidebar width
Add drag-to-resize functionality for the TUI sidebar width, allowing users to interactively resize the sidebar.
2026-06-02 20:47:18 -07:00
Hunter B 2721b2a077 Merge branch 'codex/v0.8.51-arcee-provider' — v0.8.51 release
v0.8.51: Arcee provider, cycle removal, compaction improvements,
TUI fixes, model persistence, and community harvest.
2026-06-02 20:38:10 -07:00
Hunter B 541926eb38 docs(changelog): backfill v0.8.51 Fixed entries + Community credits
Adds 8 missing Fixed entries for commits that landed after the
release-prep commit (06612495f): DEC CSI fragment fix, engine panic
recovery, nested file-picker, command-palette scroll, .NET/Windows
env, config key warnings, diff-render whitespace, and model
persistence. Adds Community credits for contributors whose work
landed or shaped this release cycle.
2026-06-02 20:37:52 -07:00
Hunter B c000bd7e60 harvest(v0.8.51): diff-render whitespace fix + schema dead_code + model persistence + prompt updates
- fix(diff-render): preserve leading whitespace in patch content lines
  Credit: @zlh124 (PR #2591), with extra-space bug fixed.
- fix(tui): allow unused schema migration registry
  Credit: @reidliu41 (PR #2601).
- feat(tui): persist per-provider model selection from /model command
- docs(prompts): prefer gh --json CLI for GitHub triage in agent instructions
2026-06-02 20:30:31 -07:00
Claude f886f28acf test(tui): update walk-depth test for new default depth (#2488)
`workspace_completions_honor_configured_walk_depth` placed its probe file at
component depth 9 and asserted the *default* walk excludes it — true at the
old default (6) but not the new one (10). Move the probe to depth 12 so it
stays past the default while remaining within the explicit deeper walk (16)
and the unlimited (0) cases the test also exercises.

https://claude.ai/code/session_01MQrnh6wHfrEYN5BBdMarC1
2026-06-03 01:41:13 +00:00