dgf1988/codewhale

Author	SHA1	Message	Date
Hunter B	edd28066e1	chore(release): v0.8.54 — benchmark harness runners, MiMo routing	2026-06-08 06:47:21 -07:00
Hunter B	ce46e29e38	fix(benchmarks): fix workspace file copying and add LLM judge grading Two bugs from the initial run: 1. workspace_files format is [{source, dest}] not {path, content} — files live in PinchBench's assets/ directory, not tasks/. Now checks both tasks/ and assets/ directories. 2. LLM judge tasks (writing, research) scored 0% because the judge wasn't implemented. Now uses codewhale exec as the judge — sends the rubric + workspace contents and parses a JSON score response. Also strips ANSI escape codes and control characters from judge output to prevent JSON parse failures.	2026-06-05 15:57:06 -07:00
Hunter B	c8fcef7f1e	feat(benchmarks): add CodeWhale-native PinchBench runner Runs PinchBench tasks directly through codewhale exec --auto instead of going through OpenClaw. Loads task markdown, creates workspace, runs the prompt, and grades using PinchBench's embedded automated checks. No external agent framework dependency — just codewhale + pyyaml.	2026-06-04 20:26:05 -07:00
Hunter B	b7798ba0f6	feat(benchmarks): default PinchBench to direct MiMo routing, auto-read config PinchBench runner now defaults to direct Xiaomi API (no OpenRouter). Reads API key from ~/.codewhale/config.toml [providers.xiaomi_mimo] when XIAOMI_MIMO_API_KEY env var is not set. --openrouter flag for the old OpenRouter path.	2026-06-04 19:38:46 -07:00
Hunter B	a5f27aae3a	feat(benchmarks): default PinchBench to MiMo v2.5 Pro, add direct-mimo routing PinchBench runner now defaults to openrouter/xiaomi/mimo-v2.5-pro instead of deepseek/deepseek-chat. Adds --direct-mimo flag for routing through Xiaomi's API directly (bypasses OpenRouter), with tp-/sk- key type detection and endpoint mismatch warnings. Harbor adapter gains --provider CLI flag for MiMo provider routing. Known issues documented in docs/MIMO_BENCHMARK_ISSUES.md: - PinchBench model validation requires OpenRouter prefix - OPENROUTER_API_KEY needed even for some direct-provider paths - Token Plan vs pay-as-you-go key/endpoint mismatch - PinchBench runs through OpenClaw, not CodeWhale	2026-06-04 19:33:43 -07:00
Hunter B	b329a532f5	feat(benchmarks): add SWE-bench, Terminal-Bench, and PinchBench integration Benchmark harness for evaluating CodeWhale against three external benchmarks: - SWE-bench: batch driver wrapping existing codewhale swebench commands - Terminal-Bench: Harbor adapter (BaseInstalledAgent) for container eval - PinchBench: runner with auto-install for real-world agent tasks Includes docs/BENCHMARKS.md umbrella doc with setup, usage, and reproducibility checklist. Scripts record version/commit/timestamp metadata for each run. Branch: codex/v0.8.53-benchmarks (based on v0.8.53)	2026-06-04 19:22:06 -07:00
Hunter Bown	8dff2f7525	fix(tui): guard xiaomi mimo defaults test against CI env vars	2026-06-03 16:25:04 -07:00
Hunter Bown	772ec46c98	chore(release): v0.8.53 — Arcee support, telegram bridge, provider fixes - Fix Rust syntax/clippy fallout in client.rs, cli/src/lib.rs, web_search.rs - Fix 0.8.53 release metadata: changelog links, TUI changelog, npm wrapper - Update visible help copy for multi-provider support - Add telegram-bridge integration with deploy configs - Add US remote VM quickstart doc - Update Tencent Cloud deploy scripts and docs - Bump npm wrapper to 0.8.53	2026-06-03 16:12:38 -07:00
Hunter Bown	f884ceb6af	docs(readme): credit xyuai and RefuseOdd for v0.8.53 contributions	2026-06-03 15:43:05 -07:00
RefuseOdd	8b0e1cc3c0	Limit path suffix to chat completions	2026-06-03 15:34:24 -07:00
RefuseOdd	d2999bb402	Add path_suffix to ProviderConfigToml and ProviderConfig Adds an optional path_suffix field that lets users override the API path for OpenAI-compatible endpoints. When set, the suffix replaces the default /v1/<path> pattern, enabling use with endpoints that don't accept /v1/ prefixes (e.g. /chat/completions instead of /v1/chat/completions). Changes: - ProviderConfigToml (config crate): path_suffix field - ProviderConfig (tui crate): path_suffix field - merge_provider_config: propagates path_suffix - merge_project_provider_config: propagates path_suffix - api_url: delegates to new api_url_with_suffix function - api_url_with_suffix: uses suffix when present, skips /v1 versioning - DeepSeekClient: reads path_suffix from config, passes to URL builder - config.example.toml: documents the new option - Tests for the new URL building behavior Closes #2089	2026-06-03 15:34:24 -07:00
cyq	45562822f0	feat(agent): classify model families	2026-06-03 15:34:12 -07:00
reidliu41	195dd6b9ab	fix(tui): hide shell prompt guidance when shell is disabled Thread allow_shell into system prompt composition and remove shell-only guidance when shell tools are not available. This keeps the prompt aligned with the runtime tool catalog and prevents the model from trying exec_shell or task_shell_* after allow_shell = false.	2026-06-03 15:28:29 -07:00
xyuai	dba332e8d5	fix(tui): persist provider switches to config	2026-06-03 15:28:17 -07:00
Hunter Bown	260ee737b0	style: cargo fmt	2026-06-03 15:18:19 -07:00
Hunter Bown	be7a3e7e69	fix(tui): provider picker r shortcut with modifier guard - add r/R shortcut to re-enter API key for any provider in picker - guard against Ctrl/Alt/Meta modifiers (only plain r triggers) - dynamic footer: 'apply' when key exists, 'set key' otherwise - add 'R edit key' hint to picker footer - add route/model to scoped auth status output - add tests for r shortcut, ctrl-r guard, footer text, and route/model Ports #2717 with review fix. Fixes #2662.	2026-06-03 15:14:39 -07:00
Hunter Bown	3f8e02d6cf	docs(readme): add Hugging Face provider to all localized READMEs	2026-06-03 15:11:31 -07:00
Hunter Bown	4b990e190c	docs(rfc): file decomposition plan for v0.9.0	2026-06-03 15:08:31 -07:00
Hunter Bown	5719301d1e	fix(auth): all-provider auth status and scoped logout - auth status shows every known provider with config/keyring/env status - auth status --provider <id> shows detailed single-provider info - auth list now probes keyring for all providers (was only active) - /logout clears only the active provider's key (was clearing all) - add clear_active_provider_api_key for scoped TOML key removal - add Huggingface to ProviderArg enum - add auth status tests for all-provider and scoped views Fixes #2716	2026-06-03 15:08:28 -07:00
Hunter Bown	d9ca5fbbff	docs(tui): mirror v0.8.53 changelog	2026-06-03 14:43:08 -07:00
Hunter Bown	28a0f19c13	fix(provider): polish v0.8.53 routing and shell gating	2026-06-03 14:40:25 -07:00
Hunter Bown	5786584767	chore(release): bump workspace to 0.8.53	2026-06-03 12:39:01 -07:00
Hunter Bown	ed4ec3f799	Merge branch 'codex/v0.8.53-deprecate-whale-md' into codex/v0.8.53	2026-06-03 12:38:00 -07:00
Hunter Bown	d5c6856754	Merge branch 'codex/v0.8.53-toolsurface-design-docs' into codex/v0.8.53	2026-06-03 12:37:57 -07:00
Hunter Bown	8bc994e492	Merge branch 'codex/v0.8.53-tool-deferred-ux' into codex/v0.8.53	2026-06-03 12:37:53 -07:00
Hunter Bown	a10e17a62a	fix(context): prefer global AGENTS over WHALE	2026-06-03 12:37:39 -07:00
Hunter Bown	aa4c734602	docs: align v0.8.53 tool surface notes	2026-06-03 12:37:39 -07:00
Hunter Bown	f5c8d7e5c5	fix(subagent): align advertised role aliases	2026-06-03 12:37:39 -07:00
Hunter Bown	025089494b	fix(rlm): include session object in source hints	2026-06-03 12:37:39 -07:00
Hunter Bown	fc8ad7b3a8	feat(project): enrich repo constitution (invariants, branch policy, escalation) Per the layered-authority clarification (base myth → global Constitution → repo constitution = local law → task packet → runtime policy), extend .codewhale/constitution.json beyond authority+verification with optional: - protected_invariants — repo invariants the agent must not break - branch_policy — branch/release policy in effect - escalate_when — conditions to stop and escalate to the user All optional; rendered as concise model-facing prose. The global Brother Whale identity anchor and Constitution in prompts/base.md are unchanged (verified untouched on this branch). Dogfood constitution.json filled with CodeWhale's real invariants (prefix-cache byte-stability, transcript replay, stable Rust, cli/tui parity), branch policy (codex/v0.8.53), and escalation rules. Docs note the layered hierarchy. cargo test -p codewhale-tui --bins → 3946 passed; clippy clean.	2026-06-03 12:16:06 -07:00
Hunter Bown	9d9616e898	feat(project): deprecate WHALE.md; add .codewhale/constitution.json authority layer Splits repo-level guidance into two clear artifacts and deprecates the confusing WHALE.md concept (overlapped with AGENTS.md): - AGENTS.md is the canonical cross-agent project-instructions file. - .codewhale/constitution.json is the CodeWhale-specific repo authority / prioritization policy (when local sources conflict, which to trust first; what to verify before claiming done). Rendered into the system prompt as a higher-authority <codewhale_repo_constitution> block; takes precedence over a legacy WHALE.md. WHALE.md migration (compat-preserving): - AGENTS.md now ranks above WHALE.md in both project and global discovery; with both present, AGENTS.md wins. - WHALE.md is still read as a legacy fallback, but now emits a deprecation warning and is never created or recommended (init.rs no longer suggests it). - Discovery/docs updated; the global CodeWhale Constitution in prompts/base.md is unaffected (different thing). constitution.json: - New RepoConstitution (serde, all fields optional, unknown fields ignored, schema_version checked). Discovered at .codewhale/constitution.json in the workspace or any parent up to the git root. Malformed JSON warns, never panics. - Loaded after the auto-generate fallback so it can't be clobbered. .gitignore: ignore .codewhale/ contents at any depth EXCEPT the committed constitution.json (a directory exclude can't be negated, so */.codewhale/ + negation). init.rs writes the same pattern for new repos. Dogfood: this repo's .codewhale/constitution.json added. find_git_root made pub(crate) and reused (no duplicate loader). Tests: AGENTS-over-WHALE precedence, WHALE legacy-read-with-warning, constitution render + system-block surfacing, malformed-constitution warning, gitignore-keeps-constitution. cargo test -p codewhale-tui --bins → 3946 passed; clippy clean. Targets codex/v0.8.53.	2026-06-03 12:12:34 -07:00
Hunter Bown	8cb4f94f30	docs: v0.8.53 tool-surface-diet design + north-star direction Design-only deliverables for the v0.8.53 "tool surface diet / canonical surfaces" cutover (no catalog code in this cycle). Grounded in a verified inventory of the actual tool registry. - docs/TOOL_LIFECYCLE.md (#2681): the umbrella policy. Five lifecycle states (active / deferred / hidden-compatibility / deprecated / removed) modeled as const name-sets + an alias table in tool_catalog.rs (not a per-ToolSpec field), so registration stays untouched and old transcripts always replay. Includes the deprecation manifest (exec_wait/exec_interact/tts → hidden-compat; todo_* → checklist_* deprecated; 11 legacy subagent names are already non-visible dead code → cleanup + guardrail), per-mode/per-provider active-catalog budget (incl. Arcee's 8-tool first-turn set), prefix-cache safety rules, and the tool_agent decision: canonical but DeepSeek-V4-gated. - docs/CODEBASE_SEARCH_DESIGN.md (#2680, v0.9.0): local-first FTS5/BM25 + symbol/path ranking + RRF hybrid; rusqlite storage; mtime/branch/vendor invalidation; an explainable tool contract returning reasons[]; and a real CodeWhale query eval set. Complements grep_files/file_search, never replaces. - docs/SKILL_INVOCATION_DESIGN.md (0.9.0): the $<skill-name> inline invocation syntax (the token IS the skill name), namespaced resolution, ambiguity- suggests-not-guesses, visible activation line, and a smallest-viable slice. - docs/VISION_NORTH_STAR.md (0.9.0+): intent router, hybrid codebase intelligence, WhaleFlow typed workflow IR, skills/rules runtime, the layered context-memory stack, tool repair/autoload, the evaluation loop, and the command-surface taxonomy (/memory small · /context dashboard · /rules · /workflow · /overlay · $<skill> · codebase_search). Marked DIRECTION, not committed 0.8.53 work; also records the deferred-not-done diet items. Targets codex/v0.8.53.	2026-06-03 11:47:29 -07:00
Hunter Bown	7bbc6b78e4	fix(tools): activate read-only git history + actionable RLM/field errors v0.8.53 tool/deferred/error UX (PR group 4), low-risk subset: - #2654: add git_log and git_show to DEFAULT_ACTIVE_NATIVE_TOOLS so read-only git history joins git_diff/git_status in the active partition (kept alphabetical → prefix-cache head stays sorted/byte-stable). git_blame and other history tools remain deferred. - #2655: rlm_open's source-count error now echoes common misnamed fields with a "did you mean file_path/content/url" hint; rlm_eval's missing-`code` error explains it runs raw Python and shows an example. Schema descriptions for rlm_eval name/code sharpened. - #2659: likely_field_corrections gains RLM source-field rename hints (the role/type vocabulary change itself lives in the WS3 PR #2684 to avoid a double-edit of normalize_role_alias). Deferred to the medium-risk batch: #2648 (render deferred-tool hydration distinctly from "done") — needs a ToolStatus/cell-build change with wider render blast radius than this low-risk PR. Verification: cargo test -p codewhale-tui --bins → 3944 passed, 0 failed (incl. prefix-cache sort invariant); cargo clippy clean. Targets codex/v0.8.53.	2026-06-03 11:31:33 -07:00
Hunter Bown	725abeb603	fix(subagent): clearer role vocab, lifecycle signals, and eval ergonomics Make the sub-agent surface easier for less-capable models to drive: - Unify role/type vocabulary (#2649): normalize_role_alias now accepts the full set SubAgentType::from_str accepts (reviewer/implementer/verifier/...), and SubAgentType::from_str learns `planner`, so the dual-validation pass no longer rejects natural roles with a stale four-value hint. Error strings and schema descriptions now enumerate the real accepted aliases. - agent_eval/agent_close always active (#2605) so a first call executes instead of hydrating its schema and forcing a double-invoke; both accept an `agent_name` session alias (#2650). - Self-diagnosing name conflicts (#2656): the duplicate-name error names the conflicting agent_id and its status. - Self-describing completion sentinels (#2658): subagent.done now carries result_clipped / summary_complete / next_action so the parent knows whether to trust the previous-line summary or call agent_eval. - Actionable child-model-unavailable diagnostics (#2653): a provider 403/404 is annotated with the model id and recovery path instead of a bare error. Tests: role vocabulary acceptance + error wording, agent_name resolution, duplicate-name diagnostics, clipped-result sentinel, child-model annotation, agent_eval/agent_close default-active. Full tui suite green (3948), clippy clean. Targets codex/v0.8.53 (v0.8.53 stabilization).	2026-06-03 11:22:56 -07:00
Hunter Bown	03d1bba538	Merge pull request #2630 from Hmbown/codex/v0.8.52-home-cost-fixes fix(release): tighten 0.8.52 home and cost accounting	2026-06-03 03:44:40 -07:00
Hunter B	b965d2ecd5	fix(release): tighten 0.8.52 home and cost accounting	2026-06-03 03:35:46 -07:00
Hunter Bown	c8ce2b8e92	Merge pull request #2626 from Hmbown/codex/v0.8.52-stabilization fix(release): stabilize v0.8.52	2026-06-03 03:07:40 -07:00
Hunter B	32e6aa5e17	fix(tui): keep work panel summary during lock misses Co-authored-by: Hanmiao Li <894876246@qq.com>	2026-06-03 02:59:17 -07:00
Hunter B	14c882be53	fix(provider): expose siliconflow-cn registry coverage	2026-06-03 02:51:42 -07:00
Hunter B	54446e6c07	fix(release): stabilize v0.8.52	2026-06-03 02:39:45 -07:00
Hunter Bown	25340d17a7	feat(provider): add SiliconFlow China region (siliconflow-CN) (#2615) Adds SiliconFlow China regional endpoint (api.siliconflow.cn) as new provider variant. Credit: @Raid10Without1 (PR #2588) Co-authored-by: Raid10没有1 <88494433+Raid10Without1@users.noreply.github.com>	2026-06-02 21:27:40 -07:00
Hunter Bown	dd26114697	feat(tui): send /attach images as multimodal content (#2584, #2587) (#2607) Adds OpenAI-compatible image_url content blocks to the chat message model, wiring attached images through build_chat_messages_with_reasoning as multimodal user-content arrays. When images are present, user messages emit a content array of text + image_url parts instead of a plain string, matching the OpenAI vision API shape. - models.rs: new ImageUrlContent struct, ContentBlock::ImageUrl variant - client/chat.rs: image_parts collection, multimodal wire format for user messages, image-aware message inspection, stream-event no-op - Exhaustiveness arms added across 10 files (compaction, seam_manager, capacity_flow, purge, notifications, session_picker, utils, working_set, rlm/session, runtime_api) - Test: request_builder_emits_openai_image_url_parts_for_user_images Credit: @xyuai (PR #2587 — root cause + initial implementation) Closes: #2584 Co-authored-by: xyuai <xyuai@users.noreply.github.com>	2026-06-02 21:27:31 -07:00
AresNing	8981d5c5fd	feat: add subagent lifecycle hooks Add subagent lifecycle hooks for better control over subagent initialization and teardown.	2026-06-02 20:48:09 -07:00
Gordon	b4691bc082	feat(i18n): localize context-inspector surface across 7 locales Localize the context-inspector surface across 7 locales for improved internationalization support.	2026-06-02 20:47:53 -07:00
Justin Gao	29acb87a9d	feat(engine): inject mode-change runtime message and include mode in turn metadata Inject a mode-change runtime message into the engine and include mode information in turn metadata for better tracking.	2026-06-02 20:47:36 -07:00
Hanmiao Li	1781312c7a	feat(tui): add drag-to-resize sidebar width Add drag-to-resize functionality for the TUI sidebar width, allowing users to interactively resize the sidebar.	2026-06-02 20:47:18 -07:00
Hunter B	2721b2a077	Merge branch 'codex/v0.8.51-arcee-provider' — v0.8.51 release v0.8.51: Arcee provider, cycle removal, compaction improvements, TUI fixes, model persistence, and community harvest.	2026-06-02 20:38:10 -07:00
Hunter B	541926eb38	docs(changelog): backfill v0.8.51 Fixed entries + Community credits Adds 8 missing Fixed entries for commits that landed after the release-prep commit (`06612495f`): DEC CSI fragment fix, engine panic recovery, nested file-picker, command-palette scroll, .NET/Windows env, config key warnings, diff-render whitespace, and model persistence. Adds Community credits for contributors whose work landed or shaped this release cycle.	2026-06-02 20:37:52 -07:00
Hunter B	c000bd7e60	harvest(v0.8.51): diff-render whitespace fix + schema dead_code + model persistence + prompt updates - fix(diff-render): preserve leading whitespace in patch content lines Credit: @zlh124 (PR #2591), with extra-space bug fixed. - fix(tui): allow unused schema migration registry Credit: @reidliu41 (PR #2601). - feat(tui): persist per-provider model selection from /model command - docs(prompts): prefer gh --json CLI for GitHub triage in agent instructions	2026-06-02 20:30:31 -07:00
Claude	f886f28acf	test(tui): update walk-depth test for new default depth (#2488) `workspace_completions_honor_configured_walk_depth` placed its probe file at component depth 9 and asserted the default walk excludes it — true at the old default (6) but not the new one (10). Move the probe to depth 12 so it stays past the default while remaining within the explicit deeper walk (16) and the unlimited (0) cases the test also exercises. https://claude.ai/code/session_01MQrnh6wHfrEYN5BBdMarC1	2026-06-03 01:41:13 +00:00

2188 Commits
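
Note on d2999bb402 / 8b0e1cc3c0 (path_suffix): the commit messages describe a URL builder that uses the configured suffix when present and otherwise falls back to the default /v1/<path> pattern, with the follow-up commit limiting the suffix to chat completions. The following is only a minimal sketch of that behavior, not the repository's code. The function signature, the trimming rules, and the check against the literal path "chat/completions" are assumptions made for illustration.

```rust
/// Hypothetical sketch of suffix-aware URL building.
/// Assumption: the suffix only overrides the chat-completions path;
/// every other path keeps the default `/v1/<path>` form.
fn api_url_with_suffix(base_url: &str, path: &str, path_suffix: Option<&str>) -> String {
    // Normalize so we never emit a double slash between base and path.
    let base = base_url.trim_end_matches('/');
    let path = path.trim_start_matches('/');

    match path_suffix.map(str::trim) {
        // A non-empty suffix replaces the whole `/v1/<path>` segment.
        Some(suffix) if !suffix.is_empty() && path == "chat/completions" => {
            format!("{}/{}", base, suffix.trim_start_matches('/'))
        }
        // No suffix (or a non-chat path): default versioned URL.
        _ => format!("{}/v1/{}", base, path),
    }
}

fn main() {
    // Default behavior when no suffix is configured.
    assert_eq!(
        api_url_with_suffix("https://api.example.com/", "chat/completions", None),
        "https://api.example.com/v1/chat/completions"
    );
    // With a suffix, the /v1 prefix is skipped for chat completions.
    assert_eq!(
        api_url_with_suffix("https://api.example.com", "chat/completions", Some("/chat/completions")),
        "https://api.example.com/chat/completions"
    );
    // Other paths are unaffected by the suffix (assumed scoping).
    assert_eq!(
        api_url_with_suffix("https://api.example.com", "models", Some("/chat/completions")),
        "https://api.example.com/v1/models"
    );
    println!("ok");
}
```

The host api.example.com is a placeholder. Treating an empty or whitespace-only suffix as "not set" is a design choice of this sketch, so a blank config value cannot produce a URL that ends at the base.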