feat(v0.8.44): SWE-bench adapter, markdown table fix, contributor sync, receipt truncation fix

- SWE-bench: codewhale swebench run/export writes prediction JSONL from working-tree diff, with untracked-file inclusion via git add -N - CLI: --workspace / -C global flag forwards to TUI for file ops - CLI: codewhale exec --auto semantics clarified in help text - Markdown: table pipes inside inline code no longer create phantom columns (split_table_cells with backtick-awareness) - Receipt: floor_char_boundary prevents multibyte UTF-8 slice panic - Contributors: Ling (LING71671 #1839 #1911), Ben Younes (ousamabenyounes #1938), jeoor npm fix (#1860) credited across all 3 READMEs - ja-JP README: 19 contributors synced to parity with EN/zh-CN (80 each) - Docs: SWEBENCH.md, RECURSIVE_SELF_IMPROVEMENT.md, MODES.md exec clarification - Sub-agent footer: Alt+V hint now says 'details' not 'raw'
2026-05-24 14:47:42 -05:00
parent 494988118c
commit 25ce4f5970
61 changed files with 1966 additions and 330 deletions
@@ -22,15 +22,16 @@ Run `/mode` to open the mode picker, or switch directly with `/mode agent`,
 - **Agent**: multi-step tool use. Approvals for shell and paid tools (file writes are allowed without a prompt).
 - **YOLO**: enables shell + trust mode and auto-approves all tools. Use only in trusted repos.

-All three modes have access to persistent RLM sessions through `rlm_open`, `rlm_eval`, `rlm_configure`, and `rlm_close`. Inside an RLM Python REPL, `sub_query_batch` fans out 1-16 cheap parallel child calls pinned to `deepseek-v4-flash`. The model reaches for it when work is too large or repetitive for the parent transcript.
+All action-capable modes have access to persistent RLM sessions through `rlm_open`, `rlm_eval`, `rlm_configure`, and `rlm_close`. Inside an RLM Python REPL, `sub_query_batch` fans out 1-16 cheap parallel child calls pinned to `deepseek-v4-flash`. The model reaches for it when work is too large or repetitive for the parent transcript.

 The fast `deepseek-v4-flash` / thinking-off path is called Fin in the product
 language. Fin is a seam for routing, summaries, cheap child calls, and
 coordination work; it does not change approval behavior.

-`/goal` sets a session objective with an optional token budget. It is goal
-tracking today, not a separate TUI mode. If CodeWhale grows a persistent Goal
-work surface later, it should remain distinct from `--model auto`.
+`/goal` sets a session objective with an optional token budget and keeps that
+objective visible as Work context. It does not change the active TUI mode,
+approval mode, or model route. This remains distinct from `--model auto`, which
+only controls model and thinking selection.

 ## Compatibility Notes

@@ -90,9 +91,10 @@ See `MCP.md`.
 Run `codewhale --help` for the canonical list. Common flags:

 - `-p, --prompt <TEXT>`: one-shot prompt mode (prints and exits)
- `codewhale exec --output-format stream-json <PROMPT>`: emit one JSON object per line for harnesses and backend wrappers
+- `codewhale exec --auto --output-format stream-json <PROMPT>`: run the tool-backed non-interactive agent and emit one JSON object per line for harnesses and backend wrappers
 - `codewhale exec --resume <ID|PREFIX> <PROMPT>` / `--session-id <ID|PREFIX>`: continue a saved session non-interactively
 - `codewhale exec --continue <PROMPT>`: continue the most recent saved session for this workspace non-interactively
+- `codewhale swebench run --instance-id <ID> --issue-file <PATH>`: run the tool-backed agent on one SWE-bench task and write/update a prediction JSONL row
 - `codewhale fork <ID|PREFIX>` / `codewhale fork --last`: copy a saved session into a new sibling session; forked sessions retain additive parent-session metadata and show that lineage in session listings
 - `--model <MODEL>`: when using the `codewhale` facade, forward a DeepSeek model override to the TUI
 - `--workspace <DIR>`: workspace root for file tools