docs: remove stale internal docs (handoffs, old audits, orphaned roadmaps)

2026-06-09 23:22:32 -07:00
parent b4edb4e1ef
commit bf2e5504a2
6 changed files with 0 additions and 503 deletions
@@ -1,37 +0,0 @@
-# v0.7.6 Legacy Rust Audit
-
-Status date: 2026-04-29
-
-This audit is deliberately non-destructive. No compatibility code is removed in v0.7.6 unless tests prove public CLI, saved-session, tool-schema, and documented command paths no longer depend on it.
-
-## Summary
-
-| Surface | Owner module | Current consumer | Reference check | Compatibility reason | Current warning | Recommended action |
-|---|---|---|---|---|---|---|
-| Legacy MCP sync API (`McpServerInput`, `list`, `add`, `remove`, `call_tool`, `load_legacy`) | `crates/tui/src/mcp.rs` | Not wired into current `/mcp` command path; retained behind `#[allow(dead_code)]` | Direct Rust references and current MCP command path inspected; saved/config JSON compatibility still needs a dedicated smoke | Preserves old JSON shape including `mcpServers` alias and sync call helpers while the async MCP manager is the active path | Code TODO only | Gate behind an explicit legacy module or remove after CLI/runtime parity tests prove no caller uses it. Tracked by #218. |
-| Legacy prompt constants/functions (`AGENT_PROMPT`, `YOLO_PROMPT`, `PLAN_PROMPT`, `base_system_prompt`, `normal_system_prompt`, etc.) | `crates/tui/src/prompts.rs` | Tests and older callers that still import prompt constants directly | Direct Rust references remain; public-crate and older harness imports are not proven absent | Layered prompt API replaced monolithic prompts, but older call sites may still compile against constants | None | Keep for v0.7.6; add deprecation annotations only after internal callers are migrated. Tracked by #219. |
-| `/compact` slash command positioning | `crates/tui/src/commands/mod.rs` | Public slash-command registry and help overlay | Public command registry/docs path still references it | Users may still run `/compact` manually when they want an immediate replacement-style summary | Description is intentionally explicit about manual compaction | Keep as a manual compatibility command; do not remove until context/token issues are resolved. |
-| `todo_*` compatibility tools | `crates/tui/src/tools/todo.rs` | Tool registry/model calls that still use `todo_add`, `todo_update`, `todo_list`, `todo_write` | Tool registry compatibility and saved tool-call risk remain | `checklist_*` is canonical, but old tool names may appear in saved prompts, traces, or model priors | Metadata marks `compat_alias: true`; descriptions say compatibility alias | Add explicit deprecation metadata with target version, then remove only after tool-schema migration evidence. Tracked by #220. |
-| Deprecated sub-agent alias tools (`spawn_agent`, `send_input`, delegate aliases) | `crates/tui/src/tools/subagent/mod.rs` | Tool registry and model/tool-call compatibility | Tool registry compatibility and saved tool-call risk remain | Canonical names are `agent_spawn`, `agent_send_input`, etc.; alias names preserve older tool-call compatibility | `_deprecation` metadata and tracing warn; removal target is `v0.8.0` | Keep through v0.7.x; removal already has metadata. Tracked by #221. |
-| Legacy root/provider TOML `api_key` compatibility | `crates/tui/src/config.rs`, `crates/config/src/lib.rs` | Config resolver; users with existing `api_key` in config files | Public config loading and docs still mention migration behavior | Keyring migration is preferred, but breaking existing configs would block startup/auth | Tracing warnings point to `deepseek auth set` / `deepseek auth migrate` | Keep; warnings are user-actionable. Removal should wait for a migration command and release-note window. |
-| Model alias canonicalization (`deepseek-chat`, `deepseek-reasoner`, older V3/R1 aliases) | `crates/tui/src/config.rs`, `crates/config/src/lib.rs` | Config/env/model picker normalization | Public docs and existing configs may still use aliases | Preserves old documented DeepSeek aliases and maps them to `deepseek-v4-flash` | Silent alias by design | Keep; removing aliases would break configs without meaningful benefit. |
-| Deprecated palette constants and aliases | `crates/tui/src/palette.rs`, `crates/tui/tests/palette_audit.rs` | Existing call sites plus audit tests | Palette audit enforces the remaining allowlist | Semantic aliases are preferred, but old constants exist to prevent broad style churn | Palette audit blocks direct deprecated uses outside allowlist | Keep aliases; continue moving call sites to semantic roles opportunistically. |
-
-## Follow-Up Removal Candidates
-
-These are not safe to remove in v0.7.6:
-
-1. #218 Legacy MCP sync API: requires a call-graph check and explicit CLI/runtime parity tests for `/mcp`, `deepseek mcp`, and MCP server validation flows.
-2. #219 Legacy prompt constants/functions: requires proving no public crate or older test harness imports them.
-3. #220 `todo_*` tool aliases: requires deprecation metadata and a saved-trace/tool-schema migration window.
-4. #221 Deprecated sub-agent alias tools: removal target is already encoded as `v0.8.0`, but the actual removal should be tracked and tested separately.
-
-## Verification Checklist
-
-Before removing any compatibility surface:
-
-1. Search direct Rust references with `rg`.
-2. Search docs and README command examples.
-3. Run workspace tests with all features.
-4. Run a saved-session/tool-call compatibility smoke if the surface affects tool schemas or persisted history.
-5. Keep a release-note entry and, for user-visible config/tool changes, a migration hint for at least one minor release.
@@ -1,89 +0,0 @@
-# MiMo v2.5 Benchmarking — Known Issues
-
-Tracking doc for quirks and workarounds when benchmarking Xiaomi MiMo v2.5
-through CodeWhale's harness integrations.
-
-## PinchBench
-
-### Issue 1: Model validation requires OpenRouter prefix
-
-PinchBench validates models against OpenRouter's `/models` endpoint. If you
-pass `mimo-v2.5-pro` without the `openrouter/xiaomi/` prefix, validation is
-skipped entirely (it assumes it's a non-OpenRouter model). This means you
-won't know if the model ID is wrong until the run fails.
-
-**Workaround:** Always use `openrouter/xiaomi/mimo-v2.5-pro` for OpenRouter
-routing, or use `--direct-mimo` for the Xiaomi API.
-
-### Issue 2: PinchBench requires OPENROUTER_API_KEY
-
-Even when using a direct provider, PinchBench's `lib_agent.py` checks for
-`OPENROUTER_API_KEY` in some code paths. The `--direct-mimo` flag in our
-runner works around this by setting up a custom OpenAI-compatible provider
-entry in OpenClaw's `models.json` and exporting `OPENAI_API_KEY`/`OPENAI_BASE_URL`.
-
-### Issue 3: Token Plan vs Pay-as-you-go key mismatch
-
-Xiaomi MiMo has two API endpoints:
-- **Token Plan** (`tp-` keys): `https://token-plan-sgp.xiaomimimo.com/v1`
-- **Pay-as-you-go** (`sk-` keys): `https://api.xiaomimimo.com/v1`
-
-Using the wrong key type with the wrong endpoint produces auth errors. The
-runner now detects this and warns.
-
-### Issue 4: OpenClaw is the runtime, not CodeWhale
-
-PinchBench runs tasks through OpenClaw, not CodeWhale. This means the
-benchmark measures MiMo v2.5's performance through OpenClaw's agent harness,
-not through CodeWhale's tool system. For CodeWhale-native evaluation,
-Terminal-Bench (via Harbor) is the better fit.
-
-**Future:** Create a CodeWhale-native PinchBench adapter that loads tasks
-from PinchBench's `tasks/` directory and runs them through `codewhale exec`.
-
-## Terminal-Bench (Harbor)
-
-### Issue 1: MiMo provider routing
-
-Harbor passes models as `provider/model` format. For MiMo via OpenRouter,
-use `openrouter/xiaomi/mimo-v2.5-pro`. For direct Xiaomi API, pass
-`--provider xiaomi-mimo` as an extra agent flag.
-
-### Issue 2: Container environment
-
-The Harbor adapter installs codewhale via npm in the container. MiMo API
-keys must be forwarded from the host environment. The adapter checks for
-`XIAOMI_MIMO_API_KEY`, `OPENROUTER_API_KEY`, and `OPENAI_API_KEY`.
-
-## SWE-bench
-
-### Issue 1: MiMo thinking mode
-
-MiMo v2.5 Pro supports extended thinking. For SWE-bench patch generation,
-ensure the thinking level is set appropriately. The `--thinking high` flag
-is passed through the CLI.
-
-### Issue 2: Context window
-
-MiMo v2.5 Pro has a 128K context window. Large SWE-bench instances (e.g.,
-Django, sympy) may benefit from the full window. No special handling needed,
-but worth monitoring token usage.
-
-## Environment Variables Reference
-
-```
-# Xiaomi MiMo direct API
-XIAOMI_MIMO_API_KEY=tp-...    # Token Plan key
-XIAOMI_MIMO_API_KEY=sk-...    # Pay-as-you-go key
-XIAOMI_MIMO_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
-XIAOMI_MIMO_MODEL=mimo-v2.5-pro
-
-# Aliases also accepted
-XIAOMI_API_KEY=...
-MIMO_API_KEY=...
-MIMO_BASE_URL=...
-MIMO_MODEL=...
-
-# OpenRouter (for MiMo via OpenRouter)
-OPENROUTER_API_KEY=...
-```
@@ -1,84 +0,0 @@
-\# CodeWhale Review Pipeline
-
-
-
-Welcome to CodeWhale! We receive a high volume of community PRs. To ensure a smooth and fast review process, please review our pipeline expectations below. 
-
-
-
-\## 1. CI Gates (Pre-Review Checklist)
-
-Before a maintainer reviews your PR, it must pass our continuous integration (CI) checks. 
-
-
-
-\*\*Required Checks (Must Pass):\*\*
-
-Please run these locally before pushing your code to avoid CI failures:
-
-\* \*\*Format:\*\* `cargo fmt --all -- --check`
-
-\* \*\*Linting:\*\* `cargo clippy --workspace --all-targets --all-features`
-
-\* \*\*Tests:\*\* `cargo test --workspace --all-features --locked`
-
-
-
-\*\*Informational Checks:\*\*
-
-Checks from \*\*Greptile\*\* and \*\*GitGuardian\*\* are informational. If they flag something, review it, but they do not strictly block a review on their own unless a secret is leaked.
-
-
-
-\## 2. Common Failure Modes \& Local Fixes
-
-If CI fails, it is usually one of these three reasons:
-
-\* \*\*Version Drift (`Cargo.lock` out of date):\*\* Run `cargo update` or `cargo build` locally to update the lockfile and commit the changes.
-
-\* \*\*Lint Failures:\*\* Check the clippy warnings from the command above and fix the specific lines flagged.
-
-\* \*\*Windows Test Flakiness:\*\* Occasionally, tests may time out on Windows runners. If you are confident your code didn't break it, leave a comment asking a maintainer to re-trigger the CI.
-
-
-
-\## 3. PR Etiquette
-
-To help us review your code quickly, please adhere to the following:
-
-\* \*\*One Concern Per PR:\*\* Keep diffs highly focused. Do not mix refactoring with new feature additions.
-
-\* \*\*Link the Issue:\*\* Always include `Closes #N` (replace N with the issue number) in your PR description so GitHub automatically links them.
-
-\* \*\*Rebase:\*\* Always rebase your branch onto the latest `main` branch before requesting a review.
-
-
-
-\## 4. The Review Workflow
-
-Once CI is green, your PR enters the review queue.
-
-\* \*\*Who reviews:\*\* Core maintainers will review the PR. 
-
-\* \*\*`autonomous-ready` Label:\*\* If a maintainer applies this label, it means the PR is approved in concept and is queued for our automated integration system.
-
-\* \*\*The Nightly Loop:\*\* We run extensive integration loops overnight. If your PR is approved, it may wait for this nightly loop before final merging to ensure system stability.
-
-
-
-\## 5. Post-Merge Actions
-
-After your code is merged, the following automated actions occur:
-
-\* `CHANGELOG.md` is updated.
-
-\* `npm` wrappers are synced.
-
-\* Binary rebuilds are triggered for all platforms.
-
-\* Website and documentation are synced with your new changes.
-
-
-
-Thank you for contributing to CodeWhale!
-
@@ -1,92 +0,0 @@
-# RLM Branching Roadmap
-
-This note records the v0.8.45 design direction for RLM, DSPy, GEPA, and Model
-Lab without adding runtime dependencies or changing the live agent loop.
-
-## Branching Primitive
-
-CodeWhale uses the same branching primitive at three scales:
-
-1. Release tracks. Each milestone fans into named tracks. A track must stay
-   independently reviewable, mergeable, and slippable. Unfinished work rolls
-   forward instead of blocking the release.
-2. Capability worksets. Model Lab capabilities such as Hugging Face,
-   observability, evals, serving, DSPy, GEPA, and training infrastructure ship
-   as opt-in worksets with their own feature flag, install path, license note,
-   and telemetry posture.
-3. Pareto compile branches. Optimizable modules keep candidate
-   `(instructions, demos, score)` triples. Branches that violate pinned
-   constitution clauses are pruned; branches that win at least one eval remain
-   on the frontier until the maintainer lands or rejects them.
-
-The maintainer chooses the frontier point. CodeWhale should not collapse
-branches prematurely.
-
-## v0.8.45
-
-- Close the current control-plane and workbench issues before the broader
-  fan-out begins: #1982, #2027, #2032, #2016, and #2034.
-- Keep `AGENTS.md` and `CLAUDE.md` maintainer-local. `AGENTS.md` is ignored
-  from this milestone forward.
-- Land the RLM symbolic-object substrate: active prompt, session metadata,
-  transcript, latest user message, and per-message refs are named objects that
-  RLM can open without copying raw prompt/history text into the parent
-  transcript.
-
-## v0.8.46
-
-- Generalize Fin into a structured-feedback verifier substrate.
-- Add first replay-eval definitions harvested from existing trajectories.
-- Scaffold the Repeatability Score footer slot as pending until evals populate
-  it.
-- Add module artifact schema v0 as Rust types only.
-- Draft the "Compiled Word" constitution article.
-
-## v0.8.47
-
-- Promote Hugging Face as a first-class provider through Inference Providers
-  and Router.
-- Add deterministic RLM replay: context snapshot, seed, child model IDs, and
-  temperatures.
-- Route large logs and payloads to RLM workbench sessions instead of the
-  parent transcript.
-- Add sub-query memoization keyed by prompt, context hash, and model.
-- Enforce RLM budgets at the Rust registry layer: depth, calls, wall time, and
-  cost.
-
-## v0.8.48
-
-- Remove the legacy `deepseek` and `deepseek-tui` shim binaries.
-- Finish Docker and Homebrew rename cleanup.
-- Populate Repeatability Score from a small offline eval suite that ships in
-  core.
-
-## v0.9.0
-
-- Emit per-turn `trajectory.jsonl` as the trainset substrate.
-- Add `codewhale replay <turn_id>` for deterministic replay.
-- Render module artifacts from the `[[ ## field ## ]]` form through a Rust
-  adapter.
-- Land the eval pipeline: suites, replay evals, and measurement substrate.
-- Add a `/compile` command stub that explains the offline loop.
-
-## v0.10.0
-
-- Add opt-in Model Lab workset installers for DSPy and GEPA. The default
-  install keeps zero Python dependencies.
-- Build the first offline compile pipeline: Rust harvests trainsets, a Python
-  sidecar runs the optimizer, and CodeWhale emits a reviewed Module JSON
-  artifact.
-- Add the Compile TUI panel with Pareto frontier, lineage tree, and
-  Land/Reject/Revise actions.
-- Land the first optimized tool-description and agent-prompt artifacts through
-  PRs. Constitution clauses remain pinned outside the optimized region.
-- Add whale-species module passports, for example
-  `Sei: codewhale-agent-prompt.v0.10.0-gepa-1`.
-
-## Trust Boundary
-
-Compilation is offline. Runtime consumes reviewed JSON artifacts. Online
-closed-loop optimization is out of scope because adversarial users could game a
-live coding harness. Any workset can fail independently without dragging the
-release, the core runtime, or other Pareto branches with it.
@@ -1,61 +0,0 @@
-# v0.7.5 Implementation Plan
-
-Scope: background shell job UX, in-TUI MCP management/discovery, and V4
-context/cache policy. Do not include provider expansion or Whalescale
-rename/migration work in this release lane.
-
-## Context/cache decision
-
-Default path:
-
-- Keep the transcript append-only and preserve the stable prefix for DeepSeek V4 cache reuse.
-- Disable replacement-style `auto_compact` by default.
-- Keep replacement compaction manual or late: if a user enables `auto_compact`, V4 compacts only near the 80% model-window guard (`800000` tokens for 1M-context models), not at reasoning-effort soft caps.
-- Keep the Flash seam manager (`[context].enabled`) opt-in until issue #200 has repeatable cache-hit/miss evidence.
-- Keep the capacity controller disabled by default. Treat it as telemetry or an experimental guardrail unless `capacity.enabled = true` is set.
-- Use emergency overflow recovery only when the request would otherwise exceed the model input budget.
-
-Rationale: V4's 1M-token window and prefix-cache economics make early
-replacement compaction suspect. The first shippable slice should prevent old
-128K-era heuristics from rewriting context before there is evidence that the
-rewrite is cheaper and more reliable than preserving a hot prefix.
-
-## Shippable slices
-
-### Slice 1: Context policy and docs
-
-- Change default `auto_compact` to off.
-- Keep V4 replacement-compaction thresholds late and independent of reasoning effort.
-- Make `[context].enabled` default to false.
-- Make `docs/CONFIGURATION.md`, `docs/capacity_controller.md`, and `config.example.toml` match code defaults.
-- Add focused tests for defaults and V4 threshold behavior.
-
-### Slice 2: Background shell job center (#195)
-
-- Add a job-center view fed by `ShellManager::list()`.
-- Show command, cwd, linked task id when available, status, elapsed time, exit code, and latest output.
-- Add controls to inspect full output, poll latest output, send stdin for PTY/stdin-capable jobs, kill a background job, and attach completed output as task evidence.
-- Mark restart-stale jobs explicitly rather than presenting them as live.
-- Add lifecycle tests for start, poll, cancel, complete, stale/restart, plus TUI snapshots for running and completed job details.
-
-### Slice 3: MCP manager (#196)
-
-- Add `/mcp` or a command-palette action that opens an MCP manager view.
-- Show resolved config path, server enabled/disabled state, transport, command/url, timeout settings, startup errors, and discovered tool/resource/prompt counts.
-- Wire `mcp_config_path` into the interactive config surface.
-- Support init, add stdio server, add HTTP/SSE server, enable, disable, remove, validate, reconnect, and inspect tools/resources/prompts.
-- Preserve both `servers` and `mcpServers` config shapes.
-
-### Slice 4: MCP discoverability (#197)
-
-- Add an MCP command-palette section backed by the same discovery state as the manager.
-- Group tools/resources/prompts by server.
-- Show disabled/failed servers without blocking palette rendering.
-- Keep model-visible names consistent with `mcp_<server>_<tool>`.
-
-## Stop rules
-
-- Do not close #159 or #162 unless a verified PR actually resolves them.
-- Do not add provider expansion.
-- Do not rename or migrate anything to Whalescale.
-- Do not broaden the TUI into a large redesign; each slice should remain independently testable and shippable.