docs: remove stale internal docs (handoffs, old audits, orphaned roadmaps)
@@ -1,37 +0,0 @@
# v0.7.6 Legacy Rust Audit

Status date: 2026-04-29

This audit is deliberately non-destructive. No compatibility code is removed in v0.7.6 unless tests prove public CLI, saved-session, tool-schema, and documented command paths no longer depend on it.

## Summary

| Surface | Owner module | Current consumer | Reference check | Compatibility reason | Current warning | Recommended action |
|---|---|---|---|---|---|---|
| Legacy MCP sync API (`McpServerInput`, `list`, `add`, `remove`, `call_tool`, `load_legacy`) | `crates/tui/src/mcp.rs` | Not wired into current `/mcp` command path; retained behind `#[allow(dead_code)]` | Direct Rust references and current MCP command path inspected; saved/config JSON compatibility still needs a dedicated smoke | Preserves old JSON shape including `mcpServers` alias and sync call helpers while the async MCP manager is the active path | Code TODO only | Gate behind an explicit legacy module or remove after CLI/runtime parity tests prove no caller uses it. Tracked by #218. |
| Legacy prompt constants/functions (`AGENT_PROMPT`, `YOLO_PROMPT`, `PLAN_PROMPT`, `base_system_prompt`, `normal_system_prompt`, etc.) | `crates/tui/src/prompts.rs` | Tests and older callers that still import prompt constants directly | Direct Rust references remain; public-crate and older harness imports are not proven absent | Layered prompt API replaced monolithic prompts, but older call sites may still compile against constants | None | Keep for v0.7.6; add deprecation annotations only after internal callers are migrated. Tracked by #219. |
| `/compact` slash command positioning | `crates/tui/src/commands/mod.rs` | Public slash-command registry and help overlay | Public command registry/docs path still references it | Users may still run `/compact` manually when they want an immediate replacement-style summary | Description is intentionally explicit about manual compaction | Keep as a manual compatibility command; do not remove until context/token issues are resolved. |
| `todo_*` compatibility tools | `crates/tui/src/tools/todo.rs` | Tool registry/model calls that still use `todo_add`, `todo_update`, `todo_list`, `todo_write` | Tool registry compatibility and saved tool-call risk remain | `checklist_*` is canonical, but old tool names may appear in saved prompts, traces, or model priors | Metadata marks `compat_alias: true`; descriptions say compatibility alias | Add explicit deprecation metadata with target version, then remove only after tool-schema migration evidence. Tracked by #220. |
| Deprecated sub-agent alias tools (`spawn_agent`, `send_input`, delegate aliases) | `crates/tui/src/tools/subagent/mod.rs` | Tool registry and model/tool-call compatibility | Tool registry compatibility and saved tool-call risk remain | Canonical names are `agent_spawn`, `agent_send_input`, etc.; alias names preserve older tool-call compatibility | `_deprecation` metadata and tracing warn; removal target is `v0.8.0` | Keep through v0.7.x; removal already has metadata. Tracked by #221. |
| Legacy root/provider TOML `api_key` compatibility | `crates/tui/src/config.rs`, `crates/config/src/lib.rs` | Config resolver; users with existing `api_key` in config files | Public config loading and docs still mention migration behavior | Keyring migration is preferred, but breaking existing configs would block startup/auth | Tracing warnings point to `deepseek auth set` / `deepseek auth migrate` | Keep; warnings are user-actionable. Removal should wait for a migration command and release-note window. |
| Model alias canonicalization (`deepseek-chat`, `deepseek-reasoner`, older V3/R1 aliases) | `crates/tui/src/config.rs`, `crates/config/src/lib.rs` | Config/env/model picker normalization | Public docs and existing configs may still use aliases | Preserves old documented DeepSeek aliases and maps them to `deepseek-v4-flash` | Silent alias by design | Keep; removing aliases would break configs without meaningful benefit. |
| Deprecated palette constants and aliases | `crates/tui/src/palette.rs`, `crates/tui/tests/palette_audit.rs` | Existing call sites plus audit tests | Palette audit enforces the remaining allowlist | Semantic aliases are preferred, but old constants exist to prevent broad style churn | Palette audit blocks direct deprecated uses outside allowlist | Keep aliases; continue moving call sites to semantic roles opportunistically. |

## Follow-Up Removal Candidates

These are not safe to remove in v0.7.6:

1. #218 Legacy MCP sync API: requires a call-graph check and explicit CLI/runtime parity tests for `/mcp`, `deepseek mcp`, and MCP server validation flows.
2. #219 Legacy prompt constants/functions: requires proving no public crate or older test harness imports them.
3. #220 `todo_*` tool aliases: requires deprecation metadata and a saved-trace/tool-schema migration window.
4. #221 Deprecated sub-agent alias tools: removal target is already encoded as `v0.8.0`, but the actual removal should be tracked and tested separately.
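
To make the alias items above concrete, here is a minimal sketch of alias resolution with deprecation metadata. It is illustrative only and not the code in `crates/tui/src/tools/`: the `Deprecation` struct, both function names, and the `todo_*` to `checklist_*` pairings are assumptions. Only the alias names, the canonical `agent_spawn` / `agent_send_input` names, and the `v0.8.0` removal target come from the audit table.

```rust
/// Deprecation metadata for a compatibility alias (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Deprecation {
    canonical: &'static str,
    /// Release in which the alias is scheduled for removal, if decided.
    removal_target: Option<&'static str>,
}

/// Look up alias metadata. The `todo_*` targets are assumed pairings;
/// the audit only states that `checklist_*` is canonical.
fn alias_deprecation(name: &str) -> Option<Deprecation> {
    let (canonical, removal_target) = match name {
        "spawn_agent" => ("agent_spawn", Some("v0.8.0")),
        "send_input" => ("agent_send_input", Some("v0.8.0")),
        "todo_add" => ("checklist_add", None),
        "todo_update" => ("checklist_update", None),
        "todo_list" => ("checklist_list", None),
        "todo_write" => ("checklist_write", None),
        _ => return None,
    };
    Some(Deprecation { canonical, removal_target })
}

/// Resolve a tool name to its canonical form plus an optional warning.
fn resolve_tool(name: &str) -> (String, Option<String>) {
    match alias_deprecation(name) {
        Some(dep) => {
            let when = dep.removal_target.unwrap_or("an undecided release");
            let warning = format!(
                "tool `{name}` is a compatibility alias for `{}`; removal target: {when}",
                dep.canonical
            );
            (dep.canonical.to_string(), Some(warning))
        }
        // Canonical or unknown names pass through unchanged.
        None => (name.to_string(), None),
    }
}

fn main() {
    for name in ["spawn_agent", "todo_add", "agent_spawn"] {
        let (canonical, warning) = resolve_tool(name);
        println!("{name} -> {canonical} ({warning:?})");
    }
}
```

Keeping the removal target optional mirrors the current state of the table: the sub-agent aliases already carry a target version, while the `todo_*` aliases do not yet.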

## Verification Checklist

Before removing any compatibility surface:

1. Search direct Rust references with `rg`.
2. Search docs and README command examples.
3. Run workspace tests with all features.
4. Run a saved-session/tool-call compatibility smoke if the surface affects tool schemas or persisted history.
5. Keep a release-note entry and, for user-visible config/tool changes, a migration hint for at least one minor release.
@@ -1,89 +0,0 @@
# MiMo v2.5 Benchmarking — Known Issues

Tracking doc for quirks and workarounds when benchmarking Xiaomi MiMo v2.5
through CodeWhale's harness integrations.

## PinchBench

### Issue 1: Model validation requires OpenRouter prefix

PinchBench validates models against OpenRouter's `/models` endpoint. If you
pass `mimo-v2.5-pro` without the `openrouter/xiaomi/` prefix, validation is
skipped entirely (it assumes it's a non-OpenRouter model). This means you
won't know if the model ID is wrong until the run fails.

**Workaround:** Always use `openrouter/xiaomi/mimo-v2.5-pro` for OpenRouter
routing, or use `--direct-mimo` for the Xiaomi API.

### Issue 2: PinchBench requires OPENROUTER_API_KEY

Even when using a direct provider, PinchBench's `lib_agent.py` checks for
`OPENROUTER_API_KEY` in some code paths. The `--direct-mimo` flag in our
runner works around this by setting up a custom OpenAI-compatible provider
entry in OpenClaw's `models.json` and exporting `OPENAI_API_KEY`/`OPENAI_BASE_URL`.

### Issue 3: Token Plan vs Pay-as-you-go key mismatch

Xiaomi MiMo has two API endpoints:

- **Token Plan** (`tp-` keys): `https://token-plan-sgp.xiaomimimo.com/v1`
- **Pay-as-you-go** (`sk-` keys): `https://api.xiaomimimo.com/v1`

Using the wrong key type with the wrong endpoint produces auth errors. The
runner now detects this and warns.
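
A minimal sketch of the kind of check described here, assuming the key prefix alone determines the key type. The function and type names are hypothetical and this is not the runner's actual code; only the two prefixes and the two base URLs come from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyKind {
    TokenPlan,
    PayAsYouGo,
}

const TOKEN_PLAN_URL: &str = "https://token-plan-sgp.xiaomimimo.com/v1";
const PAY_AS_YOU_GO_URL: &str = "https://api.xiaomimimo.com/v1";

/// Classify a key by prefix; unknown prefixes are not classified.
fn key_kind(api_key: &str) -> Option<KeyKind> {
    if api_key.starts_with("tp-") {
        Some(KeyKind::TokenPlan)
    } else if api_key.starts_with("sk-") {
        Some(KeyKind::PayAsYouGo)
    } else {
        None
    }
}

fn expected_base_url(kind: KeyKind) -> &'static str {
    match kind {
        KeyKind::TokenPlan => TOKEN_PLAN_URL,
        KeyKind::PayAsYouGo => PAY_AS_YOU_GO_URL,
    }
}

/// Return a warning when the key prefix and base URL disagree.
/// The key itself is never included in the message.
fn check_key_endpoint(api_key: &str, base_url: &str) -> Option<String> {
    let kind = key_kind(api_key)?;
    let expected = expected_base_url(kind);
    if base_url.trim_end_matches('/') == expected {
        None
    } else {
        Some(format!(
            "{kind:?} key expects {expected}, but the base URL is {base_url}"
        ))
    }
}

fn main() {
    // Placeholder key, not a real credential.
    if let Some(warning) = check_key_endpoint("sk-placeholder", TOKEN_PLAN_URL) {
        eprintln!("warning: {warning}");
    }
}
```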

### Issue 4: OpenClaw is the runtime, not CodeWhale

PinchBench runs tasks through OpenClaw, not CodeWhale. This means the
benchmark measures MiMo v2.5's performance through OpenClaw's agent harness,
not through CodeWhale's tool system. For CodeWhale-native evaluation,
Terminal-Bench (via Harbor) is the better fit.

**Future:** Create a CodeWhale-native PinchBench adapter that loads tasks
from PinchBench's `tasks/` directory and runs them through `codewhale exec`.

## Terminal-Bench (Harbor)

### Issue 1: MiMo provider routing

Harbor passes models as `provider/model` format. For MiMo via OpenRouter,
use `openrouter/xiaomi/mimo-v2.5-pro`. For direct Xiaomi API, pass
`--provider xiaomi-mimo` as an extra agent flag.
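
As an illustration of the `provider/model` format, the sketch below splits an ID at the first `/`. Whether Harbor splits at the first slash is an assumption, and `split_model_id` is a hypothetical helper, not part of Harbor or CodeWhale.

```rust
/// Split a `provider/model` string at the first `/` (assumed convention).
/// Returns `None` for bare IDs such as `mimo-v2.5-pro`.
fn split_model_id(id: &str) -> Option<(&str, &str)> {
    let (provider, model) = id.split_once('/')?;
    if provider.is_empty() || model.is_empty() {
        return None;
    }
    Some((provider, model))
}

fn main() {
    for id in ["openrouter/xiaomi/mimo-v2.5-pro", "mimo-v2.5-pro"] {
        match split_model_id(id) {
            Some((provider, model)) => println!("{id}: provider={provider}, model={model}"),
            None => println!("{id}: no provider prefix"),
        }
    }
}
```

Under this reading, the OpenRouter ID keeps `xiaomi/mimo-v2.5-pro` as the model part, while a bare ID carries no provider and needs the explicit provider flag.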

### Issue 2: Container environment

The Harbor adapter installs codewhale via npm in the container. MiMo API
keys must be forwarded from the host environment. The adapter checks for
`XIAOMI_MIMO_API_KEY`, `OPENROUTER_API_KEY`, and `OPENAI_API_KEY`.

## SWE-bench

### Issue 1: MiMo thinking mode

MiMo v2.5 Pro supports extended thinking. For SWE-bench patch generation,
ensure the thinking level is set appropriately. The `--thinking high` flag
is passed through the CLI.

### Issue 2: Context window

MiMo v2.5 Pro has a 128K context window. Large SWE-bench instances (e.g.,
Django, sympy) may benefit from the full window. No special handling needed,
but worth monitoring token usage.

## Environment Variables Reference

```
# Xiaomi MiMo direct API
XIAOMI_MIMO_API_KEY=tp-...   # Token Plan key
XIAOMI_MIMO_API_KEY=sk-...   # Pay-as-you-go key
XIAOMI_MIMO_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
XIAOMI_MIMO_MODEL=mimo-v2.5-pro

# Aliases also accepted
XIAOMI_API_KEY=...
MIMO_API_KEY=...
MIMO_BASE_URL=...
MIMO_MODEL=...

# OpenRouter (for MiMo via OpenRouter)
OPENROUTER_API_KEY=...
```
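
The alias handling above can be sketched as a first-match lookup. This is a sketch under assumptions: the precedence order (the `XIAOMI_MIMO_*` name first, then the aliases in the order listed) is not stated in this document, and `first_set` is a hypothetical helper.

```rust
use std::env;

// Assumed precedence: primary name first, then aliases as listed above.
const API_KEY_VARS: [&str; 3] = ["XIAOMI_MIMO_API_KEY", "XIAOMI_API_KEY", "MIMO_API_KEY"];
const BASE_URL_VARS: [&str; 2] = ["XIAOMI_MIMO_BASE_URL", "MIMO_BASE_URL"];
const MODEL_VARS: [&str; 2] = ["XIAOMI_MIMO_MODEL", "MIMO_MODEL"];

/// Return the first variable that is set and non-empty, with its value.
/// `lookup` is injected so the function can be exercised without touching
/// the real process environment.
fn first_set<'a>(
    names: &[&'a str],
    lookup: impl Fn(&str) -> Option<String>,
) -> Option<(&'a str, String)> {
    names.iter().find_map(|&name| {
        lookup(name)
            .filter(|value| !value.is_empty())
            .map(|value| (name, value))
    })
}

fn main() {
    let from_env = |name: &str| env::var(name).ok();
    let groups = [
        ("API key", &API_KEY_VARS[..]),
        ("base URL", &BASE_URL_VARS[..]),
        ("model", &MODEL_VARS[..]),
    ];
    for (label, names) in groups {
        // Report only which variable was used, never the value itself.
        match first_set(names, from_env) {
            Some((name, _)) => println!("{label}: taken from {name}"),
            None => println!("{label}: not set"),
        }
    }
}
```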
@@ -1,84 +0,0 @@
# CodeWhale Review Pipeline

Welcome to CodeWhale! We receive a high volume of community PRs. To ensure a smooth and fast review process, please review our pipeline expectations below.

## 1. CI Gates (Pre-Review Checklist)

Before a maintainer reviews your PR, it must pass our continuous integration (CI) checks.

**Required Checks (Must Pass):**

Please run these locally before pushing your code to avoid CI failures:

* **Format:** `cargo fmt --all -- --check`
* **Linting:** `cargo clippy --workspace --all-targets --all-features`
* **Tests:** `cargo test --workspace --all-features --locked`

**Informational Checks:**

Checks from **Greptile** and **GitGuardian** are informational. If they flag something, review it, but they do not strictly block a review on their own unless a secret is leaked.

## 2. Common Failure Modes & Local Fixes

If CI fails, it is usually one of these three reasons:

* **Version Drift (`Cargo.lock` out of date):** Run `cargo update` or `cargo build` locally to update the lockfile and commit the changes.
* **Lint Failures:** Check the clippy warnings from the command above and fix the specific lines flagged.
* **Windows Test Flakiness:** Occasionally, tests may time out on Windows runners. If you are confident your code didn't break it, leave a comment asking a maintainer to re-trigger the CI.

## 3. PR Etiquette

To help us review your code quickly, please adhere to the following:

* **One Concern Per PR:** Keep diffs highly focused. Do not mix refactoring with new feature additions.
* **Link the Issue:** Always include `Closes #N` (replace N with the issue number) in your PR description so GitHub automatically links them.
* **Rebase:** Always rebase your branch onto the latest `main` branch before requesting a review.

## 4. The Review Workflow

Once CI is green, your PR enters the review queue.

* **Who reviews:** Core maintainers will review the PR.
* **`autonomous-ready` Label:** If a maintainer applies this label, it means the PR is approved in concept and is queued for our automated integration system.
* **The Nightly Loop:** We run extensive integration loops overnight. If your PR is approved, it may wait for this nightly loop before final merging to ensure system stability.

## 5. Post-Merge Actions

After your code is merged, the following automated actions occur:

* `CHANGELOG.md` is updated.
* `npm` wrappers are synced.
* Binary rebuilds are triggered for all platforms.
* Website and documentation are synced with your new changes.

Thank you for contributing to CodeWhale!
@@ -1,92 +0,0 @@
# RLM Branching Roadmap

This note records the v0.8.45 design direction for RLM, DSPy, GEPA, and Model
Lab without adding runtime dependencies or changing the live agent loop.

## Branching Primitive

CodeWhale uses the same branching primitive at three scales:

1. Release tracks. Each milestone fans into named tracks. A track must stay
   independently reviewable, mergeable, and slippable. Unfinished work rolls
   forward instead of blocking the release.
2. Capability worksets. Model Lab capabilities such as Hugging Face,
   observability, evals, serving, DSPy, GEPA, and training infrastructure ship
   as opt-in worksets with their own feature flag, install path, license note,
   and telemetry posture.
3. Pareto compile branches. Optimizable modules keep candidate
   `(instructions, demos, score)` triples. Branches that violate pinned
   constitution clauses are pruned; branches that win at least one eval remain
   on the frontier until the maintainer lands or rejects them.

The maintainer chooses the frontier point. CodeWhale should not collapse
branches prematurely.
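
The pruning rule for compile branches can be sketched as follows. This is a sketch under assumptions, not a design commitment: it treats `score` as one value per eval (higher is better), reads "wins" as "ties or beats the best eligible score on that eval", and uses a hypothetical `Candidate` struct and `frontier` function.

```rust
/// One compile branch (hypothetical shape of the triple described above).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Candidate {
    name: String,
    instructions: String,
    demos: Vec<String>,
    /// One score per eval; higher is better (assumed).
    scores: Vec<f64>,
    violates_constitution: bool,
}

/// Keep candidates that pass the constitution check and win at least one eval.
fn frontier(candidates: &[Candidate]) -> Vec<&Candidate> {
    // Prune branches that violate pinned constitution clauses.
    let eligible: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| !c.violates_constitution)
        .collect();
    // Only compare evals that every eligible candidate has a score for.
    let evals = eligible.iter().map(|c| c.scores.len()).min().unwrap_or(0);
    // Best eligible score per eval.
    let best: Vec<f64> = (0..evals)
        .map(|i| {
            eligible
                .iter()
                .map(|c| c.scores[i])
                .fold(f64::NEG_INFINITY, f64::max)
        })
        .collect();
    eligible
        .into_iter()
        .filter(|c| (0..evals).any(|i| c.scores[i] >= best[i]))
        .collect()
}

fn candidate(name: &str, scores: Vec<f64>, violates_constitution: bool) -> Candidate {
    Candidate {
        name: name.to_string(),
        instructions: String::new(),
        demos: Vec::new(),
        scores,
        violates_constitution,
    }
}

fn main() {
    let candidates = vec![
        candidate("a", vec![0.9, 0.2], false),
        candidate("b", vec![0.5, 0.8], false),
        candidate("c", vec![0.4, 0.1], false),
        candidate("d", vec![1.0, 1.0], true),
    ];
    for c in frontier(&candidates) {
        println!("on frontier: {}", c.name);
    }
}
```

The function returns every surviving branch rather than a single winner, which matches the rule that the maintainer, not the tool, picks the frontier point.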

## v0.8.45

- Close the current control-plane and workbench issues before the broader
  fan-out begins: #1982, #2027, #2032, #2016, and #2034.
- Keep `AGENTS.md` and `CLAUDE.md` maintainer-local. `AGENTS.md` is ignored
  from this milestone forward.
- Land the RLM symbolic-object substrate: active prompt, session metadata,
  transcript, latest user message, and per-message refs are named objects that
  RLM can open without copying raw prompt/history text into the parent
  transcript.

## v0.8.46

- Generalize Fin into a structured-feedback verifier substrate.
- Add first replay-eval definitions harvested from existing trajectories.
- Scaffold the Repeatability Score footer slot as pending until evals populate
  it.
- Add module artifact schema v0 as Rust types only.
- Draft the "Compiled Word" constitution article.

## v0.8.47

- Promote Hugging Face as a first-class provider through Inference Providers
  and Router.
- Add deterministic RLM replay: context snapshot, seed, child model IDs, and
  temperatures.
- Route large logs and payloads to RLM workbench sessions instead of the
  parent transcript.
- Add sub-query memoization keyed by prompt, context hash, and model.
- Enforce RLM budgets at the Rust registry layer: depth, calls, wall time, and
  cost.
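
A minimal sketch of the last two v0.8.47 items, assuming simple in-memory structures. All type and field names are hypothetical, the cost unit (micro-USD) is an assumption, and nothing here reflects the actual registry code.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Limits for one RLM run: depth, calls, wall time, and cost.
#[derive(Debug, Clone, Copy)]
struct Budget {
    max_depth: u32,
    max_calls: u32,
    max_wall: Duration,
    max_cost_microusd: u64,
}

#[derive(Debug, Clone, Copy, Default)]
struct Usage {
    depth: u32,
    calls: u32,
    wall: Duration,
    cost_microusd: u64,
}

#[derive(Debug, PartialEq)]
enum BudgetExceeded {
    Depth,
    Calls,
    WallTime,
    Cost,
}

/// Reject the next sub-query as soon as any single limit is exceeded.
fn check_budget(budget: &Budget, usage: &Usage) -> Result<(), BudgetExceeded> {
    if usage.depth > budget.max_depth {
        return Err(BudgetExceeded::Depth);
    }
    if usage.calls > budget.max_calls {
        return Err(BudgetExceeded::Calls);
    }
    if usage.wall > budget.max_wall {
        return Err(BudgetExceeded::WallTime);
    }
    if usage.cost_microusd > budget.max_cost_microusd {
        return Err(BudgetExceeded::Cost);
    }
    Ok(())
}

/// Memo key: prompt, context hash, and model, as listed above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MemoKey {
    prompt: String,
    context_hash: u64,
    model: String,
}

#[derive(Default)]
struct Memo {
    entries: HashMap<MemoKey, String>,
}

impl Memo {
    /// Return the cached answer, or run the sub-query and cache its result.
    /// The boolean reports whether the answer came from the cache.
    fn get_or_run(&mut self, key: MemoKey, run: impl FnOnce() -> String) -> (String, bool) {
        if let Some(hit) = self.entries.get(&key) {
            return (hit.clone(), true);
        }
        let answer = run();
        self.entries.insert(key, answer.clone());
        (answer, false)
    }
}

fn main() {
    let budget = Budget {
        max_depth: 2,
        max_calls: 10,
        max_wall: Duration::from_secs(60),
        max_cost_microusd: 500_000,
    };
    let usage = Usage { depth: 3, ..Usage::default() };
    println!("budget check: {:?}", check_budget(&budget, &usage));

    let mut memo = Memo::default();
    let key = MemoKey { prompt: "summarize".into(), context_hash: 7, model: "child".into() };
    println!("{:?}", memo.get_or_run(key.clone(), || "first answer".to_string()));
    println!("{:?}", memo.get_or_run(key, || "never computed".to_string()));
}
```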

## v0.8.48

- Remove the legacy `deepseek` and `deepseek-tui` shim binaries.
- Finish Docker and Homebrew rename cleanup.
- Populate Repeatability Score from a small offline eval suite that ships in
  core.

## v0.9.0

- Emit per-turn `trajectory.jsonl` as the trainset substrate.
- Add `codewhale replay <turn_id>` for deterministic replay.
- Render module artifacts from the `[[ ## field ## ]]` form through a Rust
  adapter.
- Land the eval pipeline: suites, replay evals, and measurement substrate.
- Add a `/compile` command stub that explains the offline loop.

## v0.10.0

- Add opt-in Model Lab workset installers for DSPy and GEPA. The default
  install keeps zero Python dependencies.
- Build the first offline compile pipeline: Rust harvests trainsets, a Python
  sidecar runs the optimizer, and CodeWhale emits a reviewed Module JSON
  artifact.
- Add the Compile TUI panel with Pareto frontier, lineage tree, and
  Land/Reject/Revise actions.
- Land the first optimized tool-description and agent-prompt artifacts through
  PRs. Constitution clauses remain pinned outside the optimized region.
- Add whale-species module passports, for example
  `Sei: codewhale-agent-prompt.v0.10.0-gepa-1`.

## Trust Boundary

Compilation is offline. Runtime consumes reviewed JSON artifacts. Online
closed-loop optimization is out of scope because adversarial users could game a
live coding harness. Any workset can fail independently without dragging the
release, the core runtime, or other Pareto branches with it.
@@ -1,61 +0,0 @@
# v0.7.5 Implementation Plan

Scope: background shell job UX, in-TUI MCP management/discovery, and V4
context/cache policy. Do not include provider expansion or Whalescale
rename/migration work in this release lane.

## Context/cache decision

Default path:

- Keep the transcript append-only and preserve the stable prefix for DeepSeek V4 cache reuse.
- Disable replacement-style `auto_compact` by default.
- Keep replacement compaction manual or late: if a user enables `auto_compact`, V4 compacts only near the 80% model-window guard (`800000` tokens for 1M-context models), not at reasoning-effort soft caps.
- Keep the Flash seam manager (`[context].enabled`) opt-in until issue #200 has repeatable cache-hit/miss evidence.
- Keep the capacity controller disabled by default. Treat it as telemetry or an experimental guardrail unless `capacity.enabled = true` is set.
- Use emergency overflow recovery only when the request would otherwise exceed the model input budget.

Rationale: V4's 1M-token window and prefix-cache economics make early
replacement compaction suspect. The first shippable slice should prevent old
128K-era heuristics from rewriting context before there is evidence that the
rewrite is cheaper and more reliable than preserving a hot prefix.
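
The late-compaction rule can be expressed as a small predicate. This is a sketch, not the shipped policy code: the function names are hypothetical, the window is taken as exactly 1,000,000 tokens to match the `800000` figure above, and treating the guard as inclusive is an assumption.

```rust
/// Percentage of the model window at which replacement compaction may run.
const GUARD_PERCENT: u64 = 80;

/// Token count of the model-window guard for a given window size.
fn compaction_guard(model_window_tokens: u64) -> u64 {
    model_window_tokens * GUARD_PERCENT / 100
}

/// Compact only when the user opted in AND usage has reached the guard.
/// Reasoning-effort soft caps deliberately play no role here.
fn should_compact(auto_compact: bool, used_tokens: u64, model_window_tokens: u64) -> bool {
    auto_compact && used_tokens >= compaction_guard(model_window_tokens)
}

fn main() {
    let window = 1_000_000;
    println!("guard = {} tokens", compaction_guard(window));
    println!("default config at 900k: {}", should_compact(false, 900_000, window));
    println!("opted in at 900k: {}", should_compact(true, 900_000, window));
}
```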

## Shippable slices

### Slice 1: Context policy and docs

- Change default `auto_compact` to off.
- Keep V4 replacement-compaction thresholds late and independent of reasoning effort.
- Make `[context].enabled` default to false.
- Make `docs/CONFIGURATION.md`, `docs/capacity_controller.md`, and `config.example.toml` match code defaults.
- Add focused tests for defaults and V4 threshold behavior.

### Slice 2: Background shell job center (#195)

- Add a job-center view fed by `ShellManager::list()`.
- Show command, cwd, linked task id when available, status, elapsed time, exit code, and latest output.
- Add controls to inspect full output, poll latest output, send stdin for PTY/stdin-capable jobs, kill a background job, and attach completed output as task evidence.
- Mark restart-stale jobs explicitly rather than presenting them as live.
- Add lifecycle tests for start, poll, cancel, complete, stale/restart, plus TUI snapshots for running and completed job details.

### Slice 3: MCP manager (#196)

- Add `/mcp` or a command-palette action that opens an MCP manager view.
- Show resolved config path, server enabled/disabled state, transport, command/url, timeout settings, startup errors, and discovered tool/resource/prompt counts.
- Wire `mcp_config_path` into the interactive config surface.
- Support init, add stdio server, add HTTP/SSE server, enable, disable, remove, validate, reconnect, and inspect tools/resources/prompts.
- Preserve both `servers` and `mcpServers` config shapes.

### Slice 4: MCP discoverability (#197)

- Add an MCP command-palette section backed by the same discovery state as the manager.
- Group tools/resources/prompts by server.
- Show disabled/failed servers without blocking palette rendering.
- Keep model-visible names consistent with `mcp_<server>_<tool>`.
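
One way to keep the `mcp_<server>_<tool>` names consistent between the manager and the palette is to build them in a single place and look them up through an index. The sketch below is hypothetical: the function names and the example server and tool names are assumptions.

```rust
use std::collections::HashMap;

/// Build the model-visible name for a discovered MCP tool.
fn model_visible_name(server: &str, tool: &str) -> String {
    format!("mcp_{server}_{tool}")
}

/// Index discovered `(server, tool)` pairs by model-visible name.
/// A lookup table avoids parsing the name back apart, which would be
/// ambiguous when a server or tool name itself contains `_`.
fn build_index<'a>(discovered: &[(&'a str, &'a str)]) -> HashMap<String, (&'a str, &'a str)> {
    discovered
        .iter()
        .map(|&(server, tool)| (model_visible_name(server, tool), (server, tool)))
        .collect()
}

fn main() {
    // Example names only; they are not taken from any real configuration.
    let index = build_index(&[("github", "search_issues"), ("files", "read")]);
    println!("{:?}", index.get("mcp_github_search_issues"));
}
```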

## Stop rules

- Do not close #159 or #162 unless a verified PR actually resolves them.
- Do not add provider expansion.
- Do not rename or migrate anything to Whalescale.
- Do not broaden the TUI into a large redesign; each slice should remain independently testable and shippable.