docs: v0.8.53 tool-surface-diet design + north-star direction

Design-only deliverables for the v0.8.53 "tool surface diet / canonical surfaces" cutover (no catalog code in this cycle). Grounded in a verified inventory of the actual tool registry.

- docs/TOOL_LIFECYCLE.md (#2681): the umbrella policy. Five lifecycle states (active / deferred / hidden-compatibility / deprecated / removed) modeled as const name-sets + an alias table in tool_catalog.rs (not a per-ToolSpec field), so registration stays untouched and old transcripts always replay. Includes the deprecation manifest (exec_wait/exec_interact/tts → hidden-compat; todo_* → checklist_* deprecated; 11 legacy subagent names are already non-visible dead code → cleanup + guardrail), per-mode/per-provider active-catalog budget (incl. Arcee's 8-tool first-turn set), prefix-cache safety rules, and the tool_agent decision: canonical but DeepSeek-V4-gated.
- docs/CODEBASE_SEARCH_DESIGN.md (#2680, v0.9.0): local-first FTS5/BM25 + symbol/path ranking + RRF hybrid; rusqlite storage; mtime/branch/vendor invalidation; an explainable tool contract returning reasons[]; and a real CodeWhale query eval set. Complements grep_files/file_search, never replaces them.
- docs/SKILL_INVOCATION_DESIGN.md (0.9.0): the $<skill-name> inline invocation syntax (the token IS the skill name), namespaced resolution, ambiguity-suggests-not-guesses, visible activation line, and a smallest-viable slice.
- docs/VISION_NORTH_STAR.md (0.9.0+): intent router, hybrid codebase intelligence, WhaleFlow typed workflow IR, skills/rules runtime, the layered context-memory stack, tool repair/autoload, the evaluation loop, and the command-surface taxonomy (/memory small · /context dashboard · /rules · /workflow · /overlay · $<skill> · codebase_search). Marked DIRECTION, not committed 0.8.53 work; also records the deferred-not-done diet items.

Targets codex/v0.8.53.
2026-06-03 11:47:29 -07:00
parent 03d1bba538
commit 8cb4f94f30
4 changed files with 1371 additions and 0 deletions
@@ -0,0 +1,472 @@
+# CodeWhale North Star (0.9.0+)
+
+> **STATUS: DIRECTION, NOT COMMITTED WORK.**
+> Everything in this document is the maintainer's intended *direction* for
+> CodeWhale 0.9.0 and beyond. **None of it is committed 0.8.53 work.** The
+> 0.8.53 cycle ships **design docs only** for these areas — no tool-catalog code
+> lands this cycle except the small, already-scoped subagent/git/RLM fixes in
+> PR #2684 and PR #2685. Treat every "rough shape" below as a sketch to be
+> refined, not an API contract. Where this doc names tools that do not exist yet
+> (`codebase_search`, `read_file` as a canonical alias, `agent_run`, etc.) those
+> are **aspirational names** that will *map onto today's tools*; see each
+> section.
+
+## Why this document exists
+
+The vision is at risk of being lost between point releases. CodeWhale is
+accumulating capability (subagents, RLM, skills, workflows, an enormous tool
+catalog) faster than it is accumulating *shape*. This is the north star that the
+incremental 0.8.x stabilization work is steering toward, written down once so it
+survives the next dozen PRs.
+
+### The one principle
+
+**The harness handles memory, search, routing, state, and guardrails so a
+weaker model can just *think*.** Every design decision below is in service of
+moving cognitive load *out* of the model and *into* the harness. A
+`deepseek-v4-flash`-class model should not have to remember ~80 tool names, hold
+the codebase index in its head, track which layer of memory a fact lives in, or
+re-derive a recovery path after a malformed tool call. The harness does that.
+The model decides *what it wants*; the harness figures out *how*.
+
+---
+
+## Ground-truth anchor (today's reality)
+
+To keep the direction honest about where it starts, this is the state of the
+tool registry today:
+
+- **Active first-turn tool set** is `DEFAULT_ACTIVE_NATIVE_TOOLS`
+  (`crates/tui/src/core/engine/tool_catalog.rs:37-64`) — 26 tools. Everything
+  else is **deferred** and hydrates via `tool_search_tool_regex` /
+  `tool_search_tool_bm25` (`tool_catalog.rs:26-35`).
+- **Catalog-head byte-stability is a hard invariant** for DeepSeek's KV
+  prefix cache (`tool_catalog.rs:169-196`). The active first-turn tool block
+  must stay byte-identical run-to-run; any change to it is a **one-time,
+  deterministic edit**, never a per-turn or per-mode mutation.
+- **Arcee** narrows the first turn to 8 read-only tools
+  (`ARCEE_FIRST_TURN_NATIVE_TOOLS`, `tool_catalog.rs:106-115`) as a Cloudflare
+  WAF workaround — proof the active partition is already provider-shaped.
+- **Subagent tools that are model-visible:** only `agent_open`, `agent_eval`,
+  `tool_agent`, `agent_close` (`crates/tui/src/tools/registry.rs:1017-1029`).
+  All legacy names (`agent_spawn`, `spawn_agent`, `agent_result`, `agent_wait`,
+  `agent_send_input`, `agent_assign`, `agent_list`, `agent_cancel`,
+  `resume_agent`, `delegate_to_agent`, …) are `#[allow(dead_code)]` structs in
+  `crates/tui/src/tools/subagent/mod.rs`, never instantiated outside tests →
+  **already not model-visible**. The live internal `send_input` / `cancel` /
+  `resume` methods on `SubAgentManager` (`mod.rs:1495,1521,1605`) back
+  `agent_eval` / `agent_close` and **stay**.
+- **`tool_agent` is "Fin"** — the experimental fast-lane executor: DeepSeek V4
+  Flash with thinking forced off (`mod.rs:5233`, `TOOL_AGENT_INTRO`;
+  `DEFAULT_CHILD_MODEL = "deepseek-v4-flash"`, `rlm.rs:26`).
+- **Known duplicates today:** `exec_wait ≡ exec_shell_wait`,
+  `exec_interact ≡ exec_shell_interact` (same structs, all four in the active
+  set), `tts ≡ speech` (both deferred). `todo_*` are deferred twins of
+  `checklist_*` (same `TodoWriteTool`, `::new` vs `::checklist`,
+  `todo.rs:187,194`). The router already unifies `exec_wait`/`exec_shell_wait`
+  (`crates/tui/src/tui/tool_routing.rs:1139-1140`).
+
+This is the surface the north star refactors *toward simplicity*.
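
The byte-stability invariant above lends itself to a guardrail test. The following is a minimal sketch, not the project's implementation: `catalog_head_bytes` and `catalog_head_fingerprint` are hypothetical names, and joining tool names with newlines is only a stand-in for however the real first-turn tool block is serialized. FNV-1a is used because its output is fully specified, so a recorded fingerprint would not shift between Rust releases.

```rust
/// FNV-1a 64-bit: a tiny, fully specified hash, so the fingerprint does not
/// depend on the Rust release (unlike `std`'s `DefaultHasher`).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical stand-in for the serialized first-turn tool block. Names are
/// joined in declaration order, because order is part of the cached prefix.
fn catalog_head_bytes(active_tools: &[&str]) -> Vec<u8> {
    active_tools.join("\n").into_bytes()
}

/// Fingerprint a guardrail test could compare against a recorded value.
fn catalog_head_fingerprint(active_tools: &[&str]) -> u64 {
    fnv1a64(&catalog_head_bytes(active_tools))
}
```

A test built on this would record the fingerprint once per deterministic head edit and fail on any other change, including a reordering of the same names.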
+
+---
+
+## 1. Intent Router
+
+**What it is.** A thin layer where the model declares an **intent** —
+*search / inspect / edit / test / delegate / ask-user / run-shell /
+run-workflow* — and the harness maps that intent to the correct low-level tool
+and arguments. The model picks from a tiny, stable verb vocabulary instead of
+recalling ~80 concrete tool names and their schemas.
+
+**Why it helps weaker models.** Tool-name recall is one of the largest sources
+of wasted turns for small models: choosing a deferred tool (double-invoke),
+choosing a deprecated alias, or hallucinating a name. A fixed intent vocabulary
+collapses that decision space to ~10 verbs. The model spends its budget on
+*reasoning about the task*, not on *remembering the API*.
+
+**Rough shape.** A small **canonical visible set** — aspirational names that
+route onto today's tools:
+
+| Intent verb (aspirational) | Routes onto today |
+|---|---|
+| `codebase_search` | concept-level retrieval over the hybrid index (§2); today: `grep_files` + `file_search` + `project_map` |
+| `read_file` | `read_file` (already canonical) |
+| `apply_patch` | `apply_patch` (canonical; `edit_file`/`write_file`/`fim_edit` remain as distinct lower-level tools) |
+| `run_tests` | `run_tests` / `run_verifiers` |
+| `git_status` | `git_status` |
+| `git_diff` | `git_diff` |
+| `work_update` | `update_plan` / `checklist_write` |
+| `ask_user` | `request_user_input` |
+| `shell_run` | `exec_shell` (canonical; `exec_wait`/`exec_interact` slated for hidden-compatibility, §3) |
+| `agent_run` | `agent_open` / `tool_agent` (gated, §3) / `agent_eval` / `agent_close` |
+| `workflow_run` | WhaleFlow runner (§4) |
+
+The router is the *only* place the catalog's full complexity is allowed to live.
+It is also where **tool repair** (§7) hooks in: a mis-stated intent or a
+deferred/deprecated name is rewritten to the canonical route.
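
The routing table above can be sketched as a closed enum plus a lookup. This is an illustration under assumptions: the `Intent` type and its `parse` / `routes` methods are hypothetical, and a real router would also have to choose among the candidates and build arguments, which this sketch does not attempt.

```rust
/// The aspirational intent verbs from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Intent {
    CodebaseSearch,
    ReadFile,
    ApplyPatch,
    RunTests,
    GitStatus,
    GitDiff,
    WorkUpdate,
    AskUser,
    ShellRun,
    AgentRun,
    WorkflowRun,
}

impl Intent {
    /// Parse the verb the model emitted; unknown verbs return `None` and
    /// would be handed to tool repair.
    fn parse(verb: &str) -> Option<Intent> {
        Some(match verb {
            "codebase_search" => Intent::CodebaseSearch,
            "read_file" => Intent::ReadFile,
            "apply_patch" => Intent::ApplyPatch,
            "run_tests" => Intent::RunTests,
            "git_status" => Intent::GitStatus,
            "git_diff" => Intent::GitDiff,
            "work_update" => Intent::WorkUpdate,
            "ask_user" => Intent::AskUser,
            "shell_run" => Intent::ShellRun,
            "agent_run" => Intent::AgentRun,
            "workflow_run" => Intent::WorkflowRun,
            _ => return None,
        })
    }

    /// Today's tools each verb routes onto (candidates, in table order).
    fn routes(self) -> &'static [&'static str] {
        match self {
            Intent::CodebaseSearch => &["grep_files", "file_search", "project_map"],
            Intent::ReadFile => &["read_file"],
            Intent::ApplyPatch => &["apply_patch"],
            Intent::RunTests => &["run_tests", "run_verifiers"],
            Intent::GitStatus => &["git_status"],
            Intent::GitDiff => &["git_diff"],
            Intent::WorkUpdate => &["update_plan", "checklist_write"],
            Intent::AskUser => &["request_user_input"],
            Intent::ShellRun => &["exec_shell"],
            Intent::AgentRun => &["agent_open", "tool_agent", "agent_eval", "agent_close"],
            // The WhaleFlow runner does not exist yet, so nothing to route onto.
            Intent::WorkflowRun => &[],
        }
    }
}
```

Keeping the verb set a closed enum means that adding a verb is a compile-time change, which fits the requirement that the visible set is a one-time deterministic edit.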
+
+**Dependencies.** The small canonical surface (§3), the lifecycle alias table
+(§3 / `docs/TOOL_LIFECYCLE.md`), and the hybrid index for `codebase_search`
+(§2). Must respect the **catalog-head byte-stability invariant**: the visible
+verb set is itself a one-time deterministic edit, not a dynamic per-turn list.
+
+---
+
+## 2. Default Hybrid Codebase Intelligence
+
+**What it is.** An always-on, local-first codebase index that ships with the
+harness — not an opt-in tool the model has to remember to build. It fuses:
+
+- plain **text** search,
+- **symbol** index (definitions/references),
+- **import / call graph**,
+- **FTS5 + BM25** lexical ranking (rusqlite is already a dependency —
+  `Cargo.toml`),
+- **sparse** retrieval,
+- optional **dense** (embedding) retrieval,
+- **PR / commit / issue history** as a first-class retrieval source,
+- a **codemap** (structural overview, the successor to today's deferred
+  `project_map`).
+
+**Why it helps weaker models.** Today the model must orchestrate `grep_files`
+(content), `file_search` (filename), and `project_map` (structure) by hand,
+reconcile their outputs, and re-run them as it narrows. There is **no FTS5/BM25
+or semantic index today** — every search is a cold walk (`file_search` uses the
+`ignore` crate's `WalkBuilder` for vendor exclusion, `file_search.rs:~210`). A
+weaker model burns turns stitching partial results. A single `codebase_search`
+intent backed by a hybrid index returns ranked, concept-level hits in one call,
+so the model reasons about *answers*, not *query mechanics*.
+
+**Rough shape.** A background indexer maintains a SQLite store (FTS5 + symbol +
+graph tables), refreshed on file change and on git events. `codebase_search`
+(§1) queries it; the codemap is regenerated incrementally. Vendor exclusion
+reuses the existing `ignore`/`WalkBuilder` path.
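
The commit message names reciprocal rank fusion (RRF) as the hybrid step for combining rankers. A minimal sketch of that fusion follows; `rrf_fuse` is a hypothetical name, the inputs are assumed to be already-ranked path lists (for example one from BM25 and one from the symbol index), and `k = 60` in the usage note is the conventional default rather than a value taken from the design docs.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each ranker contributes 1 / (k + rank) for every
/// path it returns, with 1-based ranks. Paths found by several rankers
/// accumulate score and rise to the top.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, path) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(*path).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(path, score)| (path.to_string(), score))
        .collect();
    // Highest score first; ties broken by path so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With `k = 60`, a lexical ranking of `a.rs, b.rs, c.rs` fused with a symbol ranking of `b.rs, d.rs, a.rs` puts `b.rs` first, because it ranks high in both lists.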
+
+**Dependencies.** rusqlite/FTS5; the Intent Router (§1) for the
+`codebase_search` verb; the trace store (§6/§8) for history retrieval. **Full
+design lives in `docs/CODEBASE_SEARCH_DESIGN.md`** (written this cycle).
+
+---
+
+## 3. Small Canonical Tool Surface
+
+**What it is.** A deliberately tiny set of always-visible canonical tools;
+**everything else is hidden, deferred, or skill-scoped**. The catalog grows
+behind the scenes but the *visible* surface stays small and stable.
+
+**Why it helps weaker models.** Fewer choices, no aliases competing for the same
+job, no deferred double-invokes for common operations. The model sees the verbs
+it needs and nothing else.
+
+**Rough shape — tool lifecycle states.** Five states, represented as **const
+name-sets plus an alias table in `tool_catalog.rs`** (NOT a per-`ToolSpec`
+field, to preserve the byte-stable head):
+
+1. **active** — in the first-turn catalog head.
+2. **deferred** — registered, hydrated via tool-search.
+3. **hidden-compatibility** — registered + dispatchable, **dropped from both
+   active and search**, identical behavior, **no notice**. (For exact
+   duplicates that should simply disappear from discovery.)
+4. **deprecated** — registered + dispatchable, **dropped from search**, appends
+   a *replacement notice to RESULT METADATA only* — **never** to the cached
+   prefix.
+5. **removed** — final state; no longer registered.
+
+**Invariant:** deprecated and hidden-compatibility tools **stay registered and
+dispatchable forever** so old transcripts always replay deterministically.
+
+**Planned diet (documented this cycle, not yet coded):**
+
+- `exec_wait`, `exec_interact`, `tts` → **hidden-compatibility** (exact
+  duplicates of `exec_shell_wait`, `exec_shell_interact`, `speech`).
+- `todo_*` (`todo_write/add/update/list`) → **deprecated**, replaced by
+  `checklist_*` (drop from tool-search, keep registered, add result-metadata
+  notice).
+- Legacy subagent names → already hidden; remaining work is **cleanup +
+  guardrail tests**, rebased on PR #2684.
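
The lifecycle states and the planned diet could be sketched as const name-sets plus an alias table, along the lines described above. Everything here is illustrative: the constant and function names are hypothetical, and the `checklist_add` / `checklist_update` / `checklist_list` targets are assumed by analogy with `checklist_write`, the only `checklist_*` name this document spells out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Active,
    Deferred,
    HiddenCompatibility,
    Deprecated,
    Removed,
}

const HIDDEN_COMPATIBILITY: &[&str] = &["exec_wait", "exec_interact", "tts"];
const DEPRECATED: &[&str] = &["todo_write", "todo_add", "todo_update", "todo_list"];
const REMOVED: &[&str] = &[]; // nothing is removed in the planned diet

/// Alias table: (old name, canonical name).
const ALIASES: &[(&str, &str)] = &[
    ("exec_wait", "exec_shell_wait"),
    ("exec_interact", "exec_shell_interact"),
    ("tts", "speech"),
    ("todo_write", "checklist_write"),
    // Assumed: the remaining checklist_* twins keep the same suffixes.
    ("todo_add", "checklist_add"),
    ("todo_update", "checklist_update"),
    ("todo_list", "checklist_list"),
];

/// Classify a registered tool; `active_head` is the first-turn catalog head.
fn lifecycle_of(name: &str, active_head: &[&str]) -> Lifecycle {
    if REMOVED.contains(&name) {
        Lifecycle::Removed
    } else if HIDDEN_COMPATIBILITY.contains(&name) {
        Lifecycle::HiddenCompatibility
    } else if DEPRECATED.contains(&name) {
        Lifecycle::Deprecated
    } else if active_head.contains(&name) {
        Lifecycle::Active
    } else {
        Lifecycle::Deferred
    }
}

/// Resolve an old name to its canonical tool; other names pass through.
fn canonical_name(name: &str) -> &str {
    ALIASES
        .iter()
        .find(|(old, _)| *old == name)
        .map_or(name, |(_, canonical)| *canonical)
}

/// Only active and deferred tools are discoverable via tool-search.
fn is_searchable(state: Lifecycle) -> bool {
    matches!(state, Lifecycle::Active | Lifecycle::Deferred)
}

/// Everything except `Removed` stays dispatchable so old transcripts replay.
fn is_dispatchable(state: Lifecycle) -> bool {
    state != Lifecycle::Removed
}
```

Because the state lives in name-sets rather than on each tool spec, moving a tool between states would not touch the serialized spec.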
+
+**Explicitly NOT touched** (distinct niches, per #2681 non-goals) — doc-only
+canonical guidance, no diet: `apply_patch` / `edit_file` / `write_file` /
+`fim_edit`; `grep_files` / `file_search` / `project_map`; `fetch_url` /
+`web.run` / `web_search`; `task_shell_*`; `handle_read` /
+`retrieve_tool_result`.
+
+**`tool_agent` gating decision.** `tool_agent` ("Fin") **stays** as a canonical
+subagent tool, but is **gated to DeepSeek-V4 models only**. It is the fast,
+non-thinking executor lane built on `deepseek-v4-flash`; offering it to other
+providers/models is meaningless (the lane *is* a specific model) and would just
+add a name to recall. The gate is provider/model-conditional in the same spirit
+as the Arcee first-turn narrowing.
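
One way the gate could look, as a sketch only: the `deepseek-v4` prefix check is an assumption about how model IDs are matched (this document only names `deepseek-v4-flash`), and both function names are hypothetical.

```rust
/// Assumed gate: any model ID starting with `deepseek-v4` counts as a
/// DeepSeek-V4 model. The real matching rule is not specified here.
fn tool_agent_visible(model_id: &str) -> bool {
    model_id.to_ascii_lowercase().starts_with("deepseek-v4")
}

/// The model-visible subagent tools for a given model, in a fixed order.
fn visible_subagent_tools(model_id: &str) -> Vec<&'static str> {
    let mut tools = vec!["agent_open", "agent_eval"];
    if tool_agent_visible(model_id) {
        tools.push("tool_agent");
    }
    tools.push("agent_close");
    tools
}
```

Since the result depends only on the model ID, a given model would still see the same list on every run, which is what the prefix-cache invariant needs.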
+
+**Dependencies.** The alias table backs the Intent Router (§1) and Tool Repair
+(§7). **Full spec in `docs/TOOL_LIFECYCLE.md`** (written this cycle).
+
+---
+
+## 4. WhaleFlow / Workflow Mode
+
+**What it is.** A typed, multi-agent **workflow runner**. A workflow is a graph
+of typed nodes — **branches, leaves, reviewers, verifiers, test-runners,
+PR-creators**, with **trace-replay** and a **progress-monitor**. Authors write
+workflows in **Starlark or YAML**, which compile to a **typed Rust IR**; the
+**Rust executor** runs the IR. "Like Claude's workflow mode, but safer" — the
+safety comes from the typed IR and Rust execution boundary rather than free-form
+model-driven orchestration.
+
+**Why it helps weaker models.** Long-running, multi-step work (implement →
+review → verify → test → open PR) is exactly where weaker models drift, lose
+state, or skip verification. Encoding the *process* as a typed graph means the
+model only has to be competent at each *leaf*, while the harness guarantees the
+sequencing, the verification gates, and the evidence trail.
+
+**Rough shape.** Starlark/YAML → typed IR → Rust executor. Nodes map to
+subagent lanes (`agent_open` / `tool_agent` / `agent_eval` / `agent_close`,
+`registry.rs:1017-1029`). Reviewer/verifier/test-runner nodes are first-class
+node *types*, not ad-hoc prompts. Every run emits a trace (→ §8). Surfaced via
+`/workflow` (alias `/whaleflow`) and the `workflow_run` intent (§1).
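
To make "typed IR" concrete, here is a minimal sketch of what a compiled workflow graph and its scheduler might look like. All names (`NodeKind`, `Node`, `schedule`, `has_verification_gate`) are hypothetical, and the gate rule, that every PR-creator must directly depend on a verifier or test-runner, is an assumed example of a guarantee the executor could enforce rather than a documented one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeKind {
    Branch,
    Leaf,
    Reviewer,
    Verifier,
    TestRunner,
    PrCreator,
}

#[derive(Debug)]
struct Node {
    id: &'static str,
    kind: NodeKind,
    depends_on: Vec<&'static str>,
}

/// Return an execution order in which every node runs after its
/// dependencies, or an error for unknown dependencies and cycles.
fn schedule(nodes: &[Node]) -> Result<Vec<&'static str>, String> {
    for node in nodes {
        for dep in &node.depends_on {
            if !nodes.iter().any(|n| n.id == *dep) {
                return Err(format!("node `{}` depends on unknown node `{}`", node.id, dep));
            }
        }
    }
    let mut order: Vec<&'static str> = Vec::new();
    while order.len() < nodes.len() {
        // Pick the first unscheduled node whose dependencies are all done.
        let ready = nodes.iter().find(|n| {
            !order.contains(&n.id) && n.depends_on.iter().all(|d| order.contains(d))
        });
        match ready {
            Some(node) => order.push(node.id),
            None => return Err("workflow graph contains a cycle".to_string()),
        }
    }
    Ok(order)
}

/// Assumed gate: every PR-creator directly depends on a verifier or
/// test-runner node.
fn has_verification_gate(nodes: &[Node]) -> bool {
    nodes
        .iter()
        .filter(|n| n.kind == NodeKind::PrCreator)
        .all(|pr| {
            pr.depends_on.iter().any(|dep| {
                nodes.iter().any(|n| {
                    n.id == *dep && matches!(n.kind, NodeKind::Verifier | NodeKind::TestRunner)
                })
            })
        })
}
```

The point of the typed form is that checks like these run before any model is invoked, so a workflow that skips verification is rejected as data rather than discovered mid-run.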
+
+**Dependencies.** Subagent runtime; the evaluation loop (§8) for traces;
+Skills & Rules (§5) so a skill can *define* a workflow; the command taxonomy
+(§9).
+
+---
+
+## 5. Skills & Rules as First-Class Runtime
+
+**What it is.** Skills and rules become real runtime objects, not just prompt
+text. Skills gain **activation modes**:
+
+- **always-on** — injected every turn,
+- **glob** — activated when matching files are in scope,
+- **model-decision** — offered to the model to opt into,
+- **manual** — only via explicit `$<skill-name>` invocation (§9).
+
+Skills can **restrict the tool surface**, **define workflows** (§4), and
+**inject repo context**.
+
+**Why it helps weaker models.** A skill scoped to a task can shrink the tool
+surface to exactly what that task needs and pre-load the relevant rules and
+context — so the model operates inside a curated, smaller world instead of the
+full catalog.
+
+**Rough shape (vs. today).** Today: skills are discovered
+(`crates/tui/src/tools/skills/mod.rs`, `discover_in_workspace ~421`; struct
+parses name/description `~382-388`), enable-state is tracked
+(`skill_state.rs`, `SkillStateStore::is_enabled ~73`), and there's an
+inline-mention popup (`slash_menu.rs ~86`). **But:** no parser activates inline
+`$` mentions on submit (submit path: `ui.rs build_queued_message ~4721`), there
+is **no activation-mode concept**, and **skills cannot restrict tools**. The
+direction adds (a) a submit-time `$<skill-name>` activation parser, (b) the
+four activation modes in skill metadata, and (c) a tool-restriction field
+enforced by the registry/router.
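
The four activation modes and the tool-restriction field could be modeled roughly as below. This is a sketch under several assumptions: the type and field names are hypothetical, an explicit `$` invocation is assumed to activate a skill regardless of its mode, multiple restricting skills are assumed to intersect, and the glob matcher handles only `*suffix` patterns as a stand-in for a real glob implementation.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Activation {
    AlwaysOn,
    Glob(&'static str),
    ModelDecision,
    Manual,
}

struct Skill {
    name: &'static str,
    activation: Activation,
    /// `None` means the skill does not restrict the tool surface.
    allowed_tools: Option<&'static [&'static str]>,
}

/// What the harness knows about the current turn (illustrative fields).
struct Turn<'a> {
    files_in_scope: &'a [&'a str],
    model_opted_in: &'a [&'a str],
    invoked: &'a [&'a str],
}

/// Stand-in matcher: supports `*suffix` and exact paths only.
fn glob_matches(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix('*') {
        Some(suffix) => path.ends_with(suffix),
        None => pattern == path,
    }
}

fn is_active(skill: &Skill, turn: &Turn) -> bool {
    // Assumption: an explicit `$<skill-name>` always wins.
    if turn.invoked.contains(&skill.name) {
        return true;
    }
    match &skill.activation {
        Activation::AlwaysOn => true,
        Activation::Glob(pattern) => {
            turn.files_in_scope.iter().any(|file| glob_matches(pattern, file))
        }
        Activation::ModelDecision => turn.model_opted_in.contains(&skill.name),
        Activation::Manual => false,
    }
}

/// Tools left visible once every active skill's restriction is applied.
fn visible_tools(catalog: &[&'static str], active: &[&Skill]) -> Vec<&'static str> {
    catalog
        .iter()
        .copied()
        .filter(|tool| {
            active
                .iter()
                .all(|skill| skill.allowed_tools.map_or(true, |allowed| allowed.contains(tool)))
        })
        .collect()
}
```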
+
+**Dependencies.** Tool lifecycle/alias table (§3) for restriction; Intent Router
+(§1); WhaleFlow (§4); command taxonomy (§9). **Full design in
+`docs/SKILL_INVOCATION_DESIGN.md`** (written this cycle).
+
+---
+
+## 6. Context Memory Stack
+
+**What it is.** Memory modeled as **explicit, layered, inspectable** stores
+rather than one undifferentiated blob. Each layer is **visible, inspectable,
+clearable, and scoped**:
+
+1. **User memory** — small user prefs/facts (surfaced via `/memory`, §9).
+2. **Repo rules** — checked-in guidance (`/rules`).
+3. **Codemap-wiki** — derived structural/semantic knowledge of the repo (§2).
+4. **Trace store** — recorded workflow/turn evidence (§8).
+5. **ARMH–RLM memo** — the RLM kernel's in-session working memory
+   (`rlm_open`/`rlm_eval`/`rlm_configure`/`rlm_close`/`rlm_session_objects`,
+   `crates/tui/src/tools/rlm.rs`; `handle_read` retrieves var handles;
+   `finalize`/`FINAL` is an *in-kernel Python function*, not a tool).
+6. **Cached-main overlay** — promoted lessons from the cached main branch
+   (`/overlay`, §9).
+7. **External memory (Aleph)** — large local data via the `aleph` skill.
+
+**Why it helps weaker models.** The model never has to *guess* where a fact
+should live or *re-derive* context it already established. Each layer has a
+clear scope and a clear command to inspect/clear it, so stale context is
+visible and removable rather than silently poisoning the prefix.
+
+**Rough shape.** A `/context` dashboard (§9) renders all active layers and their
+sizes; `/memory` manages the small user layer; `/overlay` manages promoted
+lessons. The RLM layer already exists and is plumbed through `rlm.rs`.
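
As a small illustration of "each layer has a clear command", the seven layers could be an enum that knows which command surface owns it. The names are hypothetical; the labels follow the `/context` example render in §9, and layers without a dedicated command are assumed to be reachable only through the `/context` dashboard.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryLayer {
    UserMemory,
    RepoRules,
    CodemapWiki,
    TraceStore,
    RlmMemo,
    CachedOverlay,
    AlephExternal,
}

impl MemoryLayer {
    /// All layers, in the order listed above.
    const ALL: [MemoryLayer; 7] = [
        MemoryLayer::UserMemory,
        MemoryLayer::RepoRules,
        MemoryLayer::CodemapWiki,
        MemoryLayer::TraceStore,
        MemoryLayer::RlmMemo,
        MemoryLayer::CachedOverlay,
        MemoryLayer::AlephExternal,
    ];

    /// Dashboard label, matching the `/context` example render.
    fn label(self) -> &'static str {
        match self {
            MemoryLayer::UserMemory => "user-memory",
            MemoryLayer::RepoRules => "repo-rules",
            MemoryLayer::CodemapWiki => "codemap-wiki",
            MemoryLayer::TraceStore => "trace-store",
            MemoryLayer::RlmMemo => "rlm-memo",
            MemoryLayer::CachedOverlay => "cached-overlay",
            MemoryLayer::AlephExternal => "aleph-external",
        }
    }

    /// The command that manages the layer; layers without a dedicated
    /// command are assumed to be managed from the `/context` dashboard.
    fn command(self) -> &'static str {
        match self {
            MemoryLayer::UserMemory => "/memory",
            MemoryLayer::RepoRules => "/rules",
            MemoryLayer::CachedOverlay => "/overlay",
            _ => "/context",
        }
    }
}
```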
+
+**Dependencies.** Command taxonomy (§9); codebase intelligence (§2); evaluation
+loop (§8) for promotion into the overlay.
+
+---
+
+## 7. Tool Repair & Autoload
+
+**What it is.** When the model emits a wrong, deferred, deprecated, or
+environment-blocked tool call, the harness **repairs** it instead of returning a
+bare error — and **autoloads** what's needed.
+
+**Why it helps weaker models.** Recovery from a malformed call is precisely
+where weak models loop or give up. Turning every failure into an actionable,
+schema-bearing correction keeps the model on-task.
+
+**Rough shape — representative repairs:**
+
+- **Wrong/legacy name** → *"you meant `agent_eval`; here's the schema"* (autoload
+  the deferred tool's schema in the same turn).
+- **Mode mismatch** → *"shell is unavailable in Plan mode — ask the user or
+  switch modes"*.
+- **Missing dependency** → *"this tool needs Node; Node is missing"*
+  (dependency probe via `ExternalTool`, already imported in `tool_catalog.rs`).
+- **Deprecated alias** → silently **routed to the canonical** tool, with the
+  replacement notice in **result metadata only** (§3) — never the cached prefix.
+
+**Dependencies.** The alias table + lifecycle states (§3); the Intent Router
+(§1); dependency detection (`ExternalTool`). Builds on PR #2685's actionable
+RLM/field errors and PR #2684's lifecycle signals — **must not contradict
+either**.
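
The repairs above can be read as a small decision function. The sketch below is one possible shape, with hypothetical names throughout (`ROUTES`, `CallContext`, `Repair`, `repair`); the dependency probe is reduced to a pre-computed `missing_dependency` field instead of modeling `ExternalTool`, and the guidance strings paraphrase the examples in the list.

```rust
/// (old name, canonical name, attach a replacement notice?)
const ROUTES: &[(&str, &str, bool)] = &[
    ("exec_wait", "exec_shell_wait", false), // hidden-compatibility: no notice
    ("todo_write", "checklist_write", true), // deprecated: notice in metadata
];

struct CallContext {
    plan_mode: bool,
    /// A dependency the probe found missing for this tool, if any.
    missing_dependency: Option<&'static str>,
}

#[derive(Debug, PartialEq, Eq)]
enum Repair {
    /// Run `tool`; the notice, if any, goes into result metadata only.
    Dispatch { tool: String, metadata_notice: Option<String> },
    /// Do not run; return actionable guidance to the model instead.
    Blocked { guidance: String },
}

fn repair(requested: &str, ctx: &CallContext) -> Repair {
    // Step 1: route old names onto the canonical tool.
    let route = ROUTES.iter().find(|(old, _, _)| *old == requested);
    let canonical = route.map_or(requested, |(_, new, _)| *new);

    // Step 2: mode mismatch.
    if ctx.plan_mode && canonical.starts_with("exec_shell") {
        return Repair::Blocked {
            guidance: "shell is unavailable in Plan mode; ask the user or switch modes"
                .to_string(),
        };
    }

    // Step 3: missing dependency.
    if let Some(dep) = ctx.missing_dependency {
        return Repair::Blocked {
            guidance: format!("this tool needs {dep}; {dep} is missing"),
        };
    }

    // Step 4: dispatch, attaching a notice only for deprecated names.
    let metadata_notice = match route {
        Some((old, new, true)) => Some(format!("`{old}` is deprecated; use `{new}`")),
        _ => None,
    };
    Repair::Dispatch { tool: canonical.to_string(), metadata_notice }
}
```

Returning the notice as a separate field, rather than splicing it into the tool output or prompt, is what keeps it out of the cached prefix.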
+
+---
+
+## 8. Evaluation Loop
+
+**What it is.** Every workflow run **leaves evidence**: the tests it ran, the
+diffs it produced, the failures it hit, the searches it issued, the claims it
+verified, and the PR outcome. A **teacher/student replay** turns *good* traces
+into reusable **rules, skills, tests, and cached guidance**.
+
+**Why it helps weaker models.** The system gets better at *this repo* over time
+without the model getting smarter. Verified good traces become rules/skills the
+weaker model can lean on next time, and become the source of the cached-main
+overlay (§6).
+
+**Rough shape.** Workflow nodes (§4) emit structured evidence into the trace
+store (§6). A replay/distillation pass (teacher reviews student trace) promotes
+high-value traces into: repo rules (`/rules`), skills (§5), regression tests,
+and overlay guidance (`/overlay`). Verified-claim tracking ties into the
+adversarial-verification posture already used elsewhere.
+
+**Dependencies.** WhaleFlow (§4) for trace emission; trace store + overlay (§6);
+Skills & Rules (§5) as promotion targets.
+
+---
+
+## 9. Command-Surface Taxonomy
+
+**What it is.** One name = **one thing**. The command surface is split so each
+prefix has a single, memorable responsibility:
+
+| Surface | Responsibility |
+|---|---|
+| `/memory` | **Small** user prefs/facts only |
+| `/context` | **Dashboard** of all active memory layers (§6) |
+| `/rules` | Repo guidance |
+| `/workflow` (`/whaleflow`) | Long-running multi-agent runs (§4) |
+| `/overlay` | Promoted cached-main lessons (§6/§8) |
+| `$<skill-name>` | Skill invocation — **the token *is* the skill name** |
+| `codebase_search` | Concept-level code retrieval (§2) |
+
+**Why it helps weaker models (and users).** No overloaded command does five
+jobs; the model/user never has to disambiguate *which* `/memory` behavior they
+meant. `$systematic-debugging` self-documents what it invokes.
+
+**`/memory` subcommand sketch:**
+
+```
+/memory add "<fact>"        # store a small pref/fact
+/memory edit                # edit stored facts
+/memory search <query>      # find a stored fact
+/memory clear               # clear user memory
+/memory doctor              # health check; detects legacy ~/.deepseek path
+/memory promote <fact>      # (later) promote a fact to a higher layer
+```
+
+`doctor` specifically detects the **legacy `~/.deepseek`** path and guides
+migration.
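
A parser for the subcommand sketch above might look like the following. It is illustrative only: `MemoryCommand` and `parse_memory_command` are hypothetical names, quoting is handled by simply trimming surrounding double quotes, and the error messages are invented for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
enum MemoryCommand {
    Add(String),
    Edit,
    Search(String),
    Clear,
    Doctor,
    Promote(String),
}

/// Build a command that requires an argument, rejecting an empty one.
fn with_arg(
    sub: &str,
    arg: &str,
    make: fn(String) -> MemoryCommand,
) -> Result<MemoryCommand, String> {
    if arg.is_empty() {
        Err(format!("`/memory {sub}` needs an argument"))
    } else {
        Ok(make(arg.to_string()))
    }
}

fn parse_memory_command(line: &str) -> Result<MemoryCommand, String> {
    let rest = line
        .trim()
        .strip_prefix("/memory")
        .ok_or("not a /memory command")?
        .trim();
    // Split the subcommand from its (optional) argument.
    let (sub, arg) = match rest.split_once(char::is_whitespace) {
        Some((sub, arg)) => (sub, arg.trim().trim_matches('"')),
        None => (rest, ""),
    };
    match sub {
        "add" => with_arg(sub, arg, MemoryCommand::Add),
        "search" => with_arg(sub, arg, MemoryCommand::Search),
        "promote" => with_arg(sub, arg, MemoryCommand::Promote),
        "edit" => Ok(MemoryCommand::Edit),
        "clear" => Ok(MemoryCommand::Clear),
        "doctor" => Ok(MemoryCommand::Doctor),
        other => Err(format!("unknown /memory subcommand: `{other}`")),
    }
}
```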
+
+**`$<skill-name>` invocation examples:**
+
+```
+$systematic-debugging       # local skill
+$github:gh-fix-ci           # namespaced skill
+```
+
+The submit-time parser (to be added; submit path `ui.rs ~4721`) recognizes the
+`$` token and activates the named skill (§5).
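
A minimal sketch of such a submit-time parser, assuming that skill names are ASCII identifiers with `-` and `_`, that a namespace is separated by a single `:`, and that a name must start with a letter (so a token like `$5` is ignored). `SkillRef` and `parse_skill_mentions` are hypothetical names; resolution and ambiguity handling are out of scope here.

```rust
#[derive(Debug, PartialEq, Eq)]
struct SkillRef {
    namespace: Option<String>,
    name: String,
}

/// Assumed identifier rule: starts with a letter, then letters, digits,
/// `-` or `_`.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Collect every `$<skill-name>` or `$<namespace>:<skill-name>` token in a
/// submitted message, in order of appearance.
fn parse_skill_mentions(message: &str) -> Vec<SkillRef> {
    message
        .split_whitespace()
        .filter_map(|token| {
            let body = token.strip_prefix('$')?;
            // Drop trailing sentence punctuation such as `,` or `.`.
            let body = body
                .trim_end_matches(|c: char| c.is_ascii_punctuation() && c != '-' && c != '_');
            let (namespace, name) = match body.split_once(':') {
                Some((ns, name)) => (Some(ns), name),
                None => (None, body),
            };
            if is_ident(name) && namespace.map_or(true, is_ident) {
                Some(SkillRef {
                    namespace: namespace.map(str::to_string),
                    name: name.to_string(),
                })
            } else {
                None
            }
        })
        .collect()
}
```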
+
+**`/context` layers dashboard (example render):**
+
+```
+/context
+  user-memory      ▸ 7 facts                 (12 KB)   [clear]
+  repo-rules       ▸ CLAUDE.md, AGENTS.md     (8 KB)   [view]
+  codemap-wiki     ▸ 412 symbols indexed     (auto)    [rebuild]
+  trace-store      ▸ 3 recent workflow runs  (—)       [open]
+  rlm-memo         ▸ 0 active sessions        (—)       [—]
+  cached-overlay   ▸ 5 promoted lessons       (3 KB)   [view]
+  aleph-external   ▸ not attached             (—)       [attach]
+```
+
+**Dependencies.** Memory stack (§6); skills (§5); codebase intelligence (§2);
+workflow runner (§4).
+
+---
+
+## 10. Deferred-Not-Done 0.8.53 Diet Items
+
+Recorded here so they are **not silently dropped** — these were considered for
+the 0.8.53 diet and deliberately **deferred** (design-only or out of scope this
+cycle):
+
+- **File-mutation overload** — `apply_patch` / `edit_file` / `write_file` /
+  `fim_edit` overlap in purpose. Per #2681 non-goals these stay distinct;
+  canonical *guidance* (prefer `apply_patch`) is doc-only, no consolidation
+  this cycle.
+- **`task_shell_*` ↔ `exec_*` redundancy** — `task_shell_start` /
+  `task_shell_wait` overlap conceptually with the `exec_*` family. Left intact
+  this cycle (distinct niche per #2681); revisit under §1/§3.
+- **`handle_read` / `retrieve_tool_result`** — result-handle plumbing kept as-is
+  (doc-only canonical guidance); folds naturally into the memory stack (§6) and
+  intent routing (§1) later.
+- **Search-cluster consolidation** — `grep_files` / `file_search` /
+  `project_map` remain three tools this cycle; consolidation is the *job of the
+  hybrid index* (§2) under `codebase_search`, not a catalog edit in 0.8.53.
+
+---
+
+## Phased Roadmap
+
+### 0.8.53 — design + small fixes only
+- **Code:** only the already-scoped, narrow fixes — PR #2684 (subagent role
+  vocab, lifecycle signals, eval ergonomics) and PR #2685 (read-only git history
+  made active, plus actionable RLM/field errors). Subagent legacy-name cleanup +
+  guardrail tests rebased on #2684.
+- **Docs:** this north star, plus `docs/TOOL_LIFECYCLE.md`,
+  `docs/CODEBASE_SEARCH_DESIGN.md`, `docs/SKILL_INVOCATION_DESIGN.md`.
+- **No tool-catalog code:** the diet (§3), the Intent Router (§1), and the
+  hybrid index (§2) are **documented, not coded** this cycle.
+
+### 0.9.0 — first structural moves
+- Implement the **tool lifecycle** const name-sets + alias table in
+  `tool_catalog.rs` (§3) as a one-time deterministic head edit.
+- Land the **planned diet**: `exec_wait`/`exec_interact`/`tts` →
+  hidden-compatibility; `todo_*` → deprecated→`checklist_*` (result-metadata
+  notice only).
+- Gate **`tool_agent`** to DeepSeek-V4 models only (§3).
+- First version of the **default hybrid codebase index** (FTS5/BM25 + symbol +
+  codemap) behind `codebase_search` (§2).
+- First **Intent Router** verbs mapping onto today's tools (§1).
+- **Tool Repair** for deferred/deprecated/mode/dependency cases (§7).
+
+### Later (post-0.9.0)
+- **WhaleFlow** typed-IR workflow runner (§4) and the **evaluation loop** /
+  teacher-student replay (§8).
+- **Skills activation modes** + tool restriction + `$<skill-name>` submit-time
+  activation (§5).
+- Full **Context Memory Stack** with `/context` dashboard, `/overlay`
+  promotion, and Aleph external memory (§6).
+- Dense/semantic retrieval and PR/commit/issue history in the index (§2).
+- Search-cluster consolidation and the remaining §10 deferred items.
+
+---
+
+## North-star one-liner
+
+> **The harness handles memory, search, routing, state, and guardrails — so a
+> weaker model can just think.**