docs: align v0.8.53 tool surface notes

2026-06-03 12:37:39 -07:00
parent 8cb4f94f30
commit aa4c734602
3 changed files with 39 additions and 12 deletions
@@ -1,6 +1,6 @@
 # `codebase_search` — Local-First Semantic Code Retrieval

-> **Status:** Design note + eval scaffold. **Code is DEFERRED.**
+> **Status:** Design note + planned eval scaffold. **Code is DEFERRED.**
 > GitHub #2680 · Milestone **v0.9.0** · This DOC ships in **v0.8.53** (doc-only; no catalog code in this cycle).
 > Related in-flight: PR #2684 (subagent role vocab / lifecycle signals / eval ergonomics), PR #2685 (git history active + RLM/field errors). This note must not contradict either.

@@ -214,7 +214,12 @@ A fixed set of real CodeWhale concept queries, each with the **expected** file(s
 | 14 | Where is the queued user message built on submit? | `crates/tui/src/tui/ui.rs` | `build_queued_message` ~4721 |
 | 15 | Where are speech / TTS tools registered? (duplicate names) | `crates/tui/src/tools/registry.rs` | `speech` ≡ `tts` :787-792 |

-Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). Phase 1 ships the harness that runs all queries against the live index and reports recall@k and MRR; a regression bar (e.g. recall@10 ≥ target) gates future ranking changes.
+Each entry is intended to become a `(query, expected_paths[])` row in a fixture
+(e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). This PR ships
+the design table only; the fixture and harness are deferred to Phase 1. The
+Phase 1 harness will run all queries against the live index and report
+recall@k and MRR; a regression bar (e.g. recall@10 >= target) will then gate
+future ranking changes.
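
The two metrics named above can be sketched as follows. This is a minimal illustration and not the deferred harness: the function names and the in-memory `(ranked, expected)` shape are assumptions made for the example, and the real fixture schema is still undefined in this note.

```rust
use std::collections::HashSet;

/// Fraction of expected paths that appear in the top `k` ranked results.
/// Hypothetical helper; returns 0.0 when no paths are expected.
pub fn recall_at_k(ranked: &[String], expected: &[String], k: usize) -> f64 {
    if expected.is_empty() {
        return 0.0;
    }
    let top: HashSet<&str> = ranked.iter().take(k).map(String::as_str).collect();
    let hits = expected.iter().filter(|p| top.contains(p.as_str())).count();
    hits as f64 / expected.len() as f64
}

/// Reciprocal rank (1-based) of the first expected path; 0.0 if none is ranked.
pub fn reciprocal_rank(ranked: &[String], expected: &[String]) -> f64 {
    ranked
        .iter()
        .position(|p| expected.contains(p))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean recall@k and MRR over `(ranked, expected)` pairs, one pair per query.
pub fn summarize(runs: &[(Vec<String>, Vec<String>)], k: usize) -> (f64, f64) {
    if runs.is_empty() {
        return (0.0, 0.0);
    }
    let mut recall_sum = 0.0;
    let mut rr_sum = 0.0;
    for (ranked, expected) in runs {
        recall_sum += recall_at_k(ranked, expected, k);
        rr_sum += reciprocal_rank(ranked, expected);
    }
    let n = runs.len() as f64;
    (recall_sum / n, rr_sum / n)
}
```

A regression bar would then compare the first element of `summarize(&runs, 10)` against whatever target Phase 1 settles on.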

 ---

@@ -222,7 +227,7 @@ Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/t

 ### Phasing

- **Phase 0 (this cycle, v0.8.53):** this design note + eval fixture only. No catalog code.
+- **Phase 0 (this cycle, v0.8.53):** this design note + benchmark table only. No fixture, harness, or catalog code.
 - **Phase 1 (v0.9.0):** local lexical core — FTS5 `bm25()` + symbol + path + session-relevance + exact grep fallback, fused via RRF. SQLite index at `~/.codewhale/index/<workspace-hash>.db`. Eval harness wired into CI. **No network, no model downloads.** Tool registered as deferred (hydrated via tool-search) initially; promotion to the active first-turn set is a separate, deliberate decision (see lifecycle below) because of the prefix-cache invariant.
 - **Phase 2:** incremental/background reindex, branch-aware invalidation hardening, richer chunkers (tree-sitter per language).
 - **Phase 3 (feature-flagged, off by default):** `sparse-splade` and `dense-embed` RRF signals. Embedding/HF downloads behind the flag + workset opt-in (§3 Privacy).
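
For readers unfamiliar with the RRF fusion named in Phase 1, here is a minimal sketch of reciprocal rank fusion over several ranked lists. The function name, the string-path document ids, and the constant `k = 60` used in the example are assumptions for illustration; the note does not specify the Phase 1 parameters or signal weighting.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each signal contributes 1 / (k + rank) per document,
/// with rank counted from 1. Documents missing from a signal contribute nothing.
/// Hypothetical helper; assumes each ranking lists a document at most once.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest fused score first; ties broken by path so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses only ranks, signals with incomparable raw scores (for example `bm25()` values and grep hit counts) can be fused without normalizing them first.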
@@ -205,21 +205,25 @@ separate, deliberate decision per name.

 ## 5. Active-catalog budget (per mode, per provider)

-The active set is the first-turn cost. Current default active set:
-`DEFAULT_ACTIVE_NATIVE_TOOLS` has **25** entries (`tool_catalog.rs:37-64`).
+The active set determines the first-turn cost. This note deliberately does not
+restate the exact `DEFAULT_ACTIVE_NATIVE_TOOLS` count: adjacent PRs in the
+v0.8.53 batch may add or remove active tools, and the source of truth is always
+`tool_catalog.rs`. This document defines the diet policy and invariants, not a
+second catalog snapshot.

 ### Per provider

-| Provider | First-turn active source | Current count | Target after diet |
-|---|---|---|---|
-| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | 25 | ~22 (drop `exec_wait`, `exec_interact`; `todo_*` already not active) |
-| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | 8 (read-only WAF workaround) | 8 (unchanged) |
+| Provider | First-turn active source | Budget policy |
+|---|---|---|
+| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | Remove duplicate aliases from the active head when their canonical twins stay active; any net growth needs an explicit budget decision. |
+| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | Provider-specific read-only WAF workaround; unchanged by the default diet unless explicitly reviewed. |

 The default diet removes `exec_wait` and `exec_interact` from the active head
 (they become hidden-compat; their canonical twins `exec_shell_wait` /
 `exec_shell_interact` stay). `tts` and `todo_*` are *already not* in the active
-set, so the active count moves **25 → 23** from the wait/interact removal alone;
-the broader target is a stable budget of roughly **≤ 22** canonical tools.
+set, so they do not change the active budget in this diet. Its net effect is
+to remove two duplicate active aliases from whatever default active head is
+current once the surrounding v0.8.53 PR batch lands.
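
The diet rule can be expressed as a count-independent filter. The sketch below is illustrative only: `apply_diet` and `DUPLICATE_ALIASES` are hypothetical names, and the real catalog in `tool_catalog.rs` may represent tools differently. Only the two alias pairs come from this note.

```rust
/// Alias -> canonical pairs named in this note (the table itself is hypothetical).
const DUPLICATE_ALIASES: [(&str, &str); 2] = [
    ("exec_wait", "exec_shell_wait"),
    ("exec_interact", "exec_shell_interact"),
];

/// Drop an alias from the active head only when its canonical twin is also
/// active, so the diet never removes the last way to reach a capability.
pub fn apply_diet<'a>(active: &[&'a str]) -> Vec<&'a str> {
    active
        .iter()
        .copied()
        .filter(|name| {
            !DUPLICATE_ALIASES
                .iter()
                .any(|(alias, canonical)| name == alias && active.contains(canonical))
        })
        .collect()
}
```

Whatever the current head size is, a head that contains both aliases and both canonical twins shrinks by exactly two, which is why this note does not need to carry a count.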

 ### Per mode (Plan / Agent / YOLO)

@@ -232,7 +236,7 @@ add or remove native tools from `DEFAULT_ACTIVE_NATIVE_TOOLS`

 | Mode | Native active budget | MCP tools active? |
 |---|---|---|
-| Plan | same native head (target ≤ 22) | No (deferred) |
+| Plan | same native head | No (deferred) |
 | Agent | same native head | No (deferred) |
 | YOLO | same native head | Yes (a known, intentional widening) |

@@ -357,11 +357,28 @@ prefix has a single, memorable responsibility:
 | `/memory` | **Small** user prefs/facts only |
 | `/context` | **Dashboard** of all active memory layers (§6) |
 | `/rules` | Repo guidance |
+| `.codewhale/constitution.json` | Repo constitution: checked-in **local law** |
 | `/workflow` (`/whaleflow`) | Long-running multi-agent runs (§4) |
 | `/overlay` | Promoted cached-main lessons (§6/§8) |
 | `$<skill-name>` | Skill invocation — **the token *is* the skill name** |
 | `codebase_search` | Concept-level code retrieval (§2) |

+The repo constitution is not another memory bucket. It is the local-law layer in
+a layered authority model:
+
+```
+base myth / global Constitution
+  -> repo constitution (.codewhale/constitution.json)
+  -> task packet
+  -> runtime policy
+```
+
+When layers conflict, the **current user request for the task remains above the
+repo constitution**; the repo constitution supplies durable defaults and local
+law only where the active task packet and runtime policy leave room. Runtime
+policy is the compiled enforcement surface for the run, not a separate place for
+the model to invent new rules.
+
 **Why it helps weaker models (and users).** No overloaded command does five
 jobs; the model/user never has to disambiguate *which* `/memory` behavior they
 meant. `$systematic-debugging` self-documents what it invokes.
@@ -395,6 +412,7 @@ The submit-time parser (to be added; submit path `ui.rs ~4721`) recognizes the
 ```
 /context
  user-memory      ▸ 7 facts                 (12 KB)   [clear]
+  repo-constitution ▸ .codewhale/constitution.json (4 KB) [view]
  repo-rules       ▸ CLAUDE.md, AGENTS.md     (8 KB)   [view]
  codemap-wiki     ▸ 412 symbols indexed     (auto)    [rebuild]
  trace-store      ▸ 3 recent workflow runs  (—)       [open]