docs: align v0.8.53 tool surface notes

Hunter Bown
2026-06-03 12:37:39 -07:00
parent 8cb4f94f30
commit aa4c734602
3 changed files with 39 additions and 12 deletions
+8 -3
@@ -1,6 +1,6 @@
# `codebase_search` — Local-First Semantic Code Retrieval
> **Status:** Design note + eval scaffold. **Code is DEFERRED.**
> **Status:** Design note + planned eval scaffold. **Code is DEFERRED.**
> GitHub #2680 · Milestone **v0.9.0** · This DOC ships in **v0.8.53** (doc-only; no catalog code in this cycle).
> Related in-flight: PR #2684 (subagent role vocab / lifecycle signals / eval ergonomics), PR #2685 (git history active + RLM/field errors). This note must not contradict either.
@@ -214,7 +214,12 @@ A fixed set of real CodeWhale concept queries, each with the **expected** file(s
| 14 | Where is the queued user message built on submit? | `crates/tui/src/tui/ui.rs` | `build_queued_message` ~4721 |
| 15 | Where are speech / TTS tools registered? (duplicate names) | `crates/tui/src/tools/registry.rs` | `speech` / `tts` :787-792 |
Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). Phase 1 ships the harness that runs all queries against the live index and reports recall@k and MRR; a regression bar (e.g. recall@10 ≥ target) gates future ranking changes.
Each entry is intended to become a `(query, expected_paths[])` row in a fixture
(e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). This PR ships
the design table only; the fixture and harness are deferred to Phase 1. The
Phase 1 harness runs all queries against the live index and reports recall@k
and MRR; a regression bar (e.g. recall@10 >= target) gates future ranking
changes.
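
As a rough illustration of the two metrics the Phase 1 harness is meant to report, here is a minimal sketch. It assumes each fixture row has already been parsed into an `EvalCase` and that the index is reachable through a `search` closure; `EvalCase`, `recall_at_k`, `reciprocal_rank`, and `evaluate` are hypothetical names for this sketch, not part of the CodeWhale codebase.

```rust
use std::collections::HashSet;

/// One eval row: a query plus the paths a correct retrieval should surface.
/// (Hypothetical type; the real fixture schema is not defined yet.)
pub struct EvalCase {
    pub query: String,
    pub expected_paths: Vec<String>,
}

/// Fraction of expected paths that appear in the top `k` ranked results.
pub fn recall_at_k(expected: &[String], ranked: &[String], k: usize) -> f64 {
    if expected.is_empty() {
        return 0.0;
    }
    let top: HashSet<&str> = ranked.iter().take(k).map(String::as_str).collect();
    let hits = expected.iter().filter(|p| top.contains(p.as_str())).count();
    hits as f64 / expected.len() as f64
}

/// Reciprocal rank of the first expected path in `ranked` (0.0 if none appears).
pub fn reciprocal_rank(expected: &[String], ranked: &[String]) -> f64 {
    ranked
        .iter()
        .position(|p| expected.contains(p))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean recall@k and MRR over all cases. `search` stands in for the live index.
pub fn evaluate<F>(cases: &[EvalCase], k: usize, search: F) -> (f64, f64)
where
    F: Fn(&str) -> Vec<String>,
{
    if cases.is_empty() {
        return (0.0, 0.0);
    }
    let mut recall_sum = 0.0;
    let mut rr_sum = 0.0;
    for case in cases {
        let ranked = search(&case.query);
        recall_sum += recall_at_k(&case.expected_paths, &ranked, k);
        rr_sum += reciprocal_rank(&case.expected_paths, &ranked);
    }
    let n = cases.len() as f64;
    (recall_sum / n, rr_sum / n)
}
```

A regression bar could then compare the first element of the returned pair against the chosen recall@10 target; the target value itself is left open by this note.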
---
@@ -222,7 +227,7 @@ Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/t
### Phasing
- **Phase 0 (this cycle, v0.8.53):** this design note + eval fixture only. No catalog code.
- **Phase 0 (this cycle, v0.8.53):** this design note + benchmark table only. No fixture, harness, or catalog code.
- **Phase 1 (v0.9.0):** local lexical core — FTS5 `bm25()` + symbol + path + session-relevance + exact grep fallback, fused via RRF. SQLite index at `~/.codewhale/index/<workspace-hash>.db`. Eval harness wired into CI. **No network, no model downloads.** Tool registered as deferred (hydrated via tool-search) initially; promotion to the active first-turn set is a separate, deliberate decision (see lifecycle below) because of the prefix-cache invariant.
- **Phase 2:** incremental/background reindex, branch-aware invalidation hardening, richer chunkers (tree-sitter per language).
- **Phase 3 (feature-flagged, off by default):** `sparse-splade` and `dense-embed` RRF signals. Embedding/HF downloads behind the flag + workset opt-in (§3 Privacy).
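
To make the "fused via RRF" step in Phase 1 concrete, the following is a minimal sketch of reciprocal rank fusion over per-signal rankings. It assumes each signal (for example `bm25()`, symbol, path) has already produced an ordered list of document ids; `rrf_fuse` and `RRF_K` are hypothetical names, and the constant 60 is the commonly used RRF default, not a value taken from this design note.

```rust
use std::collections::HashMap;

/// Conventional RRF smoothing constant (an assumption, not specified by this note).
pub const RRF_K: f64 = 60.0;

/// Fuse several best-first rankings into one list, highest fused score first.
/// Each document scores the sum of 1 / (k + rank) over the rankings that
/// contain it, with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc) in ranking.iter().enumerate() {
            *scores.entry(doc.clone()).or_insert(0.0) += 1.0 / (k + idx as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by descending score; break ties by name so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF only consumes ranks, signals with incomparable score scales (BM25 scores, symbol matches, path matches) can be combined without normalization, and a later Phase 3 signal would simply be one more input ranking.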
+13 -9
@@ -205,21 +205,25 @@ separate, deliberate decision per name.
## 5. Active-catalog budget (per mode, per provider)
The active set is the first-turn cost. Current default active set:
`DEFAULT_ACTIVE_NATIVE_TOOLS` has **25** entries (`tool_catalog.rs:37-64`).
The active set is the first-turn cost. Do not duplicate the exact
`DEFAULT_ACTIVE_NATIVE_TOOLS` count here: adjacent PRs in the v0.8.53 batch may
add or remove active tools, and the source of truth is always
`tool_catalog.rs`. This document defines the diet policy and invariants, not a
second catalog snapshot.
### Per provider
| Provider | First-turn active source | Current count | Target after diet |
|---|---|---|---|
| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | 25 | ~22 (drop `exec_wait`, `exec_interact`; `todo_*` already not active) |
| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | 8 (read-only WAF workaround) | 8 (unchanged) |
| Provider | First-turn active source | Budget policy |
|---|---|---|
| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | Remove duplicate aliases from the active head when their canonical twins stay active; any net growth needs an explicit budget decision. |
| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | Provider-specific read-only WAF workaround; unchanged by the default diet unless explicitly reviewed. |
The default diet removes `exec_wait` and `exec_interact` from the active head
(they become hidden-compat; their canonical twins `exec_shell_wait` /
`exec_shell_interact` stay). `tts` and `todo_*` are *already not* in the active
set, so the active count moves **25 → 23** from the wait/interact removal alone;
the broader target is a stable budget of roughly **≤ 22** canonical tools.
set, so they do not change the active budget in this diet. The net effect of
this specific diet is to remove two duplicate active aliases from whatever
default active head is current after the surrounding v0.8.53 PR batch.
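
The diet rule above can be sketched as a small filter: an alias is dropped from the active head only when its canonical twin is also active. This is an illustration under assumptions, not the code in `tool_catalog.rs`; `DUPLICATE_ALIASES` and `apply_diet` are hypothetical names, and only the two alias pairs named in this section are listed.

```rust
/// Alias -> canonical pairs named in this section (hypothetical table for the sketch).
pub const DUPLICATE_ALIASES: &[(&str, &str)] = &[
    ("exec_wait", "exec_shell_wait"),
    ("exec_interact", "exec_shell_interact"),
];

/// Return the active head with duplicate aliases removed.
/// An alias is removed only if its canonical twin is present in `active`,
/// so the capability never disappears from the first turn entirely.
pub fn apply_diet<'a>(active: &[&'a str], aliases: &[(&str, &str)]) -> Vec<&'a str> {
    let mut kept = Vec::with_capacity(active.len());
    for &name in active {
        let canonical_is_active = aliases
            .iter()
            .filter(|(alias, _)| *alias == name)
            .any(|(_, canonical)| active.iter().any(|a| a == canonical));
        if !canonical_is_active {
            kept.push(name);
        }
    }
    kept
}
```

Because the filter operates on whatever list it is given, the result is "current head minus two aliases" regardless of how adjacent PRs change the head, which is the invariant this section describes.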
### Per mode (Plan / Agent / YOLO)
@@ -232,7 +236,7 @@ add or remove native tools from `DEFAULT_ACTIVE_NATIVE_TOOLS`
| Mode | Native active budget | MCP tools active? |
|---|---|---|
| Plan | same native head (target ≤ 22) | No (deferred) |
| Plan | same native head | No (deferred) |
| Agent | same native head | No (deferred) |
| YOLO | same native head | Yes (a known, intentional widening) |
+18
@@ -357,11 +357,28 @@ prefix has a single, memorable responsibility:
| `/memory` | **Small** user prefs/facts only |
| `/context` | **Dashboard** of all active memory layers (§6) |
| `/rules` | Repo guidance |
| `.codewhale/constitution.json` | Repo constitution: checked-in **local law** |
| `/workflow` (`/whaleflow`) | Long-running multi-agent runs (§4) |
| `/overlay` | Promoted cached-main lessons (§6/§8) |
| `$<skill-name>` | Skill invocation — **the token *is* the skill name** |
| `codebase_search` | Concept-level code retrieval (§2) |
The repo constitution is not another memory bucket. It is the local-law layer in
a layered authority model:
```
base myth / global Constitution
-> repo constitution (.codewhale/constitution.json)
-> task packet
-> runtime policy
```
At conflict time, the **current user request for the task remains above the repo
constitution**; the repo constitution supplies durable defaults and local law
only when the active task packet and runtime policy leave room. Runtime policy is
the compiled enforcement surface for the run, not a separate place for the model
to invent new rules.
**Why it helps weaker models (and users).** No overloaded command does five
jobs; the model/user never has to disambiguate *which* `/memory` behavior they
meant. `$systematic-debugging` self-documents what it invokes.
@@ -395,6 +412,7 @@ The submit-time parser (to be added; submit path `ui.rs ~4721`) recognizes the
```
/context
user-memory ▸ 7 facts (12 KB) [clear]
repo-constitution ▸ .codewhale/constitution.json (4 KB) [view]
repo-rules ▸ CLAUDE.md, AGENTS.md (8 KB) [view]
codemap-wiki ▸ 412 symbols indexed (auto) [rebuild]
trace-store ▸ 3 recent workflow runs (—) [open]