diff --git a/docs/CODEBASE_SEARCH_DESIGN.md b/docs/CODEBASE_SEARCH_DESIGN.md
index 1a516c84..d7711d1a 100644
--- a/docs/CODEBASE_SEARCH_DESIGN.md
+++ b/docs/CODEBASE_SEARCH_DESIGN.md
@@ -1,6 +1,6 @@
 # `codebase_search` — Local-First Semantic Code Retrieval
 
-> **Status:** Design note + eval scaffold. **Code is DEFERRED.**
+> **Status:** Design note + planned eval scaffold. **Code is DEFERRED.**
 > GitHub #2680 · Milestone **v0.9.0** · This DOC ships in **v0.8.53** (doc-only; no catalog code in this cycle).
 > Related in-flight: PR #2684 (subagent role vocab / lifecycle signals / eval ergonomics), PR #2685 (git history active + RLM/field errors). This note must not contradict either.
 
@@ -214,7 +214,12 @@ A fixed set of real CodeWhale concept queries, each with the **expected** file(s
 | 14 | Where is the queued user message built on submit? | `crates/tui/src/tui/ui.rs` | `build_queued_message` ~4721 |
 | 15 | Where are speech / TTS tools registered? (duplicate names) | `crates/tui/src/tools/registry.rs` | `speech` ≡ `tts` :787-792 |
 
-Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). Phase 1 ships the harness that runs all queries against the live index and reports recall@k and MRR; a regression bar (e.g. recall@10 ≥ target) gates future ranking changes.
+Each entry is intended to become a `(query, expected_paths[])` row in a fixture
+(e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). This PR ships
+the design table only; the fixture and harness are deferred to Phase 1. The
+Phase 1 harness runs all queries against the live index and reports recall@k
+and MRR; a regression bar (e.g. recall@10 >= target) gates future ranking
+changes.
 
 ---
 
@@ -222,7 +227,7 @@ Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/t
 
 ### Phasing
 
-- **Phase 0 (this cycle, v0.8.53):** this design note + eval fixture only. No catalog code.
+- **Phase 0 (this cycle, v0.8.53):** this design note + benchmark table only. No fixture, harness, or catalog code.
 - **Phase 1 (v0.9.0):** local lexical core — FTS5 `bm25()` + symbol + path + session-relevance + exact grep fallback, fused via RRF. SQLite index at `~/.codewhale/index/.db`. Eval harness wired into CI. **No network, no model downloads.** Tool registered as deferred (hydrated via tool-search) initially; promotion to the active first-turn set is a separate, deliberate decision (see lifecycle below) because of the prefix-cache invariant.
 - **Phase 2:** incremental/background reindex, branch-aware invalidation hardening, richer chunkers (tree-sitter per language).
 - **Phase 3 (feature-flagged, off by default):** `sparse-splade` and `dense-embed` RRF signals. Embedding/HF downloads behind the flag + workset opt-in (§3 Privacy).
diff --git a/docs/TOOL_LIFECYCLE.md b/docs/TOOL_LIFECYCLE.md
index 136e468b..07dc3407 100644
--- a/docs/TOOL_LIFECYCLE.md
+++ b/docs/TOOL_LIFECYCLE.md
@@ -205,21 +205,25 @@ separate, deliberate decision per name.
 
 ## 5. Active-catalog budget (per mode, per provider)
 
-The active set is the first-turn cost. Current default active set:
-`DEFAULT_ACTIVE_NATIVE_TOOLS` has **25** entries (`tool_catalog.rs:37-64`).
+The active set is the first-turn cost. Do not duplicate the exact
+`DEFAULT_ACTIVE_NATIVE_TOOLS` count here: adjacent PRs in the v0.8.53 batch may
+add or remove active tools, and the source of truth is always
+`tool_catalog.rs`. This document defines the diet policy and invariants, not a
+second catalog snapshot.
 
 ### Per provider
 
-| Provider | First-turn active source | Current count | Target after diet |
-|---|---|---|---|
-| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | 25 | ~22 (drop `exec_wait`, `exec_interact`; `todo_*` already not active) |
-| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | 8 (read-only WAF workaround) | 8 (unchanged) |
+| Provider | First-turn active source | Budget policy |
+|---|---|---|
+| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | Remove duplicate aliases from the active head when their canonical twins stay active; any net growth needs an explicit budget decision. |
+| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | Provider-specific read-only WAF workaround; unchanged by the default diet unless explicitly reviewed. |
 
 The default diet removes `exec_wait` and `exec_interact` from the active head
 (they become hidden-compat; their canonical twins `exec_shell_wait` /
 `exec_shell_interact` stay). `tts` and `todo_*` are *already not* in the active
-set, so the active count moves **25 → 23** from the wait/interact removal alone;
-the broader target is a stable budget of roughly **≤ 22** canonical tools.
+set, so they do not change the active budget in this diet. The net effect of
+this specific diet is to remove two duplicate active aliases from whatever
+default active head is current after the surrounding v0.8.53 PR batch.
 
 ### Per mode (Plan / Agent / YOLO)
 
@@ -232,7 +236,7 @@ add or remove native tools from `DEFAULT_ACTIVE_NATIVE_TOOLS`
 
 | Mode | Native active budget | MCP tools active? |
 |---|---|---|
-| Plan | same native head (target ≤ 22) | No (deferred) |
+| Plan | same native head | No (deferred) |
 | Agent | same native head | No (deferred) |
 | YOLO | same native head | Yes (a known, intentional widening) |
 
diff --git a/docs/VISION_NORTH_STAR.md b/docs/VISION_NORTH_STAR.md
index 420b432a..064d8885 100644
--- a/docs/VISION_NORTH_STAR.md
+++ b/docs/VISION_NORTH_STAR.md
@@ -357,11 +357,28 @@ prefix has a single, memorable responsibility:
 | `/memory` | **Small** user prefs/facts only |
 | `/context` | **Dashboard** of all active memory layers (§6) |
 | `/rules` | Repo guidance |
+| `.codewhale/constitution.json` | Repo constitution: checked-in **local law** |
 | `/workflow` (`/whaleflow`) | Long-running multi-agent runs (§4) |
 | `/overlay` | Promoted cached-main lessons (§6/§8) |
 | `$` | Skill invocation — **the token *is* the skill name** |
 | `codebase_search` | Concept-level code retrieval (§2) |
 
+The repo constitution is not another memory bucket. It is the local-law layer in
+a layered authority model:
+
+```
+base myth / global Constitution
+  -> repo constitution (.codewhale/constitution.json)
+    -> task packet
+      -> runtime policy
+```
+
+At conflict time, the **current user request for the task remains above the repo
+constitution**; the repo constitution supplies durable defaults and local law
+only when the active task packet and runtime policy leave room. Runtime policy is
+the compiled enforcement surface for the run, not a separate place for the model
+to invent new rules.
+
 **Why it helps weaker models (and users).** No overloaded command does five
 jobs; the model/user never has to disambiguate *which* `/memory` behavior they
 meant. `$systematic-debugging` self-documents what it invokes.
@@ -395,6 +412,7 @@ The submit-time parser (to be added; submit path `ui.rs ~4721`) recognizes the
 ```
 /context
   user-memory ▸ 7 facts (12 KB) [clear]
+  repo-constitution ▸ .codewhale/constitution.json (4 KB) [view]
   repo-rules ▸ CLAUDE.md, AGENTS.md (8 KB) [view]
   codemap-wiki ▸ 412 symbols indexed (auto) [rebuild]
   trace-store ▸ 3 recent workflow runs (—) [open]