docs: align v0.8.53 tool surface notes
@@ -1,6 +1,6 @@
 # `codebase_search` — Local-First Semantic Code Retrieval
 
-> **Status:** Design note + eval scaffold. **Code is DEFERRED.**
+> **Status:** Design note + planned eval scaffold. **Code is DEFERRED.**
 > GitHub #2680 · Milestone **v0.9.0** · This DOC ships in **v0.8.53** (doc-only; no catalog code in this cycle).
 > Related in-flight: PR #2684 (subagent role vocab / lifecycle signals / eval ergonomics), PR #2685 (git history active + RLM/field errors). This note must not contradict either.
 
@@ -214,7 +214,12 @@ A fixed set of real CodeWhale concept queries, each with the **expected** file(s
 | 14 | Where is the queued user message built on submit? | `crates/tui/src/tui/ui.rs` | `build_queued_message` ~4721 |
 | 15 | Where are speech / TTS tools registered? (duplicate names) | `crates/tui/src/tools/registry.rs` | `speech` ≡ `tts` :787-792 |
 
-Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). Phase 1 ships the harness that runs all queries against the live index and reports recall@k and MRR; a regression bar (e.g. recall@10 ≥ target) gates future ranking changes.
+Each entry is intended to become a `(query, expected_paths[])` row in a fixture
+(e.g. `crates/tui/tests/fixtures/codebase_search_eval.jsonl`). This PR ships
+the design table only; the fixture and harness are deferred to Phase 1. The
+Phase 1 harness runs all queries against the live index and reports recall@k
+and MRR; a regression bar (e.g. recall@10 >= target) gates future ranking
+changes.
 
 ---
 
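The hunk above defers the fixture and harness to Phase 1 but names the two metrics it will report. As a rough illustration only (Python rather than the project's Rust, and not the planned harness), recall@k and MRR over `(query, expected_paths[])` rows could be computed as below. The field names `query` and `expected_paths` follow the row shape quoted in the diff. The `search` callable and the ranked results are hypothetical stand-ins for the live index.

```python
from typing import Callable, Dict, List, Sequence


def recall_at_k(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """Fraction of the expected paths found in the top-k results."""
    if not expected:
        return 0.0
    top_k = set(ranked[:k])
    return sum(1 for path in expected if path in top_k) / len(expected)


def reciprocal_rank(ranked: Sequence[str], expected: Sequence[str]) -> float:
    """1 / position of the first expected path, or 0.0 if none is returned."""
    wanted = set(expected)
    for position, path in enumerate(ranked, start=1):
        if path in wanted:
            return 1.0 / position
    return 0.0


def evaluate(rows: List[dict], search: Callable[[str], List[str]], k: int = 10) -> Dict[str, float]:
    """Average recall@k and MRR over all fixture rows."""
    if not rows:
        return {"recall@k": 0.0, "mrr": 0.0}
    recalls, ranks = [], []
    for row in rows:
        ranked = search(row["query"])
        recalls.append(recall_at_k(ranked, row["expected_paths"], k))
        ranks.append(reciprocal_rank(ranked, row["expected_paths"]))
    return {"recall@k": sum(recalls) / len(rows), "mrr": sum(ranks) / len(rows)}


# Two rows modelled on benchmark entries 14 and 15; the ranked lists are made up.
rows = [
    {"query": "Where is the queued user message built on submit?",
     "expected_paths": ["crates/tui/src/tui/ui.rs"]},
    {"query": "Where are speech / TTS tools registered?",
     "expected_paths": ["crates/tui/src/tools/registry.rs"]},
]
fake_results = {
    rows[0]["query"]: ["crates/tui/src/tui/ui.rs", "crates/tui/src/tools/registry.rs"],
    rows[1]["query"]: ["crates/tui/src/tui/ui.rs", "crates/tui/src/tools/registry.rs"],
}
metrics = evaluate(rows, lambda q: fake_results[q], k=10)
print(metrics)
```

A regression bar of the kind the note mentions would then be a single comparison of `metrics["recall@k"]` against the chosen target. The note leaves that target unspecified.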
@@ -222,7 +227,7 @@ Each entry is a `(query, expected_paths[])` row in a fixture (e.g. `crates/tui/t
 
 ### Phasing
 
-- **Phase 0 (this cycle, v0.8.53):** this design note + eval fixture only. No catalog code.
+- **Phase 0 (this cycle, v0.8.53):** this design note + benchmark table only. No fixture, harness, or catalog code.
 - **Phase 1 (v0.9.0):** local lexical core — FTS5 `bm25()` + symbol + path + session-relevance + exact grep fallback, fused via RRF. SQLite index at `~/.codewhale/index/<workspace-hash>.db`. Eval harness wired into CI. **No network, no model downloads.** Tool registered as deferred (hydrated via tool-search) initially; promotion to the active first-turn set is a separate, deliberate decision (see lifecycle below) because of the prefix-cache invariant.
 - **Phase 2:** incremental/background reindex, branch-aware invalidation hardening, richer chunkers (tree-sitter per language).
 - **Phase 3 (feature-flagged, off by default):** `sparse-splade` and `dense-embed` RRF signals. Embedding/HF downloads behind the flag + workset opt-in (§3 Privacy).
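Phase 1 above fuses several ranked signals "via RRF" (reciprocal rank fusion). A minimal sketch of the standard RRF formula follows, written in Python for illustration rather than in the project's Rust. The signal names, the example rankings, and the constant `k = 60` (a commonly used default) are assumptions; the design note does not state which constant or tie-breaking rule the real implementation will use.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse per-signal ranked lists: score(doc) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in rankings.values():
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by path so output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical per-signal results for one query.
rankings = {
    "bm25": ["a.rs", "b.rs", "c.rs"],
    "symbol": ["b.rs", "a.rs"],
    "path": ["b.rs"],
}
fused = rrf_fuse(rankings)
print(fused)
```

RRF only needs each signal's rank order, not comparable scores, which is presumably why it suits mixing `bm25()` output with symbol, path, and grep hits. A signal that returns nothing simply contributes no score.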
+13
-9
@@ -205,21 +205,25 @@ separate, deliberate decision per name.
 
 ## 5. Active-catalog budget (per mode, per provider)
 
-The active set is the first-turn cost. Current default active set:
-`DEFAULT_ACTIVE_NATIVE_TOOLS` has **25** entries (`tool_catalog.rs:37-64`).
+The active set is the first-turn cost. Do not duplicate the exact
+`DEFAULT_ACTIVE_NATIVE_TOOLS` count here: adjacent PRs in the v0.8.53 batch may
+add or remove active tools, and the source of truth is always
+`tool_catalog.rs`. This document defines the diet policy and invariants, not a
+second catalog snapshot.
 
 ### Per provider
 
-| Provider | First-turn active source | Current count | Target after diet |
-|---|---|---|---|
-| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | 25 | ~22 (drop `exec_wait`, `exec_interact`; `todo_*` already not active) |
-| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | 8 (read-only WAF workaround) | 8 (unchanged) |
+| Provider | First-turn active source | Budget policy |
+|---|---|---|
+| Default (DeepSeek et al.) | `DEFAULT_ACTIVE_NATIVE_TOOLS` | Remove duplicate aliases from the active head when their canonical twins stay active; any net growth needs an explicit budget decision. |
+| Arcee (Trinity) | `ARCEE_FIRST_TURN_NATIVE_TOOLS` | Provider-specific read-only WAF workaround; unchanged by the default diet unless explicitly reviewed. |
 
 The default diet removes `exec_wait` and `exec_interact` from the active head
 (they become hidden-compat; their canonical twins `exec_shell_wait` /
 `exec_shell_interact` stay). `tts` and `todo_*` are *already not* in the active
-set, so the active count moves **25 → 23** from the wait/interact removal alone;
-the broader target is a stable budget of roughly **≤ 22** canonical tools.
+set, so they do not change the active budget in this diet. The net effect of
+this specific diet is to remove two duplicate active aliases from whatever
+default active head is current after the surrounding v0.8.53 PR batch.
 
 ### Per mode (Plan / Agent / YOLO)
 
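The budget policy in the hunk above ("remove duplicate aliases from the active head when their canonical twins stay active") is count-independent, which is the point of the change. A small sketch of that rule, in Python for illustration only: the function name `apply_alias_diet` and the tool `some_other_tool` are hypothetical, and the real active list lives in `tool_catalog.rs`. Only the `exec_wait` / `exec_interact` pairings come from the diff.

```python
from typing import Dict, List


def apply_alias_diet(active: List[str], alias_to_canonical: Dict[str, str]) -> List[str]:
    """Drop an alias from the active head only if its canonical twin is active too."""
    active_set = set(active)
    return [
        name
        for name in active
        if alias_to_canonical.get(name) not in active_set
    ]


# Alias pairs named in the diff; everything else in `active` is a placeholder.
aliases = {
    "exec_wait": "exec_shell_wait",
    "exec_interact": "exec_shell_interact",
}
active = [
    "some_other_tool",  # hypothetical entry standing in for the rest of the head
    "exec_shell_wait",
    "exec_wait",
    "exec_shell_interact",
    "exec_interact",
]
slimmed = apply_alias_diet(active, aliases)
print(len(active) - len(slimmed), slimmed)
```

Whatever the head looks like after the surrounding PR batch, the rule removes exactly the aliases whose twins are present, so the net change stays at two without the document restating the total.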
@@ -232,7 +236,7 @@ add or remove native tools from `DEFAULT_ACTIVE_NATIVE_TOOLS`
 
 | Mode | Native active budget | MCP tools active? |
 |---|---|---|
-| Plan | same native head (target ≤ 22) | No (deferred) |
+| Plan | same native head | No (deferred) |
 | Agent | same native head | No (deferred) |
 | YOLO | same native head | Yes (a known, intentional widening) |
 
@@ -357,11 +357,28 @@ prefix has a single, memorable responsibility:
 | `/memory` | **Small** user prefs/facts only |
 | `/context` | **Dashboard** of all active memory layers (§6) |
 | `/rules` | Repo guidance |
+| `.codewhale/constitution.json` | Repo constitution: checked-in **local law** |
 | `/workflow` (`/whaleflow`) | Long-running multi-agent runs (§4) |
 | `/overlay` | Promoted cached-main lessons (§6/§8) |
 | `$<skill-name>` | Skill invocation — **the token *is* the skill name** |
 | `codebase_search` | Concept-level code retrieval (§2) |
 
+The repo constitution is not another memory bucket. It is the local-law layer in
+a layered authority model:
+
+```
+base myth / global Constitution
+-> repo constitution (.codewhale/constitution.json)
+-> task packet
+-> runtime policy
+```
+
+At conflict time, the **current user request for the task remains above the repo
+constitution**; the repo constitution supplies durable defaults and local law
+only when the active task packet and runtime policy leave room. Runtime policy is
+the compiled enforcement surface for the run, not a separate place for the model
+to invent new rules.
+
 **Why it helps weaker models (and users).** No overloaded command does five
 jobs; the model/user never has to disambiguate *which* `/memory` behavior they
 meant. `$systematic-debugging` self-documents what it invokes.
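One possible reading of the layered authority model added in the hunk above is sketched here in Python, purely as illustration. It makes three assumptions the diff does not state: each layer is a flat key/value map, the task packet carries the current user request, and runtime policy is produced by merging the layers in order so that later (more specific) layers win. The keys and values are invented.

```python
from typing import Any, Dict


def compile_runtime_policy(
    global_constitution: Dict[str, Any],
    repo_constitution: Dict[str, Any],
    task_packet: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge layers from least to most specific; later layers override earlier ones."""
    policy: Dict[str, Any] = {}
    for layer in (global_constitution, repo_constitution, task_packet):
        policy.update(layer)
    return policy


# Hypothetical layer contents.
global_constitution = {"network_access": "deny"}
repo_constitution = {"test_command": "cargo test", "commit_style": "conventional"}
task_packet = {"test_command": "cargo test -p tui"}  # from the current user request

policy = compile_runtime_policy(global_constitution, repo_constitution, task_packet)
print(policy)
```

Under this reading the repo constitution's `commit_style` survives because the task packet leaves room for it, while its `test_command` yields to the user's request. The compiled policy contains only keys that some layer supplied, matching the statement that runtime policy is not a place to invent new rules.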
@@ -395,6 +412,7 @@ The submit-time parser (to be added; submit path `ui.rs ~4721`) recognizes the
 ```
 /context
 user-memory ▸ 7 facts (12 KB) [clear]
+repo-constitution ▸ .codewhale/constitution.json (4 KB) [view]
 repo-rules ▸ CLAUDE.md, AGENTS.md (8 KB) [view]
 codemap-wiki ▸ 412 symbols indexed (auto) [rebuild]
 trace-store ▸ 3 recent workflow runs (—) [open]