perf(context): cache project context with content signatures

Harvested from PR #2636 by @HUQIANTAO, with cache invalidation widened to cover constitution files, generated context, trust state, canonical paths, and same-length overwrites.

Co-authored-by: HUQIANTAO <58421104+HUQIANTAO@users.noreply.github.com>
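
The "same-length overwrites" case in the message above is why the cache keys on file contents rather than metadata alone: a length-plus-mtime signature can miss an overwrite that keeps the byte count and lands within the filesystem's timestamp granularity. The following is a minimal sketch of that difference, not the commit's implementation. It assumes `DefaultHasher` from the standard library as a dependency-free stand-in for the SHA-256 digest the commit uses, and the function names `metadata_signature` and `content_signature` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::path::Path;
use std::time::SystemTime;

/// Metadata-only signature: file length plus modification time.
/// Returns `None` when the file's metadata cannot be read (e.g. it is missing).
fn metadata_signature(path: &Path) -> Option<(u64, Option<SystemTime>)> {
    let metadata = fs::metadata(path).ok()?;
    Some((metadata.len(), metadata.modified().ok()))
}

/// Content signature: a hash over the file's bytes.
/// Assumption: `DefaultHasher` replaces SHA-256 here only to avoid an external
/// crate; it is not collision-resistant and is unsuitable for security use.
fn content_signature(path: &Path) -> Option<u64> {
    let bytes = fs::read(path).ok()?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Some(hasher.finish())
}
```

Overwriting a file containing `alpha` with `bravo` leaves the length component of `metadata_signature` unchanged, while `content_signature` changes because the bytes differ.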
Hunter B
2026-06-03 23:49:08 -07:00
parent 6a7063c912
commit e18f072a5a
8 changed files with 475 additions and 21 deletions
+9 -2
@@ -97,14 +97,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
top-level folders visible in noisy large workspaces while the dynamic
`<project_context_pack>` marker remains controlled by its own setting
(#697, #1827).
- Project context loading now uses a bounded process-local content-signature
cache for repeated hot-path loads. The cache covers workspace/parent
instructions, global AGENTS/WHALE fallbacks, repo constitution files,
generated-context targets, trust markers, and trust config paths, and it
stores post-load signatures so auto-generated context deletion/regeneration
stays correct (#2636).
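
The "bounded" part of the entry above can be sketched as a map paired with an insertion-order queue, evicting the oldest-inserted key once capacity is exceeded. This is an illustrative generic version under stated assumptions: the type `BoundedCache` and its methods are hypothetical names, and the real cache is keyed by workspace path plus content signature rather than an arbitrary `K`.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical generic bounded cache with FIFO eviction by insertion order.
struct BoundedCache<K, V> {
    capacity: usize,
    by_key: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Clone + Eq + Hash, V: Clone> BoundedCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            by_key: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns a clone of the cached value, if any.
    fn lookup(&self, key: &K) -> Option<V> {
        self.by_key.get(key).cloned()
    }

    /// Stores `value`, then evicts oldest-inserted keys while over capacity.
    fn store(&mut self, key: K, value: V) {
        // Only a brand-new key joins the eviction queue; overwriting an
        // existing key keeps its original position.
        if self.by_key.insert(key.clone(), value).is_none() {
            self.order.push_back(key);
        }
        while self.by_key.len() > self.capacity {
            let Some(oldest) = self.order.pop_front() else {
                break;
            };
            self.by_key.remove(&oldest);
        }
    }

    /// Number of entries currently held.
    fn len(&self) -> usize {
        self.by_key.len()
    }
}
```

Eviction here follows insertion order, not recency of use, so a frequently read entry can still be dropped. For a small capacity on a hot path that trade-off keeps lookups free of bookkeeping.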
### Community
Thanks to **@cyq1017** for the restore-listing implementation (#2513) and
pending-input delivery-mode label work (#2532, #2054),
**@wywsoor** for the broader macOS/iTerm rollback UX report (#2494),
**@HUQIANTAO** for the `web_run` lock-splitting work (#2502) and turn-metadata
prefix-cache stability work (#2517), **@xyuai** for canonical CodeWhale
**@HUQIANTAO** for the `web_run` lock-splitting work (#2502), turn-metadata
prefix-cache stability work (#2517), and project-context cache direction
(#2636), **@xyuai** for canonical CodeWhale
settings-path migration work (#2730), **@gaord** for the runtime thread
workspace update and completed-thread save APIs (#2640, #2639),
**@shenjackyuanjie** for the
+2 -1
@@ -624,7 +624,8 @@ Current v0.9 track credits:
- **[shenjackyuanjie](https://github.com/shenjackyuanjie)** — HarmonyOS /
OpenHarmony porting work and MatePad Edge validation trail (#2634)
- **[HUQIANTAO](https://github.com/HUQIANTAO)** — `web_run` cache-state
lock-splitting and turn-metadata prefix-cache stability work (#2502, #2517)
lock-splitting, turn-metadata prefix-cache stability, and project-context
cache work (#2502, #2517, #2636)
- **[idling11](https://github.com/idling11)** — PlanArtifact continuity and
dense tool-call transcript collapse direction (#2733, #2738, #2692)
- **[h3c-hexin](https://github.com/h3c-hexin)** — sub-agent model inheritance,
+9 -2
@@ -97,14 +97,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
top-level folders visible in noisy large workspaces while the dynamic
`<project_context_pack>` marker remains controlled by its own setting
(#697, #1827).
- Project context loading now uses a bounded process-local content-signature
cache for repeated hot-path loads. The cache covers workspace/parent
instructions, global AGENTS/WHALE fallbacks, repo constitution files,
generated-context targets, trust markers, and trust config paths, and it
stores post-load signatures so auto-generated context deletion/regeneration
stays correct (#2636).
### Community
Thanks to **@cyq1017** for the restore-listing implementation (#2513) and
pending-input delivery-mode label work (#2532, #2054),
**@wywsoor** for the broader macOS/iTerm rollback UX report (#2494),
**@HUQIANTAO** for the `web_run` lock-splitting work (#2502) and turn-metadata
prefix-cache stability work (#2517), **@xyuai** for canonical CodeWhale
**@HUQIANTAO** for the `web_run` lock-splitting work (#2502), turn-metadata
prefix-cache stability work (#2517), and project-context cache direction
(#2636), **@xyuai** for canonical CodeWhale
settings-path migration work (#2730), **@gaord** for the runtime thread
workspace update and completed-thread save APIs (#2640, #2639),
**@shenjackyuanjie** for the
+14
@@ -2936,6 +2936,20 @@ fn home_config_path() -> Option<PathBuf> {
})
}
pub(crate) fn workspace_trust_config_candidate_paths() -> Vec<PathBuf> {
if let Some(path) = env_config_path() {
return vec![path];
}
let Some(home) = effective_home_dir() else {
return Vec::new();
};
vec![
home.join(".codewhale").join("config.toml"),
home.join(".deepseek").join("config.toml"),
]
}
#[must_use]
pub(crate) fn is_workspace_trusted(workspace: &Path) -> bool {
let Some(config_path) = default_config_path() else {
+1
@@ -51,6 +51,7 @@ mod palette;
mod prefix_cache;
mod pricing;
mod project_context;
mod project_context_cache;
mod project_doc;
mod prompt_zones;
mod prompts;
+219 -15
@@ -660,7 +660,23 @@ pub fn load_project_context(workspace: &Path) -> ProjectContext {
///
/// This allows for monorepo setups where a root AGENTS.md applies to all subdirectories.
pub fn load_project_context_with_parents(workspace: &Path) -> ProjectContext {
load_project_context_with_parents_and_home(workspace, dirs::home_dir().as_deref())
load_project_context_with_parents_cached_and_home(workspace, dirs::home_dir().as_deref())
}
fn load_project_context_with_parents_cached_and_home(
workspace: &Path,
home_dir: Option<&Path>,
) -> ProjectContext {
let workspace = canonicalize_workspace_or_keep(workspace);
let pre_load_key = crate::project_context_cache::compute_cache_key(&workspace, home_dir);
if let Some(ctx) = crate::project_context_cache::lookup(&pre_load_key) {
return ctx;
}
let ctx = load_project_context_with_parents_and_home(&workspace, home_dir);
let post_load_key = crate::project_context_cache::compute_cache_key(&workspace, home_dir);
crate::project_context_cache::store(post_load_key, ctx.clone());
ctx
}
fn load_project_context_with_parents_and_home(
@@ -746,6 +762,80 @@ fn load_project_context_with_parents_and_home(
ctx
}
pub(crate) fn project_context_cache_candidate_paths(
workspace: &Path,
home_dir: Option<&Path>,
) -> Vec<PathBuf> {
let workspace = canonicalize_workspace_or_keep(workspace);
let mut paths = Vec::new();
let mut current = Some(workspace.as_path());
while let Some(dir) = current {
for filename in PROJECT_CONTEXT_FILES {
paths.push(dir.join(filename));
}
current = dir.parent();
}
if let Some(home) = home_dir {
for candidate in global_context_relative_paths() {
paths.push(join_relative_components(home, candidate));
}
}
paths.extend(repo_constitution_candidate_paths(&workspace));
paths.push(workspace.join(".deepseek").join("trusted"));
paths.push(workspace.join(".deepseek").join("trust.json"));
paths.extend(crate::config::workspace_trust_config_candidate_paths());
paths
}
fn repo_constitution_candidate_paths(workspace: &Path) -> Vec<PathBuf> {
let git_root = crate::project_doc::find_git_root(workspace);
let mut current = workspace.to_path_buf();
let mut paths = Vec::new();
loop {
paths.push(join_relative_components(
&current,
REPO_CONSTITUTION_RELATIVE_PATH,
));
if let Some(ref root) = git_root
&& current == *root
{
break;
}
match current.parent() {
Some(parent) if parent != current => current = parent.to_path_buf(),
_ => break,
}
}
paths
}
fn global_context_relative_paths() -> [&'static [&'static str]; 6] {
[
GLOBAL_AGENTS_RELATIVE_PATH,
GLOBAL_AGENTS_VENDOR_NEUTRAL_PATH,
GLOBAL_AGENTS_LEGACY_PATH,
GLOBAL_WHALE_RELATIVE_PATH,
GLOBAL_WHALE_VENDOR_NEUTRAL_PATH,
GLOBAL_WHALE_LEGACY_PATH,
]
}
fn join_relative_components(base: &Path, relative: &[&str]) -> PathBuf {
let mut path = base.to_path_buf();
for component in relative {
path.push(component);
}
path
}
fn canonicalize_workspace_or_keep(workspace: &Path) -> PathBuf {
fs::canonicalize(workspace).unwrap_or_else(|_| workspace.to_path_buf())
}
/// Combine global user-wide preferences with a project-local
/// AGENTS.md/CLAUDE.md/instructions.md. Global comes first so
/// workspace-specific rules can override it — the model reads in declared
@@ -776,22 +866,10 @@ fn load_global_agents_context(workspace: &Path, home_dir: Option<&Path>) -> Opti
// 4. ~/.codewhale/WHALE.md (deprecated, legacy fallback)
// 5. ~/.agents/WHALE.md (deprecated, vendor-neutral legacy)
// 6. ~/.deepseek/WHALE.md (deprecated, legacy)
let candidates: &[&[&str]] = &[
GLOBAL_AGENTS_RELATIVE_PATH,
GLOBAL_AGENTS_VENDOR_NEUTRAL_PATH,
GLOBAL_AGENTS_LEGACY_PATH,
GLOBAL_WHALE_RELATIVE_PATH,
GLOBAL_WHALE_VENDOR_NEUTRAL_PATH,
GLOBAL_WHALE_LEGACY_PATH,
];
let mut warnings = Vec::new();
for candidate in candidates {
let mut path = home.to_path_buf();
for component in *candidate {
path.push(component);
}
for candidate in global_context_relative_paths() {
let path = join_relative_components(home, candidate);
if path.exists() && path.is_file() {
match load_context_file(&path) {
@@ -1434,6 +1512,132 @@ mod tests {
);
}
#[test]
fn cached_context_reflects_overwritten_agents_md() {
crate::project_context_cache::clear();
let workspace = tempdir().expect("workspace tempdir");
let home = tempdir().expect("home tempdir");
let agents = workspace.path().join("AGENTS.md");
fs::write(&agents, "alpha").expect("write alpha");
let first =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(
first
.instructions
.as_deref()
.is_some_and(|s| s.contains("alpha")),
"expected alpha instructions: {:?}",
first.instructions
);
fs::write(&agents, "bravo").expect("write bravo");
let second =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(
second
.instructions
.as_deref()
.is_some_and(|s| s.contains("bravo")),
"cache must invalidate on same-length content overwrite: {:?}",
second.instructions
);
}
#[test]
fn cached_context_reflects_constitution_json_change() {
crate::project_context_cache::clear();
let workspace = tempdir().expect("workspace tempdir");
let home = tempdir().expect("home tempdir");
fs::create_dir(workspace.path().join(".git")).expect("mkdir git");
fs::create_dir(workspace.path().join(".codewhale")).expect("mkdir codewhale");
let constitution = workspace
.path()
.join(".codewhale")
.join("constitution.json");
fs::write(
&constitution,
r#"{"schema_version":1,"authority":["alpha authority"]}"#,
)
.expect("write alpha constitution");
let first =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(
first
.constitution_block
.as_deref()
.is_some_and(|s| s.contains("alpha authority")),
"expected alpha constitution block: {:?}",
first.constitution_block
);
fs::write(
&constitution,
r#"{"schema_version":1,"authority":["bravo authority"]}"#,
)
.expect("write bravo constitution");
let second =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(
second
.constitution_block
.as_deref()
.is_some_and(|s| s.contains("bravo authority")),
"cache must invalidate when constitution changes: {:?}",
second.constitution_block
);
}
#[test]
fn cached_context_regenerates_after_auto_generated_context_is_deleted() {
crate::project_context_cache::clear();
let workspace = tempdir().expect("workspace tempdir");
let home = tempdir().expect("home tempdir");
let first =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(first.has_instructions());
let generated_path = workspace.path().join(".codewhale").join("instructions.md");
assert!(generated_path.is_file(), "expected generated instructions");
fs::remove_file(&generated_path).expect("remove generated instructions");
assert!(!generated_path.exists());
let second =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(second.has_instructions());
assert!(
generated_path.is_file(),
"cache hit under the missing-file signature would skip regeneration"
);
}
#[test]
fn cached_context_reflects_trust_marker_created() {
crate::project_context_cache::clear();
let workspace = tempdir().expect("workspace tempdir");
let home = tempdir().expect("home tempdir");
fs::write(workspace.path().join("AGENTS.md"), "instructions").expect("write agents");
let first =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(!first.is_trusted);
let trust_dir = workspace.path().join(".deepseek");
fs::create_dir(&trust_dir).expect("mkdir trust dir");
fs::write(trust_dir.join("trusted"), "").expect("write trust marker");
let second =
load_project_context_with_parents_cached_and_home(workspace.path(), Some(home.path()));
assert!(
second.is_trusted,
"cache must invalidate when trust marker appears"
);
}
#[test]
fn project_context_pack_sort_is_cross_platform_and_priority_aware() {
let mut unix_paths = vec![
+220
@@ -0,0 +1,220 @@
//! Process-local, per-thread cache for project context loading.
//!
//! The project-context loader sits on prompt/session hot paths and repeatedly
//! checks the same workspace, parent, global, constitution, and trust files.
//! This cache avoids rereading unchanged context while keeping the signature
//! broad enough for the loader's side effects and authority surfaces.
use std::cell::RefCell;
use std::collections::{HashMap, VecDeque};
use std::path::{Path, PathBuf};
use sha2::{Digest, Sha256};
use crate::project_context::ProjectContext;
const DEFAULT_CAPACITY: usize = 8;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub(crate) struct CacheKey {
workspace: PathBuf,
signature: ContentSignature,
}
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
struct ContentSignature {
entries: Vec<ContentEntry>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ContentEntry {
path: PathBuf,
fingerprint: Option<String>,
}
#[derive(Debug, Default)]
struct WorkspaceCache {
by_key: HashMap<CacheKey, ProjectContext>,
order: VecDeque<CacheKey>,
}
thread_local! {
static CACHE: RefCell<WorkspaceCache> = RefCell::new(WorkspaceCache::default());
}
pub(crate) fn lookup(key: &CacheKey) -> Option<ProjectContext> {
CACHE.with(|cache| cache.borrow().by_key.get(key).cloned())
}
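/// Inserts `value` under `key`, then evicts the oldest-inserted entries (FIFO)
/// while the cache holds more than `DEFAULT_CAPACITY` entries.
pub(crate) fn store(key: CacheKey, value: ProjectContext) {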
CACHE.with(|cache| {
let mut cache = cache.borrow_mut();
if cache.by_key.insert(key.clone(), value).is_none() {
cache.order.push_back(key);
}
while cache.by_key.len() > DEFAULT_CAPACITY {
let Some(oldest) = cache.order.pop_front() else {
break;
};
cache.by_key.remove(&oldest);
}
});
}
#[cfg(test)]
pub(crate) fn clear() {
CACHE.with(|cache| {
let mut cache = cache.borrow_mut();
cache.by_key.clear();
cache.order.clear();
});
}
#[must_use]
pub(crate) fn compute_cache_key(workspace: &Path, home_dir: Option<&Path>) -> CacheKey {
let workspace = canonicalize_or_keep(workspace);
CacheKey {
signature: ContentSignature::for_loader(&workspace, home_dir),
workspace,
}
}
impl ContentSignature {
fn for_loader(workspace: &Path, home_dir: Option<&Path>) -> Self {
let mut entries: Vec<ContentEntry> =
crate::project_context::project_context_cache_candidate_paths(workspace, home_dir)
.into_iter()
.map(|path| ContentEntry {
fingerprint: file_fingerprint(&path),
path,
})
.collect();
entries.sort_by(|a, b| a.path.cmp(&b.path));
entries.dedup_by(|a, b| a.path == b.path);
Self { entries }
}
}
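/// Fingerprints `path` for the cache signature: `None` when metadata cannot be
/// read (for example, the path is missing), `"non-file"` for directories and
/// other non-files, a SHA-256 hex digest for readable files, and a
/// length/mtime/error string when the file exists but cannot be read.
fn file_fingerprint(path: &Path) -> Option<String> {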
let metadata = std::fs::metadata(path).ok()?;
if !metadata.is_file() {
return Some("non-file".to_string());
}
match std::fs::read(path) {
Ok(bytes) => {
let mut hasher = Sha256::new();
hasher.update(&bytes);
Some(format!("sha256:{}", to_hex(&hasher.finalize())))
}
Err(error) => {
let modified = metadata
.modified()
.ok()
.and_then(|mtime| mtime.duration_since(std::time::UNIX_EPOCH).ok())
.map(|duration| format!("{}:{}", duration.as_secs(), duration.subsec_nanos()))
.unwrap_or_else(|| "unknown".to_string());
Some(format!(
"unreadable:{}:{}:{error}",
metadata.len(),
modified
))
}
}
}
fn canonicalize_or_keep(path: &Path) -> PathBuf {
std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}
fn to_hex(bytes: &[u8]) -> String {
let mut out = String::with_capacity(bytes.len() * 2);
for byte in bytes {
use std::fmt::Write as _;
let _ = write!(&mut out, "{byte:02x}");
}
out
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs;
use tempfile::tempdir;
#[test]
fn cache_round_trip() {
clear();
let key = CacheKey {
workspace: PathBuf::from("/tmp/context-cache-round-trip"),
signature: ContentSignature::default(),
};
let ctx = ProjectContext::empty(PathBuf::from("/tmp/context-cache-round-trip"));
store(key.clone(), ctx.clone());
let got = lookup(&key).expect("cache hit");
assert_eq!(got.project_root, ctx.project_root);
}
#[test]
fn store_does_not_grow_unbounded() {
clear();
for i in 0..(DEFAULT_CAPACITY + 4) {
let key = CacheKey {
workspace: PathBuf::from(format!("/tmp/workspace-{i}")),
signature: ContentSignature::default(),
};
store(key, ProjectContext::empty(PathBuf::from("/tmp")));
}
let count = CACHE.with(|cache| cache.borrow().by_key.len());
assert!(count <= DEFAULT_CAPACITY, "cache held {count} entries");
}
#[test]
fn cache_key_canonicalizes_equivalent_workspace_paths() {
let workspace = tempdir().expect("workspace");
let home = tempdir().expect("home");
let plain = compute_cache_key(workspace.path(), Some(home.path()));
let dotted = compute_cache_key(&workspace.path().join("."), Some(home.path()));
assert_eq!(plain, dotted);
}
#[test]
fn signature_changes_when_agents_md_is_overwritten_same_length() {
let workspace = tempdir().expect("workspace");
let home = tempdir().expect("home");
fs::write(workspace.path().join("AGENTS.md"), "alpha").expect("write alpha");
let before = compute_cache_key(workspace.path(), Some(home.path()));
fs::write(workspace.path().join("AGENTS.md"), "bravo").expect("write bravo");
let after = compute_cache_key(workspace.path(), Some(home.path()));
assert_ne!(before, after);
}
#[test]
fn signature_changes_when_constitution_json_changes() {
let workspace = tempdir().expect("workspace");
let home = tempdir().expect("home");
fs::create_dir(workspace.path().join(".git")).expect("mkdir git");
fs::create_dir(workspace.path().join(".codewhale")).expect("mkdir codewhale");
let constitution = workspace
.path()
.join(".codewhale")
.join("constitution.json");
fs::write(&constitution, r#"{"schema_version":1,"authority":["a"]}"#)
.expect("write constitution a");
let before = compute_cache_key(workspace.path(), Some(home.path()));
fs::write(&constitution, r#"{"schema_version":1,"authority":["b"]}"#)
.expect("write constitution b");
let after = compute_cache_key(workspace.path(), Some(home.path()));
assert_ne!(before, after);
}
}
+1 -1
@@ -56,7 +56,7 @@ harvest/stewardship commits:
| #2029 sub-agent checkpoint continuation | Locally implemented as the live-timeout recovery slice. | Sub-agents now persist `SubAgentCheckpoint` metadata through state, results, projections, and transcript handles. The runner checkpoints local messages before API calls and after model/tool cycles; per-step API timeout marks the child interrupted with `continuable=true`; `agent_eval { continue: true }` resumes only live checkpointed interrupted children. Reload preserves checkpoint metadata, but cold-restart continuation is intentionally not claimed because the child task/input channel is not rehydrated yet. `cargo test -p codewhale-tui --bin codewhale-tui --locked subagent -- --nocapture`, `cargo fmt --all -- --check`, `git diff --check`, and `cargo clippy -p codewhale-tui --locked -- -D warnings` passed. Credit @qiyuanlicn for the recovery report; keep #2029 open only if cold-restart continuation or broader checkpoint UX remains required. |
| #1786 stale running task recovery | Locally implemented as the durable restart-safety slice. | `TaskManager::load_state` now marks tasks that were persisted as `running` in a prior process as failed with an explicit restart/interrupted error instead of requeueing them. Running tool-call summaries inside those stale tasks are also marked failed. `cargo test -p codewhale-tui --bin codewhale-tui --locked running_tasks_are_not_requeued_after_restart -- --nocapture` and `cargo test -p codewhale-tui --bin codewhale-tui --locked task_manager -- --nocapture` passed. Credit @bevis-wong; keep #1786 open for foreground shell hang root cause and careful LIVE-state watchdog work that does not abort legitimate foreground commands. |
| #697/#1827 bounded auto-generated project context | Locally implemented from the stabilization audit. | When no project instructions exist, startup now writes `.codewhale/instructions.md` from the bounded Project Context Pack data instead of an unbounded summary/tree scan. The generated file avoids the dynamic `<project_context_pack>` marker when that setting is disabled, keeps later top-level folders visible, and omits noisy directory tails. `cargo test -p codewhale-tui --bin codewhale-tui --locked auto_generated_context_is_bounded_for_many_file_workspace -- --nocapture` and `cargo test -p codewhale-tui --bin codewhale-tui --locked project_context_pack -- --nocapture` passed. Credit reporters @NASLXTO and @wuxixing, plus earlier context-cap/startup work from @linzhiqin2003 and @merchloubna70-dot; leave #697/#1827 open pending real massive-repo/manual startup verification. |
| #2636 project-context mtime cache | Defer direct merge; harvest only after cache key/signature is widened. | Must include constitution changes, auto-generated context deletion, canonical path equivalence, and overwrite detection before landing. |
| #2636 project-context content-signature cache | Locally harvested with widened invalidation. | Project context hot-path loads now use a bounded process-local cache keyed by canonical workspace plus content fingerprints for workspace/parent instructions, global AGENTS/WHALE fallbacks, repo constitution candidates, generated-context targets, trust markers, and trust config paths. The wrapper stores under a post-load signature so auto-generated `.codewhale/instructions.md` deletion/regeneration stays correct. `cargo test -p codewhale-tui --bin codewhale-tui --locked project_context -- --nocapture` passed. Credit @HUQIANTAO; comment/close #2636 after the integration branch is public. |
| #2634 HarmonyOS port | Locally harvested with additional Nix-chain clearance; keep credited and do not close until the integration branch is public. | User-supplied MatePad Edge demo (`https://bilibili.com/video/av116689597368905`) confirms real-device interest. Added env-driven OpenHarmony SDK setup, OHOS platform guards/fallbacks, self-update disablement, and OHOS target gating for Starlark execpolicy parsing plus PTY support so published OHOS builds do not pull `nix` 0.28 through `rustyline` or `portable-pty`. `./scripts/release/check-ohos-deps.sh` now guards the OHOS graph against `nix` 0.28/0.29, `portable-pty`, `starlark`, `arboard`, and `keyring`; `cargo check --workspace --all-features --locked` and focused PTY/clipboard tests passed. Full OHOS target check is blocked on this host because `OHOS_NATIVE_SDK`/target CC/sysroot are not configured and `ring` cannot find `assert.h`. |
| #2687 append-only mode/approval prompt | Defer direct merge; draft has compile failures and Plan-mode prompt correctness risks. | Any future harvest must keep stable `message[0]` genuinely mode-agnostic, preserve mode/approval suffixes after capacity replans, and distinguish external overrides from persisted generated prompts. |
| #2581 provider fallback chain design doc | Manually harvested as `docs/rfcs/2574-provider-fallback-chain.md` because the current PR head has no net file changes. | Keep issue #2574 open for implementation; close/comment on #2581 after the integration branch is public, crediting @idling11 and reporter @hsdbeebou. |
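
The #2636 row's point about storing under a post-load signature can be illustrated with a loader that has a side effect. The sketch below is a simplified model, not the project's code: it assumes an in-memory `Files` map in place of the real filesystem, a signature that covers only the generated file, and hypothetical helper names (`signature`, `load`, `load_cached`). Only the `.codewhale/instructions.md` path comes from the source.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory stand-in for the workspace: path -> contents.
type Files = BTreeMap<String, String>;

/// Path the loader writes when no project instructions exist.
const GENERATED: &str = ".codewhale/instructions.md";

/// Signature of what the loader may read: the generated file's contents,
/// or `None` when the file is missing.
fn signature(files: &Files) -> Option<String> {
    files.get(GENERATED).cloned()
}

/// Loader with a side effect: creates the generated file when it is missing.
/// `loads` counts how often the uncached path actually runs.
fn load(files: &mut Files, loads: &mut u32) -> String {
    *loads += 1;
    files
        .entry(GENERATED.to_string())
        .or_insert_with(|| "generated".to_string())
        .clone()
}

/// Looks up under the pre-load signature, but stores under the signature
/// computed after loading, once any generated file exists.
fn load_cached(
    files: &mut Files,
    cache: &mut HashMap<Option<String>, String>,
    loads: &mut u32,
) -> String {
    let pre_load_key = signature(files);
    if let Some(hit) = cache.get(&pre_load_key) {
        return hit.clone();
    }
    let ctx = load(files, loads);
    // Storing under `pre_load_key` would cache the result against the
    // "file missing" state, so a later deletion would hit the cache and
    // skip regeneration.
    let post_load_key = signature(files);
    cache.insert(post_load_key, ctx.clone());
    ctx
}
```

In this model the first call generates the file and caches the result under the "file present" signature. A repeat call hits the cache, and deleting the file produces a "file missing" signature that has no entry, so the loader runs again and regenerates it.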