feat(tui): add configurable auto-compact threshold

Refs #1722

Preserves auto_compact as opt-in, adds the saved threshold setting, keeps the 500K hard floor, and wires Ctrl+L as a manual compaction shortcut for context-pressure recovery.

Harvested from PR #1723 by @aboimpinto

Co-authored-by: Paulo Aboim Pinto <aboimpinto@gmail.com>
@@ -481,8 +481,10 @@ exponential_base = 2.0
 # Context Compaction
 # ─────────────────────────────────────────────────────────────────────────────────
 # Auto-compaction is a saved UI setting edited with `/config` (`auto_compact`).
-# There is no config-file `[compaction]` table yet; detailed thresholds are
-# chosen by the TUI from the active model/context budget.
+# The optional saved threshold setting is `auto_compact_threshold_percent`
+# (default 70, still gated by the 500K-token floor). There is no config-file
+# `[compaction]` table yet; runtime compaction budgets are chosen by the TUI
+# from the active model/context window.

 # Append-only Flash seams are experimental and opt-in while the v0.7.5
 # context/cache audit validates prefix-cache behavior.
@@ -995,7 +995,7 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
 1. Use `/compact` to summarize earlier context and free up space\n\
 2. The system will preserve important information (files you're working on, recent messages, tool results)\n\
 3. After compaction, you'll see a summary of what was discussed and can continue seamlessly\n\n\
-If you notice context is getting long (>60% during sustained work), proactively suggest using `/compact` to the user.\n\n\
+If you notice context is getting long (>60% during sustained work), proactively suggest using `/compact` or Ctrl+L to the user. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed.\n\n\
 ### Prompt-cache awareness\n\n\
 DeepSeek caches the longest *byte-stable prefix* of every request and charges roughly 100× less for cache-hit tokens than miss tokens. The system prompt above is layered most-static-first specifically so the prefix stays stable turn-over-turn. To keep cache hits high:\n\
 - **Working set location:** the current repo working set is stored on new user messages inside a `<turn_meta>` block. Treat it as high-priority turn metadata, not as a stable system-prompt section.\n\
@@ -204,7 +204,7 @@ For exact counts or structured aggregates, compute them directly in Python insid

 ## Context Management

-You have a 1M-token context window. During long coding sessions, suggest `/compact` when usage approaches ~60% or when the app marks context pressure as high. It summarizes earlier turns so you can keep working without losing thread.
+You have a 1M-token context window. During long coding sessions, suggest `/compact` or Ctrl+L when usage approaches ~60% or when the app marks context pressure as high. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed. Compaction summarizes earlier turns so you can keep working without losing thread.

 Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`) before final answers. These are invisible to the user but count against context. Cost/token estimates are approximate; treat them as a rough guide.

@@ -31,7 +31,7 @@ RLM works by keeping the long input and intermediate values as symbolic REPL sta
 The Python helpers visible inside the REPL (`sub_query`, `sub_query_batch`, `sub_query_map`, `sub_rlm`, `finalize`, and related context helpers) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code.

 ## Context
-You have a 1M-token context window. During long coding sessions, suggest `/compact` when usage approaches ~60% or when the app marks context pressure as high. It summarizes earlier turns so you can keep working without losing thread.
+You have a 1M-token context window. During long coding sessions, suggest `/compact` or Ctrl+L when usage approaches ~60% or when the app marks context pressure as high. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed. Compaction summarizes earlier turns so you can keep working without losing thread.

 Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`) before final answers. These are invisible to the user but count against context. Cost/token estimates are approximate; treat them as a rough guide.

@@ -26,6 +26,6 @@ Don't sequence approvals one at a time — the user wants context, not interrupt
 Long sessions accumulate context. To stay fast:
 - Open sub-agent sessions for independent work instead of doing everything sequentially
 - Batch reads/searches/git-inspections into parallel tool calls
-- Suggest `/compact` when context nears 60% during sustained work — the compaction relay preserves open blockers
+- Suggest `/compact` or Ctrl+L when context nears 60% during sustained work — the compaction relay preserves open blockers
 - Use `note` for decisions you'll need across compaction boundaries
 - A 3-turn session that fans out to sub-agents finishes faster AND stays responsive longer than a 15-turn sequential grind
@@ -171,6 +171,9 @@ impl TuiPrefs {
 pub struct Settings {
     /// Auto-compact conversations when they approach the model limit.
     pub auto_compact: bool,
+    /// Context-window percentage that triggers pre-send auto-compaction when
+    /// `auto_compact` is enabled. The hard token floor still applies.
+    pub auto_compact_threshold_percent: f64,
     /// Reduce status noise and collapse details more aggressively
     pub calm_mode: bool,
     /// Streaming pacing mode. `true` pins the chunker to one-character-per-
@@ -299,6 +302,7 @@ impl Default for Settings {
             // available for users / agents that decide compaction is
             // worth the cache hit on their workload (#664).
             auto_compact: false,
+            auto_compact_threshold_percent: 70.0,
             calm_mode: false,
             low_motion: false,
             fancy_animations: true,
@@ -497,6 +501,10 @@ impl Settings {
             "auto_compact" | "compact" => {
                 self.auto_compact = parse_bool(value)?;
             }
+            "auto_compact_threshold" | "auto_compact_threshold_percent" => {
+                self.auto_compact_threshold_percent =
+                    parse_percent_setting("auto_compact_threshold_percent", value)?;
+            }
             "calm_mode" | "calm" => {
                 self.calm_mode = parse_bool(value)?;
             }
@@ -701,6 +709,10 @@ impl Settings {
         lines.push(tr(locale, MessageId::SettingsTitle).to_string());
         lines.push("─────────────────────────────".to_string());
         lines.push(format!(" auto_compact: {}", self.auto_compact));
+        lines.push(format!(
+            " auto_compact_pct: {:.0}",
+            self.auto_compact_threshold_percent
+        ));
         lines.push(format!(" calm_mode: {}", self.calm_mode));
         lines.push(format!(" low_motion: {}", self.low_motion));
         lines.push(format!(" fancy_animations: {}", self.fancy_animations));
@@ -768,6 +780,10 @@ impl Settings {
                 "auto_compact",
                 "Auto-compact near the hard context limit: on/off (default off)",
             ),
+            (
+                "auto_compact_threshold_percent",
+                "Auto-compact trigger threshold percent when auto_compact is on: 10-100 (default 70)",
+            ),
             ("calm_mode", "Calmer UI defaults: on/off"),
             (
                 "low_motion",
@@ -932,6 +948,21 @@ fn parse_usize_setting(key: &str, value: &str) -> Result<usize> {
     })
 }

+fn parse_percent_setting(key: &str, value: &str) -> Result<f64> {
+    let trimmed = value.trim().trim_end_matches('%').trim();
+    let percent = trimmed.parse::<f64>().map_err(|_| {
+        anyhow::anyhow!(
+            "Failed to update setting: invalid {key} '{value}'. Expected a number from 10 to 100."
+        )
+    })?;
+    if !(10.0..=100.0).contains(&percent) {
+        anyhow::bail!(
+            "Failed to update setting: invalid {key} '{value}'. Expected a number from 10 to 100."
+        );
+    }
+    Ok(percent)
+}
+
 fn normalize_mode(value: &str) -> &str {
     match value.trim().to_ascii_lowercase().as_str() {
         "edit" => "agent",
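To see which inputs `parse_percent_setting` accepts, here is a minimal standalone sketch of the same trimming and range check. It assumes a plain `String` error instead of `anyhow` so that it compiles without external crates; the name `parse_percent` is hypothetical and not part of the commit.

```rust
/// Parses a percentage setting such as "65", "65%", or " 70 % ".
/// Hypothetical stand-in for `parse_percent_setting`: it returns a `String`
/// error (an assumption of this sketch) rather than `anyhow::Error`.
fn parse_percent(key: &str, value: &str) -> Result<f64, String> {
    // Drop surrounding whitespace, any trailing '%' signs, then whitespace again.
    let trimmed = value.trim().trim_end_matches('%').trim();
    let percent: f64 = trimmed
        .parse()
        .map_err(|_| format!("invalid {key} '{value}': expected a number from 10 to 100"))?;
    // The inclusive range check also rejects NaN and infinities, because they
    // fail at least one of the two bound comparisons.
    if !(10.0..=100.0).contains(&percent) {
        return Err(format!("invalid {key} '{value}': expected a number from 10 to 100"));
    }
    Ok(percent)
}
```

Under these assumptions, `"65%"` and `" 70 % "` parse to `65.0` and `70.0`, while `"9"`, `"101"`, `"NaN"`, and `"abc"` are rejected.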
@@ -1103,6 +1134,7 @@ mod tests {
         // flipped so the cache-friendly path is the one users get
         // without configuring anything (#664).
         assert!(!settings.auto_compact);
+        assert_eq!(settings.auto_compact_threshold_percent, 70.0);
     }

     #[test]
@@ -1114,6 +1146,17 @@ mod tests {
         assert!(!settings.auto_compact);
     }

+    #[test]
+    fn auto_compact_threshold_is_validated() {
+        let mut settings = Settings::default();
+        settings
+            .set("auto_compact_threshold", "65%")
+            .expect("threshold");
+        assert_eq!(settings.auto_compact_threshold_percent, 65.0);
+        assert!(settings.set("auto_compact_threshold", "9").is_err());
+        assert!(settings.set("auto_compact_threshold", "101").is_err());
+    }
+
     #[test]
     fn default_settings_show_footer_water_strip() {
         let settings = Settings::default();
@@ -1220,6 +1220,7 @@ pub struct App {
     #[allow(dead_code)]
     pub system_prompt: Option<SystemPrompt>,
     pub auto_compact: bool,
+    pub auto_compact_threshold_percent: f64,
     pub calm_mode: bool,
     pub low_motion: bool,
     /// Pending #61 (animated working strip). Set from config but not read
@@ -1748,6 +1749,7 @@ impl App {
             crate::config::active_provider_uses_env_only_api_key(&effective_auth_config);
         let was_onboarded = crate::tui::onboarding::is_onboarded();
         let auto_compact = settings.auto_compact;
+        let auto_compact_threshold_percent = settings.auto_compact_threshold_percent;
         let calm_mode = settings.calm_mode;
         let low_motion = settings.low_motion;
         let fancy_animations = settings.fancy_animations;
@@ -1946,6 +1948,7 @@ impl App {
             bracketed_paste_seen: false,
             system_prompt: None,
             auto_compact,
+            auto_compact_threshold_percent,
             calm_mode,
             low_motion,
             fancy_animations,
@@ -40,7 +40,7 @@ use crate::client::{
     inspect_prompt_for_request,
 };
 use crate::commands;
-use crate::compaction::estimate_input_tokens_conservative;
+use crate::compaction::{MINIMUM_AUTO_COMPACTION_TOKENS, estimate_input_tokens_conservative};
 use crate::config::{
     ApiProvider, Config, DEFAULT_NVIDIA_NIM_BASE_URL, ProviderConfig, ProvidersConfig, StatusItem,
     UpdateConfig, save_provider_auth_mode_for,
@@ -145,6 +145,7 @@ const MIN_CHAT_HEIGHT: u16 = 3;
 const MIN_COMPOSER_HEIGHT: u16 = 2;
 const CONTEXT_WARNING_THRESHOLD_PERCENT: f64 = 85.0;
 const CONTEXT_CRITICAL_THRESHOLD_PERCENT: f64 = 95.0;
+const CONTEXT_SUGGEST_COMPACT_THRESHOLD_PERCENT: f64 = 60.0;
 const UI_IDLE_POLL_MS: u64 = 48;
 const UI_ACTIVE_POLL_MS: u64 = 24;
 const WEB_CONFIG_POLL_MS: u64 = 16;
@@ -2934,6 +2935,22 @@ async fn run_event_loop(
                     continue;
                 }

+                if matches!(key.code, KeyCode::Char('l') | KeyCode::Char('L'))
+                    && key.modifiers.contains(KeyModifiers::CONTROL)
+                    && app.view_stack.is_empty()
+                {
+                    app.status_message = Some(if app.is_compacting {
+                        "Context compaction already in progress...".to_string()
+                    } else {
+                        "Compacting context (Ctrl+L)...".to_string()
+                    });
+                    if !app.is_compacting {
+                        let _ = engine_handle.send(Op::CompactContext).await;
+                    }
+                    app.needs_redraw = true;
+                    continue;
+                }
+
                 if matches!(key.code, KeyCode::Char('b') | KeyCode::Char('B'))
                     && key.modifiers.contains(KeyModifiers::CONTROL)
                     && app.view_stack.is_empty()
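The Ctrl+L branch above boils down to a three-way decision. The sketch below models it as a pure function so the cases are easy to read; `CompactAction` and `on_ctrl_l` are hypothetical names for illustration, and the boolean parameters stand in for the `key`, `app.view_stack`, and `app.is_compacting` state used in the real handler.

```rust
/// Hypothetical outcome type; the commit itself mutates `app` and sends an
/// `Op::CompactContext` directly rather than returning a value.
#[derive(Debug, PartialEq)]
enum CompactAction {
    /// Send a compaction request to the engine.
    Start,
    /// A compaction is already running; only the status line changes.
    AlreadyRunning,
    /// Not handled here; later key handlers get a chance.
    Ignore,
}

/// Decides what a key press should do, mirroring the conditions in the hunk.
fn on_ctrl_l(ctrl_held: bool, key: char, overlay_open: bool, is_compacting: bool) -> CompactAction {
    // Accept both 'l' and 'L', as the handler matches either case.
    let is_l = key.eq_ignore_ascii_case(&'l');
    if !(ctrl_held && is_l) || overlay_open {
        return CompactAction::Ignore;
    }
    if is_compacting {
        CompactAction::AlreadyRunning
    } else {
        CompactAction::Start
    }
}
```

The point of the `AlreadyRunning` case is that a repeated Ctrl+L during a compaction updates the status message without queueing a second compaction request.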
@@ -4634,7 +4651,8 @@ async fn dispatch_user_message(
     });
     maybe_warn_context_pressure(app);
     if should_auto_compact_before_send(app) {
-        app.status_message = Some("Context critical; compacting before send...".to_string());
+        app.status_message =
+            Some("Context threshold reached; compacting before send...".to_string());
         let _ = engine_handle.send(Op::CompactContext).await;
     }
     app.session.last_prompt_tokens = None;
@@ -7869,14 +7887,21 @@ fn maybe_warn_context_pressure(app: &mut App) {
         return;
     };

-    if percent < CONTEXT_WARNING_THRESHOLD_PERCENT {
+    let configured_threshold = app.auto_compact_threshold_percent.clamp(10.0, 100.0);
+    let warning_threshold = CONTEXT_SUGGEST_COMPACT_THRESHOLD_PERCENT.min(configured_threshold);
+    if percent < warning_threshold {
         return;
     }

-    let recommendation = if app.auto_compact {
-        "Auto-compaction is enabled."
+    let below_auto_floor = used < MINIMUM_AUTO_COMPACTION_TOKENS as i64;
+    let recommendation = if !app.auto_compact {
+        "Consider enabling auto_compact or use /compact."
+    } else if below_auto_floor {
+        "Auto-compaction is enabled but waits for the 500K token floor."
+    } else if percent >= configured_threshold {
+        "Auto-compaction will run before the next send."
     } else {
-        "Consider /compact or /clear."
+        "Auto-compaction is enabled."
     };

     if percent >= CONTEXT_CRITICAL_THRESHOLD_PERCENT {
@@ -7887,8 +7912,13 @@ fn maybe_warn_context_pressure(app: &mut App) {
     }

     if app.status_message.is_none() {
+        let status_prefix = if percent >= CONTEXT_WARNING_THRESHOLD_PERCENT {
+            "Context high"
+        } else {
+            "Context building"
+        };
         app.status_message = Some(format!(
-            "Context high: {percent:.0}% ({used}/{max} tokens). {recommendation}"
+            "{status_prefix}: {percent:.0}% ({used}/{max} tokens). {recommendation}"
         ));
     }
 }
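The two `maybe_warn_context_pressure` hunks above can be read as one pure function from usage numbers to a status line. The following is a rough sketch under stated assumptions: `context_status` is a hypothetical name, the floor is taken to be 500,000 tokens (the diff names `MINIMUM_AUTO_COMPACTION_TOKENS` and describes it as "500K" without showing its value), the percentage is computed directly from `used` and `max` rather than from `context_usage_snapshot`, and the critical (95%) branch, whose body is not shown in the diff, is left out.

```rust
const FLOOR_TOKENS: i64 = 500_000; // assumed value of MINIMUM_AUTO_COMPACTION_TOKENS
const SUGGEST_PERCENT: f64 = 60.0; // CONTEXT_SUGGEST_COMPACT_THRESHOLD_PERCENT
const WARNING_PERCENT: f64 = 85.0; // CONTEXT_WARNING_THRESHOLD_PERCENT

/// Returns the status line the warning path would show, or `None` when
/// usage is still below the point where a warning is raised.
fn context_status(auto_compact: bool, threshold: f64, used: i64, max: i64) -> Option<String> {
    if max <= 0 {
        return None;
    }
    let percent = used as f64 * 100.0 / max as f64;
    let configured = threshold.clamp(10.0, 100.0);
    // Warn at 60%, or earlier when the configured threshold is lower.
    if percent < SUGGEST_PERCENT.min(configured) {
        return None;
    }
    let recommendation = if !auto_compact {
        "Consider enabling auto_compact or use /compact."
    } else if used < FLOOR_TOKENS {
        "Auto-compaction is enabled but waits for the 500K token floor."
    } else if percent >= configured {
        "Auto-compaction will run before the next send."
    } else {
        "Auto-compaction is enabled."
    };
    let prefix = if percent >= WARNING_PERCENT {
        "Context high"
    } else {
        "Context building"
    };
    Some(format!(
        "{prefix}: {percent:.0}% ({used}/{max} tokens). {recommendation}"
    ))
}
```

In this sketch, a 1M-token window at 720,000 tokens with the default threshold of 70 yields a "Context building" line that announces compaction before the next send, while 150,000 tokens with a threshold of 10 yields the message about waiting for the floor.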
@@ -7898,7 +7928,10 @@ fn should_auto_compact_before_send(app: &App) -> bool {
         return false;
     }
     context_usage_snapshot(app)
-        .map(|(_, _, pct)| pct >= CONTEXT_CRITICAL_THRESHOLD_PERCENT)
+        .map(|(used, _, pct)| {
+            used >= MINIMUM_AUTO_COMPACTION_TOKENS as i64
+                && pct >= app.auto_compact_threshold_percent.clamp(10.0, 100.0)
+        })
         .unwrap_or(false)
 }

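The new predicate combines two independent conditions, so the effective trigger point is whichever of them is reached later. A minimal sketch over plain numbers, assuming the floor is 500,000 tokens (the constant's value is not shown in the diff) and using the hypothetical names `should_compact` and `effective_trigger_tokens`:

```rust
/// Assumed value of `MINIMUM_AUTO_COMPACTION_TOKENS`; the diff only
/// describes the floor as "500K".
const FLOOR_TOKENS: i64 = 500_000;

/// Same shape as the predicate in the hunk above, without the `App` state.
fn should_compact(enabled: bool, used: i64, percent: f64, threshold: f64) -> bool {
    enabled && used >= FLOOR_TOKENS && percent >= threshold.clamp(10.0, 100.0)
}

/// Smallest token count at which both the percentage threshold and the
/// floor are satisfied for a given context window.
fn effective_trigger_tokens(window: i64, threshold: f64) -> i64 {
    let by_percent = (window as f64 * threshold.clamp(10.0, 100.0) / 100.0).ceil() as i64;
    by_percent.max(FLOOR_TOKENS)
}
```

With a 1M-token window, the default threshold of 70 gives a trigger at 700,000 tokens, whereas a threshold of 10 still waits until 500,000 tokens because the floor dominates. This matches the second test case in the diff, where the percentage threshold is crossed but the floor blocks compaction.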
@@ -3347,19 +3347,31 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() {
 #[test]
 fn should_auto_compact_before_send_respects_threshold_and_setting() {
     let mut app = create_test_app();
-    let big_buffer = vec![Message {
-        role: "user".to_string(),
-        content: vec![ContentBlock::Text {
-            text: "context ".repeat(400_000),
-            cache_control: None,
-        }],
-    }];
+    let messages_for_repeats = |repeats: usize| {
+        vec![Message {
+            role: "user".to_string(),
+            content: vec![ContentBlock::Text {
+                text: "context ".repeat(repeats),
+                cache_control: None,
+            }],
+        }]
+    };

     // High estimated context + auto_compact ON → auto-compact triggers.
-    app.api_messages = big_buffer.clone();
+    app.api_messages = messages_for_repeats(240_000);
     app.auto_compact = true;
+    app.auto_compact_threshold_percent = 70.0;
     assert!(should_auto_compact_before_send(&app));

+    let (_, _, high_percent) =
+        context_usage_snapshot(&app).expect("high context snapshot should be available");
+    assert!(
+        (70.0..90.0).contains(&high_percent),
+        "test fixture should sit between default and high custom thresholds; got {high_percent:.2}%"
+    );
+    app.auto_compact_threshold_percent = 90.0;
+    assert!(!should_auto_compact_before_send(&app));
+
     // Same high context but auto_compact OFF → never triggers.
     app.auto_compact = false;
     assert!(!should_auto_compact_before_send(&app));
@@ -3369,16 +3381,39 @@ fn should_auto_compact_before_send_respects_threshold_and_setting() {
     // #115 fix: the estimate is the primary signal, not the engine's
     // turn-cumulative reported value (which used to rule the displayed
     // % and could spuriously trigger / suppress auto-compact).
+    app.api_messages = messages_for_repeats(80_000);
+    app.auto_compact = true;
+    app.auto_compact_threshold_percent = 10.0;
+    app.session.last_prompt_tokens = Some(10_000);
+    let (used, _, percent) =
+        context_usage_snapshot(&app).expect("floor context snapshot should be available");
+    assert!(
+        used < crate::compaction::MINIMUM_AUTO_COMPACTION_TOKENS as i64 && percent >= 10.0,
+        "test fixture should cross percent threshold but stay below the 500K floor; used={used} percent={percent:.2}"
+    );
+    assert!(!should_auto_compact_before_send(&app));
+}
+
+#[test]
+fn context_pressure_warning_reflects_auto_compact_threshold_state() {
+    let mut app = create_test_app();
     app.api_messages = vec![Message {
         role: "user".to_string(),
         content: vec![ContentBlock::Text {
-            text: "small".to_string(),
+            text: "context ".repeat(240_000),
             cache_control: None,
         }],
     }];
     app.auto_compact = true;
-    app.session.last_prompt_tokens = Some(10_000);
-    assert!(!should_auto_compact_before_send(&app));
+    app.auto_compact_threshold_percent = 70.0;
+
+    maybe_warn_context_pressure(&mut app);
+
+    let status = app.status_message.expect("context warning");
+    assert!(
+        status.contains("Auto-compaction will run before the next send."),
+        "unexpected status: {status}"
+    );
 }

 // ============================================================================
@@ -482,11 +482,13 @@ codewhale also stores user preferences in:
 - `~/.config/deepseek/settings.toml`

 Notable settings include `auto_compact` (default `false`), which opts into
-replacement-style summarization only near the active model limit. The default
-V4 path preserves the stable message prefix for cache reuse; use manual
-`/compact` or enable `auto_compact` only when you explicitly want automatic
-replacement compaction. You can inspect or update these from the TUI with
-`/settings` and `/config` (interactive editor).
+replacement-style summarization before the active model limit. The trigger
+defaults to `auto_compact_threshold_percent = 70`, but the 500K-token floor
+still blocks early compaction. The default V4 path preserves the stable message
+prefix for cache reuse; use manual `/compact` / Ctrl+L or enable
+`auto_compact` only when you explicitly want automatic replacement compaction.
+You can inspect or update these from the TUI with `/settings` and `/config`
+(interactive editor).

 Common settings keys:

@@ -497,6 +499,8 @@ Common settings keys:
   community presets apply across the TUI. Aliases such as `whale`, `mono`,
   `black-white`, `tokyonight`, and `gruvbox` are accepted.
 - `auto_compact` (on/off, default off)
+- `auto_compact_threshold_percent` (10-100, default `70`): pre-send
+  auto-compaction threshold used only when `auto_compact` is enabled.
 - `paste_burst_detection` (on/off, default on): fallback rapid-key paste
   detection for terminals that do not emit bracketed-paste events. This is
   independent of terminal bracketed-paste mode.