feat(tui): add configurable auto-compact threshold

Refs #1722

Preserves auto_compact as opt-in, adds the saved threshold setting, keeps the 500K hard floor, and wires Ctrl+L as a manual compaction shortcut for context-pressure recovery.

Harvested from PR #1723 by @aboimpinto

Co-authored-by: Paulo Aboim Pinto <aboimpinto@gmail.com>
Hunter Bown
2026-06-01 14:09:49 -07:00
parent 2ca2927657
commit bb64018a15
10 changed files with 150 additions and 30 deletions
+4 -2
@@ -481,8 +481,10 @@ exponential_base = 2.0
# Context Compaction
# ─────────────────────────────────────────────────────────────────────────────────
# Auto-compaction is a saved UI setting edited with `/config` (`auto_compact`).
-# There is no config-file `[compaction]` table yet; detailed thresholds are
-# chosen by the TUI from the active model/context budget.
+# The optional saved threshold setting is `auto_compact_threshold_percent`
+# (default 70, still gated by the 500K-token floor). There is no config-file
+# `[compaction]` table yet; runtime compaction budgets are chosen by the TUI
+# from the active model/context window.
# Append-only Flash seams are experimental and opt-in while the v0.7.5
# context/cache audit validates prefix-cache behavior.
+1 -1
@@ -995,7 +995,7 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
1. Use `/compact` to summarize earlier context and free up space\n\
2. The system will preserve important information (files you're working on, recent messages, tool results)\n\
3. After compaction, you'll see a summary of what was discussed and can continue seamlessly\n\n\
-If you notice context is getting long (>60% during sustained work), proactively suggest using `/compact` to the user.\n\n\
+If you notice context is getting long (>60% during sustained work), proactively suggest using `/compact` or Ctrl+L to the user. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed.\n\n\
### Prompt-cache awareness\n\n\
DeepSeek caches the longest *byte-stable prefix* of every request and charges roughly 100× less for cache-hit tokens than miss tokens. The system prompt above is layered most-static-first specifically so the prefix stays stable turn-over-turn. To keep cache hits high:\n\
- **Working set location:** the current repo working set is stored on new user messages inside a `<turn_meta>` block. Treat it as high-priority turn metadata, not as a stable system-prompt section.\n\
+1 -1
@@ -204,7 +204,7 @@ For exact counts or structured aggregates, compute them directly in Python insid
## Context Management
-You have a 1M-token context window. During long coding sessions, suggest `/compact` when usage approaches ~60% or when the app marks context pressure as high. It summarizes earlier turns so you can keep working without losing thread.
+You have a 1M-token context window. During long coding sessions, suggest `/compact` or Ctrl+L when usage approaches ~60% or when the app marks context pressure as high. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed. Compaction summarizes earlier turns so you can keep working without losing thread.
Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`) before final answers. These are invisible to the user but count against context. Cost/token estimates are approximate; treat them as a rough guide.
+1 -1
@@ -31,7 +31,7 @@ RLM works by keeping the long input and intermediate values as symbolic REPL sta
The Python helpers visible inside the REPL (`sub_query`, `sub_query_batch`, `sub_query_map`, `sub_rlm`, `finalize`, and related context helpers) are NOT separately-callable tools — they are functions the sub-agent uses inside its Python code.
## Context
-You have a 1M-token context window. During long coding sessions, suggest `/compact` when usage approaches ~60% or when the app marks context pressure as high. It summarizes earlier turns so you can keep working without losing thread.
+You have a 1M-token context window. During long coding sessions, suggest `/compact` or Ctrl+L when usage approaches ~60% or when the app marks context pressure as high. If auto_compact is enabled, the engine can compact before the next send once the configured threshold is crossed. Compaction summarizes earlier turns so you can keep working without losing thread.
Model notes: DeepSeek V4 models emit *thinking tokens* (`ContentBlock::Thinking`) before final answers. These are invisible to the user but count against context. Cost/token estimates are approximate; treat them as a rough guide.
+1 -1
@@ -26,6 +26,6 @@ Don't sequence approvals one at a time — the user wants context, not interrupt
Long sessions accumulate context. To stay fast:
- Open sub-agent sessions for independent work instead of doing everything sequentially
- Batch reads/searches/git-inspections into parallel tool calls
-- Suggest `/compact` when context nears 60% during sustained work — the compaction relay preserves open blockers
+- Suggest `/compact` or Ctrl+L when context nears 60% during sustained work — the compaction relay preserves open blockers
- Use `note` for decisions you'll need across compaction boundaries
- A 3-turn session that fans out to sub-agents finishes faster AND stays responsive longer than a 15-turn sequential grind
+43
@@ -171,6 +171,9 @@ impl TuiPrefs {
pub struct Settings {
/// Auto-compact conversations when they approach the model limit.
pub auto_compact: bool,
+/// Context-window percentage that triggers pre-send auto-compaction when
+/// `auto_compact` is enabled. The hard token floor still applies.
+pub auto_compact_threshold_percent: f64,
/// Reduce status noise and collapse details more aggressively
pub calm_mode: bool,
/// Streaming pacing mode. `true` pins the chunker to one-character-per-
@@ -299,6 +302,7 @@ impl Default for Settings {
// available for users / agents that decide compaction is
// worth the cache hit on their workload (#664).
auto_compact: false,
+auto_compact_threshold_percent: 70.0,
calm_mode: false,
low_motion: false,
fancy_animations: true,
@@ -497,6 +501,10 @@ impl Settings {
"auto_compact" | "compact" => {
self.auto_compact = parse_bool(value)?;
}
+"auto_compact_threshold" | "auto_compact_threshold_percent" => {
+self.auto_compact_threshold_percent =
+parse_percent_setting("auto_compact_threshold_percent", value)?;
+}
"calm_mode" | "calm" => {
self.calm_mode = parse_bool(value)?;
}
@@ -701,6 +709,10 @@ impl Settings {
lines.push(tr(locale, MessageId::SettingsTitle).to_string());
lines.push("─────────────────────────────".to_string());
lines.push(format!(" auto_compact: {}", self.auto_compact));
+lines.push(format!(
+" auto_compact_pct: {:.0}",
+self.auto_compact_threshold_percent
+));
lines.push(format!(" calm_mode: {}", self.calm_mode));
lines.push(format!(" low_motion: {}", self.low_motion));
lines.push(format!(" fancy_animations: {}", self.fancy_animations));
@@ -768,6 +780,10 @@ impl Settings {
"auto_compact",
"Auto-compact near the hard context limit: on/off (default off)",
),
+(
+"auto_compact_threshold_percent",
+"Auto-compact trigger threshold percent when auto_compact is on: 10-100 (default 70)",
+),
("calm_mode", "Calmer UI defaults: on/off"),
(
"low_motion",
@@ -932,6 +948,21 @@ fn parse_usize_setting(key: &str, value: &str) -> Result<usize> {
})
}
+fn parse_percent_setting(key: &str, value: &str) -> Result<f64> {
+let trimmed = value.trim().trim_end_matches('%').trim();
+let percent = trimmed.parse::<f64>().map_err(|_| {
+anyhow::anyhow!(
+"Failed to update setting: invalid {key} '{value}'. Expected a number from 10 to 100."
+)
+})?;
+if !(10.0..=100.0).contains(&percent) {
+anyhow::bail!(
+"Failed to update setting: invalid {key} '{value}'. Expected a number from 10 to 100."
+);
+}
+Ok(percent)
+}
fn normalize_mode(value: &str) -> &str {
match value.trim().to_ascii_lowercase().as_str() {
"edit" => "agent",
@@ -1103,6 +1134,7 @@ mod tests {
// flipped so the cache-friendly path is the one users get
// without configuring anything (#664).
assert!(!settings.auto_compact);
+assert_eq!(settings.auto_compact_threshold_percent, 70.0);
}
#[test]
@@ -1114,6 +1146,17 @@ mod tests {
assert!(!settings.auto_compact);
}
+#[test]
+fn auto_compact_threshold_is_validated() {
+let mut settings = Settings::default();
+settings
+.set("auto_compact_threshold", "65%")
+.expect("threshold");
+assert_eq!(settings.auto_compact_threshold_percent, 65.0);
+assert!(settings.set("auto_compact_threshold", "9").is_err());
+assert!(settings.set("auto_compact_threshold", "101").is_err());
+}
#[test]
fn default_settings_show_footer_water_strip() {
let settings = Settings::default();
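The `parse_percent_setting` validation added in this file can be tried in isolation. The following is a minimal std-only sketch, not the commit's code: it assumes a `String` error in place of `anyhow`, and the name `parse_percent` is hypothetical.

```rust
/// Parse a percent setting such as "65", "65%", or " 65 % ".
/// Sketch only: returns a String error instead of the crate's anyhow-based Result.
fn parse_percent(key: &str, value: &str) -> Result<f64, String> {
    // Strip surrounding whitespace and any trailing '%' before parsing.
    let trimmed = value.trim().trim_end_matches('%').trim();
    let err = || format!("invalid {key} '{value}'. Expected a number from 10 to 100.");
    let percent = trimmed.parse::<f64>().map_err(|_| err())?;
    // "NaN" and "inf" parse successfully as f64 but fall outside the range check.
    if !(10.0..=100.0).contains(&percent) {
        return Err(err());
    }
    Ok(percent)
}
```

Under this sketch, `"65%"` and `" 65 % "` both parse to `65.0`, while `"9"`, `"101"`, and `"NaN"` are rejected.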
+3
@@ -1220,6 +1220,7 @@ pub struct App {
#[allow(dead_code)]
pub system_prompt: Option<SystemPrompt>,
pub auto_compact: bool,
+pub auto_compact_threshold_percent: f64,
pub calm_mode: bool,
pub low_motion: bool,
/// Pending #61 (animated working strip). Set from config but not read
@@ -1748,6 +1749,7 @@ impl App {
crate::config::active_provider_uses_env_only_api_key(&effective_auth_config);
let was_onboarded = crate::tui::onboarding::is_onboarded();
let auto_compact = settings.auto_compact;
+let auto_compact_threshold_percent = settings.auto_compact_threshold_percent;
let calm_mode = settings.calm_mode;
let low_motion = settings.low_motion;
let fancy_animations = settings.fancy_animations;
@@ -1946,6 +1948,7 @@ impl App {
bracketed_paste_seen: false,
system_prompt: None,
auto_compact,
+auto_compact_threshold_percent,
calm_mode,
low_motion,
fancy_animations,
+41 -8
@@ -40,7 +40,7 @@ use crate::client::{
inspect_prompt_for_request,
};
use crate::commands;
-use crate::compaction::estimate_input_tokens_conservative;
+use crate::compaction::{MINIMUM_AUTO_COMPACTION_TOKENS, estimate_input_tokens_conservative};
use crate::config::{
ApiProvider, Config, DEFAULT_NVIDIA_NIM_BASE_URL, ProviderConfig, ProvidersConfig, StatusItem,
UpdateConfig, save_provider_auth_mode_for,
@@ -145,6 +145,7 @@ const MIN_CHAT_HEIGHT: u16 = 3;
const MIN_COMPOSER_HEIGHT: u16 = 2;
const CONTEXT_WARNING_THRESHOLD_PERCENT: f64 = 85.0;
const CONTEXT_CRITICAL_THRESHOLD_PERCENT: f64 = 95.0;
+const CONTEXT_SUGGEST_COMPACT_THRESHOLD_PERCENT: f64 = 60.0;
const UI_IDLE_POLL_MS: u64 = 48;
const UI_ACTIVE_POLL_MS: u64 = 24;
const WEB_CONFIG_POLL_MS: u64 = 16;
@@ -2934,6 +2935,22 @@ async fn run_event_loop(
continue;
}
+if matches!(key.code, KeyCode::Char('l') | KeyCode::Char('L'))
+&& key.modifiers.contains(KeyModifiers::CONTROL)
+&& app.view_stack.is_empty()
+{
+app.status_message = Some(if app.is_compacting {
+"Context compaction already in progress...".to_string()
+} else {
+"Compacting context (Ctrl+L)...".to_string()
+});
+if !app.is_compacting {
+let _ = engine_handle.send(Op::CompactContext).await;
+}
+app.needs_redraw = true;
+continue;
+}
if matches!(key.code, KeyCode::Char('b') | KeyCode::Char('B'))
&& key.modifiers.contains(KeyModifiers::CONTROL)
&& app.view_stack.is_empty()
@@ -4634,7 +4651,8 @@ async fn dispatch_user_message(
});
maybe_warn_context_pressure(app);
if should_auto_compact_before_send(app) {
-app.status_message = Some("Context critical; compacting before send...".to_string());
+app.status_message =
+Some("Context threshold reached; compacting before send...".to_string());
let _ = engine_handle.send(Op::CompactContext).await;
}
app.session.last_prompt_tokens = None;
@@ -7869,14 +7887,21 @@ fn maybe_warn_context_pressure(app: &mut App) {
return;
};
-if percent < CONTEXT_WARNING_THRESHOLD_PERCENT {
+let configured_threshold = app.auto_compact_threshold_percent.clamp(10.0, 100.0);
+let warning_threshold = CONTEXT_SUGGEST_COMPACT_THRESHOLD_PERCENT.min(configured_threshold);
+if percent < warning_threshold {
return;
}
-let recommendation = if app.auto_compact {
-"Auto-compaction is enabled."
+let below_auto_floor = used < MINIMUM_AUTO_COMPACTION_TOKENS as i64;
+let recommendation = if !app.auto_compact {
+"Consider enabling auto_compact or use /compact."
+} else if below_auto_floor {
+"Auto-compaction is enabled but waits for the 500K token floor."
+} else if percent >= configured_threshold {
+"Auto-compaction will run before the next send."
} else {
-"Consider /compact or /clear."
+"Auto-compaction is enabled."
};
if percent >= CONTEXT_CRITICAL_THRESHOLD_PERCENT {
@@ -7887,8 +7912,13 @@ fn maybe_warn_context_pressure(app: &mut App) {
}
if app.status_message.is_none() {
+let status_prefix = if percent >= CONTEXT_WARNING_THRESHOLD_PERCENT {
+"Context high"
+} else {
+"Context building"
+};
app.status_message = Some(format!(
-"Context high: {percent:.0}% ({used}/{max} tokens). {recommendation}"
+"{status_prefix}: {percent:.0}% ({used}/{max} tokens). {recommendation}"
));
}
}
@@ -7898,7 +7928,10 @@ fn should_auto_compact_before_send(app: &App) -> bool {
return false;
}
context_usage_snapshot(app)
-.map(|(_, _, pct)| pct >= CONTEXT_CRITICAL_THRESHOLD_PERCENT)
+.map(|(used, _, pct)| {
+used >= MINIMUM_AUTO_COMPACTION_TOKENS as i64
+&& pct >= app.auto_compact_threshold_percent.clamp(10.0, 100.0)
+})
.unwrap_or(false)
}
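The pre-send gate and the status recommendation in this file share the same three inputs: the opt-in flag, the token floor, and the clamped percent threshold. A self-contained sketch of that decision order follows. The `CompactState` struct and the free functions are hypothetical stand-ins for the `App` fields, and the floor value of 500,000 is an assumption taken from the "500K" wording in this commit rather than from `crate::compaction`.

```rust
/// Hard floor for auto-compaction, in tokens.
/// Assumed to be 500_000 from the "500K" wording; the real constant is not shown here.
const MINIMUM_AUTO_COMPACTION_TOKENS: i64 = 500_000;

/// Hypothetical stand-in for the `App` fields the gate reads.
struct CompactState {
    auto_compact: bool,
    threshold_percent: f64,
    used_tokens: i64,
    used_percent: f64,
}

/// Pre-send gate: the opt-in flag, the token floor, and the clamped
/// percent threshold must all pass.
fn should_auto_compact(s: &CompactState) -> bool {
    s.auto_compact
        && s.used_tokens >= MINIMUM_AUTO_COMPACTION_TOKENS
        && s.used_percent >= s.threshold_percent.clamp(10.0, 100.0)
}

/// Recommendation text, checked in the same order as the diff above.
fn recommendation(s: &CompactState) -> &'static str {
    if !s.auto_compact {
        "Consider enabling auto_compact or use /compact."
    } else if s.used_tokens < MINIMUM_AUTO_COMPACTION_TOKENS {
        "Auto-compaction is enabled but waits for the 500K token floor."
    } else if s.used_percent >= s.threshold_percent.clamp(10.0, 100.0) {
        "Auto-compaction will run before the next send."
    } else {
        "Auto-compaction is enabled."
    }
}
```

In this sketch the "will run before the next send" message is produced exactly when `should_auto_compact` returns `true`, which is the property the new `context_pressure_warning_reflects_auto_compact_threshold_state` test appears to rely on.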
+46 -11
@@ -3347,19 +3347,31 @@ fn context_usage_snapshot_prefers_live_estimate_while_loading() {
#[test]
fn should_auto_compact_before_send_respects_threshold_and_setting() {
let mut app = create_test_app();
-let big_buffer = vec![Message {
-role: "user".to_string(),
-content: vec![ContentBlock::Text {
-text: "context ".repeat(400_000),
-cache_control: None,
-}],
-}];
+let messages_for_repeats = |repeats: usize| {
+vec![Message {
+role: "user".to_string(),
+content: vec![ContentBlock::Text {
+text: "context ".repeat(repeats),
+cache_control: None,
+}],
+}]
+};
// High estimated context + auto_compact ON → auto-compact triggers.
-app.api_messages = big_buffer.clone();
+app.api_messages = messages_for_repeats(240_000);
app.auto_compact = true;
+app.auto_compact_threshold_percent = 70.0;
assert!(should_auto_compact_before_send(&app));
+let (_, _, high_percent) =
+context_usage_snapshot(&app).expect("high context snapshot should be available");
+assert!(
+(70.0..90.0).contains(&high_percent),
+"test fixture should sit between default and high custom thresholds; got {high_percent:.2}%"
+);
+app.auto_compact_threshold_percent = 90.0;
+assert!(!should_auto_compact_before_send(&app));
// Same high context but auto_compact OFF → never triggers.
app.auto_compact = false;
assert!(!should_auto_compact_before_send(&app));
@@ -3369,16 +3381,39 @@ fn should_auto_compact_before_send_respects_threshold_and_setting() {
// #115 fix: the estimate is the primary signal, not the engine's
// turn-cumulative reported value (which used to rule the displayed
// % and could spuriously trigger / suppress auto-compact).
+app.api_messages = messages_for_repeats(80_000);
+app.auto_compact = true;
+app.auto_compact_threshold_percent = 10.0;
+app.session.last_prompt_tokens = Some(10_000);
+let (used, _, percent) =
+context_usage_snapshot(&app).expect("floor context snapshot should be available");
+assert!(
+used < crate::compaction::MINIMUM_AUTO_COMPACTION_TOKENS as i64 && percent >= 10.0,
+"test fixture should cross percent threshold but stay below the 500K floor; used={used} percent={percent:.2}"
+);
+assert!(!should_auto_compact_before_send(&app));
+}
+#[test]
+fn context_pressure_warning_reflects_auto_compact_threshold_state() {
+let mut app = create_test_app();
app.api_messages = vec![Message {
role: "user".to_string(),
content: vec![ContentBlock::Text {
-text: "small".to_string(),
+text: "context ".repeat(240_000),
cache_control: None,
}],
}];
app.auto_compact = true;
app.session.last_prompt_tokens = Some(10_000);
-assert!(!should_auto_compact_before_send(&app));
+app.auto_compact_threshold_percent = 70.0;
+maybe_warn_context_pressure(&mut app);
+let status = app.status_message.expect("context warning");
+assert!(
+status.contains("Auto-compaction will run before the next send."),
+"unexpected status: {status}"
+);
}
// ============================================================================
+9 -5
@@ -482,11 +482,13 @@ codewhale also stores user preferences in:
- `~/.config/deepseek/settings.toml`
Notable settings include `auto_compact` (default `false`), which opts into
-replacement-style summarization only near the active model limit. The default
-V4 path preserves the stable message prefix for cache reuse; use manual
-`/compact` or enable `auto_compact` only when you explicitly want automatic
-replacement compaction. You can inspect or update these from the TUI with
-`/settings` and `/config` (interactive editor).
+replacement-style summarization before the active model limit. The trigger
+defaults to `auto_compact_threshold_percent = 70`, but the 500K-token floor
+still blocks early compaction. The default V4 path preserves the stable message
+prefix for cache reuse; use manual `/compact` / Ctrl+L or enable
+`auto_compact` only when you explicitly want automatic replacement compaction.
+You can inspect or update these from the TUI with `/settings` and `/config`
+(interactive editor).
Common settings keys:
@@ -497,6 +499,8 @@ Common settings keys:
community presets apply across the TUI. Aliases such as `whale`, `mono`,
`black-white`, `tokyonight`, and `gruvbox` are accepted.
- `auto_compact` (on/off, default off)
+- `auto_compact_threshold_percent` (10-100, default `70`): pre-send
+auto-compaction threshold used only when `auto_compact` is enabled.
- `paste_burst_detection` (on/off, default on): fallback rapid-key paste
detection for terminals that do not emit bracketed-paste events. This is
independent of terminal bracketed-paste mode.
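
To see how `auto_compact_threshold_percent` interacts with the 500K-token floor described above, the smallest token count that satisfies both conditions can be worked out directly. This is a rough sketch under two assumptions that the diff does not confirm: the floor is exactly 500,000 tokens, and the usage percentage is `used / context_window * 100`. The function name `effective_trigger_tokens` is hypothetical.

```rust
/// Assumed value of the hard floor ("500K" in the docs above).
const MINIMUM_AUTO_COMPACTION_TOKENS: u64 = 500_000;

/// Smallest token count at which both the percent threshold and the floor pass.
/// Assumes usage percent is computed as used / context_window * 100.
fn effective_trigger_tokens(context_window: u64, threshold_percent: f64) -> u64 {
    // Mirror the 10-100 clamp applied to the saved setting.
    let pct = threshold_percent.clamp(10.0, 100.0);
    let by_percent = (pct * context_window as f64 / 100.0).ceil() as u64;
    by_percent.max(MINIMUM_AUTO_COMPACTION_TOKENS)
}
```

With the 1M-token window mentioned in the prompts, the default of 70 gives a trigger at 700,000 tokens, so the percentage governs. Any setting of 50 or lower gives 500,000, so the floor governs instead.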