codewhale/crates/tui
Hunter Bown fa99fb5124 fix(engine): 256K output budget + capacity controller off by default
User feedback after v0.6.2 dogfooding: "we'd be better off simplifying and
removing guardrails." Two changes that meaningfully shrink the surface:

1. TURN_MAX_OUTPUT_TOKENS: 32_768 → 262_144 (256K).
   V4 thinking models can produce tens of thousands of reasoning tokens
   on hard prompts before the visible reply, and DeepSeek V4 has a 1M
   context window. 32K was tight for that workload (showed up as the
   model "stopping mid-response" once reasoning exhausted the budget).
   256K is generous enough that the per-turn ceiling effectively never
   bites in normal use.

2. CapacityControllerConfig::enabled: true → false.
   The controller's main intervention, `TargetedContextRefresh`, runs
   `compact_messages_safe`, which rewrites the live conversation. To the
   user this looks identical to the agent "restarting" mid-turn. The failure
   mode it protects against (context overflow) is rare in practice and
   self-correcting (the model surfaces a clear error). Power users on
   V4 do not need the guardrail; users who do can re-enable it via
   `capacity.enabled = true` in `~/.deepseek/config.toml`.
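
   A minimal sketch of what the new default implies, not the crate's actual
   code. Only `CapacityControllerConfig`, its `enabled` field, and
   `TargetedContextRefresh` come from this commit; the `cooldown_turns`
   field, the `Decision` enum, the `decide` function, and the 0.9 pressure
   threshold are hypothetical names and values chosen for illustration.

```rust
/// Illustrative stand-in for the controller config.
/// Only `enabled` is named in the commit; `cooldown_turns` is hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct CapacityControllerConfig {
    enabled: bool,
    cooldown_turns: u32,
}

impl Default for CapacityControllerConfig {
    fn default() -> Self {
        // After this commit the controller is opt-in.
        Self { enabled: false, cooldown_turns: 3 }
    }
}

/// Hypothetical decision type; `TargetedContextRefresh` is the
/// intervention named in the commit message.
#[derive(Debug, PartialEq)]
enum Decision {
    NoAction,
    TargetedContextRefresh,
}

/// Hypothetical decision function. `context_pressure` is the fraction of
/// the context window in use (0.0 to 1.0).
fn decide(
    cfg: &CapacityControllerConfig,
    context_pressure: f64,
    turns_since_last_action: u32,
) -> Decision {
    // A disabled controller short-circuits before any other check,
    // including the cooldown.
    if !cfg.enabled {
        return Decision::NoAction;
    }
    // Cooldown: do not repeat an intervention too soon.
    if turns_since_last_action < cfg.cooldown_turns {
        return Decision::NoAction;
    }
    // Assumed threshold for illustration only.
    if context_pressure >= 0.9 {
        Decision::TargetedContextRefresh
    } else {
        Decision::NoAction
    }
}
```

   Under these assumptions, a default config never rewrites the
   conversation regardless of context pressure, and a test that wants to
   exercise the cooldown has to set `enabled: true` first.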

Tests:
- context_budget_reserves_output_and_headroom: switched fixture model
  to deepseek-v4-pro (1M context) so the 256K reservation doesn't
  saturate the budget to zero.
- cooldown_blocks_repeated_action: explicitly enables the controller
  (the cooldown logic short-circuits when disabled).
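
The fixture change in the first test follows from simple arithmetic, sketched
below. This is an illustration under stated assumptions, not the crate's
implementation: `TURN_MAX_OUTPUT_TOKENS` and its new value come from this
commit, while `HEADROOM_TOKENS`, `input_budget`, the 131_072-token window
for a smaller model, and treating "1M" as 1_000_000 tokens are all
hypothetical.

```rust
/// New per-turn output ceiling from this commit (256K).
const TURN_MAX_OUTPUT_TOKENS: u64 = 262_144;

/// Hypothetical safety headroom; the real value is not given in the commit.
const HEADROOM_TOKENS: u64 = 4_096;

/// Hypothetical helper: tokens left for input after reserving the output
/// budget and headroom. Saturating subtraction clamps the result at zero
/// instead of underflowing.
fn input_budget(context_window: u64) -> u64 {
    context_window
        .saturating_sub(TURN_MAX_OUTPUT_TOKENS)
        .saturating_sub(HEADROOM_TOKENS)
}
```

With an assumed 131_072-token window, the 256K reservation alone exceeds the
window and the budget clamps to zero, so the test has nothing to assert on.
With an assumed 1_000_000-token window, 733_760 tokens remain for input.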

`cargo clippy --workspace -- -D warnings` is clean; the full test suite is green
(990 + adjacent crate tests).
2026-04-26 15:51:58 -05:00