codewhale/crates
Hunter Bown 0d92eb847b fix(engine): raise output-token + step ceilings (mid-response cutoff)
User repro: V4 thinking on hard prompts (~107s of thinking) randomly
"stops mid-response", more often when starting in Agent mode and
switching to YOLO. Two ceilings were too tight:

1. TURN_MAX_OUTPUT_TOKENS: 4096 → 32768
   `reasoning_content` from V4 thinking can easily exceed 4K tokens on
   hard problems. Once the per-turn output budget is exhausted, the
   API closes the SSE stream with `finish_reason: "length"` and the
   visible reply ends up empty, which surfaces as the assistant
   "stopping randomly". 32K leaves comfortable headroom for thinking
   plus the visible reply on every realistic turn while staying well
   below DeepSeek V4's 1M-context output ceiling.

2. max_steps: 100 → u32::MAX (effectively unlimited)
   Long multi-step plans (wide refactors, sub-agent orchestration) were
   hitting the 100-step ceiling, which presented as the agent "giving
   up mid-task". V4's 1M context window means there's no good reason to
   cap steps administratively. Users can still interrupt with Ctrl+C /
   Esc; a turn naturally ends when the model stops emitting tool calls.
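
For readers without the diff at hand, the two ceilings and the turn-ending
rule described above can be sketched roughly as follows. This is an
illustration only, not the engine's code: `StepOutcome`, `classify`,
`run_turn`, and the `MAX_STEPS` constant name are hypothetical, and the
`"stop"` / `"tool_calls"` finish reasons are assumed OpenAI-style values.
Only `TURN_MAX_OUTPUT_TOKENS`, `max_steps`, and `finish_reason: "length"`
come from this commit.

```rust
/// Per-turn output budget from this commit (previously 4096).
pub const TURN_MAX_OUTPUT_TOKENS: u32 = 32_768;

/// Step ceiling from this commit (previously 100); effectively unlimited.
/// The constant name is hypothetical; the commit calls the setting `max_steps`.
pub const MAX_STEPS: u32 = u32::MAX;

/// How one model response ended (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum StepOutcome {
    /// `finish_reason: "length"`: output budget exhausted, reply may be empty.
    Truncated,
    /// The model asked for tool calls, so the turn continues.
    ToolCalls,
    /// The model stopped without tool calls, so the turn ends naturally.
    Done,
    /// The administrative step ceiling was reached before the model finished.
    StepLimit,
}

/// Classify a single response from its finish reason and tool-call count.
pub fn classify(finish_reason: &str, tool_calls: usize) -> StepOutcome {
    if finish_reason == "length" {
        StepOutcome::Truncated
    } else if tool_calls > 0 {
        StepOutcome::ToolCalls
    } else {
        StepOutcome::Done
    }
}

/// Drive a turn over a scripted list of (finish_reason, tool_call_count)
/// responses. Returns the number of steps taken and the final outcome.
pub fn run_turn(responses: &[(&str, usize)], max_steps: u32) -> (u32, StepOutcome) {
    let mut steps = 0u32;
    for &(reason, calls) in responses {
        if steps >= max_steps {
            return (steps, StepOutcome::StepLimit);
        }
        steps += 1;
        match classify(reason, calls) {
            StepOutcome::ToolCalls => {} // keep looping
            other => return (steps, other),
        }
    }
    // The script ran out while the model was still requesting tools.
    (steps, StepOutcome::ToolCalls)
}
```

With a low `max_steps`, a three-step script is cut off early and reports
`StepLimit`; with `MAX_STEPS` it runs until the model stops emitting tool
calls, which is the behavior this change is after.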

All 54 turn tests pass; full workspace clippy + fmt remain clean.
2026-04-26 15:38:59 -05:00