fix(cli): honor config.toml reasoning_effort on non-auto exec routes (#1511)

`resolve_cli_auto_route` hard-coded `reasoning_effort: None` whenever
`--model` was not `auto`, silently dropping the value the user had set
in `~/.deepseek/config.toml` on every non-auto-route exec/one-shot
call.

For vllm + Qwen3 users with `reasoning_effort = "off"`, thinking was
therefore never disabled. The model emitted a long reasoning trace for
every prompt and SSE idle timeouts (`did not receive response headers
after 45s`) fired on any non-trivial prompt. After this fix, the same
prompts return in ~1.5s.

Route the configured value through `ReasoningEffort::from_setting`, the
same parser the TUI uses elsewhere for this field. Auto-route behaviour
(`--model auto`) is unchanged.
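
The mapping described above can be sketched in isolation. This is a
minimal illustration, not the project's code: the `ReasoningEffort`
variants, the parsing rules inside `from_setting`, and the
`non_auto_route` helper are all hypothetical stand-ins, and the real
`crate::tui::app::ReasoningEffort::from_setting` may accept different
values or fall back differently. Only the `CliAutoRoute` field names and
the `Option::map` shape are taken from the diff.

```rust
/// Hypothetical stand-in for `crate::tui::app::ReasoningEffort`.
/// The real enum may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReasoningEffort {
    Off,
    Low,
    Medium,
    High,
}

impl ReasoningEffort {
    /// Assumed parsing rules for illustration only: unknown strings
    /// fall back to `Medium`. The real parser may behave differently.
    pub fn from_setting(value: &str) -> Self {
        match value.trim().to_ascii_lowercase().as_str() {
            "off" => Self::Off,
            "low" => Self::Low,
            "high" => Self::High,
            _ => Self::Medium,
        }
    }
}

/// Same field names as the struct literal in the diff.
#[derive(Debug, PartialEq, Eq)]
pub struct CliAutoRoute {
    pub model: String,
    pub reasoning_effort: Option<ReasoningEffort>,
    pub auto_model: bool,
}

/// Hypothetical helper mirroring the non-auto branch after the fix:
/// a configured value is parsed, an absent value stays `None`.
pub fn non_auto_route(configured: Option<&str>, model: &str) -> CliAutoRoute {
    CliAutoRoute {
        model: model.to_string(),
        reasoning_effort: configured.map(ReasoningEffort::from_setting),
        auto_model: false,
    }
}
```

Under these assumptions, `non_auto_route(Some("off"), "qwen3")` carries
`Some(ReasoningEffort::Off)`, while a config without the key still
yields `None`, so users who never set the field see no change.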

Verified by capturing the outgoing request body with `nc` before and
after; `chat_template_kwargs.enable_thinking=false` now appears in the
body on vllm exec runs.

Co-authored-by: hexin <he.xin@h3c.com>
hexin
2026-05-13 04:37:04 +08:00
committed by GitHub
parent 9fb3d5d636
commit ec527b6a2b
+9 -1
@@ -4269,9 +4269,17 @@ async fn resolve_cli_auto_route(config: &Config, model: &str, prompt: &str) -> C
             auto_model: true,
         }
     } else {
+        // When --model is not `auto`, fall back to the reasoning_effort
+        // declared in the user's config.toml. The previous hard-coded `None`
+        // silently dropped the user's setting on every non-auto-route exec
+        // call, which (for example) prevented vllm + Qwen3 users from
+        // disabling thinking via `reasoning_effort = "off"` and caused
+        // 30+ second SSE idle timeouts on trivial prompts.
         CliAutoRoute {
             model: model.to_string(),
-            reasoning_effort: None,
+            reasoning_effort: config
+                .reasoning_effort()
+                .map(crate::tui::app::ReasoningEffort::from_setting),
             auto_model: false,
         }
     }