codewhale

dgf1988/codewhale

hexin 4f3a0c3cfc feat(engine): allow DEEPSEEK_MAX_OUTPUT_TOKENS env override for tight-context providers (#2147)

The `effective_max_output_tokens` heuristic defaults to 64K for any model
not in the known-context-window table. This is fine for DeepSeek's hosted
API (1M context) but causes immediate HTTP 400s on self-hosted providers
with tight `max-model-len`.

Example: vLLM serving Qwen3.6 with `--max-model-len 65536` rejects
requests because the 64K output budget plus the ~1500-token input
exceeds the limit.
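The rejection comes down to a budget check: the prompt tokens plus the requested output tokens must fit inside the server's context window. A minimal sketch of that check, assuming the provider enforces `input + max_output <= max-model-len` (the function name `fits_context` and its parameters are hypothetical and not part of the engine):

```rust
/// Hypothetical helper: returns true when the prompt plus the requested
/// output budget fits inside the provider's context window.
/// Assumes the provider enforces `input + max_output <= max_model_len`.
fn fits_context(input_tokens: u32, max_output_tokens: u32, max_model_len: u32) -> bool {
    // checked_add guards against u32 overflow; an overflowing sum cannot fit.
    match input_tokens.checked_add(max_output_tokens) {
        Some(total) => total <= max_model_len,
        None => false,
    }
}
```

Under this assumption, an output budget of 16384 with a ~1500-token prompt fits comfortably in a 65536-token window, while an output budget equal to the whole window can never fit once the prompt is non-empty.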

This change lets the operator set `DEEPSEEK_MAX_OUTPUT_TOKENS=16384` (or
whatever fits their deployment) to override the heuristic. The env var
takes precedence over the model-table lookup when set to a positive
integer; otherwise the existing behavior is preserved.
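The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: `parse_override` and `resolve_max_output_tokens` are hypothetical names, and the `heuristic` argument stands in for whatever the model-table lookup returns.

```rust
use std::env;

/// Hypothetical helper: accept only positive integers as an override.
/// Empty, non-numeric, zero, or negative values yield `None`.
fn parse_override(raw: Option<&str>) -> Option<u32> {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .filter(|&n| n > 0)
}

/// Hypothetical resolver: the env var wins when it holds a positive
/// integer; otherwise the heuristic value is returned unchanged.
fn resolve_max_output_tokens(heuristic: u32) -> u32 {
    let raw = env::var("DEEPSEEK_MAX_OUTPUT_TOKENS").ok();
    parse_override(raw.as_deref()).unwrap_or(heuristic)
}
```

With `DEEPSEEK_MAX_OUTPUT_TOKENS=16384` exported, `resolve_max_output_tokens(64_000)` returns 16384 in this sketch. With the variable unset or set to `0`, it returns 64000.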

No new config struct field is added: the env-only override keeps the
public API unchanged. Useful for embedded users (e.g. pinvou3) who need
to control the output budget without forking the engine config schema.

Co-authored-by: hexin <he.xin@h3c.com>

2026-05-26 10:31:26 -05:00

assets/skills

test(rebrand): residual brand-string cleanup across source and assets

2026-05-23 11:58:34 -05:00

src

feat(engine): allow DEEPSEEK_MAX_OUTPUT_TOKENS env override for tight-context providers (#2147)

2026-05-26 10:31:26 -05:00

tests

chore(release): prepare v0.8.45

2026-05-25 18:45:36 -05:00

build.rs

chore(rebrand): finish codewhale release surfaces