Files
codewhale/crates
Hunter Bown 678198440a docs(prompts): capture the Chinese-native-model design tension in locale-preamble docstring
Community feedback on the v0.8.29 follow-up (WeChat thread on
#1118) made a sharp point: the standard Western-LLM advice
"always write prompts in English" doesn't transfer to DeepSeek
V4, which is a Chinese-first multilingual model with a
Chinese-co-trained tokenizer. `你好` typically encodes to ~1
token, not 2; the "Chinese is expensive" framing is folk wisdom
from a different model family.

The naïve translation of that argument is "ship a fully
translated base.md per locale" — and that's the move v0.9.x
might eventually make. For v0.8.29 we deliberately stop at the
bookend (preamble + closer in native script, English middle)
because of three concrete costs:

  1. Drift risk between N translated copies of a 200-line
     prompt — every rule change has to land in lockstep.
  2. Cache stability — one English `base.md` lets us share
     prefix-cache state across locales for the workspace-
     static portion of the prompt.
  3. Translation QA expense — 95% right is bad, because the
     missing 5% becomes silent behavior divergence.

Captured all of this in the `locale_reinforcement_preamble`
docstring so the next maintainer reading the prompt-assembly
code sees the design tension and the cost model explicitly,
and knows full translation is the natural next step if the
bookend stops being sufficient.

No runtime change; documentation only. Credit @MuMu (via Hunter)
for the bookend pattern that motivates this design, and the
unnamed WeChat commenter who made the tokenizer-economics
argument that motivates this docstring expansion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:04:49 -05:00
..
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00
2026-05-10 20:40:20 -05:00