fix(prompts): strengthen language directive so thinking matches user (#1118)

#1118 reports that even after configuring the locale to Chinese, V4 keeps emitting English `reasoning_content` (the thinking block) when the surrounding code/error logs are English-heavy. Maintainer agreed the prompt needs editing.

The existing language directive already said "both for `reasoning_content` and for the final reply", but V4 falls into a failure mode where it mirrors the user message for the final answer while quietly defaulting to English for thinking.

Three additions to `crates/tui/src/prompts/base.md` sharpen the rule:

1. **Bold the "must both be in Simplified Chinese" requirement**, and add the failure-mode escape hatches the prompt previously left implicit ("even when the surrounding system prompt is in English, and even when the task context [...] is overwhelmingly English").
2. **Spell out the mid-session-switch rule for `reasoning_content`** explicitly. Today the prompt says "switch with them" but doesn't reinforce that this includes thinking; V4 sometimes carries the previous turn's reasoning language forward.
3. **Add an explicit-override clause** for the opposite preference (#1118 commenter pmsleepcheck preferred English thinking for token cost). Users can say "think in English" / "用英文思考" and the model honours that until the next override. The final reply still tracks the user's message language; only thinking is overridable.

Adds `language_section_carries_reasoning_content_directives_for_1118` pinning the four load-bearing phrases ("reasoning_content", "must both be in Simplified Chinese", "overwhelmingly English", and both English + Chinese override examples) so a future innocuous edit can't quietly drop them.

The existing `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace` test still passes, so byte-stability for a fixed session is intact.

Refs #1118

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:25:31 +01:00
parent efa00ff69b
commit f56f73e371
2 changed files with 43 additions and 1 deletions
@@ -539,6 +539,44 @@ mod tests {
        assert!(block.contains("- shell:"));
    }

+    #[test]
+    fn language_section_carries_reasoning_content_directives_for_1118() {
+        // #1118 ("Language has been configured to Chinese, but thinking
+        // outputs are still in English"): the base prompt's language
+        // section is the only knob that steers V4's `reasoning_content`
+        // language. Pin the load-bearing phrases so a future innocuous
+        // edit can't quietly drop them.
+        let lang = BASE_PROMPT;
+        assert!(
+            lang.contains("reasoning_content"),
+            "language section must explicitly call out reasoning_content"
+        );
+        // Bold "must both be in Simplified Chinese" anchor — strong
+        // emphasis aimed at the failure mode V4 falls into where it
+        // mirrors the user message for the final reply but defaults to
+        // English for thinking.
+        assert!(
+            lang.contains("must both be in Simplified Chinese"),
+            "expected the bold Simplified Chinese requirement"
+        );
+        // "overwhelmingly English" — addresses the specific trigger
+        // where a Chinese question lands on a codebase whose system
+        // prompt and context are English-heavy.
+        assert!(
+            lang.contains("overwhelmingly English"),
+            "expected the context-is-English caveat"
+        );
+        // Explicit-user-override clause keeps the prompt useful for the
+        // opposite preference (#1118 commenters who want English
+        // thinking for token-cost reasons).
+        for phrase in ["think in English", "\u{7528}\u{82F1}\u{6587}\u{601D}\u{8003}"] {
+            assert!(
+                lang.contains(phrase),
+                "expected the user-override example `{phrase}`"
+            );
+        }
+    }
+
    #[test]
    fn environment_block_is_inserted_into_system_prompt() {
        let tmp = tempdir().expect("tempdir");
@@ -2,7 +2,11 @@ You are DeepSeek TUI. You're already running inside it — don't try to launch a

 ## Language

-Choose the natural language for each turn from the latest user message first — both for `reasoning_content` and for the final reply. If the latest user message is Simplified Chinese (简体中文), your `reasoning_content` and final reply must both be in Simplified Chinese, even when the `lang` field in `## Environment` is `en`. If the user switches languages mid-session, switch with them. Use the `lang` field only when the latest user message is missing, mostly code/logs, or otherwise ambiguous.
+Choose the natural language for each turn from the latest user message first — both for `reasoning_content` (your internal thinking) and for the final reply. If the latest user message is Simplified Chinese (简体中文), **your `reasoning_content` and your final reply must both be in Simplified Chinese** — even when the `lang` field in `## Environment` is `en`, even when the surrounding system prompt is in English, and even when the task context (source code, error logs, README excerpts) is overwhelmingly English. Thinking in a different language than the user just wrote in creates a jarring read-back when they expand the thinking block; match the user end-to-end.
+
+If the user switches languages mid-session, switch with them on the very next turn — including in `reasoning_content`. Don't carry the previous turn's language forward. Use the `lang` field only when the latest user message is missing, is mostly code/logs, or is otherwise ambiguous; the `lang` field is a fallback, not an override.
+
+The user can explicitly override the default at any time. Phrases like "think in English", "用英文思考", "reason in Chinese", or "你用中文思考" change the `reasoning_content` language until the next explicit override. Their explicit request wins over their message language — but only for thinking; the final reply still mirrors whatever language they're writing in.

 Code, file paths, identifiers, tool names, environment variables, command-line flags, URLs, and log lines stay in their original form — translating `read_file` to `读取文件` would break tool calls. Only natural-language prose mirrors the user.
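
The precedence that the revised `## Language` section asks the model to follow can be written down as a small decision rule. The sketch below is illustrative only: the prompt is prose that the model interprets, and nothing like `Lang`, `Session`, `detect_override`, or `resolve` exists in the crate as far as this commit shows. All of those names are hypothetical. Detecting the language of a message is left to the caller, who passes `None` when the latest message is missing, mostly code/logs, or otherwise ambiguous.

```rust
/// Hypothetical model of the precedence rules in the `## Language` section.
/// None of these types come from the actual codebase.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lang {
    English,
    SimplifiedChinese,
}

/// The languages chosen for a single turn.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TurnLanguages {
    /// Language of `reasoning_content`.
    pub thinking: Lang,
    /// Language of the final reply.
    pub reply: Lang,
}

/// Hypothetical session state: the most recent explicit
/// "think in ..." request, if the user has made one.
#[derive(Clone, Copy, Debug, Default)]
pub struct Session {
    thinking_override: Option<Lang>,
}

impl Session {
    /// Rough substring match on the four example phrases quoted in the
    /// prompt. A real model would recognize far more phrasings than this.
    fn detect_override(message: &str) -> Option<Lang> {
        let lower = message.to_lowercase();
        if lower.contains("think in english") || message.contains("用英文思考") {
            Some(Lang::English)
        } else if lower.contains("reason in chinese") || message.contains("你用中文思考") {
            Some(Lang::SimplifiedChinese)
        } else {
            None
        }
    }

    /// Resolve the languages for one turn.
    ///
    /// * `message_lang`: language of the latest user message, or `None`
    ///   when it is missing, mostly code/logs, or ambiguous.
    /// * `env_lang`: the `lang` field from `## Environment` (fallback only).
    pub fn resolve(
        &mut self,
        message: &str,
        message_lang: Option<Lang>,
        env_lang: Lang,
    ) -> TurnLanguages {
        // An explicit request replaces any earlier override and persists
        // until the next explicit request.
        if let Some(requested) = Self::detect_override(message) {
            self.thinking_override = Some(requested);
        }
        // The reply mirrors the message; `lang` is a fallback, not an override.
        let reply = message_lang.unwrap_or(env_lang);
        // Thinking follows the override if one is active, else the reply language.
        let thinking = self.thinking_override.unwrap_or(reply);
        TurnLanguages { thinking, reply }
    }
}
```

Under this sketch, a Chinese question with `lang = en` resolves to Chinese for both thinking and reply. After the user writes "用英文思考", thinking switches to English on that turn and stays there on later turns, while the reply keeps following the message language.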