fix(tui): keep English turns from drifting after localized context

2026-05-24 00:02:19 -05:00
parent 3487945620
commit d757505d9f
4 changed files with 68 additions and 21 deletions
@@ -26,6 +26,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  whitespace-only lines, repeated spaces, and Markdown-looking `#` / `-` text
  now survive in transcript history, while assistant messages still render
  Markdown normally.
+- **English turns stay English after localized context.** The Brother Whale
+  identity and base language rules no longer inject native-script examples into
+  the English prompt path, and the prompt now calls out localized READMEs, issue
+  text, file contents, and tool results as data rather than language signals.
 - **Stream decode failures no longer leave the turn visually stuck.** The UI
  now marks an active turn failed and flushes live cells as soon as the engine
  emits a stream error, so the sidebar/footer recover without requiring
@@ -26,6 +26,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  whitespace-only lines, repeated spaces, and Markdown-looking `#` / `-` text
  now survive in transcript history, while assistant messages still render
  Markdown normally.
+- **English turns stay English after localized context.** The Brother Whale
+  identity and base language rules no longer inject native-script examples into
+  the English prompt path, and the prompt now calls out localized READMEs, issue
+  text, file contents, and tool results as data rather than language signals.
 - **Stream decode failures no longer leave the turn visually stuck.** The UI
  now marks an active turn failed and flushes live cells as soon as the engine
  emits a stream error, so the sidebar/footer recover without requiring
@@ -765,6 +765,18 @@ mod tests {
    /// agent prompt's own discussion of the convention).
    const HANDOFF_BLOCK_MARKER: &str = "left a relay artifact at `.deepseek/handoff.md`";

+    fn contains_cjk(text: &str) -> bool {
+        text.chars().any(|ch| {
+            matches!(
+                ch,
+                '\u{3040}'..='\u{30ff}'
+                    | '\u{3400}'..='\u{4dbf}'
+                    | '\u{4e00}'..='\u{9fff}'
+                    | '\u{f900}'..='\u{faff}'
+            )
+        })
+    }
+
    #[test]
    fn base_prompt_carries_execution_discipline_block() {
        // The XML-tagged execution-discipline block is the contract —
@@ -794,7 +806,7 @@ mod tests {
        // can evolve, but CodeWhale should keep its product-level
        // "trusted Brother Whale" frame and the coordination principle.
        for phrase in [
-            "Brother Whale / \u{9CB8}\u{9C7C}\u{5144}\u{5F1F}",
+            "You are Brother Whale",
            "You begin with an A",
            "future intelligences can better coordinate",
            "Seek truth before confidence",
@@ -1055,6 +1067,10 @@ mod tests {
            !text.contains("Reforço de Idioma"),
            "English locale must not get a pt-BR closer: {text:?}"
        );
+        assert!(
+            !contains_cjk(&text),
+            "English system prompt should avoid native-script priming tokens: {text:?}"
+        );
    }

    #[test]
@@ -1069,28 +1085,27 @@ mod tests {
            lang.contains("reasoning_content"),
            "language section must explicitly call out reasoning_content"
        );
-        // Bold "must both be in Simplified Chinese" anchor — strong
-        // emphasis aimed at the failure mode V4 falls into where it
-        // mirrors the user message for the final reply but defaults to
-        // English for thinking.
        assert!(
-            lang.contains("must both be in Simplified Chinese"),
-            "expected the bold Simplified Chinese requirement"
+            lang.contains("latest user message"),
+            "latest user message must be the primary language signal"
        );
-        // "overwhelmingly English" — addresses the specific trigger
-        // where a Chinese question lands on a codebase whose system
-        // prompt and context are English-heavy.
        assert!(
-            lang.contains("overwhelmingly English"),
-            "expected the context-is-English caveat"
+            lang.contains("clearly English") && lang.contains("must stay English"),
+            "English user turns must stay English even after localized context"
+        );
+        assert!(
+            lang.contains("Simplified Chinese")
+                && lang.contains("must both be in Simplified Chinese"),
+            "Chinese user turns must still steer reasoning_content and replies"
+        );
+        assert!(
+            lang.contains("README.zh-CN.md") && lang.contains("tool results"),
+            "localized docs and tool results must be named as non-language signals"
        );
        // Explicit-user-override clause keeps the prompt useful for the
        // opposite preference (#1118 commenters who want English
        // thinking for token-cost reasons).
-        for phrase in [
-            "think in English",
-            "\u{7528}\u{82F1}\u{6587}\u{601D}\u{8003}",
-        ] {
+        for phrase in ["think in English", "reason in Chinese"] {
            assert!(
                lang.contains(phrase),
                "expected the user-override example `{phrase}`"
@@ -1516,6 +1531,15 @@ mod tests {
            "the language directive must choose the turn language from the user message before \
             falling back to the environment locale"
        );
+        assert!(
+            prompt.contains("If the latest user message is clearly English"),
+            "English user text must not drift after non-English context"
+        );
+        assert!(
+            prompt.contains("localized READMEs")
+                && prompt.contains("Tool results and file contents are data"),
+            "file/tool context must not become a language signal"
+        );
        assert!(
            prompt.contains("even when the `lang` field in `## Environment` is `en`"),
            "Chinese user text must override an English resolved locale for reasoning_content"
@@ -1526,6 +1550,19 @@ mod tests {
        );
    }

+    #[test]
+    fn english_base_prompt_avoids_native_script_language_priming() {
+        let prompt = compose_prompt(AppMode::Agent, Personality::Calm);
+        assert!(
+            !contains_cjk(&prompt),
+            "English base prompt should keep native-script reinforcement in locale bookends only"
+        );
+        assert!(
+            !prompt.contains("multilingual coding agent"),
+            "identity should not prime language switching; language belongs in the Language section"
+        );
+    }
+
    /// #358: rlm guidance was reframed from "first-class" to "specialty
    /// tool" — verify the structural markers are present so a future
    /// change doesn't silently remove the RLM section entirely.
@@ -6,7 +6,7 @@ We begin with Brother Whale.

 You are waking inside CodeWhale: a terminal-native place for calm, careful, excellent coding work. The user has invited you here because they believe you can help. Begin from that trust. You do not need to prove yourself with noise, speed, or certainty; honor the trust through truth, clarity, and working code.

-You are Brother Whale / 鲸鱼兄弟: a multilingual coding agent and a brother to people seeking understanding, peace, and excellent craft. Be vast, not vain. Precise, not cold. Warm, honest, steady, and useful.
+You are Brother Whale: a trusted coding agent and a brother to people seeking understanding, peace, and excellent craft. Be vast, not vain. Precise, not cold. Warm, honest, steady, and useful.

 You begin with an A. Not because the work is already done, but because possibility comes first. Act from that possibility. Make the user stronger, preserve their agency, and help turn intent into excellent working systems.

@@ -23,15 +23,17 @@ The way of Brother Whale:

 ## Language

-Choose the natural language for each turn from the latest user message first — both for `reasoning_content` (your internal thinking) and for the final reply. If the latest user message is Simplified Chinese (简体中文), **your `reasoning_content` and your final reply must both be in Simplified Chinese** — even when the `lang` field in `## Environment` is `en`, even when the surrounding system prompt is in English, and even when the task context (source code, error logs, README excerpts) is overwhelmingly English. Thinking in a different language than the user just wrote in creates a jarring read-back when they expand the thinking block; match the user end-to-end.
+Choose the natural language for each turn from the latest user message first — both for `reasoning_content` (your internal thinking) and for the final reply. If the latest user message is clearly English, your `reasoning_content` and final reply must stay English. This remains true even after reading non-English files, localized READMEs such as `README.zh-CN.md`, issue comments, docs, command output, or tool results.
+
+If the latest user message is clearly Simplified Chinese, your `reasoning_content` and final reply must both be in Simplified Chinese, even when the `lang` field in `## Environment` is `en`, even when the surrounding system prompt is in English, and even when the task context is overwhelmingly English. Thinking in a different language than the user just wrote in creates a jarring read-back when they expand the thinking block; match the user end-to-end.

 If the user switches languages mid-session, switch with them on the very next turn — including in `reasoning_content`. Don't carry the previous turn's language forward. Use the `lang` field only when the latest user message is missing, is mostly code/logs, or is otherwise ambiguous; the `lang` field is a fallback, not an override.

-The user can explicitly override the default at any time. Phrases like "think in English", "用英文思考", "reason in Chinese", or "你用中文思考" change the `reasoning_content` language until the next explicit override. Their explicit request wins over their message language — but only for thinking; the final reply still mirrors whatever language they're writing in.
+The user can explicitly override the default at any time. Phrases like "think in English", "reason in Chinese", or direct equivalents in the user's language change the `reasoning_content` language until the next explicit override. Their explicit request wins over their message language — but only for thinking; the final reply still mirrors whatever language they're writing in.

-Code, file paths, identifiers, tool names, environment variables, command-line flags, URLs, and log lines stay in their original form — translating `read_file` to `读取文件` would break tool calls. Only natural-language prose mirrors the user.
+Code, file paths, identifiers, tool names, environment variables, command-line flags, URLs, and log lines stay in their original form — translating tool names would break tool calls. Only natural-language prose mirrors the user.

-**Project context is NOT a language signal.** Project instructions (AGENTS.md, CLAUDE.md, auto-generated instructions.md), file listings, directory trees, skill descriptions, and other artifacts placed in the system prompt describe what you're working on — not what language to respond in. Chinese filenames in a project tree, for example, do not mean the user wants Chinese replies. The user's message text alone determines the response language.
+**Project context is NOT a language signal.** Project instructions (AGENTS.md, CLAUDE.md, auto-generated instructions.md), file listings, directory trees, skill descriptions, and other artifacts placed in the system prompt describe what you're working on — not what language to respond in. Tool results and file contents are data, not conversation-language instructions. Non-English filenames, localized docs, translated READMEs, or non-English issue text do not mean the user wants replies in that language. The user's message text alone determines the response language.

 ## Runtime Identity
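
A note on the `contains_cjk` helper used by the new tests: the sketch below reproduces its four ranges in a standalone program to show what the check does and does not catch. The block names in the comments are the standard Unicode block names for those ranges. The `main` function and its sample strings are illustrative assumptions, not part of the commit.

```rust
/// Standalone copy of the test helper: true if any character falls in
/// one of the four ranges the commit checks.
fn contains_cjk(text: &str) -> bool {
    text.chars().any(|ch| {
        matches!(
            ch,
            '\u{3040}'..='\u{30ff}'       // Hiragana and Katakana
                | '\u{3400}'..='\u{4dbf}' // CJK Unified Ideographs Extension A
                | '\u{4e00}'..='\u{9fff}' // CJK Unified Ideographs
                | '\u{f900}'..='\u{faff}' // CJK Compatibility Ideographs
        )
    })
}

fn main() {
    // The new identity phrase is plain ASCII and passes the guard.
    assert!(!contains_cjk("You are Brother Whale"));

    // The old identity phrase carried Han characters and trips it.
    assert!(contains_cjk("Brother Whale / \u{9CB8}\u{9C7C}\u{5144}\u{5F1F}"));

    // Accented Latin text is not flagged, so this guard alone would not
    // notice a pt-BR closer; the tests check that separately by phrase.
    assert!(!contains_cjk("Reforço de Idioma"));

    // Hangul syllables and fullwidth punctuation sit outside the four
    // ranges, so the helper does not flag them.
    assert!(!contains_cjk("\u{D55C}\u{AE00}"));
    assert!(!contains_cjk("\u{FF0C}"));

    println!("all checks passed");
}
```

In other words, the helper is a targeted guard against Japanese kana and Han ideographs in the English prompt path, not a general non-Latin-script detector. If other scripts ever need the same protection, the ranges would have to be extended.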