PR #1421 from @reidliu41. Filters SGR mouse-report bursts that some terminal chains leak into stdin while mouse capture is enabled, while preserving ordinary coordinate-like text.
Community feedback on the v0.8.29 follow-up (WeChat thread on
#1118) made a sharp point: the standard Western-LLM advice
"always write prompts in English" doesn't transfer to DeepSeek
V4, which is a Chinese-first multilingual model with a
Chinese-co-trained tokenizer. `你好` typically encodes to ~1
token, not 2; the "Chinese is expensive" framing is folk wisdom
from a different model family.
The naïve translation of that argument is "ship a fully
translated base.md per locale" — and that's the move v0.9.x
might eventually make. For v0.8.29 we deliberately stop at the
bookend (preamble + closer in native script, English middle)
because of three concrete costs:
1. Drift risk between N translated copies of a 200-line
prompt — every rule change has to land in lockstep.
2. Cache stability — one English `base.md` lets us share
prefix-cache state across locales for the workspace-
static portion of the prompt.
3. Translation QA expense — 95% right is bad, because the
missing 5% becomes silent behavior divergence.
Captured all of this in the `locale_reinforcement_preamble`
docstring so the next maintainer reading the prompt-assembly
code sees the design tension and the cost model explicitly,
and knows full translation is the natural next step if the
bookend stops being sufficient.
No runtime change; documentation only. Credit @MuMu (via Hunter)
for the bookend pattern that motivates this design, and the
unnamed WeChat commenter who made the tokenizer-economics
argument that motivates this docstring expansion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitGuardian's "Basic Auth String" detector flagged commit 09dcbede0
because the test fixture for `redact_proxy_userinfo_strips_password`
contained literal URL strings of the shape
`scheme://username:password@host` — `alice:hunter2` and `bob`. The
values are obvious placeholders (not real credentials), but the
detector's regex is shape-based: any scheme-prefixed colon-separated
userinfo segment terminated by `@` matches, regardless of whether the
content is a real secret.
The test still needs to exercise the redaction logic for credential-
carrying proxy URLs. Fix: assemble the URLs via `format!` from
explicit placeholder constants (`PLACEHOLDER_USER`,
`PLACEHOLDER_PASS`) so the literal source text never contains a
contiguous `scheme://name:secret@host` pattern. Runtime behavior is
identical — `redact_proxy_userinfo` receives the same string and
returns the same redacted form.
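A minimal sketch of the fixture assembly described above. The constant names come from this commit; their values and the helper name `proxy_url_with_creds` are assumptions for illustration.

```rust
// Assumed placeholder values; the commit names the constants but not their contents.
const PLACEHOLDER_USER: &str = "alice";
const PLACEHOLDER_PASS: &str = "hunter2";

/// Hypothetical helper: builds the credential-carrying URL at runtime so the
/// source text never contains the full credential-shaped literal.
fn proxy_url_with_creds(scheme: &str, host: &str) -> String {
    format!("{}://{}:{}@{}/", scheme, PLACEHOLDER_USER, PLACEHOLDER_PASS, host)
}
```

The string handed to `redact_proxy_userinfo` is the same as before; only the way the source spells it changes.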
Also reworded the function docstring (line 61) and the inline comment
at the warning log site (line 993) to describe the userinfo segment
without spelling out a literal `user:pass@host` shape that the same
detector could later trip on.
Two preexisting fixtures elsewhere in this file
(`mask_url_secrets("https://user:s3cret@…")` at line 3155 and its
docstring at line 46) have been on `main` for several releases and
are presumably already on GitGuardian's allowlist — left untouched
in this commit so the fix scope stays minimal. If they re-fire on a
future scan, the same `format!` pattern can be applied there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The opening preamble from commit `47f6d69e5` works for the first
few turns, but as English context accumulates in the session (code
read into the transcript, error logs, file listings, search
results, project context), the transformer's recency bias pulls
`reasoning_content` back toward English even when the user keeps
writing in their own language. The empirical fingerprint is "model
thinks in Chinese for the first 3-4 turns, then quietly switches
to English thinking around turn 5 as more code lands in context."
Community feedback (WeChat thread on #1118 — @MuMu describes an
XML-tagged "bilingual bookend" pattern they used in another
project, and @益达 confirms the translation-accuracy problem with
fully-translated prompts) pointed at the bookend pattern: keep the
rule-heavy middle of the prompt in English (single source of
truth, model is natively multilingual), but reinforce the locale
directive at BOTH ends in native script. The opening anchors
behavior at session start; the closer sits at the maximum-
recency position right before the user's next message and
re-asserts the rule each turn.
`locale_reinforcement_closer()` returns Some for `zh-Hans` /
`zh-CN` / `zh`, `ja` / `ja-JP`, `pt-BR` / `pt`. English (and
unmatched locales) return None — system prompt stays
byte-identical to the previous behavior for English users.
The closer is appended after the previous-session handoff block
(the existing "last block" position), so it's the very last
content before the user's first message. Any future block that
needs to sit closer to the user should be added BEFORE the
closer with an updated test invariant.
Three new tests pin the contract:
* `locale_reinforcement_closer_returns_native_script_for_supported_locales`
— each supported locale's closer is in its native script and
explicitly mentions `reasoning_content` (the V4 knob).
* `system_prompt_bookends_zh_hans_with_preamble_and_closer` —
the full zh-Hans system prompt contains both `## 语言要求`
(preamble) and `## 语言再次提醒` (closer), in that order, and
no other top-level `##` section follows the closer.
* `system_prompt_skips_locale_preamble_for_english` (extended)
— English locale gets neither the preamble nor any of the
three locale closers.
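The closer placement can be sketched roughly as follows. Only the `## 语言再次提醒` heading and the locale tags come from this commit; the closer bodies are placeholders, and `assemble_system_prompt` is a hypothetical stand-in for the real prompt-assembly code.

```rust
/// Closing reminder in native script, or `None` for English and unmatched locales.
/// The bodies below are placeholders, not the shipped text.
fn locale_reinforcement_closer(locale_tag: &str) -> Option<&'static str> {
    match locale_tag {
        "zh-Hans" | "zh-CN" | "zh" => Some("## 语言再次提醒\n(placeholder body)"),
        "ja" | "ja-JP" => Some("(placeholder Japanese closer)"),
        "pt-BR" | "pt" => Some("(placeholder Portuguese closer)"),
        _ => None,
    }
}

/// Hypothetical assembly step: the closer goes after every other block,
/// so it is the last content before the user's message.
fn assemble_system_prompt(blocks: &[&str], locale_tag: &str) -> String {
    let mut prompt = blocks.join("\n\n");
    if let Some(closer) = locale_reinforcement_closer(locale_tag) {
        prompt.push_str("\n\n");
        prompt.push_str(closer);
    }
    prompt
}
```

Because the `None` branch appends nothing, the English prompt stays byte-identical in this sketch.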
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1408's MCP proxy support (commit 865db6248) added a
`tracing::warn!` for malformed `HTTPS_PROXY` values that included
the raw URL via `proxy = %proxy_url`. With v0.8.29's new
file-backed tracing subscriber (writing to
`~/.deepseek/logs/tui-YYYY-MM-DD.log`), that means a corporate
proxy URL of the shape `http://user:pass@proxy.example/` would
leak the password to disk whenever reqwest rejected the URL.
Fix: redact the `user:pass@` userinfo segment before logging via
a new `redact_proxy_userinfo()` helper. `http://alice:hunter2@proxy/`
becomes `http://***@proxy/`. URLs without userinfo are returned
unchanged; the `@` is only treated as a userinfo separator when it
appears before any `/`, `?`, or `#` (so path-embedded `@` doesn't
trigger redaction). Garbage input (no `://`) passes through — the
warning log site is already in the malformed-URL failure path.
Pinned by `redact_proxy_userinfo_strips_password` covering five
cases: full creds, user-only, no-userinfo, path-only-`@`, and
garbage. The non-malformed path (where reqwest accepts the URL)
never logs the URL at all, so this is the only leak vector.
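For illustration, a sketch of the redaction rules listed above. It is one possible implementation, not necessarily the shipped helper; treating the last `@` in the authority as the separator is an assumption.

```rust
/// Replaces the userinfo segment of a URL with `***`; other input is returned unchanged.
fn redact_proxy_userinfo(url: &str) -> String {
    // No scheme separator: treat the input as garbage and pass it through.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let rest = &url[scheme_end + 3..];
    // The authority ends at the first '/', '?', or '#'.
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    // Only an '@' inside the authority separates userinfo from the host.
    match rest[..authority_end].rfind('@') {
        Some(at) => format!("{}***{}", &url[..scheme_end + 3], &rest[at..]),
        None => url.to_string(),
    }
}
```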
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`base.md` stays the single source of truth (English meta-language,
DeepSeek V4 is natively multilingual, prefix-cache stable across
users in the same locale). For non-English UI locales we now prepend
a short locale-native passage so the model's first exposure to the
prompt is an explicit "think and reply in {locale}" directive in the
user's own writing system — defeats the failure mode reported in
#1118 and visible in the recent WeChat screenshot where a user with
`locale = zh-Hans` configured still got English thinking because the
task context (Rust code, English log lines) overpowered the inferred
`## Environment.lang` signal.
Locales supported (matched against `PromptSessionContext.locale_tag`,
which the caller resolves from `Settings`):
* `zh-Hans` / `zh-CN` / `zh` — Simplified Chinese preamble
* `ja` / `ja-JP` — Japanese preamble
* `pt-BR` / `pt` — Brazilian Portuguese preamble
English (and any unmatched locale) returns `None` and the system
prompt is byte-identical to v0.8.28 — so this is a strict additive
change for non-English users.
Each preamble is ~6-8 lines and explicitly:
* names the runtime ("DeepSeek TUI") so the model knows it's not
switching personas
* declares the directive for BOTH `reasoning_content` and the final
reply (the V4 knob that #1118 hinges on)
* preserves tool-name immutability (`read_file`, `exec_shell`,
paths, env vars, CLI flags, URLs stay in their original form)
* handles mid-session language switches (next-turn switching)
* defers to explicit user override ("think in English" etc.)
Three new tests pin the contract:
* `locale_reinforcement_preamble_returns_native_script_for_supported_locales`
— preamble must be in the locale's native script, must mention
`reasoning_content`, and must call out tool-name immutability;
English/unknown locales must return `None`.
* `system_prompt_prepends_locale_preamble_for_zh_hans` — the
preamble must appear *before* the English base prompt body in
the assembled system prompt (attention precedence + cache
ordering both depend on this).
* `system_prompt_skips_locale_preamble_for_english` — English
locale must produce a byte-identical prompt to the pre-feature
behavior (no zh / ja / pt strings anywhere).
Prefix-cache impact: per-locale cache shards stay intact (a
zh-Hans user's prompt shares the preamble across turns; an English
user's prompt is unchanged). Cross-locale cache is invalidated,
which is correct — different users in different locales were never
sharing cache for the right reasons.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps version 0.8.26 → 0.8.29 and toolCount 61 → 62 (new tool from
the v0.8.28 / v0.8.29 cycle landed on the canonical surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHANGELOG.md gains a `[0.8.29]` section above `[0.8.28]` covering
the scroll-demon structural fix, the #1395 wrong-project Ctrl+R
fix, MCP HTTP proxy support, MCP discovery skip-malformed, note
commands, AGENTS.md merge, CJK Auto routing, sync-cnb hardening,
and the 4-PR test coverage batch.
README.md and README.zh-CN.md "What's New" sections rewritten to
match (v0.8.27 → v0.8.29). The `prompts::tests::changelog_entry_exists_for_current_package_version`
integration test pins the CHANGELOG-must-have-current-version
invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`reqwest 0.13` does not auto-detect proxy env vars by default, so MCP
HTTP connections were bypassing the proxy that every other tool on
the user's box (curl, npm, git, …) was using. Users behind corporate
egress proxies and China-mainland setups routing through a local
Clash / Shadowsocks tunnel had their MCP servers fail to connect or
silently leak around the tunnel.
When the MCP HTTP transport client builder runs, we now read
`HTTPS_PROXY` / `https_proxy` / `HTTP_PROXY` / `http_proxy` (first
non-empty wins) and route via `reqwest::Proxy::all(...)`. `NO_PROXY`
is honored via `reqwest::NoProxy::from_env()`. Malformed proxy URLs
log a `tracing::warn!` (no scroll-demon leak — see runtime_log) and
the connection proceeds without a proxy rather than failing the
whole MCP attach.
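The "first non-empty wins" lookup can be sketched without touching the real environment by passing the lookup in as a closure. `first_proxy_url` and `PROXY_VARS` are hypothetical names; the variable order is the one listed above.

```rust
/// Variable names in precedence order, as listed in this commit.
const PROXY_VARS: [&str; 4] = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];

/// Returns the first non-empty value among `PROXY_VARS`, if any.
/// `lookup` abstracts the environment so the logic is testable.
fn first_proxy_url(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    PROXY_VARS
        .iter()
        .filter_map(|name| lookup(name))
        .find(|value| !value.is_empty())
}
```

In production the closure would be something like `|name| std::env::var(name).ok()`; the result would then be handed to `reqwest::Proxy::all(...)`.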
Closes #1408. Thanks @hlx98007 for the report.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend /note beyond append-only usage with list, show, edit, remove,
clear, path, and explicit add subcommands.
Keep existing /note <text> behavior compatible, preserve the existing
--- separated file format, and number notes only at display time so the
stored notes stay clean.
Update command help, localization, docs, and tests.
When the MCP server returns a list where one entry cannot be deserialized
(e.g. a tool missing the required `name` field), the previous code called
`.unwrap_or_default()` on the whole-list deserialization, silently discarding
every valid entry in the page.
Switch all four discovery functions (tools, resources, resource-templates,
prompts) to iterate over the JSON array and deserialize each item
individually, skipping only those that fail. This ensures a single
non-conformant entry never hides the rest of the list.
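The per-item approach, sketched with std-only stand-ins. The `HashMap` entries model raw JSON objects, and `Tool`, `parse_tool`, and `parse_tools_lenient` are hypothetical names; the real code deserializes JSON values.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Tool {
    name: String,
}

/// Stand-in for per-item deserialization: an entry without `name` fails.
fn parse_tool(raw: &HashMap<&str, &str>) -> Result<Tool, String> {
    raw.get("name")
        .map(|name| Tool { name: (*name).to_string() })
        .ok_or_else(|| "missing required field `name`".to_string())
}

/// Parses each entry on its own and skips only the ones that fail,
/// instead of discarding the whole page on the first bad entry.
fn parse_tools_lenient(raw_items: &[HashMap<&str, &str>]) -> Vec<Tool> {
    raw_items
        .iter()
        .filter_map(|raw| parse_tool(raw).ok())
        .collect()
}
```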
Related: #1250
Adds `v0.8.29` (the workspace `CARGO_PKG_VERSION` resolved at compile
time) to the right cluster of the header bar, after the existing
provider / effort / Live / context chips. Users have been asking for
the live version somewhere in the UI — previously only reachable via
`deepseek --version` (CLI flag, not in the TUI) or `/status` (slash
command, requires action).
The chip is the lowest-priority element in the width cascade in
`right_spans()`: under tight terminal width it drops before any of
the existing status chips. Two pinned tests:
* `header_renders_version_chip_when_width_allows` — at width 120
the chip must appear with the current `env!("CARGO_PKG_VERSION")`.
* `narrow_header_drops_version_chip_before_dropping_mode` — at
width 12 (extreme narrow) the chip drops while the mode label
survives, matching the cascade priority.
Styled with `palette::TEXT_HINT` so it sits visually behind the
streaming dot / context signal — present but not distracting.
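A rough sketch of the drop-lowest-priority-first cascade. It does not reproduce `right_spans()`: the function names are hypothetical, chips are plain strings ordered from highest to lowest priority, and width is counted in characters with a one-column separator.

```rust
/// Total width of the chips plus one separator column between neighbours.
fn total_width(chips: &[&str]) -> usize {
    let text: usize = chips.iter().map(|chip| chip.chars().count()).sum();
    text + chips.len().saturating_sub(1)
}

/// Drops chips from the end (lowest priority) until the remainder fits.
fn fit_chips<'a>(chips: &[&'a str], max_width: usize) -> Vec<&'a str> {
    let mut kept = chips.to_vec();
    while !kept.is_empty() && total_width(&kept) > max_width {
        kept.pop();
    }
    kept
}
```

With the version chip last in the list, it is the first to go under tight width, which is the property the two tests pin.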
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace + per-crate path-dep version pins, npm wrapper, and
deepseekBinaryVersion all advance 0.8.28 -> 0.8.29.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #1085 ("TUI viewport drifts down inside alt-screen at end of
turn, leaving top rows blank, esp. after sub-agents") was closed in
v0.8.18 by adding `reset_terminal_viewport()` to home the cursor on
TurnComplete / focus / resize. v0.8.27's flicker fix (`abf3fa66f`)
dropped the `\x1b[2J\x1b[3J` deep-clear from that path to stop the
double-clear flicker on Ghostty / VSCode / Win10 conhost. That left
ratatui's incremental-diff renderer relying on its internal model
matching reality — which only holds while nothing else writes to
the terminal.
Two latent `eprintln!` sites had been quietly emitting raw bytes
into the alt-screen for the entire v0.8.x cycle:
* `tools/subagent/mod.rs::persist_state_best_effort` (fires whenever
the per-step sub-agent state save hits an error; under parallel
sub-agents this can fire dozens of times per turn)
* `tools/subagent/mod.rs::new_shared_subagent_manager` (fires once
on init if the prior state file fails to load)
Plus a third found during this fix:
* `network_policy.rs::record` (fires every time a network-policy
audit write fails)
Each eprintln advanced the alt-screen cursor by one row and
scrolled the buffer up by one row, but ratatui's renderer didn't
know — it kept writing to absolute row positions, which now meant
"one row higher than visible." After ~30 leaks the TUI content
appeared to drift downward, with a blank band growing above the
header. v0.8.18's periodic full-clear had been masking it; v0.8.27's
flicker fix unmasked it.
Three layers of defence so this class of bug "isn't an option
anymore":
1. **`crates/tui/src/runtime_log.rs` — file-backed tracing
subscriber + Unix fd-level stderr redirect.** A daily-rolling log
file at `~/.deepseek/logs/tui-YYYY-MM-DD.log` is created at TUI
startup (right after `EnterAlternateScreen`). A
`tracing-subscriber` registry routes `tracing::warn!` /
`tracing::error!` calls to it. On Unix, the process's stderr fd
is `dup2`'d to the same file for the lifetime of the
`TuiLogGuard`. Any future raw `eprintln!` — ours, a panic
message, a third-party crate's verbose output — lands in the log
file instead of the alt-screen. The guard restores the original
stderr fd on drop so shutdown messages still reach the user's
terminal.
2. **`tracing::warn!` replacements** for the three known leak sites
(`subagent/mod.rs` ×2, `network_policy.rs` ×1). With (1) in
place these messages now go to the log file with structured
fields (`?err`, `host`, `tool`) instead of opaque text rows in
the alt-screen.
3. **Module-level
`#![deny(clippy::print_stdout, clippy::print_stderr)]`** on
`tools/`, `core/`, `tui/`, `runtime_threads.rs`, and
`network_policy.rs`. Any future `eprintln!` / `println!` added
to a TUI runtime path fails the lint at compile time.
Legitimate CLI-print paths (`main.rs` eval / init / doctor,
`runtime_api.rs` server banners, `logging.rs` verbose helpers,
`skills/mod.rs` listing utilities, `execpolicy/execpolicycheck.rs`
JSON output, `ui::run_event_loop` post-`LeaveAlternateScreen`
resume hint, three `#[test] #[ignore]` perf benches in
`tui/transcript.rs` / `tui/widgets/mod.rs` / `core/capacity.rs`)
keep their existing prints — they all run outside the alt-screen
lifetime.
The dup2 redirect is Unix-only because there's no equivalent stable
Rust API for fd-redirecting `STDERR_FILENO` on Windows; on Windows
the tracing-subscriber layer + the clippy denies still apply, and
ratatui's own use of crossterm avoids the worst leakage classes.
Cross-platform stderr redirect via `SetStdHandle` is a follow-up.
The new `runtime_log` module ships with one test
(`log_directory_prefers_home`) that pins the `HOME` /
`USERPROFILE` / `dirs::home_dir()` resolution order — uses the
process-wide `test_support::lock_test_env()` lock for env-mutation
safety. Two `#[test] #[ignore]` benches in
`tui/transcript.rs` (rail-prefix memory) and `tui/widgets/mod.rs`
(transcript scroll bench) and one in `core/capacity.rs`
(`bench_compute_profile`) keep their stdout prints via
`#[allow(clippy::print_stdout)]` on the individual test.
New dependencies: `tracing-subscriber 0.3` (env-filter + fmt
features) and `tracing-appender 0.2` at the workspace root, both
pulled into `crates/tui` only.
Closes the v0.8.28 regression Hunter reported in screenshots:
parallel sub-agents running `exec_shell` triggered the scroll
demon with the TUI content squeezed into the bottom third of the
terminal and ~30 rows of blank above the header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`sanitize_stream_chunk` is the per-chunk filter every piece of
streaming text passes through — assistant content, thinking
content, tool results, web-search snippets — before reaching the
renderer. Its job is to keep newlines / tabs intact while dropping
control bytes that could otherwise let a chunk emit terminal escape
sequences (`\u{1b}[2J` clear-screen, `\u{8}` backspace, `\u{7}` bell).
Today the function has zero tests, so a future innocuous-looking tweak
("let's normalise newlines", "let's collapse all whitespace") could
silently regress the security posture or visibly mangle code blocks.
Adds three unit tests:
* `sanitize_stream_chunk_keeps_printable_and_drops_control_bytes` —
newline/tab survive; ESC, BEL, BS, VT, FF, CR all drop.
* `sanitize_stream_chunk_preserves_unicode` — CJK characters,
emoji, and accented Latin pass through untouched.
* `sanitize_stream_chunk_handles_empty_and_whitespace` — empty
input stays empty; whitespace-only input is preserved; a chunk
that is entirely control bytes legitimately shrinks to empty
(the caller's "skip empty chunk" branches handle the result).
Zero behaviour change.
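The contract the tests pin can be sketched like this. It assumes "control bytes" means Unicode control characters (`char::is_control`), which may be broader or narrower than what the real filter checks.

```rust
/// Keeps newlines, tabs, and all non-control characters; drops other control characters.
fn sanitize_stream_chunk(chunk: &str) -> String {
    chunk
        .chars()
        .filter(|c| matches!(*c, '\n' | '\t') || !c.is_control())
        .collect()
}
```

Dropping ESC alone is enough to defuse a sequence such as `\u{1b}[2J`: the remaining `[2J` is rendered as ordinary text.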
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`optional_search_max_results` decides how many results `web_search`
fetches — too high wastes bandwidth, too low misses the model's
intent. The function has three branches today (top-level
`max_results`, then `search_query[0].max_results`, then the
`DEFAULT_MAX_RESULTS` constant), none of which are exercised in tests.
`extract_search_query` already has four tests covering the basic
shapes; this PR completes coverage with two edge cases the helper
silently handles (whitespace trim + empty rejection).
Six new tests, no behaviour change:
* `optional_max_results_prefers_top_level_value` — the explicit
outer field wins over a sibling in the array form. Pins the
precedence so a future structured-query implementation can't
flip it accidentally.
* `optional_max_results_falls_back_to_array_form` — when only the
inner form sets the bound (V4's structured `search_query: […]`
shape), it reaches the caller correctly.
* `optional_max_results_uses_default_when_neither_set` — DEFAULT
applies for both the top-level and the array shapes, so the model
can't burn the MAX_RESULTS budget by omitting the field.
* `optional_max_results_only_reads_first_array_entry` — sub-search
fan-out is a future feature; future entries are ignored today
and a multi-query implementation will need to update this test
intentionally.
* `extract_search_query_trims_whitespace_from_array_form_q_alias` —
pads from heredocs/copy-paste don't reach the upstream URL.
* `extract_search_query_rejects_empty_query` — `""`, all-whitespace
`q`, and an empty body each surface the same missing-field error
rather than a confusing upstream "Bot challenge" page.
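The three-branch precedence can be sketched with plain structs standing in for the JSON arguments. `SearchArgs` and `SearchQuery` are hypothetical types, and the value of `DEFAULT_MAX_RESULTS` is assumed.

```rust
/// Assumed value; the commit names the constant but not its value.
const DEFAULT_MAX_RESULTS: u32 = 5;

struct SearchQuery {
    max_results: Option<u32>,
}

struct SearchArgs {
    max_results: Option<u32>,
    search_query: Vec<SearchQuery>,
}

/// Top-level value first, then the first array entry, then the default.
fn optional_search_max_results(args: &SearchArgs) -> u32 {
    args.max_results
        .or_else(|| args.search_query.first().and_then(|q| q.max_results))
        .unwrap_or(DEFAULT_MAX_RESULTS)
}
```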
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`parse_pages_arg` validates the user-supplied `pages` argument that
`ReadFileTool` forwards to `pdftotext -f START -l END`. The function
has zero tests today even though it's the only gatekeeper between
user input and a pdftotext spawn — silent acceptance of a malformed
range yields a confusing empty extraction with no actionable error
message.
Adds five tests:
* `parse_pages_arg_accepts_single_page` — `"3"` and `" 7 "` both
return `Some((n, n))`.
* `parse_pages_arg_accepts_range` — `"1-5"`, `"10-20"`, and
whitespace-tolerant `" 1 - 5 "` all parse correctly.
* `parse_pages_arg_rejects_invalid_ranges` — `5-1` (end < start),
`0` and `0-3` (one-indexed contract), empty / whitespace-only
inputs, `abc` (non-numeric), and `3.5` (floats) all return `None`.
* `parse_pages_arg_rejects_half_open_ranges` — `1-`, `-5`, and `-`
reject rather than silently extending to `u32::MAX` or `0`.
* `parse_pages_arg_rejects_negative_numbers` — `-3-5` doesn't wrap
into a giant positive number via u32 parsing.
Zero behaviour change; locks the contract so a future innocuous edit
can't silently shift validation.
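For reference, a sketch that satisfies the cases listed above. It is one way to write the validation, not necessarily the shipped function body.

```rust
/// Parses `"N"` or `"START-END"` (one-indexed, START <= END) into a page range.
fn parse_pages_arg(pages: &str) -> Option<(u32, u32)> {
    let pages = pages.trim();
    let (start, end) = match pages.split_once('-') {
        // Both sides of a range must parse; half-open ranges fail here.
        Some((start, end)) => (
            start.trim().parse::<u32>().ok()?,
            end.trim().parse::<u32>().ok()?,
        ),
        // A single page is a range of length one.
        None => {
            let page = pages.parse::<u32>().ok()?;
            (page, page)
        }
    };
    (start >= 1 && start <= end).then_some((start, end))
}
```

Splitting on the first `-` means `-3-5` yields an empty left side, which fails to parse instead of wrapping.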
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`error_taxonomy.rs` is the central typed-error layer — every
subsystem (client, tools, runtime, UI) routes through
`ErrorEnvelope` and `classify_error_message` to decide retry
policy, UI colour, and offline-flip semantics. Today it ships with
zero tests, so a single innocuous keyword reordering could shift
classification across the whole tree.
Adds 17 unit tests:
* One per category (Network, Authentication, Authorization,
RateLimit, Timeout, InvalidInput, Parse, State, Tool, Internal)
exercising the keyword variants the function intends to catch —
e.g. context-overflow phrasings ("maximum context length",
"context_length_exceeded", "prompt is too long", the OpenAI
"you requested … the maximum is" wording, "context window"),
HTTP 5xx with various spacing rules (502 / 503 / 504, leading
space, trailing space, exact match, embedded), and 429/quota
rate-limit phrasings.
* Three precedence tests pinning the load-bearing ordering:
InvalidInput beats Tool (so a "tool returned: maximum context
length" still surfaces as a /compact-able invalid input),
Timeout beats Network (so "504 Gateway Timeout" classifies as
Timeout because its retry semantics are gentler than Network's),
and RateLimit beats Authentication (so a 429 with API token
phrasing doesn't get misrouted to auth-failure handling).
* Unicode handling: a Chinese error message that still mentions
"context length" hits InvalidInput; a pure-Chinese unknown
message falls through to Internal.
* Display impls round-trip through their snake_case wire form so
consumers depending on the labels can't be silently broken.
Zero behaviour change; only tests + one comment pinning the 504
precedence rule.
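The three precedence rules can be illustrated with a heavily simplified classifier. The keyword lists below are a small assumed subset, the enum omits several real categories, and `has_any` is a hypothetical helper; only the relative ordering of the branches reflects this commit.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ErrorCategory {
    InvalidInput,
    RateLimit,
    Timeout,
    Network,
    Authentication,
    Tool,
    Internal,
}

fn has_any(haystack: &str, needles: &[&str]) -> bool {
    needles.iter().any(|needle| haystack.contains(*needle))
}

/// Branch order encodes precedence: InvalidInput before Tool,
/// Timeout before Network, RateLimit before Authentication.
fn classify_error_message(message: &str) -> ErrorCategory {
    let m = message.to_lowercase();
    if has_any(&m, &["maximum context length", "context_length_exceeded", "context length"]) {
        ErrorCategory::InvalidInput
    } else if has_any(&m, &["429", "rate limit", "quota"]) {
        ErrorCategory::RateLimit
    } else if has_any(&m, &["timeout", "timed out", "504"]) {
        ErrorCategory::Timeout
    } else if has_any(&m, &["502", "503", "connection"]) {
        ErrorCategory::Network
    } else if has_any(&m, &["unauthorized", "api token", "api key"]) {
        ErrorCategory::Authentication
    } else if has_any(&m, &["tool returned", "tool failed"]) {
        ErrorCategory::Tool
    } else {
        ErrorCategory::Internal
    }
}
```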
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Flash-router fallback heuristic `auto_model_heuristic` only matched
English complexity keywords (`refactor`, `architecture`, `design`,
`debug`, `security`, `review`, `audit`, `migrate`, `optimize`,
`rewrite`, `implement`, `analyze`). A Chinese-speaking user typing
"帮我重构这个模块" or "审计安全漏洞" silently fell through to the
short/long-message length branches and usually landed on Flash for
work that obviously needs Pro-grade reasoning. This mirrors the
companion gap in `auto_reasoning::select` and shares its root cause.
Extracts the array into a `COMPLEX_KEYWORDS` constant and adds the
Simplified and Traditional Chinese counterparts for each English
keyword:
* refactor → 重构 / 重構
* architecture → 架构 / 架構
* design → 设计 / 設計
* debug → 调试 / 調試
* security → 安全
* review → 审查 / 審查
* audit → 审计 / 審計
* migrate → 迁移 / 遷移
* optimize → 优化 / 優化
* rewrite → 重写 / 重寫
* implement → 实现 / 實現
* analyze → 分析
CJK matches the literal form because the existing `to_lowercase()`
is a no-op for those scripts. English keywords are byte-identical to
before, so English-only behaviour doesn't shift.
Three new tests cover Simplified and Traditional Chinese keyword
routing to Pro, plus a sanity test that short non-keyword Chinese
prose still gets the cost-saving Flash fallback.
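A sketch of the keyword check, with an abbreviated list. `mentions_complex_work` is a hypothetical name, and the real heuristic also has the length branches mentioned above.

```rust
/// Abbreviated list; the shipped constant carries every keyword above.
const COMPLEX_KEYWORDS: &[&str] = &[
    "refactor", "audit", "security",
    "重构", "重構", "审计", "審計", "安全",
];

/// True when the message contains any complexity keyword.
/// Lowercasing only affects the Latin keywords; CJK text is matched literally.
fn mentions_complex_work(message: &str) -> bool {
    let lowered = message.to_lowercase();
    COMPLEX_KEYWORDS.iter().any(|keyword| lowered.contains(*keyword))
}
```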
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`auto_reasoning::select` is the per-turn classifier that picks
`reasoning_effort` for `reasoning_effort = "auto"`. Today it only
recognises English keywords (`debug`, `error`, `search`, `lookup`),
so a user typing in Chinese or Japanese never trips the tier shifts:
"帮我调试代码" stays on `High` instead of escalating to `Max`,
"搜索一下文件" stays on `High` instead of dropping to `Low`. For a
non-English Auto-mode user that's both wrong-side-of-cheap and
wrong-side-of-careful on every turn.
Extracts the keyword sets into `HIGH_EFFORT_KEYWORDS` and
`LOW_EFFORT_KEYWORDS` constants and adds the Chinese / Japanese
vocabulary that maps to the same intents:
* HIGH (→ `Max`): 调试 / 错误 / 报错 / 出错 / 崩溃 / 調試 / 錯誤
in Chinese; デバッグ / エラー / バグ in Japanese.
* LOW (→ `Low`): 搜索 / 查找 / 查询 in Chinese; 検索 in Japanese.
Latin lowercase is preserved (the caller still lowercases the
message), and CJK matches the literal form because CJK has no case.
Four new tests cover Chinese debug keywords, Japanese debug keywords,
Chinese search keywords, the single Japanese search keyword, and a
sanity test that ordinary CJK prose (without keyword hits) still
returns `High` — matching the English-only behaviour the function
already had.
All previous tests (`subagent_returns_low`, `debug_or_error_returns_max`,
`search_or_lookup_returns_low`, `default_returns_high`) continue to
pass — the original English-only paths are unchanged.
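A simplified sketch of the tier selection. The keyword sets are the ones listed above; checking the high-effort set before the low-effort set is an assumption, and the sub-agent branch is omitted.

```rust
#[derive(Debug, PartialEq)]
enum ReasoningEffort {
    Low,
    High,
    Max,
}

const HIGH_EFFORT_KEYWORDS: &[&str] = &[
    "debug", "error",
    "调试", "错误", "报错", "出错", "崩溃", "調試", "錯誤",
    "デバッグ", "エラー", "バグ",
];

const LOW_EFFORT_KEYWORDS: &[&str] = &["search", "lookup", "搜索", "查找", "查询", "検索"];

/// Picks a reasoning tier from keyword hits; defaults to `High`.
fn select(message: &str) -> ReasoningEffort {
    let lowered = message.to_lowercase();
    let hits = |keywords: &[&str]| keywords.iter().any(|k| lowered.contains(*k));
    if hits(HIGH_EFFORT_KEYWORDS) {
        ReasoningEffort::Max
    } else if hits(LOW_EFFORT_KEYWORDS) {
        ReasoningEffort::Low
    } else {
        ReasoningEffort::High
    }
}
```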
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
travel into every session, ideally merged with a project's local
AGENTS.md when both exist. Maintainer agreed:
> yes that makes sense! am working on getting this organizational
> structure better today so that worktrees etc can feel like an
> intended way of using this.
The fallback path already loaded the global file when no workspace
context existed, but dropped it silently the moment a project
AGENTS.md showed up. After this PR:
* Both files present → merged. The global block is prepended with a
labelled HTML-style fence (`<!-- global: /home/u/.deepseek/AGENTS.md -->`),
then the project block follows with its own fence
(`<!-- project (overrides global where they conflict) -->`). Order
is global-first so workspace rules read last and win "last word"
precedence with the model when they disagree.
* Only project file present → unchanged from before.
* Only global file present → unchanged from before (still acts as a
fallback). The merge framing is suppressed in the global-only case
so the prompt stays minimal.
`source_path` continues to point at the more-specific file (project
> global > nothing) because that's the path the user is likely to
edit when they want to override something.
Two tests:
* `test_local_and_global_agents_merge_when_both_exist` —
the actual #1157 scenario. Asserts both blocks are present, global
precedes project, and the merge-framing label appears between them.
* `test_global_agents_only_no_project_unchanged_fallback` — sanity
check that the global-only path doesn't accidentally inherit the
merge framing.
The pre-existing `test_load_global_agents_when_project_has_no_context`
still passes, so the global-as-fallback contract is preserved.
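The three cases can be sketched as a single match. `merge_agents` is a hypothetical name and the exact whitespace between blocks is assumed; the fence strings are the ones quoted above.

```rust
/// `global` is `(path, body)`; `project` is the project AGENTS.md body.
fn merge_agents(global: Option<(&str, &str)>, project: Option<&str>) -> Option<String> {
    match (global, project) {
        // Both present: global first, project last so it gets the "last word".
        (Some((path, global_body)), Some(project_body)) => Some(format!(
            "<!-- global: {path} -->\n{global_body}\n\n\
             <!-- project (overrides global where they conflict) -->\n{project_body}"
        )),
        // Only one present: pass it through without merge framing.
        (None, Some(body)) | (Some((_, body)), None) => Some(body.to_string()),
        (None, None) => None,
    }
}
```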
Refs #1157
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#964 reports that `web_search` returns garbage results — every query
in their reproduction case returned eight entries from a single low-
quality forum domain (`*.forumgratuit.org`) regardless of input. The
root cause is upstream: when Bing's scraping endpoint serves a stuffed
page (often when our request looks too bot-like or the query falls
into a degraded bucket), the parser happily extracts the b_algo
entries and the model receives the junk as authoritative search
results.
Adds a `is_likely_spam_results` heuristic that runs after both Bing
and DDG parsers. When 60% or more of the parsed entries share the
same registrable root domain (with at least three entries to avoid
false positives on legitimate two-link answers), the batch is
discarded. The existing "no results" handling then surfaces a clean
error message to the model instead of routing it toward spam.
`root_domain` strips subdomains so `astralia.forumgratuit.org` and
`russia.forumgratuit.org` collapse to `forumgratuit.org` for the
purpose of dominance counting; eTLD+1 is approximated by keeping the
last two labels, which is close enough for the threshold check.
Five new tests cover the threshold (3-of-5 trips, 2-of-5 doesn't),
short-batch passthrough, normal diverse SERPs (Wikipedia + SO +
Reddit) staying through, and the precise spam reproduction from #964.
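A sketch of the dominance check operating on bare host names. It reads "at least three entries" as a minimum batch size, which is an assumption, and the real function takes parsed result entries instead of host strings.

```rust
use std::collections::HashMap;

/// Approximates eTLD+1 by keeping the last two labels of the host.
fn root_domain(host: &str) -> String {
    let labels: Vec<&str> = host.trim_end_matches('.').split('.').collect();
    let start = labels.len().saturating_sub(2);
    labels[start..].join(".").to_lowercase()
}

/// True when one root domain accounts for 60% or more of a batch of three or more.
fn is_likely_spam_results(hosts: &[&str]) -> bool {
    if hosts.len() < 3 {
        return false;
    }
    let mut counts: HashMap<String, usize> = HashMap::new();
    for host in hosts {
        *counts.entry(root_domain(host)).or_insert(0) += 1;
    }
    let dominant = counts.values().copied().max().unwrap_or(0);
    // Integer form of dominant / total >= 0.60.
    dominant * 100 >= hosts.len() * 60
}
```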
Refs #964
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1118 reports that even after configuring the locale to Chinese, V4
keeps emitting English `reasoning_content` (the thinking block) when
the surrounding code/error logs are English-heavy. Maintainer agreed
the prompt needs editing.
The existing language directive already said "both for `reasoning_content`
and for the final reply", but V4 falls into a failure mode where it
mirrors the user message for the final answer while quietly defaulting
to English for thinking. Three additions to `crates/tui/src/prompts/base.md`
sharpen the rule:
1. **Bold the "must both be in Simplified Chinese" requirement**, and
add the failure-mode escape hatches the prompt previously left
implicit ("even when the surrounding system prompt is in English,
and even when the task context [...] is overwhelmingly English").
2. **Spell out the mid-session-switch rule for `reasoning_content`**
explicitly. Today the prompt says "switch with them" but doesn't
reinforce that this includes thinking — V4 sometimes carries the
previous turn's reasoning language forward.
3. **Add an explicit-override clause** for the opposite preference
(#1118 commenter pmsleepcheck preferred English thinking for token
cost). Users can say "think in English" / "用英文思考" and the
model honours that until the next override. The final reply still
tracks the user's message language — only thinking is overridable.
Adds `language_section_carries_reasoning_content_directives_for_1118`
pinning the four load-bearing phrases ("reasoning_content",
"must both be in Simplified Chinese", "overwhelmingly English", and
both English + Chinese override examples) so a future innocuous edit
can't quietly drop them.
The existing `system_prompt_for_mode_with_context_is_byte_stable_for_unchanged_workspace`
test still passes, so byte-stability for a fixed session is intact.
Refs #1118
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add explicit `permissions: contents: read` (least-privilege)
- Bump `actions/checkout@v3` → `@v4`
- Narrow trigger from `on: [push]` to `on: push: branches: [main]` + `tags: ['v*']`
Matches the hardening convention used by every other workflow in the repo.
#1395 reports that Ctrl+R in project B restores a session that
belongs to project A — the picker was calling `list_sessions()` and
showing every session on disk regardless of where the user is. With
hundreds of past sessions across many repos, the first hit on the
"most recent" sort is rarely the one from the project the user just
opened.
`SessionMetadata.workspace` is already persisted, so the data needed
to filter is there. This PR:
1. Adds a `workspace_scope` field to `SessionPickerView` and a
`show_all_workspaces` toggle. `SessionPickerView::new` now takes
`&Path` so every caller is forced to pass a scope.
2. Filters `filtered` to sessions whose recorded `workspace`
canonicalises to the same path as the active workspace. Both
sides go through `std::fs::canonicalize` so a symlinked or
relative checkout matches its canonical form.
3. Adds an `a` keybinding inside the picker to flip
`show_all_workspaces`, with a status-line readout
("scoped to this workspace" / "showing sessions from every
workspace"). The user can always escape the scope if they need
to.
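The filtering rule above can be sketched as follows. This is an illustration under assumptions: the `SessionMetadata` shape is reduced to two fields, `visible_sessions` is a hypothetical free function rather than the picker's real method, and falling back to the raw path when `canonicalize` fails is a choice made for the sketch.

```rust
use std::path::{Path, PathBuf};

/// Reduced stand-in for the persisted metadata (assumed shape).
pub struct SessionMetadata {
    pub id: String,
    pub workspace: PathBuf,
}

/// Canonicalise when possible; keep the raw path if the directory is gone.
fn canonical_or_raw(path: &Path) -> PathBuf {
    std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}

/// Both sides go through the same normalisation before comparison.
pub fn same_workspace(a: &Path, b: &Path) -> bool {
    canonical_or_raw(a) == canonical_or_raw(b)
}

/// `scope == None` or `show_all_workspaces == true` keeps every session.
pub fn visible_sessions<'a>(
    sessions: &'a [SessionMetadata],
    scope: Option<&Path>,
    show_all_workspaces: bool,
) -> Vec<&'a SessionMetadata> {
    match scope {
        Some(active) if !show_all_workspaces => sessions
            .iter()
            .filter(|s| same_workspace(&s.workspace, active))
            .collect(),
        _ => sessions.iter().collect(),
    }
}
```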
Three new tests:
- `workspace_scope_filters_sessions_to_current_project` —
reproduces the #1395 scenario (sessions in /tmp/project-a vs
/tmp/project-b; the picker only surfaces the matching project).
- `workspace_scope_toggle_a_expands_to_all_workspaces` — `a` flips
back and forth; status announces the new mode.
- `workspace_scope_none_means_show_all` — the historical
unscoped behaviour is still reachable when the caller passes
no workspace (used for tests + future opt-out).
Updates the two call sites (`ui.rs` Ctrl+R handler and
`commands/session.rs` `/sessions [show]`) to pass
`&app.workspace`.
Closes #1395
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace + per-crate path-dep version pins, npm wrapper, and
`deepseekBinaryVersion` all advance from 0.8.27 → 0.8.28. Lockfile
refreshed via `cargo update --workspace --offline`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1365 (cherry-picked into v0.8.28) introduced
`term_program_test_guard` as a fresh module-local
`static Mutex<()>`, mirroring the existing
`no_animations_test_guard`. Both serialize their own family of
tests but not with each other — so under cargo's parallel runner,
a `NO_ANIMATIONS=1` leak from one family lands in the env at
the exact moment a `TERM_PROGRAM=iTerm.app` test calls the shared
`apply_env_overrides`, flipping `low_motion` to true and failing
`non_vscode_term_program_does_not_force_low_motion`.
Both guards now return `crate::test_support::lock_test_env()`
(the same fold the v0.8.28 test-stabilization commit applied to
the EnvGuard family in `commands/config.rs`, `commands/network.rs`,
and `tools/recall_archive.rs`). This serializes the two test
groups with each other and with every other env-mutating test in
the suite, eliminating the cross-test env-var race.
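For readers unfamiliar with the pattern, here is a rough sketch of a process-wide test lock and a guard that owns it. `lock_test_env` matches the name used above, but the body, the `EnvGuard::set` constructor, and the poison-recovery choice are assumptions, not the crate's actual code.

```rust
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

/// One lock shared by every env-mutating test in the process.
static TEST_ENV: Mutex<()> = Mutex::new(());

pub fn lock_test_env() -> MutexGuard<'static, ()> {
    // A panicking test poisons the mutex; the protected data is `()`,
    // so recovering the guard is harmless.
    TEST_ENV.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Sets one variable and restores the previous value on drop, holding
/// the shared lock for its whole lifetime.
pub struct EnvGuard {
    key: &'static str,
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

impl EnvGuard {
    #[allow(unused_unsafe)] // `set_var` is only unsafe in edition 2024
    pub fn set(key: &'static str, value: &str) -> Self {
        let lock = lock_test_env();
        let previous = std::env::var_os(key);
        unsafe { std::env::set_var(key, value) };
        Self { key, previous, _lock: lock }
    }
}

impl Drop for EnvGuard {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // Restore first; the `_lock` field is released after this body runs.
        match self.previous.take() {
            Some(value) => unsafe { std::env::set_var(self.key, value) },
            None => unsafe { std::env::remove_var(self.key) },
        }
    }
}
```

Because the guard owns the `MutexGuard`, a test that constructs it must not also call `lock_test_env()` itself, or the second acquisition would block forever.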
`save_api_key_for_openrouter_writes_provider_table` was failing
intermittently for the same reason — a concurrent env mutation
in an unrelated test was clobbering HOME / DEEPSEEK_CONFIG_PATH
in the window between our `EnvGuard::new` and
`save_api_key_for`'s `default_config_path()` read. With the
broader serialization in place, the race window closes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1270 from @SamhandsomeLee landed the function definitions and
both regression tests as part of v0.8.27, but the wiring was
incomplete: `add_local_reference_completions` was never called
from `completions()`, and `build_file_index` never walked
`local_reference_paths`. Hunter marked the two tests `#[ignore]`
with a "v0.8.28 follow-up" trailer in `fe0673d68`.
This completes the wiring:
* `Workspace::completions()` now calls
`add_local_reference_completions` for both the diverging-cwd
branch and the workspace-root branch, after the existing
`walk_for_completions`. The helper is a no-op unless the needle
starts with `.` or contains `/` / `\`, so prose mentions skip
the extra walk.
* `Workspace::build_file_index()` now walks `local_reference_paths`
after the curated dot-dir whitelist (`.deepseek`, `.cursor`,
`.claude`, `.agents`), so explicit user paths into other
gitignored dirs (e.g. `.generated/specs/device-layout.md`)
fuzzy-resolve too. Honors `FILE_INDEX_MAX_ENTRIES` so the
#697 walk-cap still bounds first-turn latency.
* Drops `#[allow(dead_code)]` from the five items
(`LOCAL_REFERENCE_SCAN_LIMIT`, `add_local_reference_completions`,
`should_try_local_reference_completion`,
`local_reference_paths`, `should_skip_local_reference_dir`) and
un-ignores both `working_set` regression tests:
`workspace_completions_surface_explicit_hidden_and_ignored_paths`
and
`fuzzy_index_resolves_hidden_and_ignored_files_except_deepseekignored`.
Both tests pass. `.deepseekignore` entries remain blocked from
both completion and basename fuzzy-resolution paths because
`local_reference_paths` adds `.deepseekignore` as a custom-ignore
file on the walker.
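The needle gate described above can be sketched as a single predicate. The function name comes from this commit; the signature and body are a reading of the description, not a copy of the implementation.

```rust
/// Only path-like needles trigger the extra local-reference walk:
/// a leading `.` or any `/` or `\` separator. Plain words skip it.
pub fn should_try_local_reference_completion(needle: &str) -> bool {
    needle.starts_with('.') || needle.contains('/') || needle.contains('\\')
}
```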
Closes the v0.8.28 follow-up trailer left on `fe0673d68`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The steer channel (rx_steer) is only monitored inside
handle_deepseek_turn — when the engine is idle (no active turn),
Ctrl+Enter sent the message into the void. It appeared in the
transcript via the local mirror in steer_user_message, but the LLM
never received it, and the next handle_send_message would drain it as
a "stale steer".
Fix: check app.is_loading in the Ctrl+Enter handler. When the engine
is busy, steer into the current turn as before. When idle, send via
submit_or_steer_message so the message goes through the regular
Op::SendMessage path.
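The routing decision reduces to one branch on the loading flag. A small sketch, with `CtrlEnterRoute` and `route_ctrl_enter` as hypothetical names standing in for the handler logic:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CtrlEnterRoute {
    /// A turn is active, so rx_steer is being monitored.
    SteerCurrentTurn,
    /// The engine is idle; go through the regular send path instead.
    SubmitAsMessage,
}

pub fn route_ctrl_enter(is_loading: bool) -> CtrlEnterRoute {
    if is_loading {
        CtrlEnterRoute::SteerCurrentTurn
    } else {
        CtrlEnterRoute::SubmitAsMessage
    }
}
```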
Wrap fallible draw operations in closures so that ESC[?2026l is always
sent regardless of whether an intermediate step (write_all, flush,
clear, draw) returns an error. Without this, a failing ? would return
early and leave the terminal stuck in synchronized-update mode with a
frozen screen.
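A sketch of the closure-wrapping idea, written against a generic `Write` so it can be exercised without a terminal. `with_synchronized_update` is a hypothetical helper name; only the two escape sequences are taken from this message.

```rust
use std::io::{self, Write};

pub const BEGIN_SYNC: &[u8] = b"\x1b[?2026h";
pub const END_SYNC: &[u8] = b"\x1b[?2026l";

/// Runs `draw` between the begin/end markers. The end marker is written
/// even when `draw` fails; the draw error is reported in preference to
/// any error from writing the end marker.
pub fn with_synchronized_update<W: Write>(
    out: &mut W,
    draw: impl FnOnce(&mut W) -> io::Result<()>,
) -> io::Result<()> {
    out.write_all(BEGIN_SYNC)?;
    // No `?` here: an early return would skip the end marker.
    let drawn = draw(out);
    let ended = out.write_all(END_SYNC).and_then(|()| out.flush());
    drawn.and(ended)
}
```

The key design point is that the fallible steps live inside the closure, so their `?` returns only leave the closure, not the function that owns the cleanup.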
Review feedback from gemini-code-assist[bot] on #1361.
Wrap terminal.draw() and reset_terminal_viewport() with ESC[?2026h/l
so GPU-accelerated terminals (Ghostty, VS Code, Kitty) defer rendering
until the full frame is written, eliminating intermediate-frame flicker.
Merge viewport-reset + draw into a single sync batch to avoid a
visible blank frame between the two operations.
Best-effort — unsupported terminals silently ignore the sequences.
Fixes #1352
Each tool description now names the shell commands it replaces (cat/head/
tail/sed/grep/find/curl/heredocs in exec_shell), the return shape, and the
limits. Steering language routes V4 toward our typed tools and away
from shell footguns.
Tools updated: read_file, write_file, edit_file, list_dir, grep_files,
file_search, web_search, apply_patch, fetch_url.
Removes the unused legacy normal.txt / plan.txt / yolo.txt prompt
templates and the YOLO_PROMPT / PLAN_PROMPT constants. Both constants
were referenced only by their own self-tests in prompts.rs; AGENT_PROMPT
is preserved (its companion .txt is in the scope of a separate issue).
All description strings stay under 1024 chars (longest: 350) with no
embedded newlines or Markdown headers, so the cached tool catalogue
stays prefix-stable for V4's KV cache.
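The constraints on description strings could be checked mechanically. A sketch, assuming a hypothetical `check_description` helper; the 1024-char limit and the no-newline, no-header rules come from this message, the rest is illustrative.

```rust
pub const MAX_DESCRIPTION_CHARS: usize = 1024;

/// Rejects descriptions that would break the constraints listed above.
pub fn check_description(desc: &str) -> Result<(), &'static str> {
    if desc.chars().count() >= MAX_DESCRIPTION_CHARS {
        return Err("description must stay under 1024 chars");
    }
    if desc.contains('\n') || desc.contains('\r') {
        return Err("description must not contain embedded newlines");
    }
    // Without newlines, a Markdown header can only sit at the very start.
    if desc.trim_start().starts_with('#') {
        return Err("description must not start with a Markdown header");
    }
    Ok(())
}
```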
Closes #711
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous comment incorrectly suggested that a user-set low_motion=false
in the settings file could override the TERM_PROGRAM=vscode detection.
In fact, apply_env_overrides() runs after disk load and unconditionally
sets the flag, identical to the existing NO_ANIMATIONS precedent.
Update the comment to state the actual precedence clearly.
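To make the precedence concrete, here is a reduced sketch. The `Settings` struct is cut down to two fields and the environment is passed in as a lookup closure so the example does not touch the real process env; both are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Settings {
    pub low_motion: bool,
    pub fancy_animations: bool,
}

/// Runs after the settings file has been loaded. The detection does not
/// consult the value already present, so a file-level `false` is overwritten.
pub fn apply_env_overrides(settings: &mut Settings, lookup: impl Fn(&str) -> Option<String>) {
    if lookup("TERM_PROGRAM").as_deref() == Some("vscode") {
        settings.low_motion = true;
        settings.fancy_animations = false;
    }
}
```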
Signed-off-by: CrepuscularIRIS <serenitygp@qq.com>
VS Code's integrated terminal sets TERM_PROGRAM=vscode. Its compositor
cannot keep up with the default 120 FPS redraw rate, producing rapid
flickering on some machines while other terminal apps (Terminal.app,
iTerm2) are unaffected (#1356).
Extend apply_env_overrides() to detect TERM_PROGRAM=vscode and
automatically activate low_motion mode (30 FPS cap, no fancy animations),
matching the existing NO_ANIMATIONS env-var pattern. This is a zero-
config fix: users running in VS Code get a stable display with no
settings change required. As with NO_ANIMATIONS, apply_env_overrides()
runs after the settings file is loaded and sets the flag unconditionally,
so the auto-detection takes precedence over a file-level
low_motion = false.
Two tests added:
- vscode_term_program_forces_low_motion_on: TERM_PROGRAM=vscode enables
low_motion and disables fancy_animations.
- non_vscode_term_program_does_not_force_low_motion: other well-known
terminal programs (iTerm.app, Apple_Terminal, WezTerm, xterm-256color)
are unaffected.
Signed-off-by: CrepuscularIRIS <serenitygp@qq.com>
`provider_switch_clears_turn_cache_history` in `tui/ui/tests.rs`
calls `switch_provider(..., ApiProvider::Ollama, ...)`, which
internally persists the new provider via `Settings::save()` —
writing `default_provider = "ollama"` to
`~/Library/Application Support/deepseek/settings.toml` (or its
`dirs::data_dir()` equivalent on Linux/Windows). Because the
test's `create_test_app` did not isolate `HOME` / `USERPROFILE`,
each run silently overwrote the developer's real preferences.
The contamination then leaked back into adjacent picker tests:
`model_picker::tests::arrow_keys_move_within_focused_pane`
became order-sensitive, passing when it happened to run before
`provider_switch_clears_turn_cache_history` and failing after,
because Ollama is a pass-through provider and
`ModelPickerView` then hid the DeepSeek model rows.
Two fixes:
* `tui/ui/tests.rs::provider_switch_clears_turn_cache_history`
now wraps the test in a `HomeGuard` that redirects HOME /
USERPROFILE to a tempdir for the test's lifetime and restores
the original values on drop. The guard owns the
`test_support::lock_test_env()` mutex so clippy's
`await_holding_lock` lint stays quiet through the
`.await` (the pattern mirrors `tools::recall_archive::HomeGuard`).
* `tui/model_picker.rs::create_test_app` now also pins
`app.api_provider = ApiProvider::Deepseek` alongside the
existing `app.model` / `app.reasoning_effort` overrides, so
the picker tests stop depending on whatever `default_provider`
happens to be in the developer's `settings.toml` for any other
reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an engine error occurs (e.g. refusing an insecure HTTP base URL),
the same error was displayed twice: once as a HistoryCell::Error in the
transcript and again as a sticky toast in the footer/composer area.
The toast was created because apply_engine_error_to_app set
status_message, which sync_status_message_to_toasts() converted into a
sticky toast (15s TTL) since the text contained "error"/"failed".
Add turn_error_posted flag to App, set when an EngineEvent::Error is
posted to the transcript, reset on TurnStarted. The TurnComplete error
handler and apply_engine_error_to_app now skip setting status_message
when the flag is set, keeping the error display in the transcript only.
The auth+env_only onboarding path retains its status_message since that
flow relies on it to prompt the user for a saved API key.
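The flag's lifecycle can be sketched as a tiny state machine. The `App` here is a reduced stand-in and the `on_*` method names are hypothetical; only `turn_error_posted`, `status_message`, and `apply_engine_error_to_app` are names from this message. The auth+env_only exception is not modelled.

```rust
#[derive(Default)]
pub struct App {
    pub turn_error_posted: bool,
    pub transcript_errors: Vec<String>,
    pub status_message: Option<String>,
}

impl App {
    /// TurnStarted: a new turn may surface its own error.
    pub fn on_turn_started(&mut self) {
        self.turn_error_posted = false;
    }

    /// EngineEvent::Error: the transcript cell is the primary display.
    pub fn on_engine_error(&mut self, message: &str) {
        self.transcript_errors.push(message.to_string());
        self.turn_error_posted = true;
    }

    /// Skips the status message (and thus the toast) if the transcript
    /// already shows this turn's error.
    pub fn apply_engine_error_to_app(&mut self, message: &str) {
        if !self.turn_error_posted {
            self.status_message = Some(message.to_string());
        }
    }
}
```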
Three additions to base.md's Verification Principle section:
1. Before reporting a task as complete, verify the result when
practical; if not verified, say so explicitly.
2. Preserve only key facts from tool results (paths, errors, exit
status, cache values); do not copy large raw outputs.
3. Inspect the error before retrying a failed tool call; do not repeat
the identical action blindly.
`WorkingSet::build_file_index` walks the workspace tree (depth 6) plus
all `DISCOVERY_ALWAYS_DIRS` (depth 5) the first time `fuzzy_resolve` is
called. On huge workspaces that walk dominates the first turn's wall
clock, surfacing as the ~10-second `Working...` hang reported in #697.
Adds a `FILE_INDEX_MAX_ENTRIES = 50_000` cap. When the walk produces
more than 50K (file or directory) entries the index is returned early
with a warning. A surplus entry simply isn't fuzzy-resolvable; literal
paths still resolve via the existing fallback so functionality is
preserved on outsized workspaces.
50K is well above any realistic project's depth-6 entry count, so for
typical users the cap is a no-op. The existing `working_set` tests
(26/26) still pass — this is purely a defensive upper bound on a path
that previously had none.
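The early-return behaviour can be illustrated without a real filesystem walk. In this sketch the walker is replaced by any iterator of entries and the cap is a parameter; `build_index` and its `(index, truncated)` return shape are assumptions, only the constant's name and value come from this commit.

```rust
pub const FILE_INDEX_MAX_ENTRIES: usize = 50_000;

/// Collects entries until the cap is hit. The boolean is true only when
/// at least one surplus entry was seen, i.e. the index is incomplete.
pub fn build_index<I>(entries: I, cap: usize) -> (Vec<String>, bool)
where
    I: IntoIterator<Item = String>,
{
    let mut index = Vec::new();
    for entry in entries {
        if index.len() >= cap {
            // The real code logs a warning here; the sketch just reports it.
            return (index, true);
        }
        index.push(entry);
    }
    (index, false)
}
```

A caller would pass `FILE_INDEX_MAX_ENTRIES` as `cap`; a workspace with exactly that many entries is indexed fully and not flagged.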
Refs #697
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cmux typically does not set TERM_PROGRAM; it sets LC_TERMINAL=Cmux instead.
The previous resolve_method() only checked TERM_PROGRAM, causing Cmux users
to fall back to the Bel method instead of OSC 9 notifications (#1281).
Changes:
- Add LC_TERMINAL as a secondary env-var probe in resolve_method(), checked
after TERM_PROGRAM. This picks up Cmux (and any other OSC-9 capable
terminal that sets LC_TERMINAL rather than TERM_PROGRAM).
- Add Cmux to the OSC9_TERMINALS allowlist.
- Document that terminals setting neither env var can force OSC 9 with
[notifications].method = "osc9" in the config file.
- Add two new tests:
- auto_detect_picks_osc9_for_cmux_via_lc_terminal
- auto_detect_picks_osc9_for_wezterm_via_lc_terminal
- Harden existing auto_detect_picks_bel_for_unknown_on_unix to clear
LC_TERMINAL before asserting the Bel fallback, preventing flakiness in
test runner environments where LC_TERMINAL is set to a known terminal.
- Update NotificationsConfig.method doc to mention Cmux and the
LC_TERMINAL probe.
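A simplified sketch of the two-step probe. Several things here are assumptions: the allowlist shows only the two terminals named by the new tests, matching is exact and case-sensitive, an unrecognised TERM_PROGRAM falls through to LC_TERMINAL, and the config override and platform-specific fallbacks are left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Method {
    Osc9,
    Bel,
}

/// Illustrative subset; the real OSC9_TERMINALS list is longer.
const OSC9_TERMINALS: &[&str] = &["Cmux", "WezTerm"];

/// Probes TERM_PROGRAM first, then LC_TERMINAL. The environment is passed
/// in as a closure so the sketch never reads the real process env.
pub fn resolve_method(lookup: impl Fn(&str) -> Option<String>) -> Method {
    for key in ["TERM_PROGRAM", "LC_TERMINAL"] {
        if let Some(value) = lookup(key) {
            if OSC9_TERMINALS.iter().any(|known| *known == value.as_str()) {
                return Method::Osc9;
            }
        }
    }
    Method::Bel
}
```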
Signed-off-by: CrepuscularIRIS <serenitygp@qq.com>
Fixes a cluster of intermittent failures observed on macOS under
parallel test load. Root causes were tests mutating shared global
state (HOME, USERPROFILE, DEEPSEEK_CONFIG_PATH env vars and
~/.deepseek/ filesystem) without holding the process-wide test
lock, plus a few outdated-by-PR assertions and a tight 3s timeout
on Windows CI.
Changes:
* Three EnvGuard / HomeGuard types (commands/config.rs,
commands/network.rs, tools/recall_archive.rs) now acquire
crate::test_support::lock_test_env() and hold the MutexGuard
for their full lifetime, replacing local mutexes that
serialized only within a module. Call sites that previously
acquired lock_test_env() explicitly with `let _lock = ...`
before constructing the guard drop that redundant acquisition;
std::sync::Mutex is not reentrant and double-locking on the
same thread would deadlock.
* settings.rs::config_path_test_guard() now returns the global
test_env lock instead of an isolated module-local mutex.
* model_picker.rs create_test_app() now returns (App, MutexGuard)
so picker tests hold the same lock — eliminates env-var races
with config-mutating tests in adjacent modules.
* task_manager.rs: 4 tests using wait_for_terminal_state bump
3s -> 10s to give Windows CI file-I/O headroom (we saw one
intermittent timeout on the v0.8.27 PR Windows job).
* config.rs: 2 api-key tests now set DEEPSEEK_SECRET_BACKEND=local
so they exercise file-backed storage in CI rather than fail on
Keychain access.
* history.rs: removes streaming_thinking_live_collapses_unless_verbose
which asserted the OLD behavior PR #1390 (#861 RC4) intentionally
changed. The new contract is covered by the three tests PR #1390
added.
* .claude/HANDOFF_v0.8.28_user_issues.md: notes #1394 / PR #1393
as a deferred prompt-reliability enhancement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two independent fixes:
1. **Prompt truthful reporting** (base.md): add explicit rules for honest
outcome reporting — if a tool fails/returns-empty say so; if cache
usage is unobserved treat it as unknown/null, not 0.
2. **Cache usage u64 → Option<u64>** (session.rs): when the API does
not report cache hit/miss tokens, the cumulative SessionUsage
defaulted to 0. Models interpreted this as "no cache hits" rather
than "unknown". Changing to Option<u64> ensures absent cache data
serializes as null in the model context.
Tests added for all three cases: starts None, stays None when API
omits cache, accumulates correctly when API reports cache.
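The three tested cases map onto a small accumulator. This is a sketch with a single hypothetical field, `cache_hit_tokens`, and a hypothetical `record` method; the real `SessionUsage` tracks more than this.

```rust
#[derive(Debug, Default)]
pub struct SessionUsage {
    /// `None` means the API never reported cache data, which is
    /// different from an observed count of zero.
    pub cache_hit_tokens: Option<u64>,
}

impl SessionUsage {
    /// Folds one response's cache report into the running total.
    pub fn record(&mut self, reported: Option<u64>) {
        if let Some(tokens) = reported {
            let so_far = self.cache_hit_tokens.unwrap_or(0);
            self.cache_hit_tokens = Some(so_far + tokens);
        }
    }
}
```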
Closes the visibility gap reported in #1324 ("Thinking 思考内容不能流式
输出,只能等到完全输出后通过 ctrl+O 查看完整思考内容") and root cause 4
of #861.
Today `render_thinking` blanks the body whenever `collapsed && streaming`:
```rust
let body_text = if collapsed && streaming {
String::new()
} else if collapsed { … } else { … };
```
That left the user staring at a "thinking..." placeholder for the
entire reasoning phase — V4-Pro thinking can run for tens of seconds,
so the live transcript looked frozen even though tokens were flowing.
Fix:
1. During `collapsed && streaming` we now render the raw content
instead of blanking. `extract_reasoning_summary` is meaningless
while the block is mid-flight (no completed reasoning to summarise),
so the streaming branch returns the body verbatim.
2. The `> THINKING_SUMMARY_LINE_LIMIT` truncation now drops *head*
lines while streaming, keeping the visible window tracking the live
cursor at the bottom — which is what users expect when watching a
model think.
3. The existing "thinking collapsed; press Ctrl+O for full text"
affordance was gated on `!streaming`; it now renders during
streaming as well, with a slightly different label ("thinking
continues; …") so the user knows there's more content above and
how to reach it.
Three new tests cover the new contract: streaming-collapsed shows
live content, the head is dropped not the tail, and the live
affordance fires when truncated.
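The head-dropping rule from step 2 can be sketched as a slice operation. The limit is passed in because the value of `THINKING_SUMMARY_LINE_LIMIT` is not stated here, and the non-streaming branch simply keeps the first lines as a placeholder for the real summary logic.

```rust
/// Returns the lines to show plus a flag saying whether anything was cut.
/// While streaming, the newest lines win so the window follows the cursor.
pub fn visible_lines<'a>(
    lines: &'a [&'a str],
    limit: usize,
    streaming: bool,
) -> (&'a [&'a str], bool) {
    if lines.len() <= limit {
        return (lines, false);
    }
    if streaming {
        // Drop the head, keep the tail.
        (&lines[lines.len() - limit..], true)
    } else {
        // Placeholder for the collapsed, finished case.
        (&lines[..limit], true)
    }
}
```

The returned flag is what would drive the "thinking continues" affordance during streaming.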
Refs #861 (RC4), closes #1324
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>