Harvested with thanks to @aboimpinto.
Includes the ExternalTool abstraction layer plus follow-up fixes for lossy REPL stdout handling and unquoted unicode git diff paths.
Validation included full CI and focused local checks for non-UTF8 REPL stdout, git_diff, and external_tool behavior.
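For illustration, one plausible reading of the non-UTF8 stdout case is that invalid bytes are replaced rather than allowed to fail the whole read. A minimal sketch of that idea, assuming a hypothetical helper name (`decode_repl_stdout`) that does not come from the codebase:

```python
def decode_repl_stdout(raw: bytes) -> str:
    """Decode REPL stdout without raising on invalid UTF-8.

    Each invalid byte becomes U+FFFD, so one bad byte cannot discard
    the rest of the captured output. (Hypothetical helper.)
    """
    return raw.decode("utf-8", errors="replace")
```

The trade-off is that the replacement is lossy: the original bytes cannot be recovered from the decoded string.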
A small cleanup pass to catch brand mentions that the R5 sweep missed
because they hid in:
- HTTP User-Agent format strings (`Mozilla/5.0 (compatible; deepseek-tui/`
in `client.rs` and `fetch_url.rs`).
- Multi-line error messages whose phrase boundary straddled a line break
("…restart\n deepseek-tui." in `js_execution.rs`,
`tool_catalog.rs`, `repl/runtime.rs`).
- Doc comments mentioning `deepseek-tui` as a binary (`config/src/lib.rs`,
`core/capacity.rs`, `tui/streaming/chunking.rs`, `features.rs`).
- Skill descriptions shipped in `crates/tui/assets/skills/*/SKILL.md`.
- Test fixtures with placeholder paths / git emails
(`tui/external_editor.rs`, `snapshot/repo.rs`).
- `task_manager.rs`'s `cargo test -p deepseek-tui --lib` example.
- `scripts/tencent-lighthouse/doctor.sh` info-line prefix.
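As a rough sketch of how mentions that straddle a line break could be caught, the search below lets any whitespace run (including newlines and indentation) stand in for each space in the phrase. The function name and the sample text are assumptions for illustration, not part of the sweep tooling:

```python
import re
from typing import List


def find_wrapped_mentions(text: str, phrase: str) -> List[int]:
    """Return start offsets where `phrase` occurs, allowing any whitespace
    (including line breaks) wherever the phrase contains a space."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\s+".join(words))
    return [match.start() for match in pattern.finditer(text)]
```

A plain single-line grep for `restart deepseek-tui` would miss the wrapped form; this variant matches both.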
The remaining `deepseek-tui` mentions in the codebase are intentional
(the legacy `[[bin]]` entry in `crates/tui/Cargo.toml`, the legacy
`npm/deepseek-tui/` deprecation shim package, the CNB mirror namespace,
the security email, the legacy bin's shim source file, and historical
CHANGELOG entries) and were preserved per the rebrand anti-scope.
Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the workspace and npm wrapper to 0.8.0. Fixes completed background shell jobs retaining live process handles, which could cause `Too many open files` errors, checkpoint save failures, shell spawn failures, and lag around send/close/Esc. Also includes Windows REPL bootstrap timeout hardening and Cargo/TUNA mirror install docs.
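The handle-retention problem can be sketched in miniature: a job table that keeps the process object after exit also keeps its pipes open. The class below is a hypothetical Python analogue (the real fix lives in the Rust shell manager) that moves exited jobs out of the live table and closes their pipes:

```python
import subprocess
from typing import Dict, List, Tuple


class JobTable:
    """Hypothetical registry of background jobs keyed by an integer id."""

    def __init__(self) -> None:
        self._live: Dict[int, subprocess.Popen] = {}
        self.finished: Dict[int, Tuple[int, str]] = {}
        self._next_id = 0

    def spawn(self, argv: List[str]) -> int:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        job_id = self._next_id
        self._next_id += 1
        self._live[job_id] = proc
        return job_id

    def reap(self) -> None:
        """Record exited jobs, close their pipes, and drop the handles."""
        for job_id, proc in list(self._live.items()):
            if proc.poll() is None:
                continue  # still running; keep the handle
            output = ""
            if proc.stdout is not None:
                output = proc.stdout.read()
                proc.stdout.close()  # release the file descriptor
            self.finished[job_id] = (proc.returncode, output)
            del self._live[job_id]  # only plain data is retained

    def live_count(self) -> int:
        return len(self._live)
```

After `reap()`, a finished job is represented only by its exit code and captured output, so a long session does not accumulate open descriptors.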
The RLM tool used to spawn a fresh `python3 -c "..."` per round and route
sub-LLM calls through a localhost axum sidecar; state persisted only via a
JSON file (lossy: imports and non-JSON values were lost). The model could
also short-circuit by replying with prose and the loop would ship the
prose as if it came from the REPL.
This commit replaces that with one long-lived `python3 -u` subprocess per
turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels.
No more HTTP server, no more port allocation, no more JSON state file —
variables, imports, and any other Python state persist naturally across
rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services
`llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched`
calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs.
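A minimal sketch of the sentinel idea, under stated assumptions: the frame layout (sentinel followed by one JSON object per line) and all function names here are illustrative, not the actual wire format used by `RlmBridge`. The point is that a random per-session prefix lets the host tell RPC requests apart from ordinary `print` output on the same stream:

```python
import json
import uuid
from typing import Dict, List, Tuple


def make_sentinel() -> str:
    """Per-session marker that ordinary output is very unlikely to contain."""
    return f"<<rpc:{uuid.uuid4().hex}>>"


def encode_request(sentinel: str, method: str, payload: Dict) -> str:
    """Child side: one request as a single sentinel-prefixed line.

    json.dumps escapes newlines, so the frame always fits on one line.
    """
    return sentinel + json.dumps({"method": method, "payload": payload})


def split_stream(sentinel: str, lines: List[str]) -> Tuple[List[str], List[Dict]]:
    """Host side: separate plain stdout lines from RPC frames."""
    stdout: List[str] = []
    requests: List[Dict] = []
    for line in lines:
        if line.startswith(sentinel):
            requests.append(json.loads(line[len(sentinel):]))
        else:
            stdout.append(line)
    return stdout, requests
```

In a real bridge the host would answer each request on the child's stdin before reading further; that half is omitted here.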
The system prompt is tightened: the only legal turn shape is one
` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a
sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination
is gone, replaced by `NoCode` which only surfaces after multiple consecutive
empty rounds. `rlm_process` now returns a per-round trace (code summary,
sub-LLM call count, elapsed) so callers can verify the model actually
engaged with `context` rather than guessing from the preview.
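The `NoCode` rule and the per-round trace can be pictured with a small state tracker. This is a sketch only: the threshold of 3 is an assumption (the commit says only "multiple"), and the type and field names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RoundTrace:
    code_summary: str
    sub_llm_calls: int
    elapsed_secs: float


@dataclass
class LoopState:
    max_empty_rounds: int = 3  # assumed threshold, not taken from the source
    empty_streak: int = 0
    trace: List[RoundTrace] = field(default_factory=list)

    def record(self, code: Optional[str], sub_llm_calls: int,
               elapsed_secs: float) -> str:
        """Record one round; return "continue" or "no_code"."""
        if code is None or not code.strip():
            self.empty_streak += 1
            if self.empty_streak >= self.max_empty_rounds:
                return "no_code"
            return "continue"
        self.empty_streak = 0  # any real code resets the streak
        first_line = code.strip().splitlines()[0]
        self.trace.append(RoundTrace(first_line[:80], sub_llm_calls, elapsed_secs))
        return "continue"
```

A caller inspecting `trace` can see whether any round actually issued sub-LLM calls, which is the verification the commit describes.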
Net: -313 lines. 17 new REPL runtime tests cover variable persistence,
import persistence, RPC round-trips, FINAL capture, and error recovery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous /rlm slash command flow had a UI rendering gap (the answer
never made it back to the model's view) and required the user to invoke
it manually. This commit pivots to a tool-call surface and aligns the
in-REPL helpers with the paper authors' canonical reference
(alexzhang13/rlm) so that the same prompts and decomposition patterns
transfer.
New tool: rlm_process
- crates/tui/src/tools/rlm_process.rs
- Inputs: task (small, shown to root LLM each iter as root_prompt) +
exactly one of file_path (workspace-relative, preferred) or content
(inline, capped at 200k chars). Optional child_model and max_depth.
- Loaded across Plan/Agent/YOLO; never deferred via ToolSearch.
- Returns the final answer string + metadata (iterations, duration,
tokens, termination).
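The input contract above ("exactly one of" plus the 200k cap) can be sketched as a validator. The function name, the error messages, and the non-empty `task` check are assumptions; only the exclusivity rule and the cap come from the description:

```python
from typing import Optional

MAX_INLINE_CHARS = 200_000  # inline content cap stated above


def validate_inputs(task: str, file_path: Optional[str] = None,
                    content: Optional[str] = None) -> str:
    """Return which source is in use ("file_path" or "content").

    Raises ValueError when the inputs break the contract.
    """
    if not task.strip():
        raise ValueError("task must not be empty")  # assumed rule
    if (file_path is None) == (content is None):
        raise ValueError("provide exactly one of file_path or content")
    if content is not None and len(content) > MAX_INLINE_CHARS:
        raise ValueError(f"content exceeds {MAX_INLINE_CHARS} characters")
    return "file_path" if file_path is not None else "content"
```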
REPL surface aligned with reference (alexzhang13/rlm):
- Variable name `context` (was PROMPT)
- Code fence ```repl (was ```python; python/py kept as fallback)
- Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm),
rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR,
repl_get/repl_set
- Top-level JSON-serializable user variables auto-persist across rounds
(no repl_set ceremony required)
- FINAL(...) / FINAL_VAR(...) parseable from the model's raw response
text (parse_text_final), in addition to the in-REPL sentinel path.
Code-fenced occurrences are correctly ignored to prevent false hits.
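To show why fenced occurrences have to be skipped, here is a sketch of text-level FINAL detection. It borrows the name `parse_text_final` for readability but is not the Rust implementation; the single-line restriction and the quote stripping are simplifying assumptions:

```python
import re
from typing import Optional, Tuple

FENCE = "`" * 3  # built programmatically to keep this listing fence-safe
FENCED_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
FINAL_LINE = re.compile(r"^[ \t]*(FINAL_VAR|FINAL)\((.*)\)[ \t]*$", re.MULTILINE)


def parse_text_final(response: str) -> Optional[Tuple[str, str]]:
    """Return (kind, argument) for the first FINAL outside any code fence."""
    prose_only = FENCED_BLOCK.sub("", response)  # drop fenced code first
    match = FINAL_LINE.search(prose_only)
    if match is None:
        return None
    kind, arg = match.group(1), match.group(2).strip()
    quoted = len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "\"'"
    if kind == "FINAL" and quoted:
        arg = arg[1:-1]  # strip one pair of matching quotes
    return kind, arg
```

Without the fence-stripping step, a `FINAL(...)` call that the model wrote as code to be executed would be misread as a text-level answer. An unterminated fence is not handled by this sketch.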
Sidecar (axum, 127.0.0.1:0):
- Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout,
cap 16 prompts per batch). Mirrors the reference's batched semantics.
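The batched semantics can be sketched as an order-preserving parallel map with a size cap. Whether an oversized batch is rejected or split is not stated, so the rejection below is an assumption, as are the names:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_BATCH = 16  # per-batch cap stated above


def run_batch(prompts: List[str], call: Callable[[str], str]) -> List[str]:
    """Run `call` on every prompt in parallel, preserving input order."""
    if not prompts:
        return []
    if len(prompts) > MAX_BATCH:
        # Assumed behaviour: reject rather than silently truncate.
        raise ValueError(f"batch of {len(prompts)} exceeds cap of {MAX_BATCH}")
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(call, prompts))
```

`pool.map` yields results in submission order, so the caller can zip answers back to prompts regardless of which call finishes first.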
Other:
- System prompt rewritten with reference's strategy patterns
(PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE
decomposition via rlm_query → programmatic compute + LLM interp).
- Strict termination loop unchanged: must emit ```repl or text-level
FINAL each round; one fence-less round → reminder, two → DirectAnswer.
- /rlm slash command remains for manual debug; description points the
model toward rlm_process for the in-agent flow.
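The CHUNK + map-reduce pattern named in the prompt rewrite might look like the following inside the REPL. This is a sketch: `llm_query_batched` and `llm_query` are passed in as parameters so the example runs without a model, and the prompt wording is invented:

```python
from typing import Callable, List


def chunk_text(text: str, size: int) -> List[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def map_reduce(context: str, question: str, size: int,
               llm_query_batched: Callable[[List[str]], List[str]],
               llm_query: Callable[[str], str]) -> str:
    """Ask the question of each chunk, then combine the partial answers."""
    chunks = chunk_text(context, size)
    # Map step: one sub-LLM prompt per chunk, issued as a single batch.
    partials = llm_query_batched(
        [f"{question}\n\n---\n{chunk}" for chunk in chunks]
    )
    # Reduce step: one more call that sees only the partial answers.
    return llm_query(f"{question}\n\nPartial answers:\n" + "\n".join(partials))
```

With a per-batch cap in place, a real version would also need to split `chunks` into groups no larger than the cap before calling the batched helper.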
Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5.
Gates green: cargo fmt, cargo clippy --all-targets --all-features
--locked -D warnings, cargo test --workspace --all-features --locked
(all pass), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=
-Dwarnings cargo doc --workspace --no-deps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.6.3 RLM loop had Algorithm 1's outer shape but the substrate was
non-functional: `llm_query()` was a Python stub that returned a hardcoded
string and `child_model` was bound with an underscore prefix and silently
dropped. The recursive sub-LLM call advertised by /rlm never fired.
This commit wires the substrate end-to-end per Zhang/Kraska/Khattab
(arXiv:2512.24601, Algorithm 1):
- New axum HTTP sidecar (`rlm/sidecar.rs`) bound to 127.0.0.1:0 for the
duration of one RLM turn. Python's `llm_query()` and `sub_rlm()` are
real `urllib.request` POSTs; Rust services them via the existing
DeepSeek client. Token usage from sidecar-served calls folds into the
parent `RlmTurnResult.usage`.
- `child_model` is plumbed through `Op::RlmQuery` → `AppAction::RlmQuery`
→ `run_rlm_turn` → sidecar handlers; default remains `deepseek-v4-flash`.
- New `sub_rlm(prompt)` Python helper runs a full Algorithm 1 turn at
`depth - 1` (the paper's `sub_RLM`). Default `max_depth = 2` from `/rlm`. The
recursive opaque-future cycle is broken by returning a concrete
`Pin<Box<dyn Future + Send>>` from `run_rlm_turn_inner`.
- Strict termination: the loop ends only via `FINAL(value)` (or the
iteration cap). One fence-less round is tolerated with a reminder
appended; two consecutive ones surface the model text as a
`RlmTermination::DirectAnswer` exit. New `RlmTermination` enum lets
callers tell `Final | DirectAnswer | Exhausted | Error` apart.
- Richer `Metadata(state)`: includes paper-required access patterns
(`repl_get` / slicing / `splitlines` / `repl_set` / `llm_query` /
`sub_rlm` / `FINAL`) and a live list of variable keys currently in
the REPL state file.
- Unicode-safe `truncate_text` (it previously mixed bytes with chars),
per-turn state-file cleanup, and the `ROOM_TEMPERATURE` typo fixed to
`ROOT_TEMPERATURE`.
- New end-to-end test `sidecar_url_is_exported_to_python_env` stands up
a stand-in axum server, runs `print(llm_query('hello'))` in the real
PythonRuntime, and asserts the reply round-trips. Catches future
regressions in sidecar URL passthrough.
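The shape of the round-trip test can be sketched in pure Python: a stand-in HTTP server on an OS-assigned port, its URL exported through the environment, and an `llm_query` that POSTs to it. The environment variable name, the `/llm` path, and the JSON field names are all hypothetical:

```python
import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SIDECAR_ENV = "RLM_SIDECAR_URL"  # hypothetical variable name
# Bypass any configured HTTP proxy so loopback requests stay local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def llm_query(prompt: str) -> str:
    """POST the prompt to the sidecar named in the environment."""
    url = os.environ[SIDECAR_ENV].rstrip("/") + "/llm"  # hypothetical path
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _OPENER.open(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["text"]


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in sidecar that echoes the prompt back."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        reply = json.dumps({"text": f"echo: {prompt}"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args) -> None:  # keep output quiet
        pass


def round_trip(prompt: str) -> str:
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: OS picks
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        os.environ[SIDECAR_ENV] = f"http://127.0.0.1:{server.server_port}"
        return llm_query(prompt)
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 avoids collisions between concurrent turns, which matches the `127.0.0.1:0` binding described above.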
Versions: workspace 0.6.3 → 0.6.4 in Cargo.toml; npm wrapper 0.6.3 → 0.6.4
in npm/deepseek-tui/package.json.
Gates: cargo fmt, cargo clippy --all-targets --all-features --locked
-D warnings, cargo test --workspace --all-features --locked (1088
passed), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=-Dwarnings
cargo doc --workspace --no-deps — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the true Recursive Language Model (RLM) inference paradigm:
- rlm/mod.rs — module root with public API
- rlm/prompt.rs — RLM system prompt teaching the model to write code
- rlm/turn.rs — Algorithm 1 implementation:
  - P stored as REPL variable (NEVER in LLM context window)
  - Metadata-only context sent to root LLM (constant-size)
  - LLM generates Python code, not free text
  - Code executed in PythonRuntime with llm_query() for recursion
  - FINAL() detection ends the loop
- Op::RlmQuery variant in ops.rs
- /rlm command in the command system
- AppAction::RlmQuery handler in ui.rs
- PythonRuntime::with_state_path made public for RLM integration
- 18 new unit tests for code extraction, metadata building, truncation
Key differences from previous 'RLM-inspired' approach:
✅ P is external (REPL variable), not in LLM context
✅ Only metadata(state) in LLM context (constant-size)
✅ LLM generates code, not free text + tool calls
✅ sub-LLM recursion via llm_query() inside REPL code
✅ FINAL() mechanism for programmatic termination
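The loop shape described above can be sketched in a few lines. This is a toy under stated assumptions, not the `rlm/turn.rs` implementation: it uses in-process `exec` instead of a separate Python runtime, the variable name `PROMPT` and the metadata format are illustrative, and the root model is a plain callable:

```python
import contextlib
import io
from typing import Callable, Optional


class _Final(Exception):
    """Raised by FINAL() to stop the current block and carry the answer."""

    def __init__(self, value: object) -> None:
        super().__init__("final")
        self.value = value


def metadata(state: dict) -> str:
    """Describe the state by name, type, and length; never by contents."""
    parts = []
    for name in sorted(state):
        value = state[name]
        if name.startswith("__") or callable(value):
            continue
        size = len(value) if hasattr(value, "__len__") else "n/a"
        parts.append(f"{name}: {type(value).__name__} (len={size})")
    return "; ".join(parts)


def run_rlm(prompt: str, root_llm: Callable[[str], str],
            max_iters: int = 8) -> Optional[str]:
    def final(value: object) -> None:
        raise _Final(value)

    state = {"PROMPT": prompt, "FINAL": final}  # P lives only in the REPL state
    feedback = ""
    for _ in range(max_iters):
        # The root model sees metadata and the last output, not the prompt.
        code = root_llm(f"State: {metadata(state)}\nLast output: {feedback}")
        out = io.StringIO()
        try:
            with contextlib.redirect_stdout(out):
                exec(code, state)  # illustration only; not sandboxed
        except _Final as done:
            return str(done.value)
        except Exception as err:  # report errors to the model instead of aborting
            out.write(f"{type(err).__name__}: {err}")
        feedback = out.getvalue()[:500]  # truncated stdout is the only feedback
    return None  # iteration cap reached without FINAL
```

The message sent to the root model grows with the number of variables but not with the length of the prompt, which is the property the list above calls constant-size.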
After the assistant message is persisted, when tool_uses is empty,
check for inline ```repl blocks and execute them via PythonRuntime:
- Extract REPL blocks from assistant text
- Spawn PythonRuntime and execute each block sequentially
- If a round returns FINAL: replace the assistant message text with
the final value and break the turn
- If no FINAL: append truncated stdout/stderr as user feedback and
continue the turn loop for iterative refinement
- Emit status events so the user sees 'REPL round N: ...' in the UI
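The extraction step can be sketched with a regular expression that accepts only fences opening with the `repl` tag at the start of a line. The function name is hypothetical and the real extractor is written in Rust:

```python
import re
from typing import List

FENCE = "`" * 3  # built programmatically to keep this listing fence-safe
REPL_BLOCK = re.compile(
    r"^" + FENCE + r"repl[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_repl_blocks(text: str) -> List[str]:
    """Return the body of each fenced repl block, in document order."""
    return [m.group(1).rstrip("\n") for m in REPL_BLOCK.finditer(text)]
```

Blocks are returned in order so they can be executed sequentially, with the loop stopping at the first one that yields a FINAL value.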
All 26 REPL tests + RLM tests pass. Release build verified.
Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2