Commit Graph

19 Commits

Author SHA1 Message Date
Paulo Aboim Pinto a3ae974676 feat(tui): add ExternalTool abstraction layer (#2294)
Harvested with thanks to @aboimpinto.

Includes the ExternalTool abstraction layer plus follow-up fixes for lossy REPL stdout handling and unquoted unicode git diff paths.

Validation included full CI and focused local checks for non-UTF8 REPL stdout, git_diff, and external_tool behavior.
2026-05-31 02:24:25 -07:00
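The ExternalTool commit mentions a fix for unquoted Unicode paths in git diff output but does not show the parser. For background: with `core.quotePath` enabled (Git's default), Git wraps a path containing non-ASCII bytes in double quotes and writes each such byte as a three-digit octal escape. With `core.quotePath=false`, the same path appears unquoted as raw UTF-8. A minimal sketch of a decoder that accepts both forms follows. `unquote_git_path` is a hypothetical name, not the project's function.

```rust
/// Decode a path from a `git diff` header. Quoted paths (`core.quotePath`)
/// have their C-style escapes undone; unquoted paths (including raw UTF-8
/// when `core.quotePath=false`) are returned unchanged. Hypothetical helper.
fn unquote_git_path(raw: &str) -> Option<String> {
    let inner = match raw.strip_prefix('"').and_then(|s| s.strip_suffix('"')) {
        Some(inner) => inner,
        None => return Some(raw.to_string()), // not quoted: use as-is
    };
    let bytes = inner.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        match *bytes.get(i + 1)? {
            b'0'..=b'7' => {
                // `\ooo`: three octal digits encode one raw byte of the path.
                let digits = std::str::from_utf8(bytes.get(i + 1..i + 4)?).ok()?;
                out.push(u8::from_str_radix(digits, 8).ok()?);
                i += 4;
                continue;
            }
            b'a' => out.push(0x07),
            b'b' => out.push(0x08),
            b't' => out.push(b'\t'),
            b'n' => out.push(b'\n'),
            b'v' => out.push(0x0b),
            b'f' => out.push(0x0c),
            b'r' => out.push(b'\r'),
            b'"' => out.push(b'"'),
            b'\\' => out.push(b'\\'),
            _ => return None, // unknown escape: refuse to guess
        }
        i += 2;
    }
    // The unescaped bytes are the path as stored by Git (normally UTF-8).
    String::from_utf8(out).ok()
}
```

Returning `None` on a malformed escape lets a caller fall back to the raw header text instead of producing a wrong path.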
Hunter Bown 2947eff9d1 fix(ci): satisfy Rust 1.88 clippy gate 2026-05-24 01:20:19 -05:00
Hunter Bown b7bc8773f3 fix(tui): fail stuck stream turns and smooth RLM handles 2026-05-23 20:08:57 -05:00
Hunter Bown 32ce14d6b2 test(rebrand): residual brand-string cleanup across source and assets
A small cleanup pass to catch brand mentions that the R5 sweep missed
because they hid in:

- HTTP User-Agent format strings (`Mozilla/5.0 (compatible; deepseek-tui/`
  in `client.rs` and `fetch_url.rs`).
- Multi-line error messages whose phrase boundary straddled a line break
  ("…restart\n             deepseek-tui." in `js_execution.rs`,
  `tool_catalog.rs`, `repl/runtime.rs`).
- Doc comments mentioning `deepseek-tui` as a binary (`config/src/lib.rs`,
  `core/capacity.rs`, `tui/streaming/chunking.rs`, `features.rs`).
- Skill descriptions shipped in `crates/tui/assets/skills/*/SKILL.md`.
- Test fixtures with placeholder paths / git emails
  (`tui/external_editor.rs`, `snapshot/repo.rs`).
- `task_manager.rs`'s `cargo test -p deepseek-tui --lib` example.
- `scripts/tencent-lighthouse/doctor.sh` info-line prefix.

The remaining `deepseek-tui` mentions in the codebase are intentional
(the legacy `[[bin]]` entry in `crates/tui/Cargo.toml`, the legacy
`npm/deepseek-tui/` deprecation shim package, the CNB mirror namespace,
the security email, the legacy bin's shim source file, and historical
CHANGELOG entries) and were preserved per the rebrand anti-scope.

Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:58:34 -05:00
Hunter Bown 6821f294e2 fix(rlm): decode stdout lossily 2026-05-21 00:12:59 +08:00
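This commit's message is only its title. One common way to decode child-process output lossily in Rust, shown as a sketch rather than the project's code (`read_line_lossy` is hypothetical), is to read raw bytes with `BufRead::read_until` and convert them with `String::from_utf8_lossy`. `BufRead::read_line` would instead return an error when a line is not valid UTF-8.

```rust
use std::io::{self, BufRead};

/// Read one line of child stdout without failing on invalid UTF-8.
/// Returns `Ok(None)` at end of stream.
fn read_line_lossy<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut buf = Vec::new();
    // `read_until` collects raw bytes and never inspects their encoding.
    let n = reader.read_until(b'\n', &mut buf)?;
    if n == 0 {
        return Ok(None);
    }
    // Invalid byte sequences become U+FFFD instead of aborting the read.
    Ok(Some(String::from_utf8_lossy(&buf).into_owned()))
}
```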
Hunter Bown d5c45d962d chore(release): prepare v0.8.36
Squash merge of work/v0.8.36-cache-hygiene into main.

All preflight gates passed: version-drift/check/lint/test (3073 pass, 0 fail) / CodeQL / GitGuardian / npm-smoke. Preparing the v0.8.36 release tag.
2026-05-14 00:31:18 -05:00
Hunter Bown a507885fb8 test(rlm): tolerate windows python line endings 2026-05-12 22:18:59 -05:00
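The commit does not show the test change. A sketch of one CRLF-tolerant comparison, assuming the test checks REPL output text: `str::lines` splits on both `\n` and `\r\n` and strips the terminator, so comparing line iterators ignores Windows line endings. `same_lines` and `normalize_newlines` are hypothetical helper names.

```rust
/// Compare REPL output line by line, ignoring LF vs CRLF differences.
/// `str::lines` splits on `\n` and `\r\n` and strips the terminator.
fn same_lines(actual: &str, expected: &str) -> bool {
    actual.lines().eq(expected.lines())
}

/// Alternative: normalize Windows line endings before an exact comparison.
fn normalize_newlines(s: &str) -> String {
    s.replace("\r\n", "\n")
}
```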
Hunter Bown 485ba7bbd4 chore(release): finish v0.8.33 polish 2026-05-12 22:03:47 -05:00
Hunter Bown 99c6b22e83 chore(release): v0.8.33 — sub-agent and RLM renovation with persistent sessions
- Persistent RLM sessions (rlm_open/rlm_eval/rlm_close) with bounded REPL helpers
- Fork-aware sub-agent sessions (agent_open/agent_eval/agent_close) with handle_read
- Shared handle_read storage with slice/range/count/JSONPath projections
- Slash-command routing: /rlm, /agent, /relay (/接力) for handoff prompts
- Sidebar renamed to "Work" tab, consistent across Plan/Agent/YOLO modes
- Tool papercuts: file_search excludes, grep_files strings, fetch_url JSON,
  edit_file fuzz, exec_shell merged stdout/stderr, revert_turn no-op reject
- CLI reasoning-effort honoured on non-auto exec routes (#1511 @h3c-hexin)
- Edit-file replacement boundaries clarified (#1516)
- Pandoc output validated before probing (#1523)
- Running turns steerable/repaintable (#1533, #1537)
- Tasks/Activity Detail calmer under load
- npm retry timeout hint (#1538 @reidliu41)
- Issue templates improved (#1525 @reidliu41)
- Shell: kill process group to prevent UI freeze (#828 @CrepuscularIRIS)
- TUI: ignore leaked SGR mouse reports in composer (#1421 @reidliu41)
- Footer: keep chips within available width (#1417 @Wenjunyun123)
- Session picker: scope Ctrl+R to current workspace (#1395 @LinQ)
- Removed stale competitive-analysis doc
- Prompts/docs teach only new tool names
2026-05-12 19:54:08 -05:00
Hunter Bown e6d4eae5d6 fix(security): scrub child process environments 2026-05-08 14:24:07 -05:00
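The commit title says child environments are scrubbed but not how. A minimal sketch, assuming an allowlist approach: `Command::env_clear` starts the child from an empty environment, and only allowlisted variables are copied back in. The allowlist contents and both function names are hypothetical.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Hypothetical allowlist: the variables the real fix keeps are not listed
/// in this log.
const ALLOWED_VARS: &[&str] = &["PATH", "HOME", "LANG", "TERM"];

/// Keep only allowlisted entries from a parent environment.
fn scrub_env<I>(parent: I) -> Vec<(OsString, OsString)>
where
    I: IntoIterator<Item = (OsString, OsString)>,
{
    parent
        .into_iter()
        // Non-UTF-8 keys can never match the allowlist, so they are dropped.
        .filter(|(key, _)| key.to_str().is_some_and(|k| ALLOWED_VARS.contains(&k)))
        .collect()
}

/// Build a child command that starts from an empty environment, so secrets
/// such as API keys in the parent are not inherited implicitly.
fn scrubbed_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear().envs(scrub_env(std::env::vars_os()));
    cmd
}
```

An allowlist fails closed: a newly added secret variable in the parent is excluded by default, whereas a denylist would have to be updated to catch it.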
Hunter Bown 4f77c625fd fix(tui): forward-port v0.8.16 hotfix to main
Forward-port the v0.8.16 RLM/sub-agent hotfix onto main after tagging the release branch.
2026-05-07 00:04:31 -05:00
Hunter Bown 03e59c60ce fix(rlm): pin child calls to flash (#832) 2026-05-06 03:41:47 -05:00
Hunter Bown 3f24759966 release: stabilize shell handles for v0.8.0
Bumps the workspace/npm wrapper to 0.8.0 and fixes completed background shell jobs retaining live process handles, which could cause `Too many open files` errors, checkpoint save failures, shell spawn failures, and lag around send/close/Esc. Also includes Windows REPL bootstrap timeout hardening and Cargo/TUNA mirror install docs.
2026-04-30 21:34:00 -05:00
Hunter Bown 2865c9a766 refactor(rlm): drop HTTP sidecar — long-lived Python REPL over stdin/stdout
The RLM tool used to spawn a fresh `python3 -c "..."` per round and route
sub-LLM calls through a localhost axum sidecar; state persisted only via a
JSON file (lossy: imports and non-JSON values were lost). The model could
also short-circuit by replying with prose and the loop would ship the
prose as if it came from the REPL.

This commit replaces that with one long-lived `python3 -u` subprocess per
turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels.
No more HTTP server, no more port allocation, no more JSON state file —
variables, imports, and any other Python state persist naturally across
rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services
`llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched`
calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs.
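The log names UUID-prefixed sentinels but not the wire format. The sketch below assumes a hypothetical framing: control lines start with `<nonce>:`, followed by `RPC <payload>` for a helper call or `END` when a round finishes; everything else is ordinary output. The Rust standard library has no UUID generator, so the nonce is passed in as a parameter.

```rust
/// One line of REPL stdout, classified under a hypothetical framing in which
/// control lines start with `<nonce>:`. The real bridge uses a UUID-based
/// prefix; its exact wire format is not described in this log.
#[derive(Debug, PartialEq)]
enum Frame<'a> {
    /// Ordinary program output, passed through to the round's transcript.
    Output(&'a str),
    /// `<nonce>:RPC <payload>`: the REPL asks the host to service a call
    /// such as `llm_query`; the reply would travel back over stdin.
    Rpc(&'a str),
    /// `<nonce>:END`: the current round has finished executing.
    End,
}

fn parse_frame<'a>(line: &'a str, nonce: &str) -> Frame<'a> {
    // Drop the line terminator (LF or CRLF) before matching.
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    // User code cannot guess the per-session nonce, so it cannot spoof these.
    match line.strip_prefix(nonce).and_then(|rest| rest.strip_prefix(':')) {
        Some("END") => Frame::End,
        Some(rest) => match rest.strip_prefix("RPC ") {
            Some(payload) => Frame::Rpc(payload),
            None => Frame::Output(line),
        },
        None => Frame::Output(line),
    }
}
```

Because user code cannot know the nonce, a stray `print("END")` stays ordinary output instead of ending the round early.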

The system prompt is tightened: the only legal turn shape is one
` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a
sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination
is gone, replaced by `NoCode`, which only surfaces after multiple consecutive
empty rounds. `rlm_process` now returns a per-round trace (code summary,
sub-LLM call count, elapsed) so callers can verify the model actually
engaged with `context` rather than guessing from the preview.

Net: -313 lines. 17 new REPL runtime tests cover variable persistence,
import persistence, RPC round-trips, FINAL capture, and error recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:44:53 -05:00
Hunter Bown 5cec1534be feat(rlm): align with reference impl + add rlm_process tool; bump 0.6.5
The previous /rlm slash command flow had a UI rendering gap (the answer
never made it back to the model's view) and required the user to invoke
it manually. This commit pivots to a tool-call surface and aligns the
in-REPL helpers with the canonical reference implementation
(alexzhang13/rlm) by the paper authors, so the same prompts and
decomposition patterns transfer.

New tool: rlm_process
- crates/tui/src/tools/rlm_process.rs
- Inputs: task (small, shown to root LLM each iter as root_prompt) +
  exactly one of file_path (workspace-relative, preferred) or content
  (inline, capped at 200k chars). Optional child_model and max_depth.
- Loaded across Plan/Agent/YOLO; never deferred via ToolSearch.
- Returns the final answer string + metadata (iterations, duration,
  tokens, termination).
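The input rule above (exactly one of `file_path` or `content`, inline content capped at 200k chars) maps naturally onto a `match` over the two options. This is a sketch with hypothetical type and function names; it reads "200k chars" as 200,000 Unicode scalar values rather than bytes, which is an assumption.

```rust
/// Inline-content cap from the rlm_process description.
const MAX_INLINE_CHARS: usize = 200_000;

/// Where the long input comes from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Source<'a> {
    File(&'a str),
    Inline(&'a str),
}

/// Enforce "exactly one of file_path or content".
fn select_source<'a>(
    file_path: Option<&'a str>,
    content: Option<&'a str>,
) -> Result<Source<'a>, String> {
    match (file_path, content) {
        (Some(path), None) => Ok(Source::File(path)),
        (None, Some(text)) => {
            // Counted in chars, not bytes (assumed reading of "200k chars").
            let chars = text.chars().count();
            if chars > MAX_INLINE_CHARS {
                Err(format!("content has {chars} chars; the cap is {MAX_INLINE_CHARS}"))
            } else {
                Ok(Source::Inline(text))
            }
        }
        (Some(_), Some(_)) => Err("pass file_path or content, not both".to_string()),
        (None, None) => Err("one of file_path or content is required".to_string()),
    }
}
```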

REPL surface aligned with reference (alexzhang13/rlm):
- Variable name `context` (was PROMPT)
- Code fence ```repl (was ```python; python/py kept as fallback)
- Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm),
  rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR,
  repl_get/repl_set
- Top-level JSON-serializable user variables auto-persist across rounds
  (no repl_set ceremony required)
- FINAL(...) / FINAL_VAR(...) parseable from the model's raw response
  text (parse_text_final), in addition to the in-REPL sentinel path.
  Code-fenced occurrences are correctly ignored to prevent false hits.
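A sketch of the fence handling listed above, assuming the fallback means blocks tagged `python` or `py` are used only when no `repl` block is present (the log does not say whether both kinds can run in one round). Fences tagged with any other language are skipped whole, so their contents are never mistaken for code to run. `extract_code_blocks` is a hypothetical name.

```rust
/// Kind of fenced block currently open.
enum Fence {
    Repl,
    Python,
    Other,
}

fn extract_code_blocks(text: &str) -> Vec<String> {
    let (mut repl, mut python) = (Vec::new(), Vec::new());
    // The fence currently open, plus the body lines seen so far.
    let mut open: Option<(Fence, Vec<&str>)> = None;

    for line in text.lines() {
        let trimmed = line.trim();
        match open.take() {
            None => {
                if let Some(info) = trimmed.strip_prefix("```") {
                    let kind = match info.trim() {
                        "repl" => Fence::Repl,
                        "python" | "py" => Fence::Python,
                        _ => Fence::Other, // any other (or no) language tag
                    };
                    open = Some((kind, Vec::new()));
                }
            }
            Some((kind, body)) if trimmed == "```" => {
                // Closing fence: keep the block only if it is executable.
                let code = body.join("\n");
                match kind {
                    Fence::Repl => repl.push(code),
                    Fence::Python => python.push(code),
                    Fence::Other => {}
                }
            }
            Some((kind, mut body)) => {
                body.push(line);
                open = Some((kind, body));
            }
        }
    }
    // Prefer `repl` blocks; accept `python` / `py` only as a fallback.
    if repl.is_empty() { python } else { repl }
}
```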

Sidecar (axum, 127.0.0.1:0):
- Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout,
  cap 16 prompts per batch). Mirrors the reference's batched semantics.

Other:
- System prompt rewritten with reference's strategy patterns
  (PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE
  decomposition via rlm_query → programmatic compute + LLM interp).
- Strict termination loop unchanged: must emit ```repl or text-level
  FINAL each round; one fence-less round → reminder, two → DirectAnswer.
- /rlm slash command remains for manual debug; description points the
  model toward rlm_process for the in-agent flow.
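The one-reminder, two-strikes rule above is a small state machine. A sketch with hypothetical names, assuming a text-level `FINAL(...)` takes priority when a round contains both a `FINAL` and a `repl` block, and that any code round resets the fence-less streak (bd938a559c speaks of two consecutive fence-less rounds).

```rust
/// What the loop should do after one round (hypothetical type).
#[derive(Debug, PartialEq)]
enum RoundOutcome {
    /// The round contained a `repl` block: run it and keep looping.
    RunCode,
    /// A text-level FINAL(...) was parsed: stop with this answer.
    Final(String),
    /// First fence-less round: append a reminder and ask again.
    Remind,
    /// Second consecutive fence-less round: surface the prose as the answer.
    DirectAnswer(String),
}

/// Tracks consecutive fence-less rounds across one turn.
#[derive(Default)]
struct TerminationTracker {
    fenceless_streak: u32,
}

impl TerminationTracker {
    fn observe(
        &mut self,
        has_repl_block: bool,
        text_final: Option<String>,
        text: &str,
    ) -> RoundOutcome {
        // Assumption: a text-level FINAL wins over any code in the same round.
        if let Some(answer) = text_final {
            return RoundOutcome::Final(answer);
        }
        if has_repl_block {
            self.fenceless_streak = 0; // streaks must be consecutive
            return RoundOutcome::RunCode;
        }
        self.fenceless_streak += 1;
        if self.fenceless_streak >= 2 {
            RoundOutcome::DirectAnswer(text.to_string())
        } else {
            RoundOutcome::Remind
        }
    }
}
```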

Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5.

Gates green: cargo fmt, cargo clippy --all-targets --all-features
--locked -- -D warnings, cargo test --workspace --all-features --locked
(all pass), parity_protocol/parity_state/snapshot,
RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:17:09 -05:00
Hunter Bown bd938a559c fix(rlm): wire real recursive substrate; bump 0.6.4
The v0.6.3 RLM loop had Algorithm 1's outer shape but the substrate was
non-functional: `llm_query()` was a Python stub that returned a hardcoded
string, and `child_model` was bound with an underscore prefix and silently
dropped. The recursive sub-LLM call advertised by /rlm never fired.

This commit wires the substrate end-to-end per Zhang/Kraska/Khattab
(arXiv:2512.24601, Algorithm 1):

- New axum HTTP sidecar (`rlm/sidecar.rs`) bound to 127.0.0.1:0 for the
  duration of one RLM turn. Python's `llm_query()` and `sub_rlm()` are
  real `urllib.request` POSTs; Rust services them via the existing
  DeepSeek client. Token usage from sidecar-served calls folds into the
  parent `RlmTurnResult.usage`.
- `child_model` is plumbed through `Op::RlmQuery` → `AppAction::RlmQuery`
  → `run_rlm_turn` → sidecar handlers; default remains `deepseek-v4-flash`.
- New `sub_rlm(prompt)` Python helper runs a full Algorithm-1 turn at
  depth-1 (paper's `sub_RLM`). Default `max_depth = 2` from `/rlm`. The
  recursive opaque-future cycle is broken by returning a concrete
  `Pin<Box<dyn Future + Send>>` from `run_rlm_turn_inner`.
- Strict termination: the loop ends only via `FINAL(value)` (or the
  iteration cap). One fence-less round is tolerated with a reminder
  appended; two consecutive ones surface the model text as a
  `RlmTermination::DirectAnswer` exit. New `RlmTermination` enum lets
  callers tell `Final | DirectAnswer | Exhausted | Error` apart.
- Richer `Metadata(state)`: includes paper-required access patterns
  (`repl_get` / slicing / `splitlines` / `repl_set` / `llm_query` /
  `sub_rlm` / `FINAL`) and a live list of variable keys currently in
  the REPL state file.
- Unicode-safe `truncate_text` (was mixing bytes with chars), per-turn
  state-file cleanup, `ROOM_TEMPERATURE` typo → `ROOT_TEMPERATURE`.
- New end-to-end test `sidecar_url_is_exported_to_python_env` stands up
  a stand-in axum server, runs `print(llm_query('hello'))` in the real
  PythonRuntime, and asserts the reply round-trips. Catches future
  regressions in sidecar URL passthrough.
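The boxed-future bullet above addresses a compile-time problem: an `async fn` that awaits itself directly would need an infinitely sized opaque future type. Returning `Pin<Box<dyn Future + Send>>` from an ordinary `fn` gives the recursion a concrete, sized type. A minimal sketch: `run_turn` is a hypothetical stand-in for `run_rlm_turn_inner` that counts calls in a binary recursion tree, and the demo `block_on` relies on `std::task::Waker::noop` (stable since Rust 1.85).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

/// A boxed, type-erased future with a concrete type the compiler can size.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical stand-in for a recursive turn: each call at `depth > 0`
/// makes two sub-calls one level shallower. Returns the total call count.
fn run_turn(depth: u32) -> BoxFuture<'static, u32> {
    Box::pin(async move {
        if depth == 0 {
            return 1; // leaf: no further recursion
        }
        let left = run_turn(depth - 1).await;
        let right = run_turn(depth - 1).await;
        1 + left + right
    })
}

/// Demo-only executor: the futures above never wait on I/O, so a single
/// poll with a no-op waker completes them. Real code would use an async
/// runtime instead.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let mut cx = Context::from_waker(Waker::noop());
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("demo executor cannot wait on pending futures"),
    }
}
```

In the sketch, `block_on(run_turn(2))` returns 7: one root call, two calls at depth 1, and four leaves.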

Versions: workspace 0.6.3 → 0.6.4 in Cargo.toml; npm wrapper 0.6.3 → 0.6.4
in npm/deepseek-tui/package.json.

Gates: cargo fmt, cargo clippy --all-targets --all-features --locked
-- -D warnings, cargo test --workspace --all-features --locked (1088
passed), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=-Dwarnings
cargo doc --workspace --no-deps — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:36:59 -05:00
Hunter Bown 42c684367f feat(rlm): implement true RLM loop per Algorithm 1 (Zhang et al., arXiv:2512.24601)
Adds the true Recursive Language Model (RLM) inference paradigm:

- rlm/mod.rs — module root with public API
- rlm/prompt.rs — RLM system prompt teaching the model to write code
- rlm/turn.rs — Algorithm 1 implementation:
  - P stored as REPL variable (NEVER in LLM context window)
  - Metadata-only context sent to root LLM (constant-size)
  - LLM generates Python code, not free text
  - Code executed in PythonRuntime with llm_query() for recursion
  - FINAL() detection ends the loop
- Op::RlmQuery variant in ops.rs
- /rlm command in the command system
- AppAction::RlmQuery handler in ui.rs
- PythonRuntime::with_state_path made public for RLM integration
- 18 new unit tests for code extraction, metadata building, truncation
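The truncation covered by these tests later needed the Unicode fix noted in bd938a559c ("was mixing bytes with chars"). A sketch of char-safe truncation, using a hypothetical `truncate_chars` rather than the project's `truncate_text`: slicing a `&str` at an arbitrary byte index panics inside a multi-byte character, whereas the byte offsets yielded by `char_indices` are always valid boundaries.

```rust
/// Keep at most `max_chars` characters of `s` without splitting a
/// multi-byte UTF-8 sequence. Hypothetical helper.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // The byte offset of the (max_chars)-th char is a valid cut point.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}
```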

Key differences from the previous 'RLM-inspired' approach:
- P is external (REPL variable), not in LLM context
- Only metadata(state) in LLM context (constant-size)
- LLM generates code, not free text + tool calls
- Sub-LLM recursion via llm_query() inside REPL code
- FINAL() mechanism for programmatic termination
2026-04-26 23:34:17 -05:00
Hunter Bown ac8a882be5 chore: clean v0.6.3 repl build warnings 2026-04-26 23:12:57 -05:00
Hunter Bown 4e46fd06f6 feat(repl): wire PythonRuntime into engine turn loop (Phase 2)
After the assistant message is persisted, when tool_uses is empty,
check for inline ```repl blocks and execute them via PythonRuntime:

- Extract REPL blocks from assistant text
- Spawn PythonRuntime and execute each block sequentially
- If a round returns FINAL: replace the assistant message text with
  the final value and break the turn
- If no FINAL: append truncated stdout/stderr as user feedback and
  continue the turn loop for iterative refinement
- Emit status events so the user sees 'REPL round N: ...' in the UI

All 26 REPL tests + RLM tests pass. Release build verified.

Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2
2026-04-26 18:54:46 -05:00