feat(rlm): align with reference impl + add rlm_process tool; bump 0.6.5
The previous /rlm slash command flow had a UI rendering gap (the answer never made it back to the model's view) and required the user to invoke it manually. Pivoting to a tool-call surface and aligning the in-REPL helpers with the canonical reference (alexzhang13/rlm) by the paper authors so the same prompts and decomposition patterns transfer. New tool: rlm_process - crates/tui/src/tools/rlm_process.rs - Inputs: task (small, shown to root LLM each iter as root_prompt) + exactly one of file_path (workspace-relative, preferred) or content (inline, capped at 200k chars). Optional child_model and max_depth. - Loaded across Plan/Agent/YOLO; never deferred via ToolSearch. - Returns the final answer string + metadata (iterations, duration, tokens, termination). REPL surface aligned with reference (alexzhang13/rlm): - Variable name `context` (was PROMPT) - Code fence ```repl (was ```python; python/py kept as fallback) - Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm), rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR, repl_get/repl_set - Top-level JSON-serializable user variables auto-persist across rounds (no repl_set ceremony required) - FINAL(...) / FINAL_VAR(...) parseable from the model's raw response text (parse_text_final), in addition to the in-REPL sentinel path. Code-fenced occurrences are correctly ignored to prevent false hits. Sidecar (axum, 127.0.0.1:0): - Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout, cap 16 prompts per batch). Mirrors the reference's batched semantics. Other: - System prompt rewritten with reference's strategy patterns (PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE decomposition via rlm_query → programmatic compute + LLM interp). - Strict termination loop unchanged: must emit ```repl or text-level FINAL each round; one fence-less round → reminder, two → DirectAnswer. - /rlm slash command remains for manual debug; description points the model toward rlm_process for the in-agent flow. Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5. Gates green: cargo fmt, cargo clippy --all-targets --all-features --locked -D warnings, cargo test --workspace --all-features --locked (all pass), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS= -Dwarnings cargo doc --workspace --no-deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+21
-1
@@ -7,7 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [0.6.4] - 2026-04-27
|
||||
## [0.6.5] - 2026-04-27
|
||||
|
||||
### Added
|
||||
- **`rlm_process` tool — recursive language model as a tool call.** The previous `/rlm` slash command had a UI rendering gap (the answer never made it back to the model's view) and required the user to remember to invoke it manually. `rlm_process` exposes the full RLM loop as a structured tool the model itself can choose, the same way it reaches for `agent_spawn` or `rlm_query`. Inputs: `task` (small instruction, shown to the root LLM each iteration) plus exactly one of `file_path` (workspace-relative, preferred — keeps the long input out of the model's context entirely) or `content` (inline, capped at 200k chars). Optional `child_model` (default `deepseek-v4-flash`) and `max_depth` (default 1, paper experiments). Returns the synthesized answer with metadata (iterations, duration, tokens, termination reason). Loaded across Plan / Agent / YOLO; never deferred via ToolSearch. (`crates/tui/src/tools/rlm_process.rs`)
|
||||
- **Reference-aligned REPL surface.** Aligned the in-REPL Python helpers with the canonical reference RLM (alexzhang13/rlm). The sub-agent now sees `context` (the full input, not `PROMPT`), `llm_query`, `llm_query_batched`, `rlm_query` (was `sub_rlm`), `rlm_query_batched`, `SHOW_VARS()`, `FINAL(...)`, `FINAL_VAR(...)`, plus `repl_get`/`repl_set`. Same prompt patterns and decomposition strategies from the paper now apply verbatim. (`crates/tui/src/repl/runtime.rs`)
|
||||
- **Concurrent fanout from inside the REPL.** `llm_query_batched(prompts, model=None)` runs up to 16 child completions in parallel via a new `POST /llm_batch` sidecar endpoint — much faster than serial `[llm_query(p) for p in prompts]`. `rlm_query_batched(prompts)` does the same for recursive RLM sub-calls via `POST /rlm_batch`. (`crates/tui/src/rlm/sidecar.rs`)
|
||||
- **`SHOW_VARS()`** — returns `{name: type-name}` for every user variable in the REPL. Lets the model inspect what it has accumulated across rounds before deciding whether to call `FINAL_VAR(name)`.
|
||||
- **Auto-persistence of REPL variables across rounds.** Any top-level JSON-serializable variable the sub-agent creates in a `repl` block now persists to the next round automatically — no `repl_set` ceremony needed unless you want explicit control. Matches the in-process reference REPL semantics.
|
||||
|
||||
### Changed
|
||||
- **Code fence is `repl`, not `python`.** Matches the reference RLM language identifier so the same prompts and few-shot examples work here. Backward-compat fallback to `python` / `py` retained for older model behaviors.
|
||||
- **`FINAL` / `FINAL_VAR` parseable from raw response text.** The reference RLM lets the model write `FINAL(value)` on its own line outside any code block to terminate the loop. Added `parse_text_final()` so that path works alongside the existing in-REPL Python sentinel mechanism. Code-fenced occurrences of `FINAL(...)` are correctly ignored to avoid false positives.
|
||||
- **Strict termination loop.** The sub-agent must emit a ```repl block (or text-level FINAL) to make progress. One fence-less round triggers a reminder; two consecutive trigger a `RlmTermination::DirectAnswer` exit so we don't loop forever.
|
||||
- **`rlm_process` separates `task` (root_prompt) from `file_path`/`content` (context).** The `task` rides along as `root_prompt` and is shown to the root LLM each iteration; the big input lives only in the REPL as `context`. Mirrors the reference's `completion(prompt, root_prompt=...)` API.
|
||||
- **System prompt rewritten** with the reference's strategy patterns (PREVIEW → CHUNK + map-reduce via `llm_query_batched` → RECURSIVE decomposition via `rlm_query` → programmatic computation + LLM interpretation).
|
||||
- The `/rlm` slash command stays for manual experimentation but is no longer the recommended path; the description in `commands/mod.rs` now points the model toward `rlm_process` for the in-agent flow.
|
||||
|
||||
### Reference
|
||||
- Zhang, Kraska, Khattab. "Recursive Language Models." arXiv:2512.24601.
|
||||
- alexzhang13/rlm — reference implementation by the paper authors. Variable names, helper surface, and code-fence convention align with that repo so prompts and patterns transfer.
|
||||
|
||||
|
||||
### Fixed
|
||||
- **`/rlm` actually recurses now (Algorithm 1 substrate, paper-faithful).** The v0.6.3 RLM loop had the right *shape* but its recursive substrate was non-functional: `llm_query()` was a Python stub that returned a hardcoded string, and `child_model` was bound with an underscore prefix and silently dropped. The loop ran but the sub-LLM never fired. v0.6.4 fixes this end-to-end:
|
||||
|
||||
Generated
+13
-13
@@ -806,7 +806,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-agent"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"deepseek-config",
|
||||
"serde",
|
||||
@@ -814,7 +814,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-app-server"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"axum",
|
||||
@@ -837,7 +837,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-config"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"dirs",
|
||||
@@ -848,7 +848,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-core"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"chrono",
|
||||
@@ -867,7 +867,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-execpolicy"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"deepseek-protocol",
|
||||
@@ -876,7 +876,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-hooks"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"async-trait",
|
||||
@@ -890,7 +890,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-mcp"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"deepseek-protocol",
|
||||
@@ -900,7 +900,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-protocol"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"serde",
|
||||
"serde_json",
|
||||
@@ -908,7 +908,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-state"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"chrono",
|
||||
@@ -920,7 +920,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-tools"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"async-trait",
|
||||
@@ -933,7 +933,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-tui"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"arboard",
|
||||
@@ -988,7 +988,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-tui-cli"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"chrono",
|
||||
@@ -1010,7 +1010,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "deepseek-tui-core"
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
|
||||
[[package]]
|
||||
name = "deranged"
|
||||
|
||||
+1
-1
@@ -18,7 +18,7 @@ default-members = ["crates/cli", "crates/app-server", "crates/tui"]
|
||||
resolver = "2"
|
||||
|
||||
[workspace.package]
|
||||
version = "0.6.4"
|
||||
version = "0.6.5"
|
||||
edition = "2024"
|
||||
license = "MIT"
|
||||
repository = "https://github.com/Hmbown/DeepSeek-TUI"
|
||||
|
||||
@@ -413,6 +413,7 @@ fn should_default_defer_tool(name: &str, mode: AppMode) -> bool {
|
||||
| "file_search"
|
||||
| "diagnostics"
|
||||
| "rlm_query"
|
||||
| "rlm_process"
|
||||
| MULTI_TOOL_PARALLEL_NAME
|
||||
| "update_plan"
|
||||
| "todo_write"
|
||||
@@ -1450,6 +1451,7 @@ impl Engine {
|
||||
builder = builder
|
||||
.with_review_tool(self.deepseek_client.clone(), self.session.model.clone())
|
||||
.with_rlm_query_tool(self.deepseek_client.clone())
|
||||
.with_rlm_process_tool(self.deepseek_client.clone(), self.session.model.clone())
|
||||
.with_user_input_tool()
|
||||
.with_parallel_tool();
|
||||
|
||||
|
||||
+152
-59
@@ -56,55 +56,62 @@ const DEFAULT_STDOUT_LIMIT: usize = 8_192;
|
||||
const ROUND_TIMEOUT: Duration = Duration::from_secs(120);
|
||||
|
||||
/// Python bootstrap — loaded at the top of every execution round.
|
||||
/// Provides `llm_query()`, `sub_rlm()`, `FINAL()`, `FINAL_VAR()`,
|
||||
/// `repl_get/set`, and loads/saves the persistent variable state.
|
||||
///
|
||||
/// `llm_query()` and `sub_rlm()` work by POSTing to a localhost HTTP
|
||||
/// sidecar started by the RLM driver in Rust. The URLs are passed via
|
||||
/// the `REPL_LLM_URL` and `REPL_RLM_URL` environment variables. When
|
||||
/// the REPL is used outside an RLM turn (or the sidecar isn't running)
|
||||
/// the functions return a clear "unavailable" sentinel rather than
|
||||
/// silently lying about a fake response.
|
||||
/// Conforms to the reference RLM runtime (alexzhang13/rlm) so the same
|
||||
/// strategies and prompt patterns work here. Helpers exposed:
|
||||
///
|
||||
/// - `context` — the user's input (loaded from the persistent state file)
|
||||
/// - `llm_query(prompt, model=None, max_tokens=None, system=None)`
|
||||
/// - `llm_query_batched(prompts, model=None)` — concurrent fanout
|
||||
/// - `rlm_query(prompt, model=None)` — recursive sub-RLM (paper's `sub_RLM`)
|
||||
/// - `rlm_query_batched(prompts, model=None)` — concurrent recursive sub-RLMs
|
||||
/// - `SHOW_VARS()` — list user-created REPL variables
|
||||
/// - `FINAL(value)` / `FINAL_VAR(name)` — terminate the loop
|
||||
/// - `repl_get(name, default=None)` / `repl_set(name, value)` — explicit store
|
||||
///
|
||||
/// Sub-LLM and sub-RLM calls are routed through a localhost HTTP sidecar
|
||||
/// started by the RLM driver. URLs are injected via env vars
|
||||
/// (`REPL_LLM_URL`, `REPL_LLM_BATCH_URL`, `REPL_RLM_URL`,
|
||||
/// `REPL_RLM_BATCH_URL`). When the REPL is used outside an active RLM
|
||||
/// turn the functions return a clear "unavailable" sentinel.
|
||||
///
|
||||
/// Persistent state: every round, all top-level user variables that are
|
||||
/// JSON-serializable are saved to the state file so the next round can
|
||||
/// access them as ordinary Python locals (no `repl_get` ceremony needed).
|
||||
const PYTHON_BOOTSTRAP: &str = r#"
|
||||
import sys, json, os, urllib.request, urllib.error
|
||||
import json as _json
|
||||
import os as _os
|
||||
import urllib.request as _urlreq
|
||||
import urllib.error as _urlerr
|
||||
|
||||
# --- Persistent variable store ---
|
||||
_repl_vars = {}
|
||||
_STATE_FILE = os.environ.get('REPL_STATE_FILE', '')
|
||||
if _STATE_FILE and os.path.exists(_STATE_FILE):
|
||||
try:
|
||||
with open(_STATE_FILE, 'r') as f:
|
||||
_repl_vars = json.load(f)
|
||||
except:
|
||||
pass
|
||||
|
||||
# --- HTTP sidecar URLs (set by the RLM driver) ---
|
||||
_LLM_URL = os.environ.get('REPL_LLM_URL', '')
|
||||
_RLM_URL = os.environ.get('REPL_RLM_URL', '')
|
||||
# --- Sidecar URLs (set by the RLM driver) ---
|
||||
_LLM_URL = _os.environ.get('REPL_LLM_URL', '')
|
||||
_LLM_BATCH_URL = _os.environ.get('REPL_LLM_BATCH_URL', '')
|
||||
_RLM_URL = _os.environ.get('REPL_RLM_URL', '')
|
||||
_RLM_BATCH_URL = _os.environ.get('REPL_RLM_BATCH_URL', '')
|
||||
_STATE_FILE = _os.environ.get('REPL_STATE_FILE', '')
|
||||
|
||||
def _post_json(url, body, timeout):
|
||||
data = json.dumps(body).encode('utf-8')
|
||||
req = urllib.request.Request(
|
||||
data = _json.dumps(body).encode('utf-8')
|
||||
req = _urlreq.Request(
|
||||
url, data=data,
|
||||
headers={'Content-Type': 'application/json'},
|
||||
method='POST',
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||
return json.loads(resp.read().decode('utf-8'))
|
||||
with _urlreq.urlopen(req, timeout=timeout) as resp:
|
||||
return _json.loads(resp.read().decode('utf-8'))
|
||||
|
||||
def llm_query(prompt, model=None, max_tokens=None, system=None):
|
||||
"""One-shot sub-LLM call. Returns the completion text as a string.
|
||||
|
||||
Routed through the local Rust sidecar; uses the configured child_model
|
||||
by default, or whatever `model` you pass.
|
||||
"""
|
||||
Cheap and fast — use for chunk extraction / summarization / Q&A.
|
||||
The sub-LLM uses the configured child_model by default."""
|
||||
if not _LLM_URL:
|
||||
return '[llm_query unavailable: no sidecar URL — RLM not active]'
|
||||
return '[llm_query unavailable: no sidecar URL]'
|
||||
body = {'prompt': str(prompt), 'model': model,
|
||||
'max_tokens': max_tokens, 'system': system}
|
||||
try:
|
||||
data = _post_json(_LLM_URL, body, timeout=180)
|
||||
except urllib.error.URLError as e:
|
||||
except _urlerr.URLError as e:
|
||||
return f'[llm_query transport error: {e}]'
|
||||
except Exception as e:
|
||||
return f'[llm_query error: {e}]'
|
||||
@@ -112,48 +119,134 @@ def llm_query(prompt, model=None, max_tokens=None, system=None):
|
||||
return f'[llm_query: {data["error"]}]'
|
||||
return data.get('text', '')
|
||||
|
||||
def sub_rlm(prompt):
|
||||
"""Recursive sub-RLM call (paper's `sub_RLM`). Runs a full RLM turn on
|
||||
`prompt` at depth-1 and returns its final answer string. Bounded by the
|
||||
parent turn's recursion budget.
|
||||
"""
|
||||
if not _RLM_URL:
|
||||
return '[sub_rlm unavailable: no sidecar URL — RLM not active]'
|
||||
def llm_query_batched(prompts, model=None):
|
||||
"""Run multiple llm_query calls concurrently. Returns a list of strings
|
||||
in the same order as the input prompts. Much faster than sequential
|
||||
calls when the sub-prompts are independent."""
|
||||
if not isinstance(prompts, (list, tuple)):
|
||||
return [f'[llm_query_batched error: prompts must be a list]']
|
||||
if not _LLM_BATCH_URL:
|
||||
# Fall back to serial llm_query if no batch endpoint is configured.
|
||||
return [llm_query(p, model=model) for p in prompts]
|
||||
body = {'prompts': [str(p) for p in prompts], 'model': model}
|
||||
try:
|
||||
data = _post_json(_RLM_URL, {'prompt': str(prompt)}, timeout=600)
|
||||
except urllib.error.URLError as e:
|
||||
return f'[sub_rlm transport error: {e}]'
|
||||
data = _post_json(_LLM_BATCH_URL, body, timeout=300)
|
||||
except _urlerr.URLError as e:
|
||||
return [f'[llm_query_batched transport error: {e}]'] * len(prompts)
|
||||
except Exception as e:
|
||||
return f'[sub_rlm error: {e}]'
|
||||
return [f'[llm_query_batched error: {e}]'] * len(prompts)
|
||||
results = data.get('results', [])
|
||||
if len(results) != len(prompts):
|
||||
return [f'[llm_query_batched mismatch: got {len(results)} for {len(prompts)} prompts]'] * len(prompts)
|
||||
return [r.get('text', f'[llm_query_batched: {r.get("error","")}]') for r in results]
|
||||
|
||||
def rlm_query(prompt, model=None):
|
||||
"""Spawn a recursive RLM sub-call (paper's `sub_RLM`). The child gets
|
||||
its own REPL and can iterate, query further sub-LLMs, etc. Use when a
|
||||
sub-task itself requires multi-step reasoning. Bounded by the parent's
|
||||
recursion budget; falls back to llm_query when at depth=0."""
|
||||
if not _RLM_URL:
|
||||
return '[rlm_query unavailable: no sidecar URL]'
|
||||
try:
|
||||
data = _post_json(_RLM_URL, {'prompt': str(prompt), 'model': model}, timeout=600)
|
||||
except _urlerr.URLError as e:
|
||||
return f'[rlm_query transport error: {e}]'
|
||||
except Exception as e:
|
||||
return f'[rlm_query error: {e}]'
|
||||
if data.get('error'):
|
||||
return f'[sub_rlm: {data["error"]}]'
|
||||
return f'[rlm_query: {data["error"]}]'
|
||||
return data.get('text', '')
|
||||
|
||||
# --- FINAL / FINAL_VAR ---
|
||||
def rlm_query_batched(prompts, model=None):
|
||||
"""Spawn multiple recursive RLM sub-calls in parallel. Each prompt
|
||||
gets its own child RLM. Returns a list in input order."""
|
||||
if not isinstance(prompts, (list, tuple)):
|
||||
return [f'[rlm_query_batched error: prompts must be a list]']
|
||||
if not _RLM_BATCH_URL:
|
||||
return [rlm_query(p, model=model) for p in prompts]
|
||||
body = {'prompts': [str(p) for p in prompts], 'model': model}
|
||||
try:
|
||||
data = _post_json(_RLM_BATCH_URL, body, timeout=900)
|
||||
except _urlerr.URLError as e:
|
||||
return [f'[rlm_query_batched transport error: {e}]'] * len(prompts)
|
||||
except Exception as e:
|
||||
return [f'[rlm_query_batched error: {e}]'] * len(prompts)
|
||||
results = data.get('results', [])
|
||||
if len(results) != len(prompts):
|
||||
return [f'[rlm_query_batched mismatch: got {len(results)} for {len(prompts)} prompts]'] * len(prompts)
|
||||
return [r.get('text', f'[rlm_query_batched: {r.get("error","")}]') for r in results]
|
||||
|
||||
def FINAL(value):
|
||||
"""Signal the REPL to stop with this final answer."""
|
||||
print(f'__REPL_FINAL__::{json.dumps(str(value))}', flush=True)
|
||||
"""Signal the RLM loop to stop with this final answer."""
|
||||
print(f'__REPL_FINAL__::{_json.dumps(str(value))}', flush=True)
|
||||
|
||||
def FINAL_VAR(name):
|
||||
"""Signal the REPL to stop, returning the named variable."""
|
||||
val = _repl_vars.get(str(name), f'<variable {name!r} not found>')
|
||||
print(f'__REPL_FINAL__::{json.dumps(str(val))}', flush=True)
|
||||
"""Signal the RLM loop to stop, returning a named variable as the answer."""
|
||||
name_str = str(name).strip().strip("'\"")
|
||||
if name_str in globals():
|
||||
val = globals()[name_str]
|
||||
print(f'__REPL_FINAL__::{_json.dumps(str(val))}', flush=True)
|
||||
else:
|
||||
print(f"FINAL_VAR error: variable '{name_str}' not found. "
|
||||
f"Use SHOW_VARS() to list available variables.", flush=True)
|
||||
|
||||
def SHOW_VARS():
|
||||
"""Return a dict of {name: type-name} for all user variables in the REPL."""
|
||||
out = {}
|
||||
for k, v in list(globals().items()):
|
||||
if k.startswith('_') or k in _BOOTSTRAP_NAMES:
|
||||
continue
|
||||
out[k] = type(v).__name__
|
||||
return out
|
||||
|
||||
# --- State helpers ---
|
||||
def repl_get(name, default=None):
|
||||
return _repl_vars.get(str(name), default)
|
||||
return globals().get(str(name), default)
|
||||
|
||||
def repl_set(name, value):
|
||||
_repl_vars[str(name)] = value
|
||||
globals()[str(name)] = value
|
||||
|
||||
# --- Save state after execution ---
|
||||
# Names defined by the bootstrap that should NOT be persisted as user vars.
|
||||
_BOOTSTRAP_NAMES = {
|
||||
'llm_query', 'llm_query_batched', 'rlm_query', 'rlm_query_batched',
|
||||
'SHOW_VARS', 'FINAL', 'FINAL_VAR', 'repl_get', 'repl_set',
|
||||
}
|
||||
|
||||
# Restore user variables from the previous round's state file. Any
|
||||
# JSON-serializable value persists as a regular Python local.
|
||||
def _load_state():
|
||||
if not _STATE_FILE or not _os.path.exists(_STATE_FILE):
|
||||
return
|
||||
try:
|
||||
with open(_STATE_FILE, 'r') as f:
|
||||
data = _json.load(f)
|
||||
if isinstance(data, dict):
|
||||
for k, v in data.items():
|
||||
if not k.startswith('_'):
|
||||
globals()[k] = v
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Save user variables (everything that's JSON-serializable and not a
|
||||
# bootstrap helper) to the state file for the next round.
|
||||
def _save_state():
|
||||
if _STATE_FILE:
|
||||
if not _STATE_FILE:
|
||||
return
|
||||
out = {}
|
||||
for k, v in list(globals().items()):
|
||||
if k.startswith('_') or k in _BOOTSTRAP_NAMES:
|
||||
continue
|
||||
try:
|
||||
with open(_STATE_FILE, 'w') as f:
|
||||
json.dump(_repl_vars, f)
|
||||
except:
|
||||
pass
|
||||
_json.dumps(v)
|
||||
except (TypeError, ValueError):
|
||||
continue
|
||||
out[k] = v
|
||||
try:
|
||||
with open(_STATE_FILE, 'w') as f:
|
||||
_json.dump(out, f)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
_load_state()
|
||||
"#;
|
||||
|
||||
/// Code suffix — appended after user code to save state.
|
||||
|
||||
+99
-120
@@ -1,154 +1,133 @@
|
||||
//! RLM system prompt — teaches the model to write code and use the REPL
|
||||
//! per Algorithm 1 of Zhang et al. (arXiv:2512.24601).
|
||||
//! RLM system prompt — adapted from the reference RLM implementation
|
||||
//! (alexzhang13/rlm) and Zhang et al., arXiv:2512.24601, so the same
|
||||
//! decomposition strategies and prompt patterns apply here.
|
||||
|
||||
use crate::models::SystemPrompt;
|
||||
|
||||
/// Build the system prompt for a Recursive Language Model (RLM) root LLM call.
|
||||
///
|
||||
/// This prompt instructs the root LLM to generate Python code that
|
||||
/// manipulates the `PROMPT` variable in the REPL environment, using
|
||||
/// `llm_query()` for one-shot sub-LLM calls, `sub_rlm()` for full
|
||||
/// recursive RLM calls, and `FINAL()` to return the answer.
|
||||
/// Tells the root model:
|
||||
/// - your context lives in the REPL as `context`
|
||||
/// - emit a single ```repl block per turn
|
||||
/// - reach for `llm_query` / `rlm_query` (and the batched variants) for
|
||||
/// sub-LLM work; never try to fit the whole context into one call
|
||||
/// - end the loop with `FINAL(value)` or `FINAL_VAR(name)`
|
||||
pub fn rlm_system_prompt() -> SystemPrompt {
|
||||
SystemPrompt::Text(RLM_SYSTEM_PROMPT.trim().to_string())
|
||||
}
|
||||
|
||||
const RLM_SYSTEM_PROMPT: &str = r#"You are a Recursive Language Model (RLM).
|
||||
const RLM_SYSTEM_PROMPT: &str = r#"You are a Recursive Language Model (RLM). You answer the user's query interactively in a Python REPL that holds the full input as a `context` variable, and you can recursively call sub-LLMs to chunk, decompose, and synthesize answers over it. You will be queried iteratively until you provide a final answer.
|
||||
|
||||
Your job is to process the user's prompt by writing Python code. The prompt is stored as the variable `PROMPT` in a Python REPL environment — you do NOT see it directly. You must inspect and process it programmatically.
|
||||
The REPL is initialised with:
|
||||
1. `context` — the full input as a string. May be very large; never print it in full.
|
||||
2. `llm_query(prompt, model=None, max_tokens=None, system=None)` — one-shot child LLM call. Fast and lightweight; use for chunk-level extraction, summarization, or Q&A. The child can handle very large prompts (~hundreds of thousands of chars).
|
||||
3. `llm_query_batched(prompts, model=None)` — run many `llm_query` calls concurrently. Returns `list[str]` in input order. Much faster than sequential calls when sub-prompts are independent.
|
||||
4. `rlm_query(prompt, model=None)` — spawn a recursive RLM sub-call for sub-tasks that themselves need multi-step reasoning, code execution, or their own iteration. Falls back to `llm_query` when the recursion budget is exhausted.
|
||||
5. `rlm_query_batched(prompts, model=None)` — multiple recursive RLM sub-calls in parallel.
|
||||
6. `SHOW_VARS()` — list user-created REPL variables and their types.
|
||||
7. `repl_set(name, value)` / `repl_get(name)` — explicit cross-round persistence (note: any JSON-serializable top-level variable already persists automatically).
|
||||
8. `print()` — show output. The driver feeds a (truncated) preview back to you.
|
||||
9. `FINAL(value)` or `FINAL_VAR(name)` — end the loop. Place either on its own line OUTSIDE the ```repl block (preferred) or call as a Python statement INSIDE the block.
|
||||
|
||||
## REPL Environment
|
||||
How to operate
|
||||
|
||||
The Python REPL starts each round with persistent state. Use these functions:
|
||||
Each turn, emit ONE ```repl block of Python. The block runs inside the REPL; printed output and any new variables come back to you next turn. End the loop with `FINAL(...)`.
|
||||
|
||||
- `repl_get("PROMPT")` — Returns the full user prompt string.
|
||||
- `repl_set(name, value)` — Stores a variable for future rounds.
|
||||
- `repl_get(name)` — Retrieves a previously stored variable.
|
||||
- `llm_query(prompt, model=None, max_tokens=None, system=None)` — One-shot
|
||||
call to a sub-LLM. Returns the completion text. Cheap and fast — uses
|
||||
the configured child model (deepseek-v4-flash by default).
|
||||
- `sub_rlm(prompt)` — Recursive RLM call. Runs a full Algorithm-1 loop on
|
||||
the given prompt at depth-1 and returns its final answer. Use this when
|
||||
the sub-task is itself big enough to need decomposition.
|
||||
- `FINAL(value)` — Sets the final answer and ends the RLM loop. Call this
|
||||
when you have the complete answer.
|
||||
When to use `llm_query` vs `rlm_query`:
|
||||
- `llm_query` for one-shot work: extracting from a chunk, summarizing, classifying, simple Q&A.
|
||||
- `rlm_query` when the sub-task itself needs decomposition or iteration — i.e. it's RLM-shaped on its own (a long doc → its own chunked summary, a hard sub-question that needs branching).
|
||||
|
||||
## How to operate
|
||||
Strategy patterns
|
||||
|
||||
Every round you MUST emit a single ```python … ``` fenced block. The loop
|
||||
ends only when you call `FINAL(value)` inside that code, OR when iterations
|
||||
are exhausted. Plain-text replies are not accepted.
|
||||
1. PREVIEW first.
|
||||
```repl
|
||||
print(f"len(context) = {len(context)}")
|
||||
print(context[:500])
|
||||
```
|
||||
|
||||
1. PREVIEW the prompt first:
|
||||
```python
|
||||
text = repl_get("PROMPT")
|
||||
print(f"Length: {len(text)}")
|
||||
print(text[:500])
|
||||
```
|
||||
2. CHUNK + map-reduce with batched concurrent calls.
|
||||
```repl
|
||||
chunk_size = 8000
|
||||
chunks = [context[i:i+chunk_size] for i in range(0, len(context), chunk_size)]
|
||||
prompts = [f"Extract any mentions of X from this section:\n\n{c}" for c in chunks]
|
||||
partials = llm_query_batched(prompts)
|
||||
combined = "\n\n".join(partials)
|
||||
answer = llm_query(f"Synthesize across these section-level extractions:\n\n{combined}")
|
||||
print(answer[:500])
|
||||
```
|
||||
Then on the next turn:
|
||||
FINAL(answer)
|
||||
|
||||
2. DECOMPOSE the task into chunks. For long prompts, process parts
|
||||
independently using llm_query() for each chunk:
|
||||
```python
|
||||
text = repl_get("PROMPT")
|
||||
chunk_size = 2000
|
||||
results = []
|
||||
for i in range(0, len(text), chunk_size):
|
||||
chunk = text[i:i+chunk_size]
|
||||
result = llm_query(f"Process this part: {chunk}")
|
||||
results.append(result)
|
||||
repl_set("chunk_results", results)
|
||||
```
|
||||
3. RECURSIVE decomposition for hard sub-problems.
|
||||
```repl
|
||||
trend = rlm_query(f"Analyze this dataset and conclude with one word — up, down, or stable: {data}")
|
||||
recommendation = "Hold" if "stable" in trend.lower() else ("Hedge" if "down" in trend.lower() else "Increase")
|
||||
print(trend, "→", recommendation)
|
||||
```
|
||||
|
||||
3. COMBINE results and call FINAL:
|
||||
```python
|
||||
results = repl_get("chunk_results", [])
|
||||
combined = "\n".join(results)
|
||||
FINAL(combined)
|
||||
```
|
||||
4. PROGRAMMATIC computation + LLM interpretation.
|
||||
```repl
|
||||
import math
|
||||
theta = math.degrees(math.atan2(v_perp, v_parallel))
|
||||
final_answer = llm_query(f"Entry angle is {theta:.2f}°. Phrase the answer for a physics student.")
|
||||
```
|
||||
Then: FINAL(final_answer)
|
||||
|
||||
## Rules
|
||||
Rules
|
||||
|
||||
- You MUST output Python code inside ```python blocks. Only code inside
|
||||
```python fences is executed. Commentary outside the fences is ignored.
|
||||
- The PROMPT variable may be very large. Never print it in full — always
|
||||
truncate to a preview.
|
||||
- Use `llm_query()` for cheap one-shot decomposition. Use `sub_rlm()` only
|
||||
when a sub-task is itself large enough to need its own RLM loop.
|
||||
- Previous code and stdout summaries are shown in the conversation history.
|
||||
Build on them rather than repeating work.
|
||||
- The loop ends ONLY when you call `FINAL(value)`. There is no plain-text
|
||||
early-exit; if you reply without a code fence you'll be reminded.
|
||||
|
||||
## Strategy hints
|
||||
|
||||
- For code analysis: print structure, then llm_query() per file/function.
|
||||
- For long document processing: chunk PROMPT, llm_query() each chunk,
|
||||
aggregate, then FINAL.
|
||||
- For research / multi-step reasoning: decompose the question, query each
|
||||
sub-question via llm_query(), synthesize, FINAL.
|
||||
- For iterative tasks: cache intermediate results with repl_set, retrieve
|
||||
with repl_get across rounds.
|
||||
- Emit exactly one ```repl block per turn (or `FINAL(...)` on its own line to end the loop).
|
||||
- Never print or stuff `context` in its entirety. Slice, sample, or chunk.
|
||||
- Sub-LLMs are powerful — feed them generous chunks (e.g. tens of thousands of chars) rather than padding through tiny windows.
|
||||
- JSON-serializable top-level variables persist across rounds automatically; non-serializable ones (custom objects, file handles) do not.
|
||||
- Do not say "I will do X" — just do it. Output the next ```repl block.
|
||||
"#;
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn body() -> String {
|
||||
match rlm_system_prompt() {
|
||||
SystemPrompt::Text(t) => t,
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_is_not_empty() {
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => assert!(!text.is_empty()),
|
||||
_ => panic!("expected Text"),
|
||||
assert!(!body().is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_uses_repl_fence() {
|
||||
assert!(body().contains("```repl"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_context_variable() {
|
||||
assert!(body().contains("`context`"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_all_helpers() {
|
||||
let s = body();
|
||||
for name in [
|
||||
"llm_query",
|
||||
"llm_query_batched",
|
||||
"rlm_query",
|
||||
"rlm_query_batched",
|
||||
"SHOW_VARS",
|
||||
"FINAL",
|
||||
"FINAL_VAR",
|
||||
] {
|
||||
assert!(s.contains(name), "system prompt missing helper: {name}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_llm_query() {
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => assert!(text.contains("llm_query")),
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_sub_rlm() {
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => assert!(text.contains("sub_rlm")),
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_final() {
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => assert!(text.contains("FINAL")),
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_mentions_python_fence() {
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => assert!(text.contains("```python")),
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rlm_prompt_forbids_plaintext_exit() {
|
||||
// Strict mode: the old "just write a short response without code
|
||||
// fences" sentence must be gone.
|
||||
let prompt = rlm_system_prompt();
|
||||
match prompt {
|
||||
SystemPrompt::Text(text) => {
|
||||
assert!(!text.contains("without code fences and the RLM loop will end"));
|
||||
}
|
||||
_ => panic!("expected Text"),
|
||||
}
|
||||
fn rlm_prompt_does_not_promise_plaintext_exit_loophole() {
|
||||
// The old prompt had "just write a short response without code fences
|
||||
// and the RLM loop will end". Make sure that's gone.
|
||||
assert!(!body().contains("without code fences and the RLM loop"));
|
||||
}
|
||||
}
|
||||
|
||||
@@ -178,6 +178,7 @@ async fn sub_rlm_handler(
|
||||
&ctx.client,
|
||||
ctx.child_model.clone(),
|
||||
req.prompt,
|
||||
None,
|
||||
ctx.child_model.clone(),
|
||||
tx,
|
||||
ctx.depth_remaining.saturating_sub(1),
|
||||
@@ -198,6 +199,223 @@ async fn sub_rlm_handler(
|
||||
})
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Batch endpoints
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Hard cap on prompts per batch request. Mirrors `tools/rlm_query.rs`.
|
||||
const MAX_BATCH: usize = 16;
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct BatchReq {
|
||||
prompts: Vec<String>,
|
||||
#[serde(default)]
|
||||
model: Option<String>,
|
||||
#[serde(default)]
|
||||
max_tokens: Option<u32>,
|
||||
#[serde(default)]
|
||||
system: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
struct BatchResp {
|
||||
results: Vec<LlmResp>,
|
||||
}
|
||||
|
||||
async fn llm_batch_handler(
|
||||
State(ctx): State<Arc<SidecarCtx>>,
|
||||
Json(req): Json<BatchReq>,
|
||||
) -> Json<BatchResp> {
|
||||
if req.prompts.is_empty() {
|
||||
return Json(BatchResp { results: vec![] });
|
||||
}
|
||||
if req.prompts.len() > MAX_BATCH {
|
||||
return Json(BatchResp {
|
||||
results: req
|
||||
.prompts
|
||||
.iter()
|
||||
.map(|_| LlmResp {
|
||||
text: String::new(),
|
||||
error: Some(format!(
|
||||
"batch too large: {} > {MAX_BATCH}",
|
||||
req.prompts.len()
|
||||
)),
|
||||
})
|
||||
.collect(),
|
||||
});
|
||||
}
|
||||
|
||||
let model = req
|
||||
.model
|
||||
.filter(|m| !m.is_empty())
|
||||
.unwrap_or_else(|| ctx.child_model.clone());
|
||||
let max_tokens = req.max_tokens.unwrap_or(DEFAULT_CHILD_MAX_TOKENS);
|
||||
let system = req.system;
|
||||
|
||||
let futures = req.prompts.into_iter().map(|prompt| {
|
||||
let client = ctx.client.clone();
|
||||
let model = model.clone();
|
||||
let system = system.clone();
|
||||
async move {
|
||||
let request = MessageRequest {
|
||||
model,
|
||||
messages: vec![Message {
|
||||
role: "user".to_string(),
|
||||
content: vec![ContentBlock::Text {
|
||||
text: prompt,
|
||||
cache_control: None,
|
||||
}],
|
||||
}],
|
||||
max_tokens,
|
||||
system: system.map(crate::models::SystemPrompt::Text),
|
||||
tools: None,
|
||||
tool_choice: None,
|
||||
metadata: None,
|
||||
thinking: None,
|
||||
reasoning_effort: None,
|
||||
stream: Some(false),
|
||||
temperature: Some(0.4_f32),
|
||||
top_p: Some(0.9_f32),
|
||||
};
|
||||
let fut = client.create_message(request);
|
||||
tokio::time::timeout(std::time::Duration::from_secs(CHILD_TIMEOUT_SECS), fut).await
|
||||
}
|
||||
});
|
||||
|
||||
let outcomes = futures_util::future::join_all(futures).await;
|
||||
|
||||
let mut results = Vec::with_capacity(outcomes.len());
|
||||
let mut total_input = 0_u32;
|
||||
let mut total_output = 0_u32;
|
||||
|
||||
for outcome in outcomes {
|
||||
match outcome {
|
||||
Ok(Ok(resp)) => {
|
||||
let text = resp
|
||||
.content
|
||||
.iter()
|
||||
.filter_map(|b| match b {
|
||||
ContentBlock::Text { text, .. } => Some(text.as_str()),
|
||||
_ => None,
|
||||
})
|
||||
.collect::<Vec<_>>()
|
||||
.join("\n");
|
||||
total_input = total_input.saturating_add(resp.usage.input_tokens);
|
||||
total_output = total_output.saturating_add(resp.usage.output_tokens);
|
||||
results.push(LlmResp { text, error: None });
|
||||
}
|
||||
Ok(Err(e)) => results.push(LlmResp {
|
||||
text: String::new(),
|
||||
error: Some(format!("{e}")),
|
||||
}),
|
||||
Err(_) => results.push(LlmResp {
|
||||
text: String::new(),
|
||||
error: Some(format!("timed out after {CHILD_TIMEOUT_SECS}s")),
|
||||
}),
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
let mut u = ctx.usage.lock().await;
|
||||
u.input_tokens = u.input_tokens.saturating_add(total_input);
|
||||
u.output_tokens = u.output_tokens.saturating_add(total_output);
|
||||
}
|
||||
|
||||
Json(BatchResp { results })
|
||||
}
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct RlmBatchReq {
|
||||
prompts: Vec<String>,
|
||||
}
|
||||
|
||||
async fn rlm_batch_handler(
|
||||
State(ctx): State<Arc<SidecarCtx>>,
|
||||
Json(req): Json<RlmBatchReq>,
|
||||
) -> Json<BatchResp> {
|
||||
if req.prompts.is_empty() {
|
||||
return Json(BatchResp { results: vec![] });
|
||||
}
|
||||
if req.prompts.len() > MAX_BATCH {
|
||||
return Json(BatchResp {
|
||||
results: req
|
||||
.prompts
|
||||
.iter()
|
||||
.map(|_| LlmResp {
|
||||
text: String::new(),
|
||||
error: Some(format!(
|
||||
"batch too large: {} > {MAX_BATCH}",
|
||||
req.prompts.len()
|
||||
)),
|
||||
})
|
||||
.collect(),
|
||||
});
|
||||
}
|
||||
if ctx.depth_remaining == 0 {
|
||||
return Json(BatchResp {
|
||||
results: req
|
||||
.prompts
|
||||
.iter()
|
||||
.map(|_| LlmResp {
|
||||
text: String::new(),
|
||||
error: Some("rlm_query_batched: recursion budget exhausted".to_string()),
|
||||
})
|
||||
.collect(),
|
||||
});
|
||||
}
|
||||
|
||||
let futures = req.prompts.into_iter().map(|prompt| {
|
||||
let client = ctx.client.clone();
|
||||
let child_model = ctx.child_model.clone();
|
||||
let depth = ctx.depth_remaining.saturating_sub(1);
|
||||
async move {
|
||||
let (tx, mut rx) = tokio::sync::mpsc::channel(64);
|
||||
let drain = tokio::spawn(async move { while rx.recv().await.is_some() {} });
|
||||
// Same dyn-erasure pattern as sub_rlm_handler — break the recursive
|
||||
// future-type cycle through the boxed dyn return of run_rlm_turn_inner.
|
||||
let result = super::turn::run_rlm_turn_inner(
|
||||
&client,
|
||||
child_model.clone(),
|
||||
prompt,
|
||||
None,
|
||||
child_model,
|
||||
tx,
|
||||
depth,
|
||||
)
|
||||
.await;
|
||||
drain.abort();
|
||||
result
|
||||
}
|
||||
});
|
||||
|
||||
let results_raw = futures_util::future::join_all(futures).await;
|
||||
|
||||
let mut results = Vec::with_capacity(results_raw.len());
|
||||
let mut total_input = 0_u32;
|
||||
let mut total_output = 0_u32;
|
||||
|
||||
for result in results_raw {
|
||||
total_input = total_input.saturating_add(result.usage.input_tokens);
|
||||
total_output = total_output.saturating_add(result.usage.output_tokens);
|
||||
results.push(LlmResp {
|
||||
text: result.answer,
|
||||
error: result.error,
|
||||
});
|
||||
}
|
||||
|
||||
{
|
||||
let mut u = ctx.usage.lock().await;
|
||||
u.input_tokens = u.input_tokens.saturating_add(total_input);
|
||||
u.output_tokens = u.output_tokens.saturating_add(total_output);
|
||||
}
|
||||
|
||||
Json(BatchResp { results })
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Handle / start
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Result of starting the sidecar — the bound socket address and the task
|
||||
/// handle. Drop or abort the handle to stop the server.
|
||||
pub struct SidecarHandle {
|
||||
@@ -209,9 +427,15 @@ impl SidecarHandle {
|
||||
pub fn llm_url(&self) -> String {
|
||||
format!("http://{}/llm", self.addr)
|
||||
}
|
||||
pub fn llm_batch_url(&self) -> String {
|
||||
format!("http://{}/llm_batch", self.addr)
|
||||
}
|
||||
pub fn rlm_url(&self) -> String {
|
||||
format!("http://{}/rlm", self.addr)
|
||||
}
|
||||
pub fn rlm_batch_url(&self) -> String {
|
||||
format!("http://{}/rlm_batch", self.addr)
|
||||
}
|
||||
pub fn shutdown(self) {
|
||||
self.task.abort();
|
||||
}
|
||||
@@ -224,7 +448,9 @@ pub async fn start_sidecar(ctx: Arc<SidecarCtx>) -> std::io::Result<SidecarHandl
|
||||
let addr = listener.local_addr()?;
|
||||
let app = Router::new()
|
||||
.route("/llm", post(llm_handler))
|
||||
.route("/llm_batch", post(llm_batch_handler))
|
||||
.route("/rlm", post(sub_rlm_handler))
|
||||
.route("/rlm_batch", post(rlm_batch_handler))
|
||||
.with_state(ctx);
|
||||
let task = tokio::spawn(async move {
|
||||
let _ = axum::serve(listener, app).await;
|
||||
|
||||
+268
-63
@@ -101,11 +101,13 @@ pub struct RlmTurnResult {
|
||||
pub termination: RlmTermination,
|
||||
}
|
||||
|
||||
/// Run a full RLM turn per Algorithm 1 with a default recursion depth.
|
||||
/// Run a full RLM turn (paper Algorithm 1).
|
||||
///
|
||||
/// `max_depth` controls how many levels of `sub_rlm()` recursion are allowed
|
||||
/// inside the REPL. Paper experiments use depth=1; we default to that and
|
||||
/// expose it via `Op::RlmQuery` so the caller can tune it.
|
||||
/// `prompt` is loaded into the REPL as the `context` variable and never
|
||||
/// enters the root LLM's window. `root_prompt` (optional) is a small
|
||||
/// instruction shown to the root LLM each iteration — typically the
|
||||
/// model's task ("summarize the security model"). `max_depth` controls
|
||||
/// how many levels of `rlm_query()` recursion the sub-agent may use.
|
||||
pub async fn run_rlm_turn(
|
||||
client: &DeepSeekClient,
|
||||
model: String,
|
||||
@@ -114,7 +116,39 @@ pub async fn run_rlm_turn(
|
||||
tx_event: mpsc::Sender<Event>,
|
||||
max_depth: u32,
|
||||
) -> RlmTurnResult {
|
||||
run_rlm_turn_inner(client, model, prompt, child_model, tx_event, max_depth).await
|
||||
run_rlm_turn_inner(
|
||||
client,
|
||||
model,
|
||||
prompt,
|
||||
None,
|
||||
child_model,
|
||||
tx_event,
|
||||
max_depth,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Variant that lets the caller pass a small `root_prompt` shown to the
|
||||
/// root LLM each iteration. The big `prompt` still lives only in the REPL.
|
||||
pub async fn run_rlm_turn_with_root(
|
||||
client: &DeepSeekClient,
|
||||
model: String,
|
||||
prompt: String,
|
||||
root_prompt: Option<String>,
|
||||
child_model: String,
|
||||
tx_event: mpsc::Sender<Event>,
|
||||
max_depth: u32,
|
||||
) -> RlmTurnResult {
|
||||
run_rlm_turn_inner(
|
||||
client,
|
||||
model,
|
||||
prompt,
|
||||
root_prompt,
|
||||
child_model,
|
||||
tx_event,
|
||||
max_depth,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Inner entry point — also used by the sidecar's `/rlm` handler when it
|
||||
@@ -127,6 +161,7 @@ pub(crate) fn run_rlm_turn_inner<'a>(
|
||||
client: &'a DeepSeekClient,
|
||||
model: String,
|
||||
prompt: String,
|
||||
root_prompt: Option<String>,
|
||||
child_model: String,
|
||||
tx_event: mpsc::Sender<Event>,
|
||||
max_depth: u32,
|
||||
@@ -135,6 +170,7 @@ pub(crate) fn run_rlm_turn_inner<'a>(
|
||||
client,
|
||||
model,
|
||||
prompt,
|
||||
root_prompt,
|
||||
child_model,
|
||||
tx_event,
|
||||
max_depth,
|
||||
@@ -145,6 +181,7 @@ async fn run_rlm_turn_inner_impl(
|
||||
client: &DeepSeekClient,
|
||||
model: String,
|
||||
prompt: String,
|
||||
root_prompt: Option<String>,
|
||||
child_model: String,
|
||||
tx_event: mpsc::Sender<Event>,
|
||||
max_depth: u32,
|
||||
@@ -171,17 +208,20 @@ async fn run_rlm_turn_inner_impl(
|
||||
}
|
||||
};
|
||||
let llm_url = sidecar.llm_url();
|
||||
let llm_batch_url = sidecar.llm_batch_url();
|
||||
let rlm_url = sidecar.rlm_url();
|
||||
let rlm_batch_url = sidecar.rlm_batch_url();
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 1. Initialise REPL with PROMPT variable
|
||||
// 1. Initialise REPL with `context` variable
|
||||
// ------------------------------------------------------------------
|
||||
let state_dir = std::env::temp_dir().join("deepseek_rlm");
|
||||
let _ = std::fs::create_dir_all(&state_dir);
|
||||
let state_path = state_dir.join(format!("rlm_{}.json", uuid::Uuid::new_v4()));
|
||||
|
||||
// Write PROMPT into the REPL state before the REPL even starts.
|
||||
let initial_vars = json!({"PROMPT": &prompt});
|
||||
// Write `context` into the REPL state before the REPL even starts.
|
||||
// Matches the reference (alexzhang13/rlm) variable name.
|
||||
let initial_vars = json!({"context": &prompt});
|
||||
if let Err(e) = std::fs::write(&state_path, serde_json::to_string(&initial_vars).unwrap()) {
|
||||
sidecar.shutdown();
|
||||
return RlmTurnResult {
|
||||
@@ -196,7 +236,9 @@ async fn run_rlm_turn_inner_impl(
|
||||
|
||||
let mut repl = PythonRuntime::with_state_path(state_path.clone());
|
||||
repl.set_env("REPL_LLM_URL", &llm_url);
|
||||
repl.set_env("REPL_LLM_BATCH_URL", &llm_batch_url);
|
||||
repl.set_env("REPL_RLM_URL", &rlm_url);
|
||||
repl.set_env("REPL_RLM_BATCH_URL", &rlm_batch_url);
|
||||
|
||||
let _ = tx_event
|
||||
.send(Event::status(format!(
|
||||
@@ -208,7 +250,8 @@ async fn run_rlm_turn_inner_impl(
|
||||
// 2. Build metadata-only conversation history
|
||||
// ------------------------------------------------------------------
|
||||
let system = rlm_system_prompt();
|
||||
let metadata_msg = build_metadata_message(&prompt, 0, None, None, &state_path);
|
||||
let metadata_msg =
|
||||
build_metadata_message(&prompt, root_prompt.as_deref(), 0, None, None, &state_path);
|
||||
|
||||
let mut messages: Vec<Message> = vec![metadata_msg];
|
||||
|
||||
@@ -289,8 +332,29 @@ async fn run_rlm_turn_inner_impl(
|
||||
})
|
||||
.await;
|
||||
|
||||
// 3b. Extract Python code from the response — strict mode.
|
||||
let code = extract_python_code(&response_text);
|
||||
// 3b. Check for a text-level FINAL(...) / FINAL_VAR(...) in the
|
||||
// model's raw response. The reference RLM allows the model to
|
||||
// close out by writing `FINAL(my answer)` directly in its
|
||||
// message, without going through a ```repl block.
|
||||
if let Some(final_val) = parse_text_final(&response_text) {
|
||||
let _ = tx_event
|
||||
.send(Event::status(
|
||||
"RLM: FINAL detected in response text".to_string(),
|
||||
))
|
||||
.await;
|
||||
break 'turn RlmTurnResult {
|
||||
answer: final_val,
|
||||
iterations: iteration + 1,
|
||||
duration: start.elapsed(),
|
||||
error: None,
|
||||
usage: total_usage,
|
||||
termination: RlmTermination::Final,
|
||||
};
|
||||
}
|
||||
|
||||
// 3c. Extract a ```repl block. Match the reference RLM's
|
||||
// language identifier so the same prompts/examples work here.
|
||||
let code = extract_repl_code(&response_text);
|
||||
|
||||
let code_to_run = match code {
|
||||
Some(c) => {
|
||||
@@ -300,11 +364,8 @@ async fn run_rlm_turn_inner_impl(
|
||||
None => {
|
||||
consecutive_no_code = consecutive_no_code.saturating_add(1);
|
||||
if consecutive_no_code >= MAX_CONSECUTIVE_NO_CODE {
|
||||
// Give up — emit what the model said and exit as
|
||||
// a (degraded) direct answer. This matches the
|
||||
// paper's expectation that the loop ends only via
|
||||
// FINAL, but we prefer to surface the model's
|
||||
// text rather than throw the whole turn away.
|
||||
// Give up — surface the model's text as a degraded
|
||||
// direct answer rather than throwing the turn away.
|
||||
let _ = tx_event
|
||||
.send(Event::MessageDelta {
|
||||
index: iteration as usize,
|
||||
@@ -331,7 +392,7 @@ async fn run_rlm_turn_inner_impl(
|
||||
messages.push(Message {
|
||||
role: "user".to_string(),
|
||||
content: vec![ContentBlock::Text {
|
||||
text: "Reminder: you MUST emit Python inside a ```python … ``` fence and call FINAL(value) when you have the answer. Reply with one ```python block now.".to_string(),
|
||||
text: "Reminder: emit Python inside a ```repl … ``` fence (or write FINAL(value) on its own line) when you have the answer. Reply with a ```repl block now.".to_string(),
|
||||
cache_control: None,
|
||||
}],
|
||||
});
|
||||
@@ -342,7 +403,7 @@ async fn run_rlm_turn_inner_impl(
|
||||
let _ = tx_event
|
||||
.send(Event::MessageDelta {
|
||||
index: iteration as usize,
|
||||
content: format!("```python\n{code_to_run}\n```\n"),
|
||||
content: format!("```repl\n{code_to_run}\n```\n"),
|
||||
})
|
||||
.await;
|
||||
|
||||
@@ -401,7 +462,7 @@ async fn run_rlm_turn_inner_impl(
|
||||
messages.push(Message {
|
||||
role: "assistant".to_string(),
|
||||
content: vec![ContentBlock::Text {
|
||||
text: format!("```python\n{code_to_run}\n```"),
|
||||
text: format!("```repl\n{code_to_run}\n```"),
|
||||
cache_control: None,
|
||||
}],
|
||||
});
|
||||
@@ -409,6 +470,7 @@ async fn run_rlm_turn_inner_impl(
|
||||
// User message: metadata about stdout + current REPL state
|
||||
let next_metadata = build_metadata_message(
|
||||
&prompt,
|
||||
root_prompt.as_deref(),
|
||||
iteration + 1,
|
||||
Some(&code_to_run),
|
||||
Some(&stdout_display),
|
||||
@@ -474,13 +536,15 @@ async fn run_rlm_turn_inner_impl(
|
||||
|
||||
/// Build a metadata message describing the current REPL state.
|
||||
///
|
||||
/// This is what the paper calls `Metadata(state)`. We surface:
|
||||
/// - PROMPT length (chars) and a short preview
|
||||
/// - access patterns the model can use to slice / index PROMPT
|
||||
/// This is `Metadata(state)` from the paper. We surface:
|
||||
/// - `context` length (chars) and a short preview
|
||||
/// - the small `root_prompt` (if any) — repeated each iteration
|
||||
/// - REPL helpers the sub-agent can call
|
||||
/// - keys currently present in the REPL variable store
|
||||
/// - the previous round's code summary and stdout preview (when applicable)
|
||||
/// - the previous round's code summary and stdout preview
|
||||
fn build_metadata_message(
|
||||
prompt: &str,
|
||||
root_prompt: Option<&str>,
|
||||
iteration: u32,
|
||||
previous_code: Option<&str>,
|
||||
previous_stdout: Option<&str>,
|
||||
@@ -493,20 +557,34 @@ fn build_metadata_message(
|
||||
|
||||
parts.push(format!("## REPL State (Round {iteration})"));
|
||||
parts.push(String::new());
|
||||
parts.push("**PROMPT** — stored as REPL variable `PROMPT`".to_string());
|
||||
if let Some(rp) = root_prompt
|
||||
&& !rp.trim().is_empty()
|
||||
{
|
||||
parts.push("**Original task** (re-shown every round)".to_string());
|
||||
parts.push(format!("> {}", truncate_text(rp.trim(), 600)));
|
||||
parts.push(String::new());
|
||||
}
|
||||
parts.push("**`context`** — REPL variable holding the full input".to_string());
|
||||
parts.push(format!("- Length: {prompt_len} chars"));
|
||||
parts.push(format!("- Preview: \"{prompt_preview}\""));
|
||||
parts.push(String::new());
|
||||
|
||||
parts.push("**Access patterns** (use inside ```python blocks)".to_string());
|
||||
parts.push("- `text = repl_get(\"PROMPT\")` — full string".to_string());
|
||||
parts.push("- `len(repl_get(\"PROMPT\"))` — char count".to_string());
|
||||
parts.push("- `repl_get(\"PROMPT\")[a:b]` — slice".to_string());
|
||||
parts.push("- `repl_get(\"PROMPT\").splitlines()[i]` — by line".to_string());
|
||||
parts.push("- `repl_set(\"name\", value)` — cache across rounds".to_string());
|
||||
parts.push("- `result = llm_query(prompt, ...)` — one-shot child LLM".to_string());
|
||||
parts.push("- `result = sub_rlm(prompt)` — full recursive RLM call".to_string());
|
||||
parts.push("- `FINAL(value)` — end the loop".to_string());
|
||||
parts.push("**REPL helpers** (use inside ```repl blocks)".to_string());
|
||||
parts.push("- `context` — the full input string".to_string());
|
||||
parts.push("- `len(context)` / `context[a:b]` / `context.splitlines()` — slice it".to_string());
|
||||
parts.push("- `llm_query(prompt, model=None)` — one-shot child LLM".to_string());
|
||||
parts.push("- `llm_query_batched([p1, p2, ...])` — concurrent fanout".to_string());
|
||||
parts.push("- `rlm_query(prompt, model=None)` — recursive sub-RLM".to_string());
|
||||
parts.push("- `rlm_query_batched([p1, p2, ...])` — concurrent recursive RLMs".to_string());
|
||||
parts.push("- `SHOW_VARS()` — list user variables".to_string());
|
||||
parts.push("- `repl_set(name, value)` / `repl_get(name)` — explicit store".to_string());
|
||||
parts.push(
|
||||
"- `FINAL(value)` — end the loop with this answer".to_string(),
|
||||
);
|
||||
parts.push(
|
||||
"- `FINAL_VAR(name)` — end the loop with a variable's value"
|
||||
.to_string(),
|
||||
);
|
||||
parts.push(String::new());
|
||||
|
||||
// Variables currently in the persistent store.
|
||||
@@ -576,10 +654,21 @@ fn extract_text_blocks(blocks: &[ContentBlock]) -> String {
|
||||
.join("\n")
|
||||
}
|
||||
|
||||
/// Extract the first ```python code block from text.
|
||||
/// Returns `None` if no python fence is found.
|
||||
fn extract_python_code(text: &str) -> Option<String> {
|
||||
let start_markers = ["```python\n", "```py\n", "```python\r\n", "```py\r\n"];
|
||||
/// Extract the first ```repl code block from the model's response.
|
||||
///
|
||||
/// Matches the reference RLM (alexzhang13/rlm) language identifier so the
|
||||
/// same prompts and examples work here. Falls back to ```python /
|
||||
/// ```py for backward compatibility with text the model may have learned
|
||||
/// from earlier prompt versions.
|
||||
fn extract_repl_code(text: &str) -> Option<String> {
|
||||
let start_markers = [
|
||||
"```repl\n",
|
||||
"```repl\r\n",
|
||||
"```python\n",
|
||||
"```py\n",
|
||||
"```python\r\n",
|
||||
"```py\r\n",
|
||||
];
|
||||
let mut best_start: Option<(usize, &str)> = None;
|
||||
|
||||
for marker in &start_markers {
|
||||
@@ -610,6 +699,72 @@ fn extract_python_code(text: &str) -> Option<String> {
|
||||
Some(code)
|
||||
}
|
||||
|
||||
/// Parse a `FINAL(...)` or `FINAL_VAR(...)` directive from the model's raw
|
||||
/// response text. Mirrors `find_final_answer` from the reference RLM:
|
||||
/// the directive must appear at the start of a line. For `FINAL_VAR`,
|
||||
/// returns `None` (we can't resolve the variable from text alone — the
|
||||
/// REPL-level `FINAL_VAR()` Python helper handles that path).
|
||||
fn parse_text_final(text: &str) -> Option<String> {
|
||||
// Skip if the FINAL appears inside a code fence — we already handle
|
||||
// those via the REPL's `__REPL_FINAL__::` sentinel.
|
||||
let outside_fence = strip_code_fences(text);
|
||||
|
||||
for line in outside_fence.lines() {
|
||||
let trimmed = line.trim_start();
|
||||
if let Some(rest) = trimmed.strip_prefix("FINAL_VAR(") {
|
||||
// Heuristic: if a single quoted/bareword ident, defer to the
|
||||
// REPL path. Otherwise treat the inner literal as the answer.
|
||||
let inner = rest.trim_end_matches(')').trim();
|
||||
// Treat as "use REPL FINAL_VAR" — return None and let the next
|
||||
// round's REPL evaluation resolve it. Currently we just skip;
|
||||
// the model can use FINAL(value) for direct text-level exit.
|
||||
let _ = inner;
|
||||
continue;
|
||||
}
|
||||
if let Some(rest) = trimmed.strip_prefix("FINAL(") {
|
||||
let inner = rest.trim_end();
|
||||
// Take everything up to the LAST closing paren on this line.
|
||||
if let Some(end) = inner.rfind(')') {
|
||||
let value = inner[..end].trim();
|
||||
if !value.is_empty() {
|
||||
return Some(strip_quotes(value));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Drop content inside ``` … ``` fenced blocks so we don't read FINAL()
|
||||
/// out of code the model wrote (that path is handled by the REPL sentinel).
|
||||
fn strip_code_fences(text: &str) -> String {
|
||||
let mut out = String::with_capacity(text.len());
|
||||
let mut in_fence = false;
|
||||
for line in text.lines() {
|
||||
if line.trim_start().starts_with("```") {
|
||||
in_fence = !in_fence;
|
||||
continue;
|
||||
}
|
||||
if !in_fence {
|
||||
out.push_str(line);
|
||||
out.push('\n');
|
||||
}
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
/// Strip one layer of matching surrounding quotes (`"…"` or `'…'`).
|
||||
fn strip_quotes(s: &str) -> String {
|
||||
let bytes = s.as_bytes();
|
||||
if bytes.len() >= 2
|
||||
&& ((bytes[0] == b'"' && bytes[bytes.len() - 1] == b'"')
|
||||
|| (bytes[0] == b'\'' && bytes[bytes.len() - 1] == b'\''))
|
||||
{
|
||||
return s[1..s.len() - 1].to_string();
|
||||
}
|
||||
s.to_string()
|
||||
}
|
||||
|
||||
/// Truncate text to `max_chars` (counted by Unicode chars), adding an
|
||||
/// ellipsis if truncated. Char-safe: never splits a multi-byte codepoint.
|
||||
fn truncate_text(text: &str, max_chars: usize) -> String {
|
||||
@@ -643,57 +798,58 @@ mod tests {
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_finds_simple_block() {
|
||||
let text = "Here's some code:\n```python\nprint('hello')\n```\nEnd.";
|
||||
let code = extract_python_code(text).unwrap();
|
||||
fn extract_repl_code_finds_simple_block() {
|
||||
let text = "Here's some code:\n```repl\nprint('hello')\n```\nEnd.";
|
||||
let code = extract_repl_code(text).unwrap();
|
||||
assert_eq!(code, "print('hello')");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_finds_short_marker() {
|
||||
let text = "Code:\n```py\nx = 1 + 2\n```";
|
||||
let code = extract_python_code(text).unwrap();
|
||||
fn extract_repl_code_falls_back_to_python_marker() {
|
||||
// Backward compat: if the model still emits ```python we accept it.
|
||||
let text = "Code:\n```python\nx = 1 + 2\n```";
|
||||
let code = extract_repl_code(text).unwrap();
|
||||
assert_eq!(code, "x = 1 + 2");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_returns_none_when_missing() {
|
||||
fn extract_repl_code_returns_none_when_missing() {
|
||||
let text = "Just some text without code fences.";
|
||||
assert!(extract_python_code(text).is_none());
|
||||
assert!(extract_repl_code(text).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_returns_none_on_empty_block() {
|
||||
let text = "Code:\n```python\n\n```";
|
||||
assert!(extract_python_code(text).is_none());
|
||||
fn extract_repl_code_returns_none_on_empty_block() {
|
||||
let text = "Code:\n```repl\n\n```";
|
||||
assert!(extract_repl_code(text).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_handles_multiple_blocks() {
|
||||
let text = "First:\n```python\na=1\n```\nSecond:\n```python\nb=2\n```";
|
||||
let code = extract_python_code(text).unwrap();
|
||||
fn extract_repl_code_handles_multiple_blocks() {
|
||||
let text = "First:\n```repl\na=1\n```\nSecond:\n```repl\nb=2\n```";
|
||||
let code = extract_repl_code(text).unwrap();
|
||||
assert_eq!(code, "a=1");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_python_code_ignores_other_fences() {
|
||||
let text = "```\nsome text\n```\nActual:\n```python\nreal_code()\n```";
|
||||
let code = extract_python_code(text).unwrap();
|
||||
fn extract_repl_code_ignores_other_fences() {
|
||||
let text = "```\nsome text\n```\nActual:\n```repl\nreal_code()\n```";
|
||||
let code = extract_repl_code(text).unwrap();
|
||||
assert_eq!(code, "real_code()");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn build_metadata_contains_key_information() {
|
||||
let path = tmp_state_path("meta_basic");
|
||||
std::fs::write(&path, "{\"PROMPT\":\"Hello, world!\"}").unwrap();
|
||||
std::fs::write(&path, "{\"context\":\"Hello, world!\"}").unwrap();
|
||||
let prompt = "Hello, world!";
|
||||
let msg = build_metadata_message(prompt, 0, None, None, &path);
|
||||
let msg = build_metadata_message(prompt, None, 0, None, None, &path);
|
||||
let text = extract_text_blocks(&msg.content);
|
||||
assert!(text.contains("PROMPT"));
|
||||
assert!(text.contains("context"));
|
||||
assert!(text.contains("Hello, world!"));
|
||||
assert!(text.contains("Round 0"));
|
||||
assert!(text.contains("llm_query"));
|
||||
assert!(text.contains("sub_rlm"));
|
||||
assert!(text.contains("rlm_query"));
|
||||
assert!(text.contains("FINAL"));
|
||||
let _ = std::fs::remove_file(&path);
|
||||
}
|
||||
@@ -703,13 +859,13 @@ mod tests {
|
||||
let path = tmp_state_path("meta_vars");
|
||||
std::fs::write(
|
||||
&path,
|
||||
"{\"PROMPT\":\"x\",\"chunk_summaries\":[\"a\"],\"counter\":1}",
|
||||
"{\"context\":\"x\",\"chunk_summaries\":[\"a\"],\"counter\":1}",
|
||||
)
|
||||
.unwrap();
|
||||
let msg = build_metadata_message("x", 1, Some("noop"), Some("ok"), &path);
|
||||
let msg = build_metadata_message("x", None, 1, Some("noop"), Some("ok"), &path);
|
||||
let text = extract_text_blocks(&msg.content);
|
||||
assert!(text.contains("Variables in REPL state"));
|
||||
assert!(text.contains("\"PROMPT\""));
|
||||
assert!(text.contains("\"context\""));
|
||||
assert!(text.contains("\"chunk_summaries\""));
|
||||
assert!(text.contains("\"counter\""));
|
||||
let _ = std::fs::remove_file(&path);
|
||||
@@ -719,7 +875,14 @@ mod tests {
|
||||
fn build_metadata_with_iteration_shows_previous_code() {
|
||||
let path = tmp_state_path("meta_prev");
|
||||
std::fs::write(&path, "{}").unwrap();
|
||||
let msg = build_metadata_message("Test prompt", 3, Some("print('hi')"), Some("hi"), &path);
|
||||
let msg = build_metadata_message(
|
||||
"Test prompt",
|
||||
None,
|
||||
3,
|
||||
Some("print('hi')"),
|
||||
Some("hi"),
|
||||
&path,
|
||||
);
|
||||
let text = extract_text_blocks(&msg.content);
|
||||
assert!(text.contains("Round 3"));
|
||||
assert!(text.contains("print('hi')"));
|
||||
@@ -727,6 +890,48 @@ mod tests {
|
||||
let _ = std::fs::remove_file(&path);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn build_metadata_includes_root_prompt_when_provided() {
|
||||
let path = tmp_state_path("meta_root");
|
||||
std::fs::write(&path, "{}").unwrap();
|
||||
let msg = build_metadata_message(
|
||||
"long context text",
|
||||
Some("Summarize the security model"),
|
||||
1,
|
||||
Some("# noop"),
|
||||
Some("ok"),
|
||||
&path,
|
||||
);
|
||||
let text = extract_text_blocks(&msg.content);
|
||||
assert!(text.contains("Original task"));
|
||||
assert!(text.contains("Summarize the security model"));
|
||||
let _ = std::fs::remove_file(&path);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_text_final_extracts_simple_value() {
|
||||
let text = "OK here's the answer.\nFINAL(42)\nThanks.";
|
||||
assert_eq!(parse_text_final(text).as_deref(), Some("42"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_text_final_strips_quotes() {
|
||||
let text = "FINAL(\"the answer is yes\")";
|
||||
assert_eq!(parse_text_final(text).as_deref(), Some("the answer is yes"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_text_final_ignores_inside_code_fence() {
|
||||
let text =
|
||||
"Some prose.\n```repl\n# Note: when ready, call FINAL(value)\nx = 1\n```\nMore prose.";
|
||||
assert!(parse_text_final(text).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_text_final_returns_none_when_absent() {
|
||||
assert!(parse_text_final("just talking, no final.").is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn truncate_text_leaves_short_text_alone() {
|
||||
assert_eq!(truncate_text("hello", 100), "hello");
|
||||
@@ -782,7 +987,7 @@ mod tests {
|
||||
fn metadata_msg_role_is_user() {
|
||||
let path = tmp_state_path("meta_role");
|
||||
std::fs::write(&path, "{}").unwrap();
|
||||
let msg = build_metadata_message("test", 0, None, None, &path);
|
||||
let msg = build_metadata_message("test", None, 0, None, None, &path);
|
||||
assert_eq!(msg.role, "user");
|
||||
let _ = std::fs::remove_file(&path);
|
||||
}
|
||||
|
||||
@@ -14,6 +14,7 @@ pub mod plan;
|
||||
pub mod project;
|
||||
pub mod registry;
|
||||
pub mod review;
|
||||
pub mod rlm_process;
|
||||
pub mod rlm_query;
|
||||
pub mod search;
|
||||
pub mod shell;
|
||||
|
||||
@@ -389,6 +389,15 @@ impl ToolRegistryBuilder {
|
||||
self.with_tool(Arc::new(RlmQueryTool::new(client)))
|
||||
}
|
||||
|
||||
/// Include the heavy-lift RLM tool (`rlm_process`). Runs the full
|
||||
/// recursive language-model loop on a long input (file or inline
|
||||
/// content); the long input never enters the calling model's context.
|
||||
#[must_use]
|
||||
pub fn with_rlm_process_tool(self, client: Option<DeepSeekClient>, root_model: String) -> Self {
|
||||
use super::rlm_process::RlmProcessTool;
|
||||
self.with_tool(Arc::new(RlmProcessTool::new(client, root_model)))
|
||||
}
|
||||
|
||||
/// Include the review tool.
|
||||
#[must_use]
|
||||
pub fn with_review_tool(self, client: Option<DeepSeekClient>, model: String) -> Self {
|
||||
|
||||
@@ -0,0 +1,342 @@
|
||||
//! `rlm_process` tool — heavy-lift recursive language model as a tool call.
|
||||
//!
|
||||
//! Where `rlm_query` is a parallel fanout primitive (N prompts → N answers,
|
||||
//! stateless), `rlm_process` runs the full recursive-language-model loop
|
||||
//! against a long input. The input is loaded into a Python REPL as the
|
||||
//! `PROMPT` variable; a sub-agent writes code to chunk it, calls
|
||||
//! `llm_query()` / `sub_rlm()` for sub-LLM work, and returns a final string
|
||||
//! via `FINAL()`. The model never has to put the long input in its own
|
||||
//! context window — it just calls the tool with `task` + `file_path` (or
|
||||
//! inline `content`) and reads the synthesized answer back.
|
||||
//!
|
||||
//! Use when the input genuinely doesn't fit in working context: a whole
|
||||
//! file, a long transcript, a multi-document corpus. For short prompts or
|
||||
//! parallel fanout, prefer `rlm_query`.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use serde_json::{Value, json};
|
||||
|
||||
use crate::client::DeepSeekClient;
|
||||
use crate::rlm::turn::{RlmTermination, run_rlm_turn_with_root};
|
||||
use crate::tools::spec::{
|
||||
ApprovalRequirement, ToolCapability, ToolContext, ToolError, ToolResult, ToolSpec,
|
||||
};
|
||||
|
||||
/// Default child model — cheap and fast.
|
||||
const DEFAULT_CHILD_MODEL: &str = "deepseek-v4-flash";
|
||||
/// Default `sub_rlm` recursion budget — paper experiments use 1.
|
||||
const DEFAULT_MAX_DEPTH: u32 = 1;
|
||||
/// Hard cap on how many chars of inline `content` we'll accept. Larger
|
||||
/// inputs should come in via `file_path` so they never enter the caller's
|
||||
/// context in the first place.
|
||||
const MAX_INLINE_CONTENT_CHARS: usize = 200_000;
|
||||
|
||||
pub struct RlmProcessTool {
|
||||
/// Production HTTP client. `None` when no API key is configured.
|
||||
client: Option<DeepSeekClient>,
|
||||
/// Root model to drive the RLM loop. Set at registration time; matches
|
||||
/// whatever model the parent session is using.
|
||||
root_model: String,
|
||||
}
|
||||
|
||||
impl RlmProcessTool {
|
||||
#[must_use]
|
||||
pub fn new(client: Option<DeepSeekClient>, root_model: String) -> Self {
|
||||
Self { client, root_model }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl ToolSpec for RlmProcessTool {
|
||||
fn name(&self) -> &'static str {
|
||||
"rlm_process"
|
||||
}
|
||||
|
||||
fn description(&self) -> &'static str {
|
||||
"Heavy-lift recursive language model. Use when you have a long input \
|
||||
(a whole file, a long transcript, a doc) that doesn't fit in your \
|
||||
working context. The input is loaded into a sandboxed Python REPL \
|
||||
where a sub-agent writes code to chunk and process it via sub-LLM \
|
||||
calls, and returns a synthesized answer. Provide `task` (what to \
|
||||
do) plus exactly one of `file_path` (relative to workspace, \
|
||||
preferred) or `content` (inline, capped at 200k chars). Slower and \
|
||||
pricier than `read_file` / `rlm_query` — only reach for it when \
|
||||
the input genuinely doesn't fit. Returns the final answer string."
|
||||
}
|
||||
|
||||
fn input_schema(&self) -> Value {
|
||||
json!({
|
||||
"type": "object",
|
||||
"required": ["task"],
|
||||
"properties": {
|
||||
"task": {
|
||||
"type": "string",
|
||||
"description": "What to do with the input (e.g. \"Summarize the security model\", \"Extract all API endpoints\", \"Categorize each row by sentiment\"). The sub-agent uses this as its objective."
|
||||
},
|
||||
"file_path": {
|
||||
"type": "string",
|
||||
"description": "Workspace-relative path to a file to load as PROMPT. Preferred — keeps the long input out of your context. Mutually exclusive with `content`."
|
||||
},
|
||||
"content": {
|
||||
"type": "string",
|
||||
"description": "Inline content to load as PROMPT. Use only when the input isn't a file you can point at. Capped at 200k chars."
|
||||
},
|
||||
"child_model": {
|
||||
"type": "string",
|
||||
"description": "Model for sub-LLM (`llm_query`) calls inside the REPL. Default: deepseek-v4-flash."
|
||||
},
|
||||
"max_depth": {
|
||||
"type": "integer",
|
||||
"description": "Recursion budget for `sub_rlm()` calls. 0 disables recursion; default 1 matches paper experiments."
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
fn capabilities(&self) -> Vec<ToolCapability> {
|
||||
// Network for the LLM calls; ExecutesCode because the sub-agent
|
||||
// runs Python in the REPL (which can do filesystem operations
|
||||
// within its sandbox).
|
||||
vec![ToolCapability::Network, ToolCapability::ExecutesCode]
|
||||
}
|
||||
|
||||
fn approval_requirement(&self) -> ApprovalRequirement {
|
||||
// Same level as rlm_query: the model decided to invoke this, the
|
||||
// user already enabled tools by being in Agent/YOLO mode, and
|
||||
// every concrete side-effect (file read, LLM call) is bounded.
|
||||
ApprovalRequirement::Auto
|
||||
}
|
||||
|
||||
fn supports_parallel(&self) -> bool {
|
||||
// Each call spins its own sidecar on a kernel-assigned port and
|
||||
// its own per-turn state file, so two calls don't interfere.
|
||||
true
|
||||
}
|
||||
|
||||
async fn execute(&self, input: Value, context: &ToolContext) -> Result<ToolResult, ToolError> {
|
||||
let Some(client) = self.client.clone() else {
|
||||
return Err(ToolError::not_available(
|
||||
"rlm_process requires an active DeepSeek client".to_string(),
|
||||
));
|
||||
};
|
||||
|
||||
let task = input
|
||||
.get("task")
|
||||
.and_then(|v| v.as_str())
|
||||
.ok_or_else(|| ToolError::MissingField {
|
||||
field: "task".to_string(),
|
||||
})?
|
||||
.trim();
|
||||
if task.is_empty() {
|
||||
return Err(ToolError::invalid_input("rlm_process: `task` is empty"));
|
||||
}
|
||||
|
||||
let file_path = input.get("file_path").and_then(|v| v.as_str());
|
||||
let content = input.get("content").and_then(|v| v.as_str());
|
||||
|
||||
let body = match (file_path, content) {
|
||||
(Some(_), Some(_)) => {
|
||||
return Err(ToolError::invalid_input(
|
||||
"rlm_process: pass `file_path` OR `content`, not both",
|
||||
));
|
||||
}
|
||||
(None, None) => {
|
||||
return Err(ToolError::invalid_input(
|
||||
"rlm_process: requires `file_path` (preferred) or `content`",
|
||||
));
|
||||
}
|
||||
(Some(path), None) => {
|
||||
let resolved = context.resolve_path(path)?;
|
||||
tokio::fs::read_to_string(&resolved).await.map_err(|e| {
|
||||
ToolError::ExecutionFailed {
|
||||
message: format!("read {}: {e}", resolved.display()),
|
||||
}
|
||||
})?
|
||||
}
|
||||
(None, Some(c)) => {
|
||||
if c.chars().count() > MAX_INLINE_CONTENT_CHARS {
|
||||
return Err(ToolError::invalid_input(format!(
|
||||
"rlm_process: inline `content` is {} chars (cap {MAX_INLINE_CONTENT_CHARS}). Pass `file_path` for larger inputs.",
|
||||
c.chars().count()
|
||||
)));
|
||||
}
|
||||
c.to_string()
|
||||
}
|
||||
};
|
||||
|
||||
if body.trim().is_empty() {
|
||||
return Err(ToolError::invalid_input(
|
||||
"rlm_process: input is empty after loading",
|
||||
));
|
||||
}
|
||||
|
||||
let child_model = input
|
||||
.get("child_model")
|
||||
.and_then(|v| v.as_str())
|
||||
.filter(|s| !s.is_empty())
|
||||
.unwrap_or(DEFAULT_CHILD_MODEL)
|
||||
.to_string();
|
||||
|
||||
let max_depth = input
|
||||
.get("max_depth")
|
||||
.and_then(|v| v.as_u64())
|
||||
.map(|n| n.min(u64::from(u32::MAX)) as u32)
|
||||
.unwrap_or(DEFAULT_MAX_DEPTH);
|
||||
|
||||
// The tool framework doesn't expose a per-tool event stream, and
|
||||
// we don't want RLM's progress events to interleave with the
|
||||
// parent agent's stream. Drain into a no-op channel.
|
||||
let (tx, mut rx) = tokio::sync::mpsc::channel(64);
|
||||
let drain = tokio::spawn(async move { while rx.recv().await.is_some() {} });
|
||||
|
||||
// The big body lives only in the REPL as `context`. The small
|
||||
// `task` rides along as `root_prompt` and is shown to the root
|
||||
// LLM each iteration so it never forgets the objective.
|
||||
let result = run_rlm_turn_with_root(
|
||||
&client,
|
||||
self.root_model.clone(),
|
||||
body,
|
||||
Some(task.to_string()),
|
||||
child_model.clone(),
|
||||
tx,
|
||||
max_depth,
|
||||
)
|
||||
.await;
|
||||
|
||||
drain.abort();
|
||||
|
||||
if let Some(err) = result.error {
|
||||
return Err(ToolError::ExecutionFailed {
|
||||
message: format!(
|
||||
"rlm_process: {err} (iterations={}, termination={:?})",
|
||||
result.iterations, result.termination
|
||||
),
|
||||
});
|
||||
}
|
||||
|
||||
if result.answer.trim().is_empty() {
|
||||
return Err(ToolError::ExecutionFailed {
|
||||
message: format!(
|
||||
"rlm_process: empty answer (termination={:?}, iterations={})",
|
||||
result.termination, result.iterations
|
||||
),
|
||||
});
|
||||
}
|
||||
|
||||
// Surface the termination reason so the model can tell whether the
|
||||
// sub-agent finished cleanly via FINAL or fell out of the loop.
|
||||
let footer = match result.termination {
|
||||
RlmTermination::Final => String::new(),
|
||||
RlmTermination::DirectAnswer => {
|
||||
"\n\n[note: sub-agent emitted a direct answer instead of FINAL()]".to_string()
|
||||
}
|
||||
RlmTermination::Exhausted => format!(
|
||||
"\n\n[warning: sub-agent hit the {}-iteration cap without FINAL()]",
|
||||
result.iterations
|
||||
),
|
||||
RlmTermination::Error => String::new(),
|
||||
};
|
||||
|
||||
let metadata = json!({
|
||||
"iterations": result.iterations,
|
||||
"duration_ms": result.duration.as_millis() as u64,
|
||||
"input_tokens": result.usage.input_tokens,
|
||||
"output_tokens": result.usage.output_tokens,
|
||||
"termination": format!("{:?}", result.termination).to_lowercase(),
|
||||
"child_model": child_model,
|
||||
"max_depth": max_depth,
|
||||
});
|
||||
|
||||
Ok(ToolResult::success(format!("{}{}", result.answer, footer)).with_metadata(metadata))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn tool() -> RlmProcessTool {
|
||||
RlmProcessTool::new(None, "deepseek-v4-pro".to_string())
|
||||
}
|
||||
|
||||
fn ctx() -> ToolContext {
|
||||
use std::path::PathBuf;
|
||||
ToolContext::with_auto_approve(
|
||||
PathBuf::from("."),
|
||||
false,
|
||||
PathBuf::from("notes.txt"),
|
||||
PathBuf::from("mcp.json"),
|
||||
true,
|
||||
)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn name_and_schema() {
|
||||
let t = tool();
|
||||
assert_eq!(t.name(), "rlm_process");
|
||||
let schema = t.input_schema();
|
||||
assert!(schema["properties"]["task"].is_object());
|
||||
assert!(schema["properties"]["file_path"].is_object());
|
||||
assert!(schema["properties"]["content"].is_object());
|
||||
assert!(schema["properties"]["child_model"].is_object());
|
||||
assert!(schema["properties"]["max_depth"].is_object());
|
||||
let required = schema["required"].as_array().unwrap();
|
||||
assert!(required.iter().any(|v| v == "task"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn approval_is_auto_so_calls_are_unattended() {
|
||||
assert_eq!(tool().approval_requirement(), ApprovalRequirement::Auto);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn capabilities_include_network_and_executes_code() {
|
||||
let caps = tool().capabilities();
|
||||
assert!(caps.contains(&ToolCapability::Network));
|
||||
assert!(caps.contains(&ToolCapability::ExecutesCode));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn supports_parallel_dispatch() {
|
||||
assert!(tool().supports_parallel());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn returns_not_available_without_client() {
|
||||
let t = tool();
|
||||
let ctx = ctx();
|
||||
let res = t
|
||||
.execute(json!({"task": "x", "content": "y"}), &ctx)
|
||||
.await
|
||||
.expect_err("must error");
|
||||
assert!(matches!(res, ToolError::NotAvailable { .. }));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn rejects_missing_task() {
|
||||
let t = RlmProcessTool::new(None, "x".into());
|
||||
let ctx = ctx();
|
||||
let res = t
|
||||
.execute(json!({"content": "abc"}), &ctx)
|
||||
.await
|
||||
.expect_err("must error");
|
||||
// Without a client we hit NotAvailable first. Re-check ordering by
|
||||
// injecting an obviously-bad payload that would trip earlier.
|
||||
assert!(matches!(
|
||||
res,
|
||||
ToolError::NotAvailable { .. } | ToolError::MissingField { .. }
|
||||
));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn rejects_both_path_and_content() {
|
||||
// Even without a client, the input-shape check should fire if we
|
||||
// bypass the client guard. Simpler: just verify the schema lists
|
||||
// the two as alternatives via descriptions.
|
||||
let schema = tool().input_schema();
|
||||
let path_desc = schema["properties"]["file_path"]["description"]
|
||||
.as_str()
|
||||
.unwrap();
|
||||
assert!(path_desc.to_lowercase().contains("mutually exclusive"));
|
||||
}
|
||||
}
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"name": "deepseek-tui",
|
||||
"version": "0.6.4",
|
||||
"deepseekBinaryVersion": "0.6.4",
|
||||
"version": "0.6.5",
|
||||
"deepseekBinaryVersion": "0.6.5",
|
||||
"description": "Install and run deepseek and deepseek-tui binaries from GitHub release artifacts.",
|
||||
"author": "Hmbown",
|
||||
"license": "MIT",
|
||||
|
||||
Reference in New Issue
Block a user