DeepSeek TUI - Unofficial terminal UI + CLI for DeepSeek models. Features: - Interactive TUI with multiple modes (Normal, Plan, Agent, YOLO, RLM, Duo) - Comprehensive tool access with approval gating - File operations, shell execution, task management - Sub-agent system for parallel work - MCP integration for external tool servers - Session management and skills system - Cross-platform support (macOS, Linux, Windows) 🤖 Generated with [Claude Code](https://claude.ai/code)
11 KiB
RLM Implementation Gap Analysis
This document compares the DeepSeek CLI's current RLM-like sub-agent system against the actual Recursive Language Models (RLM) architecture described in the paper by Khattab et al. (2025).
Overview
The RLM paper introduces a paradigm where LLMs treat long prompts as part of an external environment, allowing programmatic examination, decomposition, and recursive self-calling over prompt snippets. The DeepSeek CLI has implemented a sub-agent system that touches on some RLM concepts but lacks critical RLM-specific infrastructure.
Current Status: DeepSeek CLI now includes a shared RLM session with dedicated tools (rlm_load, rlm_exec, rlm_query, rlm_status) and an RLM system prompt that externalizes context. Remaining gaps are mostly around deeper recursive orchestration and semantic chunking.
Update (v0.1.6)
The following RLM gaps have been addressed in Sprint 2/3:
- REPL integration via
rlm_exectool against a shared RLM session - Sub-call support via
rlm_querywith batch and verify modes - Externalized context with RLM context summaries injected into the system prompt
- RLM-specific prompt (
src/prompts/rlm.txt) with FINAL / FINAL_VAR guidance - Chunking helpers (
chunk_sections,chunk_lines,chunk_auto) for semantic-ish splits - Auto-chunk batching (
rlm_query+auto_chunks) for whole-doc sweeps - Buffer variables (
vars/get/set/append/del+store_as+ FINAL_VAR parsing) - Usage tracking for RLM sub-calls (query count + token totals)
- REPL toggle (
/repl) with RLM chat default - LLM-managed context loading (
rlm_load, plus/load @pathworkspace support) - RLM session status (
rlm_statusfor context + usage summaries) - Auto-RLM switching for large file requests and large pastes (keeps small-context queries in base mode per paper tradeoff)
- RLM usage guardrails in the footer (warns on high query/token usage)
Remaining opportunities (low priority): deeper recursive sub-agent loops and more model-specific prompt tuning.
Key RLM Concepts (From Paper)
Core Architecture
- REPL Environment: Python REPL where context is loaded as a variable
- llm_query Function: Enables recursive sub-LM calls within the REPL
- Context as External Variable: Prompt is NOT fed directly to the LLM
- Programmatic Context Interaction: Model writes code to examine/decompose context
- Buffer Variables: Accumulate partial results across recursive calls
- FINAL/FINAL_VAR Tags: Structured answer output mechanism
Key Behaviors
- Iterative code execution in REPL
- Dynamic context chunking based on analysis
- Recursive sub-calls for information-dense tasks
- Answer verification through sub-LM calls
- Cost-aware sub-call batching
Gap Analysis
1. Missing REPL Integration for LLM
RLM Paper Requirement:
"The REPL environment is initialized with: 1) A 'context' variable that contains extremely important information about your query. 2) A 'llm_query' function that allows you to query an LLM inside your REPL environment. 3) The ability to use 'print()' statements to view the output of your REPL code."
Current DeepSeek Implementation (v0.1.6):
- RLM mode exposes
rlm_execandrlm_querytools to the model - REPL expressions operate on shared session state across turns
- LLM can execute expressions and spawn sub-calls from tool usage
Gap Severity: 🟢 LOW
Status: ✅ Addressed via RLM tools + prompt integration
2. No Recursive Sub-Call Architecture
RLM Paper Requirement:
"RLMs defer essentially unbounded-length reasoning chains to sub-(R)LM calls... RLMs store the output of sub-LM calls over the input in variables and stitch them together to form a final answer."
Current DeepSeek Implementation (v0.1.6):
- Recursive sub-calls are now available via repeated
rlm_querytool invocations - Shared buffer variables allow stitching results across calls
- Sub-agent nesting is still flat (no hierarchical runtime)
Gap Severity: 🟡 MEDIUM
Remaining Enhancements:
- Optional nested sub-agent orchestration with shared buffers + depth limits
3. Missing RLM-Specific System Prompts
RLM Paper Requirement:
"You are tasked with answering a query with associated context... You can access, transform, and analyze this context interactively in a REPL environment that can recursively query sub-LLs, which you are strongly encouraged to use as much as possible."
Current DeepSeek Implementation (v0.1.6):
- Dedicated RLM prompt (
src/prompts/rlm.txt) with REPL/tool guidance - RLM sub-call prompt enforces FINAL / FINAL_VAR output conventions
- Prompt guidance for batching and verification
Gap Severity: 🟢 LOW
Status: ✅ Addressed
4. No Context Offloading to External Environment
RLM Paper Requirement:
"The key insight is that long prompts should not be fed into the neural network directly but should instead be treated as part of the environment that the LLM can symbolically interact with."
Current DeepSeek Implementation (v0.1.6):
- RLM contexts are stored externally in
RlmSession - Only summaries are injected into the system prompt
- LLM accesses context via
rlm_exec,rlm_query, andrlm_load
Gap Severity: 🟢 LOW
Status: ✅ Addressed
5. Missing Context Chunking Intelligence
RLM Paper Requirement:
"An example strategy is to first look at the context and figure out a chunking strategy, then break up the context into smart chunks, and query an LLM per chunk with a particular question."
Current DeepSeek Implementation (v0.1.6):
- Fixed-size chunking (
chunk) pluschunk_sections,chunk_lines, andchunk_auto - LLM controls chunking via
rlm_execbefore issuing sub-calls rlm_query auto_chunksenables whole-document sweeps overchunk_auto- No true semantic chunking (AST/function/paragraph-aware)
Current Code (src/rlm.rs):
pub fn chunk(&self, chunk_size: usize, overlap: usize) -> Vec<ChunkInfo> {
// Fixed-size character-based chunking only
}
Gap Severity: 🟡 MEDIUM
Remaining Enhancements:
- Deeper semantic chunking (AST/function-aware) and richer metadata
6. No Buffer Variable System
RLM Paper Requirement:
"Use these variables as buffers to build up your final answer... store the output of sub-LM calls over the input in variables and stitch them together."
Current DeepSeek Implementation (v0.1.6):
- Buffer variables are supported via
vars/get/set/append/del rlm_querysupportsstore_as+ FINAL_VAR parsing to persist results- Variables persist per context across tool calls
Current Code (src/rlm.rs):
pub struct RlmContext {
pub variables: HashMap<String, String>,
...
}
Gap Severity: 🟢 LOW
Status: ✅ Addressed
7. Missing Answer Verification Pattern
RLM Paper Requirement:
"We observed several instances of answer verification made by RLMs through sub-LM calls... Some of these strategies implicitly avoid context rot by using sub-LMs to perform verification."
Current DeepSeek Implementation (v0.1.6):
rlm_querysupportsmode="verify"for explicit verification calls- LLM can batch verification queries to cross-check answers
Gap Severity: 🟢 LOW
Remaining Enhancements:
- Optional confidence scoring or contradiction heuristics
8. No Cost-Aware Sub-Call Batching
RLM Paper Requirement (Appendix D.1):
"IMPORTANT: Be very careful about using 'llm_query' as it incurs high runtime costs. Always batch as much information as reasonably possible into each call (aim for around 200k characters per call)."
Current DeepSeek Implementation (v0.1.6):
- Sub-call usage tracking (query count + token totals)
- Prompt guidance to batch queries and cap payload size
rlm_statusexposes aggregate usage stats- Footer guardrails warn on high query/token usage
Gap Severity: 🟢 LOW
Remaining Enhancements:
- Optional hard caps or per-model budget limits
9. No Iterative REPL Loop Integration
RLM Paper Requirement:
"You will be queried iteratively until you provide a final answer... Output to the REPL environment and recursive LLMs as much as possible."
Current DeepSeek Implementation (v0.1.6):
- Shared RLM session persists across tool calls and turns
- LLM iteratively invokes
rlm_exec/rlm_querywithin a single turn - FINAL / FINAL_VAR markers enforced in prompts
Gap Severity: 🟢 LOW
Status: ✅ Addressed
10. Missing Model-Specific RLM Tuning
RLM Paper Requirement:
"The only difference in the prompt is an extra line... warning against using too many sub-calls... Between GPT-5 and Qwen3-Coder, we found different behavior... models are inefficient decision makers over their context."
Current DeepSeek Implementation:
- Single system prompt for all sub-agent types
- No model-specific tuning
- No adaptive prompting based on model behavior
- No sub-call warning mechanisms
Gap Severity: 🟢 LOW
Required Implementation:
- Model-aware prompting strategies
- Adaptive sub-call limits per model
- Behavior monitoring and correction
- Per-model cost/performance tracking
Remaining Optional Components
The core RLM workflow is now implemented via tools (rlm_load, rlm_exec, rlm_query, rlm_status)
and prompt integration. The following are optional future refactors:
src/rlm_engine.rs: central orchestration layer if RLM logic growssrc/rlm_prompts.rs: model-specific prompt variants and tuningsrc/rlm_repl.rs: richer syntax/REPL language (current expressions are sufficient)src/tools/subagent.rs: nested sub-agent orchestration with shared buffers
Remaining Improvements (Post-Sprint 3)
| Priority | Gap | Files to Change | Effort |
|---|---|---|---|
| P2 | Semantic chunking + metadata | rlm.rs | Medium |
| P2 | Budget hard caps / per-model limits | rlm.rs, tui/ui.rs | Medium |
| P3 | Nested sub-agent orchestration | tools/subagent.rs | High |
| P3 | Model-specific tuning | prompts/rlm.txt or new module | Low |
Comparison Summary
| Aspect | RLM Paper | DeepSeek CLI | Gap |
|---|---|---|---|
| Context Handling | External variable in REPL | Externalized RLM session + prompt summary | 🟢 LOW |
| Sub-Calls | Recursive with buffers | rlm_query + shared buffers (no nested runtime) |
🟡 MEDIUM |
| REPL | Python REPL with llm_query | Tool-based REPL (rlm_exec + rlm_query) |
🟢 LOW |
| Output Format | FINAL/FINAL_VAR tags | Enforced in RLM prompts | 🟢 LOW |
| System Prompts | RLM-specific with examples | RLM + sub-call prompts | 🟢 LOW |
| Context Chunking | Adaptive, semantic | Fixed + section/line/auto chunking | 🟡 MEDIUM |
| Buffer Variables | Persistent across calls | Vars + store_as + FINAL_VAR | 🟢 LOW |
| Cost Tracking | Per-sub-call budgeting | Usage totals + batch guidance + UI warnings | 🟢 LOW |
| Answer Verification | Sub-LM confirmation | Verify mode in rlm_query |
🟢 LOW |
| Iterative Execution | Multi-turn REPL loop | Shared session across turns | 🟢 LOW |
References
- Khattab, O., Kraska, A., & Zhang, A. L. (2025). Recursive Language Models. arXiv:2512.24601
- DeepSeek CLI Implementation: src/rlm.rs, src/tools/subagent.rs