Commit Graph

292 Commits

Author SHA1 Message Date
Hunter Bown 5e7dbcd32b Merge branch 'feat/v067-help' (#93 help overlay) 2026-04-27 22:17:30 -05:00
Hunter Bown 48c30473da Merge branch 'feat/v067-providers' (#52 OpenRouter + Novita providers) 2026-04-27 22:17:27 -05:00
Hunter Bown 363f064fce Merge branch 'fix/v067-file-mention' (#101 @-file mention BLOCKER fix) 2026-04-27 22:17:22 -05:00
Hunter Bown 0f7252198d Merge branch 'fix/v067-stream-retry' (#103 stream-error retry + diagnostics) 2026-04-27 22:17:19 -05:00
Hunter Bown a3d0134173 Merge branch 'feat/v067-approval' (#129 approval modal Codex-style takeover) 2026-04-27 22:17:16 -05:00
Hunter Bown 7819fcc18b Merge branch 'feat/v067-cards' (#128 sub-agent in-transcript cards) 2026-04-27 22:17:11 -05:00
Hunter Bown 176a2ba4f4 Merge branch 'feat/v067-mailbox' (#130 sub-agent mailbox port) 2026-04-27 22:17:00 -05:00
Hunter Bown 34cba09e22 Merge branch 'feat/v067-prompts' (#68 sub-agent prompts tightening) 2026-04-27 22:16:57 -05:00
Hunter Bown 63cb06637b feat(tui): #128 in-transcript DelegateCard + FanoutCard
Cards consume the #130 mailbox stream and render live in the transcript:
- DelegateCard: last-3-actions tree for active agent_spawn
- FanoutCard: dot-grid + aggregate stats for agent_swarm / rlm fanout
Sidebar demoted to a navigator (count + role); detail lives in the card.

Engine wires SubAgentRuntime::with_mailbox so the primitive actually flows.
Cards re-bind on session resume via runtime_threads agent_ids.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:15:26 -05:00
Hunter Bown f118db8201 feat(providers): #52 OpenRouter + Novita as first-class providers
ProviderKind gains Openrouter + Novita variants; ModelRegistry registers
deepseek/deepseek-v4-{pro,flash} against both. /provider opens a picker
modal with inline API-key prompt for un-configured providers. Env
fallbacks: OPENROUTER_API_KEY, NOVITA_API_KEY.
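
A minimal sketch of the env-fallback order described above. It assumes an explicitly configured key takes precedence over the environment variable. `env_var_for` and `resolve_api_key` are hypothetical helpers, and only the two new `ProviderKind` variants are modeled.

```rust
use std::collections::HashMap;

/// Mirrors the two variants this commit adds; other providers are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ProviderKind {
    Openrouter,
    Novita,
}

/// Environment variable consulted when no key is configured.
pub fn env_var_for(kind: ProviderKind) -> &'static str {
    match kind {
        ProviderKind::Openrouter => "OPENROUTER_API_KEY",
        ProviderKind::Novita => "NOVITA_API_KEY",
    }
}

/// Prefer a non-empty configured key, otherwise fall back to the environment.
/// `env` is injected so the lookup can be tested without touching process state.
pub fn resolve_api_key(
    kind: ProviderKind,
    configured: &HashMap<ProviderKind, String>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    configured
        .get(&kind)
        .filter(|key| !key.is_empty())
        .cloned()
        .or_else(|| env(env_var_for(kind)).filter(|key| !key.is_empty()))
}
```

In production the `env` argument could be `|name| std::env::var(name).ok()`. Injecting it keeps the precedence rule testable without mutating process-wide environment state.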

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:58:51 -05:00
Hunter Bown 9d4c1c1966 feat(tui): #129 approval modal Codex-style takeover
Reflowed approval.rs to a full-screen modal with two stakes-based variants:
benign (single-key approve / always) and destructive (explicit confirm).
Variant routing classifies from tool kind + command-safety so destructive
ops never get a muscle-memory accept.

Existing approval tests still pass; new tests cover variant routing + keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:57:16 -05:00
Hunter Bown 36320c5bea fix(client): #103 stream-error diagnostics + transparent retry on early decode failure
Phase 1: log full reqwest error chain + headers + bytes-received at decode site
Phase 2: HTTP/2 keepalive settings + tcp keepalive on the reqwest builder
Phase 3: engine transparently retries when stream errors before any content;
         surface error on mid-stream failure (no double-bill); stream_errors
         threshold relaxed 3 -> 5 with the new keepalive
Phase 4: unit tests for the four classes of stream failure
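
The Phase 3 rule can be sketched as a small decision function. This assumes the engine can tell whether any content reached the user before the error. `StreamOutcome`, `Decision`, `decide`, and the `max_attempts` cap are hypothetical names, not the client's actual API.

```rust
/// How one streaming attempt ended (hypothetical summary type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamOutcome {
    Completed,
    /// The stream failed; `content_received` records whether any content arrived first.
    Failed { content_received: bool },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Done,
    RetryTransparently,
    SurfaceError,
}

/// Retry only failures that happen before any content arrives, so a retry can
/// never duplicate (and double-bill) a partially streamed answer.
pub fn decide(outcome: StreamOutcome, attempts_so_far: u32, max_attempts: u32) -> Decision {
    match outcome {
        StreamOutcome::Completed => Decision::Done,
        StreamOutcome::Failed { content_received: false } if attempts_so_far < max_attempts => {
            Decision::RetryTransparently
        }
        // Mid-stream failures, or early failures past the cap, go to the user.
        StreamOutcome::Failed { .. } => Decision::SurfaceError,
    }
}
```

Keying the decision on `content_received` rather than on the error class is what keeps the retry invisible: a retry is only safe when there is nothing on screen to reconcile.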

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:57:13 -05:00
Hunter Bown b759e3f74c feat(tui): #93 help overlay — ? opens searchable command + keybinding reference
New HelpView modal lists all slash commands with descriptions and all
keybindings, with a live substring filter. Bound to `?` when focus is
outside the composer; Esc / `?` toggles.

Slash commands pull from the existing slash_menu registry; keybindings
pull from a new KeybindingCatalog single-source-of-truth so docs can't
drift from the wired handlers.
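
A minimal sketch of the live substring filter, assuming it matches case-insensitively against both the command/key and its description. `HelpEntry` and `filter_entries` are hypothetical; the real view draws from `slash_menu` and `KeybindingCatalog`.

```rust
/// Hypothetical help row: a slash command or keybinding plus its description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HelpEntry {
    pub key: String,
    pub description: String,
}

/// Keep entries whose key or description contains `query`, ignoring case.
/// An empty (or whitespace-only) query shows every entry.
pub fn filter_entries<'a>(entries: &'a [HelpEntry], query: &str) -> Vec<&'a HelpEntry> {
    let needle = query.trim().to_lowercase();
    entries
        .iter()
        .filter(|e| {
            needle.is_empty()
                || e.key.to_lowercase().contains(needle.as_str())
                || e.description.to_lowercase().contains(needle.as_str())
        })
        .collect()
}
```

Because both sources feed one list, the filter stays a pure function of `(entries, query)` and can re-run on every keystroke.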
2026-04-27 21:54:25 -05:00
Hunter Bown fd13dffd60 fix(tui): #101 @-file mention resolves CWD before workspace fallback
Two-pass resolution in file_mention::resolve_mention_path -- try
workspace.join first, then std::env::current_dir().join, then a basename
fuzzy fallback. Extracted the shared resolver into working_set.rs so the
future Ctrl+P fuzzy picker (#97) uses the same logic.
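
A sketch of the resolution order stated in the body (workspace, then CWD, then basename). The `exists` predicate and `candidates` list are injected so the logic runs without a filesystem. An exact-basename match stands in for the fuzzy fallback, and the signature is an assumption rather than the real `resolve_mention_path`.

```rust
use std::path::{Path, PathBuf};

pub fn resolve_mention_path(
    mention: &str,
    workspace: &Path,
    cwd: &Path,
    exists: impl Fn(&Path) -> bool,
    candidates: &[PathBuf],
) -> Option<PathBuf> {
    // Pass 1: relative to the workspace root.
    let in_workspace = workspace.join(mention);
    if exists(&in_workspace) {
        return Some(in_workspace);
    }
    // Pass 2: relative to the process's current directory.
    let in_cwd = cwd.join(mention);
    if exists(&in_cwd) {
        return Some(in_cwd);
    }
    // Fallback: first known file whose final component matches the mention's.
    let wanted = Path::new(mention).file_name()?;
    candidates.iter().find(|p| p.file_name() == Some(wanted)).cloned()
}
```

The second pass covers the reported divergence: launching from a subdirectory such as `crates/tui` makes `@src/main.rs` miss under the workspace root but hit under the CWD.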

Tests cover the workspace/CWD divergence repro that was masquerading as a
"@ doesn't work" report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:52:25 -05:00
Hunter Bown 8d8c1ad2d4 feat(prompts): #68 tighten sub-agent output format + stop conditions
Each sub-agent type now has an explicit SUMMARY / EVIDENCE / CHANGES /
RISKS / BLOCKERS output contract, mode-specific guidance (explorer /
planner / reviewer / general), and tool-calling conventions that prefer
the typed tool surface over exec_shell shellouts.

The output format is defined once and referenced from each per-type
prompt, so future tweaks live in one place.
2026-04-27 21:50:18 -05:00
Hunter Bown 32750cb52d feat(subagent): #130 mailbox abstraction with seq + backpressure
Internal upgrade — public tool surface (agent_spawn, agent_swarm, …) unchanged.
The mailbox primitive replaces ad-hoc mpsc plumbing in the runtime so:
- progress events have monotonic ordering
- subscribers get watch-based backpressure
- close-as-cancel propagates through nested children
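
A std-only sketch of the sequence-stamping idea. The commit uses watch-based backpressure. This sketch swaps in a bounded `std::sync::mpsc::sync_channel` that refuses events when the subscriber falls behind, so it illustrates monotonic ordering rather than the exact backpressure semantics. `Mailbox`, `Envelope`, and `try_post` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Envelope {
    pub seq: u64,
    pub payload: String,
}

pub struct Mailbox {
    next_seq: AtomicU64,
    tx: SyncSender<Envelope>,
}

impl Mailbox {
    /// Create a mailbox whose subscriber can lag by at most `capacity` events.
    pub fn new(capacity: usize) -> (Self, Receiver<Envelope>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { next_seq: AtomicU64::new(0), tx }, rx)
    }

    /// Stamp the event with the next sequence number and try to deliver it.
    /// Returns false if the subscriber is full or gone; the consumed seq then
    /// shows up as a gap the subscriber can detect.
    pub fn try_post(&self, payload: impl Into<String>) -> bool {
        let seq = self.next_seq.fetch_add(1, Ordering::Relaxed);
        match self.tx.try_send(Envelope { seq, payload: payload.into() }) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

Dropping the `Mailbox` drops the sender, so once the buffer drains the subscriber's `recv()` returns `Err`. That is one simple way for close to double as cancel.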

Pairs with #128 (in-transcript cards consume the mailbox stream).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:50:14 -05:00
Hunter Bown 4ac7219d77 fix(test): cycle archive tests are Windows-portable
The Windows CI test for archive_cycle_writes_jsonl_with_header_and_messages
was failing because:

1. The path-suffix assertion used "/1.jsonl" as a literal — Windows uses
   backslashes, so the assertion never matched. Replace with file_name()
   comparison which is platform-agnostic.
2. set_var("HOME", ...) doesn't redirect dirs::home_dir() on Windows;
   that function reads USERPROFILE on Windows. Set both env vars so the
   test redirects portably.
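
The first fix compares the final path component instead of a `"/1.jsonl"` string suffix. `is_cycle_file` is a hypothetical helper that shows the shape of that assertion.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Platform-agnostic: `file_name()` ignores which separator the OS uses.
pub fn is_cycle_file(path: &Path, cycle: u32) -> bool {
    let expected = format!("{}.jsonl", cycle);
    path.file_name() == Some(OsStr::new(&expected))
}
```

A plain `ends_with("1.jsonl")` string check would also accept `11.jsonl`. Comparing the whole final component avoids that pitfall as well as the separator problem.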

Same HomeGuard pattern in tools/recall_archive.rs tests was passing on
Windows by luck (each test uses a unique session_id, so writes to the
real ~/.deepseek/sessions/ didn't clash) but was polluting the user's
home directory in CI. Mirror the dual-env-var fix there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:25:33 -05:00
Hunter Bown 0db918f9e8 Merge branch 'feat/v066-ui-redesign'
#121 — UI redesign (4 of 6 sub-areas + foundation):
- Color-depth + brightness palette helpers
- Speaker glyphs (▎ / ●) with 2s pulse
- Reasoning treatment (dashed rail, italic, warm tint)
- Tool-card verb glyphs + family vocabulary module
- Footer crest waves + cost-on-left + 150ms cadence

Deferred (separate follow-up issues): sub-agent in-transcript cards,
approval modal Codex-style takeover.
2026-04-27 21:10:50 -05:00
Hunter Bown 95d6937d34 Merge branch 'feat/v066-steer-queue'
#122 — Esc-to-steer + queue visibility. SubmitDisposition state machine,
pending_steers/rejected_steers buckets, partial-output [interrupted] save.
2026-04-27 21:10:41 -05:00
Hunter Bown f95be44bc8 Merge branch 'feat/v066-cycle-restart'
#124 + #127 — checkpoint-restart context cycles + recall_archive tool.
2026-04-27 21:10:37 -05:00
Hunter Bown 2b2bddcf7e style: cargo fmt + clippy fixes for v0.6.6 UI redesign
Run `cargo fmt --all` after the four redesign sub-areas land to settle
attribute placement (`#[allow(dead_code)]` lives after doc comments,
not between them — interleaving was splitting docs from items).
Inline the trailing `let dom = …; dom` in `nearest_ansi16` to satisfy
clippy::let_and_return.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:08:29 -05:00
Hunter Bown aeba004c7b feat(tui): tool-card verb glyphs + family vocabulary module
Sub-area #3 of the v0.6.6 UI redesign (issue #121).

Introduces `crates/tui/src/tui/widgets/tool_card.rs` — a small,
self-contained vocabulary module that owns:

- `ToolFamily` enum (Read / Patch / Run / Find / Delegate / Fanout /
  Think / Generic) and the verb-glyph + label per family
  (▷ read, ◆ patch, ▶ run, ⌕ find, ◐ delegate, ⋮⋮ fanout, … think)
- `tool_family_for_title` — maps the legacy header titles
  (`"Shell"`, `"Patch"`, `"Workspace"`, `"Search"`, `"Diff"`, `"Image"`)
  to a family, so existing call sites pick up the new glyph without
  re-architecture
- `tool_family_for_name` — maps actual tool names
  (`agent_spawn`, `apply_patch`, etc.) for `GenericToolCell`, which
  shares the catch-all `"Tool"` title across every model-facing tool
- `CardRail` + `rail_glyph` — the `╭ │ ╰` rail vocabulary, declared
  here so any future per-card refactor has the matching glyphs
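
A sketch of the family vocabulary. The glyph/label pairs are copied from the list above. The name mapping beyond `agent_spawn`, the `•` glyph for `Generic`, and `header_label` are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolFamily { Read, Patch, Run, Find, Delegate, Fanout, Think, Generic }

impl ToolFamily {
    /// (verb glyph, verb label) per family.
    pub fn verb(self) -> (&'static str, &'static str) {
        match self {
            ToolFamily::Read => ("▷", "read"),
            ToolFamily::Patch => ("◆", "patch"),
            ToolFamily::Run => ("▶", "run"),
            ToolFamily::Find => ("⌕", "find"),
            ToolFamily::Delegate => ("◐", "delegate"),
            ToolFamily::Fanout => ("⋮⋮", "fanout"),
            ToolFamily::Think => ("…", "think"),
            ToolFamily::Generic => ("•", "tool"), // glyph is a placeholder
        }
    }
}

/// Route by actual tool name so `GenericToolCell` does not fall back to "Tool".
pub fn tool_family_for_name(name: &str) -> ToolFamily {
    match name {
        "agent_spawn" => ToolFamily::Delegate,
        "agent_swarm" | "rlm" => ToolFamily::Fanout,
        "apply_patch" => ToolFamily::Patch,
        "exec_shell" => ToolFamily::Run,
        _ => ToolFamily::Generic,
    }
}

/// The `<verb-glyph> <verb>` part of the header.
pub fn header_label(family: ToolFamily) -> String {
    let (glyph, verb) = family.verb();
    format!("{} {}", glyph, verb)
}
```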

Wires the verb glyph + label into `render_tool_header` and adds a
`render_tool_header_with_family` overload so `GenericToolCell` can
route by tool name rather than the generic title. The header now reads
`<spinner> <verb-glyph> <verb> <state>` instead of
`<spinner> <Title-Case-Word> <state>`.

Existing parity tests for ExecCell / PlanUpdate are updated to assert
against the new header structure (verb + glyph) — the colour wiring is
unchanged. New tests pin the verb-glyph format end-to-end:
`agent_spawn` → `◐ delegate`, exec → `▶ run`.

Spinner cadence (TOOL_STATUS_SYMBOL_MS = 720 ms) is unchanged — the
spec already matched.

Deferred to a follow-up: full per-card rail (`╭ │ ╰`) refactor that
threads `CardRail` through every cell render path. The vocabulary is
in place; the layout pass is the next bite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:04:23 -05:00
Hunter Bown 7c3a01c7b8 feat(footer): crest waves + cost on left + 150 ms tick cadence
Sub-area #5 of the v0.6.6 UI redesign (issue #121).

Spout strip:
- Replace box-drawing arc cups (`╭───╮`) with paired crests (`⌒‿`) over
  the existing water-surface (`─`). Crests read as gentle ripples instead
  of hard architectural arches — calmer eye-feel.
- Two crests at independent cadences: A advances every 4 ticks (~600 ms),
  B every 6 ticks (~900 ms). Phase jitter every 17 ticks (~2.5 s) keeps
  the pattern from settling into a strict beat.
- Frame counter cadence in `ui.rs` retimed from 80 ms to 150 ms so the
  4×6×17 tick math lands on the spec'd timings.

Footer left cluster consolidates "what costs you what":
- Cost chip moves from the right-hand parade to the left, between model
  and status: `mode · model · cost · status`.
- Priority drop is now status → cost → truncate model → mode-only. Cost
  outranks status because it's steady info; status is a transient signal.
- Right cluster shrinks to coherence / agents / replay / cache.
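
The priority cascade can be sketched as trying progressively shorter layouts. `LeftCluster`, `fit_left_cluster`, the ` · ` separator width, and counting one column per `char` are simplifying assumptions. Real code should measure display width.

```rust
/// The footer's left cluster, rendered as `mode · model · cost · status`.
pub struct LeftCluster<'a> {
    pub mode: &'a str,
    pub model: &'a str,
    pub cost: &'a str,
    pub status: &'a str,
}

const SEP: &str = " · ";

fn width_of(s: &str) -> usize {
    s.chars().count() // simplification: one column per char
}

/// Drop status, then cost, then truncate the model, then show the mode alone.
pub fn fit_left_cluster(c: &LeftCluster, width: usize) -> String {
    let candidates = [
        [c.mode, c.model, c.cost, c.status].join(SEP),
        [c.mode, c.model, c.cost].join(SEP),
        [c.mode, c.model].join(SEP),
    ];
    for candidate in candidates {
        if width_of(&candidate) <= width {
            return candidate;
        }
    }
    // Truncate the model into the room left after "mode · ", reserving one column for "…".
    let prefix = format!("{}{}", c.mode, SEP);
    let room = width.saturating_sub(width_of(&prefix));
    if room >= 2 {
        let model: String = c.model.chars().take(room - 1).collect();
        return format!("{}{}…", prefix, model);
    }
    c.mode.to_string()
}
```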

Tests: existing strip determinism / position-advances tests are retuned
for the new tick math (12-tick window covers both crests). New tests pin
the cost slot's order on wide widths and confirm cost survives status
drop in the priority cascade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:59:07 -05:00
Hunter Bown 764aed65ed feat(tui): #122 Esc-to-steer + queue visibility
While a turn is running, Esc with non-empty composer input now steers:
the typed text is captured, the in-flight HTTP request is cancelled,
and on TurnComplete::Interrupted every accumulated steer is merged
into a single fresh user message that re-enters the engine. Empty-input
Esc still cancels exactly as before.
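
A minimal sketch of the capture-and-merge flow. `Steers`, `push`, and `drain_merged` are hypothetical stand-ins for `App.pending_steers` and `merge_pending_steers`. Joining steers with a blank line is an assumption.

```rust
/// Hypothetical stand-in for the app's pending-steer buffer.
#[derive(Default)]
pub struct Steers {
    pending: Vec<String>,
}

impl Steers {
    /// Esc with input while busy: capture the text. Whitespace-only input is
    /// not a steer (it falls through to plain cancel), so report false.
    pub fn push(&mut self, text: &str) -> bool {
        let trimmed = text.trim();
        if trimmed.is_empty() {
            return false;
        }
        self.pending.push(trimmed.to_string());
        true
    }

    /// On TurnComplete::Interrupted: merge every steer into one user message.
    /// Draining makes a second call return None, so a double Esc is harmless.
    pub fn drain_merged(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            return None;
        }
        Some(std::mem::take(&mut self.pending).join("\n\n"))
    }
}
```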

State machine. SubmitDisposition::{Immediate, Queue, Steer} replaces
the implicit if/else in submit_or_steer_message; truth table preserves
the offline_mode + busy fallback path. App.pending_steers /
rejected_steers / submit_pending_steers_after_interrupt back the new
flow and the queue-visibility widget.

Partial save. Deliberate divergence from openai/codex, which discards
on abort: V4 thinking is expensive, so the streaming Assistant cell is
tagged '[interrupted] …' (or '[interrupted]' when nothing streamed yet)
and its spinner is flipped off. The TurnComplete handler also calls the
helper so Ctrl+C / network failures get the same treatment, idempotent
with the optimistic call in the Esc handler.

Queue visibility. PendingInputPreview already supported all three
buckets; build_pending_input_preview now populates pending_steers and
rejected_steers alongside queued_messages. rejected_steers stays empty
under today's engine paths (no rejection signal yet) but renders if/when
populated.

Recovery. If TurnComplete arrives with Failed instead of Interrupted
while pending_steers is non-empty, the steers are demoted to the
visible queue so they're not silently lost.

Tests. 13 new app-level units cover the disposition truth table,
push/drain semantics, double-Esc idempotency, and the partial-save
helper. 11 new ui-level units cover Esc-action routing, slash-menu
priority, whitespace-only input handling, merge_pending_steers (empty
/ single / multi / skill-instruction), and the three-bucket preview.

Closes #122.
2026-04-27 20:59:00 -05:00
Hunter Bown 2b0e73a4cf feat(tui): reasoning cells get dashed rail, italic body, warm tint
Sub-area #2 of the v0.6.6 UI redesign (issue #121).

Reasoning is the only deliberately-warm element in the redesigned
transcript. The treatment makes that visible:

- Header opener becomes `…` (slow exhale) instead of the spinner glyph
- Body left rail switches from solid `▏` to dashed `╎` so it visibly
  differs from message body and tool output rails
- Body text carries an italic modifier
- Body lines tint with `palette::reasoning_surface_tint(depth)` — 12%
  blend of SURFACE_REASONING (#362C1A) over DEEPSEEK_INK. ANSI-16
  terminals get no bg (the named-color mapping would distort the warm)
- A trailing `▎` cursor in ACCENT_REASONING_LIVE follows the most recent
  body line during streaming, suppressed under low_motion

Wires up palette helpers from the prior commit: `ColorDepth::detect`,
`reasoning_surface_tint`, `blend`. SURFACE_REASONING is no longer
dead-coded. The unused `thinking_symbol` helper is removed since the new
header doesn't spin.

Tests: dashed rail and italic body land on every body line; streaming
cursor appears only when motion is allowed; collapsed-summary affordance
keeps working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:52:59 -05:00
Hunter Bown 10bc2480db feat(core): #124 + #127 — checkpoint-restart cycles + recall_archive
Replaces lossy summarization compaction with a checkpoint-restart
architecture (#124). At 110K cumulative tokens (per V4's 128K retrieval
elbow) the engine runs a briefing turn, archives the cycle to JSONL at
~/.deepseek/sessions/<id>/cycles/<n>.jsonl, then resets the in-memory
buffer to a fresh context: original system prompt + structured state
(plan/todos/working-set/sub-agents) + the model-curated <carry_forward>
briefing (~3K token cap, hard-bounded).

The compaction summarizer is now off by default. Per-model thresholds in
[cycle.per_model] let operators tune deepseek-v4-pro vs -flash separately.
Phase guard in should_advance_cycle blocks mid-tool/stream/approval boundaries;
engine only invokes at clean turn-completed events. Sub-agents are not
awaited — their handles are captured in the structured-state block so the
new cycle sees them still running.
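
A sketch of the guard plus per-model override. It assumes the boundary fires at or above the threshold and only in a turn-completed phase. The `TurnPhase` variants and `threshold_for` are hypothetical. The 110K default comes from the commit.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnPhase {
    Streaming,
    ToolRunning,
    AwaitingApproval,
    TurnCompleted,
}

pub const DEFAULT_CYCLE_THRESHOLD: u64 = 110_000;

/// `[cycle.per_model]` override if present, else the default.
pub fn threshold_for(model: &str, per_model: &HashMap<String, u64>) -> u64 {
    per_model.get(model).copied().unwrap_or(DEFAULT_CYCLE_THRESHOLD)
}

/// Never cut a cycle mid-tool, mid-stream, or mid-approval.
pub fn should_advance_cycle(cumulative_tokens: u64, threshold: u64, phase: TurnPhase) -> bool {
    phase == TurnPhase::TurnCompleted && cumulative_tokens >= threshold
}
```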

Adds the recall_archive tool (#127) — BM25 over message text in archived
cycles, top-N hits with cycle/index/excerpt. Always-loaded across modes
via should_default_defer_tool so the agent doesn't need ToolSearch to
discover it. Children inherit it via with_full_agent_surface.
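
For reference, a compact Okapi BM25 ranker over message texts. It uses the common parameters `k1 = 1.2`, `b = 0.75` and the `ln(1 + (N - n + 0.5)/(n + 0.5))` IDF. The commit does not state its parameters or tokenizer, so the lowercase alphanumeric split and `bm25_top_n` are assumptions.

```rust
use std::collections::HashMap;

fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Return up to `n` (doc index, score) pairs with positive scores, best first.
pub fn bm25_top_n(docs: &[&str], query: &str, n: usize) -> Vec<(usize, f64)> {
    const K1: f64 = 1.2;
    const B: f64 = 0.75;
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n_docs = tokenized.len() as f64;
    if tokenized.is_empty() {
        return Vec::new();
    }
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n_docs;

    // Document frequency: how many docs contain each term at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(String::as_str).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    let mut scored: Vec<(usize, f64)> = tokenized
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let len = doc.len() as f64;
            let score: f64 = query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let n_q = *df.get(q.as_str()).unwrap_or(&0) as f64;
                    let idf = ((n_docs - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len / avg_len))
                })
                .sum();
            (i, score)
        })
        .filter(|&(_, s)| s > 0.0)
        .collect();

    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(n);
    scored
}
```

Rare terms get a larger IDF, and the length term stops long messages from winning just by repeating common words.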

UI surfaces:
- /cycles, /cycle <n>, /recall <query> slash commands
- Sidebar shows cycle counter once a boundary fires
- CycleAdvanced engine event carries the full briefing so the UI can
  populate app.cycle_briefings for /cycle <n>
- runtime_threads schema bumped to v2 (cycle.advanced events appear in
  the durable timeline; load rejects future versions)

Tests: 21 cycle_manager + 13 recall_archive + 4 commands::cycle.
All 1168 workspace tests pass. Three parity gates pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:52:29 -05:00
Hunter Bown 5b7ff8cb69 feat(tui): replace literal speaker labels with calm glyphs
Sub-area #1 of the v0.6.6 UI redesign (issue #121).

User cells now lead with `▎` (solid bar, no animation — input is finished).
Assistant cells lead with `●`. While streaming, the bullet pulses on a
2-second cycle between 30%..100% brightness via `palette::pulse_brightness`;
once a turn completes the bullet sits at full DEEPSEEK_SKY so finished
history reads as solid.

Honors `low_motion`: the pulse is suppressed and the glyph holds full
brightness regardless of streaming state. Pager / clipboard exports
(`transcript_lines`) also skip the pulse so screenshots are stable.

Existing pager titles in `ui.rs` (`history_cell_to_text`) keep the literal
"You" / "Assistant" wording — those drive the modal title bar and read
better as words than as glyphs.

Tests: glyphs replace the literal labels in both User and Assistant cells;
streaming pulse demonstrably dips below source brightness; idle and
low_motion both pin to source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:50:24 -05:00
Hunter Bown 9358b92af5 feat(palette): color-depth fallback + brightness pulse helpers
Adds the foundation for the v0.6.6 UI redesign (issue #121):

- ColorDepth enum with detect() reading COLORTERM/TERM
- adapt_color / adapt_bg gates for ANSI-16 terminals (drops bg tints
  rather than coarse-mapping them, which would distort the palette)
- blend(fg, bg, alpha) for alpha compositing on RGB
- reasoning_surface_tint() — 12% blend over DEEPSEEK_INK (None on ANSI-16)
- pulse_brightness(color, now_ms) — 30%..100% sine swing on a 2s cycle
- nearest_ansi16() helper for legacy-terminal foreground mapping

Helpers carry #[allow(dead_code)] until the per-area redesign sub-tasks
wire them in (speaker pulse, reasoning treatment, footer cost cluster).
Tests cover envelope bounds and brand-color routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 20:47:27 -05:00
Hunter Bown e075ecd0fe chore: add #[allow(dead_code)] for v0.6.6 additions not yet wired
- ClientError, StreamError, and their impl blocks in error_taxonomy.rs
- ApprovalCache in approval_cache.rs (pending #66 follow-up wiring)
- Legacy prompt constants in prompts.rs (backward compat)
- with_mcp_tools and McpToolAdapter in registry.rs (pending MCP migration)
- Fix rlm_query → rlm in when_not_to_use_sections_present test
2026-04-27 19:57:53 -05:00
Hunter Bown 8be032b6bd style: cargo fmt after parallel stream merges 2026-04-27 19:43:58 -05:00
Hunter Bown f10d5d1829 Merge branch 'feat/tool-polish' 2026-04-27 19:41:44 -05:00
Hunter Bown 7a06915b0b feat(tools): approval cache + error taxonomy + defer_loading + command safety trim
- Add fingerprint-based ApprovalCache with call-specific keys (patch hash,
  shell prefix, URL host) instead of tool-name keys. Session-keyed.
- Add ClientError/StreamError enums in error_taxonomy.rs with Retry-After
  header support. Wire ErrorEnvelope into Event::Error.
- Add defer_loading() default method to ToolSpec trait. McpToolAdapter
  returns true for non-discovery MCP tools.
- Add with_mcp_tools() on ToolRegistryBuilder for unified pipeline.
- Trim DANGEROUS_PATTERNS in command_safety.rs from 25→5 entries.
  Only rm -rf and fork bomb remain; chaining/substitution downgraded
  to RequiresApproval. Matches Codex's restraint.
- ApprovalRequired events now carry approval_key for UI caching.
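
A sketch of call-specific fingerprints. `ApprovalKey` and the `*_key` helpers are hypothetical. Treating the "shell prefix" as the first word is an assumption, and the host parser is a simplification that does not handle IPv6 literals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ApprovalKey {
    Patch { hash: u64 },
    Shell { prefix: String },
    Fetch { host: String },
}

pub fn patch_key(patch_text: &str) -> ApprovalKey {
    // DefaultHasher is fine for an in-session cache; it is not stable across builds.
    let mut h = DefaultHasher::new();
    patch_text.hash(&mut h);
    ApprovalKey::Patch { hash: h.finish() }
}

pub fn shell_key(command: &str) -> ApprovalKey {
    let prefix = command.split_whitespace().next().unwrap_or("");
    ApprovalKey::Shell { prefix: prefix.to_string() }
}

/// Host from "scheme://[user@]host[:port]/path" without a URL crate.
pub fn fetch_key(url: &str) -> ApprovalKey {
    let after_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    let authority = after_scheme.split(['/', '?', '#']).next().unwrap_or("");
    let host_port = authority.rsplit('@').next().unwrap_or("");
    let host = host_port.split(':').next().unwrap_or("");
    ApprovalKey::Fetch { host: host.to_ascii_lowercase() }
}

/// Session-scoped set of approved fingerprints.
#[derive(Default)]
pub struct ApprovalCache {
    approved: HashSet<ApprovalKey>,
}

impl ApprovalCache {
    pub fn approve(&mut self, key: ApprovalKey) {
        self.approved.insert(key);
    }
    pub fn is_approved(&self, key: &ApprovalKey) -> bool {
        self.approved.contains(key)
    }
}
```

With a one-word prefix, approving `cargo test` also covers `cargo build`. The commit does not state the real prefix length, and that granularity choice decides how much an "always" approval grants.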

TODO_BACKEND.md §1, §5
2026-04-27 19:40:49 -05:00
Hunter Bown a32148dac9 Merge branch 'feat/prompts-restructure' 2026-04-27 19:34:27 -05:00
Hunter Bown a345a956aa feat(perf): wire frame-rate limiter + adaptive chunking low-motion mode
- Wire 120 FPS FrameRateLimiter into run_event_loop via
  time_until_next_draw + mark_emitted
- Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL
- Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode
- Add StreamingState::set_low_motion() to propagate to all block policies
- Tool spinner already freezes on first frame when low_motion is set
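
A minimal frame-rate limiter with the two caps named above. `LOW_MOTION_MIN_FRAME_INTERVAL`, `time_until_next_draw`, and `mark_emitted` are names from the commit. Their signatures, the `MIN_FRAME_INTERVAL` constant, and the struct layout are assumptions.

```rust
use std::time::{Duration, Instant};

pub const MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 120); // 120 FPS
pub const LOW_MOTION_MIN_FRAME_INTERVAL: Duration = Duration::from_nanos(1_000_000_000 / 30); // 30 FPS

pub struct FrameRateLimiter {
    last_emit: Option<Instant>,
    low_motion: bool,
}

impl FrameRateLimiter {
    pub fn new(low_motion: bool) -> Self {
        Self { last_emit: None, low_motion }
    }

    fn interval(&self) -> Duration {
        if self.low_motion { LOW_MOTION_MIN_FRAME_INTERVAL } else { MIN_FRAME_INTERVAL }
    }

    /// Zero means "draw now"; otherwise how long the event loop should wait.
    pub fn time_until_next_draw(&self, now: Instant) -> Duration {
        match self.last_emit {
            None => Duration::ZERO,
            Some(last) => (last + self.interval()).saturating_duration_since(now),
        }
    }

    /// Record that a frame was actually drawn.
    pub fn mark_emitted(&mut self, now: Instant) {
        self.last_emit = Some(now);
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the limiter deterministic under test.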

TODO_BACKEND.md §3, TODO_FIXES.md #4
2026-04-27 19:33:52 -05:00
Hunter Bown 6ef2421d61 feat(prompts): restructure into composable personality overlays
Split monolithic agent.txt into:
- base.md: core identity, toolbox, subagent.done protocol
- personalities/calm.md + playful.md: voice overlays
- modes/agent.md, plan.md, yolo.md: mode deltas
- approvals/auto.md, suggest.md, never.md: approval-policy deltas
- compact.md: 9-line compaction handoff template

Add compose_prompt() in prompts.rs: base → personality → mode → approval.
Add Personality enum with from_settings(). Preserve legacy .txt constants for
backward compatibility.
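
A sketch of the fixed layer order. The file paths follow the layout listed above. Injecting a `load` function and joining layers with a blank line are assumptions; the real code may embed the files at compile time.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Personality { Calm, Playful }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode { Agent, Plan, Yolo }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalPolicy { Auto, Suggest, Never }

fn personality_file(p: Personality) -> &'static str {
    match p {
        Personality::Calm => "personalities/calm.md",
        Personality::Playful => "personalities/playful.md",
    }
}

fn mode_file(m: Mode) -> &'static str {
    match m {
        Mode::Agent => "modes/agent.md",
        Mode::Plan => "modes/plan.md",
        Mode::Yolo => "modes/yolo.md",
    }
}

fn approval_file(a: ApprovalPolicy) -> &'static str {
    match a {
        ApprovalPolicy::Auto => "approvals/auto.md",
        ApprovalPolicy::Suggest => "approvals/suggest.md",
        ApprovalPolicy::Never => "approvals/never.md",
    }
}

/// base → personality → mode → approval, each layer loaded by `load`.
pub fn compose_prompt(p: Personality, m: Mode, a: ApprovalPolicy, load: impl Fn(&str) -> String) -> String {
    let layers = ["base.md", personality_file(p), mode_file(m), approval_file(a)];
    layers.iter().map(|&path| load(path)).collect::<Vec<_>>().join("\n\n")
}
```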

TODO_BACKEND.md §6
2026-04-27 19:33:52 -05:00
Hunter Bown d4b9ccfdb3 feat(subagent): full registry inheritance + auto-approve + depth cap + cwd (#99)
v0.6.6 — sub-agents inherit the parent's full tool registry,
auto-approve, respect a depth cap, and propagate cancellation.
Adds optional cwd to agent_spawn for parallel-worktree dispatch.
Schema-ready for roles (full library lands in 0.6.7).

Changes:
- New ToolRegistryBuilder::with_full_agent_surface(...) shared by parent and child
- SubAgentToolRegistry::new refactored to use shared builder; per-type
  allowlist becomes advisory
- SubAgentRuntime gains auto_approve, spawn_depth, max_spawn_depth, cancel_token
- Depth check at spawn entry; cancellation cascade via CancellationToken::child_token()
- <deepseek:subagent.done> sentinel emitted on child completion
- cwd: Option<PathBuf> on agent_spawn with workspace-boundary validation
- Stream wall-clock cap bumped to 30 min (was 300s)
- max_spawn_depth configurable via EngineConfig (default 3)
- Version bump to 0.6.6
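
A sketch of the two spawn-entry checks. It assumes the root agent runs at depth 0 and a child landing exactly at `max_spawn_depth` is still allowed. The `cwd` check here is purely lexical, whereas real validation may canonicalize and resolve symlinks. `SpawnError`, `check_depth`, and `validate_cwd` are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum SpawnError {
    DepthExceeded { depth: u32, max: u32 },
    CwdOutsideWorkspace(PathBuf),
}

/// A child runs at `parent_depth + 1`; refuse it past the cap.
pub fn check_depth(parent_depth: u32, max_spawn_depth: u32) -> Result<u32, SpawnError> {
    let depth = parent_depth + 1;
    if depth > max_spawn_depth {
        Err(SpawnError::DepthExceeded { depth, max: max_spawn_depth })
    } else {
        Ok(depth)
    }
}

/// Resolve `cwd` against the workspace (an absolute `cwd` replaces it), fold
/// `.` and `..` lexically, and reject anything that escapes the workspace root.
pub fn validate_cwd(workspace: &Path, cwd: &Path) -> Result<PathBuf, SpawnError> {
    let joined = workspace.join(cwd);
    let mut normalized = PathBuf::new();
    for comp in joined.components() {
        match comp {
            Component::ParentDir => {
                normalized.pop();
            }
            Component::CurDir => {}
            other => normalized.push(other.as_os_str()),
        }
    }
    if normalized.starts_with(workspace) {
        Ok(normalized)
    } else {
        Err(SpawnError::CwdOutsideWorkspace(normalized))
    }
}
```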

Closes #99.
2026-04-27 19:16:22 -05:00
Hunter Bown 2787cdc7b9 refactor(tools): drop parallel_fanout — rlm is the only RLM tool
Two near-duplicate top-level tools made the surface confusing. With
parallel_fanout (formerly rlm_query) removed, there's exactly one RLM
shape: load a long input as `context` in a Python REPL via `rlm`, and
let the sub-agent fan out from inside the REPL via `llm_query_batched`
where it has `context` in scope to chunk against.

For non-RLM parallel work the dispatcher already runs multiple tool
calls per turn concurrently — no separate fan-out tool needed. The
GenericToolCell.prompts rendering hook stays (one-row-per-child for any
future fan-out tool), but no tool currently populates it.

Also drops two stray test artifacts (rlm_catalog.md, rlm_test_doc.md)
the model wrote to repo root during a previous live test session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:14:10 -05:00
Hunter Bown 9dd0d12cea refactor(tools): rename rlm_process → rlm, rlm_query → parallel_fanout
Two top-level tools shared the rlm_ prefix but did completely different
things — rlm_query was a flat parallel-completion fan-out wearing an
RLM-shaped name, and rlm_process was the actual recursive language model.
The overlap was the source of the "our rlm query is completely wrong"
confusion.

  rlm_process  → rlm              # single, honest name for the recursive tool
  rlm_query    → parallel_fanout  # honest name for the flat fanout

Internal renames follow:
  Op::RlmQuery       → Op::Rlm
  AppAction::RlmQuery → AppAction::Rlm
  handle_rlm_query    → handle_rlm
  RlmProcessTool      → RlmTool
  RlmQueryTool        → ParallelFanoutTool
  RlmChildClient      → FanoutChildClient
  with_rlm_process_tool → with_rlm_tool
  with_rlm_query_tool   → with_parallel_fanout_tool

The REPL helpers `rlm_query` / `rlm_query_batched` / `llm_query` /
`llm_query_batched` keep their names — those are correctly named (they
ARE recursive within the REPL) and the model knows them from the system
prompt and metadata.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:10:17 -05:00
Hunter Bown 2865c9a766 refactor(rlm): drop HTTP sidecar — long-lived Python REPL over stdin/stdout
The RLM tool used to spawn a fresh `python3 -c "..."` per round and route
sub-LLM calls through a localhost axum sidecar; state persisted only via a
JSON file (lossy: imports and non-JSON values were lost). The model could
also short-circuit by replying with prose and the loop would ship the
prose as if it came from the REPL.

This commit replaces that with one long-lived `python3 -u` subprocess per
turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels.
No more HTTP server, no more port allocation, no more JSON state file —
variables, imports, and any other Python state persist naturally across
rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services
`llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched`
calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs.
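
The commit does not show the wire format, so the line shapes below (`<sentinel>:RPC:<method>:<payload>` and `<sentinel>:DONE`) are hypothetical. The sketch only illustrates why a per-session random prefix works: ordinary program output cannot accidentally look like a control line.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReplLine<'a> {
    /// Ordinary stdout from the user's code.
    Output(&'a str),
    /// A helper call (e.g. `llm_query`) that Rust must service.
    Rpc { method: &'a str, payload: &'a str },
    /// End of the current round.
    RoundDone,
}

pub fn classify_line<'a>(line: &'a str, sentinel: &str) -> ReplLine<'a> {
    let rest = match line.strip_prefix(sentinel).and_then(|r| r.strip_prefix(':')) {
        Some(rest) => rest,
        None => return ReplLine::Output(line),
    };
    if rest == "DONE" {
        return ReplLine::RoundDone;
    }
    if let Some(rpc) = rest.strip_prefix("RPC:") {
        if let Some((method, payload)) = rpc.split_once(':') {
            return ReplLine::Rpc { method, payload };
        }
    }
    // Sentinel-prefixed but unrecognized: treat as plain output.
    ReplLine::Output(line)
}
```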

The system prompt is tightened: the only legal turn shape is one
` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a
sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination
is gone, replaced by `NoCode` which only surfaces after multiple consecutive
empty rounds. `rlm_process` now returns a per-round trace (code summary,
sub-LLM call count, elapsed) so callers can verify the model actually
engaged with `context` rather than guessing from the preview.

Net: -313 lines. 17 new REPL runtime tests cover variable persistence,
import persistence, RPC round-trips, FINAL capture, and error recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:44:53 -05:00
Hunter Bown 5cec1534be feat(rlm): align with reference impl + add rlm_process tool; bump 0.6.5
The previous /rlm slash command flow had a UI rendering gap (the answer
never made it back to the model's view) and required the user to invoke
it manually. This commit pivots to a tool-call surface and aligns the
in-REPL helpers with the paper authors' canonical reference implementation
(alexzhang13/rlm), so the same prompts and decomposition patterns transfer.

New tool: rlm_process
- crates/tui/src/tools/rlm_process.rs
- Inputs: task (small, shown to root LLM each iter as root_prompt) +
  exactly one of file_path (workspace-relative, preferred) or content
  (inline, capped at 200k chars). Optional child_model and max_depth.
- Loaded across Plan/Agent/YOLO; never deferred via ToolSearch.
- Returns the final answer string + metadata (iterations, duration,
  tokens, termination).

REPL surface aligned with reference (alexzhang13/rlm):
- Variable name `context` (was PROMPT)
- Code fence ```repl (was ```python; python/py kept as fallback)
- Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm),
  rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR,
  repl_get/repl_set
- Top-level JSON-serializable user variables auto-persist across rounds
  (no repl_set ceremony required)
- FINAL(...) / FINAL_VAR(...) parseable from the model's raw response
  text (parse_text_final), in addition to the in-REPL sentinel path.
  Code-fenced occurrences are correctly ignored to prevent false hits.
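
A sketch of text-level `FINAL(...)` detection that skips fenced code. It assumes fences open and close on lines starting with three backticks, and that the argument ends at the matching close paren on the same line. Both are simplifications of whatever `parse_text_final` actually does.

```rust
pub fn parse_text_final(text: &str) -> Option<String> {
    let mut in_fence = false;
    for line in text.lines() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue; // FINAL inside code is program text, not an answer
        }
        if let Some(start) = line.find("FINAL(") {
            let arg = &line[start + "FINAL(".len()..];
            // Track nesting so FINAL(f(a, b)) keeps its inner parentheses.
            let mut depth = 1usize;
            for (i, ch) in arg.char_indices() {
                match ch {
                    '(' => depth += 1,
                    ')' => {
                        depth -= 1;
                        if depth == 0 {
                            return Some(arg[..i].trim().to_string());
                        }
                    }
                    _ => {}
                }
            }
        }
    }
    None
}
```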

Sidecar (axum, 127.0.0.1:0):
- Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout,
  cap 16 prompts per batch). Mirrors the reference's batched semantics.

Other:
- System prompt rewritten with reference's strategy patterns
  (PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE
  decomposition via rlm_query → programmatic compute + LLM interp).
- Strict termination loop unchanged: must emit ```repl or text-level
  FINAL each round; one fence-less round → reminder, two → DirectAnswer.
- /rlm slash command remains for manual debug; description points the
  model toward rlm_process for the in-agent flow.

Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5.

Gates green: cargo fmt, cargo clippy --all-targets --all-features
--locked -D warnings, cargo test --workspace --all-features --locked
(all pass), parity_protocol/parity_state/snapshot, and
RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:17:09 -05:00
Hunter Bown 950a66c24a chore(rlm): drop "Algorithm 1" from user-facing status strings
Keep the paper reference in code/doc comments where it actually helps a
future reader; the live status line just needs to say what's happening,
not cite the paper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:51:17 -05:00
Hunter Bown bd938a559c fix(rlm): wire real recursive substrate; bump 0.6.4
The v0.6.3 RLM loop had Algorithm 1's outer shape but the substrate was
non-functional: `llm_query()` was a Python stub that returned a hardcoded
string and `child_model` was bound with an underscore prefix and silently
dropped. The recursive sub-LLM call advertised by /rlm never fired.

This commit wires the substrate end-to-end per Zhang/Kraska/Khattab
(arXiv:2512.24601, Algorithm 1):

- New axum HTTP sidecar (`rlm/sidecar.rs`) bound to 127.0.0.1:0 for the
  duration of one RLM turn. Python's `llm_query()` and `sub_rlm()` are
  real `urllib.request` POSTs; Rust services them via the existing
  DeepSeek client. Token usage from sidecar-served calls folds into the
  parent `RlmTurnResult.usage`.
- `child_model` is plumbed through `Op::RlmQuery` → `AppAction::RlmQuery`
  → `run_rlm_turn` → sidecar handlers; default remains `deepseek-v4-flash`.
- New `sub_rlm(prompt)` Python helper runs a full Algorithm-1 turn at
  depth-1 (paper's `sub_RLM`). Default `max_depth = 2` from `/rlm`. The
  recursive opaque-future cycle is broken by returning a concrete
  `Pin<Box<dyn Future + Send>>` from `run_rlm_turn_inner`.
- Strict termination: the loop ends only via `FINAL(value)` (or the
  iteration cap). One fence-less round is tolerated with a reminder
  appended; two consecutive ones surface the model text as a
  `RlmTermination::DirectAnswer` exit. New `RlmTermination` enum lets
  callers tell `Final | DirectAnswer | Exhausted | Error` apart.
- Richer `Metadata(state)`: includes paper-required access patterns
  (`repl_get` / slicing / `splitlines` / `repl_set` / `llm_query` /
  `sub_rlm` / `FINAL`) and a live list of variable keys currently in
  the REPL state file.
- Unicode-safe `truncate_text` (was mixing bytes with chars), per-turn
  state-file cleanup, `ROOM_TEMPERATURE` typo → `ROOT_TEMPERATURE`.
- New end-to-end test `sidecar_url_is_exported_to_python_env` stands up
  a stand-in axum server, runs `print(llm_query('hello'))` in the real
  PythonRuntime, and asserts the reply round-trips. Catches future
  regressions in sidecar URL passthrough.
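
The bytes-vs-chars fix can be sketched by cutting at a char boundary found with `char_indices`. Slicing at an arbitrary byte index (`&s[..n]`) panics inside a multi-byte character. Appending `…` when text is cut is an assumption about the real `truncate_text`.

```rust
/// Keep at most `max_chars` chars, appending "…" only when something was cut.
pub fn truncate_text(s: &str, max_chars: usize) -> String {
    match s.char_indices().nth(max_chars) {
        // Fewer than max_chars + 1 chars: nothing to cut.
        None => s.to_string(),
        // `byte_idx` is the start of char number `max_chars`, always a valid boundary.
        Some((byte_idx, _)) => format!("{}…", &s[..byte_idx]),
    }
}
```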

Versions: workspace 0.6.3 → 0.6.4 in Cargo.toml; npm wrapper 0.6.3 → 0.6.4
in npm/deepseek-tui/package.json.

Gates: cargo fmt, cargo clippy --all-targets --all-features --locked
-D warnings, cargo test --workspace --all-features --locked (1088
passed), parity_protocol/parity_state/snapshot, and
RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps: all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:36:59 -05:00
Hunter Bown e8c3e7d1bf docs: fix rustdoc release warnings 2026-04-26 23:43:31 -05:00
Hunter Bown bb3c460358 chore: satisfy v0.6.3 release gates 2026-04-26 23:39:03 -05:00
Hunter Bown 42c684367f feat(rlm): implement true RLM loop per Algorithm 1 (Zhang et al., arXiv:2512.24601)
Adds the true Recursive Language Model (RLM) inference paradigm:

- rlm/mod.rs — module root with public API
- rlm/prompt.rs — RLM system prompt teaching the model to write code
- rlm/turn.rs — Algorithm 1 implementation:
  - P stored as REPL variable (NEVER in LLM context window)
  - Metadata-only context sent to root LLM (constant-size)
  - LLM generates Python code, not free text
  - Code executed in PythonRuntime with llm_query() for recursion
  - FINAL() detection ends the loop
- Op::RlmQuery variant in ops.rs
- /rlm command in the command system
- AppAction::RlmQuery handler in ui.rs
- PythonRuntime::with_state_path made public for RLM integration
- 18 new unit tests for code extraction, metadata building, truncation

Key differences from previous 'RLM-inspired' approach:
- P is external (REPL variable), not in LLM context
- Only metadata(state) in LLM context (constant-size)
- LLM generates code, not free text + tool calls
- Sub-LLM recursion via llm_query() inside REPL code
- FINAL() mechanism for programmatic termination
2026-04-26 23:34:17 -05:00
Hunter Bown ac8a882be5 chore: clean v0.6.3 repl build warnings 2026-04-26 23:12:57 -05:00
Hunter Bown 4e46fd06f6 feat(repl): wire PythonRuntime into engine turn loop (Phase 2)
After the assistant message is persisted, when tool_uses is empty,
check for inline ```repl blocks and execute them via PythonRuntime:

- Extract REPL blocks from assistant text
- Spawn PythonRuntime and execute each block sequentially
- If a round returns FINAL: replace the assistant message text with
  the final value and break the turn
- If no FINAL: append truncated stdout/stderr as user feedback and
  continue the turn loop for iterative refinement
- Emit status events so the user sees 'REPL round N: ...' in the UI
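
The extraction step can be sketched as a small line scanner. It assumes fences open with a line that is exactly three backticks followed by `repl` and close with a bare three-backtick line, and it ignores unterminated blocks. `extract_repl_blocks` is a hypothetical name.

```rust
/// Collect the bodies of ```repl blocks in order; other fences are skipped.
pub fn extract_repl_blocks(text: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut in_block = false;
    let mut body: Vec<&str> = Vec::new();
    for line in text.lines() {
        let trimmed = line.trim();
        if !in_block {
            if trimmed.strip_prefix("```").map(str::trim) == Some("repl") {
                in_block = true;
                body.clear();
            }
        } else if trimmed == "```" {
            blocks.push(body.join("\n"));
            in_block = false;
        } else {
            body.push(line);
        }
    }
    blocks // an unterminated block is dropped
}
```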

All 26 REPL tests + RLM tests pass. Release build verified.

Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2
2026-04-26 18:54:46 -05:00
Hunter Bown 2fcc637d4f Merge pull request #118 from Hmbown/fix/issue-115-context-percent
fix(tui): context-usage % no longer drops after multi-round turns (#115)
2026-04-26 18:01:14 -05:00
Hunter Bown b7c5cb4112 test(ui): update auto-compact test for #115 estimate-first behavior 2026-04-26 17:55:56 -05:00
Hunter Bown 60f5f39584 Merge pull request #109 from Hmbown/feat/issue-85-pending-input-preview
feat(tui): pending-input preview widget (#85 Phase 1)
2026-04-26 17:54:35 -05:00