codewhale

dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	5e7dbcd32b	Merge branch 'feat/v067-help' (#93 help overlay)	2026-04-27 22:17:30 -05:00
Hunter Bown	48c30473da	Merge branch 'feat/v067-providers' (#52 OpenRouter + Novita providers)	2026-04-27 22:17:27 -05:00
Hunter Bown	363f064fce	Merge branch 'fix/v067-file-mention' (#101 @-file mention BLOCKER fix)	2026-04-27 22:17:22 -05:00
Hunter Bown	0f7252198d	Merge branch 'fix/v067-stream-retry' (#103 stream-error retry + diagnostics)	2026-04-27 22:17:19 -05:00
Hunter Bown	a3d0134173	Merge branch 'feat/v067-approval' (#129 approval modal Codex-style takeover)	2026-04-27 22:17:16 -05:00
Hunter Bown	7819fcc18b	Merge branch 'feat/v067-cards' (#128 sub-agent in-transcript cards)	2026-04-27 22:17:11 -05:00
Hunter Bown	176a2ba4f4	Merge branch 'feat/v067-mailbox' (#130 sub-agent mailbox port)	2026-04-27 22:17:00 -05:00
Hunter Bown	34cba09e22	Merge branch 'feat/v067-prompts' (#68 sub-agent prompts tightening)	2026-04-27 22:16:57 -05:00
Hunter Bown	63cb06637b	feat(tui): #128 in-transcript DelegateCard + FanoutCard Cards consume the #130 mailbox stream and render live in the transcript: - DelegateCard: last-3-actions tree for active agent_spawn - FanoutCard: dot-grid + aggregate stats for agent_swarm / rlm fanout Sidebar demoted to a navigator (count + role); detail lives in the card. Engine wires SubAgentRuntime::with_mailbox so the primitive actually flows. Cards re-bind on session resume via runtime_threads agent_ids. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 22:15:26 -05:00
Hunter Bown	f118db8201	feat(providers): #52 OpenRouter + Novita as first-class providers ProviderKind gains Openrouter + Novita variants; ModelRegistry registers deepseek/deepseek-v4-{pro,flash} against both. /provider opens a picker modal with inline API-key prompt for un-configured providers. Env fallbacks: OPENROUTER_API_KEY, NOVITA_API_KEY. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:58:51 -05:00
Hunter Bown	9d4c1c1966	feat(tui): #129 approval modal Codex-style takeover Reflowed approval.rs to a full-screen modal with two stakes-based variants: benign (single-key approve / always) and destructive (explicit confirm). Variant routing classifies from tool kind + command-safety so destructive ops never get a muscle-memory accept. Existing approval tests still pass; new tests cover variant routing + keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:57:16 -05:00
Hunter Bown	36320c5bea	fix(client): #103 stream-error diagnostics + transparent retry on early decode failure Phase 1: log full reqwest error chain + headers + bytes-received at decode site Phase 2: HTTP/2 keepalive settings + tcp keepalive on the reqwest builder Phase 3: engine transparently retries when stream errors before any content; surface error on mid-stream failure (no double-bill); stream_errors threshold relaxed 3 -> 5 with the new keepalive Phase 4: unit tests for the four classes of stream failure Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:57:13 -05:00
Hunter Bown	b759e3f74c	feat(tui): #93 help overlay — `?` opens searchable command + keybinding reference New HelpView modal lists all slash commands with descriptions and all keybindings, with a live substring filter. Bound to `?` when focus is outside the composer; Esc / `?` toggles. Slash commands pull from the existing slash_menu registry; keybindings pull from a new KeybindingCatalog single-source-of-truth so docs can't drift from the wired handlers.	2026-04-27 21:54:25 -05:00
Hunter Bown	fd13dffd60	fix(tui): #101 @-file mention resolves CWD before workspace fallback Two-pass resolution in file_mention::resolve_mention_path -- try workspace.join first, then std::env::current_dir().join, then a basename fuzzy fallback. Extracted the shared resolver into working_set.rs so the future Ctrl+P fuzzy picker (#97) uses the same logic. Tests cover the workspace/CWD divergence repro that was masquerading as a "@ doesn't work" report. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:52:25 -05:00
Hunter Bown	8d8c1ad2d4	feat(prompts): #68 tighten sub-agent output format + stop conditions Each sub-agent type now has an explicit SUMMARY / EVIDENCE / CHANGES / RISKS / BLOCKERS output contract, mode-specific guidance (explorer / planner / reviewer / general), and tool-calling conventions that prefer the typed tool surface over exec_shell shellouts. The output format is defined once and referenced from each per-type prompt, so future tweaks live in one place.	2026-04-27 21:50:18 -05:00
Hunter Bown	32750cb52d	feat(subagent): #130 mailbox abstraction with seq + backpressure Internal upgrade — public tool surface (agent_spawn, agent_swarm, …) unchanged. The mailbox primitive replaces ad-hoc mpsc plumbing in the runtime so: - progress events have monotonic ordering - subscribers get watch-based backpressure - close-as-cancel propagates through nested children Pairs with #128 (in-transcript cards consume the mailbox stream). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:50:14 -05:00
Hunter Bown	4ac7219d77	fix(test): cycle archive tests are Windows-portable The Windows CI test for archive_cycle_writes_jsonl_with_header_and_messages was failing because: 1. The path-suffix assertion used "/1.jsonl" as a literal — Windows uses backslashes, so the assertion never matched. Replace with file_name() comparison which is platform-agnostic. 2. set_var("HOME", ...) doesn't redirect dirs::home_dir() on Windows; that function reads USERPROFILE on Windows. Set both env vars so the test redirects portably. Same HomeGuard pattern in tools/recall_archive.rs tests was passing on Windows by luck (each test uses a unique session_id, so writes to the real ~/.deepseek/sessions/ didn't clash) but was polluting the user's home directory in CI. Mirror the dual-env-var fix there too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:25:33 -05:00
Hunter Bown	0db918f9e8	Merge branch 'feat/v066-ui-redesign' #121 — UI redesign (4 of 6 sub-areas + foundation): - Color-depth + brightness palette helpers - Speaker glyphs (▎ / ●) with 2s pulse - Reasoning treatment (dashed rail, italic, warm tint) - Tool-card verb glyphs + family vocabulary module - Footer crest waves + cost-on-left + 150ms cadence Deferred (separate follow-up issues): sub-agent in-transcript cards, approval modal Codex-style takeover.	2026-04-27 21:10:50 -05:00
Hunter Bown	95d6937d34	Merge branch 'feat/v066-steer-queue' #122 — Esc-to-steer + queue visibility. SubmitDisposition state machine, pending_steers/rejected_steers buckets, partial-output [interrupted] save.	2026-04-27 21:10:41 -05:00
Hunter Bown	f95be44bc8	Merge branch 'feat/v066-cycle-restart' #124 + #127 — checkpoint-restart context cycles + recall_archive tool.	2026-04-27 21:10:37 -05:00
Hunter Bown	2b2bddcf7e	style: cargo fmt + clippy fixes for v0.6.6 UI redesign Run `cargo fmt --all` after the four redesign sub-areas land to settle attribute placement (`#[allow(dead_code)]` lives after doc comments, not between them — interleaving was splitting docs from items). Inline the trailing `let dom = …; dom` in `nearest_ansi16` to satisfy clippy::let_and_return. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:08:29 -05:00
Hunter Bown	aeba004c7b	feat(tui): tool-card verb glyphs + family vocabulary module Sub-area #3 of the v0.6.6 UI redesign (issue #121). Introduces `crates/tui/src/tui/widgets/tool_card.rs` — a small, self-contained vocabulary module that owns: - `ToolFamily` enum (Read / Patch / Run / Find / Delegate / Fanout / Think / Generic) and the verb-glyph + label per family (▷ read, ◆ patch, ▶ run, ⌕ find, ◐ delegate, ⋮⋮ fanout, … think) - `tool_family_for_title` — maps the legacy header titles (`"Shell"`, `"Patch"`, `"Workspace"`, `"Search"`, `"Diff"`, `"Image"`) to a family, so existing call sites pick up the new glyph without re-architecture - `tool_family_for_name` — maps actual tool names (`agent_spawn`, `apply_patch`, etc.) for `GenericToolCell`, which shares the catch-all `"Tool"` title across every model-facing tool - `CardRail` + `rail_glyph` — the `╭ │ ╰` rail vocabulary, declared here so any future per-card refactor has the matching glyphs Wires the verb glyph + label into `render_tool_header` and adds a `render_tool_header_with_family` overload so `GenericToolCell` can route by tool name rather than the generic title. The header now reads `<spinner> <verb-glyph> <verb> <state>` instead of `<spinner> <Title-Case-Word> <state>`. Existing parity tests for ExecCell / PlanUpdate are updated to assert against the new header structure (verb + glyph) — the colour wiring is unchanged. New tests pin the verb-glyph format end-to-end: `agent_spawn` → `◐ delegate`, exec → `▶ run`. Spinner cadence (TOOL_STATUS_SYMBOL_MS = 720 ms) is unchanged — the spec already matched. Deferred to a follow-up: full per-card rail (`╭ │ ╰`) refactor that threads `CardRail` through every cell render path. The vocabulary is in place; the layout pass is the next bite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:04:23 -05:00
Hunter Bown	7c3a01c7b8	feat(footer): crest waves + cost on left + 150 ms tick cadence Sub-area #5 of the v0.6.6 UI redesign (issue #121). Spout strip: - Replace box-drawing arc cups (`╭───╮`) with paired crests (`⌒‿`) over the existing water-surface (`─`). Crests read as gentle ripples instead of hard architectural arches — calmer eye-feel. - Two crests at independent cadences: A advances every 4 ticks (~600 ms), B every 6 ticks (~900 ms). Phase jitter every 17 ticks (~2.5 s) keeps the pattern from settling into a strict beat. - Frame counter cadence in `ui.rs` retimed from 80 ms to 150 ms so the 4×6×17 tick math lands on the spec'd timings. Footer left cluster consolidates "what costs you what": - Cost chip moves from the right-hand parade to the left, between model and status: `mode · model · cost · status`. - Priority drop is now status → cost → truncate model → mode-only. Cost outranks status because it's steady info; status is a transient signal. - Right cluster shrinks to coherence / agents / replay / cache. Tests: existing strip determinism / position-advances tests are retuned for the new tick math (12-tick window covers both crests). New tests pin the cost slot's order on wide widths and confirm cost survives status drop in the priority cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:59:07 -05:00
Hunter Bown	764aed65ed	feat(tui): #122 Esc-to-steer + queue visibility While a turn is running, Esc with non-empty composer input now steers: the typed text is captured, the in-flight HTTP request is cancelled, and on TurnComplete::Interrupted every accumulated steer is merged into a single fresh user message that re-enters the engine. Empty-input Esc still cancels exactly as before. State machine. SubmitDisposition::{Immediate, Queue, Steer} replaces the implicit if/else in submit_or_steer_message; truth table preserves the offline_mode + busy fallback path. App.pending_steers / rejected_steers / submit_pending_steers_after_interrupt back the new flow and the queue-visibility widget. Partial save. Deliberate divergence from openai/codex which discards on abort: V4 thinking is expensive, so the streaming Assistant cell is tagged '[interrupted] …' (or '[interrupted]' when nothing streamed yet) and its spinner is flipped off. The TurnComplete handler also calls the helper so Ctrl+C / network failures get the same treatment, idempotent with the optimistic call in the Esc handler. Queue visibility. PendingInputPreview already supported all three buckets; build_pending_input_preview now populates pending_steers and rejected_steers alongside queued_messages. rejected_steers stays empty under today's engine paths (no rejection signal yet) but renders if/when populated. Recovery. If TurnComplete arrives with Failed instead of Interrupted while pending_steers is non-empty, the steers are demoted to the visible queue so they're not silently lost. Tests. 13 new app-level units cover the disposition truth table, push/drain semantics, double-Esc idempotency, and the partial-save helper. 11 new ui-level units cover Esc-action routing, slash-menu priority, whitespace-only input handling, merge_pending_steers (empty / single / multi / skill-instruction), and the three-bucket preview. Closes #122.	2026-04-27 20:59:00 -05:00
Hunter Bown	2b0e73a4cf	feat(tui): reasoning cells get dashed rail, italic body, warm tint Sub-area #2 of the v0.6.6 UI redesign (issue #121). Reasoning is the only deliberately-warm element in the redesigned transcript. The treatment makes that visible: - Header opener becomes `…` (slow exhale) instead of the spinner glyph - Body left rail switches from solid `▏` to dashed `╎` so it visibly differs from message body and tool output rails - Body text carries an italic modifier - Body lines tint with `palette::reasoning_surface_tint(depth)` — 12% blend of SURFACE_REASONING (#362C1A) over DEEPSEEK_INK. ANSI-16 terminals get no bg (the named-color mapping would distort the warm) - A trailing `▎` cursor in ACCENT_REASONING_LIVE follows the most recent body line during streaming, suppressed under low_motion Wires up palette helpers from the prior commit: `ColorDepth::detect`, `reasoning_surface_tint`, `blend`. SURFACE_REASONING is no longer dead-coded. The unused `thinking_symbol` helper is removed since the new header doesn't spin. Tests: dashed rail and italic body land on every body line; streaming cursor appears only when motion is allowed; collapsed-summary affordance keeps working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:52:59 -05:00
Hunter Bown	10bc2480db	feat(core): #124 + #127 — checkpoint-restart cycles + recall_archive Replaces lossy summarization compaction with a checkpoint-restart architecture (#124). At 110K cumulative tokens (per V4's 128K retrieval elbow) the engine runs a briefing turn, archives the cycle to JSONL at ~/.deepseek/sessions/<id>/cycles/<n>.jsonl, then resets the in-memory buffer to a fresh context: original system prompt + structured state (plan/todos/working-set/sub-agents) + the model-curated <carry_forward> briefing (~3K token cap, hard-bounded). The compaction summarizer is now off by default. Per-model thresholds in [cycle.per_model] let operators tune deepseek-v4-pro vs -flash separately. Phase guard in should_advance_cycle blocks mid-tool/stream/approval boundaries; engine only invokes at clean turn-completed events. Sub-agents are not awaited — their handles are captured in the structured-state block so the new cycle sees them still running. Adds the recall_archive tool (#127) — BM25 over message text in archived cycles, top-N hits with cycle/index/excerpt. Always-loaded across modes via should_default_defer_tool so the agent doesn't need ToolSearch to discover it. Children inherit it via with_full_agent_surface. UI surfaces: - /cycles, /cycle <n>, /recall <query> slash commands - Sidebar shows cycle counter once a boundary fires - CycleAdvanced engine event carries the full briefing so the UI can populate app.cycle_briefings for /cycle <n> - runtime_threads schema bumped to v2 (cycle.advanced events appear in the durable timeline; load rejects future versions) Tests: 21 cycle_manager + 13 recall_archive + 4 commands::cycle. All 1168 workspace tests pass. Three parity gates pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:52:29 -05:00
Hunter Bown	5b7ff8cb69	feat(tui): replace literal speaker labels with calm glyphs Sub-area #1 of the v0.6.6 UI redesign (issue #121). User cells now lead with `▎` (solid bar, no animation — input is finished). Assistant cells lead with `●`. While streaming, the bullet pulses on a 2-second cycle between 30%..100% brightness via `palette::pulse_brightness`; once a turn completes the bullet sits at full DEEPSEEK_SKY so finished history reads as solid. Honors `low_motion`: the pulse is suppressed and the glyph holds full brightness regardless of streaming state. Pager / clipboard exports (`transcript_lines`) also skip the pulse so screenshots are stable. Existing pager titles in `ui.rs` (`history_cell_to_text`) keep the literal "You" / "Assistant" wording — those drive the modal title bar and read better as words than as glyphs. Tests: glyphs replace the literal labels in both User and Assistant cells; streaming pulse demonstrably dips below source brightness; idle and low_motion both pin to source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:50:24 -05:00
Hunter Bown	9358b92af5	feat(palette): color-depth fallback + brightness pulse helpers Adds the foundation for the v0.6.6 UI redesign (issue #121): - ColorDepth enum with detect() reading COLORTERM/TERM - adapt_color / adapt_bg gates for ANSI-16 terminals (drops bg tints rather than coarse-mapping them, which would distort the palette) - blend(fg, bg, alpha) for alpha compositing on RGB - reasoning_surface_tint() — 12% blend over DEEPSEEK_INK (None on ANSI-16) - pulse_brightness(color, now_ms) — 30%..100% sine swing on a 2s cycle - nearest_ansi16() helper for legacy-terminal foreground mapping Helpers carry #[allow(dead_code)] until the per-area redesign sub-tasks wire them in (speaker pulse, reasoning treatment, footer cost cluster). Tests cover envelope bounds and brand-color routing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:47:27 -05:00
Hunter Bown	e075ecd0fe	chore: add #[allow(dead_code)] for v0.6.6 additions not yet wired - ClientError, StreamError, and their impl blocks in error_taxonomy.rs - ApprovalCache in approval_cache.rs (pending #66 follow-up wiring) - Legacy prompt constants in prompts.rs (backward compat) - with_mcp_tools and McpToolAdapter in registry.rs (pending MCP migration) - Fix rlm_query → rlm in when_not_to_use_sections_present test	2026-04-27 19:57:53 -05:00
Hunter Bown	8be032b6bd	style: cargo fmt after parallel stream merges	2026-04-27 19:43:58 -05:00
Hunter Bown	f10d5d1829	Merge branch 'feat/tool-polish'	2026-04-27 19:41:44 -05:00
Hunter Bown	7a06915b0b	feat(tools): approval cache + error taxonomy + defer_loading + command safety trim - Add fingerprint-based ApprovalCache with call-specific keys (patch hash, shell prefix, URL host) instead of tool-name keys. Session-keyed. - Add ClientError/StreamError enums in error_taxonomy.rs with Retry-After header support. Wire ErrorEnvelope into Event::Error. - Add defer_loading() default method to ToolSpec trait. McpToolAdapter returns true for non-discovery MCP tools. - Add with_mcp_tools() on ToolRegistryBuilder for unified pipeline. - Trim DANGEROUS_PATTERNS in command_safety.rs from 25→5 entries. Only rm -rf and fork bomb remain; chaining/substitution downgraded to RequiresApproval. Matches Codex's restraint. - ApprovalRequired events now carry approval_key for UI caching. TODO_BACKEND.md §1, §5	2026-04-27 19:40:49 -05:00
Hunter Bown	a32148dac9	Merge branch 'feat/prompts-restructure'	2026-04-27 19:34:27 -05:00
Hunter Bown	a345a956aa	feat(perf): wire frame-rate limiter + adaptive chunking low-motion mode - Wire 120 FPS FrameRateLimiter into run_event_loop via time_until_next_draw + mark_emitted - Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL - Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode - Add StreamingState::set_low_motion() to propagate to all block policies - Tool spinner already freezes on first frame when low_motion is set TODO_BACKEND.md §3, TODO_FIXES.md #4	2026-04-27 19:33:52 -05:00
Hunter Bown	6ef2421d61	feat(prompts): restructure into composable personality overlays Split monolithic agent.txt into: - base.md: core identity, toolbox, subagent.done protocol - personalities/calm.md + playful.md: voice overlays - modes/agent.md, plan.md, yolo.md: mode deltas - approvals/auto.md, suggest.md, never.md: approval-policy deltas - compact.md: 9-line compaction handoff template Add compose_prompt() in prompts.rs: base → personality → mode → approval. Add Personality enum with from_settings(). Preserve legacy .txt constants for backward compatibility. TODO_BACKEND.md §6	2026-04-27 19:33:52 -05:00
Hunter Bown	d4b9ccfdb3	feat(subagent): full registry inheritance + auto-approve + depth cap + cwd (#99) v0.6.6 — sub-agents inherit the parent's full tool registry, auto-approve, respect a depth cap, and propagate cancellation. Adds optional cwd to agent_spawn for parallel-worktree dispatch. Schema-ready for roles (full library lands in 0.6.7). Changes: - New ToolRegistryBuilder::with_full_agent_surface(...) shared by parent and child - SubAgentToolRegistry::new refactored to use shared builder; per-type allowlist becomes advisory - SubAgentRuntime gains auto_approve, spawn_depth, max_spawn_depth, cancel_token - Depth check at spawn entry; cancellation cascade via CancellationToken::child_token() - <deepseek:subagent.done> sentinel emitted on child completion - cwd: Option<PathBuf> on agent_spawn with workspace-boundary validation - Stream wall-clock cap bumped to 30 min (was 300s) - max_spawn_depth configurable via EngineConfig (default 3) - Version bump to 0.6.6 Closes #99.	2026-04-27 19:16:22 -05:00
Hunter Bown	2787cdc7b9	refactor(tools): drop parallel_fanout — rlm is the only RLM tool Two near-duplicate top-level tools made the surface confusing. With parallel_fanout (formerly rlm_query) removed, there's exactly one RLM shape: load a long input as `context` in a Python REPL via `rlm`, and let the sub-agent fan out from inside the REPL via `llm_query_batched` where it has `context` in scope to chunk against. For non-RLM parallel work the dispatcher already runs multiple tool calls per turn concurrently — no separate fan-out tool needed. The GenericToolCell.prompts rendering hook stays (one-row-per-child for any future fan-out tool), but no tool currently populates it. Also drops two stray test artifacts (rlm_catalog.md, rlm_test_doc.md) the model wrote to repo root during a previous live test session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:14:10 -05:00
Hunter Bown	9dd0d12cea	refactor(tools): rename rlm_process → rlm, rlm_query → parallel_fanout Two top-level tools shared the rlm_ prefix but did completely different things — rlm_query was a flat parallel-completion fan-out wearing an RLM-shaped name, and rlm_process was the actual recursive language model. The overlap was the source of the "our rlm query is completely wrong" confusion. rlm_process → rlm # single, honest name for the recursive tool rlm_query → parallel_fanout # honest name for the flat fanout Internal renames follow: Op::RlmQuery → Op::Rlm AppAction::RlmQuery → AppAction::Rlm handle_rlm_query → handle_rlm RlmProcessTool → RlmTool RlmQueryTool → ParallelFanoutTool RlmChildClient → FanoutChildClient with_rlm_process_tool → with_rlm_tool with_rlm_query_tool → with_parallel_fanout_tool The REPL helpers `rlm_query` / `rlm_query_batched` / `llm_query` / `llm_query_batched` keep their names — those are correctly named (they ARE recursive within the REPL) and the model knows them from the system prompt and metadata. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:10:17 -05:00
Hunter Bown	2865c9a766	refactor(rlm): drop HTTP sidecar — long-lived Python REPL over stdin/stdout The RLM tool used to spawn a fresh `python3 -c "..."` per round and route sub-LLM calls through a localhost axum sidecar; state persisted only via a JSON file (lossy: imports and non-JSON values were lost). The model could also short-circuit by replying with prose and the loop would ship the prose as if it came from the REPL. This commit replaces that with one long-lived `python3 -u` subprocess per turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels. No more HTTP server, no more port allocation, no more JSON state file — variables, imports, and any other Python state persist naturally across rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services `llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched` calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs. The system prompt is tightened: the only legal turn shape is one ` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination is gone, replaced by `NoCode` which only surfaces after multiple consecutive empty rounds. `rlm_process` now returns a per-round trace (code summary, sub-LLM call count, elapsed) so callers can verify the model actually engaged with `context` rather than guessing from the preview. Net: -313 lines. 17 new REPL runtime tests cover variable persistence, import persistence, RPC round-trips, FINAL capture, and error recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:44:53 -05:00
Hunter Bown	5cec1534be	feat(rlm): align with reference impl + add rlm_process tool; bump 0.6.5 The previous /rlm slash command flow had a UI rendering gap (the answer never made it back to the model's view) and required the user to invoke it manually. Pivoting to a tool-call surface and aligning the in-REPL helpers with the canonical reference (alexzhang13/rlm) by the paper authors so the same prompts and decomposition patterns transfer. New tool: rlm_process - crates/tui/src/tools/rlm_process.rs - Inputs: task (small, shown to root LLM each iter as root_prompt) + exactly one of file_path (workspace-relative, preferred) or content (inline, capped at 200k chars). Optional child_model and max_depth. - Loaded across Plan/Agent/YOLO; never deferred via ToolSearch. - Returns the final answer string + metadata (iterations, duration, tokens, termination). REPL surface aligned with reference (alexzhang13/rlm): - Variable name `context` (was PROMPT) - Code fence ```repl (was ```python; python/py kept as fallback) - Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm), rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR, repl_get/repl_set - Top-level JSON-serializable user variables auto-persist across rounds (no repl_set ceremony required) - FINAL(...) / FINAL_VAR(...) parseable from the model's raw response text (parse_text_final), in addition to the in-REPL sentinel path. Code-fenced occurrences are correctly ignored to prevent false hits. Sidecar (axum, 127.0.0.1:0): - Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout, cap 16 prompts per batch). Mirrors the reference's batched semantics. Other: - System prompt rewritten with reference's strategy patterns (PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE decomposition via rlm_query → programmatic compute + LLM interp). - Strict termination loop unchanged: must emit ```repl or text-level FINAL each round; one fence-less round → reminder, two → DirectAnswer. - /rlm slash command remains for manual debug; description points the model toward rlm_process for the in-agent flow. Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5. Gates green: cargo fmt, cargo clippy --all-targets --all-features --locked -D warnings, cargo test --workspace --all-features --locked (all pass), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:17:09 -05:00
Hunter Bown	950a66c24a	chore(rlm): drop "Algorithm 1" from user-facing status strings Keep the paper reference in code/doc comments where it actually helps a future reader; the live status line just needs to say what's happening, not cite the citation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:51:17 -05:00
Hunter Bown	bd938a559c	fix(rlm): wire real recursive substrate; bump 0.6.4 The v0.6.3 RLM loop had Algorithm 1's outer shape but the substrate was non-functional: `llm_query()` was a Python stub that returned a hardcoded string and `child_model` was bound with an underscore prefix and silently dropped. The recursive sub-LLM call advertised by /rlm never fired. This commit wires the substrate end-to-end per Zhang/Kraska/Khattab (arXiv:2512.24601, Algorithm 1): - New axum HTTP sidecar (`rlm/sidecar.rs`) bound to 127.0.0.1:0 for the duration of one RLM turn. Python's `llm_query()` and `sub_rlm()` are real `urllib.request` POSTs; Rust services them via the existing DeepSeek client. Token usage from sidecar-served calls folds into the parent `RlmTurnResult.usage`. - `child_model` is plumbed through `Op::RlmQuery` → `AppAction::RlmQuery` → `run_rlm_turn` → sidecar handlers; default remains `deepseek-v4-flash`. - New `sub_rlm(prompt)` Python helper runs a full Algorithm-1 turn at depth-1 (paper's `sub_RLM`). Default `max_depth = 2` from `/rlm`. The recursive opaque-future cycle is broken by returning a concrete `Pin<Box<dyn Future + Send>>` from `run_rlm_turn_inner`. - Strict termination: the loop ends only via `FINAL(value)` (or the iteration cap). One fence-less round is tolerated with a reminder appended; two consecutive ones surface the model text as a `RlmTermination::DirectAnswer` exit. New `RlmTermination` enum lets callers tell `Final | DirectAnswer | Exhausted | Error` apart. - Richer `Metadata(state)`: includes paper-required access patterns (`repl_get` / slicing / `splitlines` / `repl_set` / `llm_query` / `sub_rlm` / `FINAL`) and a live list of variable keys currently in the REPL state file. - Unicode-safe `truncate_text` (was mixing bytes with chars), per-turn state-file cleanup, `ROOM_TEMPERATURE` typo → `ROOT_TEMPERATURE`. - New end-to-end test `sidecar_url_is_exported_to_python_env` stands up a stand-in axum server, runs `print(llm_query('hello'))` in the real PythonRuntime, and asserts the reply round-trips. Catches future regressions in sidecar URL passthrough. Versions: workspace 0.6.3 → 0.6.4 in Cargo.toml; npm wrapper 0.6.3 → 0.6.4 in npm/deepseek-tui/package.json. Gates: cargo fmt, cargo clippy --all-targets --all-features --locked -D warnings, cargo test --workspace --all-features --locked (1088 passed), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:36:59 -05:00
Hunter Bown	e8c3e7d1bf	docs: fix rustdoc release warnings	2026-04-26 23:43:31 -05:00
Hunter Bown	bb3c460358	chore: satisfy v0.6.3 release gates	2026-04-26 23:39:03 -05:00
Hunter Bown	42c684367f	feat(rlm): implement true RLM loop per Algorithm 1 (Zhang et al., arXiv:2512.24601) Adds the true Recursive Language Model (RLM) inference paradigm: - rlm/mod.rs — module root with public API - rlm/prompt.rs — RLM system prompt teaching the model to write code - rlm/turn.rs — Algorithm 1 implementation: - P stored as REPL variable (NEVER in LLM context window) - Metadata-only context sent to root LLM (constant-size) - LLM generates Python code, not free text - Code executed in PythonRuntime with llm_query() for recursion - FINAL() detection ends the loop - Op::RlmQuery variant in ops.rs - /rlm command in the command system - AppAction::RlmQuery handler in ui.rs - PythonRuntime::with_state_path made public for RLM integration - 18 new unit tests for code extraction, metadata building, truncation Key differences from previous 'RLM-inspired' approach: ✅ P is external (REPL variable), not in LLM context ✅ Only metadata(state) in LLM context (constant-size) ✅ LLM generates code, not free text + tool calls ✅ sub-LLM recursion via llm_query() inside REPL code ✅ FINAL() mechanism for programmatic termination	2026-04-26 23:34:17 -05:00
Hunter Bown	ac8a882be5	chore: clean v0.6.3 repl build warnings	2026-04-26 23:12:57 -05:00
Hunter Bown	4e46fd06f6	feat(repl): wire PythonRuntime into engine turn loop (Phase 2) After the assistant message is persisted, when tool_uses is empty, check for inline ```repl blocks and execute them via PythonRuntime: - Extract REPL blocks from assistant text - Spawn PythonRuntime and execute each block sequentially - If a round returns FINAL: replace the assistant message text with the final value and break the turn - If no FINAL: append truncated stdout/stderr as user feedback and continue the turn loop for iterative refinement - Emit status events so the user sees 'REPL round N: ...' in the UI All 26 REPL tests + RLM tests pass. Release build verified. Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2	2026-04-26 18:54:46 -05:00
Hunter Bown	2fcc637d4f	Merge pull request #118 from Hmbown/fix/issue-115-context-percent fix(tui): context-usage % no longer drops after multi-round turns (#115)	2026-04-26 18:01:14 -05:00
Hunter Bown	b7c5cb4112	test(ui): update auto-compact test for #115 estimate-first behavior	2026-04-26 17:55:56 -05:00
Hunter Bown	60f5f39584	Merge pull request #109 from Hmbown/feat/issue-85-pending-input-preview feat(tui): pending-input preview widget (#85 Phase 1)	2026-04-26 17:54:35 -05:00
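
Commit 9358b92af5 above describes `blend(fg, bg, alpha)` as alpha compositing on RGB and `pulse_brightness(color, now_ms)` as a 30%..100% sine swing on a 2 s cycle. The following is a minimal sketch of how such helpers could look, not the repository's actual code: the `Rgb` struct, the `pulse_factor` helper, the rounding behaviour, and the exact signatures are assumptions made for illustration.

```rust
// Hypothetical sketch of the palette helpers named in commit 9358b92af5.
// The function names follow the commit message; everything else is assumed.

/// Plain RGB triple (assumed; the real code may use a TUI crate's colour type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rgb(pub u8, pub u8, pub u8);

/// Alpha-composite `fg` over `bg`.
/// `alpha` is clamped to 0.0..=1.0; 0.0 yields `bg`, 1.0 yields `fg`.
pub fn blend(fg: Rgb, bg: Rgb, alpha: f32) -> Rgb {
    let a = alpha.clamp(0.0, 1.0);
    let mix = |f: u8, b: u8| -> u8 { (f as f32 * a + b as f32 * (1.0 - a)).round() as u8 };
    Rgb(mix(fg.0, bg.0), mix(fg.1, bg.1), mix(fg.2, bg.2))
}

/// Brightness multiplier in 0.30..=1.00 following a sine wave
/// with a 2000 ms period (hypothetical helper, not named in the log).
pub fn pulse_factor(now_ms: u64) -> f32 {
    const PERIOD_MS: u64 = 2_000;
    // Position within the current cycle, in 0.0..1.0.
    let phase = (now_ms % PERIOD_MS) as f32 / PERIOD_MS as f32;
    // Map sin's -1.0..=1.0 range onto 0.0..=1.0.
    let wave = 0.5 + 0.5 * (phase * std::f32::consts::TAU).sin();
    0.30 + 0.70 * wave
}

/// Scale each channel of `color` by the current pulse factor.
pub fn pulse_brightness(color: Rgb, now_ms: u64) -> Rgb {
    let k = pulse_factor(now_ms);
    let scale = |c: u8| -> u8 { (c as f32 * k).round() as u8 };
    Rgb(scale(color.0), scale(color.1), scale(color.2))
}
```

Under these assumptions, the 12% reasoning tint from the same commit would be a call of the form `blend(SURFACE_REASONING, DEEPSEEK_INK, 0.12)`, with `SURFACE_REASONING` set to `Rgb(0x36, 0x2C, 0x1A)`; the value of `DEEPSEEK_INK` is not given in this log.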

292 Commits