codewhale

dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	95d6937d34	Merge branch 'feat/v066-steer-queue' #122 — Esc-to-steer + queue visibility. SubmitDisposition state machine, pending_steers/rejected_steers buckets, partial-output [interrupted] save.	2026-04-27 21:10:41 -05:00
Hunter Bown	f95be44bc8	Merge branch 'feat/v066-cycle-restart' #124 + #127 — checkpoint-restart context cycles + recall_archive tool.	2026-04-27 21:10:37 -05:00
Hunter Bown	2b2bddcf7e	style: cargo fmt + clippy fixes for v0.6.6 UI redesign Run `cargo fmt --all` after the four redesign sub-areas land to settle attribute placement (`#[allow(dead_code)]` lives after doc comments, not between them — interleaving was splitting docs from items). Inline the trailing `let dom = …; dom` in `nearest_ansi16` to satisfy clippy::let_and_return. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:08:29 -05:00
Hunter Bown	aeba004c7b	feat(tui): tool-card verb glyphs + family vocabulary module Sub-area #3 of the v0.6.6 UI redesign (issue #121). Introduces `crates/tui/src/tui/widgets/tool_card.rs` — a small, self-contained vocabulary module that owns: - `ToolFamily` enum (Read / Patch / Run / Find / Delegate / Fanout / Think / Generic) and the verb-glyph + label per family (▷ read, ◆ patch, ▶ run, ⌕ find, ◐ delegate, ⋮⋮ fanout, … think) - `tool_family_for_title` — maps the legacy header titles (`"Shell"`, `"Patch"`, `"Workspace"`, `"Search"`, `"Diff"`, `"Image"`) to a family, so existing call sites pick up the new glyph without re-architecture - `tool_family_for_name` — maps actual tool names (`agent_spawn`, `apply_patch`, etc.) for `GenericToolCell`, which shares the catch-all `"Tool"` title across every model-facing tool - `CardRail` + `rail_glyph` — the `╭ │ ╰` rail vocabulary, declared here so any future per-card refactor has the matching glyphs Wires the verb glyph + label into `render_tool_header` and adds a `render_tool_header_with_family` overload so `GenericToolCell` can route by tool name rather than the generic title. The header now reads `<spinner> <verb-glyph> <verb> <state>` instead of `<spinner> <Title-Case-Word> <state>`. Existing parity tests for ExecCell / PlanUpdate are updated to assert against the new header structure (verb + glyph) — the colour wiring is unchanged. New tests pin the verb-glyph format end-to-end: `agent_spawn` → `◐ delegate`, exec → `▶ run`. Spinner cadence (TOOL_STATUS_SYMBOL_MS = 720 ms) is unchanged — the spec already matched. Deferred to a follow-up: full per-card rail (`╭ │ ╰`) refactor that threads `CardRail` through every cell render path. The vocabulary is in place; the layout pass is the next bite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 21:04:23 -05:00
Hunter Bown	7c3a01c7b8	feat(footer): crest waves + cost on left + 150 ms tick cadence Sub-area #5 of the v0.6.6 UI redesign (issue #121). Spout strip: - Replace box-drawing arc cups (`╭───╮`) with paired crests (`⌒‿`) over the existing water-surface (`─`). Crests read as gentle ripples instead of hard architectural arches — calmer eye-feel. - Two crests at independent cadences: A advances every 4 ticks (~600 ms), B every 6 ticks (~900 ms). Phase jitter every 17 ticks (~2.5 s) keeps the pattern from settling into a strict beat. - Frame counter cadence in `ui.rs` retimed from 80 ms to 150 ms so the 4×6×17 tick math lands on the spec'd timings. Footer left cluster consolidates "what costs you what": - Cost chip moves from the right-hand parade to the left, between model and status: `mode · model · cost · status`. - Priority drop is now status → cost → truncate model → mode-only. Cost outranks status because it's steady info; status is a transient signal. - Right cluster shrinks to coherence / agents / replay / cache. Tests: existing strip determinism / position-advances tests are retuned for the new tick math (12-tick window covers both crests). New tests pin the cost slot's order on wide widths and confirm cost survives status drop in the priority cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:59:07 -05:00
Hunter Bown	764aed65ed	feat(tui): #122 Esc-to-steer + queue visibility While a turn is running, Esc with non-empty composer input now steers: the typed text is captured, the in-flight HTTP request is cancelled, and on TurnComplete::Interrupted every accumulated steer is merged into a single fresh user message that re-enters the engine. Empty-input Esc still cancels exactly as before. State machine. SubmitDisposition::{Immediate, Queue, Steer} replaces the implicit if/else in submit_or_steer_message; truth table preserves the offline_mode + busy fallback path. App.pending_steers / rejected_steers / submit_pending_steers_after_interrupt back the new flow and the queue-visibility widget. Partial save. Deliberate divergence from openai/codex which discards on abort: V4 thinking is expensive, so the streaming Assistant cell is tagged '[interrupted] …' (or '[interrupted]' when nothing streamed yet) and its spinner is flipped off. The TurnComplete handler also calls the helper so Ctrl+C / network failures get the same treatment, idempotent with the optimistic call in the Esc handler. Queue visibility. PendingInputPreview already supported all three buckets; build_pending_input_preview now populates pending_steers and rejected_steers alongside queued_messages. rejected_steers stays empty under today's engine paths (no rejection signal yet) but renders if/when populated. Recovery. If TurnComplete arrives with Failed instead of Interrupted while pending_steers is non-empty, the steers are demoted to the visible queue so they're not silently lost. Tests. 13 new app-level units cover the disposition truth table, push/drain semantics, double-Esc idempotency, and the partial-save helper. 11 new ui-level units cover Esc-action routing, slash-menu priority, whitespace-only input handling, merge_pending_steers (empty / single / multi / skill-instruction), and the three-bucket preview. Closes #122.	2026-04-27 20:59:00 -05:00
Hunter Bown	2b0e73a4cf	feat(tui): reasoning cells get dashed rail, italic body, warm tint Sub-area #2 of the v0.6.6 UI redesign (issue #121). Reasoning is the only deliberately-warm element in the redesigned transcript. The treatment makes that visible: - Header opener becomes `…` (slow exhale) instead of the spinner glyph - Body left rail switches from solid `▏` to dashed `╎` so it visibly differs from message body and tool output rails - Body text carries an italic modifier - Body lines tint with `palette::reasoning_surface_tint(depth)` — 12% blend of SURFACE_REASONING (#362C1A) over DEEPSEEK_INK. ANSI-16 terminals get no bg (the named-color mapping would distort the warm) - A trailing `▎` cursor in ACCENT_REASONING_LIVE follows the most recent body line during streaming, suppressed under low_motion Wires up palette helpers from the prior commit: `ColorDepth::detect`, `reasoning_surface_tint`, `blend`. SURFACE_REASONING is no longer dead-coded. The unused `thinking_symbol` helper is removed since the new header doesn't spin. Tests: dashed rail and italic body land on every body line; streaming cursor appears only when motion is allowed; collapsed-summary affordance keeps working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:52:59 -05:00
Hunter Bown	10bc2480db	feat(core): #124 + #127 — checkpoint-restart cycles + recall_archive Replaces lossy summarization compaction with a checkpoint-restart architecture (#124). At 110K cumulative tokens (per V4's 128K retrieval elbow) the engine runs a briefing turn, archives the cycle to JSONL at ~/.deepseek/sessions/<id>/cycles/<n>.jsonl, then resets the in-memory buffer to a fresh context: original system prompt + structured state (plan/todos/working-set/sub-agents) + the model-curated <carry_forward> briefing (~3K token cap, hard-bounded). The compaction summarizer is now off by default. Per-model thresholds in [cycle.per_model] let operators tune deepseek-v4-pro vs -flash separately. Phase guard in should_advance_cycle blocks mid-tool/stream/approval boundaries; engine only invokes at clean turn-completed events. Sub-agents are not awaited — their handles are captured in the structured-state block so the new cycle sees them still running. Adds the recall_archive tool (#127) — BM25 over message text in archived cycles, top-N hits with cycle/index/excerpt. Always-loaded across modes via should_default_defer_tool so the agent doesn't need ToolSearch to discover it. Children inherit it via with_full_agent_surface. UI surfaces: - /cycles, /cycle <n>, /recall <query> slash commands - Sidebar shows cycle counter once a boundary fires - CycleAdvanced engine event carries the full briefing so the UI can populate app.cycle_briefings for /cycle <n> - runtime_threads schema bumped to v2 (cycle.advanced events appear in the durable timeline; load rejects future versions) Tests: 21 cycle_manager + 13 recall_archive + 4 commands::cycle. All 1168 workspace tests pass. Three parity gates pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:52:29 -05:00
Hunter Bown	5b7ff8cb69	feat(tui): replace literal speaker labels with calm glyphs Sub-area #1 of the v0.6.6 UI redesign (issue #121). User cells now lead with `▎` (solid bar, no animation — input is finished). Assistant cells lead with `●`. While streaming, the bullet pulses on a 2-second cycle between 30%..100% brightness via `palette::pulse_brightness`; once a turn completes the bullet sits at full DEEPSEEK_SKY so finished history reads as solid. Honors `low_motion`: the pulse is suppressed and the glyph holds full brightness regardless of streaming state. Pager / clipboard exports (`transcript_lines`) also skip the pulse so screenshots are stable. Existing pager titles in `ui.rs` (`history_cell_to_text`) keep the literal "You" / "Assistant" wording — those drive the modal title bar and read better as words than as glyphs. Tests: glyphs replace the literal labels in both User and Assistant cells; streaming pulse demonstrably dips below source brightness; idle and low_motion both pin to source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:50:24 -05:00
Hunter Bown	9358b92af5	feat(palette): color-depth fallback + brightness pulse helpers Adds the foundation for the v0.6.6 UI redesign (issue #121): - ColorDepth enum with detect() reading COLORTERM/TERM - adapt_color / adapt_bg gates for ANSI-16 terminals (drops bg tints rather than coarse-mapping them, which would distort the palette) - blend(fg, bg, alpha) for alpha compositing on RGB - reasoning_surface_tint() — 12% blend over DEEPSEEK_INK (None on ANSI-16) - pulse_brightness(color, now_ms) — 30%..100% sine swing on a 2s cycle - nearest_ansi16() helper for legacy-terminal foreground mapping Helpers carry #[allow(dead_code)] until the per-area redesign sub-tasks wire them in (speaker pulse, reasoning treatment, footer cost cluster). Tests cover envelope bounds and brand-color routing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 20:47:27 -05:00
Hunter Bown	e075ecd0fe	chore: add #[allow(dead_code)] for v0.6.6 additions not yet wired - ClientError, StreamError, and their impl blocks in error_taxonomy.rs - ApprovalCache in approval_cache.rs (pending #66 follow-up wiring) - Legacy prompt constants in prompts.rs (backward compat) - with_mcp_tools and McpToolAdapter in registry.rs (pending MCP migration) - Fix rlm_query → rlm in when_not_to_use_sections_present test	2026-04-27 19:57:53 -05:00
Hunter Bown	8be032b6bd	style: cargo fmt after parallel stream merges	2026-04-27 19:43:58 -05:00
Hunter Bown	f10d5d1829	Merge branch 'feat/tool-polish'	2026-04-27 19:41:44 -05:00
Hunter Bown	7a06915b0b	feat(tools): approval cache + error taxonomy + defer_loading + command safety trim - Add fingerprint-based ApprovalCache with call-specific keys (patch hash, shell prefix, URL host) instead of tool-name keys. Session-keyed. - Add ClientError/StreamError enums in error_taxonomy.rs with Retry-After header support. Wire ErrorEnvelope into Event::Error. - Add defer_loading() default method to ToolSpec trait. McpToolAdapter returns true for non-discovery MCP tools. - Add with_mcp_tools() on ToolRegistryBuilder for unified pipeline. - Trim DANGEROUS_PATTERNS in command_safety.rs from 25→5 entries. Only rm -rf and fork bomb remain; chaining/substitution downgraded to RequiresApproval. Matches Codex's restraint. - ApprovalRequired events now carry approval_key for UI caching. TODO_BACKEND.md §1, §5	2026-04-27 19:40:49 -05:00
Hunter Bown	a32148dac9	Merge branch 'feat/prompts-restructure'	2026-04-27 19:34:27 -05:00
Hunter Bown	a345a956aa	feat(perf): wire frame-rate limiter + adaptive chunking low-motion mode - Wire 120 FPS FrameRateLimiter into run_event_loop via time_until_next_draw + mark_emitted - Add low_motion support: 30 FPS cap via LOW_MOTION_MIN_FRAME_INTERVAL - Add AdaptiveChunkingPolicy::set_low_motion() to force Smooth mode - Add StreamingState::set_low_motion() to propagate to all block policies - Tool spinner already freezes on first frame when low_motion is set TODO_BACKEND.md §3, TODO_FIXES.md #4	2026-04-27 19:33:52 -05:00
Hunter Bown	6ef2421d61	feat(prompts): restructure into composable personality overlays Split monolithic agent.txt into: - base.md: core identity, toolbox, subagent.done protocol - personalities/calm.md + playful.md: voice overlays - modes/agent.md, plan.md, yolo.md: mode deltas - approvals/auto.md, suggest.md, never.md: approval-policy deltas - compact.md: 9-line compaction handoff template Add compose_prompt() in prompts.rs: base → personality → mode → approval. Add Personality enum with from_settings(). Preserve legacy .txt constants for backward compatibility. TODO_BACKEND.md §6	2026-04-27 19:33:52 -05:00
Hunter Bown	d4b9ccfdb3	feat(subagent): full registry inheritance + auto-approve + depth cap + cwd (#99) v0.6.6 — sub-agents inherit the parent's full tool registry, auto-approve, respect a depth cap, and propagate cancellation. Adds optional cwd to agent_spawn for parallel-worktree dispatch. Schema-ready for roles (full library lands in 0.6.7). Changes: - New ToolRegistryBuilder::with_full_agent_surface(...) shared by parent and child - SubAgentToolRegistry::new refactored to use shared builder; per-type allowlist becomes advisory - SubAgentRuntime gains auto_approve, spawn_depth, max_spawn_depth, cancel_token - Depth check at spawn entry; cancellation cascade via CancellationToken::child_token() - <deepseek:subagent.done> sentinel emitted on child completion - cwd: Option<PathBuf> on agent_spawn with workspace-boundary validation - Stream wall-clock cap bumped to 30 min (was 300s) - max_spawn_depth configurable via EngineConfig (default 3) - Version bump to 0.6.6 Closes #99.	2026-04-27 19:16:22 -05:00
Hunter Bown	2787cdc7b9	refactor(tools): drop parallel_fanout — rlm is the only RLM tool Two near-duplicate top-level tools made the surface confusing. With parallel_fanout (formerly rlm_query) removed, there's exactly one RLM shape: load a long input as `context` in a Python REPL via `rlm`, and let the sub-agent fan out from inside the REPL via `llm_query_batched` where it has `context` in scope to chunk against. For non-RLM parallel work the dispatcher already runs multiple tool calls per turn concurrently — no separate fan-out tool needed. The GenericToolCell.prompts rendering hook stays (one-row-per-child for any future fan-out tool), but no tool currently populates it. Also drops two stray test artifacts (rlm_catalog.md, rlm_test_doc.md) the model wrote to repo root during a previous live test session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:14:10 -05:00
Hunter Bown	9dd0d12cea	refactor(tools): rename rlm_process → rlm, rlm_query → parallel_fanout Two top-level tools shared the rlm_ prefix but did completely different things — rlm_query was a flat parallel-completion fan-out wearing an RLM-shaped name, and rlm_process was the actual recursive language model. The overlap was the source of the "our rlm query is completely wrong" confusion. rlm_process → rlm # single, honest name for the recursive tool rlm_query → parallel_fanout # honest name for the flat fanout Internal renames follow: Op::RlmQuery → Op::Rlm AppAction::RlmQuery → AppAction::Rlm handle_rlm_query → handle_rlm RlmProcessTool → RlmTool RlmQueryTool → ParallelFanoutTool RlmChildClient → FanoutChildClient with_rlm_process_tool → with_rlm_tool with_rlm_query_tool → with_parallel_fanout_tool The REPL helpers `rlm_query` / `rlm_query_batched` / `llm_query` / `llm_query_batched` keep their names — those are correctly named (they ARE recursive within the REPL) and the model knows them from the system prompt and metadata. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:10:17 -05:00
Hunter Bown	2865c9a766	refactor(rlm): drop HTTP sidecar — long-lived Python REPL over stdin/stdout The RLM tool used to spawn a fresh `python3 -c "..."` per round and route sub-LLM calls through a localhost axum sidecar; state persisted only via a JSON file (lossy: imports and non-JSON values were lost). The model could also short-circuit by replying with prose and the loop would ship the prose as if it came from the REPL. This commit replaces that with one long-lived `python3 -u` subprocess per turn driven by a stdin/stdout RPC protocol with UUID-prefixed sentinels. No more HTTP server, no more port allocation, no more JSON state file — variables, imports, and any other Python state persist naturally across rounds. The `RlmBridge` (`crates/tui/src/rlm/bridge.rs`) services `llm_query` / `llm_query_batched` / `rlm_query` / `rlm_query_batched` calls inline, recursing into `run_rlm_turn_inner` for sub-RLMs. The system prompt is tightened: the only legal turn shape is one ` ```repl ` block; calling `FINAL(...)` from prose without ever invoking a sub-LLM is rejected with a strict reminder. The `DirectAnswer` termination is gone, replaced by `NoCode` which only surfaces after multiple consecutive empty rounds. `rlm_process` now returns a per-round trace (code summary, sub-LLM call count, elapsed) so callers can verify the model actually engaged with `context` rather than guessing from the preview. Net: -313 lines. 17 new REPL runtime tests cover variable persistence, import persistence, RPC round-trips, FINAL capture, and error recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:44:53 -05:00
Hunter Bown	5cec1534be	feat(rlm): align with reference impl + add rlm_process tool; bump 0.6.5 The previous /rlm slash command flow had a UI rendering gap (the answer never made it back to the model's view) and required the user to invoke it manually. Pivoting to a tool-call surface and aligning the in-REPL helpers with the canonical reference (alexzhang13/rlm) by the paper authors so the same prompts and decomposition patterns transfer. New tool: rlm_process - crates/tui/src/tools/rlm_process.rs - Inputs: task (small, shown to root LLM each iter as root_prompt) + exactly one of file_path (workspace-relative, preferred) or content (inline, capped at 200k chars). Optional child_model and max_depth. - Loaded across Plan/Agent/YOLO; never deferred via ToolSearch. - Returns the final answer string + metadata (iterations, duration, tokens, termination). REPL surface aligned with reference (alexzhang13/rlm): - Variable name `context` (was PROMPT) - Code fence ```repl (was ```python; python/py kept as fallback) - Helpers: llm_query, llm_query_batched (NEW), rlm_query (was sub_rlm), rlm_query_batched (NEW), SHOW_VARS (NEW), FINAL, FINAL_VAR, repl_get/repl_set - Top-level JSON-serializable user variables auto-persist across rounds (no repl_set ceremony required) - FINAL(...) / FINAL_VAR(...) parseable from the model's raw response text (parse_text_final), in addition to the in-REPL sentinel path. Code-fenced occurrences are correctly ignored to prevent false hits. Sidecar (axum, 127.0.0.1:0): - Added POST /llm_batch and POST /rlm_batch endpoints (parallel fanout, cap 16 prompts per batch). Mirrors the reference's batched semantics. Other: - System prompt rewritten with reference's strategy patterns (PREVIEW → CHUNK+map-reduce via llm_query_batched → RECURSIVE decomposition via rlm_query → programmatic compute + LLM interp). - Strict termination loop unchanged: must emit ```repl or text-level FINAL each round; one fence-less round → reminder, two → DirectAnswer. - /rlm slash command remains for manual debug; description points the model toward rlm_process for the in-agent flow. Versions: workspace 0.6.4 → 0.6.5; npm wrapper 0.6.4 → 0.6.5. Gates green: cargo fmt, cargo clippy --all-targets --all-features --locked -D warnings, cargo test --workspace --all-features --locked (all pass), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS= -Dwarnings cargo doc --workspace --no-deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:17:09 -05:00
Hunter Bown	950a66c24a	chore(rlm): drop "Algorithm 1" from user-facing status strings Keep the paper reference in code/doc comments where it actually helps a future reader; the live status line just needs to say what's happening, not cite the citation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:51:17 -05:00
Hunter Bown	bd938a559c	fix(rlm): wire real recursive substrate; bump 0.6.4 The v0.6.3 RLM loop had Algorithm 1's outer shape but the substrate was non-functional: `llm_query()` was a Python stub that returned a hardcoded string and `child_model` was bound with an underscore prefix and silently dropped. The recursive sub-LLM call advertised by /rlm never fired. This commit wires the substrate end-to-end per Zhang/Kraska/Khattab (arXiv:2512.24601, Algorithm 1): - New axum HTTP sidecar (`rlm/sidecar.rs`) bound to 127.0.0.1:0 for the duration of one RLM turn. Python's `llm_query()` and `sub_rlm()` are real `urllib.request` POSTs; Rust services them via the existing DeepSeek client. Token usage from sidecar-served calls folds into the parent `RlmTurnResult.usage`. - `child_model` is plumbed through `Op::RlmQuery` → `AppAction::RlmQuery` → `run_rlm_turn` → sidecar handlers; default remains `deepseek-v4-flash`. - New `sub_rlm(prompt)` Python helper runs a full Algorithm-1 turn at depth-1 (paper's `sub_RLM`). Default `max_depth = 2` from `/rlm`. The recursive opaque-future cycle is broken by returning a concrete `Pin<Box<dyn Future + Send>>` from `run_rlm_turn_inner`. - Strict termination: the loop ends only via `FINAL(value)` (or the iteration cap). One fence-less round is tolerated with a reminder appended; two consecutive ones surface the model text as a `RlmTermination::DirectAnswer` exit. New `RlmTermination` enum lets callers tell `Final | DirectAnswer | Exhausted | Error` apart. - Richer `Metadata(state)`: includes paper-required access patterns (`repl_get` / slicing / `splitlines` / `repl_set` / `llm_query` / `sub_rlm` / `FINAL`) and a live list of variable keys currently in the REPL state file. - Unicode-safe `truncate_text` (was mixing bytes with chars), per-turn state-file cleanup, `ROOM_TEMPERATURE` typo → `ROOT_TEMPERATURE`. - New end-to-end test `sidecar_url_is_exported_to_python_env` stands up a stand-in axum server, runs `print(llm_query('hello'))` in the real PythonRuntime, and asserts the reply round-trips. Catches future regressions in sidecar URL passthrough. Versions: workspace 0.6.3 → 0.6.4 in Cargo.toml; npm wrapper 0.6.3 → 0.6.4 in npm/deepseek-tui/package.json. Gates: cargo fmt, cargo clippy --all-targets --all-features --locked -D warnings, cargo test --workspace --all-features --locked (1088 passed), parity_protocol/parity_state/snapshot, RUSTDOCFLAGS=-Dwarnings cargo doc --workspace --no-deps — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:36:59 -05:00
Hunter Bown	e8c3e7d1bf	docs: fix rustdoc release warnings	2026-04-26 23:43:31 -05:00
Hunter Bown	bb3c460358	chore: satisfy v0.6.3 release gates	2026-04-26 23:39:03 -05:00
Hunter Bown	42c684367f	feat(rlm): implement true RLM loop per Algorithm 1 (Zhang et al., arXiv:2512.24601) Adds the true Recursive Language Model (RLM) inference paradigm: - rlm/mod.rs — module root with public API - rlm/prompt.rs — RLM system prompt teaching the model to write code - rlm/turn.rs — Algorithm 1 implementation: - P stored as REPL variable (NEVER in LLM context window) - Metadata-only context sent to root LLM (constant-size) - LLM generates Python code, not free text - Code executed in PythonRuntime with llm_query() for recursion - FINAL() detection ends the loop - Op::RlmQuery variant in ops.rs - /rlm command in the command system - AppAction::RlmQuery handler in ui.rs - PythonRuntime::with_state_path made public for RLM integration - 18 new unit tests for code extraction, metadata building, truncation Key differences from previous 'RLM-inspired' approach: ✅ P is external (REPL variable), not in LLM context ✅ Only metadata(state) in LLM context (constant-size) ✅ LLM generates code, not free text + tool calls ✅ sub-LLM recursion via llm_query() inside REPL code ✅ FINAL() mechanism for programmatic termination	2026-04-26 23:34:17 -05:00
Hunter Bown	ac8a882be5	chore: clean v0.6.3 repl build warnings	2026-04-26 23:12:57 -05:00
Hunter Bown	4e46fd06f6	feat(repl): wire PythonRuntime into engine turn loop (Phase 2) After the assistant message is persisted, when tool_uses is empty, check for inline ```repl blocks and execute them via PythonRuntime: - Extract REPL blocks from assistant text - Spawn PythonRuntime and execute each block sequentially - If a round returns FINAL: replace the assistant message text with the final value and break the turn - If no FINAL: append truncated stdout/stderr as user feedback and continue the turn loop for iterative refinement - Emit status events so the user sees 'REPL round N: ...' in the UI All 26 REPL tests + RLM tests pass. Release build verified. Refs: paper-spec RLM (Zhang et al., arXiv:2512.24601) §2	2026-04-26 18:54:46 -05:00
Hunter Bown	2fcc637d4f	Merge pull request #118 from Hmbown/fix/issue-115-context-percent fix(tui): context-usage % no longer drops after multi-round turns (#115)	2026-04-26 18:01:14 -05:00
Hunter Bown	b7c5cb4112	test(ui): update auto-compact test for #115 estimate-first behavior	2026-04-26 17:55:56 -05:00
Hunter Bown	60f5f39584	Merge pull request #109 from Hmbown/feat/issue-85-pending-input-preview feat(tui): pending-input preview widget (#85 Phase 1)	2026-04-26 17:54:35 -05:00
Hunter Bown	6fc4680b91	Merge pull request #117 from Hmbown/feat/issue-92-clipboard-image feat(tui): clipboard image paste (#92)	2026-04-26 17:52:56 -05:00
Hunter Bown	28e8d70ffc	chore(release): bump workspace + npm wrapper to 0.6.3	2026-04-26 17:52:29 -05:00
Hunter Bown	8d33b92f82	style(tests): rustfmt collapses short let-binding pairs	2026-04-26 17:51:23 -05:00
Hunter Bown	e31d6db3ee	style(pending_input_preview): single-line fmt for short literal arrays	2026-04-26 17:50:53 -05:00
Hunter Bown	2378fbc26f	feat(tui): pending-input preview widget (#85 Phase 1) Port of codex-rs's `bottom_pane/pending_input_preview.rs` for the queued / pending steer / rejected steer surface. Phase 1 ships the widget + 7 unit tests in isolation so reviewers can evaluate the rendering decisions without also reviewing the composer-area integration. Phase 2 wires it into `ui.rs` and threads the `pending_steers` / `rejected_steers` fields onto `App`. The widget renders three semantic buckets when any are non-empty: • Messages to be submitted after next tool call (press Esc to send now) ↳ <pending steer> ↳ <pending steer> • Messages to be submitted at end of turn ↳ <rejected steer> • Queued follow-up inputs ↳ <queued message> Alt+↑ edit last queued message Items truncate to 3 visible rows with a `…` overflow indicator. Long URL-like tokens emit on their own row instead of fanning out into junk ellipsis rows (regression test included). Empty state renders zero rows so the composer doesn't gain wasted height when nothing is queued. Refs #85. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:50:16 -05:00
Hunter Bown	32a977e58e	Merge pull request #116 from Hmbown/feat/issue-97-fuzzy-picker feat(tui): Ctrl+P fuzzy file picker (#97)	2026-04-26 17:49:41 -05:00
Hunter Bown	6ba43e749a	Merge pull request #114 from Hmbown/fix/issue-103-stream-retry fix(engine): transparent retry on stream death with no content (#103 Phase 3)	2026-04-26 17:49:38 -05:00
Hunter Bown	904ec869a6	Merge pull request #112 from Hmbown/feat/issue-96-pager-search feat(pager): match highlighting + counter no-clip + Esc clears (#96)	2026-04-26 17:49:33 -05:00
Hunter Bown	a326ef2891	fix(tui): context-usage % no longer drops after multi-round turns (#115) User reported: "the context % at the top is pretty inconsistent — like I just had a message where it was 31% then I sent another message and it went to 9%? not sure how that works......" Root cause: `context_usage_snapshot` preferred `app.last_prompt_tokens` (reported, from `Event::TurnComplete.usage`) over the estimate computed from `app.api_messages`. The engine populates that usage via `turn.add_usage`, which SUMS `input_tokens` across every round in a turn: ``` pub fn add_usage(&mut self, usage: &Usage) { self.usage.input_tokens += usage.input_tokens; ... } ``` So a multi-round tool-call turn reports a value much larger than the actual context window state (e.g., 200k from round 1 + 210k from round 2 = 410k displayed as 31% of 1M), then the next single-round turn drops back to a single round's input_tokens (e.g., 90k displayed as 9%). Fix: prefer the estimate, which is computed from the current `api_messages` and is monotonic wrt conversation growth. Reported tokens fall back only when no estimate is available (e.g., immediately after a session restore). Also clamp `used` to the model's context window so the ratio never exceeds 100%. `is_reported_context_inflated` is no longer in the primary path; kept behind `#[allow(dead_code)]` because existing tests still exercise it and a future heuristic may want to distinguish "obviously inflated reported tokens" from healthy reports. Regression test `context_usage_does_not_drop_when_reported_shrinks_after_multi_round_turn` exercises the exact 31% → 9% scenario the user hit. Fixes #115. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:48:57 -05:00
Hunter Bown	5b0af72c3d	feat(tui): clipboard image paste (#92) Save clipboard images as PNG under ~/.deepseek/clipboard-images/ instead of PPM in the workspace, and surface dimensions + size in the composer's [Attached image: WxH PNG (NkB) at <path>] token plus the post-paste status hint. DeepSeek V4 does not currently accept inline image input on its Chat Completions endpoint, so we materialize the bytes to disk and let the model reach them via the existing file tools rather than base64-embedding them in the request. Adds the `image` crate (PNG-only feature; already pulled in transitively via arboard, so no compile-time delta) plus unit tests covering PNG header round-trip and label formatting. Fixes #92 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:47:34 -05:00
Hunter Bown	07dc8f2037	feat(tui): Ctrl+P fuzzy file picker (#97) Adds a modal overlay (`FilePickerView`) bound to Ctrl+P from the composer when no other modal is open and the engine is not streaming. * Single-pass `WalkBuilder` walk at construction (depth 6, hidden=true, follow_links=false, .gitignore honored) caches workspace-relative paths so per-keystroke filtering is fully in-memory. * Custom subsequence scorer with start/boundary bonuses, consecutive-run reward, and gap penalty. ~70 lines, no new crate dependency. * Up/Down + PgUp/PgDn navigate; Backspace and Ctrl+U edit the query; Enter emits `ViewEvent::FilePickerSelected` which the UI handler inserts at the composer cursor as `@<path>` (with surrounding spaces so the existing `@`-mention parser picks it up); Esc closes without modifying the composer. * Ten unit tests cover the scorer (subsequence / boundary / case / empty-query edge cases) and the view (typing narrows, backspace widens, Enter emits, Esc closes, `.ignore` is honored). Fixes #97 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:45:14 -05:00
Hunter Bown	6a13ce8c29	style(pager): single-line fmt for highlight Style chain	2026-04-26 17:40:42 -05:00
Hunter Bown	e5954160fb	Merge pull request #111 from Hmbown/feat/issue-89-key-hint-rendering feat(tui): terminal-aware keybinding rendering (#89)	2026-04-26 17:39:57 -05:00
Hunter Bown	54aaba1876	Merge pull request #110 from Hmbown/feat/issue-90-quit-confirmation feat(tui): two-tap Ctrl+C quit confirmation with 2s countdown	2026-04-26 17:39:54 -05:00
Hunter Bown	633379be0e	Merge pull request #108 from Hmbown/feat/issue-91-external-editor feat(tui): external editor support (Ctrl+E)	2026-04-26 17:39:52 -05:00
Hunter Bown	9dd4f65af2	Merge pull request #107 from Hmbown/feat/issue-88-footer-collapse feat(tui): priority-ordered footer hint dropping for narrow terminals	2026-04-26 17:39:49 -05:00
Hunter Bown	44f3b2cae5	fix(engine): transparent retry on stream death with no content (#103 Phase 3) When the chunked-transfer connection to DeepSeek dies mid-stream — the "Stream read error: error decoding response body" symptom — the engine previously surfaced the error to the user and ended the turn as Failed, even when no useful content had been received. The user's only recourse was to manually re-send the same message. Phase 3 closes that loop. After the inner stream-consumption loop ends, detect "stream died with nothing actionable": - stream_errors > 0 (the stream errored at some point) - tool_uses.is_empty() (no tool call landed) - current_text_visible is empty/whitespace - current_thinking is empty/whitespace - !pending_message_complete If all hold AND stream_retry_attempts < MAX_STREAM_RETRIES (3), silently re-issue the SAME outer-loop iteration: rebuilds the request from self.session.messages, calls create_message_stream again, and starts a fresh inner loop. Surface a "Connection interrupted; retrying (N/3)" status to the user so they know something's happening, but don't trip the engine-level Error event so we don't double-display the failure as a History cell. Healthy rounds (stream_errors == 0) reset the retry budget so a single proxy hiccup doesn't poison subsequent rounds in the same turn. Crucially: if we got partial output (any tool call, any visible text, or any thinking), we DON'T retry. Re-running the request would double-bill the user; ship the partial state to the rest of the turn pipeline (existing tool execution, content_blocks finalization) and let the agent loop continue. Combined with #103 Phase 1+2 (TCP/HTTP2 keepalives + diagnostic logging in client.rs), this should turn the user-visible "Turn failed: Stream read error" into either a fully-recovered turn OR a clearly-labeled 3-attempts-exhausted failure. Refs #103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:37:48 -05:00
Hunter Bown	30ae78ee19	feat(pager): match highlighting + status counter no-clip + Esc clears matches (#96) Polish pass on the existing pager search loop. The infrastructure was already there (`/` opens search, type query, Enter commits, `n`/`N` cycle, `match X/Y (n/N)` status row gets pushed) but had three rough edges that made it less than the codex pager-overlay parity #96 asks for: 1. Status row clipped on small popup heights. `visible_height` was `popup_area.height - 2` (borders only). With `Padding::uniform(1)` on the block we actually have 4 rows of overhead, not 2 — so the status row got pushed past the viewport on shorter pagers and the user never saw match-count feedback. Subtract 4, then reserve another row for the status when matches exist. 2. Matched lines weren't visually distinguished. Searching jumped the scroll to the match but the line itself rendered the same as surrounding rows. Now the current match row gets a Yellow/Black bold background; other matches get a DarkGray/Yellow background. Per-substring highlighting (preserving the original spans' styling) is deferred — the all-row highlight is enough to navigate and avoids the substring-styling-vs-pre-styled-spans interaction that needs its own design pass. 3. Esc in the search prompt left stale matches behind. Pressing `/` then Esc to bail now ALSO clears `search_input` / `search_matches` / `search_index`, returning the pager to a clean un-highlighted view. Codex parity. To resume from where the user left off they re-`/` and re-type. 4 new tests (`search_finds_matches_and_renders_match_counter`, `esc_in_search_mode_clears_matches`, `n_and_capital_n_cycle_matches_with_wrap`, `matched_lines_get_highlight_background`). 22/22 pager tests pass. Fixes #96. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 17:35:05 -05:00
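
The row-budget arithmetic described in commit 30ae78ee19 (borders plus `Padding::uniform(1)` cost 4 rows, and the status row costs one more when matches exist) can be sketched as below. This is only an illustration under stated assumptions: the function name `visible_content_rows`, its parameters, and the constant `CHROME_ROWS` are hypothetical and are not taken from the repository.

```rust
/// Rows of overhead for a bordered popup with one row of padding on each
/// side: 2 border rows + 2 padding rows (hypothetical constant name).
const CHROME_ROWS: u16 = 4;

/// Hypothetical helper: how many content rows fit inside the pager popup.
/// `popup_height` is the full popup height in terminal rows; `has_matches`
/// says whether a "match X/Y" status row must also be shown.
fn visible_content_rows(popup_height: u16, has_matches: bool) -> u16 {
    // Reserve one extra row for the status line only when it is rendered.
    let status_rows: u16 = if has_matches { 1 } else { 0 };
    // saturating_sub avoids underflow on very short popups.
    popup_height.saturating_sub(CHROME_ROWS + status_rows)
}
```

With a 10-row popup this yields 6 content rows without matches and 5 with matches, whereas the earlier `height - 2` calculation would have assumed 8 and pushed the status row out of view.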

324 Commits
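
The "stream died with nothing actionable" check listed in commit 44f3b2cae5 can be read as a single predicate over the round's state. The sketch below is a minimal illustration, not the engine's actual code: the `RoundState` struct, its field types, and the function `should_retry_silently` are assumptions made for the example. Only the condition list and the `MAX_STREAM_RETRIES` limit of 3 come from the commit message.

```rust
/// Retry budget named in the commit message.
const MAX_STREAM_RETRIES: u32 = 3;

/// Hypothetical snapshot of what one streaming round produced.
struct RoundState {
    stream_errors: u32,
    tool_use_count: usize,
    current_text_visible: String,
    current_thinking: String,
    pending_message_complete: bool,
}

/// Returns true only when the stream errored, nothing usable arrived,
/// and the retry budget is not yet exhausted.
fn should_retry_silently(state: &RoundState, stream_retry_attempts: u32) -> bool {
    state.stream_errors > 0
        && state.tool_use_count == 0
        && state.current_text_visible.trim().is_empty()
        && state.current_thinking.trim().is_empty()
        && !state.pending_message_complete
        && stream_retry_attempts < MAX_STREAM_RETRIES
}
```

Any partial output (a tool call, visible text, or thinking) makes the predicate false, which matches the stated goal of not re-running a request the user may already have been billed for.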