dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	86f59cd2c2	merge: slash-menu scroll viewport fix (fixes #64)	2026-04-26 13:37:30 -05:00
Hunter Bown	320325e419	fix(tui): bump SLASH_MENU_LIMIT to 128 so the scroll viewport works The composer's render path already paginates with center-tracking, but the source list was hard-capped at 6 entries — so pressing Down arrow past index 5 had no entries to land on. Repro: with ~37 slash commands, hitting Down repeatedly stuck at the last visible row. Bumping the source cap to 128 lets the existing viewport scroll logic exercise the full filtered command list. No render-path change needed. Fixes #64	2026-04-26 13:37:29 -05:00
Hunter Bown	38069700cc	chore: wip capacity canonical state + tool alias deprecation	2026-04-26 13:11:57 -05:00
Hunter Bown	2adbe398ba	merge: tool alias deprecation metadata (fixes #72)	2026-04-26 12:55:17 -05:00
Hunter Bown	4f18809d74	merge: V4 capacity priors (fixes #73)	2026-04-26 12:53:31 -05:00
Hunter Bown	c58d10ded1	feat(tools): mark alias tools with deprecation metadata Add `wrap_with_deprecation_notice` helper in the subagent module that merges a `_deprecation` block into a ToolResult's metadata. Applied exclusively on alias invocations: - `spawn_agent` → use `agent_spawn` (removed in v0.8.0) - `delegate_to_agent` → use `agent_spawn` (removed in v0.8.0) - `close_agent` → use `agent_cancel` (removed in v0.8.0) - `send_input` → use `agent_send_input` (removed in v0.8.0) Canonical names are unaffected. Each alias invocation also emits a `tracing::warn` so the deprecation appears in audit logs. Documents the deprecation schedule in `docs/TOOL_SURFACE.md`. Four unit tests verify the notice shape and that canonical tools stay clean. Refs #72 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:32:26 -05:00
Hunter Bown	cf9fdef9d8	fix(capacity): add V4 model priors and key normalization Add deepseek_v4_pro (3.5) and deepseek_v4_flash (4.2) priors to CapacityControllerConfig::default() so V4 models are no longer silently mapped to the generic 3.8 fallback. Extend normalize_model_prior_key to match v4-pro, v4_pro, v4-flash, v4_flash, and deepseek-ai/-prefixed NIM identifiers before the V3/reasoner branches to prevent cross-matches. V3 and reasoner fallbacks are unchanged. Add deepseek_v4_pro_prior / deepseek_v4_flash_prior fields to CapacityConfig (config.toml) and DEEPSEEK_CAPACITY_PRIOR_V4_PRO / DEEPSEEK_CAPACITY_PRIOR_V4_FLASH env-var overrides, matching the existing V3 pattern. Refs #73 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:28:21 -05:00
Hunter Bown	e9970fcad3	ci: switch npm publish to NPM_TOKEN + add auto-tag workflow The OIDC Trusted Publisher path for npm has 404'd on PUT for v0.5.1, v0.5.2, and v0.6.1, even with valid OIDC tokens. Switch publish-npm and publish-npm-manual to a classic NPM_TOKEN automation token (set the NPM_TOKEN repo secret to a granular access token scoped to deepseek-tui with publish permission) so future releases ship reliably. Also add .github/workflows/auto-tag.yml: when the workspace version on main changes, push the matching v$VERSION tag automatically so release.yml fires without a manual tag push. Requires a RELEASE_TAG_PAT secret to trigger downstream workflows (GITHUB_TOKEN tag pushes don't trigger on: push: tags by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:22:15 -05:00
Hunter Bown	e1ac84ae44	release: v0.6.1 — pricing update, remove light theme + theme setting - V4 cache-hit input prices cut to 1/10th per DeepSeek pricing update: Pro promo 0.03625→0.003625, Pro base 0.145→0.0145, Flash 0.028→0.0028 - Remove the 'light' theme variant (Variant::Light, Theme::light(), test) - Remove the theme setting entirely — hardcode UI_THEME to whale/dark, drop the theme field from Settings, ConfigView, and config command - Bump workspace version 0.6.0 → 0.6.1 (Cargo.toml, npm pkg, CHANGELOG) - De-cringe the README: drop emojis, marketing fluff, unverified claims	2026-04-26 11:56:41 -05:00
Hunter Bown	c5a584d5c3	refactor(client): extract chat + responses into folder module (P1.1) Split client.rs into client/mod.rs (public API + helpers), client/chat.rs (chat-completions streaming), and client/responses.rs (responses API helpers). Internal helpers promoted to pub(super) for intra-module visibility; the public DeepSeekClient API is unchanged. While here, redesign all five system prompts around decomposition-first philosophy inspired by the mismanaged-geniuses hypothesis (Zhang et al., 2026). The model is now instructed to todo_write / update_plan before acting, fan out sub-agents for parallel work, and keep the sidebar populated so the user always sees what's happening. Mode prompts updated: - agent.txt: 'Before requesting approval, lay out work with todo_write' - plan.txt: 'Use update_plan for strategy, todo_write for tactics' - yolo.txt: 'Even with auto-approval, create a todo_write first' - normal.txt: same pattern for legacy compatibility Update CHANGELOG [Unreleased] and README modes section accordingly.	2026-04-26 11:39:44 -05:00
Hunter Bown	1a100fe96c	refactor(core): carve approval + dispatch helpers out of engine.rs (P1.3) Splits `core/engine.rs` (4670 → 4314 lines) into a small folder module: - `engine/approval.rs` (~125 lines) — `ApprovalDecision`, `UserInputDecision`, `ApprovalResult`, plus the two handshake methods `Engine::await_tool_approval` and `Engine::await_user_input`. - `engine/dispatch.rs` (~300 lines) — tool-input parsing (`final_tool_input`, `parse_tool_input`, fenced/JSON segment helpers), `multi_tool_use.parallel` payload parser, dispatch policy predicates (`should_parallelize_tool_batch`, `should_force_update_plan_first`, `should_stop_after_plan_tool`, the read-only MCP tool helpers), and the `ToolExecutionPlan`/`ToolExecOutcome`/`ParallelToolResult`/`ToolExecGuard` types the batch driver passes around. The public engine surface (`EngineConfig`, `EngineHandle`, `spawn_engine`, `MockEngineHandle`, `mock_engine_handle`, `compact_tool_result_for_context`, `TOOL_CALL__MARKERS`, `FAKE_WRAPPER_NOTICE`) stays in `engine.rs` — every external user imports unchanged. Not split this round: the 1268-line `handle_deepseek_turn` method. Carving its inline parallel/sequential dispatch and approval handshake arms requires extracting two new methods from a borrow-heavy turn loop; flagged in the v0.6.0 audit doc as future work. Workspace tests: 1011/1011 still green. No clippy regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:57:27 -05:00
Hunter Bown	25cfe11736	refactor(tui): extract slash-menu helpers into tui/slash_menu.rs (P1.2) Lifts `visible_slash_menu_entries`, `apply_slash_menu_selection`, and `try_autocomplete_slash_command` from `tui/ui.rs` into a sibling module. Drops the now-unused `slash_completion_hints` import from `ui.rs` (the new module imports it directly). Kept separate from `tui::file_mention` per the audit doc — the two popups have distinct trigger characters, ranking, and post-selection behaviour even though they share UI scaffolding. `ui.rs`: ~5070 → ~4990 lines. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:47:44 -05:00
Hunter Bown	56308bb5d7	refactor(tui): extract paste-burst handlers into tui/paste.rs (P1.2) Lifts `handle_paste_burst_key`, `handle_paste_burst_decision`, `apply_paste_burst_retro_capture`, and the local `in_command_context` helper out of `tui/ui.rs` into a sibling module. The state machine (`PasteBurst`) and its tests stay in `paste_burst.rs`; only the keymap-side wiring moves. Drops the now-unused `CharDecision` import from `ui.rs`. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:45:35 -05:00
Hunter Bown	4138053dd8	refactor(tui): extract sidebar rendering into tui/sidebar.rs (P1.2) Moves the four sidebar panels (Plan, Todos, Tasks, Agents) plus the shared `render_sidebar_section` wrapper out of `tui/ui.rs` into a new sibling module. `truncate_line_to_width` becomes `pub(crate)` so the new module can reuse it. Drops six imports from `ui.rs` that the sidebar took with it. `ui.rs`: 5450 → ~5070 lines. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:43:43 -05:00
Hunter Bown	5d51143194	feat(tui): group thinking + tool calls in one active cell (P2.3) Routes Thinking content through `active_cell` so a turn that emits Thinking → Tool → Tool renders as one logical "Working…" block until the next assistant prose chunk flushes the group into history. - `ActiveCell::push_thinking` parallels `push_tool` for non-tool entries. - `mark_in_progress_as_interrupted` now also stops streaming Thinking spinners on cancellation, matching tool cell behaviour. - New `streaming_thinking_active_entry` field on `App` tracks the in-flight thinking entry index so deltas can mutate it in place. - `flush_active_cell` finalizes any unclosed thinking spinner before draining the group into history (defensive guard). - Removed the dead `StreamingCellKind::Thinking` variant and tightened `append_streaming_text` to Assistant only. Tests cover: push_thinking, group ordering, drain order, interrupt-clears-spinner, the full Thinking → Tool → Tool → flush flow, defensive flush of an unclosed thinking block, and a second thinking block appending inside the same active cell. Workspace tests: 1004/1004 → 1011/1011. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:40:05 -05:00
Hunter Bown	ca7ca9f75f	docs: drop stale handoff/migration/parity docs Removed: - `.claude/next-agent-prompt.md` (111 lines) — v0.4.6-era session prompt describing slices A/B/C that have all shipped. Successive sessions use fresh prompts (e.g. .deepseek/v0.6.0-overnight-review.md); this one is pure history. - `docs/archive/workspace_migration_status.md` (92 lines) — explicitly archived (April 11), describes a one-time migration that's complete. Removed enclosing `docs/archive/` directory too (was the only file). CHANGELOG entry from v0.4.x still narrates the archival as history. - `docs/parity_release_and_ci.md` (38 lines) — duplicates what `.github/workflows/parity.yml` and CONTRIBUTING.md already say authoritatively. Single source of truth wins. - `AI_HANDOFF.md` + `todo.md` (untracked, no commit needed) — `todo.md` was a 7-line pointer to AI_HANDOFF.md, which itself was an April 11 snapshot listing "remaining work" that's mostly delivered. CLAUDE.md is the live developer guide now. 1004/1004 tests still green; no doc/code references broken. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:27:58 -05:00
Hunter Bown	d6bfcda474	chore: drop unreferenced assets/hero.png Not referenced from README.md, docs/, npm/, or any Cargo metadata. README uses assets/screenshot.png. Reduces repo size by 226 KB. Also cleaned up working-directory cruft (untracked, no commit needed): apps/ (empty), python/ (empty after egg-info removed), counterpoint.copilot.db, firebase-debug.log, excalidraw.log, .DS_Store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:23:08 -05:00
Hunter Bown	f3df5e515e	docs(changelog): roll up Phase 2/4 polish — agents chip, mention popup, P2.4 tests, subagent split, parse-counter de-flake Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:16:30 -05:00
Hunter Bown	a4f4f5040f	style: cargo fmt --all (post-Phase-2/4 cleanup) Auto-format pass after the tool-call rendering work, footer chip, mention popup, subagent split, and parse-counter de-flake. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:15:11 -05:00
Hunter Bown	a02898545d	refactor(tools): split subagent.rs into folder module — start with tests (P1.1) Promote `tools/subagent.rs` (4206 lines) to a folder module: tools/subagent/ mod.rs — runtime types, manager, tool implementations (~3577 lines) tests.rs — extracted test module (~631 lines) This is the safe first step. The audit doc proposed a 4-way split (mod / spec / executor / tests). I tried the 3-way (mod / tools / tests) and the runtime <-> tool-impl coupling produces unresolved-symbol errors because shared helpers (`SubAgentTask`, `run_subagent_task`, `build_allowed_tools`, `normalize_role_alias`, `parse_spawn_request`, the agent prompt constants) are referenced from both layers. Doing that split right needs a small API design pass to decide which helpers graduate to the manager API and which stay tool-private — out of scope for a structural reorg. Pulled the test module out as the cleanest no-API-change win and left a path open for the bigger split later. Public API unchanged — `pub mod subagent;` still exports the same items because `mod.rs` is a drop-in replacement for `subagent.rs`. 954 → 954 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:13:35 -05:00
Hunter Bown	2185b8c3c6	feat(tui): wire up @-mention popup end-to-end (P2.1) The audit doc claimed the wiring was "in place" but only the App state fields existed (`mention_menu_selected`, `mention_menu_hidden`) — no helpers, no widget rendering, no key handling. Building it out fully so the popup actually shows when the user types `@` in the composer and Up/Down/Enter/Tab/Esc behave the way the slash menu does. What's new: 1. `file_mention::visible_mention_menu_entries(app, limit)` — the entries source. Returns `Vec<String>` from the workspace walk, gated on the `mention_menu_hidden` flag and on the cursor being inside an `@token`. 2. `file_mention::apply_mention_menu_selection(app, entries)` — splices the selected entry into the input via the existing `replace_file_mention`, resets `mention_menu_hidden`, surfaces a status confirmation. 3. `ComposerWidget::new(app, max_height, slash_entries, mention_entries)` — second menu slot. The widget renders whichever slice is non-empty, addressed by the matching selected index. Mention entries get an `@` prefix so the popup row reads like the actual mention being composed. Mention takes precedence (positional check is stricter than slash's "starts-with-/"). 4. ui.rs key handler: - Up/Down navigate `mention_menu_selected` when the popup is open. - Enter applies `apply_mention_menu_selection` instead of submitting. - Tab applies the selection (then falls through to the existing slash / command-completion / file-mention chain). - Esc hides the popup until the next input edit (`insert_str` already resets `mention_menu_hidden`, so typing re-opens it). 
6 new tests in `ui/tests.rs`: - mention_popup_is_empty_when_cursor_is_not_in_a_mention - mention_popup_lists_workspace_matches_for_cursor_partial - mention_popup_respects_hidden_flag - apply_mention_menu_selection_splices_selected_entry - apply_mention_menu_selection_is_noop_outside_a_mention - apply_mention_menu_selection_with_no_entries_is_noop Also fixes a stray duplicate `#[cfg(...)]` and an unused-doc-comment warning that landed when the parse-counter went thread-local — back to baseline 7 clippy warnings. 948 → 954 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:09:04 -05:00
Hunter Bown	06355e3aea	test(tui): pin auto-scroll churn contract for P2.4 regression coverage Audit pass found the auto-scroll paths are already gated correctly: - `mark_history_updated` only bumps history_version + needs_redraw — does NOT scroll. - All tool-cell handlers (`handle_tool_call_started`, `handle_tool_call_complete`, `push_active_tool_cell`, `register_tool_cell`) call `mark_history_updated` only — none of them call `scroll_to_bottom`. - `add_message` and `flush_active_cell` gate their auto-scroll on `user_scrolled_during_stream`. - The per-stream lock clears at TurnComplete (ui.rs ~557) and when the user scrolls back to the live tail (widgets/mod.rs ~126). - Explicit user actions (vim G, End, session resume, message submit) call `scroll_to_bottom` directly — that's correct. 5 new regression tests in ui/tests.rs lock the contract so a future contributor adding `app.scroll_to_bottom()` to a tool-cell handler hits a red CI immediately: - add_message_does_not_scroll_when_user_scrolled_away - add_message_pins_to_tail_when_user_was_following - tool_call_started_does_not_scroll_when_user_scrolled_away - tool_call_complete_does_not_scroll_when_user_scrolled_away - mark_history_updated_does_not_call_scroll_to_bottom 948 → 948 (no changes; tests were already passing — they just weren't written yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:43 -05:00
Hunter Bown	75de26c7a1	test(tui): de-flake parse-invocation counter via thread-local `parse_invocations_increment` and `render_parsed_does_not_call_parse` both read the global PARSE_INVOCATIONS atomic. They were racing whenever any other test in the suite called `parse()` in parallel — the global counter would tick once for each unrelated call and the assertion (== 2 / == 0) would mismatch. Switching to `thread_local!<Cell<u64>>` gives each test thread its own counter, so concurrent callers from other tests can't pollute the result. Tested across 8 sequential full-suite runs: 8/8 green (was ~40% green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:32 -05:00
Hunter Bown	9467d26db7	feat(tui): surface in-flight sub-agents in the footer status strip (P2.5) FooterProps gains an `agents` chip slot, populated by `footer_agents_chip` which mirrors the rest of the footer chips: empty `Vec<Span>` when `running_agent_count == 0` (chip hides), "1 agent" / "N agents" otherwise, DeepSeek-sky color matching the model badge. The widget's `auxiliary_spans` includes it in the same drop-from-end fit-to-width chain as the existing chips, so on narrow terminals the cost chip drops first as before. The "0 running" wording the audit doc called out wasn't actually in FooterProps — that wording is in the agent sidebar (ui.rs ~2960) and was already fixed there to swap to "N done" once nothing is in flight. So the P2.5 work here is the additive footer surface, not a wording fix. 4 new tests in widgets/footer.rs: - footer_agents_chip_is_empty_when_no_agents_running - footer_agents_chip_uses_singular_for_one - footer_agents_chip_uses_plural_for_many - footer_agents_chip_renders_into_widget 939 → 943 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:54:03 -05:00
Hunter Bown	93efb09038	fix(tui): tool-call rendering — defer ToolCallStarted, progressive labels, elapsed badge The engine used to fire `Event::ToolCallStarted` from `ContentBlockStart::ToolUse` with `input: json!({})` — before any `Delta::InputJsonDelta` had streamed in. The UI's `handle_tool_call_started` baked the placeholder into the cell at creation time and never refreshed, so users saw `<command>` and `<file>` literals while the args finished streaming. Fix relocates the emission to `ContentBlockStop` (where the input is finalized already) and routes it through a new `final_tool_input(state)` helper that prefers the parsed buffer over a stale empty initial input. Three regression tests in `engine/tests.rs` pin the contract. Also bundled (same theme — make in-flight tool cells read right): - Progressive labels via `exploring_label`: "Read foo.rs" → "Reading foo.rs", "List X" → "Listing X", "Search pattern" → "Searching for `pattern`", "List files" → "Listing files". 5 tests in `ui/tests.rs`. - `running_status_label_with_elapsed` in `history.rs`: from 3 s onward the status segment becomes `running (Ns)` and ticks every second, driven by the existing CX#3 status-animation tick. Below 3 s no badge — quick reads/greps stay quiet. Wired through `render_tool_header`. 2 tests. - Spinner cadence sped up: `TOOL_STATUS_SYMBOL_MS` 1800 → 720 ms per glyph, so the 4-glyph "heartbeat" is ~2.88 s instead of ~7.2 s. 929 → 939 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:50:32 -05:00
Hunter Bown	42fe888d35	Merge CX#7: one active cell mutated in place Replaces "tool start pushes new cell" with a single ActiveCell that collects parallel/serial tool entries at the transcript tail and flushes as a contiguous block on first assistant text or turn complete. Stops the bounce when many tools fire concurrently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:14:07 -05:00
Hunter Bown	63d7391ff8	CX#7: one active cell, mutated in place Codex pattern — instead of appending a new ToolCall history cell for each parallel tool invocation, keep one Exploring/Searching/Reading active cell at the tail of the transcript and mutate its contents in place as new tool calls fire. Drops cell churn and keeps the visual anchor stable while multiple tools stream concurrently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:13:57 -05:00
Hunter Bown	585dd2f7d0	CX#8: two surfaces — display_lines vs transcript_lines - HistoryCell::Thinking — live shows first ~4 lines + Ctrl+O affordance; transcript_lines() returns full content with all paragraphs. - ExecCell — live caps with head/tail + omission marker; transcript emits all wrapped lines without truncation. - Tool/Patch/Mcp/Review cells — live caps + affordance; transcript uncapped. - User/Assistant/System/Plan/Diff/etc — display == transcript. - Pager (Ctrl+O / Ctrl+T) flows through transcript_lines via history_cell_to_text — opening the pager on a thinking or capped tool cell shows the full body. Updated affordance assertion to match the post-CX#9 wording (press Ctrl+O for full text). 911/911 tests pass; clippy -D warnings clean; fmt clean.	2026-04-25 22:44:42 -05:00
Hunter Bown	8f05f272d3	CX#5 + CX#11: line-buffer newline gate + pure-render footer CX#5 (newline-boundary streaming gate): - New crates/tui/src/tui/streaming/line_buffer.rs — LineBuffer holds text after the last \n until the next \n arrives, so partial code fences never become visible state. - Wired into BlockState in streaming/mod.rs. Assistant text gates; thinking deltas bypass (reasoning stays live). - 9 unit tests including the partial-fence regression case. CX#11 (pure-render footer): - New crates/tui/src/tui/widgets/footer.rs — FooterProps / FooterToast / FooterWidget. Pure render of pre-computed props. - ui.rs::render_footer rewritten to build props once and delegate to FooterWidget. Visual output identical; existing 10 footer tests pass unchanged. 5 new from_app tests for the props builder. 908/908 tests pass; cargo clippy --workspace -D warnings clean; cargo fmt clean.	2026-04-25 22:43:03 -05:00
Hunter Bown	1ad0c886b8	CX#4: two-gear streaming chunker (Smooth ↔ CatchUp) Splits crates/tui/src/tui/streaming.rs into a streaming/ module: - streaming/mod.rs — StreamingState with per-block BlockState - streaming/chunking.rs — policy state machine, 7 tests - streaming/commit_tick.rs — StreamChunker queue + run_commit_tick Thresholds match codex parity: ENTER_QUEUE_DEPTH=8, ENTER_OLDEST_AGE=120ms, EXIT_QUEUE_DEPTH=2, EXIT_OLDEST_AGE=40ms, EXIT_HOLD=250ms. 894/894 tests pass; clippy -D warnings clean; fmt clean.	2026-04-25 22:28:00 -05:00
Hunter Bown	d111680a3b	v0.6: codex-pattern wave 1 — frame rate, scroll model, revision cache, vim pager, cap output, kill-buffer Merges in 6 of 13 codex-pattern items as a wave-1 checkpoint: - CX#3 frame-rate limiter (~120 FPS coalesce) - CX#9 LIVE_TOOL_OUTPUT_MAX_LINES + Ctrl+O affordance - CX#13 kill-buffer Ctrl+K / Ctrl+Y composer - CX#12 vim pager keys (j/k/g/G/Ctrl+D/Ctrl+U/Ctrl+F/Ctrl+B/Space) - CX#2 flat line-offset scroll model (replaces TranscriptScroll cell-anchor enum) - CX#10 per-cell revision counter for partial cache rebuilds 883/883 tests pass (was 853). cargo clippy --workspace -D warnings clean. cargo fmt clean.	2026-04-25 22:25:30 -05:00
Hunter Bown	5f223adea6	v0.6.0: native rlm_query tool + scroll fix + cleanup Adds a structured rlm_query tool for parallel/batched LLM fan-out. The model calls it with one prompt or up to 16 concurrent prompts; children dispatch via tokio::join_all against the existing DeepSeek client. Default child model is deepseek-v4-flash; override per-call via the model field. Available in Plan / Agent / YOLO. Cost folds into the session's running total automatically. Fixes scroll-stuck regression (#56): TranscriptScroll::resolve_top and scrolled_by now use a three-level fallback chain (same line → same cell line 0 → nearest cell at-or-before) instead of teleporting to ToBottom when an anchor cell vanishes. Loosens command-safety chains (#57): cargo build && cargo test and similar chains of known-safe commands now escalate to RequiresApproval instead of being hard-blocked as Dangerous. Chains containing unknown commands still block. Suppresses the GettingCrowded footer chip — context-percent header already covers conversation pressure. Refactors: - Extracts file_mention parsing/completion/expansion (~450 LOC) from the 5,500-line ui.rs into crates/tui/src/tui/file_mention.rs. - Deletes truly unused helpers (write_bytes, timestamped_filename, extension_from_url, output_path, has_project_doc, primary_doc_path). Tests: 853 pass. cargo clippy --workspace -D warnings clean. cargo fmt --all -- --check clean. Closes #46 #47 #48 #49 #50 #53 #54 #55 #56 #57 #58.	2026-04-25 21:48:17 -05:00
Hunter Bown	027d6d19b6	docs(rlm): land Hetun design + helper layer + Sakana research methodology Captures the full RLM-fundamental story across the design doc, MODES.md, and the Hetun prompt. Tracking issues are now #46–#55 (helper layer filed as #53, Hetun as #54, vendoring as #55). What this nails down: - Hetun mode is added at the END of the Tab cycle (Plan → Agent → YOLO → Hetun → Plan), not as a Plan replacement. Default landing mode is unchanged so people don't accidentally start there. Plan stays as it is. - Mission-level approval, not block-level. Hetun runs a research phase, presents one mission card, and only executes after explicit user approval. Inside the execution turn the repl block runs straight through with no per-block prompts — that's the whole point of the mode. - The user's configured model is left alone on enter/exit. Pro/max users stay on Pro/max. The flash-as-coordinator behaviour is internal to the runtime (ZIGRLM_RLM_CMD always points to flash regardless of mode). No global model swap. - No /hetun slash command. Tab cycles into the mode; /plan keeps switching to Plan as today. - The helper layer (#53) is fundamental, not aleph-derived. A curated ~20-function ctx-helper module + AST-validated Python sandbox baked into the repl runtime so a single block can load → slice → fan out flash queries → aggregate without crossing tool boundaries. Inspired by aleph's pattern but our own native primitive — not a port. - Hetun research methodology adopts Sakana's Fugu patterns. The research phase is recursive novelty sampling + hierarchical narrative tree synthesis + multi-detector cross-verification (flash for breadth, Pro for depth) + hypothesis-verification loop. Not "fan out 8 fixed queries". This is what makes "Plan + Recursive Agents" meaningful versus a flash-coordinator wrapper. - No version-number framing anywhere. The plan ships as one cohesive RLM landing across #46/#48/#49/#50/#53/#54/#55 — order is dependency, not release schedule. 
We keep shipping. - Auto-compaction stays automatic. Removed a manual /compact nag from the Hetun prompt; the existing coherence + capacity system already handles this. Files: docs/rlm-design.md new — full design doc with Hetun details docs/research-react-vs-rlm.md new — supporting research treatment docs/MODES.md 4-mode cycle, Hetun added at end, Plan kept crates/tui/src/prompts/hetun.txt prompt teaching the recursive-novelty + hierarchical-synthesis + verification-loop rhythm, mission-card structure, two-step gate .gitignore ignore .claude/scheduled_tasks.lock runtime Closes nothing yet — implementation lands across the tracking issues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:37:25 -05:00
Hunter Bown	229b1993ac	ci: mirror release-workflow strict gates so failed publishes can't slip through The Release / parity job runs with `--locked` and clippy `-D warnings`. Main CI ran without either flag, so commits could pass main CI but fail the release pipeline at the parity stage — which has been silently blocking every npm publish since v0.4.6 (latest npm = 0.4.8 even though git tags reach v0.5.2). Most failures were either fmt drift caused by new stable rustc / rustfmt revisions or lockfile drift the workspace CI never noticed. Aligns the Lint job's clippy step with `--locked -- -D warnings` and the Test job with `--locked` + an explicit `git diff --exit-code -- Cargo.lock` lockfile drift guard. From here on, anything that would fail Release / parity also fails main CI on the same push, so we never push a tag we know will fail the publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:07:28 -05:00
Hunter Bown	686ef94719	fix: doc-link warning in model_picker module header Removed the `[`Settings`]` intra-doc link that referenced a type not in scope at the module-doc level — RUSTDOCFLAGS=-Dwarnings rejects it. Replaced with backtick code formatting; the rest of the doc is unchanged. Caught by the Documentation CI job on v0.5.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:03:37 -05:00
Hunter Bown	ba6bc351da	release: v0.5.2 — /model picker Single-feature release that lands #39 (the /model two-pane picker) on top of the v0.5.1 quality-of-life batch. Bumps workspace + npm wrapper + Cargo.lock in lockstep; check-versions.sh verifies. Closes #39. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:00:19 -05:00
Hunter Bown	ebdda09c29	feat(#39): /model opens Pro/Flash + Off/High/Max picker `/model` with no argument now opens a two-pane modal: model on the left (deepseek-v4-pro flagship vs deepseek-v4-flash fast-and-cheap, with the current id appearing as a "current (custom)" row when it isn't one of the listed defaults), and thinking effort on the right (Off, High, Max). Tab/←/→ swaps panes, ↑/↓ moves within the focused pane, Enter applies both, Esc cancels. Effort exposes only the three rows DeepSeek behaviorally distinguishes per the Thinking Mode docs — `low`/`medium` are mapped server-side to `high`, and `xhigh` to `max`, so listing them as separate choices would mislead. The legacy variants stay valid in `~/.deepseek/settings.toml` for back-compat (the existing `cycle_next` already only visits Off→High→Max), the picker just doesn't surface them. Apply path: * mutates app.model and app.reasoning_effort * resets last__tokens / cache / replay-token gauges so the next-turn footer numbers reflect the new model rather than stale ones * persists `default_model` and `reasoning_effort` to settings via the existing Settings::set/save flow so the choice survives restart * forwards Op::SetModel + Op::SetCompaction to the engine so the running session picks up the new compaction budget * surfaces a one-line summary describing what changed * if persistence fails, the in-memory change still applies and a "(not persisted: ...)" suffix is appended to the status line `/model <id>` keeps working unchanged for power users; only the no-argument branch was redirected to the new modal. Files: * tui/model_picker.rs — new ModelPickerView struct + ModalView impl, plus eight unit tests (initial state, low/medium normalisation, custom model preservation, arrow navigation, focus toggle, Enter emits ModelPickerApplied with the right values, Esc closes silently, and a guard that the picker exposes exactly off/high/max). 
* tui/views/mod.rs — adds ModalKind::ModelPicker and ViewEvent::ModelPickerApplied carrying both new and previous model+effort so the handler can describe the diff. * tui/app.rs — adds AppAction::OpenModelPicker. * commands/core.rs — `/model` no-arg branch now returns AppAction::OpenModelPicker; `/model <id>` shortcut is unchanged. * tui/ui.rs — pushes ModelPickerView on the action and adds apply_model_picker_choice() that handles persistence + engine sync when ViewEvent::ModelPickerApplied fires. * tui/mod.rs — registers the new submodule. Closes #39 (against v0.5.2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:50:49 -05:00
Hunter Bown	f42f94207c	release: v0.5.1 — telemetry, completion, and trust quality-of-life Bumps workspace version to 0.5.1 and finalises the changelog with the issues that landed since v0.5.0: * #25, #27, #31, #33, #34 (already on main) * #28 @file Tab-completion * #29 per-workspace trust list with /trust slash command * #30 reasoning-replay token chip in the footer * #36 regression tests for sidebar gutter bleed scripts/release/check-versions.sh is green: workspace=0.5.1, npm=0.5.1, Cargo.lock in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:40:06 -05:00
Hunter Bown	d79178a926	feat(#28,#30): @file Tab-completion + reasoning replay footer chip Two related TUI affordances bundled because they share ui.rs and the ui/tests.rs file. #30 — Reasoning-content replay telemetry, end-to-end: * models.rs — Usage gains reasoning_replay_tokens: Option<u32>. * client.rs — sanitize_thinking_mode_messages now returns the approximate replay-token count (~4 chars/token); the streaming pipeline overlays it onto the parsed MessageDelta usage so the server-reported and client-estimated numbers reach the engine together. * app.rs — App stores last_reasoning_replay_tokens. * ui.rs — TurnComplete handler copies the value into the App; new footer_reasoning_replay_spans renders an `rsn N.Nk` chip in the footer next to the cache hit-rate, warning-coloured when replay tokens exceed 50% of the input budget. * ui/tests.rs — covers chip-on, chip-hidden-when-zero, and the sanitizer's None-on-non-thinking-model path. #28 — Tab-complete @file mentions against the workspace: * ui.rs — adds partial_file_mention_at_cursor (with a guard against `user@example.com`-style false positives) and try_autocomplete_file_mention. Walks the workspace via the existing ignore::WalkBuilder, ranks prefix matches above substring matches, applies the unique match outright, extends to the longest common prefix when multiple match, and surfaces ambiguous candidates via the status line. Wired into the existing Tab handler after the slash-command branch. * ui/tests.rs — covers cursor-inside-mention extraction, email guard, prefix vs substring ranking, single-match application, common-prefix extension, no-match status, and the no-mention-no-op path. The mention-expansion path that ships file contents to the model is unchanged — this is purely a discovery aid for typing the path. Inline-contents and a fuzzy popup picker are queued for v0.5.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:39:59 -05:00
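The ranking and common-prefix behaviour described for #28 can be sketched as follows. This is a minimal illustration, not the code in `ui.rs`: `rank_matches`, `longest_common_prefix`, `complete`, and the `Completion` enum are hypothetical names, and computing the common prefix over prefix matches only is an assumption.

```rust
/// Outcome of one Tab press (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Completion {
    NoMatch,
    Unique(String),
    Ambiguous { extended: String, candidates: Vec<String> },
}

/// Prefix matches first, then substring matches, each in input order.
pub fn rank_matches<'a>(partial: &str, candidates: &[&'a str]) -> Vec<&'a str> {
    let mut prefix = Vec::new();
    let mut substring = Vec::new();
    for &c in candidates {
        if c.starts_with(partial) {
            prefix.push(c);
        } else if c.contains(partial) {
            substring.push(c);
        }
    }
    prefix.extend(substring);
    prefix
}

/// Longest common prefix of all items, cut on a char boundary.
pub fn longest_common_prefix(items: &[&str]) -> String {
    let mut iter = items.iter();
    let first = match iter.next() {
        Some(f) => *f,
        None => return String::new(),
    };
    let mut len = first.len();
    for s in iter {
        // Byte length of the leading run of identical chars.
        let common: usize = first
            .chars()
            .zip(s.chars())
            .take_while(|(a, b)| a == b)
            .map(|(a, _)| a.len_utf8())
            .sum();
        len = len.min(common);
    }
    first[..len].to_string()
}

pub fn complete(partial: &str, candidates: &[&str]) -> Completion {
    let ranked = rank_matches(partial, candidates);
    match ranked.len() {
        0 => Completion::NoMatch,
        1 => Completion::Unique(ranked[0].to_string()),
        _ => {
            // Assumption: only prefix matches can extend the typed text.
            let prefix_only: Vec<&str> = ranked
                .iter()
                .copied()
                .filter(|c| c.starts_with(partial))
                .collect();
            let extended = if prefix_only.is_empty() {
                partial.to_string()
            } else {
                longest_common_prefix(&prefix_only)
            };
            let candidates = ranked.iter().map(|c| c.to_string()).collect();
            Completion::Ambiguous { extended, candidates }
        }
    }
}
```

In the ambiguous case the caller would replace the typed text with `extended` and show `candidates` on the status line, which matches the three outcomes the commit lists (unique, extended, no match).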
Hunter Bown	bcf6ba9a8e	feat(#29 ): per-workspace trust list with /trust slash command Adds a persistent allowlist of external paths the agent may read/write from outside the current workspace, scoped to the workspace it was granted in. The list lives in ~/.deepseek/workspace-trust.json with schema {"workspaces": {"<ws>": ["<trusted>", ...]}}; canonical paths on both sides keep symlink-aliased macOS tempdirs sane. Surface area: * crates/tui/src/workspace_trust.rs — new module: load_for / add / remove plus _at variants for tests that need an explicit file path rather than HOME mutation. * tools/spec.rs — ToolContext gains trusted_external_paths and resolve_path consults it before returning PathEscape, both for the existing-path branch and the to-be-created (parent-canonical) branch. * core/engine.rs — build_tool_context loads the trust snapshot on every tool dispatch so /trust mutations apply on the next call. * commands/config.rs — /trust now takes subcommands (add, remove, list, on, off, status) instead of being a single all-or-nothing toggle. Tilde expansion handled in-line. * commands/mod.rs — registry entry updated with the new usage string and a dispatcher that forwards args. * tools/diagnostics.rs — adds trusted_external_paths to the JSON output so the agent and the user can see the list at a glance. The interactive "Allow once / Always allow / Deny" prompt that the issue describes is deferred — for v0.5.1 the workflow is "grant ahead with /trust add". A future change will add a hook in ToolContext::resolve_path that surfaces an ApprovalRequest when an escape path is hit, so the slash-command remains the durable mechanism while the prompt becomes the discovery one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:39:37 -05:00
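The per-workspace scoping in this commit can be illustrated with an in-memory sketch that mirrors the `{"workspaces": {"<ws>": ["<trusted>", ...]}}` schema. `TrustStore` and its methods are hypothetical stand-ins for the `workspace_trust` module. The sketch assumes paths are already canonical and that a trusted directory also covers everything beneath it, neither of which the commit message states outright.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// In-memory mirror of the trust file: workspace -> trusted external paths.
pub struct TrustStore {
    workspaces: HashMap<PathBuf, Vec<PathBuf>>,
}

impl TrustStore {
    pub fn new() -> Self {
        TrustStore { workspaces: HashMap::new() }
    }

    /// Grant `trusted` for `workspace`; duplicates are ignored.
    pub fn add(&mut self, workspace: &Path, trusted: &Path) {
        let list = self.workspaces.entry(workspace.to_path_buf()).or_default();
        if !list.iter().any(|p| p == trusted) {
            list.push(trusted.to_path_buf());
        }
    }

    /// Revoke a grant; returns whether anything was removed.
    pub fn remove(&mut self, workspace: &Path, trusted: &Path) -> bool {
        match self.workspaces.get_mut(workspace) {
            Some(list) => {
                let before = list.len();
                list.retain(|p| p != trusted);
                list.len() != before
            }
            None => false,
        }
    }

    /// True when `candidate` equals or sits under a path granted to `workspace`.
    /// `Path::starts_with` compares whole components, so `/data/shared-other`
    /// does not match a grant for `/data/shared`.
    pub fn is_trusted(&self, workspace: &Path, candidate: &Path) -> bool {
        self.workspaces
            .get(workspace)
            .map_or(false, |list| list.iter().any(|t| candidate.starts_with(t)))
    }
}
```

A path resolver in this style would consult `is_trusted` only after the normal inside-the-workspace check fails, and report the escape error only when both checks fail.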
Hunter Bown	7a85f182e2	test(#36 ): regression tests for sidebar gutter bleed Adds two snapshot tests against ChatWidget rendering to lock in that long single-line tool results never write any cells outside chat_area at the widths reported in the bug (80, 120, 165, 200 cols), and that the scrollbar coexists with content along the right edge instead of overdrawing the penultimate column. The acceptance criterion in the issue specifically requires this regression coverage; the tests pass against current code, so existing rendering is the baseline being guarded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:39:19 -05:00
Hunter Bown	f7fe5e09a5	fix(#37 ): make NIM a peer provider in config.example.toml + setup status A user couldn't find an `NVIDIA_API_KEY` block in `~/.deepseek/config.toml` because the example file only mentioned NIM as commented-out alternates to the top-level keys. Two fixes: - `config.example.toml` now has explicit `[providers.deepseek]` and `[providers.nvidia_nim]` sections (placed after all top-level keys so the TOML still parses cleanly), each documenting `api_key` / `base_url` / `model` plus the env vars that override them. Both providers can be stored at once and toggled via `/provider` or `--provider` without re-entering keys. - `setup --status` "missing api_key" message is now provider-aware: on `nvidia-nim` it points at `NVIDIA_API_KEY` + `[providers.nvidia_nim]` + `deepseek auth set --provider nvidia-nim`, instead of the DeepSeek-only hint. Audit verified: the v0.5.0 multi-turn replay fix path (`should_replay_reasoning_content` → `requires_reasoning_content` in `crates/tui/src/client.rs:1796`) keys off the model name (matches `deepseek-v4`), not the provider, so NIM-hosted V4 models get the replay automatically. No NIM-specific 400-class regression there. Closes #37 (docs/UX); the live multi-turn-against-NIM verification remains a manual smoke step listed in the issue (no NIM creds in CI).	2026-04-25 13:52:44 -05:00
Hunter Bown	24b8945010	feat(#32 ): basic session-handoff convention via .deepseek/handoff.md Minimum-viable version of the handoff artifact described in #32: - New `HANDOFF_RELATIVE_PATH = ".deepseek/handoff.md"` convention. - `system_prompt_for_mode_with_context` now reads that path on every prompt rebuild and prepends a `## Previous Session Handoff` block to the system prompt when the file is non-empty. A fresh agent gets the prior session's blockers/decisions/files-touched in turn-1 context with zero discovery cost. - Agent prompt updated to make the convention explicit: "if the block appears, read it first; before exit/`/compact`, write or update it via `write_file`." - `.deepseek/` is already gitignored, so the handoff travels with the workspace but doesn't pollute commits unless the user opts in. Tests cover: present-and-non-empty (block injected with file content), missing file (no block), empty/whitespace-only file (no block). A unique marker in the injected block (`"left a handoff at .deepseek/..."`) discriminates the actual block from the agent prompt's own description of the convention. Out of scope for v0.5.1: a `/handoff` slash command, a startup banner toast, automatic write on exit, and the diff-against-HEAD-on-resume mechanism. The agent can already write the file via `write_file` when the user types `write a session handoff`. Closes #32.	2026-04-25 13:48:22 -05:00
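The handoff injection can be sketched as below, assuming the block is simply prepended to the base prompt. `HANDOFF_RELATIVE_PATH` and the `## Previous Session Handoff` heading come from the commit message. `handoff_block`, `with_handoff`, and the exact spacing of the block are illustrative assumptions rather than the behaviour of `system_prompt_for_mode_with_context`.

```rust
use std::fs;
use std::path::Path;

pub const HANDOFF_RELATIVE_PATH: &str = ".deepseek/handoff.md";

/// Returns the handoff block, or `None` when the file is missing,
/// unreadable, or contains only whitespace.
pub fn handoff_block(workspace: &Path) -> Option<String> {
    let text = fs::read_to_string(workspace.join(HANDOFF_RELATIVE_PATH)).ok()?;
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return None;
    }
    Some(format!("## Previous Session Handoff\n\n{}\n", trimmed))
}

/// Prepends the handoff block to the base prompt when one exists.
pub fn with_handoff(base_prompt: &str, workspace: &Path) -> String {
    match handoff_block(workspace) {
        Some(block) => format!("{}\n{}", block, base_prompt),
        None => base_prompt.to_string(),
    }
}
```

Because the file is re-read on every call, a handoff written mid-session would show up on the next prompt rebuild without any cache invalidation.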
Hunter Bown	82e4a564aa	refactor(#35 ): tighten agent prompt tool descriptions, drop alias dupes Tool-surface audit pass: - FILE OPERATIONS rewritten so each line states the niche, not just the verb. read_file mentions PDF auto-extraction + `pages` slicing. - New SEARCH section consolidates grep_files / file_search / web_search / fetch_url so the model sees them next to each other and picks the right one. fetch_url (#33) added; previously absent from the prompt. - request_user_input pulled out of FILE OPERATIONS into its own USER section — it never belonged there. - SUB-AGENTS list shrinks by 3: drops `spawn_agent` (use `agent_spawn`), `close_agent` (use `agent_cancel`), and the `agent_assign / assign_agent` dual-name. The underlying dispatchers still resolve those names, so existing sessions don't break — they just no longer pollute the model's tool list. Adds `docs/TOOL_SURFACE.md` with the rationale, the v0.5.1 final surface, and the dropped aliases. Calls out that grep_files is pure-Rust (no rg/grep shell-out, so the "fall back to grep" AC from #35 is vacuously satisfied — the tool has no shell dependency to fall back from). Closes #35.	2026-04-25 13:44:43 -05:00
Hunter Bown	07ae792068	fix(#38 ): show provider chip in header when not on default DeepSeek The reasoning-effort tier (`max` chip + whale icon) and the live/context indicators were the only signals on the right of the header. Switching to nvidia-nim left the right-hand side identical to a DeepSeek session, so it wasn't obvious at a glance that requests were going to a different backend. Now: when `app.api_provider != Deepseek`, the header surfaces a bold `NIM` chip on the right, leftmost in the chip cluster (so it survives the narrow-width fallback variants in `right_spans`). Default-DeepSeek sessions are unchanged — `provider_label = None` short-circuits the chip. Closes #38.	2026-04-25 13:41:52 -05:00
Hunter Bown	ba40ae4aac	feat(#34 ): auto-extract text from PDFs in read_file `read_file` now detects PDFs by extension or `%PDF-` magic bytes and shells out to `pdftotext -layout` (poppler) to return plain text directly to the model. New optional `pages` arg accepts `N` or `N-M` slices so big papers can be read in pieces without burning context. When `pdftotext` isn't on `$PATH`, the tool returns a structured `{type: "binary_unavailable", kind: "pdf", reason, hint}` payload with install hints (`brew install poppler` / `apt install poppler-utils`) instead of crashing or returning UTF-8 garbage from a binary file. Tests cover extension detection (case-insensitive), magic-byte sniffing on extension-less files, the negative case for plain text, the pages arg parser (single, range, whitespace, invalid forms), and the binary_unavailable branch when `pdftotext` is absent. .docx / .epub / .html stripping deferred — same dispatch can take more extractors later. Closes #34.	2026-04-25 13:36:30 -05:00
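The two pure pieces of this change, PDF detection and `pages` parsing, can be sketched without touching `pdftotext`. `looks_like_pdf` and `parse_pages` are hypothetical names. Treating pages as 1-based and rejecting reversed ranges are assumptions about what the commit calls "invalid forms".

```rust
use std::path::Path;

/// True when the extension is `pdf` (any case) or the first bytes are `%PDF-`.
pub fn looks_like_pdf(path: &Path, head: &[u8]) -> bool {
    let by_extension = path
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| e.eq_ignore_ascii_case("pdf"));
    by_extension || head.starts_with(b"%PDF-")
}

/// Parses `N` or `N-M` into an inclusive (first, last) pair.
/// Assumption: pages are 1-based and `first <= last` is required.
pub fn parse_pages(arg: &str) -> Option<(u32, u32)> {
    let arg = arg.trim();
    let (first, last) = match arg.split_once('-') {
        Some((a, b)) => (a.trim().parse::<u32>().ok()?, b.trim().parse::<u32>().ok()?),
        None => {
            let n = arg.parse::<u32>().ok()?;
            (n, n)
        }
    };
    if first == 0 || last < first {
        return None;
    }
    Some((first, last))
}
```

Sniffing the magic bytes in addition to the extension is what lets an extension-less download still be routed to the extractor, while a plain-text file passes through untouched.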
Hunter Bown	7f2c382343	feat(#33 ): add fetch_url tool for direct HTTP GET Complements `web_search` for cases where the URL is already known — GitHub repos, blog posts, spec pages — and a search-engine round trip is overkill or actively unhelpful (which #25 had been making worse). Surface: - `fetch_url(url, format?, max_bytes?, timeout_ms?)` - `format`: `markdown` (default), `text`, `raw` - HTTPS preferred, http:// allowed; non-http schemes rejected up front - Follows up to 5 redirects; 1 MB default cap (10 MB hard ceiling); 15 s default timeout (60 s ceiling) - HTML responses are stripped to readable text via the same regex pattern used by `web_search` (script/style strip → tag strip → entity decode → whitespace collapse) - 4xx / 5xx responses still return the body (with `success: false`) so the caller can read JSON error envelopes Capabilities: `ReadOnly + Network`. Approval: `Auto` (matches `web_search`). Registered in `with_web_tools` so it's available wherever `web_search` is. Tests cover: format parsing aliases, scheme rejection, missing/empty url validation, html-to-text stripping. The over-the-wire cases (redirect chains, oversized truncation) are exercised by integration tests once the test suite is wired to a local mock HTTP server — deferring that since the unit tests already lock in the input validation and HTML processing. Closes #33.	2026-04-25 13:33:22 -05:00
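The input validation this commit describes (scheme rejection, format parsing, and the size and timeout ceilings) can be sketched as follows. All names here are hypothetical. The byte values assume decimal megabytes, which the commit does not specify, and the unspecified format aliases are left out rather than guessed.

```rust
/// Assumed decimal interpretation of "1 MB default cap (10 MB hard ceiling)".
pub const DEFAULT_MAX_BYTES: u64 = 1_000_000;
pub const MAX_BYTES_CEILING: u64 = 10_000_000;
/// 15 s default timeout, 60 s ceiling.
pub const DEFAULT_TIMEOUT_MS: u64 = 15_000;
pub const TIMEOUT_CEILING_MS: u64 = 60_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Markdown,
    Text,
    Raw,
}

/// `markdown` is the default when no format is given.
pub fn parse_format(raw: Option<&str>) -> Result<Format, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()) {
        None => Ok(Format::Markdown),
        Some(s) => match s.as_str() {
            "markdown" => Ok(Format::Markdown),
            "text" => Ok(Format::Text),
            "raw" => Ok(Format::Raw),
            other => Err(format!("unsupported format: {}", other)),
        },
    }
}

/// Rejects empty input and any scheme other than http or https.
pub fn validate_url(url: &str) -> Result<&str, String> {
    let url = url.trim();
    if url.is_empty() {
        return Err("url is required".to_string());
    }
    let lower = url.to_ascii_lowercase();
    if lower.starts_with("https://") || lower.starts_with("http://") {
        Ok(url)
    } else {
        Err("only http:// and https:// URLs are supported".to_string())
    }
}

/// Applies defaults, then caps both values at their ceilings.
pub fn clamp_limits(max_bytes: Option<u64>, timeout_ms: Option<u64>) -> (u64, u64) {
    (
        max_bytes.unwrap_or(DEFAULT_MAX_BYTES).min(MAX_BYTES_CEILING),
        timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS).min(TIMEOUT_CEILING_MS),
    )
}
```

Clamping instead of erroring on oversized limits keeps a model-supplied `max_bytes` from failing the call outright. Whether the real tool clamps or rejects is not stated in the commit.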
Hunter Bown	017ac97d0d	feat(#30 ): debug-log reasoning_content replay size per request The thinking-mode sanitizer now sums the byte size of every replayed `reasoning_content` field in the outgoing chat-completions body and emits an `info`-level log line: Reasoning-content replay: 7 assistant message(s), ~3.2K input tokens (12,884 chars) being re-sent in this request This is visible under `RUST_LOG=deepseek_tui=info` (or higher). It's the first step toward the footer/status-line indicator described in #30 — the model's input-side reasoning replay is now observable per turn, even before it gets a dedicated UI surface. Tests cover both branches: bodies that already have reasoning_content (count is summed across all assistant turns) and bodies where the sanitizer had to inject the `(reasoning omitted)` placeholder (the placeholder bytes are included in the count since they ship over the wire). Footer integration deferred — that needs a new event from client → engine → TUI to surface the count alongside `cache N%` / `$X.XX`. Part of #30 remains open.	2026-04-25 13:28:44 -05:00
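The replay accounting can be sketched as below. The commit's sample log line (12,884 chars, about 3.2K tokens) is consistent with the roughly 4 chars per token estimate mentioned in the later #28/#30 commit, so the sketch uses that ratio. `Message`, `replay_stats`, and `replay_summary` are hypothetical names, and always printing the token count in `K` units is an assumption drawn from the single sample line.

```rust
/// Minimal stand-in for a chat message; the real request body is richer.
pub struct Message<'a> {
    pub role: &'a str,
    pub reasoning_content: Option<&'a str>,
}

/// Returns (assistant messages carrying reasoning, total bytes, approx tokens).
/// A placeholder such as `(reasoning omitted)` counts too, since it is sent.
pub fn replay_stats(messages: &[Message<'_>]) -> (usize, usize, usize) {
    let mut count = 0;
    let mut bytes = 0;
    for m in messages {
        if m.role != "assistant" {
            continue;
        }
        if let Some(reasoning) = m.reasoning_content {
            count += 1;
            bytes += reasoning.len();
        }
    }
    (count, bytes, bytes / 4)
}

/// Inserts a comma between every group of three digits.
fn group_thousands(n: usize) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Formats the summary in the shape of the log line quoted in the commit.
pub fn replay_summary(count: usize, bytes: usize) -> String {
    let tokens = bytes / 4;
    format!(
        "Reasoning-content replay: {} assistant message(s), ~{:.1}K input tokens ({} chars) being re-sent in this request",
        count,
        tokens as f64 / 1000.0,
        group_thousands(bytes)
    )
}
```

Feeding the sample numbers through `replay_summary(7, 12_884)` reproduces the line quoted in the commit message, which is a useful sanity check on the 4 chars per token ratio.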
Hunter Bown	0a394e1587	fix(#31 ): catch version drift in CI, not at release time Adds scripts/release/check-versions.sh and a `versions` CI job that runs on every push/PR. Verifies: - no per-crate Cargo.toml carries a literal version (must inherit the workspace version) - npm/deepseek-tui/package.json matches the workspace version - Cargo.lock is in sync with the manifests Closes #31.	2026-04-25 13:25:55 -05:00
Hunter Bown	fafb76063d	ci: fix unused import + cargo fmt drift from #27 The #27 per-mode context budget commit (`1be18e69`) replaced calls to compaction_threshold_for_model with compaction_threshold_for_model_and_effort but left the old name in the import list, which fails under -Dwarnings on Build, Test, and the npm wrapper smoke job. Also re-runs cargo fmt over the four files the lint job flagged.	2026-04-25 13:21:16 -05:00


183 Commits