codewhale

dgf1988/codewhale

Author	SHA1	Message	Date
Hunter Bown	fa99fb5124	fix(engine): 256K output budget + capacity controller off by default User feedback after v0.6.2 dogfooding: "we'd be better off simplifying and removing guardrails." Two changes that meaningfully shrink the surface: 1. TURN_MAX_OUTPUT_TOKENS: 32_768 → 262_144 (256K). V4 thinking models can produce tens of thousands of reasoning tokens on hard prompts before the visible reply, and DeepSeek V4 has a 1M context window. 32K was tight for that workload (showed up as the model "stopping mid-response" once reasoning exhausted the budget). 256K is generous enough that the per-turn ceiling effectively never bites in normal use. 2. CapacityControllerConfig::enabled: true → false. The controller's main intervention, `TargetedContextRefresh`, runs `compact_messages_safe` which rewrites the live conversation — visually identical to the agent "restarting" mid-turn. The failure mode it protects against (context overflow) is rare in practice and self-correcting (the model surfaces a clear error). Power users on V4 do not need the guardrail; users who do can re-enable it via `capacity.enabled = true` in `~/.deepseek/config.toml`. Tests: - context_budget_reserves_output_and_headroom: switched fixture model to deepseek-v4-pro (1M context) so the 256K reservation doesn't saturate the budget to zero. - cooldown_blocks_repeated_action: explicitly enables the controller (the cooldown logic short-circuits when disabled). cargo clippy --workspace -- -D warnings clean; full test suite green (990 + adjacent crate tests).	2026-04-26 15:51:58 -05:00
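The fixture change in the Tests note above depends on saturating arithmetic. When the 256K output reservation plus headroom is larger than a model's context window, the remaining input budget clamps to zero instead of underflowing. This is a minimal sketch that assumes a hypothetical `input_budget` helper and an illustrative 8K headroom constant. The crate's actual budget function and headroom value do not appear in this log.

```rust
/// Per-turn output ceiling from the commit (256K tokens).
const TURN_MAX_OUTPUT_TOKENS: u64 = 262_144;
/// Hypothetical safety headroom; the real value is not given in the log.
const HEADROOM_TOKENS: u64 = 8_192;

/// Tokens left for conversation input after reserving the output budget
/// and headroom. `saturating_sub` clamps to zero instead of underflowing.
fn input_budget(context_window: u64) -> u64 {
    context_window
        .saturating_sub(TURN_MAX_OUTPUT_TOKENS)
        .saturating_sub(HEADROOM_TOKENS)
}
```

Under these assumptions, a 131,072-token window yields 0, and a 1,000,000-token window leaves 729,664 tokens for input.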
Hunter Bown	6ab2fcc21f	fix(tui): rustfmt parity + working-strip stays visible all turn Two fixes folded into one commit (the parity failure was blocking the v0.6.2 npm publish; the strip fix is the dogfooding follow-up): 1. cargo fmt --all: subagent/mod.rs (long timeout wrapper) was over the line-length budget when committed earlier; rustfmt rewraps it. CI parity (`cargo fmt -- --check`) was failing the release pipeline. 2. footer working-strip stays visible for the entire turn: previously the strip only animated while `is_loading || is_compacting || running_agents > 0`. Between LLM rounds inside a single turn (tool execution, reasoning replay, capacity refresh) `is_loading` flickers off, so the user saw the strip vanish for seconds at a time even though the agent was clearly still working. Widen the gate to also include `runtime_turn_status == Some("in_progress")`, which clears only when `EngineEvent::TurnComplete` fires, so the strip now stays lit for the whole turn.	2026-04-26 15:45:13 -05:00
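The widened gate is a single boolean predicate. Here is a sketch that uses an illustrative `FooterState` struct. The field names follow the commit, but the struct and function are hypothetical.

```rust
/// Snapshot of the footer-relevant UI state (hypothetical struct).
struct FooterState<'a> {
    is_loading: bool,
    is_compacting: bool,
    running_agents: usize,
    runtime_turn_status: Option<&'a str>,
}

/// The working strip stays lit while any activity signal is set. The last
/// clause covers the gaps between LLM rounds where `is_loading` is false.
fn working_strip_visible(s: &FooterState) -> bool {
    s.is_loading
        || s.is_compacting
        || s.running_agents > 0
        || s.runtime_turn_status == Some("in_progress")
}
```

Between rounds, `is_loading` is false but the status is still `in_progress`, so the strip stays visible. Once `TurnComplete` clears the status, the strip hides.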
Hunter Bown	0d92eb847b	fix(engine): raise output-token + step ceilings (mid-response cutoff) User repro: V4 thinking on hard prompts (~107s of thinking) randomly "stops mid-response", more often when starting in Agent mode and switching to YOLO. Two ceilings were too tight: 1. TURN_MAX_OUTPUT_TOKENS: 4096 → 32768. `reasoning_content` from V4 thinking can easily exceed 4K tokens on hard problems. Once the per-turn output budget is exhausted, the API closes the SSE stream with `finish_reason: "length"` and the visible reply ends up empty, which surfaced as the assistant "stopping randomly". 32K leaves comfortable headroom for thinking + the visible reply on every realistic turn while staying well within DeepSeek V4's 1M-token context window. 2. max_steps: 100 → u32::MAX (effectively unlimited). 100 was hitting the ceiling on long multi-step plans (wide refactors, sub-agent orchestration), presenting as the agent "giving up mid-task". V4's 1M context window means there's no good reason to cap steps administratively. Users can still interrupt with Ctrl+C / Esc; a turn naturally ends when the model stops emitting tool calls. All 54 turn tests pass; full workspace clippy + fmt remain clean.	2026-04-26 15:38:59 -05:00
Hunter Bown	d98cc58028	fix(tui): sidebar padding + capacity controller tuning Two tuning fixes for issues observed in v0.6.2 dogfooding: #63 follow-up: sidebar panels still empty in compact terminals. `section_padding: Padding::uniform(1)` ate two rows of every sidebar panel (one above content, one below). At the 25% layout split, in terminals around 12-15 rows tall, Plan/Todos/Tasks each get only 3 rows total: borders take 2, vertical padding takes 2, leaving -1 (saturated to 0) rows for the actual content. Even "No todos" / "No active plan" got eaten. Switched to horizontal-only padding so the inner row survives. Capacity-controller tuning (user feedback: "refreshing context is overtuned"): `apply_targeted_context_refresh` runs `compact_messages_safe`, which rewrites the conversation history and looks visually identical to the agent "restarting" mid-session. The previous defaults (low_risk_max=0.34, refresh_cooldown_turns=2, min_turns_before_guardrail=2) fired this every couple of turns once p_fail crept above 0.34. Bumped: - low_risk_max: 0.34 → 0.50 - refresh_cooldown_turns: 2 → 6 - min_turns_before_guardrail: 2 → 4 Still well below the medium-risk ceiling (0.62), so genuine drift still triggers; routine noise no longer does. All 14 capacity tests + workspace clippy + fmt remain clean.	2026-04-26 15:27:22 -05:00
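The row arithmetic behind the empty panels can be written out directly. This minimal sketch uses a hypothetical `content_rows` helper. It assumes a 2-row border and counts only vertical padding.

```rust
/// Rows left for content inside a bordered panel (hypothetical helper).
/// `saturating_sub` reproduces the "-1 saturated to 0" outcome.
fn content_rows(panel_height: u16, vertical_padding: u16) -> u16 {
    const BORDER_ROWS: u16 = 2; // top + bottom border
    panel_height.saturating_sub(BORDER_ROWS + vertical_padding)
}
```

With `Padding::uniform(1)` (one row above, one below), a 3-row panel leaves 0 content rows. With horizontal-only padding, the same panel leaves 1.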
Hunter Bown	aa8d0dc73a	merge: per-cell transcript line cache + revisions (closes #78) Resolves conflicts with the #65 resize fix that landed first. Both branches converged on the same resize-coalescing + display-width truncation fix; took the perf branch's more detailed inline comments and combined the transcript bench from #78 with the existing #65 resize regression tests. Issue #78 baseline (release, 5000-cell synthetic transcript): pure scroll, off=0: 3549 µs → 21 µs (~150x); pure scroll, off=2000: 3303 µs → 19 µs (~170x); streaming append: 11.6 ms → 3.4 ms (~3.4x).	2026-04-26 14:53:04 -05:00
Hunter Bown	eee5081ef7	merge: clean resize redraw + display-width truncation (closes #65)	2026-04-26 14:49:42 -05:00
Hunter Bown	ab70c40beb	perf(tui): cache wrapped transcript lines per-cell (closes #78) Scrolling far back through a long transcript stalled the entire UI: every keypress paid the cost of re-wrapping every history cell from index 0 on every frame. Two bugs combined to defeat the existing per-cell cache: 1. Uniform cache keys: `widgets/mod.rs` synthesized `cell_revisions = vec![app.history_version; len]`, so a single mutation anywhere bumped every cell's revision and busted the entire cache. 2. Deep Vec clone on cache hit: `CachedCell.lines: Vec<Line>` was deep-cloned on every `prev.clone()` inside `ensure`, so even a fully cached frame paid O(total_lines) per render. The fix mirrors Codex's chatwidget pattern: track per-cell revisions in `App.history_revisions`, bump only the cell whose content actually changed, and store cached lines behind `Arc<Vec<Line>>` so a cache-hit clone is O(1). The cache reuse path is unchanged; what changed is the keying. Touchpoints: * `App::history_revisions` + `next_history_revision` counter, kept in lockstep with `history` via `add_message` / `extend_history` / `push_history_cell` / `clear_history` / `pop_history` / `bump_history_cell` helpers. * `cell_at_virtual_index_mut` and the `append_streaming_text` path now bump only the targeted cell's revision instead of fanning the global `history_version` across the whole transcript. * `TranscriptViewCache::ensure_split` accepts cell shards directly so the caller no longer concatenates history + active-cell entries into a fresh `Vec<HistoryCell>` every frame. * `mark_history_updated` resyncs `history_revisions.len()` to `history.len()`, preserving correctness for direct callers that bulk-mutate via `clear`/`extend`.
Bench (release, 5000-cell synthetic transcript, 100×30 area; before → after): pure scroll, off=0: 3549 µs → 23 µs; pure scroll, off=100: 3338 µs → 23 µs; pure scroll, off=500: 3306 µs → 20 µs; pure scroll, off=2k: 3303 µs → 20 µs; streaming, off=0: 11.6 ms → 3.4 ms; streaming, off=2k: 11.6 ms → 3.3 ms. Pure-scroll renders are now ~150× faster and constant-time with respect to scroll offset; streaming cost is ~3.5× lower (the remaining cost is the per-frame flatten, which always rebuilds the line buffer when the cell count changes; that is an orthogonal follow-up). Bench is `#[ignore]`'d: `cargo test -p deepseek-tui --release bench_transcript_scroll -- --ignored --nocapture`. All existing transcript and scroll tests pass; clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:47:17 -05:00
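The keying change can be illustrated with a small std-only model. Each cell carries its own revision, a streaming append bumps only the touched cell, and cached wraps live behind `Arc`, so a cache hit is a pointer clone. This is a sketch under stated assumptions. `History`, `TranscriptCache`, and the naive char-chunking `wrap` are hypothetical stand-ins for `App.history_revisions`, `TranscriptViewCache`, and the real line wrapper. Lines are plain `String`s rather than ratatui `Line`s.

```rust
use std::sync::Arc;

/// Transcript history with one revision per cell (hypothetical model).
#[derive(Default)]
struct History {
    texts: Vec<String>,
    revisions: Vec<u64>,
    next_revision: u64,
}

impl History {
    fn bump(&mut self) -> u64 {
        self.next_revision += 1;
        self.next_revision
    }
    /// A new cell gets a fresh revision; no other cell is touched.
    fn push(&mut self, text: &str) {
        let rev = self.bump();
        self.texts.push(text.to_string());
        self.revisions.push(rev);
    }
    /// A streaming append bumps only the targeted cell's revision.
    fn append(&mut self, idx: usize, more: &str) {
        let rev = self.bump();
        self.texts[idx].push_str(more);
        self.revisions[idx] = rev;
    }
}

/// Cached wrap result; `Arc` makes a cache-hit clone O(1).
#[derive(Clone)]
struct CachedCell {
    revision: u64,
    width: usize,
    lines: Arc<Vec<String>>,
}

#[derive(Default)]
struct TranscriptCache {
    cells: Vec<Option<CachedCell>>,
    wraps_performed: usize, // instrumentation for the example
}

impl TranscriptCache {
    /// Re-wraps a cell only if its revision or the render width changed.
    fn ensure(&mut self, h: &History, width: usize) -> Vec<Arc<Vec<String>>> {
        self.cells.resize(h.texts.len(), None);
        let mut out = Vec::with_capacity(h.texts.len());
        for (i, (text, &rev)) in h.texts.iter().zip(&h.revisions).enumerate() {
            let hit = matches!(&self.cells[i], Some(c) if c.revision == rev && c.width == width);
            if !hit {
                self.wraps_performed += 1;
                let lines = Arc::new(wrap(text, width));
                self.cells[i] = Some(CachedCell { revision: rev, width, lines });
            }
            let cell = self.cells[i].as_ref().expect("filled above");
            out.push(Arc::clone(&cell.lines));
        }
        out
    }
}

/// Naive fixed-width wrap by chars (stand-in for the real wrapper).
fn wrap(text: &str, width: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars.chunks(width.max(1)).map(|c| c.iter().collect()).collect()
}
```

In this model, re-rendering with no mutations performs no new wraps, and appending to one cell re-wraps only that cell. A width change invalidates every cell.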
Hunter Bown	033fef6cb2	fix(tui): force clean redraw on resize / bound sidebar labels (closes #65) After v0.6.1's light-theme removal exposed it more visibly, rapid resizes left stale glyphs in the right column (sidebar fragments, mid-character title truncation, duplicated transcript timestamps). Three small fixes: - Coalesce queued `Event::Resize` events, run a single `terminal.clear()`, and immediately draw the new frame instead of waiting for the next event loop iteration. Previously the cleared screen could sit blank between the resize handler's `continue` and the next draw, so any other event arriving in that window would be processed before the repaint. - `truncate_line_to_width` for budgets `<= 3` was counting codepoints instead of display widths, overrunning the cell budget for any double-width grapheme. Fix by accumulating display widths consistently. - Add a `tracing::debug!` log to the resize handler so users hitting this in the wild can confirm whether crossterm is delivering the event. Adds two regression tests in `tui/widgets` (resize cycle + cache invalidation on width change) and one in `tui/ui` (truncate semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:42:42 -05:00
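The truncation fix comes down to summing display columns instead of counting `char`s. This dependency-free sketch uses a deliberately simplified `char_display_width` that recognizes only a few East Asian wide ranges. It does not handle zero-width or combining characters, and real code would typically use the `unicode-width` crate. `truncate_to_width` is a hypothetical name, and the sketch omits any ellipsis handling.

```rust
/// Simplified display width (assumption): a few wide ranges count as 2
/// columns, everything else as 1.
fn char_display_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F       // Hangul Jamo
        | 0x2E80..=0x303E     // CJK radicals and punctuation
        | 0x3041..=0x33FF     // kana and CJK compatibility
        | 0x3400..=0x4DBF     // CJK Extension A
        | 0x4E00..=0x9FFF     // CJK Unified Ideographs
        | 0xAC00..=0xD7A3     // Hangul syllables
        | 0xF900..=0xFAFF     // CJK compatibility ideographs
        | 0xFF01..=0xFF60     // fullwidth forms
        | 0x1F300..=0x1F64F => 2, // some emoji
        _ => 1,
    }
}

/// Keeps leading chars while their accumulated display width fits in
/// `max_width` columns.
fn truncate_to_width(s: &str, max_width: usize) -> String {
    let mut used = 0;
    let mut out = String::new();
    for c in s.chars() {
        let w = char_display_width(c);
        if used + w > max_width {
            break;
        }
        used += w;
        out.push(c);
    }
    out
}
```

With a 3-column budget, `"漢字漢"` truncates to `"漢"` (2 columns). Counting codepoints instead would keep all three characters and occupy 6 columns.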
Hunter Bown	6d06595b76	merge: animated working strip (closes #61)	2026-04-26 14:22:37 -05:00
Hunter Bown	70ce26e196	merge: rlm_query parallelism verification + per-child UI (closes #60)	2026-04-26 14:22:32 -05:00
Hunter Bown	2b7800885e	merge: 'deepseek metrics' CLI (closes #70)	2026-04-26 14:22:27 -05:00
Hunter Bown	49673d2ea3	feat(rlm_query): verify parallel fan-out + per-child prompt rendering (closes #60) Introduce `RlmChildClient`, a dyn-compatible `#[async_trait]` wrapper around the single create_message operation, so tests can inject a `MockRlmClient` without a live API key. This replaces the direct `Arc<DeepSeekClient>` field with `Arc<dyn RlmChildClient>`, wired transparently via `RlmQueryTool::new`. Concurrency regression test (`rlm_parallel_fanout_overlaps_not_serialized`): fires N=4 children each sleeping 50 ms through `join_all`. Asserts total elapsed < 4×50 ms (serial bound) and that all start timestamps cluster within 50 ms of each other. First run: total_elapsed=54 ms, start_spread=141 µs; fan-out was already correct, so no serialization fix was needed. UI wiring tests (`rlm_query_tool_cell_wired_with_prompts_on_start` etc.) verify that `handle_tool_call_started` with `rlm_query` populates `GenericToolCell.prompts` from the `prompts` (array) and `prompt` (singular) input shapes, and that non-fan-out tools leave `prompts: None`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:21:43 -05:00
Hunter Bown	9804c92c21	feat(cli): add 'deepseek metrics' command (closes #70) Implement `deepseek metrics` as a dispatcher-handled subcommand (no TUI binary roundtrip) that reads ~/.deepseek/audit.log, session JSON files, and tasks runtime JSONL event streams, then prints a human-readable usage rollup aggregated by tool name, compaction events, sub-agent spawns, and capacity-controller interventions. Flags: --json (machine-readable) and --since DURATION (e.g. 7d, 24h, 30m, now-2h, 2h30m). An empty or missing audit log exits 0 with an empty rollup; malformed lines are skipped and logged only at trace level via tracing::trace!. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:17:58 -05:00
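Here is a sketch of how `--since` values like those listed could be parsed into a `std::time::Duration`. It assumes only the `d`, `h`, and `m` units from the examples and treats a `now-` prefix as optional. `parse_since` is a hypothetical name, not the CLI's actual function.

```rust
use std::time::Duration;

/// Hypothetical `--since` parser: accepts `7d`, `24h`, `30m`, `2h30m`, and
/// an optional `now-` prefix (`now-2h`). Only d/h/m units are recognized.
fn parse_since(input: &str) -> Option<Duration> {
    let spec = input.trim();
    let spec = spec.strip_prefix("now-").unwrap_or(spec);
    if spec.is_empty() {
        return None;
    }
    let mut total_secs: u64 = 0;
    let mut digits = String::new();
    for c in spec.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
            continue;
        }
        let unit_secs: u64 = match c {
            'd' => 86_400,
            'h' => 3_600,
            'm' => 60,
            _ => return None, // unknown unit
        };
        // A unit with no preceding number ("h") fails to parse here.
        let n: u64 = digits.parse().ok()?;
        total_secs = total_secs.checked_add(n.checked_mul(unit_secs)?)?;
        digits.clear();
    }
    // A trailing number without a unit ("15") is rejected.
    if !digits.is_empty() {
        return None;
    }
    Some(Duration::from_secs(total_secs))
}
```

The cutoff would then presumably be `SystemTime::now() - duration`, with older audit entries excluded from the rollup.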
Hunter Bown	7d0450f541	feat(tui): animated water-spout working strip in the footer (closes #61) Replace the single-spout bounce animation with two independent `╭───╮` arcs sweeping at different speeds across a calm `─` water surface. Add `footer_working_label` to pulse `working` → `working...` at 400 ms cadence while a turn is live. The dot-pulse fires even in low-motion mode; the arc strip is gated behind `!app.low_motion`. Frame math is purely deterministic so the test suite can pin specific frames. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:17:17 -05:00
Hunter Bown	9bc8eee927	feat(tui): animated water-spout working strip in the footer (closes #61) Replace the single-spout bounce animation with two independent `╭───╮` arcs sweeping at different speeds across a calm `─` water surface. Add `footer_working_label` to pulse `working` → `working...` at 400 ms cadence while a turn is live. The dot-pulse fires even in low-motion mode; the arc strip is gated behind `!app.low_motion`. Frame math is purely deterministic so the test suite can pin specific frames. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:17:02 -05:00
Hunter Bown	ebc70176ad	feat(tui): bracketed-paste config toggle + capacity hot-path bench Closes #77, refs #75. #77: bracketed paste was unconditionally enabled at terminal init. Add a `bracketed_paste` field to Settings (default true) and propagate it through TuiOptions → App → run_tui / pause_terminal / resume_terminal so users on the rare terminal that mishandles `\e[?2004h` can disable it via `/set bracketed_paste off` or `bracketed_paste = false` in `~/.config/deepseek/settings.toml`. Modern terminals continue to work as before. All TuiOptions construction sites updated in one pass. #75: added an ignored-test microbench for `compute_profile` in `crates/tui/src/core/capacity.rs`. Run with: `cargo test -p deepseek-tui --release bench_compute_profile -- --ignored --nocapture`. Baseline (release, M1): window=16: 48 ns/call; window=64: 126 ns/call; window=256: 385 ns/call; window=1024: 1438 ns/call. Sub-µs at typical window sizes, so no optimization shipped; the bench locks in the regression contract. No new dev-deps (uses std::time::Instant + black_box, gated as #[ignore]).	2026-04-26 14:10:50 -05:00
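The toggle decides whether the bracketed-paste mode sequence is written at init and undone at pause. That sequence is xterm private mode 2004: `ESC [ ? 2004 h` enables it and `ESC [ ? 2004 l` disables it. This std-only sketch uses a hypothetical one-field `Settings` and hypothetical helper names. The real code threads the flag through `TuiOptions` and may emit these modes through its terminal library rather than as raw strings.

```rust
/// Bracketed paste mode (xterm private mode 2004) on/off sequences.
const ENABLE_BRACKETED_PASTE: &str = "\x1b[?2004h";
const DISABLE_BRACKETED_PASTE: &str = "\x1b[?2004l";

/// Illustrative subset of the settings; only the new field is modeled.
struct Settings {
    bracketed_paste: bool,
}

impl Default for Settings {
    fn default() -> Self {
        // The commit keeps bracketed paste on by default.
        Settings { bracketed_paste: true }
    }
}

/// Sequence to write on init/resume, or None if the user opted out.
fn on_init_or_resume(settings: &Settings) -> Option<&'static str> {
    settings.bracketed_paste.then_some(ENABLE_BRACKETED_PASTE)
}

/// Sequence to write on pause/exit; only undo a mode that was enabled.
fn on_pause(settings: &Settings) -> Option<&'static str> {
    settings.bracketed_paste.then_some(DISABLE_BRACKETED_PASTE)
}
```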
Hunter Bown	432082e956	docs(agents): document `deepseek` as the canonical CLI binary The user-facing entry point for every flow is the `deepseek` dispatcher (crates/cli), not `deepseek-tui`. Future agent sessions and example commands should default to `deepseek` / `cargo run --bin deepseek`. Mirror the same directive in the local CLAUDE.md (gitignored).	2026-04-26 14:01:30 -05:00
Hunter Bown	ac1332565c	release: v0.6.2 Highlights: - fix(client): SSE idle-timeout so a stalled stream surfaces a clear error instead of hanging the active cell (#76) - fix(tui): sidebar Agents panel reads live engine progress, not just the cached snapshot, so it matches the footer chip in real time (#63) - fix(tui): generic tool result preview preserves newlines for diff stats / file lists / todo snapshots (#80) - fix(tui): slash-menu scroll viewport now exercises center-tracking past the first 6 entries (#64) - feat(mcp): connect-failure errors include URL, status, body excerpt, and transport, with credentials masked (#71) - feat(tools): mark alias tools (spawn_agent, close_agent, send_input, delegate_to_agent) with _deprecation metadata; removal slated for 0.8.0 (#72) - feat(capacity): V4 model priors (deepseek-v4-pro/flash) + key normalization, plus DEEPSEEK_CAPACITY_PRIOR_V4_* env overrides (#73) - feat(tools): explain parallel fan-out caps in agent_spawn vs rlm_query descriptions and error messages, plus a cost-class table in TOOL_SURFACE.md (#81) - chore(errors): partial wiring of the error taxonomy: classify_error_message helper used in capacity controller, audit log fields pending (#66) - chore(providers): scaffold OpenRouter and Novita variants end-to-end (env keys, default base URLs, model normalization). Modal /provider picker UI still pending (#52) Build hygiene: - cargo fmt clean, cargo clippy --workspace -- -D warnings clean - cargo test --workspace passes (979+ tests across crates) - pre-existing dead-code warnings gated per-item with TODO refs to #61/#66	2026-04-26 13:56:40 -05:00
Hunter Bown	3375fc7285	merge: explain parallel fan-out caps (fixes #81; was PR #82)	2026-04-26 13:55:21 -05:00
Hunter Bown	1107b723b1	chore: simplify pass + clippy clean for v0.6.2 Cleanup pass after the issue fixes (#64, #71, #80, #63): Simplifications: - sidebar.rs: extract `push_agent_row` closure to remove the duplicated two-line agent rendering (cached + progress-only paths used the same shape with different summary text). - engine.rs: replace `error_categories.iter().any(|c| c == X)` with `.contains(&X)` (clippy::manual_contains). - widgets/mod.rs: replace `for idx in menu_top..menu_bottom` index loop with `.iter().enumerate().take(menu_bottom).skip(menu_top)` (clippy::needless_range_loop). Build hygiene (CI runs `cargo clippy ... -- -D warnings`): - error_taxonomy.rs: per-item `#[allow(dead_code)]` on `ErrorSeverity`, `ErrorEnvelope`, and `ErrorEnvelope::new` with TODO notes referencing #66. Keeps deepseek's removal of the file-wide allow but stops the scaffold from breaking the build until #66 follows up. - app.rs: per-field `#[allow(dead_code)]` on `fancy_animations` (pending #61 footer animation consumer). - config/lib.rs: complete the OpenRouter/Novita variant scaffolding so `match ProviderKind { ... }` is exhaustive: add api_key/base_url env loading (`OPENROUTER_API_KEY`, `NOVITA_API_KEY`, optional `*_BASE_URL` overrides), wire `api_key_for` / `base_url_for` arms with the documented defaults, and extend `normalize_model_for_provider` so generic V4 model names map to each provider's catalog ID. Full /provider picker UI still pending #52. Verified: cargo fmt clean, cargo clippy --workspace --all-targets --all-features --locked -- -D warnings clean, full test suite passes (979 + adjacent crate tests).	2026-04-26 13:54:54 -05:00
Hunter Bown	124011a862	fix(tui): sidebar Agents panel reads live progress, not just cache (closes #63) Repro: spawn 5 sub-agents. The footer chip correctly shows "5 agents" because running_agent_count() unions app.agent_progress (live engine events) with app.subagent_cache (settled snapshot from Op::ListSubAgents). The sidebar's Agents panel only read app.subagent_cache and so showed "No agents" while the footer said 5 (the same data-flow bug the user screenshotted in #63). Mirror the footer's union here: - Live progress-only IDs (in agent_progress, not yet in subagent_cache) get a one-line "starting" row with the latest progress message, which surfaces the freshest signal first. - Cached entries get the full status row (steps taken, role, objective). - Header shows "{live_running} running / {total}" with both counts unified. The Agents panel now stays in sync with the footer chip and never lies about whether agents are in flight. The Todos panel was already wired correctly to app.todos (the SharedTodoList lock); only the agents path was racing. Refs #63	2026-04-26 13:48:28 -05:00
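The union logic can be sketched with ordered maps keyed by agent ID. Everything here is hypothetical: `CachedAgent`, `agent_rows`, the row text, and the choice to count progress-only IDs as running. The sketch only shows how unioning the two sources keeps the header consistent with the footer chip.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Illustrative settled entry, as a snapshot list would report it.
struct CachedAgent {
    running: bool,
}

/// Builds Agents-panel rows plus a "{running} running / {total}" header.
/// Progress-only IDs (not yet cached) come first as "starting" rows.
fn agent_rows(
    live_progress: &BTreeMap<String, String>, // id -> latest progress message
    cached: &BTreeMap<String, CachedAgent>,
) -> (Vec<String>, String) {
    let mut rows = Vec::new();
    for (id, msg) in live_progress {
        if !cached.contains_key(id) {
            rows.push(format!("{id}: starting ({msg})"));
        }
    }
    for (id, agent) in cached {
        let state = if agent.running { "running" } else { "done" };
        rows.push(format!("{id}: {state}"));
    }
    // Union of both sources, so the counts match the footer chip.
    let all: BTreeSet<&str> = live_progress
        .keys()
        .chain(cached.keys())
        .map(String::as_str)
        .collect();
    // Assumption: an ID seen only in live progress counts as running.
    let running = all
        .iter()
        .filter(|id| cached.get(**id).map_or(true, |a| a.running))
        .count();
    (rows, format!("{running} running / {}", all.len()))
}
```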
Hunter Bown	f342d6508e	fix(tui): preserve newlines in generic tool result preview (closes #80) Before, GenericToolCell rendered its `output` through `render_compact_kv`, which treated the entire string as one logical line and let the wrapper handle overflow. Multi-line output (git diff --stat, todo snapshots, file lists) ended up squashed into a single hard-wrapped blob; the screenshot in the issue showed "Cargo.lock | 1 + crates/cli/Cargo.toml | 1 + crates/cli/src/main.rs" all on one row. Switch the result rendering to `render_tool_output_mode` (already used by ExecCell) which: - splits on `\n` first, then wraps each line independently; - caps live view at TOOL_OUTPUT_LINE_LIMIT (= 6) rows with a "+N more lines; press v for details" affordance; - emits the full body in transcript view. Threaded `RenderMode` through `ToolCell::Generic(...)` dispatch and renamed `GenericToolCell::lines_with_motion` → `lines_with_mode(mode)` (sole caller). Tests: - `generic_tool_cell_preserves_multi_line_output_in_transcript` asserts each diff-stat file lands on its own row. - `generic_tool_cell_caps_multi_line_output_in_live_with_affordance` pins the live cap + affordance + transcript-includes-everything contract. Fixes #80	2026-04-26 13:44:51 -05:00
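Here is a simplified version of the split-then-cap rendering. It uses a hypothetical `render_tool_output` that skips per-line wrapping. Only the limit of 6 and the affordance text come from the commit.

```rust
/// Live-view row cap quoted in the commit.
const TOOL_OUTPUT_LINE_LIMIT: usize = 6;

#[derive(Clone, Copy, PartialEq, Eq)]
enum RenderMode {
    Live,
    Transcript,
}

/// Splits on newlines first, then caps live output with an affordance row.
/// Per-line wrapping is omitted from this sketch.
fn render_tool_output(output: &str, mode: RenderMode) -> Vec<String> {
    let lines: Vec<&str> = output.lines().collect();
    if mode == RenderMode::Transcript || lines.len() <= TOOL_OUTPUT_LINE_LIMIT {
        return lines.into_iter().map(String::from).collect();
    }
    let hidden = lines.len() - TOOL_OUTPUT_LINE_LIMIT;
    let mut rows: Vec<String> = lines[..TOOL_OUTPUT_LINE_LIMIT]
        .iter()
        .map(|line| line.to_string())
        .collect();
    rows.push(format!("+{hidden} more lines; press v for details"));
    rows
}
```

An 8-line `git diff --stat` therefore shows 6 rows plus `+2 more lines; press v for details` in the live view, and all 8 rows in the transcript.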
Hunter Bown	ec92e535e8	feat(mcp): surface URL, status, body excerpt, transport on connect failure Before: a failed MCP server connection just said "Failed to connect to SSE: 401" or "Failed to spawn MCP server 'foo'" — devs had to enable RUST_LOG=debug to see what actually went wrong. Now: - SSE failures show "MCP SSE rejected (transport=http url=... status=401): <body excerpt up to 200 bytes>", with userinfo + bearer tokens + api_key query params masked. - stdio spawn failures show "MCP stdio spawn failed (transport=stdio server=foo cmd="..." args=[...] env_keys=[...])" — env values stay private, only keys leak. Helpers `mask_url_secrets`, `redact_body_preview`, `bounded_body_excerpt` are covered by 4 unit tests. Fixes #71	2026-04-26 13:40:07 -05:00
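Two of the helpers named above can be sketched with plain string handling. Both are hypothetical stand-ins. This `mask_url_secrets` masks userinfo and `api_key` query values but not bearer tokens. `bounded_body_excerpt` backs off to a UTF-8 character boundary so the slice never panics mid-character.

```rust
/// Masks `user:pass@` userinfo and `api_key` query values in a URL.
fn mask_url_secrets(url: &str) -> String {
    // Keep the scheme so '@' is only searched for inside the authority.
    let (scheme, rest) = match url.find("://") {
        Some(i) => url.split_at(i + 3),
        None => ("", url),
    };
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(authority_end);
    let authority = match authority.rfind('@') {
        Some(at) => format!("***@{}", &authority[at + 1..]),
        None => authority.to_string(),
    };
    let tail = match tail.split_once('?') {
        Some((path, query)) => {
            let pairs: Vec<String> = query
                .split('&')
                .map(|pair| match pair.split_once('=') {
                    Some(("api_key", _)) => "api_key=***".to_string(),
                    _ => pair.to_string(),
                })
                .collect();
            format!("{path}?{}", pairs.join("&"))
        }
        None => tail.to_string(),
    };
    format!("{scheme}{authority}{tail}")
}

/// Cuts `body` to at most `max_bytes` without splitting a UTF-8 character.
fn bounded_body_excerpt(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```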
Hunter Bown	86f59cd2c2	merge: slash-menu scroll viewport fix (fixes #64)	2026-04-26 13:37:30 -05:00
Hunter Bown	320325e419	fix(tui): bump SLASH_MENU_LIMIT to 128 so the scroll viewport works The composer's render path already paginates with center-tracking, but the source list was hard-capped at 6 entries — so pressing Down arrow past index 5 had no entries to land on. Repro: with ~37 slash commands, hitting Down repeatedly stuck at the last visible row. Bumping the source cap to 128 lets the existing viewport scroll logic exercise the full filtered command list. No render-path change needed. Fixes #64	2026-04-26 13:37:29 -05:00
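A hypothetical center-tracking helper shows why the cap mattered. The viewport can only scroll as far as the source list lets the selection move, so 6 entries in a 6-row viewport leave nothing to scroll to. The 6-row viewport and both function names are assumptions for illustration.

```rust
/// Hypothetical center-tracking viewport: index of the first visible entry
/// that keeps `selected` near the middle, clamped so the window stays full.
fn viewport_start(selected: usize, total: usize, visible: usize) -> usize {
    if total <= visible {
        return 0;
    }
    selected.saturating_sub(visible / 2).min(total - visible)
}

/// Source list truncated to the menu limit (6 before the fix, 128 after).
fn capped<T: Clone>(entries: &[T], limit: usize) -> Vec<T> {
    entries.iter().take(limit).cloned().collect()
}
```

With 37 commands and the old cap, the selection tops out at index 5 and the window never moves. With the 128 cap, selecting index 20 scrolls the window to start at 17.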
Hunter Bown	feb3cf1e0c	feat: explain parallel fan-out caps in tool descriptions and error messages (fixes #81)	2026-04-26 13:16:12 -05:00
Hunter Bown	38069700cc	chore: wip capacity canonical state + tool alias deprecation	2026-04-26 13:11:57 -05:00
Hunter Bown	2adbe398ba	merge: tool alias deprecation metadata (fixes #72)	2026-04-26 12:55:17 -05:00
Hunter Bown	4f18809d74	merge: V4 capacity priors (fixes #73)	2026-04-26 12:53:31 -05:00
Hunter Bown	c58d10ded1	feat(tools): mark alias tools with deprecation metadata Add `wrap_with_deprecation_notice` helper in the subagent module that merges a `_deprecation` block into a ToolResult's metadata. Applied exclusively on alias invocations: - `spawn_agent` → use `agent_spawn` (removed in v0.8.0) - `delegate_to_agent` → use `agent_spawn` (removed in v0.8.0) - `close_agent` → use `agent_cancel` (removed in v0.8.0) - `send_input` → use `agent_send_input` (removed in v0.8.0) Canonical names are unaffected. Each alias invocation also emits a `tracing::warn` so the deprecation appears in audit logs. Documents the deprecation schedule in `docs/TOOL_SURFACE.md`. Four unit tests verify the notice shape and that canonical tools stay clean. Refs #72 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:32:26 -05:00
Hunter Bown	cf9fdef9d8	fix(capacity): add V4 model priors and key normalization Add deepseek_v4_pro (3.5) and deepseek_v4_flash (4.2) priors to CapacityControllerConfig::default() so V4 models are no longer silently mapped to the generic 3.8 fallback. Extend normalize_model_prior_key to match v4-pro, v4_pro, v4-flash, v4_flash, and deepseek-ai/-prefixed NIM identifiers before the V3/reasoner branches to prevent cross-matches. V3 and reasoner fallbacks are unchanged. Add deepseek_v4_pro_prior / deepseek_v4_flash_prior fields to CapacityConfig (config.toml) and DEEPSEEK_CAPACITY_PRIOR_V4_PRO / DEEPSEEK_CAPACITY_PRIOR_V4_FLASH env-var overrides, matching the existing V3 pattern. Refs #73 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:28:21 -05:00
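The ordering concern is that V4 patterns must match before the V3/reasoner branches. Here is a sketch of that normalization step. `normalize_v4_prior_key` and `prior_for` are hypothetical. Only the V4 patterns and the 3.5 / 4.2 / 3.8 values come from the commit, and the existing V3/reasoner handling is collapsed into the generic fallback here.

```rust
/// Returns a V4 prior key, or None to defer to the existing V3/reasoner
/// matching. Lowercased substring checks also cover `deepseek-ai/`-prefixed
/// NIM identifiers.
fn normalize_v4_prior_key(model: &str) -> Option<&'static str> {
    let m = model.to_ascii_lowercase();
    if m.contains("v4-pro") || m.contains("v4_pro") {
        Some("deepseek_v4_pro")
    } else if m.contains("v4-flash") || m.contains("v4_flash") {
        Some("deepseek_v4_flash")
    } else {
        None
    }
}

/// Prior lookup; V3/reasoner priors are omitted, so they fall back to 3.8.
fn prior_for(model: &str) -> f64 {
    match normalize_v4_prior_key(model) {
        Some("deepseek_v4_pro") => 3.5,
        Some("deepseek_v4_flash") => 4.2,
        _ => 3.8, // generic fallback
    }
}
```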
Hunter Bown	e9970fcad3	ci: switch npm publish to NPM_TOKEN + add auto-tag workflow The OIDC Trusted Publisher path for npm has 404'd on PUT for v0.5.1, v0.5.2, and v0.6.1, even with valid OIDC tokens. Switch publish-npm and publish-npm-manual to a classic NPM_TOKEN automation token (set the NPM_TOKEN repo secret to a granular access token scoped to deepseek-tui with publish permission) so future releases ship reliably. Also add .github/workflows/auto-tag.yml: when the workspace version on main changes, push the matching v$VERSION tag automatically so release.yml fires without a manual tag push. Requires a RELEASE_TAG_PAT secret to trigger downstream workflows (GITHUB_TOKEN tag pushes don't trigger on: push: tags by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:22:15 -05:00
Hunter Bown	e1ac84ae44	release: v0.6.1 — pricing update, remove light theme + theme setting - V4 cache-hit input prices cut to 1/10th per DeepSeek pricing update: Pro promo 0.03625→0.003625, Pro base 0.145→0.0145, Flash 0.028→0.0028 - Remove the 'light' theme variant (Variant::Light, Theme::light(), test) - Remove the theme setting entirely — hardcode UI_THEME to whale/dark, drop the theme field from Settings, ConfigView, and config command - Bump workspace version 0.6.0 → 0.6.1 (Cargo.toml, npm pkg, CHANGELOG) - De-cringe the README: drop emojis, marketing fluff, unverified claims	2026-04-26 11:56:41 -05:00
Hunter Bown	c5a584d5c3	refactor(client): extract chat + responses into folder module (P1.1) Split client.rs into client/mod.rs (public API + helpers), client/chat.rs (chat-completions streaming), and client/responses.rs (responses API helpers). Internal helpers promoted to pub(super) for intra-module visibility; the public DeepSeekClient API is unchanged. While here, redesign all five system prompts around decomposition-first philosophy inspired by the mismanaged-geniuses hypothesis (Zhang et al., 2026). The model is now instructed to todo_write / update_plan before acting, fan out sub-agents for parallel work, and keep the sidebar populated so the user always sees what's happening. Mode prompts updated: - agent.txt: 'Before requesting approval, lay out work with todo_write' - plan.txt: 'Use update_plan for strategy, todo_write for tactics' - yolo.txt: 'Even with auto-approval, create a todo_write first' - normal.txt: same pattern for legacy compatibility Update CHANGELOG [Unreleased] and README modes section accordingly.	2026-04-26 11:39:44 -05:00
Hunter Bown	1a100fe96c	refactor(core): carve approval + dispatch helpers out of engine.rs (P1.3) Splits `core/engine.rs` (4670 → 4314 lines) into a small folder module: - `engine/approval.rs` (~125 lines) — `ApprovalDecision`, `UserInputDecision`, `ApprovalResult`, plus the two handshake methods `Engine::await_tool_approval` and `Engine::await_user_input`. - `engine/dispatch.rs` (~300 lines) — tool-input parsing (`final_tool_input`, `parse_tool_input`, fenced/JSON segment helpers), `multi_tool_use.parallel` payload parser, dispatch policy predicates (`should_parallelize_tool_batch`, `should_force_update_plan_first`, `should_stop_after_plan_tool`, the read-only MCP tool helpers), and the `ToolExecutionPlan`/`ToolExecOutcome`/`ParallelToolResult`/ `ToolExecGuard` types the batch driver passes around. The public engine surface (`EngineConfig`, `EngineHandle`, `spawn_engine`, `MockEngineHandle`, `mock_engine_handle`, `compact_tool_result_for_context`, `TOOL_CALL__MARKERS`, `FAKE_WRAPPER_NOTICE`) stays in `engine.rs` — every external user imports unchanged. Not split this round: the 1268-line `handle_deepseek_turn` method. Carving its inline parallel/sequential dispatch and approval handshake arms requires extracting two new methods from a borrow-heavy turn loop; flagged in the v0.6.0 audit doc as future work. Workspace tests: 1011/1011 still green. No clippy regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:57:27 -05:00
Hunter Bown	25cfe11736	refactor(tui): extract slash-menu helpers into tui/slash_menu.rs (P1.2) Lifts `visible_slash_menu_entries`, `apply_slash_menu_selection`, and `try_autocomplete_slash_command` from `tui/ui.rs` into a sibling module. Drops the now-unused `slash_completion_hints` import from `ui.rs` (the new module imports it directly). Kept separate from `tui::file_mention` per the audit doc — the two popups have distinct trigger characters, ranking, and post-selection behaviour even though they share UI scaffolding. `ui.rs`: ~5070 → ~4990 lines. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:47:44 -05:00
Hunter Bown	56308bb5d7	refactor(tui): extract paste-burst handlers into tui/paste.rs (P1.2) Lifts `handle_paste_burst_key`, `handle_paste_burst_decision`, `apply_paste_burst_retro_capture`, and the local `in_command_context` helper out of `tui/ui.rs` into a sibling module. The state machine (`PasteBurst`) and its tests stay in `paste_burst.rs`; only the keymap- side wiring moves. Drops the now-unused `CharDecision` import from `ui.rs`. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:45:35 -05:00
Hunter Bown	4138053dd8	refactor(tui): extract sidebar rendering into tui/sidebar.rs (P1.2) Moves the four sidebar panels (Plan, Todos, Tasks, Agents) plus the shared `render_sidebar_section` wrapper out of `tui/ui.rs` into a new sibling module. `truncate_line_to_width` becomes `pub(crate)` so the new module can reuse it. Drops six imports from `ui.rs` that the sidebar took with it. `ui.rs`: 5450 → ~5070 lines. Workspace tests: 1011/1011 still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:43:43 -05:00
Hunter Bown	5d51143194	feat(tui): group thinking + tool calls in one active cell (P2.3) Routes Thinking content through `active_cell` so a turn that emits Thinking → Tool → Tool renders as one logical "Working…" block until the next assistant prose chunk flushes the group into history. - `ActiveCell::push_thinking` parallels `push_tool` for non-tool entries. - `mark_in_progress_as_interrupted` now also stops streaming Thinking spinners on cancellation, matching tool cell behaviour. - New `streaming_thinking_active_entry` field on `App` tracks the in-flight thinking entry index so deltas can mutate it in place. - `flush_active_cell` finalizes any unclosed thinking spinner before draining the group into history (defensive guard). - Removed the dead `StreamingCellKind::Thinking` variant and tightened `append_streaming_text` to Assistant only. Tests cover: push_thinking, group ordering, drain order, interrupt- clears-spinner, the full Thinking → Tool → Tool → flush flow, defensive flush of an unclosed thinking block, and a second thinking block appending inside the same active cell. Workspace tests: 1004/1004 → 1011/1011. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:40:05 -05:00
Hunter Bown	ca7ca9f75f	docs: drop stale handoff/migration/parity docs Removed: - `.claude/next-agent-prompt.md` (111 lines) — v0.4.6-era session prompt describing slices A/B/C that have all shipped. Successive sessions use fresh prompts (e.g. .deepseek/v0.6.0-overnight-review.md); this one is pure history. - `docs/archive/workspace_migration_status.md` (92 lines) — explicitly archived (April 11), describes a one-time migration that's complete. Removed enclosing `docs/archive/` directory too (was the only file). CHANGELOG entry from v0.4.x still narrates the archival as history. - `docs/parity_release_and_ci.md` (38 lines) — duplicates what `.github/workflows/parity.yml` and CONTRIBUTING.md already say authoritatively. Single source of truth wins. - `AI_HANDOFF.md` + `todo.md` (untracked, no commit needed) — `todo.md` was a 7-line pointer to AI_HANDOFF.md, which itself was an April 11 snapshot listing "remaining work" that's mostly delivered. CLAUDE.md is the live developer guide now. 1004/1004 tests still green; no doc/code references broken. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:27:58 -05:00
Hunter Bown	d6bfcda474	chore: drop unreferenced assets/hero.png Not referenced from README.md, docs/, npm/, or any Cargo metadata. README uses assets/screenshot.png. Reduces repo size by 226 KB. Also cleaned up working-directory cruft (untracked, no commit needed): apps/ (empty), python/ (empty after egg-info removed), counterpoint.copilot.db, firebase-debug.log, excalidraw.log, .DS_Store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:23:08 -05:00
Hunter Bown	f3df5e515e	docs(changelog): roll up Phase 2/4 polish — agents chip, mention popup, P2.4 tests, subagent split, parse-counter de-flake Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:16:30 -05:00
Hunter Bown	a4f4f5040f	style: cargo fmt --all (post-Phase-2/4 cleanup) Auto-format pass after the tool-call rendering work, footer chip, mention popup, subagent split, and parse-counter de-flake. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:15:11 -05:00
Hunter Bown	a02898545d	refactor(tools): split subagent.rs into folder module — start with tests (P1.1) Promote `tools/subagent.rs` (4206 lines) to a folder module: tools/subagent/ mod.rs — runtime types, manager, tool implementations (~3577 lines) tests.rs — extracted test module (~631 lines) This is the safe first step. The audit doc proposed a 4-way split (mod / spec / executor / tests). I tried the 3-way (mod / tools / tests) and the runtime <-> tool-impl coupling produces unresolved-symbol errors because shared helpers (`SubAgentTask`, `run_subagent_task`, `build_allowed_tools`, `normalize_role_alias`, `parse_spawn_request`, the agent prompt constants) are referenced from both layers. Doing that split right needs a small API design pass to decide which helpers graduate to the manager API and which stay tool-private — out of scope for a structural reorg. Pulled the test module out as the cleanest no-API-change win and left a path open for the bigger split later. Public API unchanged — `pub mod subagent;` still exports the same items because `mod.rs` is a drop-in replacement for `subagent.rs`. 954 → 954 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:13:35 -05:00
Hunter Bown	2185b8c3c6	feat(tui): wire up @-mention popup end-to-end (P2.1) The audit doc claimed the wiring was "in place" but only the App state fields existed (`mention_menu_selected`, `mention_menu_hidden`) — no helpers, no widget rendering, no key handling. Building it out fully so the popup actually shows when the user types `@` in the composer and Up/Down/Enter/Tab/Esc behave the way the slash menu does. What's new: 1. `file_mention::visible_mention_menu_entries(app, limit)` — the entries source. Returns `Vec<String>` from the workspace walk, gated on the `mention_menu_hidden` flag and on the cursor being inside an `@token`. 2. `file_mention::apply_mention_menu_selection(app, entries)` — splices the selected entry into the input via the existing `replace_file_mention`, resets `mention_menu_hidden`, surfaces a status confirmation. 3. `ComposerWidget::new(app, max_height, slash_entries, mention_entries)` — second menu slot. The widget renders whichever slice is non-empty, addressed by the matching selected index. Mention entries get an `@` prefix so the popup row reads like the actual mention being composed. Mention takes precedence (positional check is stricter than slash's "starts-with-/"). 4. ui.rs key handler: - Up/Down navigate `mention_menu_selected` when the popup is open. - Enter applies `apply_mention_menu_selection` instead of submitting. - Tab applies the selection (then falls through to the existing slash / command-completion / file-mention chain). - Esc hides the popup until the next input edit (`insert_str` already resets `mention_menu_hidden`, so typing re-opens it). 
6 new tests in `ui/tests.rs`: - mention_popup_is_empty_when_cursor_is_not_in_a_mention - mention_popup_lists_workspace_matches_for_cursor_partial - mention_popup_respects_hidden_flag - apply_mention_menu_selection_splices_selected_entry - apply_mention_menu_selection_is_noop_outside_a_mention - apply_mention_menu_selection_with_no_entries_is_noop Also fixes a stray duplicate `#[cfg(...)]` and an unused-doc-comment warning that landed when the parse-counter went thread-local — back to baseline 7 clippy warnings. 948 → 954 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:09:04 -05:00
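One way the @-mention popup logic from 2185b8c3c6 could work, sketched under stated assumptions: the cursor is a byte offset, tokens are delimited by ASCII whitespace, and entries match by substring. `mention_at_cursor`, `visible_mention_entries`, and `apply_mention_selection` are hypothetical stand-ins for the `file_mention` helpers, which take `&App` rather than raw strings.

```rust
/// Return (token start, token end, partial text after '@') for the
/// `@token` containing the cursor, or None when the cursor is not in one.
fn mention_at_cursor(input: &str, cursor: usize) -> Option<(usize, usize, &str)> {
    // Token start: one past the last ASCII whitespace before the cursor.
    let start = input[..cursor]
        .rfind(|c: char| c.is_ascii_whitespace())
        .map_or(0, |i| i + 1);
    // The cursor must sit after an '@' that opens the token.
    if cursor <= start || !input[start..].starts_with('@') {
        return None;
    }
    // Token end: the next ASCII whitespace at or after the cursor.
    let end = input[cursor..]
        .find(|c: char| c.is_ascii_whitespace())
        .map_or(input.len(), |i| cursor + i);
    Some((start, end, &input[start + 1..cursor]))
}

/// Popup entries: empty when hidden (Esc) or when not inside a mention.
fn visible_mention_entries<'a>(
    input: &str,
    cursor: usize,
    hidden: bool,
    workspace: &[&'a str],
    limit: usize,
) -> Vec<&'a str> {
    if hidden {
        return Vec::new();
    }
    match mention_at_cursor(input, cursor) {
        Some((_, _, partial)) => workspace
            .iter()
            .copied()
            .filter(|path| path.contains(partial))
            .take(limit)
            .collect(),
        None => Vec::new(),
    }
}

/// Splice the selected entry over the current `@token`; returns the new
/// input and cursor, or None (a no-op) outside a mention or with no entry.
fn apply_mention_selection(
    input: &str,
    cursor: usize,
    entries: &[&str],
    selected: usize,
) -> Option<(String, usize)> {
    let (start, end, _) = mention_at_cursor(input, cursor)?;
    let choice = entries.get(selected)?;
    let mut out = String::with_capacity(input.len() + choice.len());
    out.push_str(&input[..start]);
    out.push('@');
    out.push_str(choice);
    let new_cursor = out.len();
    out.push_str(&input[end..]);
    Some((out, new_cursor))
}
```

Typing `open @src/ma please` with the cursor after `ma` lists `src/main.rs`; applying it yields `open @src/main.rs please` with the cursor right after the path.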
Hunter Bown	06355e3aea	test(tui): pin auto-scroll churn contract for P2.4 regression coverage Audit pass found the auto-scroll paths are already gated correctly: - `mark_history_updated` only bumps history_version + needs_redraw — does NOT scroll. - All tool-cell handlers (`handle_tool_call_started`, `handle_tool_call_complete`, `push_active_tool_cell`, `register_tool_cell`) call `mark_history_updated` only — none of them call `scroll_to_bottom`. - `add_message` and `flush_active_cell` gate their auto-scroll on `user_scrolled_during_stream`. - The per-stream lock clears at TurnComplete (ui.rs ~557) and when the user scrolls back to the live tail (widgets/mod.rs ~126). - Explicit user actions (vim G, End, session resume, message submit) call `scroll_to_bottom` directly — that's correct. 5 new regression tests in ui/tests.rs lock the contract so a future contributor adding `app.scroll_to_bottom()` to a tool-cell handler hits a red CI immediately: - add_message_does_not_scroll_when_user_scrolled_away - add_message_pins_to_tail_when_user_was_following - tool_call_started_does_not_scroll_when_user_scrolled_away - tool_call_complete_does_not_scroll_when_user_scrolled_away - mark_history_updated_does_not_call_scroll_to_bottom 948 → 948 (no changes; tests were already passing — they just weren't written yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:43 -05:00
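The contract pinned by 06355e3aea can be modeled as follows. This is a hypothetical, stripped-down `Transcript`, because the real state lives on `App`. Only the method and field names quoted in the commit are borrowed. The call counter exists purely so the rule can be checked: history updates never scroll, and `add_message` scrolls only when the user has not scrolled away during the stream.

```rust
/// Hypothetical model of the auto-scroll state described in the commit.
#[derive(Default)]
struct Transcript {
    history_version: u64,
    needs_redraw: bool,
    user_scrolled_during_stream: bool,
    scroll_to_bottom_calls: u32, // test-only instrumentation
}

impl Transcript {
    /// Bumps the version and requests a redraw; deliberately never scrolls.
    fn mark_history_updated(&mut self) {
        self.history_version += 1;
        self.needs_redraw = true;
    }

    fn scroll_to_bottom(&mut self) {
        self.scroll_to_bottom_calls += 1;
    }

    /// Auto-scroll is gated on the per-stream "user scrolled away" lock.
    fn add_message(&mut self) {
        self.mark_history_updated();
        if !self.user_scrolled_during_stream {
            self.scroll_to_bottom();
        }
    }

    /// Tool-cell handlers only mark history as updated.
    fn handle_tool_call_started(&mut self) {
        self.mark_history_updated();
    }

    /// The lock clears when the turn completes.
    fn on_turn_complete(&mut self) {
        self.user_scrolled_during_stream = false;
    }
}
```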
Hunter Bown	75de26c7a1	test(tui): de-flake parse-invocation counter via thread-local `parse_invocations_increment` and `render_parsed_does_not_call_parse` both read the global PARSE_INVOCATIONS atomic. They were racing whenever any other test in the suite called `parse()` in parallel: the global counter would tick once for each unrelated call, and the assertion (== 2 / == 0) would mismatch. Switching to a `thread_local!` `Cell<u64>` gives each test thread its own counter, so concurrent callers from other tests can't pollute the result. Tested across 8 sequential full-suite runs: 8/8 green (was ~40% green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:32 -05:00
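A minimal sketch of the thread-local counter pattern from 75de26c7a1. `record_parse`, `parse_count`, `reset_parse_count`, and the placeholder `parse` are hypothetical names. The point is that each thread gets its own `Cell<u64>`, so parses on other threads never show up in the calling thread's count.

```rust
use std::cell::Cell;

thread_local! {
    // One counter per thread: parallel tests cannot bump each other's count.
    static PARSE_INVOCATIONS: Cell<u64> = Cell::new(0);
}

fn record_parse() {
    PARSE_INVOCATIONS.with(|c| c.set(c.get() + 1));
}

fn parse_count() -> u64 {
    PARSE_INVOCATIONS.with(|c| c.get())
}

fn reset_parse_count() {
    PARSE_INVOCATIONS.with(|c| c.set(0));
}

/// Placeholder for the real parser: only the counting matters here.
fn parse(src: &str) -> usize {
    record_parse();
    src.len()
}
```

Unlike a global `AtomicU64`, this works only because each test calls `parse()` on its own thread. Code that parses on a spawned worker would not be counted by the test thread.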
Hunter Bown	9467d26db7	feat(tui): surface in-flight sub-agents in the footer status strip (P2.5) FooterProps gains an `agents` chip slot, populated by `footer_agents_chip` which mirrors the rest of the footer chips: empty `Vec<Span>` when `running_agent_count == 0` (chip hides), "1 agent" / "N agents" otherwise, DeepSeek-sky color matching the model badge. The widget's `auxiliary_spans` includes it in the same drop-from-end fit-to-width chain as the existing chips, so on narrow terminals the cost chip drops first as before. The "0 running" wording the audit doc called out wasn't actually in FooterProps — that wording is in the agent sidebar (ui.rs ~2960) and was already fixed there to swap to "N done" once nothing is in flight. So the P2.5 work here is the additive footer surface, not a wording fix. 4 new tests in widgets/footer.rs: - footer_agents_chip_is_empty_when_no_agents_running - footer_agents_chip_uses_singular_for_one - footer_agents_chip_uses_plural_for_many - footer_agents_chip_renders_into_widget 939 → 943 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:54:03 -05:00
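A string-only sketch of the agents chip and the drop-from-end fit from 9467d26db7. The real `footer_agents_chip` returns styled ratatui `Span`s. `fit_chips`, the separator, and measuring width as a `char` count (rather than terminal display width) are assumptions made for illustration. Because the cost chip is placed last, it is the first to drop on a narrow terminal.

```rust
/// Hypothetical string version: empty when nothing runs, so the chip hides.
fn footer_agents_chip(running_agent_count: usize) -> Vec<String> {
    match running_agent_count {
        0 => Vec::new(),
        1 => vec!["1 agent".to_string()],
        n => vec![format!("{n} agents")],
    }
}

/// Width of the chips joined by `sep`, counted in chars (an approximation).
fn rendered_width(chips: &[String], sep: &str) -> usize {
    let text: usize = chips.iter().map(|c| c.chars().count()).sum();
    text + sep.chars().count() * chips.len().saturating_sub(1)
}

/// Drop chips from the end until the rest fit in `max_width`.
fn fit_chips(chips: &[String], max_width: usize, sep: &str) -> Vec<String> {
    // Hidden chips (empty strings) take no space.
    let mut kept: Vec<String> = chips.iter().filter(|c| !c.is_empty()).cloned().collect();
    while !kept.is_empty() && rendered_width(&kept, sep) > max_width {
        kept.pop(); // lowest-priority (last) chip goes first
    }
    kept
}
```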
Hunter Bown	93efb09038	fix(tui): tool-call rendering — defer ToolCallStarted, progressive labels, elapsed badge The engine used to fire `Event::ToolCallStarted` from `ContentBlockStart::ToolUse` with `input: json!({})` — before any `Delta::InputJsonDelta` had streamed in. The UI's `handle_tool_call_started` baked the placeholder into the cell at creation time and never refreshed, so users saw `<command>` and `<file>` literals while the args finished streaming. Fix relocates the emission to `ContentBlockStop` (where the input is finalized already) and routes it through a new `final_tool_input(state)` helper that prefers the parsed buffer over a stale empty initial input. Three regression tests in `engine/tests.rs` pin the contract. Also bundled (same theme — make in-flight tool cells read right): - Progressive labels via `exploring_label`: "Read foo.rs" → "Reading foo.rs", "List X" → "Listing X", "Search pattern" → "Searching for `pattern`", "List files" → "Listing files". 5 tests in `ui/tests.rs`. - `running_status_label_with_elapsed` in `history.rs`: from 3 s onward the status segment becomes `running (Ns)` and ticks every second, driven by the existing CX#3 status-animation tick. Below 3 s no badge — quick reads/greps stay quiet. Wired through `render_tool_header`. 2 tests. - Spinner cadence sped up: `TOOL_STATUS_SYMBOL_MS` 1800 → 720 ms per glyph, so the 4-glyph "heartbeat" is ~2.88 s instead of ~7.2 s. 929 → 939 tests, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:50:32 -05:00
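The behaviors bundled in 93efb09038 are sketched below under stated assumptions. Tool input is kept as raw JSON text instead of a parsed value, `ToolUseState` is a hypothetical stand-in for the engine's streaming state, and the label prefixes are inferred from the examples in the commit. The spinner constant reproduces the commit's arithmetic: 4 glyphs × 720 ms = 2.88 s per heartbeat.

```rust
use std::time::Duration;

/// Per-glyph spinner interval from the commit (was 1800 ms).
const TOOL_STATUS_SYMBOL_MS: u64 = 720;

/// Hypothetical streaming state for one tool-use block.
struct ToolUseState {
    initial_input: String, // e.g. "{}" captured at ContentBlockStart
    json_buffer: String,   // accumulated InputJsonDelta fragments
}

/// Prefer the streamed buffer over the stale initial placeholder.
fn final_tool_input(state: &ToolUseState) -> String {
    let buffered = state.json_buffer.trim();
    if buffered.is_empty() {
        state.initial_input.clone()
    } else {
        buffered.to_string()
    }
}

/// No elapsed badge below 3 s; from 3 s onward show whole seconds.
fn running_status_label_with_elapsed(elapsed: Duration) -> String {
    let secs = elapsed.as_secs();
    if secs >= 3 {
        format!("running ({secs}s)")
    } else {
        "running".to_string()
    }
}

/// Turn a static tool label into its progressive form.
fn exploring_label(label: &str) -> String {
    if let Some(rest) = label.strip_prefix("Read ") {
        format!("Reading {rest}")
    } else if let Some(rest) = label.strip_prefix("List ") {
        format!("Listing {rest}")
    } else if let Some(rest) = label.strip_prefix("Search ") {
        format!("Searching for `{rest}`")
    } else {
        label.to_string()
    }
}
```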
Hunter Bown	42fe888d35	Merge CX#7: one active cell mutated in place Replaces "tool start pushes new cell" with a single ActiveCell that collects parallel/serial tool entries at the transcript tail and flushes as a contiguous block on first assistant text or turn complete. Stops the bounce when many tools fire concurrently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:14:07 -05:00

207 Commits