Commit Graph

167 Commits

Author SHA1 Message Date
Hunter Bown d6bfcda474 chore: drop unreferenced assets/hero.png
Not referenced from README.md, docs/, npm/, or any Cargo metadata.
README uses assets/screenshot.png. Reduces repo size by 226 KB.

Also cleaned up working-directory cruft (untracked, no commit needed):
apps/ (empty), python/ (empty after egg-info removed),
counterpoint.copilot.db, firebase-debug.log, excalidraw.log, .DS_Store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:23:08 -05:00
Hunter Bown f3df5e515e docs(changelog): roll up Phase 2/4 polish — agents chip, mention popup, P2.4 tests, subagent split, parse-counter de-flake
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:16:30 -05:00
Hunter Bown a4f4f5040f style: cargo fmt --all (post-Phase-2/4 cleanup)
Auto-format pass after the tool-call rendering work, footer chip,
mention popup, subagent split, and parse-counter de-flake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:15:11 -05:00
Hunter Bown a02898545d refactor(tools): split subagent.rs into folder module — start with tests (P1.1)
Promote `tools/subagent.rs` (4206 lines) to a folder module:

  tools/subagent/
    mod.rs    — runtime types, manager, tool implementations (~3577 lines)
    tests.rs  — extracted test module (~631 lines)

This is the safe first step. The audit doc proposed a 4-way split
(mod / spec / executor / tests). I tried the 3-way (mod / tools / tests)
and the runtime <-> tool-impl coupling produces unresolved-symbol errors
because shared helpers (`SubAgentTask`, `run_subagent_task`,
`build_allowed_tools`, `normalize_role_alias`, `parse_spawn_request`,
the agent prompt constants) are referenced from both layers. Doing that
split right needs a small API design pass to decide which helpers
graduate to the manager API and which stay tool-private — out of scope
for a structural reorg. Pulled the test module out as the cleanest
no-API-change win and left a path open for the bigger split later.

Public API unchanged — `pub mod subagent;` still exports the same items
because `mod.rs` is a drop-in replacement for `subagent.rs`.

954 → 954 tests, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:13:35 -05:00
Hunter Bown 2185b8c3c6 feat(tui): wire up @-mention popup end-to-end (P2.1)
The audit doc claimed the wiring was "in place" but only the App state
fields existed (`mention_menu_selected`, `mention_menu_hidden`) — no
helpers, no widget rendering, no key handling. Building it out fully so
the popup actually shows when the user types `@` in the composer and
Up/Down/Enter/Tab/Esc behave the way the slash menu does.

What's new:

1. `file_mention::visible_mention_menu_entries(app, limit)` — the entries
   source. Returns `Vec<String>` from the workspace walk, gated on the
   `mention_menu_hidden` flag and on the cursor being inside an `@token`.

2. `file_mention::apply_mention_menu_selection(app, entries)` — splices
   the selected entry into the input via the existing `replace_file_mention`,
   resets `mention_menu_hidden`, surfaces a status confirmation.

3. `ComposerWidget::new(app, max_height, slash_entries, mention_entries)`
   — second menu slot. The widget renders whichever slice is non-empty,
   addressed by the matching selected index. Mention entries get an `@`
   prefix so the popup row reads like the actual mention being composed.
   Mention takes precedence (positional check is stricter than slash's
   "starts-with-/").

4. ui.rs key handler:
   - Up/Down navigate `mention_menu_selected` when the popup is open.
   - Enter applies `apply_mention_menu_selection` instead of submitting.
   - Tab applies the selection (then falls through to the existing slash /
     command-completion / file-mention chain).
   - Esc hides the popup until the next input edit (`insert_str` already
     resets `mention_menu_hidden`, so typing re-opens it).

6 new tests in `ui/tests.rs`:
- mention_popup_is_empty_when_cursor_is_not_in_a_mention
- mention_popup_lists_workspace_matches_for_cursor_partial
- mention_popup_respects_hidden_flag
- apply_mention_menu_selection_splices_selected_entry
- apply_mention_menu_selection_is_noop_outside_a_mention
- apply_mention_menu_selection_with_no_entries_is_noop

Also fixes a stray duplicate `#[cfg(...)]` and an unused-doc-comment
warning that landed when the parse-counter went thread-local — back to
baseline 7 clippy warnings.

948 → 954 tests, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:09:04 -05:00
Hunter Bown 06355e3aea test(tui): pin auto-scroll churn contract for P2.4 regression coverage
Audit pass found the auto-scroll paths are already gated correctly:

- `mark_history_updated` only bumps history_version + needs_redraw — does
  NOT scroll.
- All tool-cell handlers (`handle_tool_call_started`,
  `handle_tool_call_complete`, `push_active_tool_cell`,
  `register_tool_cell`) call `mark_history_updated` only — none of them
  call `scroll_to_bottom`.
- `add_message` and `flush_active_cell` gate their auto-scroll on
  `user_scrolled_during_stream`.
- The per-stream lock clears at TurnComplete (ui.rs ~557) and when the
  user scrolls back to the live tail (widgets/mod.rs ~126).
- Explicit user actions (vim G, End, session resume, message submit) call
  `scroll_to_bottom` directly — that's correct.

5 new regression tests in ui/tests.rs lock the contract so a future
contributor adding `app.scroll_to_bottom()` to a tool-cell handler hits a
red CI immediately:

- add_message_does_not_scroll_when_user_scrolled_away
- add_message_pins_to_tail_when_user_was_following
- tool_call_started_does_not_scroll_when_user_scrolled_away
- tool_call_complete_does_not_scroll_when_user_scrolled_away
- mark_history_updated_does_not_call_scroll_to_bottom

943 → 948 tests, 0 failures (no production-code changes; the behavior was
already correct, the tests just weren't written yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:00:43 -05:00
Hunter Bown 75de26c7a1 test(tui): de-flake parse-invocation counter via thread-local
`parse_invocations_increment` and `render_parsed_does_not_call_parse` both
read the global PARSE_INVOCATIONS atomic. They were racing whenever any
other test in the suite called `parse()` in parallel — the global counter
would tick once for each unrelated call and the assertion (== 2 / == 0)
would mismatch.

Switching to a `thread_local!` `Cell<u64>` gives each test thread its own
counter, so concurrent callers from other tests can't pollute the result.
Tested across 8 sequential full-suite runs: 8/8 green (was ~40% green).
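
A minimal sketch of the thread-local counter pattern described above. The
`parse` body is a stand-in and the helper names are illustrative, not the
actual code in the repo; only the counting mechanism is the point.

```rust
use std::cell::Cell;

thread_local! {
    // One counter per thread, so parallel tests cannot see each other's calls.
    static PARSE_INVOCATIONS: Cell<u64> = Cell::new(0);
}

/// Stand-in parser (hypothetical body): only the counting side effect matters.
pub fn parse(input: &str) -> usize {
    PARSE_INVOCATIONS.with(|c| c.set(c.get() + 1));
    input.split_whitespace().count()
}

/// Reads the calling thread's counter.
pub fn parse_invocations() -> u64 {
    PARSE_INVOCATIONS.with(|c| c.get())
}

/// Resets the calling thread's counter, e.g. at the start of a test.
pub fn reset_parse_invocations() {
    PARSE_INVOCATIONS.with(|c| c.set(0));
}
```

A test that resets the counter, calls `parse` twice, and asserts `== 2`
stays stable even while other threads call `parse` concurrently.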

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:00:32 -05:00
Hunter Bown 9467d26db7 feat(tui): surface in-flight sub-agents in the footer status strip (P2.5)
FooterProps gains an `agents` chip slot, populated by `footer_agents_chip`
which mirrors the rest of the footer chips: empty `Vec<Span>` when
`running_agent_count == 0` (chip hides), "1 agent" / "N agents" otherwise,
DeepSeek-sky color matching the model badge.

The widget's `auxiliary_spans` includes it in the same drop-from-end
fit-to-width chain as the existing chips, so on narrow terminals the cost
chip drops first as before.
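
A simplified sketch of the chip label and the drop-from-end fit, using plain
`String`s instead of `Vec<Span>`. The function names, the one-space separator,
and the chip order are assumptions for illustration only.

```rust
/// Label for the agents chip; `None` means the chip hides.
pub fn agents_chip_label(running_agent_count: usize) -> Option<String> {
    match running_agent_count {
        0 => None,
        1 => Some("1 agent".to_string()),
        n => Some(format!("{n} agents")),
    }
}

/// Drops chips from the end until the row fits in `max_width` columns.
/// Assumes one separating space between chips (hypothetical layout rule).
pub fn fit_chips(mut chips: Vec<String>, max_width: usize) -> Vec<String> {
    let width = |c: &[String]| {
        c.iter().map(|s| s.chars().count()).sum::<usize>() + c.len().saturating_sub(1)
    };
    while !chips.is_empty() && width(&chips) > max_width {
        chips.pop();
    }
    chips
}
```

With the cost chip last in the list, a narrow terminal loses it first and the
agents chip survives longer.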

The "0 running" wording the audit doc called out wasn't actually in
FooterProps — that wording is in the agent sidebar (ui.rs ~2960) and was
already fixed there to swap to "N done" once nothing is in flight. So the
P2.5 work here is the additive footer surface, not a wording fix.

4 new tests in widgets/footer.rs:
- footer_agents_chip_is_empty_when_no_agents_running
- footer_agents_chip_uses_singular_for_one
- footer_agents_chip_uses_plural_for_many
- footer_agents_chip_renders_into_widget

939 → 943 tests, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:54:03 -05:00
Hunter Bown 93efb09038 fix(tui): tool-call rendering — defer ToolCallStarted, progressive labels, elapsed badge
The engine used to fire `Event::ToolCallStarted` from `ContentBlockStart::ToolUse`
with `input: json!({})` — before any `Delta::InputJsonDelta` had streamed in.
The UI's `handle_tool_call_started` baked the placeholder into the cell at
creation time and never refreshed, so users saw `<command>` and `<file>`
literals while the args finished streaming.

Fix relocates the emission to `ContentBlockStop` (where the input is finalized
already) and routes it through a new `final_tool_input(state)` helper that
prefers the parsed buffer over a stale empty initial input. Three regression
tests in `engine/tests.rs` pin the contract.

Also bundled (same theme — make in-flight tool cells read right):

- Progressive labels via `exploring_label`: "Read foo.rs" → "Reading foo.rs",
  "List X" → "Listing X", "Search pattern" → "Searching for `pattern`",
  "List files" → "Listing files". 5 tests in `ui/tests.rs`.
- `running_status_label_with_elapsed` in `history.rs`: from 3 s onward the
  status segment becomes `running (Ns)` and ticks every second, driven by
  the existing CX#3 status-animation tick. Below 3 s no badge — quick
  reads/greps stay quiet. Wired through `render_tool_header`. 2 tests.
- Spinner cadence sped up: `TOOL_STATUS_SYMBOL_MS` 1800 → 720 ms per glyph,
  so the 4-glyph "heartbeat" is ~2.88 s instead of ~7.2 s.
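
A rough sketch of the elapsed-badge rule and the heartbeat arithmetic from the
list above. The function and constant names other than `TOOL_STATUS_SYMBOL_MS`
are illustrative; the real `running_status_label_with_elapsed` may take
different arguments.

```rust
use std::time::Duration;

/// Below this threshold no badge is shown, so quick tools stay quiet.
const ELAPSED_BADGE_THRESHOLD: Duration = Duration::from_secs(3);

/// Per-glyph spinner interval after this change.
pub const TOOL_STATUS_SYMBOL_MS: u64 = 720;

/// Status segment for an in-flight tool: plain `running` under 3 s,
/// `running (Ns)` with whole seconds from 3 s onward.
pub fn running_status_label(elapsed: Duration) -> String {
    if elapsed < ELAPSED_BADGE_THRESHOLD {
        "running".to_string()
    } else {
        format!("running ({}s)", elapsed.as_secs())
    }
}

/// Length of one full spinner cycle in milliseconds.
pub const fn heartbeat_ms(symbol_ms: u64, glyphs: u64) -> u64 {
    symbol_ms * glyphs
}
```

`heartbeat_ms(720, 4)` gives 2880 ms, against 7200 ms at the old 1800 ms
interval.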

929 → 939 tests, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:50:32 -05:00
Hunter Bown 42fe888d35 Merge CX#7: one active cell mutated in place
Replaces "tool start pushes new cell" with a single ActiveCell that
collects parallel/serial tool entries at the transcript tail and
flushes as a contiguous block on first assistant text or turn complete.
Stops the bounce when many tools fire concurrently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:14:07 -05:00
Hunter Bown 63d7391ff8 CX#7: one active cell, mutated in place
Codex pattern — instead of appending a new ToolCall history cell for each
parallel tool invocation, keep one Exploring/Searching/Reading active cell at
the tail of the transcript and mutate its contents in place as new tool calls
fire. Drops cell churn and keeps the visual anchor stable while multiple tools
stream concurrently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:13:57 -05:00
Hunter Bown 585dd2f7d0 CX#8: two surfaces — display_lines vs transcript_lines
- HistoryCell::Thinking — live shows first ~4 lines + Ctrl+O affordance;
  transcript_lines() returns full content with all paragraphs.
- ExecCell — live caps with head/tail + omission marker; transcript
  emits all wrapped lines without truncation.
- Tool/Patch/Mcp/Review cells — live caps + affordance; transcript
  uncapped.
- User/Assistant/System/Plan/Diff/etc — display == transcript.
- Pager (Ctrl+O / Ctrl+T) flows through transcript_lines via
  history_cell_to_text — opening the pager on a thinking or capped tool
  cell shows the full body.

Updated affordance assertion to match the post-CX#9 wording
(press Ctrl+O for full text).

911/911 tests pass; clippy -D warnings clean; fmt clean.
2026-04-25 22:44:42 -05:00
Hunter Bown 8f05f272d3 CX#5 + CX#11: line-buffer newline gate + pure-render footer
CX#5 (newline-boundary streaming gate):
- New crates/tui/src/tui/streaming/line_buffer.rs — LineBuffer holds
  text after the last \n until the next \n arrives, so partial code
  fences never become visible state.
- Wired into BlockState in streaming/mod.rs. Assistant text gates;
  thinking deltas bypass (reasoning stays live).
- 9 unit tests including the partial-fence regression case.
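
The newline gate can be sketched roughly as follows. This is an illustration
of the idea, not the contents of line_buffer.rs; the `push` / `flush` method
names are assumptions.

```rust
/// Holds streamed text until a newline arrives, so only whole lines are released.
#[derive(Default)]
pub struct LineBuffer {
    pending: String,
}

impl LineBuffer {
    /// Appends a delta and returns every complete line (including its '\n')
    /// that is now safe to display. Text after the last '\n' stays buffered.
    pub fn push(&mut self, delta: &str) -> String {
        self.pending.push_str(delta);
        match self.pending.rfind('\n') {
            Some(idx) => {
                // '\n' is one byte, so idx + 1 is always a char boundary.
                let rest = self.pending.split_off(idx + 1);
                std::mem::replace(&mut self.pending, rest)
            }
            None => String::new(),
        }
    }

    /// Releases whatever is left, e.g. when the stream ends.
    pub fn flush(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}
```

A fence that arrives split across deltas is held back until its line is
complete, so a partial fence never reaches the renderer.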

CX#11 (pure-render footer):
- New crates/tui/src/tui/widgets/footer.rs — FooterProps / FooterToast
  / FooterWidget. Pure render of pre-computed props.
- ui.rs::render_footer rewritten to build props once and delegate to
  FooterWidget. Visual output identical; existing 10 footer tests
  pass unchanged. 5 new from_app tests for the props builder.

908/908 tests pass; cargo clippy --workspace -D warnings clean;
cargo fmt clean.
2026-04-25 22:43:03 -05:00
Hunter Bown 1ad0c886b8 CX#4: two-gear streaming chunker (Smooth ↔ CatchUp)
Splits crates/tui/src/tui/streaming.rs into a streaming/ module:
- streaming/mod.rs — StreamingState with per-block BlockState
- streaming/chunking.rs — policy state machine, 7 tests
- streaming/commit_tick.rs — StreamChunker queue + run_commit_tick

Thresholds match codex parity: ENTER_QUEUE_DEPTH=8, ENTER_OLDEST_AGE=120ms,
EXIT_QUEUE_DEPTH=2, EXIT_OLDEST_AGE=40ms, EXIT_HOLD=250ms.
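
One possible reading of these thresholds as a hysteresis state machine. The
constants come from this commit; how they combine (enter on either signal,
exit only after both stay calm for EXIT_HOLD) is an assumption, and
`ChunkingPolicy` / `observe` are illustrative names.

```rust
use std::time::{Duration, Instant};

const ENTER_QUEUE_DEPTH: usize = 8;
const ENTER_OLDEST_AGE: Duration = Duration::from_millis(120);
const EXIT_QUEUE_DEPTH: usize = 2;
const EXIT_OLDEST_AGE: Duration = Duration::from_millis(40);
const EXIT_HOLD: Duration = Duration::from_millis(250);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gear {
    Smooth,
    CatchUp,
}

pub struct ChunkingPolicy {
    gear: Gear,
    /// When the queue first looked calm while in CatchUp.
    calm_since: Option<Instant>,
}

impl ChunkingPolicy {
    pub fn new() -> Self {
        Self { gear: Gear::Smooth, calm_since: None }
    }

    /// Feeds one observation of the queue and returns the gear to use.
    pub fn observe(&mut self, depth: usize, oldest_age: Duration, now: Instant) -> Gear {
        match self.gear {
            Gear::Smooth => {
                // Assumed rule: either signal alone is enough to shift up.
                if depth >= ENTER_QUEUE_DEPTH || oldest_age >= ENTER_OLDEST_AGE {
                    self.gear = Gear::CatchUp;
                    self.calm_since = None;
                }
            }
            Gear::CatchUp => {
                let calm = depth <= EXIT_QUEUE_DEPTH && oldest_age <= EXIT_OLDEST_AGE;
                if !calm {
                    self.calm_since = None;
                } else {
                    // Shift down only after the calm state has held long enough.
                    let since = *self.calm_since.get_or_insert(now);
                    if now.duration_since(since) >= EXIT_HOLD {
                        self.gear = Gear::Smooth;
                        self.calm_since = None;
                    }
                }
            }
        }
        self.gear
    }
}
```

The gap between the enter and exit thresholds, plus the hold time, keeps the
chunker from flapping between gears on a bursty stream.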

894/894 tests pass; clippy -D warnings clean; fmt clean.
2026-04-25 22:28:00 -05:00
Hunter Bown d111680a3b v0.6: codex-pattern wave 1 — frame rate, scroll model, revision cache, vim pager, cap output, kill-buffer
Merges in 6 of 13 codex-pattern items as a wave-1 checkpoint:

- CX#3 frame-rate limiter (~120 FPS coalesce)
- CX#9 LIVE_TOOL_OUTPUT_MAX_LINES + Ctrl+O affordance
- CX#13 kill-buffer Ctrl+K / Ctrl+Y composer
- CX#12 vim pager keys (j/k/g/G/Ctrl+D/Ctrl+U/Ctrl+F/Ctrl+B/Space)
- CX#2 flat line-offset scroll model (replaces TranscriptScroll cell-anchor enum)
- CX#10 per-cell revision counter for partial cache rebuilds

883/883 tests pass (was 853). cargo clippy --workspace -D warnings clean. cargo fmt clean.
2026-04-25 22:25:30 -05:00
Hunter Bown 5f223adea6 v0.6.0: native rlm_query tool + scroll fix + cleanup
Adds a structured rlm_query tool for parallel/batched LLM fan-out.
The model calls it with one prompt or up to 16 concurrent prompts;
children dispatch concurrently (join_all on the tokio runtime) against the existing DeepSeek
client. Default child model is deepseek-v4-flash; override per-call
via the model field. Available in Plan / Agent / YOLO. Cost folds
into the session's running total automatically.

Fixes scroll-stuck regression (#56): TranscriptScroll::resolve_top
and scrolled_by now use a three-level fallback chain (same line →
same cell line 0 → nearest cell at-or-before) instead of teleporting
to ToBottom when an anchor cell vanishes.
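
A sketch of the three-level fallback under simplifying assumptions: cells
carry monotonically increasing ids in transcript order, and the result is a
flat line index. The types here are illustrative and do not mirror the real
`TranscriptScroll` API.

```rust
pub struct TranscriptCell {
    pub id: u64,
    pub lines: usize,
}

#[derive(Clone, Copy)]
pub struct Anchor {
    pub cell_id: u64,
    pub line: usize,
}

/// Resolves an anchor to a flat line index:
/// 1. same cell, same line if it still exists;
/// 2. same cell, line 0 if the cell shrank;
/// 3. line 0 of the nearest earlier cell if the anchor cell vanished.
pub fn resolve_top(cells: &[TranscriptCell], anchor: Anchor) -> usize {
    let mut offset = 0;
    let mut nearest_before = 0;
    for cell in cells {
        if cell.id == anchor.cell_id {
            return if anchor.line < cell.lines { offset + anchor.line } else { offset };
        }
        if cell.id < anchor.cell_id {
            nearest_before = offset;
        }
        offset += cell.lines;
    }
    nearest_before
}
```

The viewport stays near where the user was reading instead of jumping to the
tail when a cell is removed or rewritten.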

Loosens command-safety chains (#57): cargo build && cargo test and
similar chains of known-safe commands now escalate to RequiresApproval
instead of being hard-blocked as Dangerous. Chains containing unknown
commands still block.
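
A toy version of the chain rule, assuming chains are split on `&&` only and
that the known-safe list is a small fixed table. The list contents and the
`classify_chain` name are hypothetical; the real classifier covers more
operators and commands.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Safety {
    RequiresApproval,
    Dangerous,
}

// Hypothetical allowlist, for illustration only.
const KNOWN_SAFE: &[&str] = &["cargo build", "cargo test", "cargo fmt", "cargo clippy"];

/// A chain escalates to approval only if every link is a known-safe command
/// (optionally followed by arguments); any unknown link keeps it blocked.
pub fn classify_chain(command: &str) -> Safety {
    let all_known = command.split("&&").map(str::trim).all(|part| {
        KNOWN_SAFE
            .iter()
            .any(|safe| part == *safe || part.starts_with(format!("{safe} ").as_str()))
    });
    if all_known {
        Safety::RequiresApproval
    } else {
        Safety::Dangerous
    }
}
```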

Suppresses the GettingCrowded footer chip — context-percent header
already covers conversation pressure.

Refactors:
- Extracts file_mention parsing/completion/expansion (~450 LOC) from
  the 5,500-line ui.rs into crates/tui/src/tui/file_mention.rs.
- Deletes truly unused helpers (write_bytes, timestamped_filename,
  extension_from_url, output_path, has_project_doc, primary_doc_path).

Tests: 853 pass. cargo clippy --workspace -D warnings clean.
cargo fmt --all -- --check clean.

Closes #46 #47 #48 #49 #50 #53 #54 #55 #56 #57 #58.
2026-04-25 21:48:17 -05:00
Hunter Bown 027d6d19b6 docs(rlm): land Hetun design + helper layer + Sakana research methodology
Captures the full RLM-fundamental story across the design doc, MODES.md,
and the Hetun prompt. Tracking issues are now #46–#55 (helper layer
filed as #53, Hetun as #54, vendoring as #55).

What this nails down:

- **Hetun mode** is added at the END of the Tab cycle (Plan → Agent →
  YOLO → Hetun → Plan), not as a Plan replacement. Default landing mode
  is unchanged so people don't accidentally start there. Plan stays as
  it is.

- **Mission-level approval, not block-level.** Hetun runs a research
  phase, presents one mission card, and only executes after explicit
  user approval. Inside the execution turn the repl block runs straight
  through with no per-block prompts — that's the whole point of the
  mode.

- **The user's configured model is left alone on enter/exit.** Pro/max
  users stay on Pro/max. The flash-as-coordinator behaviour is internal
  to the runtime (ZIGRLM_RLM_CMD always points to flash regardless of
  mode). No global model swap.

- **No /hetun slash command.** Tab cycles into the mode; /plan keeps
  switching to Plan as today.

- **The helper layer (#53) is fundamental, not aleph-derived.** A
  curated ~20-function ctx-helper module + AST-validated Python sandbox
  baked into the repl runtime so a single block can load → slice → fan
  out flash queries → aggregate without crossing tool boundaries.
  Inspired by aleph's pattern but our own native primitive — not a port.

- **Hetun research methodology adopts Sakana's Fugu patterns.** The
  research phase is recursive novelty sampling + hierarchical narrative
  tree synthesis + multi-detector cross-verification (flash for
  breadth, Pro for depth) + hypothesis-verification loop. Not "fan out
  8 fixed queries". This is what makes "Plan + Recursive Agents"
  meaningful versus a flash-coordinator wrapper.

- **No version-number framing anywhere.** The plan ships as one cohesive
  RLM landing across #46/#48/#49/#50/#53/#54/#55 — order is dependency,
  not release schedule. We keep shipping.

- **Auto-compaction stays automatic.** Removed a manual /compact nag
  from the Hetun prompt; the existing coherence + capacity system
  already handles this.

Files:
  docs/rlm-design.md          new — full design doc with Hetun details
  docs/research-react-vs-rlm.md  new — supporting research treatment
  docs/MODES.md               4-mode cycle, Hetun added at end, Plan kept
  crates/tui/src/prompts/hetun.txt  prompt teaching the recursive-novelty
                              + hierarchical-synthesis + verification-loop
                              rhythm, mission-card structure, two-step gate
  .gitignore                  ignore .claude/scheduled_tasks.lock runtime

Closes nothing yet — implementation lands across the tracking issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:37:25 -05:00
Hunter Bown 229b1993ac ci: mirror release-workflow strict gates so failed publishes can't slip through
The Release / parity job runs with `--locked` and clippy `-D warnings`.
Main CI ran without either flag, so commits could pass main CI but fail
the release pipeline at the parity stage — which has been silently
blocking every npm publish since v0.4.6 (latest npm = 0.4.8 even though
git tags reach v0.5.2). Most failures were either fmt drift caused by
new stable rustc / rustfmt revisions or lockfile drift the workspace CI
never noticed.

Aligns the Lint job's clippy step with `--locked -- -D warnings` and
the Test job with `--locked` + an explicit `git diff --exit-code --
Cargo.lock` lockfile drift guard. From here on, anything that would
fail Release / parity also fails main CI on the same push, so we
never push a tag we know will fail the publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:07:28 -05:00
Hunter Bown 686ef94719 fix: doc-link warning in model_picker module header
Removed the `[`Settings`]` intra-doc link that referenced a type not
in scope at the module-doc level — RUSTDOCFLAGS=-Dwarnings rejects it.
Replaced with backtick code formatting; the rest of the doc is
unchanged.

Caught by the Documentation CI job on v0.5.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:03:37 -05:00
Hunter Bown ba6bc351da release: v0.5.2 — /model picker
Single-feature release that lands #39 (the /model two-pane picker)
on top of the v0.5.1 quality-of-life batch. Bumps workspace +
npm wrapper + Cargo.lock in lockstep; check-versions.sh verifies.

Closes #39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:00:19 -05:00
Hunter Bown ebdda09c29 feat(#39): /model opens Pro/Flash + Off/High/Max picker
`/model` with no argument now opens a two-pane modal: model on the left
(deepseek-v4-pro flagship vs deepseek-v4-flash fast-and-cheap, with the
current id appearing as a "current (custom)" row when it isn't one of
the listed defaults), and thinking effort on the right (Off, High,
Max). Tab/←/→ swaps panes, ↑/↓ moves within the focused pane, Enter
applies both, Esc cancels.

Effort exposes only the three rows DeepSeek behaviorally distinguishes
per the Thinking Mode docs — `low`/`medium` are mapped server-side to
`high`, and `xhigh` to `max`, so listing them as separate choices would
mislead. The legacy variants stay valid in `~/.deepseek/settings.toml`
for back-compat (the existing `cycle_next` already only visits
Off→High→Max), the picker just doesn't surface them.
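
The normalisation can be pictured like this. The enum and method names are
illustrative; the wrap from Max back to Off in `cycle_next` is an assumption
beyond the Off→High→Max order stated above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effort {
    Off,
    Low,
    Medium,
    High,
    XHigh,
    Max,
}

impl Effort {
    /// Collapses legacy variants onto the three rows the picker shows.
    pub fn picker_row(self) -> Effort {
        match self {
            Effort::Off => Effort::Off,
            Effort::Low | Effort::Medium | Effort::High => Effort::High,
            Effort::XHigh | Effort::Max => Effort::Max,
        }
    }

    /// Visits only Off, High, Max (wrap-around is assumed here).
    pub fn cycle_next(self) -> Effort {
        match self.picker_row() {
            Effort::Off => Effort::High,
            Effort::High => Effort::Max,
            _ => Effort::Off,
        }
    }
}
```

A settings file that still says `medium` loads fine and simply lands on the
High row when the picker opens.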

Apply path:
 * mutates app.model and app.reasoning_effort
 * resets last_*_tokens / cache / replay-token gauges so the next-turn
   footer numbers reflect the new model rather than stale ones
 * persists `default_model` and `reasoning_effort` to settings via the
   existing Settings::set/save flow so the choice survives restart
 * forwards Op::SetModel + Op::SetCompaction to the engine so the
   running session picks up the new compaction budget
 * surfaces a one-line summary describing what changed
 * if persistence fails, the in-memory change still applies and a
   "(not persisted: ...)" suffix is appended to the status line

`/model <id>` keeps working unchanged for power users; only the
no-argument branch was redirected to the new modal.

Files:
 * tui/model_picker.rs — new ModelPickerView struct + ModalView impl,
   plus eight unit tests (initial state, low/medium normalisation,
   custom model preservation, arrow navigation, focus toggle, Enter
   emits ModelPickerApplied with the right values, Esc closes silently,
   and a guard that the picker exposes exactly off/high/max).
 * tui/views/mod.rs — adds ModalKind::ModelPicker and
   ViewEvent::ModelPickerApplied carrying both new and previous
   model+effort so the handler can describe the diff.
 * tui/app.rs — adds AppAction::OpenModelPicker.
 * commands/core.rs — `/model` no-arg branch now returns
   AppAction::OpenModelPicker; `/model <id>` shortcut is unchanged.
 * tui/ui.rs — pushes ModelPickerView on the action and adds
   apply_model_picker_choice() that handles persistence + engine sync
   when ViewEvent::ModelPickerApplied fires.
 * tui/mod.rs — registers the new submodule.

Closes #39 (against v0.5.2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 14:50:49 -05:00
Hunter Bown f42f94207c release: v0.5.1 — telemetry, completion, and trust quality-of-life
Bumps workspace version to 0.5.1 and finalises the changelog with the
issues that landed since v0.5.0:
 * #25, #27, #31, #33, #34 (already on main)
 * #28 @file Tab-completion
 * #29 per-workspace trust list with /trust slash command
 * #30 reasoning-replay token chip in the footer
 * #36 regression tests for sidebar gutter bleed

scripts/release/check-versions.sh is green: workspace=0.5.1, npm=0.5.1,
Cargo.lock in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 14:40:06 -05:00
Hunter Bown d79178a926 feat(#28,#30): @file Tab-completion + reasoning replay footer chip
Two related TUI affordances bundled because they share ui.rs and the
ui/tests.rs file.

#30 — Reasoning-content replay telemetry, end-to-end:
 * models.rs — Usage gains reasoning_replay_tokens: Option<u32>.
 * client.rs — sanitize_thinking_mode_messages now returns the
   approximate replay-token count (~4 chars/token); the streaming
   pipeline overlays it onto the parsed MessageDelta usage so the
   server-reported and client-estimated numbers reach the engine
   together.
 * app.rs — App stores last_reasoning_replay_tokens.
 * ui.rs — TurnComplete handler copies the value into the App; new
   footer_reasoning_replay_spans renders an `rsn N.Nk` chip in the
   footer next to the cache hit-rate, warning-coloured when replay
   tokens exceed 50% of the input budget.
 * ui/tests.rs — covers chip-on, chip-hidden-when-zero, and the
   sanitizer's None-on-non-thinking-model path.
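
A simplified sketch of the #30 estimate and chip rules above. The ~4
chars/token ratio, the `rsn N.Nk` format, the hide-on-zero rule, and the 50%
warning threshold come from this commit; the function names, the rounding, and
the `ReplayChip` struct are assumptions.

```rust
/// Approximate token count at ~4 characters per token, rounded up.
pub fn estimate_tokens(text: &str) -> u32 {
    let chars = text.chars().count() as u32;
    (chars + 3) / 4
}

pub struct ReplayChip {
    pub text: String,
    /// True when replay tokens exceed half of the input budget.
    pub warn: bool,
}

/// Builds the footer chip; `None` hides it when nothing was replayed.
pub fn replay_chip(replay_tokens: u32, input_budget: u32) -> Option<ReplayChip> {
    if replay_tokens == 0 {
        return None;
    }
    Some(ReplayChip {
        text: format!("rsn {:.1}k", f64::from(replay_tokens) / 1000.0),
        // Widen to u64 so doubling cannot overflow.
        warn: u64::from(replay_tokens) * 2 > u64::from(input_budget),
    })
}
```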

#28 — Tab-complete @file mentions against the workspace:
 * ui.rs — adds partial_file_mention_at_cursor (with a guard against
   `user@example.com`-style false positives) and
   try_autocomplete_file_mention. Walks the workspace via the
   existing ignore::WalkBuilder, ranks prefix matches above
   substring matches, applies the unique match outright, extends to
   the longest common prefix when multiple match, and surfaces
   ambiguous candidates via the status line. Wired into the existing
   Tab handler after the slash-command branch.
 * ui/tests.rs — covers cursor-inside-mention extraction, email
   guard, prefix vs substring ranking, single-match application,
   common-prefix extension, no-match status, and the
   no-mention-no-op path.
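
The ranking and common-prefix steps for #28 can be sketched as below. This
works on an in-memory path list rather than an `ignore::WalkBuilder` walk, and
both function names are illustrative.

```rust
/// Orders candidates so prefix matches come before substring matches,
/// keeping the input order within each group.
pub fn rank_matches<'a>(partial: &str, paths: &[&'a str]) -> Vec<&'a str> {
    let mut prefix: Vec<&'a str> = Vec::new();
    let mut substring: Vec<&'a str> = Vec::new();
    for &p in paths {
        if p.starts_with(partial) {
            prefix.push(p);
        } else if p.contains(partial) {
            substring.push(p);
        }
    }
    prefix.extend(substring);
    prefix
}

/// Longest prefix shared by every candidate (empty for an empty list).
pub fn longest_common_prefix(items: &[&str]) -> String {
    let Some((first, rest)) = items.split_first() else {
        return String::new();
    };
    let mut prefix: Vec<char> = first.chars().collect();
    for item in rest {
        let common = prefix
            .iter()
            .zip(item.chars())
            .take_while(|(a, b)| **a == *b)
            .count();
        prefix.truncate(common);
    }
    prefix.into_iter().collect()
}
```

With one candidate the completion applies it outright; with several, the input
is extended to the common prefix and the rest are listed in the status line.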

The mention-expansion path that ships file contents to the model is
unchanged — this is purely a discovery aid for typing the path.
Inline-contents and a fuzzy popup picker are queued for v0.5.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 14:39:59 -05:00
Hunter Bown bcf6ba9a8e feat(#29): per-workspace trust list with /trust slash command
Adds a persistent allowlist of external paths the agent may read/write
from outside the current workspace, scoped to the workspace it was
granted in. The list lives in ~/.deepseek/workspace-trust.json with
schema {"workspaces": {"<ws>": ["<trusted>", ...]}}; canonical paths on
both sides keep symlink-aliased macOS tempdirs sane.
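
An in-memory sketch of the lookup this schema implies. It skips JSON
persistence and canonicalisation, and it assumes a trusted entry also covers
everything beneath it; the struct and method names are illustrative rather
than the workspace_trust.rs API.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Mirrors {"workspaces": {"<ws>": ["<trusted>", ...]}} in memory.
#[derive(Default)]
pub struct WorkspaceTrust {
    workspaces: HashMap<PathBuf, Vec<PathBuf>>,
}

impl WorkspaceTrust {
    /// Grants `trusted` to `workspace`, ignoring duplicates.
    pub fn add(&mut self, workspace: &Path, trusted: &Path) {
        let entry = self.workspaces.entry(workspace.to_path_buf()).or_default();
        if !entry.iter().any(|p| p.as_path() == trusted) {
            entry.push(trusted.to_path_buf());
        }
    }

    /// True if `target` is a trusted path, or lies under one, for this
    /// workspace. `Path::starts_with` compares whole components, so
    /// `/data/shared-other` does not match `/data/shared`.
    pub fn is_trusted(&self, workspace: &Path, target: &Path) -> bool {
        self.workspaces
            .get(workspace)
            .is_some_and(|list| list.iter().any(|root| target.starts_with(root)))
    }
}
```

Grants are keyed by workspace, so a path trusted in one workspace stays
untrusted in every other.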

Surface area:
 * crates/tui/src/workspace_trust.rs — new module: load_for / add /
   remove plus *_at variants for tests that need an explicit file path
   rather than HOME mutation.
 * tools/spec.rs — ToolContext gains trusted_external_paths and
   resolve_path consults it before returning PathEscape, both for the
   existing-path branch and the to-be-created (parent-canonical) branch.
 * core/engine.rs — build_tool_context loads the trust snapshot on every
   tool dispatch so /trust mutations apply on the next call.
 * commands/config.rs — /trust now takes subcommands (add, remove,
   list, on, off, status) instead of being a single all-or-nothing
   toggle. Tilde expansion handled in-line.
 * commands/mod.rs — registry entry updated with the new usage string
   and a dispatcher that forwards args.
 * tools/diagnostics.rs — adds trusted_external_paths to the JSON
   output so the agent and the user can see the list at a glance.

The interactive "Allow once / Always allow / Deny" prompt that the
issue describes is deferred — for v0.5.1 the workflow is "grant
ahead with /trust add". A future change will add a hook in
ToolContext::resolve_path that surfaces an ApprovalRequest when an
escape path is hit, so the slash-command remains the durable
mechanism while the prompt becomes the discovery one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 14:39:37 -05:00
Hunter Bown 7a85f182e2 test(#36): regression tests for sidebar gutter bleed
Adds two snapshot tests against ChatWidget rendering to lock in that long
single-line tool results never write any cells outside chat_area at the
widths reported in the bug (80, 120, 165, 200 cols), and that the
scrollbar coexists with content along the right edge instead of
overdrawing the penultimate column. The acceptance criterion in the
issue specifically requires this regression coverage; the tests pass
against current code, so existing rendering is the baseline being
guarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 14:39:19 -05:00
Hunter Bown f7fe5e09a5 fix(#37): make NIM a peer provider in config.example.toml + setup status
A user couldn't find an `NVIDIA_API_KEY` block in `~/.deepseek/config.toml`
because the example file only mentioned NIM as commented-out alternates
to the top-level keys. Two fixes:

- `config.example.toml` now has explicit `[providers.deepseek]` and
  `[providers.nvidia_nim]` sections (placed after all top-level keys so
  the TOML still parses cleanly), each documenting `api_key` /
  `base_url` / `model` plus the env vars that override them. Both
  providers can be stored at once and toggled via `/provider` or
  `--provider` without re-entering keys.
- `setup --status` "missing api_key" message is now provider-aware: on
  `nvidia-nim` it points at `NVIDIA_API_KEY` + `[providers.nvidia_nim]`
  + `deepseek auth set --provider nvidia-nim`, instead of the
  DeepSeek-only hint.

Audit verified: the v0.5.0 multi-turn replay fix path
(`should_replay_reasoning_content` → `requires_reasoning_content` in
`crates/tui/src/client.rs:1796`) keys off the model name (matches
`deepseek-v4`), not the provider, so NIM-hosted V4 models get the
replay automatically. No NIM-specific 400-class regression there.

Closes #37 (docs/UX); the live multi-turn-against-NIM verification
remains a manual smoke step listed in the issue (no NIM creds in CI).
2026-04-25 13:52:44 -05:00
Hunter Bown 24b8945010 feat(#32): basic session-handoff convention via .deepseek/handoff.md
Minimum-viable version of the handoff artifact described in #32:

- New `HANDOFF_RELATIVE_PATH = ".deepseek/handoff.md"` convention.
- `system_prompt_for_mode_with_context` now reads that path on every
  prompt rebuild and prepends a `## Previous Session Handoff` block to
  the system prompt when the file is non-empty. A fresh agent gets the
  prior session's blockers/decisions/files-touched in turn-1 context
  with zero discovery cost.
- Agent prompt updated to make the convention explicit: "if the block
  appears, read it first; before exit/`/compact`, write or update it
  via `write_file`."
- `.deepseek/` is already gitignored, so the handoff travels with the
  workspace but doesn't pollute commits unless the user opts in.
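
A small sketch of the injection rule. `HANDOFF_RELATIVE_PATH` and the block
heading are from this commit; `read_handoff`, `with_handoff`, and the exact
spacing around the block are assumptions.

```rust
use std::path::Path;

pub const HANDOFF_RELATIVE_PATH: &str = ".deepseek/handoff.md";

/// Reads the handoff file if it exists; a missing file is not an error.
pub fn read_handoff(workspace: &Path) -> Option<String> {
    std::fs::read_to_string(workspace.join(HANDOFF_RELATIVE_PATH)).ok()
}

/// Prepends the handoff block only when there is non-whitespace content.
pub fn with_handoff(base_prompt: &str, handoff: Option<&str>) -> String {
    match handoff.map(str::trim).filter(|s| !s.is_empty()) {
        Some(body) => format!("## Previous Session Handoff\n\n{body}\n\n{base_prompt}"),
        None => base_prompt.to_string(),
    }
}
```

Calling `with_handoff(prompt, read_handoff(ws).as_deref())` on every prompt
rebuild covers the missing, empty, and present cases with one code path.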

Tests cover: present-and-non-empty (block injected with file content),
missing file (no block), empty/whitespace-only file (no block). A
unique marker in the injected block (`"left a handoff at .deepseek/..."`)
discriminates the actual block from the agent prompt's own description
of the convention.

Out of scope for v0.5.1: a `/handoff` slash command, a startup banner
toast, automatic write on exit, and the diff-against-HEAD-on-resume
mechanism. The agent can already write the file via `write_file` when
the user types `write a session handoff`.

Closes #32.
2026-04-25 13:48:22 -05:00
Hunter Bown 82e4a564aa refactor(#35): tighten agent prompt tool descriptions, drop alias dupes
Tool-surface audit pass:

- FILE OPERATIONS rewritten so each line states the niche, not just the
  verb. read_file mentions PDF auto-extraction + `pages` slicing.
- New SEARCH section consolidates grep_files / file_search / web_search /
  fetch_url so the model sees them next to each other and picks the
  right one. fetch_url (#33) added; previously absent from the prompt.
- request_user_input pulled out of FILE OPERATIONS into its own USER
  section — it never belonged there.
- SUB-AGENTS list shrinks by 3: drops `spawn_agent` (use `agent_spawn`),
  `close_agent` (use `agent_cancel`), and the `agent_assign /
  assign_agent` dual-name. The underlying dispatchers still resolve those
  names, so existing sessions don't break — they just no longer
  pollute the model's tool list.

Adds `docs/TOOL_SURFACE.md` with the rationale, the v0.5.1 final
surface, and the dropped aliases. Calls out that grep_files is pure-Rust
(no rg/grep shell-out, so the "fall back to grep" AC from #35 is
vacuously satisfied — the tool has no shell dependency to fall back from).

Closes #35.
2026-04-25 13:44:43 -05:00
Hunter Bown 07ae792068 fix(#38): show provider chip in header when not on default DeepSeek
The reasoning-effort tier (`max` chip + whale icon) and the live/context
indicators were the only signals on the right of the header. Switching
to nvidia-nim left the right-hand side identical to a DeepSeek session,
so it wasn't obvious at a glance that requests were going to a different
backend.

Now: when `app.api_provider != Deepseek`, the header surfaces a bold
`NIM` chip on the right, leftmost in the chip cluster (so it survives
the narrow-width fallback variants in `right_spans`). Default-DeepSeek
sessions are unchanged — `provider_label = None` short-circuits the
chip.

Closes #38.
2026-04-25 13:41:52 -05:00
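The chip logic described in 07ae792068 can be sketched roughly as below. This is an illustration only: the `ApiProvider` enum, the `NvidiaNim` variant name, and the `right_chips` helper are hypothetical stand-ins, and only `provider_label` and the `NIM` text come from the commit message.

```rust
/// Hypothetical mirror of the provider enum; only the two providers named
/// in the commit message are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApiProvider {
    Deepseek,
    NvidiaNim,
}

/// Chip text for the header, or `None` on the default provider so the
/// chip is skipped entirely.
fn provider_label(provider: ApiProvider) -> Option<&'static str> {
    match provider {
        ApiProvider::Deepseek => None,
        ApiProvider::NvidiaNim => Some("NIM"),
    }
}

/// Builds the right-hand chip cluster, leftmost first. The provider chip
/// goes in first; the effort chip (e.g. "max") follows it.
fn right_chips(provider: ApiProvider, effort: &str) -> Vec<String> {
    let mut chips = Vec::new();
    if let Some(label) = provider_label(provider) {
        chips.push(label.to_string());
    }
    chips.push(effort.to_string());
    chips
}
```

With this shape, a default DeepSeek session yields only the effort chip, while an nvidia-nim session yields `NIM` followed by the effort chip.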
Hunter Bown ba40ae4aac feat(#34): auto-extract text from PDFs in read_file
`read_file` now detects PDFs by extension or `%PDF-` magic bytes and
shells out to `pdftotext -layout` (poppler) to return plain text
directly to the model. New optional `pages` arg accepts `N` or `N-M`
slices so big papers can be read in pieces without burning context.

When `pdftotext` isn't on `$PATH`, the tool returns a structured
`{type: "binary_unavailable", kind: "pdf", reason, hint}` payload with
install hints (`brew install poppler` / `apt install poppler-utils`)
instead of crashing or returning UTF-8 garbage from a binary file.

Tests cover extension detection (case-insensitive), magic-byte sniffing
on extension-less files, the negative case for plain text, the pages
arg parser (single, range, whitespace, invalid forms), and the
binary_unavailable branch when `pdftotext` is absent.

.docx / .epub / .html stripping deferred — same dispatch can take more
extractors later.

Closes #34.
2026-04-25 13:36:30 -05:00
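A minimal sketch of the detection and `pages` parsing described in ba40ae4aac, assuming 1-based inclusive page numbers. The function names `looks_like_pdf` and `parse_pages` are hypothetical and are not taken from the repository.

```rust
use std::path::Path;

/// True when the path ends in `.pdf` (any case) or the leading bytes carry
/// the `%PDF-` magic, so extension-less files are still recognized.
fn looks_like_pdf(path: &Path, head: &[u8]) -> bool {
    let by_ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.eq_ignore_ascii_case("pdf"))
        .unwrap_or(false);
    by_ext || head.starts_with(b"%PDF-")
}

/// Parses `N` or `N-M` into a `(first, last)` pair, tolerating whitespace.
/// Returns `None` for page 0, reversed ranges, or non-numeric input.
fn parse_pages(arg: &str) -> Option<(u32, u32)> {
    let arg = arg.trim();
    let (first, last) = match arg.split_once('-') {
        Some((a, b)) => (a.trim().parse::<u32>().ok()?, b.trim().parse::<u32>().ok()?),
        None => {
            let n = arg.parse::<u32>().ok()?;
            (n, n)
        }
    };
    if first == 0 || last < first {
        None
    } else {
        Some((first, last))
    }
}
```

A parsed range could then be handed to `pdftotext` through its `-f` (first page) and `-l` (last page) options; whether the tool passes the slice that way is an assumption here.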
Hunter Bown 7f2c382343 feat(#33): add fetch_url tool for direct HTTP GET
Complements `web_search` for cases where the URL is already known —
GitHub repos, blog posts, spec pages — and a search-engine round trip is
overkill or actively unhelpful (which #25 had been making worse).

Surface:
- `fetch_url(url, format?, max_bytes?, timeout_ms?)`
- `format`: `markdown` (default), `text`, `raw`
- HTTPS preferred, http:// allowed; non-http schemes rejected up front
- Follows up to 5 redirects; 1 MB default cap (10 MB hard ceiling); 15 s
  default timeout (60 s ceiling)
- HTML responses are stripped to readable text via the same regex
  pattern used by `web_search` (script/style strip → tag strip → entity
  decode → whitespace collapse)
- 4xx / 5xx responses still return the body (with `success: false`) so
  the caller can read JSON error envelopes

Capabilities: `ReadOnly + Network`. Approval: `Auto` (matches
`web_search`). Registered in `with_web_tools` so it's available wherever
`web_search` is.

Tests cover: format parsing aliases, scheme rejection, missing/empty
url validation, html-to-text stripping. The over-the-wire cases
(redirect chains, oversized truncation) will be exercised by integration
tests once the test suite is wired to a local mock HTTP server. That is
deferred, since the unit tests already lock in the input validation and
HTML processing.

Closes #33.
2026-04-25 13:33:22 -05:00
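The input validation listed for `fetch_url` in 7f2c382343 might look something like the following sketch. All names here are hypothetical. The sketch treats "1 MB" and "10 MB" as decimal megabytes and clamps oversized requests to the ceilings; the commit does not say whether the real tool clamps or rejects, and it does not list the format aliases, so only the three documented names are accepted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FetchFormat {
    Markdown,
    Text,
    Raw,
}

// Limits quoted in the commit message (decimal units are an assumption).
const DEFAULT_MAX_BYTES: u64 = 1_000_000;
const HARD_MAX_BYTES: u64 = 10_000_000;
const DEFAULT_TIMEOUT_MS: u64 = 15_000;
const HARD_TIMEOUT_MS: u64 = 60_000;

/// `format` is optional and defaults to markdown.
fn parse_format(raw: Option<&str>) -> Result<FetchFormat, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()) {
        None => Ok(FetchFormat::Markdown),
        Some(s) => match s.as_str() {
            "markdown" => Ok(FetchFormat::Markdown),
            "text" => Ok(FetchFormat::Text),
            "raw" => Ok(FetchFormat::Raw),
            other => Err(format!("unknown format: {other}")),
        },
    }
}

/// Rejects missing/empty URLs and non-http(s) schemes before any request.
fn validate_url(url: &str) -> Result<&str, String> {
    let url = url.trim();
    if url.is_empty() {
        return Err("url is required".to_string());
    }
    let lower = url.to_ascii_lowercase();
    if lower.starts_with("https://") || lower.starts_with("http://") {
        Ok(url)
    } else {
        Err("only http:// and https:// URLs are supported".to_string())
    }
}

/// Applies defaults, then caps both values at their hard ceilings.
fn effective_limits(max_bytes: Option<u64>, timeout_ms: Option<u64>) -> (u64, u64) {
    (
        max_bytes.unwrap_or(DEFAULT_MAX_BYTES).min(HARD_MAX_BYTES),
        timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS).min(HARD_TIMEOUT_MS),
    )
}
```

Validating before the request is issued keeps these checks testable without a network, which matches the split between unit and integration tests described in the commit.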
Hunter Bown 017ac97d0d feat(#30): debug-log reasoning_content replay size per request
The thinking-mode sanitizer now sums the byte size of every replayed
`reasoning_content` field in the outgoing chat-completions body and
emits an `info`-level log line:

  Reasoning-content replay: 7 assistant message(s), ~3.2K input tokens (12,884 chars) being re-sent in this request

This is visible under `RUST_LOG=deepseek_tui=info` (or a more verbose level). It's the
first step toward the footer/status-line indicator described in #30 —
the model's input-side reasoning replay is now observable per turn,
even before it gets a dedicated UI surface.

Tests cover both branches: bodies that already have reasoning_content
(count is summed across all assistant turns) and bodies where the
sanitizer had to inject the `(reasoning omitted)` placeholder (the
placeholder bytes are included in the count since they ship over the
wire).

Footer integration deferred — that needs a new event from client → engine
→ TUI to surface the count alongside `cache N%` / `$X.XX`. Part of #30
remains open.
2026-04-25 13:28:44 -05:00
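The accounting in 017ac97d0d can be approximated as follows. This is a sketch under assumptions: the `Message` and `ReplayStats` types are hypothetical, and the 4-bytes-per-token ratio is inferred from the sample log line (12,884 chars reported as roughly 3.2K tokens), not confirmed by the commit.

```rust
/// Minimal stand-in for a chat message; field names are illustrative.
struct Message {
    role: &'static str,
    reasoning_content: Option<String>,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct ReplayStats {
    messages: usize,
    bytes: usize,
    approx_tokens: usize,
}

/// Sums the byte size of every `reasoning_content` field carried by an
/// assistant message. Placeholder text counts too, since it is sent on
/// the wire like any other reasoning content.
fn reasoning_replay_stats(messages: &[Message]) -> ReplayStats {
    let mut stats = ReplayStats::default();
    for m in messages.iter().filter(|m| m.role == "assistant") {
        if let Some(reasoning) = &m.reasoning_content {
            stats.messages += 1;
            stats.bytes += reasoning.len();
        }
    }
    // Assumed heuristic: about four bytes per token.
    stats.approx_tokens = stats.bytes / 4;
    stats
}
```

Under that heuristic, 12,884 bytes of replayed reasoning come out to 3,221 estimated tokens, which is consistent with the `~3.2K` figure in the sample line.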
Hunter Bown 0a394e1587 fix(#31): catch version drift in CI, not at release time
Adds scripts/release/check-versions.sh and a `versions` CI job that runs
on every push/PR. Verifies:
- no per-crate Cargo.toml carries a literal version (must inherit the
  workspace version)
- npm/deepseek-tui/package.json matches the workspace version
- Cargo.lock is in sync with the manifests

Closes #31.
2026-04-25 13:25:55 -05:00
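The first check in 0a394e1587 (no per-crate literal version) lives in a shell script, but the rule itself can be illustrated in Rust. The function below is a hypothetical line-based sketch, not a port of `check-versions.sh`, and it does not handle every valid TOML layout.

```rust
/// Returns true when a manifest's `[package]` table sets `version` to a
/// string literal instead of inheriting it from the workspace.
fn has_literal_package_version(manifest: &str) -> bool {
    let mut in_package = false;
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Track which table we are in; only `[package]` matters here.
            in_package = line == "[package]";
            continue;
        }
        if !in_package {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            // `version.workspace = true` has a different key, and
            // `version = { workspace = true }` does not start with a quote.
            if key.trim() == "version" && value.trim().starts_with('"') {
                return true;
            }
        }
    }
    false
}
```

A CI job could run such a check over each crate manifest and fail the build on the first `true`, so drift is caught on push instead of at release time.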
Hunter Bown fafb76063d ci: fix unused import + cargo fmt drift from #27
The #27 per-mode context budget commit (1be18e69) replaced calls to
compaction_threshold_for_model with compaction_threshold_for_model_and_effort
but left the old name in the import list, which fails under -Dwarnings on
Build, Test, and the npm wrapper smoke job. Also re-runs cargo fmt over
the four files the lint job flagged.
2026-04-25 13:21:16 -05:00
Hunter Bown 9e7cfc951a docs: README v0.5.0 callout + CHANGELOG entry, drop cargo install
- README: drop the `cargo install deepseek-tui` / `deepseek-tui-cli`
  block and the crates.io badge — those packages have been lagging the
  workspace version for several releases. Source install (`cargo install
  --path crates/tui`) remains documented for hackers.
- README: add a "What's new in v0.5.0" section pointing at the
  thinking-mode tool-call 400 fix and the #25 web.run cleanup.
- CHANGELOG: add the [0.5.0] - 2026-04-25 entry covering the per-message
  reasoning_content rule, the wire-payload sanitizer, and #25.
2026-04-25 13:15:35 -05:00
Hunter Bown 1be18e691b feat(#27): per-mode soft context budget for V4 compaction trigger
Add compaction_threshold_for_model_and_effort() with mode-aware soft
caps based on DeepSeek V4 paper Figure 9 recall-quality data:

  Plan / off   ->  64K (paper eval: 8K-128K)
  Agent / high -> 192K (paper eval: 128K)
  YOLO / max   -> 384K (paper eval: 384K-512K)

Previously, the 80%-of-window rule gave 800K for V4's 1M window,
which is well past the point where MRCR MMR collapses (0.49 at 1M).

Non-V4 models keep the legacy 80% rule. None/unknown effort defaults
to agent-tier (192K).
2026-04-25 12:58:35 -05:00
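The tiering in 1be18e691b can be sketched as below. The function name comes from the commit message, but the signature, the `Effort` enum, and the `is_v4` flag are assumptions for illustration, and "K" is read as 1,000 tokens to match the quoted "800K for V4's 1M window".

```rust
/// Hypothetical effort tiers matching the three rows in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effort {
    Off,  // Plan
    High, // Agent
    Max,  // YOLO
}

/// Soft context budget (in tokens) at which compaction is triggered.
/// `is_v4` stands in for whatever model-name check the real code performs.
fn compaction_threshold_for_model_and_effort(
    is_v4: bool,
    context_window: u64,
    effort: Option<Effort>,
) -> u64 {
    if !is_v4 {
        // Legacy rule: 80% of the model's context window.
        return context_window * 8 / 10;
    }
    match effort {
        Some(Effort::Off) => 64_000,
        Some(Effort::Max) => 384_000,
        // None/unknown effort falls back to the agent tier.
        Some(Effort::High) | None => 192_000,
    }
}
```

For a 1M-token V4 window this returns 384,000 at max effort, where the legacy rule would have returned 800,000.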
Hunter Bown ccc9554ef4 fix(#25): strip phantom web.run references from prompts and web_search tool
Replace all web.run mentions with web_search in prompt files (base,
agent, yolo, plan, normal) and update web_search.rs description.
Model was trying web.run which doesn't exist, wasting turns on
validation errors. Also remove [cite:ref_id] citation format which
required web.run's ref_id system.

Partial fix for #25 — web_search reliability improvements (real
search provider) still needed.
2026-04-25 12:51:12 -05:00
Hunter Bown 97f245d5ac docs: add TUI screenshot to README 2026-04-25 12:48:42 -05:00
Hunter Bown 19f8d83d3b release: v0.5.0 — fix multi-turn tool call 400 error (missing reasoning_content on assistant messages with tool_calls) 2026-04-25 12:27:53 -05:00
Hunter Bown 67b232b063 Release v0.4.9: thinking-mode reasoning_content fix + README refresh
### Fixed
- DeepSeek thinking-mode tool-call rounds now always replay reasoning_content
  in all subsequent requests (including across new user turns), matching the
  documented API contract that assistant tool-call messages must retain their
  reasoning content forever. Previously, reasoning_content was cleared after
  the current user turn completed, which could cause HTTP 400 errors.
- Missing reasoning_content on a tool-call assistant message now substitutes
  a safe placeholder ("(reasoning omitted)") instead of dropping the tool
  calls and their matching tool results, preventing orphaned conversation
  chains and API 400 rejections.
- Session checkpoint now persists a Thinking-block placeholder for tool-call
  turns that produced no streamed reasoning text, keeping on-disk sessions
  structurally correct for subsequent requests.
- Token estimation for compaction now counts thinking tokens across ALL
  tool-call rounds (not just the current user turn), aligning with the
  updated reasoning_content replay rule.

### Changed
- Internal crate dependency pins bumped 0.4.5 → 0.4.9 to match workspace.
- npm wrapper version and deepseekBinaryVersion bumped to 0.4.9.
- README fully rewritten: clearer feature highlights, V4 model focus,
  keyboard shortcut table, improved docs index, and more engaging layout.
- CHANGELOG entry for 0.4.9 with comparison URLs.
2026-04-25 12:00:08 -05:00
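The placeholder substitution described in the second "Fixed" item of 67b232b063 could be sketched like this. The `AssistantMessage` type and the `ensure_reasoning_content` name are hypothetical; only the `"(reasoning omitted)"` text is taken from the release notes.

```rust
const REASONING_PLACEHOLDER: &str = "(reasoning omitted)";

/// Minimal stand-in for an assistant message in the outgoing request.
struct AssistantMessage {
    tool_calls: Vec<String>,
    reasoning_content: Option<String>,
}

/// Fills in a placeholder on tool-call messages that lack reasoning
/// content, so the tool calls and their results stay in the conversation.
/// Returns how many messages were patched.
fn ensure_reasoning_content(messages: &mut [AssistantMessage]) -> usize {
    let mut injected = 0;
    for m in messages.iter_mut() {
        if !m.tool_calls.is_empty() && m.reasoning_content.is_none() {
            m.reasoning_content = Some(REASONING_PLACEHOLDER.to_string());
            injected += 1;
        }
    }
    injected
}
```

Patching in place, as opposed to dropping the message, avoids orphaning the matching tool results, which is the failure mode the release notes call out.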
Hunter Bown 41c54f08aa release: deepseek-tui 0.4.8 2026-04-25 10:48:00 -05:00
Hunter Bown 56ea63e2c6 Merge pull request #20 from Hmbown/codex/v4-pro-discount-pricing
[codex] Track DeepSeek V4 Pro discount pricing
2026-04-25 10:44:22 -05:00
Hunter Bown 94834b2eb4 fix: track DeepSeek V4 Pro discount pricing 2026-04-25 10:34:02 -05:00
Hunter Bown dc8f8496d6 Merge pull request #19 from Hmbown/codex/release-0.4.7
release: deepseek-tui 0.4.7 + Devin .env empty-key fix
2026-04-25 08:04:10 -05:00
Hunter Bown 5c1086fe9e release: deepseek-tui 0.4.7
Bump workspace version 0.4.6 -> 0.4.7 and ship the bug fix flagged by Devin
on PR #18: an uncommented `DEEPSEEK_API_KEY=` line in .env.example caused
`cp .env.example .env` to load an empty key, which `apply_env_overrides`
then propagated into the config and `Config::validate()` rejected on
startup with "api_key cannot be empty string".

- .env.example: comment out the empty `DEEPSEEK_API_KEY=` placeholder.
- crates/tui/src/config.rs: skip empty `DEEPSEEK_API_KEY` env values in
  `apply_env_overrides`, matching the facade's empty-string filter.
- Add `apply_env_overrides_ignores_empty_api_key` regression test.
- Bump Cargo.toml workspace version, npm wrapper version + binary version,
  and Cargo.lock.
2026-04-25 07:59:01 -05:00
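The empty-key guard from 5c1086fe9e can be illustrated with a small sketch. The `Config` struct and the `apply_api_key_override` helper are hypothetical simplifications of `apply_env_overrides`; the environment value is passed in as an argument so the logic can be exercised without touching the process environment.

```rust
/// Simplified stand-in for the real config type.
struct Config {
    api_key: Option<String>,
}

/// Applies a `DEEPSEEK_API_KEY` override, ignoring an unset or empty
/// value so a bare `DEEPSEEK_API_KEY=` line in `.env` cannot blank out a
/// key that was already configured.
fn apply_api_key_override(config: &mut Config, env_value: Option<&str>) {
    if let Some(value) = env_value {
        if !value.is_empty() {
            config.api_key = Some(value.to_string());
        }
    }
}
```

In real use the second argument would come from something like `std::env::var("DEEPSEEK_API_KEY").ok().as_deref()`.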
Hunter Bown dff592bc6d Merge pull request #18 from Hmbown/codex/deepseek-tui-app-working
Verify app smoke and add NIM env support
2026-04-25 07:46:41 -05:00
Hunter Bown 29141bc89b Add NIM env support and .env.example template 2026-04-25 07:21:43 -05:00
Hunter Bown fe4f261a5d release: deepseek tui 0.4.6
Bundles new work since v0.4.4: NIM provider support, file mention
attachments, provider switch UX, setup --status/--clean/--tools/--plugins,
doctor --json, and protocol-recovery hardening that strips fake
tool-call wrappers from streaming model output.
2026-04-25 02:01:04 -05:00
Hunter Bown 7f0444d26b fix: re-add DEFAULT_TEXT_MODEL import dropped by merge
Local main's unpushed commits had removed DEFAULT_TEXT_MODEL from the
crate::config import in main.rs, but the merged branch's new code at
two call sites still uses it. Textual three-way merge took the local
import line and the branch's call sites, producing a build break.
Re-add the symbol to the import.
2026-04-25 01:57:17 -05:00
Hunter Bown 298f5c6c51 Merge branch 'claude/improve-deepseek-v4-harness-NxBpS' into main 2026-04-25 01:55:44 -05:00