Commit Graph

33 Commits

Author SHA1 Message Date
Hunter B 877b44935e fix(skills): reject multi-skill Claude plugin archives
Document the portable SKILL.md compatibility boundary for Claude Code plugin bundles and keep /skill install from silently flattening plugin archives that carry multiple skills plus plugin.json runtime metadata.

Reported by @AiurArtanis in #2743.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 08:20:36 -07:00
Claude 8284f395e6 test(integration): add signature field to ContentBlock::Thinking initializers and pattern in tests/ — the bins-only local run missed integration-test targets when the field landed
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_018zaP8vUfTAsrE38L6h6fw5
2026-06-11 04:38:39 +00:00
Hunter B 44c13eb63f fix(release): check-versions validates the generated TUI changelog slice, not byte equality
The packaged changelog is now a recent-releases slice produced by
scripts/sync-changelog.sh (which gains a --check mode); also restore the
SECURITY.md contact line the version gate guards, and finish the stale
binary-name sweep (--bin codewhale examples, qa harness doc).
2026-06-09 23:32:40 -07:00
Paulo Aboim Pinto c25f7af219 Address acceptance harness review feedback 2026-06-07 16:29:40 +02:00
Paulo Aboim Pinto d90031f06f Add Gherkin acceptance E2E harness example 2026-06-07 16:12:12 +02:00
Hunter B 3d676c2509 chore(tui): harden exec harness signals 2026-06-06 22:55:23 -07:00
Hunter B 9b500a7b91 Prepare v0.9.0 release build 2026-06-06 19:39:02 -07:00
Hunter Bown 06612495fc chore(release): prep v0.8.51 — Arcee provider, cycle removal, UI fixes
Release-preparation checkpoint for v0.8.51 (workspace + npm bumped to 0.8.51).

Added:
- Arcee AI direct provider: [providers.arcee], ARCEE_API_KEY/BASE_URL/MODEL,
  CLI auth, provider + model picker, registry. Default direct-API model is
  trinity-large-thinking (reasoning, 262K ctx/out); preview + mini selectable.
  Cloudflare-WAF-safe opening turn (benign read-only tool surface, system-prompt
  payload splitting) and reasoning_content replay on tool-call turns.
- Expanded model catalog (qwen3.6 flash/plus/max-preview, Xiaomi MiMo v2.5
  chat/ASR/TTS); provider-aware model picker with per-provider saved models.

Changed:
- Auto-compaction is percentage- and model-aware
  (compaction_threshold_for_model_at_percent; default 80%; auto-enable for
  <=256K windows, opt-in for 1M models).
- Provider/gateway HTTP errors sanitized (HTML/WAF interstitials collapsed,
  401/403 split into authentication vs authorization).

Removed:
- The session cycle / checkpoint-restart system: /cycles, /cycle, /recall,
  recall_archive tool, cycle_manager, cycle-handoff prompt, sidebar cycle lines,
  EngineConfig.cycle / Event::CycleAdvanced / seam cycle thresholds.

Fixed:
- Orphaned assistant 'blue dot' role glyph on whitespace-only turns.
- Sidebar mouse-wheel scroll leaking into the transcript.
- Sidebar hover tooltip overlap + warning-orange styling.
- README Constitution description corrected to match prompts/base.md.
- Repaired release-blocking unit/integration tests after the refactors.

Preflight: cargo fmt clean, workspace builds, 3903 tui tests pass (1 known
flaky MCP SSE test under parallel load, passes in isolation).
2026-06-02 17:36:18 -07:00
Hunter B ddae7584f8 fix: resolve clippy warnings in harvested PRs (needless-borrow, is_multiple_of, dead unwrap) 2026-06-01 21:24:38 -07:00
Hu Qiantao 139b542d3f test(ci): add Cache Guard CI test for prefix-cache stability
Add a CI guard test that verifies prefix-cache stability across
multi-turn conversations.

The test runs 8 test cases × 14-24 turns each:
- plain-dialogue (14 turns, with/without reasoning)
- long-dialogue (18 turns)
- mixed-message-sizes (20 turns)
- tool-loop (14 turns, with/without reasoning)
- long-tool-loop (24 turns, with/without reasoning)
- compaction-must-cause-at-least-one-miss (30 turns)

Environment variables:
- CODEWHALE_CACHE_GUARD=1: Enable the guard (default: disabled)
- CODEWHALE_CACHE_GUARD_THRESHOLD=40: Hit rate threshold (0-100)
- CODEWHALE_CACHE_GUARD_STRICT=1: Fail on threshold violation

Usage:
  CODEWHALE_CACHE_GUARD=1 cargo test --test cache_guard
  CODEWHALE_CACHE_GUARD=1 CODEWHALE_CACHE_GUARD_STRICT=1 cargo test --test cache_guard

The mock simulates DeepSeek's server-side prefix cache behavior
using byte-prefix matching. The default threshold (40%) is calibrated
for the mock; real CI should use CODEWHALE_CACHE_GUARD_THRESHOLD=90
for production-quality validation.

9 tests covering:
- 8 multi-turn conversation scenarios
- 1 compaction behavior verification
2026-06-01 21:15:12 -07:00
Hunter B a9a4213d39 fix(tui): make startup update checks configurable 2026-05-31 17:06:20 -07:00
HUQIANTAO 6f785c4bab refactor(palette): remove unused backward-compat aliases and add module docs (#2445)
* refactor(palette): remove unused backward-compat aliases and add module docs

- Remove DEEPSEEK_AQUA_RGB, DEEPSEEK_NAVY_RGB, DEEPSEEK_AQUA, DEEPSEEK_NAVY
  (unused backward-compatible aliases with no references in production code)
- Add module-level doc comment explaining the three-layer palette organization:
  RGB tuples, semantic Color constants, and backward-compat aliases
- Note that some constants are kept for design-system completeness

* fix: remove deprecated color audit test (DEEPSEEK_AQUA no longer exists)

* fix: remove unused import in palette_audit test

---------

Co-authored-by: Hu Qiantao <huqiantao@HudeMacBook-Air.local>
2026-05-31 10:47:32 -07:00
Matt Van Horn de3a1f7773 feat(tui): FauxStep::Factory for live request-shape assertions
Closes #2074

Adds FauxStep::Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>)
to MockLlmClient. When a Factory step is dequeued, its closure runs
against the real outgoing MessageRequest before the response stream is
built, so any assert! panic surfaces directly from the client call
instead of later in stream polling.

Internal storage moves from VecDeque<CannedTurn> to VecDeque<FauxStep>,
but every existing public method keeps working:

- MockLlmClient::new(Vec<CannedTurn>) wraps each turn in FauxStep::Canned.
- push_turn(CannedTurn) appends as FauxStep::Canned.

Adds push_factory(closure) for tests that want the Factory branch.

Doc comment on the Factory variant captures the DeepSeek V4
thinking-mode tool-call invariant (the v0.4.9-v0.5.1 reasoning_content
drop that produced HTTP 400 on follow-up turns).

Adds:

- crates/tui/tests/reasoning_content_replayed_after_tool_call.rs — a
  regression test whose factory asserts the assistant tool-call turn
  carries a Thinking content block after a thinking + tool-call round.
- An additional unit test in mock.rs covering create_message_synthesizes_from_factory_turn.

All 20 tests in the new file pass, and the existing
integration_mock_llm suite (27 tests) is unchanged.
2026-05-30 21:15:58 -07:00
Paulo Aboim Pinto 45e7b12583 style(tui): format shell dispatcher stack 2026-05-30 19:18:38 -07:00
Hunter Bown 228372935e chore(release): prepare v0.8.45
Harvested from PR #2118 by @Hmbown.

Includes Kimi/Moonshot OAuth, v0.8.45 release prep, the Codex/ChatGPT OAuth removal, open-source-first model defaults, and the safe green PR batch merged into main before the release branch refresh.
2026-05-25 18:45:36 -05:00
Hunter Bown 2947eff9d1 fix(ci): satisfy Rust 1.88 clippy gate 2026-05-24 01:20:19 -05:00
Hunter Bown 903e4537f4 refactor(strings): rebrand user-facing strings to codewhale
Rename brand-bearing string literals across the TUI source and the
system-prompt templates that ship inside the binary. The DeepSeek
provider integration is again left intact: the `ApiProvider::Deepseek`
enum variant, the `"deepseek"` provider name string returned by
`ApiProvider`-to-string mappings, model IDs, the `~/.deepseek/` config
directory and `DEEPSEEK_CONFIG_PATH` env var, the OS keyring key
`"deepseek"`, the Ollama `deepseek-coder*` model defaults, the China
preset alias `deepseek-china`, and the various provider list error
messages all keep the legacy spelling.

Touchpoints:

- `crates/tui/src/prompts/*.md` and `*.txt`: brand language flipped to
  `codewhale`; the internal `<deepseek:subagent.done>`,
  `<deepseek:subagent_context>`, `<deepseek:fork_state>`,
  `<deepseek:tool_call>` XML-ish event tags rename in lockstep to
  `<codewhale:…>` so the model-facing format stays consistent.
- `crates/tui/src/tools/subagent/mod.rs`: emits the new event tag.
- `crates/tui/src/core/tool_parser.rs`: parses the new event tag.
- `crates/tui/src/tools/subagent/tests.rs`,
  `crates/tui/tests/protocol_recovery.rs`,
  `crates/tui/src/prompts.rs`: test expectations updated to match the
  new tag and the new prompt text.
- Status / display strings flipped to `codewhale`: `acp_server.rs`'s
  agent name + title, `config_ui.rs`'s config schema title,
  `share.rs`'s export title, `welcome.rs`'s onboarding banner,
  `commands/status.rs`, `core/engine*`, `tui/notifications.rs`,
  `tui/sidebar.rs`, `tui/widgets/header.rs`, `tui/widgets/mod.rs`,
  `tui/ui.rs`'s resume-hint, `main.rs`'s clap header and `Doctor`
  prose, `tui/ui/tests.rs` and other test assertions.
- `crates/tui/src/logging.rs` test fixture: `deepseek_cli=debug` ->
  `codewhale_cli=debug` so the log-filter test references the renamed
  crate.
- Tracing targets that were namespaced under the brand
  (`target: "deepseek::config"`) move to `target: "codewhale::config"`.
- Test-fixture tempdir prefixes (`deepseek-tui-…` / `deepseek-…`) rename
  for consistency.

Local gates green: `cargo check --workspace --all-targets --locked`,
`cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets
--all-features --locked -- -D warnings`, `cargo test --workspace
--all-features --locked` (3226+ pass, 0 fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:48:43 -05:00
Hunter Bown 8da9fb7d52 test(qa-pty): retarget harness at codewhale-tui binary
The qa_pty integration tests booted the canonical TUI via
`Harness::cargo_bin("deepseek-tui")`. After the previous commit
renamed the canonical bin to `codewhale-tui` and made `deepseek-tui`
a tiny deprecation shim that forwards to it via PATH, those tests
launched the shim and hung waiting for a TUI frame because the
sealed PATH in the test sandbox has no `codewhale-tui`.

Point the harness at `cargo_bin("codewhale-tui")` directly. Also
add a `CARGO_BIN_EXE_codewhale-tui` compile-time fallback in the
harness so older Cargo versions still work, alongside the existing
legacy `deepseek-tui` fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 11:11:11 -05:00
Hunter Bown bd603a271c feat(tools): add image_ocr tool — extract text from images via tesseract
Lets the model OCR a screenshot, scanned receipt, whiteboard photo,
or image-only PDF the user drops into the workspace, without
bouncing through `exec_shell` (which would mean an approval prompt
plus the model having to remember tesseract's CLI surface). The
tool spawns `tesseract <image> -` and returns the recognised text
inline — no file is written. Capability is ReadOnly + parallel
since OCR is a side-effect-free read.

Registration is gated on `crate::dependencies::resolve_tesseract()`
via the new `ToolRegistryBuilder::with_image_ocr_tools()` builder,
hooked into `with_agent_tools` alongside `pandoc_convert`. When
tesseract is missing the tool isn't advertised — same
probe-then-decide pattern v0.8.31 introduced for Python. The
execute path also late-resolves so a concurrent uninstall surfaces
the install-tesseract hint rather than the raw spawn failure.

`deepseek doctor`'s "Tool Dependencies" section reports tesseract
status next to pandoc / node / python with platform-aware install
hints. For non-default language packs or PSM modes the user can
still drop into `exec_shell` with the full tesseract CLI surface.

Tests check the metadata (ReadOnly + parallel, not WritesFiles),
the missing-path rejection, and the happy-path OCR round-trip
against `crates/tui/tests/fixtures/ocr_hello.png` — a 2 KB
300×100 grayscale PNG generated with ImageMagick rendering
"HELLO OCR" in Helvetica. The happy-path test skips silently on
hosts without tesseract (matching the catalog-build behaviour) and
on hosts where the fixture isn't checked out (sparse / shallow
clones).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 00:58:48 -05:00
THINKER_ONLY 250953ad35 feat(tools/agent_spawn): teach parent that subagent results are self-reports 2026-05-10 08:15:19 -05:00
Hunter Bown f91970f092 fix(skills): accept workflow pack archive layouts (#1164)
Teach /skill install to recognize compatible skill directories such as .claude/skills/<name>/SKILL.md, nested packages/.../skills/<name>/SKILL.md, and single nested skill repos while still extracting only the selected subtree.

Also make /init treat an existing AGENTS.md as an idempotent no-op so the TUI matches the dispatcher behavior instead of surfacing a scary error for an already-initialized project.
2026-05-08 02:37:21 -05:00
Sun 73f92bfb5a fix(fetch_url): add proxy DNS opt-in (#1103) 2026-05-08 02:26:12 -05:00
Hunter Bown 245e409a20 fix(tui): reset terminal viewport before repaint 2026-05-07 15:17:03 -05:00
Hunter Bown 0f46acdd76 fix(release): close v0.8.17 gate gaps 2026-05-07 13:27:31 -05:00
Hunter Bown 4e285595b0 fix(tui): paste-Enter must not auto-submit (#1073) + PTY QA harness
Two pieces:

**#1073 fix.** When a paste burst is currently being assembled, or when
the burst's Enter-suppression window is still open after a flush, the
trailing newline of the paste was firing `submit_input()` and the
in-flight burst buffer was getting destroyed by `clear_after_explicit_paste()`.
The PasteBurst module already exposed `newline_should_insert_instead_of_submit`
and `append_newline_if_active` for exactly this case, but no caller had
been wired up. Added `App::handle_composer_enter`, which checks the
suppression state and either appends `\n` to the burst buffer or inserts
it directly into the composer text — no submit. The `KeyCode::Enter`
arm in the composer event loop now dispatches through that helper.
Reproduces the Windows/PowerShell symptom from the report:
multi-line paste ending with `\n` no longer auto-submits AND the text
no longer leaks into the now-empty composer.

Four unit tests cover: active-burst Enter, post-flush window Enter,
normal Enter outside the window, and Enter with paste-burst detection
disabled (suppression must be off).

**PTY QA harness.** New `crates/tui/tests/support/qa_harness/` wraps
`portable-pty` (already a runtime dep) and `vt100` (new dev-dep) into
a small surface for scenarios that need a real PTY: spawn a binary,
send keys/paste/resize, parse the ANSI stream into a frame, assert
on visible text + filesystem state. The harness seals `$HOME` so
scenarios cannot read the developer's real `~/.deepseek/` and points
the base URL at 127.0.0.1:1 so no live request escapes. README under
`support/qa_harness/README.md` documents how to add a scenario.

Initial scenarios in `crates/tui/tests/qa_pty.rs`: smoke boot,
keystroke round-trip, and bracketed/unbracketed paste-with-trailing-
newline regression guards for #1073. The unbracketed scenario does
not deterministically reproduce the bug on macOS (single-syscall
PTY writes keep the burst continuously active), but the unit tests
above cover the path conclusively; the PTY test stands as a
regression guard for the visible-text invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:23:57 -05:00
Hunter Bown 1171c5e401 fix(skills): ignore symlinks outside selected install root (#814) 2026-05-06 02:36:57 -05:00
Hunter Bown 5bfc1feb62 v0.8.6: survivability, UX polish, and release hardening
Merge the v0.8.6 feature batch and release hardening.\n\nIncludes the full #373-#380/#382-#402 milestone scope, version bump to 0.8.6, secure /share temp-file handling, Windows-safe self-update replacement, and CI portability fixes.\n\nRemote PR checks passed on the final head before merge.
2026-05-02 20:11:33 -05:00
Hunter Bown a7e629ae4d test(parity): scan engine submodules after decomposition refactor
The protocol-recovery contract tests `include_str!`-ed `engine.rs` and
asserted the fake-wrapper markers (`[TOOL_CALL]`, `<function_calls>`,
…) appeared as string literals in that file. The recent engine
decomposition refactor (commits f0fad7aa..a64bc9bb) split engine.rs
into `engine/streaming.rs`, `engine/turn_loop.rs`, `engine/dispatch.rs`,
`engine/tool_setup.rs`, `engine/tool_execution.rs`,
`engine/tool_catalog.rs`, `engine/context.rs`, `engine/approval.rs`,
`engine/capacity_flow.rs`, and `engine/lsp_hooks.rs`. The marker
literals followed the code into those files, so the original
single-file `include_str!` no longer saw them and 4 protocol-recovery
tests went red.

Switch to an `ENGINE_SOURCES: &[&str]` array of `include_str!`s across
engine.rs + every submodule, with a small `any_engine_source_contains`
helper. Test bodies are otherwise unchanged. The file-size sanity
check on `engine.rs` (>10_000 bytes) still passes — engine.rs is still
~65k bytes after the refactor.

Same regression coverage as before; just survives the new file layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:34:11 -05:00
Hunter Bown ad78466ba0 feat(skills): #140 community-skill installer module
Add `crates/tui/src/skills/install.rs` — async installer that pulls
user-authored skills from GitHub repos, raw tarball URLs, or a curated
`index.json` registry. The whole pipeline is gated by the per-domain
`NetworkPolicy` (#135), validated against path-traversal / size / symlink
attacks before any bytes hit the destination, and atomic-renamed into place
so a half-installed skill cannot survive a failure mid-extract.

Public surface:
- `InstallSource::{GitHubRepo,DirectUrl,Registry}` with `parse(spec)`.
- `install` / `install_with_registry` returning
  `InstallOutcome::{Installed,NeedsApproval,NetworkDenied}`.
- `update` / `update_with_registry` returning
  `UpdateResult::{NoChange,Updated,NeedsApproval,NetworkDenied}` — uses a
  SHA-256 over the downloaded tarball to short-circuit no-op fetches.
- `uninstall` / `trust` — both refuse to touch directories without an
  `.installed-from` marker, so the bundled `skill-creator` system skill is
  protected.
- `fetch_registry` — typed loader for the curated `index.json`.

Validation hard rules (each covered by an integration test):
- `..` segments and absolute paths in tar entries are rejected.
- Symlinks / hardlinks in tar entries are rejected outright.
- Uncompressed total size is bounded by `max_size` (default 5 MiB).
- SKILL.md must exist at the archive root or under `skills/<name>/`.
- Frontmatter must carry both `name` and `description`.
- `install` with an existing destination requires `update = true`.
- `update` re-fetches and only replaces the on-disk install when the
  checksum changes; no-change paths skip the rename entirely.

Adds `tar`, `flate2`, and `sha2` to `crates/tui/Cargo.toml` and propagates
the resulting lockfile drift to `Cargo.lock`.

Tests: 11 colocated unit tests in `install.rs` + 11 integration tests in
`crates/tui/tests/skill_install.rs` driving a `tiny_http`-based server so
the network gate, download cap, validation pipeline, and atomic rename
all run end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:29:48 -05:00
Hunter Bown 9db841fc62 test(tui): #69 integration tests for mock LLM client + record fixtures
Adds `integration_mock_llm.rs` covering the LlmClient trait surface:

- streaming turn loop (text deltas + finish reason)
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the
  HTTP 400 path that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multiple tool calls in one turn preserve event ordering
- compaction-style non-streaming `create_message`
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request

Four full-engine tests are `#[ignore]`-marked as BLOCKED on the engine
refactor from concrete `Option<DeepSeekClient>` to `Arc<dyn LlmClient>`.
Once that wiring lands the ignored tests light up with no mock changes.

Adds:
- `tests/support/llm_client.rs` mirrors the trait so the mock can be
  brought into the integration test via `#[path]` without dragging in
  the rest of the binary's module tree
- `tests/fixtures/.gitkeep` so the `eval --record` output directory
  rides the repo
- `tests/README.md` documents both the trait-level mocking strategy
  and the `--record` fixture flow
- `record_flag_writes_one_jsonl_line_per_step` in `eval_harness.rs`
  exercises the new `--record` flag end-to-end

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:03:18 -05:00
Hunter Bown 853a39138c feat: setup status/clean/dirs and protocol-recovery hardening
Adds a compact `setup --status` view, a `setup --clean` for regenerable
session checkpoints, and `--tools`/`--plugins` scaffolding for
~/.deepseek/{tools,plugins} so the extension model has a documented home
that doctor can count. `doctor --json` lands as a CI-safe alternative to
the human-readable doctor (skips the live API probe).

Also locks down the engine's hostility to fake tool-call wrappers:
filter_tool_call_delta and the marker constants are now testable, the
streaming loop emits one compact status notice per turn when it strips
a wrapper, and a new protocol_recovery integration test asserts that
the legacy text parser never turns <function_calls> into a real tool
call. Adds 23 unit tests + 14 integration tests covering both slices.
2026-04-25 06:26:07 +00:00
Hunter Bown f4dbf828c9 Footer polish: remove FOOTER_HINT, simplify footer rendering
- Remove FOOTER_HINT color constant from palette
- Drop footer clock label and related synchronization logic
- Simplify footer status line layout and narrow-terminal handling
- Update tests to align with simplified footer logic
- Remove empty state placeholder text for cleaner UI
- Bump version to 0.3.33
2026-04-11 20:20:18 -05:00
Hunter Bown 7b91169017 refactor: move source files into workspace crates
- Move src/* into crates/tui/src/ to create a proper workspace structure
- Add .claude/ and .trimtab/ directories for Trimtab closed-loop workflow
- Add DEPENDENCY_GRAPH.md and update documentation
- Update Cargo.toml files to reflect new crate dependencies
- Update CI workflows and npm package scripts
- All tests pass, release build works
2026-03-11 20:00:38 -05:00