dgf1988/codewhale

Author	SHA1	Message	Date
Hunter B	877b44935e	fix(skills): reject multi-skill Claude plugin archives Document the portable SKILL.md compatibility boundary for Claude Code plugin bundles and keep /skill install from silently flattening plugin archives that carry multiple skills plus plugin.json runtime metadata. Reported by @AiurArtanis in #2743. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 08:20:36 -07:00
Claude	8284f395e6	test(integration): add signature field to ContentBlock::Thinking initializers and pattern in tests/ — the bins-only local run missed integration-test targets when the field landed Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_018zaP8vUfTAsrE38L6h6fw5	2026-06-11 04:38:39 +00:00
Hunter B	44c13eb63f	fix(release): check-versions validates the generated TUI changelog slice, not byte equality The packaged changelog is now a recent-releases slice produced by scripts/sync-changelog.sh (which gains a --check mode); also restore the SECURITY.md contact line the version gate guards, and finish the stale binary-name sweep (--bin codewhale examples, qa harness doc).	2026-06-09 23:32:40 -07:00
Paulo Aboim Pinto	c25f7af219	Address acceptance harness review feedback	2026-06-07 16:29:40 +02:00
Paulo Aboim Pinto	d90031f06f	Add Gherkin acceptance E2E harness example	2026-06-07 16:12:12 +02:00
Hunter B	3d676c2509	chore(tui): harden exec harness signals	2026-06-06 22:55:23 -07:00
Hunter B	9b500a7b91	Prepare v0.9.0 release build	2026-06-06 19:39:02 -07:00
Hunter Bown	06612495fc	chore(release): prep v0.8.51 — Arcee provider, cycle removal, UI fixes Release-preparation checkpoint for v0.8.51 (workspace + npm bumped to 0.8.51). Added: - Arcee AI direct provider: [providers.arcee], ARCEE_API_KEY/BASE_URL/MODEL, CLI auth, provider + model picker, registry. Default direct-API model is trinity-large-thinking (reasoning, 262K ctx/out); preview + mini selectable. Cloudflare-WAF-safe opening turn (benign read-only tool surface, system-prompt payload splitting) and reasoning_content replay on tool-call turns. - Expanded model catalog (qwen3.6 flash/plus/max-preview, Xiaomi MiMo v2.5 chat/ASR/TTS); provider-aware model picker with per-provider saved models. Changed: - Auto-compaction is percentage- and model-aware (compaction_threshold_for_model_at_percent; default 80%; auto-enable for <=256K windows, opt-in for 1M models). - Provider/gateway HTTP errors sanitized (HTML/WAF interstitials collapsed, 401/403 split into authentication vs authorization). Removed: - The session cycle / checkpoint-restart system: /cycles, /cycle, /recall, recall_archive tool, cycle_manager, cycle-handoff prompt, sidebar cycle lines, EngineConfig.cycle / Event::CycleAdvanced / seam cycle thresholds. Fixed: - Orphaned assistant 'blue dot' role glyph on whitespace-only turns. - Sidebar mouse-wheel scroll leaking into the transcript. - Sidebar hover tooltip overlap + warning-orange styling. - README Constitution description corrected to match prompts/base.md. - Repaired release-blocking unit/integration tests after the refactors. Preflight: cargo fmt clean, workspace builds, 3903 tui tests pass (1 known flaky MCP SSE test under parallel load, passes in isolation).	2026-06-02 17:36:18 -07:00
Hunter B	ddae7584f8	fix: resolve clippy warnings in harvested PRs (needless-borrow, is_multiple_of, dead unwrap)	2026-06-01 21:24:38 -07:00
Hu Qiantao	139b542d3f	test(ci): add Cache Guard CI test for prefix-cache stability Add a CI guard test that verifies prefix-cache stability across multi-turn conversations. The test runs 8 test cases × 14-24 turns each: - plain-dialogue (14 turns, with/without reasoning) - long-dialogue (18 turns) - mixed-message-sizes (20 turns) - tool-loop (14 turns, with/without reasoning) - long-tool-loop (24 turns, with/without reasoning) - compaction-must-cause-at-least-one-miss (30 turns) Environment variables: - CODEWHALE_CACHE_GUARD=1: Enable the guard (default: disabled) - CODEWHALE_CACHE_GUARD_THRESHOLD=40: Hit rate threshold (0-100) - CODEWHALE_CACHE_GUARD_STRICT=1: Fail on threshold violation Usage: CODEWHALE_CACHE_GUARD=1 cargo test --test cache_guard CODEWHALE_CACHE_GUARD=1 CODEWHALE_CACHE_GUARD_STRICT=1 cargo test --test cache_guard The mock simulates DeepSeek's server-side prefix cache behavior using byte-prefix matching. The default threshold (40%) is calibrated for the mock; real CI should use CODEWHALE_CACHE_GUARD_THRESHOLD=90 for production-quality validation. 9 tests covering: - 8 multi-turn conversation scenarios - 1 compaction behavior verification	2026-06-01 21:15:12 -07:00
Hunter B	a9a4213d39	fix(tui): make startup update checks configurable	2026-05-31 17:06:20 -07:00
HUQIANTAO	6f785c4bab	refactor(palette): remove unused backward-compat aliases and add module docs (#2445) * refactor(palette): remove unused backward-compat aliases and add module docs - Remove DEEPSEEK_AQUA_RGB, DEEPSEEK_NAVY_RGB, DEEPSEEK_AQUA, DEEPSEEK_NAVY (unused backward-compatible aliases with no references in production code) - Add module-level doc comment explaining the three-layer palette organization: RGB tuples, semantic Color constants, and backward-compat aliases - Note that some constants are kept for design-system completeness * fix: remove deprecated color audit test (DEEPSEEK_AQUA no longer exists) * fix: remove unused import in palette_audit test --------- Co-authored-by: Hu Qiantao <huqiantao@HudeMacBook-Air.local>	2026-05-31 10:47:32 -07:00
Matt Van Horn	de3a1f7773	feat(tui): FauxStep::Factory for live request-shape assertions Closes #2074 Adds FauxStep::Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>) to MockLlmClient. When a Factory step is dequeued, its closure runs against the real outgoing MessageRequest before the response stream is built, so any assert! panic surfaces directly from the client call instead of later in stream polling. Internal storage moves from VecDeque<CannedTurn> to VecDeque<FauxStep>, but every existing public method keeps working: - MockLlmClient::new(Vec<CannedTurn>) wraps each turn in FauxStep::Canned. - push_turn(CannedTurn) appends as FauxStep::Canned. Adds push_factory(closure) for tests that want the Factory branch. Doc comment on the Factory variant captures the DeepSeek V4 thinking-mode tool-call invariant (the v0.4.9-v0.5.1 reasoning_content drop that produced HTTP 400 on follow-up turns). Adds: - crates/tui/tests/reasoning_content_replayed_after_tool_call.rs — a regression test whose factory asserts the assistant tool-call turn carries a Thinking content block after a thinking + tool-call round. - An additional unit test in mock.rs covering create_message_synthesizes_from_factory_turn. All 20 tests in the new file pass, and the existing integration_mock_llm suite (27 tests) is unchanged.	2026-05-30 21:15:58 -07:00
Paulo Aboim Pinto	45e7b12583	style(tui): format shell dispatcher stack	2026-05-30 19:18:38 -07:00
Hunter Bown	228372935e	chore(release): prepare v0.8.45 Harvested from PR #2118 by @Hmbown. Includes Kimi/Moonshot OAuth, v0.8.45 release prep, the Codex/ChatGPT OAuth removal, open-source-first model defaults, and the safe green PR batch merged into main before the release branch refresh.	2026-05-25 18:45:36 -05:00
Hunter Bown	2947eff9d1	fix(ci): satisfy Rust 1.88 clippy gate	2026-05-24 01:20:19 -05:00
Hunter Bown	903e4537f4	refactor(strings): rebrand user-facing strings to codewhale Rename brand-bearing string literals across the TUI source and the system-prompt templates that ship inside the binary. The DeepSeek provider integration is again left intact: the `ApiProvider::Deepseek` enum variant, the `"deepseek"` provider name string returned by `ApiProvider`-to-string mappings, model IDs, the `~/.deepseek/` config directory and `DEEPSEEK_CONFIG_PATH` env var, the OS keyring key `"deepseek"`, the Ollama `deepseek-coder` model defaults, the China preset alias `deepseek-china`, and the various provider list error messages all keep the legacy spelling. Touchpoints: - `crates/tui/src/prompts/.md` and `.txt`: brand language flipped to `codewhale`; the internal `<deepseek:subagent.done>`, `<deepseek:subagent_context>`, `<deepseek:fork_state>`, `<deepseek:tool_call>` XML-ish event tags rename in lockstep to `<codewhale:…>` so the model-facing format stays consistent. - `crates/tui/src/tools/subagent/mod.rs`: emits the new event tag. - `crates/tui/src/core/tool_parser.rs`: parses the new event tag. - `crates/tui/src/tools/subagent/tests.rs`, `crates/tui/tests/protocol_recovery.rs`, `crates/tui/src/prompts.rs`: test expectations updated to match the new tag and the new prompt text. - Status / display strings flipped to `codewhale`: `acp_server.rs`'s agent name + title, `config_ui.rs`'s config schema title, `share.rs`'s export title, `welcome.rs`'s onboarding banner, `commands/status.rs`, `core/engine`, `tui/notifications.rs`, `tui/sidebar.rs`, `tui/widgets/header.rs`, `tui/widgets/mod.rs`, `tui/ui.rs`'s resume-hint, `main.rs`'s clap header and `Doctor` prose, `tui/ui/tests.rs` and other test assertions. - `crates/tui/src/logging.rs` test fixture: `deepseek_cli=debug` -> `codewhale_cli=debug` so the log-filter test references the renamed crate. - Tracing targets that were namespaced under the brand (`target: "deepseek::config"`) move to `target: "codewhale::config"`. 
- Test-fixture tempdir prefixes (`deepseek-tui-…` / `deepseek-…`) rename for consistency. Local gates green: `cargo check --workspace --all-targets --locked`, `cargo fmt --all -- --check`, `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings`, `cargo test --workspace --all-features --locked` (3226+ pass, 0 fail). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 11:48:43 -05:00
Hunter Bown	8da9fb7d52	test(qa-pty): retarget harness at codewhale-tui binary The qa_pty integration tests booted the canonical TUI via `Harness::cargo_bin("deepseek-tui")`. After the previous commit renamed the canonical bin to `codewhale-tui` and made `deepseek-tui` a tiny deprecation shim that forwards to it via PATH, those tests launched the shim and hung waiting for a TUI frame because the sealed PATH in the test sandbox has no `codewhale-tui`. Point the harness at `cargo_bin("codewhale-tui")` directly. Also add a `CARGO_BIN_EXE_codewhale-tui` compile-time fallback in the harness so older Cargo versions still work, alongside the existing legacy `deepseek-tui` fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 11:11:11 -05:00
Hunter Bown	bd603a271c	feat(tools): add image_ocr tool — extract text from images via tesseract Lets the model OCR a screenshot, scanned receipt, whiteboard photo, or image-only PDF the user drops into the workspace, without bouncing through `exec_shell` (which would mean an approval prompt plus the model having to remember tesseract's CLI surface). The tool spawns `tesseract <image> -` and returns the recognised text inline — no file is written. Capability is ReadOnly + parallel since OCR is a side-effect-free read. Registration is gated on `crate::dependencies::resolve_tesseract()` via the new `ToolRegistryBuilder::with_image_ocr_tools()` builder, hooked into `with_agent_tools` alongside `pandoc_convert`. When tesseract is missing the tool isn't advertised — same probe-then-decide pattern v0.8.31 introduced for Python. The execute path also late-resolves so a concurrent uninstall surfaces the install-tesseract hint rather than the raw spawn failure. `deepseek doctor`'s "Tool Dependencies" section reports tesseract status next to pandoc / node / python with platform-aware install hints. For non-default language packs or PSM modes the user can still drop into `exec_shell` with the full tesseract CLI surface. Tests check the metadata (ReadOnly + parallel, not WritesFiles), the missing-path rejection, and the happy-path OCR round-trip against `crates/tui/tests/fixtures/ocr_hello.png` — a 2 KB 300×100 grayscale PNG generated with ImageMagick rendering "HELLO OCR" in Helvetica. The happy-path test skips silently on hosts without tesseract (matching the catalog-build behaviour) and on hosts where the fixture isn't checked out (sparse / shallow clones). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 00:58:48 -05:00
THINKER_ONLY	250953ad35	feat(tools/agent_spawn): teach parent that subagent results are self-reports	2026-05-10 08:15:19 -05:00
Hunter Bown	f91970f092	fix(skills): accept workflow pack archive layouts (#1164) Teach /skill install to recognize compatible skill directories such as .claude/skills/<name>/SKILL.md, nested packages/.../skills/<name>/SKILL.md, and single nested skill repos while still extracting only the selected subtree. Also make /init treat an existing AGENTS.md as an idempotent no-op so the TUI matches the dispatcher behavior instead of surfacing a scary error for an already-initialized project.	2026-05-08 02:37:21 -05:00
Sun	73f92bfb5a	fix(fetch_url): add proxy DNS opt-in (#1103)	2026-05-08 02:26:12 -05:00
Hunter Bown	245e409a20	fix(tui): reset terminal viewport before repaint	2026-05-07 15:17:03 -05:00
Hunter Bown	0f46acdd76	fix(release): close v0.8.17 gate gaps	2026-05-07 13:27:31 -05:00
Hunter Bown	4e285595b0	fix(tui): paste-Enter must not auto-submit (#1073) + PTY QA harness Two pieces: #1073 fix. When a paste burst is currently being assembled, or when the burst's Enter-suppression window is still open after a flush, the trailing newline of the paste was firing `submit_input()` and the in-flight burst buffer was getting destroyed by `clear_after_explicit_paste()`. The PasteBurst module already exposed `newline_should_insert_instead_of_submit` and `append_newline_if_active` for exactly this case, but no caller had been wired up. Added `App::handle_composer_enter`, which checks the suppression state and either appends `\n` to the burst buffer or inserts it directly into the composer text — no submit. The `KeyCode::Enter` arm in the composer event loop now dispatches through that helper. Reproduces the Windows/PowerShell symptom from the report: multi-line paste ending with `\n` no longer auto-submits AND the text no longer leaks into the now-empty composer. Four unit tests cover: active-burst Enter, post-flush window Enter, normal Enter outside the window, and Enter with paste-burst detection disabled (suppression must be off). PTY QA harness. New `crates/tui/tests/support/qa_harness/` wraps `portable-pty` (already a runtime dep) and `vt100` (new dev-dep) into a small surface for scenarios that need a real PTY: spawn a binary, send keys/paste/resize, parse the ANSI stream into a frame, assert on visible text + filesystem state. The harness seals `$HOME` so scenarios cannot read the developer's real `~/.deepseek/` and points the base URL at 127.0.0.1:1 so no live request escapes. README under `support/qa_harness/README.md` documents how to add a scenario. Initial scenarios in `crates/tui/tests/qa_pty.rs`: smoke boot, keystroke round-trip, and bracketed/unbracketed paste-with-trailing-newline regression guards for #1073. 
The unbracketed scenario does not deterministically reproduce the bug on macOS (single-syscall PTY writes keep the burst continuously active), but the unit tests above cover the path conclusively; the PTY test stands as a regression guard for the visible-text invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:23:57 -05:00
Hunter Bown	1171c5e401	fix(skills): ignore symlinks outside selected install root (#814)	2026-05-06 02:36:57 -05:00
Hunter Bown	5bfc1feb62	v0.8.6: survivability, UX polish, and release hardening Merge the v0.8.6 feature batch and release hardening. Includes the full #373-#380/#382-#402 milestone scope, version bump to 0.8.6, secure /share temp-file handling, Windows-safe self-update replacement, and CI portability fixes. Remote PR checks passed on the final head before merge.	2026-05-02 20:11:33 -05:00
Hunter Bown	a7e629ae4d	test(parity): scan engine submodules after decomposition refactor The protocol-recovery contract tests `include_str!`-ed `engine.rs` and asserted the fake-wrapper markers (`[TOOL_CALL]`, `<function_calls>`, …) appeared as string literals in that file. The recent engine decomposition refactor (commits f0fad7aa..a64bc9bb) split engine.rs into `engine/streaming.rs`, `engine/turn_loop.rs`, `engine/dispatch.rs`, `engine/tool_setup.rs`, `engine/tool_execution.rs`, `engine/tool_catalog.rs`, `engine/context.rs`, `engine/approval.rs`, `engine/capacity_flow.rs`, and `engine/lsp_hooks.rs`. The marker literals followed the code into those files, so the original single-file `include_str!` no longer saw them and 4 protocol-recovery tests went red. Switch to an `ENGINE_SOURCES: &[&str]` array of `include_str!`s across engine.rs + every submodule, with a small `any_engine_source_contains` helper. Test bodies are otherwise unchanged. The file-size sanity check on `engine.rs` (>10_000 bytes) still passes — engine.rs is still ~65k bytes after the refactor. Same regression coverage as before; just survives the new file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:34:11 -05:00
Hunter Bown	ad78466ba0	feat(skills): #140 community-skill installer module Add `crates/tui/src/skills/install.rs` — async installer that pulls user-authored skills from GitHub repos, raw tarball URLs, or a curated `index.json` registry. The whole pipeline is gated by the per-domain `NetworkPolicy` (#135), validated against path-traversal / size / symlink attacks before any bytes hit the destination, and atomic-renamed into place so a half-installed skill cannot survive a failure mid-extract. Public surface: - `InstallSource::{GitHubRepo,DirectUrl,Registry}` with `parse(spec)`. - `install` / `install_with_registry` returning `InstallOutcome::{Installed,NeedsApproval,NetworkDenied}`. - `update` / `update_with_registry` returning `UpdateResult::{NoChange,Updated,NeedsApproval,NetworkDenied}` — uses a SHA-256 over the downloaded tarball to short-circuit no-op fetches. - `uninstall` / `trust` — both refuse to touch directories without an `.installed-from` marker, so the bundled `skill-creator` system skill is protected. - `fetch_registry` — typed loader for the curated `index.json`. Validation hard rules (each covered by an integration test): - `..` segments and absolute paths in tar entries are rejected. - Symlinks / hardlinks in tar entries are rejected outright. - Uncompressed total size is bounded by `max_size` (default 5 MiB). - SKILL.md must exist at the archive root or under `skills/<name>/`. - Frontmatter must carry both `name` and `description`. - `install` with an existing destination requires `update = true`. - `update` re-fetches and only replaces the on-disk install when the checksum changes; no-change paths skip the rename entirely. Adds `tar`, `flate2`, and `sha2` to `crates/tui/Cargo.toml` and propagates the resulting lockfile drift to `Cargo.lock`. 
Tests: 11 colocated unit tests in `install.rs` + 11 integration tests in `crates/tui/tests/skill_install.rs` driving a `tiny_http`-based server so the network gate, download cap, validation pipeline, and atomic rename all run end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 00:29:48 -05:00
Hunter Bown	9db841fc62	test(tui): #69 integration tests for mock LLM client + record fixtures Adds `integration_mock_llm.rs` covering the LlmClient trait surface: - streaming turn loop (text deltas + finish reason) - reasoning-content replay across tool-call rounds (V4 §5.1.1, the HTTP 400 path that broke v0.4.9-v0.5.1) - tool-call round-trip with chunked input JSON - multiple tool calls in one turn preserve event ordering - compaction-style non-streaming `create_message` - sub-agent style independent parent/child mocks - capacity-gate observation of a captured request Four full-engine tests are `#[ignore]`-marked as BLOCKED on the engine refactor from concrete `Option<DeepSeekClient>` to `Arc<dyn LlmClient>`. Once that wiring lands the ignored tests light up with no mock changes. Adds: - `tests/support/llm_client.rs` mirrors the trait so the mock can be brought into the integration test via `#[path]` without dragging in the rest of the binary's module tree - `tests/fixtures/.gitkeep` so the `eval --record` output directory rides the repo - `tests/README.md` documents both the trait-level mocking strategy and the `--record` fixture flow - `record_flag_writes_one_jsonl_line_per_step` in `eval_harness.rs` exercises the new `--record` flag end-to-end Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 00:03:18 -05:00
Hunter Bown	853a39138c	feat: setup status/clean/dirs and protocol-recovery hardening Adds a compact `setup --status` view, a `setup --clean` for regenerable session checkpoints, and `--tools`/`--plugins` scaffolding for ~/.deepseek/{tools,plugins} so the extension model has a documented home that doctor can count. `doctor --json` lands as a CI-safe alternative to the human-readable doctor (skips the live API probe). Also locks down the engine's hostility to fake tool-call wrappers: filter_tool_call_delta and the marker constants are now testable, the streaming loop emits one compact status notice per turn when it strips a wrapper, and a new protocol_recovery integration test asserts that the legacy text parser never turns <function_calls> into a real tool call. Adds 23 unit tests + 14 integration tests covering both slices.	2026-04-25 06:26:07 +00:00
Hunter Bown	f4dbf828c9	Footer polish: remove FOOTER_HINT, simplify footer rendering - Remove FOOTER_HINT color constant from palette - Drop footer clock label and related synchronization logic - Simplify footer status line layout and narrow-terminal handling - Update tests to align with simplified footer logic - Remove empty state placeholder text for cleaner UI - Bump version to 0.3.33	2026-04-11 20:20:18 -05:00
Hunter Bown	7b91169017	refactor: move source files into workspace crates - Move src/* into crates/tui/src/ to create a proper workspace structure - Add .claude/ and .trimtab/ directories for Trimtab closed-loop workflow - Add DEPENDENCY_GRAPH.md and update documentation - Update Cargo.toml files to reflect new crate dependencies - Update CI workflows and npm package scripts - All tests pass, release build works	2026-03-11 20:00:38 -05:00
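
Commit ad78466ba0 lists validation rules the skill installer applies to tar entries before anything is written: no `..` segments, no absolute paths, and a bounded uncompressed total size (default 5 MiB). The Rust sketch below shows one way such checks could look. It is not the code in `install.rs`: the names `is_safe_entry_path`, `validate_entries`, and `DEFAULT_MAX_SIZE` are hypothetical, and entries are modelled as plain `(path, size)` pairs rather than real tar headers.

```rust
use std::path::{Component, Path};

/// Assumed default cap, taken from the "default 5 MiB" wording in the commit message.
pub const DEFAULT_MAX_SIZE: u64 = 5 * 1024 * 1024;

/// Hypothetical helper: true when an archive entry path stays inside the
/// install root, i.e. it has no `..` segment and is not absolute.
pub fn is_safe_entry_path(entry: &str) -> bool {
    if entry.is_empty() {
        return false;
    }
    // Only plain names and `.` are allowed; RootDir, Prefix and ParentDir are rejected.
    Path::new(entry)
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Hypothetical helper: checks every (path, uncompressed size) pair and
/// bounds the running total by `max_size` before any extraction happens.
pub fn validate_entries(entries: &[(&str, u64)], max_size: u64) -> Result<(), String> {
    let mut total: u64 = 0;
    for (path, size) in entries {
        if !is_safe_entry_path(path) {
            return Err(format!("unsafe entry path: {path}"));
        }
        // checked_add guards against overflow from hostile size fields.
        total = total
            .checked_add(*size)
            .ok_or_else(|| "entry sizes overflow".to_string())?;
        if total > max_size {
            return Err(format!("archive exceeds {max_size} bytes"));
        }
    }
    Ok(())
}
```

Validating the whole entry list before touching the destination matches the commit's stated goal that nothing is written until the archive has passed its checks. Symlink and hardlink rejection, which the commit also lists, depends on the tar entry type and is left out of this sketch.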

33 Commits
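
Commit de3a1f7773 describes moving the mock client's queue from `VecDeque<CannedTurn>` to `VecDeque<FauxStep>`, where a `Factory` step runs a closure against the outgoing request before the response is built. The Rust sketch below illustrates that shape under simplifying assumptions. The type and method names `FauxStep`, `MockLlmClient`, `CannedTurn`, `MessageRequest`, `push_turn`, and `push_factory` come from the commit message, but the struct fields and the synchronous `next_turn` method are hypothetical stand-ins for the real streaming client call.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for the real request type (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub messages: Vec<String>,
}

/// Simplified stand-in for a scripted response (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct CannedTurn {
    pub text: String,
}

/// A queued step: either a fixed turn or a closure that inspects the request.
pub enum FauxStep {
    Canned(CannedTurn),
    Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>),
}

pub struct MockLlmClient {
    steps: VecDeque<FauxStep>,
}

impl MockLlmClient {
    /// Existing callers keep passing canned turns; each is wrapped in `Canned`.
    pub fn new(turns: Vec<CannedTurn>) -> Self {
        Self { steps: turns.into_iter().map(FauxStep::Canned).collect() }
    }

    pub fn push_turn(&mut self, turn: CannedTurn) {
        self.steps.push_back(FauxStep::Canned(turn));
    }

    pub fn push_factory<F>(&mut self, factory: F)
    where
        F: Fn(&MessageRequest) -> CannedTurn + Send + Sync + 'static,
    {
        self.steps.push_back(FauxStep::Factory(Box::new(factory)));
    }

    /// Hypothetical synchronous dequeue: a factory closure runs here, so any
    /// `assert!` inside it panics at the call site rather than later on.
    pub fn next_turn(&mut self, request: &MessageRequest) -> Option<CannedTurn> {
        match self.steps.pop_front()? {
            FauxStep::Canned(turn) => Some(turn),
            FauxStep::Factory(factory) => Some(factory(request)),
        }
    }
}
```

A test could queue a factory with `push_factory(|req| { assert!(!req.messages.is_empty()); CannedTurn { text: "ok".into() } })` and then drive a turn, so a malformed request fails at the moment it is sent.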