dgf1988/codewhale

Hunter Bown ad78466ba0 feat(skills): #140 community-skill installer module

Add `crates/tui/src/skills/install.rs` — async installer that pulls
user-authored skills from GitHub repos, raw tarball URLs, or a curated
`index.json` registry. The whole pipeline is gated by the per-domain
`NetworkPolicy` (#135), validated against path-traversal / size / symlink
attacks before any bytes hit the destination, and atomic-renamed into place
so a half-installed skill cannot survive a failure mid-extract.
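
The atomic-rename step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `install.rs`: `install_atomically`, the `stage` callback, and the `.staging` directory name are all hypothetical, and the sketch assumes a fresh install where the destination does not exist yet.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stage files in a temporary sibling directory, then rename it into place.
/// `stage` is a hypothetical callback that writes the already-validated files.
/// Assumes `dest` does not exist yet (fresh install, not an update).
pub fn install_atomically<F>(dest: &Path, stage: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let parent = dest.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no parent")
    })?;
    let name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;

    // The staging directory is a sibling of `dest`, so the final rename
    // stays on one filesystem and does not degrade into a copy.
    let staging = parent.join(format!(".{}.staging", name.to_string_lossy()));
    if staging.exists() {
        fs::remove_dir_all(&staging)?;
    }
    fs::create_dir_all(&staging)?;

    if let Err(err) = stage(&staging) {
        // Best-effort cleanup: nothing was ever visible at `dest`.
        let _ = fs::remove_dir_all(&staging);
        return Err(err);
    }

    // Single rename: `dest` either appears fully populated or not at all.
    if let Err(err) = fs::rename(&staging, dest) {
        let _ = fs::remove_dir_all(&staging);
        return Err(err);
    }
    Ok(())
}
```

Because every write happens under the staging directory, a failure during staging leaves no partial directory at the destination.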

Public surface:
- `InstallSource::{GitHubRepo,DirectUrl,Registry}` with `parse(spec)`.
- `install` / `install_with_registry` returning
  `InstallOutcome::{Installed,NeedsApproval,NetworkDenied}`.
- `update` / `update_with_registry` returning
  `UpdateResult::{NoChange,Updated,NeedsApproval,NetworkDenied}` — uses a
  SHA-256 over the downloaded tarball to short-circuit no-op fetches.
- `uninstall` / `trust` — both refuse to touch directories without an
  `.installed-from` marker, so the bundled `skill-creator` system skill is
  protected.
- `fetch_registry` — typed loader for the curated `index.json`.

Validation hard rules (each covered by an integration test):
- `..` segments and absolute paths in tar entries are rejected.
- Symlinks / hardlinks in tar entries are rejected outright.
- Uncompressed total size is bounded by `max_size` (default 5 MiB).
- SKILL.md must exist at the archive root or under `skills/<name>/`.
- Frontmatter must carry both `name` and `description`.
- `install` with an existing destination requires `update = true`.
- `update` re-fetches and only replaces the on-disk install when the
  checksum changes; no-change paths skip the rename entirely.
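
The first three rules above (path safety, link rejection, size cap) can be sketched as a per-entry check. This is an illustration under stated assumptions, not the installer's real code: `EntryKind`, `Rejection`, and `check_entry` are hypothetical names, and the `SKILL.md` and frontmatter rules are not covered here.

```rust
use std::path::{Component, Path};

/// Hypothetical, simplified view of a tar entry's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryKind {
    File,
    Directory,
    Symlink,
    Hardlink,
}

/// Hypothetical rejection reasons for a single entry.
#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    UnsafePath,
    LinkEntry,
    TooLarge,
}

/// 5 MiB, the default `max_size` named in the commit message.
pub const DEFAULT_MAX_SIZE: u64 = 5 * 1024 * 1024;

/// Check one entry and return the new running total of uncompressed bytes.
pub fn check_entry(
    path: &Path,
    kind: EntryKind,
    size: u64,
    total_so_far: u64,
    max_size: u64,
) -> Result<u64, Rejection> {
    // Symlinks and hardlinks are rejected outright.
    if matches!(kind, EntryKind::Symlink | EntryKind::Hardlink) {
        return Err(Rejection::LinkEntry);
    }
    // Only plain segments (and `.`) are allowed: this rules out `..`,
    // a leading `/`, and Windows drive prefixes.
    let safe = path
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
    if !safe || path.as_os_str().is_empty() {
        return Err(Rejection::UnsafePath);
    }
    // Track the uncompressed total, guarding against overflow.
    let total = total_so_far.checked_add(size).ok_or(Rejection::TooLarge)?;
    if total > max_size {
        return Err(Rejection::TooLarge);
    }
    Ok(total)
}
```

Running every entry through a check like this before extracting anything is one way to ensure rejected archives never write bytes to the destination.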

Adds `tar`, `flate2`, and `sha2` to `crates/tui/Cargo.toml` and propagates
the resulting lockfile drift to `Cargo.lock`.

Tests: 11 colocated unit tests in `install.rs` + 11 integration tests in
`crates/tui/tests/skill_install.rs` driving a `tiny_http`-based server so
the network gate, download cap, validation pipeline, and atomic rename
all run end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 00:29:48 -05:00

| Name | Last commit | Date |
| --- | --- | --- |
| fixtures | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| support | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| eval_harness.rs | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| integration_mock_llm.rs | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| palette_audit.rs | Footer polish: remove FOOTER_HINT, simplify footer rendering | 2026-04-11 20:20:18 -05:00 |
| protocol_recovery.rs | feat: setup status/clean/dirs and protocol-recovery hardening | 2026-04-25 06:26:07 +00:00 |
| README.md | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| skill_install.rs | feat(skills): #140 community-skill installer module | 2026-04-28 00:29:48 -05:00 |

README.md

`crates/tui/tests/`

Integration tests for the TUI binary. Per CONTRIBUTING.md, each crate's integration tests live in its own tests/ directory; the repository-root tests/ directory is unused.

Mock LLM client (`integration_mock_llm.rs`)

crates/tui/src/llm_client/mock.rs provides a MockLlmClient that implements the LlmClient trait by replaying queue-driven canned responses and capturing every outgoing MessageRequest. Tests mock at the trait boundary — never at the reqwest HTTP layer — because the trait is the durable abstraction the runtime is meant to depend on.
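
The queue-and-capture pattern described above can be sketched as follows. The type names come from this README, but everything else is an assumption made for illustration: the real `LlmClient` trait is presumably async and richer, and the `prompt` field, the `StreamEvent` variants, and the `stream_message`, `push_response`, and `captured_requests` methods are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical, heavily simplified request type.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub prompt: String,
}

/// Hypothetical, heavily simplified stream event.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    Done,
}

/// Synchronous stand-in for the trait boundary that tests mock against.
pub trait LlmClient {
    fn stream_message(&self, request: MessageRequest) -> Result<Vec<StreamEvent>, String>;
}

/// Replays canned responses in FIFO order and records every request.
#[derive(Default)]
pub struct MockLlmClient {
    responses: Mutex<VecDeque<Vec<StreamEvent>>>,
    captured: Mutex<Vec<MessageRequest>>,
}

impl MockLlmClient {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queue one canned response for a future call.
    pub fn push_response(&self, events: Vec<StreamEvent>) {
        self.responses.lock().unwrap().push_back(events);
    }

    /// Snapshot of every request seen so far, in call order.
    pub fn captured_requests(&self) -> Vec<MessageRequest> {
        self.captured.lock().unwrap().clone()
    }
}

impl LlmClient for MockLlmClient {
    fn stream_message(&self, request: MessageRequest) -> Result<Vec<StreamEvent>, String> {
        // Capture first, so the request is observable even if the queue is empty.
        self.captured.lock().unwrap().push(request);
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "mock response queue is empty".to_string())
    }
}
```

A test would queue responses with `push_response`, drive the code under test through the trait, and then assert on `captured_requests()`.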

Coverage today exercises the trait surface end-to-end:

- streaming turn loop
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the bug that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multi-tool-call ordering inside a single turn
- compaction-style non-streaming create_message
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request before stream drain

Four full-engine tests (engine_full_*) are marked #[ignore]. They will be unblocked once core::engine::Engine is refactored to take Arc<dyn LlmClient> instead of a concrete Option<DeepSeekClient>. See the comment block at the bottom of integration_mock_llm.rs for the exact refactor surface.
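
The shape of that refactor, an engine that holds a trait object rather than a concrete client, might look like the sketch below. It assumes nothing about the real `Engine`: the `complete` and `run_turn` methods and the `EchoClient` type are hypothetical stand-ins, and the real trait is presumably async.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real trait (hypothetical method).
pub trait LlmClient: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Engine that depends only on the trait, so tests can inject any client.
pub struct Engine {
    client: Arc<dyn LlmClient>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client }
    }

    /// Hypothetical single-turn entry point that delegates to the client.
    pub fn run_turn(&self, prompt: &str) -> String {
        self.client.complete(prompt)
    }
}

/// Trivial test double; a mock client would slot in the same way.
pub struct EchoClient;

impl LlmClient for EchoClient {
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}
```

With this shape, `Engine::new(Arc::new(EchoClient))` compiles without the engine knowing which client it received.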

`--record` mode for `deepseek eval`

The offline deepseek eval harness now accepts --record <DIR>. When set, each tool step appends one JSON Lines record to <DIR>/<scenario>.jsonl (the default scenario writes offline-tool-loop.jsonl). Each line is a self-contained JSON object with the schema:

{ "request":  { "step": "list_dir", "kind": "List" },
  "response_events": [ { "type": "ok", "output": "…" } ] }
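
Appending one record per step can be sketched with the standard library. This is an assumption-laden illustration, not the harness's code: `append_record` is a hypothetical helper, and it takes the record as a JSON string that has already been serialized.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Append one pre-serialized JSON object as a single line to
/// `<dir>/<scenario>.jsonl`, creating the directory and file if needed.
pub fn append_record(dir: &Path, scenario: &str, json_line: &str) -> io::Result<PathBuf> {
    // JSON Lines requires exactly one record per line.
    if json_line.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "record must be a single line",
        ));
    }
    fs::create_dir_all(dir)?;
    let path = dir.join(format!("{scenario}.jsonl"));
    // Append mode keeps earlier steps intact across calls.
    let mut file = OpenOptions::new().create(true).append(true).open(&path)?;
    writeln!(file, "{json_line}")?;
    Ok(path)
}
```

Because each call appends one complete line, records written before an interrupted run can still be read line by line.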

The mock LLM client (crate::llm_client::mock) replays these fixtures by mapping each response_events array onto a canned Vec<StreamEvent>. Drop generated fixtures into crates/tui/tests/fixtures/ so they ride the repo and feed the mock in CI.

Quick example:

cargo run --bin deepseek -- eval --record crates/tui/tests/fixtures
cat crates/tui/tests/fixtures/offline-tool-loop.jsonl | jq .

The scenario name is sanitized to [A-Za-z0-9_-] before forming the filename, so unusual scenario strings stay portable across platforms.
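
One way to implement such a sanitizer is shown below. The README only states the allowed character set, so the replacement strategy (mapping every other character to `_`) and the `sanitize_scenario` name are assumptions; the real harness may strip or encode characters differently.

```rust
/// Map every character outside [A-Za-z0-9_-] to '_'.
/// Assumption: replacement rather than removal; the README does not say which.
pub fn sanitize_scenario(name: &str) -> String {
    name.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .collect()
}
```

Under this assumption, a scenario named `my scenario/v2` would be recorded as `my_scenario_v2.jsonl`, and names that are already safe pass through unchanged.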
