dgf1988/codewhale


Matt Van Horn de3a1f7773 feat(tui): FauxStep::Factory for live request-shape assertions

Closes #2074

Adds FauxStep::Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>)
to MockLlmClient. When a Factory step is dequeued, its closure runs
against the real outgoing MessageRequest before the response stream is
built, so any assert! panic surfaces directly from the client call
instead of later in stream polling.

Internal storage moves from VecDeque<CannedTurn> to VecDeque<FauxStep>,
but every existing public method keeps working:

- MockLlmClient::new(Vec<CannedTurn>) wraps each turn in FauxStep::Canned.
- push_turn(CannedTurn) appends as FauxStep::Canned.

Adds push_factory(closure) for tests that want the Factory branch.
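The Canned/Factory split can be sketched as follows. This is a minimal sketch with simplified stand-ins. The MessageRequest and CannedTurn structs are hypothetical and far smaller than the real ones in mock.rs. next_turn is also a hypothetical name for the dequeue step, which the real client performs inside its request path before building the response stream.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-ins: the real MessageRequest and CannedTurn in
// crates/tui/src/llm_client/mock.rs carry far more than this.
#[derive(Clone, Debug, PartialEq)]
pub struct MessageRequest {
    pub model: String,
    pub messages: Vec<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct CannedTurn {
    pub text: String,
}

/// One queued step: a fixed turn, or a closure that inspects the live
/// outgoing request before producing the turn.
pub enum FauxStep {
    Canned(CannedTurn),
    Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>),
}

pub struct MockLlmClient {
    steps: Mutex<VecDeque<FauxStep>>,
}

impl MockLlmClient {
    /// Existing constructor keeps its signature: each turn becomes Canned.
    pub fn new(turns: Vec<CannedTurn>) -> Self {
        let steps: VecDeque<FauxStep> = turns.into_iter().map(FauxStep::Canned).collect();
        Self { steps: Mutex::new(steps) }
    }

    pub fn push_turn(&self, turn: CannedTurn) {
        self.steps.lock().unwrap().push_back(FauxStep::Canned(turn));
    }

    pub fn push_factory<F>(&self, f: F)
    where
        F: Fn(&MessageRequest) -> CannedTurn + Send + Sync + 'static,
    {
        self.steps.lock().unwrap().push_back(FauxStep::Factory(Box::new(f)));
    }

    /// Hypothetical dequeue step. A Factory closure runs here, before any
    /// stream exists, so an assert! inside it panics from this call.
    pub fn next_turn(&self, request: &MessageRequest) -> Option<CannedTurn> {
        // The lock guard is a temporary dropped at the end of this statement,
        // so the queue is unlocked before the closure runs.
        let step = self.steps.lock().unwrap().pop_front()?;
        Some(match step {
            FauxStep::Canned(turn) => turn,
            FauxStep::Factory(f) => f(request),
        })
    }
}
```

Because the queue lock is released before the closure runs, a factory in this sketch could itself call push_turn without deadlocking.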

Doc comment on the Factory variant captures the DeepSeek V4
thinking-mode tool-call invariant (the v0.4.9-v0.5.1 reasoning_content
drop that produced HTTP 400 on follow-up turns).

Adds:

- crates/tui/tests/reasoning_content_replayed_after_tool_call.rs — a
  regression test whose factory asserts the assistant tool-call turn
  carries a Thinking content block after a thinking + tool-call round.
- An additional unit test in mock.rs covering create_message_synthesizes_from_factory_turn.

All 20 tests in the new file pass, and the existing
integration_mock_llm suite (27 tests) is unchanged.
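The invariant the regression test's factory checks can be written as a plain predicate over the outgoing messages: every assistant message that issued a tool call must still carry its Thinking block, because dropping it produced the HTTP 400 on follow-up turns. This sketch uses a hypothetical, simplified content model. ContentBlock, Role and Message are not the crate's real types. Inside a factory, the predicate would run under assert! before the canned turn is returned.

```rust
// Hypothetical, simplified message model for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub enum ContentBlock {
    Thinking(String),
    Text(String),
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, output: String },
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Clone, Debug)]
pub struct Message {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

/// True when every assistant message containing a tool call also
/// replays its Thinking block.
pub fn assistant_tool_turns_keep_thinking(messages: &[Message]) -> bool {
    messages
        .iter()
        .filter(|m| m.role == Role::Assistant)
        .filter(|m| m.content.iter().any(|b| matches!(b, ContentBlock::ToolUse { .. })))
        .all(|m| m.content.iter().any(|b| matches!(b, ContentBlock::Thinking(_))))
}
```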

2026-05-30 21:15:58 -07:00

| Name | Last commit | Date |
|---|---|---|
| fixtures | feat(tools): add image_ocr tool — extract text from images via tesseract | 2026-05-12 00:58:48 -05:00 |
| support | refactor(strings): rebrand user-facing strings to codewhale | 2026-05-23 11:48:43 -05:00 |
| eval_harness.rs | style(tui): format shell dispatcher stack | 2026-05-30 19:18:38 -07:00 |
| integration_mock_llm.rs | feat(tools/agent_spawn): teach parent that subagent results are self-reports | 2026-05-10 08:15:19 -05:00 |
| palette_audit.rs | chore(release): prepare v0.8.45 | 2026-05-25 18:45:36 -05:00 |
| protocol_recovery.rs | refactor(strings): rebrand user-facing strings to codewhale | 2026-05-23 11:48:43 -05:00 |
| qa_pty.rs | test(qa-pty): retarget harness at codewhale-tui binary | 2026-05-23 11:11:11 -05:00 |
| README.md | test(tui): #69 integration tests for mock LLM client + record fixtures | 2026-04-28 00:03:18 -05:00 |
| reasoning_content_replayed_after_tool_call.rs | feat(tui): FauxStep::Factory for live request-shape assertions | 2026-05-30 21:15:58 -07:00 |
| skill_install.rs | fix(skills): accept workflow pack archive layouts (#1164) | 2026-05-08 02:37:21 -05:00 |

README.md

`crates/tui/tests/`

Integration tests for the TUI binary. Per CONTRIBUTING.md, each crate's integration tests live in its own tests/ directory; the repository-root tests/ directory is unused.

Mock LLM client (`integration_mock_llm.rs`)

crates/tui/src/llm_client/mock.rs provides a MockLlmClient that implements the LlmClient trait by replaying queue-driven canned responses and capturing every outgoing MessageRequest. Tests mock at the trait boundary — never at the reqwest HTTP layer — because the trait is the durable abstraction the runtime is meant to depend on.
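Mocking at the trait boundary can be sketched as follows, under simplifying assumptions. The LlmClient trait here is a hypothetical synchronous stand-in with a single create_message method, while the real trait also streams. RecordingMock is likewise a hypothetical name. The mock pops canned responses from a queue and records every request, so a test can assert on request shape afterwards.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical, synchronous simplification of the request and trait.
#[derive(Clone, Debug, PartialEq)]
pub struct MessageRequest {
    pub messages: Vec<String>,
}

pub trait LlmClient {
    fn create_message(&self, request: &MessageRequest) -> Result<String, String>;
}

/// Replays queued responses and captures every request it receives.
pub struct RecordingMock {
    responses: Mutex<VecDeque<String>>,
    captured: Mutex<Vec<MessageRequest>>,
}

impl RecordingMock {
    pub fn with_responses(responses: &[&str]) -> Self {
        Self {
            responses: Mutex::new(responses.iter().map(|s| s.to_string()).collect()),
            captured: Mutex::new(Vec::new()),
        }
    }

    /// Snapshot of all requests seen so far, in arrival order.
    pub fn captured(&self) -> Vec<MessageRequest> {
        self.captured.lock().unwrap().clone()
    }
}

impl LlmClient for RecordingMock {
    fn create_message(&self, request: &MessageRequest) -> Result<String, String> {
        // Capture first, so even a request that exhausts the queue is recorded.
        self.captured.lock().unwrap().push(request.clone());
        self.responses
            .lock()
            .unwrap()
            .pop_front()
            .ok_or_else(|| "mock queue exhausted".to_string())
    }
}
```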

Coverage today exercises the trait surface end-to-end:

- streaming turn loop
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the bug that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multi-tool-call ordering inside a single turn
- compaction-style non-streaming create_message
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request before stream drain
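Two of these cases, chunked tool-call input and multi-tool ordering, come down to reassembling streamed fragments. The sketch below assumes a hypothetical ToolDelta event type, not the crate's StreamEvent. It keys fragments by tool-call index so that interleaved deltas still yield complete, correctly ordered inputs.

```rust
use std::collections::BTreeMap;

// Hypothetical streaming deltas for illustration only.
pub enum ToolDelta<'a> {
    Start { index: usize, name: &'a str },
    InputJson { index: usize, partial: &'a str },
}

/// Concatenate partial input-JSON fragments per tool-call index and
/// return (name, full_input) pairs ordered by index.
pub fn assemble_tool_calls(events: &[ToolDelta<'_>]) -> Vec<(String, String)> {
    let mut calls: BTreeMap<usize, (String, String)> = BTreeMap::new();
    for event in events {
        match event {
            ToolDelta::Start { index, name } => {
                calls.entry(*index).or_default().0 = name.to_string();
            }
            ToolDelta::InputJson { index, partial } => {
                calls.entry(*index).or_default().1.push_str(partial);
            }
        }
    }
    calls.into_values().collect()
}
```

A BTreeMap is used instead of a HashMap so that iteration order follows the tool-call index, which is what an ordering assertion would check.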

Four full-engine tests (engine_full_*) are marked #[ignore]. They can be re-enabled once core::engine::Engine is refactored to take an Arc<dyn LlmClient> instead of a concrete Option<DeepSeekClient>. The comment block at the bottom of integration_mock_llm.rs describes the exact refactor surface.
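That refactor is ordinary dependency injection. In this minimal sketch, Engine, run_turn, complete and EchoMock are all hypothetical simplifications of the real engine and trait. Once the engine holds an Arc<dyn LlmClient>, a test can keep its own Arc clone of the mock and inspect it after driving the engine.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical, synchronous stand-in for the real LlmClient trait.
pub trait LlmClient: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Hypothetical engine shape: it depends only on the trait object, so
/// production passes the real client and tests pass a mock.
pub struct Engine {
    client: Arc<dyn LlmClient>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client }
    }

    pub fn run_turn(&self, prompt: &str) -> String {
        self.client.complete(prompt)
    }
}

/// Test double that echoes prompts and records them.
pub struct EchoMock {
    pub seen: Mutex<Vec<String>>,
}

impl LlmClient for EchoMock {
    fn complete(&self, prompt: &str) -> String {
        self.seen.lock().unwrap().push(prompt.to_string());
        format!("echo: {prompt}")
    }
}
```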

`--record` mode for `deepseek eval`

The offline deepseek eval harness accepts --record <DIR>. When it is set, each tool step appends one JSON Lines record to <DIR>/<scenario>.jsonl (for the default scenario, the file is offline-tool-loop.jsonl). Each line is a self-contained JSON object with the following schema (wrapped here for readability):

```json
{ "request":  { "step": "list_dir", "kind": "List" },
  "response_events": [ { "type": "ok", "output": "…" } ] }
```

The mock LLM client (crate::llm_client::mock) replays these fixtures by mapping each response_events array onto a canned Vec<StreamEvent>. Drop generated fixtures into crates/tui/tests/fixtures/ so they ride the repo and feed the mock in CI.
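The recording side can be sketched as follows. The sketch assumes records are hand-serialized with the field names from the schema above and only the "ok" event type. The real harness may use serde_json and emit other event types, and record_line and append_record are hypothetical names. Escaping newlines inside strings is what keeps each record on a single line.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON Lines record with no embedded newlines.
pub fn record_line(step: &str, kind: &str, outputs: &[&str]) -> String {
    let events: Vec<String> = outputs
        .iter()
        .map(|o| format!("{{\"type\":\"ok\",\"output\":\"{}\"}}", json_escape(o)))
        .collect();
    format!(
        "{{\"request\":{{\"step\":\"{}\",\"kind\":\"{}\"}},\"response_events\":[{}]}}",
        json_escape(step),
        json_escape(kind),
        events.join(",")
    )
}

/// Append one record to <dir>/<scenario>.jsonl, creating both if needed.
/// Assumes `scenario` has already been sanitized.
pub fn append_record(dir: &Path, scenario: &str, line: &str) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let path = dir.join(format!("{scenario}.jsonl"));
    let mut file = OpenOptions::new().create(true).append(true).open(&path)?;
    writeln!(file, "{line}")?;
    Ok(path)
}
```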

Quick example:

```shell
cargo run --bin deepseek -- eval --record crates/tui/tests/fixtures
cat crates/tui/tests/fixtures/offline-tool-loop.jsonl | jq .
```

The scenario name is sanitized to [A-Za-z0-9_-] before forming the filename, so unusual scenario strings stay portable across platforms.
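The sanitization can be sketched as follows. The replacement policy is an assumption: this version maps each disallowed character to '_', while the real harness might drop such characters instead. sanitize_scenario and fixture_file_name are hypothetical names.

```rust
/// Map every character outside [A-Za-z0-9_-] to '_' (assumed policy).
/// Non-ASCII letters are replaced too, since they fall outside the set.
pub fn sanitize_scenario(name: &str) -> String {
    name.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .collect()
}

/// File name for a scenario's fixture, e.g. "offline-tool-loop.jsonl".
pub fn fixture_file_name(scenario: &str) -> String {
    format!("{}.jsonl", sanitize_scenario(scenario))
}
```

Replacing rather than dropping characters also keeps path separators such as '/' and '..' from escaping the <DIR> passed to --record.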
