Closes #2074

Adds FauxStep::Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>) to MockLlmClient. When a Factory step is dequeued, its closure runs against the real outgoing MessageRequest before the response stream is built, so any assert! panic surfaces directly from the client call instead of later in stream polling.

Internal storage moves from VecDeque<CannedTurn> to VecDeque<FauxStep>, but every existing public method keeps working:
- MockLlmClient::new(Vec<CannedTurn>) wraps each turn in FauxStep::Canned.
- push_turn(CannedTurn) appends as FauxStep::Canned.

Adds push_factory(closure) for tests that want the Factory branch. The doc comment on the Factory variant captures the DeepSeek V4 thinking-mode tool-call invariant (the v0.4.9-v0.5.1 reasoning_content drop that produced HTTP 400 on follow-up turns).

Adds:
- crates/tui/tests/reasoning_content_replayed_after_tool_call.rs: a regression test whose factory asserts the assistant tool-call turn carries a Thinking content block after a thinking + tool-call round.
- An additional unit test in mock.rs covering create_message_synthesizes_from_factory_turn.

All 20 tests in the new file pass, and the existing integration_mock_llm suite (27 tests) is unchanged.
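The dequeue behaviour described above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in mock.rs: the fields of MessageRequest and CannedTurn, the next_turn helper, the Mutex-based storage, and the &self receivers are all assumptions made for the example. Only the FauxStep variants and the method names new, push_turn, and push_factory come from this PR.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-ins for the real crate types; the fields are invented.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub has_thinking_block: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CannedTurn {
    pub text: String,
}

pub enum FauxStep {
    /// A pre-built response, replayed as-is.
    Canned(CannedTurn),
    /// A closure that inspects the outgoing request and builds the response.
    Factory(Box<dyn Fn(&MessageRequest) -> CannedTurn + Send + Sync>),
}

pub struct MockLlmClient {
    // Assumption: interior mutability so steps can be queued through &self.
    steps: Mutex<VecDeque<FauxStep>>,
}

impl MockLlmClient {
    /// Existing constructor shape: every turn is wrapped in FauxStep::Canned.
    pub fn new(turns: Vec<CannedTurn>) -> Self {
        let steps: VecDeque<FauxStep> = turns.into_iter().map(FauxStep::Canned).collect();
        Self { steps: Mutex::new(steps) }
    }

    pub fn push_turn(&self, turn: CannedTurn) {
        self.steps.lock().unwrap().push_back(FauxStep::Canned(turn));
    }

    pub fn push_factory<F>(&self, factory: F)
    where
        F: Fn(&MessageRequest) -> CannedTurn + Send + Sync + 'static,
    {
        self.steps
            .lock()
            .unwrap()
            .push_back(FauxStep::Factory(Box::new(factory)));
    }

    /// Hypothetical helper: resolve the next step against the outgoing request.
    /// The lock guard is dropped at the end of the `let` statement, so a
    /// panicking factory closure does not poison the queue.
    pub fn next_turn(&self, request: &MessageRequest) -> Option<CannedTurn> {
        let step = self.steps.lock().unwrap().pop_front()?;
        Some(match step {
            FauxStep::Canned(turn) => turn,
            // The closure runs here, before any stream would be built, so an
            // assert! inside it fails at the call site.
            FauxStep::Factory(make) => make(request),
        })
    }
}
```

In a test, a factory pushed with push_factory can assert on the request it receives and then return an ordinary CannedTurn, so canned and factory steps can be mixed freely in one queue.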
crates/tui/tests/
Integration tests for the TUI binary. Per CONTRIBUTING.md, each crate's
integration tests live in its own tests/ directory; the repository-root
tests/ directory is unused.
Mock LLM client (integration_mock_llm.rs)
crates/tui/src/llm_client/mock.rs provides a MockLlmClient that implements
the LlmClient trait by replaying queue-driven canned responses and capturing
every outgoing MessageRequest. Tests mock at the trait boundary — never
at the reqwest HTTP layer — because the trait is the durable abstraction the
runtime is meant to depend on.
Coverage today exercises the trait surface end-to-end:
- streaming turn loop
- reasoning-content replay across tool-call rounds (V4 §5.1.1, the bug that broke v0.4.9-v0.5.1)
- tool-call round-trip with chunked input JSON
- multi-tool-call ordering inside a single turn
- compaction-style non-streaming create_message
- sub-agent style independent parent/child mocks
- capacity-gate observation of a captured request before stream drain
Four full-engine tests (engine_full_*) are #[ignore]-marked. They unblock
when core::engine::Engine is refactored to take Arc<dyn LlmClient> instead
of a concrete Option<DeepSeekClient>. See the comment block at the bottom of
integration_mock_llm.rs for the exact refactor surface.
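To illustrate the shape of that refactor, here is a minimal sketch of an engine that takes its client as Arc<dyn LlmClient>, paired with a mock that captures requests. Everything in it is simplified and hypothetical: the real trait is presumably async and streaming, and the MessageRequest field, run_turn, and RecordingMock are names invented for this example.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical, simplified request type.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageRequest {
    pub prompt: String,
}

// Assumption: a synchronous, single-method stand-in for the real trait.
pub trait LlmClient: Send + Sync {
    fn create_message(&self, request: MessageRequest) -> String;
}

pub struct Engine {
    // The engine depends on the trait object, not on a concrete client.
    client: Arc<dyn LlmClient>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client }
    }

    /// Hypothetical entry point: forwards one prompt to whichever client was injected.
    pub fn run_turn(&self, prompt: &str) -> String {
        self.client.create_message(MessageRequest {
            prompt: prompt.to_string(),
        })
    }
}

/// Hypothetical mock: returns a fixed reply and records every request it sees.
pub struct RecordingMock {
    reply: String,
    captured: Mutex<Vec<MessageRequest>>,
}

impl RecordingMock {
    pub fn new(reply: &str) -> Self {
        Self {
            reply: reply.to_string(),
            captured: Mutex::new(Vec::new()),
        }
    }

    /// Snapshot of the requests captured so far.
    pub fn captured(&self) -> Vec<MessageRequest> {
        self.captured.lock().unwrap().clone()
    }
}

impl LlmClient for RecordingMock {
    fn create_message(&self, request: MessageRequest) -> String {
        self.captured.lock().unwrap().push(request);
        self.reply.clone()
    }
}
```

With this shape a test keeps its own Arc to the mock, hands a clone to the engine, and inspects captured() after the turn has run.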
--record mode for deepseek eval
The offline deepseek eval harness now accepts --record <DIR>. When set,
each tool step appends one JSON Lines record to <DIR>/<scenario>.jsonl
(default scenario: offline-tool-loop.jsonl). Each line is a self-contained
JSON object with the schema:
{ "request": { "step": "list_dir", "kind": "List" },
"response_events": [ { "type": "ok", "output": "…" } ] }
The mock LLM client (crate::llm_client::mock) replays these fixtures by
mapping each response_events array onto a canned Vec<StreamEvent>. Drop
generated fixtures into crates/tui/tests/fixtures/ so they ride the repo and
feed the mock in CI.
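The mapping from a recorded response_events array to canned events might look roughly like the sketch below. It assumes the JSON has already been deserialized. RecordedEvent, the single-variant StreamEvent, and to_stream_events are hypothetical stand-ins, not the crate's real types. Only the "ok" event type appears in the schema above, so anything else is reported as an error rather than guessed at.

```rust
// Hypothetical in-memory form of one entry in "response_events".
#[derive(Debug, Clone, PartialEq)]
pub struct RecordedEvent {
    /// Value of the JSON "type" field.
    pub kind: String,
    /// Value of the JSON "output" field.
    pub output: String,
}

// Hypothetical stand-in for the crate's StreamEvent; the real enum differs.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    ToolOutput(String),
}

/// Convert recorded events into canned stream events, preserving order.
/// Unknown event types yield an error instead of being silently dropped.
pub fn to_stream_events(recorded: &[RecordedEvent]) -> Result<Vec<StreamEvent>, String> {
    recorded
        .iter()
        .map(|event| match event.kind.as_str() {
            "ok" => Ok(StreamEvent::ToolOutput(event.output.clone())),
            other => Err(format!("unrecognized event type: {other}")),
        })
        .collect()
}
```

Failing on unknown types keeps a stale or hand-edited fixture from quietly producing an empty or partial canned turn.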
Quick example:
cargo run --bin deepseek -- eval --record crates/tui/tests/fixtures
cat crates/tui/tests/fixtures/offline-tool-loop.jsonl | jq .
The scenario name is sanitized to [A-Za-z0-9_-] before forming the filename,
so unusual scenario strings stay portable across platforms.
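A sanitizer of this kind could be sketched as follows. The function names are hypothetical, and replacing each disallowed character with an underscore is an assumption: the harness may drop or encode such characters instead.

```rust
/// Keep only [A-Za-z0-9_-]; every other character becomes '_'.
/// Assumption: replacement rather than removal of disallowed characters.
pub fn sanitize_scenario(name: &str) -> String {
    name.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .collect()
}

/// Hypothetical helper: build the fixture file name for a scenario.
pub fn fixture_file_name(scenario: &str) -> String {
    format!("{}.jsonl", sanitize_scenario(scenario))
}
```

Under these assumptions the default scenario name passes through unchanged, while path separators, spaces, and colons are neutralized before the name reaches the filesystem.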