139b542d3f
Add a CI guard test that verifies prefix-cache stability across multi-turn conversations. The test runs 8 test cases × 14-24 turns each: - plain-dialogue (14 turns, with/without reasoning) - long-dialogue (18 turns) - mixed-message-sizes (20 turns) - tool-loop (14 turns, with/without reasoning) - long-tool-loop (24 turns, with/without reasoning) - compaction-must-cause-at-least-one-miss (30 turns) Environment variables: - CODEWHALE_CACHE_GUARD=1: Enable the guard (default: disabled) - CODEWHALE_CACHE_GUARD_THRESHOLD=40: Hit rate threshold (0-100) - CODEWHALE_CACHE_GUARD_STRICT=1: Fail on threshold violation Usage: CODEWHALE_CACHE_GUARD=1 cargo test --test cache_guard CODEWHALE_CACHE_GUARD=1 CODEWHALE_CACHE_GUARD_STRICT=1 cargo test --test cache_guard The mock simulates DeepSeek's server-side prefix cache behavior using byte-prefix matching. The default threshold (40%) is calibrated for the mock; real CI should use CODEWHALE_CACHE_GUARD_THRESHOLD=90 for production-quality validation. 9 tests covering: - 8 multi-turn conversation scenarios - 1 compaction behavior verification