Files
codewhale/crates
Hunter Bown bbdfb26f3c fix(client): TCP/HTTP2 keepalives + stream-error diagnostics (#103 Phase 1+2)
Two fixes for the persistent "Stream read error: error decoding response
body" we saw mid-turn during long V4-pro thinking sessions.

1) HTTP transport tuning (`crates/tui/src/client.rs`):
   - Drop the blanket 300s request timeout. Long V4 thinking turns
     legitimately exceed the wall-clock window; per-chunk and per-stream
     guards in `engine.rs` already bound how long we wait without progress.
   - Add `tcp_keepalive(30s)` so dead-peer detection happens at the TCP
     layer instead of waiting for the application to notice.
   - Add `http2_keep_alive_interval(15s)` + `http2_keep_alive_timeout(20s)`
     so HTTP/2 connections to DeepSeek's edge don't go silent and get
     killed by an upstream proxy mid-thinking.

2) Stream-error diagnostics (`crates/tui/src/client/chat.rs`):
   - Walk reqwest's `std::error::Error::source()` chain when a chunk read
     errors, so the underlying hyper / h2 / io error is logged. Without
     this the outer "error decoding response body" message tells us
     nothing about WHY the stream died.
   - Track elapsed wall time, bytes received so far, and ms since the
     last successful event; log them alongside the error chain. Lets us
     tell HTTP/2 RST_STREAM mid-idle from chunk-decode-failure on a
     short stream from gzip-corruption mid-burst.

Phase 3 (transparent retry with `prefix` continuation) is intentionally
NOT in this PR. The retry-flag plumbing on MessageRequest + chat.rs prefix
wire format + engine.rs retry loop is a meaningful surface that deserves
its own review pass; this PR ships the diagnostic-and-resilience floor so
we can land the harder retry work knowing the underlying network state is
better.

Refs #103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:19:42 -05:00
..