hexin a528ea9824 fix(streaming): preserve all tool_calls in OpenAI batch responses (#1686)
When an OpenAI-compatible backend (vLLM, SGLang, Ollama, LM Studio,
Together AI, etc., hosted or self-hosted) streams an assistant message containing
multiple tool_calls in a single round, only the **last** tool's
`Event::ToolCallStarted` was firing. The preceding N-1 tool calls
executed and produced tool_result events, but never announced their
start to consumers (TUI / runtime API / embedder bridges), leaving them
with N-1 orphan tool_result blocks that have no matching tool_use blocks
in the assistant history.

## Reproduction

```text
backend dispatches:   7 × write_file + 1 × exec_shell
log shows:            7 × ApprovalRequired events ✓
listeners receive:    1 × chat:tool_start, 7 × chat:tool_end
session history:      1 tool_use + 7 tool_result (6 orphans)
```

Tested against vLLM 0.7 + Qwen3.6-35B-A3B with a "scaffold 7-file Tauri
template" prompt. Any model+backend combo that emits batch tool_calls
trips this — typical when a single LLM round asks for multiple parallel
file writes or edits.

## Root cause

`run_turn` tracked the currently-streaming tool block with a single
`current_tool_index: Option<usize>`. The Anthropic-style adapter
(non-streaming response → events at `chat.rs::L1807`) emits
Start/Stop pairs in lockstep so the slot never overlaps. But the
OpenAI streaming parser (`chat.rs::L1954-2064`) emits every
`ContentBlockStart::ToolUse` as soon as a tool_call delta lands, then
batches every `ContentBlockStop` at `finish_reason`:

```text
Start { index: 0 }       // tool #1
Delta { index: 0, .. }
Start { index: 1 }       // tool #2 — overwrites current_tool_index
Delta { index: 1, .. }
…
Start { index: 6 }       // current_tool_index = Some(6)
Delta { index: 6, .. }
Stop  { index: 0 }       // take() returns Some(6)  ← wrong tool!
Stop  { index: 1 }       // take() returns None
Stop  { index: 2 }       // take() returns None
…
```

The first `Stop` consumes the last index and emits `ToolCallStarted`
for the wrong `tool_uses` entry; every subsequent `Stop` finds the
slot already `None` and skips the entire `if let Some(index) = …`
branch, dropping the announcement.
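
The overwrite can be reproduced in isolation. The following is a minimal sketch, not the actual `run_turn` code: `BlockEvent`, `openai_style_batch`, and `route_with_single_slot` are hypothetical names, and the stream is reduced to Start/Stop events. It assumes each Start stores the new `tool_uses` position in the single slot and each Stop calls `take()` on it.

```rust
/// Simplified stand-in for the streamed block events (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum BlockEvent {
    Start { index: u32 },
    Stop { index: u32 },
}

/// Builds the OpenAI-style ordering: every Start first, then every Stop.
pub fn openai_style_batch(n: u32) -> Vec<BlockEvent> {
    let starts = (0..n).map(|index| BlockEvent::Start { index });
    let stops = (0..n).map(|index| BlockEvent::Stop { index });
    starts.chain(stops).collect()
}

/// Replays the events through one `Option<usize>` slot. For each Stop,
/// records the block index and the `tool_uses` position it was routed to
/// (`None` means the announcement would have been skipped).
pub fn route_with_single_slot(events: &[BlockEvent]) -> Vec<(u32, Option<usize>)> {
    let mut current_tool_index: Option<usize> = None;
    let mut tool_uses_len = 0usize;
    let mut routed = Vec::new();

    for event in events {
        match *event {
            BlockEvent::Start { .. } => {
                // Each new Start overwrites whatever the slot held before.
                current_tool_index = Some(tool_uses_len);
                tool_uses_len += 1;
            }
            BlockEvent::Stop { index } => {
                // The first Stop drains the slot; later Stops see `None`.
                routed.push((index, current_tool_index.take()));
            }
        }
    }
    routed
}
```

With seven batched tool calls, the first Stop (block index 0) is routed to position 6 and the remaining six Stops are routed nowhere, which matches the trace above.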

## Fix

Replace the single slot with a `HashMap<u32, usize>`
(`current_tool_indices`) that maps each block index to its position in
`tool_uses`:

- `ContentBlockStart::ToolUse` and `::ServerToolUse` insert the
  `(event.index → tool_uses.len())` mapping.
- `InputJsonDelta` looks up by the `ContentBlockDelta` outer index.
- `ContentBlockStop` removes by the stop's index, so each Stop routes
  to its own `tool_uses` entry regardless of arrival order.

Routing no longer depends on `current_block_kind` (which has the same
single-slot overwrite problem); `current_tool_indices.remove(&index)`
returning `Some(_)` already proves the Stop belongs to a tool block.
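
The routing described above can be sketched as follows. This is an illustration under stated assumptions rather than the patched code: `BlockEvent` and `route_with_index_map` are hypothetical names, tool inputs are modeled as plain strings, and the real event types carry more fields.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the streamed block events (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum BlockEvent {
    Start { index: u32 },
    Delta { index: u32, partial_json: &'static str },
    Stop { index: u32 },
}

/// Routes events through a block-index -> `tool_uses` position map.
/// Returns the accumulated input JSON per tool and the positions
/// announced, in the order their Stops arrived.
pub fn route_with_index_map(events: &[BlockEvent]) -> (Vec<String>, Vec<usize>) {
    let mut current_tool_indices: HashMap<u32, usize> = HashMap::new();
    let mut tool_inputs: Vec<String> = Vec::new();
    let mut announced: Vec<usize> = Vec::new();

    for event in events {
        match *event {
            BlockEvent::Start { index } => {
                // Record where this block's tool will live in `tool_inputs`.
                current_tool_indices.insert(index, tool_inputs.len());
                tool_inputs.push(String::new());
            }
            BlockEvent::Delta { index, partial_json } => {
                // Look up by the delta's own index, not "the current tool".
                if let Some(&pos) = current_tool_indices.get(&index) {
                    tool_inputs[pos].push_str(partial_json);
                }
            }
            BlockEvent::Stop { index } => {
                // `Some(_)` here means the Stop belongs to a tool block.
                if let Some(pos) = current_tool_indices.remove(&index) {
                    announced.push(pos);
                }
            }
        }
    }
    (tool_inputs, announced)
}
```

Because every lookup is keyed by the event's own index, interleaved deltas land in the right tool, batched Stops each announce their own entry, and a Stop for an index that was never started is ignored.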

## Tests

Added `batch_tool_calls_preserve_all_tool_use_indices` in
`core/engine/turn_loop.rs::tests` — feeds 7 Starts and 7 Stops through
the same `HashMap` API used by `run_turn`, asserts every index round-trips.

Manual end-to-end verification: vLLM + Qwen3.6-35B + 7-file Tauri
template prompt → frontend `messages` history now contains all 7
`write_file` tool_use blocks paired with their tool_result blocks.

Co-authored-by: hexin <he.xin@h3c.com>
2026-05-15 17:55:44 -05:00