fix(streaming): preserve all tool_calls in OpenAI batch responses (#1686)

When an OpenAI-compatible backend (vLLM, Ollama, LM Studio, Together AI, SGLang, etc.) streams an assistant message containing multiple tool_calls in a single round, only the **last** tool's `Event::ToolCallStarted` was firing. The preceding N-1 tool calls executed and produced tool_result events, but never announced their start to consumers (TUI / runtime API / embedder bridges), leaving them with N-1 orphan tool_result blocks and no matching tool_use blocks in the assistant history.

## Reproduction

```text
backend dispatches:  7 × write_file + 1 × exec_shell
log shows:           7 × ApprovalRequired events ✓
listeners receive:   1 × chat:tool_start, 7 × chat:tool_end
session history:     1 tool_use + 7 tool_result (6 orphans)
```

Tested against vLLM 0.7 + Qwen3.6-35B-A3B with a "scaffold 7-file Tauri template" prompt. Any model+backend combo that emits batch tool_calls trips this; it is typical when a single LLM round asks for multiple parallel file writes or edits.

## Root cause

`run_turn` tracked the currently-streaming tool block with a single `current_tool_index: Option<usize>`. The Anthropic-style adapter (non-streaming response → events at `chat.rs::L1807`) emits Start/Stop pairs in lockstep, so the slot never holds more than one open block. But the OpenAI streaming parser (`chat.rs::L1954-2064`) emits every `ContentBlockStart::ToolUse` as soon as a tool_call delta lands, then batches every `ContentBlockStop` at `finish_reason`:

```text
Start { index: 0 }      // tool #1
Delta { index: 0, .. }
Start { index: 1 }      // tool #2: overwrites current_tool_index
Delta { index: 1, .. }
…
Start { index: 6 }      // current_tool_index = Some(6)
Delta { index: 6, .. }
Stop  { index: 0 }      // take() returns Some(6)  ← wrong tool!
Stop  { index: 1 }      // take() returns None
Stop  { index: 2 }      // take() returns None
…
```

The first `Stop` consumes the last index and emits `ToolCallStarted` for the wrong `tool_uses` entry; every subsequent `Stop` finds the slot already `None` and skips the entire `if let Some(index) = …` branch, dropping the announcement.

## Fix

Replace the single slot with `HashMap<u32 block_index, usize tool_uses_idx>`:

- `ContentBlockStart::ToolUse` and `::ServerToolUse` insert the `(event.index → tool_uses.len())` mapping.
- `InputJsonDelta` looks up by the `ContentBlockDelta` outer index.
- `ContentBlockStop` removes by the stop's index, so each Stop routes to its own `tool_uses` entry regardless of arrival order.

Routing no longer depends on `current_block_kind` (which has the same single-slot overwrite problem); `current_tool_indices.remove(&index)` returning `Some(_)` already proves the Stop belongs to a tool block.

## Tests

Added `batch_tool_calls_preserve_all_tool_use_indices` in `core/engine/turn_loop.rs::tests`, which feeds 7 Starts and 7 Stops through the same `HashMap` API used by `run_turn` and asserts that every index round-trips.

Manual end-to-end verification: vLLM + Qwen3.6-35B + 7-file Tauri template prompt → frontend `messages` history now contains all 7 `write_file` tool_use blocks paired with their tool_result blocks.

Co-authored-by: hexin <he.xin@h3c.com>
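
## Illustration

The overwrite described under "Root cause" can be reproduced outside the engine. The following is a minimal sketch, not the code in `run_turn`: the `Ev` enum and the functions `route_single_slot`, `route_by_block_index`, and `batch` are hypothetical stand-ins that keep only the block index of each event, and `next_pos` models `tool_uses.len()`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stream event: only the block index is kept.
/// The real events also carry ids, names, and JSON deltas.
#[derive(Clone, Copy, Debug)]
enum Ev {
    Start(u32),
    Stop(u32),
}

/// Old strategy: one shared slot that every Start overwrites.
/// Returns, per Stop, the tool_uses position that would be announced
/// (`None` means the announcement is dropped).
fn route_single_slot(events: &[Ev]) -> Vec<Option<usize>> {
    let mut slot: Option<usize> = None;
    let mut next_pos = 0usize; // models tool_uses.len()
    let mut announced = Vec::new();
    for ev in events {
        match *ev {
            Ev::Start(_) => {
                slot = Some(next_pos);
                next_pos += 1;
            }
            // The Stop's own index is ignored; whatever is in the slot wins.
            Ev::Stop(_) => announced.push(slot.take()),
        }
    }
    announced
}

/// New strategy: key the position by block index, remove it on Stop.
fn route_by_block_index(events: &[Ev]) -> Vec<Option<usize>> {
    let mut by_index: HashMap<u32, usize> = HashMap::new();
    let mut next_pos = 0usize; // models tool_uses.len()
    let mut announced = Vec::new();
    for ev in events {
        match *ev {
            Ev::Start(index) => {
                by_index.insert(index, next_pos);
                next_pos += 1;
            }
            Ev::Stop(index) => announced.push(by_index.remove(&index)),
        }
    }
    announced
}

/// Batch ordering: all Starts first, then all Stops.
fn batch(n: u32) -> Vec<Ev> {
    (0..n).map(Ev::Start).chain((0..n).map(Ev::Stop)).collect()
}

fn main() {
    let events = batch(3);
    // prints: single slot: [Some(2), None, None]
    println!("single slot: {:?}", route_single_slot(&events));
    // prints: by index:    [Some(0), Some(1), Some(2)]
    println!("by index:    {:?}", route_by_block_index(&events));
}
```

With lockstep Start/Stop pairs both strategies give the same result, which is consistent with the bug only surfacing on backends that batch their Stops.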
2026-05-16 06:55:44 +08:00
parent b080891efa
commit a528ea9824
1 changed file with 83 additions and 8 deletions
@@ -355,7 +355,17 @@ impl Engine {
                ..Usage::default()
            };
            let mut current_block_kind: Option<ContentBlockKind> = None;
-            let mut current_tool_index: Option<usize> = None;
+            // Map block_index → tool_uses position. Required because the
+            // OpenAI-compatible streaming parser emits multiple
+            // ContentBlockStart::ToolUse events back-to-back (one per
+            // tool_call in a batch) before any ContentBlockStop arrives —
+            // all Stops are flushed together at `finish_reason`. A single
+            // Option<usize> gets overwritten by each new Start; the first
+            // Stop then takes the last index, and every subsequent Stop
+            // takes `None`, dropping ToolCallStarted events for every
+            // tool call except the last one in the batch.
+            let mut current_tool_indices: std::collections::HashMap<u32, usize> =
+                std::collections::HashMap::new();
            let mut in_tool_call_block = false;
            let mut fake_wrapper_notice_emitted = false;
            let mut pending_message_complete = false;
@@ -566,7 +576,7 @@ impl Engine {
                                name, input
                            ));
                            current_block_kind = Some(ContentBlockKind::ToolUse);
-                            current_tool_index = Some(tool_uses.len());
+                            current_tool_indices.insert(index, tool_uses.len());
                            // ToolCallStarted is deferred to ContentBlockStop —
                            // see `final_tool_input`. Emitting here would ship
                            // the placeholder `{}` and the cell would render
@@ -585,7 +595,7 @@ impl Engine {
                                name, input
                            ));
                            current_block_kind = Some(ContentBlockKind::ToolUse);
-                            current_tool_index = Some(tool_uses.len());
+                            current_tool_indices.insert(index, tool_uses.len());
                            tool_uses.push(ToolUseState {
                                id,
                                name,
@@ -634,8 +644,8 @@ impl Engine {
                            }
                        }
                        Delta::InputJsonDelta { partial_json } => {
-                            if let Some(index) = current_tool_index
-                                && let Some(tool_state) = tool_uses.get_mut(index)
+                            if let Some(&tool_idx) = current_tool_indices.get(&index)
+                                && let Some(tool_state) = tool_uses.get_mut(tool_idx)
                            {
                                tool_state.input_buffer.push_str(&partial_json);
                                crate::logging::info(format!(
@@ -669,9 +679,15 @@ impl Engine {
                            }
                            Some(ContentBlockKind::ToolUse) | None => {}
                        }
-                        if matches!(stopped_kind, Some(ContentBlockKind::ToolUse))
-                            && let Some(index) = current_tool_index.take()
-                            && let Some(tool_state) = tool_uses.get_mut(index)
+                        // Route the Stop using event.index (via
+                        // `current_tool_indices`) rather than the single
+                        // `current_block_kind` slot. In an OpenAI batch
+                        // tool-call stream every Stop after the first sees
+                        // `stopped_kind = None` because `take()` cleared the
+                        // slot, so the original `matches!(stopped_kind, …)`
+                        // check would skip every tool except the last.
+                        if let Some(tool_idx) = current_tool_indices.remove(&index)
+                            && let Some(tool_state) = tool_uses.get_mut(tool_idx)
                        {
                            crate::logging::info(format!(
                                "Tool '{}' block stop. Buffer: '{}', Current input: {:?}",
@@ -2024,6 +2040,65 @@ mod tests {
        assert!(!should_hold_turn_for_subagents(0, 0));
    }

+    /// Regression test for the OpenAI streaming batch tool_calls bug.
+    ///
+    /// Background: when an OpenAI-compatible backend (vLLM, Ollama, LM Studio,
+    /// etc.) streams a response containing multiple `tool_calls` in the same
+    /// assistant message, the streaming parser emits the events in this order:
+    ///
+    /// ```text
+    /// ContentBlockStart::ToolUse { index: 0, .. }   // tool #1
+    /// ContentBlockDelta { index: 0, .. }            // its arguments
+    /// ContentBlockStart::ToolUse { index: 1, .. }   // tool #2
+    /// ContentBlockDelta { index: 1, .. }
+    /// …
+    /// ContentBlockStart::ToolUse { index: N-1, .. }
+    /// ContentBlockDelta { index: N-1, .. }
+    /// ContentBlockStop { index: 0 }                 // ── only flushed at
+    /// ContentBlockStop { index: 1 }                 //    finish_reason
+    /// …                                             //    (see chat.rs
+    /// ContentBlockStop { index: N-1 }               //    L2050-L2064)
+    /// ```
+    ///
+    /// All Starts arrive before any Stop. The fix replaces the single
+    /// `current_tool_index: Option<usize>` slot (overwritten by each Start)
+    /// with a `HashMap<u32 block_index, usize tool_uses_idx>` that survives
+    /// every Start and routes each Stop to the right `tool_uses` entry.
+    ///
+    /// This test confirms the invariant: feed 7 Starts then 7 Stops, expect
+    /// all 7 indices to come back out in order.
+    #[test]
+    fn batch_tool_calls_preserve_all_tool_use_indices() {
+        let mut current_tool_indices: std::collections::HashMap<u32, usize> =
+            std::collections::HashMap::new();
+
+        // Simulate `ContentBlockStart::ToolUse { index: i }` for 7 tools.
+        for block_index in 0..7u32 {
+            current_tool_indices.insert(block_index, block_index as usize);
+        }
+        assert_eq!(current_tool_indices.len(), 7);
+
+        // Now drain via `ContentBlockStop { index: i }` in the same order.
+        let mut recovered: Vec<(u32, usize)> = (0..7u32)
+            .map(|block_index| {
+                let tool_idx = current_tool_indices
+                    .remove(&block_index)
+                    .expect("each block_index must route to a tool_uses entry");
+                (block_index, tool_idx)
+            })
+            .collect();
+        recovered.sort_by_key(|(block_index, _)| *block_index);
+        let expected: Vec<(u32, usize)> = (0..7u32).map(|i| (i, i as usize)).collect();
+        assert_eq!(
+            recovered, expected,
+            "every Stop must recover the tool_uses index pushed by its matching Start"
+        );
+        assert!(
+            current_tool_indices.is_empty(),
+            "all entries must drain after their Stops"
+        );
+    }
+
    #[test]
    fn loop_guard_block_tool_result_counts_as_failure() {
        let result = loop_guard_block_tool_result("Blocked: repeated call".to_string());