docs: harvest provider fallback chain RFC
Harvested from PR #2581 by @idling11. Co-authored-by: idling11 <8055620+idling11@users.noreply.github.com>
This commit is contained in:
@@ -0,0 +1,167 @@
|
||||
# RFC: Provider Fallback Chain
|
||||
|
||||
**Issue:** #2574
|
||||
**Reporter:** @hsdbeebou
|
||||
**Design source:** #2581 by @idling11
|
||||
**Status:** Draft for the v0.9 provider-routing lane
|
||||
**Date:** 2026-06-04
|
||||
|
||||
## Problem
|
||||
|
||||
CodeWhale can store credentials and defaults for several providers, but a
|
||||
running session uses one active provider route at a time. When that provider
|
||||
hits a rate limit, temporary outage, or transport failure, the user must notice
|
||||
the failure, run `/provider`, choose another route, and resubmit the turn.
|
||||
|
||||
That manual switch is especially disruptive during long-running agentic work.
|
||||
A provider fallback chain can keep work moving, but it also changes billing
|
||||
source, model behavior, tool support, context-window limits, and vendor
|
||||
expectations. The design must make that switch explicit and capability-aware.
|
||||
|
||||
## Principles
|
||||
|
||||
- Fallback is opt-in. No provider switch happens unless the user configured a
|
||||
fallback chain.
|
||||
- Billing and vendor changes are visible in the transcript and status UI.
|
||||
- Normal retry policy runs before fallback.
|
||||
- Fallback is allowed only before assistant content or tool calls have started
|
||||
streaming for the failing request.
|
||||
- Fallback candidates must support the request shape for the current turn.
|
||||
- Authentication, authorization, malformed request, and model-not-found errors
|
||||
do not silently switch providers by default.
|
||||
|
||||
## Proposed Config Shape
|
||||
|
||||
Keep the existing root `provider = "..."` setting as the primary route. Add an
|
||||
ordered fallback list and a small policy section:
|
||||
|
||||
```toml
|
||||
provider = "nvidia-nim"
|
||||
fallback_providers = ["deepseek", "openrouter"]
|
||||
|
||||
[provider_fallback]
|
||||
enabled = true
|
||||
reset_on_new_session = true
|
||||
```
|
||||
|
||||
Rules:
|
||||
|
||||
- `fallback_providers` is ordered and contains provider IDs already accepted by
|
||||
the provider parser.
|
||||
- The primary provider is not repeated in the fallback list.
|
||||
- Duplicate fallback providers are rejected.
|
||||
- Missing credentials produce a startup warning and make that fallback entry
|
||||
inactive until credentials appear.
|
||||
- If `provider_fallback.enabled` is absent, the presence of a non-empty
|
||||
`fallback_providers` list enables fallback.
|
||||
|
||||
## Fallback Eligibility
|
||||
|
||||
| Failure | Fallback by default? | Notes |
|
||||
| --- | --- | --- |
|
||||
| HTTP 429 | Yes | Rate limit or quota exhaustion on the active route. |
|
||||
| HTTP 502, 503, 504 | Yes | Temporary upstream failure after normal retries. |
|
||||
| Connect timeout / DNS failure | Yes | Transport path failed before content streamed. |
|
||||
| HTTP 401 / 403 | No | Usually bad credentials or account permissions. |
|
||||
| HTTP 400 | No | Usually client request shape or model parameter issue. |
|
||||
| Model not found | No | Avoid silently switching model families unless a future policy explicitly opts in. |
|
||||
| Stream interrupted after content | No | The transcript may already contain partial assistant content or tool-call deltas. |
|
||||
|
||||
The first implementation should classify errors centrally and expose tests for
|
||||
each case before any fallback execution is wired into the turn loop.
|
||||
|
||||
## Capability Gate
|
||||
|
||||
Before switching to a fallback provider/model, CodeWhale checks that the
|
||||
candidate can support the current request shape:
|
||||
|
||||
| Requirement | Gate |
|
||||
| --- | --- |
|
||||
| Tool calls | Candidate provider/model must support tool calling. |
|
||||
| Reasoning effort | Candidate must support the requested thinking mode, or the switch is blocked. |
|
||||
| Context size | Candidate context window must fit the estimated current request. |
|
||||
| Image inputs | Candidate must support vision if the turn includes images. |
|
||||
| Provider-specific headers | Candidate request must be rebuilt from that provider's own auth/base-url/header rules. |
|
||||
|
||||
If no fallback candidate passes the gate, CodeWhale surfaces the original
|
||||
provider error with a clear "fallback chain exhausted or incompatible" note.
|
||||
|
||||
## Runtime Behavior
|
||||
|
||||
1. Build the request for the active provider.
|
||||
2. Run existing retry policy for that provider.
|
||||
3. If retries exhaust with a fallback-eligible failure and no assistant content
|
||||
has streamed, evaluate the next fallback provider.
|
||||
4. Rebuild the request with the fallback provider's model, base URL, auth, and
|
||||
provider-specific headers.
|
||||
5. Add a visible transcript marker and status event before the fallback request
|
||||
starts.
|
||||
6. Continue through the chain until a provider succeeds, the chain is
|
||||
exhausted, or a non-eligible failure occurs.
|
||||
|
||||
Suggested transcript marker:
|
||||
|
||||
```text
|
||||
[provider fallback: nvidia-nim -> deepseek, reason: rate_limit]
|
||||
```
|
||||
|
||||
Suggested status text:
|
||||
|
||||
```text
|
||||
NVIDIA NIM unavailable; switched to DeepSeek fallback
|
||||
```
|
||||
|
||||
For multi-request turns, such as tool-call result follow-ups, fallback can be
|
||||
considered for a later request only if that later request has not started
|
||||
streaming assistant content yet. The transcript marker must identify that the
|
||||
turn changed provider between requests.
|
||||
|
||||
## UI and Commands
|
||||
|
||||
- `/provider` should show the primary route and the current fallback position.
|
||||
- `/provider reset` should return to the primary provider for future requests in
|
||||
the current session.
|
||||
- The footer/statusline should surface the concrete provider/model that actually
|
||||
handled the latest request.
|
||||
- Session receipts should record both attempted provider and successful
|
||||
provider so cost and debugging information stay truthful.
|
||||
|
||||
## Implementation Slices
|
||||
|
||||
1. Config schema and validation:
|
||||
- parse `fallback_providers` and `[provider_fallback]`
|
||||
- validate known providers, duplicates, missing credentials, and primary
|
||||
self-reference
|
||||
- document the config surface
|
||||
2. Error classification:
|
||||
- define fallback-eligible error kinds
|
||||
- add unit tests for HTTP and transport failures
|
||||
3. Request-shape capability gate:
|
||||
- evaluate tool, thinking, context, and image requirements
|
||||
- add tests for incompatible fallbacks
|
||||
4. Fallback execution:
|
||||
- run retries per provider before moving to the next provider
|
||||
- rebuild auth/base-url/header state for each candidate
|
||||
- block fallback after partial streaming
|
||||
5. UI/receipt integration:
|
||||
- status event
|
||||
- transcript marker
|
||||
- `/provider reset`
|
||||
- receipt fields for attempted and selected provider
|
||||
|
||||
## Non-goals
|
||||
|
||||
- No automatic cost optimization or weighted provider selection.
|
||||
- No silent fallback when authentication or permissions fail.
|
||||
- No fallback after partial assistant content or tool-call deltas have streamed.
|
||||
- No provider/model capability downgrades without an explicit future policy.
|
||||
- No sub-agent-specific fallback policy in the first implementation; sub-agents
|
||||
inherit the same configured fallback chain unless they are given an explicit
|
||||
provider/model override.
|
||||
|
||||
## Credit
|
||||
|
||||
This RFC is based on issue #2574 from @hsdbeebou and PR #2581 from @idling11.
|
||||
The original PR head currently has no net file changes, so this document
|
||||
preserves the useful design direction while tightening the v0.9 contract around
|
||||
truthful provider routing, billing visibility, and capability checks.
|
||||
Reference in New Issue
Block a user