docs: harvest provider fallback chain RFC

Harvested from PR #2581 by @idling11.

Co-authored-by: idling11 <8055620+idling11@users.noreply.github.com>
@@ -551,6 +551,7 @@ without recreating skills the user deliberately deleted.
 | [LOCALIZATION.md](docs/LOCALIZATION.md) | UI locale matrix & switching |
 | [OPERATIONS_RUNBOOK.md](docs/OPERATIONS_RUNBOOK.md) | Ops & recovery |
 | [V0_9_0_EXECUTION_MAP.md](docs/V0_9_0_EXECUTION_MAP.md) | v0.9.0 issue lanes, PR harvest state, and release gates |
+| [2574-provider-fallback-chain.md](docs/rfcs/2574-provider-fallback-chain.md) | Provider fallback chain RFC |

 Full Changelog: [CHANGELOG.md](CHANGELOG.md).

@@ -42,6 +42,7 @@ harvest/stewardship commits:
 | #2636 project-context mtime cache | Defer direct merge; harvest only after cache key/signature is widened. | Must include constitution changes, auto-generated context deletion, canonical path equivalence, and overwrite detection before landing. |
 | #2634 HarmonyOS port | Defer direct merge; draft has broad platform and TLS/runtime blast radius. | Harvest at most the unused `rustyline` cleanup after local verification; full port needs OHOS target checks and sandbox/security review. |
 | #2687 append-only mode/approval prompt | Defer direct merge; draft has compile failures and Plan-mode prompt correctness risks. | Any future harvest must keep stable `message[0]` genuinely mode-agnostic, preserve mode/approval suffixes after capacity replans, and distinguish external overrides from persisted generated prompts. |
+| #2581 provider fallback chain design doc | Manually harvested as `docs/rfcs/2574-provider-fallback-chain.md` because the current PR head has no net file changes. | Keep issue #2574 open for implementation; close/comment on #2581 after the integration branch is public, crediting @idling11 and reporter @hsdbeebou. |

 ## PR Harvest Queue

@@ -85,7 +86,7 @@ harvest/stewardship commits:
 | #2576 PrefixCacheChange events | Mergeable | Review after current prefix-cache commits. |
 | #2578 turn_end observer hook | Conflicting | Defer to hook lifecycle lane. |
 | #2579 AppendLog session messages | Conflicting | Defer; large architectural change. |
-| #2581 provider fallback chain design doc | Mergeable | Docs-only; review for current provider direction. |
+| #2581 provider fallback chain design doc | Mergeable / empty diff | Manually harvested into `docs/rfcs/2574-provider-fallback-chain.md`; close original PR after branch is public, keep #2574 open for implementation. |
 | #2623 plan prompt modal scroll support | Mergeable | Already harvested into the 22-commit stack. Comment/close original after integration branch is public. |
 | #2627 Xiaomi MiMo Token Plan mode | Conflicting | Partially harvested; leave original open or comment with remaining mode/env scope once branch is public. |
 | #2631 estimated_input_tokens cache | Mergeable | Already harvested into the 22-commit stack. |
@@ -120,7 +121,7 @@ Issue count should drop through evidence-backed consolidation, not bulk closing.

 ## Immediate Next Actions

-1. Review #2048, #2502, #2509, #2513, #2530, #2576, and #2581 as the next small
+1. Review #2048, #2502, #2509, #2513, #2530, and #2576 as the next small
    mergeable candidates.
 2. Prepare public comments for #2708, #2627, #2634, #2636, #2687, and already-harvested performance
    PRs once this integration branch has a remote review surface.
@@ -0,0 +1,167 @@
# RFC: Provider Fallback Chain

**Issue:** #2574
**Reporter:** @hsdbeebou
**Design source:** #2581 by @idling11
**Status:** Draft for the v0.9 provider-routing lane
**Date:** 2026-06-04

## Problem

CodeWhale can store credentials and defaults for several providers, but a
running session uses one active provider route at a time. When that provider
hits a rate limit, temporary outage, or transport failure, the user must notice
the failure, run `/provider`, choose another route, and resubmit the turn.

That manual switch is especially disruptive during long-running agentic work.
A provider fallback chain can keep work moving, but it also changes billing
source, model behavior, tool support, context-window limits, and vendor
expectations. The design must make that switch explicit and capability-aware.

## Principles

- Fallback is opt-in. No provider switch happens unless the user configured a
  fallback chain.
- Billing and vendor changes are visible in the transcript and status UI.
- Normal retry policy runs before fallback.
- Fallback is allowed only before assistant content or tool calls have started
  streaming for the failing request.
- Fallback candidates must support the request shape for the current turn.
- Authentication, authorization, malformed request, and model-not-found errors
  do not silently switch providers by default.

## Proposed Config Shape

Keep the existing root `provider = "..."` setting as the primary route. Add an
ordered fallback list and a small policy section:

```toml
provider = "nvidia-nim"
fallback_providers = ["deepseek", "openrouter"]

[provider_fallback]
enabled = true
reset_on_new_session = true
```

Rules:

- `fallback_providers` is ordered and contains provider IDs already accepted by
  the provider parser.
- The primary provider is not repeated in the fallback list.
- Duplicate fallback providers are rejected.
- Missing credentials produce a startup warning and make that fallback entry
  inactive until credentials appear.
- If `provider_fallback.enabled` is absent, the presence of a non-empty
  `fallback_providers` list enables fallback.

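The config rules above can be illustrated with a minimal validation sketch. It is written in Python purely for brevity and is not CodeWhale code: `KNOWN_PROVIDERS`, `FallbackPlan`, and `resolve_fallback_plan` are hypothetical names, the config is assumed to be already parsed into a mapping, and treating a repeated primary as a hard error (rather than a warning) is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the IDs the real provider parser accepts.
KNOWN_PROVIDERS = {"nvidia-nim", "deepseek", "openrouter"}


@dataclass
class FallbackPlan:
    enabled: bool
    active: list = field(default_factory=list)    # usable fallbacks, in order
    warnings: list = field(default_factory=list)  # startup warnings


def resolve_fallback_plan(config, providers_with_credentials):
    """Validate the fallback settings of an already-parsed config mapping."""
    primary = config["provider"]
    chain = list(config.get("fallback_providers", []))

    seen = set()
    for provider_id in chain:
        if provider_id not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown fallback provider: {provider_id}")
        if provider_id == primary:
            # Assumption: a repeated primary is rejected outright.
            raise ValueError(f"primary provider repeated in fallback list: {provider_id}")
        if provider_id in seen:
            raise ValueError(f"duplicate fallback provider: {provider_id}")
        seen.add(provider_id)

    # An absent `enabled` key means "on when the list is non-empty".
    policy = config.get("provider_fallback", {})
    enabled = policy.get("enabled", bool(chain))

    plan = FallbackPlan(enabled=enabled)
    for provider_id in chain:
        if provider_id in providers_with_credentials:
            plan.active.append(provider_id)
        else:
            # Missing credentials only deactivate the entry; they are not fatal.
            plan.warnings.append(
                f"fallback provider {provider_id} has no credentials; entry is inactive"
            )
    return plan
```

With the example config and credentials stored only for `nvidia-nim` and `deepseek`, this sketch would keep `deepseek` active, warn about `openrouter`, and report fallback as enabled.
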
## Fallback Eligibility

| Failure | Fallback by default? | Notes |
| --- | --- | --- |
| HTTP 429 | Yes | Rate limit or quota exhaustion on the active route. |
| HTTP 502, 503, 504 | Yes | Temporary upstream failure after normal retries. |
| Connect timeout / DNS failure | Yes | Transport path failed before content streamed. |
| HTTP 401 / 403 | No | Usually bad credentials or account permissions. |
| HTTP 400 | No | Usually client request shape or model parameter issue. |
| Model not found | No | Avoid silently switching model families unless a future policy explicitly opts in. |
| Stream interrupted after content | No | The transcript may already contain partial assistant content or tool-call deltas. |

The first implementation should classify errors centrally and expose tests for
each case before any fallback execution is wired into the turn loop.

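As an illustration of central classification, the eligibility table can be expressed as one small function. This is a hedged sketch in Python rather than the project's implementation: `FailureKind`, `classify_failure`, and `is_fallback_eligible` are hypothetical names, the caller is assumed to have already detected transport errors and model-not-found responses, and statuses not listed in the table are assumed to be non-eligible.

```python
from enum import Enum


class FailureKind(Enum):
    RATE_LIMIT = "rate_limit"
    UPSTREAM_UNAVAILABLE = "upstream_unavailable"
    TRANSPORT = "transport"
    AUTH = "auth"
    BAD_REQUEST = "bad_request"
    MODEL_NOT_FOUND = "model_not_found"
    STREAM_INTERRUPTED = "stream_interrupted"
    OTHER = "other"


# Only these kinds may trigger a provider switch by default.
FALLBACK_ELIGIBLE = {
    FailureKind.RATE_LIMIT,
    FailureKind.UPSTREAM_UNAVAILABLE,
    FailureKind.TRANSPORT,
}


def classify_failure(status=None, transport_error=False,
                     model_not_found=False, content_streamed=False):
    """Map one failed request onto a row of the eligibility table."""
    if content_streamed:
        # Partial assistant output blocks fallback regardless of the cause.
        return FailureKind.STREAM_INTERRUPTED
    if model_not_found:
        return FailureKind.MODEL_NOT_FOUND
    if transport_error:
        return FailureKind.TRANSPORT
    if status == 429:
        return FailureKind.RATE_LIMIT
    if status in (502, 503, 504):
        return FailureKind.UPSTREAM_UNAVAILABLE
    if status in (401, 403):
        return FailureKind.AUTH
    if status == 400:
        return FailureKind.BAD_REQUEST
    # Assumption: anything the table does not list stays non-eligible.
    return FailureKind.OTHER


def is_fallback_eligible(kind):
    return kind in FALLBACK_ELIGIBLE
```

Keeping the decision in one pure function makes each table row a one-line unit test, which fits the goal of testing classification before fallback execution exists.
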
## Capability Gate

Before switching to a fallback provider/model, CodeWhale checks that the
candidate can support the current request shape:

| Requirement | Gate |
| --- | --- |
| Tool calls | Candidate provider/model must support tool calling. |
| Reasoning effort | Candidate must support the requested thinking mode, or the switch is blocked. |
| Context size | Candidate context window must fit the estimated current request. |
| Image inputs | Candidate must support vision if the turn includes images. |
| Provider-specific headers | Candidate request must be rebuilt from that provider's own auth/base-url/header rules. |

If no fallback candidate passes the gate, CodeWhale surfaces the original
provider error with a clear "fallback chain exhausted or incompatible" note.

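One way to picture the gate is a pure check that returns the reasons a candidate is rejected. The sketch below is illustrative Python only: `RequestShape`, `Capabilities`, and `gate_failures` are hypothetical names, and the capability values in the usage lines are invented. The provider-specific headers row is left out because it concerns rebuilding the request rather than a yes/no capability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestShape:
    uses_tools: bool
    reasoning_effort: Optional[str]  # None when no thinking mode is requested
    estimated_tokens: int
    has_images: bool


@dataclass(frozen=True)
class Capabilities:
    tool_calls: bool
    reasoning_efforts: frozenset  # thinking modes the candidate accepts
    context_window: int
    vision: bool


def gate_failures(shape, caps):
    """Return why a candidate cannot serve the request; empty means it passes."""
    reasons = []
    if shape.uses_tools and not caps.tool_calls:
        reasons.append("tool calls unsupported")
    if shape.reasoning_effort is not None and shape.reasoning_effort not in caps.reasoning_efforts:
        reasons.append("reasoning effort unsupported")
    if shape.estimated_tokens > caps.context_window:
        reasons.append("request exceeds context window")
    if shape.has_images and not caps.vision:
        reasons.append("image inputs unsupported")
    return reasons


# Usage with made-up numbers: a tool-using, 90k-token turn against a small model.
shape = RequestShape(uses_tools=True, reasoning_effort="high",
                     estimated_tokens=90_000, has_images=False)
small = Capabilities(tool_calls=True, reasoning_efforts=frozenset({"high"}),
                     context_window=64_000, vision=False)
blocked_because = gate_failures(shape, small)
```

Returning reasons instead of a bare boolean would let the "exhausted or incompatible" note say which requirement blocked each candidate.
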
## Runtime Behavior

1. Build the request for the active provider.
2. Run existing retry policy for that provider.
3. If retries exhaust with a fallback-eligible failure and no assistant content
   has streamed, evaluate the next fallback provider.
4. Rebuild the request with the fallback provider's model, base URL, auth, and
   provider-specific headers.
5. Add a visible transcript marker and status event before the fallback request
   starts.
6. Continue through the chain until a provider succeeds, the chain is
   exhausted, or a non-eligible failure occurs.

Suggested transcript marker:

```text
[provider fallback: nvidia-nim -> deepseek, reason: rate_limit]
```

Suggested status text:

```text
NVIDIA NIM unavailable; switched to DeepSeek fallback
```

For multi-request turns, such as tool-call result follow-ups, fallback can be
considered for a later request only if that later request has not started
streaming assistant content yet. The transcript marker must identify that the
turn changed provider between requests.

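The runtime steps can be sketched as a single loop. This is a simplified Python illustration under stated assumptions, not the turn loop itself: `ProviderFailure`, `run_with_fallback`, `send`, and `passes_gate` are hypothetical names, `send(provider)` is assumed to rebuild the request for that provider and to run its normal retries before raising, and "original provider error" is read here as the first failure of the request.

```python
class ProviderFailure(Exception):
    """Hypothetical error raised by `send` once a provider's retries are spent."""

    def __init__(self, reason, eligible, content_streamed=False):
        super().__init__(reason)
        self.reason = reason
        self.eligible = eligible
        self.content_streamed = content_streamed
        self.note = None


def run_with_fallback(primary, fallbacks, send, passes_gate):
    """Try the primary, then gate-passing fallbacks in their configured order.

    Returns (provider_used, response, transcript_markers).
    """
    markers = []
    active = primary
    remaining = list(fallbacks)
    original = None
    while True:
        try:
            response = send(active)
            return active, response, markers
        except ProviderFailure as failure:
            if original is None:
                original = failure
            # Non-eligible failures and partial streams end the chain.
            if not failure.eligible or failure.content_streamed:
                raise
            # Skip candidates that cannot serve the current request shape.
            while remaining and not passes_gate(remaining[0]):
                remaining.pop(0)
            if not remaining:
                original.note = "fallback chain exhausted or incompatible"
                raise original
            candidate = remaining.pop(0)
            # The marker is recorded before the fallback request starts.
            markers.append(
                f"[provider fallback: {active} -> {candidate}, reason: {failure.reason}]"
            )
            active = candidate
```

For a multi-request turn, a caller would presumably invoke this once per request, so a provider change between requests shows up as a marker on the later request.
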
## UI and Commands

- `/provider` should show the primary route and the current fallback position.
- `/provider reset` should return to the primary provider for future requests in
  the current session.
- The footer/statusline should surface the concrete provider/model that actually
  handled the latest request.
- Session receipts should record both attempted provider and successful
  provider so cost and debugging information stay truthful.

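The session state these commands imply can be sketched in a few lines. The Python below is only an illustration: `ProviderRoute`, `Receipt`, and every field and method name are hypothetical, and the display string is an invented format rather than proposed UI copy.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    attempted: list  # every provider tried for the request, in order
    selected: str    # the provider that actually produced the response


@dataclass
class ProviderRoute:
    primary: str
    fallbacks: list
    position: int = 0  # 0 = primary, n = n-th entry of the fallback list

    def current(self):
        if self.position == 0:
            return self.primary
        return self.fallbacks[self.position - 1]

    def describe(self):
        # Roughly what `/provider` could display.
        return (f"primary: {self.primary}, active: {self.current()} "
                f"(fallback position {self.position}/{len(self.fallbacks)})")

    def reset(self):
        # `/provider reset`: future requests start from the primary again.
        self.position = 0


route = ProviderRoute("nvidia-nim", ["deepseek", "openrouter"])
route.position = 1  # a fallback to deepseek happened earlier in the session
receipt = Receipt(attempted=["nvidia-nim", "deepseek"], selected=route.current())
```

Storing the attempted list separately from the selected provider keeps a receipt accurate even when the first route failed before producing billable output.
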
## Implementation Slices

1. Config schema and validation:
   - parse `fallback_providers` and `[provider_fallback]`
   - validate known providers, duplicates, missing credentials, and primary
     self-reference
   - document the config surface
2. Error classification:
   - define fallback-eligible error kinds
   - add unit tests for HTTP and transport failures
3. Request-shape capability gate:
   - evaluate tool, thinking, context, and image requirements
   - add tests for incompatible fallbacks
4. Fallback execution:
   - run retries per provider before moving to the next provider
   - rebuild auth/base-url/header state for each candidate
   - block fallback after partial streaming
5. UI/receipt integration:
   - status event
   - transcript marker
   - `/provider reset`
   - receipt fields for attempted and selected provider

## Non-goals

- No automatic cost optimization or weighted provider selection.
- No silent fallback when authentication or permissions fail.
- No fallback after partial assistant content or tool-call deltas have streamed.
- No provider/model capability downgrades without an explicit future policy.
- No sub-agent-specific fallback policy in the first implementation; sub-agents
  inherit the same configured fallback chain unless they are given an explicit
  provider/model override.

## Credit

This RFC is based on issue #2574 from @hsdbeebou and PR #2581 from @idling11.
The original PR head currently has no net file changes, so this document
preserves the useful design direction while tightening the v0.9 contract around
truthful provider routing, billing visibility, and capability checks.