docs: harvest provider fallback chain RFC

Harvested from PR #2581 by @idling11.

Co-authored-by: idling11 <8055620+idling11@users.noreply.github.com>
This commit is contained in:
Hunter B
2026-06-03 19:55:14 -07:00
parent 44ceabd606
commit 5dc1a63cd4
3 changed files with 171 additions and 2 deletions
+1
View File
@@ -551,6 +551,7 @@ without recreating skills the user deliberately deleted.
| [LOCALIZATION.md](docs/LOCALIZATION.md) | UI locale matrix & switching |
| [OPERATIONS_RUNBOOK.md](docs/OPERATIONS_RUNBOOK.md) | Ops & recovery |
| [V0_9_0_EXECUTION_MAP.md](docs/V0_9_0_EXECUTION_MAP.md) | v0.9.0 issue lanes, PR harvest state, and release gates |
| [2574-provider-fallback-chain.md](docs/rfcs/2574-provider-fallback-chain.md) | Provider fallback chain RFC |
Full Changelog: [CHANGELOG.md](CHANGELOG.md).
+3 -2
View File
@@ -42,6 +42,7 @@ harvest/stewardship commits:
| #2636 project-context mtime cache | Defer direct merge; harvest only after cache key/signature is widened. | Must include constitution changes, auto-generated context deletion, canonical path equivalence, and overwrite detection before landing. |
| #2634 HarmonyOS port | Defer direct merge; draft has broad platform and TLS/runtime blast radius. | Harvest at most the unused `rustyline` cleanup after local verification; full port needs OHOS target checks and sandbox/security review. |
| #2687 append-only mode/approval prompt | Defer direct merge; draft has compile failures and Plan-mode prompt correctness risks. | Any future harvest must keep stable `message[0]` genuinely mode-agnostic, preserve mode/approval suffixes after capacity replans, and distinguish external overrides from persisted generated prompts. |
| #2581 provider fallback chain design doc | Manually harvested as `docs/rfcs/2574-provider-fallback-chain.md` because the current PR head has no net file changes. | Keep issue #2574 open for implementation; close/comment on #2581 after the integration branch is public, crediting @idling11 and reporter @hsdbeebou. |
## PR Harvest Queue
@@ -85,7 +86,7 @@ harvest/stewardship commits:
| #2576 PrefixCacheChange events | Mergeable | Review after current prefix-cache commits. |
| #2578 turn_end observer hook | Conflicting | Defer to hook lifecycle lane. |
| #2579 AppendLog session messages | Conflicting | Defer; large architectural change. |
| #2581 provider fallback chain design doc | Mergeable | Docs-only; review for current provider direction. |
| #2581 provider fallback chain design doc | Mergeable / empty diff | Manually harvested into `docs/rfcs/2574-provider-fallback-chain.md`; close original PR after branch is public, keep #2574 open for implementation. |
| #2623 plan prompt modal scroll support | Mergeable | Already harvested into the 22-commit stack. Comment/close original after integration branch is public. |
| #2627 Xiaomi MiMo Token Plan mode | Conflicting | Partially harvested; leave original open or comment with remaining mode/env scope once branch is public. |
| #2631 estimated_input_tokens cache | Mergeable | Already harvested into the 22-commit stack. |
@@ -120,7 +121,7 @@ Issue count should drop through evidence-backed consolidation, not bulk closing.
## Immediate Next Actions
1. Review #2048, #2502, #2509, #2513, #2530, #2576, and #2581 as the next small
1. Review #2048, #2502, #2509, #2513, #2530, and #2576 as the next small
mergeable candidates.
2. Prepare public comments for #2708, #2627, #2634, #2636, #2687, and already-harvested performance
PRs once this integration branch has a remote review surface.
+167
View File
@@ -0,0 +1,167 @@
# RFC: Provider Fallback Chain
**Issue:** #2574
**Reporter:** @hsdbeebou
**Design source:** #2581 by @idling11
**Status:** Draft for the v0.9 provider-routing lane
**Date:** 2026-06-04
## Problem
CodeWhale can store credentials and defaults for several providers, but a
running session uses one active provider route at a time. When that provider
hits a rate limit, temporary outage, or transport failure, the user must notice
the failure, run `/provider`, choose another route, and resubmit the turn.
That manual switch is especially disruptive during long-running agentic work.
A provider fallback chain can keep work moving, but it also changes billing
source, model behavior, tool support, context-window limits, and vendor
expectations. The design must make that switch explicit and capability-aware.
## Principles
- Fallback is opt-in. No provider switch happens unless the user configured a
fallback chain.
- Billing and vendor changes are visible in the transcript and status UI.
- Normal retry policy runs before fallback.
- Fallback is allowed only before assistant content or tool calls have started
streaming for the failing request.
- Fallback candidates must support the request shape for the current turn.
- Authentication, authorization, malformed request, and model-not-found errors
do not silently switch providers by default.
## Proposed Config Shape
Keep the existing root `provider = "..."` setting as the primary route. Add an
ordered fallback list and a small policy section:
```toml
provider = "nvidia-nim"
fallback_providers = ["deepseek", "openrouter"]
[provider_fallback]
enabled = true
reset_on_new_session = true
```
Rules:
- `fallback_providers` is ordered and contains provider IDs already accepted by
the provider parser.
- The primary provider is not repeated in the fallback list.
- Duplicate fallback providers are rejected.
- Missing credentials produce a startup warning and make that fallback entry
inactive until credentials appear.
- If `provider_fallback.enabled` is absent, the presence of a non-empty
`fallback_providers` list enables fallback.
## Fallback Eligibility
| Failure | Fallback by default? | Notes |
| --- | --- | --- |
| HTTP 429 | Yes | Rate limit or quota exhaustion on the active route. |
| HTTP 502, 503, 504 | Yes | Temporary upstream failure after normal retries. |
| Connect timeout / DNS failure | Yes | Transport path failed before content streamed. |
| HTTP 401 / 403 | No | Usually bad credentials or account permissions. |
| HTTP 400 | No | Usually client request shape or model parameter issue. |
| Model not found | No | Avoid silently switching model families unless a future policy explicitly opts in. |
| Stream interrupted after content | No | The transcript may already contain partial assistant content or tool-call deltas. |
The first implementation should classify errors centrally and expose tests for
each case before any fallback execution is wired into the turn loop.
## Capability Gate
Before switching to a fallback provider/model, CodeWhale checks that the
candidate can support the current request shape:
| Requirement | Gate |
| --- | --- |
| Tool calls | Candidate provider/model must support tool calling. |
| Reasoning effort | Candidate must support the requested thinking mode, or the switch is blocked. |
| Context size | Candidate context window must fit the estimated current request. |
| Image inputs | Candidate must support vision if the turn includes images. |
| Provider-specific headers | Candidate request must be rebuilt from that provider's own auth/base-url/header rules. |
If no fallback candidate passes the gate, CodeWhale surfaces the original
provider error with a clear "fallback chain exhausted or incompatible" note.
## Runtime Behavior
1. Build the request for the active provider.
2. Run existing retry policy for that provider.
3. If retries exhaust with a fallback-eligible failure and no assistant content
has streamed, evaluate the next fallback provider.
4. Rebuild the request with the fallback provider's model, base URL, auth, and
provider-specific headers.
5. Add a visible transcript marker and status event before the fallback request
starts.
6. Continue through the chain until a provider succeeds, the chain is
exhausted, or a non-eligible failure occurs.
Suggested transcript marker:
```text
[provider fallback: nvidia-nim -> deepseek, reason: rate_limit]
```
Suggested status text:
```text
NVIDIA NIM unavailable; switched to DeepSeek fallback
```
For multi-request turns, such as tool-call result follow-ups, fallback can be
considered for a later request only if that later request has not started
streaming assistant content yet. The transcript marker must identify that the
turn changed provider between requests.
## UI and Commands
- `/provider` should show the primary route and the current fallback position.
- `/provider reset` should return to the primary provider for future requests in
the current session.
- The footer/statusline should surface the concrete provider/model that actually
handled the latest request.
- Session receipts should record both attempted provider and successful
provider so cost and debugging information stay truthful.
## Implementation Slices
1. Config schema and validation:
- parse `fallback_providers` and `[provider_fallback]`
- validate known providers, duplicates, missing credentials, and primary
self-reference
- document the config surface
2. Error classification:
- define fallback-eligible error kinds
- add unit tests for HTTP and transport failures
3. Request-shape capability gate:
- evaluate tool, thinking, context, and image requirements
- add tests for incompatible fallbacks
4. Fallback execution:
- run retries per provider before moving to the next provider
- rebuild auth/base-url/header state for each candidate
- block fallback after partial streaming
5. UI/receipt integration:
- status event
- transcript marker
- `/provider reset`
- receipt fields for attempted and selected provider
## Non-goals
- No automatic cost optimization or weighted provider selection.
- No silent fallback when authentication or permissions fail.
- No fallback after partial assistant content or tool-call deltas have streamed.
- No provider/model capability downgrades without an explicit future policy.
- No sub-agent-specific fallback policy in the first implementation; sub-agents
inherit the same configured fallback chain unless they are given an explicit
provider/model override.
## Credit
This RFC is based on issue #2574 from @hsdbeebou and PR #2581 from @idling11.
The original PR head currently has no net file changes, so this document
preserves the useful design direction while tightening the v0.9 contract around
truthful provider routing, billing visibility, and capability checks.