fix(codex): budget oauth route at codex context

Separate model-native context metadata from provider-effective runtime capability. OpenAI API gpt-5.5 stays at its documented 1.05M window, while the openai-codex OAuth route budgets preflight, recovery, capacity checks, prompt text, and TUI context indicators against the Codex-family 400K envelope.
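
A minimal sketch of the split described above, assuming simplified stand-ins for the project's types. `effective_context_window` and `FALLBACK_CONTEXT_WINDOW_TOKENS` are hypothetical names, and the model table is trimmed to a few ids that appear in this diff; it illustrates the idea rather than mirroring the real `provider_capability`.

```rust
/// Simplified stand-in for the project's provider enum (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApiProvider {
    Openai,
    OpenaiCodex,
}

/// Route-effective window for the OAuth route, as named in this commit.
pub const OPENAI_CODEX_EFFECTIVE_CONTEXT_WINDOW_TOKENS: u32 = 400_000;

/// Hypothetical conservative default for unknown model ids.
pub const FALLBACK_CONTEXT_WINDOW_TOKENS: u32 = 128_000;

/// Model-native metadata: what the model family documents, independent of route.
pub fn context_window_for_model(model: &str) -> Option<u32> {
    match model {
        "gpt-5.5" | "gpt-5.5-pro" => Some(1_050_000),
        "gpt-5-codex" | "gpt-5.3-codex" => Some(400_000),
        _ => None,
    }
}

/// Provider-effective capability: the window the runtime should budget against.
pub fn effective_context_window(provider: ApiProvider, model: &str) -> u32 {
    match provider {
        // The OAuth route is capped regardless of the model's documented window.
        ApiProvider::OpenaiCodex => OPENAI_CODEX_EFFECTIVE_CONTEXT_WINDOW_TOKENS,
        // Other routes fall through to model-native metadata.
        ApiProvider::Openai => {
            context_window_for_model(model).unwrap_or(FALLBACK_CONTEXT_WINDOW_TOKENS)
        }
    }
}
```

With these stand-ins, the same `gpt-5.5` id resolves to 1,050,000 tokens on `ApiProvider::Openai` and 400,000 tokens on `ApiProvider::OpenaiCodex`, which is the behavior the commit message describes.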

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hunter B
2026-06-13 14:04:42 -07:00
parent bb4d1e5744
commit d9f195ea39
14 changed files with 216 additions and 53 deletions
+4
@@ -222,6 +222,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
    CodeWhale aliases now use OpenAI's documented 1,050,000-token context window
    and 128,000 max-output metadata for context pressure, prompts, and doctor
    capability output.
+ - **OpenAI Codex effective context budgeting.** The public OpenAI API metadata
+   for `gpt-5.5` remains 1,050,000 tokens, but the `openai-codex` OAuth route now
+   budgets prompts against the 400K Codex-family effective window so preflight
+   compaction runs before the backend returns `context_length_exceeded`.
  - **OpenRouter Nemotron 3 Ultra preset.** The OpenRouter preset and model
    registry now emit `nvidia/nemotron-3-ultra-550b-a55b` while keeping the old
    Ultra aliases compatible.
@@ -5,7 +5,7 @@ use std::path::Path;
  use super::CommandResult;
  use crate::compaction::estimate_input_tokens_conservative;
- use crate::models::{LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS, context_window_for_model};
+ use crate::config::provider_capability;
  use crate::tui::app::App;
  use crate::utils::{display_path, estimate_message_chars};
@@ -166,7 +166,7 @@ fn footer_items(app: &App) -> String {
  }
  fn context_usage(app: &App) -> (usize, u32, f64) {
-     let max = context_window_for_model(&app.model).unwrap_or(LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS);
+     let max = provider_capability(app.api_provider, &app.model).context_window;
      let estimated =
          estimate_input_tokens_conservative(&app.api_messages, app.system_prompt.as_ref());
      let total_chars = estimate_message_chars(&app.api_messages);
+6 -3
@@ -156,6 +156,7 @@ pub const DEFAULT_TOGETHER_MODEL: &str = "deepseek-ai/DeepSeek-V4-Pro";
  pub const DEFAULT_TOGETHER_BASE_URL: &str = "https://api.together.xyz/v1";
  pub const DEFAULT_OPENAI_CODEX_MODEL: &str = "gpt-5.5";
  pub const DEFAULT_OPENAI_CODEX_BASE_URL: &str = "https://chatgpt.com/backend-api";
+ pub const OPENAI_CODEX_EFFECTIVE_CONTEXT_WINDOW_TOKENS: u32 = 400_000;
  /// Legacy `deepseek-cn` provider alias.
  ///
  /// DeepSeek's official API host is the same worldwide. Keep this alias for
@@ -433,8 +434,7 @@ pub fn provider_capability(provider: ApiProvider, resolved_model: &str) -> Provi
  return ProviderCapability {
      provider,
      resolved_model: resolved_model.to_string(),
-     context_window: crate::models::context_window_for_model(resolved_model)
-         .unwrap_or(crate::models::LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS),
+     context_window: OPENAI_CODEX_EFFECTIVE_CONTEXT_WINDOW_TOKENS,
      max_output: crate::models::max_output_tokens_for_model(resolved_model).unwrap_or(4096),
      thinking_supported: true,
      cache_telemetry_supported: false,
@@ -11442,7 +11442,10 @@ model = "deepseek-ai/deepseek-v4-pro"
  let cap = provider_capability(ApiProvider::OpenaiCodex, DEFAULT_OPENAI_CODEX_MODEL);
  assert_eq!(cap.provider, ApiProvider::OpenaiCodex);
  assert_eq!(cap.resolved_model, DEFAULT_OPENAI_CODEX_MODEL);
- assert_eq!(cap.context_window, 1_050_000);
+ assert_eq!(
+     cap.context_window,
+     OPENAI_CODEX_EFFECTIVE_CONTEXT_WINDOW_TOKENS
+ );
  assert_eq!(cap.max_output, 128_000);
  assert!(cap.thinking_supported);
  assert!(!cap.cache_telemetry_supported);
+16 -3
@@ -518,6 +518,7 @@ pub struct Engine {
  subagent_manager: SharedSubAgentManager,
  shell_manager: SharedShellManager,
  mcp_pool: Option<Arc<AsyncMutex<McpPool>>>,
+ api_provider: ApiProvider,
  rx_op: mpsc::Receiver<Op>,
  rx_approval: mpsc::Receiver<ApprovalDecision>,
  rx_user_input: mpsc::Receiver<UserInputDecision>,
@@ -688,6 +689,7 @@ impl Engine {
      Ok(client) => (Some(client), None),
      Err(err) => (None, Some(err.to_string())),
  };
+ let api_provider = api_config.api_provider();
  let api_key_env_only_recovery = Self::env_only_api_key_recovery_hint(api_config);
  let mut session = Session::new(
@@ -718,6 +720,10 @@ impl Engine {
      locale_tag: &config.locale_tag,
      translation_enabled: config.translation_enabled,
      model_id: &config.model,
+     context_window_override: Some(
+         crate::config::provider_capability(api_provider, &config.model)
+             .context_window,
+     ),
      show_thinking: config.show_thinking,
      verbosity: config.verbosity.as_deref(),
  },
@@ -821,6 +827,7 @@ impl Engine {
  subagent_manager,
  shell_manager,
  mcp_pool: None,
+ api_provider,
  rx_op,
  rx_approval,
  rx_user_input,
@@ -2159,7 +2166,9 @@ impl Engine {
  }
  async fn recover_context_overflow(&mut self, client: &DeepSeekClient, reason: &str) -> bool {
-     let Some(target_budget) = context_input_budget(&self.session.model) else {
+     let Some(target_budget) =
+         context_input_budget_for_provider(self.api_provider, &self.session.model)
+     else {
          return false;
      };
@@ -2500,6 +2509,10 @@ impl Engine {
      locale_tag: &self.config.locale_tag,
      translation_enabled: self.config.translation_enabled,
      model_id: &self.config.model,
+     context_window_override: Some(
+         crate::config::provider_capability(self.api_provider, &self.config.model)
+             .context_window,
+     ),
      show_thinking: self.config.show_thinking,
      verbosity: self.config.verbosity.as_deref(),
  },
@@ -2828,8 +2841,8 @@ mod handle;
  pub(crate) use context::compact_tool_result_for_context;
  use context::{
      COMPACTION_SUMMARY_MARKER, MAX_CONTEXT_RECOVERY_ATTEMPTS, MIN_RECENT_MESSAGES_TO_KEEP,
-     context_input_budget, effective_max_output_tokens, extract_compaction_summary_prompt,
-     is_context_length_error_message, summarize_text,
+     context_input_budget_for_provider, effective_max_output_tokens,
+     extract_compaction_summary_prompt, is_context_length_error_message, summarize_text,
  };
  mod dispatch;
  mod loop_guard;
+5 -5
@@ -7,7 +7,7 @@
  use super::*;
- use crate::models::context_window_for_model;
+ use crate::config::provider_capability;
  impl Engine {
      pub(super) async fn run_capacity_pre_request_checkpoint(
@@ -156,8 +156,7 @@ impl Engine {
  let unique_reference_ids_recent_window =
      self.recent_unique_reference_count(message_window, turn);
  let context_window = usize::try_from(
-     context_window_for_model(&self.session.model)
-         .unwrap_or(LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS),
+     provider_capability(self.api_provider, &self.session.model).context_window,
  )
  .unwrap_or(usize::try_from(LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS).unwrap_or(128_000))
  .max(1);
@@ -432,8 +431,9 @@ impl Engine {
  }
  if !refreshed {
-     let target_budget = context_input_budget(&self.session.model)
-         .unwrap_or(self.config.compaction.token_threshold.max(1));
+     let target_budget =
+         context_input_budget_for_provider(self.api_provider, &self.session.model)
+             .unwrap_or(self.config.compaction.token_threshold.max(1));
      if self.estimated_input_tokens() > target_budget {
          let trimmed = self.trim_oldest_messages_to_budget(target_budget);
          refreshed = trimmed > 0;
+16 -5
@@ -5,6 +5,7 @@
  //! engine module from accumulating unrelated context-policy details.
  use crate::compaction::estimate_tokens;
+ use crate::config::{ApiProvider, provider_capability};
  use crate::error_taxonomy::ErrorCategory;
  use crate::models::{Message, SystemPrompt, context_window_for_model};
  use crate::tools::spec::ToolResult;
@@ -562,9 +563,12 @@ pub(super) fn estimate_input_tokens_conservative(
  /// window does not underflow to a negative budget.
  const INTERNAL_BUDGET_LARGE_WINDOW_THRESHOLD: u32 = 500_000;
- /// Internal input-side token budget for a model: `window - reserved_output -
- /// headroom`. Used by the preflight check, emergency recovery, and capacity
- /// trimming to decide when to compact.
+ /// Internal input-side token budget for a provider/model route:
+ /// `window - reserved_output - headroom`. Used by the preflight check,
+ /// emergency recovery, and capacity trimming to decide when to compact.
+ /// Unknown model ids fall back to the provider's conservative default instead
+ /// of disabling preflight; custom long-context deployments can still advertise
+ /// their window with a `-256k`/`-1024k` model suffix.
  ///
  /// The reserved-output term is window-dependent:
  /// * `window >= 500K` (V4-class large-context) -> [`TURN_MAX_OUTPUT_TOKENS`]
@@ -575,8 +579,15 @@ const INTERNAL_BUDGET_LARGE_WINDOW_THRESHOLD: u32 = 500_000;
  /// `256K - 262K - 1K`, which underflows `checked_sub` to `None` and
  /// *silently disables every preflight and emergency recovery path* — the
  /// session then runs until the provider hard-rejects on context length.
- pub(super) fn context_input_budget(model: &str) -> Option<usize> {
-     let window_tokens = context_window_for_model(model)?;
+ pub(super) fn context_input_budget_for_provider(
+     provider: ApiProvider,
+     model: &str,
+ ) -> Option<usize> {
+     let capability = provider_capability(provider, model);
+     context_input_budget_for_window(model, capability.context_window)
+ }
+ fn context_input_budget_for_window(model: &str, window_tokens: u32) -> Option<usize> {
      let window = usize::try_from(window_tokens).ok()?;
      let reserved_output = if window_tokens >= INTERNAL_BUDGET_LARGE_WINDOW_THRESHOLD {
          TURN_MAX_OUTPUT_TOKENS
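
The `window - reserved_output - headroom` formula in the doc comment above can be sketched in isolation. This is a standalone illustration under stated assumptions, not the project's function: `input_budget` and `HEADROOM_TOKENS` are hypothetical names, `TURN_MAX_OUTPUT_TOKENS` is assumed to be 262,144 (the diff only says "262K"), and the per-model output cap is passed in as a plain parameter instead of calling `effective_max_output_tokens`.

```rust
use std::convert::TryFrom;

/// Threshold from the diff: windows at or above this reserve the full turn output.
const LARGE_WINDOW_THRESHOLD: u32 = 500_000;
/// Assumed value; the source only describes this reservation as "262K".
const TURN_MAX_OUTPUT_TOKENS: u32 = 262_144;
/// Headroom matching the `1_024usize` term used in the tests (name is hypothetical).
const HEADROOM_TOKENS: usize = 1_024;

/// Input-side budget: `window - reserved_output - headroom`, or `None` on underflow.
pub fn input_budget(window_tokens: u32, effective_max_output: u32) -> Option<usize> {
    let window = usize::try_from(window_tokens).ok()?;
    // Large windows reserve the full turn output; smaller ones reserve the
    // model's effective output cap so the subtraction does not underflow.
    let reserved = if window_tokens >= LARGE_WINDOW_THRESHOLD {
        TURN_MAX_OUTPUT_TOKENS
    } else {
        effective_max_output
    };
    let reserved = usize::try_from(reserved).ok()?;
    window.checked_sub(reserved)?.checked_sub(HEADROOM_TOKENS)
}
```

Under these assumptions, a 400,000-token window with a 128,000-token output cap leaves 270,976 input tokens, while a 256,000-token window paired with the full 262,144 reservation returns `None`, which is the underflow the doc comment warns about.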
+25 -4
@@ -1,6 +1,7 @@
  use super::*;
  use super::context::TURN_MAX_OUTPUT_TOKENS;
+ use crate::config::ApiProvider;
  use crate::models::SystemBlock;
  use crate::test_support::lock_test_env;
  use crate::tools::plan::{PlanItemArg, PlanSnapshot, StepStatus};
@@ -2103,13 +2104,31 @@ fn context_budget_reserves_output_and_headroom() {
      let _lock = lock_test_env();
      // V4 has a 1M context window — the only family that comfortably hosts
      // a 256K output reservation without saturating the input budget to 0.
-     let budget = context_input_budget("deepseek-v4-pro")
+     let budget = context_input_budget_for_provider(ApiProvider::Deepseek, "deepseek-v4-pro")
          .expect("deepseek-v4-pro should have a known context window");
      let v4_window: usize = 1_000_000;
      let expected = v4_window - (TURN_MAX_OUTPUT_TOKENS as usize) - 1_024usize;
      assert_eq!(budget, expected);
  }
+ #[test]
+ fn context_budget_uses_conservative_fallback_for_unknown_models() {
+     let _lock = lock_test_env();
+     let budget = context_input_budget_for_provider(ApiProvider::Openai, "auto")
+         .expect("unknown/auto model ids should still get a conservative hard preflight budget");
+     let expected = 128_000usize - effective_max_output_tokens("auto") as usize - 1_024usize;
+     assert_eq!(budget, expected);
+ }
+ #[test]
+ fn context_budget_uses_provider_effective_window_for_openai_codex() {
+     let _lock = lock_test_env();
+     let budget = context_input_budget_for_provider(ApiProvider::OpenaiCodex, "gpt-5.5")
+         .expect("OpenAI Codex should use the route-effective context window");
+     let expected = 400_000usize - effective_max_output_tokens("gpt-5.5") as usize - 1_024usize;
+     assert_eq!(budget, expected);
+ }
  #[test]
  fn effective_max_output_tokens_caps_api_request_for_large_window_models() {
      // Serialize with other tests that mutate DEEPSEEK_MAX_OUTPUT_TOKENS so
@@ -2213,7 +2232,8 @@ fn internal_context_budget_tiers_reserved_output_by_window() {
  // Large-context (>=500K) models reserve the full TURN_MAX_OUTPUT_TOKENS
  // headroom so long V4 sessions don't compact prematurely.
  let internal_budget =
-     context_input_budget("deepseek-v4-pro").expect("V4 should have a known context window");
+     context_input_budget_for_provider(ApiProvider::Deepseek, "deepseek-v4-pro")
+         .expect("V4 should have a known context window");
  let v4_window: usize = 1_000_000;
  let expected_internal = v4_window - (TURN_MAX_OUTPUT_TOKENS as usize) - 1_024usize;
  assert_eq!(internal_budget, expected_internal);
@@ -2222,8 +2242,9 @@ fn internal_context_budget_tiers_reserved_output_by_window() {
  // deployment must yield a usable positive budget rather than None. The
  // previous formula reserved the full 262K and computed 256K - 262K - 1K,
  // which underflowed to None and silently disabled preflight/recovery.
- let small_window_budget = context_input_budget("qwen3-32b-256k")
-     .expect("a 256K-suffix model must yield Some budget via the effective-cap branch");
+ let small_window_budget =
+     context_input_budget_for_provider(ApiProvider::Openai, "qwen3-32b-256k")
+         .expect("a 256K-suffix model must yield Some budget via the effective-cap branch");
  let effective_output = effective_max_output_tokens("qwen3-32b-256k") as usize;
  let expected_small = 256_000 - effective_output - 1_024;
  assert_eq!(small_window_budget, expected_small);
+3 -1
@@ -205,7 +205,9 @@ impl Engine {
      continue;
  }
- if let Some(input_budget) = context_input_budget(&self.session.model) {
+ if let Some(input_budget) =
+     context_input_budget_for_provider(self.api_provider, &self.session.model)
+ {
      let estimated_input = self.estimated_input_tokens();
      if estimated_input > input_budget {
          if context_recovery_attempts >= MAX_CONTEXT_RECOVERY_ATTEMPTS {
+78 -10
@@ -246,6 +246,12 @@ pub fn context_window_for_model(model: &str) -> Option<u32> {
      }
      return Some(LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS);
  }
+ if is_openai_gpt_55_api_model(&lower) {
+     return Some(1_050_000);
+ }
+ if is_openai_codex_model(&lower) {
+     return Some(400_000);
+ }
  if let Some(window) = known_context_window_for_model(&lower) {
      return Some(window);
  }
@@ -259,7 +265,8 @@ fn known_context_window_for_model(model_lower: &str) -> Option<u32> {
  match model_lower {
      // OpenAI API model docs, verified 2026-06-12:
      // https://developers.openai.com/api/docs/models/gpt-5.5
-     "gpt-5.5" | "gpt-5.5-pro" | "codex-gpt-5.5" | "chatgpt-gpt-5.5" => Some(1_050_000),
+     // Family aliases and snapshots are handled by
+     // `is_openai_gpt_55_api_model` before this table.
      // OpenAI Codex model docs, verified 2026-06-12:
      // https://developers.openai.com/api/docs/models/gpt-5-codex
      // https://developers.openai.com/api/docs/models/gpt-5.3-codex
@@ -320,9 +327,11 @@ pub fn max_output_tokens_for_model(model: &str) -> Option<u32> {
  if lower.contains("deepseek") && lower.contains("v4") {
      return Some(384_000);
  }
+ if is_openai_gpt_55_api_model(&lower) || is_openai_codex_model(&lower) {
+     return Some(128_000);
+ }
  match lower.as_str() {
-     "gpt-5.5" | "gpt-5.5-pro" | "codex-gpt-5.5" | "chatgpt-gpt-5.5" | "gpt-5-codex"
-     | "gpt-5.3-codex" => Some(128_000),
+     "gpt-5-codex" | "gpt-5.3-codex" => Some(128_000),
      "claude-opus-4-8" => Some(128_000),
      "claude-sonnet-4-6" | "claude-haiku-4-5" => Some(64_000),
      "arcee-ai/trinity-large-thinking"
@@ -369,10 +378,6 @@ pub fn model_supports_reasoning(model: &str) -> bool {
  lower.as_str(),
  "claude-opus-4-8"
      | "claude-sonnet-4-6"
-     | "gpt-5.5"
-     | "gpt-5.5-pro"
-     | "codex-gpt-5.5"
-     | "chatgpt-gpt-5.5"
      | "gpt-5-codex"
      | "gpt-5.3-codex"
      | "arcee-ai/trinity-large-thinking"
@@ -414,9 +419,48 @@ pub fn model_supports_reasoning(model: &str) -> bool {
          | "z-ai/glm-5.2"
          | "glm-5.1"
          | "glm-5.2"
+     ) || is_openai_gpt_55_api_model(&lower)
+         || is_openai_codex_model(&lower)
+ }
+ fn is_openai_gpt_55_api_model(model_lower: &str) -> bool {
+     matches!(model_lower, "gpt-5.5" | "gpt-5.5-pro")
+         || has_date_snapshot_suffix(model_lower, "gpt-5.5-")
+         || has_date_snapshot_suffix(model_lower, "gpt-5.5-pro-")
+ }
+ fn is_openai_codex_model(model_lower: &str) -> bool {
+     matches!(
+         model_lower,
+         "gpt-5-codex"
+             | "gpt-5.1-codex"
+             | "gpt-5.1-codex-mini"
+             | "gpt-5.1-codex-max"
+             | "gpt-5.2-codex"
+             | "gpt-5.3-codex"
+             | "codex-gpt-5.5"
+             | "chatgpt-gpt-5.5"
+             | "gpt-5.5-codex"
+             | "gpt-5.5-codex-preview"
+             | "codex-gpt-5.5-preview"
+             | "chatgpt-gpt-5.5-preview"
      )
  }
+ fn has_date_snapshot_suffix(model_lower: &str, prefix: &str) -> bool {
+     let Some(rest) = model_lower.strip_prefix(prefix) else {
+         return false;
+     };
+     let bytes = rest.as_bytes();
+     bytes.len() == 10
+         && bytes[4] == b'-'
+         && bytes[7] == b'-'
+         && bytes
+             .iter()
+             .enumerate()
+             .all(|(idx, byte)| idx == 4 || idx == 7 || byte.is_ascii_digit())
+ }
  /// Parse an explicit `_Nk` context-window hint from a model name (vendor
  /// agnostic). Returns the window in tokens for `N` in `8..=1024`.
  fn explicit_context_window_hint(model_lower: &str) -> Option<u32> {
@@ -632,8 +676,13 @@ mod tests {
  }
  #[test]
- fn openai_codex_models_have_verified_context_metadata() {
-     for model in ["gpt-5.5", "codex-gpt-5.5", "chatgpt-gpt-5.5"] {
+ fn openai_api_and_codex_models_have_verified_context_metadata() {
+     for model in [
+         "gpt-5.5",
+         "gpt-5.5-pro",
+         "gpt-5.5-2026-04-23",
+         "gpt-5.5-pro-2026-04-23",
+     ] {
          assert_eq!(context_window_for_model(model), Some(1_050_000));
          assert_eq!(max_output_tokens_for_model(model), Some(128_000));
          assert!(model_supports_reasoning(model));
@@ -643,11 +692,30 @@ mod tests {
          );
      }
-     for model in ["gpt-5-codex", "gpt-5.3-codex"] {
+     for model in [
+         "gpt-5-codex",
+         "gpt-5.1-codex",
+         "gpt-5.1-codex-mini",
+         "gpt-5.1-codex-max",
+         "gpt-5.2-codex",
+         "gpt-5.3-codex",
+         "codex-gpt-5.5",
+         "chatgpt-gpt-5.5",
+         "gpt-5.5-codex",
+         "gpt-5.5-codex-preview",
+     ] {
          assert_eq!(context_window_for_model(model), Some(400_000));
          assert_eq!(max_output_tokens_for_model(model), Some(128_000));
          assert!(model_supports_reasoning(model));
+         assert_eq!(
+             compaction_threshold_for_model_at_percent(model, 80.0),
+             320_000
+         );
      }
+     assert_eq!(context_window_for_model("gpt-5.5-nano"), None);
+     assert_eq!(max_output_tokens_for_model("gpt-5.5-nano"), None);
+     assert!(!model_supports_reasoning("gpt-5.5-nano"));
  }
  #[test]
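
The snapshot matching added in this file can be exercised on its own. The sketch below restates `has_date_snapshot_suffix` from the hunk above with a plain `match` instead of `let ... else`, and pairs it with a hypothetical wrapper, `is_gpt_55_api_model`, that stands in for `is_openai_gpt_55_api_model`; it is an illustration, not the project's module.

```rust
/// Returns true when `model_lower` is `prefix` followed by exactly a
/// `YYYY-MM-DD`-shaped suffix (ten bytes, dashes at offsets 4 and 7).
pub fn has_date_snapshot_suffix(model_lower: &str, prefix: &str) -> bool {
    let rest = match model_lower.strip_prefix(prefix) {
        Some(rest) => rest,
        None => return false,
    };
    let bytes = rest.as_bytes();
    // The length check short-circuits before any indexing happens.
    bytes.len() == 10
        && bytes[4] == b'-'
        && bytes[7] == b'-'
        && bytes
            .iter()
            .enumerate()
            .all(|(idx, byte)| idx == 4 || idx == 7 || byte.is_ascii_digit())
}

/// Hypothetical wrapper: base ids plus dated snapshots of either id.
pub fn is_gpt_55_api_model(model_lower: &str) -> bool {
    matches!(model_lower, "gpt-5.5" | "gpt-5.5-pro")
        || has_date_snapshot_suffix(model_lower, "gpt-5.5-")
        || has_date_snapshot_suffix(model_lower, "gpt-5.5-pro-")
}
```

Note that the check is purely about shape: it accepts any ten-byte digits-and-dashes suffix, so an id such as `gpt-5.5-9999-99-99` would also match, while `gpt-5.5-nano` does not.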
+45 -7
@@ -34,6 +34,9 @@ pub struct PromptSessionContext<'a> {
  /// preserving backward compatibility with existing call sites
  /// that predate dynamic model injection.
  pub model_id: &'a str,
+ /// Route-effective context window, when known. This can differ from the
+ /// model-family maximum when a provider wrapper exposes a smaller envelope.
+ pub context_window_override: Option<u32>,
  /// Whether the user-visible transcript renders thinking blocks.
  /// When false, the prompt should not spend localization pressure on
  /// `reasoning_content` the user will never see.
@@ -52,6 +55,7 @@ impl Default for PromptSessionContext<'_> {
      locale_tag: "en",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  }
@@ -838,12 +842,17 @@ pub(crate) fn render_runtime_policy_reference() -> String {
  /// constant; this function produces a per-session variant so the prompt
  /// says "You are deepseek-v4-pro" or "You are deepseek-v4-flash" instead
  /// of a static placeholder.
- fn apply_model_template(prompt: &str, model_id: &str) -> String {
+ fn apply_model_template(
+     prompt: &str,
+     model_id: &str,
+     context_window_override: Option<u32>,
+ ) -> String {
      let mut prompt = prompt.replace("{model_id}", model_id);
      // #3025: Substitute model-specific facts so non-DeepSeek models don't
      // get V4 architecture claims, 1M-window assumptions, or Flash pricing.
-     let ctx_window = crate::models::context_window_for_model(model_id);
+     let ctx_window =
+         context_window_override.or_else(|| crate::models::context_window_for_model(model_id));
      let window_note = if let Some(window) = ctx_window {
          format!(
              "You have a {}-token context window. Do not summarize or delete \
@@ -999,7 +1008,7 @@ fn compose_default_static_layers(_personality: Personality, model_id: &str) -> S
      // Personality is now folded into the YAML constitution (constitution.yaml).
      // No separate overlay is appended — the base prompt already carries voice,
      // tone, and presentation guidance via the preamble and article text.
-     apply_model_template(effective_base_prompt().trim(), model_id)
+     apply_model_template(effective_base_prompt().trim(), model_id, None)
  }
  fn apply_static_prompt_composer(
@@ -1069,6 +1078,7 @@ pub fn system_prompt_for_mode_with_context_and_skills(
      locale_tag: "en",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -1098,8 +1108,17 @@ pub fn system_prompt_for_mode_with_context_skills_session_and_approval(
      instructions: Option<&[InstructionSource]>,
      session_context: PromptSessionContext<'_>,
  ) -> SystemPrompt {
-     let mode_prompt =
-         compose_prompt_with_approval_model_and_shell(Personality::Calm, session_context.model_id);
+     let default_layers = apply_model_template(
+         effective_base_prompt().trim(),
+         session_context.model_id,
+         session_context.context_window_override,
+     );
+     let mode_prompt = apply_static_prompt_composer(
+         effective_static_prompt_composer(),
+         Personality::Calm,
+         session_context.model_id,
+         &default_layers,
+     );
      // Load project context from workspace
      let project_context = load_project_context_with_parents(workspace);
@@ -1548,7 +1567,7 @@ mod tests {
  }
  #[test]
- fn compose_prompt_for_openai_codex_uses_verified_context_window() {
+ fn compose_prompt_for_openai_api_gpt_55_uses_verified_context_window() {
      let prompt = compose_prompt_with_approval_model_and_shell(Personality::Calm, "gpt-5.5");
      assert!(!prompt.contains("Your V4 Characteristics"));
      assert!(prompt.contains("1050000-token context window"));
@@ -1577,11 +1596,18 @@ mod tests {
  #[test]
  fn apply_model_template_replaces_placeholder() {
-     let result = apply_model_template("You are {model_id}", "deepseek-v4-pro");
+     let result = apply_model_template("You are {model_id}", "deepseek-v4-pro", None);
      assert_eq!(result, "You are deepseek-v4-pro");
      assert!(!result.contains("{model_id}"));
  }
+ #[test]
+ fn apply_model_template_uses_context_window_override() {
+     let result = apply_model_template("{context_window_note}", "gpt-5.5", Some(400_000));
+     assert!(result.contains("400000-token context window"));
+     assert!(!result.contains("1050000-token context window"));
+ }
  #[test]
  fn compose_prompt_injects_model_id() {
      let prompt =
@@ -1978,6 +2004,7 @@ mod tests {
      locale_tag: "zh-Hans",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -2048,6 +2075,7 @@ mod tests {
      locale_tag: "zh-Hans",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -2091,6 +2119,7 @@ mod tests {
      locale_tag: "zh-Hans",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: false,
      verbosity: None,
  },
@@ -2144,6 +2173,7 @@ mod tests {
      locale_tag: "en",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -2248,6 +2278,7 @@ mod tests {
      locale_tag: "ja",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -2285,6 +2316,7 @@ mod tests {
      locale_tag: "en",
      translation_enabled: false,
      model_id: "codewhale",
+     context_window_override: None,
      show_thinking: true,
      verbosity: None,
  },
@@ -2314,6 +2346,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: None, verbosity: None,
}, },
@@ -2372,6 +2405,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: None, verbosity: None,
}, },
@@ -2401,6 +2435,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: None, verbosity: None,
}, },
@@ -2608,6 +2643,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: None, verbosity: None,
}, },
@@ -2643,6 +2679,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: None, verbosity: None,
}, },
@@ -3186,6 +3223,7 @@ mod tests {
locale_tag: "en", locale_tag: "en",
translation_enabled: false, translation_enabled: false,
model_id: "codewhale", model_id: "codewhale",
context_window_override: None,
show_thinking: true, show_thinking: true,
verbosity: Some(" Concise "), verbosity: Some(" Concise "),
}, },
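Review note: the `apply_model_template_uses_context_window_override` test in this file pins the intended precedence, where a provider-supplied override replaces the model-native window in the prompt text. The sketch below illustrates that precedence only. `native_context_window` and `context_window_note` are hypothetical names invented here, since the real `apply_model_template` body is not part of this diff, and the lookup table is reduced to the single `gpt-5.5` value quoted in the changelog.

```rust
/// Hypothetical stand-in for the model-native metadata table (assumption).
fn native_context_window(model_id: &str) -> Option<u32> {
    match model_id {
        "gpt-5.5" => Some(1_050_000),
        _ => None,
    }
}

/// Hypothetical helper: an explicit override wins over native metadata,
/// and the note is omitted when neither source knows the window.
fn context_window_note(model_id: &str, context_window_override: Option<u32>) -> Option<String> {
    context_window_override
        .or_else(|| native_context_window(model_id))
        .map(|tokens| format!("{tokens}-token context window"))
}

fn main() {
    // OAuth route: the caller passes the effective window as an override.
    println!("{:?}", context_window_note("gpt-5.5", Some(400_000)));
    // Public API route: no override, so the native window is used.
    println!("{:?}", context_window_note("gpt-5.5", None));
}
```

Passing `None` keeps the existing behaviour, which is presumably why the remaining test fixtures in this file only gain `context_window_override: None`.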
+4 -5
@@ -4,10 +4,9 @@ use std::collections::HashSet;
 use std::fmt::Write;
 use crate::compaction::estimate_input_tokens_conservative;
+use crate::config::provider_capability;
 use crate::localization::{Locale, MessageId, tr};
-use crate::models::{
-    LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS, SystemPrompt, context_window_for_model,
-};
+use crate::models::SystemPrompt;
 use crate::session_manager::SessionContextReference;
 use crate::tui::app::{App, ToolDetailRecord};
 use crate::tui::file_mention::ContextReferenceSource;
@@ -154,8 +153,8 @@ pub fn build_context_inspector_text(app: &App, locale: Locale) -> String {
 }
 fn context_usage(app: &App) -> (usize, u32, f64) {
-    let max = context_window_for_model(app.effective_model_for_budget())
-        .unwrap_or(LEGACY_DEEPSEEK_CONTEXT_WINDOW_TOKENS);
+    let max =
+        provider_capability(app.api_provider, app.effective_model_for_budget()).context_window;
     let estimated =
         estimate_input_tokens_conservative(&app.api_messages, app.system_prompt.as_ref());
     let total_chars = estimate_message_chars(&app.api_messages);
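Review note: the call sites in this commit stop asking "what is this model's window?" and start asking "what window does this provider route give this model?". A minimal sketch of that split follows. Everything in it is an assumption made for illustration: the two-variant `ApiProvider` enum, the `ProviderCapability` struct shape, the fallback constant, and the rule that the Codex route caps the native window with `min`. The real `provider_capability` in `crate::config` is not shown in this diff and may differ.

```rust
/// Hypothetical, reduced provider enum (the real one has many more variants).
#[derive(Clone, Copy)]
enum ApiProvider {
    OpenAi,
    OpenAiCodex,
}

/// Hypothetical capability record; only the field used by this diff is modelled.
struct ProviderCapability {
    context_window: u32,
}

/// Effective window quoted in the changelog for the Codex-family route.
const CODEX_EFFECTIVE_CONTEXT_WINDOW: u32 = 400_000;
/// Assumed fallback for models missing from the native table.
const FALLBACK_CONTEXT_WINDOW: u32 = 128_000;

/// Hypothetical model-native metadata lookup.
fn native_context_window(model: &str) -> Option<u32> {
    match model {
        "gpt-5.5" => Some(1_050_000),
        _ => None,
    }
}

/// Assumed rule: the route can only shrink the native window, never grow it.
fn provider_capability(provider: ApiProvider, model: &str) -> ProviderCapability {
    let native = native_context_window(model).unwrap_or(FALLBACK_CONTEXT_WINDOW);
    let context_window = match provider {
        ApiProvider::OpenAiCodex => native.min(CODEX_EFFECTIVE_CONTEXT_WINDOW),
        ApiProvider::OpenAi => native,
    };
    ProviderCapability { context_window }
}

/// Usage as a percentage of the window, clamped to 0..=100 like the sidebar code.
fn usage_percent(estimated_tokens: usize, window: u32) -> f64 {
    if window == 0 {
        return 0.0;
    }
    ((estimated_tokens as f64 / f64::from(window)) * 100.0).clamp(0.0, 100.0)
}

fn main() {
    for provider in [ApiProvider::OpenAi, ApiProvider::OpenAiCodex] {
        let window = provider_capability(provider, "gpt-5.5").context_window;
        println!("window={window} usage={:.1}%", usage_percent(300_000, window));
    }
}
```

Under these assumptions the same 300,000-token conversation reads as well under a third of the window on the public API route but three quarters of it on the OAuth route, which is the gap that previously let requests reach the backend before preflight compaction ran.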
+2 -1
@@ -31,6 +31,7 @@ use super::app::{
 use super::history::{GenericToolCell, HistoryCell, ToolCell, ToolStatus, summarize_tool_output};
 use super::subagent_routing::active_fanout_counts;
 use super::ui_text::{concise_shell_command_label, truncate_line_to_width};
+use crate::config::provider_capability;
 /// Tolerance for floating-point cost comparison in the sidebar breakdown.
 /// Must be large enough that accumulated f64 error across hundreds of turns
@@ -2407,7 +2408,7 @@ fn render_context_panel(f: &mut Frame, area: Rect, app: &mut App) {
     // ── Token usage ──────────────────────────────────────────────
     let total_tokens = app.session.total_conversation_tokens;
-    let window = crate::models::context_window_for_model(&app.model).unwrap_or(1_048_576);
+    let window = provider_capability(app.api_provider, &app.model).context_window;
     let pct = if window > 0 {
         ((total_tokens as f64 / window as f64) * 100.0).clamp(0.0, 100.0)
     } else {
+7 -5
@@ -43,7 +43,7 @@ use crate::commands;
 use crate::compaction::estimate_input_tokens_conservative;
 use crate::config::{
     ApiProvider, Config, DEFAULT_NVIDIA_NIM_BASE_URL, ProviderConfig, ProvidersConfig, StatusItem,
-    UpdateConfig, save_provider_auth_mode_for,
+    UpdateConfig, provider_capability, save_provider_auth_mode_for,
 };
 use crate::config_ui::{self, ConfigUiMode, WebConfigSession, WebConfigSessionEvent};
 use crate::core::engine::{EngineConfig, EngineHandle, spawn_engine};
@@ -52,9 +52,7 @@ use crate::core::ops::{Op, USER_SHELL_TOOL_ID_PREFIX};
 use crate::hooks::{HookEvent, HookExecutor, TurnEndPayloadInput, TurnEndTotals};
 use crate::llm_client::LlmClient;
 use crate::localization::{MessageId, tr};
-use crate::models::{
-    ContentBlock, Message, MessageRequest, SystemPrompt, Usage, context_window_for_model,
-};
+use crate::models::{ContentBlock, Message, MessageRequest, SystemPrompt, Usage};
 use crate::palette;
 use crate::prompts;
 use crate::session_manager::{
@@ -5659,6 +5657,9 @@ async fn dispatch_user_message(
     locale_tag: app.ui_locale.tag(),
     translation_enabled: app.translation_enabled,
     model_id: &app.model,
+    context_window_override: Some(
+        provider_capability(app.api_provider, &app.model).context_window,
+    ),
     show_thinking: app.show_thinking,
     verbosity: app.verbosity.as_deref(),
 },
@@ -9425,7 +9426,8 @@ fn estimated_context_tokens(app: &App) -> Option<i64> {
 }
 pub(crate) fn context_usage_snapshot(app: &App) -> Option<(i64, u32, f64)> {
-    let max = context_window_for_model(app.effective_model_for_budget())?;
+    let max =
+        provider_capability(app.api_provider, app.effective_model_for_budget()).context_window;
     let max_i64 = i64::from(max);
     let reported = app
         .session
+3 -2
@@ -143,7 +143,7 @@ endpoint.
 | `ollama` | `[providers.ollama]` | Optional `OLLAMA_API_KEY` | `OLLAMA_BASE_URL`; default `http://localhost:11434/v1` | `deepseek-coder:1.3b`; provider-hinted custom tags pass through | Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. `OLLAMA_MODEL` is accepted. |
 | `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`, `HF_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible router route. Accepted aliases: `huggingface`, `hugging-face`, `hugging_face`, `hf`. Org-prefixed model IDs pass through. `HUGGINGFACE_MODEL` and `HF_MODEL` are accepted. Hub browsing/export are separate future features. |
 | `together` | `[providers.together]` | `TOGETHER_API_KEY` | `TOGETHER_BASE_URL`; default `https://api.together.xyz/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Together AI OpenAI-compatible route. `TOGETHER_MODEL` is accepted. Model aliases `deepseek-v4-pro` and `deepseek-v4-flash` normalize to Together's org-prefixed IDs. |
-| `openai-codex` | `[providers.openai_codex]` | OAuth via `codex login` (`~/.codex/auth.json`); env override `OPENAI_CODEX_ACCESS_TOKEN`, `CODEX_ACCESS_TOKEN` | `OPENAI_CODEX_BASE_URL`/`CODEX_BASE_URL`; default `https://chatgpt.com/backend-api` | `gpt-5.5` | **Experimental.** Reuses your existing ChatGPT/Codex CLI OAuth login and talks to the OpenAI Responses API at `/codex/responses`. The access token is read and refreshed from `~/.codex/auth.json`; no API key is stored. `OPENAI_CODEX_MODEL`/`CODEX_MODEL` and `OPENAI_CODEX_ACCOUNT_ID`/`CODEX_ACCOUNT_ID` are accepted. |
+| `openai-codex` | `[providers.openai_codex]` | OAuth via `codex login` (`~/.codex/auth.json`); env override `OPENAI_CODEX_ACCESS_TOKEN`, `CODEX_ACCESS_TOKEN` | `OPENAI_CODEX_BASE_URL`/`CODEX_BASE_URL`; default `https://chatgpt.com/backend-api` | `gpt-5.5` | **Experimental.** Reuses your existing ChatGPT/Codex CLI OAuth login and talks to the OpenAI Responses API at `/codex/responses`. The access token is read and refreshed from `~/.codex/auth.json`; no API key is stored. `OPENAI_CODEX_MODEL`/`CODEX_MODEL` and `OPENAI_CODEX_ACCOUNT_ID`/`CODEX_ACCOUNT_ID` are accepted. CodeWhale budgets this route with the 400K Codex-family effective context window even when the public API model table lists a larger native `gpt-5.5` window. |
 | `anthropic` | `[providers.anthropic]` | `ANTHROPIC_API_KEY` | `ANTHROPIC_BASE_URL`; default `https://api.anthropic.com` | `claude-opus-4-8`, `claude-sonnet-4-6` (default), `claude-haiku-4-5` | Native Anthropic Messages API route (`/v1/messages`, `x-api-key` + `anthropic-version: 2023-06-01`) — not OpenAI-compatible. Prompt caching via `cache_control` breakpoints, adaptive thinking + `output_config.effort`, signed thinking blocks replayed verbatim, cache telemetry normalized per #2961. `ANTHROPIC_MODEL` is accepted. |
 ### Hugging Face Provider vs MCP vs Hub
@@ -262,7 +262,8 @@ Anthropic uses Messages, and `openai-codex` uses Responses.
 | OpenRouter Qwen 3.6 Flash / Plus | 1,000,000 | 65,536 | yes | no | not documented in code |
 | OpenRouter Qwen 3.6 35B / 27B | 262,144 | 262,140 | yes | no | not documented in code |
 | OpenRouter Qwen 3.6 Max Preview | 262,144 | 65,536 | yes | no | not documented in code |
-| OpenAI Codex / ChatGPT `gpt-5.5` | 1,050,000 | 128,000 | yes | no | not documented in code |
+| OpenAI API `gpt-5.5` | 1,050,000 | 128,000 | yes | no | not documented in code |
+| OpenAI Codex / ChatGPT route (`openai-codex`) | 400,000 effective | 128,000 | yes | no | route uses Responses payload at `/codex/responses` |
 | Wanjie Ark `reasoner` / `r1` model IDs | 128,000 | 4,096 | yes | no | not documented in code |
 | Direct Arcee API `trinity-large-thinking` | 262,144 | 262,144 | yes | no | not documented in code |
 | Direct Arcee API `trinity-large-preview` | 262,144 | 4,096 | no in doctor capability metadata | no | not documented in code |