Files
codewhale/docs/PROVIDERS.md
T
CodeWhale Agent 89a9981bf9 Merge PR #2879: Hugging Face provider docs and tests
Harvested from PR #2879 by @mvanhorn

Co-authored-by: mvanhorn <455140+mvanhorn@users.noreply.github.com>
2026-06-12 13:56:03 -07:00

28 KiB

Provider Registry

This registry describes provider behavior that is wired into the current CodeWhale codebase. It is intentionally conservative: shipped entries are limited to provider IDs, config keys, auth paths, base URLs, model resolution, and capability metadata that the code already knows about.

DeepSeek remains the first-class default provider. NVIDIA NIM, OpenRouter, Volcengine Ark, Xiaomi MiMo, Novita, Fireworks, SiliconFlow, Arcee AI, generic OpenAI-compatible endpoints, self-hosted runtimes, Moonshot/Kimi, and Hugging Face Inference Providers are additive routes for running the same terminal harness against other hosted or local model endpoints.

Sources to keep in sync:

  • crates/config/src/lib.rs - shared provider IDs, defaults, env precedence.
  • crates/tui/src/config.rs - TUI provider IDs, provider capability metadata, and provider-specific env handling.
  • crates/agent/src/lib.rs - static ModelRegistry used by codewhale model list and codewhale model resolve.
  • config.example.toml and docs/CONFIGURATION.md - user-facing config examples and environment variable reference.
  • scripts/check-provider-registry.py - drift check for canonical provider IDs, live TUI provider IDs, TOML table names, static registry rows, and documented defaults.

Provider Selection

The canonical provider IDs are:

deepseek, nvidia-nim, openai, atlascloud, wanjie-ark, volcengine, openrouter, xiaomi-mimo, novita, fireworks, siliconflow, siliconflow-CN, arcee, moonshot, sglang, vllm, ollama, huggingface, together, openai-codex, and anthropic.

Use any of these surfaces to select a provider:

  • CLI: codewhale --provider <id>
  • TUI: /provider <id> or the provider picker
  • Env: CODEWHALE_PROVIDER=<id>; DEEPSEEK_PROVIDER=<id> is the legacy alias
  • Config: provider = "<id>"

deepseek-cn, deepseek_china, deepseekcn, and deepseek-china are accepted as legacy aliases for deepseek. They do not select a different official host; DeepSeek uses the same official API host worldwide.

huggingface, hugging-face, hugging_face, and hf all select the Hugging Face Inference Providers route. This is the OpenAI-compatible router path for chat/inference, not Hub browsing, model-card inspection, uploads, or artifact export.

Fresh shared config writes to ~/.codewhale/config.toml. Existing ~/.deepseek/config.toml files are still read for compatibility.

Auth And Env Rules

For hosted providers, codewhale auth set --provider <id> saves an API key for that provider. API-key environment variables are fallback inputs after saved config and keyring credentials; an explicit process-level --api-key still wins for that launch.

For base URL and model selection, prefer:

  • CODEWHALE_BASE_URL / CODEWHALE_MODEL for the active provider.
  • Provider-specific base URL/model env vars when listed below.
  • DEEPSEEK_BASE_URL, DEEPSEEK_MODEL, and DEEPSEEK_DEFAULT_TEXT_MODEL as legacy aliases.

Non-local http:// base URLs are rejected unless DEEPSEEK_ALLOW_INSECURE_HTTP=1 is set. Loopback HTTP URLs are allowed for self-hosted runtimes.

Custom DeepSeek-Compatible Endpoints

Most custom DeepSeek-compatible deployments can use an existing provider ID. Do not create [providers.deepseek_custom]; the provider table names are fixed. Instead, choose the closest shipped route and override its endpoint/model:

  • DeepSeek-compatible hosted API: keep provider = "deepseek" and set [providers.deepseek].base_url plus [providers.deepseek].model, or launch with DEEPSEEK_BASE_URL and DEEPSEEK_MODEL.
  • Generic OpenAI-compatible gateway: use provider = "openai" with [providers.openai].base_url plus [providers.openai].model, or launch with OPENAI_BASE_URL and OPENAI_MODEL.
  • Local OpenAI-compatible runtimes: use provider = "vllm", "sglang", or "ollama" with the matching provider-specific base URL/model values.

Example user config for a DeepSeek-compatible host:

provider = "deepseek"

[providers.deepseek]
api_key = "YOUR_API_KEY"
base_url = "https://your-provider.example/v1"
model = "deepseek-ai/DeepSeek-V4-Pro"

Example user config for a generic gateway:

provider = "openai"

[providers.openai]
api_key = "YOUR_GATEWAY_API_KEY"
base_url = "https://gateway.example/v1"
model = "your-deepseek-compatible-model"

Private gateways with broken or intercepted certificates should use SSL_CERT_FILE with a trusted CA bundle. As a last resort, insecure_skip_tls_verify = true can be set on the active [providers.*] table; it applies only to the LLM provider client and is shown by codewhale doctor.

Keep provider, api_key, and base_url in user config or process environment. Project-local config overlays intentionally cannot set those keys, so a repository cannot silently redirect prompts or credentials to another endpoint.

Shipped Providers

Provider ID TOML table Auth env Base URL env and default Default or static models Notes
deepseek [providers.deepseek] DEEPSEEK_API_KEY CODEWHALE_BASE_URL / DEEPSEEK_BASE_URL; default https://api.deepseek.com/beta deepseek-v4-pro, deepseek-v4-flash; compatibility aliases deepseek-chat, deepseek-reasoner First-class default. Beta URL enables strict tool mode, chat prefix completion, and FIM completion. Set https://api.deepseek.com or /v1 explicitly to opt out of beta-only features.
nvidia-nim [providers.nvidia_nim] NVIDIA_API_KEY, NVIDIA_NIM_API_KEY, fallback DEEPSEEK_API_KEY NVIDIA_NIM_BASE_URL, NIM_BASE_URL, NVIDIA_BASE_URL; default https://integrate.api.nvidia.com/v1 deepseek-ai/deepseek-v4-pro, deepseek-ai/deepseek-v4-flash Hosted DeepSeek V4 through NVIDIA NIM. NVIDIA_NIM_MODEL is accepted by the TUI config path.
openai [providers.openai] OPENAI_API_KEY OPENAI_BASE_URL; default https://api.openai.com/v1 Registry entries: deepseek-v4-pro, deepseek-v4-flash; default config model deepseek-v4-pro Generic OpenAI-compatible route for gateways and custom endpoints. Use this for explicit third-party OpenAI-compatible routes instead of inventing a new provider ID. OPENAI_MODEL is accepted.
atlascloud [providers.atlascloud] ATLASCLOUD_API_KEY ATLASCLOUD_BASE_URL; default https://api.atlascloud.ai/v1 Default deepseek-ai/deepseek-v4-flash; explicit vendor/model-id values pass through when AtlasCloud is selected OpenAI-compatible hosted route. ATLASCLOUD_MODEL is accepted by the TUI config path, the static ModelRegistry keeps DeepSeek V4 fallback rows, and provider-hinted CLI model IDs are sent to AtlasCloud exactly as requested.
wanjie-ark [providers.wanjie_ark] WANJIE_ARK_API_KEY, WANJIE_API_KEY, WANJIE_MAAS_API_KEY WANJIE_ARK_BASE_URL, WANJIE_BASE_URL, WANJIE_MAAS_BASE_URL; default https://maas-openapi.wanjiedata.com/api/v1 deepseek-reasoner OpenAI-compatible hosted route. WANJIE_ARK_MODEL, WANJIE_MODEL, and WANJIE_MAAS_MODEL are accepted.
volcengine [providers.volcengine] VOLCENGINE_API_KEY, VOLCENGINE_ARK_API_KEY, ARK_API_KEY VOLCENGINE_BASE_URL, VOLCENGINE_ARK_BASE_URL, ARK_BASE_URL; default https://ark.cn-beijing.volces.com/api/coding/v3 DeepSeek-V4-Pro, DeepSeek-V4-Flash Volcengine/Volcano Engine Ark OpenAI-compatible coding endpoint. VOLCENGINE_MODEL and VOLCENGINE_ARK_MODEL are accepted.
openrouter [providers.openrouter] OPENROUTER_API_KEY OPENROUTER_BASE_URL; default https://openrouter.ai/api/v1 deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash; recent large IDs include arcee-ai/trinity-large-thinking, minimax/minimax-m3, xiaomi/mimo-v2.5-pro, qwen/qwen3.6-flash, qwen/qwen3.6-35b-a3b, qwen/qwen3.6-max-preview, qwen/qwen3.6-27b, qwen/qwen3.6-plus, google/gemma-4-31b-it, z-ai/glm-5.1, moonshotai/kimi-k2.7-code, moonshotai/kimi-k2.6 Additive open-model routing layer. It does not replace DeepSeek; it lets users route supported model IDs through OpenRouter when they choose it.
xiaomi-mimo [providers.xiaomi_mimo] XIAOMI_MIMO_TOKEN_PLAN_API_KEY, MIMO_TOKEN_PLAN_API_KEY, XIAOMI_MIMO_API_KEY, XIAOMI_API_KEY, MIMO_API_KEY XIAOMI_MIMO_BASE_URL, MIMO_BASE_URL, XIAOMI_MIMO_MODE, MIMO_MODE; default https://token-plan-sgp.xiaomimimo.com/v1 Chat: mimo-v2.5-pro, mimo-v2.5; speech/TTS: mimo-v2.5-tts, mimo-v2.5-tts-voicedesign, mimo-v2.5-tts-voiceclone, mimo-v2-tts Xiaomi MiMo OpenAI-compatible chat completions route. Token Plan keys (tp-...) use api-key auth and the token-plan endpoint by default; pay-as-you-go mode uses standard API keys (sk-...) and https://api.xiaomimimo.com/v1. It sends max_completion_tokens and uses MiMo's thinking field for reasoning control. codewhale speech / tts uses the TTS models.
novita [providers.novita] NOVITA_API_KEY NOVITA_BASE_URL; default https://api.novita.ai/v1 deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash OpenAI-compatible hosted route for DeepSeek model IDs. Use config or CODEWHALE_MODEL / DEEPSEEK_MODEL for model overrides.
fireworks [providers.fireworks] FIREWORKS_API_KEY FIREWORKS_BASE_URL; default https://api.fireworks.ai/inference/v1 accounts/fireworks/models/deepseek-v4-pro OpenAI-compatible hosted route. Use config or CODEWHALE_MODEL / DEEPSEEK_MODEL for model overrides.
siliconflow [providers.siliconflow] SILICONFLOW_API_KEY SILICONFLOW_BASE_URL; default https://api.siliconflow.com/v1 deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash OpenAI-compatible hosted route. Official docs use the .com endpoint. SILICONFLOW_MODEL is accepted. Reasoning aliases deepseek-reasoner and deepseek-r1 map to Pro; deepseek-chat and deepseek-v3 map to Flash.
siliconflow-CN [providers.siliconflow_cn] SILICONFLOW_API_KEY SILICONFLOW_BASE_URL; default https://api.siliconflow.cn/v1 Uses the SiliconFlow model set China regional SiliconFlow route. Falls back to [providers.siliconflow] for api_key / base_url / model when unset. Select it with provider = "siliconflow-CN" or CODEWHALE_PROVIDER=siliconflow-CN.
arcee [providers.arcee] ARCEE_API_KEY ARCEE_BASE_URL; default https://api.arcee.ai/api/v1 trinity-large-thinking, trinity-large-preview Arcee AI direct OpenAI-compatible route, tracked as 256K-context BF16 serving. ARCEE_MODEL is accepted. OpenRouter's arcee-ai/trinity-large-thinking remains the OpenRouter namespaced model ID; direct Arcee uses the bare trinity-large-thinking ID.
moonshot [providers.moonshot] MOONSHOT_API_KEY, KIMI_API_KEY MOONSHOT_BASE_URL, KIMI_BASE_URL; default https://api.moonshot.ai/v1 kimi-k2.7-code, kimi-k2.6; Kimi Code path uses kimi-for-coding at https://api.kimi.com/coding/v1 Moonshot/Kimi route. kimi and kimi-k2 aliases select kimi-k2.7-code; MOONSHOT_MODEL, KIMI_MODEL_NAME, and KIMI_MODEL are accepted. [providers.moonshot] auth_mode = "kimi_oauth" reads Kimi CLI OAuth credentials when present.
sglang [providers.sglang] Optional SGLANG_API_KEY SGLANG_BASE_URL; default http://localhost:30000/v1 deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash Self-hosted OpenAI-compatible route. Localhost deployments commonly omit auth. SGLANG_MODEL is accepted.
vllm [providers.vllm] Optional VLLM_API_KEY VLLM_BASE_URL; default http://localhost:8000/v1 deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash Self-hosted vLLM OpenAI-compatible route. Localhost deployments commonly omit auth. VLLM_MODEL is accepted.
ollama [providers.ollama] Optional OLLAMA_API_KEY OLLAMA_BASE_URL; default http://localhost:11434/v1 deepseek-coder:1.3b; provider-hinted custom tags pass through Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. OLLAMA_MODEL is accepted.
huggingface [providers.huggingface] HUGGINGFACE_API_KEY, HF_TOKEN HUGGINGFACE_BASE_URL, HF_BASE_URL; default https://router.huggingface.co/v1 deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash Hugging Face Inference Providers OpenAI-compatible router route. Accepted aliases: huggingface, hugging-face, hugging_face, hf. Org-prefixed model IDs pass through. HUGGINGFACE_MODEL and HF_MODEL are accepted. Hub browsing/export are separate future features.
together [providers.together] TOGETHER_API_KEY TOGETHER_BASE_URL; default https://api.together.xyz/v1 deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash Together AI OpenAI-compatible route. TOGETHER_MODEL is accepted. Model aliases deepseek-v4-pro and deepseek-v4-flash normalize to Together's org-prefixed IDs.
openai-codex [providers.openai_codex] OAuth via codex login (~/.codex/auth.json); env override OPENAI_CODEX_ACCESS_TOKEN, CODEX_ACCESS_TOKEN OPENAI_CODEX_BASE_URL/CODEX_BASE_URL; default https://chatgpt.com/backend-api gpt-5.5 Experimental. Reuses your existing ChatGPT/Codex CLI OAuth login and talks to the OpenAI Responses API at /codex/responses. The access token is read and refreshed from ~/.codex/auth.json; no API key is stored. OPENAI_CODEX_MODEL/CODEX_MODEL and OPENAI_CODEX_ACCOUNT_ID/CODEX_ACCOUNT_ID are accepted.
anthropic [providers.anthropic] ANTHROPIC_API_KEY ANTHROPIC_BASE_URL; default https://api.anthropic.com claude-opus-4-8, claude-sonnet-4-6 (default), claude-haiku-4-5 Native Anthropic Messages API route (/v1/messages, x-api-key + anthropic-version: 2023-06-01) — not OpenAI-compatible. Prompt caching via cache_control breakpoints, adaptive thinking + output_config.effort, signed thinking blocks replayed verbatim, cache telemetry normalized per #2961. ANTHROPIC_MODEL is accepted.

Hugging Face Provider vs MCP vs Hub

CodeWhale's huggingface provider ID is only the OpenAI-compatible chat inference route through Hugging Face Inference Providers. It is selected with /provider huggingface, CODEWHALE_PROVIDER=huggingface, or provider = "huggingface".

Hugging Face MCP is a separate external-tool route. Configure it through the MCP config described in docs/MCP.md, preferably using the settings-generated snippet from https://huggingface.co/settings/mcp. In the TUI, /hf mcp status checks whether the Hugging Face MCP server appears in the resolved MCP config, /hf mcp setup prints the settings workflow and a placeholder-only shape, and /hf concepts explains the provider/MCP/Hub distinction.

Hub publishing or repository management remains explicit user action through Hub-native tooling such as huggingface_hub or git. The /hf helper does not upload to Hugging Face and does not perform direct Hugging Face Hub HTTP search.

Xiaomi MiMo Notes

xiaomi-mimo defaults to mimo-v2.5-pro for long-context reasoning and coding work. The chat picker also exposes the latest Omni model mimo-v2.5. Xiaomi MiMo TTS is available through codewhale --provider xiaomi-mimo speech "text" --model tts (or the tts alias) plus model-visible speech / tts tools in Agent/YOLO mode.

Token Plan keys default to the Singapore endpoint https://token-plan-sgp.xiaomimimo.com/v1. If your MiMo account is provisioned for the China region, set base_url = "https://token-plan-cn.xiaomimimo.com/v1" explicitly in [providers.xiaomi_mimo] or set mode = "token-plan-cn". Europe Token Plan accounts can use mode = "token-plan-ams"; mode = "pay-as-you-go" selects the standard API endpoint and standard MiMo key family.

Voice-design and voice-clone shorthands map to mimo-v2.5-tts-voicedesign and mimo-v2.5-tts-voiceclone. Xiaomi's current image-understanding guide includes mimo-v2.5 for image input. CodeWhale exposes image analysis through the separate [vision_model] / image_analyze path; set that model to mimo-v2.5 when using MiMo for vision.

Recent OpenRouter Large Models

OpenRouter completions and static registry rows include the April 2026 onward large models verified through OpenRouter's model metadata: arcee-ai/trinity-large-thinking, qwen/qwen3.6-flash, qwen/qwen3.6-35b-a3b, qwen/qwen3.6-max-preview, qwen/qwen3.6-27b, qwen/qwen3.6-plus, minimax/minimax-m3, xiaomi/mimo-v2.5-pro, xiaomi/mimo-v2.5, moonshotai/kimi-k2.7-code, moonshotai/kimi-k2.6, z-ai/glm-5.1, tencent/hy3-preview, google/gemma-4-31b-it, google/gemma-4-26b-a4b-it, and nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free. minimax/minimax-m3 was added from OpenRouter's May 31, 2026 listing as a 1M context multimodal model for coding, tool use, and long-horizon agentic work.

Static Model Registry

codewhale model list and codewhale model resolve use the static registry in crates/agent/src/lib.rs. This is not the same as live /models discovery. Use /models or codewhale models to fetch model IDs from the active API endpoint when the endpoint supports model listing.

Provider Static registry entries Tool calls Registry reasoning flag
deepseek deepseek-v4-pro, deepseek-v4-flash yes yes
nvidia-nim deepseek-ai/deepseek-v4-pro, deepseek-ai/deepseek-v4-flash yes yes
openai deepseek-v4-pro, deepseek-v4-flash yes yes
atlascloud deepseek-ai/deepseek-v4-flash, deepseek-ai/deepseek-v4-pro yes yes
wanjie-ark deepseek-reasoner yes yes
volcengine DeepSeek-V4-Pro, DeepSeek-V4-Flash yes yes
openrouter deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash, arcee-ai/trinity-large-thinking, minimax/minimax-m3, minimax/minimax-2.7, xiaomi/mimo-v2.5-pro, xiaomi/mimo-v2.5, qwen/qwen3.6-flash, qwen/qwen3.6-35b-a3b, qwen/qwen3.6-max-preview, qwen/qwen3.6-27b, qwen/qwen3.6-plus, qwen/qwen3.7-max, moonshotai/kimi-k2.7-code, moonshotai/kimi-k2.6, z-ai/glm-5.1, tencent/hy3-preview, google/gemma-4-31b-it, google/gemma-4-26b-a4b-it, nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free, nvidia/nemotron-3-ultra-550b-a55b yes yes
xiaomi-mimo mimo-v2.5-pro, mimo-v2.5; speech/TTS IDs are selected through codewhale speech / tts yes yes for chat models; no for speech/TTS models
novita deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash yes yes
fireworks accounts/fireworks/models/deepseek-v4-pro yes yes
siliconflow deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash yes yes
arcee trinity-large-thinking, trinity-large-preview; provider-hinted custom model IDs pass through yes yes for trinity-large-thinking; no for trinity-large-preview
moonshot kimi-k2.7-code, kimi-k2.6 yes yes
sglang deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash yes yes
vllm deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash yes yes
ollama deepseek-coder:1.3b; custom tags pass through when provider hint is ollama yes no
huggingface deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash yes no
together deepseek-ai/DeepSeek-V4-Pro, deepseek-ai/DeepSeek-V4-Flash yes yes
openai-codex gpt-5.5 yes yes
anthropic claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 yes yes for claude-opus-4-8 and claude-sonnet-4-6; no for claude-haiku-4-5

AtlasCloud keeps the same default model as the config layer and adds provider-scoped aliases for the Pro and Flash rows. Other AtlasCloud model IDs should still be selected through ATLASCLOUD_MODEL, config, or live model listing when available.

Capability Metadata

codewhale-tui doctor --json exposes the capability object. It is static metadata, not a live API probe. Current fields are:

resolved_provider, resolved_model, context_window, max_output, thinking_supported, cache_telemetry_supported, and request_payload_mode.

Most shipped providers use the Chat Completions request payload mode. Native Anthropic uses Messages, and openai-codex uses Responses.

Provider/model class Context window Max output metadata Thinking support Cache telemetry FIM endpoint
DeepSeek V4 (deepseek-v4-pro, deepseek-v4-flash) 1,000,000 384,000 yes yes DeepSeek beta only
DeepSeek compatibility aliases (deepseek-chat, deepseek-reasoner) 1,000,000 384,000 yes yes DeepSeek beta only
NVIDIA NIM V4 registry models 1,000,000 384,000 yes yes not documented in code
Volcengine Ark V4 model IDs 1,000,000 384,000 yes yes not documented in code
OpenRouter, Novita, Fireworks, SiliconFlow, SGLang, and vLLM V4 model IDs 1,000,000 384,000 yes no not documented in code
Xiaomi MiMo mimo-v2.5-pro, mimo-v2.5 1,000,000 131,072 yes no not documented in code
OpenRouter Qwen 3.6 Flash / Plus 1,000,000 65,536 yes no not documented in code
OpenRouter Qwen 3.6 35B / 27B 262,144 262,140 yes no not documented in code
OpenRouter Qwen 3.6 Max Preview 262,144 65,536 yes no not documented in code
OpenAI Codex / ChatGPT gpt-5.5 1,050,000 128,000 yes no not documented in code
Wanjie Ark reasoner / r1 model IDs 128,000 4,096 yes no not documented in code
Direct Arcee API trinity-large-thinking 262,144 262,144 yes no not documented in code
Direct Arcee API trinity-large-preview 262,144 4,096 no in doctor capability metadata no not documented in code
Generic openai, AtlasCloud, and Moonshot/Kimi 128,000 4,096 no in doctor capability metadata no not documented in code
Ollama 8,192 4,096 no no not documented in code
Hugging Face Inference Providers V4 model IDs 131,072 4,096 yes no not documented in code
Other recognized DeepSeek model IDs 128,000 unless the model name carries an explicit Nk hint 4,096 no unless V4/reasoner logic matches DeepSeek/NIM only DeepSeek beta only

Tool-call support is tracked separately by the static ModelRegistry and by the endpoint's ability to accept OpenAI-compatible tools payloads. A custom OpenAI-compatible or local endpoint can still reject tool calls even if CodeWhale can send the schema.

Hugging Face Inference Providers Notes

The shipped Hugging Face route targets the OpenAI-compatible Inference Providers router at https://router.huggingface.co/v1. Configure auth with HUGGINGFACE_API_KEY first, or HF_TOKEN as a fallback. Configure the endpoint with HUGGINGFACE_BASE_URL first, or HF_BASE_URL as a fallback; configure the model with HUGGINGFACE_MODEL first, or HF_MODEL as a fallback.

This route does not imply Hub browsing, model-card metadata, dataset access, Jobs, uploads, or export. Those remain explicit Model Lab work items so provider auth and artifact movement stay separate.

When a Local Model Prints Tool JSON

CodeWhale only executes tools when the provider returns Chat Completions tool_calls or streamed delta.tool_calls. If a local model prints text such as {"name":"grep_files","arguments":{...}} in the assistant message, that is ordinary model output, not an executable tool request.

For OpenAI-compatible or local runtimes, check:

  • The endpoint accepts the tools array in /v1/chat/completions requests.
  • The selected model or chat template is configured for function/tool calls.
  • The server returns tool_calls in the response rather than plain JSON text.
  • The compatibility layer does not strip tools before forwarding the request.
  • If in doubt, test a small read_file or grep_files request against a known tool-calling model before debugging CodeWhale's tool registry.

Changing provider, base_url, or model can select a route that supports the OpenAI-compatible payload shape, but CodeWhale cannot convert arbitrary JSON text into a trusted tool call after the model has emitted it as prose.

DeepSeek compatibility aliases deepseek-chat and deepseek-reasoner map to deepseek-v4-flash capability metadata and are scheduled to retire on 2026-07-24 at 2026-07-24T15:59:00Z.

Reasoning Effort

/reasoning <effort> (and the reasoning_effort config key) is translated to each provider's wire dialect by the client before the request is sent. off disables thinking where the dialect supports it; providers marked "omitted" receive no reasoning fields at all for that tier.

Provider off low/medium/high max/xhigh
deepseek, deepseek-cn, siliconflow, siliconflow-CN, sglang, volcengine, atlascloud thinking: {type: disabled} reasoning_effort: "high" + thinking: {type: enabled} reasoning_effort: "max" + thinking: {type: enabled}
openrouter, novita, together thinking: {type: disabled} reasoning_effort pass-through + thinking: {type: enabled} reasoning_effort: "xhigh" + thinking: {type: enabled}
moonshot thinking: {type: disabled} thinking: {type: enabled} thinking: {type: enabled}
ollama think: false think: true think: true
xiaomi-mimo thinking: {type: disabled} thinking: {type: enabled} thinking: {type: enabled}
nvidia-nim chat_template_kwargs.thinking: false chat_template_kwargs: thinking: true + reasoning_effort: "high" chat_template_kwargs: thinking: true + reasoning_effort: "max"
vllm chat_template_kwargs.enable_thinking: false chat_template_kwargs.enable_thinking: true + reasoning_effort low/medium/high chat_template_kwargs.enable_thinking: true + reasoning_effort: "high" (vLLM has no max tier)
arcee, huggingface omitted reasoning_effort pass-through reasoning_effort: "high"
fireworks omitted reasoning_effort: "high" reasoning_effort: "max"
openai, wanjie-ark omitted omitted omitted
openai-codex Responses API reasoning field (handled by the Responses bridge) Responses API reasoning field Responses API reasoning field

AtlasCloud serves DeepSeek models, so it speaks the DeepSeek reasoning dialect, including the max tier (#3024).

Drift Check

Run this before changing provider IDs, provider TOML tables, static model registry rows, or provider default strings:

python3 scripts/check-provider-registry.py

The check fails when:

  • docs/PROVIDERS.md omits a canonical ProviderKind::as_str() ID.
  • crates/tui/src/config.rs ApiProvider::as_str() diverges from ProviderKind::as_str() except for the explicit deepseek-cn legacy alias.
  • The shipped-provider table omits or adds a [providers.*] TOML table.
  • The static model registry table drifts from providers used by crates/agent/src/lib.rs.
  • A provider default model or base URL constant in crates/tui/src/config.rs is no longer mentioned here.

Planned, Not Shipped Yet

These items belong to the v0.8.48+ provider-abstraction milestone or related provider docs work, but they are not native shipped behavior in this checkout:

  • A unified Provider trait in codewhale-agent that owns env precedence, secret resolution, base URL normalization, auth-header construction, and provider metadata. Those responsibilities are still split across crates/config, crates/secrets, and crates/tui/src/client.rs.
  • Hugging Face model passport metadata in the picker, including license, base model, context length, chat template, tool-call support, reasoning support, and gated/private status.