docs: align Hugging Face provider docs, errors, and tests with shipped route

This commit is contained in:
Matt Van Horn
2026-06-07 02:32:41 -07:00
parent 8dff2f7525
commit a855b41d91
6 changed files with 370 additions and 20 deletions
+21 -12
View File
@@ -7,8 +7,7 @@ those models become discoverable, evaluable, routable, servable, and exportable
without weakening the current terminal-agent contract: local workspace control,
explicit provider auth, approval gates, and clear privacy boundaries.
This document is roadmap language. It does not mean every workset below is
implemented today.
This document is roadmap language. Some worksets below are roadmap-only.
## Implemented Today
@@ -19,6 +18,10 @@ implemented today.
OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider
paths where their IDs appear in `/provider`, `codewhale --provider`, or
`codewhale models`.
- Hugging Face Inference Providers are available through the
OpenAI-compatible router at `https://router.huggingface.co/v1`. Select the
route with `huggingface`, `hugging-face`, `hugging_face`, or `hf`; configure
`HUGGINGFACE_API_KEY` or `HF_TOKEN` for auth.
- Model auto-routing chooses a concrete DeepSeek model and thinking level per
turn. It is not a TUI mode.
- Fin is the fast `deepseek-v4-flash` thinking-off path for routing,
@@ -27,11 +30,10 @@ implemented today.
- Self-hosted OpenAI-compatible endpoints can be used through SGLang, vLLM,
Ollama, or the generic `openai` provider configuration.
## Not Implemented Yet
## Still Planned
- A native Hugging Face provider or Hub browser.
- Built-in Hugging Face model card, dataset, adapter, safetensors, or Jobs
workflows.
- Hugging Face Hub browsing, upload/export, model card, dataset, adapter,
safetensors, or Jobs workflows.
- Native Unsloth, NeMo, or Arcee integrations.
- A dedicated Model Lab UI tab.
- Built-in benchmark suites, eval leaderboards, hosted observability, or
@@ -57,18 +59,24 @@ describe a model as available before CodeWhale can actually route to it.
## Hugging Face Workset
Implemented today:
- Hugging Face Inference Providers as an explicit OpenAI-compatible router
provider, selected with `huggingface`, `hugging-face`, `hugging_face`, or
`hf`.
- Model IDs are sent to the router exactly as selected, including
org-prefixed Hugging Face model IDs.
Planned scope:
- Hub API auth and model discovery.
- Model cards, licenses, tags, safetensors metadata, adapters, and dataset
links surfaced in a terminal-friendly way.
- Inference Providers as explicit provider choices when the user configures
them.
- Hugging Face Jobs as an optional remote execution path for user-approved
experiments.
Non-goal for now: claiming a native Hugging Face provider exists before it is
implemented in code.
Non-goal for now: treating the router route as Hub browsing/export, or
inferring Hub upload/export auth from the inference-provider API key.
## Unsloth Workset
@@ -138,8 +146,9 @@ Planned scope:
- Local files, prompts, transcripts, traces, model outputs, eval results,
adapters, datasets, and checkpoints should remain local unless the user
explicitly chooses a provider or export destination.
- Provider auth must remain explicit. `DEEPSEEK_*`, OpenRouter, Hugging Face,
and self-hosted credentials should not be inferred from unrelated config.
- Provider auth must remain explicit. `DEEPSEEK_*`, OpenRouter,
`HUGGINGFACE_API_KEY` / `HF_TOKEN`, and self-hosted credentials should not be
inferred from unrelated config.
- Exportable artifacts should include provenance: source model, provider,
route, tool policy, eval inputs, and redaction status.
- Public sharing, hosted telemetry, sponsorship badges, and external branding
+18 -1
View File
@@ -44,6 +44,11 @@ Use any of these surfaces to select a provider:
as legacy aliases for `deepseek`. They do not select a different official host;
DeepSeek uses the same official API host worldwide.
`huggingface`, `hugging-face`, `hugging_face`, and `hf` all select the
Hugging Face Inference Providers route. This is the OpenAI-compatible router
path for chat/inference, not Hub browsing, model-card inspection, uploads, or
artifact export.
Fresh shared config writes to `~/.codewhale/config.toml`. Existing
`~/.deepseek/config.toml` files are still read for compatibility.
@@ -128,7 +133,7 @@ endpoint.
| `sglang` | `[providers.sglang]` | Optional `SGLANG_API_KEY` | `SGLANG_BASE_URL`; default `http://localhost:30000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted OpenAI-compatible route. Localhost deployments commonly omit auth. `SGLANG_MODEL` is accepted. |
| `vllm` | `[providers.vllm]` | Optional `VLLM_API_KEY` | `VLLM_BASE_URL`; default `http://localhost:8000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted vLLM OpenAI-compatible route. Localhost deployments commonly omit auth. `VLLM_MODEL` is accepted. |
| `ollama` | `[providers.ollama]` | Optional `OLLAMA_API_KEY` | `OLLAMA_BASE_URL`; default `http://localhost:11434/v1` | `deepseek-coder:1.3b`; provider-hinted custom tags pass through | Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. `OLLAMA_MODEL` is accepted. |
| `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible route. Org-prefixed model IDs pass through. |
| `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`, `HF_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible router route. Accepted aliases: `huggingface`, `hugging-face`, `hugging_face`, `hf`. Org-prefixed model IDs pass through. `HUGGINGFACE_MODEL` and `HF_MODEL` are accepted. Hub browsing/export are separate future features. |
### Xiaomi MiMo Notes
@@ -223,6 +228,18 @@ the endpoint's ability to accept OpenAI-compatible `tools` payloads. A custom
OpenAI-compatible or local endpoint can still reject tool calls even if
CodeWhale can send the schema.
### Hugging Face Inference Providers Notes
The shipped Hugging Face route targets the OpenAI-compatible Inference Providers
router at `https://router.huggingface.co/v1`. Configure auth with
`HUGGINGFACE_API_KEY` first, or `HF_TOKEN` as a fallback. Configure the endpoint
with `HUGGINGFACE_BASE_URL` first, or `HF_BASE_URL` as a fallback; configure the
model with `HUGGINGFACE_MODEL` first, or `HF_MODEL` as a fallback.
This route does not imply Hub browsing, model-card metadata, dataset access,
Jobs, uploads, or export. Those remain explicit Model Lab work items so
provider auth and artifact movement stay separate.
### When a Local Model Prints Tool JSON
CodeWhale only executes tools when the provider returns Chat Completions