docs: align Hugging Face provider docs, errors, and tests with shipped route

2026-06-07 02:32:41 -07:00
parent 8dff2f7525
commit a855b41d91
6 changed files with 370 additions and 20 deletions
@@ -7,8 +7,7 @@ those models become discoverable, evaluable, routable, servable, and exportable
 without weakening the current terminal-agent contract: local workspace control,
 explicit provider auth, approval gates, and clear privacy boundaries.

-This document is roadmap language. It does not mean every workset below is
-implemented today.
+This document is roadmap language. Some worksets below are roadmap-only.

 ## Implemented Today

@@ -19,6 +18,10 @@ implemented today.
  OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider
  paths where their IDs appear in `/provider`, `codewhale --provider`, or
  `codewhale models`.
+- Hugging Face Inference Providers are available through the
+  OpenAI-compatible router at `https://router.huggingface.co/v1`. Select the
+  route with `huggingface`, `hugging-face`, `hugging_face`, or `hf`; configure
+  `HUGGINGFACE_API_KEY` or `HF_TOKEN` for auth.
 - Model auto-routing chooses a concrete DeepSeek model and thinking level per
  turn. It is not a TUI mode.
 - Fin is the fast `deepseek-v4-flash` thinking-off path for routing,
@@ -27,11 +30,10 @@ implemented today.
 - Self-hosted OpenAI-compatible endpoints can be used through SGLang, vLLM,
  Ollama, or the generic `openai` provider configuration.

-## Not Implemented Yet
+## Still Planned

- A native Hugging Face provider or Hub browser.
- Built-in Hugging Face model card, dataset, adapter, safetensors, or Jobs
-  workflows.
+- Hugging Face Hub browsing, upload/export, model card, dataset, adapter,
+  safetensors, or Jobs workflows.
 - Native Unsloth, NeMo, or Arcee integrations.
 - A dedicated Model Lab UI tab.
 - Built-in benchmark suites, eval leaderboards, hosted observability, or
@@ -57,18 +59,24 @@ describe a model as available before CodeWhale can actually route to it.

 ## Hugging Face Workset

+Implemented today:
+
+- Hugging Face Inference Providers as an explicit OpenAI-compatible router
+  provider, selected with `huggingface`, `hugging-face`, `hugging_face`, or
+  `hf`.
+- Model IDs are sent to the router exactly as selected, including
+  org-prefixed Hugging Face model IDs.
+
 Planned scope:

 - Hub API auth and model discovery.
 - Model cards, licenses, tags, safetensors metadata, adapters, and dataset
  links surfaced in a terminal-friendly way.
- Inference Providers as explicit provider choices when the user configures
-  them.
 - Hugging Face Jobs as an optional remote execution path for user-approved
  experiments.

-Non-goal for now: claiming a native Hugging Face provider exists before it is
-implemented in code.
+Non-goal for now: treating the router route as Hub browsing/export, or
+inferring Hub upload/export auth from the inference-provider API key.

 ## Unsloth Workset

@@ -138,8 +146,9 @@ Planned scope:
 - Local files, prompts, transcripts, traces, model outputs, eval results,
  adapters, datasets, and checkpoints should remain local unless the user
  explicitly chooses a provider or export destination.
- Provider auth must remain explicit. `DEEPSEEK_*`, OpenRouter, Hugging Face,
-  and self-hosted credentials should not be inferred from unrelated config.
+- Provider auth must remain explicit. `DEEPSEEK_*`, OpenRouter,
+  `HUGGINGFACE_API_KEY` / `HF_TOKEN`, and self-hosted credentials should not be
+  inferred from unrelated config.
 - Exportable artifacts should include provenance: source model, provider,
  route, tool policy, eval inputs, and redaction status.
 - Public sharing, hosted telemetry, sponsorship badges, and external branding
@@ -44,6 +44,11 @@ Use any of these surfaces to select a provider:
 as legacy aliases for `deepseek`. They do not select a different official host;
 DeepSeek uses the same official API host worldwide.

+`huggingface`, `hugging-face`, `hugging_face`, and `hf` all select the
+Hugging Face Inference Providers route. This is the OpenAI-compatible router
+path for chat/inference, not Hub browsing, model-card inspection, uploads, or
+artifact export.
+
 Fresh shared config writes to `~/.codewhale/config.toml`. Existing
 `~/.deepseek/config.toml` files are still read for compatibility.
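
The alias paragraph added in the hunk above can be pictured as a small lookup. This is a minimal Python sketch, not CodeWhale's actual code: the function name `normalize_provider`, the alias set, and the case-insensitive, whitespace-trimming match are assumptions for illustration. Only the four alias strings and the canonical `huggingface` ID come from the docs.

```python
# Hypothetical sketch: the docs list these four aliases, but the lookup
# itself (name, casing rules, return value) is assumed, not CodeWhale's code.
HUGGINGFACE_ALIASES = frozenset(
    {"huggingface", "hugging-face", "hugging_face", "hf"}
)


def normalize_provider(name: str) -> str:
    """Map any documented Hugging Face alias to the canonical provider ID."""
    key = name.strip().lower()  # assumption: matching ignores case and padding
    if key in HUGGINGFACE_ALIASES:
        return "huggingface"
    return key  # other IDs are returned lowercased, otherwise untouched
```

Under these assumptions, `normalize_provider("hf")` and `normalize_provider("Hugging-Face")` both return `"huggingface"`, while an unrelated ID such as `"vllm"` passes through. The lookup only selects the inference route; it says nothing about Hub browsing or export.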

@@ -128,7 +133,7 @@ endpoint.
 | `sglang` | `[providers.sglang]` | Optional `SGLANG_API_KEY` | `SGLANG_BASE_URL`; default `http://localhost:30000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted OpenAI-compatible route. Localhost deployments commonly omit auth. `SGLANG_MODEL` is accepted. |
 | `vllm` | `[providers.vllm]` | Optional `VLLM_API_KEY` | `VLLM_BASE_URL`; default `http://localhost:8000/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Self-hosted vLLM OpenAI-compatible route. Localhost deployments commonly omit auth. `VLLM_MODEL` is accepted. |
 | `ollama` | `[providers.ollama]` | Optional `OLLAMA_API_KEY` | `OLLAMA_BASE_URL`; default `http://localhost:11434/v1` | `deepseek-coder:1.3b`; provider-hinted custom tags pass through | Self-hosted Ollama OpenAI-compatible route. Localhost deployments commonly omit auth. `OLLAMA_MODEL` is accepted. |
-| `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible route. Org-prefixed model IDs pass through. |
+| `huggingface` | `[providers.huggingface]` | `HUGGINGFACE_API_KEY`, `HF_TOKEN` | `HUGGINGFACE_BASE_URL`, `HF_BASE_URL`; default `https://router.huggingface.co/v1` | `deepseek-ai/DeepSeek-V4-Pro`, `deepseek-ai/DeepSeek-V4-Flash` | Hugging Face Inference Providers OpenAI-compatible router route. Accepted aliases: `huggingface`, `hugging-face`, `hugging_face`, `hf`. Org-prefixed model IDs pass through. `HUGGINGFACE_MODEL` and `HF_MODEL` are accepted. Hub browsing/export are separate future features. |

 ### Xiaomi MiMo Notes

@@ -223,6 +228,18 @@ the endpoint's ability to accept OpenAI-compatible `tools` payloads. A custom
 OpenAI-compatible or local endpoint can still reject tool calls even if
 CodeWhale can send the schema.

+### Hugging Face Inference Providers Notes
+
+The shipped Hugging Face route targets the OpenAI-compatible Inference Providers
+router at `https://router.huggingface.co/v1`. Configure auth with
+`HUGGINGFACE_API_KEY` first, or `HF_TOKEN` as a fallback. Configure the endpoint
+with `HUGGINGFACE_BASE_URL` first, or `HF_BASE_URL` as a fallback; configure the
+model with `HUGGINGFACE_MODEL` first, or `HF_MODEL` as a fallback.
+
+This route does not imply Hub browsing, model-card metadata, dataset access,
+Jobs, uploads, or export. Those remain explicit Model Lab work items so
+provider auth and artifact movement stay separate.
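
The primary-then-fallback order described in these notes could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the shipped resolver: `first_set` and `resolve_huggingface` are hypothetical names, empty values are assumed to count as unset, and the `[providers.huggingface]` config table is not modeled. The variable names and the default URL are taken from the notes.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://router.huggingface.co/v1"


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among `names`, in the order given."""
    for name in names:
        value = env.get(name, "").strip()  # assumption: blank means unset
        if value:
            return value
    return None


def resolve_huggingface(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical resolver: HUGGINGFACE_* first, HF_* as the fallback."""
    return {
        "api_key": first_set(env, "HUGGINGFACE_API_KEY", "HF_TOKEN"),
        "base_url": first_set(env, "HUGGINGFACE_BASE_URL", "HF_BASE_URL")
        or DEFAULT_BASE_URL,
        # The model ID is not rewritten, so org prefixes are preserved.
        "model": first_set(env, "HUGGINGFACE_MODEL", "HF_MODEL"),
    }
```

With only `HF_TOKEN` and `HF_MODEL` set, this sketch yields the fallback key, the default router URL, and the model ID unchanged. When both `HUGGINGFACE_API_KEY` and `HF_TOKEN` are set, the former wins. The resolved key is used for inference only; nothing here derives Hub upload or export auth from it.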
+
 ### When a Local Model Prints Tool JSON

 CodeWhale only executes tools when the provider returns Chat Completions