Files
codewhale/docs/MODEL_LAB.md
T

156 lines
5.8 KiB
Markdown

# Model Lab Roadmap
Model Lab is the planned open-model workbench for CodeWhale. The north star is
simple: CodeWhale should become the best terminal coding agent for open-source
and open-weight models across every provider that offers them. Model Lab is how
those models become discoverable, evaluable, routable, servable, and exportable
without weakening the current terminal-agent contract: local workspace control,
explicit provider auth, approval gates, and clear privacy boundaries.
This document is roadmap language. Some worksets below are roadmap-only.
## Implemented Today
- DeepSeek is the first-class default provider today, with `deepseek-v4-pro`,
`deepseek-v4-flash`, streaming thinking blocks, Fin routing, `DEEPSEEK_*`
environment variables, and `~/.deepseek` config compatibility.
- OpenRouter, Novita, Fireworks, NVIDIA NIM, AtlasCloud, Wanjie Ark, generic
OpenAI-compatible endpoints, SGLang, vLLM, and Ollama are supported provider
paths where their IDs appear in `/provider`, `codewhale --provider`, or
`codewhale models`.
- Hugging Face Inference Providers are available through the
OpenAI-compatible router at `https://router.huggingface.co/v1`. Select the
route with `huggingface`, `hugging-face`, `hugging_face`, or `hf`; configure
`HUGGINGFACE_API_KEY` or `HF_TOKEN` for auth.
- Model auto-routing chooses a concrete DeepSeek model and thinking level per
turn. It is not a TUI mode.
- Fin is the fast `deepseek-v4-flash` thinking-off path for routing,
summaries, cheap checks, RLM child calls, wakeup verification, and
binary-completion checks.
- Self-hosted OpenAI-compatible endpoints can be used through SGLang, vLLM,
Ollama, or the generic `openai` provider configuration.
## Still Planned
- Hugging Face Hub browsing, upload/export, model card, dataset, adapter,
safetensors, or Jobs workflows.
- Native Unsloth, NeMo, or Arcee integrations.
- A dedicated Model Lab UI tab.
- Built-in benchmark suites, eval leaderboards, hosted observability, or
training-infrastructure orchestration.
Until those land, use the provider paths above, MCP servers, or external
workflows explicitly configured by the user.
## Model Lab Principle
Model Lab should help users answer practical questions:
- Which model should handle this turn?
- Which open or open-weight model can I run locally or through a trusted
provider?
- Which provider offers this model with the latency, price, context window,
license, and privacy posture I need?
- What did this model cost, how did it perform, and what data left my machine?
- Can I reproduce, export, or self-host the route?
It should never hide provider boundaries, silently upload local artifacts, or
describe a model as available before CodeWhale can actually route to it.
## Hugging Face Workset
Implemented today:
- Hugging Face Inference Providers as an explicit OpenAI-compatible router
provider, selected with `huggingface`, `hugging-face`, `hugging_face`, or
`hf`.
- Model IDs are sent to the router exactly as selected, including
org-prefixed Hugging Face model IDs.
Planned scope:
- Hub API auth and model discovery.
- Model cards, licenses, tags, safetensors metadata, adapters, and dataset
links surfaced in a terminal-friendly way.
- Hugging Face Jobs as an optional remote execution path for user-approved
experiments.
Non-goal for now: treating the router route as Hub browsing/export, or
inferring Hub upload/export auth from the inference-provider API key.
## Unsloth Workset
Planned scope:
- Fine-tuning recipes and adapter workflows for users who already own the data
and compute path.
- Export guidance that keeps dataset, adapter, and checkpoint locations explicit.
- Compatibility notes for models that can return to local serving or a hosted
OpenAI-compatible endpoint.
## NeMo Workset
Planned scope:
- Training and alignment workflow notes for users operating NVIDIA-centric
infrastructure.
- Clear boundaries between NVIDIA NIM inference support that exists today and
future NeMo training or customization workflows.
## Arcee Workset
Planned scope:
- Small-model routing and specialization experiments.
- Exportable routes that make it clear when a task is handled by a smaller
model, Fin, or full DeepSeek reasoning.
## Serving Workset
Planned scope:
- Better local and private serving ergonomics for SGLang, vLLM, Ollama, and
OpenAI-compatible gateways.
- Health checks, model listing, context-window metadata, and route validation.
- No silent network exposure: public endpoints must be configured explicitly.
## Eval Workset
Planned scope:
- Reproducible task suites for coding, review, docs, release checks, and
long-context workflows.
- Side-by-side route comparisons where the exact model, provider, thinking
level, prompt, and tool policy are captured.
## Observability Workset
Planned scope:
- Local-first traces for turn routing, tool calls, approvals, cost, cache
behavior, and context pressure.
- Export rules that redact secrets and require explicit user action before data
leaves the machine.
## Training Infra Workset
Planned scope:
- Recipes for dataset preparation, adapter training, artifact naming, and
promotion into serving.
- Separation between local/private artifacts and anything published to a hub or
registry.
## Privacy And Export Rules
- Local files, prompts, transcripts, traces, model outputs, eval results,
adapters, datasets, and checkpoints should remain local unless the user
explicitly chooses a provider or export destination.
- Provider auth must remain explicit. `DEEPSEEK_*`, OpenRouter,
`HUGGINGFACE_API_KEY` / `HF_TOKEN`, and self-hosted credentials should not be
inferred from unrelated config.
- Exportable artifacts should include provenance: source model, provider,
route, tool policy, eval inputs, and redaction status.
- Public sharing, hosted telemetry, sponsorship badges, and external branding
require maintainer approval.