docs(config): clarify provider path suffix support

Records that #2506/#2508 are superseded by the safer path_suffix implementation in #2558, credits the original #1874 report and the follow-up PR review trail, and documents that suffix overrides affect only chat completions while model-listing and beta paths keep their built-in routing.
2026-06-03 23:56:40 -07:00
parent f5e6d46848
commit 13cabac077
5 changed files with 43 additions and 12 deletions
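As a reading aid for the rule this commit documents, here is a minimal Rust sketch of a provider-local `path_suffix` that applies only to chat-completion requests, while `models` and beta requests keep their built-in paths. `Route`, `ProviderConfig`, `builtin_path`, `request_url`, and `strip_version_segment` are hypothetical names, not codewhale's API, and the built-in paths shown are assumed. Stripping a trailing `/vN` segment before appending the suffix is also an assumption, inferred from the config reference's "unversioned base URL" wording. The shipped #2558 code may resolve the base URL differently.

```rust
// Hypothetical sketch: names and built-in paths are illustrative, not codewhale's API.

/// The API routes this sketch distinguishes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    ChatCompletions,
    Models,
    BetaCompletions,
}

/// Hypothetical provider-table settings.
struct ProviderConfig<'a> {
    base_url: &'a str,
    path_suffix: Option<&'a str>,
}

/// Assumed built-in path for each route, appended to `base_url` as-is.
fn builtin_path(route: Route) -> &'static str {
    match route {
        Route::ChatCompletions => "/chat/completions",
        Route::Models => "/models",
        Route::BetaCompletions => "/beta/completions",
    }
}

/// Assumption: drop one trailing `/vN` segment (e.g. `/v1`) so the override
/// is joined to the unversioned base URL.
fn strip_version_segment(base: &str) -> &str {
    let base = base.trim_end_matches('/');
    if let Some(idx) = base.rfind('/') {
        let last = &base[idx + 1..];
        let is_version = last.len() > 1
            && last.starts_with('v')
            && last[1..].chars().all(|c| c.is_ascii_digit());
        if is_version {
            return &base[..idx];
        }
    }
    base
}

/// Only chat completions consult `path_suffix`; every other route keeps its
/// built-in path under the configured `base_url`.
fn request_url(cfg: &ProviderConfig<'_>, route: Route) -> String {
    match (route, cfg.path_suffix) {
        (Route::ChatCompletions, Some(suffix)) => format!(
            "{}/{}",
            strip_version_segment(cfg.base_url),
            suffix.trim_start_matches('/')
        ),
        _ => format!("{}{}", cfg.base_url.trim_end_matches('/'), builtin_path(route)),
    }
}

fn main() {
    // Mirrors the `[providers.openai]` example from the configuration docs.
    let gateway = ProviderConfig {
        base_url: "https://your-gateway.example/v1",
        path_suffix: Some("/chat/completions"),
    };
    for route in [Route::ChatCompletions, Route::Models, Route::BetaCompletions] {
        println!("{:?} -> {}", route, request_url(&gateway, route));
    }
}
```

With the example gateway config, chat requests resolve to `https://your-gateway.example/chat/completions`, while model listing still resolves to `https://your-gateway.example/v1/models`. Matching on the route before consulting `path_suffix` is what keeps the override from rewriting `/models` or `/beta/completions`.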
@@ -103,6 +103,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  generated-context targets, trust markers, and trust config paths, and it
  stores post-load signatures so auto-generated context deletion/regeneration
  stays correct (#2636).
+- Configuration docs now show the provider-local `path_suffix` escape hatch
+  for OpenAI-compatible gateways that accept `/chat/completions` but reject
+  `/v1/chat/completions`, while making clear that model listing and DeepSeek
+  beta routes keep their built-in paths (#1874).

 ### Community

@@ -122,11 +126,13 @@ dense tool-call transcript collapse/sidebar detail direction (#2738, #2734,
 **@h3c-hexin** for the tool-agent model inheritance and configured
 `skills_dir` fixes (#2736, #2737). Thanks also to **@qiyuanlicn** for the
 checkpoint/resume report that shaped the sub-agent recovery slice (#2029),
-to **@bevis-wong** for the long-running shell/task liveness report (#1786),
-and to **@NASLXTO** and
-**@wuxixing** for the large-workspace startup reports (#697, #1827), and to
-**@linzhiqin2003** and **@merchloubna70-dot** for earlier context-cap and
-startup-diagnosis work that shaped this bounded fallback.
+**@bevis-wong** for the long-running shell/task liveness report (#1786),
+**@shuxiangxuebiancheng** for the third-party OpenAI-compatible path report
+(#1874), **@hongqitai** and **@cyq1017** for the follow-up path-suffix PR
+review trail (#2508, #2506), **@NASLXTO** and **@wuxixing** for the
+large-workspace startup reports (#697, #1827), and **@linzhiqin2003** and
+**@merchloubna70-dot** for earlier context-cap and startup-diagnosis work that
+shaped this bounded fallback.

 ## [0.8.53] - 2026-06-03

@@ -638,6 +638,11 @@ Current v0.9 track credits:
 - **[NASLXTO](https://github.com/NASLXTO)** and
  **[wuxixing](https://github.com/wuxixing)** — large-workspace startup
  reports that shaped the bounded project-context fallback (#697, #1827)
+- **[shuxiangxuebiancheng](https://github.com/shuxiangxuebiancheng)**,
+  **[hongqitai](https://github.com/hongqitai)**, and
+  **[cyq1017](https://github.com/cyq1017)** — third-party
+  OpenAI-compatible path-suffix report and follow-up review trail (#1874,
+  #2508, #2506)

 Current and recurring contributors include:

@@ -103,6 +103,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  generated-context targets, trust markers, and trust config paths, and it
  stores post-load signatures so auto-generated context deletion/regeneration
  stays correct (#2636).
+- Configuration docs now show the provider-local `path_suffix` escape hatch
+  for OpenAI-compatible gateways that accept `/chat/completions` but reject
+  `/v1/chat/completions`, while making clear that model listing and DeepSeek
+  beta routes keep their built-in paths (#1874).

 ### Community

@@ -122,11 +126,13 @@ dense tool-call transcript collapse/sidebar detail direction (#2738, #2734,
 **@h3c-hexin** for the tool-agent model inheritance and configured
 `skills_dir` fixes (#2736, #2737). Thanks also to **@qiyuanlicn** for the
 checkpoint/resume report that shaped the sub-agent recovery slice (#2029),
-to **@bevis-wong** for the long-running shell/task liveness report (#1786),
-and to **@NASLXTO** and
-**@wuxixing** for the large-workspace startup reports (#697, #1827), and to
-**@linzhiqin2003** and **@merchloubna70-dot** for earlier context-cap and
-startup-diagnosis work that shaped this bounded fallback.
+**@bevis-wong** for the long-running shell/task liveness report (#1786),
+**@shuxiangxuebiancheng** for the third-party OpenAI-compatible path report
+(#1874), **@hongqitai** and **@cyq1017** for the follow-up path-suffix PR
+review trail (#2508, #2506), **@NASLXTO** and **@wuxixing** for the
+large-workspace startup reports (#697, #1827), and **@linzhiqin2003** and
+**@merchloubna70-dot** for earlier context-cap and startup-diagnosis work that
+shaped this bounded fallback.

 ## [0.8.53] - 2026-06-03

@@ -209,6 +209,19 @@ legacy top-level `base_url`, so the OpenAI-compatible provider receives it.
 provider tables in one config, `[providers.openai].model` can be used as the
 OpenAI-provider-specific override.

+If the gateway accepts `POST /chat/completions` but rejects
+`/v1/chat/completions`, set a provider-local `path_suffix`:
+
+```toml
+[providers.openai]
+base_url = "https://your-gateway.example/v1"
+path_suffix = "/chat/completions"
+```
+
+The suffix applies only to chat-completion requests. Model listing and
+DeepSeek beta paths keep their built-in routing so a generic gateway override
+does not accidentally rewrite `/models` or `/beta/completions`.
+
 Local HTTP endpoints such as Ollama, SGLang, and vLLM are allowed by default
 when they use localhost or loopback addresses. For a non-local `http://`
 gateway, launch with `DEEPSEEK_ALLOW_INSECURE_HTTP=1` only on a trusted network:
@@ -744,6 +757,7 @@ If you are upgrading from older releases:
 - `provider` (string, optional): `deepseek` (default), `nvidia-nim`, `openai`, `atlascloud`, `wanjie-ark`, `volcengine`, `openrouter`, `xiaomi-mimo`, `novita`, `fireworks`, `siliconflow`, `siliconflow-CN`, `arcee`, `moonshot`, `sglang`, `vllm`, or `ollama`. Legacy `deepseek-cn` configs are still accepted as an alias for `deepseek`; DeepSeek uses the same official host [`https://api.deepseek.com`](https://api-docs.deepseek.com/) worldwide. `nvidia-nim` targets NVIDIA's NIM-hosted DeepSeek endpoints through `https://integrate.api.nvidia.com/v1`; `openai` targets a generic OpenAI-compatible endpoint, defaulting to `https://api.openai.com/v1`; `atlascloud` targets AtlasCloud's OpenAI-compatible endpoint at `https://api.atlascloud.ai/v1`; `wanjie-ark` targets Wanjie Ark's OpenAI-compatible endpoint at `https://maas-openapi.wanjiedata.com/api/v1`; `volcengine` targets Volcengine Ark's OpenAI-compatible coding endpoint at `https://ark.cn-beijing.volces.com/api/coding/v3`; `openrouter` targets `https://openrouter.ai/api/v1`; `xiaomi-mimo` targets Xiaomi MiMo's OpenAI-compatible endpoint at `https://api.xiaomimimo.com/v1`; `novita` targets `https://api.novita.ai/v1`; `fireworks` targets `https://api.fireworks.ai/inference/v1`; `siliconflow` targets SiliconFlow, defaulting to `https://api.siliconflow.com/v1`; `siliconflow-CN` targets the SiliconFlow China regional endpoint while sharing `[providers.siliconflow]`; `arcee` targets Arcee AI's OpenAI-compatible endpoint at `https://api.arcee.ai/api/v1`; `moonshot` targets Moonshot/Kimi, defaulting to `https://api.moonshot.ai/v1`; `sglang` targets a self-hosted OpenAI-compatible endpoint, defaulting to `http://localhost:30000/v1`; `vllm` targets a self-hosted vLLM OpenAI-compatible endpoint, defaulting to `http://localhost:8000/v1`; `ollama` targets Ollama's OpenAI-compatible endpoint, defaulting to `http://localhost:11434/v1`.
 - `api_key` (string, required for hosted providers): must be non-empty for DeepSeek/hosted providers (or set the provider API key env var). Self-hosted SGLang, vLLM, and Ollama can omit it.
 - `base_url` (string, optional): defaults to `https://api.deepseek.com/beta` for DeepSeek's OpenAI-compatible Chat Completions API, including legacy `provider = "deepseek-cn"` configs. Other defaults are `https://integrate.api.nvidia.com/v1` for `nvidia-nim`, `https://api.openai.com/v1` for `openai`, `https://api.atlascloud.ai/v1` for `atlascloud`, `https://maas-openapi.wanjiedata.com/api/v1` for `wanjie-ark`, `https://ark.cn-beijing.volces.com/api/coding/v3` for `volcengine`, `https://openrouter.ai/api/v1` for `openrouter`, `https://api.xiaomimimo.com/v1` for `xiaomi-mimo`, `https://api.novita.ai/v1` for `novita`, `https://api.fireworks.ai/inference/v1` for `fireworks`, `https://api.siliconflow.com/v1` for `siliconflow`, `https://api.siliconflow.cn/v1` for `siliconflow-CN`, `https://api.arcee.ai/api/v1` for `arcee`, `https://api.moonshot.ai/v1` for `moonshot`, `http://localhost:30000/v1` for `sglang`, `http://localhost:8000/v1` for `vllm`, and `http://localhost:11434/v1` for `ollama`. Set `https://api.deepseek.com` or `https://api.deepseek.com/v1` explicitly to opt out of DeepSeek beta features.
+- `path_suffix` (string, optional provider-table key): overrides the chat-completions path for OpenAI-compatible gateways that do not serve `/v1/chat/completions`. For example, setting `path_suffix = "/chat/completions"` under `[providers.openai]` sends chat requests to the unversioned base URL plus `/chat/completions`; `models` and `beta/*` requests keep their built-in routing.
 - `default_text_model` (string, optional): defaults to `deepseek-v4-pro` for DeepSeek and generic OpenAI-compatible endpoints, `deepseek-ai/deepseek-v4-pro` for NVIDIA NIM, `deepseek-ai/deepseek-v4-flash` for AtlasCloud, `deepseek-reasoner` for Wanjie Ark, `DeepSeek-V4-Pro` for Volcengine Ark, `deepseek/deepseek-v4-pro` for OpenRouter and Novita, `mimo-v2.5-pro` for Xiaomi MiMo, `accounts/fireworks/models/deepseek-v4-pro` for Fireworks, `deepseek-ai/DeepSeek-V4-Pro` for SiliconFlow, `trinity-large-thinking` for Arcee AI, `kimi-k2.6` for Moonshot, `deepseek-ai/DeepSeek-V4-Pro` for SGLang/vLLM, and `deepseek-coder:1.3b` for Ollama. Current public DeepSeek IDs are `deepseek-v4-pro` and `deepseek-v4-flash`, both with 1M context windows, 384K max output, and thinking mode enabled by default. Legacy `deepseek-chat` and `deepseek-reasoner` remain compatibility aliases for `deepseek-v4-flash` until July 24, 2026, except SiliconFlow maps `deepseek-reasoner` and `deepseek-r1` to its Pro model while `deepseek-chat` and `deepseek-v3` map to Flash. Provider-specific mappings translate `deepseek-v4-pro` / `deepseek-v4-flash` to each provider's model ID where supported. OpenRouter also recognizes recent large IDs such as `arcee-ai/trinity-large-thinking`, `minimax/minimax-m3`, `xiaomi/mimo-v2.5-pro`, `qwen/qwen3.6-flash`, `qwen/qwen3.6-35b-a3b`, `qwen/qwen3.6-max-preview`, `qwen/qwen3.6-27b`, `qwen/qwen3.6-plus`, `google/gemma-4-31b-it`, and `moonshotai/kimi-k2.6`; direct Arcee uses bare IDs such as `trinity-large-thinking` and `trinity-large-preview`; direct Xiaomi MiMo recognizes chat IDs `mimo-v2.5-pro` and `mimo-v2.5`, while TTS IDs are selected through `codewhale speech` / `tts`. Generic `openai`, `atlascloud`, `wanjie-ark`, `xiaomi-mimo`, `arcee`, and Ollama model IDs are passed through unchanged after known aliases are normalized. OpenRouter and SiliconFlow provider configs with a custom `base_url` also preserve explicit model values, which lets OpenAI-compatible gateways accept bare model IDs. Use `/models` or `codewhale models` to discover live IDs from your configured endpoint. `CODEWHALE_MODEL` overrides this for a single process; `DEEPSEEK_MODEL` is the legacy alias.
 - `reasoning_effort` (string, optional): `off`, `low`, `medium`, `high`, or `max`; defaults to the configured UI tier. DeepSeek Platform receives top-level `thinking` / `reasoning_effort` fields. NVIDIA NIM receives equivalent settings through `chat_template_kwargs`.
 - `allow_shell` (bool, optional): defaults to `false`; shell tools must be explicitly enabled.
@@ -107,9 +107,9 @@ v0.9 branch so the remaining Windows/manual checks are explicit.
 | #2501 in-process LLM response cache | Conflicting | Defer; cache key risks noted in prior review. |
 | #2502 web_run RwLock split | Mergeable | Manually harvested with panic-safety and shared cached-page reads; close/comment after branch is public. |
 | #2505 subagent cap accounting | Draft/conflicting | Compare with current subagent cap tests before harvest. |
-| #2506 provider path suffix overrides | Draft/conflicting | Partly superseded by current provider path-suffix support; verify. |
+| #2506 provider path suffix overrides | Draft/conflicting / superseded | The current branch already contains provider-table `path_suffix` support from #2558 with the safer constrained behavior: only `chat/completions` uses the override, while `models` and DeepSeek `beta/*` keep their built-in routing. `cargo test -p codewhale-tui --bin codewhale-tui --locked api_url_with_suffix -- --nocapture` passed. Credit @cyq1017 for the earlier design/review trail; comment/close after branch is public, keeping #1874 tied to the shipped #2558 implementation/docs. |
 | #2507 stream chunk timeout config | Draft/conflicting | Defer unless stabilization needs it. |
-| #2508 configurable path suffix | Conflicting | Likely superseded by #2506/current code; verify linked issue #2089. |
+| #2508 configurable path suffix | Conflicting / superseded | #2089 is already closed. The current implementation covers #1874's third-party gateway need without the broader env/CLI surface from #2508. Docs now show `[providers.openai].path_suffix = "/chat/completions"` and state that model/beta paths are not rewritten. Credit @hongqitai for the follow-up PR and @shuxiangxuebiancheng for the original #1874 report; close/comment after branch is public. |
 | #2509 parallel read-only web search | Closed / already merged via #2504 | Already present in `origin/main` as `a09af2024`; closed as harvested/superseded on 2026-06-04. |
 | #2510 custom DuckDuckGo endpoint | Draft/mergeable | Low priority; defer unless docs/search lane takes it. |
 | #2511 ToolCallBefore hooks | Conflicting | Defer to hook lifecycle lane. |