The token estimator walks the full session.messages and the active system
prompt. Five call sites per turn in the engine (capacity pre/post tool
checkpoints, error escalation, the seam manager, the trim budget check)
plus four TUI/command consumers (footer, /status, /debug, context
inspector) all re-walked the same data independently. On a 200-message
history with 5 KB of tool results that is roughly 2 ms per call, or
~20 ms of pure waste on a single turn.
Introduce a process-local TokenEstimateCache keyed on
(session.messages_revision, system_prompt_fingerprint). Repeated calls
with the same inputs return the cached value without re-walking the
message list. The cache invalidates as soon as either input changes:
* session.messages_revision is a monotonic counter bumped in
Session::add_message, Session::replace_messages, the new
Session::bump_messages_revision helper, and at every direct
session.messages mutation site in core/engine.rs and
core/engine/capacity_flow.rs.
* system_prompt_fingerprint is a stable 64-bit hash of the
SystemPrompt::Text or SystemPrompt::Blocks payload.
Also restructures layered_context_checkpoint to compute the estimated
token count before taking a long-lived &SeamManager borrow, and
re-routes the capacity pre/post tool checkpoints to compute the
observation into a local before calling
capacity_controller.observe_*. Both refactors are required to satisfy
the borrow checker once estimated_input_tokens requires &mut self.
Tests: 10 new unit tests cover the miss/hit path, revision bumps,
system-prompt changes, audit-ring capacity, and downward-revision
no-ops. The full 157-test engine suite still passes.
The `effective_max_output_tokens` heuristic defaults to 64K for any model
not in the known-context-window table. This is fine for DeepSeek's hosted
API (1M context) but causes immediate HTTP 400s on self-hosted providers
with tight `max-model-len`.
Example: vLLM serving Qwen3.6 with `--max-model-len 65536` rejects
requests because 64000 (output) + ~1500 (input) exceeds the limit by 1
token.
This change lets the operator set `DEEPSEEK_MAX_OUTPUT_TOKENS=16384` (or
whatever fits their deployment) to override the heuristic. The env var
takes precedence over the model-table lookup when set to a positive
integer; otherwise the existing behavior is preserved.
No new config struct field — env-only override keeps the public API
unchanged. Useful for embedded users (e.g. pinvou3) who need to control
output budget without forking the engine config schema.
Co-authored-by: hexin <he.xin@h3c.com>
Bundles the v0.8.8 stabilization fixes that were already implemented in the
working tree, plus the workflow/doc reconciliation called out in #507.
### Sub-agent runtime fixes
- **#509** Default sub-agent cap raised to 10 (configurable via
`[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The
running-count calculation now ignores non-running, no-handle, and finished
handles so completed agents stop counting against the cap.
- **#510** `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths
that previously held a `Mutex` for inspection now take a read lock,
eliminating the multi-agent fan-out UI freeze.
- **#511** `compact_tool_result_for_context` summarizes `agent_result` /
`agent_wait` payloads before they are folded into the parent context.
- **#512** RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not
`swarm`. Stale "swarm" wording cleaned in docs/comments/tests.
- **#513** (foreground stopgap only) Foreground RLM work is visible in the
Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the
issue stays open with a refined scope.
### TUI / UX fixes
- **#487** Offline composer queue is now session-scoped; legacy unscoped
queues fail closed.
- **#488** Composer Option+Backspace deletes by word; cross-platform key
routing helpers added.
- **#443/#444** Keyboard enhancement flags pop on normal AND panic exit; the
raw-mode startup probe is now bounded by a configurable timeout.
- **#449** Production footer reads statusline colors from `app.ui_theme`
rather than the bespoke palette.
- **#506** `display_path_with_home` no longer mutates `HOME` in tests; the
flake on shared-env CI is gone.
### Self-update / packaging
- **#503** `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`)
instead of the raw Rust constants. The platform-asset selector also rejects
`.sha256` siblings as primary binaries. Tests now live alongside the source
in `mod tests` (the `#[path]`-based integration test was removed because it
duplicated test runs and forced a `pub(crate)` helper that no real caller
used).
- **`Max 5 in flight` wording updated** in `agent_spawn` description,
`prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real
default cap (10) and the configuration knob name.
### CI / release docs (#507)
- Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`,
`publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR.
- `release.yml` build job now allows `parity` to be skipped (it only runs on
tag push), unblocking `workflow_dispatch` reruns. The job still fails
closed on a real parity failure.
- `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the
manual `scripts/release/publish-crates.sh` flow (no automated workflow);
references to the deleted workflows removed.
- `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag →
release.yml chain (without it, the tag is created but `release.yml` does
not fire) and documents the `workflow_dispatch` parity-skip behavior.
### Docs
- `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and
Codex CLI, gap analysis, and recommended implementation order.
### Verification (this branch)
- `cargo fmt --all -- --check` ✓
- `cargo check --workspace --all-targets --locked` ✓
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓
- `cargo test --workspace --all-features --locked` ✓ (1809 + supporting)
- Parity gates ✓ (snapshot, parity_protocol, parity_state)
- `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓
- Lockfile drift guard ✓
- `deepseek doctor --json` clean
- `deepseek eval` (offline harness) success=true, 0 tool errors
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>