Commit Graph

13 Commits

Author SHA1 Message Date
HUQIANTAO 3de07a99ed perf(engine): memoize estimated_input_tokens via content-keyed cache
The token estimator walks the full session.messages and the active system
prompt. Five call sites per turn in the engine (capacity pre/post tool
checkpoints, error escalation, the seam manager, the trim budget check)
plus four TUI/command consumers (footer, /status, /debug, context
inspector) all re-walked the same data independently. On a 200-message
history with 5 KB of tool results that is roughly 2 ms per call, or
~20 ms of pure waste on a single turn.

Introduce a process-local TokenEstimateCache keyed on
(session.messages_revision, system_prompt_fingerprint). Repeated calls
with the same inputs return the cached value without re-walking the
message list. The cache invalidates as soon as either input changes:
  * session.messages_revision is a monotonic counter bumped in
    Session::add_message, Session::replace_messages, the new
    Session::bump_messages_revision helper, and at every direct
    session.messages mutation site in core/engine.rs and
    core/engine/capacity_flow.rs.
  * system_prompt_fingerprint is a stable 64-bit hash of the
    SystemPrompt::Text or SystemPrompt::Blocks payload.

Also restructures layered_context_checkpoint to compute the estimated
token count before taking a long-lived &SeamManager borrow, and
re-routes the capacity pre/post tool checkpoints to compute the
observation into a local before calling
capacity_controller.observe_*. Both refactors are required to satisfy
the borrow checker once estimated_input_tokens requires &mut self.

Tests: 10 new unit tests cover the miss/hit path, revision bumps,
system-prompt changes, audit-ring capacity, and downward-revision
no-ops. The full 157-test engine suite still passes.
2026-06-03 21:01:37 -07:00
Hunter Bown 06612495fc chore(release): prep v0.8.51 — Arcee provider, cycle removal, UI fixes
Release-preparation checkpoint for v0.8.51 (workspace + npm bumped to 0.8.51).

Added:
- Arcee AI direct provider: [providers.arcee], ARCEE_API_KEY/BASE_URL/MODEL,
  CLI auth, provider + model picker, registry. Default direct-API model is
  trinity-large-thinking (reasoning, 262K ctx/out); preview + mini selectable.
  Cloudflare-WAF-safe opening turn (benign read-only tool surface, system-prompt
  payload splitting) and reasoning_content replay on tool-call turns.
- Expanded model catalog (qwen3.6 flash/plus/max-preview, Xiaomi MiMo v2.5
  chat/ASR/TTS); provider-aware model picker with per-provider saved models.

Changed:
- Auto-compaction is percentage- and model-aware
  (compaction_threshold_for_model_at_percent; default 80%; auto-enable for
  <=256K windows, opt-in for 1M models).
- Provider/gateway HTTP errors sanitized (HTML/WAF interstitials collapsed,
  401/403 split into authentication vs authorization).

Removed:
- The session cycle / checkpoint-restart system: /cycles, /cycle, /recall,
  recall_archive tool, cycle_manager, cycle-handoff prompt, sidebar cycle lines,
  EngineConfig.cycle / Event::CycleAdvanced / seam cycle thresholds.

Fixed:
- Orphaned assistant 'blue dot' role glyph on whitespace-only turns.
- Sidebar mouse-wheel scroll leaking into the transcript.
- Sidebar hover tooltip overlap + warning-orange styling.
- README Constitution description corrected to match prompts/base.md.
- Repaired release-blocking unit/integration tests after the refactors.

Preflight: cargo fmt clean, workspace builds, 3903 tui tests pass (1 known
flaky MCP SSE test under parallel load, passes in isolation).
2026-06-02 17:36:18 -07:00
Hunter Bown 57c10c78d6 fix(tui): compact tool-call UI and context 2026-06-01 16:55:10 -07:00
hexin 4f3a0c3cfc feat(engine): allow DEEPSEEK_MAX_OUTPUT_TOKENS env override for tight-context providers (#2147)
The `effective_max_output_tokens` heuristic defaults to 64K for any model
not in the known-context-window table. This is fine for DeepSeek's hosted
API (1M context) but causes immediate HTTP 400s on self-hosted providers
with tight `max-model-len`.

Example: vLLM serving Qwen3.6 with `--max-model-len 65536` rejects
requests because 64000 (output) + ~1500 (input) exceeds the limit by 1
token.

This change lets the operator set `DEEPSEEK_MAX_OUTPUT_TOKENS=16384` (or
whatever fits their deployment) to override the heuristic. The env var
takes precedence over the model-table lookup when set to a positive
integer; otherwise the existing behavior is preserved.

No new config struct field — env-only override keeps the public API
unchanged. Useful for embedded users (e.g. pinvou3) who need to control
output budget without forking the engine config schema.

Co-authored-by: hexin <he.xin@h3c.com>
2026-05-26 10:31:26 -05:00
hexin 9c8e482607 fix(engine): keep auto-compaction working on sub-500K self-hosted windows
Harvested from PR #2060 by @h3c-hexin.

Co-authored-by: hexin <he.xin@h3c.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:25:58 -05:00
Hunter Bown faf5e07adb feat(tui): add Fin tool-agent and screenshot OCR 2026-05-21 10:47:14 +08:00
Hunter Bown 99c6b22e83 chore(release): v0.8.33 — sub-agent and RLM renovation with persistent sessions
- Persistent RLM sessions (rlm_open/rlm_eval/rlm_close) with bounded REPL helpers
- Fork-aware sub-agent sessions (agent_open/agent_eval/agent_close) with handle_read
- Shared handle_read storage with slice/range/count/JSONPath projections
- Slash-command routing: /rlm, /agent, /relay (/接力) for handoff prompts
- Sidebar renamed to "Work" tab, consistent across Plan/Agent/YOLO modes
- Tool papercuts: file_search excludes, grep_files strings, fetch_url JSON,
  edit_file fuzz, exec_shell merged stdout/stderr, revert_turn no-op reject
- CLI reasoning-effort honoured on non-auto exec routes (#1511 @h3c-hexin)
- Edit-file replacement boundaries clarified (#1516)
- Pandoc output validated before probing (#1523)
- Running turns steerable/repaintable (#1533, #1537)
- Tasks/Activity Detail calmer under load
- npm retry timeout hint (#1538 @reidliu41)
- Issue templates improved (#1525 @reidliu41)
- Shell: kill process group to prevent UI freeze (#828 @CrepuscularIRIS)
- TUI: ignore leaked SGR mouse reports in composer (#1421 @reidliu41)
- Footer: keep chips within available width (#1417 @Wenjunyun123)
- Session picker: scope Ctrl+R to current workspace (#1395 @LinQ)
- Removed stale competitive-analysis doc
- Prompts/docs teach only new tool names
2026-05-12 19:54:08 -05:00
THINKER_ONLY 250953ad35 feat(tools/agent_spawn): teach parent that subagent results are self-reports 2026-05-10 08:15:19 -05:00
junskyeed d4185ede5a fix(engine): cap API request max_tokens
fix(engine): cap API request max_tokens without affecting internal context budget
2026-05-06 09:59:22 -05:00
Hunter Bown e98efcf31d fix(engine): drop dead working set prompt marker 2026-05-04 22:08:07 -05:00
Hunter Bown b48b68f078 perf(engine): stabilize system prompt and move working set metadata 2026-05-04 22:06:55 -05:00
Hunter Bown ad8064b143 chore(v0.8.8): stabilization batch — sub-agent caps, mutex contention, RLM polish, CI cleanup
Bundles the v0.8.8 stabilization fixes that were already implemented in the
working tree, plus the workflow/doc reconciliation called out in #507.

### Sub-agent runtime fixes
- **#509** Default sub-agent cap raised to 10 (configurable via
  `[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The
  running-count calculation now ignores non-running, no-handle, and finished
  handles so completed agents stop counting against the cap.
- **#510** `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths
  that previously held a `Mutex` for inspection now take a read lock,
  eliminating the multi-agent fan-out UI freeze.
- **#511** `compact_tool_result_for_context` summarizes `agent_result` /
  `agent_wait` payloads before they are folded into the parent context.
- **#512** RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not
  `swarm`. Stale "swarm" wording cleaned in docs/comments/tests.
- **#513** (foreground stopgap only) Foreground RLM work is visible in the
  Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the
  issue stays open with a refined scope.

### TUI / UX fixes
- **#487** Offline composer queue is now session-scoped; legacy unscoped
  queues fail closed.
- **#488** Composer Option+Backspace deletes by word; cross-platform key
  routing helpers added.
- **#443/#444** Keyboard enhancement flags pop on normal AND panic exit; the
  raw-mode startup probe is now bounded by a configurable timeout.
- **#449** Production footer reads statusline colors from `app.ui_theme`
  rather than the bespoke palette.
- **#506** `display_path_with_home` no longer mutates `HOME` in tests; the
  flake on shared-env CI is gone.

### Self-update / packaging
- **#503** `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`)
  instead of the raw Rust constants. The platform-asset selector also rejects
  `.sha256` siblings as primary binaries. Tests now live alongside the source
  in `mod tests` (the `#[path]`-based integration test was removed because it
  duplicated test runs and forced a `pub(crate)` helper that no real caller
  used).
- **`Max 5 in flight` wording updated** in `agent_spawn` description,
  `prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real
  default cap (10) and the configuration knob name.

### CI / release docs (#507)
- Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`,
  `publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR.
- `release.yml` build job now allows `parity` to be skipped (it only runs on
  tag push), unblocking `workflow_dispatch` reruns. The job still fails
  closed on a real parity failure.
- `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the
  manual `scripts/release/publish-crates.sh` flow (no automated workflow);
  references to the deleted workflows removed.
- `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag →
  release.yml chain (without it, the tag is created but `release.yml` does
  not fire) and documents the `workflow_dispatch` parity-skip behavior.

### Docs
- `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and
  Codex CLI, gap analysis, and recommended implementation order.

### Verification (this branch)
- `cargo fmt --all -- --check` ✓
- `cargo check --workspace --all-targets --locked` ✓
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓
- `cargo test --workspace --all-features --locked` ✓ (1809 + supporting)
- Parity gates ✓ (snapshot, parity_protocol, parity_state)
- `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓
- Lockfile drift guard ✓
- `deepseek doctor --json` clean
- `deepseek eval` (offline harness) success=true, 0 tool errors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:57:37 -05:00
Hunter Bown 8dd5ed38d7 refactor(engine): extract context helpers 2026-05-01 07:09:30 -05:00