dgf1988/codewhale

Files

T

Hunter Bown ad8064b143 chore(v0.8.8): stabilization batch — sub-agent caps, mutex contention, RLM polish, CI cleanup

Bundles the v0.8.8 stabilization fixes that were already implemented in the
working tree, plus the workflow/doc reconciliation called out in #507.

### Sub-agent runtime fixes
- **#509** Default sub-agent cap raised to 10 (configurable via
  `[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The
  running-count calculation now ignores non-running, no-handle, and finished
  handles so completed agents stop counting against the cap.
- **#510** `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths
  that previously held a `Mutex` for inspection now take a read lock,
  eliminating the multi-agent fan-out UI freeze.
- **#511** `compact_tool_result_for_context` summarizes `agent_result` /
  `agent_wait` payloads before they are folded into the parent context.
- **#512** RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not
  `swarm`. Stale "swarm" wording cleaned in docs/comments/tests.
- **#513** (foreground stopgap only) Foreground RLM work is visible in the
  Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the
  issue stays open with a refined scope.

### TUI / UX fixes
- **#487** Offline composer queue is now session-scoped; legacy unscoped
  queues fail closed.
- **#488** Composer Option+Backspace deletes by word; cross-platform key
  routing helpers added.
- **#443/#444** Keyboard enhancement flags pop on normal AND panic exit; the
  raw-mode startup probe is now bounded by a configurable timeout.
- **#449** Production footer reads statusline colors from `app.ui_theme`
  rather than the bespoke palette.
- **#506** `display_path_with_home` no longer mutates `HOME` in tests; the
  flake on shared-env CI is gone.

### Self-update / packaging
- **#503** `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`)
  instead of the raw Rust constants. The platform-asset selector also rejects
  `.sha256` siblings as primary binaries. Tests now live alongside the source
  in `mod tests` (the `#[path]`-based integration test was removed because it
  duplicated test runs and forced a `pub(crate)` helper that no real caller
  used).
- **`Max 5 in flight` wording updated** in `agent_spawn` description,
  `prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real
  default cap (10) and the configuration knob name.

### CI / release docs (#507)
- Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`,
  `publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR.
- `release.yml` build job now allows `parity` to be skipped (it only runs on
  tag push), unblocking `workflow_dispatch` reruns. The job still fails
  closed on a real parity failure.
- `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the
  manual `scripts/release/publish-crates.sh` flow (no automated workflow);
  references to the deleted workflows removed.
- `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag →
  release.yml chain (without it, the tag is created but `release.yml` does
  not fire) and documents the `workflow_dispatch` parity-skip behavior.

### Docs
- `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and
  Codex CLI, gap analysis, and recommended implementation order.

### Verification (this branch)
- `cargo fmt --all -- --check` ✓
- `cargo check --workspace --all-targets --locked` ✓
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓
- `cargo test --workspace --all-features --locked` ✓ (1809 + supporting)
- Parity gates ✓ (snapshot, parity_protocol, parity_state)
- `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓
- Lockfile drift guard ✓
- `deepseek doctor --json` clean
- `deepseek eval` (offline harness) success=true, 0 tool errors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 01:57:37 -05:00

19 KiB

Raw Blame History

Competitive Analysis: DeepSeek TUI vs OpenCode vs Codex CLI

Analysis of capabilities across three AI coding agents: OpenCode (/Volumes/VIXinSSD/opencode), Codex CLI (/Volumes/VIXinSSD/codex-main), and DeepSeek TUI (/Volumes/VIXinSSD/deepseek-tui).

Tool Matrix

Capability	OpenCode	Codex CLI	DeepSeek TUI
File read	✅ Read	✅	✅ file
File write	✅ Write	✅	✅ file
File edit	✅ Edit (string replace)	✅ apply_patch (diff format)	✅ edit_file + apply_patch
File glob	✅ Glob	✅	✅ file_search
Code search	✅ Grep + CodeSearch (Exa)	✅	✅ grep_files + search
Shell exec	✅ Bash	✅ exec/shell	✅ shell
Web fetch	✅ WebFetch	✅	✅ fetch_url
Web search	✅ WebSearch	✅ WebSearchRequest	✅ web_search
Web browse	❌	❌	✅ web_run
LSP	✅ Lsp (experimental)	❌	❌
Task/todo tracking	✅ TodoWrite	✅	✅ todo_write
Subagent spawn	✅ Task	✅ Collab/SpawnCsv	✅ agent_spawn
Skill system	✅ Skill (multi-location discovery)	✅ core-skills	⚠️ Partial (.deepseek/skills/)
Plan mode	✅ plan-enter/exit	✅ Plan mode	✅ Plan mode
User question	✅ Question	✅ request_user_input	✅ user_input
Patch apply	✅ apply_patch (custom format)	✅ apply_patch (diff format)	✅ apply_patch
Data validation	❌	❌	✅ validate_data
Finance	❌	❌	✅ finance
Git ops	Via Bash tool	✅ git-utils	✅ git module
GitHub ops	Via Bash (gh)	✅	✅ github
Test running	❌	✅	✅ test_runner
Automation	❌	❌	✅ automation
Code review	❌	✅ GuardianApproval	✅ review
Recall/archive	❌	❌	✅ recall_archive
Diagnostics	❌	✅	✅ diagnostics
Revert turn	❌	❌	✅ revert_turn
Image generation	❌	✅ ImageGeneration	❌
Browser use	❌	✅ BrowserUse	❌ (web_run is headless)
Computer use	❌	✅ ComputerUse	❌
Realtime voice	❌	✅ RealtimeConversation	❌

High Priority Gaps

These are capabilities that would most directly improve DeepSeek TUI's effectiveness as a coding agent.

1. LSP Integration

What it is: A model-callable tool that queries Language Server Protocol servers for code intelligence — go-to-definition, find references, hover (type info), document symbols, workspace symbols, call hierarchy, and implementations.

Why it matters: The single biggest capability gap. Every codebase exploration currently costs shell rg calls and sequential file reads. With LSP, the agent can jump to definitions, find all callers of a function, and inspect types in a single tool call. Estimated 30–50% reduction in exploration turns for structured codebases.

OpenCode implementation: packages/opencode/src/tool/lsp.ts exposes nine operations with file/line/character parameters. The tool prompts are in tool/lsp.txt. LSP servers must be configured per file type.

Supported operations:
- goToDefinition
- findReferences  
- hover
- documentSymbol
- workspaceSymbol
- goToImplementation
- prepareCallHierarchy
- incomingCalls
- outgoingCalls

What DeepSeek TUI would need: A new lsp.rs tool in crates/tui/src/tools/, integration with tower-lsp or lsp-server crate, and per-language server configuration.

2. Granular Permission System

What it is: Allow/deny/ask rules keyed on tool name × file path pattern, with wildcard support, home-directory expansion, and cascading to pending requests.

Why it matters: The current all-or-nothing approval model creates friction. Users can't express "always allow reads in src/ but always ask for .env files." The ability to permanently approve a pattern reduces approval fatigue by 60–80% over a long session.

OpenCode implementation: packages/opencode/src/permission/index.ts implements:

Action: allow | deny | ask
Rule: { permission: string, pattern: string, action: Action }
Ruleset: ordered list of rules with last-match-wins semantics
Pattern expansion for ~/, $HOME/
Wildcard matching on both permission names and path patterns
Reply modes: once (approve this one call), always (approve pattern forever), reject (deny this one)
Automatic cascading: an "always" reply auto-resolves pending requests for the same session
Distinct error types: DeniedError (rule-based), RejectedError (user said no), CorrectedError (user said no with feedback)

Agent definitions inherit permission rulesets that can be user-overridden:

build: {
  permission: merge(defaults, { question: "allow", plan_enter: "allow" }, user),
}
plan: {
  permission: merge(defaults, { edit: { "*": "deny" } }, user),
}
explore: {
  permission: merge(defaults, { "*": "deny", grep: "allow", read: "allow", ... }, user),
}

What DeepSeek TUI would need: A permission rule engine with the same dimension (tool name × path pattern × action), persistence to disk, and hook integration so approval decisions can cascade.

3. Lifecycle Hooks

What it is: User-defined shell commands or plugin functions that fire on specific lifecycle events — before a tool executes, after it completes, when permission is requested, at session start, when the user submits a prompt, and at session stop.

Why it matters: Hooks are the escape hatch that lets users enforce invariants without polluting the system prompt. "Always run cargo fmt after writing a .rs file." "Warn me before any rm -rf." "Log every shell command to a file." They are composable, auditable, and don't consume context window tokens.

Codex CLI implementation: codex-rs/hooks/ defines six event types with typed request/response payloads:

Event	When it fires	Payload
`PreToolUse`	Before tool execution	tool name, input params, sandbox state
`PostToolUse`	After tool execution	tool name, input, success/failure, duration, output preview
`PermissionRequest`	When model requests permission	permission type, justification
`SessionStart`	New session begins	session ID, cwd, source (new/resume)
`UserPromptSubmit`	User sends a message	prompt text
`Stop`	Session ending	reason

Each hook handler supports:

matcher: optional regex to filter which tool calls trigger the hook
command: shell command to run
timeout_sec: maximum runtime
status_message: shown to the user while the hook runs
source_path + source: tracks where the hook was defined (project hooks.json, user config, plugin)
Hooks can return Success, FailedContinue, or FailedAbort (blocks the operation)

What DeepSeek TUI would need: Extend crates/hooks/ to support the full event surface, add matcher-based filtering, and provide a hooks.json discovery mechanism similar to Codex CLI's.

4. Persistent Memories

What it is: Automatic extraction of user preferences, project conventions, and past decisions from conversations, stored as retrievable memories that are injected into new sessions.

Why it matters: Across a long debugging session, the agent rediscovers the same facts: "this project uses Rust edition 2024," "tests run with cargo test --workspace," "the user prefers 4-space indentation." A memory system compounds value — each session builds on prior knowledge rather than starting from zero.

Codex CLI implementation: The MemoryTool feature (experimental, behind /experimental menu) enables:

Memory generation: the model creates structured memories from conversation content
Memory retrieval: relevant memories are injected into new conversation context
The Chronicle feature adds passive screen-context memories via a sidecar process
Memories are stored in SQLite and surfaced in the TUI via /memories command

What DeepSeek TUI would need: A memory extraction prompt, a vector or keyword-based retrieval system, and storage in the existing session/state infrastructure.

5. Skill Auto-Discovery

What it is: Automatic scanning of multiple locations for SKILL.md files that provide domain-specific instructions, scripts, and references. Skills are injected into the conversation on demand via a skill tool.

Why it matters: Skills are how the community packages expertise. A "Rust refactoring" skill, a "Docker deployment" skill, a "GitHub Actions" skill — each provides specialized instructions without bloating the main system prompt. OpenCode's multi-location discovery means skills can be project-local, user-global, or pulled from URLs.

OpenCode implementation: packages/opencode/src/skill/index.ts scans:

~/.claude/skills/**/SKILL.md (Claude Code compatibility)
~/.agents/skills/**/SKILL.md (Agents SDK compatibility)
Parent directories from cwd to workspace root for .claude/skills/ and .agents/skills/
Project config directories for {skill,skills}/**/SKILL.md
User-configured paths (with ~/ expansion)
User-configured URLs (pulled via discovery module)

Skills are parsed for YAML frontmatter (name, description) and Markdown content. Duplicate names warn but don't error. Skills respect agent permissions — an agent can only load skills its permission ruleset allows.

What DeepSeek TUI would need: Extend the existing ~/.deepseek/skills/ discovery to parent-directory walking, Claude Code compatibility paths, and URL-based skill sources. Add YAML frontmatter parsing.

Medium Priority Gaps

These would meaningfully improve the agent experience but are less urgent.

6. Agent Profiles with Permission Inheritance

What it is: Named agent types (build, plan, general, explore) that inherit different tool permission sets. Users can define custom agents with specific models, temperatures, system prompts, and permission rules.

OpenCode implementation: packages/opencode/src/agent/agent.ts:

build: full-access with ask on sensitive paths
plan: all edit tools denied, plan-exit allowed, plan file writes in .opencode/plans/ allowed
general: subagent-only, todo-write denied
explore: read-only, grep/glob/read/bash/webfetch/websearch allowed
Plus hidden agents for internal tasks (compaction, title generation, summarization)

Each agent carries its own model, temperature, topP, prompt, and permission ruleset. A generate function creates new agent configs dynamically from user descriptions.

What DeepSeek TUI would need: Extend the mode system (Plan/Agent/YOLO) to support named agent profiles with per-profile tool filtering and model configuration.

7. Shell Sandboxing

What it is: OS-level sandbox enforcement for shell commands — network restrictions, filesystem read-only mounts, allowed/disallowed paths.

Codex CLI implementation: codex-rs/sandboxing/:

macOS: Seatbelt (sandboxing/src/seatbelt.rs) with .sbpl policy files
Linux: bubblewrap (default) or Landlock (legacy fallback)
Windows: restricted token
Configurable sandbox policies per command
Integration tests can detect they're running under sandbox and early-exit

What DeepSeek TUI would need: Extend crates/execpolicy/ to support platform-specific sandbox enforcement. Start with macOS Seatbelt (most DeepSeek TUI users are on macOS).

8. Tool Search / Deferred MCP Tool Exposure

What it is: Instead of dumping all MCP tools into the system prompt (bloating context), expose a tool_search function that the model calls to discover relevant tools by name or description.

Codex CLI implementation: ToolSearch feature (stable, default-enabled). ToolSearchAlwaysDeferMcpTools goes further — never exposes MCP tools directly, always requires search. This is critical when MCP servers expose hundreds of tools.

What DeepSeek TUI would need: tool_search_tool_regex and tool_search_tool_bm25 already exist as deferred tool discovery mechanisms. Extend them to gate MCP tool exposure behind on-demand search.

9. ExecPolicy / Command Approval Rules

What it is: A policy engine that evaluates shell commands against user-defined rules — prefix allowlists, network restrictions, pattern matching — and auto-approves, denies, or escalates.

Codex CLI implementation: codex-rs/execpolicy/src/:

Policy: ordered list of Rule entries
Rule: prefix patterns (e.g., allow cargo build*, deny rm *)
NetworkRule: protocol-level network restrictions
MatchOptions: controls rule evaluation behavior
Evaluation: result of policy evaluation against a command

Rules can be amended at runtime via blocking_append_allow_prefix_rule.

What DeepSeek TUI would need: Extend crates/execpolicy/ to support prefix rules, network rules, and runtime policy amendments.

10. Dynamic Agent Generation

What it is: On-the-fly generation of new agent configurations from natural language descriptions.

OpenCode implementation: The generate function in agent.ts takes a description like "code reviewer that only reads files and reports issues" and returns an { identifier, whenToUse, systemPrompt } object using a structured LLM call. Generated agents respect existing agent name collisions.

What DeepSeek TUI would need: A model-callable tool or slash command that generates agent configs from descriptions and registers them for the session.

11. Streaming Patch Events

What it is: Structured progress events streamed while the model is generating apply_patch input, giving the user real-time feedback on what files will change.

Codex CLI implementation: ApplyPatchStreamingEvents feature (under development) streams file-level progress as the model produces patch hunks. The StreamingPatchParser in apply-patch/src/streaming_parser.rs handles incremental parsing.

What DeepSeek TUI would need: Extend apply_patch.rs to emit progress events during streaming model output.

Lower Priority Gaps

Specialized features that are valuable but less critical for core coding workflow.

Capability	Where	Notes
Image Generation	Codex CLI `ImageGeneration`	Niche for coding; useful for documentation diagrams
Browser Use	Codex CLI `BrowserUse`	Interactive browser automation (click, type, screenshot). DeepSeek TUI has `web_run` for headless
Computer Use	Codex CLI `ComputerUse`	Full desktop automation. Desktop-app-gated
Realtime Voice	Codex CLI `RealtimeConversation`	Voice conversation mode. Experimental
Unified PTY Exec	Codex CLI `UnifiedExec`	Single PTY-backed shell with state snapshotting across turns
Artifacts	Codex CLI `Artifact`	Native artifact rendering tools
Goals	Codex CLI `Goals`	Persistent thread goals that survive compaction and session restarts
Git Commit Attribution	Codex CLI `CodexGitCommit`	Model instructions for proper commit attribution
CSV Agent Spawning	Codex CLI `SpawnCsv`	CSV-backed parallel agent job distribution
Shell Snapshotting	Codex CLI `ShellSnapshot`	Save/restore shell state across turns
Prevent Idle Sleep	Codex CLI `PreventIdleSleep`	Keep machine awake during long-running agent tasks

Architectural Patterns

OpenCode

Client/Server Architecture: The TUI is one client; the server can be driven remotely from a mobile app, desktop app, or web console. This decouples the agent runtime from the UI layer.

Plugin System: packages/opencode/src/plugin/ supports hot-loadable JS/TS plugins that add tools, models, auth providers, and chat middleware. Plugins receive a typed context with tool execution, auth, and filesystem access.

Multi-Provider: Not coupled to any single AI provider. Models are configured with provider IDs and resolved through a provider registry. OAuth support for OpenAI Codex (ChatGPT subscription integration) in plugin/codex.ts.

Config Layering: Config is loaded from multiple sources (global, project, env vars) and merged with well-defined precedence.

Codex CLI

App-Server Protocol: codex-rs/app-server-protocol/ defines a versioned RPC protocol (v2) between the TUI frontend and the agent backend. All new API development goes through v2 with strict naming conventions (*Params/*Response/*Notification, resource/method RPC naming).

Feature Flag System: codex-rs/features/ centralizes 60+ feature flags with lifecycle stages (UnderDevelopment, Experimental, Stable, Deprecated, Removed). Features have metadata (menu name, description, announcement text) and can carry custom config structs.

Bazel + Cargo Dual Build: Codex CLI uses both Cargo (for development) and Bazel (for CI/release). The find_resource! macro and cargo_bin() helper abstract over runfile differences.

Snapshot Testing: codex-rs/tui/ extensively uses insta for UI snapshot tests. Any UI change requires corresponding snapshot coverage.

Core Modularity: Explicit resistance to adding code to codex-core. New functionality goes into purpose-built crates (codex-apply-patch, codex-memories, codex-sandboxing) rather than growing the core crate.

DeepSeek TUI

RLM (Recursive Language Model): Unique in this space. A sandboxed Python REPL where a sub-LLM can call helpers (llm_query, llm_query_batched, rlm_query) for batch processing, chunking, and recursive critique. Neither competitor has an equivalent.

Durable Tasks: Restart-aware persistent task objects with evidence tracking (gate runs, PR attempts, timeline). Designed for long-running autonomous work that survives restarts.

Automations: Scheduled recurring tasks with cron-style RRULE recurrence. Unique among the three.

What DeepSeek TUI Already Excels At

RLM — batch/bulk LLM processing in a Python sandbox; no equivalent in either competitor
Finance — live stock/crypto quotes; unique in this space
Automations — scheduled recurring tasks with cron rules
Durable tasks — restart-aware with evidence tracking and gate verification
Turn revert — undo workspace changes per turn via side-git snapshots
Data validation — JSON/TOML validation tool
Web run — headless browser interaction (Codex CLI has Browser Use but it's desktop-app-gated)
Parallel tool execution — explicitly modeled as infrastructure
Git/GitHub operations — comprehensive git module with blame, log, diff, status plus full GitHub API via gh
Project map — high-level project structure generation

Recommended Implementation Order

LSP tool — single biggest capability gap. Estimated 30–50% reduction in codebase exploration turns.
Path-pattern permissions — reduces approval fatigue by 60–80% over long sessions.
Persistent memory — compounds value across sessions; foundational for long-running projects.
Pre/Post-tool-use hooks — escape hatch for user-defined guardrails without system prompt bloat.
Skill auto-discovery — enables community skill ecosystem and Claude Code compatibility.
Agent profiles — named agent types with model/permission inheritance.
Tool search for MCP — keeps context window manageable when connecting to MCP servers with many tools.
Shell sandboxing — security improvement, starting with macOS Seatbelt.

19 KiB Raw Blame History Unescape Escape