Files
codewhale/docs/COMPETITIVE_ANALYSIS.md
T
Hunter Bown ad8064b143 chore(v0.8.8): stabilization batch — sub-agent caps, mutex contention, RLM polish, CI cleanup
Bundles the v0.8.8 stabilization fixes that were already implemented in the
working tree, plus the workflow/doc reconciliation called out in #507.

### Sub-agent runtime fixes
- **#509** Default sub-agent cap raised to 10 (configurable via
  `[subagents].max_concurrent` in `config.toml`, hard ceiling 20). The
  running-count calculation now ignores non-running, no-handle, and finished
  handles so completed agents stop counting against the cap.
- **#510** `SharedSubAgentManager` is now `Arc<RwLock<...>>`; the read paths
  that previously held a `Mutex` for inspection now take a read lock,
  eliminating the multi-agent fan-out UI freeze.
- **#511** `compact_tool_result_for_context` summarizes `agent_result` /
  `agent_wait` payloads before they are folded into the parent context.
- **#512** RLM tool cards map to `ToolFamily::Rlm` and render `rlm`, not
  `swarm`. Stale "swarm" wording cleaned in docs/comments/tests.
- **#513** (foreground stopgap only) Foreground RLM work is visible in the
  Agents sidebar projection. Full async RLM lifecycle remains v0.8.9 — the
  issue stays open with a refined scope.

### TUI / UX fixes
- **#487** Offline composer queue is now session-scoped; legacy unscoped
  queues fail closed.
- **#488** Composer Option+Backspace deletes by word; cross-platform key
  routing helpers added.
- **#443/#444** Keyboard enhancement flags pop on normal AND panic exit; the
  raw-mode startup probe is now bounded by a configurable timeout.
- **#449** Production footer reads statusline colors from `app.ui_theme`
  rather than the bespoke palette.
- **#506** `display_path_with_home` no longer mutates `HOME` in tests; the
  flake on shared-env CI is gone.

### Self-update / packaging
- **#503** `update.rs` arch mapping uses release-asset naming (`arm64`/`x64`)
  instead of the raw Rust constants. The platform-asset selector also rejects
  `.sha256` siblings as primary binaries. Tests now live alongside the source
  in `mod tests` (the `#[path]`-based integration test was removed because it
  duplicated test runs and forced a `pub(crate)` helper that no real caller
  used).
- **`Max 5 in flight` wording updated** in `agent_spawn` description,
  `prompts/base.md`, and `docs/TOOL_SURFACE.md` so the model sees the real
  default cap (10) and the configuration knob name.

### CI / release docs (#507)
- Pruned three duplicated/dead workflows: `crates-publish.yml`, `parity.yml`,
  `publish-npm.yml`. Their gates already run in `ci.yml` for every push/PR.
- `release.yml` build job now allows `parity` to be skipped (it only runs on
  tag push), unblocking `workflow_dispatch` reruns. The job still fails
  closed on a real parity failure.
- `RELEASE_RUNBOOK.md` reconciled: crate publishing is documented as the
  manual `scripts/release/publish-crates.sh` flow (no automated workflow);
  references to the deleted workflows removed.
- `CLAUDE.md` notes the `RELEASE_TAG_PAT` requirement for the auto-tag →
  release.yml chain (without it, the tag is created but `release.yml` does
  not fire) and documents the `workflow_dispatch` parity-skip behavior.

### Docs
- `docs/COMPETITIVE_ANALYSIS.md` added — capability matrix vs OpenCode and
  Codex CLI, gap analysis, and recommended implementation order.

### Verification (this branch)
- `cargo fmt --all -- --check` ✓
- `cargo check --workspace --all-targets --locked` ✓
- `cargo clippy --workspace --all-targets --all-features --locked -- -D warnings` ✓
- `cargo test --workspace --all-features --locked` ✓ (1809 + supporting)
- Parity gates ✓ (snapshot, parity_protocol, parity_state)
- `cargo build --release --locked -p deepseek-tui-cli -p deepseek-tui` ✓
- Lockfile drift guard ✓
- `deepseek doctor --json` clean
- `deepseek eval` (offline harness) success=true, 0 tool errors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:57:37 -05:00

318 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Competitive Analysis: DeepSeek TUI vs OpenCode vs Codex CLI
Analysis of capabilities across three AI coding agents: OpenCode (`/Volumes/VIXinSSD/opencode`), Codex CLI (`/Volumes/VIXinSSD/codex-main`), and DeepSeek TUI (`/Volumes/VIXinSSD/deepseek-tui`).
## Tool Matrix
| Capability | OpenCode | Codex CLI | DeepSeek TUI |
|---|---|---|---|
| File read | ✅ Read | ✅ | ✅ file |
| File write | ✅ Write | ✅ | ✅ file |
| File edit | ✅ Edit (string replace) | ✅ apply_patch (diff format) | ✅ edit_file + apply_patch |
| File glob | ✅ Glob | ✅ | ✅ file_search |
| Code search | ✅ Grep + CodeSearch (Exa) | ✅ | ✅ grep_files + search |
| Shell exec | ✅ Bash | ✅ exec/shell | ✅ shell |
| Web fetch | ✅ WebFetch | ✅ | ✅ fetch_url |
| Web search | ✅ WebSearch | ✅ WebSearchRequest | ✅ web_search |
| Web browse | ❌ | ❌ | ✅ web_run |
| LSP | ✅ Lsp (experimental) | ❌ | ❌ |
| Task/todo tracking | ✅ TodoWrite | ✅ | ✅ todo_write |
| Subagent spawn | ✅ Task | ✅ Collab/SpawnCsv | ✅ agent_spawn |
| Skill system | ✅ Skill (multi-location discovery) | ✅ core-skills | ⚠️ Partial (.deepseek/skills/) |
| Plan mode | ✅ plan-enter/exit | ✅ Plan mode | ✅ Plan mode |
| User question | ✅ Question | ✅ request_user_input | ✅ user_input |
| Patch apply | ✅ apply_patch (custom format) | ✅ apply_patch (diff format) | ✅ apply_patch |
| Data validation | ❌ | ❌ | ✅ validate_data |
| Finance | ❌ | ❌ | ✅ finance |
| Git ops | Via Bash tool | ✅ git-utils | ✅ git module |
| GitHub ops | Via Bash (gh) | ✅ | ✅ github |
| Test running | ❌ | ✅ | ✅ test_runner |
| Automation | ❌ | ❌ | ✅ automation |
| Code review | ❌ | ✅ GuardianApproval | ✅ review |
| Recall/archive | ❌ | ❌ | ✅ recall_archive |
| Diagnostics | ❌ | ✅ | ✅ diagnostics |
| Revert turn | ❌ | ❌ | ✅ revert_turn |
| Image generation | ❌ | ✅ ImageGeneration | ❌ |
| Browser use | ❌ | ✅ BrowserUse | ❌ (web_run is headless) |
| Computer use | ❌ | ✅ ComputerUse | ❌ |
| Realtime voice | ❌ | ✅ RealtimeConversation | ❌ |
---
## High Priority Gaps
These are capabilities that would most directly improve DeepSeek TUI's effectiveness as a coding agent.
### 1. LSP Integration
**What it is:** A model-callable tool that queries Language Server Protocol servers for code intelligence — go-to-definition, find references, hover (type info), document symbols, workspace symbols, call hierarchy, and implementations.
**Why it matters:** The single biggest capability gap. Every codebase exploration currently costs shell `rg` calls and sequential file reads. With LSP, the agent can jump to definitions, find all callers of a function, and inspect types in a single tool call. Estimated 3050% reduction in exploration turns for structured codebases.
**OpenCode implementation:** `packages/opencode/src/tool/lsp.ts` exposes nine operations with file/line/character parameters. The tool prompts are in `tool/lsp.txt`. LSP servers must be configured per file type.
```
Supported operations:
- goToDefinition
- findReferences
- hover
- documentSymbol
- workspaceSymbol
- goToImplementation
- prepareCallHierarchy
- incomingCalls
- outgoingCalls
```
**What DeepSeek TUI would need:** A new `lsp.rs` tool in `crates/tui/src/tools/`, integration with tower-lsp or lsp-server crate, and per-language server configuration.
### 2. Granular Permission System
**What it is:** Allow/deny/ask rules keyed on tool name × file path pattern, with wildcard support, home-directory expansion, and cascading to pending requests.
**Why it matters:** The current all-or-nothing approval model creates friction. Users can't express "always allow reads in `src/` but always ask for `.env` files." The ability to permanently approve a pattern reduces approval fatigue by 6080% over a long session.
**OpenCode implementation:** `packages/opencode/src/permission/index.ts` implements:
- `Action`: `allow | deny | ask`
- `Rule`: `{ permission: string, pattern: string, action: Action }`
- `Ruleset`: ordered list of rules with last-match-wins semantics
- Pattern expansion for `~/`, `$HOME/`
- Wildcard matching on both permission names and path patterns
- Reply modes: `once` (approve this one call), `always` (approve pattern forever), `reject` (deny this one)
- Automatic cascading: an "always" reply auto-resolves pending requests for the same session
- Distinct error types: `DeniedError` (rule-based), `RejectedError` (user said no), `CorrectedError` (user said no with feedback)
Agent definitions inherit permission rulesets that can be user-overridden:
```typescript
build: {
permission: merge(defaults, { question: "allow", plan_enter: "allow" }, user),
}
plan: {
permission: merge(defaults, { edit: { "*": "deny" } }, user),
}
explore: {
permission: merge(defaults, { "*": "deny", grep: "allow", read: "allow", ... }, user),
}
```
**What DeepSeek TUI would need:** A permission rule engine with the same dimension (tool name × path pattern × action), persistence to disk, and hook integration so approval decisions can cascade.
### 3. Lifecycle Hooks
**What it is:** User-defined shell commands or plugin functions that fire on specific lifecycle events — before a tool executes, after it completes, when permission is requested, at session start, when the user submits a prompt, and at session stop.
**Why it matters:** Hooks are the escape hatch that lets users enforce invariants without polluting the system prompt. "Always run `cargo fmt` after writing a `.rs` file." "Warn me before any `rm -rf`." "Log every shell command to a file." They are composable, auditable, and don't consume context window tokens.
**Codex CLI implementation:** `codex-rs/hooks/` defines six event types with typed request/response payloads:
| Event | When it fires | Payload |
|---|---|---|
| `PreToolUse` | Before tool execution | tool name, input params, sandbox state |
| `PostToolUse` | After tool execution | tool name, input, success/failure, duration, output preview |
| `PermissionRequest` | When model requests permission | permission type, justification |
| `SessionStart` | New session begins | session ID, cwd, source (new/resume) |
| `UserPromptSubmit` | User sends a message | prompt text |
| `Stop` | Session ending | reason |
Each hook handler supports:
- `matcher`: optional regex to filter which tool calls trigger the hook
- `command`: shell command to run
- `timeout_sec`: maximum runtime
- `status_message`: shown to the user while the hook runs
- `source_path` + `source`: tracks where the hook was defined (project hooks.json, user config, plugin)
- Hooks can return `Success`, `FailedContinue`, or `FailedAbort` (blocks the operation)
**What DeepSeek TUI would need:** Extend `crates/hooks/` to support the full event surface, add matcher-based filtering, and provide a `hooks.json` discovery mechanism similar to Codex CLI's.
### 4. Persistent Memories
**What it is:** Automatic extraction of user preferences, project conventions, and past decisions from conversations, stored as retrievable memories that are injected into new sessions.
**Why it matters:** Across a long debugging session, the agent rediscovers the same facts: "this project uses Rust edition 2024," "tests run with `cargo test --workspace`," "the user prefers 4-space indentation." A memory system compounds value — each session builds on prior knowledge rather than starting from zero.
**Codex CLI implementation:** The `MemoryTool` feature (experimental, behind `/experimental` menu) enables:
- Memory generation: the model creates structured memories from conversation content
- Memory retrieval: relevant memories are injected into new conversation context
- The `Chronicle` feature adds passive screen-context memories via a sidecar process
- Memories are stored in SQLite and surfaced in the TUI via `/memories` command
**What DeepSeek TUI would need:** A memory extraction prompt, a vector or keyword-based retrieval system, and storage in the existing session/state infrastructure.
### 5. Skill Auto-Discovery
**What it is:** Automatic scanning of multiple locations for `SKILL.md` files that provide domain-specific instructions, scripts, and references. Skills are injected into the conversation on demand via a `skill` tool.
**Why it matters:** Skills are how the community packages expertise. A "Rust refactoring" skill, a "Docker deployment" skill, a "GitHub Actions" skill — each provides specialized instructions without bloating the main system prompt. OpenCode's multi-location discovery means skills can be project-local, user-global, or pulled from URLs.
**OpenCode implementation:** `packages/opencode/src/skill/index.ts` scans:
1. `~/.claude/skills/**/SKILL.md` (Claude Code compatibility)
2. `~/.agents/skills/**/SKILL.md` (Agents SDK compatibility)
3. Parent directories from cwd to workspace root for `.claude/skills/` and `.agents/skills/`
4. Project config directories for `{skill,skills}/**/SKILL.md`
5. User-configured paths (with `~/` expansion)
6. User-configured URLs (pulled via discovery module)
Skills are parsed for YAML frontmatter (`name`, `description`) and Markdown content. Duplicate names warn but don't error. Skills respect agent permissions — an agent can only load skills its permission ruleset allows.
**What DeepSeek TUI would need:** Extend the existing `~/.deepseek/skills/` discovery to parent-directory walking, Claude Code compatibility paths, and URL-based skill sources. Add YAML frontmatter parsing.
---
## Medium Priority Gaps
These would meaningfully improve the agent experience but are less urgent.
### 6. Agent Profiles with Permission Inheritance
**What it is:** Named agent types (build, plan, general, explore) that inherit different tool permission sets. Users can define custom agents with specific models, temperatures, system prompts, and permission rules.
**OpenCode implementation:** `packages/opencode/src/agent/agent.ts`:
- `build`: full-access with ask on sensitive paths
- `plan`: all edit tools denied, plan-exit allowed, plan file writes in `.opencode/plans/` allowed
- `general`: subagent-only, todo-write denied
- `explore`: read-only, grep/glob/read/bash/webfetch/websearch allowed
- Plus hidden agents for internal tasks (compaction, title generation, summarization)
Each agent carries its own `model`, `temperature`, `topP`, `prompt`, and `permission` ruleset. A `generate` function creates new agent configs dynamically from user descriptions.
**What DeepSeek TUI would need:** Extend the mode system (Plan/Agent/YOLO) to support named agent profiles with per-profile tool filtering and model configuration.
### 7. Shell Sandboxing
**What it is:** OS-level sandbox enforcement for shell commands — network restrictions, filesystem read-only mounts, allowed/disallowed paths.
**Codex CLI implementation:** `codex-rs/sandboxing/`:
- macOS: Seatbelt (`sandboxing/src/seatbelt.rs`) with `.sbpl` policy files
- Linux: bubblewrap (default) or Landlock (legacy fallback)
- Windows: restricted token
- Configurable sandbox policies per command
- Integration tests can detect they're running under sandbox and early-exit
**What DeepSeek TUI would need:** Extend `crates/execpolicy/` to support platform-specific sandbox enforcement. Start with macOS Seatbelt (most DeepSeek TUI users are on macOS).
### 8. Tool Search / Deferred MCP Tool Exposure
**What it is:** Instead of dumping all MCP tools into the system prompt (bloating context), expose a `tool_search` function that the model calls to discover relevant tools by name or description.
**Codex CLI implementation:** `ToolSearch` feature (stable, default-enabled). `ToolSearchAlwaysDeferMcpTools` goes further — never exposes MCP tools directly, always requires search. This is critical when MCP servers expose hundreds of tools.
**What DeepSeek TUI would need:** `tool_search_tool_regex` and `tool_search_tool_bm25` already exist as deferred tool discovery mechanisms. Extend them to gate MCP tool exposure behind on-demand search.
### 9. ExecPolicy / Command Approval Rules
**What it is:** A policy engine that evaluates shell commands against user-defined rules — prefix allowlists, network restrictions, pattern matching — and auto-approves, denies, or escalates.
**Codex CLI implementation:** `codex-rs/execpolicy/src/`:
- `Policy`: ordered list of `Rule` entries
- `Rule`: prefix patterns (e.g., allow `cargo build*`, deny `rm *`)
- `NetworkRule`: protocol-level network restrictions
- `MatchOptions`: controls rule evaluation behavior
- `Evaluation`: result of policy evaluation against a command
Rules can be amended at runtime via `blocking_append_allow_prefix_rule`.
**What DeepSeek TUI would need:** Extend `crates/execpolicy/` to support prefix rules, network rules, and runtime policy amendments.
### 10. Dynamic Agent Generation
**What it is:** On-the-fly generation of new agent configurations from natural language descriptions.
**OpenCode implementation:** The `generate` function in `agent.ts` takes a description like "code reviewer that only reads files and reports issues" and returns an `{ identifier, whenToUse, systemPrompt }` object using a structured LLM call. Generated agents respect existing agent name collisions.
**What DeepSeek TUI would need:** A model-callable tool or slash command that generates agent configs from descriptions and registers them for the session.
### 11. Streaming Patch Events
**What it is:** Structured progress events streamed while the model is generating `apply_patch` input, giving the user real-time feedback on what files will change.
**Codex CLI implementation:** `ApplyPatchStreamingEvents` feature (under development) streams file-level progress as the model produces patch hunks. The `StreamingPatchParser` in `apply-patch/src/streaming_parser.rs` handles incremental parsing.
**What DeepSeek TUI would need:** Extend `apply_patch.rs` to emit progress events during streaming model output.
---
## Lower Priority Gaps
Specialized features that are valuable but less critical for core coding workflow.
| Capability | Where | Notes |
|---|---|---|
| Image Generation | Codex CLI `ImageGeneration` | Niche for coding; useful for documentation diagrams |
| Browser Use | Codex CLI `BrowserUse` | Interactive browser automation (click, type, screenshot). DeepSeek TUI has `web_run` for headless |
| Computer Use | Codex CLI `ComputerUse` | Full desktop automation. Desktop-app-gated |
| Realtime Voice | Codex CLI `RealtimeConversation` | Voice conversation mode. Experimental |
| Unified PTY Exec | Codex CLI `UnifiedExec` | Single PTY-backed shell with state snapshotting across turns |
| Artifacts | Codex CLI `Artifact` | Native artifact rendering tools |
| Goals | Codex CLI `Goals` | Persistent thread goals that survive compaction and session restarts |
| Git Commit Attribution | Codex CLI `CodexGitCommit` | Model instructions for proper commit attribution |
| CSV Agent Spawning | Codex CLI `SpawnCsv` | CSV-backed parallel agent job distribution |
| Shell Snapshotting | Codex CLI `ShellSnapshot` | Save/restore shell state across turns |
| Prevent Idle Sleep | Codex CLI `PreventIdleSleep` | Keep machine awake during long-running agent tasks |
---
## Architectural Patterns
### OpenCode
**Client/Server Architecture:** The TUI is one client; the server can be driven remotely from a mobile app, desktop app, or web console. This decouples the agent runtime from the UI layer.
**Plugin System:** `packages/opencode/src/plugin/` supports hot-loadable JS/TS plugins that add tools, models, auth providers, and chat middleware. Plugins receive a typed context with tool execution, auth, and filesystem access.
**Multi-Provider:** Not coupled to any single AI provider. Models are configured with provider IDs and resolved through a provider registry. OAuth support for OpenAI Codex (ChatGPT subscription integration) in `plugin/codex.ts`.
**Config Layering:** Config is loaded from multiple sources (global, project, env vars) and merged with well-defined precedence.
### Codex CLI
**App-Server Protocol:** `codex-rs/app-server-protocol/` defines a versioned RPC protocol (v2) between the TUI frontend and the agent backend. All new API development goes through v2 with strict naming conventions (`*Params`/`*Response`/`*Notification`, `resource/method` RPC naming).
**Feature Flag System:** `codex-rs/features/` centralizes 60+ feature flags with lifecycle stages (UnderDevelopment, Experimental, Stable, Deprecated, Removed). Features have metadata (menu name, description, announcement text) and can carry custom config structs.
**Bazel + Cargo Dual Build:** Codex CLI uses both Cargo (for development) and Bazel (for CI/release). The `find_resource!` macro and `cargo_bin()` helper abstract over runfile differences.
**Snapshot Testing:** `codex-rs/tui/` extensively uses `insta` for UI snapshot tests. Any UI change requires corresponding snapshot coverage.
**Core Modularity:** Explicit resistance to adding code to `codex-core`. New functionality goes into purpose-built crates (`codex-apply-patch`, `codex-memories`, `codex-sandboxing`) rather than growing the core crate.
### DeepSeek TUI
**RLM (Recursive Language Model):** Unique in this space. A sandboxed Python REPL where a sub-LLM can call helpers (`llm_query`, `llm_query_batched`, `rlm_query`) for batch processing, chunking, and recursive critique. Neither competitor has an equivalent.
**Durable Tasks:** Restart-aware persistent task objects with evidence tracking (gate runs, PR attempts, timeline). Designed for long-running autonomous work that survives restarts.
**Automations:** Scheduled recurring tasks with cron-style RRULE recurrence. Unique among the three.
---
## What DeepSeek TUI Already Excels At
- **RLM** — batch/bulk LLM processing in a Python sandbox; no equivalent in either competitor
- **Finance** — live stock/crypto quotes; unique in this space
- **Automations** — scheduled recurring tasks with cron rules
- **Durable tasks** — restart-aware with evidence tracking and gate verification
- **Turn revert** — undo workspace changes per turn via side-git snapshots
- **Data validation** — JSON/TOML validation tool
- **Web run** — headless browser interaction (Codex CLI has Browser Use but it's desktop-app-gated)
- **Parallel tool execution** — explicitly modeled as infrastructure
- **Git/GitHub operations** — comprehensive git module with blame, log, diff, status plus full GitHub API via gh
- **Project map** — high-level project structure generation
---
## Recommended Implementation Order
1. **LSP tool** — single biggest capability gap. Estimated 3050% reduction in codebase exploration turns.
2. **Path-pattern permissions** — reduces approval fatigue by 6080% over long sessions.
3. **Persistent memory** — compounds value across sessions; foundational for long-running projects.
4. **Pre/Post-tool-use hooks** — escape hatch for user-defined guardrails without system prompt bloat.
5. **Skill auto-discovery** — enables community skill ecosystem and Claude Code compatibility.
6. **Agent profiles** — named agent types with model/permission inheritance.
7. **Tool search for MCP** — keeps context window manageable when connecting to MCP servers with many tools.
8. **Shell sandboxing** — security improvement, starting with macOS Seatbelt.