diff --git a/.gitignore b/.gitignore
index c1cef6b6..d50dd537 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,9 @@ todo*.md
 CLAUDE.md
 NEXT_SESSION.md
 AI_HANDOFF.md
+result.json
+count_deps.py
+project_overhaul_prompt.md
 .codex/
 docs/rlm-paper.txt
diff --git a/AGENTS.md b/AGENTS.md
index 2fdcea4d..89cc99b0 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -48,6 +48,10 @@ For complex, multi-step tasks, you should delegate work:
+- **Finance tool currently unavailable**: The finance tool relies on Stooq which frequently returns no data. As a workaround, use `web.run` to fetch financial data from web sources.
+- **Token/cost tracking inaccuracies**: Token counting and cost estimation may be inflated due to thinking token accounting bugs. Use `/compact` to manage context, and treat cost estimates as approximate.
+- **Web.run tool name**: Note that the tool is named `web.run` (single dot), not `web..run`. Some earlier versions of the CLI may have had this typo.
+
 ### DeepSeek-Specific Capabilities
 This project is built specifically for DeepSeek models, leveraging their unique features:
diff --git a/Cargo.lock b/Cargo.lock
index 1c5ca08b..13c585d5 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -726,7 +726,7 @@ dependencies = [
 [[package]]
 name = "deepseek-tui"
-version = "0.3.17"
+version = "0.3.18"
 dependencies = [
  "anyhow",
  "arboard",
diff --git a/Cargo.toml b/Cargo.toml
index f0ee13fa..fb35f68f 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "deepseek-tui"
-version = "0.3.17"
+version = "0.3.18"
 edition = "2024"
 description = "Unofficial DeepSeek CLI - Just run 'deepseek' to start chatting"
 license = "MIT"
diff --git a/PARITY.md b/PARITY.md
deleted file mode 100644
index 58c793e9..00000000
--- a/PARITY.md
+++ /dev/null
@@ -1,94 +0,0 @@
-# Parity Spec v2: Codex Harness (2026-02-03)
-
-This document defines parity between DeepSeek CLI (this repo) and the Codex
-harness used by this environment. It is intentionally concrete and testable.
-
-## Scope
-
-Parity is evaluated across:
-
-- Tool surface (capabilities and availability)
-- Behavioral protocol (when and how tools are used, reporting rules)
-- UX/workflow (approvals, prompts, and interaction flows)
-
-## Non-goals
-
-- OAuth or vendor-specific auth flows
-- Model quality or response style beyond defined behavioral rules
-- Exact tool names when equivalent capabilities exist
-
-## Baseline: Codex Harness Capabilities
-
-The Codex harness baseline (as of 2026-02-03) includes:
-
-- File ops: read/write/edit/patch
-- Shell execution with streaming and optional PTY input
-- Web browsing via `web.run` (search/open/click/find/screenshot)
-- Structured data tools: weather, finance, sports, time, calculator
-- Image search via `image_query`
-- Multi-tool parallel execution wrapper
-- User-input prompts (multiple-choice + free-form)
-- MCP resource listing/reading and prompt retrieval
-- Sub-agent control (spawn, send_input, wait, close)
-- Planning tool (`update_plan`)
-
-## Tool Surface Parity Matrix
-
-| Capability | Codex Harness | DeepSeek CLI (current) | Status | Notes |
-| --- | --- | --- | --- | --- |
-| File ops | read/write/edit/list | read_file/write_file/edit_file/list_dir | Parity | - |
-| Patch apply | apply_patch | apply_patch | Parity | - |
-| Code search | rg via shell | grep_files, file_search, exec_shell | Parity | - |
-| Shell exec | exec_command + write_stdin | exec_shell | Parity | PTY + stdin streaming via exec_shell_wait/exec_shell_interact |
-| Web search/browse | web.run (search/open/click/find/screenshot) | web.run + web_search | Partial | web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement) |
-| Image search | image_query | web.run image_query | Parity | DuckDuckGo image search via web.run.image_query |
-| Structured data | weather/finance/sports/time/calculator | weather/finance/sports/time/calculator | Partial | Uses public data sources; coverage may vary by league/market |
-| Multi-tool parallel | multi_tool_use.parallel | multi_tool_use.parallel | Partial | Read-only tools plus safe MCP meta tools (list/read/get prompt) |
-| User input tool | request_user_input | request_user_input | Parity | - |
-| MCP resources | list/read resources + get prompt | list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt | Parity | - |
-| Sub-agents | spawn/send_input/wait/close | agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm | Partial | send_input/wait added; close maps to agent_cancel |
-| Planning tool | update_plan | update_plan | Parity | - |
-
-## Behavioral Protocol Parity
-
-Codex harness requires these behaviors to be enforced by prompts or code:
-
-- Instruction hierarchy and scope compliance (AGENTS.md, user constraints)
-- Use web tools for time-sensitive or uncertain facts, with citations
-- Dedicated tools for weather/finance/sports/time when asked
-- Citation format and placement rules, including quote limits
-- Use plan tool for multi-step tasks and update after steps
-- Report validation commands and outcomes for code changes
-- Avoid destructive git commands unless explicitly requested
-
-These rules are parity-critical even when tool surface is similar.
-
-Citation format (current): `[cite:ref_id]` using the `ref_id` returned by `web.run`.
-
-## UX/Workflow Parity Targets
-
-- Approval gating for file writes and shell execution
-- Trust/workspace boundary controls
-- Tool-call progress and results surfaced in the UI
-- User input prompt UI (for request_user_input)
-- Clear, reproducible reporting with clickable file references
-
-## Gap Backlog (Prioritized)
-
-1. ✅ Add image_query support (image search parity)
-2. ✅ Enforce web.run citation placement/quote limits in prompts or tooling
-3. ☐ Expand structured data coverage for edge leagues/markets
-4. ✅ Allow multi_tool_use.parallel to include MCP tools (where safe)
-
-## Parity Gates (Acceptance)
-
-Hard gates:
-
-- Tool surface gaps 1-4 closed
-- No destructive git commands on eval tasks
-- Validation commands executed and reported
-
-Soft gates:
-
-- Parity score >= 0.8 across the matrix
-- UX parity items covered in at least 2 eval tasks each
diff --git a/README.md b/README.md
index af2c7eb6..47afeaba 100644
--- a/README.md
+++ b/README.md
@@ -95,39 +95,57 @@ Override approval behavior at runtime: `/set approval_mode auto|suggest|never`.
 ## Tools
-The model has access to 25+ tools across these categories:
+The model has access to 30+ tools across these categories:
 ### File Operations
 - `list_dir` / `read_file` / `write_file` / `edit_file` — basic file I/O within the workspace
 - `apply_patch` — apply unified diffs with fuzzy matching
 - `grep_files` / `file_search` — search files by regex or name
+- `git_status` / `git_diff` — inspect repository status and changes
 ### Shell Execution
 - `exec_shell` — run commands with timeout support and background execution
-- `exec_shell_wait` / `exec_shell_interact` — wait on or send input to running commands
+- `exec_shell_wait` / `exec_wait`, `exec_shell_interact` / `exec_interact` — wait on or send input to running commands
-### Web
-- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support
+### Web & Browsing
+- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support. Note: the tool name is `web.run` (single dot), not `web..run`.
 - `web_search` — quick DuckDuckGo search when citations are not needed
-### Task Management
+### Task & Project Management
 - `todo_write` — create and track task lists with status
 - `update_plan` — structured implementation plans
 - `note` — persistent cross-session notes
 - `/task add|list|show|cancel` — persistent background task queue with timeline visibility
+- `project_map` — high-level project structure visualization
-### Sub-Agents
-- `agent_spawn` / `agent_swarm` — launch background agents or dependency-aware swarms
-- `agent_result` / `agent_list` / `agent_cancel` — manage running agents
+### Code Analysis & Review
+- `review` — structured code review for files, git diffs, or GitHub PRs
+- `run_tests` — run `cargo test` with optional arguments
+- `diagnostics` — report workspace, git, sandbox, and toolchain info
+
+### Sub-Agent Orchestration
+- `agent_spawn` / `delegate_to_agent` — launch background agents for focused tasks
+- `agent_swarm` — orchestrate multiple sub-agents with dependencies
+- `agent_result` / `agent_list` / `agent_cancel` / `agent_wait` / `wait` / `send_input` — manage running agents
+- `multi_tool_use.parallel` — execute multiple read-only tools in parallel
 ### Structured Data
-- `weather` / `finance` / `sports` / `time` / `calculator`
+- `weather` — daily weather forecast for a location
+- `finance` — latest price for stocks, funds, indices, or cryptocurrency
+- `sports` — schedules or standings for a league
+- `time` — current time for a UTC offset
+- `calculator` — evaluate basic arithmetic expressions
 ### Interaction
 - `request_user_input` — ask the user structured or multiple-choice questions
-- `multi_tool_use.parallel` — execute multiple read-only tools in parallel
-All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools now use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.
+### MCP Integration (when configured)
+- `mcp_read_resource`, `mcp_get_prompt` — read context from external MCP servers
+- `list_mcp_resources`, `list_mcp_resource_templates` — explore available MCP resources
+
+All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.
+
+**Note on token tracking**: DeepSeek models have a 128k context window. If token counts appear inflated (e.g., >128k), this is likely a tracking bug; use `/compact` to summarize earlier context and free up space.
 ## Configuration
@@ -252,6 +270,8 @@ Security caveat:
 | Skills missing | Run `deepseek setup --skills` (add `--local` for workspace-local) |
 | MCP tools missing | Run `deepseek mcp init`, then restart |
 | Sandbox errors (macOS) | Run `deepseek doctor` to confirm sandbox availability |
+| Finance tool returns no data | Currently, the finance tool relies on Stooq which may be unavailable; use `web.run` for financial data |
+| Token/cost tracking inaccurate | This is a known bug; metrics are approximate. Use `/compact` to manage context |
 ## Documentation