chore: v0.3.18 — expand docs, remove PARITY.md, clean up gitignore

- Remove PARITY.md (no longer needed)
- Expand README tools section with full tool inventory and troubleshooting entries
- Add known-issue notes to AGENTS.md (finance tool, token tracking, web.run naming)
- Update .gitignore with additional temp file patterns
- Bump version to 0.3.18
2026-02-16 12:19:50 -06:00
parent 19a9724008
commit b556559fb4
6 changed files with 40 additions and 107 deletions
@@ -51,6 +51,9 @@ todo*.md
 CLAUDE.md
 NEXT_SESSION.md
 AI_HANDOFF.md
+result.json
+count_deps.py
+project_overhaul_prompt.md

 .codex/
 docs/rlm-paper.txt
@@ -48,6 +48,10 @@ For complex, multi-step tasks, you should delegate work:

 <!-- Add project-specific notes here -->

+- **Finance tool currently unavailable**: The finance tool relies on Stooq which frequently returns no data. As a workaround, use `web.run` to fetch financial data from web sources.
+- **Token/cost tracking inaccuracies**: Token counting and cost estimation may be inflated due to thinking token accounting bugs. Use `/compact` to manage context, and treat cost estimates as approximate.
+- **Web.run tool name**: Note that the tool is named `web.run` (single dot), not `web..run`. Some earlier versions of the CLI may have had this typo.
+
 ### DeepSeek-Specific Capabilities

 This project is built specifically for DeepSeek models, leveraging their unique features:
@@ -726,7 +726,7 @@ dependencies = [

 [[package]]
 name = "deepseek-tui"
-version = "0.3.17"
+version = "0.3.18"
 dependencies = [
 "anyhow",
 "arboard",
@@ -1,6 +1,6 @@
 [package]
 name = "deepseek-tui"
-version = "0.3.17"
+version = "0.3.18"
 edition = "2024"
 description = "Unofficial DeepSeek CLI - Just run 'deepseek' to start chatting"
 license = "MIT"
@@ -1,94 +0,0 @@
-# Parity Spec v2: Codex Harness (2026-02-03)
-
-This document defines parity between DeepSeek CLI (this repo) and the Codex
-harness used by this environment. It is intentionally concrete and testable.
-
-## Scope
-
-Parity is evaluated across:
-
-- Tool surface (capabilities and availability)
-- Behavioral protocol (when and how tools are used, reporting rules)
-- UX/workflow (approvals, prompts, and interaction flows)
-
-## Non-goals
-
-- OAuth or vendor-specific auth flows
-- Model quality or response style beyond defined behavioral rules
-- Exact tool names when equivalent capabilities exist
-
-## Baseline: Codex Harness Capabilities
-
-The Codex harness baseline (as of 2026-02-03) includes:
-
-- File ops: read/write/edit/patch
-- Shell execution with streaming and optional PTY input
-- Web browsing via `web.run` (search/open/click/find/screenshot)
-- Structured data tools: weather, finance, sports, time, calculator
-- Image search via `image_query`
-- Multi-tool parallel execution wrapper
-- User-input prompts (multiple-choice + free-form)
-- MCP resource listing/reading and prompt retrieval
-- Sub-agent control (spawn, send_input, wait, close)
-- Planning tool (`update_plan`)
-
-## Tool Surface Parity Matrix
-
-| Capability | Codex Harness | DeepSeek CLI (current) | Status | Notes |
-| --- | --- | --- | --- | --- |
-| File ops | read/write/edit/list | read_file/write_file/edit_file/list_dir | Parity | - |
-| Patch apply | apply_patch | apply_patch | Parity | - |
-| Code search | rg via shell | grep_files, file_search, exec_shell | Parity | - |
-| Shell exec | exec_command + write_stdin | exec_shell | Parity | PTY + stdin streaming via exec_shell_wait/exec_shell_interact |
-| Web search/browse | web.run (search/open/click/find/screenshot) | web.run + web_search | Partial | web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement) |
-| Image search | image_query | web.run image_query | Parity | DuckDuckGo image search via web.run.image_query |
-| Structured data | weather/finance/sports/time/calculator | weather/finance/sports/time/calculator | Partial | Uses public data sources; coverage may vary by league/market |
-| Multi-tool parallel | multi_tool_use.parallel | multi_tool_use.parallel | Partial | Read-only tools plus safe MCP meta tools (list/read/get prompt) |
-| User input tool | request_user_input | request_user_input | Parity | - |
-| MCP resources | list/read resources + get prompt | list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt | Parity | - |
-| Sub-agents | spawn/send_input/wait/close | agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm | Partial | send_input/wait added; close maps to agent_cancel |
-| Planning tool | update_plan | update_plan | Parity | - |
-
-## Behavioral Protocol Parity
-
-Codex harness requires these behaviors to be enforced by prompts or code:
-
-- Instruction hierarchy and scope compliance (AGENTS.md, user constraints)
-- Use web tools for time-sensitive or uncertain facts, with citations
-- Dedicated tools for weather/finance/sports/time when asked
-- Citation format and placement rules, including quote limits
-- Use plan tool for multi-step tasks and update after steps
-- Report validation commands and outcomes for code changes
-- Avoid destructive git commands unless explicitly requested
-
-These rules are parity-critical even when tool surface is similar.
-
-Citation format (current): `[cite:ref_id]` using the `ref_id` returned by `web.run`.
-
-## UX/Workflow Parity Targets
-
-- Approval gating for file writes and shell execution
-- Trust/workspace boundary controls
-- Tool-call progress and results surfaced in the UI
-- User input prompt UI (for request_user_input)
-- Clear, reproducible reporting with clickable file references
-
-## Gap Backlog (Prioritized)
-
-1. ✅ Add image_query support (image search parity)
-2. ✅ Enforce web.run citation placement/quote limits in prompts or tooling
-3. ☐ Expand structured data coverage for edge leagues/markets
-4. ✅ Allow multi_tool_use.parallel to include MCP tools (where safe)
-
-## Parity Gates (Acceptance)
-
-Hard gates:
-
-- Tool surface gaps 1-4 closed
-- No destructive git commands on eval tasks
-- Validation commands executed and reported
-
-Soft gates:
-
-- Parity score >= 0.8 across the matrix
-- UX parity items covered in at least 2 eval tasks each
@@ -95,39 +95,57 @@ Override approval behavior at runtime: `/set approval_mode auto|suggest|never`.

 ## Tools

-The model has access to 25+ tools across these categories:
+The model has access to 30+ tools across these categories:

 ### File Operations
 - `list_dir` / `read_file` / `write_file` / `edit_file` — basic file I/O within the workspace
 - `apply_patch` — apply unified diffs with fuzzy matching
 - `grep_files` / `file_search` — search files by regex or name
+- `git_status` / `git_diff` — inspect repository status and changes

 ### Shell Execution
 - `exec_shell` — run commands with timeout support and background execution
-- `exec_shell_wait` / `exec_shell_interact` — wait on or send input to running commands
+- `exec_shell_wait` / `exec_wait`, `exec_shell_interact` / `exec_interact` — wait on or send input to running commands

-### Web
-- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support
+### Web & Browsing
+- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support. Note: the tool name is `web.run` (single dot), not `web..run`.
 - `web_search` — quick DuckDuckGo search when citations are not needed

-### Task Management
+### Task & Project Management
 - `todo_write` — create and track task lists with status
 - `update_plan` — structured implementation plans
 - `note` — persistent cross-session notes
 - `/task add|list|show|cancel` — persistent background task queue with timeline visibility
+- `project_map` — high-level project structure visualization

-### Sub-Agents
-- `agent_spawn` / `agent_swarm` — launch background agents or dependency-aware swarms
-- `agent_result` / `agent_list` / `agent_cancel` — manage running agents
+### Code Analysis & Review
+- `review` — structured code review for files, git diffs, or GitHub PRs
+- `run_tests` — run `cargo test` with optional arguments
+- `diagnostics` — report workspace, git, sandbox, and toolchain info
+
+### Sub-Agent Orchestration
+- `agent_spawn` / `delegate_to_agent` — launch background agents for focused tasks
+- `agent_swarm` — orchestrate multiple sub-agents with dependencies
+- `agent_result` / `agent_list` / `agent_cancel` / `agent_wait` / `wait` / `send_input` — manage running agents
+- `multi_tool_use.parallel` — execute multiple read-only tools in parallel

 ### Structured Data
-- `weather` / `finance` / `sports` / `time` / `calculator`
+- `weather` — daily weather forecast for a location
+- `finance` — latest price for stocks, funds, indices, or cryptocurrency
+- `sports` — schedules or standings for a league
+- `time` — current time for a UTC offset
+- `calculator` — evaluate basic arithmetic expressions

 ### Interaction
 - `request_user_input` — ask the user structured or multiple-choice questions
-- `multi_tool_use.parallel` — execute multiple read-only tools in parallel

-All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools now use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.
+### MCP Integration (when configured)
+- `mcp_read_resource`, `mcp_get_prompt` — read context from external MCP servers
+- `list_mcp_resources`, `list_mcp_resource_templates` — explore available MCP resources
+
+All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.
+
+**Note on token tracking**: DeepSeek models have a 128k context window. If token counts appear inflated (e.g., >128k), this is likely a tracking bug; use `/compact` to summarize earlier context and free up space.

 ## Configuration

@@ -252,6 +270,8 @@ Security caveat:
 | Skills missing | Run `deepseek setup --skills` (add `--local` for workspace-local) |
 | MCP tools missing | Run `deepseek mcp init`, then restart |
 | Sandbox errors (macOS) | Run `deepseek doctor` to confirm sandbox availability |
+| Finance tool returns no data | Currently, the finance tool relies on Stooq which may be unavailable; use `web.run` for financial data |
+| Token/cost tracking inaccurate | This is a known bug; metrics are approximate. Use `/compact` to manage context |

 ## Documentation