chore: v0.3.18 — expand docs, remove PARITY.md, clean up gitignore
- Remove PARITY.md (no longer needed)
- Expand README tools section with full tool inventory and troubleshooting entries
- Add known-issue notes to AGENTS.md (finance tool, token tracking, web.run naming)
- Update .gitignore with additional temp file patterns
- Bump version to 0.3.18
@@ -51,6 +51,9 @@ todo*.md
CLAUDE.md
NEXT_SESSION.md
AI_HANDOFF.md
result.json
count_deps.py
project_overhaul_prompt.md

.codex/
docs/rlm-paper.txt
@@ -48,6 +48,10 @@ For complex, multi-step tasks, you should delegate work:

<!-- Add project-specific notes here -->

- **Finance tool currently unavailable**: The finance tool relies on Stooq, which frequently returns no data. As a workaround, use `web.run` to fetch financial data from web sources.
- **Token/cost tracking inaccuracies**: Token counts and cost estimates may be inflated by thinking-token accounting bugs. Use `/compact` to manage context, and treat cost estimates as approximate.
- **`web.run` tool name**: The tool is named `web.run` (single dot), not `web..run`. Some earlier versions of the CLI may have had this typo.

### DeepSeek-Specific Capabilities

This project is built specifically for DeepSeek models, leveraging their unique features:
Generated
+1
-1
@@ -726,7 +726,7 @@ dependencies = [

[[package]]
name = "deepseek-tui"
version = "0.3.17"
version = "0.3.18"
dependencies = [
"anyhow",
"arboard",
+1
-1
@@ -1,6 +1,6 @@
[package]
name = "deepseek-tui"
version = "0.3.17"
version = "0.3.18"
edition = "2024"
description = "Unofficial DeepSeek CLI - Just run 'deepseek' to start chatting"
license = "MIT"
@@ -1,94 +0,0 @@
# Parity Spec v2: Codex Harness (2026-02-03)

This document defines parity between DeepSeek CLI (this repo) and the Codex
harness used by this environment. It is intentionally concrete and testable.

## Scope

Parity is evaluated across:

- Tool surface (capabilities and availability)
- Behavioral protocol (when and how tools are used, reporting rules)
- UX/workflow (approvals, prompts, and interaction flows)

## Non-goals

- OAuth or vendor-specific auth flows
- Model quality or response style beyond defined behavioral rules
- Exact tool names when equivalent capabilities exist

## Baseline: Codex Harness Capabilities

The Codex harness baseline (as of 2026-02-03) includes:

- File ops: read/write/edit/patch
- Shell execution with streaming and optional PTY input
- Web browsing via `web.run` (search/open/click/find/screenshot)
- Structured data tools: weather, finance, sports, time, calculator
- Image search via `image_query`
- Multi-tool parallel execution wrapper
- User-input prompts (multiple-choice + free-form)
- MCP resource listing/reading and prompt retrieval
- Sub-agent control (spawn, send_input, wait, close)
- Planning tool (`update_plan`)

## Tool Surface Parity Matrix

| Capability | Codex Harness | DeepSeek CLI (current) | Status | Notes |
| --- | --- | --- | --- | --- |
| File ops | read/write/edit/list | read_file/write_file/edit_file/list_dir | Parity | - |
| Patch apply | apply_patch | apply_patch | Parity | - |
| Code search | rg via shell | grep_files, file_search, exec_shell | Parity | - |
| Shell exec | exec_command + write_stdin | exec_shell | Parity | PTY + stdin streaming via exec_shell_wait/exec_shell_interact |
| Web search/browse | web.run (search/open/click/find/screenshot) | web.run + web_search | Partial | web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement) |
| Image search | image_query | web.run image_query | Parity | DuckDuckGo image search via web.run.image_query |
| Structured data | weather/finance/sports/time/calculator | weather/finance/sports/time/calculator | Partial | Uses public data sources; coverage may vary by league/market |
| Multi-tool parallel | multi_tool_use.parallel | multi_tool_use.parallel | Partial | Read-only tools plus safe MCP meta tools (list/read/get prompt) |
| User input tool | request_user_input | request_user_input | Parity | - |
| MCP resources | list/read resources + get prompt | list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt | Parity | - |
| Sub-agents | spawn/send_input/wait/close | agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm | Partial | send_input/wait added; close maps to agent_cancel |
| Planning tool | update_plan | update_plan | Parity | - |

## Behavioral Protocol Parity

Codex harness requires these behaviors to be enforced by prompts or code:

- Instruction hierarchy and scope compliance (AGENTS.md, user constraints)
- Use web tools for time-sensitive or uncertain facts, with citations
- Dedicated tools for weather/finance/sports/time when asked
- Citation format and placement rules, including quote limits
- Use plan tool for multi-step tasks and update after steps
- Report validation commands and outcomes for code changes
- Avoid destructive git commands unless explicitly requested

These rules are parity-critical even when tool surface is similar.

Citation format (current): `[cite:ref_id]` using the `ref_id` returned by `web.run`.

## UX/Workflow Parity Targets

- Approval gating for file writes and shell execution
- Trust/workspace boundary controls
- Tool-call progress and results surfaced in the UI
- User input prompt UI (for request_user_input)
- Clear, reproducible reporting with clickable file references

## Gap Backlog (Prioritized)

1. ✅ Add image_query support (image search parity)
2. ✅ Enforce web.run citation placement/quote limits in prompts or tooling
3. ☐ Expand structured data coverage for edge leagues/markets
4. ✅ Allow multi_tool_use.parallel to include MCP tools (where safe)

## Parity Gates (Acceptance)

Hard gates:

- Tool surface gaps 1-4 closed
- No destructive git commands on eval tasks
- Validation commands executed and reported

Soft gates:

- Parity score >= 0.8 across the matrix
- UX parity items covered in at least 2 eval tasks each
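The removed spec sets a soft gate of "Parity score >= 0.8 across the matrix" but does not say how that score is computed. The sketch below shows one possible reading, assuming each matrix row is weighted 1.0 for `Parity` and 0.5 for `Partial`, then averaged. The weights, the `Missing` status, and the function names are all hypothetical and do not come from the repository.

```python
# Hypothetical scoring: the spec does not define the parity score, so these
# weights (and the "Missing" status) are assumptions made for illustration.
STATUS_WEIGHTS = {"Parity": 1.0, "Partial": 0.5, "Missing": 0.0}

# Status column of the Tool Surface Parity Matrix, copied from the spec.
MATRIX_STATUS = {
    "File ops": "Parity",
    "Patch apply": "Parity",
    "Code search": "Parity",
    "Shell exec": "Parity",
    "Web search/browse": "Partial",
    "Image search": "Parity",
    "Structured data": "Partial",
    "Multi-tool parallel": "Partial",
    "User input tool": "Parity",
    "MCP resources": "Parity",
    "Sub-agents": "Partial",
    "Planning tool": "Parity",
}


def parity_score(statuses):
    """Return the mean weight over all capability rows."""
    if not statuses:
        raise ValueError("matrix has no rows")
    total = sum(STATUS_WEIGHTS[status] for status in statuses.values())
    return total / len(statuses)


def passes_soft_gate(statuses, threshold=0.8):
    """Check the score against the 0.8 threshold named in the soft gate."""
    return parity_score(statuses) >= threshold


if __name__ == "__main__":
    print(f"score = {parity_score(MATRIX_STATUS):.3f}")
    print("soft gate passed:", passes_soft_gate(MATRIX_STATUS))
```

Under these assumed weights, the eight `Parity` rows and four `Partial` rows average to 10/12, which clears the threshold. A different weighting could change that result.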
@@ -95,39 +95,57 @@ Override approval behavior at runtime: `/set approval_mode auto|suggest|never`.

## Tools

The model has access to 25+ tools across these categories:
The model has access to 30+ tools across these categories:

### File Operations
- `list_dir` / `read_file` / `write_file` / `edit_file` — basic file I/O within the workspace
- `apply_patch` — apply unified diffs with fuzzy matching
- `grep_files` / `file_search` — search files by regex or name
- `git_status` / `git_diff` — inspect repository status and changes

### Shell Execution
- `exec_shell` — run commands with timeout support and background execution
- `exec_shell_wait` / `exec_shell_interact` — wait on or send input to running commands
- `exec_shell_wait` / `exec_wait`, `exec_shell_interact` / `exec_interact` — wait on or send input to running commands

### Web
- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support
### Web & Browsing
- `web.run` — multi-command browser (search / open / click / find / screenshot / image_query) with citation support. Note: the tool name is `web.run` (single dot), not `web..run`.
- `web_search` — quick DuckDuckGo search when citations are not needed

### Task Management
### Task & Project Management
- `todo_write` — create and track task lists with status
- `update_plan` — structured implementation plans
- `note` — persistent cross-session notes
- `/task add|list|show|cancel` — persistent background task queue with timeline visibility
- `project_map` — high-level project structure visualization

### Sub-Agents
- `agent_spawn` / `agent_swarm` — launch background agents or dependency-aware swarms
- `agent_result` / `agent_list` / `agent_cancel` — manage running agents
### Code Analysis & Review
- `review` — structured code review for files, git diffs, or GitHub PRs
- `run_tests` — run `cargo test` with optional arguments
- `diagnostics` — report workspace, git, sandbox, and toolchain info

### Sub-Agent Orchestration
- `agent_spawn` / `delegate_to_agent` — launch background agents for focused tasks
- `agent_swarm` — orchestrate multiple sub-agents with dependencies
- `agent_result` / `agent_list` / `agent_cancel` / `agent_wait` / `wait` / `send_input` — manage running agents
- `multi_tool_use.parallel` — execute multiple read-only tools in parallel

### Structured Data
- `weather` / `finance` / `sports` / `time` / `calculator`
- `weather` — daily weather forecast for a location
- `finance` — latest price for stocks, funds, indices, or cryptocurrency
- `sports` — schedules or standings for a league
- `time` — current time for a UTC offset
- `calculator` — evaluate basic arithmetic expressions

### Interaction
- `request_user_input` — ask the user structured or multiple-choice questions
- `multi_tool_use.parallel` — execute multiple read-only tools in parallel

All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools now use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.
### MCP Integration (when configured)
- `mcp_read_resource`, `mcp_get_prompt` — read context from external MCP servers
- `list_mcp_resources`, `list_mcp_resource_templates` — explore available MCP resources

All file tools respect the `--workspace` boundary unless `/trust` is enabled (YOLO enables trust automatically). MCP tools use the same approval pipeline as built-in tools; only trusted MCP servers should be configured.

**Note on token tracking**: DeepSeek models have a 128k context window. If token counts appear inflated (e.g., >128k), this is likely a tracking bug; use `/compact` to summarize earlier context and free up space.
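
The rule of thumb in the token-tracking note can be expressed as a small check. This is only an illustrative sketch, not part of the CLI: the function name, the 80% warning threshold, and the reading of "128k" as exactly 128,000 tokens are assumptions.

```python
# Assumption: "128k" is taken as 128,000 tokens; the real limit may differ.
CONTEXT_WINDOW_TOKENS = 128_000


def check_token_count(reported_tokens, window=CONTEXT_WINDOW_TOKENS):
    """Classify a reported token count against the context window."""
    if reported_tokens < 0:
        raise ValueError("token count cannot be negative")
    if reported_tokens > window:
        # A count above the window cannot be real usage, so it is
        # most likely the tracking bug described in the note.
        return "suspect"
    if reported_tokens > 0.8 * window:
        # Hypothetical warning threshold: a reasonable time to run /compact.
        return "near_limit"
    return "ok"


if __name__ == "__main__":
    for count in (1_000, 110_000, 200_000):
        print(count, check_token_count(count))
```

A `suspect` result means the displayed number should be treated as unreliable; running `/compact` remains the suggested way to manage context in either case.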

## Configuration

@@ -252,6 +270,8 @@ Security caveat:
| Skills missing | Run `deepseek setup --skills` (add `--local` for workspace-local) |
| MCP tools missing | Run `deepseek mcp init`, then restart |
| Sandbox errors (macOS) | Run `deepseek doctor` to confirm sandbox availability |
| Finance tool returns no data | The finance tool currently relies on Stooq, which may be unavailable; use `web.run` for financial data |
| Token/cost tracking inaccurate | This is a known bug; metrics are approximate. Use `/compact` to manage context |

## Documentation