702f9c353c
- Add image_query support to web.run (DuckDuckGo image search) - Encode tool-call function names for Chat Completions history rebuild - Allow safe MCP meta tools in multi_tool_use.parallel - Update prompts for stronger citation placement and quote-limit guidance
95 lines
4.2 KiB
Markdown
95 lines
4.2 KiB
Markdown
# Parity Spec v2: Codex Harness (2026-02-03)
|
|
|
|
This document defines parity between DeepSeek CLI (this repo) and the Codex
|
|
harness used by this environment. It is intentionally concrete and testable.
|
|
|
|
## Scope
|
|
|
|
Parity is evaluated across:
|
|
|
|
- Tool surface (capabilities and availability)
|
|
- Behavioral protocol (when and how tools are used, reporting rules)
|
|
- UX/workflow (approvals, prompts, and interaction flows)
|
|
|
|
## Non-goals
|
|
|
|
- OAuth or vendor-specific auth flows
|
|
- Model quality or response style beyond defined behavioral rules
|
|
- Exact tool names when equivalent capabilities exist
|
|
|
|
## Baseline: Codex Harness Capabilities
|
|
|
|
The Codex harness baseline (as of 2026-02-03) includes:
|
|
|
|
- File ops: read/write/edit/patch
|
|
- Shell execution with streaming and optional PTY input
|
|
- Web browsing via `web.run` (search/open/click/find/screenshot)
|
|
- Structured data tools: weather, finance, sports, time, calculator
|
|
- Image search via `image_query`
|
|
- Multi-tool parallel execution wrapper
|
|
- User-input prompts (multiple-choice + free-form)
|
|
- MCP resource listing/reading and prompt retrieval
|
|
- Sub-agent control (spawn, send_input, wait, close)
|
|
- Planning tool (`update_plan`)
|
|
|
|
## Tool Surface Parity Matrix
|
|
|
|
| Capability | Codex Harness | DeepSeek CLI (current) | Status | Notes |
|
|
| --- | --- | --- | --- | --- |
|
|
| File ops | read/write/edit/list | read_file/write_file/edit_file/list_dir | Parity | - |
|
|
| Patch apply | apply_patch | apply_patch | Parity | - |
|
|
| Code search | rg via shell | grep_files, file_search, exec_shell | Parity | - |
|
|
| Shell exec | exec_command + write_stdin | exec_shell | Parity | PTY + stdin streaming via exec_shell_wait/exec_shell_interact |
|
|
| Web search/browse | web.run (search/open/click/find/screenshot) | web.run + web_search | Partial | web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement) |
|
|
| Image search | image_query | web.run image_query | Parity | DuckDuckGo image search via web.run.image_query |
|
|
| Structured data | weather/finance/sports/time/calculator | weather/finance/sports/time/calculator | Partial | Uses public data sources; coverage may vary by league/market |
|
|
| Multi-tool parallel | multi_tool_use.parallel | multi_tool_use.parallel | Partial | Read-only tools plus safe MCP meta tools (list/read/get prompt) |
|
|
| User input tool | request_user_input | request_user_input | Parity | - |
|
|
| MCP resources | list/read resources + get prompt | list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt | Parity | - |
|
|
| Sub-agents | spawn/send_input/wait/close | agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm | Partial | send_input/wait added; close maps to agent_cancel |
|
|
| Planning tool | update_plan | update_plan | Parity | - |
|
|
|
|
## Behavioral Protocol Parity
|
|
|
|
Codex harness requires these behaviors to be enforced by prompts or code:
|
|
|
|
- Instruction hierarchy and scope compliance (AGENTS.md, user constraints)
|
|
- Use web tools for time-sensitive or uncertain facts, with citations
|
|
- Dedicated tools for weather/finance/sports/time when asked
|
|
- Citation format and placement rules, including quote limits
|
|
- Use plan tool for multi-step tasks and update after steps
|
|
- Report validation commands and outcomes for code changes
|
|
- Avoid destructive git commands unless explicitly requested
|
|
|
|
These rules are parity-critical even when tool surface is similar.
|
|
|
|
Citation format (current): `[cite:ref_id]` using the `ref_id` returned by `web.run`.
|
|
|
|
## UX/Workflow Parity Targets
|
|
|
|
- Approval gating for file writes and shell execution
|
|
- Trust/workspace boundary controls
|
|
- Tool-call progress and results surfaced in the UI
|
|
- User input prompt UI (for request_user_input)
|
|
- Clear, reproducible reporting with clickable file references
|
|
|
|
## Gap Backlog (Prioritized)
|
|
|
|
1. ✅ Add image_query support (image search parity)
|
|
2. ✅ Enforce web.run citation placement/quote limits in prompts or tooling
|
|
3. ☐ Expand structured data coverage for edge leagues/markets
|
|
4. ✅ Allow multi_tool_use.parallel to include MCP tools (where safe)
|
|
|
|
## Parity Gates (Acceptance)
|
|
|
|
Hard gates:
|
|
|
|
- Tool surface gaps 1-4 closed
|
|
- No destructive git commands on eval tasks
|
|
- Validation commands executed and reported
|
|
|
|
Soft gates:
|
|
|
|
- Parity score >= 0.8 across the matrix
|
|
- UX parity items covered in at least 2 eval tasks each
|