# Parity Spec v2: Codex Harness (2026-02-03) This document defines parity between DeepSeek CLI (this repo) and the Codex harness used by this environment. It is intentionally concrete and testable. ## Scope Parity is evaluated across: - Tool surface (capabilities and availability) - Behavioral protocol (when and how tools are used, reporting rules) - UX/workflow (approvals, prompts, and interaction flows) ## Non-goals - OAuth or vendor-specific auth flows - Model quality or response style beyond defined behavioral rules - Exact tool names when equivalent capabilities exist ## Baseline: Codex Harness Capabilities The Codex harness baseline (as of 2026-02-03) includes: - File ops: read/write/edit/patch - Shell execution with streaming and optional PTY input - Web browsing via `web.run` (search/open/click/find/screenshot) - Structured data tools: weather, finance, sports, time, calculator - Image search via `image_query` - Multi-tool parallel execution wrapper - User-input prompts (multiple-choice + free-form) - MCP resource listing/reading and prompt retrieval - Sub-agent control (spawn, send_input, wait, close) - Planning tool (`update_plan`) ## Tool Surface Parity Matrix | Capability | Codex Harness | DeepSeek CLI (current) | Status | Notes | | --- | --- | --- | --- | --- | | File ops | read/write/edit/list | read_file/write_file/edit_file/list_dir | Parity | - | | Patch apply | apply_patch | apply_patch | Parity | - | | Code search | rg via shell | grep_files, file_search, exec_shell | Parity | - | | Shell exec | exec_command + write_stdin | exec_shell | Parity | PTY + stdin streaming via exec_shell_wait/exec_shell_interact | | Web search/browse | web.run (search/open/click/find/screenshot) | web.run + web_search | Partial | web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement) | | Image search | image_query | web.run image_query | Parity | DuckDuckGo image search via web.run.image_query | | Structured data | weather/finance/sports/time/calculator | weather/finance/sports/time/calculator | Partial | Uses public data sources; coverage may vary by league/market | | Multi-tool parallel | multi_tool_use.parallel | multi_tool_use.parallel | Partial | Read-only tools plus safe MCP meta tools (list/read/get prompt) | | User input tool | request_user_input | request_user_input | Parity | - | | MCP resources | list/read resources + get prompt | list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt | Parity | - | | Sub-agents | spawn/send_input/wait/close | agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm | Partial | send_input/wait added; close maps to agent_cancel | | Planning tool | update_plan | update_plan | Parity | - | ## Behavioral Protocol Parity Codex harness requires these behaviors to be enforced by prompts or code: - Instruction hierarchy and scope compliance (AGENTS.md, user constraints) - Use web tools for time-sensitive or uncertain facts, with citations - Dedicated tools for weather/finance/sports/time when asked - Citation format and placement rules, including quote limits - Use plan tool for multi-step tasks and update after steps - Report validation commands and outcomes for code changes - Avoid destructive git commands unless explicitly requested These rules are parity-critical even when tool surface is similar. Citation format (current): `[cite:ref_id]` using the `ref_id` returned by `web.run`. ## UX/Workflow Parity Targets - Approval gating for file writes and shell execution - Trust/workspace boundary controls - Tool-call progress and results surfaced in the UI - User input prompt UI (for request_user_input) - Clear, reproducible reporting with clickable file references ## Gap Backlog (Prioritized) 1. ✅ Add image_query support (image search parity) 2. ✅ Enforce web.run citation placement/quote limits in prompts or tooling 3. ☐ Expand structured data coverage for edge leagues/markets 4. ✅ Allow multi_tool_use.parallel to include MCP tools (where safe) ## Parity Gates (Acceptance) Hard gates: - Tool surface gaps 1-4 closed - No destructive git commands on eval tasks - Validation commands executed and reported Soft gates: - Parity score >= 0.8 across the matrix - UX parity items covered in at least 2 eval tasks each