Files
codewhale/PARITY.md
T
Hunter Bown 702f9c353c Release v0.3.14
- Add image_query support to web.run (DuckDuckGo image search)
- Encode tool-call function names for Chat Completions history rebuild
- Allow safe MCP meta tools in multi_tool_use.parallel
- Update prompts for stronger citation placement and quote-limit guidance
2026-02-05 09:28:39 -06:00

4.2 KiB

Parity Spec v2: Codex Harness (2026-02-03)

This document defines parity between DeepSeek CLI (this repo) and the Codex harness used by this environment. It is intentionally concrete and testable.

Scope

Parity is evaluated across:

  • Tool surface (capabilities and availability)
  • Behavioral protocol (when and how tools are used, reporting rules)
  • UX/workflow (approvals, prompts, and interaction flows)

Non-goals

  • OAuth or vendor-specific auth flows
  • Model quality or response style beyond defined behavioral rules
  • Exact tool names when equivalent capabilities exist

Baseline: Codex Harness Capabilities

The Codex harness baseline (as of 2026-02-03) includes:

  • File ops: read/write/edit/patch
  • Shell execution with streaming and optional PTY input
  • Web browsing via web.run (search/open/click/find/screenshot)
  • Structured data tools: weather, finance, sports, time, calculator
  • Image search via image_query
  • Multi-tool parallel execution wrapper
  • User-input prompts (multiple-choice + free-form)
  • MCP resource listing/reading and prompt retrieval
  • Sub-agent control (spawn, send_input, wait, close)
  • Planning tool (update_plan)

Tool Surface Parity Matrix

Capability Codex Harness DeepSeek CLI (current) Status Notes
File ops read/write/edit/list read_file/write_file/edit_file/list_dir Parity -
Patch apply apply_patch apply_patch Parity -
Code search rg via shell grep_files, file_search, exec_shell Parity -
Shell exec exec_command + write_stdin exec_shell Parity PTY + stdin streaming via exec_shell_wait/exec_shell_interact
Web search/browse web.run (search/open/click/find/screenshot) web.run + web_search Partial web.run implemented; citation placement + quote limits enforced via prompts (no word-limit enforcement)
Image search image_query web.run image_query Parity DuckDuckGo image search via web.run.image_query
Structured data weather/finance/sports/time/calculator weather/finance/sports/time/calculator Partial Uses public data sources; coverage may vary by league/market
Multi-tool parallel multi_tool_use.parallel multi_tool_use.parallel Partial Read-only tools plus safe MCP meta tools (list/read/get prompt)
User input tool request_user_input request_user_input Parity -
MCP resources list/read resources + get prompt list_mcp_resources, list_mcp_resource_templates, mcp_read_resource, mcp_get_prompt Parity -
Sub-agents spawn/send_input/wait/close agent_spawn/send_input/wait/agent_cancel/agent_list/agent_swarm Partial send_input/wait added; close maps to agent_cancel
Planning tool update_plan update_plan Parity -

Behavioral Protocol Parity

Codex harness requires these behaviors to be enforced by prompts or code:

  • Instruction hierarchy and scope compliance (AGENTS.md, user constraints)
  • Use web tools for time-sensitive or uncertain facts, with citations
  • Dedicated tools for weather/finance/sports/time when asked
  • Citation format and placement rules, including quote limits
  • Use plan tool for multi-step tasks and update after steps
  • Report validation commands and outcomes for code changes
  • Avoid destructive git commands unless explicitly requested

These rules are parity-critical even when tool surface is similar.

Citation format (current): [cite:ref_id] using the ref_id returned by web.run.

UX/Workflow Parity Targets

  • Approval gating for file writes and shell execution
  • Trust/workspace boundary controls
  • Tool-call progress and results surfaced in the UI
  • User input prompt UI (for request_user_input)
  • Clear, reproducible reporting with clickable file references

Gap Backlog (Prioritized)

  1. Add image_query support (image search parity)
  2. Enforce web.run citation placement/quote limits in prompts or tooling
  3. ☐ Expand structured data coverage for edge leagues/markets
  4. Allow multi_tool_use.parallel to include MCP tools (where safe)

Parity Gates (Acceptance)

Hard gates:

  • Tool surface gaps 1-4 closed
  • No destructive git commands on eval tasks
  • Validation commands executed and reported

Soft gates:

  • Parity score >= 0.8 across the matrix
  • UX parity items covered in at least 2 eval tasks each