codewhale/AGENTS.md
Hunter Bown 5bfc1feb62 v0.8.6: survivability, UX polish, and release hardening
Merge the v0.8.6 feature batch and release hardening.

Includes the full #373-#380/#382-#402 milestone scope, version bump to 0.8.6, secure /share temp-file handling, Windows-safe self-update replacement, and CI portability fixes.

Remote PR checks passed on the final head before merge.
2026-05-02 20:11:33 -05:00

Project Instructions

This file provides context for AI assistants working on this project.

Project Type: Rust

Commands

  • Build: cargo build (default-members include the deepseek dispatcher)
  • Test: cargo test --workspace --all-features
  • Lint: cargo clippy --workspace --all-targets --all-features
  • Format: cargo fmt --all
  • Run (canonical): deepseek — use the deepseek binary, not deepseek-tui. The dispatcher delegates to the TUI for interactive use and is the supported entry point for every flow (deepseek, deepseek -p "...", deepseek doctor, deepseek mcp …, etc.).
  • Run from source: cargo run --bin deepseek (or cargo run -p deepseek-tui-cli).
  • Local dev shorthand: after cargo build --release, run ./target/release/deepseek.

Build Dependencies

  • Rust 1.85+ (for the workspace)

Documentation

See README.md for project overview, docs/ARCHITECTURE.md for internals.

DeepSeek-Specific Notes

  • Thinking Tokens: DeepSeek models output thinking blocks (ContentBlock::Thinking) before final answers. The TUI streams and displays these with visual distinction.
  • Reasoning Models: deepseek-v4-pro and deepseek-v4-flash are the documented V4 model IDs. Legacy deepseek-chat and deepseek-reasoner are compatibility aliases for deepseek-v4-flash.
  • Large Context Window: DeepSeek V4 models have 1M-token context windows. Use search tools to navigate efficiently.
  • API: OpenAI-compatible Chat Completions (/chat/completions) is the documented DeepSeek API path. Base URL configurable for global (api.deepseek.com) or China (api.deepseeki.com); /v1 is accepted for OpenAI SDK compatibility, and /beta is only needed for beta features such as strict tool mode, chat prefix completion, and FIM completion.
  • Thinking + Tool Calls: In V4 thinking mode, assistant messages that contain tool calls must replay their reasoning_content in all subsequent requests or the API returns HTTP 400.
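
The reasoning_content replay rule above can be checked before a request is sent. The following is a minimal sketch, not the project's actual code: `Message`, `ToolCall`, and `missing_reasoning` are hypothetical names invented for illustration, and the real types in the workspace may look quite different.

```rust
// Sketch only: these types are hypothetical stand-ins for whatever the
// workspace really uses to model chat history.

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
    /// Reasoning text returned by the API in thinking mode, if any.
    pub reasoning_content: Option<String>,
    pub tool_calls: Vec<ToolCall>,
}

/// Returns the indices of assistant messages that contain tool calls but no
/// `reasoning_content`. Per the note above, sending such a history in V4
/// thinking mode is expected to be rejected with HTTP 400.
pub fn missing_reasoning(history: &[Message]) -> Vec<usize> {
    history
        .iter()
        .enumerate()
        .filter(|(_, m)| {
            m.role == "assistant"
                && !m.tool_calls.is_empty()
                && m.reasoning_content.is_none()
        })
        .map(|(i, _)| i)
        .collect()
}
```

An empty result means every tool-calling assistant message still carries its reasoning; a non-empty result points at the messages that were stripped somewhere between response and replay.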

GitHub Operations

Use the gh CLI (/opt/homebrew/bin/gh) for all GitHub operations — issues, PRs, branches, labels. It's already authenticated as Hmbown (token scopes: gist, read:org, repo, workflow). Examples:

  • List open issues: gh issue list --state open --limit 20
  • View an issue: gh issue view <number>
  • Create an issue branch: gh issue develop <number> --name feat/issue-<number>-<slug>
  • Close a verified issue: gh issue close <number> --comment "..."
  • Create a PR: gh pr create --base feat/v0.6.2 --title "..." --body "..."
  • Check PR status: gh pr view <number>
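
The feat/issue-<number>-<slug> branch pattern used above can be generated from an issue number and title. This is a rough sketch under assumptions: `slugify` and `issue_branch_name` are hypothetical helpers, and the slug rules (lowercase ASCII alphanumerics, every other run of characters collapsed to one hyphen) are a guess at a sensible convention, not something the project specifies.

```rust
/// Hypothetical helper: lowercases ASCII letters and digits and collapses any
/// run of other characters into a single hyphen, with no leading or trailing
/// hyphen.
pub fn slugify(title: &str) -> String {
    let mut slug = String::new();
    let mut pending_hyphen = false;
    for ch in title.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit the separator lazily so trailing junk never adds a hyphen.
            if pending_hyphen && !slug.is_empty() {
                slug.push('-');
            }
            slug.push(ch.to_ascii_lowercase());
            pending_hyphen = false;
        } else {
            pending_hyphen = true;
        }
    }
    slug
}

/// Hypothetical helper: builds a branch name in the feat/issue-<number>-<slug>
/// shape, dropping the slug part when the title yields nothing usable.
pub fn issue_branch_name(number: u32, title: &str) -> String {
    let slug = slugify(title);
    if slug.is_empty() {
        format!("feat/issue-{number}")
    } else {
        format!("feat/issue-{number}-{slug}")
    }
}
```

The resulting string is what would be passed as the branch name when running gh issue develop.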

Prefer gh over fetch_url or web_search for GitHub data — it's faster, authenticated, and avoids rate limits. Issues may be closed when the acceptance criteria have been verified or when the user explicitly asks for closure; avoid closing unrelated issues opportunistically.

Important Notes

  • Token/cost tracking inaccuracies: Token counting and cost estimation may be inflated due to thinking token accounting bugs. Use /compact to manage context, and treat cost estimates as approximate.
  • Modes: Three modes — Plan (read-only investigation), Agent (tool use with approval), YOLO (auto-approved). See docs/MODES.md for details.
  • Sub-agents: The only model-callable surface is agent_spawn (returns an agent_id immediately; the parent keeps working) plus its companions agent_wait / agent_result / agent_cancel / agent_list / agent_send_input / agent_resume / agent_assign. The old agent_swarm / spawn_agents_on_csv / /swarm surface was removed in v0.8.5 (#336).
  • rlm tool (crates/tui/src/tools/rlm.rs): a sandboxed Python REPL where a sub-LLM can call in-REPL helpers (llm_query(), llm_query_batched(), rlm_query(), rlm_query_batched()). Those *_query names are Python helpers inside the REPL, not separately registered model-visible tools. The rlm tool itself is always loaded, in all modes.

Session Longevity (Critical)

Long sessions in DeepSeek TUI WILL degrade and crash if you work sequentially. The session accumulates every message and tool result in api_messages and history with no automatic pruning (auto-compaction has been disabled by default since v0.6.6). Every session save serializes the entire bloated array to disk.

To survive a multi-hour sprint:

  1. Delegate everything to sub-agents. Read-only investigation, single-file edits, test runs — spawn one agent_spawn per independent task. You are the coordinator, not the worker. Sub-agents start fresh sessions with clean context. Your session stays small.

  2. Batch tool calls. Never fire one read_file and wait. Fire 3 read_file + 2 grep_files + 1 git_status in one turn. The dispatcher runs them in parallel.

  3. Compact aggressively. Suggest /compact at 60% context usage, not 80%. A compacted session that stays fast beats a dead session every time.

  4. Max 3 sequential turns before delegating. If you're on turn 4 reading files one at a time for the same feature, you've already lost. Spawn sub-agents.

  5. Use RLM for batch classification. Need to categorize 15 files? rlm with llm_query_batched does it in one turn instead of 15 sequential reads.

  6. After every 3 turns, check: context under 60%? Sub-agents still running? PRs ready to push? cargo check still passes?
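
The thresholds in the list above (compact at 60% context usage, delegate after 3 sequential turns) can be written down as a small check. This is an illustrative sketch only: `SessionState`, `Advice`, and `advise` are hypothetical names, not part of the TUI, and reading "max 3 sequential turns" as "delegate once 3 turns have been spent" is an interpretation.

```rust
/// Suggest /compact once context usage reaches this percentage.
pub const COMPACT_THRESHOLD_PERCENT: u64 = 60;
/// Delegate once this many sequential turns have gone into the same task.
pub const MAX_SEQUENTIAL_TURNS: u32 = 3;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Advice {
    SuggestCompact,
    Delegate,
}

/// Hypothetical snapshot of a session; field names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct SessionState {
    pub used_tokens: u64,
    pub context_window: u64,
    pub sequential_turns: u32,
}

/// Returns the actions the checklist would call for. An empty vector means
/// the session is within both limits.
pub fn advise(state: &SessionState) -> Vec<Advice> {
    let mut advice = Vec::new();
    // used / window >= 60%, done in integer arithmetic to avoid float rounding.
    if state.context_window > 0
        && state.used_tokens.saturating_mul(100)
            >= state.context_window.saturating_mul(COMPACT_THRESHOLD_PERCENT)
    {
        advice.push(Advice::SuggestCompact);
    }
    if state.sequential_turns >= MAX_SEQUENTIAL_TURNS {
        advice.push(Advice::Delegate);
    }
    advice
}
```

With a 1M-token window, for example, the compact suggestion would first appear at 600,000 used tokens.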

The "mismanaged genius" problem: The system prompt was written for a less capable model and treated sub-agents, RLM, and parallel execution as specialty escape hatches. The model can do all of this; the prompt just didn't encourage it strongly enough. This was fixed in v0.8.6 (see PROMPT_ANALYSIS.md).