codewhale/AGENTS.md

# Project Instructions

This file provides context for AI assistants working on this project.

## Project Type: Rust

### Commands
- Build: `cargo build` (default-members include the `deepseek` dispatcher)
- Test: `cargo test --workspace --all-features`
- Lint: `cargo clippy --workspace --all-targets --all-features`
- Format: `cargo fmt --all`
- Run (canonical): `deepseek` — use the **`deepseek` binary**, not `deepseek-tui`. The dispatcher delegates to the TUI for interactive use and is the supported entry point for every flow (`deepseek`, `deepseek -p "..."`, `deepseek doctor`, `deepseek mcp …`, etc.).
- Run from source: `cargo run --bin deepseek` (or `cargo run -p deepseek-tui-cli`).
- Local dev shorthand: after `cargo build --release`, run `./target/release/deepseek`.

### Build Dependencies
- **Rust** 1.85+ (for the workspace)

### Documentation
See README.md for project overview, docs/ARCHITECTURE.md for internals.

## DeepSeek-Specific Notes

- **Thinking Tokens**: DeepSeek models output thinking blocks (`ContentBlock::Thinking`) before final answers. The TUI streams and displays these with visual distinction.
- **Reasoning Models**: `deepseek-v4-pro` and `deepseek-v4-flash` are the documented V4 model IDs. Legacy `deepseek-chat` and `deepseek-reasoner` are compatibility aliases for `deepseek-v4-flash`.
- **Large Context Window**: DeepSeek V4 models have 1M-token context windows. Use search tools to navigate efficiently.
- **API**: OpenAI-compatible Chat Completions (`/chat/completions`) is the documented DeepSeek API path. Base URL configurable for global (`api.deepseek.com`) or China (`api.deepseeki.com`); `/v1` is accepted for OpenAI SDK compatibility, and `/beta` is only needed for beta features such as strict tool mode, chat prefix completion, and FIM completion.
- **Thinking + Tool Calls**: In V4 thinking mode, assistant messages that contain tool calls must replay their `reasoning_content` in all subsequent requests or the API returns HTTP 400.

## GitHub Operations

Use the **`gh` CLI** (`/opt/homebrew/bin/gh`) for all GitHub operations — issues, PRs, branches, labels. It's already authenticated as `Hmbown` (token scopes: `gist`, `read:org`, `repo`, `workflow`). Examples:

- List open issues: `gh issue list --state open --limit 20`
- View an issue: `gh issue view <number>`
- Create an issue branch: `gh issue develop <number> --branch-name feat/issue-<number>-<slug>`
- Close a verified issue: `gh issue close <number> --comment "..."`
- Create a PR: `gh pr create --base feat/v0.6.2 --title "..." --body "..."`
- Check PR status: `gh pr view <number>`

Prefer `gh` over `fetch_url` or `web_search` for GitHub data — it's faster, authenticated, and avoids rate limits.
Issues may be closed when the acceptance criteria have been verified or when the user explicitly asks for closure; avoid closing unrelated issues opportunistically.

## Important Notes

- **Token/cost tracking inaccuracies**: Token counting and cost estimation may be inflated due to thinking token accounting bugs. Use `/compact` to manage context, and treat cost estimates as approximate.
- **Modes**: Three modes — Plan (read-only investigation), Agent (tool use with approval), YOLO (auto-approved). See `docs/MODES.md` for details.
- **Sub-agents**: Single model-callable surface is `agent_spawn` (returns an `agent_id` immediately; parent keeps working) plus `agent_wait` / `agent_result` / `agent_cancel` / `agent_list` / `agent_send_input` / `agent_resume` / `agent_assign`. The old `agent_swarm` / `spawn_agents_on_csv` / `/swarm` surface was removed in v0.8.5 (#336).
- **`rlm` tool** (`crates/tui/src/tools/rlm.rs`): a sandboxed Python REPL where a sub-LLM can call in-REPL helpers (`llm_query()`, `llm_query_batched()`, `rlm_query()`, `rlm_query_batched()`) — those `*_query` names are **Python helpers inside the REPL**, not separately-registered model-visible tools. Always loaded across all modes.

## Session Longevity (Critical)

Long sessions in DeepSeek TUI WILL degrade and crash if you work sequentially. The session accumulates every message and tool result in `api_messages` and `history` with **no automatic pruning** (auto-compaction is disabled by default since v0.6.6). Session saves serialize the entire bloated array to disk.

**To survive a multi-hour sprint:**

1. **Delegate everything to sub-agents.** Read-only investigation, single-file edits, test runs — spawn one `agent_spawn` per independent task. You are the coordinator, not the worker. Sub-agents start fresh sessions with clean context. Your session stays small.

2. **Batch tool calls.** Never fire one `read_file` and wait. Fire 3 `read_file` + 2 `grep_files` + 1 `git_status` in one turn. The dispatcher runs them in parallel.

3. **Compact aggressively.** Suggest `/compact` at 60% context usage, not 80%. A compacted session that stays fast beats a dead session every time.

4. **Max 3 sequential turns before delegating.** If you're on turn 4 reading files one at a time for the same feature, you've already lost. Spawn sub-agents.

5. **Use RLM for batch classification.** Need to categorize 15 files? `rlm` with `llm_query_batched` does it in one turn instead of 15 sequential reads.

6. **After every 3 turns, check:** context under 60%? Sub-agents still running? PRs ready to push? `cargo check` still passes?

**The "mismanaged genius" problem:** The system prompt was written for a less capable model and treats sub-agents, RLM, and parallel execution as specialty escape hatches. The model *can* do all of this — the prompt just doesn't encourage it strongly enough. We fixed this in v0.8.6 (see `PROMPT_ANALYSIS.md`).