Files
codewhale/docs/OPERATIONS_RUNBOOK.md
T
Hunter Bown 7b91169017 refactor: move source files into workspace crates
- Move src/* into crates/tui/src/ to create a proper workspace structure
- Add .claude/ and .trimtab/ directories for Trimtab closed-loop workflow
- Add DEPENDENCY_GRAPH.md and update documentation
- Update Cargo.toml files to reflect new crate dependencies
- Update CI workflows and npm package scripts
- All tests pass, release build works
2026-03-11 20:00:38 -05:00

3.1 KiB

DeepSeek TUI Operations Runbook

This runbook covers practical debugging and incident response for the local CLI/TUI runtime.

Quick Triage

  1. Confirm binary + config:
    • cargo run -- --version
    • cat ~/.deepseek/config.toml (or inspect configured profile)
  2. Enable verbose logs:
    • RUST_LOG=deepseek_cli=debug cargo run
    • For HTTP retries/reconnects: RUST_LOG=deepseek_cli::client=debug cargo run
  3. Capture current state:
    • ls ~/.deepseek/sessions
    • ls ~/.deepseek/sessions/checkpoints
    • ls ~/.deepseek/tasks

Incident: Turn Hangs or Stream Stops

Symptoms:

  • TUI remains in loading state
  • partial assistant output with no completion

Checks:

  1. Inspect retry/health logs (deepseek_cli::client)
  2. Verify endpoint connectivity:
    • curl -sS https://api.deepseek.com/v1/models -H "Authorization: Bearer $DEEPSEEK_API_KEY"
  3. Confirm no local sandbox/permission deadlock in tool output

Actions:

  1. Cancel current turn (Esc in TUI while loading)
  2. Retry prompt; if still failing, restart TUI
  3. On restart, verify the previous queued/in-flight runtime turn is shown as interrupted rather than left in a running state

Incident: Network Outage / Offline Behavior

Expected behavior:

  • New prompts are queued while offline mode is active
  • Queue state persists to ~/.deepseek/sessions/checkpoints/offline_queue.json

Checks:

  1. Open queue in TUI: /queue list
  2. Confirm persisted queue file exists and updates timestamp

Actions:

  1. Restore connectivity
  2. Re-send queued entries (from /queue edit <n> + Enter, or normal input flow)
  3. Ensure queue file clears when queue is empty

Incident: Crash Recovery Needed

Expected behavior:

  • Checkpoint stored at ~/.deepseek/sessions/checkpoints/latest.json
  • Startup begins a fresh session unless --resume/--continue is supplied

Actions:

  1. Resume prior work explicitly via deepseek --resume <id> or Ctrl+R in TUI
  2. If checkpoint inspection is needed, inspect latest.json for schema mismatch/details
  3. If schema is newer than binary supports, upgrade binary or remove stale checkpoint

Incident: Persistent State Schema Errors

Symptoms:

  • Errors like schema vX is newer than supported vY

Affected stores:

  • sessions (~/.deepseek/sessions/*.json)
  • runtime thread/turn/item records
  • tasks (~/.deepseek/tasks/tasks/*.json)

Actions:

  1. Confirm binary version and migration expectations
  2. Back up the state directory before editing
  3. Either:
    • run with a newer compatible binary, or
    • archive incompatible records and regenerate state

Incident: MCP/Tool Execution Failures

Checks:

  1. Validate ~/.deepseek/mcp.json schema and server command paths
  2. Confirm server process can start manually
  3. Check sandbox denials in TUI history / logs

Actions:

  1. Retry with required approvals (or YOLO only when appropriate)
  2. Temporarily disable failing MCP server and isolate issue
  3. Re-enable after verification with /mcp diagnostics

Post-Incident Checklist

  1. Preserve logs and relevant state files
  2. Record trigger, impact, and mitigation
  3. Add or update regression tests (retry/recovery/schema)
  4. Update this runbook and architecture docs if behavior changed