# DeepSeek CLI Operations Runbook

This runbook covers practical debugging and incident response for the local CLI/TUI runtime.

## Quick Triage

1. Confirm binary + config:
   - `cargo run -- --version`
   - `cat ~/.deepseek/config.toml` (or inspect configured profile)
2. Enable verbose logs:
   - `RUST_LOG=deepseek_cli=debug cargo run`
   - For HTTP retries/reconnects: `RUST_LOG=deepseek_cli::client=debug cargo run`
3. Capture current state:
   - `ls ~/.deepseek/sessions`
   - `ls ~/.deepseek/sessions/checkpoints`
   - `ls ~/.deepseek/tasks`

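The state capture in step 3 can be bundled into a single snapshot file so the listing survives later cleanup. This is a minimal sketch, assuming the state root is `~/.deepseek` as in the paths above. The function name `capture_state` and the default output file name are hypothetical and not part of the CLI.

```shell
# Hypothetical helper (not part of the CLI): write a listing of the state
# directories from step 3 into one text file and print that file's path.
capture_state() {
    state_root="${1:-$HOME/.deepseek}"   # assumed state root
    out_file="${2:-./deepseek-state-$(date +%Y%m%dT%H%M%S).txt}"
    for sub in sessions sessions/checkpoints tasks; do
        printf '== %s ==\n' "$state_root/$sub"
        if [ -d "$state_root/$sub" ]; then
            ls -l "$state_root/$sub"
        else
            # Record absent directories explicitly instead of failing.
            printf '(missing)\n'
        fi
    done > "$out_file"
    printf '%s\n' "$out_file"
}
```

Calling `capture_state` with no arguments snapshots `~/.deepseek` into the current directory. Taking one snapshot before and one after a mitigation makes it easy to `diff` the two.
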
## Incident: Turn Hangs or Stream Stops

Symptoms:
- TUI remains in loading state
- Partial assistant output with no completion

Checks:
1. Inspect retry/health logs (`deepseek_cli::client`)
2. Verify endpoint connectivity:
   - `curl -sS https://api.deepseek.com/v1/models -H "Authorization: Bearer $DEEPSEEK_API_KEY"`
3. Confirm tool output shows no local sandbox/permission deadlock

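The connectivity check in step 2 is easier to read when the HTTP status is reduced to a triage hint. The sketch below wraps the same `curl` call. The helper names `classify_status` and `probe_endpoint` are hypothetical, and the mapping follows general HTTP status semantics rather than documented behavior of this API.

```shell
# Hypothetical helper: map an HTTP status code to a coarse triage hint.
classify_status() {
    case "$1" in
        2??) echo "ok" ;;
        401|403) echo "auth" ;;
        429) echo "rate-limited" ;;
        5??) echo "server-error" ;;
        000) echo "unreachable" ;;   # curl reports 000 when no response arrived
        *) echo "unexpected" ;;
    esac
}

# Probe the models endpoint from check 2 and print "<code> <hint>".
probe_endpoint() {
    url="${1:-https://api.deepseek.com/v1/models}"
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        -H "Authorization: Bearer ${DEEPSEEK_API_KEY:-}" "$url" 2>/dev/null)
    printf '%s %s\n' "${code:-000}" "$(classify_status "${code:-000}")"
}
```

`probe_endpoint` prints the status code followed by the hint. An `auth` result points at the API key, while `unreachable` points at local networking or DNS rather than the CLI itself.
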
Actions:
1. Cancel current turn (`Esc` in TUI)
2. Retry prompt; if still failing, restart TUI
3. On restart, verify crash checkpoint recovery message appears

## Incident: Network Outage / Offline Behavior

Expected behavior:
- New prompts are queued while offline mode is active
- Queue state persists to `~/.deepseek/sessions/checkpoints/offline_queue.json`

Checks:
1. Open queue in TUI: `/queue list`
2. Confirm the persisted queue file exists and its modification timestamp updates

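Check 2 can be scripted by comparing the queue file against a marker file touched before the test prompt is submitted. This is a minimal sketch: the helper name `queue_file_status` and the marker-file convention are assumptions, and it relies on the `-nt` file test, which bash, dash, and zsh support but POSIX does not require.

```shell
# Hypothetical helper: report whether the persisted offline queue exists and
# whether it was modified after a reference (marker) file.
queue_file_status() {
    queue_file="${1:-$HOME/.deepseek/sessions/checkpoints/offline_queue.json}"
    marker="${2:-}"   # optional: any file touched before the test prompt
    if [ ! -f "$queue_file" ]; then
        echo "missing"
    elif [ -z "$marker" ]; then
        echo "present"
    elif [ "$queue_file" -nt "$marker" ]; then
        echo "updated"
    else
        echo "unchanged"
    fi
}
```

A typical sequence is `touch /tmp/queue.marker`, submit a prompt while offline, then run `queue_file_status "" /tmp/queue.marker`. The empty first argument falls back to the default queue path.
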
Actions:
1. Restore connectivity
2. Re-send queued entries (via `/queue edit <n>` followed by Enter, or the normal input flow)
3. Confirm the queue file clears once the queue is empty

## Incident: Crash Recovery Needed

Expected behavior:
- Checkpoint stored at `~/.deepseek/sessions/checkpoints/latest.json`
- Startup auto-restores checkpoint when no explicit `--resume` target is supplied

Actions:
1. Start TUI normally and verify "Recovered checkpoint session" status
2. If automatic recovery fails, inspect checkpoint JSON for schema mismatch
3. If schema is newer than binary supports, upgrade binary or remove stale checkpoint

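Actions 2 and 3 can be sketched as two small helpers: one reads the checkpoint's schema version, the other moves a stale checkpoint aside instead of deleting it. Both are hypothetical. In particular, the field name `schema_version` is an assumption about the checkpoint format, and the extraction is a plain text match rather than a JSON parse.

```shell
# Assumption: the checkpoint has an integer field named "schema_version".
# The real field name may differ; adjust the pattern after inspecting the file.
checkpoint_schema_version() {
    sed -n 's/.*"schema_version"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" \
        | head -n 1
}

# Rename a stale checkpoint with a timestamp suffix so it can be restored later.
quarantine_checkpoint() {
    checkpoint="${1:-$HOME/.deepseek/sessions/checkpoints/latest.json}"
    [ -f "$checkpoint" ] || { echo "no checkpoint at $checkpoint" >&2; return 1; }
    stale="$checkpoint.stale-$(date +%Y%m%dT%H%M%S)"
    mv "$checkpoint" "$stale" && printf '%s\n' "$stale"
}
```

Renaming keeps the evidence for the post-incident review and lets a newer binary pick the checkpoint up again once it is renamed back.
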
## Incident: Persistent State Schema Errors

Symptoms:
- Errors like `schema vX is newer than supported vY`

Affected stores:
- sessions (`~/.deepseek/sessions/*.json`)
- runtime thread/turn/item records
- tasks (`~/.deepseek/tasks/tasks/*.json`)

Actions:
1. Confirm binary version and migration expectations
2. Back up the state directory before editing
3. Either:
   - run with a newer compatible binary, or
   - archive incompatible records and regenerate state

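The backup in action 2 can be a timestamped archive of the whole state directory. This is a minimal sketch, assuming the state root is `~/.deepseek`; the function name `backup_state_dir` and the archive naming are hypothetical.

```shell
# Hypothetical helper: archive the state directory before any manual edit
# and print the path of the archive that was written.
backup_state_dir() {
    state_root="${1:-$HOME/.deepseek}"   # assumed state root
    dest_dir="${2:-$HOME}"               # keep this outside the state root
    [ -d "$state_root" ] || { echo "no state directory at $state_root" >&2; return 1; }
    archive="$dest_dir/deepseek-state-$(date +%Y%m%dT%H%M%S).tar.gz"
    # -C stores paths relative to the parent of the state directory.
    tar -czf "$archive" -C "$(dirname "$state_root")" "$(basename "$state_root")" \
        && printf '%s\n' "$archive"
}
```

Because the default state root also holds `config.toml`, the archive may contain credentials and should be handled accordingly. `tar -tzf <archive>` lists its contents without extracting.
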
## Incident: MCP/Tool Execution Failures

Checks:
1. Validate `~/.deepseek/mcp.json` schema and server command paths
2. Confirm server process can start manually
3. Check sandbox denials in TUI history / logs

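For the command-path part of check 1, each server command copied out of `mcp.json` can be tested for resolvability before trying to start it. The helper below is a hypothetical sketch. It does not parse `mcp.json`, whose layout is not described here, so the command name is passed in by hand.

```shell
# Hypothetical helper: check that a server command resolves in this shell.
# Note: `command -v` also succeeds for shell builtins and functions.
check_server_command() {
    cmd="$1"
    if resolved=$(command -v "$cmd" 2>/dev/null) && [ -n "$resolved" ]; then
        printf 'ok: %s -> %s\n' "$cmd" "$resolved"
    else
        printf 'not found: %s\n' "$cmd" >&2
        return 1
    fi
}
```

A `not found` result for a command that works in an interactive shell usually means the runtime is launched with a different `PATH`, which is worth ruling out before suspecting the sandbox.
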
Actions:
1. Retry with required approvals (or YOLO only when appropriate)
2. Temporarily disable failing MCP server and isolate issue
3. Re-enable after verification with `/mcp` diagnostics

## Post-Incident Checklist

1. Preserve logs and relevant state files
2. Record trigger, impact, and mitigation
3. Add or update regression tests (retry/recovery/schema)
4. Update this runbook and architecture docs if behavior changed