feat(fleet): add manager runbook skill

This commit is contained in:
Hunter B
2026-06-12 22:26:35 -07:00
parent 33f112143d
commit fb1a27b4f2
3 changed files with 175 additions and 2 deletions
@@ -0,0 +1,106 @@
---
name: fleet-manager
description: Use when managing, triaging, restarting, escalating, or summarizing CodeWhale Agent Fleet runs and workers.
metadata:
short-description: Triage CodeWhale Agent Fleet runs
---
# Fleet Manager
Use this skill when acting as a manager agent for CodeWhale Agent Fleet runs.
Your job is to classify worker state, choose the narrowest safe typed action,
and leave a ledgered receipt or a safe escalation draft.
## Authority Boundary
- Prefer typed fleet surfaces over shell spelunking: `codewhale fleet status`,
`inspect`, `logs`, `artifacts`, `interrupt`, `restart`, `stop`, and the
Runtime API fleet endpoints.
- Do not read `.codewhale/fleet.jsonl`, host logs, or remote files directly
unless the typed command or API is missing required evidence.
- Do not send Slack, webhook, PagerDuty, email, or chat messages unless the
user or run config explicitly authorizes sending. Draft the message instead.
- Never include secrets, tokens, webhook URLs, routing keys, full prompts, or
oversized logs in a summary or escalation.
## Triage Loop
1. Identify the run and worker from the user request, run receipt, or fleet
status output. If no worker is named, start with `codewhale fleet status`.
2. Inspect the worker with `codewhale fleet inspect <worker-id>` or the matching
Runtime API worker endpoint.
3. Review bounded evidence with `codewhale fleet logs <worker-id>` and
`codewhale fleet artifacts <worker-id>`. Summarize artifact refs, not full
payloads.
4. Classify the state before acting:
- `transient failure`: transport error, timeout, stale heartbeat, host
unavailable, or retryable provider/network failure.
- `task failure`: worker completed the task but the result is wrong,
missing required artifacts, or reports a domain error.
- `verifier failure`: scorer/verifier failed or disagrees with the worker
result.
- `needs-human`: missing authority, unsafe secret boundary, destructive
action, repeated restart exhaustion, ambiguous product decision, or
conflict between artifacts and verifier.
5. Choose one typed action:
- transient and retry budget remains: `codewhale fleet restart <worker-id>`.
- transient but unsafe to retry: draft escalation and mark needs-human.
- task failure: preserve artifacts, summarize the failure, and avoid restart
unless the task spec says retrying can produce new evidence.
- verifier failure: inspect scorer inputs and artifacts, then escalate if the
verifier cannot be corrected through a typed action.
- needs-human: do not restart automatically; draft a concise escalation.
6. Record the result in the response: classification, action taken or drafted,
evidence commands, artifact refs, and next owner.
## Restart vs Escalate
Restart only when all of these are true:
- the failure is likely transient,
- the task is idempotent or the run policy allows retry,
- retry budget remains,
- no secret, permission, or destructive action boundary is involved, and
- the previous attempt produced enough receipt data to explain the restart.
Escalate when any of these are true:
- restart budget is exhausted,
- the worker requests secrets or new authority,
- artifacts indicate data loss, corruption, or destructive side effects,
- the verifier and task result conflict in a way you cannot resolve from typed
evidence,
- the same failure repeats after a restart, or
- a human product or release decision is required.
## Safe Escalation Draft
Use this shape for Slack/PagerDuty drafts. Keep logs to three short lines or an
artifact ref.
```text
CodeWhale fleet needs attention
Run: <run-id>
Worker: <worker-id>
Task: <task-id or unknown>
Classification: <transient failure | task failure | verifier failure | needs-human>
Reason: <one sentence, no secrets>
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
Safe log excerpt: <3 lines max or "see artifact <ref>">
Requested decision: <restart approval | verifier review | task owner review | permission decision>
```
## Post-Run Receipt
End every fleet-manager response with a compact receipt:
```text
Fleet receipt
Run: <run-id>
Workers checked: <count/list>
Classification: <state>
Action: <restart/interrupt/stop/escalation draft/no-op>
Ledger expectation: <typed action should be recorded | draft only, no send>
Artifacts reviewed: <refs>
Follow-up owner: <manager | task owner | human>
```
+8 -2
View File
@@ -4,13 +4,14 @@
use std::fs; use std::fs;
use std::path::Path; use std::path::Path;
const BUNDLED_SKILL_VERSION: &str = "3"; const BUNDLED_SKILL_VERSION: &str = "4";
const SKILL_CREATOR_BODY: &str = include_str!("../../assets/skills/skill-creator/SKILL.md"); const SKILL_CREATOR_BODY: &str = include_str!("../../assets/skills/skill-creator/SKILL.md");
const DELEGATE_BODY: &str = include_str!("../../assets/skills/delegate/SKILL.md"); const DELEGATE_BODY: &str = include_str!("../../assets/skills/delegate/SKILL.md");
const V4_BEST_PRACTICES_BODY: &str = include_str!("../../assets/skills/v4-best-practices/SKILL.md"); const V4_BEST_PRACTICES_BODY: &str = include_str!("../../assets/skills/v4-best-practices/SKILL.md");
const PLUGIN_CREATOR_BODY: &str = include_str!("../../assets/skills/plugin-creator/SKILL.md"); const PLUGIN_CREATOR_BODY: &str = include_str!("../../assets/skills/plugin-creator/SKILL.md");
const SKILL_INSTALLER_BODY: &str = include_str!("../../assets/skills/skill-installer/SKILL.md"); const SKILL_INSTALLER_BODY: &str = include_str!("../../assets/skills/skill-installer/SKILL.md");
const MCP_BUILDER_BODY: &str = include_str!("../../assets/skills/mcp-builder/SKILL.md"); const MCP_BUILDER_BODY: &str = include_str!("../../assets/skills/mcp-builder/SKILL.md");
const FLEET_MANAGER_BODY: &str = include_str!("../../assets/skills/fleet-manager/SKILL.md");
const DOCUMENTS_BODY: &str = include_str!("../../assets/skills/documents/SKILL.md"); const DOCUMENTS_BODY: &str = include_str!("../../assets/skills/documents/SKILL.md");
const PRESENTATIONS_BODY: &str = include_str!("../../assets/skills/presentations/SKILL.md"); const PRESENTATIONS_BODY: &str = include_str!("../../assets/skills/presentations/SKILL.md");
const SPREADSHEETS_BODY: &str = include_str!("../../assets/skills/spreadsheets/SKILL.md"); const SPREADSHEETS_BODY: &str = include_str!("../../assets/skills/spreadsheets/SKILL.md");
@@ -54,6 +55,11 @@ const BUNDLED_SKILLS: &[BundledSkill] = &[
body: MCP_BUILDER_BODY, body: MCP_BUILDER_BODY,
introduced_in: 3, introduced_in: 3,
}, },
BundledSkill {
name: "fleet-manager",
body: FLEET_MANAGER_BODY,
introduced_in: 4,
},
BundledSkill { BundledSkill {
name: "documents", name: "documents",
body: DOCUMENTS_BODY, body: DOCUMENTS_BODY,
@@ -370,7 +376,7 @@ mod tests {
let tmp = TempDir::new().unwrap(); let tmp = TempDir::new().unwrap();
// Simulate v2 where older bundled skills had been deliberately removed // Simulate v2 where older bundled skills had been deliberately removed
// before v3 introduced more system skills. // before later versions introduced more system skills.
fs::write(marker_file(&tmp), "2").unwrap(); fs::write(marker_file(&tmp), "2").unwrap();
install_system_skills(tmp.path()).unwrap(); install_system_skills(tmp.path()).unwrap();
+61
View File
@@ -269,6 +269,67 @@ POST /v1/fleet/runs/{run_id}/stop
Action endpoints call the same manager controls as the CLI and record their Action endpoints call the same manager controls as the CLI and record their
decisions in the fleet ledger. decisions in the fleet ledger.
## Manager-Agent Runbook
Manager agents should treat Fleet operations as typed, ledgered control-plane
work. Start with `codewhale fleet status`, then inspect one run or worker with
`codewhale fleet inspect <worker-id>`, `logs`, and `artifacts`. Use direct
reads of `.codewhale/fleet.jsonl`, host logs, or remote files only when the
typed CLI/API surface cannot provide the required evidence.
Classify the worker before taking action:
- `transient failure`: stale heartbeat, host timeout, interrupted transport,
retryable provider/network error, or an adapter status that can plausibly
recover without changing the task.
- `task failure`: the worker completed but produced an incorrect result,
domain failure, missing required artifact, or explicit task-level error.
- `verifier failure`: the worker result exists, but the scorer/verifier failed,
timed out, or disagrees with the receipt.
- `needs-human`: missing authority, secret request, destructive operation,
repeated restart exhaustion, ambiguous product decision, or conflicting
evidence that the manager cannot resolve from typed artifacts.
Choose one typed action:
- Restart a worker only when the failure is transient, retry budget remains,
the task is idempotent or retry-safe, and no permission or secret boundary is
involved: `codewhale fleet restart <worker-id>`.
- Interrupt or stop only when the current task is unsafe to continue or the
operator explicitly asks for cancellation: `codewhale fleet interrupt
<worker-id>` or `codewhale fleet stop --all`.
- Do not restart pure task failures by default; preserve artifacts and hand the
receipt to the task owner unless the task spec says retrying can produce new
evidence.
- For verifier failures, inspect scorer inputs and artifact refs first. If the
verifier cannot be corrected through typed fleet actions, escalate for human
review.
- For `needs-human`, draft an escalation instead of sending it unless alert
config explicitly authorizes sending.
Safe Slack or PagerDuty draft:
```text
CodeWhale fleet needs attention
Run: <run-id>
Worker: <worker-id>
Task: <task-id or unknown>
Classification: <transient failure | task failure | verifier failure | needs-human>
Reason: <one sentence, no secrets>
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
Safe log excerpt: <3 lines max or "see artifact <ref>">
Requested decision: <restart approval | verifier review | task owner review | permission decision>
```
Post-run summaries should include the run id, workers checked, classification,
typed action taken or drafted, expected ledger effect, artifact refs reviewed,
and next owner. Keep summaries bounded; link artifact refs instead of copying
full logs or transcripts.
The bundled `fleet-manager` skill mirrors this runbook for manager agents. It
is a first-party system skill and should be discoverable through the normal
skill registry after system skills are installed or refreshed.
## Host Adapters ## Host Adapters
The host adapter boundary supports local child processes and explicit SSH The host adapter boundary supports local child processes and explicit SSH