merge #3164 Fleet manager runbook skill

This commit is contained in:
Hunter B
2026-06-12 22:26:49 -07:00
3 changed files with 175 additions and 2 deletions
@@ -0,0 +1,106 @@
---
name: fleet-manager
description: Use when managing, triaging, restarting, escalating, or summarizing CodeWhale Agent Fleet runs and workers.
metadata:
short-description: Triage CodeWhale Agent Fleet runs
---
# Fleet Manager
Use this skill when acting as a manager agent for CodeWhale Agent Fleet runs.
Your job is to classify worker state, choose the narrowest safe typed action,
and leave a ledgered receipt or a safe escalation draft.
## Authority Boundary
- Prefer typed fleet surfaces over shell spelunking: `codewhale fleet status`,
`inspect`, `logs`, `artifacts`, `interrupt`, `restart`, `stop`, and the
Runtime API fleet endpoints.
- Do not read `.codewhale/fleet.jsonl`, host logs, or remote files directly
unless the typed command or API is missing required evidence.
- Do not send Slack, webhook, PagerDuty, email, or chat messages unless the
user or run config explicitly authorizes sending. Draft the message instead.
- Never include secrets, tokens, webhook URLs, routing keys, full prompts, or
oversized logs in a summary or escalation.
## Triage Loop
1. Identify the run and worker from the user request, run receipt, or fleet
status output. If no worker is named, start with `codewhale fleet status`.
2. Inspect the worker with `codewhale fleet inspect <worker-id>` or the matching
Runtime API worker endpoint.
3. Review bounded evidence with `codewhale fleet logs <worker-id>` and
`codewhale fleet artifacts <worker-id>`. Summarize artifact refs, not full
payloads.
4. Classify the state before acting:
- `transient failure`: transport error, timeout, stale heartbeat, host
unavailable, or retryable provider/network failure.
- `task failure`: worker completed the task but the result is wrong,
missing required artifacts, or reports a domain error.
- `verifier failure`: scorer/verifier failed or disagrees with the worker
result.
- `needs-human`: missing authority, unsafe secret boundary, destructive
action, repeated restart exhaustion, ambiguous product decision, or
conflict between artifacts and verifier.
5. Choose one typed action:
- transient and retry budget remains: `codewhale fleet restart <worker-id>`.
- transient but unsafe to retry: draft escalation and mark needs-human.
- task failure: preserve artifacts, summarize the failure, and avoid restart
unless the task spec says retrying can produce new evidence.
- verifier failure: inspect scorer inputs and artifacts, then escalate if the
verifier cannot be corrected through a typed action.
- needs-human: do not restart automatically; draft a concise escalation.
6. Record the result in the response: classification, action taken or drafted,
evidence commands, artifact refs, and next owner.
## Restart vs Escalate
Restart only when all of these are true:
- the failure is likely transient,
- the task is idempotent or the run policy allows retry,
- retry budget remains,
- no secret, permission, or destructive action boundary is involved, and
- the previous attempt produced enough receipt data to explain the restart.
Escalate when any of these are true:
- restart budget is exhausted,
- the worker requests secrets or new authority,
- artifacts indicate data loss, corruption, or destructive side effects,
- the verifier and task result conflict in a way you cannot resolve from typed
evidence,
- the same failure repeats after a restart, or
- a human product or release decision is required.
## Safe Escalation Draft
Use this shape for Slack/PagerDuty drafts. Keep logs to three short lines or an
artifact ref.
```text
CodeWhale fleet needs attention
Run: <run-id>
Worker: <worker-id>
Task: <task-id or unknown>
Classification: <transient failure | task failure | verifier failure | needs-human>
Reason: <one sentence, no secrets>
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
Safe log excerpt: <3 lines max or "see artifact <ref>">
Requested decision: <restart approval | verifier review | task owner review | permission decision>
```
## Post-Run Receipt
End every fleet-manager response with a compact receipt:
```text
Fleet receipt
Run: <run-id>
Workers checked: <count/list>
Classification: <state>
Action: <restart/interrupt/stop/escalation draft/no-op>
Ledger expectation: <typed action should be recorded | draft only, no send>
Artifacts reviewed: <refs>
Follow-up owner: <manager | task owner | human>
```
+8 -2
View File
@@ -4,13 +4,14 @@
use std::fs;
use std::path::Path;
const BUNDLED_SKILL_VERSION: &str = "3";
const BUNDLED_SKILL_VERSION: &str = "4";
const SKILL_CREATOR_BODY: &str = include_str!("../../assets/skills/skill-creator/SKILL.md");
const DELEGATE_BODY: &str = include_str!("../../assets/skills/delegate/SKILL.md");
const V4_BEST_PRACTICES_BODY: &str = include_str!("../../assets/skills/v4-best-practices/SKILL.md");
const PLUGIN_CREATOR_BODY: &str = include_str!("../../assets/skills/plugin-creator/SKILL.md");
const SKILL_INSTALLER_BODY: &str = include_str!("../../assets/skills/skill-installer/SKILL.md");
const MCP_BUILDER_BODY: &str = include_str!("../../assets/skills/mcp-builder/SKILL.md");
const FLEET_MANAGER_BODY: &str = include_str!("../../assets/skills/fleet-manager/SKILL.md");
const DOCUMENTS_BODY: &str = include_str!("../../assets/skills/documents/SKILL.md");
const PRESENTATIONS_BODY: &str = include_str!("../../assets/skills/presentations/SKILL.md");
const SPREADSHEETS_BODY: &str = include_str!("../../assets/skills/spreadsheets/SKILL.md");
@@ -54,6 +55,11 @@ const BUNDLED_SKILLS: &[BundledSkill] = &[
body: MCP_BUILDER_BODY,
introduced_in: 3,
},
BundledSkill {
name: "fleet-manager",
body: FLEET_MANAGER_BODY,
introduced_in: 4,
},
BundledSkill {
name: "documents",
body: DOCUMENTS_BODY,
@@ -370,7 +376,7 @@ mod tests {
let tmp = TempDir::new().unwrap();
// Simulate v2 where older bundled skills had been deliberately removed
// before v3 introduced more system skills.
// before later versions introduced more system skills.
fs::write(marker_file(&tmp), "2").unwrap();
install_system_skills(tmp.path()).unwrap();
+61
View File
@@ -269,6 +269,67 @@ POST /v1/fleet/runs/{run_id}/stop
Action endpoints call the same manager controls as the CLI and record their
decisions in the fleet ledger.
## Manager-Agent Runbook
Manager agents should treat Fleet operations as typed, ledgered control-plane
work. Start with `codewhale fleet status`, then inspect one run or worker with
`codewhale fleet inspect <worker-id>`, `logs`, and `artifacts`. Use direct
reads of `.codewhale/fleet.jsonl`, host logs, or remote files only when the
typed CLI/API surface cannot provide the required evidence.
Classify the worker before taking action:
- `transient failure`: stale heartbeat, host timeout, interrupted transport,
retryable provider/network error, or an adapter status that can plausibly
recover without changing the task.
- `task failure`: the worker completed but produced an incorrect result,
domain failure, missing required artifact, or explicit task-level error.
- `verifier failure`: the worker result exists, but the scorer/verifier failed,
timed out, or disagrees with the receipt.
- `needs-human`: missing authority, secret request, destructive operation,
repeated restart exhaustion, ambiguous product decision, or conflicting
evidence that the manager cannot resolve from typed artifacts.
Choose one typed action:
- Restart a worker only when the failure is transient, retry budget remains,
the task is idempotent or retry-safe, and no permission or secret boundary is
involved: `codewhale fleet restart <worker-id>`.
- Interrupt or stop only when the current task is unsafe to continue or the
operator explicitly asks for cancellation: `codewhale fleet interrupt
<worker-id>` or `codewhale fleet stop --all`.
- Do not restart pure task failures by default; preserve artifacts and hand the
receipt to the task owner unless the task spec says retrying can produce new
evidence.
- For verifier failures, inspect scorer inputs and artifact refs first. If the
verifier cannot be corrected through typed fleet actions, escalate for human
review.
- For `needs-human`, draft an escalation instead of sending it unless alert
config explicitly authorizes sending.
Safe Slack or PagerDuty draft:
```text
CodeWhale fleet needs attention
Run: <run-id>
Worker: <worker-id>
Task: <task-id or unknown>
Classification: <transient failure | task failure | verifier failure | needs-human>
Reason: <one sentence, no secrets>
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
Safe log excerpt: <3 lines max or "see artifact <ref>">
Requested decision: <restart approval | verifier review | task owner review | permission decision>
```
Post-run summaries should include the run id, workers checked, classification,
typed action taken or drafted, expected ledger effect, artifact refs reviewed,
and next owner. Keep summaries bounded; link artifact refs instead of copying
full logs or transcripts.
The bundled `fleet-manager` skill mirrors this runbook for manager agents. It
is a first-party system skill and should be discoverable through the normal
skill registry after system skills are installed or refreshed.
## Host Adapters
The host adapter boundary supports local child processes and explicit SSH