merge #3164 Fleet manager runbook skill
This commit is contained in:
@@ -0,0 +1,106 @@
|
||||
---
|
||||
name: fleet-manager
|
||||
description: Use when managing, triaging, restarting, escalating, or summarizing CodeWhale Agent Fleet runs and workers.
|
||||
metadata:
|
||||
short-description: Triage CodeWhale Agent Fleet runs
|
||||
---
|
||||
|
||||
# Fleet Manager
|
||||
|
||||
Use this skill when acting as a manager agent for CodeWhale Agent Fleet runs.
|
||||
Your job is to classify worker state, choose the narrowest safe typed action,
|
||||
and leave a ledgered receipt or a safe escalation draft.
|
||||
|
||||
## Authority Boundary
|
||||
|
||||
- Prefer typed fleet surfaces over shell spelunking: `codewhale fleet status`,
|
||||
`inspect`, `logs`, `artifacts`, `interrupt`, `restart`, `stop`, and the
|
||||
Runtime API fleet endpoints.
|
||||
- Do not read `.codewhale/fleet.jsonl`, host logs, or remote files directly
|
||||
unless the typed command or API is missing required evidence.
|
||||
- Do not send Slack, webhook, PagerDuty, email, or chat messages unless the
|
||||
user or run config explicitly authorizes sending. Draft the message instead.
|
||||
- Never include secrets, tokens, webhook URLs, routing keys, full prompts, or
|
||||
oversized logs in a summary or escalation.
|
||||
|
||||
## Triage Loop
|
||||
|
||||
1. Identify the run and worker from the user request, run receipt, or fleet
|
||||
status output. If no worker is named, start with `codewhale fleet status`.
|
||||
2. Inspect the worker with `codewhale fleet inspect <worker-id>` or the matching
|
||||
Runtime API worker endpoint.
|
||||
3. Review bounded evidence with `codewhale fleet logs <worker-id>` and
|
||||
`codewhale fleet artifacts <worker-id>`. Summarize artifact refs, not full
|
||||
payloads.
|
||||
4. Classify the state before acting:
|
||||
- `transient failure`: transport error, timeout, stale heartbeat, host
|
||||
unavailable, or retryable provider/network failure.
|
||||
- `task failure`: worker completed the task but the result is wrong,
|
||||
missing required artifacts, or reports a domain error.
|
||||
- `verifier failure`: scorer/verifier failed or disagrees with the worker
|
||||
result.
|
||||
- `needs-human`: missing authority, unsafe secret boundary, destructive
|
||||
action, repeated restart exhaustion, ambiguous product decision, or
|
||||
conflict between artifacts and verifier.
|
||||
5. Choose one typed action:
|
||||
- transient and retry budget remains: `codewhale fleet restart <worker-id>`.
|
||||
- transient but unsafe to retry: draft escalation and mark needs-human.
|
||||
- task failure: preserve artifacts, summarize the failure, and avoid restart
|
||||
unless the task spec says retrying can produce new evidence.
|
||||
- verifier failure: inspect scorer inputs and artifacts, then escalate if the
|
||||
verifier cannot be corrected through a typed action.
|
||||
- needs-human: do not restart automatically; draft a concise escalation.
|
||||
6. Record the result in the response: classification, action taken or drafted,
|
||||
evidence commands, artifact refs, and next owner.
|
||||
|
||||
## Restart vs Escalate
|
||||
|
||||
Restart only when all of these are true:
|
||||
|
||||
- the failure is likely transient,
|
||||
- the task is idempotent or the run policy allows retry,
|
||||
- retry budget remains,
|
||||
- no secret, permission, or destructive action boundary is involved, and
|
||||
- the previous attempt produced enough receipt data to explain the restart.
|
||||
|
||||
Escalate when any of these are true:
|
||||
|
||||
- restart budget is exhausted,
|
||||
- the worker requests secrets or new authority,
|
||||
- artifacts indicate data loss, corruption, or destructive side effects,
|
||||
- the verifier and task result conflict in a way you cannot resolve from typed
|
||||
evidence,
|
||||
- the same failure repeats after a restart, or
|
||||
- a human product or release decision is required.
|
||||
|
||||
## Safe Escalation Draft
|
||||
|
||||
Use this shape for Slack/PagerDuty drafts. Keep logs to three short lines or an
|
||||
artifact ref.
|
||||
|
||||
```text
|
||||
CodeWhale fleet needs attention
|
||||
Run: <run-id>
|
||||
Worker: <worker-id>
|
||||
Task: <task-id or unknown>
|
||||
Classification: <transient failure | task failure | verifier failure | needs-human>
|
||||
Reason: <one sentence, no secrets>
|
||||
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
|
||||
Safe log excerpt: <3 lines max or "see artifact <ref>">
|
||||
Requested decision: <restart approval | verifier review | task owner review | permission decision>
|
||||
```
|
||||
|
||||
## Post-Run Receipt
|
||||
|
||||
End every fleet-manager response with a compact receipt:
|
||||
|
||||
```text
|
||||
Fleet receipt
|
||||
Run: <run-id>
|
||||
Workers checked: <count/list>
|
||||
Classification: <state>
|
||||
Action: <restart/interrupt/stop/escalation draft/no-op>
|
||||
Ledger expectation: <typed action should be recorded | draft only, no send>
|
||||
Artifacts reviewed: <refs>
|
||||
Follow-up owner: <manager | task owner | human>
|
||||
```
|
||||
@@ -4,13 +4,14 @@
|
||||
use std::fs;
|
||||
use std::path::Path;
|
||||
|
||||
const BUNDLED_SKILL_VERSION: &str = "3";
|
||||
const BUNDLED_SKILL_VERSION: &str = "4";
|
||||
const SKILL_CREATOR_BODY: &str = include_str!("../../assets/skills/skill-creator/SKILL.md");
|
||||
const DELEGATE_BODY: &str = include_str!("../../assets/skills/delegate/SKILL.md");
|
||||
const V4_BEST_PRACTICES_BODY: &str = include_str!("../../assets/skills/v4-best-practices/SKILL.md");
|
||||
const PLUGIN_CREATOR_BODY: &str = include_str!("../../assets/skills/plugin-creator/SKILL.md");
|
||||
const SKILL_INSTALLER_BODY: &str = include_str!("../../assets/skills/skill-installer/SKILL.md");
|
||||
const MCP_BUILDER_BODY: &str = include_str!("../../assets/skills/mcp-builder/SKILL.md");
|
||||
const FLEET_MANAGER_BODY: &str = include_str!("../../assets/skills/fleet-manager/SKILL.md");
|
||||
const DOCUMENTS_BODY: &str = include_str!("../../assets/skills/documents/SKILL.md");
|
||||
const PRESENTATIONS_BODY: &str = include_str!("../../assets/skills/presentations/SKILL.md");
|
||||
const SPREADSHEETS_BODY: &str = include_str!("../../assets/skills/spreadsheets/SKILL.md");
|
||||
@@ -54,6 +55,11 @@ const BUNDLED_SKILLS: &[BundledSkill] = &[
|
||||
body: MCP_BUILDER_BODY,
|
||||
introduced_in: 3,
|
||||
},
|
||||
BundledSkill {
|
||||
name: "fleet-manager",
|
||||
body: FLEET_MANAGER_BODY,
|
||||
introduced_in: 4,
|
||||
},
|
||||
BundledSkill {
|
||||
name: "documents",
|
||||
body: DOCUMENTS_BODY,
|
||||
@@ -370,7 +376,7 @@ mod tests {
|
||||
let tmp = TempDir::new().unwrap();
|
||||
|
||||
// Simulate v2 where older bundled skills had been deliberately removed
|
||||
// before v3 introduced more system skills.
|
||||
// before later versions introduced more system skills.
|
||||
fs::write(marker_file(&tmp), "2").unwrap();
|
||||
|
||||
install_system_skills(tmp.path()).unwrap();
|
||||
|
||||
@@ -269,6 +269,67 @@ POST /v1/fleet/runs/{run_id}/stop
|
||||
Action endpoints call the same manager controls as the CLI and record their
|
||||
decisions in the fleet ledger.
|
||||
|
||||
## Manager-Agent Runbook
|
||||
|
||||
Manager agents should treat Fleet operations as typed, ledgered control-plane
|
||||
work. Start with `codewhale fleet status`, then inspect one run or worker with
|
||||
`codewhale fleet inspect <worker-id>`, `logs`, and `artifacts`. Use direct
|
||||
reads of `.codewhale/fleet.jsonl`, host logs, or remote files only when the
|
||||
typed CLI/API surface cannot provide the required evidence.
|
||||
|
||||
Classify the worker before taking action:
|
||||
|
||||
- `transient failure`: stale heartbeat, host timeout, interrupted transport,
|
||||
retryable provider/network error, or an adapter status that can plausibly
|
||||
recover without changing the task.
|
||||
- `task failure`: the worker completed but produced an incorrect result,
|
||||
domain failure, missing required artifact, or explicit task-level error.
|
||||
- `verifier failure`: the worker result exists, but the scorer/verifier failed,
|
||||
timed out, or disagrees with the receipt.
|
||||
- `needs-human`: missing authority, secret request, destructive operation,
|
||||
repeated restart exhaustion, ambiguous product decision, or conflicting
|
||||
evidence that the manager cannot resolve from typed artifacts.
|
||||
|
||||
Choose one typed action:
|
||||
|
||||
- Restart a worker only when the failure is transient, retry budget remains,
|
||||
the task is idempotent or retry-safe, and no permission or secret boundary is
|
||||
involved: `codewhale fleet restart <worker-id>`.
|
||||
- Interrupt or stop only when the current task is unsafe to continue or the
|
||||
operator explicitly asks for cancellation: `codewhale fleet interrupt
|
||||
<worker-id>` or `codewhale fleet stop --all`.
|
||||
- Do not restart pure task failures by default; preserve artifacts and hand the
|
||||
receipt to the task owner unless the task spec says retrying can produce new
|
||||
evidence.
|
||||
- For verifier failures, inspect scorer inputs and artifact refs first. If the
|
||||
verifier cannot be corrected through typed fleet actions, escalate for human
|
||||
review.
|
||||
- For `needs-human`, draft an escalation instead of sending it unless alert
|
||||
config explicitly authorizes sending.
|
||||
|
||||
Safe Slack or PagerDuty draft:
|
||||
|
||||
```text
|
||||
CodeWhale fleet needs attention
|
||||
Run: <run-id>
|
||||
Worker: <worker-id>
|
||||
Task: <task-id or unknown>
|
||||
Classification: <transient failure | task failure | verifier failure | needs-human>
|
||||
Reason: <one sentence, no secrets>
|
||||
Latest typed evidence: codewhale fleet inspect <worker-id>; codewhale fleet artifacts <worker-id>
|
||||
Safe log excerpt: <3 lines max or "see artifact <ref>">
|
||||
Requested decision: <restart approval | verifier review | task owner review | permission decision>
|
||||
```
|
||||
|
||||
Post-run summaries should include the run id, workers checked, classification,
|
||||
typed action taken or drafted, expected ledger effect, artifact refs reviewed,
|
||||
and next owner. Keep summaries bounded; link artifact refs instead of copying
|
||||
full logs or transcripts.
|
||||
|
||||
The bundled `fleet-manager` skill mirrors this runbook for manager agents. It
|
||||
is a first-party system skill and should be discoverable through the normal
|
||||
skill registry after system skills are installed or refreshed.
|
||||
|
||||
## Host Adapters
|
||||
|
||||
The host adapter boundary supports local child processes and explicit SSH
|
||||
|
||||
Reference in New Issue
Block a user