diff --git a/docs/AGENT_RUNTIME.md b/docs/AGENT_RUNTIME.md new file mode 100644 index 00000000..3423e3c7 --- /dev/null +++ b/docs/AGENT_RUNTIME.md @@ -0,0 +1,116 @@ +# The CodeWhale Agent Runtime — one substrate, three launchers + +This document explains how sub-agents, the headless `exec` path, and the Agent +Fleet relate. It exists because these had drifted into *two* parallel "worker" +systems, and the fix is to treat them as **one**. It also answers the +open direction question in #2972 ("how much Claude Code convergence is right?"). + +## The core idea + +There is exactly **one** thing that runs work: a **headless agent runtime** — a +model loop with the full (policy-gated) tool surface that can, in turn, spawn +its own sub-agents. Everything else is just a different way to *launch* that one +runtime, or a different way to *observe* it. + +``` + ┌───────────────────────────────┐ + │ headless agent runtime │ + │ (full tools + can sub-spawn) │ + └───────────────────────────────┘ + ▲ ▲ ▲ + launches │ │ │ launches + │ │ │ + ┌────────────┴───┐ ┌───────┴────────┐ ┌──┴───────────────────┐ + │ TUI turn │ │ `codewhale │ │ Agent Fleet │ + │ (interactive, │ │ exec` │ │ (durable: ledger, │ + │ in-process) │ │ (headless CLI,│ │ scheduler, SSH, │ + │ │ │ anyone/any- │ │ alerts) — launches │ + │ │ │ time) │ │ `codewhale exec` │ + └────────────────┘ └────────────────┘ │ per worker │ + └───────────────────────┘ +``` + +- A **sub-agent** is a *nested* run of the same runtime, launched from inside + any agent process via `agent_open`. Same spec, same events, same depth axis. +- **`codewhale exec`** is the headless front door: usable by anyone at any time + (CI, scripts, another agent), full tools, emits a `stream-json` event stream, + and can spawn sub-agents. It is *the* runtime with a CLI on it. +- A **fleet worker** *is* a `codewhale exec` run that the fleet launches and + tracks durably — locally as a subprocess, or remotely as + `ssh host … codewhale exec …`. The fleet does not re-implement execution; it + adds **orchestration** (durable ledger, scheduling/leasing/retry, host + transport, alert escalation) *over* the one runtime. + +So "fleet vs sub-agent" is not two categories. It is **the same headless run**, +differing only in *where it was launched from* and *how durably it is tracked*. + +## Why this shape (and why it fixes the lag) + +The motivating problem: spawning many in-process sub-agents made the TUI lag, +because each child cloned a heavy runtime and rebuilt the whole tool registry, +*and* the TUI rendered a full card/transcript per child. + +Surveying Claude Code, Codex, and Kimi, the thing that keeps an orchestrator +light at high fanout is **not** a process boundary — all three run sub-agents +in-process. It is **isolation + a compact event stream**: + +- a child's transcript **never** flows back into the parent — the parent gets a + result summary and a small lifecycle event stream; +- the UI renders **counts** (`2 running / 3 done`), not a child session per + worker; +- each worker's tool surface is built directly from a **role/capability + profile**, not "build everything then filter". + +"Headless" therefore means *the execution is not shaped like the UI* — it does +**not** mean fewer abilities. A headless worker keeps the full toolset and can +spawn sub-agents. + +When the work also needs to be **durable** (survive the TUI closing, a laptop +sleeping) or **remote** (SSH), the fleet runs the worker out-of-process as +`codewhale exec`. The heavy construction then lives in another process entirely, +so the orchestrator stays smooth regardless of fanout, and the run survives +restarts — the day-scale autonomy goal of #3154. + +## One recursion axis + +A worker runs at `spawn_depth = 0` and may spawn children while +`spawn_depth + 1 ≤ max_spawn_depth`, so a budget of `N` affords `N` nested +delegation levels. Sub-agents and fleet workers share **one** axis, sourced from +`codewhale_config`: + +- `DEFAULT_SPAWN_DEPTH = 3` — the default budget for both standalone sub-agents + and fleet workers (so they cannot drift into "two moving targets"); +- `MAX_SPAWN_DEPTH_CEILING = 3` — the hard cap that every configured value + (fleet `max_spawn_depth`, `agent_open`'s `max_depth`) clamps to. + +The root worker always runs even at budget 0; the budget gates *child* +delegation. The default affords at least three nested levels. + +## Event vocabulary + +The fleet ledger persists the worker's own event stream rather than a separate, +simulated taxonomy. `codewhale exec --output-format stream-json` emits +`{"type": "content" | "tool_use" | "tool_result" | "metadata" | "done" | +"error"}` lines, which map onto the fleet ledger's `FleetWorkerEventPayload` +(`RunningTool`, `Running`, `Completed`, `Failed`, …). One vocabulary, two +surfaces. + +## Convergence with Claude Code (#2972) + +CodeWhale should converge with Claude Code on **shape**, not on branding: + +- **Adopt**: a headless runtime with a real CLI/SDK front door; sub-agents as + isolated runs that return summaries (not transcripts); a compact, event-driven + fanout projection; capability/role tool profiles; the skills ecosystem + (#2743); structured run receipts. +- **Keep distinct**: CodeWhale branding and first-class DeepSeek/GLM/MiniMax/ + multi-provider support; the local-first **Agent Fleet** (durable, SSH-capable + orchestration) as CodeWhale's own layer above the shared runtime; WhaleFlow as + the orchestration overlay. +- **Do not** fork execution semantics per surface. The TUI, `exec`, the Runtime + API, and the fleet must all drive the *same* runtime and observe the *same* + event stream — divergence there is what produced the "two moving targets" this + document exists to prevent. + +The litmus test for any new agent surface: *does it launch and observe the one +runtime, or does it invent a second one?* Only the former is allowed. diff --git a/docs/FLEET.md b/docs/FLEET.md index 0e33f454..7e2f7aa5 100644 --- a/docs/FLEET.md +++ b/docs/FLEET.md @@ -1,7 +1,10 @@ # Agent Fleet -Agent Fleet is the local-first control plane for durable multi-worker runs. The -initial CLI surface is: +Agent Fleet is the local-first control plane for durable multi-worker runs. It +is **not** a separate execution engine: a fleet worker is a headless +`codewhale exec` run that the fleet launches and tracks durably. See +[AGENT_RUNTIME.md](AGENT_RUNTIME.md) for how sub-agents, `exec`, and the fleet +are one runtime with three launchers. The initial CLI surface is: ```sh codewhale fleet init