feat(subagents): add agent run receipts

2026-06-12 22:45:53 -07:00
parent 07670871d0
commit c0ba6ce5ad
6 changed files with 686 additions and 4 deletions
@@ -495,6 +495,28 @@ Verification:
 npm test --workspace @codewhale/runtime-sdk
 ```

+## Agent Run Receipts
+
+Sub-agent lanes persist compact run receipts in
+`.codewhale/state/subagents.v1.json`. The Runtime API exposes those receipts as
+a read-only inspection surface:
+
+| Operation | Endpoint |
+|---|---|
+| List persisted agent runs | `GET /v1/agent-runs` |
+| Inspect one run | `GET /v1/agent-runs/{run_id}` |
+
+Each run is returned in the same worker-record shape that `agent_eval` uses:
+`spec.run_id`, `actor_kind`, lifecycle `status`, bounded `events`,
+`follow_up`, `takeover`, `artifacts`, `usage`, and `verification`. `run_id`
+falls back to the worker id for older records, and `{run_id}` may be either the
+run id or the worker id.
+
+These endpoints do not start, cancel, or steer sub-agents. Live follow-up still
+goes through `agent_eval`; live cancellation still goes through `agent_close`.
+The API surface exists so app/editor/headless clients can inspect the same
+handoff receipts that the TUI and parent model see.
+
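As a rough client-side illustration of the two read-only endpoints above, the sketch below builds the request URLs and applies the documented `run_id` fallback. The base URL, the top-level `id` key used for the worker id, and all helper names are assumptions made for this example, not part of the Runtime API.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the real host and port of the Runtime API are not stated here.
BASE_URL = "http://127.0.0.1:8080"


def agent_runs_url(base_url: str, run_id: Optional[str] = None) -> str:
    """Build the list URL, or the single-run URL when run_id is given."""
    url = base_url.rstrip("/") + "/v1/agent-runs"
    if run_id is not None:
        # {run_id} may be a run id or a worker id; quote it for path safety.
        url += "/" + quote(run_id, safe="")
    return url


def effective_run_id(record: dict) -> Optional[str]:
    """Prefer spec.run_id; fall back to the worker id for older records.

    Assumption: the worker id lives under a top-level "id" key.
    """
    spec = record.get("spec") or {}
    return spec.get("run_id") or record.get("id")


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body. Nothing is started or cancelled."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A client would call `fetch_json(agent_runs_url(BASE_URL))` to list runs and pass either id form to `agent_runs_url` to inspect one run. The envelope of the list response is not described here, so the sketch returns the decoded JSON unchanged.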
 ## Session lifecycle (native UI supervision)

 | Operation | Endpoint |
@@ -228,6 +228,33 @@ Records that loaded from a pre-#405 persisted state file (no
 `session_boot_id` field) classify as prior-session because the
 manager can't match them to the current boot.

+## Run receipts, follow-up, and takeover
+
+Each sub-agent has a persisted worker record in
+`.codewhale/state/subagents.v1.json`. The record is the current run-ledger
+slice for sub-agent lanes: it stores `run_id`, objective, role/model,
+workspace/branch, lifecycle events, artifact refs, follow-up target, takeover
+target, usage provenance, and verification provenance.
+
+`agent_eval` returns these fields at the top level of the session projection and
+inside `worker_record`. A running or continuable interrupted child should be
+continued through the returned `follow_up` target (`agent_eval` with the same
+agent id or session name). A local takeover should use the returned `takeover`
+instructions; a case that is not yet supported must state the reason there
+instead of leaving the operator to guess.
+
+Follow-up delivery is explicit. If a message was delivered, the worker record
+stores a bounded preview and timestamp. If the child had already terminated,
+`agent_eval` still returns the projection and transcript handle, but records the
+undelivered follow-up reason so queued instructions do not disappear into UI
+state.
+
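The delivery rule above can be pictured with a small sketch. It is only an illustration of "bounded preview plus timestamp, or an undelivered reason". The key names (`delivered`, `preview`, `delivered_at`, `reason`), the preview limit, and the function itself are hypothetical and do not reflect the actual worker-record schema.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: the real preview bound is not stated in this document.
PREVIEW_LIMIT = 120


def follow_up_receipt(message: str, delivered: bool,
                      reason: Optional[str] = None,
                      now: Optional[datetime] = None) -> dict:
    """Return a delivery receipt for one follow-up message."""
    if delivered:
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        # Keep only a bounded preview so the record stays compact.
        if len(message) > PREVIEW_LIMIT:
            preview = message[:PREVIEW_LIMIT] + "..."
        else:
            preview = message
        return {"delivered": True, "preview": preview, "delivered_at": stamp}
    # Undelivered follow-ups keep an explicit reason instead of vanishing.
    return {"delivered": False, "reason": reason or "unspecified"}
```

Recording the undelivered case as data, rather than dropping it, is what lets a later reader of the record see that an instruction never reached the child.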
+Artifacts are symbolic refs. Use `handle_read` on the returned
+`transcript_handle` for transcript details, and treat `result_summary` as a
+child self-report unless `verification.status` points to a separate gate or
+receipt. `usage.status` is `unknown` until sub-agent token accounting is wired
+into the worker ledger.
+
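A consumer of these records might apply the trust rules above roughly as follows. This is a sketch under assumptions: `describe_receipt` is a hypothetical helper, the record is treated as a plain dictionary, and a missing `verification` or `usage` block is read conservatively as `self_report_only` and `unknown`.

```python
def describe_receipt(record: dict) -> dict:
    """Summarize how far a worker record's claims can be trusted."""
    verification = record.get("verification") or {}
    usage = record.get("usage") or {}
    # Missing fields default to the least-trusting interpretation.
    verification_status = verification.get("status") or "self_report_only"
    usage_status = usage.get("status") or "unknown"
    return {
        # True means result_summary is only the child's own claim.
        "summary_is_self_report": verification_status == "self_report_only",
        # False until token accounting is wired into the worker ledger.
        "usage_known": usage_status != "unknown",
        # Symbolic ref; transcript details are read via handle_read.
        "transcript_handle": record.get("transcript_handle"),
    }
```

The helper never dereferences the transcript handle itself. It only passes the symbolic ref along so the caller can decide whether to read the transcript through `handle_read`.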
 ## Output contract

 Every sub-agent produces a final result string with five sections,
@@ -136,6 +136,13 @@ without losing its source context.

 Large logs and command outputs should be artifacts with compact summaries in the transcript. `task_gate_run` handles this automatically for active durable tasks.

+Sub-agent runs also expose a compact run receipt through `agent_eval`: `run_id`,
+`follow_up`, `takeover`, `artifacts`, `usage`, `verification`, and
+`worker_record`. Follow-up delivery receipts record whether an `agent_eval`
+message actually reached the child or why it did not. Usage is marked
+`unknown` until worker-level token accounting is available, and verification is
+`self_report_only` unless a separate gate or artifact proves the claim.
+
 ### GitHub context and guarded writes

 | Tool | Niche |