Merge origin/main into v0.8.58-3018-unhardcode-deepseek — append #3030 step-counter tests after the #3018 routing tests
@@ -0,0 +1,135 @@
# Agent Runner Protocol

How a headless agent (DeepSeek V4 on a DigitalOcean droplet, or any codewhale exec caller) picks up, implements, verifies, and delivers a milestone issue, fully autonomously.

## Prerequisites

- `gh` CLI authenticated with a fine-grained PAT scoped to `Hmbown/CodeWhale` (Contents RW, Issues RW, PRs RW, Metadata R)
- `codewhale` binary on `$PATH` (v0.8.57+)
- `DEEPSEEK_API_KEY` (or equivalent provider key) exported in the agent user's shell
- A `git worktree` per issue (never commit directly to `main`)

---

## The loop

### 1. Pick

```bash
gh issue list \
  --repo Hmbown/CodeWhale \
  --milestone v0.8.58 \
  --label agent-ready \
  --state open \
  --json number,title,url
```

Choose an issue. Prefer `release-blocker` → `bug` → `enhancement` order.
Do not pick an issue already labeled `agent-in-progress`.
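
The preference order can be applied mechanically. The following is a minimal sketch, not part of the protocol itself: it assumes the listing has been flattened to one `number<TAB>comma-separated-labels` line per issue, and `label_rank` and `pick_issue` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Rank a comma-separated label list: lower rank = pick first.
label_rank() {
  case ",$1," in
    *,agent-in-progress,*) echo 99 ;;  # already claimed: never pick
    *,release-blocker,*)   echo 0 ;;
    *,bug,*)               echo 1 ;;
    *,enhancement,*)       echo 2 ;;
    *)                     echo 3 ;;
  esac
}

# Read "number<TAB>labels" lines on stdin; print the best issue number.
# Returns non-zero when no issue is eligible.
pick_issue() {
  local number labels rank best_rank=99 best=""
  while IFS=$'\t' read -r number labels; do
    rank=$(label_rank "$labels")
    if (( rank < best_rank )); then
      best_rank=$rank
      best=$number
    fi
  done
  [[ -n $best ]] && echo "$best"
}
```

One way to feed it, assuming the `labels` JSON field is requested instead of `title,url`, would be `gh issue list ... --json number,labels --jq '.[] | "\(.number)\t\([.labels[].name] | join(","))"' | pick_issue`. Ties within the same rank go to the first issue listed.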

### 2. Claim

```bash
gh issue edit <N> --add-label agent-in-progress --remove-label agent-ready
```

This prevents other agents from picking the same issue.

### 3. Isolate

```bash
cd /opt/whalebro/codewhale
git fetch origin
git worktree add ../worktrees/issue-<N> -b agent/<N>-<slug> origin/main
cd ../worktrees/issue-<N>
```

Every issue gets its own branch and worktree. The branch name convention is `agent/<issue-number>-<short-slug>`.
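
The `<short-slug>` part can be derived from the issue title. A minimal sketch under stated assumptions: `slugify` and `branch_name` are hypothetical helpers, and the 40-character cap is an arbitrary choice for illustration, not a documented limit.

```bash
#!/usr/bin/env bash
# Turn an issue title into a short lowercase slug (hypothetical helper).
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-40 \
    | sed -E 's/-+$//'   # cutting may leave a trailing dash; strip it again
}

# Build the agent/<issue-number>-<short-slug> branch name.
branch_name() {
  printf 'agent/%s-%s\n' "$1" "$(slugify "$2")"
}
```

For example, `git worktree add "../worktrees/issue-$N" -b "$(branch_name "$N" "$TITLE")" origin/main`, where `$N` and `$TITLE` hold the issue number and title.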

### 4. Execute

```bash
gh issue view <N> --json body -q .body | \
  codewhale exec --auto --output-format stream-json "$(cat)"
```

The agent reads the issue body and implements the fix. Use a tmux session per issue so the run survives SSH disconnects:

```bash
tmux new-session -d -s "issue-<N>" \
  "gh issue view <N> --json body -q .body | \
   codewhale exec --auto --output-format stream-json \"\$(cat)\" 2>&1 | tee /tmp/issue-<N>.log"
```

For resuming an interrupted run (`--continue` picks up the most recent
session for this workspace; `--resume latest` only exists in the interactive
TUI):

```bash
codewhale exec --auto --output-format stream-json --continue "..."
```

### 5. Verify

Run the exact commands from the issue's **Verification** section. If they pass, proceed. If they fail, loop back to step 4 with the error output as context, or label `needs-human`.
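
The verify-or-escalate decision can be expressed as a bounded retry loop. This is a sketch only: the protocol does not state a retry budget, so the attempt limit is an assumption, and `verify_with_retries` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Run a verification command up to MAX_ATTEMPTS times.
# Returns 0 on the first pass, 1 if every attempt fails.
verify_with_retries() {
  local max_attempts=$1 attempt
  shift
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      echo "verification passed on attempt $attempt"
      return 0
    fi
    echo "verification failed on attempt $attempt" >&2
    # In a real loop, step 4 would be re-run here with the failure
    # output as context before the next attempt.
  done
  return 1
}
```

A caller could then escalate on failure, for example `verify_with_retries 3 ./verify.sh || gh issue edit <N> --add-label needs-human --remove-label agent-in-progress`, where `./verify.sh` stands in for the commands from the issue's Verification section.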

### 6. Deliver

```bash
gh pr create \
  --repo Hmbown/CodeWhale \
  --base main \
  --title "<descriptive title>" \
  --body "Closes #<N>" \
  --label v0.8.58
```

All delivery is via PR; never push to `main` directly. Human review is required before merge.

### 7. On blockage

```bash
gh issue edit <N> --add-label needs-human --remove-label agent-in-progress
gh issue comment <N> --body "Blocked: <reason>. Human decision needed."
```

Common blockers: missing credentials, ambiguous scope, test environment unavailable, network outage.

---

## Label semantics

| Label | Meaning | Auto-applied? |
|---|---|---|
| `agent-ready` | Body has all six template sections; a remote agent may claim it | Yes (template) |
| `agent-in-progress` | Claimed by an agent run; do not double-pick | Manual (step 2) |
| `needs-human` | Agent blocked; requires human decision or credentials | Manual (step 7) |
| `autonomous-ready` | Legacy nightly-loop label; distinct from `agent-ready` | No |

The `autonomous-ready` label is for the legacy nightly loop (external automation).
New work uses `agent-ready`.

---

## Safety rules

1. **PR-only delivery.** Never commit to `main`. Every change is a branch + PR.
2. **No force-push.** `git push --force` is forbidden.
3. **Secrets never in argv, history, or logs.** API keys, PATs, and credentials live in `/etc/codewhale/*.env` and are sourced into the agent user's shell. The runtime API listens on `127.0.0.1:7878` only. Telegram bridge chats are allowlisted.
4. **Human reviews every PR.** The droplet loop delivers PRs; a human on the laptop reviews and merges.
5. **One issue per worktree.** No cross-contamination between concurrent agent runs.
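
Rule 3 relies on credentials being sourced rather than typed. A minimal sketch of that sourcing step, assuming the env files contain plain shell-compatible `KEY=VALUE` lines; `load_env_file` is a hypothetical helper, not something the protocol defines.

```bash
#!/usr/bin/env bash
# Export every KEY=VALUE assignment from an env file into the current shell
# without echoing values or passing them as command-line arguments.
load_env_file() {
  local env_file=$1
  [[ -r $env_file ]] || { echo "cannot read $env_file" >&2; return 1; }
  set -a            # auto-export variables assigned while sourcing
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
}
```

It would be called as `load_env_file /etc/codewhale/agent.env`, where `agent.env` is a placeholder filename. Because the values are read from the file by the shell itself, they never appear in `argv` or shell history.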

---

## Issue body format

Every `agent-ready` issue must have these six sections (enforced by `.github/ISSUE_TEMPLATE/agent-task.yml`):

1. **Goal / Why**: what problem, why now
2. **Scope / Plan**: numbered steps with file paths
3. **Key files**: paths to read first
4. **Acceptance criteria**: behavior-level checkboxes
5. **Verification**: exact shell commands
6. **Out of scope**: explicit non-goals

The body must be self-sufficient: an agent starting from a fresh clone, with no conversation context, must be able to execute it.
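
An agent can sanity-check a body before claiming the issue. The sketch below assumes each section title appears verbatim somewhere in the rendered body (how the template renders its headings is not specified here), so it is a plain substring check; `missing_sections` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print each required section title that does not appear in the issue body
# read from stdin. Exit status is 0 only when nothing is missing.
missing_sections() {
  local body title missing=0
  body=$(cat)
  for title in "Goal / Why" "Scope / Plan" "Key files" \
               "Acceptance criteria" "Verification" "Out of scope"; do
    if ! grep -qiF -- "$title" <<<"$body"; then
      echo "$title"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `gh issue view <N> --json body -q .body | missing_sections` prints nothing and succeeds when all six titles are present.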
+88
-1
@@ -612,7 +612,94 @@ receives the text produced by the previous hook. Hooks marked
`background = true` are observer-only and cannot transform or block
the message. Existing environment variables remain available.
`shell_env` hooks keep their existing `KEY=VALUE` stdout contract;
JSON stdout contracts exist for `message_submit` (above) and
`tool_call_before` (below).

### `tool_call_before` decision hooks

`tool_call_before` hooks run before each tool call executes. In
addition to the legacy hard deny (exit code `2`, which always wins
regardless of stdout), a foreground hook may print a JSON decision on
stdout with exit code `0`:

```json
{
  "decision": "allow" | "deny" | "ask",
  "reason": "human-readable explanation (used for deny)",
  "updatedInput": { "command": "ls -la" },
  "additionalContext": "text appended to the tool result for the model"
}
```

All fields are optional. Empty stdout, non-JSON stdout, and JSON
without a `decision` field behave exactly as before (allow). An
unrecognized `decision` string logs a warning and is treated as allow.

- `deny` blocks the tool; the model receives a permission-denied tool
  result containing `reason`.
- `ask` forces the interactive approval prompt even for tools that
  would otherwise auto-run.
- `updatedInput` must be a JSON object; it replaces the tool input
  before execution. When several hooks supply it, the last hook wins.
- `additionalContext` is appended to the tool result sent back to the
  model as `[hook context] ...`. Multiple hooks' contexts are
  concatenated.

When multiple hooks match, precedence is deny > ask > allow. Hooks
marked `background = true` cannot steer tool calls: they exit
immediately without a captured result.
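
The precedence rule can be illustrated with a small reducer over the decisions returned by all matching foreground hooks. This is a sketch of the documented semantics only, not CodeWhale's implementation; `resolve_decision` is a hypothetical name, and the exit-code-`2` hard deny is not modeled.

```bash
#!/usr/bin/env bash
# Combine the decisions of all matching foreground hooks.
# Precedence: deny > ask > allow. Empty or unrecognized values count as allow.
resolve_decision() {
  local decision result=allow
  for decision in "$@"; do
    case $decision in
      deny) echo deny; return 0 ;;   # deny wins immediately
      ask)  result=ask ;;
      allow|'') ;;
      *) echo "warning: unrecognized decision '$decision', treating as allow" >&2 ;;
    esac
  done
  echo "$result"
}
```

With this sketch, `resolve_decision allow ask allow` yields `ask`, and any list containing `deny` yields `deny` regardless of order.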

Example deny hook:

```toml
[[hooks.hooks]]
event = "tool_call_before"
command = '''echo '{"decision":"deny","reason":"blocked by project policy"}' '''
condition = { type = "tool_name", name = "exec_shell" }
```

Example ask hook (force approval for every MCP tool):

```toml
[[hooks.hooks]]
event = "tool_call_before"
command = '''echo '{"decision":"ask"}' '''
condition = { type = "tool_name", name = "mcp__*" }
```

Example input rewrite:

```toml
[[hooks.hooks]]
event = "tool_call_before"
command = "~/.codewhale/hooks/clamp-shell-timeout.sh"
condition = { type = "tool_name", name = "exec_shell" }
```

where the script reads the hook context, then prints
`{"updatedInput": {...}}` with the adjusted arguments.

`tool_name` conditions support `*` globs: `mcp__*` matches every MCP
tool (e.g. `mcp__github__create_issue`) but not built-ins like
`read_file`; exact names keep matching exactly. Other regex
metacharacters in the pattern are matched literally.
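
To make the matching rule concrete, here is a sketch that mimics the described behavior in bash: `*` becomes a wildcard and every other regex metacharacter is escaped. It is an illustration of the semantics, not the matcher CodeWhale uses; `tool_name_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Return 0 when NAME matches PATTERN, where only `*` is a wildcard and
# all other regex metacharacters in PATTERN are taken literally.
tool_name_matches() {
  local pattern=$1 name=$2 regex='' ch i
  for ((i = 0; i < ${#pattern}; i++)); do
    ch=${pattern:i:1}
    case $ch in
      '*') regex+='.*' ;;
      '.'|'^'|'$'|'+'|'?'|'('|')'|'['|']'|'{'|'}'|'|'|'\') regex+="\\$ch" ;;
      *) regex+=$ch ;;
    esac
  done
  [[ $name =~ ^${regex}$ ]]
}
```

Under this sketch, `tool_name_matches 'mcp__*' 'mcp__github__create_issue'` succeeds, while `tool_name_matches 'mcp__*' 'read_file'` and `tool_name_matches 'exec.shell' 'exec_shell'` both fail.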

### Project-local hooks

Repositories can ship policy in `<workspace>/.codewhale/hooks.toml`,
using the same shape as the `[hooks]` table (top-level fields plus
`[[hooks]]` entries). Project hooks are appended after global hooks
from `config.toml`, so they run last and, for `updatedInput`, win
ties. A malformed project file logs a warning and startup falls back
to global hooks only.

```toml
# .codewhale/hooks.toml
[[hooks]]
event = "tool_call_before"
command = '''echo '{"decision":"deny","reason":"no shell in this repo"}' '''
condition = { type = "tool_name", name = "exec_shell" }
```

### Turn-end observer hooks

@@ -11,6 +11,7 @@ Bindings are not (yet) user-configurable — tracked for a future release (#436,
| `F1` or `Ctrl-/` | Toggle the help overlay |
| `Ctrl-K` | Open the command palette (slash-command finder) |
| `Ctrl-C` | Cancel current turn / dismiss modal / arm-then-confirm quit |
| `Ctrl-B` | Background the running foreground shell command (turn continues; the command becomes a `/jobs` background job) |
| `Ctrl-D` | Quit (only when the composer is empty) |
| `Tab` | Cycle TUI mode: Plan → Agent → YOLO → Plan |
| `Shift-Tab` | Cycle reasoning effort: off → high → max → off |

@@ -28,7 +28,7 @@ Checks:
3. Confirm no local sandbox/permission deadlock in tool output

Actions:
1. If a foreground shell command is running, press `Ctrl+B` to move it to the background (the turn keeps running and the command becomes a background job under `/jobs`); use `Ctrl+C` instead if you want to cancel the turn.
2. If the command was started in the background, ask the assistant to cancel it with `exec_shell_cancel` and the returned task id.
3. Use `Esc` or `Ctrl+C` to interrupt the current turn when you want to stop the request itself.
4. Retry the prompt; if it still fails, restart the TUI.