Files
codewhale/docs/SKILL_INVOCATION_DESIGN.md
Hunter Bown 8cb4f94f30 docs: v0.8.53 tool-surface-diet design + north-star direction
Design-only deliverables for the v0.8.53 "tool surface diet / canonical
surfaces" cutover (no catalog code in this cycle). Grounded in a verified
inventory of the actual tool registry.

- docs/TOOL_LIFECYCLE.md (#2681): the umbrella policy. Five lifecycle states
  (active / deferred / hidden-compatibility / deprecated / removed) modeled as
  const name-sets + an alias table in tool_catalog.rs (not a per-ToolSpec
  field), so registration stays untouched and old transcripts always replay.
  Includes the deprecation manifest (exec_wait/exec_interact/tts →
  hidden-compat; todo_* → checklist_* deprecated; 11 legacy subagent names are
  already non-visible dead code → cleanup + guardrail), per-mode/per-provider
  active-catalog budget (incl. Arcee's 8-tool first-turn set), prefix-cache
  safety rules, and the tool_agent decision: canonical but DeepSeek-V4-gated.
- docs/CODEBASE_SEARCH_DESIGN.md (#2680, v0.9.0): local-first FTS5/BM25 +
  symbol/path ranking + RRF hybrid; rusqlite storage; mtime/branch/vendor
  invalidation; an explainable tool contract returning reasons[]; and a real
  CodeWhale query eval set. Complements grep_files/file_search, never replaces.
- docs/SKILL_INVOCATION_DESIGN.md (0.9.0): the $<skill-name> inline invocation
  syntax (the token IS the skill name), namespaced resolution, ambiguity-
  suggests-not-guesses, visible activation line, and a smallest-viable slice.
- docs/VISION_NORTH_STAR.md (0.9.0+): intent router, hybrid codebase
  intelligence, WhaleFlow typed workflow IR, skills/rules runtime, the layered
  context-memory stack, tool repair/autoload, the evaluation loop, and the
  command-surface taxonomy (/memory small · /context dashboard · /rules ·
  /workflow · /overlay · $<skill> · codebase_search). Marked DIRECTION, not
  committed 0.8.53 work; also records the deferred-not-done diet items.

Targets codex/v0.8.53.
2026-06-03 11:47:29 -07:00

12 KiB

Skill Invocation Design — the $<skill-name> inline syntax

Status: DESIGN ONLY (v0.8.53 cycle). No catalog/parser code ships in this cycle; the implementation target is 0.9.0. This document describes what will be built and the contracts it must honor against today's code.

Related design docs: TOOL_LIFECYCLE.md (tool lifecycle states + per-skill tool restriction), command-surface taxonomy notes for /memory, /context, /rules, /workflow (/whaleflow), /overlay. Open PRs on codex/v0.8.53: #2684 (subagent role vocab / lifecycle signals / eval ergonomics) and #2685 (git history active + RLM/field errors). Nothing here contradicts those.


1. Problem

Skill activation has no single, model-legible entry point, and the candidate surfaces all compete with each other:

  • A /skill slash command, a load_skill-style tool, plugin/namespace naming (superpowers:systematic-debugging, github:gh-fix-ci), and the long-running workflow commands (/workflow / /whaleflow) all could be "the way you start a skill." None of them is canonical.
  • Slash commands are already overloaded. /memory, /context, /rules, /config, /provider, /workflow, /overlay each map to one subsystem; jamming skill invocation into /-space forces a weaker model to disambiguate "is this a command or a skill?" on every keystroke.
  • Weaker / smaller models (the cheaper providers CodeWhale targets) do not reliably pick the right mechanism. They will free-text "let me use systematic debugging" instead of actually loading the skill body, so the guidance never enters the context window.
  • Today there is no parser that activates an inline skill mention on submit. slash_menu.rs:86 (partial_inline_skill_mention_at_cursor) recognizes an inline /<skill> token under the cursor for popup purposes only; the submit path in ui.rs:4721 (build_queued_message) does not resolve or activate any inline mention. There is also no activation-mode concept (always-on / glob / model-decision / manual) and skills cannot restrict tools yet.

We need one prefix that means exactly "invoke this skill," is visually distinct from commands, and is cheap for a small model to emit correctly.


2. Proposal

Adopt $ as the skill-invocation prefix, where the token is the skill name — not a literal command called $skill.

$systematic-debugging figure out why MiMo auth fails
$test-driven-development add coverage before fixing
$github:gh-fix-ci inspect the failing checks
$aleph search the planning doc

The leading $ is the marker; everything from $ up to the next whitespace is the skill id. The rest of the line is the user's request, passed through to the model with the skill body already loaded as active guidance.

This is deliberately a reference / macro sigil, like a shell variable expansion or an @mention: $skill-id resolves to "the contents and tool policy of that skill," then the surrounding prose is the task.

$ works in three places (see §4): the user composer, the command-palette input, and model-facing planning text — so the model itself can write $systematic-debugging in its plan and have it resolve.


3. Resolution rules

Given a token $<id> (id captured up to the next whitespace):

  1. Exact name first. Look the id up directly: discover_in_workspace(workspace).get(id)skills/mod.rs:553 builds the registry; SkillRegistry::get (skills/mod.rs:421) matches on s.name == id exactly. Skill names come from frontmatter name: (or the first # Heading fallback) parsed at skills/mod.rs:382-417. An exact hit wins unconditionally.

  2. Namespaced $ns:skill. If the id contains a :, treat the part before the colon as a source/plugin namespace and the part after as the skill name: $github:gh-fix-ci, $superpowers:systematic-debugging. Namespaced ids are the disambiguation handle a user is told to type when a bare id is ambiguous. (Glob/wildcard namespacing — $github:* — is explicitly deferred, see §6.)

  3. Fuzzy match suggests, never silently chooses. If there is no exact (or namespaced-exact) hit, run a case-insensitive substring / prefix match over SkillRegistry::list() (skills/mod.rs:426). If exactly one skill matches, surface it as a suggestion ("did you mean $systematic-debugging?") but do not auto-activate it. If more than one matches, list them and require the user/model to re-issue with a disambiguated id (§7). Ambiguity never resolves to a silent pick.

  4. Respect enable-state. A resolved skill is only activated if SkillStateStore::is_enabled(id) is true (skill_state.rs:73: !self.disabled.contains(skill_name)). A disabled skill that resolves by name produces a clear "skill is disabled; enable it with /skill enable <id>" message rather than silently activating or silently doing nothing.

Resolution order is therefore: exact → namespaced-exact → enabled-check → fuzzy-suggest (never auto-pick).


4. Behavior

When a $<id> mention resolves and is enabled:

  • Visible activation line. The transcript shows Using skill: <name> so the user can see which skill body entered context. (Mirrors the existing skill UX vocabulary; one line per activated skill.)
  • Body loaded as active guidance. The skill's body (skills/mod.rs Skill.body) is injected into the turn as authoritative guidance, the same content a /skill-style activation would load. The user's trailing prose is the task the guidance applies to.
  • Tool-surface narrowing (when declared). If the skill declares a set of allowed tools, the active tool surface narrows to that set for the duration of the skill's influence. Per-skill tool restriction is net-new — skills cannot restrict tools today; the mechanism, and how narrowing interacts with the catalog-head byte-stability invariant (tool_catalog.rs:169-196), is specified in TOOL_LIFECYCLE.md. Until that lands, a declared tool list is parsed and shown but not enforced.
  • Multiple $mentions compose explicitly, or prompt. Until formal composition rules exist, two or more $mentions in one message either compose only when the rule is unambiguous (e.g. one guidance skill + one tool-scoping skill) or return a "choose one" prompt listing the mentioned skills. We never silently activate multiple complex skills at once (see §7 and Non-goals).
  • Three input surfaces. Resolution runs for: (a) user prompts in the composer, (b) command-palette input, and (c) model-facing planning text, so a model that writes $test-driven-development in its plan triggers the same activation path a human would.
  • Slash commands remain supported. /skill ... and the rest of the slash surface keep working unchanged. $ is the preferred path for models because it is one token and unambiguous, but it is additive, not a replacement (§7 Non-goals).

5. Why $

  • Visually distinct from /commands. A glance separates "run a subsystem command" (/memory, /context, /workflow) from "load a skill" ($aleph). Weaker models stop confusing the two surfaces.
  • Reads like a reference / macro. $name already means "expand this named thing" to anyone who has touched a shell or a templating language. Skill invocation is an expansion: $skill-id → that skill's guidance + tool policy.
  • Avoids overloading the slash namespace. /workflow, /memory, /config, /provider, /rules, /overlay, /context each already own one meaning in the command-surface taxonomy. Skills get their own sigil instead of a crowded /skill <name> subcommand competing with all of them.
  • Easy to type and remember. Single leading character, then the literal skill name. Nothing to memorize beyond the skill ids the user already sees in /skill list.

6. Implementation plan (smallest viable 0.8.53-ready slice → 0.9.0)

The 0.8.53 cycle is docs only. The plan below is the build order once code is unblocked; the first slice is intentionally the minimum that proves the path.

Slice 1 — token scanner at submit (the minimum viable feature).

  • Add a $<skill-id> token scanner invoked on submit, before build_queued_message runs (ui.rs:4721). The scanner finds leading-$ tokens, captures the id up to the next whitespace, and hands each id to the resolver. The scanner must skip $ occurrences inside code fences and inline command strings (see Non-goals) so shell $VAR references are never treated as skill mentions.
  • Resolve via discover_in_workspace(workspace).get(id) (skills/mod.rs:553 / :421), gate on SkillStateStore::is_enabled (skill_state.rs:73), and emit the Using skill: <name> line plus the loaded body.

Slice 2 — inline-mention popup.

  • Extend the inline-mention popup machinery in slash_menu.rs:86 (partial_inline_skill_mention_at_cursor) to recognize a $-prefixed token under the cursor and offer skill-name completions from SkillRegistry::list(), the same way the slash popup offers commands. This is a UX accelerator on top of Slice 1, not a precondition for it.

Slice 3 — ambiguity diagnostics.

  • When resolution is ambiguous, emit actionable diagnostics, e.g. "$debugging matched 3 skills: systematic-debugging, root-cause-debugging, superpowers:systematic-debugging — use $superpowers:systematic-debugging". Diagnostics name the disambiguated id the user should type next.

Deferred to 0.9.0+ (explicitly out of the first slices):

  • $ns:skill globs / wildcards ($github:*). Plain namespaced-exact ($github:gh-fix-ci) ships in Slice 1; globbing does not.
  • Per-skill tool restriction enforcement. Parsing/display can land early; enforcement and its catalog-head-stability handling are owned by TOOL_LIFECYCLE.md.
  • Multi-skill composition rules. Until defined, fall back to the "choose one" prompt (§4, §7).

7. Ambiguity / error UX, tests, and non-goals

Error / ambiguity UX examples

Input Outcome
$systematic-debugging fix the auth bug Exact hit. Using skill: systematic-debugging, body loaded, task = "fix the auth bug".
$github:gh-fix-ci inspect failing checks Namespaced-exact hit. Using skill: github:gh-fix-ci, body loaded.
$nope do a thing No match. "No skill named 'nope'. Run /skill list to see available skills." No activation; the line is sent as ordinary text.
$debugging ... (3 candidates) "$debugging matched 3 skills: systematic-debugging, root-cause-debugging, superpowers:systematic-debugging — use $superpowers:systematic-debugging." No auto-pick.
$systematic-debug ... (1 fuzzy candidate) Suggest only: "No exact skill 'systematic-debug'. Did you mean $systematic-debugging?" No silent activation.
$aleph ... but aleph disabled "Skill 'aleph' is disabled. Enable it with /skill enable aleph." No activation.
$tdd $systematic-debugging ... (2 mentions) "Choose one skill to lead this turn: $test-driven-development or $systematic-debugging." (until composition rules exist).
echo $PATH inside a code fence / command string Not a mention. Scanner skips $ inside code/command contexts.

Tests (planned)

  • Exact: $systematic-debugging resolves via get(id), activates, loads body.
  • Namespaced: $github:gh-fix-ci resolves on the ns:skill form.
  • Missing: $nope → no-match message, no activation, line passed as text.
  • Ambiguous: $debugging (≥2 candidates) → "matched N skills … use $ns:skill", asserts no auto-activation occurred.
  • Disabled: a skill with is_enabled == false → disabled message, no activation.
  • Guardrail — $ in code: $VAR inside a fenced block or command string is not treated as a mention.

Non-goals

  • Do not remove slash commands. /skill and the whole / surface stay; $ is preferred for models but additive.
  • Do not auto-run arbitrary scripts. A $mention loads guidance (and, later, a declared tool policy) — it never executes shell or skill-attached scripts on its own.
  • Do not silently activate multiple complex skills. Multi-mention falls back to a "choose one" prompt until composition rules are specified.
  • Do not let $ collide with shell variables. $ inside code fences and command strings is never parsed as a skill mention.