codewhale/docs/REMOTE_SETUP_DESIGN.md
Hunter Bown 06612495fc chore(release): prep v0.8.51 — Arcee provider, cycle removal, UI fixes
Release-preparation checkpoint for v0.8.51 (workspace + npm bumped to 0.8.51).

Added:
- Arcee AI direct provider: [providers.arcee], ARCEE_API_KEY/BASE_URL/MODEL,
  CLI auth, provider + model picker, registry. Default direct-API model is
  trinity-large-thinking (reasoning, 262K ctx/out); preview + mini selectable.
  Cloudflare-WAF-safe opening turn (benign read-only tool surface, system-prompt
  payload splitting) and reasoning_content replay on tool-call turns.
- Expanded model catalog (qwen3.6 flash/plus/max-preview, Xiaomi MiMo v2.5
  chat/ASR/TTS); provider-aware model picker with per-provider saved models.

Changed:
- Auto-compaction is percentage- and model-aware
  (compaction_threshold_for_model_at_percent; default 80%; auto-enable for
  <=256K windows, opt-in for 1M models).
- Provider/gateway HTTP errors sanitized (HTML/WAF interstitials collapsed,
  401/403 split into authentication vs authorization).

Removed:
- The session cycle / checkpoint-restart system: /cycles, /cycle, /recall,
  recall_archive tool, cycle_manager, cycle-handoff prompt, sidebar cycle lines,
  EngineConfig.cycle / Event::CycleAdvanced / seam cycle thresholds.

Fixed:
- Orphaned assistant 'blue dot' role glyph on whitespace-only turns.
- Sidebar mouse-wheel scroll leaking into the transcript.
- Sidebar hover tooltip overlap + warning-orange styling.
- README Constitution description corrected to match prompts/base.md.
- Repaired release-blocking unit/integration tests after the refactors.

Preflight: cargo fmt clean, workspace builds, 3903 tui tests pass (1 known
flaky MCP SSE test under parallel load, passes in isolation).
2026-06-02 17:36:18 -07:00


codewhale remote-setup — Design & Implementation Plan

Status: design (do not implement against the 0.8.48 release wrap; land on a branch or after 0.8.48 ships). Author handoff doc, mirrors the style of REFACTOR_HANDOFF.md.

Goal

One command — codewhale remote-setup — that guides a user through standing up a remote CodeWhale agent they can talk to from a phone chat app, across:

  • Cloud target: Tencent Lighthouse or Azure (extensible to GCP/Hetzner/bare).
  • Chat bridge: Feishu/Lark or Telegram (extensible to Slack/Discord).
  • Model provider: any entry in the existing PROVIDERS registry (DeepSeek, OpenAI, NVIDIA NIM, Atlascloud, WanjieArk, OpenRouter, Novita, Fireworks, Moonshot, SGLang, vLLM, Ollama, Xiaomi).

Decisions locked with the user:

  • Form: native Rust subcommand in-binary (touches crates/cli + crates/tui).
  • Scope: generate the deploy bundle and optionally auto-provision via the cloud CLI (az / CNB), behind a confirmation gate.

Prior art: Hermes Agent (reference only — do not copy)

/Volumes/VIXinSSD/hermesagent (Nous Research's Hermes Agent, Python) solves the same problem and validates this design. Use it for ideas; keep CodeWhale's style (Rust core, zero-dep Node bridges, plain-text replies).

  • Its gateway/platform_registry.py is exactly the table-driven approach here: a PlatformEntry { name, label, adapter_factory, check_fn, validate_config, required_env, install_hint, setup_fn, source }. That maps 1:1 onto our BridgeSpec/CloudTarget rows, and its per-platform setup_fn + required_env are what our wizard reads to prompt. A single gateway process fans out to many platforms — the model we want.
  • Its gateway/pairing.py mirrors our allowlist/first-pairing flow.

Telegram hardening checklist (mined from gateway/platforms/telegram.py)

That adapter is battle-tested; its method names enumerate edge cases our MVP bridge should handle. Status against integrations/telegram-bridge:

| Edge case | In Hermes | In our bridge |
|---|---|---|
| 409 polling conflict (two getUpdates) | _looks_like_polling_conflict | done: poll loop backs off 10s + warns |
| 429 retry_after | rate-limit handling | done: telegramApi honors parameters.retry_after |
| Forum "General topic = 1" send/typing asymmetry | _message_thread_id_for_send vs _for_typing | done: omit message_thread_id when id is 1 on send |
| "message to be replied not found" after restart | _send_with_dm_topic_reply_anchor_retry | sidestepped: we never set reply_to_message_id |
| Network/connect-timeout retry | _looks_like_network_error | partial: generic 3s backoff in poll loop |
| Text batching + progress-edit (edit one msg vs spam) | test_telegram_text_batching | deferred: we send a chunk every 15s |
| MarkdownV2 escaping + table rendering | _escape_mdv2, _wrap_markdown_tables | deferred: plain text (safe; tables look plain) |
| Webhook mode as an alternative to long polling | _webhook_mode | out of scope: long-poll only (no inbound ports) |

Deferred items are deliberate: progress-edit and MarkdownV2 add real UX polish but also complexity and (for MDV2) a whole class of parser-escaping bugs. Revisit after remote-setup lands.

Design principle: table-driven, like ProviderSpec

The provider registry (crates/config/src/lib.rs::PROVIDERS) is the model to copy: "adding a provider is one row." Apply the same to clouds and bridges so the matrix grows by data, not by new control flow.

        CloudTarget  ×  BridgeSpec   +  ProviderSpec (existing registry)
        ───────────     ──────────      ────────────────────────────────
        lighthouse      feishu          deepseek / openai / nvidia-nim / …
        azure           telegram        (wizard reads PROVIDERS, prompts for
        (future…)       (future…)        that provider's env_keys[0])

Clean separation that the architecture already implies:

  • Provider = a runtime.env concern. The runtime resolves the provider from CODEWHALE_PROVIDER and the provider's own key var. The bridge never needs to know which provider is behind the runtime — it only forwards model to /v1/threads. So "multi-provider" only touches runtime.env generation.
  • Cloud = where it runs + where secrets live.
  • Bridge = pure transport between a chat app and 127.0.0.1:7878.
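
To illustrate how the matrix can grow by data rather than control flow, here is a minimal sketch. The row types `BridgeRow` and `CloudRow`, the helper names, and the display strings are hypothetical and trimmed down for illustration; the proposed `BridgeSpec`/`CloudTarget` types carry more fields.

```rust
/// Hypothetical, trimmed-down bridge row (the proposed BridgeSpec has more fields).
#[derive(Debug)]
pub struct BridgeRow {
    pub slug: &'static str,
    pub display: &'static str,
}

/// Hypothetical, trimmed-down cloud row (the proposed CloudTarget has more fields).
#[derive(Debug)]
pub struct CloudRow {
    pub slug: &'static str,
    pub display: &'static str,
}

// Adding a bridge or a cloud means adding one row here; nothing below changes.
pub const BRIDGES: &[BridgeRow] = &[
    BridgeRow { slug: "feishu", display: "Feishu/Lark" },
    BridgeRow { slug: "telegram", display: "Telegram" },
];

pub const CLOUD_TARGETS: &[CloudRow] = &[
    CloudRow { slug: "lighthouse", display: "Tencent Lighthouse" },
    CloudRow { slug: "azure", display: "Azure VM" },
];

/// Resolve a --bridge value against the table.
pub fn find_bridge(slug: &str) -> Option<&'static BridgeRow> {
    BRIDGES.iter().find(|b| b.slug == slug)
}

/// Resolve a --cloud value against the table.
pub fn find_cloud(slug: &str) -> Option<&'static CloudRow> {
    CLOUD_TARGETS.iter().find(|c| c.slug == slug)
}

/// Every (cloud, bridge) pair the wizard could offer, derived purely from the tables.
pub fn matrix() -> Vec<(&'static str, &'static str)> {
    let mut pairs = Vec::new();
    for cloud in CLOUD_TARGETS {
        for bridge in BRIDGES {
            pairs.push((cloud.slug, bridge.slug));
        }
    }
    pairs
}
```

With two rows in each table, `matrix()` yields four pairs; a third cloud row would make it six without touching the lookup or iteration code.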

Command surface

New variant in crates/cli/src/lib.rs Commands:

/// Provision and configure a remote CodeWhale agent (cloud + chat bridge).
RemoteSetup(RemoteSetupArgs),

RemoteSetupArgs (clap):

| Flag | Meaning |
|---|---|
| `--cloud <azure\|lighthouse>` | Skip the cloud prompt. |
| `--bridge <telegram\|feishu>` | Skip the bridge prompt. |
| `--provider <slug>` | Provider slug; validated against PROVIDERS. |
| `--out <dir>` | Bundle output dir (default ./codewhale-deploy/<cloud>-<bridge>). |
| `--generate-only` | Emit the bundle, do not provision (default). |
| `--apply` | Run the cloud CLI to actually provision (the auto-provision path). |
| `--yes` | Skip the final confirmation gate (CI/non-interactive). |
| `--non-interactive` | Fail instead of prompting if any required value is missing. |
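
The interaction between these flags could be resolved with two small pure functions, sketched below. The names `Mode`, `resolve_mode`, and `require_value` are hypothetical, and rejecting `--generate-only` together with `--apply` is an assumption; the design above only says that generate-only is the default.

```rust
/// Hypothetical mode enum: what the wizard does after rendering the bundle.
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    GenerateOnly,
    Apply,
}

/// Map the two mode flags to a Mode.
/// Assumption: passing both flags is treated as a user error.
pub fn resolve_mode(generate_only: bool, apply: bool) -> Result<Mode, String> {
    match (generate_only, apply) {
        (true, true) => Err("--generate-only and --apply are mutually exclusive".to_string()),
        (_, true) => Ok(Mode::Apply),
        // Neither flag, or --generate-only alone: the safe default.
        _ => Ok(Mode::GenerateOnly),
    }
}

/// Enforce --non-interactive: a missing value is an error instead of a prompt.
/// Returns Ok(None) when the caller should fall through to an interactive prompt.
pub fn require_value(
    flag: &str,
    value: Option<String>,
    non_interactive: bool,
) -> Result<Option<String>, String> {
    match value {
        Some(v) => Ok(Some(v)),
        None if non_interactive => Err(format!("missing required value: {flag}")),
        None => Ok(None),
    }
}
```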

CLI delegates to the TUI binary exactly like Serve/Setup do (delegate_to_tui(&cli, &resolved_runtime, tui_args("remote-setup", args))). The implementation lives next to run_setup in crates/tui/src/.

Code layout

New module crates/tui/src/remote_setup/:

remote_setup/
  mod.rs          # run_remote_setup(): wizard orchestration + dispatch
  registry.rs     # CloudTarget + BridgeSpec tables (the matrix)
  prompt.rs       # thin stdin prompt helpers (reuse existing patterns)
  bundle.rs       # render env files / systemd units / RUNBOOK.md to --out
  provision/
    mod.rs        # Provisioner trait + confirmation gate + dry-run printer
    azure.rs      # az preflight, RG, VM+cloud-init, Key Vault, NSG, start
    lighthouse.rs # cnb.yml + tag_deploy.yml generation, CNB guidance
  templates/      # runtime.env, <bridge>.env, *.service, cloud-init.yaml.tmpl

Registry types

pub struct BridgeSpec {
    pub slug: &'static str,            // "telegram"
    pub display: &'static str,         // "Telegram"
    pub package_dir: &'static str,     // "integrations/telegram-bridge"
    pub service_unit: &'static str,    // "codewhale-telegram-bridge.service"
    pub env_template: &'static str,    // templates/telegram.env
    /// Bridge-specific secret env keys to prompt for (token, etc.).
    pub secret_keys: &'static [&'static str], // ["TELEGRAM_BOT_TOKEN"]
    /// One-liner shown before prompting (e.g. "Create a bot with @BotFather").
    pub setup_hint: &'static str,
}

pub struct CloudTarget {
    pub slug: &'static str,            // "azure"
    pub display: &'static str,         // "Azure VM"
    pub secret_store: SecretStore,     // KeyVault | EnvFile
    pub install: InstallMethod,        // Docker | NativeSystemd
    /// Builds the ordered list of provisioning steps as (description, command).
    /// Commands are returned as data so they can be dry-run printed, gated,
    /// and only then executed.
    pub plan: fn(&DeployInputs) -> Vec<ProvisionStep>,
}

A ProvisionStep { description, program, args, secret_args } is data, never a shell string — so the confirmation gate can print every command, secrets are fed via stdin/temp files (never argv/history), and --apply just executes the already-printed plan.
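
A minimal sketch of why steps-as-data helps the dry-run printer. The field types and the `render_step`/`render_plan` helpers are assumptions for illustration; the design only fixes the four field names.

```rust
/// One provisioning step held as data. Field types are assumed here.
#[derive(Debug, Clone)]
pub struct ProvisionStep {
    pub description: String,
    pub program: String,
    pub args: Vec<String>,
    /// Secret values delivered via stdin or a temp file at execution time.
    /// They are never appended to `args` and never printed.
    pub secret_args: Vec<String>,
}

/// Render one step for the confirmation gate. Only the count of secrets
/// is shown, so secret values cannot leak into the terminal or logs.
pub fn render_step(index: usize, total: usize, step: &ProvisionStep) -> String {
    let mut out = format!(
        "[{}/{}] {}\n    $ {} {}",
        index,
        total,
        step.description,
        step.program,
        step.args.join(" ")
    );
    if !step.secret_args.is_empty() {
        out.push_str(&format!(
            "\n    (+{} secret value(s) via stdin, redacted)",
            step.secret_args.len()
        ));
    }
    out
}

/// Render the whole ordered plan exactly as it would later be executed.
pub fn render_plan(steps: &[ProvisionStep]) -> String {
    steps
        .iter()
        .enumerate()
        .map(|(i, s)| render_step(i + 1, steps.len(), s))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the printer and the executor would consume the same `Vec<ProvisionStep>`, what the user confirms and what runs cannot drift apart.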

Wizard flow

  1. Cloud — pick from CLOUD_TARGETS (or --cloud).
  2. Bridge — pick from BRIDGES (or --bridge); print setup_hint.
  3. Provider — list PROVIDERS (canonical names), pick (or --provider). Look up spec.env_keys[0] as the key var to prompt for.
  4. Secrets — prompt for: provider API key, bridge token(s) from secret_keys, allowlist (chat ids). Generate a random CODEWHALE_RUNTIME_TOKEN.
  5. Mode — generate-only vs --apply.
  6. Render bundle to --out (always, even with --apply).
  7. Confirm + provision (only if --apply): print the full ordered command plan, require y (unless --yes), then execute step by step with progress.
  8. Print RUNBOOK.md path and the remaining manual steps.

Generated bundle

Written to ./codewhale-deploy/<cloud>-<bridge>/:

  • runtime.env (provider config lives here):
    CODEWHALE_PROVIDER=openai
    OPENAI_API_KEY=…              # the provider's own key var, from registry
    CODEWHALE_MODEL=auto
    CODEWHALE_RUNTIME_TOKEN=<random>
    CODEWHALE_RUNTIME_PORT=7878
    CODEWHALE_RUNTIME_WORKERS=2
    RUST_LOG=info
    
  • <bridge>.env — transport only: CODEWHALE_RUNTIME_URL=http://127.0.0.1:7878, matching CODEWHALE_RUNTIME_TOKEN, allowlist, TELEGRAM_BOT_TOKEN (or Feishu app id/secret), CODEWHALE_WORKSPACE, CODEWHALE_MODEL.
  • codewhale-runtime.service, codewhale-<bridge>.service.
  • Cloud artifact: cloud-init.yaml + provision.sh (Azure) or cnb.yml + tag_deploy.yml (Lighthouse).
  • RUNBOOK.md — the exact remaining commands + first-pairing steps.
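
The two env files could be rendered along these lines. This is a sketch only: `RuntimeEnvInputs`, `render_runtime_env`, and `render_bridge_env` are hypothetical names, the fixed values are copied from the sample above, and the bridge file is reduced to the runtime URL, the shared token, and the bridge's own secrets.

```rust
/// Hypothetical inputs gathered by the wizard for runtime.env.
pub struct RuntimeEnvInputs<'a> {
    pub provider_slug: &'a str,    // e.g. "openai"
    pub provider_key_var: &'a str, // the provider's env_keys[0], e.g. "OPENAI_API_KEY"
    pub provider_key: &'a str,
    pub runtime_token: &'a str,    // random CODEWHALE_RUNTIME_TOKEN
}

/// Render runtime.env. The provider appears only here, never in the bridge file.
pub fn render_runtime_env(i: &RuntimeEnvInputs<'_>) -> String {
    let lines = [
        format!("CODEWHALE_PROVIDER={}", i.provider_slug),
        format!("{}={}", i.provider_key_var, i.provider_key),
        "CODEWHALE_MODEL=auto".to_string(),
        format!("CODEWHALE_RUNTIME_TOKEN={}", i.runtime_token),
        "CODEWHALE_RUNTIME_PORT=7878".to_string(),
        "CODEWHALE_RUNTIME_WORKERS=2".to_string(),
        "RUST_LOG=info".to_string(),
    ];
    let mut out = lines.join("\n");
    out.push('\n');
    out
}

/// Render <bridge>.env: transport settings plus the bridge's own secrets.
/// The token must match the one written to runtime.env.
pub fn render_bridge_env(runtime_token: &str, bridge_secrets: &[(&str, &str)]) -> String {
    let mut out = String::from("CODEWHALE_RUNTIME_URL=http://127.0.0.1:7878\n");
    out.push_str(&format!("CODEWHALE_RUNTIME_TOKEN={runtime_token}\n"));
    for (key, value) in bridge_secrets {
        out.push_str(&format!("{key}={value}\n"));
    }
    out
}
```

Passing the same `runtime_token` to both functions is what lets the bundle test assert a matching runtime/bridge token.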

Auto-provision

Azure (--apply --cloud azure)

Preflight: az account show (fail with "run az login" if absent). Then the plan() emits, in order:

  1. az group create (region prompted; default eastus).
  2. az keyvault create + az keyvault secret set for the provider key and the runtime token (secrets via stdin, not argv).
  3. az vm create with --custom-data cloud-init.yaml and a system-assigned managed identity; cloud-init pulls ghcr.io/hmbown/codewhale:latest, reads the secrets from Key Vault via the identity, writes /etc/codewhale/*.env, installs both systemd units, and starts them with systemctl enable --now.
  4. NSG: SSH (22) only, scoped to the caller's IP; 7878 stays on 127.0.0.1.
  5. Print the SSH tunnel command for /status from a laptop if desired.
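
As a rough illustration of the Azure plan() returning ordered data, the sketch below covers steps 1 to 3 only and omits secret upload and the NSG rule. `AzureInputs`, `Step`, and `azure_plan` are hypothetical names, and the `Ubuntu2204` image alias is an assumption, not a decision made in this design.

```rust
/// Hypothetical subset of DeployInputs needed for the Azure plan.
pub struct AzureInputs {
    pub resource_group: String,
    pub region: String, // prompted; the design defaults to eastus
    pub vault_name: String,
    pub vm_name: String,
}

/// Simplified step: a program plus an argument vector, never a shell string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Step {
    pub description: &'static str,
    pub program: &'static str,
    pub args: Vec<String>,
}

fn strs(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Build the ordered plan. Nothing is executed here; the caller prints,
/// gates, and only then runs each step.
pub fn azure_plan(i: &AzureInputs) -> Vec<Step> {
    let rg = i.resource_group.as_str();
    let region = i.region.as_str();
    vec![
        Step {
            description: "Create resource group",
            program: "az",
            args: strs(&["group", "create", "--name", rg, "--location", region]),
        },
        Step {
            description: "Create Key Vault",
            program: "az",
            args: strs(&[
                "keyvault", "create",
                "--name", i.vault_name.as_str(),
                "--resource-group", rg,
                "--location", region,
            ]),
        },
        Step {
            description: "Create VM with cloud-init and a system-assigned identity",
            program: "az",
            args: strs(&[
                "vm", "create",
                "--resource-group", rg,
                "--name", i.vm_name.as_str(),
                "--image", "Ubuntu2204", // assumed image alias
                "--custom-data", "cloud-init.yaml",
                "--assign-identity",
            ]),
        },
    ]
}
```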

Lighthouse (--apply --cloud lighthouse)

Reuse the existing deploy/tencent-lighthouse/cnb/*.example pipeline: render cnb.yml + tag_deploy.yml from inputs and walk the user through the CNB trigger (CNB does the VM-side work). Systemd units mirror the existing codewhale-runtime.service.

Safety (matches the harness rules for outward-facing actions):

  • Every command printed before execution; y gate unless --yes.
  • Secrets never in argv or shell history.
  • --generate-only is the default; --apply is explicit.
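
The gate itself can be a pure function, which keeps it trivial to test. A minimal sketch, assuming hypothetical names `Gate` and `confirmation_gate` and assuming that only a literal "y" (case-insensitive, whitespace trimmed) counts as consent:

```rust
/// Outcome of the confirmation gate.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Execute,
    Skip,
}

/// Decide whether the printed plan may run.
/// - Without --apply nothing is ever executed.
/// - With --yes the prompt is skipped.
/// - Otherwise the user's answer must be "y"; anything else, including
///   no answer at all (e.g. closed stdin), skips execution.
pub fn confirmation_gate(apply: bool, yes: bool, answer: Option<&str>) -> Gate {
    if !apply {
        return Gate::Skip;
    }
    if yes {
        return Gate::Execute;
    }
    match answer.map(|a| a.trim().to_ascii_lowercase()) {
        Some(a) if a == "y" => Gate::Execute,
        _ => Gate::Skip,
    }
}
```

Defaulting every ambiguous case to `Gate::Skip` keeps the outward-facing path opt-in at each level.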

Namespace migration: DEEPSEEK_* → CODEWHALE_*

Follow the convention already in crates/config/src/lib.rs: read CODEWHALE_X first, fall back to DEEPSEEK_X. Nothing breaks for existing deployments.
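
The read-new-then-legacy rule can be sketched as below. The function names are hypothetical and this is not the code in crates/config; the lookup is injected as a closure so the rule can be tested without mutating the process environment.

```rust
/// Resolve a setting by suffix: CODEWHALE_<suffix> wins, DEEPSEEK_<suffix>
/// is the legacy fallback. `lookup` abstracts the environment.
pub fn env_with_fallback<F>(suffix: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup(&format!("CODEWHALE_{suffix}"))
        .or_else(|| lookup(&format!("DEEPSEEK_{suffix}")))
}

/// Convenience wrapper over the real process environment.
pub fn env_from_process(suffix: &str) -> Option<String> {
    env_with_fallback(suffix, |key| std::env::var(key).ok())
}
```

For example, `env_from_process("RUNTIME_TOKEN")` would return the value of CODEWHALE_RUNTIME_TOKEN if set, otherwise DEEPSEEK_RUNTIME_TOKEN, otherwise `None`.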

Touch list:

  1. Bridges (integrations/feishu-bridge, integrations/telegram-bridge): in lib.mjs/index.mjs, read process.env.CODEWHALE_X ?? process.env.DEEPSEEK_X for RUNTIME_URL, RUNTIME_TOKEN, WORKSPACE, MODEL, MODE, ALLOW_SHELL, TRUST_MODE, AUTO_APPROVE, CHAT_ALLOWLIST, ALLOW_UNLISTED, TURN_TIMEOUT_MS. Validators accept either; templates emit CODEWHALE_*.
  2. Deploy units (deploy/tencent-lighthouse/systemd/*, integrations/*/deploy/*): DEEPSEEK_RUNTIME_* → CODEWHALE_RUNTIME_*, env file paths /etc/deepseek/ → /etc/codewhale/ (keep reading the old path if present).
  3. .env.example files + config.example.toml: lead with CODEWHALE_*, document DEEPSEEK_* as legacy aliases.
  4. Drop DeepSeek-shaped defaults in the bridge: no hardcoded DEEPSEEK_MODEL=auto; the provider lives in runtime.env via CODEWHALE_PROVIDER + the registry's key var.

Note: items 1-3 touch tracked files, so they are part of the same "don't ship during 0.8.48" hold. The brand-new (untracked) Telegram bridge can be converted to CODEWHALE_* first as the reference implementation.

Tests

  • registry.rs: every CloudTarget/BridgeSpec slug is unique; each bridge's package_dir/service_unit/env_template exists.
  • bundle.rs: rendering a bundle for each cloud×bridge×provider triple produces files with CODEWHALE_* keys, a matching runtime/bridge token, and a non-empty RUNBOOK.
  • provision: plan() returns the expected ordered steps; commands are built but never executed in tests (assert on program+args, secrets redacted).
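
The slug-uniqueness check for registry.rs could rest on a helper like the one below. The name `duplicate_slugs` is hypothetical; a real test would feed it the slugs from the CloudTarget and BridgeSpec tables.

```rust
use std::collections::HashSet;

/// Return each slug that appears more than once, in first-repeat order.
/// An empty result means every slug in the registry is unique.
pub fn duplicate_slugs<'a>(slugs: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for slug in slugs {
        // insert() returns false when the slug was already present.
        if !seen.insert(slug) && !dups.contains(&slug) {
            dups.push(slug);
        }
    }
    dups
}
```

Returning the offending slugs, rather than a bare boolean, gives a failing test a message that points straight at the duplicated row.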

Extensibility check

  • Add GCP: one CloudTarget row + a provision/gcp.rs + a cloud-init reuse.
  • Add Slack: one BridgeSpec row + integrations/slack-bridge + template. No changes to the wizard control flow — it iterates the registries.

Suggested sequencing (given the 0.8.48 freeze)

  1. Now (safe, untracked): convert the new Telegram bridge to CODEWHALE_* ?? DEEPSEEK_*; finalize this design.
  2. Post-0.8.48, branch: namespace migration on tracked bridges + deploy units.
  3. Then: implement remote-setup (registry → bundle → Azure provisioner → Lighthouse provisioner), generate-only first, --apply second.