Files
codewhale/crates
Hunter Bown f964e2fd37 fix(snapshot): cap workspace size at 2 GB before first side-repo init
Users reported running `deepseek-tui` inside project directories
with hundreds of GB of content — ML datasets, model weights
(`.safetensors`, `.gguf`, `.pt`, `.onnx`), Docker image dumps,
parquet / arrow caches, anything that falls outside the snapshot
built-in excludes. The pre/post-turn snapshot path called
`SnapshotRepo::open_or_init` which initialized the side git repo
and then ran `git add -A` — which walked the entire workspace
indexing every file. On a 100-300 GB directory this hung the TUI
for minutes-to-hours while git churned through the index.

The pre-existing v0.8.27 fixes (#1112: retention cap, mid-session
prune, expanded built-in excludes) addressed the orthogonal
"snapshots grow unbounded over many turns" angle but did nothing
to prevent the first snapshot from being impossible to take.

This change adds `estimate_workspace_size_bounded()` — a bounded
`ignore::WalkBuilder` walk that respects `.gitignore` and the
snapshot module's existing skip list (`node_modules/`, `target/`,
`.next/`, `.venv/`, `__pycache__/`, etc.). The walk early-exits
at either the byte cap or 200,000 file entries, returning `None`
to signal "too big to snapshot."

`SnapshotRepo::open_or_init_with_cap(workspace, cap_bytes)` calls
the estimator *before* the side `git init`, and returns
`Err(InvalidInput)` with a "workspace too large" reason — which
`turn::snapshot_with_label` already logs at WARN and continues
past, so a too-large workspace silently disables snapshots
without blocking any turn. The check is paid only on first init;
subsequent snapshots through the existing side repo skip it.

Plumbing:
- `SnapshotsConfig.max_workspace_gb` (default 2, `0` disables)
- `EngineConfig.snapshots_max_workspace_bytes` resolved at engine
  construction from `config.snapshots_config().max_workspace_gb`
- `pre_turn_snapshot` / `post_turn_snapshot` / `pre_tool_snapshot`
  take a `cap_bytes: u64` argument threaded from the engine
- `SnapshotRepo::open_or_init` retains its v0.8.31 signature as a
  thin wrapper over `open_or_init_with_cap` using the default cap
- `config.example.toml` documents the new `max_workspace_gb` knob
  with the "set to 0 to disable" escape hatch for users with
  legitimate large monorepos

Six new tests pin both the estimator (under-cap returns Some,
over-cap returns None, builtin-excluded dirs skipped, cap=0
disables the bound) and the `open_or_init_with_cap` integration
(oversized workspace fails with the right error and references
the config knob; cap=0 succeeds even on oversized content).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 02:11:10 -05:00
..
2026-05-11 23:07:02 -05:00
2026-05-11 23:07:02 -05:00
2026-05-11 23:07:02 -05:00
2026-05-11 23:07:02 -05:00
2026-05-11 23:07:02 -05:00
2026-05-11 23:07:02 -05:00