docs(hardening-guide): 8.6 — sandbox-architecture rationale (no code consolidation)

This commit is contained in:
Kjell Tore Guttormsen 2026-04-30 16:55:45 +02:00
commit e8ea75fe6b

View file

@ -254,7 +254,80 @@ tools.
---
## 7. Recommended baseline for production
## 7. Sandbox Architecture: Why git-clone and vsix-sandbox Stay Separate
The plugin has two sandbox-using consumers — `scanners/lib/git-clone.mjs`
(remote-repo cloning) and `scanners/lib/vsix-sandbox.mjs` (URL-fetched VS Code
/ JetBrains plugin extraction). On the surface they look like duplication
candidates: both call `sandbox-exec` on macOS, both call `bwrap` on Linux,
both fall back to in-process execution on Windows. They are intentionally not
consolidated. This section documents why.
### 7.1 Shared primitives, not shared code paths
The `sandbox-exec` profile builders and `bwrap` argument builders live in
`lib/vsix-sandbox.mjs` and are *reused* from `git-clone.mjs` — the
duplication is conceptual, not literal. Both consumers call:
- `buildSandboxProfile(allowedWriteDir)` — emits the macOS sandbox-exec
S-expression that whitelists writes only to `allowedWriteDir`.
- `buildBwrapArgs(allowedWriteDir, networkAllowed)` — emits the bwrap
argv for a unprivileged-user-namespace container with the same
write-restriction.
- `buildSandboxedWorker(dirs, workerPath)` — wraps a Node sub-process
in the platform-appropriate sandbox.
The kernel-level isolation contract is identical for both consumers.
### 7.2 Distinct setup contracts
What differs is the *git/zip side* of each pipeline. These contracts are
not interchangeable:
| Concern | git-clone.mjs | vsix-sandbox.mjs |
|---------|---------------|------------------|
| Untrusted setup vector | `.gitattributes` filter/smudge drivers | ZIP entries with `..` traversal, symlinks, ratio bombs |
| Pre-fetch hardening | `core.hooksPath=/dev/null`, `core.symlinks=false`, all LFS filters disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true` | ZIP-extractor caps (10 000 entries, 500MB uncomp, 100x ratio, depth 20), entry-by-entry path validation |
| Environment isolation | `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0` | None — fetch is plain HTTPS via `lib/vsix-fetch.mjs`, no env-var attack surface |
| Network policy | Network allowed (clone needs HTTPS) | Network allowed in fetch worker only; extraction worker is offline |
| IPC contract | None — git writes its tree directly into the sandboxed temp dir | Single-line JSON on stdout: `{ok, sha256, size, finalUrl, source, extRoot}` |
A unified "do-everything" sandbox helper would either need to know about
git config flags (irrelevant for VSIX), or would need a callback escape
hatch that re-introduces the abstraction tax it was meant to remove.
### 7.3 Consolidation deferred
Three reasons this stays as it is:
1. **Premature abstraction risk on safety-critical code.** Both modules
are on the trust boundary. A bug in shared abstraction would
simultaneously weaken both consumers; today, bugs are isolated.
2. **Two consumers is not enough signal.** The Rule of Three applies:
abstract when a third consumer arrives and the contract becomes clear,
not before.
3. **Distinct review surfaces.** Reviewers reading `git-clone.mjs` get
the full git-attack-surface story in one file; reviewers reading
`vsix-sandbox.mjs` get the full ZIP-attack-surface story in one file.
Splitting either across a generic sandbox helper would force readers
to context-switch to verify the contract.
### 7.4 Trigger condition for revisiting
This decision will be revisited if and when a third sandbox-using
consumer appears in the plugin (e.g., a sandboxed evaluator for
suspicious shell scripts, or a sandboxed PDF/PPTX parser). At that
point the shared contract — write restriction to a temp dir, network
policy, IPC shape — should be lifted into a `lib/sandbox.mjs` module
with the per-consumer setup remaining co-located with its respective
attack-surface logic.
Until then: two consumers, one set of primitives, two co-located
contracts.
---
## 8. Recommended baseline for production
1. Set `CLAUDE_CODE_EFFORT_LEVEL=xhigh` for audit and planning sessions.
2. Set `ENABLE_PROMPT_CACHING_1H=1` globally — reduces cost, does not weaken