docs(hardening-guide): 8.6 — sandbox-architecture rationale (no code consolidation)

2026-04-30 16:55:45 +02:00 · 2026-04-30 16:55:45 +02:00 · e8ea75fe6b
commit e8ea75fe6b
parent 2b7329151c
1 changed files with 74 additions and 1 deletions
--- a/plugins/llm-security/docs/security-hardening-guide.md
+++ b/plugins/llm-security/docs/security-hardening-guide.md
@ -254,7 +254,80 @@ tools.
 ---
-## 7. Recommended baseline for production
+## 7. Sandbox Architecture: Why git-clone and vsix-sandbox Stay Separate
 The plugin has two sandbox-using consumers — `scanners/lib/git-clone.mjs`
 (remote-repo cloning) and `scanners/lib/vsix-sandbox.mjs` (URL-fetched VS Code
 / JetBrains plugin extraction). On the surface they look like duplication
 candidates: both call `sandbox-exec` on macOS, both call `bwrap` on Linux,
 both fall back to in-process execution on Windows. They are intentionally not
 consolidated. This section documents why.
 ### 7.1 Shared primitives, not shared code paths
 The `sandbox-exec` profile builders and `bwrap` argument builders live in
 `lib/vsix-sandbox.mjs` and are *reused* from `git-clone.mjs` — the
 duplication is conceptual, not literal. Both consumers call:
 - `buildSandboxProfile(allowedWriteDir)` — emits the macOS sandbox-exec
  S-expression that whitelists writes only to `allowedWriteDir`.
 - `buildBwrapArgs(allowedWriteDir, networkAllowed)` — emits the bwrap
  argv for a unprivileged-user-namespace container with the same
  write-restriction.
 - `buildSandboxedWorker(dirs, workerPath)` — wraps a Node sub-process
  in the platform-appropriate sandbox.
 The kernel-level isolation contract is identical for both consumers.
 ### 7.2 Distinct setup contracts
 What differs is the *git/zip side* of each pipeline. These contracts are
 not interchangeable:
 | Concern | git-clone.mjs | vsix-sandbox.mjs |
 |---------|---------------|------------------|
 | Untrusted setup vector | `.gitattributes` filter/smudge drivers | ZIP entries with `..` traversal, symlinks, ratio bombs |
 | Pre-fetch hardening | `core.hooksPath=/dev/null`, `core.symlinks=false`, all LFS filters disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true` | ZIP-extractor caps (10 000 entries, 500MB uncomp, 100x ratio, depth 20), entry-by-entry path validation |
 | Environment isolation | `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0` | None — fetch is plain HTTPS via `lib/vsix-fetch.mjs`, no env-var attack surface |
 | Network policy | Network allowed (clone needs HTTPS) | Network allowed in fetch worker only; extraction worker is offline |
 | IPC contract | None — git writes its tree directly into the sandboxed temp dir | Single-line JSON on stdout: `{ok, sha256, size, finalUrl, source, extRoot}` |
 A unified "do-everything" sandbox helper would either need to know about
 git config flags (irrelevant for VSIX), or would need a callback escape
 hatch that re-introduces the abstraction tax it was meant to remove.
 ### 7.3 Consolidation deferred
 Three reasons this stays as it is:
 1. **Premature abstraction risk on safety-critical code.** Both modules
   are on the trust boundary. A bug in shared abstraction would
   simultaneously weaken both consumers; today, bugs are isolated.
 2. **Two consumers is not enough signal.** The Rule of Three applies:
   abstract when a third consumer arrives and the contract becomes clear,
   not before.
 3. **Distinct review surfaces.** Reviewers reading `git-clone.mjs` get
   the full git-attack-surface story in one file; reviewers reading
   `vsix-sandbox.mjs` get the full ZIP-attack-surface story in one file.
   Splitting either across a generic sandbox helper would force readers
   to context-switch to verify the contract.
 ### 7.4 Trigger condition for revisiting
 This decision will be revisited if and when a third sandbox-using
 consumer appears in the plugin (e.g., a sandboxed evaluator for
 suspicious shell scripts, or a sandboxed PDF/PPTX parser). At that
 point the shared contract — write restriction to a temp dir, network
 policy, IPC shape — should be lifted into a `lib/sandbox.mjs` module
 with the per-consumer setup remaining co-located with its respective
 attack-surface logic.
 Until then: two consumers, one set of primitives, two co-located
 contracts.
 ---
 ## 8. Recommended baseline for production
 1. Set `CLAUDE_CODE_EFFORT_LEVEL=xhigh` for audit and planning sessions.
 2. Set `ENABLE_PROMPT_CACHING_1H=1` globally — reduces cost, does not weaken