Kjell Tore Guttormsen e8ea75fe6b docs(hardening-guide): 8.6 — sandbox-architecture rationale (no code consolidation)

2026-04-30 16:55:45 +02:00


Security Hardening Guide

This guide documents the environment variables, sandboxing mechanisms, and hook modes available in llm-security, and how to align them with the capabilities of Opus 4.7 and Claude Code 2.1.112.

The guide is opinionated: it describes the configurations the plugin authors run in production. Deviations are fine, but the defaults here are the tested path.

1. Environment variables

1.1 Harness-level (Claude Code)

| Variable | Values | Effect |
| --- | --- | --- |
| `CLAUDE_CODE_EFFORT_LEVEL` | `low` \| `medium` \| `high` \| `xhigh` | Tunes how aggressively the model spends compute per turn. `xhigh` is recommended for security-sensitive planning and audits. |
| `ENABLE_PROMPT_CACHING_1H` | `1` \| unset | Enables 1-hour prompt cache TTL. Reduces cost and latency for repeated context; cache hits do not weaken scanning. |
| `CLAUDE_CODE_SCRIPT_CAPS` | JSON blob | Declares maximum capabilities Claude Code can grant scripts it spawns. Use to lock down hook and command execution. |

1.2 Plugin-specific hook modes

| Variable | Default | Modes / effect |
| --- | --- | --- |
| `LLM_SECURITY_INJECTION_MODE` | `block` | `block`: exit 2 on critical/high injection patterns. `warn`: advisory via systemMessage. `off`: disables scan. |
| `LLM_SECURITY_TRIFECTA_MODE` | `warn` | `block`: exit 2 when lethal trifecta (untrusted input + sensitive data + exfiltration sink) detected. `warn`: advisory. `off`: disables. |
| `LLM_SECURITY_PRECOMPACT_MODE` | `warn` | `block`: exit 2 on findings during PreCompact. `warn`: advisory via systemMessage. `off`: disables scan. |
| `LLM_SECURITY_PRECOMPACT_MAX_BYTES` | `512000` | Tail size in bytes read from transcript for scanning. Higher values increase coverage at the cost of latency. |
| `LLM_SECURITY_UPDATE_CHECK` | `on` | `off` disables the daily update-check HTTP call. |
| `LLM_SECURITY_AUDIT_*` | unset | Audit trail configuration (destination, format, etc.) for SIEM-ready JSONL output. |

Apply env vars via shell profile, .envrc, or the host MDM. Do not write them into the repository.
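
As a rough illustration of how the defaults in the table above could be applied, the sketch below resolves settings from an env-like object. The helper name `resolveHookSettingsSketch` and the fall-back-to-default behaviour for unrecognised values are assumptions made for this example, not the plugin's documented behaviour.

```javascript
// Defaults copied from the table in section 1.2; the rest is illustrative.
const HOOK_MODE_DEFAULTS_SKETCH = {
  LLM_SECURITY_INJECTION_MODE: 'block',
  LLM_SECURITY_TRIFECTA_MODE: 'warn',
  LLM_SECURITY_PRECOMPACT_MODE: 'warn',
};
const VALID_HOOK_MODES_SKETCH = new Set(['block', 'warn', 'off']);

// Hypothetical helper: resolve effective settings from an env-like object.
function resolveHookSettingsSketch(env) {
  const settings = {};
  for (const [name, fallback] of Object.entries(HOOK_MODE_DEFAULTS_SKETCH)) {
    const raw = String(env[name] || '').trim().toLowerCase();
    // Assumption: unknown or missing values fall back to the documented default.
    settings[name] = VALID_HOOK_MODES_SKETCH.has(raw) ? raw : fallback;
  }
  const bytes = Number.parseInt(env.LLM_SECURITY_PRECOMPACT_MAX_BYTES, 10);
  settings.LLM_SECURITY_PRECOMPACT_MAX_BYTES =
    Number.isInteger(bytes) && bytes > 0 ? bytes : 512000;
  const update = String(env.LLM_SECURITY_UPDATE_CHECK || '').trim().toLowerCase();
  settings.LLM_SECURITY_UPDATE_CHECK = update === 'off' ? 'off' : 'on';
  return settings;
}
```

Calling `resolveHookSettingsSketch(process.env)` at hook start-up would give one place to log the effective configuration.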

2. Sandboxing

2.1 macOS — `sandbox-exec`

scanners/lib/git-clone.mjs wraps remote clones in a sandbox-exec profile that restricts file writes to the specific temp directory. This defends against malicious .gitattributes filter/smudge drivers. The plugin uses this path by default on Darwin.

2.2 Linux — `bubblewrap` (bwrap)

On Linux, the same flow uses bwrap to achieve equivalent isolation. It works on Fedora and Arch without configuration. Ubuntu 24.04+ may require a permissive AppArmor profile, which needs administrator privileges; without one, the plugin falls back to git-config flags only and logs a WARN in the clone audit trail.

2.3 Windows

Windows has no equivalent OS sandbox available in default installs. The plugin falls back to hardened git-config flags (core.hooksPath=/dev/null, core.symlinks=false, disabled LFS drivers, protocol.file.allow=never, transfer.fsckObjects=true) and environment isolation (GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1). A WARN is logged so the caller can weigh the residual risk.
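
The fallback described above can be pictured as a git invocation with `-c` overrides and a restricted environment. The sketch below is an illustration only: the helper name and return shape are hypothetical, and the `filter.lfs.*` entries are one assumed way of expressing "disabled LFS drivers", not necessarily what git-clone.mjs does.

```javascript
// Config flags and env vars are the ones listed in section 2.3.
// The helper name and return shape are hypothetical.
function buildHardenedCloneSketch(url, destDir) {
  const config = [
    'core.hooksPath=/dev/null',
    'core.symlinks=false',
    'protocol.file.allow=never',
    'transfer.fsckObjects=true',
    // Assumption: LFS drivers are disabled by blanking the filter commands.
    'filter.lfs.smudge=',
    'filter.lfs.clean=',
    'filter.lfs.process=',
    'filter.lfs.required=false',
  ];
  const args = [];
  for (const entry of config) args.push('-c', entry);
  // "--" stops option parsing so a hostile URL cannot be read as a flag.
  args.push('clone', '--', url, destDir);
  const env = {
    GIT_CONFIG_NOSYSTEM: '1',
    GIT_CONFIG_GLOBAL: '/dev/null',
    GIT_ATTR_NOSYSTEM: '1',
  };
  // A caller would merge env with the minimum it needs (for example PATH) before spawning.
  return { command: 'git', args, env };
}
```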

2.4 PID-namespace considerations

On Linux hosts with user namespaces disabled (some hardened kernels), bwrap may fail to create the PID namespace. Prefer running scans from a normal user shell; avoid root, which disables user-namespace confinement.

3. Hook modes in practice

3.1 Start in warn mode

Every new integration of llm-security should begin with all modes set to warn. This yields advisories without breaking workflow, and lets the team calibrate false-positive rates against their actual repositories.

3.2 Promote to block after baselining

After a baseline period (typically 1-2 weeks), flip each mode to block in this order: LLM_SECURITY_INJECTION_MODE, LLM_SECURITY_TRIFECTA_MODE, LLM_SECURITY_PRECOMPACT_MODE. The injection hook goes first because false positives there are the most visible; the PreCompact hook goes last, once the other two have built confidence.

3.3 Off mode is a deliberate choice

Use off only when you explicitly need to disable a layer (e.g., during performance profiling). Prefer warn in all other cases — the signal is still recorded in the audit trail.
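
To make the three modes concrete, here is a minimal sketch of how a hook might turn a mode and a list of findings into an outcome. The function name, the `severity` field, and the returned object shape are assumptions for illustration; only the exit code 2 for block and the systemMessage advisory for warn come from the table in §1.2.

```javascript
// Hypothetical helper: map a hook mode plus scan findings to an outcome.
function decideHookOutcomeSketch(mode, findings) {
  // off: the scan is skipped entirely, so nothing is reported.
  if (mode === 'off') return { exitCode: 0, scanned: false, systemMessage: null };

  // Assumption: only critical/high findings trigger an advisory or a block.
  const serious = findings.filter(
    (f) => f.severity === 'critical' || f.severity === 'high'
  );
  if (serious.length === 0) return { exitCode: 0, scanned: true, systemMessage: null };

  const summary = `llm-security: ${serious.length} critical/high finding(s)`;
  // block: exit code 2 stops the action; warn: advisory only, exit code 0.
  if (mode === 'block') return { exitCode: 2, scanned: true, systemMessage: summary };
  return { exitCode: 0, scanned: true, systemMessage: summary };
}
```

The same findings produce the same message in warn and block; only the exit code differs, which is why a warn-mode baseline predicts what block mode will later stop.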

4. Bash normalization (T1-T6) as defense-in-depth

scanners/lib/bash-normalize.mjs collapses six known bash obfuscation techniques before the denylist gate runs. These are defense-in-depth layers that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.

The plugin's "defense-in-depth" claim resolves to three independent detection layers with documented bypass classes: (1) the Claude Code harness denylist (out of plugin scope, evolves with the platform); (2) the bash-normalize.mjs T1-T6 collapse rules; (3) pre-bash-destructive.mjs post-normalization pattern matching, together with post-session-guard.mjs runtime trifecta correlation. Each layer has known bypasses (see Defense Philosophy in CLAUDE.md and docs/critical-review-2026-04-20.md §4 for the evasion arsenal). Stacking layers raises attacker cost; it does not provide formal worst-case guarantees.

| Layer | Technique | Example | Normalization |
| --- | --- | --- | --- |
| T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |
| T2 | `${}` expansion | `r${x}m -rf /` | drop `${VAR}` where VAR is unset in scan context |
| T3 | Backslash continuation | `rm\<newline>-rf /` | collapse backslash-newline pairs |
| T4 | Tab/whitespace splitting | `rm\t-rf /` | normalize whitespace to single space |
| T5 | `${IFS}` word-splitting | `rm${IFS}-rf${IFS}/` | replace `${IFS}` with space |
| T6 | ANSI-C hex quoting | `$'\x72\x6d' -rf /` | decode `$'\xHH'` to ASCII byte |

See CLAUDE.md §Defense Philosophy for the broader framing.
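
The T1-T6 rules in the table can be approximated with a handful of regular-expression passes. The sketch below is a simplified illustration, not the contents of bash-normalize.mjs: the function name, the order of the passes, and the decision to treat every remaining `${VAR}` as unset are assumptions.

```javascript
// Simplified illustration of the T1-T6 collapse rules (hypothetical helper).
function normalizeBashSketch(command) {
  let out = command;
  // T3: backslash + newline is a line continuation; remove the pair.
  out = out.replace(/\\\r?\n/g, '');
  // T6: decode ANSI-C quoted hex escapes such as $'\x72\x6d'.
  out = out.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
    body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    )
  );
  // T5: ${IFS} acts as a word separator; replace it with a space.
  out = out.replace(/\$\{IFS\}/g, ' ');
  // T2: drop remaining ${VAR} expansions (assumed unset in the scan context).
  out = out.replace(/\$\{[A-Za-z_][A-Za-z0-9_]*\}/g, '');
  // T1: strip empty quote pairs.
  out = out.replace(/''|""/g, '');
  // T4: collapse tabs and repeated whitespace to a single space.
  out = out.replace(/\s+/g, ' ').trim();
  return out;
}
```

`${IFS}` is handled before the generic `${VAR}` rule so that it becomes a separator instead of disappearing. A denylist pattern such as `rm -rf /` would then be matched against the normalized string, not the raw command.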

5. Alignment with Opus 4.7 (system card references)

5.1 Agent safety evaluations (§5.2.1)

The Opus 4.7 system card §5.2.1 documents agentic safety evaluations and notes that multi-layer defenses outperform single-layer defenses against adaptive attacks. llm-security implements this posture: prompt-scan + pathguard + trifecta-guard + pre-compact-scan operate in depth. A single layer failing does not compromise the defense.

5.2 Instruction following and hierarchy (§6.3.1.1)

The Opus 4.7 system card §6.3.1.1 describes tighter adherence to the declared instruction hierarchy and more literal interpretation of agent instructions. Consequently:

- Stacked imperatives (e.g., "NEVER do X / MUST NOT do X") are less useful than tool-level enforcement. Prefer tools: frontmatter to restrict capabilities at the platform level, so the agent simply does not have the unsafe tool.
- Agent instructions should mark speculation as speculation, and cite evidence (path, line number) rather than generalizing from one observation. See the "Step 0 Generaliseringsgrense" (generalization boundary) note added to skill-scanner-agent.md and mcp-scanner-agent.md.
- Parallel Read calls are preferred for independent file reads, as documented in the same Step 0 notes. This reduces latency and aligns with the model's improved parallel-tool-use behavior.

5.3 Known limitations (system card §6.3)

Prompt injection is structurally unsolvable in the current architecture. The system card acknowledges this; so does CLAUDE.md §Defense Philosophy. The hardening described here reduces the attack surface and raises the cost of attacks but does not eliminate them.

6. Calibration & false positives (v7.0.0+)

Security scanners live or die by their signal-to-noise ratio. A scanner that cries "extreme" on every project destroys its own credibility — users learn to ignore findings, and genuine threats slip past. v7.0.0 ships three calibration layers to keep that from happening.

6.1 Risk-score v2 formula

The v1 formula was a sum-and-cap: critical*25 + high*10 + medium*4 + low*1, capped at 100. Every non-trivial scan collapsed to 100/Extreme regardless of actual distribution. A codebase with 2 mediums and 100 lows scored the same as a codebase with 5 criticals.
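
The collapse is easy to reproduce. The sketch below re-implements the sum-and-cap formula exactly as stated above; the function name and the shape of the counts object are chosen for this example and are not taken from severity.mjs.

```javascript
// Sum-and-cap formula as described in the text (illustrative re-implementation).
function riskScoreV1Sketch({ critical = 0, high = 0, medium = 0, low = 0 }) {
  const raw = critical * 25 + high * 10 + medium * 4 + low * 1;
  return Math.min(100, raw);
}

// 2 mediums + 100 lows: 2*4 + 100*1 = 108, capped to 100.
const noisyCodebaseScore = riskScoreV1Sketch({ medium: 2, low: 100 });
// 5 criticals: 5*25 = 125, capped to 100.
const dangerousCodebaseScore = riskScoreV1Sketch({ critical: 5 });
```

Both scores come out at 100, so the number carries no information about which codebase is actually worse.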

v2 (scanners/lib/severity.mjs) is severity-dominated and log-scaled within tier:

| Finding mix | Score range | Band |
| --- | --- | --- |
| Critical present | 70–95 (1=80, 2=86, 4=90, 10=95) | Critical/Extreme |
| High only | 40–65 (1=48, 5=60, 17=65) | High |
| Medium only | 15–35 (1=20, 5=28, 50=33) | Medium |
| Low only | 1–11 (1=4, 10=11) | Low |
| None | 0 | Low |

Verdict cutoffs (BLOCK ≥65, WARNING ≥15) are locked to the riskBand() boundaries so you can't get a "BLOCK / Medium band" contradiction. The legacy formula is kept as riskScoreV1() for reference only.
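
A minimal sketch of the verdict cutoffs, assuming they are applied as simple thresholds on the v2 score. The function name and the `ALLOW` label for scores below 15 are hypothetical; the guide only names BLOCK and WARNING.

```javascript
// Thresholds from the text: BLOCK at 65 and above, WARNING at 15 and above.
function verdictForScoreSketch(score) {
  if (score >= 65) return 'BLOCK';
  if (score >= 15) return 'WARNING';
  return 'ALLOW'; // hypothetical label; the guide does not name the lowest verdict
}
```

Using the anchor points from the table, a single high finding (48) lands on WARNING, a single critical (80) on BLOCK, and ten lows (11) stay below both cutoffs.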

CI impact: Pipelines with --fail-on high keep working (the severity gate is unaffected). Pipelines with score-based thresholds need recalibration — old score >= 21 corresponds roughly to new score >= 15.

6.2 Context-aware entropy scanner

The entropy scanner flags high-Shannon-entropy strings as possible credentials. On codebases heavy with shader code, bundled JS, CSS-in-JS or SQL it produced astronomical false-positive rates. v7.0.0 adds three suppression layers:

- File-extension skip: files with the following extensions are never inspected for entropy findings (.glsl, .frag, .vert, .shader, .wgsl, .css, .scss, .sass, .less, .svg, plus compound .min.js, .min.css, .map). A skip counter (calibration.files_skipped_by_extension) is reported in the scanner envelope.
- Line-level rules 11–18, applied when a line contains any of the following: GLSL keywords (uniform, vec3, texture2D…), CSS-in-JS templates (styled.…), inline <svg> markup, ffmpeg filter_complex syntax, browser User-Agent strings, SQL DDL on a dedicated line (^\s*(SELECT|INSERT|…)), throw new Error(`…`) templates, or markdown image syntax with an external URL (common in JSON content indexes / article metadata).
- Per-project policy override: the entropy section of .llm-security/policy.json supports:

```json
{
  "entropy": {
    "thresholds": {
      "critical": { "entropy": 5.4, "minLen": 128 },
      "high":     { "entropy": 5.1, "minLen": 64 },
      "medium":   { "entropy": 4.7, "minLen": 40 }
    },
    "suppress_extensions": [".custom"],
    "suppress_line_patterns": ["MY_VENDOR_MARKER"],
    "suppress_paths": ["vendored/", "generated/"]
  }
}
```

The synthesizer agent reports calibration prominently if >80 % of files were skipped (signals a policy so aggressive the scan is effectively bypassed) and omits it silently if <5 % were skipped.
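
For readers unfamiliar with the metric, the sketch below computes Shannon entropy in bits per character and applies thresholds shaped like the policy example. It assumes a string must meet both the entropy and the minLen value of a tier to be classified; that reading, the helper names, and the extension check are illustrative and not taken from the scanner source.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropySketch(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Threshold values copied from the policy example above.
const ENTROPY_THRESHOLDS_SKETCH = {
  critical: { entropy: 5.4, minLen: 128 },
  high: { entropy: 5.1, minLen: 64 },
  medium: { entropy: 4.7, minLen: 40 },
};

// Assumption: a tier applies only when both entropy and length meet its minimums.
function classifyEntropySketch(text, thresholds = ENTROPY_THRESHOLDS_SKETCH) {
  const h = shannonEntropySketch(text);
  for (const tier of ['critical', 'high', 'medium']) {
    const t = thresholds[tier];
    if (text.length >= t.minLen && h >= t.entropy) return tier;
  }
  return null;
}

// Hypothetical extension check; endsWith also covers compound suffixes like .min.js.
function isSkippedByExtensionSketch(filePath, extensions) {
  const lower = filePath.toLowerCase();
  return extensions.some((ext) => lower.endsWith(ext));
}
```

Raising a tier's `entropy` or `minLen` in the policy makes that tier harder to reach, which is why §6.4 treats threshold changes as a last resort.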

6.3 Typosquat allowlist

The DEP scanner flags Levenshtein-close package names against a top-N list to catch typosquats (lod-ash, expres). On real codebases this tripped on short-name tools like knip, nx, tsx, uv, ruff. v7.0.0 extends knowledge/typosquat-allowlist.json with 22 npm + 5 PyPI entries for modern tools.
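
The interaction between the distance check and the allowlist can be sketched as follows. The Levenshtein routine is the standard dynamic-programming algorithm; the wrapper name, the distance cutoff of 1, and the tiny "popular" list used in the usage note are assumptions for illustration only.

```javascript
// Standard Levenshtein edit distance using two rolling rows.
function levenshteinSketch(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical check: returns the popular package a name may be squatting, or null.
function findTyposquatSketch(name, popular, allowlist, maxDistance = 1) {
  // Allowlisted and exactly-matching names are never flagged.
  if (allowlist.has(name) || popular.includes(name)) return null;
  for (const known of popular) {
    if (levenshteinSketch(name, known) <= maxDistance) return known;
  }
  return null;
}
```

With an illustrative popular list of `['express', 'lodash', 'npx']`, `expres` and `lod-ash` are flagged, and so is `nx` (one edit from `npx`) until it is added to the allowlist. Short names are disproportionately affected because a single edit is a large fraction of the name.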

6.4 Tuning workflow

1. Run /security deep-scan on a representative codebase.
2. Read calibration.files_skipped_by_extension and files_skipped_by_path from the envelope. Are they reasonable?
3. Review the top 10 findings. For each false positive, pick the narrowest suppression that catches it:
   - Whole extension noisy → suppress_extensions
   - One line pattern recurring → suppress_line_patterns
   - Whole directory vendored → suppress_paths
4. Raise thresholds only as a last resort, because doing so hides real signal.
5. Re-scan and verify verdict/band/score make sense relative to the finding set.

7. Sandbox Architecture: Why git-clone and vsix-sandbox Stay Separate

The plugin has two sandbox-using consumers — scanners/lib/git-clone.mjs (remote-repo cloning) and scanners/lib/vsix-sandbox.mjs (URL-fetched VS Code / JetBrains plugin extraction). On the surface they look like duplication candidates: both call sandbox-exec on macOS, both call bwrap on Linux, both fall back to in-process execution on Windows. They are intentionally not consolidated. This section documents why.

7.1 Shared primitives, not shared code paths

The sandbox-exec profile builders and bwrap argument builders live in lib/vsix-sandbox.mjs and are reused by git-clone.mjs, so the duplication is conceptual, not literal. Both consumers call:

- buildSandboxProfile(allowedWriteDir): emits the macOS sandbox-exec S-expression that whitelists writes only to allowedWriteDir.
- buildBwrapArgs(allowedWriteDir, networkAllowed): emits the bwrap argv for an unprivileged-user-namespace container with the same write restriction.
- buildSandboxedWorker(dirs, workerPath): wraps a Node sub-process in the platform-appropriate sandbox.

The kernel-level isolation contract is identical for both consumers.
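
To give a feel for what such builders emit, here is a hedged sketch of the two shapes involved. It is not the code in lib/vsix-sandbox.mjs: the function names carry a `Sketch` suffix to mark them as hypothetical, the profile is a minimal allow-default/deny-writes policy, and the bwrap flags are a plausible subset chosen for this example.

```javascript
// Minimal sandbox-exec profile: allow everything except writes outside one directory.
function buildSandboxProfileSketch(allowedWriteDir) {
  // JSON.stringify is used as an approximation of profile string quoting.
  const dir = JSON.stringify(allowedWriteDir);
  return [
    '(version 1)',
    '(allow default)',
    '(deny file-write*)',
    `(allow file-write* (subpath ${dir}))`,
  ].join('\n');
}

// bwrap argv: read-only root, one writable bind mount, optional network isolation.
function buildBwrapArgsSketch(allowedWriteDir, networkAllowed) {
  const args = [
    '--ro-bind', '/', '/',
    '--dev', '/dev',
    '--proc', '/proc',
    // The writable bind comes after the read-only root so it overrides it.
    '--bind', allowedWriteDir, allowedWriteDir,
    '--unshare-pid',
    '--die-with-parent',
  ];
  if (!networkAllowed) args.push('--unshare-net');
  return args;
}
```

A caller on macOS would pass the profile string to `sandbox-exec -p`, and on Linux would prepend `bwrap` and append the command to run. A production profile would likely need extra allowances (for example writes to `/dev/null`) that this sketch leaves out.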

7.2 Distinct setup contracts

What differs is the git/zip side of each pipeline. These contracts are not interchangeable:

| Concern | git-clone.mjs | vsix-sandbox.mjs |
| --- | --- | --- |
| Untrusted setup vector | `.gitattributes` filter/smudge drivers | ZIP entries with `..` traversal, symlinks, ratio bombs |
| Pre-fetch hardening | `core.hooksPath=/dev/null`, `core.symlinks=false`, all LFS filters disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true` | ZIP-extractor caps (10 000 entries, 500 MB uncompressed, 100x ratio, depth 20), entry-by-entry path validation |
| Environment isolation | `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0` | None: fetch is plain HTTPS via `lib/vsix-fetch.mjs`, no env-var attack surface |
| Network policy | Network allowed (clone needs HTTPS) | Network allowed in fetch worker only; extraction worker is offline |
| IPC contract | None: git writes its tree directly into the sandboxed temp dir | Single-line JSON on stdout: `{ok, sha256, size, finalUrl, source, extRoot}` |
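
The ZIP-side caps in the table can be illustrated with a small pre-extraction validator. The entry shape (`name`, `compressedSize`, `uncompressedSize`, `isSymlink`), the reason strings, the decimal reading of "500MB", and applying the ratio cap to archive totals are all assumptions made for this sketch.

```javascript
// Cap values taken from the table above; units and scope are assumptions.
const ZIP_CAPS_SKETCH = {
  maxEntries: 10000,
  maxUncompressedBytes: 500 * 1000 * 1000, // assumption: decimal megabytes
  maxRatio: 100,
  maxDepth: 20,
};

// Hypothetical validator run over the archive's entry list before extraction.
function validateZipEntriesSketch(entries, caps = ZIP_CAPS_SKETCH) {
  if (entries.length > caps.maxEntries) return { ok: false, reason: 'too-many-entries' };
  let compressed = 0;
  let uncompressed = 0;
  for (const entry of entries) {
    const parts = entry.name.split(/[\\/]+/).filter((p) => p !== '' && p !== '.');
    // Reject absolute paths, drive letters, and any ".." segment.
    const absolute = /^([\\/]|[A-Za-z]:)/.test(entry.name);
    if (absolute || parts.includes('..')) {
      return { ok: false, reason: 'path-traversal', entry: entry.name };
    }
    if (entry.isSymlink) return { ok: false, reason: 'symlink', entry: entry.name };
    if (parts.length > caps.maxDepth) return { ok: false, reason: 'too-deep', entry: entry.name };
    compressed += entry.compressedSize;
    uncompressed += entry.uncompressedSize;
    if (uncompressed > caps.maxUncompressedBytes) return { ok: false, reason: 'too-large' };
  }
  // Assumption: the ratio cap applies to archive totals, not to each entry.
  if (compressed > 0 && uncompressed / compressed > caps.maxRatio) {
    return { ok: false, reason: 'ratio' };
  }
  return { ok: true };
}
```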

A unified "do-everything" sandbox helper would either need to know about git config flags (irrelevant for VSIX), or would need a callback escape hatch that re-introduces the abstraction tax it was meant to remove.
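
The VSIX-side IPC contract is narrow enough to validate strictly on the parent side. The sketch below parses a worker's stdout under that contract; the function name, the field checks, and the assumption that a failed worker (`ok: false`) may omit the remaining fields are illustrative, not documented behaviour.

```javascript
// Field names from the IPC contract row in the table above.
const WORKER_RESULT_KEYS_SKETCH = ['ok', 'sha256', 'size', 'finalUrl', 'source', 'extRoot'];

// Hypothetical parent-side parser for the worker's single-line JSON result.
function parseWorkerResultSketch(stdout) {
  const lines = stdout.split('\n').filter((line) => line.trim() !== '');
  if (lines.length !== 1) {
    throw new Error(`expected exactly one JSON line, got ${lines.length}`);
  }
  const result = JSON.parse(lines[0]);
  if (result === null || typeof result !== 'object' || Array.isArray(result)) {
    throw new Error('worker result must be a JSON object');
  }
  if (typeof result.ok !== 'boolean') throw new Error('missing boolean "ok"');
  // Assumption: only successful results must carry every field.
  if (result.ok) {
    for (const key of WORKER_RESULT_KEYS_SKETCH) {
      if (!(key in result)) throw new Error(`missing field "${key}"`);
    }
    if (!/^[0-9a-f]{64}$/i.test(String(result.sha256))) {
      throw new Error('sha256 must be 64 hex characters');
    }
  }
  return result;
}
```

Rejecting anything other than exactly one line keeps stray output from the untrusted side from being interpreted as part of the result.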

7.3 Consolidation deferred

Three reasons this stays as it is:

- Premature abstraction risk on safety-critical code. Both modules are on the trust boundary. A bug in a shared abstraction would weaken both consumers at once; today, bugs are isolated.
- Two consumers is not enough signal. The Rule of Three applies: abstract when a third consumer arrives and the contract becomes clear, not before.
- Distinct review surfaces. Reviewers reading git-clone.mjs get the full git-attack-surface story in one file; reviewers reading vsix-sandbox.mjs get the full ZIP-attack-surface story in one file. Splitting either across a generic sandbox helper would force readers to context-switch to verify the contract.

7.4 Trigger condition for revisiting

This decision will be revisited if and when a third sandbox-using consumer appears in the plugin (e.g., a sandboxed evaluator for suspicious shell scripts, or a sandboxed PDF/PPTX parser). At that point the shared contract — write restriction to a temp dir, network policy, IPC shape — should be lifted into a lib/sandbox.mjs module with the per-consumer setup remaining co-located with its respective attack-surface logic.

Until then: two consumers, one set of primitives, two co-located contracts.

8. Recommended baseline for production

- Set CLAUDE_CODE_EFFORT_LEVEL=xhigh for audit and planning sessions.
- Set ENABLE_PROMPT_CACHING_1H=1 globally; it reduces cost and does not weaken scanning.
- All three plugin hook modes: start at warn, promote to block after baselining.
- Keep sandbox wrappers enabled (default on macOS / Linux).
- Periodically run /security posture (16-category scorecard) and /security dashboard (cross-project view) to catch drift.
- After first /security deep-scan, run the §6.4 tuning workflow once to calibrate the noise floor for your codebase.

Last updated: 2026-04-29 for v7.1.0.

v7.1.0 calibration note

v7.1.0 is a patch release. No calibration changes; the §6 tuning workflow above is unchanged. Two hook-level bugs were fixed that affect production posture:

- pre-write-pathguard.mjs now blocks multi-segment .env.*.*.* paths (previously a regex hole let .env.production.local.backup through).
- post-session-guard.mjs block mode now blocks every detected trifecta. Previously it required a "concentrated MCP" or "sensitive path" qualifier, so distributed trifectas were advisory-only even in block mode.
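
The pathguard fix can be pictured with a pattern that accepts any number of dot-separated segments after `.env`. The regular expression below is a sketch written for this guide, not the one in pre-write-pathguard.mjs; whether the real guard also blocks names such as `.env.example` is not stated here.

```javascript
// Matches ".env" as a file name, optionally followed by one or more ".segment" parts.
// [^\\/]+ can itself contain dots, so multi-segment suffixes are covered.
const ENV_FILE_PATTERN_SKETCH = /(^|[\\/])\.env(\.[^\\/]+)?$/;

// Hypothetical helper name.
function isEnvFilePathSketch(filePath) {
  return ENV_FILE_PATTERN_SKETCH.test(filePath);
}
```

Under this pattern `.env`, `config/.env.local`, and `.env.production.local.backup` all match, while `.envrc` and `src/environment.ts` do not.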

If you run with LLM_SECURITY_TRIFECTA_MODE=block, expect the false-block rate to rise after this upgrade — the previous gate suppressed real trifectas. Re-baseline the warn-mode noise floor before promoting to block, per §3.
