# Security Hardening Guide

This guide documents the environment variables, sandboxing mechanisms, and hook
modes available in `llm-security`, and how to align them with the capabilities of
Opus 4.7 and Claude Code 2.1.112.

The guide is opinionated: it describes the configurations the plugin authors run
in production. Deviations are fine, but the defaults here are the tested path.

---

## 1. Environment variables

### 1.1 Harness-level (Claude Code)

| Variable | Values | Effect |
|----------|--------|--------|
| `CLAUDE_CODE_EFFORT_LEVEL` | `low` \| `medium` \| `high` \| `xhigh` | Tunes how aggressively the model spends compute per turn. `xhigh` is recommended for security-sensitive planning and audits. |
| `ENABLE_PROMPT_CACHING_1H` | `1` \| unset | Enables 1-hour prompt cache TTL. Reduces cost and latency for repeated context; cache hits do not weaken scanning. |
| `CLAUDE_CODE_SCRIPT_CAPS` | JSON blob | Declares maximum capabilities Claude Code can grant scripts it spawns. Use to lock down hook and command execution. |

### 1.2 Plugin-specific hook modes

| Variable | Default | Modes |
|----------|---------|-------|
| `LLM_SECURITY_INJECTION_MODE` | `block` | `block`: exit 2 on critical/high injection patterns. `warn`: advisory via systemMessage. `off`: disables scan. |
| `LLM_SECURITY_TRIFECTA_MODE` | `warn` | `block`: exit 2 when lethal trifecta (untrusted input + sensitive data + exfiltration sink) detected. `warn`: advisory. `off`: disables. |
| `LLM_SECURITY_PRECOMPACT_MODE` | `warn` | `block`: exit 2 on findings during PreCompact. `warn`: advisory via systemMessage. `off`: disables scan. |
| `LLM_SECURITY_PRECOMPACT_MAX_BYTES` | `512000` | Tail size in bytes read from transcript for scanning. Higher values increase coverage at the cost of latency. |
| `LLM_SECURITY_UPDATE_CHECK` | `on` | `off` disables the daily update-check HTTP call. |
| `LLM_SECURITY_AUDIT_*` | unset | Audit trail configuration (destination, format, etc.) for SIEM-ready JSONL output. |

Apply env vars via shell profile, `.envrc`, or the host MDM. Do not write them
into the repository.

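As a rough illustration of how the three mode variables could be interpreted, the sketch below resolves a mode from an environment object and maps it to a hook outcome. The variable names, defaults, and the exit code 2 for `block` come from the table above; the helper names (`resolveHookMode`, `hookOutcome`), the fallback to the default on an unrecognized value, and the return shape are assumptions made for this example, not the plugin's actual code.

```javascript
// Defaults copied from the hook-mode table above.
const HOOK_MODE_DEFAULTS = {
  LLM_SECURITY_INJECTION_MODE: "block",
  LLM_SECURITY_TRIFECTA_MODE: "warn",
  LLM_SECURITY_PRECOMPACT_MODE: "warn",
};
const VALID_MODES = new Set(["block", "warn", "off"]);

// Hypothetical helper: read one mode variable from an env-like object.
// Falling back to the documented default on a typo is an assumption.
function resolveHookMode(env, name) {
  const raw = String(env[name] ?? "").trim().toLowerCase();
  return VALID_MODES.has(raw) ? raw : HOOK_MODE_DEFAULTS[name];
}

// Hypothetical helper: turn a mode plus a scan result into a hook outcome.
// `block` exits with code 2; `warn` stays advisory; `off` never reports.
function hookOutcome(mode, hasFinding, message) {
  if (mode === "off" || !hasFinding) return { exitCode: 0 };
  if (mode === "block") return { exitCode: 2, message };
  return { exitCode: 0, systemMessage: message };
}

// Example: an unset variable resolves to its documented default.
const injectionMode = resolveHookMode({}, "LLM_SECURITY_INJECTION_MODE");
const outcome = hookOutcome(injectionMode, true, "injection pattern found");
```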
---

## 2. Sandboxing

### 2.1 macOS: `sandbox-exec`

`scanners/lib/git-clone.mjs` wraps remote clones in a `sandbox-exec` profile that
restricts file writes to the clone's temp directory. This defends against
malicious `.gitattributes` filter/smudge drivers. The plugin uses this path by
default on Darwin.

### 2.2 Linux: `bubblewrap` (bwrap)

On Linux, the same flow uses `bwrap` to achieve equivalent isolation. It works on
Fedora and Arch without configuration. Ubuntu 24.04+ may require a permissive
AppArmor profile, which needs administrator privileges; without one, the plugin
falls back to git-config flags only and logs a WARN in the clone audit trail.

### 2.3 Windows

Windows has no equivalent OS sandbox available in default installs. The plugin
falls back to hardened git-config flags (`core.hooksPath=/dev/null`,
`core.symlinks=false`, disabled LFS drivers, `protocol.file.allow=never`,
`transfer.fsckObjects=true`) and environment isolation
(`GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`).
A WARN is logged so the caller can weigh the residual risk.

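To make the fallback concrete, here is a minimal sketch of how such a hardened `git clone` invocation might be assembled. The config flags and environment variables are the ones listed above; the function name `buildHardenedClone`, its return shape, and the `filter.lfs.*` keys used to express "disabled LFS drivers" are assumptions for illustration and may not match what `git-clone.mjs` does.

```javascript
// Flags from the paragraph above. The filter.lfs.* entries are an assumption
// about how "disabled LFS drivers" could be expressed.
const HARDENED_GIT_CONFIG = [
  "core.hooksPath=/dev/null",
  "core.symlinks=false",
  "protocol.file.allow=never",
  "transfer.fsckObjects=true",
  "filter.lfs.smudge=", // assumption
  "filter.lfs.process=", // assumption
  "filter.lfs.required=false", // assumption
];

// Environment isolation variables from the paragraph above.
const HARDENED_GIT_ENV = {
  GIT_CONFIG_NOSYSTEM: "1",
  GIT_CONFIG_GLOBAL: "/dev/null",
  GIT_ATTR_NOSYSTEM: "1",
};

// Hypothetical helper: build the command, argv, and env for a hardened clone.
function buildHardenedClone(url, destDir, baseEnv = {}) {
  const args = [];
  for (const entry of HARDENED_GIT_CONFIG) args.push("-c", entry);
  // "--" ends option parsing, so a URL starting with "-" is not read as a flag.
  args.push("clone", "--", url, destDir);
  return { command: "git", args, env: { ...baseEnv, ...HARDENED_GIT_ENV } };
}

const plan = buildHardenedClone("https://example.com/repo.git", "/tmp/scan-clone");
```

The resulting `command`, `args`, and `env` could then be handed to Node's `child_process.spawn(command, args, { env })`.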
### 2.4 PID-namespace considerations

On Linux hosts with user namespaces disabled (some hardened kernels), `bwrap`
may fail to create the PID namespace. Prefer running scans from a normal user
shell; avoid root, which disables user-namespace confinement.

---

## 3. Hook modes in practice

### 3.1 Start in warn mode

Every new integration of `llm-security` should begin with all modes set to
`warn`. This yields advisories without breaking workflow, and lets the team
calibrate false-positive rates against their actual repositories.

### 3.2 Promote to block after baselining

After a baseline period (typically 1-2 weeks), flip each mode to `block` in this
order: `LLM_SECURITY_INJECTION_MODE`, `LLM_SECURITY_TRIFECTA_MODE`,
`LLM_SECURITY_PRECOMPACT_MODE`. The injection hook goes first because false
positives there are the most visible; the PreCompact hook goes last, once the
other two have built confidence.

### 3.3 Off mode is a deliberate choice

Use `off` only when you explicitly need to disable a layer (e.g., during
performance profiling). Prefer `warn` in all other cases, because the signal is
still recorded in the audit trail.

---

## 4. Bash normalization (T1-T6) as defense-in-depth

`scanners/lib/bash-normalize.mjs` collapses six known bash obfuscation
techniques before the denylist gate runs. These are **defense-in-depth** layers
that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.

The plugin's "defense-in-depth" claim resolves to **three independent detection
layers with documented bypass classes**: (1) the Claude Code harness denylist
(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
plus `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
provide formal worst-case guarantees.

| Layer | Technique | Example | Normalization |
|-------|-----------|---------|---------------|
| T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |
| T2 | `${}` expansion | `r${x}m -rf /` | drop `${VAR}` where VAR is unset in scan context |
| T3 | Backslash continuation | `rm\<newline>-rf /` | collapse backslash-newline pairs |
| T4 | Tab/whitespace splitting | `rm\t-rf /` | normalize whitespace to single space |
| T5 | `${IFS}` word-splitting | `rm${IFS}-rf${IFS}/` | replace `${IFS}` with space |
| T6 | ANSI-C hex quoting | `$'\x72\x6d' -rf /` | decode `$'\xHH'` to ASCII byte |

See `CLAUDE.md` §Defense Philosophy for the broader framing.

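The six rules in the table can be approximated with a handful of regular-expression passes. The sketch below is an independent illustration, not the contents of `bash-normalize.mjs`: the function name `normalizeBashSketch` and the pass order are assumptions, it treats every `${VAR}` other than `${IFS}` as unset, and it removes backslash-newline pairs the way bash itself does.

```javascript
// Illustrative re-implementation of the T1-T6 collapse rules.
function normalizeBashSketch(command) {
  let out = command;
  // T3: a backslash followed by a newline is a line continuation; remove it.
  out = out.replace(/\\\r?\n/g, "");
  // T6: decode ANSI-C quoted hex bytes such as $'\x72\x6d' to "rm".
  out = out.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
    body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    )
  );
  // T5: ${IFS} expands to whitespace, so treat it as a space.
  out = out.replace(/\$\{IFS\}/g, " ");
  // T2: assume any remaining ${VAR} is unset and expands to nothing.
  out = out.replace(/\$\{[A-Za-z_][A-Za-z0-9_]*\}/g, "");
  // T1: empty quote pairs contribute no characters to a word.
  out = out.replace(/''|""/g, "");
  // T4: collapse runs of spaces and tabs to a single space.
  out = out.replace(/[ \t]+/g, " ").trim();
  return out;
}

// Each obfuscated variant below collapses to the same canonical command.
const variants = [
  "r''m -rf /",
  "r${x}m -rf /",
  "r\\\nm -rf /",
  "rm\t-rf /",
  "rm${IFS}-rf${IFS}/",
  "$'\\x72\\x6d' -rf /",
];
const normalized = variants.map(normalizeBashSketch);
```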
---

## 5. Alignment with Opus 4.7 (system card references)

### 5.1 Agent safety evaluations (§5.2.1)

The Opus 4.7 system card §5.2.1 documents agentic safety evaluations and notes
that multi-layer defenses outperform single-layer defenses against adaptive
attacks. `llm-security` implements this posture: prompt-scan + pathguard +
trifecta-guard + pre-compact-scan operate in depth. A single layer failing does
not, on its own, disable the other layers.

### 5.2 Instruction following and hierarchy (§6.3.1.1)

The Opus 4.7 system card §6.3.1.1 describes tighter adherence to the declared
instruction hierarchy and more literal interpretation of agent instructions.
Consequently:

- Stacked imperatives (e.g., "NEVER do X / MUST NOT do X") are less useful than
  tool-level enforcement. Prefer `tools:` frontmatter to restrict capabilities
  at the platform level, so the agent simply does not have the unsafe tool.
- Agent instructions should mark speculation as speculation, and cite evidence
  (path, line number) rather than generalizing from one observation. See the
  "Step 0 Generaliseringsgrense" (generalization boundary) note added to
  `skill-scanner-agent.md` and `mcp-scanner-agent.md`.
- Parallel Read calls are preferred for independent file reads, as documented in
  the same Step 0 notes. This reduces latency and aligns with the model's
  improved parallel-tool-use behavior.

### 5.3 Known limitations (system card §6.3)

Prompt injection is structurally unsolvable in the current architecture. The
system card acknowledges this; so does `CLAUDE.md` §Defense Philosophy. The
hardening described here reduces the attack surface and raises the cost of
attacks but does not eliminate them.

---

## 6. Calibration & false positives (v7.0.0+)

Security scanners live or die by their signal-to-noise ratio. A scanner that
cries "extreme" on every project destroys its own credibility: users learn
to ignore findings, and genuine threats slip past. v7.0.0 ships three
calibration layers to keep that from happening.

### 6.1 Risk-score v2 formula

The v1 formula was a sum-and-cap: `critical*25 + high*10 + medium*4 + low*1`,
capped at 100. Every non-trivial scan collapsed to 100/Extreme regardless of
actual distribution. A codebase with 2 mediums and 100 lows scored the same
as a codebase with 5 criticals.

v2 (`scanners/lib/severity.mjs`) is severity-dominated and log-scaled within
tier:

| Finding mix | Score range (examples as count=score) | Band |
|-------------|---------------------------------------|------|
| Critical present | 70–95 (1=80, 2=86, 4=90, 10=95) | Critical/Extreme |
| High only | 40–65 (1=48, 5=60, 17=65) | High |
| Medium only | 15–35 (1=20, 5=28, 50=33) | Medium |
| Low only | 1–11 (1=4, 10=11) | Low |
| None | 0 | Low |

Verdict cutoffs (`BLOCK ≥65`, `WARNING ≥15`) are locked to the `riskBand()`
boundaries so you can't get a "BLOCK / Medium band" contradiction. The legacy
formula is kept as `riskScoreV1()` for reference only.

**CI impact:** Pipelines with `--fail-on high` keep working (the severity
gate is unaffected). Pipelines with score-based thresholds need recalibration:
old `score >= 21` corresponds roughly to new `score >= 15`.

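The saturation problem with the v1 formula is easy to reproduce. The sketch below implements the sum-and-cap expression quoted above, plus the two verdict cutoffs. The object-shaped argument, the `verdictFor` helper, and the `"ALLOW"` label for scores under 15 are assumptions for illustration; the v2 formula itself is not reproduced here because this guide only gives sample points for it.

```javascript
// v1 sum-and-cap formula, as quoted in the text above.
// The argument shape ({ critical, high, medium, low }) is an assumption.
function riskScoreV1({ critical = 0, high = 0, medium = 0, low = 0 }) {
  const raw = critical * 25 + high * 10 + medium * 4 + low * 1;
  return Math.min(100, raw);
}

// Verdict cutoffs from the text: BLOCK at 65 and above, WARNING at 15 and above.
function verdictFor(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // placeholder label; the guide does not name this verdict
}

const noisyButMild = riskScoreV1({ medium: 2, low: 100 }); // 108 before the cap
const trulySevere = riskScoreV1({ critical: 5 }); // 125 before the cap
```

Both calls return 100, so v1 cannot tell the two codebases apart. Under the v2 table, the first mix would fall into the Medium band and the second into Critical/Extreme.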
### 6.2 Context-aware entropy scanner

The entropy scanner flags high-Shannon-entropy strings as possible
credentials. On codebases heavy with shader code, bundled JS, CSS-in-JS or
SQL it produced astronomical false-positive rates. v7.0.0 adds three
suppression layers:

1. **File-extension skip.** Whole files with these extensions are never
   inspected for entropy findings: `.glsl`, `.frag`, `.vert`, `.shader`,
   `.wgsl`, `.css`, `.scss`, `.sass`, `.less`, `.svg`, plus the compound
   extensions `.min.js`, `.min.css`, `.map`. A skip counter
   (`calibration.files_skipped_by_extension`) is reported in the scanner
   envelope.
2. **Line-level rules 11–18.** Applied when a line contains any of: GLSL
   keywords (`uniform`, `vec3`, `texture2D`…), CSS-in-JS templates
   (`styled.…`), inline `<svg>` markup, ffmpeg `filter_complex` syntax,
   browser `User-Agent` strings, a SQL statement on a dedicated line
   (`^\s*(SELECT|INSERT|…)`), `` throw new Error(`…`) `` templates, or
   markdown image syntax with an external URL (`![…](https://…)`, common
   in JSON content indexes / article metadata).
3. **Per-project policy override.** The `entropy` section of
   `.llm-security/policy.json` supports:

   ```json
   {
     "entropy": {
       "thresholds": {
         "critical": { "entropy": 5.4, "minLen": 128 },
         "high": { "entropy": 5.1, "minLen": 64 },
         "medium": { "entropy": 4.7, "minLen": 40 }
       },
       "suppress_extensions": [".custom"],
       "suppress_line_patterns": ["MY_VENDOR_MARKER"],
       "suppress_paths": ["vendored/", "generated/"]
     }
   }
   ```

The synthesizer agent reports calibration prominently if >80 % of files were
skipped (signals a policy so aggressive the scan is effectively bypassed)
and omits it silently if <5 % were skipped.

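For readers unfamiliar with entropy-based detection, the following sketch shows the general shape of the technique: compute Shannon entropy per string, skip files by extension, and compare against per-severity thresholds. The threshold values are copied from the policy example above and the extension list from layer 1; the function names, and the choice to require both the entropy and the minimum length to be met, are assumptions rather than the scanner's documented behavior.

```javascript
// Threshold values copied from the policy.json example above.
const EXAMPLE_THRESHOLDS = {
  critical: { entropy: 5.4, minLen: 128 },
  high: { entropy: 5.1, minLen: 64 },
  medium: { entropy: 4.7, minLen: 40 },
};

// Extensions listed under "File-extension skip" above.
const SKIPPED_EXTENSIONS = [
  ".glsl", ".frag", ".vert", ".shader", ".wgsl",
  ".css", ".scss", ".sass", ".less", ".svg",
  ".min.js", ".min.css", ".map",
];

// Shannon entropy in bits per character.
function shannonEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical helper: true when the file would be skipped by extension.
function shouldSkipByExtension(filePath, extraExtensions = []) {
  const lower = filePath.toLowerCase();
  return [...SKIPPED_EXTENSIONS, ...extraExtensions].some((ext) => lower.endsWith(ext));
}

// Hypothetical helper: highest severity whose entropy AND length bars are met.
function classifyEntropy(text, thresholds = EXAMPLE_THRESHOLDS) {
  const entropy = shannonEntropy(text);
  for (const tier of ["critical", "high", "medium"]) {
    const bar = thresholds[tier];
    if (entropy >= bar.entropy && text.length >= bar.minLen) return tier;
  }
  return null;
}

// 40 distinct characters: entropy is log2(40), about 5.32 bits per character.
const sample = Array.from({ length: 40 }, (_, i) => String.fromCharCode(48 + i)).join("");
const sampleTier = classifyEntropy(sample);
```

In this sketch the 40-character sample clears the `high` entropy bar but not its 64-character minimum, so it lands in `medium`. That shows how raising `minLen` in a policy narrows findings without touching the entropy values.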
### 6.3 Typosquat allowlist

The DEP scanner flags Levenshtein-close package names against a top-N list
to catch typosquats (`lod-ash`, `expres`). On real codebases this tripped on
short-name tools like `knip`, `nx`, `tsx`, `uv`, `ruff`. v7.0.0 extends
`knowledge/typosquat-allowlist.json` with 22 npm + 5 PyPI entries for modern
tools.

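The interaction between edit distance and the allowlist can be shown in a few lines. This is a generic sketch of the technique, not the DEP scanner's code: the function names, the sample package list, and the distance cutoff of 1 are all assumptions.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping two rows in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: return the popular package a name appears to imitate,
// or null if the name is allowlisted, is itself popular, or is not close to any.
function findTyposquatTarget(name, popularNames, allowlist, maxDistance = 1) {
  if (allowlist.has(name) || popularNames.includes(name)) return null;
  for (const popular of popularNames) {
    if (levenshtein(name, popular) <= maxDistance) return popular;
  }
  return null;
}

// Illustrative top-N list; the real list is not shown in this guide.
const popular = ["express", "lodash", "jsx"];
const flagged = findTyposquatTarget("expres", popular, new Set());
const shortToolWithoutAllowlist = findTyposquatTarget("tsx", popular, new Set());
const shortToolWithAllowlist = findTyposquatTarget("tsx", popular, new Set(["tsx"]));
```

With this sample list, `tsx` sits one edit away from `jsx` and is flagged until it is allowlisted. Short names have many near neighbors, which is why they dominate typosquat false positives.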
### 6.4 Tuning workflow

1. Run `/security deep-scan` on a representative codebase.
2. Read `calibration.files_skipped_by_extension` and `files_skipped_by_path`
   from the envelope and check that they are reasonable.
3. Review the top 10 findings. For each false positive, pick the narrowest
   suppression that catches it:
   - Whole extension noisy → `suppress_extensions`
   - One line pattern recurring → `suppress_line_patterns`
   - Whole directory vendored → `suppress_paths`
4. Raise thresholds only as a last resort, because doing so hides real signal.
5. Re-scan and verify verdict/band/score make sense relative to the finding
   set.

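Step 2 asks whether the skip counters are reasonable. One way to frame that check is to reuse the 80 % and 5 % cutoffs that §6.2 gives for the synthesizer agent. In the sketch below, the function name, the `filesTotal` input, and the `"normal"` label for the middle range are assumptions; only the two cutoffs come from this guide.

```javascript
// Hypothetical helper: classify how loudly calibration should be reported,
// based on the share of files skipped by suppression rules.
function calibrationVisibility(filesSkipped, filesTotal) {
  if (filesTotal <= 0) return { ratio: 0, visibility: "omit" };
  const ratio = filesSkipped / filesTotal;
  // More than 80 % skipped: the policy may be bypassing the scan.
  if (ratio > 0.8) return { ratio, visibility: "prominent" };
  // Fewer than 5 % skipped: not worth mentioning.
  if (ratio < 0.05) return { ratio, visibility: "omit" };
  return { ratio, visibility: "normal" }; // middle range; label is an assumption
}

// Example: extension and path skips summed from a hypothetical envelope.
const envelope = { files_skipped_by_extension: 70, files_skipped_by_path: 20, files_total: 100 };
const skipped = envelope.files_skipped_by_extension + envelope.files_skipped_by_path;
const report = calibrationVisibility(skipped, envelope.files_total);
```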
---

## 7. Sandbox Architecture: Why git-clone and vsix-sandbox Stay Separate

The plugin has two sandbox-using consumers: `scanners/lib/git-clone.mjs`
(remote-repo cloning) and `scanners/lib/vsix-sandbox.mjs` (URL-fetched VS Code
/ JetBrains plugin extraction). On the surface they look like duplication
candidates: both call `sandbox-exec` on macOS, both call `bwrap` on Linux,
both fall back to in-process execution on Windows. They are intentionally not
consolidated. This section documents why.

### 7.1 Shared primitives, not shared code paths

The `sandbox-exec` profile builders and `bwrap` argument builders live in
`lib/vsix-sandbox.mjs` and are *reused* from `git-clone.mjs`, so the
duplication is conceptual, not literal. Both consumers call:

- `buildSandboxProfile(allowedWriteDir)`: emits the macOS sandbox-exec
  S-expression that allows writes only to `allowedWriteDir`.
- `buildBwrapArgs(allowedWriteDir, networkAllowed)`: emits the bwrap
  argv for an unprivileged-user-namespace container with the same
  write restriction.
- `buildSandboxedWorker(dirs, workerPath)`: wraps a Node sub-process
  in the platform-appropriate sandbox.

The kernel-level isolation contract is identical for both consumers.

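To give a feel for what the first two builders might emit, here is a hedged sketch. It is not the code in `lib/vsix-sandbox.mjs`: the functions carry a `Sketch` suffix to make that clear, the specific profile rules and `bwrap` flags are one plausible way to express "writes only to one directory", and the real builders may choose different rules.

```javascript
// Sketch of a macOS sandbox profile: allow everything, deny all file writes,
// then re-allow writes under a single directory. Rule choice is an assumption.
function buildSandboxProfileSketch(allowedWriteDir) {
  // JSON.stringify yields a double-quoted string with backslash escapes.
  const quoted = JSON.stringify(allowedWriteDir);
  return [
    "(version 1)",
    "(allow default)",
    "(deny file-write*)",
    `(allow file-write* (subpath ${quoted}))`,
  ].join("\n");
}

// Sketch of bwrap arguments: read-only root, fresh /dev and /proc, one
// writable bind mount, a new PID namespace, and optionally no network.
function buildBwrapArgsSketch(allowedWriteDir, networkAllowed) {
  const args = [
    "--ro-bind", "/", "/",
    "--dev", "/dev",
    "--proc", "/proc",
    "--bind", allowedWriteDir, allowedWriteDir,
    "--unshare-pid",
    "--die-with-parent",
  ];
  if (!networkAllowed) args.push("--unshare-net");
  return args;
}

const profile = buildSandboxProfileSketch("/tmp/scan-1234");
const offlineArgs = buildBwrapArgsSketch("/tmp/scan-1234", false);
const onlineArgs = buildBwrapArgsSketch("/tmp/scan-1234", true);
```

On macOS the profile string would typically be passed via `sandbox-exec -p`, and on Linux the argument list would precede the command given to `bwrap`. Note that macOS temp paths often resolve through `/private`, so a real builder would likely need the resolved path.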
### 7.2 Distinct setup contracts

What differs is the *git/zip side* of each pipeline. These contracts are
not interchangeable:

| Concern | git-clone.mjs | vsix-sandbox.mjs |
|---------|---------------|------------------|
| Untrusted setup vector | `.gitattributes` filter/smudge drivers | ZIP entries with `..` traversal, symlinks, ratio bombs |
| Pre-fetch hardening | `core.hooksPath=/dev/null`, `core.symlinks=false`, all LFS filters disabled, `protocol.file.allow=never`, `transfer.fsckObjects=true` | ZIP-extractor caps (10 000 entries, 500 MB uncompressed, 100x ratio, depth 20), entry-by-entry path validation |
| Environment isolation | `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0` | None: fetch is plain HTTPS via `lib/vsix-fetch.mjs`, no env-var attack surface |
| Network policy | Network allowed (clone needs HTTPS) | Network allowed in fetch worker only; extraction worker is offline |
| IPC contract | None: git writes its tree directly into the sandboxed temp dir | Single-line JSON on stdout: `{ok, sha256, size, finalUrl, source, extRoot}` |

A unified "do-everything" sandbox helper would either need to know about
git config flags (irrelevant for VSIX), or would need a callback escape
hatch that re-introduces the abstraction tax it was meant to remove.

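The ZIP-side contract in the table lends itself to a small validation pass over entry metadata before anything is written to disk. The sketch below is an illustration under stated assumptions, not the extractor in `vsix-sandbox.mjs`: the entry shape (`path`, `uncompressedSize`, `isSymlink`), the function name, reading "500 MB" as 500 × 1024 × 1024 bytes, measuring the ratio against the whole archive, and rejecting symlinks outright are all choices made for this example.

```javascript
// Caps from the table above; the byte interpretation of "500 MB" is an assumption.
const ZIP_LIMITS = {
  maxEntries: 10000,
  maxUncompressedBytes: 500 * 1024 * 1024,
  maxRatio: 100,
  maxDepth: 20,
};

// Hypothetical helper: validate entry metadata before extraction.
function validateZipEntries(entries, compressedBytes, limits = ZIP_LIMITS) {
  if (entries.length > limits.maxEntries) {
    return { ok: false, reason: "too many entries" };
  }
  let totalBytes = 0;
  for (const entry of entries) {
    const segments = entry.path.split(/[\\\/]+/).filter(Boolean);
    const isAbsolute = /^[\\\/]/.test(entry.path) || /^[A-Za-z]:/.test(entry.path);
    if (isAbsolute || segments.includes("..")) {
      return { ok: false, reason: `path traversal: ${entry.path}` };
    }
    if (entry.isSymlink) {
      return { ok: false, reason: `symlink entry: ${entry.path}` };
    }
    if (segments.length > limits.maxDepth) {
      return { ok: false, reason: `path too deep: ${entry.path}` };
    }
    totalBytes += entry.uncompressedSize;
    if (totalBytes > limits.maxUncompressedBytes) {
      return { ok: false, reason: "uncompressed size cap exceeded" };
    }
  }
  if (compressedBytes > 0 && totalBytes / compressedBytes > limits.maxRatio) {
    return { ok: false, reason: "compression ratio cap exceeded" };
  }
  return { ok: true };
}

const benign = validateZipEntries(
  [{ path: "extension/package.json", uncompressedSize: 2048, isSymlink: false }],
  1024
);
const traversal = validateZipEntries(
  [{ path: "extension/../../etc/passwd", uncompressedSize: 10, isSymlink: false }],
  10
);
```

Checking metadata first keeps the rejection path free of filesystem side effects, which matters when the extraction worker is the component on the trust boundary.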
### 7.3 Consolidation deferred

Three reasons this stays as it is:

1. **Premature abstraction risk on safety-critical code.** Both modules
   are on the trust boundary. A bug in a shared abstraction would
   simultaneously weaken both consumers; today, bugs are isolated.
2. **Two consumers is not enough signal.** The Rule of Three applies:
   abstract when a third consumer arrives and the contract becomes clear,
   not before.
3. **Distinct review surfaces.** Reviewers reading `git-clone.mjs` get
   the full git-attack-surface story in one file; reviewers reading
   `vsix-sandbox.mjs` get the full ZIP-attack-surface story in one file.
   Splitting either across a generic sandbox helper would force readers
   to context-switch to verify the contract.

### 7.4 Trigger condition for revisiting

This decision will be revisited if and when a third sandbox-using
consumer appears in the plugin (e.g., a sandboxed evaluator for
suspicious shell scripts, or a sandboxed PDF/PPTX parser). At that
point the shared contract (write restriction to a temp dir, network
policy, IPC shape) should be lifted into a `lib/sandbox.mjs` module
with the per-consumer setup remaining co-located with its respective
attack-surface logic.

Until then: two consumers, one set of primitives, two co-located
contracts.

---

## 8. Recommended baseline for production

1. Set `CLAUDE_CODE_EFFORT_LEVEL=xhigh` for audit and planning sessions.
2. Set `ENABLE_PROMPT_CACHING_1H=1` globally; it reduces cost and does not
   weaken scanning.
3. All three plugin hook modes: start at `warn`, promote to `block` after
   baselining.
4. Keep sandbox wrappers enabled (default on macOS / Linux).
5. Periodically run `/security posture` (16-category scorecard) and
   `/security dashboard` (cross-project view) to catch drift.
6. After first `/security deep-scan`, run the §6.4 tuning workflow once to
   calibrate the noise floor for your codebase.

---

**Last updated:** 2026-04-29 for v7.1.0.

### v7.1.0 calibration note

v7.1.0 is a patch release. No calibration changes; the §6 tuning workflow above is
unchanged. Two hook-level bugs were fixed that affect production posture:

- `pre-write-pathguard.mjs` now blocks multi-segment `.env.*.*.*` paths (previously a
  regex hole let `.env.production.local.backup` through).
- `post-session-guard.mjs` `block` mode now blocks every detected trifecta. Previously
  required a "concentrated MCP" or "sensitive path" qualifier, so distributed
  trifectas were advisory-only even in block mode.

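The intent of both fixes can be sketched as follows. Neither snippet is taken from the hooks themselves: the regular expression is one hypothetical pattern that accepts any number of dot-separated suffixes after `.env`, and `trifectaDecision` is an illustrative function showing `block` mode applying to every detected trifecta without an extra qualifier.

```javascript
// Hypothetical pattern: ".env" as a file name, followed by zero or more
// dot-separated suffixes (".production", ".local", ".backup", ...).
const ENV_FILE_PATTERN = /(^|[\\\/])\.env(\.[^\\\/]+)*$/;

function isProtectedEnvPath(filePath) {
  return ENV_FILE_PATTERN.test(filePath);
}

// Hypothetical decision function: a trifecta is the conjunction of the three
// legs named in §1.2; in block mode every detected trifecta is blocked.
function trifectaDecision(mode, legs) {
  const detected = Boolean(
    legs.untrustedInput && legs.sensitiveData && legs.exfiltrationSink
  );
  if (!detected || mode === "off") return "allow";
  return mode === "block" ? "block" : "warn";
}

const multiSegment = isProtectedEnvPath(".env.production.local.backup");
const distributed = trifectaDecision("block", {
  untrustedInput: true,
  sensitiveData: true,
  exfiltrationSink: true,
});
```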
If you run with `LLM_SECURITY_TRIFECTA_MODE=block`, expect the block rate
(false blocks included) to rise after this upgrade, because the previous gate
suppressed real trifectas. Re-baseline the warn-mode noise floor before
promoting to block, per §3.