Security Hardening Guide
This guide documents the environment variables, sandboxing mechanisms, and hook
modes available in llm-security, and how to align them with the capabilities of
Opus 4.7 and Claude Code 2.1.112.
The guide is opinionated: it describes the configurations the plugin authors run in production. Deviations are fine, but the defaults here are the tested path.
1. Environment variables
1.1 Harness-level (Claude Code)
| Variable | Values | Effect |
|---|---|---|
| CLAUDE_CODE_EFFORT_LEVEL | low \| medium \| high \| xhigh | Tunes how aggressively the model spends compute per turn. xhigh is recommended for security-sensitive planning and audits. |
| ENABLE_PROMPT_CACHING_1H | 1 \| unset | Enables 1-hour prompt cache TTL. Reduces cost and latency for repeated context; cache hits do not weaken scanning. |
| CLAUDE_CODE_SCRIPT_CAPS | JSON blob | Declares maximum capabilities Claude Code can grant scripts it spawns. Use to lock down hook and command execution. |
1.2 Plugin-specific hook modes
| Variable | Default | Modes |
|---|---|---|
| LLM_SECURITY_INJECTION_MODE | block | block: exit 2 on critical/high injection patterns. warn: advisory via systemMessage. off: disables scan. |
| LLM_SECURITY_TRIFECTA_MODE | warn | block: exit 2 when lethal trifecta (untrusted input + sensitive data + exfiltration sink) detected. warn: advisory. off: disables. |
| LLM_SECURITY_PRECOMPACT_MODE | warn | block: exit 2 on findings during PreCompact. warn: advisory via systemMessage. off: disables scan. |
| LLM_SECURITY_PRECOMPACT_MAX_BYTES | 512000 | Tail size in bytes read from transcript for scanning. Higher values increase coverage at the cost of latency. |
| LLM_SECURITY_UPDATE_CHECK | on | off disables the daily update-check HTTP call. |
| LLM_SECURITY_AUDIT_* | unset | Audit trail configuration (destination, format, etc.) for SIEM-ready JSONL output. |
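To make the defaults in the table concrete, here is a minimal sketch of how a hook might resolve its mode and tail size from the environment. The function names (`resolveMode`, `resolveMaxBytes`, `tailOf`) are hypothetical, and the fallback behavior for unrecognized values is an assumption, not a description of the plugin's actual code.

```javascript
// Sketch only: defaults are copied from the table above; all function names are hypothetical.
const MODE_DEFAULTS = {
  LLM_SECURITY_INJECTION_MODE: 'block',
  LLM_SECURITY_TRIFECTA_MODE: 'warn',
  LLM_SECURITY_PRECOMPACT_MODE: 'warn',
};
const VALID_MODES = new Set(['block', 'warn', 'off']);

// Resolve one hook mode. Assumption: unset or unrecognized values fall back
// to the documented default, and matching is case-insensitive.
function resolveMode(name, env = process.env) {
  const raw = (env[name] ?? '').trim().toLowerCase();
  return VALID_MODES.has(raw) ? raw : MODE_DEFAULTS[name];
}

// Resolve the PreCompact tail size. Values that do not parse to a positive
// integer fall back to the documented default of 512000 (an assumption).
function resolveMaxBytes(env = process.env) {
  const n = Number.parseInt(env.LLM_SECURITY_PRECOMPACT_MAX_BYTES ?? '', 10);
  return Number.isInteger(n) && n > 0 ? n : 512000;
}

// Keep only the last maxBytes bytes of a transcript buffer.
function tailOf(buffer, maxBytes) {
  return buffer.subarray(Math.max(0, buffer.length - maxBytes));
}
```

Passing the environment as a parameter keeps the sketch testable without mutating `process.env`.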
Apply env vars via shell profile, .envrc, or the host MDM. Do not write them
into the repository.
2. Sandboxing
2.1 macOS — sandbox-exec
scanners/lib/git-clone.mjs wraps remote clones in a sandbox-exec profile that
restricts file writes to the specific temp directory. This defends against
malicious .gitattributes filter/smudge drivers. The plugin uses this path by
default on Darwin.
2.2 Linux — bubblewrap (bwrap)
On Linux, the same flow uses bwrap to accomplish equivalent isolation. It works
on Fedora and Arch without configuration. On Ubuntu 24.04+, bwrap may require a
permissive AppArmor profile (which needs administrator privileges); without one,
the plugin falls back to git-config flags only and logs a WARN in the clone
audit trail.
2.3 Windows
Windows has no equivalent OS sandbox available in default installs. The plugin
falls back to hardened git-config flags (core.hooksPath=/dev/null,
core.symlinks=false, disabled LFS drivers, protocol.file.allow=never,
transfer.fsckObjects=true) and environment isolation
(GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1).
A WARN is logged so the caller can weigh the residual risk.
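The git-config fallback can be pictured as a builder for the `git clone` invocation. This is a sketch under assumptions: it passes the flags listed above with `git -c key=value`, the name `hardenedClone` is hypothetical, and the LFS driver settings are omitted because the guide does not spell them out. The real scanners/lib/git-clone.mjs may assemble the command differently.

```javascript
// Assumption: flags are applied per-invocation with `git -c key=value`.
const HARDENED_CONFIG = [
  'core.hooksPath=/dev/null',
  'core.symlinks=false',
  'protocol.file.allow=never',
  'transfer.fsckObjects=true',
  // LFS driver overrides are intentionally left out: the guide does not list them.
];

// Environment isolation values, copied from the text above.
const ISOLATED_ENV = {
  GIT_CONFIG_NOSYSTEM: '1',
  GIT_CONFIG_GLOBAL: '/dev/null',
  GIT_ATTR_NOSYSTEM: '1',
};

// Build the command, argument list, and environment for a hardened clone.
function hardenedClone(url, dest, baseEnv = process.env) {
  const configArgs = HARDENED_CONFIG.flatMap((kv) => ['-c', kv]);
  return {
    command: 'git',
    // "--" ends option parsing, so a URL starting with "-" is not read as a flag.
    args: [...configArgs, 'clone', '--', url, dest],
    env: { ...baseEnv, ...ISOLATED_ENV },
  };
}
```

The returned object has the shape expected by Node's `child_process.spawn(command, args, { env })`, so a caller can run it without going through a shell.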
2.4 PID-namespace considerations
On Linux hosts with user namespaces disabled (some hardened kernels), bwrap
may fail to create the PID namespace. Prefer running scans from a normal user
shell; avoid root, which disables user-namespace confinement.
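The per-platform behavior in this section reduces to a small decision. The sketch below assumes a hypothetical `pickIsolation` helper and a caller-supplied `bwrapUsable` flag (for example, the result of a probe run); how the plugin actually detects a usable bwrap is not covered by this guide.

```javascript
// Hypothetical helper: choose the isolation strategy for a remote clone.
// `platform` takes the values of Node's process.platform ('darwin', 'linux', 'win32', ...).
function pickIsolation(platform, { bwrapUsable = false } = {}) {
  if (platform === 'darwin') {
    return { sandbox: 'sandbox-exec', warn: false };
  }
  if (platform === 'linux' && bwrapUsable) {
    return { sandbox: 'bwrap', warn: false };
  }
  // Windows, or Linux where bwrap cannot start (AppArmor, disabled user
  // namespaces): hardened git-config flags only, with a WARN for the caller.
  return { sandbox: 'git-config-only', warn: true };
}
```

For example, `pickIsolation(process.platform, { bwrapUsable: true })` on a Linux host selects bwrap, while the same call on Windows selects the git-config fallback and sets `warn`.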
3. Hook modes in practice
3.1 Start in warn mode
Every new integration of llm-security should begin with all modes set to
warn. This yields advisories without breaking workflow, and lets the team
calibrate false-positive rates against their actual repositories.
3.2 Promote to block after baselining
After a baseline period (typically 1-2 weeks), flip each mode to block in this
order: LLM_SECURITY_INJECTION_MODE, LLM_SECURITY_TRIFECTA_MODE,
LLM_SECURITY_PRECOMPACT_MODE. The injection hook goes first because its false
positives are the most visible; the pre-compact hook goes last, after the first
two promotions have built confidence.
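One way to keep the rollout explicit is to express it as a staged plan. The `modesAtStage` helper below is hypothetical; it only encodes the promotion order given above and does not change any configuration itself.

```javascript
// Promotion order from the text above.
const PROMOTION_ORDER = [
  'LLM_SECURITY_INJECTION_MODE',
  'LLM_SECURITY_TRIFECTA_MODE',
  'LLM_SECURITY_PRECOMPACT_MODE',
];

// Stage 0 means every hook is at warn; stage n means the first n hooks in
// the order have been promoted to block. Out-of-range stages are clamped.
function modesAtStage(stage) {
  const n = Math.min(Math.max(stage, 0), PROMOTION_ORDER.length);
  return Object.fromEntries(
    PROMOTION_ORDER.map((name, i) => [name, i < n ? 'block' : 'warn']),
  );
}
```

At stage 1, only LLM_SECURITY_INJECTION_MODE is set to block and the other two remain at warn.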
3.3 Off mode is a deliberate choice
Use off only when you explicitly need to disable a layer (e.g., during
performance profiling). Prefer warn in all other cases — the signal is still
recorded in the audit trail.
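The three modes can be summarized as a mapping from (mode, findings) to a hook result. The sketch below is illustrative: `hookOutcome` and `isLethalTrifecta` are hypothetical names, findings are assumed to be pre-filtered to the severities that matter, and placing the block-mode summary on stderr is an assumption. Only the exit code 2 and the systemMessage advisory come from the tables in section 1.2.

```javascript
// The lethal trifecta is present only when all three conditions hold at once.
function isLethalTrifecta({ untrustedInput, sensitiveData, exfiltrationSink }) {
  return Boolean(untrustedInput && sensitiveData && exfiltrationSink);
}

// Map a mode and a list of finding descriptions to a hook result.
function hookOutcome(mode, findings) {
  if (mode === 'off') return { exitCode: 0, scanned: false };
  if (findings.length === 0) return { exitCode: 0, scanned: true };

  const summary = `${findings.length} finding(s): ${findings.join('; ')}`;
  if (mode === 'block') {
    // block: exit 2 stops the action (stderr as the carrier is an assumption).
    return { exitCode: 2, scanned: true, stderr: summary };
  }
  // warn: the action proceeds and the advisory is surfaced via systemMessage.
  return { exitCode: 0, scanned: true, systemMessage: summary };
}
```

Note that off differs from warn in that no scan runs at all, which is why the sketch reports `scanned: false` rather than an empty finding list.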
4. Bash normalization (T1-T6) as defense-in-depth
scanners/lib/bash-normalize.mjs collapses six known bash obfuscation
techniques before the denylist gate runs. These are defense-in-depth layers
that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.
| Layer | Technique | Example | Normalization |
|---|---|---|---|
| T1 | Empty quotes | rm''-rf / | strip '' / "" between tokens |
| T2 | ${} expansion | r${x}m -rf / | drop ${VAR} where VAR is unset in scan context |
| T3 | Backslash continuation | rm\<newline>-rf / | collapse backslash-newline pairs |
| T4 | Tab/whitespace splitting | rm\t-rf / | normalize whitespace to single space |
| T5 | ${IFS} word-splitting | rm${IFS}-rf${IFS}/ | replace ${IFS} with space |
| T6 | ANSI-C hex quoting | $'\x72\x6d' -rf / | decode $'\xHH' to ASCII byte |
See CLAUDE.md §Defense Philosophy for the broader framing.
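As a rough illustration of the six layers, the sketch below applies one regular-expression pass per technique. It is not the code in scanners/lib/bash-normalize.mjs: the function name `normalizeBash`, the `knownVars` parameter, and the pass order are assumptions, and the T6 pass only handles `$'...'` strings made up entirely of `\xHH` escapes.

```javascript
// Illustrative normalizer; `knownVars` is a hypothetical set of variable
// names considered set in the scan context (empty by default).
function normalizeBash(cmd, knownVars = new Set()) {
  let out = cmd;

  // T6: decode $'\xHH...' ANSI-C strings that consist only of hex escapes.
  out = out.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_match, body) =>
    body.replace(/\\x([0-9a-fA-F]{2})/g, (_m, hex) =>
      String.fromCharCode(Number.parseInt(hex, 16))));

  // T3: remove backslash-newline continuation pairs.
  out = out.replace(/\\\r?\n/g, '');

  // T5: replace ${IFS} with a space (done before T2 so it is not dropped as unset).
  out = out.replace(/\$\{IFS\}/g, ' ');

  // T2: drop ${VAR} when VAR is not known to be set.
  out = out.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) =>
    (knownVars.has(name) ? match : ''));

  // T1: strip empty single- or double-quote pairs.
  out = out.replace(/''|""/g, '');

  // T4: collapse any whitespace run (tabs included) to a single space.
  return out.replace(/\s+/g, ' ').trim();
}
```

With these passes, `normalizeBash('rm${IFS}-rf${IFS}/')` yields `rm -rf /`, which a denylist gate can then match directly.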
5. Alignment with Opus 4.7 (system card references)
5.1 Agent safety evaluations (§5.2.1)
The Opus 4.7 system card §5.2.1 documents agentic safety evaluations and notes
that multi-layer defenses outperform single-layer defenses against adaptive
attacks. llm-security implements this posture: prompt-scan + pathguard +
trifecta-guard + pre-compact-scan operate in depth. A single layer failing does
not compromise the defense.
5.2 Instruction following and hierarchy (§6.3.1.1)
The Opus 4.7 system card §6.3.1.1 describes tighter adherence to the declared instruction hierarchy and more literal interpretation of agent instructions. Consequently:
- Stacked imperatives (e.g., "NEVER do X / MUST NOT do X") are less useful than
  tool-level enforcement. Prefer tools: frontmatter to restrict capabilities at
  the platform level, so the agent simply does not have the unsafe tool.
- Agent instructions should mark speculation as speculation, and cite evidence
  (path, line number) rather than generalizing from one observation. See the
  "Step 0 Generaliseringsgrense" note added to skill-scanner-agent.md and
  mcp-scanner-agent.md.
- Parallel Read calls are preferred for independent file reads, documented in
  the same Step 0 notes. This reduces latency and aligns with the model's
  improved parallel-tool-use behavior.
5.3 Known limitations (system card §6.3)
Prompt injection is structurally unsolvable in the current architecture. The
system card acknowledges this; so does CLAUDE.md §Defense Philosophy. The
hardening described here reduces the attack surface and raises the cost of
attacks but does not eliminate them.
6. Recommended baseline for production
- Set CLAUDE_CODE_EFFORT_LEVEL=xhigh for audit and planning sessions.
- Set ENABLE_PROMPT_CACHING_1H=1 globally; it reduces cost and does not weaken scanning.
- All three plugin hook modes: start at warn, promote to block after baselining.
- Keep sandbox wrappers enabled (default on macOS / Linux).
- Periodically run /security posture (13-category scorecard) and /security dashboard (cross-project view) to catch drift.
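A baseline like this can also be checked mechanically. The sketch below uses a hypothetical `baselineDeviations` helper that compares an environment against the values recommended above. Treating xhigh as expected for every session is a simplification, since the recommendation applies to audit and planning sessions; the check does not replace /security posture.

```javascript
// Recommended values from the list above.
const BASELINE = {
  CLAUDE_CODE_EFFORT_LEVEL: 'xhigh',
  ENABLE_PROMPT_CACHING_1H: '1',
};
const HOOK_MODE_VARS = [
  'LLM_SECURITY_INJECTION_MODE',
  'LLM_SECURITY_TRIFECTA_MODE',
  'LLM_SECURITY_PRECOMPACT_MODE',
];

// Return a list of human-readable deviations; an empty list means the
// environment matches the baseline as far as this sketch can tell.
function baselineDeviations(env = process.env) {
  const issues = [];
  for (const [name, expected] of Object.entries(BASELINE)) {
    if (env[name] !== expected) {
      issues.push(`${name} should be ${expected}`);
    }
  }
  for (const name of HOOK_MODE_VARS) {
    // warn and block are both acceptable; only an explicit off is flagged.
    if (env[name] === 'off') {
      issues.push(`${name} is off`);
    }
  }
  return issues;
}
```

Running `baselineDeviations()` in a CI job or shell startup script gives an early signal when a workstation drifts from the recommended settings.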
Last updated: 2026-04-17 for v6.2.0.