Kjell Tore Guttormsen 621db144bd chore(release): bump llm-security to v7.1.0
Closes A4 of v7.1.0 critical-review patch — release artefacts.

- Version bump 7.0.0 → 7.1.0 across active version sources:
  * package.json
  * .claude-plugin/plugin.json
  * CLAUDE.md header
  * README.md badge
  * scanners/ide-extension-scanner.mjs (VERSION constant)
  * marketplace root README plugin entry
- Marketplace root README test count: 1487 → 1511.
- CHANGELOG.md: new [7.1.0] - 2026-04-29 section above [7.0.0],
  documenting B1, B2, B4, B8, honesty-sweep (7 phrases), and
  test-count delta (+24 → 1511 total).
- docs/security-hardening-guide.md: §6 last-updated bump + new
  v7.1.0 calibration note on hook-level fixes (pathguard regex
  hole, distributed-trifecta block-mode bypass).

Historical references to "7.0.0" intentionally preserved in:
- CHANGELOG [7.0.0] entries (history)
- README.md version-history table v5.0.0/v7.0.0 rows (history)
- CLAUDE.md §"v7.0.0 — Severity-dominated risk scoring" (describes
  what changed at v7.0.0 release)
- scanners/ JSDoc comments noting "v7.0.0+" formula provenance
- agents/ + tests/ + knowledge/ provenance comments

Pre-existing untracked/modified tracker noise (.gitignore,
marketplace.json, config-audit/docs, ultraplan-local/docs) is not
part of this commit per the v7.1.0 NEXT-SESSION-PROMPT handoff.

Tests: 1511/1511 green.
2026-04-29 11:57:16 +02:00


# Security Hardening Guide
This guide documents the environment variables, sandboxing mechanisms, and hook
modes available in `llm-security`, and how to align them with the capabilities of
Opus 4.7 and Claude Code 2.1.112.
The guide is opinionated: it describes the configurations the plugin authors run
in production. Deviations are fine, but the defaults here are the tested path.
---
## 1. Environment variables
### 1.1 Harness-level (Claude Code)
| Variable | Values | Effect |
|----------|--------|--------|
| `CLAUDE_CODE_EFFORT_LEVEL` | `low` \| `medium` \| `high` \| `xhigh` | Tunes how aggressively the model spends compute per turn. `xhigh` is recommended for security-sensitive planning and audits. |
| `ENABLE_PROMPT_CACHING_1H` | `1` \| unset | Enables 1-hour prompt cache TTL. Reduces cost and latency for repeated context; cache hits do not weaken scanning. |
| `CLAUDE_CODE_SCRIPT_CAPS` | JSON blob | Declares maximum capabilities Claude Code can grant scripts it spawns. Use to lock down hook and command execution. |
### 1.2 Plugin-specific hook modes
| Variable | Default | Modes |
|----------|---------|-------|
| `LLM_SECURITY_INJECTION_MODE` | `block` | `block` — exit 2 on critical/high injection patterns. `warn` — advisory via systemMessage. `off` — disables scan. |
| `LLM_SECURITY_TRIFECTA_MODE` | `warn` | `block` — exit 2 when a lethal trifecta (untrusted input + sensitive data + exfiltration sink) is detected. `warn` — advisory. `off` — disables. |
| `LLM_SECURITY_PRECOMPACT_MODE` | `warn` | `block` — exit 2 on findings during PreCompact. `warn` — advisory via systemMessage. `off` — disables scan. |
| `LLM_SECURITY_PRECOMPACT_MAX_BYTES` | `512000` | Number of bytes read from the end of the transcript for scanning. Higher values increase coverage at the cost of latency. |
| `LLM_SECURITY_UPDATE_CHECK` | `on` | `off` disables the daily update-check HTTP call. |
| `LLM_SECURITY_AUDIT_*` | unset | Audit trail configuration (destination, format, etc.) for SIEM-ready JSONL output. |
Apply env vars via shell profile, `.envrc`, or the host MDM. Do not write them
into the repository.
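Here is a minimal sketch of how a hook might read these variables from `process.env`-style key/value input. `resolveHookModes` is a hypothetical helper, not the plugin's code. The defaults are copied from the table above. Rejecting unknown values instead of silently falling back is a choice made by this sketch; the guide does not document the plugin's own behavior for invalid values.

```javascript
// Hypothetical helper, not the plugin's code: resolves the §1.2 hook-mode
// variables using the documented defaults and rejects unknown values.
const HOOK_MODE_DEFAULTS = {
  LLM_SECURITY_INJECTION_MODE: "block",
  LLM_SECURITY_TRIFECTA_MODE: "warn",
  LLM_SECURITY_PRECOMPACT_MODE: "warn",
};
const VALID_MODES = new Set(["block", "warn", "off"]);

function resolveHookModes(env) {
  const resolved = {};
  for (const [name, fallback] of Object.entries(HOOK_MODE_DEFAULTS)) {
    const raw = String(env[name] ?? "").trim().toLowerCase();
    if (raw === "") {
      resolved[name] = fallback; // unset or empty: documented default
    } else if (VALID_MODES.has(raw)) {
      resolved[name] = raw;
    } else {
      // Fail loudly so a typo cannot silently weaken a layer.
      throw new Error(`${name}=${env[name]} is not one of block|warn|off`);
    }
  }
  // PreCompact tail size in bytes; documented default is 512000.
  const rawBytes = String(env.LLM_SECURITY_PRECOMPACT_MAX_BYTES ?? "").trim();
  const maxBytes = rawBytes === "" ? 512000 : Number(rawBytes);
  if (!Number.isInteger(maxBytes) || maxBytes <= 0) {
    throw new Error("LLM_SECURITY_PRECOMPACT_MAX_BYTES must be a positive integer");
  }
  resolved.LLM_SECURITY_PRECOMPACT_MAX_BYTES = maxBytes;
  return resolved;
}

// Example: promote only the trifecta hook; everything else keeps its default.
console.log(resolveHookModes({ LLM_SECURITY_TRIFECTA_MODE: "block" }));
```

If you call it as `resolveHookModes(process.env)` at hook start-up, a typo such as `LLM_SECURITY_INJECTION_MODE=blokc` fails loudly. It does not quietly disable a layer.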
---
## 2. Sandboxing
### 2.1 macOS — `sandbox-exec`
`scanners/lib/git-clone.mjs` wraps remote clones in a `sandbox-exec` profile that
restricts file writes to the clone's temporary directory. This defends against
malicious `.gitattributes` filter/smudge drivers. The plugin uses this path by
default on Darwin.
### 2.2 Linux — `bubblewrap` (bwrap)
On Linux, the same flow uses `bwrap` to provide equivalent isolation. It works on
Fedora and Arch without configuration. Ubuntu 24.04+ may require a permissive
AppArmor profile, which needs administrator privileges to install. Without it, the
plugin falls back to git-config hardening flags only and logs a WARN in the clone
audit trail.
### 2.3 Windows
Windows has no equivalent OS sandbox available in default installs. The plugin
falls back to hardened git-config flags (`core.hooksPath=/dev/null`,
`core.symlinks=false`, disabled LFS drivers, `protocol.file.allow=never`,
`transfer.fsckObjects=true`) and environment isolation
(`GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`).
A WARN is logged so the caller can weigh the residual risk.
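The fallback flags can be assembled into a single `git` invocation roughly as follows. This is an illustrative sketch, not `scanners/lib/git-clone.mjs`, and `buildHardenedClone` is a hypothetical name. The three `filter.lfs.*` keys are an assumption about what "disabled LFS drivers" means, because this guide does not list the exact keys. The `-c` options come before the `clone` subcommand, so they apply to the whole operation, including checkout.

```javascript
// Hypothetical sketch, not scanners/lib/git-clone.mjs: builds the argv and
// environment for a hardened clone without executing anything.
function buildHardenedClone(url, dest, baseEnv = {}) {
  const config = [
    "core.hooksPath=/dev/null",  // point the hooks directory at /dev/null so no hook runs
    "core.symlinks=false",       // check symlinks out as plain files holding the link target
    "protocol.file.allow=never", // refuse the local file transport (e.g. via submodules)
    "transfer.fsckObjects=true", // validate every object received
    // Assumption: "disabled LFS drivers" modelled as blanked LFS filter commands.
    "filter.lfs.smudge=",
    "filter.lfs.process=",
    "filter.lfs.required=false",
  ];
  const args = [];
  for (const kv of config) args.push("-c", kv);
  // "--" ends option parsing so a URL starting with "-" cannot inject flags.
  args.push("clone", "--", url, dest);
  const env = {
    ...baseEnv,                     // start from the caller's environment (PATH etc.)
    GIT_CONFIG_NOSYSTEM: "1",       // ignore the system-wide gitconfig
    GIT_CONFIG_GLOBAL: "/dev/null", // ignore the user's global gitconfig (Git 2.32+)
    GIT_ATTR_NOSYSTEM: "1",         // ignore system-wide gitattributes
  };
  return { command: "git", args, env };
}
```

You can pass the result to Node's `child_process.spawnSync(command, args, { env })`. `GIT_CONFIG_GLOBAL` is honored only by Git 2.32 and later.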
### 2.4 PID-namespace considerations
On Linux hosts with user namespaces disabled (some hardened kernels), `bwrap`
may fail to create the PID namespace. Prefer running scans from a normal user
shell; avoid root, which disables user-namespace confinement.
---
## 3. Hook modes in practice
### 3.1 Start in warn mode
Every new integration of `llm-security` should begin with all three modes
explicitly set to `warn`. `LLM_SECURITY_INJECTION_MODE` defaults to `block`, so
it must be overridden. This yields advisories without breaking workflow, and lets
the team calibrate false-positive rates against their actual repositories.
### 3.2 Promote to block after baselining
After a baseline period (typically 1-2 weeks), flip each mode to `block` in this
order: `LLM_SECURITY_INJECTION_MODE`, `LLM_SECURITY_TRIFECTA_MODE`,
`LLM_SECURITY_PRECOMPACT_MODE`. The injection hook goes first because its false
positives are the most visible and therefore the quickest to tune out.
PreCompact goes last, once the other two layers have built confidence in the
baseline.
### 3.3 Off mode is a deliberate choice
Use `off` only when you explicitly need to disable a layer (e.g., during
performance profiling). Prefer `warn` in all other cases — the signal is still
recorded in the audit trail.
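The sketch below shows how the three modes map onto hook behavior. It follows the semantics in the §1.2 table: exit code 2 in `block`, an advisory `systemMessage` in `warn`, and no scan in `off`. The finding shape (`severity`, `title`) and the critical/high trigger are assumptions modelled on the injection hook's description. The trifecta and PreCompact hooks have their own triggers.

```javascript
// Hypothetical sketch: maps a hook mode plus findings to a hook result.
// Finding shape { severity, title } and the critical/high trigger are assumptions.
function hookOutcome(mode, findings) {
  const pass = { exitCode: 0, stdout: "", stderr: "" };
  if (mode === "off") return pass; // layer disabled: nothing scanned or reported
  const serious = findings.filter(
    (f) => f.severity === "critical" || f.severity === "high",
  );
  if (serious.length === 0) return pass;
  const summary = serious.map((f) => `${f.severity}: ${f.title}`).join("; ");
  if (mode === "block") {
    // block: exit code 2, with the reason on stderr
    return { exitCode: 2, stdout: "", stderr: `llm-security blocked: ${summary}` };
  }
  // warn: let the action proceed and surface an advisory via systemMessage
  return {
    exitCode: 0,
    stdout: JSON.stringify({ systemMessage: `llm-security warning: ${summary}` }),
    stderr: "",
  };
}
```

The same finding therefore costs a blocked action in `block`, a visible advisory in `warn`, and nothing at all in `off`. This is why `warn` is the safer default while baselining.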
---
## 4. Bash normalization (T1-T6) as defense-in-depth
`scanners/lib/bash-normalize.mjs` collapses six known bash obfuscation
techniques before the denylist gate runs. These are **defense-in-depth** layers
that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.
The plugin's "defense-in-depth" claim resolves to **three independent detection
layers with documented bypass classes**: (1) the Claude Code harness denylist
(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
+ `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
provide formal worst-case guarantees.
| Layer | Technique | Example | Normalization |
|-------|-----------|---------|---------------|
| T1 | Empty quotes | `r''m -rf /` | strip `''` / `""` between tokens |
| T2 | `${}` expansion | `r${x}m -rf /` | drop `${VAR}` where VAR is unset in scan context |
| T3 | Backslash continuation | `r\<newline>m -rf /` | collapse backslash-newline pairs |
| T4 | Tab/whitespace splitting | `rm\t-rf /` | normalize whitespace to single space |
| T5 | `${IFS}` word-splitting | `rm${IFS}-rf${IFS}/` | replace `${IFS}` with space |
| T6 | ANSI-C hex quoting | `$'\x72\x6d' -rf /` | decode `$'\xHH'` to ASCII byte |
See `CLAUDE.md` §Defense Philosophy for the broader framing.
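The following simplified JavaScript re-implementation makes the T1-T6 table concrete. It is a hypothetical sketch, not `scanners/lib/bash-normalize.mjs`. The regexes cover only the example forms in the table, and `scanEnv` is an assumed stand-in for the "scan context" that T2 refers to. Rule order matters in two places. T3 runs before T4 so that a backslash is not left stranded. T5 runs before T2 so that `${IFS}` becomes a space instead of being dropped as an unset variable.

```javascript
// Hypothetical sketch of the T1-T6 rules; not scanners/lib/bash-normalize.mjs.
function normalizeBash(cmd, scanEnv = {}) {
  let s = cmd;
  // T3: backslash-newline is a line continuation; bash removes both characters.
  s = s.replace(/\\\r?\n/g, "");
  // T6: decode ANSI-C quoting made of \xHH escapes, e.g. $'\x72\x6d' -> rm.
  s = s.replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_, hex) =>
    hex.replace(/\\x([0-9a-fA-F]{2})/g, (_m, h) => String.fromCharCode(parseInt(h, 16))),
  );
  // T5: ${IFS} / $IFS splits words, so treat it as a space.
  s = s.replace(/\$\{IFS\}|\$IFS\b/g, " ");
  // T2: replace ${VAR} with its scan-context value, or drop it when unset.
  s = s.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_m, name) =>
    Object.prototype.hasOwnProperty.call(scanEnv, name) ? scanEnv[name] : "",
  );
  // T1: empty quote pairs contribute nothing to the word they sit in.
  s = s.replace(/''|""/g, "");
  // T4: collapse tabs and runs of whitespace to a single space.
  return s.replace(/\s+/g, " ").trim();
}

console.log(normalizeBash("rm${IFS}-rf${IFS}/"));
```

Every example in the table normalizes to `rm -rf /`. That is the form the post-normalization pattern match in `pre-bash-destructive.mjs` is meant to see.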
---
## 5. Alignment with Opus 4.7 (system card references)
### 5.1 Agent safety evaluations (§5.2.1)
The Opus 4.7 system card §5.2.1 documents agentic safety evaluations and notes
that multi-layer defenses outperform single-layer defenses against adaptive
attacks. `llm-security` implements this posture: prompt-scan + pathguard +
trifecta-guard + pre-compact-scan operate in depth, so a single failing layer
does not on its own defeat the defense.
### 5.2 Instruction following and hierarchy (§6.3.1.1)
The Opus 4.7 system card §6.3.1.1 describes tighter adherence to the declared
instruction hierarchy and more literal interpretation of agent instructions.
Consequently:
- Stacked imperatives (e.g., "NEVER do X / MUST NOT do X") are less useful than
tool-level enforcement. Prefer `tools:` frontmatter to restrict capabilities
at the platform level, so the agent simply does not have the unsafe tool.
- Agent instructions should mark speculation as speculation, and cite evidence
(path, line number) rather than generalizing from one observation. See the
"Step 0 Generaliseringsgrense" (Norwegian for "generalization limit") note added
to `skill-scanner-agent.md` and `mcp-scanner-agent.md`.
- Parallel Read calls are preferred for independent file reads, documented in
the same Step 0 notes. This reduces latency and aligns with the model's
improved parallel-tool-use behavior.
### 5.3 Known limitations (system card §6.3)
Prompt injection is structurally unsolvable in the current architecture. The
system card acknowledges this; so does `CLAUDE.md` §Defense Philosophy. The
hardening described here reduces the attack surface and raises the cost of
attacks but does not eliminate them.
---
## 6. Calibration & false positives (v7.0.0+)
Security scanners live or die by their signal-to-noise ratio. A scanner that
cries "extreme" on every project destroys its own credibility — users learn
to ignore findings, and genuine threats slip past. v7.0.0 ships three
calibration layers to keep that from happening.
### 6.1 Risk-score v2 formula
The v1 formula was a sum-and-cap: `critical*25 + high*10 + medium*4 + low*1`,
capped at 100. Every non-trivial scan collapsed to 100/Extreme regardless of
actual distribution. A codebase with 2 mediums and 100 lows scored the same
as a codebase with 5 criticals.
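The v1 saturation is easy to check by hand. `legacyScore` below is a hypothetical stand-in that reproduces the weighted sum and cap described above. It does not claim to match the signature of the retained `riskScoreV1()`.

```javascript
// Hypothetical stand-in for the legacy v1 arithmetic: weighted sum, capped at 100.
function legacyScore({ critical = 0, high = 0, medium = 0, low = 0 }) {
  return Math.min(100, critical * 25 + high * 10 + medium * 4 + low * 1);
}

console.log(legacyScore({ medium: 2, low: 100 })); // 100 (uncapped sum: 108)
console.log(legacyScore({ critical: 5 }));         // 100 (uncapped sum: 125)
```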
v2 (`scanners/lib/severity.mjs`) is severity-dominated and log-scaled within
tier:
| Finding mix | Score range | Band |
|-------------|-------------|------|
| Critical present | 7095 (1=80, 2=86, 4=90, 10=95) | Critical/Extreme |
| High only | 4065 (1=48, 5=60, 17=65) | High |
| Medium only | 1535 (1=20, 5=28, 50=33) | Medium |
| Low only | 111 (1=4, 10=11) | Low |
| None | 0 | Low |
Verdict cutoffs (`BLOCK ≥65`, `WARNING ≥15`) are locked to the `riskBand()`
boundaries so you can't get a "BLOCK / Medium band" contradiction. The legacy
formula is kept as `riskScoreV1()` for reference only.
**CI impact:** Pipelines with `--fail-on high` keep working (the severity
gate is unaffected). Pipelines with score-based thresholds need recalibration
— old `score >= 21` corresponds roughly to new `score >= 15`.
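The verdict cutoffs fit in a small function. `verdictFromScore` and `ciGate` are hypothetical helpers. `PASS` is a placeholder label for scores below the `WARNING` cutoff, because this guide does not name that verdict. The gate's default threshold of 15 follows the CI recalibration advice above.

```javascript
// Hypothetical helpers. Cutoffs are the documented ones (BLOCK >= 65,
// WARNING >= 15); "PASS" is a placeholder name for everything below 15.
function verdictFromScore(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "PASS";
}

// A score-based CI gate returns a process exit code: 1 fails the build.
// 15 is the recalibrated threshold that replaces an old `score >= 21`.
function ciGate(score, threshold = 15) {
  return score >= threshold ? 1 : 0;
}

// Anchor scores from the table: 1 low, 1 medium, 50 mediums, 1 high, 17 highs, 1 critical.
for (const score of [4, 20, 33, 48, 65, 80]) {
  console.log(score, verdictFromScore(score), ciGate(score));
}
```

Read through these cutoffs, the table implies the following:
- A single high finding scores 48 and yields `WARNING`.
- It takes 17 highs to reach 65 and `BLOCK`.
- A medium-only mix tops out at 35 and never blocks.
- Any critical finding scores at least 70 and always blocks.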
### 6.2 Context-aware entropy scanner
The entropy scanner flags high-Shannon-entropy strings as possible
credentials. On codebases heavy with shader code, bundled JS, CSS-in-JS or
SQL it produced astronomical false-positive rates. v7.0.0 adds three
suppression layers:
1. **File-extension skip** — whole files with these extensions are never
inspected for entropy findings: `.glsl, .frag, .vert, .shader, .wgsl,
.css, .scss, .sass, .less, .svg` + compound `.min.js, .min.css, .map`. A
skip counter (`calibration.files_skipped_by_extension`) is reported in the
scanner envelope.
2. **Line-level rules 11–18** — applied when a line contains any of: GLSL
keywords (`uniform`, `vec3`, `texture2D`…), CSS-in-JS templates
(`styled.…`), inline `<svg>` markup, ffmpeg `filter_complex` syntax,
browser `User-Agent` strings, SQL statements on a dedicated line
(`^\s*(SELECT|INSERT|…)`), ``throw new Error(`…`)`` templates, or
markdown image syntax with an external URL (`![alt](https://cdn…)`), which is
common in JSON content indexes and article metadata.
3. **Per-project policy override** — `.llm-security/policy.json` `entropy`
section supports:
```json
{
  "entropy": {
    "thresholds": {
      "critical": { "entropy": 5.4, "minLen": 128 },
      "high": { "entropy": 5.1, "minLen": 64 },
      "medium": { "entropy": 4.7, "minLen": 40 }
    },
    "suppress_extensions": [".custom"],
    "suppress_line_patterns": ["MY_VENDOR_MARKER"],
    "suppress_paths": ["vendored/", "generated/"]
  }
}
```
The synthesizer agent reports calibration prominently if more than 80% of files
were skipped (a sign that the policy is so aggressive the scan is effectively
bypassed) and omits it silently if fewer than 5% were skipped.
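The sketch below covers the core check plus the first and third suppression layers. `shannonEntropy` implements the standard bits-per-character formula. `skippedByExtension` and `classifyToken` are hypothetical helpers. The thresholds are copied from the `policy.json` example above and may not be the scanner's built-in defaults. Treating `suppress_line_patterns` as plain substrings rather than regexes is also an assumption.

```javascript
// Hypothetical sketch of entropy scoring with extension and line suppression.
// Extension list from rule 1 above; thresholds from the policy.json example.
const SKIP_EXTENSIONS = [
  ".glsl", ".frag", ".vert", ".shader", ".wgsl",
  ".css", ".scss", ".sass", ".less", ".svg",
  ".min.js", ".min.css", ".map",
];

const THRESHOLDS = [
  ["critical", { entropy: 5.4, minLen: 128 }],
  ["high", { entropy: 5.1, minLen: 64 }],
  ["medium", { entropy: 4.7, minLen: 40 }],
];

// Layer 1, plus the policy's suppress_extensions: skip whole files by suffix.
function skippedByExtension(path, policy = {}) {
  const lower = path.toLowerCase();
  const extra = policy.suppress_extensions ?? [];
  return [...SKIP_EXTENSIONS, ...extra].some((ext) => lower.endsWith(ext));
}

// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(str) {
  const chars = [...str];
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Highest tier whose entropy AND length bars are both met, or null.
// Layer 3: a line containing any suppress_line_patterns substring is ignored.
function classifyToken(token, line = token, policy = {}) {
  const patterns = policy.suppress_line_patterns ?? [];
  if (patterns.some((p) => line.includes(p))) return null;
  const h = shannonEntropy(token);
  const len = [...token].length;
  for (const [severity, t] of THRESHOLDS) {
    if (h >= t.entropy && len >= t.minLen) return severity;
  }
  return null;
}
```

Entropy is measured per character, so a string of length n can score at most log2(n) bits. A 40-character token therefore tops out at about 5.32 bits, which means the 5.4-bit critical bar is reachable only by longer tokens.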
### 6.3 Typosquat allowlist
The DEP scanner flags Levenshtein-close package names against a top-N list
to catch typosquats (`lod-ash`, `expres`). On real codebases this tripped on
short-name tools like `knip`, `nx`, `tsx`, `uv`, `ruff`. v7.0.0 extends
`knowledge/typosquat-allowlist.json` with 22 npm + 5 PyPI entries for modern
tools.
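A minimal sketch of the typosquat check shows why short names trip it. `levenshtein` is the textbook dynamic-programming edit distance. `typosquatSuspect`, the `maxDistance = 2` default, and the popular-package lists are assumptions for illustration. None of them are values taken from the DEP scanner.

```javascript
// Textbook Levenshtein distance, keeping only two DP rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Hypothetical check: returns the popular package a name is suspiciously
// close to, or null if the name is popular itself or allowlisted.
function typosquatSuspect(name, popular, allowlist, maxDistance = 2) {
  if (allowlist.has(name) || popular.includes(name)) return null;
  for (const target of popular) {
    if (levenshtein(name, target) <= maxDistance) return target;
  }
  return null;
}

console.log(typosquatSuspect("lod-ash", ["lodash", "express"], new Set()));
console.log(typosquatSuspect("knip", ["knex"], new Set(["knip"])));
```

With a distance budget of 2, `knip` sits two substitutions away from `knex` and gets flagged unless it is allowlisted. For a four-letter name, two edits are half the word, so almost any short tool name lands near some popular package.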
### 6.4 Tuning workflow
1. Run `/security deep-scan` on a representative codebase.
2. Read `calibration.files_skipped_by_extension` and `files_skipped_by_path`
from the envelope — are they reasonable?
3. Review the top 10 findings. For each false positive, pick the narrowest
   suppression that catches it:
   - Whole extension noisy → `suppress_extensions`
   - One line pattern recurring → `suppress_line_patterns`
   - Whole directory vendored → `suppress_paths`
4. Raise thresholds only as a last resort — you're hiding real signal.
5. Re-scan and verify verdict/band/score make sense relative to the finding
set.
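Step 2 and the synthesizer's reporting rule from §6.2 can be combined into one check. The counter names come from the envelope fields mentioned above. Several details are assumptions:
- Both counters sit under `calibration`.
- The 80% / 5% rule applies to their sum.
- The caller supplies the total file count.
- The helper name `calibrationReport` is hypothetical.

```javascript
// Hypothetical helper: skip ratio from the calibration counters, plus the
// synthesizer's reporting rule (>80% prominent, <5% omitted).
function calibrationReport(envelope, totalFiles) {
  const cal = envelope.calibration ?? {};
  const skipped =
    (cal.files_skipped_by_extension ?? 0) + (cal.files_skipped_by_path ?? 0);
  const ratio = totalFiles > 0 ? skipped / totalFiles : 0;
  let visibility = "normal";
  if (ratio > 0.8) visibility = "prominent"; // policy may be bypassing the scan
  else if (ratio < 0.05) visibility = "omit"; // too little to be worth reporting
  return { skipped, ratio, visibility };
}

console.log(calibrationReport(
  { calibration: { files_skipped_by_extension: 120, files_skipped_by_path: 30 } },
  1000,
));
```

A `prominent` result on the first scan is the cue to narrow `suppress_paths` or `suppress_extensions` before trusting the verdict.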
---
## 7. Recommended baseline for production
1. Set `CLAUDE_CODE_EFFORT_LEVEL=xhigh` for audit and planning sessions.
2. Set `ENABLE_PROMPT_CACHING_1H=1` globally — reduces cost, does not weaken
scanning.
3. All three plugin hook modes: start at `warn`, promote to `block` after
baselining.
4. Keep sandbox wrappers enabled (default on macOS / Linux).
5. Periodically run `/security posture` (16-category scorecard) and
`/security dashboard` (cross-project view) to catch drift.
6. After first `/security deep-scan`, run the §6.4 tuning workflow once to
calibrate the noise floor for your codebase.
---
**Last updated:** 2026-04-29 for v7.1.0.
### v7.1.0 calibration note
v7.1.0 is a patch release. No calibration changes; the §6 tuning workflow above is
unchanged. Two hook-level bugs were fixed that affect production posture:
- `pre-write-pathguard.mjs` now blocks multi-segment `.env.*.*.*` paths (previously a
regex hole let `.env.production.local.backup` through).
- `post-session-guard.mjs` `block` mode now blocks every detected trifecta. Previously,
blocking required a "concentrated MCP" or "sensitive path" qualifier, so distributed
trifectas were advisory-only even in block mode.
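The pathguard fix is easiest to see with two regexes side by side. Both are hypothetical illustrations of this class of bug, not the patterns from `pre-write-pathguard.mjs`. The first accepts at most one suffix after `.env`. The second accepts any number of dot-separated suffixes.

```javascript
// Hypothetical illustration only; neither regex is taken from pre-write-pathguard.mjs.
// Leaky: matches at most ONE suffix after ".env", so deeper names slip through.
const leakyEnvPattern = /(^|[\\/])\.env(\.[^.\\/]+)?$/;
// Fixed: matches any number of dot-separated suffixes after ".env".
const envPattern = /(^|[\\/])\.env(\.[^.\\/]+)*$/;

for (const p of [".env", "app/.env.local", ".env.production.local.backup", ".envrc"]) {
  console.log(p, leakyEnvPattern.test(p), envPattern.test(p));
}
```

Only the second pattern matches `.env.production.local.backup`. Both still reject `.envrc`, because the name must either end right after `.env` or continue with a dot.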
If you run with `LLM_SECURITY_TRIFECTA_MODE=block`, expect the block rate, including
false blocks, to rise after this upgrade. The previous gate suppressed real
trifectas that are now blocked. Re-baseline the warn-mode noise floor before
promoting to block, per §3.
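The behavioral change in `post-session-guard.mjs` block mode can be modelled as a small decision function. The trifecta object's `concentratedMcp` and `sensitivePath` flags and the `preV710` switch are hypothetical names for the qualifiers described above, not fields from the hook's source.

```javascript
// Hypothetical model of post-session-guard's decision for a detected trifecta.
// `concentratedMcp`, `sensitivePath` and `preV710` are illustrative names.
function trifectaAction(mode, trifecta, { preV710 = false } = {}) {
  if (mode === "off" || !trifecta) return "allow";
  if (mode === "warn") return "advise";
  // block mode
  if (preV710) {
    // Before v7.1.0: only qualified trifectas were blocked.
    return trifecta.concentratedMcp || trifecta.sensitivePath ? "block" : "advise";
  }
  // v7.1.0 and later: every detected trifecta is blocked.
  return "block";
}

const distributed = { concentratedMcp: false, sensitivePath: false };
console.log(trifectaAction("block", distributed, { preV710: true })); // advise
console.log(trifectaAction("block", distributed));                    // block
```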