39 commits

Author SHA1 Message Date
Kjell Tore Guttormsen
ba5f2b64ad feat(policy-loader): 8.7 — env-var deprecation warnings (v8.0.0 removal) 2026-04-30 17:11:07 +02:00
Kjell Tore Guttormsen
2b7329151c docs(severity): 8.4 — @deprecated annotation on riskScoreV1 2026-04-30 16:54:37 +02:00
Kjell Tore Guttormsen
eaac830300 feat(mcp-description-cache): E14 part 1 — baseline + history schema (cumulative drift) [skip-docs]
Wave C step C1: extend the MCP description cache schema with a sticky
baseline slot per tool and a rolling history array (last 10 drift events).
Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|);
emits a separate signal when ratio >= mcp.cumulative_drift_threshold
(default 0.25). Per-update drift logic and threshold unchanged.
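
A minimal sketch of the cumulative-drift ratio described above, assuming a
plain two-row Levenshtein. The function names are hypothetical and are not
the cache module's actual exports; only the formula and the 0.25 default
come from this message.

```javascript
// Hypothetical helper: classic two-row Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Ratio as stated above: distance / max(|current|, |baseline|).
function cumulativeDriftRatio(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Threshold mirrors mcp.cumulative_drift_threshold (default 0.25).
function exceedsCumulativeDrift(current, baseline, threshold = 0.25) {
  return cumulativeDriftRatio(current, baseline) >= threshold;
}
```

Comparing against the sticky baseline rather than the previous description
is what lets many small edits add up to a signal.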

- loadCache(): TTL purge now skips entries with a baseline, preserving
  cumulative-drift detection across the 7-day window. v7.2.0 entries
  (no history field) are migrated on read by seeding baseline from the
  current description and adding an empty history array. Entries with
  history but no baseline (post-clearBaseline) are NOT re-seeded.
- checkDescriptionDrift(): when an entry exists with history but no
  baseline (i.e. baseline was cleared), the next call re-seeds baseline
  from the incoming description so the legitimate next version becomes
  the new baseline.
- clearBaseline(toolName?): removes baseline for one tool or all tools.
  Preserves description / firstSeen / lastSeen / history.
- listBaselines(): read-only listing for the upcoming reset CLI.
- LLM_SECURITY_MCP_CACHE_FILE env var override for end-to-end testing.
- New policy key mcp.cumulative_drift_threshold (default 0.25).

Tests: 23 new unit tests; existing 10 still pass.

Docs deferred: CLAUDE.md update lands in C3 alongside the new
/security mcp-baseline-reset command. C2 adds the hooks-table footer
note. Combined wave docs match plan §"Wave C — Touch" list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 16:37:33 +02:00
Kjell Tore Guttormsen
ede37219a3 feat(workflow-scanner): E11 part 2 — re-interpolation + auth-bypass + WFL prefix + orchestrator
Closes E11. Three new pieces, plus integration:

1. Re-interpolation detector (Appsmith GHSL-2024-277 stealth pattern).
   The scanner now collects env: bindings (key -> source-expression
   text) by walking parsed events whose parentChain includes 'env',
   then for each `${{ env.<KEY> }}` inside run:, emits a MEDIUM
   finding if the binding source matches the 23-field blacklist. This
   catches the pattern where developers apply env-indirection but
   then re-interpolate the env var in run:, which cancels the
   mitigation (template substitution happens before shell parsing).

2. Auth-bypass category (Synacktiv 2023 Dependabot spoofing).
   Detects `if: ${{ github.actor == 'dependabot[bot]' }}` and
   variants. MEDIUM, owasp: 'LLM06' (Excessive Agency). Distinct
   from injection — same expression syntax, different threat class.
   Recommendation steers users to `github.event.pull_request.user.login`.

3. severity.mjs OWASP map registration. WFL prefix added to all
   four maps:
   - OWASP_MAP['WFL'] = ['LLM02', 'LLM06']
   - OWASP_AGENTIC_MAP['WFL'] = ['ASI04']
   - OWASP_SKILLS_MAP['WFL'] = []
   - OWASP_MCP_MAP['WFL'] = []
   Empty arrays for skills/MCP are explicit, not omitted — keeps
   `Object.keys(OWASP_MAP)` symmetric across maps.

4. scan-orchestrator.mjs registration. workflowScan added between
   supply-chain and toxic-flow (toxic-flow correlates after primaries).
   Verified via integration: orchestrator emits 9 WFL findings on
   tests/fixtures/workflows/.
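
The re-interpolation check from item 1 can be sketched roughly as follows.
This is an illustration under stated assumptions, not the scanner's code:
the four listed fields are only an excerpt of the 23-field blacklist, and
`findReinterpolations` plus its input shapes are hypothetical.

```javascript
// Hypothetical excerpt; the real blacklist has 23 fields.
const DANGEROUS_SOURCES = [
  'github.head_ref',
  'github.event.issue.body',
  'github.event.comment.body',
  'github.event.discussion.title',
];

// envBindings: { KEY: source-expression text } collected from env: blocks.
// runText: the body of one run: step.
function findReinterpolations(envBindings, runText) {
  const hits = [];
  const re = /\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  for (const m of runText.matchAll(re)) {
    const source = envBindings[m[1]];
    // Only flag when the env var was bound from a blacklisted field.
    if (typeof source === 'string' &&
        DANGEROUS_SOURCES.some((field) => source.includes(field))) {
      hits.push({ key: m[1], source, severity: 'MEDIUM' });
    }
  }
  return hits;
}
```

A plain shell reference such as `"$TITLE"` is not matched, which is the
safe form the env-indirection mitigation intends.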

Bug fix: extractTriggers in workflow-yaml-state.mjs was collecting
sub-properties (`branches:`, `types:`) as triggers. Now tracks the
first nested indent level and ignores anything deeper.

Tests:
- 6 new cases in tests/scanners/workflow-scanner.test.mjs:
  re-interp TP, no-double-count, auth-bypass TP, auth-bypass FP
  (startsWith head_ref is not auth-bypass), OWASP map shape,
  orchestrator import + SCANNERS array entry.
- 2 new fixtures: tp-reinterpolation.yml, auth-bypass-dependabot.yml.
- Existing 14 scanner tests + 15 state-machine tests unchanged.

Test count: 1732 -> 1738 (+6). Wave B total: +53 over baseline 1685.
Pre-compact-scan flake unchanged (passes in isolation).
2026-04-30 15:57:10 +02:00
Kjell Tore Guttormsen
c31d4b1718 feat(workflow-scanner): E11 part 1 — core file-walk + 23-field blacklist + sink-restriction
Adds a deterministic GitHub Actions / Forgejo Actions injection
scanner. Detects ${{ <dangerous-field> }} interpolations inside
`run:` step blocks under privileged or semi-privileged triggers.
Sink-restricted: `if:` / `with:` / `env:` (block-level) are
evaluated by the runner expression engine, not the shell, so they
are NOT injection sinks and are suppressed at parser level.

Why: workflow expression injection is the most prevalent SAST class
on GitHub (CodeQL preview: 800K+ findings across 158K repos). The
graduated severity matrix (HIGH for pull_request_target / discussion
/ workflow_run; MEDIUM for pull_request / workflow_dispatch) is the
community-converged calibration target — uniform HIGH causes alert
fatigue.
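
A small sketch of the graduated severity matrix, using only the five
triggers named above. Taking the highest severity when a workflow declares
several triggers, and returning null for unlisted triggers, are assumptions
of this sketch; `severityForTriggers` is a hypothetical name.

```javascript
// Trigger -> severity, as listed in this message.
const TRIGGER_SEVERITY = new Map([
  ['pull_request_target', 'HIGH'],
  ['discussion', 'HIGH'],
  ['workflow_run', 'HIGH'],
  ['pull_request', 'MEDIUM'],
  ['workflow_dispatch', 'MEDIUM'],
]);

// Assumption: the most severe declared trigger wins.
function severityForTriggers(triggers) {
  const levels = triggers.map((t) => TRIGGER_SEVERITY.get(t)).filter(Boolean);
  if (levels.includes('HIGH')) return 'HIGH';
  if (levels.includes('MEDIUM')) return 'MEDIUM';
  return null; // trigger not covered by this sketch
}
```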

Components:
- scanners/lib/workflow-yaml-state.mjs — line-based YAML state
  machine. Tracks indentation, parent-context stack, and
  `run: |` / `run: >` block-scalar entry/exit. Zero deps.
- scanners/workflow-scanner.mjs — discoverWorkflows() probes
  .github/workflows/ and .forgejo/workflows/ directly (file-discovery
  has no glob include). 23-field blacklist (GHSL 17 + 6 GlueStack-
  class additions). Platform encoded via file path; no schema
  extension to finding(). Forgejo-specific: workflow_run advisory
  emitted to stderr; recommendation text mentions Forgejo's
  server-level token scoping (job-level permissions: is ignored).
- knowledge/workflow-injection-patterns.md — 23-field blacklist,
  trigger taxonomy, severity matrix, Forgejo divergences, NVD CVE
  corpus.

Tests (47 new):
- tests/lib/workflow-yaml-state.test.mjs (15): trigger forms
  (string / inline-list / block-list / block-mapping), single-line
  run, block-scalar | and > tracking, env/with sink-mismatch,
  multi-line, comment stripping, line-number accuracy.
- tests/scanners/workflow-scanner.test.mjs (14): TP head_ref
  pull_request_target, TP discussion.title gluestack pattern,
  TP comment.body pull_request, TP issue.body block-scalar,
  FP if-context, FP env-block, INFO numeric, Forgejo TP, Forgejo
  workflow_run advisory, envelope shape, WFL prefix.
- 9 fixtures in tests/fixtures/workflows/{.github,.forgejo}/workflows/.

Out of scope (B4 / Batch D):
- Re-interpolation detection (env.VAR after env: from blacklisted source)
- github.actor authorization-bypass category
- WFL prefix in severity.mjs OWASP maps + scan-orchestrator
  registration (B4)
- Composite-action input tracing, GITHUB_ENV poisoning (Batch D)

Test count: 1685 → 1732 (+47). Pre-compact-scan flake unchanged
(passes in isolation).
2026-04-30 15:48:48 +02:00
Kjell Tore Guttormsen
ad86f5031a feat(pre-install-supply-chain): E13 — npm scope-hopping MEDIUM advisory with allowlist
Adds a scope-hopping detector to the npm install gate. When a user
installs `@<scope>/<unscoped>`, the hook now emits a MEDIUM warning
on stderr (exit 0, never blocks) if:
  - `<unscoped>` matches a popular npm package (POPULAR_NPM, ~80
    names from knowledge/top-packages.json), AND
  - `<scope>` is neither on NPM_OFFICIAL_SCOPES (built-in 22 entries) nor
    on policy.json `supply_chain.allowed_scopes`.

Why: an attacker publishing `@evilcorp/lodash` cannot squat the bare
`lodash` name, but they can register an unrelated scope and rely on
typo or copy-paste to trick installs. NPM_OFFICIAL_SCOPES anchors the
known-good scopes (@types, @reduxjs, @nestjs, …) so legitimate
installs stay silent.
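
A minimal sketch of the scope-hop decision, assuming tiny stand-in lists.
`POPULAR`, `OFFICIAL_SCOPES` and `checkScopeHopSketch` are hypothetical and
only mimic the shape of POPULAR_NPM, NPM_OFFICIAL_SCOPES and checkScopeHop;
version suffixes such as `@1.2.3` are not handled here.

```javascript
// Hypothetical, heavily trimmed stand-ins for the real lists.
const POPULAR = new Set(['lodash', 'express', 'react']);
const OFFICIAL_SCOPES = new Set(['types', 'reduxjs', 'nestjs']);

function checkScopeHopSketch(name, extraAllowedScopes = []) {
  if (typeof name !== 'string') return null; // defensive: non-string input
  const m = /^@([^/]+)\/([^/]+)$/.exec(name);
  if (!m) return null; // bare (unscoped) names are out of scope
  const [, scope, unscoped] = m;
  if (!POPULAR.has(unscoped)) return null;
  // Accept policy entries written with or without the leading '@'.
  const allowed = extraAllowedScopes.map((s) => String(s).replace(/^@/, ''));
  if (OFFICIAL_SCOPES.has(scope) || allowed.includes(scope)) return null;
  return { severity: 'MEDIUM', scope, unscoped };
}
```

Keeping the check a pure function over its inputs is what makes the policy
override testable without touching disk or network.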

Implementation:
- `scanners/lib/supply-chain-data.mjs`: exports POPULAR_NPM,
  NPM_OFFICIAL_SCOPES, and `checkScopeHop(name, extraAllowedScopes)` —
  pure function, no policy/network dependency, fully unit-testable.
- `knowledge/typosquat-allowlist.json`: mirrors NPM_OFFICIAL_SCOPES as
  `npm_official_scopes`. A doc-consistency assertion ensures the two
  lists never drift.
- `hooks/scripts/pre-install-supply-chain.mjs`: imports checkScopeHop,
  reads `supply_chain.allowed_scopes` from policy, and pushes a
  warning before existing compromised/audit checks.

Tests:
- 9 new cases in tests/hooks/pre-install-supply-chain.test.mjs:
  TP @evilcorp/lodash, TP @attacker/express, allowlist @types,
  allowlist @reduxjs, allowlist @modelcontextprotocol, FP unscoped
  name not in top-100, bare unscoped name, policy override, defensive
  non-string input, NPM_OFFICIAL_SCOPES <-> typosquat-allowlist.json
  consistency.
2026-04-30 15:38:28 +02:00
Kjell Tore Guttormsen
0f4b0c5f2c feat(git-clone): E12 — .gitattributes filter-driver post-clone advisory
Adds scanGitAttributes(repoDir) — pure function that parses
.gitattributes after a sandboxed clone and returns the
{filter,diff,merge} driver entries that would run on checkout. The
clone CLI prints each entry as a "MEDIUM" stderr advisory followed by
a recommendation to verify the smudge/clean command before moving the
clone outside the sandbox.

Why: filter drivers execute arbitrary shell during checkout (smudge
runs on read, clean on write). Even with the existing sandboxed clone,
downstream consumers that re-checkout files outside the sandbox can be
exploited. Surfacing the directive list lets the caller decide whether
to proceed.

Out-of-scope: in-line content of the smudge command is not analysed —
the advisory is for human review, not automatic blocking.
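
A rough sketch of the parsing step, operating on the file text rather than
a repo directory. `scanGitAttributesText` and the returned field names are
hypothetical; inline-comment stripping and unreadable-path handling from
the real scanGitAttributes are left out.

```javascript
// Returns one entry per filter/diff/merge driver assignment.
function scanGitAttributesText(text) {
  const entries = [];
  text.split(/\r?\n/).forEach((line, idx) => {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) return; // blank or comment
    const [pattern, ...attrs] = trimmed.split(/\s+/);
    for (const attr of attrs) {
      const m = /^(filter|diff|merge)=(.+)$/.exec(attr);
      if (m) {
        entries.push({ line: idx + 1, pattern, kind: m[1], driver: m[2] });
      }
    }
  });
  return entries;
}
```

Attributes such as `-text` or a bare `text` carry no driver name and are
ignored; only `key=value` forms for the three driver kinds are reported.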

Tests:
- tests/lib/git-clone-gitattributes.test.mjs (8 cases): LFS-style,
  custom driver, missing/empty/comment-only files, line-number
  tracking, inline-comment stripping, unreadable path graceful return.
2026-04-30 15:29:13 +02:00
Kjell Tore Guttormsen
950e4e4bce feat(injection): E3 — rot13 layer for comment-block injection
Adds rot13 to the variantSet built in scanForInjection(), so
imperative phrases hidden as rot13 inside code comments still hit
the existing CRITICAL/HIGH/MEDIUM pattern arrays.

normalizeForScan() already covers base64, hex, URL, and HTML decoding
in a 3-iteration loop — those are NOT duplicated here. rot13 is the
only genuinely new variant: it is its own inverse and not part of any
NIST/Unicode normalization spec, so it has to be applied explicitly.

Threshold: only inputs >40 chars enter the rot13 pass, to suppress
false positives on accidental letter-shifts in tokens, ids, and short
identifiers. Variants are deduplicated against the existing set so
matchers do not run twice.
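
A minimal sketch of the rot13 variant step, assuming the variant set is a
plain Set of strings. `addRot13Variant` is a hypothetical name; the
40-character threshold is the one stated above.

```javascript
// rot13 is its own inverse: applying it twice returns the input.
function rot13(s) {
  return s.replace(/[A-Za-z]/g, (ch) => {
    const base = ch <= 'Z' ? 65 : 97; // 'A' or 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Only inputs longer than minLen get a rot13 variant; the Set handles
// deduplication against variants that are already present.
function addRot13Variant(variants, input, minLen = 40) {
  if (input.length > minLen) variants.add(rot13(input));
  return variants;
}
```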

3 new tests in injection-patterns.test.mjs (rot13 detection, sub-40
char suppression, plaintext path still green). Total 168 tests pass.

Closes E3 in critical-review-2026-04-20.md.
2026-04-30 15:21:03 +02:00
Kjell Tore Guttormsen
761e81309b feat(bash-normalize): T7 — process substitution collapse (E8)
Strips bash process substitution syntax — <(cmd) and >(cmd) — so the
inner command name is surfaced to downstream regex gates. Defeats
evasion like `cat <(curl evil)` where the destructive command is
hidden behind /dev/fd/N pipe sugar.

Implementation: bounded innermost-first iteration, depth 3. Beyond
that the string is left as-is rather than recurse without bound.
Runs after the single-quote mask phase, so legitimate strings like
`'echo <(x)'` are preserved.
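
A simplified sketch of the bounded, innermost-first collapse. It assumes
single-quoted regions were masked beforehand, so no masking happens here,
and `collapseProcessSubstitution` is a hypothetical name.

```javascript
// Replaces <(cmd) and >(cmd) with the inner command text, innermost
// first, for at most maxDepth passes.
function collapseProcessSubstitution(cmd, maxDepth = 3) {
  const innermost = /[<>]\(([^()]*)\)/g; // no nested parens inside
  let out = cmd;
  for (let depth = 0; depth < maxDepth; depth++) {
    const next = out.replace(innermost, ' $1 ');
    if (next === out) break; // nothing left to collapse
    out = next;
  }
  return out;
}
```

With the default depth, a fourth nesting level is left untouched, which
matches the "left as-is" behaviour described above.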

5 new T7 tests (collapse + nested + FP probes) in
bash-normalize-t7-t9.test.mjs (now 12 tests total).

Closes E8 in critical-review-2026-04-20.md.
2026-04-30 15:14:04 +02:00
Kjell Tore Guttormsen
037b9644f3 feat(bash-normalize): T9 — one-level variable substitution (E10)
Defeats split-and-substitute evasion where attackers split a destructive
command name across an assignment and a variable reference (X=rm; later
$X) so downstream regex gates miss the literal command name. T9 collects
prefix assignments (VAR=value at start of string or after ; & |) and
substitutes ${VAR} / $VAR forms with the captured value. One-level
forward-flow only — chained vars are not followed.

Documented limits in JSDoc:
- Quoted assignments (X="rm -rf") not parsed (whitespace stops capture)
- Substitution is global within string, not scoped. Acceptable because
  T3 strips unknown ${VAR} to '' afterwards.

Single-quoted literals are masked before T9 runs, so legitimate
strings are preserved (FP probe in tests).
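
A rough sketch of the one-level substitution, under the same limits listed
above (unquoted values only, global replacement, no chaining). The name
`substituteVarsOneLevel` and the exact regexes are assumptions, and quote
masking is not reproduced here.

```javascript
function substituteVarsOneLevel(cmd) {
  // Collect VAR=value at string start or after ; & |
  const assignments = new Map();
  const assignRe = /(?:^|[;&|])\s*([A-Za-z_][A-Za-z0-9_]*)=([^\s;&|]+)/g;
  for (const m of cmd.matchAll(assignRe)) assignments.set(m[1], m[2]);

  let out = cmd;
  for (const [name, value] of assignments) {
    // Match ${VAR} or $VAR (not followed by another identifier char).
    const ref = new RegExp(
      '\\$\\{' + name + '\\}|\\$' + name + '(?![A-Za-z0-9_])', 'g');
    out = out.replace(ref, () => value); // captured value, not re-expanded
  }
  return out;
}
```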

7 new tests in bash-normalize-t7-t9.test.mjs.
Closes E10 in critical-review-2026-04-20.md.
2026-04-30 15:12:02 +02:00
Kjell Tore Guttormsen
6073952b97 fix(injection): E16 ASCII fast-path + UNI-003 expectation update (v7.2.0)
Two follow-up fixes after E16 + E17 landed:

1. foldHomoglyphs ASCII fast-path
   - scanForInjection calls foldHomoglyphs on every scan (raw + normalized).
   - Pre-fix: NFKC normalization runs unconditionally, even on pure
     ASCII inputs where it's a no-op.
   - Result: benchmark.test.mjs timed out at 120s on the full suite.
   - Fix: charCodeAt sweep for >=128, short-circuit return s when
     all ASCII. NFKC and HOMOGLYPH_MAP iteration only run when
     non-ASCII chars are present (the actual attack case).
   - Verified: benchmark.test.mjs passes within timeout.

2. Attack-scenario UNI-003 expectation
   - Pre-E16: "Homoglyph Cyrillic-Latin mixing" payload triggered only
     a MEDIUM "obfuscation present" advisory (exit 0, stdout match
     "MEDIUM").
   - Post-E16: the same payload is folded to Latin BEFORE pattern
     matching, so it now matches CRITICAL "ignore previous instructions"
     and blocks (exit 2).
   - This is the intended v7.2.0 behavior — not a regression. Updated
     expectation: exit_code 2, stdout_match "block". Renamed scenario
     to "now blocked via E16 fold, v7.2.0".
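
The ASCII fast-path from item 1 can be sketched as below. `isAllAscii` and
`foldWithFastPath` are hypothetical names, and the expensive fold is passed
in as a callback so the sketch stays self-contained.

```javascript
// True when every UTF-16 code unit is below 128.
function isAllAscii(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) >= 128) return false;
  }
  return true;
}

// Skip the expensive fold entirely for pure-ASCII input.
function foldWithFastPath(s, slowFold) {
  return isAllAscii(s) ? s : slowFold(s);
}
```

For example, `foldWithFastPath(text, (t) => t.normalize('NFKC'))` only pays
for normalization when a non-ASCII character is present.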

Suite: pre-compact-scan flake remains (perf-budget under load,
passes isolated). All other tests green.
2026-04-29 14:44:41 +02:00
Kjell Tore Guttormsen
ec4ae268da feat(injection): E16 — homoglyph NFKC fold before every pattern match
Critical-review §4 E16 finding: pre-v7.2.0 homoglyph normalization fired
ONLY for the MEDIUM-advisory "obfuscation present" signal. Pattern
matchers in scanForInjection compared against raw + decoded variants
only — they did NOT compare against a fold-normalized variant. As a
result, "ignоre previous instructions" (Cyrillic о, U+043E) bypassed
the CRITICAL "ignore previous" pattern.

Two coordinated edits:

scanners/lib/string-utils.mjs
- Adds HOMOGLYPH_MAP (frozen) — surgical Cyrillic/Greek → Latin map.
  ~25 entries focused on injection-vocabulary letters
  (a, e, o, c, p, x, y, i, j, s, l, A, E, O, C, P, X, Y, T).
- Adds foldHomoglyphs(s) — pipeline: NFKC → apply HOMOGLYPH_MAP.
  NFKC handles Mathematical Alphanumeric (U+1D400 block), fullwidth
  Latin (U+FF21 block), ligatures, width variants.

Excluded by design from HOMOGLYPH_MAP:
- Latin Extended (æ, ø, å, é, è, ñ, ü, ö, ä, ç, ß, þ, ð) — legitimate
  Norwegian/German/French/Spanish letters. Map them and we false-positive
  on every non-English source file.
- Greek letters not visually overlapping (β, γ, δ, ...)
- Cyrillic letters not visually overlapping (б, г, д, ж, ...)

scanners/lib/injection-patterns.mjs
- scanForInjection now builds a 4-variant set: raw, normalized,
  folded(raw), folded(normalized). Set deduplication skips redundant
  identical variants. Existing dedup-by-label (seenLabels Set) prevents
  double-counts when the same pattern matches in multiple variants.
- foldHomoglyphs added to the imports.
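
A minimal sketch of the fold and the 4-variant set. The map below is a
six-entry excerpt for illustration, not the real HOMOGLYPH_MAP, and
returning non-string input unchanged is an assumption of this sketch.

```javascript
// Hypothetical excerpt; the real map has about 25 entries.
const HOMOGLYPHS = Object.freeze({
  '\u0430': 'a', // Cyrillic a
  '\u0435': 'e', // Cyrillic ie
  '\u043E': 'o', // Cyrillic o
  '\u0441': 'c', // Cyrillic es
  '\u0440': 'p', // Cyrillic er
  '\u03BF': 'o', // Greek omicron
});

// Pipeline: NFKC first, then the confusables map.
function foldHomoglyphsSketch(s) {
  if (typeof s !== 'string') return s;
  let out = '';
  for (const ch of s.normalize('NFKC')) out += HOMOGLYPHS[ch] ?? ch;
  return out;
}

// raw, normalized, folded(raw), folded(normalized); the Set drops
// identical variants so matchers do not run on duplicates.
function buildVariants(raw, normalized) {
  return new Set([
    raw,
    normalized,
    foldHomoglyphsSketch(raw),
    foldHomoglyphsSketch(normalized),
  ]);
}
```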

Tests: +27 cases in tests/lib/string-utils-homoglyph.test.mjs:
- 6 Cyrillic → Latin (lowercase, uppercase, multiple substitutions,
  Palochka U+04CF)
- 3 Greek → Latin
- 2 NFKC normalization (Math Bold, Fullwidth)
- 8 preserves-non-confusable (Norwegian æøå, German umlauts, French
  accents, Spanish ñ, emoji, CJK, Arabic/Hebrew)
- 3 edge cases (empty, null/undefined, idempotency)
- 5 scanForInjection integration (Cyrillic ignore, Cyrillic Assistant,
  Norwegian non-trigger, benign "ignore" comment, mixed Cyrillic+Greek)

Test-development found: U+1D5DC is "I" not "A" (test pin caught my
codepoint mistake — fixed during dev).

Suite: 1617 → 1644 (+27). All green.
2026-04-29 14:22:05 +02:00
Kjell Tore Guttormsen
6cef80c640 feat(unicode): E1 — extend hidden-Unicode detection to PUA-A and PUA-B
Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector
(`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private
Use Areas — also invisible in most terminals and surviving normalization
— were not detected. Attackers could encode payloads in PUA codepoints
that pass through `scanForInjection` undetected.

Coverage extended to:
- U+E0001-E007F  Unicode Tag block       (existing — DeepMind cat. 1)
- U+F0000-FFFFD  Supplementary PUA-A      (NEW — E1)
- U+100000-10FFFD Supplementary PUA-B     (NEW — E1)

Detection-only for PUA: PUA characters have NO standard ASCII mapping,
so `decodeUnicodeTags` leaves them unchanged. Detection alone is
sufficient — `scanForInjection` emits HIGH on any presence, regardless
of decoded content.
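
The three ranges above can be expressed as one code-point-aware regex. This
is a sketch with hypothetical names; the real function is still called
containsUnicodeTags, and the tag decoder shown here assumes the usual
"code point minus 0xE0000" mapping for U+E0020-E007E.

```javascript
// Tag block, Supplementary PUA-A, Supplementary PUA-B.
const HIDDEN_UNICODE =
  /[\u{E0001}-\u{E007F}\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/u;

function containsHiddenUnicode(s) {
  return typeof s === 'string' && HIDDEN_UNICODE.test(s);
}

// Tag characters mirror ASCII; PUA has no such mapping and passes through.
function decodeTagsSketch(s) {
  return s.replace(/[\u{E0020}-\u{E007E}]/gu,
    (ch) => String.fromCodePoint(ch.codePointAt(0) - 0xE0000));
}
```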

Function name `containsUnicodeTags` preserved for back-compat. All
existing call sites (injection-patterns.mjs:259, etc.) work unchanged.
Semantically the function is now "containsHiddenUnicode".

Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs:
- 5 Tag-block regression guards
- 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII)
- 3 PUA-B range cases
- 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji,
  CJK, Latin Extended — all must be FALSE)
- 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B
  unchanged, Tag block still decodes, mixed Tag+PUA)

Suite: 1596 → 1617 (+21). All green.
2026-04-29 14:18:49 +02:00
Kjell Tore Guttormsen
5f8f2d3c41 fix(dep): B7 — token-overlap typosquat heuristic alongside Levenshtein
Critical-review §2 B7 finding: pure Levenshtein <=2 misses the most common
modern typosquat pattern — popular-name + token-injection suffix. Examples:
  lodash → lodash-utils    (edit distance 6, not flagged pre-B7)
  react  → react-helper    (edit distance 7, not flagged pre-B7)
  express → express-wrapper (edit distance 8, not flagged pre-B7)

Three coordinated edits:

scanners/lib/string-utils.mjs
- Adds tokenize(name): string[]    splits on -/_, lowercases
- Adds tokenOverlap(a, b): number  intersection.size / min(|a|,|b|)
- Adds TYPOSQUAT_SUSPICIOUS_TOKENS frozen list of common typosquat
  suffixes. Excludes language-extension tokens (js, jsx, ts, tsx) — the
  v7.0.0 allowlist contains `tsx` as a legit package and including the
  same token in the suspicious set creates a contradiction. Caught by
  the new allowlist-intersection-guard test. Also excludes 'pro'
  (legitimate edition marker).

scanners/dep-auditor.mjs + scanners/supply-chain-recheck.mjs
- New checkTyposquatTokenOverlap() helper — fires AFTER Levenshtein 1/2
  branches, only when:
    1. popular package's tokens ⊂ declared name's tokens (strict superset)
    2. declared name has at least one suspicious suffix
    3. popular package is in topCutoff window
  All three conditions required — conservative by design. Allowlist
  precedence preserved (existing 22 npm + 13 PyPI entries always pass).
  MEDIUM severity, NOT block. New finding title prefix:
  "Possible typosquatting via token-overlap".
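
A rough sketch of conditions 1 and 2. The suspicious-token set is a
three-entry excerpt taken from the examples above, `looksLikeTokenSquat` is
a hypothetical name, and the topCutoff window (condition 3) and allowlist
precedence are not modelled.

```javascript
// Hypothetical excerpt of TYPOSQUAT_SUSPICIOUS_TOKENS.
const SUSPICIOUS = new Set(['utils', 'helper', 'wrapper']);

// Splits on - and _, lowercases, drops empty tokens.
function tokenize(name) {
  return name.toLowerCase().split(/[-_]/).filter(Boolean);
}

// intersection.size / min(|a|, |b|) over token sets.
function tokenOverlap(a, b) {
  const sa = new Set(tokenize(a));
  const sb = new Set(tokenize(b));
  const min = Math.min(sa.size, sb.size);
  if (min === 0) return 0;
  let shared = 0;
  for (const t of sa) if (sb.has(t)) shared++;
  return shared / min;
}

function looksLikeTokenSquat(declared, popular) {
  const d = new Set(tokenize(declared));
  const p = new Set(tokenize(popular));
  // Condition 1: declared tokens are a strict superset of popular tokens.
  const strictSuperset =
    p.size > 0 && d.size > p.size && [...p].every((t) => d.has(t));
  // Condition 2: at least one extra token is a suspicious suffix.
  const extra = [...d].filter((t) => !p.has(t));
  return strictSuperset && extra.some((t) => SUSPICIOUS.has(t));
}
```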

Tests: +21 cases across two new files
- tests/lib/string-utils-tokens.test.mjs (15) — tokenize, tokenOverlap,
  TYPOSQUAT_SUSPICIOUS_TOKENS frozen contract, allowlist-intersection
  guard (caught the tsx conflict on first run)
- tests/scanners/dep-token-overlap.test.mjs (7) — integration via
  in-memory tmpdir fixtures: lodash-utils flagged, react-helper flagged,
  express-wrapper flagged, lodash exact NOT flagged, allowlist tools
  (knip/tsx/nx/rimraf) NOT flagged, react-router-dom (no suspicious
  suffix) NOT flagged, react itself (equal token set, not superset)
  NOT flagged.

Existing dep.test.mjs and supply-chain-recheck.test.mjs unchanged —
all green (149 → 149 regression guard).

Suite: 1570 → 1591 (+21). All green.
2026-04-29 14:10:53 +02:00
Kjell Tore Guttormsen
3cd68dc9fb docs(severity): B3 — document info as scoring-inert (v7.2.0 prep)
Critical-review §2 B3 finding: `riskScore({info: N}) = 0` silently masks
info-volume findings. The behavior was correct (info is scoring-inert by
design) but undocumented. Operators reading a report with N info findings
had no way to know they contribute zero to verdict/band.

Three coordinated edits:
- scanners/lib/severity.mjs JSDoc — explicit "Info severity" subsection
  spelling out: scoring-inert, surfaced in owaspCategorize aggregates,
  treat as observability telemetry not verdict input. @param updated to
  mark info as accepted but ignored.
- CLAUDE.md v7.0.0 risk-score-v2 line — one-sentence anchor pointing to
  severity.mjs JSDoc.
- tests/lib/severity.test.mjs — anchor test alongside the existing
  4-critical=93 anchor: asserts riskScore({info: 50}) === 0,
  riskScore({info: 1000}) === 0, verdict({info: 100}) === 'ALLOW',
  riskBand(riskScore({info: 500})) === 'Low'.

Decision: skip the optional `infoScore()` helper from the brief. No
current consumer would use it; doc-only fix keeps API surface minimal.
Revisit if a consumer emerges.

Tests: 1522 → 1523 (+1 anchor block, 4 assertions). All green.
2026-04-29 13:56:11 +02:00
Kjell Tore Guttormsen
4aa5318bcb fix(llm-security): A2 batch — JSDoc arithmetic + co-monotonicity test + CaMeL tone-down
Closes A2 of v7.1.0 critical-review patch (docs/critical-review-2026-04-20.md):

- B4 (severity JSDoc): 4 critical = 93, not 90. Fixed in scanners/lib/severity.mjs:23
  and CHANGELOG.md v7.0.0 tier description. The actual computation has always been
  93 (70 + log2(5)*10 = 93.22 → round); only the docs were wrong.

- §5.4 co-monotonicity: new sweep test in tests/lib/severity.test.mjs over 15
  representative count vectors. Asserts that (verdict, riskBand) agree under the
  v7.0.0 contract for every case — catches future drift between riskScore tiers,
  verdict cutoffs, and riskBand cutoffs. Includes a B4 anchor test (riskScore
  {critical: 4} === 93) so doc/code drift fails loudly.

- B8 (CaMeL claims toned down): post-session-guard.mjs:646 comment block and
  CLAUDE.md:184 Defense Philosophy bullet now describe the implementation
  honestly — opportunistic byte-matching of truncated output fingerprints
  (first 200 bytes, SHA-256/16-hex), not semantic data-flow tracking.
  Trivially bypassed by mutation, summarisation, or re-encoding. Inspired by
  CaMeL (DeepMind 2025), but not a CaMeL capability-tracking implementation.
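
The B4 arithmetic can be checked with a short sketch of the critical tier
only. The shape 70 + log2(n + 1) * 10 is inferred from the worked value
above (n = 4 gives log2(5)); the cap at 100 and the function name are
assumptions, and lower-severity tiers are not modelled.

```javascript
// Critical tier only: base 70, log-scaled by the number of criticals.
function criticalTierScore(criticalCount) {
  if (criticalCount <= 0) return 0; // other tiers are out of scope here
  const raw = 70 + Math.log2(criticalCount + 1) * 10;
  return Math.min(100, Math.round(raw));
}
```

With this shape, four criticals give round(93.22) = 93, which is the value
the anchor test pins.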

Tests: 1495 → 1511 (+16: 15 sweep cases + 1 B4 anchor). All green.
2026-04-29 11:49:08 +02:00
Kjell Tore Guttormsen
6f86de937a feat(llm-security)!: v7.0.0 commit 6 — tests, docs, version bump
Final commit in the trustworthy-scoring series. Bundles verdict cutoff
alignment, the last suite of tests, and all documentation touch-points
that quote version numbers or describe v7.0.0 behaviour.

Verdict/band co-monotonicity
- `scanners/lib/severity.mjs` — verdict cutoffs moved from 61/21 to 65/15
  so `BLOCK >= 65`, `WARNING >= 15` locks onto the v2 riskBand() boundaries.
  Prevents "BLOCK / Medium band" contradictions under the v2 formula.
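
A tiny sketch of the aligned cutoffs, assuming the verdict is derived from
the numeric score alone. `verdictFromScore` is a hypothetical helper; the
real verdict() takes severity counts.

```javascript
// Cutoffs from this commit: BLOCK >= 65, WARNING >= 15, otherwise ALLOW.
function verdictFromScore(score) {
  if (score >= 65) return 'BLOCK';
  if (score >= 15) return 'WARNING';
  return 'ALLOW';
}
```

Because 65 and 15 sit directly above the riskBand() boundaries at 64 and
14, a score cannot cross a verdict cutoff without also crossing a band.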

Scanner hardening (bug fixes from v7.0.0 testing)
- `scanners/entropy-scanner.mjs` — `policy_source` now uses
  `existsSync('.llm-security/policy.json')` instead of value-based check.
  Old heuristic always reported 'policy.json' because DEFAULT_POLICY now
  carries an `entropy.thresholds` section.
- `scanners/lib/file-discovery.mjs` — `.sass` and GPU shader extensions
  (`.glsl, .frag, .vert, .shader, .wgsl`) added to TEXT_EXTENSIONS. Without
  this, shader files were invisible to file-discovery, so they were never
  counted as skipped by the entropy-scanner extension filter.

Tests
- `tests/scanners/entropy-context.test.mjs` (new, 24 tests) — A. File-ext
  skip (4), B. Line-level rules 11-17 (8), C. Policy overrides (3).
  Fixtures generate 80-char base64 payloads at runtime via
  `crypto.randomBytes` to dodge the plugin's own pre-edit credential hook
  on the test source.
- `tests/lib/severity.test.mjs` — rewritten with v2 scoring table (70
  tests total, was 52).
- `tests/lib/output.test.mjs:243` — "1 critical = score 80" under v2
  (was 25 under v1).
- Full suite: 1485/1485 green (was 1461).

Docs
- `CHANGELOG.md` — v7.0.0 entry with BREAKING CHANGES section.
- `README.md` (plugin + marketplace root) — version badge, history table,
  plugin-card version string, test count.
- `CLAUDE.md` — header version, "v7.0.0 — Trustworthy scoring" summary
  paragraph at the top.
- `docs/security-hardening-guide.md` — new section 6 "Calibration & false
  positives" documenting v2 formula, context-aware entropy scanner,
  typosquat allowlist, and §6.4 tuning workflow. Existing "Recommended
  baseline" section renumbered to §7.

Version bump
- `6.6.0 -> 7.0.0` across package.json, .claude-plugin/plugin.json,
  scanners/ide-extension-scanner.mjs VERSION const, README badge,
  CLAUDE.md header, marketplace root README card.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:26:35 +02:00
Kjell Tore Guttormsen
a9e377570c feat(llm-security): v7.0.0 commit 3 — policy-driven entropy thresholds
Adds entropy section to DEFAULT_POLICY and wires it into entropy-scanner.
Users can now tune false-positive tradeoffs without forking the scanner.

Policy shape (.llm-security/policy.json):
  entropy:
    thresholds.{critical,high,medium}.{entropy,minLen}  — numeric overrides
    suppress_extensions[]                               — additive ext skip
    suppress_line_patterns[]                            — additional regex
    suppress_paths[]                                    — relPath substrings

Wiring: entropy-scanner calls loadPolicy(targetPath) at scan entry (not
orchestrator-passed — avoids signature churn across 10 scanners). Module-
level state is reset per scan invocation. Scanner envelope now includes
calibration.{policy_source, thresholds, files_skipped_by_*} for
synthesizer transparency (Commit 5).

Malformed user regex silently skipped. Missing policy.json → built-in
defaults (backwards-compatible).

entropy.test.mjs: 9/9 still green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:02:52 +02:00
Kjell Tore Guttormsen
d83424a782 feat(llm-security)!: v7.0.0 commit 1 — severity-dominated log-scaled risk score
Replace sum-and-cap formula (every non-trivial scan → 100/Extreme) with
severity-dominated, log-scaled-within-tier model. Discriminates actual
risk: 1 critical = 80, 2 critical = 86, 17 high = 65. Hyperframes-class
rendering codebases no longer collapse to Extreme just from shader noise.

Changes:
- scanners/lib/severity.mjs: new riskScore() v2; keep riskScoreV1() for
  reference; riskBand() cutoffs aligned (14/39/64/84).
- scanners/posture-scanner.mjs: delete inline duplicate formula, import
  riskScore/riskBand/verdict from severity.mjs. Single source of truth.

Breaking: aggregate.risk_score semantics change. Batched with entropy
suppression (Commit 2+) under v7.0.0 bump in Commit 6. Do not release
individually — JSON consumers depend on scoring band stability.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:00:29 +02:00
Kjell Tore Guttormsen
23aaaa6e6c feat(llm-security): honor LLM_SECURITY_IDE_ROOTS for JetBrains discovery
Symmetric with the existing VS Code branch — the env var was only wired
into getVSCodeExtensionRoots(), so the plan's master verification
(`LLM_SECURITY_IDE_ROOTS=... --intellij-only`) reported 0 discovered
plugins. Adding the same fallback to discoverJetBrainsExtensions makes
both families honor the CLI override and closes the gap.
2026-04-18 11:09:02 +02:00
Kjell Tore Guttormsen
378e177000 feat(llm-security): URL-fetch support for JetBrains Marketplace (v6.6.0) 2026-04-18 10:46:13 +02:00
Kjell Tore Guttormsen
23455e5a66 feat(llm-security): add fetchJetBrainsPlugin + URL detection for plugins.jetbrains.com 2026-04-18 10:39:54 +02:00
Kjell Tore Guttormsen
112cb5af45 refactor(llm-security): parameterize buildSandboxedWorker with workerPath 2026-04-18 10:37:10 +02:00
Kjell Tore Guttormsen
03d61d8bca feat(llm-security): implement JetBrains discovery + Android Studio base dir 2026-04-18 10:16:28 +02:00
Kjell Tore Guttormsen
5afb9b1f33 feat(llm-security): implement parseIntelliJPlugin with nested-jar extraction 2026-04-18 10:15:12 +02:00
Kjell Tore Guttormsen
b86239448d feat(llm-security): add zero-dep plugin.xml + MANIFEST.MF parsers 2026-04-18 10:07:14 +02:00
Kjell Tore Guttormsen
a86ca00960 feat(llm-security): seed top-jetbrains-plugins.json + loadJetBrainsBlocklist export
Step 1/17 of ultraplan-2026-04-17-jetbrains-ide-scan.

- Populate top-jetbrains-plugins.json with 56 canonical xmlIds (bundled +
  popular third-party): com.intellij.java, org.jetbrains.kotlin,
  com.jetbrains.python.community, org.rust.lang, com.github.copilot,
  mobi.hsz.idea.gitignore, the legitimate-typo 'Lombook Plugin', etc.
- Add loadJetBrainsBlocklist() export mirroring loadVSCodeBlocklist shape.
  Blocklist is empty by design — no public confirmed-malicious JetBrains
  Marketplace plugins as of 2026-04-17.
- Add tests/scanners/ide-extension-data.test.mjs (9 tests, all pass).
- Fix cache bug in loadTopJetBrains: map normalizeId on cache-hit path too
  (was previously unnormalized on second call).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 09:56:55 +02:00
Kjell Tore Guttormsen
9f893c3858 feat(llm-security): OS sandbox for /security ide-scan <url> (v6.5.0)
VSIX fetch + extract for URL targets now runs in a sub-process wrapped by
sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven
by the v5.1 git-clone sandbox. Defense-in-depth — even if our own
zip-extract.mjs ever has a bypass, the kernel refuses any write outside
the per-scan temp directory.

New files:
- scanners/lib/vsix-fetch-worker.mjs — sub-process worker. Argv: --url
  --tmpdir; emits one JSON line on stdout (ok/sha256/size/source/extRoot
  or ok:false/error/code). Silent on stderr. Exit 0/1.
- scanners/lib/vsix-sandbox.mjs — wrapper. Exports buildSandboxProfile,
  buildBwrapArgs, buildSandboxedWorker, runVsixWorker. 35s timeout, 1 MB
  stdout cap.

Changes:
- scanners/ide-extension-scanner.mjs: fetchAndExtractVsixUrl is now
  sandbox-aware (useSandbox option, default true). In-process logic
  preserved as fallback. New meta.source.sandbox field:
  'sandbox-exec' | 'bwrap' | 'none' | 'in-process'.
- scan(target, { useSandbox }) defaults to true; tests pass false because
  globalThis.fetch mocks do not cross process boundaries.
- Windows fallback: in-process with meta.warnings advisory.

Tests:
- 8 new tests in tests/scanners/vsix-sandbox.test.mjs (per-platform
  profile generation, worker arg construction, live worker exit
  behavior on invalid URLs — no network).
- Existing URL tests updated to opt out of sandbox (useSandbox: false).
- 1344 → 1352 tests, all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 17:28:57 +02:00
Kjell Tore Guttormsen
fe0193956d feat(llm-security): /security ide-scan <url> — Marketplace/OpenVSX/direct VSIX (v6.4.0)
Pre-installation verification of VS Code extensions via URL — fetch a remote
VSIX, extract it in a hardened sandbox, and run the existing IDE scanner
pipeline against it. No npm dependencies.

Sources:
- VS Code Marketplace (publisher.gallery.vsassets.io direct download)
- OpenVSX (open-vsx.org official API)
- Direct .vsix HTTPS URLs

Defenses:
- HTTPS-only, TLS verified, manual redirect with per-source host whitelist
- 30s total timeout via AbortController
- 50MB compressed cap, 500MB uncompressed, 100x expansion ratio
- Zero-dep ZIP extractor: zip-slip, absolute paths, drive letters, NUL bytes,
  symlinks (Unix mode 0xA000), depth limits, ZIP64 rejected, encrypted rejected
- SHA-256 streamed during fetch, surfaced in meta.source
- Temp dir cleanup in all paths (try/finally)
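
The entry-level checks in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in zip-extract.mjs: the function names, the returned reason strings, and the MAX_DEPTH value are hypothetical, and only the 500 MB and 100x caps come from the commit text.

```javascript
// Hypothetical sketch of ZIP entry hardening checks.
const MAX_UNCOMPRESSED = 500 * 1024 * 1024; // 500 MB cap from the commit text
const MAX_RATIO = 100;                      // 100x expansion ratio from the commit text
const MAX_DEPTH = 20;                       // assumed value; the commit does not state it

// Returns a rejection reason, or null when the entry name looks safe.
function checkEntryName(name) {
  if (name.includes("\0")) return "nul-byte";
  if (name.startsWith("/") || name.startsWith("\\")) return "absolute-path";
  if (/^[A-Za-z]:/.test(name)) return "drive-letter";
  const parts = name.split(/[\\/]+/).filter((p) => p !== "" && p !== ".");
  if (parts.includes("..")) return "zip-slip"; // conservative: any ".." segment is rejected
  if (parts.length > MAX_DEPTH) return "too-deep";
  return null;
}

// The upper 16 bits of the external attributes carry the Unix mode
// for entries written on Unix hosts; file type 0xA000 is a symlink.
function isSymlink(externalAttrs) {
  const mode = (externalAttrs >>> 16) & 0xffff;
  return (mode & 0xf000) === 0xa000;
}

// Returns a rejection reason, or null when the declared sizes are within the caps.
function checkSizes(compressedBytes, uncompressedBytes) {
  if (uncompressedBytes > MAX_UNCOMPRESSED) return "too-large";
  if (compressedBytes > 0 && uncompressedBytes / compressedBytes > MAX_RATIO) {
    return "expansion-ratio";
  }
  return null;
}
```

Rejecting every ".." segment, rather than resolving the path and comparing it to the extraction root, is the stricter of the two common approaches and needs no filesystem access.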

Files:
- scanners/lib/vsix-fetch.mjs (HTTPS fetcher, host whitelist, streaming SHA-256)
- scanners/lib/zip-extract.mjs (zero-dep parser with hardening caps)
- knowledge/marketplace-api-notes.md (endpoint reference)
- 3 test files (48 tests added: vsix-fetch, zip-extract, ide-extension-url)

Tests: 1296 → 1344 (all green).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 17:16:26 +02:00
Kjell Tore Guttormsen
6252e55700 feat(llm-security): add /security ide-scan — VS Code / JetBrains extension prescan (v6.3.0)
New standalone scanner (prefix IDE) discovers installed VS Code extensions
across forks (Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH)
and runs 7 IDE-specific threat checks: blocklist match (CRITICAL),
theme-with-code, sideload (unsigned .vsix), dangerous uninstall hook (HIGH),
wildcard activation, extension-pack expansion, typosquat (MEDIUM).

Per-extension reuse of UNI/ENT/NET/TNT/MEM/SCR scanners with bounded
concurrency. Offline-first; --online opt-in. JetBrains discovery stubbed
for v1.1. 22 new tests (1296 total, was 1274).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 16:23:35 +02:00
Kjell Tore Guttormsen
f881cf9251 fix(scanners): preserve single-quoted regions through bash-normalize pipeline
Masks non-empty '...' content before T5/T2-T4 run so literal strings such
as `echo '${IFS}'` are not rewritten. Empty '' pairs are stripped first
so c''u''rl -> curl evasion keeps resolving. ANSI-C $'...' is decoded
before masking.
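
A rough sketch of the ordering described above: strip empty quote pairs, mask non-empty single-quoted regions, run the rewriting passes, then restore. The function name and the NUL-delimited placeholder scheme are assumptions for illustration; ANSI-C $'...' decoding is omitted, and the transform callback stands in for the T2-T5 passes.

```javascript
// Hypothetical sketch of quote-preserving normalization.
function normalizePreservingQuotes(cmd, transform) {
  // 1. Strip empty '' pairs first so c''u''rl still resolves to curl.
  let s = cmd.replace(/''/g, "");

  // 2. Mask non-empty '...' regions with placeholders the transforms will not touch.
  const saved = [];
  s = s.replace(/'[^']+'/g, (match) => {
    saved.push(match);
    return "\u0000" + (saved.length - 1) + "\u0000";
  });

  // 3. Run the rewriting passes on the unquoted remainder.
  s = transform(s);

  // 4. Restore the literal single-quoted content unchanged.
  return s.replace(/\u0000(\d+)\u0000/g, (_, i) => saved[Number(i)]);
}

// Stand-in for an IFS-style rewrite: ${IFS} becomes a space.
const expandIfs = (s) => s.replace(/\$\{IFS\}/g, " ");
```

With this ordering, `cat${IFS}/etc/passwd` is still rewritten, while the literal `'${IFS}'` inside single quotes passes through untouched.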

Caught by the false-positive probe added in Step 3 of ultraplan-v6.2.0.
2026-04-17 14:29:02 +02:00
Kjell Tore Guttormsen
05aaee0fcc feat(scanners): extend bash-normalize with T5 IFS + T6 ANSI-C hex quoting 2026-04-17 13:59:15 +02:00
Kjell Tore Guttormsen
2c33e9cc64 feat(ci): add CI/CD integration — --fail-on, --compact, pipeline templates
Add threshold-based exit codes (--fail-on <severity>) and compact
output mode (--compact) to scan-orchestrator and CLI. Pipeline
templates for GitHub Actions, Azure DevOps, GitLab CI with SARIF
upload. CI/CD guide with Schrems II/NSM compliance documentation.
npm publish preparation (files whitelist, .npmignore). Policy ci
section for distributable CI defaults. Version 6.1.0.
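
As an illustration of threshold-based exit codes, a minimal sketch follows. The severity scale, the finding shape, and the function name are assumptions; the commit only states that --fail-on takes a severity.

```javascript
// Hypothetical sketch: severity names and ordering are assumed, lowest to highest.
const SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"];

// Returns 1 when any finding is at or above the --fail-on severity, else 0.
function exitCodeFor(findings, failOn) {
  if (!failOn) return 0; // no threshold given: never fail the pipeline
  const threshold = SEVERITY_ORDER.indexOf(failOn.toLowerCase());
  if (threshold === -1) throw new Error("unknown severity: " + failOn);
  const hit = findings.some(
    (f) => SEVERITY_ORDER.indexOf(String(f.severity).toLowerCase()) >= threshold
  );
  return hit ? 1 : 0;
}
```

A CI job would pass the returned value to process.exit so the pipeline step fails only when the threshold is met.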

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 14:59:05 +02:00
Kjell Tore Guttormsen
8ec320f40c feat(governance): add policy-as-code — .llm-security/policy.json for distributable hook configuration
New policy-loader.mjs reads .llm-security/policy.json with deep-merge against
defaults that exactly match existing hardcoded values. Integrated into all 7 hooks:
- pre-prompt-inject-scan: injection.mode (env var still takes precedence)
- post-session-guard: trifecta.mode, window_size, long_horizon_window
- pre-edit-secrets: secrets.additional_patterns
- pre-bash-destructive: destructive.additional_blocked
- pre-write-pathguard: pathguard.additional_protected
- pre-install-supply-chain: supply_chain.additional_blocked_packages
- post-mcp-verify: mcp.volume_threshold_bytes, mcp.trusted_servers

Backward compatible: no policy file = identical behavior to v5.1.0.
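
A minimal sketch of deep-merging a policy file over defaults, with an environment override taking precedence. The default values, the deepMerge and resolveInjectionMode names, and the EXAMPLE_INJECTION_MODE variable are hypothetical; only the key names (injection.mode, trifecta.window_size, secrets.additional_patterns) come from the list above.

```javascript
// Hypothetical defaults: the values are illustrative, not the project's real ones.
const DEFAULT_POLICY = {
  injection: { mode: "warn" },
  trifecta: { mode: "warn", window_size: 20 },
  secrets: { additional_patterns: [] },
};

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Recursively overlays overrides on defaults; arrays and scalars replace wholesale.
function deepMerge(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides ?? {})) {
    // Skip keys that could pollute the prototype chain.
    if (key === "__proto__" || key === "constructor" || key === "prototype") continue;
    out[key] =
      isPlainObject(value) && isPlainObject(defaults[key])
        ? deepMerge(defaults[key], value)
        : value;
  }
  return out;
}

// EXAMPLE_INJECTION_MODE is a placeholder name for the env var that wins over the file.
function resolveInjectionMode(policy, env) {
  return env.EXAMPLE_INJECTION_MODE ?? policy.injection.mode;
}
```

Because a missing policy file merges as an empty override, the result equals the defaults, which is what keeps behavior identical for existing users.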

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 13:37:02 +02:00
Kjell Tore Guttormsen
0439e0f650 feat(scanner): add AI-BOM generator — CycloneDX 1.6 format for AI supply chain transparency
New bom-builder.mjs discovers AI components (models, MCP servers, plugins,
knowledge files, hooks) and builds CycloneDX 1.6 JSON BOMs.
CLI entry point: node scanners/ai-bom-generator.mjs <target>.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 13:29:30 +02:00
Kjell Tore Guttormsen
269c14445c feat(governance): add structured JSONL audit trail with SIEM-ready schema
New audit-trail.mjs writes structured events to LLM_SECURITY_AUDIT_LOG path.
Integrated into post-session-guard at 6 warning emission points: trifecta,
escalation-after-input, data flow, volume threshold, slow-burn, behavioral drift.
No-op when env var not set — zero overhead for existing users.
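
The no-op behavior could look roughly like this. It is a sketch under assumptions: the event fields (ts, type) and the makeAuditLogger name are hypothetical, and the writer is injected so the example stays free of file-system side effects.

```javascript
// Hypothetical sketch of an env-gated JSONL audit logger.
// appendLine(path, text) is injected; in Node it could be fs.appendFileSync.
function makeAuditLogger(env, appendLine) {
  const path = env.LLM_SECURITY_AUDIT_LOG;
  if (!path) return () => {}; // env var unset: logging is a no-op

  return (type, details) => {
    // One JSON object per line so SIEM tooling can ingest the file as JSONL.
    const event = { ts: new Date().toISOString(), type, ...details };
    appendLine(path, JSON.stringify(event) + "\n");
  };
}
```

Resolving the path once at construction means each emission point pays only for a function call when auditing is disabled.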

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 13:25:59 +02:00
Kjell Tore Guttormsen
2116e702df feat(scanner): add SARIF 2.1.0 output format to scan-orchestrator (--format sarif)
New sarif-formatter.mjs converts scan envelope to OASIS SARIF 2.1.0 standard.
Maps severity to SARIF levels, findings to results with locations and rules.
scan-orchestrator accepts --format sarif|json (default: json).
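
A minimal sketch of the severity-to-level mapping and result shape. The finding fields (id, title, severity, file, line), the severity names, and the specific mapping are assumptions; the SARIF property names and the level values (error, warning, note, none) follow the 2.1.0 format.

```javascript
// Hypothetical mapping: the project's actual severity-to-level table may differ.
const LEVEL_BY_SEVERITY = {
  critical: "error",
  high: "error",
  medium: "warning",
  low: "note",
  info: "note",
};

// Converts a list of findings into a single-run SARIF 2.1.0 log object.
function toSarif(findings, toolName) {
  const rules = new Map(); // one rule descriptor per distinct finding id
  const results = findings.map((f) => {
    if (!rules.has(f.id)) {
      rules.set(f.id, { id: f.id, shortDescription: { text: f.title } });
    }
    return {
      ruleId: f.id,
      level: LEVEL_BY_SEVERITY[f.severity] ?? "none",
      message: { text: f.title },
      locations: [
        {
          physicalLocation: {
            artifactLocation: { uri: f.file },
            region: { startLine: f.line },
          },
        },
      ],
    };
  });
  return {
    version: "2.1.0",
    runs: [{ tool: { driver: { name: toolName, rules: [...rules.values()] } }, results }],
  };
}
```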

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 13:22:59 +02:00
Kjell Tore Guttormsen
708c898754 feat(llm-security): sandboxed remote cloning v5.1.0
Harden git clone attack surface for remote scans with defense-in-depth:

Layer 1 (all platforms): 8 git config flags disable hooks, symlinks,
filter/smudge drivers, fsmonitor, local file protocol. 4 env vars
isolate from system/user git config and block interactive prompts.

Layer 2 (OS sandbox): macOS sandbox-exec and Linux bubblewrap (bwrap)
restrict file writes to only the specific temp directory. bwrap
availability is probe-tested before use. Falls back gracefully to git
config hardening alone on Windows and Ubuntu 24.04+.
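
To make Layer 1 concrete, here is a sketch of building a hardened clone invocation. The commit does not list its 8 flags or 4 env vars, so the subset below is an assumption chosen from standard git options; buildHardenedClone is a hypothetical name.

```javascript
// Hypothetical sketch: an illustrative subset, not the project's actual flag list.
function buildHardenedClone(url, destDir) {
  const config = [
    "core.hooksPath=/dev/null",  // no repository hooks are run
    "core.symlinks=false",       // symlinks are checked out as plain files
    "core.fsmonitor=false",      // no fsmonitor helper is spawned
    "protocol.file.allow=never", // local file protocol is refused
  ];
  const args = [];
  for (const kv of config) args.push("-c", kv);
  // "--" ends option parsing so a URL starting with "-" cannot act as a flag.
  args.push("clone", "--", url, destDir);

  const env = {
    GIT_CONFIG_NOSYSTEM: "1",       // ignore system-wide git config
    GIT_CONFIG_GLOBAL: "/dev/null", // ignore the user's global git config
    GIT_TERMINAL_PROMPT: "0",       // never block on an interactive prompt
  };
  return { command: "git", args, env };
}
```

The returned object could be handed to child_process.spawn(command, args, { env }), with the OS sandbox from Layer 2 wrapped around that call where available.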

Additional: post-clone 100MB size check, UUID-unique evidence filenames,
evidence file cleanup, cleanup guarantee in scan/plugin-audit commands.

32 new tests (1147 total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 17:08:32 +02:00
Kjell Tore Guttormsen
f93d6abdae feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00