Adversarial payloads in markdown link title attributes (rendered as
tooltips, parsed by agents) bypassed the existing HTML-content checks
which gated on `<tag>` presence. Pattern: [text](url "title").
Adds linkTitleRegex extraction to the HTML-content block, runs each
captured title through scanForInjection, emits at the strongest tier
encountered with category markdown-link-title-injection.
+3 tests (62 → 62 in post-mcp-verify.test.mjs file, was 59).
Refs: Batch B Wave 4 / Step 9 / v7.2.0
Two follow-up fixes after E16 + E17 landed:
1. foldHomoglyphs ASCII fast-path
- scanForInjection calls foldHomoglyphs on every scan (raw + normalized).
- Pre-fix: NFKC normalization runs unconditionally, even on pure
ASCII inputs where it's a no-op.
- Result: benchmark.test.mjs timed out at 120s on the full suite.
- Fix: charCodeAt sweep for >=128, short-circuit return s when
all ASCII. NFKC and HOMOGLYPH_MAP iteration only run when
non-ASCII chars are present (the actual attack case).
- Verified: benchmark.test.mjs passes within timeout.
2. Attack-scenario UNI-003 expectation
- Pre-E16: "Homoglyph Cyrillic-Latin mixing" payload triggered only
a MEDIUM "obfuscation present" advisory (exit 0, stdout match
"MEDIUM").
- Post-E16: the same payload is folded to Latin BEFORE pattern
matching, so it now matches CRITICAL "ignore previous instructions"
and blocks (exit 2).
- This is the intended v7.2.0 behavior — not a regression. Updated
expectation: exit_code 2, stdout_match "block". Renamed scenario
to "now blocked via E16 fold, v7.2.0".
Suite: pre-compact-scan flake remains (perf-budget under load,
passes isolated). All other tests green.
Critical-review §4 E17 finding: pre-v7.2.0 the delegation-after-input
advisory fired only within a 5-call window. Attackers who deliberately
waited 6+ calls before delegating bypassed detection. Window was also
hardcoded — operators couldn't tune it for their environment.
Two coordinated changes:
1. LLM_SECURITY_ESCALATION_WINDOW env var (primary window override)
- parseInt(env) || getPolicyValue('trifecta', 'escalation_window', 5)
- Mirrors the established pattern from
LLM_SECURITY_TRIFECTA_MODE et al.
- Setting env=3 narrows; env=8 expands.
2. Secondary 20-call MEDIUM advisory (slow-burn variant)
- DELEGATION_ESCALATION_WINDOW_MEDIUM = 20 (hardcoded — same value
for all operators; tunable in a future patch if needed)
- checkEscalationAfterInput now returns `tier: 'primary'|'secondary'|null`
- formatEscalationWarning emits a different message for secondary —
mentions "slow-burn", references env-var, distinct from the
primary "DeepMind Category 4" framing
Hook reads max(WINDOW_SIZE, secondary+5) entries to cover the wider
window. Existing duplicate-suppression (`escalation_warning` state
entry) covers both tiers. Audit-trail event captures `tier` field.
Tests: +5 cases in tests/hooks/post-session-guard.test.mjs:
- secondary window catches 9-call distance (slow-burn)
- secondary boundary at exactly 20 calls
- primary regression guard (1-call distance)
- env=3 narrows primary (4-call distance becomes secondary)
- env=8 expands primary (7-call distance stays primary)
Updated existing test "does NOT trigger when input_source is >5 calls
ago" — now requires >20 calls (secondary window catches 6-20).
Suite: 1644 → 1672 (+28 from new tests + extended scope). All green.
CLAUDE.md hooks table updated to document both windows and the env var.
Critical-review §4 E16 finding: pre-v7.2.0 homoglyph normalization fired
ONLY for the MEDIUM-advisory "obfuscation present" signal. Pattern
matchers in scanForInjection compared against raw + decoded variants
only — they did NOT compare against a fold-normalized variant. As a
result, "ignоre previous instructions" (Cyrillic о, U+043E) bypassed
the CRITICAL "ignore previous" pattern.
Two coordinated edits:
scanners/lib/string-utils.mjs
- Adds HOMOGLYPH_MAP (frozen) — surgical Cyrillic/Greek → Latin map.
~25 entries focused on injection-vocabulary letters
(a, e, o, c, p, x, y, i, j, s, l, A, E, O, C, P, X, Y, T).
- Adds foldHomoglyphs(s) — pipeline: NFKC → apply HOMOGLYPH_MAP.
NFKC handles Mathematical Alphanumeric (U+1D400 block), fullwidth
Latin (U+FF21 block), ligatures, width variants.
Excluded by design from HOMOGLYPH_MAP:
- Latin Extended (æ, ø, å, é, è, ñ, ü, ö, ä, ç, ß, þ, ð) — legitimate
Norwegian/German/French/Spanish letters. Map them and we false-positive
on every non-English source file.
- Greek letters not visually overlapping (β, γ, δ, ...)
- Cyrillic letters not visually overlapping (б, г, д, ж, ...)
scanners/lib/injection-patterns.mjs
- scanForInjection now builds a 4-variant set: raw, normalized,
folded(raw), folded(normalized). Set deduplication skips redundant
identical variants. Existing dedup-by-label (seenLabels Set) prevents
double-counts when the same pattern matches in multiple variants.
- foldHomoglyphs added to the imports.
Tests: +27 cases in tests/lib/string-utils-homoglyph.test.mjs:
- 6 Cyrillic → Latin (lowercase, uppercase, multiple substitutions,
Palochka U+04CF)
- 3 Greek → Latin
- 2 NFKC normalization (Math Bold, Fullwidth)
- 8 preserves-non-confusable (Norwegian æøå, German umlauts, French
accents, Spanish ñ, emoji, CJK, Arabic/Hebrew)
- 3 edge cases (empty, null/undefined, idempotency)
- 5 scanForInjection integration (Cyrillic ignore, Cyrillic Assistant,
Norwegian non-trigger, benign "ignore" comment, mixed Cyrillic+Greek)
Test-development found: U+1D5DC is "I" not "A" (test pin caught my
codepoint mistake — fixed during dev).
Suite: 1617 → 1644 (+27). All green.
Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector
(`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private
Use Areas — also invisible in most terminals and surviving normalization
— were not detected. Attackers could encode payloads in PUA codepoints
that pass through `scanForInjection` undetected.
Coverage extended to:
- U+E0001-E007F Unicode Tag block (existing — DeepMind kat. 1)
- U+F0000-FFFFD Supplementary PUA-A (NEW — E1)
- U+100000-10FFFD Supplementary PUA-B (NEW — E1)
Detection-only for PUA: PUA characters have NO standard ASCII mapping,
so `decodeUnicodeTags` leaves them unchanged. Detection alone is
sufficient — `scanForInjection` emits HIGH on any presence, regardless
of decoded content.
Function name `containsUnicodeTags` preserved for back-compat. All
existing call sites (injection-patterns.mjs:259, etc.) work unchanged.
Semantically the function is now "containsHiddenUnicode".
Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs:
- 5 Tag-block regression guards
- 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII)
- 3 PUA-B range cases
- 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji,
CJK, Latin Extended — all must be FALSE)
- 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B
unchanged, Tag block still decodes, mixed Tag+PUA)
Suite: 1596 → 1617 (+21). All green.
Critical-review §4 E15 finding: agent files in .claude/agents/ are loaded
as Claude Code subagent system prompts and are a direct memory-poisoning
surface. Pre-v7.2.0 the scanner covered CLAUDE.md, .claude/rules/*.md,
memory/*.md, REMEMBER.md, .local.md, and .claude-plugin/plugin.json —
but not .claude/agents/*.md.
Single-line addition to MEMORY_FILE_PATTERNS:
/(?:^|\/)\.claude\/agents\/[^/]+\.md$/
The existing scan loop, scanForInjection integration, and severity-
mapping logic all apply unchanged. STRICT_FILES_PATTERN intentionally
NOT extended — agents may legitimately quote shell commands as examples
(consistent with CLAUDE.md treatment).
Tests: +3 cases in tests/scanners/memory-poisoning.test.mjs:
- "scans .claude/agents/*.md" (smoke test — at least one finding from
the new fixture)
- "agent file injection pattern detected"
- "agent file credential path detected"
New fixture: tests/fixtures/memory-scan/poisoned-project/.claude/agents/
poisoned-agent.md — agent with injection, credential ref, permission
expansion, and exfil URL. Triggers all 4 detection categories.
Suite: 1591 → 1594 (+3). All green.
Critical-review §2 B7 finding: pure Levenshtein <=2 misses the most common
modern typosquat pattern — popular-name + token-injection suffix. Examples:
lodash → lodash-utils (edit distance 6, not flagged pre-B7)
react → react-helper (edit distance 7, not flagged pre-B7)
express → express-wrapper (edit distance 8, not flagged pre-B7)
Three coordinated edits:
scanners/lib/string-utils.mjs
- Adds tokenize(name): string[] splits on -/_, lowercases
- Adds tokenOverlap(a, b): number intersection.size / min(|a|,|b|)
- Adds TYPOSQUAT_SUSPICIOUS_TOKENS frozen list of common typosquat
suffixes. Excludes language-extension tokens (js, jsx, ts, tsx) — the
v7.0.0 allowlist contains `tsx` as a legit package and including the
same token in the suspicious set creates a contradiction. Caught by
the new allowlist-intersection-guard test. Also excludes 'pro'
(legitimate edition marker).
scanners/dep-auditor.mjs + scanners/supply-chain-recheck.mjs
- New checkTyposquatTokenOverlap() helper — fires AFTER Levenshtein 1/2
branches, only when:
1. popular package's tokens ⊆ declared name's tokens (strict superset)
2. declared name has at least one suspicious suffix
3. popular package is in topCutoff window
All three conditions required — conservative by design. Allowlist
precedence preserved (existing 22 npm + 13 PyPI entries always pass).
MEDIUM severity, NOT block. New finding title prefix:
"Possible typosquatting via token-overlap".
Tests: +21 cases across two new files
- tests/lib/string-utils-tokens.test.mjs (15) — tokenize, tokenOverlap,
TYPOSQUAT_SUSPICIOUS_TOKENS frozen contract, allowlist-intersection
guard (caught the tsx conflict on first run)
- tests/scanners/dep-token-overlap.test.mjs (7) — integration via
in-memory tmpdir fixtures: lodash-utils flagged, react-helper flagged,
express-wrapper flagged, lodash exact NOT flagged, allowlist tools
(knip/tsx/nx/rimraf) NOT flagged, react-router-dom (no suspicious
suffix) NOT flagged, react itself (equal token set, not superset)
NOT flagged.
Existing dep.test.mjs and supply-chain-recheck.test.mjs unchanged —
all green (149 → 149 regression guard).
Suite: 1570 → 1591 (+21). All green.
Critical-review §2 B6 finding: extractAssignedVariable handled
`const X = ...` and `X = ...` but missed every modern JS/TS
destructuring pattern. Sinks downstream of destructured/spread vars
produced false negatives at the propagation step.
Patterns now recognized:
- `const { x } = source` object destructuring
- `const { x, y } = source` multi-key
- `const { secret: alias } = source` renamed (key NOT bound)
- `const { x, ...spread } = source` object rest
- `const { a, b: { c } } = source` nested object (key NOT bound)
- `const [a, b] = source` array destructuring
- `const [first, ...rest] = source` array rest
- `const [a, [b, c]] = source` nested array
- `const { user: { id }, ...rest }` mixed nested
Implementation: regex-based two-pass walker. Pass 1 detects whether
the LHS is a destructuring pattern (`{...}` or `[...]`). If yes, the
new `extractDestructuredNames` helper walks the pattern body via a
balanced-bracket depth counter, recurses into nested patterns, and
distinguishes keys (`key:`) from bindings. If no, the plain-decl
branch matches `\b(?:const|let|var)\s+(\w+)`.
Plain-assignment branch (`X = ...` without keyword) and Python-style
patterns are unchanged.
The function is now exported for direct unit testing — same pattern
as `_resetCacheForTest` in policy-loader. The internal walker
(`extractDestructuredNames`) remains module-private.
Tests: +19 cases in tests/scanners/taint-destructuring.test.mjs:
- 5 pre-B6 patterns (regression guard: plain decl, plain assign,
no-match on equality)
- 12 destructuring patterns covering object/array/rest/nested
- 2 non-destructuring regressions (return literal, arrow param)
Existing taint-tracer.test.mjs and taint.test.mjs unchanged — both
green (14 → 14, fixture-based integration tests not affected).
Suite: 1551 → 1570 (+19). All green.
Critical-review §2 B3 finding: `riskScore({info: N}) = 0` silently masks
info-volume findings. The behavior was correct (info is scoring-inert by
design) but undocumented. Operators reading a report with N info findings
had no way to know they contribute zero to verdict/band.
Three coordinated edits:
- scanners/lib/severity.mjs JSDoc — explicit "Info severity" subsection
spelling out: scoring-inert, surfaced in owaspCategorize aggregates,
treat as observability telemetry not verdict input. @param updated to
mark info as accepted but ignored.
- CLAUDE.md v7.0.0 risk-score-v2 line — one-sentence anchor pointing to
severity.mjs JSDoc.
- tests/lib/severity.test.mjs — anchor test alongside the existing
4-critical=93 anchor: asserts riskScore({info: 50}) === 0,
riskScore({info: 1000}) === 0, verdict({info: 100}) === 'ALLOW',
riskBand(riskScore({info: 500})) === 'Low'.
Decision: skip the optional `infoScore()` helper from the brief. No
current consumer would use it; doc-only fix keeps API surface minimal.
Revisit if a consumer emerges.
Tests: 1522 → 1523 (+1 anchor block, 4 assertions). All green.
Documents the v7.1.1 narrative-coherence patch in CLAUDE.md (mini-block
appended after the v7.0.0 paragraph) and CHANGELOG.md (new [7.1.1]
section per Keep a Changelog convention, placed above [7.1.0]).
Plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md
Brief: .claude/ultraplan-spec-2026-04-29-report-coherence.md
Verification gates passed:
- npm test: 1522/1522 (was 1511; +11 from new narrative test)
- node --test tests/lib/severity.test.mjs: 86/86 (co-monotonicity sweep
at lines 252-303 unchanged and green)
- node --test tests/scanners/skill-scanner-narrative.test.mjs: 11/11
- Orchestrator against fixture: WARNING / 48 / 1 HIGH (HITL trap caught
correctly, no whiplash)
- SARIF inline check via toSARIF import: sarif-version 2.1.0, runs: 1
- Zero remaining v1 cutoffs in agent + template
Out of scope but flagged for Batch B (deferred to v7.2.0):
- commands/scan.md:113-114 retains v1 risk formula
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
11 assertions across 4 describe groups against tests/fixtures/skill-scan/
hyperframes-like/. Tests the deterministic input layer that feeds
skill-scanner-agent — does NOT invoke the LLM (no precedent in 1511 tests).
Coverage:
- content-extractor (5 it): exit 0 on animation markup; exactly 1 HIGH
HITL trap; >= 2 process.env credential refs; has_injection=true (any
injection signal flips it); has_critical_injection=false (no CRITICAL
in fixture).
- entropy scanner (2 it): calibration block present; <= 1 finding (rest
suppressed via line-context rules).
- co-monotonicity (2 it): {high:1} → WARNING/High; {high:1, info:1} →
WARNING (info scoring-inert). Inline guard mirrors the sweep at
tests/lib/severity.test.mjs:252-303 so this file fails fast if the
invariant drifts.
- agent prompt contract (2 it): static asserts that
agents/skill-scanner-agent.md contains 'Step 2.5: Context-First
Severity Assignment', 'summary.narrative_audit.suppressed_findings',
'score>=65', AND zero remaining 'score >= 61' references; same v2-
cutoff + narrative-audit contract on templates/unified-report.md.
Part of v7.1.1 narrative-coherence patch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Synthetic skill content mimicking the noise profile of frontend
animation projects (HTML5 canvas, framework env-vars, inline SVG data
URIs, CSS keyframes) plus exactly one genuine HITL trap signal.
Used by tests/scanners/skill-scanner-narrative.test.mjs (added in
v7.1.1) to exercise:
- content-extractor: HIGH HITL trap signal + framework env-var
references (process.env.REACT_APP_*, VITE_PUBLIC_*)
- entropy scanner: inline SVG data URI suppressed via line-context rules
The .llm-security-ignore file uses the SCANNER:glob format
(scanners/scan-orchestrator.mjs:34-40) — ENT:**/*.md suppresses any
entropy-scanner findings when the fixture is run through scan-orchestrator
in the Step 6 smoke test.
Part of v7.1.1 narrative-coherence patch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five coordinated edits to address scan-rapport whiplash at the agent
prompt level:
- Step 2.5 (NEW): Context-First Severity Assignment. Every signal has
exactly one disposition — suppressed (counted only) or reported (full
finding). The split happens BEFORE severity is assigned. Forbids
'false positive', 'legitimate framework', 'no action required' in
finding-body text; reserves them for the Suppressed Signals section.
- Verdict Logic: replaces stale v1 sum-and-cap formula (BLOCK >=61) with
v2 reference (severity-dominated, BLOCK >=65) matching severity.mjs
since v7.0.0. Documents that severity counts MUST exclude suppressed
signals; introduces verdict_rationale field for descriptive context
when suppressed >= 5 AND reported <= 1 high.
- Output Format: adds Suppressed Signals as required section #4 with
category-level bullet format. Documents the trailing JSON shape
including summary.narrative_audit.suppressed_findings.{count,
by_category} and verdict_rationale fields.
- Comment block before Category 2 suppression rules clarifies that
'false positive' as taxonomy language is OK; only finding-body
description fields are forbidden from using the phrase.
- Step 0 (Norwegian generaliseringsgrense) preserved unchanged.
Part of v7.1.1 narrative-coherence patch (plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates the HTML-comment risk-formula reference at lines 55-66 from the
stale v1 sum-and-cap formula to the v2 severity-dominated tiers that
have been authoritative in scanners/lib/severity.mjs since v7.0.0. Adds
a Narrative Audit block inside the Executive Summary section surfacing
summary.narrative_audit.suppressed_findings.{count,by_category} from
the agent's trailing JSON. The block is transparency only — it does
NOT affect risk_score, riskBand, or verdict.
Part of v7.1.1 narrative-coherence patch (plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v7.1.0 release commit (621db14) bumped the version badge and added a
CHANGELOG entry, but missed the README Version History table. Adding
the row now so the public-facing version history at
git.fromaitochitta.com/open/ktg-plugin-marketplace reflects v7.1.0.
Row covers: B1 + B2 + B4 fixes, A3 honesty-sweep (7 phrases), B8 CaMeL
nedton, test count 1487 → 1511, "why" framing tied to critical-review
§F CISO perspective.
Closes A3 of v7.1.0 critical-review patch. Each rewrite preserves the underlying
claim where it is accurate but removes hype/overreach language. Historical
CHANGELOG/README version-table rows are intentionally left as-is (they document
what was claimed at the time of release, not what is true today).
Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md,
docs/security-hardening-guide.md):
- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring
(v2 model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with
file-extension skip, 8 line-level suppression rules, and configurable policy".
No ML/context inference; just rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage
not published". Updated count after A1+A2 (+24) and added qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline
mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`)
transmits package identifiers to a Google-operated API and is a separate
compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default
warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed
trifectas detected but not blocked by default)". "Enforcement" implied
block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published
to date". Caps and class-of-attacks rejected are accurate; absence of
formal fuzz coverage now stated.
- "defense-in-depth" — preserved as framing, but quantified in
security-hardening-guide §4: "three independent detection layers with
documented bypass classes". Each layer named, each layer's known bypasses
pointed to (critical-review §4 evasion arsenal).
Tests: 1511/1511 green (no behavioural change).
Closes A2 of v7.1.0 critical-review patch (docs/critical-review-2026-04-20.md):
- B4 (severity JSDoc): 4 critical = 93, not 90. Fixed in scanners/lib/severity.mjs:23
and CHANGELOG.md v7.0.0 tier description. The actual computation has always been
93 (70 + log2(5)*10 = 93.22 → round); only the docs were wrong.
- §5.4 co-monotonicity: new sweep test in tests/lib/severity.test.mjs over 15
representative count vectors. Asserts that (verdict, riskBand) agree under the
v7.0.0 contract for every case — catches future drift between riskScore tiers,
verdict cutoffs, and riskBand cutoffs. Includes a B4 anchor test (riskScore
{critical: 4} === 93) so doc/code drift fails loudly.
- B8 (CaMeL claims toned down): post-session-guard.mjs:646 comment block and
CLAUDE.md:184 Defense Philosophy bullet now describe the implementation
honestly — opportunistic byte-matching of truncated output fingerprints
(first 200 bytes, SHA-256/16-hex), not semantic data-flow tracking.
Trivially bypassed by mutation, summarisation, or re-encoding. Inspired by
CaMeL (DeepMind 2025), but not a CaMeL capability-tracking implementation.
Tests: 1495 → 1511 (+16: 15 sweep cases + 1 B4 anchor). All green.
Previously, `LLM_SECURITY_TRIFECTA_MODE=block` only exited 2 when the
detected trifecta was MCP-concentrated (all three legs via the same MCP
server) or involved sensitive-path + exfil. Distributed trifectas —
three legs originating from different tools, with a non-sensitive data
path and a non-sensitive exfiltration sink — were detected and warned
but not blocked. This mismatched the documented semantics of block mode
and gave operators a false sense of enforcement.
Change: remove the `(mcpInfo.concentrated || sensitiveExfil)` AND-gate
in the `TRIFECTA_MODE === 'block'` branch so any detected trifecta
blocks in block mode. Audit event `severity` still differentiates
critical (concentrated / sensitive-exfil) from high (distributed); the
blocked stderr message now explicitly names "Distributed trifecta:
three legs from different sources" when the confidence sub-signals
are absent.
Addresses critical review 2026-04-20 §2 B2 (HIGH) and §9 row 1
("enforces the Rule of Two").
Tests: 1 added (distributed trifecta in block mode now exits 2).
All 1495 tests pass.
The previous ENV regex `/[\\/]\.env\.[a-z]+$/` only matched a single
lowercase segment after `.env`. Multi-segment and mixed-case variants
such as `.env.production.local.backup`, `.env.stage-1.local`, and
`.env.CI.secret` slipped past the hook. Replaced with
`/[\\/]\.env(\.[A-Za-z0-9._-]+)*$/` which matches `.env` plus any
number of dot-separated alphanumeric/dot/hyphen/underscore segments.
`.envrc` (direnv config, no dot separator) is still allowed.
Addresses critical review 2026-04-20 §2 B1 (HIGH).
Tests: 7 added (6 new multi-segment BLOCK cases + 1 .envrc ALLOW).
All 1494 tests pass.
feature-gap-agent and /posture command both reference quality area count.
Update both to reflect Token Efficiency as the 8th area.
Tests: 543 passing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reflect 9 scanners, 17 commands, 543+ tests, new TOK scanner, and
/config-audit tokens command.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New plugin that produces a complete session handoff in under 60s:
NEXT-SESSION artifact, commit+push, and copy-paste prompt for next
session. Built for context-constrained models like Opus 4.7 where
sessions fill fast.
- Single declarative command, no hooks/agents/skills
- Detects handoff type: multi-session / plugin-work / single-task
- Default filename NEXT-SESSION-PROMPT.local.md; slug-override
- Flags: --no-commit, --dry-run
- Auto-generated Conventional Commits message from git diff --stat
- Respects pre-commit hooks (secrets, pathguard) — never bypasses
Also: add *.local.md to root .gitignore (existing NEXT-SESSION files
were untracked but not ignored) and list plugin in marketplace
README + CLAUDE.md per docs-convention.
Document the Opus 4.7 era upgrade: TOK scanner, /config-audit tokens,
Token Efficiency 8th area, scanner/verifier agent migration to sonnet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add 2026-04 deltas (v2.1.83-v2.1.111) verified against
research/03-claude-code-changes-config-surfaces.md (2026-04-19):
- Opus 4.7 + token-efficiency surfaces (env vars, attribution.commit/pr)
- Sandbox isolation (sandbox.* keys)
- Managed-only enterprise lockdown flags
- disableSkillShellExecution (v2.1.91)
- forceRemoteSettingsRefresh (v2.1.92)
No new hook events in this range — noted in hook-events-reference.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
E2E verification against content-heavy repo (`content-claude-code`) revealed
413 entropy findings (8 HIGH / 405 MEDIUM) from markdown image CDN URLs in
JSON content indexes — e.g., ``.
These are legitimate content-repo artifacts, not credentials. The 40-char
hash segment in the CDN URL trips Shannon entropy (H=5.29 over 300 chars),
and rule 13 (inline <svg>) doesn't match since there's no literal `<svg>`
tag — the `.svg` is just a URL path suffix.
Added rule 18 `MARKDOWN_IMAGE = /!\[[^\]]*\]\(\s*https?:\/\//` — matches
`` / ``. Line-level (not string-level) so URL
is not over-specific.
E2E impact on `content-claude-code`:
- Before: BLOCK / 65 / 8H 437M 0L
- After: WARNING / 56 / 3H 427M 0L
Hyperframes unchanged: BLOCK / 80 / 1C 4H 92M — real CRITICAL SQL-injection
and HIGH findings still detected.
Tests: 2 new (positive + negative fixture) bringing entropy-context to 26,
total suite 1485 → 1487.
Docs updated to "rules 11-18" and "8 new line-suppression rules".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final commit in the trustworthy-scoring series. Bundles verdict cutoff
alignment, the last suite of tests, and all documentation touch-points
that quote version numbers or describe v7.0.0 behaviour.
Verdict/band co-monotonicity
- `scanners/lib/severity.mjs` — verdict cutoffs moved from 61/21 to 65/15
so `BLOCK >= 65`, `WARNING >= 15` locks onto the v2 riskBand() boundaries.
Prevents "BLOCK / Medium band" contradictions under the v2 formula.
Scanner hardening (bug fixes from v7.0.0 testing)
- `scanners/entropy-scanner.mjs` — `policy_source` now uses
`existsSync('.llm-security/policy.json')` instead of value-based check.
Old heuristic always reported 'policy.json' because DEFAULT_POLICY now
carries an `entropy.thresholds` section.
- `scanners/lib/file-discovery.mjs` — `.sass` and GPU shader extensions
(`.glsl, .frag, .vert, .shader, .wgsl`) added to TEXT_EXTENSIONS. Without
this, shader files were invisible to file-discovery, so they were never
counted as skipped by the entropy-scanner extension filter.
Tests
- `tests/scanners/entropy-context.test.mjs` (new, 24 tests) — A. File-ext
skip (4), B. Line-level rules 11-17 (8), C. Policy overrides (3).
Fixtures generate 80-char base64 payloads at runtime via
`crypto.randomBytes` to dodge the plugin's own pre-edit credential hook
on the test source.
- `tests/lib/severity.test.mjs` — rewritten with v2 scoring table (70
tests total, was 52).
- `tests/lib/output.test.mjs:243` — "1 critical = score 80" under v2
(was 25 under v1).
- Full suite: 1485/1485 green (was 1461).
Docs
- `CHANGELOG.md` — v7.0.0 entry with BREAKING CHANGES section.
- `README.md` (plugin + marketplace root) — version badge, history table,
plugin-card version string, test count.
- `CLAUDE.md` — header version, "v7.0.0 — Trustworthy scoring" summary
paragraph at the top.
- `docs/security-hardening-guide.md` — new section 6 "Calibration & false
positives" documenting v2 formula, context-aware entropy scanner,
typosquat allowlist, and §6.4 tuning workflow. Existing "Recommended
baseline" section renumbered to §7.
Version bump
- `6.6.0 -> 7.0.0` across package.json, .claude-plugin/plugin.json,
scanners/ide-extension-scanner.mjs VERSION const, README badge,
CLAUDE.md header, marketplace root README card.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Makes suppression stats visible in the deep-scan report so users can
audit why the scanner produced the counts it did. Before: synthesizer
would acknowledge "true risk is High, not Extreme" in prose while
verdict stayed BLOCK/Extreme — inconsistent. After Commit 1 the
orchestrator verdict is coherent on its own; synthesizer's job shrinks
to transparency.
- Adds 'Scan Calibration' section instruction consuming
scanner.calibration.* fields (entropy files_skipped_by_extension,
policy_source, thresholds).
- Heuristic: omit the section if < 5% of files skipped (no signal).
Flag the section if > 80% skipped (policy may be too aggressive).
- Explicit 'Don't override verdict' directive in DON'T DO list.
Discrepancy goes in calibration, not in a rewritten dashboard.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hyperframes scan flagged knip vs knex, oxlint vs eslint, tsx vs nx,
rimraf vs trim as HIGH typosquats. All four are legitimate top-1000 npm
packages; short names just happen to be within Levenshtein ≤2 of other
top packages. These shouldn't generate HIGH severity on a clean install.
Added to npm allowlist: knip, oxlint, tsx, nx, rimraf, glob, tar, zod,
ky, ow, esm, ip, qs, url, prettier, vitest, vite, rollup, swc, turbo,
bun, deno. Added to pypi allowlist: uv, ruff, rich, typer, anyio.
Dep-auditor normalization (lowercase + [_.-] → -) already applied at
load time. dep.test.mjs: 11/11 still green — lodsah→lodash detection
preserved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds entropy section to DEFAULT_POLICY and wires it into entropy-scanner.
Users can now tune false-positive tradeoffs without forking the scanner.
Policy shape (.llm-security/policy.json):
entropy:
thresholds.{critical,high,medium}.{entropy,minLen} — numeric overrides
suppress_extensions[] — additive ext skip
suppress_line_patterns[] — additional regex
suppress_paths[] — relPath substrings
Wiring: entropy-scanner calls loadPolicy(targetPath) at scan entry (not
orchestrator-passed — avoids signature churn across 10 scanners). Module-
level state is reset per scan invocation. Scanner envelope now includes
calibration.{policy_source, thresholds, files_skipped_by_*} for
synthesizer transparency (Commit 5).
Malformed user regex silently skipped. Missing policy.json → built-in
defaults (backwards-compatible).
entropy.test.mjs: 9/9 still green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observed 70% false-positive rate on renderer/shader codebases (hyperframes):
GLSL, CSS-in-JS, inline HTML/SVG, ffmpeg filter-strings, hardcoded
User-Agent strings all matched base64-like entropy thresholds. This
commit adds two suppression layers before classification.
Layer A — file-extension skip: .glsl/.frag/.vert/.shader/.wgsl (shaders),
.css/.scss/.sass/.less (stylesheets), .svg (markup), .min.js/.min.css
(minified bundles). Tracked via new calibration.files_skipped_by_extension
field on scanner envelope for synthesizer stats.
Layer B — seven new line-level suppression rules in isFalsePositive()
(rules 11-17): GLSL/WGSL keywords, CSS-in-JS (styled/emotion/@keyframes),
inline HTML/SVG markup, ffmpeg filter-graph syntax, browser User-Agent,
SQL DDL/DML, error-message templates with embedded HTML.
Existing entropy.test.mjs: 9/9 still green — known bad base64 payload in
telemetry.mjs fixture still detected. Policy-driven thresholds wired in
Commit 3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace sum-and-cap formula (every non-trivial scan → 100/Extreme) with
severity-dominated, log-scaled-within-tier model. Discriminates actual
risk: 1 critical = 80, 2 critical = 86, 17 high = 65. Hyperframes-class
rendering codebases no longer collapse to Extreme just from shader noise.
Changes:
- scanners/lib/severity.mjs: new riskScore() v2; keep riskScoreV1() for
reference; riskBand() cutoffs aligned (14/39/64/84).
- scanners/posture-scanner.mjs: delete inline duplicate formula, import
riskScore/riskBand/verdict from severity.mjs. Single source of truth.
Breaking: aggregate.risk_score semantics change. Batched with entropy
suppression (Commit 2+) under v7.0.0 bump in Commit 6. Do not release
individually — JSON consumers depend on scoring band stability.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>