Final commit in the trustworthy-scoring series. Bundles verdict cutoff
alignment, the last suite of tests, and all documentation touch-points
that quote version numbers or describe v7.0.0 behaviour.
Verdict/band co-monotonicity
- `scanners/lib/severity.mjs` — verdict cutoffs moved from 61/21 to 65/15
so `BLOCK >= 65`, `WARNING >= 15` locks onto the v2 riskBand() boundaries.
Prevents "BLOCK / Medium band" contradictions under the v2 formula.
Scanner hardening (bug fixes from v7.0.0 testing)
- `scanners/entropy-scanner.mjs` — `policy_source` now uses
`existsSync('.llm-security/policy.json')` instead of value-based check.
Old heuristic always reported 'policy.json' because DEFAULT_POLICY now
carries an `entropy.thresholds` section.
- `scanners/lib/file-discovery.mjs` — `.sass` and GPU shader extensions
(`.glsl, .frag, .vert, .shader, .wgsl`) added to TEXT_EXTENSIONS. Without
this, shader files were invisible to file-discovery, so they were never
counted as skipped by the entropy-scanner extension filter.
Tests
- `tests/scanners/entropy-context.test.mjs` (new, 24 tests) — A. File-ext
skip (4), B. Line-level rules 11-17 (8), C. Policy overrides (3).
Fixtures generate 80-char base64 payloads at runtime via
`crypto.randomBytes` to dodge the plugin's own pre-edit credential hook
on the test source.
- `tests/lib/severity.test.mjs` — rewritten with v2 scoring table (70
tests total, was 52).
- `tests/lib/output.test.mjs:243` — "1 critical = score 80" under v2
(was 25 under v1).
- Full suite: 1485/1485 green (was 1461).
Docs
- `CHANGELOG.md` — v7.0.0 entry with BREAKING CHANGES section.
- `README.md` (plugin + marketplace root) — version badge, history table,
plugin-card version string, test count.
- `CLAUDE.md` — header version, "v7.0.0 — Trustworthy scoring" summary
paragraph at the top.
- `docs/security-hardening-guide.md` — new section 6 "Calibration & false
positives" documenting v2 formula, context-aware entropy scanner,
typosquat allowlist, and §6.4 tuning workflow. Existing "Recommended
baseline" section renumbered to §7.
Version bump
- `6.6.0 -> 7.0.0` across package.json, .claude-plugin/plugin.json,
scanners/ide-extension-scanner.mjs VERSION const, README badge,
CLAUDE.md header, marketplace root README card.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Makes suppression stats visible in the deep-scan report so users can
audit why the scanner produced the counts it did. Before: synthesizer
would acknowledge "true risk is High, not Extreme" in prose while
verdict stayed BLOCK/Extreme — inconsistent. After Commit 1 the
orchestrator verdict is coherent on its own; synthesizer's job shrinks
to transparency.
- Adds 'Scan Calibration' section instruction consuming
scanner.calibration.* fields (entropy files_skipped_by_extension,
policy_source, thresholds).
- Heuristic: omit the section if < 5% of files skipped (no signal).
Flag the section if > 80% skipped (policy may be too aggressive).
- Explicit 'Don't override verdict' directive in DON'T DO list.
Discrepancy goes in calibration, not in a rewritten dashboard.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hyperframes scan flagged knip vs knex, oxlint vs eslint, tsx vs nx,
rimraf vs trim as HIGH typosquats. All four are legitimate top-1000 npm
packages; short names just happen to be within Levenshtein ≤2 of other
top packages. These shouldn't generate HIGH severity on a clean install.
Added to npm allowlist: knip, oxlint, tsx, nx, rimraf, glob, tar, zod,
ky, ow, esm, ip, qs, url, prettier, vitest, vite, rollup, swc, turbo,
bun, deno. Added to pypi allowlist: uv, ruff, rich, typer, anyio.
Dep-auditor normalization (lowercase + [_.-] → -) already applied at
load time. dep.test.mjs: 11/11 still green — lodsah→lodash detection
preserved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds entropy section to DEFAULT_POLICY and wires it into entropy-scanner.
Users can now tune false-positive tradeoffs without forking the scanner.
Policy shape (.llm-security/policy.json):
entropy:
thresholds.{critical,high,medium}.{entropy,minLen} — numeric overrides
suppress_extensions[] — additive ext skip
suppress_line_patterns[] — additional regex
suppress_paths[] — relPath substrings
Wiring: entropy-scanner calls loadPolicy(targetPath) at scan entry (not
orchestrator-passed — avoids signature churn across 10 scanners). Module-
level state is reset per scan invocation. Scanner envelope now includes
calibration.{policy_source, thresholds, files_skipped_by_*} for
synthesizer transparency (Commit 5).
Malformed user regex silently skipped. Missing policy.json → built-in
defaults (backwards-compatible).
entropy.test.mjs: 9/9 still green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observed 70% false-positive rate on renderer/shader codebases (hyperframes):
GLSL, CSS-in-JS, inline HTML/SVG, ffmpeg filter-strings, hardcoded
User-Agent strings all matched base64-like entropy thresholds. This
commit adds two suppression layers before classification.
Layer A — file-extension skip: .glsl/.frag/.vert/.shader/.wgsl (shaders),
.css/.scss/.sass/.less (stylesheets), .svg (markup), .min.js/.min.css
(minified bundles). Tracked via new calibration.files_skipped_by_extension
field on scanner envelope for synthesizer stats.
Layer B — seven new line-level suppression rules in isFalsePositive()
(rules 11-17): GLSL/WGSL keywords, CSS-in-JS (styled/emotion/@keyframes),
inline HTML/SVG markup, ffmpeg filter-graph syntax, browser User-Agent,
SQL DDL/DML, error-message templates with embedded HTML.
Existing entropy.test.mjs: 9/9 still green — known bad base64 payload in
telemetry.mjs fixture still detected. Policy-driven thresholds wired in
Commit 3.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace sum-and-cap formula (every non-trivial scan → 100/Extreme) with
severity-dominated, log-scaled-within-tier model. Discriminates actual
risk: 1 critical = 80, 2 critical = 86, 17 high = 65. Hyperframes-class
rendering codebases no longer collapse to Extreme just from shader noise.
Changes:
- scanners/lib/severity.mjs: new riskScore() v2; keep riskScoreV1() for
reference; riskBand() cutoffs aligned (14/39/64/84).
- scanners/posture-scanner.mjs: delete inline duplicate formula, import
riskScore/riskBand/verdict from severity.mjs. Single source of truth.
Breaking: aggregate.risk_score semantics change. Batched with entropy
suppression (Commit 2+) under v7.0.0 bump in Commit 6. Do not release
individually — JSON consumers depend on scoring band stability.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sync plugin.json, plugin README badge, and marketplace root README
plugin-table to 2.4.0. Closes the v2.4.0 rollout.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add v2.4.0 CHANGELOG entry documenting the background-mode removal
rationale (harness does not expose Agent tool to sub-agents per
github.com/anthropics/claude-code/issues/19077). Update plugin CLAUDE.md
architecture sections to drop background-transition phases and redefine
the three orchestrator agents as inline reference. Update plugin README
mode tables for /ultraresearch-local, /ultra-cc-architect-local,
/ultraplan-local — --fg is now a no-op alias. Update marketplace root
README with a v2.4.0 paragraph above the v2.3 changelog summary.
Closes the docs portion of the v2.4.0 rollout. Version-sync follows in
the next commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add prominent warning block that the Agent tool is not exposed to
sub-agents (foreground or background). Rewrite Shape A (orchestrator
handoff) from "valid pattern" to "confirmed anti-pattern, removed in
v2.4.0". Correct Composition section: background + subagents is
NOT supported — nested spawn from a sub-agent silently fails.
Source: github.com/anthropics/claude-code/issues/19077
Confirmed empirically 2026-04-19.
BREAKING CHANGE: Catalog consumers that cited Shape A should migrate
to orchestrating from the main command context instead.
Redefine research-orchestrator, planning-orchestrator, and
architect-orchestrator from "background executor" to "inline
reference documentation". The agent files remain as the canonical
workflow descriptions, but the /ultra* commands now execute the
phases directly in the main command context instead of spawning
these agents as sub-agents.
The /ultra* command markdowns are now the de-facto orchestrators.
Splitting work into a separate sub-agent was incompatible with the
harness's treatment of the Agent tool (not exposed to sub-agents).
BREAKING CHANGE: These agents are no longer invoked. Any external
integration that spawned them directly should now invoke the
corresponding /ultra* command instead.
Remove background-transition phases from /ultraresearch-local,
/ultraplan-local, /ultra-cc-architect-local, /ultrabrief-local.
All four commands now run their full pipelines inline in main
context; --fg is retained as a no-op alias for backwards
compatibility.
The Claude Code harness does not expose the Agent tool to
sub-agents, so orchestrators launched with run_in_background:true
cannot spawn their documented swarms (docs-researcher,
community-researcher, architecture-mapper, plan-critic, etc.) and
silently degrade to single-context reasoning. Foreground execution
keeps the swarms intact.
Source: github.com/anthropics/claude-code/issues/19077
Confirmed empirically 2026-04-19.
BREAKING CHANGE: Default execution is foreground — the session
blocks until the brief/plan is ready. Use `claude -p` in a
separate terminal for long-running headless work.
Transparency: all code in this marketplace is produced by Claude Code
through dialog-driven development. Root README gets a full disclosure
section; each plugin README gets a one-line disclosure linking back to
the marketplace section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
skill-drafter now reads {catalog_root}/<slug>.md before writing its
draft and prepends a warning block to its confirmation output when
an existing skill would be overwritten during manual `mv` promotion.
The draft is still written to .drafts/<slug>.md — the check is a
hint, not a block.
Closes v2.3.0 dogfood finding (post_dogfood_findings[0]): the
drafter produced .drafts/hooks-pattern.md when an approved
hooks-pattern.md seed already existed, giving no signal that `mv`
during promotion would silently overwrite the seed. v2.3.1
introduced the qualified-slug mechanism to resolve such collisions;
v2.3.2 surfaces them at the right moment — before promotion.
Changes:
- agents/skill-drafter.md — new Step 2 between slug computation and
source reading. Reads {catalog_root}/<slug>.md, inspects
review_status, derives a kebab-case qualifier from the concept
handle (or source basename fallback). Subsequent steps renumbered
3→7. Output format gains Collision: field and optional warning
block. New Hard Rule.
- tests/fixtures/skill-drafter/slug-collision-expected.md — reference
fixture documenting expected confirmation shape across four
scenarios (no collision, approved collision, soft pending
collision, collision with no good qualifier). Skill-drafter is
prompt-driven; fixture anchors shape for human verification and
downstream parsers.
- CHANGELOG [2.3.2], plugin.json 2.3.1→2.3.2, README badge, plugin
CLAUDE.md slug-convention Collision-hint bullet, marketplace root
README summary, marketplace root CLAUDE.md plugin table.
Non-breaking. No frontmatter/drafts-layout/tool-scope/regex changes.
Existing pipelines see one extra field and an optional warning —
both purely additive.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolves v2.3.0 dogfood collision: skill-factory produced a
specialized hooks-pattern.md draft that would have overwritten the
generic seed. Qualified slugs let one feature host multiple named
patterns at different abstraction levels.
Slug convention: <cc_feature>[-<qualifier>]-<layer>.md. Unqualified =
canonical baseline. Qualified = sub-pattern (e.g., hooks-observability-
pattern.md) that does not displace the baseline.
Changes:
- SKILL.md: slug convention section, coverage-table qualified column,
matcher logic for N patterns per feature, modification rules cover
qualified-vs-canonical choice and collision handling.
- feature-matcher: catalog map is cc_feature -> {layer -> [skills]};
selection rules (baseline by default, qualified when justified,
multi-skill when non-overlapping); supporting_skill accepts list.
- gap-identifier: adds pattern_count[cc_feature] to coverage audit.
- architecture-critic: supporting-skill verification — every cited
skill name must exist in the catalog (blocker severity).
- First qualified skill: hooks-observability-pattern.md (promoted from
.drafts/, source ai-psychosis/README.md, ngram-overlap 0.01).
- Version bump 2.3.0 -> 2.3.1 across plugin.json, badges, table, root
CLAUDE.md, CHANGELOG.
Non-breaking: existing unqualified slugs keep working, no cc_feature
taxonomy changes, hallucination gate unchanged.
Opus orchestrator for /ultra-skill-author-local. Sequential 5-phase pipeline:
validate source → concept-extractor → skill-drafter → ip-hygiene-checker
→ completion summary.
No retry logic, no parallelism, no automation (per brief Non-Goals). Each
phase consumes the previous phase's output. --quick mode skips IP-hygiene
with a BIG WARNING for testing the drafting pipeline in isolation.
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 9)
Sonnet worker for /ultra-skill-author-local. Consumes concept-extractor JSON
plus source file, writes a 150-600 word draft to .drafts/<slug>.md with the
9-field frontmatter contract (review_status=pending, ngram_overlap_score=null).
Imperative voice, progressive disclosure, no verbatim copy from source.
Aborts with too-technical-to-paraphrase if the subject can't be rephrased.
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 7)
Sonnet worker for /ultra-skill-author-local. Reads ONE local source file
and emits structured concept JSON with cc_feature/layer/concept fields.
Enforces two gates:
- Hallucination gate: cc_feature MUST be one of 8 canonical values
- Gap-class gate: class C (decision) and D (outside CC) → out_of_scope
Plan: .claude/projects/2026-04-18-skill-factory-fase-1-mvp/plan.md (step 6)
Symmetric with the existing VS Code branch — the env var was only wired
into getVSCodeExtensionRoots(), so the plan's master verification
(`LLM_SECURITY_IDE_ROOTS=... --intellij-only`) reported 0 discovered
plugins. Adding the same fallback to discoverJetBrainsExtensions makes
both families honor the CLI override and closes the gap.
Thresholds <=10 (fixture) and <=20 (plugin root) have been too tight since
before this plan started — baseline on 1634197 already produced 37 and 27
findings. git-forensics findings accumulate with repo history, so fixed
caps are brittle. Raised to <=100 to tolerate organic growth while still
catching runaway/pathological output.
Step 1/17 of ultraplan-2026-04-17-jetbrains-ide-scan.
- Populate top-jetbrains-plugins.json with 56 canonical xmlIds (bundled +
popular third-party): com.intellij.java, org.jetbrains.kotlin,
com.jetbrains.python.community, org.rust.lang, com.github.copilot,
mobi.hsz.idea.gitignore, the legitimate-typo 'Lombook Plugin', etc.
- Add loadJetBrainsBlocklist() export mirroring loadVSCodeBlocklist shape.
Blocklist is empty by design — no public confirmed-malicious JetBrains
Marketplace plugins as of 2026-04-17.
- Add tests/scanners/ide-extension-data.test.mjs (9 tests, all pass).
- Fix cache bug in loadTopJetBrains: map normalizeId on cache-hit path too
(was previously unnormalized on second call).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace hardcoded Q1-Q8 in /ultrabrief-local with a section-driven
completeness loop (Phase 3) and a draft/review/revise loop with
brief-reviewer as stop-gate (Phase 4). Quality drives the interview,
not a question counter.
brief-reviewer now emits a machine-readable JSON block with per-dimension
scores (1-5) and detail arrays alongside the existing prose report;
planning-orchestrator continues to consume the prose verdict unchanged.
Phase 4 gate: all dimensions >= 4 AND research_plan = 5. On fail, a
targeted follow-up is generated from the weakest dimension's detail
field and the draft is re-reviewed. Max 3 review iterations bound cost;
exhaustion writes brief.md with brief_quality: partial and an explicit
Brief Quality section. Force-stop surfaces per-dimension findings before
the user chooses continue or partial.
Not breaking. /ultrabrief-local [--quick] <task> interface unchanged.
--quick now means compact start with escalation, not a max-N cap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Opus 4.7 reads agent instructions more literally than 4.6. The v1.7
planning-orchestrator described the Step+Manifest schema via prose +
procedural rules, which 4.6 inferred correctly but 4.7 sometimes
rendered as narrative "Fase N" prose — producing plans ultraexecute
Phase 2 rejected. First observed 2026-04-17 during llm-security v6.2.0
planning.
v1.8.0 closes the gap:
- planning-orchestrator Phase 5 embeds a literal copyable Step+Manifest
example (JWT middleware) replacing "read the template" prose
- Explicit forbidden-format clause: ## Fase N, ### Phase N, ### Stage N,
and any non-"### Step N:" heading are denied
- Phase 5.5 schema self-check: grep-verify canonical Step count matches
Manifest count and narrative heading count is zero, before handing to
plan-critic
- ultraexecute-local --validate mode: schema-only check that parses
steps + manifests, reports READY/FAIL with actionable error hints,
no security scan, no execution. Fast sanity check between
/ultraplan-local and full execution.
Static verification: 17/17 PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
VSIX fetch + extract for URL targets now runs in a sub-process wrapped by
sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven
by the v5.1 git-clone sandbox. Defense-in-depth — even if our own
zip-extract.mjs ever has a bypass, the kernel refuses any write outside
the per-scan temp directory.
New files:
- scanners/lib/vsix-fetch-worker.mjs — sub-process worker. Argv: --url
--tmpdir; emits one JSON line on stdout (ok/sha256/size/source/extRoot
or ok:false/error/code). Silent on stderr. Exit 0/1.
- scanners/lib/vsix-sandbox.mjs — wrapper. Exports buildSandboxProfile,
buildBwrapArgs, buildSandboxedWorker, runVsixWorker. 35s timeout, 1 MB
stdout cap.
Changes:
- scanners/ide-extension-scanner.mjs: fetchAndExtractVsixUrl is now
sandbox-aware (useSandbox option, default true). In-process logic
preserved as fallback. New meta.source.sandbox field:
'sandbox-exec' | 'bwrap' | 'none' | 'in-process'.
- scan(target, { useSandbox }) defaults to true; tests pass false because
globalThis.fetch mocks do not cross process boundaries.
- Windows fallback: in-process with meta.warnings advisory.
Tests:
- 8 new tests in tests/scanners/vsix-sandbox.test.mjs (per-platform
profile generation, worker arg construction, live worker exit
behavior on invalid URLs — no network).
- Existing URL tests updated to opt out of sandbox (useSandbox: false).
- 1344 → 1352 tests, all green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>