New CPS scanner walks CLAUDE.md cascade and flags volatile content
between lines 31 and 150 — the cache-prefix window beyond TOK Pattern
A's top-30 territory. Volatile content anywhere in the cached prefix
forces a fresh cache write from that line down on every turn.
Volatile-pattern set extends TOK Pattern A with:
- shell-exec lines (! prefix) — common in CLAUDE.md to inject git/date
- ${VAR} substitutions — vary per-shell, defeat cache reuse
Severity: medium per finding. Skips lines 1-30 to avoid duplicating
Pattern A's range; CPS' value is in the 31-150 zone.
Wired into scan-orchestrator + scoring SCANNER_AREA_MAP. CPS shares
the "Token Efficiency" area with TOK; scoreByArea now deduplicates by
area name and combines counts across scanners contributing to the
same area, so the 9-area scorecard contract holds.
Fixtures volatile-mid-section/{volatile-line-60, volatile-line-200}
verify both positive (line 60) and out-of-window (line 200) cases.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag.
Tests: 604 → 611 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New scanners/manifest.mjs CLI + commands/manifest.md slash command.
Reads activeConfig and produces a flat, ranked list of every token
source (CLAUDE.md cascade entries, plugins, skills, MCP servers, hooks)
sorted DESC by estimated_tokens.
CLAUDE.md per-file tokens are derived by distributing
claudeMd.estimatedTokens across the cascade proportional to bytes.
Tests cover both real-config (plugin root) and fixture (rich-repo with
patched HOME containing 2 plugins + 3 skills + .mcp.json) paths, plus
error handling (nonexistent path → exit 3, --output-file).
Builds on readActiveConfig from M1 (v5 alpha.2).
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 593 → 604 (+11).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds detectMcpToolBudget detection block in TOK scanner. Tiered severity
per project-local .mcp.json server based on toolCount:
- < 20: no finding
- 20-49: low
- 50-99: medium
- 100+: high
- null (manifest unparseable): low + "tool count unknown" message
Scoped to source==='.mcp.json' to keep findings actionable for the
audited path; plugin/user-level MCP servers are surfaced by the
manifest scanner (Step 19 / N2).
5 fixtures (mcp-budget/{14,25,60,120,unknown}-tools) use inline `tools`
arrays in .mcp.json — no node_modules needed for these tests.
Tests assert title+severity (not exact ID) since TOK IDs are sequential
per scan, not semantic per pattern.
[skip-docs] reason: v5 plan fences off README/CLAUDE.md badge updates
to Session 5; Forgejo pre-commit-docs-gate hook requires this tag on
feat commits without doc changes.
Tests: 586 → 593 (+7).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Filesystem counts are the source of truth; README badges parsed via
line-anchored substring (badge/<kind>-<N>-...). Emits readmeCheck object
with counts/badges/mismatches.
CLI: node scanners/self-audit.mjs --check-readme [--json]
API: runSelfAudit({ checkReadme: true }) → result.readmeCheck
Helper: checkReadmeBadges(pluginDir) for per-fixture testing
New fixture: readme-desynced/ (commands/foo + bar, README claims 1).
Note: alpha phase does NOT require result.readmeCheck.passed === true.
Self-test of real plugin currently fails (scanners 10 vs 9, tests 31 vs 543);
will be reconciled in Session 5 Step 28 (README sync).
582 → 586 tests, all green.
The mcp-tool-heavy fixture relies on node_modules/mcp-heavy/package.json
being committed so the v5 M1 tool-count detection test runs deterministically.
Add an unignore rule for tests/fixtures/**/node_modules/.
- New Pattern F in TOK: low-severity finding when SKILL.md description > 500 chars
- Scoped to discovery.files (project-local) — activeConfig.skills walk would
pull in user/plugin skills out of project scope
- New fixtures: skill-bloated (594-char desc) + skill-tight (46-char baseline)
574 → 576 tests, all green.
- New Pattern E in TOK: emits medium finding when activeConfig.claudeMd.estimatedTokens > 10_000
- Uses cascade tokens, file count, and calibration note as evidence
- New fixtures: large-cascade (37k bytes / 14475 cascade tokens) + small-cascade (5k baseline)
572 → 574 tests, all green.
- Pattern A (cache-breaking volatile top): medium → high
- Pattern B (redundant permissions): low → medium
- Pattern C (deep @import chain): medium → low
- Add calibration_note evidence on every TOK finding
- Table-driven severity tests (identify by title, IDs are sequential)
563 → 569 tests, all green. Doc sweep deferred to Session 5 (Step 28).
Per-step result table for Steps 1-9 + 8b with commit SHAs and notable
deviations (Step 6 baseline switch to sonnet-era, Step 8 surprise on
sonnet-era discovery scope, PathGuard hook false positive on test
fixtures). 543 → 563 tests, all green, no blockers carried forward.
Summarizes F1-F5 scope: TOK ↔ readActiveConfig integration, 'mcp' kind
in estimateTokens (15 → ≥500), severity-weighted scoreByArea, dead-code
removal in TOK hotspots, Pattern D / CA-TOK-004 removal.
Includes migration notes for downstream consumers (CA-TOK-* globs still
suppress 001-003; scoringVersion field added for v4→v5 detection).
Pattern D was the v4 sonnet-era signature: 'config is structurally
clean but uses no Opus-4.7-specific features'. Two problems:
- It triggered on any minimal config that happened to lack skills/MCP
- The advice was generic and not actionable
The hotspots ranking and per-pattern findings (A/B/C) cover the same
ground with concrete, file-anchored signal. Dropping the noise.
BREAKING (intentional): scanners no longer emit the sonnet-era info
finding. Suppression entries and downstream tooling that reference
the v4 finding ID should be updated. Doc sweep follows in Step 8b.
Tests: sonnet-era fixture now asserts zero findings.
The buildHotspots padding loop and unused 'take' variable were dead
code from the v3 hotspots-min contract. Replaced with a clean
ranked.slice(0, HOTSPOTS_MAX). Tiny fixtures may now return fewer
than 3 hotspots, which is the honest answer; the contract now only
asserts <= 10.
Tests: +2 cases — every hotspot.source is unique (no padding); length
never exceeds HOTSPOTS_MAX.
Removes the v4 'void readActiveConfig' placeholder and wires the
active-config snapshot into the TOK scanner.
Per-turn behavior changes:
- Each enabled MCP server becomes its own hotspot entry (richer than
the parent .mcp.json file alone)
- total_estimated_tokens now includes MCP server cost
- result.activeConfig exposes a small summary
(claudeMdEstimatedTokens, mcpServerCount, pluginCount, skillCount)
Failures of readActiveConfig are non-fatal — the scanner falls back
to the discovery-only path used in v4.
Tests: +3 cases on the new tok-active-config fixture
(.mcp.json with 2 servers, CLAUDE.md, plugin skeleton).
Two MCP enumeration paths in readActiveMcpServers now pass kind='mcp'
to estimateTokens with optional toolCount derived from def.tools array
(populated when callers cache MCP discovery — Step 14 wires that up).
Hook callers keep kind='item' (no schema overhead).
Visible effect: every active MCP server jumps from estimatedTokens=15
to >= 500 (or higher when toolCount is known). The whats-active output
and TOK hotspots now reflect actual MCP cost.
Tests: assert mcpServers[].estimatedTokens >= 500 in fixture.
The plugin lives in ktg-plugin-marketplace and is distributed via the
Claude Code marketplace mechanism. There is no standalone
open/claude-code-llm-security repo; references to it were aspirational
and never realized.
- package.json: homepage now deep-links to plugins/llm-security/ in the
marketplace; repository.url uses the marketplace repo with directory
field (npm convention for monorepo plugins); bugs.url routes to
marketplace issue tracker.
- CLAUDE.md: "Public Repository" section replaced with "Distribution"
section documenting the marketplace install path.
- CONTRIBUTING.md: issue tracker URL points at marketplace issues with
[llm-security] prefix convention.
- CHANGELOG.md: v7.3.1 entry rewritten to reflect actual change
(URLs corrected to marketplace, not "fixed from one wrong URL to
another wrong URL").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace count-based pass-rate with severity-weighted penalty:
- penalty = sum(count[s] * WEIGHTS[s])
- maxBudget = max(10, findingCount * 4)
- passRate = max(0, 100 - penalty / maxBudget * 100)
A few lows no longer crater an area's grade; a single high or critical
consumes a large fraction of budget. Mirrors the operator intuition that
severity, not count, is the signal.
BREAKING (intentional): scoring semantics differ from v4 for non-clean
configs. Add scoringVersion: 'v5' to the returned struct so consumers
can detect the version. baseline-all-a remains all-A (no critical/high
on that fixture).
Tests: +6 cases for severity weighting; existing "many findings" test
updated to use highs (where v5 still drops the grade as expected).
Promote WEIGHTS const to named export with Object.freeze for downstream
use in scoring.mjs (severity-weighted scoreByArea, F3).
Tests: +2 cases asserting WEIGHTS shape.
No behavior changes. Sets the public stance, tightens documentation, and
removes coherence drift so anyone forking or downloading the plugin gets
a consistent starting point.
Added:
- CONTRIBUTING.md — public fork-and-own guide. Why PRs are not accepted,
how to fork well, what is welcome via issues.
- README "Project scope" section — out-of-scope table naming what is
fork-and-own territory (web dashboard, fleet policy, runtime firewall,
IDE LSP, compliance pack, ticketing, multi-tenancy, ML detectors,
marketplace UI, SSO/SCIM/RBAC) with commercial alternatives.
- package.json: bugs.url, CONTRIBUTING/SECURITY/CHANGELOG in files
whitelist for npm publishing.
Changed:
- SECURITY.md rewritten. Supported-versions table from stale 5.1.x to
current reality (7.3.x active, 7.0-7.2 best-effort, <7.0 EOL).
Best-effort solo response timeline. Scope expanded to bin/.
- Scanner VERSION constants synced to plugin version. Was 6.0.0 in
dashboard-aggregator and posture-scanner.
- package.json repository.url corrected from fromaitochitta/ to open/.
- README "Feedback & contributing" links to CONTRIBUTING.md.
Fixed:
- pre-compact-scan size-cap timing test ceiling raised 500ms -> 1000ms.
Was a flake on Intel Mac and CI under load. Design target unchanged
(<500ms, documented in CLAUDE.md).
Notes:
- First patch on the stabilization line (post-2026-05-01).
- Wave E attack-simulator scenarios deferred indefinitely; coverage
remains at 72.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 7 of v2.0 plan. Registers SessionStart, Stop, and statusLine hooks.
Note: statusLine top-level placement in hooks.json is an open assumption
(brief Assumption 1) — verified to be valid JSON syntax; live smoke-test
required to confirm Claude Code loads it from this location vs requiring
settings.json placement.
Step 6 of v2.0 plan. SessionStart hook fires on source: resume or
source: compact, walks up to 3 levels searching for
NEXT-SESSION-*.local.md, injects content via additionalContext, and
archives the file (rename to *.archived.local.md) to prevent stale-load
in later sessions. 9 tests cover sources, multi-level search,
topic-slug variants, archive filtering, malformed payload.
Step 4 of v2.0 plan. statusLine hook reads context_window.used_percentage
from stdin payload and prints display-only hint at 60% / 70%. NEVER runs
git (research/03 — statusLine scripts can be cancelled mid-flight, unsafe
for side effects). 9 tests cover thresholds, null payload, malformed JSON.
Includes hook-helper.mjs copied from llm-security as test infrastructure.
Step 1 of v2.0 plan. Hard cut from commands/ to skills/ per Anthropic
recommendation for new plugins. Frontmatter sets disable-model-invocation:
true and pins model: claude-sonnet-4-6. Docs (README, CLAUDE.md, root
README) deferred to Step 9 per plan.
Batch C release. Closes 12 implementation tasks (E3, E8-E14, 8.4, 8.6,
8.7, 8.10) across four execution waves: A (bash + decoder), B (supply
chain + workflow scanner), C (MCP cumulative drift), D (code quality).
Wave E (9 new attack-simulator scenarios for the new defenses) deferred
to v7.3.1 — defenses are unit-tested per wave; the deferred work adds
attack-simulator regression coverage on top, not the primary safety net.
Tests: 1665+ → 1777 (Wave A-D cumulative, +112).
Version sync targets touched:
- package.json
- .claude-plugin/plugin.json
- CLAUDE.md (header)
- README.md (badge + new release-history row)
- scanners/ide-extension-scanner.mjs (VERSION constant)
- ../../README.md (marketplace root plugin entry)
- CHANGELOG.md (new [7.3.0] section per Keep a Changelog, all 12 task
IDs covered individually under Added/Changed/Documentation/Tests/Notes)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- package.json med node:test runner og scripts (test, simulate), zero deps
- settings.json: fjern vestigial exploration- og agentTeam-blokker (verifisert leset av ingen kode via grep)
- docs/: commit subagent-delegation-audit.md og ultraexecute-v2-observations-from-config-audit-v4.md (begge real arkitektur-notater)
- docs/: arkiver ultra-suite-brief_2.md som _archive- (var paste fra annet plugin-arbeid, irrelevant her)
- tests/helpers/hook-helper.mjs kopiert fra llm-security m/ provenance-kommentar
Forberedelse for Spor 1 (lib/-moduler), Spor 2 (HANDOVER-CONTRACTS + PreCompact-hook), Spor 3 (bug-fixes + CC-features).
Plan: ~/.claude/plans/det-neste-vi-gj-r-eventual-adleman.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extract `/ultra-cc-architect-local` and `/ultra-skill-author-local` plus all 7
supporting agents, the `cc-architect-catalog` skill (13 files), the
`ngram-overlap.mjs` IP-hygiene script, and the skill-factory test fixtures
from `ultraplan-local` v2.4.0 into a new `ultra-cc-architect` plugin v0.1.0.
Why: ultraplan-local had drifted into containing two distinct domains — a
universal planning pipeline (brief → research → plan → execute) and a
Claude-Code-specific architecture phase. Keeping them together forced users
to inherit an unfinished CC-feature catalog (~11 seeds) when they only
wanted the planning pipeline, and locked the catalog and the pipeline into
the same release cadence. The architect was already optional and decoupled
at the code level — only one filesystem touchpoint remained
(auto-discovery of `architecture/overview.md`), which already handles
absence gracefully.
Plugin manifests:
- ultraplan-local: 2.4.0 → 3.0.0 (description + keywords updated)
- ultra-cc-architect: new at 0.1.0 (pre-release; catalog is thin, Fase 2/3
of skill-factory unbuilt, decision-layer empty, fallback list still
needed)
What stays in ultraplan-local: brief/research/plan/execute commands, all
19 planning agents, security hooks, plan auto-discovery of
`architecture/overview.md` (filesystem-level contract, not code-level).
What moved (28 files via git mv, R100 — full history preserved):
- 2 commands, 8 agents, 1 skill catalog (13 files), 2 scripts, 8 fixtures
Documentation updates: plugin CLAUDE.md and README.md for both plugins,
root README.md (added ultra-cc-architect section, updated ultraplan-local
section), root CLAUDE.md (added ultra-cc-architect to repo-struktur),
marketplace.json (registered ultra-cc-architect), ultraplan-local
CHANGELOG.md (v3.0.0 entry with migration guidance).
Test verification: ngram-overlap.test.mjs passes 23/23 from new location.
Memory updated: feedback_no_architect_until_v3.md now points at the new
plugin and reframes the threshold around catalog maturity rather than an
ultraplan-local milestone.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave C step C3: closes E14 with the user-facing reset command.
After a legitimate MCP server upgrade the sticky baseline (added in C1)
becomes a stale "what the tool used to say" anchor and every subsequent
post-mcp-verify advisory will re-flag the change. /security mcp-baseline-reset
lets the user acknowledge the upgrade so the next call seeds a fresh
baseline.
New files:
- scanners/mcp-baseline-reset.mjs — small CLI wrapper around clearBaseline /
listBaselines. Modes: --list (read-only), --target <name>, no-args (all).
Outputs JSON summary on stdout. Exit 0 always (idempotent).
- commands/mcp-baseline-reset.md — dispatcher following mcp-inspect.md
shape. Frontmatter: name=security:mcp-baseline-reset, sonnet model,
Read/Bash/AskUserQuestion tools. 4-step body (list -> confirm scope
-> execute -> confirm result).
- tests/scanners/mcp-baseline-reset.test.mjs — 10 CLI tests across
--list, --target, clear-all, idempotency, history preservation, and
bare-positional sugar.
Updated:
- commands/security.md — new row in commands table after mcp-inspect.
- CLAUDE.md — new commands-table row + new v7.3.0 narrative section
describing the baseline schema, cumulative-drift detection, reset
semantics, and the LLM_SECURITY_MCP_CACHE_FILE override.
- Plugin README.md — new MCP-baseline-reset row in commands table,
scanner count 12 standalone -> 13 standalone, new "MCP Description
Drift (E14, v7.3.0)" subsection explaining the sticky baseline,
cumulative threshold, reset semantics, and env-var override.
- Root marketplace README.md — scanner count 22 -> 23 (10 orchestrated +
13 standalone), command count 19 -> 20, test count 1511 -> 1768.
Wave C complete: 1738 -> 1768 tests (+30 across C1/C2/C3). Per plan,
Wave C does NOT bump the plugin version — that lands at the wave-bundle
release. The advisory text in post-mcp-verify already references the
new command path so the user has a ready remediation step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave C step C2: surface the cumulative-drift signal from
checkDescriptionDrift() (added in C1) as a separate MEDIUM advisory
with finding category mcp-cumulative-drift. Independent of the existing
per-update drift advisory — a slow-burn rug-pull that keeps each update
below the 10% per-update threshold but cumulatively drifts >=25% from
the sticky baseline now triggers the new advisory without ever crossing
the per-update bar.
The advisory references /security mcp-baseline-reset (added in C3) so
the user knows how to acknowledge a legitimate MCP server upgrade.
CLAUDE.md updates:
- post-mcp-verify hooks-table row mentions per-update + cumulative drift
- mcp-description-cache lib bullet documents baseline schema, history,
cumulative threshold policy key, and LLM_SECURITY_MCP_CACHE_FILE
override.
Tests: 2 new hook tests using LLM_SECURITY_MCP_CACHE_FILE for cache
isolation. Existing 68 still pass; total 70.
Plugin README and root marketplace README updates land in C3 alongside
the new /security mcp-baseline-reset slash command (combined Wave-C
doc update per plan §"Wave C — Touch" list).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>