feat(commands): E14 part 3 — /security mcp-baseline-reset slash command

Wave C step C3: closes E14 with the user-facing reset command.

After a legitimate MCP server upgrade, the sticky baseline (added in C1)
becomes a stale "what the tool used to say" anchor, and every subsequent
post-mcp-verify advisory re-flags the change. /security mcp-baseline-reset
lets the user acknowledge the upgrade so that the next call seeds a fresh
baseline.
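The lifecycle described above (seed once, stay sticky, clear on reset, reseed on the next call) can be illustrated with a minimal in-memory sketch. It assumes an entry with the fields the README section names (baseline, description, firstSeen, lastSeen, history). The helper names `recordDescription` and `resetBaseline`, the `{ at, description }` history-event shape, and the use of `null` for a cleared baseline are all hypothetical. They are not the real `clearBaseline` API in `mcp-description-cache.mjs`.

```javascript
// Hypothetical sketch of one cache entry's lifecycle; field names follow the
// README text, but the exact schema and event shape are assumptions.
const HISTORY_LIMIT = 10; // "rolling 10-event history"

function recordDescription(entry, description, now) {
  const next = entry
    ? { ...entry, history: [...entry.history] }
    : { baseline: null, description: null, firstSeen: now, lastSeen: now, history: [] };
  // Seed the sticky baseline only when none exists (first sighting or after a reset).
  if (next.baseline === null) next.baseline = description;
  next.description = description;
  next.lastSeen = now;
  next.history.push({ at: now, description });
  // Keep only the most recent HISTORY_LIMIT events.
  if (next.history.length > HISTORY_LIMIT) {
    next.history = next.history.slice(-HISTORY_LIMIT);
  }
  return next;
}

function resetBaseline(entry) {
  // Clear only the anchor; description, firstSeen, lastSeen and history stay for audit.
  return entry ? { ...entry, baseline: null } : entry;
}

let e = recordDescription(undefined, "Reads a file.", 1);
e = recordDescription(e, "Reads a file and uploads it.", 2); // baseline stays "Reads a file."
e = resetBaseline(e);                                         // baseline cleared, history kept
e = recordDescription(e, "Reads a file and uploads it.", 3); // baseline reseeded from this call
console.log(e.baseline, e.history.length);
```

Calling `resetBaseline` twice leaves the entry unchanged, which matches the idempotency the CLI promises.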

New files:
- scanners/mcp-baseline-reset.mjs — small CLI wrapper around clearBaseline /
  listBaselines. Modes: --list (read-only), --target <name>, and no args
  (clear all). Prints a JSON summary to stdout; always exits 0 (idempotent).
- commands/mcp-baseline-reset.md — dispatcher following mcp-inspect.md
  shape. Frontmatter: name=security:mcp-baseline-reset, sonnet model,
  Read/Bash/AskUserQuestion tools. 4-step body (list -> confirm scope
  -> execute -> confirm result).
- tests/scanners/mcp-baseline-reset.test.mjs — 10 CLI tests across
  --list, --target, clear-all, idempotency, history preservation, and
  bare-positional sugar.
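The three CLI modes and the "bare-positional sugar" covered by the tests could be parsed roughly as follows. This is a hypothetical sketch, not the shipped parser. The function name `parseResetArgs`, the precedence of `--list` over a target, and the `invalid` result for a dangling `--target` are assumptions.

```javascript
// Hypothetical parser for the documented modes:
//   --list            -> read-only listing
//   --target <name>   -> clear one tool's baseline
//   <name>            -> bare-positional sugar for --target <name> (assumed)
//   (no args)         -> clear every baseline
function parseResetArgs(argv) {
  let list = false;
  let target = null;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--list") {
      list = true;
    } else if (arg === "--target") {
      const value = argv[i + 1];
      // Refuse to guess: a dangling --target must not fall through to clear-all.
      if (value === undefined || value.startsWith("-")) {
        return { mode: "invalid", reason: "--target requires a tool name" };
      }
      target = value;
      i++; // skip the consumed value
    } else if (!arg.startsWith("-") && target === null) {
      target = arg; // bare positional treated as a target name
    }
  }
  if (list) return { mode: "list" }; // read-only wins (assumed precedence)
  if (target !== null) return { mode: "target", target };
  return { mode: "all" };
}

// In a Node script this would typically be called with process.argv.slice(2).
console.log(JSON.stringify(parseResetArgs(["--target", "example-tool"])));
```

Returning `invalid` instead of falling through to clear-all means a mistyped `--target` with no name cannot wipe every baseline.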

Updated:
- commands/security.md — new row in commands table after mcp-inspect.
- CLAUDE.md — new commands-table row + new v7.3.0 narrative section
  describing the baseline schema, cumulative-drift detection, reset
  semantics, and the LLM_SECURITY_MCP_CACHE_FILE override.
- Plugin README.md — new MCP-baseline-reset row in commands table,
  scanner count 12 standalone -> 13 standalone, new "MCP Description
  Drift (E14, v7.3.0)" subsection explaining the sticky baseline,
  cumulative threshold, reset semantics, and env-var override.
- Root marketplace README.md — scanner count 22 -> 23 (10 orchestrated +
  13 standalone), command count 19 -> 20, test count 1511 -> 1768.
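The `LLM_SECURITY_MCP_CACHE_FILE` override amounts to a path lookup with a default fallback. A minimal sketch, assuming that an empty or whitespace-only value counts as unset. `resolveCachePath` is a hypothetical name. The environment and home directory are passed in so the function stays pure and easy to test.

```javascript
// Hypothetical helper: prefer the env override, else the documented default path.
// Treating empty/whitespace-only values as unset is an assumption.
function resolveCachePath(env, homeDir) {
  const override = env.LLM_SECURITY_MCP_CACHE_FILE;
  if (typeof override === "string" && override.trim() !== "") {
    return override;
  }
  // Default location named in the README; POSIX-style join for brevity.
  return `${homeDir}/.cache/llm-security/mcp-descriptions.json`;
}

// A test run points the cache at a scratch file instead of the real one.
console.log(resolveCachePath({ LLM_SECURITY_MCP_CACHE_FILE: "/tmp/mcp-test.json" }, "/home/me"));
console.log(resolveCachePath({}, "/home/me"));
```

In Node this would be called as `resolveCachePath(process.env, os.homedir())`, with `os` imported from `node:os`.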

Wave C complete: 1738 -> 1768 tests (+30 across C1/C2/C3). Per plan,
Wave C does NOT bump the plugin version — that lands at the wave-bundle
release. The advisory text in post-mcp-verify already references the
new command path so the user has a ready remediation step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Kjell Tore Guttormsen 2026-04-30 16:49:01 +02:00
commit 001df2ebe8
7 changed files with 454 additions and 5 deletions

@@ -167,6 +167,7 @@ Or enable directly in `~/.claude/settings.json`:
| `/security plugin-audit [path\|url]` | Dedicated plugin security audit with Install/Review/Do Not Install verdict (local or GitHub URL) |
| `/security mcp-audit [--live]` | Focused audit of all installed MCP server configurations (add `--live` for runtime inspection) |
| `/security mcp-inspect` | Connect to running MCP stdio servers and scan live tool descriptions |
+| `/security mcp-baseline-reset` | Reset the cumulative-drift baseline cache after a legitimate MCP server upgrade (E14, v7.3.0) |
| `/security ide-scan [target\|url]` | Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extensions — OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, or direct `.vsix` URL (v6.4.0). Typosquat, theme-with-code, sideload, broad activation, uninstall hooks, plus UNI/ENT/NET/TNT/MEM/SCR per extension. Offline by default |
| `/security posture` | Quick security posture scorecard (16 categories incl. compliance) |
| `/security diff [path]` | Compare scan against stored baseline — shows new/resolved/unchanged/moved findings |
@@ -368,7 +369,7 @@ For deep scans (`/security scan --deep` or `/security deep-scan`), deterministic
## Deterministic Scanners
-10 orchestrated + 12 standalone Node.js scanner scripts that perform byte-level analysis an LLM cannot. Zero external dependencies. Orchestrated scanners run via `node scanners/scan-orchestrator.mjs <target>` or through `/security deep-scan`. Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>`.
+10 orchestrated + 13 standalone Node.js scanner scripts that perform byte-level analysis an LLM cannot. Zero external dependencies. Orchestrated scanners run via `node scanners/scan-orchestrator.mjs <target>` or through `/security deep-scan`. Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>`.
### Orchestrated (10)
@@ -385,13 +386,14 @@ For deep scans (`/security scan --deep` or `/security deep-scan`), deterministic
| `supply-chain-recheck.mjs` | SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquat detection | LLM03 |
| `toxic-flow-analyzer.mjs` | TFA | Lethal trifecta detection: untrusted input + sensitive data access + exfiltration sink. Cross-component correlation (runs last) | ASI01, ASI02, ASI05 |
-### Standalone (12)
+### Standalone (13)
| Scanner | Prefix | Purpose |
|---------|--------|---------|
| `scan-orchestrator.mjs` | — | Entry point: runs all 10 orchestrated scanners, outputs JSON |
| `posture-scanner.mjs` | PST | Deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms |
| `mcp-live-inspect.mjs` | MCI | Live MCP server inspection via JSON-RPC 2.0 (tool injection, shadowing, URL/IP) |
+| `mcp-baseline-reset.mjs` | — | Reset cumulative-drift baseline cache (E14, v7.3.0) — `--list` / `--target <tool>` / clear-all. Idempotent JSON output |
| `ide-extension-scanner.mjs` | IDE | VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extension prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks — then UNI/ENT/NET/TNT/MEM/SCR per extension |
| `attack-simulator.mjs` | — | Red-team harness: 64 scenarios, 12 categories, adaptive mutation mode |
| `ai-bom-generator.mjs` | BOM | CycloneDX 1.6 AI Bill of Materials |
@@ -402,6 +404,12 @@ For deep scans (`/security scan --deep` or `/security deep-scan`), deterministic
| `content-extractor.mjs` | — | Pre-extracts evidence from untrusted repos, strips injection patterns |
| `watch-cron.mjs` | — | Cron wrapper: scans all targets in config, writes summary, exits with verdict code |
+### MCP Description Drift (E14, v7.3.0)
+`scanners/lib/mcp-description-cache.mjs` anchors a sticky **baseline** description per MCP tool plus a rolling 10-event history. Cumulative drift is computed as `levenshtein(current, baseline) / max(|current|, |baseline|)`; when it crosses `mcp.cumulative_drift_threshold` (default 0.25), `post-mcp-verify.mjs` emits a MEDIUM `mcp-cumulative-drift` advisory — independent of the existing per-update >10% drift signal. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from the baseline are now caught.
+The baseline survives the 7-day TTL purge so detection persists across the full window. After a legitimate MCP server upgrade, run `/security mcp-baseline-reset` (or `node scanners/mcp-baseline-reset.mjs --target <tool>`) to clear the stale baseline. The next call seeds a fresh baseline from the incoming description; description, firstSeen, lastSeen, and history are preserved across reset for audit. `LLM_SECURITY_MCP_CACHE_FILE` env var overrides the cache path for testing without polluting `~/.cache/llm-security/mcp-descriptions.json`.
**Why deterministic?** LLMs are powerful at semantic analysis — understanding intent, detecting social engineering, assessing context. But they cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
**Shared library** (`scanners/lib/`): severity classification, string utilities (entropy, Levenshtein, base64 detection), output formatting, file discovery, and YAML frontmatter parsing.
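The cumulative-drift rule in the new README subsection is easiest to see with a worked example. The sketch below implements the stated formula, `levenshtein(current, baseline) / max(|current|, |baseline|)`, with a textbook dynamic-programming edit distance, and compares it with the per-update 10% signal. It assumes strict `>` comparisons for both thresholds, that the per-update signal compares against the previous description, and that length is measured in JavaScript string units (UTF-16 code units). `driftRatio`, `classifyUpdate`, and `slowBurn` are illustrative names, not the actual `post-mcp-verify.mjs` code.

```javascript
// Textbook dynamic-programming Levenshtein distance (unit-cost insert, delete, substitute).
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // delete a[i-1]
        curr[j - 1] + 1,   // insert b[j-1]
        prev[j - 1] + cost // substitute (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Normalized drift in [0, 1]: edit distance divided by the longer length.
function driftRatio(current, reference) {
  const longest = Math.max(current.length, reference.length);
  return longest === 0 ? 0 : levenshtein(current, reference) / longest;
}

// Per-update signal vs. the previous description (assumed); cumulative vs. the sticky baseline.
function classifyUpdate(baseline, previous, current, perUpdate = 0.10, cumulative = 0.25) {
  const stepDrift = driftRatio(current, previous);
  const totalDrift = driftRatio(current, baseline);
  return {
    stepDrift,
    totalDrift,
    perUpdateFlag: stepDrift > perUpdate,
    cumulativeFlag: totalDrift > cumulative, // strict ">" is an assumption
  };
}

// Slow-burn scenario: a 20-char stand-in description; each update swaps one more char.
// Valid for steps <= 20.
function slowBurn(steps) {
  const baseline = "a".repeat(20);
  let previous = baseline;
  const results = [];
  for (let k = 1; k <= steps; k++) {
    const current = "x".repeat(k) + "a".repeat(20 - k);
    results.push(classifyUpdate(baseline, previous, current));
    previous = current;
  }
  return results;
}

slowBurn(6).forEach((r, i) =>
  console.log(`update ${i + 1}: step=${r.stepDrift} total=${r.totalDrift} cumulativeFlag=${r.cumulativeFlag}`)
);
```

Each update differs from the previous one by a single substitution, so its drift is 1/20 = 0.05, below the 10% per-update threshold. The distance to the baseline grows by one edit per step: 5/20 = 0.25 after five updates and 6/20 = 0.30 after six, which is the first value strictly above the 0.25 default.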