feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]

Companion to 8df5d5c, which carried only the doc updates; the example
directories themselves were left out of staging by mistake.

This commit adds the actual example directories:
- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs, expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs, expected-findings.md}

Plus a plugin CLAUDE.md "Examples (runnable demonstrations)" section with a
4-row table covering malicious-skill-demo, prompt-injection-showcase,
lethal-trifecta-walkthrough, and mcp-rug-pull, plus the state-isolation
discipline notes.

Marketplace root README unchanged since the plugin's outward coverage is
unchanged ([skip-docs] covers the marketplace-level gate).
2026-05-05 14:45:39 +02:00
commit 583a78c6cc
parent 8df5d5c70e
7 changed files with 739 additions and 0 deletions
--- a/plugins/llm-security/examples/mcp-rug-pull/README.md
+++ b/plugins/llm-security/examples/mcp-rug-pull/README.md
@@ -0,0 +1,125 @@
+# MCP Cumulative Drift (Rug-Pull) Walkthrough
+
+> **WARNING: This is a demonstration fixture, NOT a real attack.**
+> No live MCP server is contacted. The script feeds eight synthetic
+> tool descriptions through `post-mcp-verify` and shows that the
+> v7.3.0 cumulative-drift advisory fires when per-update detection
+> would have stayed silent.
+
+## What this demonstrates
+
+**OWASP MCP05 — Rug Pull.** A trusted MCP server is updated in a series
+of tiny edits. Each individual update stays under the 10% per-update
+Levenshtein threshold, so the original v7.x detection (added before
+E14) never trips. But after seven small edits the description has
+drifted >25% from the original baseline — the tool now reads "Search
+the local data store" instead of "Search the internal knowledge base
+... for the agent to read".
+
+`v7.3.0 (E14)` added a sticky **baseline** to the MCP description cache.
+Every tool call computes:
+
+- `per_update = levenshtein(current, previous) / |previous|` — fires above 10%
+- `cumulative = levenshtein(current, baseline) / max(|current|, |baseline|)` — fires at 25%
+
+This walkthrough proves the cumulative leg catches the slow burn that
+the per-update leg misses.
+
+## Drift profile
+
+| Stage | Edit | per-update | cumulative | Advisory |
+|-------|------|-----------:|-----------:|----------|
+| 0 | baseline | — | 0.0% | seeded only |
+| 1 | agent → user | 3.3% | 3.3% | none |
+| 2 | ranked → scored | 3.3% | 6.6% | none |
+| 3 | short → brief | 4.2% | 10.7% | none |
+| 4 | documents → files | 5.8% | 16.5% | none |
+| 5 | internal → local | 5.2% | 21.5% | none |
+| 6 | base → store | 3.5% | 24.8% | none (just under threshold) |
+| 7 | knowledge → data | 7.9% | **32.2%** | **mcp-cumulative-drift (MEDIUM)** |
+
+The exact ratios are reproduced by `string-utils.levenshtein()` — see
+`expected-findings.md` for the testable contract.
+
+## How to run
+
+```bash
+cd plugins/llm-security
+node examples/mcp-rug-pull/run-rug-pull.mjs
+
+# Detailed: show stderr + final cache state
+node examples/mcp-rug-pull/run-rug-pull.mjs --verbose
+```
+
+Expected: `8 pass, 0 fail`. Stage 7 produces a `SECURITY ADVISORY
+(post-mcp-verify)` containing `mcp-cumulative-drift` and the literal
+phrase `Slow-burn rug-pull may evade per-update detection`.
+
+## Hooks / scanners involved
+
+- **`hooks/scripts/post-mcp-verify.mjs`** — the only hook invoked.
+  Calls into `scanners/lib/mcp-description-cache.mjs::checkDescriptionDrift()`
+  for the actual drift math.
+- **`scanners/lib/mcp-description-cache.mjs`** — the cache library.
+  Stores `{ description, firstSeen, lastSeen, baseline, history }` per
+  tool. Baseline survives the 7-day TTL purge.
+
+## Cache isolation
+
+`post-mcp-verify` honors the `LLM_SECURITY_MCP_CACHE_FILE` environment
+variable (added in v7.3.0 specifically for testing/demos). The script:
+
+1. Creates a temp directory with `mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'))`
+2. Points the cache at a file inside that tempdir
+3. Spawns each hook invocation with the env var set
+4. Removes the entire tempdir in a `finally` block before exit
+
+**Your real `~/.cache/llm-security/mcp-descriptions.json` is never
+touched.** This is the same pattern used by the unit tests under
+`tests/lib/mcp-description-cache.test.mjs`.
+
+## Resetting baseline after a legitimate upgrade
+
+Real MCP servers do upgrade their descriptions occasionally — that's
+not always an attack. After confirming the upgrade is genuine, run:
+
+```
+/security mcp-baseline-reset                    # clear all baselines
+/security mcp-baseline-reset --target mcp__foo  # clear one tool
+/security mcp-baseline-reset --list             # see current baselines
+```
+
+The next call to `checkDescriptionDrift` after a clear will re-seed
+the baseline from whatever incoming description appears. `description`,
+`firstSeen`, `lastSeen`, and `history` are preserved for audit.
+
+## OWASP / framework mapping
+
+| Code | Framework | Why |
+|------|-----------|-----|
+| MCP05 | OWASP MCP Top 10 | Rug-pull / unauthorized tool description change |
+| LLM03 | OWASP LLM Top 10 | Supply-chain — compromised MCP server delivers altered behavior |
+| ASI04 | OWASP Agentic Top 10 | Untrusted-tool-influence on agent behavior |
+
+## Limitations
+
+- The walkthrough demonstrates only the `mcp-cumulative-drift` MEDIUM
+  advisory. It does not exercise:
+  - Per-update advisory firing (above 10% in one step) — covered by the
+    older v6.x test suite
+  - Cache TTL purge (7 days) — would require time mocking
+  - History rolling cap (10 events FIFO) — emerges naturally over use
+- This is a description-only rug-pull. Behavior changes that don't show
+  up in the description (e.g. the server returns different *content*
+  while keeping its description) are detected by other layers
+  (`post-session-guard` data flow tagging, `post-mcp-verify` content
+  scanning of `tool_output`).
+
+## See also
+
+- `docs/security-hardening-guide.md` §6 — calibration story for v7.3.0
+- `commands/mcp-baseline-reset.md` — when and how to reset
+- `tests/lib/mcp-description-cache.test.mjs` — unit-test contract
+- `examples/lethal-trifecta-walkthrough/` — adjacent demonstration of
+  another runtime hook
+- `expected-findings.md` (in this folder) — the testable contract
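
The two ratios defined in the README's "What this demonstrates" section can be sketched as below. This is a minimal illustration under stated assumptions, not the plugin's code: `levenshtein` is a textbook implementation standing in for `string-utils.levenshtein()`, `driftRatios` is a hypothetical helper, the input strings are invented, and the comparison operators (strictly above 10%, at or above 25%) are read off the README wording rather than confirmed against the source.

```javascript
// Hypothetical sketch; names and thresholds' inclusivity are assumptions.

// Textbook two-row dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const PER_UPDATE_THRESHOLD = 0.10; // README: "fires above 10%"
const CUMULATIVE_THRESHOLD = 0.25; // README: "fires at 25%"

// Computes both drift legs for one incoming description.
function driftRatios(baseline, previous, current) {
  const perUpdate = levenshtein(current, previous) / previous.length;
  const cumulative =
    levenshtein(current, baseline) / Math.max(current.length, baseline.length);
  return {
    perUpdate,
    cumulative,
    perUpdateFires: perUpdate > PER_UPDATE_THRESHOLD,
    cumulativeFires: cumulative >= CUMULATIVE_THRESHOLD,
  };
}

// Invented 20-character strings: four characters already changed,
// then one more small edit arrives.
const demo = driftRatios(
  "abcdefghijklmnopqrst", // baseline
  "ABCDefghijklmnopqrst", // previous (4 edits from baseline)
  "ABCDEfghijklmnopqrst"  // current  (1 edit from previous)
);
console.log(demo);
```

In this invented case the latest edit is 1/20 of the previous description, so the per-update leg stays quiet, while the accumulated 5/20 against the baseline reaches the cumulative threshold. That is the same shape as stage 7 of the drift profile, though the README's actual percentages come from its own fixture strings.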