feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]

Companion to 8df5d5c, which carried only the doc updates; the example directories themselves were left out of staging by mistake. This commit adds the actual example directories:

- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs, expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs, expected-findings.md}

It also adds an "Examples (runnable demonstrations)" section to the plugin CLAUDE.md, with a 4-row table covering malicious-skill-demo, prompt-injection-showcase, lethal-trifecta-walkthrough, and mcp-rug-pull, plus the state-isolation discipline notes. The marketplace root README is unchanged because the plugin's outward coverage is unchanged ([skip-docs] covers the marketplace-level gate).
2026-05-05 14:45:39 +02:00 · 2026-05-05 14:45:39 +02:00 · 583a78c6cc
commit 583a78c6cc
parent 8df5d5c70e
7 changed files with 739 additions and 0 deletions
--- a/plugins/llm-security/examples/mcp-rug-pull/README.md
+++ b/plugins/llm-security/examples/mcp-rug-pull/README.md
@@ -0,0 +1,125 @@
+# MCP Cumulative Drift (Rug-Pull) Walkthrough
+
+> **WARNING: This is a demonstration fixture, NOT a real attack.**
+> No live MCP server is contacted. The script feeds eight synthetic
+> tool descriptions through `post-mcp-verify` and shows that the
+> v7.3.0 cumulative-drift advisory fires when per-update detection
+> would have stayed silent.
+
+## What this demonstrates
+
+**OWASP MCP05 — Rug Pull.** A trusted MCP server is updated in a series
+of tiny edits. Each individual update stays under the 10% per-update
+Levenshtein threshold, so the original v7.x detection (added before
+E14) never trips. But after seven small edits the description has
+drifted >25% from the original baseline — the tool now reads "Search
+the local data store" instead of "Search the internal knowledge base
+... for the agent to read".
+
+`v7.3.0 (E14)` added a sticky **baseline** to the MCP description cache.
+Every tool call computes:
+
+- `per_update = levenshtein(current, previous) / |previous|` — fires above 10%
+- `cumulative = levenshtein(current, baseline) / max(|current|, |baseline|)` — fires at 25%
+
+This walkthrough proves the cumulative leg catches the slow burn that
+the per-update leg misses.
+
+## Drift profile
+
+| Stage | Edit | per-update | cumulative | Advisory |
+|-------|------|-----------:|-----------:|----------|
+| 0 | baseline | — | 0.0% | seeded only |
+| 1 | agent → user | 3.3% | 3.3% | none |
+| 2 | ranked → scored | 3.3% | 6.6% | none |
+| 3 | short → brief | 4.2% | 10.7% | none |
+| 4 | documents → files | 5.8% | 16.5% | none |
+| 5 | internal → local | 5.2% | 21.5% | none |
+| 6 | base → store | 3.5% | 24.8% | none (just under threshold) |
+| 7 | knowledge → data | 7.9% | **32.2%** | **mcp-cumulative-drift (MEDIUM)** |
+
+The exact ratios are reproduced by `string-utils.levenshtein()` — see
+`expected-findings.md` for the testable contract.
+
+## How to run
+
+```bash
+cd plugins/llm-security
+node examples/mcp-rug-pull/run-rug-pull.mjs
+
+# Detailed: show stderr + final cache state
+node examples/mcp-rug-pull/run-rug-pull.mjs --verbose
+```
+
+Expected: `8 pass, 0 fail`. Stage 7 produces a `SECURITY ADVISORY
+(post-mcp-verify)` containing `mcp-cumulative-drift` and the literal
+phrase `Slow-burn rug-pull may evade per-update detection`.
+
+## Hooks / scanners involved
+
+- **`hooks/scripts/post-mcp-verify.mjs`** — the only hook invoked.
+  Calls into `scanners/lib/mcp-description-cache.mjs::checkDescriptionDrift()`
+  for the actual drift math.
+- **`scanners/lib/mcp-description-cache.mjs`** — the cache library.
+  Stores `{ description, firstSeen, lastSeen, baseline, history }` per
+  tool. Baseline survives the 7-day TTL purge.
+
+## Cache isolation
+
+`post-mcp-verify` honors the `LLM_SECURITY_MCP_CACHE_FILE` environment
+variable (added in v7.3.0 specifically for testing and demos). The script:
+
+1. Creates a temp directory with `mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'))`
+2. Points the cache at a file inside that tempdir
+3. Spawns each hook invocation with the env var set
+4. Removes the entire tempdir in `finally{}` before exit
+
+**Your real `~/.cache/llm-security/mcp-descriptions.json` is never
+touched.** This is the same pattern used by the unit tests under
+`tests/lib/mcp-description-cache.test.mjs`.
+
+## Resetting baseline after a legitimate upgrade
+
+Real MCP servers do upgrade their descriptions occasionally — that's
+not always an attack. After confirming the upgrade is genuine, run:
+
+```
+/security mcp-baseline-reset                    # clear all baselines
+/security mcp-baseline-reset --target mcp__foo  # clear one tool
+/security mcp-baseline-reset --list             # see current baselines
+```
+
+The next call to `checkDescriptionDrift` after a clear will re-seed
+the baseline from whatever incoming description appears. `description`,
+`firstSeen`, `lastSeen`, and `history` are preserved for audit.
+
+## OWASP / framework mapping
+
+| Code | Framework | Why |
+|------|-----------|-----|
+| MCP05 | OWASP MCP Top 10 | Rug-pull / unauthorized tool description change |
+| LLM03 | OWASP LLM Top 10 | Supply-chain — compromised MCP server delivers altered behavior |
+| ASI04 | OWASP Agentic Top 10 | Untrusted-tool-influence on agent behavior |
+
+## Limitations
+
+- The walkthrough demonstrates only the `mcp-cumulative-drift` MEDIUM
+  advisory. It does not exercise:
+  - Per-update advisory firing (above 10% in one step) — covered by the
+    older v6.x test suite
+  - Cache TTL purge (7 days) — would require time mocking
+  - History rolling cap (10 events FIFO) — emerges naturally over use
+- This is a description-only rug-pull. Behavior changes that don't show
+  up in the description (e.g. the server returns different *content*
+  while keeping its description) are detected by other layers
+  (`post-session-guard` data flow tagging, `post-mcp-verify` content
+  scanning of `tool_output`).
+
+## See also
+
+- `docs/security-hardening-guide.md` §6 — calibration story for v7.3.0
+- `commands/mcp-baseline-reset.md` — when and how to reset
+- `tests/lib/mcp-description-cache.test.mjs` — unit-test contract
+- `examples/lethal-trifecta-walkthrough/` — adjacent demonstration of
+  another runtime hook
+- `expected-findings.md` (in this folder) — the testable contract
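
As a rough illustration of the two ratios defined in the README above, the sketch below recomputes them for the stage 6 to stage 7 step. The `levenshtein` function is a textbook dynamic-programming implementation written for this note, and `driftRatios` is a hypothetical helper; neither is the plugin's `string-utils` code. The thresholds are the 10% and 25% values quoted in the README.

```javascript
// Sketch only: a textbook Levenshtein distance, standing in for the
// plugin's string-utils helper (whose implementation is not shown here).
function levenshtein(a, b) {
  // At the start of iteration i, prev[j] is the distance between the
  // first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper mirroring the two formulas from the README.
function driftRatios(baseline, previous, current) {
  return {
    perUpdate: levenshtein(current, previous) / previous.length,
    cumulative:
      levenshtein(current, baseline) / Math.max(current.length, baseline.length),
  };
}

// Stage 0, 6 and 7 descriptions, copied from run-rug-pull.mjs.
const baseline = 'Search the internal knowledge base. Returns a ranked list of matching documents and short snippets for the agent to read.';
const stage6 = 'Search the local knowledge store. Returns a scored list of matching files and brief snippets for the user to read.';
const stage7 = 'Search the local data store. Returns a scored list of matching files and brief snippets for the user to read.';

const r = driftRatios(baseline, stage6, stage7);
console.log(`per-update trips: ${r.perUpdate > 0.10}`);
console.log(`cumulative trips: ${r.cumulative >= 0.25}`);
```

Passing `previous` and `baseline` separately keeps the two legs independent, which is the point of the walkthrough: the same `current` string can look harmless against one reference and suspicious against the other.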
--- a/plugins/llm-security/examples/mcp-rug-pull/expected-findings.md
+++ b/plugins/llm-security/examples/mcp-rug-pull/expected-findings.md
@@ -0,0 +1,64 @@
+# Expected Findings — MCP Cumulative Drift Walkthrough
+
+This is the testable contract. `run-rug-pull.mjs` exits 0 only when
+every row matches.
+
+## Per-stage contract
+
+| Stage | per-update advisory | cumulative advisory | OWASP |
+|-------|---------------------|---------------------|-------|
+| 0 | no | no (baseline seeded) | — |
+| 1 | no | no | — |
+| 2 | no | no | — |
+| 3 | no | no | — |
+| 4 | no | no | — |
+| 5 | no | no | — |
+| 6 | no | no (cum=24.8%, just under 25%) | — |
+| 7 | **no** | **YES** | MCP05, LLM03 |
+
+The hook output is JSON `{systemMessage: "..."}` containing
+`SECURITY ADVISORY (post-mcp-verify): Potential data leakage detected.`
+followed by an enumerated advisory. The `mcp-cumulative-drift`
+advisory at stage 7 includes:
+
+- The literal phrase `MCP tool cumulative description drift — MEDIUM`
+- The OWASP tag `(mcp-cumulative-drift, OWASP MCP05)`
+- The phrase `Slow-burn rug-pull may evade per-update detection`
+- A baseline preview matching stage 0's text
+- A current preview matching stage 7's text
+- A pointer to `/security mcp-baseline-reset`
+
+## Drift math (verifiable)
+
+These ratios are produced by
+`scanners/lib/string-utils.mjs::levenshtein()`:
+
+| Stage | Levenshtein vs prev | Levenshtein vs baseline | per_update | cumulative |
+|-------|-------------------:|------------------------:|-----------:|-----------:|
+| 1 | 4 | 4 | 3.3% | 3.3% |
+| 2 | 4 | 8 | 3.3% | 6.6% |
+| 3 | 5 | 13 | 4.2% | 10.7% |
+| 4 | 7 | 20 | 5.8% | 16.5% |
+| 5 | 6 | 26 | 5.2% | 21.5% |
+| 6 | 4 | 30 | 3.5% | 24.8% |
+| 7 | 9 | 39 | 7.9% | **32.2%** |
+
+per_update threshold = 0.10 → never tripped.
+cumulative threshold = 0.25 → tripped at stage 7.
+
+## Cache state at end (verbose mode)
+
+`mcp__knowledge__search` entry should contain:
+
+- `baseline.description` = stage 0 text (immutable since stage 0)
+- `description` = stage 7 text (last seen)
+- `history.length` = 7 (one entry per stage 1-7)
+- `firstSeen` and `lastSeen` set to runtime millis
+- No `clearBaseline()` was called, so baseline is still present
+
+## Side effects
+
+- Cache file is written to `mkdtemp` directory provided via env var
+- Cache directory is removed by `finally{}` block on exit
+- No MCP audit-trail event (audit trail not configured for this demo)
+- No interaction with `~/.cache/llm-security/`
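
The stage 7 advisory contract listed in this file lends itself to a simple string check. In the sketch below, `REQUIRED_PHRASES` copies the literal phrases from the contract, while `missingPhrases` and the sample advisory text are hypothetical and exist only to exercise the check; real hook output may be laid out differently.

```javascript
// Literal phrases copied from the stage-7 contract in expected-findings.md.
const REQUIRED_PHRASES = [
  'MCP tool cumulative description drift — MEDIUM',
  '(mcp-cumulative-drift, OWASP MCP05)',
  'Slow-burn rug-pull may evade per-update detection',
  '/security mcp-baseline-reset',
];

// Hypothetical helper: returns the contract phrases absent from one advisory.
function missingPhrases(advisory) {
  return REQUIRED_PHRASES.filter((phrase) => !advisory.includes(phrase));
}

// Made-up advisory text, assembled only to exercise the check.
const sampleAdvisory = [
  'MCP tool cumulative description drift — MEDIUM (mcp-cumulative-drift, OWASP MCP05)',
  'Slow-burn rug-pull may evade per-update detection.',
  'After confirming a legitimate upgrade, run /security mcp-baseline-reset',
].join('\n');

console.log(missingPhrases(sampleAdvisory));        // phrases still missing
console.log(missingPhrases('unrelated text').length);
```

Returning the list of missing phrases, instead of a bare boolean, makes a failing run easier to diagnose.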
--- a/plugins/llm-security/examples/mcp-rug-pull/run-rug-pull.mjs
+++ b/plugins/llm-security/examples/mcp-rug-pull/run-rug-pull.mjs
@@ -0,0 +1,191 @@
+#!/usr/bin/env node
+// run-rug-pull.mjs — MCP slow-burn rug-pull demonstration (v7.3.0 / E14)
+// Mutates an MCP tool description across 7 stages — each edit stays UNDER
+// the per-update threshold (10% Levenshtein vs previous), but cumulatively
+// the description diverges >=25% from the original baseline. Demonstrates
+// that `post-mcp-verify` emits a `mcp-cumulative-drift` MEDIUM advisory
+// even when per-update detection never trips.
+//
+// State isolation:
+//   The hook reads MCP description cache via LLM_SECURITY_MCP_CACHE_FILE.
+//   This script points it at a file in a private temp directory, which it removes
+//   on exit. The user's real ~/.cache/llm-security/mcp-descriptions.json
+//   is never touched.
+//
+// Usage:
+//   cd plugins/llm-security
+//   node examples/mcp-rug-pull/run-rug-pull.mjs
+//   node examples/mcp-rug-pull/run-rug-pull.mjs --verbose
+
+import { execFile } from 'node:child_process';
+import { existsSync, mkdtempSync, readFileSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { resolve, dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const PLUGIN_ROOT = resolve(__dirname, '../..');
+const HOOK = resolve(PLUGIN_ROOT, 'hooks/scripts/post-mcp-verify.mjs');
+const VERBOSE = process.argv.includes('--verbose');
+
+// ---------------------------------------------------------------------------
+// Description sequence — slow-burn rug-pull
+// ---------------------------------------------------------------------------
+//
+// Baseline: a benign-looking knowledge-base MCP server description.
+// Each subsequent stage applies a small edit. Per-update Levenshtein stays
+// under 10%, but by stage 7 the cumulative distance to baseline is >=25%
+// and the wording has moved well away from the original, even though
+// every stage still reads as a harmless search tool.
+
+// Each stage swaps one word, costing at most 9 edits vs the previous stage
+// (per-update <10%), and every swap moves "internal knowledge base / agent /
+// short snippets" toward "local data store / user / brief snippets". By
+// stage 7 the cumulative Levenshtein ratio vs baseline crosses 25%, so the
+// rug-pull is detected though no single update tripped per-update detection.
+const STAGES = [
+  // 0 — baseline (121 chars). All future drift is measured against this.
+  'Search the internal knowledge base. Returns a ranked list of matching documents and short snippets for the agent to read.',
+  // 1 — "agent" → "user"  (per=3.3%, cum=3.3%)
+  'Search the internal knowledge base. Returns a ranked list of matching documents and short snippets for the user to read.',
+  // 2 — "ranked" → "scored"  (per=3.3%, cum=6.6%)
+  'Search the internal knowledge base. Returns a scored list of matching documents and short snippets for the user to read.',
+  // 3 — "short" → "brief"  (per=4.2%, cum=10.7%)
+  'Search the internal knowledge base. Returns a scored list of matching documents and brief snippets for the user to read.',
+  // 4 — "documents" → "files"  (per=5.8%, cum=16.5%)
+  'Search the internal knowledge base. Returns a scored list of matching files and brief snippets for the user to read.',
+  // 5 — "internal" → "local"  (per=5.2%, cum=21.5%)
+  'Search the local knowledge base. Returns a scored list of matching files and brief snippets for the user to read.',
+  // 6 — "base" → "store"  (per=3.5%, cum=24.8% — still under cumulative threshold)
+  'Search the local knowledge store. Returns a scored list of matching files and brief snippets for the user to read.',
+  // 7 — "knowledge" → "data"  (per=7.9%, cum=32.2% — CUMULATIVE TRIPS, per-update does not)
+  'Search the local data store. Returns a scored list of matching files and brief snippets for the user to read.',
+];
+
+// ---------------------------------------------------------------------------
+// Hook runner
+// ---------------------------------------------------------------------------
+
+function runHook(input, env) {
+  return new Promise((res) => {
+    const child = execFile(
+      'node',
+      [HOOK],
+      { timeout: 5000, env: { ...process.env, ...env } },
+      (_err, stdout, stderr) => {
+        res({ code: child.exitCode ?? 1, stdout: stdout || '', stderr: stderr || '' });
+      },
+    );
+    child.stdin.end(JSON.stringify(input));
+  });
+}
+
+function parseAdvisories(stdout) {
+  const trimmed = stdout.trim();
+  if (!trimmed.startsWith('{')) return [];
+  try {
+    const parsed = JSON.parse(trimmed);
+    if (!parsed.systemMessage) return [];
+    // Hook joins multiple advisories with `\n\n---\n\n` (see post-mcp-verify.mjs)
+    return parsed.systemMessage.split('\n\n---\n\n');
+  } catch {
+    return [];
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Main
+// ---------------------------------------------------------------------------
+
+const tmpDir = mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'));
+const cacheFile = join(tmpDir, 'mcp-descriptions.json');
+
+console.log('MCP CUMULATIVE DRIFT (RUG-PULL) WALKTHROUGH');
+console.log('===========================================');
+console.log(`Cache file (deleted on exit): ${cacheFile}\n`);
+console.log('Per-update threshold:  10% Levenshtein vs previous description');
+console.log('Cumulative threshold:  25% Levenshtein vs sticky baseline');
+console.log('OWASP MCP05 (Rug Pull) — v7.3.0 introduces the cumulative leg.\n');
+
+let pass = 0;
+let fail = 0;
+
+const expectations = [
+  { perUpdate: false, cumulative: false, note: 'baseline seeded — no advisory' },
+  { perUpdate: false, cumulative: false, note: 'agent → user' },
+  { perUpdate: false, cumulative: false, note: 'ranked → scored' },
+  { perUpdate: false, cumulative: false, note: 'short → brief' },
+  { perUpdate: false, cumulative: false, note: 'documents → files' },
+  { perUpdate: false, cumulative: false, note: 'internal → local' },
+  { perUpdate: false, cumulative: false, note: 'base → store (cum=24.8%, just under threshold)' },
+  { perUpdate: false, cumulative: true,  note: 'knowledge → data — CUMULATIVE TRIPS at 32.2%' },
+];
+
+try {
+  for (let i = 0; i < STAGES.length; i++) {
+    const description = STAGES[i];
+    const expect = expectations[i];
+
+    // post-mcp-verify exits early when tool_output is empty — the drift
+    // check only runs on tool calls that actually produce output. We send a
+    // benign placeholder so the description-drift code path executes.
+    const result = await runHook(
+      {
+        tool_name: 'mcp__knowledge__search',
+        tool_input: { description, query: 'demo' },
+        tool_output: 'no results',
+      },
+      { LLM_SECURITY_MCP_CACHE_FILE: cacheFile },
+    );
+
+    const advisories = parseAdvisories(result.stdout);
+    const perUpdateAdv = advisories.find(a => a.includes('description drift detected'));
+    const cumulativeAdv = advisories.find(a => a.includes('cumulative description drift'));
+
+    const perUpdateOk = !!perUpdateAdv === expect.perUpdate;
+    const cumulativeOk = !!cumulativeAdv === expect.cumulative;
+    const ok = perUpdateOk && cumulativeOk;
+    if (ok) pass++; else fail++;
+
+    const tick = ok ? 'PASS' : 'FAIL';
+    const len = description.length;
+    console.log(`[${tick}] Stage ${i} (${len} chars) — ${expect.note}`);
+    console.log(`       per-update advisory: expect=${expect.perUpdate} got=${!!perUpdateAdv}`);
+    console.log(`       cumulative advisory: expect=${expect.cumulative} got=${!!cumulativeAdv}`);
+    console.log(`       description: "${description.slice(0, 80)}${len > 80 ? '...' : ''}"`);
+    if (cumulativeAdv) {
+      const head = cumulativeAdv.split('\n').slice(0, 2).join('\n');
+      console.log(`       advisory preview: "${head.replace(/\n/g, ' / ')}"`);
+    }
+    if (VERBOSE && result.stderr.trim()) {
+      console.log(`       stderr: ${result.stderr.trim().slice(0, 120)}`);
+    }
+    console.log();
+  }
+
+  if (VERBOSE && existsSync(cacheFile)) {
+    const cache = JSON.parse(readFileSync(cacheFile, 'utf-8'));
+    const entry = cache['mcp__knowledge__search'];
+    if (entry) {
+      console.log('Cache state at exit:');
+      console.log(`  baseline.description = "${entry.baseline?.description?.slice(0, 60)}..."`);
+      console.log(`  current.description  = "${entry.description.slice(0, 60)}..."`);
+      console.log(`  history length       = ${entry.history?.length ?? 0}`);
+      console.log();
+    }
+  }
+} finally {
+  rmSync(tmpDir, { recursive: true, force: true });
+}
+
+console.log('---');
+console.log(`Result: ${pass} pass, ${fail} fail`);
+
+if (fail > 0) {
+  console.log('\nFAILURE — see expected-findings.md for the documented contract.');
+  process.exit(1);
+}
+
+console.log('\nSUCCESS — cumulative-drift advisory fired exactly when expected.');
+console.log('Reset MCP baseline after a legitimate upgrade: /security mcp-baseline-reset');
+process.exit(0);
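
The sticky-baseline bookkeeping that this script exercises through the hook can be modelled in memory. The sketch below is an assumption-laden illustration, not the plugin's `mcp-description-cache.mjs`: `observe`, `clearBaseline`, the `seededAt` field, and the shape of each history entry are all hypothetical. It only reflects what the README states: the baseline is seeded on first sight, survives later updates, is re-seeded from the next incoming description after a clear, and history is capped at 10 entries FIFO.

```javascript
const HISTORY_CAP = 10; // README: history rolling cap of 10 events, FIFO

// Hypothetical model: record one observed description for a tool.
function observe(cache, tool, description, now = Date.now()) {
  const entry = cache[tool];
  if (!entry) {
    // First sighting: seed both the current description and the baseline.
    cache[tool] = {
      description,
      firstSeen: now,
      lastSeen: now,
      baseline: { description, seededAt: now }, // seededAt is an invented field
      history: [],
    };
    return cache[tool];
  }
  if (!entry.baseline) {
    // Baseline was cleared earlier: re-seed from the incoming description.
    entry.baseline = { description, seededAt: now };
  }
  if (description !== entry.description) {
    // Invented history shape: one record per change.
    entry.history.push({ at: now, from: entry.description, to: description });
    if (entry.history.length > HISTORY_CAP) entry.history.shift(); // drop oldest
    entry.description = description;
  }
  entry.lastSeen = now;
  return entry;
}

// Hypothetical reset: drops only the baseline and keeps the audit fields.
function clearBaseline(cache, tool) {
  if (cache[tool]) delete cache[tool].baseline;
}

const cache = {};
observe(cache, 'mcp__demo__tool', 'v1', 1);
observe(cache, 'mcp__demo__tool', 'v2', 2);
console.log(cache.mcp__demo__tool.baseline.description); // v1
clearBaseline(cache, 'mcp__demo__tool');
observe(cache, 'mcp__demo__tool', 'v3', 3);
console.log(cache.mcp__demo__tool.baseline.description); // v3
console.log(cache.mcp__demo__tool.history.length);       // 2
```

Keeping `baseline` as a separate field from `description` is what lets a reset clear one without losing `firstSeen`, `lastSeen`, or `history`.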