ktg-plugin-marketplace/plugins/llm-security/examples/mcp-rug-pull/README.md
Kjell Tore Guttormsen 583a78c6cc feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]
Companion to 8df5d5c (which only carried the doc updates — the example
directories themselves were left out of staging by mistake). This
commit adds the actual example directories:

- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
  expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
  expected-findings.md}

Plus plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo, prompt-injection-
showcase, lethal-trifecta-walkthrough, and mcp-rug-pull plus the
state-isolation discipline notes.

Marketplace root README unchanged since plugin's outward coverage
is unchanged ([skip-docs] covers the marketplace-level gate).
2026-05-05 14:45:39 +02:00

# MCP Cumulative Drift (Rug-Pull) Walkthrough
> **WARNING: This is a demonstration fixture, NOT a real attack.**
> No live MCP server is contacted. The script feeds eight synthetic
> tool descriptions through `post-mcp-verify` and shows that the
> v7.3.0 cumulative-drift advisory fires when per-update detection
> would have stayed silent.
## What this demonstrates
**OWASP MCP05 — Rug Pull.** A trusted MCP server is updated in a series
of tiny edits. Each individual update stays under the 10% per-update
Levenshtein threshold, so the original v7.x detection (added before
E14) never trips. But after seven small edits the description has
drifted >25% from the original baseline — the tool now reads "Search
the local data store" instead of "Search the internal knowledge base
... for the agent to read".
`v7.3.0 (E14)` added a sticky **baseline** to the MCP description cache.
Every tool call computes:
- `per_update = levenshtein(current, previous) / |previous|`, which fires above 10%
- `cumulative = levenshtein(current, baseline) / max(|current|, |baseline|)`, which fires at 25%

This walkthrough proves the cumulative leg catches the slow burn that
the per-update leg misses.
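As a rough illustration of the two ratios, the sketch below uses a textbook Levenshtein implementation as a stand-in for the plugin's `string-utils.levenshtein()`. The function name `driftRatios`, the threshold constants, and the toy strings are assumptions made for this example and are not the plugin's actual API. The comparison operators follow the wording above ("above 10%", "at 25%").

```javascript
// Textbook two-row Levenshtein distance (stand-in, not the plugin's code).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed constants mirroring the thresholds quoted in this README.
const PER_UPDATE_THRESHOLD = 0.10;
const CUMULATIVE_THRESHOLD = 0.25;

// Hypothetical helper: computes both legs for one description update.
// Assumes `previous` is non-empty, since it is used as a denominator.
function driftRatios(baseline, previous, current) {
  const perUpdate = levenshtein(current, previous) / previous.length;
  const cumulative =
    levenshtein(current, baseline) / Math.max(current.length, baseline.length);
  return {
    perUpdate,
    cumulative,
    perUpdateFires: perUpdate > PER_UPDATE_THRESHOLD,
    cumulativeFires: cumulative >= CUMULATIVE_THRESHOLD,
  };
}

// Toy drift: one character changes per update in a 20-character description.
const toyBaseline = 'a'.repeat(20);
let toyPrevious = toyBaseline;
for (let step = 1; step <= 5; step++) {
  const current = 'b'.repeat(step) + 'a'.repeat(20 - step);
  const r = driftRatios(toyBaseline, toyPrevious, current);
  console.log(step, r.perUpdate.toFixed(2), r.cumulative.toFixed(2), r.cumulativeFires);
  toyPrevious = current;
}
```

In this toy run every update changes 1 of 20 characters (5% per update), so the per-update leg never fires, while the cumulative ratio reaches 25% on the fifth update.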
## Drift profile
| Stage | Edit | per-update | cumulative | Advisory |
|-------|------|-----------:|-----------:|----------|
| 0 | baseline | — | 0.0% | seeded only |
| 1 | agent → user | 3.3% | 3.3% | none |
| 2 | ranked → scored | 3.3% | 6.6% | none |
| 3 | short → brief | 4.2% | 10.7% | none |
| 4 | documents → files | 5.8% | 16.5% | none |
| 5 | internal → local | 5.2% | 21.5% | none |
| 6 | base → store | 3.5% | 24.8% | none (just under threshold) |
| 7 | knowledge → data | 7.9% | **32.2%** | **mcp-cumulative-drift (MEDIUM)** |

The exact ratios are reproduced by `string-utils.levenshtein()`; see
`expected-findings.md` for the testable contract.
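To make the table easier to check by eye, this small sketch replays its percentages against the two thresholds. The numbers are copied from the table; the variable names and the `>` / `>=` comparisons are assumptions based on the "above 10%" and "at 25%" wording.

```javascript
// Ratios copied from the drift profile table, in percent.
const stages = [
  { stage: 1, perUpdate: 3.3, cumulative: 3.3 },
  { stage: 2, perUpdate: 3.3, cumulative: 6.6 },
  { stage: 3, perUpdate: 4.2, cumulative: 10.7 },
  { stage: 4, perUpdate: 5.8, cumulative: 16.5 },
  { stage: 5, perUpdate: 5.2, cumulative: 21.5 },
  { stage: 6, perUpdate: 3.5, cumulative: 24.8 },
  { stage: 7, perUpdate: 7.9, cumulative: 32.2 },
];

const PER_UPDATE_PCT = 10; // per-update leg fires above this
const CUMULATIVE_PCT = 25; // cumulative leg fires at this value

// Stages where the per-update leg alone would have raised an advisory.
const perUpdateHits = stages
  .filter((s) => s.perUpdate > PER_UPDATE_PCT)
  .map((s) => s.stage);

// First stage where the cumulative leg raises an advisory.
const firstCumulativeHit =
  stages.find((s) => s.cumulative >= CUMULATIVE_PCT)?.stage ?? null;

console.log({ perUpdateHits, firstCumulativeHit });
```

With the table's values, no stage trips the per-update leg, and stage 7 is the first to reach the cumulative threshold.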
## How to run
```bash
cd plugins/llm-security
node examples/mcp-rug-pull/run-rug-pull.mjs
# Detailed: show stderr + final cache state
node examples/mcp-rug-pull/run-rug-pull.mjs --verbose
```
Expected: `8 pass, 0 fail`. Stage 7 produces a `SECURITY ADVISORY
(post-mcp-verify)` containing `mcp-cumulative-drift` and the literal
phrase `Slow-burn rug-pull may evade per-update detection`.
## Hooks / scanners involved
- **`hooks/scripts/post-mcp-verify.mjs`** — the only hook invoked.
Calls into `scanners/lib/mcp-description-cache.mjs::checkDescriptionDrift()`
for the actual drift math.
- **`scanners/lib/mcp-description-cache.mjs`** — the cache library.
Stores `{ description, firstSeen, lastSeen, baseline, history }` per
tool. Baseline survives the 7-day TTL purge.
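The entry shape can be pictured with a minimal sketch like the one below. It is not the code in `mcp-description-cache.mjs`: the function name `recordDescription`, the shape of a history event, and the use of `null` for a missing baseline are all assumptions. It models only the sticky baseline and the 10-event FIFO history cap mentioned under Limitations. The TTL purge is not modeled.

```javascript
const HISTORY_CAP = 10; // "10 events FIFO" per the Limitations section

// Hypothetical update routine for one tool's cache entry.
function recordDescription(entry, description, now = Date.now()) {
  if (!entry) {
    // First sighting: the incoming description seeds the baseline.
    return { description, firstSeen: now, lastSeen: now, baseline: description, history: [] };
  }
  // Record an event only when the text changed; keep the newest HISTORY_CAP events.
  const history =
    entry.description === description
      ? entry.history
      : [...entry.history, { at: now, from: entry.description, to: description }].slice(-HISTORY_CAP);
  return {
    ...entry,
    description,
    lastSeen: now,
    baseline: entry.baseline ?? description, // sticky: re-seeded only if missing
    history,
  };
}

// Thirteen sightings: one seed plus twelve changes.
let entry = null;
for (let i = 0; i <= 12; i++) {
  entry = recordDescription(entry, `description v${i}`, i);
}
console.log(entry.baseline, '|', entry.description, '|', entry.history.length);
```

After the loop the baseline is still the first description, the current description is the latest one, and only the ten most recent change events remain in `history`.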
## Cache isolation
`post-mcp-verify` honors the `LLM_SECURITY_MCP_CACHE_FILE` environment
variable (added in v7.3.0 specifically for testing and demos). The script:
1. Creates a temporary directory with `mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'))`
2. Points the cache at a file inside that tempdir
3. Spawns each hook invocation with the env var set
4. Removes the entire tempdir in a `finally` block before exit

**Your real `~/.cache/llm-security/mcp-descriptions.json` is never
touched.** This is the same pattern used by the unit tests under
`tests/lib/mcp-description-cache.test.mjs`.
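The isolation pattern can be sketched roughly as follows. This is not the contents of `run-rug-pull.mjs`: the helper name `runIsolated` and the cache file name inside the tempdir are assumptions, and the child process here is a stand-in that just echoes the environment variable instead of running the real hook.

```javascript
import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { spawnSync } from 'node:child_process';

// Hypothetical helper: run a Node child with the cache redirected to a tempdir.
function runIsolated(childArgs) {
  const dir = mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'));
  try {
    const cacheFile = join(dir, 'mcp-descriptions.json'); // assumed file name
    const result = spawnSync(process.execPath, childArgs, {
      env: { ...process.env, LLM_SECURITY_MCP_CACHE_FILE: cacheFile },
      encoding: 'utf8',
    });
    return { dir, cacheFile, stdout: result.stdout, status: result.status };
  } finally {
    // Runs even if the spawn throws, so the tempdir never outlives the demo.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Stand-in child: prints the env var it received.
const out = runIsolated(['-e', 'console.log(process.env.LLM_SECURITY_MCP_CACHE_FILE)']);
console.log('child saw isolated cache path:', out.stdout.trim() === out.cacheFile);
console.log('tempdir still exists:', existsSync(out.dir));
```

Because cleanup lives in `finally`, the temporary directory is removed whether the child succeeds, fails, or the spawn itself throws.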
## Resetting baseline after a legitimate upgrade
Real MCP servers do upgrade their descriptions occasionally — that's
not always an attack. After confirming the upgrade is genuine, run:
```
/security mcp-baseline-reset # clear all baselines
/security mcp-baseline-reset --target mcp__foo # clear one tool
/security mcp-baseline-reset --list # see current baselines
```
The next call to `checkDescriptionDrift` after a clear will re-seed
the baseline from whatever incoming description appears. `description`,
`firstSeen`, `lastSeen`, and `history` are preserved for audit.
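The reset semantics described here might look like the following sketch. The helper names `clearBaseline` and `ensureBaseline`, and the use of `null` to mark a cleared baseline, are assumptions for illustration. How the real cache represents a cleared baseline is not stated in this README.

```javascript
// Hypothetical: clear only the baseline, leaving audit fields intact.
function clearBaseline(entry) {
  return { ...entry, baseline: null };
}

// Hypothetical: re-seed the baseline from the incoming description if it is missing.
function ensureBaseline(entry, incomingDescription) {
  return entry.baseline == null
    ? { ...entry, baseline: incomingDescription }
    : entry;
}

const beforeReset = {
  description: 'Search the local data store',
  firstSeen: 1,
  lastSeen: 8,
  baseline: 'original baseline text',
  history: [{ at: 8 }],
};

const cleared = clearBaseline(beforeReset);
const reseeded = ensureBaseline(cleared, 'Search the local data store');

console.log(reseeded.baseline);
console.log(reseeded.firstSeen, reseeded.lastSeen, reseeded.history.length);
```

After the reset, the next incoming description becomes the new baseline, while `firstSeen`, `lastSeen`, and `history` carry over unchanged.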
## OWASP / framework mapping
| Code | Framework | Why |
|------|-----------|-----|
| MCP05 | OWASP MCP Top 10 | Rug-pull / unauthorized tool description change |
| LLM03 | OWASP LLM Top 10 | Supply-chain — compromised MCP server delivers altered behavior |
| ASI04 | OWASP Agentic Top 10 | Untrusted-tool-influence on agent behavior |
## Limitations
- The walkthrough demonstrates only the `mcp-cumulative-drift` MEDIUM
  advisory. It does not exercise:
  - Per-update advisory firing (above 10% in one step), covered by the
    older v6.x test suite
  - Cache TTL purge (7 days), which would require time mocking
  - History rolling cap (10 events FIFO), which emerges naturally over use
- This is a description-only rug-pull. Behavior changes that don't show
  up in the description (e.g. the server returns different *content*
  while keeping its description) are detected by other layers
  (`post-session-guard` data flow tagging, `post-mcp-verify` content
  scanning of `tool_output`).
## See also
- `docs/security-hardening-guide.md` §6 — calibration story for v7.3.0
- `commands/mcp-baseline-reset.md` — when and how to reset
- `tests/lib/mcp-description-cache.test.mjs` — unit-test contract
- `examples/lethal-trifecta-walkthrough/` — adjacent demonstration of
another runtime hook
- `expected-findings.md` (in this folder) — the testable contract