feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]

Companion to 8df5d5c (which carried only the doc updates; the example
directories themselves were left out of staging by mistake). This
commit adds the actual example directories:

- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
  expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
  expected-findings.md}

Plus a plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo,
prompt-injection-showcase, lethal-trifecta-walkthrough, and
mcp-rug-pull, plus the state-isolation discipline notes.

The marketplace root README is unchanged since the plugin's outward
coverage is unchanged ([skip-docs] covers the marketplace-level gate).
parent 8df5d5c70e
commit 583a78c6cc
7 changed files with 739 additions and 0 deletions
125
plugins/llm-security/examples/mcp-rug-pull/README.md
Normal file
@@ -0,0 +1,125 @@
# MCP Cumulative Drift (Rug-Pull) Walkthrough

> **WARNING: This is a demonstration fixture, NOT a real attack.**
> No live MCP server is contacted. The script feeds eight synthetic
> tool descriptions through `post-mcp-verify` and shows that the
> v7.3.0 cumulative-drift advisory fires when per-update detection
> would have stayed silent.

## What this demonstrates

**OWASP MCP05: Rug Pull.** A trusted MCP server is updated in a series
of tiny edits. Each individual update stays under the 10% per-update
Levenshtein threshold, so the original v7.x detection (added before
E14) never trips. But after seven small edits the description has
drifted >25% from the original baseline: the tool now reads "Search
the local data store" instead of "Search the internal knowledge base
... for the agent to read".

`v7.3.0 (E14)` added a sticky **baseline** to the MCP description cache.
Every tool call computes:

- `per_update = levenshtein(current, previous) / |previous|` (fires above 10%)
- `cumulative = levenshtein(current, baseline) / max(|current|, |baseline|)` (fires at 25%)

This walkthrough proves the cumulative leg catches the slow burn that
the per-update leg misses.
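
The two ratios above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the plugin's code: `levenshtein`, `driftRatios`, and `simulateSlowDrift` are hypothetical names written for this example, the comparison operators (`>` for 10%, `>=` for 25%) are one reading of "fires above" and "fires at", and the 40-character strings are synthetic rather than the fixture's real tool descriptions.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Thresholds as stated in the README; the operators are an assumption.
const PER_UPDATE_THRESHOLD = 0.10;
const CUMULATIVE_THRESHOLD = 0.25;

// Hypothetical helper: computes both legs for one incoming description.
function driftRatios(current, previous, baseline) {
  const perUpdate =
    previous.length === 0 ? 0 : levenshtein(current, previous) / previous.length;
  const denom = Math.max(current.length, baseline.length);
  const cumulative = denom === 0 ? 0 : levenshtein(current, baseline) / denom;
  return {
    perUpdate,
    cumulative,
    perUpdateFires: perUpdate > PER_UPDATE_THRESHOLD,
    cumulativeFires: cumulative >= CUMULATIVE_THRESHOLD,
  };
}

// Synthetic 40-character baseline (not the fixture's real description).
const baseline = "aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd";

// Each step overwrites 3 fresh characters, so every single update is small
// while the distance to the baseline keeps growing.
function simulateSlowDrift(count) {
  let previous = baseline;
  const results = [];
  for (let k = 1; k <= count; k++) {
    const start = (k - 1) * 10;
    const current = previous.slice(0, start) + "xxx" + previous.slice(start + 3);
    results.push(driftRatios(current, previous, baseline));
    previous = current;
  }
  return results;
}

const stages = simulateSlowDrift(4);
stages.forEach((r, i) => {
  console.log(
    `stage ${i + 1}: per-update ${(r.perUpdate * 100).toFixed(1)}%, ` +
      `cumulative ${(r.cumulative * 100).toFixed(1)}%, ` +
      `cumulative fires: ${r.cumulativeFires}`
  );
});
```

With these synthetic inputs each step changes 3 of 40 characters (7.5%), so the per-update check never fires, while the cumulative ratio reaches 30% on the fourth step and crosses the 25% line.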

## Drift profile

| Stage | Edit | per-update | cumulative | Advisory |
|-------|------|-----------:|-----------:|----------|
| 0 | baseline | n/a | 0.0% | seeded only |
| 1 | agent → user | 3.3% | 3.3% | none |
| 2 | ranked → scored | 3.3% | 6.6% | none |
| 3 | short → brief | 4.2% | 10.7% | none |
| 4 | documents → files | 5.8% | 16.5% | none |
| 5 | internal → local | 5.2% | 21.5% | none |
| 6 | base → store | 3.5% | 24.8% | none (just under threshold) |
| 7 | knowledge → data | 7.9% | **32.2%** | **mcp-cumulative-drift (MEDIUM)** |

The exact ratios are reproduced by `string-utils.levenshtein()`; see
`expected-findings.md` for the testable contract.

## How to run

```bash
cd plugins/llm-security
node examples/mcp-rug-pull/run-rug-pull.mjs

# Detailed: show stderr + final cache state
node examples/mcp-rug-pull/run-rug-pull.mjs --verbose
```

Expected: `8 pass, 0 fail`. Stage 7 produces a
`SECURITY ADVISORY (post-mcp-verify)` containing `mcp-cumulative-drift`
and the literal phrase `Slow-burn rug-pull may evade per-update detection`.

## Hooks / scanners involved

- **`hooks/scripts/post-mcp-verify.mjs`**: the only hook invoked.
  Calls into `scanners/lib/mcp-description-cache.mjs::checkDescriptionDrift()`
  for the actual drift math.
- **`scanners/lib/mcp-description-cache.mjs`**: the cache library.
  Stores `{ description, firstSeen, lastSeen, baseline, history }` per
  tool. Baseline survives the 7-day TTL purge.

## Cache isolation

`post-mcp-verify` honors the `LLM_SECURITY_MCP_CACHE_FILE` environment
variable (added in v7.3.0 specifically for testing and demos). The script:

1. Creates a temp directory via `mkdtempSync`, using the prefix
   `llm-security-rugpull-` under the OS temp directory
2. Points the cache at a file inside that tempdir
3. Spawns each hook invocation with the env var set
4. Removes the entire tempdir in `finally {}` before exit

**Your real `~/.cache/llm-security/mcp-descriptions.json` is never
touched.** This is the same pattern used by the unit tests under
`tests/lib/mcp-description-cache.test.mjs`.
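
The four isolation steps above can be sketched as follows. This is an assumption-laden illustration rather than the contents of `run-rug-pull.mjs`: `runWithIsolatedCache` and `STAND_IN_HOOK` are hypothetical names, the cache file name inside the temp directory is made up, and the spawned child is a one-line stand-in that only echoes the environment variable instead of running the real hook. Dynamic `import()` is used only so the snippet runs unchanged as an ES module or a CommonJS script; in a `.mjs` file static imports would be the natural choice.

```javascript
// Stand-in for the hook: prints the cache path it was told to use.
const STAND_IN_HOOK =
  'process.stdout.write(process.env.LLM_SECURITY_MCP_CACHE_FILE || "")';

// Hypothetical helper showing the tempdir + env var + finally pattern.
async function runWithIsolatedCache(childSource) {
  const { mkdtempSync, rmSync, existsSync } = await import("node:fs");
  const { tmpdir } = await import("node:os");
  const { join } = await import("node:path");
  const { spawnSync } = await import("node:child_process");

  // 1. Create a unique temp directory.
  const dir = mkdtempSync(join(tmpdir(), "llm-security-rugpull-"));
  // 2. Point the cache at a file inside it (file name is an assumption).
  const cacheFile = join(dir, "mcp-descriptions.json");

  let seenByChild;
  try {
    // 3. Spawn the child with the env var set; the parent env is untouched.
    const child = spawnSync(process.execPath, ["-e", childSource], {
      env: { ...process.env, LLM_SECURITY_MCP_CACHE_FILE: cacheFile },
      encoding: "utf8",
    });
    seenByChild = child.stdout;
  } finally {
    // 4. Remove the whole temp directory, even if the child failed.
    rmSync(dir, { recursive: true, force: true });
  }
  return { cacheFile, seenByChild, cleanedUp: !existsSync(dir) };
}

runWithIsolatedCache(STAND_IN_HOOK).then((result) => {
  console.log("child saw isolated cache path:", result.seenByChild === result.cacheFile);
  console.log("temp dir removed:", result.cleanedUp);
});
```

Putting the cleanup in `finally` means a crashing child or a thrown error still leaves no temp directory behind, which is what keeps repeated demo runs from interfering with each other.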

## Resetting baseline after a legitimate upgrade

Real MCP servers do upgrade their descriptions occasionally, and that's
not always an attack. After confirming the upgrade is genuine, run:

```
/security mcp-baseline-reset                     # clear all baselines
/security mcp-baseline-reset --target mcp__foo   # clear one tool
/security mcp-baseline-reset --list              # see current baselines
```

The next call to `checkDescriptionDrift` after a clear will re-seed
the baseline from whatever incoming description appears. `description`,
`firstSeen`, `lastSeen`, and `history` are preserved for audit.
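
The reset-then-re-seed behaviour described above can be modelled with a small in-memory cache. This is only a sketch under stated assumptions: `observe` and `resetBaseline` are hypothetical stand-ins (the real `checkDescriptionDrift` signature is not shown in this README), the cache is a plain `Map` rather than the JSON file, and the drift math is left out so that only the baseline lifecycle is visible.

```javascript
const HISTORY_CAP = 10; // rolling FIFO cap mentioned under Limitations

// Hypothetical stand-in for one description check: seeds, re-seeds, or
// simply records the incoming description.
function observe(cache, tool, description, now = Date.now()) {
  const entry = cache.get(tool);
  if (!entry) {
    cache.set(tool, {
      description,
      firstSeen: now,
      lastSeen: now,
      baseline: description,
      history: [],
    });
    return "seeded";
  }
  // A cleared baseline is re-seeded from whatever arrives next.
  const reseeded = entry.baseline == null;
  if (reseeded) entry.baseline = description;
  if (entry.description !== description) {
    entry.history.push({ at: now, from: entry.description, to: description });
    if (entry.history.length > HISTORY_CAP) entry.history.shift();
    entry.description = description;
  }
  entry.lastSeen = now;
  return reseeded ? "re-seeded" : "checked";
}

// Hypothetical reset: clears only the baseline, for one tool or for all.
function resetBaseline(cache, target) {
  const tools = target ? [target] : [...cache.keys()];
  let cleared = 0;
  for (const tool of tools) {
    const entry = cache.get(tool);
    if (entry && entry.baseline != null) {
      entry.baseline = null;
      cleared++;
    }
  }
  return cleared;
}

const cache = new Map();
observe(cache, "mcp__foo", "Search the docs", 1);
observe(cache, "mcp__foo", "Search the files", 2);
const cleared = resetBaseline(cache, "mcp__foo");
const outcome = observe(cache, "mcp__foo", "Search the files v2", 3);
const entry = cache.get("mcp__foo");
console.log({ cleared, outcome, baseline: entry.baseline, firstSeen: entry.firstSeen });
```

In this model the reset touches nothing but `baseline`, so `firstSeen` and the accumulated `history` remain available for audit after the re-seed.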

## OWASP / framework mapping

| Code | Framework | Why |
|------|-----------|-----|
| MCP05 | OWASP MCP Top 10 | Rug-pull / unauthorized tool description change |
| LLM03 | OWASP LLM Top 10 | Supply chain: compromised MCP server delivers altered behavior |
| ASI04 | OWASP Agentic Top 10 | Untrusted-tool-influence on agent behavior |

## Limitations

- The walkthrough demonstrates only the `mcp-cumulative-drift` MEDIUM
  advisory. It does not exercise:
  - Per-update advisory firing (above 10% in one step), which is covered
    by the older v6.x test suite
  - Cache TTL purge (7 days), which would require time mocking
  - History rolling cap (10 events FIFO), which emerges naturally over use
- This is a description-only rug-pull. Behavior changes that don't show
  up in the description (e.g. the server returns different *content*
  while keeping its description) are detected by other layers
  (`post-session-guard` data flow tagging, `post-mcp-verify` content
  scanning of `tool_output`).

## See also

- `docs/security-hardening-guide.md` §6: calibration story for v7.3.0
- `commands/mcp-baseline-reset.md`: when and how to reset
- `tests/lib/mcp-description-cache.test.mjs`: unit-test contract
- `examples/lethal-trifecta-walkthrough/`: adjacent demonstration of
  another runtime hook
- `expected-findings.md` (in this folder): the testable contract