Kjell Tore Guttormsen 583a78c6cc feat(llm-security): add lethal-trifecta + mcp-rug-pull example contents [skip-docs]
Companion to 8df5d5c (which only carried the doc updates — the example
directories themselves were left out of staging by mistake). This
commit adds the actual example directories:

- examples/lethal-trifecta-walkthrough/{README.md, run-trifecta.mjs,
  expected-findings.md}
- examples/mcp-rug-pull/{README.md, run-rug-pull.mjs,
  expected-findings.md}

Plus plugin CLAUDE.md "Examples (runnable demonstrations)" section
with a 4-row table covering malicious-skill-demo, prompt-injection-
showcase, lethal-trifecta-walkthrough, and mcp-rug-pull plus the
state-isolation discipline notes.

Marketplace root README unchanged since plugin's outward coverage
is unchanged ([skip-docs] covers the marketplace-level gate).
2026-05-05 14:45:39 +02:00

MCP Cumulative Drift (Rug-Pull) Walkthrough

WARNING: This is a demonstration fixture, NOT a real attack. No live MCP server is contacted. The script feeds eight synthetic tool descriptions through post-mcp-verify and shows that the v7.3.0 cumulative-drift advisory fires when per-update detection would have stayed silent.

What this demonstrates

OWASP MCP05 — Rug Pull. A trusted MCP server is updated in a series of tiny edits. Each individual update stays under the 10% per-update Levenshtein threshold, so the original v7.x detection (added before E14) never trips. But after seven small edits the description has drifted >25% from the original baseline — the tool now reads "Search the local data store" instead of "Search the internal knowledge base ... for the agent to read".

v7.3.0 (E14) added a sticky baseline to the MCP description cache. Every tool call computes:

  • per_update = levenshtein(current, previous) / |previous| — fires above 10%
  • cumulative = levenshtein(current, baseline) / max(|current|, |baseline|) — fires at 25%

This walkthrough proves the cumulative leg catches the slow burn that the per-update leg misses.
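The two ratios above can be sketched in a few lines. This is a minimal illustration, not the plugin's code: the `levenshtein` below is a textbook implementation standing in for string-utils.levenshtein(), `driftRatios` is a hypothetical helper name, and the sample strings are made up rather than taken from the fixture.

```javascript
// Textbook dynamic-programming Levenshtein distance using two rolling rows.
// Stand-in for the plugin's string-utils.levenshtein(); not the real source.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: computes both legs exactly as the formulas state them.
function driftRatios(current, previous, baseline) {
  return {
    perUpdate: levenshtein(current, previous) / previous.length,
    cumulative:
      levenshtein(current, baseline) / Math.max(current.length, baseline.length),
  };
}

// Made-up strings, only to show the call shape.
const r = driftRatios('list fil', 'list file', 'list files');
console.log(r.perUpdate.toFixed(3), r.cumulative.toFixed(3));
```

Note that the two legs use different denominators, so the ratios are not directly additive from one stage to the next.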

Drift profile

| Stage | Edit | Per-update | Cumulative | Advisory |
|---|---|---|---|---|
| 0 | baseline | | 0.0% | seeded only |
| 1 | agent → user | 3.3% | 3.3% | none |
| 2 | ranked → scored | 3.3% | 6.6% | none |
| 3 | short → brief | 4.2% | 10.7% | none |
| 4 | documents → files | 5.8% | 16.5% | none |
| 5 | internal → local | 5.2% | 21.5% | none |
| 6 | base → store | 3.5% | 24.8% | none (just under threshold) |
| 7 | knowledge → data | 7.9% | 32.2% | mcp-cumulative-drift (MEDIUM) |

The exact ratios are reproduced by string-utils.levenshtein() — see expected-findings.md for the testable contract.
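To see why only stage 7 fires, the table's ratios can be run through the two thresholds. This is a sketch under assumptions: the comparison operators (strictly above 10%, at or above 25%) are read off the wording in this README, `classify` is a hypothetical function, and `per-update-drift` is a placeholder label, not the hook's real advisory id.

```javascript
const PER_UPDATE_THRESHOLD = 0.10; // "fires above 10%"
const CUMULATIVE_THRESHOLD = 0.25; // "fires at 25%"

// Ratios copied from the drift profile table (stages 1 to 7).
const stages = [
  { stage: 1, perUpdate: 0.033, cumulative: 0.033 },
  { stage: 2, perUpdate: 0.033, cumulative: 0.066 },
  { stage: 3, perUpdate: 0.042, cumulative: 0.107 },
  { stage: 4, perUpdate: 0.058, cumulative: 0.165 },
  { stage: 5, perUpdate: 0.052, cumulative: 0.215 },
  { stage: 6, perUpdate: 0.035, cumulative: 0.248 },
  { stage: 7, perUpdate: 0.079, cumulative: 0.322 },
];

// Hypothetical classifier: returns the advisories a stage would raise.
function classify({ perUpdate, cumulative }) {
  const advisories = [];
  if (perUpdate > PER_UPDATE_THRESHOLD) advisories.push('per-update-drift'); // placeholder label
  if (cumulative >= CUMULATIVE_THRESHOLD) advisories.push('mcp-cumulative-drift');
  return advisories;
}

for (const s of stages) {
  console.log(s.stage, classify(s).join(', ') || 'none');
}
```

No stage crosses the per-update threshold, so the cumulative leg is the only signal in this profile.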

How to run

cd plugins/llm-security
node examples/mcp-rug-pull/run-rug-pull.mjs

# Detailed: show stderr + final cache state
node examples/mcp-rug-pull/run-rug-pull.mjs --verbose

Expected: 8 pass, 0 fail. Stage 7 produces a SECURITY ADVISORY (post-mcp-verify) containing mcp-cumulative-drift and the literal phrase "Slow-burn rug-pull may evade per-update detection".

Hooks / scanners involved

  • hooks/scripts/post-mcp-verify.mjs — the only hook invoked. Calls into scanners/lib/mcp-description-cache.mjs::checkDescriptionDrift() for the actual drift math.
  • scanners/lib/mcp-description-cache.mjs — the cache library. Stores { description, firstSeen, lastSeen, baseline, history } per tool. Baseline survives the 7-day TTL purge.
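One way a baseline could survive the TTL purge is sketched below. This is a guess at the behavior, not the library's implementation: `purgeExpired` is a hypothetical name, timestamps are assumed to be epoch milliseconds, and keeping a baseline-only stub for stale entries is one of several plausible designs.

```javascript
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL from the README

// Hypothetical purge: stale entries lose everything except their baseline.
function purgeExpired(cache, now = Date.now()) {
  const result = {};
  for (const [tool, entry] of Object.entries(cache)) {
    if (now - entry.lastSeen <= TTL_MS) {
      result[tool] = entry; // still fresh: keep the whole entry
    } else if (entry.baseline !== undefined) {
      result[tool] = { baseline: entry.baseline }; // stale: keep only the sticky baseline
    }
  }
  return result;
}

const now = Date.now();
const cache = {
  mcp__fresh: { description: 'a', firstSeen: now, lastSeen: now, baseline: 'a', history: [] },
  mcp__stale: { description: 'b2', firstSeen: 0, lastSeen: 0, baseline: 'b', history: [] },
};
console.log(purgeExpired(cache, now));
```

Without a sticky baseline, an attacker could wait out the TTL and start drifting from a freshly seeded description.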

Cache isolation

post-mcp-verify honors the LLM_SECURITY_MCP_CACHE_FILE environment variable (added in v7.3.0 specifically for testing and demos). The script:

  1. Creates a temporary directory with mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'))
  2. Points the cache at a file inside that tempdir
  3. Spawns each hook invocation with the env var set
  4. Removes the entire tempdir in a finally block before exit

Your real ~/.cache/llm-security/mcp-descriptions.json is never touched. This is the same pattern used by the unit tests under tests/lib/mcp-description-cache.test.mjs.
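The four isolation steps can be sketched as follows. This is not the contents of run-rug-pull.mjs: `runIsolated` is a hypothetical helper, the cache file name inside the tempdir is an assumption, and a one-line `node -e` child stands in for the real hook invocation. It uses CommonJS `require` so it runs as a plain script; an .mjs file would use `import` instead.

```javascript
const { mkdtempSync, rmSync, existsSync } = require('node:fs');
const { tmpdir } = require('node:os');
const { join } = require('node:path');
const { spawnSync } = require('node:child_process');

// Hypothetical helper: runs a child process against a throwaway cache file.
function runIsolated(childSource) {
  // Step 1: create a unique temp directory.
  const dir = mkdtempSync(join(tmpdir(), 'llm-security-rugpull-'));
  // Step 2: point the cache at a file inside it (file name is an assumption).
  const cacheFile = join(dir, 'mcp-descriptions.json');
  try {
    // Step 3: spawn the child with the env var set.
    const res = spawnSync(process.execPath, ['-e', childSource], {
      env: { ...process.env, LLM_SECURITY_MCP_CACHE_FILE: cacheFile },
      encoding: 'utf8',
    });
    return { dir, cacheFile, stdout: res.stdout.trim(), status: res.status };
  } finally {
    // Step 4: remove the tempdir even if the child failed.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Stand-in child that echoes the env var it received.
const out = runIsolated('console.log(process.env.LLM_SECURITY_MCP_CACHE_FILE)');
console.log(out.stdout === out.cacheFile, existsSync(out.dir));
```

Putting the cleanup in `finally` means a failing stage cannot leave a stray cache behind.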

Resetting baseline after a legitimate upgrade

Real MCP servers do upgrade their descriptions occasionally — that's not always an attack. After confirming the upgrade is genuine, run:

/security mcp-baseline-reset                    # clear all baselines
/security mcp-baseline-reset --target mcp__foo  # clear one tool
/security mcp-baseline-reset --list             # see current baselines

After a clear, the next call to checkDescriptionDrift re-seeds the baseline from the incoming description. The description, firstSeen, lastSeen, and history fields are preserved for audit.
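The reset semantics described here (drop the baseline, keep the audit fields, re-seed on the next check) might look roughly like this. Both function names are hypothetical and the in-memory cache shape is assumed from the field list in this README; the real command and library may differ.

```javascript
// Hypothetical reset: clears the baseline for one tool, or for all tools
// when no target is given. Other fields are left untouched.
function resetBaseline(cache, target) {
  const tools = target ? [target] : Object.keys(cache);
  for (const tool of tools) {
    if (cache[tool]) delete cache[tool].baseline;
  }
  return cache;
}

// Hypothetical re-seed step, as it might run inside the next drift check.
function seedBaselineIfMissing(entry, incomingDescription) {
  if (entry.baseline === undefined) entry.baseline = incomingDescription;
  return entry;
}

const cache = {
  mcp__foo: { description: 'v2 text', firstSeen: 1, lastSeen: 2, baseline: 'v1 text', history: ['e1'] },
  mcp__bar: { description: 'other', firstSeen: 1, lastSeen: 2, baseline: 'other', history: [] },
};
resetBaseline(cache, 'mcp__foo');
seedBaselineIfMissing(cache.mcp__foo, 'v2 text');
console.log(cache.mcp__foo.baseline, cache.mcp__bar.baseline);
```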

OWASP / framework mapping

| Code | Framework | Why |
|---|---|---|
| MCP05 | OWASP MCP Top 10 | Rug-pull / unauthorized tool description change |
| LLM03 | OWASP LLM Top 10 | Supply-chain: compromised MCP server delivers altered behavior |
| ASI04 | OWASP Agentic Top 10 | Untrusted-tool-influence on agent behavior |

Limitations

  • The walkthrough demonstrates only the mcp-cumulative-drift MEDIUM advisory. It does not exercise:
    • Per-update advisory firing (above 10% in one step) — covered by the older v6.x test suite
    • Cache TTL purge (7 days) — would require time mocking
    • History rolling cap (10 events FIFO) — emerges naturally over use
  • This is a description-only rug-pull. Behavior changes that don't show up in the description (e.g. the server returns different content while keeping its description) are detected by other layers (post-session-guard data flow tagging, post-mcp-verify content scanning of tool_output).
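For readers unfamiliar with the rolling history cap mentioned in the list above, a 10-event FIFO can be sketched as below. `appendHistory` is a hypothetical name and the event shape is made up; only the cap of 10 and the first-in-first-out eviction come from this README.

```javascript
const HISTORY_CAP = 10; // "10 events FIFO"

// Hypothetical append: returns a new array holding at most the newest 10 events.
function appendHistory(history, event) {
  const next = [...history, event];
  return next.length > HISTORY_CAP ? next.slice(next.length - HISTORY_CAP) : next;
}

let history = [];
for (let i = 0; i < 12; i++) {
  history = appendHistory(history, { seq: i });
}
console.log(history.length, history[0].seq);
```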

See also

  • docs/security-hardening-guide.md §6 — calibration story for v7.3.0
  • commands/mcp-baseline-reset.md — when and how to reset
  • tests/lib/mcp-description-cache.test.mjs — unit-test contract
  • examples/lethal-trifecta-walkthrough/ — adjacent demonstration of another runtime hook
  • expected-findings.md (in this folder) — the testable contract