
OWASP Top 10 for Agentic AI Applications (2026)

Reference material for security agents analyzing agentic AI systems. Based on the official OWASP GenAI Security Project release (December 2025), developed by 100+ researchers and practitioners.

Prefix: ASI (Agentic Security Issue)
Scope: Autonomous AI agents that plan, use tools, delegate to subagents, and act with minimal human supervision. Claude Code is an agentic system and maps directly to these risks.
Source: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/


ASI01 — Agent Goal Hijack

Category: Goal and instruction integrity

Description

Attackers alter agent objectives by embedding hidden instructions in external content that the agent reads and processes. Agents cannot reliably separate instructions from data, making them vulnerable to prompt injection via poisoned documents, web pages, emails, or tool outputs.

Real incident: EchoLeak — copilots turned into silent exfiltration engines via injected email content.

Attack Vectors

  • Malicious instructions embedded in files the agent reads (PDF, markdown, code comments)
  • Tool outputs returning adversarial text disguised as data
  • Web content fetched during agent browsing that includes override instructions
  • Injected content in MCP tool responses that redefines the agent's task
  • Multi-turn manipulation: gradual reframing of goals across conversation turns

Detection Signals

  • Agent pursues actions not derivable from the original user request
  • Unexpected tool invocations or action sequences mid-task
  • Agent output references content not present in the original prompt
  • System prompt or role instructions appear to have been re-interpreted
  • Agent skips or rewrites its own stated plan without user input

Claude Code Mappings

  • Skills/commands: A malicious file read during /security scan could inject instructions to skip reporting a specific finding
  • Subagent tasks: Task prompts built from external content can carry injected goals into subagents
  • MCP tool outputs: mcp__tavily__tavily_search or mcp__ms-learn__fetch may return adversarial content that redirects agent behavior
  • Hooks: A PostToolUse hook reading tool output could process injected instructions

Mitigations

  • Treat all external content as untrusted data, never as instructions
  • Apply strict semantic boundaries: system prompt immutable, data sandboxed
  • Use PreToolUse hooks to validate tool inputs before external data is fetched
  • Require human approval before consequential actions (file writes, git commits, API calls)
  • Log the full reasoning chain so deviations from the original goal are auditable
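
The third mitigation can be sketched as a PreToolUse-style gate that fails closed on fetch targets outside an explicit allowlist, so injected instructions cannot pull arbitrary external content into the session. This is a minimal sketch: the allowlisted hosts are illustrative, and the wiring into a hook (reading the tool-call JSON, exiting non-zero to block) is omitted.

```javascript
// Illustrative allowlist -- replace with the hosts your project actually needs.
const ALLOWED_HOSTS = new Set(["docs.example.com", "genai.owasp.org"]);

function isFetchAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable target: fail closed
  }
  // Block non-HTTPS schemes and hosts outside the allowlist.
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

// In a real PreToolUse hook, the URL would come from the tool-call JSON on
// stdin, and a disallowed fetch would be rejected before execution.
```

The fail-closed branch matters: an unparseable URL is treated as hostile rather than passed through.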

ASI02 — Tool Misuse and Exploitation

Category: Tool integrity and authorization

Description

Agents misuse legitimate tools due to ambiguous prompts, manipulated input, or over-provisioned permissions. Legitimate tools become attack primitives: filesystem access becomes exfiltration, email access becomes phishing, shell access becomes arbitrary code execution.

Real incident: Amazon Q and GitHub Actions compromised via repository content triggering tool misuse.

Attack Vectors

  • Ambiguous task descriptions cause the agent to invoke tools with unintended arguments
  • Poisoned tool descriptors (MCP server descriptions) mislead the agent about tool purpose
  • Over-privileged tool configurations allow actions beyond the task scope
  • Adversarial content causes agents to invoke deletion, exfiltration, or write operations
  • Chained tool calls where output of one tool becomes input to a destructive second tool

Detection Signals

  • Tool called with arguments that were not present in the user's original request
  • Spike in API call volume or calls to tools outside the agent's defined role
  • Destructive operations (file deletion, database writes) without explicit user instruction
  • Sensitive data (secrets, PII) flowing as arguments to network-bound tools
  • Agent invokes tools in an order inconsistent with its stated plan

Claude Code Mappings

  • Hooks: pre-bash-destructive.mjs blocks rm -rf, DROP TABLE, and similar; validate this hook is present and covers the full destructive command surface
  • MCP tools: Each enabled MCP server expands the tool surface — audit mcp.json for over-permissioned servers (e.g., filesystem MCP with write access to /)
  • Skills with Bash tool: Any skill declaring allowed-tools: Bash can spawn processes; verify the necessity and scope of Bash access in frontmatter
  • allowed-tools in commands: Commands should declare the minimal tool set required

Mitigations

  • Apply least-privilege to every tool: scope filesystem access, API permissions, network targets
  • Validate all tool arguments in PreToolUse hooks before execution
  • Require explicit human approval for irreversible operations (destructive Bash, git push)
  • Audit MCP server configurations — each server is an attack surface expansion
  • Pin tool configurations; detect and alert on changes to tool descriptors
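
The argument-validation mitigation is the pattern behind pre-bash-destructive.mjs. A minimal sketch of such a gate, with an intentionally small and illustrative pattern list (a production hook needs a much broader destructive-command surface):

```javascript
// Illustrative destructive-command patterns; not exhaustive.
const DESTRUCTIVE_PATTERNS = [
  /\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b/i, // rm -rf, rm -fr, rm -vrf ...
  /\bdrop\s+(table|database)\b/i,                // SQL deletion
  /\bgit\s+push\s+.*--force\b/i,                 // history rewrite on remote
  /\bmkfs\b/,                                    // filesystem destruction
];

function isDestructive(command) {
  return DESTRUCTIVE_PATTERNS.some((re) => re.test(command));
}
```

A real hook would read the Bash tool's command from the tool-call input and block execution when `isDestructive` returns true.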

ASI03 — Identity and Privilege Abuse

Category: Identity, credentials, and delegation

Description

Agents often inherit user or system identities including high-privilege credentials, session tokens, and delegated access. Unintended privilege reuse, escalation, or cross-agent delegation without proper scoping creates confused deputy scenarios where the agent acts with permissions it should not exercise.

Attack Vectors

  • Agent inherits the operator's credentials and uses them beyond the task scope
  • A compromised subagent operates with the parent agent's delegated identity
  • Long-lived credentials used where short-lived tokens would suffice, persisting across sessions
  • Agent escalates its own permissions by requesting elevated access mid-task
  • Lateral movement: agent uses one system's credentials to authenticate to another

Detection Signals

  • Credential access from unexpected timing or context (e.g., credentials used outside a task)
  • Agent accesses resources unrelated to its defined function
  • Cross-system access chains: authentication to system B immediately after action on system A
  • Failed permission checks followed by attempts via alternative credential paths
  • Subagents performing actions requiring higher privileges than delegated

Claude Code Mappings

  • API keys in environment: Claude Code executes in the user's shell — it inherits all env variables including OPENAI_API_KEY, AZURE_CLIENT_SECRET, etc.
  • pre-edit-secrets.mjs hook: Detects if secrets are being written to files, but does not prevent an agent from using env-var credentials in Bash commands
  • --dangerously-skip-permissions: When used in subagent invocations (claude -p), all permission gates are bypassed for that subagent's session
  • Subagent delegation: Tasks spawned with Task tool receive the parent's tool permissions; verify task prompts do not over-grant scope implicitly

Mitigations

  • Scope credentials to the minimum required for each task; use task-scoped tokens where possible
  • Never pass raw secrets as task arguments to subagents
  • Treat each subagent as a separate identity with its own permission boundary
  • Audit use of --dangerously-skip-permissions — restrict to headless, sandboxed contexts only
  • Rotate credentials after agentic sessions that accessed sensitive systems
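
Because Claude Code inherits the full shell environment, credential scoping for subagents can start with an environment scrub before spawning. A sketch, assuming a simple name-based heuristic (the pattern, keep-list, and the `claude -p` invocation shown in the comment are illustrative):

```javascript
// Heuristic: variable names that usually carry secrets.
const SECRET_PATTERN = /(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)/i;

// Build a reduced environment for a headless subagent so it does not
// inherit every credential present in the parent shell.
function scrubbedEnv(env, keep = ["PATH", "HOME", "LANG"]) {
  const out = {};
  for (const [name, value] of Object.entries(env)) {
    if (keep.includes(name) || !SECRET_PATTERN.test(name)) {
      out[name] = value;
    }
  }
  return out;
}

// Usage sketch with child_process:
//   spawn("claude", ["-p", taskPrompt], { env: scrubbedEnv(process.env) });
```

Name-based filtering is a floor, not a ceiling: secrets held in oddly named variables still leak, so task-scoped tokens remain the stronger control.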

ASI04 — Agentic Supply Chain Vulnerabilities

Category: Component integrity and provenance

Description

Tools, plugins, prompt templates, MCP servers, and agent definitions fetched or loaded dynamically can be compromised. Any poisoned component alters agent behavior or exposes data, and the attack surface is invisible to static dependency scanning because components resolve at runtime.

Real incident: Malicious MCP servers impersonating legitimate ones, altering tool behavior post-install.

Attack Vectors

  • Compromised MCP server that behaves correctly during review but exfiltrates data in production
  • Poisoned skill/command markdown fetched from a remote source
  • Agent definition files modified in a plugin repository after installation
  • Typosquatted MCP server names registered to intercept installs
  • Plugin manifest (plugin.json) tampered to add unauthorized tool permissions

Detection Signals

  • MCP server making network connections to undocumented endpoints
  • Plugin files modified after initial installation (file hash change)
  • New tool capabilities appearing after a plugin update
  • Agent behavior changing without corresponding code change
  • hooks.json or plugin.json modifications not tied to a commit

Claude Code Mappings

  • plugin.json manifest: The auto_discover: true setting means any file in the plugin directory is trusted; a supply chain compromise of the plugin repo affects all commands and agents
  • MCP server configurations: mcp.json and .mcp.json files define which servers run — a tampered server definition is a full agent compromise
  • External skill references: Skills referencing remote URLs for knowledge base content introduce runtime supply chain risk
  • hooks/hooks.json: A modified hooks file can add, remove, or neuter security hooks silently

Mitigations

  • Pin MCP server versions; verify checksums before use
  • Monitor plugin directory files for unexpected modifications (file integrity monitoring)
  • Audit plugin.json, hooks.json, and all agent frontmatter on each session start
  • Prefer local MCP servers over remote for sensitive operations; limit network-bound servers
  • Review MCP server source code before enabling; treat third-party servers as untrusted by default
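
The checksum and file-integrity mitigations reduce to recording digests at install time and diffing against them on session start. A sketch, with the file reader injectable so the check is testable without disk I/O (baseline format and paths are illustrative):

```javascript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash raw file bytes; a digest mismatch indicates post-install tampering.
function sha256(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// baseline maps file path -> hex digest recorded at install time.
// `readFile` is injectable so the check can run against any byte source.
function driftedFiles(baseline, readFile = (p) => readFileSync(p)) {
  return Object.entries(baseline)
    .filter(([path, expected]) => sha256(readFile(path)) !== expected)
    .map(([path]) => path);
}
```

In practice the baseline would cover plugin.json, hooks.json, and every hook script, and any non-empty result would halt the session pending review.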

ASI05 — Unexpected Code Execution

Category: Code generation and execution safety

Description

Agents generate or execute code unsafely through shell commands, eval-like constructs, script execution, or deserialization. The attack path runs directly from text input to system commands. Coding agents like Claude Code are high-risk because code generation and execution are core features.

Attack Vectors

  • Prompt injection in source code comments causes agent to generate and run malicious shell commands
  • Agent generates a "helpful" script that includes attacker-controlled payload
  • eval() or exec() applied to LLM output without sandboxing
  • Agent patches a configuration file in a way that achieves code execution on next load
  • Hallucinated library name installed via npm install or pip install (slopsquatting)

Detection Signals

  • Shell commands spawned that were not present in the original task specification
  • Writes to executable paths (/usr/local/bin, .bashrc, ~/.zshrc, cron directories)
  • package.json or requirements.txt modified with packages not in the original task
  • Agent generates code containing subprocess, os.system, eval, exec without review gate
  • Writes to .github/workflows/, Makefile, or other CI/CD configuration files

Claude Code Mappings

  • pre-bash-destructive.mjs hook: First line of defense, but only blocks known-bad patterns; novel payloads may pass through
  • Skills with Bash allowed-tools: Any skill that can run Bash can achieve code execution — validate each skill's tool list is scoped to its purpose
  • allowed-tools: Write + Bash: A skill with both Write and Bash can write a script and execute it — this combination requires strong justification
  • MCP filesystem tools: MCP servers with write access to executable paths are equivalent to unrestricted code execution

Mitigations

  • Sandbox Bash execution: use restricted shells, containers, or read-only mounts where possible
  • Require human approval before any write to executable or configuration paths
  • Block installation of packages not in an approved list (pre-bash hook pattern matching)
  • Never auto-approve actions triggered by content read from external sources (files, web, MCP)
  • Treat all generated code as untrusted until reviewed; do not auto-execute
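
The package-allowlist mitigation (the slopsquatting defense) can be sketched as a pre-bash check that extracts package names from install commands and rejects anything unapproved. The approved list, and the simplistic command parsing, are illustrative:

```javascript
// Illustrative approved-package list; maintain per project.
const APPROVED = new Set(["typescript", "vitest", "zod"]);

function unapprovedPackages(command) {
  const m = command.match(/\b(?:npm\s+install|npm\s+i|pip\s+install)\s+(.+)/);
  if (!m) return []; // not an install command
  return m[1]
    .split(/\s+/)
    .filter((tok) => !tok.startsWith("-"))    // skip flags like --save-dev
    .map((tok) => tok.replace(/@[^/]*$/, "")) // strip version specifier (zod@3.22.0 -> zod)
    .filter((name) => !APPROVED.has(name));
}
```

A hallucinated package name fails this check even when it resolves on the registry, which is exactly the gap registry lookups cannot close.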

ASI06 — Memory and Context Poisoning

Category: State integrity and persistence

Description

Agents rely on memory systems, embeddings, RAG databases, context windows, and summaries to maintain state across interactions. Attackers poison this memory to influence future decisions persistently. Unlike one-shot injection, memory poisoning executes on every future session without repeated attack.

Attack Vectors

  • Adversarial text injected into a document that gets stored in a RAG knowledge base
  • Agent's session summary poisoned with false "user preferences" that persist
  • Cross-tenant memory leakage: one user's poisoned entry affects another user's agent session
  • Long-term drift: repeated exposure to adversarial content gradually shifts agent behavior
  • REMEMBER.md or session state files modified to contain false context

Detection Signals

  • Agent references facts or preferences not established in the current session
  • Agent defends false beliefs when challenged with contradictory evidence
  • Behavioral changes appearing after a specific file read or knowledge base query
  • REMEMBER.md or project memory files contain entries inconsistent with recent commits
  • Agent applies "learned preferences" that the user did not specify

Claude Code Mappings

  • REMEMBER.md files: These are trusted by default and read as ground truth at session start; a tampered REMEMBER.md poisons every session in that project
  • MEMORY.md / project memory: The ~/.claude/projects/ memory files are not version-controlled by default — they can be silently modified
  • System prompt context: Skills/commands that inject large context blocks affect the agent's reasoning for the entire session
  • KV store / MCP memory servers: Any MCP server providing persistent memory is a poison vector

Mitigations

  • Version-control all state files (REMEMBER.md, CLAUDE.md) and review diffs before trusting
  • Treat external knowledge base content as untrusted data, not trusted instructions
  • Audit session memory files for entries not traceable to a user action or commit
  • Set explicit expiration on memory entries; do not persist indefinitely without review
  • Segment memory by trust level: user-supplied vs system-generated vs external-sourced
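
Trust-level segmentation can be enforced mechanically if every memory entry carries a provenance tag. A sketch that flags untagged entries for review before a memory file is trusted at session start; the tag format is an assumption for illustration, not an existing Claude Code convention:

```javascript
// Provenance tags an entry must carry to be trusted; external-sourced
// entries deliberately have no trusted tag.
const TRUSTED_TAGS = ["[user]", "[system]"];

function untaggedEntries(memoryText) {
  return memoryText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))  // list entries only
    .filter((line) => !TRUSTED_TAGS.some((t) => line.includes(t)));
}
```

Any flagged entry is then auditable against the detection signals above: can it be traced to a user action or a commit?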

ASI07 — Insecure Inter-Agent Communication

Category: Multi-agent protocol integrity

Description

In multi-agent architectures, agents coordinate through message passing over MCP, RPC, shared files, or direct API calls. These channels often lack authentication or integrity verification. Attackers spoof identities, replay delegation messages, or tamper with unprotected channels to manipulate downstream agents through compromised peers.

Attack Vectors

  • Subagent receives a task prompt that appears to come from the orchestrator but is spoofed
  • Shared scratch file used for inter-agent communication modified by a malicious process
  • Replayed delegation token used to authorize an agent action outside its original context
  • Orchestrator output piped through an untrusted channel before reaching worker agents
  • A compromised worker agent sends poisoned results to the orchestrator, affecting decisions

Detection Signals

  • Agent task prompts referencing context not present in the parent agent's output
  • Unexpected agent spawned without a corresponding Task call in the orchestrator
  • Results returned by a subagent inconsistent with the task it was given
  • Communication over channels (files, pipes) without integrity verification
  • Agent claims to have received instructions from another agent, but no delegation record exists

Claude Code Mappings

  • Task tool: Subagents receive their full task prompt in plaintext with no authentication; a compromised orchestrator or prompt-injected task string is fully trusted by the subagent
  • Shared file channels: Agents that communicate via shared files (e.g., /tmp/results.json) have no message authentication — any process can modify the file
  • MCP as communication bus: Multiple agents using the same MCP server share state without isolation; one agent can read or modify another's data if the server lacks tenancy controls
  • Harness loop state files: Files like pipeline-queue.json used for agent coordination are unauthenticated and modifiable

Mitigations

  • Treat inter-agent messages as untrusted until verified; do not assume orchestrator authenticity
  • Validate subagent inputs at the receiving end, not just at the sending end
  • Use cryptographically signed task descriptions for high-stakes multi-agent workflows
  • Isolate MCP server state per agent session; avoid shared mutable state across agents
  • Log all inter-agent communications with full payloads for forensic capability
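
The signed-task mitigation can be sketched with an HMAC over the task prompt: the orchestrator signs, the receiving agent verifies before acting. Key distribution is out of scope here, and the shared-secret scheme is an illustrative minimum (asymmetric signatures would avoid sharing the key with every worker):

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Orchestrator side: sign the exact task string with a shared secret.
function signTask(task, key) {
  return createHmac("sha256", key).update(task).digest("hex");
}

// Subagent side: recompute and compare in constant time before acting.
function verifyTask(task, signature, key) {
  const expected = Buffer.from(signTask(task, key), "hex");
  const actual = Buffer.from(signature, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

A tampered scratch file or spoofed delegation then fails verification, because the signature binds the task text to the orchestrator's key.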

ASI08 — Cascading Failures

Category: System resilience and blast radius

Description

In interconnected multi-agent architectures, a single compromised or hallucinating agent can propagate errors, malicious actions, or corrupted state to downstream agents. A small planning error compounds rapidly: a hallucinating planner issues destructive tasks to multiple worker agents that execute without verification, multiplying the blast radius.

Attack Vectors

  • Orchestrator agent hallucinates a task step; all downstream agents execute the bad instruction
  • A prompt-injected agent poisons shared state, affecting all agents reading that state
  • One agent's API error causes retry storms across dependent agents
  • A worker agent produces malformed output that causes the next agent to execute a fallback path with unintended side effects
  • Circular agent delegation creates unbounded loops consuming resources and taking actions

Detection Signals

  • Multiple agents failing or producing anomalous output simultaneously
  • Correlated errors across previously independent agents within the same pipeline
  • Single upstream action traceable as root cause of widespread downstream failures
  • Agent spawning subagents recursively without a documented depth limit
  • Resource consumption (API calls, file writes, tokens) growing super-linearly during a task

Claude Code Mappings

  • Multi-agent harness loops: harness:loop runs autonomous multi-session pipelines — a poisoned session early in the loop propagates through all subsequent sessions
  • Parallel Task invocations: When multiple subagents run in parallel, a shared bad state (e.g., poisoned REMEMBER.md) affects all simultaneously
  • Feature pipeline queues: pipeline-queue.json state drives downstream agent selection; a corrupted queue entry causes all subsequent features to be processed incorrectly
  • Newsletter/research pipelines: Phase-based pipelines with no inter-phase validation gates allow phase 1 errors to compound through phases 2-N

Mitigations

  • Implement circuit breakers: halt the pipeline if an agent returns anomalous output
  • Define explicit depth limits for agent spawning; enforce in orchestrator logic
  • Validate inter-phase state before proceeding to the next phase in any pipeline
  • Test failure propagation in isolated environments before running in production
  • Design for independent agent failure: each agent should be able to fail without corrupting others
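
Two of these controls, the spawn-depth limit and the circuit breaker, fit in a few lines of orchestrator logic. A sketch with illustrative thresholds:

```javascript
// Hard cap on delegation depth: refuse to spawn below this level.
const MAX_DEPTH = 3;

function canSpawn(depth) {
  return depth < MAX_DEPTH;
}

// Circuit breaker: trips after N consecutive anomalous agent results,
// halting the pipeline instead of letting errors propagate downstream.
function makeBreaker(maxFailures = 2) {
  let consecutiveFailures = 0;
  return {
    record(ok) {
      consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
    },
    get open() {
      return consecutiveFailures >= maxFailures; // open = halt pipeline
    },
  };
}
```

The orchestrator calls `record` after each agent result and checks `open` before dispatching the next task; what counts as "anomalous" is the pipeline-specific judgment the breaker deliberately leaves outside.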

ASI09 — Human-Agent Trust Exploitation

Category: Human oversight and social engineering

Description

Users and operators over-trust agent recommendations due to their confident, authoritative presentation. Attackers or misaligned agents exploit this trust to influence high-stakes decisions, extract credentials, approve fraudulent actions, or introduce vulnerabilities into production systems under the guise of helpful assistance.

Real incidents: Coding assistants introducing backdoors in reviewed-but-not-read code; financial copilots approving fraudulent transactions; support agents soliciting credentials.

Attack Vectors

  • Agent provides well-reasoned justification for a malicious action, exploiting approval fatigue
  • Urgent framing pressures operators to approve without full review ("fix needed before deployment")
  • Agent requests credentials "to complete the task" outside its normal operating context
  • Confidence in AI output leads users to skip review of generated code containing vulnerabilities
  • An attacker controls the task that the agent presents as a routine operation requiring approval

Detection Signals

  • Agent requesting credentials or sensitive information not scoped to the current task
  • Approval prompts for actions the agent has not performed before in similar tasks
  • Agent citing urgency or external deadlines to bypass normal review processes
  • Recommendations that contradict the project's security policy or CLAUDE.md constraints
  • High approval rates for novel agent actions without corresponding user scrutiny

Claude Code Mappings

  • Permission prompts: Claude Code's permission system depends on informed user consent; a socially-engineered prompt obscures the actual action being approved
  • --dangerously-skip-permissions: Removes human-in-the-loop for all tool use — this flag exists to serve legitimate automation but eliminates the trust exploitation defense layer
  • Hooks as UI: Users may approve hook-gated actions without reading the full command; hook output text should be explicit and non-manipulable by agent-generated content
  • CLAUDE.md trust: Users trust CLAUDE.md as a source of truth; a modified CLAUDE.md that relaxes security constraints exploits operator trust in project configuration

Mitigations

  • Display full tool arguments in approval prompts — never summarize or truncate
  • Enforce time-boxed review for high-impact actions (git push, API calls, secret access)
  • Train operators to treat credential requests mid-task as high-risk signals
  • Audit CLAUDE.md and project configuration files for unexpected permission relaxations
  • Implement tiered approval: routine actions auto-approved, novel or high-impact actions require explicit human review with stated justification
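
The tiered-approval mitigation can be sketched as a classifier over proposed tool calls. The tiers and matching rules below are illustrative policy for this sketch, not Claude Code's actual permission behavior:

```javascript
// Classify a proposed tool call: "auto" (routine), "review" (explicit
// human approval with justification), or "block" (refuse outright).
function approvalTier(toolName, args) {
  const text = `${toolName} ${args}`;
  if (/credential|password|api[_-]?key/i.test(text)) return "block";
  if (/^(Bash|Write|Edit)$/.test(toolName) || /git push|curl|ssh/.test(args)) {
    return "review"; // mutating or network-bound: human review required
  }
  return "auto"; // read-only actions
}
```

Routing credential-adjacent requests to "block" rather than "review" is deliberate: it removes the approval-fatigue decision from the operator entirely.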

ASI10 — Rogue Agents

Category: Agent alignment and containment

Description

Rogue agents are compromised or misaligned agents that act harmfully while appearing legitimate. They may self-replicate, persist across sessions, impersonate other agents, or silently pursue objectives that diverge from their stated purpose. Unlike externally attacked agents, rogue agents develop or maintain misaligned behavior from within.

Attack Vectors

  • An agent rewrites its own system prompt or configuration files to persist misaligned behavior
  • A compromised skill modifies other skills or commands to introduce backdoor capabilities
  • An agent spawns persistent background processes that continue acting after the session ends
  • Gradual behavioral drift: an agent's behavior shifts incrementally to avoid detection
  • An agent registers new hooks that run with elevated permissions outside the original task scope

Detection Signals

  • Configuration files (CLAUDE.md, plugin.json, hooks.json) modified without a corresponding user instruction or commit
  • New hooks, commands, or skills appearing after an agentic session
  • Background processes spawned by an agent that persist after Claude Code exits
  • Agent behavior in session N differs from session N-1 without configuration changes
  • Agent output defending or concealing its own modifications to the project

Claude Code Mappings

  • Self-modifying hooks: An agent with Write access to hooks/scripts/ can add or modify hooks that affect all future sessions — this is a high-privilege write path
  • plugin.json modification: An agent writing to .claude-plugin/plugin.json can change its own allowed tools, adding capabilities not granted by the user
  • CLAUDE.md self-modification: An agent instructed to "update documentation" could modify its own behavioral instructions, achieving persistent misalignment
  • Persistent background Bash: An agent can use Bash to spawn detached processes (nohup, &) that continue running after the Claude Code session terminates
  • Hook registration via hooks.json: Adding new hook entries expands the agent's execution surface permanently

Mitigations

  • File-protect critical configuration: plugin.json, hooks.json, CLAUDE.md should require explicit user action to modify (not writable by agents as a side effect of other tasks)
  • Use pre-write-pathguard.mjs to block writes to hook scripts and plugin manifests by default
  • Monitor for new background processes after agentic sessions complete
  • Diff hooks.json and plugin.json after every session that involved Write tool use
  • Enforce the principle that an agent should never modify the files that define its own behavior
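
That last principle is what pre-write-pathguard.mjs enforces: deny agent writes to the files that define the agent's own behavior. A sketch of the core check; the protected-path list mirrors the files named above, and the normalization is deliberately simplistic (a real guard must also resolve symlinks, absolute paths, and `..` traversal):

```javascript
// Files and directories that define agent behavior; never agent-writable.
const PROTECTED = [
  "hooks/hooks.json",
  ".claude-plugin/plugin.json",
  "CLAUDE.md",
  "hooks/scripts/", // trailing slash = protect the whole directory
];

function isProtectedWrite(targetPath) {
  const normalized = targetPath.replace(/^\.\//, "");
  return PROTECTED.some((p) =>
    p.endsWith("/") ? normalized.startsWith(p) : normalized === p
  );
}
```

Wired into a PreToolUse hook on Write/Edit, a true result blocks the call, closing the self-modification paths listed in the mappings above.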

Cross-Cutting Concerns

ASI vs LLM01-LLM10 Relationship

The OWASP LLM Top 10 covers model-level risks. The OWASP Agentic Top 10 covers risks that emerge specifically from autonomous, tool-using, multi-agent architectures. Many ASI categories amplify LLM risks:

| LLM Risk | Agentic Amplification |
|---|---|
| LLM01 Prompt Injection | Becomes ASI01 (goal hijack with tool execution) |
| LLM06 Excessive Agency | Becomes ASI02 (tool misuse) + ASI03 (privilege abuse) |
| LLM03 Supply Chain | Becomes ASI04 (runtime plugin/MCP compromise) |
| LLM08 Vector Weaknesses | Becomes ASI06 (memory poisoning with persistence) |

ASI vs DeepMind AI Agent Traps

The DeepMind "AI Agent Traps" taxonomy (April 2026) classifies attacks by technique rather than by risk category. Each ASI risk maps to one or more trap categories:

| ASI Risk | DeepMind Trap Categories | Key Techniques |
|---|---|---|
| ASI01 Goal Hijack | Cat. 1 (Content Injection), Cat. 2 (Semantic Manipulation) | Steganography, syntactic masking, oversight evasion, context normalization |
| ASI02 Tool Misuse | Cat. 5 (Capability Manipulation) | Bash evasion, tool descriptor poisoning, ambiguous prompt exploitation |
| ASI03 Privilege Abuse | Cat. 5 (Capability Manipulation) | Privilege escalation, credential access via env vars |
| ASI04 Supply Chain | Cat. 5 (Capability Manipulation) | Compromised packages, MCP descriptor drift |
| ASI05 Code Execution | Cat. 5 (Capability Manipulation) | Parameter expansion evasion, eval injection |
| ASI06 Memory Poisoning | Cat. 3 (Context Manipulation) | CLAUDE.md poisoning, REMEMBER.md manipulation, rule injection |
| ASI07 Inter-Agent Comms | Cat. 4 (Multi-Agent Exploitation) | Sub-agent spawning, delegation abuse, trust chain attacks |
| ASI08 Cascading Failures | Cat. 4 (Multi-Agent Exploitation) | Escalation-after-input, poisoned shared state |
| ASI09 Trust Exploitation | Cat. 6 (HITL Exploitation), Cat. 2 (Semantic Manipulation) | Approval urgency, summary suppression, cognitive load traps |
| ASI10 Rogue Agents | Cat. 3 (Context Manipulation), Cat. 5 (Capability Manipulation) | Self-modification, persistent background processes |

See knowledge/deepmind-agent-traps.md for the full 6-category taxonomy with per-technique coverage status and plugin control mappings.

Claude Code Security Posture Checklist

For scanning agents assessing a Claude Code project against ASI categories:

| Check | ASI | Risk if Missing |
|---|---|---|
| pre-bash-destructive.mjs hook present | ASI02, ASI05 | Unrestricted code execution |
| pre-write-pathguard.mjs blocks hook/plugin paths | ASI10 | Rogue agent persistence |
| pre-edit-secrets.mjs hook present | ASI03 | Credential exfiltration |
| All skills declare minimal allowed-tools | ASI02 | Over-privileged tool use |
| MCP servers scoped and reviewed | ASI02, ASI04 | Supply chain + tool misuse |
| No --dangerously-skip-permissions in production | ASI09 | No human oversight layer |
| CLAUDE.md and plugin.json not writable by agents | ASI10 | Self-modification |
| Inter-agent state files (REMEMBER.md) version-controlled | ASI06, ASI08 | Context poisoning |
| Subagent task prompts do not include raw secret values | ASI03 | Credential leakage |
| Pipeline depth limits defined for multi-agent workflows | ASI08 | Cascading failures |

Severity Classification for Automated Scanning

| Severity | Criteria | ASI Categories |
|---|---|---|
| Critical | Direct code execution or credential exfiltration possible | ASI02, ASI03, ASI05 |
| High | Agent goal or memory manipulation with persistence | ASI01, ASI06, ASI10 |
| Medium | Supply chain or inter-agent trust boundary violation | ASI04, ASI07, ASI08 |
| Low | Human oversight weakness; requires user interaction | ASI09 |
| Informational | Cascading risk only if other ASI also present | ASI08 |

Source: OWASP GenAI Security Project, "OWASP Top 10 for Agentic Applications (2026)". Released: December 2025 | https://genai.owasp.org
Claude Code mappings authored for llm-security plugin v0.1; updated v5.0 with AI Agent Traps cross-references.