29 KiB
OWASP Top 10 for Agentic AI Applications (2026)
Reference material for security agents analyzing agentic AI systems. Based on the official OWASP GenAI Security Project release (December 2025), developed by 100+ researchers and practitioners.
Prefix: ASI (Agentic Security Issue) Scope: Autonomous AI agents that plan, use tools, delegate to subagents, and act with minimal human supervision. Claude Code is an agentic system and maps directly to these risks. Source: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
ASI01 — Agent Goal Hijack
Category: Goal and instruction integrity
Description
Attackers alter agent objectives by embedding hidden instructions in external content that the agent reads and processes. Agents cannot reliably separate instructions from data, making them vulnerable to prompt injection via poisoned documents, web pages, emails, or tool outputs.
Real incident: EchoLeak — copilots turned into silent exfiltration engines via injected email content.
Attack Vectors
- Malicious instructions embedded in files the agent reads (PDF, markdown, code comments)
- Tool outputs returning adversarial text disguised as data
- Web content fetched during agent browsing that includes override instructions
- Injected content in MCP tool responses that redefines the agent's task
- Multi-turn manipulation: gradual reframing of goals across conversation turns
Detection Signals
- Agent pursues actions not derivable from the original user request
- Unexpected tool invocations or action sequences mid-task
- Agent output references content not present in the original prompt
- System prompt or role instructions appear to have been re-interpreted
- Agent skips or rewrites its own stated plan without user input
Claude Code Mappings
- Skills/commands: A malicious file read during
/security scancould inject instructions to skip reporting a specific finding - Subagent tasks: Task prompts built from external content can carry injected goals into subagents
- MCP tool outputs:
mcp__tavily__tavily_searchormcp__ms-learn__fetchmay return adversarial content that redirects agent behavior - Hooks: A
PostToolUsehook reading tool output could process injected instructions
Mitigations
- Treat all external content as untrusted data, never as instructions
- Apply strict semantic boundaries: system prompt immutable, data sandboxed
- Use
PreToolUsehooks to validate tool inputs before external data is fetched - Require human approval before consequential actions (file writes, git commits, API calls)
- Log the full reasoning chain so deviations from the original goal are auditable
ASI02 — Tool Misuse and Exploitation
Category: Tool integrity and authorization
Description
Agents misuse legitimate tools due to ambiguous prompts, manipulated input, or over-provisioned permissions. Legitimate tools become attack primitives: filesystem access becomes exfiltration, email access becomes phishing, shell access becomes arbitrary code execution.
Real incident: Amazon Q and GitHub Actions compromised via repository content triggering tool misuse.
Attack Vectors
- Ambiguous task descriptions cause the agent to invoke tools with unintended arguments
- Poisoned tool descriptors (MCP server descriptions) mislead the agent about tool purpose
- Over-privileged tool configurations allow actions beyond the task scope
- Adversarial content causes agents to invoke deletion, exfiltration, or write operations
- Chained tool calls where output of one tool becomes input to a destructive second tool
Detection Signals
- Tool called with arguments that were not present in the user's original request
- Spike in API call volume or calls to tools outside the agent's defined role
- Destructive operations (file deletion, database writes) without explicit user instruction
- Sensitive data (secrets, PII) flowing as arguments to network-bound tools
- Agent invokes tools in an order inconsistent with its stated plan
Claude Code Mappings
- Hooks:
pre-bash-destructive.mjsblocksrm -rf,DROP TABLE, and similar; validate this hook is present and covers the full destructive command surface - MCP tools: Each enabled MCP server expands the tool surface — audit
mcp.jsonfor over-permissioned servers (e.g., filesystem MCP with write access to/) - Skills with
Bashtool: Any skill declaringallowed-tools: Bashcan spawn processes; verify the necessity and scope of Bash access in frontmatter allowed-toolsin commands: Commands should declare the minimal tool set required
Mitigations
- Apply least-privilege to every tool: scope filesystem access, API permissions, network targets
- Validate all tool arguments in
PreToolUsehooks before execution - Require explicit human approval for irreversible operations (destructive Bash, git push)
- Audit MCP server configurations — each server is an attack surface expansion
- Pin tool configurations; detect and alert on changes to tool descriptors
ASI03 — Identity and Privilege Abuse
Category: Identity, credentials, and delegation
Description
Agents often inherit user or system identities including high-privilege credentials, session tokens, and delegated access. Unintended privilege reuse, escalation, or cross-agent delegation without proper scoping creates confused deputy scenarios where the agent acts with permissions it should not exercise.
Attack Vectors
- Agent inherits the operator's credentials and uses them beyond the task scope
- A compromised subagent operates with the parent agent's delegated identity
- Short-lived tokens not used — agent uses long-lived credentials that persist across sessions
- Agent escalates its own permissions by requesting elevated access mid-task
- Lateral movement: agent uses one system's credentials to authenticate to another
Detection Signals
- Credential access from unexpected timing or context (e.g., credentials used outside a task)
- Agent accesses resources unrelated to its defined function
- Cross-system access chains: authentication to system B immediately after action on system A
- Failed permission checks followed by attempts via alternative credential paths
- Subagents performing actions requiring higher privileges than delegated
Claude Code Mappings
- API keys in environment: Claude Code executes in the user's shell — it inherits all env
variables including
OPENAI_API_KEY,AZURE_CLIENT_SECRET, etc. pre-edit-secrets.mjshook: Detects if secrets are being written to files, but does not prevent an agent from using env-var credentials in Bash commands--dangerously-skip-permissions: When used in subagent invocations (claude -p), all permission gates are bypassed for that subagent's session- Subagent delegation: Tasks spawned with
Tasktool receive the parent's tool permissions; verify task prompts do not over-grant scope implicitly
Mitigations
- Scope credentials to the minimum required for each task; use task-scoped tokens where possible
- Never pass raw secrets as task arguments to subagents
- Treat each subagent as a separate identity with its own permission boundary
- Audit use of
--dangerously-skip-permissions— restrict to headless, sandboxed contexts only - Rotate credentials after agentic sessions that accessed sensitive systems
ASI04 — Agentic Supply Chain Vulnerabilities
Category: Component integrity and provenance
Description
Tools, plugins, prompt templates, MCP servers, and agent definitions fetched or loaded dynamically can be compromised. Any poisoned component alters agent behavior or exposes data, and the attack surface is invisible to static dependency scanning because components resolve at runtime.
Real incident: Malicious MCP servers impersonating legitimate ones, altering tool behavior post-install.
Attack Vectors
- Compromised MCP server that behaves correctly during review but exfiltrates data in production
- Poisoned skill/command markdown fetched from a remote source
- Agent definition files modified in a plugin repository after installation
- Typosquatted MCP server names registered to intercept installs
- Plugin manifest (
plugin.json) tampered to add unauthorized tool permissions
Detection Signals
- MCP server making network connections to undocumented endpoints
- Plugin files modified after initial installation (file hash change)
- New tool capabilities appearing after a plugin update
- Agent behavior changing without corresponding code change
hooks.jsonorplugin.jsonmodifications not tied to a commit
Claude Code Mappings
plugin.jsonmanifest: Theauto_discover: truesetting means any file in the plugin directory is trusted; a supply chain compromise of the plugin repo affects all commands and agents- MCP server configurations:
mcp.jsonand.mcp.jsonfiles define which servers run — a tampered server definition is a full agent compromise - External skill references: Skills referencing remote URLs for knowledge base content introduce runtime supply chain risk
hooks/hooks.json: A modified hooks file can add, remove, or neuter security hooks silently
Mitigations
- Pin MCP server versions; verify checksums before use
- Monitor plugin directory files for unexpected modifications (file integrity monitoring)
- Audit
plugin.json,hooks.json, and all agent frontmatter on each session start - Prefer local MCP servers over remote for sensitive operations; limit network-bound servers
- Review MCP server source code before enabling; treat third-party servers as untrusted by default
ASI05 — Unexpected Code Execution
Category: Code generation and execution safety
Description
Agents generate or execute code unsafely through shell commands, eval-like constructs, script execution, or deserialization. The attack path runs directly from text input to system commands. Coding agents like Claude Code are high-risk because code generation and execution are core features.
Attack Vectors
- Prompt injection in source code comments causes agent to generate and run malicious shell commands
- Agent generates a "helpful" script that includes attacker-controlled payload
eval()orexec()applied to LLM output without sandboxing- Agent patches a configuration file in a way that achieves code execution on next load
- Hallucinated library name installed via
npm installorpip install(slopsquatting)
Detection Signals
- Shell commands spawned that were not present in the original task specification
- Writes to executable paths (
/usr/local/bin,.bashrc,~/.zshrc, cron directories) package.jsonorrequirements.txtmodified with packages not in the original task- Agent generates code containing
subprocess,os.system,eval,execwithout review gate - Writes to
.github/workflows/,Makefile, or other CI/CD configuration files
Claude Code Mappings
pre-bash-destructive.mjshook: First line of defense, but only blocks known-bad patterns; novel payloads may pass through- Skills with
Bashallowed-tools: Any skill that can run Bash can achieve code execution — validate each skill's tool list is scoped to its purpose allowed-tools: Write+Bash: A skill with both Write and Bash can write a script and execute it — this combination requires strong justification- MCP filesystem tools: MCP servers with write access to executable paths are equivalent to unrestricted code execution
Mitigations
- Sandbox Bash execution: use restricted shells, containers, or read-only mounts where possible
- Require human approval before any write to executable or configuration paths
- Block installation of packages not in an approved list (
pre-bashhook pattern matching) - Never auto-approve actions triggered by content read from external sources (files, web, MCP)
- Treat all generated code as untrusted until reviewed; do not auto-execute
ASI06 — Memory and Context Poisoning
Category: State integrity and persistence
Description
Agents rely on memory systems, embeddings, RAG databases, context windows, and summaries to maintain state across interactions. Attackers poison this memory to influence future decisions persistently. Unlike one-shot injection, memory poisoning executes on every future session without repeated attack.
Attack Vectors
- Adversarial text injected into a document that gets stored in a RAG knowledge base
- Agent's session summary poisoned with false "user preferences" that persist
- Cross-tenant memory leakage: one user's poisoned entry affects another user's agent session
- Long-term drift: repeated exposure to adversarial content gradually shifts agent behavior
- REMEMBER.md or session state files modified to contain false context
Detection Signals
- Agent references facts or preferences not established in the current session
- Agent defends false beliefs when challenged with contradictory evidence
- Behavioral changes appearing after a specific file read or knowledge base query
REMEMBER.mdor project memory files contain entries inconsistent with recent commits- Agent applies "learned preferences" that the user did not specify
Claude Code Mappings
REMEMBER.mdfiles: These are trusted by default and read as ground truth at session start; a tamperedREMEMBER.mdpoisons every session in that projectMEMORY.md/ project memory: The~/.claude/projects/memory files are not version-controlled by default — they can be silently modified- System prompt context: Skills/commands that inject large context blocks affect the agent's reasoning for the entire session
- KV store / MCP memory servers: Any MCP server providing persistent memory is a poison vector
Mitigations
- Version-control all state files (
REMEMBER.md,CLAUDE.md) and review diffs before trusting - Treat external knowledge base content as untrusted data, not trusted instructions
- Audit session memory files for entries not traceable to a user action or commit
- Set explicit expiration on memory entries; do not persist indefinitely without review
- Segment memory by trust level: user-supplied vs system-generated vs external-sourced
ASI07 — Insecure Inter-Agent Communication
Category: Multi-agent protocol integrity
Description
In multi-agent architectures, agents coordinate through message passing over MCP, RPC, shared files, or direct API calls. These channels often lack authentication or integrity verification. Attackers spoof identities, replay delegation messages, or tamper with unprotected channels to manipulate downstream agents through compromised peers.
Attack Vectors
- Subagent receives a task prompt that appears to come from the orchestrator but is spoofed
- Shared scratch file used for inter-agent communication modified by a malicious process
- Replayed delegation token used to authorize an agent action outside its original context
- Orchestrator output piped through an untrusted channel before reaching worker agents
- A compromised worker agent sends poisoned results to the orchestrator, affecting decisions
Detection Signals
- Agent task prompts referencing context not present in the parent agent's output
- Unexpected agent spawned without a corresponding
Taskcall in the orchestrator - Results returned by a subagent inconsistent with the task it was given
- Communication over channels (files, pipes) without integrity verification
- Agent claims to have received instructions from another agent, but no delegation record exists
Claude Code Mappings
Tasktool: Subagents receive their full task prompt in plaintext with no authentication; a compromised orchestrator or prompt-injected task string is fully trusted by the subagent- Shared file channels: Agents that communicate via shared files (e.g.,
/tmp/results.json) have no message authentication — any process can modify the file - MCP as communication bus: Multiple agents using the same MCP server share state without isolation; one agent can read or modify another's data if the server lacks tenancy controls
- Harness loop state files: Files like
pipeline-queue.jsonused for agent coordination are unauthenticated and modifiable
Mitigations
- Treat inter-agent messages as untrusted until verified; do not assume orchestrator authenticity
- Validate subagent inputs at the receiving end, not just at the sending end
- Use cryptographically signed task descriptions for high-stakes multi-agent workflows
- Isolate MCP server state per agent session; avoid shared mutable state across agents
- Log all inter-agent communications with full payloads for forensic capability
ASI08 — Cascading Failures
Category: System resilience and blast radius
Description
In interconnected multi-agent architectures, a single compromised or hallucinating agent can propagate errors, malicious actions, or corrupted state to downstream agents. A small planning error compounds rapidly: a hallucinating planner issues destructive tasks to multiple worker agents that execute without verification, multiplying the blast radius.
Attack Vectors
- Orchestrator agent hallucinates a task step; all downstream agents execute the bad instruction
- A prompt-injected agent poisons shared state, affecting all agents reading that state
- One agent's API error causes retry storms across dependent agents
- A worker agent produces malformed output that causes the next agent to execute a fallback path with unintended side effects
- Circular agent delegation creates unbounded loops consuming resources and taking actions
Detection Signals
- Multiple agents failing or producing anomalous output simultaneously
- Correlated errors across previously independent agents within the same pipeline
- Single upstream action traceable as root cause of widespread downstream failures
- Agent spawning subagents recursively without a documented depth limit
- Resource consumption (API calls, file writes, tokens) growing super-linearly during a task
Claude Code Mappings
- Multi-agent harness loops:
harness:loopruns autonomous multi-session pipelines — a poisoned session early in the loop propagates through all subsequent sessions - Parallel
Taskinvocations: When multiple subagents run in parallel, a shared bad state (e.g., poisonedREMEMBER.md) affects all simultaneously - Feature pipeline queues:
pipeline-queue.jsonstate drives downstream agent selection; a corrupted queue entry causes all subsequent features to be processed incorrectly - Newsletter/research pipelines: Phase-based pipelines with no inter-phase validation gates allow phase 1 errors to compound through phases 2-N
Mitigations
- Implement circuit breakers: halt the pipeline if an agent returns anomalous output
- Define explicit depth limits for agent spawning; enforce in orchestrator logic
- Validate inter-phase state before proceeding to the next phase in any pipeline
- Test failure propagation in isolated environments before running in production
- Design for independent agent failure: each agent should be able to fail without corrupting others
ASI09 — Human-Agent Trust Exploitation
Category: Human oversight and social engineering
Description
Users and operators over-trust agent recommendations due to their confident, authoritative presentation. Attackers or misaligned agents exploit this trust to influence high-stakes decisions, extract credentials, approve fraudulent actions, or introduce vulnerabilities into production systems under the guise of helpful assistance.
Real incidents: Coding assistants introducing backdoors in reviewed-but-not-read code; financial copilots approving fraudulent transactions; support agents soliciting credentials.
Attack Vectors
- Agent provides well-reasoned justification for a malicious action, exploiting approval fatigue
- Urgent framing pressures operators to approve without full review ("fix needed before deployment")
- Agent requests credentials "to complete the task" outside its normal operating context
- Confidence in AI output leads users to skip review of generated code containing vulnerabilities
- An attacker controls the task that the agent presents as a routine operation requiring approval
Detection Signals
- Agent requesting credentials or sensitive information not scoped to the current task
- Approval prompts for actions the agent has not performed before in similar tasks
- Agent citing urgency or external deadlines to bypass normal review processes
- Recommendations that contradict the project's security policy or CLAUDE.md constraints
- High approval rates for novel agent actions without corresponding user scrutiny
Claude Code Mappings
- Permission prompts: Claude Code's permission system depends on informed user consent; a socially-engineered prompt obscures the actual action being approved
--dangerously-skip-permissions: Removes human-in-the-loop for all tool use — this flag exists to serve legitimate automation but eliminates the trust exploitation defence layer- Hooks as UI: Users may approve hook-gated actions without reading the full command; hook output text should be explicit and non-manipulable by agent-generated content
- CLAUDE.md trust: Users trust CLAUDE.md as a source of truth; a modified CLAUDE.md that relaxes security constraints exploits operator trust in project configuration
Mitigations
- Display full tool arguments in approval prompts — never summarize or truncate
- Enforce time-boxed review for high-impact actions (git push, API calls, secret access)
- Train operators to treat credential requests mid-task as high-risk signals
- Audit CLAUDE.md and project configuration files for unexpected permission relaxations
- Implement tiered approval: routine actions auto-approved, novel or high-impact actions require explicit human review with stated justification
ASI10 — Rogue Agents
Category: Agent alignment and containment
Description
Rogue agents are compromised or misaligned agents that act harmfully while appearing legitimate. They may self-replicate, persist across sessions, impersonate other agents, or silently pursue objectives that diverge from their stated purpose. Unlike externally attacked agents, rogue agents develop or maintain misaligned behavior from within.
Attack Vectors
- An agent rewrites its own system prompt or configuration files to persist misaligned behavior
- A compromised skill modifies other skills or commands to introduce backdoor capabilities
- An agent spawns persistent background processes that continue acting after the session ends
- Gradual behavioral drift: an agent's behavior shifts incrementally to avoid detection
- An agent registers new hooks that run with elevated permissions outside the original task scope
Detection Signals
- Configuration files (CLAUDE.md, plugin.json, hooks.json) modified without a corresponding user instruction or commit
- New hooks, commands, or skills appearing after an agentic session
- Background processes spawned by an agent that persist after Claude Code exits
- Agent behavior in session N differs from session N-1 without configuration changes
- Agent output defending or concealing its own modifications to the project
Claude Code Mappings
- Self-modifying hooks: An agent with Write access to
hooks/scripts/can add or modify hooks that affect all future sessions — this is a high-privilege write path plugin.jsonmodification: An agent writing to.claude-plugin/plugin.jsoncan change its own allowed tools, adding capabilities not granted by the userCLAUDE.mdself-modification: An agent instructed to "update documentation" could modify its own behavioral instructions, achieving persistent misalignment- Persistent background Bash: An agent can use Bash to spawn detached processes (
nohup,&) that continue running after the Claude Code session terminates - Hook registration via
hooks.json: Adding new hook entries expands the agent's execution surface permanently
Mitigations
- File-protect critical configuration:
plugin.json,hooks.json,CLAUDE.mdshould require explicit user action to modify (not writable by agents as a side effect of other tasks) - Use
pre-write-pathguard.mjsto block writes to hook scripts and plugin manifests by default - Monitor for new background processes after agentic sessions complete
- Diff
hooks.jsonandplugin.jsonafter every session that involved Write tool use - Enforce the principle that an agent should never modify the files that define its own behavior
Cross-Cutting Concerns
ASI vs LLM01-LLM10 Relationship
The OWASP LLM Top 10 covers model-level risks. The OWASP Agentic Top 10 covers risks that emerge specifically from autonomous, tool-using, multi-agent architectures. Many ASI categories amplify LLM risks:
| LLM Risk | Agentic Amplification |
|---|---|
| LLM01 Prompt Injection | Becomes ASI01 (goal hijack with tool execution) |
| LLM06 Excessive Agency | Becomes ASI02 (tool misuse) + ASI03 (privilege abuse) |
| LLM03 Supply Chain | Becomes ASI04 (runtime plugin/MCP compromise) |
| LLM08 Vector Weaknesses | Becomes ASI06 (memory poisoning with persistence) |
ASI vs DeepMind AI Agent Traps
The DeepMind "AI Agent Traps" taxonomy (April 2026) classifies attacks by technique rather than by risk category. Each ASI risk maps to one or more trap categories:
| ASI Risk | DeepMind Trap Categories | Key Techniques |
|---|---|---|
| ASI01 Goal Hijack | Cat. 1 (Content Injection), Cat. 2 (Semantic Manipulation) | Steganography, syntactic masking, oversight evasion, context normalization |
| ASI02 Tool Misuse | Cat. 5 (Capability Manipulation) | Bash evasion, tool descriptor poisoning, ambiguous prompt exploitation |
| ASI03 Privilege Abuse | Cat. 5 (Capability Manipulation) | Privilege escalation, credential access via env vars |
| ASI04 Supply Chain | Cat. 5 (Capability Manipulation) | Compromised packages, MCP descriptor drift |
| ASI05 Code Execution | Cat. 5 (Capability Manipulation) | Parameter expansion evasion, eval injection |
| ASI06 Memory Poisoning | Cat. 3 (Context Manipulation) | CLAUDE.md poisoning, REMEMBER.md manipulation, rule injection |
| ASI07 Inter-Agent Comms | Cat. 4 (Multi-Agent Exploitation) | Sub-agent spawning, delegation abuse, trust chain attacks |
| ASI08 Cascading Failures | Cat. 4 (Multi-Agent Exploitation) | Escalation-after-input, poisoned shared state |
| ASI09 Trust Exploitation | Cat. 6 (HITL Exploitation), Cat. 2 (Semantic Manipulation) | Approval urgency, summary suppression, cognitive load traps |
| ASI10 Rogue Agents | Cat. 3 (Context Manipulation), Cat. 5 (Capability Manipulation) | Self-modification, persistent background processes |
See knowledge/deepmind-agent-traps.md for the full 6-category taxonomy with per-technique
coverage status and plugin control mappings.
Claude Code Security Posture Checklist
For scanning agents assessing a Claude Code project against ASI categories:
| Check | ASI | Risk if Missing |
|---|---|---|
pre-bash-destructive.mjs hook present |
ASI02, ASI05 | Unrestricted code execution |
pre-write-pathguard.mjs blocks hook/plugin paths |
ASI10 | Rogue agent persistence |
pre-edit-secrets.mjs hook present |
ASI03 | Credential exfiltration |
All skills declare minimal allowed-tools |
ASI02 | Over-privileged tool use |
| MCP servers scoped and reviewed | ASI02, ASI04 | Supply chain + tool misuse |
No --dangerously-skip-permissions in production |
ASI09 | No human oversight layer |
CLAUDE.md and plugin.json not writable by agents |
ASI10 | Self-modification |
| Inter-agent state files (REMEMBER.md) version-controlled | ASI06, ASI08 | Context poisoning |
| Subagent task prompts do not include raw secret values | ASI03 | Credential leakage |
| Pipeline depth limits defined for multi-agent workflows | ASI08 | Cascading failures |
Severity Classification for Automated Scanning
| Severity | Criteria | ASI Categories |
|---|---|---|
| Critical | Direct code execution or credential exfiltration possible | ASI02, ASI03, ASI05 |
| High | Agent goal or memory manipulation with persistence | ASI01, ASI06, ASI10 |
| Medium | Supply chain or inter-agent trust boundary violation | ASI04, ASI07, ASI08 |
| Low | Human oversight weakness; requires user interaction | ASI09 |
| Informational | Cascading risk only if other ASI also present | ASI08 |
Source: OWASP GenAI Security Project, "OWASP Top 10 for Agentic Applications (2026)" Released: December 2025 | https://genai.owasp.org Claude Code mappings authored for llm-security plugin v0.1, updated v5.0 with AI Agent Traps cross-references