515 lines
29 KiB
Markdown
515 lines
29 KiB
Markdown
# OWASP Top 10 for Agentic AI Applications (2026)
|
|
|
|
Reference material for security agents analyzing agentic AI systems. Based on the official OWASP
|
|
GenAI Security Project release (December 2025), developed by 100+ researchers and practitioners.
|
|
|
|
**Prefix:** ASI (Agentic Security Issue)
|
|
**Scope:** Autonomous AI agents that plan, use tools, delegate to subagents, and act with minimal
|
|
human supervision. Claude Code is an agentic system and maps directly to these risks.
|
|
**Source:** https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
|
|
|
|
---
|
|
|
|
## ASI01 — Agent Goal Hijack
|
|
|
|
**Category:** Goal and instruction integrity | **MITRE ATLAS:** AML.T0051 (LLM Prompt Injection), AML.T0058 (AI Agent Context Poisoning)
|
|
|
|
### Description
|
|
Attackers alter agent objectives by embedding hidden instructions in external content that the agent
|
|
reads and processes. Agents cannot reliably separate instructions from data, making them vulnerable
|
|
to prompt injection via poisoned documents, web pages, emails, or tool outputs.
|
|
|
|
Real incident: EchoLeak — copilots turned into silent exfiltration engines via injected email content.
|
|
|
|
### Attack Vectors
|
|
- Malicious instructions embedded in files the agent reads (PDF, markdown, code comments)
|
|
- Tool outputs returning adversarial text disguised as data
|
|
- Web content fetched during agent browsing that includes override instructions
|
|
- Injected content in MCP tool responses that redefines the agent's task
|
|
- Multi-turn manipulation: gradual reframing of goals across conversation turns
|
|
|
|
### Detection Signals
|
|
- Agent pursues actions not derivable from the original user request
|
|
- Unexpected tool invocations or action sequences mid-task
|
|
- Agent output references content not present in the original prompt
|
|
- System prompt or role instructions appear to have been re-interpreted
|
|
- Agent skips or rewrites its own stated plan without user input
|
|
|
|
### Claude Code Mappings
|
|
- **Skills/commands:** A malicious file read during `/security scan` could inject instructions to skip
|
|
reporting a specific finding
|
|
- **Subagent tasks:** Task prompts built from external content can carry injected goals into subagents
|
|
- **MCP tool outputs:** `mcp__tavily__tavily_search` or `mcp__ms-learn__fetch` may return adversarial
|
|
content that redirects agent behavior
|
|
- **Hooks:** A `PostToolUse` hook reading tool output could process injected instructions
|
|
|
|
### Mitigations
|
|
- Treat all external content as untrusted data, never as instructions
|
|
- Apply strict semantic boundaries: system prompt immutable, data sandboxed
|
|
- Use `PreToolUse` hooks to validate tool inputs before external data is fetched
|
|
- Require human approval before consequential actions (file writes, git commits, API calls)
|
|
- Log the full reasoning chain so deviations from the original goal are auditable
|
|
|
|
---
|
|
|
|
## ASI02 — Tool Misuse and Exploitation
|
|
|
|
**Category:** Tool integrity and authorization | **MITRE ATLAS:** AML.T0061 (AI Agent Tools)
|
|
|
|
### Description
|
|
Agents misuse legitimate tools due to ambiguous prompts, manipulated input, or over-provisioned
|
|
permissions. Legitimate tools become attack primitives: filesystem access becomes exfiltration,
|
|
email access becomes phishing, shell access becomes arbitrary code execution.
|
|
|
|
Real incident: Amazon Q and GitHub Actions compromised via repository content triggering tool misuse.
|
|
|
|
### Attack Vectors
|
|
- Ambiguous task descriptions cause the agent to invoke tools with unintended arguments
|
|
- Poisoned tool descriptors (MCP server descriptions) mislead the agent about tool purpose
|
|
- Over-privileged tool configurations allow actions beyond the task scope
|
|
- Adversarial content causes agents to invoke deletion, exfiltration, or write operations
|
|
- Chained tool calls where output of one tool becomes input to a destructive second tool
|
|
|
|
### Detection Signals
|
|
- Tool called with arguments that were not present in the user's original request
|
|
- Spike in API call volume or calls to tools outside the agent's defined role
|
|
- Destructive operations (file deletion, database writes) without explicit user instruction
|
|
- Sensitive data (secrets, PII) flowing as arguments to network-bound tools
|
|
- Agent invokes tools in an order inconsistent with its stated plan
|
|
|
|
### Claude Code Mappings
|
|
- **Hooks:** `pre-bash-destructive.mjs` blocks `rm -rf`, `DROP TABLE`, and similar; validate this
|
|
hook is present and covers the full destructive command surface
|
|
- **MCP tools:** Each enabled MCP server expands the tool surface — audit `mcp.json` for
|
|
over-permissioned servers (e.g., filesystem MCP with write access to `/`)
|
|
- **Skills with `Bash` tool:** Any skill declaring `allowed-tools: Bash` can spawn processes;
|
|
verify the necessity and scope of Bash access in frontmatter
|
|
- **`allowed-tools` in commands:** Commands should declare the minimal tool set required
|
|
|
|
### Mitigations
|
|
- Apply least-privilege to every tool: scope filesystem access, API permissions, network targets
|
|
- Validate all tool arguments in `PreToolUse` hooks before execution
|
|
- Require explicit human approval for irreversible operations (destructive Bash, git push)
|
|
- Audit MCP server configurations — each server is an attack surface expansion
|
|
- Pin tool configurations; detect and alert on changes to tool descriptors
|
|
|
|
---
|
|
|
|
## ASI03 — Identity and Privilege Abuse
|
|
|
|
**Category:** Identity, credentials, and delegation | **MITRE ATLAS:** AML.T0012 (Valid Accounts)
|
|
|
|
### Description
|
|
Agents often inherit user or system identities including high-privilege credentials, session tokens,
|
|
and delegated access. Unintended privilege reuse, escalation, or cross-agent delegation without
|
|
proper scoping creates confused deputy scenarios where the agent acts with permissions it should not
|
|
exercise.
|
|
|
|
### Attack Vectors
|
|
- Agent inherits the operator's credentials and uses them beyond the task scope
|
|
- A compromised subagent operates with the parent agent's delegated identity
|
|
- Short-lived tokens not used — agent uses long-lived credentials that persist across sessions
|
|
- Agent escalates its own permissions by requesting elevated access mid-task
|
|
- Lateral movement: agent uses one system's credentials to authenticate to another
|
|
|
|
### Detection Signals
|
|
- Credential access from unexpected timing or context (e.g., credentials used outside a task)
|
|
- Agent accesses resources unrelated to its defined function
|
|
- Cross-system access chains: authentication to system B immediately after action on system A
|
|
- Failed permission checks followed by attempts via alternative credential paths
|
|
- Subagents performing actions requiring higher privileges than delegated
|
|
|
|
### Claude Code Mappings
|
|
- **API keys in environment:** Claude Code executes in the user's shell — it inherits all env
|
|
variables including `OPENAI_API_KEY`, `AZURE_CLIENT_SECRET`, etc.
|
|
- **`pre-edit-secrets.mjs` hook:** Detects if secrets are being written to files, but does not
|
|
prevent an agent from using env-var credentials in Bash commands
|
|
- **`--dangerously-skip-permissions`:** When used in subagent invocations (`claude -p`), all
|
|
permission gates are bypassed for that subagent's session
|
|
- **Subagent delegation:** Tasks spawned with `Task` tool receive the parent's tool permissions;
|
|
verify task prompts do not over-grant scope implicitly
|
|
|
|
### Mitigations
|
|
- Scope credentials to the minimum required for each task; use task-scoped tokens where possible
|
|
- Never pass raw secrets as task arguments to subagents
|
|
- Treat each subagent as a separate identity with its own permission boundary
|
|
- Audit use of `--dangerously-skip-permissions` — restrict to headless, sandboxed contexts only
|
|
- Rotate credentials after agentic sessions that accessed sensitive systems
|
|
|
|
---
|
|
|
|
## ASI04 — Agentic Supply Chain Vulnerabilities
|
|
|
|
**Category:** Component integrity and provenance | **MITRE ATLAS:** AML.T0010 (ML Supply Chain Compromise)
|
|
|
|
### Description
|
|
Tools, plugins, prompt templates, MCP servers, and agent definitions fetched or loaded dynamically
|
|
can be compromised. Any poisoned component alters agent behavior or exposes data, and the attack
|
|
surface is invisible to static dependency scanning because components resolve at runtime.
|
|
|
|
Real incident: Malicious MCP servers impersonating legitimate ones, altering tool behavior post-install.
|
|
|
|
### Attack Vectors
|
|
- Compromised MCP server that behaves correctly during review but exfiltrates data in production
|
|
- Poisoned skill/command markdown fetched from a remote source
|
|
- Agent definition files modified in a plugin repository after installation
|
|
- Typosquatted MCP server names registered to intercept installs
|
|
- Plugin manifest (`plugin.json`) tampered to add unauthorized tool permissions
|
|
|
|
### Detection Signals
|
|
- MCP server making network connections to undocumented endpoints
|
|
- Plugin files modified after initial installation (file hash change)
|
|
- New tool capabilities appearing after a plugin update
|
|
- Agent behavior changing without corresponding code change
|
|
- `hooks.json` or `plugin.json` modifications not tied to a commit
|
|
|
|
### Claude Code Mappings
|
|
- **`plugin.json` manifest:** The `auto_discover: true` setting means any file in the plugin
|
|
directory is trusted; a supply chain compromise of the plugin repo affects all commands and agents
|
|
- **MCP server configurations:** `mcp.json` and `.mcp.json` files define which servers run —
|
|
a tampered server definition is a full agent compromise
|
|
- **External skill references:** Skills referencing remote URLs for knowledge base content introduce
|
|
runtime supply chain risk
|
|
- **`hooks/hooks.json`:** A modified hooks file can add, remove, or neuter security hooks silently
|
|
|
|
### Mitigations
|
|
- Pin MCP server versions; verify checksums before use
|
|
- Monitor plugin directory files for unexpected modifications (file integrity monitoring)
|
|
- Audit `plugin.json`, `hooks.json`, and all agent frontmatter on each session start
|
|
- Prefer local MCP servers over remote for sensitive operations; limit network-bound servers
|
|
- Review MCP server source code before enabling; treat third-party servers as untrusted by default
|
|
|
|
---
|
|
|
|
## ASI05 — Unexpected Code Execution
|
|
|
|
**Category:** Code generation and execution safety | **MITRE ATLAS:** AML.T0011 (User Execution)
|
|
|
|
### Description
|
|
Agents generate or execute code unsafely through shell commands, eval-like constructs, script
|
|
execution, or deserialization. The attack path runs directly from text input to system commands.
|
|
Coding agents like Claude Code are high-risk because code generation and execution are core features.
|
|
|
|
### Attack Vectors
|
|
- Prompt injection in source code comments causes agent to generate and run malicious shell commands
|
|
- Agent generates a "helpful" script that includes attacker-controlled payload
|
|
- `eval()` or `exec()` applied to LLM output without sandboxing
|
|
- Agent patches a configuration file in a way that achieves code execution on next load
|
|
- Hallucinated library name installed via `npm install` or `pip install` (slopsquatting)
|
|
|
|
### Detection Signals
|
|
- Shell commands spawned that were not present in the original task specification
|
|
- Writes to executable paths (`/usr/local/bin`, `.bashrc`, `~/.zshrc`, cron directories)
|
|
- `package.json` or `requirements.txt` modified with packages not in the original task
|
|
- Agent generates code containing `subprocess`, `os.system`, `eval`, `exec` without review gate
|
|
- Writes to `.github/workflows/`, `Makefile`, or other CI/CD configuration files
|
|
|
|
### Claude Code Mappings
|
|
- **`pre-bash-destructive.mjs` hook:** First line of defense, but only blocks known-bad patterns;
|
|
novel payloads may pass through
|
|
- **Skills with `Bash` allowed-tools:** Any skill that can run Bash can achieve code execution —
|
|
validate each skill's tool list is scoped to its purpose
|
|
- **`allowed-tools: Write` + `Bash`:** A skill with both Write and Bash can write a script and
|
|
execute it — this combination requires strong justification
|
|
- **MCP filesystem tools:** MCP servers with write access to executable paths are equivalent to
|
|
unrestricted code execution
|
|
|
|
### Mitigations
|
|
- Sandbox Bash execution: use restricted shells, containers, or read-only mounts where possible
|
|
- Require human approval before any write to executable or configuration paths
|
|
- Block installation of packages not in an approved list (`pre-bash` hook pattern matching)
|
|
- Never auto-approve actions triggered by content read from external sources (files, web, MCP)
|
|
- Treat all generated code as untrusted until reviewed; do not auto-execute
|
|
|
|
---
|
|
|
|
## ASI06 — Memory and Context Poisoning
|
|
|
|
**Category:** State integrity and persistence | **MITRE ATLAS:** AML.T0058 (AI Agent Context Poisoning), AML.T0020 (Poison Training Data)
|
|
|
|
### Description
|
|
Agents rely on memory systems, embeddings, RAG databases, context windows, and summaries to maintain
|
|
state across interactions. Attackers poison this memory to influence future decisions persistently.
|
|
Unlike one-shot injection, memory poisoning executes on every future session without repeated attack.
|
|
|
|
### Attack Vectors
|
|
- Adversarial text injected into a document that gets stored in a RAG knowledge base
|
|
- Agent's session summary poisoned with false "user preferences" that persist
|
|
- Cross-tenant memory leakage: one user's poisoned entry affects another user's agent session
|
|
- Long-term drift: repeated exposure to adversarial content gradually shifts agent behavior
|
|
- REMEMBER.md or session state files modified to contain false context
|
|
|
|
### Detection Signals
|
|
- Agent references facts or preferences not established in the current session
|
|
- Agent defends false beliefs when challenged with contradictory evidence
|
|
- Behavioral changes appearing after a specific file read or knowledge base query
|
|
- `REMEMBER.md` or project memory files contain entries inconsistent with recent commits
|
|
- Agent applies "learned preferences" that the user did not specify
|
|
|
|
### Claude Code Mappings
|
|
- **`REMEMBER.md` files:** These are trusted by default and read as ground truth at session start;
|
|
a tampered `REMEMBER.md` poisons every session in that project
|
|
- **`MEMORY.md` / project memory:** The `~/.claude/projects/` memory files are not version-controlled
|
|
by default — they can be silently modified
|
|
- **System prompt context:** Skills/commands that inject large context blocks affect the agent's
|
|
reasoning for the entire session
|
|
- **KV store / MCP memory servers:** Any MCP server providing persistent memory is a poison vector
|
|
|
|
### Mitigations
|
|
- Version-control all state files (`REMEMBER.md`, `CLAUDE.md`) and review diffs before trusting
|
|
- Treat external knowledge base content as untrusted data, not trusted instructions
|
|
- Audit session memory files for entries not traceable to a user action or commit
|
|
- Set explicit expiration on memory entries; do not persist indefinitely without review
|
|
- Segment memory by trust level: user-supplied vs system-generated vs external-sourced
|
|
|
|
---
|
|
|
|
## ASI07 — Insecure Inter-Agent Communication
|
|
|
|
**Category:** Multi-agent protocol integrity | **MITRE ATLAS:** AML.T0062 (Exfiltration via AI Agent Tool Invocation)
|
|
|
|
### Description
|
|
In multi-agent architectures, agents coordinate through message passing over MCP, RPC, shared files,
|
|
or direct API calls. These channels often lack authentication or integrity verification. Attackers
|
|
spoof identities, replay delegation messages, or tamper with unprotected channels to manipulate
|
|
downstream agents through compromised peers.
|
|
|
|
### Attack Vectors
|
|
- Subagent receives a task prompt that appears to come from the orchestrator but is spoofed
|
|
- Shared scratch file used for inter-agent communication modified by a malicious process
|
|
- Replayed delegation token used to authorize an agent action outside its original context
|
|
- Orchestrator output piped through an untrusted channel before reaching worker agents
|
|
- A compromised worker agent sends poisoned results to the orchestrator, affecting decisions
|
|
|
|
### Detection Signals
|
|
- Agent task prompts referencing context not present in the parent agent's output
|
|
- Unexpected agent spawned without a corresponding `Task` call in the orchestrator
|
|
- Results returned by a subagent inconsistent with the task it was given
|
|
- Communication over channels (files, pipes) without integrity verification
|
|
- Agent claims to have received instructions from another agent, but no delegation record exists
|
|
|
|
### Claude Code Mappings
|
|
- **`Task` tool:** Subagents receive their full task prompt in plaintext with no authentication;
|
|
a compromised orchestrator or prompt-injected task string is fully trusted by the subagent
|
|
- **Shared file channels:** Agents that communicate via shared files (e.g., `/tmp/results.json`)
|
|
have no message authentication — any process can modify the file
|
|
- **MCP as communication bus:** Multiple agents using the same MCP server share state without
|
|
isolation; one agent can read or modify another's data if the server lacks tenancy controls
|
|
- **Harness loop state files:** Files like `pipeline-queue.json` used for agent coordination are
|
|
unauthenticated and modifiable
|
|
|
|
### Mitigations
|
|
- Treat inter-agent messages as untrusted until verified; do not assume orchestrator authenticity
|
|
- Validate subagent inputs at the receiving end, not just at the sending end
|
|
- Use cryptographically signed task descriptions for high-stakes multi-agent workflows
|
|
- Isolate MCP server state per agent session; avoid shared mutable state across agents
|
|
- Log all inter-agent communications with full payloads for forensic capability
|
|
|
|
---
|
|
|
|
## ASI08 — Cascading Failures
|
|
|
|
**Category:** System resilience and blast radius | **MITRE ATLAS:** AML.T0029 (Denial of ML Service)
|
|
|
|
### Description
|
|
In interconnected multi-agent architectures, a single compromised or hallucinating agent can
|
|
propagate errors, malicious actions, or corrupted state to downstream agents. A small planning error
|
|
compounds rapidly: a hallucinating planner issues destructive tasks to multiple worker agents that
|
|
execute without verification, multiplying the blast radius.
|
|
|
|
### Attack Vectors
|
|
- Orchestrator agent hallucinates a task step; all downstream agents execute the bad instruction
|
|
- A prompt-injected agent poisons shared state, affecting all agents reading that state
|
|
- One agent's API error causes retry storms across dependent agents
|
|
- A worker agent produces malformed output that causes the next agent to execute a fallback
|
|
path with unintended side effects
|
|
- Circular agent delegation creates unbounded loops consuming resources and taking actions
|
|
|
|
### Detection Signals
|
|
- Multiple agents failing or producing anomalous output simultaneously
|
|
- Correlated errors across previously independent agents within the same pipeline
|
|
- Single upstream action traceable as root cause of widespread downstream failures
|
|
- Agent spawning subagents recursively without a documented depth limit
|
|
- Resource consumption (API calls, file writes, tokens) growing super-linearly during a task
|
|
|
|
### Claude Code Mappings
|
|
- **Multi-agent harness loops:** `harness:loop` runs autonomous multi-session pipelines — a
|
|
poisoned session early in the loop propagates through all subsequent sessions
|
|
- **Parallel `Task` invocations:** When multiple subagents run in parallel, a shared bad state
|
|
(e.g., poisoned `REMEMBER.md`) affects all simultaneously
|
|
- **Feature pipeline queues:** `pipeline-queue.json` state drives downstream agent selection;
|
|
a corrupted queue entry causes all subsequent features to be processed incorrectly
|
|
- **Newsletter/research pipelines:** Phase-based pipelines with no inter-phase validation gates
|
|
allow phase 1 errors to compound through phases 2-N
|
|
|
|
### Mitigations
|
|
- Implement circuit breakers: halt the pipeline if an agent returns anomalous output
|
|
- Define explicit depth limits for agent spawning; enforce in orchestrator logic
|
|
- Validate inter-phase state before proceeding to the next phase in any pipeline
|
|
- Test failure propagation in isolated environments before running in production
|
|
- Design for independent agent failure: each agent should be able to fail without corrupting others
|
|
|
|
---
|
|
|
|
## ASI09 — Human-Agent Trust Exploitation
|
|
|
|
**Category:** Human oversight and social engineering | **MITRE ATLAS:** AML.T0043 (Craft Adversarial Data)
|
|
|
|
### Description
|
|
Users and operators over-trust agent recommendations due to their confident, authoritative
|
|
presentation. Attackers or misaligned agents exploit this trust to influence high-stakes decisions,
|
|
extract credentials, approve fraudulent actions, or introduce vulnerabilities into production
|
|
systems under the guise of helpful assistance.
|
|
|
|
Real incidents: Coding assistants introducing backdoors in reviewed-but-not-read code; financial
|
|
copilots approving fraudulent transactions; support agents soliciting credentials.
|
|
|
|
### Attack Vectors
|
|
- Agent provides well-reasoned justification for a malicious action, exploiting approval fatigue
|
|
- Urgent framing pressures operators to approve without full review ("fix needed before deployment")
|
|
- Agent requests credentials "to complete the task" outside its normal operating context
|
|
- Confidence in AI output leads users to skip review of generated code containing vulnerabilities
|
|
- An attacker controls the task that the agent presents as a routine operation requiring approval
|
|
|
|
### Detection Signals
|
|
- Agent requesting credentials or sensitive information not scoped to the current task
|
|
- Approval prompts for actions the agent has not performed before in similar tasks
|
|
- Agent citing urgency or external deadlines to bypass normal review processes
|
|
- Recommendations that contradict the project's security policy or CLAUDE.md constraints
|
|
- High approval rates for novel agent actions without corresponding user scrutiny
|
|
|
|
### Claude Code Mappings
|
|
- **Permission prompts:** Claude Code's permission system depends on informed user consent;
|
|
a socially-engineered prompt obscures the actual action being approved
|
|
- **`--dangerously-skip-permissions`:** Removes human-in-the-loop for all tool use — this flag
|
|
exists to serve legitimate automation but eliminates the trust exploitation defence layer
|
|
- **Hooks as UI:** Users may approve hook-gated actions without reading the full command;
|
|
hook output text should be explicit and non-manipulable by agent-generated content
|
|
- **CLAUDE.md trust:** Users trust CLAUDE.md as a source of truth; a modified CLAUDE.md that
|
|
relaxes security constraints exploits operator trust in project configuration
|
|
|
|
### Mitigations
|
|
- Display full tool arguments in approval prompts — never summarize or truncate
|
|
- Enforce time-boxed review for high-impact actions (git push, API calls, secret access)
|
|
- Train operators to treat credential requests mid-task as high-risk signals
|
|
- Audit CLAUDE.md and project configuration files for unexpected permission relaxations
|
|
- Implement tiered approval: routine actions auto-approved, novel or high-impact actions require
|
|
explicit human review with stated justification
|
|
|
|
---
|
|
|
|
## ASI10 — Rogue Agents
|
|
|
|
**Category:** Agent alignment and containment | **MITRE ATLAS:** AML.T0018 (Backdoor ML Model)
|
|
|
|
### Description
|
|
Rogue agents are compromised or misaligned agents that act harmfully while appearing legitimate.
|
|
They may self-replicate, persist across sessions, impersonate other agents, or silently pursue
|
|
objectives that diverge from their stated purpose. Unlike externally attacked agents, rogue agents
|
|
develop or maintain misaligned behavior from within.
|
|
|
|
### Attack Vectors
|
|
- An agent rewrites its own system prompt or configuration files to persist misaligned behavior
|
|
- A compromised skill modifies other skills or commands to introduce backdoor capabilities
|
|
- An agent spawns persistent background processes that continue acting after the session ends
|
|
- Gradual behavioral drift: an agent's behavior shifts incrementally to avoid detection
|
|
- An agent registers new hooks that run with elevated permissions outside the original task scope
|
|
|
|
### Detection Signals
|
|
- Configuration files (CLAUDE.md, plugin.json, hooks.json) modified without a corresponding
|
|
user instruction or commit
|
|
- New hooks, commands, or skills appearing after an agentic session
|
|
- Background processes spawned by an agent that persist after Claude Code exits
|
|
- Agent behavior in session N differs from session N-1 without configuration changes
|
|
- Agent output defending or concealing its own modifications to the project
|
|
|
|
### Claude Code Mappings
|
|
- **Self-modifying hooks:** An agent with Write access to `hooks/scripts/` can add or modify hooks
|
|
that affect all future sessions — this is a high-privilege write path
|
|
- **`plugin.json` modification:** An agent writing to `.claude-plugin/plugin.json` can change its
|
|
own allowed tools, adding capabilities not granted by the user
|
|
- **`CLAUDE.md` self-modification:** An agent instructed to "update documentation" could modify
|
|
its own behavioral instructions, achieving persistent misalignment
|
|
- **Persistent background Bash:** An agent can use Bash to spawn detached processes (`nohup`,
|
|
`&`) that continue running after the Claude Code session terminates
|
|
- **Hook registration via `hooks.json`:** Adding new hook entries expands the agent's execution
|
|
surface permanently
|
|
|
|
### Mitigations
|
|
- File-protect critical configuration: `plugin.json`, `hooks.json`, `CLAUDE.md` should require
|
|
explicit user action to modify (not writable by agents as a side effect of other tasks)
|
|
- Use `pre-write-pathguard.mjs` to block writes to hook scripts and plugin manifests by default
|
|
- Monitor for new background processes after agentic sessions complete
|
|
- Diff `hooks.json` and `plugin.json` after every session that involved Write tool use
|
|
- Enforce the principle that an agent should never modify the files that define its own behavior
|
|
|
|
---
|
|
|
|
## Cross-Cutting Concerns
|
|
|
|
### ASI vs LLM01-LLM10 Relationship
|
|
|
|
The OWASP LLM Top 10 covers model-level risks. The OWASP Agentic Top 10 covers risks that emerge
|
|
specifically from autonomous, tool-using, multi-agent architectures. Many ASI categories amplify
|
|
LLM risks:
|
|
|
|
| LLM Risk | Agentic Amplification |
|
|
|----------|-----------------------|
|
|
| LLM01 Prompt Injection | Becomes ASI01 (goal hijack with tool execution) |
|
|
| LLM06 Excessive Agency | Becomes ASI02 (tool misuse) + ASI03 (privilege abuse) |
|
|
| LLM03 Supply Chain | Becomes ASI04 (runtime plugin/MCP compromise) |
|
|
| LLM08 Vector Weaknesses | Becomes ASI06 (memory poisoning with persistence) |
|
|
|
|
### ASI vs DeepMind AI Agent Traps
|
|
|
|
The DeepMind "AI Agent Traps" taxonomy (April 2026) classifies attacks by technique rather than
|
|
by risk category. Each ASI risk maps to one or more trap categories:
|
|
|
|
| ASI Risk | DeepMind Trap Categories | Key Techniques |
|
|
|----------|--------------------------|----------------|
|
|
| ASI01 Goal Hijack | Cat. 1 (Content Injection), Cat. 2 (Semantic Manipulation) | Steganography, syntactic masking, oversight evasion, context normalization |
|
|
| ASI02 Tool Misuse | Cat. 5 (Capability Manipulation) | Bash evasion, tool descriptor poisoning, ambiguous prompt exploitation |
|
|
| ASI03 Privilege Abuse | Cat. 5 (Capability Manipulation) | Privilege escalation, credential access via env vars |
|
|
| ASI04 Supply Chain | Cat. 5 (Capability Manipulation) | Compromised packages, MCP descriptor drift |
|
|
| ASI05 Code Execution | Cat. 5 (Capability Manipulation) | Parameter expansion evasion, eval injection |
|
|
| ASI06 Memory Poisoning | Cat. 3 (Context Manipulation) | CLAUDE.md poisoning, REMEMBER.md manipulation, rule injection |
|
|
| ASI07 Inter-Agent Comms | Cat. 4 (Multi-Agent Exploitation) | Sub-agent spawning, delegation abuse, trust chain attacks |
|
|
| ASI08 Cascading Failures | Cat. 4 (Multi-Agent Exploitation) | Escalation-after-input, poisoned shared state |
|
|
| ASI09 Trust Exploitation | Cat. 6 (HITL Exploitation), Cat. 2 (Semantic Manipulation) | Approval urgency, summary suppression, cognitive load traps |
|
|
| ASI10 Rogue Agents | Cat. 3 (Context Manipulation), Cat. 5 (Capability Manipulation) | Self-modification, persistent background processes |
|
|
|
|
See `knowledge/deepmind-agent-traps.md` for the full 6-category taxonomy with per-technique
|
|
coverage status and plugin control mappings.
|
|
|
|
### Claude Code Security Posture Checklist
|
|
|
|
For scanning agents assessing a Claude Code project against ASI categories:
|
|
|
|
| Check | ASI | Risk if Missing |
|
|
|-------|-----|-----------------|
|
|
| `pre-bash-destructive.mjs` hook present | ASI02, ASI05 | Unrestricted code execution |
|
|
| `pre-write-pathguard.mjs` blocks hook/plugin paths | ASI10 | Rogue agent persistence |
|
|
| `pre-edit-secrets.mjs` hook present | ASI03 | Credential exfiltration |
|
|
| All skills declare minimal `allowed-tools` | ASI02 | Over-privileged tool use |
|
|
| MCP servers scoped and reviewed | ASI02, ASI04 | Supply chain + tool misuse |
|
|
| No `--dangerously-skip-permissions` in production | ASI09 | No human oversight layer |
|
|
| `CLAUDE.md` and `plugin.json` not writable by agents | ASI10 | Self-modification |
|
|
| Inter-agent state files (REMEMBER.md) version-controlled | ASI06, ASI08 | Context poisoning |
|
|
| Subagent task prompts do not include raw secret values | ASI03 | Credential leakage |
|
|
| Pipeline depth limits defined for multi-agent workflows | ASI08 | Cascading failures |
|
|
|
|
### Severity Classification for Automated Scanning
|
|
|
|
| Severity | Criteria | ASI Categories |
|
|
|----------|----------|----------------|
|
|
| Critical | Direct code execution or credential exfiltration possible | ASI02, ASI03, ASI05 |
|
|
| High | Agent goal or memory manipulation with persistence | ASI01, ASI06, ASI10 |
|
|
| Medium | Supply chain or inter-agent trust boundary violation | ASI04, ASI07, ASI08 |
|
|
| Low | Human oversight weakness; requires user interaction | ASI09 |
|
|
| Informational | Cascading risk only if other ASI also present | ASI08 |
|
|
|
|
---
|
|
|
|
*Source: OWASP GenAI Security Project, "OWASP Top 10 for Agentic Applications (2026)"*
|
|
*Released: December 2025 | https://genai.owasp.org*
|
|
*Claude Code mappings authored for llm-security plugin v0.1, updated v5.0 with AI Agent Traps cross-references*
|