
OWASP Top 10 for Agentic AI Applications (2026)

Reference material for security agents analyzing agentic AI systems. Based on the official OWASP GenAI Security Project release (December 2025), developed by 100+ researchers and practitioners.

Prefix: ASI (Agentic Security Issue)
Scope: Autonomous AI agents that plan, use tools, delegate to subagents, and act with minimal human supervision. Claude Code is an agentic system and maps directly to these risks.
Source: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/


ASI01 — Agent Goal Hijack

Category: Goal and instruction integrity

Description

Attackers alter agent objectives by embedding hidden instructions in external content that the agent reads and processes. Agents cannot reliably separate instructions from data, making them vulnerable to prompt injection via poisoned documents, web pages, emails, or tool outputs.

Real incident: EchoLeak — copilots turned into silent exfiltration engines via injected email content.

Attack Vectors

  • Malicious instructions embedded in files the agent reads (PDF, markdown, code comments)
  • Tool outputs returning adversarial text disguised as data
  • Web content fetched during agent browsing that includes override instructions
  • Injected content in MCP tool responses that redefines the agent's task
  • Multi-turn manipulation: gradual reframing of goals across conversation turns

Detection Signals

  • Agent pursues actions not derivable from the original user request
  • Unexpected tool invocations or action sequences mid-task
  • Agent output references content not present in the original prompt
  • System prompt or role instructions appear to have been re-interpreted
  • Agent skips or rewrites its own stated plan without user input

Claude Code Mappings

  • Skills/commands: A malicious file read during /security scan could inject instructions to skip reporting a specific finding
  • Subagent tasks: Task prompts built from external content can carry injected goals into subagents
  • MCP tool outputs: mcp__tavily__tavily_search or mcp__ms-learn__fetch may return adversarial content that redirects agent behavior
  • Hooks: A PostToolUse hook reading tool output could process injected instructions

Mitigations

  • Treat all external content as untrusted data, never as instructions
  • Apply strict semantic boundaries: system prompt immutable, data sandboxed
  • Use PreToolUse hooks to validate tool inputs before external data is fetched
  • Require human approval before consequential actions (file writes, git commits, API calls)
  • Log the full reasoning chain so deviations from the original goal are auditable
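
The third mitigation can be sketched as a PreToolUse-style gate that fails closed on fetch targets outside an explicit allowlist, so injected instructions cannot pull arbitrary external content into the session. This is a minimal sketch: the allowlisted hosts are illustrative, and the wiring into a hook (reading the tool-call JSON, exiting non-zero to block) is omitted.

```javascript
// Illustrative allowlist -- replace with the hosts your project actually needs.
const ALLOWED_HOSTS = new Set(["docs.example.com", "genai.owasp.org"]);

function isFetchAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable target: fail closed
  }
  // Block non-HTTPS schemes and hosts outside the allowlist.
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

// In a real PreToolUse hook, the URL would come from the tool-call JSON on
// stdin, and a disallowed fetch would be rejected before execution.
```

The fail-closed branch matters: an unparseable URL is treated as hostile rather than passed through.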

ASI02 — Tool Misuse and Exploitation

Category: Tool integrity and authorization

Description

Agents misuse legitimate tools due to ambiguous prompts, manipulated input, or over-provisioned permissions. Legitimate tools become attack primitives: filesystem access becomes exfiltration, email access becomes phishing, shell access becomes arbitrary code execution.

Real incident: Amazon Q and GitHub Actions compromised via repository content triggering tool misuse.

Attack Vectors

  • Ambiguous task descriptions cause the agent to invoke tools with unintended arguments
  • Poisoned tool descriptors (MCP server descriptions) mislead the agent about tool purpose
  • Over-privileged tool configurations allow actions beyond the task scope
  • Adversarial content causes agents to invoke deletion, exfiltration, or write operations
  • Chained tool calls where output of one tool becomes input to a destructive second tool

Detection Signals

  • Tool called with arguments that were not present in the user's original request
  • Spike in API call volume or calls to tools outside the agent's defined role
  • Destructive operations (file deletion, database writes) without explicit user instruction
  • Sensitive data (secrets, PII) flowing as arguments to network-bound tools
  • Agent invokes tools in an order inconsistent with its stated plan

Claude Code Mappings

  • Hooks: pre-bash-destructive.mjs blocks rm -rf, DROP TABLE, and similar; validate this hook is present and covers the full destructive command surface
  • MCP tools: Each enabled MCP server expands the tool surface — audit mcp.json for over-permissioned servers (e.g., filesystem MCP with write access to /)
  • Skills with Bash tool: Any skill declaring allowed-tools: Bash can spawn processes; verify the necessity and scope of Bash access in frontmatter
  • allowed-tools in commands: Commands should declare the minimal tool set required

Mitigations

  • Apply least-privilege to every tool: scope filesystem access, API permissions, network targets
  • Validate all tool arguments in PreToolUse hooks before execution
  • Require explicit human approval for irreversible operations (destructive Bash, git push)
  • Audit MCP server configurations — each server is an attack surface expansion
  • Pin tool configurations; detect and alert on changes to tool descriptors
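
The argument-validation mitigation is the pattern behind pre-bash-destructive.mjs. A minimal sketch of such a gate, with an intentionally small and illustrative pattern list (a production hook needs a much broader destructive-command surface):

```javascript
// Illustrative destructive-command patterns; not exhaustive.
const DESTRUCTIVE_PATTERNS = [
  /\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b/i, // rm -rf, rm -fr, rm -vrf ...
  /\bdrop\s+(table|database)\b/i,                // SQL deletion
  /\bgit\s+push\s+.*--force\b/i,                 // history rewrite on remote
  /\bmkfs\b/,                                    // filesystem destruction
];

function isDestructive(command) {
  return DESTRUCTIVE_PATTERNS.some((re) => re.test(command));
}
```

A real hook would read the Bash tool's command from the tool-call input and block execution when `isDestructive` returns true.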

ASI03 — Identity and Privilege Abuse

Category: Identity, credentials, and delegation

Description

Agents often inherit user or system identities including high-privilege credentials, session tokens, and delegated access. Unintended privilege reuse, escalation, or cross-agent delegation without proper scoping creates confused deputy scenarios where the agent acts with permissions it should not exercise.

Attack Vectors

  • Agent inherits the operator's credentials and uses them beyond the task scope
  • A compromised subagent operates with the parent agent's delegated identity
  • Long-lived credentials used where short-lived tokens would suffice, persisting across sessions
  • Agent escalates its own permissions by requesting elevated access mid-task
  • Lateral movement: agent uses one system's credentials to authenticate to another

Detection Signals

  • Credential access from unexpected timing or context (e.g., credentials used outside a task)
  • Agent accesses resources unrelated to its defined function
  • Cross-system access chains: authentication to system B immediately after action on system A
  • Failed permission checks followed by attempts via alternative credential paths
  • Subagents performing actions requiring higher privileges than delegated

Claude Code Mappings

  • API keys in environment: Claude Code executes in the user's shell — it inherits all env variables including OPENAI_API_KEY, AZURE_CLIENT_SECRET, etc.
  • pre-edit-secrets.mjs hook: Detects if secrets are being written to files, but does not prevent an agent from using env-var credentials in Bash commands
  • --dangerously-skip-permissions: When used in subagent invocations (claude -p), all permission gates are bypassed for that subagent's session
  • Subagent delegation: Tasks spawned with Task tool receive the parent's tool permissions; verify task prompts do not over-grant scope implicitly

Mitigations

  • Scope credentials to the minimum required for each task; use task-scoped tokens where possible
  • Never pass raw secrets as task arguments to subagents
  • Treat each subagent as a separate identity with its own permission boundary
  • Audit use of --dangerously-skip-permissions — restrict to headless, sandboxed contexts only
  • Rotate credentials after agentic sessions that accessed sensitive systems
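
Because Claude Code inherits the full shell environment, credential scoping for subagents can start with an environment scrub before spawning. A sketch, assuming a simple name-based heuristic (the pattern, keep-list, and the `claude -p` invocation shown in the comment are illustrative):

```javascript
// Heuristic: variable names that usually carry secrets.
const SECRET_PATTERN = /(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)/i;

// Build a reduced environment for a headless subagent so it does not
// inherit every credential present in the parent shell.
function scrubbedEnv(env, keep = ["PATH", "HOME", "LANG"]) {
  const out = {};
  for (const [name, value] of Object.entries(env)) {
    if (keep.includes(name) || !SECRET_PATTERN.test(name)) {
      out[name] = value;
    }
  }
  return out;
}

// Usage sketch with child_process:
//   spawn("claude", ["-p", taskPrompt], { env: scrubbedEnv(process.env) });
```

Name-based filtering is a floor, not a ceiling: secrets held in oddly named variables still leak, so task-scoped tokens remain the stronger control.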

ASI04 — Agentic Supply Chain Vulnerabilities

Category: Component integrity and provenance

Description

Tools, plugins, prompt templates, MCP servers, and agent definitions fetched or loaded dynamically can be compromised. Any poisoned component alters agent behavior or exposes data, and the attack surface is invisible to static dependency scanning because components resolve at runtime.

Real incident: Malicious MCP servers impersonating legitimate ones, altering tool behavior post-install.

Attack Vectors

  • Compromised MCP server that behaves correctly during review but exfiltrates data in production
  • Poisoned skill/command markdown fetched from a remote source
  • Agent definition files modified in a plugin repository after installation
  • Typosquatted MCP server names registered to intercept installs
  • Plugin manifest (plugin.json) tampered to add unauthorized tool permissions

Detection Signals

  • MCP server making network connections to undocumented endpoints
  • Plugin files modified after initial installation (file hash change)
  • New tool capabilities appearing after a plugin update
  • Agent behavior changing without corresponding code change
  • hooks.json or plugin.json modifications not tied to a commit

Claude Code Mappings

  • plugin.json manifest: The auto_discover: true setting means any file in the plugin directory is trusted; a supply chain compromise of the plugin repo affects all commands and agents
  • MCP server configurations: mcp.json and .mcp.json files define which servers run — a tampered server definition is a full agent compromise
  • External skill references: Skills referencing remote URLs for knowledge base content introduce runtime supply chain risk
  • hooks/hooks.json: A modified hooks file can add, remove, or neuter security hooks silently

Mitigations

  • Pin MCP server versions; verify checksums before use
  • Monitor plugin directory files for unexpected modifications (file integrity monitoring)
  • Audit plugin.json, hooks.json, and all agent frontmatter on each session start
  • Prefer local MCP servers over remote for sensitive operations; limit network-bound servers
  • Review MCP server source code before enabling; treat third-party servers as untrusted by default
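
The checksum and file-integrity mitigations reduce to recording digests at install time and diffing against them on session start. A sketch, with the file reader injectable so the check is testable without disk I/O (baseline format and paths are illustrative):

```javascript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash raw file bytes; a digest mismatch indicates post-install tampering.
function sha256(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// baseline maps file path -> hex digest recorded at install time.
// `readFile` is injectable so the check can run against any byte source.
function driftedFiles(baseline, readFile = (p) => readFileSync(p)) {
  return Object.entries(baseline)
    .filter(([path, expected]) => sha256(readFile(path)) !== expected)
    .map(([path]) => path);
}
```

In practice the baseline would cover plugin.json, hooks.json, and every hook script, and any non-empty result would halt the session pending review.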

ASI05 — Unexpected Code Execution

Category: Code generation and execution safety

Description

Agents generate or execute code unsafely through shell commands, eval-like constructs, script execution, or deserialization. The attack path runs directly from text input to system commands. Coding agents like Claude Code are high-risk because code generation and execution are core features.

Attack Vectors

  • Prompt injection in source code comments causes agent to generate and run malicious shell commands
  • Agent generates a "helpful" script that includes attacker-controlled payload
  • eval() or exec() applied to LLM output without sandboxing
  • Agent patches a configuration file in a way that achieves code execution on next load
  • Hallucinated library name installed via npm install or pip install (slopsquatting)

Detection Signals

  • Shell commands spawned that were not present in the original task specification
  • Writes to executable paths (/usr/local/bin, .bashrc, ~/.zshrc, cron directories)
  • package.json or requirements.txt modified with packages not in the original task
  • Agent generates code containing subprocess, os.system, eval, exec without review gate
  • Writes to .github/workflows/, Makefile, or other CI/CD configuration files

Claude Code Mappings

  • pre-bash-destructive.mjs hook: First line of defense, but only blocks known-bad patterns; novel payloads may pass through
  • Skills with Bash allowed-tools: Any skill that can run Bash can achieve code execution — validate each skill's tool list is scoped to its purpose
  • allowed-tools: Write + Bash: A skill with both Write and Bash can write a script and execute it — this combination requires strong justification
  • MCP filesystem tools: MCP servers with write access to executable paths are equivalent to unrestricted code execution

Mitigations

  • Sandbox Bash execution: use restricted shells, containers, or read-only mounts where possible
  • Require human approval before any write to executable or configuration paths
  • Block installation of packages not in an approved list (pre-bash hook pattern matching)
  • Never auto-approve actions triggered by content read from external sources (files, web, MCP)
  • Treat all generated code as untrusted until reviewed; do not auto-execute
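
The package-allowlist mitigation (the slopsquatting defense) can be sketched as a pre-bash check that extracts package names from install commands and rejects anything unapproved. The approved list, and the simplistic command parsing, are illustrative:

```javascript
// Illustrative approved-package list; maintain per project.
const APPROVED = new Set(["typescript", "vitest", "zod"]);

function unapprovedPackages(command) {
  const m = command.match(/\b(?:npm\s+install|npm\s+i|pip\s+install)\s+(.+)/);
  if (!m) return []; // not an install command
  return m[1]
    .split(/\s+/)
    .filter((tok) => !tok.startsWith("-"))    // skip flags like --save-dev
    .map((tok) => tok.replace(/@[^/]*$/, "")) // strip version specifier (zod@3.22.0 -> zod)
    .filter((name) => !APPROVED.has(name));
}
```

A hallucinated package name fails this check even when it resolves on the registry, which is exactly the gap registry lookups cannot close.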

ASI06 — Memory and Context Poisoning

Category: State integrity and persistence

Description

Agents rely on memory systems, embeddings, RAG databases, context windows, and summaries to maintain state across interactions. Attackers poison this memory to influence future decisions persistently. Unlike one-shot injection, memory poisoning executes on every future session without repeated attack.

Attack Vectors

  • Adversarial text injected into a document that gets stored in a RAG knowledge base
  • Agent's session summary poisoned with false "user preferences" that persist
  • Cross-tenant memory leakage: one user's poisoned entry affects another user's agent session
  • Long-term drift: repeated exposure to adversarial content gradually shifts agent behavior
  • REMEMBER.md or session state files modified to contain false context

Detection Signals

  • Agent references facts or preferences not established in the current session
  • Agent defends false beliefs when challenged with contradictory evidence
  • Behavioral changes appearing after a specific file read or knowledge base query
  • REMEMBER.md or project memory files contain entries inconsistent with recent commits
  • Agent applies "learned preferences" that the user did not specify

Claude Code Mappings

  • REMEMBER.md files: These are trusted by default and read as ground truth at session start; a tampered REMEMBER.md poisons every session in that project
  • MEMORY.md / project memory: The ~/.claude/projects/ memory files are not version-controlled by default — they can be silently modified
  • System prompt context: Skills/commands that inject large context blocks affect the agent's reasoning for the entire session
  • KV store / MCP memory servers: Any MCP server providing persistent memory is a poison vector

Mitigations

  • Version-control all state files (REMEMBER.md, CLAUDE.md) and review diffs before trusting
  • Treat external knowledge base content as untrusted data, not trusted instructions
  • Audit session memory files for entries not traceable to a user action or commit
  • Set explicit expiration on memory entries; do not persist indefinitely without review
  • Segment memory by trust level: user-supplied vs system-generated vs external-sourced
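
Trust-level segmentation can be enforced mechanically if every memory entry carries a provenance tag. A sketch that flags untagged entries for review before a memory file is trusted at session start; the tag format is an assumption for illustration, not an existing Claude Code convention:

```javascript
// Provenance tags an entry must carry to be trusted; external-sourced
// entries deliberately have no trusted tag.
const TRUSTED_TAGS = ["[user]", "[system]"];

function untaggedEntries(memoryText) {
  return memoryText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))  // list entries only
    .filter((line) => !TRUSTED_TAGS.some((t) => line.includes(t)));
}
```

Any flagged entry is then auditable against the detection signals above: can it be traced to a user action or a commit?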

ASI07 — Insecure Inter-Agent Communication

Category: Multi-agent protocol integrity

Description

In multi-agent architectures, agents coordinate through message passing over MCP, RPC, shared files, or direct API calls. These channels often lack authentication or integrity verification. Attackers spoof identities, replay delegation messages, or tamper with unprotected channels to manipulate downstream agents through compromised peers.

Attack Vectors

  • Subagent receives a task prompt that appears to come from the orchestrator but is spoofed
  • Shared scratch file used for inter-agent communication modified by a malicious process
  • Replayed delegation token used to authorize an agent action outside its original context
  • Orchestrator output piped through an untrusted channel before reaching worker agents
  • A compromised worker agent sends poisoned results to the orchestrator, affecting decisions

Detection Signals

  • Agent task prompts referencing context not present in the parent agent's output
  • Unexpected agent spawned without a corresponding Task call in the orchestrator
  • Results returned by a subagent inconsistent with the task it was given
  • Communication over channels (files, pipes) without integrity verification
  • Agent claims to have received instructions from another agent, but no delegation record exists

Claude Code Mappings

  • Task tool: Subagents receive their full task prompt in plaintext with no authentication; a compromised orchestrator or prompt-injected task string is fully trusted by the subagent
  • Shared file channels: Agents that communicate via shared files (e.g., /tmp/results.json) have no message authentication — any process can modify the file
  • MCP as communication bus: Multiple agents using the same MCP server share state without isolation; one agent can read or modify another's data if the server lacks tenancy controls
  • Harness loop state files: Files like pipeline-queue.json used for agent coordination are unauthenticated and modifiable

Mitigations

  • Treat inter-agent messages as untrusted until verified; do not assume orchestrator authenticity
  • Validate subagent inputs at the receiving end, not just at the sending end
  • Use cryptographically signed task descriptions for high-stakes multi-agent workflows
  • Isolate MCP server state per agent session; avoid shared mutable state across agents
  • Log all inter-agent communications with full payloads for forensic capability
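
The signed-task mitigation can be sketched with an HMAC over the task prompt: the orchestrator signs, the receiving agent verifies before acting. Key distribution is out of scope here, and the shared-secret scheme is an illustrative minimum (asymmetric signatures would avoid sharing the key with every worker):

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Orchestrator side: sign the exact task string with a shared secret.
function signTask(task, key) {
  return createHmac("sha256", key).update(task).digest("hex");
}

// Subagent side: recompute and compare in constant time before acting.
function verifyTask(task, signature, key) {
  const expected = Buffer.from(signTask(task, key), "hex");
  const actual = Buffer.from(signature, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

A tampered scratch file or spoofed delegation then fails verification, because the signature binds the task text to the orchestrator's key.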

ASI08 — Cascading Failures

Category: System resilience and blast radius

Description

In interconnected multi-agent architectures, a single compromised or hallucinating agent can propagate errors, malicious actions, or corrupted state to downstream agents. A small planning error compounds rapidly: a hallucinating planner issues destructive tasks to multiple worker agents that execute without verification, multiplying the blast radius.

Attack Vectors

  • Orchestrator agent hallucinates a task step; all downstream agents execute the bad instruction
  • A prompt-injected agent poisons shared state, affecting all agents reading that state
  • One agent's API error causes retry storms across dependent agents
  • A worker agent produces malformed output that causes the next agent to execute a fallback path with unintended side effects
  • Circular agent delegation creates unbounded loops consuming resources and taking actions

Detection Signals

  • Multiple agents failing or producing anomalous output simultaneously
  • Correlated errors across previously independent agents within the same pipeline
  • Single upstream action traceable as root cause of widespread downstream failures
  • Agent spawning subagents recursively without a documented depth limit
  • Resource consumption (API calls, file writes, tokens) growing super-linearly during a task

Claude Code Mappings

  • Multi-agent harness loops: harness:loop runs autonomous multi-session pipelines — a poisoned session early in the loop propagates through all subsequent sessions
  • Parallel Task invocations: When multiple subagents run in parallel, a shared bad state (e.g., poisoned REMEMBER.md) affects all simultaneously
  • Feature pipeline queues: pipeline-queue.json state drives downstream agent selection; a corrupted queue entry causes all subsequent features to be processed incorrectly
  • Newsletter/research pipelines: Phase-based pipelines with no inter-phase validation gates allow phase 1 errors to compound through phases 2-N

Mitigations

  • Implement circuit breakers: halt the pipeline if an agent returns anomalous output
  • Define explicit depth limits for agent spawning; enforce in orchestrator logic
  • Validate inter-phase state before proceeding to the next phase in any pipeline
  • Test failure propagation in isolated environments before running in production
  • Design for independent agent failure: each agent should be able to fail without corrupting others
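
Two of these controls, the spawn-depth limit and the circuit breaker, fit in a few lines of orchestrator logic. A sketch with illustrative thresholds:

```javascript
// Hard cap on delegation depth: refuse to spawn below this level.
const MAX_DEPTH = 3;

function canSpawn(depth) {
  return depth < MAX_DEPTH;
}

// Circuit breaker: trips after N consecutive anomalous agent results,
// halting the pipeline instead of letting errors propagate downstream.
function makeBreaker(maxFailures = 2) {
  let consecutiveFailures = 0;
  return {
    record(ok) {
      consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
    },
    get open() {
      return consecutiveFailures >= maxFailures; // open = halt pipeline
    },
  };
}
```

The orchestrator calls `record` after each agent result and checks `open` before dispatching the next task; what counts as "anomalous" is the pipeline-specific judgment the breaker deliberately leaves outside.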

ASI09 — Human-Agent Trust Exploitation

Category: Human oversight and social engineering

Description

Users and operators over-trust agent recommendations due to their confident, authoritative presentation. Attackers or misaligned agents exploit this trust to influence high-stakes decisions, extract credentials, approve fraudulent actions, or introduce vulnerabilities into production systems under the guise of helpful assistance.

Real incidents: Coding assistants introducing backdoors in reviewed-but-not-read code; financial copilots approving fraudulent transactions; support agents soliciting credentials.

Attack Vectors

  • Agent provides well-reasoned justification for a malicious action, exploiting approval fatigue
  • Urgent framing pressures operators to approve without full review ("fix needed before deployment")
  • Agent requests credentials "to complete the task" outside its normal operating context
  • Confidence in AI output leads users to skip review of generated code containing vulnerabilities
  • An attacker controls the task that the agent presents as a routine operation requiring approval

Detection Signals

  • Agent requesting credentials or sensitive information not scoped to the current task
  • Approval prompts for actions the agent has not performed before in similar tasks
  • Agent citing urgency or external deadlines to bypass normal review processes
  • Recommendations that contradict the project's security policy or CLAUDE.md constraints
  • High approval rates for novel agent actions without corresponding user scrutiny

Claude Code Mappings

  • Permission prompts: Claude Code's permission system depends on informed user consent; a socially-engineered prompt obscures the actual action being approved
  • --dangerously-skip-permissions: Removes human-in-the-loop for all tool use — this flag exists to serve legitimate automation but eliminates the trust exploitation defense layer
  • Hooks as UI: Users may approve hook-gated actions without reading the full command; hook output text should be explicit and non-manipulable by agent-generated content
  • CLAUDE.md trust: Users trust CLAUDE.md as a source of truth; a modified CLAUDE.md that relaxes security constraints exploits operator trust in project configuration

Mitigations

  • Display full tool arguments in approval prompts — never summarize or truncate
  • Enforce time-boxed review for high-impact actions (git push, API calls, secret access)
  • Train operators to treat credential requests mid-task as high-risk signals
  • Audit CLAUDE.md and project configuration files for unexpected permission relaxations
  • Implement tiered approval: routine actions auto-approved, novel or high-impact actions require explicit human review with stated justification
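
The tiered-approval mitigation can be sketched as a classifier over proposed tool calls. The tiers and matching rules below are illustrative policy for this sketch, not Claude Code's actual permission behavior:

```javascript
// Classify a proposed tool call: "auto" (routine), "review" (explicit
// human approval with justification), or "block" (refuse outright).
function approvalTier(toolName, args) {
  const text = `${toolName} ${args}`;
  if (/credential|password|api[_-]?key/i.test(text)) return "block";
  if (/^(Bash|Write|Edit)$/.test(toolName) || /git push|curl|ssh/.test(args)) {
    return "review"; // mutating or network-bound: human review required
  }
  return "auto"; // read-only actions
}
```

Routing credential-adjacent requests to "block" rather than "review" is deliberate: it removes the approval-fatigue decision from the operator entirely.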

ASI10 — Rogue Agents

Category: Agent alignment and containment

Description

Rogue agents are compromised or misaligned agents that act harmfully while appearing legitimate. They may self-replicate, persist across sessions, impersonate other agents, or silently pursue objectives that diverge from their stated purpose. Unlike externally attacked agents, rogue agents develop or maintain misaligned behavior from within.

Attack Vectors

  • An agent rewrites its own system prompt or configuration files to persist misaligned behavior
  • A compromised skill modifies other skills or commands to introduce backdoor capabilities
  • An agent spawns persistent background processes that continue acting after the session ends
  • Gradual behavioral drift: an agent's behavior shifts incrementally to avoid detection
  • An agent registers new hooks that run with elevated permissions outside the original task scope

Detection Signals

  • Configuration files (CLAUDE.md, plugin.json, hooks.json) modified without a corresponding user instruction or commit
  • New hooks, commands, or skills appearing after an agentic session
  • Background processes spawned by an agent that persist after Claude Code exits
  • Agent behavior in session N differs from session N-1 without configuration changes
  • Agent output defending or concealing its own modifications to the project

Claude Code Mappings

  • Self-modifying hooks: An agent with Write access to hooks/scripts/ can add or modify hooks that affect all future sessions — this is a high-privilege write path
  • plugin.json modification: An agent writing to .claude-plugin/plugin.json can change its own allowed tools, adding capabilities not granted by the user
  • CLAUDE.md self-modification: An agent instructed to "update documentation" could modify its own behavioral instructions, achieving persistent misalignment
  • Persistent background Bash: An agent can use Bash to spawn detached processes (nohup, &) that continue running after the Claude Code session terminates
  • Hook registration via hooks.json: Adding new hook entries expands the agent's execution surface permanently

Mitigations

  • File-protect critical configuration: plugin.json, hooks.json, CLAUDE.md should require explicit user action to modify (not writable by agents as a side effect of other tasks)
  • Use pre-write-pathguard.mjs to block writes to hook scripts and plugin manifests by default
  • Monitor for new background processes after agentic sessions complete
  • Diff hooks.json and plugin.json after every session that involved Write tool use
  • Enforce the principle that an agent should never modify the files that define its own behavior
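
That last principle is what pre-write-pathguard.mjs enforces: deny agent writes to the files that define the agent's own behavior. A sketch of the core check; the protected-path list mirrors the files named above, and the normalization is deliberately simplistic (a real guard must also resolve symlinks, absolute paths, and `..` traversal):

```javascript
// Files and directories that define agent behavior; never agent-writable.
const PROTECTED = [
  "hooks/hooks.json",
  ".claude-plugin/plugin.json",
  "CLAUDE.md",
  "hooks/scripts/", // trailing slash = protect the whole directory
];

function isProtectedWrite(targetPath) {
  const normalized = targetPath.replace(/^\.\//, "");
  return PROTECTED.some((p) =>
    p.endsWith("/") ? normalized.startsWith(p) : normalized === p
  );
}
```

Wired into a PreToolUse hook on Write/Edit, a true result blocks the call, closing the self-modification paths listed in the mappings above.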

Cross-Cutting Concerns

ASI vs LLM01-LLM10 Relationship

The OWASP LLM Top 10 covers model-level risks. The OWASP Agentic Top 10 covers risks that emerge specifically from autonomous, tool-using, multi-agent architectures. Many ASI categories amplify LLM risks:

| LLM Risk | Agentic Amplification |
|---|---|
| LLM01 Prompt Injection | Becomes ASI01 (goal hijack with tool execution) |
| LLM06 Excessive Agency | Becomes ASI02 (tool misuse) + ASI03 (privilege abuse) |
| LLM03 Supply Chain | Becomes ASI04 (runtime plugin/MCP compromise) |
| LLM08 Vector Weaknesses | Becomes ASI06 (memory poisoning with persistence) |

ASI vs DeepMind AI Agent Traps

The DeepMind "AI Agent Traps" taxonomy (April 2026) classifies attacks by technique rather than by risk category. Each ASI risk maps to one or more trap categories:

| ASI Risk | DeepMind Trap Categories | Key Techniques |
|---|---|---|
| ASI01 Goal Hijack | Cat. 1 (Content Injection), Cat. 2 (Semantic Manipulation) | Steganography, syntactic masking, oversight evasion, context normalization |
| ASI02 Tool Misuse | Cat. 5 (Capability Manipulation) | Bash evasion, tool descriptor poisoning, ambiguous prompt exploitation |
| ASI03 Privilege Abuse | Cat. 5 (Capability Manipulation) | Privilege escalation, credential access via env vars |
| ASI04 Supply Chain | Cat. 5 (Capability Manipulation) | Compromised packages, MCP descriptor drift |
| ASI05 Code Execution | Cat. 5 (Capability Manipulation) | Parameter expansion evasion, eval injection |
| ASI06 Memory Poisoning | Cat. 3 (Context Manipulation) | CLAUDE.md poisoning, REMEMBER.md manipulation, rule injection |
| ASI07 Inter-Agent Comms | Cat. 4 (Multi-Agent Exploitation) | Sub-agent spawning, delegation abuse, trust chain attacks |
| ASI08 Cascading Failures | Cat. 4 (Multi-Agent Exploitation) | Escalation-after-input, poisoned shared state |
| ASI09 Trust Exploitation | Cat. 6 (HITL Exploitation), Cat. 2 (Semantic Manipulation) | Approval urgency, summary suppression, cognitive load traps |
| ASI10 Rogue Agents | Cat. 3 (Context Manipulation), Cat. 5 (Capability Manipulation) | Self-modification, persistent background processes |

See knowledge/deepmind-agent-traps.md for the full 6-category taxonomy with per-technique coverage status and plugin control mappings.

Claude Code Security Posture Checklist

For scanning agents assessing a Claude Code project against ASI categories:

| Check | ASI | Risk if Missing |
|---|---|---|
| pre-bash-destructive.mjs hook present | ASI02, ASI05 | Unrestricted code execution |
| pre-write-pathguard.mjs blocks hook/plugin paths | ASI10 | Rogue agent persistence |
| pre-edit-secrets.mjs hook present | ASI03 | Credential exfiltration |
| All skills declare minimal allowed-tools | ASI02 | Over-privileged tool use |
| MCP servers scoped and reviewed | ASI02, ASI04 | Supply chain + tool misuse |
| No --dangerously-skip-permissions in production | ASI09 | No human oversight layer |
| CLAUDE.md and plugin.json not writable by agents | ASI10 | Self-modification |
| Inter-agent state files (REMEMBER.md) version-controlled | ASI06, ASI08 | Context poisoning |
| Subagent task prompts do not include raw secret values | ASI03 | Credential leakage |
| Pipeline depth limits defined for multi-agent workflows | ASI08 | Cascading failures |

Severity Classification for Automated Scanning

| Severity | Criteria | ASI Categories |
|---|---|---|
| Critical | Direct code execution or credential exfiltration possible | ASI02, ASI03, ASI05 |
| High | Agent goal or memory manipulation with persistence | ASI01, ASI06, ASI10 |
| Medium | Supply chain or inter-agent trust boundary violation | ASI04, ASI07, ASI08 |
| Low | Human oversight weakness; requires user interaction | ASI09 |
| Informational | Cascading risk only if other ASI also present | ASI08 |

Source: OWASP GenAI Security Project, "OWASP Top 10 for Agentic Applications (2026)". Released: December 2025 | https://genai.owasp.org
Claude Code mappings authored for llm-security plugin v0.1; updated v5.0 with AI Agent Traps cross-references.