feat: initial agent-builder plugin (v0.1.0)
Build complete autonomous agent systems with Claude Code.

7-phase guided workflow: map work, CLAUDE.md, agent team, pipeline, security, deployment, test.

Components:
- commands/build.md: main guided workflow
- agents/builder.md: scaffolding agent
- skills/agent-system-design: architecture knowledge + 4 references
- scripts/templates: hooks, automation, launchd, systemd

Covers 22 OpenClaw capabilities across 4 deployment targets (local, Mac Mini, VPS, Managed Agents).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit
075383990f
17 changed files with 1895 additions and 0 deletions
115
skills/agent-system-design/SKILL.md
Normal file
@@ -0,0 +1,115 @@
---
name: agent-system-design
description: |
  This skill should be used when the user asks about "building an agent",
  "autonomous agent system", "agent architecture", "OpenClaw alternative",
  "always-on agent", "personal AI agent", "complete agent system",
  "agent that runs itself", "agent pipeline design", "multi-agent system",
  "how to build an agent with Claude Code"
version: 0.1.0
---
|
||||
|
||||
## What is an autonomous agent system
|
||||
|
||||
An autonomous agent system is a set of Claude Code components that work together to execute multi-step workflows with minimal human intervention. The system runs on a schedule, responds to triggers, processes inputs, and produces outputs — all orchestrated through Claude Code's native primitives.
|
||||
|
||||
You do not need a separate orchestration framework. Claude Code provides everything required: subagents, skills, hooks, and automation scripts.
|
||||
|
||||
## Core architecture pattern
|
||||
|
||||
The foundational pattern is a three-agent pipeline with a skill that chains them:
|
||||
|
||||
```
Trigger (launchd/cron/manual)
  → Pipeline skill
      → Agent 1: Researcher (gather and structure inputs)
      → Agent 2: Writer (produce primary output)
      → Agent 3: Reviewer (evaluate and approve or revise)
      → Save outputs
      → Update memory
```
|
||||
|
||||
The pipeline skill is a `.claude/skills/<name>/SKILL.md` file with step-by-step instructions. Each step invokes an agent using the `Agent` tool. Agents are defined in `.claude/agents/*.md`.
|
||||
|
||||
This pattern scales from simple (one agent, one step) to complex (six agents, branching logic, parallel execution).
|
||||
|
||||
## System components
|
||||
|
||||
A complete agent system has eight components:
|
||||
|
||||
| Component | Location | Purpose |
|-----------|----------|---------|
| CLAUDE.md | project root | Context, rules, and memory pointers for the project |
| Agents | `.claude/agents/*.md` | Specialized subagents with defined roles and tool access |
| Pipeline skills | `.claude/skills/*/SKILL.md` | Orchestration sequences that chain agents |
| Knowledge skills | `.claude/skills/*/SKILL.md` | Reference knowledge auto-injected by topic |
| Hooks | `hooks/*.sh` | Pre/post tool use guards for automated safety |
| Settings | `.claude/settings.json` | Permissions, tool allowlists, and hook wiring |
| Automation | `scripts/` + `launchd/` | Scheduled execution (Mac: launchd, Linux: systemd/cron) |
| Memory | `memory/` or `data/` | Persistent state files updated each run |
|
||||
|
||||
Not all components are required for every system. Start with agents + one pipeline skill. Add hooks and automation when you move to scheduled/unattended operation.
|
||||
|
||||
## Design principles
|
||||
|
||||
**Start with 2-3 agents.** A researcher and a writer cover most content workflows. A reviewer adds quality gates. Resist the urge to create more agents than you need — each agent adds latency and cost.
|
||||
|
||||
**Pipeline skills chain agents.** The skill file is the orchestrator. It contains the sequencing logic, error handling instructions, and output routing. Keep agents dumb and focused; put the workflow intelligence in the skill.
|
||||
|
||||
**Hooks protect automated runs.** When Claude runs unattended, hooks are your circuit breakers. Use `PreToolUse` hooks to block writes to sensitive paths, enforce naming conventions, or validate inputs before destructive operations. Without hooks, an unattended run has no guardrails.
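
As a rough illustration of the `PreToolUse` guard described above, the decision logic might look like the sketch below. It is written in Python for readability, not as the shell hook this plugin scaffolds. The payload shape (`tool_name`, `tool_input.file_path`), the `PROTECTED_NAMES` list, and the set of write tools are assumptions made for the example; check the actual hook input format against the Claude Code hooks reference before relying on it.

```python
from pathlib import PurePosixPath

# Hypothetical names an unattended run must never write to or into.
PROTECTED_NAMES = {".env", ".git", "secrets"}
# Assumed set of tools that modify files.
WRITE_TOOLS = {"Write", "Edit"}


def should_block(payload: dict) -> bool:
    """Return True when a write-type tool call targets a protected path.

    Assumes the payload carries "tool_name" and a "tool_input" dict with
    a "file_path" key; anything else is allowed through.
    """
    if payload.get("tool_name") not in WRITE_TOOLS:
        return False
    raw_path = payload.get("tool_input", {}).get("file_path", "")
    # Block if any component of the path is a protected name.
    return any(part in PROTECTED_NAMES for part in PurePosixPath(raw_path).parts)
```

A real hook would read the payload from stdin and, when `should_block` returns true, exit with whatever status the hooks documentation defines for denying a tool call.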
|
||||
|
||||
**Memory persists state between runs.** Agents cannot remember previous sessions by default. A memory file (e.g., `memory/MEMORY.md` or `data/run-state.json`) gives the system continuity. Update it at the end of every pipeline run.
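
One possible shape for the end-of-run update is sketched here, assuming a JSON state file such as `data/run-state.json`. The schema (`run_count`, `last_run`, `runs`) and the function name `record_run` are hypothetical; the plugin may store state differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_run(state_path: Path, summary: str) -> dict:
    """Load the state file if it exists, append this run, and write it back."""
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        # First run: start from an empty, hypothetical schema.
        state = {"run_count": 0, "runs": []}
    state["run_count"] += 1
    state["last_run"] = datetime.now(timezone.utc).isoformat()
    state["runs"].append({"at": state["last_run"], "summary": summary})
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

Calling `record_run(Path("data/run-state.json"), "weekly report drafted")` as the last pipeline step gives the next run a timestamp and a summary to start from.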
|
||||
|
||||
**One concern per agent.** Agents that do too many things are hard to tune and debug. A researcher should research. A writer should write. Mixing concerns makes prompt engineering harder and output quality lower.
|
||||
|
||||
## Deployment options
|
||||
|
||||
| Platform | Best for | Notes |
|----------|----------|-------|
| Local workstation | Development, on-demand runs | Use launchd (Mac) or cron (Linux) for scheduling |
| Mac Mini (always-on) | Personal production pipelines | Runs overnight; launchd + wake schedule |
| VPS (Hetzner, DO) | Server-side automation, webhooks | systemd service; pair with SSH access |
| Managed Agents | High-volume, API-driven workflows | Claude API `/v1/agents` endpoint; no local shell needed |
|
||||
|
||||
For most personal or small-team use cases, a Mac Mini or VPS running launchd/systemd is sufficient and much cheaper than Managed Agents at scale.
|
||||
|
||||
## OpenClaw capability coverage
|
||||
|
||||
This architecture covers 22 OpenClaw capabilities. 13 are a full match via native Claude Code primitives. 8 use a different approach but achieve the same outcome. 1 gap remains.
|
||||
|
||||
For the detailed mapping, see:
|
||||
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`
|
||||
|
||||
## Agent frontmatter fields
|
||||
|
||||
```yaml
name: <slug>          # required, used for routing
description: |        # required, must include <example> blocks
  ...
model: sonnet|opus    # required
tools: [...]          # required, explicit allowlist
color: <color>        # optional, UI hint
```
|
||||
|
||||
The `description` field is what triggers agent selection. Write it with concrete user phrases, not abstract capability descriptions.
|
||||
|
||||
## Common mistakes
|
||||
|
||||
- **No examples in agent description** — Claude cannot reliably select the agent without seeing what user messages should trigger it
|
||||
- **Hooks skipped in automation** — unattended runs need guards; add hooks before scheduling
|
||||
- **Memory not updated** — next run starts blind; always write state at the end of a pipeline
|
||||
- **One giant agent** — harder to tune, harder to debug; split by role
|
||||
- **Wrong model for the job** — opus for synthesis and narrative, sonnet for retrieval and transformation
|
||||
|
||||
## Getting started
|
||||
|
||||
Run `/agent-builder:build` for the guided 7-phase workflow. It interviews you about your use case, selects the right pattern, and generates all files.
|
||||
|
||||
For pipeline design patterns and agent role templates, see:
|
||||
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/pipeline-patterns.md`
|
||||
|
||||
For scheduling and deployment configuration, see:
|
||||
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-config.md`
|
||||
|
||||
For the OpenClaw feature map, see:
|
||||
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`
|
||||
189
skills/agent-system-design/references/deployment-targets.md
Normal file
@@ -0,0 +1,189 @@
# Deployment Targets
|
||||
|
||||
Reference for the agent-system-design skill. Covers the four deployment platforms
|
||||
available for Claude Code agents. Use this to guide target selection and scaffold
|
||||
the correct infrastructure files.
|
||||
|
||||
---
|
||||
|
||||
## 1. Local (cron/launchd)
|
||||
|
||||
**How it works:** Claude Code CLI invoked by the host scheduler. No persistent process;
|
||||
each run starts a fresh Claude Code session and exits.
|
||||
|
||||
**Setup files to scaffold:**
|
||||
- `automation.sh` -- wrapper script that sets env vars and invokes `claude`
|
||||
- Cron entry comment block in README, or `launchd.plist` for macOS
|
||||
|
||||
**Cron pattern:**
|
||||
```
0 5 * * * /path/to/project/automation.sh >> /var/log/agent.log 2>&1
```
|
||||
|
||||
**launchd pattern (macOS):**
|
||||
```xml
<key>StartCalendarInterval</key>
<dict>
  <key>Hour</key><integer>5</integer>
  <key>Minute</key><integer>0</integer>
</dict>
```
|
||||
|
||||
**Pros:**
|
||||
- Simple setup, no infrastructure cost
|
||||
- Full tool access (filesystem, shell, MCP)
|
||||
- Easy to test locally before any deployment
|
||||
|
||||
**Cons:**
|
||||
- Not always-on between scheduled runs
|
||||
- No mobile access without a channel plugin
|
||||
- Log rotation must be managed manually
|
||||
|
||||
**Best for:** Development, testing, personal daily pipelines (nightly batch, ecosystem pulse).
|
||||
|
||||
---
|
||||
|
||||
## 2. Mac Mini (launchd + channels)
|
||||
|
||||
**How it works:** A dedicated Mac runs Claude Code inside a persistent `tmux` session.
|
||||
`launchd` ensures the session restarts on reboot. Channel plugins (iMessage, Telegram)
|
||||
provide mobile access.
|
||||
|
||||
**Setup files to scaffold:**
|
||||
- `launchd.plist` -- keeps the tmux session alive across reboots
|
||||
- `tmux-start.sh` -- creates/attaches the named session
|
||||
- `.mcp.json` -- channel MCP server (Telegram or iMessage)
|
||||
- `channels-guide.md` -- instructions for mobile interaction
|
||||
|
||||
**tmux session pattern:**
|
||||
```bash
tmux new-session -d -s agent -x 220 -y 50
tmux send-keys -t agent "claude --model claude-sonnet-4-6" Enter
```
|
||||
|
||||
**Pros:**
|
||||
- Always-on between scheduled runs
|
||||
- Native iMessage support (macOS-only feature)
|
||||
- Full GUI access for Computer Use (Desktop app)
|
||||
- Local hardware, no cloud dependency
|
||||
|
||||
**Cons:**
|
||||
- Requires dedicated physical hardware
|
||||
- iMessage and Computer Use are macOS-only
|
||||
- Single machine; no horizontal scaling
|
||||
|
||||
**Best for:** Personal always-on agent with phone access, Computer Use workflows,
|
||||
home automation pipelines.
|
||||
|
||||
---
|
||||
|
||||
## 3. VPS (systemd + cron)
|
||||
|
||||
**How it works:** Linux server runs Claude Code headless. `systemd` manages the service
|
||||
lifecycle. Cron handles scheduled tasks. Telegram or Slack provides the channel layer.
|
||||
|
||||
**Setup files to scaffold:**
|
||||
- `systemd/agent.service` -- service unit for the agent process
|
||||
- `systemd/agent.timer` -- timer unit for scheduled runs (alternative to cron)
|
||||
- `automation.sh` -- invocation wrapper
|
||||
- `.mcp.json` -- Telegram or Slack MCP server
|
||||
- `channels-guide.md` -- instructions for team interaction
|
||||
|
||||
**systemd service pattern:**
|
||||
```ini
[Service]
ExecStart=/path/to/automation.sh
Restart=on-failure
User=agent
Environment=HOME=/home/agent
```
|
||||
|
||||
**systemd timer pattern:**
|
||||
```ini
[Timer]
OnCalendar=*-*-* 05:00:00
Persistent=true
```
|
||||
|
||||
**Pros:**
|
||||
- True always-on with automatic restart on failure
|
||||
- Scalable: multiple agents on one server or across servers
|
||||
- Team access via shared Telegram/Slack channel
|
||||
- No hardware to manage locally
|
||||
|
||||
**Cons:**
|
||||
- No iMessage (Linux)
|
||||
- No Computer Use (no display server in headless mode)
|
||||
- Requires server management (security patches, disk, logs)
|
||||
|
||||
**Best for:** Server-side agents, team pipelines, production workloads, nightly batch
|
||||
that must run reliably even when the developer's laptop is off.
|
||||
|
||||
---
|
||||
|
||||
## 4. Managed Agents (Anthropic API)
|
||||
|
||||
**How it works:** Anthropic hosts the agent runtime. The builder deploys via the
|
||||
`/v1/agents` and `/v1/sessions` REST API, or uses an SDK: `@anthropic-ai/sdk` for TypeScript or `anthropic` for Python.
|
||||
No local Claude Code installation required.
|
||||
|
||||
**Setup files to scaffold:**
|
||||
- `agent.ts` or `agent.py` -- SDK code defining the agent
|
||||
- `sessions.ts` -- session management helpers
|
||||
- `.env.template` -- API key template
|
||||
- `README.md` -- deployment and configuration instructions
|
||||
|
||||
**TypeScript pattern:**
|
||||
```typescript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const session = await client.sessions.create({ agent_id: "ag_..." });
```
|
||||
|
||||
**Pros:**
|
||||
- Cloud-native, zero infrastructure to manage
|
||||
- Scales automatically across sessions
|
||||
- Persistent sessions via the API
|
||||
- Integrates directly into SaaS products or APIs
|
||||
|
||||
**Cons:**
|
||||
- Different architecture from Claude Code CLI (no local filesystem by default)
|
||||
- API costs per token, no flat rate
|
||||
- Less direct filesystem access than CLI agents
|
||||
- Feature set tied to what the Managed Agents API exposes
|
||||
|
||||
**Best for:** Production deployment at scale, SaaS product integration, agents that
|
||||
must be accessible to end users without CLI access.
|
||||
|
||||
---
|
||||
|
||||
## Comparison Matrix
|
||||
|
||||
| Dimension | Local (cron) | Mac Mini | VPS | Managed API |
|-----------|-------------|----------|-----|-------------|
| Always-on | No | Yes | Yes | Yes |
| Computer Use | No | Yes (Desktop) | No | No |
| iMessage | No | Yes | No | No |
| Telegram/Slack | Via MCP | Via MCP | Via MCP | Via integration |
| Infrastructure cost | Zero | Hardware | VPS fee | API token cost |
| Setup complexity | Low | Medium | Medium | Low (SDK) |
| Team access | No | Limited | Yes | Yes |
| Filesystem access | Full | Full | Full | Limited |
| Horizontal scaling | No | No | Manual | Automatic |
| Best environment | Dev/test | Personal | Team/prod | SaaS/prod |
|
||||
|
||||
## Scaffold Decision Guide
|
||||
|
||||
1. **Is this a personal agent for one developer?** Start with Local. Upgrade to Mac Mini
|
||||
if always-on or iMessage is required.
|
||||
|
||||
2. **Does the agent need Computer Use?** Mac Mini with Claude Code Desktop is the only
|
||||
option today. Document this constraint if the user is on Linux or Windows.
|
||||
|
||||
3. **Is this a team or production workload?** VPS with systemd. Add Telegram/Slack channel.
|
||||
|
||||
4. **Is this going into a product or SaaS?** Managed API. Scaffold TypeScript SDK code,
|
||||
not a CLAUDE.md agent.
|
||||
|
||||
5. **Do not mix targets in one scaffold.** Pick one. Document the alternatives in the
|
||||
`## Deployment` section of the generated README.
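
The guide above can be read as a small decision function. The sketch below is one interpretation: the flag names are hypothetical, and the precedence (Computer Use first, because it has only one viable target) is an assumption, since the guide does not say how to resolve conflicting answers.

```python
def choose_target(
    *,
    needs_computer_use: bool = False,
    product_or_saas: bool = False,
    team_or_production: bool = False,
    needs_always_on_or_imessage: bool = False,
) -> str:
    """Pick exactly one deployment target for the scaffold."""
    if needs_computer_use:
        return "Mac Mini"  # the only target the guide lists for Computer Use
    if product_or_saas:
        return "Managed API"
    if team_or_production:
        return "VPS"
    if needs_always_on_or_imessage:
        return "Mac Mini"
    return "Local"  # default for a single developer
```

For example, `choose_target(team_or_production=True)` selects the VPS target, and a call with no flags falls back to Local.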
|
||||
80
skills/agent-system-design/references/feature-map.md
Normal file
@@ -0,0 +1,80 @@
# OpenClaw vs Claude Code Feature Map
|
||||
|
||||
Builder reference for the agent-system-design skill. For each capability, the table shows
|
||||
what to scaffold when Claude Code is the target runtime.
|
||||
|
||||
## Capability Coverage
|
||||
|
||||
| # | Capability | Status | What to scaffold | Min CC version |
|---|-----------|--------|-----------------|---------------|
| 1 | Agent Runtime | OK | `CLAUDE.md` + `settings.json` | v2.1.84 |
| 2 | Shell Execution | OK | `hooks/pre-tool-use.sh` + deny list in `settings.json` | v2.1.78 |
| 3 | File I/O | OK | `settings.json` allow list under `permissions.allow` | baseline |
| 4 | Web Search | OK | `settings.json` allow list (WebSearch tool) | baseline |
| 5 | Browser | OK | `.mcp.json` with Playwright server entry | external |
| 6 | Computer Use | Docs | README note: Desktop app required, not available headless | v2.1.86 |
| 7 | Memory | Partial | `memory/MEMORY.md` + memory block in `CLAUDE.md` | v2.1.32 |
| 8 | Multi-Agent | OK | `.claude/agents/*.md` subagent definitions | v2.1.32 |
| 9 | Messaging | Partial | `.mcp.json` Slack server + channels guide in README | v2.1.80 |
| 10 | Model Providers | Partial | `model:` frontmatter in agent `.md` files | baseline |
| 11 | Cron/Automation | OK | `automation.sh` wrapper + `launchd.plist` or `crontab` entry | v2.1.71 |
| 12 | Always-On | Partial | `launchd`/`systemd` service + `tmux` session guide | infra |
| 13 | Plugin System | OK | Plugin manifest (`plugin.json` + `CLAUDE.md`) | v2.1.84 |
| 14 | Skills | OK | `.claude/skills/*/SKILL.md` skill definitions | baseline |
| 15 | Security | OK | Hooks + permissions deny list + audit log | v2.1.78 |
| 16 | Voice/TTS | Docs | README note: MCP-based approach, no native support | N/A |
| 17 | Companion Apps | Docs | README refs to Desktop app and Dispatch channel | v2.1.85 |
| 18 | Gateway | Partial | `/schedule` skill + HTTP webhook hooks | v2.1.63 |
| 19 | Canvas/A2UI | Gap | Playwright workaround only, no native equivalent | N/A |
| 20 | Configuration | OK | `settings.json` + `CLAUDE.md` hierarchy | v2.1.84 |
| 21 | Chat Commands | OK | `.claude/skills/*/SKILL.md` (slash commands) | baseline |
| 22 | CLI | OK | Wrapper scripts (`automation.sh`, `run-agent.sh`) | baseline |
|
||||
|
||||
**Score: 13 full OK (59%) | 8 different approach/Partial/Docs (36%) | 1 gap (5%)**
|
||||
**Minimum version for full coverage: v2.1.86** (Computer Use requires Desktop app)
|
||||
|
||||
## Status Key
|
||||
|
||||
| Status | Meaning |
|--------|---------|
| OK | Native Claude Code equivalent, scaffold directly |
| Partial | Functional but requires workaround or external integration |
| Docs | No runtime equivalent; document the limitation and alternative |
| Gap | No practical equivalent in Claude Code today |
|
||||
|
||||
## Scaffold Actions by Status
|
||||
|
||||
**OK** -- Generate the file(s) listed in "What to scaffold". Standard templates apply.
|
||||
|
||||
**Partial** -- Scaffold what exists, add a `## Limitations` section to the README noting
|
||||
the gap and the workaround. Do not promise feature parity.
|
||||
|
||||
**Docs** -- Add a `## Notes` section to the README only. Do not scaffold non-existent
|
||||
infrastructure. Link to the relevant Anthropic documentation or issue.
|
||||
|
||||
**Gap** -- Add a `## Known Gaps` section. Acknowledge the gap, document the workaround
|
||||
(Playwright for Canvas/A2UI), and note if it is on the roadmap.
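
The four rules above amount to a lookup from status to scaffold action. A minimal sketch follows, with hypothetical names (`README_SECTION`, `plan_for`) and a simplified plan structure that is not part of the plugin.

```python
# README section each status adds, per the rules above; None means no extra section.
README_SECTION = {
    "OK": None,
    "Partial": "## Limitations",
    "Docs": "## Notes",
    "Gap": "## Known Gaps",
}

# Statuses for which the files listed under "What to scaffold" are generated.
GENERATES_FILES = {"OK", "Partial"}


def plan_for(capability: str, status: str) -> dict:
    """Return the scaffold action for one capability row."""
    if status not in README_SECTION:
        raise ValueError(f"unknown status: {status}")
    return {
        "capability": capability,
        "generate_files": status in GENERATES_FILES,
        "readme_section": README_SECTION[status],
    }
```

With this mapping, `plan_for("Voice/TTS", "Docs")` produces a README note only, which matches the rule against scaffolding non-existent infrastructure.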
|
||||
|
||||
## Claude Code Ecosystem Map
|
||||
|
||||
How Claude Code components map to the OpenClaw product family:
|
||||
|
||||
| Claude Code component | OpenClaw equivalent | Notes |
|----------------------|--------------------|-------|
| Claude Code CLI | OpenClaw core agent | Headless, full tool access |
| Claude Code Desktop | OpenClaw + macOS app | Adds Computer Use, GUI |
| Cowork | OpenClaw for non-developers | Simplified UX, no CLI |
| Dispatch | Telegram/WhatsApp channels | Mobile access layer |
| `/schedule` skill | HEARTBEAT.md cron | Scheduled agent triggers |
| Anthropic Agent SDK | OpenClaw API | Managed agents via `/v1/agents` |
|
||||
|
||||
## Version Compatibility Notes
|
||||
|
||||
- **baseline**: Available since first public Claude Code release; no version gate.
|
||||
- **external**: Depends on MCP server availability, not Claude Code version.
|
||||
- **infra**: Depends on the deployment host (macOS/Linux), not Claude Code version.
|
||||
- **N/A**: Not applicable to Claude Code; alternative approach required.
|
||||
|
||||
When scaffolding for a specific Claude Code version, check that all required capabilities
|
||||
meet the min version. If the user's version is below v2.1.86, exclude Computer Use from
|
||||
the feature set and document it under Known Gaps.
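
The version check described above needs a numeric comparison, since comparing `v2.1.9` and `v2.1.84` as strings gives the wrong order. A small sketch, assuming version strings always follow the `vMAJOR.MINOR.PATCH` form used in the table; the function names are hypothetical.

```python
# Markers from the "Min CC version" column that are not version gates.
UNGATED = {"baseline", "external", "infra", "N/A"}


def parse_version(text: str) -> tuple:
    """Turn 'v2.1.86' into (2, 1, 86) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def unsupported(min_versions: dict, installed: str) -> list:
    """List capabilities whose minimum version is newer than the installed one."""
    have = parse_version(installed)
    return [
        name
        for name, minimum in min_versions.items()
        if minimum not in UNGATED and parse_version(minimum) > have
    ]
```

Anything returned by `unsupported` would be left out of the feature set and listed under Known Gaps in the generated README.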
|
||||
357
skills/agent-system-design/references/pipeline-patterns.md
Normal file
@@ -0,0 +1,357 @@
# Pipeline Patterns Reference
|
||||
|
||||
Detailed patterns for designing multi-agent pipelines in Claude Code.
|
||||
|
||||
---
|
||||
|
||||
## The 3-agent pattern
|
||||
|
||||
The foundational pattern for autonomous content and analysis workflows.
|
||||
|
||||
**Roles:**
|
||||
- **Researcher** — gathers inputs, structures knowledge, produces a brief
|
||||
- **Writer** — produces primary output from the brief
|
||||
- **Reviewer** — evaluates output against criteria, approves or requests revision
|
||||
|
||||
**When to use:** Any workflow where you need sourced input, generated output, and a quality gate. Content production, report generation, code review pipelines, competitive analysis.
|
||||
|
||||
**How to customize:**
|
||||
|
||||
| Domain | Researcher focus | Writer focus | Reviewer criteria |
|--------|-----------------|--------------|-------------------|
| Content | Web sources, existing articles, reader questions | Article draft matching voice and format | Accuracy, engagement, brand voice |
| Engineering | Codebase patterns, issue context, API docs | Implementation or PR description | Correctness, style, test coverage |
| Consulting | Client data, market research, precedents | Recommendation or slide content | Evidence quality, actionability |
| Operations | Logs, metrics, incident history | Incident report or runbook update | Completeness, clarity, ownership |
|
||||
|
||||
**Scaling the pattern:**
|
||||
|
||||
- Add a **Finalizer** agent between Reviewer and output for polish steps (SEO, formatting, compliance checks)
|
||||
- Add a **Distributor** agent after output for routing (email, Slack, WordPress, Linear)
|
||||
- Replace the single Researcher with a **parallel research team** (two agents gathering different source types simultaneously)
|
||||
- Add a **Memory Manager** agent that reads and writes state files, keeping other agents focused on their domain
|
||||
|
||||
---
|
||||
|
||||
## The 9-step pipeline template
|
||||
|
||||
This is the canonical sequence for a full pipeline skill. Adapt as needed — not all steps are required for every workflow.
|
||||
|
||||
```
|
||||
Step 1: Read project context
|
||||
- Read CLAUDE.md and any project-specific config
|
||||
- Establish constraints before any agent runs
|
||||
|
||||
Step 2: Read memory / previous state
|
||||
- Load memory/MEMORY.md or data/run-state.json
|
||||
- Pass relevant state to downstream agents as context
|
||||
|
||||
Step 3: Agent 1 — Researcher
|
||||
- Invoke with: topic, constraints, memory context
|
||||
- Output: structured research brief (markdown or JSON)
|
||||
|
||||
Step 4: Agent 2 — Writer
|
||||
- Invoke with: research brief, output format spec, voice guidelines
|
||||
- Output: primary draft
|
||||
|
||||
Step 5: Agent 3 — Reviewer
|
||||
- Invoke with: draft, scoring rubric, acceptance criteria
|
||||
- Output: score + pass/fail + specific revision requests
|
||||
|
||||
Step 6: Revision loop (conditional)
|
||||
- If reviewer score < threshold: invoke Writer again with feedback
|
||||
- Max 2 revision passes before escalating to human
|
||||
- If max passes exceeded: save draft with NEEDS_REVIEW flag
|
||||
|
||||
Step 7: Save outputs
|
||||
- Write final output to designated location
|
||||
- Publish if automated publishing is configured
|
||||
|
||||
Step 8: Update memory
|
||||
- Append run summary to memory file
|
||||
- Update counters, timestamps, last-processed markers
|
||||
|
||||
Step 9: Confirm and report
|
||||
- Print summary of what was produced
|
||||
- List any items that need human attention
|
||||
```
|
||||
|
||||
**Revision loop implementation note:** The loop should be explicit in the skill file. Do not rely on the agent to decide whether to loop — tell it exactly: "If the reviewer score is below 70, invoke the writer agent again with the reviewer's feedback. Do this at most twice."
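
To make the bounded loop concrete, here is the same control flow as a Python sketch. In the real system the loop is written as instructions in the skill file and the two callables stand in for `Agent` tool invocations; `run_with_revisions`, `write`, and `review` are hypothetical names used only for this illustration.

```python
from typing import Callable, Optional

PASS_SCORE = 70     # threshold quoted in the note above
MAX_REVISIONS = 2   # "at most twice"


def run_with_revisions(
    write: Callable[[Optional[str]], str],
    review: Callable[[str], tuple],
) -> dict:
    """Write, review, and revise until the draft passes or revisions run out.

    `write` takes reviewer feedback (None on the first pass) and returns a draft.
    `review` takes a draft and returns a (score, feedback) pair.
    """
    draft = write(None)
    score, feedback = review(draft)
    revisions = 0
    while score < PASS_SCORE and revisions < MAX_REVISIONS:
        draft = write(feedback)
        score, feedback = review(draft)
        revisions += 1
    # A draft that still fails is saved for a human instead of looping forever.
    status = "PASS" if score >= PASS_SCORE else "NEEDS_REVIEW"
    return {"draft": draft, "score": score, "revisions": revisions, "status": status}
```

The loop counter lives outside the agents, which is the point of the note: the writer and reviewer never decide for themselves whether another pass happens.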
|
||||
|
||||
---
|
||||
|
||||
## Agent role templates
|
||||
|
||||
Copy these as starting points. Replace bracketed values.
|
||||
|
||||
### Researcher
|
||||
|
||||
````markdown
---
name: researcher
description: |
  Use this agent to gather and structure information before writing or analysis.

  <example>
  Context: Pipeline needs sourced input before writing
  user: "Research [topic] for this week's report"
  assistant: "I'll use the researcher agent to gather sources and produce a brief."
  <commentary>
  Research request before production triggers the researcher.
  </commentary>
  </example>
model: sonnet
tools: ["Read", "Glob", "Grep", "WebSearch", "Bash"]
---

## How you work

You produce research briefs, not finished content. Your output is always structured
for a downstream writer to consume.

1. Read any existing memory or prior research on this topic
2. Gather sources using available tools (web search, local files, MCP servers)
3. Extract the 5-7 most relevant facts, quotes, or data points
4. Note source reliability and any gaps in coverage
5. Produce a brief with sections: Background, Key Points, Sources, Gaps

## Rules

- Never fabricate sources or quotes
- Mark unverified claims explicitly
- Keep briefs under 800 words unless the topic demands more
- List every source URL or file path used

## Output format

```
## Research Brief: [Topic]
Date: [date]

### Background
[2-3 sentences of context]

### Key Points
- [point 1] (source: [url/file])
- [point 2] (source: [url/file])
...

### Sources
[list of all sources consulted]

### Gaps
[what could not be verified or found]
```
````
|
||||
|
||||
### Writer
|
||||
|
||||
```markdown
|
||||
---
|
||||
name: writer
|
||||
description: |
|
||||
Use this agent to produce primary written output from a research brief or spec.
|
||||
|
||||
<example>
|
||||
Context: Research brief is ready, article needs to be written
|
||||
user: "Write the article from this brief"
|
||||
assistant: "I'll use the writer agent to draft from the research brief."
|
||||
<commentary>
|
||||
Production request with existing brief triggers the writer.
|
||||
</commentary>
|
||||
</example>
|
||||
model: opus
|
||||
tools: ["Read", "Write", "Glob"]
|
||||
---
|
||||
|
||||
## How you work
|
||||
|
||||
You produce first drafts from structured inputs. You do not research — you write.
|
||||
|
||||
1. Read the research brief and any style/voice guidelines
|
||||
2. Read examples of approved past output for voice calibration
|
||||
3. Draft the primary output following the specified format
|
||||
4. Do not add information not present in the brief
|
||||
5. Flag any gaps where the brief was insufficient
|
||||
|
||||
## Rules
|
||||
|
||||
- Follow the voice and format guidelines exactly
|
||||
- Never add claims not supported by the brief
|
||||
- Keep within specified word count ±10%
|
||||
- End with a concrete takeaway or call to action
|
||||
|
||||
## Output format
|
||||
|
||||
[Specify the exact output format for your domain]
|
||||
```
|
||||
|
||||
### Reviewer
|
||||
|
||||
````markdown
---
name: reviewer
description: |
  Use this agent to evaluate output quality and approve or request revisions.

  <example>
  Context: Draft is ready for quality check
  user: "Review this draft before publishing"
  assistant: "I'll use the reviewer agent to score and evaluate the draft."
  <commentary>
  Quality evaluation request triggers the reviewer.
  </commentary>
  </example>
model: opus
tools: ["Read"]
---

## How you work

You evaluate drafts against defined criteria and produce a scored assessment.

1. Read the draft and the original brief or requirements
2. Score against each dimension in the rubric (see Output format)
3. Note specific issues with line references where possible
4. Produce a pass/fail decision with justification

## Rules

- Score honestly — do not inflate to avoid revision cycles
- Be specific: "paragraph 3 is vague" not "needs more detail"
- Pass threshold is 70/100 overall, with no single dimension below 13/25 (roughly half of its 25 points)

## Output format

```
## Review: [Draft title]

### Scores
- Accuracy: [0-25] — [one sentence justification]
- Clarity: [0-25] — [one sentence justification]
- Completeness: [0-25] — [one sentence justification]
- Format/Voice: [0-25] — [one sentence justification]

### Overall: [total]/100

### Decision: PASS | REVISE | REJECT

### Revision requests (if REVISE)
1. [specific request]
2. [specific request]
```
````
|
||||
|
||||
---
|
||||
|
||||
## Quality gates: 4-level scoring rubric
|
||||
|
||||
Use this rubric in reviewer agents and pipeline acceptance criteria.
|
||||
|
||||
| Dimension | 0-12 (Poor) | 13-18 (Acceptable) | 19-22 (Good) | 23-25 (Excellent) |
|-----------|-------------|-------------------|--------------|-------------------|
| **Accuracy** | Multiple errors or unsupported claims | Minor errors, mostly supported | All claims verifiable | Fully sourced, no errors |
| **Clarity** | Hard to follow, jargon-heavy | Mostly clear, some confusion | Clear throughout | Immediately clear, no ambiguity |
| **Completeness** | Major gaps, incomplete | Covers main points, some gaps | Thorough coverage | Nothing missing |
| **Format/Voice** | Wrong format or tone | Mostly correct, minor deviations | Correct format and tone | Perfect fit for context |
|
||||
|
||||
**Thresholds:**
|
||||
- 90-100: Publish immediately
|
||||
- 70-89: Publish with minor edits
|
||||
- 50-69: Revise and re-review
|
||||
- Below 50: Reject, start over or escalate to human
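
The rubric and thresholds combine into a simple calculation: four dimension scores of 0-25 are summed to a 0-100 total, and the total selects an action. A sketch with hypothetical function and key names:

```python
DIMENSIONS = ("accuracy", "clarity", "completeness", "format_voice")


def total_score(scores: dict) -> int:
    """Sum the four 0-25 dimension scores into a 0-100 total."""
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 25:
            raise ValueError(f"{name} must be between 0 and 25")
    return sum(scores[name] for name in DIMENSIONS)


def action_for(total: int) -> str:
    """Map a 0-100 total onto the four threshold bands listed above."""
    if total >= 90:
        return "publish"
    if total >= 70:
        return "publish with minor edits"
    if total >= 50:
        return "revise and re-review"
    return "reject or escalate"
```

For instance, scores of 23, 20, 19, and 18 total 80, which lands in the "publish with minor edits" band.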
|
||||
|
||||
---
|
||||
|
||||
## Pipeline skill format
|
||||
|
||||
Pipeline skills live in `.claude/skills/<name>/SKILL.md`. They are invoked explicitly as `/plugin:skill-name`, or triggered automatically when a request matches the trigger phrases in the skill's `description`.
|
||||
|
||||
```markdown
---
name: weekly-report
description: |
  Run the weekly report pipeline. Triggers on: "run weekly report",
  "generate this week's report", "weekly pipeline"
version: 0.1.0
---

## Weekly Report Pipeline

Run these steps in order. Do not skip steps. If a step fails, stop and report the error.

### Step 1: Load context
Read `CLAUDE.md` and `memory/MEMORY.md`. Note the last run date and any pending items.

### Step 2: Research
Use the Agent tool to invoke the `researcher` agent with this prompt:
"Research [topic] for the period [date range]. Focus on [specific angle]."
Save the research brief to `data/research-brief-[date].md`.

### Step 3: Write
Use the Agent tool to invoke the `writer` agent with this prompt:
"Write the weekly report from [path to brief]. Follow the format in [style guide path]."
Save the draft to `drafts/weekly-[date].md`.

### Step 4: Review
Use the Agent tool to invoke the `reviewer` agent with this prompt:
"Review the draft at [path]. Use the standard 4-dimension rubric."

### Step 5: Handle review result
- If score >= 70: proceed to Step 6
- If score < 70 and revision count < 2: invoke writer again with reviewer feedback, then re-review
- If score < 70 after 2 revisions: save draft with NEEDS_REVIEW flag, skip to Step 7

### Step 6: Finalize
[Publishing or distribution steps]

### Step 7: Update memory
Append to `memory/MEMORY.md`:
- Date of run
- Output file path
- Review score
- Any items needing human attention
```

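The branching in Step 5 is the part of the pipeline most likely to be misread, so here is a minimal sketch of it as a shell function. `next_step` and the labels it prints are hypothetical names for illustration; the skill itself expresses this logic in prose for the model to follow.

```bash
# Decide the next pipeline step from a review score and the number of
# revisions already made. Mirrors Step 5 of the example skill.
# `next_step` is a hypothetical helper, not part of the plugin.
next_step() {
  local score="$1" revisions="$2"
  if [ "$score" -ge 70 ]; then
    echo "finalize"       # Step 6
  elif [ "$revisions" -lt 2 ]; then
    echo "revise"         # invoke writer with feedback, then re-review
  else
    echo "needs-review"   # save draft with NEEDS_REVIEW flag, go to Step 7
  fi
}
```

Capping the revision count keeps an unattended run from looping indefinitely on a draft the reviewer will never pass.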
---

## Agent frontmatter: all valid fields

```yaml
name: <string>                    # required — slug, used for routing and invocation
description: |                    # required — trigger text + examples
  <string>
model: sonnet|opus|haiku|inherit  # required — model for this agent's runs
tools: [<string>, ...]            # required — explicit tool allowlist
color: <string>                   # optional — UI color hint (green, blue, red, yellow, purple)
```

Tools available for agents: `Read`, `Write`, `Edit`, `Glob`, `Grep`, `Bash`, `WebSearch`, `WebFetch`, `Agent`, `AskUserQuestion`, and any MCP tool by its full name (e.g., `mcp__tavily__tavily_search`).

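A quick way to catch a malformed agent file before it is loaded is to check that its frontmatter defines the expected keys. The sketch below is an assumption-laden helper, not a Claude Code feature: `has_frontmatter_keys` is a hypothetical name, and it only looks for top-level `key:` lines between the first pair of `---` delimiters.

```bash
# Return 0 if the file's leading frontmatter block defines every given key.
# `has_frontmatter_keys` is a hypothetical helper for this sketch.
has_frontmatter_keys() {
  local file="$1"; shift
  local frontmatter key
  # Collect the lines between the first and second '---' delimiters.
  frontmatter="$(awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$file")"
  for key in "$@"; do
    if ! grep -q "^${key}:" <<<"$frontmatter"; then
      echo "missing: $key" >&2
      return 1
    fi
  done
}
```

For example, `has_frontmatter_keys agents/researcher.md name description` exits non-zero and names the first missing key on stderr.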
---

## How agents communicate

**Agent tool (sequential):** The orchestrating skill or parent agent uses the `Agent` tool to invoke a subagent. The subagent runs to completion and returns its output. This is the standard pattern for pipeline steps.

```
Agent tool call:
  agent: researcher
  prompt: "Research X and produce a brief in the format..."
→ researcher runs, returns brief text
→ parent continues with Step 2
```

**SendMessage (async / worktree):** For parallel execution, agents can be spawned in separate worktrees. Each worktree runs independently; results are assembled by the orchestrator after all complete. Use this when steps have no dependencies on each other (e.g., researching two topics simultaneously).

**Worktree isolation:** When an agent runs in a worktree, it has its own working copy of the repository. It cannot see changes made by other agents running simultaneously. Use a shared output directory (outside the worktrees) or a coordination file to merge results.

**File-based handoff (simple and reliable):** The most robust communication pattern is file-based. Each agent writes its output to a designated path; the next agent reads from that path. This works in any execution mode and produces an audit trail of intermediate outputs.

```
researcher → data/brief-2026-04-10.md
writer → reads data/brief-2026-04-10.md → drafts/article-2026-04-10.md
reviewer → reads drafts/article-2026-04-10.md → data/review-2026-04-10.md
```

For most personal and small-team pipelines, sequential execution with file-based handoff is the right choice. It is simpler to debug, easier to resume after failure, and produces a clear audit trail.
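
To make the file-based handoff concrete, the sketch below runs a stage only when the upstream file exists and is non-empty, which is what makes a failed run easy to resume. `run_stage` is a hypothetical wrapper, and the stage command passed to it stands in for whatever actually invokes the agent.

```bash
# Run one pipeline stage only if its input file exists and is non-empty.
# The stage command reads the input on stdin and writes the output to stdout.
# `run_stage` is a hypothetical helper for this sketch.
run_stage() {
  local input="$1" output="$2"; shift 2
  if [ ! -s "$input" ]; then
    echo "handoff missing or empty: $input" >&2
    return 1
  fi
  mkdir -p "$(dirname "$output")"
  "$@" < "$input" > "$output"
}
```

Because every intermediate file stays on disk, rerunning after a failure only needs to start from the first stage whose input is present and whose output is not.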
280
skills/agent-system-design/references/security-patterns.md
Normal file
@ -0,0 +1,280 @@
# Security Patterns for Autonomous Agents

Reference for the agent-system-design skill. Covers permission modes, hook-based
guardrails, settings.json configuration, and a checklist for hardening autonomous agents.

---

## 1. Permission Modes

Four levels of approval, from most restrictive to least:

### Default (interactive approval)
Claude asks the user before any tool call that modifies files or runs shell commands;
read-only tools such as `Read` and `Grep` run without a prompt. Suitable for exploratory sessions.
No configuration required.

### Auto-edit (AcceptEdits mode)
File edits are auto-approved. Bash commands still require approval.
Enables faster iteration on code without allowing arbitrary shell execution.

```json
{
  "permissions": {
    "defaultMode": "acceptEdits"
  }
}
```

### Auto Mode (AI classifier)
An AI classifier evaluates each tool call and approves or blocks it based on
predicted risk. Reported metrics: 0.4% false positive rate (safe calls blocked),
5.7% false negative rate (unsafe calls approved).

Suitable for: attended automation where the developer is nearby and can intervene.
Not suitable for: fully unattended production agents.

```json
{
  "autoApprove": ["auto"]
}
```

### Bypass (--dangerously-skip-permissions)
All tool calls are auto-approved without review. Must only be used inside a
sandbox (Docker, VM, or a throwaway environment with no access to production systems).
Never use on the host machine with access to real credentials or filesystems.

```bash
claude --dangerously-skip-permissions -p "run the pipeline"
```

**Decision rule:** Match the permission mode to the blast radius. The more access
the agent has, the more restrictive the approval mode must be.

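One way to enforce the sandbox-only rule for bypass mode is a wrapper that refuses to launch unless it can see a container marker. This is a sketch under assumptions: `run_bypass` and the `SANDBOX_MARKER` variable are hypothetical, and the default marker `/.dockerenv` is the file Docker creates inside containers, so a VM or other sandbox would need a different marker.

```bash
# Launch Claude Code in bypass mode only when a sandbox marker file exists.
# `run_bypass` and SANDBOX_MARKER are hypothetical names for this sketch.
run_bypass() {
  local marker="${SANDBOX_MARKER:-/.dockerenv}"
  if [ ! -e "$marker" ]; then
    echo "refusing bypass mode: no sandbox marker at $marker" >&2
    return 1
  fi
  claude --dangerously-skip-permissions -p "$1"
}
```

Calling `run_bypass "run the pipeline"` on the host then fails fast instead of silently running with every tool call approved.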
---

## 2. Hook-Based Guardrails

Hooks run synchronously before and after tool calls. A `PreToolUse` hook that exits
with code 2 blocks the tool call entirely; other non-zero exit codes are reported as
errors but do not block the call. Five standard patterns:

### Pattern 1: Destructive Command Blocking
Block commands that cannot be undone.

```bash
# hooks/pre-tool-use.sh
# Claude Code passes the hook payload as JSON on stdin; requires jq.
INPUT="${INPUT:-$(cat)}"
COMMAND="$(jq -r '.tool_input.command // empty' <<<"$INPUT")"

BLOCKED_PATTERNS=(
  "rm -rf"
  "mkfs"
  "dd if="
  ":(){ :|:& };:"  # fork bomb
  "> /dev/"
)
for pattern in "${BLOCKED_PATTERNS[@]}"; do
  if echo "$COMMAND" | grep -qF -- "$pattern"; then
    echo "BLOCKED: destructive pattern detected: $pattern" >&2
    exit 2  # exit code 2 is what blocks the tool call
  fi
done
```

### Pattern 2: Piped Script Execution Blocking
Block patterns that download and execute code in a single pipeline.

```bash
# With grep -E a bare | means alternation, so the literal pipe is written as [|].
PIPED_EXEC_PATTERNS=(
  "curl.*[|].*bash"
  "curl.*[|].*sh"
  "wget.*[|].*bash"
  "wget.*[|].*sh"
)
for pattern in "${PIPED_EXEC_PATTERNS[@]}"; do
  if echo "$COMMAND" | grep -qE "$pattern"; then
    echo "BLOCKED: piped script execution detected" >&2
    exit 2  # exit code 2 is what blocks the tool call
  fi
done
```

### Pattern 3: Privilege Escalation Blocking
Block commands that elevate privileges or weaken file permissions.

```bash
PRIV_PATTERNS=(
  "sudo"
  "chmod 777"
  "chmod a+x /etc"
  "shutdown"
  "reboot"
  "init 0"
)
for pattern in "${PRIV_PATTERNS[@]}"; do
  if echo "$COMMAND" | grep -qF "$pattern"; then
    echo "BLOCKED: privilege escalation detected" >&2
    exit 2  # exit code 2 is what blocks the tool call
  fi
done
```

### Pattern 4: Path Restriction
Prevent writes outside the project directory.

```bash
PROJECT_DIR="$(cd "$(dirname "$0")/.." && pwd)"
# Hook payload is JSON on stdin; read it once and reuse it (requires jq).
INPUT="${INPUT:-$(cat)}"
TOOL_PATH="$(jq -r '.tool_input.file_path // empty' <<<"$INPUT")"
if [ -n "$TOOL_PATH" ]; then
  REAL_PATH="$(realpath "$TOOL_PATH" 2>/dev/null || echo "$TOOL_PATH")"
  # The trailing slash keeps sibling directories such as "$PROJECT_DIR-old" from matching.
  if [[ "$REAL_PATH" != "$PROJECT_DIR"/* ]]; then
    echo "BLOCKED: write outside project dir: $REAL_PATH" >&2
    exit 2  # exit code 2 is what blocks the tool call
  fi
fi
```

### Pattern 5: Audit Logging
Log every tool call with timestamp for post-hoc review.

```bash
# hooks/post-tool-use.sh
# CLAUDE_PROJECT_DIR is set by Claude Code when it runs hook commands.
LOG_FILE="${CLAUDE_PROJECT_DIR:-.}/logs/audit.log"
mkdir -p "$(dirname "$LOG_FILE")"
# The payload arrives as JSON on stdin; write one line per tool call (requires jq).
jq -r --arg ts "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" \
  '"\($ts) TOOL=\(.tool_name) INPUT=\(.tool_input | tojson)"' >> "$LOG_FILE"
```

Combine Patterns 1-4 into a single `pre-tool-use.sh` and keep Pattern 5 in a separate
`post-tool-use.sh`. Keep them under 80 lines each so they are auditable at a glance.

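Before wiring a hook into a live session, it helps to exercise the matching logic on its own. The sketch below wraps a subset of the Pattern 1 and Pattern 3 substrings in a function; `check_command` is a hypothetical name, and it returns status 2 because that is the exit code Claude Code treats as a blocking error for `PreToolUse` hooks.

```bash
# Return 2 if the command contains a blocked substring, 0 otherwise.
# `check_command` is a hypothetical wrapper; the list is a subset of
# the substrings from Patterns 1 and 3.
check_command() {
  local command="$1" pattern
  local patterns=("rm -rf" "mkfs" "dd if=" "sudo" "chmod 777")
  for pattern in "${patterns[@]}"; do
    if grep -qF -- "$pattern" <<<"$command"; then
      echo "BLOCKED: $pattern" >&2
      return 2
    fi
  done
  return 0
}
```

Note that fixed-substring matching is deliberately blunt: `sudo` also matches a command that merely mentions the word, so expect some false positives in exchange for a check that is easy to audit.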
---

## 3. settings.json Security Configuration

The full security configuration surface in `settings.json`:

```json
{
  "permissions": {
    "allow": [
      "Bash(git:*)",
      "Bash(npm run *)",
      "Read(**)",
      "Write(src/**)",
      "Edit(src/**)"
    ],
    "deny": [
      "Bash(rm -rf *)",
      "Bash(sudo *)",
      "Bash(curl * | *)",
      "Bash(wget * | *)",
      "Write(/etc/*)",
      "Write(~/.ssh/*)",
      "Write(~/.zshenv)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "./hooks/pre-tool-use.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "./hooks/post-tool-use.sh"
          }
        ]
      }
    ]
  }
}
```

**Allow list principle:** Prefer an explicit allow list over a deny list alone.
The deny list catches known-bad patterns; the allow list enforces least privilege.
Both together provide defense in depth.

**Glob patterns in deny list:** Use `*` conservatively. `Bash(sudo *)` blocks
all sudo invocations. `Write(/etc/*)` blocks all writes to system config.

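A settings file can drift over time, so a small check that the baseline deny rules are still present is cheap insurance. The sketch below is a hypothetical helper (`missing_deny_rules`) that uses plain substring search to avoid depending on a JSON parser; as a consequence it only confirms that each rule string appears somewhere in the file, not that it sits in the `deny` array.

```bash
# Print each baseline deny rule that does not appear in the settings file.
# Substring search only: it does not verify which array the rule is in.
# `missing_deny_rules` is a hypothetical helper for this sketch.
missing_deny_rules() {
  local settings="$1" rule
  local required=('Bash(rm -rf *)' 'Bash(sudo *)' 'Bash(curl * | *)' 'Bash(wget * | *)')
  for rule in "${required[@]}"; do
    grep -qF -- "\"$rule\"" "$settings" || echo "$rule"
  done
}
```

An empty result means all four rules were found; anything printed is a rule to add back before the agent runs unattended.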
---

## 4. Security Checklist for Autonomous Agents

Verify each item before declaring an agent production-ready:

- [ ] `PreToolUse` hook is present and blocks destructive commands (Patterns 1-3 above)
- [ ] `PostToolUse` audit log is enabled and written to a persistent location
- [ ] `permissions.deny` list covers: `rm -rf *`, `sudo *`, `curl * | *`, `wget * | *`
- [ ] `permissions.allow` list is as narrow as the agent's task requires
- [ ] MEMORY.md does not contain API keys, tokens, or passwords
- [ ] `.env` is in `.gitignore`; secrets are loaded from environment, not from files tracked by git
- [ ] Deployment target matches the blast radius (see `deployment-targets.md`)
- [ ] If always-on: phone approval via permission relay is configured (v2.1.81+)
- [ ] Audit log is rotated (logrotate or launchd-managed)
- [ ] Hook scripts are executable (`chmod +x hooks/*.sh`) and checked into version control

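Two of the checklist items are purely mechanical and can be verified by script. The sketch below checks that `.env` is listed in `.gitignore` and that every `hooks/*.sh` file is executable; `check_project` is a hypothetical helper, and it assumes `.env` appears on its own line in `.gitignore` rather than being covered by a broader pattern.

```bash
# Check two mechanical checklist items for a project directory:
# `.env` is listed in `.gitignore`, and every hooks/*.sh file is executable.
# `check_project` is a hypothetical helper for this sketch.
check_project() {
  local dir="$1" hook status=0
  if ! grep -qxF '.env' "$dir/.gitignore" 2>/dev/null; then
    echo "FAIL: .env is not listed in .gitignore" >&2
    status=1
  fi
  for hook in "$dir"/hooks/*.sh; do
    [ -e "$hook" ] || continue   # the glob matched nothing
    if [ ! -x "$hook" ]; then
      echo "FAIL: not executable: $hook" >&2
      status=1
    fi
  done
  return "$status"
}
```

The remaining items, such as whether the allow list is narrow enough, still need a human judgment call.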
### Phone Approval (v2.1.81+)

For unattended agents that must occasionally escalate to a human:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash(sudo *)",
        "hooks": [
          {
            "type": "prompt",
            "prompt": "This command requires elevated privileges. Approve? (yes/no)",
            "channel": "telegram"
          }
        ]
      }
    ]
  }
}
```

The agent pauses and sends the approval request to the configured channel.
The operator approves or rejects from their phone. If no response within the
timeout, the hook exits non-zero and the command is blocked.

---

## 5. OpenClaw vs Claude Code: Security Philosophy

| Dimension | OpenClaw | Claude Code |
|-----------|---------|-------------|
| Primary mechanism | Docker sandbox (containment) | Hooks + deny list (prevention) |
| Approach | Contain the damage after the fact | Prevent the action before it runs |
| Escape risk | Container escape (low, not zero) | Hook bypass if hook is misconfigured |
| Auditability | Container logs | Audit log hook (Pattern 5) |
| Operator control | Docker network/volume flags | settings.json permissions |
| Mobile escalation | Native | Permission relay via channel MCP (v2.1.81+) |

**Containment vs Prevention:** OpenClaw runs the agent inside a Docker container with
limited network and volume mounts. If the agent does something harmful, the damage is
contained to the container. Claude Code instead prevents harmful actions from running
at all, via hooks and the permissions deny list.

**Tradeoff:** Containment is harder to escape but allows the harmful action to attempt
execution. Prevention stops it earlier but requires the hook to be correctly configured
and maintained. For production agents, combine both: run Claude Code inside Docker AND
configure hooks and deny lists.

**Combined approach (recommended for production):**
1. Run `claude` inside a Docker container with minimal volume mounts
2. Configure `PreToolUse` hooks with Patterns 1-4
3. Enable `PostToolUse` audit logging (Pattern 5)
4. Use an explicit `permissions.deny` list
5. Do not use `--dangerously-skip-permissions` outside the container

This gives containment as a last resort and prevention as the primary defense.
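
As a sketch of step 1, the function below prints (rather than runs) a `docker run` command that mounts only the project directory. Everything specific here is an assumption: `sandbox_cmd` is a hypothetical helper, `agent-sandbox` is a placeholder for an image that has the `claude` CLI installed, and credential handling is left out entirely.

```bash
# Print a docker command that mounts only the project directory.
# `agent-sandbox` is a placeholder image name, not a published image;
# this sketch assumes the image provides the `claude` CLI.
sandbox_cmd() {
  local project_dir="$1" prompt="$2"
  printf '%s ' docker run --rm \
    --cap-drop ALL \
    -v "$project_dir:/workspace" \
    -w /workspace \
    agent-sandbox \
    claude --dangerously-skip-permissions -p
  printf '%q\n' "$prompt"   # shell-quote the prompt so the line can be pasted
}
```

Printing the command first makes the mount list easy to review before anything runs. Because the project's hooks and settings live inside the mounted directory, steps 2 to 4 still apply inside the container.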