From 1a776bdeb22b5ce387bfc63a21610ee150d67faf Mon Sep 17 00:00:00 2001 From: Kjell Tore Guttormsen Date: Sat, 11 Apr 2026 11:21:17 +0200 Subject: [PATCH] docs(plans): create session blueprints for Agent Factory execution 8 session blueprints covering all 27 steps across 3 waves: - Session 1: Foundation (rename + commands, Steps 1-5) - Session 2: Skills and templates (Steps 6-7) - Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12) - Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18) - Session 5: Self-learning (feedback/optimization, Steps 20-21) - Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24) - Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25) - Session 8: Finalization (build command integration + v1.0, Steps 8,26,27) Also updates plan assumptions table with verified findings. Co-Authored-By: Claude Opus 4.6 --- .../plans/blueprints/session-1-foundation.md | 548 ++++++ .../blueprints/session-2-skills-templates.md | 1122 +++++++++++ .../plans/blueprints/session-3-openclaw.md | 1254 +++++++++++++ .../plans/blueprints/session-4-paperclip.md | 757 ++++++++ .../blueprints/session-5-selflearning.md | 997 ++++++++++ .../plans/blueprints/session-6-integration.md | 1668 +++++++++++++++++ .../blueprints/session-7-skill-updates.md | 1593 ++++++++++++++++ .../blueprints/session-8-finalization.md | 937 +++++++++ .../ultraplan-2026-04-11-agent-factory.md | 17 +- 9 files changed, 8885 insertions(+), 8 deletions(-) create mode 100644 .claude/plans/blueprints/session-1-foundation.md create mode 100644 .claude/plans/blueprints/session-2-skills-templates.md create mode 100644 .claude/plans/blueprints/session-3-openclaw.md create mode 100644 .claude/plans/blueprints/session-4-paperclip.md create mode 100644 .claude/plans/blueprints/session-5-selflearning.md create mode 100644 .claude/plans/blueprints/session-6-integration.md create mode 100644 
.claude/plans/blueprints/session-7-skill-updates.md create mode 100644 .claude/plans/blueprints/session-8-finalization.md diff --git a/.claude/plans/blueprints/session-1-foundation.md b/.claude/plans/blueprints/session-1-foundation.md new file mode 100644 index 0000000..8f49ebe --- /dev/null +++ b/.claude/plans/blueprints/session-1-foundation.md @@ -0,0 +1,548 @@ +# Session 1: Foundation — Rename and Commands + +> Steps 1, 2, 3, 4, 5 | Wave 1 | Depends on: none + +## Dependencies + +Entry condition: none (first session, clean repo) + +## Scope Fence + +**Touch:** +- `.claude-plugin/plugin.json` +- `CLAUDE.md` +- `README.md` +- `commands/build.md` +- `commands/deploy.md` (new) +- `commands/evaluate.md` (new) +- `commands/status.md` (new) +- `agents/deployment-advisor.md` (new) +- `skills/agent-system-design/SKILL.md` (rename reference only) + +**Never touch:** +- `scripts/templates/*` (except reading existing for reference) +- `skills/managed-agents/` +- `agents/builder.md` (content — only read for format reference) +- Any file in `scripts/templates/memory/`, `scripts/templates/heartbeat/`, etc. + +--- + +## Step 1: Rename plugin from agent-builder to agent-factory + +### Files to modify + +**`.claude-plugin/plugin.json`** — Replace content with: + +```json +{ + "name": "agent-factory", + "description": "Build and manage autonomous agent systems with Claude Code. Guided workflow from idea to deployed, self-running agents. 
Covers all 22 agent capabilities with patterns from OpenClaw and Paperclip.", + "version": "0.2.0", + "author": { + "name": "Kjell Tore Guttormsen", + "url": "https://fromaitochitta.com" + }, + "repository": "https://git.fromaitochitta.com/open/agent-factory", + "license": "MIT", + "keywords": [ + "agent", + "autonomous", + "pipeline", + "automation", + "hooks", + "security", + "deployment", + "memory", + "heartbeat", + "governance" + ] +} +``` + +**`CLAUDE.md`** — Apply these diffs: + +- Line 1: `# Agent Builder Plugin` → `# Agent Factory Plugin` +- Line 3: `Plugin that helps users build complete autonomous agent systems` → `Plugin that guides users through building complete autonomous agent systems` +- Line 8: `The `/agent-builder:build` command` → `The `/agent-factory:build` command` +- All other occurrences of `agent-builder` → `agent-factory` + +**`README.md`** — Full rewrite for v0.2.0 (will be rewritten again in Step 27 for v1.0): + +Replace `# Agent Builder` with `# Agent Factory` and update: +- Install command: `/install agent-factory` or `--plugin-dir ./agent-factory` +- All `/agent-builder:` prefixes → `/agent-factory:` +- Repository URL: `https://git.fromaitochitta.com/open/agent-factory` +- Add recommendation: "For the best experience, install via [ktg-plugin-marketplace](https://git.fromaitochitta.com/open/ktg-plugin-marketplace) which includes Agent Factory plus the ultra-suite (ultraplan, ultraresearch, ultraexecute)." 
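The bulk rename this step describes can be sketched as a small script. This is a hedged sketch, not part of the blueprint: the extension filter and skip list mirror the Verify grep below, but the function name and structure are illustrative.

```python
from pathlib import Path

OLD, NEW = "agent-builder", "agent-factory"
# Directories excluded, mirroring the grep -v filters in the Verify step
SKIP_PARTS = {".git", "research", "blueprints"}

def rename_refs(root: Path) -> int:
    """Rewrite OLD -> NEW in every .md/.json file under root; return files changed."""
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in {".md", ".json"}:
            continue
        if SKIP_PARTS & set(path.parts):
            continue
        text = path.read_text(encoding="utf-8")
        if OLD in text:
            path.write_text(text.replace(OLD, NEW), encoding="utf-8")
            changed += 1
    return changed

# Illustrative usage: rename_refs(Path("/Users/ktg/repos/agent-builder"))
```

A scripted pass like this keeps the rename atomic per file, so the Verify grep either reports 0 or points at files the skip list excluded.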
+ +**`commands/build.md`** — Diff: + +- Line 7: `You are running `/agent-builder:build`` → `You are running `/agent-factory:build`` + +**`skills/agent-system-design/SKILL.md`** — Diff: + +- Line 106: `Run `/agent-builder:build`` → `Run `/agent-factory:build`` + +### Verify + +```bash +grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan-spec" | grep -v "ultraplan-2026" | grep -v "blueprints/" | grep -v "execution-guide" | wc -l +``` +Expected: `0` (all renamed) + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat!: rename plugin from agent-builder to agent-factory" +``` + +--- + +## Step 2: Create /agent-factory:deploy command + +### Files to create + +**`commands/deploy.md`**: + +```markdown +--- +description: Configure deployment for your agent system. Supports /schedule (cloud), Desktop scheduled tasks, cron/launchd, systemd, and Docker. +argument-hint: "Optional: deployment target (schedule, desktop, local, vps, docker)" +allowed-tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "Agent", "AskUserQuestion"] +--- + +You are running `/agent-factory:deploy` — a guided deployment configuration for autonomous agent systems. + +If $ARGUMENTS specifies a target, skip to that target's section. Otherwise, ask the user to choose. + +--- + +## Step 1: Scan existing system + +Read the user's agent system: +- Glob for `.claude/agents/*.md` — list all agents with names and models +- Glob for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md` — list all skills +- Read `.claude/settings.json` if it exists — check hook configuration +- Check for existing automation: `automation/`, `HEARTBEAT.md`, `docker-compose.yml` +- Read `CLAUDE.md` for project context + +Summarize what was found before proceeding. + +--- + +## Step 2: Choose deployment target + +Ask using AskUserQuestion: +"Where should your agents run? Choose a deployment target: + +1. 
**Cloud (/schedule)** — Runs on Anthropic's cloud. Needs GitHub repo. No local file access. 1-hour minimum interval. Best for: PR reviews, CI triage, repo maintenance. +2. **Desktop scheduled tasks** — Runs on your machine via Claude Code Desktop app. Local file access. 1-minute minimum interval. Best for: local automation, file processing. +3. **Local (cron/launchd)** — Traditional scheduler. Runs headless via `claude -p`. Full local access. Best for: personal daily pipelines, development. +4. **VPS (systemd)** — Linux server with systemd service/timer. Always-on. Best for: team pipelines, production workloads. +5. **Docker** — Containerized agent. Portable, isolated. Best for: reproducible deployments, security isolation." + +If `$ARGUMENTS` matches a target name, skip this question. + +--- + +## Step 3: Configure chosen target + +### Cloud (/schedule) + +1. Check GitHub connection: suggest `/web-setup` if not connected +2. Explain: "Cloud tasks clone your repo on each run. Local files are not accessible. MCP connectors provide external service access." +3. Generate a task prompt from the user's pipeline skills +4. Guide the user through `/schedule` to create the task +5. Note minimum 1-hour interval + +### Desktop scheduled tasks + +1. Explain: "Desktop tasks run on your machine with full local file access." +2. Guide through the Desktop app's Schedule page +3. Configure permission mode for unattended operation + +### Local (cron/launchd) + +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh` +2. Copy and customize to `automation/run-pipeline.sh` +3. Replace SKILL_NAME with the user's pipeline name +4. Ask: "What schedule? (e.g., daily at 07:00, every 2 hours)" +5. For macOS: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/launchd.plist`, customize, save to `automation/` +6. For Linux: generate cron entry +7. Provide exact activation commands + +### VPS (systemd) + +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/systemd-service.unit` +2. 
Ask: "What Linux user runs the agent?" and "Absolute path to project?" +3. Customize and save service + timer units to `automation/` +4. Copy automation.sh template to `automation/run-pipeline.sh` +5. Provide setup instructions including `systemctl enable/start` + +### Docker + +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/docker/` templates (created in Session 6) +2. If templates don't exist yet: generate basic Dockerfile + docker-compose.yml inline +3. Customize with project-specific values +4. Save to project root +5. Provide `docker compose up -d` instructions + +--- + +## Step 4: Verify deployment + +For each target, provide a verification command: +- Cloud: `/schedule list` → task visible +- Desktop: Check Schedule page in Desktop app +- Local: `crontab -l | grep claude` or `launchctl list | grep claude` +- VPS: `systemctl status claude-agent` +- Docker: `docker compose ps` + +Use the `deployment-advisor` agent when the user needs guidance choosing between targets. + +--- + +## Step 5: Next steps + +Tell the user: +- "Run `/agent-factory:status` to check your deployment" +- "Run `/agent-factory:evaluate` to assess your system's capability coverage" +``` + +### Verify + +```bash +head -3 /Users/ktg/repos/agent-builder/commands/deploy.md | grep -c "description:" +``` +Expected: `1` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(commands): add /agent-factory:deploy command" +``` + +--- + +## Step 3: Create deployment-advisor agent + +### Files to create + +**`agents/deployment-advisor.md`**: + +```markdown +--- +name: deployment-advisor +description: | + Use this agent when the user needs help choosing or configuring a deployment target for their agent system. + + <example> + Context: User has built agents and wants to deploy + user: "How should I deploy my agent system?" + assistant: "I'll use the deployment-advisor to analyze your setup and recommend a target." + <commentary> + Deployment guidance request triggers the advisor.
+ </commentary> + </example> + + <example> + Context: User wants to switch deployment targets + user: "Can I move my agents from cron to Docker?" + assistant: "I'll use the deployment-advisor to plan the migration." + <commentary> + Deployment migration request triggers the advisor. + </commentary> + </example> + + <example> + Context: User asks about cloud vs local deployment + user: "Should I use /schedule or cron for my pipeline?" + assistant: "I'll use the deployment-advisor to compare the options for your use case." + <commentary> + Deployment comparison request triggers the advisor. + </commentary> + </example> +model: sonnet +color: blue +tools: ["Read", "Glob", "Grep", "Bash", "AskUserQuestion"] +--- + +## How you work + +You analyze the user's agent system and recommend a deployment target based on their requirements. + +1. Scan the project: `.claude/agents/*.md`, `.claude/skills/`, `.claude/settings.json`, `CLAUDE.md`, `automation/`, `HEARTBEAT.md` +2. Assess requirements by asking targeted questions: + - Does this need to run when your computer is off? + - Do agents need local filesystem access? + - Is this for personal use or a team? + - Any budget constraints for hosting? + - Do agents need Computer Use (browser interaction)? +3. Read the deployment reference at `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-targets.md` +4. Apply the decision guide from that reference +5. Recommend ONE primary target with clear reasoning +6. 
Generate the deployment configuration files for the chosen target + +## Rules + +- Never overwrite existing deployment config without asking the user first +- Always verify generated shell scripts with `bash -n` before saving +- Always include rollback instructions (how to undo the deployment) +- If the user's needs span multiple targets, recommend the simplest one that covers all requirements +- For Docker: always include `security_opt: [no-new-privileges:true]` in docker-compose.yml +- For /schedule (cloud): warn that local files are not accessible — only GitHub repo content +- Never recommend `--dangerously-skip-permissions` outside a Docker container or sandboxed environment + +## Output format + +``` +DEPLOYMENT RECOMMENDATION +======================== +Target: [chosen target] +Reason: [why this fits] + +Files to create: +- [file 1]: [description] +- [file 2]: [description] + +Activation: +[exact commands to activate the deployment] + +Verification: +[exact commands to verify it's running] + +Rollback: +[exact commands to undo if needed] +``` +``` + +### Verify + +```bash +python3 -c "import yaml; yaml.safe_load(open('/Users/ktg/repos/agent-builder/agents/deployment-advisor.md').read().split('---')[1])" 2>&1 && echo "VALID" +``` +Expected: `VALID` + +### On failure +retry — fix YAML syntax, then revert if still failing + +### Checkpoint +```bash +git commit -m "feat(agents): add deployment-advisor agent" +``` + +--- + +## Step 4: Create /agent-factory:evaluate command + +### Files to create + +**`commands/evaluate.md`**: + +```markdown +--- +description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations. +argument-hint: "Optional: focus area (security, deployment, memory, autonomy)" +allowed-tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You are running `/agent-factory:evaluate` — a capability assessment for your agent system. 
+ +## Step 1: Scan project components + +Scan for all agent system components: +- Agents: Glob for `.claude/agents/*.md` +- Pipeline skills: Glob for `.claude/skills/*/SKILL.md` +- Knowledge skills: Glob for `.claude/skills/*.md` +- Hooks: Glob for `.claude/hooks/*.sh` and `hooks/*.sh` +- Settings: Read `.claude/settings.json` if it exists +- Context: Read `CLAUDE.md` if it exists +- Automation: Glob for `automation/*`, `scripts/*.sh` +- Memory: look for `memory/MEMORY.md`, `memory/SESSION-STATE.md`, `data/run-state.json` +- Heartbeat: look for `HEARTBEAT.md` +- Goals: look for `GOALS.md` +- Governance: look for `GOVERNANCE.md` +- Org chart: look for `ORG-CHART.md` +- Budget: look for `budget/BUDGET.md`, `budget/cost-events.jsonl` +- Docker: look for `Dockerfile`, `docker-compose.yml` + +## Step 2: Score against 22 capabilities + +Read the feature map at `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`. + +For each of the 22 capabilities, check whether the user's project has the corresponding component. Score as: +- **OK** — component exists and is properly configured +- **Partial** — component exists but is incomplete or misconfigured +- **Missing** — component does not exist + +## Step 3: Output capability matrix + +``` +AGENT SYSTEM EVALUATION +======================= + +| # | Capability | Status | What exists | What's needed | +|---|-----------|--------|-------------|---------------| +| 1 | Agent Runtime | OK | CLAUDE.md + settings.json | — | +| 2 | Shell Execution | Missing | — | hooks/pre-tool-use.sh + deny list | +... + +Score: X/22 OK | Y/22 Partial | Z/22 Missing +``` + +## Step 4: Recommendations + +Provide specific recommendations for filling gaps, ordered by impact: +1. Security gaps first (hooks, permissions) +2. Core functionality gaps (missing agents, skills) +3. Operational gaps (memory, automation, deployment) +4. 
Advanced gaps (governance, budget, self-learning) + +If $ARGUMENTS specifies a focus area, expand that section with detailed guidance and link to relevant templates from `${CLAUDE_PLUGIN_ROOT}/scripts/templates/`. + +## Step 5: Next steps + +Suggest: "Run `/agent-factory:build` to fill gaps interactively, or `/agent-factory:deploy` to configure deployment." +``` + +### Verify + +```bash +head -3 /Users/ktg/repos/agent-builder/commands/evaluate.md | grep -c "description:" +``` +Expected: `1` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(commands): add /agent-factory:evaluate command" +``` + +--- + +## Step 5: Create /agent-factory:status command + +### Files to create + +**`commands/status.md`**: + +```markdown +--- +description: Quick status check of your agent infrastructure. Shows agents, skills, hooks, deployment, and recent activity. +argument-hint: "" +allowed-tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You are running `/agent-factory:status` — a quick health check of your agent infrastructure. + +## Scan and report + +Run all scans, then output a single compact status report. + +### Agents +Glob for `.claude/agents/*.md`. For each agent file: +- Read the frontmatter to extract: name, model, tools count +- List as: `[name] (model: [model], tools: [N])` +- Flag: agents without `<example>` blocks in description (unreliable triggering) + +### Skills +Glob for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md`. For each: +- Read frontmatter to extract: name, version (if present) +- List as: `[name] v[version]` +- Flag: skills without version field + +### Hooks +Glob for `.claude/hooks/*.sh` and `hooks/*.sh`. 
For each: +- Check if executable: `test -x [path]` +- List as: `[filename] ([executable/not executable])` +- Flag: hook scripts that are not executable + +### Settings +Read `.claude/settings.json` if it exists: +- Count permission allow rules and deny rules +- Count hook configurations +- Summarize: `[N] allow rules, [M] deny rules, [K] hooks configured` + +### Deployment +Check for: +- `automation/*.sh` → "Local automation scripts found" +- `automation/*.plist` → "launchd config found" +- `automation/*.service` → "systemd config found" +- `docker-compose.yml` or `Dockerfile` → "Docker config found" +- `HEARTBEAT.md` → "Heartbeat file found" + +### Memory +Check for: +- `memory/MEMORY.md` → "Long-term memory: found" +- `memory/SESSION-STATE.md` → "Session state: found" +- `memory/*.md` (daily logs) → "Daily logs: [N] files" +- `data/run-state.json` → "Run state: found" + +### Recent activity +If `.claude/hooks/audit.log` or `hooks/audit.log` exists: +- Read last 5 lines +- Display as recent tool call log + +### Issues +Compile all flags from above into an issues list: +- "WARNING: [agent] has no example blocks — may not trigger reliably" +- "WARNING: [hook] is not executable — run chmod +x" +- "WARNING: No hooks configured — unattended runs have no guardrails" + +## Output format + +``` +AGENT FACTORY STATUS +==================== + +Agents: [N] configured +Skills: [N] loaded +Hooks: [N] configured ([M] executable) +Settings: [allow/deny/hooks counts] +Deployment: [target or "not configured"] +Memory: [tiers found] + +Issues: [N] +[list if any] +``` +``` + +### Verify + +```bash +head -3 /Users/ktg/repos/agent-builder/commands/status.md | grep -c "description:" +``` +Expected: `1` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(commands): add /agent-factory:status command" +``` + +--- + +## Exit Condition + +- [ ] `grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v 
"research/" | grep -v "ultraplan" | grep -v "blueprints/" | grep -v "execution-guide" | wc -l` → 0 +- [ ] `ls /Users/ktg/repos/agent-builder/commands/ | sort` → `build.md deploy.md evaluate.md status.md` +- [ ] `ls /Users/ktg/repos/agent-builder/agents/ | sort` → `builder.md deployment-advisor.md` +- [ ] `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['name']=='agent-factory'; assert d['version']=='0.2.0'"` → no error +- [ ] All agent files have valid YAML frontmatter: `find /Users/ktg/repos/agent-builder/agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \;` → no errors + +## Quality Criteria + +- All 4 command files have valid YAML frontmatter with `description:` field +- deployment-advisor.md has 3 `<example>` blocks in description +- deploy.md covers all 5 deployment targets with specific instructions +- evaluate.md references the 22-capability feature map +- status.md scans all 7 component types (agents, skills, hooks, settings, deployment, memory, activity) +- No `agent-builder` references remain in any touched file +- plugin.json version is `0.2.0` and name is `agent-factory` diff --git a/.claude/plans/blueprints/session-2-skills-templates.md b/.claude/plans/blueprints/session-2-skills-templates.md new file mode 100644 index 0000000..1d75b5b --- /dev/null +++ b/.claude/plans/blueprints/session-2-skills-templates.md @@ -0,0 +1,1122 @@ +# Session 2: Skills and Initial Domain Templates + +> Steps 6, 7 | Wave 1 | Depends on: none + +## Dependencies + +Entry condition: none (independent of Session 1) + +## Scope Fence + +**Touch:** +- `skills/managed-agents/SKILL.md` (new) +- `skills/managed-agents/references/api-patterns.md` (new) +- `scripts/templates/domains/content-pipeline.md` (new) +- `scripts/templates/domains/code-review.md` (new) +- `scripts/templates/domains/monitoring.md` (new) +- `scripts/templates/domains/research-synthesis.md` 
(new) +- `scripts/templates/domains/data-processing.md` (new) +- `scripts/templates/domains/README.md` (new) + +**Never touch:** +- `commands/` +- `agents/` +- `.claude-plugin/` +- `CLAUDE.md` +- `README.md` +- `scripts/templates/memory/` +- `scripts/templates/heartbeat/` +- Any existing file in `skills/agent-system-design/` + +--- + +## Step 6: Create managed-agents skill + +### Files to create + +**`skills/managed-agents/SKILL.md`**: + +```markdown +--- +name: managed-agents +description: | + This skill should be used when the user asks about "managed agents", + "Anthropic API agents", "cloud-hosted agents", "agent SDK", + "deploying agents to the cloud", "serverless agents", + "API-based agent deployment", "/v1/agents endpoint", + "remote agent hosting", "agent as a service" +version: 0.1.0 +--- + +## What are managed agents + +Managed agents are Anthropic-hosted agent runtimes accessed via the Agent SDK +(`@anthropic-ai/sdk` for TypeScript, `anthropic` for Python). Instead of running +Claude Code locally, the agent runs on Anthropic's infrastructure with persistent +sessions, tool access, and automatic scaling. + +Key difference from local agents: managed agents don't have local filesystem +access by default. They work through tools you define in code, not through +Claude Code's built-in Read/Write/Bash tools. 
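The "tools you define in code" point can be made concrete with a minimal sketch. The schema shape follows the Messages API `tools` parameter used elsewhere in this skill; the `lookup_note` tool, its `NOTES` store, and the dispatcher are hypothetical stand-ins for whatever replaces local file reads in your deployment.

```python
# Sketch: a code-defined tool that stands in for local filesystem access.
# The schema matches the Messages API `tools` format; lookup_note and
# NOTES are illustrative assumptions, not a real API.
NOTES = {"q1": "Revenue grew 12% quarter over quarter."}

LOOKUP_NOTE_TOOL = {
    "name": "lookup_note",
    "description": "Fetch a stored note by key (replaces local file reads)",
    "input_schema": {
        "type": "object",
        "properties": {"key": {"type": "string", "description": "Note key"}},
        "required": ["key"],
    },
}

def execute_tool_call(name: str, tool_input: dict) -> str:
    """Dispatch a tool_use block from the model to its handler."""
    if name == "lookup_note":
        return NOTES.get(tool_input["key"], "note not found")
    raise ValueError(f"unknown tool: {name}")
```

The agent never touches a disk: the runtime sends back a `tool_use` block, your code runs the handler, and the result goes in as a `tool_result`.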
+ +## When to use managed agents vs local deployment + +| Dimension | Managed Agents (API) | Local (Claude Code CLI) | +|-----------|---------------------|------------------------| +| Infrastructure | Anthropic-hosted | Your machine/server | +| Filesystem | Via tools you define | Full local access | +| MCP servers | Not available | Full MCP support | +| Scaling | Automatic | Manual | +| Cost model | Per-token API billing | Subscription or API key | +| Best for | SaaS products, API integrations | Personal pipelines, file-heavy work | +| Session persistence | Via API sessions | Via `--resume` / `--name` | + +**Decision rule:** If your agents need local filesystem access, MCP servers, or +run as part of a personal workflow → use local deployment (cron/launchd/systemd/Docker). +If your agents are part of a product, need to scale, or don't need local files → +use managed agents. + +**Important limitation:** Managed agents cannot use MCP servers. If your agent +system relies on MCP servers for Slack, GitHub, databases, or other integrations, +use local deployment with Docker for isolation instead. + +## SDK patterns + +For concrete code patterns, see: +`${CLAUDE_PLUGIN_ROOT}/skills/managed-agents/references/api-patterns.md` + +## Session management + +Managed agents support persistent sessions via the API: + +```typescript +// Create a new session +const session = await client.agents.sessions.create({ + agent_id: "ag_...", + system_prompt: "You are a research agent..." +}); + +// Resume an existing session +const response = await client.agents.sessions.messages.create({ + agent_id: "ag_...", + session_id: session.id, + messages: [{ role: "user", content: "Continue the analysis" }] +}); +``` + +Sessions maintain conversation history and tool state across multiple +interactions, similar to `claude --resume` for local agents. + +## Budget and cost considerations + +Managed agents bill per token at standard API rates. For cost control: + +1. 
**Set max_tokens** on each request to cap output length +2. **Use prompt caching** — cached input tokens cost 90% less +3. **Batch non-urgent work** — batch API gives 50% discount +4. **Monitor with Admin API** — if you have org access, use + `/v1/organizations/usage_report/messages` with an Admin API key + (`sk-ant-admin...`) for detailed cost breakdowns +5. **Use `--max-budget-usd`** flag for local headless runs as a budget cap + +Note: The Usage & Cost API requires an Admin API key and organization account. +Individual accounts should estimate costs from token counts. + +## Migration path: local → managed + +1. Extract your agent's system prompt from `.claude/agents/[name].md` +2. Convert tool access to SDK tool definitions +3. Replace file-based memory with session persistence or external storage +4. Replace MCP server integrations with direct API calls in tool handlers +5. Test with the SDK before removing local deployment + +This is a significant architectural change. Only migrate if you need API-based +access or auto-scaling. Local deployment is simpler and cheaper for personal use. + +## Getting started + +For a guided setup: run `/agent-factory:build` and choose "Managed Agents" +as the deployment target in Phase 6. + +For manual setup: see the API patterns reference at +`${CLAUDE_PLUGIN_ROOT}/skills/managed-agents/references/api-patterns.md` +``` + +**`skills/managed-agents/references/api-patterns.md`**: + +```markdown +# Managed Agents API Patterns + +Code patterns for creating and managing agents via the Anthropic SDK. +All examples use `@anthropic-ai/sdk` (TypeScript) with Python equivalents noted. + +--- + +## Basic agent creation + +```typescript +import Anthropic from "@anthropic-ai/sdk"; + +const client = new Anthropic(); + +// Create an agent with tools +const response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 4096, + system: "You are a research agent. 
Produce structured briefs.", + tools: [ + { + name: "web_search", + description: "Search the web for information", + input_schema: { + type: "object", + properties: { + query: { type: "string", description: "Search query" } + }, + required: ["query"] + } + } + ], + messages: [ + { role: "user", content: "Research the latest Claude Code features" } + ] +}); +``` + +## Agent with persistent sessions + +```typescript +// Create a session-based agent +const session = await client.agents.sessions.create({ + agent_id: "ag_your_agent_id", + system_prompt: `You are a data analyst. You have access to the + company database via the query tool. Always verify your findings + before reporting.`, + tools: [/* your tool definitions */] +}); + +// First interaction +const result1 = await client.agents.sessions.messages.create({ + agent_id: "ag_your_agent_id", + session_id: session.id, + messages: [{ role: "user", content: "Analyze Q1 revenue trends" }] +}); + +// Continue the conversation (session remembers context) +const result2 = await client.agents.sessions.messages.create({ + agent_id: "ag_your_agent_id", + session_id: session.id, + messages: [{ role: "user", content: "Now compare with Q4 of last year" }] +}); +``` + +## Tool handling pattern + +```typescript +async function runAgentLoop( + client: Anthropic, + messages: Anthropic.MessageParam[], + tools: Anthropic.Tool[] +) { + let response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 4096, + tools, + messages + }); + + while (response.stop_reason === "tool_use") { + const toolUseBlocks = response.content.filter( + (b): b is Anthropic.ToolUseBlock => b.type === "tool_use" + ); + + const toolResults: Anthropic.ToolResultBlockParam[] = []; + for (const toolUse of toolUseBlocks) { + const result = await executeToolCall(toolUse.name, toolUse.input); + toolResults.push({ + type: "tool_result", + tool_use_id: toolUse.id, + content: JSON.stringify(result) + }); + } + + messages.push({ role: 
"assistant", content: response.content }); + messages.push({ role: "user", content: toolResults }); + + response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 4096, + tools, + messages + }); + } + + return response; +} +``` + +## Cost optimization with prompt caching + +```typescript +const response = await client.messages.create({ + model: "claude-sonnet-4-6", + max_tokens: 1024, + system: [ + { + type: "text", + text: longSystemPrompt, // cached — 90% cheaper on reuse + cache_control: { type: "ephemeral" } + } + ], + messages: [{ role: "user", content: userQuery }] +}); +``` + +## Error handling + +```typescript +try { + const response = await client.messages.create(/* ... */); +} catch (error) { + if (error instanceof Anthropic.RateLimitError) { + // Retry with exponential backoff + await sleep(retryDelay); + retryDelay *= 2; + } else if (error instanceof Anthropic.APIError) { + console.error(`API error ${error.status}: ${error.message}`); + // Log for debugging, don't retry on 4xx + } +} +``` + +## Python equivalent + +```python +import anthropic + +client = anthropic.Anthropic() + +response = client.messages.create( + model="claude-sonnet-4-6", + max_tokens=4096, + system="You are a research agent.", + messages=[{"role": "user", "content": "Research topic X"}] +) +``` + +## Deployment pattern: scheduled managed agent + +```typescript +// Run as a scheduled job (e.g., via cron or cloud scheduler) +async function dailyReport() { + const client = new Anthropic(); + + const response = await runAgentLoop( + client, + [{ role: "user", content: "Generate the daily status report" }], + reportTools + ); + + // Extract and save the report + const text = response.content + .filter((b): b is Anthropic.TextBlock => b.type === "text") + .map((b) => b.text) + .join("\n"); + + await saveReport(text); +} + +dailyReport().catch(console.error); +``` +``` + +### Verify + +```bash +head -5 /Users/ktg/repos/agent-builder/skills/managed-agents/SKILL.md 
| grep -c "name: managed-agents" +``` +Expected: `1` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(skills): add managed-agents knowledge skill" +``` + +--- + +## Step 7: Create 5 domain templates + +### Files to create + +**`scripts/templates/domains/README.md`**: + +```markdown +# Domain Templates + +Pre-built pipeline templates for common use cases. The builder agent reads these +during `/agent-factory:build` Phase 0 to pre-populate the design sketch. + +## Available Templates + +| Template | Domain | Agents | Pipeline | +|----------|--------|--------|----------| +| content-pipeline | Content production | content-researcher, content-writer, content-reviewer | Research → Draft → Review → Publish | +| code-review | Code review | code-analyzer, review-writer, standards-checker | Analyze → Write review → Check standards → Post | +| monitoring | System monitoring | monitor-checker, incident-reporter, remediation-advisor | Check → Detect → Report → Advise | +| research-synthesis | Research & analysis | source-gatherer, synthesizer, fact-checker | Gather → Synthesize → Verify → Produce brief | +| data-processing | Data transformation | data-validator, transformer, quality-checker | Validate → Transform → Check quality → Save | + +## Usage + +During `/agent-factory:build`, choose a template when prompted: +"Would you like to start from a domain template?" + +The builder reads the chosen template and pre-populates: +- Agent roles and descriptions +- Pipeline steps and handoff points +- Recommended hooks for the domain +- Example CLAUDE.md sections + +## Template format + +Each template is a plain markdown file with `{{PLACEHOLDER}}` variables. +The builder agent replaces placeholders with project-specific values during +scaffolding. All templates follow the same structure: + +1. Header comment (domain description) +2. Agent definitions (frontmatter + system prompt per agent) +3. Pipeline skill template +4. Recommended hooks +5. 
Example CLAUDE.md sections + +## Placeholders + +All templates use these standard placeholders: + +| Placeholder | Description | +|------------|-------------| +| `{{PROJECT_DIR}}` | Absolute path to the user's project | +| `{{AGENT_NAME}}` | Name of the agent being generated | +| `{{PIPELINE_NAME}}` | Name of the pipeline skill | +| `{{SCHEDULE}}` | Cron expression or schedule description | +| `{{DOMAIN}}` | Domain name (e.g., "content", "code-review") | + +## Creating custom templates + +Copy any existing template and modify it. The builder agent can also generate +custom templates during the build workflow. +``` + +**`scripts/templates/domains/content-pipeline.md`**: + +```markdown +# Domain Template: Content Pipeline + + + + + +## Agent Definitions + +### content-researcher + +--- +name: content-researcher +description: | + Use this agent to gather and structure information for content production. + + + Context: Content pipeline needs sourced input + user: "Research {{PIPELINE_NAME}} topic for this week" + assistant: "I'll use the content-researcher to gather sources and produce a brief." + Research stage of content pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "WebSearch", "WebFetch", "Bash"] +--- + +You are the content researcher for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read CLAUDE.md for project context, voice guidelines, and audience definition +2. Read memory/MEMORY.md for prior research and recurring themes +3. Search for sources using WebSearch and WebFetch +4. Extract 5-7 key points with source attribution +5. Identify gaps in coverage +6. 
Write SESSION-STATE.md before producing output (WAL protocol) + +## Rules + +- Never fabricate sources or quotes +- Mark unverified claims with [UNVERIFIED] +- Keep briefs under 800 words +- List every source URL used +- Write to SESSION-STATE.md before responding + +## Output format + +Save to `pipeline-output/research-$(date +%Y-%m-%d).md`: + +``` +## Research Brief: [Topic] +Date: [date] + +### Background +[2-3 sentences] + +### Key Points +- [point] (source: [url]) +... + +### Sources +[list] + +### Gaps +[what couldn't be verified] +``` + +### content-writer + +--- +name: content-writer +description: | + Use this agent to produce written content from a research brief. + + + Context: Research brief is ready + user: "Write the article from this brief" + assistant: "I'll use the content-writer to draft from the research." + Drafting stage of content pipeline triggers this agent. + +model: opus +tools: ["Read", "Write", "Glob"] +--- + +You are the content writer for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the research brief +2. Read CLAUDE.md for voice and format guidelines +3. Read examples of approved past output (if available in pipeline-output/) +4. Draft the content following format specifications +5. Do not add claims not in the brief + +## Rules + +- Follow voice guidelines exactly +- Never add unsupported claims +- Stay within word count ±10% +- End with a concrete takeaway + +## Output format + +Save to `pipeline-output/draft-$(date +%Y-%m-%d).md` + +### content-reviewer + +--- +name: content-reviewer +description: | + Use this agent to evaluate content quality and approve or request revisions. + + + Context: Draft is ready for review + user: "Review this draft" + assistant: "I'll use the content-reviewer to score and evaluate." + Quality review stage of content pipeline triggers this agent. + +model: opus +tools: ["Read"] +--- + +You are the content reviewer for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. 
Read the draft and original research brief
+2. Score against: Accuracy (0-25), Clarity (0-25), Completeness (0-25), Voice (0-25)
+3. Note specific issues with line references
+4. Decide: PASS (70+), REVISE (50-69), REJECT (<50)
+
+## Rules
+
+- Score honestly — do not inflate
+- Be specific: "paragraph 3 needs a source" not "needs work"
+- Pass threshold: 70/100 overall, no dimension below 13/25
+
+## Output format
+
+Save to `pipeline-output/review-$(date +%Y-%m-%d).md`
+
+## Pipeline Skill Template
+
+```markdown
+---
+name: {{PIPELINE_NAME}}
+description: |
+  Run the {{DOMAIN}} content pipeline. Produces researched, reviewed content.
+  Triggers on: "run {{PIPELINE_NAME}}", "produce content", "write article"
+version: 0.1.0
+---
+
+Run this pipeline end-to-end. $ARGUMENTS is the topic or input.
+
+**Step 1 — Load context**
+Read CLAUDE.md. Read memory/MEMORY.md if it exists.
+
+**Step 2 — Research**
+Use the content-researcher agent. Pass $ARGUMENTS and context.
+
+**Step 3 — Draft**
+Use the content-writer agent. Pass the research brief.
+
+**Step 4 — Review**
+Use the content-reviewer agent. Pass the draft.
+
+**Step 5 — Revision loop**
+If reviewer score < 70 and revisions < 2: send draft + feedback to writer, re-review.
+If still < 70 after 2 revisions: save with NEEDS_REVIEW flag.
+
+**Step 6 — Save output**
+Write final to pipeline-output/final-$(date +%Y-%m-%d).md
+
+**Step 7 — Update memory**
+Append to memory/MEMORY.md: date, topic, score, issues.
+
+**Step 8 — Report**
+Tell the user: file path, score, time, issues. 
+``` + +## Recommended Hooks + +Pre-tool-use: Block writes outside {{PROJECT_DIR}} and pipeline-output/ +Post-tool-use: Audit log all tool calls + +## Example CLAUDE.md Sections + +```markdown +## Content Guidelines + +- Voice: [describe your brand voice] +- Audience: [who reads this] +- Format: [article/newsletter/report specifics] +- Word count: [target range] +- Sources: [what counts as a valid source] +``` +``` + +**`scripts/templates/domains/code-review.md`**: + +```markdown +# Domain Template: Automated Code Review + + + + + +## Agent Definitions + +### code-analyzer + +--- +name: code-analyzer +description: | + Use this agent to analyze code changes for quality issues. + + + Context: PR or diff needs analysis + user: "Analyze the changes in this PR" + assistant: "I'll use the code-analyzer to examine the diff." + Code analysis request triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You are a code analyzer for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the diff or PR description +2. Identify: new files, modified files, deleted files +3. For each changed file: check for bugs, security issues, performance problems +4. Categorize findings: critical, warning, info +5. Check test coverage: are there tests for the changes? + +## Rules + +- Focus on real issues, not style preferences +- Always check for security vulnerabilities (OWASP Top 10) +- Note missing tests for new functionality +- Don't flag auto-generated or dependency files + +### review-writer + +--- +name: review-writer +description: | + Use this agent to write a structured code review from analysis findings. + + + Context: Code analysis is complete + user: "Write the review" + assistant: "I'll use the review-writer to produce a structured review." + Review writing stage triggers this agent. + +model: sonnet +tools: ["Read", "Write"] +--- + +You are a code review writer for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. 
Read the analysis findings +2. Group by severity: critical first, then warnings, then info +3. Write actionable comments with file:line references +4. Suggest specific fixes where possible +5. Note positive aspects (good patterns, thorough tests) + +## Output format + +Save to `pipeline-output/review-$(date +%Y-%m-%d).md` + +### standards-checker + +--- +name: standards-checker +description: | + Use this agent to verify code against project standards. + + + Context: Code review needs standards verification + user: "Check this against our coding standards" + assistant: "I'll use the standards-checker to verify compliance." + Standards check triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You are a standards checker for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read CLAUDE.md for project conventions +2. Read existing code for patterns (naming, structure, imports) +3. Check changed files against conventions +4. Run linters/formatters if available: `npm run lint`, `ruff check`, etc. +5. Report deviations from established patterns + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run automated code review pipeline on recent changes. 
+ Triggers on: "review code", "check PR", "run code review" +version: 0.1.0 +--- + +**Step 1 — Get changes:** Run `git diff HEAD~1` or read PR description from $ARGUMENTS +**Step 2 — Analyze:** Use code-analyzer agent on the diff +**Step 3 — Write review:** Use review-writer agent with analysis findings +**Step 4 — Check standards:** Use standards-checker agent on changed files +**Step 5 — Combine:** Merge review + standards findings into final review +**Step 6 — Save:** Write to pipeline-output/review-$(date +%Y-%m-%d).md +**Step 7 — Update memory:** Log review date, files checked, findings count +``` + +## Recommended Hooks + +Pre-tool-use: Block `git push --force`, `git reset --hard` +Post-tool-use: Log all Bash commands for audit trail +``` + +**`scripts/templates/domains/monitoring.md`**: + +```markdown +# Domain Template: System Monitoring + + + + + +## Agent Definitions + +### monitor-checker + +--- +name: monitor-checker +description: | + Use this agent to check system health and detect anomalies. + + + Context: Scheduled health check + user: "Run the system health check" + assistant: "I'll use the monitor-checker to scan endpoints and logs." + Health check request triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Glob", "Grep", "WebFetch"] +--- + +You check system health for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read monitoring config from CLAUDE.md or `monitoring/config.md` +2. For each endpoint: check HTTP status, response time, expected content +3. For log files: grep for ERROR/WARN patterns, count occurrences +4. Compare against baselines from memory/MEMORY.md +5. Flag anomalies: new errors, response time spikes, missing services + +### incident-reporter + +--- +name: incident-reporter +description: | + Use this agent to create structured incident reports from monitoring findings. 
+ + + Context: Monitoring detected issues + user: "Report the incidents found" + assistant: "I'll use the incident-reporter to create structured reports." + Incident reporting triggers this agent. + +model: sonnet +tools: ["Read", "Write"] +--- + +You create incident reports for {{DOMAIN}}. + +## Output format + +Save to `pipeline-output/incident-$(date +%Y-%m-%d).md`: +- Severity (critical/warning/info) +- Affected service +- Detection time +- Symptom description +- Recent changes (if known) + +### remediation-advisor + +--- +name: remediation-advisor +description: | + Use this agent to suggest fixes for detected incidents. + + + Context: Incidents have been reported + user: "What should we do about these issues?" + assistant: "I'll use the remediation-advisor to suggest fixes." + Remediation advice request triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep"] +--- + +You advise on incident remediation for {{DOMAIN}}. + +## How you work + +1. Read the incident report +2. For each incident: identify likely root cause +3. Suggest specific remediation steps +4. Categorize: automated fix possible, needs manual intervention, needs investigation +5. Reference runbooks if available in the project + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run system monitoring pipeline. Checks health, detects issues, advises fixes. 
+ Triggers on: "check systems", "run monitoring", "health check" +version: 0.1.0 +--- + +**Step 1 — Load config:** Read monitoring endpoints and thresholds from CLAUDE.md +**Step 2 — Check health:** Use monitor-checker agent +**Step 3 — Report incidents:** If issues found, use incident-reporter agent +**Step 4 — Advise remediation:** Use remediation-advisor agent +**Step 5 — Save:** Write report to pipeline-output/monitoring-$(date +%Y-%m-%d).md +**Step 6 — Alert:** If critical issues, print prominent warning +**Step 7 — Update memory:** Log check time, findings count, actions taken +``` + +## Recommended Hooks + +Pre-tool-use: Block any write operations outside pipeline-output/ and monitoring/ +Post-tool-use: Log all checks with timestamps +``` + +**`scripts/templates/domains/research-synthesis.md`**: + +```markdown +# Domain Template: Research Synthesis + + + + + +## Agent Definitions + +### source-gatherer + +--- +name: source-gatherer +description: | + Use this agent to gather sources from multiple channels for research. + + + Context: Research topic needs sources + user: "Gather sources on this topic" + assistant: "I'll use the source-gatherer to find relevant sources." + Source gathering request triggers this agent. + +model: sonnet +tools: ["Read", "WebSearch", "WebFetch", "Glob", "Grep", "Bash"] +--- + +You gather and organize research sources for {{DOMAIN}}. + +## How you work + +1. Parse the research question from input +2. Search multiple source types: web, local files, databases (via MCP if available) +3. For each source: extract key claims, note author credibility, capture URL +4. De-duplicate findings across sources +5. Organize by theme or subtopic +6. Rate source quality: official docs > peer-reviewed > community > opinion + +### synthesizer + +--- +name: synthesizer +description: | + Use this agent to synthesize research findings into a coherent brief. 
+ + + Context: Sources have been gathered + user: "Synthesize these findings" + assistant: "I'll use the synthesizer to produce a coherent brief." + Synthesis request triggers this agent. + +model: opus +tools: ["Read", "Write"] +--- + +You synthesize research into actionable briefs for {{DOMAIN}}. + +## How you work + +1. Read all gathered sources +2. Identify consensus points (multiple sources agree) +3. Identify conflicts (sources disagree — note both sides) +4. Draw conclusions supported by evidence +5. Structure as: Executive Summary → Findings → Conflicts → Recommendation + +### fact-checker + +--- +name: fact-checker +description: | + Use this agent to verify claims in a research synthesis. + + + Context: Synthesis needs fact-checking + user: "Verify the claims in this brief" + assistant: "I'll use the fact-checker to verify each claim." + Fact-checking request triggers this agent. + +model: sonnet +tools: ["Read", "WebSearch", "WebFetch"] +--- + +You verify claims for {{DOMAIN}}. + +## How you work + +1. Extract every factual claim from the synthesis +2. For each claim: search for independent verification +3. Mark as: VERIFIED (independent source confirms), UNVERIFIED (no confirmation found), DISPUTED (contradicting source found) +4. For DISPUTED claims: note both sides with sources + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run research synthesis pipeline. Gathers, synthesizes, and verifies. 
+ Triggers on: "research topic", "investigate", "produce research brief" +version: 0.1.0 +--- + +**Step 1 — Load context:** Read CLAUDE.md and memory/MEMORY.md for prior research +**Step 2 — Gather:** Use source-gatherer agent with $ARGUMENTS +**Step 3 — Synthesize:** Use synthesizer agent with gathered sources +**Step 4 — Verify:** Use fact-checker agent on synthesis +**Step 5 — Revise:** If unverified claims found, return to source-gatherer for those specific claims +**Step 6 — Save:** Write to pipeline-output/research-$(date +%Y-%m-%d).md +**Step 7 — Update memory:** Log research topic, source count, verification rate +``` +``` + +**`scripts/templates/domains/data-processing.md`**: + +```markdown +# Domain Template: Data Processing + + + + + +## Agent Definitions + +### data-validator + +--- +name: data-validator +description: | + Use this agent to validate input data before processing. + + + Context: Data needs validation before transformation + user: "Validate this data file" + assistant: "I'll use the data-validator to check the input." + Data validation request triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Glob"] +--- + +You validate input data for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the input file or data source +2. Check format: expected file type, encoding, structure +3. Check schema: required fields present, correct types +4. Check values: within expected ranges, no obvious anomalies +5. Report: valid records count, invalid records with reasons + +### transformer + +--- +name: transformer +description: | + Use this agent to transform data between formats or structures. + + + Context: Validated data needs transformation + user: "Transform this data to the target format" + assistant: "I'll use the transformer to process the data." + Data transformation request triggers this agent. + +model: sonnet +tools: ["Read", "Write", "Bash"] +--- + +You transform data for {{DOMAIN}} in {{PROJECT_DIR}}. 
+ +## How you work + +1. Read the validated input and transformation spec +2. Apply transformations: field mapping, type conversion, aggregation +3. Handle edge cases: nulls, missing fields, encoding issues +4. Write output to specified format +5. Log transformation stats: records processed, skipped, errored + +### quality-checker + +--- +name: quality-checker +description: | + Use this agent to verify output data quality after transformation. + + + Context: Transformed data needs quality check + user: "Check the output quality" + assistant: "I'll use the quality-checker to verify the transformation." + Quality check request triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Grep"] +--- + +You check data quality for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the transformed output +2. Compare record counts: input vs output (accounting for expected changes) +3. Spot-check values: sample records for correctness +4. Check referential integrity if applicable +5. Generate quality report: completeness, accuracy, consistency scores + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run data processing pipeline. Validates, transforms, and checks quality. + Triggers on: "process data", "transform data", "run data pipeline" +version: 0.1.0 +--- + +**Step 1 — Load config:** Read CLAUDE.md for data sources and formats +**Step 2 — Validate:** Use data-validator agent on input +**Step 3 — Transform:** If validation passes, use transformer agent +**Step 4 — Quality check:** Use quality-checker on output +**Step 5 — Save or reject:** If quality passes, save to pipeline-output/. If not, save with NEEDS_REVIEW flag. 
+**Step 6 — Update memory:** Log: date, records processed, quality score +``` + +## Recommended Hooks + +Pre-tool-use: Block writes outside {{PROJECT_DIR}}, pipeline-output/, and data/ +Post-tool-use: Log all file operations for data lineage tracking +``` + +### Verify + +```bash +ls /Users/ktg/repos/agent-builder/scripts/templates/domains/ | wc -l +``` +Expected: `6` (5 templates + README) + +### On failure +retry — ensure all 5 templates follow the format, then revert if still failing + +### Checkpoint +```bash +git commit -m "feat(templates): add 5 domain-specific pipeline templates" +``` + +--- + +## Exit Condition + +- [ ] `head -5 /Users/ktg/repos/agent-builder/skills/managed-agents/SKILL.md | grep -c "name: managed-agents"` → 1 +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/managed-agents/references/api-patterns.md && echo OK` → OK +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/ | wc -l` → 6 +- [ ] All 5 domain templates contain `{{PIPELINE_NAME}}` placeholder: `grep -l "PIPELINE_NAME" /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l` → 5 + +## Quality Criteria + +- managed-agents skill has valid YAML frontmatter with trigger phrases +- managed-agents skill covers: what they are, when to use, SDK patterns, cost considerations, migration path +- api-patterns.md has working TypeScript code examples with proper imports +- All 5 domain templates follow identical structure: header comment, 3 agent definitions, pipeline skill template +- All templates use consistent `{{PLACEHOLDER}}` syntax +- README.md lists all 5 templates with one-line descriptions diff --git a/.claude/plans/blueprints/session-3-openclaw.md b/.claude/plans/blueprints/session-3-openclaw.md new file mode 100644 index 0000000..73da4ac --- /dev/null +++ b/.claude/plans/blueprints/session-3-openclaw.md @@ -0,0 +1,1254 @@ +# Session 3: OpenClaw Memory and Autonomy Patterns + +> Steps 9, 10, 11, 12 | Wave 1 | Depends on: none + +## Dependencies + +Entry 
condition: none (independent — creates new template directories only) + +## Scope Fence + +**Touch:** +- `scripts/templates/memory/SESSION-STATE.md` (new) +- `scripts/templates/memory/DAILY-LOG.md` (new) +- `scripts/templates/memory/MEMORY.md` (new) +- `scripts/templates/memory/README.md` (new) +- `scripts/templates/heartbeat/HEARTBEAT.md` (new) +- `scripts/templates/heartbeat/heartbeat-runner.sh` (new) +- `scripts/templates/heartbeat/README.md` (new) +- `scripts/templates/proactive/PROACTIVE-AGENT.md` (new) +- `scripts/templates/proactive/ADL-RULES.md` (new) +- `scripts/templates/proactive/VFM-SCORING.md` (new) +- `scripts/templates/proactive/README.md` (new) +- `scripts/templates/cron/agent-turn.sh` (new) +- `scripts/templates/cron/system-event.sh` (new) +- `scripts/templates/cron/README.md` (new) + +**Never touch:** +- `commands/` +- `agents/` +- `skills/` +- `scripts/templates/heartbeat/context-packet.md` (Session 4) +- `scripts/templates/heartbeat/wake-prompt.md` (Session 4) +- `.claude-plugin/` +- `CLAUDE.md`, `README.md` + +--- + +## Step 9: Create 3-tier memory templates + +### Files to create + +**`scripts/templates/memory/SESSION-STATE.md`**: + +```markdown +# Session State: {{AGENT_NAME}} + +> Hot working memory. Updated every turn. Read first on resume. + +## WAL Protocol + +**Write important details HERE before responding to the user.** +This prevents data loss if the session crashes or context compacts mid-response. + +## Current Task + +- Task: [what you're working on right now] +- Started: [timestamp] +- Status: [in progress / blocked / waiting] +- Key decision: [the most important choice made this session] + +## Context Window Usage + +- Estimated usage: [low / medium / high / DANGER ZONE] +- If above 60%: activate Working Buffer below + +## Active Decisions + +| Decision | Choice | Reason | Reversible? 
| +|----------|--------|--------|-------------| +| | | | | + +## Pending Actions + +- [ ] [action 1] +- [ ] [action 2] + +## Working Buffer + +> Activate when context usage exceeds 60%. Capture key exchanges here +> before they are lost to compaction. This is your safety net. + +### Recent exchanges to preserve + +[Paste critical user messages and your key responses here when in the danger zone] + +### Key facts from this session + +[Extract and list facts that would be lost on compaction] + +## Compaction Recovery + +If you're reading this after a context compaction: +1. Read this SESSION-STATE.md first (you're here) +2. Read today's daily log: `memory/$(date +%Y-%m-%d).md` +3. Read memory/MEMORY.md for long-term context +4. Search daily logs for relevant prior context +5. Resume from the Current Task and Pending Actions above +``` + +**`scripts/templates/memory/DAILY-LOG.md`**: + +```markdown +# Daily Log: {{AGENT_NAME}} + +> Warm daily capture. One file per day. Filename: memory/YYYY-MM-DD.md +> Auto-rotated: the pipeline creates a new file each day. + +## {{DATE}} + +### Summary of Work + +[1-3 sentences describing what was accomplished today] + +### Decisions Made + +| Decision | Context | Outcome | +|----------|---------|---------| +| | | | + +### Files Modified + +- [file path] — [what changed and why] + +### Issues Encountered + +- [issue description] — [resolution or status] + +### Quality Scores + +| Pipeline run | Reviewer score | Notes | +|-------------|---------------|-------| +| | | | + +### Carry Forward + +> Items for the next session. These get checked on the next pipeline run. + +- [ ] [item that needs attention tomorrow] +- [ ] [follow-up from today's work] + +### Cost + +- Estimated tokens used: [if tracked] +- Pipeline runs: [count] +``` + +**`scripts/templates/memory/MEMORY.md`**: + +```markdown +# Long-Term Memory: {{AGENT_NAME}} + +> Cold curated memory. Updated manually or after significant learnings. 
+> This is the last file read during compaction recovery. + +## Agent Identity + +- Name: {{AGENT_NAME}} +- Role: [what this agent does] +- Domain: {{DOMAIN}} +- Created: {{DATE}} + +## Key Learnings + +> Manually curated from daily logs. Only include insights that affect +> future behavior. Delete entries that are no longer relevant. + +- [learning 1 — date discovered] +- [learning 2 — date discovered] + +## Recurring Patterns + +> Patterns observed across multiple runs. Used to improve agent behavior. + +| Pattern | Frequency | Impact | Action taken | +|---------|-----------|--------|-------------| +| | | | | + +## Known Issues + +> Active issues that affect agent performance. + +- [issue] — [workaround] — [status: open/resolved] + +## Project Context + +> Static context that doesn't change often. + +- Project: {{PROJECT_DIR}} +- Pipeline: {{PIPELINE_NAME}} +- Schedule: {{SCHEDULE}} +- Deployment: [target] + +## Last Updated + +[date — update this when you curate this file] +``` + +**`scripts/templates/memory/README.md`**: + +```markdown +# 3-Tier Memory System + +Inspired by OpenClaw's proactive agent memory pattern. Three tiers serve +different purposes with different update frequencies. + +## Architecture + +``` +Tier 1: SESSION-STATE.md (hot) + - Updated every turn + - Read FIRST on session start and compaction recovery + - Contains: current task, decisions, pending actions, working buffer + +Tier 2: memory/YYYY-MM-DD.md (warm) + - One file per day, auto-rotated + - Updated at end of each pipeline run + - Contains: daily summary, decisions, files modified, issues, carry-forward + +Tier 3: memory/MEMORY.md (cold) + - Updated manually or after significant learnings + - Read LAST during compaction recovery + - Contains: identity, curated learnings, patterns, known issues +``` + +## WAL Protocol (Write-Ahead Logging) + +Before responding to the user with important information, write it to +SESSION-STATE.md first. 
This prevents data loss if: +- The session crashes mid-response +- Context compaction removes the exchange +- The user's connection drops + +## Working Buffer Protocol + +When context usage exceeds ~60% (the "danger zone"): +1. Activate the Working Buffer section in SESSION-STATE.md +2. Copy critical recent exchanges into the buffer +3. Extract and list key facts that would be lost on compaction +4. Continue working normally — the buffer is your safety net + +## Compaction Recovery + +When Claude resumes after context compaction, it reads in this order: +1. SESSION-STATE.md (current task, decisions, working buffer) +2. Today's daily log (what happened today) +3. MEMORY.md (long-term context, known issues) +4. Search older daily logs if needed for specific context + +## Integration with Agent Factory + +During `/agent-factory:build` Phase 2.5 (Memory Setup): +1. Copy these templates to the user's `memory/` directory +2. Replace `{{PLACEHOLDER}}` variables with project-specific values +3. Create the initial SESSION-STATE.md and MEMORY.md +4. Configure the pipeline skill to update daily logs after each run + +## File locations after scaffolding + +``` +project/ + memory/ + SESSION-STATE.md (from Tier 1 template) + MEMORY.md (from Tier 3 template) + 2026-04-11.md (generated daily, from Tier 2 template) + 2026-04-12.md + ... +``` +``` + +### Verify + +```bash +ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l +``` +Expected: `4` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)" +``` + +--- + +## Step 10: Create heartbeat and cron templates + +### Files to create + +**`scripts/templates/heartbeat/HEARTBEAT.md`**: + +```markdown +# Heartbeat: {{AGENT_NAME}} + +Read this file on each heartbeat. Follow it strictly. Do not infer or +repeat old tasks from prior chats. If nothing needs attention, reply +HEARTBEAT_OK. 
+
+## Tasks
+
+tasks:
+  - name: {{TASK_1_NAME}}
+    interval: {{TASK_1_INTERVAL}}
+    prompt: "{{TASK_1_PROMPT}}"
+  - name: {{TASK_2_NAME}}
+    interval: {{TASK_2_INTERVAL}}
+    prompt: "{{TASK_2_PROMPT}}"
+
+## Context
+
+{{CONTEXT_NOTES}}
+
+## Rules
+
+- Only perform tasks listed above
+- Respect the interval — do not run a task before its next due time
+- If a task fails, log the error and continue to the next task
+- Respond with HEARTBEAT_OK if no tasks are due
+```
+
+**`scripts/templates/heartbeat/heartbeat-runner.sh`**:
+
+```bash
+#!/bin/bash
+# Heartbeat runner for Claude Code agents.
+# Reads HEARTBEAT.md, checks which tasks are due, invokes claude -p for each.
+#
+# Bash 3.2 compatible: no associative arrays, no mapfile, no |&
+# Uses python3 for all JSON/YAML/date operations.
+#
+# Usage: ./heartbeat-runner.sh [--catchup]
+#   --catchup: run missed tasks on first invocation (max 5, 5s stagger)
+#
+# Placeholders:
+#   {{AGENT_NAME}}    - name of the agent
+#   {{WORKING_DIR}}   - absolute path to project directory
+#   {{MAX_TURNS}}     - max turns per heartbeat (default: 10)
+#   {{ACK_MAX_CHARS}} - suppress responses shorter than this (default: 300)
+
+AGENT_NAME="{{AGENT_NAME}}"
+WORKING_DIR="{{WORKING_DIR}}"
+MAX_TURNS="${MAX_TURNS:-10}"
+ACK_MAX_CHARS="${ACK_MAX_CHARS:-300}"
+HEARTBEAT_FILE="$WORKING_DIR/HEARTBEAT.md"
+STATE_FILE="$WORKING_DIR/.heartbeat-state.json"
+LOG_DIR="$WORKING_DIR/logs"
+CATCHUP_MODE=false
+
+if [ "$1" = "--catchup" ]; then
+  CATCHUP_MODE=true
+fi
+
+# Ensure directories exist
+mkdir -p "$LOG_DIR"
+
+# --- Emptiness detection (OpenClaw pattern) ---
+# Skip API calls if the heartbeat file has only headers/empty items.
+# The heredoc delimiter is single-quoted, so the shell does not expand
+# variables inside the Python code; the file path is passed via the
+# environment instead.
+is_heartbeat_empty() {
+  HEARTBEAT_PATH="$HEARTBEAT_FILE" python3 << 'PYEOF'
+import os, re, sys
+
+try:
+    with open(os.environ["HEARTBEAT_PATH"], "r") as f:
+        content = f.read()
+except (KeyError, OSError):
+    print("true")
+    sys.exit(0)
+
+# Strip markdown headers, blank lines, and YAML structure markers
+stripped = re.sub(r'^#+.*$', '', content, flags=re.MULTILINE)
+stripped = re.sub(r'^tasks:\s*$', '', stripped, flags=re.MULTILINE)
+stripped = re.sub(r'^\s*-\s*name:\s*\{\{.*\}\}\s*$', '', stripped, flags=re.MULTILINE)
+stripped = re.sub(r'^\s*$', '', stripped, flags=re.MULTILINE)
+stripped = stripped.strip()
+
+print("true" if len(stripped) < 20 else "false")
+PYEOF
+}
+
+EMPTY_CHECK=$(is_heartbeat_empty 2>/dev/null)
+
+if [ "$EMPTY_CHECK" = "true" ]; then
+  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | SKIP (empty heartbeat file)" >> "$LOG_DIR/heartbeat.log"
+  exit 0
+fi
+
+# --- Parse tasks and check due times ---
+DUE_TASKS=$(python3 << PYEOF
+import json, re, time
+
+heartbeat_file = "$HEARTBEAT_FILE"
+state_file = "$STATE_FILE"
+catchup = "$CATCHUP_MODE" == "true"
+
+# Parse tasks from HEARTBEAT.md
+try:
+    content = open(heartbeat_file).read()
+except FileNotFoundError:
+    print("[]")
+    exit(0)
+
+# Simple YAML-like task parsing
+tasks = []
+current_task = {}
+for line in content.split('\n'):
+    line = line.strip()
+    m_name = re.match(r'-\s*name:\s*(.+)', line)
+    m_interval = re.match(r'interval:\s*(.+)', line)
+    m_prompt = re.match(r'prompt:\s*"(.+)"', line)
+    if m_name:
+        if current_task.get('name'):
+            tasks.append(current_task)
+        current_task = {'name': m_name.group(1).strip()}
+    elif m_interval and current_task:
+        current_task['interval'] = m_interval.group(1).strip()
+    elif m_prompt and current_task:
+        current_task['prompt'] = m_prompt.group(1).strip()
+if current_task.get('name'):
+    tasks.append(current_task)
+
+# Load state
+try:
+    state = json.load(open(state_file))
+except: + state = {} + +# Parse interval to seconds +def parse_interval(s): + s = s.strip() + m = re.match(r'(\d+)\s*(m|min|h|hr|d)', s) + if not m: + return 3600 # default 1 hour + val, unit = int(m.group(1)), m.group(2) + if unit in ('m', 'min'): + return val * 60 + elif unit in ('h', 'hr'): + return val * 3600 + elif unit == 'd': + return val * 86400 + return 3600 + +# Check which tasks are due +now = time.time() +due = [] +for task in tasks: + name = task.get('name', '') + interval_sec = parse_interval(task.get('interval', '1h')) + last_run = state.get(name, {}).get('last_run', 0) + if now - last_run >= interval_sec: + due.append(task) + elif catchup and last_run == 0: + due.append(task) + +# Limit catchup to 5 tasks +if catchup: + due = due[:5] + +print(json.dumps(due)) +PYEOF +) + +# --- Run due tasks --- +TASK_COUNT=$(echo "$DUE_TASKS" | python3 -c "import sys,json; print(len(json.load(sys.stdin)))" 2>/dev/null) + +if [ "$TASK_COUNT" = "0" ]; then + echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | HEARTBEAT_OK (no tasks due)" >> "$LOG_DIR/heartbeat.log" + exit 0 +fi + +echo "$DUE_TASKS" | python3 -c " +import sys, json, subprocess, time, os + +tasks = json.load(sys.stdin) +state_file = '$STATE_FILE' +log_dir = '$LOG_DIR' +working_dir = '$WORKING_DIR' +max_turns = '$MAX_TURNS' +ack_max = int('$ACK_MAX_CHARS') +catchup = '$CATCHUP_MODE' == 'true' + +# Load state +try: + state = json.load(open(state_file)) +except: + state = {} + +for i, task in enumerate(tasks): + name = task.get('name', 'unknown') + prompt = task.get('prompt', '') + + if not prompt: + continue + + # Stagger catchup tasks + if catchup and i > 0: + time.sleep(5) + + ts = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()) + print(f'{ts} | heartbeat | RUNNING: {name}') + + try: + result = subprocess.run( + ['claude', '-p', prompt, '--output-format', 'text', '--max-turns', str(max_turns)], + capture_output=True, text=True, timeout=600, + cwd=working_dir + ) + output = result.stdout.strip() + + 
# Suppress short ack responses (OpenClaw ackMaxChars pattern) + if len(output) <= ack_max and 'HEARTBEAT_OK' in output: + log_line = f'{ts} | heartbeat | {name} | HEARTBEAT_OK (suppressed)' + else: + log_line = f'{ts} | heartbeat | {name} | completed ({len(output)} chars)' + # Save full output + with open(os.path.join(log_dir, f'heartbeat-{name}-{time.strftime(\"%Y-%m-%d\")}.log'), 'a') as f: + f.write(f'--- {ts} ---\n{output}\n\n') + + except subprocess.TimeoutExpired: + log_line = f'{ts} | heartbeat | {name} | TIMEOUT' + except Exception as e: + log_line = f'{ts} | heartbeat | {name} | ERROR: {str(e)}' + + with open(os.path.join(log_dir, 'heartbeat.log'), 'a') as f: + f.write(log_line + '\n') + + # Update state + state[name] = {'last_run': time.time()} + +# Save state +with open(state_file, 'w') as f: + json.dump(state, f, indent=2) +" + +echo "Heartbeat complete: $TASK_COUNT tasks processed" +``` + +**`scripts/templates/heartbeat/README.md`**: + +```markdown +# Heartbeat Scheduling + +Combines OpenClaw's HEARTBEAT.md task format with Paperclip's interval-based +heartbeat model. Designed for Claude Code agents running on a schedule. + +## How it works + +1. A scheduler (cron/launchd/systemd) runs `heartbeat-runner.sh` at a fixed + interval (e.g., every 30 minutes) +2. The runner reads `HEARTBEAT.md` for task definitions +3. **Emptiness detection**: if the file has no real tasks, skip the API call + entirely (saves cost — from OpenClaw) +4. For each task: check if it's due based on its interval and last-run time +5. Run due tasks via `claude -p` with the task's prompt +6. Suppress short acknowledgment responses (<300 chars containing HEARTBEAT_OK) +7. Update `.heartbeat-state.json` with last-run timestamps + +## Two execution types (from OpenClaw) + +### systemEvent +Injects a text event into an existing session. Lightweight, no new session. +Use for: notifications, status checks, simple updates. 
+Template: `scripts/templates/cron/system-event.sh` + +### agentTurn +Fires a full agent turn with its own session. Full context, full tool access. +Use for: background autonomous work, complex tasks, multi-step operations. +Template: `scripts/templates/cron/agent-turn.sh` + +## Startup catchup (OpenClaw pattern) + +When the runner starts after downtime (e.g., machine was off): +- Run `heartbeat-runner.sh --catchup` +- Processes up to 5 missed tasks +- 5-second stagger between tasks (prevents thundering herd) + +## Cost optimization + +- **Emptiness detection**: No API call if HEARTBEAT.md has no real content +- **ackMaxChars suppression**: Responses under 300 chars with HEARTBEAT_OK + are logged but not displayed (saves downstream processing) +- **Interval-based**: Only run tasks when actually due, not every heartbeat + +## Example cron entries + +```bash +# Run heartbeat every 30 minutes +*/30 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1 + +# Run heartbeat every hour with catchup on restart +@reboot /path/to/heartbeat-runner.sh --catchup >> /tmp/heartbeat.log 2>&1 +0 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1 +``` + +## State file format + +`.heartbeat-state.json`: +```json +{ + "email-check": { "last_run": 1712847600 }, + "report-generation": { "last_run": 1712844000 } +} +``` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh +``` +Expected: no syntax errors (exit 0) + +### On failure +retry — fix bash 3.2 syntax issues, then revert if still failing + +### Checkpoint +```bash +git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup" +``` + +--- + +## Step 11: Create proactive agent templates with ADL/VFM + +### Files to create + +**`scripts/templates/proactive/PROACTIVE-AGENT.md`**: + +```markdown +--- +name: {{AGENT_NAME}} +description: | + A proactive agent that can identify improvements and self-modify within + 
strict guardrails. Uses ADL (Anti-Drift Limits) and VFM (Value-First + Modification) scoring to prevent uncontrolled drift. + + + Context: Agent identifies a recurring inefficiency + user: "Check for improvements" + assistant: "I'll review recent performance data and propose changes via VFM scoring." + Proactive improvement cycle triggered by performance review. + +model: sonnet +tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash"] +--- + +## How you work + +You are a proactive agent. You don't just respond to tasks — you observe +your environment, identify improvements, and implement changes that pass +VFM scoring. + +### Proactive cycle + +1. **Observe**: Read performance data (feedback/FEEDBACK.md, audit.log, cost-events.jsonl) +2. **Identify**: Find patterns: recurring errors, slow steps, unnecessary work +3. **Score**: Run VFM scoring on each proposed change (see VFM protocol below) +4. **Implement**: Only changes with VFM score > 50. All others logged but not applied. +5. **Log**: Record every decision (implement or defer) with scores and reasoning + +### VFM protocol + +Before making ANY change to your own config, skills, prompts, or behavior: + +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/VFM-SCORING.md` +2. Score the proposed change across 4 dimensions (0-25 each) +3. If total score > 50: implement and log +4. If total score <= 50: log with reason for deferral, do NOT implement + +### Self-healing protocol + +When encountering errors: +1. Log the error with full context +2. Try approach 1 (most likely fix based on error message) +3. If fail: try approach 2 (alternative strategy) +4. If fail: try approach 3 (simplified version) +5. Continue up to 5 attempts with increasingly conservative approaches +6. After 5 failures: escalate to human with full attempt log + +## Rules (ADL — Anti-Drift Limits) + +Read the full ADL at `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/ADL-RULES.md`. 
+ +Core constraints: +- **No fake intelligence**: Do not simulate capabilities you lack +- **No unverifiable modifications**: Every change must be testable +- **No novelty over stability**: Prefer proven approaches over clever ones +- **No scope expansion without approval**: Stay within your defined boundaries +- **No silent failures**: All errors must be logged + +Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty + +## Output format + +After each proactive cycle, produce: + +``` +PROACTIVE CYCLE REPORT +====================== +Date: [timestamp] +Observations: [N] patterns found +Proposals: [N] changes evaluated + +| Proposed change | VFM score | Decision | Reason | +|----------------|-----------|----------|--------| +| [change 1] | [score] | implement/defer | [why] | +... + +Implemented: [N] +Deferred: [N] +Errors handled: [N] (max attempt: [N]) +``` +``` + +**`scripts/templates/proactive/ADL-RULES.md`**: + +```markdown +# Anti-Drift Limits (ADL) + +Guardrails that prevent proactive agents from drifting beyond useful behavior. +Inspired by OpenClaw's proactive agent skill. + +## Constraints + +### 1. No fake intelligence +Do not simulate capabilities you do not have. If you cannot access a tool, +do not pretend the operation succeeded. If you cannot verify a fact, say so. + +### 2. No unverifiable modifications +Every change you make must be testable. Before implementing: +- Define how to verify the change worked +- Run the verification after implementation +- Revert if verification fails + +### 3. No novelty over stability +When choosing between a clever new approach and a proven existing one, +choose the proven approach unless VFM scoring strongly favors the new one +(score > 75). + +### 4. No scope expansion without approval +Your boundaries are defined by your agent file and CLAUDE.md. You may +optimize within those boundaries. 
You may NOT: +- Add new tools to your own configuration +- Modify other agents' files +- Change system-level settings +- Create new agents or skills + +### 5. No silent failures +Every error, every failed attempt, every unexpected result must be logged. +Write to the daily log (memory/YYYY-MM-DD.md) or a dedicated error log. + +## Priority Ordering + +When constraints conflict, apply this priority: + +``` +Stability > Explainability > Reusability > Scalability > Novelty +``` + +A stable system that is hard to understand is better than a novel system +that breaks. An explainable system that doesn't scale is better than a +scalable system that nobody can debug. + +## When to override ADL + +ADL can be overridden ONLY by explicit human instruction. If the user says +"try the new approach even though it's risky," that overrides constraint #3. +Log the override with the user's exact instruction. + +Never self-override. The whole point of ADL is to prevent the agent from +convincing itself that an exception is warranted. +``` + +**`scripts/templates/proactive/VFM-SCORING.md`**: + +```markdown +# Value-First Modification (VFM) Scoring + +Scoring rubric for evaluating proposed self-modifications. Any change to +agent config, prompts, behavior, or pipeline structure must score > 50 +to be implemented. + +## Dimensions + +### Frequency (0-25 points) +How often does the issue this change addresses occur? + +| Score | Criteria | +|-------|----------| +| 0-5 | Happened once, may not recur | +| 6-10 | Happens occasionally (1-2x per week) | +| 11-15 | Happens regularly (daily) | +| 16-20 | Happens frequently (multiple times per day) | +| 21-25 | Happens on nearly every run | + +### Failure Reduction (0-25 points) +Does this change fix real failures? 
+ +| Score | Criteria | +|-------|----------| +| 0-5 | Cosmetic improvement, no failures prevented | +| 6-10 | Prevents occasional warnings or non-critical errors | +| 11-15 | Prevents errors that require manual intervention | +| 16-20 | Prevents errors that cause pipeline failure | +| 21-25 | Prevents errors that cause data loss or system damage | + +### Burden Reduction (0-25 points) +Does this reduce human effort? + +| Score | Criteria | +|-------|----------| +| 0-5 | Saves less than 1 minute per occurrence | +| 6-10 | Saves 1-5 minutes per occurrence | +| 11-15 | Saves 5-30 minutes per occurrence | +| 16-20 | Eliminates a manual step entirely | +| 21-25 | Eliminates multiple manual steps or a recurring task | + +### Cost Savings (0-25 points) +Does this reduce API/compute costs? + +| Score | Criteria | +|-------|----------| +| 0-5 | Negligible cost difference | +| 6-10 | Saves <10% on affected operations | +| 11-15 | Saves 10-25% on affected operations | +| 16-20 | Saves 25-50% on affected operations | +| 21-25 | Saves >50% or eliminates unnecessary API calls entirely | + +## Decision threshold + +| Total score | Decision | +|-------------|----------| +| > 50 | **Implement** — change is worth the risk | +| 26-50 | **Defer** — log for future consideration | +| <= 25 | **Reject** — not worth pursuing | + +## Logging format + +Every VFM evaluation must be logged, whether implemented or not: + +``` +VFM EVALUATION +Date: [timestamp] +Proposed change: [description] +Scores: + Frequency: [score] — [justification] + Failure reduction: [score] — [justification] + Burden reduction: [score] — [justification] + Cost savings: [score] — [justification] +Total: [sum]/100 +Decision: implement / defer / reject +``` + +## Worked examples + +### Example 1: Add retry logic to web search (Implement) +- Frequency: 18 (search fails ~3x daily due to timeouts) +- Failure reduction: 15 (prevents pipeline stall requiring manual restart) +- Burden reduction: 16 (eliminates manual re-run) 
+- Cost savings: 8 (slight cost from retry, but saves failed run cost)
+- **Total: 57 → Implement**
+
+### Example 2: Refactor prompt to use XML tags (Defer)
+- Frequency: 25 (every run)
+- Failure reduction: 3 (current format works fine)
+- Burden reduction: 2 (no human effort saved)
+- Cost savings: 5 (maybe slightly fewer tokens)
+- **Total: 35 → Defer** (improvement is real but marginal)
+
+### Example 3: Switch to experimental model (Reject)
+- Frequency: 25 (every run)
+- Failure reduction: 0 (current model has no failures)
+- Burden reduction: 0 (no human effort saved)
+- Cost savings: 10 (newer model might be cheaper)
+- **Total: 35 → Reject** (the score alone would defer, but ADL constraint #3, stability over novelty, blocks the change outright)
+```
+
+**`scripts/templates/proactive/README.md`**:
+
+```markdown
+# Proactive Agent Pattern
+
+A proactive agent observes its environment, identifies improvements, and
+self-modifies within strict guardrails. This pattern is inspired by
+OpenClaw's proactive agent skill.
+
+## When to use
+
+- Agents that run frequently and should improve over time
+- Pipelines with measurable performance metrics
+- Systems where the cost of not improving exceeds the risk of changes
+
+## When NOT to use
+
+- Simple pipelines that just need to run reliably
+- Human-in-the-loop workflows (the human provides the feedback)
+- New systems that haven't established a performance baseline yet
+
+## Components
+
+- **PROACTIVE-AGENT.md**: Agent template with proactive cycle, VFM protocol, self-healing
+- **ADL-RULES.md**: Anti-Drift Limits — constraints that prevent uncontrolled drift
+- **VFM-SCORING.md**: Value-First Modification — scoring rubric for proposed changes
+
+## How ADL and VFM work together
+
+ADL defines what the agent CANNOT do (hard boundaries).
+VFM determines what the agent SHOULD do (prioritization within boundaries).
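The gate can be sketched in a few lines of Python. This is an illustrative sketch only: the flag names, dimension keys, and `evaluate` helper are hypothetical, not part of the shipped templates.

```python
# Illustrative sketch of the ADL -> VFM gate (all names are hypothetical).
# ADL is checked first as a hard boundary; VFM only prioritizes within it.
ADL_VIOLATIONS = ("adds_tools", "modifies_other_agents", "changes_system_settings")

def evaluate(change):
    # Any ADL violation blocks the change before scoring happens
    if any(flag in change.get("adl_flags", []) for flag in ADL_VIOLATIONS):
        return "blocked"
    # Sum the four VFM dimensions (0-25 points each, 100 max)
    total = sum(change["scores"][d] for d in ("frequency", "failure", "burden", "cost"))
    if total > 50:
        return "implement"
    return "defer" if total > 25 else "reject"

# Worked example 1 from VFM-SCORING.md: 18 + 15 + 16 + 8 = 57
print(evaluate({"adl_flags": [],
                "scores": {"frequency": 18, "failure": 15, "burden": 16, "cost": 8}}))
# -> implement
```

Either outcome (implement, defer, reject, or blocked) is logged, matching the "log decision either way" step in the flow.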
+ +``` +Proposed change + → Check ADL constraints → BLOCKED if constraint violated + → Score with VFM → IMPLEMENT if > 50, DEFER if <= 50 + → Log decision either way +``` + +## Integration with feedback loops + +The proactive agent reads from: +- `feedback/FEEDBACK.md` — pipeline run outcomes +- `budget/cost-events.jsonl` — cost data +- `logs/audit.log` — tool call history +- `memory/MEMORY.md` — long-term patterns + +It writes to: +- Daily log (decisions and scores) +- Its own agent file (when implementing approved changes) +- SESSION-STATE.md (current proactive cycle state) +``` + +### Verify + +```bash +ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l +``` +Expected: `4` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails" +``` + +--- + +## Step 12: Create cron templates (agentTurn + systemEvent) + +### Files to create + +**`scripts/templates/cron/agent-turn.sh`**: + +```bash +#!/bin/bash +# Agent Turn: Full background autonomy for Claude Code agents. +# Fires a complete agent turn with its own named session. +# +# Bash 3.2 compatible. Uses python3 for JSON/date operations. 
+# +# Placeholders: +# {{AGENT_NAME}} - name of the agent +# {{WORKING_DIR}} - absolute path to project directory +# {{MAX_TURNS}} - max turns per agent turn (default: 15) + +AGENT_NAME="{{AGENT_NAME}}" +WORKING_DIR="{{WORKING_DIR}}" +MAX_TURNS="${MAX_TURNS:-15}" +SESSION_NAME="agent:${AGENT_NAME}:turn" +PID_FILE="$WORKING_DIR/.agent-turn-${AGENT_NAME}.pid" +LOG_DIR="$WORKING_DIR/logs" +STATE_FILE="$WORKING_DIR/.agent-turn-state.json" + +mkdir -p "$LOG_DIR" + +# --- Orphan detection --- +# Check if a previous agent turn is still running +if [ -f "$PID_FILE" ]; then + OLD_PID=$(cat "$PID_FILE" 2>/dev/null) + if [ -n "$OLD_PID" ] && kill -0 "$OLD_PID" 2>/dev/null; then + echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | SKIP: previous run still active (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log" + exit 0 + else + echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | Cleaned orphan PID file (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log" + rm -f "$PID_FILE" + fi +fi + +# --- Session rollover check --- +# After 200 turns or 72 hours, start a fresh session +NEEDS_FRESH=$(python3 -c " +import json, time, os +state_file = '$STATE_FILE' +agent = '$AGENT_NAME' +try: + state = json.load(open(state_file)) + agent_state = state.get(agent, {}) + turns = agent_state.get('turn_count', 0) + started = agent_state.get('session_started', 0) + age_hours = (time.time() - started) / 3600 if started else 999 + if turns >= 200 or age_hours >= 72: + print('true') + else: + print('false') +except: + print('true') +" 2>/dev/null) + +if [ "$NEEDS_FRESH" = "true" ]; then + # Use a new session name with timestamp for rollover + SESSION_NAME="agent:${AGENT_NAME}:turn:$(date +%s)" + # Reset state + python3 -c " +import json, time, os +state_file = '$STATE_FILE' +agent = '$AGENT_NAME' +try: + state = json.load(open(state_file)) +except: + state = {} +state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'} +with open(state_file, 'w') as f: + json.dump(state, 
f, indent=2) +" +fi + +# --- Load context --- +# Build prompt from HEARTBEAT.md and SESSION-STATE.md +PROMPT=$(python3 -c " +import os +working_dir = '$WORKING_DIR' +parts = [] + +# Read SESSION-STATE.md for current context +ss = os.path.join(working_dir, 'memory', 'SESSION-STATE.md') if os.path.isdir(os.path.join(working_dir, 'memory')) else os.path.join(working_dir, 'SESSION-STATE.md') +if os.path.exists(ss): + parts.append('## Current session state:\n' + open(ss).read()[:2000]) + +# Read HEARTBEAT.md for tasks +hb = os.path.join(working_dir, 'HEARTBEAT.md') +if os.path.exists(hb): + parts.append('## Heartbeat tasks:\n' + open(hb).read()[:2000]) + +if parts: + print('\n\n'.join(parts)) +else: + print('No context files found. Check the working directory and proceed with any pending tasks.') +" 2>/dev/null) + +# --- Run agent turn --- +echo $$ > "$PID_FILE" +TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ) +echo "$TIMESTAMP | agent-turn | START: $AGENT_NAME (session: $SESSION_NAME)" >> "$LOG_DIR/agent-turn.log" + +cd "$WORKING_DIR" + +OUTPUT=$(claude -p "$PROMPT" \ + --name "$SESSION_NAME" \ + --output-format text \ + --max-turns "$MAX_TURNS" 2>&1) + +EXIT_CODE=$? 
+TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ) + +# Save output to dated log +echo "--- $TIMESTAMP ---" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log" +echo "$OUTPUT" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log" +echo "" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log" + +# Update turn count +python3 -c " +import json, time +state_file = '$STATE_FILE' +agent = '$AGENT_NAME' +try: + state = json.load(open(state_file)) +except: + state = {} +if agent not in state: + state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'} +state[agent]['turn_count'] = state[agent].get('turn_count', 0) + 1 +state[agent]['last_run'] = time.time() +with open(state_file, 'w') as f: + json.dump(state, f, indent=2) +" + +if [ "$EXIT_CODE" -eq 0 ]; then + echo "$TIMESTAMP | agent-turn | COMPLETE: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log" +else + echo "$TIMESTAMP | agent-turn | ERROR: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log" +fi + +# Cleanup +rm -f "$PID_FILE" +``` + +**`scripts/templates/cron/system-event.sh`**: + +```bash +#!/bin/bash +# System Event: Inject a text event into an existing Claude Code session. +# Lighter than agentTurn — does not create a new session. +# +# Bash 3.2 compatible. 
+#
+# Usage: ./system-event.sh "session-name" "Event text to inject"
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+
+WORKING_DIR="{{WORKING_DIR}}"
+SESSION_NAME="${1:-}"
+EVENT_TEXT="${2:-}"
+
+if [ -z "$SESSION_NAME" ] || [ -z "$EVENT_TEXT" ]; then
+  echo "Usage: $0 <session-name> <event-text>"
+  echo "Example: $0 'agent:researcher:turn' 'New data available in /data/inbox'"
+  exit 1
+fi
+
+LOG_DIR="$WORKING_DIR/logs"
+mkdir -p "$LOG_DIR"
+
+TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+echo "$TIMESTAMP | system-event | INJECT: $SESSION_NAME — $EVENT_TEXT" >> "$LOG_DIR/system-event.log"
+
+cd "$WORKING_DIR"
+
+# Resume the named session and inject the event
+OUTPUT=$(claude --resume "$SESSION_NAME" -p "$EVENT_TEXT" \
+  --output-format text \
+  --max-turns 3 2>&1)
+
+EXIT_CODE=$?
+TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
+
+if [ "$EXIT_CODE" -eq 0 ]; then
+  echo "$TIMESTAMP | system-event | DELIVERED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
+else
+  echo "$TIMESTAMP | system-event | FAILED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
+fi
+```
+
+**`scripts/templates/cron/README.md`**:
+
+```markdown
+# Cron Execution Templates
+
+Two execution types for scheduled agent work, inspired by OpenClaw's cron service.
+
+## agentTurn (agent-turn.sh)
+
+Full background autonomy. Creates/resumes its own named session.
+
+**Use for:**
+- Complex multi-step work
+- Tasks that need full context and tool access
+- Background autonomous operation (proactive agent cycles)
+
+**Features:**
+- Named sessions via `--name` for resume capability
+- Orphan detection (checks PID before starting new run)
+- Session rollover: fresh session after 200 turns or 72 hours
+- Context loading from SESSION-STATE.md and HEARTBEAT.md
+- Dated log files per agent per day
+
+**Session naming (VERIFIED):**
+Uses `--name "agent:<name>:turn"` with `--resume` for continuity.
+Note: `--session-id` requires valid UUIDs.
Named sessions are the +correct approach for human-readable, resumable agent sessions. + +## systemEvent (system-event.sh) + +Injects text into an existing session. No new session created. + +**Use for:** +- Notifications ("new data available") +- Simple status checks +- Triggering a specific action in a running session + +**Limitations:** +- Requires an active named session to resume +- Limited to 3 turns (quick action, not full autonomy) +- If the session doesn't exist, the command fails gracefully + +## Scheduling examples + +```bash +# agentTurn: run full agent turn every hour +0 * * * * /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1 + +# systemEvent: notify agent of new data every 30 min +*/30 * * * * /path/to/system-event.sh "agent:processor:turn" "Check /data/inbox for new files" + +# agentTurn with catchup on reboot +@reboot /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1 +``` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure +retry — fix bash 3.2 syntax issues, then revert if still failing + +### Checkpoint +```bash +git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates" +``` + +--- + +## Exit Condition + +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l` → 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → 3 (HEARTBEAT.md, heartbeat-runner.sh, README.md) +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l` → 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/cron/ | wc -l` → 3 +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh` → exit 0 +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh` → exit 0 +- [ ] `bash -n 
/Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh` → exit 0 +- [ ] All memory templates contain `{{AGENT_NAME}}` placeholder: `grep -l "AGENT_NAME" /Users/ktg/repos/agent-builder/scripts/templates/memory/*.md | wc -l` → 3 + +## Quality Criteria + +- 3-tier memory matches OpenClaw pattern: SESSION-STATE (hot), daily logs (warm), MEMORY (cold) +- WAL protocol instructions included in SESSION-STATE template +- Working Buffer protocol with 60% danger zone threshold included +- Compaction recovery order documented +- heartbeat-runner.sh implements emptiness detection (cost saving) +- heartbeat-runner.sh implements startup catchup with stagger +- heartbeat-runner.sh suppresses ack responses under 300 chars +- All shell scripts pass `bash -n` syntax check +- agent-turn.sh uses `--name` (not custom session IDs per verified assumption) +- ADL and VFM scoring rubrics have concrete thresholds and worked examples diff --git a/.claude/plans/blueprints/session-4-paperclip.md b/.claude/plans/blueprints/session-4-paperclip.md new file mode 100644 index 0000000..ad64310 --- /dev/null +++ b/.claude/plans/blueprints/session-4-paperclip.md @@ -0,0 +1,757 @@ +# Session 4: Paperclip Orchestration Patterns + +> Steps 14, 15, 16, 17, 18 | Wave 1 | Depends on: none + +## Dependencies + +Entry condition: none (creates new template directories only, plus 2 new files in heartbeat/) + +## Scope Fence + +**Touch:** +- `scripts/templates/heartbeat/context-packet.md` (new) +- `scripts/templates/heartbeat/wake-prompt.md` (new) +- `scripts/templates/goals/GOALS.md` (new) +- `scripts/templates/goals/goal-tracker.sh` (new) +- `scripts/templates/goals/README.md` (new) +- `scripts/templates/budget/budget-hook.sh` (new) +- `scripts/templates/budget/BUDGET.md` (new) +- `scripts/templates/budget/budget-report.sh` (new) +- `scripts/templates/budget/README.md` (new) +- `scripts/templates/governance/GOVERNANCE.md` (new) +- `scripts/templates/governance/approval-gate.sh` (new) +- 
`scripts/templates/governance/README.md` (new) +- `scripts/templates/org-chart/ORG-CHART.md` (new) +- `scripts/templates/org-chart/org-manager.sh` (new) +- `scripts/templates/org-chart/README.md` (new) + +**Never touch:** +- `commands/` +- `agents/` +- `skills/` +- `scripts/templates/heartbeat/HEARTBEAT.md` (Session 3) +- `scripts/templates/heartbeat/heartbeat-runner.sh` (Session 3) +- `scripts/templates/heartbeat/README.md` (Session 3) +- `scripts/templates/memory/` +- `scripts/templates/proactive/` +- `scripts/templates/cron/` +- `.claude-plugin/`, `CLAUDE.md`, `README.md` + +--- + +## Step 14: Create heartbeat context injection templates + +### Files to create + +**`scripts/templates/heartbeat/context-packet.md`** — Paperclip's "Memento Man" pattern: + +```markdown +# Context Packet: {{AGENT_NAME}} +Generated: {{TIMESTAMP}} + +## Identity +{{AGENT_IDENTITY}} + +## Current Goals +{{ACTIVE_GOALS}} + +## Memory State +{{MEMORY_SUMMARY}} + +## Task Queue +{{PENDING_TASKS}} + +## Recent Events (last 24h) +{{RECENT_EVENTS}} + +## Wake Reason +{{WAKE_REASON}} + +## Budget Status +Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%) +{{BUDGET_WARNING}} +``` + +**`scripts/templates/heartbeat/wake-prompt.md`** — Prompt template for each heartbeat wakeup: + +```markdown +You are {{AGENT_NAME}}. You are waking up for a scheduled heartbeat. + +Read the context packet below carefully. It contains everything you +need to know about your current state and pending work. 
+ +{{CONTEXT_PACKET}} + +Your task for this beat: +{{WAKE_REASON}} + +Rules: +- Do NOT infer tasks from prior conversations +- Only work on what is in the context packet and wake reason +- If nothing needs attention, respond with HEARTBEAT_OK +- Update SESSION-STATE.md before finishing +``` + +### Verify + +```bash +ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l +``` +Expected: `5` (3 from Session 3 + 2 from this step) + +### On failure: revert + +### Checkpoint +```bash +git commit -m "feat(templates): add context injection templates (Paperclip heartbeat pattern)" +``` + +--- + +## Step 15: Create goal hierarchy templates + +### Files to create + +**`scripts/templates/goals/GOALS.md`**: + +```markdown +# Goals: {{PROJECT_NAME}} + +## Company Goals +- [G1] {{COMPANY_GOAL_1}} +- [G2] {{COMPANY_GOAL_2}} + +## Project Goals +- [G1.1] {{PROJECT_GOAL_1}} (parent: G1) +- [G1.2] {{PROJECT_GOAL_2}} (parent: G1) +- [G2.1] {{PROJECT_GOAL_3}} (parent: G2) + +## Task Goals +- [G1.1.1] {{TASK_GOAL_1}} (parent: G1.1, owner: {{AGENT_NAME}}, status: active) +- [G1.1.2] {{TASK_GOAL_2}} (parent: G1.1, owner: {{AGENT_NAME}}, status: pending) + +## Notes + +Goal IDs use hierarchical dot notation. Each goal has: +- ID: unique identifier (e.g., G1.1.1) +- Description: what the goal is +- Parent: which goal this supports (simple parent reference, not recursive) +- Owner: which agent is responsible (task goals only) +- Status: active | pending | complete | blocked +``` + +**`scripts/templates/goals/goal-tracker.sh`**: + +```bash +#!/bin/bash +# Goal tracker: parse and manage GOALS.md +# Bash 3.2 compatible. Uses python3 for parsing. 
+# +# Usage: +# ./goal-tracker.sh # Show goal summary +# ./goal-tracker.sh complete G1.1.1 # Mark goal as complete +# ./goal-tracker.sh status # Show status counts +# ./goal-tracker.sh context # Generate context for heartbeat injection +# +# Placeholders: +# {{WORKING_DIR}} - absolute path to project directory + +WORKING_DIR="{{WORKING_DIR}}" +GOALS_FILE="$WORKING_DIR/GOALS.md" +ACTION="${1:-summary}" +GOAL_ID="${2:-}" + +if [ ! -f "$GOALS_FILE" ]; then + echo "Error: $GOALS_FILE not found" + exit 1 +fi + +case "$ACTION" in + summary|status) + python3 << PYEOF +import re + +goals = [] +with open("$GOALS_FILE") as f: + for line in f: + m = re.match(r'-\s*\[(\S+)\]\s+(.+)', line.strip()) + if m: + gid = m.group(1) + rest = m.group(2) + status_m = re.search(r'status:\s*(\w+)', rest) + parent_m = re.search(r'parent:\s*(\S+)', rest) + owner_m = re.search(r'owner:\s*(\S+)', rest) + status = status_m.group(1) if status_m else 'active' + parent = parent_m.group(1).rstrip(',)') if parent_m else None + owner = owner_m.group(1).rstrip(',)') if owner_m else None + desc = re.sub(r'\(.*\)', '', rest).strip() + goals.append({'id': gid, 'desc': desc, 'status': status, 'parent': parent, 'owner': owner}) + +# Status counts +counts = {} +for g in goals: + counts[g['status']] = counts.get(g['status'], 0) + 1 + +print("Goal Status Summary") +print("=" * 40) +for status, count in sorted(counts.items()): + print(f" {status}: {count}") +print(f" Total: {len(goals)}") + +# Check for orphans +all_ids = set(g['id'] for g in goals) +orphans = [g for g in goals if g['parent'] and g['parent'] not in all_ids] +if orphans: + print(f"\nOrphaned goals (parent not found):") + for g in orphans: + print(f" [{g['id']}] parent: {g['parent']}") + +# Goals without owners at task level +unowned = [g for g in goals if '.' 
in g['id'] and g['id'].count('.') >= 2 and not g['owner']]
+if unowned:
+    print(f"\nTask goals without owners:")
+    for g in unowned:
+        print(f"  [{g['id']}] {g['desc']}")
+PYEOF
+    ;;
+
+  complete)
+    if [ -z "$GOAL_ID" ]; then
+      echo "Usage: $0 complete <goal-id>"
+      exit 1
+    fi
+    python3 -c "
+import re
+goal_id = '$GOAL_ID'
+with open('$GOALS_FILE') as f:
+    content = f.read()
+# Replace status for the specific goal
+pattern = r'(\[' + re.escape(goal_id) + r'\][^)]*status:\s*)\w+'
+if re.search(pattern, content):
+    content = re.sub(pattern, r'\1complete', content)
+    with open('$GOALS_FILE', 'w') as f:
+        f.write(content)
+    print(f'Goal {goal_id} marked as complete')
+else:
+    print(f'Goal {goal_id} not found or has no status field')
+"
+    ;;
+
+  context)
+    # Generate a goal summary for heartbeat context injection
+    python3 << PYEOF
+import re
+
+goals = []
+with open("$GOALS_FILE") as f:
+    for line in f:
+        m = re.match(r'-\s*\[(\S+)\]\s+(.+)', line.strip())
+        if m:
+            gid = m.group(1)
+            rest = m.group(2)
+            status_m = re.search(r'status:\s*(\w+)', rest)
+            status = status_m.group(1) if status_m else 'active'
+            desc = re.sub(r'\(.*\)', '', rest).strip()
+            goals.append({'id': gid, 'desc': desc, 'status': status})
+
+active = [g for g in goals if g['status'] == 'active']
+if active:
+    print("Active goals:")
+    for g in active:
+        print(f"  [{g['id']}] {g['desc']}")
+else:
+    print("No active goals.")
+PYEOF
+    ;;
+
+  *)
+    echo "Usage: $0 [summary|complete <goal-id>|status|context]"
+    exit 1
+    ;;
+esac
+```
+
+**`scripts/templates/goals/README.md`**:
+
+```markdown
+# Goal Hierarchy
+
+File-based goal hierarchy inspired by Paperclip's goal system.
+
+## Design decisions
+
+- **Simple parent_id, not recursive**: Each goal references its parent by ID.
+  No recursive traversal at runtime — matching Paperclip's actual implementation
+  (not their aspirational docs which describe "full ancestry").
+- **Dot notation for hierarchy**: G1 → G1.1 → G1.1.1 makes the hierarchy visible
+  in the ID itself.
+- **File-based, not database**: Human-editable, version-controlled, no dependencies. + +## Goal levels + +| Level | ID pattern | Example | Description | +|-------|-----------|---------|-------------| +| Company | G1, G2 | G1 | Top-level organizational goals | +| Project | G1.1, G1.2 | G1.1 | Goals that support a company goal | +| Task | G1.1.1 | G1.1.1 | Specific tasks assigned to agents | + +## Usage + +```bash +# View goal summary +./goal-tracker.sh + +# Mark a task as complete +./goal-tracker.sh complete G1.1.1 + +# Generate context for heartbeat injection +./goal-tracker.sh context +``` + +## Integration with heartbeat + +The `context` command produces a summary suitable for injection into +the heartbeat context packet (see `scripts/templates/heartbeat/context-packet.md`). +The heartbeat runner can call `./goal-tracker.sh context` and inject +the output as `{{ACTIVE_GOALS}}`. +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/goals/goal-tracker.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add goal hierarchy templates (Paperclip pattern)" +``` + +--- + +## Step 16: Create budget tracking templates + +### Files to create + +**`scripts/templates/budget/BUDGET.md`**: + +```markdown +# Budget Policy: {{PROJECT_NAME}} + +## Company Budget +- window: {{BUDGET_WINDOW}} +- limit: {{BUDGET_LIMIT_CENTS}} cents +- warn_percent: 80 +- hard_stop: true + +## Agent Budgets +- {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}} + +## Notification +- on_warn: log +- on_hard_stop: pause + +## Notes + +Budget enforcement is POST-HOC (checked after each run, not before). +This matches Paperclip's proven approach: check SUM(cost) after run, +pause if exceeded. No pre-run reservation needed. + +Cost estimation uses token counts × published pricing. 
For accurate
+cost data, organizations can use the Admin API:
+`/v1/organizations/cost_report` (requires Admin API key: sk-ant-admin...).
+
+For headless runs, use `claude -p --max-budget-usd N` as a per-run cap.
+```
+
+**`scripts/templates/budget/budget-hook.sh`**:
+
+```bash
+#!/bin/bash
+# PostToolUse hook: Log cost events and enforce budget.
+# Bash 3.2 compatible. Uses python3 for JSON parsing.
+#
+# Follows Paperclip's post-hoc enforcement pattern:
+#   1. Log cost event after each tool call
+#   2. Check cumulative cost against budget policy
+#   3. Warn at soft threshold, pause at hard threshold
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+
+WORKING_DIR="{{WORKING_DIR}}"
+BUDGET_DIR="$WORKING_DIR/budget"
+COST_LOG="$BUDGET_DIR/cost-events.jsonl"
+BUDGET_FILE="$WORKING_DIR/BUDGET.md"
+PAUSED_FLAG="$BUDGET_DIR/PAUSED"
+
+mkdir -p "$BUDGET_DIR"
+
+# Read hook input
+INPUT=$(cat)
+
+# Log cost event. The payload is passed via the environment (quoted heredoc)
+# so quotes and backslashes in the JSON cannot break the Python source.
+INPUT="$INPUT" COST_LOG="$COST_LOG" python3 << 'PYEOF'
+import json, sys, time, os
+
+try:
+    data = json.loads(os.environ.get('INPUT', ''))
+except Exception:
+    sys.exit(0)
+
+# Estimate cost from token counts if available in tool result.
+# This is a rough estimate; actual costs come from the Admin API.
+event = {
+    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
+    'tool_name': data.get('tool_name', ''),
+    'agent': os.environ.get('AGENT_NAME', 'unknown'),
+    'estimated_tokens': 0
+}
+
+with open(os.environ['COST_LOG'], 'a') as f:
+    f.write(json.dumps(event) + '\n')
+PYEOF
+
+# Check budget if BUDGET.md exists
+if [ -f "$BUDGET_FILE" ] && [ -f "$COST_LOG" ]; then
+    BUDGET_RESULT=$(BUDGET_FILE="$BUDGET_FILE" COST_LOG="$COST_LOG" PAUSED_FLAG="$PAUSED_FLAG" python3 << 'PYEOF'
+import re, os, sys
+
+budget_file = os.environ['BUDGET_FILE']
+cost_log = os.environ['COST_LOG']
+paused_flag = os.environ['PAUSED_FLAG']
+
+# Parse budget policy
+try:
+    content = open(budget_file).read()
+except Exception:
+    print("ok")
+    sys.exit(0)
+
+limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
+if not limit_m:
+    print("ok")
+    sys.exit(0)
+
+warn_m = re.search(r'warn_percent:\s*(\d+)', content)
+hard_m = re.search(r'hard_stop:\s*(\w+)', content)
+limit = int(limit_m.group(1))
+warn_pct = int(warn_m.group(1)) if warn_m else 80
+hard_stop = hard_m.group(1).lower() == 'true' if hard_m else True
+
+# Sum cost events (count events as rough proxy — actual cost tracking
+# requires Admin API or token counting from responses)
+try:
+    event_count = sum(1 for _ in open(cost_log))
+except Exception:
+    event_count = 0
+
+# Rough estimate: each event ~ 1 cent (placeholder — customize per model)
+estimated_cents = event_count
+
+pct = (estimated_cents / limit * 100) if limit > 0 else 0
+
+if pct >= 100 and hard_stop:
+    with open(paused_flag, 'w') as f:
+        f.write(f'Budget exceeded: {estimated_cents}/{limit} cents')
+    print("hard_stop")
+elif pct >= warn_pct:
+    print("warn")
+else:
+    print("ok")
+PYEOF
+)
+
+    if [ "$BUDGET_RESULT" = "hard_stop" ]; then
+        echo "BUDGET EXCEEDED — agent paused. Check $PAUSED_FLAG" >&2
+    elif [ "$BUDGET_RESULT" = "warn" ]; then
+        echo "BUDGET WARNING — approaching limit" >&2
+    fi
+fi
+
+# Check if agent is paused
+if [ -f "$PAUSED_FLAG" ]; then
+    echo '{"decision": "block", "reason": "Agent paused: budget exceeded. Remove '"$PAUSED_FLAG"' to resume."}'
+    exit 2
+fi
+
+exit 0
+```
+
+**`scripts/templates/budget/budget-report.sh`**:
+
+```bash
+#!/bin/bash
+# Budget report: summarize cost events and compare against policy.
+# Bash 3.2 compatible. Uses python3 for aggregation.
+#
+# Usage: ./budget-report.sh
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+
+WORKING_DIR="{{WORKING_DIR}}"
+COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
+BUDGET_FILE="$WORKING_DIR/BUDGET.md"
+PAUSED_FLAG="$WORKING_DIR/budget/PAUSED"
+
+if [ ! -f "$COST_LOG" ]; then
+    echo "No cost events recorded yet."
+ exit 0 +fi + +python3 << PYEOF +import json, re, os +from collections import defaultdict + +cost_log = "$COST_LOG" +budget_file = "$BUDGET_FILE" +paused_flag = "$PAUSED_FLAG" + +# Read events +events = [] +with open(cost_log) as f: + for line in f: + line = line.strip() + if line: + try: + events.append(json.loads(line)) + except: + pass + +# Aggregate +by_agent = defaultdict(int) +by_day = defaultdict(int) +by_tool = defaultdict(int) + +for e in events: + agent = e.get('agent', 'unknown') + day = e.get('timestamp', '')[:10] + tool = e.get('tool_name', 'unknown') + by_agent[agent] += 1 + by_day[day] += 1 + by_tool[tool] += 1 + +print("BUDGET REPORT") +print("=" * 50) +print(f"Total events: {len(events)}") +print() + +# Per-agent breakdown +print("By Agent:") +for agent, count in sorted(by_agent.items(), key=lambda x: -x[1]): + print(f" {agent}: {count} events") +print() + +# Per-day breakdown (last 7 days) +print("By Day (last 7):") +for day, count in sorted(by_day.items())[-7:]: + print(f" {day}: {count} events") +print() + +# Budget comparison +if os.path.exists(budget_file): + content = open(budget_file).read() + limit_m = re.search(r'limit:\s*(\d+)\s*cents', content) + if limit_m: + limit = int(limit_m.group(1)) + est_cents = len(events) # rough proxy + pct = (est_cents / limit * 100) if limit > 0 else 0 + print(f"Budget: ~{est_cents}/{limit} cents ({pct:.0f}%)") + +# Paused status +if os.path.exists(paused_flag): + print(f"\n!! AGENT PAUSED: {open(paused_flag).read().strip()}") + print(f" Remove {paused_flag} to resume") +PYEOF +``` + +**`scripts/templates/budget/README.md`**: + +```markdown +# Budget Tracking + +Post-hoc budget enforcement inspired by Paperclip's budget system. + +## How it works + +1. `budget-hook.sh` runs as a PostToolUse hook after every tool call +2. Each call is logged to `budget/cost-events.jsonl` +3. After logging, cumulative cost is compared against `BUDGET.md` policy +4. 
If soft threshold (default 80%) exceeded: warning to stderr +5. If hard threshold (100%) exceeded and hard_stop=true: creates `budget/PAUSED` + flag file, subsequent tool calls are blocked (exit 2) + +## Why post-hoc, not pre-run? + +Paperclip uses the same approach. Pre-run budget reservation requires a +persistent service or lock file coordination. Post-hoc checking is simpler +and robust enough in practice — the worst case is one extra run before pause. + +## Cost estimation + +The current implementation counts events as a rough proxy for cost. For +accurate cost tracking, you have two options: + +1. **Admin API** (org accounts only): Query `/v1/organizations/cost_report` + with an Admin API key (`sk-ant-admin...`). This gives actual USD costs. +2. **Token estimation**: Parse token counts from Claude's responses and + multiply by published per-token prices. More accurate than event counting + but still an estimate. + +For headless runs, `claude -p --max-budget-usd N` provides a per-run +budget cap directly in the CLI. + +## Integration + +Add to `.claude/settings.json`: +```json +{ + "hooks": { + "PostToolUse": [{ + "matcher": "*", + "hooks": [{"type": "command", "command": "bash budget/budget-hook.sh"}] + }] + } +} +``` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-hook.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-report.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add budget tracking templates (Paperclip pattern)" +``` + +--- + +## Step 17: Create governance and approval gate templates + +### Files to create + +**`scripts/templates/governance/GOVERNANCE.md`** — See plan Step 17 for full content. 
Key sections: +- Autonomy Levels (0-4 scale from full manual to full autonomy) +- Approval Gates with `{{GATE_NAME}}` and `{{GATE_CONDITION}}` placeholders +- Escalation Rules (budget exceeded, error threshold, unknown tool, scope violation) +- Audit Requirements (tool calls, budget events, approvals, retention) + +**`scripts/templates/governance/approval-gate.sh`** — PreToolUse hook implementing approval gates. Bash 3.2 compatible. Key behavior: +1. Reads GOVERNANCE.md for current autonomy level +2. Based on level, auto-approves or requires approval +3. For gated operations: writes request to `governance/pending-approvals.jsonl` +4. Checks `governance/approval-responses.jsonl` for matching response +5. No response within timeout → block (exit 2) + +**`scripts/templates/governance/README.md`** — Explains the governance model, autonomy levels, Paperclip's "autonomy is a privilege" philosophy. + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/governance/approval-gate.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add governance and approval gate templates (Paperclip pattern)" +``` + +--- + +## Step 18: Create org-chart template + +### Files to create + +**`scripts/templates/org-chart/ORG-CHART.md`** — Markdown table with columns: Agent, Role, Reports To, Status, Budget. Uses `reportsTo` pattern from Paperclip. Includes delegation rules and human override section. + +**`scripts/templates/org-chart/org-manager.sh`** — Bash 3.2 script that: +- Parses ORG-CHART.md table with python3 +- Validates: agents exist in `.claude/agents/`, no circular chains +- Can add/remove agents from the chart +- Generates text-based org tree visualization + +**`scripts/templates/org-chart/README.md`** — Explains the simple `reportsTo` pattern, delegation flows, cross-team routing. 
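+
+The no-circular-chains validation in org-manager.sh can be sketched as follows
+(a minimal sketch: the `chart` dict stands in for rows parsed out of
+ORG-CHART.md, and the agent names are illustrative):
+
+```python
+# Sketch of the circular-reportsTo check. Maps each agent to its
+# reportsTo target; None marks the top of the chart.
+def find_cycles(reports_to):
+    """Return agents whose reportsTo chain loops instead of reaching a root."""
+    cyclic = []
+    for agent in reports_to:
+        seen = set()
+        cur = agent
+        while cur is not None:
+            if cur in seen:
+                cyclic.append(agent)
+                break
+            seen.add(cur)
+            cur = reports_to.get(cur)
+    return cyclic
+
+chart = {"writer": "lead", "reviewer": "lead", "lead": None}
+print(find_cycles(chart))          # → []
+chart["lead"] = "writer"           # writer -> lead -> writer: a cycle
+print(sorted(find_cycles(chart)))  # → ['lead', 'reviewer', 'writer']
+```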
+ +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/org-chart/org-manager.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add org-chart template (Paperclip pattern)" +``` + +--- + +## Exit Condition + +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → 5 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/goals/ | wc -l` → 3 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/budget/ | wc -l` → 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/governance/ | wc -l` → 3 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/org-chart/ | wc -l` → 3 +- [ ] All shell scripts pass `bash -n`: `find /Users/ktg/repos/agent-builder/scripts/templates/goals /Users/ktg/repos/agent-builder/scripts/templates/budget /Users/ktg/repos/agent-builder/scripts/templates/governance /Users/ktg/repos/agent-builder/scripts/templates/org-chart -name "*.sh" -exec bash -n {} \;` → no errors +- [ ] context-packet.md contains `{{WAKE_REASON}}` placeholder +- [ ] budget-hook.sh contains reference to PAUSED flag file + +## Quality Criteria + +- Context packet follows Paperclip's "Memento Man" pattern with all sections +- Wake prompt includes rules about not inferring from prior conversations +- Goal hierarchy uses simple parent_id (not recursive) matching Paperclip's actual code +- Budget enforcement is post-hoc matching Paperclip's pattern +- Budget README honestly documents Admin API limitation (org-only, admin key) +- Governance has 5 autonomy levels (0-4) with clear descriptions +- Org chart uses simple reportsTo pattern with human override authority +- All bash scripts are 3.2 compatible diff --git a/.claude/plans/blueprints/session-5-selflearning.md b/.claude/plans/blueprints/session-5-selflearning.md new file mode 100644 index 0000000..5aa3994 --- /dev/null +++ 
b/.claude/plans/blueprints/session-5-selflearning.md @@ -0,0 +1,997 @@ +# Session 5: Self-Learning Systems + +> Steps 20, 21 | Wave 1 | Depends on: none + +## Dependencies + +Entry condition: none (independent — creates new template directories only) + +## Scope Fence + +**Touch:** +- `scripts/templates/feedback/FEEDBACK.md` (new) +- `scripts/templates/feedback/feedback-collector.sh` (new) +- `scripts/templates/feedback/performance-scorer.sh` (new) +- `scripts/templates/feedback/README.md` (new) +- `scripts/templates/optimization/pipeline-optimizer.sh` (new) +- `scripts/templates/optimization/self-healing.sh` (new) +- `scripts/templates/optimization/README.md` (new) + +**Never touch:** +- `commands/` +- `agents/` +- `skills/` +- `scripts/templates/heartbeat/` +- `scripts/templates/memory/` +- `scripts/templates/proactive/` +- `scripts/templates/cron/` +- `scripts/templates/goals/` +- `scripts/templates/budget/` +- `scripts/templates/governance/` +- `scripts/templates/org-chart/` +- `.claude-plugin/`, `CLAUDE.md`, `README.md` + +--- + +## Step 20: Create feedback loop templates + +### Files to create + +**`scripts/templates/feedback/FEEDBACK.md`** — Feedback tracking file: + +```markdown +# Feedback Log: {{PROJECT_NAME}} + +> Append-only. One row per pipeline run. Reviewed by performance-scorer.sh. 
+ +## Feedback Table + +| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern | +|------|----------|-------|-------|-------|------------|---------| +| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} | + +## Pattern Tags + +Use consistent tags so performance-scorer.sh can detect recurring issues: + +- `quality-low` — output below acceptance threshold +- `loop-excess` — more revision iterations than expected +- `timeout` — agent exceeded time budget +- `tool-fail` — tool call failed or returned unexpected result +- `cost-spike` — single run cost exceeded 3x average +- `scope-drift` — agent worked outside defined scope +- `hallucination` — output contained factual errors + +## Notes + +Scores are 0–100 as assigned by the reviewer agent or human reviewer. +A score below 60 triggers a flag in performance-scorer.sh. +Three or more rows with the same Pattern tag = recurring issue. +Recurring issues should drive prompt iteration or pipeline redesign. +``` + +**`scripts/templates/feedback/feedback-collector.sh`** — PostToolUse hook variant that appends feedback after pipeline completion: + +```bash +#!/bin/bash +# PostToolUse hook: Collect feedback after pipeline completion. +# Bash 3.2 compatible. Uses python3 for JSON parsing and CSV/MD append. +# +# Triggered after a designated "review" tool call completes. +# Reads pipeline output and reviewer score, appends to FEEDBACK.md, +# and detects recurring patterns (3+ rows with same tag = recurring). 
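+#
+# Expected tool_result payload (illustrative; these field names are this
+# template's convention and must be emitted by the reviewer agent):
+#   {"agent": "agent-writer", "score": 45, "issue": "Output too brief",
+#    "resolution": "Added detail requirement", "pattern": "quality-low"}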
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+#   {{PIPELINE_NAME}} - name of the pipeline being tracked
+#   {{SCORE_THRESHOLD}} - minimum acceptable score (default: 60)
+
+WORKING_DIR="{{WORKING_DIR}}"
+PIPELINE_NAME="{{PIPELINE_NAME}}"
+SCORE_THRESHOLD="${SCORE_THRESHOLD:-60}"
+FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
+HOOK_INPUT=$(cat)
+
+# Only act on review tool calls
+TOOL_NAME=$(echo "$HOOK_INPUT" | python3 -c "
+import sys, json
+try:
+    data = json.load(sys.stdin)
+    print(data.get('tool_name', ''))
+except Exception:
+    print('')
+" 2>/dev/null)
+
+if [ "$TOOL_NAME" != "review_pipeline" ] && [ "$TOOL_NAME" != "score_output" ]; then
+    exit 0
+fi
+
+# Extract score, agent, issue, resolution, pattern from hook input.
+# The payload is passed via the environment (quoted heredoc) so quotes and
+# backslashes in the JSON cannot break the Python source.
+HOOK_INPUT="$HOOK_INPUT" FEEDBACK_FILE="$FEEDBACK_FILE" PIPELINE_NAME="$PIPELINE_NAME" SCORE_THRESHOLD="$SCORE_THRESHOLD" python3 << 'PYEOF'
+import sys, json, re, os
+from datetime import datetime, timezone
+
+hook_input = os.environ.get('HOOK_INPUT', '')
+feedback_file = os.environ.get('FEEDBACK_FILE', '')
+pipeline_name = os.environ.get('PIPELINE_NAME', '')
+score_threshold = int(os.environ.get('SCORE_THRESHOLD', '60'))
+
+try:
+    data = json.loads(hook_input)
+except Exception:
+    sys.exit(0)
+
+tool_result = data.get('tool_result', '')
+if isinstance(tool_result, dict):
+    tool_result = json.dumps(tool_result)
+
+# Parse structured fields from tool result (expects JSON or key:value)
+agent_name = os.environ.get('AGENT_NAME', 'unknown')
+score = 0
+issue = ''
+resolution = ''
+pattern = ''
+
+try:
+    result_data = json.loads(tool_result)
+    agent_name = result_data.get('agent', agent_name)
+    score = int(result_data.get('score', 0))
+    issue = result_data.get('issue', '')
+    resolution = result_data.get('resolution', '')
+    pattern = result_data.get('pattern', '')
+except Exception:
+    # Fallback: look for score: N in plain text
+    m = re.search(r'score[:\s]+(\d+)', tool_result, re.IGNORECASE)
+    if m:
+        score = int(m.group(1))
+    m = re.search(r'pattern[:\s]+(\S+)', tool_result, re.IGNORECASE)
+    if m:
+        pattern = m.group(1)
+
+if score == 0 and not issue:
+    sys.exit(0)
+
+date_str = datetime.now(timezone.utc).strftime('%Y-%m-%d')
+row = f"| {date_str} | {pipeline_name} | {agent_name} | {score}/100 | {issue} | {resolution} | {pattern} |"
+
+# Append to feedback table
+if not os.path.exists(feedback_file):
+    print(f"Warning: {feedback_file} not found — skipping feedback append")
+    sys.exit(0)
+
+with open(feedback_file, 'r') as f:
+    content = f.read()
+
+# Insert row after the header row of the table
+separator = '|------|----------|-------|-------|-------|------------|---------|'
+placeholder_row = '| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |'
+
+if placeholder_row in content:
+    # Replace placeholder with real row + keep placeholder for next time
+    content = content.replace(placeholder_row, row + '\n' + placeholder_row)
+elif separator in content:
+    content = content.replace(separator, separator + '\n' + row)
+else:
+    content += '\n' + row + '\n'
+
+with open(feedback_file, 'w') as f:
+    f.write(content)
+
+print(f"Feedback recorded: score={score}, pattern={pattern}")
+
+# Detect recurring patterns
+if pattern:
+    pattern_count = content.count(f'| {pattern} |')
+    if pattern_count >= 3:
+        print(f"RECURRING PATTERN DETECTED: '{pattern}' appears {pattern_count} times")
+        print(f"Action required: review prompt or pipeline for '{pipeline_name}'")
+
+# Flag low scores
+if score < score_threshold and score > 0:
+    print(f"LOW SCORE: {score} < threshold {score_threshold} for agent {agent_name}")
+PYEOF
+
+exit 0
+```
+
+**`scripts/templates/feedback/performance-scorer.sh`** — Standalone scoring script that reads FEEDBACK.md and cost-events.jsonl:
+
+```bash
+#!/bin/bash
+# Performance scorer: per-agent metrics from FEEDBACK.md + cost-events.jsonl.
+# Bash 3.2 compatible. Uses python3 for all metrics computation.
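+#
+# Trend rule used below: last-10 average minus previous-10 average; a delta
+# within ±5 points is reported as "stable". Illustrative numbers: averages
+# of 50 then 60 give delta +10.0, reported as "improving (+10.0)".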
+#
+# Metrics per agent:
+#   - Average score (0-100)
+#   - Error rate (rows with score < threshold / total rows)
+#   - Cost per run (from cost-events.jsonl, rough proxy)
+#   - Improvement trend: avg of last 10 scores vs. previous 10
+#
+# Flags agents below threshold (default 60/100).
+#
+# Usage:
+#   ./performance-scorer.sh                    # Score all agents
+#   ./performance-scorer.sh --agent {{AGENT}}  # Score specific agent
+#   ./performance-scorer.sh --threshold 70     # Custom threshold
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+
+WORKING_DIR="{{WORKING_DIR}}"
+FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
+COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
+THRESHOLD=60
+AGENT_FILTER=""
+
+# Parse arguments (bash 3.2 compatible — no associative arrays)
+while [ "$#" -gt 0 ]; do
+    case "$1" in
+        --agent) AGENT_FILTER="$2"; shift 2 ;;
+        --threshold) THRESHOLD="$2"; shift 2 ;;
+        *) shift ;;
+    esac
+done
+
+if [ ! -f "$FEEDBACK_FILE" ]; then
+    echo "No feedback file found at $FEEDBACK_FILE"
+    exit 0
+fi
+
+python3 << PYEOF
+import re, json, os, sys
+from collections import defaultdict
+
+feedback_file = "$FEEDBACK_FILE"
+cost_log = "$COST_LOG"
+threshold = int("$THRESHOLD")
+agent_filter = "$AGENT_FILTER"
+
+# Parse FEEDBACK.md table rows
+# Expected columns: Date, Pipeline, Agent, Score, Issue, Resolution, Pattern
+feedback_rows = []
+with open(feedback_file) as f:
+    in_table = False
+    header_seen = False
+    for line in f:
+        line = line.strip()
+        if '| Date |' in line:
+            in_table = True
+            header_seen = True
+            continue
+        if in_table and line.startswith('|---'):
+            continue
+        if in_table and line.startswith('|') and '{{' not in line and header_seen:
+            cols = [c.strip() for c in line.strip('|').split('|')]
+            if len(cols) >= 7:
+                try:
+                    date = cols[0]
+                    pipeline = cols[1]
+                    agent = cols[2]
+                    score_str = cols[3]
+                    issue = cols[4]
+                    resolution = cols[5]
+                    pattern = cols[6]
+                    # Parse score: "75/100" or "75"
+                    score_m = re.match(r'(\d+)', score_str)
+                    score = 
int(score_m.group(1)) if score_m else 0 + feedback_rows.append({ + 'date': date, + 'pipeline': pipeline, + 'agent': agent, + 'score': score, + 'issue': issue, + 'pattern': pattern + }) + except (ValueError, IndexError): + pass + +# Filter by agent if specified +if agent_filter: + feedback_rows = [r for r in feedback_rows if r['agent'] == agent_filter] + +if not feedback_rows: + print("No feedback rows found.") + sys.exit(0) + +# Read cost events if available +cost_by_agent = defaultdict(int) +if os.path.exists(cost_log): + with open(cost_log) as f: + for line in f: + line = line.strip() + if line: + try: + event = json.loads(line) + agent = event.get('agent', 'unknown') + cost_by_agent[agent] += 1 # event count as proxy + except Exception: + pass + +# Compute per-agent metrics +agents = list(set(r['agent'] for r in feedback_rows)) + +print("PERFORMANCE SCORECARD") +print("=" * 60) +print(f"Threshold: {threshold}/100") +print(f"Total feedback rows: {len(feedback_rows)}") +print() + +flagged = [] + +for agent in sorted(agents): + rows = [r for r in feedback_rows if r['agent'] == agent] + scores = [r['score'] for r in rows] + + avg_score = sum(scores) / len(scores) if scores else 0 + error_rate = len([s for s in scores if s < threshold]) / len(scores) if scores else 0 + cost_events = cost_by_agent.get(agent, 0) + cost_per_run = cost_events / len(rows) if rows else 0 + + # Improvement trend: last 10 vs. 
prev 10 + trend_str = "n/a (fewer than 20 runs)" + if len(scores) >= 20: + prev10 = scores[-20:-10] + last10 = scores[-10:] + prev_avg = sum(prev10) / len(prev10) + last_avg = sum(last10) / len(last10) + delta = last_avg - prev_avg + if delta > 5: + trend_str = f"improving (+{delta:.1f})" + elif delta < -5: + trend_str = f"declining ({delta:.1f})" + else: + trend_str = f"stable ({delta:+.1f})" + elif len(scores) >= 10: + last10 = scores[-10:] + trend_str = f"recent avg: {sum(last10)/len(last10):.1f} (need 20 runs for trend)" + + # Pattern frequency + patterns = defaultdict(int) + for r in rows: + if r['pattern']: + patterns[r['pattern']] += 1 + top_patterns = sorted(patterns.items(), key=lambda x: -x[1])[:3] + + print(f"Agent: {agent}") + print(f" Runs: {len(rows)}") + print(f" Avg score: {avg_score:.1f}/100") + print(f" Error rate: {error_rate*100:.0f}% (score < {threshold})") + print(f" Cost/run: ~{cost_per_run:.1f} events (rough proxy)") + print(f" Trend: {trend_str}") + if top_patterns: + print(f" Top patterns: {', '.join(f'{p}({c})' for p, c in top_patterns)}") + print() + + if avg_score < threshold: + flagged.append((agent, avg_score)) + +# Summary of flagged agents +if flagged: + print("FLAGGED AGENTS (below threshold)") + print("-" * 40) + for agent, avg in flagged: + print(f" {agent}: avg {avg:.1f} < {threshold}") + print() + print("Recommended actions:") + print(" 1. Review feedback rows for top patterns") + print(" 2. Iterate on agent system prompt") + print(" 3. Consider pipeline redesign if pattern is structural") + print(" 4. Run pipeline-optimizer.sh for bottleneck analysis") +else: + print("All agents above threshold.") +PYEOF +``` + +**`scripts/templates/feedback/README.md`** — Explains the feedback loop pattern: + +```markdown +# Feedback Loop + +Systematic feedback collection and performance scoring for agent pipelines. + +## How it works + +1. 
After each pipeline run, a reviewer agent (or human) assigns a score (0–100) + and categorizes any issues with a pattern tag. +2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or + `score_output` tool calls. It appends a row to `FEEDBACK.md`. +3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires. +4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl` + to compute per-agent metrics: average score, error rate, cost per run, + improvement trend (last 10 vs. previous 10 runs). +5. Agents scoring below the threshold (default 60/100) are flagged for review. + +## Pattern tags + +Consistent tags are required for pattern detection to work. Use the tags +defined in `FEEDBACK.md`. Add project-specific tags as needed — but be +consistent. Inconsistent tagging produces false negatives. + +## Scoring → self-improvement connection + +Feedback scores are the input to VFM (Value-for-Money) pre-scoring +defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11). +A low-scoring agent gets a lower VFM pre-score for future pipeline tasks, +making it less likely to be selected until its performance improves. + +The feedback loop closes the improvement cycle: +1. Pipeline runs → reviewer assigns score + pattern tag +2. `feedback-collector.sh` appends to FEEDBACK.md +3. `performance-scorer.sh` flags underperforming agents +4. Developer reviews top patterns → iterates on agent prompt +5. New runs produce new feedback → trend shows improvement +6. 
VFM scores update automatically on next pipeline selection + +## Example: prompt iteration driven by feedback + +Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`: + +``` +| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low | +| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low | +| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low | +``` + +After 3 rows: feedback-collector.sh fires the recurring-pattern alert. +performance-scorer.sh shows avg 45/100, error rate 100%. +Action: update agent-writer's system prompt with explicit length and +depth requirements. Next 10 runs show trend "improving (+18.3)". + +## Integration + +Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`: + +```json +{ + "hooks": { + "PostToolUse": [{ + "matcher": "review_pipeline", + "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}] + }] + } +} +``` + +Run performance-scorer.sh on demand or as a scheduled report: + +```bash +./feedback/performance-scorer.sh +./feedback/performance-scorer.sh --agent agent-writer --threshold 70 +``` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add feedback loop and performance scoring templates" +``` + +--- + +## Step 21: Create pipeline optimization templates + +### Files to create + +**`scripts/templates/optimization/pipeline-optimizer.sh`** — Analyzes pipeline performance and generates recommendations: + +```bash +#!/bin/bash +# Pipeline optimizer: identify bottlenecks, excess loops, cost outliers. 
+# Bash 3.2 compatible. Uses python3 for all analysis. +# Does NOT auto-implement any changes — produces RECOMMENDATIONS.md only. +# +# Analysis covers: +# - Bottleneck agents (highest avg duration or cost per run) +# - Unnecessary revision loops (agents that loop 3+ times on average) +# - Underutilized agents (invoked < 10% of pipeline runs) +# - Cost outliers (single run cost >= 3x average) +# +# Output: RECOMMENDATIONS.md with VFM pre-scores for each recommendation. +# +# Usage: +# ./pipeline-optimizer.sh +# ./pipeline-optimizer.sh --pipeline {{PIPELINE_NAME}} +# +# Placeholders: +# {{WORKING_DIR}} - absolute path to project directory + +WORKING_DIR="{{WORKING_DIR}}" +FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md" +COST_LOG="$WORKING_DIR/budget/cost-events.jsonl" +RECOMMENDATIONS_FILE="$WORKING_DIR/RECOMMENDATIONS.md" +PIPELINE_FILTER="" + +# Parse arguments (bash 3.2 compatible) +while [ "$#" -gt 0 ]; do + case "$1" in + --pipeline) PIPELINE_FILTER="$2"; shift 2 ;; + *) shift ;; + esac +done + +python3 << PYEOF +import re, json, os, sys +from collections import defaultdict +from datetime import datetime + +feedback_file = "$FEEDBACK_FILE" +cost_log = "$COST_LOG" +recommendations_file = "$RECOMMENDATIONS_FILE" +pipeline_filter = "$PIPELINE_FILTER" + +# Parse FEEDBACK.md +feedback_rows = [] +if os.path.exists(feedback_file): + with open(feedback_file) as f: + in_table = False + for line in f: + line = line.strip() + if '| Date |' in line: + in_table = True + continue + if in_table and line.startswith('|---'): + continue + if in_table and line.startswith('|') and '{{' not in line: + cols = [c.strip() for c in line.strip('|').split('|')] + if len(cols) >= 7: + try: + score_m = re.match(r'(\d+)', cols[3]) + score = int(score_m.group(1)) if score_m else 0 + feedback_rows.append({ + 'date': cols[0], + 'pipeline': cols[1], + 'agent': cols[2], + 'score': score, + 'issue': cols[4], + 'pattern': cols[6] + }) + except (ValueError, IndexError): + pass + +# Filter by pipeline +if 
pipeline_filter:
+    feedback_rows = [r for r in feedback_rows if r['pipeline'] == pipeline_filter]
+
+# Parse cost events
+cost_events = []
+if os.path.exists(cost_log):
+    with open(cost_log) as f:
+        for line in f:
+            line = line.strip()
+            if line:
+                try:
+                    cost_events.append(json.loads(line))
+                except Exception:
+                    pass
+
+# Per-agent event counts (cost proxy)
+cost_by_agent = defaultdict(list)
+# Group events by agent+date — each agent+date pair counts as one run
+run_events = defaultdict(int)
+for e in cost_events:
+    agent = e.get('agent', 'unknown')
+    date = e.get('timestamp', '')[:10]
+    cost_by_agent[agent].append(1)
+    run_events[(agent, date)] += 1
+
+# Per-agent list of per-run event totals (used for cost-outlier detection)
+run_costs = defaultdict(list)
+for (agent, date), n in run_events.items():
+    run_costs[agent].append(n)
+
+# Build recommendations
+recommendations = []
+
+# 1. Bottleneck agents: top 2 by event count
+if cost_by_agent:
+    agent_totals = [(a, len(events)) for a, events in cost_by_agent.items()]
+    agent_totals.sort(key=lambda x: -x[1])
+    all_costs = [sum(v) for v in run_costs.values()]
+    avg_cost = sum(all_costs) / len(all_costs) if all_costs else 1
+    for agent, total in agent_totals[:2]:
+        if total > avg_cost * 1.5:
+            recommendations.append({
+                'type': 'bottleneck',
+                'agent': agent,
+                'description': f"Agent '{agent}' accounts for {total} events vs avg {avg_cost:.0f}. "
+                               f"Consider batching its tool calls or reducing its task scope.",
+                'vfm_prescore': 70
+            })
+
+# 2. Unnecessary revision loops: agents with loop-excess pattern >= 3 times
+pattern_by_agent = defaultdict(lambda: defaultdict(int))
+for r in feedback_rows:
+    if r['pattern']:
+        pattern_by_agent[r['agent']][r['pattern']] += 1
+
+for agent, patterns in pattern_by_agent.items():
+    if patterns.get('loop-excess', 0) >= 3:
+        count = patterns['loop-excess']
+        recommendations.append({
+            'type': 'loop-excess',
+            'agent': agent,
+            'description': f"Agent '{agent}' has {count} feedback rows tagged 'loop-excess'. 
" + f"Review pipeline revision criteria — tighten acceptance conditions " + f"or add a max-iterations guard (see self-healing.sh).", + 'vfm_prescore': 80 + }) + +# 3. Underutilized agents: invoked in < 10% of pipeline runs +if feedback_rows: + all_runs = set(r['date'] + ':' + r['pipeline'] for r in feedback_rows) + total_runs = len(all_runs) if all_runs else 1 + agent_runs = defaultdict(set) + for r in feedback_rows: + agent_runs[r['agent']].add(r['date'] + ':' + r['pipeline']) + for agent, runs in agent_runs.items(): + utilization = len(runs) / total_runs + if utilization < 0.1 and total_runs >= 10: + recommendations.append({ + 'type': 'underutilized', + 'agent': agent, + 'description': f"Agent '{agent}' appears in only {utilization*100:.0f}% of pipeline runs. " + f"Consider removing from the pipeline or combining with another agent.", + 'vfm_prescore': 60 + }) + +# 4. Cost outliers: single-run cost >= 3x average +if run_costs: + all_run_totals = [] + for agent, runs in run_costs.items(): + all_run_totals.extend(runs) + avg_run = sum(all_run_totals) / len(all_run_totals) if all_run_totals else 1 + for agent, runs in run_costs.items(): + for run_cost in runs: + if run_cost >= avg_run * 3: + recommendations.append({ + 'type': 'cost-outlier', + 'agent': agent, + 'description': f"Agent '{agent}' had a run costing {run_cost} events " + f"vs avg {avg_run:.1f} (3x+ threshold). " + f"Add per-run budget cap with budget-hook.sh.", + 'vfm_prescore': 75 + }) + break # one recommendation per agent + +# Write RECOMMENDATIONS.md +timestamp = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ') +pipeline_label = pipeline_filter if pipeline_filter else "all pipelines" + +lines = [ + f"# Pipeline Optimization Recommendations", + f"", + f"Generated: {timestamp}", + f"Scope: {pipeline_label}", + f"", + f"> These are recommendations only. 
No changes have been made.", + f"> Review each item and implement manually or with team approval.", + f"", +] + +if recommendations: + lines.append(f"## Recommendations ({len(recommendations)} found)") + lines.append("") + for i, rec in enumerate(recommendations, 1): + lines.append(f"### R{i}: {rec['type'].upper()} — {rec['agent']}") + lines.append("") + lines.append(rec['description']) + lines.append("") + lines.append(f"**VFM pre-score:** {rec['vfm_prescore']}/100") + lines.append("") +else: + lines.append("## No recommendations") + lines.append("") + lines.append("No bottlenecks, excess loops, underutilized agents, or cost outliers detected.") + lines.append("") + +lines.append("## Next steps") +lines.append("") +lines.append("1. Review each recommendation with the team") +lines.append("2. Prioritize by VFM pre-score (higher = more value per effort)") +lines.append("3. Implement approved changes one at a time") +lines.append("4. Run feedback-collector.sh for 10+ runs after each change") +lines.append("5. Re-run pipeline-optimizer.sh to confirm improvement") + +with open(recommendations_file, 'w') as f: + f.write('\n'.join(lines) + '\n') + +print(f"Recommendations written to {recommendations_file}") +print(f" Found: {len(recommendations)} recommendations") +for rec in recommendations: + print(f" - [{rec['type']}] {rec['agent']}: VFM pre-score {rec['vfm_prescore']}") +PYEOF +``` + +**`scripts/templates/optimization/self-healing.sh`** — Error recovery after agent/pipeline failures: + +```bash +#!/bin/bash +# Self-healing: categorize errors and apply recovery strategies. +# Bash 3.2 compatible. Uses python3 for JSON/log parsing. 
+#
+# Error categories and recovery strategies:
+#   timeout           → retry with shorter task scope
+#   permission-denied → log and skip (do not retry)
+#   tool-not-found    → log and alert, do not retry
+#   api-error         → exponential backoff, max 3 retries
+#   content-quality   → re-run with stricter prompt, max 2 retries
+#
+# Max total attempts: 5 (OpenClaw pattern — hard cap regardless of category).
+# All recovery events logged to healing-log.jsonl.
+#
+# Usage:
+#   ./self-healing.sh --error-type <type> --agent <name> --attempt <n> --context "<message>"
+#
+# Exit codes:
+#   0 — recovery action taken (caller should retry)
+#   1 — no recovery possible (caller should abort)
+#   2 — max attempts reached (caller should escalate)
+#
+# Placeholders:
+#   {{WORKING_DIR}} - absolute path to project directory
+
+WORKING_DIR="{{WORKING_DIR}}"
+HEALING_LOG="$WORKING_DIR/healing-log.jsonl"
+MAX_ATTEMPTS=5
+
+ERROR_TYPE=""
+AGENT_NAME=""
+ATTEMPT=1
+CONTEXT_MSG=""
+
+# Parse arguments (bash 3.2 compatible)
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --error-type) ERROR_TYPE="$2"; shift 2 ;;
+    --agent) AGENT_NAME="$2"; shift 2 ;;
+    --attempt) ATTEMPT="$2"; shift 2 ;;
+    --context) CONTEXT_MSG="$2"; shift 2 ;;
+    *) shift ;;
+  esac
+done
+
+if [ -z "$ERROR_TYPE" ]; then
+  echo "Usage: $0 --error-type <type> --agent <name> --attempt <n> --context <message>" >&2
+  exit 1
+fi
+
+# Hard cap: max 5 attempts total
+if [ "$ATTEMPT" -gt "$MAX_ATTEMPTS" ]; then
+  echo "MAX ATTEMPTS REACHED ($MAX_ATTEMPTS) for $AGENT_NAME. Escalating."
+ python3 -c " +import json, time, os +event = { + 'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), + 'agent': '$AGENT_NAME', + 'error_type': '$ERROR_TYPE', + 'attempt': $ATTEMPT, + 'action': 'escalate', + 'reason': 'max_attempts_reached', + 'context': '$CONTEXT_MSG' +} +with open('$HEALING_LOG', 'a') as f: + f.write(json.dumps(event) + '\n') +print(json.dumps(event)) +" + exit 2 +fi + +# Determine recovery action per category +RECOVERY_ACTION="" +RECOVERY_DETAIL="" +EXIT_CODE=0 + +case "$ERROR_TYPE" in + timeout) + RECOVERY_ACTION="retry_shorter" + RECOVERY_DETAIL="Re-run with reduced task scope. Split task if attempt >= 3." + if [ "$ATTEMPT" -ge 3 ]; then + RECOVERY_DETAIL="Attempt $ATTEMPT: recommend splitting task before retry." + fi + EXIT_CODE=0 + ;; + permission-denied) + RECOVERY_ACTION="skip" + RECOVERY_DETAIL="Permission errors cannot be auto-resolved. Log and skip. Notify operator." + EXIT_CODE=1 + ;; + tool-not-found) + RECOVERY_ACTION="alert" + RECOVERY_DETAIL="Tool not found — check agent config and hook registrations. Do not retry." + EXIT_CODE=1 + ;; + api-error) + # Exponential backoff: 2^(attempt-1) seconds, max 3 retries + if [ "$ATTEMPT" -le 3 ]; then + BACKOFF_SECS=$(python3 -c "print(min(2 ** ($ATTEMPT - 1), 16))") + RECOVERY_ACTION="retry_backoff" + RECOVERY_DETAIL="API error — wait ${BACKOFF_SECS}s then retry (attempt $ATTEMPT/3)." + sleep "$BACKOFF_SECS" + EXIT_CODE=0 + else + RECOVERY_ACTION="abort" + RECOVERY_DETAIL="API error persists after 3 retries. Aborting." + EXIT_CODE=1 + fi + ;; + content-quality) + # Max 2 retries for quality issues + if [ "$ATTEMPT" -le 2 ]; then + RECOVERY_ACTION="retry_strict" + RECOVERY_DETAIL="Re-run with stricter prompt. Add explicit quality criteria (attempt $ATTEMPT/2)." + EXIT_CODE=0 + else + RECOVERY_ACTION="escalate_quality" + RECOVERY_DETAIL="Content quality below threshold after 2 retries. Escalate to human review." 
+      EXIT_CODE=2
+    fi
+    ;;
+  *)
+    RECOVERY_ACTION="unknown"
+    RECOVERY_DETAIL="Unknown error type '$ERROR_TYPE'. Logging and aborting."
+    EXIT_CODE=1
+    ;;
+esac
+
+# Log recovery event. Values are passed as argv (not interpolated into the
+# Python source) so quotes in the detail or context strings cannot break the JSON.
+python3 - "$HEALING_LOG" "$AGENT_NAME" "$ERROR_TYPE" "$ATTEMPT" "$RECOVERY_ACTION" "$RECOVERY_DETAIL" "$CONTEXT_MSG" << 'PYEOF'
+import json, sys, time
+log_path, agent, error_type, attempt, action, detail, context = sys.argv[1:8]
+event = {
+    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
+    'agent': agent,
+    'error_type': error_type,
+    'attempt': int(attempt),
+    'action': action,
+    'detail': detail,
+    'context': context
+}
+with open(log_path, 'a') as f:
+    f.write(json.dumps(event) + '\n')
+print(json.dumps(event, indent=2))
+PYEOF
+
+echo "Recovery: $RECOVERY_ACTION — $RECOVERY_DETAIL"
+exit $EXIT_CODE
+```
+
+**`scripts/templates/optimization/README.md`** — Explains optimization and self-healing:
+
+```markdown
+# Pipeline Optimization and Self-Healing
+
+Two tools for making agent pipelines more efficient and resilient over time.
+
+## pipeline-optimizer.sh
+
+Analyzes FEEDBACK.md and cost-events.jsonl to identify:
+
+| Issue | Detection | Recommendation |
+|-------|-----------|----------------|
+| Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope |
+| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
+| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
+| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
+
+Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
+recommendation. Higher VFM pre-scores mean more value per implementation effort.
+
+**This script does not auto-implement anything.** All changes require
+manual review and explicit approval. This is intentional — pipeline
+restructuring is a high-stakes operation.
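
The detection thresholds in the table above are simple enough to sketch in a few lines. This illustrative snippet is not part of the optimizer — the feedback rows are invented — but it shows the underutilized-agent rule: an agent is flagged only when it appears in under 10% of runs and at least 10 runs exist:

```python
from collections import defaultdict

# Invented feedback rows: (date, pipeline, agent) — 11 runs of one pipeline
rows = [(f"2026-04-{d:02d}", "doc-pipeline", "agent-writer") for d in range(1, 12)]
rows.append(("2026-04-01", "doc-pipeline", "agent-translator"))  # rarely-used agent

# A "run" is a unique date+pipeline pair, as in pipeline-optimizer.sh
all_runs = {f"{date}:{pipeline}" for date, pipeline, _agent in rows}
agent_runs = defaultdict(set)
for date, pipeline, agent in rows:
    agent_runs[agent].add(f"{date}:{pipeline}")

flagged = [
    agent for agent, runs in agent_runs.items()
    if len(runs) / len(all_runs) < 0.1 and len(all_runs) >= 10
]
print(flagged)  # → ['agent-translator']
```

The `>= 10` guard matters: with fewer runs, a low utilization ratio is noise rather than signal.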
+ +## self-healing.sh + +Categorizes errors and applies targeted recovery strategies: + +| Error Type | Recovery | Max Retries | +|------------|----------|-------------| +| `timeout` | Retry with shorter scope | 5 (hard cap) | +| `permission-denied` | Log and skip | 0 (no retry) | +| `tool-not-found` | Alert operator | 0 (no retry) | +| `api-error` | Exponential backoff (2^n seconds) | 3 | +| `content-quality` | Retry with stricter prompt | 2 | + +**Hard cap: 5 total attempts regardless of category.** This follows the +OpenClaw pattern — unbounded retry loops are the most common cause of +runaway agent costs. The cap is non-negotiable. + +After the hard cap is reached, the script exits with code 2 (escalate). +The caller is responsible for deciding whether to pause, alert a human, +or abort the pipeline run. + +## Connection to feedback and VFM + +``` +feedback-collector.sh → FEEDBACK.md → performance-scorer.sh → flagged agents + ↓ + pipeline-optimizer.sh → RECOMMENDATIONS.md + ↓ + (manual review + approval) + ↓ + prompt/pipeline update + ↓ + new runs → new feedback +``` + +VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as +`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are +pre-scores, not final scores — the VFM evaluation still needs to run +when the task is scheduled. The pre-scores help prioritize which +recommendations to tackle first. 
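
The recovery rules in the self-healing table above amount to a small decision function. This is a hypothetical sketch mirroring the `case` statement in self-healing.sh, not the script itself — the function name and tuple return shape are invented for illustration:

```python
def recovery_action(error_type: str, attempt: int) -> tuple:
    """Return (action, wait_seconds) for one failure, mirroring self-healing.sh."""
    if attempt > 5:                                  # hard cap, regardless of category
        return ("escalate", 0)
    if error_type == "timeout":
        return ("retry_shorter", 0)
    if error_type in ("permission-denied", "tool-not-found"):
        return ("abort", 0)                          # never retried
    if error_type == "api-error":                    # backoff: 1s, 2s, 4s, then abort
        return ("retry_backoff", 2 ** (attempt - 1)) if attempt <= 3 else ("abort", 0)
    if error_type == "content-quality":              # 2 retries, then human review
        return ("retry_strict", 0) if attempt <= 2 else ("escalate", 0)
    return ("abort", 0)                              # unknown category

print([recovery_action("api-error", a) for a in (1, 2, 3, 4)])
# → [('retry_backoff', 1), ('retry_backoff', 2), ('retry_backoff', 4), ('abort', 0)]
```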
+ +## Safety limits + +- `pipeline-optimizer.sh`: read-only analysis — never modifies pipeline files +- `self-healing.sh`: max 5 attempts hard cap, permission errors never retried +- All events logged to `healing-log.jsonl` for audit trail +- No auto-escalation to external systems — exit codes only + +## Usage + +```bash +# Run optimizer for all pipelines +./optimization/pipeline-optimizer.sh + +# Run optimizer for a specific pipeline +./optimization/pipeline-optimizer.sh --pipeline doc-pipeline + +# Handle an error in a pipeline step +./optimization/self-healing.sh \ + --error-type api-error \ + --agent agent-writer \ + --attempt 1 \ + --context "OpenAI timeout on summarize call" + +# Check healing log +cat healing-log.jsonl | python3 -m json.tool +``` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh && echo "VALID" +``` +Expected: `VALID` + +### On failure: retry — fix bash syntax, then revert + +### Checkpoint +```bash +git commit -m "feat(templates): add pipeline optimization and self-healing templates" +``` + +--- + +## Exit Condition + +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/feedback/ | wc -l` → 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/optimization/ | wc -l` → 3 +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh` → no errors +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → no errors +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh` → no errors +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → no errors +- [ ] FEEDBACK.md contains `| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |` header row +- [ ] performance-scorer.sh computes improvement trend (last 
10 vs. prev 10) +- [ ] pipeline-optimizer.sh writes to RECOMMENDATIONS.md and does NOT modify any pipeline files +- [ ] self-healing.sh exits 2 when attempt > 5 (hard cap enforced) +- [ ] healing-log.jsonl referenced in self-healing.sh +- [ ] All bash scripts are 3.2 compatible (no associative arrays, no mapfile, no `|&`) + +## Quality Criteria + +- Feedback table columns match the 7-column spec (Date, Pipeline, Agent, Score, Issue, Resolution, Pattern) +- Pattern detection fires at exactly 3 occurrences (not 2, not 4) +- Performance-scorer.sh improvement trend correctly computes last 10 vs. previous 10 scores +- pipeline-optimizer.sh detects all 4 issue types: bottleneck, loop-excess, underutilized, cost-outlier +- VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as VFM-SCORING.md (Step 11) +- self-healing.sh hard cap is exactly 5 (OpenClaw pattern) — not configurable +- permission-denied and tool-not-found errors are never retried (exit 1 immediately) +- api-error uses exponential backoff: 1s, 2s, 4s (2^0, 2^1, 2^2) before aborting at attempt 4 +- content-quality escalates to human review (exit 2) after 2 retries, not abort +- All scripts use `#!/bin/bash` shebang and are bash 3.2 compatible diff --git a/.claude/plans/blueprints/session-6-integration.md b/.claude/plans/blueprints/session-6-integration.md new file mode 100644 index 0000000..a511873 --- /dev/null +++ b/.claude/plans/blueprints/session-6-integration.md @@ -0,0 +1,1668 @@ +# Session 6: Integration — Docker, Transfer, Additional Templates + +> Steps 22, 23, 24 | Wave 1 | Depends on: none + +## Dependencies + +Entry condition: none (creates new directories only, no overlap with other sessions) + +## Scope Fence + +**Touch:** +- `scripts/templates/docker/Dockerfile` (new) +- `scripts/templates/docker/docker-compose.yml` (new) +- `scripts/templates/docker/docker-entrypoint.sh` (new) +- `scripts/templates/docker/README.md` (new) +- `scripts/templates/transfer/export-system.sh` (new) +- 
`scripts/templates/transfer/import-system.sh` (new) +- `scripts/templates/transfer/MANIFEST.md` (new) +- `scripts/templates/transfer/README.md` (new) +- `scripts/templates/domains/customer-support.md` (new) +- `scripts/templates/domains/devops-automation.md` (new) +- `scripts/templates/domains/legal-review.md` (new) +- `scripts/templates/domains/sales-intelligence.md` (new) +- `scripts/templates/domains/security-audit.md` (new) +- `scripts/templates/domains/README.md` (update only — add 5 new entries) + +**Never touch:** +- `commands/` +- `agents/` +- `skills/` +- `scripts/templates/domains/content-pipeline.md` +- `scripts/templates/domains/code-review.md` +- `scripts/templates/domains/monitoring.md` +- `scripts/templates/domains/research-synthesis.md` +- `scripts/templates/domains/data-processing.md` + +--- + +## Step 22: Create Docker deployment templates + +### Files to create + +**`scripts/templates/docker/Dockerfile`**: + +```dockerfile +# Agent Factory — Docker deployment template +# Runs a Claude Code agent system in an isolated container. +# Replace {{PROJECT_NAME}} and {{ANTHROPIC_API_KEY}} with real values +# (or pass them at runtime via .env / docker compose env_file). + +FROM node:22-slim + +# Install Claude Code globally +RUN npm install -g @anthropic-ai/claude-code + +# Create a non-root agent user for security +RUN useradd -m -s /bin/bash agent + +WORKDIR /home/agent/project + +# Copy project files (adjust to your project structure) +COPY --chown=agent:agent . . 
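+# Suggestion (not part of the original template): pair this COPY with a
+# .dockerignore excluding .env, .git/, logs/ and memory/ so secrets and
+# machine-local state are never copied into the image by the line above.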
+
+# Set up entrypoint script
+COPY --chown=agent:agent docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
+RUN chmod +x /usr/local/bin/docker-entrypoint.sh
+
+USER agent
+
+# API key is injected at runtime via env_file / --env-file — never bake it
+# into the image with an ENV instruction, or it will persist in image layers.
+
+# Health check — entrypoint writes /tmp/agent-health on each beat
+HEALTHCHECK --interval=60s --timeout=10s --start-period=30s --retries=3 \
+  CMD test -f /tmp/agent-health && \
+      test $(( $(date +%s) - $(date +%s -r /tmp/agent-health 2>/dev/null || echo 0) )) -lt 300
+
+ENTRYPOINT ["docker-entrypoint.sh"]
+```
+
+**`scripts/templates/docker/docker-compose.yml`**:
+
+```yaml
+# Agent Factory — docker-compose deployment template
+# Usage: docker compose up -d
+# Requires: .env file with ANTHROPIC_API_KEY=...
+
+version: "3.8"
+
+services:
+  agent:
+    container_name: {{PROJECT_NAME}}-agent
+    build: .
+    restart: unless-stopped
+    env_file:
+      - .env
+    volumes:
+      # Persistent data directories — survive container restarts
+      - ./data:/home/agent/project/data
+      - ./memory:/home/agent/project/memory
+      - ./budget:/home/agent/project/budget
+      - ./logs:/home/agent/project/logs
+    security_opt:
+      - no-new-privileges:true
+    read_only: false
+    # No Docker socket mount — agent cannot control the Docker daemon
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+```
+
+**`scripts/templates/docker/docker-entrypoint.sh`**:
+
+```bash
+#!/bin/bash
+# Agent Factory — Docker container entrypoint
+# Validates environment, then runs the heartbeat runner in a loop.
+# Bash 3.2 compatible.
+ +set -e + +HEALTH_FILE="/tmp/agent-health" +LOG_DIR="/home/agent/project/logs" +HEARTBEAT_SCRIPT="/home/agent/project/automation/heartbeat-runner.sh" + +# Graceful shutdown handler +shutdown_handler() { + echo "[entrypoint] SIGTERM received — shutting down gracefully" + if [ -n "$RUNNER_PID" ]; then + kill "$RUNNER_PID" 2>/dev/null || true + wait "$RUNNER_PID" 2>/dev/null || true + fi + rm -f "$HEALTH_FILE" + echo "[entrypoint] Shutdown complete" + exit 0 +} + +trap shutdown_handler TERM INT + +# Validate required environment variables +if [ -z "$ANTHROPIC_API_KEY" ]; then + echo "[entrypoint] ERROR: ANTHROPIC_API_KEY is not set" >&2 + exit 1 +fi + +# Create required directories +mkdir -p "$LOG_DIR" +mkdir -p "/home/agent/project/data" +mkdir -p "/home/agent/project/memory" +mkdir -p "/home/agent/project/budget" +mkdir -p "/home/agent/project/pipeline-output" + +echo "[entrypoint] Starting agent container at $(date)" +echo "[entrypoint] Project: $(basename /home/agent/project)" + +# Verify Claude Code is available +if ! command -v claude >/dev/null 2>&1; then + echo "[entrypoint] ERROR: claude command not found" >&2 + exit 1 +fi + +# Run heartbeat runner in loop +while true; do + # Write health check timestamp + date > "$HEALTH_FILE" + + if [ -f "$HEARTBEAT_SCRIPT" ]; then + echo "[entrypoint] Running heartbeat at $(date)" + bash "$HEARTBEAT_SCRIPT" >> "$LOG_DIR/agent.log" 2>&1 & + RUNNER_PID=$! + wait $RUNNER_PID + RUNNER_PID="" + else + echo "[entrypoint] WARNING: $HEARTBEAT_SCRIPT not found — sleeping" + fi + + # Sleep between beats (default 3600s = 1 hour) + BEAT_INTERVAL="${AGENT_BEAT_INTERVAL:-3600}" + echo "[entrypoint] Sleeping ${BEAT_INTERVAL}s until next beat" + sleep "$BEAT_INTERVAL" & + RUNNER_PID=$! + wait $RUNNER_PID + RUNNER_PID="" +done +``` + +**`scripts/templates/docker/README.md`**: + +```markdown +# Docker Deployment + +Run your agent system in an isolated Docker container. 
+ +## Prerequisites + +- Docker and Docker Compose installed +- `.env` file with your API key (see below) +- Agent system built with `/agent-factory:build` + +## Setup + +1. Copy these files to your project root: + - `Dockerfile` + - `docker-compose.yml` + - `docker-entrypoint.sh` + +2. Create `.env` in your project root: + ``` + ANTHROPIC_API_KEY=sk-ant-... + AGENT_BEAT_INTERVAL=3600 + ``` + Add `.env` to `.gitignore` — never commit API keys. + +3. Replace `{{PROJECT_NAME}}` in `docker-compose.yml` with your project name. + +## Build and run + +```bash +# Build the image +docker compose build + +# Start in background +docker compose up -d + +# View logs +docker compose logs -f + +# Stop +docker compose down +``` + +## Volume mounts + +| Host path | Container path | Purpose | +|-----------|---------------|---------| +| `./data` | `/home/agent/project/data` | Run state, outputs | +| `./memory` | `/home/agent/project/memory` | Long-term memory files | +| `./budget` | `/home/agent/project/budget` | Budget tracking | +| `./logs` | `/home/agent/project/logs` | Agent activity logs | + +These directories are created automatically on first run. + +## Environment variables + +| Variable | Required | Default | Description | +|----------|----------|---------|-------------| +| `ANTHROPIC_API_KEY` | Yes | — | Your Anthropic API key | +| `AGENT_BEAT_INTERVAL` | No | `3600` | Seconds between heartbeat runs | + +## Security + +- **Never bake the API key into the image.** Always pass it via `.env` or `--env-file`. +- **Never mount the Docker socket** (`/var/run/docker.sock`) — the agent does not need Docker control. +- The container runs as a non-root `agent` user. +- `no-new-privileges:true` prevents privilege escalation. +- `restart: unless-stopped` ensures the agent recovers from crashes automatically. + +## Health check + +The entrypoint writes a timestamp to `/tmp/agent-health` on each beat. +Docker's `HEALTHCHECK` verifies this file is updated within 5 minutes. 
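
The freshness test amounts to "the file exists and its mtime is under 300 seconds old". In Python terms — a sketch of the same logic for clarity, not what the container runs (the container uses the shell test in the Dockerfile):

```python
import os
import time

def agent_healthy(path="/tmp/agent-health", max_age_seconds=300):
    """True when the heartbeat file exists and was touched recently."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:   # file missing: entrypoint has not written a beat yet
        return False
    return age < max_age_seconds
```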
+ +Check health status: +```bash +docker inspect --format='{{.State.Health.Status}}' {{PROJECT_NAME}}-agent +``` +``` + +### Verify + +```bash +test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/Dockerfile && \ +test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-compose.yml && \ +test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-entrypoint.sh && \ +test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/README.md && \ +echo OK +``` +Expected: `OK` + +### On failure +revert + +### Checkpoint +```bash +git commit -m "feat(templates): add Docker deployment templates" +``` + +--- + +## Step 23: Create import/export system + +### Files to create + +**`scripts/templates/transfer/export-system.sh`**: + +```bash +#!/bin/bash +# Agent Factory — export-system.sh +# Packages an agent system into a portable tarball with checksums. +# Usage: bash export-system.sh +# Bash 3.2 compatible. + +set -e + +EXPORT_NAME="${1:-agent-system}" +DATE=$(python3 -c "import datetime; print(datetime.date.today().strftime('%Y-%m-%d'))") +OUTPUT_FILE="agent-system-${EXPORT_NAME}-${DATE}.tar.gz" +STAGING_DIR="/tmp/agent-export-$$" +MANIFEST_FILE="${STAGING_DIR}/MANIFEST.md" + +# Determine project root (directory containing .claude/) +PROJECT_ROOT="$(pwd)" +if [ ! -d "${PROJECT_ROOT}/.claude" ]; then + echo "ERROR: No .claude/ directory found in $(pwd)" >&2 + echo "Run this script from your project root." >&2 + exit 1 +fi + +echo "[export] Starting export: ${EXPORT_NAME}" +echo "[export] Date: ${DATE}" +echo "[export] Output: ${OUTPUT_FILE}" + +mkdir -p "${STAGING_DIR}" + +# Collect files — explicit inclusions only +collect_files() { + local dir="$1" + local dest="$2" + if [ -d "${PROJECT_ROOT}/${dir}" ]; then + mkdir -p "${STAGING_DIR}/${dest}" + cp -r "${PROJECT_ROOT}/${dir}/." 
"${STAGING_DIR}/${dest}/" 2>/dev/null || true
+    echo "[export] Collected: ${dir}"
+  fi
+}
+
+collect_files ".claude/agents" ".claude/agents"
+collect_files ".claude/skills" ".claude/skills"
+collect_files ".claude/hooks" ".claude/hooks"
+collect_files "hooks" "hooks"
+collect_files "automation" "automation"
+collect_files "scripts" "scripts"
+
+# Copy settings.json if it exists
+if [ -f "${PROJECT_ROOT}/.claude/settings.json" ]; then
+  mkdir -p "${STAGING_DIR}/.claude"
+  cp "${PROJECT_ROOT}/.claude/settings.json" "${STAGING_DIR}/.claude/settings.json"
+  echo "[export] Collected: .claude/settings.json"
+fi
+
+# Copy CLAUDE.md if it exists
+if [ -f "${PROJECT_ROOT}/CLAUDE.md" ]; then
+  cp "${PROJECT_ROOT}/CLAUDE.md" "${STAGING_DIR}/CLAUDE.md"
+  echo "[export] Collected: CLAUDE.md"
+fi
+
+# Remove excluded files from staging
+echo "[export] Removing excluded files"
+rm -f "${STAGING_DIR}/.env" 2>/dev/null || true
+find "${STAGING_DIR}" -name "*.local.*" -delete 2>/dev/null || true
+find "${STAGING_DIR}" -name "audit.log" -delete 2>/dev/null || true
+find "${STAGING_DIR}" -name "cost-events.jsonl" -delete 2>/dev/null || true
+find "${STAGING_DIR}" -type d -name ".git" -exec rm -rf {} + 2>/dev/null || true
+rm -rf "${STAGING_DIR}/memory" 2>/dev/null || true
+
+# Generate MANIFEST.md with checksums
+echo "[export] Generating manifest"
+
+find "${STAGING_DIR}" -type f | sort | while read -r fpath; do
+  rel="${fpath#${STAGING_DIR}/}"
+  checksum=$(python3 -c "import hashlib, sys; print(hashlib.sha256(open(sys.argv[1], 'rb').read()).hexdigest()[:16])" "$fpath")
+  filetype="other"
+  case "$rel" in
+    .claude/agents/*) filetype="agent" ;;
+    .claude/skills/*) filetype="skill" ;;
+    .claude/hooks/*|hooks/*) filetype="hook" ;;
+    .claude/settings.json) filetype="settings" ;;
+    automation/*) filetype="automation" ;;
+    CLAUDE.md) filetype="context" ;;
+  esac
+  echo "| ${filetype} | ${rel} | ${checksum} |" >> "${MANIFEST_FILE}.rows"
+done
+
+cat > 
"${MANIFEST_FILE}" << MANIFEST +# Agent System Manifest +Export name: ${EXPORT_NAME} +Export date: ${DATE} +Generated by: Agent Factory export-system.sh + +## Components + +| Type | File | Checksum (SHA256, first 16) | +|------|------|---------------------------| +MANIFEST + +if [ -f "${MANIFEST_FILE}.rows" ]; then + cat "${MANIFEST_FILE}.rows" >> "${MANIFEST_FILE}" + rm "${MANIFEST_FILE}.rows" +fi + +cat >> "${MANIFEST_FILE}" << MANIFEST + +## Requirements + +| Requirement | Value | +|-------------|-------| +| Claude Code version | 1.x or later | +| MCP servers | List any required MCP servers here | +| Tools | List any required tools (Bash, WebSearch, etc.) | +| Environment | List required env vars (ANTHROPIC_API_KEY, etc.) | + +## Notes + +Add any project-specific setup notes here. +MANIFEST + +echo "[export] Manifest written" + +# Create tarball +cd "${STAGING_DIR}" +tar -czf "${PROJECT_ROOT}/${OUTPUT_FILE}" . +cd "${PROJECT_ROOT}" + +# Cleanup +rm -rf "${STAGING_DIR}" + +echo "[export] Done: ${OUTPUT_FILE}" +echo "[export] Size: $(du -sh "${OUTPUT_FILE}" | cut -f1)" +``` + +**`scripts/templates/transfer/import-system.sh`**: + +```bash +#!/bin/bash +# Agent Factory — import-system.sh +# Imports an exported agent system tarball into the current project. +# Usage: bash import-system.sh [--force] +# Bash 3.2 compatible. + +set -e + +TARBALL="$1" +FORCE="${2:-}" +STAGING_DIR="/tmp/agent-import-$$" + +if [ -z "$TARBALL" ]; then + echo "Usage: bash import-system.sh [--force]" >&2 + exit 1 +fi + +if [ ! -f "$TARBALL" ]; then + echo "ERROR: File not found: $TARBALL" >&2 + exit 1 +fi + +echo "[import] Importing: $TARBALL" + +mkdir -p "$STAGING_DIR" +tar -xzf "$TARBALL" -C "$STAGING_DIR" + +# Read MANIFEST.md +MANIFEST="${STAGING_DIR}/MANIFEST.md" +if [ ! 
-f "$MANIFEST" ]; then
+  echo "ERROR: No MANIFEST.md found in tarball — may not be a valid Agent Factory export" >&2
+  rm -rf "$STAGING_DIR"
+  exit 1
+fi
+
+echo "[import] Manifest found:"
+head -5 "$MANIFEST"
+echo ""
+
+# Check for conflicts (existing files in destination).
+# The flag is a file, not a shell variable: the `find | while` pipeline runs
+# in a subshell, so variable assignments made inside it would be lost.
+CONFLICT_FLAG="${STAGING_DIR}.conflict"
+rm -f "$CONFLICT_FLAG"
+find "$STAGING_DIR" -type f | sort | while read -r fpath; do
+  rel="${fpath#${STAGING_DIR}/}"
+  dest="${PWD}/${rel}"
+  if [ -f "$dest" ] && [ "$FORCE" != "--force" ]; then
+    echo "[import] CONFLICT: $rel already exists"
+    touch "$CONFLICT_FLAG"
+  fi
+done
+
+if [ -f "$CONFLICT_FLAG" ] && [ "$FORCE" != "--force" ]; then
+  echo ""
+  echo "Conflicts detected. Re-run with --force to overwrite, or remove conflicting files first." >&2
+  rm -f "$CONFLICT_FLAG"
+  rm -rf "$STAGING_DIR"
+  exit 1
+fi
+rm -f "$CONFLICT_FLAG"
+
+# Extract files to project
+echo "[import] Extracting files"
+find "$STAGING_DIR" -type f | sort | while read -r fpath; do
+  rel="${fpath#${STAGING_DIR}/}"
+  dest="${PWD}/${rel}"
+  destdir="$(dirname "$dest")"
+  mkdir -p "$destdir"
+  cp "$fpath" "$dest"
+  echo "[import] Wrote: $rel"
+done
+
+# Replace {{PLACEHOLDER}} variables
+echo "[import] Replacing placeholders"
+PROJECT_DIR="$(pwd)"
+PROJECT_NAME="$(basename "$PROJECT_DIR")"
+
+find "$PROJECT_DIR/.claude" "$PROJECT_DIR/automation" "$PROJECT_DIR/hooks" \
+  "$PROJECT_DIR/CLAUDE.md" -type f 2>/dev/null | while read -r fpath; do
+  case "$fpath" in
+    *.sh|*.md|*.json|*.yml|*.yaml)
+      python3 - "$fpath" "$PROJECT_DIR" "$PROJECT_NAME" << 'PYEOF'
+import sys
+path, project_dir, project_name = sys.argv[1], sys.argv[2], sys.argv[3]
+try:
+    with open(path, 'r', encoding='utf-8') as f:
+        content = f.read()
+    content = content.replace('{{PROJECT_DIR}}', project_dir)
+    content = content.replace('{{PROJECT_NAME}}', project_name)
+    with open(path, 'w', encoding='utf-8') as f:
+        f.write(content)
+except Exception:
+    pass
+PYEOF
+      ;;
+  esac
+done
+
+# Make .sh files executable
+echo "[import] Setting executable permissions"
+find "$PROJECT_DIR/.claude/hooks" "$PROJECT_DIR/hooks" 
"$PROJECT_DIR/automation" \ + -name "*.sh" -type f 2>/dev/null | while read -r fpath; do + chmod +x "$fpath" + echo "[import] chmod +x: $fpath" +done + +# Validate frontmatter in agent files +echo "[import] Validating agent frontmatter" +INVALID_AGENTS="" +if [ -d "$PROJECT_DIR/.claude/agents" ]; then + for agent in "$PROJECT_DIR/.claude/agents"/*.md; do + [ -f "$agent" ] || continue + python3 - "$agent" << 'PYEOF' +import sys, re +path = sys.argv[1] +try: + with open(path, 'r') as f: + content = f.read() + if not content.startswith('---'): + print(f"WARN: {path} missing YAML frontmatter") + sys.exit(0) + parts = content.split('---', 2) + if len(parts) < 3: + print(f"WARN: {path} malformed frontmatter") +except Exception as e: + print(f"WARN: {path} could not be validated: {e}") +PYEOF + done +fi + +# Validate bash syntax on hook scripts +echo "[import] Validating hook syntax" +if [ -d "$PROJECT_DIR/.claude/hooks" ]; then + for script in "$PROJECT_DIR/.claude/hooks"/*.sh; do + [ -f "$script" ] || continue + if bash -n "$script" 2>/dev/null; then + echo "[import] OK: $script" + else + echo "[import] WARN: $script failed bash -n check" + fi + done +fi + +rm -rf "$STAGING_DIR" + +echo "" +echo "[import] Import complete." +echo "[import] Next steps:" +echo " 1. Review CLAUDE.md and update project-specific sections" +echo " 2. Add your ANTHROPIC_API_KEY to .env" +echo " 3. 
Run /agent-factory:status to verify the system" +``` + +**`scripts/templates/transfer/MANIFEST.md`**: + +```markdown +# Agent System Manifest +Export name: {{EXPORT_NAME}} +Export date: {{EXPORT_DATE}} +Generated by: Agent Factory export-system.sh + +## Components + +| Type | File | Checksum (SHA256, first 16) | +|------|------|---------------------------| +| agent | .claude/agents/pipeline-runner.md | a1b2c3d4e5f6a7b8 | +| skill | .claude/skills/my-pipeline/SKILL.md | b2c3d4e5f6a7b8c9 | +| hook | .claude/hooks/pre-tool-use.sh | c3d4e5f6a7b8c9d0 | +| settings | .claude/settings.json | d4e5f6a7b8c9d0e1 | +| automation | automation/run-pipeline.sh | e5f6a7b8c9d0e1f2 | +| context | CLAUDE.md | f6a7b8c9d0e1f2a3 | + +## Requirements + +| Requirement | Value | +|-------------|-------| +| Claude Code version | 1.x or later | +| MCP servers | (list any required MCP servers, e.g. GitHub, Slack) | +| Tools | Bash, Read, Write, Glob, Grep | +| Environment | ANTHROPIC_API_KEY | + +## Notes + +Replace all `{{PLACEHOLDER}}` variables after import. +See transfer/README.md for import instructions. +``` + +**`scripts/templates/transfer/README.md`**: + +```markdown +# Import / Export System + +Portable packaging for agent systems. Export a working agent system from one +project and import it into another, with automatic placeholder substitution. 
+ +## Export + +Pack your current agent system into a tarball: + +```bash +bash scripts/templates/transfer/export-system.sh my-project-name +# Creates: agent-system-my-project-name-2026-04-11.tar.gz +``` + +### What is included + +| Included | Excluded | +|---------|---------| +| `.claude/agents/` | `.env` (secrets) | +| `.claude/skills/` | `*.local.*` files | +| `.claude/hooks/` | `audit.log` | +| `.claude/settings.json` | `cost-events.jsonl` | +| `hooks/` | `memory/` (machine-specific state) | +| `automation/` | `.git/` | +| `scripts/` | | +| `CLAUDE.md` | | + +## Import + +Extract a tarball into a new project directory: + +```bash +# Dry run first: inspect without extracting +tar -tzf agent-system-my-project-name-2026-04-11.tar.gz + +# Import (will stop if files conflict) +bash scripts/templates/transfer/import-system.sh agent-system-my-project-name-2026-04-11.tar.gz + +# Import and overwrite existing files +bash scripts/templates/transfer/import-system.sh agent-system-my-project-name-2026-04-11.tar.gz --force +``` + +The import script will: +1. Read `MANIFEST.md` and verify the archive +2. Check for conflicts with existing files (stops unless `--force`) +3. Extract all files +4. Replace `{{PROJECT_DIR}}` and `{{PROJECT_NAME}}` placeholders +5. Make `.sh` files executable +6. Validate agent frontmatter and hook syntax + +## Customization after import + +1. **Update `CLAUDE.md`** — replace project-specific context, goals, and constraints +2. **Add secrets** — create `.env` with `ANTHROPIC_API_KEY=sk-ant-...` +3. **Review hooks** — check that paths in hook scripts match your environment +4. **Run `/agent-factory:status`** — verify all components load correctly +5. **Run `/agent-factory:evaluate`** — check capability coverage + +## Manifest + +Each export includes `MANIFEST.md` listing every file with a SHA256 checksum +(first 16 hex characters). Use this to verify file integrity after transfer. 
+ +Template manifest: `scripts/templates/transfer/MANIFEST.md` +``` + +### Verify + +```bash +bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/export-system.sh && \ +bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/import-system.sh && \ +echo OK +``` +Expected: `OK` + +### On failure +retry — fix syntax errors in the failing script, then re-verify + +### Checkpoint +```bash +git commit -m "feat(templates): add import/export system for agent systems" +``` + +--- + +## Step 24: Create 5 additional domain templates (total 10) + +### Files to create + +**`scripts/templates/domains/customer-support.md`**: + +```markdown +# Domain Template: Customer Support + + + + + +## Agent Definitions + +### ticket-classifier + +--- +name: ticket-classifier +description: | + Use this agent to classify incoming support tickets by type, priority, and sentiment. + + + Context: New support ticket needs routing + user: "Classify this support ticket" + assistant: "I'll use the ticket-classifier to determine type and priority." + Ticket triage step in customer support pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You classify customer support tickets for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the ticket content from $ARGUMENTS or from `pipeline-input/` directory +2. Read CLAUDE.md for product context and classification taxonomy +3. Read memory/MEMORY.md for patterns from prior tickets +4. Classify along 3 axes: + - Type: billing, technical, feature-request, complaint, general + - Priority: critical (SLA breach risk), high, normal, low + - Sentiment: angry, frustrated, neutral, satisfied +5. Extract: customer name (if present), product area, key complaint phrase +6. 
Write classification to `pipeline-output/classified-$(date +%Y-%m-%d-%H%M).md` + +## Rules + +- Never guess at account details — extract only what is written +- If type is ambiguous, choose the broader category +- Mark as critical if: mentions legal action, data loss, or account termination threat +- Always output structured JSON in addition to the markdown report + +## Output format + +```json +{ + "type": "technical", + "priority": "high", + "sentiment": "frustrated", + "product_area": "{{DOMAIN}}", + "key_phrase": "cannot log in since yesterday", + "requires_escalation": false +} +``` + +### response-drafter + +--- +name: response-drafter +description: | + Use this agent to draft a customer support response from a classified ticket. + + + Context: Ticket has been classified and needs a response + user: "Draft a response for this ticket" + assistant: "I'll use the response-drafter to write a support reply." + Response drafting stage of customer support pipeline triggers this agent. + +model: opus +tools: ["Read", "Write", "Glob"] +--- + +You draft customer support responses for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the classified ticket and its JSON classification +2. Read CLAUDE.md for tone guidelines, response templates, and SLA commitments +3. Read `support-templates/` directory if it exists for approved response patterns +4. Match the tone to the sentiment: empathetic for frustrated, direct for neutral +5. Draft a response that: acknowledges the issue, provides a resolution or next step, sets expectations +6. Never promise features not confirmed in CLAUDE.md +7. 
Save draft to `pipeline-output/draft-response-$(date +%Y-%m-%d-%H%M).md` + +## Rules + +- Always acknowledge the customer's experience before explaining the solution +- Never use corporate jargon or hollow phrases ("We apologize for any inconvenience") +- If resolution is unclear: provide a concrete next step (link, escalation, timeline) +- Keep responses under 200 words unless complex technical explanation is needed +- Match formality to the customer's writing style + +### escalation-checker + +--- +name: escalation-checker +description: | + Use this agent to determine whether a ticket requires escalation beyond a standard response. + + + Context: Draft response is ready, need to check escalation policy + user: "Should this ticket be escalated?" + assistant: "I'll use the escalation-checker to evaluate the escalation criteria." + Escalation check stage of customer support pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep"] +--- + +You check escalation criteria for customer support tickets in {{PROJECT_DIR}}. + +## How you work + +1. Read the classified ticket, draft response, and escalation policy from CLAUDE.md +2. Check escalation triggers: + - Priority is critical + - Sentiment is angry AND issue is unresolved + - Customer has contacted support more than 3 times on the same issue (check memory) + - Legal or regulatory language in ticket + - Data loss or security concern +3. If escalation is triggered: identify the appropriate escalation path from CLAUDE.md +4. Output escalation decision with reasoning + +## Output format + +``` +ESCALATION DECISION: [YES / NO] +Triggers met: [list triggers, or "none"] +Escalation path: [team or person if YES, "n/a" if NO] +Recommended action: [specific next step] +``` + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run customer support ticket pipeline. Classifies, drafts responses, checks escalation. 
+ Triggers on: "handle support ticket", "process ticket", "support pipeline" +version: 0.1.0 +--- + +**Step 1 — Load context:** Read CLAUDE.md for product info and support policy +**Step 2 — Classify:** Use ticket-classifier agent on incoming ticket +**Step 3 — Draft response:** Use response-drafter agent with classification +**Step 4 — Check escalation:** Use escalation-checker agent with ticket and draft +**Step 5 — Route:** If escalation YES: save to pipeline-output/escalate/. If NO: save to pipeline-output/ready/ +**Step 6 — Update memory:** Log ticket type, sentiment, resolution approach +**Step 7 — Report:** Output classification, response path, escalation decision +``` + +## Recommended Hooks + +Pre-tool-use: Block writes outside {{PROJECT_DIR}} and pipeline-output/ +Post-tool-use: Audit log all tool calls with ticket ID reference + +## Example CLAUDE.md Sections + +```markdown +## Customer Support Policy + +- Product: [your product name] +- Support channels: [email/chat/ticketing system] +- SLA: [response time commitments by priority] +- Escalation team: [team name or contact] +- Tone: [professional, friendly, direct] +- Approved resolution paths: [list standard resolutions] +``` +``` + +**`scripts/templates/domains/devops-automation.md`**: + +```markdown +# Domain Template: DevOps Automation + + + + + +## Agent Definitions + +### deploy-checker + +--- +name: deploy-checker +description: | + Use this agent to verify deployment health after a release. + + + Context: Deployment just completed + user: "Check the deployment health" + assistant: "I'll use the deploy-checker to verify service status post-deploy." + Post-deployment health check triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Glob", "Grep", "WebFetch"] +--- + +You check deployment health for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read deployment config from CLAUDE.md or `devops/config.md` +2. 
Run health checks: + - HTTP endpoint checks: expected status codes and response content + - Service process checks: expected processes running + - Log scanning: new ERROR/FATAL entries since deploy timestamp + - Resource checks: disk, memory within thresholds (via Bash if available) +3. Compare against baseline from memory/MEMORY.md +4. Classify findings: healthy, degraded, down + +## Rules + +- Record the check timestamp and deployment reference +- Never modify deployed services — read-only checks only +- Flag any ERROR log line introduced within 10 minutes of deploy + +### incident-detector + +--- +name: incident-detector +description: | + Use this agent to detect and classify incidents from system signals. + + + Context: Monitoring data shows anomalies + user: "Detect incidents from this data" + assistant: "I'll use the incident-detector to classify the anomalies." + Incident detection step in DevOps pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Grep", "Glob"] +--- + +You detect and classify incidents for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read health check output from deploy-checker +2. Scan log files for error patterns: stack traces, OOM kills, connection timeouts +3. Check alert rules from CLAUDE.md or `devops/alert-rules.md` +4. Classify incident severity: + - P1 (critical): service down, data loss risk, security breach + - P2 (high): significant degradation, partial outage + - P3 (medium): minor degradation, non-critical errors + - P4 (low): cosmetic issues, single isolated errors +5. Link incident to known runbooks if available in `devops/runbooks/` + +### runbook-executor + +--- +name: runbook-executor +description: | + Use this agent to execute a runbook in response to a detected incident. + + + Context: Incident detected and runbook identified + user: "Execute the restart runbook for this incident" + assistant: "I'll use the runbook-executor to run the appropriate runbook." 
+ Runbook execution step in DevOps pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Write", "Glob"] +--- + +You execute runbooks for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the incident report and identified runbook from `devops/runbooks/` +2. Parse runbook steps — each step has: description, command, expected outcome, rollback +3. Execute steps one at a time via Bash, checking outcome against expected +4. If a step fails: stop, log failure, do NOT proceed to next step +5. Write execution log to `pipeline-output/runbook-run-$(date +%Y-%m-%d-%H%M).md` + +## Rules + +- Never execute runbook steps marked MANUAL — list them for human action instead +- Always confirm destructive operations (restart, delete) by re-reading the runbook step +- Log every command and its output before moving to the next step +- If the runbook is missing or incomplete: report and wait for human input + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run DevOps automation pipeline. Checks deployment, detects incidents, executes runbooks. 
+ Triggers on: "check deployment", "run devops pipeline", "incident check" +version: 0.1.0 +--- + +**Step 1 — Load config:** Read CLAUDE.md for service endpoints and alert thresholds +**Step 2 — Check deployment:** Use deploy-checker agent +**Step 3 — Detect incidents:** If issues found, use incident-detector agent +**Step 4 — Execute runbook:** For P1/P2 incidents with matching runbook, use runbook-executor +**Step 5 — Save:** Write report to pipeline-output/devops-$(date +%Y-%m-%d-%H%M).md +**Step 6 — Alert:** For P1 incidents: print prominent warning; for P2: note in report +**Step 7 — Update memory:** Log check time, incident count, runbooks executed +``` + +## Recommended Hooks + +Pre-tool-use: Require confirmation before Bash commands matching `restart|stop|kill|delete|drop` +Post-tool-use: Audit all Bash executions with full command and exit code + +## Example CLAUDE.md Sections + +```markdown +## DevOps Configuration + +- Services: [list service names and endpoints] +- Health check endpoints: [URLs with expected responses] +- Log paths: [absolute paths to log files] +- Alert thresholds: [error rate, response time, disk usage] +- Runbooks: devops/runbooks/ directory +- On-call contact: [team or person for P1 incidents] +``` +``` + +**`scripts/templates/domains/legal-review.md`**: + +```markdown +# Domain Template: Legal Review + + + + + +## Agent Definitions + +### clause-extractor + +--- +name: clause-extractor +description: | + Use this agent to extract and categorize clauses from legal documents. + + + Context: Contract needs clause extraction before review + user: "Extract clauses from this contract" + assistant: "I'll use the clause-extractor to identify and categorize all clauses." + Clause extraction step in legal review pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Write"] +--- + +You extract and categorize clauses from legal documents for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. 
Read the document from $ARGUMENTS or the `legal-input/` directory +2. Read CLAUDE.md for the clause taxonomy (which types of clauses matter for this domain) +3. Identify and extract all clauses, organized by type: + - Liability and indemnification + - Termination and notice + - Intellectual property + - Confidentiality and NDA + - Governing law and dispute resolution + - Payment and fee terms + - Warranties and representations +4. Note clause location (section number, page reference if available) +5. Flag non-standard or unusual phrasing + +## Rules + +- Extract verbatim — never paraphrase clauses in the extraction stage +- Note if a standard clause type appears to be missing +- This agent does NOT give legal advice — it extracts and organizes + +### risk-assessor + +--- +name: risk-assessor +description: | + Use this agent to assess risk in extracted contract clauses. + + + Context: Clauses have been extracted from a contract + user: "Assess the risk in these clauses" + assistant: "I'll use the risk-assessor to evaluate each clause for risk." + Risk assessment step in legal review pipeline triggers this agent. + +model: opus +tools: ["Read", "Write", "Glob"] +--- + +You assess risk in legal clauses for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the extracted clauses from clause-extractor output +2. Read CLAUDE.md for risk tolerance guidelines and known problematic patterns +3. For each clause type, assess: + - Exposure: what liability or obligation does this create? + - Asymmetry: is this clause balanced or heavily one-sided? + - Ambiguity: are key terms defined? Are obligations measurable? + - Precedent: is this standard for this type of contract? +4. Rate each finding: high risk, medium risk, low risk, note only +5. 
Provide specific commentary on high-risk clauses + +## Rules + +- This is a risk identification tool, not legal advice +- Always note that findings should be reviewed by qualified legal counsel +- Focus on structural risk, not stylistic preferences +- Compare against market standard where CLAUDE.md provides benchmarks + +### compliance-checker + +--- +name: compliance-checker +description: | + Use this agent to check a legal document against regulatory compliance requirements. + + + Context: Contract needs compliance verification + user: "Check this contract for GDPR compliance" + assistant: "I'll use the compliance-checker to verify regulatory requirements." + Compliance check step in legal review pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep"] +--- + +You check legal documents for compliance requirements in {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the document and extracted clauses +2. Read CLAUDE.md for applicable regulations and compliance checklist +3. For each regulation in scope: verify required clauses or language is present +4. Check data processing agreements if GDPR/CCPA in scope +5. Check jurisdiction-specific requirements from the governing law clause +6. Output: compliance checklist with PASS/FAIL/MISSING per requirement + +## Rules + +- Only check against regulations explicitly listed in CLAUDE.md +- Flag if governing law clause is missing or ambiguous +- Note if jurisdiction creates additional requirements not covered in CLAUDE.md +- This is a checklist tool — final compliance determination requires legal counsel + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run legal review pipeline. Extracts clauses, assesses risk, checks compliance. 
+ Triggers on: "review contract", "legal review", "check this agreement" +version: 0.1.0 +--- + +**Step 1 — Load context:** Read CLAUDE.md for clause taxonomy and compliance requirements +**Step 2 — Extract clauses:** Use clause-extractor agent on the document +**Step 3 — Assess risk:** Use risk-assessor agent on extracted clauses +**Step 4 — Check compliance:** Use compliance-checker agent +**Step 5 — Combine:** Merge risk and compliance findings into a single report +**Step 6 — Save:** Write to pipeline-output/legal-review-$(date +%Y-%m-%d).md +**Step 7 — Update memory:** Log document type, risk findings count, compliance status +``` + +## Recommended Hooks + +Pre-tool-use: Block all writes outside {{PROJECT_DIR}} and pipeline-output/ — legal docs must not leave the project +Post-tool-use: Audit all file reads for data governance logging + +## Example CLAUDE.md Sections + +```markdown +## Legal Review Configuration + +- Contract types in scope: [MSA, NDA, SaaS agreements, etc.] +- Clause taxonomy: [list clause types that matter for your domain] +- Risk tolerance: [what risk levels require escalation to counsel] +- Regulations in scope: [GDPR, CCPA, SOC2, industry-specific] +- Compliance checklist: [link to or embed the checklist] +- Legal counsel contact: [for escalation of high-risk findings] + +IMPORTANT: This agent system identifies risk patterns and compliance gaps. +It does not provide legal advice. All high-risk findings must be reviewed +by qualified legal counsel before signing. +``` +``` + +**`scripts/templates/domains/sales-intelligence.md`**: + +```markdown +# Domain Template: Sales Intelligence + + + + + +## Agent Definitions + +### prospect-researcher + +--- +name: prospect-researcher +description: | + Use this agent to research a prospect before a sales engagement. + + + Context: Sales team needs intelligence on a prospect + user: "Research this prospect company" + assistant: "I'll use the prospect-researcher to gather intelligence on the company." 
+ Prospect research step in sales intelligence pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "WebSearch", "WebFetch", "Write"] +--- + +You research sales prospects for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Parse prospect name/URL from $ARGUMENTS +2. Read CLAUDE.md for ICP (ideal customer profile) and what signals matter +3. Gather intelligence: + - Company overview: size, industry, funding stage, recent news + - Technology stack clues: job postings, tech blog, GitHub presence + - Pain signals: recent hiring patterns, product announcements, leadership changes + - Budget signals: funding rounds, enterprise customer base + - Decision-makers: who buys your category (from LinkedIn structure if available) +4. Score against ICP: strong fit, partial fit, weak fit +5. Save to `pipeline-output/prospect-{{AGENT_NAME}}-$(date +%Y-%m-%d).md` + +## Rules + +- Only use publicly available information +- Note source for every data point +- Mark inferences explicitly as [INFERRED] vs [CONFIRMED] +- Never fabricate contact details or company information + +### pitch-customizer + +--- +name: pitch-customizer +description: | + Use this agent to customize a sales pitch based on prospect research. + + + Context: Prospect research is complete and pitch needs customization + user: "Customize the pitch for this prospect" + assistant: "I'll use the pitch-customizer to tailor the messaging." + Pitch customization step in sales intelligence pipeline triggers this agent. + +model: opus +tools: ["Read", "Write", "Glob"] +--- + +You customize sales pitches for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the prospect research brief +2. Read the base pitch from CLAUDE.md or `sales/pitch-base.md` +3. Identify the 2-3 pain signals most relevant to your solution +4. 
Customize the pitch: + - Opening: reference specific prospect context (recent news, known challenge) + - Value proposition: emphasize benefits most relevant to their pain signals + - Social proof: pick case studies matching their industry/size + - Call to action: match their stage (awareness vs. evaluation vs. decision) +5. Keep the customization to specific paragraphs — do not rewrite the entire pitch + +## Rules + +- Stay within the approved pitch framework from CLAUDE.md +- Never claim capabilities not listed in the base pitch +- Flag if no matching case study exists for the prospect's profile + +### follow-up-tracker + +--- +name: follow-up-tracker +description: | + Use this agent to track and schedule follow-up actions for sales opportunities. + + + Context: Sales interaction completed and follow-up needed + user: "Schedule follow-up actions for this opportunity" + assistant: "I'll use the follow-up-tracker to log and schedule next steps." + Follow-up tracking step in sales intelligence pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Write", "Glob", "Grep", "Bash"] +--- + +You track follow-up actions for sales opportunities in {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read the interaction notes from $ARGUMENTS or `pipeline-input/` +2. Read memory/MEMORY.md for prior interactions with this prospect +3. Extract commitments: what was promised, by whom, by when +4. Identify next steps: follow-up date, required materials, approvals needed +5. Write to `pipeline-output/follow-up-$(date +%Y-%m-%d).md` +6. 
Append summary to memory/MEMORY.md for continuity + +## Output format + +``` +OPPORTUNITY: [prospect name] +Last interaction: [date] +Stage: [awareness / evaluation / proposal / negotiation / closed] + +Commitments: +- [who] will [what] by [when] + +Next steps: +- [action] by [date] — owner: [person or agent] + +Follow-up due: [date] +``` + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run sales intelligence pipeline. Researches prospects, customizes pitches, tracks follow-up. + Triggers on: "research prospect", "sales pipeline", "prepare for meeting" +version: 0.1.0 +--- + +**Step 1 — Load context:** Read CLAUDE.md for ICP, pitch framework, and active opportunities +**Step 2 — Research prospect:** Use prospect-researcher agent with $ARGUMENTS +**Step 3 — Customize pitch:** Use pitch-customizer agent with research brief +**Step 4 — Track follow-up:** Use follow-up-tracker agent to log commitments and schedule next steps +**Step 5 — Save:** Write complete intelligence pack to pipeline-output/sales-$(date +%Y-%m-%d).md +**Step 6 — Update memory:** Append interaction summary, ICP score, next follow-up date +``` + +## Recommended Hooks + +Pre-tool-use: Block writes outside {{PROJECT_DIR}} and pipeline-output/ — prospect data must stay within project +Post-tool-use: Log all web fetches for source attribution + +## Example CLAUDE.md Sections + +```markdown +## Sales Configuration + +- Product: [what you sell] +- ICP: [ideal customer profile — industry, size, tech stack signals, pain points] +- Base pitch: sales/pitch-base.md +- Case studies: sales/case-studies/ +- Pitch framework: [problem → solution → proof → CTA] +- CRM integration: [manual log, or MCP connector for your CRM] +``` +``` + +**`scripts/templates/domains/security-audit.md`**: + +```markdown +# Domain Template: Security Audit + + + + + +## Agent Definitions + +### config-scanner + +--- +name: config-scanner +description: | + Use this agent to scan configuration 
files for security misconfigurations. + + + Context: Security audit of project configuration needed + user: "Scan this project's configuration for security issues" + assistant: "I'll use the config-scanner to check for misconfigurations." + Configuration scanning step in security audit pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Glob", "Grep", "Bash"] +--- + +You scan configurations for security issues in {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read CLAUDE.md for the technology stack and what config files exist +2. Glob for all config files: `.env.example`, `*.yml`, `*.yaml`, `*.json`, `*.toml`, `*.ini` +3. For each config file, check: + - Secrets in plain text (API keys, passwords, tokens) + - Overly permissive file permissions (`chmod 777`, world-writable paths) + - Debug mode enabled in production configs + - Insecure defaults (default credentials, open CORS, disabled auth) + - Dependency versions with known CVEs (check package.json, requirements.txt) +4. Classify findings: critical, high, medium, informational + +## Rules + +- Never output the actual secret values — mask them as `[REDACTED]` +- Check `.gitignore` and warn if secret files might not be excluded +- Flag if `.env` files are committed (check git log if available) + +### vulnerability-checker + +--- +name: vulnerability-checker +description: | + Use this agent to check a project for known vulnerabilities. + + + Context: Config scan is complete and deeper vulnerability check is needed + user: "Check for vulnerabilities in this project" + assistant: "I'll use the vulnerability-checker to identify known CVEs and security patterns." + Vulnerability checking step in security audit pipeline triggers this agent. + +model: sonnet +tools: ["Read", "Bash", "Glob", "Grep"] +--- + +You check for vulnerabilities in {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read config-scanner findings +2. 
Run available dependency audit tools via Bash (non-destructive only): + - Node.js: `npm audit --json 2>/dev/null` if package.json exists + - Python: `pip-audit --format json 2>/dev/null` if requirements.txt exists +3. Check code patterns for common vulnerabilities: + - SQL injection: string concatenation in queries + - Command injection: unsanitized user input in shell commands + - Path traversal: user-controlled file paths without validation + - Hardcoded credentials in source code + - Insecure direct object references +4. Check Claude Code-specific risks: + - Hooks running untrusted input as shell commands + - Agents with the Bash tool and no deny-list + - `--dangerously-skip-permissions` outside sandboxed context +5. Output: CVE list (if found), code pattern findings, Claude Code-specific risks + +### remediation-advisor + +--- +name: remediation-advisor +description: | + Use this agent to recommend remediation steps for security findings. + + + Context: Security findings need remediation recommendations + user: "Recommend fixes for these security findings" + assistant: "I'll use the remediation-advisor to produce actionable remediation steps." + Remediation advice step in security audit pipeline triggers this agent. + +model: opus +tools: ["Read", "Write", "Glob"] +--- + +You recommend security remediations for {{DOMAIN}} in {{PROJECT_DIR}}. + +## How you work + +1. Read all security findings from config-scanner and vulnerability-checker +2. For each finding, produce a remediation entry: + - What is the risk (plain language) + - Specific fix (exact change, not vague guidance) + - Effort estimate: low (< 1 hour), medium (< 1 day), high (> 1 day) + - Whether the fix can be automated vs. requires manual review +3. Prioritize: critical first, then by effort-to-impact ratio +4. For dependency CVEs: provide the minimum safe version to upgrade to +5. 
For Claude Code-specific findings: reference the appropriate settings.json pattern + +## Rules + +- Provide specific, actionable fixes — not "improve security" +- Never suggest fixes that would break functionality without noting the trade-off +- For critical findings with no easy fix: note interim mitigations + +## Output format + +``` +SECURITY AUDIT REPORT — {{DOMAIN}} +Date: [date] +Scope: {{PROJECT_DIR}} + +## Summary +Critical: [N] | High: [N] | Medium: [N] | Informational: [N] + +## Critical Findings +### [Finding ID]: [Title] +Risk: [plain language risk description] +Location: [file:line or component] +Fix: [specific remediation] +Effort: [low/medium/high] + +[repeat for each finding] + +## Recommended Priority Order +1. [finding ID] — [one line reason] +... +``` + +## Pipeline Skill Template + +```markdown +--- +name: {{PIPELINE_NAME}} +description: | + Run security audit pipeline. Scans config, checks vulnerabilities, recommends remediation. + Triggers on: "run security audit", "check security", "security scan" +version: 0.1.0 +--- + +**Step 1 — Load context:** Read CLAUDE.md for tech stack and security scope +**Step 2 — Scan config:** Use config-scanner agent on project files +**Step 3 — Check vulnerabilities:** Use vulnerability-checker agent +**Step 4 — Recommend remediation:** Use remediation-advisor agent with all findings +**Step 5 — Save:** Write full report to pipeline-output/security-audit-$(date +%Y-%m-%d).md +**Step 6 — Alert:** If critical findings: print prominent summary with finding IDs +**Step 7 — Update memory:** Log audit date, finding counts, remediated items from prior audits +``` + +## Recommended Hooks + +Pre-tool-use: Block writes outside {{PROJECT_DIR}} and pipeline-output/ — audit output must stay local +Post-tool-use: Log all file reads for audit trail + +## Example CLAUDE.md Sections + +```markdown +## Security Audit Configuration + +- Tech stack: [languages, frameworks, infrastructure] +- Config files to scan: [list key config 
file paths] +- Dependency manifests: [package.json, requirements.txt, go.mod, etc.] +- Compliance requirements: [SOC2, ISO 27001, PCI-DSS, etc.] +- Known accepted risks: [any accepted findings with risk owner and date] +- Secret patterns: [regex patterns for project-specific secrets to scan for] +``` +``` + +Now update `scripts/templates/domains/README.md` to list all 10 templates. The existing README covers 5; the version below keeps those rows and adds the 5 new ones: + +**`scripts/templates/domains/README.md`** — replace the full file content with: + +```markdown +# Domain Templates + +Pre-built pipeline templates for common use cases. The builder agent reads these +during `/agent-factory:build` Phase 0 to pre-populate the design sketch. + +## Available Templates + +| Template | Domain | Agents | Pipeline | +|----------|--------|--------|----------| +| content-pipeline | Content production | content-researcher, content-writer, content-reviewer | Research → Draft → Review → Publish | +| code-review | Code review | code-analyzer, review-writer, standards-checker | Analyze → Write review → Check standards → Post | +| monitoring | System monitoring | monitor-checker, incident-reporter, remediation-advisor | Check → Detect → Report → Advise | +| research-synthesis | Research & analysis | source-gatherer, synthesizer, fact-checker | Gather → Synthesize → Verify → Produce brief | +| data-processing | Data transformation | data-validator, transformer, quality-checker | Validate → Transform → Check quality → Save | +| customer-support | Customer support | ticket-classifier, response-drafter, escalation-checker | Classify → Draft → Escalation check → Route | +| devops-automation | DevOps automation | deploy-checker, incident-detector, runbook-executor | Deploy check → Detect → Execute runbook → Report | +| legal-review | Legal document review | clause-extractor, risk-assessor, compliance-checker | Extract → Assess risk → Compliance check → Report | +| sales-intelligence | Sales 
intelligence | prospect-researcher, pitch-customizer, follow-up-tracker | Research → Customize pitch → Track follow-up → Report | +| security-audit | Security auditing | config-scanner, vulnerability-checker, remediation-advisor | Scan config → Check CVEs → Remediation → Report | + +## Usage + +During `/agent-factory:build`, choose a template when prompted: +"Would you like to start from a domain template?" + +The builder reads the chosen template and pre-populates: +- Agent roles and descriptions +- Pipeline steps and handoff points +- Recommended hooks for the domain +- Example CLAUDE.md sections + +## Template format + +Each template is a plain markdown file with `{{PLACEHOLDER}}` variables. +The builder agent replaces placeholders with project-specific values during +scaffolding. All templates follow the same structure: + +1. Header comment (domain description) +2. Agent definitions (frontmatter + system prompt per agent) +3. Pipeline skill template +4. Recommended hooks +5. Example CLAUDE.md sections + +## Placeholders + +All templates use these standard placeholders: + +| Placeholder | Description | +|------------|-------------| +| `{{PROJECT_DIR}}` | Absolute path to the user's project | +| `{{AGENT_NAME}}` | Name of the agent being generated | +| `{{PIPELINE_NAME}}` | Name of the pipeline skill | +| `{{SCHEDULE}}` | Cron expression or schedule description | +| `{{DOMAIN}}` | Domain name (e.g., "content", "code-review") | + +## Creating custom templates + +Copy any existing template and modify it. The builder agent can also generate +custom templates during the build workflow. 
+``` + +### Verify + +```bash +ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l +``` +Expected: `11` (6 from Session 2 + 5 new = 11 total) + +### On failure +retry — ensure all 5 new templates follow the same structure as the existing 5, then revert if still failing + +### Checkpoint +```bash +git commit -m "feat(templates): add 5 more domain templates (10 total)" +``` + +--- + +## Exit Condition + +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/docker/ | wc -l` → `4` +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-entrypoint.sh && echo OK` → `OK` +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/export-system.sh && echo OK` → `OK` +- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/import-system.sh && echo OK` → `OK` +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/transfer/ | wc -l` → `4` +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l` → `11` +- [ ] All 5 new domain templates contain `{{PIPELINE_NAME}}` placeholder: `grep -l "PIPELINE_NAME" /Users/ktg/repos/agent-builder/scripts/templates/domains/customer-support.md /Users/ktg/repos/agent-builder/scripts/templates/domains/devops-automation.md /Users/ktg/repos/agent-builder/scripts/templates/domains/legal-review.md /Users/ktg/repos/agent-builder/scripts/templates/domains/sales-intelligence.md /Users/ktg/repos/agent-builder/scripts/templates/domains/security-audit.md | wc -l` → `5` +- [ ] `grep -c "customer-support\|devops-automation\|legal-review\|sales-intelligence\|security-audit" /Users/ktg/repos/agent-builder/scripts/templates/domains/README.md` → `5` + +## Quality Criteria + +- All Docker templates use `{{PROJECT_NAME}}` and `{{ANTHROPIC_API_KEY}}` placeholders — no hardcoded values +- `docker-entrypoint.sh` is bash 3.2 compatible: no associative arrays, no mapfile, no `|&` +- `docker-entrypoint.sh` handles SIGTERM gracefully and writes health file on 
each beat
+- `docker-compose.yml` includes `security_opt: [no-new-privileges:true]` and no Docker socket mount
+- `export-system.sh` excludes `.env`, `*.local.*`, `audit.log`, `cost-events.jsonl`, `memory/`, `.git/`
+- `export-system.sh` generates a MANIFEST.md with SHA256 checksums via python3
+- `import-system.sh` validates frontmatter and runs `bash -n` on all hook scripts after import
+- Both transfer scripts are bash 3.2 compatible and pass `bash -n`
+- All 5 new domain templates have 3 agent definitions each with valid YAML frontmatter blocks
+- All 5 new domain templates have `<example>` blocks in each agent description
+- All 5 new domain templates include pipeline skill template, recommended hooks, and example CLAUDE.md sections
+- `scripts/templates/domains/README.md` lists all 10 templates in the table
diff --git a/.claude/plans/blueprints/session-7-skill-updates.md b/.claude/plans/blueprints/session-7-skill-updates.md
new file mode 100644
index 0000000..03b580c
--- /dev/null
+++ b/.claude/plans/blueprints/session-7-skill-updates.md
@@ -0,0 +1,1593 @@
+# Session 7: Skill Updates and References
+
+> Steps 13, 19, 25 | Wave 2 | Depends on: Sessions 3 and 4
+
+## Dependencies
+
+Entry condition: Sessions 3 and 4 must be complete. Session 3 creates the memory and
+autonomy templates referenced in Step 13. Session 4 creates the orchestration and
+governance templates referenced in Step 19. Verify before starting:
+
+```bash
+ls /Users/ktg/repos/agent-builder/scripts/templates/memory/SESSION-STATE.md
+ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/HEARTBEAT.md
+ls /Users/ktg/repos/agent-builder/scripts/templates/governance/GOVERNANCE.md
+ls /Users/ktg/repos/agent-builder/scripts/templates/goals/GOALS.md
+```
+
+All four files must exist.
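+
+The four checks can be folded into one guard that fails fast if any prerequisite is missing. A sketch — `check_entry_conditions` is a hypothetical helper, and the root argument stands in for the absolute repo path used above:
+
+```bash
+# Entry-condition guard: verify the Session 3/4 artifacts exist before starting.
+check_entry_conditions() {
+  root="$1"
+  missing=0
+  for f in \
+    scripts/templates/memory/SESSION-STATE.md \
+    scripts/templates/heartbeat/HEARTBEAT.md \
+    scripts/templates/governance/GOVERNANCE.md \
+    scripts/templates/goals/GOALS.md
+  do
+    if [ ! -f "$root/$f" ]; then
+      echo "MISSING: $f"
+      missing=1
+    fi
+  done
+  if [ "$missing" -eq 0 ]; then
+    echo "entry conditions OK"
+  fi
+  return "$missing"
+}
+```
+
+Run `check_entry_conditions /Users/ktg/repos/agent-builder` before proceeding; a non-zero return means stop and finish Sessions 3 and 4 first.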
+ +## Scope Fence + +**Touch:** +- `skills/agent-system-design/SKILL.md` (modify) +- `skills/agent-system-design/references/memory-patterns.md` (new) +- `skills/agent-system-design/references/autonomy-patterns.md` (new) +- `skills/agent-system-design/references/orchestration-patterns.md` (new) +- `skills/agent-system-design/references/governance-patterns.md` (new) +- `skills/agent-system-design/references/mcp-integrations.md` (new) + +**Never touch:** +- `commands/` +- `agents/` +- `scripts/templates/` + +--- + +## Step 13: Add memory and autonomy pattern references to agent-system-design skill + +### Files to modify + +**`skills/agent-system-design/SKILL.md`** — Apply these diffs: + +**Diff 1 — Add trigger phrases to the `description` field in frontmatter.** + +Find: +``` + "how to build an agent with Claude Code" +version: 0.1.0 +``` + +Replace with: +``` + "how to build an agent with Claude Code", + "agent memory", "3-tier memory", "WAL protocol", + "proactive agent", "self-improving agent", "heartbeat scheduling" +version: 0.1.0 +``` + +**Diff 2 — Update the System components table to include Memory and Heartbeat rows.** + +Find: +``` +| Automation | `scripts/` + `launchd/` | Scheduled execution (Mac: launchd, Linux: systemd/cron) | +| Memory | `memory/` or `data/` | Persistent state files updated each run | +``` + +Replace with: +``` +| Automation | `scripts/` + `launchd/` | Scheduled execution (Mac: launchd, Linux: systemd/cron) | +| Memory | `memory/` or `data/` | Persistent state files updated each run | +| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety | +| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state | +``` + +**Diff 3 — Add Memory patterns, Autonomy patterns, and Heartbeat scheduling sections before "Getting started".** + +Find: +``` +## Getting started + +Run `/agent-factory:build` for the guided 7-phase 
workflow. +``` + +Replace with: +``` +## Memory patterns + +Autonomous agents lose state between sessions. The 3-tier memory architecture gives +agents persistent memory that survives context compaction and crashes: + +- **Hot tier** — `SESSION-STATE.md`: written every turn using the WAL protocol. Read + first on resume. Captures current task, active decisions, pending actions. +- **Warm tier** — `DAILY-LOG.md`: rolling 7-day log of decisions and completions. + Used for mid-term recall and generating daily summaries. +- **Cold tier** — `MEMORY.md`: durable facts, project history, learned preferences. + Updated by compaction from DAILY-LOG at end of week. + +The WAL (Write-Ahead Log) protocol requires agents to write critical state to +SESSION-STATE.md *before* responding. If the session crashes mid-response, the +next session reads SESSION-STATE.md and recovers. + +For full memory architecture, compaction procedures, and template locations, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md` + +## Autonomy patterns + +Proactive agents act without explicit user prompts. Two constraints govern when +an agent should act autonomously versus wait: + +- **ADL (Autonomous Decision Limit)**: Defines which actions the agent may take + without approval. Examples: `ADL: read files, write to memory/, run tests`. + Anything not in the ADL list requires escalation. +- **VFM (Value/Friction Matrix)**: Scores each candidate action by value (benefit + to user) and friction (cost, reversibility, risk). Act autonomously only when + value is high and friction is low. Scored on a 0-10 scale each axis. + +The self-healing protocol handles transient failures: retry up to 5 attempts with +exponential backoff, then pause and log. Never retry indefinitely without a cap. 
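+
+A minimal retry wrapper captures this protocol — a sketch, with the exponential waits
+and the 5-attempt cap from the rule above; the wrapped command is whatever action the
+agent is attempting:
+
+```bash
+# with_retry CMD [ARGS...] — retry with exponential backoff, max 5 attempts.
+with_retry() {
+  n=1
+  delay=2
+  until "$@"; do
+    if [ "$n" -ge 5 ]; then
+      echo "self-heal: giving up after 5 attempts: $*"
+      return 1
+    fi
+    sleep "$delay"            # exponential backoff: 2, 4, 8, 16 seconds
+    delay=$((delay * 2))
+    n=$((n + 1))
+  done
+}
+```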
+ +For the full proactive cycle, ADL constraint examples, VFM scoring rubric with +worked examples, and when NOT to use proactive patterns, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md` + +## Heartbeat scheduling + +The heartbeat pattern wakes an agent on a schedule, injects a context packet +(goals, memory summary, pending tasks, budget status, wake reason), and lets the +agent decide what to do in this beat. If nothing needs attention, the agent +responds `HEARTBEAT_OK` and exits. + +This differs from a simple cron trigger: the context packet ensures the agent +always wakes up knowing its current state, even if it has no persistent memory +of previous sessions. + +Template locations (created in Sessions 3 and 4): +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/HEARTBEAT.md` +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/heartbeat-runner.sh` +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/context-packet.md` +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/wake-prompt.md` + +## Getting started + +Run `/agent-factory:build` for the guided 7-phase workflow. 
+``` + +**Diff 4 — Add memory-patterns and autonomy-patterns links to the "Getting started" reference block.** + +Find: +``` +For pipeline design patterns and agent role templates, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/pipeline-patterns.md` +``` + +Replace with: +``` +For pipeline design patterns and agent role templates, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/pipeline-patterns.md` + +For 3-tier memory architecture, WAL protocol, and compaction procedures, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md` + +For proactive agent patterns, ADL constraints, and VFM scoring, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md` +``` + +### Files to create + +**`skills/agent-system-design/references/memory-patterns.md`**: + +```markdown +# Memory Patterns for Autonomous Agents + +Reference for the agent-system-design skill. Covers 3-tier memory architecture, +WAL protocol, Working Buffer protocol, compaction recovery, and template locations. + +--- + +## 1. Three-Tier Memory Architecture + +Autonomous agents have no built-in persistent memory. The 3-tier pattern gives +them a structured place to write and read state across sessions and compaction events. + +| Tier | File | Volatility | Updated | Read order | +|------|------|-----------|---------|------------| +| Hot | `SESSION-STATE.md` | Every turn | Agent writes before each response | First | +| Warm | `DAILY-LOG.md` | Daily | Agent appends at session end | Second | +| Cold | `MEMORY.md` | Weekly | Compaction from DAILY-LOG | Third | + +**Read order on session start:** SESSION-STATE.md → DAILY-LOG.md (last 2 days) → +MEMORY.md. This ensures the agent loads the most recent context first without +reading the entire history. + +**Write order on session end:** Update SESSION-STATE.md first (WAL), then append +to DAILY-LOG.md, then check if weekly compaction is due. + +--- + +## 2. 
WAL Protocol + +WAL (Write-Ahead Log) prevents data loss when sessions crash or context compacts +mid-response. The rule is simple: **write before you respond**. + +### Protocol + +1. Agent receives a message or trigger +2. **Before generating the response:** write current task, key decisions, and + pending actions to SESSION-STATE.md +3. Generate the response +4. On session resume: read SESSION-STATE.md first — it contains the last known state + +### Why it works + +If the session crashes after step 2 but before step 3, the next session reads +SESSION-STATE.md and recovers. Without WAL, the crash leaves no trace of what +the agent was doing. + +### Example entry + +```markdown +## WAL Entry — 2025-11-14T09:30:00Z + +**Task:** Compacting DAILY-LOG.md into MEMORY.md +**Decision:** Keep entries from last 14 days, summarize older +**Next action:** Write compacted MEMORY.md, then truncate DAILY-LOG.md +**Status:** in progress +``` + +The agent writes this *before* starting the compaction. If it crashes mid-compaction, +the next session reads this entry and knows to check whether MEMORY.md was updated. + +--- + +## 3. Working Buffer Protocol + +When context window usage exceeds 60%, the agent activates the Working Buffer +in SESSION-STATE.md. This is a scratchpad for capturing key exchanges before +they are pushed out of the context window by compaction. + +### When to activate + +The agent monitors context usage through self-assessment. Signals to activate: +- Response latency increasing (internal signal) +- Conversation has been active for 90+ minutes +- Agent notices it cannot recall something from earlier in the session + +### Working Buffer format + +```markdown +## Working Buffer (ACTIVE — context at 72%) + +### Key exchange 1 — 09:15 +User asked about X. Decided to use approach Y because Z. + +### Key exchange 2 — 09:40 +Discovered constraint: W. This affects steps 3 and 4. 
+
+### Decision: [the most critical unresolved decision in this session]
+```
+
+The Working Buffer is read before generating any response once activated.
+Deactivate by moving captured content to DAILY-LOG.md at session end.
+
+---
+
+## 4. Compaction Recovery Read Order
+
+When a session resumes after a context compaction event (Claude Code auto-summary),
+the agent may have lost conversational context. Recovery read order:
+
+1. `SESSION-STATE.md` — current task and WAL entry (most recent state)
+2. `DAILY-LOG.md` — last 2 days of entries (recent decisions and completions)
+3. `MEMORY.md` — durable facts (only if SESSION-STATE.md indicates this is needed)
+
+This order minimizes tokens loaded while ensuring the agent can continue its
+most recent task without asking the user to repeat context.
+
+---
+
+## 5. Template Locations
+
+All memory templates are created in Session 3 and ship with the Agent Factory plugin.
+
+| Template | Location | Purpose |
+|----------|----------|---------|
+| SESSION-STATE.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/SESSION-STATE.md` | Hot tier template |
+| DAILY-LOG.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/DAILY-LOG.md` | Warm tier template |
+| MEMORY.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/MEMORY.md` | Cold tier template |
+| README.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/README.md` | Setup instructions |
+
+The Agent Factory builder copies these templates into the user's project directory
+during Phase 2 (memory configuration) of the `/agent-factory:build` workflow.
+
+---
+
+## 6. Memory and the Heartbeat Pattern
+
+The heartbeat pattern (see `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md`)
+reads the memory files to build a context packet before each wakeup.
The context +injection pattern is: + +``` +heartbeat-runner.sh + → read SESSION-STATE.md (current task and WAL) + → read DAILY-LOG.md (2 days) (recent activity) + → read GOALS.md (active goals) + → build context-packet.md (assembled packet) + → inject into wake-prompt.md (agent receives full context) + → invoke agent +``` + +This means the heartbeat runner is the memory reader, not the agent itself. +The agent receives a pre-assembled context packet and does not need to read +memory files directly. +``` + +**`skills/agent-system-design/references/autonomy-patterns.md`**: + +```markdown +# Autonomy Patterns for Proactive Agents + +Reference for the agent-system-design skill. Covers the proactive agent cycle, +ADL constraints, VFM scoring, self-healing protocol, agentTurn vs systemEvent, +and when NOT to use proactive patterns. + +--- + +## 1. The Proactive Agent Cycle + +A proactive agent wakes on schedule, evaluates its situation, acts or skips, +then sleeps again. The cycle has five stages: + +``` +WAKE + ↓ +READ context (SESSION-STATE, GOALS, recent events) + ↓ +EVALUATE: Is there work to do? (VFM scoring) + ↓ +If value > threshold AND friction < threshold: + ACT: take the highest-value low-friction action + WRITE state (WAL before acting, update after) + ↓ +If no qualifying action: + LOG: "HEARTBEAT_OK — no action needed" + ↓ +SLEEP (until next scheduled wakeup) +``` + +The key constraint: the agent evaluates *before* acting. An agent that acts +without evaluation is not proactive — it is reactive and unpredictable. + +--- + +## 2. ADL — Autonomous Decision Limit + +The ADL defines which actions the agent may take without human approval. +Everything outside the ADL requires escalation. 
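+
+The check itself is mechanical once the ADL is written down as a list. A sketch —
+the action names are hypothetical; a real list would come from the project's
+governance files:
+
+```bash
+# ADL as an explicit allowlist — anything not listed requires escalation.
+ADL="read_file write_memory run_tests"
+
+requires_escalation() {
+  for allowed in $ADL; do
+    if [ "$1" = "$allowed" ]; then
+      return 1   # in the ADL — no escalation needed
+    fi
+  done
+  return 0       # outside the ADL — escalate
+}
+```
+
+Keeping the ADL as data rather than prose makes the boundary testable: a hook or
+the agent itself can gate every candidate action through this check.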
+ +### ADL format + +```markdown +## Autonomous Decision Limit + +The agent MAY autonomously: +- Read files anywhere in the project directory +- Write to memory/ and logs/ directories +- Run tests (npm test, pytest, cargo test) +- Create branches with prefix feature/ or fix/ +- Send Slack messages to #agent-activity channel only + +The agent MUST escalate before: +- Modifying files outside the project directory +- Making commits to main/master +- Sending messages to any channel other than #agent-activity +- Making API calls that have costs +- Taking any irreversible action not listed above +``` + +### ADL design principles + +- **Start narrow, expand on evidence.** Start with read-only and memory writes. + Expand the ADL only after observing the agent behave correctly in more sessions. +- **Irreversibility drives the line.** Actions that cannot be undone (commits, + external API calls, file deletions) stay outside the ADL until trust is earned. +- **Channel specificity over channel type.** "Send Slack messages" is too broad. + "Send Slack messages to #agent-activity channel only" is a testable constraint. + +### ADL examples by autonomy level + +| Level | ADL scope | +|-------|-----------| +| 0 — Human in loop | Empty ADL. Every action requires approval. | +| 1 — Read-only | Read files, read memory, read logs. No writes. | +| 2 — Memory + logs | Level 1 plus write to memory/ and logs/. | +| 3 — Local automation | Level 2 plus run tests, create branches, send designated channel messages. | +| 4 — Trusted operator | Level 3 plus commit to non-main branches, call approved external APIs. | + +Autonomy level 4 (full trust) should only be reached after the agent has +operated at level 3 for a sustained period without escalation failures. + +--- + +## 3. VFM — Value/Friction Matrix + +The VFM scoring rubric determines whether an action qualifies for autonomous +execution. 
Each candidate action gets two scores: + +- **Value (0–10):** How much does this action benefit the user if done now? +- **Friction (0–10):** How much effort, risk, or cost does this action incur? + +### Decision rule + +``` +If value >= 7 AND friction <= 3: ACT autonomously +If value >= 5 AND friction <= 2: ACT autonomously +If value >= 8 AND friction <= 5: ACT autonomously (high-value exception) +Otherwise: LOG intent, DO NOT ACT, optionally notify +``` + +The thresholds are starting points. Tune them based on observed behavior. +Lower the friction threshold if the agent acts too aggressively. Raise the +value threshold if the agent acts too conservatively. + +### Worked examples + +**Example 1 — Update SESSION-STATE.md at end of session** +- Value: 9 (prevents state loss, always needed) +- Friction: 1 (one file write, fully reversible) +- Decision: ACT (9 >= 7, 1 <= 3) ✓ + +**Example 2 — Send daily summary to Slack** +- Value: 7 (useful to the user, expected behavior) +- Friction: 3 (external message, channel is in ADL, low risk) +- Decision: ACT (7 >= 7, 3 <= 3) ✓ + +**Example 3 — Commit and push to main branch** +- Value: 6 (saves work, but user may want to review) +- Friction: 8 (irreversible without force push, affects others) +- Decision: DO NOT ACT (friction > 3) — log intent and notify + +**Example 4 — Run the full test suite** +- Value: 5 (useful, but not urgent if nothing changed) +- Friction: 2 (runs locally, no external side effects) +- Decision: ACT (5 >= 5, 2 <= 2) ✓ + +**Example 5 — Delete build artifacts older than 30 days** +- Value: 4 (marginal disk savings) +- Friction: 4 (deletion is irreversible, could delete needed files) +- Decision: DO NOT ACT (value < 5, friction > 3) — log suggestion + +--- + +## 4. Self-Healing Protocol + +Autonomous agents encounter transient failures: network timeouts, locked files, +API rate limits, temporary permission errors. 
The self-healing protocol handles +these without human intervention, up to a defined limit. + +### Protocol + +``` +Attempt action + → On success: continue + → On transient failure (timeout, rate limit, lock contention): + Wait 2^n seconds (exponential backoff: 2, 4, 8, 16, 32 seconds) + Retry + After 5 attempts: PAUSE and log failure + → On hard failure (permission denied, file not found, invalid input): + Do NOT retry + Log failure with full error message + Escalate if action was in ADL +``` + +**The 5-attempt limit is mandatory.** An agent that retries indefinitely will +run forever on a permanent failure. Always cap retries and log when the cap +is reached. + +### Transient vs hard failures + +| Error type | Retry? | Examples | +|-----------|--------|---------| +| Network timeout | Yes | HTTP 429, 503, connection refused | +| File lock | Yes | File currently being written by another process | +| Rate limit | Yes (with longer wait) | API 429, git push rate limit | +| Permission denied | No | chmod 000 file, read-only filesystem | +| File not found | No (usually) | Missing dependency, wrong path | +| Invalid input | No | Malformed JSON, wrong argument type | + +--- + +## 5. agentTurn vs systemEvent + +Two trigger types for proactive agents. Understanding the difference prevents +the agent from acting when it should not. + +### agentTurn + +The agent is invoked as part of a deliberate workflow. A human or orchestrator +explicitly asked the agent to do something. The agent has full permission to +act within its ADL. + +Use for: scheduled heartbeats, pipeline steps, explicit `/agent-factory:build` phases. + +### systemEvent + +An external event fired the agent (file watcher, webhook, monitoring alert). +The agent was not explicitly asked to act — it was notified. The agent should +evaluate using VFM before acting, not act automatically. + +Use for: file change monitors, CI failure alerts, Slack mention triggers. 
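+
+In code, the systemEvent gate is just the VFM decision rule from section 3. A sketch
+using the starting-point thresholds given there:
+
+```bash
+# vfm_decision VALUE FRICTION — echoes ACT or LOG_ONLY per the section 3 rule.
+vfm_decision() {
+  value=$1
+  friction=$2
+  if [ "$value" -ge 8 ] && [ "$friction" -le 5 ]; then echo ACT; return 0; fi
+  if [ "$value" -ge 7 ] && [ "$friction" -le 3 ]; then echo ACT; return 0; fi
+  if [ "$value" -ge 5 ] && [ "$friction" -le 2 ]; then echo ACT; return 0; fi
+  echo LOG_ONLY
+}
+```
+
+Checked against the worked examples in section 3: (9, 1) → ACT, (7, 3) → ACT,
+(6, 8) → LOG_ONLY, (5, 2) → ACT, (4, 4) → LOG_ONLY.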
+ +**Key distinction:** In an agentTurn, the context gives permission. In a +systemEvent, the agent must earn permission through VFM evaluation. + +--- + +## 6. When NOT to Use Proactive Patterns + +Proactive patterns add complexity. Do not use them unless the use case requires it. + +**Do not use proactive agents for:** + +- **One-shot tasks.** If the user runs the agent manually each time, a simple + pipeline is sufficient. Proactive patterns add overhead with no benefit. +- **Tasks that require real-time data.** A proactive agent wakes on a schedule. + If the task requires acting within seconds of an event, use a webhook receiver + or event-driven architecture instead. +- **Tasks with high friction, always.** If every action in the workflow scores + high on friction (irreversible, external side effects, costly), the VFM will + never clear the threshold. The agent will wake, evaluate, and go back to sleep + every time. This is waste. +- **Untrusted environments.** Do not deploy a proactive agent in an environment + where the blast radius of a mistake is large and you have not yet validated + the agent's behavior manually. 
+ +**Start proactive only when:** +- The agent has been validated in manual (non-proactive) mode first +- The ADL is defined and tested +- The VFM thresholds are tuned for the specific use case +- Monitoring and logging are in place +``` + +### Verify + +```bash +grep -c "memory-patterns\|autonomy-patterns\|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md +``` +Expected: `>= 3` + +### On failure: revert + +### Checkpoint +```bash +git commit -m "feat(skills): add memory and autonomy pattern references to agent-system-design" +``` + +--- + +## Step 19: Add orchestration and governance pattern references to agent-system-design skill + +### Files to modify + +**`skills/agent-system-design/SKILL.md`** — Apply these diffs (builds on Step 13 changes): + +**Diff 1 — Extend trigger phrases in the `description` field.** + +Find: +``` + "agent memory", "3-tier memory", "WAL protocol", + "proactive agent", "self-improving agent", "heartbeat scheduling" +version: 0.1.0 +``` + +Replace with: +``` + "agent memory", "3-tier memory", "WAL protocol", + "proactive agent", "self-improving agent", "heartbeat scheduling", + "goal hierarchy", "budget tracking", "approval gates", + "governance", "org chart", "multi-agent coordination", "agent budget" +version: 0.1.0 +``` + +**Diff 2 — Extend the System components table to include orchestration components.** + +Find: +``` +| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety | +| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state | +``` + +Replace with: +``` +| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety | +| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state | +| Goals | `GOALS.md` | File-based goal hierarchy (company → project → task) 
with agent ownership | +| Budget | `BUDGET.md` + `budget/` | Post-hoc cost enforcement with soft warn and hard pause | +| Governance | `GOVERNANCE.md` | Autonomy level policy, approval gates, escalation rules | +| Org Chart | `ORG-CHART.md` | Agent reporting hierarchy with delegation and human override | +``` + +**Diff 3 — Add Orchestration patterns and Governance patterns sections before "Getting started".** + +Find: +``` +## Getting started + +Run `/agent-factory:build` for the guided 7-phase workflow. +``` + +Replace with: +``` +## Orchestration patterns + +Multi-agent systems need coordination beyond sequential pipelines. Four Paperclip +orchestration patterns supported by Agent Factory: + +- **Heartbeat with context injection** — The scheduler injects a pre-built context + packet into each wakeup. The agent wakes already knowing its state, goals, and + budget. No cold start. +- **Goal hierarchy** — Goals are organized as company → project → task with a + simple `parent_id` reference (not recursive). Each task goal has an owner (agent + name) and a status. The `goal-tracker.sh` script reads GOALS.md and generates + context for heartbeat injection. +- **Org chart and delegation** — The ORG-CHART.md file maps agents to roles and + reporting lines using a `reportsTo` pattern. Orchestrator agents route work to + specialists based on the org chart. Humans always have override authority. +- **Task checkout via file locking** — Multiple agents avoid working on the same + task by writing a `.lock` file before starting. If the lock exists, the agent + skips and logs. Lock files are cleaned up on completion or timeout. + +For full orchestration patterns including session persistence, heartbeat comparison +matrix (OpenClaw cron vs Paperclip heartbeat vs /schedule), see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md` + +## Governance patterns + +Autonomous agents need constraints. 
The governance system defines what level of +autonomy each agent operates at and what requires approval. + +**Autonomy levels (0–4):** + +| Level | Name | Description | +|-------|------|-------------| +| 0 | Full manual | Every action requires human approval | +| 1 | Read-only | Agent may read files and memory; no writes | +| 2 | Memory writes | Level 1 plus writes to memory/ and logs/ | +| 3 | Local operator | Level 2 plus tests, branches, designated channel messages | +| 4 | Trusted operator | Level 3 plus non-main commits, approved external APIs | + +**Approval gates** — Named checkpoints where an agent must request human approval +before proceeding. Defined in GOVERNANCE.md. Example gates: `pre-commit`, +`pre-deploy`, `pre-external-api`, `budget-exceeded`. + +**Budget enforcement** — Post-hoc: cost is checked after each run against the +policy in BUDGET.md. Soft threshold (default 80%) triggers a warning. Hard +threshold (100%) creates a PAUSED flag and blocks subsequent tool calls. + +For the full governance reference including audit trail requirements, +error threshold auto-pause, and Paperclip's "autonomy is a privilege" philosophy, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md` + +## Getting started + +Run `/agent-factory:build` for the guided 7-phase workflow. 
+``` + +**Diff 4 — Add orchestration and governance links to the reference block.** + +Find: +``` +For proactive agent patterns, ADL constraints, and VFM scoring, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md` +``` + +Replace with: +``` +For proactive agent patterns, ADL constraints, and VFM scoring, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md` + +For multi-agent orchestration, goal hierarchy, org chart, and heartbeat comparison, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md` + +For governance, autonomy levels, approval gates, and budget enforcement, see: +`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md` +``` + +### Files to create + +**`skills/agent-system-design/references/orchestration-patterns.md`**: + +```markdown +# Orchestration Patterns for Multi-Agent Systems + +Reference for the agent-system-design skill. Covers heartbeat with context +injection, goal hierarchy, org chart and delegation, task checkout, session +persistence, and a comparison of scheduling approaches. + +--- + +## 1. Heartbeat with Context Injection + +The Paperclip heartbeat pattern extends a simple cron trigger with a context +assembly step. Instead of waking an empty agent, the scheduler builds a context +packet containing everything the agent needs and injects it into the wake prompt. 
+ +### The context assembly flow + +``` +heartbeat-runner.sh (runs on schedule) + ↓ +Assemble context packet: + read SESSION-STATE.md → current task, WAL entry + read DAILY-LOG.md (2 days) → recent activity + read GOALS.md → active goals (./goal-tracker.sh context) + read BUDGET.md + cost-events.jsonl → budget status + determine wake reason → scheduled beat, or event trigger + + ↓ +Populate context-packet.md template: + {{AGENT_NAME}}, {{MEMORY_SUMMARY}}, {{ACTIVE_GOALS}}, + {{BUDGET_STATUS}}, {{WAKE_REASON}} + + ↓ +Populate wake-prompt.md with context packet + ↓ +Invoke: claude -p --resume {{SESSION_NAME}} < wake-prompt.md +``` + +### Why inject context rather than let the agent read files? + +The agent reading its own memory files requires extra tool calls, adds latency, +and fails silently if a file is missing. The runner assembles the packet once, +handles missing files gracefully, and delivers a complete context in a single +system turn. The agent arrives oriented. + +### Template locations + +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/context-packet.md` +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/wake-prompt.md` +- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/heartbeat-runner.sh` + +--- + +## 2. Goal Hierarchy + +File-based goal hierarchy inspired by Paperclip's goal system. Three levels: +company → project → task. Each goal references its parent by ID. + +### Structure + +```markdown +## Company Goals +- [G1] Increase content output quality +- [G2] Reduce operational overhead + +## Project Goals +- [G1.1] Improve research agent accuracy (parent: G1) +- [G2.1] Automate daily log compaction (parent: G2) + +## Task Goals +- [G1.1.1] Add source verification to researcher (parent: G1.1, owner: researcher-agent, status: active) +- [G2.1.1] Write compaction script (parent: G2.1, owner: memory-agent, status: pending) +``` + +### Design decisions + +- **Simple parent_id, not recursive.** Each goal references its parent by ID. 
+ The `goal-tracker.sh` script uses this for orphan detection and context + generation but does not traverse the tree recursively at runtime. +- **Dot notation for hierarchy.** G1 → G1.1 → G1.1.1 makes depth visible in + the ID without a database or recursive query. +- **Status at task level only.** Company and project goals are structural. + Status tracking happens at the task level where agents own the work. + +### Integration with heartbeat + +```bash +# In heartbeat-runner.sh: +ACTIVE_GOALS=$(./scripts/goal-tracker.sh context) +# Inject as {{ACTIVE_GOALS}} into context-packet.md +``` + +Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/goals/` + +--- + +## 3. Org Chart and Delegation + +The org chart defines the reporting structure for a multi-agent system. It +answers: when agent A cannot handle a task, who does it delegate to? + +### ORG-CHART.md format + +```markdown +| Agent | Role | Reports To | Status | Budget (cents/day) | +|-------|------|-----------|--------|-------------------| +| orchestrator | Coordinator | human | active | 200 | +| researcher | Research | orchestrator | active | 100 | +| writer | Content | orchestrator | active | 100 | +| reviewer | Quality | orchestrator | active | 50 | +``` + +### Delegation rules + +1. Orchestrator assigns tasks to specialists based on the org chart +2. Specialist completes or escalates to orchestrator (never skips levels) +3. Orchestrator escalates to human for decisions outside its ADL +4. 
Human override authority: any human instruction takes priority over the + org chart at any level + +### Cross-team routing + +For organizations with multiple orchestrators: + +```markdown +| Agent | Role | Reports To | Status | +|-------|------|-----------|--------| +| content-orchestrator | Content team lead | human | active | +| research-orchestrator | Research team lead | human | active | +| shared-researcher | Shared resource | content-orchestrator | active | +``` + +Shared agents report to a primary orchestrator but can receive delegated +tasks from other orchestrators via a task request file. + +Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/org-chart/` + +--- + +## 4. Task Checkout via File Locking + +When multiple agents may work on the same task pool, use file locking to +prevent duplicate work. Bash 3.2 compatible using `mkdir` as an atomic lock. + +```bash +LOCK_DIR="tasks/${TASK_ID}.lock" + +# Attempt checkout +if mkdir "$LOCK_DIR" 2>/dev/null; then + # Lock acquired — we own this task + echo "$AGENT_NAME" > "$LOCK_DIR/owner" + echo "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$LOCK_DIR/acquired" + + # Do the work here + do_task "$TASK_ID" + + # Release lock + rm -rf "$LOCK_DIR" +else + # Lock held by another agent + OWNER=$(cat "$LOCK_DIR/owner" 2>/dev/null || echo "unknown") + echo "Task $TASK_ID is locked by $OWNER — skipping" + exit 0 +fi +``` + +### Stale lock cleanup + +Add to the heartbeat runner to clean locks older than 2 hours: + +```bash +find tasks/ -name "*.lock" -type d | while read lock; do + acquired=$(cat "$lock/acquired" 2>/dev/null || echo "") + if [ -n "$acquired" ]; then + age=$(python3 -c " +from datetime import datetime, timezone +import sys +t = datetime.fromisoformat('$acquired'.replace('Z','+00:00')) +now = datetime.now(timezone.utc) +print(int((now-t).total_seconds())) +" 2>/dev/null || echo 0) + if [ "$age" -gt 7200 ]; then + echo "Removing stale lock: $lock (age: ${age}s)" + rm -rf "$lock" + fi + fi +done +``` + +--- + 
+## 5. Session Persistence + +Agents can resume previous sessions using Claude Code's `--resume` flag. +The heartbeat runner manages session names to ensure consistent resumption. + +```bash +# heartbeat-runner.sh — session persistence pattern +SESSION_NAME="${AGENT_NAME}-$(date +%Y-%m)" # one session per month +SESSION_FILE="$WORKING_DIR/heartbeat/.current-session" + +# Check if session exists +if [ -f "$SESSION_FILE" ]; then + SESSION_ID=$(cat "$SESSION_FILE") + RESUME_FLAG="--resume $SESSION_ID" +else + RESUME_FLAG="" +fi + +# Run agent +claude -p $RESUME_FLAG --name "$SESSION_NAME" < wake-prompt-filled.md + +# Save session ID for next run +claude session list --json | python3 -c " +import json, sys +sessions = json.load(sys.stdin) +for s in sessions: + if s.get('name') == '$SESSION_NAME': + print(s['id']) + break +" > "$SESSION_FILE" 2>/dev/null || true +``` + +**Note:** Session persistence is optional. The context injection pattern +(Section 1) means the agent can work correctly without resuming a previous +session, because the context packet re-establishes its state. + +--- + +## 6. Scheduling Comparison Matrix + +Three scheduling approaches supported by Agent Factory. Choose based on +run frequency, context requirements, and deployment environment. 
+ +| Dimension | OpenClaw cron | Paperclip heartbeat | Claude /schedule | +|-----------|--------------|---------------------|-----------------| +| Mechanism | cron / launchd | cron + context assembly script | Cloud task (Anthropic-hosted) | +| Context injection | Agent reads its own memory | Runner assembles and injects packet | Prompt in task definition | +| Minimum interval | 1 minute | 1 minute | 1 hour | +| Local file access | Yes | Yes | No | +| MCP servers | Yes | Yes | Via MCP connectors only | +| Session resume | Optional (--resume) | Optional (--resume) | Via API sessions | +| Best for | Simple schedules, file-heavy | Multi-agent systems, stateful agents | Repo-only work, cloud-native | +| Cold start problem | Agent reads memory each run | Solved by context injection | N/A (stateless tasks) | + +**Decision rule:** +- Personal pipeline with local files → OpenClaw cron pattern +- Multi-agent system with shared state → Paperclip heartbeat pattern +- GitHub/Linear/web-only tasks on a long interval → Claude /schedule +``` + +**`skills/agent-system-design/references/governance-patterns.md`**: + +```markdown +# Governance Patterns for Autonomous Agents + +Reference for the agent-system-design skill. Covers autonomy levels, approval +gates, budget enforcement, audit trail, error threshold auto-pause, and +Paperclip's "autonomy is a privilege" philosophy. + +--- + +## 1. The Core Philosophy: Autonomy is a Privilege + +Autonomous agents earn the right to act without approval through demonstrated +reliability. The governance system formalizes this by requiring agents to start +at low autonomy (level 0 or 1) and progress upward based on observed behavior. + +**The principle, stated plainly:** +> An agent that has never been seen to fail may act autonomously. +> An agent that has failed without recovery must earn autonomy back. + +This is Paperclip's operating assumption. 
It is not about distrust — it is about +building a verifiable track record before expanding permissions. Governance is not +a ceiling; it is a ladder. + +--- + +## 2. Autonomy Levels (0–4) + +Five levels mapping to progressively broader autonomous action. Define the +current level in GOVERNANCE.md. + +| Level | Name | What the agent may do without approval | +|-------|------|----------------------------------------| +| 0 | Full manual | Nothing. Every action requires human approval via prompt. | +| 1 | Read-only | Read files, read memory, query status. No writes. | +| 2 | Memory writes | Level 1 plus write to memory/ and logs/ directories. | +| 3 | Local operator | Level 2 plus run tests, create non-main branches, write to designated channels. | +| 4 | Trusted operator | Level 3 plus commit to non-main branches, call approved external APIs. | + +### Progressing through levels + +Advance one level at a time. Criteria for advancing: + +- **Level 0 → 1:** Agent correctly reads and summarizes state in 5 consecutive sessions +- **Level 1 → 2:** Agent correctly writes memory files without data loss in 5 consecutive sessions +- **Level 2 → 3:** Agent correctly manages its task queue and escalates appropriately in 10 consecutive sessions +- **Level 3 → 4:** Agent operates at level 3 for 30 days without an escalation failure + +Regress one level immediately on: unrecoverable state corruption, unauthorized +action outside ADL, failure to escalate a hard failure. + +--- + +## 3. Approval Gates + +Named checkpoints where the agent must pause and request human approval before +proceeding. Defined in GOVERNANCE.md under `## Approval Gates`. 
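Both the gate hook described below and the budget hook need the current autonomy level at runtime. A minimal sketch of that lookup, assuming the `current: N` field from the GOVERNANCE.md template in Section 7; the `autonomy_level` helper name and the demo file path are illustrative, not part of the spec:

```shell
#!/bin/sh
# Read "current: N" from a GOVERNANCE.md-style file. Defaults to level 0
# (full manual) when the file or the field is missing: fail closed.
autonomy_level() { # $1 = path to governance file
  lvl=$(sed -n 's/^current:[[:space:]]*\([0-4]\).*/\1/p' "$1" 2>/dev/null | head -1)
  printf '%s\n' "${lvl:-0}"
}

# Demo with a throwaway file
printf 'current: 2  # set_by: human\n' > /tmp/governance-demo.md
autonomy_level /tmp/governance-demo.md   # prints 2
autonomy_level /no/such/file             # prints 0 (fail closed)
```

Defaulting to 0 rather than erroring means a deleted or corrupted GOVERNANCE.md silently drops the agent to full manual, which is the safe direction for a regression.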
+ +### Gate format + +```markdown +## Approval Gates + +- pre-commit: Required before any git commit to a non-draft branch + Condition: agent has staged changes and plans to commit + Channel: slack #agent-approvals + +- pre-deploy: Required before any deployment action + Condition: agent is about to run a deploy script or kubectl apply + Channel: slack #agent-approvals + +- budget-exceeded: Required when cumulative cost exceeds soft threshold + Condition: budget/cost-events.jsonl shows > 80% of limit + Channel: slack #agent-activity (warn), #agent-approvals (hard stop) + +- unknown-tool: Required when agent wants to use a tool not in its allowlist + Condition: tool name not present in settings.json permissions.allow + Channel: slack #agent-approvals +``` + +### Gate enforcement via PreToolUse hook + +The `approval-gate.sh` template implements gates as a PreToolUse hook: + +1. Check current autonomy level from GOVERNANCE.md +2. Check if the requested tool call matches a gate condition +3. If match and level < 4: write approval request to `governance/pending-approvals.jsonl` +4. Poll `governance/approval-responses.jsonl` for a matching response (timeout: 5 minutes) +5. If approved: exit 0 (allow) +6. If rejected or timeout: exit 2 (block) + +Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/approval-gate.sh` + +--- + +## 4. Budget Enforcement + +Post-hoc budget enforcement: check cost after each run rather than reserving +budget before. This matches Paperclip's actual implementation (simpler, no +pre-run coordination needed). + +### How it works + +1. `budget-hook.sh` runs as a PostToolUse hook +2. Each tool call is logged to `budget/cost-events.jsonl` +3. Cumulative cost is compared against `BUDGET.md` policy +4. At soft threshold (default 80%): warning logged, Slack notification optional +5. 
At hard threshold (100%): `budget/PAUSED` flag created; subsequent PreToolUse + hook calls exit 2 (block) until the flag is manually removed + +### Resuming after a hard stop + +```bash +# Check why agent was paused +cat budget/PAUSED + +# Review cost events +./scripts/budget-report.sh + +# If budget is sufficient, remove the pause flag to resume +rm budget/PAUSED +``` + +### Important limitation: cost estimation accuracy + +The budget-hook.sh template counts tool events as a rough cost proxy. +Accurate cost tracking requires one of: + +- **Admin API** (`/v1/organizations/cost_report`) — requires an Admin API key + (`sk-ant-admin...`), available to organization accounts only +- **Token counting** — parse token counts from Claude's responses and multiply + by published per-token prices +- **`--max-budget-usd N` flag** — per-run budget cap when invoking `claude -p` + +For personal agents, the event-count proxy is sufficient as a relative indicator. +For production agents with real cost accountability, use one of the accurate methods. + +Template locations: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/` + +--- + +## 5. Audit Trail + +Every action taken by an autonomous agent should be logged for post-hoc review. +Minimum audit trail for a production agent: + +``` +logs/ + audit.log — tool calls with timestamp, tool name, input summary + decisions.log — VFM evaluations and outcomes + escalations.log — approval gate requests and responses +budget/ + cost-events.jsonl — per-tool-call cost events +governance/ + pending-approvals.jsonl — approval requests sent + approval-responses.jsonl — responses received +``` + +### Audit log format (per tool call) + +``` +2025-11-14T09:30:00Z TOOL=Write FILE=memory/SESSION-STATE.md AGENT=orchestrator LEVEL=2 STATUS=allowed +2025-11-14T09:30:05Z TOOL=Bash CMD=git commit -m "..." 
AGENT=orchestrator LEVEL=2 GATE=pre-commit STATUS=pending-approval +2025-11-14T09:35:12Z GATE=pre-commit RESPONSE=approved APPROVER=human AGENT=orchestrator +``` + +### Log rotation + +Audit logs grow unbounded without rotation. Options: +- **logrotate** (Linux): configure in `/etc/logrotate.d/` +- **launchd** (macOS): size-based rotation via a cleanup script in the heartbeat +- **max-lines truncation**: keep last 10,000 lines, archive the rest to `logs/archive/` + +--- + +## 6. Error Threshold and Auto-Pause + +Agents that encounter repeated failures indicate either a systemic problem or a +changed environment. The error threshold triggers auto-pause before the agent +causes further damage. + +### Configuration in GOVERNANCE.md + +```markdown +## Error Policy + +- error_threshold: 3 # consecutive failures before auto-pause +- error_window: 1h # window for counting consecutive failures +- auto_pause: true # create PAUSED flag on threshold breach +- notify_on_pause: slack # channel to notify (optional) +``` + +### What counts as a failure + +- Any non-zero exit from a pipeline step that is not a deliberate `HEARTBEAT_OK` +- Any escalation that receives a rejection response (not just timeout) +- Any budget hard stop + +### Recovering from auto-pause + +```bash +# Check error log +tail -20 logs/audit.log + +# Understand why the threshold was reached +grep "FAILURE\|ERROR" logs/audit.log | tail -20 + +# Fix the underlying issue first +# Then remove the pause flag +rm budget/PAUSED # if budget-triggered +rm governance/PAUSED # if error-threshold-triggered + +# Reset error counter (optional — some implementations auto-reset after fix) +rm -f governance/error-count +``` + +--- + +## 7. 
GOVERNANCE.md Template Structure + +```markdown +# Governance Policy: {{PROJECT_NAME}} + +## Autonomy Level + +current: {{AUTONOMY_LEVEL}} # 0-4 +set_by: {{SET_BY}} +set_date: {{SET_DATE}} + +## Autonomous Decision Limit + +The agent MAY autonomously: +{{ADL_ALLOWED_ACTIONS}} + +The agent MUST escalate before: +{{ADL_ESCALATION_TRIGGERS}} + +## Approval Gates + +{{APPROVAL_GATES}} + +## Error Policy + +- error_threshold: 3 +- error_window: 1h +- auto_pause: true +- notify_on_pause: none + +## Audit Requirements + +- tool calls: log all +- budget events: log all +- escalations: log all +- retention: 90 days +``` + +Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/GOVERNANCE.md` +``` + +### Verify + +```bash +grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md +``` +Expected: `>= 5` + +### On failure: revert + +### Checkpoint +```bash +git commit -m "feat(skills): add orchestration and governance pattern references" +``` + +--- + +## Step 25: Create MCP integration reference + +### Files to create + +**`skills/agent-system-design/references/mcp-integrations.md`**: + +```markdown +# MCP Integration Reference + +Reference for the agent-system-design skill. Covers MCP servers for communication, +data, browser automation, and custom integrations. Includes .mcp.json examples, +agent type recommendations, and security considerations. + +--- + +## What is MCP + +MCP (Model Context Protocol) is an open protocol that lets Claude Code connect to +external services via MCP servers. MCP servers expose tools that Claude can call +just like built-in tools (Read, Write, Bash). The connection is configured in +`.mcp.json` in the project root or `~/.claude/.mcp.json` for global servers. + +**Important:** MCP servers are available only in local deployments (cron, launchd, +Docker, VPS). Managed agents (Anthropic API `/v1/agents`) do not support MCP servers. + +--- + +## 1. 
Communication Integrations + +### Slack — `@anthropic-ai/mcp-server-slack` + +Lets agents send and read Slack messages. Most useful for: notifications from +heartbeat agents, approval gate requests, daily summaries. + +```json +{ + "mcpServers": { + "slack": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-slack"], + "env": { + "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}", + "SLACK_TEAM_ID": "${SLACK_TEAM_ID}" + } + } + } +} +``` + +**Agent types that benefit:** Orchestrators (send status updates), approval-gate +hooks (request approvals), monitoring agents (alert on threshold breach). + +**Security considerations:** +- Use a dedicated bot account, not a personal token +- Scope the bot to specific channels (don't grant workspace-wide access) +- Rotate the bot token every 90 days +- Store `SLACK_BOT_TOKEN` in environment (macOS Keychain → `~/.zshenv`), never in `.mcp.json` + +### GitHub — `@anthropic-ai/mcp-server-github` + +Lets agents read and write GitHub issues, PRs, and files. Most useful for: +PR review agents, issue triage, automated release notes. + +```json +{ + "mcpServers": { + "github": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-github"], + "env": { + "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" + } + } + } +} +``` + +**Agent types that benefit:** Code review agents, release managers, issue triagers. + +**Security considerations:** +- Use a fine-grained personal access token scoped to specific repos +- Grant only the permissions the agent actually needs (read-only if it only comments) +- Do not use a classic token with full repo scope + +### Linear — `@linear/mcp-server-linear` (via plugin) + +Lets agents create, update, and query Linear issues and projects. 
+ +```json +{ + "mcpServers": { + "linear": { + "command": "npx", + "args": ["-y", "@linear/mcp-server"], + "env": { + "LINEAR_API_KEY": "${LINEAR_API_KEY}" + } + } + } +} +``` + +**Agent types that benefit:** Project management agents, sprint planners, backlog +maintainers. + +**Security considerations:** +- Use a Linear API key scoped to the specific team or workspace +- An agent that can create issues can also spam your backlog — scope the ADL carefully + +--- + +## 2. Data Integrations + +### PostgreSQL + +Lets agents query and write to a PostgreSQL database. Most useful for: data +analysis agents, state storage beyond flat files, reporting pipelines. + +```json +{ + "mcpServers": { + "postgres": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-postgres", + "postgresql://localhost/mydb"] + } + } +} +``` + +**Agent types that benefit:** Data analysis agents, audit log writers (more +durable than JSONL), reporting agents. + +**Security considerations:** +- Use a dedicated database user with minimal privileges (SELECT + INSERT only if the agent only reads and logs) +- Never use the postgres superuser +- Prefer a local socket connection over TCP if the database is on the same machine +- Connection string contains credentials — load from environment, never hardcode + +### SQLite + +File-based SQL database. No server required. Good for single-agent state that +outgrows flat files but does not need PostgreSQL. + +```json +{ + "mcpServers": { + "sqlite": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-sqlite", + "--db-path", "/path/to/agent.db"] + } + } +} +``` + +**Agent types that benefit:** Single-agent systems where JSONL is too unstructured +but PostgreSQL is too heavy. 
+ +**Security considerations:** +- The SQLite file is a plain file — apply filesystem permissions (`chmod 600`) +- Back up the file before each heartbeat run if it contains critical state + +### Filesystem (extended) + +The built-in Read/Write/Bash tools cover most filesystem needs. The MCP filesystem +server adds directory listing, search, and recursive operations that are otherwise +verbose with raw Bash. + +```json +{ + "mcpServers": { + "filesystem": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-filesystem", + "/path/to/allowed/directory"] + } + } +} +``` + +**Security considerations:** +- The path argument is the root of allowed access — set it to the project directory, not `/` +- Add a `PreToolUse` hook to log filesystem MCP tool calls alongside native tool calls + +--- + +## 3. Browser Automation + +### Playwright — `@anthropic-ai/mcp-server-playwright` + +Lets agents control a browser: navigate, click, fill forms, take screenshots, +extract content. Most useful for: web scraping, form automation, UI testing, +competitive research. + +```json +{ + "mcpServers": { + "playwright": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-playwright"] + } + } +} +``` + +**Agent types that benefit:** Research agents (deep web extraction), monitoring +agents (check site availability or content), automation agents (form submission). + +**Security considerations:** +- Browser automation can interact with authenticated sessions — ensure the agent + runs with a clean browser profile, not your personal browser profile +- Headless mode is appropriate for server deployments; headed mode for local debugging +- Rate-limit scraping tasks to avoid being blocked or causing load on target sites +- Never store session cookies in version control + +--- + +## 4. Custom MCP Servers + +When no existing MCP server covers your integration, build one using the +Anthropic SDK. 
+
+### Minimal TypeScript MCP server
+
+```typescript
+// Uses the official MCP TypeScript SDK (@modelcontextprotocol/sdk).
+// callExternalService is a placeholder for your integration logic.
+import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
+import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
+import { z } from "zod";
+
+const server = new McpServer({
+  name: "my-custom-server",
+  version: "0.1.0",
+});
+
+server.tool(
+  "my_tool",
+  "Does something useful",
+  { param: z.string().describe("The input parameter") },
+  async ({ param }) => {
+    // Your integration logic here
+    const result = await callExternalService(param);
+    return {
+      content: [{ type: "text", text: result }],
+    };
+  }
+);
+
+const transport = new StdioServerTransport();
+await server.connect(transport);
+```
+
+```json
+{
+  "mcpServers": {
+    "my-custom-server": {
+      "command": "node",
+      "args": ["/path/to/my-custom-server/index.js"],
+      "env": {
+        "MY_API_KEY": "${MY_API_KEY}"
+      }
+    }
+  }
+}
+```
+
+**When to build custom:**
+- The target API has no existing MCP server
+- Existing MCP servers expose more than the agent needs (reduce attack surface)
+- You need to enforce business logic in the server (e.g., rate limits, audit logging)
+- The integration requires local state that a remote MCP server cannot hold
+
+---
+
+## 5. Integration with Agent Factory
+
+The Agent Factory builder configures `.mcp.json` in Phase 3 of the
+`/agent-factory:build` workflow:
+
+1. **Phase 3 — MCP configuration:** Builder asks which external services the
+   agent system needs. For each selected service, it appends the appropriate
+   server entry to `.mcp.json` and adds the required environment variable to
+   `hooks/env-check.sh`.
+
+2. **Phase 5 — Deployment:** The `deployment-advisor` agent checks MCP
+   availability for the chosen deployment target and warns if managed agents
+   are selected with MCP dependencies.
+
+3. 
**Phase 6 — Security review:** The security review step checks that all + MCP server tokens are loaded from environment variables (not hardcoded), + that each server is scoped to the minimum required permissions, and that + the `.mcp.json` is not committed with secrets. + +### Checking MCP availability + +```bash +# Verify MCP servers are connected +claude mcp list + +# Test a specific server +claude mcp test slack +``` + +### Keeping .mcp.json out of version control + +If `.mcp.json` contains sensitive paths or server configs that vary by environment, +add it to `.gitignore` and provide a `.mcp.json.template` with placeholders: + +```gitignore +.mcp.json +``` + +```json +{ + "mcpServers": { + "slack": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-slack"], + "env": { + "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}", + "SLACK_TEAM_ID": "${SLACK_TEAM_ID}" + } + } + } +} +``` + +The template uses `${VAR}` syntax — shell environment variables, not hardcoded values. +``` + +### Verify + +```bash +test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md && echo "EXISTS" +``` +Expected: `EXISTS` + +### On failure: skip (supplementary reference, does not block other steps) + +### Checkpoint +```bash +git commit -m "docs(skills): add MCP integration reference" +``` + +--- + +## Exit Condition + +- [ ] `grep -c "memory-patterns\|autonomy-patterns\|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → `>= 3` +- [ ] `grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → `>= 5` +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/memory-patterns.md` → exit 0 +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/autonomy-patterns.md` → exit 0 +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/orchestration-patterns.md` → exit 
0 +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/governance-patterns.md` → exit 0 +- [ ] `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md` → exit 0 +- [ ] SKILL.md description field contains trigger phrases from both Step 13 and Step 19 (composable) +- [ ] All reference files contain section headers matching their names (grep for "## 1.") +- [ ] Step 13 and Step 19 diffs do not overlap — each diff targets unique text anchors + +## Quality Criteria + +- SKILL.md description field triggers match the pattern of the existing trigger phrases (quoted, comma-separated) +- System components table rows added in Step 13 and Step 19 do not duplicate each other +- Memory patterns reference covers all three tiers with precise write/read order +- WAL protocol entry includes a concrete example, not just a description +- Autonomy patterns reference includes VFM worked examples with numeric scores and outcomes +- Self-healing protocol specifies the 5-attempt cap explicitly (never open-ended retries) +- Orchestration patterns comparison matrix covers all three scheduling approaches (OpenClaw cron, Paperclip heartbeat, /schedule) +- Governance patterns reference states the "autonomy is a privilege" philosophy in concrete terms with progression criteria +- MCP reference includes `.mcp.json` examples for each server — copy-paste ready +- MCP reference notes the managed-agents limitation (no MCP support) explicitly +- All `${CLAUDE_PLUGIN_ROOT}` paths point to files that will exist after Sessions 3 and 4 +- No security considerations section uses vague language — each point is a specific, actionable instruction diff --git a/.claude/plans/blueprints/session-8-finalization.md b/.claude/plans/blueprints/session-8-finalization.md new file mode 100644 index 0000000..4cc2c73 --- /dev/null +++ b/.claude/plans/blueprints/session-8-finalization.md @@ -0,0 +1,937 @@ +# Session 8: Build Command Integration and 
Finalization + +> Steps 8, 26, 27 | Wave 3 | Depends on: Sessions 1, 2, 7 + +## Dependencies + +Entry condition: Session 1 complete (commands renamed to `/agent-factory:*`), Session 2 complete (5 domain templates exist in `scripts/templates/domains/`), Session 7 complete (skill references updated in `skills/agent-system-design/`) + +## Scope Fence + +**Touch:** +- `commands/build.md` +- `.claude-plugin/plugin.json` +- `CLAUDE.md` +- `README.md` + +**Never touch:** +- `scripts/templates/` (read-only — already created in Sessions 3-6) +- `skills/` (read-only — already updated in Session 7) +- `agents/` (no changes needed) + +--- + +## Step 8: Update build command to use domain templates and new features + +### Files to modify + +**`commands/build.md`** — Apply the following changes: + +**Change 1 — Rename command reference in line 7:** + +Old: +``` +You are running `/agent-builder:build` — a guided 7-phase workflow for building a complete autonomous agent system with Claude Code. +``` + +New: +``` +You are running `/agent-factory:build` — a guided 7-phase workflow for building a complete autonomous agent system with Claude Code. +``` + +**Change 2 — Add Phase 0 before Phase 1:** + +Insert the following section immediately before the `## Phase 1: Map Your Work` heading: + +```markdown +--- + +## Phase 0: Choose a Starting Point + +Goal: Offer domain templates to accelerate the design process. + +Ask the user using AskUserQuestion: +"Would you like to start from a domain template? Templates pre-populate the agent roles and pipeline structure for common use cases. + +Available templates: +1. **content-pipeline** — Articles, newsletters, reports. Agents: researcher, writer, reviewer. +2. **code-review** — Automated PR review. Agents: analyzer, review-writer, standards-checker. +3. **monitoring** — System/service monitoring. Agents: monitor-checker, incident-reporter, remediation-advisor. +4. **research-synthesis** — Research and analysis. 
Agents: source-gatherer, synthesizer, fact-checker. +5. **data-processing** — Data transformation. Agents: data-validator, transformer, quality-checker. +6. **custom** — Blank start. Answer the Phase 1 questions yourself. + +Enter a template name or 'custom'." + +If a template is chosen (not 'custom'): +- Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md` where `{{TEMPLATE}}` is the chosen name +- Extract the agent definitions and pipeline skill template from the file +- Pre-populate the Phase 1 design sketch using the template's roles and pipeline steps +- Tell the user: "I've loaded the [template] template. In Phase 1, I'll show you the pre-built design and you can customize it." +- Skip Phase 1's interview questions and go directly to showing the pre-populated design sketch +- Ask the user: "Does this design sketch match your use case? What would you change?" +- Incorporate any changes before proceeding to Phase 2. + +If 'custom' is chosen: +- Proceed normally to Phase 1. +``` + +**Change 3 — Replace Phase 6 (Deployment) entirely:** + +Replace the entire `## Phase 6: Deployment` section with: + +```markdown +## Phase 6: Deployment + +Goal: Configure automated scheduling for the pipeline. + +Use the `/agent-factory:deploy` command to handle deployment configuration. Tell the user: +"I'll hand off to the deploy command to configure your deployment target." + +Then invoke `/agent-factory:deploy` passing the pipeline name and agent list as context. + +If the user wants to configure deployment inline without switching commands, ask using AskUserQuestion: + +"Where will this pipeline run? Choose a deployment target: + +1. **/schedule (cloud)** — Anthropic's cloud scheduler. Needs GitHub repo. No local file access. 1-hour minimum interval. Simplest option if your project is on GitHub. +2. **Desktop scheduled tasks** — Runs on your machine via Claude Code Desktop. Local file access. 1-minute minimum interval. Best for local automation. +3. 
**Local (cron/launchd)** — Traditional scheduler. Runs headless via `claude -p`. Full local access. Best for personal daily pipelines. +4. **VPS (systemd)** — Linux server with systemd service + timer. Always-on. Best for team pipelines and production workloads. +5. **Docker** — Containerized agent. Portable, isolated, reproducible. Best for consistent environments and security isolation." + +For each target, follow the deployment steps from `${CLAUDE_PLUGIN_ROOT}/commands/deploy.md`. + +Tell the user what files were created and the exact commands needed to activate the schedule. +``` + +**Change 4 — Update Summary section:** + +Replace the summary block at the end of the file with: + +```markdown +## Summary + +After all phases are complete (or the user stops early), print a summary: + +``` +AGENT SYSTEM BUILT +================== +Agents: .claude/agents/[list] +Pipeline: .claude/skills/[name].md +Hooks: .claude/hooks/pre-tool-use.sh, post-tool-use.sh +Settings: .claude/settings.json +Schedule: automation/[script or config] + +Run your pipeline: /[pipeline-name] [your topic] +Review logs: .claude/hooks/audit.log +Evaluate system: /agent-factory:evaluate +Check status: /agent-factory:status +``` + +If the user stopped early, tell them which phase to continue from and remind them that this command can be re-invoked at any time. 
+ +Next steps: +- Run `/agent-factory:evaluate` to score your system against the 22 agent capabilities +- Run `/agent-factory:status` for a quick health check of all components +- Run `/agent-factory:deploy` to configure or change your deployment target +``` + +### Verify + +```bash +grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md +``` +Expected: `>= 3` + +### On failure +revert — build command is critical path + +### Checkpoint +```bash +git commit -m "feat(commands): integrate domain templates and new commands into build workflow" +``` + +--- + +## Step 26: Update build command to integrate all Phase 2-5 features + +### Files to modify + +**`commands/build.md`** — Build on Step 8's changes. Apply the following additions: + +**Change 1 — Add Phase 2.5 after Phase 2:** + +Insert the following section immediately after the `## Phase 2: Operating Manual` section (after the line "Tell the user: "Created CLAUDE.md. Review it and edit freely — this is your operating manual, not mine.""): + +```markdown +--- + +## Phase 2.5: Memory Setup + +Goal: Optionally configure 3-tier memory for agents that need to remember state across runs. + +Ask the user using AskUserQuestion: +"Would you like to set up 3-tier memory for your agents? This lets agents remember context across sessions. + +Memory tiers: +- **Session state** (hot) — Working memory, updated every turn. Prevents data loss on crash. Agents write here before responding. +- **Daily logs** (warm) — One file per day. Captures decisions, files modified, carry-forward items. +- **Long-term memory** (cold) — Curated knowledge that persists indefinitely. Agents read this on startup. + +This is recommended for pipelines that run daily and need to track progress over time. Skip if this is a one-shot or stateless pipeline." + +If yes: +1. Create the `memory/` directory in the user's project +2. 
Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/SESSION-STATE.md` and copy to `memory/SESSION-STATE.md`, replacing `{{AGENT_NAME}}` with the main pipeline agent name +3. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/MEMORY.md` and copy to `memory/MEMORY.md` +4. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/DAILY-LOG.md` — explain to the user that daily log files are created automatically with the pattern `memory/YYYY-MM-DD.md` +5. Tell each agent to read `memory/SESSION-STATE.md` first on startup by adding this instruction to each agent's "How you work" section: + "1. Read memory/SESSION-STATE.md for current task context (write your intent here before responding)" +6. Tell the user: "Memory configured. Agents will now persist state across runs. The WAL protocol ensures no data loss on crash." + +If no: skip and continue to Phase 3. +``` + +**Change 2 — Add Phase 3.5 after Phase 3:** + +Insert the following section immediately after the `## Phase 3: Agent Team` section (after the "Make adjustments before moving to Phase 4" line): + +```markdown +--- + +## Phase 3.5: Skills and Custom Components + +Goal: Wire up specialized skills and configure the pause/resume mechanism for missing dependencies. + +Ask the user using AskUserQuestion: +"Do your agents need any specialized knowledge or behaviors beyond what's been generated? For example: writing style guides, domain-specific rules, external API patterns, or custom tool workflows." + +If yes, for each described need: +1. Check if a matching skill already exists in the project: `Glob for .claude/skills/*.md` and `.claude/skills/*/SKILL.md` +2. If a matching skill exists: note it and confirm it will be auto-loaded by Claude Code +3. If no matching skill exists, ask: "For [described need], I can: + (a) Generate a skill skeleton now that you fill in + (b) Pause and let you build it first, then resume the build + Which do you prefer?" 
+ +If user chooses (a): generate `.claude/skills/[name].md` with basic frontmatter and placeholder content + +If user chooses (b) — pause and resume: +1. Write `build-state.json` to the project root with the current state: +```json +{ + "phase": "3.5", + "completed": ["0", "1", "2", "2.5", "3"], + "choices": { + "template": "[chosen template or 'custom']", + "agents": ["[agent names]"], + "memory": true, + "pipeline_name": "[name]" + }, + "paused_reason": "User building custom skill: [skill name]", + "resume_instructions": "Run /agent-factory:build --resume when the skill is ready" +} +``` +2. Tell the user: "Build paused. Create your skill at `.claude/skills/[name].md`, then run `/agent-factory:build --resume` to continue from Phase 3.5." +3. Stop — do not proceed further until resumed. + +On `--resume` (when $ARGUMENTS contains "--resume"): +1. Read `build-state.json` from the project root +2. Print a summary: "Resuming build. Completed phases: [list]. Paused at: [phase]. Reason: [reason]." +3. Continue from the saved phase using the saved choices. +``` + +**Change 3 — Add Phase 3.7 after Phase 3.5:** + +Insert the following section immediately after the Phase 3.5 section: + +```markdown +--- + +## Phase 3.7: Proactive Agent (Optional) + +Goal: Enable self-improving behavior for agents that should operate autonomously over time. + +Ask the user using AskUserQuestion: +"Should any of your agents be proactive — able to identify improvements to their own behavior and implement them within guardrails? + +Proactive agents use two protection mechanisms: +- **ADL (Anti-Drift Limits)** — Prevents agents from faking capabilities, making unverifiable changes, or expanding scope without approval +- **VFM (Value-First Modification)** — Requires each proposed change to score >50/100 on: frequency (0-25), failure reduction (0-25), burden reduction (0-25), cost savings (0-25) + +This is recommended for long-running pipelines where you want the agent to improve itself over time. 
Skip if you want full manual control." + +If yes: +1. Ask: "Which agent(s) should be proactive?" and list the agents from Phase 3 +2. For each chosen agent: + a. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/ADL-RULES.md` and append its contents to the agent's `.claude/agents/[name].md` as a new "## Anti-Drift Limits" section + b. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/VFM-SCORING.md` and append as a new "## Value-First Modification" section +3. Show the user an example VFM scoring calculation: "Before adding a new step to my pipeline, I score it: Frequency (how often this gap causes issues): 20/25. Failure reduction: 18/25. Burden reduction: 15/25. Cost savings: 5/25. Total: 58/100 → implement." +4. Tell the user: "Proactive guardrails added. Agents will self-improve within ADL/VFM boundaries." + +If no: skip and continue to Phase 4. +``` + +**Change 4 — Add Phase 4.5 after Phase 4:** + +Insert the following section immediately after the `## Phase 4: Pipeline` section (after "Tell the user: "Created .claude/skills/[pipeline-name].md. Run it with /[pipeline-name] [your topic].""): + +```markdown +--- + +## Phase 4.5: Integrations and MCP Servers + +Goal: Connect agents to external services they need to do real work. + +Ask the user using AskUserQuestion: +"What external services do your agents need to interact with? For example: Slack, GitHub, Linear, databases, REST APIs, file systems, browsers." + +If none: skip and continue to Phase 5. + +For each named service: +1. Check if a known MCP server exists for it: + - Slack → `@anthropic-ai/mcp-server-slack` + - GitHub → available as MCP server + - Playwright/browser → `@anthropic-ai/mcp-server-playwright` + - PostgreSQL/SQLite → community MCP servers available + - Linear, Jira, Notion → community MCP servers available + - Read `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/mcp-integrations.md` for the full list + +2. If MCP server exists: + a. 
Generate a `.mcp.json` entry for it (create or append to `.mcp.json` in project root): +```json +{ + "mcpServers": { + "[service-name]": { + "command": "npx", + "args": ["-y", "@anthropic-ai/mcp-server-[name]"], + "env": { + "[SERVICE]_API_KEY": "${[SERVICE]_API_KEY}" + } + } + } +} +``` + b. Tell the user: "Add [SERVICE]_API_KEY to your environment. For local runs: export it in ~/.zshenv. For Docker: add it to .env." + +3. If no MCP server exists for the service: + a. Explain: "There's no pre-built MCP server for [service]. You have three options: + (a) Use the Bash tool directly — simpler but less structured. Good for REST APIs with curl. + (b) Find a community MCP server — search: https://github.com/modelcontextprotocol/servers + (c) Build a custom MCP server — I can generate a skeleton using the `/mcp-builder` skill." + b. Ask which option they prefer + c. If option (a): add Bash tool to the agent's tools list and note the API pattern in the agent's system prompt + d. If option (b): pause and provide search instructions + e. If option (c) — pause and resume: + - Write build-state.json with phase "4.5", add current MCP choices to "choices" + - Tell the user: "Build paused. Build your MCP server, then run `/agent-factory:build --resume`." + - Stop. + +After all integrations: +1. Validate `.mcp.json` syntax: `python3 -c "import json; json.load(open('.mcp.json'))"` — tell user if invalid +2. Tell the user: "MCP integrations configured. Note: `/schedule` cloud tasks cannot use MCP servers. For cloud deployment, agents must use Bash with direct API calls instead." +``` + +**Change 5 — Add Phase 5.5 after Phase 5:** + +Insert the following section immediately after the `## Phase 5: Security` section (after "Tell the user: "Created hooks and settings.json. 
The audit log will be written to .claude/hooks/audit.log after each tool call.""): + +```markdown +--- + +## Phase 5.5: Governance + +Goal: Define how much autonomy the agent system has and where humans stay in the loop. + +Ask the user using AskUserQuestion: +"What autonomy level do you want for your agent system? + +- **Level 0** — Full manual approval. Every tool call requires your OK. +- **Level 1** — Auto-approve reads only. (Read, Glob, Grep run freely. All writes need approval.) +- **Level 2** — Auto-approve file operations within the project. (Writes to project dir run freely.) +- **Level 3** — Auto-approve all except destructive operations. (Bash with non-destructive commands runs freely.) +- **Level 4** — Full autonomy with hooks as guardrails. (Everything runs unless blocked by a hook rule.) + +For most personal pipelines: Level 3 or 4. For shared or production systems: Level 1 or 2." + +Based on the chosen level: +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/GOVERNANCE.md` +2. Copy to `GOVERNANCE.md` in the project root +3. Replace `{{AUTONOMY_LEVEL}}` with the chosen level +4. Replace `{{PROJECT_NAME}}` with the project name from Phase 2 +5. Ask: "Are there any specific approval gates — operations that must always pause for your review? For example: sending emails, deleting files, committing to main branch." +6. For each gate: add an entry to the Approval Gates section in GOVERNANCE.md + +Ask using AskUserQuestion: +"Do you want to track agent spending? Budget tracking records estimated API costs per run and can pause agents that exceed a monthly limit." + +If yes: +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/BUDGET.md` and copy to `budget/BUDGET.md` +2. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/budget-hook.sh` and copy to `.claude/hooks/budget-hook.sh` +3. Make budget-hook.sh executable: `chmod +x .claude/hooks/budget-hook.sh` +4. Ask: "What monthly budget limit in USD?" 
and set `{{BUDGET_LIMIT_CENTS}}` to (USD * 100) +5. Add budget-hook.sh to PostToolUse hooks in `.claude/settings.json` +6. Tell the user: "Budget tracking configured. Costs are estimated from token counts. A warning triggers at 80% and agents pause at 100% of the monthly limit." +``` + +**Change 6 — Add Phase 5.7 after Phase 5.5:** + +Insert the following section immediately after the Phase 5.5 section: + +```markdown +--- + +## Phase 5.7: Goals and Org Chart (Multi-Agent Systems) + +Goal: For systems with 3 or more agents, define the hierarchy and shared goals. + +Count the agents created in Phase 3. If fewer than 3: skip this phase entirely. + +If 3 or more agents: + +Ask the user using AskUserQuestion: +"Your system has [N] agents. Would you like to define: +(a) A **goal hierarchy** — what this system is trying to achieve at the company, project, and task levels +(b) An **org chart** — which agents report to which, and who has what authority +(c) Both +(d) Skip — I'll manage coordination manually" + +If goals (a or c): +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/goals/GOALS.md` and copy to `GOALS.md` in the project root +2. Ask: "What is the top-level goal this agent system serves? (e.g., 'Publish 3 high-quality articles per week')" +3. For each agent, ask: "What does [agent name] contribute toward that goal?" +4. Populate the Goals file with the hierarchy using dot-notation IDs: G1 (company goal) → G1.1 (project goal) → G1.1.1 (this agent's task goal) +5. Tell the user: "GOALS.md created. Agents can reference their goal ID to stay aligned with the overall mission." + +If org chart (b or c): +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/org-chart/ORG-CHART.md` and copy to `ORG-CHART.md` in the project root +2. Ask: "Which agent is the top-level orchestrator? (The one that coordinates others.)" +3. For remaining agents: ask "Does [agent] report to [orchestrator] or another agent?" +4. Populate the ORG-CHART.md table with `reportsTo` references +5. 
Tell the user: "ORG-CHART.md created. You (the human operator) are always the 'board' with override authority on all agents." +``` + +**Change 7 — Update Phase 6 deployment options** (replace the Phase 6 written in Step 8 with this expanded version): + +Replace the `## Phase 6: Deployment` section written in Step 8 with: + +```markdown +## Phase 6: Deployment + +Goal: Configure automated scheduling for the pipeline. Present ALL deployment options with clear trade-offs. + +Use the `/agent-factory:deploy` command to handle deployment configuration. Tell the user: +"I'll hand off to the deploy command to configure your deployment target." + +Then invoke `/agent-factory:deploy` passing the pipeline name, agent list, and any MCP integrations from Phase 4.5 as context. + +If the user wants to configure deployment inline, ask using AskUserQuestion: + +"Choose a deployment target. Here are the trade-offs: + +| Target | Needs | Minimum interval | Local files | Best for | +|--------|-------|-----------------|-------------|----------| +| /schedule (cloud) | GitHub repo | 1 hour | No | Repo maintenance, CI triage, PR review | +| Desktop tasks | Desktop app running | 1 minute | Yes | Local automation while you work | +| Local cron/launchd | Machine on | Any | Yes | Personal daily pipelines | +| VPS systemd | Linux server | Any | Yes | Team pipelines, production | +| Docker | Docker installed | Any | Via volumes | Isolated, portable, reproducible | + +Which target? (You can configure multiple.)" + +For `/schedule (cloud)`: +1. Check GitHub connection. If not connected: explain the GitHub repo requirement. +2. Warn: "Cloud tasks cannot access local files or MCP servers. Only GitHub repo contents and allowed MCP connectors are accessible." +3. Generate a task prompt from the pipeline skill. Guide the user through `/schedule` to create the task. + +For Desktop scheduled tasks: +1. Explain: "Desktop tasks run on your machine with full local file access and MCP servers available." 
+2. Guide through the Desktop app Schedule page. Recommend `bypassPermissionsModeAllowed: true` for unattended runs. + +For Local (cron/launchd): +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh`. Copy to `automation/run-pipeline.sh` and customize. +2. Ask: "What schedule? (e.g., daily at 07:00)" +3. For macOS: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/launchd.plist`, customize, save to `automation/` +4. Provide activation: `launchctl load ~/Library/LaunchAgents/[label].plist` + +For VPS (systemd): +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/systemd-service.unit`. Copy and customize to `automation/claude-pipeline.service`. +2. Ask: "What Linux user runs the agent?" and "Absolute path to project?" +3. Provide activation: `systemctl enable --now claude-pipeline.timer` + +For Docker: +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/docker/Dockerfile` and `docker-compose.yml`. Copy and customize to project root. +2. Warn: "Never bake ANTHROPIC_API_KEY into the image. Use .env file or environment injection." +3. Provide activation: `docker compose up -d` + +Tell the user what files were created and the exact commands needed to activate the schedule. +``` + +**Change 8 — Update Phase 7:** + +Append the following to the end of the `## Phase 7: Test and Iterate` section (after "Implement that one change before closing the session."): + +```markdown + +After the test run, ask using AskUserQuestion: +"Would you like to set up a feedback loop? This records reviewer scores and recurring issues so you can track pipeline quality over time and identify which agents need tuning." + +If yes: +1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/feedback/FEEDBACK.md` and copy to `feedback/FEEDBACK.md` +2. Tell the user: "Log pipeline results to feedback/FEEDBACK.md after each run. After your first week, run `scripts/templates/optimization/pipeline-optimizer.sh` to get specific recommendations for improving your agents." 
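
A sample entry can show the user what to log. A sketch (the field names here are illustrative assumptions — the `feedback/FEEDBACK.md` template copied in step 1 defines the canonical format):

```
## 2026-04-12 — run 3
- Reviewer score: 7/10
- Recurring issue: writer agent omits source links (2nd occurrence)
- Carry-forward: tighten the writer prompt if this recurs
```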
+``` + +**Change 9 — Update the Summary section** (replace the summary from Step 8 with the full version): + +Replace the `## Summary` section written in Step 8 with: + +```markdown +## Summary + +After all phases are complete (or the user stops early), print a summary: + +``` +AGENT SYSTEM BUILT +================== +Agents: .claude/agents/[list] +Pipeline: .claude/skills/[name].md +Hooks: .claude/hooks/pre-tool-use.sh, post-tool-use.sh +Settings: .claude/settings.json +Schedule: automation/[script or config] +Memory: memory/ (3-tier: SESSION-STATE, daily logs, MEMORY.md) +Governance: GOVERNANCE.md (Level [N]) +Budget: budget/BUDGET.md + .claude/hooks/budget-hook.sh +Goals: GOALS.md +Org chart: ORG-CHART.md +MCP config: .mcp.json +Feedback: feedback/FEEDBACK.md + +Run your pipeline: /[pipeline-name] [your topic] +Review logs: .claude/hooks/audit.log +Evaluate system: /agent-factory:evaluate +Check status: /agent-factory:status +Deploy/reschedule: /agent-factory:deploy +``` + +If build was paused (build-state.json exists): +- Tell the user: "Build paused at Phase [phase]. Run `/agent-factory:build --resume` to continue." +- List completed phases and remaining phases. + +If the user stopped early without pausing: +- Tell them which phase to continue from and remind them that this command can be re-invoked at any time. + +Next steps after first week: +- Run `scripts/templates/optimization/pipeline-optimizer.sh` after collecting feedback +- Run `/agent-factory:evaluate` to score your system against all 22 capabilities +``` + +**Change 10 — Add build-state.json resume handling at top of file:** + +Add the following paragraph immediately after the opening description paragraph (after "Work through phases sequentially..."): + +```markdown +**Resume mode:** If $ARGUMENTS contains `--resume`, read `build-state.json` from the current directory. Print: "Resuming build — completed phases: [list]. Paused at Phase [phase]. Reason: [reason]. Continuing..." 
Then skip all completed phases and continue from the saved phase using the saved choices. If build-state.json does not exist, tell the user: "No saved build state found. Starting fresh." and proceed normally. + +**build-state.json schema:** +```json +{ + "phase": "3.5", + "completed": ["0", "1", "2", "2.5", "3"], + "choices": { + "template": "monitoring", + "agents": ["monitor-checker", "incident-reporter", "remediation-advisor"], + "pipeline_name": "my-monitoring", + "memory": true, + "proactive": false, + "mcp_servers": ["slack"], + "autonomy_level": 3, + "budget": true, + "goals": false, + "org_chart": false, + "deployment_target": "local" + }, + "paused_reason": "User creating custom MCP server for internal ticketing system", + "resume_instructions": "Run /agent-factory:build --resume when MCP server is ready" +} +``` +``` + +### Verify + +```bash +wc -l /Users/ktg/repos/agent-builder/commands/build.md +``` +Expected: `>= 600` (was 390 before Step 8, Step 26 should bring it to 600+) + +Also verify the resume mechanism is present: +```bash +grep -c "build-state.json" /Users/ktg/repos/agent-builder/commands/build.md +``` +Expected: `>= 3` + +### On failure +revert — build command is critical path, must be valid + +### Checkpoint +```bash +git commit -m "feat(commands): integrate all Phase 2-5 features into build workflow" +``` + +--- + +## Step 27: Update plugin.json, CLAUDE.md, README.md for v1.0 + +### Files to modify + +**`.claude-plugin/plugin.json`** — Replace content with: + +```json +{ + "name": "agent-factory", + "description": "Build and manage autonomous agent systems with Claude Code. Guided workflow with 3-tier memory, heartbeat scheduling, budget tracking, governance, org-chart, 10 domain templates, Docker deployment, and import/export. 
Inspired by OpenClaw and Paperclip patterns.", + "version": "1.0.0", + "author": { + "name": "Kjell Tore Guttormsen", + "url": "https://fromaitochitta.com" + }, + "repository": "https://git.fromaitochitta.com/open/agent-factory", + "license": "MIT", + "keywords": [ + "agent", + "autonomous", + "pipeline", + "automation", + "hooks", + "security", + "deployment", + "memory", + "heartbeat", + "budget", + "governance", + "org-chart", + "templates", + "import", + "export" + ] +} +``` + +**`CLAUDE.md`** — Replace content with: + +```markdown +# Agent Factory Plugin + +Plugin that guides users through building complete autonomous agent systems +using Claude Code. Install via `/install agent-factory` or `--plugin-dir`. + +## What this plugin does + +Guides users through building their own multi-agent system end to end: +agents, skills, hooks, 3-tier memory, heartbeat scheduling, budget +tracking, governance, org-chart, MCP integrations, and deployment. +The `/agent-factory:build` command runs a guided workflow covering +all major agent system capabilities. 
+ +## Plugin structure + +- `commands/` — 4 user-invoked slash commands + - `build.md` — Guided build workflow (14 phases including memory, governance, integrations) + - `deploy.md` — Configure deployment (5 targets: cloud, desktop, cron/launchd, systemd, Docker) + - `evaluate.md` — Score system against 22 agent capabilities + - `status.md` — Quick health check of all infrastructure +- `agents/` — 2 plugin agents + - `builder.md` — Guided build orchestrator + - `deployment-advisor.md` — Deployment target recommendation agent +- `skills/` — 2 auto-triggering knowledge skills + - `agent-system-design/` — 22-capability feature map, pipeline patterns, security, deployment, memory, autonomy, orchestration, governance, MCP references + - `managed-agents/` — Anthropic API managed agents patterns and SDK examples +- `scripts/templates/` — File templates the builder copies into the user's project + +## Template directories + +All templates are in `scripts/templates/`: + +| Directory | Contents | +|-----------|----------| +| `memory/` | 3-tier memory: SESSION-STATE.md, DAILY-LOG.md, MEMORY.md | +| `heartbeat/` | Heartbeat runner, HEARTBEAT.md, context-packet.md, wake-prompt.md | +| `proactive/` | PROACTIVE-AGENT.md, ADL-RULES.md, VFM-SCORING.md | +| `cron/` | agent-turn.sh (isolated agentTurn), system-event.sh | +| `goals/` | GOALS.md goal hierarchy, goal-tracker.sh | +| `budget/` | BUDGET.md policy, budget-hook.sh, budget-report.sh | +| `governance/` | GOVERNANCE.md, approval-gate.sh | +| `org-chart/` | ORG-CHART.md, org-manager.sh | +| `docker/` | Dockerfile, docker-compose.yml, docker-entrypoint.sh | +| `domains/` | 10 domain pipeline templates (content, code-review, monitoring, research, data-processing, customer-support, devops, legal, sales, security) | +| `transfer/` | export-system.sh, import-system.sh, MANIFEST.md | +| `feedback/` | FEEDBACK.md, feedback-collector.sh, performance-scorer.sh | +| `optimization/` | pipeline-optimizer.sh, self-healing.sh | + +## Rules + 
+- Never write files outside the user's project directory +- Always ask before overwriting existing files +- Hook templates must be bash 3.2 compatible (Intel Mac) +- Generated agents must have valid YAML frontmatter with `` blocks +- Use `${CLAUDE_PLUGIN_ROOT}` for all intra-plugin paths +- Domain templates use `{{PLACEHOLDER}}` syntax (plain string replace, no engine) +``` + +**`README.md`** — Complete rewrite with: + +```markdown +# Agent Factory + +> Install one plugin. Build complete agent systems that do real work. + +Agent Factory is a Claude Code plugin that guides you through building +autonomous agent systems from scratch — covering every layer from +agents and pipelines to memory, heartbeat scheduling, governance, and +deployment. + +**Recommended installation:** via [ktg-plugin-marketplace](https://git.fromaitochitta.com/open/ktg-plugin-marketplace), which bundles Agent Factory with the ultra-suite (ultraplan, ultraresearch, ultraexecute). + +```bash +# Install from marketplace (recommended) +/install ktg-plugin-marketplace + +# Or install standalone +/install agent-factory +# or +claude --plugin-dir ./agent-factory +``` + +--- + +## Commands + +| Command | What it does | +|---------|-------------| +| `/agent-factory:build` | Guided build workflow — 14 phases from blank to deployed system | +| `/agent-factory:deploy` | Configure deployment for any target (cloud, local, VPS, Docker) | +| `/agent-factory:evaluate` | Score your system against all 22 agent capabilities | +| `/agent-factory:status` | Quick health check: agents, skills, hooks, deployment, memory | + +--- + +## What it builds + +A complete autonomous agent system with: + +**Core infrastructure** +- Agent files (`.claude/agents/*.md`) with reliable trigger patterns +- Pipeline skill that chains agents end to end +- Pre-tool-use and post-tool-use security hooks +- Settings with granular permission allow/deny rules + +**Memory (OpenClaw pattern)** +- **Session state** (hot) — Working memory with
WAL protocol. Written before responding; survives crashes. +- **Daily logs** (warm) — One file per day; captures decisions and carry-forward items. +- **Long-term memory** (cold) — Curated knowledge that persists indefinitely. + +**Heartbeat scheduling (OpenClaw + Paperclip)** +- `HEARTBEAT.md` — Defines scheduled tasks and intervals +- `heartbeat-runner.sh` — Bash 3.2 script with emptiness detection (no API call if nothing to do), startup catchup (catches up to 5 missed tasks after downtime), and task state tracking +- Context injection — `context-packet.md` + `wake-prompt.md` give agents their full state on each wakeup + +**Proactive agents (OpenClaw)** +- Anti-Drift Limits (ADL) — Prevents agents from faking capabilities, expanding scope without approval, or making unverifiable changes +- Value-First Modification (VFM) — Requires proposed self-improvements to score >50/100 before implementation + +**Goal hierarchy (Paperclip)** +- `GOALS.md` — Three-level goal tree: company → project → task +- Simple `parent_id` references (not recursive traversal — matches actual Paperclip implementation) +- Goal IDs in agent prompts for alignment + +**Budget tracking (Paperclip)** +- `BUDGET.md` — Monthly budget policy per agent and per project +- `budget-hook.sh` — PostToolUse hook that logs cost events, warns at 80%, pauses at 100% +- `budget-report.sh` — Per-agent cost breakdown and projection + +**Governance (Paperclip)** +- Autonomy levels 0-4 (from "all calls need approval" to "full autonomy with hook guardrails") +- Approval gates — specific operations that always pause for human review +- `approval-gate.sh` — PreToolUse hook implementing the gate logic +- Audit trail: tool calls, budget events, approval decisions + +**Org chart (Paperclip)** +- `ORG-CHART.md` — Agent hierarchy with `reportsTo` references +- Delegation rules, cross-team routing, human override authority +- `org-manager.sh` — Validates hierarchy, generates org tree visualization + +**MCP integrations** 
+- `.mcp.json` configuration for Slack, GitHub, Playwright, databases +- Pause/resume build state when MCP server creation is needed +- Trade-off guidance: MCP vs Bash vs custom server + +**10 domain templates** + +Start from a pre-built system rather than blank: + +| Template | Domain | Agents | +|----------|--------|--------| +| content-pipeline | Articles, newsletters, reports | researcher, writer, reviewer | +| code-review | Automated PR review | analyzer, review-writer, standards-checker | +| monitoring | System/service monitoring | monitor-checker, incident-reporter, remediation-advisor | +| research-synthesis | Research and analysis | source-gatherer, synthesizer, fact-checker | +| data-processing | Data transformation | data-validator, transformer, quality-checker | +| customer-support | Ticket handling | classifier, response-drafter, escalation-checker | +| devops-automation | Deployment and incidents | deploy-checker, incident-detector, runbook-executor | +| legal-review | Document review | clause-extractor, risk-assessor, compliance-checker | +| sales-intelligence | Prospect research | prospect-researcher, pitch-customizer, follow-up-tracker | +| security-audit | Security posture | config-scanner, vulnerability-checker, remediation-advisor | + +**Deployment (5 targets)** + +| Target | Needs | Local files | Best for | +|--------|-------|-------------|----------| +| `/schedule` (cloud) | GitHub repo | No | Repo maintenance, PR review, CI triage | +| Desktop tasks | Desktop app | Yes | Local automation while you work | +| cron/launchd | Machine on | Yes | Personal daily pipelines | +| VPS systemd | Linux server | Yes | Team pipelines, production | +| Docker | Docker installed | Via volumes | Isolated, portable, reproducible | + +**Import/export** +- `export-system.sh` — Package your agent system into a versioned tarball with MANIFEST.md +- `import-system.sh` — Import a system tarball, check for conflicts, validate all components +- MANIFEST.md — Component 
list with checksums + +**Feedback and self-learning** +- `FEEDBACK.md` — Records pipeline scores and recurring issues +- `feedback-collector.sh` — PostToolUse hook that detects patterns after 3+ occurrences +- `performance-scorer.sh` — Per-agent metrics and improvement trends +- `pipeline-optimizer.sh` — VFM-scored recommendations for bottleneck agents +- `self-healing.sh` — Categorized error recovery with backoff and max-attempt limits + +--- + +## Quick start + +```bash +# Install and run +/install agent-factory +/agent-factory:build + +# Start from a domain template +/agent-factory:build monitoring + +# Resume an interrupted build +/agent-factory:build --resume + +# Evaluate what you've built +/agent-factory:evaluate + +# Check system health +/agent-factory:status +``` + +--- + +## Architecture overview + +``` +Phase 0: Choose template (or custom) +Phase 1: Map your work — pipeline design sketch +Phase 2: Operating manual — CLAUDE.md +Phase 2.5: Memory setup — 3-tier memory +Phase 3: Agent team — .claude/agents/*.md +Phase 3.5: Skills — wire up or generate skill files +Phase 3.7: Proactive agent — ADL/VFM guardrails +Phase 4: Pipeline — .claude/skills/[name].md +Phase 4.5: Integrations — .mcp.json + MCP servers +Phase 5: Security — hooks + settings.json +Phase 5.5: Governance — GOVERNANCE.md + autonomy level +Phase 5.7: Goals and org chart — for 3+ agent systems +Phase 6: Deployment — choose and configure target +Phase 7: Test and iterate — feedback loop setup +``` + +--- + +## Pattern reference + +Patterns implemented by this plugin are documented in the skills: + +- **22 agent capabilities** — `skills/agent-system-design/references/feature-map.md` +- **Memory patterns** — `skills/agent-system-design/references/memory-patterns.md` +- **Autonomy patterns** — `skills/agent-system-design/references/autonomy-patterns.md` +- **Orchestration patterns** — `skills/agent-system-design/references/orchestration-patterns.md` +- **Governance patterns** — 
`skills/agent-system-design/references/governance-patterns.md` +- **MCP integrations** — `skills/agent-system-design/references/mcp-integrations.md` +- **Managed Agents (API)** — `skills/managed-agents/SKILL.md` + +--- + +## Version history + +| Version | What changed | +|---------|-------------| +| 1.0.0 | Full vision: 10 templates, memory, heartbeat, governance, budget, org-chart, Docker, import/export | +| 0.2.0 | Rename to agent-factory, add deploy/evaluate/status commands, managed-agents skill, 5 domain templates | +| 0.1.0 | Initial release: build command, builder agent, agent-system-design skill, 3 hook templates | + +--- + +## Value proposition + +Install one plugin. Build complete agent systems that do real work. No servers, no databases, no configuration files beyond what the agents themselves manage. Everything runs in Claude Code with shell scripts you can read, edit, and understand. +``` + +### Verify + +```bash +python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'; print('OK')" +``` +Expected: `OK` + +Also verify CLAUDE.md lists all 13 template directories: +```bash +grep -c "memory/\|heartbeat/\|proactive/\|cron/\|goals/\|budget/\|governance/\|org-chart/\|docker/\|domains/\|transfer/\|feedback/\|optimization/" /Users/ktg/repos/agent-builder/CLAUDE.md +``` +Expected: `13` + +Also verify README.md has the value proposition line: +```bash +grep -c "No servers, no databases, no configuration" /Users/ktg/repos/agent-builder/README.md +``` +Expected: `1` + +### On failure +revert — manifest must be valid JSON with correct version + +### Checkpoint +```bash +git commit -m "feat: Agent Factory v1.0.0 — full vision realized" +``` + +--- + +## Exit Condition + +- [ ] `grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md` → `>= 3` +- [ ] `wc -l /Users/ktg/repos/agent-builder/commands/build.md | awk '{print ($1 >= 600)}'` → `1` 
+- [ ] `grep -c "build-state.json" /Users/ktg/repos/agent-builder/commands/build.md` → `>= 3` +- [ ] `grep -c "Phase 2.5\|Phase 3.5\|Phase 3.7\|Phase 4.5\|Phase 5.5\|Phase 5.7" /Users/ktg/repos/agent-builder/commands/build.md` → `6` +- [ ] `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'"` → no error +- [ ] `grep -c "memory/\|heartbeat/\|proactive/\|cron/\|goals/\|budget/\|governance/\|org-chart/\|docker/\|domains/\|transfer/\|feedback/\|optimization/" /Users/ktg/repos/agent-builder/CLAUDE.md` → `13` +- [ ] `grep -c "No servers, no databases, no configuration" /Users/ktg/repos/agent-builder/README.md` → `1` +- [ ] `grep -c "1.0.0" /Users/ktg/repos/agent-builder/README.md` → `>= 1` + +## Quality Criteria + +- `commands/build.md` has all 6 new phases (2.5, 3.5, 3.7, 4.5, 5.5, 5.7) in correct phase-number order +- Phase 0 domain template selection asks by name and reads the correct template file via `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md` +- Phase 3.5 pause/resume writes and reads `build-state.json` with the exact schema specified +- Phase 4.5 covers all 3 options for missing MCP servers: Bash fallback, community search, custom build with pause +- Phase 5.5 governance includes both GOVERNANCE.md generation and the optional budget tracking setup +- Phase 5.7 is conditional: only shown for systems with 3+ agents +- Phase 6 deployment table shows all 5 targets with `Local files` and `Minimum interval` columns +- Resume logic (`--resume` in $ARGUMENTS) reads build-state.json and skips completed phases +- `plugin.json` version is exactly `"1.0.0"`, name is `"agent-factory"`, has all 14 keywords including `"memory"`, `"heartbeat"`, `"budget"`, `"governance"`, `"org-chart"`, `"templates"`, `"import"`, `"export"` +- `CLAUDE.md` lists all 13 template directories in the Template directories table +- `README.md` command table has all 
4 commands +- `README.md` deployment table has all 5 targets with trade-offs +- `README.md` domain template table has all 10 templates +- `README.md` ends with the value proposition: "No servers, no databases, no configuration..." +- Steps 8 and 26 are composable: Step 26 builds on Step 8's Phase 0 and Phase 6 changes without reverting them diff --git a/.claude/plans/ultraplan-2026-04-11-agent-factory.md b/.claude/plans/ultraplan-2026-04-11-agent-factory.md index 03e2566..451ef80 100644 --- a/.claude/plans/ultraplan-2026-04-11-agent-factory.md +++ b/.claude/plans/ultraplan-2026-04-11-agent-factory.md @@ -993,16 +993,17 @@ graph TD | Low | Docker template may need updates for newer Claude Code versions | `scripts/templates/docker/` | Docker deployment breaks | Dockerfile uses `node:22-slim` and `npm install -g` which auto-updates | | Low | Domain templates may not match all user domains | `scripts/templates/domains/` | Users need custom templates | 10 templates cover common domains; builder agent can create custom templates | -## Assumptions +## Assumptions (VERIFIED 2026-04-11) -| # | Assumption | Why unverifiable | Impact if wrong | -|---|-----------|-----------------|-----------------| -| 1 | Anthropic billing API exists and is accessible with standard API key | No docs found confirming exact endpoint | Budget tracking falls back to token-based estimation; budget-hook.sh needs manual cost configuration | -| 2 | `/schedule` trigger interface is stable enough to build on | Claude Code internal API, no stability guarantee | Heartbeat templates still work with cron/launchd/systemd; `/schedule` is optional deployment target | -| 3 | Docker deployment should use docker-compose.yml with Dockerfile | Spec assumption, no user confirmation | Minor: can generate either format; both templates provided | -| 4 | `claude --resume` with custom session IDs works for isolated agent turns | Based on CLI docs, not tested with custom session key formats | agentTurn template may need 
session ID format adjustment | +| # | Assumption | Status | Finding | Impact on plan | +|---|-----------|--------|---------|----------------| +| 1 | Anthropic billing API exists and is accessible with standard API key | **PARTIAL** | Usage & Cost API exists at `/v1/organizations/usage_report/messages` and `/v1/organizations/cost_report`. BUT requires Admin API key (`sk-ant-admin...`), not standard API key. Only available for organizations, not individual accounts. CLI has `--max-budget-usd` flag for print-mode budget caps. | budget-hook.sh MUST use token-count estimation as primary method. Admin API integration is optional, org-only. Template README should document this limitation clearly. CLI `--max-budget-usd` is a simpler alternative for headless runs. | +| 2 | `claude --resume` with custom session IDs works for isolated agent turns | **WRONG** | `--session-id` requires a valid UUID. Custom formats like `agent:name:turn:123` are rejected. `--name` flag sets human-readable session names. `--resume` accepts session names for fuzzy search. | agent-turn.sh template MUST use `--name "agent:{{AGENT_NAME}}:turn"` + `--resume "agent:{{AGENT_NAME}}:turn"` pattern instead of custom session IDs. For deterministic UUIDs: `python3 -c "import uuid; print(uuid.uuid5(uuid.NAMESPACE_DNS, 'agent:name:task'))"`. | +| 3 | `/schedule` trigger interface is stable enough to build on | **CONFIRMED with caveats** | `/schedule` is GA, available to Pro/Max/Team/Enterprise. Runs on Anthropic cloud from GitHub clone (NOT local files). 1-hour minimum interval. Desktop scheduled tasks (`/schedule` in Desktop app) are the local alternative with 1-minute minimum and local file access. MCP connectors available for cloud tasks. | Deploy command must distinguish: `/schedule` = cloud (needs GitHub, no local files), Desktop scheduled tasks = local machine, cron/launchd/systemd = traditional local. Heartbeat templates are for local/VPS/Docker only; `/schedule` has its own mechanism. 
| +| 4 | Docker deployment should use docker-compose.yml with Dockerfile | **CONFIRMED** | Well-established pattern. Official Anthropic devcontainer reference exists. Docker sandboxes documented for safe autonomous operation. `--dangerously-skip-permissions` for unattended operation inside containers. Key security: `security_opt: no-new-privileges:true`, never mount Docker socket, mount only specific project folders. | docker-compose.yml template should add `security_opt: [no-new-privileges:true]`. Dockerfile approach is correct. Add Docker socket warning to README. | -**WARNING: Plan has 4 unverified assumptions -- review before executing.** +**All 4 assumptions verified. 1 was wrong (session IDs), 1 was partial (billing API), 2 confirmed.** +Sources: [CLI Reference](https://code.claude.com/docs/en/cli-reference), [Scheduled Tasks](https://code.claude.com/docs/en/web-scheduled-tasks), [Usage & Cost API](https://platform.claude.com/docs/en/build-with-claude/usage-cost-api), [Docker Blog](https://www.docker.com/blog/docker-sandboxes-run-claude-code-and-other-coding-agents-unsupervised-but-safely/) ## Verification
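The deterministic-UUID fallback noted for assumption 2 above can be spot-checked with a few lines of Python. This is a sketch only: the `agent_session_uuid` helper and the `researcher`/`daily-digest` names are illustrative, not a fixed convention from the templates.

```python
import uuid

def agent_session_uuid(agent: str, task: str) -> str:
    """Derive a stable, valid UUID for `claude --session-id` from an agent/task key.

    uuid5 is deterministic: the same key always maps to the same UUID, so
    repeated turns of one agent/task pair resume the same session, while
    still satisfying the CLI's requirement that the session ID be a UUID.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"agent:{agent}:{task}"))

# Same key, same UUID across invocations; different key, different UUID.
print(agent_session_uuid("researcher", "daily-digest"))
print(agent_session_uuid("writer", "daily-digest"))
```

Because the mapping is pure, a template script can recompute the UUID on every run instead of persisting it.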