Kjell Tore Guttormsen 004e6f462b feat(plans): add MCP/skills integration phases and pause/resume to build workflow

Build command now includes:
- Phase 3.5: Skills connection with pause/resume for custom creation
- Phase 4.5: MCP server integration (connect existing, guide creation)
- Resume mechanism via build-state.json for pausing mid-build
- Explicit deployment target selection with trade-offs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 10:34:46 +02:00

Agent Factory -- Full Vision Realization

Plan quality: B+ (82/100) -- APPROVE_WITH_NOTES

Generated by ultraplan-local v1.6.0 on 2026-04-11

WARNING: Plan has 4 unverified assumptions -- review before executing.

Context

The existing agent-builder plugin (v0.1.0) is a minimal prototype with 1 command (/agent-builder:build), 1 agent (builder), 1 skill (agent-system-design), and 3 hook templates. Three commands are stubbed in the README but not implemented (deploy, evaluate, status). The deployment-advisor agent and managed-agents skill are mentioned but empty or missing.

This plan transforms agent-builder into Agent Factory: a comprehensive, guided system for building autonomous agent systems. The transformation spans 5 development phases, each building on the previous:

1. Foundation (v0.2) -- rename, missing commands, deployment-advisor, managed-agents skill, domain templates
2. OpenClaw patterns (v0.3) -- 3-tier memory, WAL protocol, Working Buffer, proactive agent with ADL/VFM, isolated agentTurn
3. Paperclip patterns (v0.4) -- heartbeat with context injection, goal hierarchy, budget awareness, governance, org-chart
4. Self-learning (v0.5) -- feedback loops, performance scoring, pipeline optimization, self-healing
5. Full integration (v1.0) -- MCP integrations, Docker deployment, bundled templates, import/export

Spec: /Users/ktg/repos/agent-builder/.claude/ultraplan-spec-2026-04-11-agent-factory.md

Architecture Diagram

```mermaid
graph TD
    subgraph "Plugin Structure (Agent Factory)"
        PJ[".claude-plugin/plugin.json"]
        CM["CLAUDE.md"]

        subgraph "Commands (Phase 1)"
            C1["commands/build.md"]
            C2["commands/deploy.md -- NEW"]
            C3["commands/evaluate.md -- NEW"]
            C4["commands/status.md -- NEW"]
        end

        subgraph "Agents (Phase 1-3)"
            A1["agents/builder.md"]
            A2["agents/deployment-advisor.md -- NEW"]
        end

        subgraph "Skills (Phase 1-4)"
            S1["skills/agent-system-design/"]
            S2["skills/managed-agents/ -- NEW"]
        end

        subgraph "Templates (Phase 1-5)"
            T1["scripts/templates/automation.sh"]
            T2["scripts/templates/pre-tool-use.sh"]
            T3["scripts/templates/post-tool-use.sh"]
            T4["scripts/templates/memory/ -- NEW"]
            T5["scripts/templates/heartbeat/ -- NEW"]
            T6["scripts/templates/budget/ -- NEW"]
            T7["scripts/templates/governance/ -- NEW"]
            T8["scripts/templates/docker/ -- NEW"]
            T9["scripts/templates/domains/ -- NEW"]
        end

        subgraph "References (Phase 1-4)"
            R1["skills/agent-system-design/references/"]
            R2["references: memory, heartbeat, budget, governance -- NEW"]
        end
    end

    C1 --> A1
    C2 --> A2
    A1 --> T1 & T2 & T3 & T4 & T5
    A2 --> T8
    S2 --> R2
```

Codebase Analysis

Tech stack: Claude Code plugin (markdown commands, agents, skills), bash 3.2 shell scripts, JSON config
Key patterns: YAML frontmatter for agents/skills/commands, {{PLACEHOLDER}} template variables in shell scripts, ${CLAUDE_PLUGIN_ROOT} for intra-plugin paths, python3 for JSON parsing in hooks
Relevant files (all 15):
- .claude-plugin/plugin.json -- plugin manifest (rename target)
- CLAUDE.md -- plugin instructions
- README.md -- user-facing documentation
- commands/build.md -- the only implemented command
- agents/builder.md -- the only implemented agent
- skills/agent-system-design/SKILL.md -- main knowledge skill
- skills/agent-system-design/references/*.md -- 4 reference files
- scripts/templates/*.sh -- 3 shell templates
Reusable code:
- agents/builder.md -- agent frontmatter format, tool list pattern, rules structure
- commands/build.md -- command frontmatter format, phase-based workflow pattern
- scripts/templates/pre-tool-use.sh -- hook script pattern (stdin JSON, python3 parsing, exit codes)
- scripts/templates/post-tool-use.sh -- audit logging pattern
- scripts/templates/automation.sh -- headless runner pattern
- skills/agent-system-design/references/feature-map.md -- capability mapping table format
Conventions:
- Agent descriptions must include <example> blocks for reliable trigger matching
- Hook scripts use exit 2 to block, exit 0 to allow
- Templates use {{PLACEHOLDER}} syntax (plain string replace, no engine)
- All bash must be 3.2 compatible (no declare -A, no mapfile, no |&)
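A minimal Python sketch of the plain-string-replace convention, assuming the builder fills a template from a dict of values. `render_template` is a hypothetical helper, not an existing plugin file, and the check for leftover placeholders is an added assumption rather than a stated rule: it stops a half-filled template from being written into the user's project.

```python
import re

# {{NAME}} placeholders: uppercase letters, digits, underscores, dashes
# (the dash lets {{YYYY-MM-DD}} from the memory templates count too).
PLACEHOLDER_RE = re.compile(r"\{\{([A-Z0-9_-]+)\}\}")


def render_template(text, values):
    """Fill {{PLACEHOLDER}} variables by plain string replacement (no engine).

    Raises ValueError if any placeholder is still unfilled afterwards.
    """
    for name, value in values.items():
        text = text.replace("{{" + name + "}}", value)
    leftover = sorted(set(PLACEHOLDER_RE.findall(text)))
    if leftover:
        raise ValueError("unfilled placeholders: " + ", ".join(leftover))
    return text
```

For example, `render_template("# Heartbeat: {{AGENT_NAME}}", {"AGENT_NAME": "monitor"})` returns `"# Heartbeat: monitor"`.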

Research Sources

| Technology | Source | Key Findings | Confidence |
|---|---|---|---|
| OpenClaw 3-tier memory | Proactive Agent Skill + source analysis | SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold); WAL protocol writes before responding; Working Buffer captures in danger zone (60%+ context) | high |
| OpenClaw Heartbeat | Source code `src/auto-reply/heartbeat.ts` | HEARTBEAT.md with YAML tasks block, interval tracking, emptiness detection skips API calls, `ackMaxChars` suppression | high |
| OpenClaw Cron | Source code `src/cron/service.ts` | `systemEvent` vs `agentTurn` types; `isolated` session target; startup catchup with stagger | high |
| OpenClaw ADL/VFM | Proactive Agent Skill docs | Anti-Drift Limits (no fake intelligence, no unverifiable mods); Value-First Modification (score > 50 required); Priority: Stability > Explainability > Reusability > Scalability > Novelty | high |
| Paperclip Heartbeat | Source code `server/src/services/heartbeat.ts` | Poll-based `tickTimers()`, 4 wakeup triggers, per-agent interval, concurrency control | high |
| Paperclip Budget | Source code `server/src/services/budgets.ts` | Post-hoc `evaluateCostEvent()`, SUM(cost_cents) per window, soft/hard thresholds, pause on exceed | high |
| Paperclip Goals | Source code, DB schema | Simple `parent_id` FK on goals table, NOT recursive traversal at runtime | high |
| Paperclip Org Chart | Source code, DB schema | `agents.reportsTo` self-referential FK + `agents.role` text field | high |
| Paperclip Adapter | Source code `packages/adapter-utils/src/types.ts` | `execute(ctx)` interface, 10 built-in adapters, Claude adapter uses CLI with `--append-system-prompt-file` | high |
| Anthropic Billing API | Spec assumption | [ASSUMPTION] Endpoint and auth mechanism unverified | low |

Implementation Plan

Step 1: Rename plugin from agent-builder to agent-factory

Files: .claude-plugin/plugin.json, CLAUDE.md, README.md, commands/build.md, skills/agent-system-design/SKILL.md
Changes:
- .claude-plugin/plugin.json: Change "name": "agent-builder" to "name": "agent-factory", update description to mention "Agent Factory", bump version to "0.2.0"
- CLAUDE.md: Replace "Agent Builder Plugin" with "Agent Factory Plugin", update all references to "agent-builder" with "agent-factory"
- README.md: Replace title "# Agent Builder" with "# Agent Factory", update install command from /install agent-builder to /install agent-factory, update all /agent-builder: command prefixes to /agent-factory:, update plugin-dir path reference, update repository URL if needed
- commands/build.md: Replace /agent-builder:build with /agent-factory:build in the system prompt text
- skills/agent-system-design/SKILL.md: Update /agent-builder:build reference to /agent-factory:build
Reuses: Existing file structures, no new patterns needed
Verify: grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan-spec" → expected: no matches (all renamed)
On failure: revert -- this is the foundation; all subsequent steps depend on correct naming
Checkpoint: git commit -m "feat!: rename plugin from agent-builder to agent-factory"

Step 2: Create /agent-factory:deploy command

Files: commands/deploy.md (new)
Changes: Create a new command file with frontmatter:
```
---
description: Configure deployment for your agent system. Supports /schedule, cron/launchd, systemd, and Docker.
argument-hint: "Optional: deployment target (schedule, local, vps, docker)"
allowed-tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "Agent", "AskUserQuestion"]
---
```
The command body should:
1. Read the user's existing agent system (scan .claude/agents/, .claude/skills/, CLAUDE.md)
2. If $ARGUMENTS names a deployment target, use it; otherwise ask the user to choose one via AskUserQuestion: /schedule (Claude Code native), Local (cron/launchd), VPS (systemd), or Docker (compose)
3. For /schedule: Generate a HEARTBEAT.md file with task entries parsed from the user's pipeline skills, and generate an automation wrapper based on ${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh
4. For Local: Copy and customize automation.sh template, generate cron/launchd config
5. For VPS: Generate systemd service and timer units, customize automation.sh
6. For Docker: Generate Dockerfile and docker-compose.yml from templates (Step 16)
7. Use the deployment-advisor agent (Step 3) when the user needs guidance
8. Provide exact activation commands for the chosen target
Reuses: commands/build.md for command frontmatter format and phase structure; scripts/templates/automation.sh for headless runner pattern; skills/agent-system-design/references/deployment-targets.md for target-specific guidance
Verify: head -5 /Users/ktg/repos/agent-builder/commands/deploy.md → expected: valid YAML frontmatter with description: field
On failure: revert -- command file must be valid YAML frontmatter
Checkpoint: git commit -m "feat(commands): add /agent-factory:deploy command"
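For the Local target, the cron and launchd configs can be produced as plain text. A sketch, assuming the customized automation.sh is run via /bin/bash; the function names, the default schedule, the log path and the launchd label are hypothetical. The plist keys (`Label`, `ProgramArguments`, `StartInterval`, `StandardOutPath`, `StandardErrorPath`, `RunAtLoad`) are standard launchd keys, and `plistlib` is in the Python standard library.

```python
import plistlib


def cron_line(script_path, minute="0", hour="*", log_path="/tmp/agent-factory.log"):
    """Crontab entry that runs the customized automation script."""
    # Fields: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * /bin/bash {script_path} >> {log_path} 2>&1"


def launchd_plist(label, script_path, interval_seconds, log_path):
    """launchd agent plist (macOS) that runs the script every N seconds."""
    job = {
        "Label": label,
        "ProgramArguments": ["/bin/bash", script_path],
        "StartInterval": interval_seconds,
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
        "RunAtLoad": False,
    }
    return plistlib.dumps(job).decode("utf-8")
```

The activation commands the deploy command prints would then be along the lines of adding the line via `crontab -e`, or saving the plist under `~/Library/LaunchAgents/` and running `launchctl load` on it.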

Step 3: Create deployment-advisor agent

Files: agents/deployment-advisor.md (new)

Changes: Create agent file with frontmatter:

```
---
name: deployment-advisor
description: |
  Use this agent when the user needs help choosing or configuring a deployment target for their agent system.
  <example>
  Context: User has built agents and wants to deploy
  user: "How should I deploy my agent system?"
  assistant: "I'll use the deployment-advisor to analyze your setup and recommend a target."
  <commentary>
  Deployment guidance request triggers the advisor.
  </commentary>
  </example>
  <example>
  Context: User wants to switch deployment targets
  user: "Can I move my agents from cron to Docker?"
  assistant: "I'll use the deployment-advisor to plan the migration."
  <commentary>
  Deployment migration request triggers the advisor.
  </commentary>
  </example>
model: sonnet
color: blue
tools: ["Read", "Glob", "Grep", "Bash", "AskUserQuestion"]
---
```

The agent body should:

1. Scan the user's project for existing agents, skills, hooks, settings, and automation files
2. Assess requirements: always-on needed? team access? Computer Use? budget constraints?
3. Recommend a deployment target using the decision guide from ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-targets.md
4. Generate deployment-specific configuration files
5. Include rules: never overwrite existing deployment config without confirmation, always verify generated scripts with bash -n, always include rollback instructions

Reuses: agents/builder.md for agent file format and example block structure; skills/agent-system-design/references/deployment-targets.md for deployment decision logic
Verify: python3 -c "import yaml; yaml.safe_load(open('/Users/ktg/repos/agent-builder/agents/deployment-advisor.md').read().split('---')[1])" (requires PyYAML, which is not part of the Python standard library) → expected: no error (valid YAML frontmatter)
On failure: retry -- fix YAML syntax, then revert if still failing
Checkpoint: git commit -m "feat(agents): add deployment-advisor agent"
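The Verify command above imports `yaml`, which only works where PyYAML is installed. A stdlib-only alternative, sketched under the assumption that an agent file must start with a `---` line, close the frontmatter with another `---` line, and define at least `name`, `description`, `model` and `tools` (that required set is an assumption taken from the frontmatter shown above). It checks structure and keys only; it does not validate YAML values.

```python
REQUIRED_AGENT_KEYS = ("name", "description", "model", "tools")


def frontmatter_keys(text):
    """Top-level keys of a ----delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a --- line")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("missing closing --- line") from None
    keys = []
    for line in lines[1:end]:
        # Indented lines (e.g. the description's | block) belong to a parent key.
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def missing_agent_keys(text):
    """Required agent keys absent from the frontmatter, in a fixed order."""
    keys = frontmatter_keys(text)
    return [k for k in REQUIRED_AGENT_KEYS if k not in keys]
```

Because indented lines are skipped, the `user:` and `assistant:` lines inside the `<example>` blocks are not mistaken for top-level keys.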

Step 4: Create /agent-factory:evaluate command

Files: commands/evaluate.md (new)
Changes: Create command with frontmatter:
```
---
description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
argument-hint: "Optional: focus area (security, deployment, memory, autonomy)"
allowed-tools: ["Read", "Glob", "Grep", "Bash"]
---
```
The command body should:
1. Scan the project for all agent system components: .claude/agents/*.md, .claude/skills/*/SKILL.md, .claude/hooks/*.sh or hooks/*.sh, .claude/settings.json, CLAUDE.md, automation/ or scripts/, memory/ or data/
2. For each of the 22 capabilities from ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md, check whether the user's project has the corresponding component
3. Score: classify each capability as OK, Partial, or Missing, then count the total for each status
4. Output a capability matrix table showing: capability name, status (OK/Partial/Missing), what exists, what's needed
5. Provide specific recommendations for filling gaps, ordered by impact
6. If $ARGUMENTS specifies a focus area, expand that section with detailed guidance
Reuses: skills/agent-system-design/references/feature-map.md for the 22-capability checklist; commands/build.md for command frontmatter format
Verify: head -5 /Users/ktg/repos/agent-builder/commands/evaluate.md → expected: valid YAML frontmatter with description: field
On failure: revert -- command must have valid frontmatter
Checkpoint: git commit -m "feat(commands): add /agent-factory:evaluate command"
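The output of steps 3 and 4 can be sketched as a small formatter. This assumes the scan produces one `(capability, status, what exists, what's needed)` tuple per capability; `capability_matrix` and the sample capability names in the usage below are hypothetical, not taken from feature-map.md.

```python
from collections import Counter

STATUSES = ("OK", "Partial", "Missing")


def capability_matrix(results):
    """Markdown capability table plus a one-line status summary.

    results: list of (capability, status, exists, needed) tuples,
    where status is one of OK / Partial / Missing.
    """
    for _, status, _, _ in results:
        if status not in STATUSES:
            raise ValueError("unknown status: " + status)
    rows = ["| Capability | Status | What exists | What's needed |",
            "|---|---|---|---|"]
    for name, status, exists, needed in results:
        rows.append(f"| {name} | {status} | {exists or '-'} | {needed or '-'} |")
    counts = Counter(status for _, status, _, _ in results)
    summary = ", ".join(f"{s}: {counts[s]}" for s in STATUSES)
    return "\n".join(rows) + "\n\n" + summary
```

For example, `capability_matrix([("Scheduling", "OK", "launchd plist", ""), ("Memory", "Missing", "", "memory/MEMORY.md")])` ends with the summary `OK: 1, Partial: 0, Missing: 1`.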

Step 5: Create /agent-factory:status command

Files: commands/status.md (new)
Changes: Create command with frontmatter:
```
---
description: Quick status check of your agent infrastructure. Shows agents, skills, hooks, deployment, and recent activity.
argument-hint: ""
allowed-tools: ["Read", "Glob", "Grep", "Bash"]
---
```
The command body should:
1. Scan for agents: Glob for .claude/agents/*.md, list each with name and model
2. Scan for skills: Glob for .claude/skills/*/SKILL.md and .claude/skills/*.md, list each with name
3. Scan for hooks: Glob for .claude/hooks/*.sh and hooks/*.sh, check if executable
4. Check settings: read .claude/settings.json if it exists, summarize permissions and hook config
5. Check deployment: look for automation scripts, launchd plists, systemd units, docker-compose files, HEARTBEAT.md
6. Check memory: look for memory/MEMORY.md, data/run-state.json, SESSION-STATE.md
7. Check recent activity: read audit.log if it exists, show last 5 entries
8. Output a compact status table with sections for each component type
9. Flag any issues: missing hooks for deployed agents, agents without examples in description, skills without version
Reuses: commands/build.md for command frontmatter format
Verify: head -5 /Users/ktg/repos/agent-builder/commands/status.md → expected: valid YAML frontmatter
On failure: revert -- command must have valid frontmatter
Checkpoint: git commit -m "feat(commands): add /agent-factory:status command"
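Steps 3 and 7 (executable hooks, last 5 audit entries) are the two checks with real logic. A sketch in python3, assuming audit.log is line-oriented with one entry per line; `hook_status` and `recent_activity` are hypothetical helper names.

```python
import glob
import os
from collections import deque


def hook_status(project_dir):
    """Map each hook script (relative path) to whether it is executable."""
    found = {}
    for pattern in (".claude/hooks/*.sh", "hooks/*.sh"):
        for path in sorted(glob.glob(os.path.join(project_dir, pattern))):
            found[os.path.relpath(path, project_dir)] = os.access(path, os.X_OK)
    return found


def recent_activity(log_path, n=5):
    """Last n non-empty lines of the audit log, or [] if there is no log."""
    if not os.path.isfile(log_path):
        return []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        # deque(maxlen=n) keeps only the tail without loading the whole file.
        return list(deque((line.rstrip("\n") for line in fh if line.strip()), maxlen=n))
```

A hook that maps to `False` is one the status table should flag, since the templates' hooks only take effect when executable.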

Step 6: Create managed-agents skill

Files: skills/managed-agents/SKILL.md (new), skills/managed-agents/references/api-patterns.md (new)
Changes:
- skills/managed-agents/SKILL.md: Create skill with frontmatter:
```
---
name: managed-agents
description: |
  This skill should be used when the user asks about "managed agents",
  "Anthropic API agents", "cloud-hosted agents", "agent SDK",
  "deploying agents to the cloud", "serverless agents",
  "API-based agent deployment", "/v1/agents endpoint"
version: 0.1.0
---
```
  Body should cover:
  - What managed agents are (Anthropic-hosted agent runtime)
  - When to use them vs. local deployment (decision matrix)
  - SDK patterns for TypeScript and Python
  - Session management and persistence
  - Budget and cost considerations for API-based agents
  - Migration path from local to managed
- skills/managed-agents/references/api-patterns.md: Reference doc with concrete code patterns for agent creation, session management, tool configuration, and error handling using @anthropic-ai/sdk
Reuses: skills/agent-system-design/SKILL.md for skill file format; skills/agent-system-design/references/deployment-targets.md for the managed agents section content
Verify: head -5 /Users/ktg/repos/agent-builder/skills/managed-agents/SKILL.md → expected: valid YAML frontmatter with name: managed-agents
On failure: revert -- skill must have valid frontmatter
Checkpoint: git commit -m "feat(skills): add managed-agents knowledge skill"

Step 7: Create domain template infrastructure and first 5 templates

Files: scripts/templates/domains/content-pipeline.md (new), scripts/templates/domains/code-review.md (new), scripts/templates/domains/monitoring.md (new), scripts/templates/domains/research-synthesis.md (new), scripts/templates/domains/data-processing.md (new), scripts/templates/domains/README.md (new)
Changes: Create 5 domain-specific pipeline templates. Each template is a plain markdown file with {{PLACEHOLDER}} variables that the builder agent copies into the user's project. Each template file contains:
- A header comment explaining the domain and what gets generated
- Agent definitions (researcher/writer/reviewer variants specialized to the domain)
- Pipeline skill template with domain-specific steps
- Recommended hook patterns for the domain
- Example CLAUDE.md sections relevant to the domain
Template 1: content-pipeline.md -- Content production (articles, newsletters, reports). Agents: content-researcher, content-writer, content-reviewer. Pipeline: research topic, draft article, review quality, publish.

Template 2: code-review.md -- Automated code review. Agents: code-analyzer, review-writer, standards-checker. Pipeline: analyze PR/diff, write review, check against standards, post review.

Template 3: monitoring.md -- System/service monitoring. Agents: monitor-checker, incident-reporter, remediation-advisor. Pipeline: check endpoints/logs, detect anomalies, report incidents, suggest fixes.

Template 4: research-synthesis.md -- Research and analysis. Agents: source-gatherer, synthesizer, fact-checker. Pipeline: gather sources, synthesize findings, verify claims, produce brief.

Template 5: data-processing.md -- Data transformation pipelines. Agents: data-validator, transformer, quality-checker. Pipeline: validate input, transform data, check output quality, save results.

README.md: Index listing all available templates with one-line descriptions, usage instructions for the builder agent, and template contribution guidelines.

All templates must use {{PROJECT_DIR}}, {{AGENT_NAME}}, {{PIPELINE_NAME}}, {{SCHEDULE}}, {{DOMAIN}} as placeholder variables.
Reuses: agents/builder.md for agent definition format; commands/build.md Phase 3-4 for pipeline creation patterns; skills/agent-system-design/references/pipeline-patterns.md for the 9-step pipeline template
Verify: ls /Users/ktg/repos/agent-builder/scripts/templates/domains/ | wc -l → expected: 6 (5 templates + README)
On failure: retry -- ensure all 5 templates have valid structure, then revert if still failing
Checkpoint: git commit -m "feat(templates): add 5 domain-specific pipeline templates"
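The Verify step only counts files. A stricter check could also confirm that every template uses the five required placeholders. A sketch, assuming all templates except README.md live as `.md` files directly in the domains directory; the function names are hypothetical.

```python
import os

REQUIRED_PLACEHOLDERS = ("PROJECT_DIR", "AGENT_NAME", "PIPELINE_NAME", "SCHEDULE", "DOMAIN")


def missing_placeholders(template_text):
    """Required placeholders that never appear in a domain template."""
    return [name for name in REQUIRED_PLACEHOLDERS
            if "{{" + name + "}}" not in template_text]


def check_domain_dir(domain_dir):
    """Map each template file (README.md excluded) to its missing placeholders."""
    problems = {}
    for filename in sorted(os.listdir(domain_dir)):
        if not filename.endswith(".md") or filename == "README.md":
            continue
        with open(os.path.join(domain_dir, filename), encoding="utf-8") as fh:
            missing = missing_placeholders(fh.read())
        if missing:
            problems[filename] = missing
    return problems
```

An empty result means all five templates pass; any entry names the file and the placeholders it lacks.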

Step 8: Update build command to use domain templates and new features

Files: commands/build.md
Changes:
- Add a Phase 0 (pre-interview) that asks: "Would you like to start from a domain template? Available: content-pipeline, code-review, monitoring, research-synthesis, data-processing. Or choose 'custom' for a blank start."
- If template chosen: read ${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md and pre-populate Phase 1 design sketch with template roles and pipeline structure
- Update Phase 6 (Deployment) to reference the new /agent-factory:deploy command instead of inline deployment logic. Add /schedule as a deployment option. Change the question to offer all 5 targets: /schedule, Local (cron/launchd), Mac Mini (launchd), VPS (systemd), Docker
- Update the summary at end to mention /agent-factory:evaluate and /agent-factory:status as next steps
- Update all self-references from /agent-builder:build to /agent-factory:build
Reuses: Existing Phase 1-7 structure in commands/build.md; scripts/templates/domains/ templates from Step 7
Verify: grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md → expected: >= 3 references
On failure: revert -- build command is critical path
Checkpoint: git commit -m "feat(commands): integrate domain templates and new commands into build workflow"

Step 9: Create 3-tier memory templates (OpenClaw pattern)

Files: scripts/templates/memory/SESSION-STATE.md (new), scripts/templates/memory/DAILY-LOG.md (new), scripts/templates/memory/MEMORY.md (new), scripts/templates/memory/README.md (new)
Changes:
- SESSION-STATE.md template: Hot working memory. Contains {{AGENT_NAME}} header, sections for Current Task, Context Window Usage (with 60% danger zone marker), Active Decisions, Pending Actions. WAL Protocol instruction: "Write important details HERE before responding to the user." Includes a Working Buffer section that activates when context usage exceeds 60%: "When context usage is above 60%, capture key exchanges in the Working Buffer before they are lost to compaction."
- DAILY-LOG.md template: Warm daily capture. Filename pattern: memory/{{YYYY-MM-DD}}.md. Sections: Date, Summary of Work, Decisions Made, Files Modified, Issues Encountered, Carry Forward (items for next session). Auto-rotation instruction: one file per day, builder generates the date-stamped filename.
- MEMORY.md template: Cold long-term curated memory. Sections: Agent Identity, Key Learnings (manually curated from daily logs), Recurring Patterns, Known Issues, Project Context, Last Updated. Compaction Recovery instruction: "When recovering from compaction, read in order: SESSION-STATE.md first, then today's daily log, then MEMORY.md, then search all daily logs for relevant context."
- README.md: Explains the 3-tier memory pattern, how each tier works, when to read/write each tier, and the WAL protocol and Working Buffer protocol with concrete examples.
Reuses: OpenClaw's proactive agent skill pattern (from research); existing MEMORY.md convention in Claude Code
Verify: ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l → expected: 4 files
On failure: revert -- templates must be valid markdown with proper placeholders
Checkpoint: git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)"
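The compaction-recovery read order and the 60% Working Buffer trigger can be made concrete. A sketch, assuming SESSION-STATE.md sits at the project root and MEMORY.md plus the date-stamped daily logs sit in `memory/` (the layout the status command in Step 5 looks for); function names are hypothetical, and "search all daily logs" is simplified to "list the remaining logs, newest first".

```python
import glob
import os

DANGER_ZONE = 0.60  # Working Buffer activates above 60% context usage


def daily_log_path(memory_dir, day):
    """Warm-tier filename: memory/YYYY-MM-DD.md (one file per day)."""
    return os.path.join(memory_dir, day.strftime("%Y-%m-%d") + ".md")


def recovery_read_order(root, today):
    """Existing memory files in compaction-recovery order."""
    memory_dir = os.path.join(root, "memory")
    todays_log = daily_log_path(memory_dir, today)
    order = [os.path.join(root, "SESSION-STATE.md"), todays_log,
             os.path.join(memory_dir, "MEMORY.md")]
    # Then every other daily log, newest first, as the pool to search.
    older = sorted(glob.glob(os.path.join(memory_dir, "????-??-??.md")), reverse=True)
    order.extend(p for p in older if p != todays_log)
    return [p for p in order if os.path.exists(p)]


def working_buffer_active(tokens_used, context_window):
    """True once context usage is past the 60% danger zone."""
    return tokens_used / context_window > DANGER_ZONE
```

Sorting ISO dates lexically gives chronological order, which is why the date-stamped filename format is convenient here.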

Step 10: Create heartbeat and cron templates (OpenClaw + Paperclip patterns)

Files: scripts/templates/heartbeat/HEARTBEAT.md (new), scripts/templates/heartbeat/heartbeat-runner.sh (new), scripts/templates/heartbeat/README.md (new)
Changes:
- HEARTBEAT.md template: Agent heartbeat file following OpenClaw's format. Contains:
```
# Heartbeat: {{AGENT_NAME}}

Read this file on each heartbeat. Follow it strictly. Do not infer or
repeat old tasks from prior chats. If nothing needs attention, reply
HEARTBEAT_OK.

## Tasks

tasks:
  - name: {{TASK_1_NAME}}
    interval: {{TASK_1_INTERVAL}}
    prompt: "{{TASK_1_PROMPT}}"
  - name: {{TASK_2_NAME}}
    interval: {{TASK_2_INTERVAL}}
    prompt: "{{TASK_2_PROMPT}}"

## Context

{{CONTEXT_NOTES}}
```
- heartbeat-runner.sh template: Bash 3.2 compatible script that:
  1. Reads HEARTBEAT.md
  2. Implements emptiness detection (skip API call if file has only headers/empty items -- saves cost, from OpenClaw pattern)
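  3. Parses the tasks block and its intervals using python3, and checks last-run timestamps from a .heartbeat-state JSON file. PyYAML is not in the Python standard library, so the simple tasks block should be parsed line by line rather than assuming import yaml works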
  4. For each due task: invokes claude -p with the task prompt
  5. Updates .heartbeat-state with new last-run timestamps
  6. Implements startup catchup: on first run after downtime, runs up to 5 missed tasks with 5-second stagger between them (OpenClaw pattern)
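  7. Suppresses acknowledgment replies: a reply that contains HEARTBEAT_OK and at most 300 chars of other text is dropped from output (ackMaxChars pattern), so short real alerts still get through
  8. Performs all date math in python3 (not bash arithmetic) for portability
  The script must use #!/bin/bash and be bash 3.2 compatible: no associative arrays, no mapfile, no |&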
- README.md: Documents the heartbeat pattern, explains systemEvent vs agentTurn distinction, explains emptiness detection cost saving, explains startup catchup logic, provides example cron entries for running the heartbeat runner at various intervals.
Reuses: scripts/templates/automation.sh for the headless runner pattern; OpenClaw heartbeat source code patterns (emptiness detection, task parsing, ackMaxChars); Paperclip's heartbeat interval and wakeup trigger model
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh → expected: no syntax errors
On failure: retry -- fix bash syntax errors (likely 3.2 compatibility issues), then revert if still failing
Checkpoint: git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup"
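The python3 side of heartbeat-runner.sh could look roughly like this. It is a sketch, not the runner itself: the interval format (`30m`, `2h`, `1d`), the cap of 5 due tasks applied on every run, and all function names are assumptions. `invoke` stands in for the `claude -p` call so the logic can be tested without the CLI.

```python
import json
import re
import time

INTERVAL_RE = re.compile(r"^(\d+)([mhd])$")  # assumed format: 30m, 2h, 1d
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}
MAX_CATCHUP = 5  # at most 5 due tasks per run


def is_effectively_empty(text):
    """True if the file holds only headings, blank lines, or empty list items."""
    for line in text.splitlines():
        s = line.strip()
        if s and not s.startswith("#") and s not in ("-", "- [ ]", "tasks:"):
            return False
    return True


def parse_tasks(text):
    """Line-based parse of the `tasks:` block (PyYAML is not in the stdlib)."""
    tasks, current, in_block = [], None, False
    for line in text.splitlines():
        if line.strip() == "tasks:":
            in_block = True
            continue
        if not in_block:
            continue
        if line.startswith("#"):  # the next markdown heading ends the block
            break
        m = re.match(r"\s*-?\s*(name|interval|prompt):\s*(.*)$", line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip().strip('"')
        if key == "name" and line.lstrip().startswith("-"):
            current = {}  # a "- name:" line starts a new task
            tasks.append(current)
        if current is not None:
            current[key] = value
    return tasks


def parse_interval(spec):
    m = INTERVAL_RE.match(spec.strip())
    if not m:
        raise ValueError("unsupported interval: " + spec)
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]


def due_tasks(tasks, state, now):
    """Tasks whose interval has elapsed since their last run (never run = due)."""
    due = [t for t in tasks
           if now - state.get(t["name"], 0) >= parse_interval(t["interval"])]
    return due[:MAX_CATCHUP]


def is_ack(reply, ack_max_chars=300):
    """HEARTBEAT_OK plus at most ack_max_chars of other text counts as an ack."""
    if "HEARTBEAT_OK" not in reply:
        return False
    return len(reply.replace("HEARTBEAT_OK", "").strip()) <= ack_max_chars


def load_state(path):
    """Read .heartbeat-state: task name -> last-run epoch seconds."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, ValueError):
        return {}  # missing or corrupt state: every task counts as never run


def run_heartbeat(text, state, now, invoke, stagger_seconds=5):
    """Run due tasks through invoke(prompt); return (non-ack replies, new state)."""
    if is_effectively_empty(text):
        return [], state  # skip the API call entirely
    outputs, new_state = [], dict(state)
    for i, task in enumerate(due_tasks(parse_tasks(text), state, now)):
        if i:
            time.sleep(stagger_seconds)  # stagger catch-up runs
        reply = invoke(task["prompt"])
        new_state[task["name"]] = now
        if not is_ack(reply):
            outputs.append(reply)
    return outputs, new_state
```

In the real runner, `invoke` would wrap something like `subprocess.run(["claude", "-p", prompt], capture_output=True, text=True).stdout`, and the returned state would be written back to .heartbeat-state with `json.dump`.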

Step 11: Create proactive agent template with ADL/VFM guardrails

Files: scripts/templates/proactive/PROACTIVE-AGENT.md (new), scripts/templates/proactive/ADL-RULES.md (new), scripts/templates/proactive/VFM-SCORING.md (new), scripts/templates/proactive/README.md (new)
Changes:
- PROACTIVE-AGENT.md template: An agent template (with YAML frontmatter) for agents that can self-improve. Contains:
  - Standard agent frontmatter with model: sonnet and appropriate tools
  - "How you work" section describing the proactive cycle: observe environment, identify improvements, score proposed changes against VFM, implement if score > 50, log all decisions
  - "Self-improvement protocol" section: before making any change to your own config, skills, or prompts, you MUST run VFM scoring. Changes that score <= 50 are logged but NOT implemented.
  - "Rules" section including all ADL limits: no fake intelligence (do not pretend capabilities you lack), no unverifiable modifications (all changes must be testable), no novelty over stability (prefer proven approaches), stability > explainability > reusability > scalability > novelty
  - "Self-healing" section: when encountering errors, try up to 5 different approaches before asking for help. Log each attempt with result.
- ADL-RULES.md template: Anti-Drift Limits reference. Full list of constraints that prevent agent drift:
  1. No fake intelligence -- do not simulate capabilities
  2. No unverifiable modifications -- every change must be testable
  3. No novelty over stability -- proven approaches first
  4. No scope expansion without approval -- stay within defined boundaries
  5. No silent failures -- all errors must be logged
  6. Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty
- VFM-SCORING.md template: Value-First Modification scoring guide. For any proposed self-modification:
  - Frequency: How often does this issue occur? (0-25 points)
  - Failure reduction: Does this fix real failures? (0-25 points)
  - Burden reduction: Does this reduce human effort? (0-25 points)
  - Cost savings: Does this reduce API/compute costs? (0-25 points)
  - Total score: sum of above. Implement only if > 50.
  - Logging format: date, proposed change, scores per dimension, total, decision (implement/defer)
- README.md: Explains the proactive agent pattern, when to use it, relationship to OpenClaw's proactive agent skill, examples of good vs bad self-modifications, how ADL and VFM work together.
Reuses: agents/builder.md for agent frontmatter format; OpenClaw proactive agent skill patterns (ADL, VFM, self-healing attempts)
Verify: ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l → expected: 4 files
On failure: revert -- templates must be valid and complete
Checkpoint: git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails"
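The VFM rubric reduces to a bounded sum and a strict threshold. A sketch of the scoring and log line, assuming a pipe-separated log layout (the plan lists the fields but not their format); `vfm_decision` is a hypothetical name.

```python
DIMENSIONS = ("frequency", "failure_reduction", "burden_reduction", "cost_savings")
THRESHOLD = 50  # implement only if the total is strictly greater than 50


def vfm_decision(change, scores, day):
    """Score a proposed self-modification; return (total, decision, log line)."""
    for dim in DIMENSIONS:
        value = scores[dim]
        if not 0 <= value <= 25:
            raise ValueError(f"{dim} must be 0-25, got {value}")
    total = sum(scores[d] for d in DIMENSIONS)
    decision = "implement" if total > THRESHOLD else "defer"
    per_dim = " ".join(f"{d}={scores[d]}" for d in DIMENSIONS)
    log_line = f"{day.isoformat()} | {change} | {per_dim} | total={total} | {decision}"
    return total, decision, log_line
```

Note the boundary: a change scoring exactly 50 is deferred, matching the "implement only if > 50" rule and the "score <= 50 are logged but NOT implemented" wording in PROACTIVE-AGENT.md.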

Step 12: Create isolated agentTurn template

Files: scripts/templates/cron/agent-turn.sh (new), scripts/templates/cron/system-event.sh (new), scripts/templates/cron/README.md (new)
Changes:
- agent-turn.sh template: Bash 3.2 compatible script for true background autonomy (OpenClaw's agentTurn pattern). Fires a full agent turn with its own isolated session:
  1. Accepts {{AGENT_NAME}}, {{WORKING_DIR}}, {{MAX_TURNS}} placeholders
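  2. Creates an isolated session with a fresh UUID (claude -p --session-id <uuid>; Claude Code session IDs must be valid UUIDs) and resumes it on later runs with claude -p --resume <uuid>. A readable label in the format agent:{{AGENT_NAME}}:turn:$(date +%s) is stored next to the UUID in a state file, since the label itself cannot serve as the session ID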
  3. Loads context from HEARTBEAT.md and SESSION-STATE.md before invoking
  4. Writes results to a dated log file
  5. Includes PID tracking for orphan detection (write PID to .agent-turn.pid, check on next run)
  6. Session rollover: after configurable number of turns (default 200) or age (default 72h), start a fresh session
- system-event.sh template: Lighter-weight script for injecting a text event into an existing session (OpenClaw's systemEvent pattern):
  1. Accepts {{SESSION_ID}}, {{EVENT_TEXT}} placeholders
  2. Uses claude --resume {{SESSION_ID}} -p "{{EVENT_TEXT}}" to inject into existing session
  3. Does NOT create a new session -- requires an active session
- README.md: Explains the difference between agentTurn (full background autonomy, isolated session) and systemEvent (inject into existing session), when to use each, session lifecycle and rollover, orphan detection pattern.
Reuses: scripts/templates/automation.sh for headless runner pattern; scripts/templates/heartbeat/heartbeat-runner.sh pattern from Step 10; OpenClaw cron service patterns (session targets, rollover, catchup)
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh → expected: no syntax errors
On failure: retry -- fix bash 3.2 syntax issues, then revert if still failing
Checkpoint: git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates"
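Session rollover and orphan detection are small enough to sketch in python3. This assumes the caller keeps a session record (`id`, `label`, `created`, `turns`) in a state file and increments `turns` after each run; the record shape and function names are hypothetical. `os.kill(pid, 0)` is the standard POSIX existence check and sends no signal.

```python
import os
import uuid

MAX_TURNS_PER_SESSION = 200
MAX_SESSION_AGE = 72 * 3600  # seconds


def needs_rollover(session, now):
    """True if the isolated session should be replaced by a fresh one."""
    if session is None:
        return True
    return (session["turns"] >= MAX_TURNS_PER_SESSION
            or now - session["created"] >= MAX_SESSION_AGE)


def next_session(session, agent_name, now):
    """Session record for this run: a new UUID on rollover, else the same one."""
    if needs_rollover(session, now):
        return {"id": str(uuid.uuid4()),  # what the CLI receives
                "label": f"agent:{agent_name}:turn:{now}",  # human-readable only
                "created": now, "turns": 0}
    return session


def pid_is_running(pid_file):
    """Orphan check: is the PID recorded by the previous run still alive?"""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A fresh record (turns == 0) would be started with `--session-id`, an existing one continued with `--resume`. If `pid_is_running` reports a live process, the new run should skip or report an orphan rather than start a second turn.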

Step 13: Update agent-system-design skill with OpenClaw pattern references

Files: skills/agent-system-design/SKILL.md, skills/agent-system-design/references/memory-patterns.md (new), skills/agent-system-design/references/autonomy-patterns.md (new)
Changes:
- SKILL.md: Add new sections:
  - "## Memory patterns" -- describe 3-tier memory, link to ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md
  - "## Autonomy patterns" -- describe proactive agent, ADL/VFM, link to ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md
  - "## Heartbeat scheduling" -- describe heartbeat pattern, link to templates
  - Update "Getting started" to mention /agent-factory:build (already renamed in Step 1)
  - Add to trigger phrases in description: "agent memory", "3-tier memory", "WAL protocol", "proactive agent", "self-improving agent", "heartbeat scheduling"
- memory-patterns.md (new): Comprehensive reference on 3-tier memory:
  - Architecture: hot (SESSION-STATE.md), warm (daily logs), cold (MEMORY.md)
  - WAL Protocol: write before responding, prevents data loss on crashes
  - Working Buffer Protocol: activate at 60% context usage, capture key exchanges
  - Compaction Recovery: read order for resuming after context loss
  - Template locations and placeholder variables
- autonomy-patterns.md (new): Reference on self-governing agent patterns:
  - Proactive agent cycle: observe, identify, score (VFM), implement or defer
  - ADL constraints with examples
  - VFM scoring rubric with worked examples
  - Self-healing protocol (up to 5 attempts before escalation, matching the PROACTIVE-AGENT.md template from Step 11)
  - Isolated agentTurn vs systemEvent
  - When NOT to use proactive patterns (simple pipelines, human-in-the-loop workflows)
Reuses: skills/agent-system-design/SKILL.md existing structure; skills/agent-system-design/references/feature-map.md format for reference docs; OpenClaw research brief patterns
Verify: grep -cE "memory-patterns|autonomy-patterns|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md → expected: >= 3 matching lines
On failure: revert -- skill file must remain valid
Checkpoint: git commit -m "feat(skills): add memory and autonomy pattern references to agent-system-design"

Step 14: Create heartbeat context injection templates (Paperclip pattern)

Files: scripts/templates/heartbeat/context-packet.md (new), scripts/templates/heartbeat/wake-prompt.md (new)

Changes:

context-packet.md template: Paperclip's "Memento Man" pattern -- a curated context payload injected on each heartbeat wakeup. Contains sections with {{PLACEHOLDER}} variables:

```
# Context Packet: {{AGENT_NAME}}
Generated: {{TIMESTAMP}}

## Identity
{{AGENT_IDENTITY}}

## Current Goals
{{ACTIVE_GOALS}}

## Memory State
{{MEMORY_SUMMARY}}

## Task Queue
{{PENDING_TASKS}}

## Recent Events (last 24h)
{{RECENT_EVENTS}}

## Wake Reason
{{WAKE_REASON}}

## Budget Status
Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%)
{{BUDGET_WARNING}}
```

wake-prompt.md template: The prompt template composed for each heartbeat wakeup. Follows Paperclip's bootstrapPromptTemplate + promptTemplate pattern:

```
You are {{AGENT_NAME}}. You are waking up for a scheduled heartbeat.

Read the context packet below carefully. It contains everything you
need to know about your current state and pending work.

{{CONTEXT_PACKET}}

Your task for this beat:
{{WAKE_REASON}}

Rules:
- Do NOT infer tasks from prior conversations
- Only work on what is in the context packet and wake reason
- If nothing needs attention, respond with HEARTBEAT_OK
- Update SESSION-STATE.md before finishing
```
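The two templates combine by plain string replacement: render the context packet first, then substitute it for {{CONTEXT_PACKET}} in the wake prompt. A minimal Python sketch under that assumption; the helper names are hypothetical. It refuses to emit a prompt that still contains unfilled placeholders and recognizes the HEARTBEAT_OK idle reply.

```python
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render(template, values):
    """Plain string replacement of {{NAME}} placeholders (no template engine)."""
    out = template
    for name, value in values.items():
        out = out.replace("{{" + name + "}}", str(value))
    return out


def unfilled(text):
    """Sorted, de-duplicated names of placeholders still present in text."""
    return sorted(set(PLACEHOLDER.findall(text)))


def compose_wake_prompt(packet_template, wake_template, values):
    packet = render(packet_template, values)
    prompt = render(wake_template, dict(values, CONTEXT_PACKET=packet))
    missing = unfilled(prompt)
    if missing:
        raise ValueError("unfilled placeholders: " + ", ".join(missing))
    return prompt


def is_idle_beat(reply):
    """The agent signals 'nothing to do' with a bare HEARTBEAT_OK."""
    return reply.strip() == "HEARTBEAT_OK"
```

Failing loudly on a leftover placeholder is cheaper than letting an agent wake up with a literal `{{ACTIVE_GOALS}}` in its prompt.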

Reuses: scripts/templates/heartbeat/HEARTBEAT.md from Step 10; Paperclip's context snapshot pattern from source code analysis
Verify: ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l → expected: 5 files total (HEARTBEAT.md, heartbeat-runner.sh, README.md from Step 10 + context-packet.md, wake-prompt.md from this step)
On failure: revert -- templates must have valid placeholder syntax
Checkpoint: git commit -m "feat(templates): add context injection templates (Paperclip heartbeat pattern)"

Step 15: Create goal hierarchy templates (Paperclip pattern)

Files: scripts/templates/goals/GOALS.md (new), scripts/templates/goals/goal-tracker.sh (new), scripts/templates/goals/README.md (new)
Changes:
- GOALS.md template: File-based goal hierarchy using Paperclip's simple parent_id pattern but adapted for flat files. Format:
```
# Goals: {{PROJECT_NAME}}

## Company Goals
- [G1] {{COMPANY_GOAL_1}}
- [G2] {{COMPANY_GOAL_2}}

## Project Goals
- [G1.1] {{PROJECT_GOAL_1}} (parent: G1)
- [G1.2] {{PROJECT_GOAL_2}} (parent: G1)
- [G2.1] {{PROJECT_GOAL_3}} (parent: G2)

## Task Goals
- [G1.1.1] {{TASK_GOAL_1}} (parent: G1.1, owner: {{AGENT_NAME}}, status: active)
- [G1.1.2] {{TASK_GOAL_2}} (parent: G1.1, owner: {{AGENT_NAME}}, status: pending)
```
  Each goal has: ID (hierarchical dot notation), description, parent reference, owner agent, status (active/pending/complete/blocked). The parent reference is a simple string match -- no recursive traversal needed (matching Paperclip's actual implementation vs. aspirational docs).
- goal-tracker.sh template: Bash 3.2 compatible script that:
  1. Reads GOALS.md
  2. Uses python3 to parse goal entries and their parent/status/owner fields
  3. Reports: count by status, orphaned goals (parent doesn't exist), goals without owners
  4. Can update status: ./goal-tracker.sh complete G1.1.1 marks goal as complete
  5. Generates a goal summary for context injection (used by heartbeat context-packet)
- README.md: Explains goal hierarchy pattern, the simple parent_id approach (honest about it being flat, not recursive), how agents reference goals, integration with heartbeat context injection.
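A minimal Python sketch of the parsing that goal-tracker.sh hands to python3. It assumes the exact line format of the GOALS.md template above, and it treats only goals that carry a `status` field (task goals) as needing an owner. Function names are hypothetical. Parent lookup is a plain string match, with no recursive traversal.

```python
import re
from collections import Counter

# "- [G1.1.1] Draft issue (parent: G1.1, owner: writer, status: active)"
GOAL_LINE = re.compile(r"^- \[(G[\d.]+)\]\s+(.*?)(?:\s+\(([^)]*)\))?\s*$")


def parse_goals(text):
    """Map goal ID -> {"desc": ..., plus any parent/owner/status fields}."""
    goals = {}
    for line in text.splitlines():
        m = GOAL_LINE.match(line.strip())
        if not m:
            continue
        fields = {"desc": m.group(2)}
        for part in (m.group(3) or "").split(","):
            if ":" in part:
                key, value = part.split(":", 1)
                fields[key.strip()] = value.strip()
        goals[m.group(1)] = fields
    return goals


def report(goals):
    """Counts by status, orphaned goals, and task goals without an owner."""
    by_status = Counter(g.get("status", "none") for g in goals.values())
    orphans = sorted(gid for gid, g in goals.items() if "parent" in g and g["parent"] not in goals)
    unowned = sorted(gid for gid, g in goals.items() if "status" in g and not g.get("owner"))
    return by_status, orphans, unowned


def set_status(text, goal_id, new_status):
    """Rewrite the status field of one goal line (e.g. `complete G1.1.1`)."""
    pattern = re.compile(r"(\[" + re.escape(goal_id) + r"\][^\n]*status:\s*)\w+")
    return pattern.sub(lambda m: m.group(1) + new_status, text, count=1)


def summary(goals):
    """Active goals as plain lines for the heartbeat context packet."""
    lines = [f"{gid}: {g['desc']}" for gid, g in sorted(goals.items()) if g.get("status") == "active"]
    return "\n".join(lines) or "(no active goals)"
```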
Reuses: Paperclip's goal hierarchy pattern (simple parent_id, not recursive); context packet integration from Step 14
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/goals/goal-tracker.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add goal hierarchy templates (Paperclip pattern)"

Step 16: Create budget tracking templates (Paperclip pattern)

Files: scripts/templates/budget/budget-hook.sh (new), scripts/templates/budget/BUDGET.md (new), scripts/templates/budget/budget-report.sh (new), scripts/templates/budget/README.md (new)
Changes:
- budget-hook.sh template: PostToolUse hook that logs cost events. Bash 3.2 compatible. Following Paperclip's post-hoc enforcement pattern:
  1. Reads tool call result from stdin (JSON via python3)
  2. Extracts usage data if available (token counts; confirm whether the PostToolUse payload carries them or whether they must be read from the session transcript)
  3. Appends cost event to budget/cost-events.jsonl with timestamp, agent name, token counts, estimated cost
  4. After logging, calls budget-check logic: reads budget/BUDGET.md for policy, sums cost events for current window
  5. If soft threshold (default 80%) reached: writes warning to stderr
  6. If hard threshold (100%) reached AND hard_stop enabled: writes block message, creates budget/PAUSED flag file
  7. [ASSUMPTION] Cost estimation uses Anthropic API pricing. If API billing endpoint is available, query it instead of estimating from token counts.
- BUDGET.md template: Budget policy file:
```
# Budget Policy: {{PROJECT_NAME}}

## Company Budget
- window: {{BUDGET_WINDOW}}  # calendar_month or lifetime
- limit: {{BUDGET_LIMIT_CENTS}} cents
- warn_percent: 80
- hard_stop: true

## Agent Budgets
- {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}}

## Notification
- on_warn: log  # log | file | hook
- on_hard_stop: pause  # pause | terminate
```
- budget-report.sh template: Bash 3.2 compatible script that:
  1. Reads budget/cost-events.jsonl
  2. Uses python3 to aggregate costs by agent, by day, by window
  3. Compares against BUDGET.md policies
  4. Outputs a formatted report showing: total spend, per-agent spend, % of budget used, projection for current window
  5. Checks for PAUSED flag file and reports paused agents
- README.md: Explains budget enforcement pattern, post-hoc vs pre-run checking (honest about limitations), how to integrate budget-hook.sh into settings.json PostToolUse, Anthropic API integration for accurate cost data (marked as [ASSUMPTION]).
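A minimal Python sketch of the estimate-and-check logic that budget-hook.sh delegates to python3. The per-token rates are HYPOTHETICAL placeholders (real pricing varies by model and changes over time). The cost-events.jsonl fields `timestamp` (ISO 8601 with offset) and `cost_cents` are an assumed schema. Per-agent budgets are left out.

```python
import json
from datetime import datetime, timezone

# HYPOTHETICAL rates in cents per million tokens; substitute current pricing
# for the model in use, or real billing data if it is available.
RATES_CENTS_PER_MTOK = {"input": 300, "output": 1500}


def estimate_cost_cents(input_tokens, output_tokens, rates=RATES_CENTS_PER_MTOK):
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def in_window(timestamp, window, now):
    if window == "lifetime":
        return True
    t = datetime.fromisoformat(timestamp)  # calendar_month: same year and month
    return (t.year, t.month) == (now.year, now.month)


def check_budget(events_jsonl, limit_cents, window="calendar_month",
                 warn_percent=80, hard_stop=True, now=None):
    """Return (status, spent_cents, percent) with status ok | warn | hard_stop."""
    now = now or datetime.now(timezone.utc)
    spent = 0.0
    for line in events_jsonl.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if in_window(event["timestamp"], window, now):
            spent += event["cost_cents"]
    percent = 100.0 * spent / limit_cents if limit_cents else 0.0
    if hard_stop and percent >= 100:
        return "hard_stop", spent, percent  # caller creates budget/PAUSED
    if percent >= warn_percent:
        return "warn", spent, percent  # caller writes a warning to stderr
    return "ok", spent, percent
```

Because the check runs after the tool call has been logged, it can only stop the next call. This is the post-hoc limitation the README should state plainly.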
Reuses: scripts/templates/post-tool-use.sh for PostToolUse hook pattern; Paperclip's budget enforcement pattern (post-hoc, soft/hard thresholds, pause on exceed)
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-hook.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-report.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add budget tracking templates (Paperclip pattern)"

Step 17: Create governance and approval gate templates (Paperclip pattern)

Files: scripts/templates/governance/GOVERNANCE.md (new), scripts/templates/governance/approval-gate.sh (new), scripts/templates/governance/README.md (new)

Changes:

GOVERNANCE.md template: Governance policy file. "Autonomy is a privilege you grant" (Paperclip philosophy). Contains:

```
# Governance: {{PROJECT_NAME}}

## Autonomy Levels
- Level 0: Full manual approval (all tool calls require human OK)
- Level 1: Auto-approve safe operations (Read, Glob, Grep)
- Level 2: Auto-approve file operations (+ Write, Edit within project)
- Level 3: Auto-approve all except destructive (+ Bash non-destructive)
- Level 4: Full autonomy with hooks as guardrails

Current level: {{AUTONOMY_LEVEL}}

## Approval Gates
Gates are checkpoints where the agent MUST pause and request human approval.

- {{GATE_1_NAME}}: {{GATE_1_CONDITION}}
  Action: {{GATE_1_ACTION}}  # pause | notify | log
- {{GATE_2_NAME}}: {{GATE_2_CONDITION}}
  Action: {{GATE_2_ACTION}}

## Escalation Rules
- Budget exceeded: pause agent, notify via {{NOTIFICATION_METHOD}}
- Error threshold: after {{ERROR_THRESHOLD}} consecutive errors, pause agent
- Unknown tool call: block and log
- Scope violation: block and notify

## Audit Requirements
- All tool calls logged to audit.log
- Budget events logged to cost-events.jsonl
- Approval decisions logged to approvals.log
- Retention: {{LOG_RETENTION_DAYS}} days
```

approval-gate.sh template: PreToolUse hook that implements approval gates. Bash 3.2 compatible:
1. Reads GOVERNANCE.md to get current autonomy level and gate definitions
2. Uses python3 to parse the governance config
3. Based on autonomy level, decides whether to auto-approve or require approval
4. For gated operations: writes an approval request to governance/pending-approvals.jsonl with timestamp, tool name, tool input summary, and status=pending
5. Checks governance/approval-responses.jsonl for a matching response
6. If no response within timeout: blocks (exit 2) with message "Approval required. Check governance/pending-approvals.jsonl"
7. If approved: allows (exit 0) and logs to approvals.log
8. If denied: blocks (exit 2) and logs denial
README.md: Explains governance model, autonomy levels, how gates map to Claude Code hooks, how to configure notification channels, comparison with Paperclip's approvals table approach.
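A minimal Python sketch of the decision in steps 3-8, assuming GOVERNANCE.md has already been parsed into an integer autonomy level. The destructive-command markers and the approval-response fields (`request_id`, `decision`) are hypothetical. The "within project" path check for Level 2 is omitted. Exit code 2 is the blocking code used in the steps above.

```python
import json

# Auto-approved tools per level, from the GOVERNANCE.md template above.
LEVEL_TOOLS = {
    1: {"Read", "Glob", "Grep"},
    2: {"Read", "Glob", "Grep", "Write", "Edit"},
}
# HYPOTHETICAL markers of a "destructive" Bash command for Level 3.
DESTRUCTIVE_MARKERS = ("rm -rf", "git push --force", "git reset --hard", "mkfs", "dd if=")


def needs_approval(level, tool_name, tool_input):
    """True if this tool call must go through an approval gate."""
    if level <= 0:
        return True
    if level >= 4:
        return False  # full autonomy; other hooks remain the guardrails
    if level == 3:
        command = tool_input.get("command", "") if tool_name == "Bash" else ""
        return any(marker in command for marker in DESTRUCTIVE_MARKERS)
    return tool_name not in LEVEL_TOOLS[level]


def gate_exit_code(request_id, responses_jsonl):
    """0 = approved; 2 = blocked (denied, or no response yet)."""
    for line in responses_jsonl.splitlines():
        if not line.strip():
            continue
        response = json.loads(line)
        if response.get("request_id") == request_id:
            return 0 if response.get("decision") == "approved" else 2
    return 2
```

Treating "no response yet" the same as "denied" keeps the gate fail-closed. The agent retries after a human has written to approval-responses.jsonl.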

Reuses: scripts/templates/pre-tool-use.sh for PreToolUse hook pattern; Paperclip's approval mechanism and autonomy model
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/governance/approval-gate.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add governance and approval gate templates (Paperclip pattern)"

Step 18: Create org-chart template (Paperclip pattern)

Files: scripts/templates/org-chart/ORG-CHART.md (new), scripts/templates/org-chart/org-manager.sh (new), scripts/templates/org-chart/README.md (new)

Changes:

ORG-CHART.md template: File-based org chart using Paperclip's simple reportsTo pattern. Format:

```
# Organization: {{ORG_NAME}}

## Agents

| Agent | Role | Reports To | Status | Budget |
|-------|------|-----------|--------|--------|
| {{AGENT_1}} | {{ROLE_1}} | (board) | active | {{BUDGET_1}} |
| {{AGENT_2}} | {{ROLE_2}} | {{AGENT_1}} | active | {{BUDGET_2}} |
| {{AGENT_3}} | {{ROLE_3}} | {{AGENT_1}} | active | {{BUDGET_3}} |

## Delegation Rules

- Board (human) → top-level agents: task assignment, goal setting
- Manager agents → direct reports: task decomposition, delegation
- Cross-team requests: route through common ancestor in org chart
- Escalation: up the reporting chain to the first agent with authority

## Human Override

The human operator is the "board of directors" with override authority
on all decisions. Any agent can be paused, redirected, or terminated
by the human at any time.
```

org-manager.sh template: Bash 3.2 compatible script that:
1. Reads ORG-CHART.md and the .claude/agents/ directory
2. Uses python3 to parse the org chart table
3. Validates: all agents listed exist as .claude/agents/*.md files, no circular reporting chains, all agents have a role
4. Can add an agent: ./org-manager.sh add agent-name "Role" reports-to-agent
5. Can remove an agent: ./org-manager.sh remove agent-name (reassigns direct reports to parent)
6. Generates a text-based org tree visualization
README.md: Explains org chart pattern, Paperclip's simple reportsTo FK approach, how delegation flows through the hierarchy, cross-team routing, human override authority.
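A minimal Python sketch of org-manager.sh's parsing, validation, removal, and tree rendering. It assumes the table layout of the template above, with `(board)` marking top-level agents. Function names are hypothetical.

```python
import os


def parse_org_table(text):
    """Map agent -> {"role", "reports_to", "status"} from the markdown table."""
    agents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # skip the header row and the |---| separator row
        if len(cells) < 4 or cells[0] == "Agent" or set(cells[0]) <= set("-: "):
            continue
        agents[cells[0]] = {"role": cells[1], "reports_to": cells[2], "status": cells[3]}
    return agents


def find_cycles(agents):
    """Agents whose reporting chain loops back to themselves."""
    in_cycle = []
    for start in agents:
        seen, node = [], start
        while node in agents and node not in seen:
            seen.append(node)
            node = agents[node]["reports_to"]
        if node == start:
            in_cycle.append(start)
    return sorted(in_cycle)


def validate(agents, agents_dir):
    problems = []
    for name, info in sorted(agents.items()):
        if not os.path.isfile(os.path.join(agents_dir, name + ".md")):
            problems.append(f"{name}: no {name}.md in {agents_dir}")
        if not info["role"]:
            problems.append(f"{name}: missing role")
    problems += [f"{name}: circular reporting chain" for name in find_cycles(agents)]
    return problems


def remove_agent(agents, name):
    """Delete an agent and reassign its direct reports to its own manager."""
    manager = agents.pop(name)["reports_to"]
    for info in agents.values():
        if info["reports_to"] == name:
            info["reports_to"] = manager


def render_tree(agents, root="(board)", depth=0):
    lines = []
    for name in sorted(a for a, info in agents.items() if info["reports_to"] == root):
        lines.append("  " * depth + f"{name} ({agents[name]['role']})")
        lines += render_tree(agents, name, depth + 1)
    return lines
```

Agents caught in a cycle are never reachable from `(board)`, so `render_tree` terminates even on an invalid chart. `validate` reports them separately.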

Reuses: Paperclip's org chart implementation (reportsTo FK, role field); scripts/templates/goals/goal-tracker.sh pattern for markdown table parsing with python3
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/org-chart/org-manager.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add org-chart template (Paperclip pattern)"

Step 19: Update agent-system-design skill with Paperclip pattern references

Files: skills/agent-system-design/SKILL.md, skills/agent-system-design/references/orchestration-patterns.md (new), skills/agent-system-design/references/governance-patterns.md (new)
Changes:
- SKILL.md: Add new sections:
  - "## Orchestration patterns" -- describe heartbeat scheduling, context injection, goal hierarchy, org chart. Link to ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md
  - "## Governance patterns" -- describe autonomy levels, approval gates, budget enforcement. Link to ${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md
  - Add to trigger phrases: "heartbeat scheduling", "goal hierarchy", "budget tracking", "approval gates", "governance", "org chart", "multi-agent coordination", "agent budget"
  - Update the "System components" table to include: Memory (3-tier), Heartbeat, Goals, Budget, Governance, Org Chart
- orchestration-patterns.md (new): Reference covering:
  - Heartbeat scheduling with context injection
  - Goal hierarchy (simple parent_id, not recursive)
  - Org chart and delegation
  - Task checkout via file locking (atomically create task.lock containing the agent name; if the file already exists, another agent holds the task)
  - Session persistence (task-specific session IDs)
  - Comparison matrix: OpenClaw cron vs Paperclip heartbeat vs Claude Code /schedule
- governance-patterns.md (new): Reference covering:
  - Autonomy levels (0-4 scale)
  - Approval gates and human oversight
  - Budget enforcement (post-hoc)
  - Audit trail requirements
  - Error threshold and automatic pause
  - Paperclip's "autonomy is a privilege" philosophy
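Task checkout works only if claiming is atomic. A separate check-then-write lets two agents claim the same task. Below is a minimal Python sketch for orchestration-patterns.md that uses an exclusive create (`O_CREAT | O_EXCL`) on task.lock. It assumes all agents share a local filesystem; exclusive create may not be reliable on some network filesystems. The function names are hypothetical.

```python
import os


def try_checkout(lock_path, agent_name):
    """Claim a task by creating its lock file atomically; False if already claimed."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_name + "\n")
    return True


def lock_owner(lock_path):
    """Name of the agent holding the task, or None if it is unclaimed."""
    try:
        with open(lock_path, encoding="utf-8") as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def release(lock_path, agent_name):
    """Only the holder may release the lock."""
    if lock_owner(lock_path) == agent_name:
        os.remove(lock_path)
        return True
    return False
```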
Reuses: skills/agent-system-design/SKILL.md existing structure; Step 13 pattern additions; skills/agent-system-design/references/security-patterns.md format
Verify: grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md → expected: >= 5
On failure: revert -- skill file must remain valid
Checkpoint: git commit -m "feat(skills): add orchestration and governance pattern references"

Step 20: Create feedback loop templates (Self-learning)

Files: scripts/templates/feedback/FEEDBACK.md (new), scripts/templates/feedback/feedback-collector.sh (new), scripts/templates/feedback/performance-scorer.sh (new), scripts/templates/feedback/README.md (new)
Changes:
- FEEDBACK.md template: Feedback tracking file:
```
# Feedback Log: {{PROJECT_NAME}}

## Format

Each entry records the outcome of a pipeline run and user feedback.

| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
|------|----------|-------|-------|-------|------------|---------|
```
  The Pattern column captures recurring issues for optimization.
- feedback-collector.sh template: PostToolUse hook variant that, after pipeline completion (detected by checking if the pipeline skill's final step ran), prompts for feedback collection:
  1. Reads the pipeline output
  2. Reads the reviewer score (if available)
  3. Appends an entry to FEEDBACK.md with: date, pipeline name, participating agents, reviewer score, identified issues
  4. Detects patterns: if the same issue type appears 3+ times, flags it as a recurring pattern
  5. All file operations use python3 for parsing (the FEEDBACK.md markdown table and JSON), wrapped in a bash 3.2 compatible script
- performance-scorer.sh template: Standalone scoring script:
  1. Reads FEEDBACK.md and cost-events.jsonl
  2. Uses python3 to compute per-agent metrics: average reviewer score, error rate, cost per run, improvement trend (last 10 vs previous 10)
  3. Outputs a performance report with recommendations: which agents need prompt tuning, which are most cost-effective, which have degrading performance
  4. Flags agents scoring below configurable threshold (default: 60/100) for prompt review
- README.md: Explains feedback loop pattern, how to integrate with existing pipelines, how scoring informs self-improvement (connects to VFM from Step 11), example of a feedback-driven prompt iteration cycle.
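A minimal Python sketch of the two computations above. The first is recurring-issue detection (the same issue 3+ times). The second is the per-agent average score, trend (last 10 vs previous 10), and below-60 flag. It assumes one agent per FEEDBACK.md row, rows in chronological order, and numeric scores on a 0-100 scale. Error rate and cost per run are left out.

```python
from collections import Counter, defaultdict
from statistics import mean


def parse_feedback(text):
    """Rows of the FEEDBACK.md table as dicts (header and separator skipped)."""
    rows = []
    for line in text.splitlines():
        if not line.strip().startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7 or cells[0] == "Date" or cells[0].startswith("-"):
            continue
        date, pipeline, agent, score, issue, resolution, pattern = cells
        rows.append({"date": date, "pipeline": pipeline, "agent": agent,
                     "score": float(score), "issue": issue})
    return rows


def recurring_issues(rows, min_count=3):
    """Issues seen at least min_count times: candidates for the Pattern column."""
    counts = Counter(r["issue"] for r in rows if r["issue"])
    return sorted(issue for issue, n in counts.items() if n >= min_count)


def agent_report(rows, threshold=60.0, window=10):
    """Per-agent average score, trend (last `window` vs previous `window`), and flag."""
    scores = defaultdict(list)
    for r in rows:  # rows assumed to be in chronological order
        scores[r["agent"]].append(r["score"])
    report = {}
    for agent, s in scores.items():
        recent, previous = s[-window:], s[-2 * window:-window]
        report[agent] = {
            "avg": mean(s),
            "trend": mean(recent) - mean(previous) if previous else None,
            "flag": mean(s) < threshold,  # below threshold -> prompt review
        }
    return report
```

The trend stays `None` until an agent has more than one window of history, so a new agent is not judged on a handful of runs.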
Reuses: scripts/templates/post-tool-use.sh for hook pattern; scripts/templates/proactive/VFM-SCORING.md from Step 11 for scoring methodology
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add feedback loop and performance scoring templates"

Step 21: Create pipeline optimization templates (Self-learning)

Files: scripts/templates/optimization/pipeline-optimizer.sh (new), scripts/templates/optimization/self-healing.sh (new), scripts/templates/optimization/README.md (new)
Changes:
- pipeline-optimizer.sh template: Script that analyzes pipeline performance data and suggests optimizations. Bash 3.2 compatible:
  1. Reads feedback/FEEDBACK.md and budget/cost-events.jsonl
  2. Uses python3 to identify:
    - Bottleneck agents (highest cost, lowest score)
    - Unnecessary revision loops (reviewer always passes → remove revision loop)
    - Underutilized agents (rarely invoked → consider merging roles)
    - Cost outliers (runs that cost 3x+ average → investigate prompt efficiency)
  3. Generates optimization/RECOMMENDATIONS.md with specific, actionable suggestions
  4. Each recommendation includes VFM pre-score (from Step 11) to help decide whether to implement
  5. Does NOT auto-implement changes -- recommendations only. Human or proactive agent decides.
- self-healing.sh template: Error recovery script that runs after pipeline failures:
  1. Reads the error log (from audit.log or pipeline output)
  2. Categorizes the error: timeout, permission denied, tool not found, API error, content quality
  3. For each category, applies a recovery strategy:
    - Timeout: retry with reduced max_turns
    - Permission denied: check hooks and settings, log which permission was missing
    - Tool not found: check MCP server status, log missing tool
    - API error: retry with backoff (5s, 15s, 45s -- max 3 attempts)
    - Content quality: log for feedback loop, do not retry
  4. Tracks attempt count (max 5 per OpenClaw pattern), escalates after max attempts
  5. All recovery attempts logged to optimization/healing-log.jsonl
- README.md: Explains pipeline optimization and self-healing patterns, how they connect to feedback loops (Step 20) and VFM scoring (Step 11), when to use auto-healing vs manual intervention, safety limits on auto-recovery.
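A minimal Python sketch of the self-healing classifier and recovery policy. The regex rules are HYPOTHETICAL and would need tuning against real audit.log output. The 5-attempt cap and the 5s/15s/45s API backoff come from the list above. Unrecognized errors escalate immediately, and content-quality failures are never retried.

```python
import re

# HYPOTHETICAL matching rules; tune against real audit.log / pipeline output.
RULES = [
    ("timeout", re.compile(r"timed? ?out", re.I)),
    ("permission_denied", re.compile(r"permission denied|not allowed", re.I)),
    ("tool_not_found", re.compile(r"tool \S+ not found|unknown tool", re.I)),
    ("api_error", re.compile(r"\b(429|5\d\d)\b|overloaded|rate.?limit", re.I)),
    ("content_quality", re.compile(r"reviewer|quality", re.I)),
]
MAX_ATTEMPTS = 5           # escalate after this many recovery attempts
API_BACKOFF = (5, 15, 45)  # seconds between API retries (max 3 retries)


def classify(error_text):
    for category, pattern in RULES:
        if pattern.search(error_text):
            return category
    return "unknown"


def next_action(category, attempts):
    """Return (action, delay_seconds) given how many attempts were already made."""
    if attempts >= MAX_ATTEMPTS:
        return ("escalate", None)
    if category == "api_error":
        if attempts < len(API_BACKOFF):
            return ("retry", API_BACKOFF[attempts])
        return ("escalate", None)
    if category == "timeout":
        return ("retry_reduced_max_turns", None)
    if category in ("permission_denied", "tool_not_found"):
        return ("diagnose_and_log", None)
    if category == "content_quality":
        return ("log_for_feedback", None)  # handed to the feedback loop, no retry
    return ("escalate", None)
```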
Reuses: scripts/templates/proactive/VFM-SCORING.md for scoring recommendations; scripts/templates/feedback/performance-scorer.sh output format; OpenClaw's 5-attempt self-healing pattern
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add pipeline optimization and self-healing templates"

Step 22: Create Docker deployment templates

Files: scripts/templates/docker/Dockerfile (new), scripts/templates/docker/docker-compose.yml (new), scripts/templates/docker/docker-entrypoint.sh (new), scripts/templates/docker/README.md (new)

Changes:

Dockerfile template:

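```
FROM node:22-slim

# python3 is required by the generated bash scripts (JSON/markdown parsing)
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Install Claude Code CLI
RUN npm install -g @anthropic-ai/claude-code

# Create agent user (non-root)
RUN useradd -m -s /bin/bash agent

# Copy project files
WORKDIR /home/agent/project
COPY . .
RUN chmod +x docker-entrypoint.sh && chown -R agent:agent /home/agent

USER agent

# Set up environment. ANTHROPIC_API_KEY is deliberately NOT set here: it is
# supplied at runtime (docker-compose environment/env_file) so it never
# ends up in an image layer.
ENV HOME=/home/agent

ENTRYPOINT ["./docker-entrypoint.sh"]
```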

docker-compose.yml template:

```
services:
  agent:
    build: .
    container_name: {{PROJECT_NAME}}-agent
    restart: unless-stopped
    volumes:
      - ./data:/home/agent/project/data
      - ./memory:/home/agent/project/memory
      - ./budget:/home/agent/project/budget
      - ./logs:/home/agent/project/logs
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - AGENT_NAME={{AGENT_NAME}}
      - HEARTBEAT_INTERVAL={{HEARTBEAT_INTERVAL}}
    env_file:
      - .env
```

docker-entrypoint.sh template: Bash 3.2 compatible entry script:
1. Validates required environment variables (ANTHROPIC_API_KEY)
2. Runs the heartbeat-runner.sh in a loop with the configured interval
3. Graceful shutdown: trap SIGTERM, finish current run, exit cleanly
4. Health check: write timestamp to /tmp/agent-health on each successful heartbeat
README.md: Build and run instructions, volume mount explanation (data, memory, budget, logs persist across container restarts), environment variables reference, security considerations (never bake API keys into image), how to use with docker-compose up/down.

Reuses: scripts/templates/heartbeat/heartbeat-runner.sh as the main loop inside the container; scripts/templates/automation.sh pattern; Paperclip's Docker deployment approach (from research)
Verify: test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/Dockerfile && test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-compose.yml → expected: both exist
On failure: revert -- Docker templates must be complete and valid
Checkpoint: git commit -m "feat(templates): add Docker deployment templates"

Step 23: Create import/export system

Files: scripts/templates/transfer/export-system.sh (new), scripts/templates/transfer/import-system.sh (new), scripts/templates/transfer/MANIFEST.md (new), scripts/templates/transfer/README.md (new)
Changes:
- export-system.sh template: Bash 3.2 compatible script that packages an agent system into a tarball:
  1. Accepts {{PROJECT_DIR}} and optional {{EXPORT_NAME}} placeholders
  2. Generates a MANIFEST.md listing all exported components with versions and checksums
  3. Collects: .claude/agents/*.md, .claude/skills/*/SKILL.md, .claude/hooks/*.sh or hooks/*.sh, .claude/settings.json, CLAUDE.md, HEARTBEAT.md, GOALS.md, GOVERNANCE.md, ORG-CHART.md, BUDGET.md, automation/ scripts
  4. Does NOT export: .env, *.local.*, audit.log, cost-events.jsonl, memory/ (runtime state), .git/
  5. Creates tarball: agent-system-{{EXPORT_NAME}}-{{DATE}}.tar.gz
  6. Outputs: tarball path, component count, total size
- import-system.sh template: Bash 3.2 compatible script that imports a tarball into a project:
  1. Accepts tarball path as argument
  2. Reads MANIFEST.md from the tarball to show what will be imported
  3. Checks for conflicts: existing files that would be overwritten
  4. If conflicts: lists them and exits with instructions (user must confirm or use --force)
  5. Extracts tarball contents into the project directory
  6. Replaces {{PLACEHOLDER}} variables in imported files with project-specific values (reads from a config file or prompts)
  7. Makes all .sh files executable
  8. Validates: all agent files have valid YAML frontmatter, all hook scripts pass bash -n
  9. Outputs: imported component list, any warnings
- MANIFEST.md template: Format for the manifest file included in exports:
```
# Agent System Export

Exported: {{DATE}}
Source: {{PROJECT_NAME}}
Version: {{VERSION}}

## Components

| Type | File | Checksum |
|------|------|----------|
| agent | .claude/agents/{{NAME}}.md | {{SHA256}} |
| skill | .claude/skills/{{NAME}}/SKILL.md | {{SHA256}} |
...

## Requirements
- Claude Code version: >= {{MIN_VERSION}}
- Required MCP servers: {{MCP_LIST}}
- Required tools: {{TOOL_LIST}}
```
- README.md: Explains import/export workflow, what is and isn't included, how to customize after import, round-trip verification process.
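A minimal Python sketch of the export file selection, manifest checksums, and import conflict check. The include/exclude patterns transcribe the lists above. Matching uses `fnmatch`, whose `*` also matches `/`. The component-type labels are an assumption, and tarball creation is omitted.

```python
import fnmatch
import hashlib
import os

# Transcribed from the export lists above (fnmatch's "*" also matches "/").
INCLUDE = [".claude/agents/*.md", ".claude/skills/*/SKILL.md", ".claude/hooks/*.sh",
           "hooks/*.sh", ".claude/settings.json", "CLAUDE.md", "HEARTBEAT.md",
           "GOALS.md", "GOVERNANCE.md", "ORG-CHART.md", "BUDGET.md", "automation/*"]
EXCLUDE = [".env", "*.local.*", "audit.log", "cost-events.jsonl", "memory/*"]


def collect(project_dir):
    """Relative paths (with "/") of files to export, sorted."""
    selected = []
    for root, dirs, names in os.walk(project_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # never descend into .git/
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), project_dir).replace(os.sep, "/")
            included = any(fnmatch.fnmatch(rel, p) for p in INCLUDE)
            excluded = any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p) for p in EXCLUDE)
            if included and not excluded:
                selected.append(rel)
    return sorted(selected)


def sha256_file(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def component_type(rel):
    if rel.startswith(".claude/agents/"):
        return "agent"
    if rel.startswith(".claude/skills/"):
        return "skill"
    return "script" if rel.endswith(".sh") else "config"


def manifest_table(project_dir, files):
    rows = ["| Type | File | Checksum |", "|------|------|----------|"]
    for rel in files:
        rows.append(f"| {component_type(rel)} | {rel} | {sha256_file(os.path.join(project_dir, rel))} |")
    return "\n".join(rows)


def find_conflicts(files, target_dir):
    """Files from the manifest that already exist in the target project."""
    return [rel for rel in files if os.path.exists(os.path.join(target_dir, rel))]
```

On import, the same `sha256_file` can re-verify each extracted file against the manifest. This gives the round-trip check a concrete pass/fail signal.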
Reuses: Existing file structure conventions; template placeholder pattern
Verify: bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/export-system.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/import-system.sh → expected: no syntax errors
On failure: retry -- fix bash syntax, then revert if still failing
Checkpoint: git commit -m "feat(templates): add import/export system for agent systems"

Step 24: Create 5 additional domain templates (total 10)

Files: scripts/templates/domains/customer-support.md (new), scripts/templates/domains/devops-automation.md (new), scripts/templates/domains/legal-review.md (new), scripts/templates/domains/sales-intelligence.md (new), scripts/templates/domains/security-audit.md (new)
Changes:

Template 6: customer-support.md -- Customer support automation. Agents: ticket-classifier, response-drafter, escalation-checker. Pipeline: classify incoming ticket, draft response, check if escalation needed, send or escalate.

Template 7: devops-automation.md -- DevOps pipeline automation. Agents: deploy-checker, incident-detector, runbook-executor. Pipeline: monitor deployments, detect incidents, execute runbook steps, report status.

Template 8: legal-review.md -- Legal document review. Agents: clause-extractor, risk-assessor, compliance-checker. Pipeline: extract key clauses, assess risks, check compliance requirements, produce review summary.

Template 9: sales-intelligence.md -- Sales research and intelligence. Agents: prospect-researcher, pitch-customizer, follow-up-tracker. Pipeline: research prospect, customize pitch materials, track follow-up actions.

Template 10: security-audit.md -- Security posture assessment. Agents: config-scanner, vulnerability-checker, remediation-advisor. Pipeline: scan configurations, check for vulnerabilities, advise on remediation, generate audit report.

Update scripts/templates/domains/README.md to include all 10 templates.

Each template follows the same format as Step 7: header comment, domain-specialized agent definitions, pipeline skill template, recommended hooks, example CLAUDE.md sections. All use {{PLACEHOLDER}} variables.
Reuses: scripts/templates/domains/*.md from Step 7 as format reference; agents/builder.md for agent format; commands/build.md Phase 3-4 patterns
Verify: ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l → expected: 11 (10 templates + README)
On failure: retry -- ensure all templates follow established format, then revert if still failing
Checkpoint: git commit -m "feat(templates): add 5 more domain templates (10 total)"

Step 25: Create MCP integration reference

Files: skills/agent-system-design/references/mcp-integrations.md (new)
Changes: Create a reference document for MCP server integrations commonly used with agent systems:
- Communication MCP servers: Slack, GitHub, Linear -- with .mcp.json configuration examples (verify current package names; the reference servers are published under the @modelcontextprotocol npm scope, e.g. @modelcontextprotocol/server-slack)
- Data MCP servers: PostgreSQL, SQLite, filesystem -- for agent data storage
- Browser MCP servers: Playwright (@playwright/mcp) -- for web interaction agents
- Custom MCP servers: How to build an MCP server that agents can use, using the official MCP SDK (@modelcontextprotocol/sdk for TypeScript)
- For each server: purpose, .mcp.json entry format, which agent types benefit from it, security considerations
- Integration with Agent Factory: how the builder agent configures .mcp.json during Phase 3, how the deployment command handles MCP server availability on different targets
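A minimal Python sketch of a structure check that could accompany the .mcp.json entry examples in this reference. It assumes Claude Code's project-scoped layout: a top-level `mcpServers` object whose entries are either stdio servers (`command`, optional `args` and `env`) or remote servers (`type` of `http` or `sse` with a `url`). It also assumes `${VAR}` references for secrets. Verify these field names against the current Claude Code MCP documentation. The server names in the usage test are hypothetical.

```python
import json
import re

ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)")


def validate_mcp_config(text):
    """Return a list of problems found in an .mcp.json document (empty = OK)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e.msg} (line {e.lineno})"]
    servers = data.get("mcpServers") if isinstance(data, dict) else None
    if not isinstance(servers, dict):
        return ['missing top-level "mcpServers" object']
    problems = []
    for name, cfg in servers.items():
        if not isinstance(cfg, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        kind = cfg.get("type", "stdio")
        if kind == "stdio":
            if not isinstance(cfg.get("command"), str):
                problems.append(f'{name}: stdio server needs a "command" string')
            if not isinstance(cfg.get("args", []), list):
                problems.append(f'{name}: "args" must be a list')
        elif kind in ("http", "sse"):
            if not isinstance(cfg.get("url"), str):
                problems.append(f'{name}: {kind} server needs a "url" string')
        else:
            problems.append(f"{name}: unknown type {kind!r}")
        if not isinstance(cfg.get("env", {}), dict):
            problems.append(f'{name}: "env" must be an object')
    return problems


def required_env_vars(text):
    """Names referenced as ${VAR} (or ${VAR:-default}); the user must set these."""
    return sorted(set(ENV_REF.findall(text)))
```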
Reuses: skills/agent-system-design/references/feature-map.md format; existing .mcp.json references in the feature map
Verify: test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md → expected: file exists
On failure: skip -- MCP reference is supplementary, not blocking
Checkpoint: git commit -m "docs(skills): add MCP integration reference"

Step 26: Update build command to integrate all Phase 2-5 features

Files: commands/build.md
Changes: Extend the build workflow with new optional phases (user can skip any):
- Phase 2.5 (after Operating Manual, before Agent Team): Memory Setup -- Ask if user wants 3-tier memory. If yes, generate SESSION-STATE.md, MEMORY.md, and memory/ directory from templates. Configure daily log rotation.
- Phase 3.5 (after Agent Team): Skills & Custom Components -- Ask if agents need specialized skills beyond what was generated. For each: check if a matching skill exists (Glob for .claude/skills/). If yes, wire it up. If no, offer two options: (a) generate a skill skeleton from template, (b) explain how to create one manually and offer to pause. If user says "I need to build this first", save build state to build-state.json (current phase, completed phases, all user choices so far) and say "Run /agent-factory:build --resume when ready." On resume: read build-state.json and continue from where we left off.
- Phase 3.7 (after Skills): Proactive Agent -- Ask if any agents should be proactive (self-improving). If yes, add ADL/VFM rules from proactive templates. Show VFM scoring example.
- Phase 4.5 (after Pipeline): Integrations & MCP Servers -- Critical phase for connecting external services. Ask: "What external services do your agents need? (Slack, GitHub, databases, APIs, etc.)". For each service:
  1. Check if MCP server exists: search known servers (official @modelcontextprotocol/server-* packages, vendor-maintained servers, community servers)
  2. If exists: generate .mcp.json entry with correct config, explain what env vars are needed
  3. If not exists: explain what an MCP server is, show the 3 options: (a) use an existing community server (provide search tips), (b) create a custom one (show skeleton + link to /mcp-builder skill or MCP docs), (c) use Bash tool directly (simpler but less structured)
  4. If user needs to create an MCP server: save build state and pause (same resume mechanism as Phase 3.5)
  5. After all integrations configured: validate .mcp.json syntax, test connectivity where possible
- Phase 5.5 (after Security): Governance -- Ask autonomy level (0-4). Generate GOVERNANCE.md with approval gates matching the chosen level. Integrate budget tracking if user wants it.
- Phase 5.7: Goals and Org Chart -- For multi-agent systems (3+ agents): generate GOALS.md and ORG-CHART.md. Define reporting structure and goal hierarchy.
- Phase 6 update: Deployment Target Selection -- Present ALL deployment options with clear trade-offs:
  - /schedule (Claude Code native) -- simplest, no infra needed, but requires Claude Code running
  - Local cron/launchd -- runs without Claude Code open, but tied to one machine
  - VPS systemd -- always-on, remote, but needs server access
  - Docker -- portable, isolated, reproducible, but needs Docker installed
  User MUST choose at least one. For each chosen target: generate config from templates, provide exact activation commands, explain how to verify it's running.
- Phase 7 update: After test run, offer to set up feedback loop (performance scoring template). Show how to run pipeline-optimizer.sh after first week.
- Resume mechanism: build-state.json format: { "phase": "4.5", "completed": ["1","2","3","3.5","4"], "choices": { "template": "monitoring", "agents": [...], "memory": true, ... }, "paused_reason": "User creating custom MCP server" }. On --resume: read state, print summary of what's done, continue from saved phase.
- Summary update: Include all new components in the final summary (memory, governance, goals, org-chart, budget, heartbeat, MCP integrations, deployment target).
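A minimal Python sketch of the resume mechanism. It assumes the build-state.json fields shown above and a phase order derived from this step: Phases 1-7 plus 2.5, 3.5, 3.7, 4.5, 5.5, and 5.7. Writing through a temporary file and `os.replace` keeps a crash from leaving a half-written state file. The function names are hypothetical.

```python
import json
import os

# Phase order derived from this step: the original Phases 1-7 plus the new
# optional phases slotted between them.
PHASES = ["1", "2", "2.5", "3", "3.5", "3.7", "4", "4.5", "5", "5.5", "5.7", "6", "7"]


def save_state(path, phase, completed, choices, paused_reason):
    """Write build-state.json atomically (temp file + os.replace)."""
    state = {"phase": phase, "completed": completed,
             "choices": choices, "paused_reason": paused_reason}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def resume_plan(path):
    """Load saved state; return (state, phases still to run, summary line)."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    remaining = PHASES[PHASES.index(state["phase"]):]
    summary = "Completed: {}. Paused: {}.".format(
        ", ".join(state["completed"]), state.get("paused_reason") or "n/a")
    return state, remaining, summary
```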
Reuses: Existing Phase 1-7 structure; all templates from Steps 9-24; MCP integration reference from Step 25
Verify: wc -l /Users/ktg/repos/agent-builder/commands/build.md → expected: significantly more lines than original (was ~390, should be ~600+)
On failure: revert -- build command is critical path, must be valid
Checkpoint: git commit -m "feat(commands): integrate all Phase 2-5 features into build workflow"

Step 27: Update plugin.json, CLAUDE.md, README.md for v1.0

Files: .claude-plugin/plugin.json, CLAUDE.md, README.md
Changes:
- plugin.json: Bump version to "1.0.0". Update description: "Build and manage autonomous agent systems with Claude Code. Guided workflow with 3-tier memory, heartbeat scheduling, budget tracking, governance, org-chart, 10 domain templates, and import/export. Inspired by OpenClaw and Paperclip patterns." Add keywords: "memory", "heartbeat", "budget", "governance", "org-chart", "templates", "import", "export". Update repository URL from agent-builder to agent-factory.
- CLAUDE.md: Update to reflect full Agent Factory capabilities:
  - Update "What this plugin does" to describe the full system
  - Update "Plugin structure" to list all 4 commands, 2 agents, 2 skills, and all template directories
  - Add a "Template directories" section listing: memory/, heartbeat/, proactive/, cron/, goals/, budget/, governance/, org-chart/, docker/, domains/, transfer/, feedback/, optimization/
- README.md: Complete rewrite reflecting Agent Factory v1.0:
  - New description emphasizing the full vision
  - Updated command table with all 4 commands
  - Feature list: 3-tier memory, heartbeat scheduling, proactive agents with ADL/VFM, goal hierarchy, budget tracking, governance and approval gates, org chart, 10 domain templates, Docker deployment, import/export
  - Architecture overview showing how patterns layer
  - Quick start guide
  - Pattern reference section linking to skill references
  - Version history
Reuses: Existing file structures; all features from Steps 1-26
Verify: python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'" → expected: no error
On failure: revert -- manifest must be valid JSON with correct version
Checkpoint: git commit -m "feat: Agent Factory v1.0.0 — full vision realized"
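
Step 27's verify command checks only `name` and `version`. The Python sketch below is a slightly broader check that also confirms the eight new keywords are present. It assumes `keywords` is a JSON array in plugin.json. `manifest_problems` and `check_manifest_file` are hypothetical helpers, not part of the plan.

```python
import json

# Keywords that Step 27 adds to .claude-plugin/plugin.json.
NEW_KEYWORDS = {"memory", "heartbeat", "budget", "governance",
                "org-chart", "templates", "import", "export"}


def manifest_problems(manifest):
    """Return human-readable problems; an empty list means the v1.0 checks pass."""
    problems = []
    if manifest.get("name") != "agent-factory":
        problems.append(f"name is {manifest.get('name')!r}, expected 'agent-factory'")
    if manifest.get("version") != "1.0.0":
        problems.append(f"version is {manifest.get('version')!r}, expected '1.0.0'")
    # Assumes keywords is a list of strings; a missing field counts as empty.
    missing = NEW_KEYWORDS - set(manifest.get("keywords", []))
    if missing:
        problems.append("missing keywords: " + ", ".join(sorted(missing)))
    return problems


def check_manifest_file(path=".claude-plugin/plugin.json"):
    """Load the manifest from disk and run the checks above."""
    with open(path, encoding="utf-8") as fh:
        return manifest_problems(json.load(fh))
```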

Failure recovery rules

On failure: revert -- undo this step's changes (git checkout -- {files}), do NOT proceed
On failure: retry -- attempt once more with the alternative approach described, then revert if still failing
On failure: skip -- this step is non-critical; continue to next step and note the skip
On failure: escalate -- stop execution entirely; the issue requires human judgment
Checkpoint -- after each step succeeds, commit changes so subsequent failures cannot corrupt completed work
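
Taken together, the four failure rules and the checkpoint rule form a small dispatch loop. The following Python is a hypothetical sketch of that loop. Each step is a dict of callables: `apply`, `verify`, `commit`, `revert`, and an optional `alternative`. A real executor would back these with the step's file edits, its Verify command, `git commit`, and `git checkout -- {files}`. All names here are illustrative.

```python
from enum import Enum


class OnFailure(Enum):
    REVERT = "revert"
    RETRY = "retry"
    SKIP = "skip"
    ESCALATE = "escalate"


class EscalationRequired(Exception):
    """Raised when a step's failure needs human judgment."""


def run_step(step):
    """Run one step; return 'committed', 'reverted' or 'skipped', or raise on escalate."""
    step["apply"]()
    if step["verify"]():
        step["commit"]()  # checkpoint so later failures cannot corrupt this work
        return "committed"
    policy = step["on_failure"]
    if policy is OnFailure.RETRY:
        # One more attempt, using the alternative approach if the step defines one.
        step.get("alternative", step["apply"])()
        if step["verify"]():
            step["commit"]()
            return "committed"
        policy = OnFailure.REVERT  # still failing: fall back to revert
    if policy is OnFailure.REVERT:
        step["revert"]()  # e.g. git checkout -- {files}
        return "reverted"
    if policy is OnFailure.SKIP:
        return "skipped"  # non-critical: the caller notes it and continues
    raise EscalationRequired(step["name"])


def run_plan(steps):
    """Execute steps in order. Stop after a revert; EscalationRequired stops everything."""
    log = []
    for step in steps:
        outcome = run_step(step)
        log.append((step["name"], outcome))
        if outcome == "reverted":
            break  # revert means: do NOT proceed
    return log
```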

Alternatives Considered

Approach	Pros	Cons	Why rejected
Template engine (Handlebars/EJS)	Richer templates, conditionals, loops	External dependency, added complexity	Spec requires plain string replace with `{{PLACEHOLDER}}` -- no engine
SQLite for budget/goals/org-chart	Structured queries, atomic operations	External dependency (sqlite3 binary), not human-readable	File-based approach is Claude Code-native, editable by humans and agents
Node.js scripts instead of bash	Richer JSON handling, async support	Requires a Node.js installation; the bash 3.2 constraint applies to generated scripts	Bash with python3 for JSON is sufficient and more portable
Single monolithic build command	Simpler mental model, one entry point	Too long, hard to test phases independently	Separate commands allow modular use; build orchestrates
Pre-run budget checking (like a reservation system)	Prevents overspend before it happens	Requires persistent service or lock file coordination	Paperclip's post-hoc approach is proven robust enough in practice
Vector memory (sqlite-vec like OpenClaw)	Better semantic search	External dependency, complexity far exceeds file-based approach	Spec explicitly excludes vector/embedding memory -- stay file-based
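
The first rejected alternative turns on using plain `{{PLACEHOLDER}}` replacement instead of a template engine. The Python sketch below shows what that means in practice. It assumes placeholder names are upper snake case, since the spec shows only `{{PLACEHOLDER}}`. Each known value is substituted with `str.replace`, and the function fails loudly if any placeholder is left unfilled. `render` and the placeholder names in the usage are hypothetical.

```python
import re

# Assumed placeholder syntax: {{UPPER_SNAKE_CASE}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render(template, values):
    """Fill {{NAME}} placeholders by plain string replacement (no conditionals, loops or escaping)."""
    out = template
    for name, value in values.items():
        out = out.replace("{{" + name + "}}", str(value))
    # Anything still shaped like a placeholder was not supplied.
    leftover = sorted(set(PLACEHOLDER.findall(out)))
    if leftover:
        raise KeyError(f"unfilled placeholders: {leftover}")
    return out


print(render("Agent {{AGENT_NAME}} runs every {{INTERVAL}}",
             {"AGENT_NAME": "monitor", "INTERVAL": "30m"}))
```

Replacement is sequential, so a value that itself contains `{{NAME}}` text may be substituted again. With plain-string templates, that is avoided only by convention.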

Test Strategy

Framework: No test framework in this project. All verification is via shell commands.
Existing patterns: Manual verification via bash -n for shell scripts, YAML parsing validation for agent/skill frontmatter, grep for content verification.
Verification approach: Each step includes a concrete verify command. Additionally:
- All .sh templates verified with bash -n (syntax check)
- All agent .md files verified for valid YAML frontmatter via python3 and PyYAML (`yaml.safe_load`)
- All command .md files verified for description: field in frontmatter
- Import/export round-trip tested: export a test system, import into a clean directory, verify all components present
- Domain templates verified: builder agent can read and apply placeholder substitution
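
The `description:` check on command files does not need a YAML parser. The dependency-free Python sketch below uses hypothetical helper names. It reads the block between the opening and closing `---` lines and requires a non-empty top-level `description:` key. Unlike the PyYAML check, it does not validate the rest of the YAML, and it would miss a description whose value starts on the following line.

```python
from pathlib import Path


def frontmatter_lines(text):
    """Return the lines between the opening and closing '---' delimiters, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # frontmatter must start on the first line
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i]
    return None  # opening delimiter without a closing one


def has_description(text):
    """True if the frontmatter has a non-empty, top-level `description:` key."""
    fm = frontmatter_lines(text)
    if fm is None:
        return False
    for line in fm:
        key, sep, value = line.partition(":")
        # Top-level keys start in column 0, so an indented "description" does not count.
        if sep and key == "description" and value.strip():
            return True
    return False


def commands_missing_description(commands_dir="commands"):
    """List command .md files that fail the check."""
    return sorted(p.name for p in Path(commands_dir).glob("*.md")
                  if not has_description(p.read_text(encoding="utf-8")))
```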

Risks and Mitigations

Priority	Risk	Location	Impact	Mitigation
High	Anthropic billing API may not be accessible or may have different auth	`scripts/templates/budget/budget-hook.sh`	Budget tracking limited to token-based estimation	[ASSUMPTION] marked in template; fallback to token count estimation with configurable pricing
High	`/schedule` API may change	`commands/deploy.md`, heartbeat templates	Heartbeat scheduling breaks	Templates use standard `claude -p` invocation which is stable; `/schedule` is one option among cron/launchd/systemd/Docker
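Medium	Bash 3.2 compatibility for complex scripts	All `.sh` templates	Scripts fail on macOS, whose bundled bash is 3.2 (Intel and Apple Silicon alike)	Every script checked with `bash -n` (this catches syntax errors only, so run it under bash 3.2 to catch 4.x-only syntax); python3 used for all complex operations (JSON, YAML, date math)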
Medium	27-step plan scope is large	All files	Execution takes multiple sessions	Execution strategy groups into independent waves; each step is independently committable
Low	Docker template may need updates for newer Claude Code versions	`scripts/templates/docker/`	Docker deployment breaks	Dockerfile uses `node:22-slim` and an unpinned `npm install -g`, which picks up the latest published version whenever the image is rebuilt
Low	Domain templates may not match all user domains	`scripts/templates/domains/`	Users need custom templates	10 templates cover common domains; builder agent can create custom templates
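
The token-count fallback for budget tracking is simple arithmetic. Cost equals input tokens times the input price plus output tokens times the output price, with prices configured per model. The Python sketch below assumes that model and records cost after each run, following the post-hoc style the plan adopts from Paperclip. `Pricing`, `BudgetLedger`, and the prices in the usage are hypothetical, not actual Anthropic rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Configured prices in USD per million tokens (set these per model)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(input_tokens, output_tokens, pricing):
    """Token-count fallback: tokens times configured price, summed over input and output."""
    return (input_tokens * pricing.input_per_mtok
            + output_tokens * pricing.output_per_mtok) / 1_000_000


@dataclass
class BudgetLedger:
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def record_run(self, input_tokens, output_tokens, pricing):
        """Post-hoc: add a finished run's estimated cost; return True once over budget."""
        self.spent_usd += estimate_cost(input_tokens, output_tokens, pricing)
        return self.spent_usd > self.monthly_limit_usd


# Illustrative prices only.
ledger = BudgetLedger(monthly_limit_usd=3.0)
print(ledger.record_run(1_000_000, 200_000, Pricing(1.0, 5.0)))
```

Because the check runs after the spend, a single expensive run can overshoot the limit. That is the trade-off the plan accepts over a reservation system.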

Assumptions

#	Assumption	Why unverifiable	Impact if wrong
1	Anthropic billing API exists and is accessible with standard API key	No docs found confirming exact endpoint	Budget tracking falls back to token-based estimation; budget-hook.sh needs manual cost configuration
2	`/schedule` trigger interface is stable enough to build on	Claude Code internal API, no stability guarantee	Heartbeat templates still work with cron/launchd/systemd; `/schedule` is optional deployment target
3	Docker deployment should use docker-compose.yml with Dockerfile	Spec assumption, no user confirmation	Minor: can generate either format; both templates provided
4	`claude --resume` with custom session IDs works for isolated agent turns	Based on CLI docs, not tested with custom session key formats	agentTurn template may need session ID format adjustment

WARNING: Plan has 4 unverified assumptions -- review before executing.

Verification

End-to-end checks that prove the plan was implemented correctly:

grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan" | wc -l → expected: 0 (all renamed)
python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); print(d['name'], d['version'])" → expected: agent-factory 1.0.0
ls /Users/ktg/repos/agent-builder/commands/ | sort → expected: build.md deploy.md evaluate.md status.md
ls /Users/ktg/repos/agent-builder/agents/ | sort → expected: builder.md deployment-advisor.md
ls /Users/ktg/repos/agent-builder/skills/ | sort → expected: agent-system-design managed-agents
ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l → expected: 11 (10 domain templates + README.md)
ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l → expected: 4
ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l → expected: 5
ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l → expected: 4
ls /Users/ktg/repos/agent-builder/scripts/templates/budget/ | wc -l → expected: 4
ls /Users/ktg/repos/agent-builder/scripts/templates/governance/ | wc -l → expected: 3
ls /Users/ktg/repos/agent-builder/scripts/templates/org-chart/ | wc -l → expected: 3
ls /Users/ktg/repos/agent-builder/scripts/templates/docker/ | wc -l → expected: 4
ls /Users/ktg/repos/agent-builder/scripts/templates/transfer/ | wc -l → expected: 4
find /Users/ktg/repos/agent-builder/scripts/templates -name "*.sh" -exec bash -n {} \; → expected: no errors (all scripts pass syntax check)
find /Users/ktg/repos/agent-builder/agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \; → expected: no errors (all agents have valid frontmatter)
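
When a directory is missing, the `ls ... | wc -l` checks print 0 instead of failing, which is the critic's minor note below. The hypothetical Python alternative below reports missing directories separately. Its expected counts are copied from the list above, and like plain `ls` it skips hidden files when counting entries.

```python
import tempfile
from pathlib import Path

# Expected entry counts, copied from the ls | wc -l checks above.
EXPECTED = {"memory": 4, "heartbeat": 5, "proactive": 4, "budget": 4,
            "governance": 3, "org-chart": 3, "docker": 4, "transfer": 4}
EXPECTED_DOMAIN_MD = 11  # 10 domain templates + README.md


def count_entries(directory):
    """Count non-hidden entries (files and subdirectories), as plain `ls` lists them."""
    return sum(1 for p in directory.iterdir() if not p.name.startswith("."))


def template_problems(templates_root):
    """Return one line per mismatch; an empty list means every count matches."""
    root = Path(templates_root)
    problems = []
    for name, want in EXPECTED.items():
        d = root / name
        if not d.is_dir():
            problems.append(f"{name}/: directory missing")
            continue
        got = count_entries(d)
        if got != want:
            problems.append(f"{name}/: {got} entries, expected {want}")
    domains = root / "domains"
    got = len(list(domains.glob("*.md"))) if domains.is_dir() else 0
    if got != EXPECTED_DOMAIN_MD:
        problems.append(f"domains/*.md: {got} files, expected {EXPECTED_DOMAIN_MD}")
    return problems


if __name__ == "__main__":
    # Demo on a scratch tree where only memory/ exists, so the others are reported missing.
    with tempfile.TemporaryDirectory() as tmp:
        mem = Path(tmp) / "memory"
        mem.mkdir()
        for i in range(4):
            (mem / f"file{i}.md").write_text("")
        for line in template_problems(tmp):
            print(line)
```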

Estimated Scope

Files to modify: 6 (plugin.json, CLAUDE.md, README.md, commands/build.md, skills/agent-system-design/SKILL.md, .gitignore)
Files to create: ~55 (3 commands (deploy, evaluate, status), 1 agent, 1 skill + references, ~45 template files across 13 template directories)
Complexity: high (27 steps across 5 development phases, comprehensive template library)

Execution Strategy

File Dependency Analysis

Steps were analyzed for file overlap. The following connected components emerge:

Component A (rename + commands + agents): Steps 1, 2, 3, 4, 5, 8, 26, 27 -- all touch commands/build.md, CLAUDE.md, README.md, or plugin.json
Component B (memory + OpenClaw patterns): Steps 9, 10, 11, 12, 13 -- touch scripts/templates/memory/, scripts/templates/heartbeat/, scripts/templates/proactive/, scripts/templates/cron/, skills/agent-system-design/
Component C (Paperclip patterns): Steps 14, 15, 16, 17, 18, 19 -- touch scripts/templates/heartbeat/, scripts/templates/goals/, scripts/templates/budget/, scripts/templates/governance/, scripts/templates/org-chart/, skills/agent-system-design/
Component D (Self-learning): Steps 20, 21 -- touch scripts/templates/feedback/, scripts/templates/optimization/
Component E (Integration): Steps 22, 23, 24, 25 -- touch scripts/templates/docker/, scripts/templates/transfer/, scripts/templates/domains/, skills/agent-system-design/references/
Component F (Domain templates initial): Step 7 -- touch scripts/templates/domains/

Note: Components B and C share skills/agent-system-design/SKILL.md (Steps 13 and 19) and scripts/templates/heartbeat/ (Steps 10 and 14). Component A shares commands/build.md across Steps 8 and 26. These overlaps create dependencies.
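
The grouping rule behind these components is that steps touching a common file must not run in parallel. Below is a hypothetical union-find sketch in Python, run over an illustrative subset of steps and files taken from the session scope fences. At strict file level it keeps Steps 10 and 14 apart, because they touch different heartbeat files as Revision 3 notes. It also merges Steps 13 and 19 through SKILL.md. The lettered components are therefore partly thematic groupings, which is why the note handles those overlaps as cross-component dependencies.

```python
def file_components(step_files):
    """Union-find: steps that share any file end up in the same component."""
    parent = {step: step for step in step_files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_owner = {}  # file -> first step seen touching it
    for step, files in step_files.items():
        for f in files:
            if f in first_owner:
                parent[find(step)] = find(first_owner[f])  # union the two groups
            else:
                first_owner[f] = step

    groups = {}
    for step in step_files:
        groups.setdefault(find(step), []).append(step)
    return sorted(sorted(g) for g in groups.values())


# Illustrative subset of steps and the files they touch.
STEP_FILES = {
    1: {".claude-plugin/plugin.json", "CLAUDE.md", "README.md", "commands/build.md"},
    8: {"commands/build.md"},
    10: {"scripts/templates/heartbeat/HEARTBEAT.md", "scripts/templates/heartbeat/heartbeat-runner.sh"},
    13: {"skills/agent-system-design/SKILL.md", "skills/agent-system-design/references/memory-patterns.md"},
    14: {"scripts/templates/heartbeat/context-packet.md", "scripts/templates/heartbeat/wake-prompt.md"},
    19: {"skills/agent-system-design/SKILL.md", "skills/agent-system-design/references/orchestration-patterns.md"},
    26: {"commands/build.md"},
    27: {".claude-plugin/plugin.json", "CLAUDE.md", "README.md"},
}

print(file_components(STEP_FILES))  # [[1, 8, 26, 27], [10], [13, 19], [14]]
```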

Session 1: Foundation -- Rename and Commands

Steps: 1, 2, 3, 4, 5
Wave: 1
Depends on: none
Scope fence:
- Touch: .claude-plugin/plugin.json, CLAUDE.md, README.md, commands/build.md, commands/deploy.md, commands/evaluate.md, commands/status.md, agents/deployment-advisor.md, skills/agent-system-design/SKILL.md (rename only)
- Never touch: scripts/templates/* (except existing), skills/managed-agents/

Session 2: Skills and Initial Templates

Steps: 6, 7
Wave: 1
Depends on: none
Scope fence:
- Touch: skills/managed-agents/SKILL.md, skills/managed-agents/references/api-patterns.md, scripts/templates/domains/*.md
- Never touch: commands/, agents/, .claude-plugin/, scripts/templates/memory/, scripts/templates/heartbeat/

Session 3: OpenClaw Memory and Autonomy Patterns

Steps: 9, 10, 11, 12
Wave: 1
Depends on: none
Scope fence:
- Touch: scripts/templates/memory/, scripts/templates/heartbeat/HEARTBEAT.md, scripts/templates/heartbeat/heartbeat-runner.sh, scripts/templates/heartbeat/README.md, scripts/templates/proactive/, scripts/templates/cron/
- Never touch: commands/, agents/, skills/, scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md

Session 4: Paperclip Orchestration Patterns

Steps: 14, 15, 16, 17, 18
Wave: 1
Depends on: none
Scope fence:
- Touch: scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md, scripts/templates/goals/, scripts/templates/budget/, scripts/templates/governance/, scripts/templates/org-chart/
- Never touch: commands/, agents/, skills/, scripts/templates/heartbeat/HEARTBEAT.md, scripts/templates/heartbeat/heartbeat-runner.sh, scripts/templates/heartbeat/README.md

Session 5: Self-Learning Systems

Steps: 20, 21
Wave: 1
Depends on: none
Scope fence:
- Touch: scripts/templates/feedback/, scripts/templates/optimization/
- Never touch: commands/, agents/, skills/, all other template dirs

Session 6: Integration -- Docker, Transfer, Additional Templates

Steps: 22, 23, 24
Wave: 1
Depends on: none
Scope fence:
- Touch: scripts/templates/docker/, scripts/templates/transfer/, scripts/templates/domains/customer-support.md, scripts/templates/domains/devops-automation.md, scripts/templates/domains/legal-review.md, scripts/templates/domains/sales-intelligence.md, scripts/templates/domains/security-audit.md, scripts/templates/domains/README.md (update only)
- Never touch: commands/, agents/, skills/, scripts/templates/domains/content-pipeline.md, scripts/templates/domains/code-review.md, scripts/templates/domains/monitoring.md, scripts/templates/domains/research-synthesis.md, scripts/templates/domains/data-processing.md

Session 7: Skill Updates and References

Steps: 13, 19, 25
Wave: 2
Depends on: Session 3 (Step 13 references memory templates), Session 4 (Step 19 references orchestration templates)
Scope fence:
- Touch: skills/agent-system-design/SKILL.md, skills/agent-system-design/references/memory-patterns.md, skills/agent-system-design/references/autonomy-patterns.md, skills/agent-system-design/references/orchestration-patterns.md, skills/agent-system-design/references/governance-patterns.md, skills/agent-system-design/references/mcp-integrations.md
- Never touch: commands/, agents/, scripts/templates/

Session 8: Build Command Integration and Finalization

Steps: 8, 26, 27
Wave: 3
Depends on: Session 1 (renamed commands), Session 2 (domain templates), Session 7 (skill updates)
Scope fence:
- Touch: commands/build.md, .claude-plugin/plugin.json, CLAUDE.md, README.md
- Never touch: scripts/templates/, skills/, agents/

Execution Order

Wave 1: Session 1, Session 2, Session 3, Session 4, Session 5, Session 6 (parallel -- 6 independent sessions)
Wave 2: Session 7 (after Sessions 3 and 4 complete)
Wave 3: Session 8 (after Sessions 1, 2, and 7 complete)
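
The wave assignment amounts to a level-by-level topological sort of the sessions' "Depends on" lines. The Python sketch below uses a hypothetical `waves` function to reproduce Waves 1-3. It rejects cyclic or unknown dependencies instead of looping forever.

```python
def waves(deps):
    """Group sessions into waves: each wave holds every session whose dependencies are done."""
    remaining = {s: set(d) for s, d in deps.items()}
    done = set()
    result = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among sessions {sorted(remaining)}")
        result.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return result


# "Depends on" lines from Sessions 1-8 above.
SESSION_DEPS = {1: set(), 2: set(), 3: set(), 4: set(), 5: set(), 6: set(),
                7: {3, 4}, 8: {1, 2, 7}}

print(waves(SESSION_DEPS))  # [[1, 2, 3, 4, 5, 6], [7], [8]]
```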

Grouping rules applied

Steps sharing skills/agent-system-design/SKILL.md (13, 19) grouped into Session 7
Steps sharing commands/build.md (8, 26) grouped into Session 8
Steps sharing scripts/templates/heartbeat/ split by specific files (Session 3: core files, Session 4: context files)
Template directories with no overlap placed in separate Wave 1 sessions for maximum parallelism
Integration steps (Session 8) depend on content they reference being complete

Plan Quality Score

Dimension	Weight	Score	Notes
Structural integrity	0.15	85	Clear step ordering, dependency chain respected, waves logical
Step quality	0.20	80	Each step has concrete file lists, changes, verify commands. Some templates are necessarily described at concept level
Coverage completeness	0.20	90	All 5 phases covered, all spec requirements mapped to steps
Specification quality	0.15	78	No TBDs or TODOs; 4 assumptions documented. Template contents described in detail but are large
Risk & pre-mortem	0.15	80	6 risks identified with mitigations. Bash 3.2 and API assumptions addressed
Headless readiness	0.15	82	Every step has On failure + Checkpoint. Execution strategy enables parallel sessions
Weighted total	1.00	82	Grade: B+
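
The weighted total can be recomputed from the table. The Python below is a plain arithmetic check, not part of the plan's tooling. It shows that the weights sum to 1.00 and the weighted sum is 82.75. The table reports this as 82, which is consistent with truncating rather than rounding.

```python
# (weight, score) per dimension, copied from the Plan Quality Score table.
DIMENSIONS = {
    "Structural integrity": (0.15, 85),
    "Step quality": (0.20, 80),
    "Coverage completeness": (0.20, 90),
    "Specification quality": (0.15, 78),
    "Risk & pre-mortem": (0.15, 80),
    "Headless readiness": (0.15, 82),
}

weight_sum = sum(w for w, _ in DIMENSIONS.values())
weighted_total = sum(w * s for w, s in DIMENSIONS.values())

print(f"weights sum to {weight_sum:.2f}")      # weights sum to 1.00
print(f"weighted total {weighted_total:.2f}")  # weighted total 82.75
```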

Adversarial review:

Plan critic: APPROVE_WITH_NOTES -- 0 blockers, 2 major (template content descriptions are concept-level, not line-by-line; Steps 8 and 26 both modify build.md which creates ordering risk), 3 minor (some verify commands assume directory exists before testing file count; Session 7 depends on Sessions 3 and 4 which is correctly modeled but adds latency)
Scope guardian: ALIGNED -- All 27 steps map to spec requirements. No scope creep detected. All 5 phases covered. 4 commands, 2 agents, 2 skills, 10 domain templates, import/export, Docker -- all per spec. The one apparent gap, "self-healing" in spec Phase 4, is covered by Step 21 (self-healing.sh template).

Revisions

#	Finding	Severity	Resolution
1	Steps 8 and 26 both modify `commands/build.md` -- risk of merge conflict if run in parallel	major	Grouped into same Session 8 (Wave 3), ensuring sequential execution
2	Steps 13 and 19 both modify `skills/agent-system-design/SKILL.md` -- same risk	major	Grouped into same Session 7 (Wave 2), ensuring sequential execution
3	Heartbeat template directory shared between Steps 10 (core files) and 14 (context files)	minor	Scope fence explicitly lists which files each session may touch -- no overlap on specific files
4	Domain templates README.md updated in both Step 7 and Step 24	minor	Step 7 creates README, Step 24 updates it. Placed in different sessions but no conflict since Step 24 appends to existing content. Session 6 scope fence for Step 24 correctly notes "update only" for README
5	Verify commands assume directories exist	minor	Steps that create directories have verify commands that test file existence, not directory listing. If directory creation fails, the file write in the step itself would have failed first, triggering On failure


