Agent Factory -- Full Vision Realization
Plan quality: B+ (82/100) -- APPROVE_WITH_NOTES
Generated by ultraplan-local v1.6.0 on 2026-04-11
WARNING: Plan has 4 unverified assumptions -- review before executing.
Context
The existing agent-builder plugin (v0.1.0) is a minimal prototype with 1 command
(/agent-builder:build), 1 agent (builder), 1 skill (agent-system-design), and
3 hook templates. Three commands are stubbed in the README but not implemented
(deploy, evaluate, status). The deployment-advisor agent and managed-agents
skill are mentioned but empty or missing.
This plan transforms agent-builder into Agent Factory: a comprehensive, guided system for building autonomous agent systems. The transformation spans 5 development phases, each building on the previous:
- Foundation (v0.2) -- rename, missing commands, deployment-advisor, managed-agents skill, domain templates
- OpenClaw patterns (v0.3) -- 3-tier memory, WAL protocol, Working Buffer, proactive agent with ADL/VFM, isolated agentTurn
- Paperclip patterns (v0.4) -- heartbeat with context injection, goal hierarchy, budget awareness, governance, org-chart
- Self-learning (v0.5) -- feedback loops, performance scoring, pipeline optimization, self-healing
- Full integration (v1.0) -- MCP integrations, Docker deployment, bundled templates, import/export
Spec: /Users/ktg/repos/agent-builder/.claude/ultraplan-spec-2026-04-11-agent-factory.md
Architecture Diagram
graph TD
subgraph "Plugin Structure (Agent Factory)"
PJ[".claude-plugin/plugin.json"]
CM["CLAUDE.md"]
subgraph "Commands (Phase 1)"
C1["commands/build.md"]
C2["commands/deploy.md -- NEW"]
C3["commands/evaluate.md -- NEW"]
C4["commands/status.md -- NEW"]
end
subgraph "Agents (Phase 1-3)"
A1["agents/builder.md"]
A2["agents/deployment-advisor.md -- NEW"]
end
subgraph "Skills (Phase 1-4)"
S1["skills/agent-system-design/"]
S2["skills/managed-agents/ -- NEW"]
end
subgraph "Templates (Phase 1-5)"
T1["scripts/templates/automation.sh"]
T2["scripts/templates/pre-tool-use.sh"]
T3["scripts/templates/post-tool-use.sh"]
T4["scripts/templates/memory/ -- NEW"]
T5["scripts/templates/heartbeat/ -- NEW"]
T6["scripts/templates/budget/ -- NEW"]
T7["scripts/templates/governance/ -- NEW"]
T8["scripts/templates/docker/ -- NEW"]
T9["scripts/templates/domains/ -- NEW"]
end
subgraph "References (Phase 1-4)"
R1["skills/agent-system-design/references/"]
R2["references: memory, heartbeat, budget, governance -- NEW"]
end
end
C1 --> A1
C2 --> A2
A1 --> T1 & T2 & T3 & T4 & T5
A2 --> T8
S2 --> R2
Codebase Analysis
- Tech stack: Claude Code plugin (markdown commands, agents, skills), bash 3.2 shell scripts, JSON config
- Key patterns: YAML frontmatter for agents/skills/commands, `{{PLACEHOLDER}}` template variables in shell scripts, `${CLAUDE_PLUGIN_ROOT}` for intra-plugin paths, python3 for JSON parsing in hooks
- Relevant files (all 15):
  - `.claude-plugin/plugin.json` -- plugin manifest (rename target)
  - `CLAUDE.md` -- plugin instructions
  - `README.md` -- user-facing documentation
  - `commands/build.md` -- the only implemented command
  - `agents/builder.md` -- the only implemented agent
  - `skills/agent-system-design/SKILL.md` -- main knowledge skill
  - `skills/agent-system-design/references/*.md` -- 4 reference files
  - `scripts/templates/*.sh` -- 3 shell templates
- Reusable code:
  - `agents/builder.md` -- agent frontmatter format, tool list pattern, rules structure
  - `commands/build.md` -- command frontmatter format, phase-based workflow pattern
  - `scripts/templates/pre-tool-use.sh` -- hook script pattern (stdin JSON, python3 parsing, exit codes)
  - `scripts/templates/post-tool-use.sh` -- audit logging pattern
  - `scripts/templates/automation.sh` -- headless runner pattern
  - `skills/agent-system-design/references/feature-map.md` -- capability mapping table format
- Conventions:
  - Agent descriptions must include `<example>` blocks for reliable trigger matching
  - Hook scripts use `exit 2` to block, `exit 0` to allow
  - Templates use `{{PLACEHOLDER}}` syntax (plain string replace, no engine)
  - All bash must be 3.2 compatible (no `declare -A`, no `mapfile`, no `|&`)
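The "plain string replace, no engine" convention can be sketched in a few lines of Python. This is only an illustration, not the builder's actual code: the function names `render` and `unfilled`, the sample template, and the `run-agent` command in it are all hypothetical. The placeholder pattern assumes names made of upper-case letters, digits, and underscores.

```python
import re

# Assumption: placeholder names use upper-case letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render(template, values):
    """Fill {{NAME}} placeholders by plain string replacement (no template engine)."""
    out = template
    for name, value in values.items():
        out = out.replace("{{" + name + "}}", str(value))
    return out


def unfilled(text):
    """Return the placeholder names still present in text, in first-seen order."""
    names = []
    for name in PLACEHOLDER.findall(text):
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical template line; SCHEDULE is deliberately left unfilled.
    template = "cd {{PROJECT_DIR}} && run-agent {{AGENT_NAME}} --schedule {{SCHEDULE}}"
    filled = render(template, {"PROJECT_DIR": "/tmp/demo", "AGENT_NAME": "writer"})
    print(filled)
    print(unfilled(filled))
```

Running a check like `unfilled` after rendering would catch a template copied into a user's project with placeholders still in it.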
Research Sources
| Technology | Source | Key Findings | Confidence |
|---|---|---|---|
| OpenClaw 3-tier memory | Proactive Agent Skill + source analysis | SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold); WAL protocol writes before responding; Working Buffer captures in danger zone (60%+ context) | high |
| OpenClaw Heartbeat | Source code `src/auto-reply/heartbeat.ts` | HEARTBEAT.md with YAML tasks block, interval tracking, emptiness detection skips API calls, ackMaxChars suppression | high |
| OpenClaw Cron | Source code `src/cron/service.ts` | systemEvent vs agentTurn types; isolated session target; startup catchup with stagger | high |
| OpenClaw ADL/VFM | Proactive Agent Skill docs | Anti-Drift Limits (no fake intelligence, no unverifiable mods); Value-First Modification (score > 50 required); Priority: Stability > Explainability > Reusability > Scalability > Novelty | high |
| Paperclip Heartbeat | Source code `server/src/services/heartbeat.ts` | Poll-based `tickTimers()`, 4 wakeup triggers, per-agent interval, concurrency control | high |
| Paperclip Budget | Source code `server/src/services/budgets.ts` | Post-hoc `evaluateCostEvent()`, `SUM(cost_cents)` per window, soft/hard thresholds, pause on exceed | high |
| Paperclip Goals | Source code, DB schema | Simple `parent_id` FK on goals table, NOT recursive traversal at runtime | high |
| Paperclip Org Chart | Source code, DB schema | `agents.reportsTo` self-referential FK + `agents.role` text field | high |
| Paperclip Adapter | Source code `packages/adapter-utils/src/types.ts` | `execute(ctx)` interface, 10 built-in adapters, Claude adapter uses CLI with `--append-system-prompt-file` | high |
| Anthropic Billing API | Spec assumption | [ASSUMPTION] Endpoint and auth mechanism unverified | low |
Implementation Plan
Step 1: Rename plugin from agent-builder to agent-factory
- Files: `.claude-plugin/plugin.json`, `CLAUDE.md`, `README.md`, `commands/build.md`, `skills/agent-system-design/SKILL.md`
- Changes:
  - `.claude-plugin/plugin.json`: Change `"name": "agent-builder"` to `"name": "agent-factory"`, update the description to mention "Agent Factory", bump version to `"0.2.0"`
  - `CLAUDE.md`: Replace "Agent Builder Plugin" with "Agent Factory Plugin", replace all references to "agent-builder" with "agent-factory"
  - `README.md`: Replace title `# Agent Builder` with `# Agent Factory`, update install command from `/install agent-builder` to `/install agent-factory`, update all `/agent-builder:` command prefixes to `/agent-factory:`, update plugin-dir path reference, update repository URL if needed
  - `commands/build.md`: Replace `/agent-builder:build` with `/agent-factory:build` in the system prompt text
  - `skills/agent-system-design/SKILL.md`: Update `/agent-builder:build` reference to `/agent-factory:build`
- Reuses: Existing file structures, no new patterns needed
- Verify: `grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan-spec"` → expected: no matches (all renamed)
- On failure: revert -- this is the foundation; all subsequent steps depend on correct naming
- Checkpoint: `git commit -m "feat!: rename plugin from agent-builder to agent-factory"`
Step 2: Create /agent-factory:deploy command
- Files: `commands/deploy.md` (new)
- Changes: Create a new command file with frontmatter:

      description: Configure deployment for your agent system. Supports /schedule, cron/launchd, systemd, and Docker.
      argument-hint: "Optional: deployment target (schedule, local, vps, docker)"
      allowed-tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "Agent", "AskUserQuestion"]

  The command body should:
  - Read the user's existing agent system (scan `.claude/agents/`, `.claude/skills/`, `CLAUDE.md`)
  - Ask the user to choose a deployment target via AskUserQuestion: `/schedule` (Claude Code native), Local (cron/launchd), VPS (systemd), Docker (compose), or use `$ARGUMENTS` if provided
  - For `/schedule`: Generate a HEARTBEAT.md file with task entries parsed from the user's pipeline skills, generate automation wrapper reading `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh`
  - For Local: Copy and customize `automation.sh` template, generate cron/launchd config
  - For VPS: Generate systemd service and timer units, customize automation.sh
  - For Docker: Generate `Dockerfile` and `docker-compose.yml` from templates (Step 16)
  - Use the `deployment-advisor` agent (Step 3) when the user needs guidance
  - Provide exact activation commands for the chosen target
- Reuses: `commands/build.md` for command frontmatter format and phase structure; `scripts/templates/automation.sh` for headless runner pattern; `skills/agent-system-design/references/deployment-targets.md` for target-specific guidance
- Verify: `head -5 /Users/ktg/repos/agent-builder/commands/deploy.md` → expected: valid YAML frontmatter with `description:` field
- On failure: revert -- command file must be valid YAML frontmatter
- Checkpoint: `git commit -m "feat(commands): add /agent-factory:deploy command"`
Step 3: Create deployment-advisor agent
- Files: `agents/deployment-advisor.md` (new)
- Changes: Create agent file with frontmatter:

      name: deployment-advisor
      description: |
        Use this agent when the user needs help choosing or configuring a deployment target for their agent system.

        <example>
        Context: User has built agents and wants to deploy
        user: "How should I deploy my agent system?"
        assistant: "I'll use the deployment-advisor to analyze your setup and recommend a target."
        <commentary>
        Deployment guidance request triggers the advisor.
        </commentary>
        </example>

        <example>
        Context: User wants to switch deployment targets
        user: "Can I move my agents from cron to Docker?"
        assistant: "I'll use the deployment-advisor to plan the migration."
        <commentary>
        Deployment migration request triggers the advisor.
        </commentary>
        </example>
      model: sonnet
      color: blue
      tools: ["Read", "Glob", "Grep", "Bash", "AskUserQuestion"]

  The agent body should:
  - Scan the user's project for existing agents, skills, hooks, settings, and automation files
  - Assess requirements: always-on needed? team access? Computer Use? budget constraints?
  - Recommend a deployment target using the decision guide from `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-targets.md`
  - Generate deployment-specific configuration files
  - Include rules: never overwrite existing deployment config without confirmation, always verify generated scripts with `bash -n`, always include rollback instructions
- Reuses: `agents/builder.md` for agent file format and example block structure; `skills/agent-system-design/references/deployment-targets.md` for deployment decision logic
- Verify: `python3 -c "import yaml; yaml.safe_load(open('/Users/ktg/repos/agent-builder/agents/deployment-advisor.md').read().split('---')[1])"` → expected: no error (valid YAML frontmatter)
- On failure: retry -- fix YAML syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(agents): add deployment-advisor agent"`
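The Verify commands in Steps 2 through 6 all amount to "does the file have well-formed frontmatter with the expected fields". A minimal sketch of that check follows, using only the standard library, where the Verify command in this step relies on the third-party `yaml` module instead. It assumes the frontmatter sits between two `---` delimiter lines, as the `split('---')` in the Verify command implies. The function names are hypothetical, and this is a key-presence check, not a YAML parser.

```python
def parse_frontmatter_keys(text):
    """Return the top-level keys found between the first two '---' lines.

    Only unindented 'key:' lines are considered, so block-scalar content
    (such as an indented description with <example> blocks) is skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a '---' frontmatter delimiter")
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    raise ValueError("frontmatter is not closed by a second '---' line")


def missing_keys(text, required):
    """Return the required keys that the frontmatter does not define."""
    found = parse_frontmatter_keys(text)
    return [key for key in required if key not in found]


if __name__ == "__main__":
    sample = (
        "---\n"
        "name: deployment-advisor\n"
        "description: |\n"
        "  Use this agent when the user needs deployment help.\n"
        "model: sonnet\n"
        "---\n"
        "Agent body goes here.\n"
    )
    print(parse_frontmatter_keys(sample))
    print(missing_keys(sample, ["name", "description", "tools"]))
```

A check along these lines would catch a missing `description:` or `name:` field, which `head -5` alone leaves to visual inspection.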
Step 4: Create /agent-factory:evaluate command
- Files: `commands/evaluate.md` (new)
- Changes: Create command with frontmatter:

      description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
      argument-hint: "Optional: focus area (security, deployment, memory, autonomy)"
      allowed-tools: ["Read", "Glob", "Grep", "Bash"]

  The command body should:
  - Scan the project for all agent system components: `.claude/agents/*.md`, `.claude/skills/*/SKILL.md`, `.claude/hooks/*.sh` or `hooks/*.sh`, `.claude/settings.json`, `CLAUDE.md`, `automation/` or `scripts/`, `memory/` or `data/`
  - For each of the 22 capabilities from `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`, check whether the user's project has the corresponding component
  - Score: count OK/Partial/Missing for each capability
  - Output a capability matrix table showing: capability name, status (OK/Partial/Missing), what exists, what's needed
  - Provide specific recommendations for filling gaps, ordered by impact
  - If `$ARGUMENTS` specifies a focus area, expand that section with detailed guidance
- Reuses: `skills/agent-system-design/references/feature-map.md` for the 22-capability checklist; `commands/build.md` for command frontmatter format
- Verify: `head -5 /Users/ktg/repos/agent-builder/commands/evaluate.md` → expected: valid YAML frontmatter with `description:` field
- On failure: revert -- command must have valid frontmatter
- Checkpoint: `git commit -m "feat(commands): add /agent-factory:evaluate command"`
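The scoring step can be sketched as a small classifier. The rule used here is an assumption made for illustration, since the plan does not define how OK/Partial/Missing is decided: OK when every expected path exists, Partial when only some do, Missing when none do. The capability names and path lists are hypothetical stand-ins for the 22 entries in `feature-map.md`.

```python
from collections import Counter

STATUSES = ("OK", "Partial", "Missing")


def classify(required, present):
    """Status for one capability: all required paths present, some, or none."""
    hits = [path for path in required if path in present]
    if required and len(hits) == len(required):
        return "OK"
    return "Partial" if hits else "Missing"


def evaluate(capabilities, present):
    """Return (capability -> status, status -> count) for the whole checklist."""
    matrix = {name: classify(paths, present) for name, paths in capabilities.items()}
    counts = Counter(matrix.values())
    return matrix, {status: counts.get(status, 0) for status in STATUSES}


if __name__ == "__main__":
    # Hypothetical three-entry checklist; the real one has 22 capabilities.
    capabilities = {
        "Memory": ["memory/MEMORY.md", "SESSION-STATE.md"],
        "Hooks": [".claude/hooks/pre-tool-use.sh"],
        "Deployment": ["HEARTBEAT.md"],
    }
    present = {"memory/MEMORY.md", ".claude/hooks/pre-tool-use.sh"}
    matrix, totals = evaluate(capabilities, present)
    for name, status in matrix.items():
        print(f"| {name} | {status} |")
    print(totals)
```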
Step 5: Create /agent-factory:status command
- Files: `commands/status.md` (new)
- Changes: Create command with frontmatter:

      description: Quick status check of your agent infrastructure. Shows agents, skills, hooks, deployment, and recent activity.
      argument-hint: ""
      allowed-tools: ["Read", "Glob", "Grep", "Bash"]

  The command body should:
  - Scan for agents: `Glob` for `.claude/agents/*.md`, list each with name and model
  - Scan for skills: `Glob` for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md`, list each with name
  - Scan for hooks: `Glob` for `.claude/hooks/*.sh` and `hooks/*.sh`, check if executable
  - Check settings: read `.claude/settings.json` if it exists, summarize permissions and hook config
  - Check deployment: look for automation scripts, launchd plists, systemd units, docker-compose files, HEARTBEAT.md
  - Check memory: look for `memory/MEMORY.md`, `data/run-state.json`, SESSION-STATE.md
  - Check recent activity: read audit.log if it exists, show last 5 entries
  - Output a compact status table with sections for each component type
  - Flag any issues: missing hooks for deployed agents, agents without examples in description, skills without version
- Reuses: `commands/build.md` for command frontmatter format
- Verify: `head -5 /Users/ktg/repos/agent-builder/commands/status.md` → expected: valid YAML frontmatter
- On failure: revert -- command must have valid frontmatter
- Checkpoint: `git commit -m "feat(commands): add /agent-factory:status command"`
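The hook scan ("check if executable") is the one status check that is easy to get subtly wrong, so a small sketch may help. It is written as standalone Python for illustration only; the command itself would do this through the `Glob` and `Bash` tools, and `scan_hooks` is a hypothetical name.

```python
import os
from pathlib import Path


def scan_hooks(project):
    """Return (relative_path, is_executable) for hook scripts in both locations."""
    found = []
    for pattern in (".claude/hooks/*.sh", "hooks/*.sh"):
        for path in sorted(project.glob(pattern)):
            relative = path.relative_to(project).as_posix()
            found.append((relative, os.access(path, os.X_OK)))
    return found


if __name__ == "__main__":
    for relative, executable in scan_hooks(Path(".")):
        flag = "ok" if executable else "NOT executable"
        print(f"{relative}: {flag}")
```

A hook that exists but is not executable is worth flagging in the status table, since a script the runtime cannot execute cannot block anything.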
Step 6: Create managed-agents skill
- Files: `skills/managed-agents/SKILL.md` (new), `skills/managed-agents/references/api-patterns.md` (new)
- Changes:
  - `skills/managed-agents/SKILL.md`: Create skill with frontmatter:

        name: managed-agents
        description: |
          This skill should be used when the user asks about "managed agents", "Anthropic API agents", "cloud-hosted agents", "agent SDK", "deploying agents to the cloud", "serverless agents", "API-based agent deployment", "/v1/agents endpoint"
        version: 0.1.0

    Body should cover:
    - What managed agents are (Anthropic-hosted agent runtime)
    - When to use them vs. local deployment (decision matrix)
    - SDK patterns for TypeScript and Python
    - Session management and persistence
    - Budget and cost considerations for API-based agents
    - Migration path from local to managed
  - `skills/managed-agents/references/api-patterns.md`: Reference doc with concrete code patterns for agent creation, session management, tool configuration, and error handling using `@anthropic-ai/sdk`
- Reuses: `skills/agent-system-design/SKILL.md` for skill file format; `skills/agent-system-design/references/deployment-targets.md` for the managed agents section content
- Verify: `head -5 /Users/ktg/repos/agent-builder/skills/managed-agents/SKILL.md` → expected: valid YAML frontmatter with `name: managed-agents`
- On failure: revert -- skill must have valid frontmatter
- Checkpoint: `git commit -m "feat(skills): add managed-agents knowledge skill"`
Step 7: Create domain template infrastructure and first 5 templates
- Files: `scripts/templates/domains/content-pipeline.md` (new), `scripts/templates/domains/code-review.md` (new), `scripts/templates/domains/monitoring.md` (new), `scripts/templates/domains/research-synthesis.md` (new), `scripts/templates/domains/data-processing.md` (new), `scripts/templates/domains/README.md` (new)
- Changes: Create 5 domain-specific pipeline templates. Each template is a plain markdown file with `{{PLACEHOLDER}}` variables that the builder agent copies into the user's project. Each template file contains:
  - A header comment explaining the domain and what gets generated
  - Agent definitions (researcher/writer/reviewer variants specialized to the domain)
  - Pipeline skill template with domain-specific steps
  - Recommended hook patterns for the domain
  - Example CLAUDE.md sections relevant to the domain

  Template 1: content-pipeline.md -- Content production (articles, newsletters, reports). Agents: content-researcher, content-writer, content-reviewer. Pipeline: research topic, draft article, review quality, publish.

  Template 2: code-review.md -- Automated code review. Agents: code-analyzer, review-writer, standards-checker. Pipeline: analyze PR/diff, write review, check against standards, post review.

  Template 3: monitoring.md -- System/service monitoring. Agents: monitor-checker, incident-reporter, remediation-advisor. Pipeline: check endpoints/logs, detect anomalies, report incidents, suggest fixes.

  Template 4: research-synthesis.md -- Research and analysis. Agents: source-gatherer, synthesizer, fact-checker. Pipeline: gather sources, synthesize findings, verify claims, produce brief.

  Template 5: data-processing.md -- Data transformation pipelines. Agents: data-validator, transformer, quality-checker. Pipeline: validate input, transform data, check output quality, save results.

  README.md: Index listing all available templates with one-line descriptions, usage instructions for the builder agent, and template contribution guidelines.

  All templates must use `{{PROJECT_DIR}}`, `{{AGENT_NAME}}`, `{{PIPELINE_NAME}}`, `{{SCHEDULE}}`, `{{DOMAIN}}` as placeholder variables.
- Reuses: `agents/builder.md` for agent definition format; `commands/build.md` Phase 3-4 for pipeline creation patterns; `skills/agent-system-design/references/pipeline-patterns.md` for the 9-step pipeline template
- Verify: `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/ | wc -l` → expected: 6 (5 templates + README)
- On failure: retry -- ensure all 5 templates have valid structure, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add 5 domain-specific pipeline templates"`
Step 8: Update build command to use domain templates and new features
- Files: `commands/build.md`
- Changes:
  - Add a Phase 0 (pre-interview) that asks: "Would you like to start from a domain template? Available: content-pipeline, code-review, monitoring, research-synthesis, data-processing. Or choose 'custom' for a blank start."
  - If template chosen: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md` and pre-populate Phase 1 design sketch with template roles and pipeline structure
  - Update Phase 6 (Deployment) to reference the new `/agent-factory:deploy` command instead of inline deployment logic. Add `/schedule` as a deployment option. Change the question to offer all 5 targets: `/schedule`, Local (cron/launchd), Mac Mini (launchd), VPS (systemd), Docker
  - Update the summary at end to mention `/agent-factory:evaluate` and `/agent-factory:status` as next steps
  - Update all self-references from `/agent-builder:build` to `/agent-factory:build`
- Reuses: Existing Phase 1-7 structure in `commands/build.md`; `scripts/templates/domains/` templates from Step 7
- Verify: `grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md` → expected: >= 3 references
- On failure: revert -- build command is critical path
- Checkpoint: `git commit -m "feat(commands): integrate domain templates and new commands into build workflow"`
Step 9: Create 3-tier memory templates (OpenClaw pattern)
- Files: `scripts/templates/memory/SESSION-STATE.md` (new), `scripts/templates/memory/DAILY-LOG.md` (new), `scripts/templates/memory/MEMORY.md` (new), `scripts/templates/memory/README.md` (new)
- Changes:
  - SESSION-STATE.md template: Hot working memory. Contains `{{AGENT_NAME}}` header, sections for Current Task, Context Window Usage (with 60% danger zone marker), Active Decisions, Pending Actions. WAL Protocol instruction: "Write important details HERE before responding to the user." Includes a Working Buffer section that activates when context usage exceeds 60%: "When context usage is above 60%, capture key exchanges in the Working Buffer before they are lost to compaction."
  - DAILY-LOG.md template: Warm daily capture. Filename pattern: `memory/{{YYYY-MM-DD}}.md`. Sections: Date, Summary of Work, Decisions Made, Files Modified, Issues Encountered, Carry Forward (items for next session). Auto-rotation instruction: one file per day, builder generates the date-stamped filename.
  - MEMORY.md template: Cold long-term curated memory. Sections: Agent Identity, Key Learnings (manually curated from daily logs), Recurring Patterns, Known Issues, Project Context, Last Updated. Compaction Recovery instruction: "When recovering from compaction, read in order: SESSION-STATE.md first, then today's daily log, then MEMORY.md, then search all daily logs for relevant context."
  - README.md: Explains the 3-tier memory pattern, how each tier works, when to read/write each tier, and the WAL protocol and Working Buffer protocol with concrete examples.
- Reuses: OpenClaw's proactive agent skill pattern (from research); existing `MEMORY.md` convention in Claude Code
- Verify: `ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l` → expected: 4 files
- On failure: revert -- templates must be valid markdown with proper placeholders
- Checkpoint: `git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)"`
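The Compaction Recovery read order and the daily-log filename pattern can be made concrete with a short sketch. Several details are assumptions made for illustration: SESSION-STATE.md is placed at the project root and MEMORY.md under `memory/` (as the paths in Step 5 suggest), and older daily logs are visited newest first. The function names are hypothetical.

```python
import re
from datetime import date

# Daily logs are named YYYY-MM-DD.md, matching the memory/{{YYYY-MM-DD}}.md pattern.
DAILY_LOG = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")


def daily_log_name(day):
    """File name for one day's warm-tier log."""
    return day.isoformat() + ".md"


def recovery_read_order(memory_files, today):
    """Order in which to re-read memory after compaction.

    memory_files: names of the files inside memory/ (not full paths).
    """
    todays = daily_log_name(today)
    older = sorted(
        (name for name in memory_files if DAILY_LOG.match(name) and name != todays),
        reverse=True,  # ISO dates sort chronologically, so this is newest first
    )
    order = ["SESSION-STATE.md"]                  # hot tier first
    if todays in memory_files:
        order.append("memory/" + todays)          # then today's daily log
    order.append("memory/MEMORY.md")              # then the cold tier
    order.extend("memory/" + name for name in older)  # finally the older logs
    return order


if __name__ == "__main__":
    files = ["MEMORY.md", "2026-04-09.md", "2026-04-11.md", "2026-04-10.md"]
    for path in recovery_read_order(files, date(2026, 4, 11)):
        print(path)
```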
Step 10: Create heartbeat and cron templates (OpenClaw + Paperclip patterns)
- Files: `scripts/templates/heartbeat/HEARTBEAT.md` (new), `scripts/templates/heartbeat/heartbeat-runner.sh` (new), `scripts/templates/heartbeat/README.md` (new)
- Changes:
  - HEARTBEAT.md template: Agent heartbeat file following OpenClaw's format. Contains:

        # Heartbeat: {{AGENT_NAME}}

        Read this file on each heartbeat. Follow it strictly.
        Do not infer or repeat old tasks from prior chats.
        If nothing needs attention, reply HEARTBEAT_OK.

        ## Tasks

        tasks:
          - name: {{TASK_1_NAME}}
            interval: {{TASK_1_INTERVAL}}
            prompt: "{{TASK_1_PROMPT}}"
          - name: {{TASK_2_NAME}}
            interval: {{TASK_2_INTERVAL}}
            prompt: "{{TASK_2_PROMPT}}"

        ## Context

        {{CONTEXT_NOTES}}

  - heartbeat-runner.sh template: Bash 3.2 compatible script that:
    - Reads HEARTBEAT.md
    - Implements emptiness detection (skip API call if file has only headers/empty items -- saves cost, from OpenClaw pattern)
    - Parses task intervals using python3 (parse YAML tasks block, check last-run timestamps from a `.heartbeat-state` JSON file)
    - For each due task: invokes `claude -p` with the task prompt
    - Updates `.heartbeat-state` with new last-run timestamps
    - Implements startup catchup: on first run after downtime, runs up to 5 missed tasks with 5-second stagger between them (OpenClaw pattern)
    - Suppresses responses shorter than 300 chars from output (ackMaxChars pattern)
    - All date math uses `python3` (not bash arithmetic) for portability

    Script must use `#!/bin/bash` and be bash 3.2 compatible: no associative arrays, no mapfile, no `|&`
  - README.md: Documents the heartbeat pattern, explains `systemEvent` vs `agentTurn` distinction, explains emptiness detection cost saving, explains startup catchup logic, provides example cron entries for running the heartbeat runner at various intervals.
- Reuses: `scripts/templates/automation.sh` for the headless runner pattern; OpenClaw heartbeat source code patterns (emptiness detection, task parsing, ackMaxChars); Paperclip's heartbeat interval and wakeup trigger model
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax errors (likely 3.2 compatibility issues), then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup"`
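Since the runner delegates parsing and date math to python3, the two decisions it has to make (is the file effectively empty, and which tasks are due) can be sketched as plain functions. This is a sketch under stated assumptions, not OpenClaw's implementation. Intervals are assumed to be already converted to seconds, and the `.heartbeat-state` file is assumed to map task name to a last-run epoch timestamp. The cap of 5 tasks with a 5-second stagger is applied to every run rather than only to the first run after downtime. All names are hypothetical.

```python
import re

# A bare list marker, optionally followed by an empty checkbox, counts as "empty".
EMPTY_ITEM = re.compile(r"[-*+]\s*(\[\s?\])?")


def is_effectively_empty(text):
    """True when the file holds only blank lines, markdown headers, or empty items."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if EMPTY_ITEM.fullmatch(line):
            continue
        return False
    return True


def plan_run(tasks, last_run, now, max_tasks=5, stagger_seconds=5):
    """Return (task_name, start_delay_seconds) for each due task, capped at max_tasks.

    tasks: list of {"name": str, "interval_seconds": int}
    last_run: task name -> epoch seconds of the previous run (missing = never run)
    """
    due = []
    for task in tasks:
        previous = last_run.get(task["name"])
        if previous is None or now - previous >= task["interval_seconds"]:
            due.append(task["name"])
    return [(name, index * stagger_seconds) for index, name in enumerate(due[:max_tasks])]


if __name__ == "__main__":
    tasks = [
        {"name": "check-inbox", "interval_seconds": 60},
        {"name": "daily-report", "interval_seconds": 86400},
    ]
    print(is_effectively_empty("# Heartbeat\n\n## Tasks\n- \n"))
    print(plan_run(tasks, {"check-inbox": 1000, "daily-report": 1000}, now=1100))
```

Returning the stagger as a delay per task keeps the sleep in the shell script, where the `claude -p` invocation lives.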
Step 11: Create proactive agent template with ADL/VFM guardrails
- Files: `scripts/templates/proactive/PROACTIVE-AGENT.md` (new), `scripts/templates/proactive/ADL-RULES.md` (new), `scripts/templates/proactive/VFM-SCORING.md` (new), `scripts/templates/proactive/README.md` (new)
- Changes:
  - PROACTIVE-AGENT.md template: An agent template (with YAML frontmatter) for agents that can self-improve. Contains:
    - Standard agent frontmatter with `model: sonnet` and appropriate tools
    - "How you work" section describing the proactive cycle: observe environment, identify improvements, score proposed changes against VFM, implement if score > 50, log all decisions
    - "Self-improvement protocol" section: before making any change to your own config, skills, or prompts, you MUST run VFM scoring. Changes that score <= 50 are logged but NOT implemented.
    - "Rules" section including all ADL limits: no fake intelligence (do not pretend capabilities you lack), no unverifiable modifications (all changes must be testable), no novelty over stability (prefer proven approaches), stability > explainability > reusability > scalability > novelty
    - "Self-healing" section: when encountering errors, try up to 5 different approaches before asking for help. Log each attempt with result.
  - ADL-RULES.md template: Anti-Drift Limits reference. Full list of constraints that prevent agent drift:
    - No fake intelligence -- do not simulate capabilities
    - No unverifiable modifications -- every change must be testable
    - No novelty over stability -- proven approaches first
    - No scope expansion without approval -- stay within defined boundaries
    - No silent failures -- all errors must be logged
    - Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty
  - VFM-SCORING.md template: Value-First Modification scoring guide. For any proposed self-modification:
    - Frequency: How often does this issue occur? (0-25 points)
    - Failure reduction: Does this fix real failures? (0-25 points)
    - Burden reduction: Does this reduce human effort? (0-25 points)
    - Cost savings: Does this reduce API/compute costs? (0-25 points)
    - Total score: sum of above. Implement only if > 50.
    - Logging format: date, proposed change, scores per dimension, total, decision (implement/defer)
  - README.md: Explains the proactive agent pattern, when to use it, relationship to OpenClaw's proactive agent skill, examples of good vs bad self-modifications, how ADL and VFM work together.
- Reuses: `agents/builder.md` for agent frontmatter format; OpenClaw proactive agent skill patterns (ADL, VFM, self-healing attempts)
- Verify: `ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l` → expected: 4 files
- On failure: revert -- templates must be valid and complete
- Checkpoint: `git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails"`
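The VFM rubric is simple arithmetic, and a worked sketch makes the threshold explicit: four dimensions of 0-25 points each, summed, with "implement" only when the total is strictly greater than 50. The class and function names, and the exact shape of the log line, are hypothetical. The plan specifies only which fields the log must contain.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VfmScore:
    frequency: int          # How often does this issue occur? (0-25)
    failure_reduction: int  # Does this fix real failures? (0-25)
    burden_reduction: int   # Does this reduce human effort? (0-25)
    cost_savings: int       # Does this reduce API/compute costs? (0-25)

    def __post_init__(self):
        for field in fields(self):
            value = getattr(self, field.name)
            if not 0 <= value <= 25:
                raise ValueError(f"{field.name} must be between 0 and 25, got {value}")

    @property
    def total(self):
        return sum(getattr(self, field.name) for field in fields(self))

    @property
    def decision(self):
        # Strictly greater than 50: a score of exactly 50 is deferred.
        return "implement" if self.total > 50 else "defer"


def log_entry(day, change, score):
    """One log line: date, proposed change, per-dimension scores, total, decision."""
    return (
        f"{day} | {change} | frequency={score.frequency} "
        f"failure_reduction={score.failure_reduction} "
        f"burden_reduction={score.burden_reduction} "
        f"cost_savings={score.cost_savings} | total={score.total} | {score.decision}"
    )


if __name__ == "__main__":
    borderline = VfmScore(20, 15, 10, 5)
    print(log_entry("2026-04-11", "cache repeated lookups", borderline))
```

The borderline case is the one worth testing: 20 + 15 + 10 + 5 = 50, which is logged but not implemented.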
Step 12: Create isolated agentTurn template
- Files: `scripts/templates/cron/agent-turn.sh` (new), `scripts/templates/cron/system-event.sh` (new), `scripts/templates/cron/README.md` (new)
- Changes:
  - agent-turn.sh template: Bash 3.2 compatible script for true background autonomy (OpenClaw's `agentTurn` pattern). Fires a full agent turn with its own isolated session:
    - Accepts `{{AGENT_NAME}}`, `{{WORKING_DIR}}`, `{{MAX_TURNS}}` placeholders
    - Creates/resumes an isolated session using `claude --resume` with a task-specific session ID format: `agent:{{AGENT_NAME}}:turn:$(date +%s)`
    - Loads context from HEARTBEAT.md and SESSION-STATE.md before invoking
    - Writes results to a dated log file
    - Includes PID tracking for orphan detection (write PID to `.agent-turn.pid`, check on next run)
    - Session rollover: after configurable number of turns (default 200) or age (default 72h), start a fresh session
  - system-event.sh template: Lighter-weight script for injecting a text event into an existing session (OpenClaw's `systemEvent` pattern):
    - Accepts `{{SESSION_ID}}`, `{{EVENT_TEXT}}` placeholders
    - Uses `claude --resume {{SESSION_ID}} -p "{{EVENT_TEXT}}"` to inject into existing session
    - Does NOT create a new session -- requires an active session
  - README.md: Explains the difference between agentTurn (full background autonomy, isolated session) and systemEvent (inject into existing session), when to use each, session lifecycle and rollover, orphan detection pattern.
- Reuses: `scripts/templates/automation.sh` for headless runner pattern; `scripts/templates/heartbeat/heartbeat-runner.sh` pattern from Step 10; OpenClaw cron service patterns (session targets, rollover, catchup)
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh` → expected: no syntax errors
- On failure: retry -- fix bash 3.2 syntax issues, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates"`
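The session rollover rule is a good candidate for the python3 helper pattern used elsewhere in the templates, since it involves date math. A minimal sketch follows. It assumes the script tracks a turn count and a session start time in epoch seconds, and that reaching either limit (not only exceeding it) triggers a rollover. The function names are hypothetical, and whether `claude --resume` accepts a label in this format is one of the plan's unverified assumptions.

```python
def needs_rollover(turns, started_at, now, max_turns=200, max_age_hours=72):
    """True when the session has reached its turn limit or its age limit."""
    if turns >= max_turns:
        return True
    return (now - started_at) >= max_age_hours * 3600


def session_label(agent_name, epoch_seconds):
    """Build the task-specific label in the agent:NAME:turn:EPOCH format."""
    return f"agent:{agent_name}:turn:{int(epoch_seconds)}"


if __name__ == "__main__":
    started = 1_700_000_000
    now = started + 80 * 3600  # 80 hours later, past the 72h default
    if needs_rollover(turns=12, started_at=started, now=now):
        print("starting fresh session:", session_label("writer", now))
```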
Step 13: Update agent-system-design skill with OpenClaw pattern references
- Files: `skills/agent-system-design/SKILL.md`, `skills/agent-system-design/references/memory-patterns.md` (new), `skills/agent-system-design/references/autonomy-patterns.md` (new)
- Changes:
  - SKILL.md: Add new sections:
    - "## Memory patterns" -- describe 3-tier memory, link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md`
    - "## Autonomy patterns" -- describe proactive agent, ADL/VFM, link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md`
    - "## Heartbeat scheduling" -- describe heartbeat pattern, link to templates
    - Update "Getting started" to mention `/agent-factory:build` (already renamed in Step 1)
    - Add to trigger phrases in description: "agent memory", "3-tier memory", "WAL protocol", "proactive agent", "self-improving agent", "heartbeat scheduling"
  - memory-patterns.md (new): Comprehensive reference on 3-tier memory:
    - Architecture: hot (SESSION-STATE.md), warm (daily logs), cold (MEMORY.md)
    - WAL Protocol: write before responding, prevents data loss on crashes
    - Working Buffer Protocol: activate at 60% context usage, capture key exchanges
    - Compaction Recovery: read order for resuming after context loss
    - Template locations and placeholder variables
  - autonomy-patterns.md (new): Reference on self-governing agent patterns:
    - Proactive agent cycle: observe, identify, score (VFM), implement or defer
    - ADL constraints with examples
    - VFM scoring rubric with worked examples
    - Self-healing protocol (5-10 attempts before escalation)
    - Isolated agentTurn vs systemEvent
    - When NOT to use proactive patterns (simple pipelines, human-in-the-loop workflows)
- Reuses: `skills/agent-system-design/SKILL.md` existing structure; `skills/agent-system-design/references/feature-map.md` format for reference docs; OpenClaw research brief patterns
- Verify: `grep -c "memory-patterns\|autonomy-patterns\|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → expected: >= 3
- On failure: revert -- skill file must remain valid
- Checkpoint: `git commit -m "feat(skills): add memory and autonomy pattern references to agent-system-design"`
Step 14: Create heartbeat context injection templates (Paperclip pattern)
- Files: `scripts/templates/heartbeat/context-packet.md` (new), `scripts/templates/heartbeat/wake-prompt.md` (new)
- Changes:
  - context-packet.md template: Paperclip's "Memento Man" pattern -- a curated context payload injected on each heartbeat wakeup. Contains sections with `{{PLACEHOLDER}}` variables:

        # Context Packet: {{AGENT_NAME}}
        Generated: {{TIMESTAMP}}

        ## Identity
        {{AGENT_IDENTITY}}

        ## Current Goals
        {{ACTIVE_GOALS}}

        ## Memory State
        {{MEMORY_SUMMARY}}

        ## Task Queue
        {{PENDING_TASKS}}

        ## Recent Events (last 24h)
        {{RECENT_EVENTS}}

        ## Wake Reason
        {{WAKE_REASON}}

        ## Budget Status
        Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%)
        {{BUDGET_WARNING}}

  - wake-prompt.md template: The prompt template composed for each heartbeat wakeup. Follows Paperclip's `bootstrapPromptTemplate` + `promptTemplate` pattern:

        You are {{AGENT_NAME}}. You are waking up for a scheduled heartbeat.

        Read the context packet below carefully. It contains everything you
        need to know about your current state and pending work.

        {{CONTEXT_PACKET}}

        Your task for this beat: {{WAKE_REASON}}

        Rules:
        - Do NOT infer tasks from prior conversations
        - Only work on what is in the context packet and wake reason
        - If nothing needs attention, respond with HEARTBEAT_OK
        - Update SESSION-STATE.md before finishing

- Reuses: `scripts/templates/heartbeat/HEARTBEAT.md` from Step 10; Paperclip's context snapshot pattern from source code analysis
- Verify: `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → expected: 5 files total (HEARTBEAT.md, heartbeat-runner.sh, README.md from Step 10 + context-packet.md, wake-prompt.md from this step)
- On failure: revert -- templates must have valid placeholder syntax
- Checkpoint: `git commit -m "feat(templates): add context injection templates (Paperclip heartbeat pattern)"`
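The two templates in this step compose: the context packet is rendered first and then substituted into the wake prompt as `{{CONTEXT_PACKET}}`. A minimal sketch of that plain string replacement, assuming placeholders are always upper snake case; the `render` and `budget_percent` helpers and the sample values are hypothetical, not part of the plan:

```python
import re

# Matches {{UPPER_SNAKE}} placeholders as used in the templates of this step.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render(template, values):
    """Replace each {{NAME}} with values["NAME"]; raise if any name is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


def budget_percent(spent, limit):
    """Whole-number percentage of the budget used (0 when no limit is set)."""
    return round(100 * spent / limit) if limit else 0


# Abbreviated templates; the real files carry every section listed in this step.
PACKET_TEMPLATE = (
    "# Context Packet: {{AGENT_NAME}}\n"
    "## Wake Reason\n{{WAKE_REASON}}\n"
    "## Budget Status\n"
    "Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%)\n"
)
PROMPT_TEMPLATE = (
    "You are {{AGENT_NAME}}. You are waking up for a scheduled heartbeat.\n\n"
    "{{CONTEXT_PACKET}}\n"
    "Your task for this beat: {{WAKE_REASON}}\n"
)

values = {
    "AGENT_NAME": "monitor",           # hypothetical agent name
    "WAKE_REASON": "scheduled check",  # hypothetical wake reason
    "BUDGET_SPENT": 400,
    "BUDGET_LIMIT": 1000,
    "BUDGET_PERCENT": budget_percent(400, 1000),
}
packet = render(PACKET_TEMPLATE, values)
prompt = render(PROMPT_TEMPLATE, dict(values, CONTEXT_PACKET=packet))
print(prompt)
```

Failing on an unfilled placeholder, rather than leaving `{{NAME}}` in the prompt, is a design choice of this sketch; it keeps a half-rendered packet from reaching the agent.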
Step 15: Create goal hierarchy templates (Paperclip pattern)
- Files: `scripts/templates/goals/GOALS.md` (new), `scripts/templates/goals/goal-tracker.sh` (new), `scripts/templates/goals/README.md` (new)
- Changes:
  - GOALS.md template: File-based goal hierarchy using Paperclip's simple `parent_id` pattern but adapted for flat files. Format:

        # Goals: {{PROJECT_NAME}}

        ## Company Goals
        - [G1] {{COMPANY_GOAL_1}}
        - [G2] {{COMPANY_GOAL_2}}

        ## Project Goals
        - [G1.1] {{PROJECT_GOAL_1}} (parent: G1)
        - [G1.2] {{PROJECT_GOAL_2}} (parent: G1)
        - [G2.1] {{PROJECT_GOAL_3}} (parent: G2)

        ## Task Goals
        - [G1.1.1] {{TASK_GOAL_1}} (parent: G1.1, owner: {{AGENT_NAME}}, status: active)
        - [G1.1.2] {{TASK_GOAL_2}} (parent: G1.1, owner: {{AGENT_NAME}}, status: pending)

    Each goal has: ID (hierarchical dot notation), description, parent reference, owner agent, status (active/pending/complete/blocked). The parent reference is a simple string match -- no recursive traversal needed (matching Paperclip's actual implementation vs. aspirational docs).
  - goal-tracker.sh template: Bash 3.2 compatible script that:
    - Reads GOALS.md
    - Uses python3 to parse goal entries and their parent/status/owner fields
    - Reports: count by status, orphaned goals (parent doesn't exist), goals without owners
    - Can update status: `./goal-tracker.sh complete G1.1.1` marks goal as complete
    - Generates a goal summary for context injection (used by heartbeat context-packet)
  - README.md: Explains goal hierarchy pattern, the simple parent_id approach (honest about it being flat, not recursive), how agents reference goals, integration with heartbeat context injection.
- Reuses: Paperclip's goal hierarchy pattern (simple parent_id, not recursive); context packet integration from Step 14
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/goals/goal-tracker.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add goal hierarchy templates (Paperclip pattern)"`
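The python3 portion of goal-tracker.sh could look roughly like the following. This is a sketch under the assumption that every goal sits on one line in the `- [ID] text (key: value, ...)` shape shown in the GOALS.md template; the function names and sample goals are hypothetical.

```python
import re
from collections import Counter

# One goal per line: "- [G1.1.1] text (parent: G1.1, owner: x, status: active)"
GOAL_LINE = re.compile(
    r"^- \[(?P<id>G[\d.]+)\] (?P<text>.*?)(?: \((?P<meta>[^()]*)\))?$"
)


def parse_goals(markdown):
    """Return {goal_id: {"text": ..., "parent": ..., "owner": ..., "status": ...}}."""
    goals = {}
    for line in markdown.splitlines():
        match = GOAL_LINE.match(line.strip())
        if not match:
            continue
        fields = {"text": match.group("text")}
        for pair in (match.group("meta") or "").split(","):
            if ":" in pair:
                key, value = pair.split(":", 1)
                fields[key.strip()] = value.strip()
        goals[match.group("id")] = fields
    return goals


def orphans(goals):
    """Goals whose parent reference does not match any known ID (plain string match)."""
    return sorted(
        gid for gid, f in goals.items() if "parent" in f and f["parent"] not in goals
    )


def status_counts(goals):
    """Count goals by status, ignoring goals that carry no status field."""
    return Counter(f["status"] for f in goals.values() if "status" in f)


SAMPLE = """\
## Company Goals
- [G1] Grow audience
## Project Goals
- [G1.1] Weekly newsletter (parent: G1)
## Task Goals
- [G1.1.1] Draft issue 12 (parent: G1.1, owner: writer, status: active)
- [G3.1.1] Stray task (parent: G3.1, owner: writer, status: pending)
"""

goals = parse_goals(SAMPLE)
print("orphans:", orphans(goals))
print("by status:", dict(status_counts(goals)))
```

Because the parent check is a dictionary lookup on the ID string, no tree is ever built, which matches the flat, non-recursive approach the step describes.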
Step 16: Create budget tracking templates (Paperclip pattern)
- Files: `scripts/templates/budget/budget-hook.sh` (new), `scripts/templates/budget/BUDGET.md` (new), `scripts/templates/budget/budget-report.sh` (new), `scripts/templates/budget/README.md` (new)
- Changes:
  - budget-hook.sh template: PostToolUse hook that logs cost events. Bash 3.2 compatible. Following Paperclip's post-hoc enforcement pattern:
    - Reads tool call result from stdin (JSON via python3)
    - Extracts usage data if available (token counts from Claude's response)
    - Appends cost event to `budget/cost-events.jsonl` with timestamp, agent name, token counts, estimated cost
    - After logging, calls budget-check logic: reads `budget/BUDGET.md` for policy, sums cost events for current window
    - If soft threshold (default 80%) reached: writes warning to stderr
    - If hard threshold (100%) reached AND hard_stop enabled: writes block message, creates `budget/PAUSED` flag file
    - [ASSUMPTION] Cost estimation uses Anthropic API pricing. If API billing endpoint is available, query it instead of estimating from token counts.
  - BUDGET.md template: Budget policy file:

        # Budget Policy: {{PROJECT_NAME}}

        ## Company Budget
        - window: {{BUDGET_WINDOW}}  # calendar_month or lifetime
        - limit: {{BUDGET_LIMIT_CENTS}} cents
        - warn_percent: 80
        - hard_stop: true

        ## Agent Budgets
        - {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}}

        ## Notification
        - on_warn: log  # log | file | hook
        - on_hard_stop: pause  # pause | terminate

  - budget-report.sh template: Bash 3.2 compatible script that:
    - Reads `budget/cost-events.jsonl`
    - Uses python3 to aggregate costs by agent, by day, by window
    - Compares against BUDGET.md policies
    - Outputs a formatted report showing: total spend, per-agent spend, % of budget used, projection for current window
    - Checks for PAUSED flag file and reports paused agents
  - README.md: Explains budget enforcement pattern, post-hoc vs pre-run checking (honest about limitations), how to integrate budget-hook.sh into settings.json PostToolUse, Anthropic API integration for accurate cost data (marked as [ASSUMPTION]).
- Reuses: `scripts/templates/post-tool-use.sh` for PostToolUse hook pattern; Paperclip's budget enforcement pattern (post-hoc, soft/hard thresholds, pause on exceed)
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-hook.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-report.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add budget tracking templates (Paperclip pattern)"`
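The soft/hard threshold check described for budget-hook.sh might be sketched as below. The plan does not fix the JSONL event schema, so the `cost_cents` and `agent` field names are assumptions, as are the function name and the sample events.

```python
import json


def budget_status(events_jsonl, limit_cents, warn_percent=80, hard_stop=True):
    """Sum logged costs and classify the window as ok, warn, or paused.

    Assumes each JSONL line has a numeric "cost_cents" field (hypothetical name).
    Enforcement is post-hoc: the event that crosses the limit is already spent.
    """
    if limit_cents <= 0:
        raise ValueError("limit_cents must be positive")
    spent = 0
    for line in events_jsonl.splitlines():
        if line.strip():
            spent += json.loads(line)["cost_cents"]
    percent = 100 * spent / limit_cents
    if percent >= 100 and hard_stop:
        state = "paused"   # the hook would create budget/PAUSED here
    elif percent >= warn_percent:
        state = "warn"     # the hook would write a warning to stderr here
    else:
        state = "ok"
    return {"spent_cents": spent, "percent": percent, "state": state}


events = "\n".join([
    json.dumps({"agent": "writer", "cost_cents": 300}),
    json.dumps({"agent": "editor", "cost_cents": 550}),
])
print(budget_status(events, limit_cents=1000))
```

With `hard_stop` disabled, spend at or above 100% still reports `warn`, so the operator keeps seeing the overrun even though nothing is paused.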
Step 17: Create governance and approval gate templates (Paperclip pattern)
- Files: `scripts/templates/governance/GOVERNANCE.md` (new), `scripts/templates/governance/approval-gate.sh` (new), `scripts/templates/governance/README.md` (new)
- Changes:
  - GOVERNANCE.md template: Governance policy file. "Autonomy is a privilege you grant" (Paperclip philosophy). Contains:

        # Governance: {{PROJECT_NAME}}

        ## Autonomy Levels
        - Level 0: Full manual approval (all tool calls require human OK)
        - Level 1: Auto-approve safe operations (Read, Glob, Grep)
        - Level 2: Auto-approve file operations (+ Write, Edit within project)
        - Level 3: Auto-approve all except destructive (+ Bash non-destructive)
        - Level 4: Full autonomy with hooks as guardrails

        Current level: {{AUTONOMY_LEVEL}}

        ## Approval Gates
        Gates are checkpoints where the agent MUST pause and request human approval.

        - {{GATE_1_NAME}}: {{GATE_1_CONDITION}}
          Action: {{GATE_1_ACTION}}  # pause | notify | log
        - {{GATE_2_NAME}}: {{GATE_2_CONDITION}}
          Action: {{GATE_2_ACTION}}

        ## Escalation Rules
        - Budget exceeded: pause agent, notify via {{NOTIFICATION_METHOD}}
        - Error threshold: after {{ERROR_THRESHOLD}} consecutive errors, pause agent
        - Unknown tool call: block and log
        - Scope violation: block and notify

        ## Audit Requirements
        - All tool calls logged to audit.log
        - Budget events logged to cost-events.jsonl
        - Approval decisions logged to approvals.log
        - Retention: {{LOG_RETENTION_DAYS}} days

  - approval-gate.sh template: PreToolUse hook that implements approval gates. Bash 3.2 compatible:
    - Reads GOVERNANCE.md to get current autonomy level and gate definitions
    - Uses python3 to parse the governance config
    - Based on autonomy level, decides whether to auto-approve or require approval
    - For gated operations: writes an approval request to `governance/pending-approvals.jsonl` with timestamp, tool name, tool input summary, and status=pending
    - Checks `governance/approval-responses.jsonl` for a matching response
    - If no response within timeout: blocks (exit 2) with message "Approval required. Check governance/pending-approvals.jsonl"
    - If approved: allows (exit 0) and logs to approvals.log
    - If denied: blocks (exit 2) and logs denial
  - README.md: Explains governance model, autonomy levels, how gates map to Claude Code hooks, how to configure notification channels, comparison with Paperclip's `approvals` table approach.
- Reuses: `scripts/templates/pre-tool-use.sh` for PreToolUse hook pattern; Paperclip's approval mechanism and autonomy model
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/governance/approval-gate.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add governance and approval gate templates (Paperclip pattern)"`
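One way to read the 0-4 autonomy scale as a decision function is sketched here. The tool sets mirror the level descriptions in GOVERNANCE.md; the `destructive` flag is a hypothetical input that the hook would have to compute itself (for example by inspecting a Bash command), and the "within project" path check for Level 2 is left out for brevity.

```python
SAFE_TOOLS = {"Read", "Glob", "Grep"}  # auto-approved from Level 1
FILE_TOOLS = {"Write", "Edit"}         # added at Level 2


def decide(level, tool, destructive=False):
    """Return "allow" to auto-approve, or "ask" to require human approval."""
    if level <= 0:
        return "ask"      # Level 0: every tool call needs a human OK
    if level >= 4:
        return "allow"    # Level 4: full autonomy, other hooks act as guardrails
    if tool in SAFE_TOOLS:
        return "allow"
    if level >= 2 and tool in FILE_TOOLS:
        return "allow"
    if level >= 3 and not destructive:
        return "allow"    # Level 3: everything except destructive operations
    return "ask"


def hook_exit_code(decision):
    """Exit codes used by approval-gate.sh: 0 allows the call, 2 blocks it."""
    return 0 if decision == "allow" else 2


for level in range(5):
    print(level, decide(level, "Bash"), decide(level, "Bash", destructive=True))
```

An "ask" result is where the approval-request flow from the step would begin: write the pending entry, look for a response, and block with exit code 2 until one arrives.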
Step 18: Create org-chart template (Paperclip pattern)
- Files: `scripts/templates/org-chart/ORG-CHART.md` (new), `scripts/templates/org-chart/org-manager.sh` (new), `scripts/templates/org-chart/README.md` (new)
- Changes:
  - ORG-CHART.md template: File-based org chart using Paperclip's simple `reportsTo` pattern. Format:

        # Organization: {{ORG_NAME}}

        ## Agents
        | Agent | Role | Reports To | Status | Budget |
        |-------|------|-----------|--------|--------|
        | {{AGENT_1}} | {{ROLE_1}} | (board) | active | {{BUDGET_1}} |
        | {{AGENT_2}} | {{ROLE_2}} | {{AGENT_1}} | active | {{BUDGET_2}} |
        | {{AGENT_3}} | {{ROLE_3}} | {{AGENT_1}} | active | {{BUDGET_3}} |

        ## Delegation Rules
        - Board (human) → top-level agents: task assignment, goal setting
        - Manager agents → direct reports: task decomposition, delegation
        - Cross-team requests: route through common ancestor in org chart
        - Escalation: up the reporting chain to the first agent with authority

        ## Human Override
        The human operator is the "board of directors" with override authority
        on all decisions. Any agent can be paused, redirected, or terminated
        by the human at any time.

  - org-manager.sh template: Bash 3.2 compatible script that:
    - Reads ORG-CHART.md and the `.claude/agents/` directory
    - Uses python3 to parse the org chart table
    - Validates: all agents listed exist as `.claude/agents/*.md` files, no circular reporting chains, all agents have a role
    - Can add an agent: `./org-manager.sh add agent-name "Role" reports-to-agent`
    - Can remove an agent: `./org-manager.sh remove agent-name` (reassigns direct reports to parent)
    - Generates a text-based org tree visualization
  - README.md: Explains org chart pattern, Paperclip's simple `reportsTo` FK approach, how delegation flows through the hierarchy, cross-team routing, human override authority.
- Reuses: Paperclip's org chart implementation (reportsTo FK, role field); `scripts/templates/goals/goal-tracker.sh` pattern for markdown table parsing with python3
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/org-chart/org-manager.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add org-chart template (Paperclip pattern)"`
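The "no circular reporting chains" validation can be done by walking each agent's chain upward until it reaches the board or revisits a name. A sketch, assuming the table layout from ORG-CHART.md with the agent in the first column and the manager in the third; the function names and sample agents are hypothetical.

```python
def parse_org_chart(markdown):
    """Map each agent to its manager (None for agents reporting to the board)."""
    reports_to = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip short rows, the header row, and the |-----| separator row.
        if len(cells) < 3 or cells[0] == "Agent" or set(cells[0]) <= {"-"}:
            continue
        reports_to[cells[0]] = None if cells[2] == "(board)" else cells[2]
    return reports_to


def find_cycle(reports_to):
    """Return the agents forming a reporting loop, or [] if there is none."""
    for start in reports_to:
        seen = []
        current = start
        while current is not None:
            if current in seen:
                return seen[seen.index(current):]
            seen.append(current)
            current = reports_to.get(current)  # unknown manager ends the walk
    return []


SAMPLE = """\
| Agent | Role | Reports To | Status | Budget |
|-------|------|-----------|--------|--------|
| lead | Editor | (board) | active | 500 |
| writer | Writer | lead | active | 300 |
"""

chart = parse_org_chart(SAMPLE)
print(chart, find_cycle(chart))
```

A manager that is named but not listed as an agent simply ends the walk here; reporting that case would be a separate check.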
Step 19: Update agent-system-design skill with Paperclip pattern references
- Files: `skills/agent-system-design/SKILL.md`, `skills/agent-system-design/references/orchestration-patterns.md` (new), `skills/agent-system-design/references/governance-patterns.md` (new)
- Changes:
  - SKILL.md: Add new sections:
    - "## Orchestration patterns" -- describe heartbeat scheduling, context injection, goal hierarchy, org chart. Link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md`
    - "## Governance patterns" -- describe autonomy levels, approval gates, budget enforcement. Link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md`
    - Add to trigger phrases: "heartbeat scheduling", "goal hierarchy", "budget tracking", "approval gates", "governance", "org chart", "multi-agent coordination", "agent budget"
    - Update the "System components" table to include: Memory (3-tier), Heartbeat, Goals, Budget, Governance, Org Chart
  - orchestration-patterns.md (new): Reference covering:
    - Heartbeat scheduling with context injection
    - Goal hierarchy (simple parent_id, not recursive)
    - Org chart and delegation
    - Task checkout via file locking (write `task.lock` with agent name, check before claiming)
    - Session persistence (task-specific session IDs)
    - Comparison matrix: OpenClaw cron vs Paperclip heartbeat vs Claude Code /schedule
  - governance-patterns.md (new): Reference covering:
    - Autonomy levels (0-4 scale)
    - Approval gates and human oversight
    - Budget enforcement (post-hoc)
    - Audit trail requirements
    - Error threshold and automatic pause
    - Paperclip's "autonomy is a privilege" philosophy
- Reuses: `skills/agent-system-design/SKILL.md` existing structure; Step 13 pattern additions; `skills/agent-system-design/references/security-patterns.md` format
- Verify: `grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → expected: >= 5
- On failure: revert -- skill file must remain valid
- Checkpoint: `git commit -m "feat(skills): add orchestration and governance pattern references"`
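The task checkout bullet describes a check followed by a write, which leaves a window where two agents can both see no lock and both claim the task. The sketch below swaps in an exclusive create (`O_CREAT | O_EXCL`) so that checking and claiming are a single operation. This is a substitution for illustration, not the plan's stated method, and the function names are hypothetical.

```python
import os
import tempfile


def claim_task(lock_path, agent):
    """Try to claim a task; return True only for the agent that created the lock."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent + "\n")
    return True


def lock_owner(lock_path):
    """Return the agent name recorded in the lock, or None if the task is free."""
    try:
        with open(lock_path) as handle:
            return handle.read().strip()
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as workdir:
    lock = os.path.join(workdir, "task.lock")
    print(claim_task(lock, "writer"), claim_task(lock, "editor"), lock_owner(lock))
```

Releasing the task is a matter of deleting `task.lock`; stale locks left by a crashed agent would need a timeout or manual cleanup, which this sketch does not cover.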
Step 20: Create feedback loop templates (Self-learning)
- Files: `scripts/templates/feedback/FEEDBACK.md` (new), `scripts/templates/feedback/feedback-collector.sh` (new), `scripts/templates/feedback/performance-scorer.sh` (new), `scripts/templates/feedback/README.md` (new)
- Changes:
  - FEEDBACK.md template: Feedback tracking file:

        # Feedback Log: {{PROJECT_NAME}}

        ## Format
        Each entry records the outcome of a pipeline run and user feedback.

        | Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
        |------|----------|-------|-------|-------|------------|---------|

    The Pattern column captures recurring issues for optimization.
  - feedback-collector.sh template: PostToolUse hook variant that, after pipeline completion (detected by checking if the pipeline skill's final step ran), prompts for feedback collection:
    - Reads the pipeline output
    - Reads the reviewer score (if available)
    - Appends an entry to FEEDBACK.md with: date, pipeline name, participating agents, reviewer score, identified issues
    - Detects patterns: if the same issue type appears 3+ times, flags it as a recurring pattern
    - All file operations use python3 for JSON/CSV parsing, bash 3.2 compatible wrapper
  - performance-scorer.sh template: Standalone scoring script:
    - Reads FEEDBACK.md and cost-events.jsonl
    - Uses python3 to compute per-agent metrics: average reviewer score, error rate, cost per run, improvement trend (last 10 vs previous 10)
    - Outputs a performance report with recommendations: which agents need prompt tuning, which are most cost-effective, which have degrading performance
    - Flags agents scoring below configurable threshold (default: 60/100) for prompt review
  - README.md: Explains feedback loop pattern, how to integrate with existing pipelines, how scoring informs self-improvement (connects to VFM from Step 11), example of a feedback-driven prompt iteration cycle.
- Reuses: `scripts/templates/post-tool-use.sh` for hook pattern; `scripts/templates/proactive/VFM-SCORING.md` from Step 11 for scoring methodology
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add feedback loop and performance scoring templates"`
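Three of the metrics named in this step reduce to short calculations: the 3+ recurrence rule, the last-10 versus previous-10 trend, and the 60/100 review threshold. A sketch with hypothetical function names, assuming scores are listed oldest first:

```python
from collections import Counter


def recurring_issues(issues, threshold=3):
    """Issue types that appear at least `threshold` times in the feedback log."""
    counts = Counter(issues)
    return sorted(issue for issue, n in counts.items() if n >= threshold)


def trend(scores, window=10):
    """Mean of the last `window` scores minus the mean of the `window` before it.

    Returns None when there are fewer than 2 * window scores to compare.
    """
    if len(scores) < 2 * window:
        return None
    recent = scores[-window:]
    previous = scores[-2 * window:-window]
    return sum(recent) / window - sum(previous) / window


def needs_review(scores, threshold=60):
    """True when the agent's average score falls below the review threshold."""
    return bool(scores) and sum(scores) / len(scores) < threshold


print(recurring_issues(["tone", "late", "tone", "tone"]))
print(trend([50] * 10 + [70] * 10))
print(needs_review([50, 55]))
```

A positive trend means the agent is improving; a negative one is the "degrading performance" case the report is meant to surface.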
Step 21: Create pipeline optimization templates (Self-learning)
- Files: `scripts/templates/optimization/pipeline-optimizer.sh` (new), `scripts/templates/optimization/self-healing.sh` (new), `scripts/templates/optimization/README.md` (new)
- Changes:
  - pipeline-optimizer.sh template: Script that analyzes pipeline performance data and suggests optimizations. Bash 3.2 compatible:
    - Reads feedback/FEEDBACK.md and budget/cost-events.jsonl
    - Uses python3 to identify:
      - Bottleneck agents (highest cost, lowest score)
      - Unnecessary revision loops (reviewer always passes → remove revision loop)
      - Underutilized agents (rarely invoked → consider merging roles)
      - Cost outliers (runs that cost 3x+ average → investigate prompt efficiency)
    - Generates `optimization/RECOMMENDATIONS.md` with specific, actionable suggestions
    - Each recommendation includes VFM pre-score (from Step 11) to help decide whether to implement
    - Does NOT auto-implement changes -- recommendations only. Human or proactive agent decides.
  - self-healing.sh template: Error recovery script that runs after pipeline failures:
    - Reads the error log (from audit.log or pipeline output)
    - Categorizes the error: timeout, permission denied, tool not found, API error, content quality
    - For each category, applies a recovery strategy:
      - Timeout: retry with reduced max_turns
      - Permission denied: check hooks and settings, log which permission was missing
      - Tool not found: check MCP server status, log missing tool
      - API error: retry with backoff (5s, 15s, 45s -- max 3 attempts)
      - Content quality: log for feedback loop, do not retry
    - Tracks attempt count (max 5 per OpenClaw pattern), escalates after max attempts
    - All recovery attempts logged to `optimization/healing-log.jsonl`
  - README.md: Explains pipeline optimization and self-healing patterns, how they connect to feedback loops (Step 20) and VFM scoring (Step 11), when to use auto-healing vs manual intervention, safety limits on auto-recovery.
- Reuses: `scripts/templates/proactive/VFM-SCORING.md` for scoring recommendations; `scripts/templates/feedback/performance-scorer.sh` output format; OpenClaw's 5-attempt self-healing pattern
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add pipeline optimization and self-healing templates"`
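The recovery table for self-healing.sh can be expressed as two small functions: one that maps a log message to a category and one that picks the next action from the category and the attempt count. The matching substrings, the action labels, and the extra `unknown` category are all assumptions of this sketch; only the backoff delays and the attempt cap come from the step.

```python
BACKOFF_SECONDS = [5, 15, 45]  # API-error retry delays (max 3 attempts)
MAX_ATTEMPTS = 5               # overall cap before escalation


def categorize(message):
    """Map an error message to a recovery category via substring matching."""
    text = message.lower()
    if "timed out" in text or "timeout" in text:
        return "timeout"
    if "permission denied" in text:
        return "permission_denied"
    if "tool not found" in text:
        return "tool_not_found"
    if "api error" in text:
        return "api_error"
    if "quality" in text:
        return "content_quality"
    return "unknown"


def next_action(category, attempts_so_far):
    """Return (action, delay_seconds) for a failure after `attempts_so_far` tries."""
    if attempts_so_far >= MAX_ATTEMPTS:
        return ("escalate", None)
    if category == "api_error":
        if attempts_so_far < len(BACKOFF_SECONDS):
            return ("retry", BACKOFF_SECONDS[attempts_so_far])
        return ("escalate", None)
    if category == "timeout":
        return ("retry_with_fewer_turns", None)
    if category in ("permission_denied", "tool_not_found"):
        return ("log_and_stop", None)
    if category == "content_quality":
        return ("log_for_feedback", None)
    return ("escalate", None)  # unrecognized errors go straight to a human


print(next_action(categorize("API error: 529 overloaded"), 1))
```

Keeping the decision separate from the retry itself makes each attempt easy to record: the returned tuple is what would be appended to the healing log.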
Step 22: Create Docker deployment templates
- Files: `scripts/templates/docker/Dockerfile` (new), `scripts/templates/docker/docker-compose.yml` (new), `scripts/templates/docker/docker-entrypoint.sh` (new), `scripts/templates/docker/README.md` (new)
- Changes:
  - Dockerfile template (the API key is deliberately not set here; it is supplied at runtime through docker-compose so it is never baked into the image):

        FROM node:22-slim

        # Install Claude Code CLI
        RUN npm install -g @anthropic-ai/claude-code

        # Create agent user (non-root)
        RUN useradd -m -s /bin/bash agent

        # Copy project files
        WORKDIR /home/agent/project
        COPY . .
        RUN chown -R agent:agent /home/agent
        USER agent

        # Set up environment (ANTHROPIC_API_KEY comes from the runtime environment)
        ENV HOME=/home/agent

        ENTRYPOINT ["./docker-entrypoint.sh"]

  - docker-compose.yml template:

        version: "3.8"
        services:
          agent:
            build: .
            container_name: {{PROJECT_NAME}}-agent
            restart: unless-stopped
            volumes:
              - ./data:/home/agent/project/data
              - ./memory:/home/agent/project/memory
              - ./budget:/home/agent/project/budget
              - ./logs:/home/agent/project/logs
            environment:
              - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
              - AGENT_NAME={{AGENT_NAME}}
              - HEARTBEAT_INTERVAL={{HEARTBEAT_INTERVAL}}
            env_file:
              - .env

  - docker-entrypoint.sh template: Bash 3.2 compatible entry script:
    - Validates required environment variables (ANTHROPIC_API_KEY)
    - Runs the heartbeat-runner.sh in a loop with the configured interval
    - Graceful shutdown: trap SIGTERM, finish current run, exit cleanly
    - Health check: write timestamp to `/tmp/agent-health` on each successful heartbeat
  - README.md: Build and run instructions, volume mount explanation (data, memory, budget, logs persist across container restarts), environment variables reference, security considerations (never bake API keys into image), how to use with docker-compose up/down.
- Reuses: `scripts/templates/heartbeat/heartbeat-runner.sh` as the main loop inside the container; `scripts/templates/automation.sh` pattern; Paperclip's Docker deployment approach (from research)
- Verify: `test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/Dockerfile && test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-compose.yml` → expected: both exist
- On failure: revert -- Docker templates must be complete and valid
- Checkpoint: `git commit -m "feat(templates): add Docker deployment templates"`
Step 23: Create import/export system
- Files: `scripts/templates/transfer/export-system.sh` (new), `scripts/templates/transfer/import-system.sh` (new), `scripts/templates/transfer/MANIFEST.md` (new), `scripts/templates/transfer/README.md` (new)
- Changes:
  - export-system.sh template: Bash 3.2 compatible script that packages an agent system into a tarball:
    - Accepts `{{PROJECT_DIR}}` and optional `{{EXPORT_NAME}}` placeholders
    - Generates a MANIFEST.md listing all exported components with versions and checksums
    - Collects: `.claude/agents/*.md`, `.claude/skills/*/SKILL.md`, `.claude/hooks/*.sh` or `hooks/*.sh`, `.claude/settings.json`, `CLAUDE.md`, `HEARTBEAT.md`, `GOALS.md`, `GOVERNANCE.md`, `ORG-CHART.md`, `BUDGET.md`, `automation/` scripts
    - Does NOT export: `.env`, `*.local.*`, `audit.log`, `cost-events.jsonl`, `memory/` (runtime state), `.git/`
    - Creates tarball: `agent-system-{{EXPORT_NAME}}-{{DATE}}.tar.gz`
    - Outputs: tarball path, component count, total size
  - import-system.sh template: Bash 3.2 compatible script that imports a tarball into a project:
    - Accepts tarball path as argument
    - Reads MANIFEST.md from the tarball to show what will be imported
    - Checks for conflicts: existing files that would be overwritten
    - If conflicts: lists them and exits with instructions (user must confirm or use --force)
    - Extracts tarball contents into the project directory
    - Replaces `{{PLACEHOLDER}}` variables in imported files with project-specific values (reads from a config file or prompts)
    - Makes all `.sh` files executable
    - Validates: all agent files have valid YAML frontmatter, all hook scripts pass `bash -n`
    - Outputs: imported component list, any warnings
  - MANIFEST.md template: Format for the manifest file included in exports:

        # Agent System Export
        Exported: {{DATE}}
        Source: {{PROJECT_NAME}}
        Version: {{VERSION}}

        ## Components
        | Type | File | Checksum |
        |------|------|----------|
        | agent | .claude/agents/{{NAME}}.md | {{SHA256}} |
        | skill | .claude/skills/{{NAME}}/SKILL.md | {{SHA256}} |
        ...

        ## Requirements
        - Claude Code version: >= {{MIN_VERSION}}
        - Required MCP servers: {{MCP_LIST}}
        - Required tools: {{TOOL_LIST}}

  - README.md: Explains import/export workflow, what is and isn't included, how to customize after import, round-trip verification process.
- Reuses: Existing file structure conventions; template placeholder pattern
- Verify: `bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/export-system.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/import-system.sh` → expected: no syntax errors
- On failure: retry -- fix bash syntax, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add import/export system for agent systems"`
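The manifest rows pair each exported file with a SHA-256 checksum, and the export must skip secrets and runtime state. The sketch below shows only those two pieces. As a simplification it walks the whole project and applies the exclusion list, rather than collecting the explicit include list from the step, and the function name is hypothetical.

```python
import fnmatch
import hashlib
import os
import tempfile

EXCLUDED_FILES = [".env", "*.local.*", "audit.log", "cost-events.jsonl"]
EXCLUDED_DIRS = {"memory", ".git"}  # runtime state and VCS data are never exported


def manifest_rows(project_dir):
    """Return sorted (relative_path, sha256_hex) pairs for exportable files."""
    rows = []
    for root, dirs, files in os.walk(project_dir):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in EXCLUDED_FILES):
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            rows.append((os.path.relpath(path, project_dir), digest))
    return sorted(rows)


with tempfile.TemporaryDirectory() as project:
    with open(os.path.join(project, "CLAUDE.md"), "w") as handle:
        handle.write("# demo\n")
    with open(os.path.join(project, ".env"), "w") as handle:
        handle.write("SECRET=1\n")
    for path, digest in manifest_rows(project):
        print(f"| file | {path} | {digest} |")
```

On import, recomputing the same digests and comparing them against the manifest is one way to implement the round-trip verification the README is meant to describe.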
Step 24: Create 5 additional domain templates (total 10)
- Files: `scripts/templates/domains/customer-support.md` (new), `scripts/templates/domains/devops-automation.md` (new), `scripts/templates/domains/legal-review.md` (new), `scripts/templates/domains/sales-intelligence.md` (new), `scripts/templates/domains/security-audit.md` (new)
- Changes:
  - Template 6: customer-support.md -- Customer support automation. Agents: ticket-classifier, response-drafter, escalation-checker. Pipeline: classify incoming ticket, draft response, check if escalation needed, send or escalate.
  - Template 7: devops-automation.md -- DevOps pipeline automation. Agents: deploy-checker, incident-detector, runbook-executor. Pipeline: monitor deployments, detect incidents, execute runbook steps, report status.
  - Template 8: legal-review.md -- Legal document review. Agents: clause-extractor, risk-assessor, compliance-checker. Pipeline: extract key clauses, assess risks, check compliance requirements, produce review summary.
  - Template 9: sales-intelligence.md -- Sales research and intelligence. Agents: prospect-researcher, pitch-customizer, follow-up-tracker. Pipeline: research prospect, customize pitch materials, track follow-up actions.
  - Template 10: security-audit.md -- Security posture assessment. Agents: config-scanner, vulnerability-checker, remediation-advisor. Pipeline: scan configurations, check for vulnerabilities, advise on remediation, generate audit report.
  - Update `scripts/templates/domains/README.md` to include all 10 templates.

  Each template follows the same format as Step 7: header comment, domain-specialized agent definitions, pipeline skill template, recommended hooks, example CLAUDE.md sections. All use `{{PLACEHOLDER}}` variables.
- Reuses: `scripts/templates/domains/*.md` from Step 7 as format reference; `agents/builder.md` for agent format; `commands/build.md` Phase 3-4 patterns
- Verify: `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l` → expected: 11 (10 templates + README)
- On failure: retry -- ensure all templates follow established format, then revert if still failing
- Checkpoint: `git commit -m "feat(templates): add 5 more domain templates (10 total)"`
Step 25: Create MCP integration reference
- Files: `skills/agent-system-design/references/mcp-integrations.md` (new)
- Changes: Create a reference document for MCP server integrations commonly used with agent systems:
  - Communication MCP servers: Slack (`@modelcontextprotocol/server-slack`), GitHub, Linear -- with `.mcp.json` configuration examples
  - Data MCP servers: PostgreSQL, SQLite, filesystem -- for agent data storage
  - Browser MCP servers: Playwright (`@playwright/mcp`) -- for web interaction agents
  - Custom MCP servers: How to build an MCP server that agents can use, using `@modelcontextprotocol/sdk`
  - For each server: purpose, `.mcp.json` entry format, which agent types benefit from it, security considerations
  - Integration with Agent Factory: how the builder agent configures `.mcp.json` during Phase 3, how the deployment command handles MCP server availability on different targets
- Reuses: `skills/agent-system-design/references/feature-map.md` format; existing `.mcp.json` references in the feature map
- Verify: `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md` → expected: file exists
- On failure: skip -- MCP reference is supplementary, not blocking
- Checkpoint: `git commit -m "docs(skills): add MCP integration reference"`
Step 26: Update build command to integrate all Phase 2-5 features
- Files: `commands/build.md`
- Changes: Extend the build workflow with new optional phases (user can skip any):
  - Phase 2.5 (after Operating Manual, before Agent Team): Memory Setup -- Ask if user wants 3-tier memory. If yes, generate SESSION-STATE.md, MEMORY.md, and memory/ directory from templates. Configure daily log rotation.
  - Phase 3.5 (after Agent Team): Skills & Custom Components -- Ask if agents need specialized skills beyond what was generated. For each: check if a matching skill exists (Glob for `.claude/skills/`). If yes, wire it up. If no, offer two options: (a) generate a skill skeleton from template, (b) explain how to create one manually and offer to pause. If user says "I need to build this first", save build state to `build-state.json` (current phase, completed phases, all user choices so far) and say "Run `/agent-factory:build --resume` when ready." On resume: read build-state.json and continue from the saved phase.
  - Phase 3.7 (after Skills): Proactive Agent -- Ask if any agents should be proactive (self-improving). If yes, add ADL/VFM rules from proactive templates. Show VFM scoring example.
  - Phase 4.5 (after Pipeline): Integrations & MCP Servers -- Critical phase for connecting external services. Ask: "What external services do your agents need? (Slack, GitHub, databases, APIs, etc.)". For each service:
    - Check if MCP server exists: search known servers (`@modelcontextprotocol/server-*`, community servers)
    - If exists: generate `.mcp.json` entry with correct config, explain what env vars are needed
    - If not exists: explain what an MCP server is, show the 3 options: (a) use an existing community server (provide search tips), (b) create a custom one (show skeleton + link to `/mcp-builder` skill or MCP docs), (c) use Bash tool directly (simpler but less structured)
    - If user needs to create an MCP server: save build state and pause (same resume mechanism as Phase 3.5)
    - After all integrations configured: validate `.mcp.json` syntax, test connectivity where possible
  - Phase 5.5 (after Security): Governance -- Ask autonomy level (0-4). Generate GOVERNANCE.md with approval gates matching the chosen level. Integrate budget tracking if user wants it.
  - Phase 5.7: Goals and Org Chart -- For multi-agent systems (3+ agents): generate GOALS.md and ORG-CHART.md. Define reporting structure and goal hierarchy.
  - Phase 6 update: Deployment Target Selection -- Present ALL deployment options with clear trade-offs:
    - `/schedule` (Claude Code native) -- simplest, no infra needed, but requires Claude Code running
    - Local cron/launchd -- runs without Claude Code open, but tied to one machine
    - VPS systemd -- always-on, remote, but needs server access
    - Docker -- portable, isolated, reproducible, but needs Docker installed

    User MUST choose at least one. For each chosen target: generate config from templates, provide exact activation commands, explain how to verify it's running.
  - Phase 7 update: After test run, offer to set up feedback loop (performance scoring template). Show how to run pipeline-optimizer.sh after first week.
  - Resume mechanism: `build-state.json` format: `{ "phase": "4.5", "completed": ["1","2","3","3.5","4"], "choices": { "template": "monitoring", "agents": [...], "memory": true, ... }, "paused_reason": "User creating custom MCP server" }`. On `--resume`: read state, print summary of what's done, continue from saved phase.
  - Summary update: Include all new components in the final summary (memory, governance, goals, org-chart, budget, heartbeat, MCP integrations, deployment target).
- Reuses: Existing Phase 1-7 structure; all templates from Steps 9-18
- Verify: `wc -l /Users/ktg/repos/agent-builder/commands/build.md` → expected: significantly more lines than original (was ~390, should be ~600+)
- On failure: revert -- build command is critical path, must be valid
- Checkpoint: `git commit -m "feat(commands): integrate all Phase 2-5 features into build workflow"`
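The pause/resume mechanism comes down to writing `build-state.json` and, on `--resume`, working out which phases are still open. A sketch using the field names from the format shown in this step; the `PHASES` ordering is inferred from the phase descriptions, and the helper names are hypothetical.

```python
import json
import os
import tempfile

# Phase order inferred from the descriptions in this step (optional phases included).
PHASES = ["1", "2", "2.5", "3", "3.5", "3.7", "4", "4.5", "5", "5.5", "5.7", "6", "7"]


def save_state(path, phase, completed, choices, paused_reason):
    """Write build-state.json so that a later --resume can pick up at `phase`."""
    state = {
        "phase": phase,
        "completed": completed,
        "choices": choices,
        "paused_reason": paused_reason,
    }
    with open(path, "w") as handle:
        json.dump(state, handle, indent=2)


def remaining_phases(path):
    """Phases still to run: from the saved phase onward, minus completed ones."""
    with open(path) as handle:
        state = json.load(handle)
    done = set(state["completed"])
    start = PHASES.index(state["phase"])
    return [p for p in PHASES[start:] if p not in done]


with tempfile.TemporaryDirectory() as workdir:
    state_file = os.path.join(workdir, "build-state.json")
    save_state(
        state_file,
        phase="4.5",
        completed=["1", "2", "3", "3.5", "4"],
        choices={"template": "monitoring", "memory": True},
        paused_reason="User creating custom MCP server",
    )
    print(remaining_phases(state_file))
```

Optional phases the user skipped before the saved phase (2.5 and 3.7 in this example) are not revisited, because the resume starts at the saved phase rather than at the first incomplete one.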
Step 27: Update plugin.json, CLAUDE.md, README.md for v1.0
- Files: .claude-plugin/plugin.json, CLAUDE.md, README.md
- Changes:
  - plugin.json: Bump version to "1.0.0". Update description: "Build and manage autonomous agent systems with Claude Code. Guided workflow with 3-tier memory, heartbeat scheduling, budget tracking, governance, org-chart, 10 domain templates, and import/export. Inspired by OpenClaw and Paperclip patterns." Add keywords: "memory", "heartbeat", "budget", "governance", "org-chart", "templates", "import", "export". Update repository URL from agent-builder to agent-factory.
  - CLAUDE.md: Update to reflect full Agent Factory capabilities:
    - Update "What this plugin does" to describe the full system
    - Update "Plugin structure" to list all 4 commands, 2 agents, 2 skills, and all template directories
    - Add a "Template directories" section listing: memory/, heartbeat/, proactive/, cron/, goals/, budget/, governance/, org-chart/, docker/, domains/, transfer/, feedback/, optimization/
  - README.md: Complete rewrite reflecting Agent Factory v1.0:
    - New description emphasizing the full vision
    - Updated command table with all 4 commands
    - Feature list: 3-tier memory, heartbeat scheduling, proactive agents with ADL/VFM, goal hierarchy, budget tracking, governance and approval gates, org chart, 10 domain templates, Docker deployment, import/export
    - Architecture overview showing how patterns layer
    - Quick start guide
    - Pattern reference section linking to skill references
    - Version history
- Reuses: Existing file structures; all features from Steps 1-26
- Verify: python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'" → expected: no error
- On failure: revert -- manifest must be valid JSON with correct version
- Checkpoint:
git commit -m "feat: Agent Factory v1.0.0 — full vision realized"
Failure recovery rules
- On failure: revert -- undo this step's changes (git checkout -- {files}), do NOT proceed
- On failure: retry -- attempt once more with the alternative approach described, then revert if still failing
- On failure: skip -- this step is non-critical; continue to next step and note the skip
- On failure: escalate -- stop execution entirely; the issue requires human judgment
- Checkpoint -- after each step succeeds, commit changes so subsequent failures cannot corrupt completed work
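The four failure policies and the checkpoint rule above can be read as a small dispatcher. The sketch below is illustrative only: run_step, OnFailure, and the callables passed in are hypothetical names, and in a real run revert would wrap git checkout -- {files} while checkpoint would wrap the step's git commit.

```python
from enum import Enum


class OnFailure(Enum):
    REVERT = "revert"
    RETRY = "retry"
    SKIP = "skip"
    ESCALATE = "escalate"


def run_step(action, policy, revert, checkpoint, alternative=None):
    """Run one plan step and apply its failure policy.

    Returns "done", "skipped", or "reverted"; raises RuntimeError on escalate.
    A "reverted" result means the caller must stop and not proceed.
    """
    try:
        action()
    except Exception as exc:
        if policy is OnFailure.SKIP:
            return "skipped"  # non-critical step: note the skip and continue
        if policy is OnFailure.ESCALATE:
            raise RuntimeError("escalated: human judgment required") from exc
        if policy is OnFailure.RETRY and alternative is not None:
            try:
                alternative()  # one more attempt with the alternative approach
            except Exception:
                revert()
                return "reverted"
            checkpoint()
            return "done"
        revert()
        return "reverted"
    checkpoint()  # commit only after the step succeeded
    return "done"
```

Committing inside the success path is what keeps a later failure from corrupting completed work: every revert only ever discards the current step's uncommitted changes.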
Alternatives Considered
| Approach | Pros | Cons | Why rejected |
|---|---|---|---|
| Template engine (Handlebars/EJS) | Richer templates, conditionals, loops | External dependency, added complexity | Spec requires plain string replace with {{PLACEHOLDER}} -- no engine |
| SQLite for budget/goals/org-chart | Structured queries, atomic operations | External dependency (sqlite3 binary), not human-readable | File-based approach is Claude Code-native, editable by humans and agents |
| Node.js scripts instead of bash | Richer JSON handling, async support | Requires Node.js installation; the bash 3.2 constraint applies to generated scripts | Bash with python3 for JSON is sufficient and more portable |
| Single monolithic build command | Simpler mental model, one entry point | Too long, hard to test phases independently | Separate commands allow modular use; build orchestrates |
| Pre-run budget checking (like a reservation system) | Prevents overspend before it happens | Requires persistent service or lock file coordination | Paperclip's post-hoc approach is proven robust enough in practice |
| Vector memory (sqlite-vec like OpenClaw) | Better semantic search | External dependency, complexity far exceeds file-based approach | Spec explicitly excludes vector/embedding memory -- stay file-based |
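The first row rejects a template engine in favor of plain string replacement on {{PLACEHOLDER}} tokens. A minimal sketch of that approach is shown here; render_template and unresolved are hypothetical helper names, and the placeholder names in the usage note are invented for illustration.

```python
import re

# Matches tokens such as {{AGENT_NAME}}; the allowed character set is an assumption.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render_template(text, values):
    """Replace each {{NAME}} with values[NAME] using plain string replacement."""
    for name, value in values.items():
        text = text.replace("{{" + name + "}}", value)
    return text


def unresolved(text):
    """Return the sorted names of placeholders still present after rendering."""
    return sorted(set(PLACEHOLDER.findall(text)))
```

For example, rendering "Agent: {{AGENT_NAME}} runs {{SCHEDULE}}" with only AGENT_NAME supplied leaves {{SCHEDULE}} in place, and unresolved reports it so the builder can ask for the missing value. There are no conditionals or loops, which is the trade-off the table accepts.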
Test Strategy
- Framework: No test framework in this project. All verification is via shell commands.
- Existing patterns: Manual verification via bash -n for shell scripts, YAML parsing validation for agent/skill frontmatter, grep for content verification.
- Verification approach: Each step includes a concrete verify command. Additionally:
  - All .sh templates verified with bash -n (syntax check)
  - All agent .md files verified for valid YAML frontmatter via python3 yaml parser
  - All command .md files verified for description: field in frontmatter
  - Import/export round-trip tested: export a test system, import into a clean directory, verify all components present
  - Domain templates verified: builder agent can read and apply placeholder substitution
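The check that every command file carries a description: field in its frontmatter could look like the sketch below. It is a simplified, standard-library key scan rather than a YAML parser (the plan uses python3's yaml module for agents), and frontmatter_fields is a hypothetical name.

```python
def frontmatter_fields(markdown_text):
    """Return the top-level keys of a leading '---' frontmatter block.

    Returns an empty set when the file has no frontmatter or the block is
    never closed, so callers can treat both cases as invalid.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # A top-level key starts in column 0 and contains a colon.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()
```

A command file passes when "description" is in the returned set. Nested keys are ignored on purpose, since only top-level fields matter for this check.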
Risks and Mitigations
| Priority | Risk | Location | Impact | Mitigation |
|---|---|---|---|---|
| High | Anthropic billing API may not be accessible or may have different auth | scripts/templates/budget/budget-hook.sh | Budget tracking limited to token-based estimation | [ASSUMPTION] marked in template; fallback to token count estimation with configurable pricing |
| High | /schedule API may change | commands/deploy.md, heartbeat templates | Heartbeat scheduling breaks | Templates use standard claude -p invocation which is stable; /schedule is one option among cron/launchd/systemd/Docker |
| Medium | Bash 3.2 compatibility for complex scripts | All .sh templates | Scripts fail on Intel Mac | Every script verified with bash -n; python3 used for all complex operations (JSON, YAML, date math) |
| Medium | 27-step plan scope is large | All files | Execution takes multiple sessions | Execution strategy groups into independent waves; each step is independently committable |
| Low | Docker template may need updates for newer Claude Code versions | scripts/templates/docker/ | Docker deployment breaks | Dockerfile uses node:22-slim and npm install -g which auto-updates |
| Low | Domain templates may not match all user domains | scripts/templates/domains/ | Users need custom templates | 10 templates cover common domains; builder agent can create custom templates |
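The fallback named in the first risk, token count estimation with configurable pricing, amounts to simple arithmetic checked after a run. The sketch below assumes per-million-token prices supplied by the user; the function names, the 80% warning threshold, and the prices in the usage note are placeholders, not real rates.

```python
def estimate_cost_usd(input_tokens, output_tokens, price_in_per_mtok, price_out_per_mtok):
    """Estimate spend from token counts and user-configured prices per million tokens."""
    return (input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok) / 1_000_000


def budget_status(spent_usd, limit_usd, warn_ratio=0.8):
    """Classify spend after a run: "ok", "warning", or "exceeded"."""
    if spent_usd >= limit_usd:
        return "exceeded"
    if spent_usd >= warn_ratio * limit_usd:
        return "warning"
    return "ok"
```

With placeholder prices of 3.0 and 15.0 per million tokens, 1,000,000 input tokens and 500,000 output tokens come to (3,000,000 + 7,500,000) / 1,000,000 = 10.5, which is "ok" against a limit of 20. Because the check runs after the fact, a single run can still overshoot the limit.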
Assumptions
| # | Assumption | Why unverifiable | Impact if wrong |
|---|---|---|---|
| 1 | Anthropic billing API exists and is accessible with standard API key | No docs found confirming exact endpoint | Budget tracking falls back to token-based estimation; budget-hook.sh needs manual cost configuration |
| 2 | /schedule trigger interface is stable enough to build on | Claude Code internal API, no stability guarantee | Heartbeat templates still work with cron/launchd/systemd; /schedule is optional deployment target |
| 3 | Docker deployment should use docker-compose.yml with Dockerfile | Spec assumption, no user confirmation | Minor: can generate either format; both templates provided |
| 4 | claude --resume with custom session IDs works for isolated agent turns | Based on CLI docs, not tested with custom session key formats | agentTurn template may need session ID format adjustment |
WARNING: Plan has 4 unverified assumptions -- review before executing.
Verification
End-to-end checks that prove the plan was implemented correctly:
- grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan" | wc -l → expected: 0 (all renamed)
- python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); print(d['name'], d['version'])" → expected: agent-factory 1.0.0
- ls /Users/ktg/repos/agent-builder/commands/ | sort → expected: build.md deploy.md evaluate.md status.md
- ls /Users/ktg/repos/agent-builder/agents/ | sort → expected: builder.md deployment-advisor.md
- ls /Users/ktg/repos/agent-builder/skills/ | sort → expected: agent-system-design managed-agents
- ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l → expected: 11
- ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l → expected: 4
- ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l → expected: 5
- ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l → expected: 4
- ls /Users/ktg/repos/agent-builder/scripts/templates/budget/ | wc -l → expected: 4
- ls /Users/ktg/repos/agent-builder/scripts/templates/governance/ | wc -l → expected: 3
- ls /Users/ktg/repos/agent-builder/scripts/templates/org-chart/ | wc -l → expected: 3
- ls /Users/ktg/repos/agent-builder/scripts/templates/docker/ | wc -l → expected: 4
- ls /Users/ktg/repos/agent-builder/scripts/templates/transfer/ | wc -l → expected: 4
- find /Users/ktg/repos/agent-builder/scripts/templates -name "*.sh" -exec bash -n {} \; → expected: no errors (all scripts pass syntax check)
- find /Users/ktg/repos/agent-builder/agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \; → expected: no errors (all agents have valid frontmatter)
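The per-directory ls | wc -l checks above can be collected into one pass. The following sketch copies the expected counts from the checklist; count_mismatches is a hypothetical helper, and hidden entries are skipped to mirror what plain ls lists.

```python
from pathlib import Path

# Expected entry counts from the checklist, relative to scripts/templates/.
EXPECTED_COUNTS = {
    "memory": 4, "heartbeat": 5, "proactive": 4, "budget": 4,
    "governance": 3, "org-chart": 3, "docker": 4, "transfer": 4,
}


def count_mismatches(templates_root, expected=EXPECTED_COUNTS):
    """Return {directory: (expected, actual)} for each directory whose count differs."""
    root = Path(templates_root)
    mismatches = {}
    for name, want in expected.items():
        directory = root / name
        if directory.is_dir():
            # Skip dotfiles, which plain `ls` does not list.
            actual = sum(1 for p in directory.iterdir() if not p.name.startswith("."))
        else:
            actual = 0  # a missing directory counts as zero entries
        if actual != want:
            mismatches[name] = (want, actual)
    return mismatches
```

An empty result means every listed template directory has the expected number of entries. A missing directory is reported as a mismatch rather than raising, so one report covers all directories.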
Estimated Scope
- Files to modify: 6 (plugin.json, CLAUDE.md, README.md, commands/build.md, skills/agent-system-design/SKILL.md, .gitignore)
- Files to create: ~55 (4 commands, 1 agent, 1 skill + references, ~45 template files across 13 template directories)
- Complexity: high (27 steps across 5 development phases, comprehensive template library)
Execution Strategy
File Dependency Analysis
Steps were analyzed for file overlap. The following connected components emerge:
- Component A (rename + commands + agents): Steps 1, 2, 3, 4, 5, 8, 26, 27 -- all touch commands/build.md, CLAUDE.md, README.md, or plugin.json
- Component B (memory + OpenClaw patterns): Steps 9, 10, 11, 12, 13 -- touch scripts/templates/memory/, scripts/templates/heartbeat/, scripts/templates/proactive/, scripts/templates/cron/, skills/agent-system-design/
- Component C (Paperclip patterns): Steps 14, 15, 16, 17, 18, 19 -- touch scripts/templates/heartbeat/, scripts/templates/goals/, scripts/templates/budget/, scripts/templates/governance/, scripts/templates/org-chart/, skills/agent-system-design/
- Component D (Self-learning): Steps 20, 21 -- touch scripts/templates/feedback/, scripts/templates/optimization/
- Component E (Integration): Steps 22, 23, 24, 25 -- touch scripts/templates/docker/, scripts/templates/transfer/, scripts/templates/domains/, skills/agent-system-design/references/
- Component F (Domain templates initial): Step 7 -- touch scripts/templates/domains/
Note: Components B and C share skills/agent-system-design/SKILL.md (Steps 13 and 19) and scripts/templates/heartbeat/ (Steps 10 and 14). Component A shares commands/build.md across Steps 8 and 26. These overlaps create dependencies.
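One way to derive such overlap groups mechanically is to union steps that touch the same path. The sketch below uses a small union-find; group_steps_by_shared_files is a hypothetical name, and the mapping in the usage note is a reduced subset of the plan chosen for illustration, not the full step-to-file table.

```python
from collections import defaultdict


def group_steps_by_shared_files(step_files):
    """Group steps so that any two steps touching a common path share a group."""
    parent = {step: step for step in step_files}

    def find(step):
        # Walk to the root, compressing the path along the way.
        while parent[step] != step:
            parent[step] = parent[parent[step]]
            step = parent[step]
        return step

    first_owner = {}
    for step, files in step_files.items():
        for path in files:
            if path in first_owner:
                parent[find(step)] = find(first_owner[path])
            else:
                first_owner[path] = step

    groups = defaultdict(list)
    for step in step_files:
        groups[find(step)].append(step)
    return sorted(sorted(group) for group in groups.values())
```

Given {8: ["commands/build.md"], 26: ["commands/build.md"], 13: ["skills/agent-system-design/SKILL.md"], 19: ["skills/agent-system-design/SKILL.md"], 20: ["scripts/templates/feedback/"]}, the steps fall into three groups: 8 with 26, 13 with 19, and 20 alone. Steps in the same group must run sequentially, while separate groups are candidates for parallel sessions.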
Session 1: Foundation -- Rename and Commands
- Steps: 1, 2, 3, 4, 5
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: .claude-plugin/plugin.json, CLAUDE.md, README.md, commands/build.md, commands/deploy.md, commands/evaluate.md, commands/status.md, agents/deployment-advisor.md, skills/agent-system-design/SKILL.md (rename only)
  - Never touch: scripts/templates/* (except existing), skills/managed-agents/
Session 2: Skills and Initial Templates
- Steps: 6, 7
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: skills/managed-agents/SKILL.md, skills/managed-agents/references/api-patterns.md, scripts/templates/domains/*.md
  - Never touch: commands/, agents/, .claude-plugin/, scripts/templates/memory/, scripts/templates/heartbeat/
Session 3: OpenClaw Memory and Autonomy Patterns
- Steps: 9, 10, 11, 12
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: scripts/templates/memory/, scripts/templates/heartbeat/HEARTBEAT.md, scripts/templates/heartbeat/heartbeat-runner.sh, scripts/templates/heartbeat/README.md, scripts/templates/proactive/, scripts/templates/cron/
  - Never touch: commands/, agents/, skills/, scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md
Session 4: Paperclip Orchestration Patterns
- Steps: 14, 15, 16, 17, 18
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md, scripts/templates/goals/, scripts/templates/budget/, scripts/templates/governance/, scripts/templates/org-chart/
  - Never touch: commands/, agents/, skills/, scripts/templates/heartbeat/HEARTBEAT.md, scripts/templates/heartbeat/heartbeat-runner.sh, scripts/templates/heartbeat/README.md
Session 5: Self-Learning Systems
- Steps: 20, 21
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: scripts/templates/feedback/, scripts/templates/optimization/
  - Never touch: commands/, agents/, skills/, all other template dirs
Session 6: Integration -- Docker, Transfer, Additional Templates
- Steps: 22, 23, 24
- Wave: 1
- Depends on: none
- Scope fence:
  - Touch: scripts/templates/docker/, scripts/templates/transfer/, scripts/templates/domains/customer-support.md, scripts/templates/domains/devops-automation.md, scripts/templates/domains/legal-review.md, scripts/templates/domains/sales-intelligence.md, scripts/templates/domains/security-audit.md, scripts/templates/domains/README.md (update only)
  - Never touch: commands/, agents/, skills/, scripts/templates/domains/content-pipeline.md, scripts/templates/domains/code-review.md, scripts/templates/domains/monitoring.md, scripts/templates/domains/research-synthesis.md, scripts/templates/domains/data-processing.md
Session 7: Skill Updates and References
- Steps: 13, 19, 25
- Wave: 2
- Depends on: Session 3 (Step 13 references memory templates), Session 4 (Step 19 references orchestration templates)
- Scope fence:
  - Touch: skills/agent-system-design/SKILL.md, skills/agent-system-design/references/memory-patterns.md, skills/agent-system-design/references/autonomy-patterns.md, skills/agent-system-design/references/orchestration-patterns.md, skills/agent-system-design/references/governance-patterns.md, skills/agent-system-design/references/mcp-integrations.md
  - Never touch: commands/, agents/, scripts/templates/
Session 8: Build Command Integration and Finalization
- Steps: 8, 26, 27
- Wave: 3
- Depends on: Session 1 (renamed commands), Session 2 (domain templates), Session 7 (skill updates)
- Scope fence:
  - Touch: commands/build.md, .claude-plugin/plugin.json, CLAUDE.md, README.md
  - Never touch: scripts/templates/, skills/, agents/
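A scope fence like the ones listed for each session can be enforced by comparing a session's changed files against its Touch and Never touch lists. This is a sketch under assumptions: fence_violations is a hypothetical helper, entries ending in "/" are treated as directory prefixes, other entries as fnmatch patterns, and free-text qualifiers such as "(rename only)" are not modeled.

```python
from fnmatch import fnmatch


def fence_violations(changed_files, touch, never_touch):
    """Return changed files that are forbidden or fall outside the allowed set."""

    def matches(path, patterns):
        return any(
            fnmatch(path, pattern)
            or (pattern.endswith("/") and path.startswith(pattern))
            for pattern in patterns
        )

    violations = []
    for path in changed_files:
        # Never-touch wins over touch, which lets two sessions split one directory.
        if matches(path, never_touch) or not matches(path, touch):
            violations.append(path)
    return violations
```

For Session 5, a change under scripts/templates/feedback/ passes, while an edit to commands/build.md is reported. The changed-file list could come from git diff --name-only before the checkpoint commit.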
Execution Order
- Wave 1: Session 1, Session 2, Session 3, Session 4, Session 5, Session 6 (parallel -- 6 independent sessions)
- Wave 2: Session 7 (after Sessions 3 and 4 complete)
- Wave 3: Session 8 (after Sessions 1, 2, and 7 complete)
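The wave assignment follows directly from the "Depends on" entries of each session. As a minimal sketch (compute_waves is a hypothetical name; the dependency map is transcribed from the session list above), each session lands in the earliest wave where all of its dependencies are already placed:

```python
def compute_waves(depends_on):
    """Map each session to the earliest wave in which all its dependencies are done."""
    wave_of = {}
    remaining = dict(depends_on)
    wave = 1
    while remaining:
        # Collect every session whose dependencies were placed in earlier waves.
        ready = sorted(
            session for session, deps in remaining.items()
            if all(dep in wave_of for dep in deps)
        )
        if not ready:
            raise ValueError("dependency cycle among sessions")
        for session in ready:
            wave_of[session] = wave
            del remaining[session]
        wave += 1
    return wave_of


# Dependencies as stated per session: 7 needs 3 and 4; 8 needs 1, 2, and 7.
SESSION_DEPENDS_ON = {1: [], 2: [], 3: [], 4: [], 5: [], 6: [], 7: [3, 4], 8: [1, 2, 7]}
```

Calling compute_waves(SESSION_DEPENDS_ON) places Sessions 1 through 6 in Wave 1, Session 7 in Wave 2, and Session 8 in Wave 3, matching the order listed above.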
Grouping rules applied
- Steps sharing skills/agent-system-design/SKILL.md (13, 19) grouped into Session 7
- Steps sharing commands/build.md (8, 26) grouped into Session 8
- Steps sharing scripts/templates/heartbeat/ split by specific files (Session 3: core files, Session 4: context files)
- Template directories with no overlap placed in separate Wave 1 sessions for maximum parallelism
- Integration steps (Session 8) depend on content they reference being complete
Plan Quality Score
| Dimension | Weight | Score | Notes |
|---|---|---|---|
| Structural integrity | 0.15 | 85 | Clear step ordering, dependency chain respected, waves logical |
| Step quality | 0.20 | 80 | Each step has concrete file lists, changes, verify commands. Some templates are necessarily described at concept level |
| Coverage completeness | 0.20 | 90 | All 5 phases covered, all spec requirements mapped to steps |
| Specification quality | 0.15 | 78 | No TBDs or TODOs; 4 assumptions documented. Template contents described in detail but are large |
| Risk & pre-mortem | 0.15 | 80 | 6 risks identified with mitigations. Bash 3.2 and API assumptions addressed |
| Headless readiness | 0.15 | 82 | Every step has On failure + Checkpoint. Execution strategy enables parallel sessions |
| Weighted total | 1.00 | 82 | Grade: B+ |
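The weighted total is the sum of weight times score over the six dimensions. The short sketch below reproduces that arithmetic from the table; weighted_total is a hypothetical helper. The rows sum to 82.75, so the reported 82 is consistent with truncation rather than rounding, though the plan does not say which rule the tool applies.

```python
# (dimension, weight, score) rows copied from the table above.
SCORE_ROWS = [
    ("Structural integrity", 0.15, 85),
    ("Step quality", 0.20, 80),
    ("Coverage completeness", 0.20, 90),
    ("Specification quality", 0.15, 78),
    ("Risk & pre-mortem", 0.15, 80),
    ("Headless readiness", 0.15, 82),
]


def weighted_total(rows):
    """Return the sum of weight * score across all dimensions."""
    return sum(weight * score for _, weight, score in rows)


total = weighted_total(SCORE_ROWS)
# 12.75 + 16 + 18 + 11.7 + 12 + 12.3 = 82.75
print(f"weighted total: {total:.2f}, truncated: {int(total)}")
```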
Adversarial review:
- Plan critic: APPROVE_WITH_NOTES -- 0 blockers, 2 major (template content descriptions are concept-level, not line-by-line; Steps 8 and 26 both modify build.md which creates ordering risk), 3 minor (some verify commands assume directory exists before testing file count; Session 7 depends on Sessions 3 and 4 which is correctly modeled but adds latency)
- Scope guardian: ALIGNED -- All 27 steps map to spec requirements. No scope creep detected. All 5 phases covered. 4 commands, 2 agents, 2 skills, 10 domain templates, import/export, Docker -- all per spec. One potential gap was checked: the spec mentions "self-healing" in Phase 4, and Step 21 (self-healing.sh template) covers it.
Revisions
| # | Finding | Severity | Resolution |
|---|---|---|---|
| 1 | Steps 8 and 26 both modify commands/build.md -- risk of merge conflict if run in parallel | major | Grouped into same Session 8 (Wave 3), ensuring sequential execution |
| 2 | Steps 13 and 19 both modify skills/agent-system-design/SKILL.md -- same risk | major | Grouped into same Session 7 (Wave 2), ensuring sequential execution |
| 3 | Heartbeat template directory shared between Steps 10 (core files) and 14 (context files) | minor | Scope fence explicitly lists which files each session may touch -- no overlap on specific files |
| 4 | Domain templates README.md updated in both Step 7 and Step 24 | minor | Step 7 creates README, Step 24 updates it. Placed in different sessions but no conflict since Step 24 appends to existing content. Session 6 scope fence for Step 24 correctly notes "update only" for README |
| 5 | Verify commands assume directories exist | minor | Steps that create directories have verify commands that test file existence, not directory listing. If directory creation fails, the file write in the step itself would have failed first, triggering On failure |