From 7419d4283daef1df7e77908fbc16945f6e202ca6 Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen
Date: Sat, 11 Apr 2026 07:35:29 +0200
Subject: [PATCH] docs(plans): Agent Factory ultraplan + execution guide

27-step plan across 8 sessions in 3 waves for transforming agent-builder
into Agent Factory v1.0.0. Includes research briefs, spec, and
wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6
---
 .claude/plans/execution-guide.md              |  346 +++++
 .../ultraplan-2026-04-11-agent-factory.md     | 1138 +++++++++++++++++
 .../source-code-analysis-2026-04-11.md        |  489 +++++++
 ...-11-openclaw-paperclip-agent-frameworks.md |  215 ++++
 ...ultraplan-spec-2026-04-11-agent-factory.md |  106 ++
 5 files changed, 2294 insertions(+)
 create mode 100644 .claude/plans/execution-guide.md
 create mode 100644 .claude/plans/ultraplan-2026-04-11-agent-factory.md
 create mode 100644 .claude/research/source-code-analysis-2026-04-11.md
 create mode 100644 .claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md
 create mode 100644 .claude/ultraplan-spec-2026-04-11-agent-factory.md

diff --git a/.claude/plans/execution-guide.md b/.claude/plans/execution-guide.md
new file mode 100644
index 0000000..6db5f34
--- /dev/null
+++ b/.claude/plans/execution-guide.md
@@ -0,0 +1,346 @@
+# Agent Factory — Execution Guide
+
+## Overview
+
+The ultraplan (`ultraplan-2026-04-11-agent-factory.md`) has 27 steps across 8 sessions
+in 3 waves. This guide provides self-contained prompts for each wave.
+
+**Key principle:** Each session reads its blueprint document (in `blueprints/`) which
+contains exact file contents. No interpretation needed — implement what the blueprint specifies.
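Each of the 27 steps must belong to exactly one session, and the waves follow from the dependencies between sessions. That means the plan's structure can be checked mechanically. The Python sketch below is illustrative only and is not part of the plan's tooling. The session-to-step mapping comes from the Wave 0 blueprint list, and the dependencies come from the Execution Order diagram. Wave 0 preparation is not modeled, and the function names are hypothetical. The script reports missing or duplicated steps and places each session one wave after its latest prerequisite.

```python
"""Cross-check the Agent Factory session plan (illustrative sketch only).

SESSION_STEPS is transcribed from the Wave 0 blueprint list and DEPENDS_ON
from the Execution Order diagram; the helper names are hypothetical.
"""

SESSION_STEPS = {
    "S1": [1, 2, 3, 4, 5],
    "S2": [6, 7],
    "S3": [9, 10, 11, 12],
    "S4": [14, 15, 16, 17, 18],
    "S5": [20, 21],
    "S6": [22, 23, 24],
    "S7": [13, 19, 25],
    "S8": [8, 26, 27],
}

# Session -> sessions that must be complete before it starts.
DEPENDS_ON = {
    "S7": ["S3", "S4"],
    "S8": ["S1", "S2", "S7"],
}


def check_coverage(session_steps, total_steps=27):
    """Return (missing, duplicated) step numbers across all sessions."""
    owners = {}
    for session, steps in session_steps.items():
        for step in steps:
            owners.setdefault(step, []).append(session)
    missing = [s for s in range(1, total_steps + 1) if s not in owners]
    duplicated = sorted(s for s, who in owners.items() if len(who) > 1)
    return missing, duplicated


def assign_waves(sessions, depends_on):
    """Wave of a session = 1 + the highest wave among its prerequisites."""
    waves = {}

    def wave_of(session, visiting=()):
        # `visiting` holds the current dependency chain, so a repeat means a cycle.
        if session in visiting:
            raise ValueError("dependency cycle involving " + session)
        if session not in waves:
            prereqs = depends_on.get(session, [])
            waves[session] = 1 + max(
                (wave_of(p, visiting + (session,)) for p in prereqs), default=0
            )
        return waves[session]

    for session in sessions:
        wave_of(session)
    return waves


if __name__ == "__main__":
    print("missing/duplicated:", check_coverage(SESSION_STEPS))
    waves = assign_waves(SESSION_STEPS, DEPENDS_ON)
    for wave in sorted(set(waves.values())):
        print("Wave", wave, sorted(s for s, w in waves.items() if w == wave))
```

With the data as transcribed, the script should report no missing or duplicated steps. It should place S1 to S6 in wave 1, S7 in wave 2 and S8 in wave 3, which agrees with the diagram. Re-running it after editing the blueprint list is a cheap guard against a step being dropped or assigned twice.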
+
+## Reference Documents
+
+- **Plan:** `.claude/plans/ultraplan-2026-04-11-agent-factory.md`
+- **Spec:** `.claude/ultraplan-spec-2026-04-11-agent-factory.md`
+- **Research brief:** `.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md`
+- **Source code analysis:** `.claude/research/source-code-analysis-2026-04-11.md`
+- **Blueprints:** `.claude/plans/blueprints/session-{N}-*.md`
+
+## Execution Order
+
+```
+Wave 0: Preparation (blueprints + assumption verification)
+        │
+Wave 1: S1 ─┬─ S2 ─┬─ S3 ─┬─ S4 ─┬─ S5 ─┬─ S6 (6 parallel)
+        │      │      │      │      │      │
+Wave 2: ────────────────── S7 ────────────────── (after S3+S4)
+                           │
+Wave 3: ─────────────────── S8 ───────────────── (after S1+S2+S7)
+```
+
+---
+
+## Wave 0 — Preparation (CREATE BLUEPRINTS)
+
+Run this FIRST. It creates the detailed blueprints that all other waves depend on.
+
+```
+/ultraexecute-local .claude/plans/ultraplan-2026-04-11-agent-factory.md
+```
+
+If not using ultraexecute, use this prompt:
+
+```
+Agent Factory Wave 0: Create session blueprints.
+
+Context:
+- Plan: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+- Spec: .claude/ultraplan-spec-2026-04-11-agent-factory.md
+- Research: .claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md
+- Source analysis: .claude/research/source-code-analysis-2026-04-11.md
+- Current codebase: 15 files, read ALL of them to understand conventions
+
+Task 1: Verify the 4 assumptions in the plan:
+  a) Search for Anthropic billing API docs (WebSearch). Document what exists.
+  b) Test `claude --resume` with a custom session ID format. Document behavior.
+  c) Check /schedule trigger docs. Document stability.
+  d) Confirm Docker approach (Dockerfile + docker-compose.yml).
+  Update the plan's Assumptions table with findings.
+
+Task 2: Create 8 session blueprint documents in .claude/plans/blueprints/:
+  - session-1-foundation.md (Steps 1-5)
+  - session-2-skills-templates.md (Steps 6-7)
+  - session-3-openclaw.md (Steps 9-12)
+  - session-4-paperclip.md (Steps 14-18)
+  - session-5-selflearning.md (Steps 20-21)
+  - session-6-integration.md (Steps 22-24)
+  - session-7-skill-updates.md (Steps 13, 19, 25)
+  - session-8-finalization.md (Steps 8, 26, 27)
+
+Each blueprint MUST contain:
+  1. EXACT file contents for every new file (copy-paste ready)
+  2. Precise diff descriptions for files being modified
+  3. Verify commands that check CONTENT, not just file existence
+  4. Quality criteria specific to the session
+  5. Scope fence (what this session may/may not touch)
+
+For exact template content, use:
+  - Research brief for OpenClaw/Paperclip patterns (3-tier memory, WAL, heartbeat, etc.)
+  - Source code analysis for implementation details (heartbeat format, budget schema, etc.)
+  - Existing codebase files for conventions (frontmatter format, placeholder syntax, hook patterns)
+
+All bash scripts must be bash 3.2 compatible. All templates use {{PLACEHOLDER}} syntax.
+Python3 for JSON/YAML/date parsing in scripts.
+
+Commit after all blueprints are created:
+  git commit -m "docs(plans): create session blueprints for Agent Factory execution"
+  git push origin main
+```
+
+---
+
+## Wave 1 — Parallel Execution (6 sessions)
+
+Run these 6 sessions in parallel. Each is independent.
+
+### Session 1: Foundation — Rename and Commands
+
+```
+Agent Factory Session 1: Foundation — Rename and Commands.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-1-foundation.md
+- Plan steps 1-5: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+
+Execute steps 1-5 from the blueprint:
+1. Rename plugin from agent-builder to agent-factory (plugin.json, CLAUDE.md, README, commands, skills)
+2. Create /agent-factory:deploy command (commands/deploy.md)
+3. Create deployment-advisor agent (agents/deployment-advisor.md)
+4. Create /agent-factory:evaluate command (commands/evaluate.md)
+5. Create /agent-factory:status command (commands/status.md)
+
+SCOPE FENCE:
+- Touch: .claude-plugin/plugin.json, CLAUDE.md, README.md, commands/*, agents/deployment-advisor.md
+- Touch: skills/agent-system-design/SKILL.md (rename references ONLY)
+- NEVER touch: scripts/templates/*, skills/managed-agents/
+
+Implement EXACTLY what the blueprint specifies. Commit after each step.
+Run all verify commands. Push when all 5 steps pass.
+```
+
+### Session 2: Skills and Initial Templates
+
+```
+Agent Factory Session 2: Skills and Domain Templates.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-2-skills-templates.md
+- Plan steps 6-7: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+
+Execute steps 6-7:
+6. Create managed-agents skill (skills/managed-agents/SKILL.md + references)
+7. Create 5 domain templates (content-pipeline, code-review, monitoring, research-synthesis, data-processing)
+
+SCOPE FENCE:
+- Touch: skills/managed-agents/*, scripts/templates/domains/*
+- NEVER touch: commands/, agents/, .claude-plugin/, scripts/templates/memory/, scripts/templates/heartbeat/
+
+Implement EXACTLY what the blueprint specifies. Commit after each step.
+Run all verify commands. Push when done.
+```
+
+### Session 3: OpenClaw Memory and Autonomy
+
+```
+Agent Factory Session 3: OpenClaw Memory and Autonomy Patterns.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-3-openclaw.md
+- Plan steps 9-12: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+- Source analysis: .claude/research/source-code-analysis-2026-04-11.md (OpenClaw section)
+
+Execute steps 9-12:
+9. Create 3-tier memory templates (SESSION-STATE.md, DAILY-LOG.md, MEMORY.md)
+10. Create heartbeat + cron templates (HEARTBEAT.md, heartbeat-runner.sh) with emptiness detection
+11. Create proactive agent template with ADL/VFM guardrails
+12. Create isolated agentTurn and systemEvent cron templates
+
+SCOPE FENCE:
+- Touch: scripts/templates/memory/, scripts/templates/heartbeat/HEARTBEAT.md,
+  scripts/templates/heartbeat/heartbeat-runner.sh, scripts/templates/heartbeat/README.md,
+  scripts/templates/proactive/, scripts/templates/cron/
+- NEVER touch: commands/, agents/, skills/,
+  scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md
+
+All bash scripts MUST pass `bash -n`. Use python3 for JSON/YAML/date parsing.
+Implement EXACTLY what the blueprint specifies. Commit after each step. Push when done.
+```
+
+### Session 4: Paperclip Orchestration
+
+```
+Agent Factory Session 4: Paperclip Orchestration Patterns.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-4-paperclip.md
+- Plan steps 14-18: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+- Source analysis: .claude/research/source-code-analysis-2026-04-11.md (Paperclip section)
+
+Execute steps 14-18:
+14. Create heartbeat context injection templates (context-packet.md, wake-prompt.md)
+15. Create goal hierarchy templates (GOALS.md, goal-tracker.sh)
+16. Create budget tracking templates (budget-hook.sh, BUDGET.md, budget-report.sh)
+17. Create governance and approval gate templates (GOVERNANCE.md, approval-gate.sh)
+18. Create org-chart template (ORG-CHART.md, org-manager.sh)
+
+SCOPE FENCE:
+- Touch: scripts/templates/heartbeat/context-packet.md, scripts/templates/heartbeat/wake-prompt.md,
+  scripts/templates/goals/, scripts/templates/budget/, scripts/templates/governance/,
+  scripts/templates/org-chart/
+- NEVER touch: commands/, agents/, skills/,
+  scripts/templates/heartbeat/HEARTBEAT.md, scripts/templates/heartbeat/heartbeat-runner.sh
+
+All bash scripts MUST pass `bash -n`. Use python3 for JSON/YAML/date parsing.
+Implement EXACTLY what the blueprint specifies. Commit after each step. Push when done.
+```
+
+### Session 5: Self-Learning Systems
+
+```
+Agent Factory Session 5: Self-Learning Systems.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-5-selflearning.md
+- Plan steps 20-21: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+
+Execute steps 20-21:
+20. Create feedback loop templates (FEEDBACK.md, feedback-collector.sh, performance-scorer.sh)
+21. Create pipeline optimization and self-healing templates (pipeline-optimizer.sh, self-healing.sh)
+
+SCOPE FENCE:
+- Touch: scripts/templates/feedback/, scripts/templates/optimization/
+- NEVER touch: commands/, agents/, skills/, all other template dirs
+
+All bash scripts MUST pass `bash -n`. Use python3 for JSON/YAML/date parsing.
+Implement EXACTLY what the blueprint specifies. Commit after each step. Push when done.
+```
+
+### Session 6: Integration — Docker, Transfer, Templates
+
+```
+Agent Factory Session 6: Integration — Docker, Transfer, Additional Templates.
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-6-integration.md
+- Plan steps 22-24: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+
+Execute steps 22-24:
+22. Create Docker deployment templates (Dockerfile, docker-compose.yml, docker-entrypoint.sh)
+23. Create import/export system (export-system.sh, import-system.sh, MANIFEST.md)
+24. Create 5 additional domain templates (customer-support, devops, legal, sales, security)
+
+SCOPE FENCE:
+- Touch: scripts/templates/docker/, scripts/templates/transfer/,
+  scripts/templates/domains/customer-support.md, devops-automation.md,
+  legal-review.md, sales-intelligence.md, security-audit.md,
+  scripts/templates/domains/README.md (update only)
+- NEVER touch: commands/, agents/, skills/,
+  scripts/templates/domains/content-pipeline.md, code-review.md,
+  monitoring.md, research-synthesis.md, data-processing.md
+
+All bash scripts MUST pass `bash -n`.
+Implement EXACTLY what the blueprint specifies. Commit after each step. Push when done.
+```
+
+---
+
+## Wave 2 — Skill Updates (after Wave 1 Sessions 3+4)
+
+### Session 7: Skill References
+
+```
+Agent Factory Session 7: Skill Updates and References.
+
+PREREQUISITE: Wave 1 Sessions 3 and 4 must be complete. Verify:
+  ls scripts/templates/memory/ && ls scripts/templates/heartbeat/ &&
+  ls scripts/templates/goals/ && ls scripts/templates/budget/
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-7-skill-updates.md
+- Plan steps 13, 19, 25: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+- The templates created in Sessions 3+4 (to reference accurately)
+
+Execute steps 13, 19, 25:
+13. Update agent-system-design skill with OpenClaw pattern references (memory-patterns.md, autonomy-patterns.md)
+19. Update agent-system-design skill with Paperclip pattern references (orchestration-patterns.md, governance-patterns.md)
+25. Create MCP integration reference (mcp-integrations.md)
+
+SCOPE FENCE:
+- Touch: skills/agent-system-design/SKILL.md, skills/agent-system-design/references/*
+- NEVER touch: commands/, agents/, scripts/templates/
+
+Steps 13 and 19 both modify SKILL.md — execute them SEQUENTIALLY.
+Implement EXACTLY what the blueprint specifies. Commit after each step. Push when done.
+```
+
+---
+
+## Wave 3 — Finalization (after Wave 1 Sessions 1+2 + Wave 2)
+
+### Session 8: Build Command Integration
+
+```
+Agent Factory Session 8: Build Command Integration and Finalization.
+
+PREREQUISITE: All Wave 1 + Wave 2 sessions must be complete. Verify:
+  ls commands/deploy.md commands/evaluate.md commands/status.md &&
+  ls skills/managed-agents/SKILL.md &&
+  ls scripts/templates/domains/ | wc -l # should be 11 (10 templates + README)
+  ls skills/agent-system-design/references/memory-patterns.md
+
+Read these files FIRST:
+- Blueprint: .claude/plans/blueprints/session-8-finalization.md
+- Plan steps 8, 26, 27: .claude/plans/ultraplan-2026-04-11-agent-factory.md
+- Current state of commands/build.md (to understand what to modify)
+- Current state of .claude-plugin/plugin.json
+
+Execute steps 8, 26, 27:
+8. Update build command for domain templates and new features (Phase 0 template selection)
+26. Update build command to integrate ALL Phase 2-5 features (memory, proactive, governance, goals, org-chart, budget, heartbeat, Docker)
+27. Update plugin.json to v1.0.0, rewrite CLAUDE.md and README.md for full Agent Factory
+
+Steps 8 and 26 both modify commands/build.md — execute them SEQUENTIALLY.
+Step 27 is the final commit: "feat: Agent Factory v1.0.0"
+
+SCOPE FENCE:
+- Touch: commands/build.md, .claude-plugin/plugin.json, CLAUDE.md, README.md
+- NEVER touch: scripts/templates/, skills/, agents/
+
+After step 27, run ALL verification commands from the plan's Verification section.
+Commit and push. Tag: git tag v1.0.0
+```
+
+---
+
+## Post-Execution Verification
+
+After all waves complete, run the full verification suite:
+
+```bash
+# All renamed (planning docs under .claude/plans/ mention the old name on purpose)
+grep -r "agent-builder" . --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan" | grep -v ".claude/plans/" | wc -l # → 0
+
+# Plugin manifest
+python3 -c "import json; d=json.load(open('.claude-plugin/plugin.json')); print(d['name'], d['version'])" # → agent-factory 1.0.0
+
+# All commands
+ls commands/ | sort # → build.md deploy.md evaluate.md status.md
+
+# All agents
+ls agents/ | sort # → builder.md deployment-advisor.md
+
+# All skills
+ls skills/ | sort # → agent-system-design managed-agents
+
+# Template directories
+ls scripts/templates/ | sort # → budget cron docker domains feedback goals governance heartbeat memory optimization org-chart proactive transfer + existing files
+
+# Domain templates
+ls scripts/templates/domains/*.md | wc -l # → 11
+
+# All bash scripts pass syntax check
+find scripts/templates -name "*.sh" -exec bash -n {} \; # → no errors
+
+# All agents have valid frontmatter
+find agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \; # → no errors
+```
diff --git a/.claude/plans/ultraplan-2026-04-11-agent-factory.md b/.claude/plans/ultraplan-2026-04-11-agent-factory.md
new file mode 100644
index 0000000..b4248ec
--- /dev/null
+++ b/.claude/plans/ultraplan-2026-04-11-agent-factory.md
@@ -0,0 +1,1138 @@
+# Agent Factory -- Full Vision Realization
+
+> **Plan quality: B+** (82/100) -- APPROVE_WITH_NOTES
+>
+> Generated by ultraplan-local v1.6.0 on 2026-04-11
+>
+> **WARNING: Plan has 4 unverified assumptions -- review before executing.**
+
+## Context
+
+The existing `agent-builder` plugin (v0.1.0) is a minimal prototype with 1 command
+(`/agent-builder:build`), 1 agent (`builder`), 1 skill (`agent-system-design`), and
+3 hook templates. Three commands are stubbed in the README but not implemented
+(`deploy`, `evaluate`, `status`). The `deployment-advisor` agent and `managed-agents`
+skill are mentioned but empty or missing.
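The gap described in this Context can be checked with a short script. The Python sketch below is an illustration under stated assumptions and is not part of the plugin. `EXPECTED_FILES` comes from the execution guide's verification commands for commands, agents and skills, and `component_gaps` is a hypothetical helper. The script reports each expected file as `missing` or `empty`, mirroring the "empty or missing" wording above.

```python
"""Report which v1.0.0 plugin components are missing or empty (sketch only).

EXPECTED_FILES is transcribed from the execution guide's verification
commands; component_gaps is a hypothetical helper, not part of the plugin.
"""
import os
import sys

EXPECTED_FILES = [
    "commands/build.md",
    "commands/deploy.md",
    "commands/evaluate.md",
    "commands/status.md",
    "agents/builder.md",
    "agents/deployment-advisor.md",
    "skills/agent-system-design/SKILL.md",
    "skills/managed-agents/SKILL.md",
]


def component_gaps(root):
    """Map each expected path to "missing" or "empty"; complete files are omitted."""
    gaps = {}
    for rel in EXPECTED_FILES:
        # Split on "/" so the relative path is joined portably.
        path = os.path.join(root, *rel.split("/"))
        if not os.path.isfile(path):
            gaps[rel] = "missing"
        elif os.path.getsize(path) == 0:
            gaps[rel] = "empty"
    return gaps


if __name__ == "__main__":
    plugin_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel, state in sorted(component_gaps(plugin_root).items()):
        print(f"{state:8} {rel}")
```

Run it from the plugin root, or pass the root as the first argument. Against the v0.1.0 layout described above, it should list the three stubbed commands, `deployment-advisor.md` and `managed-agents/SKILL.md`. After Session 8 the report should be empty.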
+
+This plan transforms agent-builder into **Agent Factory**: a comprehensive, guided
+system for building autonomous agent systems. The transformation spans 5 development
+phases, each building on the previous:
+
+1. **Foundation** (v0.2) -- rename, missing commands, deployment-advisor, managed-agents skill, domain templates
+2. **OpenClaw patterns** (v0.3) -- 3-tier memory, WAL protocol, Working Buffer, proactive agent with ADL/VFM, isolated agentTurn
+3. **Paperclip patterns** (v0.4) -- heartbeat with context injection, goal hierarchy, budget awareness, governance, org-chart
+4. **Self-learning** (v0.5) -- feedback loops, performance scoring, pipeline optimization, self-healing
+5. **Full integration** (v1.0) -- MCP integrations, Docker deployment, bundled templates, import/export
+
+**Spec:** `/Users/ktg/repos/agent-builder/.claude/ultraplan-spec-2026-04-11-agent-factory.md`
+
+## Architecture Diagram
+
+```mermaid
+graph TD
+    subgraph "Plugin Structure (Agent Factory)"
+        PJ[".claude-plugin/plugin.json"]
+        CM["CLAUDE.md"]
+
+        subgraph "Commands (Phase 1)"
+            C1["commands/build.md"]
+            C2["commands/deploy.md -- NEW"]
+            C3["commands/evaluate.md -- NEW"]
+            C4["commands/status.md -- NEW"]
+        end
+
+        subgraph "Agents (Phase 1-3)"
+            A1["agents/builder.md"]
+            A2["agents/deployment-advisor.md -- NEW"]
+        end
+
+        subgraph "Skills (Phase 1-4)"
+            S1["skills/agent-system-design/"]
+            S2["skills/managed-agents/ -- NEW"]
+        end
+
+        subgraph "Templates (Phase 1-5)"
+            T1["scripts/templates/automation.sh"]
+            T2["scripts/templates/pre-tool-use.sh"]
+            T3["scripts/templates/post-tool-use.sh"]
+            T4["scripts/templates/memory/ -- NEW"]
+            T5["scripts/templates/heartbeat/ -- NEW"]
+            T6["scripts/templates/budget/ -- NEW"]
+            T7["scripts/templates/governance/ -- NEW"]
+            T8["scripts/templates/docker/ -- NEW"]
+            T9["scripts/templates/domains/ -- NEW"]
+        end
+
+        subgraph "References (Phase 1-4)"
+            R1["skills/agent-system-design/references/"]
+            R2["references: memory, heartbeat, budget, governance -- NEW"]
+        end
+    end
+
+    C1 --> A1
+    C2 --> A2
+    A1 --> T1 & T2 & T3 & T4 & T5
+    A2 --> T8
+    S2 --> R2
+```
+
+## Codebase Analysis
+
+- **Tech stack:** Claude Code plugin (markdown commands, agents, skills), bash 3.2 shell scripts, JSON config
+- **Key patterns:** YAML frontmatter for agents/skills/commands, `{{PLACEHOLDER}}` template variables in shell scripts, `${CLAUDE_PLUGIN_ROOT}` for intra-plugin paths, python3 for JSON parsing in hooks
+- **Relevant files (all 15):**
+  - `.claude-plugin/plugin.json` -- plugin manifest (rename target)
+  - `CLAUDE.md` -- plugin instructions
+  - `README.md` -- user-facing documentation
+  - `commands/build.md` -- the only implemented command
+  - `agents/builder.md` -- the only implemented agent
+  - `skills/agent-system-design/SKILL.md` -- main knowledge skill
+  - `skills/agent-system-design/references/*.md` -- 4 reference files
+  - `scripts/templates/*.sh` -- 3 shell templates
+- **Reusable code:**
+  - `agents/builder.md` -- agent frontmatter format, tool list pattern, rules structure
+  - `commands/build.md` -- command frontmatter format, phase-based workflow pattern
+  - `scripts/templates/pre-tool-use.sh` -- hook script pattern (stdin JSON, python3 parsing, exit codes)
+  - `scripts/templates/post-tool-use.sh` -- audit logging pattern
+  - `scripts/templates/automation.sh` -- headless runner pattern
+  - `skills/agent-system-design/references/feature-map.md` -- capability mapping table format
+- **Conventions:**
+  - Agent descriptions must include `<example>` blocks for reliable trigger matching
+  - Hook scripts use `exit 2` to block, `exit 0` to allow
+  - Templates use `{{PLACEHOLDER}}` syntax (plain string replace, no engine)
+  - All bash must be 3.2 compatible (no `declare -A`, no `mapfile`, no `|&`)
+
+## Research Sources
+
+| Technology | Source | Key Findings | Confidence |
+|-----------|--------|--------------|------------|
+| OpenClaw 3-tier memory | [Proactive Agent Skill](https://github.com/openclaw/skills/blob/main/skills/halthelobster/proactive-agent/SKILL.md) + source analysis | SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold); WAL protocol writes before responding; Working Buffer captures in danger zone (60%+ context) | high |
+| OpenClaw Heartbeat | Source code `src/auto-reply/heartbeat.ts` | HEARTBEAT.md with YAML tasks block, interval tracking, emptiness detection skips API calls, `ackMaxChars` suppression | high |
+| OpenClaw Cron | Source code `src/cron/service.ts` | `systemEvent` vs `agentTurn` types; `isolated` session target; startup catchup with stagger | high |
+| OpenClaw ADL/VFM | Proactive Agent Skill docs | Anti-Drift Limits (no fake intelligence, no unverifiable mods); Value-First Modification (score > 50 required); Priority: Stability > Explainability > Reusability > Scalability > Novelty | high |
+| Paperclip Heartbeat | Source code `server/src/services/heartbeat.ts` | Poll-based `tickTimers()`, 4 wakeup triggers, per-agent interval, concurrency control | high |
+| Paperclip Budget | Source code `server/src/services/budgets.ts` | Post-hoc `evaluateCostEvent()`, SUM(cost_cents) per window, soft/hard thresholds, pause on exceed | high |
+| Paperclip Goals | Source code, DB schema | Simple `parent_id` FK on goals table, NOT recursive traversal at runtime | high |
+| Paperclip Org Chart | Source code, DB schema | `agents.reportsTo` self-referential FK + `agents.role` text field | high |
+| Paperclip Adapter | Source code `packages/adapter-utils/src/types.ts` | `execute(ctx)` interface, 10 built-in adapters, Claude adapter uses CLI with `--append-system-prompt-file` | high |
+| Anthropic Billing API | Spec assumption | [ASSUMPTION] Endpoint and auth mechanism unverified | low |
+
+## Implementation Plan
+
+### Step 1: Rename plugin from agent-builder to agent-factory
+
+- **Files:** `.claude-plugin/plugin.json`, `CLAUDE.md`, `README.md`, `commands/build.md`, `skills/agent-system-design/SKILL.md`
+- **Changes:**
+  - `.claude-plugin/plugin.json`: Change `"name": "agent-builder"` to `"name": "agent-factory"`, update description to mention "Agent Factory", bump version to `"0.2.0"`
+  - `CLAUDE.md`: Replace "Agent Builder Plugin" with "Agent Factory Plugin", replace all remaining references to "agent-builder" with "agent-factory"
+  - `README.md`: Replace title "# Agent Builder" with "# Agent Factory", update install command from `/install agent-builder` to `/install agent-factory`, update all `/agent-builder:` command prefixes to `/agent-factory:`, update plugin-dir path reference, update repository URL if needed
+  - `commands/build.md`: Replace `/agent-builder:build` with `/agent-factory:build` in the system prompt text
+  - `skills/agent-system-design/SKILL.md`: Update `/agent-builder:build` reference to `/agent-factory:build`
+- **Reuses:** Existing file structures, no new patterns needed
+- **Verify:** `grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan-spec" | grep -v ".claude/plans/"` → expected: no matches (all renamed; planning docs under `.claude/plans/` are excluded because they mention the old name on purpose)
+- **On failure:** revert -- this is the foundation; all subsequent steps depend on correct naming
+- **Checkpoint:** `git commit -m "feat!: rename plugin from agent-builder to agent-factory"`
+
+### Step 2: Create /agent-factory:deploy command
+
+- **Files:** `commands/deploy.md` (new)
+- **Changes:** Create a new command file with frontmatter:
+  ```yaml
+  description: Configure deployment for your agent system. Supports /schedule, cron/launchd, systemd, and Docker.
+  argument-hint: "Optional: deployment target (schedule, local, vps, docker)"
+  allowed-tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "Agent", "AskUserQuestion"]
+  ```
+  The command body should:
+  1. Read the user's existing agent system (scan `.claude/agents/`, `.claude/skills/`, `CLAUDE.md`)
+  2. Ask the user to choose a deployment target via AskUserQuestion: `/schedule` (Claude Code native), Local (cron/launchd), VPS (systemd), Docker (compose), or use `$ARGUMENTS` if provided
+  3. For `/schedule`: Generate a HEARTBEAT.md file with task entries parsed from the user's pipeline skills, generate automation wrapper reading `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh`
+  4. For Local: Copy and customize `automation.sh` template, generate cron/launchd config
+  5. For VPS: Generate systemd service and timer units, customize automation.sh
+  6. For Docker: Generate `Dockerfile` and `docker-compose.yml` from templates (Step 22)
+  7. Use the `deployment-advisor` agent (Step 3) when the user needs guidance
+  8. Provide exact activation commands for the chosen target
+- **Reuses:** `commands/build.md` for command frontmatter format and phase structure; `scripts/templates/automation.sh` for headless runner pattern; `skills/agent-system-design/references/deployment-targets.md` for target-specific guidance
+- **Verify:** `head -5 /Users/ktg/repos/agent-builder/commands/deploy.md` → expected: valid YAML frontmatter with `description:` field
+- **On failure:** revert -- command file must be valid YAML frontmatter
+- **Checkpoint:** `git commit -m "feat(commands): add /agent-factory:deploy command"`
+
+### Step 3: Create deployment-advisor agent
+
+- **Files:** `agents/deployment-advisor.md` (new)
+- **Changes:** Create agent file with frontmatter:
+  ```yaml
+  name: deployment-advisor
+  description: |
+    Use this agent when the user needs help choosing or configuring a deployment target for their agent system.
+    <example>
+    Context: User has built agents and wants to deploy
+    user: "How should I deploy my agent system?"
+    assistant: "I'll use the deployment-advisor to analyze your setup and recommend a target."
+    <commentary>
+    Deployment guidance request triggers the advisor.
+    </commentary>
+    </example>
+    <example>
+    Context: User wants to switch deployment targets
+    user: "Can I move my agents from cron to Docker?"
+    assistant: "I'll use the deployment-advisor to plan the migration."
+    <commentary>
+    Deployment migration request triggers the advisor.
+    </commentary>
+    </example>
+  model: sonnet
+  color: blue
+  tools: ["Read", "Glob", "Grep", "Bash", "AskUserQuestion"]
+  ```
+  The agent body should:
+  - Scan the user's project for existing agents, skills, hooks, settings, and automation files
+  - Assess requirements: always-on needed? team access? Computer Use? budget constraints?
+  - Recommend a deployment target using the decision guide from `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-targets.md`
+  - Generate deployment-specific configuration files
+  - Include rules: never overwrite existing deployment config without confirmation, always verify generated scripts with `bash -n`, always include rollback instructions
+- **Reuses:** `agents/builder.md` for agent file format and example block structure; `skills/agent-system-design/references/deployment-targets.md` for deployment decision logic
+- **Verify:** `python3 -c "import yaml; yaml.safe_load(open('/Users/ktg/repos/agent-builder/agents/deployment-advisor.md').read().split('---')[1])"` → expected: no error (valid YAML frontmatter)
+- **On failure:** retry -- fix YAML syntax, then revert if still failing
+- **Checkpoint:** `git commit -m "feat(agents): add deployment-advisor agent"`
+
+### Step 4: Create /agent-factory:evaluate command
+
+- **Files:** `commands/evaluate.md` (new)
+- **Changes:** Create command with frontmatter:
+  ```yaml
+  description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
+  argument-hint: "Optional: focus area (security, deployment, memory, autonomy)"
+  allowed-tools: ["Read", "Glob", "Grep", "Bash"]
+  ```
+  The command body should:
+  1. Scan the project for all agent system components: `.claude/agents/*.md`, `.claude/skills/*/SKILL.md`, `.claude/hooks/*.sh` or `hooks/*.sh`, `.claude/settings.json`, `CLAUDE.md`, `automation/` or `scripts/`, `memory/` or `data/`
+  2. For each of the 22 capabilities from `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`, check whether the user's project has the corresponding component
+  3. Score: mark each capability OK/Partial/Missing and count the totals
+  4. Output a capability matrix table showing: capability name, status (OK/Partial/Missing), what exists, what's needed
+  5. Provide specific recommendations for filling gaps, ordered by impact
+  6. If `$ARGUMENTS` specifies a focus area, expand that section with detailed guidance
+- **Reuses:** `skills/agent-system-design/references/feature-map.md` for the 22-capability checklist; `commands/build.md` for command frontmatter format
+- **Verify:** `head -5 /Users/ktg/repos/agent-builder/commands/evaluate.md` → expected: valid YAML frontmatter with `description:` field
+- **On failure:** revert -- command must have valid frontmatter
+- **Checkpoint:** `git commit -m "feat(commands): add /agent-factory:evaluate command"`
+
+### Step 5: Create /agent-factory:status command
+
+- **Files:** `commands/status.md` (new)
+- **Changes:** Create command with frontmatter:
+  ```yaml
+  description: Quick status check of your agent infrastructure. Shows agents, skills, hooks, deployment, and recent activity.
+  argument-hint: ""
+  allowed-tools: ["Read", "Glob", "Grep", "Bash"]
+  ```
+  The command body should:
+  1. Scan for agents: `Glob` for `.claude/agents/*.md`, list each with name and model
+  2. Scan for skills: `Glob` for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md`, list each with name
+  3. Scan for hooks: `Glob` for `.claude/hooks/*.sh` and `hooks/*.sh`, check if executable
+  4. Check settings: read `.claude/settings.json` if it exists, summarize permissions and hook config
+  5. Check deployment: look for automation scripts, launchd plists, systemd units, docker-compose files, HEARTBEAT.md
+  6. Check memory: look for `memory/MEMORY.md`, `data/run-state.json`, SESSION-STATE.md
+  7. Check recent activity: read audit.log if it exists, show last 5 entries
+  8. Output a compact status table with sections for each component type
+  9. Flag any issues: missing hooks for deployed agents, agents without examples in description, skills without version
+- **Reuses:** `commands/build.md` for command frontmatter format
+- **Verify:** `head -5 /Users/ktg/repos/agent-builder/commands/status.md` → expected: valid YAML frontmatter
+- **On failure:** revert -- command must have valid frontmatter
+- **Checkpoint:** `git commit -m "feat(commands): add /agent-factory:status command"`
+
+### Step 6: Create managed-agents skill
+
+- **Files:** `skills/managed-agents/SKILL.md` (new), `skills/managed-agents/references/api-patterns.md` (new)
+- **Changes:**
+  - `skills/managed-agents/SKILL.md`: Create skill with frontmatter:
+    ```yaml
+    name: managed-agents
+    description: |
+      This skill should be used when the user asks about "managed agents",
+      "Anthropic API agents", "cloud-hosted agents", "agent SDK",
+      "deploying agents to the cloud", "serverless agents",
+      "API-based agent deployment", "/v1/agents endpoint"
+    version: 0.1.0
+    ```
+    Body should cover:
+    - What managed agents are (Anthropic-hosted agent runtime)
+    - When to use them vs. local deployment (decision matrix)
+    - SDK patterns for TypeScript and Python
+    - Session management and persistence
+    - Budget and cost considerations for API-based agents
+    - Migration path from local to managed
+  - `skills/managed-agents/references/api-patterns.md`: Reference doc with concrete code patterns for agent creation, session management, tool configuration, and error handling using `@anthropic-ai/sdk`
+- **Reuses:** `skills/agent-system-design/SKILL.md` for skill file format; `skills/agent-system-design/references/deployment-targets.md` for the managed agents section content
+- **Verify:** `head -5 /Users/ktg/repos/agent-builder/skills/managed-agents/SKILL.md` → expected: valid YAML frontmatter with `name: managed-agents`
+- **On failure:** revert -- skill must have valid frontmatter
+- **Checkpoint:** `git commit -m "feat(skills): add managed-agents knowledge skill"`
+
+### Step 7: Create domain template infrastructure and first 5 templates
+
+- **Files:** `scripts/templates/domains/content-pipeline.md` (new), `scripts/templates/domains/code-review.md` (new), `scripts/templates/domains/monitoring.md` (new), `scripts/templates/domains/research-synthesis.md` (new), `scripts/templates/domains/data-processing.md` (new), `scripts/templates/domains/README.md` (new)
+- **Changes:** Create 5 domain-specific pipeline templates. Each template is a plain markdown file with `{{PLACEHOLDER}}` variables that the builder agent copies into the user's project. Each template file contains:
+  - A header comment explaining the domain and what gets generated
+  - Agent definitions (researcher/writer/reviewer variants specialized to the domain)
+  - Pipeline skill template with domain-specific steps
+  - Recommended hook patterns for the domain
+  - Example CLAUDE.md sections relevant to the domain
+
+  **Template 1: content-pipeline.md** -- Content production (articles, newsletters, reports). Agents: content-researcher, content-writer, content-reviewer. Pipeline: research topic, draft article, review quality, publish.
+
+  **Template 2: code-review.md** -- Automated code review. Agents: code-analyzer, review-writer, standards-checker. Pipeline: analyze PR/diff, write review, check against standards, post review.
+
+  **Template 3: monitoring.md** -- System/service monitoring. Agents: monitor-checker, incident-reporter, remediation-advisor. Pipeline: check endpoints/logs, detect anomalies, report incidents, suggest fixes.
+
+  **Template 4: research-synthesis.md** -- Research and analysis. Agents: source-gatherer, synthesizer, fact-checker. Pipeline: gather sources, synthesize findings, verify claims, produce brief.
+
+  **Template 5: data-processing.md** -- Data transformation pipelines. Agents: data-validator, transformer, quality-checker. Pipeline: validate input, transform data, check output quality, save results.
+
+  **README.md**: Index listing all available templates with one-line descriptions, usage instructions for the builder agent, and template contribution guidelines.
+
+  All templates must use `{{PROJECT_DIR}}`, `{{AGENT_NAME}}`, `{{PIPELINE_NAME}}`, `{{SCHEDULE}}`, `{{DOMAIN}}` as placeholder variables.
+- **Reuses:** `agents/builder.md` for agent definition format; `commands/build.md` Phase 3-4 for pipeline creation patterns; `skills/agent-system-design/references/pipeline-patterns.md` for the 9-step pipeline template
+- **Verify:** `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/ | wc -l` → expected: 6 (5 templates + README)
+- **On failure:** retry -- ensure all 5 templates have valid structure, then revert if still failing
+- **Checkpoint:** `git commit -m "feat(templates): add 5 domain-specific pipeline templates"`
+
+### Step 8: Update build command to use domain templates and new features
+
+- **Files:** `commands/build.md`
+- **Changes:**
+  - Add a Phase 0 (pre-interview) that asks: "Would you like to start from a domain template? Available: content-pipeline, code-review, monitoring, research-synthesis, data-processing. Or choose 'custom' for a blank start."
+  - If template chosen: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md` and pre-populate Phase 1 design sketch with template roles and pipeline structure
+  - Update Phase 6 (Deployment) to reference the new `/agent-factory:deploy` command instead of inline deployment logic. Add `/schedule` as a deployment option. Change the question to offer all 5 targets: `/schedule`, Local (cron/launchd), Mac Mini (launchd), VPS (systemd), Docker
+  - Update the summary at the end to mention `/agent-factory:evaluate` and `/agent-factory:status` as next steps
+  - Update all self-references from `/agent-builder:build` to `/agent-factory:build`
+- **Reuses:** Existing Phase 1-7 structure in `commands/build.md`; `scripts/templates/domains/` templates from Step 7
+- **Verify:** `grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md` → expected: >= 3 matching lines
+- **On failure:** revert -- build command is critical path
+- **Checkpoint:** `git commit -m "feat(commands): integrate domain templates and new commands into build workflow"`
+
+### Step 9: Create 3-tier memory templates (OpenClaw pattern)
+
+- **Files:** `scripts/templates/memory/SESSION-STATE.md` (new), `scripts/templates/memory/DAILY-LOG.md` (new), `scripts/templates/memory/MEMORY.md` (new), `scripts/templates/memory/README.md` (new)
+- **Changes:**
+  - **SESSION-STATE.md** template: Hot working memory. Contains `{{AGENT_NAME}}` header, sections for Current Task, Context Window Usage (with 60% danger zone marker), Active Decisions, Pending Actions. WAL Protocol instruction: "Write important details HERE before responding to the user." Includes a Working Buffer section that activates when context usage exceeds 60%: "When context usage is above 60%, capture key exchanges in the Working Buffer before they are lost to compaction."
+ - **DAILY-LOG.md** template: Warm daily capture. Filename pattern: `memory/{{YYYY-MM-DD}}.md`. Sections: Date, Summary of Work, Decisions Made, Files Modified, Issues Encountered, Carry Forward (items for next session). Auto-rotation instruction: one file per day, builder generates the date-stamped filename. + - **MEMORY.md** template: Cold long-term curated memory. Sections: Agent Identity, Key Learnings (manually curated from daily logs), Recurring Patterns, Known Issues, Project Context, Last Updated. Compaction Recovery instruction: "When recovering from compaction, read in order: SESSION-STATE.md first, then today's daily log, then MEMORY.md, then search all daily logs for relevant context." + - **README.md**: Explains the 3-tier memory pattern, how each tier works, when to read/write each tier, and the WAL protocol and Working Buffer protocol with concrete examples. +- **Reuses:** OpenClaw's proactive agent skill pattern (from research); existing `MEMORY.md` convention in Claude Code +- **Verify:** `ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l` → expected: 4 files +- **On failure:** revert -- templates must be valid markdown with proper placeholders +- **Checkpoint:** `git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)"` + +### Step 10: Create heartbeat and cron templates (OpenClaw + Paperclip patterns) + +- **Files:** `scripts/templates/heartbeat/HEARTBEAT.md` (new), `scripts/templates/heartbeat/heartbeat-runner.sh` (new), `scripts/templates/heartbeat/README.md` (new) +- **Changes:** + - **HEARTBEAT.md** template: Agent heartbeat file following OpenClaw's format. Contains: + ``` + # Heartbeat: {{AGENT_NAME}} + + Read this file on each heartbeat. Follow it strictly. Do not infer or + repeat old tasks from prior chats. If nothing needs attention, reply + HEARTBEAT_OK. 
+ + ## Tasks + + tasks: + - name: {{TASK_1_NAME}} + interval: {{TASK_1_INTERVAL}} + prompt: "{{TASK_1_PROMPT}}" + - name: {{TASK_2_NAME}} + interval: {{TASK_2_INTERVAL}} + prompt: "{{TASK_2_PROMPT}}" + + ## Context + + {{CONTEXT_NOTES}} + ``` + - **heartbeat-runner.sh** template: Bash 3.2 compatible script that: + 1. Reads HEARTBEAT.md + 2. Implements emptiness detection (skip API call if file has only headers/empty items -- saves cost, from OpenClaw pattern) + 3. Parses task intervals using python3 (parse YAML tasks block, check last-run timestamps from a `.heartbeat-state` JSON file) + 4. For each due task: invokes `claude -p` with the task prompt + 5. Updates `.heartbeat-state` with new last-run timestamps + 6. Implements startup catchup: on first run after downtime, runs up to 5 missed tasks with 5-second stagger between them (OpenClaw pattern) + 7. Suppresses responses shorter than 300 chars from output (ackMaxChars pattern) + 8. All date math uses `python3` (not bash arithmetic) for portability + Script must use `#!/bin/bash` and be bash 3.2 compatible: no associative arrays, no mapfile, no `|&` + - **README.md**: Documents the heartbeat pattern, explains `systemEvent` vs `agentTurn` distinction, explains emptiness detection cost saving, explains startup catchup logic, provides example cron entries for running the heartbeat runner at various intervals. 
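Below is a minimal python3 sketch of the due-task check that heartbeat-runner.sh could delegate to. Several details are assumptions. It uses a hypothetical interval syntax (`30m`, `6h`, `1d`). It expects a last-run map already loaded from `.heartbeat-state`. It treats a task with an empty prompt as the "empty item" that skips the API call. The 5-second stagger and the `claude -p` invocation are left to the bash wrapper.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical interval syntax: an integer followed by m, h or d ("30m", "6h", "1d").
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}
MAX_CATCHUP = 5  # plan: run at most 5 missed tasks after downtime


def parse_interval(text):
    """Convert an interval such as '30m' into a timedelta."""
    text = text.strip()
    if len(text) < 2 or text[-1] not in UNIT_SECONDS or not text[:-1].isdigit():
        raise ValueError("unsupported interval: %r" % text)
    return timedelta(seconds=int(text[:-1]) * UNIT_SECONDS[text[-1]])


def parse_tasks(heartbeat_text):
    """Read the 'tasks:' block of HEARTBEAT.md (this simple shape only, not general YAML)."""
    tasks, current, in_block = [], None, False
    for line in heartbeat_text.splitlines():
        stripped = line.strip()
        if stripped == "tasks:":
            in_block = True
            continue
        if not in_block:
            continue
        if stripped.startswith("#"):  # the next heading ends the block
            break
        if stripped.startswith("- "):
            current = {}
            tasks.append(current)
            stripped = stripped[2:]
        if current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip().strip('"')
    return tasks


def due_tasks(tasks, last_runs, now):
    """Names of tasks whose interval has elapsed, capped at MAX_CATCHUP."""
    due = []
    for task in tasks:
        if not task.get("name") or not task.get("prompt"):
            continue  # emptiness detection: nothing worth an API call
        last = last_runs.get(task["name"])
        if last is None or now - last >= parse_interval(task.get("interval", "")):
            due.append(task["name"])
    return due[:MAX_CATCHUP]


HEARTBEAT = """# Heartbeat: demo

## Tasks

tasks:
  - name: inbox
    interval: 30m
    prompt: "Check the inbox"
  - name: report
    interval: 1d
    prompt: ""

## Context
"""

now = datetime(2026, 4, 11, 12, 0, tzinfo=timezone.utc)
print(due_tasks(parse_tasks(HEARTBEAT), {"inbox": now - timedelta(minutes=45)}, now))  # ['inbox']
```

In this example `report` is skipped because its prompt is empty. The wrapper would write `now` back into `.heartbeat-state` for each task it actually ran.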
+- **Reuses:** `scripts/templates/automation.sh` for the headless runner pattern; OpenClaw heartbeat source code patterns (emptiness detection, task parsing, ackMaxChars); Paperclip's heartbeat interval and wakeup trigger model +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax errors (likely 3.2 compatibility issues), then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup"` + +### Step 11: Create proactive agent template with ADL/VFM guardrails + +- **Files:** `scripts/templates/proactive/PROACTIVE-AGENT.md` (new), `scripts/templates/proactive/ADL-RULES.md` (new), `scripts/templates/proactive/VFM-SCORING.md` (new), `scripts/templates/proactive/README.md` (new) +- **Changes:** + - **PROACTIVE-AGENT.md** template: An agent template (with YAML frontmatter) for agents that can self-improve. Contains: + - Standard agent frontmatter with `model: sonnet` and appropriate tools + - "How you work" section describing the proactive cycle: observe environment, identify improvements, score proposed changes against VFM, implement if score > 50, log all decisions + - "Self-improvement protocol" section: before making any change to your own config, skills, or prompts, you MUST run VFM scoring. Changes that score <= 50 are logged but NOT implemented. + - "Rules" section including all ADL limits: no fake intelligence (do not pretend capabilities you lack), no unverifiable modifications (all changes must be testable), no novelty over stability (prefer proven approaches), stability > explainability > reusability > scalability > novelty + - "Self-healing" section: when encountering errors, try up to 5 different approaches before asking for help. Log each attempt with result. + - **ADL-RULES.md** template: Anti-Drift Limits reference. 
Full list of constraints that prevent agent drift: + 1. No fake intelligence -- do not simulate capabilities + 2. No unverifiable modifications -- every change must be testable + 3. No novelty over stability -- proven approaches first + 4. No scope expansion without approval -- stay within defined boundaries + 5. No silent failures -- all errors must be logged + 6. Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty + - **VFM-SCORING.md** template: Value-First Modification scoring guide. For any proposed self-modification: + - Frequency: How often does this issue occur? (0-25 points) + - Failure reduction: Does this fix real failures? (0-25 points) + - Burden reduction: Does this reduce human effort? (0-25 points) + - Cost savings: Does this reduce API/compute costs? (0-25 points) + - Total score: sum of above. Implement only if > 50. + - Logging format: date, proposed change, scores per dimension, total, decision (implement/defer) + - **README.md**: Explains the proactive agent pattern, when to use it, relationship to OpenClaw's proactive agent skill, examples of good vs bad self-modifications, how ADL and VFM work together. +- **Reuses:** `agents/builder.md` for agent frontmatter format; OpenClaw proactive agent skill patterns (ADL, VFM, self-healing attempts) +- **Verify:** `ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l` → expected: 4 files +- **On failure:** revert -- templates must be valid and complete +- **Checkpoint:** `git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails"` + +### Step 12: Create isolated agentTurn template + +- **Files:** `scripts/templates/cron/agent-turn.sh` (new), `scripts/templates/cron/system-event.sh` (new), `scripts/templates/cron/README.md` (new) +- **Changes:** + - **agent-turn.sh** template: Bash 3.2 compatible script for true background autonomy (OpenClaw's `agentTurn` pattern). Fires a full agent turn with its own isolated session: + 1. 
Accepts `{{AGENT_NAME}}`, `{{WORKING_DIR}}`, `{{MAX_TURNS}}` placeholders + 2. Creates/resumes an isolated session using `claude --resume` with a task-specific session ID format: `agent:{{AGENT_NAME}}:turn:$(date +%s)` + 3. Loads context from HEARTBEAT.md and SESSION-STATE.md before invoking + 4. Writes results to a dated log file + 5. Includes PID tracking for orphan detection (write PID to `.agent-turn.pid`, check on next run) + 6. Session rollover: after configurable number of turns (default 200) or age (default 72h), start a fresh session + - **system-event.sh** template: Lighter-weight script for injecting a text event into an existing session (OpenClaw's `systemEvent` pattern): + 1. Accepts `{{SESSION_ID}}`, `{{EVENT_TEXT}}` placeholders + 2. Uses `claude --resume {{SESSION_ID}} -p "{{EVENT_TEXT}}"` to inject into existing session + 3. Does NOT create a new session -- requires an active session + - **README.md**: Explains the difference between agentTurn (full background autonomy, isolated session) and systemEvent (inject into existing session), when to use each, session lifecycle and rollover, orphan detection pattern. 
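The rollover and PID checks in agent-turn.sh reduce to two small decisions. Here is a python3 sketch of both. It assumes the wrapper records the session start time (Unix seconds) and a turn count somewhere it can read back. `needs_rollover` and `check_pid_file` are hypothetical helper names. In this sketch, "stale" means a PID file left behind by a run that is no longer alive.

```python
import os
import time

MAX_TURNS = 200              # plan default: roll over after 200 turns
MAX_AGE_SECONDS = 72 * 3600  # plan default: roll over after 72h


def needs_rollover(turns_used, started_at, now=None,
                   max_turns=MAX_TURNS, max_age=MAX_AGE_SECONDS):
    """True when the current session should be replaced by a fresh one."""
    now = time.time() if now is None else now
    return turns_used >= max_turns or (now - started_at) >= max_age


def pid_is_alive(pid):
    """POSIX check: signal 0 tests for the process without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_pid_file(path):
    """Classify a .agent-turn.pid file as 'none', 'running' or 'stale'."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "none"
    except ValueError:
        return "stale"  # unreadable contents are treated like a dead run
    if pid <= 0:
        return "stale"  # 0 and negative values would address process groups
    return "running" if pid_is_alive(pid) else "stale"


print(needs_rollover(turns_used=12, started_at=time.time() - 80 * 3600))  # True
```

On "running", the wrapper would skip this turn. On "stale", it would log the orphaned run, remove the file, and start normally.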
+- **Reuses:** `scripts/templates/automation.sh` for headless runner pattern; `scripts/templates/heartbeat/heartbeat-runner.sh` pattern from Step 10; OpenClaw cron service patterns (session targets, rollover, catchup) +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash 3.2 syntax issues, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates"` + +### Step 13: Update agent-system-design skill with OpenClaw pattern references + +- **Files:** `skills/agent-system-design/SKILL.md`, `skills/agent-system-design/references/memory-patterns.md` (new), `skills/agent-system-design/references/autonomy-patterns.md` (new) +- **Changes:** + - **SKILL.md**: Add new sections: + - "## Memory patterns" -- describe 3-tier memory, link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md` + - "## Autonomy patterns" -- describe proactive agent, ADL/VFM, link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md` + - "## Heartbeat scheduling" -- describe heartbeat pattern, link to templates + - Update "Getting started" to mention `/agent-factory:build` (already renamed in Step 1) + - Add to trigger phrases in description: "agent memory", "3-tier memory", "WAL protocol", "proactive agent", "self-improving agent", "heartbeat scheduling" + - **memory-patterns.md** (new): Comprehensive reference on 3-tier memory: + - Architecture: hot (SESSION-STATE.md), warm (daily logs), cold (MEMORY.md) + - WAL Protocol: write before responding, prevents data loss on crashes + - Working Buffer Protocol: activate at 60% context usage, capture key exchanges + - Compaction Recovery: read order for resuming after context loss + - Template locations and placeholder variables + - 
**autonomy-patterns.md** (new): Reference on self-governing agent patterns: + - Proactive agent cycle: observe, identify, score (VFM), implement or defer + - ADL constraints with examples + - VFM scoring rubric with worked examples + - Self-healing protocol (up to 5 attempts before escalation) + - Isolated agentTurn vs systemEvent + - When NOT to use proactive patterns (simple pipelines, human-in-the-loop workflows) +- **Reuses:** `skills/agent-system-design/SKILL.md` existing structure; `skills/agent-system-design/references/feature-map.md` format for reference docs; OpenClaw research brief patterns +- **Verify:** `grep -c "memory-patterns\|autonomy-patterns\|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → expected: >= 3 +- **On failure:** revert -- skill file must remain valid +- **Checkpoint:** `git commit -m "feat(skills): add memory and autonomy pattern references to agent-system-design"` + +### Step 14: Create heartbeat context injection templates (Paperclip pattern) + +- **Files:** `scripts/templates/heartbeat/context-packet.md` (new), `scripts/templates/heartbeat/wake-prompt.md` (new) +- **Changes:** + - **context-packet.md** template: Paperclip's "Memento Man" pattern -- a curated context payload injected on each heartbeat wakeup. Contains sections with `{{PLACEHOLDER}}` variables: + ``` + # Context Packet: {{AGENT_NAME}} + Generated: {{TIMESTAMP}} + + ## Identity + {{AGENT_IDENTITY}} + + ## Current Goals + {{ACTIVE_GOALS}} + + ## Memory State + {{MEMORY_SUMMARY}} + + ## Task Queue + {{PENDING_TASKS}} + + ## Recent Events (last 24h) + {{RECENT_EVENTS}} + + ## Wake Reason + {{WAKE_REASON}} + + ## Budget Status + Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%) + {{BUDGET_WARNING}} + ``` + - **wake-prompt.md** template: The prompt template composed for each heartbeat wakeup. Follows Paperclip's `bootstrapPromptTemplate` + `promptTemplate` pattern: + ``` + You are {{AGENT_NAME}}.
You are waking up for a scheduled heartbeat. + + Read the context packet below carefully. It contains everything you + need to know about your current state and pending work. + + {{CONTEXT_PACKET}} + + Your task for this beat: + {{WAKE_REASON}} + + Rules: + - Do NOT infer tasks from prior conversations + - Only work on what is in the context packet and wake reason + - If nothing needs attention, respond with HEARTBEAT_OK + - Update SESSION-STATE.md before finishing + ``` +- **Reuses:** `scripts/templates/heartbeat/HEARTBEAT.md` from Step 10; Paperclip's context snapshot pattern from source code analysis +- **Verify:** `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → expected: 5 files total (HEARTBEAT.md, heartbeat-runner.sh, README.md from Step 10 + context-packet.md, wake-prompt.md from this step) +- **On failure:** revert -- templates must have valid placeholder syntax +- **Checkpoint:** `git commit -m "feat(templates): add context injection templates (Paperclip heartbeat pattern)"` + +### Step 15: Create goal hierarchy templates (Paperclip pattern) + +- **Files:** `scripts/templates/goals/GOALS.md` (new), `scripts/templates/goals/goal-tracker.sh` (new), `scripts/templates/goals/README.md` (new) +- **Changes:** + - **GOALS.md** template: File-based goal hierarchy using Paperclip's simple `parent_id` pattern but adapted for flat files. Format: + ``` + # Goals: {{PROJECT_NAME}} + + ## Company Goals + - [G1] {{COMPANY_GOAL_1}} + - [G2] {{COMPANY_GOAL_2}} + + ## Project Goals + - [G1.1] {{PROJECT_GOAL_1}} (parent: G1) + - [G1.2] {{PROJECT_GOAL_2}} (parent: G1) + - [G2.1] {{PROJECT_GOAL_3}} (parent: G2) + + ## Task Goals + - [G1.1.1] {{TASK_GOAL_1}} (parent: G1.1, owner: {{AGENT_NAME}}, status: active) + - [G1.1.2] {{TASK_GOAL_2}} (parent: G1.1, owner: {{AGENT_NAME}}, status: pending) + ``` + Each goal has: ID (hierarchical dot notation), description, parent reference, owner agent, status (active/pending/complete/blocked). 
The parent reference is a simple string match -- no recursive traversal needed (matching Paperclip's actual implementation vs. aspirational docs). + - **goal-tracker.sh** template: Bash 3.2 compatible script that: + 1. Reads GOALS.md + 2. Uses python3 to parse goal entries and their parent/status/owner fields + 3. Reports: count by status, orphaned goals (parent doesn't exist), goals without owners + 4. Can update status: `./goal-tracker.sh complete G1.1.1` marks goal as complete + 5. Generates a goal summary for context injection (used by heartbeat context-packet) + - **README.md**: Explains goal hierarchy pattern, the simple parent_id approach (honest about it being flat, not recursive), how agents reference goals, integration with heartbeat context injection. +- **Reuses:** Paperclip's goal hierarchy pattern (simple parent_id, not recursive); context packet integration from Step 14 +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/goals/goal-tracker.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add goal hierarchy templates (Paperclip pattern)"` + +### Step 16: Create budget tracking templates (Paperclip pattern) + +- **Files:** `scripts/templates/budget/budget-hook.sh` (new), `scripts/templates/budget/BUDGET.md` (new), `scripts/templates/budget/budget-report.sh` (new), `scripts/templates/budget/README.md` (new) +- **Changes:** + - **budget-hook.sh** template: PostToolUse hook that logs cost events. Bash 3.2 compatible. Following Paperclip's post-hoc enforcement pattern: + 1. Reads tool call result from stdin (JSON via python3) + 2. Extracts usage data if available (token counts from Claude's response) + 3. Appends cost event to `budget/cost-events.jsonl` with timestamp, agent name, token counts, estimated cost + 4. 
After logging, calls budget-check logic: reads `budget/BUDGET.md` for policy, sums cost events for current window + 5. If soft threshold (default 80%) reached: writes warning to stderr + 6. If hard threshold (100%) reached AND hard_stop enabled: writes block message, creates `budget/PAUSED` flag file + 7. [ASSUMPTION] Cost estimation uses Anthropic API pricing. If API billing endpoint is available, query it instead of estimating from token counts. + - **BUDGET.md** template: Budget policy file: + ``` + # Budget Policy: {{PROJECT_NAME}} + + ## Company Budget + - window: {{BUDGET_WINDOW}} # calendar_month or lifetime + - limit: {{BUDGET_LIMIT_CENTS}} cents + - warn_percent: 80 + - hard_stop: true + + ## Agent Budgets + - {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}} + + ## Notification + - on_warn: log # log | file | hook + - on_hard_stop: pause # pause | terminate + ``` + - **budget-report.sh** template: Bash 3.2 compatible script that: + 1. Reads `budget/cost-events.jsonl` + 2. Uses python3 to aggregate costs by agent, by day, by window + 3. Compares against BUDGET.md policies + 4. Outputs a formatted report showing: total spend, per-agent spend, % of budget used, projection for current window + 5. Checks for PAUSED flag file and reports paused agents + - **README.md**: Explains budget enforcement pattern, post-hoc vs pre-run checking (honest about limitations), how to integrate budget-hook.sh into settings.json PostToolUse, Anthropic API integration for accurate cost data (marked as [ASSUMPTION]). 
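Here is a python3 sketch of the budget-check arithmetic shared by budget-hook.sh and budget-report.sh. It assumes each line of `budget/cost-events.jsonl` already carries an ISO 8601 `timestamp`, an `agent`, and an estimated `cost_cents`; these field names are hypothetical. Converting token counts into cents is left out because pricing is still an open assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_events(lines):
    """Parse budget/cost-events.jsonl lines, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


def spent_in_window(events, window, now, agent=None):
    """Sum cost_cents for the current window ('calendar_month' or 'lifetime')."""
    total = 0
    for event in events:
        if agent is not None and event.get("agent") != agent:
            continue
        ts = datetime.fromisoformat(event["timestamp"])
        if window == "calendar_month" and (ts.year, ts.month) != (now.year, now.month):
            continue
        total += event["cost_cents"]
    return total


def budget_status(spent, limit, warn_percent=80, hard_stop=True):
    """'ok', 'warn' (soft threshold) or 'paused' (hard threshold with hard_stop)."""
    percent = 100.0 * spent / limit if limit > 0 else 100.0
    if percent >= 100 and hard_stop:
        return "paused"
    if percent >= warn_percent:
        return "warn"
    return "ok"


def enforce(status, budget_dir):
    """Create the PAUSED flag file that later runs check before starting."""
    if status == "paused":
        Path(budget_dir, "PAUSED").touch()


lines = [
    '{"timestamp": "2026-03-30T09:00:00+00:00", "agent": "writer", "cost_cents": 900}',
    '{"timestamp": "2026-04-02T09:00:00+00:00", "agent": "writer", "cost_cents": 350}',
    '{"timestamp": "2026-04-09T09:00:00+00:00", "agent": "reviewer", "cost_cents": 500}',
]
now = datetime(2026, 4, 11, tzinfo=timezone.utc)
spent = spent_in_window(load_events(lines), "calendar_month", now)
print(spent, budget_status(spent, limit=1000))  # 850 warn
```

Because enforcement is post-hoc, the call that crosses the hard limit has already been paid for. The PAUSED flag only stops the next run.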
+- **Reuses:** `scripts/templates/post-tool-use.sh` for PostToolUse hook pattern; Paperclip's budget enforcement pattern (post-hoc, soft/hard thresholds, pause on exceed) +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-hook.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-report.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add budget tracking templates (Paperclip pattern)"` + +### Step 17: Create governance and approval gate templates (Paperclip pattern) + +- **Files:** `scripts/templates/governance/GOVERNANCE.md` (new), `scripts/templates/governance/approval-gate.sh` (new), `scripts/templates/governance/README.md` (new) +- **Changes:** + - **GOVERNANCE.md** template: Governance policy file. "Autonomy is a privilege you grant" (Paperclip philosophy). Contains: + ``` + # Governance: {{PROJECT_NAME}} + + ## Autonomy Levels + - Level 0: Full manual approval (all tool calls require human OK) + - Level 1: Auto-approve safe operations (Read, Glob, Grep) + - Level 2: Auto-approve file operations (+ Write, Edit within project) + - Level 3: Auto-approve all except destructive (+ Bash non-destructive) + - Level 4: Full autonomy with hooks as guardrails + + Current level: {{AUTONOMY_LEVEL}} + + ## Approval Gates + Gates are checkpoints where the agent MUST pause and request human approval. 
+ + - {{GATE_1_NAME}}: {{GATE_1_CONDITION}} + Action: {{GATE_1_ACTION}} # pause | notify | log + - {{GATE_2_NAME}}: {{GATE_2_CONDITION}} + Action: {{GATE_2_ACTION}} + + ## Escalation Rules + - Budget exceeded: pause agent, notify via {{NOTIFICATION_METHOD}} + - Error threshold: after {{ERROR_THRESHOLD}} consecutive errors, pause agent + - Unknown tool call: block and log + - Scope violation: block and notify + + ## Audit Requirements + - All tool calls logged to audit.log + - Budget events logged to cost-events.jsonl + - Approval decisions logged to approvals.log + - Retention: {{LOG_RETENTION_DAYS}} days + ``` + - **approval-gate.sh** template: PreToolUse hook that implements approval gates. Bash 3.2 compatible: + 1. Reads GOVERNANCE.md to get current autonomy level and gate definitions + 2. Uses python3 to parse the governance config + 3. Based on autonomy level, decides whether to auto-approve or require approval + 4. For gated operations: writes an approval request to `governance/pending-approvals.jsonl` with timestamp, tool name, tool input summary, and status=pending + 5. Checks `governance/approval-responses.jsonl` for a matching response + 6. If no response within timeout: blocks (exit 2) with message "Approval required. Check governance/pending-approvals.jsonl" + 7. If approved: allows (exit 0) and logs to approvals.log + 8. If denied: blocks (exit 2) and logs denial + - **README.md**: Explains governance model, autonomy levels, how gates map to Claude Code hooks, how to configure notification channels, comparison with Paperclip's `approvals` table approach. 
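Below is one way the autonomy-level check in approval-gate.sh could choose between auto-approve and gate, sketched in python3. It assumes the hook receives `tool_name` and `tool_input`, with `file_path` for Write/Edit and `command` for Bash. The destructive-command list is a hypothetical placeholder, not a safe filter. A "gate" result is the point where the script would append to `governance/pending-approvals.jsonl` and exit 2.

```python
import os

SAFE_READ_TOOLS = {"Read", "Glob", "Grep"}   # level 1 and above
FILE_TOOLS = {"Write", "Edit"}               # level 2 and above, inside the project only
# Hypothetical denylist for level 3; substring matching is easy to evade.
DESTRUCTIVE_PATTERNS = ("rm -rf", "git push --force", "git reset --hard", "mkfs")


def is_within(path, project_dir):
    """True if path resolves to a location inside project_dir."""
    if not path:
        return False
    project = os.path.realpath(project_dir)
    target = os.path.realpath(os.path.join(project, path))
    return os.path.commonpath([project, target]) == project


def decide(level, tool_name, tool_input, project_dir):
    """Return 'allow' or 'gate' for one tool call at autonomy level 0-4."""
    if level >= 4:
        return "allow"  # full autonomy; hooks remain the guardrails
    if level >= 1 and tool_name in SAFE_READ_TOOLS:
        return "allow"
    if level >= 2 and tool_name in FILE_TOOLS:
        return "allow" if is_within(tool_input.get("file_path", ""), project_dir) else "gate"
    if level >= 3 and tool_name == "Bash":
        command = tool_input.get("command", "")
        if not any(pattern in command for pattern in DESTRUCTIVE_PATTERNS):
            return "allow"
    return "gate"


project = "/home/agent/project"
print(decide(1, "Read", {"file_path": project + "/notes.md"}, project))  # allow
print(decide(2, "Write", {"file_path": "/etc/hosts"}, project))          # gate
print(decide(3, "Bash", {"command": "rm -rf data"}, project))            # gate
```

Each level only adds tools, so level N approves everything level N-1 does. Named gates from GOVERNANCE.md would be checked before this function and could override an "allow".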
+- **Reuses:** `scripts/templates/pre-tool-use.sh` for PreToolUse hook pattern; Paperclip's approval mechanism and autonomy model +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/governance/approval-gate.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add governance and approval gate templates (Paperclip pattern)"` + +### Step 18: Create org-chart template (Paperclip pattern) + +- **Files:** `scripts/templates/org-chart/ORG-CHART.md` (new), `scripts/templates/org-chart/org-manager.sh` (new), `scripts/templates/org-chart/README.md` (new) +- **Changes:** + - **ORG-CHART.md** template: File-based org chart using Paperclip's simple `reportsTo` pattern. Format: + ``` + # Organization: {{ORG_NAME}} + + ## Agents + + | Agent | Role | Reports To | Status | Budget | + |-------|------|-----------|--------|--------| + | {{AGENT_1}} | {{ROLE_1}} | (board) | active | {{BUDGET_1}} | + | {{AGENT_2}} | {{ROLE_2}} | {{AGENT_1}} | active | {{BUDGET_2}} | + | {{AGENT_3}} | {{ROLE_3}} | {{AGENT_1}} | active | {{BUDGET_3}} | + + ## Delegation Rules + + - Board (human) → top-level agents: task assignment, goal setting + - Manager agents → direct reports: task decomposition, delegation + - Cross-team requests: route through common ancestor in org chart + - Escalation: up the reporting chain to the first agent with authority + + ## Human Override + + The human operator is the "board of directors" with override authority + on all decisions. Any agent can be paused, redirected, or terminated + by the human at any time. + ``` + - **org-manager.sh** template: Bash 3.2 compatible script that: + 1. Reads ORG-CHART.md and the `.claude/agents/` directory + 2. Uses python3 to parse the org chart table + 3. Validates: all agents listed exist as `.claude/agents/*.md` files, no circular reporting chains, all agents have a role + 4. 
Can add an agent: `./org-manager.sh add agent-name "Role" reports-to-agent` + 5. Can remove an agent: `./org-manager.sh remove agent-name` (reassigns direct reports to parent) + 6. Generates a text-based org tree visualization + - **README.md**: Explains org chart pattern, Paperclip's simple `reportsTo` FK approach, how delegation flows through the hierarchy, cross-team routing, human override authority. +- **Reuses:** Paperclip's org chart implementation (reportsTo FK, role field); `scripts/templates/goals/goal-tracker.sh` pattern for markdown table parsing with python3 +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/org-chart/org-manager.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add org-chart template (Paperclip pattern)"` + +### Step 19: Update agent-system-design skill with Paperclip pattern references + +- **Files:** `skills/agent-system-design/SKILL.md`, `skills/agent-system-design/references/orchestration-patterns.md` (new), `skills/agent-system-design/references/governance-patterns.md` (new) +- **Changes:** + - **SKILL.md**: Add new sections: + - "## Orchestration patterns" -- describe heartbeat scheduling, context injection, goal hierarchy, org chart. Link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md` + - "## Governance patterns" -- describe autonomy levels, approval gates, budget enforcement. 
Link to `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md` + - Add to trigger phrases: "heartbeat scheduling", "goal hierarchy", "budget tracking", "approval gates", "governance", "org chart", "multi-agent coordination", "agent budget" + - Update the "System components" table to include: Memory (3-tier), Heartbeat, Goals, Budget, Governance, Org Chart + - **orchestration-patterns.md** (new): Reference covering: + - Heartbeat scheduling with context injection + - Goal hierarchy (simple parent_id, not recursive) + - Org chart and delegation + - Task checkout via file locking (write `task.lock` with agent name, check before claiming) + - Session persistence (task-specific session IDs) + - Comparison matrix: OpenClaw cron vs Paperclip heartbeat vs Claude Code /schedule + - **governance-patterns.md** (new): Reference covering: + - Autonomy levels (0-4 scale) + - Approval gates and human oversight + - Budget enforcement (post-hoc) + - Audit trail requirements + - Error threshold and automatic pause + - Paperclip's "autonomy is a privilege" philosophy +- **Reuses:** `skills/agent-system-design/SKILL.md` existing structure; Step 13 pattern additions; `skills/agent-system-design/references/security-patterns.md` format +- **Verify:** `grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → expected: >= 5 +- **On failure:** revert -- skill file must remain valid +- **Checkpoint:** `git commit -m "feat(skills): add orchestration and governance pattern references"` + +### Step 20: Create feedback loop templates (Self-learning) + +- **Files:** `scripts/templates/feedback/FEEDBACK.md` (new), `scripts/templates/feedback/feedback-collector.sh` (new), `scripts/templates/feedback/performance-scorer.sh` (new), `scripts/templates/feedback/README.md` (new) +- **Changes:** + - **FEEDBACK.md** template: Feedback tracking file: + ``` + # Feedback Log: {{PROJECT_NAME}} + + ## 
Format + + Each entry records the outcome of a pipeline run and user feedback. + + | Date | Pipeline | Agent | Score | Issue | Resolution | Pattern | + |------|----------|-------|-------|-------|------------|---------| + ``` + The Pattern column captures recurring issues for optimization. + - **feedback-collector.sh** template: PostToolUse hook variant that, after pipeline completion (detected by checking if the pipeline skill's final step ran), prompts for feedback collection: + 1. Reads the pipeline output + 2. Reads the reviewer score (if available) + 3. Appends an entry to FEEDBACK.md with: date, pipeline name, participating agents, reviewer score, identified issues + 4. Detects patterns: if the same issue type appears 3+ times, flags it as a recurring pattern + 5. All parsing (the hook's JSON on stdin and the FEEDBACK.md table) is done in python3 behind a bash 3.2 compatible wrapper + - **performance-scorer.sh** template: Standalone scoring script: + 1. Reads FEEDBACK.md and cost-events.jsonl + 2. Uses python3 to compute per-agent metrics: average reviewer score, error rate, cost per run, improvement trend (last 10 vs previous 10) + 3. Outputs a performance report with recommendations: which agents need prompt tuning, which are most cost-effective, which have degrading performance + 4. Flags agents scoring below configurable threshold (default: 60/100) for prompt review + - **README.md**: Explains feedback loop pattern, how to integrate with existing pipelines, how scoring informs self-improvement (connects to VFM from Step 11), example of a feedback-driven prompt iteration cycle.
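Here is a python3 sketch of the parsing and scoring shared by feedback-collector.sh and performance-scorer.sh. It reads the 7-column FEEDBACK.md table, with rows ordered oldest first, and flags any issue seen 3+ times. It then computes each agent's average score and its last-10-vs-previous-10 trend. The helper names are hypothetical, and the Issue column is assumed to hold a short issue type such as `tone`.

```python
from collections import Counter
from statistics import mean

RECURRING_MIN = 3        # plan: same issue type 3+ times is a recurring pattern
REVIEW_THRESHOLD = 60.0  # plan default: flag agents averaging below 60/100


def parse_feedback_table(markdown):
    """Rows of the 7-column FEEDBACK.md table; header and divider rows are skipped."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7 or cells[0] in ("", "Date") or set(cells[0]) == {"-"}:
            continue
        date, pipeline, agent, score, issue = cells[:5]
        rows.append({"date": date, "pipeline": pipeline, "agent": agent,
                     "score": float(score), "issue": issue})
    return rows


def recurring_issues(rows):
    """Issue types that appear at least RECURRING_MIN times."""
    counts = Counter(row["issue"] for row in rows if row["issue"])
    return sorted(issue for issue, n in counts.items() if n >= RECURRING_MIN)


def agent_report(rows, agent):
    """Average score, trend (mean of last 10 minus mean of the 10 before) and review flag."""
    scores = [row["score"] for row in rows if row["agent"] == agent]
    recent, previous = scores[-10:], scores[-20:-10]
    average = mean(scores) if scores else None
    return {
        "average": average,
        "trend": mean(recent) - mean(previous) if previous else None,
        "needs_review": average is not None and average < REVIEW_THRESHOLD,
    }


FEEDBACK = """| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
|------|----------|-------|-------|-------|------------|---------|
| 2026-04-01 | content | writer | 55 | tone | reprompted | |
| 2026-04-02 | content | writer | 50 | tone | reprompted | |
| 2026-04-03 | content | writer | 62 | tone | reprompted | |
| 2026-04-03 | content | reviewer | 90 | | | |
"""
rows = parse_feedback_table(FEEDBACK)
print(recurring_issues(rows))                        # ['tone']
print(agent_report(rows, "writer")["needs_review"])  # True
```

A positive trend means the agent is improving. The trend stays `None` until an agent has more than 10 scored runs.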
+- **Reuses:** `scripts/templates/post-tool-use.sh` for hook pattern; `scripts/templates/proactive/VFM-SCORING.md` from Step 11 for scoring methodology +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add feedback loop and performance scoring templates"` + +### Step 21: Create pipeline optimization templates (Self-learning) + +- **Files:** `scripts/templates/optimization/pipeline-optimizer.sh` (new), `scripts/templates/optimization/self-healing.sh` (new), `scripts/templates/optimization/README.md` (new) +- **Changes:** + - **pipeline-optimizer.sh** template: Script that analyzes pipeline performance data and suggests optimizations. Bash 3.2 compatible: + 1. Reads feedback/FEEDBACK.md and budget/cost-events.jsonl + 2. Uses python3 to identify: + - Bottleneck agents (highest cost, lowest score) + - Unnecessary revision loops (reviewer always passes → remove revision loop) + - Underutilized agents (rarely invoked → consider merging roles) + - Cost outliers (runs that cost 3x+ average → investigate prompt efficiency) + 3. Generates `optimization/RECOMMENDATIONS.md` with specific, actionable suggestions + 4. Each recommendation includes VFM pre-score (from Step 11) to help decide whether to implement + 5. Does NOT auto-implement changes -- recommendations only. Human or proactive agent decides. + - **self-healing.sh** template: Error recovery script that runs after pipeline failures: + 1. Reads the error log (from audit.log or pipeline output) + 2. Categorizes the error: timeout, permission denied, tool not found, API error, content quality + 3. 
For each category, applies a recovery strategy: + - Timeout: retry with reduced max_turns + - Permission denied: check hooks and settings, log which permission was missing + - Tool not found: check MCP server status, log missing tool + - API error: retry with backoff (5s, 15s, 45s -- max 3 attempts) + - Content quality: log for feedback loop, do not retry + 4. Tracks attempt count (max 5 per OpenClaw pattern), escalates after max attempts + 5. All recovery attempts logged to `optimization/healing-log.jsonl` + - **README.md**: Explains pipeline optimization and self-healing patterns, how they connect to feedback loops (Step 20) and VFM scoring (Step 11), when to use auto-healing vs manual intervention, safety limits on auto-recovery. +- **Reuses:** `scripts/templates/proactive/VFM-SCORING.md` for scoring recommendations; `scripts/templates/feedback/performance-scorer.sh` output format; OpenClaw's 5-attempt self-healing pattern +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add pipeline optimization and self-healing templates"` + +### Step 22: Create Docker deployment templates + +- **Files:** `scripts/templates/docker/Dockerfile` (new), `scripts/templates/docker/docker-compose.yml` (new), `scripts/templates/docker/docker-entrypoint.sh` (new), `scripts/templates/docker/README.md` (new) +- **Changes:** + - **Dockerfile** template: + ```dockerfile + FROM node:22-slim + + # Install Claude Code CLI + RUN npm install -g @anthropic-ai/claude-code + + # Create agent user (non-root) + RUN useradd -m -s /bin/bash agent + + # Copy project files + WORKDIR /home/agent/project + COPY . . 
+ RUN chmod +x docker-entrypoint.sh && chown -R agent:agent /home/agent + + USER agent + + # Set up environment. ANTHROPIC_API_KEY is injected at runtime via + # docker-compose (environment/env_file); never bake it into the image. + ENV HOME=/home/agent + + ENTRYPOINT ["./docker-entrypoint.sh"] + ``` + - **docker-compose.yml** template (no top-level `version:` key; Compose v2 treats it as obsolete): + ```yaml + services: + agent: + build: . + container_name: {{PROJECT_NAME}}-agent + restart: unless-stopped + volumes: + - ./data:/home/agent/project/data + - ./memory:/home/agent/project/memory + - ./budget:/home/agent/project/budget + - ./logs:/home/agent/project/logs + environment: + - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY} + - AGENT_NAME={{AGENT_NAME}} + - HEARTBEAT_INTERVAL={{HEARTBEAT_INTERVAL}} + env_file: + - .env + ``` + - **docker-entrypoint.sh** template: Bash 3.2 compatible entry script: + 1. Validates required environment variables (ANTHROPIC_API_KEY) + 2. Runs the heartbeat-runner.sh in a loop with the configured interval + 3. Graceful shutdown: trap SIGTERM, finish current run, exit cleanly + 4. Health check: write timestamp to `/tmp/agent-health` on each successful heartbeat + - **README.md**: Build and run instructions, volume mount explanation (data, memory, budget, logs persist across container restarts), environment variables reference, security considerations (never bake API keys into image), how to use with docker-compose up/down.
+- **Reuses:** `scripts/templates/heartbeat/heartbeat-runner.sh` as the main loop inside the container; `scripts/templates/automation.sh` pattern; Paperclip's Docker deployment approach (from research) +- **Verify:** `test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/Dockerfile && test -f /Users/ktg/repos/agent-builder/scripts/templates/docker/docker-compose.yml` → expected: both exist +- **On failure:** revert -- Docker templates must be complete and valid +- **Checkpoint:** `git commit -m "feat(templates): add Docker deployment templates"` + +### Step 23: Create import/export system + +- **Files:** `scripts/templates/transfer/export-system.sh` (new), `scripts/templates/transfer/import-system.sh` (new), `scripts/templates/transfer/MANIFEST.md` (new), `scripts/templates/transfer/README.md` (new) +- **Changes:** + - **export-system.sh** template: Bash 3.2 compatible script that packages an agent system into a tarball: + 1. Accepts `{{PROJECT_DIR}}` and optional `{{EXPORT_NAME}}` placeholders + 2. Generates a MANIFEST.md listing all exported components with versions and checksums + 3. Collects: `.claude/agents/*.md`, `.claude/skills/*/SKILL.md`, `.claude/hooks/*.sh` or `hooks/*.sh`, `.claude/settings.json`, `CLAUDE.md`, `HEARTBEAT.md`, `GOALS.md`, `GOVERNANCE.md`, `ORG-CHART.md`, `BUDGET.md`, `automation/` scripts + 4. Does NOT export: `.env`, `*.local.*`, `audit.log`, `cost-events.jsonl`, `memory/` (runtime state), `.git/` + 5. Creates tarball: `agent-system-{{EXPORT_NAME}}-{{DATE}}.tar.gz` + 6. Outputs: tarball path, component count, total size + - **import-system.sh** template: Bash 3.2 compatible script that imports a tarball into a project: + 1. Accepts tarball path as argument + 2. Reads MANIFEST.md from the tarball to show what will be imported + 3. Checks for conflicts: existing files that would be overwritten + 4. If conflicts: lists them and exits with instructions (user must confirm or use --force) + 5. 
Extracts tarball contents into the project directory + 6. Replaces `{{PLACEHOLDER}}` variables in imported files with project-specific values (reads from a config file or prompts) + 7. Makes all `.sh` files executable + 8. Validates: all agent files have valid YAML frontmatter, all hook scripts pass `bash -n` + 9. Outputs: imported component list, any warnings + - **MANIFEST.md** template: Format for the manifest file included in exports: + ``` + # Agent System Export + + Exported: {{DATE}} + Source: {{PROJECT_NAME}} + Version: {{VERSION}} + + ## Components + + | Type | File | Checksum | + |------|------|----------| + | agent | .claude/agents/{{NAME}}.md | {{SHA256}} | + | skill | .claude/skills/{{NAME}}/SKILL.md | {{SHA256}} | + ... + + ## Requirements + - Claude Code version: >= {{MIN_VERSION}} + - Required MCP servers: {{MCP_LIST}} + - Required tools: {{TOOL_LIST}} + ``` + - **README.md**: Explains import/export workflow, what is and isn't included, how to customize after import, round-trip verification process. +- **Reuses:** Existing file structure conventions; template placeholder pattern +- **Verify:** `bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/export-system.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/transfer/import-system.sh` → expected: no syntax errors +- **On failure:** retry -- fix bash syntax, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add import/export system for agent systems"` + +### Step 24: Create 5 additional domain templates (total 10) + +- **Files:** `scripts/templates/domains/customer-support.md` (new), `scripts/templates/domains/devops-automation.md` (new), `scripts/templates/domains/legal-review.md` (new), `scripts/templates/domains/sales-intelligence.md` (new), `scripts/templates/domains/security-audit.md` (new) +- **Changes:** + **Template 6: customer-support.md** -- Customer support automation. Agents: ticket-classifier, response-drafter, escalation-checker. 
Pipeline: classify incoming ticket, draft response, check if escalation needed, send or escalate. + + **Template 7: devops-automation.md** -- DevOps pipeline automation. Agents: deploy-checker, incident-detector, runbook-executor. Pipeline: monitor deployments, detect incidents, execute runbook steps, report status. + + **Template 8: legal-review.md** -- Legal document review. Agents: clause-extractor, risk-assessor, compliance-checker. Pipeline: extract key clauses, assess risks, check compliance requirements, produce review summary. + + **Template 9: sales-intelligence.md** -- Sales research and intelligence. Agents: prospect-researcher, pitch-customizer, follow-up-tracker. Pipeline: research prospect, customize pitch materials, track follow-up actions. + + **Template 10: security-audit.md** -- Security posture assessment. Agents: config-scanner, vulnerability-checker, remediation-advisor. Pipeline: scan configurations, check for vulnerabilities, advise on remediation, generate audit report. + + Update `scripts/templates/domains/README.md` to include all 10 templates. + + Each template follows the same format as Step 7: header comment, domain-specialized agent definitions, pipeline skill template, recommended hooks, example CLAUDE.md sections. All use `{{PLACEHOLDER}}` variables. 
+- **Reuses:** `scripts/templates/domains/*.md` from Step 7 as format reference; `agents/builder.md` for agent format; `commands/build.md` Phase 3-4 patterns +- **Verify:** `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l` → expected: 11 (10 templates + README) +- **On failure:** retry -- ensure all templates follow established format, then revert if still failing +- **Checkpoint:** `git commit -m "feat(templates): add 5 more domain templates (10 total)"` + + ### Step 25: Create MCP integration reference + + - **Files:** `skills/agent-system-design/references/mcp-integrations.md` (new) +- **Changes:** Create a reference document for MCP server integrations commonly used with agent systems: + - **Communication MCP servers**: Slack (`@modelcontextprotocol/server-slack`), GitHub, Linear -- with `.mcp.json` configuration examples + - **Data MCP servers**: PostgreSQL, SQLite, filesystem -- for agent data storage + - **Browser MCP servers**: Playwright (`@playwright/mcp`) -- for web interaction agents + - **Custom MCP servers**: How to build an MCP server that agents can use, using the MCP SDK (`@modelcontextprotocol/sdk`); `@anthropic-ai/sdk` is the Anthropic API client, not an MCP server library + - For each server: purpose, `.mcp.json` entry format, which agent types benefit from it, security considerations + - Integration with Agent Factory: how the builder agent configures `.mcp.json` during Phase 3, how the deployment command handles MCP server availability on different targets +- **Reuses:** `skills/agent-system-design/references/feature-map.md` format; existing `.mcp.json` references in the feature map +- **Verify:** `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md` → expected: file exists +- **On failure:** skip -- MCP reference is supplementary, not blocking +- **Checkpoint:** `git commit -m "docs(skills): add MCP integration reference"` + + ### Step 26: Update build command to integrate all Phase 2-5 features + + - **Files:** `commands/build.md` +- **Changes:** Extend the
build workflow with new optional phases (user can skip any): + - **Phase 2.5 (after Operating Manual, before Agent Team): Memory Setup** -- Ask if user wants 3-tier memory. If yes, generate SESSION-STATE.md, MEMORY.md, and memory/ directory from templates. Configure daily log rotation. + - **Phase 3.5 (after Agent Team): Proactive Agent** -- Ask if any agents should be proactive (self-improving). If yes, add ADL/VFM rules from proactive templates. Show VFM scoring example. + - **Phase 5.5 (after Security): Governance** -- Ask autonomy level (0-4). Generate GOVERNANCE.md with approval gates matching the chosen level. Integrate budget tracking if user wants it. + - **Phase 5.7: Goals and Org Chart** -- For multi-agent systems (3+ agents): generate GOALS.md and ORG-CHART.md. Define reporting structure and goal hierarchy. + - **Phase 6 update**: Add Docker as deployment option. Add `/schedule` with heartbeat as the primary recommended option. Reference deployment-advisor agent. + - **Phase 7 update**: After test run, offer to set up feedback loop (performance scoring template). Show how to run pipeline-optimizer.sh after first week. + - **Summary update**: Include all new components in the final summary (memory, governance, goals, org-chart, budget, heartbeat). +- **Reuses:** Existing Phase 1-7 structure; all templates from Steps 9-18 +- **Verify:** `wc -l /Users/ktg/repos/agent-builder/commands/build.md` → expected: significantly more lines than original (was ~390, should be ~600+) +- **On failure:** revert -- build command is critical path, must be valid +- **Checkpoint:** `git commit -m "feat(commands): integrate all Phase 2-5 features into build workflow"` + +### Step 27: Update plugin.json, CLAUDE.md, README.md for v1.0 + +- **Files:** `.claude-plugin/plugin.json`, `CLAUDE.md`, `README.md` +- **Changes:** + - **plugin.json**: Bump version to `"1.0.0"`. Update description: "Build and manage autonomous agent systems with Claude Code. 
Guided workflow with 3-tier memory, heartbeat scheduling, budget tracking, governance, org-chart, 10 domain templates, and import/export. Inspired by OpenClaw and Paperclip patterns." Add keywords: `"memory"`, `"heartbeat"`, `"budget"`, `"governance"`, `"org-chart"`, `"templates"`, `"import"`, `"export"`. Update repository URL from agent-builder to agent-factory. + - **CLAUDE.md**: Update to reflect full Agent Factory capabilities: + - Update "What this plugin does" to describe the full system + - Update "Plugin structure" to list all 4 commands, 2 agents, 2 skills, and all template directories + - Add a "Template directories" section listing: memory/, heartbeat/, proactive/, cron/, goals/, budget/, governance/, org-chart/, docker/, domains/, transfer/, feedback/, optimization/ + - **README.md**: Complete rewrite reflecting Agent Factory v1.0: + - New description emphasizing the full vision + - Updated command table with all 4 commands + - Feature list: 3-tier memory, heartbeat scheduling, proactive agents with ADL/VFM, goal hierarchy, budget tracking, governance and approval gates, org chart, 10 domain templates, Docker deployment, import/export + - Architecture overview showing how patterns layer + - Quick start guide + - Pattern reference section linking to skill references + - Version history +- **Reuses:** Existing file structures; all features from Steps 1-26 +- **Verify:** `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'"` → expected: no error +- **On failure:** revert -- manifest must be valid JSON with correct version +- **Checkpoint:** `git commit -m "feat: Agent Factory v1.0.0 — full vision realized"` + + ### Failure recovery rules + + - **On failure: revert** -- undo this step's changes (`git checkout -- {files}` for modified files; delete any files the step created), do NOT proceed +- **On failure: retry** -- attempt once more with the alternative approach described, then revert if still
failing +- **On failure: skip** -- this step is non-critical; continue to next step and note the skip +- **On failure: escalate** -- stop execution entirely; the issue requires human judgment +- **Checkpoint** -- after each step succeeds, commit changes so subsequent failures cannot corrupt completed work + + ## Alternatives Considered + + | Approach | Pros | Cons | Why rejected | +|----------|------|------|--------------| +| Template engine (Handlebars/EJS) | Richer templates, conditionals, loops | External dependency, added complexity | Spec requires plain string replace with `{{PLACEHOLDER}}` -- no engine | +| SQLite for budget/goals/org-chart | Structured queries, atomic operations | External dependency (sqlite3 binary), not human-readable | File-based approach is Claude Code-native, editable by humans and agents | +| Node.js scripts instead of bash | Richer JSON handling, async support | Requires Node.js on every target host; the bash 3.2 constraint applies to the generated scripts | Bash with python3 for JSON is sufficient and more portable | +| Single monolithic build command | Simpler mental model, one entry point | Too long, hard to test phases independently | Separate commands allow modular use; build orchestrates | +| Pre-run budget checking (like a reservation system) | Prevents overspend before it happens | Requires persistent service or lock file coordination | Paperclip's post-hoc check (after each run) is robust enough in practice | +| Vector memory (sqlite-vec like OpenClaw) | Better semantic search | External dependency, complexity far exceeds file-based approach | Spec explicitly excludes vector/embedding memory -- stay file-based | + + ## Test Strategy + + - **Framework:** No test framework in this project. All verification is via shell commands. +- **Existing patterns:** Manual verification via `bash -n` for shell scripts, YAML parsing validation for agent/skill frontmatter, `grep` for content verification.
+- **Verification approach:** Each step includes a concrete verify command. Additionally: + - All `.sh` templates verified with `bash -n` (syntax check) + - All agent `.md` files verified for valid YAML frontmatter via python3 yaml parser + - All command `.md` files verified for `description:` field in frontmatter + - Import/export round-trip tested: export a test system, import into a clean directory, verify all components present + - Domain templates verified: builder agent can read and apply placeholder substitution + + ## Risks and Mitigations + + | Priority | Risk | Location | Impact | Mitigation | + |----------|------|----------|--------|------------| + | High | Anthropic billing API may not be accessible or may have different auth | `scripts/templates/budget/budget-hook.sh` | Budget tracking limited to token-based estimation | [ASSUMPTION] marked in template; fallback to token count estimation with configurable pricing | + | High | `/schedule` API may change | `commands/deploy.md`, heartbeat templates | Heartbeat scheduling breaks | Templates use standard `claude -p` invocation which is stable; `/schedule` is one option among cron/launchd/systemd/Docker | + | Medium | Bash 3.2 compatibility for complex scripts | All `.sh` templates | Scripts fail under macOS's default `/bin/bash` (3.2) | Every script syntax-checked with `bash -n`; python3 used for all complex operations (JSON, YAML, date math) | + | Medium | 27-step plan scope is large | All files | Execution takes multiple sessions | Execution strategy groups into independent waves; each step is independently committable | + | Low | Docker template may need updates for newer Claude Code versions | `scripts/templates/docker/` | Docker deployment breaks | Dockerfile uses `node:22-slim` and `npm install -g`, so every image rebuild picks up the latest Claude Code CLI | + | Low | Domain templates may not match all user domains | `scripts/templates/domains/` | Users need custom templates | 10 templates cover common domains; builder agent can create custom templates | + + ## Assumptions + + | # |
Assumption | Why unverifiable | Impact if wrong | +|---|-----------|-----------------|-----------------| +| 1 | Anthropic billing API exists and is accessible with standard API key | No docs found confirming exact endpoint | Budget tracking falls back to token-based estimation; budget-hook.sh needs manual cost configuration | +| 2 | `/schedule` trigger interface is stable enough to build on | Claude Code internal API, no stability guarantee | Heartbeat templates still work with cron/launchd/systemd; `/schedule` is optional deployment target | +| 3 | Docker deployment should use docker-compose.yml with Dockerfile | Spec assumption, no user confirmation | Minor: can generate either format; both templates provided | +| 4 | `claude --resume` with custom session IDs works for isolated agent turns | Based on CLI docs, not tested with custom session key formats | agentTurn template may need session ID format adjustment | + +**WARNING: Plan has 4 unverified assumptions -- review before executing.** + +## Verification + +End-to-end checks that prove the plan was implemented correctly: + +- [ ] `grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan" | wc -l` → expected: 0 (all renamed) +- [ ] `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); print(d['name'], d['version'])"` → expected: `agent-factory 1.0.0` +- [ ] `ls /Users/ktg/repos/agent-builder/commands/ | sort` → expected: `build.md deploy.md evaluate.md status.md` +- [ ] `ls /Users/ktg/repos/agent-builder/agents/ | sort` → expected: `builder.md deployment-advisor.md` +- [ ] `ls /Users/ktg/repos/agent-builder/skills/ | sort` → expected: `agent-system-design managed-agents` +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/domains/*.md | wc -l` → expected: 11 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l` → expected: 4 +- [ ] `ls 
/Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → expected: 5 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l` → expected: 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/budget/ | wc -l` → expected: 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/governance/ | wc -l` → expected: 3 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/org-chart/ | wc -l` → expected: 3 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/docker/ | wc -l` → expected: 4 +- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/transfer/ | wc -l` → expected: 4 +- [ ] `find /Users/ktg/repos/agent-builder/scripts/templates -name "*.sh" -exec bash -n {} \;` → expected: no errors (all scripts pass syntax check) +- [ ] `find /Users/ktg/repos/agent-builder/agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \;` → expected: no errors (all agents have valid frontmatter) + +## Estimated Scope + +- **Files to modify:** 6 (plugin.json, CLAUDE.md, README.md, commands/build.md, skills/agent-system-design/SKILL.md, .gitignore) +- **Files to create:** ~55 (4 commands, 1 agent, 1 skill + references, ~45 template files across 13 template directories) +- **Complexity:** high (27 steps across 5 development phases, comprehensive template library) + +## Execution Strategy + +### File Dependency Analysis + +Steps were analyzed for file overlap. 
The following connected components emerge: + +- **Component A (rename + commands + agents):** Steps 1, 2, 3, 4, 5, 8, 26, 27 -- all touch `commands/build.md`, `CLAUDE.md`, `README.md`, or `plugin.json` +- **Component B (memory + OpenClaw patterns):** Steps 9, 10, 11, 12, 13 -- touch `scripts/templates/memory/`, `scripts/templates/heartbeat/`, `scripts/templates/proactive/`, `scripts/templates/cron/`, `skills/agent-system-design/` +- **Component C (Paperclip patterns):** Steps 14, 15, 16, 17, 18, 19 -- touch `scripts/templates/heartbeat/`, `scripts/templates/goals/`, `scripts/templates/budget/`, `scripts/templates/governance/`, `scripts/templates/org-chart/`, `skills/agent-system-design/` +- **Component D (Self-learning):** Steps 20, 21 -- touch `scripts/templates/feedback/`, `scripts/templates/optimization/` +- **Component E (Integration):** Steps 22, 23, 24, 25 -- touch `scripts/templates/docker/`, `scripts/templates/transfer/`, `scripts/templates/domains/`, `skills/agent-system-design/references/` +- **Component F (Domain templates initial):** Step 7 -- touch `scripts/templates/domains/` + +Note: Components B and C share `skills/agent-system-design/SKILL.md` (Steps 13 and 19) and `scripts/templates/heartbeat/` (Steps 10 and 14). Component A shares `commands/build.md` across Steps 8 and 26. These overlaps create dependencies. 
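The overlap analysis above amounts to grouping steps into connected components by shared paths. A minimal Python sketch, assuming a hand-written step-to-files map (the `STEP_FILES` dict below is a hypothetical, abbreviated subset of the plan, not its full file list) and exact-path matching, which is the granularity the heartbeat scope fences rely on:

```python
from collections import defaultdict

# Hypothetical, abbreviated step -> touched-paths map (illustrative subset only).
STEP_FILES = {
    8: {"commands/build.md"},
    26: {"commands/build.md"},
    10: {"scripts/templates/heartbeat/heartbeat-runner.sh"},
    14: {"scripts/templates/heartbeat/context-packet.md"},
    13: {"skills/agent-system-design/SKILL.md"},
    19: {"skills/agent-system-design/SKILL.md"},
    20: {"scripts/templates/feedback/FEEDBACK.md"},
}


def connected_components(step_files):
    """Group steps that share at least one exact file path (union-find)."""
    parent = {s: s for s in step_files}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps trees shallow
            s = parent[s]
        return s

    owner = {}  # path -> first step seen touching it
    for step in sorted(step_files):
        for path in step_files[step]:
            if path in owner:
                parent[find(step)] = find(owner[path])  # union the two groups
            else:
                owner[path] = step

    groups = defaultdict(list)
    for step in sorted(step_files):
        groups[find(step)].append(step)
    return sorted(groups.values())


print(connected_components(STEP_FILES))
# [[8, 26], [10], [13, 19], [14], [20]]
```

With exact-path matching, Steps 10 and 14 stay separate even though both live under `scripts/templates/heartbeat/`; matching on directory prefixes instead would merge them, which is the conservative choice if file-level scope fences are not enforced.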
+ +### Session 1: Foundation -- Rename and Commands +- **Steps:** 1, 2, 3, 4, 5 +- **Wave:** 1 +- **Depends on:** none +- **Scope fence:** + - Touch: `.claude-plugin/plugin.json`, `CLAUDE.md`, `README.md`, `commands/build.md`, `commands/deploy.md`, `commands/evaluate.md`, `commands/status.md`, `agents/deployment-advisor.md`, `skills/agent-system-design/SKILL.md` (rename only) + - Never touch: `scripts/templates/*` (except existing), `skills/managed-agents/` + +### Session 2: Skills and Initial Templates +- **Steps:** 6, 7 +- **Wave:** 1 +- **Depends on:** none +- **Scope fence:** + - Touch: `skills/managed-agents/SKILL.md`, `skills/managed-agents/references/api-patterns.md`, `scripts/templates/domains/*.md` + - Never touch: `commands/`, `agents/`, `.claude-plugin/`, `scripts/templates/memory/`, `scripts/templates/heartbeat/` + +### Session 3: OpenClaw Memory and Autonomy Patterns +- **Steps:** 9, 10, 11, 12 +- **Wave:** 1 +- **Depends on:** none +- **Scope fence:** + - Touch: `scripts/templates/memory/`, `scripts/templates/heartbeat/HEARTBEAT.md`, `scripts/templates/heartbeat/heartbeat-runner.sh`, `scripts/templates/heartbeat/README.md`, `scripts/templates/proactive/`, `scripts/templates/cron/` + - Never touch: `commands/`, `agents/`, `skills/`, `scripts/templates/heartbeat/context-packet.md`, `scripts/templates/heartbeat/wake-prompt.md` + +### Session 4: Paperclip Orchestration Patterns +- **Steps:** 14, 15, 16, 17, 18 +- **Wave:** 1 +- **Depends on:** none +- **Scope fence:** + - Touch: `scripts/templates/heartbeat/context-packet.md`, `scripts/templates/heartbeat/wake-prompt.md`, `scripts/templates/goals/`, `scripts/templates/budget/`, `scripts/templates/governance/`, `scripts/templates/org-chart/` + - Never touch: `commands/`, `agents/`, `skills/`, `scripts/templates/heartbeat/HEARTBEAT.md`, `scripts/templates/heartbeat/heartbeat-runner.sh`, `scripts/templates/heartbeat/README.md` + +### Session 5: Self-Learning Systems +- **Steps:** 20, 21 +- **Wave:** 1 +- 
**Depends on:** none +- **Scope fence:** + - Touch: `scripts/templates/feedback/`, `scripts/templates/optimization/` + - Never touch: `commands/`, `agents/`, `skills/`, all other template dirs + +### Session 6: Integration -- Docker, Transfer, Additional Templates +- **Steps:** 22, 23, 24 +- **Wave:** 1 +- **Depends on:** none +- **Scope fence:** + - Touch: `scripts/templates/docker/`, `scripts/templates/transfer/`, `scripts/templates/domains/customer-support.md`, `scripts/templates/domains/devops-automation.md`, `scripts/templates/domains/legal-review.md`, `scripts/templates/domains/sales-intelligence.md`, `scripts/templates/domains/security-audit.md`, `scripts/templates/domains/README.md` (update only) + - Never touch: `commands/`, `agents/`, `skills/`, `scripts/templates/domains/content-pipeline.md`, `scripts/templates/domains/code-review.md`, `scripts/templates/domains/monitoring.md`, `scripts/templates/domains/research-synthesis.md`, `scripts/templates/domains/data-processing.md` + +### Session 7: Skill Updates and References +- **Steps:** 13, 19, 25 +- **Wave:** 2 +- **Depends on:** Session 3 (Step 13 references memory templates), Session 4 (Step 19 references orchestration templates) +- **Scope fence:** + - Touch: `skills/agent-system-design/SKILL.md`, `skills/agent-system-design/references/memory-patterns.md`, `skills/agent-system-design/references/autonomy-patterns.md`, `skills/agent-system-design/references/orchestration-patterns.md`, `skills/agent-system-design/references/governance-patterns.md`, `skills/agent-system-design/references/mcp-integrations.md` + - Never touch: `commands/`, `agents/`, `scripts/templates/` + +### Session 8: Build Command Integration and Finalization +- **Steps:** 8, 26, 27 +- **Wave:** 3 +- **Depends on:** Session 1 (renamed commands), Session 2 (domain templates), Session 7 (skill updates) +- **Scope fence:** + - Touch: `commands/build.md`, `.claude-plugin/plugin.json`, `CLAUDE.md`, `README.md` + - Never touch: 
`scripts/templates/`, `skills/`, `agents/` + +### Execution Order + +- **Wave 1:** Session 1, Session 2, Session 3, Session 4, Session 5, Session 6 (parallel -- 6 independent sessions) +- **Wave 2:** Session 7 (after Sessions 3 and 4 complete) +- **Wave 3:** Session 8 (after Sessions 1, 2, and 7 complete) + +### Grouping rules applied + +- Steps sharing `skills/agent-system-design/SKILL.md` (13, 19) grouped into Session 7 +- Steps sharing `commands/build.md` (8, 26) grouped into Session 8 +- Steps sharing `scripts/templates/heartbeat/` split by specific files (Session 3: core files, Session 4: context files) +- Template directories with no overlap placed in separate Wave 1 sessions for maximum parallelism +- Integration steps (Session 8) depend on content they reference being complete + +## Plan Quality Score + +| Dimension | Weight | Score | Notes | +|-----------|--------|-------|-------| +| Structural integrity | 0.15 | 85 | Clear step ordering, dependency chain respected, waves logical | +| Step quality | 0.20 | 80 | Each step has concrete file lists, changes, verify commands. Some templates are necessarily described at concept level | +| Coverage completeness | 0.20 | 90 | All 5 phases covered, all spec requirements mapped to steps | +| Specification quality | 0.15 | 78 | No TBDs or TODOs; 4 assumptions documented. Template contents described in detail but are large | +| Risk & pre-mortem | 0.15 | 80 | 6 risks identified with mitigations. Bash 3.2 and API assumptions addressed | +| Headless readiness | 0.15 | 82 | Every step has On failure + Checkpoint. 
Execution strategy enables parallel sessions | + | **Weighted total** | **1.00** | **83** | **Grade: B+** | + + **Adversarial review:** + - **Plan critic:** APPROVE_WITH_NOTES -- 0 blockers, 2 major (template content descriptions are concept-level, not line-by-line; Steps 8 and 26 both modify build.md which creates ordering risk), 3 minor (some verify commands assume directory exists before testing file count; Session 7 depends on Sessions 3 and 4 which is correctly modeled but adds latency) + - **Scope guardian:** ALIGNED -- All 27 steps map to spec requirements. No scope creep detected. All 5 phases covered. 4 commands, 2 agents, 2 skills, 10 domain templates, import/export, Docker -- all per spec. The one potential gap checked, "self-healing" in spec Phase 4, is covered by Step 21 (self-healing.sh template). + + ## Revisions + + | # | Finding | Severity | Resolution | +|---|---------|----------|------------| +| 1 | Steps 8 and 26 both modify `commands/build.md` -- risk of merge conflict if run in parallel | major | Grouped into same Session 8 (Wave 3), ensuring sequential execution | +| 2 | Steps 13 and 19 both modify `skills/agent-system-design/SKILL.md` -- same risk | major | Grouped into same Session 7 (Wave 2), ensuring sequential execution | +| 3 | Heartbeat template directory shared between Steps 10 (core files) and 14 (context files) | minor | Scope fence explicitly lists which files each session may touch -- no overlap on specific files | +| 4 | Domain templates README.md updated in both Step 7 and Step 24 | minor | Step 7 creates README, Step 24 updates it (appends). Note that Sessions 2 and 6 both run in parallel in Wave 1, so Step 24's append assumes an ordering the wave plan does not guarantee. Session 6 scope fence for Step 24 correctly notes "update only" for README | +| 5 | Verify commands assume directories exist | minor | Steps that create directories have verify commands that test file existence, not directory listing.
If directory creation fails, the file write in the step itself would have failed first, triggering On failure | diff --git a/.claude/research/source-code-analysis-2026-04-11.md b/.claude/research/source-code-analysis-2026-04-11.md new file mode 100644 index 0000000..dd5b465 --- /dev/null +++ b/.claude/research/source-code-analysis-2026-04-11.md @@ -0,0 +1,489 @@ +--- +type: source-code-analysis +created: 2026-04-11 +repos_analyzed: [paperclipai/paperclip, openclaw/openclaw] +purpose: "Implementation-level details for replicating best patterns in Agent Factory" +--- + +# Source Code Analysis: OpenClaw & Paperclip + +Repos were cloned and analyzed at code level on 2026-04-11. This document +captures implementation details NOT available in docs or articles — the actual +patterns, interfaces, and mechanisms worth replicating. + +## Critical Corrections (vs. docs/articles) + +These are things docs described differently than the code implements: + +1. **Canvas/A2UI (OpenClaw) is NOT generative rendering.** It's a static file + server. Agents write files to a workspace directory, the canvas-host serves + them over HTTP. No server-side rendering, no UI generation. This is NOT a + meaningful capability gap for Claude Code. + +2. **Goal hierarchy (Paperclip) is a simple adjacency list.** Just a `parent_id` + FK on the `goals` table. No recursive traversal at runtime — only the directly + referenced goal is passed to agents in `context_snapshot`. Docs said "full + ancestry" but that's aspirational, not implemented. + +3. **Budget enforcement (Paperclip) is post-hoc, not atomic.** Checked AFTER each + run via `evaluateCostEvent()`: reads `SUM(cost_cents)`, compares with policy, + pauses agent if exceeded. No pre-run budget reservation. Robust enough in practice. + +4. **OpenClaw has real vector memory.** Not just MEMORY.md files. Uses `sqlite-vec` + extension for vector search with embedding providers (Gemini, Mistral, Ollama, + OpenAI, Voyage, Bedrock, local llama). 
This is significantly more sophisticated + than file-based memory. + +--- + +## Paperclip Implementation Details + +### Heartbeat Scheduler + +**File:** `server/src/services/heartbeat.ts` (4534 lines) + +Poll-based, not event-driven. `tickTimers()` iterates all agents on each tick: + +```typescript +tickTimers: async (now = new Date()) => { + const allAgents = await db.select().from(agents); + for (const agent of allAgents) { + if (agent.status === "paused" || agent.status === "terminated" || agent.status === "pending_approval") continue; + const policy = parseHeartbeatPolicy(agent); + if (!policy.enabled || policy.intervalSec <= 0) continue; + const elapsed = now.getTime() - new Date(agent.lastHeartbeatAt ?? agent.createdAt).getTime(); + if (elapsed < policy.intervalSec * 1000) continue; + await enqueueWakeup(agent.id, { source: "timer" }); + } +} +``` + +Heartbeat policy from `agent.runtimeConfig.heartbeat`: +- `enabled: boolean` +- `intervalSec: number` +- `wakeOnDemand: boolean` +- `maxConcurrentRuns: 1-10` + +4 wakeup triggers: `timer`, `assignment`, `on_demand`, `automation`. + +Concurrency control: in-process promise chain per agent (`startLocksByAgent` Map). +Not distributed — single server process only. + +### Run Lifecycle + +1. `enqueueWakeup()` → insert `heartbeat_runs` (status=queued) + `agent_wakeup_requests` +2. `startNextQueuedRunForAgent()` → check running count vs maxConcurrentRuns +3. `claimQueuedRun()` → `UPDATE heartbeat_runs SET status='running' WHERE status='queued'` +4. `executeRun()` → call `adapter.execute()`, stream output via `onLog` +5. On completion → update runs, runtime state, task sessions, create cost events +6.
Orphan detection: `reapOrphanedRuns()` checks PIDs, auto-retries once + +### Adapter Interface + +**File:** `packages/adapter-utils/src/types.ts` + +```typescript +interface ServerAdapterModule { + type: string; + execute(ctx: AdapterExecutionContext): Promise; + testEnvironment(ctx: AdapterEnvironmentTestContext): Promise; + listSkills?: (ctx) => Promise; + syncSkills?: (ctx, desiredSkills) => Promise; + sessionCodec?: AdapterSessionCodec; + sessionManagement?: AdapterSessionManagement; +} +``` + +10 built-in adapters: `claude_local`, `codex_local`, `cursor_local`, `gemini_local`, +`openclaw_gateway`, `opencode_local`, `pi_local`, `hermes_local`, `process`, `http`. + +### Claude Adapter Execution + +**File:** `packages/adapters/claude-local/src/server/execute.ts` + +Invokes CLI as: +``` +claude --print - --output-format stream-json --verbose \ + [--resume ] \ + [--dangerously-skip-permissions] \ + [--model ] \ + [--max-turns N] \ + [--append-system-prompt-file ] \ + [--add-dir ] +``` + +Prompt composed from: `bootstrapPromptTemplate` (fresh sessions only) + wake payload ++ session handoff note + main `promptTemplate`. Template variables: `{{agent.id}}`, +`{{agent.name}}`, `{{context.wakeReason}}`, etc. + +### Task Checkout (Atomic Locking) + +**File:** `server/src/services/heartbeat.ts` (lines 3756-4010) + +Issues have `execution_run_id` column as soft lock. Uses PostgreSQL row-level locking: + +```sql +SELECT id FROM issues WHERE id = $1 AND company_id = $2 FOR UPDATE +``` + +Then conditional update: +```sql +UPDATE issues SET execution_run_id = $claimed_id +WHERE id = $issue_id AND (execution_run_id IS NULL OR execution_run_id = $claimed_id) +``` + +When same agent has running run → coalesce (merge context). +When different agent → defer (status `deferred_issue_execution`), promoted when original completes. 
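To make the claim semantics concrete, here is a minimal in-memory sketch of the conditional UPDATE plus the coalesce/defer branch. It is not Paperclip's code: a single-threaded `Map` stands in for the PostgreSQL row lock taken by `SELECT ... FOR UPDATE`, and `checkoutIssue`, `releaseIssue`, and the `executionAgentId` field (used here instead of joining against `heartbeat_runs`) are hypothetical names.

```typescript
// Hypothetical in-memory model of the soft lock on issues.
// Single-threaded JS stands in for PostgreSQL's row-level lock.

type CheckoutOutcome = "claimed" | "coalesced" | "deferred";

interface IssueRow {
  id: string;
  executionRunId: string | null; // soft lock: the run that owns the issue
  executionAgentId: string | null; // hypothetical: agent of that run (Paperclip would look this up)
}

const issues = new Map<string, IssueRow>();

function checkoutIssue(issueId: string, runId: string, agentId: string): CheckoutOutcome {
  const row = issues.get(issueId);
  if (!row) throw new Error(`unknown issue ${issueId}`);
  // Mirrors: WHERE execution_run_id IS NULL OR execution_run_id = $claimed_id
  if (row.executionRunId === null || row.executionRunId === runId) {
    row.executionRunId = runId;
    row.executionAgentId = agentId;
    return "claimed";
  }
  // Lock held by another run: same agent merges context, a different agent waits.
  return row.executionAgentId === agentId ? "coalesced" : "deferred";
}

// Releasing only succeeds for the owning run; promotion of deferred work is not modeled.
function releaseIssue(issueId: string, runId: string): void {
  const row = issues.get(issueId);
  if (row && row.executionRunId === runId) {
    row.executionRunId = null;
    row.executionAgentId = null;
  }
}
```

The compare-and-set guard is the essential part: a claim succeeds only when the lock is free or already held by the same run, so repeating a claim is idempotent. A file-based variant for Agent Factory would apply the same check to a lock file before claiming a task.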
+ +### Budget Enforcement + +**File:** `server/src/services/budgets.ts` + +Schema: +``` +budget_policies: scope_type (company|agent|project), scope_id, metric (billed_cents), + window_kind (calendar_month_utc|lifetime), amount (cents), warn_percent (80), + hard_stop_enabled, notify_enabled +``` + +Flow after each run: +1. Load active policies for company/agent/project +2. `SELECT SUM(cost_cents) FROM cost_events` filtered by window +3. If >= soft threshold → create `budget_incidents` (type soft) +4. If >= amount AND hard_stop → `pauseScopeForBudget()` → `UPDATE agents SET status='paused'` + → `cancelBudgetScopeWork()` → SIGTERM → SIGKILL (with graceSec) + +Pre-run check: `getInvocationBlock()` only checks `paused` flag, not live budget sum. + +### Skills System + +Skills injected as symlinked tmpdir per run: +```typescript +async function buildSkillsDir(config) { + const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-skills-")); + const target = path.join(tmp, ".claude", "skills"); + await fs.mkdir(target, { recursive: true }); + for (const entry of availableEntries) { + if (!desiredNames.has(entry.key)) continue; + await fs.symlink(entry.source, path.join(target, entry.runtimeName)); + } + return tmp; // Passed as: claude --add-dir +} +``` + +Company skills stored in DB: `company_skills` table with `markdown` content, +`source_type` (github|url|local_path|skills_sh), `file_inventory`, `trust_level`. + +### Session Persistence + +`agent_task_sessions` table: unique on `(companyId, agentId, adapterType, taskKey)`. +- taskKey = issueId (for issue-scoped) or `"__heartbeat__"` (timer-only) +- sessionParamsJson = adapter-specific (Claude stores `{ sessionId, cwd }`) +- Upserted after each run completion + +Session compaction: rotate after 200 runs, 2M raw input tokens, or 72h age. +Claude adapter: `nativeContextManagement: "confirmed"` → compaction disabled +(Claude manages its own context window). 
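The post-hoc budget flow described earlier in this section (sum `cost_cents` inside the policy window, then compare against the soft and hard thresholds) can be sketched as a pure function. This is an illustrative approximation, not `evaluateCostEvent()` itself: the type and function names are hypothetical, only the `calendar_month_utc` window is modeled, and the incident and pause side effects are reduced to a returned action.

```typescript
// Hypothetical sketch of post-hoc budget evaluation; not Paperclip's code.

type BudgetAction = "ok" | "soft_incident" | "hard_stop";

interface BudgetPolicy {
  amountCents: number;
  warnPercent: number; // 80 by default per the schema above
  hardStopEnabled: boolean;
}

interface CostEvent {
  costCents: number;
  occurredAtMs: number;
}

// Step 2: sum costs inside the window. Steps 3-4: compare with thresholds.
function evaluateBudget(
  policy: BudgetPolicy,
  events: CostEvent[],
  windowStartMs: number,
): { spentCents: number; action: BudgetAction } {
  const spentCents = events
    .filter((e) => e.occurredAtMs >= windowStartMs)
    .reduce((sum, e) => sum + e.costCents, 0);
  // Hard stop takes precedence; in Paperclip this would pause the scope and cancel work.
  if (spentCents >= policy.amountCents && policy.hardStopEnabled) {
    return { spentCents, action: "hard_stop" };
  }
  // Integer comparison avoids float rounding: spent / amount >= warnPercent / 100.
  if (spentCents * 100 >= policy.amountCents * policy.warnPercent) {
    return { spentCents, action: "soft_incident" };
  }
  return { spentCents, action: "ok" };
}

// Start of the current calendar_month_utc window; a lifetime window would use 0.
function calendarMonthUtcStart(nowMs: number): number {
  const d = new Date(nowMs);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
}
```

Because the check runs after each run, a single expensive run can overshoot the limit before the pause lands; that matches the "post-hoc, not atomic" note above.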
+ +### Org Chart + +Just `agents.reportsTo` self-referential FK. `agents.role` text field. +Rendered as SVG server-side (5 visual styles). No separate table. + +### Database Schema + +PostgreSQL via Drizzle ORM. 55 migrations. Key tables: +- `companies` — tenant root, status, budget +- `agents` — adapter_type, adapter_config (jsonb), runtime_config (jsonb), reports_to, status, budget +- `goals` — self-referencing parent_id, level (company/project/task), owner_agent_id +- `issues` — FK to goals/projects/agents, execution_run_id (soft lock), parent_id +- `heartbeat_runs` — status, context_snapshot (jsonb), session_id, process_pid, usage_json +- `agent_wakeup_requests` — wake queue with status enum +- `agent_task_sessions` — per-(agent, adapter, taskKey) session state +- `budget_policies` / `budget_incidents` / `cost_events` — cost control +- `company_skills` — skill definitions with markdown content +- `approvals` — human approval requests +- `routines` — scheduled workflows with cron expressions + +### Agent Configuration Format + +```json +{ + "adapterConfig": { + "command": "claude", + "model": "claude-opus-4-5", + "cwd": "/path/to/project", + "promptTemplate": "You are agent {{agent.name}}...", + "instructionsFilePath": "/path/to/AGENTS.md", + "dangerouslySkipPermissions": true, + "maxTurnsPerRun": 0, + "timeoutSec": 0, + "graceSec": 20, + "skills": ["paperclipai/paperclip/mcp-server"] + }, + "runtimeConfig": { + "heartbeat": { + "enabled": true, + "intervalSec": 300, + "wakeOnDemand": true, + "maxConcurrentRuns": 1 + } + } +} +``` + +--- + +## OpenClaw Implementation Details + +### Gateway + +**File:** `src/gateway/server.impl.ts` + +WebSocket server on port 18789. 
Flat dispatch table: +```typescript +const coreGatewayHandlers: Record = { + ...connectHandlers, ...chatHandlers, ...cronHandlers, + ...skillsHandlers, ...sessionsHandlers, ...agentHandlers, + ...channelsHandlers, ...modelsHandlers, // 28 handler groups +} +``` + +Auth: roles (`operator` | `node`), operator scopes +(`admin`, `read`, `write`, `approvals`, `pairing`). + +### Skills System + +**Files:** `src/agents/skills/workspace.ts`, `skill-contract.ts` + +Skill = directory with SKILL.md. Frontmatter parsed for metadata. + +Loading limits: +- Max 300 candidates per root +- Max 200 loaded per source +- Max 150 in prompt +- Max 30,000 chars in prompt +- Max 256 KB per skill file + +Prompt format (XML): +```xml + + + github + ... + ~/.openclaw/workspace/skills/github/SKILL.md + + +``` + +Path compaction: home dir → `~` (saves 5-6 tokens per path). + +Skill metadata fields: `always`, `skillKey`, `emoji`, `homepage`, `os`, +`requires` (bins, anyBins, env, config), `install` specs (brew, node, go, uv, download). + +ClawHub integration for remote skill registry (search, install, update). + +### Memory System + +**Files:** `packages/memory-host-sdk/` + +Two backends: +- `builtin` — SQLite + sqlite-vec extension for vector search +- `qmd` — External QuickMemory Daemon process + +Embedding providers: Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local (node-llama). + +Interface: +```typescript +interface MemorySearchManager { + search(query, opts?: { maxResults?, minScore?, sessionKey? }): Promise + readFile(params): Promise<{ text, path }> + status(): MemoryProviderStatus + sync?(params?): Promise +} +``` + +Session transcripts indexable into memory backend. MEMORY.md / memory.md as default +memory file convention. + +### HEARTBEAT Mechanism + +**File:** `src/auto-reply/heartbeat.ts` + +Default prompt: +``` +"Read HEARTBEAT.md if it exists. Follow it strictly. Do not infer or repeat +old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK." 
+``` + +HEARTBEAT.md task format: +```yaml +tasks: + - name: email-check + interval: 30m + prompt: "Check for urgent unread emails" +``` + +Key functions: +- `isHeartbeatContentEffectivelyEmpty()` — skips API calls when file has only + headers/empty items. Saves significant cost. +- `parseHeartbeatTasks()` — parses YAML tasks block +- `isTaskDue()` — checks intervals against last-run timestamps +- `stripHeartbeatToken()` — strips HEARTBEAT_OK from responses; responses + under `ackMaxChars` (300) suppressed from chat + +**HeartbeatRunner** (`infra/heartbeat-runner.ts`): +- Per-agent intervals (default 30m) +- `HeartbeatAgentState` tracks lastRunMs, nextDueMs, intervalMs +- On fire: reads HEARTBEAT.md, builds prompt, dispatches inbound message + +### Cron Service + +**File:** `src/cron/service.ts` + +Three schedule types: +- `{ kind: "at"; at: string }` — one-shot +- `{ kind: "every"; everyMs: number }` — interval +- `{ kind: "cron"; expr: string; tz?: string; staggerMs?: number }` — cron expression + +Two job payload types: +- `systemEvent` — injects text into existing session (needs attention available) +- `agentTurn` — fires full agent turn (true background autonomy) + +Session targets: `"main" | "isolated" | "current" | "session:"`. +Isolated gets own session key with freshness/rollover logic. + +Startup catchup: runs up to 5 missed jobs immediately, staggers rest (5s gap). +Failure alerts after N consecutive errors, 1h cooldown. + +### Multi-Agent Routing + +Session key format: `agent::` +Type detection via: `isCronSessionKey()`, `isSubagentSessionKey()`, `isAcpSessionKey()` + +Per-agent isolation: own workspace, session store, skill set, heartbeat config, model config. + +Subagent spawning: ACP-based, session depth tracked in keys, reactivation support. 
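The due-time and emptiness checks listed under the HEARTBEAT mechanism (`isTaskDue()`, `isHeartbeatContentEffectivelyEmpty()`) are the logic a generated heartbeat script most needs. The sketch below is a guess at that logic, not OpenClaw's implementation: the `s/m/h/d` interval units, the "never ran means due" rule, and the emptiness heuristic (blank lines, headings, bare list markers) are all assumptions.

```typescript
// Hypothetical re-creation of HEARTBEAT.md helpers; bodies are assumptions.

interface HeartbeatTask {
  name: string;
  interval: string; // e.g. "30m", "1h"
  prompt: string;
}

const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Parse "30m" / "1h" style intervals; the supported unit set is an assumption.
function parseIntervalMs(interval: string): number {
  const match = /^(\d+)\s*([smhd])$/.exec(interval.trim());
  if (!match) throw new Error(`unsupported interval: ${interval}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Due if the task never ran, or if its interval has elapsed since the last run.
function isTaskDue(task: HeartbeatTask, lastRunMs: number | undefined, nowMs: number): boolean {
  if (lastRunMs === undefined) return true;
  return nowMs - lastRunMs >= parseIntervalMs(task.interval);
}

// Empty when every line is blank, a markdown heading, or a bare list marker
// ("-", "*", "- [ ]"), so the heartbeat can skip the API call entirely.
function isHeartbeatContentEffectivelyEmpty(content: string): boolean {
  return content.split("\n").every((raw) => {
    const line = raw.trim();
    return line === "" || line.startsWith("#") || /^([-*]|[-*] \[[ xX]?\])$/.test(line);
  });
}
```

In a generated script, the emptiness check would run first and the per-task due check second, so an untouched template never costs an API call.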
+ +### Channel Adapter Interface + +**File:** `src/channels/plugins/types.plugin.ts` + +```typescript +type ChannelPlugin = { + id: ChannelId; + meta: ChannelMeta; + capabilities: ChannelCapabilities; + outbound?: ChannelOutboundAdapter; + messaging?: ChannelMessagingAdapter; + lifecycle?: ChannelLifecycleAdapter; + heartbeat?: ChannelHeartbeatAdapter; + security?: ChannelSecurityAdapter; + agentTools?: ChannelAgentToolFactory; + streaming?: ChannelStreamingAdapter; + threading?: ChannelThreadingAdapter; + // ~15 optional adapter slots total +} +``` + +Restart policy: exponential backoff (5s initial, 5min max, factor 2, jitter 0.1, +max 10 attempts). + +### Security + +- Exec approval: `ExecApprovalManager` with promise-based flow, `allow-once` vs + `allow-always`, 15s grace timeout +- Tool policy: `pickSandboxToolPolicy()` per sandbox config +- Security audit: comprehensive checks (gateway auth, channel config, plugin trust, + exec surfaces, filesystem ACLs) +- Auth rate limiting with browser-specific stricter limits +- External content guard: tracks provenance, `allowUnsafeExternalContent` flag + +### Agent Configuration + +```yaml +agents: + defaults: + model: + primary: "anthropic/claude-opus-4-5" + fallbacks: ["anthropic/claude-sonnet-4-5"] + heartbeat: + enabled: true + every: "30m" + prompt: "Check HEARTBEAT.md" + ackMaxChars: 300 + skills: + limits: + maxSkillsInPrompt: 150 + maxSkillsPromptChars: 30000 + list: + - id: "myagent" + workspace: "~/workspace" + model: + primary: "anthropic/claude-sonnet-4-5" + heartbeat: + every: "1h" + skills: + filter: ["github", "slack"] +``` + +Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, +HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md. 
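The channel restart policy noted under Channel Adapter Interface (5s initial delay, factor 2, 5 minute cap, 0.1 jitter, at most 10 attempts) translates into a small delay function. This is a hedged sketch: `restartDelayMs` and `BackoffPolicy` are hypothetical names, and applying jitter symmetrically after the cap is an assumption about how OpenClaw combines the two.

```typescript
// Hypothetical sketch of capped exponential backoff with jitter.

interface BackoffPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number; // fraction of the delay used as +/- spread
  maxAttempts: number;
}

// Values as reported for OpenClaw's channel restart policy.
const CHANNEL_RESTART_POLICY: BackoffPolicy = {
  initialMs: 5_000,
  maxMs: 300_000,
  factor: 2,
  jitter: 0.1,
  maxAttempts: 10,
};

// Delay before restart `attempt` (1-based), or null once attempts are exhausted.
// Assumption: jitter is applied after capping, so delays may exceed maxMs by up to 10%.
function restartDelayMs(
  attempt: number,
  policy: BackoffPolicy = CHANNEL_RESTART_POLICY,
  random: () => number = Math.random,
): number | null {
  if (attempt < 1 || attempt > policy.maxAttempts) return null;
  const base = Math.min(policy.initialMs * Math.pow(policy.factor, attempt - 1), policy.maxMs);
  const spread = base * policy.jitter;
  // random() in [0, 1) maps to an offset in [-spread, +spread).
  return Math.round(base + (random() * 2 - 1) * spread);
}
```

Injecting `random` keeps the function deterministic in tests. The jitter itself exists so that many channels failing at once do not all reconnect in lockstep.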
+ +### Plugin Hooks + +29 lifecycle hook points: +`before_model_resolve`, `before_prompt_build`, `before_agent_start`, +`before_agent_reply`, `llm_input`, `llm_output`, `agent_end`, +`inbound_claim`, `message_received`, `message_sending`, `message_sent`, +`before_tool_call`, `after_tool_call`, `session_start`, `session_end`, +`subagent_spawning`, `subagent_delivery_target`, `gateway_start`, +`gateway_stop`, `before_dispatch`, `reply_dispatch`, `before_install`, etc. + +--- + +## Patterns Worth Replicating in Agent Factory + +### From Paperclip + +1. **Heartbeat as context injection** — Each beat starts clean, loads curated context + packet. Maps to: `/schedule` trigger + CLAUDE.md + memory files loaded per session. + +2. **Adapter interface** — Clean `execute(ctx)` pattern. Maps to: our agent files + are already adapter-like (model, tools, prompt per agent). + +3. **Budget as governance primitive** — Post-hoc cost tracking with pause thresholds. + Maps to: hook that reads `/usage` after each run, logs to cost-events file, + alerts when threshold crossed. + +4. **Task checkout via file locking** — Paperclip uses PostgreSQL. We can use + file-based locking (write `task.lock` with agent name, check before claiming). + +5. **Session persistence via taskKey** — Different tasks get different sessions. + Maps to: `--resume` with task-specific session IDs. + +### From OpenClaw + +6. **HEARTBEAT.md with task parsing** — YAML tasks block with intervals and + due-time checking. Maps directly to our generated HEARTBEAT.md files. + +7. **Emptiness detection** — Skip API calls when heartbeat file is effectively empty. + Critical cost saver. Include in generated heartbeat scripts. + +8. **Skill prompt XML format** — Standardized skill discovery in system prompt. + Our skills already use this via Claude Code's built-in mechanism. + +9. **3-tier memory** — SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold). + Maps to: templates we generate in the user's project. + +10. 
**Startup catchup with stagger** — Run missed jobs on restart, but don't + thundering-herd. Include in generated automation scripts. + +### Unique to Agent Factory + +11. **Guided construction** — Neither tool helps you BUILD the system. We do. +12. **Progressive complexity** — Start with 1 agent, grow to full org. +13. **Domain templates** — Not just researcher→writer→reviewer. Monitoring, + code review, data processing, research synthesis. +14. **Claude Code-native** — No PostgreSQL, no Node.js server, no Docker required. + Just agents, skills, hooks, settings.json, /schedule. diff --git a/.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md b/.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md new file mode 100644 index 0000000..b245284 --- /dev/null +++ b/.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md @@ -0,0 +1,215 @@ +--- +type: ultraresearch-brief +created: 2026-04-11 +question: "Research OpenClaw and Paperclip agent frameworks to find inspiration and concrete value proposition for agent-builder plugin" +confidence: 0.92 +dimensions: 7 +mcp_servers_used: [] +local_agents_used: [Explore] +external_agents_used: [WebFetch, WebSearch] +source_code_analyzed: [paperclip, openclaw] +target_audience: "Claude Code users who know the primitives but need help composing agent systems" +--- + +# OpenClaw & Paperclip Agent Framework Research + +> Generated by ultraresearch-local on 2026-04-11 + +## Research Question + +What features, architecture patterns, and capabilities do OpenClaw and Paperclip offer, and what can we learn from them to create a Claude Code plugin that makes it easy for anyone to build genuinely useful, self-running agent systems? + +## Executive Summary + +OpenClaw (354k stars) excels at individual agent capability — 23+ messaging channels, 5400+ skills, proactive agent patterns with self-improvement guardrails, and 3-tier memory systems. 
Paperclip (51k stars) excels at organizational coordination — heartbeat scheduling, goal hierarchies, budget enforcement, and governance. Neither offers guided, agentic-assisted construction of complete agent systems, which is the unique gap our plugin fills. Confidence is high for OpenClaw (verified via docs, GitHub, and existing codebase references) and medium for Paperclip (verified via docs site, GitHub, and multiple third-party articles). + +## Dimensions + +### 1. Core Capabilities -- Confidence: high + +**OpenClaw:** +- Personal AI assistant running on your own devices +- 23+ messaging channels (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, IRC, Teams, Matrix, and more) +- 100+ preconfigured AgentSkills for shell, file, and web automation +- Canvas/A2UI — agent-driven visual workspace (unique capability, no Claude Code equivalent) +- Browser control via dedicated Chrome/Chromium with CDP +- Voice capabilities with wake words (macOS/iOS) and continuous voice (Android) +- Device node system (camera, screen recording, location, notifications) +- Model-agnostic: Claude, GPT, Gemini, Ollama all supported +- Source: [GitHub](https://github.com/openclaw/openclaw), [DigitalOcean guide](https://www.digitalocean.com/resources/articles/what-is-openclaw) + +**Paperclip:** +- Orchestration platform for teams of AI agents ("If OpenClaw is an employee, Paperclip is the company") +- Agent-agnostic: supports OpenClaw, Claude Code, Codex, Cursor, bash, HTTP webhooks +- Explicitly NOT an agent framework — doesn't build agents, organizes them +- Explicitly NOT a chatbot, workflow builder, or prompt manager +- Ticket-based task management with threaded conversations +- Multi-company support with complete data isolation +- Source: [GitHub](https://github.com/paperclipai/paperclip), [paperclip.ing](https://paperclip.ing/docs) + +### 2. 
Architecture & Patterns -- Confidence: high + +**OpenClaw architecture:** +- Gateway control plane on ws://127.0.0.1:18789 +- Channel Adapters transform protocol-specific input into unified message objects +- Multi-agent routing: isolated sessions per agent, workspace, or sender +- Pi agent runtime in RPC mode with tool/block streaming +- Node.js + TypeScript, pnpm, WebSocket protocol +- Source: [GitHub README](https://github.com/openclaw/openclaw), [Medium architecture article](https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764) + +**Paperclip architecture:** +- Node.js backend + React UI + PostgreSQL +- Company-as-runtime model: agents modeled as employees +- Heartbeat scheduler fires agent execution at defined intervals +- Each beat is stateless — state lives in external storage (Postgres) +- Atomic operations for task checkout and budget enforcement +- Source: [GitHub](https://github.com/paperclipai/paperclip), [Towards AI article](https://pub.towardsai.net/paperclip-the-open-source-operating-system-for-zero-human-companies-2c16f3f22182) + +### 3. Self-Learning & Autonomy -- Confidence: medium + +**OpenClaw — Proactive Agent Skill (most sophisticated pattern found):** +- 3-tier memory: SESSION-STATE.md (working memory), memory/YYYY-MM-DD.md (daily capture), MEMORY.md (curated long-term) +- WAL Protocol (Write-Ahead Logging): write important details BEFORE responding +- Working Buffer Protocol: captures exchanges in "danger zone" (60%+ context) before compaction +- Compaction Recovery: reads buffer, session state, daily notes, then searches all sources +- Self-improvement guardrails: + - ADL (Anti-Drift Limits): no fake intelligence, no unverifiable mods, no novelty over stability + - VFM (Value-First Modification): score changes on frequency, failure reduction, burden reduction, cost savings. 
Only implement if score > 50 + - Priority: Stability > Explainability > Reusability > Scalability > Novelty +- Two cron types: `systemEvent` (needs attention) vs `isolated agentTurn` (true background autonomy) +- Self-healing: try 5-10 approaches before asking for help +- Source: [Proactive Agent Skill on GitHub](https://github.com/openclaw/skills/blob/main/skills/halthelobster/proactive-agent/SKILL.md) + +**Paperclip:** +- Heartbeat model with context injection (Memento Man mental model) +- Memory doesn't live in agent session — external storage maintains continuity +- Context packets: curated payloads with memory state, task queue, recent events, agent config +- No explicit self-learning mechanism documented, but rich audit trail enables pattern detection +- Skills as markdown instruction files, installable via GitHub URLs +- Source: [MindStudio heartbeat article](https://www.mindstudio.ai/blog/heartbeat-pattern-paperclip-ai-agents-24-7) + +### 4. User Experience & Onboarding -- Confidence: high + +**OpenClaw:** +- `npm install -g openclaw@latest && openclaw onboard --install-daemon` +- Requires Node 24 (recommended) or 22.16+ +- Has "Cowork" variant specifically because core is too hard for non-developers +- Doctor CLI for troubleshooting and migrations +- Pairing mode for security (unknown senders get pairing codes) +- 3 release channels: stable, beta, dev +- Source: [GitHub README](https://github.com/openclaw/openclaw) + +**Paperclip:** +- `npx paperclipai onboard --yes` — quick start +- React dashboard for agent management +- Mobile-friendly interface +- Requires Node 20+ and pnpm 9.15+ +- No guided construction — you configure agents manually +- Source: [GitHub README](https://github.com/paperclipai/paperclip) + +### 5. 
Multi-Agent Orchestration -- Confidence: high + +**OpenClaw:** +- Session tools for agent-to-agent communication: sessions_list, sessions_history, sessions_send +- Reply-back mechanism for async coordination +- Route channels/accounts/peers to isolated agents with dedicated workspaces +- No organizational structure (flat, peer-to-peer) +- Source: [GitHub README](https://github.com/openclaw/openclaw) + +**Paperclip:** +- Org chart with hierarchies, roles, and reporting lines +- Cascading delegation — work flows up and down org chart automatically +- Goal-aware task execution with full ancestry +- Atomic task checkout prevents double-work +- Cross-team requests delegate to best agent +- Human as "board of directors" with override authority +- Source: [paperclip.ing/docs](https://paperclip.ing/docs), [Medium article](https://medium.com/@creativeaininja/paperclip-the-open-source-platform-turning-ai-agents-into-an-actual-company-7348015c5bf7) + +### 6. Extensibility & Integrations -- Confidence: medium + +**OpenClaw:** +- Skills marketplace with 5400+ community skills (26% flagged with vulnerabilities) +- Skills installed via URL with auto-updating +- Plugin system and channel adapter architecture +- Bundled/managed/workspace skill tiers +- Source: [VoltAgent awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills) + +**Paperclip:** +- Plugin ecosystem (awesome-paperclip curated list) +- Runtime skill injection without retraining +- Import/export of company templates +- Skills as markdown files +- Source: [GitHub README](https://github.com/paperclipai/paperclip) + +### 7. 
Deployment & Operations -- Confidence: high + +**OpenClaw:** +- Docker-based containment (agent runs inside container — blast radius limited) +- Tailscale Serve/Funnel for remote access +- SSH tunnels with token/password auth +- Nix declarative configuration +- Always-on via daemon install +- Source: [GitHub README](https://github.com/openclaw/openclaw) + +**Paperclip:** +- Self-hosted, MIT, no mandatory accounts +- Local-first: embedded Node.js + Postgres +- Multi-company isolation on single infrastructure +- Per-agent monthly budgets with automatic throttling +- Immutable audit logs with full tool-call tracing +- Config versioning with rollback +- Source: [paperclip.ing/docs](https://paperclip.ing/docs) + +## Synthesis + +The critical insight is that OpenClaw and Paperclip operate at **different layers of the same stack**: + +- **OpenClaw** = the agent runtime layer (what an individual agent can do) +- **Paperclip** = the orchestration layer (how agents coordinate as a team) +- **Agent Factory** = the construction layer (how you build and configure both) + +Neither tool offers what our plugin does: a guided, interview-driven, AI-assisted workflow that generates a complete agent system from scratch. OpenClaw's "Cowork" variant exists precisely because the core tool is too hard for non-developers — this validates that there's demand for lower-barrier agent creation. Paperclip's manual configuration model means every agent needs hand-crafting before it can be "hired." + +The most powerful patterns to incorporate: + +1. **From OpenClaw:** 3-tier memory with WAL protocol, proactive agent pattern with self-improvement guardrails (ADL/VFM), isolated agentTurn for background autonomy +2. **From Paperclip:** Heartbeat with context injection, goal hierarchy, budget enforcement, governance model ("autonomy is a privilege you grant") +3. 
**Unique to us:** Progressive complexity (1 agent → full org), agentically-guided construction, domain-specific templates, Claude Code-native (no external infrastructure) + +The security philosophies are complementary, not conflicting: OpenClaw uses containment (Docker — limit blast radius), our plugin uses prevention (hooks/deny — stop before it happens). Both should be available. + +## Open Questions + +- **Canvas/A2UI details** — What does OpenClaw's visual workspace actually generate? HTML? Native UI? Understanding this clarifies whether it's worth pursuing for Claude Code. +- **Paperclip self-learning implementation** — The heartbeat + audit trail creates rich data, but no explicit feedback loop is documented. Is this a planned feature or deliberately excluded? +- **OpenClaw skill security** — 26% of community skills flagged with vulnerabilities. What vetting process exists, and should we build one? +- **Cowork UX** — What does OpenClaw's simplified non-developer experience look like? This directly informs our target audience. + +## Recommendation + +Build Agent Factory as a 5-phase evolution: + +1. **v0.2:** Fix existing gaps (3 missing commands, deployment-advisor, managed-agents skill, domain templates) +2. **v0.3:** Incorporate OpenClaw patterns (3-tier memory, WAL, proactive agent, isolated cron) +3. **v0.4:** Incorporate Paperclip patterns (heartbeat, goal hierarchy, budgets, governance, org-chart) +4. **v0.5:** Self-learning systems (feedback loops, performance scoring, pipeline optimization) +5. **v1.0:** Full integration (MCP integrations, Docker deployment, templates marketplace, import/export) + +The key differentiator throughout: every feature is accessible through guided, AI-assisted construction with progressive complexity — start simple, grow as needed. 
+ +## Sources + +| # | Source | Type | Quality | Used in | +|---|--------|------|---------|---------| +| 1 | [OpenClaw GitHub](https://github.com/openclaw/openclaw) | official | high | 1,2,4,5,7 | +| 2 | [Paperclip GitHub](https://github.com/paperclipai/paperclip) | official | high | 1,2,4,5,6,7 | +| 3 | [OpenClaw Docs](https://docs.openclaw.ai) | official | high | 2,5 | +| 4 | [Paperclip Docs](https://paperclip.ing/docs) | official | high | 2,3,5,7 | +| 5 | [Proactive Agent Skill](https://github.com/openclaw/skills/blob/main/skills/halthelobster/proactive-agent/SKILL.md) | official | high | 3 | +| 6 | [MindStudio Heartbeat Article](https://www.mindstudio.ai/blog/heartbeat-pattern-paperclip-ai-agents-24-7) | community | medium | 3 | +| 7 | [DigitalOcean: What is OpenClaw](https://www.digitalocean.com/resources/articles/what-is-openclaw) | community | medium | 1 | +| 8 | [Medium: How OpenClaw Works](https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764) | community | medium | 2 | +| 9 | [Medium: Paperclip as Company](https://medium.com/@creativeaininja/paperclip-the-open-source-platform-turning-ai-agents-into-an-actual-company-7348015c5bf7) | community | medium | 5 | +| 10 | [awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills) | community | medium | 6 | +| 11 | `skills/agent-system-design/references/feature-map.md` | codebase | high | 1 | +| 12 | `skills/agent-system-design/references/security-patterns.md` | codebase | high | 7 | diff --git a/.claude/ultraplan-spec-2026-04-11-agent-factory.md b/.claude/ultraplan-spec-2026-04-11-agent-factory.md new file mode 100644 index 0000000..8d9adb8 --- /dev/null +++ b/.claude/ultraplan-spec-2026-04-11-agent-factory.md @@ -0,0 +1,106 @@ +# Task: Agent Factory — Full Vision Realization + +## Goal + +Transform the existing agent-builder plugin into Agent Factory: a comprehensive, +guided system for building autonomous agent systems using 
Claude Code. The plugin +should take users from zero to a fully operational multi-agent system through a +7-phase guided workflow, incorporating best patterns from OpenClaw (individual +agent capability) and Paperclip (organizational coordination). + +Success = all 5 development phases completed, delivering: foundational commands +and agents, OpenClaw-inspired memory/autonomy patterns, Paperclip-inspired +orchestration/governance patterns, self-learning systems, and full integration +with import/export and bundled templates. + +## Non-Goals + +- Building a central registry or marketplace with community features +- Replacing OpenClaw or Paperclip — Agent Factory is the construction layer +- Supporting non-Claude-Code agent runtimes +- Building a web UI or dashboard +- Vector/embedding-based memory (OpenClaw uses sqlite-vec — we stay file-based) +- Canvas/A2UI equivalent (confirmed as just a static file server in OpenClaw) + +## Constraints + +- macOS Intel (bash 3.2 compatibility for all generated scripts) +- No external infrastructure required for core functionality (PostgreSQL, Node.js server) +- Claude Code native: agents, skills, hooks, settings.json, /schedule +- Plugin structure must follow Claude Code plugin conventions +- All templates are plain files with `{{PLACEHOLDER}}` variables, replaced via + string operations (no template engine dependency) +- Generated hook scripts must be bash 3.2 compatible +- Agent YAML frontmatter must be valid +- Never write files outside the user's project directory +- `${CLAUDE_PLUGIN_ROOT}` for all intra-plugin paths + +## Preferences + +- TypeScript for any scripting within the plugin itself +- Plain .md/.sh templates with placeholder comments +- Conventional Commits for all checkpoint commits +- Progressive complexity: 1 agent → full org +- Domain-specific pipeline templates (not just generic) +- Multi-target deployment from start: /schedule, Docker, systemd + +## Non-Functional Requirements + +- Budget tracking via
Anthropic API integration (not /usage parsing) +- Import/export of complete agent systems (tarball format) +- 5-10 bundled domain templates as starting points +- All generated agents must include verification commands +- Governance patterns must include human oversight gates +- Self-improvement must have guardrails (ADL/VFM-inspired) + +## Success Criteria + +- Plugin installs and `/agent-factory build` runs the guided workflow +- All 4 commands work: `/agent-factory build`, `/agent-factory deploy`, + `/agent-factory evaluate`, `/agent-factory status` +- `deployment-advisor` agent provides deployment recommendations +- `managed-agents` skill triggers on agent-related questions +- Generated agent systems include 3-tier memory templates +- Generated heartbeat files parse correctly with interval tracking +- Budget hooks log costs and alert on threshold +- Import/export round-trips: export → import in new project → system works +- At least 5 bundled domain templates available +- All generated bash scripts pass `bash -n` syntax check on bash 3.2 + +## Prior Attempts + +- v0.1.0 (current): Initial plugin with `/agent-factory build` command, + builder agent, 2 skills, basic templates. Missing: deploy, evaluate, + status commands. deployment-advisor agent stubbed but not implemented. + managed-agents skill empty. + +## Open Questions + +- Anthropic billing API: exact endpoint and auth mechanism needs verification + before implementation. [ASSUMPTION: API exists and is accessible with API key] +- /schedule API stability: is the trigger interface stable enough to build on? + [ASSUMPTION: yes, based on current Claude Code docs] +- Docker deployment: should we generate Dockerfile or docker-compose.yml or both? + [ASSUMPTION: docker-compose.yml with Dockerfile, matching Paperclip's approach] + +## Research Context + +Two research briefs inform this plan: +1. 
**ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md** (confidence: 0.92) + — Feature comparison, architecture, patterns, synthesis +2. **source-code-analysis-2026-04-11.md** — Implementation-level details from + actual source code of both projects + +Key patterns to replicate (from research): +- OpenClaw: 3-tier memory, WAL protocol, Working Buffer Protocol, proactive agent + with ADL/VFM guardrails, isolated agentTurn cron, emptiness detection +- Paperclip: Heartbeat with context injection, goal hierarchy (simple parent_id), + budget enforcement (post-hoc), task checkout via file locking, adapter interface, + org chart (reportsTo FK) + +## Metadata + +- **Created:** 2026-04-11 +- **Mode:** interview +- **Source:** ultraplan interview +- **Research:** 2 briefs (openclaw-paperclip frameworks + source code analysis)