docs(plans): Agent Factory ultraplan + execution guide

27-step plan across 8 sessions in 3 waves for transforming agent-builder into Agent Factory v1.0.0. Includes research briefs, spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 07:35:29 +02:00
commit 7419d4283d
parent 075383990f
5 changed files with 2294 additions and 0 deletions
--- a/.claude/ultraplan-spec-2026-04-11-agent-factory.md
+++ b/.claude/ultraplan-spec-2026-04-11-agent-factory.md
@@ -0,0 +1,106 @@
+# Task: Agent Factory — Full Vision Realization
+
+## Goal
+
+Transform the existing agent-builder plugin into Agent Factory: a comprehensive,
+guided system for building autonomous agent systems using Claude Code. The plugin
+should take users from zero to a fully operational multi-agent system through a
+7-phase guided workflow, incorporating best patterns from OpenClaw (individual
+agent capability) and Paperclip (organizational coordination).
+
+Success = all 5 development phases completed, delivering: foundational commands
+and agents, OpenClaw-inspired memory/autonomy patterns, Paperclip-inspired
+orchestration/governance patterns, self-learning systems, and full integration
+with import/export and bundled templates.
+
+## Non-Goals
+
+- Building a central registry or marketplace with community features
+- Replacing OpenClaw or Paperclip — Agent Factory is the construction layer
+- Supporting non-Claude-Code agent runtimes
+- Building a web UI or dashboard
+- Vector/embedding-based memory (OpenClaw uses sqlite-vec — we stay file-based)
+- Canvas/A2UI equivalent (confirmed as just a static file server in OpenClaw)
+
+## Constraints
+
+- macOS Intel (bash 3.2 compatibility for all generated scripts)
+- No external infrastructure (e.g., PostgreSQL or a Node.js server) required for core functionality
+- Claude Code native: agents, skills, hooks, settings.json, /schedule
+- Plugin structure must follow Claude Code plugin conventions
+- All templates are plain files with `{{PLACEHOLDER}}` variables, replaced via
+  string operations (no template engine dependency)
+- Generated hook scripts must be bash 3.2 compatible
+- Agent YAML frontmatter must be valid
+- Never write files outside the user's project directory
+- `${CLAUDE_PLUGIN_ROOT}` for all intra-plugin paths
+
+## Preferences
+
+- TypeScript for any scripting within the plugin itself
+- Plain .md/.sh templates with placeholder comments
+- Conventional Commits for all checkpoint commits
+- Progressive complexity: 1 agent → full org
+- Domain-specific pipeline templates (not just generic)
+- Multi-target deployment from start: /schedule, Docker, systemd
+
+## Non-Functional Requirements
+
+- Budget tracking via Anthropic API integration (not /usage parsing)
+- Import/export of complete agent systems (tarball format)
+- 5-10 bundled domain templates as starting points
+- All generated agents must include verification commands
+- Governance patterns must include human oversight gates
+- Self-improvement must have guardrails (ADL/VFM-inspired)
+
+## Success Criteria
+
+- Plugin installs and `/agent-factory build` runs the guided workflow
+- All 4 commands work: `/agent-factory build`, `/agent-factory deploy`,
+  `/agent-factory evaluate`, `/agent-factory status`
+- `deployment-advisor` agent provides deployment recommendations
+- `managed-agents` skill triggers on agent-related questions
+- Generated agent systems include 3-tier memory templates
+- Generated heartbeat files parse correctly with interval tracking
+- Budget hooks log costs and alert on threshold
+- Import/export round-trips: export → import in new project → system works
+- At least 5 bundled domain templates available
+- All generated bash scripts pass `bash -n` syntax check on bash 3.2
+
+## Prior Attempts
+
+- v0.1.0 (current): Initial plugin with `/agent-factory build` command,
+  builder agent, 2 skills, basic templates. Missing: deploy, evaluate,
+  status commands. deployment-advisor agent stubbed but not implemented.
+  managed-agents skill empty.
+
+## Open Questions
+
+- Anthropic billing API: exact endpoint and auth mechanism need verification
+  before implementation. [ASSUMPTION: API exists and is accessible with API key]
+- /schedule API stability: is the trigger interface stable enough to build on?
+  [ASSUMPTION: yes, based on current Claude Code docs]
+- Docker deployment: should we generate Dockerfile or docker-compose.yml or both?
+  [ASSUMPTION: docker-compose.yml with Dockerfile, matching Paperclip's approach]
+
+## Research Context
+
+Two research briefs inform this plan:
+1. **ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md** (confidence: 0.92)
+   — Feature comparison, architecture, patterns, synthesis
+2. **source-code-analysis-2026-04-11.md** — Implementation-level details from
+   actual source code of both projects
+
+Key patterns to replicate (from research):
+- OpenClaw: 3-tier memory, WAL protocol, Working Buffer Protocol, proactive agent
+  with ADL/VFM guardrails, isolated agentTurn cron, emptiness detection
+- Paperclip: Heartbeat with context injection, goal hierarchy (simple parent_id),
+  budget enforcement (post-hoc), task checkout via file locking, adapter interface,
+  org chart (reportsTo FK)
+
+## Metadata
+
+- **Created:** 2026-04-11
+- **Mode:** interview
+- **Source:** ultraplan interview
+- **Research:** 2 briefs (openclaw-paperclip frameworks + source code analysis)
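
Review note on the Constraints section: the spec requires `{{PLACEHOLDER}}` variables to be replaced via string operations, with no template engine. The sketch below shows one way that could look in TypeScript, the spec's preferred scripting language. The function name `renderTemplate`, the `RenderResult` shape, and the rule that placeholder names use only uppercase letters, digits, and underscores are all assumptions made for illustration; the spec does not define them.

```typescript
// Hypothetical helper, not part of the plugin: names and the placeholder
// naming rule (A-Z, 0-9, underscore) are assumptions for illustration.
interface RenderResult {
  output: string;    // template text with known placeholders substituted
  missing: string[]; // placeholder names that had no value supplied
}

function renderTemplate(
  template: string,
  values: Record<string, string>
): RenderResult {
  const missing: string[] = [];
  // Single pass over the template, so a substituted value that happens to
  // contain "{{...}}" is not expanded a second time.
  const output = template.replace(
    /\{\{([A-Z0-9_]+)\}\}/g,
    (token: string, name: string): string => {
      if (Object.prototype.hasOwnProperty.call(values, name)) {
        return values[name];
      }
      // Leave unknown placeholders in place and report each one once.
      if (missing.indexOf(name) === -1) {
        missing.push(name);
      }
      return token;
    }
  );
  return { output, missing };
}

// Example: one placeholder is filled, the other is reported as missing.
const result = renderTemplate(
  "name: {{AGENT_NAME}}\nmodel: {{MODEL}}\n",
  { AGENT_NAME: "reviewer" }
);
```

Reporting unfilled placeholders instead of silently dropping them would let a build step refuse to write a half-rendered agent file, which fits the spec's requirement that generated agent frontmatter be valid.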