agent-builder/.claude/ultraplan-spec-2026-04-11-agent-factory.md
Kjell Tore Guttormsen 7419d4283d docs(plans): Agent Factory ultraplan + execution guide
27-step plan across 8 sessions in 3 waves for transforming
agent-builder into Agent Factory v1.0.0. Includes research briefs,
spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 07:35:29 +02:00

Task: Agent Factory — Full Vision Realization

Goal

Transform the existing agent-builder plugin into Agent Factory: a comprehensive, guided system for building autonomous agent systems using Claude Code. The plugin should take users from zero to a fully operational multi-agent system through a 7-phase guided workflow, incorporating best patterns from OpenClaw (individual agent capability) and Paperclip (organizational coordination).

Success = all 5 development phases completed, delivering: foundational commands and agents, OpenClaw-inspired memory/autonomy patterns, Paperclip-inspired orchestration/governance patterns, self-learning systems, and full integration with import/export and bundled templates.

Non-Goals

  • Building a central registry or marketplace with community features
  • Replacing OpenClaw or Paperclip — Agent Factory is the construction layer
  • Supporting non-Claude-Code agent runtimes
  • Building a web UI or dashboard
  • Vector/embedding-based memory (OpenClaw uses sqlite-vec — we stay file-based)
  • Canvas/A2UI equivalent (confirmed as just a static file server in OpenClaw)

Constraints

  • macOS Intel (bash 3.2 compatibility for all generated scripts)
  • No external infrastructure required for core functionality (for example, no PostgreSQL database or Node.js server)
  • Claude Code native: agents, skills, hooks, settings.json, /schedule
  • Plugin structure must follow Claude Code plugin conventions
  • All templates are plain files with {{PLACEHOLDER}} variables, replaced via string operations (no template engine dependency)
  • Generated hook scripts must be bash 3.2 compatible
  • Agent YAML frontmatter must be valid
  • Never write files outside the user's project directory
  • Use ${CLAUDE_PLUGIN_ROOT} for all intra-plugin paths
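
The placeholder constraint above can be met with a few lines of plain string handling. The following is a minimal sketch, not the plugin's actual implementation: the function names `renderTemplate` and `findUnresolved`, and the rule that placeholder names are upper-case letters, digits, and underscores, are assumptions made for illustration.

```typescript
// Hypothetical helper: fill {{PLACEHOLDER}} variables using plain string
// operations only (no template engine).
function renderTemplate(template: string, values: Record<string, string>): string {
  let output = template;
  for (const key of Object.keys(values)) {
    // split/join replaces every occurrence and needs no regex escaping.
    output = output.split("{{" + key + "}}").join(values[key]);
  }
  return output;
}

// Hypothetical helper: list placeholders that are still unfilled, so a
// generator can refuse to write an incomplete file.
// Assumption: placeholder names match [A-Z0-9_]+.
function findUnresolved(rendered: string): string[] {
  const found: string[] = [];
  const pattern = /\{\{([A-Z0-9_]+)\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(rendered)) !== null) {
    if (found.indexOf(match[1]) === -1) {
      found.push(match[1]);
    }
  }
  return found;
}
```

A generator could call `renderTemplate` first and then treat a non-empty result from `findUnresolved` as an error. Values are substituted one key at a time, so a value that itself contains `{{...}}` text may be expanded by a later key; a real implementation would need to decide whether that is acceptable.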

Preferences

  • TypeScript for any scripting within the plugin itself
  • Clean .md/.sh templates with placeholder comments
  • Conventional Commits for all checkpoint commits
  • Progressive complexity: 1 agent → full org
  • Domain-specific pipeline templates (not just generic)
  • Multi-target deployment from start: /schedule, Docker, systemd

Non-Functional Requirements

  • Budget tracking via Anthropic API integration (not /usage parsing)
  • Import/export of complete agent systems (tarball format)
  • 5-10 bundled domain templates as starting points
  • All generated agents must include verification commands
  • Governance patterns must include human oversight gates
  • Self-improvement must have guardrails (ADL/VFM-inspired)

Success Criteria

  • Plugin installs and /agent-factory build runs the guided workflow
  • All 4 commands work: /agent-factory build, /agent-factory deploy, /agent-factory evaluate, /agent-factory status
  • deployment-advisor agent provides deployment recommendations
  • managed-agents skill triggers on agent-related questions
  • Generated agent systems include 3-tier memory templates
  • Generated heartbeat files parse correctly with interval tracking
  • Budget hooks log costs and alert on threshold
  • Import/export round-trips: export → import in new project → system works
  • At least 5 bundled domain templates available
  • All generated bash scripts pass bash -n syntax check on bash 3.2
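
The "log costs and alert on threshold" criterion can be illustrated with a small post-hoc check over logged cost entries. This is a sketch under stated assumptions: the `CostEntry` and `BudgetStatus` shapes, the per-agent limits map, and the default alert fraction of 0.8 are all hypothetical and do not come from the spec or from any Anthropic API.

```typescript
// Hypothetical shape of one logged cost entry; field names are assumptions.
interface CostEntry {
  agent: string;
  usd: number;
}

// Hypothetical result row for one agent.
interface BudgetStatus {
  agent: string;
  spentUsd: number;
  limitUsd: number;
  alert: boolean;
}

// Sum logged costs per agent, then flag every agent whose spend has reached
// the alert fraction of its limit. The check is post-hoc: it inspects costs
// that were already incurred and does not block any call.
function checkBudgets(
  entries: CostEntry[],
  limitsUsd: Record<string, number>,
  alertFraction: number = 0.8
): BudgetStatus[] {
  const spent = new Map<string, number>();
  for (const entry of entries) {
    spent.set(entry.agent, (spent.get(entry.agent) || 0) + entry.usd);
  }
  const statuses: BudgetStatus[] = [];
  for (const agent of Object.keys(limitsUsd)) {
    const spentUsd = spent.get(agent) || 0;
    const limitUsd = limitsUsd[agent];
    statuses.push({
      agent: agent,
      spentUsd: spentUsd,
      limitUsd: limitUsd,
      alert: spentUsd >= limitUsd * alertFraction,
    });
  }
  return statuses;
}
```

Agents that appear in the log but have no configured limit are ignored here; whether they should instead raise an alert is a design decision the spec leaves open.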

Prior Attempts

  • v0.1.0 (current): Initial plugin with /agent-factory build command, builder agent, 2 skills, basic templates. Missing: deploy, evaluate, status commands. deployment-advisor agent stubbed but not implemented. managed-agents skill empty.

Open Questions

  • Anthropic billing API: the exact endpoint and auth mechanism need verification before implementation. [ASSUMPTION: API exists and is accessible with API key]
  • /schedule API stability: is the trigger interface stable enough to build on? [ASSUMPTION: yes, based on current Claude Code docs]
  • Docker deployment: should we generate a Dockerfile, a docker-compose.yml, or both? [ASSUMPTION: docker-compose.yml with Dockerfile, matching Paperclip's approach]

Research Context

Two research briefs inform this plan:

  1. ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md (confidence: 0.92) — Feature comparison, architecture, patterns, synthesis
  2. source-code-analysis-2026-04-11.md — Implementation-level details from actual source code of both projects

Key patterns to replicate (from research):

  • OpenClaw: 3-tier memory, WAL protocol, Working Buffer Protocol, proactive agent with ADL/VFM guardrails, isolated agentTurn cron, emptiness detection
  • Paperclip: Heartbeat with context injection, goal hierarchy (simple parent_id), budget enforcement (post-hoc), task checkout via file locking, adapter interface, org chart (reportsTo FK)
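
The "goal hierarchy (simple parent_id)" pattern listed above can be sketched as a flat list of records that is walked upward from any goal to its root. Only the `parent_id` field comes from the research summary; the `Goal` shape, the `goalPath` function, and the cycle check are assumptions for illustration and are not taken from Paperclip's source.

```typescript
// Hypothetical goal record; only parent_id comes from the pattern named above.
interface Goal {
  id: string;
  title: string;
  parent_id: string | null;
}

// Walk parent_id links from a goal up to its root and return the titles
// root-first. An unknown id yields an empty path.
function goalPath(goals: Goal[], id: string): string[] {
  const byId = new Map<string, Goal>();
  for (const goal of goals) {
    byId.set(goal.id, goal);
  }

  const path: string[] = [];
  const seen = new Set<string>();
  let current: Goal | undefined = byId.get(id);
  while (current !== undefined) {
    // Guard against malformed data where parent_id links form a loop.
    if (seen.has(current.id)) {
      throw new Error("cycle in goal hierarchy at " + current.id);
    }
    seen.add(current.id);
    path.unshift(current.title);
    current = current.parent_id === null ? undefined : byId.get(current.parent_id);
  }
  return path;
}
```

The same walk would apply to an org chart keyed by a `reportsTo` reference, since both structures are single-parent trees stored as flat records.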

Metadata

  • Created: 2026-04-11
  • Mode: interview
  • Source: ultraplan interview
  • Research: 2 briefs (openclaw-paperclip frameworks + source code analysis)