Kjell Tore Guttormsen 7419d4283d docs(plans): Agent Factory ultraplan + execution guide

27-step plan across 8 sessions in 3 waves for transforming
agent-builder into Agent Factory v1.0.0. Includes research briefs,
spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 07:35:29 +02:00


type: ultraresearch-brief

created: 2026-04-11

question: Research OpenClaw and Paperclip agent frameworks to find inspiration and concrete value proposition for agent-builder plugin

confidence: 0.92

dimensions, mcp_servers_used, local_agents_used, external_agents_used, source_code_analyzed: Explore, WebFetch, WebSearch, paperclip, openclaw

target_audience: Claude Code users who know the primitives but need help composing agent systems

OpenClaw & Paperclip Agent Framework Research

Generated by ultraresearch-local on 2026-04-11

Research Question

What features, architecture patterns, and capabilities do OpenClaw and Paperclip offer, and what can we learn from them to create a Claude Code plugin that makes it easy for anyone to build genuinely useful, self-running agent systems?

Executive Summary

OpenClaw (354k stars) excels at individual agent capability: 23+ messaging channels, 5400+ community skills, proactive agent patterns with self-improvement guardrails, and a 3-tier memory system. Paperclip (51k stars) excels at organizational coordination: heartbeat scheduling, goal hierarchies, budget enforcement, and governance. Neither offers guided, AI-assisted construction of complete agent systems; that is the gap our plugin fills. Confidence is high for OpenClaw (verified via docs, GitHub, and existing codebase references) and medium for Paperclip (verified via its docs site, GitHub, and several third-party articles).

Dimensions

1. Core Capabilities -- Confidence: high

OpenClaw:

Personal AI assistant running on your own devices
23+ messaging channels (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, IRC, Teams, Matrix, and more)
100+ preconfigured AgentSkills for shell, file, and web automation
Canvas/A2UI — agent-driven visual workspace (unique capability, no Claude Code equivalent)
Browser control via dedicated Chrome/Chromium with CDP
Voice capabilities with wake words (macOS/iOS) and continuous voice (Android)
Device node system (camera, screen recording, location, notifications)
Model-agnostic: Claude, GPT, Gemini, Ollama all supported
Source: GitHub, DigitalOcean guide

Paperclip:

Orchestration platform for teams of AI agents ("If OpenClaw is an employee, Paperclip is the company")
Agent-agnostic: supports OpenClaw, Claude Code, Codex, Cursor, bash, HTTP webhooks
Explicitly NOT an agent framework: it doesn't build agents, it organizes them
Explicitly NOT a chatbot, workflow builder, or prompt manager
Ticket-based task management with threaded conversations
Multi-company support with complete data isolation
Source: GitHub, paperclip.ing

2. Architecture & Patterns -- Confidence: high

OpenClaw architecture:

Gateway control plane on ws://127.0.0.1:18789
Channel Adapters transform protocol-specific input into unified message objects
Multi-agent routing: isolated sessions per agent, workspace, or sender
Pi agent runtime in RPC mode with tool/block streaming
Node.js + TypeScript, pnpm, WebSocket protocol
Source: GitHub README, Medium architecture article
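The adapter-plus-routing idea above can be sketched as follows. This is a hypothetical illustration, not OpenClaw's code: the `UnifiedMessage` shape, the trimmed Telegram-like and Slack-like payloads, and the `SessionRouter` class are assumptions. It shows protocol-specific input collapsing into one object before a router picks an isolated session (only the per-agent and per-sender policies are modeled).

```typescript
// Hypothetical unified message shape; OpenClaw's real message object differs.
interface UnifiedMessage {
  channel: string;    // e.g. "telegram", "slack"
  senderId: string;   // sender identity, scoped to the channel
  text: string;
  receivedAt: number; // epoch milliseconds
}

// Hypothetical, heavily trimmed raw payloads for two channels.
interface TelegramLikeUpdate { message: { from: { id: number }; text: string; date: number } }
interface SlackLikeEvent { event: { user: string; text: string; ts: string } }

type Adapter<T> = (raw: T) => UnifiedMessage;

const telegramAdapter: Adapter<TelegramLikeUpdate> = (raw) => ({
  channel: "telegram",
  senderId: String(raw.message.from.id),
  text: raw.message.text,
  receivedAt: raw.message.date * 1000, // seconds -> milliseconds
});

const slackAdapter: Adapter<SlackLikeEvent> = (raw) => ({
  channel: "slack",
  senderId: raw.event.user,
  text: raw.event.text,
  receivedAt: Math.round(parseFloat(raw.event.ts) * 1000), // "seconds.fraction" string
});

// Which key isolates a conversation: one session per agent, or per sender.
type IsolationPolicy = "per-agent" | "per-sender";

class SessionRouter {
  private agentId: string;
  private policy: IsolationPolicy;
  private sessions = new Map<string, UnifiedMessage[]>();

  constructor(agentId: string, policy: IsolationPolicy) {
    this.agentId = agentId;
    this.policy = policy;
  }

  // Appends the message to its isolated session and returns the session key.
  dispatch(msg: UnifiedMessage): string {
    const key = this.policy === "per-agent"
      ? `agent:${this.agentId}`
      : `agent:${this.agentId}:${msg.channel}:${msg.senderId}`;
    const history = this.sessions.get(key) ?? [];
    history.push(msg);
    this.sessions.set(key, history);
    return key;
  }

  historyOf(key: string): UnifiedMessage[] {
    return (this.sessions.get(key) ?? []).slice();
  }
}

const router = new SessionRouter("assistant", "per-sender");
const k1 = router.dispatch(telegramAdapter({ message: { from: { id: 42 }, text: "hi", date: 1700000000 } }));
const k2 = router.dispatch(slackAdapter({ event: { user: "U1", text: "hello", ts: "1700000000.000100" } }));
console.log(k1); // agent:assistant:telegram:42
console.log(k2); // agent:assistant:slack:U1
```

Everything downstream of the adapters sees only `UnifiedMessage`, so adding a channel means writing one more adapter function.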

Paperclip architecture:

Node.js backend + React UI + PostgreSQL
Company-as-runtime model: agents modeled as employees
Heartbeat scheduler fires agent execution at defined intervals
Each beat is stateless — state lives in external storage (Postgres)
Atomic operations for task checkout and budget enforcement
Source: GitHub, Towards AI article
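A stateless beat with atomic task checkout might look like the sketch below. It assumes an in-memory `TaskStore` standing in for Postgres. The class, the `runBeat` function, and the task fields are hypothetical. In a real database, `checkout` would be a single conditional `UPDATE ... WHERE status = 'open'`, and the claim succeeds only if one row was affected.

```typescript
type TaskStatus = "open" | "in_progress" | "done";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  assignee: string | null;
}

// In-memory stand-in for the Postgres task table. Each method models a
// single atomic statement; nothing survives in the agent between beats.
class TaskStore {
  private tasks = new Map<string, Task>();

  add(id: string, title: string): void {
    this.tasks.set(id, { id, title, status: "open", assignee: null });
  }

  // Claim succeeds only if the task is still open. SQL analogue:
  //   UPDATE tasks SET status = 'in_progress', assignee = $1
  //   WHERE id = $2 AND status = 'open'   -- claimed iff 1 row affected
  checkout(id: string, agentId: string): boolean {
    const t = this.tasks.get(id);
    if (!t || t.status !== "open") return false;
    t.status = "in_progress";
    t.assignee = agentId;
    return true;
  }

  complete(id: string): void {
    const t = this.tasks.get(id);
    if (t) t.status = "done";
  }

  openTaskIds(): string[] {
    return Array.from(this.tasks.values())
      .filter((t) => t.status === "open")
      .map((t) => t.id);
  }

  get(id: string): Task | undefined {
    return this.tasks.get(id);
  }
}

// One stateless beat: read open work from the store, claim one task, do it,
// write the result back, return. A scheduler would call this on an interval.
function runBeat(agentId: string, store: TaskStore, work: (task: Task) => void): string | null {
  for (const id of store.openTaskIds()) {
    if (!store.checkout(id, agentId)) continue; // another agent won the claim
    const task = store.get(id);
    if (task) work(task);
    store.complete(id);
    return id;
  }
  return null; // nothing to do this beat
}

const store = new TaskStore();
store.add("t1", "Draft weekly report");
store.add("t2", "Summarize inbox");
const first = store.checkout("t1", "writer");  // true
const second = store.checkout("t1", "editor"); // false: already claimed
const beat1 = runBeat("editor", store, (t) => console.log(`editor works on ${t.title}`));
const beat2 = runBeat("editor", store, () => {});
console.log(first, second, beat1, beat2); // true false t2 null
```

Because the agent holds no state between calls, any beat can run on any worker; the store alone decides who owns which task.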

3. Self-Learning & Autonomy -- Confidence: medium

OpenClaw — Proactive Agent Skill (most sophisticated pattern found):

3-tier memory: SESSION-STATE.md (working memory), memory/YYYY-MM-DD.md (daily capture), MEMORY.md (curated long-term)
WAL Protocol (Write-Ahead Logging): write important details BEFORE responding
Working Buffer Protocol: once context usage enters the "danger zone" (60%+ of the context window), captures each exchange in a buffer before compaction
Compaction Recovery: after compaction, reads the buffer, session state, and daily notes, then searches all sources
Self-improvement guardrails:
- ADL (Anti-Drift Limits): no fake intelligence, no unverifiable mods, no novelty over stability
- VFM (Value-First Modification): scores a proposed change on frequency, failure reduction, burden reduction, and cost savings; implement it only if the score exceeds 50
- Priority: Stability > Explainability > Reusability > Scalability > Novelty
Two cron types: systemEvent (needs attention) vs isolated agentTurn (true background autonomy)
Self-healing: try 5-10 approaches before asking for help
Source: Proactive Agent Skill on GitHub
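Three of the mechanisms above are simple enough to sketch: WAL ordering, the 60% danger-zone buffer, and the VFM gate. This is a hypothetical illustration, not the Proactive Agent Skill's implementation. The in-memory `MemoryFiles` stand-in and the 0 to 25 scale per VFM criterion (summing to at most 100) are assumptions. The source names the four criteria and the "> 50" threshold but not the weights.

```typescript
// Hypothetical in-memory stand-ins for SESSION-STATE.md and the working buffer.
interface MemoryFiles {
  sessionState: string[];
  workingBuffer: string[];
}

const DANGER_ZONE = 0.6; // 60%+ of the context window in use

// WAL: persist important details BEFORE generating the reply, so nothing
// learned in this turn is lost if the turn is interrupted or compacted.
function respondWithWal(
  mem: MemoryFiles,
  userMsg: string,
  importantDetails: string[],
  contextUsed: number, // fraction of the context window in use, 0..1
  generateReply: (msg: string) => string,
): string {
  for (const d of importantDetails) mem.sessionState.push(d); // 1. write
  const reply = generateReply(userMsg);                       // 2. respond
  if (contextUsed >= DANGER_ZONE) {
    // Working buffer: keep the raw exchange for post-compaction recovery.
    mem.workingBuffer.push(`user: ${userMsg}`, `agent: ${reply}`);
  }
  return reply;
}

// VFM criteria are from the source; the 0..25 scale per criterion is assumed.
interface VfmScores {
  frequency: number;
  failureReduction: number;
  burdenReduction: number;
  costSavings: number;
}

function vfmScore(s: VfmScores): number {
  const clamp = (x: number) => Math.max(0, Math.min(25, x));
  return clamp(s.frequency) + clamp(s.failureReduction) +
    clamp(s.burdenReduction) + clamp(s.costSavings);
}

// Gate from the source: only implement a self-modification if score > 50.
function shouldImplement(s: VfmScores): boolean {
  return vfmScore(s) > 50;
}

const mem: MemoryFiles = { sessionState: [], workingBuffer: [] };
// The reply reports how many details were already stored when it was generated.
const reply = respondWithWal(mem, "My flight is on the 14th", ["flight: 14th"], 0.65,
  () => `stored=${mem.sessionState.length}`);
console.log(reply);                    // stored=1
console.log(mem.workingBuffer.length); // 2
console.log(shouldImplement({ frequency: 20, failureReduction: 15, burdenReduction: 10, costSavings: 5 })); // false (50)
console.log(shouldImplement({ frequency: 20, failureReduction: 20, burdenReduction: 10, costSavings: 5 })); // true (55)
```

The strict `> 50` means a change scoring exactly 50 is rejected, which matches the stability-first priority order.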

Paperclip:

Heartbeat model with context injection (Memento Man mental model)
Memory doesn't live in agent session — external storage maintains continuity
Context packets: curated payloads with memory state, task queue, recent events, agent config
No explicit self-learning mechanism documented, but rich audit trail enables pattern detection
Skills as markdown instruction files, installable via GitHub URLs
Source: MindStudio heartbeat article
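A context packet, as described above, can be sketched as a pure function over external state. The field names, the `ExternalState` shape, and the markdown rendering are hypothetical assumptions, not Paperclip's schema. The point is the Memento Man model: the agent wakes with no memory and is handed its notes (config, memory, task queue, newest events) at the start of every beat.

```typescript
interface AgentConfig { agentId: string; role: string; instructions: string }
interface EventRecord { at: number; summary: string }

// Hypothetical shape of the external store (Postgres in Paperclip), keyed by agent id.
interface ExternalState {
  memory: Map<string, string[]>;
  taskQueue: Map<string, string[]>;
  events: Map<string, EventRecord[]>;
  configs: Map<string, AgentConfig>;
}

interface ContextPacket {
  config: AgentConfig;
  memory: string[];
  taskQueue: string[];
  recentEvents: EventRecord[];
}

// Assemble everything the agent needs for one beat. The agent remembers
// nothing itself; continuity comes entirely from this packet.
function buildContextPacket(state: ExternalState, agentId: string, maxEvents = 5): ContextPacket {
  const config = state.configs.get(agentId);
  if (!config) throw new Error(`unknown agent: ${agentId}`);
  const events = (state.events.get(agentId) ?? []).slice(); // copy, don't mutate the store
  events.sort((a, b) => b.at - a.at); // newest first
  return {
    config,
    memory: state.memory.get(agentId) ?? [],
    taskQueue: state.taskQueue.get(agentId) ?? [],
    recentEvents: events.slice(0, maxEvents), // cap keeps the packet small
  };
}

// Render the packet as the prompt text injected at the start of the beat.
function renderPacket(p: ContextPacket): string {
  return [
    `# You are ${p.config.role} (${p.config.agentId})`,
    p.config.instructions,
    "## Memory", ...p.memory.map((m) => `- ${m}`),
    "## Task queue", ...p.taskQueue.map((t) => `- ${t}`),
    "## Recent events", ...p.recentEvents.map((e) => `- ${e.summary}`),
  ].join("\n");
}

const world: ExternalState = {
  memory: new Map([["cmo", ["Launch date is fixed"]]]),
  taskQueue: new Map([["cmo", ["Write launch post"]]]),
  events: new Map([["cmo", [
    { at: 1, summary: "e1" }, { at: 3, summary: "e3" }, { at: 2, summary: "e2" },
  ]]]),
  configs: new Map([["cmo", { agentId: "cmo", role: "CMO", instructions: "Own marketing." }]]),
};

const packet = buildContextPacket(world, "cmo", 2);
console.log(renderPacket(packet));
```

Capping recent events is one way to keep the injected context bounded as the audit trail grows.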

4. User Experience & Onboarding -- Confidence: high

OpenClaw:

npm install -g openclaw@latest && openclaw onboard --install-daemon
Requires Node 24 (recommended) or 22.16+
Has a "Cowork" variant, created specifically because the core tool is too hard for non-developers
Doctor CLI for troubleshooting and migrations
Pairing mode for security (unknown senders get pairing codes)
3 release channels: stable, beta, dev
Source: GitHub README

Paperclip:

Quick start: npx paperclipai onboard --yes
React dashboard for agent management
Mobile-friendly interface
Requires Node 20+ and pnpm 9.15+
No guided construction — you configure agents manually
Source: GitHub README

5. Multi-Agent Orchestration -- Confidence: high

OpenClaw:

Session tools for agent-to-agent communication: sessions_list, sessions_history, sessions_send
Reply-back mechanism for async coordination
Route channels/accounts/peers to isolated agents with dedicated workspaces
No organizational structure (flat, peer-to-peer)
Source: GitHub README
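The three session tools and the reply-back mechanism above can be modeled in-process as below. The tool names come from the source; their signatures, the `SessionBus` class, and the handler type are hypothetical. Real OpenClaw tools are invoked by agents through the gateway, and this sketch delivers synchronously for brevity.

```typescript
interface SessionMessage { from: string; text: string }

// A session's handler returns an answer, or null for no reply.
type Handler = (msg: SessionMessage) => string | null;

// Hypothetical in-process model of sessions_list / sessions_history / sessions_send.
class SessionBus {
  private histories = new Map<string, SessionMessage[]>();
  private handlers = new Map<string, Handler>();

  register(sessionId: string, handler: Handler): void {
    this.histories.set(sessionId, []);
    this.handlers.set(sessionId, handler);
  }

  sessions_list(): string[] {
    return Array.from(this.histories.keys());
  }

  sessions_history(sessionId: string): SessionMessage[] {
    return (this.histories.get(sessionId) ?? []).slice();
  }

  // Deliver to the target session; with replyBack, the target's answer is
  // posted into the sender's own history (dropped if the sender is unknown).
  sessions_send(from: string, to: string, text: string, replyBack = false): void {
    const target = this.histories.get(to);
    const handler = this.handlers.get(to);
    if (!target || !handler) throw new Error(`no such session: ${to}`);
    const msg = { from, text };
    target.push(msg);
    const answer = handler(msg);
    if (replyBack && answer !== null) {
      this.histories.get(from)?.push({ from: to, text: answer });
    }
  }
}

const bus = new SessionBus();
bus.register("main", () => null);
bus.register("researcher", (m) => `3 sources on: ${m.text}`);
bus.sessions_send("main", "researcher", "Paperclip budgets", true);
console.log(bus.sessions_list().join(", "));       // main, researcher
console.log(bus.sessions_history("main")[0].text); // 3 sources on: Paperclip budgets
```

The structure is flat: any session can message any other, with no notion of who reports to whom.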

Paperclip:

Org chart with hierarchies, roles, and reporting lines
Cascading delegation: work flows up and down the org chart automatically
Goal-aware task execution: each task carries its full goal ancestry
Atomic task checkout prevents double-work
Cross-team requests are delegated to the best-suited agent
Human as "board of directors" with override authority
Source: paperclip.ing/docs, Medium article
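Goal ancestry and cascading delegation can be sketched over two small maps. The data shapes and the delegation policy are assumptions, not Paperclip's documented algorithm. The policy searches the requester's own subtree first, then moves up one manager at a time and searches that manager's subtree. If nobody qualifies, the request ends up with the human board.

```typescript
interface Goal { id: string; title: string; parentId: string | null }
interface OrgAgent { id: string; managerId: string | null; skills: string[] }

// Goal ancestry: follow parent links from a task's goal up to the mission,
// so the agent sees why the task exists.
function goalAncestry(goals: Map<string, Goal>, goalId: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let cur: Goal | undefined = goals.get(goalId);
  while (cur && !seen.has(cur.id)) {
    seen.add(cur.id); // guards against accidental cycles
    chain.push(cur.title);
    cur = cur.parentId !== null ? goals.get(cur.parentId) : undefined;
  }
  return chain; // [task goal, ..., company mission]
}

// Breadth-first search of rootId and everyone reporting to it, directly or not.
function findInSubtree(org: Map<string, OrgAgent>, rootId: string, skill: string): string | null {
  const queue: string[] = [rootId];
  const seen = new Set<string>();
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    const agent = org.get(id);
    if (agent && agent.skills.indexOf(skill) !== -1) return id;
    org.forEach((a) => { if (a.managerId === id) queue.push(a.id); });
  }
  return null;
}

// Assumed policy: search down, then climb one manager and search again.
function delegate(org: Map<string, OrgAgent>, fromId: string, skill: string): string | null {
  const visited = new Set<string>();
  let curId: string | null = fromId;
  while (curId !== null && !visited.has(curId)) {
    visited.add(curId);
    const found = findInSubtree(org, curId, skill);
    if (found !== null) return found;
    const cur: OrgAgent | undefined = org.get(curId);
    curId = cur ? cur.managerId : null;
  }
  return null; // nobody can take it: escalate to the human board
}

const goals = new Map<string, Goal>([
  ["mission", { id: "mission", title: "Build the best agent factory", parentId: null }],
  ["launch", { id: "launch", title: "Launch v1", parentId: "mission" }],
  ["post", { id: "post", title: "Write launch post", parentId: "launch" }],
]);
const org = new Map<string, OrgAgent>([
  ["ceo", { id: "ceo", managerId: null, skills: ["strategy"] }],
  ["cto", { id: "cto", managerId: "ceo", skills: ["architecture"] }],
  ["dev", { id: "dev", managerId: "cto", skills: ["code"] }],
  ["cmo", { id: "cmo", managerId: "ceo", skills: ["marketing"] }],
  ["writer", { id: "writer", managerId: "cmo", skills: ["copy"] }],
]);
console.log(goalAncestry(goals, "post").join(" -> ")); // Write launch post -> Launch v1 -> Build the best agent factory
console.log(delegate(org, "dev", "copy"));  // writer
console.log(delegate(org, "dev", "legal")); // null
```

A request from `dev` for copywriting climbs to `ceo` before it can descend into the marketing branch. This is the "up and down" flow, under the assumed search order.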

6. Extensibility & Integrations -- Confidence: medium

OpenClaw:

Skills marketplace with 5400+ community skills (26% flagged with vulnerabilities)
Skills installed via URL with auto-updating
Plugin system and channel adapter architecture
Bundled/managed/workspace skill tiers
Source: VoltAgent awesome-openclaw-skills
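The bundled/managed/workspace tiers imply a rule for which copy wins when names collide. The sketch assumes workspace beats managed, which beats bundled. That order should be checked against OpenClaw's documentation. The `Skill` shape and the paths are hypothetical.

```typescript
type Tier = "bundled" | "managed" | "workspace";

interface Skill { name: string; tier: Tier; path: string }

// Assumed precedence, highest first: a workspace skill shadows a managed one,
// which shadows a bundled one with the same name.
const PRECEDENCE: Tier[] = ["workspace", "managed", "bundled"];

function resolveSkills(all: Skill[]): Map<string, Skill> {
  const rank = (t: Tier) => PRECEDENCE.indexOf(t); // lower rank wins
  const resolved = new Map<string, Skill>();
  for (const s of all) {
    const existing = resolved.get(s.name);
    if (!existing || rank(s.tier) < rank(existing.tier)) resolved.set(s.name, s);
  }
  return resolved;
}

const installed: Skill[] = [
  { name: "browser", tier: "bundled", path: "bundled/browser" },
  { name: "browser", tier: "workspace", path: "./skills/browser" },
  { name: "summarize", tier: "managed", path: "~/.skills/summarize" },
];
const active = resolveSkills(installed);
active.forEach((s, n) => console.log(`${n}: ${s.tier} (${s.path})`));
// browser: workspace (./skills/browser)
// summarize: managed (~/.skills/summarize)
```

A shadowing rule like this lets a project override a community skill locally, for example after reviewing one of the flagged skills.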

Paperclip:

Plugin ecosystem (awesome-paperclip curated list)
Runtime skill injection without retraining
Import/export of company templates
Skills as markdown files
Source: GitHub README

7. Deployment & Operations -- Confidence: high

OpenClaw:

Docker-based containment (the agent runs inside a container, limiting the blast radius)
Tailscale Serve/Funnel for remote access
SSH tunnels with token/password auth
Nix declarative configuration
Always-on via daemon install
Source: GitHub README

Paperclip:

Self-hosted, MIT-licensed, no mandatory accounts
Local-first: embedded Node.js + Postgres
Multi-company isolation on single infrastructure
Per-agent monthly budgets with automatic throttling
Immutable audit logs with full tool-call tracing
Config versioning with rollback
Source: paperclip.ing/docs
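Per-agent monthly budgets with throttling, plus an append-only audit trail, can be sketched as one ledger. The `BudgetLedger` class and its API are hypothetical, not Paperclip's. Frozen in-memory entries only illustrate "immutable". A real deployment would rely on database permissions or a similar guarantee.

```typescript
interface AuditEntry {
  readonly at: string;
  readonly agentId: string;
  readonly action: string;
  readonly costUsd: number;
}

class BudgetLedger {
  private limits: Map<string, number>;       // monthly cap per agent (USD)
  private spent = new Map<string, number>(); // key: "agentId:YYYY-MM"
  private log: AuditEntry[] = [];            // append-only

  constructor(limits: Map<string, number>) {
    this.limits = limits;
  }

  // Check and charge in one step, so the cap can't be overshot between the
  // check and the write (in Postgres: one conditional UPDATE).
  tryCharge(agentId: string, action: string, costUsd: number, at: Date): boolean {
    const month = at.toISOString().slice(0, 7); // UTC "YYYY-MM"
    const key = `${agentId}:${month}`;
    const current = this.spent.get(key) ?? 0;
    const limit = this.limits.get(agentId) ?? 0; // no budget means no spend
    if (current + costUsd > limit) {
      this.append({ at: at.toISOString(), agentId, action: `THROTTLED ${action}`, costUsd: 0 });
      return false;
    }
    this.spent.set(key, current + costUsd);
    this.append({ at: at.toISOString(), agentId, action, costUsd });
    return true;
  }

  // Frozen entries and no update/delete method: the trail can only grow.
  private append(entry: AuditEntry): void {
    this.log.push(Object.freeze(entry));
  }

  auditTrail(): AuditEntry[] {
    return this.log.slice();
  }
}

const ledger = new BudgetLedger(new Map([["writer", 10]]));
const april = new Date("2026-04-11T08:00:00Z");
const may = new Date("2026-05-01T08:00:00Z");
const results = [
  ledger.tryCharge("writer", "draft", 6, april),
  ledger.tryCharge("writer", "summarize", 5, april), // 11 > 10: throttled
  ledger.tryCharge("writer", "summarize", 5, may),   // new month, new budget
];
console.log(results.join(" ")); // true false true
```

Throttled attempts are logged too, so the audit trail shows both what ran and what was refused.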

Synthesis

The critical insight is that OpenClaw and Paperclip operate at different layers of the same stack:

OpenClaw = the agent runtime layer (what an individual agent can do)
Paperclip = the orchestration layer (how agents coordinate as a team)
Agent Factory = the construction layer (how you build and configure both)

Neither tool offers what our plugin does: a guided, interview-driven, AI-assisted workflow that generates a complete agent system from scratch. OpenClaw's "Cowork" variant exists precisely because the core tool is too hard for non-developers, which validates demand for lower-barrier agent creation. Paperclip's manual configuration model means every agent must be hand-crafted before it can be "hired."

The most powerful patterns to incorporate:

From OpenClaw: 3-tier memory with WAL protocol, proactive agent pattern with self-improvement guardrails (ADL/VFM), isolated agentTurn for background autonomy
From Paperclip: Heartbeat with context injection, goal hierarchy, budget enforcement, governance model ("autonomy is a privilege you grant")
Unique to us: Progressive complexity (1 agent → full org), agentically guided construction, domain-specific templates, Claude Code-native (no external infrastructure)

The security philosophies are complementary, not conflicting: OpenClaw uses containment (Docker: limit the blast radius), while our plugin uses prevention (hooks and deny rules: stop an action before it runs). Both should be available.
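On the prevention side, a Claude Code `PreToolUse` hook receives a JSON payload on stdin that includes `tool_name` and `tool_input`. Exiting with code 2 blocks the tool call and feeds stderr back to Claude. The sketch below is only the decision logic. The deny patterns and protected paths are hypothetical examples, and a hook script would read stdin, call `decide`, write `reason` to stderr, and exit with `exitCode`.

```typescript
// Subset of the PreToolUse payload Claude Code passes to a hook on stdin.
interface PreToolUsePayload {
  tool_name: string;
  tool_input: { command?: string; file_path?: string };
}

interface Decision { exitCode: 0 | 2; reason: string }

// Example policy only; tailor these to the agent system being built.
const DENY_BASH: RegExp[] = [/\brm\s+-rf\s+\//, /\bcurl\b.*\|\s*(ba)?sh\b/];
const PROTECTED_PATHS: string[] = [".env", "secrets/"];

function decide(p: PreToolUsePayload): Decision {
  if (p.tool_name === "Bash" && p.tool_input.command) {
    const cmd = p.tool_input.command;
    for (const re of DENY_BASH) {
      if (re.test(cmd)) return { exitCode: 2, reason: `Blocked by policy: ${re.source}` };
    }
  }
  if ((p.tool_name === "Write" || p.tool_name === "Edit") && p.tool_input.file_path) {
    const path = p.tool_input.file_path;
    if (PROTECTED_PATHS.some((x) => path.indexOf(x) !== -1)) {
      return { exitCode: 2, reason: `Blocked write to protected path: ${path}` };
    }
  }
  return { exitCode: 0, reason: "" }; // allow
}

console.log(decide({ tool_name: "Bash", tool_input: { command: "rm -rf / --no-preserve-root" } }).exitCode); // 2
console.log(decide({ tool_name: "Bash", tool_input: { command: "ls -la" } }).exitCode); // 0
console.log(decide({ tool_name: "Write", tool_input: { file_path: "config/.env" } }).reason); // Blocked write to protected path: config/.env
```

Keeping the policy a pure function makes it easy to unit-test before wiring it into settings, and it can complement Docker containment rather than replace it.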

Open Questions

Canvas/A2UI details — What does OpenClaw's visual workspace actually generate? HTML? Native UI? Understanding this clarifies whether it's worth pursuing for Claude Code.
Paperclip self-learning implementation — The heartbeat + audit trail creates rich data, but no explicit feedback loop is documented. Is this a planned feature or deliberately excluded?
OpenClaw skill security — 26% of community skills flagged with vulnerabilities. What vetting process exists, and should we build one?
Cowork UX — What does OpenClaw's simplified non-developer experience look like? This directly informs our target audience.

Recommendation

Build Agent Factory as a 5-phase evolution:

v0.2: Fix existing gaps (3 missing commands, deployment-advisor, managed-agents skill, domain templates)
v0.3: Incorporate OpenClaw patterns (3-tier memory, WAL, proactive agent, isolated cron)
v0.4: Incorporate Paperclip patterns (heartbeat, goal hierarchy, budgets, governance, org-chart)
v0.5: Self-learning systems (feedback loops, performance scoring, pipeline optimization)
v1.0: Full integration (MCP integrations, Docker deployment, templates marketplace, import/export)

The key differentiator throughout: every feature is accessible through guided, AI-assisted construction with progressive complexity — start simple, grow as needed.

Sources

#	Source	Type	Quality	Used in
1	OpenClaw GitHub	official	high	1,2,4,5,7
2	Paperclip GitHub	official	high	1,2,4,5,6,7
3	OpenClaw Docs	official	high	2,5
4	Paperclip Docs	official	high	2,3,5,7
5	Proactive Agent Skill	official	high	3
6	MindStudio Heartbeat Article	community	medium	3
7	DigitalOcean: What is OpenClaw	community	medium	1
8	Medium: How OpenClaw Works	community	medium	2
9	Medium: Paperclip as Company	community	medium	5
10	awesome-openclaw-skills	community	medium	6
11	`skills/agent-system-design/references/feature-map.md`	codebase	high	1
12	`skills/agent-system-design/references/security-patterns.md`	codebase	high	7
