| type | created | question | confidence | dimensions | mcp_servers_used | local_agents_used | external_agents_used | source_code_analyzed | target_audience |
|---|---|---|---|---|---|---|---|---|---|
| ultraresearch-brief | 2026-04-11 | Research OpenClaw and Paperclip agent frameworks to find inspiration and concrete value proposition for agent-builder plugin | 0.92 | 7 | | | | | Claude Code users who know the primitives but need help composing agent systems |
OpenClaw & Paperclip Agent Framework Research
Generated by ultraresearch-local on 2026-04-11
Research Question
What features, architecture patterns, and capabilities do OpenClaw and Paperclip offer, and what can we learn from them to create a Claude Code plugin that makes it easy for anyone to build genuinely useful, self-running agent systems?
Executive Summary
OpenClaw (354k stars) excels at individual agent capability: 23+ messaging channels, 5400+ skills, proactive agent patterns with self-improvement guardrails, and 3-tier memory systems. Paperclip (51k stars) excels at organizational coordination: heartbeat scheduling, goal hierarchies, budget enforcement, and governance. Neither offers guided, AI-assisted construction of complete agent systems, which is the gap our plugin fills. Confidence is high for OpenClaw (verified via docs, GitHub, and existing codebase references) and medium for Paperclip (verified via docs site, GitHub, and multiple third-party articles).
Dimensions
1. Core Capabilities -- Confidence: high
OpenClaw:
- Personal AI assistant running on your own devices
- 23+ messaging channels (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, IRC, Teams, Matrix, and more)
- 100+ preconfigured AgentSkills for shell, file, and web automation
- Canvas/A2UI — agent-driven visual workspace (unique capability, no Claude Code equivalent)
- Browser control via dedicated Chrome/Chromium with CDP
- Voice capabilities with wake words (macOS/iOS) and continuous voice (Android)
- Device node system (camera, screen recording, location, notifications)
- Model-agnostic: Claude, GPT, Gemini, Ollama all supported
- Source: GitHub, DigitalOcean guide
Paperclip:
- Orchestration platform for teams of AI agents ("If OpenClaw is an employee, Paperclip is the company")
- Agent-agnostic: supports OpenClaw, Claude Code, Codex, Cursor, bash, HTTP webhooks
- Explicitly NOT an agent framework — doesn't build agents, organizes them
- Explicitly NOT a chatbot, workflow builder, or prompt manager
- Ticket-based task management with threaded conversations
- Multi-company support with complete data isolation
- Source: GitHub, paperclip.ing
2. Architecture & Patterns -- Confidence: high
OpenClaw architecture:
- Gateway control plane on ws://127.0.0.1:18789
- Channel Adapters transform protocol-specific input into unified message objects
- Multi-agent routing: isolated sessions per agent, workspace, or sender
- Pi agent runtime in RPC mode with tool/block streaming
- Node.js + TypeScript, pnpm, WebSocket protocol
- Source: GitHub README, Medium architecture article
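The channel-adapter idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `UnifiedMessage`, the adapter functions, and the payload field names are hypothetical and are not taken from OpenClaw's code or schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UnifiedMessage:
    """Hypothetical channel-neutral message shape (not OpenClaw's real schema)."""
    channel: str
    sender: str
    text: str


def from_telegram(raw: dict) -> UnifiedMessage:
    # Field names mimic a Telegram-style update; treat them as illustrative.
    return UnifiedMessage("telegram", str(raw["message"]["from"]["id"]), raw["message"]["text"])


def from_slack(raw: dict) -> UnifiedMessage:
    # Field names mimic a Slack-style event; treat them as illustrative.
    return UnifiedMessage("slack", raw["event"]["user"], raw["event"]["text"])


# Registry: one adapter per channel, all returning the same message type.
ADAPTERS: Dict[str, Callable[[dict], UnifiedMessage]] = {
    "telegram": from_telegram,
    "slack": from_slack,
}


def normalize(channel: str, raw: dict) -> UnifiedMessage:
    """Route a raw payload through its adapter; unknown channels fail loudly."""
    try:
        adapter = ADAPTERS[channel]
    except KeyError:
        raise ValueError(f"no adapter registered for channel {channel!r}") from None
    return adapter(raw)
```

Downstream routing code then only ever sees `UnifiedMessage`, so adding a channel means adding one adapter function and one registry entry.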
Paperclip architecture:
- Node.js backend + React UI + PostgreSQL
- Company-as-runtime model: agents modeled as employees
- Heartbeat scheduler fires agent execution at defined intervals
- Each beat is stateless — state lives in external storage (Postgres)
- Atomic operations for task checkout and budget enforcement
- Source: GitHub, Towards AI article
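A stateless beat, as described above, can be sketched as a load, work, persist cycle. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the Postgres store, and `run_beat` and `process_one` are hypothetical names, not Paperclip APIs.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Store = Dict[str, State]  # hypothetical stand-in for the external Postgres store


def run_beat(store: Store, agent_id: str, work: Callable[[State], State]) -> State:
    """Run one heartbeat for an agent; nothing is kept in process memory."""
    # Load: everything the beat knows comes from the store.
    state = dict(store.get(agent_id, {}))
    # Work: a function of the loaded state only.
    new_state = dict(work(state))
    new_state["beats"] = state.get("beats", 0) + 1
    # Persist: the next beat starts from this record.
    store[agent_id] = new_state
    return new_state


def process_one(state: State) -> State:
    """Example unit of work: count one processed item per beat."""
    return {**state, "processed": state.get("processed", 0) + 1}
```

Because each call starts from the stored record, a crashed or restarted scheduler can resume on the next interval without any in-memory session to rebuild.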
3. Self-Learning & Autonomy -- Confidence: medium
OpenClaw — Proactive Agent Skill (most sophisticated pattern found):
- 3-tier memory: SESSION-STATE.md (working memory), memory/YYYY-MM-DD.md (daily capture), MEMORY.md (curated long-term)
- WAL Protocol (Write-Ahead Logging): write important details BEFORE responding
- Working Buffer Protocol: captures exchanges once context usage enters the "danger zone" (60%+), so they survive compaction
- Compaction Recovery: reads buffer, session state, daily notes, then searches all sources
- Self-improvement guardrails:
  - ADL (Anti-Drift Limits): no fake intelligence, no unverifiable mods, no novelty over stability
  - VFM (Value-First Modification): score changes on frequency, failure reduction, burden reduction, cost savings. Only implement if score > 50
  - Priority: Stability > Explainability > Reusability > Scalability > Novelty
- Two cron types: systemEvent (needs attention) vs isolated agentTurn (true background autonomy)
- Self-healing: try 5-10 approaches before asking for help
- Source: Proactive Agent Skill on GitHub
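The write-ahead rule in the list above (write important details before responding) can be sketched as follows. This is an illustration only: the file names come from the brief, but `respond_with_wal`, its parameters, and the bullet-per-line file format are assumptions, not the Proactive Agent Skill's actual implementation.

```python
from datetime import date
from pathlib import Path
from typing import Callable


def respond_with_wal(workspace: Path, user_msg: str, detail: str,
                     generate: Callable[[str], str], today: date) -> str:
    """Persist the important detail first, then produce the reply."""
    workspace.mkdir(parents=True, exist_ok=True)
    daily_dir = workspace / "memory"
    daily_dir.mkdir(exist_ok=True)

    # Tier 1: working memory for the current session.
    with (workspace / "SESSION-STATE.md").open("a", encoding="utf-8") as f:
        f.write(f"- {detail}\n")

    # Tier 2: daily capture, one file per date (memory/YYYY-MM-DD.md).
    with (daily_dir / f"{today.isoformat()}.md").open("a", encoding="utf-8") as f:
        f.write(f"- {detail}\n")

    # Only now generate the reply. If this step fails or the context is
    # compacted, the detail is already on disk.
    return generate(user_msg)
```

Promotion into the curated MEMORY.md tier is left out on purpose; the brief describes it as curated, which suggests a separate review step rather than an automatic append.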
Paperclip:
- Heartbeat model with context injection (Memento Man mental model)
- Memory doesn't live in agent session — external storage maintains continuity
- Context packets: curated payloads with memory state, task queue, recent events, agent config
- No explicit self-learning mechanism documented, but rich audit trail enables pattern detection
- Skills as markdown instruction files, installable via GitHub URLs
- Source: MindStudio heartbeat article
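A context packet with the four parts listed above could be assembled along these lines. The sketch assumes a hypothetical record layout in external storage (`memory`, `tasks`, `events`, `config` keys) and a hypothetical `max_events` cutoff; neither is documented by Paperclip.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ContextPacket:
    """Curated payload injected at the start of a beat (field names follow the brief)."""
    memory_state: Dict[str, Any]
    task_queue: List[str]
    recent_events: List[str]
    agent_config: Dict[str, Any]


def build_packet(store: Dict[str, Dict[str, Any]], agent_id: str,
                 max_events: int = 5) -> ContextPacket:
    """Read one agent's record from external storage and curate it into a packet."""
    record = store[agent_id]
    return ContextPacket(
        memory_state=dict(record.get("memory", {})),
        task_queue=list(record.get("tasks", [])),
        # Curate rather than dump: keep only the most recent events.
        recent_events=list(record.get("events", []))[-max_events:],
        agent_config=dict(record.get("config", {})),
    )
```

The copies (`dict(...)`, `list(...)`) keep the packet independent of the stored record, so an agent that mutates its packet cannot silently change shared state.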
4. User Experience & Onboarding -- Confidence: high
OpenClaw:
- Install: npm install -g openclaw@latest && openclaw onboard --install-daemon
- Requires Node 24 (recommended) or 22.16+
- Has "Cowork" variant specifically because core is too hard for non-developers
- Doctor CLI for troubleshooting and migrations
- Pairing mode for security (unknown senders get pairing codes)
- 3 release channels: stable, beta, dev
- Source: GitHub README
Paperclip:
- Quick start: npx paperclipai onboard --yes
- React dashboard for agent management
- Mobile-friendly interface
- Requires Node 20+ and pnpm 9.15+
- No guided construction — you configure agents manually
- Source: GitHub README
5. Multi-Agent Orchestration -- Confidence: high
OpenClaw:
- Session tools for agent-to-agent communication: sessions_list, sessions_history, sessions_send
- Reply-back mechanism for async coordination
- Route channels/accounts/peers to isolated agents with dedicated workspaces
- No organizational structure (flat, peer-to-peer)
- Source: GitHub README
Paperclip:
- Org chart with hierarchies, roles, and reporting lines
- Cascading delegation — work flows up and down org chart automatically
- Goal-aware task execution with full ancestry
- Atomic task checkout prevents double-work
- Cross-team requests are delegated to the best-suited agent
- Human as "board of directors" with override authority
- Source: paperclip.ing/docs, Medium article
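Atomic task checkout, as listed above, is commonly done with a conditional update: the claim succeeds only if the task is still unowned. The sketch below uses an in-memory SQLite database as a stand-in for Postgres, and the `tasks` table and its columns are hypothetical, not Paperclip's schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway database with one unclaimed task."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, owner TEXT)")
    conn.execute("INSERT INTO tasks (id, title, owner) VALUES (1, 'write report', NULL)")
    conn.commit()
    return conn


def checkout(conn: sqlite3.Connection, task_id: int, agent: str) -> bool:
    """Claim a task only if nobody owns it yet.

    The ownership test and the write happen in a single UPDATE statement,
    so two agents cannot both see the task as free and both claim it.
    """
    cur = conn.execute(
        "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
        (agent, task_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

A caller that gets `False` back simply moves on to another task, which is the double-work prevention the list describes.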
6. Extensibility & Integrations -- Confidence: medium
OpenClaw:
- Skills marketplace with 5400+ community skills (26% flagged with vulnerabilities)
- Skills installed via URL with auto-updating
- Plugin system and channel adapter architecture
- Bundled/managed/workspace skill tiers
- Source: VoltAgent awesome-openclaw-skills
Paperclip:
- Plugin ecosystem (awesome-paperclip curated list)
- Runtime skill injection without retraining
- Import/export of company templates
- Skills as markdown files
- Source: GitHub README
7. Deployment & Operations -- Confidence: high
OpenClaw:
- Docker-based containment (agent runs inside container — blast radius limited)
- Tailscale Serve/Funnel for remote access
- SSH tunnels with token/password auth
- Nix declarative configuration
- Always-on via daemon install
- Source: GitHub README
Paperclip:
- Self-hosted, MIT, no mandatory accounts
- Local-first: embedded Node.js + Postgres
- Multi-company isolation on single infrastructure
- Per-agent monthly budgets with automatic throttling
- Immutable audit logs with full tool-call tracing
- Config versioning with rollback
- Source: paperclip.ing/docs
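Per-agent monthly budgets with automatic throttling can be sketched as a check that runs before each paid call. The `Budget` class, its field names, and the choice of US dollars as the unit are assumptions for illustration; the brief does not describe how Paperclip implements this.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-agent monthly budget."""
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def try_spend(self, cost_usd: float) -> bool:
        """Record the spend and return True, or refuse if it would exceed the limit."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        if self.spent_usd + cost_usd > self.monthly_limit_usd:
            return False  # throttled: the caller should skip or defer this run
        self.spent_usd += cost_usd
        return True

    def reset(self) -> None:
        """Start a new billing month."""
        self.spent_usd = 0.0
```

In a multi-process deployment the check and the increment would need to be a single atomic operation in the database, in line with the atomic budget enforcement noted under architecture.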
Synthesis
The critical insight is that OpenClaw and Paperclip operate at different layers of the same stack:
- OpenClaw = the agent runtime layer (what an individual agent can do)
- Paperclip = the orchestration layer (how agents coordinate as a team)
- Agent Factory = the construction layer (how you build and configure both)
Neither tool offers what our plugin does: a guided, interview-driven, AI-assisted workflow that generates a complete agent system from scratch. OpenClaw's "Cowork" variant exists precisely because the core tool is too hard for non-developers — this validates that there's demand for lower-barrier agent creation. Paperclip's manual configuration model means every agent needs hand-crafting before it can be "hired."
The most powerful patterns to incorporate:
- From OpenClaw: 3-tier memory with WAL protocol, proactive agent pattern with self-improvement guardrails (ADL/VFM), isolated agentTurn for background autonomy
- From Paperclip: Heartbeat with context injection, goal hierarchy, budget enforcement, governance model ("autonomy is a privilege you grant")
- Unique to us: Progressive complexity (1 agent → full org), agentically-guided construction, domain-specific templates, Claude Code-native (no external infrastructure)
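The VFM gate named in the OpenClaw item can be made concrete with a small scoring sketch. The four criteria and the "score > 50" threshold come from the brief; the 0-25 rating scale per criterion and the plain sum are assumptions chosen so the total falls on a 0-100 scale, and may not match the skill's actual weighting.

```python
from dataclasses import dataclass

THRESHOLD = 50  # from the brief: only implement if score > 50


@dataclass(frozen=True)
class Proposal:
    """A proposed self-modification, rated 0-25 on each criterion (assumed scale)."""
    frequency: int
    failure_reduction: int
    burden_reduction: int
    cost_savings: int

    def score(self) -> int:
        parts = (self.frequency, self.failure_reduction,
                 self.burden_reduction, self.cost_savings)
        for part in parts:
            if not 0 <= part <= 25:
                raise ValueError("each criterion is rated 0-25 in this sketch")
        return sum(parts)


def should_implement(proposal: Proposal) -> bool:
    """Apply the gate: strictly greater than the threshold."""
    return proposal.score() > THRESHOLD
```

A proposal that scores exactly 50 is rejected, which matches the strict inequality in the brief.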
The security philosophies are complementary, not conflicting: OpenClaw uses containment (Docker limits the blast radius), while our plugin uses prevention (hooks and deny rules stop an action before it happens). Both should be available.
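The prevention side can be illustrated with a deny-list check that runs before a shell command executes. This is a sketch under stated assumptions: the patterns and the `first_violation` helper are hypothetical examples, and wiring the check into a Claude Code hook is deliberately left out because the brief does not specify it.

```python
import re
from typing import Iterable, Optional

# Hypothetical deny patterns; a real list would be tuned per project.
DENY_PATTERNS = (
    r"\brm\s+-rf\s+/",               # recursive delete starting at an absolute path
    r"\bgit\s+push\s+--force\b",     # history-rewriting push
    r"\bcurl\b.*\|\s*(sh|bash)\b",   # piping a download straight into a shell
)


def first_violation(command: str,
                    patterns: Iterable[str] = DENY_PATTERNS) -> Optional[str]:
    """Return the first deny pattern the command matches, or None if it is allowed."""
    for pattern in patterns:
        if re.search(pattern, command):
            return pattern
    return None
```

Pattern matching like this is easy to bypass with creative quoting, which is one reason to keep containment available as a second layer.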
Open Questions
- Canvas/A2UI details — What does OpenClaw's visual workspace actually generate? HTML? Native UI? Understanding this clarifies whether it's worth pursuing for Claude Code.
- Paperclip self-learning implementation — The heartbeat + audit trail creates rich data, but no explicit feedback loop is documented. Is this a planned feature or deliberately excluded?
- OpenClaw skill security — 26% of community skills flagged with vulnerabilities. What vetting process exists, and should we build one?
- Cowork UX — What does OpenClaw's simplified non-developer experience look like? This directly informs our target audience.
Recommendation
Build Agent Factory as a 5-phase evolution:
- v0.2: Fix existing gaps (3 missing commands, deployment-advisor, managed-agents skill, domain templates)
- v0.3: Incorporate OpenClaw patterns (3-tier memory, WAL, proactive agent, isolated cron)
- v0.4: Incorporate Paperclip patterns (heartbeat, goal hierarchy, budgets, governance, org-chart)
- v0.5: Self-learning systems (feedback loops, performance scoring, pipeline optimization)
- v1.0: Full integration (MCP integrations, Docker deployment, templates marketplace, import/export)
The key differentiator throughout: every feature is accessible through guided, AI-assisted construction with progressive complexity — start simple, grow as needed.
Sources
| # | Source | Type | Quality | Used in |
|---|---|---|---|---|
| 1 | OpenClaw GitHub | official | high | 1,2,4,5,7 |
| 2 | Paperclip GitHub | official | high | 1,2,4,5,6,7 |
| 3 | OpenClaw Docs | official | high | 2,5 |
| 4 | Paperclip Docs | official | high | 2,3,5,7 |
| 5 | Proactive Agent Skill | official | high | 3 |
| 6 | MindStudio Heartbeat Article | community | medium | 3 |
| 7 | DigitalOcean: What is OpenClaw | community | medium | 1 |
| 8 | Medium: How OpenClaw Works | community | medium | 2 |
| 9 | Medium: Paperclip as Company | community | medium | 5 |
| 10 | awesome-openclaw-skills | community | medium | 6 |
| 11 | skills/agent-system-design/references/feature-map.md | codebase | high | 1 |
| 12 | skills/agent-system-design/references/security-patterns.md | codebase | high | 7 |