docs(plans): Agent Factory ultraplan + execution guide

27-step plan across 8 sessions in 3 waves for transforming agent-builder into Agent Factory v1.0.0. Includes research briefs, spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 07:35:29 +02:00
commit 7419d4283d
parent 075383990f
5 changed files with 2294 additions and 0 deletions
--- a/.claude/research/source-code-analysis-2026-04-11.md
+++ b/.claude/research/source-code-analysis-2026-04-11.md
@@ -0,0 +1,489 @@
+---
+type: source-code-analysis
+created: 2026-04-11
+repos_analyzed: [paperclipai/paperclip, openclaw/openclaw]
+purpose: "Implementation-level details for replicating best patterns in Agent Factory"
+---
+
+# Source Code Analysis: OpenClaw & Paperclip
+
+Repos were cloned and analyzed at code level on 2026-04-11. This document
+captures implementation details NOT available in docs or articles — the actual
+patterns, interfaces, and mechanisms worth replicating.
+
+## Critical Corrections (vs. docs/articles)
+
+Points where the docs and articles describe behavior differently from what the code implements:
+
+1. **Canvas/A2UI (OpenClaw) is NOT generative rendering.** It's a static file
+   server. Agents write files to a workspace directory, the canvas-host serves
+   them over HTTP. No server-side rendering, no UI generation. This is NOT a
+   meaningful capability gap for Claude Code.
+
+2. **Goal hierarchy (Paperclip) is a simple adjacency list.** Just a `parent_id`
+   FK on the `goals` table. No recursive traversal at runtime — only the directly
+   referenced goal is passed to agents in `context_snapshot`. Docs said "full
+   ancestry" but that's aspirational, not implemented.
+
+3. **Budget enforcement (Paperclip) is post-hoc, not atomic.** Checked AFTER each
+   run via `evaluateCostEvent()`: reads `SUM(cost_cents)`, compares with policy,
+   pauses agent if exceeded. No pre-run budget reservation, so a run can overshoot
+   the limit before the pause takes effect. Robust enough in practice.
+
+4. **OpenClaw has real vector memory.** Not just MEMORY.md files. Uses `sqlite-vec`
+   extension for vector search with embedding providers (Gemini, Mistral, Ollama,
+   OpenAI, Voyage, Bedrock, local llama). This is significantly more sophisticated
+   than file-based memory.
+
+---
+
+## Paperclip Implementation Details
+
+### Heartbeat Scheduler
+
+**File:** `server/src/services/heartbeat.ts` (4534 lines)
+
+Poll-based, not event-driven. `tickTimers()` iterates all agents on each tick:
+
+```typescript
+tickTimers: async (now = new Date()) => {
+  const allAgents = await db.select().from(agents);
+  for (const agent of allAgents) {
+    if (["paused", "terminated", "pending_approval"].includes(agent.status)) continue;
+    const policy = parseHeartbeatPolicy(agent);
+    if (!policy.enabled || policy.intervalSec <= 0) continue;
+    const elapsed = now.getTime() - new Date(agent.lastHeartbeatAt ?? agent.createdAt).getTime();
+    if (elapsed < policy.intervalSec * 1000) continue;
+    await enqueueWakeup(agent.id, { source: "timer" });
+  }
+}
+```
+
+Heartbeat policy from `agent.runtimeConfig.heartbeat`:
+- `enabled: boolean`
+- `intervalSec: number`
+- `wakeOnDemand: boolean`
+- `maxConcurrentRuns: 1-10`
+
+4 wakeup triggers: `timer`, `assignment`, `on_demand`, `automation`.
+
+Concurrency control: in-process promise chain per agent (`startLocksByAgent` Map).
+Not distributed — single server process only.
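The per-agent promise chain can be sketched as follows. This is a minimal illustration under stated assumptions, not Paperclip's code: `runExclusive` is a hypothetical helper, and only the `startLocksByAgent` name comes from the analysis. Each task is chained onto the previous task for the same agent, so starts for one agent are serialized while other agents proceed independently.

```typescript
// Hypothetical sketch of an in-process, per-agent lock built from a promise chain.
// Only the map name `startLocksByAgent` is taken from the analysis; the rest is assumed.
const startLocksByAgent = new Map<string, Promise<void>>();

function runExclusive<T>(agentId: string, task: () => Promise<T>): Promise<T> {
  // Tail of the chain for this agent, or an already-resolved promise if idle.
  const previous = startLocksByAgent.get(agentId) ?? Promise.resolve();
  // The stored tail never rejects, so chaining with a single handler is enough.
  const result = previous.then(() => task());
  const tail: Promise<void> = result.then(
    () => undefined,
    () => undefined, // swallow errors here; the caller still sees them via `result`
  );
  startLocksByAgent.set(agentId, tail);
  // Remove the entry once the chain is idle so the map does not grow forever.
  void tail.then(() => {
    if (startLocksByAgent.get(agentId) === tail) startLocksByAgent.delete(agentId);
  });
  return result;
}
```

Because the map lives in process memory, this only serializes work inside one server process, which matches the single-process limitation noted above.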
+
+### Run Lifecycle
+
+1. `enqueueWakeup()` → insert `heartbeat_runs` (status=queued) + `agent_wakeup_requests`
+2. `startNextQueuedRunForAgent()` → check running count vs maxConcurrentRuns
+3. `claimQueuedRun()` → `UPDATE heartbeat_runs SET status='running' WHERE status='queued'`
+4. `executeRun()` → call `adapter.execute()`, stream output via `onLog`
+5. On completion → update runs, runtime state, task sessions, create cost events
+6. Orphan detection: `reapOrphanedRuns()` checks PIDs, auto-retries once
+
+### Adapter Interface
+
+**File:** `packages/adapter-utils/src/types.ts`
+
+```typescript
+interface ServerAdapterModule {
+  type: string;
+  execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>;
+  testEnvironment(ctx: AdapterEnvironmentTestContext): Promise<AdapterEnvironmentTestResult>;
+  listSkills?: (ctx) => Promise<AdapterSkillSnapshot>;
+  syncSkills?: (ctx, desiredSkills) => Promise<AdapterSkillSnapshot>;
+  sessionCodec?: AdapterSessionCodec;
+  sessionManagement?: AdapterSessionManagement;
+}
+```
+
+10 built-in adapters: `claude_local`, `codex_local`, `cursor_local`, `gemini_local`,
+`openclaw_gateway`, `opencode_local`, `pi_local`, `hermes_local`, `process`, `http`.
+
+### Claude Adapter Execution
+
+**File:** `packages/adapters/claude-local/src/server/execute.ts`
+
+Invokes CLI as:
+```
+claude --print - --output-format stream-json --verbose \
+  [--resume <sessionId>] \
+  [--dangerously-skip-permissions] \
+  [--model <model>] \
+  [--max-turns N] \
+  [--append-system-prompt-file <file>] \
+  [--add-dir <skillsDir>]
+```
+
+Prompt composed from: `bootstrapPromptTemplate` (fresh sessions only) + wake payload +
+session handoff note + main `promptTemplate`. Template variables: `{{agent.id}}`,
+`{{agent.name}}`, `{{context.wakeReason}}`, etc.
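A rough sketch of the composition order and placeholder substitution described above. `composePrompt` and `renderTemplate` are hypothetical names, and the blank-line separator and the choice to render every section through the template are assumptions, not details confirmed by the Paperclip source.

```typescript
// Hypothetical sketch; function names, separator, and rendering rules are assumptions.
type TemplateValue = string | number | boolean | TemplateVars;
interface TemplateVars {
  [key: string]: TemplateValue;
}

function renderTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
    let current: TemplateValue | undefined = vars;
    for (const key of path.split(".")) {
      if (typeof current !== "object") return match; // path goes deeper than the data
      current = current[key];
    }
    // Unknown or non-scalar placeholders are left untouched so gaps stay visible.
    return current === undefined || typeof current === "object" ? match : String(current);
  });
}

interface PromptParts {
  bootstrap?: string; // `bootstrapPromptTemplate`
  wakePayload?: string;
  handoffNote?: string;
  main: string; // `promptTemplate`
}

function composePrompt(parts: PromptParts, vars: TemplateVars, isFreshSession: boolean): string {
  const sections = [
    isFreshSession ? parts.bootstrap : undefined, // bootstrap only on fresh sessions
    parts.wakePayload,
    parts.handoffNote,
    parts.main,
  ];
  return sections
    .filter((s): s is string => typeof s === "string" && s.length > 0)
    .map((s) => renderTemplate(s, vars))
    .join("\n\n");
}
```

Leaving unresolved placeholders in the output is a deliberate choice in this sketch: a literal `{{context.wakeReason}}` in a prompt is easier to spot than a silently empty string.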
+
+### Task Checkout (Atomic Locking)
+
+**File:** `server/src/services/heartbeat.ts` (lines 3756-4010)
+
+Issues carry an `execution_run_id` column that acts as a soft lock. The claim uses PostgreSQL row-level locking:
+
+```sql
+SELECT id FROM issues WHERE id = $1 AND company_id = $2 FOR UPDATE
+```
+
+Then conditional update:
+```sql
+UPDATE issues SET execution_run_id = $claimed_id
+WHERE id = $issue_id AND (execution_run_id IS NULL OR execution_run_id = $claimed_id)
+```
+
+When the same agent already has a run in progress → coalesce (merge context).
+When a different agent holds the lock → defer (status `deferred_issue_execution`); the deferred run is promoted when the original completes.
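The claim, coalesce, and defer outcomes can be modeled in memory as below. This is a hypothetical sketch: `IssueLock`, `lockedByAgentId`, and `checkoutIssue` are invented for illustration, and the single-threaded function stands in for what the row lock plus conditional `UPDATE` guarantee in PostgreSQL.

```typescript
// Hypothetical in-memory model of the checkout rules; all names here are assumed.
interface IssueLock {
  executionRunId: string | null; // soft lock: the run currently holding the issue
  lockedByAgentId: string | null; // the agent that owns that run
}

type CheckoutResult = "claimed" | "coalesced" | "deferred";

function checkoutIssue(issue: IssueLock, runId: string, agentId: string): CheckoutResult {
  // Mirrors `execution_run_id IS NULL OR execution_run_id = $claimed_id`.
  if (issue.executionRunId === null || issue.executionRunId === runId) {
    issue.executionRunId = runId;
    issue.lockedByAgentId = agentId;
    return "claimed";
  }
  // Same agent is already running on this issue: merge the new context into that run.
  if (issue.lockedByAgentId === agentId) return "coalesced";
  // A different agent holds the lock: wait until the original run completes.
  return "deferred";
}
```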
+
+### Budget Enforcement
+
+**File:** `server/src/services/budgets.ts`
+
+Schema:
+```
+budget_policies: scope_type (company|agent|project), scope_id, metric (billed_cents),
+  window_kind (calendar_month_utc|lifetime), amount (cents), warn_percent (80),
+  hard_stop_enabled, notify_enabled
+```
+
+Flow after each run:
+1. Load active policies for company/agent/project
+2. `SELECT SUM(cost_cents) FROM cost_events` filtered by window
+3. If >= soft threshold → create `budget_incidents` (type soft)
+4. If >= amount AND hard_stop → `pauseScopeForBudget()` → `UPDATE agents SET status='paused'`
+   → `cancelBudgetScopeWork()` → SIGTERM → SIGKILL (with graceSec)
+
+Pre-run check: `getInvocationBlock()` only checks `paused` flag, not live budget sum.
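The post-run decision in the flow above reduces to a small pure function. A minimal sketch, assuming the soft threshold is `amount * warn_percent / 100` (the analysis names `warn_percent` but does not spell out that formula); `evaluateBudget` and the verdict names are hypothetical.

```typescript
// Hypothetical pure version of the post-run budget check; field names mirror the schema.
interface BudgetPolicy {
  amountCents: number; // `amount`
  warnPercent: number; // `warn_percent`, e.g. 80
  hardStopEnabled: boolean; // `hard_stop_enabled`
}

type BudgetVerdict = "ok" | "soft_incident" | "hard_stop";

function evaluateBudget(policy: BudgetPolicy, spentCents: number): BudgetVerdict {
  // Hard stop wins when enabled and the full amount has been reached.
  if (policy.hardStopEnabled && spentCents >= policy.amountCents) return "hard_stop";
  // Assumed formula: the soft threshold is a percentage of the policy amount.
  const softThresholdCents = (policy.amountCents * policy.warnPercent) / 100;
  return spentCents >= softThresholdCents ? "soft_incident" : "ok";
}
```

Since `spentCents` is only summed after a run finishes, the run that crosses the limit has already been paid for by the time the verdict is `hard_stop`.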
+
+### Skills System
+
+Skills are injected through a per-run temp directory of symlinks:
+```typescript
+async function buildSkillsDir(config) {
+  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-skills-"));
+  const target = path.join(tmp, ".claude", "skills");
+  await fs.mkdir(target, { recursive: true });
+  for (const entry of availableEntries) {
+    if (!desiredNames.has(entry.key)) continue;
+    await fs.symlink(entry.source, path.join(target, entry.runtimeName));
+  }
+  return tmp; // Passed as: claude --add-dir <skillsDir>
+}
+```
+
+Company skills stored in DB: `company_skills` table with `markdown` content,
+`source_type` (github|url|local_path|skills_sh), `file_inventory`, `trust_level`.
+
+### Session Persistence
+
+`agent_task_sessions` table: unique on `(companyId, agentId, adapterType, taskKey)`.
+- taskKey = issueId (for issue-scoped) or `"__heartbeat__"` (timer-only)
+- sessionParamsJson = adapter-specific (Claude stores `{ sessionId, cwd }`)
+- Upserted after each run completion
+
+Session compaction: rotate after 200 runs, 2M raw input tokens, or 72h age.
+Claude adapter: `nativeContextManagement: "confirmed"` → compaction disabled
+(Claude manages its own context window).
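The rotation rule can be sketched as a predicate over per-session counters. The thresholds come from the notes above; `SessionStats`, `shouldRotateSession`, the `adapterManagesContext` flag, and the use of `>=` at each boundary are assumptions made for illustration.

```typescript
// Hypothetical sketch of the session rotation check; names and boundary handling are assumed.
interface SessionStats {
  runCount: number;
  rawInputTokens: number;
  startedAt: Date;
}

const MAX_RUNS = 200;
const MAX_RAW_INPUT_TOKENS = 2_000_000;
const MAX_AGE_MS = 72 * 60 * 60 * 1000; // 72 hours

function shouldRotateSession(stats: SessionStats, now: Date, adapterManagesContext = false): boolean {
  // Adapters that manage their own context window (the Claude adapter case) skip compaction.
  if (adapterManagesContext) return false;
  return (
    stats.runCount >= MAX_RUNS ||
    stats.rawInputTokens >= MAX_RAW_INPUT_TOKENS ||
    now.getTime() - stats.startedAt.getTime() >= MAX_AGE_MS
  );
}
```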
+
+### Org Chart
+
+Just `agents.reportsTo` self-referential FK. `agents.role` text field.
+Rendered as SVG server-side (5 visual styles). No separate table.
+
+### Database Schema
+
+PostgreSQL via Drizzle ORM. 55 migrations. Key tables:
+- `companies` — tenant root, status, budget
+- `agents` — adapter_type, adapter_config (jsonb), runtime_config (jsonb), reports_to, status, budget
+- `goals` — self-referencing parent_id, level (company/project/task), owner_agent_id
+- `issues` — FK to goals/projects/agents, execution_run_id (soft lock), parent_id
+- `heartbeat_runs` — status, context_snapshot (jsonb), session_id, process_pid, usage_json
+- `agent_wakeup_requests` — wake queue with status enum
+- `agent_task_sessions` — per-(agent, adapter, taskKey) session state
+- `budget_policies` / `budget_incidents` / `cost_events` — cost control
+- `company_skills` — skill definitions with markdown content
+- `approvals` — human approval requests
+- `routines` — scheduled workflows with cron expressions
+
+### Agent Configuration Format
+
+```json
+{
+  "adapterConfig": {
+    "command": "claude",
+    "model": "claude-opus-4-5",
+    "cwd": "/path/to/project",
+    "promptTemplate": "You are agent {{agent.name}}...",
+    "instructionsFilePath": "/path/to/AGENTS.md",
+    "dangerouslySkipPermissions": true,
+    "maxTurnsPerRun": 0,
+    "timeoutSec": 0,
+    "graceSec": 20,
+    "skills": ["paperclipai/paperclip/mcp-server"]
+  },
+  "runtimeConfig": {
+    "heartbeat": {
+      "enabled": true,
+      "intervalSec": 300,
+      "wakeOnDemand": true,
+      "maxConcurrentRuns": 1
+    }
+  }
+}
+```
+
+---
+
+## OpenClaw Implementation Details
+
+### Gateway
+
+**File:** `src/gateway/server.impl.ts`
+
+WebSocket server on port 18789. Flat dispatch table:
+```typescript
+const coreGatewayHandlers: Record<string, GatewayRequestHandler> = {
+  ...connectHandlers, ...chatHandlers, ...cronHandlers,
+  ...skillsHandlers, ...sessionsHandlers, ...agentHandlers,
+  ...channelsHandlers, ...modelsHandlers, // 28 handler groups
+}
+```
+
+Auth: roles (`operator` | `node`), operator scopes
+(`admin`, `read`, `write`, `approvals`, `pairing`).
+
+### Skills System
+
+**Files:** `src/agents/skills/workspace.ts`, `skill-contract.ts`
+
+Skill = directory with SKILL.md. Frontmatter parsed for metadata.
+
+Loading limits:
+- Max 300 candidates per root
+- Max 200 loaded per source
+- Max 150 in prompt
+- Max 30,000 chars in prompt
+- Max 256 KB per skill file
+
+Prompt format (XML):
+```xml
+<available_skills>
+  <skill>
+    <name>github</name>
+    <description>...</description>
+    <location>~/.openclaw/workspace/skills/github/SKILL.md</location>
+  </skill>
+</available_skills>
+```
+
+Path compaction: home dir → `~` (saves 5-6 tokens per path).
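A sketch of how the prompt limits and path compaction might combine when building the XML block. `buildSkillsPrompt`, `compactPath`, and `escapeXml` are hypothetical helpers; the defaults reuse the 150-skill and 30,000-character limits listed above, and stopping at the first skill that would exceed the budget is an assumption.

```typescript
// Hypothetical sketch of building the <available_skills> prompt block.
interface SkillEntry {
  name: string;
  description: string;
  location: string; // absolute path to SKILL.md
}

function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Replace a leading home directory with `~` to save prompt tokens.
function compactPath(filePath: string, homeDir: string): string {
  const underHome = filePath === homeDir || filePath.indexOf(homeDir + "/") === 0;
  return underHome ? "~" + filePath.slice(homeDir.length) : filePath;
}

function buildSkillsPrompt(skills: SkillEntry[], homeDir: string, maxSkills = 150, maxChars = 30_000): string {
  const blocks: string[] = [];
  // Start from the length of the empty wrapper; each block adds its text plus one newline.
  let usedChars = "<available_skills>\n</available_skills>".length;
  for (const skill of skills.slice(0, maxSkills)) {
    const block = [
      "  <skill>",
      `    <name>${escapeXml(skill.name)}</name>`,
      `    <description>${escapeXml(skill.description)}</description>`,
      `    <location>${escapeXml(compactPath(skill.location, homeDir))}</location>`,
      "  </skill>",
    ].join("\n");
    if (usedChars + block.length + 1 > maxChars) break; // character budget exhausted
    blocks.push(block);
    usedChars += block.length + 1;
  }
  return ["<available_skills>", ...blocks, "</available_skills>"].join("\n");
}
```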
+
+Skill metadata fields: `always`, `skillKey`, `emoji`, `homepage`, `os`,
+`requires` (bins, anyBins, env, config), `install` specs (brew, node, go, uv, download).
+
+ClawHub integration for remote skill registry (search, install, update).
+
+### Memory System
+
+**Files:** `packages/memory-host-sdk/`
+
+Two backends:
+- `builtin` — SQLite + sqlite-vec extension for vector search
+- `qmd` — External QuickMemory Daemon process
+
+Embedding providers: Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local (node-llama).
+
+Interface:
+```typescript
+interface MemorySearchManager {
+  search(query, opts?: { maxResults?, minScore?, sessionKey? }): Promise<MemorySearchResult[]>
+  readFile(params): Promise<{ text, path }>
+  status(): MemoryProviderStatus
+  sync?(params?): Promise<void>
+}
+```
+
+Session transcripts indexable into memory backend. MEMORY.md / memory.md as default
+memory file convention.
+
+### HEARTBEAT Mechanism
+
+**File:** `src/auto-reply/heartbeat.ts`
+
+Default prompt:
+```
+"Read HEARTBEAT.md if it exists. Follow it strictly. Do not infer or repeat
+old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK."
+```
+
+HEARTBEAT.md task format:
+```yaml
+tasks:
+  - name: email-check
+    interval: 30m
+    prompt: "Check for urgent unread emails"
+```
+
+Key functions:
+- `isHeartbeatContentEffectivelyEmpty()` — skips API calls when file has only
+  headers/empty items. Saves significant cost.
+- `parseHeartbeatTasks()` — parses YAML tasks block
+- `isTaskDue()` — checks intervals against last-run timestamps
+- `stripHeartbeatToken()` — strips HEARTBEAT_OK from responses; responses
+  under `ackMaxChars` (300) suppressed from chat
+
+**HeartbeatRunner** (`infra/heartbeat-runner.ts`):
+- Per-agent intervals (default 30m)
+- `HeartbeatAgentState` tracks lastRunMs, nextDueMs, intervalMs
+- On fire: reads HEARTBEAT.md, builds prompt, dispatches inbound message
+
+### Cron Service
+
+**File:** `src/cron/service.ts`
+
+Three schedule types:
+- `{ kind: "at"; at: string }` — one-shot
+- `{ kind: "every"; everyMs: number }` — interval
+- `{ kind: "cron"; expr: string; tz?: string; staggerMs?: number }` — cron expression
+
+Two job payload types:
+- `systemEvent` — injects text into existing session (needs attention available)
+- `agentTurn` — fires full agent turn (true background autonomy)
+
+Session targets: `"main" | "isolated" | "current" | "session:<id>"`.
+Isolated gets own session key with freshness/rollover logic.
+
+Startup catchup: runs up to 5 missed jobs immediately, staggers rest (5s gap).
+Failure alerts after N consecutive errors, 1h cooldown.
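Startup catchup with stagger can be sketched as a planning step that assigns each missed job a start delay. `planStartupCatchup` is a hypothetical name, and reading "5s gap" as "each job beyond the first five starts 5 seconds after the previous one" is an assumption about the intended schedule.

```typescript
// Hypothetical planner for missed jobs at startup; the delay formula is an assumption.
interface PlannedRun {
  jobId: string;
  delayMs: number; // how long after startup the job should fire
}

function planStartupCatchup(missedJobIds: string[], immediateLimit = 5, staggerMs = 5_000): PlannedRun[] {
  return missedJobIds.map((jobId, index) => ({
    jobId,
    // The first `immediateLimit` jobs run right away; each later job waits one more step.
    delayMs: index < immediateLimit ? 0 : (index - immediateLimit + 1) * staggerMs,
  }));
}
```

Spacing out the tail keeps a restart after long downtime from launching every overdue job at the same instant.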
+
+### Multi-Agent Routing
+
+Session key format: `agent:<agentId>:<key>`
+Type detection via: `isCronSessionKey()`, `isSubagentSessionKey()`, `isAcpSessionKey()`
+
+Per-agent isolation: own workspace, session store, skill set, heartbeat config, model config.
+
+Subagent spawning: ACP-based, session depth tracked in keys, reactivation support.
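A small sketch of building and parsing the `agent:<agentId>:<key>` format. The helper names are hypothetical, and the sketch assumes agent IDs never contain a colon while the trailing key may.

```typescript
// Hypothetical helpers for the session key format; assumes agent IDs contain no colons.
interface AgentSessionKey {
  agentId: string;
  key: string;
}

function buildAgentSessionKey(agentId: string, key: string): string {
  return `agent:${agentId}:${key}`;
}

function parseAgentSessionKey(sessionKey: string): AgentSessionKey | null {
  // Everything after the second colon belongs to the key, including further colons.
  const match = /^agent:([^:]+):(.+)$/.exec(sessionKey);
  return match ? { agentId: match[1], key: match[2] } : null;
}
```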
+
+### Channel Adapter Interface
+
+**File:** `src/channels/plugins/types.plugin.ts`
+
+```typescript
+type ChannelPlugin<ResolvedAccount> = {
+  id: ChannelId;
+  meta: ChannelMeta;
+  capabilities: ChannelCapabilities;
+  outbound?: ChannelOutboundAdapter;
+  messaging?: ChannelMessagingAdapter;
+  lifecycle?: ChannelLifecycleAdapter;
+  heartbeat?: ChannelHeartbeatAdapter;
+  security?: ChannelSecurityAdapter;
+  agentTools?: ChannelAgentToolFactory;
+  streaming?: ChannelStreamingAdapter;
+  threading?: ChannelThreadingAdapter;
+  // ~15 optional adapter slots total
+}
+```
+
+Restart policy: exponential backoff (5s initial, 5min max, factor 2, jitter 0.1,
+max 10 attempts).
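The restart policy parameters translate into a delay function like the one below. A minimal sketch: `restartDelayMs` is a hypothetical name, and treating a jitter of 0.1 as a symmetric ±10% spread around the capped delay is an assumption about how the value is applied.

```typescript
// Hypothetical sketch of exponential backoff with jitter using the listed parameters.
interface RestartPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number; // assumed to mean a symmetric fraction of the delay
  maxAttempts: number;
}

const DEFAULT_RESTART_POLICY: RestartPolicy = {
  initialMs: 5_000,
  maxMs: 300_000,
  factor: 2,
  jitter: 0.1,
  maxAttempts: 10,
};

// Returns the delay before restart attempt `attempt` (1-based), or null once attempts run out.
function restartDelayMs(
  attempt: number,
  policy: RestartPolicy = DEFAULT_RESTART_POLICY,
  random: () => number = Math.random,
): number | null {
  if (attempt > policy.maxAttempts) return null;
  const base = Math.min(policy.initialMs * Math.pow(policy.factor, attempt - 1), policy.maxMs);
  // Map random() in [0, 1) onto [-jitter, +jitter) of the base delay.
  const spread = base * policy.jitter * (random() * 2 - 1);
  return Math.round(base + spread);
}
```

Passing `random` as a parameter keeps the function deterministic in tests; with the defaults, the uncapped delay would pass 5 minutes at attempt 7, so attempts 7 through 10 all wait roughly the maximum.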
+
+### Security
+
+- Exec approval: `ExecApprovalManager` with promise-based flow, `allow-once` vs
+  `allow-always`, 15s grace timeout
+- Tool policy: `pickSandboxToolPolicy()` per sandbox config
+- Security audit: comprehensive checks (gateway auth, channel config, plugin trust,
+  exec surfaces, filesystem ACLs)
+- Auth rate limiting with browser-specific stricter limits
+- External content guard: tracks provenance, `allowUnsafeExternalContent` flag
+
+### Agent Configuration
+
+```yaml
+agents:
+  defaults:
+    model:
+      primary: "anthropic/claude-opus-4-5"
+      fallbacks: ["anthropic/claude-sonnet-4-5"]
+    heartbeat:
+      enabled: true
+      every: "30m"
+      prompt: "Check HEARTBEAT.md"
+      ackMaxChars: 300
+    skills:
+      limits:
+        maxSkillsInPrompt: 150
+        maxSkillsPromptChars: 30000
+  list:
+    - id: "myagent"
+      workspace: "~/workspace"
+      model:
+        primary: "anthropic/claude-sonnet-4-5"
+      heartbeat:
+        every: "1h"
+      skills:
+        filter: ["github", "slack"]
+```
+
+Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md,
+HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md.
+
+### Plugin Hooks
+
+29 lifecycle hook points:
+`before_model_resolve`, `before_prompt_build`, `before_agent_start`,
+`before_agent_reply`, `llm_input`, `llm_output`, `agent_end`,
+`inbound_claim`, `message_received`, `message_sending`, `message_sent`,
+`before_tool_call`, `after_tool_call`, `session_start`, `session_end`,
+`subagent_spawning`, `subagent_delivery_target`, `gateway_start`,
+`gateway_stop`, `before_dispatch`, `reply_dispatch`, `before_install`, etc.
+
+---
+
+## Patterns Worth Replicating in Agent Factory
+
+### From Paperclip
+
+1. **Heartbeat as context injection** — Each beat starts clean, loads curated context
+   packet. Maps to: `/schedule` trigger + CLAUDE.md + memory files loaded per session.
+
+2. **Adapter interface** — Clean `execute(ctx)` pattern. Maps to: our agent files
+   are already adapter-like (model, tools, prompt per agent).
+
+3. **Budget as governance primitive** — Post-hoc cost tracking with pause thresholds.
+   Maps to: hook that reads `/usage` after each run, logs to cost-events file,
+   alerts when threshold crossed.
+
+4. **Task checkout via file locking** — Paperclip uses PostgreSQL. We can use
+   file-based locking (write `task.lock` with agent name, check before claiming).
+
+5. **Session persistence via taskKey** — Different tasks get different sessions.
+   Maps to: `--resume` with task-specific session IDs.
+
+### From OpenClaw
+
+6. **HEARTBEAT.md with task parsing** — YAML tasks block with intervals and
+   due-time checking. Maps directly to our generated HEARTBEAT.md files.
+
+7. **Emptiness detection** — Skip API calls when heartbeat file is effectively empty.
+   Critical cost saver. Include in generated heartbeat scripts.
+
+8. **Skill prompt XML format** — Standardized skill discovery in system prompt.
+   Our skills already use this via Claude Code's built-in mechanism.
+
+9. **3-tier memory** — SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold).
+   Maps to: templates we generate in the user's project.
+
+10. **Startup catchup with stagger** — Run missed jobs on restart, but don't
+    thundering-herd. Include in generated automation scripts.
+
+### Unique to Agent Factory
+
+11. **Guided construction** — Neither tool helps you BUILD the system. We do.
+12. **Progressive complexity** — Start with 1 agent, grow to full org.
+13. **Domain templates** — Not just researcher→writer→reviewer. Monitoring,
+    code review, data processing, research synthesis.
+14. **Claude Code-native** — No PostgreSQL, no Node.js server, no Docker required.
+    Just agents, skills, hooks, settings.json, /schedule.
--- a/.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md
+++ b/.claude/research/ultraresearch-2026-04-11-openclaw-paperclip-agent-frameworks.md
@@ -0,0 +1,215 @@
+---
+type: ultraresearch-brief
+created: 2026-04-11
+question: "Research OpenClaw and Paperclip agent frameworks to find inspiration and concrete value proposition for agent-builder plugin"
+confidence: 0.92
+dimensions: 7
+mcp_servers_used: []
+local_agents_used: [Explore]
+external_agents_used: [WebFetch, WebSearch]
+source_code_analyzed: [paperclip, openclaw]
+target_audience: "Claude Code users who know the primitives but need help composing agent systems"
+---
+
+# OpenClaw & Paperclip Agent Framework Research
+
+> Generated by ultraresearch-local on 2026-04-11
+
+## Research Question
+
+What features, architecture patterns, and capabilities do OpenClaw and Paperclip offer, and what can we learn from them to create a Claude Code plugin that makes it easy for anyone to build genuinely useful, self-running agent systems?
+
+## Executive Summary
+
+OpenClaw (354k stars) excels at individual agent capability: 23+ messaging channels, 5400+ skills, proactive agent patterns with self-improvement guardrails, and 3-tier memory systems. Paperclip (51k stars) excels at organizational coordination: heartbeat scheduling, goal hierarchies, budget enforcement, and governance. Neither offers guided, agent-assisted construction of complete agent systems, which is the unique gap our plugin fills. Confidence is high for OpenClaw (verified via docs, GitHub, and existing codebase references) and medium for Paperclip (verified via docs site, GitHub, and multiple third-party articles).
+
+## Dimensions
+
+### 1. Core Capabilities -- Confidence: high
+
+**OpenClaw:**
+- Personal AI assistant running on your own devices
+- 23+ messaging channels (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, IRC, Teams, Matrix, and more)
+- 100+ preconfigured AgentSkills for shell, file, and web automation
+- Canvas/A2UI — agent-driven visual workspace (unique capability, no Claude Code equivalent)
+- Browser control via dedicated Chrome/Chromium with CDP
+- Voice capabilities with wake words (macOS/iOS) and continuous voice (Android)
+- Device node system (camera, screen recording, location, notifications)
+- Model-agnostic: Claude, GPT, Gemini, Ollama all supported
+- Source: [GitHub](https://github.com/openclaw/openclaw), [DigitalOcean guide](https://www.digitalocean.com/resources/articles/what-is-openclaw)
+
+**Paperclip:**
+- Orchestration platform for teams of AI agents ("If OpenClaw is an employee, Paperclip is the company")
+- Agent-agnostic: supports OpenClaw, Claude Code, Codex, Cursor, bash, HTTP webhooks
+- Explicitly NOT an agent framework — doesn't build agents, organizes them
+- Explicitly NOT a chatbot, workflow builder, or prompt manager
+- Ticket-based task management with threaded conversations
+- Multi-company support with complete data isolation
+- Source: [GitHub](https://github.com/paperclipai/paperclip), [paperclip.ing](https://paperclip.ing/docs)
+
+### 2. Architecture & Patterns -- Confidence: high
+
+**OpenClaw architecture:**
+- Gateway control plane on ws://127.0.0.1:18789
+- Channel Adapters transform protocol-specific input into unified message objects
+- Multi-agent routing: isolated sessions per agent, workspace, or sender
+- Pi agent runtime in RPC mode with tool/block streaming
+- Node.js + TypeScript, pnpm, WebSocket protocol
+- Source: [GitHub README](https://github.com/openclaw/openclaw), [Medium architecture article](https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764)
+
+**Paperclip architecture:**
+- Node.js backend + React UI + PostgreSQL
+- Company-as-runtime model: agents modeled as employees
+- Heartbeat scheduler fires agent execution at defined intervals
+- Each beat is stateless — state lives in external storage (Postgres)
+- Atomic operations for task checkout and budget enforcement
+- Source: [GitHub](https://github.com/paperclipai/paperclip), [Towards AI article](https://pub.towardsai.net/paperclip-the-open-source-operating-system-for-zero-human-companies-2c16f3f22182)
+
+### 3. Self-Learning & Autonomy -- Confidence: medium
+
+**OpenClaw — Proactive Agent Skill (most sophisticated pattern found):**
+- 3-tier memory: SESSION-STATE.md (working memory), memory/YYYY-MM-DD.md (daily capture), MEMORY.md (curated long-term)
+- WAL Protocol (Write-Ahead Logging): write important details BEFORE responding
+- Working Buffer Protocol: captures exchanges in "danger zone" (60%+ context) before compaction
+- Compaction Recovery: reads buffer, session state, daily notes, then searches all sources
+- Self-improvement guardrails:
+  - ADL (Anti-Drift Limits): no fake intelligence, no unverifiable mods, no novelty over stability
+  - VFM (Value-First Modification): score changes on frequency, failure reduction, burden reduction, cost savings. Only implement if score > 50
+  - Priority: Stability > Explainability > Reusability > Scalability > Novelty
+- Two cron types: `systemEvent` (needs attention) vs `isolated agentTurn` (true background autonomy)
+- Self-healing: try 5-10 approaches before asking for help
+- Source: [Proactive Agent Skill on GitHub](https://github.com/openclaw/skills/blob/main/skills/halthelobster/proactive-agent/SKILL.md)
+
+**Paperclip:**
+- Heartbeat model with context injection (Memento Man mental model)
+- Memory doesn't live in agent session — external storage maintains continuity
+- Context packets: curated payloads with memory state, task queue, recent events, agent config
+- No explicit self-learning mechanism documented, but rich audit trail enables pattern detection
+- Skills as markdown instruction files, installable via GitHub URLs
+- Source: [MindStudio heartbeat article](https://www.mindstudio.ai/blog/heartbeat-pattern-paperclip-ai-agents-24-7)
+
+### 4. User Experience & Onboarding -- Confidence: high
+
+**OpenClaw:**
+- `npm install -g openclaw@latest && openclaw onboard --install-daemon`
+- Requires Node 24 (recommended) or 22.16+
+- Has "Cowork" variant specifically because core is too hard for non-developers
+- Doctor CLI for troubleshooting and migrations
+- Pairing mode for security (unknown senders get pairing codes)
+- 3 release channels: stable, beta, dev
+- Source: [GitHub README](https://github.com/openclaw/openclaw)
+
+**Paperclip:**
+- `npx paperclipai onboard --yes` — quick start
+- React dashboard for agent management
+- Mobile-friendly interface
+- Requires Node 20+ and pnpm 9.15+
+- No guided construction — you configure agents manually
+- Source: [GitHub README](https://github.com/paperclipai/paperclip)
+
+### 5. Multi-Agent Orchestration -- Confidence: high
+
+**OpenClaw:**
+- Session tools for agent-to-agent communication: sessions_list, sessions_history, sessions_send
+- Reply-back mechanism for async coordination
+- Route channels/accounts/peers to isolated agents with dedicated workspaces
+- No organizational structure (flat, peer-to-peer)
+- Source: [GitHub README](https://github.com/openclaw/openclaw)
+
+**Paperclip:**
+- Org chart with hierarchies, roles, and reporting lines
+- Cascading delegation — work flows up and down org chart automatically
+- Goal-aware task execution with full ancestry
+- Atomic task checkout prevents double-work
+- Cross-team requests delegate to best agent
+- Human as "board of directors" with override authority
+- Source: [paperclip.ing/docs](https://paperclip.ing/docs), [Medium article](https://medium.com/@creativeaininja/paperclip-the-open-source-platform-turning-ai-agents-into-an-actual-company-7348015c5bf7)
+
+### 6. Extensibility & Integrations -- Confidence: medium
+
+**OpenClaw:**
+- Skills marketplace with 5400+ community skills (26% flagged with vulnerabilities)
+- Skills installed via URL with auto-updating
+- Plugin system and channel adapter architecture
+- Bundled/managed/workspace skill tiers
+- Source: [VoltAgent awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills)
+
+**Paperclip:**
+- Plugin ecosystem (awesome-paperclip curated list)
+- Runtime skill injection without retraining
+- Import/export of company templates
+- Skills as markdown files
+- Source: [GitHub README](https://github.com/paperclipai/paperclip)
+
+### 7. Deployment & Operations -- Confidence: high
+
+**OpenClaw:**
+- Docker-based containment (agent runs inside container — blast radius limited)
+- Tailscale Serve/Funnel for remote access
+- SSH tunnels with token/password auth
+- Nix declarative configuration
+- Always-on via daemon install
+- Source: [GitHub README](https://github.com/openclaw/openclaw)
+
+**Paperclip:**
+- Self-hosted, MIT, no mandatory accounts
+- Local-first: embedded Node.js + Postgres
+- Multi-company isolation on single infrastructure
+- Per-agent monthly budgets with automatic throttling
+- Immutable audit logs with full tool-call tracing
+- Config versioning with rollback
+- Source: [paperclip.ing/docs](https://paperclip.ing/docs)
+
+## Synthesis
+
+The critical insight is that OpenClaw and Paperclip operate at **different layers of the same stack**:
+
+- **OpenClaw** = the agent runtime layer (what an individual agent can do)
+- **Paperclip** = the orchestration layer (how agents coordinate as a team)
+- **Agent Factory** = the construction layer (how you build and configure both)
+
+Neither tool offers what our plugin does: a guided, interview-driven, AI-assisted workflow that generates a complete agent system from scratch. OpenClaw's "Cowork" variant exists precisely because the core tool is too hard for non-developers — this validates that there's demand for lower-barrier agent creation. Paperclip's manual configuration model means every agent needs hand-crafting before it can be "hired."
+
+The most powerful patterns to incorporate:
+
+1. **From OpenClaw:** 3-tier memory with WAL protocol, proactive agent pattern with self-improvement guardrails (ADL/VFM), isolated agentTurn for background autonomy
+2. **From Paperclip:** Heartbeat with context injection, goal hierarchy, budget enforcement, governance model ("autonomy is a privilege you grant")
+3. **Unique to us:** Progressive complexity (1 agent → full org), agent-guided construction, domain-specific templates, Claude Code-native (no external infrastructure)
+
+The security philosophies are complementary, not conflicting: OpenClaw uses containment (Docker, to limit the blast radius), while our plugin uses prevention (hooks/deny, to stop an action before it happens). Both should be available.
+
+## Open Questions
+
+- **Canvas/A2UI details** — What does OpenClaw's visual workspace actually generate? HTML? Native UI? Understanding this clarifies whether it's worth pursuing for Claude Code.
+- **Paperclip self-learning implementation** — The heartbeat + audit trail creates rich data, but no explicit feedback loop is documented. Is this a planned feature or deliberately excluded?
+- **OpenClaw skill security** — 26% of community skills flagged with vulnerabilities. What vetting process exists, and should we build one?
+- **Cowork UX** — What does OpenClaw's simplified non-developer experience look like? This directly informs our target audience.
+
+## Recommendation
+
+Build Agent Factory as a 5-phase evolution:
+
+1. **v0.2:** Fix existing gaps (3 missing commands, deployment-advisor, managed-agents skill, domain templates)
+2. **v0.3:** Incorporate OpenClaw patterns (3-tier memory, WAL, proactive agent, isolated cron)
+3. **v0.4:** Incorporate Paperclip patterns (heartbeat, goal hierarchy, budgets, governance, org-chart)
+4. **v0.5:** Self-learning systems (feedback loops, performance scoring, pipeline optimization)
+5. **v1.0:** Full integration (MCP integrations, Docker deployment, templates marketplace, import/export)
+
+The key differentiator throughout: every feature is accessible through guided, AI-assisted construction with progressive complexity — start simple, grow as needed.
+
+## Sources
+
+| # | Source | Type | Quality | Used in |
+|---|--------|------|---------|---------|
+| 1 | [OpenClaw GitHub](https://github.com/openclaw/openclaw) | official | high | 1,2,4,5,7 |
+| 2 | [Paperclip GitHub](https://github.com/paperclipai/paperclip) | official | high | 1,2,4,5,6,7 |
+| 3 | [OpenClaw Docs](https://docs.openclaw.ai) | official | high | 2,5 |
+| 4 | [Paperclip Docs](https://paperclip.ing/docs) | official | high | 2,3,5,7 |
+| 5 | [Proactive Agent Skill](https://github.com/openclaw/skills/blob/main/skills/halthelobster/proactive-agent/SKILL.md) | official | high | 3 |
+| 6 | [MindStudio Heartbeat Article](https://www.mindstudio.ai/blog/heartbeat-pattern-paperclip-ai-agents-24-7) | community | medium | 3 |
+| 7 | [DigitalOcean: What is OpenClaw](https://www.digitalocean.com/resources/articles/what-is-openclaw) | community | medium | 1 |
+| 8 | [Medium: How OpenClaw Works](https://bibek-poudel.medium.com/how-openclaw-works-understanding-ai-agents-through-a-real-architecture-5d59cc7a4764) | community | medium | 2 |
+| 9 | [Medium: Paperclip as Company](https://medium.com/@creativeaininja/paperclip-the-open-source-platform-turning-ai-agents-into-an-actual-company-7348015c5bf7) | community | medium | 5 |
+| 10 | [awesome-openclaw-skills](https://github.com/VoltAgent/awesome-openclaw-skills) | community | medium | 6 |
+| 11 | `skills/agent-system-design/references/feature-map.md` | codebase | high | 1 |
+| 12 | `skills/agent-system-design/references/security-patterns.md` | codebase | high | 7 |