Kjell Tore Guttormsen 7419d4283d docs(plans): Agent Factory ultraplan + execution guide

27-step plan across 8 sessions in 3 waves for transforming
agent-builder into Agent Factory v1.0.0. Includes research briefs,
spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 07:35:29 +02:00

type: source-code-analysis
created: 2026-04-11
repos_analyzed: paperclipai/paperclip, openclaw/openclaw
purpose: Implementation-level details for replicating best patterns in Agent Factory

Source Code Analysis: OpenClaw & Paperclip

Repos were cloned and analyzed at code level on 2026-04-11. This document captures implementation details NOT available in docs or articles — the actual patterns, interfaces, and mechanisms worth replicating.

Critical Corrections (vs. docs/articles)

These are things docs described differently than the code implements:

Canvas/A2UI (OpenClaw) is NOT generative rendering. It's a static file server. Agents write files to a workspace directory, the canvas-host serves them over HTTP. No server-side rendering, no UI generation. This is NOT a meaningful capability gap for Claude Code.
Goal hierarchy (Paperclip) is a simple adjacency list. Just a parent_id FK on the goals table. No recursive traversal at runtime — only the directly referenced goal is passed to agents in context_snapshot. Docs said "full ancestry" but that's aspirational, not implemented.
Budget enforcement (Paperclip) is post-hoc, not atomic. Checked AFTER each run via evaluateCostEvent(): reads SUM(cost_cents), compares with policy, pauses agent if exceeded. No pre-run budget reservation. Robust enough in practice.
OpenClaw has real vector memory. Not just MEMORY.md files. Uses sqlite-vec extension for vector search with embedding providers (Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local llama). This is significantly more sophisticated than file-based memory.

Paperclip Implementation Details

Heartbeat Scheduler

File: server/src/services/heartbeat.ts (4534 lines)

Poll-based, not event-driven. tickTimers() iterates all agents on each tick:

tickTimers: async (now = new Date()) => {
  const allAgents = await db.select().from(agents);
  for (const agent of allAgents) {
    // Skip agents that are not eligible for timer wakeups.
    if (["paused", "terminated", "pending_approval"].includes(agent.status)) continue;
    const policy = parseHeartbeatPolicy(agent);
    if (!policy.enabled || policy.intervalSec <= 0) continue;
    const elapsed = now.getTime() - new Date(agent.lastHeartbeatAt ?? agent.createdAt).getTime();
    if (elapsed < policy.intervalSec * 1000) continue;
    await enqueueWakeup(agent.id, { source: "timer" });
  }
}

Heartbeat policy from agent.runtimeConfig.heartbeat:

enabled: boolean
intervalSec: number
wakeOnDemand: boolean
maxConcurrentRuns: 1-10
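A minimal sketch of how such a policy could be normalized and checked for timer wakeups. Only the field names and the 1-10 range come from the analysis above; the default values and the helper names normalizeHeartbeatPolicy and isTimerDue are assumptions for illustration, not Paperclip's code.

```typescript
// Hypothetical normalization of agent.runtimeConfig.heartbeat.
// Field names follow the policy listed above; the defaults are assumptions.
interface HeartbeatPolicy {
  enabled: boolean;
  intervalSec: number;
  wakeOnDemand: boolean;
  maxConcurrentRuns: number; // clamped to 1-10
}

function normalizeHeartbeatPolicy(raw: Partial<HeartbeatPolicy> | undefined): HeartbeatPolicy {
  const requested = Math.floor(raw?.maxConcurrentRuns ?? 1);
  return {
    enabled: raw?.enabled ?? false,
    intervalSec: Math.max(0, raw?.intervalSec ?? 0),
    wakeOnDemand: raw?.wakeOnDemand ?? true,
    maxConcurrentRuns: Math.min(10, Math.max(1, requested)),
  };
}

// Same elapsed-time rule as tickTimers(): due once a full interval has passed.
function isTimerDue(policy: HeartbeatPolicy, lastBeatMs: number, nowMs: number): boolean {
  if (!policy.enabled || policy.intervalSec <= 0) return false;
  return nowMs - lastBeatMs >= policy.intervalSec * 1000;
}
```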

4 wakeup triggers: timer, assignment, on_demand, automation.

Concurrency control: in-process promise chain per agent (startLocksByAgent Map). Not distributed — single server process only.
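The promise-chain idea can be sketched as follows. This is an illustration of the general technique, assuming a Map keyed by agent ID; the names locksByAgent and withAgentLock are hypothetical and the real startLocksByAgent logic may differ.

```typescript
// Hypothetical per-agent serialization via a promise chain.
// Each new task is chained behind the previous one for the same agent.
const locksByAgent = new Map<string, Promise<void>>();

function withAgentLock<T>(agentId: string, task: () => Promise<T>): Promise<T> {
  const previous = locksByAgent.get(agentId) ?? Promise.resolve();
  // The stored tail never rejects, so the task always starts after it settles.
  const result = previous.then(() => task());
  // Swallow the outcome in the tail so one failure does not poison the chain.
  const tail: Promise<void> = result.then(
    () => undefined,
    () => undefined,
  );
  locksByAgent.set(agentId, tail);
  // Drop the map entry if nothing queued behind this task.
  void tail.then(() => {
    if (locksByAgent.get(agentId) === tail) locksByAgent.delete(agentId);
  });
  return result;
}
```

Because the map lives in process memory, this only serializes work inside one server process, which matches the single-process limitation noted above.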

Run Lifecycle

enqueueWakeup() → insert heartbeat_runs (status=queued) + agent_wakeup_requests
startNextQueuedRunForAgent() → check running count vs maxConcurrentRuns
claimQueuedRun() → UPDATE heartbeat_runs SET status='running' WHERE status='queued'
executeRun() → call adapter.execute(), stream output via onLog
On completion → update runs, runtime state, task sessions, create cost events
Orphan detection: reapOrphanedRuns() checks PIDs, auto-retries once
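The claim step is the part that prevents double execution. Below is an in-memory sketch of the same conditional transition, assuming a Map in place of the heartbeat_runs table. The terminal statuses "succeeded" and "failed" are assumptions; the source only names queued and running.

```typescript
// In-memory stand-in for the heartbeat_runs table; illustrative only.
type RunStatus = "queued" | "running" | "succeeded" | "failed";

interface Run {
  id: string;
  agentId: string;
  status: RunStatus;
}

// Equivalent of: UPDATE ... SET status='running' WHERE id = ? AND status = 'queued'.
// Returns true only for the caller that actually performed the transition.
function claimQueuedRun(runs: Map<string, Run>, runId: string): boolean {
  const run = runs.get(runId);
  if (!run || run.status !== "queued") return false;
  run.status = "running";
  return true;
}

// Starts the first queued run (insertion order) unless the agent is at its limit.
function startNextQueuedRun(runs: Map<string, Run>, agentId: string, maxConcurrentRuns: number): Run | null {
  const mine = Array.from(runs.values()).filter((r) => r.agentId === agentId);
  const running = mine.filter((r) => r.status === "running").length;
  if (running >= maxConcurrentRuns) return null;
  const next = mine.find((r) => r.status === "queued");
  return next && claimQueuedRun(runs, next.id) ? next : null;
}
```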

Adapter Interface

File: packages/adapter-utils/src/types.ts

interface ServerAdapterModule {
  type: string;
  execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>;
  testEnvironment(ctx: AdapterEnvironmentTestContext): Promise<AdapterEnvironmentTestResult>;
  listSkills?: (ctx) => Promise<AdapterSkillSnapshot>;
  syncSkills?: (ctx, desiredSkills) => Promise<AdapterSkillSnapshot>;
  sessionCodec?: AdapterSessionCodec;
  sessionManagement?: AdapterSessionManagement;
}

10 built-in adapters: claude_local, codex_local, cursor_local, gemini_local, openclaw_gateway, opencode_local, pi_local, hermes_local, process, http.

Claude Adapter Execution

File: packages/adapters/claude-local/src/server/execute.ts

Invokes CLI as:

claude --print - --output-format stream-json --verbose \
  [--resume <sessionId>] \
  [--dangerously-skip-permissions] \
  [--model <model>] \
  [--max-turns N] \
  [--append-system-prompt-file <file>] \
  [--add-dir <skillsDir>]

Prompt composed from: bootstrapPromptTemplate (fresh sessions only) + wake payload + session handoff note + main promptTemplate. Template variables: {{agent.id}}, {{agent.name}}, {{context.wakeReason}}, etc.

Task Checkout (Atomic Locking)

File: server/src/services/heartbeat.ts (lines 3756-4010)

Issues have execution_run_id column as soft lock. Uses PostgreSQL row-level locking:

SELECT id FROM issues WHERE id = $1 AND company_id = $2 FOR UPDATE

Then conditional update:

UPDATE issues SET execution_run_id = $claimed_id
WHERE id = $issue_id AND (execution_run_id IS NULL OR execution_run_id = $claimed_id)

When same agent has running run → coalesce (merge context). When different agent → defer (status deferred_issue_execution), promoted when original completes.
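The three outcomes (claim, coalesce, defer) can be illustrated with an in-memory analogue of the conditional UPDATE. This is a sketch under the assumption that each run ID can be mapped back to its owning agent; checkoutIssue and the runOwners map are hypothetical names.

```typescript
// Illustrative in-memory analogue of the conditional UPDATE above.
type CheckoutOutcome = "claimed" | "coalesce" | "defer";

interface Issue {
  id: string;
  executionRunId: string | null; // soft lock, like issues.execution_run_id
}

interface RunRef {
  runId: string;
  agentId: string;
}

function checkoutIssue(issue: Issue, incoming: RunRef, runOwners: Map<string, string>): CheckoutOutcome {
  // Free issue, or already held by this very run: take (or keep) the lock.
  if (issue.executionRunId === null || issue.executionRunId === incoming.runId) {
    issue.executionRunId = incoming.runId;
    return "claimed";
  }
  // Held by another run: merge context for the same agent, otherwise wait.
  const holderAgent = runOwners.get(issue.executionRunId);
  return holderAgent === incoming.agentId ? "coalesce" : "defer";
}
```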

Budget Enforcement

File: server/src/services/budgets.ts

Schema:

budget_policies: scope_type (company|agent|project), scope_id, metric (billed_cents),
  window_kind (calendar_month_utc|lifetime), amount (cents), warn_percent (80),
  hard_stop_enabled, notify_enabled

Flow after each run:

Load active policies for company/agent/project
SELECT SUM(cost_cents) FROM cost_events filtered by window
If >= soft threshold → create budget_incidents (type soft)
If >= amount AND hard_stop → pauseScopeForBudget() → UPDATE agents SET status='paused' → cancelBudgetScopeWork() → SIGTERM → SIGKILL (with graceSec)

Pre-run check: getInvocationBlock() only checks paused flag, not live budget sum.
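A sketch of the post-run evaluation for a calendar_month_utc window, assuming cost events are already loaded in memory. The type and function names are hypothetical, and returning a single most-severe verdict is a simplification of the flow above, where a soft incident and a hard stop are separate steps.

```typescript
// Post-hoc budget check over recorded cost events; a sketch, not budgets.ts.
interface BudgetPolicy {
  amountCents: number;
  warnPercent: number; // e.g. 80
  hardStopEnabled: boolean;
}

interface CostEvent {
  costCents: number;
  occurredAt: Date;
}

type BudgetVerdict = "ok" | "soft_incident" | "hard_stop";

// Start of the current UTC calendar month (the calendar_month_utc window).
function monthStartUtc(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

function evaluateBudget(policy: BudgetPolicy, events: CostEvent[], now: Date): BudgetVerdict {
  const windowStart = monthStartUtc(now).getTime();
  const spent = events
    .filter((e) => e.occurredAt.getTime() >= windowStart)
    .reduce((sum, e) => sum + e.costCents, 0);
  if (policy.hardStopEnabled && spent >= policy.amountCents) return "hard_stop";
  if (spent >= (policy.amountCents * policy.warnPercent) / 100) return "soft_incident";
  return "ok";
}
```

Since the sum is only read after a run finishes, a single expensive run can still overshoot the limit, which is the post-hoc behaviour described above.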

Skills System

Skills injected as symlinked tmpdir per run:

async function buildSkillsDir(config) {
  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-skills-"));
  const target = path.join(tmp, ".claude", "skills");
  await fs.mkdir(target, { recursive: true });
  for (const entry of availableEntries) {
    if (!desiredNames.has(entry.key)) continue;
    await fs.symlink(entry.source, path.join(target, entry.runtimeName));
  }
  return tmp; // Passed as: claude --add-dir <skillsDir>
}

Company skills stored in DB: company_skills table with markdown content, source_type (github|url|local_path|skills_sh), file_inventory, trust_level.

Session Persistence

agent_task_sessions table: unique on (companyId, agentId, adapterType, taskKey).

taskKey = issueId (for issue-scoped) or "__heartbeat__" (timer-only)
sessionParamsJson = adapter-specific (Claude stores { sessionId, cwd })
Upserted after each run completion

Session compaction: rotate after 200 runs, 2M raw input tokens, or 72h age. Claude adapter: nativeContextManagement: "confirmed" → compaction disabled (Claude manages its own context window).
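The rotation rule can be expressed as a small predicate. This is a sketch: the thresholds come from the text above, while the type names, the inclusive comparisons, and the boolean nativeContextManagement parameter are assumptions.

```typescript
// Hypothetical rotation check using the thresholds quoted above.
interface SessionStats {
  runCount: number;
  rawInputTokens: number;
  createdAt: Date;
}

interface CompactionPolicy {
  maxRuns: number;
  maxRawInputTokens: number;
  maxAgeHours: number;
}

const DEFAULT_COMPACTION: CompactionPolicy = {
  maxRuns: 200,
  maxRawInputTokens: 2_000_000,
  maxAgeHours: 72,
};

function shouldRotateSession(
  stats: SessionStats,
  now: Date,
  nativeContextManagement: boolean,
  policy: CompactionPolicy = DEFAULT_COMPACTION,
): boolean {
  // Adapters that manage their own context window (e.g. Claude) never rotate.
  if (nativeContextManagement) return false;
  const ageHours = (now.getTime() - stats.createdAt.getTime()) / 3_600_000;
  return (
    stats.runCount >= policy.maxRuns ||
    stats.rawInputTokens >= policy.maxRawInputTokens ||
    ageHours >= policy.maxAgeHours
  );
}
```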

Org Chart

Just agents.reportsTo self-referential FK. agents.role text field. Rendered as SVG server-side (5 visual styles). No separate table.

Database Schema

PostgreSQL via Drizzle ORM. 55 migrations. Key tables:

companies — tenant root, status, budget
agents — adapter_type, adapter_config (jsonb), runtime_config (jsonb), reports_to, status, budget
goals — self-referencing parent_id, level (company/project/task), owner_agent_id
issues — FK to goals/projects/agents, execution_run_id (soft lock), parent_id
heartbeat_runs — status, context_snapshot (jsonb), session_id, process_pid, usage_json
agent_wakeup_requests — wake queue with status enum
agent_task_sessions — per-(agent, adapter, taskKey) session state
budget_policies / budget_incidents / cost_events — cost control
company_skills — skill definitions with markdown content
approvals — human approval requests
routines — scheduled workflows with cron expressions

Agent Configuration Format

{
  "adapterConfig": {
    "command": "claude",
    "model": "claude-opus-4-5",
    "cwd": "/path/to/project",
    "promptTemplate": "You are agent {{agent.name}}...",
    "instructionsFilePath": "/path/to/AGENTS.md",
    "dangerouslySkipPermissions": true,
    "maxTurnsPerRun": 0,
    "timeoutSec": 0,
    "graceSec": 20,
    "skills": ["paperclipai/paperclip/mcp-server"]
  },
  "runtimeConfig": {
    "heartbeat": {
      "enabled": true,
      "intervalSec": 300,
      "wakeOnDemand": true,
      "maxConcurrentRuns": 1
    }
  }
}

OpenClaw Implementation Details

Gateway

File: src/gateway/server.impl.ts

WebSocket server on port 18789. Flat dispatch table:

const coreGatewayHandlers: Record<string, GatewayRequestHandler> = {
  ...connectHandlers, ...chatHandlers, ...cronHandlers,
  ...skillsHandlers, ...sessionsHandlers, ...agentHandlers,
  ...channelsHandlers, ...modelsHandlers, // 28 handler groups
}

Auth: roles (operator | node), operator scopes (admin, read, write, approvals, pairing).

Skills System

Files: src/agents/skills/workspace.ts, skill-contract.ts

Skill = directory with SKILL.md. Frontmatter parsed for metadata.

Loading limits:

Max 300 candidates per root
Max 200 loaded per source
Max 150 in prompt
Max 30,000 chars in prompt
Max 256 KB per skill file

Prompt format (XML):

<available_skills>
  <skill>
    <name>github</name>
    <description>...</description>
    <location>~/.openclaw/workspace/skills/github/SKILL.md</location>
  </skill>
</available_skills>

Path compaction: home dir → ~ (saves 5-6 tokens per path).
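A sketch of how the XML block, the prompt limits, and path compaction could fit together. The function names are hypothetical, and the XML escaping is an added assumption; the analysis does not say whether OpenClaw escapes these fields.

```typescript
interface SkillEntry {
  name: string;
  description: string;
  location: string; // absolute path to SKILL.md
}

interface PromptLimits {
  maxSkillsInPrompt: number;
  maxSkillsPromptChars: number;
}

// Replace a leading home directory with "~" to shorten paths in the prompt.
function compactHome(p: string, homeDir: string): string {
  return p === homeDir || p.startsWith(homeDir + "/") ? "~" + p.slice(homeDir.length) : p;
}

// Assumption: escape the characters that would break the XML structure.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function buildSkillsPrompt(skills: SkillEntry[], homeDir: string, limits: PromptLimits): string {
  const open = "<available_skills>\n";
  const close = "</available_skills>";
  let body = "";
  let count = 0;
  for (const skill of skills) {
    if (count >= limits.maxSkillsInPrompt) break;
    const block =
      "  <skill>\n" +
      `    <name>${escapeXml(skill.name)}</name>\n` +
      `    <description>${escapeXml(skill.description)}</description>\n` +
      `    <location>${escapeXml(compactHome(skill.location, homeDir))}</location>\n` +
      "  </skill>\n";
    // Stop before the character budget would be exceeded.
    if (open.length + body.length + block.length + close.length > limits.maxSkillsPromptChars) break;
    body += block;
    count += 1;
  }
  return open + body + close;
}
```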

Skill metadata fields: always, skillKey, emoji, homepage, os, requires (bins, anyBins, env, config), install specs (brew, node, go, uv, download).

ClawHub integration for remote skill registry (search, install, update).

Memory System

Files: packages/memory-host-sdk/

Two backends:

builtin — SQLite + sqlite-vec extension for vector search
qmd — External QuickMemory Daemon process

Embedding providers: Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local (node-llama).

Interface:

interface MemorySearchManager {
  search(query, opts?: { maxResults?, minScore?, sessionKey? }): Promise<MemorySearchResult[]>
  readFile(params): Promise<{ text, path }>
  status(): MemoryProviderStatus
  sync?(params?): Promise<void>
}

Session transcripts indexable into memory backend. MEMORY.md / memory.md as default memory file convention.

HEARTBEAT Mechanism

File: src/auto-reply/heartbeat.ts

Default prompt:

"Read HEARTBEAT.md if it exists. Follow it strictly. Do not infer or repeat
old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK."

HEARTBEAT.md task format:

tasks:
  - name: email-check
    interval: 30m
    prompt: "Check for urgent unread emails"

Key functions:

isHeartbeatContentEffectivelyEmpty() — skips API calls when file has only headers/empty items. Saves significant cost.
parseHeartbeatTasks() — parses YAML tasks block
isTaskDue() — checks intervals against last-run timestamps
stripHeartbeatToken() — strips HEARTBEAT_OK from responses; responses under ackMaxChars (300) suppressed from chat

HeartbeatRunner (infra/heartbeat-runner.ts):

Per-agent intervals (default 30m)
HeartbeatAgentState tracks lastRunMs, nextDueMs, intervalMs
On fire: reads HEARTBEAT.md, builds prompt, dispatches inbound message
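The emptiness check and the due-time check are simple enough to sketch. These are hypothetical re-implementations for illustration, not the code in src/auto-reply/heartbeat.ts; the exact definition of "effectively empty" and the supported interval units (s, m, h, d) are assumptions.

```typescript
// Treats a file as empty when every line is blank, a Markdown heading,
// or a list bullet / checkbox with no text after it.
function isEffectivelyEmpty(content: string): boolean {
  return content.split(/\r?\n/).every((line) => {
    const t = line.trim();
    return t === "" || /^#+(\s|$)/.test(t) || /^[-*+]\s*(\[[ xX]?\])?$/.test(t);
  });
}

// Parses intervals such as "30m" or "1h" into milliseconds; null if malformed.
function parseIntervalMs(interval: string): number | null {
  const m = /^(\d+)(s|m|h|d)$/.exec(interval.trim());
  if (!m) return null;
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 }[m[2] as "s" | "m" | "h" | "d"];
  return Number(m[1]) * unitMs;
}

// A task is due if it has never run or a full interval has elapsed.
function isTaskDueSketch(interval: string, lastRunMs: number | undefined, nowMs: number): boolean {
  const everyMs = parseIntervalMs(interval);
  if (everyMs === null) return false;
  return lastRunMs === undefined || nowMs - lastRunMs >= everyMs;
}
```

Running the emptiness check before building any prompt is what avoids the API call entirely when there is nothing to do.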

Cron Service

File: src/cron/service.ts

Three schedule types:

{ kind: "at"; at: string } — one-shot
{ kind: "every"; everyMs: number } — interval
{ kind: "cron"; expr: string; tz?: string; staggerMs?: number } — cron expression

Two job payload types:

systemEvent — injects text into existing session (needs attention available)
agentTurn — fires full agent turn (true background autonomy)

Session targets: "main" | "isolated" | "current" | "session:<id>". Isolated gets own session key with freshness/rollover logic.

Startup catchup: runs up to 5 missed jobs immediately, staggers rest (5s gap). Failure alerts after N consecutive errors, 1h cooldown.
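The catchup rule amounts to assigning a start delay to each missed job. A minimal sketch, assuming missed jobs arrive as an ordered list; planStartupCatchup and its parameter names are hypothetical.

```typescript
interface CatchupPlan {
  jobId: string;
  delayMs: number;
}

// First `immediateLimit` jobs run at once; the rest are spaced `staggerGapMs` apart.
function planStartupCatchup(
  missedJobIds: string[],
  immediateLimit = 5,
  staggerGapMs = 5_000,
): CatchupPlan[] {
  return missedJobIds.map((jobId, index) => ({
    jobId,
    delayMs: index < immediateLimit ? 0 : (index - immediateLimit + 1) * staggerGapMs,
  }));
}
```

With seven missed jobs, the first five get a delay of 0 and the last two get 5000 ms and 10000 ms.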

Multi-Agent Routing

Session key format: agent:<agentId>:<key>. Type detection via isCronSessionKey(), isSubagentSessionKey(), isAcpSessionKey().
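Building and parsing keys in this format can be sketched as below. The helper names are hypothetical, and the sketch assumes agent IDs never contain a colon while the trailing key may.

```typescript
interface ParsedSessionKey {
  agentId: string;
  key: string;
}

function buildSessionKey(agentId: string, key: string): string {
  return `agent:${agentId}:${key}`;
}

// Returns null for keys that do not follow agent:<agentId>:<key>.
function parseSessionKey(sessionKey: string): ParsedSessionKey | null {
  const m = /^agent:([^:]+):(.+)$/.exec(sessionKey);
  return m ? { agentId: m[1], key: m[2] } : null;
}
```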

Per-agent isolation: own workspace, session store, skill set, heartbeat config, model config.

Subagent spawning: ACP-based, session depth tracked in keys, reactivation support.

Channel Adapter Interface

File: src/channels/plugins/types.plugin.ts

type ChannelPlugin<ResolvedAccount> = {
  id: ChannelId;
  meta: ChannelMeta;
  capabilities: ChannelCapabilities;
  outbound?: ChannelOutboundAdapter;
  messaging?: ChannelMessagingAdapter;
  lifecycle?: ChannelLifecycleAdapter;
  heartbeat?: ChannelHeartbeatAdapter;
  security?: ChannelSecurityAdapter;
  agentTools?: ChannelAgentToolFactory;
  streaming?: ChannelStreamingAdapter;
  threading?: ChannelThreadingAdapter;
  // ~15 optional adapter slots total
}

Restart policy: exponential backoff (5s initial, 5min max, factor 2, jitter 0.1, max 10 attempts).
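The restart policy can be sketched as a pure delay function. The numbers come from the line above; applying jitter as a symmetric scale factor and treating attempts as 1-based are assumptions, and restartDelayMs is a hypothetical name.

```typescript
interface RestartPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number;
  maxAttempts: number;
}

const DEFAULT_RESTART: RestartPolicy = {
  initialMs: 5_000,
  maxMs: 300_000,
  factor: 2,
  jitter: 0.1,
  maxAttempts: 10,
};

// attempt is 1-based. Returns null once attempts are exhausted.
function restartDelayMs(
  attempt: number,
  policy: RestartPolicy = DEFAULT_RESTART,
  random: () => number = Math.random,
): number | null {
  if (attempt < 1 || attempt > policy.maxAttempts) return null;
  const base = Math.min(policy.maxMs, policy.initialMs * Math.pow(policy.factor, attempt - 1));
  // Symmetric jitter: scale by a factor in [1 - jitter, 1 + jitter].
  const spread = 1 + policy.jitter * (2 * random() - 1);
  return Math.round(base * spread);
}
```

Injecting the random source keeps the function deterministic in tests while defaulting to Math.random in normal use.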

Security

Exec approval: ExecApprovalManager with promise-based flow, allow-once vs allow-always, 15s grace timeout
Tool policy: pickSandboxToolPolicy() per sandbox config
Security audit: comprehensive checks (gateway auth, channel config, plugin trust, exec surfaces, filesystem ACLs)
Auth rate limiting with browser-specific stricter limits
External content guard: tracks provenance, allowUnsafeExternalContent flag

Agent Configuration

agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-5"
      fallbacks: ["anthropic/claude-sonnet-4-5"]
    heartbeat:
      enabled: true
      every: "30m"
      prompt: "Check HEARTBEAT.md"
      ackMaxChars: 300
    skills:
      limits:
        maxSkillsInPrompt: 150
        maxSkillsPromptChars: 30000
  list:
    - id: "myagent"
      workspace: "~/workspace"
      model:
        primary: "anthropic/claude-sonnet-4-5"
      heartbeat:
        every: "1h"
      skills:
        filter: ["github", "slack"]

Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md.

Plugin Hooks

29 lifecycle hook points: before_model_resolve, before_prompt_build, before_agent_start, before_agent_reply, llm_input, llm_output, agent_end, inbound_claim, message_received, message_sending, message_sent, before_tool_call, after_tool_call, session_start, session_end, subagent_spawning, subagent_delivery_target, gateway_start, gateway_stop, before_dispatch, reply_dispatch, before_install, etc.

Patterns Worth Replicating in Agent Factory

From Paperclip

Heartbeat as context injection — Each beat starts clean, loads curated context packet. Maps to: /schedule trigger + CLAUDE.md + memory files loaded per session.
Adapter interface — Clean execute(ctx) pattern. Maps to: our agent files are already adapter-like (model, tools, prompt per agent).
Budget as governance primitive — Post-hoc cost tracking with pause thresholds. Maps to: hook that reads /usage after each run, logs to cost-events file, alerts when threshold crossed.
Task checkout via file locking — Paperclip uses PostgreSQL. We can use file-based locking (write task.lock with agent name, check before claiming).
Session persistence via taskKey — Different tasks get different sessions. Maps to: --resume with task-specific session IDs.
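The file-based task checkout mentioned in this list could look roughly like the sketch below. It relies on Node's exclusive-create flag ("wx") so that only one claimant can create the lock file. The task.lock file name comes from the item above; the function names and the re-entrant behaviour for the same agent are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Try to create <taskDir>/task.lock exclusively. The "wx" flag makes open fail
// with EEXIST when the file already exists, so only one claimant can win.
function tryClaimTask(taskDir: string, agentName: string): boolean {
  const lockPath = path.join(taskDir, "task.lock");
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, agentName);
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      // Assumed re-entrant rule: the agent that holds the lock may claim again.
      return fs.readFileSync(lockPath, "utf8") === agentName;
    }
    throw err;
  }
}

// Remove the lock only if this agent is the recorded holder.
function releaseTask(taskDir: string, agentName: string): void {
  const lockPath = path.join(taskDir, "task.lock");
  if (fs.existsSync(lockPath) && fs.readFileSync(lockPath, "utf8") === agentName) {
    fs.unlinkSync(lockPath);
  }
}

// Usage: two agents compete for the same task directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "agent-factory-"));
console.log(tryClaimTask(demoDir, "researcher")); // true
console.log(tryClaimTask(demoDir, "writer")); // false
```

Unlike the PostgreSQL row lock, this gives no automatic release if the holder crashes, so generated scripts would likely need a stale-lock timeout as well.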

From OpenClaw

HEARTBEAT.md with task parsing — YAML tasks block with intervals and due-time checking. Maps directly to our generated HEARTBEAT.md files.
Emptiness detection — Skip API calls when heartbeat file is effectively empty. Critical cost saver. Include in generated heartbeat scripts.
Skill prompt XML format — Standardized skill discovery in system prompt. Our skills already use this via Claude Code's built-in mechanism.
3-tier memory — SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold). Maps to: templates we generate in the user's project.
Startup catchup with stagger — Run missed jobs on restart, but don't thundering-herd. Include in generated automation scripts.

Unique to Agent Factory

Guided construction — Neither tool helps you BUILD the system. We do.
Progressive complexity — Start with 1 agent, grow to full org.
Domain templates — Not just researcher→writer→reviewer. Monitoring, code review, data processing, research synthesis.
Claude Code-native — No PostgreSQL, no Node.js server, no Docker required. Just agents, skills, hooks, settings.json, /schedule.
