agent-builder/.claude/research/source-code-analysis-2026-04-11.md
Kjell Tore Guttormsen 7419d4283d docs(plans): Agent Factory ultraplan + execution guide
27-step plan across 8 sessions in 3 waves for transforming
agent-builder into Agent Factory v1.0.0. Includes research briefs,
spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 07:35:29 +02:00

17 KiB

type created repos_analyzed purpose
source-code-analysis 2026-04-11
paperclipai/paperclip
openclaw/openclaw
Implementation-level details for replicating best patterns in Agent Factory

Source Code Analysis: OpenClaw & Paperclip

Repos were cloned and analyzed at code level on 2026-04-11. This document captures implementation details NOT available in docs or articles — the actual patterns, interfaces, and mechanisms worth replicating.

Critical Corrections (vs. docs/articles)

These are things docs described differently than the code implements:

  1. Canvas/A2UI (OpenClaw) is NOT generative rendering. It's a static file server. Agents write files to a workspace directory, the canvas-host serves them over HTTP. No server-side rendering, no UI generation. This is NOT a meaningful capability gap for Claude Code.

  2. Goal hierarchy (Paperclip) is a simple adjacency list. Just a parent_id FK on the goals table. No recursive traversal at runtime — only the directly referenced goal is passed to agents in context_snapshot. Docs said "full ancestry" but that's aspirational, not implemented.

  3. Budget enforcement (Paperclip) is post-hoc, not atomic. Checked AFTER each run via evaluateCostEvent(): reads SUM(cost_cents), compares with policy, pauses agent if exceeded. No pre-run budget reservation. Robust enough in practice.

  4. OpenClaw has real vector memory. Not just MEMORY.md files. Uses sqlite-vec extension for vector search with embedding providers (Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local llama). This is significantly more sophisticated than file-based memory.


Paperclip Implementation Details

Heartbeat Scheduler

File: server/src/services/heartbeat.ts (4534 lines)

Poll-based, not event-driven. tickTimers() iterates all agents on each tick:

tickTimers: async (now = new Date()) => {
  const allAgents = await db.select().from(agents);
  for (const agent of allAgents) {
    if (agent.status === "paused" || "terminated" || "pending_approval") continue;
    const policy = parseHeartbeatPolicy(agent);
    if (!policy.enabled || policy.intervalSec <= 0) continue;
    const elapsed = now.getTime() - new Date(agent.lastHeartbeatAt ?? agent.createdAt).getTime();
    if (elapsed < policy.intervalSec * 1000) continue;
    await enqueueWakeup(agent.id, { source: "timer" });
  }
}

Heartbeat policy from agent.runtimeConfig.heartbeat:

  • enabled: boolean
  • intervalSec: number
  • wakeOnDemand: boolean
  • maxConcurrentRuns: 1-10

4 wakeup triggers: timer, assignment, on_demand, automation.

Concurrency control: in-process promise chain per agent (startLocksByAgent Map). Not distributed — single server process only.

Run Lifecycle

  1. enqueueWakeup() → insert heartbeat_runs (status=queued) + agent_wakeup_requests
  2. startNextQueuedRunForAgent() → check running count vs maxConcurrentRuns
  3. claimQueuedRun()UPDATE heartbeat_runs SET status='running' WHERE status='queued'
  4. executeRun() → call adapter.execute(), stream output via onLog
  5. On completion → update runs, runtime state, task sessions, create cost events
  6. Orphan detection: reapOrphanedRuns() checks PIDs, auto-retries once

Adapter Interface

File: packages/adapter-utils/src/types.ts

interface ServerAdapterModule {
  type: string;
  execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>;
  testEnvironment(ctx: AdapterEnvironmentTestContext): Promise<AdapterEnvironmentTestResult>;
  listSkills?: (ctx) => Promise<AdapterSkillSnapshot>;
  syncSkills?: (ctx, desiredSkills) => Promise<AdapterSkillSnapshot>;
  sessionCodec?: AdapterSessionCodec;
  sessionManagement?: AdapterSessionManagement;
}

10 built-in adapters: claude_local, codex_local, cursor_local, gemini_local, openclaw_gateway, opencode_local, pi_local, hermes_local, process, http.

Claude Adapter Execution

File: packages/adapters/claude-local/src/server/execute.ts

Invokes CLI as:

claude --print - --output-format stream-json --verbose \
  [--resume <sessionId>] \
  [--dangerously-skip-permissions] \
  [--model <model>] \
  [--max-turns N] \
  [--append-system-prompt-file <file>] \
  [--add-dir <skillsDir>]

Prompt composed from: bootstrapPromptTemplate (fresh sessions only) + wake payload

  • session handoff note + main promptTemplate. Template variables: {{agent.id}}, {{agent.name}}, {{context.wakeReason}}, etc.

Task Checkout (Atomic Locking)

File: server/src/services/heartbeat.ts (lines 3756-4010)

Issues have execution_run_id column as soft lock. Uses PostgreSQL row-level locking:

SELECT id FROM issues WHERE id = $1 AND company_id = $2 FOR UPDATE

Then conditional update:

UPDATE issues SET execution_run_id = $claimed_id
WHERE id = $issue_id AND (execution_run_id IS NULL OR execution_run_id = $claimed_id)

When same agent has running run → coalesce (merge context). When different agent → defer (status deferred_issue_execution), promoted when original completes.

Budget Enforcement

File: server/src/services/budgets.ts

Schema:

budget_policies: scope_type (company|agent|project), scope_id, metric (billed_cents),
  window_kind (calendar_month_utc|lifetime), amount (cents), warn_percent (80),
  hard_stop_enabled, notify_enabled

Flow after each run:

  1. Load active policies for company/agent/project
  2. SELECT SUM(cost_cents) FROM cost_events filtered by window
  3. If >= soft threshold → create budget_incidents (type soft)
  4. If >= amount AND hard_stop → pauseScopeForBudget()UPDATE agents SET status='paused'cancelBudgetScopeWork() → SIGTERM → SIGKILL (with graceSec)

Pre-run check: getInvocationBlock() only checks paused flag, not live budget sum.

Skills System

Skills injected as symlinked tmpdir per run:

async function buildSkillsDir(config) {
  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-skills-"));
  const target = path.join(tmp, ".claude", "skills");
  await fs.mkdir(target, { recursive: true });
  for (const entry of availableEntries) {
    if (!desiredNames.has(entry.key)) continue;
    await fs.symlink(entry.source, path.join(target, entry.runtimeName));
  }
  return tmp; // Passed as: claude --add-dir <skillsDir>
}

Company skills stored in DB: company_skills table with markdown content, source_type (github|url|local_path|skills_sh), file_inventory, trust_level.

Session Persistence

agent_task_sessions table: unique on (companyId, agentId, adapterType, taskKey).

  • taskKey = issueId (for issue-scoped) or "__heartbeat__" (timer-only)
  • sessionParamsJson = adapter-specific (Claude stores { sessionId, cwd })
  • Upserted after each run completion

Session compaction: rotate after 200 runs, 2M raw input tokens, or 72h age. Claude adapter: nativeContextManagement: "confirmed" → compaction disabled (Claude manages its own context window).

Org Chart

Just agents.reportsTo self-referential FK. agents.role text field. Rendered as SVG server-side (5 visual styles). No separate table.

Database Schema

PostgreSQL via Drizzle ORM. 55 migrations. Key tables:

  • companies — tenant root, status, budget
  • agents — adapter_type, adapter_config (jsonb), runtime_config (jsonb), reports_to, status, budget
  • goals — self-referencing parent_id, level (company/project/task), owner_agent_id
  • issues — FK to goals/projects/agents, execution_run_id (soft lock), parent_id
  • heartbeat_runs — status, context_snapshot (jsonb), session_id, process_pid, usage_json
  • agent_wakeup_requests — wake queue with status enum
  • agent_task_sessions — per-(agent, adapter, taskKey) session state
  • budget_policies / budget_incidents / cost_events — cost control
  • company_skills — skill definitions with markdown content
  • approvals — human approval requests
  • routines — scheduled workflows with cron expressions

Agent Configuration Format

{
  "adapterConfig": {
    "command": "claude",
    "model": "claude-opus-4-5",
    "cwd": "/path/to/project",
    "promptTemplate": "You are agent {{agent.name}}...",
    "instructionsFilePath": "/path/to/AGENTS.md",
    "dangerouslySkipPermissions": true,
    "maxTurnsPerRun": 0,
    "timeoutSec": 0,
    "graceSec": 20,
    "skills": ["paperclipai/paperclip/mcp-server"]
  },
  "runtimeConfig": {
    "heartbeat": {
      "enabled": true,
      "intervalSec": 300,
      "wakeOnDemand": true,
      "maxConcurrentRuns": 1
    }
  }
}

OpenClaw Implementation Details

Gateway

File: src/gateway/server.impl.ts

WebSocket server on port 18789. Flat dispatch table:

const coreGatewayHandlers: Record<string, GatewayRequestHandler> = {
  ...connectHandlers, ...chatHandlers, ...cronHandlers,
  ...skillsHandlers, ...sessionsHandlers, ...agentHandlers,
  ...channelsHandlers, ...modelsHandlers, // 28 handler groups
}

Auth: roles (operator | node), operator scopes (admin, read, write, approvals, pairing).

Skills System

Files: src/agents/skills/workspace.ts, skill-contract.ts

Skill = directory with SKILL.md. Frontmatter parsed for metadata.

Loading limits:

  • Max 300 candidates per root
  • Max 200 loaded per source
  • Max 150 in prompt
  • Max 30,000 chars in prompt
  • Max 256 KB per skill file

Prompt format (XML):

<available_skills>
  <skill>
    <name>github</name>
    <description>...</description>
    <location>~/.openclaw/workspace/skills/github/SKILL.md</location>
  </skill>
</available_skills>

Path compaction: home dir → ~ (saves 5-6 tokens per path).

Skill metadata fields: always, skillKey, emoji, homepage, os, requires (bins, anyBins, env, config), install specs (brew, node, go, uv, download).

ClawHub integration for remote skill registry (search, install, update).

Memory System

Files: packages/memory-host-sdk/

Two backends:

  • builtin — SQLite + sqlite-vec extension for vector search
  • qmd — External QuickMemory Daemon process

Embedding providers: Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local (node-llama).

Interface:

interface MemorySearchManager {
  search(query, opts?: { maxResults?, minScore?, sessionKey? }): Promise<MemorySearchResult[]>
  readFile(params): Promise<{ text, path }>
  status(): MemoryProviderStatus
  sync?(params?): Promise<void>
}

Session transcripts indexable into memory backend. MEMORY.md / memory.md as default memory file convention.

HEARTBEAT Mechanism

File: src/auto-reply/heartbeat.ts

Default prompt:

"Read HEARTBEAT.md if it exists. Follow it strictly. Do not infer or repeat
old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK."

HEARTBEAT.md task format:

tasks:
  - name: email-check
    interval: 30m
    prompt: "Check for urgent unread emails"

Key functions:

  • isHeartbeatContentEffectivelyEmpty() — skips API calls when file has only headers/empty items. Saves significant cost.
  • parseHeartbeatTasks() — parses YAML tasks block
  • isTaskDue() — checks intervals against last-run timestamps
  • stripHeartbeatToken() — strips HEARTBEAT_OK from responses; responses under ackMaxChars (300) suppressed from chat

HeartbeatRunner (infra/heartbeat-runner.ts):

  • Per-agent intervals (default 30m)
  • HeartbeatAgentState tracks lastRunMs, nextDueMs, intervalMs
  • On fire: reads HEARTBEAT.md, builds prompt, dispatches inbound message

Cron Service

File: src/cron/service.ts

Three schedule types:

  • { kind: "at"; at: string } — one-shot
  • { kind: "every"; everyMs: number } — interval
  • { kind: "cron"; expr: string; tz?: string; staggerMs?: number } — cron expression

Two job payload types:

  • systemEvent — injects text into existing session (needs attention available)
  • agentTurn — fires full agent turn (true background autonomy)

Session targets: "main" | "isolated" | "current" | "session:<id>". Isolated gets own session key with freshness/rollover logic.

Startup catchup: runs up to 5 missed jobs immediately, staggers rest (5s gap). Failure alerts after N consecutive errors, 1h cooldown.

Multi-Agent Routing

Session key format: agent:<agentId>:<key> Type detection via: isCronSessionKey(), isSubagentSessionKey(), isAcpSessionKey()

Per-agent isolation: own workspace, session store, skill set, heartbeat config, model config.

Subagent spawning: ACP-based, session depth tracked in keys, reactivation support.

Channel Adapter Interface

File: src/channels/plugins/types.plugin.ts

type ChannelPlugin<ResolvedAccount> = {
  id: ChannelId;
  meta: ChannelMeta;
  capabilities: ChannelCapabilities;
  outbound?: ChannelOutboundAdapter;
  messaging?: ChannelMessagingAdapter;
  lifecycle?: ChannelLifecycleAdapter;
  heartbeat?: ChannelHeartbeatAdapter;
  security?: ChannelSecurityAdapter;
  agentTools?: ChannelAgentToolFactory;
  streaming?: ChannelStreamingAdapter;
  threading?: ChannelThreadingAdapter;
  // ~15 optional adapter slots total
}

Restart policy: exponential backoff (5s initial, 5min max, factor 2, jitter 0.1, max 10 attempts).

Security

  • Exec approval: ExecApprovalManager with promise-based flow, allow-once vs allow-always, 15s grace timeout
  • Tool policy: pickSandboxToolPolicy() per sandbox config
  • Security audit: comprehensive checks (gateway auth, channel config, plugin trust, exec surfaces, filesystem ACLs)
  • Auth rate limiting with browser-specific stricter limits
  • External content guard: tracks provenance, allowUnsafeExternalContent flag

Agent Configuration

agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-5"
      fallbacks: ["anthropic/claude-sonnet-4-5"]
    heartbeat:
      enabled: true
      every: "30m"
      prompt: "Check HEARTBEAT.md"
      ackMaxChars: 300
    skills:
      limits:
        maxSkillsInPrompt: 150
        maxSkillsPromptChars: 30000
  list:
    - id: "myagent"
      workspace: "~/workspace"
      model:
        primary: "anthropic/claude-sonnet-4-5"
      heartbeat:
        every: "1h"
      skills:
        filter: ["github", "slack"]

Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md.

Plugin Hooks

29 lifecycle hook points: before_model_resolve, before_prompt_build, before_agent_start, before_agent_reply, llm_input, llm_output, agent_end, inbound_claim, message_received, message_sending, message_sent, before_tool_call, after_tool_call, session_start, session_end, subagent_spawning, subagent_delivery_target, gateway_start, gateway_stop, before_dispatch, reply_dispatch, before_install, etc.


Patterns Worth Replicating in Agent Factory

From Paperclip

  1. Heartbeat as context injection — Each beat starts clean, loads curated context packet. Maps to: /schedule trigger + CLAUDE.md + memory files loaded per session.

  2. Adapter interface — Clean execute(ctx) pattern. Maps to: our agent files are already adapter-like (model, tools, prompt per agent).

  3. Budget as governance primitive — Post-hoc cost tracking with pause thresholds. Maps to: hook that reads /usage after each run, logs to cost-events file, alerts when threshold crossed.

  4. Task checkout via file locking — Paperclip uses PostgreSQL. We can use file-based locking (write task.lock with agent name, check before claiming).

  5. Session persistence via taskKey — Different tasks get different sessions. Maps to: --resume with task-specific session IDs.

From OpenClaw

  1. HEARTBEAT.md with task parsing — YAML tasks block with intervals and due-time checking. Maps directly to our generated HEARTBEAT.md files.

  2. Emptiness detection — Skip API calls when heartbeat file is effectively empty. Critical cost saver. Include in generated heartbeat scripts.

  3. Skill prompt XML format — Standardized skill discovery in system prompt. Our skills already use this via Claude Code's built-in mechanism.

  4. 3-tier memory — SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold). Maps to: templates we generate in the user's project.

  5. Startup catchup with stagger — Run missed jobs on restart, but don't thundering-herd. Include in generated automation scripts.

Unique to Agent Factory

  1. Guided construction — Neither tool helps you BUILD the system. We do.
  2. Progressive complexity — Start with 1 agent, grow to full org.
  3. Domain templates — Not just researcher→writer→reviewer. Monitoring, code review, data processing, research synthesis.
  4. Claude Code-native — No PostgreSQL, no Node.js server, no Docker required. Just agents, skills, hooks, settings.json, /schedule.