docs(plans): Agent Factory ultraplan + execution guide
27-step plan across 8 sessions in 3 waves for transforming agent-builder into Agent Factory v1.0.0. Includes research briefs, spec, and wave-by-wave execution prompts with scope fences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parent: 075383990f
Commit: 7419d4283d
5 changed files with 2294 additions and 0 deletions
.claude/research/source-code-analysis-2026-04-11.md (new file, 489 lines)

---
type: source-code-analysis
created: 2026-04-11
repos_analyzed: [paperclipai/paperclip, openclaw/openclaw]
purpose: "Implementation-level details for replicating best patterns in Agent Factory"
---

# Source Code Analysis: OpenClaw & Paperclip

Both repos were cloned and analyzed at the code level on 2026-04-11. This document
captures implementation details NOT available in docs or articles: the actual
patterns, interfaces, and mechanisms worth replicating.

## Critical Corrections (vs. docs/articles)

These are points where the docs describe something differently from what the code implements:

1. **Canvas/A2UI (OpenClaw) is NOT generative rendering.** It's a static file
   server. Agents write files to a workspace directory, and the canvas-host serves
   them over HTTP. No server-side rendering, no UI generation. This is NOT a
   meaningful capability gap for Claude Code.

2. **Goal hierarchy (Paperclip) is a simple adjacency list.** It is just a `parent_id`
   FK on the `goals` table. There is no recursive traversal at runtime: only the directly
   referenced goal is passed to agents in `context_snapshot`. The docs say "full
   ancestry", but that is aspirational, not implemented.

3. **Budget enforcement (Paperclip) is post-hoc, not atomic.** It is checked AFTER each
   run via `evaluateCostEvent()`, which reads `SUM(cost_cents)`, compares it with the policy,
   and pauses the agent if the limit is exceeded. There is no pre-run budget reservation, so a
   single run can overshoot the limit before the pause takes effect. Robust enough in practice.

4. **OpenClaw has real vector memory**, not just MEMORY.md files. It uses the `sqlite-vec`
   extension for vector search with embedding providers (Gemini, Mistral, Ollama,
   OpenAI, Voyage, Bedrock, local node-llama). This is significantly more sophisticated
   than file-based memory.
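
Since only the directly referenced goal reaches the agent, an Agent Factory design that wants the advertised "full ancestry" has to walk the adjacency list itself. A minimal TypeScript sketch, assuming the goal rows are already loaded in memory; `GoalRow` and `goalAncestry` are hypothetical names, not Paperclip code:

```typescript
// Hypothetical shape of a row in Paperclip's `goals` table (adjacency list).
interface GoalRow {
  id: string;
  parentId: string | null;
  title: string;
}

// Walks parent_id links from `goalId` up to the root and returns the chain
// root-first. Stops on a missing parent or a cycle instead of looping forever.
function goalAncestry(goals: GoalRow[], goalId: string): GoalRow[] {
  const byId = new Map(goals.map((g) => [g.id, g] as const));
  const chain: GoalRow[] = [];
  const seen = new Set<string>();
  let current = byId.get(goalId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = current.parentId ? byId.get(current.parentId) : undefined;
  }
  return chain.reverse();
}
```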

---

## Paperclip Implementation Details

### Heartbeat Scheduler

**File:** `server/src/services/heartbeat.ts` (4534 lines)

Poll-based, not event-driven. `tickTimers()` iterates over all agents on each tick:

```typescript
tickTimers: async (now = new Date()) => {
  const allAgents = await db.select().from(agents);
  for (const agent of allAgents) {
    // Agents in these states are never woken by the timer.
    if (["paused", "terminated", "pending_approval"].includes(agent.status)) continue;
    const policy = parseHeartbeatPolicy(agent);
    if (!policy.enabled || policy.intervalSec <= 0) continue;
    // Agents that have never run are measured from their creation time.
    const elapsed = now.getTime() - new Date(agent.lastHeartbeatAt ?? agent.createdAt).getTime();
    if (elapsed < policy.intervalSec * 1000) continue;
    await enqueueWakeup(agent.id, { source: "timer" });
  }
}
```

Heartbeat policy comes from `agent.runtimeConfig.heartbeat`:
- `enabled: boolean`
- `intervalSec: number`
- `wakeOnDemand: boolean`
- `maxConcurrentRuns: number` (1-10)

4 wakeup triggers: `timer`, `assignment`, `on_demand`, `automation`.

Concurrency control: an in-process promise chain per agent (the `startLocksByAgent` Map).
Not distributed: it only works within a single server process.
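
The per-agent promise chain can be sketched as below. This is a minimal illustration that assumes nothing beyond a Map keyed by agent id; `withAgentStartLock` is a hypothetical helper, not Paperclip's actual function.

```typescript
// Hypothetical per-agent serialization via a promise chain, modelled on the
// description of Paperclip's `startLocksByAgent` Map.
const startLocksByAgent = new Map<string, Promise<void>>();

async function withAgentStartLock<T>(agentId: string, fn: () => Promise<T>): Promise<T> {
  const previous = startLocksByAgent.get(agentId) ?? Promise.resolve();
  // Chain after the previous holder; swallow its error so one failure
  // does not block every later caller for this agent.
  const result = previous.catch(() => undefined).then(fn);
  const tail: Promise<void> = result.then(
    () => undefined,
    () => undefined,
  );
  startLocksByAgent.set(agentId, tail);
  try {
    return await result;
  } finally {
    // Drop the entry if nobody queued behind us, so the Map does not grow forever.
    if (startLocksByAgent.get(agentId) === tail) startLocksByAgent.delete(agentId);
  }
}
```

Because the chain lives in one process's memory, a second server process would hold its own Map and could start runs for the same agent concurrently, which is exactly the "not distributed" limitation above.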

### Run Lifecycle

1. `enqueueWakeup()` → insert into `heartbeat_runs` (status=queued) + `agent_wakeup_requests`
2. `startNextQueuedRunForAgent()` → compare the running count against `maxConcurrentRuns`
3. `claimQueuedRun()` → `UPDATE heartbeat_runs SET status='running' WHERE status='queued'` for the
   run being claimed (a compare-and-set: it only succeeds if the run is still queued)
4. `executeRun()` → call `adapter.execute()`, stream output via `onLog`
5. On completion → update runs, runtime state, and task sessions; create cost events
6. Orphan detection: `reapOrphanedRuns()` checks PIDs and auto-retries once
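
Step 6 can be approximated with `process.kill(pid, 0)`, which Node.js documents as a way to test for the existence of a process without sending a signal. A sketch under stated assumptions: the `RunRecord` shape, the `retryCount` field, and `planOrphanReaping` are hypothetical, and the sketch only plans actions instead of writing to `heartbeat_runs`:

```typescript
// Hypothetical run shape; field names are assumptions, not Paperclip's schema.
interface RunRecord {
  id: string;
  status: "queued" | "running" | "failed" | "succeeded";
  processPid: number | null;
  retryCount: number;
}

// Signal 0 performs only the existence/permission check.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

type ReapAction = { runId: string; action: "retry" | "fail" };

function planOrphanReaping(runs: RunRecord[], alive: (pid: number) => boolean = isPidAlive): ReapAction[] {
  const actions: ReapAction[] = [];
  for (const run of runs) {
    if (run.status !== "running" || run.processPid === null) continue;
    if (alive(run.processPid)) continue;
    // The process vanished without reporting back: retry once, then give up.
    actions.push({ runId: run.id, action: run.retryCount < 1 ? "retry" : "fail" });
  }
  return actions;
}
```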

### Adapter Interface

**File:** `packages/adapter-utils/src/types.ts`

```typescript
interface ServerAdapterModule {
  type: string;
  execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>;
  testEnvironment(ctx: AdapterEnvironmentTestContext): Promise<AdapterEnvironmentTestResult>;
  listSkills?: (ctx) => Promise<AdapterSkillSnapshot>;
  syncSkills?: (ctx, desiredSkills) => Promise<AdapterSkillSnapshot>;
  sessionCodec?: AdapterSessionCodec;
  sessionManagement?: AdapterSessionManagement;
}
```

10 built-in adapters: `claude_local`, `codex_local`, `cursor_local`, `gemini_local`,
`openclaw_gateway`, `opencode_local`, `pi_local`, `hermes_local`, `process`, `http`.

### Claude Adapter Execution

**File:** `packages/adapters/claude-local/src/server/execute.ts`

Invokes the CLI as:
```
claude --print - --output-format stream-json --verbose \
  [--resume <sessionId>] \
  [--dangerously-skip-permissions] \
  [--model <model>] \
  [--max-turns N] \
  [--append-system-prompt-file <file>] \
  [--add-dir <skillsDir>]
```

The prompt is composed from `bootstrapPromptTemplate` (fresh sessions only), the wake payload,
a session handoff note, and the main `promptTemplate`. Template variables include `{{agent.id}}`,
`{{agent.name}}`, `{{context.wakeReason}}`, etc.
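
A sketch of the template substitution and prompt assembly described above, assuming `{{dotted.path}}` lookups into a nested variables object and a blank line between prompt parts; `renderTemplate` and `composePrompt` are hypothetical names, and Paperclip's real renderer may differ (for example in how it treats unknown variables):

```typescript
// Replaces `{{dotted.path}}` placeholders such as {{agent.name}} or
// {{context.wakeReason}}. Unknown paths render as "" (an assumption).
function renderTemplate(template: string, vars: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
    let value: unknown = vars;
    for (const key of path.split(".")) {
      if (value === null || typeof value !== "object") return "";
      value = (value as Record<string, unknown>)[key];
    }
    return value === undefined || value === null ? "" : String(value);
  });
}

// Joins the prompt parts in the order described above; the bootstrap
// template is only included when the run starts a fresh session.
function composePrompt(parts: {
  bootstrapPromptTemplate?: string;
  wakePayload?: string;
  handoffNote?: string;
  promptTemplate: string;
  freshSession: boolean;
  vars: Record<string, unknown>;
}): string {
  const render = (s?: string) => (s ? renderTemplate(s, parts.vars) : undefined);
  const sections = [
    parts.freshSession ? render(parts.bootstrapPromptTemplate) : undefined,
    parts.wakePayload,
    parts.handoffNote,
    render(parts.promptTemplate),
  ];
  return sections.filter((s): s is string => typeof s === "string" && s.length > 0).join("\n\n");
}
```

Only the two templates are rendered; the wake payload and handoff note pass through verbatim so that a literal `{{` in user content is not rewritten.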

### Task Checkout (Atomic Locking)

**File:** `server/src/services/heartbeat.ts` (lines 3756-4010)

Issues have an `execution_run_id` column that acts as a soft lock. Checkout uses PostgreSQL row-level locking:

```sql
SELECT id FROM issues WHERE id = $1 AND company_id = $2 FOR UPDATE
```

followed by a conditional update (a `FOR UPDATE` lock is held until the enclosing transaction ends,
so both statements must run in the same transaction):
```sql
UPDATE issues SET execution_run_id = $claimed_id
WHERE id = $issue_id AND (execution_run_id IS NULL OR execution_run_id = $claimed_id)
```

- Same agent already has a running run → coalesce (merge context).
- Different agent holds the issue → defer (status `deferred_issue_execution`); the deferred run is
  promoted when the original completes.
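
The coalesce/defer rules can be expressed as a pure decision function evaluated against the locked row. A sketch assuming the holder's agent id is available alongside `execution_run_id`; `IssueLockState` and `decideCheckout` are hypothetical:

```typescript
// Hypothetical view of the locked `issues` row plus the holder's agent.
interface IssueLockState {
  executionRunId: string | null; // soft lock column on `issues`
  executionAgentId: string | null; // agent owning that run (assumed lookup)
}

type CheckoutDecision =
  | { kind: "claim" }
  | { kind: "coalesce"; intoRunId: string }
  | { kind: "defer"; status: "deferred_issue_execution"; blockedByRunId: string };

function decideCheckout(issue: IssueLockState, agentId: string, runId: string): CheckoutDecision {
  // Free, or already held by this very run: the conditional UPDATE would succeed.
  if (issue.executionRunId === null || issue.executionRunId === runId) return { kind: "claim" };
  // Same agent already working on it: fold the new wake context into that run.
  if (issue.executionAgentId === agentId) return { kind: "coalesce", intoRunId: issue.executionRunId };
  // Another agent holds it: park this run until the holder finishes.
  return { kind: "defer", status: "deferred_issue_execution", blockedByRunId: issue.executionRunId };
}
```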

### Budget Enforcement

**File:** `server/src/services/budgets.ts`

Schema:
```
budget_policies: scope_type (company|agent|project), scope_id, metric (billed_cents),
                 window_kind (calendar_month_utc|lifetime), amount (cents), warn_percent (80),
                 hard_stop_enabled, notify_enabled
```

Flow after each run:
1. Load active policies for the company/agent/project
2. `SELECT SUM(cost_cents) FROM cost_events`, filtered by window
3. If >= the soft threshold (`warn_percent` of `amount`) → create a `budget_incidents` row (type soft)
4. If >= `amount` AND hard stop is enabled → `pauseScopeForBudget()` → `UPDATE agents SET status='paused'`
   → `cancelBudgetScopeWork()` → SIGTERM, then SIGKILL after `graceSec`

Pre-run check: `getInvocationBlock()` only checks the `paused` flag, not the live budget sum.
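
A sketch of the post-run evaluation against a single policy, assuming cost events are already loaded into memory (Paperclip does the sum in SQL) and that the soft threshold is `warn_percent` of `amount`; the type and function names are hypothetical:

```typescript
interface BudgetPolicy {
  scopeType: "company" | "agent" | "project";
  scopeId: string;
  windowKind: "calendar_month_utc" | "lifetime";
  amountCents: number;
  warnPercent: number; // e.g. 80
  hardStopEnabled: boolean;
}

interface CostEvent { scopeId: string; costCents: number; occurredAt: Date }

type BudgetOutcome = "ok" | "soft_incident" | "hard_stop";

// Start of the budget window: epoch for lifetime, else the current UTC month.
function windowStart(policy: BudgetPolicy, now: Date): Date {
  if (policy.windowKind === "lifetime") return new Date(0);
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

function evaluateBudget(policy: BudgetPolicy, events: CostEvent[], now = new Date()): BudgetOutcome {
  const from = windowStart(policy, now).getTime();
  // Equivalent of SELECT SUM(cost_cents) ... filtered by scope and window.
  const spent = events
    .filter((e) => e.scopeId === policy.scopeId && e.occurredAt.getTime() >= from)
    .reduce((sum, e) => sum + e.costCents, 0);
  if (policy.hardStopEnabled && spent >= policy.amountCents) return "hard_stop";
  if (spent >= (policy.amountCents * policy.warnPercent) / 100) return "soft_incident";
  return "ok";
}
```

As in Paperclip, nothing here reserves budget before a run, so the check can only react after a run has already pushed spend past the limit.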

### Skills System

Skills are injected per run as a temp directory of symlinks:
```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface SkillEntry { key: string; runtimeName: string; source: string }

// In the source, `availableEntries` and `desiredNames` are derived from
// `config`; they are explicit parameters here so the excerpt runs on its own.
async function buildSkillsDir(availableEntries: SkillEntry[], desiredNames: Set<string>): Promise<string> {
  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-skills-"));
  const target = path.join(tmp, ".claude", "skills");
  await fs.mkdir(target, { recursive: true });
  for (const entry of availableEntries) {
    if (!desiredNames.has(entry.key)) continue; // only skills enabled for this agent
    await fs.symlink(entry.source, path.join(target, entry.runtimeName));
  }
  return tmp; // Passed as: claude --add-dir <skillsDir>
}
```

Company skills are stored in the DB: the `company_skills` table with `markdown` content,
`source_type` (github|url|local_path|skills_sh), `file_inventory`, and `trust_level`.

### Session Persistence

`agent_task_sessions` table: unique on `(companyId, agentId, adapterType, taskKey)`.
- `taskKey` = issueId (for issue-scoped runs) or `"__heartbeat__"` (timer-only runs)
- `sessionParamsJson` = adapter-specific (Claude stores `{ sessionId, cwd }`)
- Upserted after each run completes

Session compaction: rotate the session after 200 runs, 2M raw input tokens, or 72h of age,
whichever comes first. The Claude adapter declares `nativeContextManagement: "confirmed"`, so
compaction is disabled for it (Claude manages its own context window).
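
The task-key and rotation rules reduce to two small functions. A sketch using the thresholds listed above; `TaskSessionStats`, `taskKeyFor`, and `shouldRotateSession` are hypothetical names, and the stats fields are assumed to be tracked per task session:

```typescript
interface TaskSessionStats {
  runCount: number;
  rawInputTokens: number;
  createdAt: Date;
}

const MAX_RUNS = 200;
const MAX_RAW_INPUT_TOKENS = 2_000_000;
const MAX_AGE_MS = 72 * 60 * 60 * 1000;

// Issue-scoped runs key by issue id; timer-only runs share one key.
function taskKeyFor(issueId: string | null): string {
  return issueId ?? "__heartbeat__";
}

function shouldRotateSession(stats: TaskSessionStats, nativeContextManagement?: string, now = new Date()): boolean {
  // Adapters that manage their own context window opt out of compaction.
  if (nativeContextManagement === "confirmed") return false;
  return (
    stats.runCount >= MAX_RUNS ||
    stats.rawInputTokens >= MAX_RAW_INPUT_TOKENS ||
    now.getTime() - stats.createdAt.getTime() >= MAX_AGE_MS
  );
}
```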

### Org Chart

Just a self-referential `agents.reportsTo` FK plus an `agents.role` text field; there is no separate table.
The chart is rendered as SVG server-side (5 visual styles).
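
Because the org chart is only a self-referential FK, any tree view has to be assembled from the flat rows. A minimal sketch with hypothetical `AgentRow`/`OrgNode` types; rows whose manager is missing become extra roots:

```typescript
interface AgentRow { id: string; role: string; reportsTo: string | null }
interface OrgNode { id: string; role: string; reports: OrgNode[] }

function buildOrgTree(rows: AgentRow[]): OrgNode[] {
  const nodes = new Map<string, OrgNode>();
  for (const r of rows) nodes.set(r.id, { id: r.id, role: r.role, reports: [] });
  const roots: OrgNode[] = [];
  for (const r of rows) {
    const node = nodes.get(r.id)!;
    const manager = r.reportsTo ? nodes.get(r.reportsTo) : undefined;
    // Dangling or missing managers are treated as roots rather than dropped.
    if (manager) manager.reports.push(node);
    else roots.push(node);
  }
  return roots;
}
```

A `reportsTo` cycle would leave its members unreachable from any root, so a real implementation should reject cycles when the FK is written.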

### Database Schema

PostgreSQL via Drizzle ORM, with 55 migrations. Key tables:
- `companies`: tenant root, status, budget
- `agents`: adapter_type, adapter_config (jsonb), runtime_config (jsonb), reports_to, status, budget
- `goals`: self-referencing parent_id, level (company/project/task), owner_agent_id
- `issues`: FKs to goals/projects/agents, execution_run_id (soft lock), parent_id
- `heartbeat_runs`: status, context_snapshot (jsonb), session_id, process_pid, usage_json
- `agent_wakeup_requests`: wake queue with a status enum
- `agent_task_sessions`: per-(agent, adapter, taskKey) session state
- `budget_policies` / `budget_incidents` / `cost_events`: cost control
- `company_skills`: skill definitions with markdown content
- `approvals`: human approval requests
- `routines`: scheduled workflows with cron expressions

### Agent Configuration Format

```json
{
  "adapterConfig": {
    "command": "claude",
    "model": "claude-opus-4-5",
    "cwd": "/path/to/project",
    "promptTemplate": "You are agent {{agent.name}}...",
    "instructionsFilePath": "/path/to/AGENTS.md",
    "dangerouslySkipPermissions": true,
    "maxTurnsPerRun": 0,
    "timeoutSec": 0,
    "graceSec": 20,
    "skills": ["paperclipai/paperclip/mcp-server"]
  },
  "runtimeConfig": {
    "heartbeat": {
      "enabled": true,
      "intervalSec": 300,
      "wakeOnDemand": true,
      "maxConcurrentRuns": 1
    }
  }
}
```
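
A sketch of reading the `runtimeConfig.heartbeat` block above defensively. Only the 1-10 range for `maxConcurrentRuns` comes from the analysis; the fallback values (disabled, interval 0, wake on demand, one concurrent run) and the name `readHeartbeatPolicy` are assumptions, deliberately distinct from Paperclip's own `parseHeartbeatPolicy()`:

```typescript
interface HeartbeatPolicy {
  enabled: boolean;
  intervalSec: number;
  wakeOnDemand: boolean;
  maxConcurrentRuns: number;
}

function readHeartbeatPolicy(runtimeConfig: unknown): HeartbeatPolicy {
  // Tolerate missing or malformed config by falling back to an empty block.
  const raw: Record<string, unknown> =
    typeof runtimeConfig === "object" && runtimeConfig !== null
      ? ((runtimeConfig as { heartbeat?: Record<string, unknown> }).heartbeat ?? {})
      : {};
  const num = (v: unknown, fallback: number) => (typeof v === "number" && Number.isFinite(v) ? v : fallback);
  return {
    enabled: raw.enabled === true,
    intervalSec: Math.max(0, num(raw.intervalSec, 0)),
    wakeOnDemand: raw.wakeOnDemand !== false,
    // Clamp into the documented 1-10 range.
    maxConcurrentRuns: Math.min(10, Math.max(1, Math.trunc(num(raw.maxConcurrentRuns, 1)))),
  };
}
```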

---

## OpenClaw Implementation Details

### Gateway

**File:** `src/gateway/server.impl.ts`

WebSocket server on port 18789 with a flat dispatch table:
```typescript
const coreGatewayHandlers: Record<string, GatewayRequestHandler> = {
  ...connectHandlers, ...chatHandlers, ...cronHandlers,
  ...skillsHandlers, ...sessionsHandlers, ...agentHandlers,
  ...channelsHandlers, ...modelsHandlers, // ...28 handler groups in total
};
```

Auth: roles (`operator` | `node`) plus operator scopes
(`admin`, `read`, `write`, `approvals`, `pairing`).
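
A sketch of flat-table dispatch with a per-method scope check. The method names, the scope each method requires, and the `Client` shape are hypothetical; the sketch models only operator scopes and does not try to reproduce how the `node` role is authorized:

```typescript
type Scope = "admin" | "read" | "write" | "approvals" | "pairing";
interface Client { scopes: Set<Scope> }
type Handler = (params: unknown, client: Client) => Promise<unknown>;
interface Route { requiredScope: Scope; handler: Handler }

// A Map (not a plain object) so names like "toString" cannot hit the prototype.
const routes = new Map<string, Route>([
  ["sessions.list", { requiredScope: "read", handler: async () => [] }],
  ["chat.send", { requiredScope: "write", handler: async (params) => ({ accepted: true, params }) }],
]);

async function dispatch(method: string, params: unknown, client: Client): Promise<unknown> {
  const route = routes.get(method);
  if (!route) throw new Error(`unknown method: ${method}`);
  if (!client.scopes.has(route.requiredScope)) {
    throw new Error(`missing scope ${route.requiredScope} for ${method}`);
  }
  return route.handler(params, client);
}
```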

### Skills System

**Files:** `src/agents/skills/workspace.ts`, `skill-contract.ts`

A skill is a directory containing a SKILL.md file; its frontmatter is parsed for metadata.

Loading limits:
- Max 300 candidates per root
- Max 200 loaded per source
- Max 150 in the prompt
- Max 30,000 chars in the prompt
- Max 256 KB per skill file

Prompt format (XML):
```xml
<available_skills>
  <skill>
    <name>github</name>
    <description>...</description>
    <location>~/.openclaw/workspace/skills/github/SKILL.md</location>
  </skill>
</available_skills>
```

Path compaction: the home directory is replaced with `~` (saves 5-6 tokens per path).

Skill metadata fields: `always`, `skillKey`, `emoji`, `homepage`, `os`,
`requires` (bins, anyBins, env, config), and `install` specs (brew, node, go, uv, download).

ClawHub integration provides a remote skill registry (search, install, update).
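
A sketch that renders the `<available_skills>` block while honoring the prompt limits and the `~` compaction above. The `SkillInfo` shape, the XML escaping, and the choice to drop whole entries once the character budget is reached are assumptions; paths are assumed to be POSIX-style:

```typescript
import * as os from "node:os";

interface SkillInfo { name: string; description: string; filePath: string }

const escapeXml = (s: string) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Replaces a leading home directory with "~" to save tokens.
function compactHome(p: string, home = os.homedir()): string {
  return p === home || p.startsWith(home + "/") ? "~" + p.slice(home.length) : p;
}

function buildSkillsPrompt(skills: SkillInfo[], maxSkills = 150, maxChars = 30_000, home = os.homedir()): string {
  const open = "<available_skills>\n";
  const close = "</available_skills>";
  let body = "";
  let count = 0;
  for (const s of skills) {
    if (count >= maxSkills) break;
    const entry =
      "  <skill>\n" +
      `    <name>${escapeXml(s.name)}</name>\n` +
      `    <description>${escapeXml(s.description)}</description>\n` +
      `    <location>${escapeXml(compactHome(s.filePath, home))}</location>\n` +
      "  </skill>\n";
    // Drop whole entries rather than truncating one mid-tag.
    if (open.length + body.length + entry.length + close.length > maxChars) break;
    body += entry;
    count++;
  }
  return open + body + close;
}
```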

### Memory System

**Files:** `packages/memory-host-sdk/`

Two backends:
- `builtin`: SQLite + the sqlite-vec extension for vector search
- `qmd`: an external QuickMemory Daemon process

Embedding providers: Gemini, Mistral, Ollama, OpenAI, Voyage, Bedrock, local (node-llama).

Interface:
```typescript
interface MemorySearchManager {
  search(query, opts?: { maxResults?, minScore?, sessionKey? }): Promise<MemorySearchResult[]>;
  readFile(params): Promise<{ text, path }>;
  status(): MemoryProviderStatus;
  sync?(params?): Promise<void>;
}
```

Session transcripts can be indexed into the memory backend. MEMORY.md / memory.md is the default
memory file convention.
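
The `search` contract can be illustrated with a brute-force cosine-similarity scan over precomputed embeddings. This is a stand-in, not the `builtin` backend (which delegates vector search to sqlite-vec); the default `maxResults` of 5 and `minScore` of 0 are assumptions:

```typescript
interface MemoryChunk { path: string; text: string; embedding: number[] }
interface MemoryHit { path: string; text: string; score: number }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Scores every chunk, drops those under minScore, returns the best maxResults.
function searchChunks(
  queryEmbedding: number[],
  chunks: MemoryChunk[],
  opts: { maxResults?: number; minScore?: number } = {},
): MemoryHit[] {
  const { maxResults = 5, minScore = 0 } = opts;
  return chunks
    .map((c) => ({ path: c.path, text: c.text, score: cosine(queryEmbedding, c.embedding) }))
    .filter((h) => h.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```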

### HEARTBEAT Mechanism

**File:** `src/auto-reply/heartbeat.ts`

Default prompt:
```
"Read HEARTBEAT.md if it exists. Follow it strictly. Do not infer or repeat
old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK."
```

HEARTBEAT.md task format:
```yaml
tasks:
  - name: email-check
    interval: 30m
    prompt: "Check for urgent unread emails"
```

Key functions:
- `isHeartbeatContentEffectivelyEmpty()`: skips the API call when the file contains only
  headers/empty items. Saves significant cost.
- `parseHeartbeatTasks()`: parses the YAML `tasks` block
- `isTaskDue()`: checks intervals against last-run timestamps
- `stripHeartbeatToken()`: strips HEARTBEAT_OK from responses; responses
  under `ackMaxChars` (300) are suppressed from chat
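
A sketch of the two cost-saving behaviors, emptiness detection and ack suppression. `isEffectivelyEmpty` and `stripAck` are hypothetical re-implementations of the described behavior, not OpenClaw's functions; what counts as an "empty item" and the rule that suppression only applies when the token is present are assumptions:

```typescript
// True when every line is blank, a markdown header, or an empty list item
// ("-", "*", "- [ ]"), so there is nothing worth an API call.
function isEffectivelyEmpty(markdown: string): boolean {
  return markdown.split(/\r?\n/).every((raw) => {
    const line = raw.trim();
    return line === "" || /^#{1,6}(\s|$)/.test(line) || /^[-*+](\s+\[[ xX]?\])?$/.test(line);
  });
}

const ACK_TOKEN = "HEARTBEAT_OK";

// Removes the ack token; returns null when what remains is shorter than
// ackMaxChars, i.e. the reply is treated as a silent acknowledgement.
function stripAck(reply: string, ackMaxChars = 300): string | null {
  if (!reply.includes(ACK_TOKEN)) return reply;
  const rest = reply.split(ACK_TOKEN).join("").trim();
  return rest.length < ackMaxChars ? null : rest;
}
```

Checking emptiness before building the prompt means an untouched template HEARTBEAT.md costs nothing per beat.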

**HeartbeatRunner** (`infra/heartbeat-runner.ts`):
- Per-agent intervals (default 30m)
- `HeartbeatAgentState` tracks `lastRunMs`, `nextDueMs`, `intervalMs`
- On fire: reads HEARTBEAT.md, builds the prompt, dispatches an inbound message
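
A sketch of the interval bookkeeping behind `isTaskDue()` and `HeartbeatAgentState`. The accepted units (`s`, `m`, `h`, `d`) and the rule that a never-run task is due immediately are assumptions; `parseIntervalMs`, `isDue`, and `nextDueMs` are hypothetical names:

```typescript
const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };

// Parses specs like "30m" or "1h" into milliseconds.
function parseIntervalMs(spec: string): number {
  const match = /^(\d+)\s*([smhd])$/.exec(spec.trim());
  if (!match) throw new Error(`bad interval: ${spec}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

interface TaskState { name: string; intervalMs: number; lastRunMs?: number }

// A task that has never run is due immediately.
function isDue(task: TaskState, nowMs: number): boolean {
  return task.lastRunMs === undefined || nowMs - task.lastRunMs >= task.intervalMs;
}

function nextDueMs(task: TaskState, nowMs: number): number {
  return task.lastRunMs === undefined ? nowMs : task.lastRunMs + task.intervalMs;
}
```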

### Cron Service

**File:** `src/cron/service.ts`

Three schedule types:
- `{ kind: "at"; at: string }`: one-shot
- `{ kind: "every"; everyMs: number }`: interval
- `{ kind: "cron"; expr: string; tz?: string; staggerMs?: number }`: cron expression

Two job payload types:
- `systemEvent`: injects text into an existing session (needs a session that is available to pick it up)
- `agentTurn`: fires a full agent turn (true background autonomy)

Session targets: `"main" | "isolated" | "current" | "session:<id>"`.
An isolated target gets its own session key with freshness/rollover logic.

Startup catch-up: runs up to 5 missed jobs immediately and staggers the rest (5s gap).
Failure alerts fire after N consecutive errors, with a 1h cooldown.
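
The catch-up rule can be sketched as a planning step that assigns each missed job a start time. The limit of 5 and the 5s gap come from the text; ordering by how long ago a job was missed, and the `planCatchup` name, are assumptions:

```typescript
interface MissedJob { id: string; missedAtMs: number }

function planCatchup(
  missed: MissedJob[],
  nowMs: number,
  immediateLimit = 5,
  staggerGapMs = 5_000,
): Array<{ id: string; runAtMs: number }> {
  // Oldest misses first (assumed ordering).
  const ordered = [...missed].sort((a, b) => a.missedAtMs - b.missedAtMs);
  return ordered.map((job, i) => ({
    id: job.id,
    // The first `immediateLimit` jobs start now; each later one waits one more gap.
    runAtMs: i < immediateLimit ? nowMs : nowMs + (i - immediateLimit + 1) * staggerGapMs,
  }));
}
```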

### Multi-Agent Routing

Session key format: `agent:<agentId>:<key>`.
Type detection via `isCronSessionKey()`, `isSubagentSessionKey()`, and `isAcpSessionKey()`.

Per-agent isolation: each agent has its own workspace, session store, skill set, heartbeat config, and model config.

Subagent spawning: ACP-based, with session depth tracked in the keys and reactivation support.
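
Helpers for the `agent:<agentId>:<key>` format, as a sketch. They assume agent ids never contain `:` while the trailing key may (for example, nested keys); the function names are hypothetical and are not OpenClaw's `is*SessionKey()` helpers:

```typescript
interface ParsedSessionKey { agentId: string; key: string }

function buildSessionKey(agentId: string, key: string): string {
  if (agentId.includes(":")) throw new Error("agentId must not contain ':'");
  return `agent:${agentId}:${key}`;
}

// Everything after the second colon is the key, so nested keys stay intact.
function parseSessionKey(sessionKey: string): ParsedSessionKey | null {
  const match = /^agent:([^:]+):(.+)$/.exec(sessionKey);
  return match ? { agentId: match[1], key: match[2] } : null;
}
```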

### Channel Adapter Interface

**File:** `src/channels/plugins/types.plugin.ts`

```typescript
type ChannelPlugin<ResolvedAccount> = {
  id: ChannelId;
  meta: ChannelMeta;
  capabilities: ChannelCapabilities;
  outbound?: ChannelOutboundAdapter;
  messaging?: ChannelMessagingAdapter;
  lifecycle?: ChannelLifecycleAdapter;
  heartbeat?: ChannelHeartbeatAdapter;
  security?: ChannelSecurityAdapter;
  agentTools?: ChannelAgentToolFactory;
  streaming?: ChannelStreamingAdapter;
  threading?: ChannelThreadingAdapter;
  // ~15 optional adapter slots in total
};
```

Restart policy: exponential backoff (5s initial, 5min max, factor 2, jitter 0.1,
max 10 attempts).
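
The restart policy translates into a capped exponential schedule. A sketch using the listed parameters; how OpenClaw applies the 0.1 jitter is not recorded here, so this assumes a symmetric ±10% spread around the capped delay, and `restartDelayMs` is a hypothetical name:

```typescript
interface BackoffPolicy { initialMs: number; maxMs: number; factor: number; jitter: number; maxAttempts: number }

const CHANNEL_RESTART: BackoffPolicy = { initialMs: 5_000, maxMs: 300_000, factor: 2, jitter: 0.1, maxAttempts: 10 };

// Delay before restart `attempt` (1-based), or null once attempts are exhausted.
// `random` is injectable so the schedule can be tested deterministically.
function restartDelayMs(attempt: number, policy = CHANNEL_RESTART, random = Math.random): number | null {
  if (attempt < 1 || attempt > policy.maxAttempts) return null;
  const base = Math.min(policy.maxMs, policy.initialMs * policy.factor ** (attempt - 1));
  const spread = base * policy.jitter * (random() * 2 - 1); // within ±jitter × base
  return Math.round(base + spread);
}
```

With these numbers the delay reaches the 5-minute cap on the 7th attempt (5s × 2^6 = 320s), so attempts 7-10 all wait about 5 minutes before the channel is given up on.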

### Security

- Exec approval: `ExecApprovalManager` with a promise-based flow, `allow-once` vs
  `allow-always`, 15s grace timeout
- Tool policy: `pickSandboxToolPolicy()` per sandbox config
- Security audit: comprehensive checks (gateway auth, channel config, plugin trust,
  exec surfaces, filesystem ACLs)
- Auth rate limiting, with stricter browser-specific limits
- External content guard: tracks provenance; `allowUnsafeExternalContent` flag
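
A sketch of a promise-based approval flow with `allow-once` / `allow-always` semantics. It is not `ExecApprovalManager`; the assumptions are that an unanswered request is denied after the timeout (15s by default, matching the figure above) and that `allow-always` is remembered per exact command string:

```typescript
type Decision = "allow-once" | "allow-always" | "deny";

class ApprovalQueue {
  private pending = new Map<string, (d: Decision) => void>();
  private alwaysAllowed = new Set<string>();
  private nextId = 0;

  request(command: string, timeoutMs = 15_000): { id: string; decision: Promise<Decision> } {
    const id = String(++this.nextId);
    if (this.alwaysAllowed.has(command)) return { id, decision: Promise.resolve("allow-always") };
    const decision = new Promise<Decision>((resolve) => {
      // Nobody answered in time: fail closed.
      const timer = setTimeout(() => this.settle(id, "deny"), timeoutMs);
      this.pending.set(id, (d) => {
        clearTimeout(timer);
        if (d === "allow-always") this.alwaysAllowed.add(command);
        resolve(d);
      });
    });
    return { id, decision };
  }

  // Called by the operator UI; returns false if the request already settled.
  settle(id: string, decision: Decision): boolean {
    const finish = this.pending.get(id);
    if (!finish) return false;
    this.pending.delete(id);
    finish(decision);
    return true;
  }
}
```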

### Agent Configuration

```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-5"
      fallbacks: ["anthropic/claude-sonnet-4-5"]
    heartbeat:
      enabled: true
      every: "30m"
      prompt: "Check HEARTBEAT.md"
      ackMaxChars: 300
    skills:
      limits:
        maxSkillsInPrompt: 150
        maxSkillsPromptChars: 30000
  list:
    - id: "myagent"
      workspace: "~/workspace"
      model:
        primary: "anthropic/claude-sonnet-4-5"
      heartbeat:
        every: "1h"
      skills:
        filter: ["github", "slack"]
```

Workspace files: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md,
HEARTBEAT.md, BOOTSTRAP.md, MEMORY.md.
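
A sketch of how a `list` entry could be resolved against `defaults`, assuming objects merge key by key while arrays and scalars in the agent entry replace the default; OpenClaw's actual merge rules are not captured in this analysis, and `mergeConfig` is a hypothetical name:

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively overlays `override` on `base`; non-object values replace wholesale.
function mergeConfig(base: Json | undefined, override: Json | undefined): Json | undefined {
  if (override === undefined) return base;
  if (!isPlainObject(base) || !isPlainObject(override)) return override;
  const out: { [key: string]: Json } = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const merged = mergeConfig(base[key], value);
    if (merged !== undefined) out[key] = merged;
  }
  return out;
}
```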

### Plugin Hooks

29 lifecycle hook points, including:
`before_model_resolve`, `before_prompt_build`, `before_agent_start`,
`before_agent_reply`, `llm_input`, `llm_output`, `agent_end`,
`inbound_claim`, `message_received`, `message_sending`, `message_sent`,
`before_tool_call`, `after_tool_call`, `session_start`, `session_end`,
`subagent_spawning`, `subagent_delivery_target`, `gateway_start`,
`gateway_stop`, `before_dispatch`, `reply_dispatch`, and `before_install`.

---

## Patterns Worth Replicating in Agent Factory

### From Paperclip

1. **Heartbeat as context injection**: each beat starts clean and loads a curated context
   packet. Maps to: `/schedule` trigger + CLAUDE.md + memory files loaded per session.

2. **Adapter interface**: a clean `execute(ctx)` pattern. Maps to: our agent files, which
   are already adapter-like (model, tools, prompt per agent).

3. **Budget as a governance primitive**: post-hoc cost tracking with pause thresholds.
   Maps to: a hook that reads `/usage` after each run, logs to a cost-events file, and
   alerts when a threshold is crossed.

4. **Task checkout via file locking**: Paperclip uses PostgreSQL. We can use
   file-based locking (write `task.lock` containing the agent name; check it before claiming).

5. **Session persistence via taskKey**: different tasks get different sessions.
   Maps to: `--resume` with task-specific session IDs.
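
Pattern 4 can avoid a check-then-write race by letting the filesystem do the check: opening with the `wx` flag fails with `EEXIST` if the file already exists. A sketch, assuming a lock file whose first line is the agent name; `tryClaimTask`, `lockOwner`, and `releaseTask` are hypothetical helpers:

```typescript
import { promises as fs } from "node:fs";

// "wx" = create exclusively, so checking and claiming happen in one atomic call.
async function tryClaimTask(lockPath: string, agentName: string): Promise<boolean> {
  try {
    const handle = await fs.open(lockPath, "wx");
    try {
      await handle.writeFile(`${agentName}\n${new Date().toISOString()}\n`);
    } finally {
      await handle.close();
    }
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false; // someone else holds it
    throw err;
  }
}

async function lockOwner(lockPath: string): Promise<string | null> {
  try {
    return (await fs.readFile(lockPath, "utf8")).split("\n")[0] || null;
  } catch {
    return null;
  }
}

// Only the owner removes the lock, mirroring Paperclip's
// "execution_run_id IS NULL OR = me" condition.
async function releaseTask(lockPath: string, agentName: string): Promise<void> {
  if ((await lockOwner(lockPath)) === agentName) await fs.rm(lockPath, { force: true });
}
```

Stale locks from crashed agents are not handled; the timestamp on the second line gives a generated script something to expire them by, analogous to Paperclip's orphan reaping.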

### From OpenClaw

6. **HEARTBEAT.md with task parsing**: a YAML tasks block with intervals and
   due-time checking. Maps directly to our generated HEARTBEAT.md files.

7. **Emptiness detection**: skip API calls when the heartbeat file is effectively empty.
   A critical cost saver; include it in generated heartbeat scripts.

8. **Skill prompt XML format**: standardized skill discovery in the system prompt.
   Our skills already use this via Claude Code's built-in mechanism.

9. **3-tier memory**: SESSION-STATE.md (hot) + daily logs (warm) + MEMORY.md (cold).
   Maps to: templates we generate in the user's project.

10. **Startup catch-up with stagger**: run missed jobs on restart without causing a
    thundering herd. Include in generated automation scripts.

### Unique to Agent Factory

11. **Guided construction**: neither tool helps you BUILD the system. We do.
12. **Progressive complexity**: start with 1 agent, grow to a full org.
13. **Domain templates**: not just researcher→writer→reviewer, but also monitoring,
    code review, data processing, and research synthesis.
14. **Claude Code-native**: no PostgreSQL, no Node.js server, no Docker required.
    Just agents, skills, hooks, settings.json, and /schedule.