agent-builder/.claude/plans/blueprints/session-7-skill-updates.md
Kjell Tore Guttormsen 1a776bdeb2 docs(plans): create session blueprints for Agent Factory execution
8 session blueprints covering all 27 steps across 3 waves:
- Session 1: Foundation (rename + commands, Steps 1-5)
- Session 2: Skills and templates (Steps 6-7)
- Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12)
- Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18)
- Session 5: Self-learning (feedback/optimization, Steps 20-21)
- Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24)
- Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25)
- Session 8: Finalization (build command integration + v1.0, Steps 8,26,27)

Also updates plan assumptions table with verified findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 11:21:17 +02:00


Session 7: Skill Updates and References

Steps 13, 19, 25 | Wave 2 | Depends on: Sessions 3 and 4

Dependencies

Entry condition: Sessions 3 and 4 must be complete. Session 3 creates the memory and autonomy templates referenced in Step 13. Session 4 creates the orchestration and governance templates referenced in Step 19. Verify before starting:

ls /Users/ktg/repos/agent-builder/scripts/templates/memory/SESSION-STATE.md
ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/HEARTBEAT.md
ls /Users/ktg/repos/agent-builder/scripts/templates/governance/GOVERNANCE.md
ls /Users/ktg/repos/agent-builder/scripts/templates/goals/GOALS.md

All four files must exist.
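The four `ls` checks can also run as one loop that names every missing file and returns non-zero. That makes the entry condition usable as a gate in a script. This is a sketch only: the `check_entry_condition` helper and its repo-root argument are hypothetical, not part of the plan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the Session 3 and 4 templates exist under a repo root.
# Prints each missing file to stderr and returns 1 if any are absent.
check_entry_condition() {   # check_entry_condition REPO_ROOT
  local root="$1" missing=0 f
  for f in \
    scripts/templates/memory/SESSION-STATE.md \
    scripts/templates/heartbeat/HEARTBEAT.md \
    scripts/templates/governance/GOVERNANCE.md \
    scripts/templates/goals/GOALS.md
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_entry_condition /Users/ktg/repos/agent-builder || exit 1` stops the session before Step 13 if any Session 3 or 4 template is absent.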

Scope Fence

Touch:

  • skills/agent-system-design/SKILL.md (modify)
  • skills/agent-system-design/references/memory-patterns.md (new)
  • skills/agent-system-design/references/autonomy-patterns.md (new)
  • skills/agent-system-design/references/orchestration-patterns.md (new)
  • skills/agent-system-design/references/governance-patterns.md (new)
  • skills/agent-system-design/references/mcp-integrations.md (new)

Never touch:

  • commands/
  • agents/
  • scripts/templates/

Step 13: Add memory and autonomy pattern references to agent-system-design skill

Files to modify

skills/agent-system-design/SKILL.md — Apply these diffs:

Diff 1 — Add trigger phrases to the description field in frontmatter.

Find:

  "how to build an agent with Claude Code"
version: 0.1.0

Replace with:

  "how to build an agent with Claude Code",
  "agent memory", "3-tier memory", "WAL protocol",
  "proactive agent", "self-improving agent", "heartbeat scheduling"
version: 0.1.0

Diff 2 — Update the System components table to include Memory and Heartbeat rows.

Find:

| Automation | `scripts/` + `launchd/` | Scheduled execution (Mac: launchd, Linux: systemd/cron) |
| Memory | `memory/` or `data/` | Persistent state files updated each run |

Replace with:

| Automation | `scripts/` + `launchd/` | Scheduled execution (Mac: launchd, Linux: systemd/cron) |
| Memory | `memory/` or `data/` | Persistent state files updated each run |
| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety |
| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state |

Diff 3 — Add Memory patterns, Autonomy patterns, and Heartbeat scheduling sections before "Getting started".

Find:

## Getting started

Run `/agent-factory:build` for the guided 7-phase workflow.

Replace with:

## Memory patterns

Autonomous agents lose state between sessions. The 3-tier memory architecture gives
agents persistent memory that survives context compaction and crashes:

- **Hot tier** — `SESSION-STATE.md`: written every turn using the WAL protocol. Read
  first on resume. Captures current task, active decisions, pending actions.
- **Warm tier** — `DAILY-LOG.md`: rolling 7-day log of decisions and completions.
  Used for mid-term recall and generating daily summaries.
- **Cold tier** — `MEMORY.md`: durable facts, project history, learned preferences.
  Updated by compaction from DAILY-LOG at end of week.

The WAL (Write-Ahead Log) protocol requires agents to write critical state to
SESSION-STATE.md *before* responding. If the session crashes mid-response, the
next session reads SESSION-STATE.md and recovers.

For full memory architecture, compaction procedures, and template locations, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md`

## Autonomy patterns

Proactive agents act without explicit user prompts. Two constraints govern when
an agent should act autonomously versus wait:

- **ADL (Autonomous Decision Limit)**: Defines which actions the agent may take
  without approval. Examples: `ADL: read files, write to memory/, run tests`.
  Anything not in the ADL list requires escalation.
- **VFM (Value/Friction Matrix)**: Scores each candidate action by value (benefit
  to user) and friction (cost, reversibility, risk), each on a 0-10 scale. Act
  autonomously only when value is high and friction is low.

The self-healing protocol handles transient failures: retry up to 5 attempts with
exponential backoff, then pause and log. Never retry indefinitely without a cap.

For the full proactive cycle, ADL constraint examples, VFM scoring rubric with
worked examples, and when NOT to use proactive patterns, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md`

## Heartbeat scheduling

The heartbeat pattern wakes an agent on a schedule, injects a context packet
(goals, memory summary, pending tasks, budget status, wake reason), and lets the
agent decide what to do in this beat. If nothing needs attention, the agent
responds `HEARTBEAT_OK` and exits.

This differs from a simple cron trigger: the context packet ensures the agent
always wakes up knowing its current state, even if it has no persistent memory
of previous sessions.

Template locations (created in Sessions 3 and 4):
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/HEARTBEAT.md`
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/heartbeat-runner.sh`
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/context-packet.md`
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/wake-prompt.md`

## Getting started

Run `/agent-factory:build` for the guided 7-phase workflow.

Diff 4 — Add memory-patterns and autonomy-patterns links to the "Getting started" reference block.

Find:

For pipeline design patterns and agent role templates, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/pipeline-patterns.md`

Replace with:

For pipeline design patterns and agent role templates, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/pipeline-patterns.md`

For 3-tier memory architecture, WAL protocol, and compaction procedures, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/memory-patterns.md`

For proactive agent patterns, ADL constraints, and VFM scoring, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md`

Files to create

skills/agent-system-design/references/memory-patterns.md:

# Memory Patterns for Autonomous Agents

Reference for the agent-system-design skill. Covers 3-tier memory architecture,
WAL protocol, Working Buffer protocol, compaction recovery, and template locations.

---

## 1. Three-Tier Memory Architecture

Autonomous agents have no built-in persistent memory. The 3-tier pattern gives
them a structured place to write and read state across sessions and compaction events.

| Tier | File | Volatility | Updated | Read order |
|------|------|-----------|---------|------------|
| Hot | `SESSION-STATE.md` | Every turn | Agent writes before each response | First |
| Warm | `DAILY-LOG.md` | Daily | Agent appends at session end | Second |
| Cold | `MEMORY.md` | Weekly | Compaction from DAILY-LOG | Third |

**Read order on session start:** SESSION-STATE.md → DAILY-LOG.md (last 2 days) →
MEMORY.md. This ensures the agent loads the most recent context first without
reading the entire history.

**Write order on session end:** Update SESSION-STATE.md first (WAL), then append
to DAILY-LOG.md, then check if weekly compaction is due.

---

## 2. WAL Protocol

WAL (Write-Ahead Log) prevents data loss when sessions crash or context compacts
mid-response. The rule is simple: **write before you respond**.

### Protocol

1. Agent receives a message or trigger
2. **Before generating the response:** write current task, key decisions, and
   pending actions to SESSION-STATE.md
3. Generate the response
4. On session resume: read SESSION-STATE.md first — it contains the last known state

### Why it works

If the session crashes after step 2 but before step 3 completes, the next session
reads SESSION-STATE.md and recovers. Without WAL, the crash leaves no trace of what
the agent was doing.
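Step 2 can itself be made crash-safe. Build the updated file next to SESSION-STATE.md, then rename it into place. A rename within one filesystem swaps the file in a single step, so a crash during the write leaves the previous state intact. This is a minimal sketch that assumes a hypothetical `wal_write` helper appending an entry in the format of the example entry. It is not part of the Session 3 templates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a WAL entry to SESSION-STATE.md *before* responding.
# The new file is assembled in a temp file in the same directory and then
# renamed over the old one, so readers never see a half-written entry.
wal_write() {   # wal_write STATE_FILE TASK DECISION NEXT_ACTION
  local state_file="$1" task="$2" decision="$3" next="$4"
  local tmp="${state_file}.tmp.$$"
  {
    # Keep existing content (earlier entries, any Working Buffer).
    if [ -f "$state_file" ]; then cat "$state_file"; fi
    printf '\n## WAL Entry — %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf '**Task:** %s\n' "$task"
    printf '**Decision:** %s\n' "$decision"
    printf '**Next action:** %s\n' "$next"
    printf '**Status:** in progress\n'
  } > "$tmp" && mv "$tmp" "$state_file"
}
```

Appending keeps earlier entries and any Working Buffer in place, with the newest entry last. The temp file has to live in the same directory: `mv` only acts as a single-step rename when source and target are on the same filesystem.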

### Example entry

```markdown
## WAL Entry — 2025-11-14T09:30:00Z

**Task:** Compacting DAILY-LOG.md into MEMORY.md
**Decision:** Keep entries from last 14 days, summarize older
**Next action:** Write compacted MEMORY.md, then truncate DAILY-LOG.md
**Status:** in progress
```

The agent writes this before starting the compaction. If it crashes mid-compaction, the next session reads this entry and knows to check whether MEMORY.md was updated.


---

## 3. Working Buffer Protocol

When context window usage exceeds 60%, the agent activates the Working Buffer in SESSION-STATE.md. This is a scratchpad for capturing key exchanges before they are pushed out of the context window by compaction.

### When to activate

The agent monitors context usage through self-assessment. Signals to activate:

- Response latency increasing (internal signal)
- Conversation has been active for 90+ minutes
- Agent notices it cannot recall something from earlier in the session

### Working Buffer format

```markdown
## Working Buffer (ACTIVE — context at 72%)

### Key exchange 1 — 09:15
User asked about X. Decided to use approach Y because Z.

### Key exchange 2 — 09:40
Discovered constraint: W. This affects steps 3 and 4.

### Decision: [the most critical unresolved decision in this session]
```

The Working Buffer is read before generating any response once activated. Deactivate by moving captured content to DAILY-LOG.md at session end.
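Deactivation can be scripted as one step at session end. This sketch assumes the buffer runs from its `## Working Buffer` heading to the next level-2 heading or the end of the file, so its `###` entries move with it. `flush_working_buffer` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move the "## Working Buffer" section of SESSION-STATE.md
# into DAILY-LOG.md, then rewrite SESSION-STATE.md without it.
flush_working_buffer() {   # flush_working_buffer SESSION_STATE DAILY_LOG
  local state="$1" log="$2" tmp="${1}.tmp.$$"
  # 1. Append the buffer section (if present) to the warm-tier log.
  awk '/^## Working Buffer/ { inbuf = 1; print; next }
       /^## /               { inbuf = 0 }
       inbuf' "$state" >> "$log"
  # 2. Keep everything except the buffer section in the hot tier.
  awk '/^## Working Buffer/ { inbuf = 1; next }
       /^## /               { inbuf = 0 }
       !inbuf' "$state" > "$tmp" && mv "$tmp" "$state"
}
```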


---

## 4. Compaction Recovery Read Order

When a session resumes after a context compaction event (Claude Code auto-summary), the agent may have lost conversational context. Recovery read order:

  1. SESSION-STATE.md — current task and WAL entry (most recent state)
  2. DAILY-LOG.md — last 2 days of entries (recent decisions and completions)
  3. MEMORY.md — durable facts (only if SESSION-STATE.md indicates this is needed)

This order minimizes tokens loaded while ensuring the agent can continue its most recent task without asking the user to repeat context.
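A runner or hook can emit this recovery context in the same order. This minimal sketch rests on two assumptions. First, DAILY-LOG.md opens each day with a `## <date>` heading, so "last 2 days" becomes the last two such sections. Second, the protocol leaves the decision to load MEMORY.md to SESSION-STATE.md; here the caller passes that decision in as a flag. The `recovery_context` name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print recovery context in hot -> warm -> cold order.
recovery_context() {   # recovery_context MEMORY_DIR [yes|no: include MEMORY.md]
  local dir="$1" include_cold="${2:-no}"
  # 1. Hot tier: current task and WAL entry.
  if [ -f "$dir/SESSION-STATE.md" ]; then cat "$dir/SESSION-STATE.md"; fi
  # 2. Warm tier: only the last two "## " day sections (two passes over the file:
  #    the first counts headings, the second prints from the second-to-last one).
  if [ -f "$dir/DAILY-LOG.md" ]; then
    awk -v n=2 'NR == FNR { if (/^## /) c++; next }
                /^## /    { seen++ }
                seen > c - n' "$dir/DAILY-LOG.md" "$dir/DAILY-LOG.md"
  fi
  # 3. Cold tier: durable facts, only when requested.
  if [ "$include_cold" = "yes" ] && [ -f "$dir/MEMORY.md" ]; then
    cat "$dir/MEMORY.md"
  fi
}
```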


---

## 5. Template Locations

All memory templates are created in Session 3 of the Agent Factory implementation plan.

| Template | Location | Purpose |
|----------|----------|---------|
| SESSION-STATE.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/SESSION-STATE.md` | Hot tier template |
| DAILY-LOG.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/DAILY-LOG.md` | Warm tier template |
| MEMORY.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/MEMORY.md` | Cold tier template |
| README.md | `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/README.md` | Setup instructions |

The Agent Factory builder copies these templates into the user's project directory during Phase 2 (memory configuration) of the `/agent-factory:build` workflow.


---

## 6. Memory and the Heartbeat Pattern

The heartbeat pattern (see `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md`) reads the memory files to build a context packet before each wakeup. The context injection pattern is:

```
heartbeat-runner.sh
  → read SESSION-STATE.md       (current task and WAL)
  → read DAILY-LOG.md (2 days)  (recent activity)
  → read GOALS.md               (active goals)
  → build context-packet.md     (assembled packet)
  → inject into wake-prompt.md  (agent receives full context)
  → invoke agent
```

This means the heartbeat runner is the memory reader, not the agent itself. The agent receives a pre-assembled context packet and does not need to read memory files directly.


skills/agent-system-design/references/autonomy-patterns.md:

# Autonomy Patterns for Proactive Agents

Reference for the agent-system-design skill. Covers the proactive agent cycle,
ADL constraints, VFM scoring, self-healing protocol, agentTurn vs systemEvent,
and when NOT to use proactive patterns.

---

## 1. The Proactive Agent Cycle

A proactive agent wakes on schedule, evaluates its situation, acts or skips,
then sleeps again. The cycle has five stages:

```
WAKE
  ↓
READ context (SESSION-STATE, GOALS, recent events)
  ↓
EVALUATE: Is there work to do? (VFM scoring)
  ↓
If value > threshold AND friction < threshold:
  ACT: take the highest-value low-friction action
  WRITE state (WAL before acting, update after)
  ↓
If no qualifying action:
  LOG: "HEARTBEAT_OK — no action needed"
  ↓
SLEEP (until next scheduled wakeup)
```


The key constraint: the agent evaluates *before* acting. An agent that acts
without evaluation is not proactive — it is reactive and unpredictable.

---

## 2. ADL — Autonomous Decision Limit

The ADL defines which actions the agent may take without human approval.
Everything outside the ADL requires escalation.

### ADL format

```markdown
## Autonomous Decision Limit

The agent MAY autonomously:
- Read files anywhere in the project directory
- Write to memory/ and logs/ directories
- Run tests (npm test, pytest, cargo test)
- Create branches with prefix feature/ or fix/
- Send Slack messages to #agent-activity channel only

The agent MUST escalate before:
- Modifying files outside the project directory
- Making commits to main/master
- Sending messages to any channel other than #agent-activity
- Making API calls that have costs
- Taking any irreversible action not listed above
```

### ADL design principles

- **Start narrow, expand on evidence.** Start with read-only and memory writes. Expand the ADL only after observing the agent behave correctly across multiple sessions.
- **Irreversibility drives the line.** Actions that cannot be undone (commits, external API calls, file deletions) stay outside the ADL until trust is earned.
- **Channel specificity over channel type.** "Send Slack messages" is too broad. "Send Slack messages to #agent-activity channel only" is a testable constraint.
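One way to keep the ADL testable is to store it as an allowlist of exact action identifiers. The runner checks it before each action and denies anything unlisted. This sketch works under that assumption. The allowlist file (here `adl.txt`, one identifier per line such as `slack:#agent-activity`) and the `adl_allows`/`guarded_run` helpers are hypothetical, not a format defined by the governance templates.

```bash
#!/usr/bin/env bash
# Hypothetical default-deny ADL check: an action is allowed only if its
# identifier appears as a whole line in the allowlist file.
adl_allows() {   # adl_allows ALLOWLIST ACTION
  local allowlist="$1" action="$2"
  [ -f "$allowlist" ] || return 1    # no ADL file means an empty ADL
  grep -Fxq -- "$action" "$allowlist"
}

# Run a command only when its action identifier is inside the ADL;
# otherwise report that escalation is needed and return 2.
guarded_run() {   # guarded_run ALLOWLIST ACTION COMMAND [ARGS...]
  local allowlist="$1" action="$2"
  shift 2
  if adl_allows "$allowlist" "$action"; then
    "$@"
  else
    echo "ESCALATE: '$action' is outside the ADL" >&2
    return 2
  fi
}
```

For example, `guarded_run adl.txt test:npm npm test` runs the tests only if `test:npm` is listed. Exact whole-line matching (`grep -Fx`) is the design choice here: `slack:#general` does not slip through because `slack:#agent-activity` is allowed.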

### ADL examples by autonomy level

| Level | ADL scope |
|-------|-----------|
| 0 — Human in loop | Empty ADL. Every action requires approval. |
| 1 — Read-only | Read files, read memory, read logs. No writes. |
| 2 — Memory + logs | Level 1 plus write to memory/ and logs/. |
| 3 — Local automation | Level 2 plus run tests, create branches, send designated channel messages. |
| 4 — Trusted operator | Level 3 plus commit to non-main branches, call approved external APIs. |

Autonomy level 4 (full trust) should only be reached after the agent has operated at level 3 for a sustained period without escalation failures.


---

## 3. VFM — Value/Friction Matrix

The VFM scoring rubric determines whether an action qualifies for autonomous execution. Each candidate action gets two scores:

- **Value (0-10):** How much does this action benefit the user if done now?
- **Friction (0-10):** How much effort, risk, or cost does this action incur?

### Decision rule

```
If value >= 7 AND friction <= 3: ACT autonomously
If value >= 5 AND friction <= 2: ACT autonomously
If value >= 8 AND friction <= 5: ACT autonomously (high-value exception)
Otherwise: LOG intent, DO NOT ACT, optionally notify
```

The thresholds are starting points. Tune them based on observed behavior. If the agent acts too aggressively, lower the friction thresholds or raise the value thresholds. If it acts too conservatively, do the opposite.

### Worked examples

**Example 1 — Update SESSION-STATE.md at end of session**

- Value: 9 (prevents state loss, always needed)
- Friction: 1 (one file write, fully reversible)
- Decision: ACT (9 >= 7, 1 <= 3) ✓

**Example 2 — Send daily summary to Slack**

- Value: 7 (useful to the user, expected behavior)
- Friction: 3 (external message, channel is in ADL, low risk)
- Decision: ACT (7 >= 7, 3 <= 3) ✓

**Example 3 — Commit and push to main branch**

- Value: 6 (saves work, but user may want to review)
- Friction: 8 (irreversible without force push, affects others)
- Decision: DO NOT ACT (friction > 3) — log intent and notify

**Example 4 — Run the full test suite**

- Value: 5 (useful, but not urgent if nothing changed)
- Friction: 2 (runs locally, no external side effects)
- Decision: ACT (5 >= 5, 2 <= 2) ✓

**Example 5 — Delete build artifacts older than 30 days**

- Value: 4 (marginal disk savings)
- Friction: 4 (deletion is irreversible, could delete needed files)
- Decision: DO NOT ACT (value < 5, friction > 3) — log suggestion
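The decision rule maps directly onto a small function that a runner can call with the two scores. In this sketch `vfm_decide` is a hypothetical name. It prints `ACT` or `LOG`, and the thresholds are the starting values above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the VFM decision rule to one candidate action.
# Prints ACT or LOG; thresholds are the starting values and should be tuned.
vfm_decide() {   # vfm_decide VALUE FRICTION  (integers 0-10)
  local value="$1" friction="$2"
  if (( value >= 7 && friction <= 3 )) ||
     (( value >= 5 && friction <= 2 )) ||
     (( value >= 8 && friction <= 5 )); then   # last pair: high-value exception
    echo ACT
  else
    echo LOG   # log intent, do not act, optionally notify
  fi
}
```

Running `vfm_decide` on the five worked examples (9/1, 7/3, 6/8, 5/2, 4/4) prints ACT, ACT, LOG, ACT, LOG, which matches the decisions listed above.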

---

## 4. Self-Healing Protocol

Autonomous agents encounter transient failures: network timeouts, locked files, API rate limits, temporary permission errors. The self-healing protocol handles these without human intervention, up to a defined limit.

### Protocol

```
Attempt action
  → On success: continue
  → On transient failure (timeout, rate limit, lock contention):
      Wait 2^n seconds after failed attempt n (exponential backoff: 2, 4, 8, 16 seconds)
      Retry
      After 5 attempts: PAUSE and log failure
  → On hard failure (permission denied, file not found, invalid input):
      Do NOT retry
      Log failure with full error message
      Escalate if action was in ADL
```

The 5-attempt limit is mandatory. An agent that retries indefinitely will run forever on a permanent failure. Always cap retries and log when the cap is reached.

### Transient vs hard failures

| Error type | Retry? | Examples |
|------------|--------|----------|
| Network timeout | Yes | Connection timed out, HTTP 503, connection refused |
| File lock | Yes | File currently being written by another process |
| Rate limit | Yes (with longer wait) | API 429, git push rate limit |
| Permission denied | No | chmod 000 file, read-only filesystem |
| File not found | No (usually) | Missing dependency, wrong path |
| Invalid input | No | Malformed JSON, wrong argument type |
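The protocol and the table above can be combined into one wrapper. This sketch assumes a convention the document does not define. The wrapped command exits with 75 (`EX_TEMPFAIL` in BSD `sysexits.h`) for transient failures, and any other non-zero status counts as a hard failure. `retry_with_backoff` and `SLEEP_CMD` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical self-healing wrapper. Assumption: exit status 75 (EX_TEMPFAIL)
# means "transient, retry"; any other non-zero status is a hard failure.
MAX_ATTEMPTS=5
SLEEP_CMD="${SLEEP_CMD:-sleep}"   # set SLEEP_CMD=: to skip waits when testing

retry_with_backoff() {   # retry_with_backoff COMMAND [ARGS...]
  local attempt=1 rc
  while :; do
    "$@" && return 0
    rc=$?
    if [ "$rc" -ne 75 ]; then
      echo "hard failure (exit $rc): $*" >&2
      return "$rc"                       # no retry; caller logs and escalates
    fi
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      echo "PAUSE: $* still failing after $attempt attempts" >&2
      return "$rc"                       # cap reached; never retry forever
    fi
    "$SLEEP_CMD" $((1 << attempt))       # waits 2, 4, 8, 16 seconds
    attempt=$((attempt + 1))
  done
}
```

Hard failures return immediately with the command's own status, so the caller can log and escalate. After the fifth transient failure the function stops without a final wait.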

---

## 5. agentTurn vs systemEvent

Two trigger types for proactive agents. Understanding the difference prevents the agent from acting when it should not.

### agentTurn

The agent is invoked as part of a deliberate workflow. A human or orchestrator explicitly asked the agent to do something. The agent has full permission to act within its ADL.

Use for: scheduled heartbeats, pipeline steps, explicit `/agent-factory:build` phases.

### systemEvent

An external event fired the agent (file watcher, webhook, monitoring alert). The agent was not explicitly asked to act — it was notified. The agent should evaluate using VFM before acting, not act automatically.

Use for: file change monitors, CI failure alerts, Slack mention triggers.

Key distinction: In an agentTurn, the context gives permission. In a systemEvent, the agent must earn permission through VFM evaluation.
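The distinction can be expressed as a single permission check. This sketch reads the text above as follows: both trigger types stay inside the ADL, an agentTurn acts on that basis alone, and a systemEvent must also clear the VFM decision rule from Section 3. The `may_act` name and its yes/no ADL argument are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical permission check that depends on the trigger type.
# Prints ACT, LOG (evaluated but not worth acting), or ESCALATE (outside ADL).
may_act() {   # may_act TRIGGER IN_ADL(yes|no) VALUE FRICTION
  local trigger="$1" in_adl="$2" value="$3" friction="$4"
  if [ "$in_adl" != yes ]; then
    echo ESCALATE
    return 0
  fi
  case "$trigger" in
    agentTurn)
      echo ACT ;;                      # the invocation itself grants permission
    systemEvent)                       # permission must be earned via VFM
      if (( value >= 7 && friction <= 3 )) ||
         (( value >= 5 && friction <= 2 )) ||
         (( value >= 8 && friction <= 5 )); then
        echo ACT
      else
        echo LOG
      fi ;;
    *)
      echo "unknown trigger type: $trigger" >&2
      return 1 ;;
  esac
}
```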


---

## 6. When NOT to Use Proactive Patterns

Proactive patterns add complexity. Do not use them unless the use case requires it.

Do not use proactive agents for:

- **One-shot tasks.** If the user runs the agent manually each time, a simple pipeline is sufficient. Proactive patterns add overhead with no benefit.
- **Tasks that require real-time data.** A proactive agent wakes on a schedule. If the task requires acting within seconds of an event, use a webhook receiver or event-driven architecture instead.
- **Tasks where every action has high friction.** If every action in the workflow scores high on friction (irreversible, external side effects, costly), the VFM will never clear the threshold. The agent will wake, evaluate, and go back to sleep every time. This is waste.
- **Untrusted environments.** Do not deploy a proactive agent in an environment where the blast radius of a mistake is large and you have not yet validated the agent's behavior manually.

Start proactive only when:

- The agent has been validated in manual (non-proactive) mode first
- The ADL is defined and tested
- The VFM thresholds are tuned for the specific use case
- Monitoring and logging are in place

Verify

```bash
grep -cE "memory-patterns|autonomy-patterns|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md
```

Expected: >= 3

On failure: revert

Checkpoint

git add skills/agent-system-design/
git commit -m "feat(skills): add memory and autonomy pattern references to agent-system-design"

Step 19: Add orchestration and governance pattern references to agent-system-design skill

Files to modify

skills/agent-system-design/SKILL.md — Apply these diffs (builds on Step 13 changes):

Diff 1 — Extend trigger phrases in the description field.

Find:

  "agent memory", "3-tier memory", "WAL protocol",
  "proactive agent", "self-improving agent", "heartbeat scheduling"
version: 0.1.0

Replace with:

  "agent memory", "3-tier memory", "WAL protocol",
  "proactive agent", "self-improving agent", "heartbeat scheduling",
  "goal hierarchy", "budget tracking", "approval gates",
  "governance", "org chart", "multi-agent coordination", "agent budget"
version: 0.1.0

Diff 2 — Extend the System components table to include orchestration components.

Find:

| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety |
| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state |

Replace with:

| Memory (3-tier) | `SESSION-STATE.md`, `DAILY-LOG.md`, `MEMORY.md` | Hot/warm/cold tier memory with WAL protocol for crash safety |
| Heartbeat | `heartbeat/HEARTBEAT.md` + cron | Scheduled wakeup with context injection to restore agent state |
| Goals | `GOALS.md` | File-based goal hierarchy (company → project → task) with agent ownership |
| Budget | `BUDGET.md` + `budget/` | Post-hoc cost enforcement with soft warn and hard pause |
| Governance | `GOVERNANCE.md` | Autonomy level policy, approval gates, escalation rules |
| Org Chart | `ORG-CHART.md` | Agent reporting hierarchy with delegation and human override |

Diff 3 — Add Orchestration patterns and Governance patterns sections before "Getting started".

Find:

## Getting started

Run `/agent-factory:build` for the guided 7-phase workflow.

Replace with:

## Orchestration patterns

Multi-agent systems need coordination beyond sequential pipelines. Four Paperclip
orchestration patterns supported by Agent Factory:

- **Heartbeat with context injection** — The scheduler injects a pre-built context
  packet into each wakeup. The agent wakes already knowing its state, goals, and
  budget. No cold start.
- **Goal hierarchy** — Goals are organized as company → project → task with a
  simple `parent_id` reference (not recursive). Each task goal has an owner (agent
  name) and a status. The `goal-tracker.sh` script reads GOALS.md and generates
  context for heartbeat injection.
- **Org chart and delegation** — The ORG-CHART.md file maps agents to roles and
  reporting lines using a `reportsTo` pattern. Orchestrator agents route work to
  specialists based on the org chart. Humans always have override authority.
- **Task checkout via file locking** — Multiple agents avoid working on the same
  task by atomically creating a `.lock` directory (`mkdir`) before starting. If the
  lock exists, the agent skips and logs. Locks are cleaned up on completion or timeout.

For full orchestration patterns including session persistence, heartbeat comparison
matrix (OpenClaw cron vs Paperclip heartbeat vs /schedule), see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md`

## Governance patterns

Autonomous agents need constraints. The governance system defines what level of
autonomy each agent operates at and what requires approval.

**Autonomy levels (0-4):**

| Level | Name | Description |
|-------|------|-------------|
| 0 | Full manual | Every action requires human approval |
| 1 | Read-only | Agent may read files and memory; no writes |
| 2 | Memory writes | Level 1 plus writes to memory/ and logs/ |
| 3 | Local operator | Level 2 plus tests, branches, designated channel messages |
| 4 | Trusted operator | Level 3 plus non-main commits, approved external APIs |

**Approval gates** — Named checkpoints where an agent must request human approval
before proceeding. Defined in GOVERNANCE.md. Example gates: `pre-commit`,
`pre-deploy`, `pre-external-api`, `budget-exceeded`.

**Budget enforcement** — Post-hoc: cost is checked after each run against the
policy in BUDGET.md. Soft threshold (default 80%) triggers a warning. Hard
threshold (100%) creates a PAUSED flag and blocks subsequent tool calls.

For the full governance reference including audit trail requirements,
error threshold auto-pause, and Paperclip's "autonomy is a privilege" philosophy, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md`

## Getting started

Run `/agent-factory:build` for the guided 7-phase workflow.

Diff 4 — Add orchestration and governance links to the reference block.

Find:

For proactive agent patterns, ADL constraints, and VFM scoring, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md`

Replace with:

For proactive agent patterns, ADL constraints, and VFM scoring, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/autonomy-patterns.md`

For multi-agent orchestration, goal hierarchy, org chart, and heartbeat comparison, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/orchestration-patterns.md`

For governance, autonomy levels, approval gates, and budget enforcement, see:
`${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/governance-patterns.md`

Files to create

skills/agent-system-design/references/orchestration-patterns.md:

# Orchestration Patterns for Multi-Agent Systems

Reference for the agent-system-design skill. Covers heartbeat with context
injection, goal hierarchy, org chart and delegation, task checkout, session
persistence, and a comparison of scheduling approaches.

---

## 1. Heartbeat with Context Injection

The Paperclip heartbeat pattern extends a simple cron trigger with a context
assembly step. Instead of waking an empty agent, the scheduler builds a context
packet containing everything the agent needs and injects it into the wake prompt.

### The context assembly flow

```
heartbeat-runner.sh (runs on schedule)
  ↓
Assemble context packet:
  read SESSION-STATE.md              → current task, WAL entry
  read DAILY-LOG.md (2 days)         → recent activity
  read GOALS.md                      → active goals (./goal-tracker.sh context)
  read BUDGET.md + cost-events.jsonl → budget status
  determine wake reason              → scheduled beat, or event trigger
  ↓
Populate context-packet.md template:
  {{AGENT_NAME}}, {{MEMORY_SUMMARY}}, {{ACTIVE_GOALS}}, {{BUDGET_STATUS}}, {{WAKE_REASON}}
  ↓
Populate wake-prompt.md with context packet
  ↓
Invoke: claude -p --resume {{SESSION_NAME}} < wake-prompt.md
```


### Why inject context rather than let the agent read files?

The agent reading its own memory files requires extra tool calls, adds latency,
and fails silently if a file is missing. The runner assembles the packet once,
handles missing files gracefully, and delivers a complete context in a single
system turn. The agent arrives oriented.
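The assembly step can be sketched in plain bash and awk. All of the following assumptions are hypothetical rather than taken from the Session 3/4 templates. The template uses the five `{{PLACEHOLDER}}` names listed in the flow. MEMORY_SUMMARY is taken from SESSION-STATE.md. GOALS.md and BUDGET.md are read directly instead of through `goal-tracker.sh` and the cost log. Missing files become a one-line note instead of an error, so the agent still receives a complete packet.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the runner's context assembly step.

# Print a file, or a one-line note if it is missing, so a beat never fails here.
read_or_note() {
  if [ -f "$1" ]; then cat "$1"; else printf '(not available: %s)\n' "$1"; fi
}

# Replace {{NAME}} placeholders literally with values taken from the environment.
# index/substr is used instead of gsub so "&" and "\" in values are not special.
fill_template() {   # fill_template TEMPLATE_FILE
  awk '
    function replace_all(s, find, repl,    out, i) {
      out = ""
      while ((i = index(s, find)) > 0) {
        out = out substr(s, 1, i - 1) repl
        s = substr(s, i + length(find))
      }
      return out s
    }
    {
      n = split("AGENT_NAME MEMORY_SUMMARY ACTIVE_GOALS BUDGET_STATUS WAKE_REASON", keys, " ")
      for (k = 1; k <= n; k++)
        $0 = replace_all($0, "{{" keys[k] "}}", ENVIRON[keys[k]])
      print
    }' "$1"
}

# Assemble the packet for one agent directory; the subshell keeps exports local.
build_context_packet() {   # build_context_packet AGENT_DIR TEMPLATE_FILE WAKE_REASON
  (
    export AGENT_NAME="$(basename "$1")"
    export MEMORY_SUMMARY="$(read_or_note "$1/SESSION-STATE.md")"
    export ACTIVE_GOALS="$(read_or_note "$1/GOALS.md")"
    export BUDGET_STATUS="$(read_or_note "$1/BUDGET.md")"
    export WAKE_REASON="$3"
    fill_template "$2"
  )
}
```

For example, `build_context_packet agents/researcher context-packet.md "scheduled beat" > context-packet-filled.md` writes the filled packet. In that command the agent directory and output file names are illustrative.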

### Template locations

- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/context-packet.md`
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/wake-prompt.md`
- `${CLAUDE_PLUGIN_ROOT}/scripts/templates/heartbeat/heartbeat-runner.sh`

---

## 2. Goal Hierarchy

File-based goal hierarchy inspired by Paperclip's goal system. Three levels:
company → project → task. Each goal references its parent by ID.

### Structure

```markdown
## Company Goals
- [G1] Increase content output quality
- [G2] Reduce operational overhead

## Project Goals
- [G1.1] Improve research agent accuracy (parent: G1)
- [G2.1] Automate daily log compaction (parent: G2)

## Task Goals
- [G1.1.1] Add source verification to researcher (parent: G1.1, owner: researcher-agent, status: active)
- [G2.1.1] Write compaction script (parent: G2.1, owner: memory-agent, status: pending)
```

### Design decisions

- **Simple `parent_id`, not recursive.** Each goal references its parent by ID. The `goal-tracker.sh` script uses this for orphan detection and context generation but does not traverse the tree recursively at runtime.
- **Dot notation for hierarchy.** G1 → G1.1 → G1.1.1 makes depth visible in the ID without a database or recursive query.
- **Status at task level only.** Company and project goals are structural. Status tracking happens at the task level where agents own the work.

### Integration with heartbeat

```bash
# In heartbeat-runner.sh:
ACTIVE_GOALS=$(./scripts/goal-tracker.sh context)
# Inject as {{ACTIVE_GOALS}} into context-packet.md
```

Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/goals/`
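The two jobs attributed to `goal-tracker.sh` can be approximated in a few lines. This is a hypothetical stand-in, not the Session 4 script. It assumes goals are written exactly as in the structure example (`- [ID] text (parent: ID, ...)`). A task counts as active when its line contains `status: active`. An orphan is reported when a `parent:` ID is not defined anywhere in the file.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for goal-tracker.sh.
#   goal_tracker context GOALS.md  -> active task goal lines, for {{ACTIVE_GOALS}}
#   goal_tracker orphans GOALS.md  -> goals whose parent ID is not defined
goal_tracker() {
  local cmd="$1" file="$2"
  case "$cmd" in
    context)
      grep 'status: active' "$file" || true ;;   # no active goals is not an error
    orphans)
      awk '
        # Lines look like: - [G1.1.1] text (parent: G1.1, owner: ..., status: ...)
        match($0, /^- \[[A-Za-z0-9._-]+]/) {
          id = substr($0, 4, RLENGTH - 4)         # strip "- [" and "]"
          ids[id] = 1
          if (match($0, /parent: [^,)]+/))
            parent[id] = substr($0, RSTART + 8, RLENGTH - 8)
        }
        END {
          for (id in parent)
            if (!(parent[id] in ids)) print id " -> missing parent " parent[id]
        }' "$file" ;;
    *)
      echo "usage: goal_tracker context|orphans GOALS.md" >&2
      return 1 ;;
  esac
}
```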


---

## 3. Org Chart and Delegation

The org chart defines the reporting structure for a multi-agent system. It answers: when agent A cannot handle a task, who does it delegate to?

### ORG-CHART.md format

| Agent | Role | Reports To | Status | Budget (cents/day) |
|-------|------|-----------|--------|-------------------|
| orchestrator | Coordinator | human | active | 200 |
| researcher | Research | orchestrator | active | 100 |
| writer | Content | orchestrator | active | 100 |
| reviewer | Quality | orchestrator | active | 50 |

### Delegation rules

  1. Orchestrator assigns tasks to specialists based on the org chart
  2. Specialist completes or escalates to orchestrator (never skips levels)
  3. Orchestrator escalates to human for decisions outside its ADL
  4. Human override authority: any human instruction takes priority over the org chart at any level
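Each row names exactly one manager, so rule 2 (escalate one level at a time) amounts to following the Reports To column until it reaches `human`. This sketch assumes the table layout above, with Agent in the first column and Reports To in the third. `reports_to` and `escalation_chain` are hypothetical helpers. The ten-hop cap only guards against a cycle in a malformed chart.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for walking ORG-CHART.md.

# Print the "Reports To" value for one agent: the third table column,
# which is awk field 4 because each row starts with "|".
reports_to() {   # reports_to ORG_CHART AGENT
  awk -F'|' -v agent="$2" '
    {
      a = $2; r = $4
      gsub(/^ +| +$/, "", a)   # trim the cell padding
      gsub(/^ +| +$/, "", r)
    }
    a == agent { print r; exit }
  ' "$1"
}

# Print the escalation path one level at a time until it reaches "human".
escalation_chain() {   # escalation_chain ORG_CHART AGENT
  local chart="$1" current="$2" next hops=0
  while [ "$current" != "human" ] && [ "$hops" -lt 10 ]; do
    next="$(reports_to "$chart" "$current")"
    if [ -z "$next" ]; then
      echo "no Reports To entry for $current" >&2
      return 1
    fi
    printf '%s\n' "$next"
    current="$next"
    hops=$((hops + 1))
  done
}
```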

### Cross-team routing

For organizations with multiple orchestrators:

| Agent | Role | Reports To | Status |
|-------|------|-----------|--------|
| content-orchestrator | Content team lead | human | active |
| research-orchestrator | Research team lead | human | active |
| shared-researcher | Shared resource | content-orchestrator | active |

Shared agents report to a primary orchestrator but can receive delegated tasks from other orchestrators via a task request file.

Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/org-chart/`

---

## 4. Task Checkout via File Locking

When multiple agents may work on the same task pool, use file locking to prevent duplicate work. Bash 3.2 compatible using mkdir as an atomic lock.

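```bash
LOCK_DIR="tasks/${TASK_ID}.lock"
mkdir -p tasks   # without the parent dir, the mkdir below would always fail

# Attempt checkout: mkdir either creates the lock dir or fails, atomically
if mkdir "$LOCK_DIR" 2>/dev/null; then
  # Lock acquired — we own this task; release it even if the work fails
  trap 'rm -rf "$LOCK_DIR"' EXIT
  echo "$AGENT_NAME" > "$LOCK_DIR/owner"
  date -u +%Y-%m-%dT%H:%M:%SZ > "$LOCK_DIR/acquired"

  # Do the work here (do_task stands for the agent's own task command)
  do_task "$TASK_ID"

  # Release lock and drop the trap
  rm -rf "$LOCK_DIR"
  trap - EXIT
else
  # Lock held by another agent
  OWNER=$(cat "$LOCK_DIR/owner" 2>/dev/null || echo "unknown")
  echo "Task $TASK_ID is locked by $OWNER — skipping"
  exit 0
fi
```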

### Stale lock cleanup

Add to the heartbeat runner to clean locks older than 2 hours:

```bash
find tasks -maxdepth 1 -name "*.lock" -type d 2>/dev/null | while IFS= read -r lock; do
  acquired=$(cat "$lock/acquired" 2>/dev/null || echo "")
  if [ -n "$acquired" ]; then
    # Lock age in seconds; the timestamp is passed as an argument, not pasted into code
    age=$(python3 -c '
import sys
from datetime import datetime, timezone
t = datetime.fromisoformat(sys.argv[1].replace("Z", "+00:00"))
print(int((datetime.now(timezone.utc) - t).total_seconds()))
' "$acquired" 2>/dev/null || echo 0)
    if [ "$age" -gt 7200 ]; then
      echo "Removing stale lock: $lock (age: ${age}s)"
      rm -rf "$lock"
    fi
  fi
done
```

---

## 5. Session Persistence

Agents can resume previous sessions using Claude Code's `--resume` flag. The heartbeat runner manages session names to ensure consistent resumption.

```bash
# heartbeat-runner.sh — session persistence pattern
SESSION_NAME="${AGENT_NAME}-$(date +%Y-%m)"  # one session per month
# One ID file per session name, so a new month starts a fresh session
SESSION_FILE="$WORKING_DIR/heartbeat/.session-${SESSION_NAME}"

# Resume only if an earlier run this month saved a non-empty session ID
if [ -s "$SESSION_FILE" ]; then
  SESSION_ID=$(cat "$SESSION_FILE")
  RESUME_FLAG="--resume $SESSION_ID"
else
  RESUME_FLAG=""
fi

# Run agent ($RESUME_FLAG is left unquoted on purpose so it expands to two words)
claude -p $RESUME_FLAG --name "$SESSION_NAME" < wake-prompt-filled.md

# Save session ID for next run (name passed as an argument, not pasted into code)
SESSION_ID=$(claude session list --json 2>/dev/null | python3 -c '
import json, sys
for s in json.load(sys.stdin):
    if s.get("name") == sys.argv[1]:
        print(s["id"])
        break
' "$SESSION_NAME" 2>/dev/null || true)
if [ -n "$SESSION_ID" ]; then
  echo "$SESSION_ID" > "$SESSION_FILE"
fi
```

**Note:** Session persistence is optional. The context injection pattern (Section 1) lets the agent work correctly without resuming a previous session, because the context packet re-establishes its state.


---

## 6. Scheduling Comparison Matrix

Agent Factory supports three scheduling approaches. Choose based on run frequency, context requirements, and deployment environment.

| Dimension | OpenClaw cron | Paperclip heartbeat | Claude /schedule |
|-----------|---------------|---------------------|------------------|
| Mechanism | cron / launchd | cron + context assembly script | Cloud task (Anthropic-hosted) |
| Context injection | Agent reads its own memory | Runner assembles and injects packet | Prompt in task definition |
| Minimum interval | 1 minute | 1 minute | 1 hour |
| Local file access | Yes | Yes | No |
| MCP servers | Yes | Yes | Via MCP connectors only |
| Session resume | Optional (`--resume`) | Optional (`--resume`) | Via API sessions |
| Best for | Simple schedules, file-heavy work | Multi-agent systems, stateful agents | Repo-only work, cloud-native |
| Cold start problem | Agent reads memory each run | Solved by context injection | N/A (stateless tasks) |

**Decision rule:**

- Personal pipeline with local files → OpenClaw cron pattern
- Multi-agent system with shared state → Paperclip heartbeat pattern
- GitHub/Linear/web-only tasks that run at most once an hour → Claude /schedule
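The decision rule can also be written down as a helper, for example for a setup script that proposes a default. A minimal sketch; `choose_scheduler`, its yes/no arguments, and the output labels are hypothetical, and the 60-minute cut-off comes from the `/schedule` minimum interval in the matrix:

```bash
# Hypothetical helper encoding the decision rule above.
# Args: needs_local_files (yes/no), multi_agent_shared_state (yes/no), interval_minutes
choose_scheduler() {
  local needs_local=$1 multi_agent=$2 interval=$3
  if [ "$multi_agent" = "yes" ]; then
    echo "paperclip-heartbeat"
  elif [ "$needs_local" = "yes" ] || [ "$interval" -lt 60 ]; then
    # /schedule cannot reach local files and cannot run more often than hourly
    echo "openclaw-cron"
  else
    echo "claude-schedule"
  fi
}
```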

````

**`skills/agent-system-design/references/governance-patterns.md`**:

````markdown
# Governance Patterns for Autonomous Agents

Reference for the agent-system-design skill. Covers autonomy levels, approval
gates, budget enforcement, audit trail, error threshold auto-pause, and
Paperclip's "autonomy is a privilege" philosophy.

---

## 1. The Core Philosophy: Autonomy is a Privilege

Autonomous agents earn the right to act without approval through demonstrated
reliability. The governance system formalizes this by requiring agents to start
at low autonomy (level 0 or 1) and progress upward based on observed behavior.

**The principle, stated plainly:**
> An agent that has never been seen to fail may act autonomously.
> An agent that has failed without recovery must earn autonomy back.

This is Paperclip's operating assumption. It is not about distrust — it is about
building a verifiable track record before expanding permissions. Governance is not
a ceiling; it is a ladder.

---

## 2. Autonomy Levels (0-4)

Five levels, each permitting broader autonomous action than the one below it.
Record the current level in GOVERNANCE.md.

| Level | Name | What the agent may do without approval |
|-------|------|----------------------------------------|
| 0 | Full manual | Nothing. Every action requires human approval via prompt. |
| 1 | Read-only | Read files, read memory, query status. No writes. |
| 2 | Memory writes | Level 1 plus write to memory/ and logs/ directories. |
| 3 | Local operator | Level 2 plus run tests, create non-main branches, write to designated channels. |
| 4 | Trusted operator | Level 3 plus commit to non-main branches, call approved external APIs. |

### Progressing through levels

Advance one level at a time. Criteria for advancing:

- **Level 0 → 1:** Agent correctly reads and summarizes state in 5 consecutive sessions
- **Level 1 → 2:** Agent correctly writes memory files without data loss in 5 consecutive sessions
- **Level 2 → 3:** Agent correctly manages its task queue and escalates appropriately in 10 consecutive sessions
- **Level 3 → 4:** Agent operates at level 3 for 30 days without an escalation failure

Regress one level immediately on: unrecoverable state corruption, an unauthorized
action outside the Autonomous Decision Limit (ADL) defined in GOVERNANCE.md, or
failure to escalate a hard failure.
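The progression and regression rules are mechanical enough to compute from a run
history. A minimal sketch, assuming (hypothetically) that the caller already tracks
consecutive clean sessions, days at the current level, and whether a regression
event occurred; `next_autonomy_level` is an illustrative name:

```bash
# Hypothetical helper applying the progression and regression rules above.
# Args: current level (0-4), consecutive clean sessions at this level,
#       days at this level, regression event (yes/no)
next_autonomy_level() {
  local level=$1 sessions=$2 days=$3 regress=$4
  if [ "$regress" = "yes" ]; then
    # Regress one level, never below 0
    [ "$level" -gt 0 ] && level=$((level - 1))
    echo "$level"
    return
  fi
  # Advance at most one level per evaluation
  case $level in
    0|1) [ "$sessions" -ge 5 ] && level=$((level + 1)) ;;
    2)   [ "$sessions" -ge 10 ] && level=3 ;;
    3)   [ "$days" -ge 30 ] && level=4 ;;
  esac
  echo "$level"
}
```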

---

## 3. Approval Gates

Named checkpoints where the agent must pause and request human approval before
proceeding. Defined in GOVERNANCE.md under `## Approval Gates`.

### Gate format

```markdown
## Approval Gates

- pre-commit: Required before any git commit to a non-draft branch
  Condition: agent has staged changes and plans to commit
  Channel: slack #agent-approvals

- pre-deploy: Required before any deployment action
  Condition: agent is about to run a deploy script or kubectl apply
  Channel: slack #agent-approvals

- budget-exceeded: Required when cumulative cost exceeds soft threshold
  Condition: budget/cost-events.jsonl shows > 80% of limit
  Channel: slack #agent-activity (warn), #agent-approvals (hard stop)

- unknown-tool: Required when agent wants to use a tool not in its allowlist
  Condition: tool name not present in settings.json permissions.allow
  Channel: slack #agent-approvals
```

### Gate enforcement via PreToolUse hook

The `approval-gate.sh` template implements gates as a PreToolUse hook:

  1. Check current autonomy level from GOVERNANCE.md
  2. Check if the requested tool call matches a gate condition
  3. If match and level < 4: write approval request to governance/pending-approvals.jsonl
  4. Poll governance/approval-responses.jsonl for a matching response (timeout: 5 minutes)
  5. If approved: exit 0 (allow)
  6. If rejected or timeout: exit 2 (block)
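Steps 4 to 6 are a polling loop. A minimal sketch of that loop, assuming a hypothetical compact JSONL format where each response line carries `id` and `decision` fields (the actual `approval-gate.sh` template may use different field names); the return codes mirror the hook exit codes, where 2 blocks the tool call:

```bash
# Hypothetical sketch of the approval-gate polling loop (steps 4 to 6).
# Assumed response format, one compact JSON object per line:
#   {"id":"req-123","decision":"approved"}
# Returns 0 if approved, 2 if rejected or the timeout expires.
wait_for_approval() {
  local request_id=$1
  local responses=$2
  local timeout_s=${3:-300}          # 5 minutes, as in step 4
  local waited=0 line
  while [ "$waited" -lt "$timeout_s" ]; do
    line=$(grep -F "\"id\":\"$request_id\"" "$responses" 2>/dev/null | tail -n 1)
    case $line in
      *'"decision":"approved"'*) return 0 ;;
      *'"decision":"rejected"'*) return 2 ;;
    esac
    sleep 1
    waited=$((waited + 1))
  done
  return 2                           # timeout counts as a block
}
```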

Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/approval-gate.sh`

---

## 4. Budget Enforcement

Budget enforcement is post-hoc: cost is checked after each tool call instead of being reserved before the run starts. This matches Paperclip's actual implementation, which is simpler because it needs no pre-run coordination.

### How it works

  1. budget-hook.sh runs as a PostToolUse hook
  2. Each tool call is logged to budget/cost-events.jsonl
  3. Cumulative cost is compared against BUDGET.md policy
  4. At soft threshold (default 80%): warning logged, Slack notification optional
  5. At hard threshold (100%): budget/PAUSED flag created; subsequent PreToolUse hook calls exit 2 (block) until the flag is manually removed
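Steps 2 to 5 reduce to a counter and two thresholds. A minimal sketch using the event-count proxy, assuming (hypothetically) that the limit is passed as a number of events instead of being read from BUDGET.md; `record_budget_event` and `check_budget_pause` are illustrative names:

```bash
# Hypothetical sketch of budget-hook.sh logic using the event-count proxy.
# Args: budget directory, tool name (assumed to contain no quotes), event limit
record_budget_event() {
  local budget_dir=$1 tool=$2 limit=$3
  local events="$budget_dir/cost-events.jsonl" count soft
  mkdir -p "$budget_dir"
  printf '{"ts":"%s","tool":"%s"}\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" >> "$events"
  count=$(wc -l < "$events" | tr -d ' ')
  soft=$((limit * 80 / 100))               # soft threshold: 80% of the limit
  if [ "$count" -ge "$limit" ]; then
    echo "hard stop: $count/$limit events" > "$budget_dir/PAUSED"
  elif [ "$count" -ge "$soft" ]; then
    echo "budget warning: $count/$limit events" >&2
  fi
}

# PreToolUse side: block every tool call (exit code 2) while the PAUSED flag exists
check_budget_pause() {
  [ -f "$1/PAUSED" ] && return 2
  return 0
}
```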

### Resuming after a hard stop

```bash
# Check why agent was paused
cat budget/PAUSED

# Review cost events
./scripts/budget-report.sh

# If budget is sufficient, remove the pause flag to resume
rm budget/PAUSED
```

### Important limitation: cost estimation accuracy

The `budget-hook.sh` template counts tool events as a rough cost proxy. Accurate cost tracking requires one of:

- **Admin API** (`/v1/organizations/cost_report`): requires an Admin API key (`sk-ant-admin...`), available to organization accounts only
- **Token counting**: parse token counts from Claude's responses and multiply by published per-token prices
- **`--max-budget-usd N` flag**: per-run budget cap when invoking `claude -p`

For personal agents, the event-count proxy is sufficient as a relative indicator. For production agents with real cost accountability, use one of the accurate methods.

Template locations: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/`

---

## 5. Audit Trail

Every action taken by an autonomous agent should be logged for post-hoc review. Minimum audit trail for a production agent:

```text
logs/
  audit.log          — tool calls with timestamp, tool name, input summary
  decisions.log      — VFM evaluations and outcomes
  escalations.log    — approval gate requests and responses
budget/
  cost-events.jsonl  — per-tool-call cost events
governance/
  pending-approvals.jsonl   — approval requests sent
  approval-responses.jsonl  — responses received
```

### Audit log format (per tool call)

```text
2025-11-14T09:30:00Z TOOL=Write FILE=memory/SESSION-STATE.md AGENT=orchestrator LEVEL=2 STATUS=allowed
2025-11-14T09:30:05Z TOOL=Bash CMD=git commit -m "..." AGENT=orchestrator LEVEL=2 GATE=pre-commit STATUS=pending-approval
2025-11-14T09:35:12Z GATE=pre-commit RESPONSE=approved APPROVER=human AGENT=orchestrator
```
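A PreToolUse hook receives the pending tool call as a JSON object on stdin, including a `tool_name` field, which is enough to produce the first line format above. A sketch that records only the `TOOL`, `AGENT`, `LEVEL`, and `STATUS` fields; `audit_hook` and its arguments are hypothetical, and it uses `python3` for JSON parsing as the other scripts here do:

```bash
# Hypothetical PreToolUse hook body: log the tool call, then allow it.
# Args: audit log path, agent name, current autonomy level
audit_hook() {
  local log_file=$1 agent=$2 level=$3 tool
  # The hook's stdin is the JSON object describing the pending tool call
  tool=$(python3 -c 'import json, sys; print(json.load(sys.stdin).get("tool_name", "unknown"))')
  mkdir -p "$(dirname "$log_file")"
  printf '%s TOOL=%s AGENT=%s LEVEL=%s STATUS=allowed\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" "$agent" "$level" >> "$log_file"
  return 0   # exit code 0 lets the tool call proceed
}
```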

### Log rotation

Audit logs grow without bound unless they are rotated. Options:

- **logrotate (Linux):** configure in `/etc/logrotate.d/`
- **launchd (macOS):** size-based rotation via a cleanup script called from the heartbeat
- **max-lines truncation:** keep the last 10,000 lines and archive the rest to `logs/archive/`
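The max-lines option can be sketched as follows, assuming nothing appends to the log while it runs (for example, at the start of a heartbeat, before the agent starts); `rotate_log` is a hypothetical name:

```bash
# Hypothetical helper for max-lines truncation: keep the newest N lines,
# move everything older into the archive directory with a timestamped name.
# Not safe against concurrent writers; run it while the agent is idle.
rotate_log() {
  local log_file=$1 archive_dir=$2 max_lines=${3:-10000}
  local total excess stamp tmp
  [ -f "$log_file" ] || return 0
  total=$(wc -l < "$log_file" | tr -d ' ')
  if [ "$total" -le "$max_lines" ]; then
    return 0
  fi
  excess=$((total - max_lines))
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  mkdir -p "$archive_dir"
  # Oldest lines go to the archive, newest stay in place
  head -n "$excess" "$log_file" >> "$archive_dir/$(basename "$log_file").$stamp"
  tmp="$log_file.tmp.$$"
  tail -n "$max_lines" "$log_file" > "$tmp" && mv "$tmp" "$log_file"
}
```

Example: `rotate_log logs/audit.log logs/archive 10000`.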

---

## 6. Error Threshold and Auto-Pause

Repeated failures indicate either a systemic problem or a changed environment. The error threshold triggers an auto-pause before the agent causes further damage.

### Configuration in GOVERNANCE.md

```markdown
## Error Policy

- error_threshold: 3          # consecutive failures before auto-pause
- error_window: 1h            # window for counting consecutive failures
- auto_pause: true            # create PAUSED flag on threshold breach
- notify_on_pause: slack      # channel to notify (optional)
```

### What counts as a failure

- Any non-zero exit from a pipeline step that is not a deliberate `HEARTBEAT_OK`
- Any escalation that receives a rejection response (not just a timeout)
- Any budget hard stop
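One way to implement the error policy is to keep one timestamp per consecutive failure in `governance/error-count` (the file the recovery steps remove) and count those inside the window. A minimal sketch with the defaults above; `record_run_result` is a hypothetical name and the one-timestamp-per-line format is an assumption:

```bash
# Hypothetical sketch of the error policy: threshold 3 within a 1h window.
# <gov_dir>/error-count holds one epoch timestamp per consecutive failure.
record_run_result() {
  local gov_dir=$1 result=$2 threshold=${3:-3} window_s=${4:-3600}
  local counter="$gov_dir/error-count" now recent
  mkdir -p "$gov_dir"
  if [ "$result" = "success" ]; then
    rm -f "$counter"               # a success breaks the consecutive run
    return 0
  fi
  now=$(date +%s)
  echo "$now" >> "$counter"
  # Count failures that fall inside the window
  recent=$(awk -v cutoff=$((now - window_s)) '$1 >= cutoff' "$counter" | wc -l | tr -d ' ')
  if [ "$recent" -ge "$threshold" ]; then
    echo "auto-paused: $recent failures in ${window_s}s" > "$gov_dir/PAUSED"
  fi
}
```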

### Recovering from auto-pause

```bash
# Check error log
tail -20 logs/audit.log

# Understand why the threshold was reached
# (-E uses POSIX extended regex, so the alternation works with any grep)
grep -E "FAILURE|ERROR" logs/audit.log | tail -20

# Fix the underlying issue first
# Then remove the pause flag
rm budget/PAUSED      # if budget-triggered
rm governance/PAUSED  # if error-threshold-triggered

# Reset error counter (optional; some implementations auto-reset after a fix)
rm -f governance/error-count
```

---

## 7. GOVERNANCE.md Template Structure

```markdown
# Governance Policy: {{PROJECT_NAME}}

## Autonomy Level

current: {{AUTONOMY_LEVEL}}   # 0-4
set_by: {{SET_BY}}
set_date: {{SET_DATE}}

## Autonomous Decision Limit

The agent MAY autonomously:
{{ADL_ALLOWED_ACTIONS}}

The agent MUST escalate before:
{{ADL_ESCALATION_TRIGGERS}}

## Approval Gates

{{APPROVAL_GATES}}

## Error Policy

- error_threshold: 3
- error_window: 1h
- auto_pause: true
- notify_on_pause: none

## Audit Requirements

- tool calls: log all
- budget events: log all
- escalations: log all
- retention: 90 days
```
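Step 1 of the approval gate needs the `current:` value from this file. A minimal sketch of that lookup, assuming the layout above; `read_autonomy_level` is a hypothetical name, and it falls back to level 0 (full manual) when the file is missing or the placeholder is unfilled:

```bash
# Hypothetical helper: read the "current:" value under "## Autonomy Level".
# Prints 0 (the safest level) if the file, section, or a valid value is missing.
read_autonomy_level() {
  local level
  level=$(awk '
    /^## / { in_section = ($0 == "## Autonomy Level"); next }
    in_section && /^current:/ { print $2; exit }
  ' "$1" 2>/dev/null)
  case $level in
    [0-4]) echo "$level" ;;
    *) echo 0 ;;
  esac
}
```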

Template location: `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/GOVERNANCE.md`

````

### Verify

```bash
grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md
```

Expected: >= 5

On failure: revert

### Checkpoint

```bash
git commit -m "feat(skills): add orchestration and governance pattern references"
```

## Step 25: Create MCP integration reference

### Files to create

**`skills/agent-system-design/references/mcp-integrations.md`**:

````markdown
# MCP Integration Reference

Reference for the agent-system-design skill. Covers MCP servers for communication,
data, browser automation, and custom integrations. Includes .mcp.json examples,
agent type recommendations, and security considerations.

---

## What is MCP

MCP (Model Context Protocol) is an open protocol that lets Claude Code connect to
external services via MCP servers. MCP servers expose tools that Claude can call
just like built-in tools (Read, Write, Bash). The connection is configured in
`.mcp.json` in the project root (project scope), or at user scope for servers
shared across projects (stored in `~/.claude.json` and added with
`claude mcp add --scope user`).

**Important:** MCP servers are available only in local deployments (cron, launchd,
Docker, VPS). Managed agents (Anthropic API `/v1/agents`) do not support MCP servers.

---

## 1. Communication Integrations

### Slack (`@modelcontextprotocol/server-slack`)

Lets agents send and read Slack messages. Most useful for: notifications from
heartbeat agents, approval gate requests, daily summaries.

```json
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
        "SLACK_TEAM_ID": "${SLACK_TEAM_ID}"
      }
    }
  }
}
```

**Agent types that benefit:** Orchestrators (send status updates), approval-gate hooks (request approvals), monitoring agents (alert on threshold breach).

**Security considerations:**

- Use a dedicated bot account, not a personal token
- Scope the bot to specific channels (don't grant workspace-wide access)
- Rotate the bot token every 90 days
- Keep the token value out of `.mcp.json`: store it in the macOS Keychain and export `SLACK_BOT_TOKEN` from `~/.zshenv`, so `.mcp.json` only holds the `${SLACK_BOT_TOKEN}` reference

### GitHub (`@modelcontextprotocol/server-github`)

Lets agents read and write GitHub issues, PRs, and files. Most useful for: PR review agents, issue triage, automated release notes.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

**Agent types that benefit:** Code review agents, release managers, issue triagers.

**Security considerations:**

- Use a fine-grained personal access token scoped to specific repos
- Grant only the permissions the agent actually needs: read-only if it only reads, plus issue and pull request write access if it comments
- Do not use a classic token with full `repo` scope

### Linear (`@linear/mcp-server-linear`, via plugin)

Lets agents create, update, and query Linear issues and projects.

```json
{
  "mcpServers": {
    "linear": {
      "command": "npx",
      "args": ["-y", "@linear/mcp-server"],
      "env": {
        "LINEAR_API_KEY": "${LINEAR_API_KEY}"
      }
    }
  }
}
```

**Agent types that benefit:** Project management agents, sprint planners, backlog maintainers.

**Security considerations:**

- Use a Linear API key scoped to the specific team or workspace
- An agent that can create issues can also spam your backlog: scope the ADL carefully

---

## 2. Data Integrations

### PostgreSQL (`@modelcontextprotocol/server-postgres`)

Lets agents query a PostgreSQL database. Most useful for: data analysis agents and reporting pipelines. The reference server runs every query inside a read-only transaction, so agents that must write (for example, to store state beyond flat files) need a different or custom server.

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}
```

**Agent types that benefit:** Data analysis agents, reporting agents. Audit log writers (more durable than JSONL) also fit, provided the server permits writes.

**Security considerations:**

- Use a dedicated database user with minimal privileges (SELECT only for read-only agents; add INSERT only for an agent that logs)
- Never use the `postgres` superuser
- Prefer a local socket connection over TCP if the database is on the same machine
- The connection string contains credentials: load it from an environment variable (here the illustrative `DATABASE_URL`), never hardcode it

### SQLite (`mcp-server-sqlite`)

File-based SQL database. No server required. Good for single-agent state that outgrows flat files but does not need PostgreSQL. The reference SQLite server is a Python package, typically launched with `uvx`.

```json
{
  "mcpServers": {
    "sqlite": {
      "command": "uvx",
      "args": ["mcp-server-sqlite", "--db-path", "/path/to/agent.db"]
    }
  }
}
```

**Agent types that benefit:** Single-agent systems where JSONL is too unstructured but PostgreSQL is too heavy.

**Security considerations:**

- The SQLite database is a plain file: restrict it with filesystem permissions (`chmod 600`)
- Back up the file before each heartbeat run if it contains critical state

### Filesystem (extended)

The built-in Read/Write/Bash tools cover most filesystem needs. The MCP filesystem server adds directory listing, search, and recursive operations that are otherwise verbose with raw Bash.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/path/to/allowed/directory"]
    }
  }
}
```

**Security considerations:**

- The path argument is the root of allowed access: set it to the project directory, not `/`
- Add a PreToolUse hook to log filesystem MCP tool calls (hooks see them as `mcp__filesystem__<tool>`) alongside native tool calls

---

## 3. Browser Automation

### Playwright (`@playwright/mcp`)

Lets agents control a browser: navigate, click, fill forms, take screenshots, extract content. Most useful for: web scraping, form automation, UI testing, competitive research.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}
```

**Agent types that benefit:** Research agents (deep web extraction), monitoring agents (check site availability or content), automation agents (form submission).

**Security considerations:**

- Browser automation can interact with authenticated sessions: run the agent with a clean browser profile, never your personal one
- Headless mode is appropriate for server deployments; headed mode for local debugging
- Rate-limit scraping tasks to avoid being blocked or putting load on target sites
- Never store session cookies in version control

---

## 4. Custom MCP Servers

When no existing MCP server covers your integration, build one with the official MCP TypeScript SDK (`@modelcontextprotocol/sdk`), using `zod` for input schemas.

### Minimal TypeScript MCP server

```typescript
// index.ts: compile to index.js; assumes an ES module project ("type": "module")
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "my-custom-server",
  version: "0.1.0",
});

// Placeholder for your integration logic
async function callExternalService(param: string): Promise<string> {
  return `result for ${param}`;
}

server.tool(
  "my_tool",
  "Does something useful",
  { param: z.string().describe("The input parameter") },
  async ({ param }) => {
    const result = await callExternalService(param);
    return {
      content: [{ type: "text", text: result }],
    };
  }
);

// Claude Code starts the server as a subprocess and talks to it over stdin/stdout
await server.connect(new StdioServerTransport());
```

Register the compiled server in `.mcp.json`:

```json
{
  "mcpServers": {
    "my-custom-server": {
      "command": "node",
      "args": ["/path/to/my-custom-server/index.js"],
      "env": {
        "MY_API_KEY": "${MY_API_KEY}"
      }
    }
  }
}
```

**When to build custom:**

- The target API has no existing MCP server
- Existing MCP servers expose more than the agent needs (reduce attack surface)
- You need to enforce business logic in the server (e.g., rate limits, audit logging)
- The integration requires local state that a remote MCP server cannot hold

---

## 5. Integration with Agent Factory

The Agent Factory builder configures `.mcp.json` in Phase 3 of the `/agent-factory:build` workflow and checks it again in Phases 5 and 6:

1. **Phase 3 (MCP configuration):** The builder asks which external services the agent system needs. For each selected service, it appends the appropriate server entry to `.mcp.json` and adds the required environment variable to `hooks/env-check.sh`.

2. **Phase 5 (Deployment):** The deployment-advisor agent checks MCP availability for the chosen deployment target and warns if managed agents are selected with MCP dependencies.

3. **Phase 6 (Security review):** The security review step checks that all MCP server tokens are loaded from environment variables (not hardcoded), that each server is scoped to the minimum required permissions, and that `.mcp.json` is not committed with secrets.
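The contents of `hooks/env-check.sh` are not specified here. As an illustration only, a check that fails fast when a variable referenced from `.mcp.json` is unset might look like this; `REQUIRED_VARS` and `check_env` are hypothetical names:

```bash
# Hypothetical sketch of hooks/env-check.sh: fail fast when a variable that
# .mcp.json references is missing from the environment.
# The variable list is illustrative; Phase 3 would append one name per server.
REQUIRED_VARS="SLACK_BOT_TOKEN SLACK_TEAM_ID GITHUB_TOKEN"

check_env() {
  local missing="" var
  for var in "$@"; do
    # ${!var} is Bash indirect expansion (available in Bash 3.2)
    [ -n "${!var:-}" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

# Typical use at the top of the hook:  check_env $REQUIRED_VARS || exit 1
```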

### Checking MCP availability

```bash
# List configured MCP servers
claude mcp list

# Show the configuration of a specific server
claude mcp get slack
```

### Keeping `.mcp.json` out of version control

If `.mcp.json` contains sensitive paths or server configs that vary by environment, add it to `.gitignore` and commit a `.mcp.json.template` with placeholders instead:

```gitignore
.mcp.json
```

`.mcp.json.template`:

```json
{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
        "SLACK_TEAM_ID": "${SLACK_TEAM_ID}"
      }
    }
  }
}
```

The template uses `${VAR}` references instead of literal values; Claude Code expands them from the shell environment when it reads `.mcp.json`, so secrets stay in the environment.
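The Phase 6 check that tokens come from the environment can be approximated by scanning `env` values for literals. A minimal sketch, assuming the `mcpServers.<name>.env` layout used throughout this reference; `find_hardcoded_env` is a hypothetical name, and it does not inspect `args`, where a connection string could also carry credentials:

```bash
# Hypothetical check: list env values in an .mcp.json file that are literal
# strings instead of ${VAR} references. Prints server.VAR for each offender
# and returns 1 if any were found.
find_hardcoded_env() {
  python3 - "$1" <<'EOF'
import json, re, sys

with open(sys.argv[1]) as f:
    config = json.load(f)

ref = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*(:-[^}]*)?\}$")
bad = [
    f"{name}.{key}"
    for name, server in config.get("mcpServers", {}).items()
    for key, value in server.get("env", {}).items()
    if not ref.match(str(value))
]
for item in bad:
    print(item)
sys.exit(1 if bad else 0)
EOF
}
```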


````

### Verify

```bash
test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md && echo "EXISTS"
```

Expected: EXISTS

On failure: skip (supplementary reference, does not block other steps)

### Checkpoint

```bash
git commit -m "docs(skills): add MCP integration reference"
```

## Exit Condition

- `grep -c "memory-patterns\|autonomy-patterns\|heartbeat" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → >= 3
- `grep -c "orchestration\|governance\|heartbeat\|budget\|org.chart" /Users/ktg/repos/agent-builder/skills/agent-system-design/SKILL.md` → >= 5
- `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/memory-patterns.md` → exit 0
- `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/autonomy-patterns.md` → exit 0
- `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/orchestration-patterns.md` → exit 0
- `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/governance-patterns.md` → exit 0
- `test -f /Users/ktg/repos/agent-builder/skills/agent-system-design/references/mcp-integrations.md` → exit 0
- SKILL.md description field contains trigger phrases from both Step 13 and Step 19 (composable)
- All reference files contain section headers matching their names (grep for `## 1.`)
- Step 13 and Step 19 diffs do not overlap: each diff targets unique text anchors

## Quality Criteria

- SKILL.md description field triggers match the pattern of the existing trigger phrases (quoted, comma-separated)
- System components table rows added in Step 13 and Step 19 do not duplicate each other
- Memory patterns reference covers all three tiers with precise write/read order
- WAL protocol entry includes a concrete example, not just a description
- Autonomy patterns reference includes VFM worked examples with numeric scores and outcomes
- Self-healing protocol specifies the 5-attempt cap explicitly (never open-ended retries)
- Orchestration patterns comparison matrix covers all three scheduling approaches (OpenClaw cron, Paperclip heartbeat, /schedule)
- Governance patterns reference states the "autonomy is a privilege" philosophy in concrete terms with progression criteria
- MCP reference includes `.mcp.json` examples for each server, copy-paste ready
- MCP reference notes the managed-agents limitation (no MCP support) explicitly
- All `${CLAUDE_PLUGIN_ROOT}` paths point to files that will exist after Sessions 3 and 4
- No security considerations section uses vague language: each point is a specific, actionable instruction