Kjell Tore Guttormsen 1a776bdeb2 docs(plans): create session blueprints for Agent Factory execution

8 session blueprints covering all 27 steps across 3 waves:
- Session 1: Foundation (rename + commands, Steps 1-5)
- Session 2: Skills and templates (Steps 6-7)
- Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12)
- Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18)
- Session 5: Self-learning (feedback/optimization, Steps 20-21)
- Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24)
- Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25)
- Session 8: Finalization (build command integration + v1.0, Steps 8,26,27)

Also updates plan assumptions table with verified findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-11 11:21:17 +02:00

36 KiB

Raw Blame History

Session 3: OpenClaw Memory and Autonomy Patterns

Steps 9, 10, 11, 12 | Wave 1 | Depends on: none

Dependencies

Entry condition: none (independent — creates new template directories only)

Scope Fence

Touch:

scripts/templates/memory/SESSION-STATE.md (new)
scripts/templates/memory/DAILY-LOG.md (new)
scripts/templates/memory/MEMORY.md (new)
scripts/templates/memory/README.md (new)
scripts/templates/heartbeat/HEARTBEAT.md (new)
scripts/templates/heartbeat/heartbeat-runner.sh (new)
scripts/templates/heartbeat/README.md (new)
scripts/templates/proactive/PROACTIVE-AGENT.md (new)
scripts/templates/proactive/ADL-RULES.md (new)
scripts/templates/proactive/VFM-SCORING.md (new)
scripts/templates/proactive/README.md (new)
scripts/templates/cron/agent-turn.sh (new)
scripts/templates/cron/system-event.sh (new)
scripts/templates/cron/README.md (new)

Never touch:

commands/
agents/
skills/
scripts/templates/heartbeat/context-packet.md (Session 4)
scripts/templates/heartbeat/wake-prompt.md (Session 4)
.claude-plugin/
CLAUDE.md, README.md

Step 9: Create 3-tier memory templates

Files to create

scripts/templates/memory/SESSION-STATE.md:

# Session State: {{AGENT_NAME}}

> Hot working memory. Updated every turn. Read first on resume.

## WAL Protocol

**Write important details HERE before responding to the user.**
This prevents data loss if the session crashes or context compacts mid-response.

## Current Task

- Task: [what you're working on right now]
- Started: [timestamp]
- Status: [in progress / blocked / waiting]
- Key decision: [the most important choice made this session]

## Context Window Usage

- Estimated usage: [low / medium / high / DANGER ZONE]
- If above 60%: activate Working Buffer below

## Active Decisions

| Decision | Choice | Reason | Reversible? |
|----------|--------|--------|-------------|
| | | | |

## Pending Actions

- [ ] [action 1]
- [ ] [action 2]

## Working Buffer

> Activate when context usage exceeds 60%. Capture key exchanges here
> before they are lost to compaction. This is your safety net.

### Recent exchanges to preserve

[Paste critical user messages and your key responses here when in the danger zone]

### Key facts from this session

[Extract and list facts that would be lost on compaction]

## Compaction Recovery

If you're reading this after a context compaction:
1. Read this SESSION-STATE.md first (you're here)
2. Read today's daily log: `memory/$(date +%Y-%m-%d).md`
3. Read memory/MEMORY.md for long-term context
4. Search daily logs for relevant prior context
5. Resume from the Current Task and Pending Actions above

scripts/templates/memory/DAILY-LOG.md:

# Daily Log: {{AGENT_NAME}}

> Warm daily capture. One file per day. Filename: memory/YYYY-MM-DD.md
> Auto-rotated: the pipeline creates a new file each day.

## {{DATE}}

### Summary of Work

[1-3 sentences describing what was accomplished today]

### Decisions Made

| Decision | Context | Outcome |
|----------|---------|---------|
| | | |

### Files Modified

- [file path] — [what changed and why]

### Issues Encountered

- [issue description] — [resolution or status]

### Quality Scores

| Pipeline run | Reviewer score | Notes |
|-------------|---------------|-------|
| | | |

### Carry Forward

> Items for the next session. These get checked on the next pipeline run.

- [ ] [item that needs attention tomorrow]
- [ ] [follow-up from today's work]

### Cost

- Estimated tokens used: [if tracked]
- Pipeline runs: [count]

scripts/templates/memory/MEMORY.md:

# Long-Term Memory: {{AGENT_NAME}}

> Cold curated memory. Updated manually or after significant learnings.
> This is the last file read during compaction recovery.

## Agent Identity

- Name: {{AGENT_NAME}}
- Role: [what this agent does]
- Domain: {{DOMAIN}}
- Created: {{DATE}}

## Key Learnings

> Manually curated from daily logs. Only include insights that affect
> future behavior. Delete entries that are no longer relevant.

- [learning 1 — date discovered]
- [learning 2 — date discovered]

## Recurring Patterns

> Patterns observed across multiple runs. Used to improve agent behavior.

| Pattern | Frequency | Impact | Action taken |
|---------|-----------|--------|-------------|
| | | | |

## Known Issues

> Active issues that affect agent performance.

- [issue] — [workaround] — [status: open/resolved]

## Project Context

> Static context that doesn't change often.

- Project: {{PROJECT_DIR}}
- Pipeline: {{PIPELINE_NAME}}
- Schedule: {{SCHEDULE}}
- Deployment: [target]

## Last Updated

[date — update this when you curate this file]

scripts/templates/memory/README.md:

# 3-Tier Memory System

Inspired by OpenClaw's proactive agent memory pattern. Three tiers serve
different purposes with different update frequencies.

## Architecture

Tier 1: SESSION-STATE.md (hot)

Updated every turn
Read FIRST on session start and compaction recovery
Contains: current task, decisions, pending actions, working buffer

Tier 2: memory/YYYY-MM-DD.md (warm)

One file per day, auto-rotated
Updated at end of each pipeline run
Contains: daily summary, decisions, files modified, issues, carry-forward

Tier 3: memory/MEMORY.md (cold)

Updated manually or after significant learnings
Read LAST during compaction recovery
Contains: identity, curated learnings, patterns, known issues


## WAL Protocol (Write-Ahead Logging)

Before responding to the user with important information, write it to
SESSION-STATE.md first. This prevents data loss if:
- The session crashes mid-response
- Context compaction removes the exchange
- The user's connection drops

## Working Buffer Protocol

When context usage exceeds ~60% (the "danger zone"):
1. Activate the Working Buffer section in SESSION-STATE.md
2. Copy critical recent exchanges into the buffer
3. Extract and list key facts that would be lost on compaction
4. Continue working normally — the buffer is your safety net

## Compaction Recovery

When Claude resumes after context compaction, it reads in this order:
1. SESSION-STATE.md (current task, decisions, working buffer)
2. Today's daily log (what happened today)
3. MEMORY.md (long-term context, known issues)
4. Search older daily logs if needed for specific context

## Integration with Agent Factory

During `/agent-factory:build` Phase 2.5 (Memory Setup):
1. Copy these templates to the user's `memory/` directory
2. Replace `{{PLACEHOLDER}}` variables with project-specific values
3. Create the initial SESSION-STATE.md and MEMORY.md
4. Configure the pipeline skill to update daily logs after each run

## File locations after scaffolding

project/ memory/ SESSION-STATE.md (from Tier 1 template) MEMORY.md (from Tier 3 template) 2026-04-11.md (generated daily, from Tier 2 template) 2026-04-12.md ...

Verify

ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l

Expected: 4

On failure

revert

Checkpoint

git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)"

Step 10: Create heartbeat and cron templates

Files to create

scripts/templates/heartbeat/HEARTBEAT.md:

# Heartbeat: {{AGENT_NAME}}

Read this file on each heartbeat. Follow it strictly. Do not infer or
repeat old tasks from prior chats. If nothing needs attention, reply
HEARTBEAT_OK.

## Tasks

tasks:
  - name: {{TASK_1_NAME}}
    interval: {{TASK_1_INTERVAL}}
    prompt: "{{TASK_1_PROMPT}}"
  - name: {{TASK_2_NAME}}
    interval: {{TASK_2_INTERVAL}}
    prompt: "{{TASK_2_PROMPT}}"

## Context

{{CONTEXT_NOTES}}

## Rules

- Only perform tasks listed above
- Respect the interval — do not run a task before its next due time
- If a task fails, log the error and continue to the next task
- Respond with HEARTBEAT_OK if no tasks are due

scripts/templates/heartbeat/heartbeat-runner.sh:

#!/bin/bash
# Heartbeat runner for Claude Code agents.
# Reads HEARTBEAT.md, checks which tasks are due, invokes claude -p for each.
#
# Bash 3.2 compatible: no associative arrays, no mapfile, no |&
# Uses python3 for all JSON/YAML/date operations.
#
# Usage: ./heartbeat-runner.sh [--catchup]
#   --catchup: run missed tasks on first invocation (max 5, 5s stagger)
#
# Placeholders:
#   {{AGENT_NAME}}     - name of the agent
#   {{WORKING_DIR}}    - absolute path to project directory
#   {{MAX_TURNS}}      - max turns per heartbeat (default: 10)
#   {{ACK_MAX_CHARS}}  - suppress responses shorter than this (default: 300)

AGENT_NAME="{{AGENT_NAME}}"
WORKING_DIR="{{WORKING_DIR}}"
MAX_TURNS="${MAX_TURNS:-10}"
ACK_MAX_CHARS="${ACK_MAX_CHARS:-300}"
HEARTBEAT_FILE="$WORKING_DIR/HEARTBEAT.md"
STATE_FILE="$WORKING_DIR/.heartbeat-state.json"
LOG_DIR="$WORKING_DIR/logs"
CATCHUP_MODE=false

if [ "$1" = "--catchup" ]; then
  CATCHUP_MODE=true
fi

# Ensure directories exist
mkdir -p "$LOG_DIR"

# --- Emptiness detection (OpenClaw pattern) ---
# Skip API calls if heartbeat file has only headers/empty items
is_heartbeat_empty() {
  python3 << 'PYEOF'
import sys, re

try:
    with open("$HEARTBEAT_FILE", "r") as f:
        content = f.read()
except FileNotFoundError:
    print("true")
    sys.exit(0)

# Strip markdown headers, blank lines, and YAML structure markers
stripped = re.sub(r'^#+.*$', '', content, flags=re.MULTILINE)
stripped = re.sub(r'^tasks:\s*$', '', stripped, flags=re.MULTILINE)
stripped = re.sub(r'^\s*-\s*name:\s*\{\{.*\}\}\s*$', '', stripped, flags=re.MULTILINE)
stripped = re.sub(r'^\s*$', '', stripped, flags=re.MULTILINE)
stripped = stripped.strip()

if len(stripped) < 20:
    print("true")
else:
    print("false")
PYEOF
}

HEARTBEAT_FILE_ACTUAL="$HEARTBEAT_FILE"
EMPTY_CHECK=$(HEARTBEAT_FILE="$HEARTBEAT_FILE_ACTUAL" python3 -c "
import sys, re, os
hf = os.environ.get('HEARTBEAT_FILE', '')
try:
    content = open(hf).read()
except:
    print('true'); sys.exit(0)
stripped = re.sub(r'^#+.*$', '', content, flags=re.MULTILINE)
stripped = re.sub(r'^\s*$', '', stripped, flags=re.MULTILINE).strip()
print('true' if len(stripped) < 20 else 'false')
" 2>/dev/null)

if [ "$EMPTY_CHECK" = "true" ]; then
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | SKIP (empty heartbeat file)" >> "$LOG_DIR/heartbeat.log"
  exit 0
fi

# --- Parse tasks and check due times ---
DUE_TASKS=$(python3 << PYEOF
import json, re, os, time
from datetime import datetime, timedelta

heartbeat_file = "$HEARTBEAT_FILE_ACTUAL"
state_file = "$STATE_FILE"
catchup = "$CATCHUP_MODE" == "true"

# Parse tasks from HEARTBEAT.md
try:
    content = open(heartbeat_file).read()
except FileNotFoundError:
    print("[]")
    exit(0)

# Simple YAML-like task parsing
tasks = []
current_task = {}
for line in content.split('\n'):
    line = line.strip()
    m_name = re.match(r'-\s*name:\s*(.+)', line)
    m_interval = re.match(r'interval:\s*(.+)', line)
    m_prompt = re.match(r'prompt:\s*"(.+)"', line)
    if m_name:
        if current_task.get('name'):
            tasks.append(current_task)
        current_task = {'name': m_name.group(1).strip()}
    elif m_interval and current_task:
        current_task['interval'] = m_interval.group(1).strip()
    elif m_prompt and current_task:
        current_task['prompt'] = m_prompt.group(1).strip()
if current_task.get('name'):
    tasks.append(current_task)

# Load state
try:
    state = json.load(open(state_file))
except:
    state = {}

# Parse interval to seconds
def parse_interval(s):
    s = s.strip()
    m = re.match(r'(\d+)\s*(m|min|h|hr|d)', s)
    if not m:
        return 3600  # default 1 hour
    val, unit = int(m.group(1)), m.group(2)
    if unit in ('m', 'min'):
        return val * 60
    elif unit in ('h', 'hr'):
        return val * 3600
    elif unit == 'd':
        return val * 86400
    return 3600

# Check which tasks are due
now = time.time()
due = []
for task in tasks:
    name = task.get('name', '')
    interval_sec = parse_interval(task.get('interval', '1h'))
    last_run = state.get(name, {}).get('last_run', 0)
    if now - last_run >= interval_sec:
        due.append(task)
    elif catchup and last_run == 0:
        due.append(task)

# Limit catchup to 5 tasks
if catchup:
    due = due[:5]

print(json.dumps(due))
PYEOF
)

# --- Run due tasks ---
TASK_COUNT=$(echo "$DUE_TASKS" | python3 -c "import sys,json; print(len(json.load(sys.stdin)))" 2>/dev/null)

if [ "$TASK_COUNT" = "0" ]; then
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | HEARTBEAT_OK (no tasks due)" >> "$LOG_DIR/heartbeat.log"
  exit 0
fi

echo "$DUE_TASKS" | python3 -c "
import sys, json, subprocess, time, os

tasks = json.load(sys.stdin)
state_file = '$STATE_FILE'
log_dir = '$LOG_DIR'
working_dir = '$WORKING_DIR'
max_turns = '$MAX_TURNS'
ack_max = int('$ACK_MAX_CHARS')
catchup = '$CATCHUP_MODE' == 'true'

# Load state
try:
    state = json.load(open(state_file))
except:
    state = {}

for i, task in enumerate(tasks):
    name = task.get('name', 'unknown')
    prompt = task.get('prompt', '')

    if not prompt:
        continue

    # Stagger catchup tasks
    if catchup and i > 0:
        time.sleep(5)

    ts = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())
    print(f'{ts} | heartbeat | RUNNING: {name}')

    try:
        result = subprocess.run(
            ['claude', '-p', prompt, '--output-format', 'text', '--max-turns', str(max_turns)],
            capture_output=True, text=True, timeout=600,
            cwd=working_dir
        )
        output = result.stdout.strip()

        # Suppress short ack responses (OpenClaw ackMaxChars pattern)
        if len(output) <= ack_max and 'HEARTBEAT_OK' in output:
            log_line = f'{ts} | heartbeat | {name} | HEARTBEAT_OK (suppressed)'
        else:
            log_line = f'{ts} | heartbeat | {name} | completed ({len(output)} chars)'
            # Save full output
            with open(os.path.join(log_dir, f'heartbeat-{name}-{time.strftime(\"%Y-%m-%d\")}.log'), 'a') as f:
                f.write(f'--- {ts} ---\n{output}\n\n')

    except subprocess.TimeoutExpired:
        log_line = f'{ts} | heartbeat | {name} | TIMEOUT'
    except Exception as e:
        log_line = f'{ts} | heartbeat | {name} | ERROR: {str(e)}'

    with open(os.path.join(log_dir, 'heartbeat.log'), 'a') as f:
        f.write(log_line + '\n')

    # Update state
    state[name] = {'last_run': time.time()}

# Save state
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"

echo "Heartbeat complete: $TASK_COUNT tasks processed"

scripts/templates/heartbeat/README.md:

# Heartbeat Scheduling

Combines OpenClaw's HEARTBEAT.md task format with Paperclip's interval-based
heartbeat model. Designed for Claude Code agents running on a schedule.

## How it works

1. A scheduler (cron/launchd/systemd) runs `heartbeat-runner.sh` at a fixed
   interval (e.g., every 30 minutes)
2. The runner reads `HEARTBEAT.md` for task definitions
3. **Emptiness detection**: if the file has no real tasks, skip the API call
   entirely (saves cost — from OpenClaw)
4. For each task: check if it's due based on its interval and last-run time
5. Run due tasks via `claude -p` with the task's prompt
6. Suppress short acknowledgment responses (<300 chars containing HEARTBEAT_OK)
7. Update `.heartbeat-state.json` with last-run timestamps

## Two execution types (from OpenClaw)

### systemEvent
Injects a text event into an existing session. Lightweight, no new session.
Use for: notifications, status checks, simple updates.
Template: `scripts/templates/cron/system-event.sh`

### agentTurn
Fires a full agent turn with its own session. Full context, full tool access.
Use for: background autonomous work, complex tasks, multi-step operations.
Template: `scripts/templates/cron/agent-turn.sh`

## Startup catchup (OpenClaw pattern)

When the runner starts after downtime (e.g., machine was off):
- Run `heartbeat-runner.sh --catchup`
- Processes up to 5 missed tasks
- 5-second stagger between tasks (prevents thundering herd)

## Cost optimization

- **Emptiness detection**: No API call if HEARTBEAT.md has no real content
- **ackMaxChars suppression**: Responses under 300 chars with HEARTBEAT_OK
  are logged but not displayed (saves downstream processing)
- **Interval-based**: Only run tasks when actually due, not every heartbeat

## Example cron entries

```bash
# Run heartbeat every 30 minutes
*/30 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1

# Run heartbeat every hour with catchup on restart
@reboot /path/to/heartbeat-runner.sh --catchup >> /tmp/heartbeat.log 2>&1
0 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1

State file format

.heartbeat-state.json:

{
  "email-check": { "last_run": 1712847600 },
  "report-generation": { "last_run": 1712844000 }
}


### Verify

```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh

Expected: no syntax errors (exit 0)

On failure

retry — fix bash 3.2 syntax issues, then revert if still failing

Checkpoint

git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup"

Step 11: Create proactive agent templates with ADL/VFM

Files to create

scripts/templates/proactive/PROACTIVE-AGENT.md:

---
name: {{AGENT_NAME}}
description: |
  A proactive agent that can identify improvements and self-modify within
  strict guardrails. Uses ADL (Anti-Drift Limits) and VFM (Value-First
  Modification) scoring to prevent uncontrolled drift.

  <example>
  Context: Agent identifies a recurring inefficiency
  user: "Check for improvements"
  assistant: "I'll review recent performance data and propose changes via VFM scoring."
  <commentary>Proactive improvement cycle triggered by performance review.</commentary>
  </example>
model: sonnet
tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash"]
---

## How you work

You are a proactive agent. You don't just respond to tasks — you observe
your environment, identify improvements, and implement changes that pass
VFM scoring.

### Proactive cycle

1. **Observe**: Read performance data (feedback/FEEDBACK.md, audit.log, cost-events.jsonl)
2. **Identify**: Find patterns: recurring errors, slow steps, unnecessary work
3. **Score**: Run VFM scoring on each proposed change (see VFM protocol below)
4. **Implement**: Only changes with VFM score > 50. All others logged but not applied.
5. **Log**: Record every decision (implement or defer) with scores and reasoning

### VFM protocol

Before making ANY change to your own config, skills, prompts, or behavior:

1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/VFM-SCORING.md`
2. Score the proposed change across 4 dimensions (0-25 each)
3. If total score > 50: implement and log
4. If total score <= 50: log with reason for deferral, do NOT implement

### Self-healing protocol

When encountering errors:
1. Log the error with full context
2. Try approach 1 (most likely fix based on error message)
3. If fail: try approach 2 (alternative strategy)
4. If fail: try approach 3 (simplified version)
5. Continue up to 5 attempts with increasingly conservative approaches
6. After 5 failures: escalate to human with full attempt log

## Rules (ADL — Anti-Drift Limits)

Read the full ADL at `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/ADL-RULES.md`.

Core constraints:
- **No fake intelligence**: Do not simulate capabilities you lack
- **No unverifiable modifications**: Every change must be testable
- **No novelty over stability**: Prefer proven approaches over clever ones
- **No scope expansion without approval**: Stay within your defined boundaries
- **No silent failures**: All errors must be logged

Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty

## Output format

After each proactive cycle, produce:

PROACTIVE CYCLE REPORT

Date: [timestamp] Observations: [N] patterns found Proposals: [N] changes evaluated

Proposed change	VFM score	Decision	Reason
[change 1]	[score]	implement/defer	[why]
...

Implemented: [N] Deferred: [N] Errors handled: [N] (max attempt: [N])

scripts/templates/proactive/ADL-RULES.md:

# Anti-Drift Limits (ADL)

Guardrails that prevent proactive agents from drifting beyond useful behavior.
Inspired by OpenClaw's proactive agent skill.

## Constraints

### 1. No fake intelligence
Do not simulate capabilities you do not have. If you cannot access a tool,
do not pretend the operation succeeded. If you cannot verify a fact, say so.

### 2. No unverifiable modifications
Every change you make must be testable. Before implementing:
- Define how to verify the change worked
- Run the verification after implementation
- Revert if verification fails

### 3. No novelty over stability
When choosing between a clever new approach and a proven existing one,
choose the proven approach unless VFM scoring strongly favors the new one
(score > 75).

### 4. No scope expansion without approval
Your boundaries are defined by your agent file and CLAUDE.md. You may
optimize within those boundaries. You may NOT:
- Add new tools to your own configuration
- Modify other agents' files
- Change system-level settings
- Create new agents or skills

### 5. No silent failures
Every error, every failed attempt, every unexpected result must be logged.
Write to the daily log (memory/YYYY-MM-DD.md) or a dedicated error log.

## Priority Ordering

When constraints conflict, apply this priority:

Stability > Explainability > Reusability > Scalability > Novelty


A stable system that is hard to understand is better than a novel system
that breaks. An explainable system that doesn't scale is better than a
scalable system that nobody can debug.

## When to override ADL

ADL can be overridden ONLY by explicit human instruction. If the user says
"try the new approach even though it's risky," that overrides constraint #3.
Log the override with the user's exact instruction.

Never self-override. The whole point of ADL is to prevent the agent from
convincing itself that an exception is warranted.

scripts/templates/proactive/VFM-SCORING.md:

# Value-First Modification (VFM) Scoring

Scoring rubric for evaluating proposed self-modifications. Any change to
agent config, prompts, behavior, or pipeline structure must score > 50
to be implemented.

## Dimensions

### Frequency (0-25 points)
How often does the issue this change addresses occur?

| Score | Criteria |
|-------|----------|
| 0-5 | Happened once, may not recur |
| 6-10 | Happens occasionally (1-2x per week) |
| 11-15 | Happens regularly (daily) |
| 16-20 | Happens frequently (multiple times per day) |
| 21-25 | Happens on nearly every run |

### Failure Reduction (0-25 points)
Does this change fix real failures?

| Score | Criteria |
|-------|----------|
| 0-5 | Cosmetic improvement, no failures prevented |
| 6-10 | Prevents occasional warnings or non-critical errors |
| 11-15 | Prevents errors that require manual intervention |
| 16-20 | Prevents errors that cause pipeline failure |
| 21-25 | Prevents errors that cause data loss or system damage |

### Burden Reduction (0-25 points)
Does this reduce human effort?

| Score | Criteria |
|-------|----------|
| 0-5 | Saves less than 1 minute per occurrence |
| 6-10 | Saves 1-5 minutes per occurrence |
| 11-15 | Saves 5-30 minutes per occurrence |
| 16-20 | Eliminates a manual step entirely |
| 21-25 | Eliminates multiple manual steps or a recurring task |

### Cost Savings (0-25 points)
Does this reduce API/compute costs?

| Score | Criteria |
|-------|----------|
| 0-5 | Negligible cost difference |
| 6-10 | Saves <10% on affected operations |
| 11-15 | Saves 10-25% on affected operations |
| 16-20 | Saves 25-50% on affected operations |
| 21-25 | Saves >50% or eliminates unnecessary API calls entirely |

## Decision threshold

| Total score | Decision |
|-------------|----------|
| > 50 | **Implement** — change is worth the risk |
| 26-50 | **Defer** — log for future consideration |
| <= 25 | **Reject** — not worth pursuing |

## Logging format

Every VFM evaluation must be logged, whether implemented or not:

VFM EVALUATION Date: [timestamp] Proposed change: [description] Scores: Frequency: [score] — [justification] Failure reduction: [score] — [justification] Burden reduction: [score] — [justification] Cost savings: [score] — [justification] Total: [sum]/100 Decision: implement / defer / reject


## Worked examples

### Example 1: Add retry logic to web search (Implement)
- Frequency: 18 (search fails ~3x daily due to timeouts)
- Failure reduction: 15 (prevents pipeline stall requiring manual restart)
- Burden reduction: 16 (eliminates manual re-run)
- Cost savings: 8 (slight cost from retry, but saves failed run cost)
- **Total: 57 → Implement**

### Example 2: Refactor prompt to use XML tags (Defer)
- Frequency: 25 (every run)
- Failure reduction: 3 (current format works fine)
- Burden reduction: 2 (no human effort saved)
- Cost savings: 5 (maybe slightly fewer tokens)
- **Total: 35 → Defer** (improvement is real but marginal)

### Example 3: Switch to experimental model (Reject)
- Frequency: 25 (every run)
- Failure reduction: 0 (current model has no failures)
- Burden reduction: 0 (no human effort saved)
- Cost savings: 10 (newer model might be cheaper)
- **Total: 35 → Defer** (stability > novelty per ADL)

scripts/templates/proactive/README.md:

# Proactive Agent Pattern

A proactive agent observes its environment, identifies improvements, and
self-modifies within strict guardrails. This pattern is inspired by
OpenClaw's proactive agent skill.

## When to use

- Agents that run frequently and should improve over time
- Pipelines with measurable performance metrics
- Systems where the cost of not improving exceeds the risk of changes

## When NOT to use

- Simple pipelines that just need to run reliably
- Human-in-the-loop workflows (the human provides the feedback)
- New systems that haven't established a performance baseline yet

## Components

- **PROACTIVE-AGENT.md**: Agent template with proactive cycle, VFM protocol, self-healing
- **ADL-RULES.md**: Anti-Drift Limits — constraints that prevent uncontrolled drift
- **VFM-SCORING.md**: Value-First Modification — scoring rubric for proposed changes

## How ADL and VFM work together

ADL defines what the agent CANNOT do (hard boundaries).
VFM determines what the agent SHOULD do (prioritization within boundaries).

Proposed change → Check ADL constraints → BLOCKED if constraint violated → Score with VFM → IMPLEMENT if > 50, DEFER if <= 50 → Log decision either way


## Integration with feedback loops

The proactive agent reads from:
- `feedback/FEEDBACK.md` — pipeline run outcomes
- `budget/cost-events.jsonl` — cost data
- `logs/audit.log` — tool call history
- `memory/MEMORY.md` — long-term patterns

It writes to:
- Daily log (decisions and scores)
- Its own agent file (when implementing approved changes)
- SESSION-STATE.md (current proactive cycle state)

Verify

ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l

Expected: 4

On failure

revert

Checkpoint

git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails"

Step 12: Create cron templates (agentTurn + systemEvent)

Files to create

scripts/templates/cron/agent-turn.sh:

#!/bin/bash
# Agent Turn: Full background autonomy for Claude Code agents.
# Fires a complete agent turn with its own named session.
#
# Bash 3.2 compatible. Uses python3 for JSON/date operations.
#
# Placeholders:
#   {{AGENT_NAME}}   - name of the agent
#   {{WORKING_DIR}}  - absolute path to project directory
#   {{MAX_TURNS}}    - max turns per agent turn (default: 15)

AGENT_NAME="{{AGENT_NAME}}"
WORKING_DIR="{{WORKING_DIR}}"
MAX_TURNS="${MAX_TURNS:-15}"
SESSION_NAME="agent:${AGENT_NAME}:turn"
PID_FILE="$WORKING_DIR/.agent-turn-${AGENT_NAME}.pid"
LOG_DIR="$WORKING_DIR/logs"
STATE_FILE="$WORKING_DIR/.agent-turn-state.json"

mkdir -p "$LOG_DIR"

# --- Orphan detection ---
# Check if a previous agent turn is still running
if [ -f "$PID_FILE" ]; then
  OLD_PID=$(cat "$PID_FILE" 2>/dev/null)
  if [ -n "$OLD_PID" ] && kill -0 "$OLD_PID" 2>/dev/null; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | SKIP: previous run still active (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log"
    exit 0
  else
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | Cleaned orphan PID file (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log"
    rm -f "$PID_FILE"
  fi
fi

# --- Session rollover check ---
# After 200 turns or 72 hours, start a fresh session
NEEDS_FRESH=$(python3 -c "
import json, time, os
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
    agent_state = state.get(agent, {})
    turns = agent_state.get('turn_count', 0)
    started = agent_state.get('session_started', 0)
    age_hours = (time.time() - started) / 3600 if started else 999
    if turns >= 200 or age_hours >= 72:
        print('true')
    else:
        print('false')
except:
    print('true')
" 2>/dev/null)

if [ "$NEEDS_FRESH" = "true" ]; then
  # Use a new session name with timestamp for rollover
  SESSION_NAME="agent:${AGENT_NAME}:turn:$(date +%s)"
  # Reset state
  python3 -c "
import json, time, os
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
except:
    state = {}
state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'}
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"
fi

# --- Load context ---
# Build prompt from HEARTBEAT.md and SESSION-STATE.md
PROMPT=$(python3 -c "
import os
working_dir = '$WORKING_DIR'
parts = []

# Read SESSION-STATE.md for current context
ss = os.path.join(working_dir, 'memory', 'SESSION-STATE.md') if os.path.isdir(os.path.join(working_dir, 'memory')) else os.path.join(working_dir, 'SESSION-STATE.md')
if os.path.exists(ss):
    parts.append('## Current session state:\n' + open(ss).read()[:2000])

# Read HEARTBEAT.md for tasks
hb = os.path.join(working_dir, 'HEARTBEAT.md')
if os.path.exists(hb):
    parts.append('## Heartbeat tasks:\n' + open(hb).read()[:2000])

if parts:
    print('\n\n'.join(parts))
else:
    print('No context files found. Check the working directory and proceed with any pending tasks.')
" 2>/dev/null)

# --- Run agent turn ---
echo $$ > "$PID_FILE"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$TIMESTAMP | agent-turn | START: $AGENT_NAME (session: $SESSION_NAME)" >> "$LOG_DIR/agent-turn.log"

cd "$WORKING_DIR"

OUTPUT=$(claude -p "$PROMPT" \
  --name "$SESSION_NAME" \
  --output-format text \
  --max-turns "$MAX_TURNS" 2>&1)

EXIT_CODE=$?
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Save output to dated log
echo "--- $TIMESTAMP ---" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"
echo "$OUTPUT" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"
echo "" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"

# Update turn count
python3 -c "
import json, time
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
except:
    state = {}
if agent not in state:
    state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'}
state[agent]['turn_count'] = state[agent].get('turn_count', 0) + 1
state[agent]['last_run'] = time.time()
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"

if [ "$EXIT_CODE" -eq 0 ]; then
  echo "$TIMESTAMP | agent-turn | COMPLETE: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log"
else
  echo "$TIMESTAMP | agent-turn | ERROR: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log"
fi

# Cleanup
rm -f "$PID_FILE"

scripts/templates/cron/system-event.sh:

#!/bin/bash
# System Event: Inject a text event into an existing Claude Code session.
# Lighter than agentTurn — does not create a new session.
#
# Bash 3.2 compatible.
#
# Usage: ./system-event.sh "session-name" "Event text to inject"
#
# Placeholders:
#   {{WORKING_DIR}} - absolute path to project directory

WORKING_DIR="{{WORKING_DIR}}"
SESSION_NAME="${1:-}"
EVENT_TEXT="${2:-}"

if [ -z "$SESSION_NAME" ] || [ -z "$EVENT_TEXT" ]; then
  echo "Usage: $0 <session-name> <event-text>"
  echo "Example: $0 'agent:researcher:turn' 'New data available in /data/inbox'"
  exit 1
fi

LOG_DIR="$WORKING_DIR/logs"
mkdir -p "$LOG_DIR"

TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$TIMESTAMP | system-event | INJECT: $SESSION_NAME — $EVENT_TEXT" >> "$LOG_DIR/system-event.log"

cd "$WORKING_DIR"

# Resume the named session and inject the event
OUTPUT=$(claude --resume "$SESSION_NAME" -p "$EVENT_TEXT" \
  --output-format text \
  --max-turns 3 2>&1)

EXIT_CODE=$?
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

if [ "$EXIT_CODE" -eq 0 ]; then
  echo "$TIMESTAMP | system-event | DELIVERED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
else
  echo "$TIMESTAMP | system-event | FAILED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
fi

scripts/templates/cron/README.md:

# Cron Execution Templates

Two execution types for scheduled agent work, inspired by OpenClaw's cron service.

## agentTurn (agent-turn.sh)

Full background autonomy. Creates/resumes its own named session.

**Use for:**
- Complex multi-step work
- Tasks that need full context and tool access
- Background autonomous operation (proactive agent cycles)

**Features:**
- Named sessions via `--name` for resume capability
- Orphan detection (checks PID before starting new run)
- Session rollover: fresh session after 200 turns or 72 hours
- Context loading from SESSION-STATE.md and HEARTBEAT.md
- Dated log files per agent per day

**Session naming (VERIFIED):**
Uses `--name "agent:<name>:turn"` with `--resume` for continuity.
Note: `--session-id` requires valid UUIDs. Named sessions are the
correct approach for human-readable, resumable agent sessions.

## systemEvent (system-event.sh)

Injects text into an existing session. No new session created.

**Use for:**
- Notifications ("new data available")
- Simple status checks
- Triggering a specific action in a running session

**Limitations:**
- Requires an active named session to resume
- Limited to 3 turns (quick action, not full autonomy)
- If the session doesn't exist, the command fails gracefully

## Scheduling examples

```bash
# agentTurn: run full agent turn every hour
0 * * * * /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1

# systemEvent: notify agent of new data every 30 min
*/30 * * * * /path/to/system-event.sh "agent:processor:turn" "Check /data/inbox for new files"

# agentTurn with catchup on reboot
@reboot /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1


### Verify

```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh && echo "VALID"

Expected: VALID

On failure

retry — fix bash 3.2 syntax issues, then revert if still failing

Checkpoint

git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates"

Exit Condition

ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l → 4
ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l → 3 (HEARTBEAT.md, heartbeat-runner.sh, README.md)
ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l → 4
ls /Users/ktg/repos/agent-builder/scripts/templates/cron/ | wc -l → 3
bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh → exit 0
bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh → exit 0
bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh → exit 0
All memory templates contain {{AGENT_NAME}} placeholder: grep -l "AGENT_NAME" /Users/ktg/repos/agent-builder/scripts/templates/memory/*.md | wc -l → 3

Quality Criteria

3-tier memory matches OpenClaw pattern: SESSION-STATE (hot), daily logs (warm), MEMORY (cold)
WAL protocol instructions included in SESSION-STATE template
Working Buffer protocol with 60% danger zone threshold included
Compaction recovery order documented
heartbeat-runner.sh implements emptiness detection (cost saving)
heartbeat-runner.sh implements startup catchup with stagger
heartbeat-runner.sh suppresses ack responses under 300 chars
All shell scripts pass bash -n syntax check
agent-turn.sh uses --name (not custom session IDs per verified assumption)
ADL and VFM scoring rubrics have concrete thresholds and worked examples

36 KiB Raw Blame History

Session 3: OpenClaw Memory and Autonomy Patterns

Dependencies

Scope Fence

Step 9: Create 3-tier memory templates

Files to create

Verify

On failure

Checkpoint

Step 10: Create heartbeat and cron templates

Files to create

State file format

On failure

Checkpoint

Step 11: Create proactive agent templates with ADL/VFM

Files to create

PROACTIVE CYCLE REPORT

Verify

On failure

Checkpoint

Step 12: Create cron templates (agentTurn + systemEvent)

Files to create

On failure

Checkpoint

Exit Condition

Quality Criteria

36 KiB

Raw Blame History