8 session blueprints covering all 27 steps across 3 waves:
- Session 1: Foundation (rename + commands, Steps 1-5)
- Session 2: Skills and templates (Steps 6-7)
- Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12)
- Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18)
- Session 5: Self-learning (feedback/optimization, Steps 20-21)
- Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24)
- Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25)
- Session 8: Finalization (build command integration + v1.0, Steps 8,26,27)
Also updates plan assumptions table with verified findings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Session 3: OpenClaw Memory and Autonomy Patterns
Steps 9, 10, 11, 12 | Wave 1 | Depends on: none
Dependencies
Entry condition: none (independent — creates new template directories only)
Scope Fence
Touch:
- scripts/templates/memory/SESSION-STATE.md (new)
- scripts/templates/memory/DAILY-LOG.md (new)
- scripts/templates/memory/MEMORY.md (new)
- scripts/templates/memory/README.md (new)
- scripts/templates/heartbeat/HEARTBEAT.md (new)
- scripts/templates/heartbeat/heartbeat-runner.sh (new)
- scripts/templates/heartbeat/README.md (new)
- scripts/templates/proactive/PROACTIVE-AGENT.md (new)
- scripts/templates/proactive/ADL-RULES.md (new)
- scripts/templates/proactive/VFM-SCORING.md (new)
- scripts/templates/proactive/README.md (new)
- scripts/templates/cron/agent-turn.sh (new)
- scripts/templates/cron/system-event.sh (new)
- scripts/templates/cron/README.md (new)
Never touch:
- commands/
- agents/
- skills/
- scripts/templates/heartbeat/context-packet.md (Session 4)
- scripts/templates/heartbeat/wake-prompt.md (Session 4)
- .claude-plugin/
- CLAUDE.md, README.md
Step 9: Create 3-tier memory templates
Files to create
scripts/templates/memory/SESSION-STATE.md:
# Session State: {{AGENT_NAME}}
> Hot working memory. Updated every turn. Read first on resume.
## WAL Protocol
**Write important details HERE before responding to the user.**
This prevents data loss if the session crashes or context compacts mid-response.
## Current Task
- Task: [what you're working on right now]
- Started: [timestamp]
- Status: [in progress / blocked / waiting]
- Key decision: [the most important choice made this session]
## Context Window Usage
- Estimated usage: [low / medium / high / DANGER ZONE]
- If above 60%: activate Working Buffer below
## Active Decisions
| Decision | Choice | Reason | Reversible? |
|----------|--------|--------|-------------|
| | | | |
## Pending Actions
- [ ] [action 1]
- [ ] [action 2]
## Working Buffer
> Activate when context usage exceeds 60%. Capture key exchanges here
> before they are lost to compaction. This is your safety net.
### Recent exchanges to preserve
[Paste critical user messages and your key responses here when in the danger zone]
### Key facts from this session
[Extract and list facts that would be lost on compaction]
## Compaction Recovery
If you're reading this after a context compaction:
1. Read this SESSION-STATE.md first (you're here)
2. Read today's daily log: `memory/$(date +%Y-%m-%d).md`
3. Read memory/MEMORY.md for long-term context
4. Search daily logs for relevant prior context
5. Resume from the Current Task and Pending Actions above
scripts/templates/memory/DAILY-LOG.md:
# Daily Log: {{AGENT_NAME}}
> Warm daily capture. One file per day. Filename: memory/YYYY-MM-DD.md
> Auto-rotated: the pipeline creates a new file each day.
## {{DATE}}
### Summary of Work
[1-3 sentences describing what was accomplished today]
### Decisions Made
| Decision | Context | Outcome |
|----------|---------|---------|
| | | |
### Files Modified
- [file path] — [what changed and why]
### Issues Encountered
- [issue description] — [resolution or status]
### Quality Scores
| Pipeline run | Reviewer score | Notes |
|-------------|---------------|-------|
| | | |
### Carry Forward
> Items for the next session. These get checked on the next pipeline run.
- [ ] [item that needs attention tomorrow]
- [ ] [follow-up from today's work]
### Cost
- Estimated tokens used: [if tracked]
- Pipeline runs: [count]
scripts/templates/memory/MEMORY.md:
# Long-Term Memory: {{AGENT_NAME}}
> Cold curated memory. Updated manually or after significant learnings.
> This is the last file read during compaction recovery.
## Agent Identity
- Name: {{AGENT_NAME}}
- Role: [what this agent does]
- Domain: {{DOMAIN}}
- Created: {{DATE}}
## Key Learnings
> Manually curated from daily logs. Only include insights that affect
> future behavior. Delete entries that are no longer relevant.
- [learning 1 — date discovered]
- [learning 2 — date discovered]
## Recurring Patterns
> Patterns observed across multiple runs. Used to improve agent behavior.
| Pattern | Frequency | Impact | Action taken |
|---------|-----------|--------|-------------|
| | | | |
## Known Issues
> Active issues that affect agent performance.
- [issue] — [workaround] — [status: open/resolved]
## Project Context
> Static context that doesn't change often.
- Project: {{PROJECT_DIR}}
- Pipeline: {{PIPELINE_NAME}}
- Schedule: {{SCHEDULE}}
- Deployment: [target]
## Last Updated
[date — update this when you curate this file]
scripts/templates/memory/README.md:
# 3-Tier Memory System
Inspired by OpenClaw's proactive agent memory pattern. Three tiers serve
different purposes with different update frequencies.
## Architecture
Tier 1: SESSION-STATE.md (hot)
- Updated every turn
- Read FIRST on session start and compaction recovery
- Contains: current task, decisions, pending actions, working buffer
Tier 2: memory/YYYY-MM-DD.md (warm)
- One file per day, auto-rotated
- Updated at end of each pipeline run
- Contains: daily summary, decisions, files modified, issues, carry-forward
Tier 3: memory/MEMORY.md (cold)
- Updated manually or after significant learnings
- Read LAST during compaction recovery
- Contains: identity, curated learnings, patterns, known issues
## WAL Protocol (Write-Ahead Logging)
Before responding to the user with important information, write it to
SESSION-STATE.md first. This prevents data loss if:
- The session crashes mid-response
- Context compaction removes the exchange
- The user's connection drops
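The write-ahead rule can be sketched in a few lines of Python. This is only an illustration of the ordering (persist first, respond second), not part of the templates; `respond_with_wal` and the `send` callback are hypothetical names.

```python
import os


def respond_with_wal(state_path, note, send):
    """Persist `note` to SESSION-STATE.md, then hand the response to `send`."""
    with open(state_path, "a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
        f.flush()
        os.fsync(f.fileno())  # force the write to disk before responding
    # Only after the note is durable does the response go out.
    return send(note)
```

If the session dies inside `send`, the note is already on disk and survives the crash.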
## Working Buffer Protocol
When context usage exceeds ~60% (the "danger zone"):
1. Activate the Working Buffer section in SESSION-STATE.md
2. Copy critical recent exchanges into the buffer
3. Extract and list key facts that would be lost on compaction
4. Continue working normally — the buffer is your safety net
## Compaction Recovery
When Claude resumes after context compaction, it reads in this order:
1. SESSION-STATE.md (current task, decisions, working buffer)
2. Today's daily log (what happened today)
3. MEMORY.md (long-term context, known issues)
4. Search older daily logs if needed for specific context
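The read order above can be expressed as a small helper. This is a sketch under assumptions: `recovery_read_order` is a hypothetical name, and it assumes SESSION-STATE.md, MEMORY.md, and the dated logs all live in one `memory/` directory.

```python
from datetime import date
from pathlib import Path


def recovery_read_order(memory_dir, today=None):
    """Return the files to read after compaction, hottest tier first."""
    memory_dir = Path(memory_dir)
    today = today or date.today()
    todays_log = memory_dir / f"{today.isoformat()}.md"
    ordered = [
        memory_dir / "SESSION-STATE.md",  # Tier 1: hot
        todays_log,                       # Tier 2: warm
        memory_dir / "MEMORY.md",         # Tier 3: cold
    ]
    # Step 4: older daily logs, newest first, as the fallback search space.
    pattern = "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].md"
    older = sorted(
        (p for p in memory_dir.glob(pattern) if p != todays_log),
        reverse=True,
    )
    return ordered + older
```

The helper only lists paths; a missing file (for example, no log yet for today) is left for the caller to skip.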
## Integration with Agent Factory
During `/agent-factory:build` Phase 2.5 (Memory Setup):
1. Copy these templates to the user's `memory/` directory
2. Replace `{{PLACEHOLDER}}` variables with project-specific values
3. Create the initial SESSION-STATE.md and MEMORY.md
4. Configure the pipeline skill to update daily logs after each run
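Step 2 (placeholder replacement) might look like the following. This is a minimal sketch, not the build command's actual implementation; `render_template` is a hypothetical helper, and it assumes placeholders are upper-case names wrapped in double braces.

```python
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render_template(text, values):
    """Replace {{NAME}} placeholders; leave unknown ones untouched."""
    missing = set()

    def substitute(match):
        key = match.group(1)
        if key in values:
            return str(values[key])
        missing.add(key)
        return match.group(0)  # keep the placeholder visible for review

    return PLACEHOLDER.sub(substitute, text), sorted(missing)


rendered, missing = render_template(
    "# Session State: {{AGENT_NAME}}\n- Domain: {{DOMAIN}}",
    {"AGENT_NAME": "researcher"},
)
```

Returning the list of unfilled names lets the scaffolding step warn about them instead of silently shipping a raw `{{DOMAIN}}`.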
## File locations after scaffolding
project/
  memory/
    SESSION-STATE.md    (from Tier 1 template)
    MEMORY.md           (from Tier 3 template)
    2026-04-11.md       (generated daily, from Tier 2 template)
    2026-04-12.md
    ...
Verify
ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l
Expected: 4
On failure
revert
Checkpoint
git commit -m "feat(templates): add 3-tier memory templates (OpenClaw pattern)"
Step 10: Create heartbeat and cron templates
Files to create
scripts/templates/heartbeat/HEARTBEAT.md:
# Heartbeat: {{AGENT_NAME}}
Read this file on each heartbeat. Follow it strictly. Do not infer or
repeat old tasks from prior chats. If nothing needs attention, reply
HEARTBEAT_OK.
## Tasks
tasks:
  - name: {{TASK_1_NAME}}
    interval: {{TASK_1_INTERVAL}}
    prompt: "{{TASK_1_PROMPT}}"
  - name: {{TASK_2_NAME}}
    interval: {{TASK_2_INTERVAL}}
    prompt: "{{TASK_2_PROMPT}}"
## Context
{{CONTEXT_NOTES}}
## Rules
- Only perform tasks listed above
- Respect the interval — do not run a task before its next due time
- If a task fails, log the error and continue to the next task
- Respond with HEARTBEAT_OK if no tasks are due
scripts/templates/heartbeat/heartbeat-runner.sh:
#!/bin/bash
# Heartbeat runner for Claude Code agents.
# Reads HEARTBEAT.md, checks which tasks are due, invokes claude -p for each.
#
# Bash 3.2 compatible: no associative arrays, no mapfile, no |&
# Uses python3 for all JSON/YAML/date operations.
#
# Usage: ./heartbeat-runner.sh [--catchup]
# --catchup: run missed tasks on first invocation (max 5, 5s stagger)
#
# Placeholders (replaced at scaffold time):
# {{AGENT_NAME}} - name of the agent
# {{WORKING_DIR}} - absolute path to project directory
#
# Environment overrides (read at run time):
# MAX_TURNS - max turns per heartbeat (default: 10)
# ACK_MAX_CHARS - suppress HEARTBEAT_OK responses up to this length (default: 300)
AGENT_NAME="{{AGENT_NAME}}"
WORKING_DIR="{{WORKING_DIR}}"
MAX_TURNS="${MAX_TURNS:-10}"
ACK_MAX_CHARS="${ACK_MAX_CHARS:-300}"
HEARTBEAT_FILE="$WORKING_DIR/HEARTBEAT.md"
STATE_FILE="$WORKING_DIR/.heartbeat-state.json"
LOG_DIR="$WORKING_DIR/logs"
CATCHUP_MODE=false
if [ "$1" = "--catchup" ]; then
CATCHUP_MODE=true
fi
# Ensure directories exist
mkdir -p "$LOG_DIR"
# --- Emptiness detection (OpenClaw pattern) ---
# Skip API calls if heartbeat file has only headers/empty items.
# The heredoc is quoted, so shell variables are NOT expanded inside the
# Python source; the path is passed through the environment instead.
is_heartbeat_empty() {
  HEARTBEAT_FILE="$HEARTBEAT_FILE" python3 << 'PYEOF'
import os, re, sys
try:
    with open(os.environ["HEARTBEAT_FILE"], "r") as f:
        content = f.read()
except OSError:
    print("true")
    sys.exit(0)
# Strip markdown headers, blank lines, and YAML structure markers
stripped = re.sub(r'^#+.*$', '', content, flags=re.MULTILINE)
stripped = re.sub(r'^tasks:\s*$', '', stripped, flags=re.MULTILINE)
stripped = re.sub(r'^\s*-\s*name:\s*\{\{.*\}\}\s*$', '', stripped, flags=re.MULTILINE)
stripped = re.sub(r'^\s*$', '', stripped, flags=re.MULTILINE)
stripped = stripped.strip()
if len(stripped) < 20:
    print("true")
else:
    print("false")
PYEOF
}
HEARTBEAT_FILE_ACTUAL="$HEARTBEAT_FILE"
EMPTY_CHECK=$(is_heartbeat_empty 2>/dev/null)
if [ "$EMPTY_CHECK" = "true" ]; then
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | SKIP (empty heartbeat file)" >> "$LOG_DIR/heartbeat.log"
exit 0
fi
# --- Parse tasks and check due times ---
DUE_TASKS=$(python3 << PYEOF
import json, re, sys, time

heartbeat_file = "$HEARTBEAT_FILE_ACTUAL"
state_file = "$STATE_FILE"
catchup = "$CATCHUP_MODE" == "true"

# Parse tasks from HEARTBEAT.md
try:
    content = open(heartbeat_file).read()
except FileNotFoundError:
    print("[]")
    sys.exit(0)

# Simple YAML-like task parsing
tasks = []
current_task = {}
for line in content.split('\n'):
    line = line.strip()
    m_name = re.match(r'-\s*name:\s*(.+)', line)
    m_interval = re.match(r'interval:\s*(.+)', line)
    m_prompt = re.match(r'prompt:\s*"(.+)"', line)
    if m_name:
        if current_task.get('name'):
            tasks.append(current_task)
        current_task = {'name': m_name.group(1).strip()}
    elif m_interval and current_task:
        current_task['interval'] = m_interval.group(1).strip()
    elif m_prompt and current_task:
        current_task['prompt'] = m_prompt.group(1).strip()
if current_task.get('name'):
    tasks.append(current_task)

# Load state (missing or corrupt state file means nothing has run yet)
try:
    state = json.load(open(state_file))
except (OSError, ValueError):
    state = {}

# Parse interval to seconds
def parse_interval(s):
    s = s.strip()
    m = re.match(r'(\d+)\s*(m|min|h|hr|d)', s)
    if not m:
        return 3600  # default 1 hour
    val, unit = int(m.group(1)), m.group(2)
    if unit in ('m', 'min'):
        return val * 60
    elif unit in ('h', 'hr'):
        return val * 3600
    elif unit == 'd':
        return val * 86400
    return 3600

# Check which tasks are due.
# Never-run tasks have last_run == 0, so they are always due.
now = time.time()
due = []
for task in tasks:
    name = task.get('name', '')
    interval_sec = parse_interval(task.get('interval', '1h'))
    last_run = state.get(name, {}).get('last_run', 0)
    if now - last_run >= interval_sec:
        due.append(task)

# Limit catchup to 5 tasks
if catchup:
    due = due[:5]
print(json.dumps(due))
PYEOF
)
# --- Run due tasks ---
TASK_COUNT=$(echo "$DUE_TASKS" | python3 -c "import sys,json; print(len(json.load(sys.stdin)))" 2>/dev/null)
if [ "$TASK_COUNT" = "0" ]; then
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | heartbeat | HEARTBEAT_OK (no tasks due)" >> "$LOG_DIR/heartbeat.log"
exit 0
fi
echo "$DUE_TASKS" | python3 -c "
import sys, json, subprocess, time, os

tasks = json.load(sys.stdin)
state_file = '$STATE_FILE'
log_dir = '$LOG_DIR'
working_dir = '$WORKING_DIR'
max_turns = '$MAX_TURNS'
ack_max = int('$ACK_MAX_CHARS')
catchup = '$CATCHUP_MODE' == 'true'

# Load state
try:
    state = json.load(open(state_file))
except:
    state = {}

for i, task in enumerate(tasks):
    name = task.get('name', 'unknown')
    prompt = task.get('prompt', '')
    if not prompt:
        continue
    # Stagger catchup tasks
    if catchup and i > 0:
        time.sleep(5)
    ts = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())
    print(f'{ts} | heartbeat | RUNNING: {name}')
    try:
        result = subprocess.run(
            ['claude', '-p', prompt, '--output-format', 'text', '--max-turns', str(max_turns)],
            capture_output=True, text=True, timeout=600,
            cwd=working_dir
        )
        output = result.stdout.strip()
        # Suppress short ack responses (OpenClaw ackMaxChars pattern)
        if len(output) <= ack_max and 'HEARTBEAT_OK' in output:
            log_line = f'{ts} | heartbeat | {name} | HEARTBEAT_OK (suppressed)'
        else:
            log_line = f'{ts} | heartbeat | {name} | completed ({len(output)} chars)'
            # Save full output
            with open(os.path.join(log_dir, f'heartbeat-{name}-{time.strftime(\"%Y-%m-%d\")}.log'), 'a') as f:
                f.write(f'--- {ts} ---\n{output}\n\n')
    except subprocess.TimeoutExpired:
        log_line = f'{ts} | heartbeat | {name} | TIMEOUT'
    except Exception as e:
        log_line = f'{ts} | heartbeat | {name} | ERROR: {str(e)}'
    with open(os.path.join(log_dir, 'heartbeat.log'), 'a') as f:
        f.write(log_line + '\n')
    # Update state
    state[name] = {'last_run': time.time()}

# Save state
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"
echo "Heartbeat complete: $TASK_COUNT tasks processed"
scripts/templates/heartbeat/README.md:
# Heartbeat Scheduling
Combines OpenClaw's HEARTBEAT.md task format with Paperclip's interval-based
heartbeat model. Designed for Claude Code agents running on a schedule.
## How it works
1. A scheduler (cron/launchd/systemd) runs `heartbeat-runner.sh` at a fixed
interval (e.g., every 30 minutes)
2. The runner reads `HEARTBEAT.md` for task definitions
3. **Emptiness detection**: if the file has no real tasks, skip the API call
entirely (saves cost — from OpenClaw)
4. For each task: check if it's due based on its interval and last-run time
5. Run due tasks via `claude -p` with the task's prompt
6. Suppress short acknowledgment responses (<300 chars containing HEARTBEAT_OK)
7. Update `.heartbeat-state.json` with last-run timestamps
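Step 4 (the due check) can be isolated as a pure function, which makes it easy to test without calling `claude`. The sketch below is an illustration, not the runner's code: `parse_interval` here uses a full match, so it is stricter than the prefix match in `heartbeat-runner.sh`, and `due_tasks` is a hypothetical name.

```python
import re
import time

UNIT_SECONDS = {"m": 60, "min": 60, "h": 3600, "hr": 3600, "d": 86400}


def parse_interval(text, default=3600):
    """Convert '30m', '2h', '1d' to seconds; fall back to one hour."""
    m = re.fullmatch(r"\s*(\d+)\s*(min|m|hr|h|d)\s*", text)
    if not m:
        return default
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]


def due_tasks(tasks, state, now=None):
    """Return names of tasks whose interval has elapsed since last_run."""
    now = time.time() if now is None else now
    due = []
    for task in tasks:
        # A task that has never run has last_run == 0 and is always due.
        last_run = state.get(task["name"], {}).get("last_run", 0)
        if now - last_run >= parse_interval(task.get("interval", "1h")):
            due.append(task["name"])
    return due
```

Passing `now` explicitly keeps the function deterministic in tests.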
## Two execution types (from OpenClaw)
### systemEvent
Injects a text event into an existing session. Lightweight, no new session.
Use for: notifications, status checks, simple updates.
Template: `scripts/templates/cron/system-event.sh`
### agentTurn
Fires a full agent turn with its own session. Full context, full tool access.
Use for: background autonomous work, complex tasks, multi-step operations.
Template: `scripts/templates/cron/agent-turn.sh`
## Startup catchup (OpenClaw pattern)
When the runner starts after downtime (e.g., machine was off):
- Run `heartbeat-runner.sh --catchup`
- Processes up to 5 missed tasks
- 5-second stagger between tasks (prevents thundering herd)
## Cost optimization
- **Emptiness detection**: No API call if HEARTBEAT.md has no real content
- **ackMaxChars suppression**: Responses under 300 chars with HEARTBEAT_OK
are logged but not displayed (saves downstream processing)
- **Interval-based**: Only run tasks when actually due, not every heartbeat
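The ackMaxChars rule reduces to a single predicate. A minimal sketch, assuming the same comparison the runner uses (length at or below the limit and the literal marker present); `is_suppressed_ack` is a hypothetical name.

```python
def is_suppressed_ack(output, ack_max_chars=300):
    """True when a response is a short HEARTBEAT_OK acknowledgment."""
    text = output.strip()
    return len(text) <= ack_max_chars and "HEARTBEAT_OK" in text
```

A long response that merely mentions HEARTBEAT_OK is still kept, because the length check fails.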
## Example cron entries
```bash
# Run heartbeat every 30 minutes
*/30 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1

# Run heartbeat every hour with catchup on restart
@reboot /path/to/heartbeat-runner.sh --catchup >> /tmp/heartbeat.log 2>&1
0 * * * * /path/to/heartbeat-runner.sh >> /tmp/heartbeat.log 2>&1
```
## State file format
`.heartbeat-state.json`:
```json
{
  "email-check": { "last_run": 1712847600 },
  "report-generation": { "last_run": 1712844000 }
}
```
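If two runner invocations could overlap, a partially written state file is a risk. The sketch below shows one way to update a single entry atomically by writing a temporary file and renaming it. This is an assumption-laden alternative, not what `heartbeat-runner.sh` does (the runner rewrites the file in place); `record_last_run` is a hypothetical name.

```python
import json
import os
import tempfile


def record_last_run(state_path, task_name, timestamp):
    """Set last_run for one task and replace the state file atomically."""
    try:
        with open(state_path) as f:
            state = json.load(f)
    except (OSError, ValueError):
        state = {}  # missing or corrupt file: start fresh
    state[task_name] = {"last_run": timestamp}
    directory = os.path.dirname(os.path.abspath(state_path))
    # Write next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp_path, state_path)
    return state
```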
Verify
bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh
Expected: no syntax errors (exit 0)
On failure
retry — fix bash 3.2 syntax issues, then revert if still failing
Checkpoint
git commit -m "feat(templates): add heartbeat templates with emptiness detection and catchup"
Step 11: Create proactive agent templates with ADL/VFM
Files to create
scripts/templates/proactive/PROACTIVE-AGENT.md:
---
name: {{AGENT_NAME}}
description: |
  A proactive agent that can identify improvements and self-modify within
  strict guardrails. Uses ADL (Anti-Drift Limits) and VFM (Value-First
  Modification) scoring to prevent uncontrolled drift.

  <example>
  Context: Agent identifies a recurring inefficiency
  user: "Check for improvements"
  assistant: "I'll review recent performance data and propose changes via VFM scoring."
  <commentary>Proactive improvement cycle triggered by performance review.</commentary>
  </example>
model: sonnet
tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash"]
---
## How you work
You are a proactive agent. You don't just respond to tasks — you observe
your environment, identify improvements, and implement changes that pass
VFM scoring.
### Proactive cycle
1. **Observe**: Read performance data (feedback/FEEDBACK.md, audit.log, cost-events.jsonl)
2. **Identify**: Find patterns: recurring errors, slow steps, unnecessary work
3. **Score**: Run VFM scoring on each proposed change (see VFM protocol below)
4. **Implement**: Only changes with VFM score > 50. All others logged but not applied.
5. **Log**: Record every decision (implement or defer) with scores and reasoning
### VFM protocol
Before making ANY change to your own config, skills, prompts, or behavior:
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/VFM-SCORING.md`
2. Score the proposed change across 4 dimensions (0-25 each)
3. If total score > 50: implement and log
4. If total score <= 50: log with reason for deferral, do NOT implement
### Self-healing protocol
When encountering errors:
1. Log the error with full context
2. Try approach 1 (most likely fix based on error message)
3. If fail: try approach 2 (alternative strategy)
4. If fail: try approach 3 (simplified version)
5. Continue up to 5 attempts with increasingly conservative approaches
6. After 5 failures: escalate to human with full attempt log
## Rules (ADL — Anti-Drift Limits)
Read the full ADL at `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/ADL-RULES.md`.
Core constraints:
- **No fake intelligence**: Do not simulate capabilities you lack
- **No unverifiable modifications**: Every change must be testable
- **No novelty over stability**: Prefer proven approaches over clever ones
- **No scope expansion without approval**: Stay within your defined boundaries
- **No silent failures**: All errors must be logged
Priority ordering: Stability > Explainability > Reusability > Scalability > Novelty
## Output format
After each proactive cycle, produce:
PROACTIVE CYCLE REPORT
Date: [timestamp]
Observations: [N] patterns found
Proposals: [N] changes evaluated
| Proposed change | VFM score | Decision | Reason |
|---|---|---|---|
| [change 1] | [score] | implement/defer | [why] |
| ... | | | |
Implemented: [N]
Deferred: [N]
Errors handled: [N] (max attempt: [N])
scripts/templates/proactive/ADL-RULES.md:
# Anti-Drift Limits (ADL)
Guardrails that prevent proactive agents from drifting beyond useful behavior.
Inspired by OpenClaw's proactive agent skill.
## Constraints
### 1. No fake intelligence
Do not simulate capabilities you do not have. If you cannot access a tool,
do not pretend the operation succeeded. If you cannot verify a fact, say so.
### 2. No unverifiable modifications
Every change you make must be testable. Before implementing:
- Define how to verify the change worked
- Run the verification after implementation
- Revert if verification fails
### 3. No novelty over stability
When choosing between a clever new approach and a proven existing one,
choose the proven approach unless VFM scoring strongly favors the new one
(score > 75).
### 4. No scope expansion without approval
Your boundaries are defined by your agent file and CLAUDE.md. You may
optimize within those boundaries. You may NOT:
- Add new tools to your own configuration
- Modify other agents' files
- Change system-level settings
- Create new agents or skills
### 5. No silent failures
Every error, every failed attempt, every unexpected result must be logged.
Write to the daily log (memory/YYYY-MM-DD.md) or a dedicated error log.
## Priority Ordering
When constraints conflict, apply this priority:
Stability > Explainability > Reusability > Scalability > Novelty
A stable system that is hard to understand is better than a novel system
that breaks. An explainable system that doesn't scale is better than a
scalable system that nobody can debug.
## When to override ADL
ADL can be overridden ONLY by explicit human instruction. If the user says
"try the new approach even though it's risky," that overrides constraint #3.
Log the override with the user's exact instruction.
Never self-override. The whole point of ADL is to prevent the agent from
convincing itself that an exception is warranted.
scripts/templates/proactive/VFM-SCORING.md:
# Value-First Modification (VFM) Scoring
Scoring rubric for evaluating proposed self-modifications. Any change to
agent config, prompts, behavior, or pipeline structure must score > 50
to be implemented.
## Dimensions
### Frequency (0-25 points)
How often does the issue this change addresses occur?
| Score | Criteria |
|-------|----------|
| 0-5 | Happened once, may not recur |
| 6-10 | Happens occasionally (1-2x per week) |
| 11-15 | Happens regularly (daily) |
| 16-20 | Happens frequently (multiple times per day) |
| 21-25 | Happens on nearly every run |
### Failure Reduction (0-25 points)
Does this change fix real failures?
| Score | Criteria |
|-------|----------|
| 0-5 | Cosmetic improvement, no failures prevented |
| 6-10 | Prevents occasional warnings or non-critical errors |
| 11-15 | Prevents errors that require manual intervention |
| 16-20 | Prevents errors that cause pipeline failure |
| 21-25 | Prevents errors that cause data loss or system damage |
### Burden Reduction (0-25 points)
Does this reduce human effort?
| Score | Criteria |
|-------|----------|
| 0-5 | Saves less than 1 minute per occurrence |
| 6-10 | Saves 1-5 minutes per occurrence |
| 11-15 | Saves 5-30 minutes per occurrence |
| 16-20 | Eliminates a manual step entirely |
| 21-25 | Eliminates multiple manual steps or a recurring task |
### Cost Savings (0-25 points)
Does this reduce API/compute costs?
| Score | Criteria |
|-------|----------|
| 0-5 | Negligible cost difference |
| 6-10 | Saves <10% on affected operations |
| 11-15 | Saves 10-25% on affected operations |
| 16-20 | Saves 25-50% on affected operations |
| 21-25 | Saves >50% or eliminates unnecessary API calls entirely |
## Decision threshold
| Total score | Decision |
|-------------|----------|
| > 50 | **Implement** — change is worth the risk |
| 26-50 | **Defer** — log for future consideration |
| <= 25 | **Reject** — not worth pursuing |
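The rubric and threshold table translate directly into code. The following is a sketch of the arithmetic only; `vfm_decision` and the dictionary keys are hypothetical names chosen for illustration.

```python
DIMENSIONS = ("frequency", "failure_reduction", "burden_reduction", "cost_savings")


def vfm_decision(scores):
    """Sum four 0-25 dimension scores and apply the decision threshold."""
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= 25:
            raise ValueError(f"{name} must be between 0 and 25, got {value}")
    total = sum(scores[name] for name in DIMENSIONS)
    if total > 50:
        return total, "implement"
    if total > 25:
        return total, "defer"
    return total, "reject"


# Scores from the retry-logic case: 18 + 15 + 16 + 8
retry_logic = {
    "frequency": 18,
    "failure_reduction": 15,
    "burden_reduction": 16,
    "cost_savings": 8,
}
total, decision = vfm_decision(retry_logic)  # (57, "implement")
```

Range-checking each dimension guards against a single inflated score pushing a marginal change over the threshold.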
## Logging format
Every VFM evaluation must be logged, whether implemented or not:
VFM EVALUATION
Date: [timestamp]
Proposed change: [description]
Scores:
  Frequency: [score] — [justification]
  Failure reduction: [score] — [justification]
  Burden reduction: [score] — [justification]
  Cost savings: [score] — [justification]
Total: [sum]/100
Decision: implement / defer / reject
## Worked examples
### Example 1: Add retry logic to web search (Implement)
- Frequency: 18 (search fails ~3x daily due to timeouts)
- Failure reduction: 15 (prevents pipeline stall requiring manual restart)
- Burden reduction: 16 (eliminates manual re-run)
- Cost savings: 8 (slight cost from retry, but saves failed run cost)
- **Total: 57 → Implement**
### Example 2: Refactor prompt to use XML tags (Defer)
- Frequency: 25 (every run)
- Failure reduction: 3 (current format works fine)
- Burden reduction: 2 (no human effort saved)
- Cost savings: 5 (maybe slightly fewer tokens)
- **Total: 35 → Defer** (improvement is real but marginal)
### Example 3: Switch to experimental model (Defer)
- Frequency: 25 (every run)
- Failure reduction: 0 (current model has no failures)
- Burden reduction: 0 (no human effort saved)
- Cost savings: 10 (newer model might be cheaper)
- **Total: 35 → Defer** (falls in the 26-50 band; ADL constraint #3 also requires a score > 75 before a novel approach replaces a proven one)
scripts/templates/proactive/README.md:
# Proactive Agent Pattern
A proactive agent observes its environment, identifies improvements, and
self-modifies within strict guardrails. This pattern is inspired by
OpenClaw's proactive agent skill.
## When to use
- Agents that run frequently and should improve over time
- Pipelines with measurable performance metrics
- Systems where the cost of not improving exceeds the risk of changes
## When NOT to use
- Simple pipelines that just need to run reliably
- Human-in-the-loop workflows (the human provides the feedback)
- New systems that haven't established a performance baseline yet
## Components
- **PROACTIVE-AGENT.md**: Agent template with proactive cycle, VFM protocol, self-healing
- **ADL-RULES.md**: Anti-Drift Limits — constraints that prevent uncontrolled drift
- **VFM-SCORING.md**: Value-First Modification — scoring rubric for proposed changes
## How ADL and VFM work together
ADL defines what the agent CANNOT do (hard boundaries).
VFM determines what the agent SHOULD do (prioritization within boundaries).
Proposed change
  → Check ADL constraints → BLOCKED if constraint violated
  → Score with VFM → IMPLEMENT if > 50, DEFER if <= 50
  → Log decision either way
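The two-stage gate can be sketched as follows: ADL is checked first and short-circuits, and VFM is consulted only for changes that pass. This is an illustration; `evaluate_change` is a hypothetical name, and the constraint labels passed in are free-form strings.

```python
def evaluate_change(violated_constraints, vfm_total):
    """Apply the ADL gate first, then the VFM threshold; return a log entry."""
    if violated_constraints:
        # A hard boundary was crossed: the VFM score is irrelevant.
        return {
            "decision": "blocked",
            "reason": "ADL: " + ", ".join(violated_constraints),
        }
    decision = "implement" if vfm_total > 50 else "defer"
    return {"decision": decision, "reason": f"VFM total {vfm_total}/100"}
```

Every call returns a record, so the "log decision either way" step has something to write regardless of the outcome.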
## Integration with feedback loops
The proactive agent reads from:
- `feedback/FEEDBACK.md` — pipeline run outcomes
- `budget/cost-events.jsonl` — cost data
- `logs/audit.log` — tool call history
- `memory/MEMORY.md` — long-term patterns
It writes to:
- Daily log (decisions and scores)
- Its own agent file (when implementing approved changes)
- SESSION-STATE.md (current proactive cycle state)
Verify
ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l
Expected: 4
On failure
revert
Checkpoint
git commit -m "feat(templates): add proactive agent templates with ADL/VFM guardrails"
Step 12: Create cron templates (agentTurn + systemEvent)
Files to create
scripts/templates/cron/agent-turn.sh:
#!/bin/bash
# Agent Turn: Full background autonomy for Claude Code agents.
# Fires a complete agent turn with its own named session.
#
# Bash 3.2 compatible. Uses python3 for JSON/date operations.
#
# Placeholders (replaced at scaffold time):
# {{AGENT_NAME}} - name of the agent
# {{WORKING_DIR}} - absolute path to project directory
#
# Environment overrides (read at run time):
# MAX_TURNS - max turns per agent turn (default: 15)
AGENT_NAME="{{AGENT_NAME}}"
WORKING_DIR="{{WORKING_DIR}}"
MAX_TURNS="${MAX_TURNS:-15}"
SESSION_NAME="agent:${AGENT_NAME}:turn"
PID_FILE="$WORKING_DIR/.agent-turn-${AGENT_NAME}.pid"
LOG_DIR="$WORKING_DIR/logs"
STATE_FILE="$WORKING_DIR/.agent-turn-state.json"
mkdir -p "$LOG_DIR"
# --- Orphan detection ---
# Check if a previous agent turn is still running
if [ -f "$PID_FILE" ]; then
OLD_PID=$(cat "$PID_FILE" 2>/dev/null)
if [ -n "$OLD_PID" ] && kill -0 "$OLD_PID" 2>/dev/null; then
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | SKIP: previous run still active (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log"
exit 0
else
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | agent-turn | Cleaned orphan PID file (PID $OLD_PID)" >> "$LOG_DIR/agent-turn.log"
rm -f "$PID_FILE"
fi
fi
# --- Session rollover check ---
# After 200 turns or 72 hours, start a fresh session
NEEDS_FRESH=$(python3 -c "
import json, time, os
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
    agent_state = state.get(agent, {})
    turns = agent_state.get('turn_count', 0)
    started = agent_state.get('session_started', 0)
    age_hours = (time.time() - started) / 3600 if started else 999
    if turns >= 200 or age_hours >= 72:
        print('true')
    else:
        print('false')
except:
    print('true')
" 2>/dev/null)
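if [ "$NEEDS_FRESH" = "true" ]; then
  # Use a new session name with timestamp for rollover
  SESSION_NAME="agent:${AGENT_NAME}:turn:$(date +%s)"
  # Reset state
  python3 -c "
import json, time, os
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
except:
    state = {}
state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'}
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"
else
  # Reuse the session name recorded at the last rollover, if any
  STORED_NAME=$(python3 -c "
import json
try:
    print(json.load(open('$STATE_FILE')).get('$AGENT_NAME', {}).get('session_name', ''))
except (OSError, ValueError):
    print('')
" 2>/dev/null)
  if [ -n "$STORED_NAME" ]; then
    SESSION_NAME="$STORED_NAME"
  fi
fi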
# --- Load context ---
# Build prompt from HEARTBEAT.md and SESSION-STATE.md
PROMPT=$(python3 -c "
import os
working_dir = '$WORKING_DIR'
parts = []
# Read SESSION-STATE.md for current context
ss = os.path.join(working_dir, 'memory', 'SESSION-STATE.md') if os.path.isdir(os.path.join(working_dir, 'memory')) else os.path.join(working_dir, 'SESSION-STATE.md')
if os.path.exists(ss):
    parts.append('## Current session state:\n' + open(ss).read()[:2000])
# Read HEARTBEAT.md for tasks
hb = os.path.join(working_dir, 'HEARTBEAT.md')
if os.path.exists(hb):
    parts.append('## Heartbeat tasks:\n' + open(hb).read()[:2000])
if parts:
    print('\n\n'.join(parts))
else:
    print('No context files found. Check the working directory and proceed with any pending tasks.')
" 2>/dev/null)
# --- Run agent turn ---
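echo $$ > "$PID_FILE"
# Remove the PID file on any exit so an aborted run does not leave a stale lock
trap 'rm -f "$PID_FILE"' EXIT
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$TIMESTAMP | agent-turn | START: $AGENT_NAME (session: $SESSION_NAME)" >> "$LOG_DIR/agent-turn.log"
cd "$WORKING_DIR" || exit 1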
OUTPUT=$(claude -p "$PROMPT" \
--name "$SESSION_NAME" \
--output-format text \
--max-turns "$MAX_TURNS" 2>&1)
EXIT_CODE=$?
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Save output to dated log
echo "--- $TIMESTAMP ---" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"
echo "$OUTPUT" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"
echo "" >> "$LOG_DIR/agent-turn-${AGENT_NAME}-$(date +%Y-%m-%d).log"
# Update turn count
python3 -c "
import json, time
state_file = '$STATE_FILE'
agent = '$AGENT_NAME'
try:
    state = json.load(open(state_file))
except:
    state = {}
if agent not in state:
    state[agent] = {'turn_count': 0, 'session_started': time.time(), 'session_name': '$SESSION_NAME'}
state[agent]['turn_count'] = state[agent].get('turn_count', 0) + 1
state[agent]['last_run'] = time.time()
with open(state_file, 'w') as f:
    json.dump(state, f, indent=2)
"
if [ "$EXIT_CODE" -eq 0 ]; then
echo "$TIMESTAMP | agent-turn | COMPLETE: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log"
else
echo "$TIMESTAMP | agent-turn | ERROR: $AGENT_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/agent-turn.log"
fi
# Cleanup
rm -f "$PID_FILE"
scripts/templates/cron/system-event.sh:
#!/bin/bash
# System Event: Inject a text event into an existing Claude Code session.
# Lighter than agentTurn — does not create a new session.
#
# Bash 3.2 compatible.
#
# Usage: ./system-event.sh "session-name" "Event text to inject"
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
SESSION_NAME="${1:-}"
EVENT_TEXT="${2:-}"
if [ -z "$SESSION_NAME" ] || [ -z "$EVENT_TEXT" ]; then
echo "Usage: $0 <session-name> <event-text>"
echo "Example: $0 'agent:researcher:turn' 'New data available in /data/inbox'"
exit 1
fi
LOG_DIR="$WORKING_DIR/logs"
mkdir -p "$LOG_DIR"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$TIMESTAMP | system-event | INJECT: $SESSION_NAME — $EVENT_TEXT" >> "$LOG_DIR/system-event.log"
cd "$WORKING_DIR"
# Resume the named session and inject the event
OUTPUT=$(claude --resume "$SESSION_NAME" -p "$EVENT_TEXT" \
--output-format text \
--max-turns 3 2>&1)
EXIT_CODE=$?
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
if [ "$EXIT_CODE" -eq 0 ]; then
echo "$TIMESTAMP | system-event | DELIVERED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
else
echo "$TIMESTAMP | system-event | FAILED: $SESSION_NAME (exit $EXIT_CODE)" >> "$LOG_DIR/system-event.log"
fi
scripts/templates/cron/README.md:
# Cron Execution Templates
Two execution types for scheduled agent work, inspired by OpenClaw's cron service.
## agentTurn (agent-turn.sh)
Full background autonomy. Creates/resumes its own named session.
**Use for:**
- Complex multi-step work
- Tasks that need full context and tool access
- Background autonomous operation (proactive agent cycles)
**Features:**
- Named sessions via `--name` for resume capability
- Orphan detection (checks PID before starting new run)
- Session rollover: fresh session after 200 turns or 72 hours
- Context loading from SESSION-STATE.md and HEARTBEAT.md
- Dated log files per agent per day
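The rollover and orphan checks are simple enough to express as two predicates. This is a sketch mirroring the limits stated above (200 turns, 72 hours), not a replacement for the script; both function names are hypothetical, and `pid_is_alive` assumes a POSIX system, where signal 0 only probes for existence.

```python
import os
import time

MAX_TURNS_PER_SESSION = 200
MAX_SESSION_AGE_HOURS = 72


def needs_fresh_session(agent_state, now=None):
    """True when the session is unknown, too old, or has too many turns."""
    now = time.time() if now is None else now
    started = agent_state.get("session_started", 0)
    if not started:
        return True  # no recorded session: start one
    age_hours = (now - started) / 3600
    too_many_turns = agent_state.get("turn_count", 0) >= MAX_TURNS_PER_SESSION
    return too_many_turns or age_hours >= MAX_SESSION_AGE_HOURS


def pid_is_alive(pid):
    """POSIX only: probe a process with signal 0 without affecting it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # stale PID file: the process is gone
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```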
**Session naming (VERIFIED):**
Uses `--name "agent:<name>:turn"` with `--resume` for continuity.
Note: `--session-id` requires valid UUIDs. Named sessions are the
correct approach for human-readable, resumable agent sessions.
## systemEvent (system-event.sh)
Injects text into an existing session. No new session created.
**Use for:**
- Notifications ("new data available")
- Simple status checks
- Triggering a specific action in a running session
**Limitations:**
- Requires an active named session to resume
- Limited to 3 turns (quick action, not full autonomy)
- If the session doesn't exist, the command fails gracefully
## Scheduling examples
```bash
# agentTurn: run full agent turn every hour
0 * * * * /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1

# systemEvent: notify agent of new data every 30 min
*/30 * * * * /path/to/system-event.sh "agent:processor:turn" "Check /data/inbox for new files"

# agentTurn: run once at boot (agent-turn.sh has no --catchup flag)
@reboot /path/to/agent-turn.sh >> /tmp/agent-turn.log 2>&1
```
Verify
bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh && echo "VALID"
Expected: VALID
On failure
retry — fix bash 3.2 syntax issues, then revert if still failing
Checkpoint
git commit -m "feat(templates): add isolated agentTurn and systemEvent cron templates"
Exit Condition
- `ls /Users/ktg/repos/agent-builder/scripts/templates/memory/ | wc -l` → 4
- `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → 3 (HEARTBEAT.md, heartbeat-runner.sh, README.md)
- `ls /Users/ktg/repos/agent-builder/scripts/templates/proactive/ | wc -l` → 4
- `ls /Users/ktg/repos/agent-builder/scripts/templates/cron/ | wc -l` → 3
- `bash -n /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/heartbeat-runner.sh` → exit 0
- `bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/agent-turn.sh` → exit 0
- `bash -n /Users/ktg/repos/agent-builder/scripts/templates/cron/system-event.sh` → exit 0
- All memory templates (README.md excluded) contain the `{{AGENT_NAME}}` placeholder: `grep -l "AGENT_NAME" /Users/ktg/repos/agent-builder/scripts/templates/memory/*.md | wc -l` → 3
Quality Criteria
- 3-tier memory matches OpenClaw pattern: SESSION-STATE (hot), daily logs (warm), MEMORY (cold)
- WAL protocol instructions included in SESSION-STATE template
- Working Buffer protocol with 60% danger zone threshold included
- Compaction recovery order documented
- heartbeat-runner.sh implements emptiness detection (cost saving)
- heartbeat-runner.sh implements startup catchup with stagger
- heartbeat-runner.sh suppresses ack responses under 300 chars
- All shell scripts pass `bash -n` syntax check
- agent-turn.sh uses `--name` (not custom session IDs per verified assumption)
- ADL and VFM scoring rubrics have concrete thresholds and worked examples