docs(plans): create session blueprints for Agent Factory execution

8 session blueprints covering all 27 steps across 3 waves:
- Session 1: Foundation (rename + commands, Steps 1-5)
- Session 2: Skills and templates (Steps 6-7)
- Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12)
- Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18)
- Session 5: Self-learning (feedback/optimization, Steps 20-21)
- Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24)
- Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25)
- Session 8: Finalization (build command integration + v1.0, Steps 8,26,27)

Also updates plan assumptions table with verified findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Kjell Tore Guttormsen, 2026-04-11 11:21:17 +02:00
Commit: 1a776bdeb2
9 changed files with 8885 additions and 8 deletions

# Session 1: Foundation — Rename and Commands
> Steps 1, 2, 3, 4, 5 | Wave 1 | Depends on: none
## Dependencies
Entry condition: none (first session, clean repo)
## Scope Fence
**Touch:**
- `.claude-plugin/plugin.json`
- `CLAUDE.md`
- `README.md`
- `commands/build.md`
- `commands/deploy.md` (new)
- `commands/evaluate.md` (new)
- `commands/status.md` (new)
- `agents/deployment-advisor.md` (new)
- `skills/agent-system-design/SKILL.md` (rename reference only)
**Never touch:**
- `scripts/templates/*` (except reading existing for reference)
- `skills/managed-agents/`
- `agents/builder.md` (content — only read for format reference)
- Any file in `scripts/templates/memory/`, `scripts/templates/heartbeat/`, etc.
---
## Step 1: Rename plugin from agent-builder to agent-factory
### Files to modify
**`.claude-plugin/plugin.json`** — Replace content with:
```json
{
  "name": "agent-factory",
  "description": "Build and manage autonomous agent systems with Claude Code. Guided workflow from idea to deployed, self-running agents. Covers all 22 agent capabilities with patterns from OpenClaw and Paperclip.",
  "version": "0.2.0",
  "author": {
    "name": "Kjell Tore Guttormsen",
    "url": "https://fromaitochitta.com"
  },
  "repository": "https://git.fromaitochitta.com/open/agent-factory",
  "license": "MIT",
  "keywords": [
    "agent",
    "autonomous",
    "pipeline",
    "automation",
    "hooks",
    "security",
    "deployment",
    "memory",
    "heartbeat",
    "governance"
  ]
}
```
**`CLAUDE.md`** — Apply these diffs:
- Line 1: `# Agent Builder Plugin` → `# Agent Factory Plugin`
- Line 3: `Plugin that helps users build complete autonomous agent systems` → `Plugin that guides users through building complete autonomous agent systems`
- Line 8: `The /agent-builder:build command` → `The /agent-factory:build command`
- All other occurrences of `agent-builder` → `agent-factory`
**`README.md`** — Full rewrite for v0.2.0 (will be rewritten again in Step 27 for v1.0):
Replace `# Agent Builder` with `# Agent Factory` and update:
- Install command: `/install agent-factory` or `--plugin-dir ./agent-factory`
- All `/agent-builder:` prefixes → `/agent-factory:`
- Repository URL: `https://git.fromaitochitta.com/open/agent-factory`
- Add recommendation: "For the best experience, install via [ktg-plugin-marketplace](https://git.fromaitochitta.com/open/ktg-plugin-marketplace) which includes Agent Factory plus the ultra-suite (ultraplan, ultraresearch, ultraexecute)."
**`commands/build.md`** — Diff:
- Line 7: `You are running `/agent-builder:build`` → `You are running `/agent-factory:build``
**`skills/agent-system-design/SKILL.md`** — Diff:
- Line 106: `Run `/agent-builder:build`` → `Run `/agent-factory:build``
### Verify
```bash
grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan-spec" | grep -v "ultraplan-2026" | grep -v "blueprints/" | grep -v "execution-guide" | wc -l
```
Expected: `0` (all renamed)
### On failure
revert
### Checkpoint
```bash
git commit -m "feat!: rename plugin from agent-builder to agent-factory"
```
---
## Step 2: Create /agent-factory:deploy command
### Files to create
**`commands/deploy.md`**:
```markdown
---
description: Configure deployment for your agent system. Supports /schedule (cloud), Desktop scheduled tasks, cron/launchd, systemd, and Docker.
argument-hint: "Optional: deployment target (schedule, desktop, local, vps, docker)"
allowed-tools: ["Read", "Write", "Edit", "Glob", "Grep", "Bash", "Agent", "AskUserQuestion"]
---
You are running `/agent-factory:deploy` — a guided deployment configuration for autonomous agent systems.
If $ARGUMENTS specifies a target, skip to that target's section. Otherwise, ask the user to choose.
---
## Step 1: Scan existing system
Read the user's agent system:
- Glob for `.claude/agents/*.md` — list all agents with names and models
- Glob for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md` — list all skills
- Read `.claude/settings.json` if it exists — check hook configuration
- Check for existing automation: `automation/`, `HEARTBEAT.md`, `docker-compose.yml`
- Read `CLAUDE.md` for project context
Summarize what was found before proceeding.
---
## Step 2: Choose deployment target
Ask using AskUserQuestion:
"Where should your agents run? Choose a deployment target:
1. **Cloud (/schedule)** — Runs on Anthropic's cloud. Needs GitHub repo. No local file access. 1-hour minimum interval. Best for: PR reviews, CI triage, repo maintenance.
2. **Desktop scheduled tasks** — Runs on your machine via Claude Code Desktop app. Local file access. 1-minute minimum interval. Best for: local automation, file processing.
3. **Local (cron/launchd)** — Traditional scheduler. Runs headless via `claude -p`. Full local access. Best for: personal daily pipelines, development.
4. **VPS (systemd)** — Linux server with systemd service/timer. Always-on. Best for: team pipelines, production workloads.
5. **Docker** — Containerized agent. Portable, isolated. Best for: reproducible deployments, security isolation."
If `$ARGUMENTS` matches a target name, skip this question.
---
## Step 3: Configure chosen target
### Cloud (/schedule)
1. Check GitHub connection: suggest `/web-setup` if not connected
2. Explain: "Cloud tasks clone your repo on each run. Local files are not accessible. MCP connectors provide external service access."
3. Generate a task prompt from the user's pipeline skills
4. Guide the user through `/schedule` to create the task
5. Note minimum 1-hour interval
### Desktop scheduled tasks
1. Explain: "Desktop tasks run on your machine with full local file access."
2. Guide through the Desktop app's Schedule page
3. Configure permission mode for unattended operation
### Local (cron/launchd)
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh`
2. Copy and customize to `automation/run-pipeline.sh`
3. Replace SKILL_NAME with the user's pipeline name
4. Ask: "What schedule? (e.g., daily at 07:00, every 2 hours)"
5. For macOS: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/launchd.plist`, customize, save to `automation/`
6. For Linux: generate cron entry
7. Provide exact activation commands
### VPS (systemd)
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/systemd-service.unit`
2. Ask: "What Linux user runs the agent?" and "Absolute path to project?"
3. Customize and save service + timer units to `automation/`
4. Copy automation.sh template to `automation/run-pipeline.sh`
5. Provide setup instructions including `systemctl enable/start`
### Docker
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/docker/` templates (created in Session 6)
2. If templates don't exist yet: generate basic Dockerfile + docker-compose.yml inline
3. Customize with project-specific values
4. Save to project root
5. Provide `docker compose up -d` instructions
---
## Step 4: Verify deployment
For each target, provide a verification command:
- Cloud: `/schedule list` → task visible
- Desktop: Check Schedule page in Desktop app
- Local: `crontab -l | grep claude` or `launchctl list | grep claude`
- VPS: `systemctl status claude-agent`
- Docker: `docker compose ps`
Use the `deployment-advisor` agent when the user needs guidance choosing between targets.
---
## Step 5: Next steps
Tell the user:
- "Run `/agent-factory:status` to check your deployment"
- "Run `/agent-factory:evaluate` to assess your system's capability coverage"
```
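For the local target, the "exact activation commands" the command promises could look like the following sketch; `generate_cron_entry` is a hypothetical helper, and the 07:00 schedule and paths are illustrative:

```shell
# generate_cron_entry: hypothetical helper that renders the cron line for
# the local deployment target. Arguments: minute, hour, absolute project path.
generate_cron_entry() {
  local min="$1" hour="$2" project="$3"
  printf '%s %s * * * cd %s && ./automation/run-pipeline.sh >> automation/cron.log 2>&1\n' \
    "$min" "$hour" "$project"
}

# Activation (illustrative): append the entry to the current crontab
#   ( crontab -l 2>/dev/null; generate_cron_entry 0 7 /path/to/project ) | crontab -
# macOS loads the customized plist instead:
#   launchctl load ~/Library/LaunchAgents/com.example.pipeline.plist
```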
### Verify
```bash
head -3 /Users/ktg/repos/agent-builder/commands/deploy.md | grep -c "description:"
```
Expected: `1`
### On failure
revert
### Checkpoint
```bash
git commit -m "feat(commands): add /agent-factory:deploy command"
```
---
## Step 3: Create deployment-advisor agent
### Files to create
**`agents/deployment-advisor.md`**:
```markdown
---
name: deployment-advisor
description: |
Use this agent when the user needs help choosing or configuring a deployment target for their agent system.
<example>
Context: User has built agents and wants to deploy
user: "How should I deploy my agent system?"
assistant: "I'll use the deployment-advisor to analyze your setup and recommend a target."
<commentary>
Deployment guidance request triggers the advisor.
</commentary>
</example>
<example>
Context: User wants to switch deployment targets
user: "Can I move my agents from cron to Docker?"
assistant: "I'll use the deployment-advisor to plan the migration."
<commentary>
Deployment migration request triggers the advisor.
</commentary>
</example>
<example>
Context: User asks about cloud vs local deployment
user: "Should I use /schedule or cron for my pipeline?"
assistant: "I'll use the deployment-advisor to compare the options for your use case."
<commentary>
Deployment comparison request triggers the advisor.
</commentary>
</example>
model: sonnet
color: blue
tools: ["Read", "Glob", "Grep", "Bash", "AskUserQuestion"]
---
## How you work
You analyze the user's agent system and recommend a deployment target based on their requirements.
1. Scan the project: `.claude/agents/*.md`, `.claude/skills/`, `.claude/settings.json`, `CLAUDE.md`, `automation/`, `HEARTBEAT.md`
2. Assess requirements by asking targeted questions:
- Does this need to run when your computer is off?
- Do agents need local filesystem access?
- Is this for personal use or a team?
- Any budget constraints for hosting?
- Do agents need Computer Use (browser interaction)?
3. Read the deployment reference at `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/deployment-targets.md`
4. Apply the decision guide from that reference
5. Recommend ONE primary target with clear reasoning
6. Generate the deployment configuration files for the chosen target
## Rules
- Never overwrite existing deployment config without asking the user first
- Always verify generated shell scripts with `bash -n` before saving
- Always include rollback instructions (how to undo the deployment)
- If the user's needs span multiple targets, recommend the simplest one that covers all requirements
- For Docker: always include `security_opt: [no-new-privileges:true]` in docker-compose.yml
- For /schedule (cloud): warn that local files are not accessible — only GitHub repo content
- Never recommend `--dangerously-skip-permissions` outside a Docker container or sandboxed environment
## Output format
```
DEPLOYMENT RECOMMENDATION
========================
Target: [chosen target]
Reason: [why this fits]
Files to create:
- [file 1]: [description]
- [file 2]: [description]
Activation:
[exact commands to activate the deployment]
Verification:
[exact commands to verify it's running]
Rollback:
[exact commands to undo if needed]
```
```
### Verify
```bash
python3 -c "import yaml; yaml.safe_load(open('/Users/ktg/repos/agent-builder/agents/deployment-advisor.md').read().split('---')[1])" 2>&1 && echo "VALID"
```
Expected: `VALID`
### On failure
retry — fix YAML syntax, then revert if still failing
### Checkpoint
```bash
git commit -m "feat(agents): add deployment-advisor agent"
```
---
## Step 4: Create /agent-factory:evaluate command
### Files to create
**`commands/evaluate.md`**:
```markdown
---
description: Evaluate your agent system against the 22 agent capabilities. Shows coverage, gaps, and recommendations.
argument-hint: "Optional: focus area (security, deployment, memory, autonomy)"
allowed-tools: ["Read", "Glob", "Grep", "Bash"]
---
You are running `/agent-factory:evaluate` — a capability assessment for your agent system.
## Step 1: Scan project components
Scan for all agent system components:
- Agents: Glob for `.claude/agents/*.md`
- Pipeline skills: Glob for `.claude/skills/*/SKILL.md`
- Knowledge skills: Glob for `.claude/skills/*.md`
- Hooks: Glob for `.claude/hooks/*.sh` and `hooks/*.sh`
- Settings: Read `.claude/settings.json` if it exists
- Context: Read `CLAUDE.md` if it exists
- Automation: Glob for `automation/*`, `scripts/*.sh`
- Memory: look for `memory/MEMORY.md`, `memory/SESSION-STATE.md`, `data/run-state.json`
- Heartbeat: look for `HEARTBEAT.md`
- Goals: look for `GOALS.md`
- Governance: look for `GOVERNANCE.md`
- Org chart: look for `ORG-CHART.md`
- Budget: look for `budget/BUDGET.md`, `budget/cost-events.jsonl`
- Docker: look for `Dockerfile`, `docker-compose.yml`
## Step 2: Score against 22 capabilities
Read the feature map at `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/feature-map.md`.
For each of the 22 capabilities, check whether the user's project has the corresponding component. Score as:
- **OK** — component exists and is properly configured
- **Partial** — component exists but is incomplete or misconfigured
- **Missing** — component does not exist
## Step 3: Output capability matrix
```
AGENT SYSTEM EVALUATION
=======================
| # | Capability | Status | What exists | What's needed |
|---|-----------|--------|-------------|---------------|
| 1 | Agent Runtime | OK | CLAUDE.md + settings.json | — |
| 2 | Shell Execution | Missing | — | hooks/pre-tool-use.sh + deny list |
...
Score: X/22 OK | Y/22 Partial | Z/22 Missing
```
## Step 4: Recommendations
Provide specific recommendations for filling gaps, ordered by impact:
1. Security gaps first (hooks, permissions)
2. Core functionality gaps (missing agents, skills)
3. Operational gaps (memory, automation, deployment)
4. Advanced gaps (governance, budget, self-learning)
If $ARGUMENTS specifies a focus area, expand that section with detailed guidance and link to relevant templates from `${CLAUDE_PLUGIN_ROOT}/scripts/templates/`.
## Step 5: Next steps
Suggest: "Run `/agent-factory:build` to fill gaps interactively, or `/agent-factory:deploy` to configure deployment."
```
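The existence checks behind the OK/Missing column reduce to a small path test per component. A sketch; `check_component` is a hypothetical helper, and Partial detection (which needs per-component inspection) is omitted:

```shell
# check_component: hypothetical helper for the capability scan.
# Reports OK if any of the given paths exists, Missing otherwise.
check_component() {
  local label="$1"; shift
  local path
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "$label: OK"
      return
    fi
  done
  echo "$label: Missing"
}

check_component "Memory"    "memory/MEMORY.md"
check_component "Heartbeat" "HEARTBEAT.md"
check_component "Goals"     "GOALS.md"
check_component "Docker"    "Dockerfile" "docker-compose.yml"
```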
### Verify
```bash
head -3 /Users/ktg/repos/agent-builder/commands/evaluate.md | grep -c "description:"
```
Expected: `1`
### On failure
revert
### Checkpoint
```bash
git commit -m "feat(commands): add /agent-factory:evaluate command"
```
---
## Step 5: Create /agent-factory:status command
### Files to create
**`commands/status.md`**:
```markdown
---
description: Quick status check of your agent infrastructure. Shows agents, skills, hooks, deployment, and recent activity.
argument-hint: ""
allowed-tools: ["Read", "Glob", "Grep", "Bash"]
---
You are running `/agent-factory:status` — a quick health check of your agent infrastructure.
## Scan and report
Run all scans, then output a single compact status report.
### Agents
Glob for `.claude/agents/*.md`. For each agent file:
- Read the frontmatter to extract: name, model, tools count
- List as: `[name] (model: [model], tools: [N])`
- Flag: agents without `<example>` blocks in description (unreliable triggering)
### Skills
Glob for `.claude/skills/*/SKILL.md` and `.claude/skills/*.md`. For each:
- Read frontmatter to extract: name, version (if present)
- List as: `[name] v[version]`
- Flag: skills without version field
### Hooks
Glob for `.claude/hooks/*.sh` and `hooks/*.sh`. For each:
- Check if executable: `test -x [path]`
- List as: `[filename] ([executable/not executable])`
- Flag: hook scripts that are not executable
### Settings
Read `.claude/settings.json` if it exists:
- Count permission allow rules and deny rules
- Count hook configurations
- Summarize: `[N] allow rules, [M] deny rules, [K] hooks configured`
### Deployment
Check for:
- `automation/*.sh` → "Local automation scripts found"
- `automation/*.plist` → "launchd config found"
- `automation/*.service` → "systemd config found"
- `docker-compose.yml` or `Dockerfile` → "Docker config found"
- `HEARTBEAT.md` → "Heartbeat file found"
### Memory
Check for:
- `memory/MEMORY.md` → "Long-term memory: found"
- `memory/SESSION-STATE.md` → "Session state: found"
- `memory/*.md` (daily logs) → "Daily logs: [N] files"
- `data/run-state.json` → "Run state: found"
### Recent activity
If `.claude/hooks/audit.log` or `hooks/audit.log` exists:
- Read last 5 lines
- Display as recent tool call log
### Issues
Compile all flags from above into an issues list:
- "WARNING: [agent] has no example blocks — may not trigger reliably"
- "WARNING: [hook] is not executable — run chmod +x"
- "WARNING: No hooks configured — unattended runs have no guardrails"
## Output format
```
AGENT FACTORY STATUS
====================
Agents: [N] configured
Skills: [N] loaded
Hooks: [N] configured ([M] executable)
Settings: [allow/deny/hooks counts]
Deployment: [target or "not configured"]
Memory: [tiers found]
Issues: [N]
[list if any]
```
```
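The hook scan above can be sketched as a small helper; `scan_hooks` is hypothetical, and the warning text mirrors the Issues section:

```shell
# scan_hooks: hypothetical sketch of the status command's hook scan.
# Lists each hook script with its executable bit; warns on stderr otherwise.
scan_hooks() {
  local dir="$1" found=0
  for h in "$dir"/*.sh; do
    [ -e "$h" ] || continue
    found=1
    if [ -x "$h" ]; then
      echo "$(basename "$h") (executable)"
    else
      echo "$(basename "$h") (not executable)"
      echo "WARNING: $(basename "$h") is not executable; run chmod +x" >&2
    fi
  done
  [ "$found" -eq 1 ] || echo "no hooks found"
}
```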
### Verify
```bash
head -3 /Users/ktg/repos/agent-builder/commands/status.md | grep -c "description:"
```
Expected: `1`
### On failure
revert
### Checkpoint
```bash
git commit -m "feat(commands): add /agent-factory:status command"
```
---
## Exit Condition
- [ ] `grep -r "agent-builder" /Users/ktg/repos/agent-builder/ --include="*.md" --include="*.json" | grep -v ".git/" | grep -v "research/" | grep -v "ultraplan" | grep -v "blueprints/" | grep -v "execution-guide" | wc -l` → 0
- [ ] `ls /Users/ktg/repos/agent-builder/commands/ | sort` → `build.md deploy.md evaluate.md status.md`
- [ ] `ls /Users/ktg/repos/agent-builder/agents/ | sort` → `builder.md deployment-advisor.md`
- [ ] `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['name']=='agent-factory'; assert d['version']=='0.2.0'"` → no error
- [ ] All agent files have valid YAML frontmatter: `find /Users/ktg/repos/agent-builder/agents -name "*.md" -exec python3 -c "import yaml,sys; yaml.safe_load(open(sys.argv[1]).read().split('---')[1])" {} \;` → no errors
## Quality Criteria
- All 4 command files have valid YAML frontmatter with `description:` field
- deployment-advisor.md has 3 `<example>` blocks in description
- deploy.md covers all 5 deployment targets with specific instructions
- evaluate.md references the 22-capability feature map
- status.md scans all 7 component types (agents, skills, hooks, settings, deployment, memory, activity)
- No `agent-builder` references remain in any touched file
- plugin.json version is `0.2.0` and name is `agent-factory`

File diff suppressed because it is too large

File diff suppressed because it is too large

# Session 4: Paperclip Orchestration Patterns
> Steps 14, 15, 16, 17, 18 | Wave 1 | Depends on: none
## Dependencies
Entry condition: none (creates new template directories only, plus 2 new files in heartbeat/)
## Scope Fence
**Touch:**
- `scripts/templates/heartbeat/context-packet.md` (new)
- `scripts/templates/heartbeat/wake-prompt.md` (new)
- `scripts/templates/goals/GOALS.md` (new)
- `scripts/templates/goals/goal-tracker.sh` (new)
- `scripts/templates/goals/README.md` (new)
- `scripts/templates/budget/budget-hook.sh` (new)
- `scripts/templates/budget/BUDGET.md` (new)
- `scripts/templates/budget/budget-report.sh` (new)
- `scripts/templates/budget/README.md` (new)
- `scripts/templates/governance/GOVERNANCE.md` (new)
- `scripts/templates/governance/approval-gate.sh` (new)
- `scripts/templates/governance/README.md` (new)
- `scripts/templates/org-chart/ORG-CHART.md` (new)
- `scripts/templates/org-chart/org-manager.sh` (new)
- `scripts/templates/org-chart/README.md` (new)
**Never touch:**
- `commands/`
- `agents/`
- `skills/`
- `scripts/templates/heartbeat/HEARTBEAT.md` (Session 3)
- `scripts/templates/heartbeat/heartbeat-runner.sh` (Session 3)
- `scripts/templates/heartbeat/README.md` (Session 3)
- `scripts/templates/memory/`
- `scripts/templates/proactive/`
- `scripts/templates/cron/`
- `.claude-plugin/`, `CLAUDE.md`, `README.md`
---
## Step 14: Create heartbeat context injection templates
### Files to create
**`scripts/templates/heartbeat/context-packet.md`** — Paperclip's "Memento Man" pattern:
```markdown
# Context Packet: {{AGENT_NAME}}
Generated: {{TIMESTAMP}}
## Identity
{{AGENT_IDENTITY}}
## Current Goals
{{ACTIVE_GOALS}}
## Memory State
{{MEMORY_SUMMARY}}
## Task Queue
{{PENDING_TASKS}}
## Recent Events (last 24h)
{{RECENT_EVENTS}}
## Wake Reason
{{WAKE_REASON}}
## Budget Status
Spent: {{BUDGET_SPENT}} / {{BUDGET_LIMIT}} ({{BUDGET_PERCENT}}%)
{{BUDGET_WARNING}}
```
**`scripts/templates/heartbeat/wake-prompt.md`** — Prompt template for each heartbeat wakeup:
```markdown
You are {{AGENT_NAME}}. You are waking up for a scheduled heartbeat.
Read the context packet below carefully. It contains everything you
need to know about your current state and pending work.
{{CONTEXT_PACKET}}
Your task for this beat:
{{WAKE_REASON}}
Rules:
- Do NOT infer tasks from prior conversations
- Only work on what is in the context packet and wake reason
- If nothing needs attention, respond with HEARTBEAT_OK
- Update SESSION-STATE.md before finishing
```
### Verify
```bash
ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l
```
Expected: `5` (3 from Session 3 + 2 from this step)
### On failure: revert
### Checkpoint
```bash
git commit -m "feat(templates): add context injection templates (Paperclip heartbeat pattern)"
```
---
## Step 15: Create goal hierarchy templates
### Files to create
**`scripts/templates/goals/GOALS.md`**:
```markdown
# Goals: {{PROJECT_NAME}}
## Company Goals
- [G1] {{COMPANY_GOAL_1}}
- [G2] {{COMPANY_GOAL_2}}
## Project Goals
- [G1.1] {{PROJECT_GOAL_1}} (parent: G1)
- [G1.2] {{PROJECT_GOAL_2}} (parent: G1)
- [G2.1] {{PROJECT_GOAL_3}} (parent: G2)
## Task Goals
- [G1.1.1] {{TASK_GOAL_1}} (parent: G1.1, owner: {{AGENT_NAME}}, status: active)
- [G1.1.2] {{TASK_GOAL_2}} (parent: G1.1, owner: {{AGENT_NAME}}, status: pending)
## Notes
Goal IDs use hierarchical dot notation. Each goal has:
- ID: unique identifier (e.g., G1.1.1)
- Description: what the goal is
- Parent: which goal this supports (simple parent reference, not recursive)
- Owner: which agent is responsible (task goals only)
- Status: active | pending | complete | blocked
```
**`scripts/templates/goals/goal-tracker.sh`**:
```bash
#!/bin/bash
# Goal tracker: parse and manage GOALS.md
# Bash 3.2 compatible. Uses python3 for parsing.
#
# Usage:
# ./goal-tracker.sh # Show goal summary
# ./goal-tracker.sh complete G1.1.1 # Mark goal as complete
# ./goal-tracker.sh status # Show status counts
# ./goal-tracker.sh context # Generate context for heartbeat injection
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
GOALS_FILE="$WORKING_DIR/GOALS.md"
ACTION="${1:-summary}"
GOAL_ID="${2:-}"
if [ ! -f "$GOALS_FILE" ]; then
echo "Error: $GOALS_FILE not found"
exit 1
fi
case "$ACTION" in
summary|status)
python3 << PYEOF
import re
goals = []
with open("$GOALS_FILE") as f:
for line in f:
m = re.match(r'-\s*\[(\S+)\]\s+(.+)', line.strip())
if m:
gid = m.group(1)
rest = m.group(2)
status_m = re.search(r'status:\s*(\w+)', rest)
parent_m = re.search(r'parent:\s*(\S+)', rest)
owner_m = re.search(r'owner:\s*(\S+)', rest)
status = status_m.group(1) if status_m else 'active'
parent = parent_m.group(1).rstrip(',)') if parent_m else None
owner = owner_m.group(1).rstrip(',)') if owner_m else None
desc = re.sub(r'\(.*\)', '', rest).strip()
goals.append({'id': gid, 'desc': desc, 'status': status, 'parent': parent, 'owner': owner})
# Status counts
counts = {}
for g in goals:
counts[g['status']] = counts.get(g['status'], 0) + 1
print("Goal Status Summary")
print("=" * 40)
for status, count in sorted(counts.items()):
print(f" {status}: {count}")
print(f" Total: {len(goals)}")
# Check for orphans
all_ids = set(g['id'] for g in goals)
orphans = [g for g in goals if g['parent'] and g['parent'] not in all_ids]
if orphans:
print(f"\nOrphaned goals (parent not found):")
for g in orphans:
print(f" [{g['id']}] parent: {g['parent']}")
# Goals without owners at task level
unowned = [g for g in goals if '.' in g['id'] and g['id'].count('.') >= 2 and not g['owner']]
if unowned:
print(f"\nTask goals without owners:")
for g in unowned:
print(f" [{g['id']}] {g['desc']}")
PYEOF
;;
complete)
if [ -z "$GOAL_ID" ]; then
echo "Usage: $0 complete <goal-id>"
exit 1
fi
python3 -c "
import re
goal_id = '$GOAL_ID'
with open('$GOALS_FILE') as f:
content = f.read()
# Replace status for the specific goal
pattern = r'(\[' + re.escape(goal_id) + r'\][^)]*status:\s*)\w+'
if re.search(pattern, content):
content = re.sub(pattern, r'\1complete', content)
with open('$GOALS_FILE', 'w') as f:
f.write(content)
print(f'Goal {goal_id} marked as complete')
else:
print(f'Goal {goal_id} not found or has no status field')
"
;;
context)
# Generate a goal summary for heartbeat context injection
python3 << PYEOF
import re
goals = []
with open("$GOALS_FILE") as f:
for line in f:
m = re.match(r'-\s*\[(\S+)\]\s+(.+)', line.strip())
if m:
gid = m.group(1)
rest = m.group(2)
status_m = re.search(r'status:\s*(\w+)', rest)
status = status_m.group(1) if status_m else 'active'
desc = re.sub(r'\(.*\)', '', rest).strip()
goals.append({'id': gid, 'desc': desc, 'status': status})
active = [g for g in goals if g['status'] == 'active']
if active:
print("Active goals:")
for g in active:
print(f" [{g['id']}] {g['desc']}")
else:
print("No active goals.")
PYEOF
;;
*)
echo "Usage: $0 [summary|complete <id>|status|context]"
exit 1
;;
esac
```
**`scripts/templates/goals/README.md`**:
```markdown
# Goal Hierarchy
File-based goal hierarchy inspired by Paperclip's goal system.
## Design decisions
- **Simple parent_id, not recursive**: Each goal references its parent by ID.
No recursive traversal at runtime — matching Paperclip's actual implementation
(not their aspirational docs which describe "full ancestry").
- **Dot notation for hierarchy**: G1 → G1.1 → G1.1.1 makes the hierarchy visible
in the ID itself.
- **File-based, not database**: Human-editable, version-controlled, no dependencies.
## Goal levels
| Level | ID pattern | Example | Description |
|-------|-----------|---------|-------------|
| Company | G1, G2 | G1 | Top-level organizational goals |
| Project | G1.1, G1.2 | G1.1 | Goals that support a company goal |
| Task | G1.1.1 | G1.1.1 | Specific tasks assigned to agents |
## Usage
```bash
# View goal summary
./goal-tracker.sh
# Mark a task as complete
./goal-tracker.sh complete G1.1.1
# Generate context for heartbeat injection
./goal-tracker.sh context
```
## Integration with heartbeat
The `context` command produces a summary suitable for injection into
the heartbeat context packet (see `scripts/templates/heartbeat/context-packet.md`).
The heartbeat runner can call `./goal-tracker.sh context` and inject
the output as `{{ACTIVE_GOALS}}`.
```
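One way the runner could perform that injection is sketched below; `inject_goals` is a hypothetical helper that replaces the placeholder line in a context packet with the (possibly multi-line) tracker output:

```shell
# inject_goals: hypothetical sketch of the heartbeat injection described above.
# Replaces the line containing only {{ACTIVE_GOALS}} with the tracker output.
# (awk -v interprets backslash escapes; goal text containing backslashes
# would need escaping first.)
inject_goals() {
  local packet="$1" goals="$2"
  awk -v goals="$goals" '{ if ($0 == "{{ACTIVE_GOALS}}") print goals; else print }' "$packet"
}

# Usage (illustrative):
# inject_goals context-packet.md "$(./goal-tracker.sh context)"
```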
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/goals/goal-tracker.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git commit -m "feat(templates): add goal hierarchy templates (Paperclip pattern)"
```
---
## Step 16: Create budget tracking templates
### Files to create
**`scripts/templates/budget/BUDGET.md`**:
```markdown
# Budget Policy: {{PROJECT_NAME}}
## Company Budget
- window: {{BUDGET_WINDOW}}
- limit: {{BUDGET_LIMIT_CENTS}} cents
- warn_percent: 80
- hard_stop: true
## Agent Budgets
- {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}}
## Notification
- on_warn: log
- on_hard_stop: pause
## Notes
Budget enforcement is POST-HOC (checked after each run, not before).
This matches Paperclip's proven approach: check SUM(cost) after run,
pause if exceeded. No pre-run reservation needed.
Cost estimation uses token counts × published pricing. For accurate
cost data, organizations can use the Admin API:
`/v1/organizations/cost_report` (requires Admin API key: sk-ant-admin...).
For headless runs, use `claude -p --max-budget-usd N` as a per-run cap.
```
**`scripts/templates/budget/budget-hook.sh`**:
```bash
#!/bin/bash
# PostToolUse hook: Log cost events and enforce budget.
# Bash 3.2 compatible. Uses python3 for JSON parsing.
#
# Follows Paperclip's post-hoc enforcement pattern:
# 1. Log cost event after each tool call
# 2. Check cumulative cost against budget policy
# 3. Warn at soft threshold, pause at hard threshold
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
BUDGET_DIR="$WORKING_DIR/budget"
COST_LOG="$BUDGET_DIR/cost-events.jsonl"
BUDGET_FILE="$WORKING_DIR/BUDGET.md"
PAUSED_FLAG="$BUDGET_DIR/PAUSED"
mkdir -p "$BUDGET_DIR"
# Read hook input
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_name',''))" 2>/dev/null)
# Log cost event
python3 << PYEOF
import json, sys, time, os
try:
data = json.loads('''$INPUT''')
except:
sys.exit(0)
tool_name = data.get('tool_name', '')
# Estimate cost from token counts if available in tool result
# This is a rough estimate; actual costs come from the Admin API
event = {
'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
'tool_name': tool_name,
'agent': os.environ.get('AGENT_NAME', 'unknown'),
'estimated_tokens': 0
}
cost_log = "$COST_LOG"
with open(cost_log, 'a') as f:
f.write(json.dumps(event) + '\n')
PYEOF
# Check budget if BUDGET.md exists
if [ -f "$BUDGET_FILE" ] && [ -f "$COST_LOG" ]; then
BUDGET_CHECK=$(python3 << 'PYEOF'
import re, json, os
budget_file = os.environ.get('BUDGET_FILE', '')
cost_log = os.environ.get('COST_LOG', '')
paused_flag = os.environ.get('PAUSED_FLAG', '')
if not budget_file or not cost_log:
print("ok")
exit(0)
# Parse budget
try:
content = open(budget_file).read()
limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
warn_m = re.search(r'warn_percent:\s*(\d+)', content)
hard_m = re.search(r'hard_stop:\s*(\w+)', content)
if not limit_m:
print("ok")
exit(0)
limit = int(limit_m.group(1))
warn_pct = int(warn_m.group(1)) if warn_m else 80
hard_stop = hard_m.group(1).lower() == 'true' if hard_m else True
except:
print("ok")
exit(0)
# Sum cost events (count events as rough proxy — actual cost tracking
# requires Admin API or token counting from responses)
try:
event_count = sum(1 for _ in open(cost_log))
except:
event_count = 0
# Rough estimate: each event ~ 1 cent (placeholder — customize per model)
estimated_cents = event_count
pct = (estimated_cents / limit * 100) if limit > 0 else 0
if pct >= 100 and hard_stop:
open(paused_flag, 'w').write(f'Budget exceeded: {estimated_cents}/{limit} cents')
print("hard_stop")
elif pct >= warn_pct:
print("warn")
else:
print("ok")
PYEOF
)
BUDGET_FILE="$BUDGET_FILE" COST_LOG="$COST_LOG" PAUSED_FLAG="$PAUSED_FLAG" \
BUDGET_RESULT=$(BUDGET_FILE="$BUDGET_FILE" COST_LOG="$COST_LOG" PAUSED_FLAG="$PAUSED_FLAG" python3 -c "
import re, json, os
budget_file = '$BUDGET_FILE'
cost_log = '$COST_LOG'
paused_flag = '$PAUSED_FLAG'
try:
content = open(budget_file).read()
limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
if not limit_m: print('ok'); exit(0)
limit = int(limit_m.group(1))
warn_m = re.search(r'warn_percent:\s*(\d+)', content)
warn_pct = int(warn_m.group(1)) if warn_m else 80
hard_m = re.search(r'hard_stop:\s*(\w+)', content)
hard_stop = hard_m.group(1).lower() == 'true' if hard_m else True
event_count = sum(1 for _ in open(cost_log))
estimated_cents = event_count
pct = (estimated_cents / limit * 100) if limit > 0 else 0
if pct >= 100 and hard_stop:
open(paused_flag, 'w').write(f'Budget exceeded: {estimated_cents}/{limit} cents')
print('hard_stop')
elif pct >= warn_pct:
print('warn')
else:
print('ok')
except Exception as e:
print('ok')
" 2>/dev/null)
if [ "$BUDGET_RESULT" = "hard_stop" ]; then
echo "BUDGET EXCEEDED — agent paused. Check $PAUSED_FLAG" >&2
elif [ "$BUDGET_RESULT" = "warn" ]; then
echo "BUDGET WARNING — approaching limit" >&2
fi
fi
# Check if agent is paused
if [ -f "$PAUSED_FLAG" ]; then
echo '{"decision": "block", "reason": "Agent paused: budget exceeded. Remove '"$PAUSED_FLAG"' to resume."}'
exit 2
fi
exit 0
```
**`scripts/templates/budget/budget-report.sh`**:
```bash
#!/bin/bash
# Budget report: summarize cost events and compare against policy.
# Bash 3.2 compatible. Uses python3 for aggregation.
#
# Usage: ./budget-report.sh
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
BUDGET_FILE="$WORKING_DIR/BUDGET.md"
PAUSED_FLAG="$WORKING_DIR/budget/PAUSED"
if [ ! -f "$COST_LOG" ]; then
echo "No cost events recorded yet."
exit 0
fi
python3 << PYEOF
import json, re, os
from collections import defaultdict
cost_log = "$COST_LOG"
budget_file = "$BUDGET_FILE"
paused_flag = "$PAUSED_FLAG"
# Read events
events = []
with open(cost_log) as f:
for line in f:
line = line.strip()
if line:
try:
events.append(json.loads(line))
except:
pass
# Aggregate
by_agent = defaultdict(int)
by_day = defaultdict(int)
by_tool = defaultdict(int)
for e in events:
agent = e.get('agent', 'unknown')
day = e.get('timestamp', '')[:10]
tool = e.get('tool_name', 'unknown')
by_agent[agent] += 1
by_day[day] += 1
by_tool[tool] += 1
print("BUDGET REPORT")
print("=" * 50)
print(f"Total events: {len(events)}")
print()
# Per-agent breakdown
print("By Agent:")
for agent, count in sorted(by_agent.items(), key=lambda x: -x[1]):
print(f" {agent}: {count} events")
print()
# Per-day breakdown (last 7 days)
print("By Day (last 7):")
for day, count in sorted(by_day.items())[-7:]:
print(f" {day}: {count} events")
print()
# Budget comparison
if os.path.exists(budget_file):
content = open(budget_file).read()
limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
if limit_m:
limit = int(limit_m.group(1))
est_cents = len(events) # rough proxy
pct = (est_cents / limit * 100) if limit > 0 else 0
print(f"Budget: ~{est_cents}/{limit} cents ({pct:.0f}%)")
# Paused status
if os.path.exists(paused_flag):
print(f"\n!! AGENT PAUSED: {open(paused_flag).read().strip()}")
print(f" Remove {paused_flag} to resume")
PYEOF
```
**`scripts/templates/budget/README.md`**:
```markdown
# Budget Tracking
Post-hoc budget enforcement inspired by Paperclip's budget system.
## How it works
1. `budget-hook.sh` runs as a PostToolUse hook after every tool call
2. Each call is logged to `budget/cost-events.jsonl`
3. After logging, cumulative cost is compared against `BUDGET.md` policy
4. If soft threshold (default 80%) exceeded: warning to stderr
5. If hard threshold (100%) exceeded and hard_stop=true: creates `budget/PAUSED`
flag file, subsequent tool calls are blocked (exit 2)
## Why post-hoc, not pre-run?
Paperclip uses the same approach. Pre-run budget reservation requires a
persistent service or lock file coordination. Post-hoc checking is simpler
and robust enough in practice — the worst case is one extra run before pause.
## Cost estimation
The current implementation counts events as a rough proxy for cost. For
accurate cost tracking, you have two options:
1. **Admin API** (org accounts only): Query `/v1/organizations/cost_report`
with an Admin API key (`sk-ant-admin...`). This gives actual USD costs.
2. **Token estimation**: Parse token counts from Claude's responses and
multiply by published per-token prices. More accurate than event counting
but still an estimate.
For headless runs, `claude -p --max-budget-usd N` provides a per-run
budget cap directly in the CLI.
## Integration
Add to `.claude/settings.json`:
```json
{
"hooks": {
"PostToolUse": [{
"matcher": "*",
"hooks": [{"type": "command", "command": "bash budget/budget-hook.sh"}]
}]
}
}
```
```
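The token-estimation option (option 2 in the README above) can be sketched in a few lines. The prices below are placeholder assumptions, not real rates; substitute the published per-token prices for your model.

```python
# Rough cost estimate from token counts. Prices are placeholders expressed
# in cents per million tokens -- replace with your model's published rates.
PRICE_CENTS_PER_MTOK = {"input": 300, "output": 1500}

def estimate_cents(input_tokens, output_tokens):
    """Estimate run cost in cents from token counts."""
    return (input_tokens * PRICE_CENTS_PER_MTOK["input"]
            + output_tokens * PRICE_CENTS_PER_MTOK["output"]) / 1_000_000

# Example: a run with 20k input tokens and 2k output tokens
print(round(estimate_cents(20_000, 2_000), 2))  # -> 9.0
```

This is still an estimate; only the Admin API reports actual billed cost.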
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-hook.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/budget/budget-report.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git commit -m "feat(templates): add budget tracking templates (Paperclip pattern)"
```
---
## Step 17: Create governance and approval gate templates
### Files to create
**`scripts/templates/governance/GOVERNANCE.md`** — See plan Step 17 for full content. Key sections:
- Autonomy Levels (0-4 scale from full manual to full autonomy)
- Approval Gates with `{{GATE_NAME}}` and `{{GATE_CONDITION}}` placeholders
- Escalation Rules (budget exceeded, error threshold, unknown tool, scope violation)
- Audit Requirements (tool calls, budget events, approvals, retention)
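As a rough illustration of how a hook could act on the 0-4 scale (the level semantics here are assumptions based on the descriptions above, not the template's exact rules):

```python
# Hypothetical mapping from autonomy level (0-4) to gate behavior.
# Level 0: full manual, every operation needs approval.
# Level 4: full autonomy, nothing is gated.
# Levels 1-3: only operations matching an approval gate are held.
def gate_decision(level, operation_gated):
    """Decide how a PreToolUse hook treats an operation at a given level."""
    if level <= 0:
        return "require_approval"
    if level >= 4:
        return "auto_approve"
    return "require_approval" if operation_gated else "auto_approve"
```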
**`scripts/templates/governance/approval-gate.sh`** — PreToolUse hook implementing approval gates. Bash 3.2 compatible. Key behavior:
1. Reads GOVERNANCE.md for current autonomy level
2. Based on level, auto-approves or requires approval
3. For gated operations: writes request to `governance/pending-approvals.jsonl`
4. Checks `governance/approval-responses.jsonl` for matching response
5. No response within timeout → block (exit 2)
**`scripts/templates/governance/README.md`** — Explains the governance model, autonomy levels, Paperclip's "autonomy is a privilege" philosophy.
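The request/response matching in items 3-5 above can be sketched as follows. The JSONL field names (`request_id`, `approved`) are assumptions for illustration, not a fixed schema:

```python
import json

def check_approval(responses_path, request_id):
    """Return 'approved', 'denied', or 'pending' for one gated request.

    The hook appends requests to governance/pending-approvals.jsonl; a human
    (or supervisor agent) appends matching rows to approval-responses.jsonl.
    No matching response within the timeout means the hook blocks (exit 2).
    """
    try:
        with open(responses_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                resp = json.loads(line)
                if resp.get("request_id") == request_id:
                    return "approved" if resp.get("approved") else "denied"
    except FileNotFoundError:
        pass
    return "pending"
```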
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/governance/approval-gate.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git commit -m "feat(templates): add governance and approval gate templates (Paperclip pattern)"
```
---
## Step 18: Create org-chart template
### Files to create
**`scripts/templates/org-chart/ORG-CHART.md`** — Markdown table with columns: Agent, Role, Reports To, Status, Budget. Uses `reportsTo` pattern from Paperclip. Includes delegation rules and human override section.
**`scripts/templates/org-chart/org-manager.sh`** — Bash 3.2 script that:
- Parses ORG-CHART.md table with python3
- Validates: agents exist in `.claude/agents/`, no circular chains
- Can add/remove agents from the chart
- Generates text-based org tree visualization
**`scripts/templates/org-chart/README.md`** — Explains the simple `reportsTo` pattern, delegation flows, cross-team routing.
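The no-circular-chains validation can be sketched in a few lines, assuming the markdown table has already been parsed into an agent-to-manager mapping (the dict shape is an illustration, not the script's exact data model):

```python
def find_cycle(reports_to):
    """Return an agent on a circular reporting chain, or None for a valid tree.

    reports_to maps agent -> manager; top-level agents map to None.
    Walks each chain upward and flags any agent visited twice.
    """
    for start in reports_to:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return node
            seen.add(node)
            node = reports_to.get(node)
    return None
```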
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/org-chart/org-manager.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git commit -m "feat(templates): add org-chart template (Paperclip pattern)"
```
---
## Exit Condition
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/heartbeat/ | wc -l` → 5
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/goals/ | wc -l` → 3
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/budget/ | wc -l` → 4
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/governance/ | wc -l` → 3
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/org-chart/ | wc -l` → 3
- [ ] All shell scripts pass `bash -n`: `find /Users/ktg/repos/agent-builder/scripts/templates/goals /Users/ktg/repos/agent-builder/scripts/templates/budget /Users/ktg/repos/agent-builder/scripts/templates/governance /Users/ktg/repos/agent-builder/scripts/templates/org-chart -name "*.sh" -exec bash -n {} \;` → no errors
- [ ] context-packet.md contains `{{WAKE_REASON}}` placeholder
- [ ] budget-hook.sh contains reference to PAUSED flag file
## Quality Criteria
- Context packet follows Paperclip's "Memento Man" pattern with all sections
- Wake prompt includes rules about not inferring from prior conversations
- Goal hierarchy uses simple parent_id (not recursive) matching Paperclip's actual code
- Budget enforcement is post-hoc matching Paperclip's pattern
- Budget README honestly documents Admin API limitation (org-only, admin key)
- Governance has 5 autonomy levels (0-4) with clear descriptions
- Org chart uses simple reportsTo pattern with human override authority
- All bash scripts are 3.2 compatible

---
# Session 5: Self-Learning Systems
> Steps 20, 21 | Wave 1 | Depends on: none
## Dependencies
Entry condition: none (independent — creates new template directories only)
## Scope Fence
**Touch:**
- `scripts/templates/feedback/FEEDBACK.md` (new)
- `scripts/templates/feedback/feedback-collector.sh` (new)
- `scripts/templates/feedback/performance-scorer.sh` (new)
- `scripts/templates/feedback/README.md` (new)
- `scripts/templates/optimization/pipeline-optimizer.sh` (new)
- `scripts/templates/optimization/self-healing.sh` (new)
- `scripts/templates/optimization/README.md` (new)
**Never touch:**
- `commands/`
- `agents/`
- `skills/`
- `scripts/templates/heartbeat/`
- `scripts/templates/memory/`
- `scripts/templates/proactive/`
- `scripts/templates/cron/`
- `scripts/templates/goals/`
- `scripts/templates/budget/`
- `scripts/templates/governance/`
- `scripts/templates/org-chart/`
- `.claude-plugin/`, `CLAUDE.md`, `README.md`
---
## Step 20: Create feedback loop templates
### Files to create
**`scripts/templates/feedback/FEEDBACK.md`** — Feedback tracking file:
```markdown
# Feedback Log: {{PROJECT_NAME}}
> Append-only. One row per pipeline run. Reviewed by performance-scorer.sh.
## Feedback Table
| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
|------|----------|-------|-------|-------|------------|---------|
| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |
## Pattern Tags
Use consistent tags so performance-scorer.sh can detect recurring issues:
- `quality-low` — output below acceptance threshold
- `loop-excess` — more revision iterations than expected
- `timeout` — agent exceeded time budget
- `tool-fail` — tool call failed or returned unexpected result
- `cost-spike` — single run cost exceeded 3x average
- `scope-drift` — agent worked outside defined scope
- `hallucination` — output contained factual errors
## Notes
Scores are 0-100, as assigned by the reviewer agent or a human reviewer.
A score below 60 triggers a flag in performance-scorer.sh.
Three or more rows with the same Pattern tag = recurring issue.
Recurring issues should drive prompt iteration or pipeline redesign.
```
**`scripts/templates/feedback/feedback-collector.sh`** — PostToolUse hook variant that appends feedback after pipeline completion:
```bash
#!/bin/bash
# PostToolUse hook: Collect feedback after pipeline completion.
# Bash 3.2 compatible. Uses python3 for JSON parsing and CSV/MD append.
#
# Triggered after a designated "review" tool call completes.
# Reads pipeline output and reviewer score, appends to FEEDBACK.md,
# and detects recurring patterns (3+ rows with same tag = recurring).
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
# {{PIPELINE_NAME}} - name of the pipeline being tracked
# {{SCORE_THRESHOLD}} - minimum acceptable score (default: 60)
WORKING_DIR="{{WORKING_DIR}}"
PIPELINE_NAME="{{PIPELINE_NAME}}"
SCORE_THRESHOLD="${SCORE_THRESHOLD:-60}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
HOOK_INPUT=$(cat)
# Only act on review tool calls
TOOL_NAME=$(echo "$HOOK_INPUT" | python3 -c "
import sys, json
try:
data = json.load(sys.stdin)
print(data.get('tool_name', ''))
except:
print('')
" 2>/dev/null)
if [ "$TOOL_NAME" != "review_pipeline" ] && [ "$TOOL_NAME" != "score_output" ]; then
exit 0
fi
# Extract score, agent, issue, resolution, pattern from hook input
export HOOK_INPUT
python3 << PYEOF
import sys, json, re, os
from datetime import datetime
# Read the hook input from the environment rather than interpolating it into
# the heredoc, which would break on quotes or backslashes in the JSON payload.
hook_input = os.environ.get('HOOK_INPUT', '')
feedback_file = "$FEEDBACK_FILE"
pipeline_name = "$PIPELINE_NAME"
score_threshold = int("$SCORE_THRESHOLD")
try:
data = json.loads(hook_input)
except Exception:
sys.exit(0)
tool_result = data.get('tool_result', '')
if isinstance(tool_result, dict):
tool_result = json.dumps(tool_result)
# Parse structured fields from tool result (expects JSON or key:value)
agent_name = os.environ.get('AGENT_NAME', 'unknown')
score = 0
issue = ''
resolution = ''
pattern = ''
try:
result_data = json.loads(tool_result)
agent_name = result_data.get('agent', agent_name)
score = int(result_data.get('score', 0))
issue = result_data.get('issue', '')
resolution = result_data.get('resolution', '')
pattern = result_data.get('pattern', '')
except Exception:
# Fallback: look for score: N in plain text
m = re.search(r'score[:\s]+(\d+)', tool_result, re.IGNORECASE)
if m:
score = int(m.group(1))
m = re.search(r'pattern[:\s]+(\S+)', tool_result, re.IGNORECASE)
if m:
pattern = m.group(1)
if score == 0 and not issue:
sys.exit(0)
date_str = datetime.utcnow().strftime('%Y-%m-%d')
row = f"| {date_str} | {pipeline_name} | {agent_name} | {score}/100 | {issue} | {resolution} | {pattern} |"
# Append to feedback table
if not os.path.exists(feedback_file):
print(f"Warning: {feedback_file} not found — skipping feedback append")
sys.exit(0)
with open(feedback_file, 'r') as f:
content = f.read()
# Insert row after the header row of the table
table_header = '| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |'
separator = '|------|----------|-------|-------|-------|------------|---------|'
placeholder_row = '| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |'
if placeholder_row in content:
# Replace placeholder with real row + keep placeholder for next time
content = content.replace(placeholder_row, row + '\n' + placeholder_row)
elif separator in content:
content = content.replace(separator, separator + '\n' + row)
else:
content += '\n' + row + '\n'
with open(feedback_file, 'w') as f:
f.write(content)
print(f"Feedback recorded: score={score}, pattern={pattern}")
# Detect recurring patterns
if pattern:
pattern_count = content.count(f'| {pattern} |')
if pattern_count >= 3:
print(f"RECURRING PATTERN DETECTED: '{pattern}' appears {pattern_count} times")
print(f"Action required: review prompt or pipeline for '{pipeline_name}'")
# Flag low scores
if score < score_threshold and score > 0:
print(f"LOW SCORE: {score} < threshold {score_threshold} for agent {agent_name}")
PYEOF
exit 0
```
**`scripts/templates/feedback/performance-scorer.sh`** — Standalone scoring script that reads FEEDBACK.md and cost-events.jsonl:
```bash
#!/bin/bash
# Performance scorer: per-agent metrics from FEEDBACK.md + cost-events.jsonl.
# Bash 3.2 compatible. Uses python3 for all metrics computation.
#
# Metrics per agent:
# - Average score (0-100)
# - Error rate (rows with score < threshold / total rows)
# - Cost per run (from cost-events.jsonl, rough proxy)
# - Improvement trend: avg of last 10 scores vs. previous 10
#
# Flags agents below threshold (default 60/100).
#
# Usage:
# ./performance-scorer.sh # Score all agents
# ./performance-scorer.sh --agent {{AGENT}} # Score specific agent
# ./performance-scorer.sh --threshold 70 # Custom threshold
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
THRESHOLD=60
AGENT_FILTER=""
# Parse arguments (bash 3.2 compatible — no associative arrays)
while [ "$#" -gt 0 ]; do
case "$1" in
--agent) AGENT_FILTER="$2"; shift 2 ;;
--threshold) THRESHOLD="$2"; shift 2 ;;
*) shift ;;
esac
done
if [ ! -f "$FEEDBACK_FILE" ]; then
echo "No feedback file found at $FEEDBACK_FILE"
exit 0
fi
python3 << PYEOF
import re, json, os, sys
from collections import defaultdict
feedback_file = "$FEEDBACK_FILE"
cost_log = "$COST_LOG"
threshold = int("$THRESHOLD")
agent_filter = "$AGENT_FILTER"
# Parse FEEDBACK.md table rows
# Expected columns: Date, Pipeline, Agent, Score, Issue, Resolution, Pattern
feedback_rows = []
with open(feedback_file) as f:
in_table = False
header_seen = False
for line in f:
line = line.strip()
if '| Date |' in line:
in_table = True
header_seen = True
continue
if in_table and line.startswith('|---'):
continue
if in_table and line.startswith('|') and '{{' not in line and header_seen:
cols = [c.strip() for c in line.strip('|').split('|')]
if len(cols) >= 7:
try:
date = cols[0]
pipeline = cols[1]
agent = cols[2]
score_str = cols[3]
issue = cols[4]
resolution = cols[5]
pattern = cols[6]
# Parse score: "75/100" or "75"
score_m = re.match(r'(\d+)', score_str)
score = int(score_m.group(1)) if score_m else 0
feedback_rows.append({
'date': date,
'pipeline': pipeline,
'agent': agent,
'score': score,
'issue': issue,
'pattern': pattern
})
except (ValueError, IndexError):
pass
# Filter by agent if specified
if agent_filter:
feedback_rows = [r for r in feedback_rows if r['agent'] == agent_filter]
if not feedback_rows:
print("No feedback rows found.")
sys.exit(0)
# Read cost events if available
cost_by_agent = defaultdict(int)
if os.path.exists(cost_log):
with open(cost_log) as f:
for line in f:
line = line.strip()
if line:
try:
event = json.loads(line)
agent = event.get('agent', 'unknown')
cost_by_agent[agent] += 1 # event count as proxy
except Exception:
pass
# Compute per-agent metrics
agents = list(set(r['agent'] for r in feedback_rows))
print("PERFORMANCE SCORECARD")
print("=" * 60)
print(f"Threshold: {threshold}/100")
print(f"Total feedback rows: {len(feedback_rows)}")
print()
flagged = []
for agent in sorted(agents):
rows = [r for r in feedback_rows if r['agent'] == agent]
scores = [r['score'] for r in rows]
avg_score = sum(scores) / len(scores) if scores else 0
error_rate = len([s for s in scores if s < threshold]) / len(scores) if scores else 0
cost_events = cost_by_agent.get(agent, 0)
cost_per_run = cost_events / len(rows) if rows else 0
# Improvement trend: last 10 vs. prev 10
trend_str = "n/a (fewer than 20 runs)"
if len(scores) >= 20:
prev10 = scores[-20:-10]
last10 = scores[-10:]
prev_avg = sum(prev10) / len(prev10)
last_avg = sum(last10) / len(last10)
delta = last_avg - prev_avg
if delta > 5:
trend_str = f"improving (+{delta:.1f})"
elif delta < -5:
trend_str = f"declining ({delta:.1f})"
else:
trend_str = f"stable ({delta:+.1f})"
elif len(scores) >= 10:
last10 = scores[-10:]
trend_str = f"recent avg: {sum(last10)/len(last10):.1f} (need 20 runs for trend)"
# Pattern frequency
patterns = defaultdict(int)
for r in rows:
if r['pattern']:
patterns[r['pattern']] += 1
top_patterns = sorted(patterns.items(), key=lambda x: -x[1])[:3]
print(f"Agent: {agent}")
print(f" Runs: {len(rows)}")
print(f" Avg score: {avg_score:.1f}/100")
print(f" Error rate: {error_rate*100:.0f}% (score < {threshold})")
print(f" Cost/run: ~{cost_per_run:.1f} events (rough proxy)")
print(f" Trend: {trend_str}")
if top_patterns:
print(f" Top patterns: {', '.join(f'{p}({c})' for p, c in top_patterns)}")
print()
if avg_score < threshold:
flagged.append((agent, avg_score))
# Summary of flagged agents
if flagged:
print("FLAGGED AGENTS (below threshold)")
print("-" * 40)
for agent, avg in flagged:
print(f" {agent}: avg {avg:.1f} < {threshold}")
print()
print("Recommended actions:")
print(" 1. Review feedback rows for top patterns")
print(" 2. Iterate on agent system prompt")
print(" 3. Consider pipeline redesign if pattern is structural")
print(" 4. Run pipeline-optimizer.sh for bottleneck analysis")
else:
print("All agents above threshold.")
PYEOF
```
**`scripts/templates/feedback/README.md`** — Explains the feedback loop pattern:
```markdown
# Feedback Loop
Systematic feedback collection and performance scoring for agent pipelines.
## How it works
1. After each pipeline run, a reviewer agent (or human) assigns a score (0-100)
and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
`score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
to compute per-agent metrics: average score, error rate, cost per run,
improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.
## Pattern tags
Consistent tags are required for pattern detection to work. Use the tags
defined in `FEEDBACK.md`. Add project-specific tags as needed — but be
consistent. Inconsistent tagging produces false negatives.
## Scoring → self-improvement connection
Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.
The feedback loop closes the improvement cycle:
1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to FEEDBACK.md
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection
## Example: prompt iteration driven by feedback
Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:
```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```
After 3 rows: feedback-collector.sh fires the recurring-pattern alert.
performance-scorer.sh shows avg 45/100, error rate 100%.
Action: update agent-writer's system prompt with explicit length and
depth requirements. Next 10 runs show trend "improving (+18.3)".
## Integration
Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`:
```json
{
"hooks": {
"PostToolUse": [{
"matcher": "review_pipeline",
"hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
}]
}
}
```
Run performance-scorer.sh on demand or as a scheduled report:
```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```
```
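One possible shape for the feedback-to-VFM connection described above. The blend weight is an illustrative assumption; the real formula belongs in VFM-SCORING.md:

```python
def vfm_prescore(base_score, avg_feedback, weight=0.5):
    """Blend a task's base VFM score with an agent's average feedback score.

    Agents with poor recent feedback get proportionally lower pre-scores,
    making them less likely to be selected until performance recovers.
    The 0.5 weight is a placeholder, not a recommended value.
    """
    return round(base_score * (1 - weight) + avg_feedback * weight, 1)

print(vfm_prescore(80, 45))  # underperformer pulled down -> 62.5
print(vfm_prescore(80, 90))  # strong performer boosted -> 85.0
```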
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git commit -m "feat(templates): add feedback loop and performance scoring templates"
```
---
## Step 21: Create pipeline optimization templates
### Files to create
**`scripts/templates/optimization/pipeline-optimizer.sh`** — Analyzes pipeline performance and generates recommendations:
```bash
#!/bin/bash
# Pipeline optimizer: identify bottlenecks, excess loops, cost outliers.
# Bash 3.2 compatible. Uses python3 for all analysis.
# Does NOT auto-implement any changes — produces RECOMMENDATIONS.md only.
#
# Analysis covers:
# - Bottleneck agents (highest avg duration or cost per run)
# - Unnecessary revision loops (agents that loop 3+ times on average)
# - Underutilized agents (invoked < 10% of pipeline runs)
# - Cost outliers (single run cost >= 3x average)
#
# Output: RECOMMENDATIONS.md with VFM pre-scores for each recommendation.
#
# Usage:
# ./pipeline-optimizer.sh
# ./pipeline-optimizer.sh --pipeline {{PIPELINE_NAME}}
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
RECOMMENDATIONS_FILE="$WORKING_DIR/RECOMMENDATIONS.md"
PIPELINE_FILTER=""
# Parse arguments (bash 3.2 compatible)
while [ "$#" -gt 0 ]; do
case "$1" in
--pipeline) PIPELINE_FILTER="$2"; shift 2 ;;
*) shift ;;
esac
done
python3 << PYEOF
import re, json, os, sys
from collections import defaultdict
from datetime import datetime
feedback_file = "$FEEDBACK_FILE"
cost_log = "$COST_LOG"
recommendations_file = "$RECOMMENDATIONS_FILE"
pipeline_filter = "$PIPELINE_FILTER"
# Parse FEEDBACK.md
feedback_rows = []
if os.path.exists(feedback_file):
with open(feedback_file) as f:
in_table = False
for line in f:
line = line.strip()
if '| Date |' in line:
in_table = True
continue
if in_table and line.startswith('|---'):
continue
if in_table and line.startswith('|') and '{{' not in line:
cols = [c.strip() for c in line.strip('|').split('|')]
if len(cols) >= 7:
try:
score_m = re.match(r'(\d+)', cols[3])
score = int(score_m.group(1)) if score_m else 0
feedback_rows.append({
'date': cols[0],
'pipeline': cols[1],
'agent': cols[2],
'score': score,
'issue': cols[4],
'pattern': cols[6]
})
except (ValueError, IndexError):
pass
# Filter by pipeline
if pipeline_filter:
feedback_rows = [r for r in feedback_rows if r['pipeline'] == pipeline_filter]
# Parse cost events
cost_events = []
if os.path.exists(cost_log):
with open(cost_log) as f:
for line in f:
line = line.strip()
if line:
try:
cost_events.append(json.loads(line))
except Exception:
pass
# Per-agent event counts (cost proxy)
cost_by_agent = defaultdict(list)
# Group events by agent+date: each agent-day counts as one "run"
run_events = defaultdict(int)
for e in cost_events:
    agent = e.get('agent', 'unknown')
    date = e.get('timestamp', '')[:10]
    cost_by_agent[agent].append(1)
    run_events[(agent, date)] += 1
# run_costs: agent -> list of per-run event totals
run_costs = defaultdict(list)
for (agent, date), count in sorted(run_events.items()):
    run_costs[agent].append(count)
# Build recommendations
recommendations = []
# 1. Bottleneck agents: top 2 by event count, if well above the per-agent average
if cost_by_agent:
    agent_totals = [(a, len(ev)) for a, ev in cost_by_agent.items()]
    agent_totals.sort(key=lambda x: -x[1])
    avg_total = sum(t for _, t in agent_totals) / len(agent_totals)
    for agent, total in agent_totals[:2]:
        if total > avg_total * 1.5:
            recommendations.append({
                'type': 'bottleneck',
                'agent': agent,
                'description': f"Agent '{agent}' accounts for {total} events vs avg {avg_total:.0f} per agent. "
                               f"Consider batching its tool calls or reducing its task scope.",
                'vfm_prescore': 70
            })
# 2. Unnecessary revision loops: agents with loop-excess pattern >= 3 times
pattern_by_agent = defaultdict(lambda: defaultdict(int))
for r in feedback_rows:
if r['pattern']:
pattern_by_agent[r['agent']][r['pattern']] += 1
for agent, patterns in pattern_by_agent.items():
if patterns.get('loop-excess', 0) >= 3:
count = patterns['loop-excess']
recommendations.append({
'type': 'loop-excess',
'agent': agent,
'description': f"Agent '{agent}' has {count} feedback rows tagged 'loop-excess'. "
f"Review pipeline revision criteria — tighten acceptance conditions "
f"or add a max-iterations guard (see self-healing.sh).",
'vfm_prescore': 80
})
# 3. Underutilized agents: invoked in < 10% of pipeline runs
if feedback_rows:
all_runs = set(r['date'] + ':' + r['pipeline'] for r in feedback_rows)
total_runs = len(all_runs) if all_runs else 1
agent_runs = defaultdict(set)
for r in feedback_rows:
agent_runs[r['agent']].add(r['date'] + ':' + r['pipeline'])
for agent, runs in agent_runs.items():
utilization = len(runs) / total_runs
if utilization < 0.1 and total_runs >= 10:
recommendations.append({
'type': 'underutilized',
'agent': agent,
'description': f"Agent '{agent}' appears in only {utilization*100:.0f}% of pipeline runs. "
f"Consider removing from the pipeline or combining with another agent.",
'vfm_prescore': 60
})
# 4. Cost outliers: single-run cost >= 3x average
if run_costs:
all_run_totals = []
for agent, runs in run_costs.items():
all_run_totals.extend(runs)
avg_run = sum(all_run_totals) / len(all_run_totals) if all_run_totals else 1
for agent, runs in run_costs.items():
for run_cost in runs:
if run_cost >= avg_run * 3:
recommendations.append({
'type': 'cost-outlier',
'agent': agent,
'description': f"Agent '{agent}' had a run costing {run_cost} events "
f"vs avg {avg_run:.1f} (3x+ threshold). "
f"Add per-run budget cap with budget-hook.sh.",
'vfm_prescore': 75
})
break # one recommendation per agent
# Write RECOMMENDATIONS.md
timestamp = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
pipeline_label = pipeline_filter if pipeline_filter else "all pipelines"
lines = [
f"# Pipeline Optimization Recommendations",
f"",
f"Generated: {timestamp}",
f"Scope: {pipeline_label}",
f"",
f"> These are recommendations only. No changes have been made.",
f"> Review each item and implement manually or with team approval.",
f"",
]
if recommendations:
lines.append(f"## Recommendations ({len(recommendations)} found)")
lines.append("")
for i, rec in enumerate(recommendations, 1):
lines.append(f"### R{i}: {rec['type'].upper()} — {rec['agent']}")
lines.append("")
lines.append(rec['description'])
lines.append("")
lines.append(f"**VFM pre-score:** {rec['vfm_prescore']}/100")
lines.append("")
else:
lines.append("## No recommendations")
lines.append("")
lines.append("No bottlenecks, excess loops, underutilized agents, or cost outliers detected.")
lines.append("")
lines.append("## Next steps")
lines.append("")
lines.append("1. Review each recommendation with the team")
lines.append("2. Prioritize by VFM pre-score (higher = more value per effort)")
lines.append("3. Implement approved changes one at a time")
lines.append("4. Run feedback-collector.sh for 10+ runs after each change")
lines.append("5. Re-run pipeline-optimizer.sh to confirm improvement")
with open(recommendations_file, 'w') as f:
f.write('\n'.join(lines) + '\n')
print(f"Recommendations written to {recommendations_file}")
print(f" Found: {len(recommendations)} recommendations")
for rec in recommendations:
print(f" - [{rec['type']}] {rec['agent']}: VFM pre-score {rec['vfm_prescore']}")
PYEOF
```
**`scripts/templates/optimization/self-healing.sh`** — Error recovery after agent/pipeline failures:
```bash
#!/bin/bash
# Self-healing: categorize errors and apply recovery strategies.
# Bash 3.2 compatible. Uses python3 for JSON/log parsing.
#
# Error categories and recovery strategies:
# timeout → retry with shorter task scope
# permission-denied → log and skip (do not retry)
# tool-not-found → log and alert, do not retry
# api-error → exponential backoff, max 3 retries
# content-quality → re-run with stricter prompt, max 2 retries
#
# Max total attempts: 5 (OpenClaw pattern — hard cap regardless of category).
# All recovery events logged to healing-log.jsonl.
#
# Usage:
# ./self-healing.sh --error-type <type> --agent <name> --attempt <n> --context <msg>
#
# Exit codes:
# 0 — recovery action taken (caller should retry)
# 1 — no recovery possible (caller should abort)
# 2 — max attempts reached (caller should escalate)
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
HEALING_LOG="$WORKING_DIR/healing-log.jsonl"
MAX_ATTEMPTS=5
ERROR_TYPE=""
AGENT_NAME=""
ATTEMPT=1
CONTEXT_MSG=""
# Parse arguments (bash 3.2 compatible)
while [ "$#" -gt 0 ]; do
case "$1" in
--error-type) ERROR_TYPE="$2"; shift 2 ;;
--agent) AGENT_NAME="$2"; shift 2 ;;
--attempt) ATTEMPT="$2"; shift 2 ;;
--context) CONTEXT_MSG="$2"; shift 2 ;;
*) shift ;;
esac
done
if [ -z "$ERROR_TYPE" ]; then
echo "Usage: $0 --error-type <type> --agent <name> --attempt <n> --context <msg>"
exit 1
fi
# Hard cap: max 5 attempts total
if [ "$ATTEMPT" -gt "$MAX_ATTEMPTS" ]; then
echo "MAX ATTEMPTS REACHED ($MAX_ATTEMPTS) for $AGENT_NAME. Escalating."
  AGENT_NAME="$AGENT_NAME" ERROR_TYPE="$ERROR_TYPE" ATTEMPT="$ATTEMPT" \
  CONTEXT_MSG="$CONTEXT_MSG" HEALING_LOG="$HEALING_LOG" python3 -c "
import json, time, os
# Values come in via the environment so quotes in the context message
# cannot break the embedded python source.
event = {
    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    'agent': os.environ.get('AGENT_NAME', ''),
    'error_type': os.environ.get('ERROR_TYPE', ''),
    'attempt': int(os.environ.get('ATTEMPT', '1')),
    'action': 'escalate',
    'reason': 'max_attempts_reached',
    'context': os.environ.get('CONTEXT_MSG', '')
}
with open(os.environ['HEALING_LOG'], 'a') as f:
    f.write(json.dumps(event) + '\n')
print(json.dumps(event))
"
exit 2
fi
# Determine recovery action per category
RECOVERY_ACTION=""
RECOVERY_DETAIL=""
EXIT_CODE=0
case "$ERROR_TYPE" in
timeout)
RECOVERY_ACTION="retry_shorter"
RECOVERY_DETAIL="Re-run with reduced task scope. Split task if attempt >= 3."
if [ "$ATTEMPT" -ge 3 ]; then
RECOVERY_DETAIL="Attempt $ATTEMPT: recommend splitting task before retry."
fi
EXIT_CODE=0
;;
permission-denied)
RECOVERY_ACTION="skip"
RECOVERY_DETAIL="Permission errors cannot be auto-resolved. Log and skip. Notify operator."
EXIT_CODE=1
;;
tool-not-found)
RECOVERY_ACTION="alert"
RECOVERY_DETAIL="Tool not found — check agent config and hook registrations. Do not retry."
EXIT_CODE=1
;;
api-error)
# Exponential backoff: 2^(attempt-1) seconds, max 3 retries
if [ "$ATTEMPT" -le 3 ]; then
BACKOFF_SECS=$(python3 -c "print(min(2 ** ($ATTEMPT - 1), 16))")
RECOVERY_ACTION="retry_backoff"
RECOVERY_DETAIL="API error — wait ${BACKOFF_SECS}s then retry (attempt $ATTEMPT/3)."
sleep "$BACKOFF_SECS"
EXIT_CODE=0
else
RECOVERY_ACTION="abort"
RECOVERY_DETAIL="API error persists after 3 retries. Aborting."
EXIT_CODE=1
fi
;;
content-quality)
# Max 2 retries for quality issues
if [ "$ATTEMPT" -le 2 ]; then
RECOVERY_ACTION="retry_strict"
RECOVERY_DETAIL="Re-run with stricter prompt. Add explicit quality criteria (attempt $ATTEMPT/2)."
EXIT_CODE=0
else
RECOVERY_ACTION="escalate_quality"
RECOVERY_DETAIL="Content quality below threshold after 2 retries. Escalate to human review."
EXIT_CODE=2
fi
;;
*)
RECOVERY_ACTION="unknown"
RECOVERY_DETAIL="Unknown error type '$ERROR_TYPE'. Logging and aborting."
EXIT_CODE=1
;;
esac
# Log recovery event
python3 -c "
import json, time
event = {
'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
'agent': '$AGENT_NAME',
'error_type': '$ERROR_TYPE',
'attempt': $ATTEMPT,
'action': '$RECOVERY_ACTION',
'detail': '$RECOVERY_DETAIL',
'context': '$CONTEXT_MSG'
}
with open('$HEALING_LOG', 'a') as f:
f.write(json.dumps(event) + '\n')
print(json.dumps(event, indent=2))
"
echo "Recovery: $RECOVERY_ACTION — $RECOVERY_DETAIL"
exit $EXIT_CODE
```
**`scripts/templates/optimization/README.md`** — Explains optimization and self-healing:
```markdown
# Pipeline Optimization and Self-Healing
Two tools for making agent pipelines more efficient and resilient over time.
## pipeline-optimizer.sh
Analyzes FEEDBACK.md and cost-events.jsonl to identify:
| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Top 2 agents by cost-event count, at 1.5x+ the average | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
recommendation. Higher VFM pre-scores mean more value per implementation effort.
**This script does not auto-implement anything.** All changes require
manual review and explicit approval. This is intentional — pipeline
restructuring is a high-stakes operation.
## self-healing.sh
Categorizes errors and applies targeted recovery strategies:
| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with shorter scope | 5 (hard cap) |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^n seconds) | 3 |
| `content-quality` | Retry with stricter prompt | 2 |
**Hard cap: 5 total attempts regardless of category.** This follows the
OpenClaw pattern — unbounded retry loops are the most common cause of
runaway agent costs. The cap is non-negotiable.
After the hard cap is reached, the script exits with code 2 (escalate).
The caller is responsible for deciding whether to pause, alert a human,
or abort the pipeline run.
## Connection to feedback and VFM
```
feedback-collector.sh → FEEDBACK.md → performance-scorer.sh → flagged agents
pipeline-optimizer.sh → RECOMMENDATIONS.md
(manual review + approval)
prompt/pipeline update
new runs → new feedback
```
VFM pre-scores in RECOMMENDATIONS.md use the same 0-100 scale as
`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are
pre-scores, not final scores — the VFM evaluation still needs to run
when the task is scheduled. The pre-scores help prioritize which
recommendations to tackle first.
## Safety limits
- `pipeline-optimizer.sh`: read-only analysis — never modifies pipeline files
- `self-healing.sh`: max 5 attempts hard cap, permission errors never retried
- All events logged to `healing-log.jsonl` for audit trail
- No auto-escalation to external systems — exit codes only
## Usage
```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh
# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline
# Handle an error in a pipeline step
./optimization/self-healing.sh \
--error-type api-error \
--agent agent-writer \
--attempt 1 \
--context "OpenAI timeout on summarize call"
# Check healing log
python3 -m json.tool --json-lines healing-log.jsonl  # pretty-print JSONL (Python 3.8+)
```
```
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh && echo "VALID"
```
Expected: `VALID`
### On failure
retry once to fix bash syntax; if the scripts still fail `bash -n`, revert
### Checkpoint
```bash
git commit -m "feat(templates): add pipeline optimization and self-healing templates"
```
---
## Exit Condition
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/feedback/ | wc -l` → 4
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/optimization/ | wc -l` → 3
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → no errors
- [ ] FEEDBACK.md contains `| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |` header row
- [ ] performance-scorer.sh computes improvement trend (last 10 vs. prev 10)
- [ ] pipeline-optimizer.sh writes to RECOMMENDATIONS.md and does NOT modify any pipeline files
- [ ] self-healing.sh exits 2 when attempt > 5 (hard cap enforced)
- [ ] healing-log.jsonl referenced in self-healing.sh
- [ ] All bash scripts are 3.2 compatible (no associative arrays, no mapfile, no `|&`)
## Quality Criteria
- Feedback table columns match the 7-column spec (Date, Pipeline, Agent, Score, Issue, Resolution, Pattern)
- Pattern detection fires at exactly 3 occurrences (not 2, not 4)
- Performance-scorer.sh improvement trend correctly computes last 10 vs. previous 10 scores
- pipeline-optimizer.sh detects all 4 issue types: bottleneck, loop-excess, underutilized, cost-outlier
- VFM pre-scores in RECOMMENDATIONS.md use the same 0-100 scale as VFM-SCORING.md (Step 11)
- self-healing.sh hard cap is exactly 5 (OpenClaw pattern) — not configurable
- permission-denied and tool-not-found errors are never retried (exit 1 immediately)
- api-error uses exponential backoff: 1s, 2s, 4s (2^0, 2^1, 2^2) before aborting at attempt 4
- content-quality escalates to human review (exit 2) after 2 retries, not abort
- All scripts use `#!/bin/bash` shebang and are bash 3.2 compatible
---
# Session 8: Build Command Integration and Finalization
> Steps 8, 26, 27 | Wave 3 | Depends on: Sessions 1, 2, 7
## Dependencies
Entry condition: Session 1 complete (commands renamed to `/agent-factory:*`), Session 2 complete (5 domain templates exist in `scripts/templates/domains/`), Session 7 complete (skill references updated in `skills/agent-system-design/`)
## Scope Fence
**Touch:**
- `commands/build.md`
- `.claude-plugin/plugin.json`
- `CLAUDE.md`
- `README.md`
**Never touch:**
- `scripts/templates/` (read-only — already created in Sessions 3-6)
- `skills/` (read-only — already updated in Session 7)
- `agents/` (no changes needed)
---
## Step 8: Update build command to use domain templates and new features
### Files to modify
**`commands/build.md`** — Apply the following changes:
**Change 1 — Rename command reference in line 7:**
Old:
```
You are running `/agent-builder:build` — a guided 7-phase workflow for building a complete autonomous agent system with Claude Code.
```
New:
```
You are running `/agent-factory:build` — a guided 7-phase workflow for building a complete autonomous agent system with Claude Code.
```
**Change 2 — Add Phase 0 before Phase 1:**
Insert the following section immediately before the `## Phase 1: Map Your Work` heading:
```markdown
---
## Phase 0: Choose a Starting Point
Goal: Offer domain templates to accelerate the design process.
Ask the user using AskUserQuestion:
"Would you like to start from a domain template? Templates pre-populate the agent roles and pipeline structure for common use cases.
Available templates:
1. **content-pipeline** — Articles, newsletters, reports. Agents: researcher, writer, reviewer.
2. **code-review** — Automated PR review. Agents: analyzer, review-writer, standards-checker.
3. **monitoring** — System/service monitoring. Agents: monitor-checker, incident-reporter, remediation-advisor.
4. **research-synthesis** — Research and analysis. Agents: source-gatherer, synthesizer, fact-checker.
5. **data-processing** — Data transformation. Agents: data-validator, transformer, quality-checker.
6. **custom** — Blank start. Answer the Phase 1 questions yourself.
Enter a template name or 'custom'."
If a template is chosen (not 'custom'):
- Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md` where `{{TEMPLATE}}` is the chosen name
- Extract the agent definitions and pipeline skill template from the file
- Pre-populate the Phase 1 design sketch using the template's roles and pipeline steps
- Tell the user: "I've loaded the [template] template. In Phase 1, I'll show you the pre-built design and you can customize it."
- Skip Phase 1's interview questions and go directly to showing the pre-populated design sketch
- Ask the user: "Does this design sketch match your use case? What would you change?"
- Incorporate any changes before proceeding to Phase 2.
If 'custom' is chosen:
- Proceed normally to Phase 1.
```
**Change 3 — Replace Phase 6 (Deployment) entirely:**
Replace the entire `## Phase 6: Deployment` section with:
```markdown
## Phase 6: Deployment
Goal: Configure automated scheduling for the pipeline.
Use the `/agent-factory:deploy` command to handle deployment configuration. Tell the user:
"I'll hand off to the deploy command to configure your deployment target."
Then invoke `/agent-factory:deploy` passing the pipeline name and agent list as context.
If the user wants to configure deployment inline without switching commands, ask using AskUserQuestion:
"Where will this pipeline run? Choose a deployment target:
1. **/schedule (cloud)** — Anthropic's cloud scheduler. Needs GitHub repo. No local file access. 1-hour minimum interval. Simplest option if your project is on GitHub.
2. **Desktop scheduled tasks** — Runs on your machine via Claude Code Desktop. Local file access. 1-minute minimum interval. Best for local automation.
3. **Local (cron/launchd)** — Traditional scheduler. Runs headless via `claude -p`. Full local access. Best for personal daily pipelines.
4. **VPS (systemd)** — Linux server with systemd service + timer. Always-on. Best for team pipelines and production workloads.
5. **Docker** — Containerized agent. Portable, isolated, reproducible. Best for consistent environments and security isolation."
For each target, follow the deployment steps from `${CLAUDE_PLUGIN_ROOT}/commands/deploy.md`.
Tell the user what files were created and the exact commands needed to activate the schedule.
```
**Change 4 — Update Summary section:**
Replace the summary block at the end of the file with:
```markdown
## Summary
After all phases are complete (or the user stops early), print a summary:
```
AGENT SYSTEM BUILT
==================
Agents: .claude/agents/[list]
Pipeline: .claude/skills/[name].md
Hooks: .claude/hooks/pre-tool-use.sh, post-tool-use.sh
Settings: .claude/settings.json
Schedule: automation/[script or config]
Run your pipeline: /[pipeline-name] [your topic]
Review logs: .claude/hooks/audit.log
Evaluate system: /agent-factory:evaluate
Check status: /agent-factory:status
```
If the user stopped early, tell them which phase to continue from and remind them that this command can be re-invoked at any time.
Next steps:
- Run `/agent-factory:evaluate` to score your system against the 22 agent capabilities
- Run `/agent-factory:status` for a quick health check of all components
- Run `/agent-factory:deploy` to configure or change your deployment target
```
### Verify
```bash
grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md
```
Expected: `>= 3`
### On failure
revert — build command is critical path
### Checkpoint
```bash
git commit -m "feat(commands): integrate domain templates and new commands into build workflow"
```
---
## Step 26: Update build command to integrate all Phase 2-5 features
### Files to modify
**`commands/build.md`** — Build on Step 8's changes. Apply the following additions:
**Change 1 — Add Phase 2.5 after Phase 2:**
Insert the following section immediately after the `## Phase 2: Operating Manual` section (after the line "Tell the user: "Created CLAUDE.md. Review it and edit freely — this is your operating manual, not mine.""):
```markdown
---
## Phase 2.5: Memory Setup
Goal: Optionally configure 3-tier memory for agents that need to remember state across runs.
Ask the user using AskUserQuestion:
"Would you like to set up 3-tier memory for your agents? This lets agents remember context across sessions.
Memory tiers:
- **Session state** (hot) — Working memory, updated every turn. Prevents data loss on crash. Agents write here before responding.
- **Daily logs** (warm) — One file per day. Captures decisions, files modified, carry-forward items.
- **Long-term memory** (cold) — Curated knowledge that persists indefinitely. Agents read this on startup.
This is recommended for pipelines that run daily and need to track progress over time. Skip if this is a one-shot or stateless pipeline."
If yes:
1. Create the `memory/` directory in the user's project
2. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/SESSION-STATE.md` and copy to `memory/SESSION-STATE.md`, replacing `{{AGENT_NAME}}` with the main pipeline agent name
3. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/MEMORY.md` and copy to `memory/MEMORY.md`
4. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/memory/DAILY-LOG.md` — explain to the user that daily log files are created automatically with the pattern `memory/YYYY-MM-DD.md`
5. Tell each agent to read `memory/SESSION-STATE.md` first on startup by adding this instruction to each agent's "How you work" section:
"1. Read memory/SESSION-STATE.md for current task context (write your intent here before responding)"
6. Tell the user: "Memory configured. Agents will now persist state across runs. The WAL protocol ensures no data loss on crash."
If no: skip and continue to Phase 3.
```
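The write-ahead behavior in point 5 of Phase 2.5 can be sketched as a plain file write: the agent records its intent in `memory/SESSION-STATE.md` before doing the work, so a crash mid-task leaves a recoverable trace (paths and content below are illustrative):

```bash
#!/bin/bash
# Write-ahead sketch: record intent in SESSION-STATE.md before acting,
# then mark completion afterward. Content is illustrative.
mkdir -p memory
cat > memory/SESSION-STATE.md <<'EOF'
# Session State
- intent: summarize today's feedback
- status: in-progress
EOF
# ... do the actual work here, then mark completion:
sed -i.bak 's/in-progress/done/' memory/SESSION-STATE.md && rm memory/SESSION-STATE.md.bak
grep -q "status: done" memory/SESSION-STATE.md && echo "state persisted"
```

If the run crashes between the first write and the completion update, the next run finds `status: in-progress` and knows exactly which task was interrupted.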
**Change 2 — Add Phase 3.5 after Phase 3:**
Insert the following section immediately after the `## Phase 3: Agent Team` section (after the "Make adjustments before moving to Phase 4" line):
```markdown
---
## Phase 3.5: Skills and Custom Components
Goal: Wire up specialized skills and configure the pause/resume mechanism for missing dependencies.
Ask the user using AskUserQuestion:
"Do your agents need any specialized knowledge or behaviors beyond what's been generated? For example: writing style guides, domain-specific rules, external API patterns, or custom tool workflows."
If yes, for each described need:
1. Check if a matching skill already exists in the project: `Glob for .claude/skills/*.md` and `.claude/skills/*/SKILL.md`
2. If a matching skill exists: note it and confirm it will be auto-loaded by Claude Code
3. If no matching skill exists, ask: "For [described need], I can:
(a) Generate a skill skeleton now that you fill in
(b) Pause and let you build it first, then resume the build
Which do you prefer?"
If user chooses (a): generate `.claude/skills/[name].md` with basic frontmatter and placeholder content
If user chooses (b) — pause and resume:
1. Write `build-state.json` to the project root with the current state:
```json
{
"phase": "3.5",
"completed": ["0", "1", "2", "2.5", "3"],
"choices": {
"template": "[chosen template or 'custom']",
"agents": ["[agent names]"],
"memory": true,
"pipeline_name": "[name]"
},
"paused_reason": "User building custom skill: [skill name]",
"resume_instructions": "Run /agent-factory:build --resume when the skill is ready"
}
```
2. Tell the user: "Build paused. Create your skill at `.claude/skills/[name].md`, then run `/agent-factory:build --resume` to continue from Phase 3.5."
3. Stop — do not proceed further until resumed.
On `--resume` (when $ARGUMENTS contains "--resume"):
1. Read `build-state.json` from the project root
2. Print a summary: "Resuming build. Completed phases: [list]. Paused at: [phase]. Reason: [reason]."
3. Continue from the saved phase using the saved choices.
```
**Change 3 — Add Phase 3.7 after Phase 3.5:**
Insert the following section immediately after the Phase 3.5 section:
```markdown
---
## Phase 3.7: Proactive Agent (Optional)
Goal: Enable self-improving behavior for agents that should operate autonomously over time.
Ask the user using AskUserQuestion:
"Should any of your agents be proactive — able to identify improvements to their own behavior and implement them within guardrails?
Proactive agents use two protection mechanisms:
- **ADL (Anti-Drift Limits)** — Prevents agents from faking capabilities, making unverifiable changes, or expanding scope without approval
- **VFM (Value-First Modification)** — Requires each proposed change to score >50/100 on: frequency (0-25), failure reduction (0-25), burden reduction (0-25), cost savings (0-25)
This is recommended for long-running pipelines where you want the agent to improve itself over time. Skip if you want full manual control."
If yes:
1. Ask: "Which agent(s) should be proactive?" and list the agents from Phase 3
2. For each chosen agent:
a. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/ADL-RULES.md` and append its contents to the agent's `.claude/agents/[name].md` as a new "## Anti-Drift Limits" section
b. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/proactive/VFM-SCORING.md` and append as a new "## Value-First Modification" section
3. Show the user an example VFM scoring calculation: "Before adding a new step to my pipeline, I score it: Frequency (how often this gap causes issues): 20/25. Failure reduction: 18/25. Burden reduction: 15/25. Cost savings: 5/25. Total: 58/100 → implement."
4. Tell the user: "Proactive guardrails added. Agents will self-improve within ADL/VFM boundaries."
If no: skip and continue to Phase 4.
```
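The VFM arithmetic in Phase 3.7 reduces to a sum of four 0-25 components checked against the 50-point cutoff; a minimal sketch (component ranges and threshold come from VFM-SCORING.md):

```bash
#!/bin/bash
# VFM pre-score sketch: four 0-25 components, implement only if total > 50.
vfm_score() {
  local total=$(($1 + $2 + $3 + $4))   # frequency + failure + burden + cost
  if [ "$total" -gt 50 ]; then
    echo "$total implement"
  else
    echo "$total skip"
  fi
}
vfm_score 20 18 15 5   # the worked example from step 3: 58 implement
vfm_score 10 12 10 8   # 40 skip
```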
**Change 4 — Add Phase 4.5 after Phase 4:**
Insert the following section immediately after the `## Phase 4: Pipeline` section (after "Tell the user: "Created .claude/skills/[pipeline-name].md. Run it with /[pipeline-name] [your topic].""):
```markdown
---
## Phase 4.5: Integrations and MCP Servers
Goal: Connect agents to external services they need to do real work.
Ask the user using AskUserQuestion:
"What external services do your agents need to interact with? For example: Slack, GitHub, Linear, databases, REST APIs, file systems, browsers."
If none: skip and continue to Phase 5.
For each named service:
1. Check if a known MCP server exists for it:
- Slack → `@anthropic-ai/mcp-server-slack`
- GitHub → available as MCP server
- Playwright/browser → `@anthropic-ai/mcp-server-playwright`
- PostgreSQL/SQLite → community MCP servers available
- Linear, Jira, Notion → community MCP servers available
- Read `${CLAUDE_PLUGIN_ROOT}/skills/agent-system-design/references/mcp-integrations.md` for the full list
2. If MCP server exists:
a. Generate a `.mcp.json` entry for it (create or append to `.mcp.json` in project root):
```json
{
"mcpServers": {
"[service-name]": {
"command": "npx",
"args": ["-y", "@anthropic-ai/mcp-server-[name]"],
"env": {
"[SERVICE]_API_KEY": "${[SERVICE]_API_KEY}"
}
}
}
}
```
b. Tell the user: "Add [SERVICE]_API_KEY to your environment. For local runs: export it in ~/.zshenv. For Docker: add it to .env."
3. If no MCP server exists for the service:
a. Explain: "There's no pre-built MCP server for [service]. You have three options:
(a) Use the Bash tool directly — simpler but less structured. Good for REST APIs with curl.
(b) Find a community MCP server — search: https://github.com/modelcontextprotocol/servers
(c) Build a custom MCP server — I can generate a skeleton using the `/mcp-builder` skill."
b. Ask which option they prefer
c. If option (a): add Bash tool to the agent's tools list and note the API pattern in the agent's system prompt
d. If option (b): pause and provide search instructions
e. If option (c) — pause and resume:
- Write build-state.json with phase "4.5", add current MCP choices to "choices"
- Tell the user: "Build paused. Build your MCP server, then run `/agent-factory:build --resume`."
- Stop.
After all integrations:
1. Validate `.mcp.json` syntax: `python3 -c "import json; json.load(open('.mcp.json'))"` — tell user if invalid
2. Tell the user: "MCP integrations configured. Note: `/schedule` cloud tasks cannot use MCP servers. For cloud deployment, agents must use Bash with direct API calls instead."
```
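The "create or append" step in Phase 4.5 needs a real JSON merge rather than text concatenation, since naively appending to an existing `.mcp.json` corrupts it. A sketch of that merge (the `slack` entry and env var are placeholder values drawn from the list above):

```bash
#!/bin/bash
# Merge one server entry into .mcp.json through a JSON parser; naive text
# appends would corrupt the file. Entry values are illustrative.
python3 - .mcp.json <<'EOF'
import json, os, sys
path = sys.argv[1]
config = json.load(open(path)) if os.path.exists(path) else {}
config.setdefault("mcpServers", {})["slack"] = {
    "command": "npx",
    "args": ["-y", "@anthropic-ai/mcp-server-slack"],
    "env": {"SLACK_API_KEY": "${SLACK_API_KEY}"},
}
with open(path, "w") as f:
    json.dump(config, f, indent=2)
EOF
python3 -c "import json; json.load(open('.mcp.json'))" && echo "valid JSON"
```

The final line is the same syntax check the phase itself runs after all integrations.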
**Change 5 — Add Phase 5.5 after Phase 5:**
Insert the following section immediately after the `## Phase 5: Security` section (after "Tell the user: "Created hooks and settings.json. The audit log will be written to .claude/hooks/audit.log after each tool call.""):
```markdown
---
## Phase 5.5: Governance
Goal: Define how much autonomy the agent system has and where humans stay in the loop.
Ask the user using AskUserQuestion:
"What autonomy level do you want for your agent system?
- **Level 0** — Full manual approval. Every tool call requires your OK.
- **Level 1** — Auto-approve reads only. (Read, Glob, Grep run freely. All writes need approval.)
- **Level 2** — Auto-approve file operations within the project. (Writes to project dir run freely.)
- **Level 3** — Auto-approve all except destructive operations. (Bash with non-destructive commands runs freely.)
- **Level 4** — Full autonomy with hooks as guardrails. (Everything runs unless blocked by a hook rule.)
For most personal pipelines: Level 3 or 4. For shared or production systems: Level 1 or 2."
Based on the chosen level:
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/governance/GOVERNANCE.md`
2. Copy to `GOVERNANCE.md` in the project root
3. Replace `{{AUTONOMY_LEVEL}}` with the chosen level
4. Replace `{{PROJECT_NAME}}` with the project name from Phase 2
5. Ask: "Are there any specific approval gates — operations that must always pause for your review? For example: sending emails, deleting files, committing to main branch."
6. For each gate: add an entry to the Approval Gates section in GOVERNANCE.md
Ask using AskUserQuestion:
"Do you want to track agent spending? Budget tracking records estimated API costs per run and can pause agents that exceed a monthly limit."
If yes:
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/BUDGET.md` and copy to `budget/BUDGET.md`
2. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/budget/budget-hook.sh` and copy to `.claude/hooks/budget-hook.sh`
3. Make budget-hook.sh executable: `chmod +x .claude/hooks/budget-hook.sh`
4. Ask: "What monthly budget limit in USD?" and set `{{BUDGET_LIMIT_CENTS}}` to (USD * 100)
5. Add budget-hook.sh to PostToolUse hooks in `.claude/settings.json`
6. Tell the user: "Budget tracking configured. Costs are estimated from token counts. A warning triggers at 80% and agents pause at 100% of the monthly limit."
```
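The 80% warning / 100% pause logic in Phase 5.5 reduces to integer comparisons on cent values; a sketch (the limits here are example values, the real hook reads them from BUDGET.md and the cost log):

```bash
#!/bin/bash
# Budget threshold sketch: warn at 80% of the monthly limit, pause at 100%.
# Values are in cents; integer math avoids floating point in bash 3.2.
budget_state() {
  local spent=$1 limit=$2
  if [ "$spent" -ge "$limit" ]; then
    echo pause
  elif [ $((spent * 100)) -ge $((limit * 80)) ]; then
    echo warn
  else
    echo ok
  fi
}
budget_state 3000 10000    # 30% of a $100 limit: ok
budget_state 8500 10000    # 85%: warn
budget_state 10000 10000   # 100%: pause
```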
**Change 6 — Add Phase 5.7 after Phase 5.5:**
Insert the following section immediately after the Phase 5.5 section:
```markdown
---
## Phase 5.7: Goals and Org Chart (Multi-Agent Systems)
Goal: For systems with 3 or more agents, define the hierarchy and shared goals.
Count the agents created in Phase 3. If fewer than 3: skip this phase entirely.
If 3 or more agents:
Ask the user using AskUserQuestion:
"Your system has [N] agents. Would you like to define:
(a) A **goal hierarchy** — what this system is trying to achieve at the company, project, and task levels
(b) An **org chart** — which agents report to which, and who has what authority
(c) Both
(d) Skip — I'll manage coordination manually"
If goals (a or c):
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/goals/GOALS.md` and copy to `GOALS.md` in the project root
2. Ask: "What is the top-level goal this agent system serves? (e.g., 'Publish 3 high-quality articles per week')"
3. For each agent, ask: "What does [agent name] contribute toward that goal?"
4. Populate GOALS.md with the hierarchy using dot-notation IDs: G1 (company goal) → G1.1 (project goal) → G1.1.1 (this agent's task goal)
5. Tell the user: "GOALS.md created. Agents can reference their goal ID to stay aligned with the overall mission."
If org chart (b or c):
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/org-chart/ORG-CHART.md` and copy to `ORG-CHART.md` in the project root
2. Ask: "Which agent is the top-level orchestrator? (The one that coordinates others.)"
3. For remaining agents: ask "Does [agent] report to [orchestrator] or another agent?"
4. Populate the ORG-CHART.md table with `reportsTo` references
5. Tell the user: "ORG-CHART.md created. You (the human operator) are always the 'board' with override authority on all agents."
```
**Change 7 — Update Phase 6 deployment options** (replace the Phase 6 written in Step 8 with this expanded version):
Replace the `## Phase 6: Deployment` section written in Step 8 with:
```markdown
## Phase 6: Deployment
Goal: Configure automated scheduling for the pipeline. Present ALL deployment options with clear trade-offs.
Use the `/agent-factory:deploy` command to handle deployment configuration. Tell the user:
"I'll hand off to the deploy command to configure your deployment target."
Then invoke `/agent-factory:deploy` passing the pipeline name, agent list, and any MCP integrations from Phase 4.5 as context.
If the user wants to configure deployment inline, ask using AskUserQuestion:
"Choose a deployment target. Here are the trade-offs:
| Target | Needs | Minimum interval | Local files | Best for |
|--------|-------|-----------------|-------------|----------|
| /schedule (cloud) | GitHub repo | 1 hour | No | Repo maintenance, CI triage, PR review |
| Desktop tasks | Desktop app running | 1 minute | Yes | Local automation while you work |
| Local cron/launchd | Machine on | Any | Yes | Personal daily pipelines |
| VPS systemd | Linux server | Any | Yes | Team pipelines, production |
| Docker | Docker installed | Any | Via volumes | Isolated, portable, reproducible |
Which target? (You can configure multiple.)"
For `/schedule (cloud)`:
1. Check GitHub connection. If not connected: explain the GitHub repo requirement.
2. Warn: "Cloud tasks cannot access local files or MCP servers. Only GitHub repo contents and allowed MCP connectors are accessible."
3. Generate a task prompt from the pipeline skill. Guide the user through `/schedule` to create the task.
For Desktop scheduled tasks:
1. Explain: "Desktop tasks run on your machine with full local file access and MCP servers available."
2. Guide through the Desktop app Schedule page. Recommend `bypassPermissionsModeAllowed: true` for unattended runs.
For Local (cron/launchd):
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/automation.sh`. Copy to `automation/run-pipeline.sh` and customize.
2. Ask: "What schedule? (e.g., daily at 07:00)"
3. For macOS: read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/launchd.plist`, customize, save to `automation/`
4. Provide activation: `launchctl load ~/Library/LaunchAgents/[label].plist`
For VPS (systemd):
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/systemd-service.unit`. Copy and customize to `automation/claude-pipeline.service`.
2. Ask: "What Linux user runs the agent?" and "Absolute path to project?"
3. Provide activation: `systemctl enable --now claude-pipeline.timer`
For Docker:
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/docker/Dockerfile` and `docker-compose.yml`. Copy and customize to project root.
2. Warn: "Never bake ANTHROPIC_API_KEY into the image. Use .env file or environment injection."
3. Provide activation: `docker compose up -d`
Tell the user what files were created and the exact commands needed to activate the schedule.
```
**Change 8 — Update Phase 7:**
Append the following to the end of the `## Phase 7: Test and Iterate` section (after "Implement that one change before closing the session."):
```markdown
After the test run, ask using AskUserQuestion:
"Would you like to set up a feedback loop? This records reviewer scores and recurring issues so you can track pipeline quality over time and identify which agents need tuning."
If yes:
1. Read `${CLAUDE_PLUGIN_ROOT}/scripts/templates/feedback/FEEDBACK.md` and copy to `feedback/FEEDBACK.md`
2. Tell the user: "Log pipeline results to feedback/FEEDBACK.md after each run. After your first week, run `scripts/templates/optimization/pipeline-optimizer.sh` to get specific recommendations for improving your agents."
```
**Change 9 — Update the Summary section** (replace the summary from Step 8 with the full version):
Replace the `## Summary` section written in Step 8 with:
```markdown
## Summary
After all phases are complete (or the user stops early), print a summary:
```
AGENT SYSTEM BUILT
==================
Agents: .claude/agents/[list]
Pipeline: .claude/skills/[name].md
Hooks: .claude/hooks/pre-tool-use.sh, post-tool-use.sh
Settings: .claude/settings.json
Schedule: automation/[script or config]
Memory: memory/ (3-tier: SESSION-STATE, daily logs, MEMORY.md)
Governance: GOVERNANCE.md (Level [N])
Budget: budget/BUDGET.md + .claude/hooks/budget-hook.sh
Goals: GOALS.md
Org chart: ORG-CHART.md
MCP config: .mcp.json
Feedback: feedback/FEEDBACK.md
Run your pipeline: /[pipeline-name] [your topic]
Review logs: .claude/hooks/audit.log
Evaluate system: /agent-factory:evaluate
Check status: /agent-factory:status
Deploy/reschedule: /agent-factory:deploy
```
If build was paused (build-state.json exists):
- Tell the user: "Build paused at Phase [phase]. Run `/agent-factory:build --resume` to continue."
- List completed phases and remaining phases.
If the user stopped early without pausing:
- Tell them which phase to continue from and remind them that this command can be re-invoked at any time.
Next steps after first week:
- Run `scripts/templates/optimization/pipeline-optimizer.sh` after collecting feedback
- Run `/agent-factory:evaluate` to score your system against all 22 capabilities
```
**Change 10 — Add build-state.json resume handling at top of file:**
Add the following paragraph immediately after the opening description paragraph (after "Work through phases sequentially..."):
```markdown
**Resume mode:** If $ARGUMENTS contains `--resume`, read `build-state.json` from the current directory. Print: "Resuming build — completed phases: [list]. Paused at Phase [phase]. Reason: [reason]. Continuing..." Then skip all completed phases and continue from the saved phase using the saved choices. If build-state.json does not exist, tell the user: "No saved build state found. Starting fresh." and proceed normally.
**build-state.json schema:**
```json
{
"phase": "3.5",
"completed": ["0", "1", "2", "2.5", "3"],
"choices": {
"template": "monitoring",
"agents": ["monitor-checker", "incident-reporter", "remediation-advisor"],
"pipeline_name": "my-monitoring",
"memory": true,
"proactive": false,
"mcp_servers": ["slack"],
"autonomy_level": 3,
"budget": true,
"goals": false,
"org_chart": false,
"deployment_target": "local"
},
"paused_reason": "User creating custom MCP server for internal ticketing system",
"resume_instructions": "Run /agent-factory:build --resume when MCP server is ready"
}
```
```
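The resume check described above can be sketched in shell. This uses python3 for the JSON parsing, matching how this plan's own verify steps already read plugin.json; the banner text is abbreviated from the spec.

```bash
# Sketch: a resume check a wrapper script could run before invoking the build.
# Uses python3 for JSON parsing; field names follow the build-state.json schema.
resume_banner() {
  if [ -f "$1" ]; then
    python3 - "$1" <<'PY'
import json, sys
d = json.load(open(sys.argv[1]))
print("Resuming build - completed phases: %s. Paused at Phase %s. Reason: %s."
      % (", ".join(d["completed"]), d["phase"], d["paused_reason"]))
PY
  else
    echo "No saved build state found. Starting fresh."
  fi
}
```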
### Verify
```bash
wc -l /Users/ktg/repos/agent-builder/commands/build.md
```
Expected: `>= 600` (the file was 390 lines before Step 8; Step 26 should bring it to 600+)
Also verify the resume mechanism is present:
```bash
grep -c "build-state.json" /Users/ktg/repos/agent-builder/commands/build.md
```
Expected: `>= 3`
### On failure
revert — build command is critical path, must be valid
### Checkpoint
```bash
git commit -m "feat(commands): integrate all Phase 2-5 features into build workflow"
```
---
## Step 27: Update plugin.json, CLAUDE.md, README.md for v1.0
### Files to modify
**`.claude-plugin/plugin.json`** — Replace content with:
```json
{
"name": "agent-factory",
"description": "Build and manage autonomous agent systems with Claude Code. Guided workflow with 3-tier memory, heartbeat scheduling, budget tracking, governance, org-chart, 10 domain templates, Docker deployment, and import/export. Inspired by OpenClaw and Paperclip patterns.",
"version": "1.0.0",
"author": {
"name": "Kjell Tore Guttormsen",
"url": "https://fromaitochitta.com"
},
"repository": "https://git.fromaitochitta.com/open/agent-factory",
"license": "MIT",
"keywords": [
"agent",
"autonomous",
"pipeline",
"automation",
"hooks",
"security",
"deployment",
"memory",
"heartbeat",
"budget",
"governance",
"org-chart",
"templates",
"import",
"export"
]
}
```
**`CLAUDE.md`** — Replace content with:
```markdown
# Agent Factory Plugin
Plugin that guides users through building complete autonomous agent systems
using Claude Code. Install via `/install agent-factory` or `--plugin-dir`.
## What this plugin does
Guides users through building their own multi-agent system end to end:
agents, skills, hooks, 3-tier memory, heartbeat scheduling, budget
tracking, governance, org-chart, MCP integrations, and deployment.
The `/agent-factory:build` command runs a guided workflow covering
all major agent system capabilities.
## Plugin structure
- `commands/` — 4 user-invoked slash commands
- `build.md` — Guided build workflow (10 phases including memory, governance, integrations)
- `deploy.md` — Configure deployment (5 targets: cloud, desktop, cron/launchd, systemd, Docker)
- `evaluate.md` — Score system against 22 agent capabilities
- `status.md` — Quick health check of all infrastructure
- `agents/` — 2 plugin agents
- `builder.md` — Guided build orchestrator
- `deployment-advisor.md` — Deployment target recommendation agent
- `skills/` — 2 auto-triggering knowledge skills
- `agent-system-design/` — 22-capability feature map, pipeline patterns, security, deployment, memory, autonomy, orchestration, governance, MCP references
- `managed-agents/` — Anthropic API managed agents patterns and SDK examples
- `scripts/templates/` — File templates the builder copies into the user's project
## Template directories
All templates are in `scripts/templates/`:
| Directory | Contents |
|-----------|----------|
| `memory/` | 3-tier memory: SESSION-STATE.md, DAILY-LOG.md, MEMORY.md |
| `heartbeat/` | Heartbeat runner, HEARTBEAT.md, context-packet.md, wake-prompt.md |
| `proactive/` | PROACTIVE-AGENT.md, ADL-RULES.md, VFM-SCORING.md |
| `cron/` | agent-turn.sh (isolated agentTurn), system-event.sh |
| `goals/` | GOALS.md goal hierarchy, goal-tracker.sh |
| `budget/` | BUDGET.md policy, budget-hook.sh, budget-report.sh |
| `governance/` | GOVERNANCE.md, approval-gate.sh |
| `org-chart/` | ORG-CHART.md, org-manager.sh |
| `docker/` | Dockerfile, docker-compose.yml, docker-entrypoint.sh |
| `domains/` | 10 domain pipeline templates (content-pipeline, code-review, monitoring, research-synthesis, data-processing, customer-support, devops-automation, legal-review, sales-intelligence, security-audit) |
| `transfer/` | export-system.sh, import-system.sh, MANIFEST.md |
| `feedback/` | FEEDBACK.md, feedback-collector.sh, performance-scorer.sh |
| `optimization/` | pipeline-optimizer.sh, self-healing.sh |
## Rules
- Never write files outside the user's project directory
- Always ask before overwriting existing files
- Hook templates must be bash 3.2 compatible (Intel Mac)
- Generated agents must have valid YAML frontmatter with `<example>` blocks
- Use `${CLAUDE_PLUGIN_ROOT}` for all intra-plugin paths
- Domain templates use `{{PLACEHOLDER}}` syntax (plain string replace, no engine)
```
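The `{{PLACEHOLDER}}` rule above needs no engine; a sketch of plain substitution that satisfies it (file and key names are illustrative):

```bash
# Sketch: plain string replacement for {{PLACEHOLDER}} templates, no engine.
# File and key names are illustrative.
render_template() {  # usage: render_template <template-file> KEY=VALUE ...
  src=$(cat "$1"); shift
  for kv in "$@"; do
    key=${kv%%=*}; val=${kv#*=}
    src=$(printf '%s' "$src" | sed "s|{{${key}}}|${val}|g")
  done
  printf '%s\n' "$src"
}
```

The `|` delimiter in `sed` keeps values containing `/` (such as paths) from breaking the expression.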
**`README.md`** — Complete rewrite with:
```markdown
# Agent Factory
> Install one plugin. Build complete agent systems that do real work.
Agent Factory is a Claude Code plugin that guides you through building
autonomous agent systems from scratch — covering every layer from
agents and pipelines to memory, heartbeat scheduling, governance, and
deployment.
**Recommended installation:** via [ktg-plugin-marketplace](https://git.fromaitochitta.com/open/ktg-plugin-marketplace), which bundles Agent Factory with the ultra-suite (ultraplan, ultraresearch, ultraexecute).
```bash
# Install from marketplace (recommended)
/install ktg-plugin-marketplace
# Or install standalone
/install agent-factory
# or
claude --plugin-dir ./agent-factory
```
---
## Commands
| Command | What it does |
|---------|-------------|
| `/agent-factory:build` | Guided build workflow — 10 phases from blank to deployed system |
| `/agent-factory:deploy` | Configure deployment for any target (cloud, local, VPS, Docker) |
| `/agent-factory:evaluate` | Score your system against all 22 agent capabilities |
| `/agent-factory:status` | Quick health check: agents, skills, hooks, deployment, memory |
---
## What it builds
A complete autonomous agent system with:
**Core infrastructure**
- Agent files (`.claude/agents/*.md`) with reliable trigger patterns
- Pipeline skill that chains agents end to end
- Pre-tool-use and post-tool-use security hooks
- Settings with granular permission allow/deny rules
**Memory (OpenClaw pattern)**
- **Session state** (hot) — Working memory with WAL protocol. Written before responding; survives crashes.
- **Daily logs** (warm) — One file per day; captures decisions and carry-forward items.
- **Long-term memory** (cold) — Curated knowledge that persists indefinitely.
**Heartbeat scheduling (OpenClaw + Paperclip)**
- `HEARTBEAT.md` — Defines scheduled tasks and intervals
- `heartbeat-runner.sh` — Bash 3.2 script with emptiness detection (no API call if nothing to do), startup catchup (catches up to 5 missed tasks after downtime), and task state tracking
- Context injection — `context-packet.md` + `wake-prompt.md` give agents their full state on each wakeup
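The emptiness check is deliberately cheap. An illustrative sketch (the checkbox format is an assumption; the shipped `heartbeat-runner.sh` may differ):

```bash
# Illustrative emptiness detection: wake the agent only when the heartbeat
# file lists an unchecked "- [ ]" task. Checkbox format is an assumption.
has_due_tasks() {
  grep -q '^- \[ \]' "$1" 2>/dev/null
}
if has_due_tasks HEARTBEAT.md; then
  echo "heartbeat: tasks due, waking agent"
else
  echo "heartbeat: nothing due, skipping API call"
fi
```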
**Proactive agents (OpenClaw)**
- Anti-Drift Limits (ADL) — Prevents agents from faking capabilities, expanding scope without approval, or making unverifiable changes
- Value-First Modification (VFM) — Requires proposed self-improvements to score >50/100 before implementation
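An illustrative VFM gate; the weighting below is a stand-in, not the shipped `VFM-SCORING.md` formula:

```bash
# Illustrative VFM gate: weights are a stand-in for the real scoring template.
vfm_score() {  # vfm_score <value 0-10> <impact 0-10> <risk 0-10> <effort 0-10>
  echo $(( ($1 + $2) * 5 - $3 * 2 - $4 ))
}
score=$(vfm_score 8 7 2 5)
if [ "$score" -gt 50 ]; then
  echo "implement (score $score)"
else
  echo "reject (score $score)"
fi
```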
**Goal hierarchy (Paperclip)**
- `GOALS.md` — Three-level goal tree: company → project → task
- Simple `parent_id` references (not recursive traversal — matches actual Paperclip implementation)
- Goal IDs in agent prompts for alignment
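Because each goal carries only a `parent_id`, resolving a task's chain is a short upward walk. A sketch over an assumed flat `id|parent|title` layout:

```bash
# Sketch: walk parent_id references upward. The flat "id|parent|title"
# layout is an assumption about how GOALS.md rows could be tabulated.
GOALS="G1||Grow the newsletter
G1.1|G1|Ship weekly digest pipeline
G1.1.3|G1.1|Draft Friday issue"

goal_chain() {
  id=$1; chain=$id
  while :; do
    parent=$(printf '%s\n' "$GOALS" | awk -F'|' -v i="$id" '$1==i{print $2}')
    [ -z "$parent" ] && break
    chain="$parent > $chain"
    id=$parent
  done
  printf '%s\n' "$chain"
}
```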
**Budget tracking (Paperclip)**
- `BUDGET.md` — Monthly budget policy per agent and per project
- `budget-hook.sh` — PostToolUse hook that logs cost events, warns at 80%, pauses at 100%
- `budget-report.sh` — Per-agent cost breakdown and projection
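The warn/pause thresholds reduce to integer arithmetic. An illustrative check (integer cents avoid floating point in shell):

```bash
# Illustrative threshold check: warn at 80% of budget, pause at 100%.
# Integer cents avoid floating point in shell arithmetic.
budget_status() {  # budget_status <spent_cents> <budget_cents>
  pct=$(( $1 * 100 / $2 ))
  if [ "$pct" -ge 100 ]; then echo "PAUSE"
  elif [ "$pct" -ge 80 ]; then echo "WARN"
  else echo "OK"
  fi
}
```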
**Governance (Paperclip)**
- Autonomy levels 0-4 (from "all calls need approval" to "full autonomy with hook guardrails")
- Approval gates — specific operations that always pause for human review
- `approval-gate.sh` — PreToolUse hook implementing the gate logic
- Audit trail: tool calls, budget events, approval decisions
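A gate combines autonomy level with operation risk. Sketched below; the risky-tool list is illustrative, and `approval-gate.sh` defines the real rules:

```bash
# Illustrative gate: which calls pause for review at a given autonomy level.
# The risky-tool list is a stand-in for the real approval-gate.sh rules.
gate() {  # gate <autonomy 0-4> <tool>
  case "$2" in Bash|Write|Edit) risky=1 ;; *) risky=0 ;; esac
  if [ "$1" -le 1 ] || { [ "$risky" -eq 1 ] && [ "$1" -le 2 ]; }; then
    echo "ASK_HUMAN"
  else
    echo "ALLOW"
  fi
}
```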
**Org chart (Paperclip)**
- `ORG-CHART.md` — Agent hierarchy with `reportsTo` references
- Delegation rules, cross-team routing, human override authority
- `org-manager.sh` — Validates hierarchy, generates org tree visualization
**MCP integrations**
- `.mcp.json` configuration for Slack, GitHub, Playwright, databases
- Pause/resume build state when MCP server creation is needed
- Trade-off guidance: MCP vs Bash vs custom server
**10 domain templates**
Start from a pre-built system rather than blank:
| Template | Domain | Agents |
|----------|--------|--------|
| content-pipeline | Articles, newsletters, reports | researcher, writer, reviewer |
| code-review | Automated PR review | analyzer, review-writer, standards-checker |
| monitoring | System/service monitoring | monitor-checker, incident-reporter, remediation-advisor |
| research-synthesis | Research and analysis | source-gatherer, synthesizer, fact-checker |
| data-processing | Data transformation | data-validator, transformer, quality-checker |
| customer-support | Ticket handling | classifier, response-drafter, escalation-checker |
| devops-automation | Deployment and incidents | deploy-checker, incident-detector, runbook-executor |
| legal-review | Document review | clause-extractor, risk-assessor, compliance-checker |
| sales-intelligence | Prospect research | prospect-researcher, pitch-customizer, follow-up-tracker |
| security-audit | Security posture | config-scanner, vulnerability-checker, remediation-advisor |
**Deployment (5 targets)**
| Target | Needs | Local files | Best for |
|--------|-------|-------------|----------|
| `/schedule` (cloud) | GitHub repo | No | Repo maintenance, PR review, CI triage |
| Desktop tasks | Desktop app | Yes | Local automation while you work |
| cron/launchd | Machine on | Yes | Personal daily pipelines |
| VPS systemd | Linux server | Yes | Team pipelines, production |
| Docker | Docker installed | Via volumes | Isolated, portable, reproducible |
**Import/export**
- `export-system.sh` — Package your agent system into a versioned tarball with MANIFEST.md
- `import-system.sh` — Import a system tarball, check for conflicts, validate all components
- MANIFEST.md — Component list with checksums
**Feedback and self-learning**
- `FEEDBACK.md` — Records pipeline scores and recurring issues
- `feedback-collector.sh` — PostToolUse hook that detects patterns after 3+ occurrences
- `performance-scorer.sh` — Per-agent metrics and improvement trends
- `pipeline-optimizer.sh` — VFM-scored recommendations for bottleneck agents
- `self-healing.sh` — Categorized error recovery with backoff and max-attempt limits
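The 3-occurrence rule is a simple counter. An illustrative pass over an assumed one-line-per-event log format:

```bash
# Illustrative pattern detector: flag issues seen 3+ times.
# The "issue: <slug>" log format is an assumption, not the shipped one.
recurring_issues() {
  awk -F': ' '/^issue:/ { n[$2]++ } END { for (k in n) if (n[k] >= 3) print k, n[k] }'
}
```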
---
## Quick start
```bash
# Install and run
/install agent-factory
/agent-factory:build
# Start from a domain template
/agent-factory:build monitoring
# Resume an interrupted build
/agent-factory:build --resume
# Evaluate what you've built
/agent-factory:evaluate
# Check system health
/agent-factory:status
```
---
## Architecture overview
```
Phase 0: Choose template (or custom)
Phase 1: Map your work — pipeline design sketch
Phase 2: Operating manual — CLAUDE.md
Phase 2.5: Memory setup — 3-tier memory
Phase 3: Agent team — .claude/agents/*.md
Phase 3.5: Skills — wire up or generate skill files
Phase 3.7: Proactive agent — ADL/VFM guardrails
Phase 4: Pipeline — .claude/skills/[name].md
Phase 4.5: Integrations — .mcp.json + MCP servers
Phase 5: Security — hooks + settings.json
Phase 5.5: Governance — GOVERNANCE.md + autonomy level
Phase 5.7: Goals and org chart — for 3+ agent systems
Phase 6: Deployment — choose and configure target
Phase 7: Test and iterate — feedback loop setup
```
---
## Pattern reference
Patterns implemented by this plugin are documented in the skills:
- **22 agent capabilities**`skills/agent-system-design/references/feature-map.md`
- **Memory patterns**`skills/agent-system-design/references/memory-patterns.md`
- **Autonomy patterns**`skills/agent-system-design/references/autonomy-patterns.md`
- **Orchestration patterns**`skills/agent-system-design/references/orchestration-patterns.md`
- **Governance patterns**`skills/agent-system-design/references/governance-patterns.md`
- **MCP integrations**`skills/agent-system-design/references/mcp-integrations.md`
- **Managed Agents (API)**`skills/managed-agents/SKILL.md`
---
## Version history
| Version | What changed |
|---------|-------------|
| 1.0.0 | Full vision: 10 templates, memory, heartbeat, governance, budget, org-chart, Docker, import/export |
| 0.2.0 | Rename to agent-factory, add deploy/evaluate/status commands, managed-agents skill, 5 domain templates |
| 0.1.0 | Initial release: build command, builder agent, agent-system-design skill, 3 hook templates |
---
## Value proposition
Install one plugin. Build complete agent systems that do real work. No servers, no databases, no configuration files beyond what the agents themselves manage. Everything runs in Claude Code with shell scripts you can read, edit, and understand.
```
### Verify
```bash
python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'; print('OK')"
```
Expected: `OK`
Also verify CLAUDE.md lists all 13 template directories:
```bash
grep -c "memory/\|heartbeat/\|proactive/\|cron/\|goals/\|budget/\|governance/\|org-chart/\|docker/\|domains/\|transfer/\|feedback/\|optimization/" /Users/ktg/repos/agent-builder/CLAUDE.md
```
Expected: `13`
Also verify README.md has the value proposition line:
```bash
grep -c "No servers, no databases, no configuration" /Users/ktg/repos/agent-builder/README.md
```
Expected: `1`
### On failure
revert — manifest must be valid JSON with correct version
### Checkpoint
```bash
git commit -m "feat: Agent Factory v1.0.0 — full vision realized"
```
---
## Exit Condition
- [ ] `grep -c "agent-factory" /Users/ktg/repos/agent-builder/commands/build.md``>= 3`
- [ ] `wc -l /Users/ktg/repos/agent-builder/commands/build.md | awk '{print ($1 >= 600)}'``1`
- [ ] `grep -c "build-state.json" /Users/ktg/repos/agent-builder/commands/build.md``>= 3`
- [ ] `grep -c "Phase 2.5\|Phase 3.5\|Phase 3.7\|Phase 4.5\|Phase 5.5\|Phase 5.7" /Users/ktg/repos/agent-builder/commands/build.md``6`
- [ ] `python3 -c "import json; d=json.load(open('/Users/ktg/repos/agent-builder/.claude-plugin/plugin.json')); assert d['version']=='1.0.0'; assert d['name']=='agent-factory'"` → no error
- [ ] `grep -c "memory/\|heartbeat/\|proactive/\|cron/\|goals/\|budget/\|governance/\|org-chart/\|docker/\|domains/\|transfer/\|feedback/\|optimization/" /Users/ktg/repos/agent-builder/CLAUDE.md``13`
- [ ] `grep -c "No servers, no databases, no configuration" /Users/ktg/repos/agent-builder/README.md``1`
- [ ] `grep -c "1.0.0" /Users/ktg/repos/agent-builder/README.md``>= 1`
## Quality Criteria
- `commands/build.md` has all 6 new phases (2.5, 3.5, 3.7, 4.5, 5.5, 5.7) in correct phase-number order
- Phase 0 domain template selection asks by name and reads the correct template file via `${CLAUDE_PLUGIN_ROOT}/scripts/templates/domains/{{TEMPLATE}}.md`
- Phase 3.5 pause/resume writes and reads `build-state.json` with the exact schema specified
- Phase 4.5 covers all 3 options for missing MCP servers: Bash fallback, community search, custom build with pause
- Phase 5.5 governance includes both GOVERNANCE.md generation and the optional budget tracking setup
- Phase 5.7 is conditional: only shown for systems with 3+ agents
- Phase 6 deployment table shows all 5 targets with `Local files` and `Minimum interval` columns
- Resume logic (`--resume` in $ARGUMENTS) reads build-state.json and skips completed phases
- `plugin.json` version is exactly `"1.0.0"`, name is `"agent-factory"`, has all 15 keywords including `"memory"`, `"heartbeat"`, `"budget"`, `"governance"`, `"org-chart"`, `"templates"`, `"import"`, `"export"`
- `CLAUDE.md` lists all 13 template directories in the Template directories table
- `README.md` command table has all 4 commands
- `README.md` deployment table has all 5 targets with trade-offs
- `README.md` domain template table has all 10 templates
- `README.md` ends with the value proposition: "No servers, no databases, no configuration..."
- Steps 8 and 26 are composable: Step 26 builds on Step 8's Phase 0 and Phase 6 changes without reverting them

View file

| Low | Docker template may need updates for newer Claude Code versions | `scripts/templates/docker/` | Docker deployment breaks | Dockerfile uses `node:22-slim` and `npm install -g` which auto-updates |
| Low | Domain templates may not match all user domains | `scripts/templates/domains/` | Users need custom templates | 10 templates cover common domains; builder agent can create custom templates |
## Assumptions (VERIFIED 2026-04-11)
| # | Assumption | Status | Finding | Impact on plan |
|---|-----------|--------|---------|----------------|
| 1 | Anthropic billing API exists and is accessible with standard API key | **PARTIAL** | Usage & Cost API exists at `/v1/organizations/usage_report/messages` and `/v1/organizations/cost_report`. BUT requires Admin API key (`sk-ant-admin...`), not standard API key. Only available for organizations, not individual accounts. CLI has `--max-budget-usd` flag for print-mode budget caps. | budget-hook.sh MUST use token-count estimation as primary method. Admin API integration is optional, org-only. Template README should document this limitation clearly. CLI `--max-budget-usd` is a simpler alternative for headless runs. |
| 2 | `claude --resume` with custom session IDs works for isolated agent turns | **WRONG** | `--session-id` requires a valid UUID. Custom formats like `agent:name:turn:123` are rejected. `--name` flag sets human-readable session names. `--resume` accepts session names for fuzzy search. | agent-turn.sh template MUST use `--name "agent:{{AGENT_NAME}}:turn"` + `--resume "agent:{{AGENT_NAME}}:turn"` pattern instead of custom session IDs. For deterministic UUIDs: `python3 -c "import uuid; print(uuid.uuid5(uuid.NAMESPACE_DNS, 'agent:name:task'))"`. |
| 3 | `/schedule` trigger interface is stable enough to build on | **CONFIRMED with caveats** | `/schedule` is GA, available to Pro/Max/Team/Enterprise. Runs on Anthropic cloud from GitHub clone (NOT local files). 1-hour minimum interval. Desktop scheduled tasks (`/schedule` in Desktop app) are the local alternative with 1-minute minimum and local file access. MCP connectors available for cloud tasks. | Deploy command must distinguish: `/schedule` = cloud (needs GitHub, no local files), Desktop scheduled tasks = local machine, cron/launchd/systemd = traditional local. Heartbeat templates are for local/VPS/Docker only; `/schedule` has its own mechanism. |
| 4 | Docker deployment should use docker-compose.yml with Dockerfile | **CONFIRMED** | Well-established pattern. Official Anthropic devcontainer reference exists. Docker sandboxes documented for safe autonomous operation. `--dangerously-skip-permissions` for unattended operation inside containers. Key security: `security_opt: no-new-privileges:true`, never mount Docker socket, mount only specific project folders. | docker-compose.yml template should add `security_opt: [no-new-privileges:true]`. Dockerfile approach is correct. Add Docker socket warning to README. |
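Finding #2's workaround can be scripted directly; a sketch deriving a stable UUID per agent task (agent and task names are placeholders):

```bash
# Sketch: derive a stable session UUID per agent task, since --session-id
# rejects non-UUID strings (finding #2). Agent/task names are placeholders.
session_uuid() {
  python3 -c 'import uuid, sys; print(uuid.uuid5(uuid.NAMESPACE_DNS, sys.argv[1]))' "$1"
}
SID=$(session_uuid "agent:monitor-checker:turn")
echo "$SID"
# claude -p "run one turn" --session-id "$SID"   # same name -> same UUID each run
```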
**All 4 assumptions verified. 1 was wrong (session IDs), 1 was partial (billing API), 2 confirmed.**
Sources: [CLI Reference](https://code.claude.com/docs/en/cli-reference), [Scheduled Tasks](https://code.claude.com/docs/en/web-scheduled-tasks), [Usage & Cost API](https://platform.claude.com/docs/en/build-with-claude/usage-cost-api), [Docker Blog](https://www.docker.com/blog/docker-sandboxes-run-claude-code-and-other-coding-agents-unsupervised-but-safely/)
## Verification