agent-builder/.claude/plans/blueprints/session-5-selflearning.md
Kjell Tore Guttormsen 1a776bdeb2 docs(plans): create session blueprints for Agent Factory execution
8 session blueprints covering all 27 steps across 3 waves:
- Session 1: Foundation (rename + commands, Steps 1-5)
- Session 2: Skills and templates (Steps 6-7)
- Session 3: OpenClaw patterns (memory/heartbeat/proactive/cron, Steps 9-12)
- Session 4: Paperclip patterns (context/goals/budget/governance/org-chart, Steps 14-18)
- Session 5: Self-learning (feedback/optimization, Steps 20-21)
- Session 6: Integration (Docker/transfer/5 more domains, Steps 22-24)
- Session 7: Skill updates (memory/autonomy/orchestration/governance/MCP refs, Steps 13,19,25)
- Session 8: Finalization (build command integration + v1.0, Steps 8,26,27)

Also updates plan assumptions table with verified findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 11:21:17 +02:00


# Session 5: Self-Learning Systems
> Steps 20, 21 | Wave 1 | Depends on: none
## Dependencies
Entry condition: none (independent — creates new template directories only)
## Scope Fence
**Touch:**
- `scripts/templates/feedback/FEEDBACK.md` (new)
- `scripts/templates/feedback/feedback-collector.sh` (new)
- `scripts/templates/feedback/performance-scorer.sh` (new)
- `scripts/templates/feedback/README.md` (new)
- `scripts/templates/optimization/pipeline-optimizer.sh` (new)
- `scripts/templates/optimization/self-healing.sh` (new)
- `scripts/templates/optimization/README.md` (new)
**Never touch:**
- `commands/`
- `agents/`
- `skills/`
- `scripts/templates/heartbeat/`
- `scripts/templates/memory/`
- `scripts/templates/proactive/`
- `scripts/templates/cron/`
- `scripts/templates/goals/`
- `scripts/templates/budget/`
- `scripts/templates/governance/`
- `scripts/templates/org-chart/`
- `.claude-plugin/`, `CLAUDE.md`, `README.md`
---
## Step 20: Create feedback loop templates
### Files to create
**`scripts/templates/feedback/FEEDBACK.md`** — Feedback tracking file:
```markdown
# Feedback Log: {{PROJECT_NAME}}
> Append-only. One row per pipeline run. Reviewed by performance-scorer.sh.
## Feedback Table
| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
|------|----------|-------|-------|-------|------------|---------|
| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |
## Pattern Tags
Use consistent tags so performance-scorer.sh can detect recurring issues:
- `quality-low` — output below acceptance threshold
- `loop-excess` — more revision iterations than expected
- `timeout` — agent exceeded time budget
- `tool-fail` — tool call failed or returned unexpected result
- `cost-spike` — single run cost exceeded 3x average
- `scope-drift` — agent worked outside defined scope
- `hallucination` — output contained factual errors
## Notes
Scores are 0-100 as assigned by the reviewer agent or human reviewer.
A score below 60 triggers a flag in performance-scorer.sh.
Three or more rows with the same Pattern tag = recurring issue.
Recurring issues should drive prompt iteration or pipeline redesign.
```
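The "three or more rows with the same Pattern tag" rule can be sketched in a few lines of Python. This is an illustration only, not one of the files to create: the function name `recurring_patterns` and the sample rows are assumptions, and the parsing simply mirrors the 7-column table defined above.

```python
from collections import Counter

def recurring_patterns(markdown_text, min_rows=3):
    """Return {tag: count} for Pattern tags found in min_rows or more data rows."""
    counts = Counter()
    for line in markdown_text.splitlines():
        line = line.strip()
        # Skip non-table lines, the separator row, and the template placeholder row.
        if not line.startswith('|') or line.startswith('|---') or '{{' in line:
            continue
        cols = [c.strip() for c in line.strip('|').split('|')]
        # Skip the header row and anything that is not a 7-column data row.
        if len(cols) != 7 or cols[0] == 'Date':
            continue
        if cols[6]:
            counts[cols[6]] += 1
    return {tag: n for tag, n in counts.items() if n >= min_rows}

# Hypothetical sample data for illustration.
sample = """\
| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |
|------|----------|-------|-------|-------|------------|---------|
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | n/a | quality-low |
| 2025-01-12 | doc-pipeline | agent-reviewer | 70/100 | Slow review | n/a | timeout |
"""

print(recurring_patterns(sample))
```

With this sample only `quality-low` reaches the threshold; the single `timeout` row is not reported.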
**`scripts/templates/feedback/feedback-collector.sh`** — PostToolUse hook variant that appends feedback after pipeline completion:
```bash
#!/bin/bash
# PostToolUse hook: Collect feedback after pipeline completion.
# Bash 3.2 compatible. Uses python3 for JSON parsing and CSV/MD append.
#
# Triggered after a designated "review" tool call completes.
# Reads pipeline output and reviewer score, appends to FEEDBACK.md,
# and detects recurring patterns (3+ rows with same tag = recurring).
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
# {{PIPELINE_NAME}} - name of the pipeline being tracked
# {{SCORE_THRESHOLD}} - minimum acceptable score (default: 60)
WORKING_DIR="{{WORKING_DIR}}"
PIPELINE_NAME="{{PIPELINE_NAME}}"
SCORE_THRESHOLD="${SCORE_THRESHOLD:-60}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
HOOK_INPUT=$(cat)
# Only act on review tool calls
TOOL_NAME=$(echo "$HOOK_INPUT" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    print(data.get('tool_name', ''))
except Exception:
    print('')
" 2>/dev/null)
if [ "$TOOL_NAME" != "review_pipeline" ] && [ "$TOOL_NAME" != "score_output" ]; then
    exit 0
fi
# Extract score, agent, issue, resolution, pattern from hook input.
# Values travel through the environment and the heredoc delimiter is quoted,
# so quotes, backslashes, or "$" in the hook JSON cannot alter the Python source.
HOOK_INPUT="$HOOK_INPUT" FEEDBACK_FILE="$FEEDBACK_FILE" \
PIPELINE_NAME="$PIPELINE_NAME" SCORE_THRESHOLD="$SCORE_THRESHOLD" \
python3 << 'PYEOF'
import sys, json, re, os
from datetime import datetime, timezone

hook_input = os.environ.get('HOOK_INPUT', '')
feedback_file = os.environ.get('FEEDBACK_FILE', '')
pipeline_name = os.environ.get('PIPELINE_NAME', '')
score_threshold = int(os.environ.get('SCORE_THRESHOLD', '60'))

try:
    data = json.loads(hook_input)
except Exception:
    sys.exit(0)
if not isinstance(data, dict):
    sys.exit(0)

tool_result = data.get('tool_result', '')
if not isinstance(tool_result, str):
    tool_result = json.dumps(tool_result)

# Parse structured fields from tool result (expects JSON or key:value)
agent_name = os.environ.get('AGENT_NAME', 'unknown')
score = 0
issue = ''
resolution = ''
pattern = ''
try:
    result_data = json.loads(tool_result)
    agent_name = result_data.get('agent', agent_name)
    score = int(result_data.get('score', 0))
    issue = result_data.get('issue', '')
    resolution = result_data.get('resolution', '')
    pattern = result_data.get('pattern', '')
except Exception:
    # Fallback: look for "score: N" and "pattern: tag" in plain text
    m = re.search(r'score[:\s]+(\d+)', tool_result, re.IGNORECASE)
    if m:
        score = int(m.group(1))
    m = re.search(r'pattern[:\s]+(\S+)', tool_result, re.IGNORECASE)
    if m:
        pattern = m.group(1)

if score == 0 and not issue:
    sys.exit(0)

def cell(value):
    # Keep the table intact: no pipes or newlines inside a cell.
    return str(value).replace('|', '/').replace('\n', ' ').strip()

agent_name, issue, resolution, pattern = (cell(v) for v in (agent_name, issue, resolution, pattern))
date_str = datetime.now(timezone.utc).strftime('%Y-%m-%d')
row = f"| {date_str} | {pipeline_name} | {agent_name} | {score}/100 | {issue} | {resolution} | {pattern} |"

# Append to feedback table
if not os.path.exists(feedback_file):
    print(f"Warning: {feedback_file} not found; skipping feedback append")
    sys.exit(0)
with open(feedback_file, 'r') as f:
    content = f.read()

# Insert the row into the table
table_header = '| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |'
separator = '|------|----------|-------|-------|-------|------------|---------|'
placeholder_row = '| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |'
if placeholder_row in content:
    # Put the real row above the placeholder and keep the placeholder for next time
    content = content.replace(placeholder_row, row + '\n' + placeholder_row)
elif separator in content:
    content = content.replace(separator, separator + '\n' + row)
else:
    content += '\n' + row + '\n'
with open(feedback_file, 'w') as f:
    f.write(content)
print(f"Feedback recorded: score={score}, pattern={pattern}")

# Detect recurring patterns (the count includes the row just written)
if pattern:
    pattern_count = content.count(f'| {pattern} |')
    if pattern_count >= 3:
        print(f"RECURRING PATTERN DETECTED: '{pattern}' appears {pattern_count} times")
        print(f"Action required: review prompt or pipeline for '{pipeline_name}'")

# Flag low scores
if 0 < score < score_threshold:
    print(f"LOW SCORE: {score} < threshold {score_threshold} for agent {agent_name}")
PYEOF
exit 0
```
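The collector's two-stage parsing (structured JSON first, then a plain-text fallback) is easier to reason about in isolation. The sketch below reuses the same regular expressions as the script above; the function name `extract_review` and the sample inputs are assumptions made for illustration.

```python
import json
import re

def extract_review(tool_result):
    """Return (score, pattern) from a reviewer result: JSON first, then plain text."""
    if not isinstance(tool_result, str):
        tool_result = json.dumps(tool_result)
    try:
        data = json.loads(tool_result)
        return int(data.get('score', 0)), data.get('pattern', '')
    except Exception:
        # Not a JSON object: fall back to "score: N" / "pattern: tag" in free text.
        score, pattern = 0, ''
        m = re.search(r'score[:\s]+(\d+)', tool_result, re.IGNORECASE)
        if m:
            score = int(m.group(1))
        m = re.search(r'pattern[:\s]+(\S+)', tool_result, re.IGNORECASE)
        if m:
            pattern = m.group(1)
        return score, pattern

print(extract_review('{"score": 72, "pattern": "timeout"}'))
print(extract_review('Score: 45, pattern: quality-low'))
print(extract_review('no review here'))
```

A result with no recognizable score comes back as `(0, '')`, which is the case the collector treats as "nothing to record" when there is also no issue text.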
**`scripts/templates/feedback/performance-scorer.sh`** — Standalone scoring script that reads FEEDBACK.md and cost-events.jsonl:
```bash
#!/bin/bash
# Performance scorer: per-agent metrics from FEEDBACK.md + cost-events.jsonl.
# Bash 3.2 compatible. Uses python3 for all metrics computation.
#
# Metrics per agent:
# - Average score (0-100)
# - Error rate (rows with score < threshold / total rows)
# - Cost per run (from cost-events.jsonl, rough proxy)
# - Improvement trend: avg of last 10 scores vs. previous 10
#
# Flags agents below threshold (default 60/100).
#
# Usage:
# ./performance-scorer.sh # Score all agents
# ./performance-scorer.sh --agent {{AGENT}} # Score specific agent
# ./performance-scorer.sh --threshold 70 # Custom threshold
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
THRESHOLD=60  # default; overridden by --threshold below
AGENT_FILTER=""
# Parse arguments (bash 3.2 compatible, no associative arrays)
while [ "$#" -gt 0 ]; do
    case "$1" in
        --agent) AGENT_FILTER="$2"; shift 2 ;;
        --threshold) THRESHOLD="$2"; shift 2 ;;
        *) shift ;;
    esac
done
if [ ! -f "$FEEDBACK_FILE" ]; then
    echo "No feedback file found at $FEEDBACK_FILE"
    exit 0
fi
# Values are passed through the environment; the quoted heredoc keeps the
# shell from expanding anything inside the Python source.
FEEDBACK_FILE="$FEEDBACK_FILE" COST_LOG="$COST_LOG" \
THRESHOLD="$THRESHOLD" AGENT_FILTER="$AGENT_FILTER" \
python3 << 'PYEOF'
import re, json, os, sys
from collections import defaultdict

feedback_file = os.environ['FEEDBACK_FILE']
cost_log = os.environ['COST_LOG']
try:
    threshold = int(os.environ.get('THRESHOLD', '60'))
except ValueError:
    print("--threshold must be an integer")
    sys.exit(1)
agent_filter = os.environ.get('AGENT_FILTER', '')

# Parse FEEDBACK.md table rows
# Expected columns: Date, Pipeline, Agent, Score, Issue, Resolution, Pattern
feedback_rows = []
with open(feedback_file) as f:
    in_table = False
    for line in f:
        line = line.strip()
        if '| Date |' in line:
            in_table = True
            continue
        if in_table and line.startswith('|---'):
            continue
        # Skip the template placeholder row (it still contains {{...}})
        if in_table and line.startswith('|') and '{{' not in line:
            cols = [c.strip() for c in line.strip('|').split('|')]
            if len(cols) >= 7:
                # Parse score: "75/100" or "75"
                score_m = re.match(r'(\d+)', cols[3])
                feedback_rows.append({
                    'date': cols[0],
                    'pipeline': cols[1],
                    'agent': cols[2],
                    'score': int(score_m.group(1)) if score_m else 0,
                    'issue': cols[4],
                    'pattern': cols[6],
                })

# Filter by agent if specified
if agent_filter:
    feedback_rows = [r for r in feedback_rows if r['agent'] == agent_filter]
if not feedback_rows:
    print("No feedback rows found.")
    sys.exit(0)

# Read cost events if available
cost_by_agent = defaultdict(int)
if os.path.exists(cost_log):
    with open(cost_log) as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    event = json.loads(line)
                    agent = event.get('agent', 'unknown')
                    cost_by_agent[agent] += 1  # event count as proxy
                except Exception:
                    pass

# Compute per-agent metrics
agents = sorted(set(r['agent'] for r in feedback_rows))
print("PERFORMANCE SCORECARD")
print("=" * 60)
print(f"Threshold: {threshold}/100")
print(f"Total feedback rows: {len(feedback_rows)}")
print()
flagged = []
for agent in agents:
    rows = [r for r in feedback_rows if r['agent'] == agent]
    scores = [r['score'] for r in rows]  # file order, assumed oldest first
    avg_score = sum(scores) / len(scores)
    error_rate = len([s for s in scores if s < threshold]) / len(scores)
    cost_per_run = cost_by_agent.get(agent, 0) / len(rows)

    # Improvement trend: last 10 vs. previous 10
    trend_str = "n/a (fewer than 20 runs)"
    if len(scores) >= 20:
        prev10 = scores[-20:-10]
        last10 = scores[-10:]
        delta = sum(last10) / len(last10) - sum(prev10) / len(prev10)
        if delta > 5:
            trend_str = f"improving (+{delta:.1f})"
        elif delta < -5:
            trend_str = f"declining ({delta:.1f})"
        else:
            trend_str = f"stable ({delta:+.1f})"
    elif len(scores) >= 10:
        last10 = scores[-10:]
        trend_str = f"recent avg: {sum(last10)/len(last10):.1f} (need 20 runs for trend)"

    # Pattern frequency
    patterns = defaultdict(int)
    for r in rows:
        if r['pattern']:
            patterns[r['pattern']] += 1
    top_patterns = sorted(patterns.items(), key=lambda x: -x[1])[:3]

    print(f"Agent: {agent}")
    print(f"  Runs: {len(rows)}")
    print(f"  Avg score: {avg_score:.1f}/100")
    print(f"  Error rate: {error_rate*100:.0f}% (score < {threshold})")
    print(f"  Cost/run: ~{cost_per_run:.1f} events (rough proxy)")
    print(f"  Trend: {trend_str}")
    if top_patterns:
        summary = ', '.join(f"{p}({c})" for p, c in top_patterns)
        print(f"  Top patterns: {summary}")
    print()
    if avg_score < threshold:
        flagged.append((agent, avg_score))

# Summary of flagged agents
if flagged:
    print("FLAGGED AGENTS (below threshold)")
    print("-" * 40)
    for agent, avg in flagged:
        print(f"  {agent}: avg {avg:.1f} < {threshold}")
    print()
    print("Recommended actions:")
    print("  1. Review feedback rows for top patterns")
    print("  2. Iterate on agent system prompt")
    print("  3. Consider pipeline redesign if pattern is structural")
    print("  4. Run pipeline-optimizer.sh for bottleneck analysis")
else:
    print("All agents above threshold.")
PYEOF
```
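The improvement trend is the least obvious metric in the scorer, so here is a minimal standalone sketch of it. The function name `trend` and its `window`/`tolerance` parameters are assumptions for illustration; the numbers (10-run windows, a 5-point band for "stable") follow the script above.

```python
def trend(scores, window=10, tolerance=5):
    """Compare the mean of the last `window` scores with the `window` before it.

    Returns None when fewer than 2 * window scores exist, otherwise
    (label, delta) where label is 'improving', 'declining', or 'stable'.
    """
    if len(scores) < 2 * window:
        return None
    previous = scores[-2 * window:-window]
    latest = scores[-window:]
    delta = sum(latest) / window - sum(previous) / window
    if delta > tolerance:
        label = 'improving'
    elif delta < -tolerance:
        label = 'declining'
    else:
        label = 'stable'
    return label, round(delta, 1)

print(trend([50] * 10 + [70] * 10))
print(trend([60] * 10 + [62] * 10))
print(trend([50] * 19))
```

The comparison only makes sense if the scores are in chronological order, oldest first. That holds while the collector inserts rows above the placeholder row; it would not hold for rows inserted directly under the separator.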
**`scripts/templates/feedback/README.md`** — Explains the feedback loop pattern:
```markdown
# Feedback Loop
Systematic feedback collection and performance scoring for agent pipelines.
## How it works
1. After each pipeline run, a reviewer agent (or human) assigns a score (0-100)
and categorizes any issues with a pattern tag.
2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or
`score_output` tool calls. It appends a row to `FEEDBACK.md`.
3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires.
4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl`
to compute per-agent metrics: average score, error rate, cost per run,
improvement trend (last 10 vs. previous 10 runs).
5. Agents scoring below the threshold (default 60/100) are flagged for review.
## Pattern tags
Consistent tags are required for pattern detection to work. Use the tags
defined in `FEEDBACK.md`. Add project-specific tags as needed — but be
consistent. Inconsistent tagging produces false negatives.
## Scoring → self-improvement connection
Feedback scores are the input to VFM (Value-for-Money) pre-scoring
defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11).
A low-scoring agent gets a lower VFM pre-score for future pipeline tasks,
making it less likely to be selected until its performance improves.
The feedback loop closes the improvement cycle:
1. Pipeline runs → reviewer assigns score + pattern tag
2. `feedback-collector.sh` appends to FEEDBACK.md
3. `performance-scorer.sh` flags underperforming agents
4. Developer reviews top patterns → iterates on agent prompt
5. New runs produce new feedback → trend shows improvement
6. VFM scores update automatically on next pipeline selection
## Example: prompt iteration driven by feedback
Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`:
```
| 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low |
| 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low |
| 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low |
```
After 3 rows: feedback-collector.sh fires the recurring-pattern alert.
performance-scorer.sh shows avg 45/100, error rate 100%.
Action: update agent-writer's system prompt with explicit length and
depth requirements. Once 20 runs are recorded, the trend compares the last 10
with the previous 10 and might read "improving (+18.3)".
## Integration
Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`:
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "review_pipeline|score_output",
      "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}]
    }]
  }
}
```
Run performance-scorer.sh on demand or as a scheduled report:
```bash
./feedback/performance-scorer.sh
./feedback/performance-scorer.sh --agent agent-writer --threshold 70
```
```
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git add scripts/templates/feedback/
git commit -m "feat(templates): add feedback loop and performance scoring templates"
```
---
## Step 21: Create pipeline optimization templates
### Files to create
**`scripts/templates/optimization/pipeline-optimizer.sh`** — Analyzes pipeline performance and generates recommendations:
```bash
#!/bin/bash
# Pipeline optimizer: identify bottlenecks, excess loops, cost outliers.
# Bash 3.2 compatible. Uses python3 for all analysis.
# Does NOT auto-implement any changes — produces RECOMMENDATIONS.md only.
#
# Analysis covers:
# - Bottleneck agents (highest avg duration or cost per run)
# - Unnecessary revision loops (agents that loop 3+ times on average)
# - Underutilized agents (invoked < 10% of pipeline runs)
# - Cost outliers (single run cost >= 3x average)
#
# Output: RECOMMENDATIONS.md with VFM pre-scores for each recommendation.
#
# Usage:
# ./pipeline-optimizer.sh
# ./pipeline-optimizer.sh --pipeline {{PIPELINE_NAME}}
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md"
COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
RECOMMENDATIONS_FILE="$WORKING_DIR/RECOMMENDATIONS.md"
PIPELINE_FILTER=""
# Parse arguments (bash 3.2 compatible)
while [ "$#" -gt 0 ]; do
case "$1" in
--pipeline) PIPELINE_FILTER="$2"; shift 2 ;;
*) shift ;;
esac
done
python3 << PYEOF
import re, json, os, sys
from collections import defaultdict
from datetime import datetime

feedback_file = "$FEEDBACK_FILE"
cost_log = "$COST_LOG"
recommendations_file = "$RECOMMENDATIONS_FILE"
pipeline_filter = "$PIPELINE_FILTER"

# Parse FEEDBACK.md
feedback_rows = []
if os.path.exists(feedback_file):
    with open(feedback_file) as f:
        in_table = False
        for line in f:
            line = line.strip()
            if '| Date |' in line:
                in_table = True
                continue
            if in_table and line.startswith('|---'):
                continue
            if in_table and line.startswith('|') and '{{' not in line:
                cols = [c.strip() for c in line.strip('|').split('|')]
                if len(cols) >= 7:
                    try:
                        score_m = re.match(r'(\d+)', cols[3])
                        score = int(score_m.group(1)) if score_m else 0
                        feedback_rows.append({
                            'date': cols[0],
                            'pipeline': cols[1],
                            'agent': cols[2],
                            'score': score,
                            'issue': cols[4],
                            'pattern': cols[6]
                        })
                    except (ValueError, IndexError):
                        pass

# Filter by pipeline
if pipeline_filter:
    feedback_rows = [r for r in feedback_rows if r['pipeline'] == pipeline_filter]

# Parse cost events
cost_events = []
if os.path.exists(cost_log):
    with open(cost_log) as f:
        for line in f:
            line = line.strip()
            if line:
                try:
                    cost_events.append(json.loads(line))
                except Exception:
                    pass
# Per-agent event counts (cost proxy)
cost_by_agent = defaultdict(int)
# Events per run, where a run is approximated as one agent on one calendar date
events_per_run = defaultdict(int)
for e in cost_events:
    agent = e.get('agent', 'unknown')
    date = str(e.get('timestamp', ''))[:10]
    cost_by_agent[agent] += 1
    events_per_run[(agent, date)] += 1
# run_costs[agent] = list of per-run event counts for that agent
run_costs = defaultdict(list)
for (agent, date), count in events_per_run.items():
    run_costs[agent].append(count)

# Build recommendations
recommendations = []

# 1. Bottleneck agents: top 2 by event count, above 1.5x the per-agent average
if cost_by_agent:
    agent_totals = sorted(cost_by_agent.items(), key=lambda x: -x[1])
    avg_cost = sum(cost_by_agent.values()) / len(cost_by_agent)
    for agent, total in agent_totals[:2]:
        if total > avg_cost * 1.5:
            recommendations.append({
                'type': 'bottleneck',
                'agent': agent,
                'description': f"Agent '{agent}' accounts for {total} events vs avg {avg_cost:.0f}. "
                               f"Consider batching its tool calls or reducing its task scope.",
                'vfm_prescore': 70
            })
# 2. Unnecessary revision loops: agents with loop-excess pattern >= 3 times
pattern_by_agent = defaultdict(lambda: defaultdict(int))
for r in feedback_rows:
    if r['pattern']:
        pattern_by_agent[r['agent']][r['pattern']] += 1
for agent, patterns in pattern_by_agent.items():
    if patterns.get('loop-excess', 0) >= 3:
        count = patterns['loop-excess']
        recommendations.append({
            'type': 'loop-excess',
            'agent': agent,
            'description': f"Agent '{agent}' has {count} feedback rows tagged 'loop-excess'. "
                           f"Review pipeline revision criteria: tighten acceptance conditions "
                           f"or add a max-iterations guard (see self-healing.sh).",
            'vfm_prescore': 80
        })

# 3. Underutilized agents: invoked in < 10% of pipeline runs
if feedback_rows:
    all_runs = set(r['date'] + ':' + r['pipeline'] for r in feedback_rows)
    total_runs = len(all_runs) if all_runs else 1
    agent_runs = defaultdict(set)
    for r in feedback_rows:
        agent_runs[r['agent']].add(r['date'] + ':' + r['pipeline'])
    for agent, runs in agent_runs.items():
        utilization = len(runs) / total_runs
        if utilization < 0.1 and total_runs >= 10:
            recommendations.append({
                'type': 'underutilized',
                'agent': agent,
                'description': f"Agent '{agent}' appears in only {utilization*100:.0f}% of pipeline runs. "
                               f"Consider removing from the pipeline or combining with another agent.",
                'vfm_prescore': 60
            })

# 4. Cost outliers: single-run cost >= 3x average
if run_costs:
    all_run_totals = []
    for agent, runs in run_costs.items():
        all_run_totals.extend(runs)
    avg_run = sum(all_run_totals) / len(all_run_totals) if all_run_totals else 1
    for agent, runs in run_costs.items():
        for run_cost in runs:
            if run_cost >= avg_run * 3:
                recommendations.append({
                    'type': 'cost-outlier',
                    'agent': agent,
                    'description': f"Agent '{agent}' had a run costing {run_cost} events "
                                   f"vs avg {avg_run:.1f} (3x+ threshold). "
                                   f"Add per-run budget cap with budget-hook.sh.",
                    'vfm_prescore': 75
                })
                break  # one recommendation per agent
# Write RECOMMENDATIONS.md
timestamp = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
pipeline_label = pipeline_filter if pipeline_filter else "all pipelines"
lines = [
    "# Pipeline Optimization Recommendations",
    "",
    f"Generated: {timestamp}",
    f"Scope: {pipeline_label}",
    "",
    "> These are recommendations only. No changes have been made.",
    "> Review each item and implement manually or with team approval.",
    "",
]
if recommendations:
    lines.append(f"## Recommendations ({len(recommendations)} found)")
    lines.append("")
    for i, rec in enumerate(recommendations, 1):
        lines.append(f"### R{i}: {rec['type'].upper()} - {rec['agent']}")
        lines.append("")
        lines.append(rec['description'])
        lines.append("")
        lines.append(f"**VFM pre-score:** {rec['vfm_prescore']}/100")
        lines.append("")
else:
    lines.append("## No recommendations")
    lines.append("")
    lines.append("No bottlenecks, excess loops, underutilized agents, or cost outliers detected.")
    lines.append("")
lines.append("## Next steps")
lines.append("")
lines.append("1. Review each recommendation with the team")
lines.append("2. Prioritize by VFM pre-score (higher = more value per effort)")
lines.append("3. Implement approved changes one at a time")
lines.append("4. Run feedback-collector.sh for 10+ runs after each change")
lines.append("5. Re-run pipeline-optimizer.sh to confirm improvement")
with open(recommendations_file, 'w') as f:
    f.write('\n'.join(lines) + '\n')
print(f"Recommendations written to {recommendations_file}")
print(f"  Found: {len(recommendations)} recommendations")
for rec in recommendations:
    print(f"  - [{rec['type']}] {rec['agent']}: VFM pre-score {rec['vfm_prescore']}")
PYEOF
```
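The cost-outlier check depends on how a "run" is defined. As a minimal sketch, assuming one run equals one agent on one calendar date (taken from the first 10 characters of the `timestamp` field, as the optimizer does), detection could look like this. The function name `cost_outliers` and the sample events are illustrative only.

```python
from collections import defaultdict

def cost_outliers(events, factor=3):
    """Return (agent, date) runs whose event count is >= factor * mean run size."""
    per_run = defaultdict(int)
    for e in events:
        key = (e.get('agent', 'unknown'), str(e.get('timestamp', ''))[:10])
        per_run[key] += 1
    if not per_run:
        return []
    avg = sum(per_run.values()) / len(per_run)
    return sorted(key for key, n in per_run.items() if n >= factor * avg)

# Hypothetical events: four quiet runs of 1 event each and one run of 11 events.
events = (
    [{'agent': 'agent-writer', 'timestamp': f'2025-01-1{d}T09:00:00Z'} for d in (0, 1)]
    + [{'agent': 'agent-reviewer', 'timestamp': f'2025-01-1{d}T09:00:00Z'} for d in (0, 1)]
    + [{'agent': 'agent-writer', 'timestamp': '2025-01-12T09:00:00Z'}] * 11
)

print(cost_outliers(events))
```

Here the mean run size is 3 events, so only the 11-event run crosses the 3x line. Note that with a single recorded run nothing can ever be flagged, because a run cannot be three times its own average.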
**`scripts/templates/optimization/self-healing.sh`** — Error recovery after agent/pipeline failures:
```bash
#!/bin/bash
# Self-healing: categorize errors and apply recovery strategies.
# Bash 3.2 compatible. Uses python3 for JSON/log parsing.
#
# Error categories and recovery strategies:
# timeout → retry with shorter task scope
# permission-denied → log and skip (do not retry)
# tool-not-found → log and alert, do not retry
# api-error → exponential backoff, max 3 retries
# content-quality → re-run with stricter prompt, max 2 retries
#
# Max total attempts: 5 (OpenClaw pattern — hard cap regardless of category).
# All recovery events logged to healing-log.jsonl.
#
# Usage:
# ./self-healing.sh --error-type <type> --agent <name> --attempt <n> --context <msg>
#
# Exit codes:
# 0 — recovery action taken (caller should retry)
# 1 — no recovery possible (caller should abort)
# 2 — max attempts reached (caller should escalate)
#
# Placeholders:
# {{WORKING_DIR}} - absolute path to project directory
WORKING_DIR="{{WORKING_DIR}}"
HEALING_LOG="$WORKING_DIR/healing-log.jsonl"
MAX_ATTEMPTS=5
ERROR_TYPE=""
AGENT_NAME=""
ATTEMPT=1
CONTEXT_MSG=""
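# Parse arguments (bash 3.2 compatible)
while [ "$#" -gt 0 ]; do
    case "$1" in
        --error-type) ERROR_TYPE="$2"; shift 2 ;;
        --agent) AGENT_NAME="$2"; shift 2 ;;
        --attempt) ATTEMPT="$2"; shift 2 ;;
        --context) CONTEXT_MSG="$2"; shift 2 ;;
        *) shift ;;
    esac
done
if [ -z "$ERROR_TYPE" ]; then
    echo "Usage: $0 --error-type <type> --agent <name> --attempt <n> --context <msg>"
    exit 1
fi
# Reject a non-numeric attempt before it reaches the integer comparisons below
case "$ATTEMPT" in
    ''|*[!0-9]*) echo "--attempt must be a non-negative integer"; exit 1 ;;
esac
# Hard cap: max 5 attempts total
if [ "$ATTEMPT" -gt "$MAX_ATTEMPTS" ]; then
    echo "MAX ATTEMPTS REACHED ($MAX_ATTEMPTS) for $AGENT_NAME. Escalating."
    # Values go through the environment so quotes in --context cannot break the Python source
    HEALING_LOG="$HEALING_LOG" AGENT_NAME="$AGENT_NAME" ERROR_TYPE="$ERROR_TYPE" \
    ATTEMPT="$ATTEMPT" CONTEXT_MSG="$CONTEXT_MSG" python3 -c "
import json, os, time
event = {
    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    'agent': os.environ['AGENT_NAME'],
    'error_type': os.environ['ERROR_TYPE'],
    'attempt': int(os.environ['ATTEMPT']),
    'action': 'escalate',
    'reason': 'max_attempts_reached',
    'context': os.environ['CONTEXT_MSG']
}
with open(os.environ['HEALING_LOG'], 'a') as f:
    f.write(json.dumps(event) + '\n')
print(json.dumps(event))
"
    exit 2
fi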
# Determine recovery action per category
RECOVERY_ACTION=""
RECOVERY_DETAIL=""
EXIT_CODE=0
case "$ERROR_TYPE" in
    timeout)
        RECOVERY_ACTION="retry_shorter"
        RECOVERY_DETAIL="Re-run with reduced task scope. Split task if attempt >= 3."
        if [ "$ATTEMPT" -ge 3 ]; then
            RECOVERY_DETAIL="Attempt $ATTEMPT: recommend splitting task before retry."
        fi
        EXIT_CODE=0
        ;;
    permission-denied)
        RECOVERY_ACTION="skip"
        RECOVERY_DETAIL="Permission errors cannot be auto-resolved. Log and skip. Notify operator."
        EXIT_CODE=1
        ;;
    tool-not-found)
        RECOVERY_ACTION="alert"
        RECOVERY_DETAIL="Tool not found: check agent config and hook registrations. Do not retry."
        EXIT_CODE=1
        ;;
    api-error)
        # Exponential backoff: 2^(attempt-1) seconds, max 3 retries
        if [ "$ATTEMPT" -le 3 ]; then
            BACKOFF_SECS=$(python3 -c "print(min(2 ** ($ATTEMPT - 1), 16))")
            RECOVERY_ACTION="retry_backoff"
            RECOVERY_DETAIL="API error: wait ${BACKOFF_SECS}s then retry (attempt $ATTEMPT/3)."
            sleep "$BACKOFF_SECS"
            EXIT_CODE=0
        else
            RECOVERY_ACTION="abort"
            RECOVERY_DETAIL="API error persists after 3 retries. Aborting."
            EXIT_CODE=1
        fi
        ;;
    content-quality)
        # Max 2 retries for quality issues
        if [ "$ATTEMPT" -le 2 ]; then
            RECOVERY_ACTION="retry_strict"
            RECOVERY_DETAIL="Re-run with stricter prompt. Add explicit quality criteria (attempt $ATTEMPT/2)."
            EXIT_CODE=0
        else
            RECOVERY_ACTION="escalate_quality"
            RECOVERY_DETAIL="Content quality below threshold after 2 retries. Escalate to human review."
            EXIT_CODE=2
        fi
        ;;
    *)
        RECOVERY_ACTION="unknown"
        RECOVERY_DETAIL="Unknown error type '$ERROR_TYPE'. Logging and aborting."
        EXIT_CODE=1
        ;;
esac
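# Log recovery event.
# Values go through the environment: RECOVERY_DETAIL and --context may contain
# single quotes, which would break a Python literal built by shell interpolation.
HEALING_LOG="$HEALING_LOG" AGENT_NAME="$AGENT_NAME" ERROR_TYPE="$ERROR_TYPE" \
ATTEMPT="$ATTEMPT" RECOVERY_ACTION="$RECOVERY_ACTION" \
RECOVERY_DETAIL="$RECOVERY_DETAIL" CONTEXT_MSG="$CONTEXT_MSG" python3 -c "
import json, os, time
event = {
    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    'agent': os.environ['AGENT_NAME'],
    'error_type': os.environ['ERROR_TYPE'],
    'attempt': int(os.environ['ATTEMPT']),
    'action': os.environ['RECOVERY_ACTION'],
    'detail': os.environ['RECOVERY_DETAIL'],
    'context': os.environ['CONTEXT_MSG']
}
with open(os.environ['HEALING_LOG'], 'a') as f:
    f.write(json.dumps(event) + '\n')
print(json.dumps(event, indent=2))
"
echo "Recovery: $RECOVERY_ACTION - $RECOVERY_DETAIL"
exit $EXIT_CODE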
```
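The recovery rules in self-healing.sh form a small decision table, which the following Python sketch restates as a pure function so the exit codes and the backoff schedule can be checked without running the script. The function name `recovery` and the returned tuple shape are assumptions for illustration; the actions, limits, and exit codes are copied from the script above.

```python
MAX_ATTEMPTS = 5  # hard cap, same value as in self-healing.sh

def recovery(error_type, attempt):
    """Return (action, exit_code, backoff_seconds) for an error on a given attempt."""
    if attempt > MAX_ATTEMPTS:
        return ('escalate', 2, 0)
    if error_type == 'timeout':
        return ('retry_shorter', 0, 0)
    if error_type == 'permission-denied':
        return ('skip', 1, 0)
    if error_type == 'tool-not-found':
        return ('alert', 1, 0)
    if error_type == 'api-error':
        if attempt <= 3:
            # Exponential backoff: 2^(attempt-1) seconds, capped at 16
            return ('retry_backoff', 0, min(2 ** (attempt - 1), 16))
        return ('abort', 1, 0)
    if error_type == 'content-quality':
        if attempt <= 2:
            return ('retry_strict', 0, 0)
        return ('escalate_quality', 2, 0)
    return ('unknown', 1, 0)

for n in range(1, 5):
    print(n, recovery('api-error', n))
```

Read as a caller contract: exit code 0 means "retry", 1 means "abort", and 2 means "escalate to a human". The hard cap is checked first, so it overrides every per-category limit.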
**`scripts/templates/optimization/README.md`** — Explains optimization and self-healing:
```markdown
# Pipeline Optimization and Self-Healing
Two tools for making agent pipelines more efficient and resilient over time.
## pipeline-optimizer.sh
Analyzes FEEDBACK.md and cost-events.jsonl to identify:
| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
recommendation. Higher VFM pre-scores mean more value per implementation effort.
**This script does not auto-implement anything.** All changes require
manual review and explicit approval. This is intentional — pipeline
restructuring is a high-stakes operation.
## self-healing.sh
Categorizes errors and applies targeted recovery strategies:
| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with shorter scope | 5 (hard cap) |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^(n-1) seconds for attempt n: 1s, 2s, 4s) | 3 |
| `content-quality` | Retry with stricter prompt | 2 |
**Hard cap: 5 total attempts regardless of category.** This follows the
OpenClaw pattern — unbounded retry loops are the most common cause of
runaway agent costs. The cap is non-negotiable.
After the hard cap is reached, the script exits with code 2 (escalate).
The caller is responsible for deciding whether to pause, alert a human,
or abort the pipeline run.
## Connection to feedback and VFM
```
feedback-collector.sh → FEEDBACK.md → performance-scorer.sh → flagged agents
pipeline-optimizer.sh → RECOMMENDATIONS.md
(manual review + approval)
prompt/pipeline update
new runs → new feedback
```
VFM pre-scores in RECOMMENDATIONS.md use the same 0-100 scale as
`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are
pre-scores, not final scores — the VFM evaluation still needs to run
when the task is scheduled. The pre-scores help prioritize which
recommendations to tackle first.
## Safety limits
- `pipeline-optimizer.sh`: read-only analysis — never modifies pipeline files
- `self-healing.sh`: max 5 attempts hard cap, permission errors never retried
- All events logged to `healing-log.jsonl` for audit trail
- No auto-escalation to external systems — exit codes only
## Usage
```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh
# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline
# Handle an error in a pipeline step
./optimization/self-healing.sh \
--error-type api-error \
--agent agent-writer \
--attempt 1 \
--context "OpenAI timeout on summarize call"
# Check healing log
cat healing-log.jsonl | python3 -m json.tool
```
```
### Verify
```bash
bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh && echo "VALID"
```
Expected: `VALID`
### On failure: retry — fix bash syntax, then revert
### Checkpoint
```bash
git add scripts/templates/optimization/
git commit -m "feat(templates): add pipeline optimization and self-healing templates"
```
---
## Exit Condition
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/feedback/ | wc -l` → 4
- [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/optimization/ | wc -l` → 3
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh` → no errors
- [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → no errors
- [ ] FEEDBACK.md contains `| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |` header row
- [ ] performance-scorer.sh computes improvement trend (last 10 vs. prev 10)
- [ ] pipeline-optimizer.sh writes to RECOMMENDATIONS.md and does NOT modify any pipeline files
- [ ] self-healing.sh exits 2 when attempt > 5 (hard cap enforced)
- [ ] healing-log.jsonl referenced in self-healing.sh
- [ ] All bash scripts are 3.2 compatible (no associative arrays, no mapfile, no `|&`)
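
The bash 3.2 compatibility item can be partly automated. The following is a rough heuristic sketch, not a real shell parser: it scans script text line by line for the three constructs named above. The function name `bash4_findings` and the regular expressions are assumptions, and a clean result does not prove compatibility.

```python
import re

# Constructs that require bash 4 or newer (heuristic patterns, not exhaustive)
CHECKS = [
    ('associative array', re.compile(r'\b(?:declare|local|typeset)\s+-\w*A\w*')),
    ('mapfile/readarray', re.compile(r'\b(?:mapfile|readarray)\b')),
    ('|& pipe', re.compile(r'\|&')),
]

def bash4_findings(script_text):
    """Return a list of (line_number, construct_name) for suspicious lines."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), 1):
        # Crude comment stripping: everything after the first '#' is ignored,
        # even if the '#' sits inside a string.
        code = line.split('#', 1)[0]
        for name, pattern in CHECKS:
            if pattern.search(code):
                findings.append((lineno, name))
    return findings

sample = 'declare -A seen\nmapfile -t lines < file\ncmd |& tee log\ncase "$1" in\n'
for finding in bash4_findings(sample):
    print(finding)
```

To use it against the templates, read each script with `open(path).read()` and pass the text in; any non-empty result deserves a manual look.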
## Quality Criteria
- Feedback table columns match the 7-column spec (Date, Pipeline, Agent, Score, Issue, Resolution, Pattern)
- Pattern detection first fires at the third occurrence of a tag (threshold is exactly 3, not 2 or 4) and keeps firing for every later occurrence
- Performance-scorer.sh improvement trend correctly computes last 10 vs. previous 10 scores
- pipeline-optimizer.sh detects all 4 issue types: bottleneck, loop-excess, underutilized, cost-outlier
- VFM pre-scores in RECOMMENDATIONS.md use the same 0-100 scale as VFM-SCORING.md (Step 11)
- self-healing.sh hard cap is exactly 5 (OpenClaw pattern) — not configurable
- permission-denied and tool-not-found errors are never retried (exit 1 immediately)
- api-error uses exponential backoff: 1s, 2s, 4s (2^0, 2^1, 2^2) before aborting at attempt 4
- content-quality escalates to human review (exit 2) after 2 retries, not abort
- All scripts use `#!/bin/bash` shebang and are bash 3.2 compatible