# Session 5: Self-Learning Systems > Steps 20, 21 | Wave 1 | Depends on: none ## Dependencies Entry condition: none (independent — creates new template directories only) ## Scope Fence **Touch:** - `scripts/templates/feedback/FEEDBACK.md` (new) - `scripts/templates/feedback/feedback-collector.sh` (new) - `scripts/templates/feedback/performance-scorer.sh` (new) - `scripts/templates/feedback/README.md` (new) - `scripts/templates/optimization/pipeline-optimizer.sh` (new) - `scripts/templates/optimization/self-healing.sh` (new) - `scripts/templates/optimization/README.md` (new) **Never touch:** - `commands/` - `agents/` - `skills/` - `scripts/templates/heartbeat/` - `scripts/templates/memory/` - `scripts/templates/proactive/` - `scripts/templates/cron/` - `scripts/templates/goals/` - `scripts/templates/budget/` - `scripts/templates/governance/` - `scripts/templates/org-chart/` - `.claude-plugin/`, `CLAUDE.md`, `README.md` --- ## Step 20: Create feedback loop templates ### Files to create **`scripts/templates/feedback/FEEDBACK.md`** — Feedback tracking file: ```markdown # Feedback Log: {{PROJECT_NAME}} > Append-only. One row per pipeline run. Reviewed by performance-scorer.sh. ## Feedback Table | Date | Pipeline | Agent | Score | Issue | Resolution | Pattern | |------|----------|-------|-------|-------|------------|---------| | {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} | ## Pattern Tags Use consistent tags so performance-scorer.sh can detect recurring issues: - `quality-low` — output below acceptance threshold - `loop-excess` — more revision iterations than expected - `timeout` — agent exceeded time budget - `tool-fail` — tool call failed or returned unexpected result - `cost-spike` — single run cost exceeded 3x average - `scope-drift` — agent worked outside defined scope - `hallucination` — output contained factual errors ## Notes Scores are 0–100 as assigned by the reviewer agent or human reviewer. A score below 60 triggers a flag in performance-scorer.sh. Three or more rows with the same Pattern tag = recurring issue. Recurring issues should drive prompt iteration or pipeline redesign. ``` **`scripts/templates/feedback/feedback-collector.sh`** — PostToolUse hook variant that appends feedback after pipeline completion: ```bash #!/bin/bash # PostToolUse hook: Collect feedback after pipeline completion. # Bash 3.2 compatible. Uses python3 for JSON parsing and CSV/MD append. # # Triggered after a designated "review" tool call completes. # Reads pipeline output and reviewer score, appends to FEEDBACK.md, # and detects recurring patterns (3+ rows with same tag = recurring). # # Placeholders: # {{WORKING_DIR}} - absolute path to project directory # {{PIPELINE_NAME}} - name of the pipeline being tracked # {{SCORE_THRESHOLD}} - minimum acceptable score (default: 60) WORKING_DIR="{{WORKING_DIR}}" PIPELINE_NAME="{{PIPELINE_NAME}}" SCORE_THRESHOLD="${SCORE_THRESHOLD:-60}" FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md" HOOK_INPUT=$(cat) # Only act on review tool calls TOOL_NAME=$(echo "$HOOK_INPUT" | python3 -c " import sys, json try: data = json.load(sys.stdin) print(data.get('tool_name', '')) except: print('') " 2>/dev/null) if [ "$TOOL_NAME" != "review_pipeline" ] && [ "$TOOL_NAME" != "score_output" ]; then exit 0 fi # Extract score, agent, issue, resolution, pattern from hook input python3 << PYEOF import sys, json, re, os from datetime import datetime hook_input = """$HOOK_INPUT""" feedback_file = "$FEEDBACK_FILE" pipeline_name = "$PIPELINE_NAME" score_threshold = int("$SCORE_THRESHOLD") try: data = json.loads(hook_input) except Exception: sys.exit(0) tool_result = data.get('tool_result', '') if isinstance(tool_result, dict): tool_result = json.dumps(tool_result) # Parse structured fields from tool result (expects JSON or key:value) agent_name = os.environ.get('AGENT_NAME', 'unknown') score = 0 issue = '' resolution = '' pattern = '' try: result_data = json.loads(tool_result) agent_name = result_data.get('agent', agent_name) score = int(result_data.get('score', 0)) issue = result_data.get('issue', '') resolution = result_data.get('resolution', '') pattern = result_data.get('pattern', '') except Exception: # Fallback: look for score: N in plain text m = re.search(r'score[:\s]+(\d+)', tool_result, re.IGNORECASE) if m: score = int(m.group(1)) m = re.search(r'pattern[:\s]+(\S+)', tool_result, re.IGNORECASE) if m: pattern = m.group(1) if score == 0 and not issue: sys.exit(0) date_str = datetime.utcnow().strftime('%Y-%m-%d') row = f"| {date_str} | {pipeline_name} | {agent_name} | {score}/100 | {issue} | {resolution} | {pattern} |" # Append to feedback table if not os.path.exists(feedback_file): print(f"Warning: {feedback_file} not found — skipping feedback append") sys.exit(0) with open(feedback_file, 'r') as f: content = f.read() # Insert row after the header row of the table table_header = '| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |' separator = '|------|----------|-------|-------|-------|------------|---------|' placeholder_row = '| {{DATE}} | {{PIPELINE_NAME}} | {{AGENT_NAME}} | {{SCORE}}/100 | {{ISSUE_DESCRIPTION}} | {{RESOLUTION}} | {{PATTERN_TAG}} |' if placeholder_row in content: # Replace placeholder with real row + keep placeholder for next time content = content.replace(placeholder_row, row + '\n' + placeholder_row) elif separator in content: content = content.replace(separator, separator + '\n' + row) else: content += '\n' + row + '\n' with open(feedback_file, 'w') as f: f.write(content) print(f"Feedback recorded: score={score}, pattern={pattern}") # Detect recurring patterns if pattern: pattern_count = content.count(f'| {pattern} |') if pattern_count >= 3: print(f"RECURRING PATTERN DETECTED: '{pattern}' appears {pattern_count} times") print(f"Action required: review prompt or pipeline for '{pipeline_name}'") # Flag low scores if score < score_threshold and score > 0: print(f"LOW SCORE: {score} < threshold {score_threshold} for agent {agent_name}") PYEOF exit 0 ``` **`scripts/templates/feedback/performance-scorer.sh`** — Standalone scoring script that reads FEEDBACK.md and cost-events.jsonl: ```bash #!/bin/bash # Performance scorer: per-agent metrics from FEEDBACK.md + cost-events.jsonl. # Bash 3.2 compatible. Uses python3 for all metrics computation. # # Metrics per agent: # - Average score (0-100) # - Error rate (rows with score < threshold / total rows) # - Cost per run (from cost-events.jsonl, rough proxy) # - Improvement trend: avg of last 10 scores vs. previous 10 # # Flags agents below threshold (default 60/100). # # Usage: # ./performance-scorer.sh # Score all agents # ./performance-scorer.sh --agent {{AGENT}} # Score specific agent # ./performance-scorer.sh --threshold 70 # Custom threshold # # Placeholders: # {{WORKING_DIR}} - absolute path to project directory WORKING_DIR="{{WORKING_DIR}}" FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md" COST_LOG="$WORKING_DIR/budget/cost-events.jsonl" THRESHOLD="${2:-60}" AGENT_FILTER="" # Parse arguments (bash 3.2 compatible — no associative arrays) while [ "$#" -gt 0 ]; do case "$1" in --agent) AGENT_FILTER="$2"; shift 2 ;; --threshold) THRESHOLD="$2"; shift 2 ;; *) shift ;; esac done if [ ! -f "$FEEDBACK_FILE" ]; then echo "No feedback file found at $FEEDBACK_FILE" exit 0 fi python3 << PYEOF import re, json, os, sys from collections import defaultdict feedback_file = "$FEEDBACK_FILE" cost_log = "$COST_LOG" threshold = int("$THRESHOLD") agent_filter = "$AGENT_FILTER" # Parse FEEDBACK.md table rows # Expected columns: Date, Pipeline, Agent, Score, Issue, Resolution, Pattern feedback_rows = [] with open(feedback_file) as f: in_table = False header_seen = False for line in f: line = line.strip() if '| Date |' in line: in_table = True header_seen = True continue if in_table and line.startswith('|---'): continue if in_table and line.startswith('|') and '{{' not in line and header_seen: cols = [c.strip() for c in line.strip('|').split('|')] if len(cols) >= 7: try: date = cols[0] pipeline = cols[1] agent = cols[2] score_str = cols[3] issue = cols[4] resolution = cols[5] pattern = cols[6] # Parse score: "75/100" or "75" score_m = re.match(r'(\d+)', score_str) score = int(score_m.group(1)) if score_m else 0 feedback_rows.append({ 'date': date, 'pipeline': pipeline, 'agent': agent, 'score': score, 'issue': issue, 'pattern': pattern }) except (ValueError, IndexError): pass # Filter by agent if specified if agent_filter: feedback_rows = [r for r in feedback_rows if r['agent'] == agent_filter] if not feedback_rows: print("No feedback rows found.") sys.exit(0) # Read cost events if available cost_by_agent = defaultdict(int) if os.path.exists(cost_log): with open(cost_log) as f: for line in f: line = line.strip() if line: try: event = json.loads(line) agent = event.get('agent', 'unknown') cost_by_agent[agent] += 1 # event count as proxy except Exception: pass # Compute per-agent metrics agents = list(set(r['agent'] for r in feedback_rows)) print("PERFORMANCE SCORECARD") print("=" * 60) print(f"Threshold: {threshold}/100") print(f"Total feedback rows: {len(feedback_rows)}") print() flagged = [] for agent in sorted(agents): rows = [r for r in feedback_rows if r['agent'] == agent] scores = [r['score'] for r in rows] avg_score = sum(scores) / len(scores) if scores else 0 error_rate = len([s for s in scores if s < threshold]) / len(scores) if scores else 0 cost_events = cost_by_agent.get(agent, 0) cost_per_run = cost_events / len(rows) if rows else 0 # Improvement trend: last 10 vs. prev 10 trend_str = "n/a (fewer than 20 runs)" if len(scores) >= 20: prev10 = scores[-20:-10] last10 = scores[-10:] prev_avg = sum(prev10) / len(prev10) last_avg = sum(last10) / len(last10) delta = last_avg - prev_avg if delta > 5: trend_str = f"improving (+{delta:.1f})" elif delta < -5: trend_str = f"declining ({delta:.1f})" else: trend_str = f"stable ({delta:+.1f})" elif len(scores) >= 10: last10 = scores[-10:] trend_str = f"recent avg: {sum(last10)/len(last10):.1f} (need 20 runs for trend)" # Pattern frequency patterns = defaultdict(int) for r in rows: if r['pattern']: patterns[r['pattern']] += 1 top_patterns = sorted(patterns.items(), key=lambda x: -x[1])[:3] print(f"Agent: {agent}") print(f" Runs: {len(rows)}") print(f" Avg score: {avg_score:.1f}/100") print(f" Error rate: {error_rate*100:.0f}% (score < {threshold})") print(f" Cost/run: ~{cost_per_run:.1f} events (rough proxy)") print(f" Trend: {trend_str}") if top_patterns: print(f" Top patterns: {', '.join(f'{p}({c})' for p, c in top_patterns)}") print() if avg_score < threshold: flagged.append((agent, avg_score)) # Summary of flagged agents if flagged: print("FLAGGED AGENTS (below threshold)") print("-" * 40) for agent, avg in flagged: print(f" {agent}: avg {avg:.1f} < {threshold}") print() print("Recommended actions:") print(" 1. Review feedback rows for top patterns") print(" 2. Iterate on agent system prompt") print(" 3. Consider pipeline redesign if pattern is structural") print(" 4. Run pipeline-optimizer.sh for bottleneck analysis") else: print("All agents above threshold.") PYEOF ``` **`scripts/templates/feedback/README.md`** — Explains the feedback loop pattern: ```markdown # Feedback Loop Systematic feedback collection and performance scoring for agent pipelines. ## How it works 1. After each pipeline run, a reviewer agent (or human) assigns a score (0–100) and categorizes any issues with a pattern tag. 2. `feedback-collector.sh` runs as a PostToolUse hook on `review_pipeline` or `score_output` tool calls. It appends a row to `FEEDBACK.md`. 3. When 3+ rows share the same pattern tag, a recurring-pattern alert fires. 4. `performance-scorer.sh` reads `FEEDBACK.md` and `budget/cost-events.jsonl` to compute per-agent metrics: average score, error rate, cost per run, improvement trend (last 10 vs. previous 10 runs). 5. Agents scoring below the threshold (default 60/100) are flagged for review. ## Pattern tags Consistent tags are required for pattern detection to work. Use the tags defined in `FEEDBACK.md`. Add project-specific tags as needed — but be consistent. Inconsistent tagging produces false negatives. ## Scoring → self-improvement connection Feedback scores are the input to VFM (Value-for-Money) pre-scoring defined in `scripts/templates/proactive/VFM-SCORING.md` (Step 11). A low-scoring agent gets a lower VFM pre-score for future pipeline tasks, making it less likely to be selected until its performance improves. The feedback loop closes the improvement cycle: 1. Pipeline runs → reviewer assigns score + pattern tag 2. `feedback-collector.sh` appends to FEEDBACK.md 3. `performance-scorer.sh` flags underperforming agents 4. Developer reviews top patterns → iterates on agent prompt 5. New runs produce new feedback → trend shows improvement 6. VFM scores update automatically on next pipeline selection ## Example: prompt iteration driven by feedback Suppose `agent-writer` repeatedly scores 45/100 with pattern `quality-low`: ``` | 2025-01-10 | doc-pipeline | agent-writer | 45/100 | Output too brief | Added detail requirement | quality-low | | 2025-01-11 | doc-pipeline | agent-writer | 42/100 | Still too brief | Repeated instruction | quality-low | | 2025-01-12 | doc-pipeline | agent-writer | 48/100 | Slightly better | — | quality-low | ``` After 3 rows: feedback-collector.sh fires the recurring-pattern alert. performance-scorer.sh shows avg 45/100, error rate 100%. Action: update agent-writer's system prompt with explicit length and depth requirements. Next 10 runs show trend "improving (+18.3)". ## Integration Add feedback-collector.sh as a PostToolUse hook in `.claude/settings.json`: ```json { "hooks": { "PostToolUse": [{ "matcher": "review_pipeline", "hooks": [{"type": "command", "command": "bash feedback/feedback-collector.sh"}] }] } } ``` Run performance-scorer.sh on demand or as a scheduled report: ```bash ./feedback/performance-scorer.sh ./feedback/performance-scorer.sh --agent agent-writer --threshold 70 ``` ``` ### Verify ```bash bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh && echo "VALID" ``` Expected: `VALID` ### On failure: retry — fix bash syntax, then revert ### Checkpoint ```bash git commit -m "feat(templates): add feedback loop and performance scoring templates" ``` --- ## Step 21: Create pipeline optimization templates ### Files to create **`scripts/templates/optimization/pipeline-optimizer.sh`** — Analyzes pipeline performance and generates recommendations: ```bash #!/bin/bash # Pipeline optimizer: identify bottlenecks, excess loops, cost outliers. # Bash 3.2 compatible. Uses python3 for all analysis. # Does NOT auto-implement any changes — produces RECOMMENDATIONS.md only. # # Analysis covers: # - Bottleneck agents (highest avg duration or cost per run) # - Unnecessary revision loops (agents that loop 3+ times on average) # - Underutilized agents (invoked < 10% of pipeline runs) # - Cost outliers (single run cost >= 3x average) # # Output: RECOMMENDATIONS.md with VFM pre-scores for each recommendation. # # Usage: # ./pipeline-optimizer.sh # ./pipeline-optimizer.sh --pipeline {{PIPELINE_NAME}} # # Placeholders: # {{WORKING_DIR}} - absolute path to project directory WORKING_DIR="{{WORKING_DIR}}" FEEDBACK_FILE="$WORKING_DIR/FEEDBACK.md" COST_LOG="$WORKING_DIR/budget/cost-events.jsonl" RECOMMENDATIONS_FILE="$WORKING_DIR/RECOMMENDATIONS.md" PIPELINE_FILTER="" # Parse arguments (bash 3.2 compatible) while [ "$#" -gt 0 ]; do case "$1" in --pipeline) PIPELINE_FILTER="$2"; shift 2 ;; *) shift ;; esac done python3 << PYEOF import re, json, os, sys from collections import defaultdict from datetime import datetime feedback_file = "$FEEDBACK_FILE" cost_log = "$COST_LOG" recommendations_file = "$RECOMMENDATIONS_FILE" pipeline_filter = "$PIPELINE_FILTER" # Parse FEEDBACK.md feedback_rows = [] if os.path.exists(feedback_file): with open(feedback_file) as f: in_table = False for line in f: line = line.strip() if '| Date |' in line: in_table = True continue if in_table and line.startswith('|---'): continue if in_table and line.startswith('|') and '{{' not in line: cols = [c.strip() for c in line.strip('|').split('|')] if len(cols) >= 7: try: score_m = re.match(r'(\d+)', cols[3]) score = int(score_m.group(1)) if score_m else 0 feedback_rows.append({ 'date': cols[0], 'pipeline': cols[1], 'agent': cols[2], 'score': score, 'issue': cols[4], 'pattern': cols[6] }) except (ValueError, IndexError): pass # Filter by pipeline if pipeline_filter: feedback_rows = [r for r in feedback_rows if r['pipeline'] == pipeline_filter] # Parse cost events cost_events = [] if os.path.exists(cost_log): with open(cost_log) as f: for line in f: line = line.strip() if line: try: cost_events.append(json.loads(line)) except Exception: pass # Per-agent event counts (cost proxy) cost_by_agent = defaultdict(list) # Group by agent+date for per-run cost run_costs = defaultdict(list) for e in cost_events: agent = e.get('agent', 'unknown') date = e.get('timestamp', '')[:10] run_key = f"{agent}:{date}" cost_by_agent[agent].append(1) run_costs[agent].append(1) # Build recommendations recommendations = [] # 1. Bottleneck agents: top 2 by event count if cost_by_agent: agent_totals = [(a, len(events)) for a, events in cost_by_agent.items()] agent_totals.sort(key=lambda x: -x[1]) for agent, total in agent_totals[:2]: all_costs = [len(v) for v in run_costs.values()] avg_cost = sum(all_costs) / len(all_costs) if all_costs else 1 if total > avg_cost * 1.5: recommendations.append({ 'type': 'bottleneck', 'agent': agent, 'description': f"Agent '{agent}' accounts for {total} events vs avg {avg_cost:.0f}. " f"Consider batching its tool calls or reducing its task scope.", 'vfm_prescore': 70 }) # 2. Unnecessary revision loops: agents with loop-excess pattern >= 3 times pattern_by_agent = defaultdict(lambda: defaultdict(int)) for r in feedback_rows: if r['pattern']: pattern_by_agent[r['agent']][r['pattern']] += 1 for agent, patterns in pattern_by_agent.items(): if patterns.get('loop-excess', 0) >= 3: count = patterns['loop-excess'] recommendations.append({ 'type': 'loop-excess', 'agent': agent, 'description': f"Agent '{agent}' has {count} feedback rows tagged 'loop-excess'. " f"Review pipeline revision criteria — tighten acceptance conditions " f"or add a max-iterations guard (see self-healing.sh).", 'vfm_prescore': 80 }) # 3. Underutilized agents: invoked in < 10% of pipeline runs if feedback_rows: all_runs = set(r['date'] + ':' + r['pipeline'] for r in feedback_rows) total_runs = len(all_runs) if all_runs else 1 agent_runs = defaultdict(set) for r in feedback_rows: agent_runs[r['agent']].add(r['date'] + ':' + r['pipeline']) for agent, runs in agent_runs.items(): utilization = len(runs) / total_runs if utilization < 0.1 and total_runs >= 10: recommendations.append({ 'type': 'underutilized', 'agent': agent, 'description': f"Agent '{agent}' appears in only {utilization*100:.0f}% of pipeline runs. " f"Consider removing from the pipeline or combining with another agent.", 'vfm_prescore': 60 }) # 4. Cost outliers: single-run cost >= 3x average if run_costs: all_run_totals = [] for agent, runs in run_costs.items(): all_run_totals.extend(runs) avg_run = sum(all_run_totals) / len(all_run_totals) if all_run_totals else 1 for agent, runs in run_costs.items(): for run_cost in runs: if run_cost >= avg_run * 3: recommendations.append({ 'type': 'cost-outlier', 'agent': agent, 'description': f"Agent '{agent}' had a run costing {run_cost} events " f"vs avg {avg_run:.1f} (3x+ threshold). " f"Add per-run budget cap with budget-hook.sh.", 'vfm_prescore': 75 }) break # one recommendation per agent # Write RECOMMENDATIONS.md timestamp = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ') pipeline_label = pipeline_filter if pipeline_filter else "all pipelines" lines = [ f"# Pipeline Optimization Recommendations", f"", f"Generated: {timestamp}", f"Scope: {pipeline_label}", f"", f"> These are recommendations only. No changes have been made.", f"> Review each item and implement manually or with team approval.", f"", ] if recommendations: lines.append(f"## Recommendations ({len(recommendations)} found)") lines.append("") for i, rec in enumerate(recommendations, 1): lines.append(f"### R{i}: {rec['type'].upper()} — {rec['agent']}") lines.append("") lines.append(rec['description']) lines.append("") lines.append(f"**VFM pre-score:** {rec['vfm_prescore']}/100") lines.append("") else: lines.append("## No recommendations") lines.append("") lines.append("No bottlenecks, excess loops, underutilized agents, or cost outliers detected.") lines.append("") lines.append("## Next steps") lines.append("") lines.append("1. Review each recommendation with the team") lines.append("2. Prioritize by VFM pre-score (higher = more value per effort)") lines.append("3. Implement approved changes one at a time") lines.append("4. Run feedback-collector.sh for 10+ runs after each change") lines.append("5. Re-run pipeline-optimizer.sh to confirm improvement") with open(recommendations_file, 'w') as f: f.write('\n'.join(lines) + '\n') print(f"Recommendations written to {recommendations_file}") print(f" Found: {len(recommendations)} recommendations") for rec in recommendations: print(f" - [{rec['type']}] {rec['agent']}: VFM pre-score {rec['vfm_prescore']}") PYEOF ``` **`scripts/templates/optimization/self-healing.sh`** — Error recovery after agent/pipeline failures: ```bash #!/bin/bash # Self-healing: categorize errors and apply recovery strategies. # Bash 3.2 compatible. Uses python3 for JSON/log parsing. # # Error categories and recovery strategies: # timeout → retry with shorter task scope # permission-denied → log and skip (do not retry) # tool-not-found → log and alert, do not retry # api-error → exponential backoff, max 3 retries # content-quality → re-run with stricter prompt, max 2 retries # # Max total attempts: 5 (OpenClaw pattern — hard cap regardless of category). # All recovery events logged to healing-log.jsonl. # # Usage: # ./self-healing.sh --error-type --agent --attempt --context # # Exit codes: # 0 — recovery action taken (caller should retry) # 1 — no recovery possible (caller should abort) # 2 — max attempts reached (caller should escalate) # # Placeholders: # {{WORKING_DIR}} - absolute path to project directory WORKING_DIR="{{WORKING_DIR}}" HEALING_LOG="$WORKING_DIR/healing-log.jsonl" MAX_ATTEMPTS=5 ERROR_TYPE="" AGENT_NAME="" ATTEMPT=1 CONTEXT_MSG="" # Parse arguments (bash 3.2 compatible) while [ "$#" -gt 0 ]; do case "$1" in --error-type) ERROR_TYPE="$2"; shift 2 ;; --agent) AGENT_NAME="$2"; shift 2 ;; --attempt) ATTEMPT="$2"; shift 2 ;; --context) CONTEXT_MSG="$2"; shift 2 ;; *) shift ;; esac done if [ -z "$ERROR_TYPE" ]; then echo "Usage: $0 --error-type --agent --attempt --context " exit 1 fi # Hard cap: max 5 attempts total if [ "$ATTEMPT" -gt "$MAX_ATTEMPTS" ]; then echo "MAX ATTEMPTS REACHED ($MAX_ATTEMPTS) for $AGENT_NAME. Escalating." python3 -c " import json, time, os event = { 'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), 'agent': '$AGENT_NAME', 'error_type': '$ERROR_TYPE', 'attempt': $ATTEMPT, 'action': 'escalate', 'reason': 'max_attempts_reached', 'context': '$CONTEXT_MSG' } with open('$HEALING_LOG', 'a') as f: f.write(json.dumps(event) + '\n') print(json.dumps(event)) " exit 2 fi # Determine recovery action per category RECOVERY_ACTION="" RECOVERY_DETAIL="" EXIT_CODE=0 case "$ERROR_TYPE" in timeout) RECOVERY_ACTION="retry_shorter" RECOVERY_DETAIL="Re-run with reduced task scope. Split task if attempt >= 3." if [ "$ATTEMPT" -ge 3 ]; then RECOVERY_DETAIL="Attempt $ATTEMPT: recommend splitting task before retry." fi EXIT_CODE=0 ;; permission-denied) RECOVERY_ACTION="skip" RECOVERY_DETAIL="Permission errors cannot be auto-resolved. Log and skip. Notify operator." EXIT_CODE=1 ;; tool-not-found) RECOVERY_ACTION="alert" RECOVERY_DETAIL="Tool not found — check agent config and hook registrations. Do not retry." EXIT_CODE=1 ;; api-error) # Exponential backoff: 2^(attempt-1) seconds, max 3 retries if [ "$ATTEMPT" -le 3 ]; then BACKOFF_SECS=$(python3 -c "print(min(2 ** ($ATTEMPT - 1), 16))") RECOVERY_ACTION="retry_backoff" RECOVERY_DETAIL="API error — wait ${BACKOFF_SECS}s then retry (attempt $ATTEMPT/3)." sleep "$BACKOFF_SECS" EXIT_CODE=0 else RECOVERY_ACTION="abort" RECOVERY_DETAIL="API error persists after 3 retries. Aborting." EXIT_CODE=1 fi ;; content-quality) # Max 2 retries for quality issues if [ "$ATTEMPT" -le 2 ]; then RECOVERY_ACTION="retry_strict" RECOVERY_DETAIL="Re-run with stricter prompt. Add explicit quality criteria (attempt $ATTEMPT/2)." EXIT_CODE=0 else RECOVERY_ACTION="escalate_quality" RECOVERY_DETAIL="Content quality below threshold after 2 retries. Escalate to human review." EXIT_CODE=2 fi ;; *) RECOVERY_ACTION="unknown" RECOVERY_DETAIL="Unknown error type '$ERROR_TYPE'. Logging and aborting." EXIT_CODE=1 ;; esac # Log recovery event python3 -c " import json, time event = { 'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), 'agent': '$AGENT_NAME', 'error_type': '$ERROR_TYPE', 'attempt': $ATTEMPT, 'action': '$RECOVERY_ACTION', 'detail': '$RECOVERY_DETAIL', 'context': '$CONTEXT_MSG' } with open('$HEALING_LOG', 'a') as f: f.write(json.dumps(event) + '\n') print(json.dumps(event, indent=2)) " echo "Recovery: $RECOVERY_ACTION — $RECOVERY_DETAIL" exit $EXIT_CODE ``` **`scripts/templates/optimization/README.md`** — Explains optimization and self-healing: ```markdown # Pipeline Optimization and Self-Healing Two tools for making agent pipelines more efficient and resilient over time. ## pipeline-optimizer.sh Analyzes FEEDBACK.md and cost-events.jsonl to identify: | Issue | Detection | Recommendation | |-------|-----------|----------------| | Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope | | Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard | | Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent | | Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh | Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each recommendation. Higher VFM pre-scores mean more value per implementation effort. **This script does not auto-implement anything.** All changes require manual review and explicit approval. This is intentional — pipeline restructuring is a high-stakes operation. ## self-healing.sh Categorizes errors and applies targeted recovery strategies: | Error Type | Recovery | Max Retries | |------------|----------|-------------| | `timeout` | Retry with shorter scope | 5 (hard cap) | | `permission-denied` | Log and skip | 0 (no retry) | | `tool-not-found` | Alert operator | 0 (no retry) | | `api-error` | Exponential backoff (2^n seconds) | 3 | | `content-quality` | Retry with stricter prompt | 2 | **Hard cap: 5 total attempts regardless of category.** This follows the OpenClaw pattern — unbounded retry loops are the most common cause of runaway agent costs. The cap is non-negotiable. After the hard cap is reached, the script exits with code 2 (escalate). The caller is responsible for deciding whether to pause, alert a human, or abort the pipeline run. ## Connection to feedback and VFM ``` feedback-collector.sh → FEEDBACK.md → performance-scorer.sh → flagged agents ↓ pipeline-optimizer.sh → RECOMMENDATIONS.md ↓ (manual review + approval) ↓ prompt/pipeline update ↓ new runs → new feedback ``` VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as `scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are pre-scores, not final scores — the VFM evaluation still needs to run when the task is scheduled. The pre-scores help prioritize which recommendations to tackle first. ## Safety limits - `pipeline-optimizer.sh`: read-only analysis — never modifies pipeline files - `self-healing.sh`: max 5 attempts hard cap, permission errors never retried - All events logged to `healing-log.jsonl` for audit trail - No auto-escalation to external systems — exit codes only ## Usage ```bash # Run optimizer for all pipelines ./optimization/pipeline-optimizer.sh # Run optimizer for a specific pipeline ./optimization/pipeline-optimizer.sh --pipeline doc-pipeline # Handle an error in a pipeline step ./optimization/self-healing.sh \ --error-type api-error \ --agent agent-writer \ --attempt 1 \ --context "OpenAI timeout on summarize call" # Check healing log cat healing-log.jsonl | python3 -m json.tool ``` ``` ### Verify ```bash bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh && bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh && echo "VALID" ``` Expected: `VALID` ### On failure: retry — fix bash syntax, then revert ### Checkpoint ```bash git commit -m "feat(templates): add pipeline optimization and self-healing templates" ``` --- ## Exit Condition - [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/feedback/ | wc -l` → 4 - [ ] `ls /Users/ktg/repos/agent-builder/scripts/templates/optimization/ | wc -l` → 3 - [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/feedback-collector.sh` → no errors - [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/feedback/performance-scorer.sh` → no errors - [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/pipeline-optimizer.sh` → no errors - [ ] `bash -n /Users/ktg/repos/agent-builder/scripts/templates/optimization/self-healing.sh` → no errors - [ ] FEEDBACK.md contains `| Date | Pipeline | Agent | Score | Issue | Resolution | Pattern |` header row - [ ] performance-scorer.sh computes improvement trend (last 10 vs. prev 10) - [ ] pipeline-optimizer.sh writes to RECOMMENDATIONS.md and does NOT modify any pipeline files - [ ] self-healing.sh exits 2 when attempt > 5 (hard cap enforced) - [ ] healing-log.jsonl referenced in self-healing.sh - [ ] All bash scripts are 3.2 compatible (no associative arrays, no mapfile, no `|&`) ## Quality Criteria - Feedback table columns match the 7-column spec (Date, Pipeline, Agent, Score, Issue, Resolution, Pattern) - Pattern detection fires at exactly 3 occurrences (not 2, not 4) - Performance-scorer.sh improvement trend correctly computes last 10 vs. previous 10 scores - pipeline-optimizer.sh detects all 4 issue types: bottleneck, loop-excess, underutilized, cost-outlier - VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as VFM-SCORING.md (Step 11) - self-healing.sh hard cap is exactly 5 (OpenClaw pattern) — not configurable - permission-denied and tool-not-found errors are never retried (exit 1 immediately) - api-error uses exponential backoff: 1s, 2s, 4s (2^0, 2^1, 2^2) before aborting at attempt 4 - content-quality escalates to human review (exit 2) after 2 retries, not abort - All scripts use `#!/bin/bash` shebang and are bash 3.2 compatible