
# Pipeline Optimization and Self-Healing

Two tools for making agent pipelines more efficient and resilient over time.

## pipeline-optimizer.sh

Analyzes `FEEDBACK.md` and `cost-events.jsonl` to identify:

| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Among the top 2 agents by cost-event count, at 1.5x the average or more | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
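
The cost-outlier rule in the table can be sketched with `awk`. This is only an illustration, not the script's actual logic: it assumes per-run costs have already been extracted into whitespace-separated `run_id cost` lines (the real `cost-events.jsonl` schema is not described here), and `find_cost_outliers` is a hypothetical name.

```bash
# Hypothetical helper: print the IDs of runs whose cost is >= 3x the average.
# Input (stdin): one "run_id cost" pair per line. This format is an assumption.
find_cost_outliers() {
  awk '
    { id[NR] = $1; cost[NR] = $2; total += $2 }
    END {
      if (NR == 0) exit            # no runs, nothing to report
      avg = total / NR
      for (i = 1; i <= NR; i++)
        if (cost[i] >= 3 * avg) print id[i]
    }
  '
}
```

For example, four runs costing 1.0 and one costing 16.0 average 4.0, so the threshold is 12.0 and only the 16.0 run is printed. Note that the outlier itself raises the average in this simple version; whether the real script excludes it is not stated.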

Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
recommendation. Higher VFM pre-scores mean more value per implementation effort.

**This script does not auto-implement anything.** All changes require
manual review and explicit approval. This is intentional: pipeline
restructuring is a high-stakes operation.

## self-healing.sh

Categorizes errors and applies targeted recovery strategies:

| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with narrower scope | Limited only by the hard cap (5 total attempts) |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^n seconds) | 3 |
| `content-quality` | Retry with stricter prompt | 2 |

**Hard cap: 5 total attempts regardless of category.** This follows the
OpenClaw pattern: unbounded retry loops are the most common cause of
runaway agent costs. The cap is non-negotiable.
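
The interaction between per-category retry limits and the hard cap can be sketched as follows. This is a minimal sketch under stated assumptions, not the contents of `self-healing.sh`: the function names are hypothetical, the attempt number is assumed to be 1-based, the `stop` outcome for an exhausted category budget is an assumption, and `n` in the backoff formula is assumed to be the number of the attempt that just failed.

```bash
HARD_CAP=5  # total attempts, regardless of category

# Hypothetical helper: max retries per error type, mirroring the table above.
max_retries_for() {
  case "$1" in
    timeout)         echo 5 ;;   # in practice bounded by HARD_CAP
    api-error)       echo 3 ;;
    content-quality) echo 2 ;;
    *)               echo 0 ;;   # permission-denied, tool-not-found, unknown
  esac
}

# Hypothetical helper: decide what to do after attempt $2 (1-based) of
# error type $1 has failed. Prints "retry", "stop", or "escalate".
next_action() {
  if [ "$2" -ge "$HARD_CAP" ]; then
    echo escalate                                  # hard cap reached
  elif [ $(($2 - 1)) -lt "$(max_retries_for "$1")" ]; then
    echo retry                                     # category budget remains
  else
    echo stop                                      # category budget exhausted
  fi
}

# Exponential backoff for api-error: 2^n seconds (n assumed to be the attempt).
backoff_seconds() {
  echo $((1 << $1))
}
```

Under these assumptions, `next_action api-error 1` prints `retry`, `next_action permission-denied 1` prints `stop`, and `next_action timeout 5` prints `escalate`; `backoff_seconds 3` prints `8`.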

After the hard cap is reached, the script exits with code 2 (escalate).
The caller is responsible for deciding whether to pause, alert a human,
or abort the pipeline run.
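
A caller might branch on the exit code along these lines. Only exit code 2 (escalate) is documented above; treating 0 as "recovered" and every other code as "failed" is an assumption, and `handle_healing_result` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: run a healing command and map its exit code to a
# decision word on stdout. The command's own stdout is sent to stderr so
# that stdout carries only the decision.
handle_healing_result() {
  "$@" >&2
  case $? in
    0) echo recovered ;;   # assumed meaning of exit code 0
    2) echo escalate ;;    # documented: hard cap reached
    *) echo failed ;;      # assumed catch-all for other codes
  esac
}
```

The wrapper would be called with the full command line, for example `handle_healing_result ./optimization/self-healing.sh --error-type api-error --agent agent-writer --attempt 5 --context "..."`, and the caller then decides whether `escalate` means pausing, alerting a human, or aborting the run.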

## Connection to feedback and VFM

```
feedback-collector.sh -> FEEDBACK.md -> performance-scorer.sh -> flagged agents
                                     |
                              pipeline-optimizer.sh -> RECOMMENDATIONS.md
                                     |
                           (manual review + approval)
                                     |
                            prompt/pipeline update
                                     |
                           new runs -> new feedback
```

VFM pre-scores in `RECOMMENDATIONS.md` use the same 0–100 scale as
`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are
pre-scores, not final scores: the VFM evaluation still needs to run
when the task is scheduled. The pre-scores only help prioritize which
recommendations to tackle first.

## Safety limits

- `pipeline-optimizer.sh`: read-only analysis; never modifies pipeline files
- `self-healing.sh`: hard cap of 5 total attempts; permission errors are never retried
- All events are logged to `healing-log.jsonl` as an audit trail
- No auto-escalation to external systems; the script signals through exit codes only

## Usage

```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh

# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline

# Handle an error in a pipeline step
./optimization/self-healing.sh \
  --error-type api-error \
  --agent agent-writer \
  --attempt 1 \
  --context "OpenAI timeout on summarize call"

# Check healing log (JSON Lines: one object per line; --json-lines needs Python 3.8+)
python3 -m json.tool --json-lines healing-log.jsonl
```