# Pipeline Optimization and Self-Healing

Two tools for making agent pipelines more efficient and resilient over time.

## pipeline-optimizer.sh

Analyzes FEEDBACK.md and cost-events.jsonl to identify:

| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |

Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each recommendation. Higher VFM pre-scores mean more value per implementation effort.

**This script does not auto-implement anything.** All changes require manual review and explicit approval. This is intentional: pipeline restructuring is a high-stakes operation.

## self-healing.sh

Categorizes errors and applies targeted recovery strategies:

| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with shorter scope | 5 (hard cap) |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^n seconds) | 3 |
| `content-quality` | Retry with stricter prompt | 2 |

**Hard cap: 5 total attempts regardless of category.** This follows the OpenClaw pattern: unbounded retry loops are the most common cause of runaway agent costs. The cap is non-negotiable.

After the hard cap is reached, the script exits with code 2 (escalate). The caller is responsible for deciding whether to pause, alert a human, or abort the pipeline run.

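
The retry rules in the `self-healing.sh` table can be sketched roughly as follows. This is an illustration, not the script's actual code. The function names (`max_retries_for`, `backoff_seconds`, `should_retry`) are hypothetical. It assumes that `--attempt` is 1-based, so the retries already used equal `attempt - 1`, and that unknown error types are never retried.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the retry policy described in the table above.
# Not the real self-healing.sh; all names here are illustrative.

HARD_CAP=5  # total attempts allowed, regardless of error category

# Print the per-category retry limit taken from the table.
max_retries_for() {
  case "$1" in
    timeout)                          echo 5 ;;
    api-error)                        echo 3 ;;
    content-quality)                  echo 2 ;;
    permission-denied|tool-not-found) echo 0 ;;
    *)                                echo 0 ;;  # assumption: unknown types are not retried
  esac
}

# Print the exponential backoff delay for retry n: 2^n seconds.
backoff_seconds() {
  echo $(( 1 << $1 ))
}

# Succeed (status 0) if another attempt is allowed after failed attempt $2.
# Assumption: attempts are 1-based, so retries used so far = attempt - 1.
should_retry() {
  local error_type="$1" attempt="$2"
  local limit retries_used
  limit="$(max_retries_for "$error_type")"
  retries_used=$(( attempt - 1 ))
  (( attempt < HARD_CAP && retries_used < limit ))
}
```

Under these assumptions, a caller could guard each retry with something like `should_retry api-error 2 && sleep "$(backoff_seconds 2)"`. Because the hard cap counts total attempts, a `timeout` error gets at most four retries after the first attempt, even though its category limit is listed as 5.
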
## Connection to feedback and VFM

```
feedback-collector.sh -> FEEDBACK.md -> performance-scorer.sh -> flagged agents
        |
pipeline-optimizer.sh -> RECOMMENDATIONS.md
        |
(manual review + approval)
        |
prompt/pipeline update
        |
new runs -> new feedback
```

VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as `scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are pre-scores, not final scores: the VFM evaluation still needs to run when the task is scheduled. The pre-scores help prioritize which recommendations to tackle first.

## Safety limits

- `pipeline-optimizer.sh`: read-only analysis that never modifies pipeline files
- `self-healing.sh`: hard cap of 5 attempts; permission errors are never retried
- All events are logged to `healing-log.jsonl` as an audit trail
- No auto-escalation to external systems; exit codes only

## Usage

```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh

# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline

# Handle an error in a pipeline step
./optimization/self-healing.sh \
  --error-type api-error \
  --agent agent-writer \
  --attempt 1 \
  --context "OpenAI timeout on summarize call"

# Pretty-print the healing log (one JSON object per line; --json-lines needs Python 3.8+)
python3 -m json.tool --json-lines healing-log.jsonl
```
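
Since `self-healing.sh` signals escalation only through its exit code, the calling pipeline has to interpret that code itself. A minimal sketch of such a caller is shown below. Only exit code 2 (escalate) is documented here. Treating 0 as "continue" and every other code as "unexpected" is an assumption, and the helper names `describe_exit` and `decide` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical caller-side handling of self-healing.sh exit codes.
# Only exit code 2 (escalate) is documented; the meaning assigned to
# 0 and to other codes below is an assumption made for illustration.

# Map an exit status to a caller decision.
describe_exit() {
  case "$1" in
    0) echo "continue" ;;    # assumption: recovery was handled
    2) echo "escalate" ;;    # documented: hard cap reached
    *) echo "unexpected" ;;  # assumption: anything else is treated as a failure
  esac
}

# Run the given command and print the decision for its exit status.
decide() {
  local status=0
  "$@" || status=$?
  describe_exit "$status"
}
```

A pipeline step might then call `decide ./optimization/self-healing.sh --error-type api-error --agent agent-writer --attempt 5 --context "..."` and pause, alert a human, or abort when the result is `escalate`.
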