Pipeline Optimization and Self-Healing
Two tools for making agent pipelines more efficient and resilient over time.
pipeline-optimizer.sh
Analyzes FEEDBACK.md and cost-events.jsonl to identify:
| Issue | Detection | Recommendation |
|---|---|---|
| Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ loop-excess pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
Output is written to RECOMMENDATIONS.md with a VFM pre-score for each
recommendation. Higher VFM pre-scores mean more value per implementation effort.
This script does not auto-implement anything. All changes require manual review and explicit approval. This is intentional — pipeline restructuring is a high-stakes operation.
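
The cost-outlier rule from the table above (a single run at or above 3x the average) could be checked roughly as follows. This is only a sketch, not the script's actual logic: the function name `find_cost_outliers` and the input format (one `run_id cost` pair per line) are assumptions made for illustration, since the real `cost-events.jsonl` schema is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the IDs of runs whose cost is >= 3x the average.
# Assumed input on stdin: one "run_id cost" pair per line
# (this is NOT the real cost-events.jsonl schema).
find_cost_outliers() {
  awk '
    { id[NR] = $1; cost[NR] = $2 + 0; total += $2 }
    END {
      if (NR == 0) exit            # no runs, nothing to report
      avg = total / NR
      for (i = 1; i <= NR; i++)
        if (cost[i] >= 3 * avg) print id[i]
    }
  '
}
```

Note that the outlier itself is included in the average here, so a run has to be well above its peers before it is flagged. Whether the real script excludes the candidate run from the average is not stated.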
self-healing.sh
Categorizes errors and applies targeted recovery strategies:
| Error Type | Recovery | Max Retries |
|---|---|---|
| timeout | Retry with shorter scope | 5 (hard cap) |
| permission-denied | Log and skip | 0 (no retry) |
| tool-not-found | Alert operator | 0 (no retry) |
| api-error | Exponential backoff (2^n seconds) | 3 |
| content-quality | Retry with stricter prompt | 2 |
Hard cap: 5 total attempts regardless of category. This follows the OpenClaw pattern — unbounded retry loops are the most common cause of runaway agent costs. The cap is non-negotiable.
After the hard cap is reached, the script exits with code 2 (escalate). The caller is responsible for deciding whether to pause, alert a human, or abort the pipeline run.
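
One way to picture how the per-category limits and the hard cap interact is the sketch below. The function names are hypothetical and this is not the code in self-healing.sh. It assumes that the attempt number is 1-based and refers to the attempt that just failed, that `n` in the backoff formula is that attempt number, and that exit status 1 means "do not retry". Only exit code 2 (escalate) is documented.

```shell
#!/usr/bin/env bash
HARD_CAP=5  # total attempts allowed, regardless of category

# Per-category retry limit, taken from the table above.
max_retries_for() {
  case "$1" in
    timeout)                          echo 5 ;;
    api-error)                        echo 3 ;;
    content-quality)                  echo 2 ;;
    permission-denied|tool-not-found) echo 0 ;;
    *)                                echo 0 ;;  # assumption: unknown types are not retried
  esac
}

# Backoff delay for api-error: 2^n seconds (n = attempt number, an assumption).
backoff_seconds() {
  echo $(( 1 << $1 ))
}

# decide_retry ERROR_TYPE ATTEMPT
#   returns 0 = retry, 1 = do not retry (assumed code), 2 = escalate (documented)
decide_retry() {
  local error_type="$1" attempt="$2"
  if [ "$attempt" -ge "$HARD_CAP" ]; then
    return 2   # hard cap reached: hand the decision to the caller
  fi
  if [ "$attempt" -le "$(max_retries_for "$error_type")" ]; then
    return 0   # still within the per-category limit
  fi
  return 1     # category limit exhausted before the hard cap
}
```

Under these assumptions a `timeout` error is retried after attempts 1 through 4 and escalates after attempt 5, while `permission-denied` is never retried.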
Connection to feedback and VFM
feedback-collector.sh -> FEEDBACK.md -> performance-scorer.sh -> flagged agents
|
pipeline-optimizer.sh -> RECOMMENDATIONS.md
|
(manual review + approval)
|
prompt/pipeline update
|
new runs -> new feedback
VFM pre-scores in RECOMMENDATIONS.md use the same 0–100 scale as
scripts/templates/proactive/VFM-SCORING.md (Step 11). They are
pre-scores, not final scores — the VFM evaluation still needs to run
when the task is scheduled. The pre-scores help prioritize which
recommendations to tackle first.
Safety limits
- pipeline-optimizer.sh: read-only analysis, never modifies pipeline files
- self-healing.sh: max 5 attempts hard cap, permission errors never retried
- All events logged to healing-log.jsonl for audit trail
- No auto-escalation to external systems: exit codes only
Usage
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh
# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline
# Handle an error in a pipeline step
./optimization/self-healing.sh \
--error-type api-error \
--agent agent-writer \
--attempt 1 \
--context "OpenAI timeout on summarize call"
# Check healing log (JSON Lines: one object per line; --json-lines needs Python 3.8+)
python3 -m json.tool --json-lines healing-log.jsonl
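
Because escalation is signalled through exit codes only, the calling pipeline has to interpret the result itself. The wrapper below is a minimal sketch of that. The function name `heal_or_escalate` and the `HEALER` variable are hypothetical, and the meanings given to exit code 0 and to codes other than 2 are assumptions. Only exit code 2 (hard cap reached) is documented above.

```shell
#!/usr/bin/env bash
# Path to the healer; can be overridden, e.g. with a stub for testing.
HEALER="${HEALER:-./optimization/self-healing.sh}"

# heal_or_escalate ERROR_TYPE AGENT ATTEMPT CONTEXT
# Prints the action the caller should take: continue, escalate, or abort.
heal_or_escalate() {
  local rc=0
  "$HEALER" --error-type "$1" --agent "$2" --attempt "$3" --context "$4" || rc=$?
  case "$rc" in
    0) echo "continue" ;;  # assumption: 0 means a recovery strategy was applied
    2) echo "escalate" ;;  # documented: hard cap reached
    *) echo "abort" ;;     # assumption: any other code is treated as fatal
  esac
}
```

A caller might then branch on the printed action, for example pausing the run and alerting a human on `escalate`.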