agent-builder/scripts/templates/optimization/README.md
Kjell Tore Guttormsen fa8bc86897 feat(templates): add pipeline optimization and self-healing templates
Session 5 step 21 — pipeline-optimizer writes RECOMMENDATIONS.md with
VFM pre-scores (never modifies pipeline files directly). self-healing
categorizes errors and applies recovery strategies with 5-attempt hard
cap, logging to healing-log.jsonl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 06:51:38 +02:00

88 lines
3.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Pipeline Optimization and Self-Healing
Two tools for making agent pipelines more efficient and resilient over time.
## pipeline-optimizer.sh
Analyzes FEEDBACK.md and cost-events.jsonl to identify:
| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Top-2 by cost event count, 1.5x+ avg | Batch tool calls or narrow task scope |
| Unnecessary revision loops | 3+ `loop-excess` pattern rows | Tighten acceptance criteria, add max-iterations guard |
| Underutilized agent | Appears in < 10% of pipeline runs | Remove from pipeline or combine with another agent |
| Cost outlier | Single run >= 3x average | Add per-run budget cap via budget-hook.sh |
Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
recommendation. Higher VFM pre-scores mean more value per implementation effort.
**This script does not auto-implement anything.** All changes require
manual review and explicit approval. This is intentional — pipeline
restructuring is a high-stakes operation.
## self-healing.sh
Categorizes errors and applies targeted recovery strategies:
| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with shorter scope | 5 (hard cap) |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^n seconds) | 3 |
| `content-quality` | Retry with stricter prompt | 2 |
**Hard cap: 5 total attempts regardless of category.** This follows the
OpenClaw pattern — unbounded retry loops are the most common cause of
runaway agent costs. The cap is non-negotiable.
After the hard cap is reached, the script exits with code 2 (escalate).
The caller is responsible for deciding whether to pause, alert a human,
or abort the pipeline run.
## Connection to feedback and VFM
```
feedback-collector.sh -> FEEDBACK.md -> performance-scorer.sh -> flagged agents
|
pipeline-optimizer.sh -> RECOMMENDATIONS.md
|
(manual review + approval)
|
prompt/pipeline update
|
new runs -> new feedback
```
VFM pre-scores in RECOMMENDATIONS.md use the same 0100 scale as
`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are
pre-scores, not final scores — the VFM evaluation still needs to run
when the task is scheduled. The pre-scores help prioritize which
recommendations to tackle first.
## Safety limits
- `pipeline-optimizer.sh`: read-only analysis — never modifies pipeline files
- `self-healing.sh`: max 5 attempts hard cap, permission errors never retried
- All events logged to `healing-log.jsonl` for audit trail
- No auto-escalation to external systems — exit codes only
## Usage
```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh
# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline
# Handle an error in a pipeline step
./optimization/self-healing.sh \
--error-type api-error \
--agent agent-writer \
--attempt 1 \
--context "OpenAI timeout on summarize call"
# Check healing log
cat healing-log.jsonl | python3 -m json.tool
```