feat(templates): add pipeline optimization and self-healing templates

Session 5 step 21: pipeline-optimizer writes RECOMMENDATIONS.md with VFM pre-scores (it never modifies pipeline files directly). self-healing categorizes errors and applies recovery strategies with a 5-attempt hard cap, logging to healing-log.jsonl.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent d743ec7fbf
commit fa8bc86897
3 changed files with 456 additions and 0 deletions
scripts/templates/optimization/README.md (new file, +88 lines)
# Pipeline Optimization and Self-Healing

Two tools for making agent pipelines more efficient and resilient over time.

## pipeline-optimizer.sh

Analyzes `FEEDBACK.md` and `cost-events.jsonl` to identify:

| Issue | Detection | Recommendation |
|-------|-----------|----------------|
| Bottleneck agent | Top 2 agents by cost-event count, each at 1.5x the average or more | Batch tool calls or narrow the task scope |
| Unnecessary revision loops | 3 or more `loop-excess` pattern rows | Tighten acceptance criteria, add a max-iterations guard |
| Underutilized agent | Appears in fewer than 10% of pipeline runs | Remove from the pipeline or combine with another agent |
| Cost outlier | A single run costs 3x the average or more | Add a per-run budget cap via `budget-hook.sh` |

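The detection rules in the table reduce to simple numeric predicates. The following is a minimal sketch, not the optimizer's actual code. All function names are hypothetical. The sketch assumes the counts and costs have already been aggregated from `FEEDBACK.md` and `cost-events.jsonl`, and it omits that parsing step. `top_two_agents` expects hypothetical `<agent> <event-count>` lines on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical predicates for the optimizer's detection rules.
# Inputs are numbers assumed to be pre-aggregated from FEEDBACK.md and
# cost-events.jsonl; each predicate returns 0 when its rule fires.

# Print the two agents with the most cost events.
# Assumed input on stdin: one "<agent> <event-count>" pair per line.
top_two_agents() {
  sort -k2,2nr | head -n 2 | awk '{ print $1 }'
}

# True when $1 >= $3 * $2 (awk handles fractional values).
at_least_times() {
  awk -v a="$1" -v b="$2" -v k="$3" 'BEGIN { exit !(a >= k * b) }'
}

# Bottleneck: event count at least 1.5x the average count.
is_bottleneck() { at_least_times "$1" "$2" 1.5; }    # $1=count, $2=average

# Revision loops: 3 or more loop-excess pattern rows.
has_loop_excess() { (( $1 >= 3 )); }                  # $1=row count

# Underutilized: appears in fewer than 10% of pipeline runs.
is_underutilized() {                                  # $1=agent runs, $2=total runs
  awk -v a="$1" -v t="$2" 'BEGIN { exit !(t > 0 && a < 0.10 * t) }'
}

# Cost outlier: a single run costs at least 3x the average run.
is_cost_outlier() { at_least_times "$1" "$2" 3; }    # $1=run cost, $2=average
```

Bash arithmetic is integer-only, so the fractional thresholds (1.5x, 10%) are delegated to `awk`, which exits 0 when the condition holds.
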
Output is written to `RECOMMENDATIONS.md` with a VFM pre-score for each
recommendation. A higher pre-score means more value per unit of
implementation effort.

**This script does not auto-implement anything.** All changes require
manual review and explicit approval. This is intentional: pipeline
restructuring is a high-stakes operation.

## self-healing.sh

Categorizes errors and applies targeted recovery strategies:

| Error Type | Recovery | Max Retries |
|------------|----------|-------------|
| `timeout` | Retry with a narrower scope | Up to the 5-attempt hard cap |
| `permission-denied` | Log and skip | 0 (no retry) |
| `tool-not-found` | Alert operator | 0 (no retry) |
| `api-error` | Exponential backoff (2^n seconds) | 3 |
| `content-quality` | Retry with a stricter prompt | 2 |

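The retry column can be encoded as a lookup, and the `api-error` backoff can be computed as 2^n seconds. This is a minimal sketch with hypothetical function names. It assumes that n is the 1-based retry number. Treating unknown categories as non-retryable is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical encoding of the self-healing retry table.

HARD_CAP=5   # total attempts, regardless of category

# Print the retry limit for an error category.
max_retries() {
  case "$1" in
    timeout)           echo "$HARD_CAP" ;;  # bounded only by the hard cap
    permission-denied) echo 0 ;;            # log and skip
    tool-not-found)    echo 0 ;;            # alert operator
    api-error)         echo 3 ;;            # with exponential backoff
    content-quality)   echo 2 ;;            # with a stricter prompt
    *)                 echo 0 ;;            # assumed: unknown means no retry
  esac
}

# Seconds to wait before api-error retry n (assumed 1-based): 2^n.
backoff_seconds() {
  echo $(( 2 ** $1 ))
}
```

For example, `sleep "$(backoff_seconds 2)"` waits 4 seconds before the second retry.
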
**Hard cap: 5 total attempts regardless of category.** This follows the
OpenClaw pattern: unbounded retry loops are the most common cause of
runaway agent costs. The cap is non-negotiable.

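One way to enforce the cap is to convert a category's retry limit into total attempts (first try plus retries) and clamp the result to 5. This sketch is hypothetical: `run_capped` is not part of `self-healing.sh`. It also returns 2 when a smaller per-category limit runs out. That goes beyond the documented behavior and is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical enforcement of the 5-attempt hard cap.

HARD_CAP=5   # total attempts, regardless of error category

# Usage: run_capped <max-retries> <command> [args...]
# Returns 0 on the first success, 2 ("escalate") once attempts run out.
run_capped() {
  local attempts=$(( $1 + 1 ))   # first try plus retries
  shift
  if (( attempts > HARD_CAP )); then
    attempts=$HARD_CAP           # the hard cap always wins
  fi
  local n
  for (( n = 1; n <= attempts; n++ )); do
    if "$@"; then
      return 0
    fi
  done
  return 2
}
```

Clamping in one place means no category limit, however large, can push the total past five attempts.
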
After the hard cap is reached, the script exits with code 2 (escalate).
The caller is responsible for deciding whether to pause, alert a human,
or abort the pipeline run.

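On the caller side, the exit status can be mapped to a next action. This is a minimal sketch in which `on_healing_result` is a hypothetical wrapper. Only exit code 2 is documented. Mapping 0 to "continue" and every other status to "abort" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical caller-side handling of the healing step's exit status.

# Usage: on_healing_result <command> [args...]
# Prints the next pipeline action: continue, escalate, or abort.
on_healing_result() {
  local status=0
  # Send the wrapped command's output to stderr so stdout carries only the action.
  "$@" >&2 || status=$?
  case "$status" in
    0) echo "continue" ;;   # assumed: healing succeeded
    2) echo "escalate" ;;   # documented: hard cap reached
    *) echo "abort" ;;      # assumed: any other failure
  esac
}
```

For example: `action=$(on_healing_result ./optimization/self-healing.sh --error-type api-error --agent agent-writer --attempt 1 --context "OpenAI timeout on summarize call")`.
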
## Connection to feedback and VFM

```
feedback-collector.sh -> FEEDBACK.md -> performance-scorer.sh -> flagged agents
    |
pipeline-optimizer.sh -> RECOMMENDATIONS.md
    |
(manual review + approval)
    |
prompt/pipeline update
    |
new runs -> new feedback
```

VFM pre-scores in `RECOMMENDATIONS.md` use the same 0–100 scale as
`scripts/templates/proactive/VFM-SCORING.md` (Step 11). They are
pre-scores, not final scores: the VFM evaluation still needs to run
when the task is scheduled. The pre-scores help prioritize which
recommendations to tackle first.

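Prioritizing by pre-score amounts to a validated descending sort. This is a minimal sketch that assumes a hypothetical tab-separated input of `<pre-score><TAB><recommendation>`. `RECOMMENDATIONS.md` itself is Markdown, so a real workflow would need its own extraction step first, and `prioritize` is an illustrative name.

```bash
#!/usr/bin/env bash
# Hypothetical prioritizer for recommendations by VFM pre-score.
# Assumed input on stdin: "<pre-score><TAB><recommendation>" per line.

prioritize() {
  # Keep integer scores within the 0-100 scale, then sort highest first.
  awk -F'\t' '$1 ~ /^[0-9]+$/ && $1 + 0 <= 100' | sort -t $'\t' -k1,1nr
}
```

Lines with out-of-range or non-numeric scores are dropped rather than guessed at, so a malformed entry cannot jump the queue.
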
## Safety limits

- `pipeline-optimizer.sh`: read-only analysis; never modifies pipeline files
- `self-healing.sh`: hard cap of 5 attempts; permission errors are never retried
- All events are logged to `healing-log.jsonl` as an audit trail
- No automatic escalation to external systems; escalation is signaled through exit codes only

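Appending one JSON object per line keeps the audit trail easy to tail and to pretty-print. This is a minimal sketch, and several parts are assumptions:

- `log_event` is a hypothetical function.
- The field names mirror the `self-healing.sh` flags and are not the real schema of `healing-log.jsonl`.
- The escaping covers only backslashes and double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical audit logger: one JSON object per line (JSON Lines).

# Escape backslashes first, then double quotes. Newlines and other control
# characters are NOT escaped in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Usage: log_event <log-file> <error-type> <agent> <attempt> <context>
# Field names are assumptions that mirror the self-healing.sh flags.
log_event() {
  local file=$1
  printf '{"ts":"%s","error_type":"%s","agent":"%s","attempt":%d,"context":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(json_escape "$2")" "$(json_escape "$3")" "$4" "$(json_escape "$5")" \
    >> "$file"
}
```

The escaping order matters. Escaping quotes first would let the backslash pass double the backslashes just placed in front of those quotes.
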
## Usage

```bash
# Run optimizer for all pipelines
./optimization/pipeline-optimizer.sh

# Run optimizer for a specific pipeline
./optimization/pipeline-optimizer.sh --pipeline doc-pipeline

# Handle an error in a pipeline step
./optimization/self-healing.sh \
  --error-type api-error \
  --agent agent-writer \
  --attempt 1 \
  --context "OpenAI timeout on summarize call"

# Pretty-print the healing log (JSON Lines: one object per line;
# --json-lines requires Python 3.8+)
python3 -m json.tool --json-lines healing-log.jsonl
```