feat(templates): add budget tracking templates (Paperclip pattern)

Session 4 step 16: post-hoc enforcement via PostToolUse hook with PAUSED flag, budget-report.sh aggregates spend against window limit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 06:55:33 +02:00 · ec6f7c150e
commit ec6f7c150e
parent 506f532f88
4 changed files with 246 additions and 0 deletions
--- a/scripts/templates/budget/BUDGET.md
+++ b/scripts/templates/budget/BUDGET.md
@@ -0,0 +1,26 @@
 # Budget Policy: {{PROJECT_NAME}}
 ## Company Budget
 - window: {{BUDGET_WINDOW}}
 - limit: {{BUDGET_LIMIT_CENTS}} cents
 - warn_percent: 80
 - hard_stop: true
 ## Agent Budgets
 - {{AGENT_NAME}}: {{AGENT_BUDGET_CENTS}} cents/{{BUDGET_WINDOW}}
 ## Notification
 - on_warn: log
 - on_hard_stop: pause
 ## Notes
 Budget enforcement is POST-HOC (checked after each run, not before).
 This matches Paperclip's proven approach: check SUM(cost) after run,
 pause if exceeded. No pre-run reservation needed.
 The bundled hook counts each logged event as one cent, which is only a
 rough proxy. A closer estimate multiplies token counts by published
 pricing. For accurate cost data, organizations can use the Admin API:
 `/v1/organizations/cost_report` (requires Admin API key: sk-ant-admin...).
 For headless runs, use `claude -p --max-budget-usd N` as a per-run cap.
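
As an illustration of how the policy fields above can be read, here is a minimal Python sketch that uses the same regular expressions as `budget-hook.sh`. The function name `parse_policy`, the returned keys, and the sample values (`monthly`, `5000`) are assumptions made for this example; they are not part of the templates.

```python
import re


def parse_policy(text):
    """Extract limit, warn_percent, and hard_stop from BUDGET.md-style text.

    Hypothetical helper: it mirrors the regexes and defaults used in
    budget-hook.sh, but it is not part of the shipped templates.
    """
    limit_m = re.search(r"limit:\s*(\d+)\s*cents", text)
    warn_m = re.search(r"warn_percent:\s*(\d+)", text)
    hard_m = re.search(r"hard_stop:\s*(\w+)", text)
    return {
        # None means no numeric limit was found (e.g. an unrendered template).
        "limit_cents": int(limit_m.group(1)) if limit_m else None,
        "warn_percent": int(warn_m.group(1)) if warn_m else 80,
        "hard_stop": hard_m.group(1).lower() == "true" if hard_m else True,
    }


# Sample values below are made up for the example.
sample = """## Company Budget
- window: monthly
- limit: 5000 cents
- warn_percent: 80
- hard_stop: true
"""
policy = parse_policy(sample)
print(policy)
```

An unrendered placeholder such as `{{BUDGET_LIMIT_CENTS}}` does not match the limit pattern, so the sketch returns `None` for the limit instead of raising.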
--- a/scripts/templates/budget/README.md
+++ b/scripts/templates/budget/README.md
@@ -0,0 +1,46 @@
 # Budget Tracking
 Post-hoc budget enforcement inspired by Paperclip's budget system.
 ## How it works
 1. `budget-hook.sh` runs as a PostToolUse hook after every tool call
 2. Each call is logged to `budget/cost-events.jsonl`
 3. After logging, cumulative cost is compared against `BUDGET.md` policy
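 4. If the soft threshold (default 80%) is reached: a warning is written to stderr
 5. If the hard threshold (100%) is reached and hard_stop=true: the hook creates
   the `budget/PAUSED` flag file and exits with code 2 on this and every later
   call until the flag is removed. Because PostToolUse fires after the tool has
   run, exit 2 reports the pause to Claude rather than undoing that call.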
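
The threshold logic in steps 3 to 5 can be sketched as a small pure function. This is an illustrative rewrite of the check in `budget-hook.sh`, not the hook itself; the name `check_budget` is hypothetical, and treating a non-positive limit as "ok" is a simplification made for the example.

```python
def check_budget(spent_cents, limit_cents, warn_percent=80, hard_stop=True):
    """Return 'ok', 'warn', or 'hard_stop' for the current spend.

    Hypothetical helper mirroring the comparison done in budget-hook.sh.
    """
    if limit_cents <= 0:
        # Simplification: a missing or zero limit is treated as "no limit".
        return "ok"
    pct = spent_cents / limit_cents * 100
    if pct >= 100 and hard_stop:
        return "hard_stop"
    if pct >= warn_percent:
        # Also reached when the limit is exceeded but hard_stop is false.
        return "warn"
    return "ok"


print(check_budget(50, 100))                    # below the soft threshold
print(check_budget(80, 100))                    # soft threshold reached
print(check_budget(100, 100))                   # hard threshold reached
print(check_budget(120, 100, hard_stop=False))  # over the limit, warn only
```

Keeping the comparison free of file access makes the thresholds easy to test separately from the logging and flag-file handling.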
 ## Why post-hoc, not pre-run?
 Paperclip uses the same approach. Pre-run budget reservation requires a
 persistent service or lock file coordination. Post-hoc checking is simpler
 and robust enough in practice — the worst case is one extra run before pause.
 ## Cost estimation
 The current implementation counts each logged tool call as one cent, which is
 only a rough proxy for cost. For accurate cost tracking, you have two options:
 1. **Admin API** (org accounts only): Query `/v1/organizations/cost_report`
   with an Admin API key (`sk-ant-admin...`). This gives actual USD costs.
 2. **Token estimation**: Parse token counts from Claude's responses and
   multiply by published per-token prices. More accurate than event counting
   but still an estimate.
 For headless runs, `claude -p --max-budget-usd N` provides a per-run
 budget cap directly in the CLI.
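
Option 2 amounts to simple arithmetic. The sketch below assumes per-million-token prices are supplied by the caller; the function name `estimate_cents` and the prices `1.0` and `2.0` USD are placeholders chosen for easy arithmetic, not real pricing, and how token counts are obtained from a response is left out.

```python
def estimate_cents(input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
    """Estimate cost in cents from token counts and per-million-token prices.

    Hypothetical helper; the prices must come from the published price list.
    """
    usd = (input_tokens / 1_000_000) * usd_per_mtok_in
    usd += (output_tokens / 1_000_000) * usd_per_mtok_out
    return usd * 100


# Placeholder prices for illustration only (NOT real pricing).
PRICE_IN_USD_PER_MTOK = 1.0
PRICE_OUT_USD_PER_MTOK = 2.0

cents = estimate_cents(500_000, 250_000, PRICE_IN_USD_PER_MTOK, PRICE_OUT_USD_PER_MTOK)
print(cents)
```

A value like this could replace the hard-coded `'estimated_tokens': 0` field in the logged event, at which point the budget check could sum estimated cents instead of counting lines.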
 ## Integration
 Add to `.claude/settings.json`:
 ```json
 {
  "hooks": {
    "PostToolUse": [{
      "matcher": "*",
      "hooks": [{"type": "command", "command": "bash budget/budget-hook.sh"}]
    }]
  }
 }
 ```
--- a/scripts/templates/budget/budget-hook.sh
+++ b/scripts/templates/budget/budget-hook.sh
@@ -0,0 +1,90 @@
 #!/bin/bash
 # PostToolUse hook: Log cost events and enforce budget.
 # Bash 3.2 compatible. Uses python3 for JSON parsing.
 #
 # Follows Paperclip's post-hoc enforcement pattern:
 # 1. Log cost event after each tool call
 # 2. Check cumulative cost against budget policy
 # 3. Warn at soft threshold, pause at hard threshold
 #
 # Placeholders:
 #   {{WORKING_DIR}} - absolute path to project directory
 WORKING_DIR="{{WORKING_DIR}}"
 BUDGET_DIR="$WORKING_DIR/budget"
 COST_LOG="$BUDGET_DIR/cost-events.jsonl"
 BUDGET_FILE="$WORKING_DIR/BUDGET.md"
 PAUSED_FLAG="$BUDGET_DIR/PAUSED"
 mkdir -p "$BUDGET_DIR"
 # Read hook input
 INPUT=$(cat)
 TOOL_NAME=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_name',''))" 2>/dev/null)
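 # Log cost event. The payload is piped on stdin and the log path is passed via
 # the environment, so quotes or backslashes in the JSON cannot break (or inject
 # into) the Python source.
 printf '%s' "$INPUT" | COST_LOG="$COST_LOG" python3 -c "
 import json, sys, time, os
 try:
    data = json.load(sys.stdin)
 except ValueError:
    # Not valid JSON: skip logging for this call
    sys.exit(0)
 tool_name = data.get('tool_name', '')
 event = {
    'timestamp': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
    'tool_name': tool_name,
    'agent': os.environ.get('AGENT_NAME', 'unknown'),
    'estimated_tokens': 0
 }
 with open(os.environ['COST_LOG'], 'a') as f:
    f.write(json.dumps(event) + '\n')
 " 2>/dev/null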
 # Check budget if BUDGET.md exists
 if [ -f "$BUDGET_FILE" ] && [ -f "$COST_LOG" ]; then
  BUDGET_RESULT=$(BUDGET_FILE="$BUDGET_FILE" COST_LOG="$COST_LOG" PAUSED_FLAG="$PAUSED_FLAG" python3 -c "
 import re, json, os
 budget_file = os.environ.get('BUDGET_FILE', '')
 cost_log = os.environ.get('COST_LOG', '')
 paused_flag = os.environ.get('PAUSED_FLAG', '')
 try:
    content = open(budget_file).read()
    limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
    if not limit_m: print('ok'); exit(0)
    limit = int(limit_m.group(1))
    warn_m = re.search(r'warn_percent:\s*(\d+)', content)
    warn_pct = int(warn_m.group(1)) if warn_m else 80
    hard_m = re.search(r'hard_stop:\s*(\w+)', content)
    hard_stop = hard_m.group(1).lower() == 'true' if hard_m else True
    event_count = sum(1 for _ in open(cost_log))
    estimated_cents = event_count
    pct = (estimated_cents / limit * 100) if limit > 0 else 0
    if pct >= 100 and hard_stop:
        open(paused_flag, 'w').write('Budget exceeded: ' + str(estimated_cents) + '/' + str(limit) + ' cents')
        print('hard_stop')
    elif pct >= warn_pct:
        print('warn')
    else:
        print('ok')
 except Exception as e:
    print('ok')
 " 2>/dev/null)
  if [ "$BUDGET_RESULT" = "hard_stop" ]; then
    echo "BUDGET EXCEEDED — agent paused. Check $PAUSED_FLAG" >&2
  elif [ "$BUDGET_RESULT" = "warn" ]; then
    echo "BUDGET WARNING — approaching limit" >&2
  fi
 fi
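 # Check if agent is paused. PostToolUse runs after the tool call has completed,
 # so this reports the pause rather than preventing the call. With exit code 2
 # the message on stderr is what gets surfaced to Claude.
 if [ -f "$PAUSED_FLAG" ]; then
  echo "Agent paused: budget exceeded. Remove $PAUSED_FLAG to resume." >&2
  exit 2
 fi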
 exit 0
--- a/scripts/templates/budget/budget-report.sh
+++ b/scripts/templates/budget/budget-report.sh
@@ -0,0 +1,84 @@
 #!/bin/bash
 # Budget report: summarize cost events and compare against policy.
 # Bash 3.2 compatible. Uses python3 for aggregation.
 #
 # Usage: ./budget-report.sh
 #
 # Placeholders:
 #   {{WORKING_DIR}} - absolute path to project directory
 WORKING_DIR="{{WORKING_DIR}}"
 COST_LOG="$WORKING_DIR/budget/cost-events.jsonl"
 BUDGET_FILE="$WORKING_DIR/BUDGET.md"
 PAUSED_FLAG="$WORKING_DIR/budget/PAUSED"
 if [ ! -f "$COST_LOG" ]; then
  echo "No cost events recorded yet."
  exit 0
 fi
 COST_LOG="$COST_LOG" BUDGET_FILE="$BUDGET_FILE" PAUSED_FLAG="$PAUSED_FLAG" python3 -c "
 import json, re, os
 from collections import defaultdict
 cost_log = os.environ.get('COST_LOG', '')
 budget_file = os.environ.get('BUDGET_FILE', '')
 paused_flag = os.environ.get('PAUSED_FLAG', '')
 # Read events
 events = []
 with open(cost_log) as f:
    for line in f:
        line = line.strip()
        if line:
             try:
                 events.append(json.loads(line))
             except ValueError:
                 pass  # skip malformed lines
 # Aggregate
 by_agent = defaultdict(int)
 by_day = defaultdict(int)
 by_tool = defaultdict(int)
 for e in events:
    agent = e.get('agent', 'unknown')
    day = e.get('timestamp', '')[:10]
    tool = e.get('tool_name', 'unknown')
    by_agent[agent] += 1
    by_day[day] += 1
    by_tool[tool] += 1
 print('BUDGET REPORT')
 print('=' * 50)
 print('Total events: ' + str(len(events)))
 print()
 # Per-agent breakdown
 print('By Agent:')
 for agent, count in sorted(by_agent.items(), key=lambda x: -x[1]):
    print('  ' + agent + ': ' + str(count) + ' events')
 print()
 # Per-day breakdown (last 7 days)
 print('By Day (last 7):')
 for day, count in sorted(by_day.items())[-7:]:
    print('  ' + day + ': ' + str(count) + ' events')
 print()
 # Budget comparison
 if os.path.exists(budget_file):
    content = open(budget_file).read()
    limit_m = re.search(r'limit:\s*(\d+)\s*cents', content)
    if limit_m:
        limit = int(limit_m.group(1))
        est_cents = len(events)  # rough proxy
        pct = (est_cents / limit * 100) if limit > 0 else 0
        print('Budget: ~' + str(est_cents) + '/' + str(limit) + ' cents (' + str(round(pct)) + '%)')
 # Paused status
 if os.path.exists(paused_flag):
    print('')
    print('!! AGENT PAUSED: ' + open(paused_flag).read().strip())
    print('   Remove ' + paused_flag + ' to resume')
 "