feat(voyage): add examples/observability/ Docker Compose stack — version-pinned per research/01

Step 14 of v4.1 — local-development observability stack with version-pinned
container images:
  - prom/prometheus:v3.0.1
  - prom/node-exporter:v1.10.2 (textfile collector enabled)
  - grafana/grafana:11.4.0
  - otel/opentelemetry-collector-contrib:0.115.0

Two complementary export paths from voyage hooks/scripts/otel-export.mjs:
  - VOYAGE_EXPORT_MODE=textfile → node-exporter textfile collector
  - VOYAGE_EXPORT_MODE=otlp     → otel-collector OTLP/HTTP receiver (:4318)
Both feed Prometheus → Grafana.

Files:
  examples/observability/docker-compose.yml
  examples/observability/otel-collector-config.yaml
  examples/observability/prometheus.yml
  examples/observability/grafana-datasource.yml
  examples/observability/README.md

Verified manifest expected_paths (5 files). Validation via docker compose config
runs in Step 16 and is skipped when Docker is unavailable.
Kjell Tore Guttormsen 2026-05-09 09:50:13 +02:00
commit 48543f63c2
5 changed files with 240 additions and 0 deletions

examples/observability/README.md

@@ -0,0 +1,75 @@
# Voyage observability: local Docker Compose stack

A version-pinned local-development stack for inspecting Voyage v4.1 metrics
emitted by `hooks/scripts/otel-export.mjs`. Two complementary export paths:

| Mode | Env var | Data flow | Receiving container |
|------|---------|-----------|---------------------|
| Prometheus textfile | `VOYAGE_EXPORT_MODE=textfile` | Voyage writes `voyage.prom` to `./voyage-textfile/`; node-exporter reads it on each scrape | `node-exporter` |
| OTLP/HTTP | `VOYAGE_EXPORT_MODE=otlp` | Voyage POSTs to `http://localhost:4318/v1/metrics` | `otel-collector` (re-exposed as Prometheus on `:8889`) |

Both modes feed Prometheus → Grafana.
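
To make the textfile path concrete: node-exporter's textfile collector parses files ending in `.prom` in the mounted directory, written in the Prometheus text exposition format. The sketch below writes such a file by hand. The metric name `voyage_smoke_test_total` and the file name `voyage-sample.prom` are hypothetical and chosen for illustration; they are not taken from what Voyage actually emits. The write-then-rename step is an assumption about good practice, so that a scrape never reads a half-written file.

```bash
#!/usr/bin/env bash
# Sketch: write a sample .prom file for the node-exporter textfile collector.
# The metric and file names are hypothetical, for illustration only.
write_sample_prom() {
  local dir="$1"
  # The temp name does not end in .prom, so the collector ignores it.
  local tmp="$dir/voyage-sample.prom.$$"
  mkdir -p "$dir"
  {
    echo '# HELP voyage_smoke_test_total Hand-written sample counter (hypothetical).'
    echo '# TYPE voyage_smoke_test_total counter'
    echo 'voyage_smoke_test_total{source="manual"} 1'
  } > "$tmp"
  # A rename within one directory is atomic, so a scrape sees either the old
  # file or the complete new one.
  mv "$tmp" "$dir/voyage-sample.prom"
}
```

Once the stack from the Quickstart is running, `write_sample_prom ./voyage-textfile` followed by `curl -s http://localhost:9100/metrics | grep voyage_smoke_test_total` should show the sample line.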

## Quickstart

```bash
cd examples/observability
mkdir -p voyage-textfile
docker compose up -d
```

Endpoints, as reached from the host:

| Service | URL |
|---------|-----|
| Prometheus UI | http://localhost:9090 |
| Grafana UI | http://localhost:3000 (anonymous Viewer access enabled; admin login `admin`/`admin`) |
| OTLP/HTTP receiver | http://localhost:4318/v1/metrics |
| Node Exporter | http://localhost:9100/metrics |
| OTel Collector Prometheus exporter | http://localhost:8889/metrics |

Stop with `docker compose down`. Add `-v` to also delete the Prometheus and Grafana volumes.
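
A quick way to confirm the stack came up is to probe each endpoint from the table. This is a minimal sketch, assuming `curl` is installed; the function names `probe` and `check_stack` are made up for this example. `/-/ready` is Prometheus's readiness route and `/api/health` is Grafana's health route; the other two probes simply fetch the metrics pages.

```bash
#!/usr/bin/env bash
# Sketch: probe each endpoint and report UP or DOWN.
# probe and check_stack are illustrative helper names, not part of Voyage.
probe() {
  local name="$1" url="$2"
  # -f makes curl fail on HTTP errors; --max-time bounds a hung connection.
  if curl -fsS --max-time 3 -o /dev/null "$url" 2>/dev/null; then
    echo "UP   $name"
  else
    echo "DOWN $name"
  fi
}

check_stack() {
  probe prometheus     http://localhost:9090/-/ready
  probe grafana        http://localhost:3000/api/health
  probe node-exporter  http://localhost:9100/metrics
  probe otel-collector http://localhost:8889/metrics
}
```

Run `check_stack` a few seconds after `docker compose up -d`; a service that stays `DOWN` is worth inspecting with `docker compose logs <service>`.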

## Activating Voyage export

In another terminal, from `examples/observability`, set the env vars for exactly one of the two paths before invoking a Voyage command:

```bash
# Path A: textfile mode
export VOYAGE_EXPORT_MODE=textfile
export VOYAGE_TEXTFILE_DIR="$(pwd)/voyage-textfile"

# Path B: OTLP mode (use instead of Path A, not in addition)
export VOYAGE_EXPORT_MODE=otlp
export VOYAGE_OTLP_ENDPOINT=http://localhost:4318/v1/metrics
```

The Stop hook (wired in `hooks/hooks.json`) runs
`hooks/scripts/otel-export.mjs` automatically at session end.

See `docs/observability.md` for the full env-var matrix and security notes.
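
The OTLP path can also be exercised without running Voyage, by POSTing a hand-built OTLP/JSON payload to the collector. The sketch below assumes `curl` is available; the service name `voyage-smoke`, scope name `manual-smoke`, and metric name `smoke_test_total` are hypothetical and do not reflect Voyage's real metric names.

```bash
#!/usr/bin/env bash
# Sketch: build a minimal OTLP/JSON metrics payload and POST it to the collector.
# Service, scope, and metric names are hypothetical placeholders.
build_otlp_payload() {
  local now_ns="$1"   # nanoseconds since the Unix epoch, passed as a string
  # aggregationTemporality 2 means cumulative; 64-bit integers are JSON strings.
  cat <<EOF
{"resourceMetrics":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"voyage-smoke"}}]},
"scopeMetrics":[{"scope":{"name":"manual-smoke"},"metrics":[{"name":"smoke_test_total",
"sum":{"aggregationTemporality":2,"isMonotonic":true,
"dataPoints":[{"asInt":"1","timeUnixNano":"$now_ns"}]}}]}]}]}
EOF
}

send_otlp_sample() {
  local endpoint="${VOYAGE_OTLP_ENDPOINT:-http://localhost:4318/v1/metrics}"
  # date +%s gives seconds; appending nine zeros converts to nanoseconds.
  build_otlp_payload "$(date +%s)000000000" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- "$endpoint"
}
```

After `send_otlp_sample`, the sample should show up at `http://localhost:8889/metrics`, most likely prefixed by the exporter's `voyage` namespace.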

## Pinned versions (per research/01)

| Component | Image | Pinned to |
|-----------|-------|-----------|
| Prometheus | `prom/prometheus` | `v3.0.1` |
| Node Exporter | `prom/node-exporter` | `v1.10.2` |
| Grafana | `grafana/grafana` | `11.4.0` |
| OTel Collector (contrib) | `otel/opentelemetry-collector-contrib` | `0.115.0` |

These are reference versions for the v4.1 release window; bump only after
re-testing the full smoke flow.
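
When bumping a version, it helps to see every pinned image in one place and compare it against the table. A minimal sketch, assuming the Compose file uses one `image:` key per service as this one does; `list_images` is an illustrative helper name.

```bash
#!/usr/bin/env bash
# Sketch: print every image reference in a Compose file, one per line, sorted.
# list_images is an illustrative helper, not part of Voyage.
list_images() {
  local compose_file="${1:-docker-compose.yml}"
  # Strip the leading "image:" key and any indentation, keep the reference.
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$compose_file" | sort
}
```

Running `list_images docker-compose.yml` before and after an edit gives a short diffable list of `repo:tag` pairs.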

## Limitations

- Stop-hook export runs at *normal* session end. If Claude Code exits via
  crash or hard kill, the final metrics for that session are not flushed.
  Use `--resume` on next start to recover plan/progress state; metrics for
  the unflushed session will be missing from Prometheus.
- The `VOYAGE_OTEL_ALLOW_PRIVATE=1` escape hatch enables sending to
  RFC1918 addresses (home-lab use). It is **off by default** so that accidental
  exfiltration to an internal network is blocked. See `docs/observability.md`.
- This stack is for *local development*. Do not expose ports `:9090` /
  `:3000` / `:4318` outside loopback: this stack configures Grafana with
  anonymous Viewer access and admin/admin credentials.
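
The loopback requirement can be audited directly in a Compose file. In the short `ports` syntax, a mapping such as `"9090:9090"` publishes on all host interfaces, while `"127.0.0.1:9090:9090"` restricts it to loopback. The sketch below lists mappings that lack the `127.0.0.1` prefix; `find_non_loopback_ports` is an illustrative helper name, and the pattern only covers the short string syntax, not the long `target`/`published` form.

```bash
#!/usr/bin/env bash
# Sketch: list short-syntax port mappings that are not pinned to loopback.
# find_non_loopback_ports is an illustrative helper, not part of Voyage.
find_non_loopback_ports() {
  local compose_file="${1:-docker-compose.yml}"
  # Keep list items shaped like  - "HOST:CONTAINER", drop those already
  # prefixed with 127.0.0.1, then print just the mapping.
  grep -E '^[[:space:]]*-[[:space:]]*"?[0-9.:]+:[0-9]+"?' "$compose_file" |
    grep -v '127\.0\.0\.1:' |
    sed -E 's/^[[:space:]]*-[[:space:]]*"?([0-9.:]+)"?.*/\1/'
}
```

An empty result from `find_non_loopback_ports docker-compose.yml` means every short-syntax mapping is loopback-only.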

examples/observability/docker-compose.yml

@@ -0,0 +1,71 @@
services:
  # OpenTelemetry Collector: receives OTLP/HTTP push from voyage hooks/scripts/otel-export.mjs
  # and forwards metrics to Prometheus via prometheus exporter (scrape endpoint :8889)
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.115.0
    container_name: voyage-otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro
    ports:  # host side pinned to 127.0.0.1 so every published port stays on loopback
      - "127.0.0.1:4317:4317"  # OTLP/gRPC (not used by voyage, kept for parity)
      - "127.0.0.1:4318:4318"  # OTLP/HTTP: voyage sends here when VOYAGE_EXPORT_MODE=otlp
      - "127.0.0.1:8889:8889"  # Prometheus exporter scrape endpoint
    restart: unless-stopped

  # Node Exporter with textfile collector: reads voyage.prom files written by voyage hooks
  # when VOYAGE_EXPORT_MODE=textfile. Volume mount: ./voyage-textfile/ matches voyage default.
  node-exporter:
    image: prom/node-exporter:v1.10.2
    container_name: voyage-node-exporter
    command:
      - "--path.rootfs=/host"
      - "--collector.textfile.directory=/var/lib/node_exporter/textfile"
      - "--no-collector.arp"
      - "--no-collector.bcache"
    volumes:
      - ./voyage-textfile:/var/lib/node_exporter/textfile:ro
      - /:/host:ro,rslave
    ports:
      - "127.0.0.1:9100:9100"
    restart: unless-stopped

  # Prometheus: scrapes both node-exporter (textfile) and otel-collector (OTLP-derived)
  prometheus:
    image: prom/prometheus:v3.0.1
    container_name: voyage-prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=14d"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "127.0.0.1:9090:9090"
    depends_on:
      - node-exporter
      - otel-collector
    restart: unless-stopped

  # Grafana: preconfigured Prometheus datasource for voyage dashboards
  grafana:
    image: grafana/grafana:11.4.0
    container_name: voyage-grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      - ./grafana-datasource.yml:/etc/grafana/provisioning/datasources/voyage.yml:ro
      - grafana-data:/var/lib/grafana
    ports:
      - "127.0.0.1:3000:3000"
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  prometheus-data:
  grafana-data:

examples/observability/grafana-datasource.yml

@@ -0,0 +1,16 @@
# Grafana datasource provisioning for voyage v4.1 observability stack.
# Auto-loaded from /etc/grafana/provisioning/datasources/ on Grafana start.

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: 15s
      httpMethod: POST
      manageAlerts: false

examples/observability/otel-collector-config.yaml

@@ -0,0 +1,47 @@
# OpenTelemetry Collector config for voyage v4.1
# Receives OTLP/HTTP push from hooks/scripts/otel-export.mjs (port 4318)
# and exposes a Prometheus scrape endpoint at :8889 for the Prometheus
# container to pull voyage metrics.

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
  # Conservative memory limits: voyage emits small payloads, so we cap the
  # collector's own memory use. This does not limit label cardinality in Prometheus.
  memory_limiter:
    check_interval: 5s
    limit_mib: 256
    spike_limit_mib: 64

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: voyage
    send_timestamps: true
    metric_expiration: 5m
    enable_open_metrics: true
  # Debug exporter: logs received metrics to stderr, subject to the sampling
  # settings below. Comment out in production to reduce log volume.
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus, debug]
  telemetry:
    logs:
      level: info

examples/observability/prometheus.yml

@@ -0,0 +1,31 @@
# Prometheus config for voyage v4.1 observability stack.
# Two scrape targets:
#   1. node-exporter: picks up voyage.prom files written by hooks/scripts/otel-export.mjs
#      when VOYAGE_EXPORT_MODE=textfile (default location: ./voyage-textfile/)
#   2. otel-collector: exposes voyage metrics from OTLP push when VOYAGE_EXPORT_MODE=otlp

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: voyage-local

scrape_configs:
  # Path A: voyage textfile mode → node-exporter textfile collector
  - job_name: voyage-textfile
    static_configs:
      - targets: ["node-exporter:9100"]
        labels:
          voyage_export_mode: textfile

  # Path B: voyage OTLP mode → otel-collector prometheus exporter
  - job_name: voyage-otlp
    static_configs:
      - targets: ["otel-collector:8889"]
        labels:
          voyage_export_mode: otlp

  # Self-scrape so Prometheus shows its own up=1 in dashboards.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]