From 0bdfc02e7550bd357fb0d1ff834349f2900eb73a Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen
Date: Sat, 9 May 2026 09:20:54 +0200
Subject: [PATCH] =?UTF-8?q?docs(voyage):=20jsonl=20schema=20audit=20?=
 =?UTF-8?q?=E2=80=94=20field-allowlist=20input=20for=20v4.1=20otel=20expor?=
 =?UTF-8?q?ter?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Step 1 of v4.1-execute (Wave 1, Session 1). Audit all 6 trek*-stats.jsonl
schemas + lib/stats/event-emit.mjs autonomy events +
hooks/scripts/post-bash-stats.mjs PostToolUse Bash records. Produce a
markdown table {schema_id, fields[], writer_path, line_ref, v4.1 additive,
PII} as load-bearing input to Step 11 (field-allowlist) and Step 8 (stats
plumbing).

Special notes:
- command_excerpt from post-bash-stats.mjs is flagged CWE-212 (improper
  cross-boundary removal of sensitive data): the export MUST hard-exclude it
  unless VOYAGE_EXPORT_INCLUDE_COMMAND_EXCERPT=1 is set explicitly (deferred
  to v4.2)
- v4.1 additive fields enumerated per schema: profile, phase_models,
  parallel_agents, external_research_enabled, profile_source
- EXPORT_ALLOWLIST + EXPORT_DENYLIST excerpts at the bottom as a
  pre-definition of the Step 11 inline static consts

Co-Authored-By: Claude Opus 4.7
---
 .../voyage/tests/fixtures/jsonl-schemas.md | 76 +++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 plugins/voyage/tests/fixtures/jsonl-schemas.md

diff --git a/plugins/voyage/tests/fixtures/jsonl-schemas.md b/plugins/voyage/tests/fixtures/jsonl-schemas.md
new file mode 100644
index 0000000..0275466
--- /dev/null
+++ b/plugins/voyage/tests/fixtures/jsonl-schemas.md
@@ -0,0 +1,76 @@
+# Voyage JSONL stats: schema audit (v4.1 input)
+
+> **Purpose:** Field-allowlist input for the v4.1 OTel exporter (Step 11). Lists every
+> field that each voyage stats JSONL writer emits today, plus the additive fields v4.1
+> introduces. Load-bearing for Step 11 (field-allowlist) and Step 8 (stats plumbing).
+>
+> **PII-flag:** `command_excerpt` from `hooks/scripts/post-bash-stats.mjs` slices
+> the first 120 chars of an arbitrary Bash command, which may contain operator paths,
+> branch names, or fragments of secrets that survived the secrets-hook. CWE-212
+> (Improper Cross-boundary Removal of Sensitive Data). The OTel exporter MUST
+> NOT export this field unless the operator explicitly opts in via
+> `VOYAGE_EXPORT_INCLUDE_COMMAND_EXCERPT=1` (deferred to v4.2; v4.1 hard-excludes it).
+>
+> **Additive v4.1 fields:** `profile`, `phase_models`, `parallel_agents`,
+> `external_research_enabled`, `profile_source`. All are forward-compatible: existing
+> v4.0 consumers ignore unknown keys, while v4.1 consumers get a richer signal.
+
+## Field table per JSONL writer
+
+| schema_id | fields | writer_path | line_ref | v4.1 additive | PII |
+|-----------|--------|-------------|----------|---------------|-----|
+| trekbrief-stats | ts, task, slug, mode, interview_turns, review_iterations, brief_quality, research_topics, auto_research, auto_result, project_dir | commands/trekbrief.md (orchestrator-emit Phase 7) | trekbrief.md:657-672 | profile, phase_models, profile_source | none |
+| trekresearch-stats | ts, question, mode, scope, slug, project_dir, brief_path, dimensions, agents_local, agents_external, gemini_used, confidence, contradictions, open_questions | commands/trekresearch.md (orchestrator-emit Stats tracking) | trekresearch.md:388-410 | profile, phase_models, parallel_agents, external_research_enabled, profile_source | none |
+| trekplan-stats | ts, task, mode, slug, brief_path, project_dir, codebase_size, codebase_files, agents_deployed, deep_dives, research_briefs_used, research_scout_used, critic_verdict, guardian_verdict, outcome | commands/trekplan.md (orchestrator-emit Phase 12) | trekplan.md:805-826 | profile, phase_models, parallel_agents, profile_source | none |
+| trekexecute-stats (Phase 9 record) | ts, plan, plan_type, mode, result, steps_total, steps_passed, steps_failed, steps_skipped, failed_at_step | commands/trekexecute.md (orchestrator-emit Phase 9) | trekexecute.md:1479-1494 | profile, phase_models, profile_source | none |
+| trekexecute-stats (autonomy events) | ts, event, known_event, payload | lib/stats/event-emit.mjs `emit()` | event-emit.mjs:64-86 | payload.profile, payload.phase_models, payload.profile_source | none |
+| trekexecute-stats (PostToolUse Bash) | ts, session_id, command_excerpt, duration_ms, success | hooks/scripts/post-bash-stats.mjs (Bash PostToolUse) | post-bash-stats.mjs:42-54 | none (hook is plugin-level, not profile-aware) | command_excerpt (CWE-212) |
+| trekreview-stats | ts, slug, verdict, counts (BLOCKER/MAJOR/MINOR/SUGGESTION), reviewed_files_count, mode, duration_ms | commands/trekreview.md (orchestrator-emit Phase 8) | trekreview.md:255 | profile, phase_models, profile_source | none |
+| trekcontinue-stats | ts, project, next_session_label, status | commands/trekcontinue.md (orchestrator-emit Phase 5) | trekcontinue.md:289 | profile, profile_source | none |
+
+## Field-allowlist input for Step 11
+
+The OTel exporter (Step 11 `lib/exporters/field-allowlist.mjs`) MUST inline the
+following static const arrays (do NOT load them from this file at runtime; Step 11
+explicit constraint: INLINE static const, NOT runtime from tests/fixtures):
+
+**EXPORT_ALLOWLIST** (numeric/bool/short-string fields safe for OTel metric labels):
+
+```
+ts, slug, mode, brief_quality, auto_research, auto_result,
+codebase_size, codebase_files, agents_deployed, deep_dives,
+agents_local, agents_external, gemini_used, dimensions, confidence,
+contradictions, open_questions, interview_turns, review_iterations,
+research_topics, research_briefs_used, research_scout_used,
+critic_verdict, guardian_verdict, outcome, plan_type, result,
+steps_total, steps_passed, steps_failed, steps_skipped, failed_at_step,
+verdict, reviewed_files_count, duration_ms, status, next_session_label,
+event, known_event, success, scope,
+profile, profile_source, parallel_agents, external_research_enabled
+```
+
+**EXPORT_DENYLIST** (PII or high-cardinality, never export):
+
+```
+task, question, project_dir, project, plan, brief_path, command_excerpt, payload, counts, phase_models, session_id
+```
+
+> Notes:
+> - `task` and `question` may contain user-written prose: high-cardinality and a PII risk.
+> - `project_dir` and other paths leak filesystem layout.
+> - `command_excerpt` is excluded per CWE-212 above.
+> - `phase_models` is a structured object (6 keys), too high-cardinality for a label;
+>   the profile name (`profile`) is the safe summary. v4.2 may revisit if operators ask.
+> - `counts` (review BLOCKER/MAJOR/MINOR/SUGGESTION) is a nested object; the Step 11
+>   exporter flattens it to `voyage_review_counts_blocker`/`_major`/`_minor`/`_suggestion`
+>   metrics rather than a label.
+> - `session_id` is a UUID: high-cardinality, not useful as a label, log-only.
+
+## Cross-reference
+
+- Step 8 (stats plumbing): adds `profile` + `phase_models` + `profile_source` to all
+  6 orchestrator-emit sites listed above.
+- Step 11 (field-allowlist): codifies the EXPORT_ALLOWLIST/DENYLIST arrays above
+  as inline static consts in `lib/exporters/field-allowlist.mjs`.
+- Step 9 (Prometheus textfile): emits one metric line per allowlist-numeric field
+  per JSONL writer; PII-flagged fields are dropped at format-layer, not export-layer.
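
Reviewer sketch (not part of the patch): the allowlist and denylist in the fixture could be inlined in Step 11 roughly as follows. Only the two array names and their contents come from the fixture. `filterForExport`, `flattenReviewCounts`, the default-deny handling of unknown keys, and the uppercase severity keys inside `counts` are assumptions made for illustration, not the actual contents of `lib/exporters/field-allowlist.mjs`.

```javascript
// Inline static consts, copied from the fixture rather than read at runtime.
const EXPORT_ALLOWLIST = Object.freeze([
  'ts', 'slug', 'mode', 'brief_quality', 'auto_research', 'auto_result',
  'codebase_size', 'codebase_files', 'agents_deployed', 'deep_dives',
  'agents_local', 'agents_external', 'gemini_used', 'dimensions', 'confidence',
  'contradictions', 'open_questions', 'interview_turns', 'review_iterations',
  'research_topics', 'research_briefs_used', 'research_scout_used',
  'critic_verdict', 'guardian_verdict', 'outcome', 'plan_type', 'result',
  'steps_total', 'steps_passed', 'steps_failed', 'steps_skipped', 'failed_at_step',
  'verdict', 'reviewed_files_count', 'duration_ms', 'status', 'next_session_label',
  'event', 'known_event', 'success', 'scope',
  'profile', 'profile_source', 'parallel_agents', 'external_research_enabled',
]);

const EXPORT_DENYLIST = Object.freeze([
  'task', 'question', 'project_dir', 'project', 'plan', 'brief_path',
  'command_excerpt', 'payload', 'counts', 'phase_models', 'session_id',
]);

const ALLOWED = new Set(EXPORT_ALLOWLIST);
const DENIED = new Set(EXPORT_DENYLIST);

// Hypothetical helper: keep only allowlisted keys. The denylist wins if a key
// ever lands on both lists, and keys on neither list are dropped (default-deny,
// an assumption of this sketch).
function filterForExport(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    if (DENIED.has(key) || !ALLOWED.has(key)) continue;
    out[key] = value;
  }
  return out;
}

// Hypothetical helper: flatten the nested review `counts` object into the
// per-severity metric names mentioned in the fixture notes. Missing severities
// are reported as 0. Uppercase keys in `counts` are assumed.
function flattenReviewCounts(counts) {
  const source = counts ?? {};
  const metrics = {};
  for (const severity of ['BLOCKER', 'MAJOR', 'MINOR', 'SUGGESTION']) {
    metrics[`voyage_review_counts_${severity.toLowerCase()}`] = Number(source[severity] ?? 0);
  }
  return metrics;
}

// Usage with a made-up PostToolUse Bash record.
const sample = {
  ts: '2026-05-09T07:20:54Z',
  session_id: 'hypothetical-session-id',
  command_excerpt: 'git push origin main',
  duration_ms: 1200,
  success: true,
};
console.log(filterForExport(sample));
console.log(flattenReviewCounts({ BLOCKER: 1, MINOR: 3 }));
```

In this sketch `command_excerpt` and `session_id` never reach the exporter, which matches the v4.1 hard-exclude. The `VOYAGE_EXPORT_INCLUDE_COMMAND_EXCERPT=1` opt-in is left out on purpose because the patch defers it to v4.2.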