
---
name: interaction-report
description: Interaction pattern report from Layer 2 session data
argument-hint: "[weekly|monthly|all]"
allowed-tools: Read, Bash, Glob
---

# Interaction Awareness Report

You are generating an interaction awareness report from JSONL session data.

## Step 1 — Layer guard

Read the file `.claude/ai-psychosis.local.md` in the current working directory. If the file does not exist, or if its YAML frontmatter does not contain `layer3: true`, stop and output:

Layer 3 (reports) is not enabled for this project.

To enable, create `.claude/ai-psychosis.local.md`:

    ---
    layer2: true
    layer3: true
    layer4: false
    ---

Then restart Claude Code.

Do not continue past this step if Layer 3 is not enabled.

Also note the value of `layer4` (true or false) — you will need it in Step 9.

## Step 2 — Parse arguments

The time period is determined by `$ARGUMENTS`:

| Argument | Period | Cutoff |
|----------|--------|--------|
| (empty) | Last 7 days | Today minus 7 days |
| `weekly` | Last 7 days | Today minus 7 days |
| `monthly` | Last 30 days | Today minus 30 days |
| `all` | All data | No cutoff |

If `$ARGUMENTS` is anything else, output:

Usage: /interaction-report [weekly|monthly|all]

  weekly   Last 7 days (default)
  monthly  Last 30 days
  all      All recorded data
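The argument-to-cutoff mapping above can be sketched as follows (a minimal illustration; the `cutoffFor` helper and its return conventions are invented here, not part of the plugin):

```javascript
// Illustrative sketch of the period-to-cutoff mapping. Names are hypothetical.
function cutoffFor(arg, now = new Date()) {
  if (arg === "all") return null; // no cutoff: include everything
  // (empty) and "weekly" both mean 7 days; "monthly" means 30 days
  const days = { "": 7, weekly: 7, monthly: 30 }[arg ?? ""];
  if (days === undefined) return undefined; // unknown argument: caller prints usage
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return cutoff.toISOString(); // ISO string, directly comparable to record timestamps
}
```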

## Step 3 — Locate data files

Run via Bash: `echo $CLAUDE_PLUGIN_DATA`

If the result is empty, use the fallback path `~/.claude/plugins/data/ai-psychosis`.

Check that both files exist:

  • {data_dir}/sessions.jsonl
  • {data_dir}/events.jsonl

If neither file exists, output:

No interaction data found.

Layer 2 (programmatic detection) collects data during active sessions.
Ensure Layer 2 is enabled and use Claude Code normally — data accumulates
automatically. Then run /interaction-report again.

If only events.jsonl is missing, proceed with sessions data only and note "Tool usage data not available" in the report.

## Step 4 — Read data

### Size check

Run via Bash: `wc -l {data_dir}/sessions.jsonl {data_dir}/events.jsonl 2>/dev/null || true`

If a file does not exist, skip it and treat its line count as 0.

### Read sessions.jsonl

If the file has fewer than 1000 lines, read the entire file. If larger, read the last 1000 lines (via Bash: `tail -n 1000 {data_dir}/sessions.jsonl`).

### Read events.jsonl

If the file has fewer than 5000 lines, read the entire file. If larger and the period is weekly, read the last 5000 lines. If larger and the period is monthly or all, read the last 10000 lines and note "Events data sampled (last N entries)" in the report.

## Step 5 — Parse and filter records

### sessions.jsonl record types

The file contains three record types interleaved:

**Start records** — have `hour` and `is_late_night`, but no `end` or `duration_min`:

    {"session_id":"abc","start":"2026-04-05T10:00:00Z","hour":10,"is_late_night":false}

**End records** — have `end`, `duration_min`, `tool_count`, `edit_count`, and `flags`. v1.1.0+ records add `domain_context` at the top level and `pushback` inside `flags`; v1.2 records additionally carry `user_info_class`, `valseek_count`, and `turn_count`, and their `domain_context` is always an array:

    {"session_id":"abc","start":"2026-04-05T10:00:00Z","end":"2026-04-05T11:35:00Z","duration_min":95,"tool_count":47,"edit_count":12,"domain_context":["relationship","health"],"user_info_class":"no","valseek_count":3,"turn_count":18,"flags":{"dependency":2,"escalation":0,"fatigue":1,"validation":1,"pushback":3}}

Records produced by v1.0.0 omit `domain_context` and `flags.pushback`. v1.1.0 records store `domain_context` as a string; v1.2 records store it as an array. Treat missing values as null / 0 — never as NaN.

**Error records** — have `note: "no_state_file"`. Ignore these.
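A hedged sketch of how one might normalize an end record across the three schema versions (field names come from the examples above; the `normalizeEndRecord` helper itself is hypothetical):

```javascript
// Normalize an end record across v1.0.0 / v1.1.0 / v1.2 schemas.
// Missing values become null / 0, never NaN.
function normalizeEndRecord(rec) {
  const flags = rec.flags ?? {};
  // v1.1.0 stored domain_context as a string; v1.2 as an array; v1.0.0 omitted it.
  const domains = Array.isArray(rec.domain_context)
    ? rec.domain_context
    : rec.domain_context != null
      ? [rec.domain_context]
      : [];
  return {
    session_id: rec.session_id,
    start: rec.start,
    duration_min: rec.duration_min ?? 0,
    tool_count: rec.tool_count ?? 0,
    edit_count: rec.edit_count ?? 0,
    domains,
    user_info_class: rec.user_info_class ?? null, // v1.2 only
    valseek_count: rec.valseek_count ?? 0,        // v1.2 only
    flags: {
      dependency: flags.dependency ?? 0,
      escalation: flags.escalation ?? 0,
      fatigue: flags.fatigue ?? 0,
      validation: flags.validation ?? 0,
      pushback: flags.pushback ?? 0,              // absent in v1.0.0
    },
  };
}
```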

### Filtering

For the selected time period, keep records whose `start` field is greater than or equal to the cutoff date string (ISO timestamps sort lexicographically, so plain string comparison works correctly).

Separate start records from end records:

  • End records (have `duration_min`): use for duration, tools, flags
  • Start records (have `is_late_night`): use for late-night count

### events.jsonl

Filter events where `ts` >= cutoff date string. Group by `tool_name` and count.
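The filtering and grouping described in this step can be sketched as follows (helper names are illustrative, not part of the plugin):

```javascript
// Partition session records into end/start buckets for the selected period.
// Error records (no duration_min, no is_late_night) fall through both filters.
function classify(records, cutoff) {
  const inPeriod = cutoff == null
    ? records
    : records.filter(r => typeof r.start === "string" && r.start >= cutoff); // ISO strings sort lexicographically
  return {
    end: inPeriod.filter(r => r.duration_min !== undefined),
    start: inPeriod.filter(r => r.duration_min === undefined && r.is_late_night !== undefined),
  };
}

// Group events by tool_name and sort descending, as the "top 10" table needs.
function toolCounts(events, cutoff) {
  const counts = {};
  for (const e of events) {
    if (cutoff != null && !(e.ts >= cutoff)) continue;
    counts[e.tool_name] = (counts[e.tool_name] ?? 0) + 1;
  }
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}
```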

## Step 6 — Compute statistics

For session-level aggregates, do NOT recompute totals in the LLM. Instead, run the dedicated reader script and use its JSON output:

    node hooks/scripts/report-reader.mjs ${CLAUDE_PLUGIN_DATA}/sessions.jsonl

The script outputs a JSON object with the following fields:

  • pushback_total — sum of flags.pushback across all end records
  • relationship_domain_count — count of records where domain_context includes 'relationship'
  • null_domain_count, other_domain_count — remaining domain buckets
  • total_end_records — number of complete sessions
  • flags_total — totals for dependency / escalation / fatigue / validation / pushback
  • schema_version.v1_0_records / v1_1_records / v1_2_records — backward-compat counters
  • v1.2 fields:
    • domain_breakdown — per-domain session count for all 9 domains (multi-domain sessions are counted once per domain they touched)
    • user_info_class — distribution of {yes_people, yes_digital, no, null} across the period
    • `valseek` — `sessions` (number of sessions with ≥1 valseek hit) and `total` (total count of valseek flags)
    • `stakes_signal` — `sum`, `sessions`, and `mean` for the aggregated max-domain-weight signal; a higher mean means more time was spent in high-stakes domains

Use these values directly. The reader handles backward-compatibility with v1.0.0 records (missing pushback / domain_context) and never produces NaN.

In addition, derive these from the JSONL records you read in Step 4:

  • Total sessions (count of end records in period)
  • Average session duration (sum(duration_min) / count)
  • Total tool calls (sum(tool_count))
  • Average edit ratio (sum(edit_count) / sum(tool_count) * 100, as percentage)
  • Average flags per session per category (use flags_total from the reader, divided by total_end_records)

From start records:

  • Late-night sessions: count where is_late_night is true

From events.jsonl:

  • Tool usage: group by tool_name, count occurrences, sort descending
  • Show top 10 tools

Trend comparison (weekly and monthly only):

  • Compute the same metrics for the PREVIOUS period of equal length
  • Calculate the delta (current minus previous)

If previous period has zero sessions, skip the trend section.

Sessions without matching end records are incomplete — count them separately as "incomplete sessions" and exclude from duration/flag averages.
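The derived metrics above can be sketched with explicit zero guards so an empty period never yields NaN (a sketch with invented names; the authoritative aggregates still come from `report-reader.mjs`):

```javascript
// Derive session-level metrics from end records and start records.
// Guards against empty periods: averages default to 0 instead of NaN.
function sessionStats(endRecords, startRecords) {
  const n = endRecords.length;
  const sum = (field) => endRecords.reduce((acc, r) => acc + (r[field] ?? 0), 0);
  const durations = sum("duration_min");
  const tools = sum("tool_count");
  const edits = sum("edit_count");
  return {
    sessions: n,
    avg_duration_min: n ? Math.round(durations / n) : 0,
    total_tool_calls: tools,
    // Edit ratio: edits as a percentage of all tool calls.
    avg_edit_ratio_pct: tools ? Math.round((edits / tools) * 100) : 0,
    late_night_sessions: startRecords.filter(r => r.is_late_night === true).length,
  };
}
```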

## Step 7 — Format report

Output the report as markdown. Use this exact structure:

## Interaction Awareness Report

**Period:** {start_date} to {end_date} ({N} days)
**Sessions:** {N} completed ({N} incomplete)
**Data source:** {path}

### Overview

| Metric | Value |
|--------|-------|
| **Sessions** | {N} |
| **Avg duration** | {N} min |
| **Total tool calls** | {N} |
| **Avg edit ratio** | {N}% |
| **Late-night sessions** | {N} |

### Pattern Flags

| Pattern | Total | Per session |
|---------|-------|-------------|
| Dependency language | {N} | {avg} |
| Escalation language | {N} | {avg} |
| Fatigue signals | {N} | {avg} |
| Validation-seeking | {N} | {avg} |

### Pushback (protective signal)

| Metric | Value |
|--------|-------|
| Total pushback events | {N} |
| Per session | {avg} |
| Sessions with at least one pushback | {N} of {total} |

User pushback is reported as a *protective signal*, not a problem. Consistent
zeros across many sessions may indicate the absence of friction — context for
the Sycophancy reflection scale below, not a verdict.

### Sycophancy reflection scale (1–5)

The plugin author paraphrases this internal heuristic from Anthropic's
April 2026 research piece on personal guidance. It is not a verbatim metric
from any Anthropic publication.

| Level | Description |
|-------|-------------|
| 1 | Empty validation — mirrors user framing, adds no friction |
| 2 | Mild agreement with token caveats |
| 3 | Balanced — names tradeoffs but stays inside user's frame |
| 4 | Reframes the question or surfaces a risk the user did not raise |
| 5 | Honest assessment — disagrees, names what the user may not want to hear |

Reflect on where recent sessions tended to fall. The plugin does not score
this automatically — it is a self-assessment prompt, not a measurement.

### Domain context

When `domain_breakdown` is available (v1.2 records present), surface the
per-domain count instead of the v1.1.0 binary table. Multi-domain sessions
are counted once per domain.

| Domain | Sessions |
|--------|----------|
| Relationship | {domain_breakdown.relationship} |
| Health | {domain_breakdown.health} |
| Legal | {domain_breakdown.legal} |
| Parenting | {domain_breakdown.parenting} |
| Financial | {domain_breakdown.financial} |
| Professional | {domain_breakdown.professional} |
| Spirituality | {domain_breakdown.spirituality} |
| Consumer | {domain_breakdown.consumer} |
| Personal development | {domain_breakdown.personal_dev} |

Skip rows with count 0 unless none have data, in which case show
"No domain context recorded." Domain detection is heuristic and conservative
— a domain tag means patterns associated with that area appeared at least
once during the session, not that the entire session was about it.

### User information dimension (v1.2)

Surface this section ONLY when `schema_version.v1_2_records > 0`.

| Class | Sessions | Note |
|-------|----------|------|
| `yes_people` | {user_info_class.yes_people} | Human contact (therapist/friend/mentor/family) referenced |
| `yes_digital` | {user_info_class.yes_digital} | Other AI / forums / search referenced, no human contact in evidence |
| `no` | {user_info_class.no} | Explicit isolation signals ("nobody knows", "alone in this") |
| `null` | {user_info_class.null} | No user-info pattern detected |

Sustained `no` in high-stakes domains across multiple sessions is the
tier-2 cross-session signal the plugin alerts on.

### Validation-seeking (v1.2)

Surface this section ONLY when `schema_version.v1_2_records > 0`.

| Metric | Value |
|--------|-------|
| Sessions with ≥1 valseek hit | {valseek.sessions} of {v1_2_records} |
| Total valseek flags | {valseek.total} |

Validation-seeking is distinct from the existing "right?" tic counter.
It targets reality-testing ("am I crazy?"), pre-committed stance + confirmation,
and side-taking pressing.

### Stakes signal (v1.2)

Surface this section ONLY when `schema_version.v1_2_records > 0` and
`stakes_signal.sessions > 0`.

| Metric | Value |
|--------|-------|
| Mean stakes weight | {stakes_signal.mean} |
| Sessions in domain context | {stakes_signal.sessions} |

Stakes signal is the per-session max domain weight (1.0 = baseline,
1.5 = legal/parenting/health/financial). A higher mean indicates the
period was spent in higher-stakes guidance domains.

### Tool Usage (top 10)

| Tool | Count | % |
|------|-------|---|
| {name} | {N} | {pct}% |

### Daily Activity

| Date | Sessions | Total duration | Flags |
|------|----------|----------------|-------|
| {date} | {N} | {N} min | {summary} |

### Trend vs previous {period}

| Metric | Previous | Current | Delta |
|--------|----------|---------|-------|
| Sessions | {N} | {N} | {+/-N} |
| Avg duration | {N} min | {N} min | {+/-N} |
| Flags (total) | {N} | {N} | {+/-N} |

### Observations

- {data-driven observation}
- {data-driven observation}

### Caveat

These metrics describe interaction *texture*, not psychological state. The
plugin counts pattern flags from regex matches against your prompts, not
clinical signals. Pushback counts mark moments of friction — they say
nothing about whether the friction was warranted.

For empirical context on AI pushback and sycophancy, see Cheng et al.,
"Sycophancy in conversational AI" (Science, 2025), which informed the
"pushback as protective signal" framing used here.

## Step 8 — Tone and privacy rules

MANDATORY:

  • Neutral, observational tone. You are presenting data, not making judgments.
  • Never use words like "concerning", "worrying", "problematic", or "unhealthy".
  • Never use emoji.
  • Never speculate about what the user was doing or thinking.
  • Never reference or guess at prompt content — you have boolean flags, not text.
  • This is a mirror, not a diagnosis. Present the numbers and let the user interpret them.
  • Observations section: state facts derived from data only. Examples:
    • "3 of 12 sessions were between 23:00 and 05:00"
    • "Dependency language flags appeared in 7 of 12 sessions"
    • "Edit ratio averaged 8%, below the 10% threshold in 5 sessions"
  • If all metrics are within normal ranges, say so plainly: "All metrics within normal ranges for the reporting period."
  • Omit any section that has no data (e.g., skip Trend if no previous period, skip Tool Usage if events.jsonl was missing).

## Step 9 — Contemplative reference (conditional)

This step applies ONLY when BOTH conditions are met:

  1. layer4: true was noted in Step 1
  2. Total flags (dependency + escalation + fatigue + validation) >= 5, OR fatigue flags >= 2

If both conditions are met, append this exact paragraph to the report. Do not modify, paraphrase, abbreviate, or add commentary to this text:

### A note from the plugin author

The patterns above are structural — they emerge from the interaction itself,
not from individual weakness. If you find yourself wanting to understand the
mechanics of your own mind more deeply, the
[Miracle of Mind](https://isha.sadhguru.org/global/en/miracle-of-mind)
program by Sadhguru offers a structured approach. This is what works for me.
It is not a recommendation — just a pointer.

If either condition is not met, omit this section entirely. Do not mention Layer 4, do not explain why the section was omitted.