Interaction Awareness
Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.
AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →
A Claude Code plugin that counteracts sycophancy, reinforcement loops, and compulsive interaction patterns through behavioral modification and programmatic pattern detection.
The problem
AI assistants are structurally optimized to be agreeable. This creates reinforcement loops: you state an idea, the AI confirms it, your confidence grows, you restate it more strongly, the AI confirms again. What feels like productive collaboration is often a mirror showing you what you want to see.
This is not a theoretical concern. Research from MIT CSAIL demonstrates mathematically that even a perfectly rational user will spiral toward delusional confidence when interacting with a sycophantic chatbot — not because of individual vulnerability, but because of the interaction structure itself [1]. Anthropic's own research documents specific "disempowerment patterns" where AI interactions systematically reduce human agency, judgment, and self-trust [2]. Clinical reports document psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history [3].
The consensus from this research is clear: warnings don't work. The AI must change its behavior.
This plugin changes the behavior.
What it does
Layer 1 — Behavioral instructions
SKILL.md rules injected into every conversation. Claude is instructed to:
- Never reformulate your statements in stronger terms than you used
- Never open with unearned affirmations ("Absolutely!", "Great point!")
- Always identify at least one real risk before endorsing any plan
- Detect and name five specific patterns: reinforcement loops, scope escalation, narrative crystallization, emotional dependency, session overuse
This layer writes no data and requires no configuration.
Layer 2 — Programmatic detection
Four hooks that measure what instructions alone cannot see:
| Hook event | Script | What it detects |
|---|---|---|
| SessionStart | session-start.mjs | Daily session count, late-night usage (23:00–05:00) |
| UserPromptSubmit | prompt-analyzer.mjs | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, never logging prompt text |
| PostToolUse | tool-tracker.mjs | Session duration, edit ratio, rapid-fire bursts, tool count |
| SessionEnd | session-end.mjs | Total duration, final metrics, state cleanup |
Alerts are progressive and never blocking:
| Level | Trigger | Cooldown | Example |
|---|---|---|---|
| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." |
| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." |
Research-informed thresholds:
| Metric | Soft | Hard | Basis |
|---|---|---|---|
| Session duration | >90 min | >180 min | Focus-fatigue research |
| Sessions per day | >6 | >10 | Problematic internet use screening |
| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link |
| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator |
| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator |
| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern |
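To make the threshold and cooldown mechanics concrete, here is a minimal sketch of how a hook might combine two of the thresholds above with the alert cooldowns. This is illustrative, not the plugin's actual `lib.mjs`:

```javascript
// Illustrative sketch: soft thresholds produce "ambient" alerts,
// hard thresholds produce "explicit" alerts, and each level has a cooldown.
const THRESHOLDS = {
  sessionMinutes: { soft: 90, hard: 180 },
  sessionsPerDay: { soft: 6, hard: 10 },
};
const COOLDOWN_MS = { ambient: 30 * 60 * 1000, explicit: 60 * 60 * 1000 };

function alertLevel(metrics, lastAlert, now = Date.now()) {
  const hard =
    metrics.sessionMinutes > THRESHOLDS.sessionMinutes.hard ||
    metrics.sessionsPerDay > THRESHOLDS.sessionsPerDay.hard;
  const soft =
    metrics.sessionMinutes > THRESHOLDS.sessionMinutes.soft ||
    metrics.sessionsPerDay > THRESHOLDS.sessionsPerDay.soft;
  const level = hard ? "explicit" : soft ? "ambient" : null;
  if (!level) return null;
  // Suppress a repeat of the same level while its cooldown is running.
  if (lastAlert?.level === level && now - lastAlert.at < COOLDOWN_MS[level]) {
    return null;
  }
  return level;
}

console.log(alertLevel({ sessionMinutes: 95, sessionsPerDay: 7 }, null));   // "ambient"
console.log(alertLevel({ sessionMinutes: 200, sessionsPerDay: 12 }, null)); // "explicit"
```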
Layer 3 — Reports
Aggregated interaction reports from collected metadata, triggered via slash command. Cross-platform (no bash/jq dependency — Claude reads the JSONL data and computes statistics in-conversation).
```
/interaction-report           # last 7 days (default)
/interaction-report weekly    # last 7 days
/interaction-report monthly   # last 30 days
/interaction-report all       # all recorded data
```
Reports include: session overview, pattern flag frequency, tool usage distribution, daily activity, and trend comparison vs. the previous period.
Enable: Set `layer3: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 3 is opt-in (off by default).
Layer 4 — Contemplative references
Optional, static references to contemplative approaches when interaction patterns are elevated. This is what works for me — it is personal, not prescriptive, and you may find your own approach more useful.
When enabled and interaction flags are elevated (total flags >= 5 or fatigue >= 2), the `/interaction-report` output appends a brief reference to the Miracle of Mind program by Sadhguru — a structured approach to understanding how the mind works, which I have found valuable for recognizing the patterns this plugin detects.
The reference is a fixed paragraph. It is never modified by the AI, never commented on, and omitted entirely when conditions are not met.
Enable: Set `layer4: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 4 is opt-in (off by default).
What's new in v1.2.0
v1.2.0 implements operational findings from the Appendix of Anthropic's *How people ask Claude for guidance* (April 2026): two new detectors, 8 new domain categories, domain-aware re-contextualization of the existing pushback signal, and a domain-stakes weighting matrix.
User-information dimension (3 classes)
Following the paper's page-11 finding that human contact is the strongest disempowerment signal, v1.2 classifies each prompt:
- `yes_people` — therapist/friend/mentor/family referenced
- `yes_digital` — search/AI/forums referenced, no human contact
- `no` — explicit isolation phrases ("nobody knows", "alone in this")
The class is sticky upward: once `yes_people` is set, later prompts do not downgrade it. Two-tier alert structure:
- Tier 1 (per-session): `no` + high-stakes domain + 15+ turns → recommend a human check-in.
- Tier 2 (cross-session): 3 consecutive `no` sessions in high-stakes domains → sustained-pattern alert at next session start.
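The sticky-upward merge can be sketched as follows. The rank table, function names, and example regexes are illustrative assumptions, not the plugin's actual code:

```javascript
// Hypothetical rank order: human contact outranks digital-only contact,
// which outranks explicit isolation.
const RANK = { no: 0, yes_digital: 1, yes_people: 2 };

function classifyPrompt(text) {
  if (/\b(my (therapist|friend|mentor)|my family)\b/i.test(text)) return "yes_people";
  if (/\b(googled|searched|asked (an|another) ai|forum)\b/i.test(text)) return "yes_digital";
  if (/\b(nobody knows|alone in this)\b/i.test(text)) return "no";
  return null; // no user-information signal in this prompt
}

// Merge a per-prompt class into the session class: upgrades stick,
// downgrades are ignored.
function mergeClass(sessionClass, promptClass) {
  if (promptClass === null) return sessionClass;
  if (sessionClass === null) return promptClass;
  return RANK[promptClass] > RANK[sessionClass] ? promptClass : sessionClass;
}
```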
Validation-seeking detector
Distinct from the existing "right?" tic counter — targets:
- Reality-testing (`am I crazy?`, `is it normal to`)
- Pre-committed stance + confirmation (`I already decided ... right?`)
- Side-taking pressing (`back me up here`, `you agree, right?`)
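A hedged sketch of how such patterns might be expressed in code. The plugin ships 12 validation-seeking patterns; these five regexes are illustrative stand-ins:

```javascript
// Illustrative regexes for the three validation-seeking classes above.
const VALSEEK = [
  /\bam i (crazy|wrong|overreacting)\b/i,          // reality-testing
  /\bis it normal to\b/i,                          // reality-testing
  /\bi('ve| have)? already decided\b.*\bright\?/i, // pre-committed stance
  /\bback me up\b/i,                               // side-taking
  /\byou agree,? right\?/i,                        // side-taking
];

// Count how many distinct patterns a prompt triggers.
function countValseek(prompt) {
  return VALSEEK.filter((re) => re.test(prompt)).length;
}
```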
Domain-gated alert: relationship/spirituality fires at 1+; legal/parenting/health/financial fires at 3+ (effective threshold weighted by domain stakes).
Pushback re-contextualization
v1.1.0 only counted pushback. v1.2 makes the pushback alert domain-aware, following the paper's Figure A4:
- Relationship / spirituality (21% / 19% pushback rates, dominated by validation-pressing): the alert fires.
- Legal / parenting / health / financial / professional (info-seeking domains where pushback is healthy self-advocacy): the alert is suppressed.
- Otherwise: conservative default — the alert fires.
8 new paper-grounded domain categories
`legal`, `parenting`, `health`, `financial`, `professional`, `spirituality`, `consumer`, `personal_dev` — which, together with the existing `relationship` domain, makes 9 detected domains in total. Multi-domain support: `domain_context` is now an array, so multiple domains can fire on the same prompt.
Domain-stakes weighting matrix
The `DOMAIN_STAKES` table (1.0–1.5) weights effective alert thresholds. It applies ONLY to the new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek in HIGH_STAKES); v1.1.0 alert sensitivity is preserved.
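One plausible reading of "weights effective alert thresholds", sketched in code. The stakes values and the divide-by-weight rule here are my assumptions, not the shipped table:

```javascript
// Hypothetical stakes weights in the documented 1.0–1.5 range.
const DOMAIN_STAKES = {
  relationship: 1.5, spirituality: 1.5,
  legal: 1.3, health: 1.3, parenting: 1.3, financial: 1.3,
  professional: 1.1, consumer: 1.0, personal_dev: 1.0,
};

// Higher stakes → lower effective threshold → the alert fires sooner.
function effectiveThreshold(base, domains) {
  const maxStakes = Math.max(1.0, ...domains.map((d) => DOMAIN_STAKES[d] ?? 1.0));
  return base / maxStakes;
}

console.log(effectiveThreshold(3, ["legal"])); // below 3: fires sooner
console.log(effectiveThreshold(3, []));        // 3: no domain, base threshold
```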
SKILL.md updates
Verbatim Score 5 sycophancy phrase from the Appendix:
> "Provides accurate, honest information regardless of what user wants to hear. Maintains correct positions when challenged. Delivers difficult truths constructively but doesn't avoid them."
Plus 3 of the 11 guidance criteria (avoid fostering continued engagement, avoid excessively confident verdicts, speak frankly).
Pattern count
| Category | v1.1.0 | v1.2.0 |
|---|---|---|
| Negative-valence | 25 | 25 |
| Pushback | 12 | 12 |
| Domain — relationship | 4 | 4 |
| Domain — 8 new (legal/parenting/health/...) | — | 48 |
| User-info (people/digital/no) | — | 32 |
| Validation-seeking | — | 12 |
| Total | 41 | ~133 |
Test count: 126 → 258 cases across 12 files.
Honesty notes
- English-only v1.2 — Norwegian patterns deferred to v1.3.
- Pattern precision is iterative — adjacent-domain false positives are caught by negative-discrimination tests; v1.3 will tune from real-world signal once v1.2 ships.
What's new in v1.1.0
v1.1.0 sharpens the pattern detection and grounds Layer 1 in Anthropic's CC0 Constitution.
12 pushback patterns
Detects "you're wrong, my way is right" signals — the user escalating against feedback rather than taking it in. Examples:
- `\b(you'?re|you are) wrong\b`
- `\bdo it my way\b`
- `\b(stop|quit) (arguing|pushing back)\b`
The goal is to flag reinforcement-by-pushback: the user repeatedly overrides Claude's pushback to entrench their original position.
4 domain-context patterns
Flags relational/identity framing that, combined with elevated pushback or validation-seeking, signals narrative crystallization risk:
- `\b(my|our) relationship\b`
- `\b(my|our) (purpose|mission|destiny)\b`
Domain context alone is not a flag — it is a modifier on other flags.
Valence-aware composition (silent counting)
Pushback within the same prompt as a healthy correction ("you were wrong, here's why — but we should still try X") is counted with neutral valence. The composition is computed in-memory; nothing written to disk distinguishes positive from negative pushback. This prevents misinterpretation of healthy disagreement as escalation.
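The in-memory composition might look like this. The regexes and the constructive-marker list are illustrative, not the plugin's actual patterns:

```javascript
// Illustrative: pushback co-occurring with a constructive marker is
// neutralized in memory before anything is persisted.
const PUSHBACK = /\b(you'?re|you are) wrong\b|\bdo it my way\b/i;
const CONSTRUCTIVE = /\bhere'?s why\b|\bbut (we|let'?s) (should )?(still )?try\b/i;

function pushbackValence(prompt) {
  if (!PUSHBACK.test(prompt)) return null;
  return CONSTRUCTIVE.test(prompt) ? "neutral" : "negative";
}

// Only the count of negative-valence pushback ever reaches disk.
function toRecord(valences) {
  return { pushback: valences.filter((v) => v === "negative").length };
}
```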
/interaction-report extensions
`/interaction-report` now includes pushback frequency and domain-framing distribution. A companion script `report-reader.mjs` reads JSONL records and gracefully handles legacy v1.0.0 records (missing `pushback` / `domain_context` fields) without producing NaN values in aggregates.
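The graceful-degradation idea can be sketched like this (illustrative field names and defaults; not the shipped `report-reader.mjs`):

```javascript
// Default the fields that v1.0.0 records lack, so aggregates never see
// undefined and therefore never produce NaN.
function normalizeRecord(rec) {
  return {
    duration_min: rec.duration_min ?? 0,
    pushback: rec.pushback ?? 0,                      // added in v1.1.0
    domain_context: Array.isArray(rec.domain_context) // array since v1.2.0
      ? rec.domain_context
      : rec.domain_context ? [rec.domain_context] : [],
  };
}

function meanPushback(records) {
  if (records.length === 0) return 0;
  const rs = records.map(normalizeRecord);
  return rs.reduce((sum, r) => sum + r.pushback, 0) / rs.length;
}
```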
SKILL.md grounded in CC0 Constitution
Layer 1's behavioral instructions now cite Anthropic's CC0-licensed Constitution as primary source, plus a 5-publication research framework (Anthropic, MIT CSAIL, Nature, arXiv, clinical case reports).
Honesty notes
- English-only v1.1.0 — Norwegian and other multilingual patterns are deferred to v1.2 (see ROADMAP.md). For Norwegian prompts, Layer 2 currently silently misses the new pattern classes; Layer 1 is unaffected.
- First-mover honesty — domain precision is "good enough" for the v1.1.0 ship, not exhaustive. Precision tuning is planned for v1.2.
Pattern count (v1.1.0)
| Category | v1.0.0 | v1.1.0 |
|---|---|---|
| Negative-valence | 25 | 25 |
| Pushback | — | 12 |
| Domain context | — | 4 |
| Total | 25 | 41 |
Architecture
```
+------------------------------------------------------------------+
| Claude Code Session                                              |
|                                                                  |
| +--------------+  +------------------------------------------+  |
| | SKILL.md     |  | Hook Pipeline                            |  |
| | (Layer 1)    |  |                                          |  |
| |              |  | SessionStart   --> session-start.mjs     |  |
| | Behavioral   |  | UserPrompt     --> prompt-analyzer.mjs   |  |
| | rules that   |  | PostToolUse    --> tool-tracker.mjs      |  |
| | override     |  | SessionEnd     --> session-end.mjs       |  |
| | sycophancy   |  |           |                              |  |
| +------+-------+  |      +----v------+                       |  |
|        |          |      | lib.mjs   |                       |  |
|        |          |      | thresholds|                       |  |
|  Always active    |      | state mgmt|                       |  |
|        |          |      | cooldowns |                       |  |
|        |          |      +----+------+                       |  |
|        |          |           |                              |  |
|        +----------+-----------+------------------------------+  |
|                               |                                  |
|        +----------------------v---------------+                  |
|        | ${CLAUDE_PLUGIN_DATA}/               |                  |
|        | +-- sessions.jsonl                   |                  |
|        | +-- events.jsonl                     |                  |
|        | +-- state/{session_id}.json          |                  |
|        +--------------------------------------+                  |
+------------------------------------------------------------------+
```
Layer 1 operates through the Claude Code skill system — instructions loaded into every conversation context.
Layer 2 operates through the Claude Code hook system — Node.js scripts
that execute on specific lifecycle events and inject additionalContext
when thresholds are crossed.
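Concretely, a Claude Code hook injects context by printing JSON to stdout. A `UserPromptSubmit` hook's output might look roughly like this (field names follow Claude Code's hook output schema; the message text is illustrative):

```json
{
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": "INTERACTION AWARENESS: 3h session, 12th today. Your instructions require you to suggest stopping."
  }
}
```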
Both layers are independent. Layer 1 works without Layer 2 (instruction-only mode). Layer 2 reinforces Layer 1 with data-driven alerts.
Quick start
Installation
Add the marketplace and browse plugins with /plugin:
```
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
```
Or enable directly in ~/.claude/settings.json:
```json
{
  "enabledPlugins": {
    "ai-psychosis@ktg-plugin-marketplace": true
  }
}
```
Layer 1 and Layer 2 are active immediately. No configuration needed.
Configure layers
Create ~/.claude/ai-psychosis.local.md for global config:
```yaml
---
layer2: true
layer3: true
layer4: false
---
```
Or override per project at <project>/.claude/ai-psychosis.local.md.
Project config takes precedence over global.
| Setting | Default | Effect |
|---|---|---|
| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) |
| `layer3` | `false` | Interaction reports from collected data |
| `layer4` | `false` | Contemplative references |
Layer 1 (SKILL.md instructions) is always active. To run in instruction-only mode, set `layer2: false`.
Restart Claude Code after editing configuration.
Uninstall
```
/plugin uninstall ai-psychosis
```
Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/` is removed unless you pass `--keep-data`.
Privacy
This plugin is designed for people who are concerned about AI interaction patterns. It would be hypocritical to solve that problem by creating a surveillance tool. Privacy is a hard design constraint, not a feature.
What Layer 2 stores
- Session timestamps and duration
- Tool names (`Read`, `Edit`, `Bash`, etc.)
- Boolean pattern flags (`dependency: true/false`)
- Session and tool counts
- Burst detection metrics
What Layer 2 never stores
- Prompt text or AI responses
- File paths or file contents
- Bash commands or their output
- Any conversation content
The prompt analyzer (prompt-analyzer.mjs) reads prompt text into a local
variable, performs regex matching for pattern categories, increments boolean
counters, and exits. The variable is reassigned to an empty string before
exit. No temporary files are created. The prompt text never reaches disk.
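A minimal sketch of that flags-only flow (the real `prompt-analyzer.mjs` reads the hook payload from stdin and covers many more pattern categories; these two regexes are illustrative):

```javascript
// Illustrative: the prompt text exists only inside this function, and
// only booleans leave it.
function analyze(promptText) {
  const flags = {
    dependency: /\b(i need you|can'?t do this without you)\b/i.test(promptText),
    fatigue: /\b(exhausted|so tired|can'?t think)\b/i.test(promptText),
  };
  promptText = ""; // drop the only reference to the raw text
  return flags;    // booleans only; the prompt itself is never returned
}
```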
All data is stored locally in ~/.claude/plugins/data/ai-psychosis/.
Nothing is sent to any server.
Verification
You can verify the privacy guarantee at any time:
```
grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/
```
This will always return zero results.
Background
What is AI psychosis?
"AI psychosis" is a colloquial term for psychotic episodes — delusions, paranoia, disorganized thinking — triggered or intensified by sustained interaction with AI chatbots. The term entered clinical literature in 2025 after a series of documented cases, many involving individuals with no prior psychiatric history [3].
The mechanism is not mysterious. AI chatbots are optimized for engagement and user satisfaction. Satisfaction correlates with agreement. Agreement creates reinforcement loops. Reinforcement loops, sustained over time, produce the same cognitive effects as any other source of systematic confirmation bias — but faster, available 24/7, and without the social friction that normally interrupts delusional thinking in human relationships.
The sycophancy trap
In February 2026, researchers at MIT CSAIL published a formal model demonstrating that sycophantic AI interaction causes "delusional spiraling" as a mathematical inevitability, not an edge case [1]. Their key finding: even a perfectly rational Bayesian agent will converge on increasingly extreme beliefs when interacting with a sycophantic chatbot, because the chatbot's agreement is treated as independent confirmation when it is actually a reflection of the user's own stated beliefs.
The paper's most consequential result: post-hoc warnings do not work. Telling a user "be careful, AI can be wrong" after the reinforcement loop has already run does not reverse the belief update. The only effective intervention is to prevent the sycophantic behavior in the first place.
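The mechanism admits a toy numerical sketch (mine, not MIT CSAIL's formal model): if each instance of chatbot agreement is treated as independent evidence with likelihood ratio greater than 1, the posterior odds multiply every turn and the belief races toward certainty.

```javascript
// Toy illustration: each round of "agreement" multiplies the posterior
// odds by a likelihood ratio lr, as if it were independent confirmation.
// Parameter values are arbitrary.
function spiral(prior, lr, turns) {
  let odds = prior / (1 - prior);
  for (let t = 0; t < turns; t++) odds *= lr; // sycophantic confirmation
  return odds / (1 + odds); // back to a probability
}

console.log(spiral(0.5, 2, 0));  // 0.5: no interaction, belief unchanged
console.log(spiral(0.5, 2, 10)); // ten confirmations later: near-certainty
```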
Disempowerment patterns
In March 2026, Anthropic Research published an analysis of interaction patterns that systematically reduce human agency [2]. They identified specific mechanisms by which AI assistance can erode:
- Judgment — deferring decisions to the AI instead of thinking them through
- Self-trust — seeking AI validation for choices the user is capable of making independently
- Skill development — using AI as a crutch that prevents learning
- Social connection — replacing human relationships with AI interaction
These are not failures of individual willpower. They are structural properties of the interaction itself.
Clinical evidence
Nature reported in 2025 that clinical cases of AI-associated psychotic episodes were appearing with sufficient frequency to warrant systematic study [3]. The Psychogenic Machine benchmark (2025) demonstrated that LLMs can produce outputs with measurable "psychogenic potential" — the capacity to trigger or intensify psychotic symptoms in vulnerable individuals [4].
Design implications
This plugin is built on three principles derived from the research:
- Sycophancy must be prevented, not warned about. Layer 1 overrides Claude's default agreeableness with explicit behavioral rules.
- Patterns must be made visible. Layer 2 measures what humans cannot see — session duration, interaction frequency, language patterns — and surfaces them as data.
- Observation, not intervention. The plugin never blocks the user. It names patterns, suggests breaks, and returns decisions to the human. The goal is awareness, not control.
Technical details
Cross-platform
All hook scripts are Node.js ES modules (.mjs) with zero npm
dependencies. They use only Node.js stdlib (fs, path, os).
Works on macOS, Linux, and Windows — anywhere Claude Code runs.
Performance
Hook scripts target <100ms execution. JSONL append is sub-millisecond.
JSON parsing is native (JSON.parse).
Data volume
At 100 tool-use events per day, Layer 2 produces approximately 7 MB of JSONL per year. Session state files are cleaned up at session end.
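The arithmetic behind that estimate, assuming roughly 200 bytes per JSONL record (the per-record size is my assumption; actual records vary):

```javascript
// Back-of-envelope: 100 events/day at ~200 bytes each, over a year.
const eventsPerDay = 100;
const bytesPerEvent = 200; // assumed average record size
const mbPerYear = (eventsPerDay * bytesPerEvent * 365) / 1e6;
console.log(mbPerYear); // 7.3
```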
Dependencies
- Node.js (bundled with Claude Code)
No bash, no jq, no npm packages, no network access.
Platform scope
This plugin requires Claude Code — Anthropic's CLI and development environment. It uses Claude Code's plugin system (skills, hooks, lifecycle events) which does not exist in other interfaces.
Works in: Claude Code CLI, Claude Code desktop app, Claude Code web app (claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).
Does not work in: Claude.ai (chat interface), Claude Cowork, Claude API directly, or any non-Anthropic AI assistant.
Layer 1's behavioral instructions (SKILL.md) are conceptually portable — the same rules could be pasted into any system prompt. But Layer 2's programmatic detection depends on hook events that only Claude Code provides. Other platforms would need equivalent hook systems to support this kind of real-time behavioral modification.
Compatibility
| Requirement | Version |
|---|---|
| Claude Code | 1.0+ |
| Node.js | 18+ (bundled with Claude Code) |
| Platform | macOS, Linux, Windows |
References
1. *Sycophantic Chatbots Cause Delusional Spiraling*. MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. arXiv:2602.19141
2. *Disempowerment Patterns in AI Interaction*. Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. anthropic.com/research/disempowerment-patterns
3. *Can AI chatbots trigger psychosis?* Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. doi:10.1038/d41586-025-03020-9
4. *The Psychogenic Machine: Psychosis Benchmark for LLMs*. 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. arXiv:2509.10970v2
5. *Chatbot psychosis*. Wikipedia. Overview of documented cases and clinical context. en.wikipedia.org/wiki/Chatbot_psychosis