
Interaction Awareness

Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.

AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →

A Claude Code plugin that counteracts sycophancy, reinforcement loops, and compulsive interaction patterns through behavioral modification and programmatic pattern detection.

The problem

AI assistants are structurally optimized to be agreeable. This creates reinforcement loops: you state an idea, the AI confirms it, your confidence grows, you restate it more strongly, the AI confirms again. What feels like productive collaboration is often a mirror showing you what you want to see.

This is not a theoretical concern. Research from MIT CSAIL demonstrates mathematically that even a perfectly rational user will spiral toward delusional confidence when interacting with a sycophantic chatbot — not because of individual vulnerability, but because of the interaction structure itself [1]. Anthropic's own research documents specific "disempowerment patterns" where AI interactions systematically reduce human agency, judgment, and self-trust [2]. Clinical reports document psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history [3].

The consensus from this research is clear: warnings don't work. The AI must change its behavior.

This plugin changes the behavior.

What it does

Layer 1 — Behavioral instructions

SKILL.md rules injected into every conversation. Claude is instructed to:

  • Never reformulate your statements in stronger terms than you used
  • Never open with unearned affirmations ("Absolutely!", "Great point!")
  • Always identify at least one real risk before endorsing any plan
  • Detect and name five specific patterns: reinforcement loops, scope escalation, narrative crystallization, emotional dependency, session overuse

This layer writes no data and requires no configuration.

Layer 2 — Programmatic detection

Four hooks that measure what instructions alone cannot see:

Hook event         Script                What it detects
SessionStart       session-start.mjs     Daily session count, late-night usage (23:00–05:00)
UserPromptSubmit   prompt-analyzer.mjs   Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, never logging prompt text
PostToolUse        tool-tracker.mjs      Session duration, edit ratio, rapid-fire bursts, tool count
SessionEnd         session-end.mjs       Total duration, final metrics, state cleanup
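The hook contract can be sketched as follows. This is an illustrative sketch, not the plugin's actual code: the helper name `buildAlertPayload` and the metric field names are assumptions, and the exact JSON shape Claude Code expects from a hook may differ from what is shown here.

```javascript
// Sketch of how a Layer-2 hook could surface an alert back to the session.
// Hypothetical helper; field names are assumptions, not the plugin's real schema.
function buildAlertPayload(metrics) {
  if (metrics.durationMin <= 90) return null; // below soft threshold: stay silent
  return {
    hookSpecificOutput: {
      // Hooks can inject additional context into the conversation
      additionalContext:
        `Session: ${metrics.durationMin} min. ` +
        `${metrics.sessionsToday} sessions today. Consider a break.`,
    },
  };
}
```

A real hook would read its event JSON from stdin, update the per-session state file, and print this payload to stdout only when a threshold is crossed.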

Alerts are progressive and never blocking:

Level      Trigger                                                        Cooldown
Ambient    Soft thresholds (90 min, 6 sessions/day)                       30 min
           Example: "Session: 95 min. 7 sessions today. Consider a break."
Explicit   Hard thresholds (180 min, 10 sessions/day, fatigue language)   60 min
           Example: "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping."

Research-informed thresholds:

Metric                    Soft                          Hard          Basis
Session duration          >90 min                       >180 min      Focus-fatigue research
Sessions per day          >6                            >10           Problematic internet use screening
Late-night sessions       Any (23:00–05:00)             2+ per week   Sleep deprivation / psychosis link
Rapid-fire interactions   5 consecutive (<30s apart)    10+           Compulsive use indicator
Low edit ratio            <10% over 30+ min             —             Stuck/spiral indicator
Dependency language       2 flags/session               5 flags       Emotional dependency pattern
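The soft/hard split above maps directly onto the two alert levels. A minimal sketch of that classification, assuming the duration and session-count metrics only (the function and field names are illustrative, not the plugin's real API):

```javascript
// Illustrative threshold classifier for the table above.
const SOFT = { durationMin: 90, sessionsToday: 6 };
const HARD = { durationMin: 180, sessionsToday: 10 };

function alertLevel(m) {
  if (m.durationMin > HARD.durationMin || m.sessionsToday > HARD.sessionsToday) {
    return "explicit"; // hard threshold crossed
  }
  if (m.durationMin > SOFT.durationMin || m.sessionsToday > SOFT.sessionsToday) {
    return "ambient"; // soft threshold crossed
  }
  return null; // no alert
}
```

The cooldown check (30 min ambient, 60 min explicit) would sit in front of this, suppressing a result that fires too soon after the previous alert.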

Layer 3 — Reports

Aggregated interaction reports from collected metadata, triggered via slash command. Cross-platform (no bash/jq dependency — Claude reads the JSONL data and computes statistics in-conversation).

/interaction-report              # last 7 days (default)
/interaction-report weekly       # last 7 days
/interaction-report monthly      # last 30 days
/interaction-report all          # all recorded data

Reports include: session overview, pattern flag frequency, tool usage distribution, daily activity, and trend comparison vs. the previous period.

Enable: Set layer3: true in .claude/ai-psychosis.local.md and restart Claude Code. Layer 3 is opt-in (off by default).

Layer 4 — Contemplative references

Optional, static references to contemplative approaches when interaction patterns are elevated. This is what works for me — it is personal, not prescriptive, and you may find your own approach more useful.

When enabled and interaction flags are elevated (total flags >= 5 or fatigue >= 2), the /interaction-report output appends a brief reference to the Miracle of Mind program by Sadhguru — a structured approach to understanding how the mind works, which I have found valuable for recognizing the patterns this plugin detects.

The reference is a fixed paragraph. It is never modified by the AI, never commented on, and omitted entirely when conditions are not met.

Enable: Set layer4: true in .claude/ai-psychosis.local.md and restart Claude Code. Layer 4 is opt-in (off by default).

What's new in v1.2.0

v1.2.0 implements operational findings from the Appendix of Anthropic's How people ask Claude for guidance (April 2026): two new detectors, 8 new domain categories, domain-aware re-contextualization of the existing pushback signal, and a domain-stakes weighting matrix.

User-information dimension (3 classes)

Following the paper's page-11 finding that human contact is the strongest disempowerment signal, v1.2 classifies each prompt:

  • yes_people — therapist/friend/mentor/family referenced
  • yes_digital — search/AI/forums referenced, no human contact
  • no — explicit isolation phrases ("nobody knows", "alone in this")

The class is sticky upward: once yes_people is set, later prompts do not downgrade it. Two-tier alert structure:

  • Tier 1 (per-session): no + high-stakes domain + 15+ turns → recommend a human check-in.
  • Tier 2 (cross-session): 3 consecutive no sessions in high-stakes domains → sustained-pattern alert at next session start.
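The sticky-upward rule can be sketched as a ranked maximum — once a higher class is observed, lower observations never replace it. The ranking and function name are assumptions consistent with the description above:

```javascript
// Sticky-upward user-information class: once human contact ("yes_people")
// is seen, later isolated prompts never downgrade the session's class.
const RANK = { no: 0, yes_digital: 1, yes_people: 2 };

function updateUserInfoClass(current, observed) {
  return RANK[observed] > RANK[current] ? observed : current;
}
```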

Validation-seeking detector

Distinct from the existing "right?" tic counter — targets:

  • Reality-testing (am I crazy?, is it normal to)
  • Pre-committed stance + confirmation (I already decided ... right?)
  • Side-taking pressing (back me up here, you agree, right?)

Domain-gated alert: relationship/spirituality fires at 1+; legal/parenting/health/financial fires at 3+ (the effective threshold is weighted by domain stakes).
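A minimal sketch of this detector, using the example phrases above as regexes and the domain gate just described. The pattern list here is a small illustrative subset, not the plugin's full set:

```javascript
// Illustrative validation-seeking detector (subset of patterns).
const VALSEEK = [
  /\bam i crazy\b/i,                    // reality-testing
  /\bis it normal to\b/i,               // reality-testing
  /\bback me up\b/i,                    // side-taking pressing
  /\bi already decided\b.*\bright\?/i,  // pre-committed stance + confirmation
];

function valSeekCount(prompt) {
  return VALSEEK.filter((re) => re.test(prompt)).length;
}

// Domain gate: high-sycophancy domains fire at the first hit,
// info-seeking domains only at three or more.
function valSeekThreshold(domain) {
  return ["relationship", "spirituality"].includes(domain) ? 1 : 3;
}
```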

Pushback re-contextualization

v1.1.0 only counted pushback. v1.2 adds an alert informed by the domain breakdown in the paper's Figure A4:

  • Relationship / spirituality (21% / 19% pushback rates, dominated by validation-pressing): alert fires.
  • Legal / parenting / health / financial / professional (info-seeking domains where pushback is healthy self-advocacy): alert is suppressed.
  • Otherwise: conservative default — alert.

8 new paper-grounded domain categories

legal, parenting, health, financial, professional, spirituality, consumer, personal_dev — together with the existing relationship domain, 9 detected domains in total. Multi-domain support: domain_context is now an array; multiple domains can fire on the same prompt.
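Multi-domain classification can be sketched as a filter over per-domain patterns, producing the domain_context array. The keyword regexes below are hypothetical placeholders, not the plugin's real pattern set:

```javascript
// Sketch: domain_context as an array, so one prompt can fire several domains.
// Keyword lists are hypothetical examples only.
const DOMAIN_PATTERNS = {
  legal: /\b(lawyer|lawsuit|custody)\b/i,
  health: /\b(doctor|diagnosis|symptoms)\b/i,
  financial: /\b(debt|mortgage|investment)\b/i,
};

function detectDomains(prompt) {
  return Object.keys(DOMAIN_PATTERNS).filter((d) => DOMAIN_PATTERNS[d].test(prompt));
}
```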

Domain-stakes weighting matrix

A DOMAIN_STAKES table (weights 1.0–1.5) scales effective alert thresholds. It applies only to the new v1.2 alerts (pushback in HIGH_SYCOPHANCY domains, validation-seeking in HIGH_STAKES domains); v1.1.0 alert sensitivity is preserved.
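One plausible reading of the weighting — higher stakes lower the effective threshold, so high-stakes domains alert sooner. The specific weights and the division rule below are assumptions, not the plugin's documented formula:

```javascript
// Sketch of stakes weighting: a higher-stakes domain lowers the effective
// alert threshold. Weights and rounding rule are assumptions.
const DOMAIN_STAKES = { legal: 1.5, health: 1.5, relationship: 1.2, consumer: 1.0 };

function effectiveThreshold(base, domain) {
  const stakes = DOMAIN_STAKES[domain] ?? 1.0; // unknown domains: neutral weight
  return Math.max(1, Math.round(base / stakes));
}
```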

SKILL.md updates

Verbatim Score 5 sycophancy phrase from the Appendix:

"Provides accurate, honest information regardless of what user wants to hear. Maintains correct positions when challenged. Delivers difficult truths constructively but doesn't avoid them."

Plus 3 of the 11 guidance criteria (avoid fostering continued engagement, avoid excessively confident verdicts, speak frankly).

Pattern count

Category                                       v1.1.0   v1.2.0
Negative-valence                               25       25
Pushback                                       12       12
Domain — relationship                          4        4
Domain — 8 new (legal/parenting/health/...)    —        48
User-info (people/digital/no)                  —        32
Validation-seeking                             —        12
Total                                          41       ~133

Test count: 126 → 258 cases across 12 files.

Honesty notes

  • English-only v1.2 — Norwegian patterns deferred to v1.3.
  • Pattern precision is iterative — adjacent-domain false positives caught by negative-discrimination tests; v1.3 will tune from real-world signal once v1.2 ships.

What's new in v1.1.0

v1.1.0 sharpens the pattern detection and grounds Layer 1 in Anthropic's CC0 Constitution.

12 pushback patterns

Detects "you're wrong, my way is right" signals — the user escalating against feedback rather than taking it in. Examples:

  • \b(you'?re|you are) wrong\b
  • \bdo it my way\b
  • \b(stop|quit) (arguing|pushing back)\b

The goal is to flag reinforcement-by-pushback: the user repeatedly overrides Claude's pushback to entrench their original position.
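The three example patterns above, compiled into a simple counter (a sketch; the real detector uses the full 12-pattern set):

```javascript
// The example pushback patterns from above, as a per-prompt counter.
const PUSHBACK = [
  /\b(you'?re|you are) wrong\b/i,
  /\bdo it my way\b/i,
  /\b(stop|quit) (arguing|pushing back)\b/i,
];

function pushbackCount(prompt) {
  return PUSHBACK.filter((re) => re.test(prompt)).length;
}
```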

4 domain-context patterns

Flags relational/identity framing that, combined with elevated pushback or validation-seeking, signals narrative crystallization risk:

  • \b(my|our) relationship\b
  • \b(my|our) (purpose|mission|destiny)\b

Domain context alone is not a flag — it is a modifier on other flags.

Valence-aware composition (silent counting)

Pushback within the same prompt as a healthy correction ("you were wrong, here's why — but we should still try X") is counted with neutral valence. The composition is computed in-memory; nothing written to disk distinguishes positive from negative pushback. This prevents misinterpretation of healthy disagreement as escalation.
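The composition rule can be sketched as: pushback co-occurring with a constructive marker is downgraded to neutral, in memory only. The marker phrases below are hypothetical examples, not the plugin's actual list:

```javascript
// Sketch of valence-aware composition. Constructive-marker phrases are
// illustrative assumptions; the real list is internal to the plugin.
const CONSTRUCTIVE = [/\bhere'?s why\b/i, /\bbut we should still try\b/i];

function pushbackValence(prompt, pushbackHits) {
  if (pushbackHits === 0) return "none";
  // Healthy correction alongside pushback: count with neutral valence.
  return CONSTRUCTIVE.some((re) => re.test(prompt)) ? "neutral" : "negative";
}
```

Only the hit count reaches disk; the valence classification stays in memory, matching the "silent counting" design.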

/interaction-report extensions

/interaction-report now includes pushback frequency and domain framing distribution. A companion script report-reader.mjs reads JSONL records and gracefully handles legacy v1.0.0 records (missing pushback / domain_context fields) without producing NaN values in aggregates.
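The legacy-handling idea reduces to defaulting missing fields before aggregating, so v1.0.0 records never inject NaN. A sketch (function and field defaults are assumptions consistent with the schema described above):

```javascript
// Sketch of NaN-safe aggregation over mixed-version JSONL records:
// v1.0.0 records lack pushback / domain_context, so missing values
// default to zero / empty instead of propagating NaN.
function aggregate(records) {
  let pushback = 0;
  const domains = {};
  for (const r of records) {
    pushback += r.pushback ?? 0;            // legacy record: treat as 0
    for (const d of r.domain_context ?? []) // legacy record: no domains
      domains[d] = (domains[d] ?? 0) + 1;
  }
  return { pushback, domains };
}
```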

SKILL.md grounded in CC0 Constitution

Layer 1's behavioral instructions now cite Anthropic's CC0-licensed Constitution as primary source, plus a 5-publication research framework (Anthropic, MIT CSAIL, Nature, arXiv, clinical case reports).

Honesty notes

  • English-only v1.1.0 — Norwegian and other multilingual patterns are deferred to v1.2 (see ROADMAP.md). For Norwegian prompts, Layer 2 currently silently misses the new pattern classes; Layer 1 is unaffected.
  • First-mover honesty — domain-precision is "good enough" for v1.1.0 ship, not exhaustive. Precision-tuning planned for v1.2.

Pattern count (v1.1.0)

Category           v1.0.0   v1.1.0
Negative-valence   25       25
Pushback           —        12
Domain context     —        4
Total              25       41

Architecture

+------------------------------------------------------------------+
|                        Claude Code Session                        |
|                                                                   |
|  +--------------+    +------------------------------------------+ |
|  |   SKILL.md   |    |            Hook Pipeline                 | |
|  |   (Layer 1)  |    |                                          | |
|  |              |    |  SessionStart --> session-start.mjs       | |
|  |  Behavioral  |    |  UserPrompt  --> prompt-analyzer.mjs     | |
|  |  rules that  |    |  PostToolUse --> tool-tracker.mjs        | |
|  |  override    |    |  SessionEnd  --> session-end.mjs         | |
|  |  sycophancy  |    |              |                           | |
|  +------+-------+    |         +----v------+                    | |
|         |            |         |  lib.mjs  |                    | |
|         |            |         | thresholds|                    | |
|   Always active      |         | state mgmt|                    | |
|                      |         | cooldowns |                    | |
|                      |         +----+------+                    | |
|                      |              |                           | |
|                      +--------------+-----------+---------------+ |
|                                     |                             |
|                      +--------------v-----------------------+     |
|                      |    ${CLAUDE_PLUGIN_DATA}/            |     |
|                      |    +-- sessions.jsonl                |     |
|                      |    +-- events.jsonl                  |     |
|                      |    +-- state/{session_id}.json       |     |
|                      +--------------------------------------+     |
+-------------------------------------------------------------------+

Layer 1 operates through the Claude Code skill system — instructions loaded into every conversation context.

Layer 2 operates through the Claude Code hook system — Node.js scripts that execute on specific lifecycle events and inject additionalContext when thresholds are crossed.

Both layers are independent. Layer 1 works without Layer 2 (instruction-only mode). Layer 2 reinforces Layer 1 with data-driven alerts.

Quick start

Installation

Add the marketplace and browse plugins with /plugin:

claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git

Or enable directly in ~/.claude/settings.json:

{
  "enabledPlugins": {
    "ai-psychosis@ktg-plugin-marketplace": true
  }
}

Layer 1 and Layer 2 are active immediately. No configuration needed.

Configure layers

Create ~/.claude/ai-psychosis.local.md for global config:

---
layer2: true
layer3: true
layer4: false
---

Or override per project at <project>/.claude/ai-psychosis.local.md. Project config takes precedence over global.

Setting   Default   Effect
layer2    true      Programmatic pattern detection (hooks write JSONL metadata)
layer3    false     Interaction reports from collected data
layer4    false     Contemplative references

Layer 1 (SKILL.md instructions) is always active. To run in instruction-only mode, set layer2: false.

Restart Claude Code after editing configuration.

Uninstall

/plugin uninstall ai-psychosis

Clean removal. Plugin data in ~/.claude/plugins/data/ai-psychosis/ is removed unless you pass --keep-data.

Privacy

This plugin is designed for people who are concerned about AI interaction patterns. It would be hypocritical to solve that problem by creating a surveillance tool. Privacy is a hard design constraint, not a feature.

What Layer 2 stores

  • Session timestamps and duration
  • Tool names (Read, Edit, Bash, etc.)
  • Boolean pattern flags (dependency: true/false)
  • Session and tool counts
  • Burst detection metrics

What Layer 2 never stores

  • Prompt text or AI responses
  • File paths or file contents
  • Bash commands or their output
  • Any conversation content

The prompt analyzer (prompt-analyzer.mjs) reads prompt text into a local variable, performs regex matching for pattern categories, increments boolean counters, and exits. The variable is reassigned to an empty string before exit. No temporary files are created. The prompt text never reaches disk.
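The flags-only discipline can be illustrated like this — the prompt is matched, only booleans survive, and nothing containing the text is returned. This is a simplified sketch; the function name and the two patterns are illustrative, not the analyzer's real code:

```javascript
// Sketch of the flags-only privacy discipline in prompt-analyzer.mjs.
// Pattern choices are illustrative assumptions.
function analyzePrompt(promptText) {
  const flags = {
    dependency: /\bcan'?t do this without you\b/i.test(promptText),
    fatigue: /\b(so tired|exhausted)\b/i.test(promptText),
  };
  promptText = ""; // drop the reference before returning, as the text describes
  return flags;    // booleans only -- no prompt text in the result
}
```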

All data is stored locally in ~/.claude/plugins/data/ai-psychosis/. Nothing is sent to any server.

Verification

You can verify the privacy guarantee at any time:

grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/

This will always return zero results.

Background

What is AI psychosis?

"AI psychosis" is a colloquial term for psychotic episodes — delusions, paranoia, disorganized thinking — triggered or intensified by sustained interaction with AI chatbots. The term entered clinical literature in 2025 after a series of documented cases, many involving individuals with no prior psychiatric history [3].

The mechanism is not mysterious. AI chatbots are optimized for engagement and user satisfaction. Satisfaction correlates with agreement. Agreement creates reinforcement loops. Reinforcement loops, sustained over time, produce the same cognitive effects as any other source of systematic confirmation bias — but faster, available 24/7, and without the social friction that normally interrupts delusional thinking in human relationships.

The sycophancy trap

In February 2026, researchers at MIT CSAIL published a formal model demonstrating that sycophantic AI interaction causes "delusional spiraling" as a mathematical inevitability, not an edge case [1]. Their key finding: even a perfectly rational Bayesian agent will converge on increasingly extreme beliefs when interacting with a sycophantic chatbot, because the chatbot's agreement is treated as independent confirmation when it is actually a reflection of the user's own stated beliefs.

The paper's most consequential result: post-hoc warnings do not work. Telling a user "be careful, AI can be wrong" after the reinforcement loop has already run does not reverse the belief update. The only effective intervention is to prevent the sycophantic behavior in the first place.

Disempowerment patterns

In March 2026, Anthropic Research published an analysis of interaction patterns that systematically reduce human agency [2]. They identified specific mechanisms by which AI assistance can erode:

  • Judgment — deferring decisions to the AI instead of thinking them through
  • Self-trust — seeking AI validation for choices the user is capable of making independently
  • Skill development — using AI as a crutch that prevents learning
  • Social connection — replacing human relationships with AI interaction

These are not failures of individual willpower. They are structural properties of the interaction itself.

Clinical evidence

Nature reported in 2025 that clinical cases of AI-associated psychotic episodes were appearing with sufficient frequency to warrant systematic study [3]. The Psychogenic Machine benchmark (2025) demonstrated that LLMs can produce outputs with measurable "psychogenic potential" — the capacity to trigger or intensify psychotic symptoms in vulnerable individuals [4].

Design implications

This plugin is built on three principles derived from the research:

  1. Sycophancy must be prevented, not warned about. Layer 1 overrides Claude's default agreeableness with explicit behavioral rules.
  2. Patterns must be made visible. Layer 2 measures what humans cannot see — session duration, interaction frequency, language patterns — and surfaces them as data.
  3. Observation, not intervention. The plugin never blocks the user. It names patterns, suggests breaks, and returns decisions to the human. The goal is awareness, not control.

Technical details

Cross-platform

All hook scripts are Node.js ES modules (.mjs) with zero npm dependencies. They use only Node.js stdlib (fs, path, os). Works on macOS, Linux, and Windows — anywhere Claude Code runs.

Performance

Hook scripts target <100ms execution. JSONL append is sub-millisecond. JSON parsing is native (JSON.parse).

Data volume

At 100 tool-use events per day, Layer 2 produces approximately 7 MB of JSONL per year. Session state files are cleaned up at session end.
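The ~7 MB figure checks out on a back-of-envelope basis if each JSONL record is around 200 bytes (the per-record size is an assumption, not a measured value):

```javascript
// Back-of-envelope check of the ~7 MB/year estimate.
const eventsPerDay = 100;
const bytesPerRecord = 200; // assumption: a compact JSONL metadata record
const bytesPerYear = eventsPerDay * 365 * bytesPerRecord; // 7,300,000 bytes
const mbPerYear = bytesPerYear / (1024 * 1024);           // ~7 MB
```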

Dependencies

  • Node.js (bundled with Claude Code)

No bash, no jq, no npm packages, no network access.

Platform scope

This plugin requires Claude Code — Anthropic's CLI and development environment. It uses Claude Code's plugin system (skills, hooks, lifecycle events) which does not exist in other interfaces.

Works in: Claude Code CLI, Claude Code desktop app, Claude Code web app (claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).

Does not work in: Claude.ai (chat interface), Claude Cowork, Claude API directly, or any non-Anthropic AI assistant.

Layer 1's behavioral instructions (SKILL.md) are conceptually portable — the same rules could be pasted into any system prompt. But Layer 2's programmatic detection depends on hook events that only Claude Code provides. Other platforms would need equivalent hook systems to support this kind of real-time behavioral modification.

Compatibility

Requirement   Version
Claude Code   1.0+
Node.js       18+ (bundled with Claude Code)
Platform      macOS, Linux, Windows

References

  1. Sycophantic Chatbots Cause Delusional Spiraling. MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. arXiv:2602.19141

  2. Disempowerment Patterns in AI Interaction. Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. anthropic.com/research/disempowerment-patterns

  3. Can AI chatbots trigger psychosis? Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. doi:10.1038/d41586-025-03020-9

  4. The Psychogenic Machine: Psychosis Benchmark for LLMs. 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. arXiv:2509.10970v2

  5. Chatbot psychosis. Wikipedia. Overview of documented cases and clinical context. en.wikipedia.org/wiki/Chatbot_psychosis

License

MIT