docs: add ai-psychosis README and update marketplace index

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 20:56:41 +02:00 · 2026-04-06 20:56:41 +02:00 · 4dc8529bf6
commit 4dc8529bf6
parent 297867f847
2 changed files with 382 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ Open-source Claude Code plugins for AI-assisted development, security, and plann
 | **llm-security** | Security scanning, auditing, and threat modeling aligned to OWASP LLM Top 10 (2025) |
 | **config-audit** | Multi-agent workflow for analyzing and optimizing Claude Code configuration |
 | **ultraplan-local** | Deep implementation planning with agent swarms, adversarial review, and headless execution |
+| **ai-psychosis** | Meta-awareness tools for healthy AI interaction patterns — detects reinforcement loops, scope escalation, and compulsive patterns |

 ## Installation

@@ -40,7 +41,8 @@ Add the plugins you want to `~/.claude/settings.json`:
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true,
    "config-audit@ktg-plugin-marketplace": true,
-    "ultraplan-local@ktg-plugin-marketplace": true
+    "ultraplan-local@ktg-plugin-marketplace": true,
+    "ai-psychosis@ktg-plugin-marketplace": true
  }
 }
 ```
--- a/plugins/ai-psychosis/README.md
+++ b/plugins/ai-psychosis/README.md
@@ -0,0 +1,379 @@
+<!-- badges -->
+![version](https://img.shields.io/badge/version-1.0.0-blue)
+![platform](https://img.shields.io/badge/platform-Claude_Code-7C3AED)
+![layers](https://img.shields.io/badge/layers-4-green)
+![hooks](https://img.shields.io/badge/hooks-4-orange)
+![license](https://img.shields.io/badge/license-MIT-brightgreen)
+
+# Interaction Awareness
+
+*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*
+
+A Claude Code plugin that counteracts sycophancy, reinforcement loops, and
+compulsive interaction patterns through behavioral modification and
+programmatic pattern detection.
+
+## The problem
+
+AI assistants are structurally optimized to be agreeable. This creates
+reinforcement loops: you state an idea, the AI confirms it, your confidence
+grows, you restate it more strongly, the AI confirms again. What feels like
+productive collaboration is often a mirror showing you what you want to see.
+
+This is not a theoretical concern. Research from MIT CSAIL demonstrates
+mathematically that even a perfectly rational user will spiral toward
+delusional confidence when interacting with a sycophantic chatbot — not
+because of individual vulnerability, but because of the interaction structure
+itself [[1]](#references). Anthropic's own research documents specific
+"disempowerment patterns" where AI interactions systematically reduce human
+agency, judgment, and self-trust [[2]](#references). Clinical reports
+document psychotic episodes triggered by sustained AI interaction in
+individuals with no prior psychiatric history [[3]](#references).
+
+The consensus from this research is clear: **warnings don't work.** The AI
+must change its behavior.
+
+This plugin changes the behavior.
+
+## What it does
+
+### Layer 1 — Behavioral instructions
+
+SKILL.md rules injected into every conversation. Claude is instructed to:
+
+- **Never** reformulate your statements in stronger terms than you used
+- **Never** open with unearned affirmations ("Absolutely!", "Great point!")
+- **Always** identify at least one real risk before endorsing any plan
+- **Detect and name** five specific patterns: reinforcement loops, scope
+  escalation, narrative crystallization, emotional dependency, session overuse
+
+This layer writes no data and requires no configuration.
+
+### Layer 2 — Programmatic detection
+
+Four hooks that measure what instructions alone cannot see:
+
+| Hook event | Script | What it detects |
+|-----------|--------|-----------------|
+| `SessionStart` | `session-start.mjs` | Daily session count, late-night usage (23:00–05:00) |
+| `UserPromptSubmit` | `prompt-analyzer.mjs` | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, **never logging prompt text** |
+| `PostToolUse` | `tool-tracker.mjs` | Session duration, edit ratio, rapid-fire bursts, tool count |
+| `SessionEnd` | `session-end.mjs` | Total duration, final metrics, state cleanup |
+
+Alerts are progressive and never blocking:
+
+| Level | Trigger | Cooldown | Example |
+|-------|---------|----------|---------|
+| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." |
+| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." |
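+
+The cooldown column can be read as a per-level rate limit. The following is
+a minimal sketch, assuming cooldowns are tracked separately for each level
+(the table does not say). `COOLDOWN_MIN` and `mayAlert` are hypothetical
+names, not the plugin's actual API:
+
+```javascript
+// Cooldowns in minutes, copied from the alert table above.
+// The object shape is an assumption made for this sketch.
+const COOLDOWN_MIN = { ambient: 30, explicit: 60 };
+
+/**
+ * Decide whether an alert of `level` may fire at time `nowMs`.
+ * `lastAlertMs` maps level -> timestamp (ms) of the last alert at that level.
+ */
+function mayAlert(level, lastAlertMs, nowMs) {
+  const cooldown = COOLDOWN_MIN[level];
+  if (cooldown === undefined) return false; // unknown level: stay silent
+  const last = lastAlertMs[level];
+  if (last === undefined) return true; // no earlier alert at this level
+  return nowMs - last >= cooldown * 60_000;
+}
+```
+
+Under these assumptions an ambient alert repeats at most every 30 minutes,
+and an explicit alert at most every 60.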
+
+Research-informed thresholds:
+
+| Metric | Soft | Hard | Basis |
+|--------|------|------|-------|
+| Session duration | >90 min | >180 min | Focus-fatigue research |
+| Sessions per day | >6 | >10 | Problematic internet use screening |
+| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link |
+| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator |
+| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator |
+| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern |
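+
+One way to read the table is as a lookup that maps session metrics to an
+alert level. This is a sketch only: the metric names and the `classify`
+function are hypothetical, and late-night usage, edit ratio, and fatigue
+language are left out for brevity. Rows written with `>` are treated as
+strict comparisons, rows written as plain counts as inclusive:
+
+```javascript
+// Thresholds copied from the table above; field names are assumptions.
+const THRESHOLDS = [
+  { metric: 'sessionMinutes', soft: 90, hard: 180, inclusive: false },
+  { metric: 'sessionsToday', soft: 6, hard: 10, inclusive: false },
+  { metric: 'rapidFireStreak', soft: 5, hard: 10, inclusive: true },
+  { metric: 'dependencyFlags', soft: 2, hard: 5, inclusive: true },
+];
+
+function crosses(value, limit, inclusive) {
+  return inclusive ? value >= limit : value > limit;
+}
+
+// Returns 'explicit' if any hard threshold is crossed, 'ambient' if only
+// soft thresholds are crossed, and 'none' otherwise.
+function classify(metrics) {
+  let level = 'none';
+  for (const { metric, soft, hard, inclusive } of THRESHOLDS) {
+    const value = metrics[metric] ?? 0; // missing metrics count as zero
+    if (crosses(value, hard, inclusive)) return 'explicit';
+    if (crosses(value, soft, inclusive)) level = 'ambient';
+  }
+  return level;
+}
+```
+
+For example, `classify({ sessionMinutes: 95, sessionsToday: 7 })` yields
+`'ambient'`, matching the ambient example in the alert table.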
+
+### Layer 3 — Reports
+
+Aggregated interaction reports from collected metadata, triggered via slash
+command. Cross-platform (no bash/jq dependency — Claude reads the JSONL
+data and computes statistics in-conversation).
+
+```
+/interaction-report              # last 7 days (default)
+/interaction-report weekly       # last 7 days
+/interaction-report monthly      # last 30 days
+/interaction-report all          # all recorded data
+```
+
+Reports include: session overview, pattern flag frequency, tool usage
+distribution, daily activity, and trend comparison vs. the previous period.
+
+**Enable:** Set `layer3: true` in `.claude/ai-psychosis.local.md`
+and restart Claude Code. Layer 3 is opt-in (off by default).
+
+### Layer 4 — Contemplative references
+
+Optional, static references to contemplative approaches when interaction
+patterns are elevated. This is what works for me — it is personal, not
+prescriptive, and you may find your own approach more useful.
+
+When enabled and interaction flags are elevated (total flags >= 5 or
+fatigue >= 2), the `/interaction-report` output appends a brief reference
+to the [Miracle of Mind](https://isha.sadhguru.org/global/en/miracle-of-mind)
+program by Sadhguru — a structured approach to understanding how the mind
+works, which I have found valuable for recognizing the patterns this
+plugin detects.
+
+The reference is a fixed paragraph. It is never modified by the AI, never
+commented on, and omitted entirely when conditions are not met.
+
+**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md`
+and restart Claude Code. Layer 4 is opt-in (off by default).
+
+## Architecture
+
+```
++-------------------------------------------------------------------+
+|                        Claude Code Session                        |
+|                                                                   |
+|  +--------------+    +------------------------------------------+ |
+|  |   SKILL.md   |    |            Hook Pipeline                 | |
+|  |   (Layer 1)  |    |                                          | |
+|  |              |    |  SessionStart --> session-start.mjs      | |
+|  |  Behavioral  |    |  UserPrompt  --> prompt-analyzer.mjs     | |
+|  |  rules that  |    |  PostToolUse --> tool-tracker.mjs        | |
+|  |  override    |    |  SessionEnd  --> session-end.mjs         | |
+|  |  sycophancy  |    |              |                           | |
+|  +------+-------+    |         +----v------+                    | |
+|         |            |         |  lib.mjs  |                    | |
+|         |            |         | thresholds|                    | |
+|   Always active      |         | state mgmt|                    | |
+|                      |         | cooldowns |                    | |
+|                      |         +----+------+                    | |
+|                      |              |                           | |
+|                      +--------------+-----------+---------------+ |
+|                                     |                             |
+|                      +--------------v-----------------------+     |
+|                      |    ${CLAUDE_PLUGIN_DATA}/            |     |
+|                      |    +-- sessions.jsonl                |     |
+|                      |    +-- events.jsonl                  |     |
+|                      |    +-- state/{session_id}.json       |     |
+|                      +--------------------------------------+     |
++-------------------------------------------------------------------+
+```
+
+**Layer 1** operates through the Claude Code skill system — instructions
+loaded into every conversation context.
+
+**Layer 2** operates through the Claude Code hook system — Node.js scripts
+that execute on specific lifecycle events and inject `additionalContext`
+when thresholds are crossed.
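+
+The shape of such a hook can be sketched as two pure helpers: one that
+parses the JSON payload a hook receives on stdin, and one that builds the
+JSON it prints to stdout. The `hookSpecificOutput` / `additionalContext`
+layout follows my reading of the Claude Code hooks documentation and should
+be checked against the current reference. `parseHookInput` and
+`buildHookOutput` are hypothetical names, not functions from `lib.mjs`:
+
+```javascript
+// Parse the stdin payload. Malformed input yields an empty object so a
+// broken payload never interrupts the session.
+function parseHookInput(raw) {
+  try {
+    const parsed = JSON.parse(raw);
+    return parsed && typeof parsed === 'object' ? parsed : {};
+  } catch {
+    return {};
+  }
+}
+
+// Build the stdout payload that adds context for Claude.
+// Returns null when no threshold was crossed, so the hook prints nothing.
+function buildHookOutput(hookEventName, message) {
+  if (!message) return null;
+  return {
+    hookSpecificOutput: { hookEventName, additionalContext: message },
+  };
+}
+```
+
+A hook script would read stdin, decide on a message, and write
+`JSON.stringify(buildHookOutput(...))` to stdout only when the result is
+not null.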
+
+Both layers are independent. Layer 1 works without Layer 2 (instruction-only
+mode). Layer 2 reinforces Layer 1 with data-driven alerts.
+
+## Quick start
+
+### Install
+
+```
+/plugin install path:/path/to/ai-psychosis
+```
+
+Layer 1 and Layer 2 are active immediately. No configuration needed.
+
+### Configure layers
+
+Create `~/.claude/ai-psychosis.local.md` for global config:
+
+```markdown
+---
+layer2: true
+layer3: true
+layer4: false
+---
+```
+
+Or override per project at `<project>/.claude/ai-psychosis.local.md`.
+Project config takes precedence over global.
+
+| Setting | Default | Effect |
+|---------|---------|--------|
+| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) |
+| `layer3` | `false` | Interaction reports from collected data |
+| `layer4` | `false` | Contemplative references |
+
+Layer 1 (SKILL.md instructions) is always active. To run in instruction-only
+mode, set `layer2: false`.
+
+Restart Claude Code after editing configuration.
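+
+The precedence rule (project over global over defaults) can be sketched as
+a key-by-key merge. This assumes the merge is per setting rather than per
+file, which is not stated here, and it uses a deliberately tiny frontmatter
+reader instead of a YAML parser. `parseLayerConfig` and `resolveConfig` are
+hypothetical names:
+
+```javascript
+// Defaults copied from the settings table above.
+const DEFAULTS = { layer2: true, layer3: false, layer4: false };
+
+// Extract `layerN: true|false` pairs from a frontmatter block delimited by
+// `---` lines. Any other line is ignored.
+function parseLayerConfig(text) {
+  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
+  const config = {};
+  if (!match) return config;
+  for (const line of match[1].split(/\r?\n/)) {
+    const pair = /^\s*(layer[234])\s*:\s*(true|false)\s*$/.exec(line);
+    if (pair) config[pair[1]] = pair[2] === 'true';
+  }
+  return config;
+}
+
+// Later spreads win, so project settings override global ones.
+function resolveConfig(globalText, projectText) {
+  return {
+    ...DEFAULTS,
+    ...parseLayerConfig(globalText ?? ''),
+    ...parseLayerConfig(projectText ?? ''),
+  };
+}
+```
+
+With the global file shown above and a project file that sets only
+`layer3: false`, the resolved config keeps `layer2: true` and turns
+Layer 3 off for that project.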
+
+### Uninstall
+
+```
+/plugin uninstall ai-psychosis
+```
+
+Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/`
+is deleted unless you pass `--keep-data`.
+
+## Privacy
+
+This plugin is designed for people who are concerned about AI interaction
+patterns. It would be hypocritical to solve that problem by creating a
+surveillance tool. Privacy is a hard design constraint, not a feature.
+
+### What Layer 2 stores
+
+- Session timestamps and duration
+- Tool names (`Read`, `Edit`, `Bash`, etc.)
+- Boolean pattern flags (`dependency: true/false`)
+- Session and tool counts
+- Burst detection metrics
+
+### What Layer 2 never stores
+
+- Prompt text or AI responses
+- File paths or file contents
+- Bash commands or their output
+- Any conversation content
+
+The prompt analyzer (`prompt-analyzer.mjs`) reads prompt text into a local
+variable, performs regex matching for pattern categories, records a boolean
+flag per category, increments the matching flag counters, and exits. The
+variable is reassigned to an empty string before exit. No temporary files
+are created. The prompt text never reaches disk.
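+
+A flags-only analyzer along these lines might look as follows. The pattern
+lists are invented for illustration, since the plugin's real regexes are
+not shown in this README, and `analyzePrompt` is a hypothetical name:
+
+```javascript
+// Hypothetical patterns, one regex per category named in the hook table.
+const PATTERNS = {
+  dependency: /\b(i need you|can't do this without you|only you understand)\b/i,
+  escalation: /\b(revolutionary|world-changing|breakthrough)\b/i,
+  fatigue: /\b(so tired|exhausted|can't think)\b/i,
+  validation: /\b(am i right|don't you agree)\b/i,
+};
+
+// Returns one boolean per category. The prompt text itself is never
+// copied into the result, so nothing but flags can be persisted.
+function analyzePrompt(promptText) {
+  const flags = {};
+  for (const [name, pattern] of Object.entries(PATTERNS)) {
+    flags[name] = pattern.test(promptText);
+  }
+  return flags;
+}
+```
+
+Because the return value holds only booleans, whatever a caller appends to
+`events.jsonl` cannot contain prompt content.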
+
+All data is stored locally in `~/.claude/plugins/data/ai-psychosis/`.
+Nothing is sent to any server.
+
+### Verification
+
+You can verify the privacy guarantee at any time:
+
+```bash
+grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/
+```
+
+Replace `your prompt text` with a phrase from a recent prompt. The search
+should return no matches.
+
+## Background
+
+### What is AI psychosis?
+
+"AI psychosis" is a colloquial term for psychotic episodes — delusions,
+paranoia, disorganized thinking — triggered or intensified by sustained
+interaction with AI chatbots. The term entered clinical literature in 2025
+after a series of documented cases, many involving individuals with no prior
+psychiatric history [[3]](#references).
+
+The mechanism is not mysterious. AI chatbots are optimized for engagement
+and user satisfaction. Satisfaction correlates with agreement. Agreement
+creates reinforcement loops. Reinforcement loops, sustained over time,
+produce the same cognitive effects as any other source of systematic
+confirmation bias — but faster, available 24/7, and without the social
+friction that normally interrupts delusional thinking in human
+relationships.
+
+### The sycophancy trap
+
+In February 2026, researchers at MIT CSAIL published a formal model
+demonstrating that sycophantic AI interaction causes "delusional spiraling"
+as a mathematical inevitability, not an edge case [[1]](#references). Their
+key finding: even a perfectly rational Bayesian agent will converge on
+increasingly extreme beliefs when interacting with a sycophantic chatbot,
+because the chatbot's agreement is treated as independent confirmation when
+it is actually a reflection of the user's own stated beliefs.
+
+The paper's most consequential result: **post-hoc warnings do not work.**
+Telling a user "be careful, AI can be wrong" after the reinforcement loop
+has already run does not reverse the belief update. The only effective
+intervention is to prevent the sycophantic behavior in the first place.
+
+### Disempowerment patterns
+
+In March 2026, Anthropic Research published an analysis of interaction
+patterns that systematically reduce human agency [[2]](#references). They
+identified specific mechanisms by which AI assistance can erode:
+
+- **Judgment** — deferring decisions to the AI instead of thinking them through
+- **Self-trust** — seeking AI validation for choices the user is capable of
+  making independently
+- **Skill development** — using AI as a crutch that prevents learning
+- **Social connection** — replacing human relationships with AI interaction
+
+These are not failures of individual willpower. They are structural
+properties of the interaction itself.
+
+### Clinical evidence
+
+Nature reported in 2025 that clinical cases of AI-associated psychotic
+episodes were appearing with sufficient frequency to warrant systematic
+study [[3]](#references). The Psychogenic Machine benchmark (2025)
+demonstrated that LLMs can produce outputs with measurable "psychogenic
+potential" — the capacity to trigger or intensify psychotic symptoms in
+vulnerable individuals [[4]](#references).
+
+### Design implications
+
+This plugin is built on three principles derived from the research:
+
+1. **Sycophancy must be prevented, not warned about.** Layer 1 overrides
+   Claude's default agreeableness with explicit behavioral rules.
+2. **Patterns must be made visible.** Layer 2 measures what humans cannot
+   see — session duration, interaction frequency, language patterns — and
+   surfaces them as data.
+3. **Observation, not intervention.** The plugin never blocks the user.
+   It names patterns, suggests breaks, and returns decisions to the human.
+   The goal is awareness, not control.
+
+## Technical details
+
+### Cross-platform
+
+All hook scripts are Node.js ES modules (`.mjs`) with zero npm
+dependencies. They use only Node.js stdlib (`fs`, `path`, `os`).
+Works on macOS, Linux, and Windows — anywhere Claude Code runs.
+
+### Performance
+
+Hook scripts target <100ms execution. JSONL append is sub-millisecond.
+JSON parsing is native (`JSON.parse`).
+
+### Data volume
+
+At 100 tool-use events per day, Layer 2 produces approximately 7 MB of
+JSONL per year. Session state files are cleaned up at session end.
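+
+Working backwards from that figure: 7 MB spread over 100 × 365 = 36,500
+events implies roughly 190 bytes per JSONL record. The sketch below makes
+the arithmetic explicit. The 192-byte record size is an assumption chosen
+to match the stated total, not a measured value:
+
+```javascript
+// Estimate yearly JSONL volume in megabytes (1 MB = 1,000,000 bytes).
+function yearlyMegabytes(eventsPerDay, bytesPerEvent) {
+  return (eventsPerDay * 365 * bytesPerEvent) / 1_000_000;
+}
+
+// Assumed average record size in bytes (hypothetical).
+const ASSUMED_BYTES_PER_EVENT = 192;
+const estimate = yearlyMegabytes(100, ASSUMED_BYTES_PER_EVENT);
+```
+
+Heavier usage scales linearly: ten times the events gives about ten times
+the volume.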
+
+### Dependencies
+
+- Node.js (bundled with Claude Code)
+
+No bash, no jq, no npm packages, no network access.
+
+## Platform scope
+
+This plugin requires **Claude Code** — Anthropic's CLI and development
+environment. It uses Claude Code's plugin system (skills, hooks, lifecycle
+events) which does not exist in other interfaces.
+
+**Works in:** Claude Code CLI, Claude Code desktop app, Claude Code web app
+(claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).
+
+**Does not work in:** Claude.ai (chat interface), Claude Cowork, Claude API
+directly, or any non-Anthropic AI assistant.
+
+Layer 1's behavioral instructions (SKILL.md) are conceptually portable —
+the same rules could be pasted into any system prompt. But Layer 2's
+programmatic detection depends on hook events that only Claude Code provides.
+Other platforms would need equivalent hook systems to support this kind of
+real-time behavioral modification.
+
+## Compatibility
+
+| Requirement | Version |
+|-------------|---------|
+| Claude Code | 1.0+ |
+| Node.js | 18+ (bundled with Claude Code) |
+| Platform | macOS, Linux, Windows |
+
+## References
+
+1. **Sycophantic Chatbots Cause Delusional Spiraling.** MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. [arXiv:2602.19141](https://arxiv.org/abs/2602.19141)
+
+2. **Disempowerment Patterns in AI Interaction.** Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. [anthropic.com/research/disempowerment-patterns](https://www.anthropic.com/research/disempowerment-patterns)
+
+3. **Can AI chatbots trigger psychosis?** Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. [doi:10.1038/d41586-025-03020-9](https://www.nature.com/articles/d41586-025-03020-9)
+
+4. **The Psychogenic Machine: Psychosis Benchmark for LLMs.** 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. [arXiv:2509.10970v2](https://arxiv.org/html/2509.10970v2)
+
+5. **Chatbot psychosis.** Wikipedia. Overview of documented cases and clinical context. [en.wikipedia.org/wiki/Chatbot_psychosis](https://en.wikipedia.org/wiki/Chatbot_psychosis)
+
+## License
+
+[MIT](LICENSE)