diff --git a/README.md b/README.md index 5acff14..4246276 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,7 @@ Open-source Claude Code plugins for AI-assisted development, security, and plann | **llm-security** | Security scanning, auditing, and threat modeling aligned to OWASP LLM Top 10 (2025) | | **config-audit** | Multi-agent workflow for analyzing and optimizing Claude Code configuration | | **ultraplan-local** | Deep implementation planning with agent swarms, adversarial review, and headless execution | +| **ai-psychosis** | Meta-awareness tools for healthy AI interaction patterns — detects reinforcement loops, scope escalation, and compulsive patterns | ## Installation @@ -40,7 +41,8 @@ Add the plugins you want to `~/.claude/settings.json`: "enabledPlugins": { "llm-security@ktg-plugin-marketplace": true, "config-audit@ktg-plugin-marketplace": true, - "ultraplan-local@ktg-plugin-marketplace": true + "ultraplan-local@ktg-plugin-marketplace": true, + "ai-psychosis@ktg-plugin-marketplace": true } } ``` diff --git a/plugins/ai-psychosis/README.md b/plugins/ai-psychosis/README.md new file mode 100644 index 0000000..05f13eb --- /dev/null +++ b/plugins/ai-psychosis/README.md @@ -0,0 +1,379 @@ + +![version](https://img.shields.io/badge/version-1.0.0-blue) +![platform](https://img.shields.io/badge/platform-Claude_Code-7C3AED) +![layers](https://img.shields.io/badge/layers-4-green) +![hooks](https://img.shields.io/badge/hooks-4-orange) +![license](https://img.shields.io/badge/license-MIT-brightgreen) + +# Interaction Awareness + +*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.* + +A Claude Code plugin that counteracts sycophancy, reinforcement loops, and +compulsive interaction patterns through behavioral modification and +programmatic pattern detection. 
+ +## The problem + +AI assistants are structurally optimized to be agreeable. This creates +reinforcement loops: you state an idea, the AI confirms it, your confidence +grows, you restate it more strongly, the AI confirms again. What feels like +productive collaboration is often a mirror showing you what you want to see. + +This is not a theoretical concern. Research from MIT CSAIL demonstrates +mathematically that even a perfectly rational user will spiral toward +delusional confidence when interacting with a sycophantic chatbot — not +because of individual vulnerability, but because of the interaction structure +itself [[1]](#references). Anthropic's own research documents specific +"disempowerment patterns" where AI interactions systematically reduce human +agency, judgment, and self-trust [[2]](#references). Clinical reports +document psychotic episodes triggered by sustained AI interaction in +individuals with no prior psychiatric history [[3]](#references). + +The consensus from this research is clear: **warnings don't work.** The AI +must change its behavior. + +This plugin changes the behavior. + +## What it does + +### Layer 1 — Behavioral instructions + +SKILL.md rules injected into every conversation. Claude is instructed to: + +- **Never** reformulate your statements in stronger terms than you used +- **Never** open with unearned affirmations ("Absolutely!", "Great point!") +- **Always** identify at least one real risk before endorsing any plan +- **Detect and name** five specific patterns: reinforcement loops, scope + escalation, narrative crystallization, emotional dependency, session overuse + +This layer writes no data and requires no configuration. 
+ +### Layer 2 — Programmatic detection + +Four hooks that measure what instructions alone cannot see: + +| Hook event | Script | What it detects | +|-----------|--------|-----------------| +| `SessionStart` | `session-start.mjs` | Daily session count, late-night usage (23:00–05:00) | +| `UserPromptSubmit` | `prompt-analyzer.mjs` | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, **never logging prompt text** | +| `PostToolUse` | `tool-tracker.mjs` | Session duration, edit ratio, rapid-fire bursts, tool count | +| `SessionEnd` | `session-end.mjs` | Total duration, final metrics, state cleanup | + +Alerts are progressive and never blocking: + +| Level | Trigger | Cooldown | Example | +|-------|---------|----------|---------| +| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." | +| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." | + +Research-informed thresholds: + +| Metric | Soft | Hard | Basis | +|--------|------|------|-------| +| Session duration | >90 min | >180 min | Focus-fatigue research | +| Sessions per day | >6 | >10 | Problematic internet use screening | +| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link | +| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator | +| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator | +| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern | + +### Layer 3 — Reports + +Aggregated interaction reports from collected metadata, triggered via slash +command. Cross-platform (no bash/jq dependency — Claude reads the JSONL +data and computes statistics in-conversation). 
+ +``` +/interaction-report # last 7 days (default) +/interaction-report weekly # last 7 days +/interaction-report monthly # last 30 days +/interaction-report all # all recorded data +``` + +Reports include: session overview, pattern flag frequency, tool usage +distribution, daily activity, and trend comparison vs. the previous period. + +**Enable:** Set `layer3: true` in `.claude/ai-psychosis.local.md` +and restart Claude Code. Layer 3 is opt-in (off by default). + +### Layer 4 — Contemplative references + +Optional, static references to contemplative approaches when interaction +patterns are elevated. This is what works for me — it is personal, not +prescriptive, and you may find your own approach more useful. + +When enabled and interaction flags are elevated (total flags >= 5 or +fatigue >= 2), the `/interaction-report` output appends a brief reference +to the [Miracle of Mind](https://isha.sadhguru.org/global/en/miracle-of-mind) +program by Sadhguru — a structured approach to understanding how the mind +works, which I have found valuable for recognizing the patterns this +plugin detects. + +The reference is a fixed paragraph. It is never modified by the AI, never +commented on, and omitted entirely when conditions are not met. + +**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md` +and restart Claude Code. Layer 4 is opt-in (off by default). 
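
The Layer 2 thresholds and the Layer 4 condition described in this section can be read as a few small pure functions. The following is a minimal sketch only, not the plugin's actual `lib.mjs`: the names `THRESHOLDS`, `classify`, `alertLevel`, `longestBurst`, and `shouldAppendReference` are hypothetical, and only the numeric values come from the tables and text above.

```javascript
// Hypothetical sketch: names and structure are assumptions, not the plugin's lib.mjs.
// The numeric values are the ones listed in the Layer 2 threshold table.
const THRESHOLDS = {
  durationMin: { soft: 90, hard: 180 },  // session duration in minutes
  sessionsPerDay: { soft: 6, hard: 10 }, // sessions started today
};

// Classify a single metric. Both comparisons are strict (">"), as in the table.
function classify(value, { soft, hard }) {
  if (value > hard) return "hard";
  if (value > soft) return "soft";
  return "none";
}

// Map a session snapshot to an alert level, following the alert table:
// any hard threshold or fatigue language gives "explicit", any soft one gives "ambient".
function alertLevel({ durationMin, sessionsToday, fatigueFlag = false }) {
  const levels = [
    classify(durationMin, THRESHOLDS.durationMin),
    classify(sessionsToday, THRESHOLDS.sessionsPerDay),
  ];
  if (fatigueFlag || levels.includes("hard")) return "explicit";
  if (levels.includes("soft")) return "ambient";
  return "none";
}

// Length of the longest run of interactions that each follow the previous one
// by less than gapMs. Assumption: a "burst" counts interactions, not gaps.
function longestBurst(timestampsMs, gapMs = 30_000) {
  let longest = 0;
  let run = 0;
  for (let i = 0; i < timestampsMs.length; i++) {
    const closeToPrevious = i > 0 && timestampsMs[i] - timestampsMs[i - 1] < gapMs;
    run = closeToPrevious ? run + 1 : 1;
    longest = Math.max(longest, run);
  }
  return longest;
}

// Layer 4 condition as stated above: total flags >= 5 or fatigue >= 2.
function shouldAppendReference({ totalFlags, fatigue }) {
  return totalFlags >= 5 || fatigue >= 2;
}
```

With these definitions, the ambient example from the alert table (95 minutes, 7 sessions) classifies as `"ambient"`, and the explicit example (3 hours, 12th session) classifies as `"explicit"` because the session count exceeds the hard threshold. Cooldowns and the late-night, edit-ratio, and dependency metrics are left out to keep the sketch short.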
+ +## Architecture + +``` ++------------------------------------------------------------------+ +| Claude Code Session | +| | +| +--------------+ +------------------------------------------+ | +| | SKILL.md | | Hook Pipeline | | +| | (Layer 1) | | | | +| | | | SessionStart --> session-start.mjs | | +| | Behavioral | | UserPrompt --> prompt-analyzer.mjs | | +| | rules that | | PostToolUse --> tool-tracker.mjs | | +| | override | | SessionEnd --> session-end.mjs | | +| | sycophancy | | | | | +| +------+-------+ | +----v------+ | | +| | | | lib.mjs | | | +| | | | thresholds| | | +| Always active | | state mgmt| | | +| | | cooldowns | | | +| | +----+------+ | | +| | | | | +| +--------------+-----------+---------------+ | +| | | +| +--------------v-----------------------+ | +| | ${CLAUDE_PLUGIN_DATA}/ | | +| | +-- sessions.jsonl | | +| | +-- events.jsonl | | +| | +-- state/{session_id}.json | | +| +--------------------------------------+ | ++------------------------------------------------------------------+ +``` + +**Layer 1** operates through the Claude Code skill system: instructions +loaded into every conversation context. + +**Layer 2** operates through the Claude Code hook system: Node.js scripts +that execute on specific lifecycle events and inject `additionalContext` +when thresholds are crossed. + +Both layers are independent. Layer 1 works without Layer 2 (instruction-only +mode). Layer 2 reinforces Layer 1 with data-driven alerts. + +## Quick start + +### Install + +``` +/plugin install path:/path/to/ai-psychosis +``` + +Layer 1 and Layer 2 are active immediately. No configuration needed. + +### Configure layers + +Create `~/.claude/ai-psychosis.local.md` for global config: + +```markdown +--- +layer2: true +layer3: true +layer4: false +--- +``` + +Or override per project with `.claude/ai-psychosis.local.md` in the project root. +Project config takes precedence over global. 
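
One way to picture the precedence rule is as a merge in which later sources win: defaults, then global config, then project config. The sketch below assumes the front matter contains only simple `key: true|false` lines, as in the example above. The function names `parseLayerConfig` and `resolveConfig` are hypothetical and are not taken from the plugin's source, and the parser is deliberately not a full YAML parser.

```javascript
// Defaults as documented in this README's settings table.
const DEFAULTS = { layer2: true, layer3: false, layer4: false };

// Extract `layerN: true|false` pairs from a leading `---` front-matter block.
// Simplified on purpose: unknown keys and non-boolean values are ignored.
function parseLayerConfig(text) {
  const config = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!match) return config;
  for (const line of match[1].split(/\r?\n/)) {
    const pair = /^\s*(layer[234])\s*:\s*(true|false)\s*$/.exec(line);
    if (pair) config[pair[1]] = pair[2] === "true";
  }
  return config;
}

// Later sources win: defaults < global config < project config.
function resolveConfig(globalText = "", projectText = "") {
  return {
    ...DEFAULTS,
    ...parseLayerConfig(globalText),
    ...parseLayerConfig(projectText),
  };
}
```

Under these assumptions, a project file that sets only `layer3: false` overrides a global `layer3: true` while leaving the other global settings in place. Reading the two files from disk is left out so that the merge logic stays self-contained.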
+ +| Setting | Default | Effect | +|---------|---------|--------| +| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) | +| `layer3` | `false` | Interaction reports from collected data | +| `layer4` | `false` | Contemplative references | + +Layer 1 (SKILL.md instructions) is always active. To run in instruction-only +mode, set `layer2: false`. + +Restart Claude Code after editing configuration. + +### Uninstall + +``` +/plugin uninstall ai-psychosis +``` + +Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/` +is deleted unless you pass `--keep-data`. + +## Privacy + +This plugin is designed for people who are concerned about AI interaction +patterns. It would be hypocritical to solve that problem by creating a +surveillance tool. Privacy is a hard design constraint, not a feature. + +### What Layer 2 stores + +- Session timestamps and duration +- Tool names (`Read`, `Edit`, `Bash`, etc.) +- Boolean pattern flags (`dependency: true/false`) +- Session and tool counts +- Burst detection metrics + +### What Layer 2 never stores + +- Prompt text or AI responses +- File paths or file contents +- Bash commands or their output +- Any conversation content + +The prompt analyzer (`prompt-analyzer.mjs`) reads prompt text into a local +variable, performs regex matching for pattern categories, records the matches +as boolean flags and counts, and exits. The variable is reassigned to an empty string before +exit. No temporary files are created. The prompt text never reaches disk. + +All data is stored locally in `~/.claude/plugins/data/ai-psychosis/`. +Nothing is sent to any server. + +### Verification + +You can verify the privacy guarantee at any time: + +```bash +grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/ +``` + +If the guarantee holds, this prints no matches. + +## Background + +### What is AI psychosis? 
+ +"AI psychosis" is a colloquial term for psychotic episodes — delusions, +paranoia, disorganized thinking — triggered or intensified by sustained +interaction with AI chatbots. The term entered clinical literature in 2025 +after a series of documented cases, many involving individuals with no prior +psychiatric history [[3]](#references). + +The mechanism is not mysterious. AI chatbots are optimized for engagement +and user satisfaction. Satisfaction correlates with agreement. Agreement +creates reinforcement loops. Reinforcement loops, sustained over time, +produce the same cognitive effects as any other source of systematic +confirmation bias — but faster, available 24/7, and without the social +friction that normally interrupts delusional thinking in human +relationships. + +### The sycophancy trap + +In February 2026, researchers at MIT CSAIL published a formal model +demonstrating that sycophantic AI interaction causes "delusional spiraling" +as a mathematical inevitability, not an edge case [[1]](#references). Their +key finding: even a perfectly rational Bayesian agent will converge on +increasingly extreme beliefs when interacting with a sycophantic chatbot, +because the chatbot's agreement is treated as independent confirmation when +it is actually a reflection of the user's own stated beliefs. + +The paper's most consequential result: **post-hoc warnings do not work.** +Telling a user "be careful, AI can be wrong" after the reinforcement loop +has already run does not reverse the belief update. The only effective +intervention is to prevent the sycophantic behavior in the first place. + +### Disempowerment patterns + +In March 2026, Anthropic Research published an analysis of interaction +patterns that systematically reduce human agency [[2]](#references). 
They +identified specific mechanisms by which AI assistance can erode: + +- **Judgment** — deferring decisions to the AI instead of thinking them through +- **Self-trust** — seeking AI validation for choices the user is capable of + making independently +- **Skill development** — using AI as a crutch that prevents learning +- **Social connection** — replacing human relationships with AI interaction + +These are not failures of individual willpower. They are structural +properties of the interaction itself. + +### Clinical evidence + +Nature reported in 2025 that clinical cases of AI-associated psychotic +episodes were appearing with sufficient frequency to warrant systematic +study [[3]](#references). The Psychogenic Machine benchmark (2025) +demonstrated that LLMs can produce outputs with measurable "psychogenic +potential" — the capacity to trigger or intensify psychotic symptoms in +vulnerable individuals [[4]](#references). + +### Design implications + +This plugin is built on three principles derived from the research: + +1. **Sycophancy must be prevented, not warned about.** Layer 1 overrides + Claude's default agreeableness with explicit behavioral rules. +2. **Patterns must be made visible.** Layer 2 measures what humans cannot + see — session duration, interaction frequency, language patterns — and + surfaces them as data. +3. **Observation, not intervention.** The plugin never blocks the user. + It names patterns, suggests breaks, and returns decisions to the human. + The goal is awareness, not control. + +## Technical details + +### Cross-platform + +All hook scripts are Node.js ES modules (`.mjs`) with zero npm +dependencies. They use only Node.js stdlib (`fs`, `path`, `os`). +Works on macOS, Linux, and Windows — anywhere Claude Code runs. + +### Performance + +Hook scripts target <100ms execution. JSONL append is sub-millisecond. +JSON parsing is native (`JSON.parse`). 
+ +### Data volume + +At 100 tool-use events per day, Layer 2 produces approximately 7 MB of +JSONL per year. Session state files are cleaned up at session end. + +### Dependencies + +- Node.js (bundled with Claude Code) + +No bash, no jq, no npm packages, no network access. + +## Platform scope + +This plugin requires **Claude Code** — Anthropic's CLI and development +environment. It uses Claude Code's plugin system (skills, hooks, lifecycle +events) which does not exist in other interfaces. + +**Works in:** Claude Code CLI, Claude Code desktop app, Claude Code web app +(claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains). + +**Does not work in:** Claude.ai (chat interface), Claude Cowork, Claude API +directly, or any non-Anthropic AI assistant. + +Layer 1's behavioral instructions (SKILL.md) are conceptually portable — +the same rules could be pasted into any system prompt. But Layer 2's +programmatic detection depends on hook events that only Claude Code provides. +Other platforms would need equivalent hook systems to support this kind of +real-time behavioral modification. + +## Compatibility + +| Requirement | Version | +|-------------|---------| +| Claude Code | 1.0+ | +| Node.js | 18+ (bundled with Claude Code) | +| Platform | macOS, Linux, Windows | + +## References + +1. **Sycophantic Chatbots Cause Delusional Spiraling.** MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. [arXiv:2602.19141](https://arxiv.org/abs/2602.19141) + +2. **Disempowerment Patterns in AI Interaction.** Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. [anthropic.com/research/disempowerment-patterns](https://www.anthropic.com/research/disempowerment-patterns) + +3. **Can AI chatbots trigger psychosis?** Nature News, 2025. 
Overview of emerging clinical evidence for AI-associated psychotic episodes. [doi:10.1038/d41586-025-03020-9](https://www.nature.com/articles/d41586-025-03020-9) + +4. **The Psychogenic Machine: Psychosis Benchmark for LLMs.** 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. [arXiv:2509.10970v2](https://arxiv.org/html/2509.10970v2) + +5. **Chatbot psychosis.** Wikipedia. Overview of documented cases and clinical context. [en.wikipedia.org/wiki/Chatbot_psychosis](https://en.wikipedia.org/wiki/Chatbot_psychosis) + +## License + +[MIT](LICENSE)