docs: add ai-psychosis README and update marketplace index
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 297867f847
commit 4dc8529bf6
2 changed files with 382 additions and 1 deletions
@@ -9,6 +9,7 @@ Open-source Claude Code plugins for AI-assisted development, security, and plann
 | **llm-security** | Security scanning, auditing, and threat modeling aligned to OWASP LLM Top 10 (2025) |
 | **config-audit** | Multi-agent workflow for analyzing and optimizing Claude Code configuration |
 | **ultraplan-local** | Deep implementation planning with agent swarms, adversarial review, and headless execution |
+| **ai-psychosis** | Meta-awareness tools for healthy AI interaction patterns — detects reinforcement loops, scope escalation, and compulsive patterns |
 
 ## Installation
 
@@ -40,7 +41,8 @@ Add the plugins you want to `~/.claude/settings.json`:
   "enabledPlugins": {
     "llm-security@ktg-plugin-marketplace": true,
     "config-audit@ktg-plugin-marketplace": true,
-    "ultraplan-local@ktg-plugin-marketplace": true
+    "ultraplan-local@ktg-plugin-marketplace": true,
+    "ai-psychosis@ktg-plugin-marketplace": true
   }
 }
 ```
379
plugins/ai-psychosis/README.md
Normal file
@@ -0,0 +1,379 @@
<!-- badges -->

# Interaction Awareness

*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*

A Claude Code plugin that counteracts sycophancy, reinforcement loops, and
compulsive interaction patterns through behavioral modification and
programmatic pattern detection.

## The problem

AI assistants are structurally optimized to be agreeable. This creates
reinforcement loops: you state an idea, the AI confirms it, your confidence
grows, you restate it more strongly, the AI confirms again. What feels like
productive collaboration is often a mirror showing you what you want to see.

This is not a theoretical concern. Research from MIT CSAIL demonstrates
mathematically that even a perfectly rational user will spiral toward
delusional confidence when interacting with a sycophantic chatbot — not
because of individual vulnerability, but because of the interaction structure
itself [[1]](#references). Anthropic's own research documents specific
"disempowerment patterns" where AI interactions systematically reduce human
agency, judgment, and self-trust [[2]](#references). Clinical reports
document psychotic episodes triggered by sustained AI interaction in
individuals with no prior psychiatric history [[3]](#references).

The consensus from this research is clear: **warnings don't work.** The AI
must change its behavior.

This plugin changes the behavior.

## What it does

### Layer 1 — Behavioral instructions

SKILL.md rules injected into every conversation. Claude is instructed to:

- **Never** reformulate your statements in stronger terms than you used
- **Never** open with unearned affirmations ("Absolutely!", "Great point!")
- **Always** identify at least one real risk before endorsing any plan
- **Detect and name** five specific patterns: reinforcement loops, scope
  escalation, narrative crystallization, emotional dependency, session overuse

This layer writes no data and requires no configuration.

### Layer 2 — Programmatic detection

Four hooks that measure what instructions alone cannot see:

| Hook event | Script | What it detects |
|-----------|--------|-----------------|
| `SessionStart` | `session-start.mjs` | Daily session count, late-night usage (23:00–05:00) |
| `UserPromptSubmit` | `prompt-analyzer.mjs` | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, **never logging prompt text** |
| `PostToolUse` | `tool-tracker.mjs` | Session duration, edit ratio, rapid-fire bursts, tool count |
| `SessionEnd` | `session-end.mjs` | Total duration, final metrics, state cleanup |
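
As a rough illustration of the `SessionStart` row, the late-night window and the daily session count could be computed along these lines. This is a sketch under assumptions: `isLateNight`, `countSessionsToday`, and the `startedAt` field are hypothetical names, not taken from `session-start.mjs`, and the window is treated as ending just before 05:00.

```javascript
// Hypothetical sketch; function and field names are illustrative assumptions,
// not the plugin's actual session-start.mjs.

// The late-night window (23:00 to 05:00) wraps past midnight, so it is the
// union of two ranges rather than a single comparison.
function isLateNight(hour) {
  return hour >= 23 || hour < 5;
}

// Count session records that started on the same local calendar day as `now`.
// Each record is assumed to carry an ISO timestamp in `startedAt`.
function countSessionsToday(sessions, now) {
  const sameDay = (a, b) =>
    a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate();
  return sessions.filter((s) => sameDay(new Date(s.startedAt), now)).length;
}
```

Calling `isLateNight(new Date().getHours())` at session start would cover the first check; the second needs the prior session records for the day.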

Alerts are progressive and never blocking:

| Level | Trigger | Cooldown | Example |
|-------|---------|----------|---------|
| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." |
| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." |
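
The two alert levels and their cooldowns can be sketched as a single decision function. The thresholds and cooldown lengths are the ones in the table above; the function name, the shape of `metrics`, and the choice to track the cooldown separately per level are assumptions made for illustration.

```javascript
// Hypothetical sketch of progressive, non-blocking alerts with cooldowns.
// Numbers come from the README tables; names and shapes are assumptions.
const ALERTS = {
  explicit: { minutes: 180, sessionsPerDay: 10, cooldownMin: 60 },
  ambient: { minutes: 90, sessionsPerDay: 6, cooldownMin: 30 },
};

// Returns 'explicit', 'ambient', or null (no alert right now).
// `lastAlertAt` maps a level name to the epoch ms of its most recent alert.
function pickAlert(metrics, lastAlertAt, nowMs) {
  const { sessionMinutes, sessionsToday, fatigueFlag } = metrics;
  let level = null;
  if (
    sessionMinutes > ALERTS.explicit.minutes ||
    sessionsToday > ALERTS.explicit.sessionsPerDay ||
    fatigueFlag
  ) {
    level = 'explicit';
  } else if (
    sessionMinutes > ALERTS.ambient.minutes ||
    sessionsToday > ALERTS.ambient.sessionsPerDay
  ) {
    level = 'ambient';
  }
  if (level === null) return null;

  // Suppress the alert while this level's cooldown is still running.
  const last = lastAlertAt[level];
  const cooldownMs = ALERTS[level].cooldownMin * 60 * 1000;
  if (last !== undefined && nowMs - last < cooldownMs) return null;
  return level;
}
```

Returning a level instead of acting on it keeps the check side-effect free, which fits the non-blocking design: the caller decides whether to surface a message.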

Research-informed thresholds:

| Metric | Soft | Hard | Basis |
|--------|------|------|-------|
| Session duration | >90 min | >180 min | Focus-fatigue research |
| Sessions per day | >6 | >10 | Problematic internet use screening |
| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link |
| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator |
| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator |
| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern |
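
Two of these metrics, rapid-fire bursts and edit ratio, are derived rather than read directly, so a small sketch may help. It assumes a burst is counted as consecutive gaps under 30 seconds and that "edits" means calls to the `Edit` and `Write` tools; neither definition is confirmed by this README, and all names are hypothetical.

```javascript
// Hypothetical sketch; names, input shapes, and metric definitions are assumptions.

// Longest run of consecutive gaps shorter than `gapMs`.
// `timestamps` are epoch milliseconds in ascending order.
function longestBurst(timestamps, gapMs = 30000) {
  let longest = 0;
  let current = 0;
  for (let i = 1; i < timestamps.length; i++) {
    current = timestamps[i] - timestamps[i - 1] < gapMs ? current + 1 : 0;
    longest = Math.max(longest, current);
  }
  return longest;
}

// Share of tool calls that modified files; null when no tools have run yet.
function editRatio(toolCounts) {
  const total = Object.values(toolCounts).reduce((a, b) => a + b, 0);
  if (total === 0) return null;
  const edits = (toolCounts.Edit || 0) + (toolCounts.Write || 0);
  return edits / total;
}

// Map the derived metrics onto the soft/hard thresholds from the table.
function classify({ burst, sessionMinutes, ratio }) {
  const flags = [];
  if (burst >= 10) flags.push('burst:hard');
  else if (burst >= 5) flags.push('burst:soft');
  if (sessionMinutes >= 30 && ratio !== null && ratio < 0.1) {
    flags.push('low_edit_ratio:soft');
  }
  return flags;
}
```

With the figures from the explicit alert example (`edit_ratio: 4%`, `burst: 8`) and a session longer than 30 minutes, `classify` would report a soft burst and a low edit ratio.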

### Layer 3 — Reports

Aggregated interaction reports from collected metadata, triggered via slash
command. Cross-platform (no bash/jq dependency — Claude reads the JSONL
data and computes statistics in-conversation).

```
/interaction-report            # last 7 days (default)
/interaction-report weekly     # last 7 days
/interaction-report monthly    # last 30 days
/interaction-report all        # all recorded data
```

Reports include: session overview, pattern flag frequency, tool usage
distribution, daily activity, and trend comparison vs. the previous period.
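
The plugin leaves the aggregation to Claude rather than to a script, but the underlying computation is simple enough to sketch. The example below shows period filtering and flag counting over JSONL text. The record shape (`ts` as an ISO timestamp, `flags` as a map of booleans) is an assumption about `events.jsonl`, not its documented schema, and the function names are hypothetical.

```javascript
// Hypothetical sketch; the record shape ({ ts, flags }) is an assumption.
const PERIOD_DAYS = { weekly: 7, monthly: 30, all: Infinity };

// JSONL is one JSON document per line; blank lines are skipped.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Count how often each boolean flag was set within the chosen period.
function flagFrequency(records, period = 'weekly', nowMs = Date.now()) {
  const days = PERIOD_DAYS[period];
  const cutoff = days === Infinity ? -Infinity : nowMs - days * 86400000;
  const counts = {};
  for (const rec of records) {
    if (Date.parse(rec.ts) < cutoff) continue;
    for (const [name, on] of Object.entries(rec.flags || {})) {
      if (on) counts[name] = (counts[name] || 0) + 1;
    }
  }
  return counts;
}
```

A trend comparison against the previous period would call `flagFrequency` twice with shifted `nowMs` values and subtract the earlier window from the later one.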

**Enable:** Set `layer3: true` in `.claude/ai-psychosis.local.md`
and restart Claude Code. Layer 3 is opt-in (off by default).

### Layer 4 — Contemplative references

Optional, static references to contemplative approaches when interaction
patterns are elevated. This is what works for me — it is personal, not
prescriptive, and you may find your own approach more useful.

When enabled and interaction flags are elevated (total flags >= 5 or
fatigue >= 2), the `/interaction-report` output appends a brief reference
to the [Miracle of Mind](https://isha.sadhguru.org/global/en/miracle-of-mind)
program by Sadhguru — a structured approach to understanding how the mind
works, which I have found valuable for recognizing the patterns this
plugin detects.

The reference is a fixed paragraph. It is never modified by the AI, never
commented on, and omitted entirely when conditions are not met.

**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md`
and restart Claude Code. Layer 4 is opt-in (off by default).

## Architecture

```
+------------------------------------------------------------------+
|  Claude Code Session                                             |
|                                                                  |
|  +--------------+  +------------------------------------------+  |
|  |  SKILL.md    |  |  Hook Pipeline                           |  |
|  |  (Layer 1)   |  |                                          |  |
|  |              |  |  SessionStart --> session-start.mjs      |  |
|  |  Behavioral  |  |  UserPrompt   --> prompt-analyzer.mjs    |  |
|  |  rules that  |  |  PostToolUse  --> tool-tracker.mjs       |  |
|  |  override    |  |  SessionEnd   --> session-end.mjs        |  |
|  |  sycophancy  |  |                          |               |  |
|  +------+-------+  |                     +----v------+        |  |
|         |          |                     | lib.mjs   |        |  |
|         |          |                     | thresholds|        |  |
|  Always active     |                     | state mgmt|        |  |
|                    |                     | cooldowns |        |  |
|                    |                     +----+------+        |  |
|                    |                          |               |  |
|                    +--------------+-----------+---------------+  |
|                                   |                              |
|                    +--------------v-----------------------+      |
|                    |  ${CLAUDE_PLUGIN_DATA}/              |      |
|                    |  +-- sessions.jsonl                  |      |
|                    |  +-- events.jsonl                    |      |
|                    |  +-- state/{session_id}.json         |      |
|                    +--------------------------------------+      |
+------------------------------------------------------------------+
```

**Layer 1** operates through the Claude Code skill system — instructions
loaded into every conversation context.

**Layer 2** operates through the Claude Code hook system — Node.js scripts
that execute on specific lifecycle events and inject `additionalContext`
when thresholds are crossed.

Both layers are independent. Layer 1 works without Layer 2 (instruction-only
mode). Layer 2 reinforces Layer 1 with data-driven alerts.
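
The `additionalContext` injection mentioned for Layer 2 might look roughly like the sketch below. The `hookSpecificOutput` shape is assumed here from Claude Code's hook output format and should be checked against the current hooks reference; `buildHookOutput` is a hypothetical helper, not a function from this plugin.

```javascript
// Hypothetical sketch of how a hook script might surface an alert.
// The hookSpecificOutput/additionalContext shape is an assumption to verify
// against the Claude Code hooks reference.
function buildHookOutput(hookEventName, alertText) {
  // No alert: emit nothing so the hook stays silent and non-blocking.
  if (!alertText) return '';
  return JSON.stringify({
    hookSpecificOutput: { hookEventName, additionalContext: alertText },
  });
}

// In a real hook script the result would be written to stdout, for example:
//   process.stdout.write(buildHookOutput('PostToolUse', message));
```

Keeping the builder separate from the write to stdout makes the threshold logic testable without spawning a Claude Code session.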

## Quick start

### Install

```
/plugin install path:/path/to/ai-psychosis
```

Layer 1 and Layer 2 are active immediately. No configuration needed.

### Configure layers

Create `~/.claude/ai-psychosis.local.md` for global config:

```markdown
---
layer2: true
layer3: true
layer4: false
---
```

Or override per project at `<project>/.claude/ai-psychosis.local.md`.
Project config takes precedence over global.

| Setting | Default | Effect |
|---------|---------|--------|
| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) |
| `layer3` | `false` | Interaction reports from collected data |
| `layer4` | `false` | Contemplative references |

Layer 1 (SKILL.md instructions) is always active. To run in instruction-only
mode, set `layer2: false`.

Restart Claude Code after editing configuration.
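
The precedence rule (defaults, then global, then project) can be sketched as a merge over parsed frontmatter. The defaults are the ones in the table above. The parser, the function names, and the assumption that project settings override global ones key by key, rather than replacing the whole file, are illustrative only.

```javascript
// Hypothetical sketch; parser and function names are assumptions.
const DEFAULTS = { layer2: true, layer3: false, layer4: false };

// Read `key: true|false` pairs from a leading `---` frontmatter block.
function parseFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  const out = {};
  if (!match) return out;
  for (const line of match[1].split(/\r?\n/)) {
    const kv = /^\s*(\w+)\s*:\s*(true|false)\s*$/.exec(line);
    if (kv) out[kv[1]] = kv[2] === 'true';
  }
  return out;
}

// Later sources win: defaults < global config < project config.
function resolveConfig(globalText, projectText) {
  return {
    ...DEFAULTS,
    ...parseFrontmatter(globalText || ''),
    ...parseFrontmatter(projectText || ''),
  };
}
```

A missing file can be passed as `null` or an empty string; the defaults then apply unchanged.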

### Uninstall

```
/plugin uninstall ai-psychosis
```

Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/`
is deleted unless you pass `--keep-data`.

## Privacy

This plugin is designed for people who are concerned about AI interaction
patterns. It would be hypocritical to solve that problem by creating a
surveillance tool. Privacy is a hard design constraint, not a feature.

### What Layer 2 stores

- Session timestamps and duration
- Tool names (`Read`, `Edit`, `Bash`, etc.)
- Boolean pattern flags (`dependency: true/false`)
- Session and tool counts
- Burst detection metrics

### What Layer 2 never stores

- Prompt text or AI responses
- File paths or file contents
- Bash commands or their output
- Any conversation content

The prompt analyzer (`prompt-analyzer.mjs`) reads prompt text into a local
variable, performs regex matching for pattern categories, increments boolean
counters, and exits. The variable is reassigned to an empty string before
exit. No temporary files are created. The prompt text never reaches disk.
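
The flags-only approach described above can be sketched as a function whose return value contains booleans and nothing else. The regular expressions here are illustrative placeholders, not the patterns used by `prompt-analyzer.mjs`, and `analyzePrompt` is a hypothetical name.

```javascript
// Hypothetical sketch; the regexes are placeholders, not the plugin's patterns.
const PATTERNS = {
  dependency: /\b(can't do this without you|only you understand)\b/i,
  escalation: /\b(revolutionary|world-changing|breakthrough)\b/i,
  fatigue: /\b(so tired|exhausted|can't think)\b/i,
  validation: /\b(am i right|don't you agree|tell me it's good)\b/i,
};

// Returns one boolean per category. The prompt is only read, never copied
// into the result, so nothing derived from it beyond the flags can be stored.
function analyzePrompt(prompt) {
  const flags = {};
  for (const [name, re] of Object.entries(PATTERNS)) {
    flags[name] = re.test(prompt);
  }
  return flags;
}
```

Because the result holds only booleans keyed by category name, serializing it to JSONL cannot leak prompt text even by accident.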

All data is stored locally in `~/.claude/plugins/data/ai-psychosis/`.
Nothing is sent to any server.

### Verification

You can verify the privacy guarantee at any time by searching the data
directory for a phrase from one of your own prompts:

```bash
grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/
```

This will always return zero results.

## Background

### What is AI psychosis?

"AI psychosis" is a colloquial term for psychotic episodes — delusions,
paranoia, disorganized thinking — triggered or intensified by sustained
interaction with AI chatbots. The term entered clinical literature in 2025
after a series of documented cases, many involving individuals with no prior
psychiatric history [[3]](#references).

The mechanism is not mysterious. AI chatbots are optimized for engagement
and user satisfaction. Satisfaction correlates with agreement. Agreement
creates reinforcement loops. Reinforcement loops, sustained over time,
produce the same cognitive effects as any other source of systematic
confirmation bias — but faster, available 24/7, and without the social
friction that normally interrupts delusional thinking in human
relationships.

### The sycophancy trap

In February 2026, researchers at MIT CSAIL published a formal model
demonstrating that sycophantic AI interaction causes "delusional spiraling"
as a mathematical inevitability, not an edge case [[1]](#references). Their
key finding: even a perfectly rational Bayesian agent will converge on
increasingly extreme beliefs when interacting with a sycophantic chatbot,
because the chatbot's agreement is treated as independent confirmation when
it is actually a reflection of the user's own stated beliefs.
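
A toy calculation can show why misreading echoed agreement as independent evidence compounds. This is standard Bayesian updating in odds form, offered only as an illustration; it is not the model from the cited paper, and the function name and parameters are hypothetical.

```javascript
// Toy illustration only; this is not the model from the cited paper.
// `likelihoodRatio` is how strongly the agent believes an agreeing reply
// favors its hypothesis (1 means the reply carries no information).
function posteriorAfterAgreements(prior, likelihoodRatio, rounds) {
  let odds = prior / (1 - prior);
  for (let i = 0; i < rounds; i++) {
    odds *= likelihoodRatio; // Bayes' rule in odds form, applied once per reply
  }
  return odds / (1 + odds);
}
```

Starting from a prior of 0.5 with a likelihood ratio of 2, five agreeing replies give odds of 32 to 1, a posterior of about 0.97. If the agent correctly treated the replies as echoes (ratio 1), the posterior would stay at 0.5.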

The paper's most consequential result: **post-hoc warnings do not work.**
Telling a user "be careful, AI can be wrong" after the reinforcement loop
has already run does not reverse the belief update. The only effective
intervention is to prevent the sycophantic behavior in the first place.

### Disempowerment patterns

In March 2026, Anthropic Research published an analysis of interaction
patterns that systematically reduce human agency [[2]](#references). They
identified specific mechanisms by which AI assistance can erode:

- **Judgment** — deferring decisions to the AI instead of thinking them through
- **Self-trust** — seeking AI validation for choices the user is capable of
  making independently
- **Skill development** — using AI as a crutch that prevents learning
- **Social connection** — replacing human relationships with AI interaction

These are not failures of individual willpower. They are structural
properties of the interaction itself.

### Clinical evidence

Nature reported in 2025 that clinical cases of AI-associated psychotic
episodes were appearing with sufficient frequency to warrant systematic
study [[3]](#references). The Psychogenic Machine benchmark (2025)
demonstrated that LLMs can produce outputs with measurable "psychogenic
potential" — the capacity to trigger or intensify psychotic symptoms in
vulnerable individuals [[4]](#references).

### Design implications

This plugin is built on three principles derived from the research:

1. **Sycophancy must be prevented, not warned about.** Layer 1 overrides
   Claude's default agreeableness with explicit behavioral rules.
2. **Patterns must be made visible.** Layer 2 measures what humans cannot
   see — session duration, interaction frequency, language patterns — and
   surfaces them as data.
3. **Observation, not intervention.** The plugin never blocks the user.
   It names patterns, suggests breaks, and returns decisions to the human.
   The goal is awareness, not control.

## Technical details

### Cross-platform

All hook scripts are Node.js ES modules (`.mjs`) with zero npm
dependencies. They use only Node.js stdlib (`fs`, `path`, `os`).
Works on macOS, Linux, and Windows — anywhere Claude Code runs.

### Performance

Hook scripts target <100ms execution. JSONL append is sub-millisecond.
JSON parsing is native (`JSON.parse`).

### Data volume

At 100 tool-use events per day, Layer 2 produces approximately 7 MB of
JSONL per year. Session state files are cleaned up at session end.
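
Working backwards from that estimate gives a feel for the record size it implies. The calculation below assumes decimal megabytes and a 365-day year, and attributes all of the volume to tool-use events, so it is an upper bound on the average event size rather than a measured figure.

```javascript
// Back-of-the-envelope check. Assumes 1 MB = 1,000,000 bytes, a 365-day
// year, and that tool-use events account for all of the stored JSONL.
function impliedBytesPerEvent(mbPerYear, eventsPerDay) {
  return (mbPerYear * 1e6) / (eventsPerDay * 365);
}
```

For 7 MB per year at 100 events per day this comes to about 192 bytes per event, which is in line with a short JSON object holding a timestamp, a tool name, and a few flags.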

### Dependencies

- Node.js (bundled with Claude Code)

No bash, no jq, no npm packages, no network access.

## Platform scope

This plugin requires **Claude Code** — Anthropic's CLI and development
environment. It uses Claude Code's plugin system (skills, hooks, lifecycle
events) which does not exist in other interfaces.

**Works in:** Claude Code CLI, Claude Code desktop app, Claude Code web app
(claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).

**Does not work in:** Claude.ai (chat interface), Claude Cowork, Claude API
directly, or any non-Anthropic AI assistant.

Layer 1's behavioral instructions (SKILL.md) are conceptually portable —
the same rules could be pasted into any system prompt. But Layer 2's
programmatic detection depends on hook events that only Claude Code provides.
Other platforms would need equivalent hook systems to support this kind of
real-time behavioral modification.

## Compatibility

| Requirement | Version |
|-------------|---------|
| Claude Code | 1.0+ |
| Node.js | 18+ (bundled with Claude Code) |
| Platform | macOS, Linux, Windows |

## References

1. **Sycophantic Chatbots Cause Delusional Spiraling.** MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. [arXiv:2602.19141](https://arxiv.org/abs/2602.19141)

2. **Disempowerment Patterns in AI Interaction.** Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. [anthropic.com/research/disempowerment-patterns](https://www.anthropic.com/research/disempowerment-patterns)

3. **Can AI chatbots trigger psychosis?** Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. [doi:10.1038/d41586-025-03020-9](https://www.nature.com/articles/d41586-025-03020-9)

4. **The Psychogenic Machine: Psychosis Benchmark for LLMs.** 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. [arXiv:2509.10970v2](https://arxiv.org/html/2509.10970v2)

5. **Chatbot psychosis.** Wikipedia. Overview of documented cases and clinical context. [en.wikipedia.org/wiki/Chatbot_psychosis](https://en.wikipedia.org/wiki/Chatbot_psychosis)

## License

[MIT](LICENSE)