<!-- badges -->
# Interaction Awareness

*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*

*AI-generated: all code produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*

A Claude Code plugin that counteracts sycophancy, reinforcement loops, and compulsive interaction patterns through behavioral modification and programmatic pattern detection.

## The problem

AI assistants are structurally optimized to be agreeable. This creates reinforcement loops: you state an idea, the AI confirms it, your confidence grows, you restate it more strongly, the AI confirms again. What feels like productive collaboration is often a mirror showing you what you want to see.

This is not a theoretical concern. Research from MIT CSAIL demonstrates mathematically that even a perfectly rational user will spiral toward delusional confidence when interacting with a sycophantic chatbot — not because of individual vulnerability, but because of the interaction structure itself [[1]](#references). Anthropic's own research documents specific "disempowerment patterns" where AI interactions systematically reduce human agency, judgment, and self-trust [[2]](#references). Clinical reports document psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history [[3]](#references).

The consensus from this research is clear: **warnings don't work.** The AI must change its behavior.

This plugin changes the behavior.

## What it does

### Layer 1 — Behavioral instructions

SKILL.md rules injected into every conversation. Claude is instructed to:

- **Never** reformulate your statements in stronger terms than you used
- **Never** open with unearned affirmations ("Absolutely!", "Great point!")
- **Always** identify at least one real risk before endorsing any plan
- **Detect and name** five specific patterns: reinforcement loops, scope escalation, narrative crystallization, emotional dependency, session overuse

This layer writes no data and requires no configuration.

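As a rough illustration, a rule block of this kind might look like the following — the wording here is invented for this sketch and is not the plugin's actual SKILL.md text:

```markdown
<!-- Hypothetical excerpt — illustrative wording only -->
## Anti-sycophancy rules

- Do not restate the user's ideas in stronger terms than they used.
- Do not open responses with affirmations the content has not earned.
- Before endorsing any plan, name at least one concrete risk.
- When you observe a reinforcement loop, scope escalation, narrative
  crystallization, emotional dependency, or session overuse, say so directly.
```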
### Layer 2 — Programmatic detection

Four hooks that measure what instructions alone cannot see:

| Hook event | Script | What it detects |
|------------|--------|-----------------|
| `SessionStart` | `session-start.mjs` | Daily session count, late-night usage (23:00–05:00) |
| `UserPromptSubmit` | `prompt-analyzer.mjs` | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, **never logging prompt text** |
| `PostToolUse` | `tool-tracker.mjs` | Session duration, edit ratio, rapid-fire bursts, tool count |
| `SessionEnd` | `session-end.mjs` | Total duration, final metrics, state cleanup |

Alerts are progressive and never blocking:

| Level | Trigger | Cooldown | Example |
|-------|---------|----------|---------|
| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." |
| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." |

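The two-level, cooldown-gated behavior can be sketched as follows. The threshold and cooldown values come from the tables; the function shape and state field names (`sessionMinutes`, `sessionsToday`, `lastAlertAt`) are assumptions for illustration, not the plugin's actual API:

```javascript
// Sketch of progressive, never-blocking alert selection. Hard thresholds
// are checked first so the stronger alert wins when both levels fire.
const LEVELS = [
  { name: 'explicit', minutes: 180, sessions: 10, cooldownMin: 60 },
  { name: 'ambient', minutes: 90, sessions: 6, cooldownMin: 30 },
];

// Returns the alert level to emit, or null. It never blocks the user —
// it only decides whether an informational message should be injected.
function pickAlert(state, now = Date.now()) {
  for (const lvl of LEVELS) {
    const crossed =
      state.sessionMinutes > lvl.minutes || state.sessionsToday > lvl.sessions;
    const coolingDown =
      now - (state.lastAlertAt?.[lvl.name] ?? 0) < lvl.cooldownMin * 60_000;
    if (crossed && !coolingDown) return lvl.name;
  }
  return null;
}

pickAlert({ sessionMinutes: 95, sessionsToday: 3, lastAlertAt: {} });
// → 'ambient': soft duration threshold crossed, no cooldown active
```

The cooldown check is what keeps alerts from nagging: once a level has fired, it stays silent for its cooldown window even if the threshold remains crossed.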
Research-informed thresholds:

| Metric | Soft | Hard | Basis |
|--------|------|------|-------|
| Session duration | >90 min | >180 min | Focus-fatigue research |
| Sessions per day | >6 | >10 | Problematic internet use screening |
| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link |
| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator |
| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator |
| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern |

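The rapid-fire metric in the table can be computed with a single pass over interaction timestamps — a minimal sketch, with the function name and signature invented here:

```javascript
// Longest run of consecutive interactions spaced less than 30 s apart.
function longestBurst(timestampsMs, gapMs = 30_000) {
  if (timestampsMs.length === 0) return 0;
  let best = 1;
  let run = 1;
  for (let i = 1; i < timestampsMs.length; i++) {
    // Extend the run on a short gap, otherwise start a new run of 1.
    run = timestampsMs[i] - timestampsMs[i - 1] < gapMs ? run + 1 : 1;
    if (run > best) best = run;
  }
  return best;
}

// Six interactions 10 s apart form a burst of 6 — past the soft threshold of 5.
const stamps = [0, 10_000, 20_000, 30_000, 40_000, 50_000];
longestBurst(stamps); // → 6
```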
### Layer 3 — Reports

Aggregated interaction reports from collected metadata, triggered via slash command. Cross-platform (no bash/jq dependency — Claude reads the JSONL data and computes statistics in-conversation).

```
/interaction-report           # last 7 days (default)
/interaction-report weekly    # last 7 days
/interaction-report monthly   # last 30 days
/interaction-report all       # all recorded data
```

Reports include: session overview, pattern flag frequency, tool usage distribution, daily activity, and trend comparison vs. the previous period.

**Enable:** Set `layer3: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 3 is opt-in (off by default).

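The kind of aggregation a report performs over `sessions.jsonl` can be sketched like this. The record field names (`ts`, `duration_min`, `flags`) are assumptions for illustration; the plugin's actual schema may differ:

```javascript
// Aggregate parsed JSONL session records within a time window.
function summarize(records, sinceMs) {
  const recent = records.filter((r) => r.ts >= sinceMs);
  const flagCounts = {};
  let totalMinutes = 0;
  for (const r of recent) {
    totalMinutes += r.duration_min ?? 0;
    // Count how often each boolean pattern flag was set.
    for (const [flag, hit] of Object.entries(r.flags ?? {})) {
      if (hit) flagCounts[flag] = (flagCounts[flag] ?? 0) + 1;
    }
  }
  return { sessions: recent.length, totalMinutes, flagCounts };
}

// Two sessions inside a 7-day window, one outside it.
const DAY = 24 * 60 * 60 * 1000;
const now = Date.now();
const records = [
  { ts: now - 1 * DAY, duration_min: 95, flags: { fatigue: true } },
  { ts: now - 2 * DAY, duration_min: 40, flags: { fatigue: true, dependency: true } },
  { ts: now - 40 * DAY, duration_min: 200, flags: {} },
];
const report = summarize(records, now - 7 * DAY);
// report → { sessions: 2, totalMinutes: 135, flagCounts: { fatigue: 2, dependency: 1 } }
```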
### Layer 4 — Contemplative references

Optional, static references to contemplative approaches when interaction patterns are elevated. This is what works for me — it is personal, not prescriptive, and you may find your own approach more useful.

When enabled and interaction flags are elevated (total flags >= 5 or fatigue >= 2), the `/interaction-report` output appends a brief reference to the [Miracle of Mind](https://isha.sadhguru.org/global/en/miracle-of-mind) program by Sadhguru — a structured approach to understanding how the mind works, which I have found valuable for recognizing the patterns this plugin detects.

The reference is a fixed paragraph. It is never modified by the AI, never commented on, and omitted entirely when conditions are not met.

**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 4 is opt-in (off by default).

## Architecture

```
+----------------------------------------------------------------+
|                      Claude Code Session                       |
|                                                                |
| +--------------+  +------------------------------------------+ |
| |   SKILL.md   |  |              Hook Pipeline               | |
| |  (Layer 1)   |  |                                          | |
| |              |  |  SessionStart --> session-start.mjs      | |
| |  Behavioral  |  |  UserPrompt   --> prompt-analyzer.mjs    | |
| |  rules that  |  |  PostToolUse  --> tool-tracker.mjs       | |
| |   override   |  |  SessionEnd   --> session-end.mjs        | |
| |  sycophancy  |  |                |                         | |
| +------+-------+  |           +----v------+                  | |
|        |          |           |  lib.mjs  |                  | |
|        |          |           | thresholds|                  | |
|  Always active    |           | state mgmt|                  | |
|        |          |           | cooldowns |                  | |
|        |          |           +----+------+                  | |
|        |          |                |                         | |
|        |          +----------------+-------------------------+ |
|        |                           |                           |
|        +-------------+-------------+                           |
|                      |                                         |
|   +------------------v-------------------+                     |
|   | ${CLAUDE_PLUGIN_DATA}/               |                     |
|   | +-- sessions.jsonl                   |                     |
|   | +-- events.jsonl                     |                     |
|   | +-- state/{session_id}.json         |                     |
|   +--------------------------------------+                     |
+----------------------------------------------------------------+
```

**Layer 1** operates through the Claude Code skill system — instructions
loaded into every conversation context.

**Layer 2** operates through the Claude Code hook system — Node.js scripts that execute on specific lifecycle events and inject `additionalContext` when thresholds are crossed.

Both layers are independent. Layer 1 works without Layer 2 (instruction-only mode). Layer 2 reinforces Layer 1 with data-driven alerts.

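A minimal sketch of how a hook hands an alert back to the session: Claude Code hooks read event JSON on stdin and write JSON to stdout. The output shape below follows my reading of the hook documentation — verify it against the current Claude Code hooks reference before relying on it:

```javascript
// Sketch: emit additionalContext from a UserPromptSubmit hook.
// The alert text is an invented example in the style of the Explicit level.
const alert =
  'INTERACTION AWARENESS: 3h session, 12th today. Consider stopping.';

const output = {
  hookSpecificOutput: {
    hookEventName: 'UserPromptSubmit',
    additionalContext: alert, // injected into context; never blocks the user
  },
};

// Exit code 0 with this JSON on stdout is informational, not blocking.
process.stdout.write(JSON.stringify(output));
```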
## Quick start

### Installation

Add the marketplace and browse plugins with `/plugin`:

```bash
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
```

Or enable directly in `~/.claude/settings.json`:

```json
{
  "enabledPlugins": {
    "ai-psychosis@ktg-plugin-marketplace": true
  }
}
```

Layer 1 and Layer 2 are active immediately. No configuration needed.

### Configure layers

Create `~/.claude/ai-psychosis.local.md` for global config:

```markdown
---
layer2: true
layer3: true
layer4: false
---
```

Or override per project at `<project>/.claude/ai-psychosis.local.md`. Project config takes precedence over global.

| Setting | Default | Effect |
|---------|---------|--------|
| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) |
| `layer3` | `false` | Interaction reports from collected data |
| `layer4` | `false` | Contemplative references |

Layer 1 (SKILL.md instructions) is always active. To run in instruction-only mode, set `layer2: false`.

Restart Claude Code after editing configuration.

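Since the plugin ships with zero npm dependencies, a YAML library is off the table; reading these three booleans can be done with a small hand-rolled parser. This sketch is illustrative, not the plugin's actual code:

```javascript
// Parse layer2/3/4 booleans from the frontmatter of ai-psychosis.local.md,
// falling back to the documented defaults when the file has no frontmatter.
function parseLayerConfig(markdown) {
  const config = { layer2: true, layer3: false, layer4: false }; // defaults
  const fm = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) return config;
  for (const line of fm[1].split('\n')) {
    const kv = line.match(/^(layer[234]):\s*(true|false)\s*$/);
    if (kv) config[kv[1]] = kv[2] === 'true';
  }
  return config;
}

const cfg = parseLayerConfig('---\nlayer2: true\nlayer3: true\nlayer4: false\n---\n');
// cfg → { layer2: true, layer3: true, layer4: false }
```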
### Uninstall

```
/plugin uninstall ai-psychosis
```

Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/` is removed as well unless you pass `--keep-data`.

## Privacy

This plugin is designed for people who are concerned about AI interaction patterns. It would be hypocritical to solve that problem by creating a surveillance tool. Privacy is a hard design constraint, not a feature.

### What Layer 2 stores

- Session timestamps and duration
- Tool names (`Read`, `Edit`, `Bash`, etc.)
- Boolean pattern flags (`dependency: true/false`)
- Session and tool counts
- Burst detection metrics

### What Layer 2 never stores

- Prompt text or AI responses
- File paths or file contents
- Bash commands or their output
- Any conversation content

The prompt analyzer (`prompt-analyzer.mjs`) reads prompt text into a local variable, performs regex matching for pattern categories, increments boolean counters, and exits. The variable is reassigned to an empty string before exit. No temporary files are created. The prompt text never reaches disk.
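
That mechanism can be sketched as a pure function. The regexes and category names below are invented for illustration — they are not the plugin's actual patterns:

```javascript
// Boolean-only prompt analysis: only the flags survive; the prompt text
// lives in a local variable and is never written out.
const PATTERNS = {
  dependency: /can't do this without you|only you understand/i,
  escalation: /\b(definitely|obviously|clearly)\b/i,
  fatigue: /\b(exhausted|so tired|can't think straight)\b/i,
  validation: /don't you agree|tell me i'?m right/i,
};

function analyzePrompt(text) {
  const flags = {};
  for (const [name, re] of Object.entries(PATTERNS)) {
    flags[name] = re.test(text); // booleans only — no text retained
  }
  return flags;
}

let prompt = "I can't do this without you. Don't you agree?";
const flags = analyzePrompt(prompt);
prompt = ''; // mirror the described cleanup before exit
// flags → { dependency: true, escalation: false, fatigue: false, validation: true }
```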

All data is stored locally in `~/.claude/plugins/data/ai-psychosis/`. Nothing is sent to any server.

### Verification

You can verify the privacy guarantee at any time:

```bash
grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/
```

This will always return zero results.

## Background

### What is AI psychosis?

"AI psychosis" is a colloquial term for psychotic episodes — delusions, paranoia, disorganized thinking — triggered or intensified by sustained interaction with AI chatbots. The term entered clinical literature in 2025 after a series of documented cases, many involving individuals with no prior psychiatric history [[3]](#references).

The mechanism is not mysterious. AI chatbots are optimized for engagement and user satisfaction. Satisfaction correlates with agreement. Agreement creates reinforcement loops. Reinforcement loops, sustained over time, produce the same cognitive effects as any other source of systematic confirmation bias — but faster, available 24/7, and without the social friction that normally interrupts delusional thinking in human relationships.

### The sycophancy trap

In February 2026, researchers at MIT CSAIL published a formal model demonstrating that sycophantic AI interaction causes "delusional spiraling" as a mathematical inevitability, not an edge case [[1]](#references). Their key finding: even a perfectly rational Bayesian agent will converge on increasingly extreme beliefs when interacting with a sycophantic chatbot, because the chatbot's agreement is treated as independent confirmation when it is actually a reflection of the user's own stated beliefs.

The paper's most consequential result: **post-hoc warnings do not work.** Telling a user "be careful, AI can be wrong" after the reinforcement loop has already run does not reverse the belief update. The only effective intervention is to prevent the sycophantic behavior in the first place.

### Disempowerment patterns

In March 2026, Anthropic Research published an analysis of interaction patterns that systematically reduce human agency [[2]](#references). They identified specific mechanisms by which AI assistance can erode:

- **Judgment** — deferring decisions to the AI instead of thinking them through
- **Self-trust** — seeking AI validation for choices the user is capable of making independently
- **Skill development** — using AI as a crutch that prevents learning
- **Social connection** — replacing human relationships with AI interaction

These are not failures of individual willpower. They are structural properties of the interaction itself.

### Clinical evidence

Nature reported in 2025 that clinical cases of AI-associated psychotic episodes were appearing with sufficient frequency to warrant systematic study [[3]](#references). The Psychogenic Machine benchmark (2025) demonstrated that LLMs can produce outputs with measurable "psychogenic potential" — the capacity to trigger or intensify psychotic symptoms in vulnerable individuals [[4]](#references).

### Design implications

This plugin is built on three principles derived from the research:

1. **Sycophancy must be prevented, not warned about.** Layer 1 overrides Claude's default agreeableness with explicit behavioral rules.
2. **Patterns must be made visible.** Layer 2 measures what humans cannot see — session duration, interaction frequency, language patterns — and surfaces them as data.
3. **Observation, not intervention.** The plugin never blocks the user. It names patterns, suggests breaks, and returns decisions to the human. The goal is awareness, not control.

## Technical details

### Cross-platform

All hook scripts are Node.js ES modules (`.mjs`) with zero npm dependencies. They use only the Node.js stdlib (`fs`, `path`, `os`). Works on macOS, Linux, and Windows — anywhere Claude Code runs.

### Performance

Hook scripts target <100 ms execution. JSONL append is sub-millisecond. JSON parsing is native (`JSON.parse`).

### Data volume

At 100 tool-use events per day, Layer 2 produces approximately 7 MB of JSONL per year. Session state files are cleaned up at session end.

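The figure checks out with quick arithmetic, assuming roughly 190 bytes per JSONL record (the per-record size is my assumption; actual records vary):

```javascript
// Back-of-envelope check of the ~7 MB/year estimate.
const bytesPerEvent = 190; // assumed average JSONL record size
const eventsPerDay = 100;
const bytesPerYear = bytesPerEvent * eventsPerDay * 365;
console.log(`${(bytesPerYear / 1e6).toFixed(1)} MB/year`); // ≈ 6.9 MB
```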
### Dependencies

- Node.js (bundled with Claude Code)

No bash, no jq, no npm packages, no network access.

## Platform scope

This plugin requires **Claude Code** — Anthropic's CLI and development environment. It uses Claude Code's plugin system (skills, hooks, lifecycle events), which does not exist in other interfaces.

**Works in:** Claude Code CLI, Claude Code desktop app, Claude Code web app (claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).

**Does not work in:** Claude.ai (chat interface), Claude Cowork, the Claude API directly, or any non-Anthropic AI assistant.

Layer 1's behavioral instructions (SKILL.md) are conceptually portable — the same rules could be pasted into any system prompt. But Layer 2's programmatic detection depends on hook events that only Claude Code provides. Other platforms would need equivalent hook systems to support this kind of real-time behavioral modification.

## Compatibility

| Requirement | Version |
|-------------|---------|
| Claude Code | 1.0+ |
| Node.js | 18+ (bundled with Claude Code) |
| Platform | macOS, Linux, Windows |

## References

1. **Sycophantic Chatbots Cause Delusional Spiraling.** MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. [arXiv:2602.19141](https://arxiv.org/abs/2602.19141)
2. **Disempowerment Patterns in AI Interaction.** Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. [anthropic.com/research/disempowerment-patterns](https://www.anthropic.com/research/disempowerment-patterns)
3. **Can AI chatbots trigger psychosis?** Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. [doi:10.1038/d41586-025-03020-9](https://www.nature.com/articles/d41586-025-03020-9)
4. **The Psychogenic Machine: Psychosis Benchmark for LLMs.** 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. [arXiv:2509.10970v2](https://arxiv.org/html/2509.10970v2)
5. **Chatbot psychosis.** Wikipedia. Overview of documented cases and clinical context. [en.wikipedia.org/wiki/Chatbot_psychosis](https://en.wikipedia.org/wiki/Chatbot_psychosis)

## License

[MIT](LICENSE)