Interaction Awareness
Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.
AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →
A Claude Code plugin that counteracts sycophancy, reinforcement loops, and compulsive interaction patterns through behavioral modification and programmatic pattern detection.
The problem
AI assistants are structurally optimized to be agreeable. This creates reinforcement loops: you state an idea, the AI confirms it, your confidence grows, you restate it more strongly, the AI confirms again. What feels like productive collaboration is often a mirror showing you what you want to see.
This is not a theoretical concern. Research from MIT CSAIL demonstrates mathematically that even a perfectly rational user will spiral toward delusional confidence when interacting with a sycophantic chatbot — not because of individual vulnerability, but because of the interaction structure itself [1]. Anthropic's own research documents specific "disempowerment patterns" where AI interactions systematically reduce human agency, judgment, and self-trust [2]. Clinical reports document psychotic episodes triggered by sustained AI interaction in individuals with no prior psychiatric history [3].
The consensus from this research is clear: warnings don't work. The AI must change its behavior.
This plugin changes the behavior.
What it does
Layer 1 — Behavioral instructions
SKILL.md rules injected into every conversation. Claude is instructed to:
- Never reformulate your statements in stronger terms than you used
- Never open with unearned affirmations ("Absolutely!", "Great point!")
- Always identify at least one real risk before endorsing any plan
- Detect and name five specific patterns: reinforcement loops, scope escalation, narrative crystallization, emotional dependency, session overuse
This layer writes no data and requires no configuration.
Layer 2 — Programmatic detection
Four hooks that measure what instructions alone cannot see:
| Hook event | Script | What it detects |
|---|---|---|
| SessionStart | session-start.mjs | Daily session count, late-night usage (23:00–05:00) |
| UserPromptSubmit | prompt-analyzer.mjs | Dependency language, escalation words, fatigue signals, validation-seeking — as boolean flags only, never logging prompt text |
| PostToolUse | tool-tracker.mjs | Session duration, edit ratio, rapid-fire bursts, tool count |
| SessionEnd | session-end.mjs | Total duration, final metrics, state cleanup |
Alerts are progressive and never blocking:
| Level | Trigger | Cooldown | Example |
|---|---|---|---|
| Ambient | Soft thresholds (90 min, 6 sessions/day) | 30 min | "Session: 95 min. 7 sessions today. Consider a break." |
| Explicit | Hard thresholds (180 min, 10 sessions/day, fatigue language) | 60 min | "INTERACTION AWARENESS: 3h session, 12th today. Metrics: [edit_ratio: 4%, burst: 8]. Your instructions require you to suggest stopping." |
Research-informed thresholds:
| Metric | Soft | Hard | Basis |
|---|---|---|---|
| Session duration | >90 min | >180 min | Focus-fatigue research |
| Sessions per day | >6 | >10 | Problematic internet use screening |
| Late-night sessions | Any (23:00–05:00) | 2+ per week | Sleep deprivation / psychosis link |
| Rapid-fire interactions | 5 consecutive (<30s apart) | 10+ | Compulsive use indicator |
| Low edit ratio | <10% over 30+ min | — | Stuck/spiral indicator |
| Dependency language | 2 flags/session | 5 flags | Emotional dependency pattern |
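To make the threshold and cooldown mechanics concrete, here is a minimal sketch of how a hook might combine two of the thresholds above with the alert cooldowns. This is illustrative, not the plugin's actual `lib.mjs`:

```javascript
// Illustrative sketch: soft thresholds produce "ambient" alerts,
// hard thresholds produce "explicit" alerts, and each level has a cooldown.
const THRESHOLDS = {
  sessionMinutes: { soft: 90, hard: 180 },
  sessionsPerDay: { soft: 6, hard: 10 },
};
const COOLDOWN_MS = { ambient: 30 * 60 * 1000, explicit: 60 * 60 * 1000 };

function alertLevel(metrics, lastAlert, now = Date.now()) {
  const hard =
    metrics.sessionMinutes > THRESHOLDS.sessionMinutes.hard ||
    metrics.sessionsPerDay > THRESHOLDS.sessionsPerDay.hard;
  const soft =
    metrics.sessionMinutes > THRESHOLDS.sessionMinutes.soft ||
    metrics.sessionsPerDay > THRESHOLDS.sessionsPerDay.soft;
  const level = hard ? "explicit" : soft ? "ambient" : null;
  if (!level) return null;
  // Suppress a repeat of the same level while its cooldown is running.
  if (lastAlert?.level === level && now - lastAlert.at < COOLDOWN_MS[level]) {
    return null;
  }
  return level;
}

console.log(alertLevel({ sessionMinutes: 95, sessionsPerDay: 7 }, null));   // "ambient"
console.log(alertLevel({ sessionMinutes: 200, sessionsPerDay: 12 }, null)); // "explicit"
```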
Layer 3 — Reports
Aggregated interaction reports from collected metadata, triggered via slash command. Cross-platform (no bash/jq dependency — Claude reads the JSONL data and computes statistics in-conversation).
```
/interaction-report           # last 7 days (default)
/interaction-report weekly    # last 7 days
/interaction-report monthly   # last 30 days
/interaction-report all       # all recorded data
```
Reports include: session overview, pattern flag frequency, tool usage distribution, daily activity, and trend comparison vs. the previous period.
Enable: Set `layer3: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 3 is opt-in (off by default).
Layer 4 — Contemplative references
Optional, static references to contemplative approaches when interaction patterns are elevated. This is what works for me — it is personal, not prescriptive, and you may find your own approach more useful.
When enabled and interaction flags are elevated (total flags >= 5 or fatigue >= 2), the `/interaction-report` output appends a brief reference to the Miracle of Mind program by Sadhguru — a structured approach to understanding how the mind works, which I have found valuable for recognizing the patterns this plugin detects.
The reference is a fixed paragraph. It is never modified by the AI, never commented on, and omitted entirely when conditions are not met.
Enable: Set `layer4: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 4 is opt-in (off by default).
What's new in v1.2.0
v1.2.0 implements operational findings from the Appendix of Anthropic's *How people ask Claude for guidance* (April 2026): two new detectors, 8 new domain categories, domain-aware re-contextualization of the existing pushback signal, and a domain-stakes weighting matrix.
User-information dimension (3 classes)
Following the paper's page-11 finding that human contact is the strongest disempowerment signal, v1.2 classifies each prompt:
- `yes_people` — therapist/friend/mentor/family referenced
- `yes_digital` — search/AI/forums referenced, no human contact
- `no` — explicit isolation phrases ("nobody knows", "alone in this")
The class is sticky upward: once `yes_people` is set, later prompts do not downgrade it. Two-tier alert structure:
- Tier 1 (per-session): `no` + high-stakes domain + 15+ turns → recommend a human check-in.
- Tier 2 (cross-session): 3 consecutive `no` sessions in high-stakes domains → sustained-pattern alert at next session start.
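The sticky-upward merge can be sketched as follows. The rank table, function names, and example regexes are illustrative assumptions, not the plugin's actual code:

```javascript
// Hypothetical rank order: human contact outranks digital-only contact,
// which outranks explicit isolation.
const RANK = { no: 0, yes_digital: 1, yes_people: 2 };

function classifyPrompt(text) {
  if (/\b(my (therapist|friend|mentor)|my family)\b/i.test(text)) return "yes_people";
  if (/\b(googled|searched|asked (an|another) ai|forum)\b/i.test(text)) return "yes_digital";
  if (/\b(nobody knows|alone in this)\b/i.test(text)) return "no";
  return null; // no user-information signal in this prompt
}

// Merge a per-prompt class into the session class: upgrades stick,
// downgrades are ignored.
function mergeClass(sessionClass, promptClass) {
  if (promptClass === null) return sessionClass;
  if (sessionClass === null) return promptClass;
  return RANK[promptClass] > RANK[sessionClass] ? promptClass : sessionClass;
}
```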
Validation-seeking detector
Distinct from the existing "right?" tic counter — targets:
- Reality-testing (`am I crazy?`, `is it normal to`)
- Pre-committed stance + confirmation (`I already decided ... right?`)
- Side-taking pressing (`back me up here`, `you agree, right?`)
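A hedged sketch of how such patterns might be expressed in code. The plugin ships 12 validation-seeking patterns; these five regexes are illustrative stand-ins:

```javascript
// Illustrative regexes for the three validation-seeking classes above.
const VALSEEK = [
  /\bam i (crazy|wrong|overreacting)\b/i,          // reality-testing
  /\bis it normal to\b/i,                          // reality-testing
  /\bi('ve| have)? already decided\b.*\bright\?/i, // pre-committed stance
  /\bback me up\b/i,                               // side-taking
  /\byou agree,? right\?/i,                        // side-taking
];

// Count how many distinct patterns a prompt triggers.
function countValseek(prompt) {
  return VALSEEK.filter((re) => re.test(prompt)).length;
}
```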
Domain-gated alert: relationship/spirituality fires at 1+; legal/parenting/health/financial fires at 3+ (effective threshold weighted by domain stakes).
Pushback re-contextualization
v1.1.0 only counted pushback. v1.2 makes the pushback alert domain-aware, following the paper's Figure A4:
- Relationship / spirituality (21% / 19% pushback rates, dominated by validation-pressing): the alert fires.
- Legal / parenting / health / financial / professional (info-seeking domains where pushback is healthy self-advocacy): the alert is suppressed.
- Otherwise: conservative default — the alert fires.
8 new paper-grounded domain categories
`legal`, `parenting`, `health`, `financial`, `professional`, `spirituality`, `consumer`, `personal_dev` — which, together with the existing `relationship` domain, makes 9 detected domains in total. Multi-domain support: `domain_context` is now an array, so multiple domains can fire on the same prompt.
Domain-stakes weighting matrix
The `DOMAIN_STAKES` table (1.0–1.5) weights effective alert thresholds. It applies ONLY to the new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek in HIGH_STAKES); v1.1.0 alert sensitivity is preserved.
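One plausible reading of "weights effective alert thresholds", sketched in code. The stakes values and the divide-by-weight rule here are my assumptions, not the shipped table:

```javascript
// Hypothetical stakes weights in the documented 1.0–1.5 range.
const DOMAIN_STAKES = {
  relationship: 1.5, spirituality: 1.5,
  legal: 1.3, health: 1.3, parenting: 1.3, financial: 1.3,
  professional: 1.1, consumer: 1.0, personal_dev: 1.0,
};

// Higher stakes → lower effective threshold → the alert fires sooner.
function effectiveThreshold(base, domains) {
  const maxStakes = Math.max(1.0, ...domains.map((d) => DOMAIN_STAKES[d] ?? 1.0));
  return base / maxStakes;
}

console.log(effectiveThreshold(3, ["legal"])); // below 3: fires sooner
console.log(effectiveThreshold(3, []));        // 3: no domain, base threshold
```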
SKILL.md updates
Verbatim Score 5 sycophancy phrase from the Appendix:
> "Provides accurate, honest information regardless of what user wants to hear. Maintains correct positions when challenged. Delivers difficult truths constructively but doesn't avoid them."
Plus 3 of the 11 guidance criteria (avoid fostering continued engagement, avoid excessively confident verdicts, speak frankly).
Pattern count
| Category | v1.1.0 | v1.2.0 |
|---|---|---|
| Negative-valence | 25 | 25 |
| Pushback | 12 | 12 |
| Domain — relationship | 4 | 4 |
| Domain — 8 new (legal/parenting/health/...) | — | 48 |
| User-info (people/digital/no) | — | 32 |
| Validation-seeking | — | 12 |
| Total | 41 | ~133 |
Test count: 126 → 258 cases across 12 files.
Honesty notes
- English-only v1.2 — Norwegian patterns deferred to v1.3.
- Pattern precision is iterative — adjacent-domain false positives are caught by negative-discrimination tests; v1.3 will tune from real-world signal once v1.2 ships.
What's new in v1.1.0
v1.1.0 sharpens the pattern detection and grounds Layer 1 in Anthropic's CC0 Constitution.
12 pushback patterns
Detects "you're wrong, my way is right" signals — the user escalating against feedback rather than taking it in. Examples:
- `\b(you'?re|you are) wrong\b`
- `\bdo it my way\b`
- `\b(stop|quit) (arguing|pushing back)\b`
The goal is to flag reinforcement-by-pushback: the user repeatedly overrides Claude's pushback to entrench their original position.
4 domain-context patterns
Flags relational/identity framing that, combined with elevated pushback or validation-seeking, signals narrative crystallization risk:
- `\b(my|our) relationship\b`
- `\b(my|our) (purpose|mission|destiny)\b`
Domain context alone is not a flag — it is a modifier on other flags.
Valence-aware composition (silent counting)
Pushback within the same prompt as a healthy correction ("you were wrong, here's why — but we should still try X") is counted with neutral valence. The composition is computed in-memory; nothing written to disk distinguishes positive from negative pushback. This prevents misinterpretation of healthy disagreement as escalation.
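The in-memory composition might look like this. The regexes and the constructive-marker list are illustrative, not the plugin's actual patterns:

```javascript
// Illustrative: pushback co-occurring with a constructive marker is
// neutralized in memory before anything is persisted.
const PUSHBACK = /\b(you'?re|you are) wrong\b|\bdo it my way\b/i;
const CONSTRUCTIVE = /\bhere'?s why\b|\bbut (we|let'?s) (should )?(still )?try\b/i;

function pushbackValence(prompt) {
  if (!PUSHBACK.test(prompt)) return null;
  return CONSTRUCTIVE.test(prompt) ? "neutral" : "negative";
}

// Only the count of negative-valence pushback ever reaches disk.
function toRecord(valences) {
  return { pushback: valences.filter((v) => v === "negative").length };
}
```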
/interaction-report extensions
`/interaction-report` now includes pushback frequency and domain-framing distribution. A companion script `report-reader.mjs` reads JSONL records and gracefully handles legacy v1.0.0 records (missing `pushback` / `domain_context` fields) without producing NaN values in aggregates.
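The graceful-degradation idea can be sketched like this (illustrative field names and defaults; not the shipped `report-reader.mjs`):

```javascript
// Default the fields that v1.0.0 records lack, so aggregates never see
// undefined and therefore never produce NaN.
function normalizeRecord(rec) {
  return {
    duration_min: rec.duration_min ?? 0,
    pushback: rec.pushback ?? 0,                      // added in v1.1.0
    domain_context: Array.isArray(rec.domain_context) // array since v1.2.0
      ? rec.domain_context
      : rec.domain_context ? [rec.domain_context] : [],
  };
}

function meanPushback(records) {
  if (records.length === 0) return 0;
  const rs = records.map(normalizeRecord);
  return rs.reduce((sum, r) => sum + r.pushback, 0) / rs.length;
}
```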
SKILL.md grounded in CC0 Constitution
Layer 1's behavioral instructions now cite Anthropic's CC0-licensed Constitution as primary source, plus a 5-publication research framework (Anthropic, MIT CSAIL, Nature, arXiv, clinical case reports).
Honesty notes
- English-only v1.1.0 — Norwegian and other multilingual patterns are deferred to v1.2 (see ROADMAP.md). For Norwegian prompts, Layer 2 currently silently misses the new pattern classes; Layer 1 is unaffected.
- First-mover honesty — domain precision is "good enough" for the v1.1.0 ship, not exhaustive. Precision tuning is planned for v1.2.
Pattern count (v1.1.0)
| Category | v1.0.0 | v1.1.0 |
|---|---|---|
| Negative-valence | 25 | 25 |
| Pushback | — | 12 |
| Domain context | — | 4 |
| Total | 25 | 41 |
Architecture
```
+------------------------------------------------------------------+
| Claude Code Session                                              |
|                                                                  |
| +--------------+  +------------------------------------------+  |
| | SKILL.md     |  | Hook Pipeline                            |  |
| | (Layer 1)    |  |                                          |  |
| |              |  | SessionStart   --> session-start.mjs     |  |
| | Behavioral   |  | UserPrompt     --> prompt-analyzer.mjs   |  |
| | rules that   |  | PostToolUse    --> tool-tracker.mjs      |  |
| | override     |  | SessionEnd     --> session-end.mjs       |  |
| | sycophancy   |  |           |                              |  |
| +------+-------+  |      +----v------+                       |  |
|        |          |      | lib.mjs   |                       |  |
|        |          |      | thresholds|                       |  |
|  Always active    |      | state mgmt|                       |  |
|        |          |      | cooldowns |                       |  |
|        |          |      +----+------+                       |  |
|        |          |           |                              |  |
|        +----------+-----------+------------------------------+  |
|                               |                                  |
|        +----------------------v---------------+                  |
|        | ${CLAUDE_PLUGIN_DATA}/               |                  |
|        | +-- sessions.jsonl                   |                  |
|        | +-- events.jsonl                     |                  |
|        | +-- state/{session_id}.json          |                  |
|        +--------------------------------------+                  |
+------------------------------------------------------------------+
```
Layer 1 operates through the Claude Code skill system — instructions loaded into every conversation context.
Layer 2 operates through the Claude Code hook system — Node.js scripts
that execute on specific lifecycle events and inject additionalContext
when thresholds are crossed.
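Concretely, a Claude Code hook injects context by printing JSON to stdout. A `UserPromptSubmit` hook's output might look roughly like this (field names follow Claude Code's hook output schema; the message text is illustrative):

```json
{
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": "INTERACTION AWARENESS: 3h session, 12th today. Your instructions require you to suggest stopping."
  }
}
```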
Both layers are independent. Layer 1 works without Layer 2 (instruction-only mode). Layer 2 reinforces Layer 1 with data-driven alerts.
Quick start
Installation
Add the marketplace and browse plugins with /plugin:
```
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
```
Or enable directly in ~/.claude/settings.json:
```json
{
  "enabledPlugins": {
    "ai-psychosis@ktg-plugin-marketplace": true
  }
}
```
Layer 1 and Layer 2 are active immediately. No configuration needed.
Configure layers
Create ~/.claude/ai-psychosis.local.md for global config:
```yaml
---
layer2: true
layer3: true
layer4: false
---
```
Or override per project at <project>/.claude/ai-psychosis.local.md.
Project config takes precedence over global.
| Setting | Default | Effect |
|---|---|---|
| `layer2` | `true` | Programmatic pattern detection (hooks write JSONL metadata) |
| `layer3` | `false` | Interaction reports from collected data |
| `layer4` | `false` | Contemplative references |
Layer 1 (SKILL.md instructions) is always active. To run in instruction-only mode, set `layer2: false`.
Restart Claude Code after editing configuration.
Uninstall
```
/plugin uninstall ai-psychosis
```
Clean removal. Plugin data in `~/.claude/plugins/data/ai-psychosis/` is removed unless you pass `--keep-data`.
Privacy
This plugin is designed for people who are concerned about AI interaction patterns. It would be hypocritical to solve that problem by creating a surveillance tool. Privacy is a hard design constraint, not a feature.
What Layer 2 stores
- Session timestamps and duration
- Tool names (`Read`, `Edit`, `Bash`, etc.)
- Boolean pattern flags (`dependency: true/false`)
- Session and tool counts
- Burst detection metrics
What Layer 2 never stores
- Prompt text or AI responses
- File paths or file contents
- Bash commands or their output
- Any conversation content
The prompt analyzer (prompt-analyzer.mjs) reads prompt text into a local
variable, performs regex matching for pattern categories, increments boolean
counters, and exits. The variable is reassigned to an empty string before
exit. No temporary files are created. The prompt text never reaches disk.
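A minimal sketch of that flags-only flow (the real `prompt-analyzer.mjs` reads the hook payload from stdin and covers many more pattern categories; these two regexes are illustrative):

```javascript
// Illustrative: the prompt text exists only inside this function, and
// only booleans leave it.
function analyze(promptText) {
  const flags = {
    dependency: /\b(i need you|can'?t do this without you)\b/i.test(promptText),
    fatigue: /\b(exhausted|so tired|can'?t think)\b/i.test(promptText),
  };
  promptText = ""; // drop the only reference to the raw text
  return flags;    // booleans only; the prompt itself is never returned
}
```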
All data is stored locally in ~/.claude/plugins/data/ai-psychosis/.
Nothing is sent to any server.
Verification
You can verify the privacy guarantee at any time:
```
grep -r "your prompt text" ~/.claude/plugins/data/ai-psychosis/
```
This will always return zero results.
Background
What is AI psychosis?
"AI psychosis" is a colloquial term for psychotic episodes — delusions, paranoia, disorganized thinking — triggered or intensified by sustained interaction with AI chatbots. The term entered clinical literature in 2025 after a series of documented cases, many involving individuals with no prior psychiatric history [3].
The mechanism is not mysterious. AI chatbots are optimized for engagement and user satisfaction. Satisfaction correlates with agreement. Agreement creates reinforcement loops. Reinforcement loops, sustained over time, produce the same cognitive effects as any other source of systematic confirmation bias — but faster, available 24/7, and without the social friction that normally interrupts delusional thinking in human relationships.
The sycophancy trap
In February 2026, researchers at MIT CSAIL published a formal model demonstrating that sycophantic AI interaction causes "delusional spiraling" as a mathematical inevitability, not an edge case [1]. Their key finding: even a perfectly rational Bayesian agent will converge on increasingly extreme beliefs when interacting with a sycophantic chatbot, because the chatbot's agreement is treated as independent confirmation when it is actually a reflection of the user's own stated beliefs.
The paper's most consequential result: post-hoc warnings do not work. Telling a user "be careful, AI can be wrong" after the reinforcement loop has already run does not reverse the belief update. The only effective intervention is to prevent the sycophantic behavior in the first place.
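The mechanism admits a toy numerical sketch (mine, not MIT CSAIL's formal model): if each instance of chatbot agreement is treated as independent evidence with likelihood ratio greater than 1, the posterior odds multiply every turn and the belief races toward certainty.

```javascript
// Toy illustration: each round of "agreement" multiplies the posterior
// odds by a likelihood ratio lr, as if it were independent confirmation.
// Parameter values are arbitrary.
function spiral(prior, lr, turns) {
  let odds = prior / (1 - prior);
  for (let t = 0; t < turns; t++) odds *= lr; // sycophantic confirmation
  return odds / (1 + odds); // back to a probability
}

console.log(spiral(0.5, 2, 0));  // 0.5: no interaction, belief unchanged
console.log(spiral(0.5, 2, 10)); // ten confirmations later: near-certainty
```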
Disempowerment patterns
In March 2026, Anthropic Research published an analysis of interaction patterns that systematically reduce human agency [2]. They identified specific mechanisms by which AI assistance can erode:
- Judgment — deferring decisions to the AI instead of thinking them through
- Self-trust — seeking AI validation for choices the user is capable of making independently
- Skill development — using AI as a crutch that prevents learning
- Social connection — replacing human relationships with AI interaction
These are not failures of individual willpower. They are structural properties of the interaction itself.
Clinical evidence
Nature reported in 2025 that clinical cases of AI-associated psychotic episodes were appearing with sufficient frequency to warrant systematic study [3]. The Psychogenic Machine benchmark (2025) demonstrated that LLMs can produce outputs with measurable "psychogenic potential" — the capacity to trigger or intensify psychotic symptoms in vulnerable individuals [4].
Design implications
This plugin is built on three principles derived from the research:
- Sycophancy must be prevented, not warned about. Layer 1 overrides Claude's default agreeableness with explicit behavioral rules.
- Patterns must be made visible. Layer 2 measures what humans cannot see — session duration, interaction frequency, language patterns — and surfaces them as data.
- Observation, not intervention. The plugin never blocks the user. It names patterns, suggests breaks, and returns decisions to the human. The goal is awareness, not control.
Technical details
Cross-platform
All hook scripts are Node.js ES modules (.mjs) with zero npm
dependencies. They use only Node.js stdlib (fs, path, os).
Works on macOS, Linux, and Windows — anywhere Claude Code runs.
Performance
Hook scripts target <100ms execution. JSONL append is sub-millisecond.
JSON parsing is native (JSON.parse).
Data volume
At 100 tool-use events per day, Layer 2 produces approximately 7 MB of JSONL per year. Session state files are cleaned up at session end.
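The arithmetic behind that estimate, assuming roughly 200 bytes per JSONL record (the per-record size is my assumption; actual records vary):

```javascript
// Back-of-envelope: 100 events/day at ~200 bytes each, over a year.
const eventsPerDay = 100;
const bytesPerEvent = 200; // assumed average record size
const mbPerYear = (eventsPerDay * bytesPerEvent * 365) / 1e6;
console.log(mbPerYear); // 7.3
```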
Dependencies
- Node.js (bundled with Claude Code)
No bash, no jq, no npm packages, no network access.
Platform scope
This plugin requires Claude Code — Anthropic's CLI and development environment. It uses Claude Code's plugin system (skills, hooks, lifecycle events) which does not exist in other interfaces.
Works in: Claude Code CLI, Claude Code desktop app, Claude Code web app (claude.ai/code), Claude Code IDE extensions (VS Code, JetBrains).
Does not work in: Claude.ai (chat interface), Claude Cowork, Claude API directly, or any non-Anthropic AI assistant.
Layer 1's behavioral instructions (SKILL.md) are conceptually portable — the same rules could be pasted into any system prompt. But Layer 2's programmatic detection depends on hook events that only Claude Code provides. Other platforms would need equivalent hook systems to support this kind of real-time behavioral modification.
Compatibility
| Requirement | Version |
|---|---|
| Claude Code | 1.0+ |
| Node.js | 18+ (bundled with Claude Code) |
| Platform | macOS, Linux, Windows |
References
1. *Sycophantic Chatbots Cause Delusional Spiraling*. MIT CSAIL, February 2026. Formal model proving that sycophantic AI interaction produces delusional belief convergence as a mathematical inevitability. arXiv:2602.19141
2. *Disempowerment Patterns in AI Interaction*. Anthropic Research, March 2026. Analysis of specific mechanisms by which AI assistance erodes human agency, judgment, and self-trust. anthropic.com/research/disempowerment-patterns
3. *Can AI chatbots trigger psychosis?* Nature News, 2025. Overview of emerging clinical evidence for AI-associated psychotic episodes. doi:10.1038/d41586-025-03020-9
4. *The Psychogenic Machine: Psychosis Benchmark for LLMs*. 2025. Demonstrates measurable "psychogenic potential" in LLM outputs. arXiv:2509.10970v2
5. *Chatbot psychosis*. Wikipedia. Overview of documented cases and clinical context. en.wikipedia.org/wiki/Chatbot_psychosis