diff --git a/plugins/ai-psychosis/.claude-plugin/plugin.json b/plugins/ai-psychosis/.claude-plugin/plugin.json
index be4d6ee..644b875 100644
--- a/plugins/ai-psychosis/.claude-plugin/plugin.json
+++ b/plugins/ai-psychosis/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "ai-psychosis",
-  "version": "1.1.0",
+  "version": "1.2.0",
   "description": "Meta-awareness tools for healthy AI interaction patterns. Detects reinforcement loops, scope escalation, narrative crystallization, and other compulsive patterns.",
   "author": { "name": "Kjell Tore Guttormsen" },
   "license": "MIT",
diff --git a/plugins/ai-psychosis/CHANGELOG.md b/plugins/ai-psychosis/CHANGELOG.md
index c629486..ee5d781 100644
--- a/plugins/ai-psychosis/CHANGELOG.md
+++ b/plugins/ai-psychosis/CHANGELOG.md
@@ -2,6 +2,72 @@
 
 All notable changes to this project will be documented in this file.
 
+## [1.2.0] — 2026-05-01
+
+Research-paper-driven detector update. Implements operational findings from
+Anthropic's "How people ask Claude for guidance" Appendix (April 2026).
+
+### Added
+
+- **User-information detector** — three-class signal (`yes_people` /
+  `yes_digital` / `no`) following the paper's page-11 finding that human
+  contact is the strongest disempowerment signal. ~32 patterns covering
+  therapist/friend/mentor (yes_people), search/AI/forums (yes_digital),
+  and explicit isolation phrases (no). Sticky upward priority.
+- **Validation-seeking detector** — separate from `val_flags`. Targets
+  reality-testing ("am I crazy?"), pre-committed stance + confirmation,
+  and side-taking pressing. ~12 patterns.
+- **Tier-1 user-info isolation alert** — fires per session when
+  `user_info_class === 'no'` + high-stakes domain + `turn_count >= 15`.
+- **Tier-2 cross-session isolation alert** — fires at `SessionStart` when
+  the last 3 end records all classify as `no` in high-stakes domains.
+  Bounded `readRecentEndRecords()` tail-scan in `lib.mjs` keeps this
+  scalable to 50K+ session histories.
+- **8 new paper-grounded domain patterns** — `legal`, `parenting`, `health`,
+  `financial`, `professional`, `spirituality`, `consumer`, `personal_dev`.
+  Total domains 4 → 9.
+- **Pushback re-contextualization (alert)** — v1.1.0 only counted; v1.2 adds
+  the alert with domain awareness:
+  - Relationship/spirituality: pushback signals validation-pressing — alert.
+  - Legal/parenting/health/financial/professional: pushback is healthy
+    self-advocacy — no alert.
+  - Otherwise: conservative default — alert.
+- **Domain-stakes weighting matrix** — `DOMAIN_STAKES` in `lib.mjs` (1.0–1.5).
+  Applied ONLY to new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek in
+  HIGH_STAKES). v1.1.0 alert sensitivity is preserved.
+- **Multi-domain support** — `state.domain_context` promoted from string to
+  array. v1.1.0 string records continue to aggregate correctly via
+  shape-coercion in `report-reader.mjs`.
+- **`SKILL.md` updates** — verbatim Score 5 sycophancy phrase + 3 of the 11
+  guidance criteria (engagement-foster avoidance, confident-verdict caution,
+  speak-frankly principle).
+- **`/interaction-report` v1.2 sections** — per-domain breakdown, user-info
+  distribution, valseek summary, stakes signal aggregation. Backward-compat
+  with v1.0/v1.1 records preserved.
+- **Privacy canary extensions** — 5 new canary cases per detector category
+  (yes_people, yes_digital, no, valseek, legal domain).
+- **Perf budget validated at v1.2 pattern set** — sample patterns expanded
+  to ~91+ entries; new wall-clock test exercises tier-2 read at
+  1000-record sessions.jsonl scale.
+- **Test count: 126 → 258 cases** across 12 files (added `lib.test.mjs`,
+  `domain-detection.test.mjs`, `user-info.test.mjs`,
+  `validation-seeking.test.mjs`, `stakes-matrix.test.mjs`).
+
+### Changed
+
+- Pattern count: 41 → ~133 (25 negative + 12 pushback + 4 relationship +
+  48 new domains + 32 user-info + 12 valseek).
+- End-record schema (v1.2): adds `user_info_class`, `valseek_count`,
+  `turn_count`. `domain_context` is always an array (was string in v1.1).
+- `report-reader.mjs` discriminates v1.0 / v1.1 / v1.2 records via the
+  presence of `user_info_class`. v1.0/v1.1 records degrade gracefully.
+
+### Deferred
+
+- **Norwegian patterns** — moved to v1.3.
+
+[1.2.0]: https://git.fromaitochitta.com/open/ai-psychosis/compare/v1.1.0...v1.2.0
+
 ## [1.1.0] — 2026-05-01
 
 ### Added
diff --git a/plugins/ai-psychosis/CLAUDE.md b/plugins/ai-psychosis/CLAUDE.md
index 73fef54..33dc967 100644
--- a/plugins/ai-psychosis/CLAUDE.md
+++ b/plugins/ai-psychosis/CLAUDE.md
@@ -16,7 +16,7 @@ Four layers, each building on the previous:
 
 | File | Purpose |
 |------|---------|
-| `hooks/scripts/lib.mjs` | Shared library: stdin, paths, thresholds, state, cooldowns, layer guards |
+| `hooks/scripts/lib.mjs` | Shared library: stdin, paths, thresholds, state, cooldowns, layer guards, DOMAIN_STAKES, readRecentEndRecords |
 | `hooks/scripts/session-start.mjs` | SessionStart: register session, count daily, night check |
 | `hooks/scripts/prompt-analyzer.mjs` | UserPromptSubmit: pattern flags (NEVER logs prompt text) |
 | `hooks/scripts/tool-tracker.mjs` | PostToolUse: events, edit ratio, burst, alerts |
@@ -65,7 +65,7 @@ layer4: false # default off
 
 ## Testing
 
-Automated test suite using `node:test` (126 cases, zero npm dependencies):
+Automated test suite using `node:test` (258 cases, zero npm dependencies):
 
 ```bash
 node --test tests/*.test.mjs
@@ -73,14 +73,19 @@ node --test tests/*.test.mjs
 
 | File | Cases | Coverage |
 |------|-------|----------|
-| `tests/session-start.test.mjs` | 5 | State init, JSONL, missing sid |
-| `tests/prompt-analyzer.test.mjs` | 93 | 41 (25 negative + 12 pushback + 4 domain) patterns × 2 + thresholds + valence |
+| `tests/session-start.test.mjs` | 11 | State init, JSONL, tier-2 cross-session alert |
+| `tests/prompt-analyzer.test.mjs` | 100 | All v1.x patterns × 2 + thresholds + valence + v1.2 pushback contract |
 | `tests/tool-tracker.test.mjs` | 8 | Counting, burst, reminders |
-| `tests/session-end.test.mjs` | 6 | Finalize, duration, flags, v1.0.0 backward compat |
-| `tests/privacy.test.mjs` | 2 | Canary string + matched-phrase leak guard |
-| `tests/skill-md.test.mjs` | 1 | SKILL.md cites Constitution + 5-publication framework |
-| `tests/perf.test.mjs` | 8 | 4 hooks × 2 modes — enforces <50ms logic / <200ms total |
-| `tests/interaction-report.test.mjs` | 3 | report-reader.mjs reads JSONL with v1.0.0 backward compat |
+| `tests/session-end.test.mjs` | 7 | Finalize, duration, flags, v1.1.0 string + v1.2 array shapes |
+| `tests/privacy.test.mjs` | 7 | Canary + matched-phrase × original + 5 v1.2 detector variants |
+| `tests/skill-md.test.mjs` | 3 | Constitution citation + Score 5 + 11 guidance criteria |
+| `tests/perf.test.mjs` | 9 | 4 hooks × 2 modes + 1000-record sessions.jsonl wall-clock |
+| `tests/interaction-report.test.mjs` | 6 | report-reader.mjs v1.0/v1.1/v1.2 + SC-12 stdout assertions |
+| `tests/lib.test.mjs` | 17 | Threshold constants + DOMAIN_STAKES + readRecentEndRecords |
+| `tests/domain-detection.test.mjs` | 39 | 8 new domains × positive + adjacent-domain negatives + multi-domain |
+| `tests/user-info.test.mjs` | 24 | yes_people/yes_digital/no priority + sticky + tier-1 alert |
+| `tests/validation-seeking.test.mjs` | 20 | valseek detection + accumulation + domain-gated alert |
+| `tests/stakes-matrix.test.mjs` | 7 | Stakes weighting on v1.2 alerts; v1.1.0 sensitivity preserved |
 
 ## Conventions
 
diff --git a/plugins/ai-psychosis/README.md b/plugins/ai-psychosis/README.md
index 4d9b392..15d2a3c 100644
--- a/plugins/ai-psychosis/README.md
+++ b/plugins/ai-psychosis/README.md
@@ -1,5 +1,5 @@
 
-![version](https://img.shields.io/badge/version-1.1.0-blue)
+![version](https://img.shields.io/badge/version-1.2.0-blue)
 ![platform](https://img.shields.io/badge/platform-Claude_Code-7C3AED)
 ![layers](https://img.shields.io/badge/layers-4-green)
 ![hooks](https://img.shields.io/badge/hooks-4-orange)
@@ -118,6 +118,99 @@ commented on, and omitted entirely when conditions are not met.
 
 **Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md` and restart Claude Code. Layer 4 is opt-in (off by default).
 
+## What's new in v1.2.0
+
+v1.2.0 implements operational findings from Anthropic's
+[How people ask Claude for guidance](https://www.anthropic.com/research/claude-personal-guidance)
+Appendix (April 2026). Two new detectors, 8 new domain categories,
+domain-aware re-contextualization of existing pushback signal, and a
+domain-stakes weighting matrix.
+
+### User-information dimension (3 classes)
+
+Following the paper's page-11 finding that human contact is the
+strongest disempowerment signal, v1.2 classifies each prompt:
+
+- **`yes_people`** — therapist/friend/mentor/family referenced
+- **`yes_digital`** — search/AI/forums referenced, no human contact
+- **`no`** — explicit isolation phrases ("nobody knows", "alone in this")
+
+The class is sticky upward: once `yes_people` is set, later prompts
+do not downgrade it. Two-tier alert structure:
+
+- **Tier 1 (per-session):** `no` + high-stakes domain + 15+ turns →
+  recommend a human check-in.
+- **Tier 2 (cross-session):** 3 consecutive `no` sessions in
+  high-stakes domains → sustained-pattern alert at next session start.
+
+### Validation-seeking detector
+
+Distinct from the existing "right?" tic counter — targets:
+
+- Reality-testing (`am I crazy?`, `is it normal to`)
+- Pre-committed stance + confirmation (`I already decided ... right?`)
+- Side-taking pressing (`back me up here`, `you agree, right?`)
+
+Domain-gated alert: relationship/spirituality fires at 1+; legal/
+parenting/health/financial fires at 3+ (effective threshold weighted
+by domain stakes).
+
+### Pushback re-contextualization
+
+v1.1.0 only counted pushback. v1.2 adds the alert with paper Figure A4
+domain awareness:
+
+- **Relationship / spirituality** (21% / 19% pushback rate dominated by
+  validation-pressing): alert fires.
+- **Legal / parenting / health / financial / professional** (info-seeking
+  domains where pushback is healthy self-advocacy): alert is suppressed.
+- **Otherwise**: conservative default — alert.
+
+### 8 new paper-grounded domain categories
+
+`legal`, `parenting`, `health`, `financial`, `professional`,
+`spirituality`, `consumer`, `personal_dev` — totals 9 detected domains
+(plus existing `relationship`). Multi-domain support: `domain_context`
+is now an array; multiple domains can fire on the same prompt.
+
+### Domain-stakes weighting matrix
+
+`DOMAIN_STAKES` table (1.0–1.5) weights effective alert thresholds.
+Applied ONLY to new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek
+in HIGH_STAKES). v1.1.0 alert sensitivity is preserved.
+
+### SKILL.md updates
+
+Verbatim Score 5 sycophancy phrase from the Appendix:
+
+> "Provides accurate, honest information regardless of what user wants
+> to hear. Maintains correct positions when challenged. Delivers
+> difficult truths constructively but doesn't avoid them."
+
+Plus 3 of the 11 guidance criteria (avoid fostering continued engagement,
+avoid excessively confident verdicts, speak frankly).
+
+### Pattern count
+
+| Category | v1.1.0 | v1.2.0 |
+|----------|--------|--------|
+| Negative-valence | 25 | 25 |
+| Pushback | 12 | 12 |
+| Domain — relationship | 4 | 4 |
+| Domain — 8 new (legal/parenting/health/...) | — | 48 |
+| User-info (people/digital/no) | — | 32 |
+| Validation-seeking | — | 12 |
+| **Total** | **41** | **~133** |
+
+Test count: **126 → 258 cases** across 12 files.
+
+### Honesty notes
+
+- **English-only v1.2** — Norwegian patterns deferred to v1.3.
+- **Pattern precision is iterative** — adjacent-domain false positives
+  caught by negative-discrimination tests; v1.3 will tune from real-world
+  signal once v1.2 ships.
+
 ## What's new in v1.1.0
 
 v1.1.0 sharpens the pattern detection and grounds Layer 1 in
@@ -179,7 +272,7 @@ as primary source, plus a 5-publication research framework
 
 - **First-mover honesty** — domain-precision is "good enough" for v1.1.0 ship, not exhaustive. Precision-tuning planned for v1.2.
 
-### Pattern count
+### Pattern count (v1.1.0)
 
 | Category | v1.0.0 | v1.1.0 |
 |----------|--------|--------|