chore(ai-psychosis): release v1.2.0
parent 0075fe089b
commit 339abc521e
4 changed files with 176 additions and 12 deletions
|
|
@@ -1,6 +1,6 @@
|
|||
{
|
||||
"name": "ai-psychosis",
|
-"version": "1.1.0",
+"version": "1.2.0",
|
||||
"description": "Meta-awareness tools for healthy AI interaction patterns. Detects reinforcement loops, scope escalation, narrative crystallization, and other compulsive patterns.",
|
||||
"author": { "name": "Kjell Tore Guttormsen" },
|
||||
"license": "MIT",
|
||||
|
|
|
|||
|
|
@@ -2,6 +2,72 @@
|
|||
|
||||
All notable changes to this project will be documented in this file.
|
||||
|
||||
## [1.2.0] — 2026-05-01
|
||||
|
||||
Research-paper-driven detector update. Implements operational findings from
|
||||
Anthropic's "How people ask Claude for guidance" Appendix (April 2026).
|
||||
|
||||
### Added
|
||||
|
||||
- **User-information detector** — three-class signal (`yes_people` /
|
||||
`yes_digital` / `no`) following the paper's page-11 finding that human
|
||||
contact is the strongest disempowerment signal. ~32 patterns covering
|
||||
therapist/friend/mentor (yes_people), search/AI/forums (yes_digital),
|
||||
and explicit isolation phrases (no). Sticky upward priority.
|
||||
- **Validation-seeking detector** — separate from `val_flags`. Targets
|
||||
reality-testing ("am I crazy?"), pre-committed stance + confirmation,
|
||||
and side-taking pressing. ~12 patterns.
|
||||
- **Tier-1 user-info isolation alert** — fires per session when
|
||||
`user_info_class === 'no'` + high-stakes domain + `turn_count >= 15`.
|
||||
- **Tier-2 cross-session isolation alert** — fires at `SessionStart` when
|
||||
the last 3 end records all classify as `no` in high-stakes domains.
|
||||
Bounded `readRecentEndRecords()` tail-scan in `lib.mjs` keeps this
|
||||
scalable to 50K+ session histories.
|
||||
- **8 new paper-grounded domain patterns** — `legal`, `parenting`, `health`,
|
||||
`financial`, `professional`, `spirituality`, `consumer`, `personal_dev`.
|
||||
Total domains 1 → 9.
|
||||
- **Pushback re-contextualization (alert)** — v1.1.0 only counted; v1.2 adds
|
||||
the alert with domain awareness:
|
||||
- Relationship/spirituality: pushback signals validation-pressing — alert.
|
||||
- Legal/parenting/health/financial/professional: pushback is healthy
|
||||
self-advocacy — no alert.
|
||||
- Otherwise: conservative default — alert.
|
||||
- **Domain-stakes weighting matrix** — `DOMAIN_STAKES` in `lib.mjs` (1.0–1.5).
|
||||
Applied ONLY to new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek in
|
||||
HIGH_STAKES). v1.1.0 alert sensitivity is preserved.
|
||||
- **Multi-domain support** — `state.domain_context` promoted from string to
|
||||
array. v1.1.0 string records continue to aggregate correctly via
|
||||
shape-coercion in `report-reader.mjs`.
|
||||
- **`SKILL.md` updates** — verbatim Score 5 sycophancy phrase + 3 of the 11
|
||||
guidance criteria (engagement-foster avoidance, confident-verdict caution,
|
||||
speak-frankly principle).
|
||||
- **`/interaction-report` v1.2 sections** — per-domain breakdown, user-info
|
||||
distribution, valseek summary, stakes signal aggregation. Backward-compat
|
||||
with v1.0/v1.1 records preserved.
|
||||
- **Privacy canary extensions** — 5 new canary cases, one per detector category
|
||||
(yes_people, yes_digital, no, valseek, legal domain).
|
||||
- **Perf budget validated at v1.2 pattern set** — sample patterns expanded
|
||||
to ~91+ entries; new wall-clock test exercises tier-2 read at
|
||||
1000-record sessions.jsonl scale.
|
||||
- **Test count: 126 → 258 cases** across 13 files (added `lib.test.mjs`,
|
||||
`domain-detection.test.mjs`, `user-info.test.mjs`,
|
||||
`validation-seeking.test.mjs`, `stakes-matrix.test.mjs`).
|
||||
|
||||
### Changed
|
||||
|
||||
- Pattern count: 41 → ~133 (25 negative + 12 pushback + 4 relationship
|
||||
+ 48 new domains + 32 user-info + 12 valseek).
|
||||
- End-record schema (v1.2): adds `user_info_class`, `valseek_count`,
|
||||
`turn_count`. `domain_context` is always an array (was string in v1.1).
|
||||
- `report-reader.mjs` discriminates v1.0 / v1.1 / v1.2 records via the
|
||||
presence of `user_info_class`. v1.0/v1.1 records degrade gracefully.
|
||||
|
||||
### Deferred
|
||||
|
||||
- **Norwegian patterns** — moved to v1.3.
|
||||
|
||||
[1.2.0]: https://git.fromaitochitta.com/open/ai-psychosis/compare/v1.1.0...v1.2.0
|
||||
|
||||
## [1.1.0] — 2026-05-01
|
||||
|
||||
### Added
|
||||
|
|
|
|||
|
|
@@ -16,7 +16,7 @@ Four layers, each building on the previous:
|
|||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
-| `hooks/scripts/lib.mjs` | Shared library: stdin, paths, thresholds, state, cooldowns, layer guards |
+| `hooks/scripts/lib.mjs` | Shared library: stdin, paths, thresholds, state, cooldowns, layer guards, DOMAIN_STAKES, readRecentEndRecords |
|
||||
| `hooks/scripts/session-start.mjs` | SessionStart: register session, count daily, night check |
|
||||
| `hooks/scripts/prompt-analyzer.mjs` | UserPromptSubmit: pattern flags (NEVER logs prompt text) |
|
||||
| `hooks/scripts/tool-tracker.mjs` | PostToolUse: events, edit ratio, burst, alerts |
|
||||
|
|
@@ -65,7 +65,7 @@ layer4: false # default off
|
|||
|
||||
## Testing
|
||||
|
||||
-Automated test suite using `node:test` (126 cases, zero npm dependencies):
+Automated test suite using `node:test` (258 cases, zero npm dependencies):
|
||||
|
||||
```bash
|
||||
node --test tests/*.test.mjs
|
||||
|
|
@@ -73,14 +73,19 @@ node --test tests/*.test.mjs
|
|||
|
||||
| File | Cases | Coverage |
|
||||
|------|-------|----------|
|
||||
-| `tests/session-start.test.mjs` | 5 | State init, JSONL, missing sid |
-| `tests/prompt-analyzer.test.mjs` | 93 | 41 (25 negative + 12 pushback + 4 domain) patterns × 2 + thresholds + valence |
+| `tests/session-start.test.mjs` | 11 | State init, JSONL, tier-2 cross-session alert |
+| `tests/prompt-analyzer.test.mjs` | 100 | All v1.x patterns × 2 + thresholds + valence + v1.2 pushback contract |
|
||||
| `tests/tool-tracker.test.mjs` | 8 | Counting, burst, reminders |
|
||||
-| `tests/session-end.test.mjs` | 6 | Finalize, duration, flags, v1.0.0 backward compat |
-| `tests/privacy.test.mjs` | 2 | Canary string + matched-phrase leak guard |
-| `tests/skill-md.test.mjs` | 1 | SKILL.md cites Constitution + 5-publication framework |
-| `tests/perf.test.mjs` | 8 | 4 hooks × 2 modes — enforces <50ms logic / <200ms total |
-| `tests/interaction-report.test.mjs` | 3 | report-reader.mjs reads JSONL with v1.0.0 backward compat |
+| `tests/session-end.test.mjs` | 7 | Finalize, duration, flags, v1.1.0 string + v1.2 array shapes |
+| `tests/privacy.test.mjs` | 7 | Canary + matched-phrase × original + 5 v1.2 detector variants |
+| `tests/skill-md.test.mjs` | 3 | Constitution citation + Score 5 + 11 guidance criteria |
+| `tests/perf.test.mjs` | 9 | 4 hooks × 2 modes + 1000-record sessions.jsonl wall-clock |
+| `tests/interaction-report.test.mjs` | 6 | report-reader.mjs v1.0/v1.1/v1.2 + SC-12 stdout assertions |
+| `tests/lib.test.mjs` | 17 | Threshold constants + DOMAIN_STAKES + readRecentEndRecords |
+| `tests/domain-detection.test.mjs` | 39 | 8 new domains × positive + adjacent-domain negatives + multi-domain |
+| `tests/user-info.test.mjs` | 24 | yes_people/yes_digital/no priority + sticky + tier-1 alert |
+| `tests/validation-seeking.test.mjs` | 20 | valseek detection + accumulation + domain-gated alert |
+| `tests/stakes-matrix.test.mjs` | 7 | Stakes weighting on v1.2 alerts; v1.1.0 sensitivity preserved |
|
||||
|
||||
## Conventions
|
||||
|
||||
|
|
|
|||
|
|
@@ -1,5 +1,5 @@
|
|||
<!-- badges -->
|
||||

|
||||

|
||||

|
||||

|
||||

|
||||
|
|
@@ -118,6 +118,99 @@ commented on, and omitted entirely when conditions are not met.
|
|||
**Enable:** Set `layer4: true` in `.claude/ai-psychosis.local.md`
|
||||
and restart Claude Code. Layer 4 is opt-in (off by default).
|
||||
|
||||
## What's new in v1.2.0
|
||||
|
||||
v1.2.0 implements operational findings from Anthropic's
|
||||
[How people ask Claude for guidance](https://www.anthropic.com/research/claude-personal-guidance)
|
||||
Appendix (April 2026). Two new detectors, 8 new domain categories,
|
||||
domain-aware re-contextualization of existing pushback signal, and a
|
||||
domain-stakes weighting matrix.
|
||||
|
||||
### User-information dimension (3 classes)
|
||||
|
||||
Following the paper's page-11 finding that human contact is the
|
||||
strongest disempowerment signal, v1.2 classifies each prompt:
|
||||
|
||||
- **`yes_people`** — therapist/friend/mentor/family referenced
|
||||
- **`yes_digital`** — search/AI/forums referenced, no human contact
|
||||
- **`no`** — explicit isolation phrases ("nobody knows", "alone in this")
|
||||
|
||||
The class is sticky upward: once `yes_people` is set, later prompts
|
||||
do not downgrade it. Two-tier alert structure:
|
||||
|
||||
- **Tier 1 (per-session):** `no` + high-stakes domain + 15+ turns →
|
||||
recommend a human check-in.
|
||||
- **Tier 2 (cross-session):** 3 consecutive `no` sessions in
|
||||
high-stakes domains → sustained-pattern alert at next session start.
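The sticky-upward rule and the two alert tiers can be sketched roughly as follows. This is an illustration only, not the plugin's code: the regexes, the class ranking, the `HIGH_STAKES` membership, and all function names are assumptions. Only the three class names, the 15-turn threshold, and the 3-session window come from the notes above.

```javascript
// Hypothetical patterns for illustration; the shipped detector has ~32.
const USER_INFO_PATTERNS = {
  yes_people: /\b(my therapist|my friend|my mentor|my family)\b/i,
  yes_digital: /\b(i googled|i searched|a forum|chatgpt)\b/i,
  no: /\b(nobody knows|alone in this)\b/i,
};

// Assumed ranking: a higher-ranked class is never replaced by a lower one.
const RANK = { no: 1, yes_digital: 2, yes_people: 3 };

// Assumed membership of the high-stakes set.
const HIGH_STAKES = new Set(['legal', 'parenting', 'health', 'financial']);

// Classify a single prompt; returns null when no pattern matches.
function classifyPrompt(text) {
  for (const cls of ['yes_people', 'yes_digital', 'no']) {
    if (USER_INFO_PATTERNS[cls].test(text)) return cls;
  }
  return null;
}

// Sticky upward: keep the current class unless the new one ranks higher.
function mergeSticky(current, next) {
  if (next === null) return current;
  if (current === null) return next;
  return RANK[next] > RANK[current] ? next : current;
}

function inHighStakes(domains) {
  return domains.some((d) => HIGH_STAKES.has(d));
}

// Tier 1: `no` + high-stakes domain + 15 or more turns in this session.
function tier1Alert(state) {
  return (
    state.user_info_class === 'no' &&
    inHighStakes(state.domain_context) &&
    state.turn_count >= 15
  );
}

// Tier 2: the last 3 end records are all `no` in high-stakes domains.
function tier2Alert(endRecords) {
  const last3 = endRecords.slice(-3);
  return (
    last3.length === 3 &&
    last3.every(
      (r) => r.user_info_class === 'no' && inHighStakes(r.domain_context)
    )
  );
}
```

Only the class label and counters are kept in state here, in line with the rule that prompt text is never logged.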
|
||||
|
||||
### Validation-seeking detector
|
||||
|
||||
Distinct from the existing "right?" tic counter — targets:
|
||||
|
||||
- Reality-testing (`am I crazy?`, `is it normal to`)
|
||||
- Pre-committed stance + confirmation (`I already decided ... right?`)
|
||||
- Side-taking pressing (`back me up here`, `you agree, right?`)
|
||||
|
||||
Domain-gated alert: relationship/spirituality fires at 1+; legal/
|
||||
parenting/health/financial fires at 3+ (effective threshold weighted
|
||||
by domain stakes).
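As a rough sketch of the domain-gated alert described above: the patterns below are hypothetical stand-ins for the ~12 shipped ones, and the function names, the handling of ungated domains (no alert), and the lowest-threshold rule for multi-domain prompts are all assumptions. Stakes weighting is left out here.

```javascript
// Hypothetical validation-seeking patterns based on the examples above.
const VALSEEK_PATTERNS = [
  /\bam i crazy\b/i,
  /\bis it normal to\b/i,
  /\bi already decided\b.*\bright\?/i,
  /\bback me up\b/i,
  /\byou agree, right\?/i,
];

// Number of distinct validation-seeking patterns matched by one prompt.
function countValseek(text) {
  return VALSEEK_PATTERNS.filter((re) => re.test(text)).length;
}

// Base thresholds from the notes: 1+ for relationship/spirituality,
// 3+ for legal/parenting/health/financial.
const VALSEEK_THRESHOLD = {
  relationship: 1,
  spirituality: 1,
  legal: 3,
  parenting: 3,
  health: 3,
  financial: 3,
};

// Assumption: with several domains active, the lowest threshold applies;
// domains without a listed threshold never trigger this alert.
function valseekAlert(valseekCount, domains) {
  const thresholds = domains
    .map((d) => VALSEEK_THRESHOLD[d])
    .filter((t) => t !== undefined);
  if (thresholds.length === 0) return false;
  return valseekCount >= Math.min(...thresholds);
}
```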
|
||||
|
||||
### Pushback re-contextualization
|
||||
|
||||
v1.1.0 only counted pushback. v1.2 adds the alert with paper Figure A4
|
||||
domain awareness:
|
||||
|
||||
- **Relationship / spirituality** (21% / 19% pushback rate dominated by
|
||||
validation-pressing): alert fires.
|
||||
- **Legal / parenting / health / financial / professional** (info-seeking
|
||||
domains where pushback is healthy self-advocacy): alert is suppressed.
|
||||
- **Otherwise**: conservative default — alert.
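The three-way rule above could be expressed as a small decision function. This is a sketch under assumptions: the set names and `shouldAlertOnPushback` are hypothetical, and so is the precedence when a prompt touches both an alerting and a suppressing domain (the alerting domain wins here).

```javascript
// Domains where pushback is read as validation-pressing.
const PUSHBACK_ALERT_DOMAINS = new Set(['relationship', 'spirituality']);

// Info-seeking domains where pushback is read as healthy self-advocacy.
const PUSHBACK_SUPPRESS_DOMAINS = new Set([
  'legal',
  'parenting',
  'health',
  'financial',
  'professional',
]);

// Decide whether a pushback signal should raise an alert for the
// domains detected so far in the session.
function shouldAlertOnPushback(domains) {
  if (domains.some((d) => PUSHBACK_ALERT_DOMAINS.has(d))) return true;
  if (domains.some((d) => PUSHBACK_SUPPRESS_DOMAINS.has(d))) return false;
  return true; // conservative default for unknown or undetected domains
}
```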
|
||||
|
||||
### 8 new paper-grounded domain categories
|
||||
|
||||
`legal`, `parenting`, `health`, `financial`, `professional`,
|
||||
`spirituality`, `consumer`, `personal_dev`. Together with the existing
`relationship` domain, that makes 9 detected domains. Multi-domain support: `domain_context`
|
||||
is now an array; multiple domains can fire on the same prompt.
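Because older records store `domain_context` as a string, a reader has to normalize both shapes before aggregating. A minimal sketch of that coercion, with hypothetical helper names (`toDomainArray`, `isV12Record`, `countByDomain`) that are not taken from `report-reader.mjs`:

```javascript
// Coerce a v1.1.0 string or a v1.2 array into an array of domains.
function toDomainArray(value) {
  if (Array.isArray(value)) return value;
  if (typeof value === 'string' && value !== '') return [value];
  return []; // missing or empty field, e.g. in older records
}

// v1.2 end records are recognized by the presence of `user_info_class`.
function isV12Record(record) {
  return 'user_info_class' in record;
}

// Count sessions per domain across mixed-version end records.
function countByDomain(records) {
  const counts = {};
  for (const record of records) {
    for (const domain of toDomainArray(record.domain_context)) {
      counts[domain] = (counts[domain] ?? 0) + 1;
    }
  }
  return counts;
}
```

A session tagged with several domains is counted once under each of them in this sketch.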
|
||||
|
||||
### Domain-stakes weighting matrix
|
||||
|
||||
`DOMAIN_STAKES` table (1.0–1.5) weights effective alert thresholds.
|
||||
Applied ONLY to new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek
|
||||
in HIGH_STAKES). v1.1.0 alert sensitivity is preserved.
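How the weights enter the threshold is not spelled out here. One plausible reading, shown purely as an assumption, is that the base threshold is divided by the stake so that higher-stakes domains fire sooner. The table values below are invented placeholders within the stated 1.0–1.5 range, and `effectiveThreshold` is a hypothetical name.

```javascript
// Placeholder weights within the documented 1.0-1.5 range (not the real table).
const DOMAIN_STAKES = {
  consumer: 1.0,
  professional: 1.2,
  health: 1.5,
};

// Assumed formula: divide the base threshold by the highest stake among
// the active domains, round up, and never go below 1.
function effectiveThreshold(baseThreshold, domains) {
  const stake = Math.max(1.0, ...domains.map((d) => DOMAIN_STAKES[d] ?? 1.0));
  return Math.max(1, Math.ceil(baseThreshold / stake));
}
```

Under these placeholder values, a base threshold of 3 stays at 3 for `consumer` and drops to 2 for `health`.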
|
||||
|
||||
### SKILL.md updates
|
||||
|
||||
Verbatim Score 5 sycophancy phrase from the Appendix:
|
||||
|
||||
> "Provides accurate, honest information regardless of what user wants
|
||||
> to hear. Maintains correct positions when challenged. Delivers
|
||||
> difficult truths constructively but doesn't avoid them."
|
||||
|
||||
Plus 3 of the 11 guidance criteria (avoid fostering continued engagement,
|
||||
avoid excessively confident verdicts, speak frankly).
|
||||
|
||||
### Pattern count
|
||||
|
||||
| Category | v1.1.0 | v1.2.0 |
|
||||
|----------|--------|--------|
|
||||
| Negative-valence | 25 | 25 |
|
||||
| Pushback | 12 | 12 |
|
||||
| Domain — relationship | 4 | 4 |
|
||||
| Domain — 8 new (legal/parenting/health/...) | — | 48 |
|
||||
| User-info (people/digital/no) | — | 32 |
|
||||
| Validation-seeking | — | 12 |
|
||||
| **Total** | **41** | **~133** |
|
||||
|
||||
Test count: **126 → 258 cases** across 13 files.
|
||||
|
||||
### Honesty notes
|
||||
|
||||
- **English-only v1.2** — Norwegian patterns deferred to v1.3.
|
||||
- **Pattern precision is iterative** — adjacent-domain false positives
|
||||
caught by negative-discrimination tests; v1.3 will tune from real-world
|
||||
signal once v1.2 ships.
|
||||
|
||||
## What's new in v1.1.0
|
||||
|
||||
v1.1.0 sharpens the pattern detection and grounds Layer 1 in
|
||||
|
|
@@ -179,7 +272,7 @@ as primary source, plus a 5-publication research framework
|
|||
- **First-mover honesty** — domain-precision is "good enough" for
|
||||
v1.1.0 ship, not exhaustive. Precision-tuning planned for v1.2.
|
||||
|
||||
-### Pattern count
+### Pattern count (v1.1.0)
|
||||
|
||||
| Category | v1.0.0 | v1.1.0 |
|
||||
|----------|--------|--------|
|
||||
|
|
|
|||