chore(ai-psychosis): release v1.2.0
parent 0075fe089b
commit 339abc521e
4 changed files with 176 additions and 12 deletions
@@ -2,6 +2,72 @@
All notable changes to this project will be documented in this file.

## [1.2.0] — 2026-05-01

Research-paper-driven detector update. Implements operational findings from
Anthropic's "How people ask Claude for guidance" Appendix (April 2026).

### Added

- **User-information detector** — three-class signal (`yes_people` /
  `yes_digital` / `no`) following the paper's page-11 finding that human
  contact is the strongest disempowerment signal. ~32 patterns covering
  therapist/friend/mentor (yes_people), search/AI/forums (yes_digital),
  and explicit isolation phrases (no). Sticky upward priority.
- **Validation-seeking detector** — separate from `val_flags`. Targets
  reality-testing ("am I crazy?"), pre-committed stance + confirmation,
  and pressure to take sides. ~12 patterns.
- **Tier-1 user-info isolation alert** — fires per session when
  `user_info_class === 'no'` + high-stakes domain + `turn_count >= 15`.
- **Tier-2 cross-session isolation alert** — fires at `SessionStart` when
  the last 3 end records all classify as `no` in high-stakes domains.
  Bounded `readRecentEndRecords()` tail-scan in `lib.mjs` keeps this
  scalable to 50K+ session histories.
- **8 new paper-grounded domain patterns** — `legal`, `parenting`, `health`,
  `financial`, `professional`, `spirituality`, `consumer`, `personal_dev`.
  Total domains 4 → 9.
- **Pushback re-contextualization (alert)** — v1.1.0 only counted pushback; v1.2
  adds the alert with domain awareness:
  - Relationship/spirituality: pushback signals validation-pressing — alert.
  - Legal/parenting/health/financial/professional: pushback is healthy
    self-advocacy — no alert.
  - Otherwise: conservative default — alert.
- **Domain-stakes weighting matrix** — `DOMAIN_STAKES` in `lib.mjs` (1.0–1.5).
  Applied ONLY to new v1.2 alerts (pushback in HIGH_SYCOPHANCY, valseek in
  HIGH_STAKES). v1.1.0 alert sensitivity is preserved.
- **Multi-domain support** — `state.domain_context` promoted from string to
  array. v1.1.0 string records continue to aggregate correctly via
  shape-coercion in `report-reader.mjs`.
- **`SKILL.md` updates** — verbatim Score 5 sycophancy phrase + 3 of the 11
  guidance criteria (engagement-foster avoidance, confident-verdict caution,
  speak-frankly principle).
- **`/interaction-report` v1.2 sections** — per-domain breakdown, user-info
  distribution, valseek summary, stakes signal aggregation. Backward
  compatibility with v1.0/v1.1 records is preserved.
- **Privacy canary extensions** — 5 new canary cases per detector category
  (yes_people, yes_digital, no, valseek, legal domain).
- **Perf budget validated at v1.2 pattern set** — sample patterns expanded
  to ~91+ entries; new wall-clock test exercises tier-2 read at
  1000-record sessions.jsonl scale.
- **Test count: 126 → 258 cases** across 12 files (added `lib.test.mjs`,
  `domain-detection.test.mjs`, `user-info.test.mjs`,
  `validation-seeking.test.mjs`, `stakes-matrix.test.mjs`).
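
The domain-aware pushback rule and the tier-1 isolation condition listed above can be read as two small predicates. The following is a minimal sketch, not the code in `lib.mjs`: the function names, the set names, the rule for combining several domains, and the membership of the high-stakes set are all assumptions made for illustration. Only the three pushback branches and the `user_info_class === 'no'` / `turn_count >= 15` condition come from the entries above.

```javascript
// Sketch only: every name below is hypothetical, not the plugin's actual API.

// From the changelog: pushback in these domains signals validation-pressing.
const PUSHBACK_ALERT_DOMAINS = new Set(['relationship', 'spirituality']);

// From the changelog: pushback in these domains is healthy self-advocacy.
const PUSHBACK_HEALTHY_DOMAINS = new Set([
  'legal', 'parenting', 'health', 'financial', 'professional',
]);

// ASSUMPTION: the changelog does not say which domains count as high-stakes.
const ASSUMED_HIGH_STAKES = new Set(['legal', 'health', 'financial', 'parenting']);

// Decide whether a pushback event should raise an alert.
function shouldAlertOnPushback(domainContext) {
  // Accept both the v1.1 string shape and the v1.2 array shape.
  const domains = Array.isArray(domainContext) ? domainContext : [domainContext];
  // ASSUMPTION: when several domains are present, an alerting domain wins.
  if (domains.some((d) => PUSHBACK_ALERT_DOMAINS.has(d))) return true;
  if (domains.some((d) => PUSHBACK_HEALTHY_DOMAINS.has(d))) return false;
  return true; // conservative default for every other domain
}

// Tier-1 condition: isolated user, high-stakes domain, long session.
function shouldFireTier1Isolation(state) {
  const domains = state.domain_context ?? [];
  return (
    state.user_info_class === 'no' &&
    domains.some((d) => ASSUMED_HIGH_STAKES.has(d)) &&
    state.turn_count >= 15
  );
}
```

Under these assumptions, `shouldAlertOnPushback(['legal'])` returns `false`, while `shouldAlertOnPushback(['consumer'])` falls through to the conservative default and returns `true`.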

### Changed

- Pattern count: 41 → ~133 (25 negative + 12 pushback + 4 relationship
  + 48 new domains + 32 user-info + 12 valseek).
- End-record schema (v1.2): adds `user_info_class`, `valseek_count`,
  `turn_count`. `domain_context` is always an array (was string in v1.1).
- `report-reader.mjs` discriminates v1.0 / v1.1 / v1.2 records via the
  presence of `user_info_class`. v1.0/v1.1 records degrade gracefully.
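
The schema change and the version check described above might look roughly like the sketch below. `normalizeEndRecord` and the `is_v12` flag are hypothetical names, and filling v1.2-only fields with `null` on older records is an assumption about what "degrade gracefully" means. The changelog only states that the presence of `user_info_class` marks a v1.2 record and that `domain_context` was a string before v1.2.

```javascript
// Sketch only: normalizeEndRecord is a hypothetical helper, not report-reader.mjs.

function normalizeEndRecord(record) {
  // Per the changelog, a v1.2 record is recognised by the presence of user_info_class.
  const isV12 = 'user_info_class' in record;

  // v1.1 stored domain_context as a single string; v1.2 always stores an array.
  const raw = record.domain_context;
  let domains;
  if (Array.isArray(raw)) {
    domains = raw;
  } else if (typeof raw === 'string' && raw.length > 0) {
    domains = [raw];
  } else {
    domains = []; // field missing or empty
  }

  return {
    ...record,
    domain_context: domains,
    // ASSUMPTION: older records get null for fields that only exist in v1.2.
    user_info_class: record.user_info_class ?? null,
    valseek_count: record.valseek_count ?? null,
    turn_count: record.turn_count ?? null,
    is_v12: isV12,
  };
}
```

With this shape, downstream aggregation can iterate over `domain_context` without checking its type, and can skip the v1.2 sections for records where `is_v12` is `false`.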

### Deferred

- **Norwegian patterns** — moved to v1.3.

[1.2.0]: https://git.fromaitochitta.com/open/ai-psychosis/compare/v1.1.0...v1.2.0

## [1.1.0] — 2026-05-01

### Added