ktg-plugin-marketplace

open/ktg-plugin-marketplace

Fork 0

Commit graph

Author	SHA1	Message	Date
Kjell Tore Guttormsen	6cef80c640	feat(unicode): E1 — extend hidden-Unicode detection to PUA-A and PUA-B Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector (`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private Use Areas — also invisible in most terminals and surviving normalization — were not detected. Attackers could encode payloads in PUA codepoints that pass through `scanForInjection` undetected. Coverage extended to: - U+E0001-E007F Unicode Tag block (existing — DeepMind kat. 1) - U+F0000-FFFFD Supplementary PUA-A (NEW — E1) - U+100000-10FFFD Supplementary PUA-B (NEW — E1) Detection-only for PUA: PUA characters have NO standard ASCII mapping, so `decodeUnicodeTags` leaves them unchanged. Detection alone is sufficient — `scanForInjection` emits HIGH on any presence, regardless of decoded content. Function name `containsUnicodeTags` preserved for back-compat. All existing call sites (injection-patterns.mjs:259, etc.) work unchanged. Semantically the function is now "containsHiddenUnicode". Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs: - 5 Tag-block regression guards - 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII) - 3 PUA-B range cases - 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji, CJK, Latin Extended — all must be FALSE) - 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B unchanged, Tag block still decodes, mixed Tag+PUA) Suite: 1596 → 1617 (+21). All green.	2026-04-29 14:18:49 +02:00

Author

SHA1

Message

Date

Kjell Tore Guttormsen

6cef80c640

feat(unicode): E1 — extend hidden-Unicode detection to PUA-A and PUA-B

Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector
(`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private
Use Areas — also invisible in most terminals and surviving normalization
— were not detected. Attackers could encode payloads in PUA codepoints
that pass through `scanForInjection` undetected.

Coverage extended to:
- U+E0001-E007F  Unicode Tag block       (existing — DeepMind kat. 1)
- U+F0000-FFFFD  Supplementary PUA-A      (NEW — E1)
- U+100000-10FFFD Supplementary PUA-B     (NEW — E1)

Detection-only for PUA: PUA characters have NO standard ASCII mapping,
so `decodeUnicodeTags` leaves them unchanged. Detection alone is
sufficient — `scanForInjection` emits HIGH on any presence, regardless
of decoded content.

Function name `containsUnicodeTags` preserved for back-compat. All
existing call sites (injection-patterns.mjs:259, etc.) work unchanged.
Semantically the function is now "containsHiddenUnicode".

Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs:
- 5 Tag-block regression guards
- 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII)
- 3 PUA-B range cases
- 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji,
  CJK, Latin Extended — all must be FALSE)
- 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B
  unchanged, Tag block still decodes, mixed Tag+PUA)

Suite: 1596 → 1617 (+21). All green.

2026-04-29 14:18:49 +02:00

1 commit