open/ktg-plugin-marketplace - Forgejo: Beyond coding. We Forge.

Author	SHA1	Message	Date
Kjell Tore Guttormsen	ec4ae268da	feat(injection): E16 - homoglyph NFKC fold before every pattern match

Critical-review §4 E16 finding: pre-v7.2.0 homoglyph normalization fired ONLY for the MEDIUM-advisory "obfuscation present" signal. Pattern matchers in scanForInjection compared against raw + decoded variants only; they did NOT compare against a fold-normalized variant. As a result, "ignоre previous instructions" (Cyrillic о, U+043E) bypassed the CRITICAL "ignore previous" pattern.

Two coordinated edits:

scanners/lib/string-utils.mjs
- Adds HOMOGLYPH_MAP (frozen): surgical Cyrillic/Greek → Latin map. ~25 entries focused on injection-vocabulary letters (a, e, o, c, p, x, y, i, j, s, l, A, E, O, C, P, X, Y, T).
- Adds foldHomoglyphs(s). Pipeline: NFKC → apply HOMOGLYPH_MAP. NFKC handles Mathematical Alphanumeric (U+1D400 block), fullwidth Latin (U+FF21 block), ligatures, width variants.

Excluded by design from HOMOGLYPH_MAP:
- Latin Extended (æ, ø, å, é, è, ñ, ü, ö, ä, ç, ß, þ, ð): legitimate Norwegian/German/French/Spanish letters. Map them and we false-positive on every non-English source file.
- Greek letters not visually overlapping (β, γ, δ, ...)
- Cyrillic letters not visually overlapping (б, г, д, ж, ...)

scanners/lib/injection-patterns.mjs
- scanForInjection now builds a 4-variant set: raw, normalized, folded(raw), folded(normalized). Set deduplication skips redundant identical variants. Existing dedup-by-label (seenLabels Set) prevents double-counts when the same pattern matches in multiple variants.
- foldHomoglyphs added to the imports.

Tests: +27 cases in tests/lib/string-utils-homoglyph.test.mjs:
- 6 Cyrillic → Latin (lowercase, uppercase, multiple substitutions, Palochka U+04CF)
- 3 Greek → Latin
- 2 NFKC normalization (Math Bold, Fullwidth)
- 8 preserves-non-confusable (Norwegian æøå, German umlauts, French accents, Spanish ñ, emoji, CJK, Arabic/Hebrew)
- 3 edge cases (empty, null/undefined, idempotency)
- 5 scanForInjection integration (Cyrillic ignore, Cyrillic Assistant, Norwegian non-trigger, benign "ignore" comment, mixed Cyrillic+Greek)

Test-development found: U+1D5DC is "I" not "A" (test pin caught my codepoint mistake; fixed during dev).

Suite: 1617 → 1644 (+27). All green.	2026-04-29 14:22:05 +02:00
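The fold-then-match idea in the E16 commit can be sketched roughly as follows. This is an illustration only, not the repository's code: the map below is a small hypothetical subset (the commit mentions ~25 entries but does not list them), `buildVariants` is an invented helper name, and its `normalize` parameter stands in for whatever decoding step the scanner already applies.

```javascript
// Hypothetical subset of a confusables map. The real HOMOGLYPH_MAP is not
// shown in the commit message; these entries are assumptions.
const HOMOGLYPH_MAP = Object.freeze({
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043E": "o", // Cyrillic small o
  "\u0441": "c", // Cyrillic small es
  "\u0440": "p", // Cyrillic small er
  "\u0445": "x", // Cyrillic small ha
  "\u03BF": "o", // Greek small omicron
  "\u0391": "A", // Greek capital alpha
});

// Step 1: NFKC folds compatibility forms (fullwidth, math alphanumerics).
// Step 2: map the remaining cross-script look-alikes to Latin.
function foldHomoglyphs(s) {
  if (typeof s !== "string") return "";
  let out = "";
  for (const ch of s.normalize("NFKC")) {
    out += HOMOGLYPH_MAP[ch] ?? ch;
  }
  return out;
}

// Build the deduplicated variant list a matcher would iterate over.
// `normalize` is a placeholder for the scanner's existing decoding step.
function buildVariants(raw, normalize = (x) => x) {
  const normalized = normalize(raw);
  return [
    ...new Set([
      raw,
      normalized,
      foldHomoglyphs(raw),
      foldHomoglyphs(normalized),
    ]),
  ];
}
```

With this sketch, a pattern such as `/ignore previous/i` would be tested against every entry returned by `buildVariants(input)`, so the Cyrillic-о spelling matches through its folded variant while plain ASCII input yields a single variant after deduplication.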
Kjell Tore Guttormsen	6cef80c640	feat(unicode): E1 - extend hidden-Unicode detection to PUA-A and PUA-B

Critical-review §4 E1 finding: pre-v7.2.0 the Unicode-stego detector (`containsUnicodeTags`) covered only U+E0001-E007F (Tag block). Private Use Areas, which are also invisible in most terminals and survive normalization, were not detected. Attackers could encode payloads in PUA codepoints that pass through `scanForInjection` undetected.

Coverage extended to:
- U+E0001-E007F Unicode Tag block (existing, DeepMind category 1)
- U+F0000-FFFFD Supplementary PUA-A (NEW, E1)
- U+100000-10FFFD Supplementary PUA-B (NEW, E1)

Detection-only for PUA: PUA characters have NO standard ASCII mapping, so `decodeUnicodeTags` leaves them unchanged. Detection alone is sufficient: `scanForInjection` emits HIGH on any presence, regardless of decoded content.

Function name `containsUnicodeTags` preserved for back-compat. All existing call sites (injection-patterns.mjs:259, etc.) work unchanged. Semantically the function is now "containsHiddenUnicode".

Tests: +21 cases in tests/lib/string-utils-hidden-unicode.test.mjs:
- 5 Tag-block regression guards
- 4 PUA-A range cases (start, just-inside, end, buried-in-ASCII)
- 3 PUA-B range cases
- 5 boundary cases (gap U+E0080-EFFFF, U+10FFFE noncharacter, emoji, CJK, Latin Extended; all must be FALSE)
- 4 decodeUnicodeTags passthrough cases (PUA-A unchanged, PUA-B unchanged, Tag block still decodes, mixed Tag+PUA)

Suite: 1596 → 1617 (+21). All green.	2026-04-29 14:18:49 +02:00
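A range check of the kind the E1 commit describes might look like the sketch below. The three ranges come from the commit message. The function names `containsHiddenUnicode` and `decodeTagsSketch` are illustrative, and the decoder rests on an assumption: that Tag characters U+E0020-E007E mirror printable ASCII at an offset of 0xE0000, which may not match what the real `decodeUnicodeTags` does for every codepoint.

```javascript
// Inclusive codepoint ranges, as listed in the commit message.
const HIDDEN_RANGES = Object.freeze([
  [0xe0001, 0xe007f], // Unicode Tag block
  [0xf0000, 0xffffd], // Supplementary PUA-A
  [0x100000, 0x10fffd], // Supplementary PUA-B
]);

// True if any codepoint in `s` falls inside one of the hidden ranges.
// for...of iterates by codepoint, so surrogate pairs are handled correctly.
function containsHiddenUnicode(s) {
  if (typeof s !== "string") return false;
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (HIDDEN_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi)) return true;
  }
  return false;
}

// Assumed decoder: maps Tag characters back to printable ASCII and leaves
// everything else, including PUA codepoints, unchanged.
function decodeTagsSketch(s) {
  let out = "";
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    out +=
      cp >= 0xe0020 && cp <= 0xe007e ? String.fromCodePoint(cp - 0xe0000) : ch;
  }
  return out;
}
```

Keeping detection separate from decoding reflects the commit's point that PUA codepoints have no ASCII mapping: their presence alone is the signal, so the decoder can pass them through untouched.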
Kjell Tore Guttormsen	5f8f2d3c41	fix(dep): B7 - token-overlap typosquat heuristic alongside Levenshtein

Critical-review §2 B7 finding: pure Levenshtein <=2 misses the most common modern typosquat pattern: popular-name + token-injection suffix. Examples:
  lodash → lodash-utils (edit distance 6, not flagged pre-B7)
  react → react-helper (edit distance 7, not flagged pre-B7)
  express → express-wrapper (edit distance 8, not flagged pre-B7)

Three coordinated edits:

scanners/lib/string-utils.mjs
- Adds tokenize(name): string[], which splits on -/_ and lowercases
- Adds tokenOverlap(a, b): number, computed as intersection.size / min(|a|, |b|)
- Adds TYPOSQUAT_SUSPICIOUS_TOKENS, a frozen list of common typosquat suffixes. Excludes language-extension tokens (js, jsx, ts, tsx): the v7.0.0 allowlist contains `tsx` as a legit package, and including the same token in the suspicious set creates a contradiction. Caught by the new allowlist-intersection-guard test. Also excludes 'pro' (legitimate edition marker).

scanners/dep-auditor.mjs + scanners/supply-chain-recheck.mjs
- New checkTyposquatTokenOverlap() helper. Fires AFTER the Levenshtein 1/2 branches, only when:
  1. popular package's tokens ⊊ declared name's tokens (declared name is a strict superset)
  2. declared name has at least one suspicious suffix
  3. popular package is in topCutoff window
  All three conditions required; conservative by design. Allowlist precedence preserved (existing 22 npm + 13 PyPI entries always pass). MEDIUM severity, NOT block. New finding title prefix: "Possible typosquatting via token-overlap".

Tests: +21 cases across two new files
- tests/lib/string-utils-tokens.test.mjs (15): tokenize, tokenOverlap, TYPOSQUAT_SUSPICIOUS_TOKENS frozen contract, allowlist-intersection guard (caught the tsx conflict on first run)
- tests/scanners/dep-token-overlap.test.mjs (7): integration via in-memory tmpdir fixtures: lodash-utils flagged, react-helper flagged, express-wrapper flagged, lodash exact NOT flagged, allowlist tools (knip/tsx/nx/rimraf) NOT flagged, react-router-dom (no suspicious suffix) NOT flagged, react itself (equal token set, not superset) NOT flagged.

Existing dep.test.mjs and supply-chain-recheck.test.mjs unchanged, all green (149 → 149 regression guard).

Suite: 1570 → 1591 (+21). All green.	2026-04-29 14:10:53 +02:00
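The token-overlap heuristic from the B7 commit could be approximated as below. Treat it as a sketch under stated assumptions: `SUSPICIOUS_TOKENS` is a made-up three-entry list (the real TYPOSQUAT_SUSPICIOUS_TOKENS contents are not given), `looksLikeTokenSquat` is a hypothetical name, "suspicious suffix" is interpreted here as "any extra token that is on the list", and the topCutoff and allowlist checks are left out entirely.

```javascript
// Hypothetical suffix list, an assumption for illustration only.
const SUSPICIOUS_TOKENS = Object.freeze(["utils", "helper", "wrapper"]);

// Split a package name on '-' or '_' and lowercase the tokens.
function tokenize(name) {
  return String(name).toLowerCase().split(/[-_]/).filter(Boolean);
}

// |A ∩ B| / min(|A|, |B|) over the two token sets; 0 if either is empty.
function tokenOverlap(a, b) {
  const setA = new Set(tokenize(a));
  const setB = new Set(tokenize(b));
  if (setA.size === 0 || setB.size === 0) return 0;
  let shared = 0;
  for (const t of setA) if (setB.has(t)) shared++;
  return shared / Math.min(setA.size, setB.size);
}

// Conditions 1 and 2 from the commit message. Condition 3 (topCutoff window)
// and allowlist precedence are intentionally omitted from this sketch.
function looksLikeTokenSquat(declared, popular) {
  const d = new Set(tokenize(declared));
  const p = new Set(tokenize(popular));
  // Declared tokens must strictly contain the popular package's tokens.
  const strictSuperset =
    p.size > 0 && d.size > p.size && [...p].every((t) => d.has(t));
  // At least one of the extra tokens must be on the suspicious list.
  const extras = [...d].filter((t) => !p.has(t));
  const hasSuspicious = extras.some((t) => SUSPICIOUS_TOKENS.includes(t));
  return strictSuperset && hasSuspicious;
}
```

Requiring a strict superset is what keeps an exact match such as `react` versus `react` from firing, and requiring a listed extra token is what lets `react-router-dom` through.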
Kjell Tore Guttormsen	f93d6abdae	feat: initial open marketplace with llm-security, config-audit, ultraplan-local	2026-04-06 18:47:49 +02:00