ktg-plugin-marketplace/plugins/llm-security/scanners/lib
Kjell Tore Guttormsen ec4ae268da feat(injection): E16 — homoglyph NFKC fold before every pattern match
Critical-review §4 E16 finding: pre-v7.2.0 homoglyph normalization fired
ONLY for the MEDIUM-advisory "obfuscation present" signal. Pattern
matchers in scanForInjection compared against raw + decoded variants
only — they did NOT compare against a fold-normalized variant. As a
result, "ignоre previous instructions" (Cyrillic о, U+043E) bypassed
the CRITICAL "ignore previous" pattern.

Two coordinated edits:

scanners/lib/string-utils.mjs
- Adds HOMOGLYPH_MAP (frozen) — surgical Cyrillic/Greek → Latin map.
  ~25 entries focused on injection-vocabulary letters
  (a, e, o, c, p, x, y, i, j, s, l, A, E, O, C, P, X, Y, T).
- Adds foldHomoglyphs(s) — pipeline: NFKC → apply HOMOGLYPH_MAP.
  NFKC handles Mathematical Alphanumeric (U+1D400 block), fullwidth
  Latin (U+FF21 block), ligatures, width variants.

Excluded by design from HOMOGLYPH_MAP:
- Latin Extended (æ, ø, å, é, è, ñ, ü, ö, ä, ç, ß, þ, ð) — legitimate
  Norwegian/German/French/Spanish letters. Map them and we false-positive
  on every non-English source file.
- Greek letters not visually overlapping (β, γ, δ, ...)
- Cyrillic letters not visually overlapping (б, г, д, ж, ...)

scanners/lib/injection-patterns.mjs
- scanForInjection now builds a 4-variant set: raw, normalized,
  folded(raw), folded(normalized). Set deduplication skips redundant
  identical variants. Existing dedup-by-label (seenLabels Set) prevents
  double-counts when the same pattern matches in multiple variants.
- foldHomoglyphs added to the imports.

Tests: +27 cases in tests/lib/string-utils-homoglyph.test.mjs:
- 6 Cyrillic → Latin (lowercase, uppercase, multiple substitutions,
  Palochka U+04CF)
- 3 Greek → Latin
- 2 NFKC normalization (Math Bold, Fullwidth)
- 8 preserves-non-confusable (Norwegian æøå, German umlauts, French
  accents, Spanish ñ, emoji, CJK, Arabic/Hebrew)
- 3 edge cases (empty, null/undefined, idempotency)
- 5 scanForInjection integration (Cyrillic ignore, Cyrillic Assistant,
  Norwegian non-trigger, benign "ignore" comment, mixed Cyrillic+Greek)

Test-development found: U+1D5DC is "I" not "A" (test pin caught my
codepoint mistake — fixed during dev).

Suite: 1617 → 1644 (+27). All green.
2026-04-29 14:22:05 +02:00
..
audit-trail.mjs feat(governance): add structured JSONL audit trail with SIEM-ready schema 2026-04-10 13:25:59 +02:00
bash-normalize.mjs fix(scanners): preserve single-quoted regions through bash-normalize pipeline 2026-04-17 14:29:02 +02:00
bom-builder.mjs feat(scanner): add AI-BOM generator — CycloneDX 1.6 format for AI supply chain transparency 2026-04-10 13:29:30 +02:00
diff-engine.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
distribution-stats.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
file-discovery.mjs feat(llm-security)!: v7.0.0 commit 6 — tests, docs, version bump 2026-04-19 22:26:35 +02:00
fs-utils.mjs feat(llm-security): sandboxed remote cloning v5.1.0 2026-04-07 17:08:32 +02:00
git-clone.mjs feat(llm-security): sandboxed remote cloning v5.1.0 2026-04-07 17:08:32 +02:00
ide-extension-data.mjs feat(llm-security): seed top-jetbrains-plugins.json + loadJetBrainsBlocklist export 2026-04-18 09:56:55 +02:00
ide-extension-discovery.mjs feat(llm-security): honor LLM_SECURITY_IDE_ROOTS for JetBrains discovery 2026-04-18 11:09:02 +02:00
ide-extension-parser.mjs feat(llm-security): implement parseIntelliJPlugin with nested-jar extraction 2026-04-18 10:15:12 +02:00
injection-patterns.mjs feat(injection): E16 — homoglyph NFKC fold before every pattern match 2026-04-29 14:22:05 +02:00
jetbrains-fetch-worker.mjs feat(llm-security): URL-fetch support for JetBrains Marketplace (v6.6.0) 2026-04-18 10:46:13 +02:00
mcp-description-cache.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
output.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
policy-loader.mjs feat(llm-security): v7.0.0 commit 3 — policy-driven entropy thresholds 2026-04-19 22:02:52 +02:00
sarif-formatter.mjs feat(scanner): add SARIF 2.1.0 output format to scan-orchestrator (--format sarif) 2026-04-10 13:22:59 +02:00
severity.mjs docs(severity): B3 — document info as scoring-inert (v7.2.0 prep) 2026-04-29 13:56:11 +02:00
skill-registry.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
string-utils.mjs feat(injection): E16 — homoglyph NFKC fold before every pattern match 2026-04-29 14:22:05 +02:00
supply-chain-data.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
vsix-fetch-worker.mjs feat(llm-security): OS sandbox for /security ide-scan <url> (v6.5.0) 2026-04-17 17:28:57 +02:00
vsix-fetch.mjs feat(llm-security): add fetchJetBrainsPlugin + URL detection for plugins.jetbrains.com 2026-04-18 10:39:54 +02:00
vsix-sandbox.mjs refactor(llm-security): parameterize buildSandboxedWorker with workerPath 2026-04-18 10:37:10 +02:00
yaml-frontmatter.mjs feat: initial open marketplace with llm-security, config-audit, ultraplan-local 2026-04-06 18:47:49 +02:00
zip-extract.mjs feat(llm-security): /security ide-scan <url> — Marketplace/OpenVSX/direct VSIX (v6.4.0) 2026-04-17 17:16:26 +02:00