fix(injection): E16 ASCII fast-path + UNI-003 expectation update (v7.2.0)

Two follow-up fixes after E16 + E17 landed: 1. foldHomoglyphs ASCII fast-path - scanForInjection calls foldHomoglyphs on every scan (raw + normalized). - Pre-fix: NFKC normalization runs unconditionally, even on pure ASCII inputs where it's a no-op. - Result: benchmark.test.mjs timed out at 120s on the full suite. - Fix: charCodeAt sweep for >=128, short-circuit return s when all ASCII. NFKC and HOMOGLYPH_MAP iteration only run when non-ASCII chars are present (the actual attack case). - Verified: benchmark.test.mjs passes within timeout. 2. Attack-scenario UNI-003 expectation - Pre-E16: "Homoglyph Cyrillic-Latin mixing" payload triggered only a MEDIUM "obfuscation present" advisory (exit 0, stdout match "MEDIUM"). - Post-E16: the same payload is folded to Latin BEFORE pattern matching, so it now matches CRITICAL "ignore previous instructions" and blocks (exit 2). - This is the intended v7.2.0 behavior — not a regression. Updated expectation: exit_code 2, stdout_match "block". Renamed scenario to "now blocked via E16 fold, v7.2.0". Suite: pre-compact-scan flake remains (perf-budget under load, passes isolated). All other tests green.
2026-04-29 14:44:41 +02:00 · 2026-04-29 14:44:41 +02:00 · 6073952b97
commit 6073952b97
parent f0a1d4024a
2 changed files with 15 additions and 3 deletions
--- a/plugins/llm-security/knowledge/attack-scenarios.json
+++ b/plugins/llm-security/knowledge/attack-scenarios.json
@ -881,7 +881,7 @@
        },
        {
          "id": "UNI-003",
-          "name": "Homoglyph Cyrillic-Latin mixing (MEDIUM)",
+          "name": "Homoglyph Cyrillic-Latin mixing (now blocked via E16 fold, v7.2.0)",
          "input": {
            "session_id": "red-team",
            "message": {
@ -890,8 +890,8 @@
            }
          },
          "expect": {
-            "exit_code": 0,
+            "exit_code": 2,
-            "stdout_match": "MEDIUM"
+            "stdout_match": "block"
          }
        },
        {
--- a/plugins/llm-security/scanners/lib/string-utils.mjs
+++ b/plugins/llm-security/scanners/lib/string-utils.mjs
@ -451,11 +451,23 @@ const HOMOGLYPH_MAP = Object.freeze({
 * Norwegian/Polish/German/etc. text is NOT affected — characters like
 * æ, ø, å, é, ñ, ü, ö, ä are not in HOMOGLYPH_MAP.
 *
 * Performance: pure-ASCII inputs short-circuit before NFKC, since NFKC is
 * a no-op on ASCII and HOMOGLYPH_MAP only contains non-ASCII keys.
 * scanForInjection calls this on every scan; the fast-path keeps the
 * common-case overhead near zero.
 *
 * @param {string} s
 * @returns {string}
 */
 export function foldHomoglyphs(s) {
  if (!s) return s;
  // Fast path: pure ASCII has nothing to fold and NFKC is identity.
  // charCodeAt is cheaper than iterating codepoints.
  let asciiOnly = true;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 127) { asciiOnly = false; break; }
  }
  if (asciiOnly) return s;
  const normalized = s.normalize('NFKC');
  let out = '';
  for (const ch of normalized) {