refactor(entropy): B5 — two-stage context-classified suppression pipeline

The v7.0.0 entropy-scanner ran rules 11-13 (GLSL/CSS-in-JS/inline-markup
line-proximity suppressions) for every line regardless of file type. A
polyglot `.ts` file with an embedded fragment-shader template literal
could therefore mask a real high-entropy credential when the credential
literal happened to share a line with a GLSL keyword. Critical-review
B5 documented the false-negative class.

Refactor:

  * New `classifyFileContext(absPath, lines)` returns
    `'shader-dominant' | 'markup-dominant' | 'code-dominant' | 'mixed'`,
    keyed off file extension with a content-density fallback for
    code-extension files (≥50% of sampled non-blank lines matching
    GLSL/inline-markup → downgrade to `mixed`).

  * `isFalsePositive(str, line, absPath, context)` gates rules 11-13
    on `context !== 'code-dominant'`. Rules 1-10 and 14-19 still run
    unconditionally, so URL/path/test-fixture/ffmpeg/UA/SQL/error-
    template suppression behaves identically.

  * `scanFileContent` computes `fileContext` once per file and threads
    it through every per-string suppression check.
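
The gating described above can be sketched as follows. This is a minimal illustration, not the scanner's code: the rule tables and regexes are hypothetical stand-ins, and only the `isFalsePositive(str, line, absPath, context)` signature and the `context !== 'code-dominant'` condition come from this commit.

```javascript
// Hypothetical stand-ins for rules 11-13 (line-proximity suppressions).
// The real patterns live in the scanner and are not reproduced here.
const CONTEXT_RULES = [
  { id: 11, test: (line) => /\b(uniform|varying|vec[234]|gl_FragColor)\b/.test(line) },
  { id: 12, test: (line) => /\b(styled|css)\s*[.`(]/.test(line) },
  { id: 13, test: (line) => /<\/?[a-zA-Z][^>]*>/.test(line) },
];

// Hypothetical stand-in for rules 1-10 and 14-19, which are not gated.
const UNCONDITIONAL_RULES = [
  { id: 1, test: (str) => /^https?:\/\//.test(str) },
];

function isFalsePositive(str, line, absPath, context) {
  // Stage 1: unconditional rules run for every file context.
  if (UNCONDITIONAL_RULES.some((rule) => rule.test(str, line, absPath))) {
    return true;
  }
  // Stage 2: line-proximity rules run only when the file is not code-dominant.
  if (context !== 'code-dominant') {
    return CONTEXT_RULES.some((rule) => rule.test(line));
  }
  return false;
}
```

With this shape, a string sharing a line with `uniform vec3` is suppressed in a `mixed` file but reported in a `code-dominant` one, while a URL is suppressed in both.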

Conservative defaults to keep the regression surface minimal:

  * Files with fewer than 5 sampled non-blank lines fall back to `mixed`
    (preserves the existing rule-11/12/13 behaviour for the single-line
    .js fixtures used by entropy-context.test.mjs).
  * Unknown extensions fall back to `mixed`.
  * Code-extension files densely populated with shader/markup
    content fall back to `mixed`.
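
A rough sketch of how a classifier with these fallbacks might look. The extension sets, the regexes, `SAMPLE_LIMIT`, and the `extensionOf` helper are assumptions made for illustration. Only the function name, the four return values, the 5-line minimum, and the 50% density threshold are taken from this commit message.

```javascript
// Assumed extension sets and line patterns; the scanner's real lists may differ.
const SHADER_EXTS = new Set(['.glsl', '.frag', '.vert']);
const MARKUP_EXTS = new Set(['.html', '.svg', '.vue']);
const CODE_EXTS = new Set(['.js', '.mjs', '.ts', '.tsx', '.jsx']);
const GLSL_LINE = /\b(uniform|varying|attribute|precision|vec[234]|gl_FragColor|gl_Position)\b/;
const MARKUP_LINE = /<\/?[a-zA-Z][^>]*>/;

const SAMPLE_LIMIT = 200;      // assumed cap on sampled lines
const MIN_SAMPLE = 5;          // from the commit: fewer than 5 lines -> 'mixed'
const DENSITY_THRESHOLD = 0.5; // from the commit: >= 50% -> 'mixed'

// Hypothetical helper: lower-cased extension of a POSIX-style path.
function extensionOf(absPath) {
  const base = absPath.slice(absPath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

function classifyFileContext(absPath, lines) {
  const ext = extensionOf(absPath);
  if (SHADER_EXTS.has(ext)) return 'shader-dominant';
  if (MARKUP_EXTS.has(ext)) return 'markup-dominant';
  if (!CODE_EXTS.has(ext)) return 'mixed'; // unknown extension

  // Content-density fallback for code-extension files.
  const sample = lines.filter((l) => l.trim() !== '').slice(0, SAMPLE_LIMIT);
  if (sample.length < MIN_SAMPLE) return 'mixed';
  const hits = sample.filter((l) => GLSL_LINE.test(l) || MARKUP_LINE.test(l)).length;
  return hits / sample.length >= DENSITY_THRESHOLD ? 'mixed' : 'code-dominant';
}
```

Every uncertain case resolves to `mixed`, which keeps rules 11-13 active, so only files that are clearly code lose the line-proximity suppressions.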

Net effect: a `.ts` file with an embedded GLSL block but mostly TS
code on the surrounding lines now surfaces credentials that the
v7.0.0 line-proximity heuristic suppressed. Pure shader/markup
files are unaffected: they are either skipped at the extension gate
or keep rules 11-13 via the `mixed` default.

New fixture: tests/fixtures/entropy/polyglot-ts-with-glsl.ts (with
runtime placeholder so it does not commit a high-entropy literal).
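
One way the runtime substitution could work is sketched below. `randomPayload` and `materializeFixture` are hypothetical names, and the use of `Math.random` over a base64-style alphabet is an assumption; the test suite may generate its payload differently. Only the placeholder token itself comes from the fixture.

```javascript
// Placeholder token as it appears in the committed fixture.
const PLACEHOLDER = '__ENTROPY_PAYLOAD_PLACEHOLDER__';

// Assumed alphabet: base64 characters, so the payload resembles a credential.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper. Math.random is enough here because the payload only
// has to look random to the scanner; it is never used as a real secret.
function randomPayload(length = 48) {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return out;
}

// Hypothetical helper: swap every placeholder occurrence for the payload,
// producing the text that is written to a temporary directory at test time.
function materializeFixture(template, payload = randomPayload()) {
  return template.split(PLACEHOLDER).join(payload);
}
```

The committed file then contains only the low-entropy token, and the high-entropy string exists solely in the temporary copy the test scans.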

+3 tests in tests/scanners/entropy-context.test.mjs (26 → 29).
All existing tests in entropy.test.mjs and entropy-context.test.mjs
remain green. Full suite 1658 → 1661.

Refs: Batch B Wave 5 / Step 12 / v7.2.0
critical-review-2026-04-20.md §B5
Kjell Tore Guttormsen 2026-04-29 15:13:13 +02:00
commit 04f1593df3
3 changed files with 197 additions and 8 deletions


@@ -0,0 +1,32 @@
// Polyglot TypeScript fixture for the entropy-scanner B5 regression.
//
// Pre-B5 behaviour: rule 11 (GLSL_KEYWORDS line-proximity) suppressed any
// high-entropy string that happened to share a line with shader keywords.
// In a `.ts` file with an embedded fragment-shader template literal, a real
// credential on the closing brace line would be silently dismissed.
//
// Post-B5 behaviour: classifyFileContext returns 'code-dominant' for `.ts`
// files (unless at least 50% of the sampled non-blank lines match
// shader/markup patterns), which disables rules 11-13. The credential below
// is therefore detected.
//
// The placeholder __ENTROPY_PAYLOAD_PLACEHOLDER__ is replaced at test time
// with a randomly generated high-entropy string. The static fixture stays
// out of the pre-edit-secrets hook because no real high-entropy literal is
// committed to disk.
const fragmentShader = `
precision highp float;
uniform vec3 u_resolution;
uniform float u_time;
varying vec2 v_uv;
void main() {
  vec3 color = vec3(v_uv, sin(u_time));
  gl_FragColor = vec4(color, 1.0);
}
`;
// The next line carries the placeholder and, in its trailing comment, the
// GLSL tokens `uniform vec3`: exactly the kind of GLSL-adjacent line that
// rule 11 used to suppress.
const placeholder = "__ENTROPY_PAYLOAD_PLACEHOLDER__"; // uniform vec3 normal;
export { fragmentShader, placeholder };


@@ -262,4 +262,82 @@ describe('entropy-scanner context suppression (v7.0.0+)', () => {
      await rm(fx, { recursive: true, force: true });
    });
  });

  describe('D. B5 file-context classification (v7.2.0)', () => {
    it('B5 regression: code-dominant .ts file with embedded GLSL, credential adjacent to shader is detected', async () => {
      // Polyglot TS file: many code lines, a few GLSL lines inside a template
      // literal, and a credential-shaped string on a line that happens to
      // contain GLSL keyword tokens. Pre-B5 rule 11 line-proximity suppressed
      // this. Post-B5 classifyFileContext returns 'code-dominant' (sample is
      // mostly TS code, <50% GLSL/markup), rules 11-13 are gated off, and
      // the credential is detected.
      const fx = await newRoot('ent-b5-polyglot-');
      const fixtureContent = [
        'import { Renderer } from "./renderer";',
        '',
        'const fragmentShader = `',
        ' precision highp float;',
        ' uniform vec3 u_resolution;',
        ' varying vec2 v_uv;',
        '`;',
        '',
        '// Adjacent line carries GLSL tokens AND the credential payload.',
        'const blob = "' + PAYLOAD + '"; // uniform vec3 normal;',
        '',
        'export { fragmentShader, blob };',
      ].join('\n');
      await writeFixture(fx, 'shader-app.ts', fixtureContent);
      resetCounter();
      const discovery = await discoverFiles(fx);
      const result = await scan(fx, discovery);
      assert.ok(
        result.findings.length >= 1,
        'expected B5 to surface credential in code-dominant .ts despite GLSL neighbour; got ' + result.findings.length
      );
      await rm(fx, { recursive: true, force: true });
    });

    it('B5 control: legitimate .glsl file with high-entropy hash in shader source is still suppressed (extension skip)', async () => {
      // A pure-shader file is skipped at the file-extension gate, never
      // reaching classifyFileContext. This control confirms the extension
      // skip still works (B5 only changed line-level rule gating).
      const fx = await newRoot('ent-b5-glsl-');
      await writeFixture(fx, 'noise.glsl',
        'uniform vec3 u_seed;\nvec3 rand = vec3(' + PAYLOAD + ');\n');
      resetCounter();
      const discovery = await discoverFiles(fx);
      const result = await scan(fx, discovery);
      assert.equal(result.findings.length, 0, '.glsl files remain extension-skipped');
      await rm(fx, { recursive: true, force: true });
    });

    it('B5 control: shader-dominant .ts file with ≥50% GLSL lines downgrades to mixed and suppresses', async () => {
      // A code-extension file that is *mostly* shader template content:
      // rule 11 should still fire because classifyFileContext downgrades it
      // to 'mixed' (≥50% sampled lines match GLSL/INLINE_MARKUP).
      const fx = await newRoot('ent-b5-shader-ts-');
      const fixtureContent = [
        'uniform vec3 u_resolution;',
        'uniform vec3 u_camera_pos;',
        'uniform float u_time;',
        'varying vec2 v_uv;',
        'varying vec3 v_normal;',
        'attribute vec3 position;',
        'attribute vec2 uv;',
        'precision highp float;',
        'gl_Position = vec4(position, 1.0);',
        'gl_FragColor = vec4(1.0);',
        'const blob = "' + PAYLOAD + '"; // uniform vec3 normal;',
      ].join('\n');
      await writeFixture(fx, 'shader-heavy.ts', fixtureContent);
      resetCounter();
      const discovery = await discoverFiles(fx);
      const result = await scan(fx, discovery);
      assert.equal(
        result.findings.length, 0,
        'expected shader-dense .ts (≥50% GLSL lines) to downgrade to mixed and suppress; got ' + result.findings.length
      );
      await rm(fx, { recursive: true, force: true });
    });
  });
});