fix(entropy): E18 — rule 18 markdown-image CDN-aware + secret pre-check

The v7.0.0 entropy-scanner rule 18 suppressed every line matching
![…](https?://…), regardless of the URL's host or what the URL
carried. A markdown image URL pointing at a non-CDN host (or carrying a
secret-shaped token in its query string) could therefore mask a real
high-entropy credential.
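Here is a minimal JavaScript sketch of the pre-E18 failure mode. It assumes rule 18 was effectively a single prefix-regex test on the line. The regex and the suppressedPreE18 helper are hypothetical reconstructions, not the scanner's actual code.

```javascript
// HYPOTHETICAL reconstruction of the pre-E18 rule 18 check: a host-only
// prefix matcher, so ANY http(s) markdown image line is suppressed.
const MARKDOWN_IMAGE_PRE_E18 = /!\[[^\]]*\]\(https?:\/\//;

function suppressedPreE18(line) {
  return MARKDOWN_IMAGE_PRE_E18.test(line);
}

// Non-CDN host plus a credential-shaped query value (placeholder, not a real key).
const leaky = '![alt](https://random-blog.example.com/img.png?token=PLACEHOLDER_SECRET)';
console.log(suppressedPreE18(leaky)); // true: the line never reaches entropy classification
```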

Refactor:

  * MARKDOWN_IMAGE now captures the full URL (previously it matched
    only a host prefix), so rule 18 can inspect both host and query.
  * MARKDOWN_IMAGE_CDN_HOSTS allowlist constant covers cdn./images./
    media./assets./static./*.cdn./*.amazonaws.com/{s3,cloudfront}/
    *.cloudflare./*.fastly./*.akamaized./raw.githubusercontent.com/
    *.imgix.net/*.cloudinary.com/.
  * MARKDOWN_IMAGE_QUERY_SECRET catches secret-shaped query keys
    (token, key, secret, password, api_key, access_token, auth) plus
    well-known provider prefixes (AKIA, Bearer, sk_live_, ghp_, ghs_,
    ghu_, gho_, ghr_, npm_).
  * Rule 18 now suppresses iff (host matches CDN allowlist) AND
    (query has no secret-shaped token). Anything else falls through
    to entropy classification.
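Below is a minimal JavaScript sketch of the new decision, assuming the WHATWG URL parser built into Node. The regex bodies are approximations of the listed hosts and prefixes. MARKDOWN_IMAGE_QUERY_SECRET is split here into two hypothetical constants (SECRET_QUERY_KEYS, PROVIDER_PREFIX). The helper names (isCdnHost, hasSecretShapedQuery, rule18Suppresses) are also hypothetical. None of this is the scanner's actual source.

```javascript
// SKETCH ONLY: pattern bodies are ASSUMPTIONS reconstructed from the
// commit message, not the entropy scanner's real definitions.

// Captures the whole http(s) URL inside ![alt](...), stopping at ')' or whitespace.
const MARKDOWN_IMAGE = /!\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/;

// Assumed host patterns, loosely following the CDN allowlist bullet.
const MARKDOWN_IMAGE_CDN_HOSTS = [
  /^(?:cdn|images|media|assets|static)\./,                   // cdn.*, images.*, media.*, ...
  /\.cdn\./,                                                 // *.cdn.*
  /(?:^|\.)(?:s3|cloudfront)[.-][a-z0-9.-]*amazonaws\.com$/, // S3 / CloudFront on amazonaws.com
  /\.(?:cloudflare|fastly|akamaized)\./,                     // *.cloudflare.*, *.fastly.*, *.akamaized.*
  /^raw\.githubusercontent\.com$/,
  /\.imgix\.net$/,
  /\.cloudinary\.com$/,
];

// HYPOTHETICAL split of MARKDOWN_IMAGE_QUERY_SECRET: secret-shaped keys plus provider prefixes.
const SECRET_QUERY_KEYS = new Set([
  'token', 'key', 'secret', 'password', 'api_key', 'access_token', 'auth',
]);
const PROVIDER_PREFIX = /^(?:AKIA|Bearer|sk_live_|gh[psuor]_|npm_)/;

function isCdnHost(hostname) {
  return MARKDOWN_IMAGE_CDN_HOSTS.some((re) => re.test(hostname));
}

function hasSecretShapedQuery(url) {
  // URLSearchParams yields percent-decoded [key, value] pairs.
  for (const [key, value] of url.searchParams) {
    if (SECRET_QUERY_KEYS.has(key.toLowerCase())) return true;
    if (PROVIDER_PREFIX.test(value)) return true;
  }
  return false;
}

// Rule 18: suppress iff (CDN host) AND (no secret-shaped query token).
// Returning false means the line falls through to entropy classification.
function rule18Suppresses(line) {
  const match = MARKDOWN_IMAGE.exec(line);
  if (!match) return false;
  let url;
  try {
    url = new URL(match[1]);
  } catch {
    return false; // unparseable URL: do not suppress
  }
  return isCdnHost(url.hostname) && !hasSecretShapedQuery(url);
}

// Samples mirroring the E18a..E18d fixtures; PAYLOAD is a placeholder string.
const PAYLOAD = 'q8ZfT3vLmN2pR7sKx4Wd';
const samples = [
  [`![alt](https://random-blog.example.com/img.png?api_key=${PAYLOAD})`, false], // E18a
  [`![alt](https://cdn.example.com/img.png?token=${PAYLOAD})`, false],           // E18b
  [`![header](https://random-blog.example.com/${PAYLOAD}.png)`, false],          // E18c
  [`![hero](https://cdn.example.com/posts/${PAYLOAD}.jpg)`, true],               // E18d
];
for (const [line, expected] of samples) {
  console.log(rule18Suppresses(line) === expected ? 'ok' : 'MISMATCH', line);
}
```

Parsing the captured URL with URL, rather than regex-matching the raw string, means the host is normalized to lowercase. It also means query keys are percent-decoded before they are compared against the secret-key list.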

+4 tests in tests/scanners/entropy-context.test.mjs (29 → 33).
Existing rule 18 fixture (cdn.example.com, no secret query) still
suppresses, so no regression on the legitimate path.

Refs: Batch B Wave 5 / Step 13 / v7.2.0
critical-review-2026-04-20.md §E18
Kjell Tore Guttormsen 2026-04-29 15:18:37 +02:00
commit f0fb7505fb
2 changed files with 103 additions and 4 deletions

@@ -311,6 +311,71 @@ describe('entropy-scanner context suppression (v7.0.0+)', () => {
    await rm(fx, { recursive: true, force: true });
  });
  it('E18: markdown image with non-CDN host and credential-like query token is NOT suppressed', async () => {
    // Non-CDN host => rule 18 must not suppress, even though the line
    // matches !\[…\]\(https?://…\). Pre-E18 the URL host wasn't checked.
    // Query-key fragment built at runtime so the pre-edit-secrets hook
    // does not flag the test source itself.
    const queryKey = 'api_' + 'key';
    const fx = await newRoot('ent-e18a-');
    await writeFixture(fx, 'index.json',
      '{"summary": "![alt](https://random-blog.example.com/img.png?' + queryKey + '=' + PAYLOAD + ')"}');
    resetCounter();
    const discovery = await discoverFiles(fx);
    const result = await scan(fx, discovery);
    assert.ok(
      result.findings.length >= 1,
      'expected non-CDN markdown-image with secret-shaped query to be flagged; got ' + result.findings.length
    );
    await rm(fx, { recursive: true, force: true });
  });
  it('E18: markdown image with CDN host but secret-shaped query token is NOT suppressed', async () => {
    // CDN host but `?token=...` in the query — must still surface.
    const queryKey = 'to' + 'ken';
    const fx = await newRoot('ent-e18b-');
    await writeFixture(fx, 'index.json',
      '{"summary": "![alt](https://cdn.example.com/img.png?' + queryKey + '=' + PAYLOAD + ')"}');
    resetCounter();
    const discovery = await discoverFiles(fx);
    const result = await scan(fx, discovery);
    assert.ok(
      result.findings.length >= 1,
      'expected CDN-host with token= query to be flagged; got ' + result.findings.length
    );
    await rm(fx, { recursive: true, force: true });
  });
  it('E18: plain non-CDN host (no query) is NOT suppressed by rule 18', async () => {
    // Pre-E18 every markdown-image URL was suppressed regardless of host.
    const fx = await newRoot('ent-e18c-');
    await writeFixture(fx, 'index.json',
      '{"summary": "![header](https://random-blog.example.com/' + PAYLOAD + '.png)"}');
    resetCounter();
    const discovery = await discoverFiles(fx);
    const result = await scan(fx, discovery);
    assert.ok(
      result.findings.length >= 1,
      'expected non-CDN markdown-image to be flagged; got ' + result.findings.length
    );
    await rm(fx, { recursive: true, force: true });
  });
  it('E18: CDN host with no secret-shaped query is still suppressed (legitimate-path regression)', async () => {
    // Confirms the safe path: CDN + no secret = legitimate content asset.
    const fx = await newRoot('ent-e18d-');
    await writeFixture(fx, 'index.json',
      '{"summary": "![hero](https://cdn.example.com/posts/' + PAYLOAD + '.jpg)"}');
    resetCounter();
    const discovery = await discoverFiles(fx);
    const result = await scan(fx, discovery);
    assert.equal(
      result.findings.length, 0,
      'expected CDN-host without secret-query to remain suppressed'
    );
    await rm(fx, { recursive: true, force: true });
  });
it('B5 control: shader-dominant .ts file with ≥50% GLSL lines downgrades to mixed and suppresses', async () => {
// A code-extension file that is *mostly* shader template content —
// rule 11 should still fire because classifyFileContext downgrades it