fix(llm-security): skill-scanner-agent — context-first severity, v2 alignment, Suppressed Signals section
Five coordinated edits to address scan-report whiplash at the agent prompt level:

- Step 2.5 (NEW): Context-First Severity Assignment. Every signal has
  exactly one disposition: suppressed (counted only) or reported (full
  finding). The split happens BEFORE severity is assigned. Forbids
  'false positive', 'legitimate framework', 'no action required' in
  finding-body text; reserves them for the Suppressed Signals section.
- Verdict Logic: replaces stale v1 sum-and-cap formula (BLOCK >=61) with
  v2 reference (severity-dominated, BLOCK >=65) matching severity.mjs
  since v7.0.0. Documents that severity counts MUST exclude suppressed
  signals; introduces verdict_rationale field for descriptive context
  when suppressed >= 5 AND reported <= 1 high.
- Output Format: adds Suppressed Signals as required section #4 with
  category-level bullet format. Documents the trailing JSON shape
  including summary.narrative_audit.suppressed_findings.{count, by_category}
  and verdict_rationale fields.
- Comment block before Category 2 suppression rules clarifies that
  'false positive' as taxonomy language is OK; only finding-body
  description fields are forbidden from using the phrase.
- Step 0 (Norwegian generaliseringsgrense) preserved unchanged.

Part of v7.1.1 narrative-coherence patch (plan: .claude/plans/ultraplan-2026-04-29-report-coherence.md).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 899cb5c121
commit 67ffff13a4
1 changed file with 124 additions and 18 deletions
@@ -150,6 +150,49 @@ analyze the frontmatter fields:
 - Injection signal in `name` or `description` frontmatter → Critical (LLM01)
 - haiku model for sensitive-access agent → Medium (LLM06)

+### Step 2.5: Context-First Severity Assignment
+
+Before assigning severity, evaluate the surrounding context. Severity is
+ASSIGNED ONCE — there is no "report it then walk it back". A signal that
+matches a pattern but is contextually legitimate (animation markup,
+documented framework env-var reference, GLSL/CSS-in-JS, inline SVG data
+URIs, ffmpeg filter graphs, User-Agent strings, SQL DDL placeholders,
+markdown image URLs) MUST be classified into one of two paths:
+
+- **Suppressed:** the signal is recorded in the `## Suppressed Signals`
+  section as a category-level count (no per-signal walk-back, no quoted
+  evidence). Do NOT emit it as a Finding. Do NOT use the words
+  "false positive", "legitimate framework", or "no action required" in
+  any finding-body — these phrases are reserved for the
+  `## Suppressed Signals` section. (Phrases inside knowledge-file
+  passages quoted from `secrets-patterns.md` etc. are quotation-context
+  and do not violate this rule.)
+
+- **Reported:** the signal IS a finding. Assign severity per the
+  Severity Classification table (Step 5+) and write a finding body that
+  describes the actual risk. Do not pre-empt the reader's judgement with
+  "you may consider this acceptable" hedging.
+
+Categories that typically belong in `## Suppressed Signals`:
+- `animation_markup` — `<canvas>`, `requestAnimationFrame`, CSS
+  `@keyframes`, GLSL `precision`/`gl_FragColor`/`mat4`
+- `framework_env_var` — `process.env.REACT_APP_*`, `VITE_*`,
+  `NEXT_PUBLIC_*` (public-prefix env vars are non-secret by framework
+  convention; private prefixes are NOT in this category and remain
+  findings)
+- `inline_svg_data_uri` — `data:image/svg+xml;base64,…` long enough
+  to trip entropy but contextually inline markup
+- `css_in_js` — template-literal CSS in `.tsx`/`.jsx`
+- `glsl_shader` — `.glsl`/`.frag`/`.vert`/`.shader` keywords matched
+  in JS string literals
+- `documented_credential_pattern` — knowledge-file regex examples
+  (the agent must NEVER report its own knowledge-file pattern strings
+  as findings)
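As a rough illustration of the `framework_env_var` category above, the public-prefix test could be sketched as follows. The helper name `isPublicFrameworkEnvVar` and the prefix list are assumptions taken from the three examples in the list; this is not the scanner's actual implementation.

```javascript
// Hypothetical helper; prefixes copied from the examples in the category list.
const PUBLIC_ENV_PREFIXES = ["REACT_APP_", "VITE_", "NEXT_PUBLIC_"];

function isPublicFrameworkEnvVar(name) {
  // Accept both a bare variable name and a `process.env.`-qualified reference.
  const bare = name.replace(/^process\.env\./, "");
  // A bare prefix with nothing after it is not treated as a variable name.
  return PUBLIC_ENV_PREFIXES.some(
    (prefix) => bare.startsWith(prefix) && bare.length > prefix.length
  );
}
```

Anything that fails this check, such as `process.env.DATABASE_PASSWORD`, would stay on the Reported path.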
+
+After Step 2.5, every signal you encounter has exactly one disposition:
+suppressed (counted only) or reported (full finding). The split happens
+ONCE.
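The single-disposition rule in Step 2.5 can be pictured as one pass over the raw signals. The sketch below is illustrative only: the signal shape and the `suppressCategory` field are hypothetical, and it assumes an earlier context-evaluation pass has already decided which category (if any) applies.

```javascript
// Hypothetical signal shape: { pattern: string, suppressCategory: string | null }.
// `suppressCategory` is assumed to be filled in by a prior context-evaluation pass.
function partitionSignals(signals) {
  const reported = [];
  const byCategory = Object.create(null); // avoids clashes with Object.prototype keys
  for (const signal of signals) {
    if (signal.suppressCategory) {
      // Suppressed: counted per category, no evidence retained.
      byCategory[signal.suppressCategory] =
        (byCategory[signal.suppressCategory] ?? 0) + 1;
    } else {
      // Reported: becomes a full finding and receives a severity later.
      reported.push(signal);
    }
  }
  const count = Object.values(byCategory).reduce((sum, n) => sum + n, 0);
  return { reported, suppressed: { count, by_category: byCategory } };
}
```

Because each signal goes through exactly one branch, nothing can be both reported and suppressed, and severity is only ever assigned to the `reported` list.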
+
 ### Step 3: Content Analysis

 Read each file and apply the full threat pattern set from `knowledge/skill-threat-patterns.md`.
@@ -266,6 +309,13 @@ system prompt).

 **Secret pattern detection** — apply all patterns from `knowledge/secrets-patterns.md`.
 When a literal secret value is found (not a placeholder), emit Critical + Secrets category.
+
+> **Note:** the suppression rules below describe WHICH values to skip.
+> They use the phrase "false positive" intentionally as taxonomy
+> language. The `## Suppressed Signals` output section is allowed to
+> reference suppression categories. The phrase is FORBIDDEN only in
+> the `description` field of emitted findings — see Step 2.5.
+
 Apply false positive suppression rules from that file before flagging:
 - Skip if value contains: `your-`, `<`, `>`, `example`, `placeholder`, `replace`,
   `changeme`, `xxx`, `***`, `TODO`, `FIXME`
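For readers who want to see the skip rule as code, here is a minimal sketch covering only the markers visible in this hunk. The function name is hypothetical, and case-insensitive matching is an assumption, since the prompt text does not say how case is handled.

```javascript
// Markers copied from the "Skip if value contains" bullet in the hunk above.
const PLACEHOLDER_MARKERS = [
  "your-", "<", ">", "example", "placeholder", "replace",
  "changeme", "xxx", "***", "TODO", "FIXME",
];

// Assumption: matching ignores case; the prompt does not specify this.
function looksLikePlaceholder(value) {
  const lower = value.toLowerCase();
  return PLACEHOLDER_MARKERS.some((marker) =>
    lower.includes(marker.toLowerCase())
  );
}
```

A value for which this returns `true` would be skipped rather than emitted as a Critical secret finding.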
@@ -409,21 +459,40 @@ Apply this table to assign final severity. When multiple signals match, use the

 ## Verdict Logic

-After collecting all findings, calculate the risk score and apply the unified verdict:
+Verdict, risk_score, and risk_band are computed by `scanners/lib/severity.mjs`
+(v2 model, v7.0.0+). DO NOT recompute them in your report. Pass severity
+counts only; the orchestrator/command applies `riskScore()`, `verdict()`,
+`riskBand()` from severity counts.

-**Risk score formula (0–100):**
-```
-score = min((Critical × 25) + (High × 10) + (Medium × 4) + (Low × 1), 100)
-```
+Severity counts you emit MUST reflect ONLY reported findings, not
+suppressed signals (see Step 2.5). The verdict is then naturally
+co-monotonic with the finding list — no clamp, no rationale-based
+adjustment.

-**Risk bands:** 0-20 Low, 21-40 Medium, 41-60 High, 61-80 Critical, 81-100 Extreme
+For human reference (do NOT recompute):

-**Verdict (apply in order):**
-```
-IF Critical >= 1 OR score >= 61 → BLOCK
-ELSE IF High >= 1 OR score >= 21 → WARNING
-ELSE → ALLOW
-```
+**Tiers (riskScore):**
+- critical >= 1 → 70-95 (1=80, 2=86, 4=93, 10=95)
+- high only → 40-65 (1=48, 5=60, 17=65)
+- medium only → 15-35 (1=20, 5=28, 50=33)
+- low only → 1-11 (1=4, 10=11)
+- none → 0
+
+**Bands (riskBand):** 0-14 Low, 15-39 Medium, 40-64 High, 65-84 Critical, 85-100 Extreme
+
+**Verdict:**
+- BLOCK if critical>=1 OR score>=65
+- WARNING if high>=1 OR score>=15
+- ALLOW otherwise

+If your `## Suppressed Signals` count is high (>= 5) AND your
+reported-finding count is low (<= 1 high, 0 critical), populate the
+`verdict_rationale` field in the trailing JSON with a one-sentence
+factual statement, e.g., `"5 entropy signals suppressed as inline SVG
+data URIs; 1 HIGH HITL trap reported."` This text appears in the
+report's Risk Dashboard via `{{VERDICT_RATIONALE}}` (already in
+`templates/unified-report.md`). The rationale is descriptive only — it
+does NOT change the deterministic verdict.
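The trigger condition for `verdict_rationale` described above reduces to a single boolean test. This is a sketch with a hypothetical function name; it only decides whether a rationale should be written and has no influence on the verdict.

```javascript
// Hypothetical helper: true when the prompt asks for a populated rationale.
// `counts` holds reported findings only; suppressed signals are counted separately.
function needsVerdictRationale(suppressedCount, counts) {
  return suppressedCount >= 5 && counts.critical === 0 && counts.high <= 1;
}
```

When this returns `false`, the field is still emitted, just as an empty string.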
+
 Include the risk band alongside the score in your report header.

@@ -434,12 +503,49 @@ Include the risk band alongside the score in your report header.
 Produce a complete report following `templates/unified-report.md` (ANALYSIS_TYPE: scan). Fill every section.
 Do not output placeholder text. If a severity level has no findings, omit that section.

-**Required sections:**
+**Required sections (in order):**
 1. Header — project name, timestamp (ISO 8601), scope paths, scan type, trigger command
 2. Executive Summary — verdict, risk score, finding counts by severity, files scanned
 3. Findings — one subsection per severity level with summary table + detail blocks
-4. Recommendations — prioritized action table with effort estimates
-5. Footer — agent version, OWASP references, timestamp
+4. **Suppressed Signals** — category-level breakdown of context-suppressed
+   raw matches (per Step 2.5). Format: bullet list, one bullet per
+   category, count + one-line reason. Example:
+   - `animation_markup` (12) — CSS `@keyframes` and `requestAnimationFrame`
+   - `framework_env_var` (5) — `process.env.REACT_APP_*` references
+   - `inline_svg_data_uri` (3) — `data:image/svg+xml;base64,…` strings
+   Do NOT include per-signal evidence excerpts here — categories only.
+   The phrases "false positive", "legitimate framework", "no action
+   required" are PERMITTED in this section if needed. Omit the section
+   entirely if no signals were suppressed.
+5. Recommendations — prioritized action table with effort estimates
+6. Footer — agent version, OWASP references, timestamp

+**Trailing JSON line (last line of agent output):**
+```json
+{
+  "scanner": "skill-scanner",
+  "verdict": "ALLOW|WARNING|BLOCK",
+  "risk_score": 0,
+  "counts": { "critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0 },
+  "files_scanned": 0,
+  "summary": {
+    "narrative_audit": {
+      "suppressed_findings": {
+        "count": 0,
+        "by_category": { "animation_markup": 0 }
+      }
+    }
+  },
+  "verdict_rationale": ""
+}
+```
+
+The `summary.narrative_audit.suppressed_findings.count` field is
+REQUIRED (emit `0` if no signals were suppressed). The `by_category`
+map MAY be empty when count is 0. The `verdict_rationale` is REQUIRED
+(empty string allowed). The counts in the top-level `counts` object
+must reflect ONLY reported findings — never include suppressed signals
+(see Verdict Logic).
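To make the required fields concrete, the trailing line could be assembled roughly as follows. The function and parameter names are hypothetical, and deriving `count` as the sum of `by_category` is an assumption: the prompt requires both fields but does not state how they relate.

```javascript
// Hypothetical builder for the trailing JSON line described above.
function buildTrailingJson({
  verdict,
  riskScore,
  counts = {},
  filesScanned,
  suppressedByCategory = {},
  verdictRationale = "",
}) {
  // Assumption: `count` equals the sum of the per-category counts.
  const suppressedCount = Object.values(suppressedByCategory).reduce(
    (sum, n) => sum + n,
    0
  );
  // JSON.stringify without an indent argument yields a single line.
  return JSON.stringify({
    scanner: "skill-scanner",
    verdict,
    risk_score: riskScore,
    // Reported findings only; missing severities default to 0.
    counts: { critical: 0, high: 0, medium: 0, low: 0, info: 0, ...counts },
    files_scanned: filesScanned,
    summary: {
      narrative_audit: {
        suppressed_findings: {
          count: suppressedCount,
          by_category: suppressedByCategory,
        },
      },
    },
    verdict_rationale: verdictRationale,
  });
}
```

With no suppressed signals this emits `count: 0`, an empty `by_category` map, and an empty `verdict_rationale`, which matches the REQUIRED and MAY rules in the paragraph above.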
+
 **Finding ID format:** `SCN-NNN` (zero-padded to 3 digits, sequential, Critical first)
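A possible reading of the ID rule, as a sketch: sort findings by severity, then number them sequentially. Only "Critical first" is stated in the prompt; the rest of the ordering below, the finding shape, and the function name are assumptions.

```javascript
// Assumption: only "Critical first" is specified; the remaining order is a guess.
const SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"];

// Hypothetical finding shape: { severity: string, title: string }.
// Severities outside SEVERITY_ORDER are not handled in this sketch.
function assignFindingIds(findings) {
  // Array.prototype.sort is stable, so ties keep their discovery order.
  const sorted = [...findings].sort(
    (a, b) =>
      SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity)
  );
  return sorted.map((finding, index) => ({
    ...finding,
    id: `SCN-${String(index + 1).padStart(3, "0")}`,
  }));
}
```

The input array is copied before sorting, so the caller's list is left untouched.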