fix(linkedin-studio): S12 — close S11 re-review + render-chain-propagation lint guard
Closes the 2 grep-verified findings from the S11 cold full-brief re-review
(docs/remediation/review.md, BLOCK 1/0/1/0, 0 dropped). Both were the NEXT
RING of the meta-class S10/S11 converged: a propagation miss — the fix had
landed where the SC named the file, not in the render-source it depends on.
BLOCKER (command->reference propagation): references/ab-testing-framework.md:166
still shipped the banned A/B "Significant? (>20%)" Yes/No verdict column while
commands/ab-test.md (which RENDERS from it, inlined at :30, presented at :69)
had been cleaned to the honest "Directional?" framing. Re-framed the reference
result template to match the command verbatim (header + the directional note)
and retuned :38 "20% significance threshold" -> "minimum-meaningful-difference
threshold". The whole render chain is now significance-verdict-free.
MINOR ($-replacement, class-closed not line-patched): the newest-first section
appends/rewrites in hooks/scripts/state-updater.mjs passed a replacement STRING
embedding untrusted user content to String.replace, so dollar-sequences
($1 / $& / dollar-backtick / dollar-apostrophe / $$) in a topic/hook/partner
(e.g. "$100 budget cut") re-injected the captured heading and dropped
characters, silently corrupting state. Converted all 5 content-bearing sites
(Recent Posts, prune, Milestone Log, First-Hour, Outreach) to replacement
FUNCTIONS; the 3 remaining $1 sites only interpolate date scalars. +4
$-bearing regression tests — incl. the prune fixture, which itself had to
switch to a function (the bug bit the fixture as it was being written).
META (generalize the lint to the propagation-miss class): new test-runner.sh
Section 11 — render-chain propagation guard. Forbids the significance-verdict
column (Significant? adjacent to "(" or a table pipe) across the WHOLE render
chain (commands + every inlined reference + adjacent surfaces), with a
permanent non-vacuity self-test (3 verdict forms caught, 6 legitimate
Significant/significance/Directional? forms ignored) and an e2e mutation-proof.
Generalizes S10/S11's "fix the class, not the line" to command->reference.
Pre-patch render-chain sweep confirmed ab-testing-framework.md was the SOLE
propagation survivor (so a 6th review finds no 3rd). test-runner.sh 70/0/0;
node --test 98/98. CLAUDE.md lint enumeration synced.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
433c2efb3d
commit
36f79dd702
5 changed files with 183 additions and 10 deletions
|
|
@ -1,6 +1,6 @@
|
|||
# LinkedIn Studio Plugin (v4.0.0)
|
||||
|
||||
Full-spectrum LinkedIn content engine — short-form feed posts, carousels, video scripts, and long-form newsletter editions — with the 2026 relevance-ranking model baked in. **v4.0.0** is an **audit-remediation release (Voyage Phase 0–3)**: a critical self-review found overclaiming (tracking/analytics/review-independence the plugin couldn't deliver), dormant capability (11 agents never invoked by any command), and structural rot (a dead lint, a self-contradicting algorithm claim, an unpublishable model brand/date in user copy). The fix wires **all 11 orphaned agents** (no deletions → 19 agents), adds **`/linkedin:firsthour`** (→ 27 commands) + a short-form de-AI gate + a video quality gate, promotes `post-feedback-monitor` to Opus, makes the newsletter-distribution / profile-SEO / outreach surfaces honest, **reconciles the algorithm signals to one sourced statement** (no model name or date; `references/algorithm-signals-reference.md` is the single source of truth), fixes the analytics fresh-clone crash, closes the voice-profile leak (placeholder + sentinel + gitignore), and rebuilds the structure lint with version/count/stat/model-consistency guards (each agent's frontmatter model must match every surface declaration). Breaking — reinstall/reload required for the newly-wired agents; consolidates the v3.0.0 identity break (slug, agent namespace, state-file path). v2.0.0 consolidated the surface (27 commands → 24, 16 agents → 14) while adding the long-form `/linkedin:newsletter` orchestrator + two longform-quality gate agents (`fact-checker`, `persona-reviewer`). v2.1.0 added two gates BEFORE prose (Step 2.5 skeleton + Step 3a spine prose) + a third `persona-reviewer` mode (`skjelett`). v2.2.0 hardened the longform gates with the lessons from the second production run (Seres-serien): blocking persona hard-fails, a post-cutoff fact-check mandate, a `voice-scrubber` agent, render+annotate operator gates, and STATE.md-reconciled edition state. v2.3.0 made **visual assets an explicit pipeline phase** — Step 7.5 (visual-assets) between annotation (Step 7) and lock (Step 8): cover (+ optional inline figures) or a carousel deck, generated (default `mcp-image`, external `cover-raw.png` accepted) and operator-gated BEFORE lock so `render/build-linkedin.mjs` picks up `cover.png` at lock without a post-lock re-render. **v2.4.0** makes an **editor's craft gate an explicit pipeline phase** — new **Step 5.5 (editorial-review)** between fact-check (Step 5) and the persona sweep (Step 6): a new **`editorial-reviewer` agent** (Opus) judges **craft** (prosa-håndverk + narrativ-arkitektur), not reader-response, returning ≤10 flags (BLOCK/REWORK/NICE) as direction, **operator-gated via `SendUserFile` BEFORE the persona sweep** so the personas measure resonance instead of stumbling on craft noise. Motivated by Del 4: every persona reported PASS, yet the editor found 8 fresh points on first reading, ~6/8 of them craft/architecture blind spots no agent measured. Mirrors the Maskinrommet writing-contract §C2. Pipeline 14 → 15 phases; agents 15 → 16; additive `editorialReview` state. Doc/orchestration-only for the wiring (the new agent + its fasit fixture + lint test are the only new files); commands unchanged (24). **v3.1.0 (Endring 9)** adds an **adversarial review package** run COLD on a frozen draft — new **Step 6.5 (headless-review)** between the persona sweep (Step 6) and lock, plus a standalone **`/linkedin:headless-review`** command (run in a fresh session for maximum isolation): three new headless archetypes — **`content-reviewer`** (argument integrity), **`language-reviewer`** (Norwegian language), **`fact-reviewer`** (cold re-verification incl. claims a late pivot bolted on) — plus `persona-reviewer` in resonance + conversion modes, all with NO drafting-session context (the independence layer the in-session gates structurally cannot be). v3.1.0 also adds **`/linkedin:pivot`** (re-opens cleared gates after a late change + a >20 %/>2-section pivot-detection gate at lock) and **per-artifact personas** (`articles.NN.personas` — one or more readers configurable per edition, resolved edition-state → series file → plugin library → interactive). Pipeline 15 → 16 phases; agents 16 → 19; commands 24 → 26; additive `personas` / `pivots` / `headlessReview` state. Motivated by Del 4: the in-session editor + persona sweep shared the drafting session's framing-bias, so the version that shipped was never independently re-reviewed.
|
||||
Full-spectrum LinkedIn content engine — short-form feed posts, carousels, video scripts, and long-form newsletter editions — with the 2026 relevance-ranking model baked in. **v4.0.0** is an **audit-remediation release (Voyage Phase 0–3)**: a critical self-review found overclaiming (tracking/analytics/review-independence the plugin couldn't deliver), dormant capability (11 agents never invoked by any command), and structural rot (a dead lint, a self-contradicting algorithm claim, an unpublishable model brand/date in user copy). The fix wires **all 11 orphaned agents** (no deletions → 19 agents), adds **`/linkedin:firsthour`** (→ 27 commands) + a short-form de-AI gate + a video quality gate, promotes `post-feedback-monitor` to Opus, makes the newsletter-distribution / profile-SEO / outreach surfaces honest, **reconciles the algorithm signals to one sourced statement** (no model name or date; `references/algorithm-signals-reference.md` is the single source of truth), fixes the analytics fresh-clone crash, closes the voice-profile leak (placeholder + sentinel + gitignore), and rebuilds the structure lint with version/count/stat/model-consistency + render-chain-propagation guards (each agent's frontmatter model must match every surface declaration; no honesty pattern a command was cleaned of survives in the reference it renders from — S12). Breaking — reinstall/reload required for the newly-wired agents; consolidates the v3.0.0 identity break (slug, agent namespace, state-file path). v2.0.0 consolidated the surface (27 commands → 24, 16 agents → 14) while adding the long-form `/linkedin:newsletter` orchestrator + two longform-quality gate agents (`fact-checker`, `persona-reviewer`). v2.1.0 added two gates BEFORE prose (Step 2.5 skeleton + Step 3a spine prose) + a third `persona-reviewer` mode (`skjelett`). v2.2.0 hardened the longform gates with the lessons from the second production run (Seres-serien): blocking persona hard-fails, a post-cutoff fact-check mandate, a `voice-scrubber` agent, render+annotate operator gates, and STATE.md-reconciled edition state. v2.3.0 made **visual assets an explicit pipeline phase** — Step 7.5 (visual-assets) between annotation (Step 7) and lock (Step 8): cover (+ optional inline figures) or a carousel deck, generated (default `mcp-image`, external `cover-raw.png` accepted) and operator-gated BEFORE lock so `render/build-linkedin.mjs` picks up `cover.png` at lock without a post-lock re-render. **v2.4.0** makes an **editor's craft gate an explicit pipeline phase** — new **Step 5.5 (editorial-review)** between fact-check (Step 5) and the persona sweep (Step 6): a new **`editorial-reviewer` agent** (Opus) judges **craft** (prosa-håndverk + narrativ-arkitektur), not reader-response, returning ≤10 flags (BLOCK/REWORK/NICE) as direction, **operator-gated via `SendUserFile` BEFORE the persona sweep** so the personas measure resonance instead of stumbling on craft noise. Motivated by Del 4: every persona reported PASS, yet the editor found 8 fresh points on first reading, ~6/8 of them craft/architecture blind spots no agent measured. Mirrors the Maskinrommet writing-contract §C2. Pipeline 14 → 15 phases; agents 15 → 16; additive `editorialReview` state. Doc/orchestration-only for the wiring (the new agent + its fasit fixture + lint test are the only new files); commands unchanged (24). **v3.1.0 (Endring 9)** adds an **adversarial review package** run COLD on a frozen draft — new **Step 6.5 (headless-review)** between the persona sweep (Step 6) and lock, plus a standalone **`/linkedin:headless-review`** command (run in a fresh session for maximum isolation): three new headless archetypes — **`content-reviewer`** (argument integrity), **`language-reviewer`** (Norwegian language), **`fact-reviewer`** (cold re-verification incl. claims a late pivot bolted on) — plus `persona-reviewer` in resonance + conversion modes, all with NO drafting-session context (the independence layer the in-session gates structurally cannot be). v3.1.0 also adds **`/linkedin:pivot`** (re-opens cleared gates after a late change + a >20 %/>2-section pivot-detection gate at lock) and **per-artifact personas** (`articles.NN.personas` — one or more readers configurable per edition, resolved edition-state → series file → plugin library → interactive). Pipeline 15 → 16 phases; agents 16 → 19; commands 24 → 26; additive `personas` / `pivots` / `headlessReview` state. Motivated by Del 4: the in-session editor + persona sweep shared the drafting session's framing-bias, so the version that shipped was never independently re-reviewed.
|
||||
|
||||
## Architecture
|
||||
|
||||
|
|
|
|||
|
|
@ -216,6 +216,25 @@ describe('updatePostTracking', () => {
|
|||
assert.ok(result.content.includes('- [2026-04-05] "AI governance is not about..."'));
|
||||
});
|
||||
|
||||
test('appends a $-bearing topic/hook verbatim — special replacement patterns are not interpreted', () => {
|
||||
// Regression (S12): the append used a replacement *string*, so `$1`/`$&`/`$\``/
|
||||
// `$'`/`$$` in user content were interpreted — a "$100 budget" topic re-injected
|
||||
// the captured `## Recent Posts` heading and dropped characters. The replacement
|
||||
// function inserts the content verbatim.
|
||||
const result = updatePostTracking(SAMPLE_STATE, {
|
||||
postDate: '2026-04-09',
|
||||
postTopic: '$100 budget — $& and $1 rule',
|
||||
hookText: 'We cut $1 of $5',
|
||||
charCount: 1200,
|
||||
format: 'post'
|
||||
});
|
||||
assert.notEqual(result, null);
|
||||
assert.ok(result.content.includes('$100 budget — $& and $1 rule'), 'topic with $-tokens must be inserted verbatim');
|
||||
assert.ok(result.content.includes('"We cut $1 of $5"'), 'hook with $-tokens must be inserted verbatim');
|
||||
const headings = result.content.match(/^## Recent Posts$/gm) || [];
|
||||
assert.equal(headings.length, 1, 'heading must not be re-injected by a $1/$& expansion');
|
||||
});
|
||||
|
||||
test('updates longest_streak when current exceeds it', () => {
|
||||
const highStreak = SAMPLE_STATE.replace('current_streak: 5', 'current_streak: 12');
|
||||
const result = updatePostTracking(highStreak, {
|
||||
|
|
@ -319,6 +338,34 @@ describe('pruneContentHistory', () => {
|
|||
assert.notEqual(result, null);
|
||||
assert.equal(result.pruned, 1);
|
||||
});
|
||||
|
||||
test('preserves a $-bearing kept entry verbatim while pruning — special replacement patterns are not interpreted', () => {
|
||||
// Regression (S12): the rewrite used `.replace(section, newSectionString)`; a
|
||||
// string search has no $1 group, but `$&` still expands to the whole matched
|
||||
// section and `$$` collapses to `$`, so a kept post like "$$ and $& budget"
|
||||
// corrupted state. The replacement function inserts newSection verbatim.
|
||||
const today = new Date();
|
||||
const old = new Date(today); old.setDate(old.getDate() - 100);
|
||||
const oldDate = old.toISOString().slice(0, 10);
|
||||
const recent = new Date(today); recent.setDate(recent.getDate() - 10);
|
||||
const recentDate = recent.toISOString().slice(0, 10);
|
||||
|
||||
// Build the fixture with a replacement FUNCTION too — a string replacement here
|
||||
// would itself interpret the `$$`/`$&` we are trying to plant (the very bug under
|
||||
// test), corrupting the fixture before pruneContentHistory ever sees it.
|
||||
const stateWithMix = SAMPLE_STATE.replace(
|
||||
'## Recent Posts\n\n',
|
||||
() => `## Recent Posts\n\n- [${oldDate}] "Old..." (1000) - drop me\n- [${recentDate}] "Saved $&100" (1200) - $$ and $& budget\n`
|
||||
);
|
||||
|
||||
const result = pruneContentHistory(stateWithMix, 90);
|
||||
assert.notEqual(result, null);
|
||||
assert.equal(result.pruned, 1);
|
||||
assert.ok(!result.content.includes(oldDate), 'old entry pruned');
|
||||
assert.ok(result.content.includes(`- [${recentDate}] "Saved $&100" (1200) - $$ and $& budget`), 'kept $-bearing entry survives verbatim');
|
||||
const headings = result.content.match(/^## Recent Posts$/gm) || [];
|
||||
assert.equal(headings.length, 1, 'section must not be duplicated by a $& expansion');
|
||||
});
|
||||
});
|
||||
|
||||
describe('updateFollowerCount', () => {
|
||||
|
|
@ -446,6 +493,24 @@ describe('recordFirstHourPlan', () => {
|
|||
assert.ok(section.includes('AI governance'));
|
||||
});
|
||||
|
||||
test('records $-bearing topic/targets/comments verbatim — special replacement patterns are not interpreted', () => {
|
||||
// Regression (S12): the section append used a replacement *string*; `$&`/`$1`/`$$`
|
||||
// in the topic/targets/comments were interpreted. The function inserts verbatim.
|
||||
const result = recordFirstHourPlan(TEMPLATE_STATE, {
|
||||
planDate: '2026-05-30 09:00',
|
||||
postTopic: '$100 launch & $& spend',
|
||||
targets: ['Whale: @big$voice ($1 ask)'],
|
||||
draftComments: ['Loved the $$ point and the $& follow-up'],
|
||||
plan: ['09:00 — live']
|
||||
});
|
||||
assert.notEqual(result, null);
|
||||
assert.ok(result.content.includes('$100 launch & $& spend'), 'topic with $-tokens verbatim');
|
||||
assert.ok(result.content.includes('Whale: @big$voice ($1 ask)'), 'target with $-tokens verbatim');
|
||||
assert.ok(result.content.includes('Loved the $$ point and the $& follow-up'), 'comment with $-tokens verbatim');
|
||||
const headings = result.content.match(/^## First-Hour Plans$/gm) || [];
|
||||
assert.equal(headings.length, 1, 'section must not be re-injected by a $1/$& expansion');
|
||||
});
|
||||
|
||||
test('no-anchor fall-through: neither last_firsthour_date nor last_post_date — scalar not written, not reported; section still appended', () => {
|
||||
// Exercises the else-fall-through of the date-scalar gate
|
||||
// (state-updater.mjs:225-231): with NO anchor field, the scalar cannot be
|
||||
|
|
@ -557,6 +622,25 @@ describe('recordOutreachContact', () => {
|
|||
assert.ok(section.includes('@bigvoice'));
|
||||
});
|
||||
|
||||
test('records $-bearing partner/stage/nextAction verbatim — special replacement patterns are not interpreted', () => {
|
||||
// Regression (S12): same class as the other section appends — replacement
|
||||
// function, not string, so `$&`/`$1`/`$$` in user content are inserted verbatim.
|
||||
const result = recordOutreachContact(TEMPLATE_STATE, {
|
||||
contactDate: '2026-05-30 14:00',
|
||||
track: 'collab',
|
||||
partner: '@big$voice & $&co',
|
||||
stage: 'pitched $100 deal',
|
||||
nextAction: 'send $$ quote, ref $1',
|
||||
dueDate: '2026-06-06'
|
||||
});
|
||||
assert.notEqual(result, null);
|
||||
assert.ok(result.content.includes('@big$voice & $&co'), 'partner with $-tokens verbatim');
|
||||
assert.ok(result.content.includes('pitched $100 deal'), 'stage with $-tokens verbatim');
|
||||
assert.ok(result.content.includes('send $$ quote, ref $1'), 'nextAction with $-tokens verbatim');
|
||||
const headings = result.content.match(/^## Outreach Pipeline$/gm) || [];
|
||||
assert.equal(headings.length, 1, 'section must not be re-injected by a $1/$& expansion');
|
||||
});
|
||||
|
||||
test('no-anchor fall-through: none of last_outreach_date/last_firsthour_date/last_post_date — scalar not written, not reported; section still appended', () => {
|
||||
// Exercises the else-fall-through of the date-scalar gate
|
||||
// (state-updater.mjs:284-293): with NONE of the three anchors, the scalar
|
||||
|
|
|
|||
|
|
@ -108,9 +108,14 @@ export function updatePostTracking(stateContent, { postDate, postTopic, hookText
|
|||
// 8. Append to Recent Posts section
|
||||
const hookPreview = hookText.length > 60 ? hookText.slice(0, 57) + '...' : hookText;
|
||||
const entry = `- [${postDate}] "${hookPreview}" (${charCount}) - ${postTopic}`;
|
||||
// Replacement FUNCTION, not string: `entry` embeds untrusted user content
|
||||
// (hookPreview, postTopic). In a replacement *string*, `$1`/`$&`/`` $` ``/`$'`/`$$`
|
||||
// are special, so a `$`-bearing topic (e.g. "$100 budget cut") would re-inject the
|
||||
// captured heading and drop characters, silently corrupting state. A function
|
||||
// inserts `entry` verbatim. (m === the whole captured heading.)
|
||||
content = content.replace(
|
||||
/^(## Recent Posts\n\n?)/m,
|
||||
`$1${entry}\n`
|
||||
(m) => `${m}${entry}\n`
|
||||
);
|
||||
changes.push(`Recent Posts += ${postDate} "${hookPreview.slice(0, 30)}..."`);
|
||||
|
||||
|
|
@ -154,7 +159,10 @@ export function pruneContentHistory(stateContent, maxAgeDays = 90) {
|
|||
if (pruned === 0) return null;
|
||||
|
||||
const newSection = kept.join('\n');
|
||||
const content = stateContent.replace(recentSection[1], newSection);
|
||||
// Replacement FUNCTION, not string: `newSection` is rebuilt from KEPT user
|
||||
// entries; with a string search, `$&`/`` $` ``/`$'`/`$$` in any kept post would be
|
||||
// interpreted and corrupt the rewrite. A function inserts it verbatim.
|
||||
const content = stateContent.replace(recentSection[1], () => newSection);
|
||||
return { content, pruned };
|
||||
}
|
||||
|
||||
|
|
@ -192,9 +200,12 @@ export function updateFollowerCount(stateContent, { count, month }) {
|
|||
|
||||
// Append to Milestone Log section
|
||||
const logEntry = `- [${month}] ${count} (${delta >= 0 ? '+' : ''}${delta})`;
|
||||
// Replacement FUNCTION, not string: same class as the other section appends.
|
||||
// `logEntry` is month + integers today (no `$`), but a function keeps the whole
|
||||
// append family uniform and `$`-safe by construction.
|
||||
content = content.replace(
|
||||
/^(## Milestone Log\n)/m,
|
||||
`$1${logEntry}\n`
|
||||
(m) => `${m}${logEntry}\n`
|
||||
);
|
||||
changes.push(`Milestone Log += ${month}`);
|
||||
|
||||
|
|
@ -251,7 +262,7 @@ export function recordFirstHourPlan(stateContent, { planDate, postTopic = '', ta
|
|||
|
||||
// 4. Append to ## First-Hour Plans (newest first); create the section if absent (additive)
|
||||
if (/^## First-Hour Plans\b/m.test(content)) {
|
||||
content = content.replace(/^(## First-Hour Plans\n\n?)/m, `$1${entry}\n`);
|
||||
content = content.replace(/^(## First-Hour Plans\n\n?)/m, (m) => `${m}${entry}\n`); // function, not string: entry embeds untrusted topic/targets/comments — `$`-safe
|
||||
} else {
|
||||
const trimmed = content.replace(/\s*$/, '');
|
||||
content = `${trimmed}\n\n## First-Hour Plans\n\n<!-- First-hour / reply-loop plans. Format: ### [YYYY-MM-DD HH:MM] topic -->\n\n${entry}\n`;
|
||||
|
|
@ -311,7 +322,7 @@ export function recordOutreachContact(stateContent, { contactDate, track = '', p
|
|||
|
||||
// 4. Append to ## Outreach Pipeline (newest first); create the section if absent (additive)
|
||||
if (/^## Outreach Pipeline\b/m.test(content)) {
|
||||
content = content.replace(/^(## Outreach Pipeline\n\n?)/m, `$1${entry}\n`);
|
||||
content = content.replace(/^(## Outreach Pipeline\n\n?)/m, (m) => `${m}${entry}\n`); // function, not string: entry embeds untrusted partner/stage/nextAction — `$`-safe
|
||||
} else {
|
||||
const trimmed = content.replace(/\s*$/, '');
|
||||
content = `${trimmed}\n\n## Outreach Pipeline\n\n<!-- Outreach contacts / pipeline rows, newest first. Written by /linkedin:outreach. -->\n<!-- Format: ### [YYYY-MM-DD HH:MM] partner — track -->\n\n${entry}\n`;
|
||||
|
|
|
|||
|
|
@ -35,7 +35,7 @@ This is NOT a true controlled experiment. Confounders include:
|
|||
- **External events:** Trending topics, holidays, and news affect feed behavior
|
||||
- **Network effects:** A new viral connection can skew reach mid-test
|
||||
|
||||
The 20% significance threshold (see Statistical Interpretation below) accounts for these confounders.
|
||||
The 20% minimum-meaningful-difference threshold (see Statistical Interpretation below) accounts for these confounders.
|
||||
|
||||
## What You Can Test (Variables)
|
||||
|
||||
|
|
@ -163,13 +163,15 @@ Use this template to record completed tests:
|
|||
- [List all controlled variables]
|
||||
|
||||
### Results
|
||||
| Metric | Variant A (Avg) | Variant B (Avg) | Difference | Significant? (>20%) |
|
||||
|--------|-----------------|-----------------|------------|---------------------|
|
||||
| Metric | Variant A (Avg) | Variant B (Avg) | Difference | Directional? |
|
||||
|--------|-----------------|-----------------|------------|--------------|
|
||||
| Impressions | X | X | X% | Yes/No |
|
||||
| Engagement Rate | X% | X% | X% | Yes/No |
|
||||
| Comments | X | X | X% | Yes/No |
|
||||
| Reposts | X | X | X% | Yes/No |
|
||||
|
||||
_"Directional?" = the gap clears the ~20% minimum-meaningful-difference AND points the same way across most posts. It is a direction to test further, not a statistically significant result._
|
||||
|
||||
### Individual Post Data
|
||||
| Post # | Variant | Date | Impressions | Reactions | Comments | Reposts | Eng. Rate |
|
||||
|--------|---------|------|-------------|-----------|----------|---------|-----------|
|
||||
|
|
|
|||
|
|
@ -12,7 +12,9 @@
|
|||
# tree) was added in remediation Step 3; the version-consistency grep in
|
||||
# Step 21; the agent model-consistency guard (each agents/<name>.md frontmatter
|
||||
# model: must match every surface declaration, and canonical rosters must list
|
||||
# every agent) in S11. All three are live below (Sections 8, 9 and 10).
|
||||
# every agent) in S11; the render-chain propagation guard (no honesty pattern a
|
||||
# command was cleaned of survives in the reference it renders from) in S12. All
|
||||
# four are live below (Sections 8, 9, 10 and 11).
|
||||
#
|
||||
# Usage: bash scripts/test-runner.sh
|
||||
# bash 3.2-safe: plain arrays only, no `declare -A`, no `mapfile`/`readarray`.
|
||||
|
|
@ -341,6 +343,80 @@ fi
|
|||
|
||||
echo ""
|
||||
|
||||
# --- Section 11: Render-Chain Propagation ---
|
||||
echo "--- Render-Chain Propagation ---"
|
||||
|
||||
# Commands render from the references they inline via
|
||||
# ${CLAUDE_PLUGIN_ROOT}/references/…. An honesty pattern removed from a command
|
||||
# surface must NOT survive in the reference that command renders from — otherwise
|
||||
# the user still hits it. Added in S12 after a cold full-brief review found the
|
||||
# banned A/B significance-verdict column (`Significant? (>20%)` with Yes/No cells)
|
||||
# still shipping in references/ab-testing-framework.md while commands/ab-test.md had
|
||||
# already been cleaned to the honest "Directional?" framing. The command-level fix
|
||||
# never propagated to its render-source, and Section 8's STALE_STATS grep targets
|
||||
# magnitudes, not this construct, so the survivor passed green. This generalizes the
|
||||
# fix from "clean the command" to "the banned construct is forbidden across the
|
||||
# WHOLE render chain (commands AND references)". Future propagation-class patterns
|
||||
# get appended to PROP_FORBIDDEN, mirroring how Section 8's STALE_STATS grew to the
|
||||
# full criterion rather than the single named token.
|
||||
#
|
||||
# Forbidden: the significance-VERDICT column — `Significant?` adjacent to a `(` (the
|
||||
# `(>20%)` verdict parenthetical) or a table pipe (`| Significant?`). The defect is a
|
||||
# column steering users to record a statistical-significance call that organic
|
||||
# personal-post volume never reaches; "directional" is the honest frame. Legitimate
|
||||
# descriptive prose ("Significantly higher", "Significant capability", "statistical
|
||||
# significance", a bare sentence-final "significant?") carries no `(`/`|`-adjacency
|
||||
# and is left alone.
|
||||
PROP_FORBIDDEN='Significant\?[[:space:]]*\(|\|[[:space:]]*Significant\?'
|
||||
# Non-vacuity self-test (mirrors Section 8): the criterion is only meaningful if it
|
||||
# MATCHES the verdict-column forms and IGNORES legitimate prose. Runs on every
|
||||
# invocation BEFORE the real scan, so weakening the pattern fails the suite instead
|
||||
# of silently certifying an unenforced guard. The positive set covers the exact S12
|
||||
# survivor + its bare-column variant; the negative set covers the honest
|
||||
# "Directional?" fix and every legitimate "Significant"/"significance" string the
|
||||
# tree carries today.
|
||||
PROP_SELFTEST_OK=1
|
||||
while IFS= read -r probe; do
|
||||
[ -z "$probe" ] && continue
|
||||
if ! echo "$probe" | grep -qE "$PROP_FORBIDDEN"; then
|
||||
PROP_SELFTEST_OK=0; echo " non-vacuity FAIL: forbidden form not caught -> $probe"
|
||||
fi
|
||||
done <<'PROP_POSITIVE'
|
||||
| Difference | Significant? (>20%) |
|
||||
Significant? (>20%)
|
||||
| Significant? |
|
||||
PROP_POSITIVE
|
||||
while IFS= read -r probe; do
|
||||
[ -z "$probe" ] && continue
|
||||
if echo "$probe" | grep -qE "$PROP_FORBIDDEN"; then
|
||||
PROP_SELFTEST_OK=0; echo " false-positive FAIL: legitimate form caught -> $probe"
|
||||
fi
|
||||
done <<'PROP_NEGATIVE'
|
||||
| Difference | Directional? (>20% gap) |
|
||||
Significantly higher weight than generic responses
|
||||
Significant capability breakthroughs
|
||||
Significantly Behind (<50%)
|
||||
LinkedIn analytics does not support statistical significance tests
|
||||
Is the difference significant? Probably not.
|
||||
PROP_NEGATIVE
|
||||
if [ "$PROP_SELFTEST_OK" -eq 1 ]; then
|
||||
pass "render-chain propagation self-test: 3 verdict-column forms caught, 6 legitimate forms ignored"
|
||||
else
|
||||
fail "render-chain propagation self-test failed — the guard no longer enforces the criterion"
|
||||
fi
|
||||
|
||||
# Real scan across the whole user-facing render chain (commands + every reference
|
||||
# they inline) plus the adjacent surfaces a copy could migrate the table into.
|
||||
PROP_HITS=$(grep -rnE "$PROP_FORBIDDEN" references/ commands/ skills/ hooks/prompts/ agents/ assets/templates/ assets/checklists/ 2>/dev/null || true)
|
||||
if [ -z "$PROP_HITS" ]; then
|
||||
pass "no significance-verdict column survives in any command or its render-source reference"
|
||||
else
|
||||
fail "significance-verdict column reintroduced — use the honest 'Directional?' framing (see commands/ab-test.md):"
|
||||
echo "$PROP_HITS"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# --- Summary ---
|
||||
echo "================================================"
|
||||
echo "RESULTS"
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue