---
name: A/B Testing Framework for LinkedIn Content
version: 1.7.0
description: Methodology for systematic content experimentation on LinkedIn, including test design, variable isolation, statistical interpretation, and learning documentation.
---

# A/B Testing Framework for LinkedIn Content

Systematic experimentation methodology for LinkedIn thought leadership. Since LinkedIn provides no native A/B testing, this framework uses sequential posting with controlled variables to generate actionable content insights.

## Why A/B Test on LinkedIn?

### The Problem

Most content creators rely on gut feeling to decide what works. They notice a post "did well" but can't explain why, or they copy what worked once without understanding the variable that drove performance.

### The Approach

LinkedIn does not offer native A/B testing. Instead, we use manual A/B testing through sequential posting: publish Variant A and Variant B across comparable time windows, holding all other variables constant, and compare metrics.

### Goals

- Replace gut-feeling decisions with systematic learning
- Build a personal dataset of what works for YOUR audience
- Compound small improvements over time (5% better each month compounds to roughly 80% better per year, since 1.05^12 ≈ 1.80)
- Identify high-impact levers specific to your niche and follower level

### Limitations

This is NOT a true controlled experiment. Confounders include:

- **Audience variance:** Different people see each post
- **Time variance:** Algorithm state and user behavior shift day to day
- **Algorithm shifts:** LinkedIn updates ranking signals periodically
- **External events:** Trending topics, holidays, and news affect feed behavior
- **Network effects:** A new viral connection can skew reach mid-test

The 20% significance threshold (see Statistical Interpretation below) accounts for these confounders.

## What You Can Test (Variables)

Organized by impact level. Always start with high-impact variables.

### High-Impact Variables

| # | Variable | What to Test | Why It Matters |
|---|----------|--------------|----------------|
| 1 | Hook/Opening line | Question vs. statement, personal vs. universal, short vs. long (within the 110-140 character limit) | Determines whether anyone clicks "see more." Single biggest driver of impressions. |
| 2 | Post format | Text-only vs. carousel vs. poll vs. video vs. document | Format multipliers range from 1.17x (text) to 1.6x (carousel). Audience preference varies. |
| 3 | Content angle | Story-based vs. tactical vs. contrarian vs. curation | Angle determines comment quality and engagement depth. |
| 4 | Call-to-action | Question vs. invitation vs. challenge vs. none | CTA drives comments (the strongest algorithm signal after saves). |

### Medium-Impact Variables

| # | Variable | What to Test | Why It Matters |
|---|----------|--------------|----------------|
| 5 | Post length | Short (500 chars) vs. standard (1,200-1,800) vs. long (2,500+) | Optimal range is 1,200-1,800 characters, but audience tolerance varies. |
| 6 | Posting time | Morning (7-9 AM) vs. lunch (11 AM-1 PM) vs. evening (5-7 PM) | First-hour velocity depends on when your audience is online. |
| 7 | Posting day | Tue/Wed/Thu (proven best) vs. Mon/Fri vs. weekend | Day affects the available audience pool. |
| 8 | Visual elements | With image vs. without, custom graphic vs. photo | Visuals affect scroll-stop but may not affect engagement rate. |

### Low-Impact Variables (Test Last)

| # | Variable | What to Test | Why It Matters |
|---|----------|--------------|----------------|
| 9 | Hashtag count | 0 vs. 3 vs. 5 | Diminishing returns; 5+ triggers a -68% penalty. |
| 10 | First comment | With vs. without, link vs. context vs. question | First-comment strategy can boost or confuse engagement. |
| 11 | Emoji usage | None vs. minimal vs. heavy | Audience-dependent; professional audiences may penalize heavy use. |
| 12 | Line spacing | Dense vs. airy | Readability matters on mobile, but the effect is subtle. |

## Test Design Methodology

### The Sequential A/B Method

1. **Hypothesis:** "Changing [variable] from [A] to [B] will increase [metric] by [amount]"
2. **Control (A):** Your current approach (baseline)
3. **Variant (B):** A single changed variable
4. **Sample size:** Minimum 3 posts each (6 total) for any confidence
5. **Timing:** Alternate A and B across the same days and times to minimize confounders
6. **Duration:** Run the test over 2-3 weeks minimum
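
As a rough illustration, the plan itself can be captured in a small data structure before any posting begins. The sketch below is a minimal, hypothetical Python example; the `ABTest` and `ScheduledPost` names are illustrative and not part of any tooling in this plugin:

```python
from dataclasses import dataclass, field

@dataclass
class ScheduledPost:
    week: str      # e.g. "W05"
    day: str       # e.g. "Tue"
    time: str      # e.g. "08:00"
    variant: str   # "A" or "B"
    hook: str      # opening line used for this post

@dataclass
class ABTest:
    hypothesis: str                  # "Changing [variable] from [A] to [B] will increase [metric] by [amount]"
    variable: str                    # the single variable being changed
    control: str                     # description of Variant A (current baseline)
    variant: str                     # description of Variant B (the change)
    min_posts_per_variant: int = 3   # minimum sample size before drawing conclusions
    schedule: list[ScheduledPost] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True once both variants have reached the minimum sample size."""
        a = sum(1 for p in self.schedule if p.variant == "A")
        b = sum(1 for p in self.schedule if p.variant == "B")
        return min(a, b) >= self.min_posts_per_variant
```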

### Rules for Valid Testing

  1. Change ONLY ONE variable per test. If you change both hook style and post length, you cannot attribute the result to either.
  2. Keep all other elements as similar as possible. Same topics, same tone, same posting time.
  3. Post at similar times on similar days. A Tuesday 8 AM post vs. a Saturday 3 PM post is not a valid comparison.
  4. Don't test during unusual periods. Holidays, viral events, and algorithm updates introduce noise.
  5. Document everything. Memory is unreliable. Log every post, variant, and metric.
  6. Minimum 6 posts (3 per variant) before drawing conclusions. One post proves nothing.
  7. Wait 48-72 hours before measuring. LinkedIn's long-tail distribution (Stage 4) means early metrics can mislead.
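
To make rules 1, 3, and 6 concrete, here is a small, hypothetical pre-flight check sketched in Python. The field names (`variant`, `time`) and the function itself are assumptions for illustration, not part of any existing tooling:

```python
def validate_test_plan(posts: list[dict], changed_variables: list[str],
                       min_per_variant: int = 3) -> list[str]:
    """Return a list of problems with a planned sequential A/B test."""
    problems = []

    # Rule 1: change only one variable per test.
    if len(changed_variables) != 1:
        problems.append(f"Expected exactly one changed variable, got {changed_variables}")

    # Rule 6: minimum posts per variant before drawing conclusions.
    for variant in ("A", "B"):
        count = sum(1 for p in posts if p["variant"] == variant)
        if count < min_per_variant:
            problems.append(f"Variant {variant} has only {count} posts (need {min_per_variant})")

    # Rule 3: post at similar times on similar days (here: identical time slots).
    times = {p["time"] for p in posts}
    if len(times) > 1:
        problems.append(f"Posting times differ across the test: {sorted(times)}")

    return problems
```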

### Example Test Plan

**Hypothesis:** "Using a provocative question hook instead of a bold statement hook will increase engagement rate by 25%."

| Post # | Week | Day | Time | Variant | Hook Style |
|--------|------|-----|------|---------|------------|
| 1 | W05 | Tue | 8 AM | A (Statement) | "AI readiness is a leadership problem, not a technology problem." |
| 2 | W05 | Wed | 8 AM | B (Question) | "What if AI readiness has nothing to do with technology?" |
| 3 | W05 | Thu | 8 AM | A (Statement) | "Your data strategy is probably backwards." |
| 4 | W06 | Tue | 8 AM | B (Question) | "Why are we implementing AI before fixing our data?" |
| 5 | W06 | Wed | 8 AM | A (Statement) | "We need to stop calling them 'AI projects.'" |
| 6 | W06 | Thu | 8 AM | B (Question) | "Is your organization brave enough to wait on AI?" |

**Keep constant:** Post length (~1,500 chars), text-only format, AI/data topic, no external links, 3 hashtags, same CTA style.

## Statistical Interpretation (Simplified)

### Comparing Results

LinkedIn analytics does not support statistical tests. Use this simplified approach:

  1. Calculate average for each variant across all test posts
  2. Calculate the difference as a percentage: ((B - A) / A) * 100
  3. Apply the 20% rule: Only consider a result meaningful if the difference is >20%
  4. The 20% threshold accounts for LinkedIn's natural variability (algorithm state, audience online, timing, external events)
  5. Below 20% difference: The variable likely does not matter much for your audience. Focus elsewhere.
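
A minimal sketch of this calculation in Python, assuming you have already collected per-post values for one metric (the function name and example numbers are illustrative):

```python
def compare_variants(variant_a: list[float], variant_b: list[float],
                     threshold_pct: float = 20.0) -> dict:
    """Compare average metric values for two variants using the 20% rule."""
    avg_a = sum(variant_a) / len(variant_a)
    avg_b = sum(variant_b) / len(variant_b)

    # Difference of B relative to A, as a percentage: ((B - A) / A) * 100
    diff_pct = (avg_b - avg_a) / avg_a * 100

    return {
        "avg_a": avg_a,
        "avg_b": avg_b,
        "diff_pct": round(diff_pct, 1),
        # Only differences above the threshold count as meaningful
        "meaningful": abs(diff_pct) > threshold_pct,
    }

# Example: engagement rates (%) for three posts per variant
print(compare_variants([2.1, 1.8, 2.4], [3.0, 2.7, 3.4]))
# {'avg_a': 2.1, 'avg_b': 3.03..., 'diff_pct': 44.4, 'meaningful': True}
```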

### Metrics to Compare (Priority Order)

| Priority | Metric | Why |
|----------|--------|-----|
| 1 | Engagement rate | (reactions + comments + reposts) / impressions. Best single metric. |
| 2 | Comment count | Strongest algorithm signal. Drives extended distribution. |
| 3 | Impressions | Total reach. Shows distribution success. |
| 4 | Profile views generated | Business impact. Measures conversion interest. |
| 5 | Follower growth during test | Long-term value. Hard to attribute to a single test. |
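
For the priority-1 metric, the per-post calculation is simple enough to script; the helper below is a tiny illustrative sketch, not an existing API:

```python
def engagement_rate(reactions: int, comments: int, reposts: int, impressions: int) -> float:
    """Engagement rate as a percentage: (reactions + comments + reposts) / impressions."""
    if impressions == 0:
        return 0.0
    return (reactions + comments + reposts) / impressions * 100

# Example: 45 reactions, 12 comments, 3 reposts on 2,400 impressions
print(engagement_rate(45, 12, 3, 2400))  # 2.5
```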

### Interpreting Results

| Result Pattern | Interpretation | Action |
|----------------|----------------|--------|
| B wins in engagement, A wins in impressions | B resonates more deeply but A has broader reach | Consider audience targeting and post goals |
| Both similar (<20% diff) | Variable does not matter for your audience | Stop testing this variable, move to the next |
| B clearly wins (>30% diff) | Strong signal: adopt B as the new baseline | Update your content strategy |
| B wins in some posts, A in others | Inconsistent results, likely confounders | Extend the test with more posts or redesign |
| A consistently wins | Your current approach is better | Keep the baseline, test something else |
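
For a first-pass reading of a finished test, the decision rules above can be approximated in code. This is a rough, hypothetical sketch whose thresholds mirror the table; it is not a substitute for judgment about confounders:

```python
def interpret(avg_a: float, avg_b: float, per_post_winners: list[str]) -> str:
    """Classify a finished test for a single metric.

    per_post_winners: which variant won each paired comparison, e.g. ["B", "B", "A"].
    """
    diff_pct = (avg_b - avg_a) / avg_a * 100

    if abs(diff_pct) < 20:
        return "Variable does not matter for your audience; test something else"
    if len(set(per_post_winners)) > 1 and abs(diff_pct) < 30:
        return "Inconsistent results, likely confounders; extend or redesign the test"
    if diff_pct > 30:
        return "B clearly wins; adopt B as the new baseline"
    if diff_pct < -20:
        return "A consistently wins; keep the baseline and test something else"
    return "Directional signal only; gather more posts before acting"

print(interpret(2.1, 3.0, ["B", "B", "B"]))
# B clearly wins; adopt B as the new baseline
```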

### Confidence Levels

| Sample Size (per variant) | Max Confidence | Recommendation |
|---------------------------|----------------|----------------|
| 1-2 posts | Low | Not enough data. Do not draw conclusions. |
| 3-4 posts | Medium | Directional signal. Proceed cautiously. |
| 5-7 posts | High | Reliable signal if the difference is >20%. |
| 8+ posts | Very High | Strong foundation for strategy changes. |

## Learning Documentation Template

Use this template to record completed tests:

```markdown
## A/B Test: [Variable Tested]
**Hypothesis:** [What you expected]
**Test period:** [YYYY-WXX to YYYY-WXX]
**Posts per variant:** A: [X], B: [X]

### Variants
- **Variant A (Control):** [Description of current approach]
- **Variant B (Test):** [Description of change]

### What Was Kept Constant
- [List all controlled variables]

### Results
| Metric | Variant A (Avg) | Variant B (Avg) | Difference | Significant? (>20%) |
|--------|-----------------|-----------------|------------|---------------------|
| Impressions | X | X | X% | Yes/No |
| Engagement Rate | X% | X% | X% | Yes/No |
| Comments | X | X | X% | Yes/No |
| Reposts | X | X | X% | Yes/No |

### Individual Post Data
| Post # | Variant | Date | Impressions | Reactions | Comments | Reposts | Eng. Rate |
|--------|---------|------|-------------|-----------|----------|---------|-----------|
| 1 | A | YYYY-MM-DD | X | X | X | X | X% |
| 2 | B | YYYY-MM-DD | X | X | X | X | X% |
| ... | ... | ... | ... | ... | ... | ... | ... |

### Conclusion
[What we learned -- be specific and honest about confidence level]

### Action
[What changes to make going forward based on results]

### Follow-Up Test
[What to test next based on these learnings]
```

## Common Pitfalls

  1. Testing too many variables at once. If you change hook, format, AND length simultaneously, a positive result tells you nothing about which change mattered.

  2. Drawing conclusions from 1-2 posts. One post can go viral or flop for reasons unrelated to your variable. Minimum 3 posts per variant.

  3. Ignoring external factors. A post during a major industry event will outperform a post during a holiday weekend regardless of your variable. Note external context.

  4. Confirmation bias. You will see what you want to see. Let the numbers speak. If the difference is <20%, accept that the variable does not matter.

  5. Not documenting results. You will forget. Use the template above for every test, even inconclusive ones.

  6. Testing low-impact variables first. Spending weeks testing emoji usage while your hooks are weak wastes time. Start with Variable #1 (hooks).

  7. Never acting on results. The point of testing is to change your approach. If B wins, adopt B as your new baseline and test the next variable.

  8. Abandoning tests early. If post 1 and 2 both favor B, it is tempting to declare victory. Complete the minimum sample size.

  9. Not controlling timing. Posting Variant A on Tuesday morning and Variant B on Friday evening invalidates the comparison.

  10. Forgetting the baseline. Always know what your current averages are before starting a test. Without a baseline, "improvement" is meaningless.


Last updated: January 2026

Methodology adapted from growth marketing A/B testing principles, applied to LinkedIn's sequential posting model with adjustments for platform-specific confounders.