Reading Results

Your test has been running. Now it's time to look at the data, understand what it means, and decide what to do. This page explains every number on your results dashboard in plain English.

The short version: Wait for the green "Significant" badge, check the lift percentage, and ship the winner.

Watch: Understanding your test results

A walkthrough of the results dashboard — what every number means and how to make a decision.

Video coming soon

What you'll see in the dashboard

Each experiment's results page shows four things:

  1. Variant breakdown — conversion rates for each variant, side by side
  2. Daily chart — how conversion rates trend over time
  3. Statistical analysis — whether the result is real or could be luck
  4. Sample size — how many visitors saw each variant and how many converted

Results update automatically every 30 seconds.

Understanding the numbers

Conversion rate

The percentage of people who completed your goal out of everyone who saw the test:

Conversion Rate = (People who clicked / People who saw) × 100

Example: 1,000 people saw the blue button. 50 clicked it. Conversion rate = 5.0%.
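The formula is easy to check by hand or in code. A minimal sketch (the `conversionRate` helper is illustrative, not part of the CADENCE SDK):

```typescript
// Conversion rate as a percentage: conversions / exposures × 100.
// Illustrative helper, not a CADENCE SDK function.
function conversionRate(conversions: number, exposures: number): number {
  if (exposures === 0) return 0; // no traffic yet, avoid dividing by zero
  return (conversions / exposures) * 100;
}

// 50 clicks out of 1,000 views:
console.log(conversionRate(50, 1000)); // 5
```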

Lift percentage

How much better (or worse) one variant is compared to the other:

Lift = ((New variant rate - Original rate) / Original rate) × 100

Example: Your original button has a 5% conversion rate. The green button has a 6% conversion rate. Lift = +20%. That means 20% more people clicked the green button.

A positive lift means the new variant is winning. A negative lift means the original is better.
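As a quick sanity check, here is the same formula in code (an illustrative helper, not an SDK function):

```typescript
// Relative lift of a new variant over the original, as a percentage.
// Positive = new variant winning, negative = original winning.
function lift(originalRate: number, variantRate: number): number {
  return ((variantRate - originalRate) / originalRate) * 100;
}

console.log(lift(5, 6)); // 20: the green button converts 20% better
console.log(lift(5, 4)); // -20: the new variant is worse
```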

Statistical significance (is this result real or luck?)

This is the most important number. It answers: "Can I trust this result?"

Imagine flipping a coin 10 times and getting 7 heads. Is the coin rigged? Probably not — that kind of variation is normal. But if you flip it 1,000 times and get 700 heads, something is definitely going on. Statistical significance tells you which situation you're in.

What the dashboard shows:

| What you see | What it means | What to do |
|--------------|---------------|------------|
| Significant (p < 0.05) | The difference is almost certainly real | Trust the result — ship the winner |
| Not yet significant (p > 0.05) | Could still be luck — need more data | Keep the test running |
| No clear difference | Variants perform about the same | Keep the simpler option (usually control) |

You don't need to understand the math

The p-value is a statistical calculation that tells CADENCE how confident to be. A p-value below 0.05 means that if the variants actually performed the same, there would be less than a 5% chance of seeing a difference this large by random variation alone. You don't need to calculate anything — look for the "Significant" badge in the dashboard.
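For the curious, the badge is based on the kind of calculation a standard two-proportion z-test performs. A rough sketch (the dashboard does this for you; this is not CADENCE's actual implementation):

```typescript
// Two-proportion z-test sketch: is variant B's conversion rate really
// different from A's, or could the gap be random noise?
// Simplified illustration, not CADENCE's implementation.
function zScore(convA: number, expA: number, convB: number, expB: number): number {
  const pA = convA / expA;
  const pB = convB / expB;
  const pooled = (convA + convB) / (expA + expB); // combined conversion rate
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / expA + 1 / expB));
  return (pB - pA) / se;
}

// |z| > 1.96 corresponds to p < 0.05 (two-sided).
const z = zScore(500, 10000, 600, 10000); // 5% vs 6% with 10k exposures each
console.log(Math.abs(z) > 1.96); // true: this difference is significant
```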

Confidence intervals

The confidence interval shows the range of plausible values for the real lift. For example:

Lift: +12% (95% CI: +4% to +20%)

This means: "We're 95% confident the true improvement is somewhere between 4% and 20%."

  • Narrow range (like +10% to +14%) = very precise, lots of data
  • Wide range (like -5% to +30%) = still uncertain, need more data

If the range includes zero (like -2% to +15%), the result might not be real — one variant might not actually be better than the other.
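Checking whether an interval is conclusive is just a matter of seeing which side of zero it sits on (illustrative helper):

```typescript
// A lift confidence interval that includes zero means the new variant
// might not actually be better (or worse) than the original.
function isConclusive(ciLow: number, ciHigh: number): boolean {
  return ciLow > 0 || ciHigh < 0; // entire range on one side of zero
}

console.log(isConclusive(4, 20));  // true: +4% to +20% excludes zero
console.log(isConclusive(-2, 15)); // false: -2% to +15% includes zero
```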

Sample size

Two numbers matter:

  • Exposures — how many people saw each variant (from getVariant() calls)
  • Conversions — how many people completed the goal (from trackConversion() calls)

More data = more reliable results. If you only have 20 conversions per variant, the result could easily be luck.
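If it helps to see how the two counters relate, here is a hypothetical sketch with in-memory stubs standing in for the SDK calls; the real `getVariant()` and `trackConversion()` signatures may differ, so check the SDK reference:

```typescript
// Hypothetical in-memory stubs for the two SDK calls named above.
// Real getVariant()/trackConversion() signatures may differ.
const exposures = new Map<string, number>();
const conversions = new Map<string, number>();

function getVariant(experimentId: string, userId: string): string {
  // Each call counts one exposure for this experiment.
  exposures.set(experimentId, (exposures.get(experimentId) ?? 0) + 1);
  // Stub assignment: a crude 50/50 split on the user id.
  return userId.charCodeAt(userId.length - 1) % 2 === 0 ? "control" : "treatment";
}

function trackConversion(experimentId: string, userId: string): void {
  // Each call counts one goal completion.
  conversions.set(experimentId, (conversions.get(experimentId) ?? 0) + 1);
}

// One visitor sees the test, then converts:
getVariant("checkout-button-color", "user-42");
trackConversion("checkout-button-color", "user-42");
console.log(exposures.get("checkout-button-color"));   // 1
console.log(conversions.get("checkout-button-color")); // 1
```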

When to stop a test

Your test should run until ALL three conditions are met:

  1. At least 7 days have passed — captures weekday vs. weekend differences
  2. At least 100 conversions per variant — not exposures, but actual goal completions
  3. Dashboard shows "Significant" — the p-value is below 0.05
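The three conditions are easy to encode as a single check (illustrative; the dashboard applies them for you):

```typescript
// The three stop conditions above, combined. Pass the conversion count of
// the variant with the FEWEST conversions — every variant must clear 100.
function readyToStop(daysRunning: number, fewestConversions: number, pValue: number): boolean {
  return daysRunning >= 7 && fewestConversions >= 100 && pValue < 0.05;
}

console.log(readyToStop(10, 150, 0.03)); // true: all three conditions met
console.log(readyToStop(3, 150, 0.03));  // false: stopping too early
```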

The #1 mistake: stopping too early

Checking results after 2 days and seeing "green button is ahead by 30%!" doesn't mean the green button is actually better. With small sample sizes, early results swing wildly. A test that looks like a winner on day 3 might be a loser by day 10. Set your minimum duration upfront and don't stop before it.

What if I never reach significance?

If your test ran for the planned duration and still isn't significant, that's a perfectly valid result. It means one of two things:

  • There's no meaningful difference between variants — neither is clearly better
  • The difference is too small to detect with your traffic level

Both are useful information. Keep the simpler option (usually your original), document what you learned, and move on to a higher-impact test.

Making a decision

Use this framework based on what the dashboard shows:

Clear winner

  • Dashboard shows Significant
  • Lift is meaningfully positive (more than +5%)
  • Trend in the daily chart is consistent

What to do: Ship the winning variant to 100% of users. Document the result. Archive the experiment.

No clear winner

  • Not significant after 7+ days
  • Lift is near zero (between -3% and +3%)

What to do: Keep the original (simpler) version. The change doesn't make enough difference to justify it. Move on to testing something else.

Surprising loser

  • Dashboard shows Significant
  • Lift is negative — the new variant is worse

What to do: Don't ship the new variant. This is valuable — you learned what doesn't work. Think about why the change hurt performance and use that insight for your next test.

Borderline result

  • Close to significant (p-value between 0.05 and 0.10)
  • Moderate lift (+5% to +10%)

What to do: If you have enough traffic, extend the test for another week. If not, treat it as inconclusive and make a judgment call.
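For readers who like things explicit, here is the framework roughly encoded. The thresholds mirror the text, but a real decision should also weigh the daily-chart trend and your business context:

```typescript
type Decision = "ship" | "keep-original" | "keep-original-learn" | "extend-or-judge";

// Rough encoding of the decision framework above; illustrative only.
function decide(pValue: number, liftPct: number): Decision {
  if (pValue < 0.05) {
    if (liftPct > 5) return "ship";               // clear winner
    if (liftPct < 0) return "keep-original-learn"; // surprising loser
    return "extend-or-judge";                      // significant but small
  }
  if (pValue <= 0.10 && liftPct >= 5) return "extend-or-judge"; // borderline
  return "keep-original"; // no clear winner
}

console.log(decide(0.01, 12)); // "ship"
console.log(decide(0.03, -8)); // "keep-original-learn"
console.log(decide(0.40, 1));  // "keep-original"
```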

Sharing results with your team

Raw statistics don't resonate with leadership. Instead of "p = 0.032 with +12% lift," use CADENCE's Impact View to translate results into business language:

  • "This test will generate an estimated $45K ARR"
  • "We retained 200 more users this quarter from testing"
  • "Our testing program has delivered $150K in cumulative value"

See the Impact View guide to set up business metrics for your tests.

Common mistakes to avoid

1. Stopping too early

The #1 mistake. Early results are unreliable. Set a minimum duration of 7 days and at least 100 conversions per variant before looking at results.

2. Peeking and stopping when it looks good

Looking at results repeatedly and stopping the moment one variant looks ahead is called "p-hacking." The more you check, the more likely you'll see a false positive. Set your criteria upfront and check at the end.

3. Ignoring practical significance

A 0.1% improvement might be statistically significant with enough data, but is it worth the effort to implement? Consider whether the lift is large enough to matter for your business.

4. Running too many tests at once

With 20 simultaneous comparisons at p < 0.05, you'd expect 1 false positive by chance alone. Focus on your highest-impact tests.
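The arithmetic behind that claim:

```typescript
// Expected false positives across many simultaneous tests at alpha = 0.05.
const alpha = 0.05;
const tests = 20;
console.log(tests * alpha); // 1 expected false positive

// Chance of AT LEAST one false positive somewhere among the 20 tests:
console.log(1 - Math.pow(1 - alpha, tests)); // roughly 0.64, i.e. about 64%
```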

5. Not documenting results

Every test — win, lose, or draw — teaches you something. Record what you tested, what happened, and what you learned. Your team's testing knowledge builds over time.

After the test

  1. Record the result — update the experiment status in CADENCE (implement winner or archive)
  2. Share with your team — use Impact View for leadership reviews
  3. Iterate — if the new variant won, try an even bolder version. If it lost, try a different approach.
  4. Update your backlog — use results to prioritize your next test

Next steps