You're running an experiment. Variant B has a higher conversion rate than the control. SplitPea says confidence is at 74%.

Is that good enough to call it? Should you wait?

Most A/B testing tools show you a confidence number but don't explain what it means. So people either ignore it and trust their gut, or they treat it like a loading bar and end the test the moment it crosses some threshold.

Neither approach is great.

Here's what the number actually tells you.

The short version

Confidence is how sure we are that the difference between your two versions is real, not just noise.

If SplitPea says confidence is 94%, that means there's roughly a 94% chance that Variant B genuinely outperforms the control, and about a 6% chance that the difference you're seeing is just random variation in your traffic.

That's it. That's the whole concept.

Why randomness matters

Imagine you flip a coin ten times and get seven heads. Does that mean the coin is rigged? Probably not. Ten flips isn't enough data to know. You could easily get seven heads with a perfectly fair coin.

But if you flip it ten thousand times and get seven thousand heads, something's clearly going on.
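If you like, you can put a number on that intuition. Here's a tiny Python sketch (nothing you need in order to use SplitPea, and the function name is just made up for this post) that works out the chance of getting at least seven heads from a fair coin. For ten flips it comes out around 17%, roughly one time in six, so seven heads out of ten tells you almost nothing about the coin.

```python
from math import comb

def chance_of_at_least(heads, flips, p=0.5):
    """Probability of getting `heads` or more heads in `flips` tosses of a
    coin that lands heads with probability p (exact binomial sum)."""
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k)
               for k in range(heads, flips + 1))

print(chance_of_at_least(7, 10))   # ~0.17: seven heads in ten flips is unremarkable
# The same ratio at scale, 7,000 heads in 10,000 flips, is so improbable for
# a fair coin that you'd be right to conclude something's going on.
```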

A/B testing works the same way. Early in an experiment, the numbers bounce around a lot. One version might look like it's winning by 40% after fifty visitors, then the gap shrinks to nothing after two hundred.

That's not because anything changed on your site. It's just small sample sizes being unreliable.

Confidence tells you whether you've collected enough data for the difference to mean something. Low confidence means you're still in coin-flip territory. High confidence means the pattern has held up across enough visitors that it's probably real.

What SplitPea uses: Bayesian statistics

There are two main ways to calculate confidence in A/B testing: the traditional approach (frequentist) and the Bayesian approach.

SplitPea uses Bayesian.

The practical difference for you is small, but here's the gist:

Frequentist methods make you decide your sample size in advance and then wait until the experiment is "done". You're not supposed to peek at the results early (even though everyone does).

Bayesian methods update continuously as data comes in, and peeking is fine. The confidence number is valid whenever you look at it.

That's why SplitPea can show you a live confidence percentage that means something at any point during the experiment. You're not cheating by checking it on day three. The number already accounts for how much data you have so far.
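If you're curious what a Bayesian calculation like this looks like under the hood, here's a minimal Python sketch of the textbook Beta-Binomial approach: give each variant's conversion rate a flat prior, update it with the visitors and conversions you've observed, then estimate the probability that B's true rate beats the control's. It's an illustration of the general technique, not SplitPea's actual implementation; the function name and the example counts are made up, and the exact output depends on the prior, so don't expect it to reproduce a real dashboard figure.

```python
import numpy as np

def prob_b_beats_control(conv_a, visitors_a, conv_b, visitors_b,
                         draws=200_000, seed=42):
    """Estimate P(variant B's true conversion rate > control's).

    Each rate gets a flat Beta(1, 1) prior, updated with the observed
    conversions and non-conversions. We then draw from both posteriors
    and count how often B comes out ahead.
    """
    rng = np.random.default_rng(seed)
    rate_a = rng.beta(1 + conv_a, 1 + visitors_a - conv_a, draws)
    rate_b = rng.beta(1 + conv_b, 1 + visitors_b - conv_b, draws)
    return float((rate_b > rate_a).mean())

# Made-up counts: control converts 40 of 1,000 visitors, variant B 55 of 1,000.
print(prob_b_beats_control(40, 1000, 55, 1000))   # roughly 0.94 on these toy numbers
```

Because the posterior updates with every new visitor, a calculation like this is valid whenever you run it, which is exactly the "peeking is fine" property described above.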

When to trust it

SplitPea uses 90% as the threshold for calling a winner. Why 90% and not 95% or 99%?

Because of who's using it.

If you're a pharmaceutical company testing a new drug, you want 99.9% confidence. Lives are at stake. If you're a freelancer testing whether "Book a call" works better than "Get in touch" on your contact page, the stakes are lower. A 10% chance of being wrong means that one time in ten, you might go with a version that's not actually better. For most website decisions, that's an acceptable risk.

95% is the conventional threshold in academic research. It's a good number. But for small sites with limited traffic, reaching 95% can take a very long time. 90% is a reasonable trade-off between rigour and practicality.

If you want to wait for higher confidence, you always can. SplitPea won't stop you from letting an experiment run longer. But it'll start recommending a decision at 90%.

What the numbers look like in practice

Here's a scenario. You're testing two headlines on your homepage. After a week, your dashboard shows:

  • Control: 1,200 visitors, 48 conversions (4.0%)
  • Variant B: 1,180 visitors, 58 conversions (4.9%)

Variant B is converting at a higher rate. The lift is about +23%. But confidence is at 76%.

What does that mean?

It means the difference could be real, but SplitPea doesn't have enough data yet to be sure. If you stopped the test now and went with Variant B, you might be right. But there's a meaningful chance (roughly one in four) that the difference is just noise and both versions perform about the same.

A week later:

  • Control: 2,400 visitors, 94 conversions (3.9%)
  • Variant B: 2,380 visitors, 119 conversions (5.0%)

The conversion rates are similar to before, but now you've got twice the data. Confidence is at 93%.

Now SplitPea can say with reasonable certainty that Variant B is actually better. The pattern held up. You can end the test and go with B.
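If you ran the week-one and week-two counts through the kind of toy Beta-posterior calculation sketched earlier, you'd see the same effect: the conversion rates barely move, but the probability that B is genuinely better climbs as the sample grows. (Again, this is an illustration with a made-up function, not SplitPea's internals, so the exact percentages won't line up with the dashboard.)

```python
import numpy as np

rng = np.random.default_rng(7)

def p_b_wins(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Same flat-prior Beta-posterior sampling as the earlier sketch."""
    a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    return float((b > a).mean())

print(p_b_wins(48, 1200, 58, 1180))     # week one: B ahead, but not settled
print(p_b_wins(94, 2400, 119, 2380))    # week two: same rates, twice the data, higher probability
```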

Common mistakes

Ending too early. You see Variant B ahead after one day and shut it down. Confidence is at 61%. You've just made a decision that's barely better than a coin flip. Maybe you got lucky. Maybe you didn't.

Waiting too long. Your experiment hit 94% confidence two weeks ago and you're still running it because you want to reach 99%. That's fine if you have the traffic, but you're wasting time you could use to test something else. There are diminishing returns past 90%.

Ignoring inconclusive results. If an experiment runs for three weeks and confidence never climbs above 70%, that's a result too. It means the two versions probably perform about the same. Neither is clearly better. That's useful information. You can stop the test, keep either version, and try a bigger change next time.

Reading lift without reading confidence. "+47% lift!" sounds exciting until you notice confidence is at 58%. A big lift with low confidence is just noise with a loud voice.

The one thing to remember

Confidence answers one question: is this difference real or random?

If the number is above 90%, you can act on it. If it's below 90%, wait for more data or call it inconclusive.

You don't need to understand the maths behind it. You just need to know what the number is telling you before you make a decision. If you're ready to put it into practice, here's how to run your first test. For the shorter product answer, see the A/B testing FAQ for small businesses.

Try SplitPea free →