Most A/B Tests Don't Produce Significant Results

Experimenting remains a crucial aspect of marketing

Experimenting is a crucial part of media and marketing, but many internal tests fail to produce tangible effects.

About two-thirds of brand marketers use A/B testing to improve conversion rates, according to research by Econsultancy and Red Eye. Marketers also rely on A/B testing to optimize the landing pages of their ads. And publishers use A/B testing to personalize content for users and find headlines and images that drive traffic.

While A/B testing is common among publishers and marketers, most A/B tests fail to produce statistically significant results. According to a survey of 3,900 professionals worldwide by UserTesting, fewer than 20% of respondents reported that their A/B tests produce significant results 80% of the time.

A similar analysis by Appsumo concluded that only one of every eight A/B tests leads to significant change. Although many A/B tests don't produce significant results, it'd be irresponsible of marketers to eliminate A/B testing from their media plans, according to John Donahue, chief product officer of programmatic platform Sonobi.

“The benefits of A/B testing are undeniable,” Donahue said. “Developing any creative project, there are a lot of assumptions. A/B testing allows you to remove those assumptions.”

In some instances, A/B testing call-to-action features and ad headlines can save marketers 40% of their media budget on ad platforms like Facebook, according to Donahue. Part of the reason the listicle publisher Ranker is able to make money from buying traffic off Facebook is that Ranker frequently tests which audience targets it can reach at a low price.

Of course, it'd be unrealistic to expect every A/B test to produce meaningful results. And just as scientists learn from their failed experiments, marketers can learn from A/B tests that didn't yield anything.

While some tests fail due to bad design, another reason many A/B tests don't produce significant results is that the traffic sample powering the test isn't large enough to yield conclusive evidence, according to Mani Gandham, CEO of content marketing company Instinctive. Gandham said that if the size of the test's sample isn't properly put into context, marketers can end up with experiments that “result in rather fuzzy results, and a tiny relative difference in performance can easily be mistaken for a clear signal.”
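To see why sample size matters, here is a minimal sketch of the standard two-proportion z-test often used to judge A/B results. The conversion rates and traffic numbers below are illustrative assumptions, not figures from the article; the point is that the same 10% relative lift is inconclusive at 1,000 visitors per variant but clearly significant at 100,000.

```python
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test statistic: how many standard errors
    separate variant B's conversion rate from variant A's."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Same 10% relative lift (5.0% -> 5.5% conversion), two sample sizes.
small = z_score(conv_a=50, n_a=1_000, conv_b=55, n_b=1_000)
large = z_score(conv_a=5_000, n_a=100_000, conv_b=5_500, n_b=100_000)

print(f"small sample z = {small:.2f}")  # well below 1.96: inconclusive
print(f"large sample z = {large:.2f}")  # above 1.96: significant at 95%
```

At the 95% confidence level a result counts as significant only when |z| exceeds roughly 1.96, which is why a tiny lift observed on thin traffic is indistinguishable from noise.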