3 Reasons Why A/B Testing Case Studies You Read Suck Big Time

I hear it over and over again when speaking to our clients and prospects: “We did some A/B testing of the CTA color and the social sharing buttons, because we read a case study and it reported great results.” Then I ask whether the test failed. Yes, it did. No surprise.

If you let your company follow the “read a case study” -> “copy the testing ideas” -> “fail” routine, you will soon be out of business. Testing out ideas found on the Internet (or worse – implementing them right away) is a common mistake among beginners. There are three reasons why you shouldn’t utilize every best practice you find on the web.

1. Most case studies are not backed up with absolute numbers

The case studies you come across on the web mostly contain two metrics: the percentage growth in conversion rate and the statistical confidence level. But relative metrics are not enough! You can’t tell whether the results are plausible unless you know the absolute numbers of conversions and users that took part in the experiment. It’s really rare to find these figures in a case study. For example, this case study of landing page optimization from WhichTestWon.com is not credible in my opinion, since it lacks absolute numbers (despite the fact that you are expected to pay to see it!).

Keep in mind that relative metrics like the percentage growth in conversion rate are not enough. You need the absolute numbers of conversions and users to decide whether an A/B testing case study is credible.

So why does this happen? Well, when you don’t provide the actual numbers, it’s easier to portray a result that will draw attention to your case study, with headlines such as “110% conversion rate growth by changing two words in the headline”, and you can be sure that nobody will be able to disprove it. That means more pageviews and downloads for the author. Even if the statistical confidence level is said to be 95% or more, it cannot actually be checked.
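To see how such a check would work, here is a minimal sketch of recomputing a claimed confidence level yourself from absolute numbers, using a standard pooled two-proportion z-test. Every figure in it is hypothetical, invented to mimic a typical “110% uplift” headline; the point is simply that the check only becomes possible once visitors and conversions per variation are published.

```python
# Recomputing the confidence level of an A/B test from absolute numbers.
# All figures are hypothetical -- the point is that without them, a claimed
# "95% confidence" cannot be verified at all.
from math import sqrt, erf

def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical "110% uplift" case study: 10 vs 21 conversions
# out of 500 visitors per variation.
p_value = two_proportion_p_value(10, 500, 21, 500)
print(f"p-value: {p_value:.3f}")  # ~0.045 -- barely under the 0.05 bar despite the huge relative lift
```

If a case study reported those four numbers, any reader could run this check in seconds. When it reports only a percentage uplift and a confidence level, you simply have to take the author’s word for it.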

Lesson number one: if an A/B testing case study lacks absolute numbers, don’t trust it.

2. Best practices are often based on random results (and are therefore nonsense)

Some people are nice enough to enhance their case studies with screenshots from A/B testing tools, like James did in this post on the Crazy Egg blog (you need to open it via archive.org, since they deleted it after I published this article). The tool showed that the results were statistically significant, but the number of conversions was only 15 for version A and 24 for version B. With numbers that small, the outcome is essentially random, and no one should draw any conclusions from it.

If a test has fewer than 100 conversions per version, you shouldn’t draw any conclusions from it. Unfortunately, some case study authors publish best practices based on such random results. Don’t apply them!

Why? First, you should collect at least 100 conversions per version before you can be in any way confident about the results. Second, the test should cover at least one full business cycle of your company, which means running it for at least 14 days (and, depending on the industry, often much longer). Only once you have those conversions, and only if the results are statistically significant, should you even think about proclaiming best practices. How do 15 and 24 conversions compare to that?

Unfortunately, the author of the blog post mentioned above seems to lack even a basic knowledge of statistics. Still, he came up with 3 golden rules for writing compelling headlines. I really hope not many marketers have started A/B testing their copy based on them, because tips like these can do real harm to your conversion rate.

The fact is that the stats engines in A/B testing tools cannot always be trusted. If the absolute difference between two challengers is just 9 conversions, claiming a 95% confidence level is simply wrong. The same applies to any tips, conclusions or best practices built on experiments of this kind.
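To make this concrete, here is a rough sketch that assumes each version received 1,000 visitors (an assumption, since the post did not report traffic) and looks at approximate 95% confidence intervals around the two conversion rates:

```python
# Rough 95% confidence intervals around each version's conversion rate.
# The traffic figure (1,000 visitors per version) is an assumption -- the
# original post reported only the conversion counts.
from math import sqrt

def wilson_interval(conversions, visitors, z=1.96):
    """Approximate 95% Wilson score interval for a conversion rate."""
    p = conversions / visitors
    denom = 1 + z ** 2 / visitors
    center = (p + z ** 2 / (2 * visitors)) / denom
    half = z * sqrt(p * (1 - p) / visitors + z ** 2 / (4 * visitors ** 2)) / denom
    return center - half, center + half

for version, conversions in (("A", 15), ("B", 24)):
    low, high = wilson_interval(conversions, 1000)
    print(f"Version {version}: {conversions} conversions -> {low:.2%} to {high:.2%}")
# Version A: roughly 0.9% to 2.5%; version B: roughly 1.6% to 3.5%.
# The ranges overlap heavily, so declaring a winner here is premature.
```

With intervals that wide and overlapping, the reported uplift could easily shrink, disappear or reverse as more data comes in.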

You need absolute numbers and time to make test results trustworthy. So before you start or analyse your next A/B test, make sure you read this great article by Peep Laja on statistical significance and validity in testing. It will also help you understand better why you shouldn’t trust case studies with such low numbers of conversions.

Lesson number two: don’t apply best practices from A/B tests with fewer than 100 conversions per variation, or you could harm your business.

3. Authors don’t explain why they tested certain hypotheses

Finally, you must keep in mind that every website and business is unique. This is why you need to understand the logic behind any experiment you read about in a case study. Most authors do not provide information about the research they conducted, how they found the problems, or why they thought it was important to test them.

To assess whether you should follow a concept presented in a case study and test the same element on your own website, you need to know why the author made the calls he or she did. Unless you are sure the context is similar to your business, don’t just copy the testing idea; you will probably end up with a lower conversion rate.

Lesson number three: don’t copy a testing idea from a case study unless you are sure it applies to your unique situation.

How to get the best out of A/B testing case studies?

Treat any A/B testing case study you come across on the web as inspiration: a starting point for your own analysis and research, not a signal to start testing right away. If you see something that worked well in a case study, dig into your Google Analytics data, conduct usability testing or run some on-site polls to check whether a similar problem exists on your own website. If it does, you can then move on to tackling it with the solutions covered in the case study.

If you manage a team that is responsible for conversion rate optimization, don’t get fooled by statements like “we read an article and we want to test that out”. Always ask about the absolute numbers, the real statistical confidence of the results (not just the figures shown by the tool) and the context of the test they are so eager to copy.

Protect your company from applying best practices based on sloppy A/B tests, and stay away from posts like the one from the Crazy Egg blog mentioned above. Following tips based on microscopic changes in conversion rates will do your business more harm than good.



If you want to gain a competitive advantage by increasing your conversion rate and growing your revenue and profits without spending an additional dime on advertising, contact Mavenec today to learn more about our approach to conversion rate optimization. You can also learn our most valuable conversion rate optimization strategies by downloading our Ultimate Conversion Rate Optimization Toolkit.

Author: Damian Rams

I apply conversion rate optimization techniques to make sure that our clients get more sales and leads without spending an additional dime on traffic acquisition. I combine analytical and UX skills with experience in psychology to substantially grow digital businesses. I am a lecturer at Warsaw School of Economics.


  • http://conversionxl.com/ Peep Laja

    I agree with the main point, but dude – “you should get at least 100 conversions per version” is an absolutely incorrect statement. This is science, not magic.

    You calculate the needed sample size based on your baseline conversion rate and minimum detectable effect http://conversionxl.com/stopping-ab-tests-how-many-conversions-do-i-need/

    • Damian Rams

      Peep, thanks for the comment. I totally agree with you: this is science, not magic, and it all depends on the case. For some websites with huge sales numbers, even 500 conversions per challenger would not be enough, and we both know it. I have seen many tests where the results flipped even after reaching 300 conversions per challenger.

      You state in your article that there is no magic bar and I agree. On the other hand you set 250-350 conversions per challenger as the minimum number below which you ignore test results. I would lower the bar to 100. Though for some cases only! And I do believe that you will agree with me that under some circumstances this would not be a bad decision.

      Reaching 100 conversions per challenger is not a stopping rule for every test, and it will vary depending on the case. With this article my main point is to get people to be more alert when reading case studies. I want them to look at absolute numbers and question conclusions. I want them to remember some kind of rule of thumb that lets them ignore case studies that could trick them. I do not believe that 100 is a magic number, but I believe it is an easy one to remember. It will not always work, but neither will any other number: 200, 300, 400 or even 1000. In my opinion, 100 is just the point at which we can look at the test results with a bit more faith and start calculating anything.

      I do believe that if we manage to get people to question conclusions from case studies that lack numbers or show inconceivable figures we will move conversion rate optimization discussions to a different level. Don’t you agree?

      • http://conversionxl.com/ Peep Laja

        There are no conditions where I would settle with 100 conversions.

        You seem to be missing the point – also 200, 300, 400 or even 1000 are wrong numbers. All random numbers like this are wrong. This is not how it works.

        For every single test the number is different – and should be calculated ahead of time based on the baseline conversion rate and minimum detectable uplift.

        Use a sample size calculator like this (before you start the test):
        http://www.evanmiller.org/ab-testing/sample-size.html

        And now you know exactly when to stop the test!

        The ballpark you got from my article (250-350) is the minimum number of transactions I go for IF the needed sample size is achieved sooner – which is an extremely rare case and requires a huge uplift (e.g. double, triple).

        Longer explanation of this here http://conversionxl.com/stopping-ab-tests-how-many-conversions-do-i-need/

        • Damian Rams

          The methods you wrote about in your articles are without any doubt right and useful. Calculating the sample size before starting the test is a good solution, and I use it on a daily basis.

          But you should consider the fact that even with an uplift of 50% (it doesn’t have to be double or triple, as you said) the calculator will tell you that with 60 conversions the results are statistically significant – http://take.ms/tO3yq

          That’s why in the particular case of a test with a huge uplift and a high baseline conversion rate I would go for a minimum of 100 conversions per version. That was also the case I presented in the article. Of course, the more data you have, the better – that’s why the article says “at least 100”.

          And as I wrote before, the purpose of the article is to give simple rules of thumb to quickly identify if the case study is worth reading at all. It is not about the stopping rules or calculating statistical significance, because it’s not the first thing one wants to do when spotting a case study on the web.
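For readers who want to put this advice into practice, below is a minimal sketch of the textbook two-proportion sample-size formula that calculators like the one Peep links to are built on. The 20% baseline conversion rate and 50% minimum detectable uplift are made-up inputs, and any given calculator may return slightly different numbers depending on the approximation it uses.

```python
# Sample size needed per variation, calculated *before* the test starts.
# The 20% baseline and 50% minimum detectable uplift are illustrative inputs only.
from math import sqrt, ceil

Z_ALPHA = 1.96  # two-sided 5% significance level
Z_BETA = 0.84   # 80% statistical power

def sample_size_per_variation(baseline_rate, relative_uplift):
    """Visitors needed per variation to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    pooled = (p1 + p2) / 2
    numerator = (Z_ALPHA * sqrt(2 * pooled * (1 - pooled))
                 + Z_BETA * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

visitors = sample_size_per_variation(0.20, 0.50)
print(visitors, "visitors per variation,",
      ceil(visitors * 0.20), "expected conversions in the control")
# -> roughly 290 visitors and about 60 control conversions: only a very large
#    expected uplift on a high baseline lets such low counts suffice.
```

With smaller, more realistic uplifts the same formula quickly calls for several hundred conversions per variation, which is exactly why the discussion above keeps coming back to sample size rather than to any single magic number.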
