Saturday, February 27, 2010

Confidence: Business Angle & Statistical Angle

When will my test ever reach 100% confidence?
What does 90% confidence mean for me? Should I wait till I have 95% confidence before calling a winner?
How long should I run my test?
Is Test & Target the only way to get the confidence value OR can I calculate it independently?

Above are questions from business users that every analytics person in the web world is familiar with. I will explain this in two parts:
a) From the business angle/viewpoint
b) From the statistical viewpoint

When you state:
"Recipe A is better than Recipe B on RPV with a confidence of 90%" -> It means that if you run the test on 100 days (i.e. there were 100 independent trials of the test, you could be confident that on around 90 of those days, Recipe A would be the winner)"

So there are three key points here:
a) Confidence is based on a variable
b) Confidence is a percentage
c) The percentage indicates how sure you can be that when you shut off the test and go "live", the live results will be similar to the results of the experiment

High confidence means that one recipe is CONSISTENTLY beating the other.
For all our examples let us choose RPV (Revenue per visit) as the metric.

For a business user, a line chart can be an indicator of confidence.

Rule 1: The farther apart the two lines, the better
Rule 2: The less the two lines overlap, the better
Rule 3: The fewer the spikes on any particular day, the better

In figure 1, the RPV values are far apart from each other, while in figure 2 they are not.
In figure 1, the overlap is minimal, while in figure 2 there is significant overlap.

Statistically, you will always find that the confidence in figure 1 is higher (close to 100% in that example).

Rule 3 is important: it implies that the standard deviations should be low. If there are a lot of spikes, it is good to calculate confidence after removing the outlier dates.
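As a sketch of that outlier step: the helper below drops any day whose RPV sits more than k sample standard deviations from the mean before you compute confidence. The function name, the threshold k=2, and the daily numbers are all hypothetical choices for illustration.

```python
from statistics import mean, stdev

def drop_outlier_days(daily_rpv, k=2.0):
    """Remove spike days: keep only days within k sample standard
    deviations of the mean RPV. k=2 is a common, but arbitrary, cutoff."""
    m, sd = mean(daily_rpv), stdev(daily_rpv)
    return [x for x in daily_rpv if abs(x - m) <= k * sd]

# Hypothetical daily RPVs with one obvious spike day (5.8)
rpv = [2.1, 2.2, 2.0, 2.15, 5.8, 2.05]
print(drop_outlier_days(rpv))  # → [2.1, 2.2, 2.0, 2.15, 2.05]
```

With the spike removed, the standard deviation shrinks and the confidence calculation below becomes much more stable.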

So answers to the business questions:
1. Realistically, your test will never reach 100% confidence
2. 90% confidence means that when you shut off the test and go live, you can expect that in at least 90 out of 100 time periods, the results you saw in the test environment will be replicated in the real world
3. There is no definite period for which you must run the test. Run it for at least 2-3 weeks, and for as long as it takes to gather sufficient data to understand why the test performed well and why it didn't
4. No, Test&Target/Offermatica and similar tools are not the only way to calculate confidence. It is simple, and you can calculate it in Excel

First, I will show you how to calculate confidence in Excel. Then I will go into the statistical detail behind it.

1. Isolate the variable on which you are trying to calculate confidence. Write down the numbers vertically by day, one column per recipe, as shown below:
2. Fill in the following values for each recipe:

Recipe A
Mean:
Standard Deviation:

Recipe B
Mean:
Standard Deviation:

The above are easy: calculated using the AVERAGE(i:j) and STDEV(i:j) functions

Degrees of Freedom: number of days for Recipe A + number of days for Recipe B - 2

The remaining values are based on formulae as in the Excel snapshot
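The same calculation can be sketched outside Excel as well. Below is a Python version of the steps above: means and sample standard deviations per recipe, pooled degrees of freedom, a two-sample t-statistic, and a one-sided confidence value. The daily RPV numbers are hypothetical, and the p-value uses a normal approximation to the t-distribution (reasonable for larger degrees of freedom; Excel's t-distribution functions or a statistics library give the exact value).

```python
from statistics import mean, stdev, NormalDist

def confidence(a, b):
    """One-sided confidence that recipe a beats recipe b on RPV,
    via a pooled two-sample t-statistic. The tail probability is
    approximated with a standard normal instead of the exact
    t-distribution."""
    mean_a, mean_b = mean(a), mean(b)
    sd_a, sd_b = stdev(a), stdev(b)      # sample stdev, like Excel's STDEV
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2                   # degrees of freedom, as above
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (mean_a - mean_b) / se           # t-statistic
    return NormalDist().cdf(t)           # normal approximation to the t tail

# Hypothetical daily RPVs for two recipes over one week
rpv_a = [2.10, 2.25, 2.05, 2.30, 2.15, 2.20, 2.18]
rpv_b = [1.90, 1.95, 2.00, 1.85, 1.92, 1.98, 1.88]
print(f"Confidence A beats B: {confidence(rpv_a, rpv_b):.1%}")
```

Because the two series barely overlap and have small day-to-day spread, the confidence here comes out very close to 100% — exactly the figure-1 situation described above.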

As you have seen above, from a statistical viewpoint the following are important:
1. The means of the recipes
2. The standard deviations
3. Essentially the above - and the resulting histograms/normal distributions of the recipe RPVs

What the t-test effectively does is compare the two normal distributions side by side.
When there is no overlap, as in case 1, confidence is at its highest.
Cases 2 and 3 show high overlap and hence low confidence.

1 comment:

  1. Hi Kiran,

    Really good post. Helps in understanding how to go about deciding which version to go with. Is this method of determining the statistical significance associated with an A/B test scalable when you have more than two test versions? If not, what methodology needs to be applied for such cases?