When will my test ever reach 100% confidence?
What does 90% confidence mean for me? Should I wait till I have 95% confidence before calling a winner?
How long should I run my test?
Is Test & Target the only way to get the confidence value OR can I calculate it independently?
Above are questions from business users that every analytics person in the web world is familiar with. I will explain this in two parts:
a) From the business angle/viewpoint
b) From the statistical viewpoint
CONFIDENCE FROM A BUSINESS ANGLE:
When you state:
"Recipe A is better than Recipe B on RPV with a confidence of 90%" -> it means that if you ran the test for 100 days (i.e. 100 independent trials of the test), you could be confident that Recipe A would be the winner on around 90 of those days.
So there are key points here:
a) Confidence is based on a variable
b) Confidence is a percentage
c) The percentage indicates how sure you can be that, when you shut off the test and go "live", the live results will be similar to the results of the experiment
High confidence means that one recipe is CONSISTENTLY beating the other one.
For all our examples let us choose RPV (Revenue per visit) as the metric.
For a business user, a line chart can be an indicator of confidence.
Rule 1: The farther apart the lines, the better
Rule 2: The fewer the overlaps between the two lines, the better
Rule 3: The fewer the spikes on any particular day, the better
In figure 1: the RPV values are far apart from each other while in figure 2 they are not.
In figure 1, the overlap is minimal while in figure 2 there is significant overlap
Statistically, you will find that the confidence is higher in figure 1 (it is going to be close to 100% for this example).
Rule 3 is important - it implies that the standard deviations should be low. If there are a lot of spikes, it is good to calculate confidence after removing the outlier dates.
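One way to spot outlier dates before calculating confidence is sketched below. This is only an illustration: the function name `flag_outlier_days`, the sample RPV numbers, and the median-based rule with a cutoff of 3.5 are my assumptions, not something Test & Target does for you. Any sensible outlier rule could be swapped in.

```python
from statistics import median

def flag_outlier_days(daily_rpv, threshold=3.5):
    """Return the indices of days whose RPV sits unusually far from the median.

    Assumption: uses the median absolute deviation (MAD) rule; the
    threshold of 3.5 is a common rule of thumb, not a fixed standard.
    """
    med = median(daily_rpv)
    mad = median(abs(x - med) for x in daily_rpv)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [i for i, x in enumerate(daily_rpv)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical daily RPV values with one obvious spike on the fifth day
rpv = [2.1, 2.3, 2.0, 2.2, 9.8, 2.4, 2.1]
spikes = flag_outlier_days(rpv)
cleaned = [x for i, x in enumerate(rpv) if i not in spikes]
```

Whatever rule you use, drop the same dates from both recipes so the comparison stays like for like.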
So answers to the business questions:
1. Your test will almost certainly never reach 100% confidence
2. 90% confidence means that when you shut off the test and go live, you can be sure that in at least 90 out of 100 time periods the results you saw in the test environment will be replicated in the real world
3. There is no definite period for which you must run the test - run it for at least 2-3 weeks. Run it until you have sufficient data to understand why the test performed well or why it didn't
4. No, Test&Target/Offermatica and similar tools are not the only way to calculate confidence. It is simple, and you can calculate it in Excel
CONFIDENCE FROM A STATISTICAL ANGLE:
First, I will tell you how to calculate confidence in Excel. Then I will go into the statistics behind it.
1. Isolate the variable on which you are trying to calculate confidence. Write down the numbers vertically by day as shown below:
2. Fill up the following values:
Recipe A
Mean:
Standard Deviation:
Recipe B
Mean:
Standard Deviation:
The above are easy: calculated using the AVERAGE(i:j) and STDEV(i:j) functions
Degrees of Freedom: number of days for Recipe A + number of days for Recipe B - 2
t-value:
p-value:
The t-value and p-value are based on the formulae in the Excel snapshot
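For anyone who prefers to check the numbers outside Excel, here is a minimal Python sketch of the same calculation. It assumes the equal-variance (pooled) two-sample t-test, which is what the degrees of freedom above (days for A + days for B - 2) imply, and it assumes that confidence is reported as 1 minus the two-sided p-value. The function names and the daily RPV figures are made up for illustration. In Excel, TTEST(range_A, range_B, 2, 2) should give a comparable p-value.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density from 0 to |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def confidence(rpv_a, rpv_b):
    """Return (t-value, degrees of freedom, p-value, confidence) for two recipes.

    Assumes both lists hold one RPV number per day and are not perfectly flat.
    """
    n_a, n_b = len(rpv_a), len(rpv_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * stdev(rpv_a) ** 2
                  + (n_b - 1) * stdev(rpv_b) ** 2) / df
    t = (mean(rpv_a) - mean(rpv_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    p = two_sided_p(t, df)
    return t, df, p, 1 - p

# Hypothetical daily RPV for one week of each recipe
recipe_a = [2.4, 2.6, 2.5, 2.7, 2.5, 2.6, 2.4]
recipe_b = [2.1, 2.2, 2.0, 2.3, 2.1, 2.2, 2.0]
t_value, dof, p_value, conf = confidence(recipe_a, recipe_b)
```

The p-value is integrated numerically only to keep the sketch free of extra libraries; a statistics package would normally do this step for you.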
As you have seen above, from a statistical viewpoint the following are important:
1. Mean of the recipes
2. Standard Deviations
3. Essentially the two above, and the resulting histograms/normal distributions of the Recipe RPVs
In effect, the t-test places the two normal distributions side by side and measures how much they overlap.
When there is no overlap, as in case 1, confidence is at its highest.
Cases 2 and 3 show high overlap and hence low confidence.
Saturday, February 27, 2010
Friday, February 19, 2010
Bridging gap between TnT and Insight
A common scenario in the life of a web analyst focussed on Testing & Targeting using Omniture tools:
Business Scenario:
1. A test is run with 2 or more recipes. Test & Target shows recipe A (the control) performing better than the competing recipes B, C, D, ...
2. The new design that the business wanted to see was unsuccessful. Business wants to drill down into reasons for the underperformance of B.
Task for web analyst as a result of above:
1. Underperformance of B could have been due to a variety of reasons (first time vs. repeat, product mix, browser/visitor profile, traffic source, ....).
These questions cannot be answered using Test & Target - the analyst needs to use other Omniture tools like Discover or Insight.
2. Insight is the most powerful web analytics tool in the world, and I presume the web analyst would prefer to use it if the company has the tool
3. However, while Test & Target shows Recipe A as the winner on RPV for certain reasons and by a certain margin, the other Omniture tools report different numbers from Test & Target
Now how to bridge the gap?
Bridging gap between RPV numbers of Test & Target vs. Other Omniture tools (Insight, Discover, SiteCatalyst,....)
1. Test & Target counts visits only if on the visit the user interacted with the page on which the test was being run.
Test & Target uses mboxes to show differing content between recipes. For a visit to get counted as a TnT visit, it is necessary for the mbox to fire.
Example:
So say you are running a test on the deals page.
Visitor X on visit_id x1 saw the deals page and added to cart. This visit is counted as a TnT Visit and as a SiteCatalyst visit
Visitor X on visit_id x2 went to cart where he had saved and viewed the item. This visit is not counted as a TnT visit but is counted as a SiteCatalyst visit
So the visit counts from TnT and SiteCatalyst/Insight will match only if you use Insight to replicate the TnT behavior.
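The deals page example can be sketched in a few lines of Python. The `Visit` record, the page names, and the function `tnt_style_visits` are hypothetical and only mimic the counting rule described above; in practice you would build the equivalent segment inside Insight.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    visitor_id: str
    visit_id: str
    pages: list  # names of the pages seen on this visit

TEST_PAGE = "deals"  # hypothetical name of the page carrying the mbox

def tnt_style_visits(visits, test_page=TEST_PAGE):
    """Keep only the visits on which the test page (and so the mbox) was seen."""
    return [v for v in visits if test_page in v.pages]

visits = [
    Visit("X", "x1", ["deals", "cart"]),  # saw the deals page: counted by both tools
    Visit("X", "x2", ["cart"]),           # cart only: counted by SiteCatalyst alone
]

sc_count = len(visits)                     # every visit counts
tnt_count = len(tnt_style_visits(visits))  # only visits where the mbox fired
```

Here `sc_count` covers both visits while `tnt_count` covers only visit x1, which is the gap the segment has to close.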
2. Timezone
TnT timezones and Insight timezones are set up by your implementation consultant.
It is quite possible that the Insight timezone was CST while TnT was GMT. In such a case you have to offset the times.
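As a rough sketch of the offset, assuming CST is a fixed 6 hours behind GMT (daylight saving is ignored here) and using a made-up hit time:

```python
from datetime import datetime, timedelta, timezone

# Assumption: CST treated as a fixed UTC-6 offset, no daylight saving
CST = timezone(timedelta(hours=-6), "CST")
GMT = timezone.utc

def to_gmt(naive_cst_time):
    """Interpret a timestamp recorded in CST and express it in GMT."""
    return naive_cst_time.replace(tzinfo=CST).astimezone(GMT)

# A hypothetical hit recorded at 8:30 pm CST on 12th Jan
hit = datetime(2010, 1, 12, 20, 30)
hit_gmt = to_gmt(hit)
```

The hit lands on 12th Jan in one tool and on 13th Jan in the other, so daily numbers can differ even when the totals agree.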
3. Tag firing
It is possible that the TnT tags fired while the SC ones didn't - this usually affects only a small proportion of visits
4. Revenue attribution logic
TnT attributes revenue when the mbox on the orderConfirm page fires.
Implication:
Revenue will be counted in TnT if the visitor saw the campaign on a prior visit and then made a purchase on a new visit without returning to the test page.
Example:
On 12th Jan, Mr X visited deals page where an offer was being run; added to cart and saved to cart
On 13th Jan, Mr X bought $100 from saved cart without visiting the deals page
The $100 is counted as TnT revenue
In TnT the revenue attribution window is 14 days. Insight/SC might be using 30 days, so you have to replicate the TnT behavior in SC/Insight
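The effect of the attribution window can be illustrated with a small sketch. The data layout, the function `attributed_revenue`, and the second order on 5th Feb are my own assumptions for illustration; the exact rules each tool applies (for example, which exposure starts the clock) should be checked against your implementation.

```python
from datetime import date, timedelta

def attributed_revenue(exposures, orders, window_days=14):
    """Sum the orders placed within window_days of the visitor seeing the campaign.

    exposures: {visitor_id: date the campaign was seen} (assumed layout)
    orders:    list of (visitor_id, order_date, amount) tuples
    """
    window = timedelta(days=window_days)
    total = 0.0
    for visitor_id, order_date, amount in orders:
        seen = exposures.get(visitor_id)
        if seen is not None and timedelta(0) <= order_date - seen <= window:
            total += amount
    return total

exposures = {"X": date(2010, 1, 12)}      # Mr X saw the deals page on 12th Jan
orders = [
    ("X", date(2010, 1, 13), 100.0),      # bought from saved cart the next day
    ("X", date(2010, 2, 5), 40.0),        # hypothetical later order, 24 days on
]

revenue_14_days = attributed_revenue(exposures, orders, window_days=14)
revenue_30_days = attributed_revenue(exposures, orders, window_days=30)
```

With a 14-day window only the $100 order is attributed, while a 30-day window also picks up the later order, which is one reason the revenue totals differ between tools.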
5. First/Repeat Visit
TnT bases first/repeat visit on its own cookie, set when the mbox fires. So what is a repeat visit for SC/Insight may not be one for TnT
However, Insight can replicate TnT behavior
All this tells us why Insight must be used to match TnT behavior: you can create powerful segments that replicate it