
Saturday, February 27, 2010

Confidence: Business Angle & Statistical Angle

When will my test ever reach 100% confidence?
What does 90% confidence mean for me? Should I wait till I have 95% confidence before calling a winner?
How long should I run my test?
Is Test & Target the only way to get the confidence value OR can I calculate it independently?

These are questions from business users that every web analytics person is familiar with. I will explain this in two parts:
a) From the business angle/viewpoint
b) From the statistical viewpoint


CONFIDENCE FROM A BUSINESS ANGLE:
When you state:
"Recipe A is better than Recipe B on RPV with a confidence of 90%", it means that if you ran the test over 100 independent trials, you could be confident that on around 90 of those trials Recipe A would be the winner.

So there are three key points here:
a) Confidence is calculated for a particular variable
b) Confidence is a percentage
c) The percentage indicates how sure you can be that when you shut off the test and go "live", the live results will be similar to the results of the experiment

High confidence means that one recipe is CONSISTENTLY beating the other.
For all our examples, let us choose RPV (Revenue per Visit) as the metric.

For a business user, a line chart of the daily metric can be an indicator of confidence.

Rule 1: The farther apart the two lines, the better
Rule 2: The less the two lines overlap, the better
Rule 3: The fewer the spikes on any particular day, the better

In Figure 1, the RPV values are far apart from each other, while in Figure 2 they are not.
In Figure 1, the overlap is minimal, while in Figure 2 there is significant overlap.


Statistically, you will always find that the confidence in Figure 1 is higher (close to 100% in this example).

Rule 3 is important: it implies that the standard deviations should be low. If there are a lot of spikes, it is good to calculate confidence after removing the outlier dates, as in the sketch below.
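A minimal sketch of that outlier trimming in Python, assuming the daily RPV values sit in a plain list; the 2-standard-deviation cutoff is my assumption, not a rule from this post:

# Drop spike days before computing confidence. The k = 2 standard
# deviation cutoff is an illustrative assumption; tune it to your data.
import statistics

def trim_outliers(daily_rpv, k=2.0):
    mu = statistics.mean(daily_rpv)
    sd = statistics.stdev(daily_rpv)
    return [x for x in daily_rpv if abs(x - mu) <= k * sd]

print(trim_outliers([2.1, 2.2, 2.0, 2.3, 2.15, 2.25, 9.9]))  # the 9.9 spike day is dropped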

So answers to the business questions:
1. For all practical purposes, your test will never reach 100% confidence
2. 90% confidence means that when you shut off the test and go live, you can expect that in at least 90 out of 100 time periods the results you saw in the test environment will be replicated in the real world
3. There is no fixed period for which you must run the test. Run it for at least 2-3 weeks, and as long as it takes to gather sufficient data to understand why the test performed well or why it didn't
4. No, tools like Test&Target/Offermatica are not the only way to calculate confidence. The calculation is simple and you can do it in Excel


CONFIDENCE FROM A STATISTICAL ANGLE:
First, I will tell you how to calculate confidence in Excel. Then I will go into the statistical detail of it.

1. Isolate the variable on which you are trying to calculate confidence. Write the numbers down vertically by day, one column per recipe:
2. Fill up the following values:
Recipe A
Mean:
Standard Deviation:

Recipe B
Mean:
Standard Deviation:


The above are easy: calculated using the AVERAGE(i:j) and STDEV(i:j) functions.

Degrees of Freedom: number_of_days for Recipe A + number_of_days for Recipe B - 2


t-value: the standard two-sample t-statistic, computed from the means, standard deviations and day counts above:
t = (Mean_A - Mean_B) / SQRT(s_pooled^2 * (1/n_A + 1/n_B)), where s_pooled^2 = ((n_A - 1)*StdDev_A^2 + (n_B - 1)*StdDev_B^2) / (n_A + n_B - 2)
p-value: in Excel, =TTEST(range_A, range_B, 2, 2) returns the p-value of this two-tailed, equal-variance test directly
Confidence = 1 - p-value
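If you would rather not do this in Excel, here is a minimal sketch of the same calculation in Python. It assumes SciPy is installed, and the daily RPV numbers are made up for illustration:

# Two-sample t-test on daily RPV, matching the Excel calculation above
# (pooled variance, df = n_A + n_B - 2). The data below is illustrative.
from scipy import stats

rpv_a = [2.10, 2.25, 2.05, 2.30, 2.18, 2.22, 2.15]  # daily RPV, Recipe A
rpv_b = [1.95, 2.02, 1.88, 2.01, 1.97, 1.93, 1.99]  # daily RPV, Recipe B

t_value, p_value = stats.ttest_ind(rpv_a, rpv_b, equal_var=True)
confidence = 1 - p_value

print(f"t = {t_value:.2f}, p = {p_value:.4f}, confidence = {confidence:.1%}")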
As you have seen above, from a statistical viewpoint the following are important:
1. The means of the recipes
2. The standard deviations
3. Essentially the above, and the resulting histograms/normal distributions of the recipe RPVs

What the t-test effectively does is place the two normal distributions side by side.
When there is no overlap, as in case 1, confidence is at its highest.
Cases 2 and 3 show high overlap and hence low confidence.
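To see this overlap yourself, here is a quick sketch using matplotlib (assumed installed); the RPV lists are the same illustrative numbers as in the earlier sketch:

# Plot the two RPV distributions side by side; the less they overlap,
# the higher the confidence. Data is illustrative, as before.
import matplotlib.pyplot as plt

rpv_a = [2.10, 2.25, 2.05, 2.30, 2.18, 2.22, 2.15]
rpv_b = [1.95, 2.02, 1.88, 2.01, 1.97, 1.93, 1.99]

plt.hist(rpv_a, bins=5, alpha=0.5, label="Recipe A")
plt.hist(rpv_b, bins=5, alpha=0.5, label="Recipe B")
plt.xlabel("RPV")
plt.ylabel("Number of days")
plt.legend()
plt.show()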

Friday, February 19, 2010

Bridging the gap between TnT and Insight

A common scenario in the life of a web analyst focussed on Testing & Targeting using Omniture tools:

Business Scenario:

1. A test is run with 2 or more recipes. Test & Target shows Recipe A as better than the competing recipes B, C, D... A was the control
2. The new design that the business wanted to see win was unsuccessful. Business wants to drill down into the reasons for the underperformance of B.

Tasks for the web analyst as a result of the above:
1. The underperformance of B could have been due to a variety of reasons (first-time vs. repeat visitors, product mix, browser/visitor profile, traffic source, ...).
These cannot be answered using Test & Target; the analyst needs other Omniture tools like Discover or Insight.
2. Insight is the most powerful web analytics tool in the world, and I presume the web analyst would prefer to use it, given that his company has the tool
3. However, while Test & Target shows Recipe A as the winner on RPV for certain reasons and by a certain margin, the other Omniture tools report numbers that differ from Test & Target's.
Now how to bridge the gap?

Bridging the gap between the RPV numbers of Test & Target vs. other Omniture tools (Insight, Discover, SiteCatalyst, ...)
1. Test & Target counts a visit only if, during that visit, the user interacted with the page on which the test was being run.
Test & Target uses mboxes to show differing content between recipes. For a visit to get counted as a TnT visit, the mbox must fire.
Example:
Say you are running a test on the deals page.
Visitor X on visit_id x1 saw the deals page and added to cart. This visit is counted as a TnT visit and as a SiteCatalyst visit.
Visitor X on visit_id x2 went straight to the cart where he had saved the item and viewed it. This visit is not counted as a TnT visit but is counted as a SiteCatalyst visit.

So the raw visit numbers from TnT and SiteCatalyst/Insight will match only if you use Insight to replicate the TnT behavior, as in the sketch below.
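A toy sketch of that counting difference in Python; the visit records and the mbox_fired flag are hypothetical stand-ins, not actual Omniture export fields:

# SiteCatalyst counts every visit; TnT counts only visits where the
# test's mbox fired. Field names here are hypothetical.
visits = [
    {"visit_id": "x1", "mbox_fired": True},   # saw the deals page
    {"visit_id": "x2", "mbox_fired": False},  # went straight to the cart
]

sc_visits = len(visits)
tnt_visits = sum(1 for v in visits if v["mbox_fired"])
print(sc_visits, tnt_visits)  # 2 1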

2. Timezone
TnT timezones and Insight timezones are set up by your implementation consultant.
It is quite possible the Insight timezone was CST while TnT's was GMT. In such a case you have to offset the times, as sketched below.
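A minimal sketch of that offset in Python (3.9+ for zoneinfo); the timestamp itself is made up:

# Convert an Insight timestamp recorded in CST to the GMT clock that
# TnT reports on, so the two datasets line up.
from datetime import datetime
from zoneinfo import ZoneInfo

insight_ts = datetime(2010, 2, 19, 10, 0, tzinfo=ZoneInfo("America/Chicago"))
tnt_ts = insight_ts.astimezone(ZoneInfo("UTC"))  # same instant on the GMT clock
print(insight_ts, "->", tnt_ts)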

3. Tag firing
It is possible that the TnT tags fired while the SC ones didn't; this affects only a small proportion of visits.

4. Revenue attribution logic
TnT attributes revenue when the mbox on the orderConfirm page fires.

Implication:
Revenue will be counted in TnT if the visitor saw the campaign on a prior visit and then made a purchase on a new visit without going back to the test page.

Example:
On 12th Jan, Mr X visited the deals page where an offer was being run, added an item to cart and saved the cart
On 13th Jan, Mr X bought $100 worth from the saved cart without visiting the deals page
The $100 is counted as TnT revenue

In TnT the revenue attribution window is 14 days, while Insight/SC might be using 30 days. So you have to replicate the TnT window in SC/Insight, as sketched below.
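A sketch of replicating that 14-day window in Python; the data shapes (exposure dates, order tuples) are hypothetical, not real tool exports:

# Attribute revenue to the test if the order falls within 14 days of the
# visitor's last mbox exposure, mirroring the TnT window described above.
from datetime import date, timedelta

ATTRIBUTION_WINDOW = timedelta(days=14)

exposures = {"mr_x": date(2010, 1, 12)}        # last mbox exposure per visitor
orders = [("mr_x", date(2010, 1, 13), 100.0)]  # (visitor, order date, revenue)

tnt_revenue = sum(
    revenue
    for visitor, order_day, revenue in orders
    if visitor in exposures
    and timedelta(0) <= order_day - exposures[visitor] <= ATTRIBUTION_WINDOW
)
print(tnt_revenue)  # 100.0, even though the deals page was never revisited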

5. First/Repeat Visit

TnT determines first vs. repeat visits from its own mbox cookie. So what is a repeat visit for SC/Insight may not be one for TnT.
However, Insight can replicate the TnT behavior.

All this tells us why Insight should be used to match TnT numbers: you can create powerful segments that replicate the TnT behavior.

Sunday, January 3, 2010

Perils of Sampling

You have so much online data and storing it is so difficult. How about using a sample of it? What is a good sampling ratio: 1:1, 2:1, 4:1 or 16:1?

Hmm... good idea? It would save COSTS and PROCESSING TIME.


Election results for an entire nation of 100 crore people are predicted from a sample of 10,000, and some of those predictions are accurate. So is 16:1 a good sample for online data?

My answer is an 'EMPHATIC NO'

(1)
The questions that an election survey and online analytics are trying to answer are VERY DIFFERENT.
* An election survey is trying to answer a SINGLE QUESTION with RESTRICTED CHOICES (Which party is going to win?)
* In online data the QUESTIONS ARE MANY and the CHOICES ARE NOT KNOWN.

(2)
Sampling online data can be HAZARDOUS for PAGES THAT GET A FRACTION OF THE SITE VISITS. Say there is a site with 10K visits a day and a page with 100 visits a day.

A 4:1 sample means the results are based on 2.5K out of 10K visits. Now it is possible that only 10 of those 2.5K visits fall within the page's 100, and results based on 10 visits are going to be OBVIOUSLY WRONG. The simulation below shows how widely the page-level counts swing.
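A quick simulation of that swing in Python; the counts follow the example above, and treating the first 100 visit ids as the page's visits is just a modelling convenience:

# Repeatedly draw a 2.5K sample from 10K visits and count how many of
# the page's 100 visits land in it. The count swings widely run to run.
import random

SITE_VISITS, PAGE_VISITS, SAMPLE_SIZE = 10_000, 100, 2_500

page_hits = []
for _ in range(1_000):
    sample = random.sample(range(SITE_VISITS), SAMPLE_SIZE)
    page_hits.append(sum(1 for v in sample if v < PAGE_VISITS))

print(min(page_hits), max(page_hits))  # roughly 10 to 40 around a mean of 25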

So my advice: it is a MUST to store ALL DATA for a reasonable TIME PERIOD.

Test & Target

Having used Test & Target (an Omniture product) for over a year on at least 50 tests, I must say that there is a good market for a better tool :)


What does Test & Target do well?

* Normal A/B testing: sending traffic randomly to 2 or more recipes in equal proportions
* Gathering metrics like Revenue, Orders, Average Order Value and Order Conversion, and publishing them in real time, including confidence
* The ability to stop the test if the results are not in sync

What does Test & Target not do well?
* Detailed drilldown of a test: answers as to why one recipe won over another
* Metrics with respect to visitor segments

More often than not, Test & Target forces business users (i.e. us) to resort to additional tools to get the "answers", ending up as a "plain reporting" tool. One has to use SiteCatalyst & Discover (other Omniture suite products), Insight (the old Visual Sciences) or a database-driven process to get the answers.

These problems could be solved if the Test & Target tool allowed users to view the results of certain pre-defined workflows, and in my opinion this is not asking much.
Some pre-defined workflows in an e-commerce environment could be:
* Which product caused the difference? Product mix
* Business segment issues -> specific to each organization, e.g. book shopper vs. electronics shopper
* Visitor profile -> first-time vs. repeat
* Marcom mix/traffic source
* Path taken/visitor segment

There is certainly a good market for such a tool. With so many brilliant CS programmers out there, it would be a great summer project that could rake in money!!