Sunday, January 3, 2010

Perils of Sampling

You have so much online data, and storing it is so difficult. How about keeping only a sample of it? And what is a good sampling ratio: 1:1, 2:1, 4:1 or 16:1?

Hmm... good idea? It will save me COSTS and PROCESSING TIME.


Election results for an entire nation of 100 crore (1 billion) people are predicted from a sample of 10,000, and some of them are predicted accurately. So is 16:1 a good sample for online data?

My answer is an 'EMPHATIC NO'

(1)
The questions that an election survey and online analytics are trying to answer are VERY DIFFERENT.
* An election survey is trying to answer a SINGLE QUESTION with RESTRICTED CHOICES (Which party is going to win?)
* In online data the QUESTIONS ARE MANY and the CHOICES ARE NOT KNOWN in advance.

(2)
Sampling online data can be HAZARDOUS for PAGES THAT GET A FRACTION OF THE SITE VISITS. Say there is a site with 10K visits a day and a page with 100 visits a day.

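Sampling at 4:1 means the results are based on 2.5K out of 10K visits. On average about 25 of these 2.5K would belong to the page, but it is possible that only 10 of them do. Scaled back up by 4, that reports 40 visits for a page that actually got 100. The results are going to be OBVIOUSLY WRONG.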
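To get a feel for how noisy the per-page number can be, here is a minimal simulation sketch in Python. The visit counts and the 4:1 ratio come from the example above; the function name, the seed and the number of trials are hypothetical choices made only for illustration.

```python
import random

def sampled_page_estimates(total_visits=10_000, page_visits=100,
                           sample_size=2_500, trials=1_000, seed=0):
    """Draw repeated random samples and return the scaled-up page-visit estimates."""
    rng = random.Random(seed)
    # Every sampled visit stands in for this many real visits (4 at 4:1).
    scale = total_visits / sample_size
    estimates = []
    for _ in range(trials):
        # Visits numbered 0..page_visits-1 are the ones that hit the page.
        sample = rng.sample(range(total_visits), sample_size)
        hits = sum(1 for visit in sample if visit < page_visits)
        estimates.append(hits * scale)
    return estimates

estimates = sampled_page_estimates()
print("true page visits:", 100)
print("lowest estimate :", min(estimates))
print("highest estimate:", max(estimates))
print("mean estimate   :", sum(estimates) / len(estimates))
```

Each trial plays the role of one sampled day. The average across many days comes out close to the truth, but the gap between the lowest and highest estimate shows how far any single day's report for the small page can drift.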

So my advice: it is a MUST to store ALL DATA for a reasonable TIME PERIOD.

Test & Target

Having used Test & Target (an Omniture product) for over a year on at least 50 tests, I must say that there is a good market for a better tool :)


What does Test & Target do well?

* Standard A/B Testing - sending traffic randomly to 2 or more recipes in the same proportion
* Gathering metrics like Revenue, Orders, Average Order Value and Order Conversion, and publishing them in real time, including confidence
* Ability to stop the test if results are not in sync
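As a rough illustration of the first two points, the sketch below assigns visitors to recipes at random in equal proportion and computes a confidence figure for a difference in order conversion. This is not Test & Target's actual method, which is not described here; it assumes a textbook two-proportion z-test, and the function names and visitor counts are hypothetical.

```python
import math
import random

def assign_recipe(recipes, rng):
    """Send a visitor to one of the recipes, each with equal probability."""
    return rng.choice(recipes)

def conversion_confidence(orders_a, visitors_a, orders_b, visitors_b):
    """Two-sided confidence that recipes A and B convert at different rates."""
    rate_a = orders_a / visitors_a
    rate_b = orders_b / visitors_b
    # Pooled conversion rate under the assumption that A and B are the same.
    pooled = (orders_a + orders_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 0.0
    z = abs(rate_a - rate_b) / std_err
    # For a standard normal Z, P(|Z| <= z) equals erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))

# Made-up counts, purely for illustration.
rng = random.Random(42)
print("next visitor goes to recipe", assign_recipe(["A", "B"], rng))
confidence = conversion_confidence(200, 10_000, 250, 10_000)
print("confidence that the recipes differ: {:.1%}".format(confidence))
```

The z-test is only one of several reasonable ways to arrive at a confidence number; the point is that the headline metrics and the confidence figure need nothing more than per-recipe totals, which is why a tool can publish them in real time.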

What does Test & Target not do well?
* Detailed drilldown of a test - answers as to why one recipe won over another
* Metrics with respect to visitor segments

More often than not, Test & Target forces business users (i.e. us) to resort to additional tools to get the "answers", and ends up being a "plain reporting" tool. One has to use SiteCatalyst & Discover (other Omniture suite products), Insight (the old Visual Sciences), or a database-driven process to get the answers.

These problems could be solved if Test & Target allowed users to view results for certain pre-defined workflows, and in my opinion that is not much to ask.
Some pre-defined workflows in an e-commerce environment can be:
* Which product caused the difference? Product Mix
* Business Segment Issues -> These are specific to the organization, e.g. Book Shopper vs. Electronics Shopper
* Visitor Profile -> first-time vs. repeat
* Marcom Mix/Traffic Source
* Path Taken/Visitor Segment
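To show how small such a workflow could be, here is a minimal drilldown sketch that breaks test results down by a chosen segment. The record layout (the "recipe", "visitor_type" and "revenue" fields) and the sample data are assumptions made for illustration, not anything Test & Target exposes.

```python
from collections import defaultdict

def drilldown(visits, segment_key):
    """Summarise visitors, conversion and average order value per (segment, recipe)."""
    totals = defaultdict(lambda: {"visitors": 0, "orders": 0, "revenue": 0.0})
    for visit in visits:
        cell = totals[(visit[segment_key], visit["recipe"])]
        cell["visitors"] += 1
        # A visit with revenue counts as one order (a simplifying assumption).
        if visit["revenue"] > 0:
            cell["orders"] += 1
            cell["revenue"] += visit["revenue"]
    report = {}
    for key, cell in totals.items():
        report[key] = {
            "visitors": cell["visitors"],
            "conversion": cell["orders"] / cell["visitors"],
            "avg_order_value": cell["revenue"] / cell["orders"] if cell["orders"] else 0.0,
        }
    return report

# Made-up visit records, purely for illustration.
visits = [
    {"recipe": "A", "visitor_type": "first", "revenue": 0.0},
    {"recipe": "A", "visitor_type": "repeat", "revenue": 40.0},
    {"recipe": "B", "visitor_type": "first", "revenue": 25.0},
    {"recipe": "B", "visitor_type": "repeat", "revenue": 0.0},
    {"recipe": "B", "visitor_type": "first", "revenue": 0.0},
]
for key, row in sorted(drilldown(visits, "visitor_type").items()):
    print(key, row)
```

Swapping "visitor_type" for another field, such as a traffic source or product category, would give the other breakdowns in the list, provided that field is recorded on every visit.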

There is certainly a good market for such a tool. With so many brilliant CS programmers out there, it would be a great summer project that could rake in money!!