Wednesday, December 1, 2010

Analyzing Effectiveness of an e-support site

For an introduction to e-support sites:
http://webanalyticsnuggets.blogspot.com/2010/11/benefits-of-efficient-support-site.html

Web Failure rate
The single most important metric for analyzing the success of an e-support site is the 'Web Failure Rate' or WFR.

It is defined as the proportion of support site visits that did not find a solution to their problem online. So if there were 100 visits to the support site and 20 of them did not find a solution to their problem online, the web failure rate is 20%.

How to calculate Web Failure Rate:
This is the tricky part.
A visitor is proven not to have found a solution online if he subsequently contacted a non-online channel to find one. This could be a call by the visitor to the toll-free #.

In this scenario, there has to be a common field that links the web visitor to the caller. It could be a 'service tag' or it could be a 'customer number'.
With unique service tags or customer numbers, the visitor enters the number to find a solution on the support site, and also quotes it during his call with the customer service representative.
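To make this concrete, here is a minimal sketch of the WFR calculation, assuming hypothetical exports of support-site visits and call-center contacts that share a service_tag column (all file and column names below are illustrative, not a prescribed schema):

```python
# Sketch: compute Web Failure Rate by joining support visits to call records
# on the common service tag. File and column names are hypothetical.
import pandas as pd

web = pd.read_csv("support_visits.csv", parse_dates=["visit_date"])   # visit_id, service_tag, visit_date
calls = pd.read_csv("call_contacts.csv", parse_dates=["call_date"])   # call_id, service_tag, call_date

# A visit is a web failure if the same service tag shows up in the call
# channel on or after the visit date
failed = web.merge(calls, on="service_tag", how="inner")
failed = failed[failed["call_date"] >= failed["visit_date"]]

wfr = failed["visit_id"].nunique() / web["visit_id"].nunique()
print(f"Web Failure Rate: {wfr:.1%}")
```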



The shaded area in the figure above represents the web failures.

The key in e-support is to minimize web failures and hence the web failure rate (WFR).

I will cover the other metrics used in e-support in a separate post.

Thursday, November 11, 2010

Benefits of an efficient Support Site

Most ecommerce sites provide support:
1. Support on product usage - answering user queries on how to use the product. This may be part of the package that the user bought, like an 'add-on', or these may be queries the user is entitled to raise having bought the product
2. Support on product failure & product-related problems - Products fail and can have problems during their warranty period. This is support that both the seller and the product manufacturer are expected to provide. Even if the product manufacturer is supposed to handle it, nothing stops the customer from contacting the seller - and this forms an important component of customer experience

Support can be provided by:
1. Having a toll-free number where folks can call in
Cons:
* Need to have a dedicated staff to take on the calls - FTE Cost
* Cannot cross-use staff on other activities, and during peak season there are likely to be more calls - FTE Slack Time
* Low FTEs mean high waiting time - Customer Satisfaction Affected; high FTEs mean high idle time
* Core competency logic may result in moving this to a 3rd party, meaning less control
* Variability: Customers may have varying levels of satisfaction depending on the person they contacted - Bearing on Customer Satisfaction

Pros:
* Personalized Experience

2. Having a site where users can find solutions to their problems

Pros:
* Zero Ongoing FTE Cost
* Having clear paths for common problems can ensure that 70-80% of problems follow the same resolution path with no variability
* Overall, the cost of resolving a problem online is much lower than resolving it offline (that is, through a call)


So an efficient support site can reduce the overall cost of running an e-commerce business.

Monday, October 18, 2010

Path Based Segmentation as an Input to Test Design

The way analytics is used in organizations today is to test new designs. Rarely is analytics used as an input to test design. Path-based segmentation can be an effective way to come up with a design that draws not only on drawing-room designs & user testing but also on analytics best practices.

1. Path Identification
Identify the main purchase paths on your site. Let us take the popular e-commerce site ebay.in as an example. There are several paths to buy:
a. Deals path: Pages with discounted offers; usually a link from the home page exists, and some marcom vehicles land folks here
b. Default path: The traditional path.
E.g.: While buying on ebay I would start with the home page, go to a category page,
subcategory page, product details page and then add to cart and buy
c. Specific navigation mechanisms: e.g. assisted navigation on the left

d. Menu path: A set of menu options on top that can be used to navigate


Some examples of these paths on the ebay site are shown in the buckets above.
I haven't provided examples for a. and b. as these are well known.

2. Data Gathering & Representation
For each path, identify the visits, purchase visits and revenue.
Calculate RPV (= revenue/visits), conversion (= purchase_visits/visits) & visit mix
(= path_visits/total site visits) for each path

Represent these paths as a bubble chart with X-axis as RPV, Y-axis as visits and size of bubble as conversion.
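As an illustration, here is a minimal sketch of the step 2 calculations and the bubble chart. The path names follow the example above, but every number is a dummy in the spirit of the post's dummy data:

```python
# Sketch of step 2: per-path metrics and the bubble chart. Dummy numbers.
import pandas as pd
import matplotlib.pyplot as plt

paths = pd.DataFrame({
    "path":            ["Deals", "Default", "Assisted Nav", "Menu"],
    "visits":          [20000, 150000, 30000, 40000],
    "purchase_visits": [1600, 6000, 1500, 1200],
    "revenue":         [120000, 540000, 99000, 84000],
})

paths["rpv"]        = paths["revenue"] / paths["visits"]
paths["conversion"] = paths["purchase_visits"] / paths["visits"]
paths["visit_mix"]  = paths["visits"] / paths["visits"].sum()

# X = RPV, Y = visits, bubble size = conversion
plt.scatter(paths["rpv"], paths["visits"], s=paths["conversion"] * 20000, alpha=0.5)
for _, row in paths.iterrows():
    plt.annotate(row["path"], (row["rpv"], row["visits"]))
plt.xlabel("RPV ($)")
plt.ylabel("Visits")
plt.show()
```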

3. Insights as an input to new test design
The chart above is a dummy example, but it leads to key areas for action:
a. Layout Design to change visit mix
If you want to drive more revenues, you could change your layout to drive visits to your best-performing path.
(In the above example, a possible test design could be to increase visits to the deals path and reduce visits to the menu path; deals has the highest RPV, so you get more out of every visit you drive there)
b. Feature improvement to improve RPV
In the above example, going by the dummy data, you would try to improve path features on ebay.in.
So this is a simple example of how path-based segmentation analysis can help you come up with new testing ideas. Obviously this has to be customized to your organization's metric needs.
(In the above example, a possible test design could be to improve the features of a path like Menu - reason being low RPV but considerable visits)

Saturday, August 21, 2010

What to action on in a Page that changed

A page in e-tail is not just any other webpage. The more the visits to a page, the more important it is in the purchase path - and the higher the impact a change to it can deliver.

No wonder merchandising teams keep trying out different versions of a page. These versions are usually based on intuition & experience. It is the job of the analytics team to work with the merchandising teams to ensure these decisions are based on data.
Unfortunately, the role of analytics today, in the world of simple web tools, is just about providing visits and leakage information for the page OR running a simple A/B test (resulting in a 2.5% upside which, when annualized, becomes a big number). The role of analytics should be much more than analyzing the upside and annualizing the results of A/B tests. Analytics should be able to go back and say what drove the upside/downside. I am presenting an approach that helps do exactly that.

1. Starting Point: Make a list of levers that the merchandising teams have in changing the page
The levers fall into 2 broad buckets
a. Layout Changes
What changed in the layout? Some sample list could be
i. Rotating Banner instead of a static one OR position of Banner Changed
ii. New Links added, Old links removed
iii. Links rearranged etc.

b. Feature Changes:
i. Page links to a page with a new feature

Step 2a: Create a segment to isolate the entries to the page
Step 2b: Create separate segments to isolate the next clicks from page X to each following page
This is where it is imperative that you have a real web analytics tool at hand. You cannot do this in basic tools like SiteCatalyst, CoreMetrics or Discover. You need Insight/Visual Sciences

Step 3: Repeat Step 2 for the page 'before' and 'after' if you are doing a before/after analysis, or repeat for each version of the page being tested

Step 4: Tabulate as follows:
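The original table snapshot is not reproduced here, but a minimal sketch of the tabulation, assuming hypothetical next-click exports for the two versions of the page, could look like this (all numbers are dummies):

```python
# Sketch of the Step 4 tabulation; all numbers are dummies.
import pandas as pd

before = pd.DataFrame({"next_page": ["Page A", "Page B", "Page C", "(bounce)"],
                       "clicks":    [4000, 3000, 2000, 1000]})
after  = pd.DataFrame({"next_page": ["Page A", "Page B", "Page C", "Page E", "(bounce)"],
                       "clicks":    [4500, 2800, 1800, 500, 900]})

def shares(df):
    # Share of page-X entries going to each next page (bounces included)
    return df.set_index("next_page")["clicks"] / df["clicks"].sum()

table = pd.concat({"before": shares(before), "after": shares(after)}, axis=1).fillna(0)
print(table.round(3))
```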


A cursory look reveals that, for all landings on the page,
* Overall conversion of page improved
* Next clicks were fairly similar
* Bounces were slightly reduced
* Newly added Page E doing well
* Page A with changed features doing well

Now this can be translated as per our step 1 framework as follows:

Clearly, the upside in conversion is coming from the new Page E visits, from the changed features on Page A, & from increased visits to Page A, which has higher conversion. So business knows what improved

Advantages of my method.

1. The upside might not be coming from the changes to page X being tested, but rather because it leads to page Y, which has a new feature.
In such cases, my method captures the source of the changes
2. In cases where the page does not work, the merchandising & design teams know exactly what to go and fix

If you need the raw xls for the analysis or help in how to set these things up in a web analytics tool like Insight, get in touch with me at rkirana@gmail.com

Friday, August 6, 2010

Getting the $ from folks that come to Support Site

'How to get folks to buy more' is every marketer's problem.
Getting support visitors to buy is the latest buzz in both offline and online contexts. In the offline world, when somebody calls in, the rep tries to sell -
Hey! You need more RAM OR your HDD is close to full - increase capacity OR there is a brand new processor that your motherboard supports
This is not tried on everybody - an experienced rep can make out whom to pitch and whom not to

How do we replicate the same in the online world?
The first step is to find out the 'as-is', i.e. how much $ do I make from the support site today? This, in the online context, presents some problems.

Complexity 1: I came to the support site to check my order status :)
The online world presents a unique complexity: after making a purchase on an e-commerce site like Amazon, folks tend to visit the support site to check order status. These are not folks that are getting us revenue $ from the support site.
They have to be excluded

Complexity 2: X landed on support site while Y navigated to support site from purchase site. X and Y are different

The behavior of folks that land on the support site and path over to the e-commerce site is very different from that of folks that navigate to the support site and then go over to the purchase site.
Both of these have to be treated as separate segments

Complexity 3: I came to check my order status after purchasing from the support site :)

Visitors that bought via a link from the support site to the purchase site may actually come back to the support site to check their orders. They should not be excluded



Here is my stepwise approach to solving this problem
Step 1: Separate segments for Landings and Non-Landings
Create 2 segments of visits:
a) Those that land on support site - call as X
b) Those that did not land on support site - call as Y
X and Y are mutually exclusive


Step 2: For X, the bucket that landed on the support site and bought in the same visit

The $ that results from these folks is the purchases that they made on the support site. Call this $A

Step 3: For Y, the bucket that did not land on the support site
Step 3a: Visit Segment of folks that made a purchase and checked order status. Call this M
Step 3b: Visit Segment of folks that made a purchase and did not check order status. Call this N
Step 3c: Visit Segment of folks that visited support before the e-comm site. Call this P.
Now take N intersection P -> count the revenue for these visits. Call it $B
Step 3d: Visit Segment of folks that visited support and went to e-comm site before returning to the support site. Call this Q
Now revenue for Q intersection M is $C

Total Revenue is $A + $B + $C.
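Here is a minimal sketch of that segment arithmetic, assuming each segment has been exported as a set of visit ids and that revenue maps visit id to purchase $ (all ids and amounts are hypothetical):

```python
# Sketch of the segment arithmetic; ids and revenue figures are hypothetical.
revenue = {"v1": 120, "v2": 80, "v3": 200, "v4": 50, "v5": 95}

X = {"v1"}            # landed on support site and bought in the same visit
M = {"v2", "v3"}      # within Y: purchased and checked order status
N = {"v4", "v5"}      # within Y: purchased, did not check order status
P = {"v4"}            # within Y: visited support before the e-comm site
Q = {"v3"}            # within Y: support -> e-comm -> back to support

A = sum(revenue[v] for v in X)        # Step 2
B = sum(revenue[v] for v in N & P)    # Step 3c
C = sum(revenue[v] for v in Q & M)    # Step 3d
print(f"Total support-site revenue: ${A + B + C}")
```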
More on how to improve revenue from these folks in a subsequent post.

If you need help in how to set these things up in a web analytics tool like Insight, feel free to drop a mail at rkirana@gmail.com

Wednesday, July 21, 2010

Comparison of ODG Vehicle Effectiveness

Continuing on the last post: http://webanalyticsnuggets.blogspot.com/2010/06/online-marcom-vehicle-characteristics.html
which provided an overview of each ODG vehicle, in this post we compare the different ODG vehicles on parameters like:
a) Costing Method
b) Cost of Campaign
c) Primary Effectiveness Factors
d) Secondary Effectiveness Factors

Here we go :)



| SI# | Online Marcom Vehicle | Costing Method | Campaign Cost ($) | Primary Effectiveness Factors | Secondary Effectiveness Factors |
|---|---|---|---|---|---|
| 1 | Banner | Per clickthrough | High | $ spend | Choice of site where banner placed; messaging on the banner; landing page matches offer on banner |
| 2 | High Impact Placement | Per clickthrough | Very very high | $ spend | Choice of site where HIP placed; messaging on the HIP; landing page matches offer on HIP |
| 3 | Email | Close to 0 for sending email; creative & modelling cost | Low for execution | Number of email drops | Choice of right audience; sending the right offer to the right audience |
| 4 | Affiliates | Per purchase; usually 5-10% of the purchased amount is retained by the affiliate | Medium - payment hits margins hard on purchases | Offer attractiveness | Choice of site where offer placed; landing page showing the same offer as what was placed |
| 5 | Product Listings | Per purchase; usually 5-10% of the amount is retained by the lister | Medium - payment hits margins hard on purchases | Offer attractiveness | Choice of site where offer placed; currentness of offer |
| 6 | Paid Search | Per click | Medium | Keyword choice & placement | Effective bidding strategy with Google/Yahoo (don't bid on keywords that you can migrate to natural search); effective landing pages |
| 7 | Traditional mass media (e.g. newspaper ads, print, TV) | Per advertisement (e.g. for an ad on the front page; in minutes for TV) | High | Tracking difficult | Intuition of the traditional marketing/brand manager :) ;) |
| 8 | Social Media & Communities (SMAC) | Free for ads on your site; pay for ads on other sites | Medium | Create effective SMAC pages, e.g. a good Facebook page for your company that others want to visit | - |

Monday, July 5, 2010

Online Marcom Vehicle Characteristics

So you are the Marcom manager trying to understand which vehicle to spend money on?
Or the marcom analyst trying to understand the factors that drive each marcom vehicle? Then this post is for you.

Online Marcom Vehicles can be classified into the following buckets:
Bucket 1: Those that are influenced PRIMARILY by SPEND

Banners:
Example: On a little-known news site, www.ndtv.com, there is a banner on the right (linking to electronics giant Siemens) and a banner on the top (linking to travel site tripadvisor.in)
You are likely to get more visits from the sites you place banners on WHEN you spend more


High Impact Placements (HIPs)
High Impact Placements (HIPs) are banners on 'extremely high' traffic sites like msn.com
Below is a banner for Netflix on msn.com; Netflix will get more visits by advertising on msn.com than on ndtv.com. However, the costs of this are huge, and by the time you go to msn.com, the Netflix HIP is probably gone !!

Bucket 2: Those that are influenced PRIMARILY by ACTIONS

Email:
Email is a marcom vehicle heavily influenced by the timing of drops. Usually a mining algorithm runs in the backend to score prospects on their likelihood to respond/make a purchase - and targeted emails are sent to this sample
These are heavily influenced by the drops - if there are a large number of drops in a particular week, say 'days of deals', visits from email would be huge

Emails cost nothing to send - the only cost involved is the cost of making the creatives



Bucket 3: Those that are influenced PRIMARILY by OFFER ATTRACTIVENESS

Affiliates:
Mr X has a popular blog on computers. Samsung wants to advertise there.
There are 25 other similar blogs by folks like Mr X :). Can a large company like Samsung actually go and negotiate rates/service adherence with all of these 25 folks?

Now there are many other sites where Samsung would want to advertise, e.g. places where folks get coupons, such as Couponmountain.com, etc.

In these cases, companies tie up with an affiliate network site like www.linkshare.com, www.befree.com or www.commissionjunction.com. One of these - Linkshare, BeFree, CommissionJunction - would be contracted by a giant like Samsung to handle affiliate marketing. The network site receives a portion of the payment that the end affiliates get.

Example: Say somebody used a coupon from www.couponmountain.com and made a purchase worth $150 from Samsung (assuming Samsung contracted Linkshare as its affiliate vendor and CouponMountain is one of the many sites that Linkshare engages). Out of the approximately 5-10% commission that CouponMountain gets - between $7.50 and $15 - a small portion would go to Linkshare.


Product Listings:
Product Listings are sites that carry your list of offers - pricing comparison sites.
An example is Deals2buy.com -> they have daily-changing deals from many companies, one below the other.

On the deals2buy site today, there are offers from both Samsung & Canon - competitors - one below the other



In both Product Listings & Affiliates, the primary driver is "offer attractiveness" - it may be features or it may be price


Bucket 4: Those driven primarily by PLACEMENT and CHOICE

Paid Search:
Google - "no evil" - for the user of the product - for the one that wants to see search results. But definitely not "no evil" for the advertiser. Google monetizes its wonderful search engine through advertisements.

Example: A search on "used cars" yields a lot of results. However, there are 2 portions marked in red, one above the search results and one on the right, called "sponsored links" -> this is an example of PAID SEARCH



The ones in red have actually paid to be there. The one that bid the most got the top spot, and similarly for the ones below.
If you are a seller of used cars like carwale.com, your traffic from paid search is going to be determined by:
a) Placement: Where you were placed
b) Choice: Which keywords you bid on

Example: It might be better for you to bid on the keyword 'suzuki used car' than 'ford used car' in India


Bucket 5: Traditional Marcom Media

These include TV advertisements (no way to track effectiveness easily, and I am dead against using traditional mass media for e-comm sites - Ibibo.com does this and I hope somebody drives this point home to them :)

These also include catalogs, print media etc. mailed out to folks.

There is no way to track effectiveness, and these are also very expensive. I am against them

Bucket 6: New-age Social Media & Communities
These include the world of facebook, twitter, orkut......

Tuesday, June 15, 2010

Perils of A/B Testing & Multi-variate Testing

Annualizing upside gives a big number. It is not uncommon to see an analytics team get a 2% upside in revenue per visit and extrapolate it to an entire year, resulting in millions of $ of impact that is not real.
This is especially so if the test is on the basket or checkout page - you are then claiming that you are going to increase your company's entire online/e-commerce revenue by almost 2%, because everyone that's going to buy is going to add to basket or check out

This is very dangerous due to the following reasons:
* Many times an A/B test might give the upside due to sheer chance. An A/A test might perform even better :)
Analysts ignore the effect of 'noisy data'. The tools unfortunately also ignore it.
* The impact is so low - only 2% - because the test was run either 'too short' or 'too long'. 'Too long' means n is large, and since the standard error used to extrapolate sample results to the population is s/sqrt(n), a large enough n makes even a trivial difference look significant (see the sketch after this list). 'Too short' means you may have reached a confidence level, but simply don't go by such tests. I will post separately on why many of the current testing methods are inadequate and dated.

* There is no sound reasoning about what caused the upside.
It is like saying 'Customers felt better' - what? I didn't :)
'Customers did not get distracted' - cool, that was not a factor for me ....
Unless there is sound reasoning, the impact $ must not be taken at face value

* Annualizing upside is dangerous. There might be downstream or upstream changes during the course of the year that render these gains redundant.
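To see the 'too long' problem in action, here is a minimal simulation: the same 2% difference in mean RPV is insignificant at small n and 'significant' at large n, purely because the standard error shrinks with n (all numbers are made up):

```python
# Sketch: a trivial 2% RPV difference becomes 'significant' once n is large,
# because SE = sqrt(s1^2/n1 + s2^2/n2) shrinks as n grows. Dummy numbers.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 1000, 100000):
    a = rng.normal(10.0, 5.0, n)   # recipe A: per-visit RPV samples
    b = rng.normal(10.2, 5.0, n)   # recipe B: a mere 2% higher mean
    t, p = stats.ttest_ind(a, b)
    print(f"n={n:>6}  p-value={p:.3f}  'significant' at 95%: {p < 0.05}")
```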

Saturday, June 5, 2010

Social Media & communities - What are the right metrics?

Here are my thoughts on three questions on Social media & Communities in a web analytics group:

1. How to attribute ROI to Facebook and other Social Media?

Answer:
#1: Revenue from Facebook apps with same-session attribution (meaning the visitor came from a Facebook app and purchased on the same visit)
#2: Revenue from Facebook apps with multi-session attribution (meaning he could have come from a Facebook property 30 or 60 days ago and bought today - give credit to the Facebook app since it made him familiar with our site); see the sketch below for both
#3: Engagement metrics: remember this is an important component of CRM
Are folks writing positively about your company or not?
Are folks talking about your organization?

A big risk would be if folks stopped talking about or discussing you - the implication being that your products and brand are no longer "hot"
Many negative posts - your newly launched product is not doing well
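For #1 and #2, here is a minimal sketch of both attribution rules, assuming a hypothetical visit log with visitor_id, visit_start, traffic_source and revenue columns (the 30-day lookback is illustrative):

```python
# Sketch: same-session (#1) and multi-session (#2) attribution for Facebook.
# Column names and the 30-day lookback are illustrative.
import pandas as pd

visits = pd.read_csv("visits.csv", parse_dates=["visit_start"])
LOOKBACK = pd.Timedelta(days=30)

# Each visitor's first touch from the Facebook app
fb_touch = (visits[visits["traffic_source"] == "facebook_app"]
            .groupby("visitor_id")["visit_start"].min().rename("first_fb_touch"))

purchases = visits[visits["revenue"] > 0].join(fb_touch, on="visitor_id")

# 1: the purchase visit itself came from the Facebook app
same_session = purchases.loc[
    purchases["traffic_source"] == "facebook_app", "revenue"].sum()

# 2: any Facebook touch within the lookback window before the purchase
in_window = (purchases["visit_start"] - purchases["first_fb_touch"]).between(
    pd.Timedelta(0), LOOKBACK)
multi_session = purchases.loc[in_window, "revenue"].sum()

print(same_session, multi_session)
```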

2. Is it right to target people this way? Privacy??

Gmail shows advertisements based on keywords in email - so do others. The site would usually have a clause at registration asking the user if he wants to view advertisements - so this is not an issue

3. Is the future bright?

Absolutely yes!!
Think of this - you come from a social media & communities site.
Where should I land you on my e-commerce site?
Should I land you on a good deal or on a product detail page of a standard product?
It is a mining & testing/targeting problem domain

Saturday, May 15, 2010

Email Analytics: Part 1 - Costs of an Email Marketing Campaign

From all executives:
"The least costly campaign to setup is an email marketing campaign - Only costs involved is to have an agency setup a few creatives and mail them out from the company to the list of customers!!!"
How often have we heard this?

Sending the campaigns in the above way would be "extremely costly" in the long run because:
(1) Frequent mailers would cause these to be marked as spam, thus missing out on potential buyers - folks will BEGIN to IGNORE good offers
(2) Create a wrong brand image of the company
(3) Habituate buyers to something like a 10% discount - they will not buy without that
(4) Result in high bounce rates if the campaign and the landing page tell different stories

For an effective email campaign, the drivers would be the following:
(1) Analytical Model to score your Email opt-in list using mining techniques - SELECTION ANALYTICS COST
Engage a firm that employs Data Mining techniques like Logistic Regression (LR) to score your email list based on factors such as:
a) Past response to emails, offers, discounts, messaging etc
b) Browse/buy Behavior on the site in the past
c) Factors involved in prior purchases: marcom source, product bought, ....; lifecycle analysis to predict who is likely to respond (e.g. a printer buyer who is running out of ink!!)
d) Integration of demographic and other data from 3rd party sources like BIS
e) Other

(2) Email Design Testing & Landing Page Optimization for the campaign (A/B, Multi-variate Testing) - TESTING COST
Next Step is to be clear about the intent of the email campaign.
What are we trying to do?
a) Sell a deal (like 10% off or $100 off or....)
b) Communicate hot new products arrivals
c) Use prior information that a customer needs a certain product
d) Other
And to design landing pages for the email visits. The landing page is the page where these email visits are going to land on your site.
So if they see a link xxx in the email, the click on xxx leads to the landing page on your site.

The landing page behind link xxx is very critical because, if it communicates a different story from the email, the results would be devastating

Suppose you have identified 10,000 folks to send the email out to; don't send it right away.
First identify a bucket, say 500.
Say you have 2 recipes of the landing page for the campaign.
Send half of them to link xxx1 and half to link xxx2.
Measure success and the statistical confidence; if confidence is not high, increase the 500 bucket (see the sketch below)
Once you know which recipe is better, use that for the actual campaign
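A minimal sketch of the confidence check on the pilot bucket, using a standard two-proportion z-test on conversion (the counts below are hypothetical):

```python
# Sketch: two-proportion z-test on pilot conversion; counts are hypothetical.
from math import sqrt
from statistics import NormalDist

def confidence(conv1, n1, conv2, n2):
    # Two-sided confidence that the two conversion rates differ
    p1, p2 = conv1 / n1, conv2 / n2
    pooled = (conv1 + conv2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 1 - 2 * (1 - NormalDist().cdf(abs(z)))

# 250 recipients to landing page xxx1, 250 to xxx2
print(f"confidence: {confidence(conv1=30, n1=250, conv2=18, n2=250):.0%}")
# If confidence is low, widen the pilot bucket before the full 10,000 send
```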

(3) Create the email creatives & Landing Pages - CREATIVE CREATION COST
This is a step that comes between (1) and (2)
It is important that the intent of the campaign is properly understood when the email creatives and landing pages are designed
And that there are 2-3 versions to test

(4) Measure the ROI of the campaign and store the results for future learnings - POSTMORTEM ANALYTICS COST
This is a stage often ignored.
You could have an organization that has sent out a million email campaigns without learning anything

So total cost is SELECTION ANALYTICS COST + TESTING COST + CREATIVE CREATION COST + POSTMORTEM ANALYTICS COST.

Will have a series of posts on these topics in the coming days

Friday, May 7, 2010

Interaction between paid & unattributed traffic



You have a paid marcom campaign - you take it off - and suddenly you find a drop in unattributed traffic.

Let us take a hypothetical scenario.
Weekly traffic dropped 40% as a result of taking out a campaign that generated approximately 10,000 visits a week. The expected fall was 25%, but there is an extra 15% fall due to a drop in unattributed/direct-load traffic. You are perplexed :O
What caused this?

This can be graphically represented as a bridge chart:






In this post, let us go through a workflow of how to mine this information and communicate it to business stakeholders
(After all, presentation is an art - if you can't get your idea across, it will never get implemented)

STEPS TO FOLLOW/DRILLDOWN THE PROBLEM
Here are the steps to follow before we decide to take out a campaign

I) Capture the first-time and repeat visits that resulted from the ODG campaign over the last few days (maximum of 90 days)
Calculate the revenue per visit separately for each
II) Arrive at a statement:
"Campaign C results in an average of x visits a week landing from a campaign and y visits that did not land on the campaign on the same visit but on a prior visit"
III) Revenue Impact of removing the campaign would be
x*px + y*py, where px and py are the average RPVs for the x and y visits

Now let us go through an actual workflow of analyzing this information:

DETAILED ANALYSIS
Step I:
Job to be done: Capture the visits that resulted from the ODG Campaign for the last few days
Method: Create a visit segment with a condition on traffic source as from campaign


Get the visits for above segment - it is x.

Job to be done: Capture the visits wherein the user did not come from the campaign on the same visit but had on a prior visit
Method: Visitor segment with a condition on traffic source as from campaign, and visit_sequence_number > 1


Get the visits for the above segment - call it w.
Then y = w - x, i.e. visits that came to your site as direct load but had seen the campaign on a prior visit (see the sketch below).
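A minimal sketch of deriving x, w and y from a raw visit log, assuming hypothetical visit_id, visitor_id, landing_source and visit_number columns:

```python
# Sketch: derive x, w and y from a raw visit log. Column names hypothetical.
import pandas as pd

visits = pd.read_csv("visits.csv")   # visit_id, visitor_id, landing_source, visit_number

campaign_visitors = set(
    visits.loc[visits["landing_source"] == "campaign_C", "visitor_id"])

# x: visits that landed from the campaign
x = (visits["landing_source"] == "campaign_C").sum()

# w: visits beyond the first by visitors who have landed from the campaign
w = (visits["visitor_id"].isin(campaign_visitors)
     & (visits["visit_number"] > 1)).sum()

# y: directional estimate of visits that came as direct load but had seen
# the campaign on a prior visit
y = w - x
print(x, w, y)
```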




Step II:
Here is what we found:
"Campaign C results in an average of 10000 visits a week landing from a campaign and 6000 visits that did not land on the campaign on the same visit but on a prior visit. Their RPVs are $4 and $6 respectively."

So the revenue impact of removing this campaign is
10000 * 4 + 6000 * 6 = $76,000 a week


This is just a directional workflow
So when you take out a campaign, bear in mind that you might be impacting direct traffic as well.

Thursday, April 15, 2010

Building the ECRM Model: Variables of interest

The usual variables of interest in building an ECRM model for customer segmentation fall into the following buckets.

A) Source Information
* Did he come from a vehicle we spent marcom $ on OR did he come otherwise?
* What was the source that first got him to the site and what has he used subsequently?
* Which specific marcom vehicle/site did he come from? Which search keyword did he use, etc.?

Nail down all the source information

B) Visitor Profile
* How many visits has he made in the past?
* Was there a learning/purchasing cycle evident in the visit pattern? Was there something specific he was looking for? (e.g: Deal/Price Point/Specific Product .. etc)

C) Paths on the site
* What navigation methods, in terms of pages seen, were used on the site? E.g.: navigation on the left/top bar, internal search, page groups seen (e.g. deals pages, normal non-deals product pages), did he add to cart/save cart, did he try configuring, did he start checkout, etc.

D) Integrate with non-online channel data
Can we integrate his online information with some of his offline behaviors?
E.g.: He might have visited the support site after buying on your commerce site. The quality of support might determine if he purchases in the future.
Did he visit the retail store, or did he try buying through a sales call?
Did he visit competitor sites? (Much of this information is now available on shared marketing sites)

E) Product Affinity
Analyze those who bought - transactional history - to understand product affinity: products likely to be bought together OR products that a segment likes. Increase cross-sells/up-sells based on these.
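As an illustration, here is a minimal sketch of buckets A-E assembled into one feature row per visitor; every column name below is hypothetical, purely to show the shape of the model input, not a prescribed schema:

```python
# Sketch: buckets A-E as one feature row per visitor. Columns hypothetical.
import pandas as pd

features = pd.DataFrame([{
    # A) Source information
    "first_source": "paid_search", "latest_source": "email",
    # B) Visitor profile
    "past_visits": 7, "looked_for_deals": True,
    # C) Paths on the site
    "used_internal_search": True, "added_to_cart": True, "started_checkout": False,
    # D) Non-online channel data
    "support_visits_after_purchase": 2, "visited_retail_store": False,
    # E) Product affinity
    "top_category": "printers", "bought_accessories_together": True,
}], index=["visitor_001"])

print(features.T)
```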

Based on these it is possible to build a robust eCRM model -> I am building one using SAS and advanced analytics. Will keep you posted once I get interesting results, or results that are publishable based on a standard methodology

Sunday, April 4, 2010

Marcom Attribution Problem

Online Demand Generation/Marcom spends form an important portion of the Internet Marketing budget. The budget has to be spent wisely to ensure that the right funnel inputs get rewarded. Rewarding the wrong vehicles can result in lower revenue/value for the company. Optimizing marcom spend is necessary to ensure firm value is maximized.


Case 1: Which vehicle should get the credit in the multi-visit multi-vehicle case?

Peter visited www.evogear.com on Monday from a banner advertisement
He visited on Tuesday from an email advertisement
He visited on Thursday from paid search
Finally he came directly on Friday by typing URL in the browser and made a purchase.
Who should we give the credit to?

Case 2: Up to how many days after a visit should a marcom vehicle get credit?
Suppose Peter came from an affiliate site to www.evogear.com on Monday March 1st.
Suppose he bought on March 15th. Should you pay the affiliate site?

Affiliates usually charge 4-5% of the revenue that results from a purchase, and the payment window varies from 30 to 90 days after the first visit, depending on the equity between the site and the affiliate (Porter here :) -> supplier power/vendor power)

Knowing how your purchasers usually buy and which vehicle touchpoints they pass through is an important step to undertake before paying the ODG vehicles - it gives you good negotiating power and helps strike good deals
I will tackle Case 2 in this post and Case 1 in a separate one.


Tackling Case 2 - What is the ideal number of days to consider?


To find: How many purchases happen 1, 2, 3, 4, ... n days after the person came from the campaign. Say I want to find the effectiveness of affiliates

Method: Usage of latency feature in Omniture Insight/Visual Sciences. Will outline this in a separate post
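Until then, here is a minimal equivalent of that latency calculation on a raw visit log (this mimics, rather than uses, the Insight feature; all column names are hypothetical):

```python
# Sketch: purchase latency after an affiliate touch, from a raw visit log.
import pandas as pd

visits = pd.read_csv("visits.csv", parse_dates=["visit_date"])

first_touch = (visits[visits["source"] == "affiliate"]
               .groupby("visitor_id")["visit_date"].min().rename("touch"))

purchases = visits[visits["revenue"] > 0].join(first_touch, on="visitor_id")
latency_days = (purchases["visit_date"] - purchases["touch"]).dt.days

# How many purchases happen 1, 2, 3, ... n days after the affiliate touch
print(latency_days.value_counts().sort_index())
```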

Friday, March 26, 2010

eCRM - how is it different from traditional CRM?

CRM or Customer Relationship Management is:
* understanding your customers' needs and wants - so that you are able to customize offerings to them
* understanding your most valuable customers and retaining them; converting the rest to this bucket

Traditional CRM methods, like those at a retail store, can customize only so much. E.g.: At a Reliance Fresh outlet or a Walmart outlet, there can be customization in the form of:
* Separate compartments by 'product type' or 'customer type'
* Coupons to improve loyalty (Points per purchase accumulated over time can be exchanged for purchases or discounts)
All of the above are examples of solving an m:n problem, where m is the number of offers and n the number of customers, and m is always much less than n.

ECRM differs from traditional CRM from a sheer capability standpoint. Theoretically it is possible in ECRM to have one offer per customer - i.e. 1:1 marketing.
* Based on prior visitor behavior it is possible to dynamically generate a page per customer which is true 1:1 marketing
* While we are yet to reach there, it is still possible to have an i:j solution where the ratio of offers to customers is far closer to 1:1 than in the m:n traditional CRM problem

Saturday, February 27, 2010

Confidence: Business Angle & Statistical Angle

When will my test ever reach 100% confidence?
What does 90% confidence mean for me? Should I wait till I have 95% confidence before calling a winner?
How long should I run my test?
Is Test & Target the only way to get the confidence value OR can I calculate it independently?

Above are questions from business users that every analytics person in the web world is familiar with. I will explain this in two parts:
a) From the business angle/viewpoint
b) From the statistical viewpoint


CONFIDENCE FROM A BUSINESS ANGLE:
When you state:
"Recipe A is better than Recipe B on RPV with a confidence of 90%" -> It means that if you run the test on 100 days (i.e. there were 100 independent trials of the test, you could be confident that on around 90 of those days, Recipe A would be the winner)"

So there are key points here:
a) Confidence is based on a variable
b) Confidence is a percentage
c) The percentage indicates the surety you have that, when you shut off the test and go "live", the live results will be similar to the results of the experiment

High confidence means that one recipe is CONSISTENTLY beating the other one.
For all our examples let us choose RPV (Revenue per visit) as the metric.

For a business user, a line chart can be an indicator of confidence.

Rule 1: The farther apart the lines, the better
Rule 2: The fewer the overlaps between the two lines, the better
Rule 3: The fewer the spikes on any particular day, the better

In figure 1, the RPV values are far apart from each other, while in figure 2 they are not.
In figure 1, the overlap is minimal, while in figure 2 there is significant overlap


Statistically, you will always find that in figure 1 the confidence is higher (it is going to be close to 100% for this example)

Rule 3 is important - it implies that the standard deviations should be low. In case there are a lot of spikes, it is good to calculate confidence after removing the outlier dates

So answers to the business questions:
1. Your test will almost certainly never reach 100% confidence
2. 90% confidence means that when you shut off the test and go live, you can expect that in at least 90 out of 100 time periods, the results that you saw in the test environment will be replicated in the real world
3. There is no definite period for which you must run the test - run it for at least 2-3 weeks, and for as long as you need sufficient data to understand why the test performed well or why it didn't
4. No, Test&Target/Offermatica and such tools are not the only way to calculate confidence. It is simple, and you can calculate it in Excel


CONFIDENCE FROM A STATISTICAL ANGLE:
First, I will tell you how to calculate confidence in excel. Then I will go into the statistical detail of it.

1. Isolate the variable on which you are trying to calculate confidence. Write down the numbers vertically by day as shown below:
2. Fill up the following values:
Recipe A
Mean:
Standard Deviation:

Recipe B
Mean:
Standard Deviation:


The above are easy: calculated using the AVERAGE(i:j) and STDEV(i:j) functions

Degrees of Freedom: number_of_days for Recipe A + number_of_days for Recipe B - 2


t-value: t = (Mean_A - Mean_B) / (S_pooled * SQRT(1/n_A + 1/n_B)), where S_pooled = SQRT(((n_A - 1)*SD_A^2 + (n_B - 1)*SD_B^2) / (n_A + n_B - 2))
p-value: in Excel, TDIST(ABS(t), degrees_of_freedom, 2); confidence = 1 - p-value
(These are the standard two-sample t-test formulae shown in the Excel snapshot)
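If you would rather not use Excel, the same confidence number takes a few lines of code; here is a minimal sketch with scipy's two-sample t-test (which uses the same pooled formula and degrees of freedom) on dummy daily RPV values:

```python
# Sketch: the same two-sample t-test outside Excel (pooled variance,
# df = n_A + n_B - 2). The daily RPV values below are dummy numbers.
from scipy import stats

rpv_a = [4.1, 4.3, 3.9, 4.5, 4.2, 4.4, 4.0]   # Recipe A, by day
rpv_b = [3.8, 3.7, 4.0, 3.6, 3.9, 3.8, 3.7]   # Recipe B, by day

t, p = stats.ttest_ind(rpv_a, rpv_b)
print(f"t = {t:.2f}, p = {p:.4f}, confidence = {1 - p:.1%}")
```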





As you have seen above, from a statistical viewpoint following are important:
1. Mean of the recipes
2. Standard Deviations
3. Essentially the above - and the resulting histograms/normal distributions of the Recipe RPVs

What the t-test does, in effect, is compare the two distributions side by side.
In case there is no overlap, as in case 1, confidence is at its highest
Cases 2 and 3 show high overlap and hence low confidence

Friday, February 19, 2010

Bridging gap between TnT and Insight

A common scenario in the life of a web analyst focussed on Testing & Targeting using Omniture tools:

Business Scenario:

1. A test is run with 2 or more recipes. Test & Target shows recipe A as better than the other competing recipes B, C, D... A was the control
2. The new design that the business wanted to see win (B) was unsuccessful. The business wants to drill down into the reasons for the underperformance of B.

Task for web analyst as a result of above:
1. The underperformance of B could have been due to a variety of reasons (first-time vs. repeat visitors, product mix, browser/visitor profile, traffic source, ....).
These cannot be answered using Test & Target - you need to use other Omniture tools like Discover or Insight.
2. Insight is the most powerful web analytics tool in the world, and I presume the web analyst would prefer to use it, given his company has the tool
3. However, while Test & Target shows Recipe A as the winner on RPV for certain reasons and by a certain margin, the other Omniture tools differ from Test & Target in the numbers
Now how to bridge the gap?

Bridging gap between RPV numbers of Test & Target vs. Other Omniture tools (Insight, Discover, SiteCatalyst,....)
1. Test & Target counts visits only if on the visit the user interacted with the page on which the test was being run.
Test & Target uses mboxes to show differing content between recipes. For a visit to get counted as a TnT visit, it is necessary for the mbox to fire.
Example:
So say you are running a test on the deals page.
Visitor X on visit_id x1 saw the deals page and added to cart. This visit is counted as a TnT Visit and as a SiteCatalyst visit
Visitor X on visit_id x2 went to cart where he had saved and viewed the item. This visit is not counted as a TnT visit but is counted as a SiteCatalyst visit

So the visit numbers from TnT vs. SiteCatalyst/Insight are going to match only if you use Insight to replicate the TnT counting behavior (a sketch of the idea follows).
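A minimal sketch of that replication on a raw hit log: count a visit only if the test page (where the mbox fires) was seen in that visit (file and column names are hypothetical):

```python
# Sketch: replicate TnT's visit counting on a raw hit log.
import pandas as pd

hits = pd.read_csv("hits.csv")   # columns: visit_id, page

all_visits = hits["visit_id"].nunique()                                   # SC/Insight-style
tnt_visits = hits.loc[hits["page"] == "deals page", "visit_id"].nunique() # TnT-style
print(all_visits, tnt_visits)
```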

2. Timezone
TnT timezones and Insight timezones are set up by your implementation consultant.
It is quite possible the Insight timezone is CST while TnT's is GMT. In such a case you have to offset the times

3. Tag firing
It is possible that the TnT Tags fired while the SC ones didn't - this is a small proportion

4. Revenue attribution logic
TnT attributes revenue when the mbox on the orderConfirm page fires.

Implication:
Revenue will be counted in TnT if the visitor saw the campaign on a prior visit and then made a purchase on a new visit without going back to the test page.

Example:
On 12th Jan, Mr X visited deals page where an offer was being run; added to cart and saved to cart
On 13th Jan, Mr X bought $100 from saved cart without visiting the deals page
The $100 is counted as TnT revenue

In TnT, the revenue attribution window is 14 days. Insight/SC might be using 30 days. So you have to replicate the TnT behavior in SC/Insight

5. First/Repeat Visit

TnT bases first/repeat visit on its cookie mbox being fired. So what is a repeat visit for SC/Insight may not be one for TnT
However, Insight can replicate the TnT behavior

All this tells us why Insight must be used to match TnT behavior: you can create powerful segments to replicate the TnT counting

Sunday, January 3, 2010

Perils of Sampling

You have so much online data - storing it is so difficult - how about using a sample? What is a good sample: 1:1, 2:1, 4:1 or 16:1?

Hmm......Good idea????? Will save me COSTS and PROCESSING TIME


Election results are predicted for an entire nation of 100 crores with a sample of 10,000 - and some of those predictions are accurate - so is 16:1 a good sample for online data?

My answer is an 'EMPHATIC NO'

(1)
The answers that an election survey and online analytics are trying to answer are VERY DIFFERENT.
* Election survey is trying to answer a SINGLE QUESTION with RESTRICTED CHOICES (Which party is going to win?)
* In online data the QUESTIONS ARE MANY and the CHOICES ARE NOT KNOWN.

(2)
Sampling online data can be HAZARDOUS for PAGES THAT GET A FRACTION OF THE SITE VISITS. Say there is a site with 10K visits a day and a page with 100 visits a day.

At 4:1 sampling, the results are based on 2.5K out of 10K visits. Now it is possible that only 10 of those 2.5K (instead of the expected 25) fell into the page's 100 visits. Scaled back up, the results are going to be OBVIOUSLY WRONG (see the simulation below)
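A minimal simulation of this, drawing a 4:1 sample day after day and scaling the page's sampled visits back up:

```python
# Simulation: sample 2,500 of 10,000 daily visits (4:1) for a page that
# truly gets 100 of them, then scale the sampled count back up by 4.
import numpy as np

rng = np.random.default_rng(0)
sampled = rng.hypergeometric(ngood=100, nbad=9900, nsample=2500, size=10000)
estimates = sampled * 4   # scaled-up estimate of the page's daily visits

print("true daily visits: 100")
print("estimates range over 10,000 simulated days:",
      estimates.min(), "-", estimates.max())
```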

So my advice: it is a MUST to store ALL DATA for a reasonable TIME PERIOD.

Test & Target

Having used Test & Target (an Omniture product) for over a year on at least 50 tests, I must say that there is a good market for a better tool :)


What Test & Target does well?

* Normal A/B testing - sending traffic randomly to 2 or more recipes in the same proportion
* Gathering metrics like Revenue, Orders, Average Order Value, Order Conversion and publishing it real time, including confidence
* Ability to stop the test if results are not in sync with expectations

What Test & Target does not do well?
* Detailed Drilldown of a test - Answers as to why one recipe won over another
* Metrics with respect to visitor segments

More often than not, Test & Target forces business users (i.e. us) to resort to additional tools to get the "answers", ending up being a "plain reporting" tool. One has to use SiteCatalyst & Discover (other Omniture suite products) or Insight (the old Visual Sciences), or resort to a database-driven process, to get the answers

These problems could be solved if the Test & Target tool allowed users to view the results of certain pre-defined workflows - and in my opinion this is not much work.
Some pre-defined workflows in an e-commerce environment can be:
* Which product caused the difference? Product Mix
* Business Segment Issues -> This is specific to organizations. E.g: Book Shopper vs. Electronics Shopper
* Visitor Profile -> 1st vs. repeat
* Marcom Mix/Traffic Source
* Path Taken/Visitor Segment

There is certainly a good market for such a tool - and with so many brilliant CS programmers out there - it would be a great summer project that can rake in money!!
A market definitely exists