Categorisation and tagging

Tagging offers an efficient method both for searching and for identifying customer preferences on the axis most appropriate for the customer.

The traditional way to organise a retail catalogue is by means of hierarchical categorisation. If you’re selling clothes, for example, you first divide them into men’s and women’s, then into formal and casual, then into different items of clothing, and so on. With a good categorisation, each SKU will have a unique “path” down the category tree. For traditional management purposes, this kind of categorisation might be useful, but it doesn’t lend itself well to either searching or pattern recognition.

To take a personal example (note that I’m going into anecdata territory here), I’m in the market for a hooded sweatshirt, and it has been extremely hard to find. Having given up on a number of “traditional retail” stores in the “High Street” (11th Main Road, 4th Block, Jayanagar, Bangalore) close to where I stay, I decided to check online sources and they’ve left me disappointed, too.

To be more precise, I’m looking for a grey sweatshirt in size 42, made with a mix of cotton and wool (“traditional sweatshirt material”), with a zipper down the front, a hood, and pockets large enough to hold my hands. This description is as specific as it gets, and I don’t imagine any brand having more than a small number of SKUs that fit this specification.

If I were shopping offline in a well-stocked store (perhaps a “well-stocked offline store” is entering mythical territory nowadays), I would repeat the above paragraph to a store attendant (good store attendants are also very hard to find nowadays), who would pick out the sweatshirts that conform to these specifications, and I would buy one of them. The question is how one can replicate this experience in online shopping.

In other words, how can we set up our online catalogue such that it becomes easy for shoppers to search specifically for what they’re looking for? Currently, most online stores follow a “categorisation” format, where you step down two or three levels of categorisation and are then shown a large assortment. This, however, doesn’t allow for efficient search. Let me illustrate with my own experience this morning.

1. Amazon.in: I hit “hoodies” in the search bar, and got shown a large assortment of hoodies. I can drill deeper in terms of sleeve length, material, colour and brand. My choice of material (which I’m particular about) is not there in the given list. There are too many colour choices and I can’t simply say “grey” and be shown all greys. There is no option to say I want a zip-open front, or a cotton-wool mix. My search ends there.

2. Jabong (rumoured to be bought by Amazon shortly): I hover over “Men’s”, click on “winter wear” and then on “hoodies”. There is a large assortment of both material (cotton-wool mix not here) and brand. There are several colours available, but no way for me to tell the system I’m looking for a zip-down hoodie. I can set my price-range and size, though. Search ends at a point when there’s too much choice.

3. Flipkart: Hover over “men’s”, click “winter wear” and then sweatshirt. Price, size and brand are the only axes on which I can drill down further. The least impressive of all the sites I’ve seen. Too much choice again at a point when I end search.

4. Myntra (recently bought by Flipkart, but not yet merged): The most impressive of all the sites. I hover over “Men’s” and click on sweaters and sweatshirts (one fewer click than Jabong or Flipkart). After I click on “sweatshirts” it gives me a “closure” option (this is the part that impresses me) where I can say I want a zippered front. No option to indicate hood or material, though.

In each of the above, it seems like the catalog has been thought up in a hierarchical format, with little attention paid to tagging. There might be some tags attached such as “brand” but these are tags that are available to every item. The key to tagging is that not all tags need to be applicable for all items. For example, “closure” (zippered or buttoned or open) is applicable only to sweatshirts. Sleeve length is applicable only to tops.
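To make the idea of per-item tags concrete, here is a minimal sketch of searching a catalogue where each SKU carries only the tags that apply to it. The SKU codes, tag names and the `search` helper are all made up for illustration; no store's actual catalogue works exactly like this.

```python
from typing import Dict, List

# Hypothetical catalogue: each SKU carries only the tags relevant to it.
# "closure" and "hood" exist only on sweatshirts, "sleeve" only on the t-shirt.
CATALOGUE: Dict[str, Dict[str, str]] = {
    "SKU-001": {"type": "sweatshirt", "colour": "grey", "closure": "zip",
                "hood": "yes", "material": "cotton-wool", "size": "42"},
    "SKU-002": {"type": "sweatshirt", "colour": "grey", "closure": "pullover",
                "hood": "yes", "material": "polyester", "size": "42"},
    "SKU-003": {"type": "t-shirt", "colour": "grey", "sleeve": "short",
                "size": "42"},
    "SKU-004": {"type": "jeans", "colour": "blue", "size": "34"},
}


def search(catalogue: Dict[str, Dict[str, str]], **wanted: str) -> List[str]:
    """Return the SKUs whose tags match every requested tag/value pair.

    An SKU that lacks a requested tag simply does not match.
    """
    return sorted(
        sku
        for sku, tags in catalogue.items()
        if all(tags.get(tag) == value for tag, value in wanted.items())
    )


# The sweatshirt described earlier, expressed as tags.
matches = search(CATALOGUE, type="sweatshirt", colour="grey", closure="zip",
                 hood="yes", material="cotton-wool", size="42")
print(matches)  # ['SKU-001']
```

A shopper who only cares about colour can call `search(CATALOGUE, colour="grey")` and get every grey item, whatever its place in the category tree.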

In addition to search (as illustrated above), the purpose of tagging is to identify patterns in purchases and know more about customers. The basic idea is that people’s preferences could be along several axes, and at the time of segmentation and bucketing you are not sure which axis describes the person’s preferences best. So by having a large number of tags that you assign to each SKU (this sadly is a highly manual process), you give yourself a much superior chance of getting to know the customer.

Technological capability for getting to know the customer has advanced considerably. For example, it is now really quick to do a Market Basket Analysis based on large numbers of bills, which helps you identify patterns in purchase. With the technology bit being easy, the key to learning more about your customers is the framework you employ to “encase” the technology. And without efficient tagging, you are giving yourself less of a chance of categorising the customer on the right axis.
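For readers who haven't seen one, a Market Basket Analysis at its simplest counts how often pairs of items turn up on the same bill. The sketch below computes support and lift for each pair. The bills are invented, and a real analysis would run over item tags as well as SKUs and over far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bills: each is the set of items bought together.
bills = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "biscuits"},
    {"bread", "butter", "milk"},
]


def pair_stats(bills):
    """Compute support and lift for every pair of items that co-occurs.

    support = share of bills containing both items
    lift    = support / (share with item a * share with item b);
              above 1 means the pair is bought together more often
              than independent purchases would suggest.
    """
    n = len(bills)
    item_counts = Counter(item for bill in bills for item in bill)
    pair_counts = Counter(
        pair for bill in bills for pair in combinations(sorted(bill), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        stats[(a, b)] = {
            "support": count / n,
            "lift": count * n / (item_counts[a] * item_counts[b]),
        }
    return stats


stats = pair_stats(bills)
print(stats[("bread", "butter")])  # {'support': 0.6, 'lift': 1.25}
```

In this toy data bread and butter have a lift above 1, while bread and milk come out below 1. Swap the item names for tags such as “zip closure” or “cotton-wool” and the same counting tells you which attributes customers tend to buy together.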

Of course, for someone used to relational databases, tagging requires non-trivial methods of storage. Firstly, the number of tags varies widely by item. Secondly, tags can themselves have a hierarchy, and items might not necessarily be associated with the lowest level of tag. Thirdly, tagging is useless without efficient searching at various levels, which is itself a non-trivial technological problem. But while the problems are non-trivial, the solutions are well known and the advantages large enough that whether to use tags is a no-brainer for an organisation that wants to use data in its decision-making.
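One of the well-known solutions, sketched here with Python's built-in `sqlite3` module, is a tag table with a parent pointer plus a separate item-to-tag table. The table names, tags and SKUs are assumptions made for illustration, and a production catalogue would need indexes and probably a dedicated search engine on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tags form a tree through parent_id (NULL marks a top-level tag).
CREATE TABLE tag (
    id        INTEGER PRIMARY KEY,
    name      TEXT UNIQUE,
    parent_id INTEGER REFERENCES tag(id)
);
-- One row per (item, tag), so an item can carry any number of tags.
CREATE TABLE item_tag (
    sku    TEXT,
    tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (sku, tag_id)
);
INSERT INTO tag VALUES
    (1, 'colour', NULL), (2, 'grey', 1), (3, 'charcoal', 2),
    (4, 'closure', NULL), (5, 'zip', 4);
-- SKU-B is tagged at the 'grey' level, not at the lowest level.
INSERT INTO item_tag VALUES
    ('SKU-A', 3), ('SKU-A', 5), ('SKU-B', 2), ('SKU-C', 5);
""")


def skus_with_tag(conn, tag_name):
    """Return SKUs tagged with tag_name or with any tag beneath it."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM tag WHERE name = ?
            UNION ALL
            SELECT tag.id FROM tag JOIN subtree ON tag.parent_id = subtree.id
        )
        SELECT DISTINCT sku FROM item_tag
        WHERE tag_id IN (SELECT id FROM subtree)
        ORDER BY sku
        """,
        (tag_name,),
    ).fetchall()
    return [sku for (sku,) in rows]


print(skus_with_tag(conn, "grey"))      # ['SKU-A', 'SKU-B']
print(skus_with_tag(conn, "charcoal"))  # ['SKU-A']
```

The recursive query walks down the tag tree, so a search for “grey” also finds items tagged only with the more specific “charcoal”, which covers the second and third problems above in miniature.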

 

A/B testing with large samples

Out on Numbers Rule Your World, Kaiser Fung has a nice analysis of Andrew Gelman’s analysis of the Facebook controversy (where Facebook apparently “played with people’s emotions” by manipulating their news feeds). The money quote from Fung’s piece is here:

Sadly, this type of thing happens in A/B testing a lot. On a website, it seems as if there is an inexhaustible supply of experimental units. If the test has not “reached” significance, most analysts just keep it running. This is silly in many ways but the key issue is that if you need that many samples to reach significance, it is guaranteed that the measured effect size is tiny, which also means that the business impact is tiny.

This refers to a common fallacy that I’ve often referred to on this blog, and in my writing elsewhere. Essentially, when you have really large sample sizes, even small changes in measured values can be statistically significant. The fact that they are statistically significant, however, does not mean that they have a business impact – sometimes the effect is so small that the only significance it has is statistical!

So before you blindly make business decisions based on statistical significance, you need to take into account whether the measured difference is actually significant enough to have an impact on your business! It may not strictly be “noise” – for the statistical significance test has shown that it’s not “noise”, but it is essentially an effect that, for all business purposes, can be ignored.
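A quick way to see this is to run the same tiny difference through a standard two-proportion z-test at two sample sizes. The conversion figures below are invented purely for illustration, and the z-test is my choice of test here, not something taken from Fung's or Gelman's pieces.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (difference in rates, z statistic, p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical test: 10.0% vs 10.1% conversion, one million users per arm.
diff_big, z_big, p_big = two_proportion_z_test(
    100_000, 1_000_000, 101_000, 1_000_000)

# The same rates with ten thousand users per arm.
diff_small, z_small, p_small = two_proportion_z_test(
    1_000, 10_000, 1_010, 10_000)

print(f"large sample: diff={diff_big:.4f}, z={z_big:.2f}, p={p_big:.4f}")
print(f"small sample: diff={diff_small:.4f}, z={z_small:.2f}, p={p_small:.4f}")
```

With a million users per arm the p-value falls below 0.05, while with ten thousand it is nowhere near. The measured effect is the same 0.1 percentage point in both cases, and whether that matters is a business question that the test cannot answer.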

PS: Fung and Gelman are my two favourite bloggers when it comes to statistics and quant. A lot of what I’ve learnt on this subject is down to these two gentlemen. If you’re interested in statistics, quant and visualisation, I recommend that you subscribe to both of Fung’s feeds and to Gelman’s feed.

Nate Silver Interview At HBR Blogs

HBR Blogs has interviewed Nate Silver on analytics, building a career in analytics and how organizations should manage analytics. I agree with his views on pretty much everything. Some money quotes:

HBR: Say an organization brings in a bunch of ‘stat heads’ to use your terminology. Do you silo them in their own department that serves the rest of the company? Or is it important to make sure that every team has someone who has the analytic toolkit to pair with expertise?

Silver: I think you want to integrate it as much as possible. That means that they’re going to have some business skills, too, right? And learn that presenting their work is important. But you need it to be integrated into the fabric of the organization.

And this:

Silver: If you can’t present your ideas to at least a modestly larger audience, then it’s not going to do you very much good. Einstein supposedly said that I don’t trust any physics theory that can’t be explained to a 10-year-old. A lot of times the intuitions behind things aren’t really all that complicated. In Moneyball that on-base percentage is better than batting average looks like ‘OK, well, the goal is to score runs. The first step in scoring runs is getting on base, so let’s have a statistic that measures getting on base instead of just one type of getting on base.’ Not that hard a battle to fight.

And this:

Silver: A lot of times when data isn’t very reliable, intuition isn’t very reliable either. The problem is people see it as an either/or, when it sometimes is both or neither, as well. The question should be how good is a model relative to our spitball, gut-feel approach.

Go on and read the whole interview.

Would you want a free membership card?

Last weekend I was at Cafe Coffee Day on MG Road, waiting to meet a prospective client, when one of the store staff walked up to me with a card. “This is a free loyalty card, Sir”, he said, going on to tell me that if I were to buy three coffees using the card I would stand to get a free additional coffee. Considering that Cafe Coffee Day is my favourite meeting room, I thought it might make sense to use the card.

You might have noticed this in supermarkets, too. Invariably you get asked if you want a free membership card. In case you tell them that you had taken a card but are not carrying it, they offer to find your card number for you based on your phone number. Given that it costs the store to issue these cards (cost of cards, maintenance, staff time, cost of rewards), there must be a good reason that they are so eager to give them to you for free.

What separates traditional retail from modern is that in the latter, there are way too many customers who visit a store, and way too many store staff who attend to them. Consequently, it is hard for store staff to know who is a regular, and what the regular customers want. You might go up to your regular “single store” bar, and the barman might start mixing your favourite drink as soon as he sees you pop in. In a chain such as Cafe Coffee Day, however, such information is not forthcoming.

What loyalty cards enable stores to do is to track repeat purchases. You might be buying a kilo of rice every week at the neighbourhood supermarket, but in most cases there is no way for the supermarket to know it is you who bought the rice each time. Once you have a card and your purchases are logged against it each week, the store knows how often you visit and what kind of items you are likely to buy together. Loyalty cards allow the store to “profile” you and thus hope to serve you better. The cost of the card is small compared to the value of the information the store gets about you.
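In data terms, the card number is simply a key that links otherwise anonymous bills. Here is a minimal sketch with a made-up till log; the card numbers, week numbers and the `build_profiles` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical till log: (card number or None, week number, items bought).
transactions = [
    ("C100", 1, ["rice", "dal"]),
    (None,   1, ["rice"]),          # no card: cannot be linked to anyone
    ("C100", 2, ["rice", "oil"]),
    ("C200", 2, ["coffee"]),
    ("C100", 3, ["rice", "dal"]),
]


def build_profiles(transactions):
    """Group purchases by card number into a simple customer profile."""
    profiles = defaultdict(lambda: {"visits": 0, "items": Counter()})
    for card, _week, items in transactions:
        if card is None:
            continue  # anonymous purchases tell us nothing about repeats
        profiles[card]["visits"] += 1
        profiles[card]["items"].update(items)
    return dict(profiles)


profiles = build_profiles(transactions)
print(profiles["C100"]["visits"])                # 3
print(profiles["C100"]["items"].most_common(1))  # [('rice', 3)]
```

Without the card, the three rice purchases by C100 would look no different from the anonymous one.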

A new trend in loyalty cards is “third party cards” that work across stores. These cards are issued by independent third party vendors and multiple stores subscribe to them. The advantage with these cards is that the third party has information about the customer across retailers. So for example, it makes it possible for the vendor to know the brand of formal shirts that people who buy Levi’s Jeans buy.

While this is a dream from an analyst’s viewpoint, the uptake of such cards so far has been low. I know of at least one company in this area that folded up and another that is not doing too well. Hopefully this trend will reverse soon and we will find one player who manages to scale up and issue cards at lots of retailers.