
Archive for October, 2006

Hacking the Long Tail

Thursday, October 19th, 2006

Making Collaborative Filtering Work for WineLog.net

What is the Long Tail?
Coined (in its current form) by Chris Anderson in his October 2004 Wired magazine article, “long tail” refers to a common pattern seen in graphs depicting product demand or units sold. The graph below is a simple chart of units sold for each item in a particular domain, with items ordered from best-selling to worst-selling. This could represent, for example, the number of copies sold of each book in Amazon.com’s catalogue.

The Long Tail.
Image of the long tail, borrowed from the Wikipedia entry of the same name.

Continuing with the Amazon example, the x-axis would represent all books ever printed, and the y-axis would represent the number of units sold for each particular book. On the far left of the graph are big sellers like The Da Vinci Code or the Harry Potter series. On the far right are books only a few people are interested in, each with a relatively low number of sales.

The line in the center of the graph is very important as well. This line represents the cutoff point a traditional retailer might have for stocking items in their domain. In our books example, books to the right of the line have too low a volume to be worth the shelf space they would take up at a bricks-and-mortar store. Everything to the left of the line represents the books which would be readily available.

The term long tail refers to the shape of the graph, but more specifically to the area under the graph to the right of the center line. This represents an opportunity for sales which couldn’t be tapped into until recently. It’s a big opportunity too. A visual scan of some of these graphs shows that the area (representing revenue) of the long tail section can be as large as, or larger than, the area of the so-called “short tail”.

People who haven’t done so should read Chris Anderson’s original article (or better yet, his book, The Long Tail: Why the Future of Business is Selling Less of More). The article explains in more detail why long tails exist at all and discusses the technological and business advances that have led to a situation where companies can benefit from catering to the long tail.

- - -
Wine Sales Have a Long Tail
Now on to why I’ve been interested in the long tail lately…

Patrick Angeles of InertiaBev published a chart of data from their system, showing that there is a long tail in wine sales. InertiaBev provides “on-demand software that empowers wineries to sell direct and measure the performance of their sales”. Here is a graph that Patrick posted along with his article:

The Long Tail.
A long tail in wines, borrowed from Patrick’s article at the InertiaBev blog.

Looks familiar, doesn’t it? As with books, a relatively small number of wines account for a large share of sales. Also as with books, there is a large group of wines that each sell in small numbers but together add up to a lot of revenue.

Quick note: Patrick originally published a graph based on actual numbers, but took it down to “protect the innocent”. You’ll see the same generic graphic when you visit the link above. However, the observations and numbers in the article come from real data.

- - -

Good and Bad Long Tail Business Models
So we have a long tail. Does that mean that we can build an Amazon or Netflix-like business model on top of wine? Maybe. Maybe not.

The thing to understand is that a long tail alone is not enough for a good business model, because there is a problem with scaling from offering 5,000 books to 5,000,000, or from 500 wines to 10,000. What’s needed is a way to intelligently introduce customers to items in the long tail they might be interested in. While there are a lot of books in the long tail I would be interested in reading, I may never hear of them. Mass media and advertising are going to focus on the books to the left of the cutoff point, the most profitable ones (on a per-title basis). But you can’t just recommend any book to me; there are at least as many books in the long tail that I would hate as ones I would want to read.

So successful long tail business models have measures to effectively prune or recommend items from their catalogue. Recommendations can be as simple as notifying a user of books written by the same author or wines from the same region. These can work to an extent, but something more is needed to make the recommendations feel personal.
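A recommendation of this simple kind needs no data about other users at all. The sketch below assumes a hypothetical `Wine` record with region, varietal, and vintage fields (not WineLog’s actual data model) and suggests other wines from the same region as one the user liked:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    """Hypothetical wine record; WineLog's real data model may differ."""
    name: str
    region: str
    varietal: str
    vintage: int


def same_region(liked: Wine, catalogue: list[Wine], limit: int = 5) -> list[Wine]:
    """Return up to `limit` other wines from the region of a wine the user liked."""
    return [w for w in catalogue if w.region == liked.region and w != liked][:limit]


# Hypothetical catalogue entries for illustration only.
catalogue = [
    Wine("Example Shiraz A", "Barossa Valley", "Shiraz", 2005),
    Wine("Example Shiraz B", "Barossa Valley", "Shiraz", 2004),
    Wine("Example Riesling", "Mosel", "Riesling", 2005),
]
print([w.name for w in same_region(catalogue[0], catalogue)])  # ['Example Shiraz B']
```

The same filter works for author, varietal, or vintage. Its weakness is the one described above: every user who liked the same wine gets exactly the same suggestions, so nothing about it feels personal.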

This is where collaborative filtering comes in. It sounds complicated, but it’s really a simple idea: make recommendations to me based on what I and other users have expressed satisfaction with in the past. At Amazon, we have the now ubiquitous “if you like this item, you might also enjoy…” recommendations. Netflix is another company that has successfully implemented collaborative filtering with its star rating system (and is now offering $1 million to anyone who can improve those recommendations).
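One common way to implement this idea is user-based collaborative filtering: find users whose past ratings correlate with mine, then combine their opinions of an item I haven’t tried. The sketch below is a textbook version using Pearson correlation, not the algorithm Amazon, Netflix, or WineLog actually runs; the function names, users, and wines are hypothetical.

```python
from math import sqrt
from typing import Optional

# ratings[user][item] = stars (1-5)
Ratings = dict[str, dict[str, float]]


def pearson(ratings: Ratings, a: str, b: str) -> float:
    """Pearson correlation of users a and b over the items both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to measure agreement
    mean_a = sum(ratings[a][i] for i in common) / len(common)
    mean_b = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    var_a = sum((ratings[a][i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings[b][i] - mean_b) ** 2 for i in common)
    return num / sqrt(var_a * var_b) if var_a and var_b else 0.0


def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Predict `user`'s rating of `item` from positively correlated users who rated it."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(ratings, user, other)
        if sim <= 0:
            continue  # skip users with opposite or unmeasurable taste
        other_mean = sum(theirs.values()) / len(theirs)
        num += sim * (theirs[item] - other_mean)  # weighted deviation from their mean
        den += sim
    return user_mean + num / den if den else None


ratings: Ratings = {
    "alice": {"wine_a": 5, "wine_b": 1, "wine_c": 4},
    "bob": {"wine_a": 5, "wine_b": 2, "wine_c": 5, "wine_d": 5},
    "carol": {"wine_a": 1, "wine_b": 5, "wine_d": 1},
}
print(round(predict(ratings, "alice", "wine_d"), 2))  # 4.08
```

Carol’s ratings run exactly opposite to Alice’s (correlation -1.0), so only Bob’s opinion counts here. A negative correlation is information too, but this sketch keeps only positive neighbours for simplicity.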

There are other methods of making recommendations (and WineLog uses a bunch of them), but the hacking section of this article focuses specifically on how our data can be tweaked to make it better suited to collaborative filtering algorithms. The ideas should be applicable to other non-trivial recommendation systems, though.

Amazon and Netflix are examples of companies successfully doing business in the long tail. How about some business models that wouldn’t work?

Consider the long tail for dating. This graph is extremely flat. The most promiscuous of humans are dating no more than 1000 people per lifetime. One thousand dates may sound like a lot, but then compare that to the 6 billion people in the world. This tail is really long… and so really flat. Even if you limit yourself to just people of one sex in the U.S. (about 150 million people), the curve is still extremely flat.
The problem with a long tail for dating (and flat long tails in general) is that there isn’t enough overlap to make useful recommendations. How often do you find people with more than one past partner in common? Not often.
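One rough way to quantify “not enough overlap” is to count, for every pair of users, how many items both have rated, and check what fraction of pairs share enough items to compare tastes at all. The sketch below assumes rating histories stored as sets of item IDs; the threshold of two shared items, the function name, and all the data are hypothetical.

```python
from itertools import combinations


def comparable_fraction(histories: dict[str, set[str]], min_common: int = 2) -> float:
    """Fraction of user pairs that share at least `min_common` rated items."""
    pairs = list(combinations(histories, 2))
    if not pairs:
        return 0.0
    comparable = sum(
        1 for a, b in pairs if len(histories[a] & histories[b]) >= min_common
    )
    return comparable / len(pairs)


# A flat tail: almost nobody has items in common.
flat = {"u1": {"p1", "p2"}, "u2": {"p3", "p4"}, "u3": {"p1", "p5"}}
# A fatter short tail: popular items give most pairs something to compare.
fat = {"u1": {"b1", "b2", "b3"}, "u2": {"b1", "b2", "b4"}, "u3": {"b2", "b3", "b5"}}

print(comparable_fraction(flat))  # 0.0
print(comparable_fraction(fat))   # 0.6666666666666666
```

In the flat case no pair of users can be compared, so a correlation-based recommender has nothing to work with, however many ratings are in the system.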

Side note: collaborative filtering for dating is an interesting problem for a number of reasons as well. Consider the fact that your “ratings” will be skewed to the negative. If you are rating someone 5-stars out of 5, you’re going to try your hardest to take that person off the market. Also, locality affects dating (as it does almost any business). “It’s nice that you’re recommending this nice Asian girl for me, but I’d rather not have to leave Philadelphia to find a soul mate.” I think challenges like this just make the problem more appealing to me. DateLog next?

Another problem for collaborative filtering arises in domains where the items are too similar. A recommendation engine for drinking water would fall into this category. On the one hand, similarity is a good problem because users are less likely to be upset by a recommendation (all water tastes refreshing served cold). On the other hand, if you want to make good recommendations, you’re not going to have much data to work from. Some long tails may have too little differentiation among their top performers to support good recommendations.

- - -

What about WineLog.net?
Wine is a domain that seems to suffer from the two problems stated above. The long tail is generally flat, with little overlap in wines drunk by each user (compared to the level of overlap in movie viewing or book reading). It could also be said that the typical wine drinker can’t differentiate between a wine they like and one they don’t like, putting the quality of ratings in question.

Side note: Oddly, the fact that new wine drinkers can’t differentiate between a wine they like and one they don’t like is one of the reasons this domain is so interesting to us. Wine is an acquired taste, and we’d love to help our users on their path to enjoying wine on a new level. Also, many drinkers know they like a wine, but don’t know why. Besides making unit recommendations, our system can teach a user what it is they like about wine even if they don’t know it themselves by showing the user common tags among wines they like and dislike. Making recommendations based on tags is a topic for another article.
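The tag idea can be sketched simply: tally the tags on wines a user rated highly and, separately, on wines they rated poorly. The sketch below assumes each rating is stored as a (tags, stars) pair, that 4 or more stars means “liked”, and that 2 or fewer means “disliked”; the thresholds, the `tag_profile` name, and the tag names are hypothetical.

```python
from collections import Counter


def tag_profile(
    rated: list[tuple[set[str], int]], liked_min: int = 4, disliked_max: int = 2
) -> tuple[Counter, Counter]:
    """Count tags across a user's liked wines and across their disliked wines."""
    liked, disliked = Counter(), Counter()
    for tags, stars in rated:
        if stars >= liked_min:
            liked.update(tags)
        elif stars <= disliked_max:
            disliked.update(tags)
        # middling ratings (3 stars with the defaults) are ignored
    return liked, disliked


history = [
    ({"oaky", "full-bodied"}, 5),
    ({"oaky", "spicy"}, 4),
    ({"crisp", "acidic"}, 1),
    ({"crisp", "sweet"}, 2),
    ({"fruity"}, 3),
]
liked, disliked = tag_profile(history)
print(liked.most_common(1), disliked.most_common(1))  # [('oaky', 2)] [('crisp', 2)]
```

Showing this user “you tend to like oaky wines and dislike crisp ones” is the kind of self-knowledge described above, even if they could not have put it into words.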

Alder at Vinography.com posted what’s become a popular article in our industry about Why Community Tasting Note Sites Will Fail. Here are his major points (paraphrased).

  1. Need a lot of wines.
  2. Users are stupid.
  3. No incentive to use.
  4. Wine “lovers” is too small a niche.

Not all of these deal with the long tail directly, but they hit on the problems that make collaborative filtering difficult for the domain of wine. (For a more direct rebuttal to these points, look for my comment on that blog post. Search for “Jason Coleman” on the article page.)

- - -

Hacking the Long Tail
So how can we hack the long tail to make collaborative filtering work for WineLog.net (or any domain with the same issues)? Visually, what we are trying to do is “build up the short tail” of the graph to give us better data for making recommendations on the long tail.

The Long Tail.
A marked-up version of the original long tail image showing how suggesting “key units” can build up the short tail.

One of the great benefits of having a community site focused on wine like WineLog.net is that the community will build up the short tail on its own. Users of WineLog will notice the most popular wines on the site and will be more likely to try those wines themselves. In this way, the users create the overlap necessary to make better predictions in the long tail space. Visually, the long tail graph will grow “fatter” on the left side as these popular wines are tasted by more users.

So our first step in utilizing the long tail to the max is to recommend and encourage the rating of these popular “key” wines to our users. I call them “key” wines because, in a sense, they are used to build a taste profile for the collaborative filtering algorithms.

A good key wine is popular. A GREAT key wine is both popular and DISPUTED. A disputed wine is one with a variety of ratings. A wine which is rated 3-stars by all drinkers won’t tell you very much about a user’s taste. Now a wine with three 5-star ratings and three 1-star ratings is going to tell you a lot. Here’s a wine that is polarizing our users into two distinct camps.
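One simple way to put a number on “disputed” is the spread of a wine’s ratings. The sketch below uses the population standard deviation from Python’s `statistics` module; that choice of measure (and the `dispute` name) is an assumption for illustration, not necessarily what WineLog computes. It reproduces the two cases just described:

```python
from statistics import pstdev


def dispute(stars: list[int]) -> float:
    """Spread of a wine's ratings; 0.0 means every rater agreed."""
    return pstdev(stars) if len(stars) > 1 else 0.0


consensus = [3, 3, 3, 3, 3, 3]
polarizing = [5, 5, 5, 1, 1, 1]
print(dispute(consensus))   # 0.0
print(dispute(polarizing))  # 2.0
```

On a 1-5 scale, 2.0 is the largest spread possible (half the raters at each extreme), so the polarizing wine is about as informative as a single wine can be.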

We won’t necessarily be stretching out the short tail any further by recommending key wines with polarized ratings like this, but we will be getting the maximum use out of the short tail that we have.

So now you have our strategy: recommend key wines to our users which are both popular and disputed. This is going to build up the short tail for us, which should lead to better recommendations in the long tail space.
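Combining the two criteria might look like the sketch below: score each wine by its number of ratings (popularity) times the spread of those ratings (dispute), skip wines the user has already rated, and suggest the top few. The multiplicative score, the `pick_key_wines` name, and the data are all hypothetical; any weighting that rewards both popularity and disagreement would serve the same purpose.

```python
from statistics import pstdev


def pick_key_wines(
    ratings_by_wine: dict[str, list[int]], already_rated: set[str], k: int = 3
) -> list[str]:
    """Suggest up to k unrated wines that are both popular and disputed."""

    def key_score(wine: str) -> float:
        stars = ratings_by_wine[wine]
        spread = pstdev(stars) if len(stars) > 1 else 0.0
        return len(stars) * spread  # popularity x dispute

    candidates = [w for w in ratings_by_wine if w not in already_rated]
    return sorted(candidates, key=key_score, reverse=True)[:k]


ratings_by_wine = {
    "wine_a": [5, 5, 5, 1, 1, 1],        # popular and polarizing: score 12.0
    "wine_b": [3, 3, 3, 3, 3, 3, 3, 3],  # most popular, but everyone agrees: 0.0
    "wine_c": [4, 2, 4, 2],              # score 4.0
    "wine_d": [5, 1, 3],                 # score about 4.9
}
print(pick_key_wines(ratings_by_wine, already_rated={"wine_a"}, k=2))  # ['wine_d', 'wine_c']
```

Note that wine_b, the most-rated wine, never makes the list: with no disagreement, another rating of it tells the recommender nothing new about the user.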

So while it may not be true that most wine drinkers have drunk Mano a Mano Tempranillo 2004 (the way nearly everyone has seen the movie Titanic), we can encourage our users so that the statement “Most WineLog.net users have rated 3-5 of our 20 key wines” will be true.
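Whether a goal like that is being met is easy to check. The sketch below, with hypothetical user histories, key-wine IDs, and function name, computes the share of users who have rated between 3 and 5 of the 20 key wines; “most” would mean a result above 0.5.

```python
def key_wine_coverage(
    histories: dict[str, set[str]], key_wines: set[str], low: int = 3, high: int = 5
) -> float:
    """Share of users who have rated between `low` and `high` of the key wines."""
    if not histories:
        return 0.0
    in_range = sum(
        1 for rated in histories.values() if low <= len(rated & key_wines) <= high
    )
    return in_range / len(histories)


key_wines = {f"key_{i}" for i in range(20)}
histories = {
    "u1": {"key_0", "key_1", "key_2", "key_3", "other_1"},  # 4 key wines
    "u2": {"key_0", "other_2"},                             # 1 key wine
    "u3": {"key_0", "key_1", "key_2", "key_3", "key_4"},    # 5 key wines
}
print(key_wine_coverage(histories, key_wines))  # 0.6666666666666666
```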
- - -

The Big Picture
I’m a technical guy. The collaborative filtering problem is an interesting one to me, but it’s not what this business hinges on. Effectively suggesting wines from the long tail will help our business, but there are other ways to do this besides CF algorithms. Users will find recommendations from friends, trusted sources, and other simple recommendations (like “try another 2005 Shiraz from Australia”). Other data we collect, such as tags, can also be mined in interesting ways to profile users and suggest wines for them.

I hope this article has explained the long tail well and how it applies to sites trying to sell (or like WineLog, just try to recommend) wines. We learned that a site’s ability to suggest units from the long tail is a large factor in the success of their business model. We learned about two problems some domains have with their long tail: lack of overlap and homogeneous units. And we finished up with a strategy for combating these problems with respect to collaborative filtering strategies: suggest key units which are both popular and polarizing.

The focus was on wine and collaborative filtering, but the ideas brought up should be applicable to other domains and recommendation systems.

BarCamp NYC 2 Follow-up

Monday, October 2nd, 2006

Kim and I had a great time at BarCamp in New York this weekend. We met a ton of like-minded people doing incredible things.

Our presentation on “Hacking the Long Tail: Making Collaborative Filtering work for WineLog.net” went well. I was hoping to get educated by someone in the audience who knew more about the Long Tail than I did, but people were either busy making impromptu business plans in a competing session or more interested in WineLog itself than the Long Tail.

The wine tasting part did go over well. There was a lot of wine left over, but that just made our evening more interesting. You can see which wines we tasted at BarCamp NYC2 at WineLog.

Presentation PPT: BarCamp_Presentation.ppt (blog post to follow)
Completed Wine Tasting Spreadsheet: BarCampNYC2_WineTasting.xls

The presentation and tasting were filmed, but I’m not sure if the video is online yet. I’ll link to it when it shows up.

Here is a brief (and by no means complete) list of interesting people we met this weekend:

  • Chris from VistaPrint. Chris was a fun guy to hang out with. Kim and I are big fans of VistaPrint; their inexpensive printing services help us save a ton of money for our clients. I was as excited to meet someone from VistaPrint as I would be to meet someone from the Chicago Bears. At the same time, Chris was excited to meet real users of the system he works on.
  • Nate Abele from CakePHP. Nate is a PHP developer to look up to. If there were PHP developer trading cards, I would covet his rookie card. His project, Cake, is a framework for PHP similar to the popular “Rails” framework for the Ruby programming language. Unlike some other PHP frameworks, Cake doesn’t just try to mimic Rails. Instead it is built with PHP developers in mind and works with the strengths and weaknesses of PHP. It’s really interesting stuff, and I’m excited about trying it out on a future project. It would have been great for WineLog if I had looked into it more before I custom-coded a lot of what Cake offers.
  • David Cohn. David writes for Wired.com. He’s writing an article about BarCampNYC2, which might feature Kim and me as crazy Philadelphians who made the trip up to New York to walk around in our socks and talk to geeks.
  • Dean “the Australian” Collins. Dean does a lot of things (like us!), but just seems to love helping startups start up. He was always good for an insightful comment or question in any session he attended. You can find out more about Dean and his work at Cognation.net.
  • Avi Welnsky. Avi writes for us over at InvestorGeeks. I had never met him in person though… cool guy. Among other things, he taught me a great way to get a high page ranking in MSN’s search engine.

I can’t relate how great the atmosphere was at BarCamp. I had that feeling like “these are my people”. There were a lot of attendees I wish I could have spent more time talking with. Contacts were made though, and I’m sure we’ll follow up. For others, we’ll just have to wait for the next BarCamp.


You are currently browsing the Stranger Studios weblog archives for October, 2006.
