Monday, December 01, 2008

Yahoo's automatic content optimization

Deepak Agarwal and many others from Yahoo Research have a paper at the upcoming NIPS 2008 conference, "Online Models for Content Optimization", that offers a fun peek inside a system at Yahoo that automatically tests and optimizes which content to show to readers.

It is not made entirely clear which pages at Yahoo use the system, but the paper says that it is "deployed on a major Internet portal and selects articles to serve to hundreds of millions of user visits per day."

The system picks which article to show in a slot from 16 potential candidates, where the pool of candidates is picked by editors and changes rapidly. The system seeks to optimize the clickthrough rate in the slot. The problem is made more difficult by the way the clickthrough rate on a given article changes rapidly as the article ages and as the audience coming to Yahoo changes over the course of a day, which means the system needs to adapt rapidly to new information.

The paper describes a few variations of algorithms that do explore/exploit by showing the article that performed best recently while constantly testing the other articles to see if they might perform better.
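
To make the explore/exploit idea concrete, here is a minimal sketch in Python. It is not one of the paper's algorithms; it is plain epsilon-greedy with exponentially decayed click and view counts, which I am substituting as the simplest scheme that shows the best recent performer most of the time while still testing the others. The class name, the parameters (epsilon, decay), and the optimistic starting estimate for unseen articles are all assumptions made for illustration.

```python
import random


class DecayedEpsilonGreedy:
    """Illustrative epsilon-greedy article picker (hypothetical, not from the paper)."""

    def __init__(self, article_ids, epsilon=0.1, decay=0.99, seed=None):
        self.epsilon = epsilon  # fraction of traffic spent exploring
        self.decay = decay      # per-impression decay applied to old counts
        self.rng = random.Random(seed)
        self.clicks = {a: 0.0 for a in article_ids}
        self.views = {a: 0.0 for a in article_ids}

    def estimated_ctr(self, article_id):
        views = self.views[article_id]
        # Assumption: an article with no recorded views gets an optimistic
        # estimate of 1.0 so that it is tried at least once.
        return self.clicks[article_id] / views if views > 0 else 1.0

    def pick(self):
        # Explore: with probability epsilon, show a random candidate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.clicks))
        # Exploit: otherwise show the article with the best recent CTR.
        return max(self.clicks, key=self.estimated_ctr)

    def record(self, article_id, clicked):
        # Decay every count so that older impressions matter less, which
        # lets the estimates follow a CTR that drifts as articles age.
        for a in self.clicks:
            self.clicks[a] *= self.decay
            self.views[a] *= self.decay
        self.views[article_id] += 1.0
        if clicked:
            self.clicks[article_id] += 1.0
```

In use, you would call pick() for each visit, show that article, and then call record() with whether the reader clicked. A smaller decay value forgets the past faster, which helps when clickthrough rates shift quickly but makes the estimates noisier.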

The result was that the CTR increased by 30-60% over editors manually selecting the content that was shown. Curiously, their attempt to show different content to different user segments (a coarse-grained version of personalization) did not generate additional gains, but they say this is almost certainly due to the very small pool of candidate articles (only sixteen articles) from which the algorithm was allowed to pick.

One amusing tidbit in the paper is how they describe the culture clash that occurred between maintaining the control the editors were used to and giving the algorithms freedom to pick the content users really seem to want.

I remember similar issues at Amazon way back when we first started using algorithms to pick content rather than editors. It is hard for editors to give up control even to the collective voice of people voting on what they want. While I have always been sympathetic to the need for editorial voice, if it is forcing content on users that they do not actually want, it is important to understand its full cost.
