Tag Archives: netflix prize

Netflix Prize photo finish!

Two hours less than 30 days ago, the team of BellKor’s Pragmatic Chaos submitted the first entry to the Netflix Prize to exhibit a 10% improvement in performance over Netflix’s movie-recommendation algorithm.   That started the final clock for the competition — whoever’s ahead at 2:42 Eastern time today wins the $1 million prize.

One of the really interesting lessons of the competition is that blendings of many algorithms seem to work better than any single algorithm, even when there’s no principled reason to do the blend.  It’s sort of a “wisdom of crowds of computer programs” effect.  As you can imagine, once BellKor’s Pragmatic Chaos (itself a blend of algorithms from two teams that have been leading through most of the competition) crossed the 10% threshold, just about everybody else realized their best — probably only — chance was to work together.

As of yesterday afternoon, a team called The Ensemble,  made up of — well, I can’t really tell how many previously separate competitors, but a lot — has achieved a 10.09% improvement.  BellKor’s Pragmatic Chaos is at 10.08%.  A hundredth of a percentage point might determine who gets the million.   Wow.

Is BPC sitting on a slightly better algorithm they’re planning to submit at the buzzer for the win?  Check the leaderboard later this afternoon to find out.

Update: Double wow.  No announcement yet, but it looks like BPC and The Ensemble both made submissions in the last hour of the contest; BGC made it to 10.09% but Ensemble, four minutes before closing time, bumped up to 10.10%.

Re-update: I didn’t read the rules carefully enough.  It looks like there’s another dataset (‘test dataset”), distinct from the one generating the public scores (“quiz dataset”) and the final winner will be the program that does best on the test dataset.  So the shifts in the lead, exciting as they are, aren’t necessarily relevant to the contest; we’ve got two algorithms which are essentially identically good and it ought to be a coin-flip which one does better in the final judgment.

Tagged , , , , ,

The moduli space of senators and the moduli space of movies

Last week I blogged about Dmitri Tymoczko’s lecture and the moduli space of chords; since then I remembered some more nice examples of “moduli spaces” in the loose sense described in that post. One comes from the work of Keith Poole and Howard Rosenthal, who attempt to answer the question: what is the best way of “mapping” the members of a legislature in two-dimensional space so that two legislators are close together precisely when their voting records are very closely aligned? In other words, what is the moduli space of senators? Go to Poole’s VoteView page and scroll to the bottom to see the last 100 years of the House and Senate as an animated .gif. Or just read what I wrote about their research on Slate. (Article from 2001, so some links may be dead.)

Votes in the Senate make up a complicated dataset, which people like Poole and Rosenthal like to make more accessible by means of two-dimensional charts. An even more complicated dataset is the set of 100,000,000 Netflix ratings of 18,000 movies which the strivers for the Netflix Prize have to wrestle with. But this too can be nicely mapped into two-dimensional space (or any-dimensional space, but two-dimensional pictures are the easiest to look at!) yielding a “moduli space of movies” in which two movies are close together just when they tend to be liked by the same set of users. Todd Holloway, a CS grad student at Indiana,has made some beautiful examples:

Go to his blog to see the interactive version, or his visualization of the power struggle in Wikipedia.

Tagged , , ,

My piece in Wired: The Netflix Prize

Last month I wrote an article for Wired about the Netflix Prize; a competition to develop a better algorithm for recommending movies, with $1 million from Netflix as the incentive. This kind of problem is immensely hard: the set of ratings submitted by Netflix users is huge, but very sparse (most users haven’t rented most movies) and very noisy (people make mistakes, their tastes change with time, multiple people may be rating on one account.) So to be able to massage this data into a decent set of movie recommendations is a formidable task — as you probably already know from the typically unsatisfactory performance of the recommendation engines that Netflix, Amazon, and so on, use now.

Anyway, the article’s now online; I write a bit about the mathematical techniques that the experts in the area use to attack this genre of problem, and one very interesting non-mathematician with a different and nearly as successful approach.

Tagged , , ,
%d bloggers like this: