I signed up for the NetFlix Prize without, originally, any intention to ever actually compete. I haven't studied any math or statistics since AP Calculus over 20 years ago, and my usual analytical tools aren't remotely adequate for analyzing data on the scale required by the contest. I have a couple laptops at my disposal, but no distributed number-crunching farms. My current programming tool of choice is Ruby, which is optimized for coding efficiency rather than processing speed or memory compactness.  

But my current job involves thinking about useful ways to explore and analyze arbitrary bodies of data, and it's not often that anybody gives away a dataset as large or interesting as NetFlix's, so I wanted to at least play with it a little.  

And after playing with it a little I realized that submitting a set of bad guesses, at least, wasn't that difficult. So I did, just for the satisfaction of feeling involved. One of the strangest and maybe most addictive draws of the NetFlix Prize is that bad guesses, if there's even the slightest bit of non-badness embedded in them, are for most practical purposes only trivially worse than fairly good guesses.  

The way the contest works is that you're given the data for several million rating actions taken by real people on the real NetFlix system. Each rating has a user ID, a movie ID, a date, and a score of 1, 2, 3, 4 or 5. You get no other information about the user IDs, of which there are about half a million, and only titles and years for the movies, of which there are 17,770.  

In addition to all this, you also get a file of movie IDs, user IDs and dates, but no scores. These represent other rating actions actually taken by some of those same users on those same movies. NetFlix knows what scores go in these blanks. You don't. To play, you just have to submit a set of guesses for all the blanks. You're scored on how close you get, with a little squaring and square-rooting just to make it sound more complicated.  
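The "squaring and square-rooting" is root-mean-squared error (RMSE), the contest's actual scoring metric. A minimal sketch in Ruby (the toy ratings below are made up for illustration):

```ruby
# RMSE: square each error, average the squares, take the square root.
# Guesses that are each off by about 1 score about 1.0.
def rmse(actual, predicted)
  sum_sq = actual.zip(predicted).sum { |a, p| (a - p)**2 }
  Math.sqrt(sum_sq / actual.length.to_f)
end

# Four true ratings, four guesses each off by exactly 1:
puts rmse([4, 2, 5, 3], [3, 3, 4, 4])  # => 1.0
```

The squaring means one guess that's off by 3 hurts more than three guesses off by 1, which is why shaving hundredths off the score is harder than it sounds.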

Meanwhile, back at headquarters, NetFlix has run their own actual software for analyzing movie ratings, using this same exact data. It scores 0.9514, which means it can usually guess ratings within about 1. My first simplistic set of guesses, doing no similarity analysis of any kind, scored 1.0451. This means that my dumb method can usually guess ratings within about 1. Obviously 0.9514 is better than 1.0451, and it's even a little better than it looks due to the exact math, but the human-scale truth is still this: when you're talking about person-assigned integer ratings on a scale of 1 to 5, nobody but us number-geeks is ever going to get excited about the difference between one "about 1" and another "about 1" based on hundredths of a point. In one sense this makes the whole contest pretty idiotic, because who the hell cares?  

But if you can get your score down to the NetFlix-chosen target of 0.8563, you stand to win a million dollars. And if 0.8563 is also pretty much "about 1", the difference between $1,000,000 and $0 is much more readily apparent. And submitting some guesses that were at least a little smarter than my first ones wasn't really any harder.  

So I still have absolutely no illusions of winning. As of now, the leaders are fighting for 0.0001s down in the neighborhood of 0.9042. I got down to 0.9829 without any similarity-analysis, just to see how much variability I could squeeze out of the problem before starting to work on actual logic. My first wild-ass approximation of "similarity" scored worse than that, but after a small amount of de-assing it got me to 0.9508. As of today, you need 0.9402 to get onto the official leaderboard, but 0.9508 is better than NetFlix's own software, which means that in a few hours of my spare time, using no fancy methods or hardware (albeit with a MacBook Pro running overnight a couple times), I have produced better software for this specific definition of movie recommendation than however many people NetFlix pays to work on this problem full-time. If I were one of those people, I'd be embarrassed.  
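The post doesn't say what those no-similarity guesses actually were, but one plausible version of such a baseline (an assumption, not the author's actual method) is to predict each blank from the movie's mean rating, adjusted by how far that user's ratings tend to sit above or below the means of the movies they rated:

```ruby
# Toy ratings: [user_id, movie_id, score]. Hypothetical data for illustration.
ratings = [
  [1, 10, 4], [1, 11, 5], [2, 10, 2], [2, 12, 3], [3, 11, 4]
]

# Per-movie mean rating.
movie_sums = Hash.new { |h, k| h[k] = [0.0, 0] }
ratings.each { |_u, m, s| movie_sums[m][0] += s; movie_sums[m][1] += 1 }
movie_mean = movie_sums.transform_values { |sum, n| sum / n }

# Per-user offset: how far the user's ratings average above or
# below the means of the movies they rated.
user_sums = Hash.new { |h, k| h[k] = [0.0, 0] }
ratings.each { |u, m, s| user_sums[u][0] += s - movie_mean[m]; user_sums[u][1] += 1 }
user_offset = user_sums.transform_values { |sum, n| sum / n }

# Predict a blank: movie mean plus user offset, clamped to the 1-5 scale.
def predict(movie_mean, user_offset, user, movie)
  (movie_mean[movie] + user_offset.fetch(user, 0.0)).clamp(1.0, 5.0)
end

puts predict(movie_mean, user_offset, 1, 12)  # => 3.75
```

No notion of similarity between users or movies appears anywhere in this, yet on real data a baseline of this shape already captures most of the easy variability, which is exactly why the remaining improvements come in hundredths.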

But actually, if I were one of them, I'd probably also be feverishly working on my own entry in the contest. I'm betting their real engine was built by committee, and thus represents nobody's best ideas. If you held a sufficiently well-constructed contest to see if people could do your team's team-job better than your team was doing it before you knew there was going to be a contest about it, you'd probably lose it about as quickly as NetFlix has lost this one. No matter how complicated you think your work is, once you figure out how to simplify it to the point where you can challenge other people to do it better, somebody will be able to.  

Which suggests, of course, a one-step self-improvement program for just about any existing endeavor: treat your contribution as if you were entering a contest to do your own job better than you currently do it. Assume some stranger could beat you with hardly any real insight or work, because that's how easy it ought to be.

Or, if that all sounds more tiring than inspiring, there's the other possible moral of the story: The most intense effort in any project usually ends up being spent on the 0.0001s, tiny fractional improvements of no genuine consequence, given apparent significance only by the act of measuring them. It's not clear whether it's actually possible to get to 0.8563 using this set of data. Nor whether doing so would meaningfully improve any NetFlix user's real experience. If they really wanted to qualitatively improve their system, NetFlix would have to hold a design contest, not a data-analysis contest, and that would be a lot harder to score. Maybe you're doing your job, as it's currently defined, more or less as well as it can meaningfully be done, too. Maybe, instead of a contest to dicker over your 0.0001s, you should figure out how to do your current job just as well with a lot less effort, and then spend the rest of your time entering the contest to find a totally different and better way.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.