The makers of Video Store Clerk WON'T be able to use the results to compete for the Netflix Prize, since Video Store Clerk doesn't know what movies will be in the Prize data set.

Here's how the Prize works:
If someone is competing for the Netflix Prize, they get a sample set of a million or so anonymous movie ratings to build their algorithm against. Usually the algorithm is automated, but in this case it would supposedly be human-generated. Then they submit their algorithm to Netflix, which runs it against a DIFFERENT set of movie ratings to see how well it did.

I suppose someone could build their algorithm with a TON of hardcodes and hope for a large intersection between their test data set and the real data set, but that would be crazy big. Plus they need to give their algorithm to Netflix, who would never let them have hardcodes.

Myth Busted.


Sorry Craig, I have to disagree with you. My understanding of the rules is that, while no one but the judges knows which ratings are in the test data set, it is a subset of the ratings in each submission. While it's true that a team must submit its algorithms, I believe (if I'm reading the rules correctly) that the prize is awarded based on the RMSE score of a submission, so in fact all the movies are known ahead of time.
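
To make the scoring idea concrete, here is a rough Python sketch of RMSE (root mean squared error) computed over a hidden subset of a submission. The function names, the dictionary layout, and the sample ratings are all made up for illustration; this is not Netflix's actual scoring code, just my reading of how such a score would work.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def score_submission(submission, hidden_truth):
    """Score only the (user, movie) pairs for which the judges hold true ratings.

    submission:   dict mapping (user_id, movie_id) -> predicted rating
    hidden_truth: dict mapping (user_id, movie_id) -> actual rating
                  (assumed to be a subset of the submission's keys)
    """
    keys = sorted(hidden_truth)
    return rmse([submission[k] for k in keys], [hidden_truth[k] for k in keys])

# Hypothetical data: the team predicts every pair it was asked about,
# but only the judges know which of those pairs are actually scored.
submission = {(1, 10): 4.0, (1, 11): 3.0, (2, 10): 5.0, (2, 12): 2.0}
hidden_truth = {(1, 10): 5.0, (2, 12): 2.0}

score = score_submission(submission, hidden_truth)
print(round(score, 4))
```

A lower score is better. Note that the submitter sees every (user, movie) pair up front, even though they can't tell which ones count toward the score.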

So if the Video Store Clerk site is in fact related to this contest, then it could be collecting more data that could be used to improve the predictions. While I can see that an algorithm relying on data collected this way might not be what Netflix had in mind, I didn't see where it was forbidden in the rules (but I could have missed it).


Craig - I should add: I *do* agree that if someone is simply trying to build a submission set by using the site to let players "stand in" and provide the ratings, that won't work to win the prize. But it would be interesting to see what score it would get.

