Anonymity of Netflix Prize Data Cracked?
Arvind Narayanan and Vitaly Shmatikov have submitted a paper describing a technique that may be able to break the anonymity of the Netflix Prize data. They write:
We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
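The quoted passage doesn't explain how the matching works. Below is a minimal Python sketch of the general idea under several assumptions. It weights rarely rated movies more heavily, scores every record against the few ratings the adversary already knows, and accepts the top record only if its score stands well clear of the runner-up. This is a simplified illustration, only loosely modeled on the paper's approach, and it is not the authors' actual algorithm. The toy records, the rating tolerance of 1, the `log(1 + n)` weighting and the threshold `phi = 1.5` are all hypothetical choices made for this example.

```python
import math
from statistics import pstdev

# Hypothetical toy data: record id -> {movie: rating on a 1-5 scale}.
# The real Netflix Prize records also carried rating dates; omitted here.
RECORDS = {
    "r1": {"A": 5, "B": 3, "C": 4, "D": 1},
    "r2": {"A": 4, "B": 3, "E": 2},
    "r3": {"B": 2, "C": 5, "F": 4},
    "r4": {"A": 1, "E": 5, "F": 3},
    "r5": {"A": 3, "E": 4},
    "r6": {"B": 5, "F": 1},
}


def movie_weights(records):
    """Weight each movie by rarity: fewer raters -> higher weight."""
    support = {}
    for ratings in records.values():
        for movie in ratings:
            support[movie] = support.get(movie, 0) + 1
    # Assumption: log(1 + n) rather than log(n), so that a movie with a
    # single rater does not cause a division by zero.
    return {movie: 1.0 / math.log(1 + n) for movie, n in support.items()}


def score(aux, ratings, weights, tolerance=1):
    """Sum the weights of known movies this record rated within `tolerance`."""
    total = 0.0
    for movie, guess in aux.items():
        if movie in ratings and abs(ratings[movie] - guess) <= tolerance:
            total += weights[movie]
    return total


def best_match(aux, records, phi=1.5, tolerance=1):
    """Return the best-scoring record id, or None if no record stands out."""
    weights = movie_weights(records)
    scores = {rid: score(aux, r, weights, tolerance) for rid, r in records.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None
    sigma = pstdev([s for _, s in ranked])
    if sigma == 0:
        return None  # every record scored the same; nothing to single out
    (best_id, best), (_, second) = ranked[0], ranked[1]
    # Gap between the top two scores, measured in standard deviations
    # of all scores; only a clear outlier counts as a match.
    if (best - second) / sigma >= phi:
        return best_id
    return None


if __name__ == "__main__":
    # Two ratings, one for an obscure movie ("D"), single out r1.
    print(best_match({"C": 4, "D": 1}, RECORDS))  # expected: r1
    # One rating for a widely rated movie matches several records equally.
    print(best_match({"B": 3}, RECORDS))  # expected: None
```

In this toy setup, knowing one rating for an obscure title does most of the identifying work, while knowing a rating for a popular title leaves several records tied and the sketch declines to name anyone.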



There is a lot to this article. People are creatures of habit, and Yahoo already knows that: it builds profiles from the searches coming from your IP address and/or the cookies set on your computer. There is little in the way of anonymity on the internet. Most of us define ourselves by the way we use it: making purchases through companies like Amazon, renting through Netflix and Blockbuster, and searching on YouTube and Yahoo (and other search engines).
Where things get sticky is with those who do not want the information revealed. When it comes to privacy, it becomes obvious that unless you stay away from every means by which your patronage and purchases can be tracked, you are exposed to anyone who knows how to obtain the data and process it to work out who is in it.
For instance, if you do a search on my name, it will turn up a bunch of hits, including some interesting history, such as the fact that I once ran a BBS that served as the Fidonet hub in Chicago. It also shows that I am a writer (and not only from my own site) and that I am somewhat politically neutral: a moderate who will consider either side of a question.
Do I care if this information is available? To a certain extent, but not really. I don't do certain things on the internet, such as banking, but I do make purchases and have done most of my Christmas shopping online this year. It sure beats fighting the crowds.
One can also extrapolate false conclusions, and that is where this kind of data mining gets dangerous. For instance, my rental and purchase habits are all over the place, and I'll watch a film that depicts a lifestyle I don't approve of. I've ended up buying three copies of Pretty Woman over the years (my wife went through two VHS copies on her own), but I don't approve of solicitation. And yet, because I bought three copies, someone might conclude that I think it is okay.
And so it goes...
Posted by: Old Timer Too | November 29, 2007 at 04:08 AM
If you are making comments on a public site (such as this or IMDB), you are voluntarily sacrificing your privacy. Anyone with a desire to link these items is free to do so without your permission.
If you don't want people to be able to put the puzzle pieces of your life together, then the onus is on YOU to make the minimal effort needed to preserve your privacy.
I'm not quite sure how purchases made through Amazon are public information. Can you explain this to me?
Posted by: chi_tino | November 29, 2007 at 09:12 AM
It isn't that Amazon's material is public, but that Amazon mines information about your searches in hopes of enticing you to make purchases. When you search for items on Amazon's site, those searches are linked to a cookie that your browser sends back when you revisit the site (you don't even have to be signed in for this to happen).
While this material is private, it does reveal a lot about you, which is the crux of the matter. Credit card purchase information is gathered every time you make a purchase with your card and stored for future reference (again, not public).
Now, when a company's data are compromised, financial records are not the only thing revealed; a lot more can be exposed, including purchase and search information.
Hopefully, that explains it a bit better.
Posted by: Old Timer Too | December 01, 2007 at 11:59 AM