AOL Releases Search Data, Pity the Fools

This weekend the AOL research team made public a file containing logs of the searches performed by approximately half a million users over the course of three months. Article about it here. The AOL engineers claimed that they had anonymized the data so that the users who performed the searches could not be identified.

Unfortunately the logs themselves are still very revealing, including people’s names, addresses, phone numbers and email addresses; essentially anything that an AOL user could have entered into the search form is in the log.

Think about it: could someone who knows you identify you by looking at your history of searches? Sure. A lot of people search for their own names just out of curiosity. So then your name would appear in this log along with all the other searches done by that searcher. Of course, we couldn’t tell for sure that it was you who searched for that rather embarrassing genre of porn, just that it was someone who knew you by name.

But let’s say that someone also knew that you took a vacation recently in Barbados and refinanced your mortgage this year. Well, what do you know, the public AOL logs show that “anonymized” user name 123456 happened to do a search for your name, that seedy website, “Barbados hotels” and “refinance mortgage”. Hmm.
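To make that concrete, here is a minimal Python sketch of the matching an acquaintance could do. Everything in it is an assumption for illustration: the rows are made up, the two-column layout (anonymized ID, query) is a simplification rather than the actual file format, and the names `LOG`, `queries_by_user` and `candidates` are hypothetical.

```python
from collections import defaultdict

# Hypothetical log rows: (anonymized user ID, query). All data here is invented.
LOG = [
    (123456, "jane q. example"),
    (123456, "barbados hotels"),
    (123456, "refinance mortgage"),
    (123456, "cheap flights"),
    (777777, "barbados hotels"),
    (777777, "weather boston"),
    (888888, "refinance mortgage"),
]


def queries_by_user(log):
    """Group every query under the anonymized ID that issued it."""
    grouped = defaultdict(set)
    for user_id, query in log:
        grouped[user_id].add(query.lower())
    return grouped


def candidates(log, known_terms):
    """Return the IDs whose history contains every term the observer already knows."""
    wanted = {term.lower() for term in known_terms}
    return sorted(
        user_id
        for user_id, queries in queries_by_user(log).items()
        if wanted <= queries  # subset test: all known terms appear in this history
    )


# One fact alone is ambiguous; each extra fact narrows the list.
one_fact = candidates(LOG, ["barbados hotels"])
three_facts = candidates(LOG, ["Jane Q. Example", "barbados hotels", "refinance mortgage"])
```

In this toy data, one known fact leaves two possible IDs, while three facts leave only one. The number replacing the user name does nothing to prevent this, because it still ties all of a person's searches together.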

Actually, this does not even depend on a person having done a search of their own name. I’ve done Google searches on people I’ve known just out of curiosity. If I had done these searches on AOL, then their names would appear in this published data. If one of those people found their name in this log, then they could easily reconstruct my identity based on the same Barbados and refinancing scenarios.

What were those AOL people thinking??? On the page where they released this file (cutely called AOL Research “Alpha”, apparently a little joke about Google’s beta applications), they were very proud of the fact they had obscured the identities of the searchers and shared this valuable data with the world. It seems like these engineers are narrowly focused on their technical achievement but utterly clueless about how their users (and the real people who are named in the published data) might react. Did the above scenario never occur to them or their managers?

I have read that at Google headquarters, they have a giant flatscreen display in the reception area showing office visitors the searches that are being done in real time. But that isn’t quite the same thing as AOL releasing an entire set of data like this so thoughtlessly.

Update: it appears that the page on the AOL research site that hosted this file has been taken down. Maybe they can ask people to kindly ignore the files that they downloaded while it was available.

Update: CNET.com has examples of some really disturbing searches that hint at people’s identities. I’m sure CNET does not want to expose too much information for fear of getting sued themselves.

Update: TechCrunch.com, one of the sites that first broke this story, posted a link to a NY Times article about the first person positively identified from the data, some poor lady in Georgia. As noted earlier about the CNET report, I’m certain that the only reason more people haven’t been outed in public is that no blogger wants to end up getting sued. It seems the NY Times got this lady’s permission to use her as an example. But no doubt as this data gets around, people will be outed among their circle of acquaintances or co-workers. I would not be surprised if eventually someone committed suicide out of sheer embarrassment.

There is a segment of the technical world that thinks this is not a big deal compared with the credit card fraud and identity theft that occur on a daily basis. But I suspect that, given a choice, these folks would rather go through the hassle of closing stolen credit card accounts than have their name and reputation permanently associated with some of the truly demented stuff found in this data.

Written by Parker on August 6th, 2006 with comments disabled.
