Privacy: The 600-pound Gorilla in the Big Data Room

Stephen DeAngelis

May 28, 2013

“[The] sophisticated world of Big Data analysis has arrived,” writes Andrew Carswell, “a realm where both criminal indiscretions and consumer appetite will be targeted by a myriad of data analysts, or ‘geek squads’, whose sole purpose is to connect the dots – to cross reference your data, paint a picture of who you are, determine what you want and plan how best to serve you.” [“Big Data analysis allows businesses and governments to mine your personal details,” The Daily Telegraph, 13 April 2013] You have to admit that the thought of a room full of “geeks” tracking your every move and analyzing your life sounds a bit creepy, and the debate about privacy is only going to heat up as more sophisticated analytic techniques are developed. Privacy is the 600-pound gorilla in the Big Data room. Carswell continues:

“This so-called Big Data, which has emerged as the boardroom buzz word for 2013, could be in the form of your bank transactions, your phone calls and texts, the treasure trove of personal details on your Twitter and Facebook account, your Google searches, the petitions you may have signed, the purchases you have made, the information captured by websites and electronic sensors. It is a world which is set to revolutionise the way governments provide services; a world which allows businesses to build intimate relationships with customers; but a world which will ignite an intense debate on the issue of citizen privacy.”

As a businessman and technologist, I’m a big fan of big data and the potential it holds. The analysis of large data sets can make the world a better and, I admit, a more profitable place. We all know that there have been (and will be) mistakes made in the handling of personal data. Most businesses, however, don’t really want the kind of personal information that most people would consider too invasive. As Ruud Wanck, an early adopter who founded an internet company in 1994, told Robert Heeg, “On the web, the vast majority of business models are based on pseudonymous data; we don’t need to know who is who. And we don’t even want to, because it’s incredibly expensive to save all that data.” [“From Mad Men to Math Men,” RW Connect, 8 May 2013] That’s why some pundits, like MIT Professor Alex “Sandy” Pentland, believe that the commercial world and consumers will eventually work out a mutually agreeable arrangement in which consumers have control over their personal information. For more on Pentland’s views, read my post entitled “Big Data and Big Brother.” John Carroll, Senior Director in Ipsos MediaCT and Chairman of the Media Research Group, agrees with Pentland. He writes:

“The battleground for data will be held between the vendor and the consumer. People will wise up very quickly to flip the relationship around. ‘If you want my personal data, it will cost you’. Or, ‘I will give you this bit of my data, but not all of it’. The word ‘engagement’ rears its head here. If consumers have a relationship with brands and trust them, then there will probably be a healthy two-way ‘big data’ relationship.” [“I predict a big data riot,” MediaWeek, 27 February 2013]

Regardless, Wanck agrees that, for those involved with big data, today’s biggest challenge is privacy. In the heated discussions that are sure to come, Wanck insists, “We should differentiate between anonymous data and more specific personal data.” He explained to Heeg, “Initial political interest in privacy created a very black-and-white, almost populist discussion. The only distinction made was the one between generic data, like how many people visit a website, and personal identifiable information.” Heeg continues:

“In between the two, he argues, is a third category that is used predominantly by marketers and media companies. ‘This is data that determines how people search for information on the web collectively.’ As a company, GroupM says it is not interested in personal data. Media companies, after all, are built on their ability to reach large groups of people who have a similar profile in a short space of time. Wanck calls it ‘pseudonymous’ data, because it is personal but contains a large element of anonymity. ‘Of course you could argue that, if I were to really look for someone with certain identifiable qualities and connect all types of data, I could possibly trace the data back to a certain individual. But why would we? It is not the business we’re in. The essence of advertising is that it’s more efficient to reach a large target group through marketing and communication in one go. That is more cost-efficient than to have a call-centre approach [for] individuals.’”

Jon Neiditz, a lawyer concerned with privacy issues, doesn’t believe current privacy laws are sufficient. [“Big Data Will Turn Privacy Upside Down,” ID Experts, 7 May 2013] In his article, he lists six “major concerns regarding the application of current privacy law to big data,” and insists “the protection of privacy must have what are called in information security ‘compensating controls’.” Two such controls involve information security and data company accountability. Derrick Harris agrees with Neiditz that data security is going to play a critical role in the big data arena. He believes security is essential because, unlike Wanck, he insists that there is no such thing as pseudonymous or anonymous data. [“If there’s no such thing as anonymous data, does privacy just mean security?” DAM Foundation, 3 April 2013] He bases his conclusion on an MIT research paper that concluded:

“To extract the complete location information for a single person from an ‘anonymized’ data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person’s whereabouts.”
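To make that re-identification argument concrete, here is a minimal sketch in Python. It is not the MIT researchers' method, only a toy illustration of the idea that each additional known (place, time) point shrinks the set of "anonymous" traces a person could belong to. The pseudonyms, antenna IDs, hours, and the function name `matching_users` are all invented for this example.

```python
from typing import Dict, List, Set, Tuple

# A point is (antenna_id, hour_of_day). All values below are hypothetical toy data.
Point = Tuple[str, int]


def matching_users(traces: Dict[str, Set[Point]], known: List[Point]) -> List[str]:
    """Return the pseudonyms whose trace contains every known point."""
    return sorted(
        user for user, points in traces.items()
        if all(p in points for p in known)
    )


# Three "anonymized" location traces, keyed by pseudonym rather than by name.
traces: Dict[str, Set[Point]] = {
    "user_a": {("A1", 8), ("B2", 12), ("C3", 18), ("A1", 22)},
    "user_b": {("A1", 8), ("B2", 12), ("D4", 18), ("A1", 22)},
    "user_c": {("A1", 8), ("E5", 12), ("C3", 18), ("F6", 22)},
}

# Points an outsider might learn about one person, e.g. from public posts.
known: List[Point] = [("A1", 8), ("B2", 12), ("C3", 18)]

# Show how the candidate set shrinks as more points become known.
for k in range(1, len(known) + 1):
    print(k, matching_users(traces, known[:k]))
```

In this toy data set, one known point matches all three pseudonyms, two points match two of them, and three points single out one trace. Real data sets are vastly larger, but the narrowing mechanism is the same one the quoted passage describes.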

While I’m sure a lot of people would simply like to see the collection of data halted altogether, Adam D. Thierer, a public policy analyst, believes that would be a bad idea. [“My Senate Testimony on Privacy, Data Collection & Do Not Track,” The Technology Liberation Front, 24 April 2013] Testifying before a Senate committee, Thierer made three primary points. They were:

  • First, no matter how well-intentioned, restrictions on data collection could negatively impact the competitiveness of America’s digital economy, as well as consumer choice.
  • Second, it is unwise to place too much faith in any single, silver-bullet solution to privacy, including ‘Do Not Track,’ because such schemes are easily evaded or defeated and often fail to live up to their billing.
  • Finally, with those two points in mind, we should look to alternative and less costly approaches to protecting privacy that rely on education, empowerment, and targeted enforcement of existing laws. Serious and lasting long-term privacy protection requires a layered, multifaceted approach incorporating many solutions.

Although privacy concerns will continue to be raised whenever big data collection and analysis are discussed, Diane Mehta reports that the intensity of the debate may decrease in the years ahead. The reason might surprise you. It won’t necessarily be because better solutions will be implemented but because expectations will have diminished. Mehta reports that even though Millennials assert they are not happy that their privacy is being invaded, their actions speak louder than their words. [“New Survey Suggests Millennials Have No Idea What Privacy Means,” Forbes, 26 April 2013] She writes:

“They’re happy to give away their online privacy but say they’re not. A new study by the USC Annenberg Center for the Digital Future and Bovitz Inc. suggests that Millennials (ages 18-34, aka ‘digital natives’) are completely confused. Seventy percent of Millennials say no one should have access to their data or online behavior. Yet 25% will trade it away for more relevant advertising, 56% will share their location for coupons or deals, and 51% say they’ll share information with companies if they get something in return. In response to the survey, Jeffrey I. Cole, director of the USC Annenberg Center for the Digital Future, declares online privacy dead. ‘This demonstrates a major shift in online behavior — there’s no going back,’ he says.”

Those results also strengthen the arguments made by Pentland and Carroll that we are in the midst of a period in which consumers and companies are wrestling to see who controls personal data and for what purposes. Elaine B. Coleman, managing director of media and emerging technologies for Bovitz, told Mehta, “[Millennials] perceive social media as an exchange or an economy of ideas, where sharing involves participating in smart ways.” Mehta is quick to point out that Coleman works in an industry that relies on data collection and analysis, and she fears that this might skew Coleman’s opinions. Mehta concludes:

“The larger issue is Internet privacy of course—and how much the discount-searching online shoppers really know about it. Do they understand exactly what data brokers are after—salaries, hobbies, pregnancies, retail transactions? They’ll take the information whether Millennials want to give it away or not. As long as marketers can justify that people want to give their privacy away, they can go ahead and push for it.”

According to Steve Lohr, companies realize that they can’t leave the future of data collection and analysis to chance with the hope that things work out in their favor. “Corporate executives and privacy experts agree,” he writes, “that the best way forward combines new rules and technology tools.” [“Big Data Is Opening Doors, but Maybe Too Many,” New York Times, 23 March 2013] Lohr concludes his column by discussing the work that Professor Pentland is doing at MIT. He writes:

“Dr. Pentland, an academic adviser to the World Economic Forum’s initiatives on Big Data and personal data, agrees that limitations on data collection still make sense, as long as they are flexible and not a ‘sledgehammer that risks damaging the public good.’ He is leading a group at the M.I.T. Media Lab that is at the forefront of a number of personal data and privacy programs and real-world experiments. He espouses what he calls ‘a new deal on data’ with three basic tenets: you have the right to possess your data, to control how it is used, and to destroy or distribute it as you see fit. … His M.I.T. group is developing tools for controlling, storing and auditing flows of personal data. … Dr. Pentland’s group is also collaborating with law experts, like Scott L. David of the University of Washington, to develop innovative contract rules for handling and exchanging data that insures privacy and security and minimizes risk. … ‘Like anything new,’ Dr. Pentland says, ‘people make up just-so stories about Big Data, privacy and data sharing,’ often based on their existing beliefs and personal bias. ‘We’re trying to test and learn,’ he says.”

His team’s work will surely be watched closely by commercial, government, and private stakeholders. The world of big data is predicted to continue its rapid growth and, I suspect, the privacy gorilla living in that world will grow right along with it.