Big Trouble for Big Data, Part 1

Stephen DeAngelis

February 12, 2013

Ryan Gallagher reports that defense contractor Raytheon “has secretly developed software capable of tracking people’s movements and predicting future behaviour by mining data from social networking websites.” [“Software that tracks people on social media created by defence firm,” The Guardian, 10 February 2013] This “extreme-scale analytics” system, which Raytheon calls Rapid Information Overlay Technology (or RIOT), reportedly gathers “vast amounts of information about people from websites including Facebook, Twitter and Foursquare.” Raytheon claims that “it has not sold the software … to any clients” but “acknowledged the technology was shared with US government and industry as part of a joint research and development effort, in 2010, to help build a national security system capable of analysing ‘trillions of entities’ from cyberspace.” It certainly comes as no surprise that web-based activities are being tracked. The question is: Why has the discovery of Raytheon’s system created such a stir?

 

According to Gallagher, it is the “controversial techniques that have attracted interest from intelligence and national security agencies, at the same time prompting civil liberties and online privacy concerns.” He continues:

“The sophisticated technology demonstrates how the same social networks that helped propel the Arab Spring revolutions can be transformed into a ‘Google for spies’ and tapped as a means of monitoring and control. Using Riot it is possible to gain an entire snapshot of a person’s life – their friends, the places they visit charted on a map – in little more than a few clicks of a button.”

That kind of “big brother” tactic inevitably raises privacy concerns and could result in significant backlash. Such a backlash would worry any manufacturer or retailer that counts on big data analytics to connect better with its customers. Gallagher explains a little more about how Riot works and what it can determine. He writes:

“Riot pulls out this information, showing not only … photographs posted onto social networks by individuals, but also the location at which the photographs were taken. … Riot can display on a spider diagram the associations and relationships between individuals online by looking at who they have communicated with over Twitter. It can also mine data from Facebook and sift GPS location information from Foursquare, a mobile phone app used by more than 25 million people to alert friends of their whereabouts. The Foursquare data can be used to display, in graph form, the top 10 places visited by tracked individuals and the times at which they visited them.”

It should be noted that “mining from public websites for law enforcement is considered legal in most countries. In February last year, for instance, the FBI requested help to develop a social-media mining application for monitoring ‘bad actors or groups’.” But Ginger McCall, an attorney with the Washington-based Electronic Privacy Information Center, told Gallagher that “the Raytheon technology raised concerns about how troves of user data could be covertly collected without oversight or regulation. ‘Social networking sites are often not transparent about what information is shared and how it is shared,’ McCall said. ‘Users may be posting information that they believe will be viewed only by their friends, but instead, it is being viewed by government officials or pulled in by data collection services like the Riot search.'” Needless to say, Raytheon is not happy that the existence of Riot has been revealed. Gallagher continues:

“Jared Adams, a spokesman for Raytheon’s intelligence and information systems department, said in an email: ‘Riot is a big data analytics system design we are working on with industry, national labs and commercial partners to help turn massive amounts of data into useable information to help meet our nation’s rapidly changing security needs. Its innovative privacy features are the most robust that we’re aware of, enabling the sharing and analysis of data without personally identifiable information [such as social security numbers, bank or other financial account information] being disclosed.'”

Gallagher reports that last December Raytheon filed for a patent on the system. Although Raytheon is a U.S.-based company, the stir over the existence of Riot started with The Guardian article in the U.K. Another British newspaper, The Telegraph, picked up the story and noted that one reason it has raised concern is that the British parliament is considering a controversial “Communications Bill in Britain which would authorise the monitoring of phone calls, emails and internet usage. Ministers insist the reforms are vital for countering paedophiles, extremists and fraudsters but civil liberties have attacked the Bill’s scope and branded it a ‘snoopers’ charter’.” [“Warning over social networking ‘snooping’ technology,” The Telegraph, 10 February 2013]

 

The revelation about Raytheon’s Riot system is not the first time that privacy concerns have been raised in connection with Big Data. Last June, for example, Quentin Hardy wrote, “Even without knowing your name, increasingly, everything about you is out there. Whether and how you guard your privacy in an online world we are building up every day has become increasingly urgent.” [“Rethinking Privacy in an Era of Big Data,” New York Times, 4 June 2012] Hardy reports that at a conference on Big Data held last spring at the University of California, Berkeley, Danah Boyd, a senior researcher at Microsoft Research, stated, “Privacy is a source of tremendous tension and anxiety in Big Data. … It’s a general anxiety that you can’t pinpoint, this odd moment of creepiness.” She then asked, “Is this moving towards a society that we want to build?” Hardy continues:

“Privacy is not a universal or timeless quality. It is redefined by who one is talking to, or by the expectations of the larger society. In some countries, a woman’s ankle is a private matter; in some times and places, sexual orientations away from the norm are deeply private, or publicly celebrated. Privacy, Ms. Boyd notes, is not the same as security or anonymity. It is an ability to have control over one’s definition within an environment that is fully understood. Something, arguably, no one has anymore. ‘Defaults around how we interact have changed,’ she said. ‘A conversation in the hallway is private by default, public by effort. Online, our interactions become public by default, private by effort.'”

Hardy was writing before the existence of Riot was revealed; nevertheless, he pointed out that a remarkably complete profile of you could be assembled using online data. He noted, “By triangulating different sets of data (you are suddenly asking lots of people on LinkedIn for endorsements on you as a worker, and on Foursquare you seem to be checking in at midday near a competitor’s location), people can now conclude things about you (you’re probably interviewing for a job there) that are radically different from either set of public information.” He continued:

“What is to be done? Ms. Boyd has made a specialty of studying young people’s behavior on the Internet. She says they are now often seeking power over their environment through misdirection, such as continually making and destroying Facebook accounts, or steganography, a cryptographic term for hiding things in plain sight by obscuring their true meaning. ‘Someone writes, “I’m sick and tired of all this,” and it gets “liked” by 32 people,’ she said. ‘When I started doing my fieldwork I could tell you what people were talking about. Now I can’t.’ That is a placeholder solution, and Ms. Boyd sees only one certainty for which we should prepare. ‘Regulation is coming,’ she says. ‘You may not like it, you may close your eyes and hold your nose, but it is coming.’ The issue is what the regulation looks like, and how well it is considered. ‘Technologists need to re-engage with regulators,’ she says. ‘We need to get to a model where we really understand usage.’ Right now, even among the highest geek circles, ‘we have very low levels of computational literacy, data literacy, media literacy, and all of these are contributing to the fears.'”

Obviously, manufacturers, retailers, and marketers are all concerned stakeholders when it comes to the collection, storage, and analysis of big data. They should also be a part of any discussions involving future regulation. More importantly, they need to work with consumers to establish cooperative measures for data sharing. I’ll discuss that subject further tomorrow.