Datasharing is Big Business

December 31, 2012


You would have to be technically illiterate or remarkably naive not to know that your online activity is being tracked by almost every site you visit. In an interesting series of articles, reporters from the Wall Street Journal discussed how and why companies track your online activities as well as some of the ways you can control such tracking. The principal article on this topic was co-authored by Jennifer Valentino-Devries and Jeremy Singer-Vine. “The widening ability to associate people’s real-life identities with their browsing habits marks a privacy milestone,” they wrote, “further blurring the already unclear border between our public and private lives.” [“They Know What You’re Shopping For,” 7 December 2012] They continued:

“In pursuit of ever more precise and valuable information about potential customers, tracking companies are redefining what it means to be anonymous. The use of real identities across the Web is going mainstream at a rapid clip. A Wall Street Journal examination of nearly 1,000 top websites found that 75% now include code from social networks, such as Facebook’s ‘Like’ or Twitter’s ‘Tweet’ buttons. Such code can match people’s identities with their Web-browsing activities on an unprecedented scale and can even track a user’s arrival on a page if the button is never clicked. In separate research, the Journal examined what happens when people logged in to roughly 70 popular websites that request a login and found that more than a quarter of the time, the sites passed along a user’s real name, email address or other personal details, such as username, to third-party companies. One major dating site passed along a person’s self-reported sexual orientation and drug-use habits to advertising companies.”

Both scrupulous and unscrupulous companies track online user activities looking for exploitable information. Valentino-Devries and Singer-Vine noted that companies are looking for “precise and valuable information about potential customers.” The purpose, of course, is to turn potential customers into actual consumers. Companies are in business to make a profit, and there’s nothing wrong with that. It costs money to tell potential customers about the products and services companies have to offer. If marketing efforts are ineffective, either profit margins decrease or product/service prices increase. The former is not good for companies and the latter is not good for customers. That’s why targeted marketing is gaining traction. But targeted marketing depends, in large measure, on tracking some data inputs that raise concerns among privacy advocates.


Most shoppers understand that they are likely to receive offers that more closely match their tastes and lifestyle if they permit their activities to be tracked. What they object to, more often than not, is having that information shared. One of the ways that scrupulous companies deal with privacy issues and tracking policies is by using technologies that anonymize the data through aggregation. But, according to Valentino-Devries and Singer-Vine, the lines are getting fuzzier about what it means to be anonymous. They wrote:

“Companies that conduct online tracking have long argued that the information they collect is anonymous, and therefore innocuous. But the industry’s definition of ‘anonymous’ has shifted over time. After an epic regulatory battle in the early 2000s over Web privacy, the online ad industry generally concluded that ‘anonymous’ meant that a firm had no access to ‘PII,’ the industry term for ‘personally identifiable information.’ Now, however, some companies describe tracking or advertising as anonymous even if they have or use people’s real names or email addresses. Their argument: It’s still anonymous because the identity information is removed, protected or separated from browsing history. Facebook Inc., for example, offers a service that shows ads to groups of people based on email address, but only if advertisers already have that address. Facebook says that it doesn’t give people’s email addresses to the advertiser.”

Some businesses will undoubtedly continue to push the boundary of what’s considered acceptable and what’s considered questionable when it comes to privacy. Businesses have to understand, however, that crossing the line often enough could result in a backlash that eventually restricts their access to so-called big data. That would be an unfavorable outcome for everyone. Valentino-Devries and Singer-Vine continued their article with a discussion about how Facebook’s and other sites’ “anonymization” schemes work. They wrote:

“A website uses a formula to turn its users’ email addresses into jumbled strings of numbers and letters. An advertiser does the same with its customer email lists. Both then send their jumbled lists to a third company that looks for matches. When two match, the website can show an ad targeted to a specific person, but no real email addresses changed hands. Still, the sheer ease with which personal details can be shared online makes it difficult for people to know whether their information is safe. A Wall Street Journal survey of 50 popular websites, plus the Journal’s own site, found that 12 sent potentially identifying information such as email addresses or full real names to third parties. The Journal tested an additional 20 sites that deal with sensitive information, including sites dealing with personal relationships, medical information and children. Nine of these sent potentially identifying information elsewhere. Sometimes the information was encoded and sent in a special transmission to another company. Other times, though, people’s names were simply included in the title or address of the Web page. This information gets sent automatically to every ad company with a presence on a Web page unless the website owner takes steps to prevent it.”
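The matching scheme described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not name the "formula," so SHA-256 is assumed here, and the function names and email addresses are hypothetical.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest.

    SHA-256 is an assumption; the article only refers to "a formula."
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_hashed_lists(site_hashes, advertiser_hashes):
    """Return the hashes found in both lists (the third company's step)."""
    return set(site_hashes) & set(advertiser_hashes)


# Hypothetical data: the website and the advertiser each hash their own list.
site_users = ["alice@example.com", "bob@example.com"]
advertiser_customers = [" Alice@Example.com ", "carol@example.com"]

site_hashes = [hash_email(e) for e in site_users]
advertiser_hashes = [hash_email(e) for e in advertiser_customers]

# The matching party sees only jumbled strings, never the raw addresses.
matches = match_hashed_lists(site_hashes, advertiser_hashes)
```

Note that a plain hash like this is not strong anonymization on its own: anyone holding a list of candidate email addresses can hash them the same way and look for the same matches.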

The authors admitted that even the Wall Street Journal’s website “shared considerable amounts of users’ personal information.” A spokeswoman for the newspaper claimed “that most of the sharing of personally identifiable information was unintentional and was being corrected. The only intentional sharing of identity information, she said, was an encoded version of the user’s email address, provided to a company that sends marketing emails to readers who opt to receive them. She said the Journal makes companies it works with sign a policy that would prevent them from using improper data they receive.” At the moment, it appears that companies have the upper hand when it comes to controlling personal data; however, there is a growing belief that if you give people control over their personal data and who can use it, privacy issues would be greatly reduced and relationships between companies and customers would be enhanced. For more on that topic, read my post entitled Targeted Marketing: Understanding the Individual. Valentino-Devries and Singer-Vine continued:

“The regulatory clash over Web privacy in the early 2000s established ground rules that today are being tested. At that time, the Federal Trade Commission investigated the merger of the online-ad company DoubleClick Inc. with a traditional mailing-list giant, Abacus Direct, over concerns that Abacus would merge its lists of people’s real names and addresses with DoubleClick’s Web-browsing profiles. DoubleClick (now owned by Google Inc.) eventually agreed not to do that. The dispute spawned an industry self-regulatory group that pledged not to link personally identifiable information to Web browsing unless the person opted in.”

The temptation to track and use data beyond that offered by “opt in” options is growing. As Valentino-Devries and Singer-Vine put it, “The allure of real identities remains. After all, that’s how most companies keep track of their customers.” Why would individuals opt in to such programs? The obvious answer is: For the benefits they receive. Customers routinely join loyalty programs to gain points or receive discounts. Online companies claim their tracking policies are similar to such programs. The article explains:

“Brick-and-mortar shops can ‘capture things like name, city and email address’ when a person buys something or signs up for a loyalty card, said a Yahoo Inc. official. Yahoo offers a service, Audience Match, that lets retailers find and target their customers online. Yahoo says that it uses anonymization and doesn’t give names or Web-browsing information to advertisers.”

The problem, according to Valentino-Devries and Singer-Vine, is that with so many tracking programs now available to websites there can be unanticipated consequences. They explained:

“In the past, tracking companies and retailers had a tougher time identifying online users. Today, a single Web page can contain computer code from dozens of different ad companies or tracking firms. These separate chunks of code often share information with each other. … It’s so easy to share such information that many of the sites the Journal contacted said they were doing so accidentally. The problem is easy to solve, but it has persisted for years.”
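One kind of accidental sharing the Journal described is personal details ending up in a page's address, where every ad company on the page can read them. As a rough sketch of the kind of step a site owner might take, the Python below removes identifying query parameters from a page address. The parameter names and the example URL are hypothetical, and this is not a method described in the article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that might carry personal details.
SENSITIVE_PARAMS = {"email", "name", "username"}


def scrub_url(url: str) -> str:
    """Return the URL with any sensitive query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SENSITIVE_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


clean = scrub_url("https://example.com/profile?email=alice%40example.com&tab=settings")
```

Here `clean` keeps the `tab` parameter but no longer contains the email address.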

Unfortunately, there is currently little incentive to solve the problem. That’s because big data is big business and it’s an economic sector that is growing both in size and importance. The article continued:

“Craig Wills, a computer-science professor at Worcester Polytechnic Institute, published research in 2011 showing that 56% of more than 100 websites leaked pieces of private information in ways similar to those found in the Journal’s study. … The rise of social networks is also making it easier to tie people’s real identities to their online behavior. The ‘Like’ button, for instance, can send information back to Facebook whenever Facebook users visit pages that have the button, even if they don’t click it. These buttons and related code give social networks, which often know people’s real names, an unprecedented overview of online behavior. The Journal found that Facebook code appears on 67% of the more than 900 sites of the top 1,000 that were scanned by BuiltWith.com, a service that examines websites and the technologies they use. That is up from about 63% a year or so ago. Code from Twitter Inc. was on nearly 54% of sites, up from 43%. Code from the Google+ social network was on almost 30% of sites examined, up from just 12% in December 2011.”

If you would like to see how some popular websites are sharing your data, click on this link. It will take you to an interactive Wall Street Journal website that looks at 50 popular sites. If you would like to gain some control over how you are tracked and your data shared, click on this link. It will take you to a separate article written by Jennifer Valentino-Devries that describes five strategies for reducing identity tracking online.
