Big and Small Data

Stephen DeAngelis

May 20, 2013

Dr. Rufus Pollock, Founder and co-Director of the Open Knowledge Foundation, insists that the real revolution that will take place in the Big Data era will involve “small data.” [“Forget Big Data, Small Data is the Real Revolution,” Open Knowledge Foundation Blog, 22 April 2013] Pollock asserts, “The real opportunity is not big data, but small data. Not centralized ‘big iron’, but decentralized data wrangling. Not ‘one ring to rule them all’ but ‘small pieces loosely joined.’” He explains:

“The real revolution … is the mass democratization of the means of access, storage and processing of data. This story isn’t about large organizations running parallel software on tens of thousand of servers, but about more people than ever being able to collaborate effectively around a distributed ecosystem of information, an ecosystem of small data. Just as we now find it ludicrous to talk of ‘big software’ – as if size in itself were a measure of value – we should, and will one day, find it equally odd to talk of ‘big data’. Size in itself doesn’t matter – what matters is having the data, of whatever size, that helps us solve a problem or address the question we have.”

Although I agree with Pollock that what really matters “is having the data … that helps solve a problem or address a question,” for many of those problems and questions, size does matter. Outliers in small data sets can significantly skew results. Pollock himself admits that even though “for many problems and questions, small data in itself is enough,” there are times when you need to “scale up.” He believes, however, that “when we want to scale up the way to do that is through componentized small data: by creating and integrating small data ‘packages’ not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos.”
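The point about outliers is easy to check with a little arithmetic. The following is only an illustrative sketch with made-up numbers (the values 100 and 10,000 and the sample sizes are assumptions chosen for the example, not figures from any of the sources quoted here). It adds the same single outlier to a small sample and to a large one and compares the averages.

```python
from statistics import mean, median

# Hypothetical figures for illustration only: every "typical" observation
# is 100, and one extreme observation is 10,000.
TYPICAL = 100.0
OUTLIER = 10_000.0


def sample_with_outlier(size):
    """Return `size` observations: size - 1 typical values plus one outlier."""
    return [TYPICAL] * (size - 1) + [OUTLIER]


small = sample_with_outlier(10)       # small data set
large = sample_with_outlier(10_000)   # much larger data set

small_mean = mean(small)    # the outlier dominates the average
large_mean = mean(large)    # the outlier is diluted by the other values
small_median = median(small)  # the median ignores the outlier entirely

print(f"mean of small sample:   {small_mean:.2f}")
print(f"mean of large sample:   {large_mean:.2f}")
print(f"median of small sample: {small_median:.2f}")
```

In this toy case the single outlier pushes the small sample's average more than ten times above the typical value, while the large sample's average barely moves. A robust statistic such as the median is one way to limit the damage when only a small data set is available.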

Big data initiatives generally involve integrating data, not creating “massive centralized silos.” Nevertheless, Pollock is not alone in the belief that small data will play a role in the big data era. Bruno Aziza, Vice President of Marketing at SiSense, believes that one of the surprises that has emerged is that “Big Data isn’t about ‘Big’.” [Forbes, 22 April 2013] By that he means that the term “big” is subjective. What is considered big today might be considered normal a few years from now. In other words, like Pollock, he believes that size is irrelevant as a descriptor of data sets. Aziza also agrees with Pollock that most problems don’t require petabytes of data to solve. “Sometimes,” he writes, “what can be perceived as ‘Small Data’ can go a long way.” That’s true as long as it’s the right data. Regardless of the size of the data set, what really matters according to Aziza is the analytics applied to it.

The big data era began, he claims, as the result of a revolution in storage capability. He calculates that a terabyte of disk storage would have cost upwards of $14 million (adjusted) in 1980 but can be bought today for $30. When it comes to analytics, however, he asserts that what has occurred has been more evolutionary than revolutionary. Eric Schwartzman, founder and CEO of Comply Socially, underscores the importance of analytics. He writes:

“An avalanche of information is not necessarily a good thing. More often than not, it’s a path to obfuscation rather than enlightenment, where speculation inflicts irrevocable harm and sensationalism travels farther and faster than tolerance. If you’re a business, the takeaway is that sharing without analytics is essentially useless, that engagement is not as valuable as insight, and that seeing things in context is more important than being popular.” [“Without Analytics, Big Data is Just Noise,” Brian Solis blog, 24 April 2013]

Jake Sorofman, a research director at Gartner, believes that big data is still a big deal, because “big data [is] the intelligence behind microtargeting.” He also agrees with Pollock and Aziza, however, that relevant smaller data sets will remain important in the big data era because “the precision of your aim doesn’t matter if the customer experience falls short.” He believes these relevant smaller data sets will be derived from larger data sets through content curation to form “Big Content.” [“Forget Big Data—Here Comes Big Content,” Gartner, 12 April 2013] He believes that curated content is especially important in the marketing sector because “content is the grist for the social marketing mill.” He continues, “The rhythm and tempo of social marketing puts extraordinary pressure on marketing organizations that are more accustomed to publishing horizons measured in weeks and months than those measured in minutes and hours.” As a result, “The expectation for content quality and authenticity has changed dramatically.”

Steve Olenski agrees that in the marketing arena less is often more when it comes to the data involved. [“When It Comes To Big Data Is Less More?,” Forbes, 22 April 2013] His take on why “less is more” is a bit different from that of the pundits discussed above. Olenski focuses on the fact that some of the sensitive data that is collected isn’t necessary to achieve desired goals. He writes:

“Two esteemed professors at an Ivy League school say that while those in the marketing world continue to struggle with how to handle all the data they are accumulating, they may in fact be wasting their time and more than likely need to go on what they refer to as a ‘data diet.’ … According to the aforementioned professors, all the talk about Big Data and privacy may be, as they put it, ‘a tempest in a teapot.’”

Since many analysts believe that privacy issues are going to create a big storm rather than a tempest in a teapot, Olenski reached out to Eric Bradlow and Peter Fader, Professors of Marketing and Co-Directors of the Wharton Customer Analytics Initiative. Olenski indicates that the two professors “have studied the problem of data-privacy from an empirical perspective.” He continues:

“Their research shows that brands and companies who are on a ‘data diet’ don’t necessarily lose that much customer insights because limited customer data in conjunction with aggregate information (less privacy sensitive) can still provide precise insights. And when it comes to personal data, Fader says bluntly that ‘most sensitive data is worthless and firms are often making mistakes to try to use it (or even collect it).’ And adds that ‘when you build a really good model, there isn’t a whole lot to be gained by bringing in personal data.’”

That should be good news for marketers and consumers alike. Olenski reports that Bradlow and Fader believe “brands should keep the data they need to stay competitive and ditch everything else.” That’s the essence of a data diet. Bradlow told Olenski, “I think there is a fear and paranoia among companies that … if they don’t keep every little piece of information on a customer, they can’t function. Companies continue to squirrel away data for a rainy day. We’re not saying throw data away meaninglessly, but use what you need for forecasting and get rid of the rest.”

I’m not so sure that “fear and paranoia” motivate companies to collect data as much as uncertainty does. Since we are at the beginning of the big data era, we really don’t know what data is going to be useful in the years ahead as analytics and the questions they address change. We are just beginning to appreciate exactly how valuable analyzed data can become. So at least for the next few years, discovering which data is most relevant and then concentrating on analyzing it should be a priority for most businesses. As companies get more comfortable in the world of big data, we are likely to read more about curated data sets and big data diets.
