A Brief Big Data/Analytics 3.0 Primer

Stephen DeAngelis

December 2, 2013

“We are now in the midst of a big data bubble where everything and anything seems to be touted as having some kind of big data tie-in,” writes Greg Satell. “It’s hard to know what we’re talking about anymore.” [“Before You Can Manage Big Data, You Must First Understand It,” Forbes, 22 June 2013] Understanding often begins with a definition. Satell writes, “In their book, Big Data, authors Viktor Mayer-Schönberger and Ken Cukier define it as ‘things that one can do at a large scale that can’t be done at a small one’ and that, I think, gets to the heart of the matter. Bigger data isn’t better, but it’s different.” Todd Michaud defines Big Data in a more colorful way:

“When someone in an organization gets the idea that they would like to pull useful information out of a bunch of data that is so large and complex that the CIO says, ‘Well, how the hell are we going to do that?’ That is Big Data.” [“Big Data Is Exactly What You Think It Isn’t,” Storefront Backtalk, 16 May 2012]

When Mayer-Schönberger and Cukier write that you can “do” things with big data sets that you can’t do with smaller data sets, the “things” they are talking about are analytics. Robert Handfield reminds us that people have been analyzing large data sets for decades. [“A Brief History of Big Data Analytics,” International Institute for Analytics, 26 September 2013] He writes:

“Big Data has been used for a while, with companies such as UPS and others doing a lot of these things for some time. There is definitely something going on and we have been heading in a new direction for a while, but it is relatively unstructured at the moment. Many new kinds of data are being analyzed and new ways of addressing it are coming to the forefront. There is a new development afoot, called Analytics 3.0.”

If analyzing large data sets isn’t a new activity, then what makes Analytics 3.0 different? Handfield says one thing is speed. “Companies are using data to make the same decisions,” he writes, “but are making them a lot FASTER! For example, Macy’s can re-price all of their merchandise, all of their SKUs in 20 minutes, not 24 hours.” Michaud offers another perspective on what’s different about Big Data analytics. He writes:

“What’s new is that a bunch of really smart people have solved the two biggest challenges when it comes to analyzing data of large size and complexity. First, they have removed the need for giant, expensive, specialized hardware platforms (instead using a large number of small ‘commodity servers’ or even cloud servers). And second, they have also removed the need to structure the data in a given format prior to running analysis. These two technological advancements (and the dozens of other underlying technologies that support them) have unleashed a tremendous number of possibilities when it comes to gaining insight from information that would have previously required millions of dollars worth of hardware and a staff of data experts to process.”
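Michaud’s two advances, horizontal scale on commodity hardware and analyzing data without imposing a format up front (often called schema-on-read), can be sketched in miniature. The toy example below is purely illustrative (the log format and field names are invented); it uses Python’s multiprocessing pool to stand in for a fleet of small commodity servers and parses raw, unstructured lines only at analysis time:

```python
from multiprocessing import Pool
from collections import Counter

# Raw, unstructured records -- no schema was imposed when they were stored.
RAW_LOGS = [
    "2013-11-01 user=alice action=search q='winter coats'",
    "2013-11-01 user=bob action=purchase sku=1234",
    "2013-11-02 user=alice action=purchase sku=1234",
    "2013-11-02 user=carol action=search q='boots'",
]

def map_actions(line):
    """Schema-on-read: parse each raw line only when we analyze it."""
    fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
    return Counter({fields.get("action", "unknown"): 1})

if __name__ == "__main__":
    # A pool of workers stands in for many small commodity servers.
    with Pool(2) as pool:
        partials = pool.map(map_actions, RAW_LOGS)
    totals = sum(partials, Counter())  # the "reduce" step: merge partial counts
    print(totals)
```

This is the same map/reduce shape that frameworks like Hadoop industrialized: each worker parses and counts its share of the raw records, and the partial counts are merged at the end.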

Another thing that makes Analytics 3.0 different is that the potential amount of data that needs to be analyzed is getting enormous. The word “big” is too mild a modifier when it comes to the volume of data now being generated. Walter Bailey reminds us just how “big” Big Data is becoming:

“The increase in data volume is so fast that it is astonishing to find that we create more than 2.5 quintillion bytes of data every single day, according to IBM research. Nor does it stop at this growth rate; we create more data during every new second than in the previous one. This is very interesting, as it means that 90 percent of today’s total data was created in the last two years! This trend will prevail in the future too. Per the normal trend over the last three decades, the data volume doubles every three years or so. … Big data will not only keep increasing in volume; the expected increase is much greater than the present trend. Many people in the world have still not started using the Internet; big data will increase drastically once those people start using newer technologies and the Internet. Meanwhile, the demand for data use among existing users is increasing very rapidly. Smartphones and wireless technologies are fueling the rapid swelling of the world’s big data. This only makes it more fun and challenging for IT engineers to explore the concept of big data and develop new technologies.” [“Understanding The Concept Of Big Data,” CloudTweaks, 25 January 2013]
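Bailey’s figures can be sanity-checked with a little back-of-envelope arithmetic (mine, not from the article): if 90 percent of all data really is under two years old, the implied doubling time is far shorter than the historical three-year doubling he cites, which is exactly his point about acceleration.

```python
import math

# Back-of-envelope check on the growth figures quoted above (illustrative only).
BYTES_PER_DAY = 2.5e18          # "2.5 quintillion bytes ... every single day"
recent_fraction = 0.90          # "90 percent ... created in the last two years"
years = 2

# If 90% of all data is at most 2 years old, the total grew ~10x in 2 years.
growth_factor = 1 / (1 - recent_fraction)
doubling_time = years * math.log(2) / math.log(growth_factor)

print(f"Implied doubling time: {doubling_time:.2f} years")   # ~0.6 years
print(f"Data created per year: {BYTES_PER_DAY * 365:.3e} bytes")
```

A doubling time of roughly seven months, against the three-year historical trend, is what “every second the growth rate is faster than the previous one” looks like in numbers.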

Dr. Barry Devlin, a founder of the data warehousing industry, believes that data comes from three sources: measures, events, and messages. [“Where Does Information Originate?” SmartData Collective, 6 October 2013] He explains:

“Measures and events come from the physical world of machines and show ongoing conditions (e.g., temperature, velocity, location) and changes in conditions (e.g., an acceleration, a button pressed, a call ended). Messages are human-sourced communications in text, voice, image or video that represent something that one person wants to share with another. All of these can and do generate transactions when processed in traditional systems.”
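One way to make Devlin’s taxonomy concrete is to imagine the record shapes each origin produces. The types and field names below are my own hypothetical illustration, not Devlin’s:

```python
from dataclasses import dataclass

# Hypothetical record types for Devlin's three origins of data.

@dataclass
class Measure:            # ongoing condition reported by a machine
    sensor_id: str
    quantity: str         # e.g. "temperature", "velocity", "location"
    value: float

@dataclass
class Event:              # a change in condition
    source_id: str
    kind: str             # e.g. "button_pressed", "call_ended"
    timestamp: float

@dataclass
class Message:            # human-sourced communication
    sender: str
    medium: str           # "text", "voice", "image" or "video"
    body: bytes

# Any of these can generate a transaction when processed downstream.
records = [
    Measure("t-17", "temperature", 21.5),
    Event("btn-3", "button_pressed", 1386000000.0),
    Message("alice", "text", b"running late"),
]
print([type(r).__name__ for r in records])  # ['Measure', 'Event', 'Message']
```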

He asserts that before tools for the Analytics 3.0 toolkit were developed, people simply didn’t collect much of the data that can be captured in each of those areas. As an example, he discusses a traditional business transaction (an event). Historically, companies focused primarily on recording and analyzing the final event. “Before the emergence of big data,” he writes, “we seldom considered in any detail what happened before such transactions occurred.” Today, for the most part, we capture it all. That’s why Big Data has become such a ubiquitous buzzword. Henrik Liliendahl Sørensen breaks down the sources of data into five categories: social data, sensor data, web logs, big transaction data, and big reference data. [“Five Flavors of Big Data,” Liliendahl on Data Quality, 24 September 2013] Here is his brief description of each type of data:

“Social data: The most mentioned type of big data, I guess, is social data, and the opportunity to listen to Twitter streams and Facebook status updates in order to get better customer insight is an often stated business case for analyzing big data. However, everyone who listens to those data will be aware of the tremendous data quality problems in doing that. …

Sensor data: Another often mentioned type of big data is sensor data. … These are somewhat different from social data, with less complex data quality issues, but not entirely free of data quality flaws.

Web logs: Following the clicks from people surfing the Internet is a third type of big data. This kind of big data shares characteristics of both social data and sensor data, as it is human generated like social data but more fact oriented like sensor data.

Big transaction data: Even traditional transaction data in huge volumes is treated as big data, but of course it inherits the same data quality challenges as all transaction data: even though that data is structured, we may have trouble establishing the right relations to the who, what, where and when in the transactions. And that isn’t any easier with large volumes.

Big reference data: When reference data grows big we also meet big complexity. Try for example to build a reference data set with all the valid postal addresses in the world. Several standardizing bodies have a hard time making a common model for that right now.”

Like Liliendahl, Satell notes that much of the data that is collected is of low quality, but that shortcoming can be overcome using Analytics 3.0 techniques. He explains:

“Imagine you had billions of data points all being collected and analyzed in real time and real conditions. That’s the secret to the transformative power of big data. By vastly increasing the data we use, we can incorporate lower quality sources and still be amazingly accurate. What’s more, because we can continue to reevaluate, we can correct errors in initial assessments and make adjustments as facts on the ground change. In an increasingly connected age, with low cost sensors and a central Internet, this is unleashing a world of new possibilities.”
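Satell’s claim, that sheer volume lets low-quality sources still yield accurate answers, is the law of large numbers at work. A minimal sketch with invented numbers: each individual reading is wildly noisy, yet the average over many readings converges on the true value, and re-estimating as new data arrives corrects earlier errors.

```python
import random

random.seed(42)  # deterministic, for the sake of the example

TRUE_VALUE = 100.0  # the quantity we are trying to measure

def noisy_reading():
    """One low-quality data point: a cheap sensor with heavy Gaussian noise."""
    return TRUE_VALUE + random.gauss(0, 25)

def estimate(n):
    """Average n independent low-quality readings."""
    return sum(noisy_reading() for _ in range(n)) / n

for n in (1, 100, 100_000):
    print(f"{n:>7} readings -> estimate {estimate(n):7.2f}")
```

With one reading the estimate can be off by tens of percent; with a hundred thousand, the error shrinks roughly with the square root of the sample size, which is why scale compensates for quality.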

Handfield, in a follow-on article to the one cited above, concludes:

“Analytics 3.0 calls for a new way of thinking about analytics that are embedded in products and services and are communicating back to the enterprise. For example, a logistics company, CH Robinson, is tagging fruit containers with sensors to determine if the fruit is spoiling, allowing them to determine who was responsible for storage at that location and avoiding payment of waste factors if the shipper wasn’t responsible for it. Cement companies are putting sensors in cement, which allows them to determine when it is beginning to dry in transit, and how to re-route the cement to the job site. In such instances, analytics is not just about supplying data (which information providers already do), but also about providing insight and services.” [“Analytics 3.0 and the Impact on Your Business,” International Institute for Analytics, 21 October 2013]
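The embedded-analytics pattern Handfield describes (sensors on fruit containers, with spoilage attributed to whoever held the container at the time) can be sketched roughly as follows. All names, custody legs, and thresholds here are invented for illustration:

```python
# Hypothetical sketch of sensor-based custody attribution: each reading is
# tagged with the leg of the journey, so a temperature breach can be traced
# to the responsible party. Legs and thresholds are invented.

SPOILAGE_TEMP_C = 8.0  # assumed safe ceiling for refrigerated fruit

readings = [
    {"leg": "grower->port",    "temp_c": 4.2},
    {"leg": "grower->port",    "temp_c": 5.1},
    {"leg": "ocean carrier",   "temp_c": 6.0},
    {"leg": "port->warehouse", "temp_c": 11.3},  # breach happens here
    {"leg": "port->warehouse", "temp_c": 12.0},
]

def responsible_legs(readings, limit):
    """Return the custody legs during which the temperature limit was breached."""
    return sorted({r["leg"] for r in readings if r["temp_c"] > limit})

print(responsible_legs(readings, SPOILAGE_TEMP_C))  # ['port->warehouse']
```

The insight, and the service built on it (don’t pay waste claims for a breach that happened in someone else’s custody), comes from the analysis, not from the raw sensor feed itself.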

Big Data that lies fallow and unanalyzed in a database is little better than no data at all. As Handfield notes, the value is not in the data itself but in the insights and services that emerge when it is analyzed. Chuck Rivel notes that Big Data can “guide business decisions in countless ways.” [“A Technical Look at Big Data,” SmartData Collective, 30 July 2013] He adds, “Questions abound about how to make the most of big data — and use it strategically to inform key decisions in your business or organization. While there’s no easy answer, and many companies don’t have the time or expertise to craft and implement a plan, the first step is understanding the tools and technologies behind big data — and their potential to deliver deep insights to your team.” He goes on to explain some of the technologies behind Big Data analytics and, if you are unfamiliar with them, I recommend reading his article.