Big Data is just Data

Stephen DeAngelis

February 1, 2018

We love to modify our nouns. George Carlin used to point out how our love for modifiers sometimes created oxymoronic phrases like “jumbo shrimp” and “good grief.” It’s no surprise, therefore, that the word “data” has been modified by the adjective “big.” When the Internet connected the world, followed by advances in mobile technology, humans and machines began generating massive amounts of data. Data is data, but so much of it was being generated that it screamed out for a modifier. Exactly who coined the term “Big Data” is unclear, but it stuck.[1] To help explain why Big Data was different from historical data, people talked about three “Vs”: Volume (there is lots of it), Velocity (it is generated at an amazing pace), and Variety (it comes in structured and unstructured forms). Soon a fourth “V” was added: Veracity (not all data is accurate). Then came a fifth “V,” Value (data is the new gold), and finally a sixth, Vulnerability (database breaches occur on a regular basis). With the emergence of the Internet of Things (IoT), the explosion of data is about to get even bigger. In fact, the amount of data is going to be so enormous that “big” just doesn’t seem to be an adequate modifier.

What is Big Data?

If you have ever asked yourself, “What is Big Data?” you are not alone. Michael Kanellos (@mikekanellos) writes, “Everyone seems to have their own definition. To purists, it refers to software for data sets that exceed the capabilities of traditional databases. For a growing number of people, it’s shorthand for predictive analytics. To others, it just means a really staggering amount of 1s and 0s. The problem is that the term is too general.”[2] To refine the discussion, Kanellos suggests there are five different types of Big Data. They are:

Big Data. For Kanellos, big data involves big problems. “These are the classic predictive analytics problems,” he writes, “where you want to unearth trends or push the boundaries of scientific knowledge by mining mind-boggling amounts of data.”

Fast Data. Not all problems are earth-changing problems. Sometimes you are just looking for quick insights. Kanellos explains, “Fast Data sets are still large, but the value revolves around being able to deliver a good enough answer now: a somewhat accurate traffic forecast in near real-time is better than a perfect analysis an hour from now.”

Dark Data. You’ve probably heard about the dark web. It’s where nefarious activity takes place, and it is generally inaccessible to most users. Not all dark data, however, is nefarious. Kanellos explains, “Dark Data is information you have, but can’t easily access. Gartner and IDC estimate that approximately 80% of data is unstructured data and more is likely on the way.”

Lost Data. Nobody likes losing things, but Kanellos asserts that much lost data is not technically lost; it simply isn’t leveraged. He explains, “This is the information from manufacturing equipment, chemical boilers, industrial machinery and the other things you find inside of commercial buildings and industrial plants. It’s not technically lost. The problem is that it is often landlocked in operational systems. McKinsey & Co estimate that an offshore oil rig might have 30,000 sensors, but it only uses 1% of that data for decision making.” If data really is the new gold, that’s a lot of value being left in the sluice box.

New Data. Kanellos explains new data is desired data. He writes, “New Data consists of information we could get, and want to get, but likely aren’t harvesting now.” As sensors and analytics continue to improve, we are likely to obtain much more new data in the years ahead.

Capturing Big Data’s Value

As Kanellos observes, big data is not just about the size of databases but about obtaining value from data using advanced analytics. Data lying fallow in a database is of no benefit to anyone. As Thomas H. Davenport (@tdav), a Professor in Management and Information Technology at Babson College, notes, “You don’t need a lot of data to be more successful.”[3] But you do need to gain insights by analyzing the data you have. Cynthia Harvey explains, “For most organizations, the primary purpose in launching a big data initiative is to analyze that data in order to improve business outcomes.”[4] She continues, “The way that organizations generate those insights is through the use of analytics software. Vendors use a lot of different terms, such as data mining, business intelligence, cognitive computing, machine learning and predictive analytics, to describe their big data analytics solutions.” The types of analytics involved, she notes, can be separated into four broad categories. They are:

Descriptive analytics. “This is the most basic form of data analysis. It answers the question, ‘What happened?’ Nearly every organization performs some kind of descriptive analytics when it puts together its regular weekly, monthly, quarterly and annual reports.”

Diagnostic analytics. “Once an organization understands what happened, the next big question is ‘Why?’ This is where diagnostic analytics tools come in. They help business analysts understand the reasons behind a particular phenomenon, such as a drop in sales or an increase in costs.”

Predictive analytics. “Organizations don’t just want to learn lessons from the past, they also need to know what’s going to happen next. That’s the purview of predictive analytics. Predictive analytics solutions often use artificial intelligence or machine learning technology to forecast future events based on historical data. Many organizations are currently investigating predictive analytics and beginning to put it into production.”

Prescriptive analytics. “The most advanced analytics tools not only tell organizations what will happen next, they also offer advice on what to do about it. They use sophisticated models and machine learning algorithms to anticipate the results of various actions. Vendors are still in the process of developing this technology, and most enterprises have not yet begun using this level of big data analytics in their operations.”
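
To make the four categories concrete, here is a minimal Python sketch that asks each question of the same tiny data set. Everything in it is an assumption made for illustration: the monthly figures are invented, the function names are hypothetical, and a straight-line fit stands in for the far more sophisticated models Harvey describes. It is a toy, not a description of how any vendor's analytics product works.

```python
from statistics import mean

# Hypothetical monthly figures, invented purely for illustration.
ad_spend = [10, 12, 14, 9, 8, 7]       # thousands of dollars
sales = [100, 118, 141, 92, 79, 70]    # thousands of dollars


def descriptive(values):
    """Descriptive: what happened? Summarize the history."""
    return {"total": sum(values), "average": mean(values), "latest": values[-1]}


def correlation(xs, ys):
    """Diagnostic: why? Pearson correlation between two series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict(slope, intercept, x):
    """Predictive: what happens next, given a planned input x?"""
    return slope * x + intercept


def prescribe(slope, intercept, options):
    """Prescriptive: which option has the best predicted sales net of spend?"""
    return max(options, key=lambda spend: predict(slope, intercept, spend) - spend)


summary = descriptive(sales)                      # what happened
strength = correlation(ad_spend, sales)           # how closely sales track ad spend
slope, intercept = fit_line(ad_spend, sales)
forecast = predict(slope, intercept, 11)          # expected sales at a spend of 11
best_spend = prescribe(slope, intercept, [8, 10, 12])
```

Each step builds on the one before it: the summary describes the past, the correlation hints at a cause, the fitted line turns that relationship into a forecast, and the last function uses the forecast to choose among actions. A strong correlation does not prove that ad spend drives sales, which is one reason real diagnostic and prescriptive work requires more than this sketch.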

Summary

Denis Kaminskiy, CEO at Arcus Global, writes, “‘Big’ data is still data of course.”[5] He adds, “BIG DATA, is not just ‘more’ data. It is so much data, that is so mixed and unstructured, and is accumulating so rapidly, that traditional techniques and methodologies including ‘normal’ software do not really work (like Excel, Crystal reports or similar).” What’s required to handle big data is advanced analytics, but not all businesses are taking advantage of these capabilities. “Everybody is abuzz about big data and the opportunities it presents for businesses,” writes Sarah Rubenoff. “Think about the power of analytics and the potential of AI. That said, few organizations are truly reaping the benefits of big data as many are overwhelmed by its sheer size.”[6] A good cognitive computing system can help companies leverage their data so they can remain profitable in the digital era.

Footnotes
[1] Steve Lohr, “The Origins of ‘Big Data’: An Etymological Detective Story,” The New York Times, 1 February 2013.
[2] Michael Kanellos, “The Five Different Types of Big Data,” Forbes, 11 March 2016.
[3] Thomas H. Davenport, “Even Small Data Can Improve Your Organization’s Judgment,” Harvard Business Review, 26 March 2012.
[4] Cynthia Harvey, “Big Data,” Datamation, 30 May 2017.
[5] Denis Kaminskiy, “What’s the Difference Between ‘Big Data’ and ‘Data’?” Digital Leaders, 4 December 2017.
[6] Sarah Rubenoff, “Big Data: Taking Advantage of the Opportunity,” Inside Big Data, 19 October 2017.
