Understanding Big Data Beyond Its Size

Stephen DeAngelis

June 1, 2020

When people hear or read the term “big data,” they initially think two things: It’s data and it’s big. What else is there to say? First, let’s talk about what “big” really means. The volume of data being created is unprecedented and will only increase. The Internet of Things (IoT) is going to generate a massive amount of data, especially with the arrival of 5G technology. Big data is so enormous that the term “big” seems anachronistic. How much data is already in motion? Last year (2019), global IP traffic was estimated to have surpassed the 2-zettabyte threshold. The following graphic provides you with some indication of how large a zettabyte really is.
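For a rough sense of scale, the 2-zettabyte figure can be turned into more familiar units with a little arithmetic. The sketch below assumes decimal (SI) units, where one zettabyte is 10^21 bytes; the one-terabyte drive and the even spread across a 365-day year are illustrative assumptions, not figures from the traffic estimate.

```python
# Decimal (SI) units: 1 zettabyte = 10**21 bytes, 1 terabyte = 10**12 bytes.
ZETTABYTE = 10**21
TERABYTE = 10**12

# The 2-zettabyte traffic figure cited in the text.
traffic_bytes = 2 * ZETTABYTE

# Illustrative assumption: a 1-terabyte consumer hard drive as a yardstick.
drives_needed = traffic_bytes // TERABYTE

# Illustrative assumption: traffic spread evenly over a 365-day year.
seconds_per_year = 365 * 24 * 60 * 60
terabytes_per_second = traffic_bytes / seconds_per_year / TERABYTE

print(f"{drives_needed:,} one-terabyte drives")
print(f"about {terabytes_per_second:.1f} TB per second on average")
```

Under these assumptions, 2 zettabytes would fill two billion one-terabyte drives.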

 

 

The many “Vs” of big data

 

Volume is only one of big data’s traits. Over the years, pundits have discussed six other Vs associated with the term. They are:

 

  • Velocity. Velocity refers to how fast data is generated. It’s already being produced faster than many businesses can analyze it, and with the arrival of 5G networks, velocity is only going to increase.
  • Variety. Data is generated in structured, semi-structured, and unstructured ways (see below) and is stored in many forms, such as text, numbers, audio, still image, video, etc.
  • Veracity. Companies want to be able to trust their data. Unfortunately, not all data is accurate. Data can be entered incorrectly. Falsehoods can be published. Deep fakes are gaining ground. This leads to uncertainty, incompleteness, and ambiguity.
  • Value. The World Economic Forum has labeled data a resource whose value ranks alongside natural resources like gold or oil. Some pundits have argued that “value” is the most important of big data’s Vs because other traits are meaningless if businesses can’t derive value from the data they collect. Advanced analytics are required to unlock this value.
  • Vulnerability. Vulnerability primarily addresses the fact that hackers are constantly trying to breach large databases and steal the information they contain. Bank robbers rob banks because that’s where the money is. Hackers hack databases because that’s where the data is. New laws, like the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR), make vulnerability a very serious business concern.
  • Virtue. We’ve all heard the saying, “Virtue is its own reward.” However, when it comes to big data, virtue is a relatively new topic. It refers primarily to how big data is used (i.e., ethical big data). Lisa Morgan (@lisamorgan) writes, “The amount of data being collected about people, companies, and governments is unprecedented. What can be done with that data is downright frightening.”[1]

 

Big data and business

 

Analysts at IDG Connect write, “The Big Data hype isn’t new, but it shows no signs of slowing down. Whether you’re at the start of the big data journey or implementing it with AI teammates, big data plays a vital role in today’s business.”[2] They provide links to numerous articles to help business leaders better understand the topic of big data. Werner Daehn, CEO of rtdi.io, puts an interesting twist on the topic of big data. He writes, “In my opinion, the most important definition of big data is ‘all data which cannot yet be used to generate value’.”[3] He explains, “Here’s an example as to what I mean by that. Purchases are always documented. What isn’t documented, however, is everything else. How did the customer notice the product? Did they see an ad of a specific product? Do customers only skim the product details and buy right away? Or do they meticulously read through technical details and still don’t buy the product?” Another way to state what he is saying is that companies need to understand what data they have and what data they need to gain actionable insights about the business. The data they don’t have may be more important than the data they do have.

 

As mentioned earlier, data comes in structured, semi-structured, and unstructured formats. Gabriela Gavrailova notes, “[Each type of data contains] useful information that you can mine to be used in different projects.”[4] She explains what constitutes each type of data.

 

  • Structured data. “Structured data is fixed-format and frequently numeric in nature. So, in most cases it is something that is handled by machines and not humans. This type of data consists of information already managed by the organization in databases and spreadsheets stored in SQL databases, data lakes and data warehouses.”
  • Unstructured data. “Unstructured data is information that is unorganized and does not fall into a predetermined format because it can be almost anything. For example, it includes data gathered from social media sources and it can be put into text document files held in Hadoop like clusters or NoSQL systems.”
  • Semi-structured data. “Semi-structured data can contain both the forms of data such as web server logs or data from sensors that you have set up. To be precise, it refers to the data that, although has not been classified under a particular repository (database), still contains vital information or tags that segregate individual elements within the data.”
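The difference between the three types is easier to see with a small example. The following Python sketch is only an illustration: the order row, the sensor record, and the customer comment are all invented sample values, and the standard-library csv and json modules stand in for whatever storage systems a company actually uses.

```python
import csv
import io
import json

# Structured: fixed columns, as in a spreadsheet or SQL table (sample values).
structured = io.StringIO("order_id,product,quantity\n1001,widget,3\n")
rows = list(csv.DictReader(structured))

# Semi-structured: no fixed table schema, but keys tag each element (sample values).
semi_structured = '{"sensor": "temp-01", "reading": 21.5, "tags": ["lab", "indoor"]}'
event = json.loads(semi_structured)

# Unstructured: free text with no predetermined format (sample value).
unstructured = "Loved the widget, but shipping took forever!"
words = unstructured.lower().split()

print(rows[0]["product"], rows[0]["quantity"])  # fields addressed by column name
print(event["sensor"], event["reading"])        # fields addressed by tag
print(len(words), "words of free text")         # no fields; needs text analysis
```

The structured and semi-structured records can be queried by name right away, while the free-text comment has to be analyzed before it yields anything comparable.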

 

Gavrailova adds, “Big Data always includes multiple sources and most of the time is from different types, too. So knowing how to integrate all of the tools you need to work with different types is not always an easy task.” As Daehn pointed out, companies need the right data (as opposed to lots of data) to get real value from big data projects. Stephanie Overby (@stephanieoverby) agrees with that assertion. She writes, “Most business leaders have a reasonable understanding of big data, but some significant misunderstandings persist. The first, and perhaps most damaging, is the assumption that all big data has business value.”[5]

 

Todd Wright, head of data management at SAS, told Overby, “The term ‘big data’ leads many to assume that value is derived simply from the sheer amount of data that an organization holds, and the organization that has the most data wins. The true value comes from how an organization can get a broader view of their customer and business by tapping into different and previously unused data sources. That in turns leads to more educated and informed decisions with the use of analytics.” Making better decisions is critical for businesses. Bain analysts Michael C. Mankins and Lori Sherer assert that if you can improve a company’s decision making, you can dramatically improve its bottom line. They explain, “We know from extensive research that decisions matter — a lot. Companies that make better decisions, make them faster and execute them more effectively than rivals nearly always turn in better financial performance. Not surprisingly, companies that employ advanced analytics to improve decision making and execution have the results to show for it.”[6]

 

Gerrit Kazmaier (@gerritkazmaier), Executive Vice President of SAP Analytics, Database, and Data Management at SAP, agrees with Mankins and Sherer. He writes, “Making the right choice requires a company to understand every aspect of its business — in the past, the present, and the future — and to recognize the value of the data available to them and what it tells them about their business. Ultimately, the aim of analytics within the enterprise should therefore not simply be to report on what has been, but to enable everyone at every level of an organization to make decisions with confidence.”[7] Cognitive technologies can help them achieve that goal. At Enterra Solutions®, we define cognitive computing as the inter-combination of semantics and computational intelligence (i.e., machine learning). Semantics, in this case, refers to having a symbolic representation of the knowledge domain’s concepts, interrelationships, and rules, which we model within a technology called a Rule-based Ontology. Our ontology allows cognitive computing systems to learn generalizations, encode learnings as rules, and contextualize numerical values. Our cognitive system, the Enterra Cognitive Core™, is a system that can Sense, Think, Act, and Learn®. Like most cognitive technologies, our system is primarily aimed at helping businesses make better decisions.

 

Concluding thoughts

 

As with any undertaking, corporate leaders need to ensure a business case can be made for investing in big data projects and programs. Overby explains, “Business leaders should understand that having more data from more sources is of little to no value without a plan for how the data will be used and a goal for what they want big data to accomplish. … Do they want to predict customer behavior? Map manufacturing trends? Improve sales with better targeting and messaging? Make better hires? Only then can they create a big data strategy — including people, process, and technology — to achieve those aims.” Because they are flexible and can deal with ambiguous situations, cognitive technologies can help companies achieve their big data objectives across the board.

 

Footnotes
[1] Lisa Morgan, “14 Creepy Ways To Use Big Data,” InformationWeek, 30 October 2015.
[2] Staff, “Everything you need to know about… Big Data,” IDG Connect, 23 March 2020.
[3] Werner Daehn, “What Is Big Data?” E-3 Magazine International, 18 December 2019.
[4] Gabriela Gavrailova, “Big Data: What Is It and How Does It Work?” Business2Community, 10 December 2019.
[5] Stephanie Overby, “How to explain big data in plain English,” The Enterprisers Project, 16 October 2019.
[6] Michael C. Mankins and Lori Sherer, “Creating value through advanced analytics,” Bain Brief, 11 February 2015.
[7] Gerrit Kazmaier, “From Augmented Analytics to Confident Decisions,” Manufacturing Business Technology, 2 May 2019.