Good Data, Bad Data, Big Data

Stephen DeAngelis

September 4, 2018

We live in the Information Age and the era of big data. Each of us creates data whenever we make a telephone call, use the Internet, make a purchase with a credit or debit card, or use a merchant’s loyalty program. Some people believe all this data gathering is an invasion of privacy, but data can be used to help companies provide us with better service and more personalized products. In order to derive benefit from big data, organizations or individuals must have access to the right data. Kayla Matthews (@KaylaEMatthews) asserts, “Accurate and reliable data can bring context to research studies, help people understand trends, aid business managers in knowing what’s working well for achieving company goals and much more. However, not all data is as beneficial as it seems at first. Bad data can negate all the positive factors of trustworthy information.”[1] At the dawn of the Information Age, most organizations believed the more data they could gather, store, and analyze the better, but priorities and strategies have changed. A few years ago, Joab Jackson (@Joab_Jackson) wrote, “Companies are spending billions on tools and engineering to analyze big data, though many are hampered by one little problem: they still don’t know what to do with all the data they collect.”[2]

The Challenge of Bad Data

Bad data is worse than having no data. Seth Rao, CEO of FirstEigen, reports, “Boston Consulting Group identified poor quality of Big Data as that ‘horseshoe nail’ that could lose wars. It impacts as much as 25% of the full potential when making decisions in marketing, bad-debt reduction, pricing, etc. Paying attention to that little thing can literally make you millions.”[3] The horseshoe nail refers to a proverb often attributed to Benjamin Franklin. It goes like this:

“For the want of a nail the shoe was lost,
For the want of a shoe the horse was lost,
For the want of a horse the rider was lost,
For the want of a rider the battle was lost,
For the want of a battle the kingdom was lost,
And all for the want of a horseshoe-nail.”

I’m not sure the horseshoe nail is a fitting metaphor for bad data. After all, the nail in the proverb was missing, whereas bad data is all too often present in large datasets. Rao observes, “Poor quality of Big Data results in compliance failures, manual rework cost to fix errors, inaccurate insights, failed initiatives and lost opportunity.” Matthews adds, “Sometimes there are glaringly apparent imperfections in data that IT decision-makers spot right away. … However, there are other telltale signs of bad data that aren’t always evident by visual means alone.” Because datasets in the big data era are so large, it should come as no surprise that bad data is captured along with good data. Matthews notes, “Data scientists, marketing managers and other people working with data aren’t always honest about the limitations of data — and there may be gaps in the way it’s managed that cause inaccuracy.”

One of the oft-cited benefits of data analysis is better decision-making. Bain analysts Michael C. Mankins and Lori Sherer insist decision-making is one of the most important aspects of any business. “The best way to understand any company’s operations,” they write, “is to view them as a series of decisions.”[4] They add, “Companies that make better decisions, make them faster and execute them more effectively than rivals nearly always turn in better financial performance.” The opposite is also true. Companies that use bad data to make bad decisions are likely to suffer serious consequences. Matthews explains, “If decision-makers put too much emphasis on flawed data, they may make mistakes and feel less confident about using data to educate their conclusions in the future. A 2016 survey of CEOs found 84 percent of them felt concerned about the quality of data they used while making decisions. And they have valid reasons for feeling wary — bad data could cause financial repercussions if business leaders put too much trust in material that’s ultimately lacking.”

Finding and Using Good Data

Rao asserts, “You belong to an exclusive group of wise executives if you realize the importance of Big Data Quality from the very beginning.” Good data is quality data. David A. Teich (@Teich_Comm), principal consultant at Teich Communications, asserts, “Data quality is a core part of data management — and of making the results of analytics applications believable.”[5] He continues:

“Catalog your existing analytics applications and look at the data that they use and create. Then, consider how to increase the accuracy of that data to make it usable in machine learning and deep learning applications without concerns about its quality and consistency. The benefits of doing so are twofold. First, resources spent on data quality will help improve data throughout your information infrastructure, not just in machine learning applications. Second, good data quality is critical for the increasingly regulated data environment. The European Union’s General Data Protection Regulation, popularly known as GDPR, is the most visible aspect of the growing need to better understand, secure, track and control corporate data.”

Analysts from Nektar explain there are five factors that go into assuring data is of high quality: completeness, consistency, accuracy, validity, and timeliness.[6] They conclude, “High quality data is determined by optimizing the completeness, consistency, accuracy, validity, and timeliness of the data collected. By following the best practices of ensuring high quality data, companies can improve their operational processes and organizational visibility through informed, data-driven decisions.”
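
To make those five factors a little more concrete, here is a minimal Python sketch of how each one might be scored on a small customer table. It is only an illustration, not Nektar's method: the field names, the sample records, the trusted reference list, the rough email pattern, the two-letter country convention, and the one-year freshness window are all assumptions made up for this example.

```python
import re
from datetime import date

# Hypothetical required fields for a customer record (an assumption for this sketch).
REQUIRED_FIELDS = ("customer_id", "email", "country", "updated")
# Deliberately rough email shape check; real validation rules would be stricter.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def share(flags):
    """Fraction of True values in a sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


def quality_report(records, reference, today, max_age_days=365):
    """Score a list of record dicts from 0.0 to 1.0 on each of the five factors."""
    return {
        # Completeness: required fields that are present and non-empty.
        "completeness": share(
            r.get(f) not in (None, "") for r in records for f in REQUIRED_FIELDS
        ),
        # Consistency: country follows one convention (two uppercase letters).
        "consistency": share(
            bool(re.fullmatch(r"[A-Z]{2}", r.get("country") or "")) for r in records
        ),
        # Accuracy: country agrees with a trusted reference keyed by customer_id.
        "accuracy": share(
            r.get("country") == reference.get(r.get("customer_id")) for r in records
        ),
        # Validity: email has a plausible shape.
        "validity": share(
            bool(EMAIL_PATTERN.fullmatch(r.get("email") or "")) for r in records
        ),
        # Timeliness: record was updated within the allowed window.
        "timeliness": share(
            r.get("updated") is not None
            and (today - r["updated"]).days <= max_age_days
            for r in records
        ),
    }


# Made-up sample data, including a few deliberately bad entries.
records = [
    {"customer_id": 1, "email": "ana@example.com", "country": "US", "updated": date(2018, 8, 1)},
    {"customer_id": 2, "email": "", "country": "usa", "updated": date(2015, 1, 15)},
    {"customer_id": 3, "email": "bob-at-example.com", "country": "DE", "updated": date(2018, 6, 30)},
    {"customer_id": 4, "email": "cy@example.com", "country": None, "updated": date(2018, 7, 4)},
]
# Hypothetical trusted source of each customer's country.
reference = {1: "US", 2: "US", 3: "FR", 4: "GB"}

if __name__ == "__main__":
    for factor, score in quality_report(records, reference, today=date(2018, 9, 4)).items():
        print(f"{factor}: {score:.0%}")
```

Note that accuracy is the only factor that cannot be judged from the dataset alone; it needs an outside source that is trusted to be right, which is one reason bad data can hide in plain sight.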

Summary

Matthews concludes, “When business leaders blindly trust data — especially when making decisions — they inevitably set the stage for problems.” Making sure the data used to enhance organizational decision-making is of the highest quality is well worth the effort and cost. Collecting, storing, and analyzing the right data has other benefits as well. Elizabeth Anderson explains, “Data storage isn’t free and the data collection grows every year and so does the cost. Additionally, it’s a big challenge to search and analyze vast quantity of data and also, requires resources in large numbers. This concludes that instead of hoarding data only collect what is actually required and makes sense in terms of business.”[7]

Footnotes
[1] Kayla Matthews, “How to spot bad data, and know the limitations when it’s good,” Information Management, 24 July 2018.
[2] Joab Jackson, “Intel reveals big data’s dirty little secret,” PCWorld, 28 August 2015.
[3] Seth Rao, “What is the ‘Horseshoe Nail’ of Big Data?” Information Management, 23 March 2016.
[4] Michael C. Mankins and Lori Sherer, “Creating value through advanced analytics,” Bain Brief, 11 February 2015.
[5] David A. Teich, “Good data quality for machine learning is an analytics must,” TechTarget, 19 July 2018.
[6] Staff, “5 Factors of High Quality Data & How They Affect Business Decisions,” Nektar.
[7] Elizabeth Anderson, “5 Signs You Misunderstand Big Data,” SmartData Collective, 6 June 2016.