Dealing with Big Data

Stephen DeAngelis

November 13, 2012

“What search algorithms were to the 1990s,” writes Holly Finn, “big data is today: a game changer.” [“New Gumshoes Go Deep With Data,” Wall Street Journal, 22 October 2012] She describes big data analytics as “statistical analysis on steroids,” only bigger. Big data analytics has to be muscular because, as Finn puts it, businesses are suffering from “data obesity.” I’m not sure I would have used the term “obesity.” The term is pejorative and implies that having less data would be a good thing. Big data is a game changer precisely because it is big, really big. The bigger the better. Finn, however, is complaining less about the size of the data than about the fact that so little of it is turned into useful information (i.e., into “mental muscle”). She continues:

“Big-data technology treats this condition, transforming massive amounts of mismatched information into digestible, sometimes lifesaving, intelligence. But more than just a brutish mining mechanism, big data can be a subtle couples counselor: It repairs the relationship between man and machine. It may even rouse our better, more charitable selves. In a seminal 1960 paper, computer pioneer J.C.R. Licklider observed: ‘About 85% of my “thinking” time was spent getting into a position to think.’ His activities were mostly tedious; his ‘choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.’ He wrote ‘Man-Computer Symbiosis’ as a result, predicting a more hopeful scenario.”

Licklider’s point is well made. Big data can be a burden (at best) and useless (at worst) if technologies and analytic techniques are not used to sort through and make sense of the data. With the correct information teed up, business executives are in a better “position to think” and have more time to do it. As Finn writes, Licklider’s vision “is now coming true.” She continues:

“The rise of cloud-based storage and the falling cost of gathering information is ‘augmenting intelligence,’ enhancing what we and our devices can do together by maximizing what each does best. It isn’t easy. Consider the content being spewed, around the globe, as you read this—digital images, purchase records, GPS signals, social media posts: an estimated 2.5 quintillion bytes of data every day. That’s big. But ‘most of “big data” is a fraud, because it is really “dumb data,”’ Peter Thiel, a co-founder of Palantir, told me. ‘For the most part, we would need something like artificial intelligence to turn the “dumb data” into “smart data,” and the reality is that we are still pretty far from developing that sort of artificial intelligence.’”

Frankly, I’m a bit mystified by Thiel’s comment that “we are still pretty far from developing that sort of artificial intelligence.” If he means that we are far from developing artificial general intelligence (AGI), then I would agree. But a growing number of artificial intelligence (AI) applications, including some being developed by Enterra Solutions, can now turn dumb data into smart data. Finn highlights the bigger challenge: dealing with all sorts of unstructured data, from digital images to social media posts. Brian Bloom writes, “Out of the vast majority of the unstructured data that will be created today, a single piece of it, on close inspection, won’t look very important—an email, a text message, stock price, or even a sensor transmitting an ‘off’ signal. But once these tiny bricks of data are put together, the resulting structure can tell us something quite important.” [“The shifting sands of unstructured data,” IT World Canada, 19 October 2012] He continues:

“The mess of information we’re immersed in represents a great technological challenge—that is, finding a way to give it all meaning. But since it also offers an irresistible power to business—virtual omniscience—the money to make industrial-level data sifting a reality has arrived in sufficient quantity to get it off the drawing board and into the commercial world. But in giving meaning to a mass of unstructured data, a distinction has to be made when we start with our initial question. Our answer could be built out of a million tiny components. Or it could come in the form of one giant, irreducible entity. Both require a very different kind of hardware. And here is where we enter the worlds of ‘massively parallel’ and ‘embarrassingly parallel.’”

Bloom notes that for the second type of processing challenge — the “embarrassingly parallel” problem (i.e., “one that is very simple to divide”) — the development of Hadoop has been a godsend. “Hadoop can sift through endless bales of hay, neatly organizing every needle it finds. Massively scalable and cheap to run on commodity hardware, Hadoop represents something we’ve wanted for years but have only recently been able to use. … Hadoop, the elephant, naturally gets a lot of the attention because of its size. But other platforms have been designed to attack the same datasets with the same ferocity, if not on the same scale.” For more difficult problems that require massively parallel processing, Bloom insists that commodity hardware simply won’t work. He explains:

“To ask very intelligent, very important questions, your processors must be in very close communication. Thus, supercomputers excel at tasks like identifying flaws in a car, or identifying emerging patterns of insurance fraud, says [Steve Conway, research vice-president for high-performance computing at IDC Inc.]. Technologies like Hadoop are useful only if you already know what you’re looking [for], he adds. But when you don’t, by and large, you’ll want a supercomputer.”
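
To make Bloom's distinction concrete, here is a minimal Python sketch of an "embarrassingly parallel" job: counting needles in a haystack by splitting the records into chunks that never need to talk to one another. It is an illustration only, not how Hadoop itself is implemented. The function names (`split`, `count_needles`, `parallel_count`) and the sample data are assumptions made up for this example, and a local thread pool stands in for a cluster of commodity machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Divide the records into roughly equal, independent chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_needles(chunk, needle):
    """Map step: a worker scans only its own chunk and shares nothing."""
    return sum(1 for record in chunk if needle in record)


def parallel_count(records, needle, n_workers=4):
    """Run the map step on every chunk, then reduce by summing the partial counts."""
    chunks = split(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = pool.map(lambda c: count_needles(c, needle), chunks)
        return sum(partial_counts)


# Hypothetical haystack: mostly hay, with three needles hidden in it.
haystack = ["hay"] * 1000 + ["needle in hay"] * 3 + ["hay"] * 500
print(parallel_count(haystack, "needle"))  # prints 3
```

Because no chunk depends on any other, adding workers is trivial. A "massively parallel" problem is the opposite case: each step needs results from its neighbors, so the workers must communicate constantly, and that is where closely coupled processors earn their keep.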

Bloom correctly states that, regardless of what hardware is being used, “the software running on it is becoming smarter.” Finn agrees, “Done right, they make vital connections while freeing up human brain space for more intuitive, interpretive tasks.” She concludes, “Big data may be dumb. But it’s getting smarter.” The reason it is getting smarter, of course, is that smart people are creating the algorithms involved. But users, not just developers, are also going to have to get smarter. Jeanne Harris writes, “The advent of the big data era means that analyzing large, messy, unstructured data is going to increasingly form part of everyone’s work.” [“Data Is Useless Without the Skills to Analyze It,” Harvard Business Review, 13 September 2012] Anders Reinhardt, head of Global Business Intelligence for the VELUX Group, told Harris, “Big data is much more demanding on the user.” She continues:

“Managers and business analysts must be able to apply the principles of scientific experimentation to their business. They must know how to construct intelligent hypotheses. They also need to understand the principles of experimental testing and design, including population selection and sampling, in order to evaluate the validity of data analyses. As randomized testing and experimentation become more commonplace in the financial services, retail and pharmaceutical industries, a background in scientific experimental design will be particularly valued.”

According to Harris, one doesn’t require a crystal ball to predict that “data literacy” is going to be an essential skill for many jobs in the future. She concludes:

“Tomorrow’s leaders need to ensure that their people have these skills, along with the culture, support and accountability to go with it. In addition, they must be comfortable leading organizations in which many employees, not just a handful of IT professionals and PhDs in statistics, are up to their necks in the complexities of analyzing large, unstructured and messy data. … Ensuring that big data creates big value calls for a reskilling effort that is at least as much about fostering a data-driven mindset and analytical culture as it is about adopting new technology. Companies leading the revolution already have an experiment-focused, numerate, data-literate workforce.”

Big data skills will be especially useful for supply chain professionals. Trevor Miles writes, “To be of any value, the data provided by visibility needs to be ‘refined’ by being broken down into ‘specific useful parts’, turning visibility into actionable insight. I don’t mean to diminish the value of visibility because without it there can be no actionable insight. The true value comes from the supply chain orchestration made possible by actionable insight.” [“Data is the new oil: know sooner, act faster,” The 21st Century Supply Chain, 4 June 2012] He continues:

“Knowing the state of something – inventory, capacity, etc. – in the supply chain, which is what visibility provides, is useful, but not valuable. Value is derived from knowing what that state means to your financial and operational metrics, and, perhaps even more importantly, to your projected operational and financial metrics. To achieve this insight you must be able to compare the current state to a desired state and to evaluate if the difference is important, which you can only determine by having a complete representation of your supply chain – BOMs, routings, lead times, etc. – so that you can link cause to effect. … Only if you have an end-to-end representation of your supply chain can you link a tsunami in Japan to your revenue projections for the next 2 quarters. Of course many people can do this given enough time. Doing this quickly – knowing sooner and acting faster – is what brings value.”
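
The cause-to-effect link Miles describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not his method or any vendor's: the bill of materials, the sourcing regions, the revenue figures, and the function name `revenue_at_risk` are all hypothetical, and the sketch ignores routings, lead times, and inventory buffers that a real supply chain model would include.

```python
# Hypothetical bill of materials: product -> components it needs.
BOM = {
    "laptop": ["display", "battery", "chassis"],
    "tablet": ["display", "battery"],
    "dock": ["chassis"],
}

# Hypothetical sourcing map: component -> region that supplies it.
COMPONENT_REGION = {"display": "Japan", "battery": "Korea", "chassis": "Taiwan"}

# Hypothetical projected revenue per product for the next two quarters.
PROJECTED_REVENUE = {"laptop": 500_000, "tablet": 300_000, "dock": 50_000}


def revenue_at_risk(disrupted_region):
    """Trace a regional disruption through the BOM to the revenue projection."""
    # Cause: every component sourced from the disrupted region.
    affected_components = {
        component
        for component, region in COMPONENT_REGION.items()
        if region == disrupted_region
    }
    # Effect: every product that needs at least one affected component.
    affected_products = sorted(
        product
        for product, components in BOM.items()
        if affected_components & set(components)
    )
    at_risk = sum(PROJECTED_REVENUE[product] for product in affected_products)
    return affected_products, at_risk


products, amount = revenue_at_risk("Japan")
print(products, amount)  # prints ['laptop', 'tablet'] 800000
```

Visibility alone would report only that displays from Japan are delayed. The end-to-end representation is what turns that fact into a revenue figure someone can act on.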

As Miles notes above, gaining insight from data is only beneficial if you can and do act on such insights. Knowing something and doing something are quite different. He writes:

“Under certain circumstances decisions can be automated, but only mundane decisions, decisions that make little difference, can be automated. ‘Big’ decisions should always be left to human judgment. And to do that we need to link the cause and effect to the people who need to take action. Hence actionable insight.”

Big data technologies can help with both decision-making situations. Artificial intelligence systems can be used to make the mundane decisions while providing insights that better position decision makers to think. The sine qua non for such decision-making is big data.
