Forecasting the Future with Big Data

Stephen DeAngelis

June 6, 2013

All of us are curious about what lies ahead of us. We get nervous when our vision is limited. In such circumstances, we proceed cautiously, at best, and fearfully, at worst. It’s why at night we carry flashlights and drive with our headlamps on. It’s no different in the business world. The reason that companies get involved in forecasting and engage in “what if” scenario planning is that they don’t want to be blindsided by the future. Big data is playing a larger role in these processes because such data can be analyzed to discover patterns that make predictions more accurate and timely. You might be surprised at the number of ways that companies are using big data to improve their forecasts about the future.

Take the world of entertainment, for example. When Netflix decided to produce a series of its own, it “chose House of Cards as its first big project. Based on a BBC series, the show stars Kevin Spacey and is directed by David Fincher, and it has quickly become the most watched series ever on Netflix.” [“The ‘Big Data’ Revolution: How Number Crunchers Can Predict Our Lives,” KRWG, 13 April 2013] In the series, Spacey plays a scheming politician from South Carolina plotting his way into the White House. The article continues:

“The success of House of Cards is no accident. Netflix executives knew exactly what their millions of customers were watching; they knew precisely how popular the works of Fincher were, and how many of their customers were fans of Kevin Spacey, and how many people were streaming the British House of Cards. Sifting through that mountain of data, Netflix executives were able to predict that House of Cards would be just what Netflix viewers would want to watch. That kind of decision-making is an example of Big Data: the decade-long explosion of digital information, much of it personal, that has become available to companies and governments.”

The article notes, “This trend in predictions and decisions is the topic of a new book, Big Data: A Revolution That Will Transform How We Live, Work and Think.” One of the stories used in that book involves the retail chain Target. Using algorithms, Target was able to infer which of its customers were likely pregnant, even if the women hadn’t revealed that fact to the company or anyone else. [“How Companies Learn Your Secrets,” by Charles Duhigg, New York Times Magazine, 16 February 2012] Duhigg relates the following story:

“A man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation. ‘My daughter got this in the mail!’ he said. ‘She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?’ The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. ‘I had a talk with my daughter,’ he said. ‘It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.'”

Target’s objective, of course, was to sell products, not reveal secrets, but the story underscores how accurate predictions based on analytics can be. Retail is not the only field in which such predictions are being made. Laura Hazard Owen penned an article about a software program developed by Eric Horvitz, of Microsoft Research, and Kira Radinsky, of the Technion-Israel Institute of Technology, “that analyzes 22 years of New York Times archives, Wikipedia and about 90 other web resources to predict future disease outbreaks, riots and deaths.” [“How two scientists are using the New York Times archives to predict the future,” Gigaom, 1 February 2013] Although their work sounds ominous, Owen reports that their aim is to prevent potentially disastrous events from happening. She continues:

“The new research is the latest in a number of similar initiatives that seek to mine web data to predict all kinds of events. Recorded Future, for instance, analyzes news, blogs and social media to ‘help identify predictive signals’ for a variety of industries, including financial services and defense. Researchers are also using Twitter and Google to track flu outbreaks.”

Horvitz and Radinsky’s research paper is available for download. In the paper, they describe how news about natural disasters, like storms and droughts, can be used to predict (and hopefully prevent) the outbreaks of cholera that often follow such disasters in the developing world. Armed with data about such relationships, governments and non-governmental organizations can help educate the public about the dangers involved and how to mitigate them. Owen reports that Horvitz and Radinsky “outline the advantages that software has over humans in this area.” They include:

  • Learning: Software ‘has the ability to learn patterns from large amounts of data, can monitor numerous information sources, can learn new probabilistic associations over time, and can continue to do real-time monitoring, prediction, and alerting on increases in the likelihoods of forthcoming concerning events.’
  • Tireless researching: Software, with its ‘long tentacles into historical corpora and real-time feeds,’ can dig up data that humans might never find because they’re too focused on ‘knowledge that is easily discovered in studies or available from experts.’
  • Lack of bias: Software can assist ‘when inferences from data run counter to expert expectations,’ or when ‘there is a significantly lower likelihood of an event than expected by experts based on the large set of observations and feeds being considered in an automated manner.’
  • Greater access to news: ‘A system monitoring likelihoods of concerning future events typically will have faster and more comprehensive access to news stories that may seem less important on the surface (e.g., a story about a funeral published in a local newspaper that does not reach the main headlines), but that might provide valuable evidence in the evolution of larger, more important stories (e.g., massive riots).’
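
To make the idea of learning “probabilistic associations” from past news a little more concrete, here is a minimal sketch in Python. It is not the method Horvitz and Radinsky used; it simply compares how often one kind of event follows another with how often that event turns up anyway. The timeline, the event labels, and the function names `follow_rate` and `event_rates` are all invented for illustration.

```python
def follow_rate(timeline, outcome, window, starts):
    """Share of the given periods that are followed by `outcome` within `window` periods."""
    if not starts:
        return 0.0
    hits = 0
    for t in starts:
        upcoming = timeline[t + 1 : t + 1 + window]
        if any(outcome in period for period in upcoming):
            hits += 1
    return hits / len(starts)


def event_rates(timeline, precursor, outcome, window=2):
    """Return (rate after the precursor, baseline rate) for the outcome event."""
    # Only use periods that still have a full look-ahead window.
    usable = list(range(len(timeline) - window))
    after_precursor = [t for t in usable if precursor in timeline[t]]
    conditional = follow_rate(timeline, outcome, window, after_precursor)
    baseline = follow_rate(timeline, outcome, window, usable)
    return conditional, baseline


# Hypothetical weekly timeline: each entry is the set of event types reported that week.
timeline = [
    {"storm"}, {"cholera"}, set(), set(),
    {"drought"}, set(), {"cholera"}, set(),
    set(), set(), {"storm"}, set(),
    {"cholera"}, set(), set(), set(),
]

conditional, baseline = event_rates(timeline, "storm", "cholera", window=2)
print(f"After a storm: {conditional:.2f}, in general: {baseline:.2f}")
```

In this toy data, a cholera report follows a storm far more often than it follows an arbitrary week, which is the kind of gap a monitoring system would flag. A real system would need far more data and would have to account for uneven news coverage.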

Owen reports, “One of the problems that the researchers faced in developing their software model is the fact that tragic events in poor African countries are often not widely reported.” Lack of data is not generally a problem for businesses. Yet all of the arguments Horvitz and Radinsky use to show that big data analytics can offer significant insights about the future apply as well to business as they do to natural disasters. Michael Fitzgerald, for example, reports that “combining sensors with analytics allows companies to spot potential equipment problems before they occur.” [“Sensing the Future Before It Occurs,” MIT Sloan Management Review, 20 December 2012]

In an interview with William Ruh, vice president and general corporate officer of General Electric’s global software headquarters, Fitzgerald discussed “a version of the idea of the Internet of things in which industrial machines connect to other machines and share data that will help companies improve operations.” Ruh calls this the “Industrial Internet.” He told Fitzgerald that GE is interested in the Industrial Internet because it can help the company make machines and systems more intelligent and help its customers run their equipment more effectively and efficiently. Ruh notes, for example, that companies use GE “gas turbines to generate 25 percent of the world’s electricity. A one percent savings in fuel is worth $4.5 billion. A one percent change they can take to the bank and to shareholders.” The collection and analysis of real-time data is one way that GE can help find that one percent improvement. Ruh concludes, “Real-time, no SQL, Hadoop and other kinds of big data technologies are actually very early on. They give us hope that we can manage this data differently than we did before. So the Industrial Internet is economical and feasible at a time when productivity needs to be a focus.”

As Fitzgerald noted earlier, one of the real advantages of real-time data analysis is the ability to predict an imminent failure. Failures not only shut down processes; they often cause collateral damage that compounds the severity of the failure. Ruh asserts that there are so many sensors being embedded into today’s equipment that “the capability and complexity are off the charts, they are the stuff of science fiction, things you see in movies like ‘Minority Report’.”
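
How might sensor data hint at a problem before it becomes a failure? One simple approach, offered here only as an illustrative sketch and not as a description of GE’s software, is to flag any reading that strays far from the recent average. The vibration readings, the window size, the threshold, and the function name `flag_anomalies` are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indexes of readings that deviate sharply from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window : i]
        mu = mean(recent)
        sigma = stdev(recent)
        if sigma == 0:
            # A perfectly flat window gives no scale to measure deviation against.
            continue
        z_score = abs(readings[i] - mu) / sigma
        if z_score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings from a single turbine sensor.
vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 0.9, 1.1, 2.5, 1.0]

suspect = flag_anomalies(vibration)
print("Readings worth a closer look:", suspect)
```

Here only the jump to 2.5 stands out against the readings that came just before it. Production systems combine many sensors and far more sophisticated models, but the basic idea of comparing the present with the recent past is the same.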

Harvard’s Doc Searls believes that people need to have the same capability to monitor data about their body’s health as GE’s clients have for monitoring their machines’ health. If you’re really connected, he writes, you might be gathering “data produced by your Withings scale, your Zeo sleep manager, your Nike+ sportwatch, your Omron blood pressure monitor, your Fitbit Flex wristband, your Moves smartphone app, your Sportline heart rate monitor, your MoodScope log, your Accu-Check blood glucose meter and your workout machine data from the gym.” [“What can people do with data that companies alone can’t?” Doc Searls Weblog, 19 May 2013] The problem, Searls writes, is that data “are silo’d by the companies supplying those devices.” He continues:

“If you had that data, you could correlate weight loss or maintenance to specific workout routines, moods or dietary practices. You could present that data to your insurance company or health care provider to get better rates and services from both. The list goes on, and can get very long.”
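
As a rough illustration of the kind of correlation Searls has in mind, the sketch below lines up two separate logs by week and measures how closely they move together. The logs, their values, and the function name `pearson` are hypothetical; real device exports would first have to be pulled out of their silos and converted to a shared format.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical exports from two different devices, keyed by ISO week.
gym_minutes = {"2013-W14": 0, "2013-W15": 60, "2013-W16": 120,
               "2013-W17": 180, "2013-W18": 240, "2013-W19": 90}
weight_change_kg = {"2013-W14": 0.4, "2013-W15": 0.1, "2013-W16": -0.3,
                    "2013-W17": -0.4, "2013-W18": -0.9}

# Only weeks present in both logs can be compared.
shared_weeks = sorted(set(gym_minutes) & set(weight_change_kg))
r = pearson([gym_minutes[w] for w in shared_weeks],
            [weight_change_kg[w] for w in shared_weeks])
print(f"Correlation over {len(shared_weeks)} shared weeks: {r:.2f}")
```

A strongly negative value in this made-up data would suggest that weeks with more gym time tend to be weeks with more weight lost. Correlation alone does not establish cause, of course, and five weeks of data is far too little to draw conclusions from.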

My suspicion is that one of the things on that long list would be the ability to predict a potentially serious health problem. As we all know, prevention is better than cure. That is the power of data and analytics, and it is why we are likely to see an increasing array of products aimed at helping individuals, governments, and organizations forecast the future with greater accuracy by analyzing data. Martin U. Müller puts it this way: “Forget Big Brother. Companies and countries are discovering that algorithms programmed to scour vast quantities of data can be much more powerful. They can predict your next purchase, forecast car thefts and maybe even help cure cancer.” [“Living by the Numbers: Big Data Knows What Your Future Holds,” ABC News, 18 May 2013] He believes that big data analytics could, in some arenas, actually lead to “the end of chance.” We’ll just have to wait and see.