Advanced Analytics: Looking for Better Insights

Stephen DeAngelis

November 9, 2016

Cause-and-effect relationships are important for understanding the world around us. Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf assert, “An advantage of having knowledge about causal relationships rather than about statistical associations is that the former enables prediction of the effects of actions that perturb the observed system.”[1] That’s fancy talk meaning it’s good to know what reaction you can expect from a given action. Advanced analytics can help companies move beyond statistical correlations to discover important insights that can improve corporate decision making. To demonstrate why correlation is often an insufficient basis on which to make decisions, Tyler Vigen (@TylerVigen), a Harvard law student, started a site called Spurious Correlations. Vigen has shown, for example, that there is an annual correlation between the number of people who have drowned by falling into a swimming pool and the number of films in which Nicolas Cage has appeared, and that the divorce rate in Maine correlates with the per capita consumption of margarine in the United States. Vigen provides a few other examples in the following video.
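
For readers who like to see the arithmetic, here is a minimal Python sketch of one way such coincidences can arise: compare enough unrelated series and some of them will line up by chance. Everything in it is an illustrative assumption rather than Vigen's actual method or data: the series are pure random noise, and the function names, the ten "years" of data, and the 5,000 candidate series are all made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_years=10, n_candidates=5000, seed=0):
    """Search unrelated random series for the one that best tracks a target.

    Hypothetical setup: the target and every candidate are independent
    noise, so any correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"Strongest correlation found among unrelated series: {r:+.2f}")
```

With only ten data points per series and thousands of candidates to choose from, the best match is typically a very strong correlation, even though nothing in the simulation causes anything else.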

Yanir Seroussi (@yanirseroussi) observes, “An understanding of cause and effect is something that is not unique to humans. For example, the many videos of cats knocking things off tables appear to exemplify experimentation by animals. If you are not familiar with such videos, it can easily be fixed. The thing to notice is that cats appear genuinely curious about what happens when they push an object. And they tend to repeat the experiment to verify that if you push something off, it falls to the ground.”[2] Seroussi included the following video to demonstrate his point.

Seroussi notes, “It is surprisingly hard to define causality. Just like cats, we all have an intuitive sense of what causality is, but things get complicated on deeper inspection. For example, few people would disagree with the statement that smoking causes cancer. But does it cause cancer immediately? Would smoking a few cigarettes today and never again cause cancer? Do all smokers develop cancer eventually? What about light smokers who live in areas with heavy air pollution?” That’s where advanced analytics come into the picture. Cognitive computing systems, like the Enterra Enterprise Cognitive System™ (ECS) — a system that can Sense, Think, Act, and Learn® — can analyze many more variables than previous systems in an attempt to go beyond correlation to causality. To underscore his point that determining causality is difficult, Seroussi cites Samantha Kleinberg, who wrote the following passage in her book entitled Why: A Guide to Finding and Using Causes:

“The question often boils down to whether we should see causes as a fundamental building block or force of the world (that can’t be further reduced to any other laws), or if this structure is something we impose. As with nearly every facet of causality, there is disagreement on this point (and even disagreement about whether particular theories are compatible with this notion, which is called causal realism). Some have felt that causes are so hard to find as for the search to be hopeless and, further, that once we have some physical laws, those are more useful than causes anyway. That is, ‘causes’ may be a mere shorthand for things like triggers, pushes, repels, prevents, and so on, rather than a fundamental notion. It is somewhat surprising, given how central the idea of causality is to our daily lives, but there is simply no unified philosophical theory of what causes are, and no single foolproof computational method for finding them with absolute certainty. What makes this even more challenging is that, depending on one’s definition of causality, different factors may be identified as causes in the same situation, and it may not be clear what the ground truth is.”

Adam Kelleher (@akelleh), Principal Data Scientist at BuzzFeed, agrees that causality is a difficult goal to achieve. He asks, “We’ve all heard in school that ‘correlation does not imply causation,’ but what does imply causation?!”[3] He goes on to note that advanced analytics are helping companies get beyond correlations to causality. He continues:

“So what is causality good for? Anytime you decide to take an action, in a business context or otherwise, you’re making some assumptions about how the world operates. That is, you’re making assumptions about the causal effects of possible actions. … The more I read and talked to people about the subject of causality, the more I realized the poor state of common knowledge on the subject. … The term ‘causality’ has a nice intuitive definition, but has eluded being well-defined for decades. … It turns out that if you don’t include hidden common causes in your model, you’ll estimate causal effects incorrectly. This raises a question: can we possibly hope to include all of the hidden common causes? What other alternative is there, if this approach fails?”

The very fact that some causal effects may be hidden implies some may be missed. As noted earlier, the value of a cognitive computing system is that it can ingest, integrate, and analyze many more variables than previous systems, which means hidden effects are much more likely to be discovered. Does the fact that causality is difficult to establish make correlative insights useless? Kelleher cites another aphorism, “There is no correlation without causation.” In other words, even if you don’t discover causative factors, some correlations are nevertheless useful.
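
Kelleher's point about hidden common causes can be made concrete with a small simulation. The Python sketch below is a toy model under stated assumptions, not Kelleher's own code or any particular product's method: a hidden factor z drives both x and y, x has no effect on y at all, and the function names, coefficients, and sample size are hypothetical choices made for illustration.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


def residuals(ys, zs):
    """What remains of ys after removing its linear dependence on zs."""
    b = slope(zs, ys)
    mean_y = sum(ys) / len(ys)
    mean_z = sum(zs) / len(zs)
    return [(y - mean_y) - b * (z - mean_z) for y, z in zip(ys, zs)]


def simulate_hidden_common_cause(n=10000, seed=0):
    """Return (naive, adjusted) estimates of the effect of x on y.

    Assumed toy model: z causes both x and y; x never enters the
    equation for y, so the true causal effect of x on y is zero.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * zi + rng.gauss(0, 1) for zi in z]
    # Naive estimate: regress y on x and ignore the hidden cause.
    naive = slope(x, y)
    # Adjusted estimate: remove z's influence from both, then regress.
    adjusted = slope(residuals(x, z), residuals(y, z))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_hidden_common_cause()
    print(f"Effect of x on y, ignoring z:   {naive:+.2f}")
    print(f"Effect of x on y, adjusting z:  {adjusted:+.2f}")
```

In this setup the naive estimate should land near 1 even though the true effect is zero, while the estimate that accounts for z should land near zero. The catch is the one Kelleher raises: the adjustment is only possible when the common cause has actually been measured.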

Eric Siegel (@predictanalytic), founder of Predictive Analytics World and Text Analytics World, notes, “Data is the world’s most potent, flourishing unnatural resource. Accumulated in large part as the byproduct of routine tasks, it is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is inherently predictive. Thus begins a gold rush to dig up insightful gems. Does crime increase after a sporting event? Do online daters more consistently rated as attractive receive less interest? Do vegetarians miss fewer flights? Does your e-mail address reveal your intentions? Yes, yes, yes, and yes!”[4] Even though some correlations may prove to have predictive qualities, Siegel warns, “The dilemma is, as it is often said, correlation does not imply causation. The discovery of a predictive relationship between A and B does not mean one causes the other, not even indirectly. No way, no how.” He concludes:

“But do not fret. When applying predictive analytics, even though we generally don’t have firm knowledge about causation, we often don’t necessarily care. For many projects, the value comes from prediction, with only an avocational interest in understanding the world and figuring out what makes it tick. The freak show of surprising discoveries delivers predictive value even when it does little to explain itself.”

When a correlation appears to have predictive value, it can often lead to deeper investigation in an effort to find causality. Only artificial intelligence has the power to take on that kind of deeper investigation. For many purposes, however, finding correlations with predictive value is good enough. Since cognitive computing systems employ machine learning, they can continue to verify or disprove the predictive value of correlative insights.

Footnotes
[1] Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf, “Distinguishing cause from effect using observational data: methods and benchmarks,” arXiv, 11 December 2014.
[2] Yanir Seroussi, “Why You Should Stop Worrying about Deep Learning and Deepen Your Understanding of Causality Instead,” Yanir Seroussi Blog, 14 February 2016.
[3] Adam Kelleher, “If Correlation Doesn’t Imply Causation, Then What Does?” freeCodeCamp, 27 June 2016.
[4] Eric Siegel, “9 Bizarre and Surprising Insights from Data Science,” Scientific American, 21 September 2016.