Most articles written about artificial intelligence (AI) focus on machine learning (ML). Nicole Martin, owner of NR Digital Consulting, writes, “When it comes to Big Data, these computer science terms are often used interchangeably, but they are not the same thing.”[1] She asserts machine learning is basically a self-explanatory term. “The important thing to remember with ML,” she writes, “is that it can only output what is input based on the large sets of data it is given. It can only check from what knowledge it has been ‘taught.’ If that information is not available, it cannot create an outcome on its own.” On the other hand, she explains, “AI can create outcomes on its own and do things that only a human could do. ML is a part of what helps AI by taking the data that it has been learned and then the AI takes that information along with past experiences and changes behavior accordingly.” She adds, “They are both crucial to the future of technology.” Chris Meserole (@chrismeserole), a fellow in Foreign Policy at the Brookings Institution, believes AI and ML are so interconnected that trying to separate them is a distinction without a difference. He writes, “Machine learning is now so popular that it has effectively become synonymous with artificial intelligence itself. As a result, it’s not possible to tease out the implications of AI without understanding how machine learning works — as well as how it doesn’t.”[2]
What is machine learning?
Stacie Bogdan, head of the Strategy and Insights team at Leapfrog, provides this straightforward definition of machine learning: “Machine learning is the use of computerized algorithms to analyze large amounts of data; for the machine to learn from this data; and, to make predictions and continually apply learning to new data — all accomplished faster and more efficiently than humanly possible.”[3] Meserole notes, “Early [AI] efforts focused primarily on what’s known as symbolic AI, which tried to teach computers how to reason abstractly. But today the dominant approach by far is machine learning, which relies on statistics instead. … The core insight of machine learning is that much of what we recognize as intelligence hinges on probability rather than reason or logic. If you think about it long enough, this makes sense.” The thing to remember about statistics, however, is that it often provides correlation rather than causation. Kalev Leetaru (@kalevleetaru), a Senior Fellow at the George Washington University Center for Cyber & Homeland Security, writes, “Lost amongst the hype and hyperbole surrounding machine learning today, especially deep learning, is the critical distinction between correlation and causation.”[4] He agrees with Meserole’s description of machine learning results as being nothing more than “statistical patterns encoded into software.” He adds, “We must recognize that those patterns are merely correlations amongst vast reams of data, rather than causative truths or natural laws governing our world.”
While correlations can provide valuable insights, caution is warranted. Correlations can easily lead one to believe in something that is not true. To drive this point home with a bit of humor, a Harvard Law School student and self-proclaimed statistical provocateur named Tyler Vigen (@TylerVigen) started a site called Spurious Correlations. Vigen has shown, for example, that there is an annual correlation between the number of people who have drowned by falling into a swimming pool and the number of films in which Nicolas Cage has appeared and that the divorce rate in Maine correlates to the per capita consumption of margarine in the United States. Leetaru admits pattern recognition has its place. “It is entirely reasonable,” he writes, “to use machine learning algorithms to sift out extraordinarily nuanced patterns in large datasets. Indeed, a very powerful application of machine learning can be around identifying all of the unexpected patterns underlying phenomena of interest in a dataset or to verify that expected patterns exist. Where things go wrong is when we reach beyond these correlations towards implying causation.”
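How easily unrelated quantities can correlate is worth seeing concretely. The sketch below computes a plain Pearson correlation coefficient between two invented yearly series (the names and numbers are made up for illustration, in the spirit of Vigen’s examples); because both merely trend upward over time, the coefficient comes out near 1.0 even though neither causes the other.

```python
# Two hypothetical yearly series, invented for illustration.
# Neither is real data; both simply trend upward over the same years.
ice_cream_sales = [110, 125, 132, 148, 160, 171, 185, 198]
shark_sightings = [12, 14, 15, 17, 18, 20, 22, 24]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(ice_cream_sales, shark_sightings)
print(f"r = {r:.3f}")  # near 1.0 -- yet one series does not cause the other
```

A machine learning model trained on data like this would happily encode the pattern; the inference that one series drives the other is exactly the step Leetaru warns against.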
By far, the type of machine learning algorithm garnering the most attention is the neural network. Data scientist David Fumo (@fumodavi) notes other machine learning algorithms include: Nearest Neighbor; Naive Bayes; Decision Trees; Linear Regression; and Support Vector Machines.[5] He adds, “There [are] some variations of how to define the types of Machine Learning Algorithms but commonly they can be divided into categories according to their purpose and the main categories are the following: Supervised learning; unsupervised learning; semi-supervised learning; and reinforcement learning.” Obviously, applying the right algorithm to the right problem in the right way is critical.
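The simplest of the algorithms Fumo lists, nearest neighbor, also makes a good illustration of supervised learning: the model is “taught” with labeled examples and can only classify new points relative to that data, just as Martin describes. Below is a minimal 1-nearest-neighbor sketch; the training points and labels are invented for illustration.

```python
import math

# Toy labeled training set, invented for illustration: (feature vector, label).
training = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.5, 8.5), "large"),
]

def nearest_neighbor(point, examples):
    """1-nearest-neighbor: return the label of the closest training example."""
    _, label = min(examples, key=lambda ex: math.dist(ex[0], point))
    return label

print(nearest_neighbor((1.1, 0.9), training))  # "small"
print(nearest_neighbor((9.0, 9.0), training))  # "large"
```

Note that the classifier can never produce a label absent from its training data, a small-scale version of Martin’s point that ML “can only check from what knowledge it has been ‘taught.’”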
Machine learning and business
Machine learning has increased in importance in the digital age. Every second of every day oceans of data are being generated and valuable insights are locked inside that data. Machine learning feeds on data — generally speaking, the more the merrier. As noted above, applying the right algorithm to the right problem in the right way is critical to obtaining useful insights. Here’s the rub. Hui Li, a Principal Staff Scientist of Data Science Technologies at SAS, asserts, “Even an experienced data scientist cannot tell which algorithm will perform the best before trying different algorithms.”[6] When faced with the question, “Which algorithm should I use?” Li notes, “The answer to the question varies depending on many factors, including: the size, quality, and nature of data; the available computational time; the urgency of the task; and what you want to do with the data.” She adds, “When presented with a dataset, the first thing to consider is how to obtain results, no matter what those results might look like. Beginners tend to choose algorithms that are easy to implement and can obtain results quickly. This works fine, as long as it is just the first step in the process. Once you obtain some results and become familiar with the data, you may spend more time using more sophisticated algorithms to strengthen your understanding of the data, hence further improving the results.”
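Li’s “try different algorithms” advice can be sketched in a few lines: fit each candidate on training data, score it on held-out data, and keep the winner. The sketch below compares a 1-nearest-neighbor classifier against a majority-class baseline on an invented two-cluster dataset (all data, names, and candidates are assumptions for illustration, not Li’s method).

```python
import math
import random

random.seed(0)

# Invented dataset: two well-separated clusters, one per label.
def make_point(label):
    cx, cy = (1.0, 1.0) if label == "a" else (5.0, 5.0)
    return ((cx + random.uniform(-1, 1), cy + random.uniform(-1, 1)), label)

data = [make_point("a") for _ in range(30)] + [make_point("b") for _ in range(30)]
random.shuffle(data)
train, test = data[:40], data[40:]  # simple holdout split

def knn_predict(point):
    """1-nearest-neighbor over the training set."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], point))
    return label

def majority_predict(point):
    """Baseline: always predict the most common training label."""
    labels = [lbl for _, lbl in train]
    return max(set(labels), key=labels.count)

def accuracy(predict):
    return sum(predict(x) == y for x, y in test) / len(test)

# Try each candidate on held-out data and keep the best scorer.
candidates = {"1-nn": knn_predict, "majority": majority_predict}
scores = {name: accuracy(f) for name, f in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The baseline is the “easy to implement, quick results” first step Li describes; the comparison loop is where more sophisticated candidates would be added as familiarity with the data grows.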
In her article, Li offers a “Cheat Sheet” for selecting algorithms and discusses how the Cheat Sheet is used. More importantly, she explains for what purposes various algorithms can be used. If you are just getting interested in machine learning, I highly recommend reading her entire article. Machine learning applications can be found in almost every industry — from monitoring equipment for potential breakdowns to targeting promising market segments for advertising campaigns. As Meserole concludes, “From autonomous cars to multiplayer games, machine learning algorithms can now approach or exceed human intelligence across a remarkable number of tasks. … The coming decade will see an explosion in applications that combine the ability to recognize what is happening in the world with the ability to move and interact with it. Those applications will transform the global economy and politics in ways we can scarcely imagine today.”
Footnotes
[1] Nicole Martin, “Machine Learning And AI Are Not The Same: Here’s The Difference,” Forbes, 19 March 2019.
[2] Chris Meserole, “What is machine learning?” The Brookings Institution, 4 October 2018.
[3] Stacie Bogdan, “7 Myths About Machine Learning,” The Marketing Insider, 6 February 2019.
[4] Kalev Leetaru, “A Reminder That Machine Learning Is About Correlations Not Causation,” Forbes, 15 January 2019.
[5] David Fumo, “Types of Machine Learning Algorithms You Should Know,” Towards Data Science, 15 June 2017.
[6] Hui Li, “Which machine learning algorithm should I use?” SAS Blogs, 12 April 2017.