Is Reinforcement Learning the Future of Artificial Intelligence?

Stephen DeAngelis

October 12, 2021

Artificial intelligence (AI) is a controversial subject, made all the more controversial by science fiction writers who often depict dystopian futures in which humans are hunted to extinction by artificial intelligence systems. Even luminaries like Stephen Hawking and Elon Musk have issued warnings about the dangers of artificial intelligence. The type of AI generating these concerns, artificial general intelligence (AGI), doesn’t yet exist. However, scientists at DeepMind believe it’s only a matter of time. And the road to a sentient machine, they assert, will be paved by reinforcement (or reward maximization) learning. Software engineer and tech analyst Ben Dickson (@bendee983) elaborates:

“In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated mechanisms and technologies to replicate vision, language, reasoning, motor skills, and other abilities associated with intelligent life. While these efforts have resulted in AI systems that can efficiently solve specific problems in limited environments, they fall short of developing the kind of general intelligence seen in humans and animals. In a new paper submitted to the peer-reviewed Artificial Intelligence journal, scientists at U.K.-based AI lab DeepMind argue that intelligence and its associated abilities will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.”[1]

Reinforcement learning is a powerful machine learning technique. Whether or not it will lead to a functional AGI system is debatable. What’s not debatable is that reinforcement learning is going to play a significant role in our future. Tech writer Apoorva Bellapu writes, “Reinforcement learning makes use of algorithms that do not rely only on historical data sets, to learn to make a prediction or perform a task. Just like we humans learn using trial and error, these algorithms also do the same. In addition to accelerating as well as improving the design, reinforcement learning has grabbed attention for its wide application in a range of areas — time-series forecasting in highly dynamic conditions, solving complex logistics problems, and coming up with recommendations based on the behaviors and preferences, to name a few.”[2]

The Basics of Reinforcement Learning

Kathryn Hume (@HumeKathryn), interim Head of Borealis AI, and Matthew E. Taylor, an Associate Professor of Computing Science at the University of Alberta, explain, “Reinforcement learning [is] a mature machine learning technology that’s good at optimizing tasks. To do so, an agent takes a series of actions over time, and each action is informed by the outcome of the previous ones. Put simply, it works by trying different approaches and latching onto — reinforcing — the ones that seem to work better than the others. With enough trials, you can reinforce your way to beating your current best approach and discover a new best way to accomplish your task.”[3] Despite its proven effectiveness, Hume and Taylor note, “Reinforcement learning is mostly used in academia and niche areas like video games and robotics. Companies such as Netflix, Spotify, and Google have started using it, but most businesses lag behind. Yet opportunities are everywhere. In fact, any time you have to make decisions in sequence — what AI practitioners call sequential decision tasks — there’s a chance to deploy reinforcement learning.”
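To make the idea of "trying different approaches and latching onto the ones that seem to work better" concrete, here is a minimal sketch in Python of an epsilon-greedy bandit, one of the simplest trial-and-error learners. It is an illustration only, not anything Hume and Taylor describe: the function name `run_bandit`, the three made-up success rates, and the settings for `trials`, `epsilon`, and `seed` are all assumptions chosen for the example.

```python
import random

def run_bandit(true_success_rates, trials=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a few candidate approaches.

    true_success_rates are hypothetical; the learner never sees them
    directly and only observes a win (1.0) or a loss (0.0) per trial.
    """
    rng = random.Random(seed)
    n = len(true_success_rates)
    counts = [0] * n        # how often each approach has been tried
    estimates = [0.0] * n   # running average reward per approach
    for _ in range(trials):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try a random approach
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit the current best
        reward = 1.0 if rng.random() < true_success_rates[arm] else 0.0
        counts[arm] += 1
        # Reinforce: move the running average toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("approach judged best:", best)
print("times each approach was tried:", counts)
```

With enough trials the learner should spend most of its effort on the approach with the highest underlying success rate, while the small exploration rate keeps it checking the alternatives.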

Dickson points out that reinforcement learning plays a particularly important role in robotics. “At the heart of most robotics applications,” he writes, “is reinforcement learning — a branch of machine learning that is based on actions, states, and rewards. A reinforcement learning agent is given a set of actions that it can apply to its environment to obtain rewards or reach a certain goal. These actions create changes to the state of the agent and the environment. The RL agent receives rewards based on how its actions bring it closer to its goal. RL agents usually start by knowing nothing about their environment and selecting random actions. As they gradually receive feedback from their environment, they learn sequences of actions that can maximize their rewards.”[4] Dickson goes on to note, “This scheme is used not only in robotics, but in many other applications such as self-driving cars and content recommendation.”
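The loop of actions, states, and rewards that Dickson describes can be sketched with tabular Q-learning, a standard textbook reinforcement learning method. The example below is a toy under stated assumptions, not the approach used in any of the robotics work he discusses: the five-cell corridor, the reward of 1 at the goal, the helper names `step` and `train`, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical choices made for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor with positions 0..4; position 4 is the goal
ACTIONS = (-1, +1)    # action 0 steps left, action 1 steps right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent starts knowing nothing and acts randomly."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random some of the time, and whenever there is a tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback: nudge the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = train()
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

In this toy setting the learned policy should settle on stepping right in every non-goal position, since that is the sequence of actions that reaches the reward soonest. Real applications replace the table with a function approximator and the corridor with a far richer environment.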

McKinsey & Company analysts Jacomo Corbo, Oliver Fleming, and Nicolas Hohn note that reinforcement learning was used in advance of the 2021 America’s Cup yacht races to improve boat design and performance. They explain, “To remain competitive, sailing teams in the America’s Cup contest, like all businesses, must push the boundaries of what is possible. They also face similar constraints, including a steep development curve and a small window of opportunity, meaning teams can pursue only one or two big experiments to up their performance in the sport’s most important competition. For the 2021 edition of the America’s Cup, reigning champion Emirates Team New Zealand ventured that reinforcement learning, an advanced AI technique, could optimize its design process. The technique delivered, enabling the team to test exponentially more boat designs and achieve a performance advantage that helped it secure its fourth Cup victory.”[5]

Business Applications of Reinforcement Learning

Most companies aren’t playing games, building robots, or designing sailing vessels. Nevertheless, experts say reinforcement learning can help in more traditional business environments as well. The McKinsey analysts note, “Besides accelerating and improving design, reinforcement learning is increasingly being incorporated into a broad range of complex applications: recommending products in systems where customer behaviors and preferences change rapidly; time-series forecasting in highly dynamic conditions; solving complex logistics problems that combine packing, routing, and scheduling; and even accelerating clinical trials and impact analysis of economic and health policies on consumers and patients.” Paul Mah (@paulmah), Editor of DSAITrends, adds, “In a world where there is often no ‘right’ answer, reinforcement learning could be the next big thing in business.”[6] Mah goes on to explain:

“Because reinforcement learning systems figure things out through trial and error, it works best in situations where an action or sequence of events is rapidly established, and feedback is obtained quickly to determine the next course of action — there is no need for reams of historical data for reinforcement learning to crunch through. A stock market algorithm that can make hundreds of actions per day is hence an optimal use case for reinforcement learning while optimizing customer lifetime value over years is not. It is worth noting that reinforcement learning does not work well with ambiguity but is superb at optimization tasks using established metrics in the form of inputs, actions, and rewards. This makes reinforcement learning ideal for the automation of processes or for managing dense, data-generating business processes.”

McKinsey analysts insist the COVID-19 pandemic has brought the value of reinforcement learning for businesses to the forefront. They explain, “Whereas once retailers could reasonably expect that past consumer behaviors would indicate future preferences, they now operate in a world where consumer purchase patterns and preferences evolve rapidly — all the more so as the COVID-19 pandemic repeatedly redefines life. Manufacturers and consumer-packaged-goods companies are under pressure to build dynamic supply chains that account for climate, political, and societal shifts anywhere in the world at a moment’s notice. Each of these challenges represents a complex and highly dynamic optimization problem, which, with the right data and feedback loops, is well suited for solving with reinforcement learning.”

Concluding Thoughts

The McKinsey analysts conclude, “Executives who today understand the potential of reinforcement learning will be better positioned to find the edge in their industries.” They add, “In our experience, one of the best ways to know if a given process is ready for reinforcement learning is to ask, ‘What business challenges haven’t we been able to solve with traditional modeling approaches?’ Look for areas where teams are conducting AI projects with other methods but haven’t been able to bring them into production because the environment is too dynamic and the models deliver inconsistent results, require too many assumptions and approximations about the data, or cannot handle the full scope of business needs.” As Mah points out, reinforcement learning isn’t a silver bullet, nor is it the right approach for every problem. Even when a good problem has been identified for reinforcement learning, Mah recommends patience. “Though a mature technology,” he writes, “reinforcement learning is hardly magic and will not find the optimal path from day one. False starts are possible, too. However, deployed well and given time, reinforcement learning can potentially find surprising, creative solutions to help organizations outpace their competition.”

Footnotes
[1] Ben Dickson, “DeepMind says reinforcement learning is ‘enough’ to reach general AI,” VentureBeat, 9 June 2021.
[2] Apoorva Bellapu, “Reinforcement Learning for a Better Tomorrow,” Analytics Insight, 24 July 2021.
[3] Kathryn Hume and Matthew E. Taylor, “Why AI That Teaches Itself to Achieve a Goal Is the Next Big Thing,” Harvard Business Review, 21 April 2021.
[4] Ben Dickson, “Reinforcement learning challenge to push boundaries of embodied AI,” TechTalks, 26 April 2021.
[5] Jacomo Corbo, Oliver Fleming, and Nicolas Hohn, “It’s time for businesses to chart a course for reinforcement learning,” McKinsey & Company, 1 April 2021.
[6] Paul Mah, “Why Reinforcement Learning Matters in Business,” CDO Trends, 5 May 2021.
