What Is Reinforcement Learning? (Definition, Uses)

Reinforcement learning is a machine learning training method in which an agent learns to complete a task through trial and error. The agent explores an environment and learns from the outcomes of its actions how best to achieve a goal. Actions that bring the agent closer to its goal earn a positive reward, while those that lead to failure earn a negative one.

Unlike supervised and unsupervised learning, reinforcement learning doesn’t involve feeding a prepared dataset into an agent before it tries to perform an action. The agent instead learns from its own experience, improving its decision-making over time as it accumulates data from previous attempts.
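The trial-and-error loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm. This is a minimal illustration, not a reference implementation: the five-cell corridor environment, the reward values and the names `step`, `train` and `greedy_policy` are all assumptions made up for the example.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells with the goal at the right end.
N_STATES = 5
ACTIONS = (-1, +1)            # move left or move right
GOAL = N_STATES - 1

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else -0.01   # small cost per move, positive reward at the goal
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values purely from the agent's own attempts."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit the current estimate.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best-known action for every state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]

q_table = train()
print(greedy_policy(q_table)[:GOAL])   # learned action for each non-goal cell
```

No dataset is supplied up front: every number in `q_table` comes from the agent's own moves and the rewards it received for them.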

When to Use Reinforcement Learning

Reinforcement learning has many applications and is used in gaming, recommendation engines, robotics, traffic light control and more.

Reinforcement learning chooses the next action using an algorithm that tries to produce the outcome with the maximum reward. This allows it to control complex systems from any given state without the need for human intervention.

Reinforcement learning is one of the most common approaches to game-playing AI. A famous example is AlphaGo, a system that was first trained on a large number of human games and then improved by playing against itself. It went on to defeat top-ranked professionals at Go, a game renowned for its difficulty, by combining Monte Carlo tree search with policy and value neural networks.

Personalized recommendation engines use an advanced form of reinforcement learning known as deep reinforcement learning to overcome challenges such as rapidly changing content, content fatigue and low click rates, and to deliver recommendations with the greatest reward (i.e., a “yes” selection).
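The idea of recommending whatever earns the greatest reward can be illustrated with an epsilon-greedy bandit, a much simpler technique than the deep reinforcement learning used in production systems. Everything here is an assumption for illustration: the function name `run_bandit`, the three items and their click probabilities are hypothetical, and a click is treated as a reward of 1.

```python
import random

def run_bandit(click_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate recommending items and learning their click rates from feedback.

    click_probs holds the true (hidden) click probability of each item;
    the agent only ever sees the click / no-click outcome of what it shows.
    """
    rng = random.Random(seed)
    n_items = len(click_probs)
    shows = [0] * n_items        # how often each item was recommended
    value = [0.0] * n_items      # running estimate of each item's click rate
    for _ in range(rounds):
        # Mostly recommend the best-looking item, sometimes try another one.
        if rng.random() < epsilon:
            item = rng.randrange(n_items)
        else:
            item = max(range(n_items), key=lambda i: value[i])
        reward = 1.0 if rng.random() < click_probs[item] else 0.0
        shows[item] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[item] += (reward - value[item]) / shows[item]
    return value, shows

# Hypothetical catalog: the third item is clicked most often.
estimates, shows = run_bandit([0.05, 0.10, 0.30])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print(best, shows)
```

The occasional random recommendation is what lets the estimates keep up when content changes, at the cost of sometimes showing a weaker item.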


Applications of Reinforcement Learning


By reviewing patient data and past visit information, reinforcement learning can help find a treatment that best meets the needs of each patient while also factoring in timetables for recovery. This can speed up medical diagnoses and help patients receive faster, more personalized treatments.


Learning models can analyze data gathered from sensors and anticipate how much energy will be spent under different combinations of variables. Reinforcement learning then determines the conditions that minimize energy use and cost when teams cool data centers.


In factories and warehouses, reinforcement learning powers the computer vision systems of robots. Mobile robots can also learn to navigate warehouse aisles, retrieving and transporting inventory while avoiding accidents.


Reinforcement learning can teach self-driving cars to operate safely by training them in realistic environments. During training, the algorithm learns to take into account factors like staying in lanes, observing the speed limit and remaining aware of other drivers and pedestrians.


To combat congestion in urban environments, cities are turning to reinforcement learning to control traffic signals. Algorithms are trained to find the best ways to operate traffic lights by considering variables such as the time of day and the number of cars passing through an intersection.

Customer Service (NLP)

Reinforcement learning plays a role in natural language processing, helping customer service software comprehend and respond to sentences. These approaches support various customer service technologies, including chatbots and virtual assistants.


Marketing teams often try to target customers with personalized recommendations, and this process becomes easier with reinforcement learning. By analyzing which products and webpages a customer spends the most time viewing, reinforcement learning models can determine other products that may pique a customer’s interest.


Reinforcement learning improves the artificial intelligence used to control non-player characters in video games. Applying reinforcement learning, AI characters can adopt different offensive and defensive tactics and figure out new ways to navigate the game’s landscape.

How Does Deep Reinforcement Learning Work?

Deep reinforcement learning combines reinforcement learning frameworks with artificial neural networks.

To help a software agent reach its reward, the neural network learns to map a series of states and actions to the rewards they lead to, uniting function approximation and target optimization.
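One common way to make this concrete is the per-transition loss used in DQN-style methods, shown here as an illustrative assumption rather than the specific formulation described above. A neural network with weights θ approximates the action-value function Q(s, a; θ), which is the function approximation, and training pushes its output toward a target y built from the observed reward, which is the target optimization. The discount factor γ and the periodically copied target weights θ⁻ are symbols introduced for this sketch.

```latex
\begin{align}
y &= r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \\
L(\theta) &= \bigl( y - Q(s, a; \theta) \bigr)^{2}
\end{align}
```

Here s is the current state, a the action taken, r the reward received and s' the state that follows.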

The inclusion of artificial neural networks allows reinforcement learning agents to tap into computer vision and time series prediction and facilitates real-time decision-making based on a reward and punishment system. The ability to determine the best path to the maximum reward from a series of states and actions is what allowed AlphaGo and other deep learning models to match or beat strong human players in Go, Atari video games, StarCraft II and Dota 2, to name just a few examples.


What Kinds of Problems Can Reinforcement Learning Solve?

Reinforcement learning helps solve problems in deterministic and probabilistic environments.

In deterministic environments, actions must be executed in a certain order to produce a reward, and the agent is penalized if it pursues other orders.

Rewards in probabilistic environments, however, are harder to determine because outcomes are subject to chance. Consequently, the action to take is determined through a defined policy, which accounts for probability and selects the agent’s action based on the conditions of the environment.
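A small sketch can show how a policy accounts for probability when picking an action. It assumes a hypothetical driving model in which each action leads to a list of (probability, reward) outcomes, and it looks only one step ahead, which is a simplification of how real policies are learned. The names `expected_reward`, `greedy_policy` and the road conditions are invented for the example.

```python
def expected_reward(outcomes):
    """Probability-weighted average reward over (probability, reward) pairs."""
    return sum(p * r for p, r in outcomes)

def greedy_policy(model):
    """Map each state to the action with the highest expected reward."""
    return {
        state: max(actions, key=lambda a: expected_reward(actions[a]))
        for state, actions in model.items()
    }

# Hypothetical model: state -> action -> possible outcomes.
model = {
    # Outcomes are certain here: every action has a single result.
    "dry_road": {
        "accelerate": [(1.0, 1.0)],
        "brake": [(1.0, 0.0)],
    },
    # Outcomes are probabilistic here: accelerating usually pays off,
    # but carries a 30% chance of a costly skid.
    "icy_road": {
        "accelerate": [(0.7, 1.0), (0.3, -5.0)],
        "brake": [(1.0, 0.0)],
    },
}

policy = greedy_policy(model)
print(policy)
```

On the icy road, accelerating has an expected reward of 0.7 × 1.0 + 0.3 × (−5.0) = −0.8, so the policy chooses to brake even though accelerating succeeds most of the time.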

Limitations of Reinforcement Learning

Although reinforcement learning offers many advantages in a variety of fields, it does come with some drawbacks:

  • Time-consuming: Reinforcement learning requires lots of data to learn an action while undergoing a trial-and-error process.
  • Too complex: Reinforcement learning is designed to address more complex issues and is not a great fit for simpler problems.
  • Lacking experience: Training for reinforcement learning often occurs within controlled environments. This means an agent or algorithm may not be prepared for unique circumstances and events that can occur in the real world.
  • Difficult to understand: Reinforcement learning often involves complex neural networks that are difficult to analyze.
  • Potentially harmful: Reinforcement learning can raise tough ethical questions. What if a model develops its own shortcuts or makes decisions that put people in harm’s way?
  • Easily affected: Noisy data, human interactions and dynamic environments can all impact the performance of agents and make them less effective.

What is reinforcement learning?

Reinforcement learning is a machine learning training method that uses a trial-and-error approach to teach an algorithm or agent how to complete a specific task. Based on actions that result in reward or failure, the agent improves over time as it gathers data from each attempt and learns the optimal way to perform an action.

What is an example of reinforcement learning?

An example of reinforcement learning is AlphaGo, which played itself thousands of times to understand Go and how to beat human players.