Practical Deep Learning with PyTorch | Reinforcement Learning

PyTorch is one of today's mainstream deep learning frameworks. It favors thin abstractions and an intuitive design, which makes PyTorch code easy to read and friendly to beginners.

This article focuses on reinforcement learning within the field of deep learning.

1

What is Reinforcement Learning

Reinforcement learning is an important branch of machine learning. Together with supervised learning and unsupervised learning, it is one of the three main learning methods in machine learning; Figure 1.7 illustrates how the three relate. Reinforcement learning is concerned with how to act in an environment so as to maximize the expected benefit, so it can be understood as a decision-making problem. It is an interdisciplinary field inspired by behaviorist theory in psychology, which studies how organisms, stimulated by rewards or punishments from the environment, gradually form expectations about those stimuli and develop habitual behaviors that maximize benefit. Reinforcement learning has a very broad range of applications, and each field emphasizes different aspects of it. This book does not discuss those branches and focuses on the general concepts of reinforcement learning.

Figure 1.7 Relationship diagram of reinforcement learning, supervised learning, and unsupervised learning

In practical applications, people often confuse reinforcement learning, supervised learning, and unsupervised learning. To better understand reinforcement learning and the differences between them, we first introduce the concepts of supervised learning and unsupervised learning.

Supervised learning trains a model on labeled samples, that is, inputs paired with their known results. The trained model then maps each input to its corresponding output, for example to classify it.

Unsupervised learning handles samples whose labels are unknown: it clusters the sample set by the similarity between samples, with the aim of minimizing the intra-class variance, and learns a classifier from that grouping.

Both of these methods learn a mapping from input to output: they capture the relationship between the two, and the algorithm is told what kind of input corresponds to what kind of output. Reinforcement learning, in contrast, receives feedback rather than labels. The algorithm tries an action, observes the result, and adjusts its earlier behavior according to whether that result was good or bad. Through repeated trial and adjustment, it learns which action to take in which situation to achieve the best outcome. In addition, feedback in supervised learning is immediate, whereas feedback in reinforcement learning can be delayed: it often takes many steps before the algorithm knows whether an earlier choice was good or bad.
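The trial-and-adjust loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a method taken from the text: it uses an epsilon-greedy strategy on a hypothetical two-action problem, and the function name `epsilon_greedy_bandit`, the win probabilities, and the values of `epsilon` and `steps` are all assumptions chosen for the example. Feedback here arrives right after each action, so the sketch does not show delayed feedback.

```python
import random


def epsilon_greedy_bandit(win_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often.

    win_probs[a] is the (hidden) chance that action a returns reward 1.
    The learner never sees these values, only the rewards it receives.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)  # current guess of each action's value
    counts = [0] * len(win_probs)       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(win_probs)), key=lambda a: estimates[a])

        # The environment answers with a reward, not with a label.
        reward = 1.0 if rng.random() < win_probs[action] else 0.0

        # Adjust the estimate toward the observed result (running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", estimates)
print("tries per action:", counts)
print("preferred action:", best_action)
```

No label ever tells the algorithm which action is correct. It only sees the reward for the action it tried, and its estimates move toward the better action as results accumulate.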

1

The Four Elements of Reinforcement Learning

Reinforcement learning mainly consists of four elements: agent, state, action, and reward. The relationships between them are illustrated in Figure 1.8, with detailed definitions as follows:

agent: The entity that performs the task. It can improve its strategy only by interacting with the environment.

state: The representation of the environment the agent is in at each point in time.

action: What the agent can do in a given state.

reward: The feedback the agent may receive upon reaching a state.
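The four elements can be seen working together in a small interaction loop. The sketch below is only an illustration under stated assumptions: `CorridorEnv`, `run_episode`, and `always_right` are hypothetical names invented for this example, and the environment (a corridor of five positions with a reward of 1 at the far end) is made up to keep the code short.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at the last one."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # The state is simply the agent's position in the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to the corridor).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # feedback only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent (the policy) interact with the environment once."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # agent picks an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    # A fixed, hand-written policy used only to exercise the loop.
    return 1


trajectory = run_episode(CorridorEnv(), always_right)
for state, action, reward in trajectory:
    print(f"state={state} action={action} reward={reward}")
```

Each tuple in `trajectory` records one step: the state the agent was in, the action it took, and the reward it received. A learning algorithm would use such records to improve the policy instead of keeping it fixed.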

2

The Goal of Reinforcement Learning Algorithms

The goal of a reinforcement learning algorithm is to obtain the maximum cumulative reward (positive feedback). Consider a child learning to walk: no one can tell the child exactly how to do it, so the child has to learn independently, through repeated attempts and the feedback the environment gives.
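To make "cumulative reward" concrete for the walking child, the short sketch below adds up the rewards collected over one sequence of attempts. The reward values (-1 for a fall, +1 for a successful step) are invented for illustration, and the discount factor `gamma` is an assumption that does not appear in the text: it is a common convention for weighting later rewards less, and with the default `gamma=1.0` the function is a plain sum.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum the rewards of one episode, optionally discounting later ones.

    With gamma=1.0 this is a plain sum. With gamma < 1 the reward at
    step t is weighted by gamma ** t (an assumed, commonly used convention).
    """
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: two falls (-1 each), then three successful steps (+1 each).
walking_rewards = [-1.0, -1.0, 1.0, 1.0, 1.0]
print(cumulative_reward(walking_rewards))
print(cumulative_reward(walking_rewards, gamma=0.5))
```

An algorithm that maximizes this quantity prefers behavior that earns more successful steps and fewer falls over the whole sequence, not just on the next attempt.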

In this example, as shown in Figure 1.8, the child is the agent, and the task of
