10.2 Reinforcement Learning Model
By the reinforcement learning problem we mean a straightforward framing of the
problem of learning from interaction to achieve a goal. The learner or
decision-maker is called the agent. The thing it interacts with, comprising
everything outside the agent, is called the environment. These interact
continually, the agent selecting actions and the environment responding to those
actions and presenting new situations to the agent. The model of RL is illustrated
in Figure 10.1.
Fig. 10.1. Reinforcement learning model (the reinforcement learning system selects an action; the environment responds with a new state and a reward)
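As a concrete illustration of this interaction loop, the sketch below shows one way the agent-environment cycle of Figure 10.1 might be written in Python. The toy `Environment` class, its `reset`/`step` methods, and the random placeholder policy are illustrative assumptions, not part of the original text.

```python
import random

class Environment:
    """A toy environment: states 0..4, reaching state 4 ends the episode."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Move right (+1) or left (-1), clipped to the valid state range.
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0   # reward signals the goal
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    """Placeholder policy: chooses an action without using the state."""
    return random.choice([-1, +1])

# The agent-environment interaction loop of Figure 10.1:
# the agent selects an action, the environment responds with
# a new state and an immediate reward.
env = Environment()
state = env.reset()
done = False
while not done:
    action = random_policy(state)           # agent chooses an action a_t
    state, reward, done = env.step(action)  # environment returns s_{t+1}, r_t
```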
More specifically, the agent exists in an environment described by some set
of possible states S. It can perform any of a set of possible actions A. Each time it
performs an action a_t in some state s_t, the agent receives a real-valued reward r_t
that indicates the immediate value of this state-action transition. This produces a
sequence of states s_i, actions a_i and immediate rewards r_i. The task of the agent is
to learn a control policy π : S → A that maximizes the expected sum of these
rewards, with future rewards discounted exponentially by their delay. The agent's
goal, roughly speaking, is to maximize the total amount of reward it receives
over the long run, as shown in Formula 10.1. In learning, the principle of RL is:
if the reward is positive, strengthen the action later; otherwise, weaken the action.
\sum_{i=0}^{\infty} \gamma^{i} r_{t+i}, \qquad 0 < \gamma < 1 \qquad (10.1)
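For example, given a finite sequence of rewards, the discounted sum in Formula 10.1 can be evaluated directly; the reward values and the discount factor below are made-up numbers used only for illustration.

```python
# Discounted return of Formula 10.1: sum_i gamma**i * r_{t+i}, with 0 < gamma < 1.
gamma = 0.9                      # discount factor (assumed value)
rewards = [0.0, 0.0, 1.0, 0.5]   # r_t, r_{t+1}, ... (made-up rewards)

discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
print(discounted_return)         # 0.9**2 * 1.0 + 0.9**3 * 0.5 ≈ 1.1745
```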
A reinforcement learning task that satisfies the Markov property is called a
Markov decision process, or MDP. If the state and action spaces are finite, then it
is called a finite Markov decision process (finite MDP). Finite MDPs are
particularly important to the theory of reinforcement learning.
Markov decision process: A Markov decision process is defined by a 4-tuple
<S, A, R, P>, where S is a set of possible states, A is a set of possible actions,
R is the reward function, and P is the state transition probability function.