10.2 Reinforcement Learning Model
By the reinforcement learning problem we mean a straightforward framing of the
problem of learning from interaction to achieve a goal. The learner or
decision-maker is called the agent. The thing it interacts with, comprising
everything outside the agent, is called the environment. These interact
continually, the agent selecting actions and the environment responding to those
actions and presenting new situations to the agent. The model of RL is illustrated
in Figure 10.1.
Fig. 10.1. Reinforcement learning model (the reinforcement learning system selects an action; the environment responds with a new state and a reward)
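As a concrete illustration of this interaction loop, the sketch below shows one way the agent-environment cycle of Figure 10.1 might be written in Python. The toy `Environment` class, its `reset`/`step` methods, and the random placeholder policy are illustrative assumptions, not part of the original text.

```python
import random

class Environment:
    """A toy environment: states 0..4, reaching state 4 ends the episode."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Move right (+1) or left (-1), clipped to the valid state range.
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0   # reward signals the goal
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    """Placeholder policy: chooses an action without using the state."""
    return random.choice([-1, +1])

# The agent-environment interaction loop of Figure 10.1:
# the agent selects an action, the environment responds with
# a new state and an immediate reward.
env = Environment()
state = env.reset()
done = False
while not done:
    action = random_policy(state)           # agent chooses an action a_t
    state, reward, done = env.step(action)  # environment returns s_{t+1}, r_t
```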
More specifically, the agent exists in an environment described by some set
of possible states S. It can perform any of a set of possible actions A. Each time it
performs an action a_t in some state s_t, the agent receives a real-valued reward r_t
that indicates the immediate value of this state-action transition. This produces a
sequence of states s_i, actions a_i and immediate rewards r_i. The task of the agent is
to learn a control policy π : S → A that maximizes the expected sum of these
rewards, with future rewards discounted exponentially by their delay. The agent's
goal, roughly speaking, is to maximize the total amount of reward it receives
over the long run, as shown in Formula 10.1. In learning, the principle of RL is:
if the reward is positive, strengthen the action later; otherwise, weaken the action.
\sum_{i=0}^{\infty} \gamma^{i} r_{t+i}, \qquad 0 < \gamma < 1 \qquad (10.1)
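For example, given a finite sequence of rewards, the discounted sum in Formula 10.1 can be evaluated directly; the reward values and the discount factor below are made-up numbers used only for illustration.

```python
# Discounted return of Formula 10.1: sum_i gamma**i * r_{t+i}, with 0 < gamma < 1.
gamma = 0.9                      # discount factor (assumed value)
rewards = [0.0, 0.0, 1.0, 0.5]   # r_t, r_{t+1}, ... (made-up rewards)

discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
print(discounted_return)         # 0.9**2 * 1.0 + 0.9**3 * 0.5 ≈ 1.1745
```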
A reinforcement learning task that satisfies the Markov property is called a
Markov decision process, or MDP. If the state and action spaces are finite, then it
is called a finite Markov decision process (finite MDP). Finite MDPs are
particularly important to the theory of reinforcement learning.
Markov decision process: A Markov decision process is defined by a 4-tuple
<S, A, R, P>, where S is a set of possible states, A is a set of possible actions,
R is the reward function, and P is the state transition probability function.