the convergence could not be ensured. Therefore, new function approximation methods that offer both convergence and high speed remain one of the most important research topics in reinforcement learning.
10.8 Reinforcement Learning Applications
Reinforcement Learning addresses the question of how an autonomous agent that
senses and acts in its environment can learn to choose optimal actions to achieve
its goals. In a Markov decision process (MDP) the agent can perceive a set S of
distinct states of its environment and has a set A of actions that it can perform. At
each discrete time step t, the agent senses the current state s_t, chooses a current action a_t, and performs it. The environment responds by giving the agent a reward r_t = Q(s_t, a_t) and by producing the succeeding state s_t+1 = P(s_t, a_t). Here
the functions P and Q are part of the environment and are not necessarily known
to the agent. In an MDP, the functions P and Q depend only on the current state
and action, and not on earlier states or actions. Reinforcement learning is a useful
way to solve MDP problems. Reinforcement Learning reaches its goal by learning the reward function r_t = Q(s_t, a_t) and the state transition function s_t+1 = P(s_t, a_t); Q-learning acquires the optimal policy by learning the values Q(s, a).
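The following minimal sketch (not from the book) illustrates tabular Q-learning with an epsilon-greedy policy in Python; the simulator callables P and Q follow the section's notation for the state transition and reward functions, but their availability as functions is an assumption:

import random
from collections import defaultdict

def q_learning(P, Q, states, actions, episodes=1000,
               steps=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning in the section's notation: Q(s, a) is the
    # environment's reward function and P(s, a) its state transition
    # function; both are assumed to be callables from a simulator.
    q = defaultdict(float)                 # learned action values q[(s, a)]
    for _ in range(episodes):
        s = random.choice(states)          # start each episode in a random state
        for _ in range(steps):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: q[(s, a_)])
            r, s_next = Q(s, a), P(s, a)   # reward and successor state
            best_next = max(q[(s_next, a_)] for a_ in actions)
            # move q[(s, a)] toward the reward plus discounted best next value
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q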
RoboCup is an international robotics competition founded in 1993. The aim is
to develop autonomous robots with the intention of promoting research and
education in the field of artificial intelligence. The name RoboCup is a
contraction of the competition's full name, “Robot Soccer World Cup”. The
following is an application of the Q-learning algorithm to simulated robot soccer with three players (two attackers against one defender). The training aims to acquire the main attacking strategy: awareness of passing opportunities while running an attack. In Figure 10.10, striker A controls the ball in the shooting region but has no angle to shoot; teammate B is also in the shooting region and has a good shooting angle. A therefore passes the ball to B, and B completes the shot, so the cooperation is successful. After training on a large number of examples with the Q-learning approach, the action "A passes the ball to B" becomes the best action in this state.
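A small sketch of how such a trained table would drive the pass decision; the state labels and the placeholder value below are hypothetical, not the book's representation:

from collections import defaultdict

# Hypothetical encoding of the situation in Figure 10.10; labels and the
# placeholder value are illustrative only.
state = ("A: shoot region, no angle", "B: shoot region, good angle")
actions = ["shoot", "dribble", "pass to B"]

# q stands for the action-value table produced by Q-learning training
# (for instance the q_learning sketch above); the value is a placeholder.
q = defaultdict(float, {(state, "pass to B"): 1.0})

# Greedy policy: after sufficient training, "pass to B" carries the
# highest value in this state and is therefore chosen.
best_action = max(actions, key=lambda a: q[(state, a)])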
Figure 10.11 illustrates the description of states. The attack region is divided into 20*8 small regions, each a square with a side length of 2 m. A two-dimensional array A_i,j (0 ≤ i ≤ 19, 0 ≤ j ≤ 7) can be used to describe the region. The attack state can be described by the locations of the three agents. Fig. 10.11 also shows the generalization of the state: locations that fall in the same small region can be regarded as the same state.
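As a rough illustration, the sketch below maps agent positions to this grid; the coordinate frame (origin at a corner of the attack region, axes in metres) and the function names are assumptions, not the book's code:

CELL = 2.0            # each small region is a 2 m x 2 m square
COLS, ROWS = 20, 8    # the attack region is divided into 20 x 8 cells

def cell_of(x, y):
    # Map a position in metres to its cell indices (i, j),
    # with 0 <= i <= 19 and 0 <= j <= 7.
    i = min(int(x // CELL), COLS - 1)
    j = min(int(y // CELL), ROWS - 1)
    return i, j

def attack_state(positions):
    # Generalized attack state: the cell indices of the three agents.
    # Positions falling in the same cells map to the same state.
    return tuple(cell_of(x, y) for x, y in positions)

# Example: positions of striker A, teammate B and the defender in metres
state = attack_state([(3.5, 7.2), (12.0, 9.9), (30.4, 2.1)])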