the convergence could not be ensured. Therefore, new function approximation methods that offer both convergence and high speed remain one of the most important research topics in reinforcement learning.
10.8 Reinforcement Learning Applications
Reinforcement Learning addresses the question of how an autonomous agent that
senses and acts in its environment can learn to choose optimal actions to achieve
its goals. In a Markov decision process (MDP) the agent can perceive a set S of
distinct states of its environment and has a set A of actions that it can perform. At
each discrete time step t, the agent senses the current state s_t, chooses a current action a_t, and performs it. The environment responds by giving the agent a reward r_t = Q(s_t, a_t) and by producing the succeeding state s_t+1 = P(s_t, a_t). Here
the functions P and Q are part of the environment and are not necessarily known
to the agent. In an MDP, the functions P and Q depend only on the current state
and action, and not on earlier states or actions. Reinforcement learning is a useful
way to solve MDP problems. Reinforcement Learning reaches its goal by learning the reward function r_t = Q(s_t, a_t) and the state transition function s_t+1 = P(s_t, a_t); Q-learning acquires the optimal policy by learning the values Q(s, a).
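The following minimal sketch (not from the book) illustrates tabular Q-learning with an epsilon-greedy policy in Python; the simulator callables P and Q follow the section's notation for the state transition and reward functions, but their availability as functions is an assumption:

import random
from collections import defaultdict

def q_learning(P, Q, states, actions, episodes=1000,
               steps=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning in the section's notation: Q(s, a) is the
    # environment's reward function and P(s, a) its state transition
    # function; both are assumed to be callables from a simulator.
    q = defaultdict(float)                 # learned action values q[(s, a)]
    for _ in range(episodes):
        s = random.choice(states)          # start each episode in a random state
        for _ in range(steps):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: q[(s, a_)])
            r, s_next = Q(s, a), P(s, a)   # reward and successor state
            best_next = max(q[(s_next, a_)] for a_ in actions)
            # move q[(s, a)] toward the reward plus discounted best next value
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q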
RoboCup is an international robotics competition founded in 1993. The aim is
to develop autonomous robots with the intention of promoting research and
education in the field of artificial intelligence. The name RoboCup is a
contraction of the competition's full name, “Robot Soccer World Cup”. The
following is an application of the Q-learning algorithm to simulated robot soccer with three players (two attackers against one defender). The training aims to acquire the main attacking strategy: awareness of passing opportunities while running an attack. In Figure 10.10, striker A controls the ball in the shooting region but has no angle to shoot; teammate B is also in the shooting region and has a good shooting angle. A therefore passes the ball to B, and B completes the shot, so the cooperation is successful. After training on a large number of examples with the Q-learning approach, the action "A passes the ball to B" becomes the best action in this state.
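A small sketch of how such a trained table would drive the pass decision; the state labels and the placeholder value below are hypothetical, not the book's representation:

from collections import defaultdict

# Hypothetical encoding of the situation in Figure 10.10; labels and the
# placeholder value are illustrative only.
state = ("A: shoot region, no angle", "B: shoot region, good angle")
actions = ["shoot", "dribble", "pass to B"]

# q stands for the action-value table produced by Q-learning training
# (for instance the q_learning sketch above); the value is a placeholder.
q = defaultdict(float, {(state, "pass to B"): 1.0})

# Greedy policy: after sufficient training, "pass to B" carries the
# highest value in this state and is therefore chosen.
best_action = max(actions, key=lambda a: q[(state, a)])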
Figure 10.11 illustrates the description of states. The attack region is divided into 20*8 small regions, each a square with a side length of 2 m. A two-dimensional array A_i,j (0 ≤ i ≤ 19, 0 ≤ j ≤ 7) can be used to describe the region. The attack state can be described by the locations of the three agents. Fig. 10.11 also shows the generalization of the state: locations that fall in the same small region can be regarded as the same state.
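As a rough illustration, the sketch below maps agent positions to this grid; the coordinate frame (origin at a corner of the attack region, axes in metres) and the function names are assumptions, not the book's code:

CELL = 2.0            # each small region is a 2 m x 2 m square
COLS, ROWS = 20, 8    # the attack region is divided into 20 x 8 cells

def cell_of(x, y):
    # Map a position in metres to its cell indices (i, j),
    # with 0 <= i <= 19 and 0 <= j <= 7.
    i = min(int(x // CELL), COLS - 1)
    j = min(int(y // CELL), ROWS - 1)
    return i, j

def attack_state(positions):
    # Generalized attack state: the cell indices of the three agents.
    # Positions falling in the same cells map to the same state.
    return tuple(cell_of(x, y) for x, y in positions)

# Example: positions of striker A, teammate B and the defender in metres
state = attack_state([(3.5, 7.2), (12.0, 9.9), (30.4, 2.1)])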