The training states include four absorbing states. Assume that the attacking
side plays in the left half. According to the specification of the standard
Soccer Server, the four states are play on, goal left, goal kick right and free
kick right. If an action takes the agent into one of the four absorbing states,
the agent is given the ultimate reward. For other actions, the agent is given a
procedural reward as its immediate reward. For example, the reward for goal left
has a maximum value of 1, which corresponds to shooting the ball into the goal
region.
The agent obtains the ultimate reward by taking several actions through the
corresponding states. At that point, the state-action pair receives the reward
value. The core of the Q-learning algorithm is that every state-action pair has
its own Q-value, and the Q-values are updated when the ultimate reward is
obtained. Because the RoboCup simulation platform adds a small random noise to
the state transitions, the model is a non-deterministic MDP. The Q-value is
updated by the following equation.
$$Q(s, a) = (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \right) \tag{10.21}$$
where α = 0.1 and γ = 0.95.
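The update in Eq. (10.21) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiment: the learning rate, discount factor, initial Q-value of 1 and the three action names come from the text and Table 10.1, while the function names, the dictionary-based Q-table, the state labels and the convention that an absorbing state contributes no future value are assumptions made here.

```python
from collections import defaultdict

ALPHA = 0.1      # learning rate given in the text
GAMMA = 0.95     # discount factor given in the text
INITIAL_Q = 1.0  # initial Q-value used in training, per the text

# Action names taken from Table 10.1.
ACTIONS = ("shoot", "pass", "dribble")


def make_q_table():
    """Return a Q-table mapping (state, action) to a value, defaulting to INITIAL_Q."""
    return defaultdict(lambda: INITIAL_Q)


def q_update(q, state, action, reward, next_state, terminal=False):
    """Apply one step of Eq. (10.21) in place and return the new Q-value.

    Assumption: if next_state is absorbing (terminal=True), the future-value
    term is taken as 0. This is a common convention, not stated in the text.
    """
    if terminal:
        future = 0.0
    else:
        future = max(q[(next_state, a)] for a in ACTIONS)
    old = q[(state, action)]
    new = (1 - ALPHA) * old + ALPHA * (reward + GAMMA * future)
    q[(state, action)] = new
    return new
```

A call might look like the following, where "s0" and "s1" are hypothetical state labels:

```python
q = make_q_table()
q_update(q, "s0", "pass", reward=0.0, next_state="s1")
q_update(q, "s1", "shoot", reward=1.0, next_state="goal_left", terminal=True)
```

A dictionary keyed by state-action pairs keeps the sketch independent of how the soccer states are actually encoded.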
In actual training, the initial Q-value is 1. After about 20,000 training runs
(each run ends when an absorbing state is reached), the majority of the Q-value
entries have changed and have separated from one another. Table 10.1 shows the
updated Q-values after different numbers of training runs.
Table 10.1 Q-values after different numbers of training runs
Action    Initial value   5,000    10,000   20,000
Shoot     1               0.7342   0.6248   0.5311
Pass      1               0.9743   0.9851   0.993
Dribble   1               0.9012   0.8104   0.7242
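As a rough check on how an entry can move away from its initial value, a single application of Eq. (10.21) can be worked by hand. The step below assumes that every Q-value still equals the initial value 1 and that the next state is not absorbing. The procedural reward r is left symbolic because the text does not give its values.

$$Q(s, a) \leftarrow (1 - 0.1) \cdot 1 + 0.1\,(r + 0.95 \cdot 1) = 0.995 + 0.1\,r$$

Under these assumptions, the first update lowers the entry below 1 whenever r < 0.05 and leaves it at or above 1 otherwise.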
Reinforcement learning has received much attention in the past decade. Its
incremental nature and adaptive capabilities make it suitable for use in various
domains, such as automatic control, mobile robotics and multi-agent systems. A
critical problem in conventional reinforcement learning is the slow convergence
of the learning process. However, most learning systems have access to prior
knowledge in the form of human expertise or previously learned experience.
Therefore, how to integrate other machine learning techniques, such as neural
networks and symbolic learning, to help accelerate learning is an