Reinforcement Learning - Advanced Artificial Intelligence

The state set used in training includes four absorbing states. Assuming that the attacking side is in the left half, these four states, according to the specification of the standard Soccer Server, are play on, goal left, goal kick right and free kick right. If an action leads to one of these four absorbing states, the agent receives the ultimate reward. For all other actions, the agent receives procedure rewards as immediate rewards. For example, goal left carries the maximum reward value of 1, which corresponds to shooting the ball into the goal.
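The reward scheme described above can be sketched as a small function. This is a minimal sketch under stated assumptions: the names `reward`, `ABSORBING_STATES`, `ultimate_rewards` and `procedure_reward` are hypothetical, states are represented as plain strings, and only the value 1 for goal left comes from the text. The other ultimate rewards and the procedure-reward function are left for the caller to supply.

```python
from typing import Callable, Dict, Hashable

# The four absorbing states listed in the text (attacking side in the left half).
# Spelling them as plain strings is an assumption made for this sketch.
ABSORBING_STATES = frozenset(
    {"play on", "goal left", "goal kick right", "free kick right"}
)


def reward(
    next_state: Hashable,
    ultimate_rewards: Dict[Hashable, float],
    procedure_reward: Callable[[Hashable], float],
) -> float:
    """Reward for a transition that lands in next_state (hypothetical helper)."""
    if next_state in ABSORBING_STATES:
        # Reaching an absorbing state ends the episode with its ultimate reward;
        # a KeyError here means the caller supplied no value for that state.
        return ultimate_rewards[next_state]
    # Any other transition yields an intermediate "procedure" reward.
    return procedure_reward(next_state)


if __name__ == "__main__":
    # Only "goal left" = 1.0 is taken from the text; the zeros are placeholders.
    ultimate = {"goal left": 1.0, "play on": 0.0,
                "goal kick right": 0.0, "free kick right": 0.0}
    print(reward("goal left", ultimate, lambda s: 0.0))
```

Passing the procedure reward in as a function keeps the sketch neutral about how intermediate rewards are shaped, since the text does not give their values.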

The agent obtains the ultimate reward after taking several actions through the corresponding states. At that point, the state-action pair receives the reward value.

The core of the Q-learning algorithm is that every state-action pair has its own Q-value, and these Q-values are updated when the ultimate rewards are received. Because the RoboCup simulation platform adds a small random noise to state transitions, the model is a non-deterministic MDP. The Q-value is therefore updated by the following equation:

$$Q(s, a) = (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \right] \qquad (10.21)$$

where the learning rate α = 0.1 and the discount factor γ = 0.95.
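Equation (10.21) translates almost directly into code. The sketch below stores Q-values in a dictionary keyed by (state, action) pairs, with every entry starting at 1 as in the training described in this section; the names `new_q_table` and `q_update` and their parameters are hypothetical. Treating an absorbing next state as contributing no future value is a common convention, assumed here because the text does not say how terminal states enter the update.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

ALPHA = 0.1   # learning rate alpha from the text
GAMMA = 0.95  # discount factor gamma from the text

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def new_q_table(initial: float = 1.0) -> QTable:
    """Q-table in which every unseen (state, action) pair starts at `initial`."""
    return defaultdict(lambda: initial)


def q_update(
    q: QTable,
    s: Hashable,
    a: Hashable,
    r: float,
    s_next: Hashable,
    actions: Iterable[Hashable],
    terminal: bool = False,
    alpha: float = ALPHA,
    gamma: float = GAMMA,
) -> float:
    """Apply Eq. (10.21) to Q(s, a) in place and return the new value."""
    # Assumption: an absorbing next state contributes no future value.
    future = 0.0 if terminal else max(q[(s_next, b)] for b in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * future)
    return q[(s, a)]


if __name__ == "__main__":
    q = new_q_table()
    actions = ("shoot", "pass", "dribble")
    # One non-terminal step with zero reward: 0.9 * 1 + 0.1 * (0 + 0.95 * 1)
    print(q_update(q, "s0", "pass", 0.0, "s1", actions))
```

With all Q-values at their initial value of 1, a non-absorbing transition with r = 0 gives (1 − 0.1)·1 + 0.1·(0 + 0.95·1) = 0.995, so values drift below 1 even without explicit penalties. Blending the old estimate with the new target through α, rather than overwriting Q(s, a), is what lets the estimate settle under the noisy transitions of a non-deterministic MDP.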

In actual training, every Q-value is initialized to 1. After about 20,000 training episodes (each run until an absorbing state is reached), most of the Q-value entries have changed and become clearly separated. Table 10.1 shows the Q-values after different numbers of training episodes.
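The training procedure (Q-values initialized to 1, each episode run until an absorbing state is reached) can be sketched end to end on a toy problem. Everything about the environment is hypothetical: `toy_step`, its two field states "far" and "near", and all of its probabilities are invented stand-ins for the noisy RoboCup Soccer Server. ε-greedy exploration with ε = 0.1 is also an assumption, since the text does not state an exploration strategy. Only α, γ, the initial value and the three action names come from the text, so the learned numbers are not expected to match Table 10.1.

```python
import random
from collections import defaultdict

ACTIONS = ("shoot", "pass", "dribble")
ALPHA, GAMMA = 0.1, 0.95  # values from the text
EPSILON = 0.1             # assumption: epsilon-greedy exploration


def toy_step(state, action, rng):
    """Hypothetical noisy stand-in for the simulator (NOT the Soccer Server).

    Returns (next_state, reward, done); done=True means an absorbing state.
    """
    if action == "shoot":
        p_goal = 0.6 if state == "near" else 0.1
        if rng.random() < p_goal:
            return "goal left", 1.0, True    # ultimate reward of 1, as in the text
        return "goal kick right", 0.0, True  # shot missed
    if rng.random() < 0.2:
        return "free kick right", 0.0, True  # possession lost
    if action == "pass" or rng.random() < 0.5:
        return "near", 0.0, False            # ball advanced toward the goal
    return state, 0.0, False                 # dribble made no progress


def train(episodes=20_000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: 1.0)  # every Q-value starts at 1, as in the text
    for _ in range(episodes):
        state, done = "far", False
        while not done:  # one episode = play until an absorbing state
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, r, done = toy_step(state, action, rng)
            # Eq. (10.21); absorbing states are assumed to have no future value.
            future = 0.0 if done else max(q[(next_state, b)] for b in ACTIONS)
            q[(state, action)] = (1 - ALPHA) * q[(state, action)] + ALPHA * (
                r + GAMMA * future
            )
            state = next_state
    return q


if __name__ == "__main__":
    q = train()
    for state in ("far", "near"):
        print(state, {a: round(q[(state, a)], 4) for a in ACTIONS})
```

Every non-shooting step ends the toy episode with probability at least 0.2 and every shot ends it outright, so each episode terminates; this mirrors the text's notion of one training run per absorbing state reached.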

Table 10.1 Q-values after different numbers of training episodes

Action    Initial value   5,000    10,000   20,000
Shoot     1               0.7342   0.6248   0.5311
Pass      1               0.9743   0.9851   0.993
Dribble   1               0.9012   0.8104   0.7242
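If Table 10.1 is read as the Q-values of the three actions in a single decision state (an interpretation, since the text does not name the state), a greedy policy simply picks the action with the largest value at each checkpoint. The sketch below copies the table's numbers verbatim; `TABLE_10_1`, `CHECKPOINTS` and `greedy_action` are hypothetical names.

```python
# Q-values copied verbatim from Table 10.1.
CHECKPOINTS = ("initial value", "5,000", "10,000", "20,000")
TABLE_10_1 = {
    "Shoot": (1.0, 0.7342, 0.6248, 0.5311),
    "Pass": (1.0, 0.9743, 0.9851, 0.993),
    "Dribble": (1.0, 0.9012, 0.8104, 0.7242),
}


def greedy_action(column: int) -> str:
    """Action with the largest Q-value in the given table column.

    Ties go to the first action in TABLE_10_1, because max() returns the
    first maximal item it encounters.
    """
    return max(TABLE_10_1, key=lambda action: TABLE_10_1[action][column])


if __name__ == "__main__":
    for column, label in enumerate(CHECKPOINTS):
        print(f"{label:>13}: {greedy_action(column)}")
```

At the initial checkpoint all three values are 1, so the choice is an arbitrary tie (here Shoot, the first key). From 5,000 episodes onward Pass has the largest value, and by 20,000 episodes its lead over Dribble is 0.993 − 0.7242 = 0.2688.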

Reinforcement learning has received much attention in the past decade. Its incremental nature and adaptive capabilities make it suitable for various domains, such as automatic control, mobile robotics and multi-agent systems. A critical problem in conventional reinforcement learning is the slow convergence of the learning process. However, most learning systems have prior knowledge available, in the form of human expertise or previously learned experience. Therefore, how to integrate other machine learning techniques, such as neural networks and symbolic learning, to help accelerate learning is an
