Concurrent Modular Q-Learning with Local
Rewards on Linked Multi-Component Robotic
Systems
Borja Fernandez-Gauna, Jose Manuel Lopez-Guede, and Manuel Graña
University of the Basque Country (UPV/EHU)
Abstract. When applying conventional Q-Learning to Multi-Component Robotic Systems (MCRS), increasing the number of components produces exponential growth of the state storage requirements. Modular approaches limit the growth of the state size to be polynomial in the number of components, allowing more manageable state representation and manipulation. In this article, we advance on previous work on a modular Q-Learning approach to learn the distributed control of a Linked MCRS. We have chosen a paradigmatic application of this kind of system using only local rewards: a set of robots carrying a hose from some initial configuration to a desired goal. The hose dynamics are simplified to a distance constraint on the robots' positions.
1 Introduction
We are working on Linked Multi-component Robotic Systems (L-MCRS) [2,3].
Our previous work deals with the application of distributed control schemes
based on consensus techniques to very simple L-MCRS where the links are mod-
eled as springs [6,7]. As an alternative approach to traditional control, we have
proposed [5,9] the application of the Q-Learning algorithm to learn from experience. We have already reported initial work on the application of a modular approach [4,8], which is improved in this paper. Here, the aim is to move the system to a desired configuration of the robots and the hose.
The Q-Learning algorithm belongs to the family of unsupervised Reinforcement Learning (RL) methods [11]. It has become very popular because of its good behavior and its simplicity. The algorithm does not require any a priori knowledge about the environment and it can be trained on simulated state transitions. An RL learner is assumed to observe a discrete state s ∈ S from the world, choose a discrete action a ∈ A to be taken following a policy π : S → A, and observe the new state s'. A previously defined reward function r : S → R immediately maps perceived states to a scalar real reward (r) describing how good or desirable the new state is. The reward is the immediately observed signal qualifying the observed state; the sum of all the rewards observed over time is called the value. Knowledge can be acquired from episodic (finite) tasks or continuous tasks. Once a reward is obtained, an agent can update its previous knowledge.
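As an illustration of this learning loop, the following minimal Python sketch implements one-step tabular Q-Learning driven only by the immediate reward signal described above. The environment interface (reset, step, actions) and the parameter values are assumptions introduced for the example; they are not part of the system described in this paper.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        # Q-values indexed by (state, action); missing entries default to 0.0
        Q = defaultdict(float)
        for _ in range(episodes):
            s = env.reset()                      # initial discrete state s in S
            done = False
            while not done:
                # epsilon-greedy policy pi : S -> A derived from the current Q
                if random.random() < epsilon:
                    a = random.choice(env.actions(s))
                else:
                    a = max(env.actions(s), key=lambda act: Q[(s, act)])
                s_next, r, done = env.step(a)    # observe new state s' and reward r
                # one-step update towards r + gamma * max_a' Q(s', a')
                target = r if done else r + gamma * max(
                    Q[(s_next, act)] for act in env.actions(s_next))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next
        return Q

In an episodic task the outer loop runs once per episode until the terminal state; in a continuous task the same update can be applied along a single unbounded trajectory.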
 