Concurrent Modular Q-Learning with Local
Rewards on Linked Multi-Component Robotic
Systems
Borja Fernandez-Gauna, Jose Manuel Lopez-Guede, and Manuel Graña
University of the Basque Country (UPV/EHU)
Abstract.
Applying conventional Q-Learning to Multi-Component Robotic Systems (MCRS) produces an exponential growth of state storage requirements as the number of components increases. Modular approaches limit the state size growth to be polynomial in the number of components, allowing more manageable state representation and manipulation. In this article, we advance on previous work on a modular Q-learning approach to learn the distributed control of a Linked MCRS. We have chosen a paradigmatic application of this kind of system using only local rewards: a set of robots carrying a hose from some initial configuration to a desired goal. The hose dynamics are simplified to a distance constraint on the robots' positions.
1 Introduction
We are working on Linked Multi-component Robotic Systems (L-MCRS) [2,3].
Our previous work deals with the application of distributed control schemes
based on consensus techniques to very simple L-MCRS where the links are mod-
eled as springs [6,7]. As an alternative approach to traditional control, we have
proposed [5,9] the application of the Q-Learning algorithm to learn from experience. We have already reported initial work on the application of a modular approach [4,8], which is improved in this paper. Here we move the system to a desired configuration of the robots and the hose.
The Q-Learning algorithm belongs to the family of unsupervised Reinforcement Learning (RL) methods [11]. It has become very popular because of its good behavior and its simplicity. The algorithm does not require any a priori knowledge about the environment and it can be trained on simulated state transitions. An RL learner is assumed to observe discrete states s ∈ S from the world, choose a discrete action a ∈ A to be taken following policy π : S → A, and observe the new state s′. A previously defined reward function r : S → R immediately maps perceived states to a scalar real reward (r) describing how good or desirable the new state is. The reward is the immediately observed signal qualifying the observed state; the sum of all the rewards observed over time is called the value. Knowledge can be acquired from episodic (finite tasks) or continuous tasks. Once a reward is obtained, an agent can update its previous knowledge.
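To make the notions of state, action, policy and reward update concrete, the following is a minimal sketch of tabular Q-learning on a toy one-dimensional chain world. The environment, reward values and hyperparameters are illustrative assumptions for exposition only, not taken from the paper's hose-transport task.

```python
# Minimal tabular Q-learning sketch on a toy 1-D chain world.
# States 0..4 form a chain; state 4 is the goal and yields reward 1.
import random

N_STATES = 5          # discrete state set S = {0, ..., 4}
ACTIONS = [-1, +1]    # discrete action set A: move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters

def step(s, a):
    """Transition function: clamp to the chain; reward only at the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

# Q-table initialized to zero: no a priori knowledge of the environment.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose_action(s):
    """Epsilon-greedy policy pi : S -> A derived from the current Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

random.seed(0)
for episode in range(200):        # episodic task: reset to state 0 each time
    s = 0
    while s != N_STATES - 1:
        a = choose_action(s)
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy action learned for each non-goal state.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)
```

After training, the greedy policy extracted from the Q-table moves right toward the goal from every non-goal state, illustrating how the scalar reward signal alone shapes the policy.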