3. Distances: This module models the length constraint on the hose segment ahead of the robot. It receives a negative reward every time the segment exceeds the maximum allowed length, and a neutral one otherwise.
4. Collisions: This module learns to avoid collisions between robots and hose segments. The state is modeled as a set of boolean values, each of which indicates whether there is an obstacle (robot or hose segment) within one time-step's reach in a direction corresponding to one of the allowed actions.
5. InGrid: The robots are required to stay within predefined bounds. This module gives a negative reward if the position is outside the allowed bounds, and a neutral one otherwise. A minimal sketch of these constraint modules is given after this list.
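To make the decomposition concrete, the following is a minimal sketch of how the three constraint modules above could compute their reward signals. The function names, numeric reward values and grid representation are illustrative assumptions for exposition, not the implementation used in the paper.

```python
# Illustrative sketch of the constraint modules' reward signals.
# Names and numeric reward values are assumptions, not the authors' code.

NEGATIVE, NEUTRAL = -1.0, 0.0

def distances_reward(segment_length, max_length):
    """Distances module: penalize overstretching the hose segment ahead."""
    return NEGATIVE if segment_length > max_length else NEUTRAL

def collisions_state(robot_pos, obstacles, actions):
    """Collisions module state: one boolean per allowed action, True if that
    action would place the robot on an occupied cell (robot or hose segment)."""
    x, y = robot_pos
    return tuple((x + dx, y + dy) in obstacles for dx, dy in actions)

def collisions_reward(next_pos, obstacles):
    """Collisions module: penalize moving onto an occupied cell."""
    return NEGATIVE if next_pos in obstacles else NEUTRAL

def ingrid_reward(pos, grid_size=21):
    """InGrid module: penalize leaving the predefined grid bounds."""
    x, y = pos
    inside = 0 <= x < grid_size and 0 <= y < grid_size
    return NEUTRAL if inside else NEGATIVE
```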
4 Experiments
A set of experiments was conducted to test our modular multi-agent learning approach for L-MCRS. We used a grid of 21 × 21 cells, a maximum hose length L_hose of 26 cell units, and the step-count limit for episodes was set to 300: episodes that didn't reach the goal within this limit were forced to abort. In all cases, the parameter had a fixed value of 0.1. Robots were allowed to take any of 9
actions at each time step: move North, North-east, East, South-east, South,
South-west, West, North-west or no move at all. All of them were considered to
be deterministic, always reaching the intended position. Episodes were randomly
generated to have a more realistic measure of performance, rejecting those that
didn't fulfill all the constraints. We report results on the improvement introduced
by the use of the veto system (Experiment B) versus the conventional greedy
action selection (Experiment A) in the concurrent training of the agents. Videos
were generated for each of the experiments to visually validate the simulations.¹
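As an illustration of the difference between the two action-selection schemes compared here, the following sketch contrasts plain greedy selection over summed module Q-values with a veto variant in which constraint modules can forbid actions before the greedy choice is made. The Q-table layout, veto threshold and module names are assumptions made for exposition; they are not the exact formulation used in the experiments.

```python
import numpy as np

# The 9 deterministic actions: 8 compass moves plus "no move".
ACTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1),
           (-1, -1), (-1, 0), (-1, 1), (0, 0)]

def greedy_action(q_tables, state):
    """Experiment A (baseline): pick the action maximizing the sum of
    every module's Q-values for the current state."""
    combined = sum(q[state] for q in q_tables.values())  # length-9 vector
    return int(np.argmax(combined))

def veto_action(q_tables, state, constraint_modules, veto_threshold=-0.5):
    """Experiment B (veto): constraint modules first veto actions whose
    Q-value falls below a threshold; the greedy choice is then made over
    the remaining actions using the summed Q-values."""
    allowed = np.ones(len(ACTIONS), dtype=bool)
    for name in constraint_modules:
        allowed &= q_tables[name][state] >= veto_threshold
    if not allowed.any():        # if everything is vetoed, fall back to greedy
        allowed[:] = True
    combined = sum(q[state] for q in q_tables.values())
    combined = np.where(allowed, combined, -np.inf)
    return int(np.argmax(combined))
```

A hypothetical call would be veto_action(q_tables, state, ['Distances', 'Collisions', 'InGrid']), where q_tables maps each module name to its Q-table.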
Figure 3 shows the evolution of the percentage of episodes reaching the target as training progresses, for both experiments and for an increasing number of agents. In both cases, increasing the number of agents dramatically reduces the capacity to reach the goal. However, results are much better for the veto system (Figure 3(b)) than for the baseline greedy system (Figure 3(a)).
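To clarify how success-rate curves such as those in Figure 3 could be obtained, the following is a minimal bookkeeping sketch of an episode loop with the 300-step abort limit. The environment interface (reset/step), the window size and the select_action argument (e.g. greedy_action or a partial application of veto_action from the previous sketch) are hypothetical, and the Q-learning updates themselves are omitted.

```python
# Hypothetical episode loop recording the percentage of episodes that reach
# the goal. The environment interface is an assumed abstraction, not the
# authors' simulator; Q-value updates are omitted for brevity.
STEP_LIMIT = 300

def run_episodes(env, select_action, q_tables, n_episodes=1000, window=100):
    successes, curve = [], []
    for _ in range(n_episodes):
        state = env.reset()                 # randomly generated episode
        reached_goal = False
        for _ in range(STEP_LIMIT):         # abort after 300 steps
            action = select_action(q_tables, state)
            state, done = env.step(action)
            if done:
                reached_goal = True
                break
        successes.append(reached_goal)
        recent = successes[-window:]        # success rate over recent episodes
        curve.append(100.0 * sum(recent) / len(recent))
    return curve
```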
5 Conclusions
We have presented a veto-based modular RL technique suitable for dealing with L-MCRS, applying it to the multi-robot hose transport problem. Results show that the combined use of separate constraint-module training and a veto system leads to very good learning and success rates. We also studied the scalability of the system and, although increasing the number of agents slightly decreased the performance of the learning algorithm, the results obtained are very satisfactory.
In the future, our work will focus on introducing the learnt Q matrices in a simulation environment that includes more realistic hose models for further validation.
¹ Some videos can be downloaded from http://www.ehu.es/ccwintco/index.php/Borja-videos
 