3. Distances: This module models the length constraint on the hose segment ahead of the robot. It receives a negative reward every time the segment exceeds the maximum allowed length, and a neutral one otherwise.
4. Collisions: This module learns to avoid collisions between robots and hose segments. The state is modeled as a set of boolean values, each of which indicates whether there is an obstacle (robot or hose segment) within one time-step's reach in a direction corresponding to one of the allowed actions.
5. InGrid: The robots are required to stay within predefined bounds. This module gives a negative reward if the position is outside the allowed bounds, and a neutral one otherwise. A minimal sketch of these constraint modules is given after this list.
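To make the decomposition concrete, the following is a minimal sketch of how the three constraint modules above could compute their reward signals. The function names, numeric reward values and grid representation are illustrative assumptions for exposition, not the implementation used in the paper.

```python
# Illustrative sketch of the constraint modules' reward signals.
# Names and numeric reward values are assumptions, not the authors' code.

NEGATIVE, NEUTRAL = -1.0, 0.0

def distances_reward(segment_length, max_length):
    """Distances module: penalize overstretching the hose segment ahead."""
    return NEGATIVE if segment_length > max_length else NEUTRAL

def collisions_state(robot_pos, obstacles, actions):
    """Collisions module state: one boolean per allowed action, True if that
    action would place the robot on an occupied cell (robot or hose segment)."""
    x, y = robot_pos
    return tuple((x + dx, y + dy) in obstacles for dx, dy in actions)

def collisions_reward(next_pos, obstacles):
    """Collisions module: penalize moving onto an occupied cell."""
    return NEGATIVE if next_pos in obstacles else NEUTRAL

def ingrid_reward(pos, grid_size=21):
    """InGrid module: penalize leaving the predefined grid bounds."""
    x, y = pos
    inside = 0 <= x < grid_size and 0 <= y < grid_size
    return NEUTRAL if inside else NEGATIVE
```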
4 Experiments
A set of experiments was conducted to test our modular multi-agent learning approach for L-MCRS. We used a grid of 21 × 21 cells, a maximum hose length L_hose of 26 cell units, and the step-count limit for episodes was set to 300: episodes that didn't reach the goal within this limit were forced to abort. In all cases, the parameter had a fixed value of 0.1. Robots were allowed to take any of 9
actions at each time step: move North, North-east, East, South-east, South,
South-west, West, North-west or no move at all. All of them were considered to
be deterministic, always reaching the intended position. Episodes were randomly
generated to have a more realistic measure of performance, rejecting those that
didn't fulfill all the constraints. We report results on the improvement introduced
by the use of the veto system (Experiment B) versus the conventional greedy
action selection (Experiment A) in the concurrent training of the agents. Videos
were generated for each of the experiments to visually validate the simulations.¹
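As an illustration of the difference between the two action-selection schemes compared here, the following sketch contrasts plain greedy selection over summed module Q-values with a veto variant in which constraint modules can forbid actions before the greedy choice is made. The Q-table layout, veto threshold and module names are assumptions made for exposition; they are not the exact formulation used in the experiments.

```python
import numpy as np

# The 9 deterministic actions: 8 compass moves plus "no move".
ACTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1),
           (-1, -1), (-1, 0), (-1, 1), (0, 0)]

def greedy_action(q_tables, state):
    """Experiment A (baseline): pick the action maximizing the sum of
    every module's Q-values for the current state."""
    combined = sum(q[state] for q in q_tables.values())  # length-9 vector
    return int(np.argmax(combined))

def veto_action(q_tables, state, constraint_modules, veto_threshold=-0.5):
    """Experiment B (veto): constraint modules first veto actions whose
    Q-value falls below a threshold; the greedy choice is then made over
    the remaining actions using the summed Q-values."""
    allowed = np.ones(len(ACTIONS), dtype=bool)
    for name in constraint_modules:
        allowed &= q_tables[name][state] >= veto_threshold
    if not allowed.any():        # if everything is vetoed, fall back to greedy
        allowed[:] = True
    combined = sum(q[state] for q in q_tables.values())
    combined = np.where(allowed, combined, -np.inf)
    return int(np.argmax(combined))
```

A hypothetical call would be veto_action(q_tables, state, ['Distances', 'Collisions', 'InGrid']), where q_tables maps each module name to its Q-table.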
Figure 3 shows the evolution of the percentage of episodes reaching the target as training progresses, for both experiments and for an increasing number of agents. In both cases, increasing the number of agents dramatically reduces the capacity to reach the goal. However, results are much better for the veto system (Figure 3(b)) than for the baseline greedy system (Figure 3(a)).
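To clarify how success-rate curves such as those in Figure 3 could be obtained, the following is a minimal bookkeeping sketch of an episode loop with the 300-step abort limit. The environment interface (reset/step), the window size and the select_action argument (e.g. greedy_action or a partial application of veto_action from the previous sketch) are hypothetical, and the Q-learning updates themselves are omitted.

```python
# Hypothetical episode loop recording the percentage of episodes that reach
# the goal. The environment interface is an assumed abstraction, not the
# authors' simulator; Q-value updates are omitted for brevity.
STEP_LIMIT = 300

def run_episodes(env, select_action, q_tables, n_episodes=1000, window=100):
    successes, curve = [], []
    for _ in range(n_episodes):
        state = env.reset()                 # randomly generated episode
        reached_goal = False
        for _ in range(STEP_LIMIT):         # abort after 300 steps
            action = select_action(q_tables, state)
            state, done = env.step(action)
            if done:
                reached_goal = True
                break
        successes.append(reached_goal)
        recent = successes[-window:]        # success rate over recent episodes
        curve.append(100.0 * sum(recent) / len(recent))
    return curve
```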
5 Conclusions
We have presented a veto-based modular RL technique suitable for dealing with L-MCRS, applying it to the multi-robot hose transport problem. Results show that the combined use of separate constraint-module training and a veto system leads to very good learning and success rates. We also studied the scalability of the system and, although increasing the number of agents slightly decreased the performance of the learning algorithm, the results obtained are very satisfactory.
In the future, our work will focus on introducing the learnt Q matrices in a simulation environment that includes more realistic hose models for further validation.
¹ Some videos can be downloaded from http://www.ehu.es/ccwintco/index.php/Borja-videos
 