Concurrent Modular Q-Learning with Local Rewards on Linked Multi-Component Robotic Systems - Foundations on Natural and Artificial Computation - page 151

3 Hose Transport Application

A group of $n$ agents attached at fixed points of a hose must move the hose from an initial state to a goal configuration. The source of the hose is assumed to be located in the $(0, 0)$ cell and robots are identified by an integer number in the range $[0, n-1]$, where the robot carrying the tip of the hose is labeled as $0$ and the remaining labels are assigned according to position on the hose, as shown in Figure 2. In this paper, simple line segments will be used to represent the hose links.

Let $P_i = (P_i^x, P_i^y)$ denote the discrete grid coordinates of the $i$th agent at any time during the simulation. The task consists in reaching a target configuration of the robots $G = \{G_0, G_1, \ldots, G_{n-1}\}$, with $G_i = (G_i^x, G_i^y)$, starting from the initial hose configuration $I = \{I_0, I_1, \ldots, I_{n-1}\}$, with $I_i = (I_i^x, I_i^y)$. The hose between the $i$th and $(i+1)$th robots is represented as the line segment $P_i - P_{i+1}$, and the hose as a whole has a maximum nominal length of $L_{hose}$ times the size of the grid cells. All segments have a maximum length $L_m$, which is set from $L_{hose}$. Because each agent has a different goal, $P_i \to G_i$, all rewards were local.
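The configuration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: segment length is measured as Euclidean distance in grid cells, the maximum segment length is passed in as a parameter `l_max`, and the function names and the example coordinates are hypothetical, not taken from the paper.

```python
import math

def segment_lengths(positions):
    """Length of each hose segment P_i - P_{i+1}, in grid cells.

    Euclidean distance is an assumption; the text does not name the metric.
    """
    return [math.dist(p, q) for p, q in zip(positions, positions[1:])]

def is_feasible(positions, l_max):
    """True if no segment between consecutive robots exceeds l_max."""
    return all(length <= l_max for length in segment_lengths(positions))

def agents_at_goal(positions, goals):
    """Per-agent goal test: agent i is done when P_i equals G_i."""
    return [p == g for p, g in zip(positions, goals)]

# Hypothetical three-robot example; robot 0 carries the tip of the hose.
positions = [(4, 0), (2, 0), (1, 0)]
goals = [(4, 3), (2, 0), (1, 1)]

print(segment_lengths(positions))
print(is_feasible(positions, l_max=2.0))
print(agents_at_goal(positions, goals))
```

Because each agent is tested only against its own goal $G_i$, the goal test returns one flag per agent rather than a single flag for the whole team, which is what makes a local reward possible.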

Fig. 2. Representation of the robots and the goal on the grid

3.1 Modules

Our experiments involved a variety of module combinations (including both homogeneous and heterogeneous agents) and the best results were achieved using two goal-modules and three constraint-modules:

1. Goal-1: This module models the state as the combination of the distance to the agent's own goal and the angle formed by the hose segment and the goal. Training uses a local reward: whenever the agent reached its goal, a positive reward was given, otherwise a neutral value.

2. Goal-2: A simplification modeling the state as the distance to the goal.
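The two goal-module state descriptions can be illustrated with a minimal sketch. Several details here are assumptions made for the example and are not given in the text: the distance is taken as Manhattan distance, the angle is measured between the segment towards a neighbouring robot and the direction to the goal, the angle is rounded to one of `n_bins` sectors, and the reward values 1.0 and 0.0 stand in for "positive" and "neutral". All function names are hypothetical.

```python
import math

def goal2_state(pos, goal):
    """Goal-2 style state: distance to the agent's own goal.

    Manhattan distance is an assumption; the text only says "distance".
    """
    return abs(goal[0] - pos[0]) + abs(goal[1] - pos[1])

def goal1_state(pos, neighbor, goal, n_bins=8):
    """Goal-1 style state: (distance to goal, discretized angle).

    The angle is taken between the hose segment pos -> neighbor and the
    direction pos -> goal, rounded to the nearest of n_bins sectors.
    Both the angle convention and the binning are assumptions.
    """
    segment_dir = math.atan2(neighbor[1] - pos[1], neighbor[0] - pos[0])
    goal_dir = math.atan2(goal[1] - pos[1], goal[0] - pos[0])
    angle = (goal_dir - segment_dir) % (2 * math.pi)
    angle_bin = round(angle / (2 * math.pi) * n_bins) % n_bins
    return goal2_state(pos, goal), angle_bin

def local_reward(pos, goal, positive=1.0, neutral=0.0):
    """Positive reward when the agent is on its own goal, else neutral.

    The numeric values are placeholders, not values from the paper.
    """
    return positive if pos == goal else neutral

# Hypothetical agent at (2, 2), neighbour at (1, 2), goal at (2, 5).
print(goal2_state((2, 2), (2, 5)))
print(goal1_state((2, 2), (1, 2), (2, 5)))
print(local_reward((2, 2), (2, 5)))
```

Dropping the angle, as Goal-2 does, shrinks the state space by a factor of `n_bins` in this sketch, at the cost of losing information about how the hose is oriented relative to the goal.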
