3. The distance between P and the goal decreases, and the distance between P' and the goal increases (Figure 2b).
4. The distance between P and the goal increases, and the distance between P' and the goal decreases (Figure 3b).
In the same way, we define four possible actions to obtain these different states, which establish the collaboration between the two robots (Robot 1 and Robot 2):
1. Robot 1 and Robot 2 push in the same direction; they push towards the goal.
2. Robot 1 pushes in the goal direction and Robot 2 pushes in the opposite direction.
3. Robot 1 pushes in the opposite direction and Robot 2 pushes in the goal direction.
4. Robot 1 and Robot 2 push in the same direction, opposite to the goal.
We consider that the robots do not push on the same part at the same moment, i.e., if Robot 1 pushes on part P then Robot 2 necessarily has to push on part P'. As stated above, the two robots have the same configuration, so it does not matter which robot pushes on which part.
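As a concrete illustration only (the paper gives no code), the four joint actions can be encoded as pairs of pushing directions for Robot 1 and Robot 2; the labels GOAL_DIR and AWAY_DIR are hypothetical names introduced here:

# Sketch of the four joint actions as (Robot 1, Robot 2) pushing directions.
# GOAL_DIR and AWAY_DIR are assumed labels, not names used in the paper.
GOAL_DIR, AWAY_DIR = "towards_goal", "away_from_goal"

ACTIONS = [
    (GOAL_DIR, GOAL_DIR),   # 1. both robots push towards the goal
    (GOAL_DIR, AWAY_DIR),   # 2. Robot 1 towards the goal, Robot 2 away from it
    (AWAY_DIR, GOAL_DIR),   # 3. Robot 1 away from the goal, Robot 2 towards it
    (AWAY_DIR, AWAY_DIR),   # 4. both robots push away from the goal
]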
3 Experimental Results
3.1 Nine States Configuration Approach
In these experiments we introduce some special states but keep the same number of possible actions defined in Subsection 2.3. We consider five new "no move" states (an illustrative classification sketch is given after the list):
1. The distances between both P and P' and the goal stay the same.
2. The distance between P and the goal stays the same, and the distance between P' and the goal decreases.
3. The distance between P and the goal stays the same, and the distance between P' and the goal increases.
4. The distance between P and the goal decreases, and the distance between P' and the goal stays the same.
5. The distance between P and the goal increases, and the distance between P' and the goal stays the same.
We start our experiments by evaluating every single step: if the distance to the goal decreases, we reward the action. With this approach we define a matrix of rewards and update the Q matrix at every step (a sketch of this per-step update is given below). This approach gives good results in the experiments, but only in the cases where the goal and the stick follow parallel paths. Another problem with this approach is adjusting the rewards matrix.
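This per-step scheme corresponds to the standard one-step Q-learning update described by Sutton and Barto [6]; the sketch below assumes a learning rate alpha, a discount factor gamma, and a hand-tuned reward matrix R, whose concrete values are illustrative and not taken from the paper:

import numpy as np

N_STATES, N_ACTIONS = 9, 4                # nine states, four joint actions
Q = np.zeros((N_STATES, N_ACTIONS))       # Q matrix, updated at every step
R = np.zeros((N_STATES, N_ACTIONS))       # hand-tuned reward matrix (hard to adjust)

def step_update(s, a, s_next, alpha=0.1, gamma=0.9):
    # One-step Q-learning update [6]; alpha and gamma are illustrative values.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])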
Based on the studies of Sutton and Barto [6,8], we therefore decide to evaluate the complete episode and reward only the behavior that finishes with the stick at the goal. Now we only have two cases: