3. The distance between P and the goal decreases, and the distance between P' and the goal increases (Figure 2b).
4. The distance between P and the goal increases, and the distance between P' and the goal decreases (Figure 3b).
In the same way, we define four possible actions to obtain these different states, which establish the collaboration between the two robots (Robot 1 and Robot 2):
1. Robot 1 and Robot 2 push in the same direction; they push towards the goal.
2. Robot 1 pushes in the goal direction and Robot 2 pushes in the opposite direction.
3. Robot 1 pushes in the opposite direction and Robot 2 pushes in the goal direction.
4. Robot 1 and Robot 2 push in the same direction, opposite to the goal.
We consider that the robots do not push on the same part at the same moment, i.e., if Robot 1 pushes on part P then Robot 2 necessarily has to push on part P'. As stated above, the two robots have the same configuration, so it does not matter which robot pushes on which part.
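As a concrete illustration only (the paper gives no code), the four joint actions can be encoded as pairs of pushing directions for Robot 1 and Robot 2; the labels GOAL_DIR and AWAY_DIR are hypothetical names introduced here:

# Sketch of the four joint actions as (Robot 1, Robot 2) pushing directions.
# GOAL_DIR and AWAY_DIR are assumed labels, not names used in the paper.
GOAL_DIR, AWAY_DIR = "towards_goal", "away_from_goal"

ACTIONS = [
    (GOAL_DIR, GOAL_DIR),   # 1. both robots push towards the goal
    (GOAL_DIR, AWAY_DIR),   # 2. Robot 1 towards the goal, Robot 2 away from it
    (AWAY_DIR, GOAL_DIR),   # 3. Robot 1 away from the goal, Robot 2 towards it
    (AWAY_DIR, AWAY_DIR),   # 4. both robots push away from the goal
]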
3 Experimental Results
3.1 Nine States Configuration Approach
In these experiments we introduce some special states but keep the same number of possible actions defined in Subsection 2.3. We consider five new "no move" states (an illustrative classification sketch is given after the list):
1. The distances between both P and P' and the goal stay the same.
2. The distance between P and the goal stays the same, and the distance between P' and the goal decreases.
3. The distance between P and the goal stays the same, and the distance between P' and the goal increases.
4. The distance between P and the goal decreases, and the distance between P' and the goal stays the same.
5. The distance between P and the goal increases, and the distance between P' and the goal stays the same.
We start our experiments by evaluating every single step: if the distance to the goal decreases, we reward the action. With this approach we define a matrix of rewards and update the Q matrix at every step (a sketch of this per-step update is given below). This approach gives good results in the experiments, but only in the cases where the goal and the stick follow parallel paths. Another problem with this approach is adjusting the rewards matrix.
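This per-step scheme corresponds to the standard one-step Q-learning update described by Sutton and Barto [6]; the sketch below assumes a learning rate alpha, a discount factor gamma, and a hand-tuned reward matrix R, whose concrete values are illustrative and not taken from the paper:

import numpy as np

N_STATES, N_ACTIONS = 9, 4                # nine states, four joint actions
Q = np.zeros((N_STATES, N_ACTIONS))       # Q matrix, updated at every step
R = np.zeros((N_STATES, N_ACTIONS))       # hand-tuned reward matrix (hard to adjust)

def step_update(s, a, s_next, alpha=0.1, gamma=0.9):
    # One-step Q-learning update [6]; alpha and gamma are illustrative values.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])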
Based on the studies of Sutton and Barto [6,8], we therefore decide to evaluate the complete episode and reward only the behavior that finishes with the stick at the goal. Now we only have two cases: