Study of a Multi-Robot Collaborative Task through Reinforcement Learning - Foundations on Natural and Artificial Computation

1. If the episode finishes at the goal, we give a reward.

2. If the episode finishes because the step limit is reached, we apply a penalty.
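The two terminal rules can be sketched as a small Python function. The function name `episode_reward` and the magnitudes `GOAL_REWARD` and `STEP_LIMIT_PENALTY` are hypothetical. The excerpt does not give the values used in the experiments.

```python
# Hypothetical magnitudes: the excerpt does not state the values actually used.
GOAL_REWARD = 1.0
STEP_LIMIT_PENALTY = -1.0


def episode_reward(reached_goal: bool, steps_taken: int, max_steps: int) -> float:
    """Terminal reward for a finished episode (hypothetical helper)."""
    if reached_goal:
        # Rule 1: the episode finished at the goal.
        return GOAL_REWARD
    if steps_taken >= max_steps:
        # Rule 2: the episode finished because it hit the step limit.
        return STEP_LIMIT_PENALTY
    # Neither rule applies, so the episode has not finished yet.
    raise ValueError("episode_reward() is only defined for finished episodes")
```

For example, `episode_reward(False, 200, 200)` returns the step-limit penalty.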

Once the reward for the episode is known, we go back over all of its steps and calculate the Q value for each step by applying equation 1. In this way, every step involved in a successful episode is rewarded.
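Below is a minimal sketch of this per-episode update. Equation 1 is not reproduced in this excerpt, so the sketch assumes it is the standard one-step Q-learning rule Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. It also assumes that only the final step carries the episode reward and that the steps are visited in reverse order. The names `update_episode`, `ALPHA` and `GAMMA`, along with the example states and actions, are hypothetical.

```python
from collections import defaultdict

# Assumed hyperparameters (not given in the excerpt).
ALPHA = 0.5  # learning rate
GAMMA = 0.9  # discount factor


def update_episode(q, episode, final_reward, actions):
    """Apply an assumed Q-learning update to every step of a finished episode.

    q: dict mapping (state, action) to a Q value, e.g. defaultdict(float)
    episode: list of (state, action, next_state) tuples in the order they occurred
    final_reward: terminal reward or penalty obtained by the episode
    actions: all actions available in each state
    """
    last = len(episode) - 1
    # Walk the steps backwards so the terminal reward reaches earlier steps in one sweep.
    for i in range(last, -1, -1):
        state, action, next_state = episode[i]
        if i == last:
            # Terminal transition: the target is just the episode reward (no bootstrap).
            target = final_reward
        else:
            # Intermediate transition: zero immediate reward plus discounted best next value.
            target = GAMMA * max(q[(next_state, a)] for a in actions)
        q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q


# Usage: a two-step episode that reaches the goal.
q_table = defaultdict(float)
update_episode(q_table, [("s0", "push", "s1"), ("s1", "push", "goal")], 1.0, ["push", "rotate"])
```

Visiting the steps backwards is a design choice. It lets the terminal reward reach the first step of the episode in a single sweep, whereas a forward sweep would propagate it only one step further back per episode.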

Figure 4 shows an example episode: the last learning episode, in which the robots push the stick from its initial position to the goal, represented by a green line.

Figure 5 shows the result of the experiment with nine states. Convergence takes place before episode 500.

Fig. 4. Examples from one learning episode

Fig. 5. Nine states convergence curve

3.2 Four States Refinement

After analyzing the environment, we observe that its definition can be simplified to four states. The relations between the different states are shown in Figure 6. With this four-state simplification, convergence is reached in fewer episodes without any loss of quality in the solution.