1. If the episode finishes at the goal, we give a reward.
2. If the episode finishes because the step limit is reached, we apply a penalty.
Later, once we have the reward for the episode, we go over all of its steps and
calculate, applying equation 1, the Q value for each step. In this way we reward
all the steps involved in a successful episode.
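The episode-level update described above can be sketched as follows. This is only an illustration under stated assumptions: equation 1 is not reproduced in this section, so the sketch assumes it has the standard Q-learning form, and it assumes the single episode reward is applied to every recorded step. The names `episode_reward` and `update_from_episode`, the constants, and the choice to replay the steps from last to first are all hypothetical.

```python
from collections import defaultdict

# All numeric values below are assumptions chosen for illustration only.
GOAL_REWARD = 1.0       # given when the episode finishes at the goal
LIMIT_PENALTY = -1.0    # given when the step limit is reached
ALPHA = 0.5             # learning rate
GAMMA = 0.9             # discount factor


def episode_reward(reached_goal):
    """Return the single reward assigned to the whole episode."""
    return GOAL_REWARD if reached_goal else LIMIT_PENALTY


def update_from_episode(q, steps, reached_goal, actions):
    """Update Q for every step of a finished episode.

    q       -- dict mapping (state, action) to a Q value
    steps   -- list of (state, action, next_state) in the order visited
    actions -- all actions available to the agent
    Assumes equation 1 is the standard Q-learning update.
    """
    reward = episode_reward(reached_goal)
    # Replay from the last step to the first (an assumed ordering) so the
    # values of later steps are already updated when earlier ones use them.
    for state, action, next_state in reversed(steps):
        best_next = max(q[(next_state, a)] for a in actions)
        target = reward + GAMMA * best_next
        q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q


if __name__ == "__main__":
    q_table = defaultdict(float)
    episode = [(0, "push_left", 1), (1, "push_right", 2)]
    update_from_episode(q_table, episode, True, ["push_left", "push_right"])
    for key, value in sorted(q_table.items()):
        print(key, round(value, 3))
```

With these assumed values, every step of an episode that reaches the goal has its Q value raised, while an episode that hits the step limit lowers the Q values of all its steps.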
Figure 4 shows an example episode: the last episode, in which the robots push
the stick from the initial position to the goal, represented by a green line.
Figure 5 shows the result of the experiment with nine states. Convergence
takes place before episode 500.
Fig. 4. Examples from one learning episode
Fig. 5. Nine states convergence curve
3.2 Four States Refinement
We analyze the environment and observe that its definition can be simplified
to four states. The relations between the different states are shown in
figure 6. With four states, convergence is reached in far fewer episodes
without any loss of quality in the solution.
 