developed in Chap. 5 can be used and result in weight vector updates that
resemble those of XCS(F). Incidentally, this also demonstrates that XCS(F)
performs gradient descent without needing any modification.
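To make the kind of update concrete, here is a minimal sketch of a single classifier's linear prediction w^T x being moved toward a Q-learning-style target r + γ max Q(s'). It uses the normalized (NLMS) delta rule commonly associated with XCSF, where the step size is scaled by 1/||x||^2. The helper name `nlms_update`, the step size `beta`, and all numeric values are hypothetical and chosen for illustration. This is not the derivation from Chap. 5. The target is treated as a constant during each update, as in Q-learning.

```python
import numpy as np

def nlms_update(w, x, target, beta=0.2):
    """One normalized-LMS step moving the prediction w @ x toward target.

    Hypothetical helper: the step is scaled by 1 / ||x||^2, so each call
    shrinks the prediction error for this input by the factor (1 - beta).
    """
    error = target - w @ x
    return w + (beta / (x @ x)) * error * x

# Assumed values: reward r, discount gamma, and the largest predicted
# action-value of the next state. The target is held fixed during the update.
r, gamma, q_next_max = 1.0, 0.9, 2.0
target = r + gamma * q_next_max  # 2.8

x = np.array([1.0, 0.5])  # input augmented with a constant 1 for the bias weight
w = np.zeros(2)
for _ in range(50):
    w = nlms_update(w, x, target)
print(w @ x)  # approaches the target 2.8
```

Because the normalization cancels the input's magnitude, the error for a fixed input decays geometrically at rate (1 - beta), independent of how large x is.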
Regarding stability, the properties that the approximation operator provided
by LCS has to satisfy to guarantee the convergence of approximate value
iteration and policy iteration have been discussed. All of these properties
require the operator to be a non-expansion with respect to some norm, and the
norm determines which method can be applied. An initial analysis has been
provided, but conclusive answers have to await further research.
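The standard argument behind these conditions can be checked numerically. The Bellman optimality operator T is a γ-contraction in the maximum norm; if the approximation operator Π is a non-expansion in that same norm, the composition ΠT is again a γ-contraction, so approximate value iteration V ← ΠTV converges to a unique fixed point. The sketch below assumes a small random MDP and uses averaging over a fixed partition of states as Π, a hypothetical stand-in for non-overlapping classifiers. Averaging of this kind is a max-norm non-expansion; it is not the general LCS approximation operator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9

# Random MDP (assumed for illustration): P[a, s, s'] transition probabilities
# with rows summing to one, and R[a, s] expected immediate rewards.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_actions, n_states))

def bellman(V):
    """Bellman optimality operator T, a gamma-contraction in the max norm."""
    return (R + gamma * P @ V).max(axis=0)

# Hypothetical partition: each state is matched by exactly one "classifier",
# whose prediction is the mean value over the states it matches.
groups = np.array([0, 0, 1, 1, 1, 2])

def approximate(V):
    """Averaging within groups, a non-expansion in the max norm."""
    means = np.bincount(groups, weights=V) / np.bincount(groups)
    return means[groups]

def max_norm(v):
    return np.abs(v).max()

# Empirically confirm that Pi T shrinks max-norm distances by at least gamma.
for _ in range(1000):
    U, W = rng.normal(size=n_states), rng.normal(size=n_states)
    lhs = max_norm(approximate(bellman(U)) - approximate(bellman(W)))
    assert lhs <= gamma * max_norm(U - W) + 1e-12

# Approximate value iteration then converges to the unique fixed point of Pi T.
V = np.zeros(n_states)
for k in range(1000):
    V_new = approximate(bellman(V))
    delta = max_norm(V_new - V)
    V = V_new
    if delta < 1e-10:
        break
print(f"converged after {k + 1} iterations:", V)
```

If Π were only a non-expansion in some other norm, such as a weighted 2-norm, this max-norm argument would no longer apply, which is why the choice of norm decides which method can be used.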
Related to stability is the issue of learning long action sequences, which
has been shown to cause problems in XCS due to its accuracy definition. While a
preliminary modification to XCS solves this issue for particular problem types
[12], it is not universally applicable. The optimality criterion introduced here
seems more promising in this respect, but definitive results have to wait until
an incremental LCS implementation that satisfies this criterion is available.
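One commonly given explanation, assumed here for illustration, is that XCS judges accuracy against an absolute error threshold ε0, whereas the optimal values along a chain decay geometrically with the distance to the reward. The gap between the values of neighbouring states therefore eventually falls below ε0, and a classifier covering both states can appear accurate even though they need to be distinguished. The sketch computes where this happens for parameter values often used in the XCS literature (reward 1000, γ = 0.71, ε0 = 10), which are assumptions here; the function names are hypothetical.

```python
def value_gaps(reward=1000.0, gamma=0.71, steps=100):
    """Gap between the optimal values of states k and k+1 steps from the reward.

    Assumes the value k steps away is reward * gamma**k, so the gap is
    reward * gamma**k * (1 - gamma).
    """
    return [reward * gamma**k * (1 - gamma) for k in range(steps)]

def first_indistinguishable(eps0=10.0, **kwargs):
    """First distance k at which the value gap drops below the threshold eps0."""
    for k, gap in enumerate(value_gaps(**kwargs)):
        if gap < eps0:
            return k
    return None  # no gap below eps0 within the examined number of steps

print(first_indistinguishable())           # with gamma = 0.71
print(first_indistinguishable(gamma=0.9))  # a larger discount factor
```

A larger discount factor postpones the point at which neighbouring values become indistinguishable but cannot remove it, since the gaps still shrink geometrically with distance.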
Overall, using LCS to approximate the value or action-value function in RL
is appealing, as LCS dynamically adjust to the form of this function and thus
might provide a better approximation than standard function approximation
techniques. It should be noted, however, that the field of RL is moving quickly,
and that Q-Learning is far from the best method currently available.
Hence, for LCS to be a competitive approach to sequential decision
tasks, they also need to keep pace with new developments in RL, some of which
were discussed when detailing the exploration/exploitation dilemma, an
essential component of RL.
In summary, there is clearly still plenty of work to be done before LCS can
offer the same formal development as RL currently does. Nonetheless, this
chapter provides the initial formal basis on which further research can build,
extending the analysis and improving how LCS handle sequential decision tasks
effectively, competitively, and reliably.