Towards Reinforcement Learning with LCS - Design and Analysis of Learning Classifier Systems


developed in Chap. 5 can be used, and result in weight vector updates that

resemble those of XCS(F). Incidentally, this also demonstrates that XCS(F)

performs gradient descent without the need to be modified.
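
To make the shape of such a gradient-based weight update concrete, the following is a minimal sketch. It assumes the normalised least mean squares (NLMS) rule as the gradient descent method and a Q-Learning style target; the text above does not name the method, so this choice, the function names `nlms_update` and `q_target`, and the `step` parameter are illustrative assumptions rather than the book's own formulation.

```python
def nlms_update(weights, features, target, step=0.2):
    """One normalised LMS step: move the prediction w.x towards the target.

    Assumed form: w <- w + step * (target - w.x) * x / ||x||^2
    """
    norm_sq = sum(x * x for x in features)
    if norm_sq == 0.0:
        # A zero input carries no gradient information; leave the weights as is.
        return list(weights)
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    return [w + step * error * x / norm_sq for w, x in zip(weights, features)]


def q_target(reward, gamma, next_action_values):
    """Q-Learning style target: reward plus discounted best next action value."""
    return reward + gamma * max(next_action_values)


# Hypothetical usage: one update of a linear predictor towards a Q-Learning target.
weights = [0.0, 0.0]
features = [1.0, 2.0]
target = q_target(reward=1.0, gamma=0.9, next_action_values=[2.0, 3.0])
weights = nlms_update(weights, features, target, step=0.2)
```

With `step=1.0` the normalisation makes the updated prediction match the target exactly for the current input; smaller steps average over noisy targets.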

Regarding stability, it has been discussed which properties the approximation

operator provided by LCS has to satisfy in order to guarantee convergence with

approximate value iteration and policy iteration. These properties are all based

on a non-expansion with respect to some norm, where the norm determines which method

can be applied. An initial analysis has been provided, but no conclusive answers

have been given, pending further research.
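
The role of the non-expansion property can be illustrated with a small sketch. It does not use the LCS approximation operator; instead it assumes a simple state-aggregation averager, which is a standard example of an operator that is a non-expansion with respect to the maximum norm. All names, the tiny decision problem, and the sampling-based check are hypothetical and serve only as illustration.

```python
import random


def aggregate(values, groups):
    """Averager: replace each state's value by the mean over its group."""
    out = [0.0] * len(values)
    for group in groups:
        mean = sum(values[s] for s in group) / len(group)
        for s in group:
            out[s] = mean
    return out


def max_norm_distance(u, v):
    """Distance between two value vectors in the maximum norm."""
    return max(abs(a - b) for a, b in zip(u, v))


def looks_like_non_expansion(operator, size, trials=1000, seed=0):
    """Empirical check of ||A u - A v|| <= ||u - v|| on random vector pairs.

    Passing this check is evidence only, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-5.0, 5.0) for _ in range(size)]
        v = [rng.uniform(-5.0, 5.0) for _ in range(size)]
        if max_norm_distance(operator(u), operator(v)) > max_norm_distance(u, v) + 1e-12:
            return False
    return True


def bellman_backup(values, rewards, next_state, gamma):
    """(T V)(s) = max over actions a of r(s, a) + gamma * V(next(s, a))."""
    return [
        max(r + gamma * values[n] for r, n in zip(rewards[s], next_state[s]))
        for s in range(len(values))
    ]


def approximate_value_iteration(rewards, next_state, groups, gamma, sweeps=200):
    """Iterate V <- A(T V), with A the averager defined above."""
    values = [0.0] * len(rewards)
    for _ in range(sweeps):
        values = aggregate(bellman_backup(values, rewards, next_state, gamma), groups)
    return values


# Hypothetical deterministic problem: 4 states, 2 actions, reward only in state 3.
rewards = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
next_state = [[0, 1], [0, 2], [1, 3], [3, 3]]
groups = [[0, 1], [2, 3]]
values = approximate_value_iteration(rewards, next_state, groups, gamma=0.9)
```

Because the Bellman backup is a contraction in the maximum norm and the averager is a non-expansion in the same norm, their composition is again a contraction, so the iteration settles on a fixed point. An approximation operator that is a non-expansion only with respect to a different norm would not support this particular argument.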

Related to stability is also the issue of learning long action sequences, which

was shown to cause problems in XCS due to its accuracy definition. While a

preliminary modification to XCS solves this issue for particular problem types

[12], it is not universally applicable. The introduced optimality criterion seems

more promising in this respect, but definite results have to wait until an

incremental LCS implementation is available that satisfies this criterion.

Overall, using LCS to approximate the value or action-value function in RL

is appealing as LCS dynamically adjust to the form of this function and thus

might provide a better approximation than standard function approximation

techniques. It should be noted, however, that the field of RL is moving quickly,

and that Q-Learning is far from the best method currently available.

Hence, in order for LCS to be a competitive approach to sequential decision

tasks, they also need to keep pace with new developments in RL, some of which

were discussed when detailing the exploration/exploitation dilemma that is an

essential component of RL.

In summary, there is clearly still plenty of work to be done until LCS

can provide the same formal development as RL currently does. Nonetheless, the

initial formal basis is provided in this chapter, upon which other research can

build further analysis and improvements to how LCS handle sequential decision

tasks effectively, competitively, and with high reliability.
