For model-free RL, this means modelling the action-value function estimates by probability distributions for each state/action pair. Unfortunately, this approach is not analytically tractable, as the distributions are strongly correlated due to the state transitions. This leads to complex posterior distributions that cannot be expressed analytically. A workaround is to use various assumptions and approximations that make the method less accurate but analytically and computationally tractable. This workaround was used to develop Bayesian Q-Learning [68], which, amongst other things, assumes the independence of all action-value function estimates, and uses an action selection scheme that maximises the information gain. Its performance improvement over methods based on confidence intervals is noticeable but moderate.
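To make the independence assumption concrete, the following sketch maintains a separate Normal-Gamma posterior over Q(s, a) for each state/action pair. It is only an illustration of the general idea, not the exact algorithm of [68]: the update uses the posterior mean of the best successor action as a point estimate of the one-step return, and exploration is driven by posterior (Thompson-style) sampling rather than the information-gain criterion. All class and method names are illustrative.

```python
import numpy as np

# Sketch of Bayesian Q-learning with independent Normal-Gamma posteriors
# over Q(s, a).  The successor value is approximated by a point estimate,
# and exploration uses posterior sampling instead of the VPI criterion of
# [68]; both are simplifying assumptions made for brevity.

class NormalGammaQ:
    def __init__(self, n_states, n_actions, gamma=0.95,
                 mu0=0.0, lam0=1.0, alpha0=2.0, beta0=1.0):
        self.gamma = gamma
        shape = (n_states, n_actions)
        self.mu = np.full(shape, mu0)        # posterior mean of Q(s, a)
        self.lam = np.full(shape, lam0)      # pseudo-counts for the mean
        self.alpha = np.full(shape, alpha0)  # Gamma shape (precision)
        self.beta = np.full(shape, beta0)    # Gamma rate (precision)

    def sample_q(self, s):
        """Draw one plausible Q-vector for state s from the posterior."""
        tau = np.random.gamma(self.alpha[s], 1.0 / self.beta[s])
        return np.random.normal(self.mu[s], 1.0 / np.sqrt(self.lam[s] * tau))

    def select_action(self, s):
        # Thompson-style exploration: act greedily w.r.t. a posterior sample.
        return int(np.argmax(self.sample_q(s)))

    def update(self, s, a, r, s_next, done):
        # One-step return sample, using the posterior mean of the best
        # successor action as a point estimate (an approximation).
        x = r if done else r + self.gamma * self.mu[s_next].max()
        mu, lam = self.mu[s, a], self.lam[s, a]
        # Standard Normal-Gamma conjugate update for a single observation x.
        self.mu[s, a] = (lam * mu + x) / (lam + 1.0)
        self.lam[s, a] = lam + 1.0
        self.alpha[s, a] += 0.5
        self.beta[s, a] += 0.5 * lam * (x - mu) ** 2 / (lam + 1.0)
```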
Bayesian model-based RL is more popular as it provides cleaner implementations. It is based on modelling the transition and reward function estimates by probability distributions that are updated with new information. This results in a problem that can be cast as a POMDP, and can be solved with the same methods [84]. Unfortunately, this implies that it comes with the same complexity, which makes it unsuitable for application to large problems. Nonetheless, some implementations have been devised (for example, [185]), and research in Bayesian RL is still very active. It is to be hoped that its complexity can be reduced by the use of approximations without losing too much accuracy, while maintaining the full distributions that are the advantage of the Bayesian approach.
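As an illustration of the belief maintenance in Bayesian model-based RL, the following sketch keeps Dirichlet posteriors over the transition probabilities and Normal posteriors (with an assumed known observation noise) over the mean rewards of a discrete MDP. Both updates are available in closed form; the costly part, planning over such beliefs as a POMDP [84], is deliberately omitted. All names and hyperparameters are illustrative.

```python
import numpy as np

# Sketch of the belief update in Bayesian model-based RL for a discrete MDP.
# Transitions get Dirichlet posteriors, mean rewards get Normal posteriors
# with known noise variance; planning in the resulting belief space is the
# intractable part and is not shown here.

class MDPBelief:
    def __init__(self, n_states, n_actions,
                 dir_prior=1.0, r_mean0=0.0, r_var0=1.0, r_noise=1.0):
        self.trans_counts = np.full((n_states, n_actions, n_states), dir_prior)
        self.r_mean = np.full((n_states, n_actions), r_mean0)
        self.r_var = np.full((n_states, n_actions), r_var0)
        self.r_noise = r_noise  # assumed known reward noise variance

    def update(self, s, a, r, s_next):
        # Dirichlet posterior: increment the count of the observed transition.
        self.trans_counts[s, a, s_next] += 1.0
        # Conjugate Normal update of the mean reward for (s, a).
        var, mean = self.r_var[s, a], self.r_mean[s, a]
        k = var / (var + self.r_noise)
        self.r_mean[s, a] = mean + k * (r - mean)
        self.r_var[s, a] = (1.0 - k) * var

    def expected_transitions(self, s, a):
        # Posterior mean of the transition distribution P(. | s, a).
        counts = self.trans_counts[s, a]
        return counts / counts.sum()

    def sample_mdp(self):
        # Draw one plausible MDP (P, R) from the posterior, as used e.g. by
        # posterior-sampling approaches to exploration.
        P = np.apply_along_axis(np.random.dirichlet, -1, self.trans_counts)
        R = np.random.normal(self.r_mean, np.sqrt(self.r_var))
        return P, R
```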
So far, the only form of Bayesian RL that has been used with LCS is Bayesian Q-Learning, by using Bayesian classifier models within a standard XCS(F), with the result of more effective and stable action selection when compared to XCS(F) [1]. This approach could be extended to use the full Bayesian model that was introduced here, once an incremental implementation is available. The use of model-based Bayesian RL requires anticipatory LCS, but its immediate contribution is questionable due to the high complexity of the RL method itself.
9.6 Summary
Despite sequential decision tasks being the prime motivator for LCS, they are still the ones which LCS handle least successfully. This chapter provides a primer on how to use dynamic programming and reinforcement learning to handle such tasks, and on how LCS can be combined with either approach from first principles. Also, some important issues regarding such combinations, such as stability, long path learning, and the exploration/exploitation dilemma, were discussed.
An essential part of the LCS type discussed here is that classifiers are trained independently. This is not completely true when using LCS with reinforcement learning, as the target values that the classifiers are trained on are based on the global prediction, which is formed by all matching classifiers in combination. In that sense, classifiers interact when forming their action-value function estimates. Still, besides combining classifier predictions to form the target values, independent classifier training still forms the basis of this model type, even when used in combination with RL. Thus, the update equations