state transition occurs when the agent performs an action from the action set A. Each of these state transitions is mediated by a scalar reward. The aim of the agent is to find a policy, which is a mapping X → A that determines the action in each state, that maximises the reward in the long run.
While it is possible to search the space of possible policies directly, a more efficient approach is to compute the value function X × A → R that determines for each state which long-term reward to expect when performing a certain
action. If a model of the state transitions and rewards is known, Dynamic Pro-
gramming (DP) can be used to compute this function. Reinforcement Learning
(RL), on the other hand, deals with finding the value function if no such model
is available. As the latter is commonly the case, Reinforcement Learning is also
the approach employed by LCS.
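As an illustration of these two mappings, a value function over a finite state set X and action set A can be represented as a simple table, with the corresponding policy selecting in each state the action of highest value. The following minimal sketch uses hypothetical state and action labels and value entries; it is not part of the original text.

```python
# Minimal sketch: a tabular value function Q : X x A -> R and the greedy
# policy X -> A derived from it. States, actions and values are hypothetical.
X = ["s0", "s1"]                      # state set X
A = ["left", "right"]                 # action set A

# Value function as a table: expected long-term reward per (state, action).
Q = {
    ("s0", "left"): 0.2, ("s0", "right"): 1.0,
    ("s1", "left"): 0.7, ("s1", "right"): 0.4,
}

def greedy_policy(x):
    """Policy X -> A: pick the action with the highest value in state x."""
    return max(A, key=lambda a: Q[(x, a)])

for x in X:
    print(x, "->", greedy_policy(x))
```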
There are two approaches to RL: either one learns a model of the transitions
and rewards by observations and then uses dynamic programming to find the
value function, called model-based RL, or one estimates the value function directly
while interacting with the environment, called model-free RL.
In the model-based case, a model of the state transitions and rewards needs
to be derived from the given observations, both of which are regression tasks. If
the policy is to be computed while sampling the environment, the model needs
to be updated incrementally, which requires an incremental learner.
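As a sketch of the regression tasks involved in the model-based case, the following hypothetical snippet maintains running averages of observed rewards and transition frequencies, updating both incrementally with each observation. The interface and the observation values are illustrative assumptions, not taken from the original text.

```python
from collections import defaultdict

# Minimal sketch of the model-based regression tasks: incrementally estimate
# expected reward and transition probabilities from a stream of
# (state, action, reward, next_state) observations.
reward_sum = defaultdict(float)    # accumulated reward per (x, a)
next_count = defaultdict(float)    # counts per (x, a, x')
visit_count = defaultdict(int)     # visits per (x, a)

def update_model(x, a, r, x_next):
    """Incremental update of the reward and transition model."""
    visit_count[(x, a)] += 1
    reward_sum[(x, a)] += r
    next_count[(x, a, x_next)] += 1

def expected_reward(x, a):
    n = visit_count[(x, a)]
    return reward_sum[(x, a)] / n if n else 0.0

def transition_prob(x, a, x_next):
    n = visit_count[(x, a)]
    return next_count[(x, a, x_next)] / n if n else 0.0

# Hypothetical observation stream:
for obs in [("s0", "go", 1.0, "s1"), ("s0", "go", 0.0, "s0")]:
    update_model(*obs)
print(expected_reward("s0", "go"), transition_prob("s0", "go", "s1"))
```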
In the model-free case, the function to model is the estimate of the value func-
tion, again leading to a regression task that needs to be handled incrementally.
Additionally, the value function estimate is also updated incrementally, and as
it is the data-generating process, this process is slowly changing. As a result,
there is a dynamic interaction between the RL algorithm that updates the value
function estimate and the incremental regression learner that models it, which
is not in all cases stable and needs special consideration [25]. These are additional difficulties that need to be taken into account when performing model-free RL.
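To make this interaction concrete, the following hypothetical sketch combines a temporal-difference (Q-learning) update with an incremental linear regression learner that approximates the value function by a gradient step towards each new target. The feature function, step size, discount factor and sample transition are assumptions chosen for illustration.

```python
import numpy as np

# Minimal sketch of model-free RL with an incremental regression learner:
# a linear approximation Q(x, a) = w . phi(x, a), updated towards the
# Q-learning target after each observed transition.
ACTIONS = [0, 1]
alpha, gamma = 0.1, 0.9               # step size and discount factor (assumed)
w = np.zeros(4)                       # weights of the linear model

def phi(x, a):
    """Hypothetical feature vector for a (state, action) pair."""
    v = np.zeros(4)
    v[2 * a] = 1.0
    v[2 * a + 1] = x
    return v

def q(x, a):
    return w @ phi(x, a)

def td_update(x, a, r, x_next):
    """Q-learning target fed to the incremental (gradient) regression step."""
    global w
    target = r + gamma * max(q(x_next, b) for b in ACTIONS)
    w += alpha * (target - q(x, a)) * phi(x, a)   # regress towards the target

# One hypothetical transition (state 0.5, action 1, reward 1.0, next state 0.2):
td_update(0.5, 1, 1.0, 0.2)
print(w)
```

Note that the regression target itself depends on the current value function estimate, which is exactly the dynamic interaction mentioned above.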
Clearly, although the sequential decision task was the prime motivator for
LCS, it is also the most complex to tackle. Therefore, we deal with standard
regression and classification tasks first, and come back to sequential decision
tasks in Chap. 9. Even then, they will only be dealt with from the theoretical perspective of stability, as they require an incremental learning procedure that will not be developed here.
3.1.5 Batch vs. Incremental Learning
In batch learning it is assumed that the whole training set is available at once,
and that the order of the observations in that set is irrelevant. Thus, the model
can be trained with all data at once and in any order.
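The difference can be illustrated with the simplest possible model, estimating the mean of the observed outputs from hypothetical data; the batch estimate uses the whole training set at once, while the incremental variant, described in the following paragraph, processes one observation at a time.

```python
# Minimal sketch contrasting batch and incremental estimation of a mean
# (hypothetical data); the incremental case is discussed next.
data = [2.0, 4.0, 6.0, 8.0]

# Batch learning: all observations are available at once, order irrelevant.
batch_mean = sum(data) / len(data)

# Incremental learning: update the estimate with one observation at a time.
mean, n = 0.0, 0
for y in data:                         # observations arriving as a stream
    n += 1
    mean += (y - mean) / n             # running-mean update
print(batch_mean, mean)                # identical here: 5.0 5.0
```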
Incremental learning methods differ from batch learning in that the model
is updated with each additional observation separately, and as such can handle
observations that arrive sequentially as a stream. Revisiting the assumption of
 