This chapter begins with a general overview of the problems that were the prime motivation for the development of LCS. This is followed by a review of the ideas behind LCS, describing the motivation and structure of Holland's first LCS, the CS-1 [116]. Many of the LCS that followed had a similar structure, so instead of describing them in detail, Sect. 2.2.5 focuses on some of the problems that they struggled with. With the introduction of XCS [237] many of these problems disappeared and the role of the classifier within the population was redefined, as discussed in Sect. 2.3. However, as our theoretical understanding even of XCS is still insufficient, and as this work aims at advancing the understanding of XCS and LCS in general, Sect. 2.4 gives an overview of recent significant approaches to the theoretical analysis of LCS, before Sect. 2.5 puts the model-based design approach into the general LCS context.
2.1 A General Problem Description
Consider an agent that interacts with an environment. At each discrete time step the environment is in a particular hidden state that is not observable by the agent. Instead, the agent senses the observable state of the environment, which is stochastically determined by the hidden state. Based on this observed state, the agent performs an action that changes the hidden state of the environment and consequently also the observable state. The hidden state transitions conform to the Markov property, such that the current hidden state is completely determined by the previous hidden state and the action performed by the agent.
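In the usual probabilistic notation (a sketch of the standard POMDP formulation; the distributions below are not introduced explicitly in the text), these assumptions read
\[
o_t \sim p(o_t \mid s_t), \qquad s_{t+1} \sim p(s_{t+1} \mid s_t, a_t),
\]
so that the observation depends only on the current hidden state, and the Markov property amounts to $p(s_{t+1} \mid s_0, a_0, \dots, s_t, a_t) = p(s_{t+1} \mid s_t, a_t)$.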
Fig. 2.1. The variables of a POMDP and an MDP involved in a single state transition from state s_t to state s_{t+1} after the agent performs action a_t and receives reward r_t. Each node represents a random variable, and each arrow indicates a dependency between two variables. (a) shows the transition in a POMDP, where the state s_t is hidden from the agent, which observes o_t instead. The agent's action depends on the agent's belief b_t about the real state of the environment and the currently observed state o_t. Based on this action and the environment's hidden state, a reward r_t is received and the environment performs a transition to the next state s_{t+1}. Additionally, the agent updates its belief to b_{t+1}, based on the observed state o_t. (b) shows the same transition in an MDP, where the agent can directly observe the environment's state s_t and performs action a_t based on that. This causes the agent to receive reward r_t and the environment to perform a state transition to s_{t+1}.
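A common way to make this belief revision precise is the standard Bayesian filter (given here as an assumption, since the text does not state the update explicitly and the exact timing of the observation may differ from the figure's convention):
\[
b_{t+1}(s') \propto p(o' \mid s') \sum_{s} p(s' \mid s, a_t)\, b_t(s),
\]
where o' denotes the observation available when the belief is revised and the sum runs over all hidden states s.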