are known as Markov Decision Processes (MDPs), as illustrated in Fig. 2.1(b),
and are dealt with in more detail in Chap. 9. LCS approach them by means of
reinforcement learning, which centres on learning the expected sum of
rewards for each state when following the optimal policy. Thus, the intermediate
aim is to learn a value function that maps the states into their respective expected
sum of rewards, which is a univariate regression problem. An example of such a
value function and the policy derived from it is shown in Fig. 2.2.
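To make this regression view concrete, the following sketch computes such a value function for a small sequential decision task by value iteration and then derives the policy by acting greedily with respect to it. The corridor MDP, the discount factor, and the state and action names are illustrative assumptions, not taken from the text or from any particular LCS implementation.

```python
# Illustrative sketch: the value function V maps each state to its expected
# (discounted) sum of rewards under the optimal policy; the policy itself is
# then recovered by acting greedily with respect to V. The 4-state corridor
# MDP, the discount factor and the action names are made-up assumptions.
import numpy as np

n_states, gamma = 4, 0.9          # rightmost state is the (absorbing) goal
actions = ("left", "right")

def step(s, a):
    """Deterministic transition; reward 1 on reaching the goal state."""
    if s == n_states - 1:                      # goal is absorbing
        return s, 0.0
    s_next = s + 1 if a == "right" else max(s - 1, 0)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

# Value iteration: V(s) <- max_a [ r(s, a) + gamma * V(s') ]
V = np.zeros(n_states)
for _ in range(100):
    V = np.array([max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
                  for s in range(n_states)])

# Policy derived from the value function: pick the action with the best
# one-step look-ahead value in every state.
policy = [max(actions, key=lambda a, s=s: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
print(V)        # -> [0.81, 0.9, 1.0, 0.0]
print(policy)   # "right" in every non-goal state (the goal state's action is arbitrary)
```

Learning the mapping from states to the values printed above is exactly the univariate regression problem referred to in the text; the greedy policy then follows directly from the learned value function.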
Even though the ultimate aim of LCS is to handle MDPs and POMDPs, they
firstly need to be able to master univariate regression problems. With that in
mind, this work focuses on LCS models and approaches to handle such problems,
and how the same approach can equally well be applied to multivariate
regression and classification problems. In addition, a separate chapter describes
how the same approach can be potentially extended to handle MDPs, and which
additional considerations need to be made. Nonetheless, it needs to be
emphasised that the theoretical basis of applying LCS to MDPs and POMDPs is still
in its infancy, and further work on this topic is urgently required. Still, due to
their initial focus on POMDPs, these are the tasks that will be considered when
introducing LCS.
2.2 Early Learning Classifier Systems
The primary problems that LCS were designed to handle are sequential decision
tasks that can be defined by POMDPs. In LCS it is assumed that each observed
state is a composite element that is identified by the collection of its features, such
that the agent is able to associate the choice of action with certain features of the
state. This allows the agent to generalise over certain features and possibly also
over certain states when defining its choice of action for each of the states. The
aim of LCS is not only to find the optimal policy for a given POMDP, but also to
exploit the possible generalisations to find the minimal solution representation.
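As an illustration of this kind of generalisation, the sketch below uses the ternary condition alphabet {0, 1, #} that is commonly used in LCS over binary features, where '#' is a don't-care symbol; the particular condition and states are made-up examples, not taken from the text.

```python
# Illustrative sketch: a classifier condition over binary state features using
# the ternary alphabet {0, 1, '#'}, where '#' is a don't-care symbol. One such
# condition matches, and thereby generalises over, every state that agrees
# with it on the specified features. Condition and states are made-up examples.
def matches(condition: str, state: str) -> bool:
    """True if every non-'#' position of the condition equals the state bit."""
    return all(c in ("#", s) for c, s in zip(condition, state))

condition = "1#0#"        # fixes the 1st and 3rd feature, ignores the others
for state in ("1000", "1001", "0000", "1110"):
    print(state, matches(condition, state))
# -> 1000 True, 1001 True, 0000 False, 1110 False
```

A single condition of this kind can thus stand in for all states that share its specified features, which is what allows a compact, generalised representation of the policy.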
At the time of their initial introduction the link between the tasks that LCS
aim at solving and POMDPs was not yet established. As a consequence, there was
neither a clear understanding that the regression task underlying value function
learning is an intermediate step that needs to be achieved in order to efficiently
learn optimal policies for given POMDPs, nor were objective functions available
that captured all facets of their aim. Rather, their design was approached by the
definition of sub-problems that each LCS has to solve, and a description of the
various LCS subsystems. Only over the last 15 years has the relation between LCS,
MDPs and regression tasks become clearer, which resulted in exciting developments
of new LCS and a more transparent understanding of their structure. The
chronological introduction to LCS aims at capturing this paradigm shift.
2.2.1 Initial Idea
Although some of Holland's earlier work [109, 110, 111] had already introduced
some ideas for LCS, a more specific framework was finally defined in [114].
 