are known as Markov Decision Processes (MDPs), as illustrated in Fig. 2.1(b),
and are dealt with in more detail in Chap. 9. LCS approach them by means of
reinforcement learning, which centres on learning the expected sum of
rewards for each state when following the optimal policy. Thus, the intermediate
aim is to learn a value function that maps the states into their respective expected
sum of rewards, which is a univariate regression problem. An example of such a
value function and the policy derived from it is shown in Fig. 2.2.
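To make this regression view concrete, the following sketch computes such a value function for a small sequential decision task by value iteration and then derives the policy by acting greedily with respect to it. The corridor MDP, the discount factor, and the state and action names are illustrative assumptions, not taken from the text or from any particular LCS implementation.

```python
# Illustrative sketch: the value function V maps each state to its expected
# (discounted) sum of rewards under the optimal policy; the policy itself is
# then recovered by acting greedily with respect to V. The 4-state corridor
# MDP, the discount factor and the action names are made-up assumptions.
import numpy as np

n_states, gamma = 4, 0.9          # rightmost state is the (absorbing) goal
actions = ("left", "right")

def step(s, a):
    """Deterministic transition; reward 1 on reaching the goal state."""
    if s == n_states - 1:                      # goal is absorbing
        return s, 0.0
    s_next = s + 1 if a == "right" else max(s - 1, 0)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

# Value iteration: V(s) <- max_a [ r(s, a) + gamma * V(s') ]
V = np.zeros(n_states)
for _ in range(100):
    V = np.array([max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
                  for s in range(n_states)])

# Policy derived from the value function: pick the action with the best
# one-step look-ahead value in every state.
policy = [max(actions, key=lambda a, s=s: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in range(n_states)]
print(V)        # -> [0.81, 0.9, 1.0, 0.0]
print(policy)   # "right" in every non-goal state (the goal state's action is arbitrary)
```

Learning the mapping from states to the values printed above is exactly the univariate regression problem referred to in the text; the greedy policy then follows directly from the learned value function.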
Even though the ultimate aim of LCS is to handle MDPs and POMDPs, they
firstly need to be able to master univariate regression problems. With that in
mind, this work focuses on LCS models and approaches to handle such problems,
and how the same approach can equally well be applied to multivariate
regression and classification problems. In addition, a separate chapter describes
how the same approach can be potentially extended to handle MDPs, and which
additional considerations need to be made. Nonetheless, it needs to be
emphasised that the theoretical basis of applying LCS to MDPs and POMDPs is still
in its infancy, and further work on this topic is urgently required. Still, due to
their initial focus on POMDPs, these are the tasks that will be considered when
introducing LCS.
2.2 Early Learning Classifier Systems
The primary problems that LCS were designed to handle are sequential decision
tasks that can be defined by POMDPs. In LCS it is assumed that each observed
state is a composite element that is identified by the collection of its features, such
that the agent is able to associate the choice of action with certain features of the
state. This allows the agent to generalise over certain features and possibly also
over certain states when defining its choice of action for each of the states. The
aim of LCS is not only to find the optimal policy for a given POMDP, but also to
exploit the possible generalisations to find the minimal solution representation.
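As an illustration of this kind of generalisation, the sketch below uses the ternary condition alphabet {0, 1, #} that is commonly used in LCS over binary features, where '#' is a don't-care symbol; the particular condition and states are made-up examples, not taken from the text.

```python
# Illustrative sketch: a classifier condition over binary state features using
# the ternary alphabet {0, 1, '#'}, where '#' is a don't-care symbol. One such
# condition matches, and thereby generalises over, every state that agrees
# with it on the specified features. Condition and states are made-up examples.
def matches(condition: str, state: str) -> bool:
    """True if every non-'#' position of the condition equals the state bit."""
    return all(c in ("#", s) for c, s in zip(condition, state))

condition = "1#0#"        # fixes the 1st and 3rd feature, ignores the others
for state in ("1000", "1001", "0000", "1110"):
    print(state, matches(condition, state))
# -> 1000 True, 1001 True, 0000 False, 1110 False
```

A single condition of this kind can thus stand in for all states that share its specified features, which is what allows a compact, generalised representation of the policy.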
At the time of their initial introduction the link between the tasks that LCS
aim at solving and POMDPs was not yet established. As a consequence, there was
neither a clear understanding that the regression task underlying value function
learning is an intermediate step that needs to be achieved in order to efficiently
learn optimal policies for given POMDPs, nor were objective functions available
that captured all facets of their aim. Rather, their design was approached by the
definition of sub-problems that each LCS has to solve, and a description of the
various LCS subsystems. Only over the last 15 years has the relation between LCS,
MDPs and regression tasks become clearer, which resulted in exciting developments
of new LCS and a more transparent understanding of their structure. The
chronological introduction to LCS aims at capturing this paradigm shift.
2.2.1 Initial Idea
Although some of Holland's earlier work [109, 110, 111] had already introduced
some ideas for LCS, a more specific framework was finally defined in [114].
 