does). Rather, the globally matching classifier should be rearranged such that
it does not directly compete with the specific classifier in modelling its part of
the data. The resulting pair of classifiers would then cooperate to model a part
of the data and can be seen as a building block of a potentially good model
structure. Thus, while these building blocks exist, they are not exploited when
using the MCMC algorithm for model structure search.
When using a GA for model structure search, on the other hand, the po-
pulation of individuals can contain several potentially useful building blocks,
and it is the responsibility of the crossover operator to identify and recombine
them. As shown by Syswerda [210], uniform crossover generally yields better re-
sults that one-point and two-point crossover. The crossover operator that is used
aims at uniform crossover for variable-length individuals. Further improvement
in identifying building blocks can be made by using Estimation of Distribution
Algorithms (EDAs) [183], but as there are currently no EDAs that directly apply
to the problem structure at hand [150] this topic requires further investigation.
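To make the crossover idea concrete, the following is a minimal sketch of a uniform-crossover analogue for variable-length individuals: the classifiers of both parents are pooled and each classifier is reassigned to a random offspring, so no positional linkage is assumed. The function name and interface are illustrative, not taken from the text.

```python
import random

def uniform_crossover(parent_a, parent_b, rng=random):
    """Uniform-crossover analogue for variable-length individuals.

    Pools the classifiers of both parents and assigns each one to a
    randomly chosen offspring, re-drawing the assignment if either
    offspring would end up empty. Classifiers are treated as opaque
    objects; matching functions and model parameters are not touched.
    """
    pool = list(parent_a) + list(parent_b)
    while True:
        child_a, child_b = [], []
        for clf in pool:
            (child_a if rng.random() < 0.5 else child_b).append(clf)
        if child_a and child_b:          # both offspring must be non-empty
            return child_a, child_b
```

Because every classifier in the pool survives into exactly one offspring, potentially useful building blocks from either parent can end up combined in a single individual.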
8.3 Empirical Demonstration
To demonstrate the usefulness of the optimality criterion that was introduced
in the last chapter, the previously described algorithms are used to find a good
set of classifiers for a set of simple regression tasks. These tasks are kept simple
in the sense that the number of classifiers expected to be required is low,
such that the O(K³) complexity of ModelProbability does not cause
any computational problems. Additionally, the crudeness of the model structure
search procedures does not allow us to handle problems where the best solution is
given by a complex agglomeration of classifiers. All regression tasks have D X =1
and D Y = 1 such that the results can be visualised easily. The mixing features
are given by φ ( x ) = 1 for all x . Not all functions are standardised, but their
domain is always within [-1, 4] and their range is within [-1, 1]. For all experiments,
classifiers that model straight lines are used, together with uninformative priors
and hyperpriors as given in Table 8.1.
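As a concrete picture of the experimental setup, the sketch below generates a 1-D noisy regression task (D_X = D_Y = 1) and fits a single straight-line classifier to its matched data by weighted least squares. The function name, the matching vector `m`, and the particular target line are illustrative assumptions, not details from the text.

```python
import numpy as np

def fit_line(x, y, m):
    """Weighted least-squares fit of a straight line y = w0 + w1*x.

    m holds the matching degree of each input (1 = matched, 0 = not),
    so only matched points influence the fit. Interface is a sketch,
    not the book's actual classifier-training routine.
    """
    X = np.column_stack([np.ones_like(x), x])   # bias column + input
    W = np.diag(m)
    w, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return w                                     # (intercept, slope)

# A simple noisy 1-D task: a straight line plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 0.5 * x - 0.2 + rng.normal(0.0, 0.1, 200)
w = fit_line(x, y, np.ones_like(x))              # classifier matches all inputs
```

With enough data, the recovered intercept and slope approach the underlying line despite the noise, which is exactly the separation of pattern from noise that the optimality criterion is meant to capture.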
Even though the prime problems that most new LCS are tested against are
Multiplexer problems of various lengths [237], they are a challenge for the model
structure search rather than the optimality criterion and thus are not part of the
provided test set. Rather, a significant amount of noise is added to the data, as
the aim is to provide a criterion that defines the minimal model, and can separate
the underlying patterns from the noise, given that enough data is available.
Firstly, two different representations that are used for the matching functions
are introduced. Then, the four regression tasks, their aim, and the found results
are described, one by one.
8.3.1 Representations
The two representations that are going to be used are matching by radial-basis
functions, and matching by soft intervals. Starting with matching by radial-basis