Information Technology Reference
In-Depth Information
be approached by any method that is able to find some element in a set that
maximises some function of the elements in that set, such as simulated annealing
[217], or genetic algorithms [95, 167].
The two methods that will be described here are the ones that have been
used to test the usefulness of the optimality definition. They are conceptually
simple and not particularly intelligent, as neither of them uses any information
embedded in the probabilistic LCS model besides the value proportional to
ln p ( M|D ) to form the search trajectory through the model structure space.
Consequently, there is still plenty of room for improvement.
The reason why two alternatives are introduced is i) to emphasise the concep-
tual separation between evaluating the quality of a set of classifiers, and searching
for better ones, and ii) to show that in theory any global optimiser can be used
to perform the task of model structure search. As the aim is independent of
the search procedure, reaching this aim only depends on the compatibility of
the search procedure with the model structure space. After having introduced
the two alternatives, a short discussion in Sect. 8.2.3 deals with their differences,
and what might in general be good guidelines to improve the effectiveness of
searching for good sets of classifiers.
Note that the optimal set of classifiers strongly depends on the chosen represen-
tation for the matching functions, as we can only find solutions that we are able
to represent. Nonetheless, to keep the description of the methods representation-
independent, the discussion of representation-dependent components of the me-
thods are postponed until choosing some representation becomes inevitable; that
is, in Sect. 8.3.
8.2.1
Model Structure Search by a Genetic Algorithm
Genetic algorithms (GA) are a family of global optimisers that are conceptually
based on Darwinian evolution. The reader is expected to be familiar with their
underlying idea and basic implementations, of which good overviews are available
by Goldberg [95] and Mitchell [167].
An individual in the population that the GA operates on is defined by an LCS
model structure
,anditsfitnessisgivenbythevaluethat ModelProbability
returns for this model structure. As the genetic algorithm seeks to increase the fit-
ness of the individuals in the population, its goal is to find the model structure that
maximises p (
M
). An allele of an individual's genome is given by the represen-
tation of a single classifier's matching function, which makes the genome's length
determined by the number of classifiers of the associated model structure. As this
number is not fixed, the individuals in the population can be of variable length 2 .
M|D
2 Variable-length individuals might cause bloat, which is a common problem when using
Evolutionary Computation algorithms with such individuals, as frequently observed
in genetic programming [158]. It also plagues some Pittsburgh-style LCS that use
variable-length individuals, such as LS-1 [198] and GAssist [6], and counteracting mea-
sures have to be taken to avoid its occurrence. Here, this is not an issue, as overly com-
plex model structures will receive a lower fitness due to the preference of the applied
model selection criterion for models of low complexity.
Search WWH ::




Custom Search