Classifier Generation and Training. For each experimental run, $K$ classifiers are created, where $K$ depends on the experiment. Each classifier matches an interval $[l_k, u_k]$ of the input space, that is, $m_k(i_n) = 1$ if $l_k \leq i_n \leq u_k$, and $m_k(i_n) = 0$ otherwise. Even coverage, such that about an equal number of classifiers matches each input, is achieved by splitting the input space into 1000 bins and localising the classifiers one by one in a "Tetris"-style way: the average width in bins of the matched interval of a classifier needs to be $1000c/K$ such that on average $c$ classifiers match each bin. The interval width of a new classifier is therefore sampled from $\mathcal{B}(1000, (1000c/K)/1000)$, where $\mathcal{B}(n, p)$ is a binomial distribution for $n$ trials and a success probability of $p$. The width is limited from below by 3, such that each classifier is trained on at least 3 observations. The new classifier is then localised such that the number of classifiers that match the same bins is minimal. If several such locations are possible, one is chosen uniformly at random. Having positioned all $K$ classifiers, they are trained by batch learning using (5.9) and (5.13). The number of classifiers that match each input is set to $c = 3$ in all experiments.
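The placement procedure can be summarised in a short sketch. The code below follows the description above; all function and variable names (place_classifiers, coverage, and so on) are illustrative rather than taken from an actual implementation. It samples each interval width from the stated binomial, clips it at 3 bins, and drops the interval where the existing coverage is lowest, breaking ties uniformly at random.

import numpy as np

def place_classifiers(K, num_bins=1000, c=3, rng=None):
    """Tetris-style placement sketch: sample each classifier's width from a
    binomial and place it where existing coverage is lowest."""
    rng = np.random.default_rng() if rng is None else rng
    coverage = np.zeros(num_bins, dtype=int)   # classifiers matching each bin
    intervals = []                             # (lower_bin, upper_bin) per classifier

    for _ in range(K):
        # width ~ B(1000, c/K), so the mean width is 1000*c/K bins;
        # clip at 3 bins so every classifier sees at least 3 observations
        width = max(3, rng.binomial(num_bins, min(1.0, c / K)))
        width = min(width, num_bins)

        # total existing coverage for every possible placement of the interval
        window_loads = np.convolve(coverage, np.ones(width, dtype=int), mode="valid")
        candidates = np.flatnonzero(window_loads == window_loads.min())
        lower = int(rng.choice(candidates))    # uniform tie-breaking
        upper = lower + width - 1

        coverage[lower:upper + 1] += 1
        intervals.append((lower, upper))

    return intervals, coverage

For example, place_classifiers(K=50) returns 50 intervals over 1000 bins, with roughly $c = 3$ classifiers matching each bin on average.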
Mixing Models. The performance of the following mixing models is compared: the IRLS algorithm (IRLS) and its least-squares approximation (LS) on the generalised softmax function with $\phi(x) = 1$ for all $x$, the inverse variance (InvVar) heuristic, the mixing by confidence (Conf) and mixing by maximum confidence (MaxConf) heuristics, and mixing by XCS(F) (XCS). When classifiers model straight lines, the IRLS algorithm (IRLSf) and its least-squares approximation (LSf) with a transfer function $\phi(x) = (1, i_n)^T$ are used additionally, to allow for an additional soft-linear partitioning beyond the realm of matching (see the discussion in Sect. 4.3.5 for more information). Training by the IRLS algorithm is performed incrementally according to Sect. 6.1.1, until the change in cross-entropy (6.6) between two iterations is smaller than 0.1%. The least-squares approximation is performed repeatedly in batches rather than as described in Sect. 6.1.2, by using (5.9) to find the $v_k$'s that minimise (6.15). Convergence is assumed when the change in (6.6) between two batch updates is smaller than 0.05% (this value is smaller than for the IRLS algorithm, as the least-squares approximation takes smaller steps). The heuristic mixing models do not require any separate training and are applied as described in Sect. 6.2. For XCS, the standard settings $\epsilon_0 = 0.01$, $\alpha = 0.1$, and $\nu = 5$, as recommended by Butz and Wilson [57], are used.
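As an illustration of the heuristic family, the following sketch shows one plausible form of the inverse variance heuristic: each matched classifier is weighted in proportion to the estimated precision (inverse variance) of its model, and the weights are normalised per input. The array names are illustrative, and the exact form of the heuristics should be checked against Sect. 6.2.

import numpy as np

def inv_var_mixing(matching, noise_var):
    """Sketch of an inverse-variance mixing heuristic.

    matching:  (N, K) array of matching values m_k(x_n)
    noise_var: length-K vector of estimated model variances
    Returns an (N, K) array of mixing weights g_k(x_n), each row summing to 1.
    """
    precision = 1.0 / np.asarray(noise_var)             # tau_k = 1 / sigma_k^2
    unnormalised = matching * precision[np.newaxis, :]  # m_k(x_n) * tau_k
    totals = unnormalised.sum(axis=1, keepdims=True)    # positive, as c = 3
    return unnormalised / totals

Because the placement guarantees that about $c = 3$ classifiers match every bin, the per-input normaliser never vanishes.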
Evaluating the Performance. Having generated and trained a set of classifiers, each mixing model is trained with the same set to make their performance directly comparable. Performance is measured by evaluating (6.1), where $p(y_n | x_n, \theta_k)$ is computed by (5.3), using the same observations that the classifiers were trained on, and the $g_k$'s are provided by the different mixing models. As the IRLS algorithm maximises the data likelihood (6.1) when using the generalised softmax function as the mixing model, its performance is used as the benchmark against which the other models are compared. Their performance is reported as a fraction of the likelihood of the IRLS algorithm with $\phi(x) = 1$.
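The evaluation step itself amounts to summing the log of the mixed predictive densities over all observations. The sketch below shows this computation under the assumption that the classifier densities and mixing weights have already been evaluated into arrays; the names pred_density, mixing_weights, and the ratio variable are hypothetical.

import numpy as np

def mixed_log_likelihood(pred_density, mixing_weights):
    """Data log-likelihood sum_n ln sum_k g_k(x_n) p(y_n | x_n, theta_k),
    given an (N, K) array of classifier densities and an (N, K) array of
    mixing weights."""
    return np.sum(np.log(np.sum(mixing_weights * pred_density, axis=1)))

# Performance relative to the IRLS benchmark (hypothetical variable names):
# ratio = mixed_log_likelihood(dens, g_model) / mixed_log_likelihood(dens, g_irls)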