Classifier Generation and Training. For each experimental run, $K$ classifiers are created, where $K$ depends on the experiment. Each classifier matches an interval $[l_k, u_k]$ of the input space, that is, $m_k(i_n) = 1$ if $l_k \leq i_n \leq u_k$, and $m_k(i_n) = 0$ otherwise. Even coverage, such that about an equal number of classifiers matches each input, is achieved by splitting the input space into 1000 bins and localising the classifiers one by one in a "Tetris"-style way: the average width in bins of the matched interval of a classifier needs to be $1000c/K$ such that on average $c$ classifiers match each bin. The interval width of a new classifier is therefore sampled from $\mathcal{B}(1000, (1000c/K)/1000)$, where $\mathcal{B}(n, p)$ is a binomial distribution for $n$ trials and a success probability of $p$. The width is limited from below by 3, such that each classifier is trained on at least 3 observations. The new classifier is then localised such that the number of classifiers that match the same bins is minimal. If several such locations are possible, one is chosen uniformly at random. Having positioned all $K$ classifiers, they are trained by batch learning using (5.9) and (5.13). The number of classifiers that match each input is set to $c = 3$ in all experiments.
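The placement procedure can be summarised in a short sketch. The code below follows the description above; all function and variable names (place_classifiers, coverage, and so on) are illustrative rather than taken from an actual implementation. It samples each interval width from the stated binomial, clips it at 3 bins, and drops the interval where the existing coverage is lowest, breaking ties uniformly at random.

import numpy as np

def place_classifiers(K, num_bins=1000, c=3, rng=None):
    """Tetris-style placement sketch: sample each classifier's width from a
    binomial and place it where existing coverage is lowest."""
    rng = np.random.default_rng() if rng is None else rng
    coverage = np.zeros(num_bins, dtype=int)   # classifiers matching each bin
    intervals = []                             # (lower_bin, upper_bin) per classifier

    for _ in range(K):
        # width ~ B(1000, c/K), so the mean width is 1000*c/K bins;
        # clip at 3 bins so every classifier sees at least 3 observations
        width = max(3, rng.binomial(num_bins, min(1.0, c / K)))
        width = min(width, num_bins)

        # total existing coverage for every possible placement of the interval
        window_loads = np.convolve(coverage, np.ones(width, dtype=int), mode="valid")
        candidates = np.flatnonzero(window_loads == window_loads.min())
        lower = int(rng.choice(candidates))    # uniform tie-breaking
        upper = lower + width - 1

        coverage[lower:upper + 1] += 1
        intervals.append((lower, upper))

    return intervals, coverage

For example, place_classifiers(K=50) returns 50 intervals over 1000 bins, with roughly $c = 3$ classifiers matching each bin on average.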
Mixing Models. The performance of the following mixing models is compared: the IRLS algorithm (IRLS) and its least-squares approximation (LS) on the generalised softmax function with $\phi(x) = 1$ for all $x$, the inverse variance (InvVar) heuristic, the mixing by confidence (Conf) and mixing by maximum confidence (MaxConf) heuristics, and mixing by XCS(F) (XCS). When classifiers model straight lines, the IRLS algorithm (IRLSf) and its least-squares approximation (LSf) with a transfer function $\phi(x) = (1, i_n)^T$ are used additionally, to allow for an additional soft-linear partitioning beyond the realm of matching (see the discussion in Sect. 4.3.5 for more information). Training by the IRLS algorithm is performed incrementally according to Sect. 6.1.1, until the change in cross-entropy (6.6) between two iterations is smaller than 0.1%. The least-squares approximation is performed repeatedly in batches rather than as described in Sect. 6.1.2, by using (5.9) to find the $v_k$'s that minimise (6.15). Convergence is assumed when the change in (6.6) between two batch updates is smaller than 0.05% (this value is smaller than for the IRLS algorithm, as the least-squares approximation takes smaller steps). The heuristic mixing models do not require any separate training and are applied as described in Sect. 6.2. For XCS, the standard settings $\epsilon_0 = 0.01$, $\alpha = 0.1$, and $\nu = 5$, as recommended by Butz and Wilson [57], are used.
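As an illustration of the heuristic family, the following sketch shows one plausible form of the inverse variance heuristic: each matched classifier is weighted in proportion to the estimated precision (inverse variance) of its model, and the weights are normalised per input. The array names are illustrative, and the exact form of the heuristics should be checked against Sect. 6.2.

import numpy as np

def inv_var_mixing(matching, noise_var):
    """Sketch of an inverse-variance mixing heuristic.

    matching:  (N, K) array of matching values m_k(x_n)
    noise_var: length-K vector of estimated model variances
    Returns an (N, K) array of mixing weights g_k(x_n), each row summing to 1.
    """
    precision = 1.0 / np.asarray(noise_var)             # tau_k = 1 / sigma_k^2
    unnormalised = matching * precision[np.newaxis, :]  # m_k(x_n) * tau_k
    totals = unnormalised.sum(axis=1, keepdims=True)    # positive, as c = 3
    return unnormalised / totals

Because the placement guarantees that about $c = 3$ classifiers match every bin, the per-input normaliser never vanishes.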
Evaluating the Performance. Having generated and trained a set of classifiers, each mixing model is trained with the same set to make their performance directly comparable. Performance is measured by evaluating (6.1), where $p(y_n | x_n, \theta_k)$ is computed by (5.3), using the same observations that the classifiers were trained on, and the $g_k$'s are provided by the different mixing models. As the IRLS algorithm maximises the data likelihood (6.1) when using the generalised softmax function as the mixing model, its performance is used as the benchmark against which the other models are compared. Their performance is reported as a fraction of the likelihood of the IRLS algorithm with $\phi(x) = 1$.
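The evaluation step itself amounts to summing the log of the mixed predictive densities over all observations. The sketch below shows this computation under the assumption that the classifier densities and mixing weights have already been evaluated into arrays; the names pred_density, mixing_weights, and the ratio variable are hypothetical.

import numpy as np

def mixed_log_likelihood(pred_density, mixing_weights):
    """Data log-likelihood sum_n ln sum_k g_k(x_n) p(y_n | x_n, theta_k),
    given an (N, K) array of classifier densities and an (N, K) array of
    mixing weights."""
    return np.sum(np.log(np.sum(mixing_weights * pred_density, axis=1)))

# Performance relative to the IRLS benchmark (hypothetical variable names):
# ratio = mixed_log_likelihood(dens, g_model) / mixed_log_likelihood(dens, g_irls)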