assumptions when learning with unlabelled data has been emphasized for Bayesian
network classifiers. When the assumed probabilistic model does not match the true
data generating distribution, using unlabelled data can be detrimental to the
classification accuracy [31]. In addition, there is interest in studying data that do not match the underlying generative models as outlier data, and in deciding whether those data should be retained (they represent phenomena of interest) or rejected (they are mistakes) [32].
In this section, we explore the behaviour of the proposed ICA mixture-based classifier in both the training stage and the classification stage, depending on which data are labelled. Thus, two kinds of data were distinguished for labelling: the data that fit the ICA model well and the data that did not adapt as well to the model. The latter data are considered a kind of outlier. In addition, the ICA mixtures were divided into two groups depending on the strength of the membership of the data in the classes, i.e., the values of the posterior probability $p(C_k \mid \mathbf{x})$. These groups were called high-fuzziness and low-fuzziness ICA mixtures, applying a term used in the fuzzy classification literature [33].
A total of 200 Monte Carlo simulations were performed to generate different ICA mixture datasets with three classes, 400 observation vectors per class, and two Laplacian distributed sources (with a sharp peak at the bias and heavy tails); 200 observation vectors were used for pdf estimation. For each dataset, 16 cases were obtained by varying the following parameters of the training data: (i) supervision ratio = 0.1, 0.3, 0.5, 0.7; (ii) ratio of labelled outliers = 0, 1/3, 2/3, 1 (number of labelled outlier observation vectors / total number of outlier observation vectors). The outlier observation vectors were mixed with Type K noise ($m = 3$) [34] to highlight their difference from the ICA data model. The data were divided as follows: 70 % for training and 30 % for testing. The parameters $\mathbf{W}_k$ and $\mathbf{b}_k$ of the generative data model for a dataset, the corresponding $\mathbf{s}_k$, and $p(C_k \mid \mathbf{x})$ were used as references to estimate the classification accuracy and the lack of adjustment with respect to the corresponding parameters obtained for the 16 cases of semi-supervised training for that dataset.
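A minimal sketch of how one such dataset and its semi-supervised labels could be generated is given below. The NumPy routines, mixing matrices, and bias vectors are illustrative assumptions rather than the parameters used in the experiments, and the Type K noise contamination of the outlier vectors is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N_PER_CLASS, N_SOURCES = 3, 400, 2      # classes, vectors per class, sources per class

def make_class_data(n, n_sources, rng):
    """Draw Laplacian (sharp-peaked, heavy-tailed) sources and mix them linearly."""
    s = rng.laplace(loc=0.0, scale=1.0, size=(n_sources, n))  # sources s_k
    A = rng.normal(size=(n_sources, n_sources))               # mixing matrix (W_k = A^-1)
    b = rng.normal(size=(n_sources, 1))                       # bias b_k
    return A @ s + b

# Three-class ICA mixture: stack the class-conditional observation vectors.
X = np.hstack([make_class_data(N_PER_CLASS, N_SOURCES, rng) for _ in range(K)])
y = np.repeat(np.arange(K), N_PER_CLASS)

# Semi-supervised labelling: keep a fraction of the labels, hide the rest.
supervision_ratio = 0.3                     # one of 0.1, 0.3, 0.5, 0.7
labelled = rng.random(y.size) < supervision_ratio
y_semi = np.where(labelled, y, -1)          # -1 marks unlabelled vectors

# 70 % / 30 % split for training and testing.
perm = rng.permutation(y.size)
n_train = int(0.7 * y.size)
train_idx, test_idx = perm[:n_train], perm[n_train:]
```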
The fuzziness $F$ for a dataset was calculated as unity minus the mean, over the observation vectors, of the maximum of the posterior probability over the classes,

$$F = 1 - \overline{\max_{k=1,\ldots,K} p(C_k \mid \mathbf{x})} \qquad (3.15)$$

where the overline denotes the mean over the observation vectors.
Values of $F$ range from 0 (no fuzziness in the data mixture) to $1 - 1/K$ (a completely fuzzy data mixture). When $F = 0$, for every observation vector the posterior probability $p(C_k \mid \mathbf{x})$ is 1 for one class and 0 for the other classes. When $F = 1 - 1/K$, the posterior probabilities are equal ($1/K$) for every class and observation
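As an illustration of Eq. (3.15), the short sketch below (the posterior matrices are invented for the example) computes $F$ from a matrix of per-observation class posteriors and reproduces the two limiting values.

```python
import numpy as np

def fuzziness(posteriors):
    """F = 1 - mean over observation vectors of max_k p(C_k | x).
    posteriors: array of shape (N, K) whose rows sum to 1."""
    return 1.0 - posteriors.max(axis=1).mean()

# Crisp posteriors (one-hot rows) give F = 0.
crisp = np.eye(3)[[0, 1, 2, 0]]
# Uniform posteriors give F = 1 - 1/K.
uniform = np.full((4, 3), 1.0 / 3.0)

print(fuzziness(crisp))    # 0.0
print(fuzziness(uniform))  # 0.666... = 1 - 1/3
```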
vector. The lack of adjustment between the reference model for a dataset and a case of semi-supervised training for that dataset was measured using the Kullback-Leibler (KL) divergence $D_{KL}$ [35],

$$D_{KL} = \int q(\mathbf{S}, \mathbf{w} \mid \mathbf{X}) \, \log \frac{q(\mathbf{S}, \mathbf{w} \mid \mathbf{X})}{p(\mathbf{S}, \mathbf{w} \mid \mathbf{X})} \, d\mathbf{w} \, d\mathbf{S} \qquad (3.16)$$
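Equation (3.16) generally has no closed form for the densities involved; one common approach is a Monte Carlo estimate using samples drawn from $q$. The sketch below illustrates this with one-dimensional Gaussian stand-ins for $q(\mathbf{S},\mathbf{w}\mid\mathbf{X})$ and $p(\mathbf{S},\mathbf{w}\mid\mathbf{X})$, which are purely illustrative assumptions, and checks the estimate against the known Gaussian closed form.

```python
import numpy as np
from scipy.stats import norm

# Illustrative stand-ins for q(S, w | X) and p(S, w | X): two 1-D Gaussians.
q = norm(loc=0.0, scale=1.0)
p = norm(loc=0.5, scale=1.2)

# Monte Carlo estimate of Eq. (3.16): D_KL ~= mean of log q(z) - log p(z), z ~ q.
z = q.rvs(size=100_000, random_state=0)
d_kl_mc = np.mean(q.logpdf(z) - p.logpdf(z))

# Closed form for two univariate Gaussians, as a sanity check on the estimate.
d_kl_exact = np.log(1.2 / 1.0) + (1.0**2 + 0.5**2) / (2 * 1.2**2) - 0.5

print(d_kl_mc, d_kl_exact)   # both close to 0.116
```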