Fig. 4.4 Measure of clustering quality as a function of the merging level

performance of ICAMM parameter estimation, which is the basis of the hierarchical
clustering, 400 Monte Carlo simulations were run on toy data such as the
distributions shown in the examples of Fig. 4.1. The algorithm was executed
while varying the number of mixtures, the kinds of source distributions, and the
supervision ratio used in the parameter learning stage.
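As a rough illustration of how such toy data could be generated, the following sketch draws samples from a small ICA mixture and keeps the class label only for a chosen fraction of them. It is a minimal sketch under assumptions, not the simulation code behind the reported results: the per-class model x = A_k s + b_k, the function names `sample_source` and `sample_ica_mixture`, the mixing matrices, the bias vectors, and the two source kinds shown are all hypothetical choices made for the example.

```python
import random

def sample_source(kind, rng):
    """Draw one sample from a zero-mean source distribution (illustrative kinds only)."""
    if kind == "laplacian":
        # The difference of two unit-rate exponentials follows a Laplace(0, 1) law.
        return rng.expovariate(1.0) - rng.expovariate(1.0)
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0)
    raise ValueError(f"unknown source kind: {kind}")

def sample_ica_mixture(n, classes, supervision_ratio, seed=0):
    """Return n pairs (x, label); label is None for the unsupervised samples.

    Each class is a tuple (A, b, kinds): mixing matrix, bias vector and the
    kind of each independent source (assumed model: x = A s + b).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k = rng.randrange(len(classes))          # pick a class uniformly at random
        A, b, kinds = classes[k]
        s = [sample_source(kind, rng) for kind in kinds]
        x = [sum(A[i][j] * s[j] for j in range(len(s))) + b[i]
             for i in range(len(b))]
        # Keep the true label only for a fraction of the samples.
        label = k if rng.random() < supervision_ratio else None
        data.append((x, label))
    return data

# Two hypothetical classes in two dimensions.
classes = [
    ([[1.0, 0.5], [0.2, 1.0]], [0.0, 0.0], ["laplacian", "laplacian"]),
    ([[1.0, -0.4], [0.3, 1.0]], [4.0, 4.0], ["uniform", "laplacian"]),
]
toy = sample_ica_mixture(500, classes, supervision_ratio=0.3)
labeled = sum(1 for _, lab in toy if lab is not None)
print(f"{labeled} of {len(toy)} samples keep their label")
```

Repeating the call with different seeds, class lists, and values of `supervision_ratio` gives one simple way to organize a Monte Carlo study of this kind.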
Figure 4.5 shows the classification error for different numbers of ICA mixtures
(hierarchical levels). Note that the error decreases rapidly as long as
there is a little supervision and the densities are estimated properly. For the
unsupervised case, the error is higher when the number of ICA mixtures is
greater, with the highest being 0.15 for K = 6 and the lowest being
0.096 for K = 3. This difference in classification accuracy is less
pronounced in all the semi-supervised cases, which shows the advantage of
employing a semi-supervised, non-parametric density estimation algorithm in
the first step of the hierarchical algorithm.
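The text does not say how the classification error was scored in the unsupervised case, where the estimated class indices are arbitrary. One common convention, assumed here purely for illustration, is to take the error under the best one-to-one relabeling of the estimated classes. The helper name `classification_error` and the example labels below are hypothetical.

```python
from itertools import permutations

def classification_error(true_labels, predicted_labels, n_classes):
    """Fraction of misclassified samples under the best relabeling of predictions.

    Tries every one-to-one mapping from predicted class index to true class
    index, which is affordable for small K (720 mappings for K = 6).
    """
    n = len(true_labels)
    best_hits = 0
    for perm in permutations(range(n_classes)):
        hits = sum(1 for t, p in zip(true_labels, predicted_labels)
                   if perm[p] == t)
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / n

# Hypothetical labels: the clusters are recovered with permuted indices,
# and one sample of class 1 is assigned to the wrong cluster.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
pred = [2, 2, 2, 0, 0, 1, 1, 1, 1, 1]
error = classification_error(truth, pred, n_classes=3)
print(f"classification error: {error:.3f}")
```

With semi-supervised training the labeled samples already fix the class indices, so the permutation search would not be needed and a direct comparison of labels would suffice.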
Figure 4.6 shows an example of applying the proposed hierarchical algorithm
in simulations. In the first example, the ICA mixtures have Laplacian source
distributions; in the second, they combine different source distributions
(Laplacian, uniform, K-type m = 10, Rayleigh). The dendrograms estimated by
the proposed method and by the single linkage method are
included for comparison. There are significant differences between the two
methods of agglomerative clustering. The proposed method is based on
probabilistic distances between groups of data, while the single linkage method uses the
distance between pairs of data objects (for these examples, we used the Euclidean
distance). In addition, the proposed method employs the parameters of the underlying
model estimated at the lowest hierarchical level, while single linkage does not.
This could be important depending on the data structure, i.e., if the data follow
an ICA mixture model. These differences lead to variations in the
dendrograms delivered by the two methods. Thus, it is expected that the merging