Information Technology Reference
In-Depth Information
clusters, like Fig 3.b. Fig. 3a shows a true clustering. Since the well separated cluster
in the top left corner is repeated several times (90% repetition) in partitionings of the
reference set, it has to acquire a great stability value (but not equal to 1), however it
acquires the stability value of 1. On account of the two clusters in right hand of
Fig. 3a are relatively jointed and sometimes they are not recognized in reference set as
well, they acquire less stability value. Fig. 3.b shows a spurious clustering which the
two right clusters are incorrectly merged. Since a fixed number of clusters are forced
to the base algorithm, consequently the top left cluster is divided into two clusters.
Here the drawback of the stability measure is significantly appeared. Although it is
obvious that this partition and correspondingly the right big cluster is rarely appeared
in reference set (10% repetition), the stability of the right big cluster is evaluated
equal to 1. Since the NMI is a symmetric equation, the stability of the top left cluster
in Fig 3.a is exactly equal to the big right cluster in Fig 3.b; however they are repeated
90% and 10% respectively. In other word, when two clusters are complement of each
other, their stabilities are always equal. This drawback occurs when the number of
positive clusters in the considered partition of reference set is greater than 1. It means
when the cluster C* is obtained by merging two or more clusters, the unwelcome
results in stability value is occurred.
Cluster1
Cluster2
Cluster3
Cluster4
Cluster4
Cluster1
Cluster2
Cluster3
(
Fig. 4. Evaluating the APMM criterion for cluster C 1 from clustering (a) with respect to
clustering (b), with k =4.
Here, a new criterion is proposed which can solve this problem. Assume that the
problem is evaluating the APMM criterion for cluster C 1 in Fig. 4a with respect to
clustering obtained in Fig. 4b.
The main idea in this method is to eliminate the symmetricalness which exists in
NMI equation. In this approach, except the cluster C 1 all other clusters in P a are taken
out. Also, all clusters in P b which are not included the samples of this cluster are
eliminated. In the next step, the other samples which are not in C 1 of P a , are removed
from clusters in P b (from the clusters which include some of these samples). This
process is depicted in Fig. 5.
Search WWH ::




Custom Search