Information Technology Reference
In-Depth Information
combine the primary results [25], [13] and [15]. Here, the difference is the absence of
some clusters in primary partitions. Since the original EAC method [13] cannot truly
identify the pairwise similarity while there is just a subset of clusters, in this paper a
new method to construct the co-association matrix is presented which hereafter called:
Extended Evidence Accumulation Clustering method, EEAC. Finally, the single-link
method is employed to extract the final clusters from this matrix.
3.1 Cluster Evaluation Using APMM Criteria
A stable cluster is the one that has a high likelihood of recurrence across multiple
applications of the clustering method. The stable clusters are usually preferable, since
they are robust with respect to minor changes in the dataset [18].
Now assume that we want to compute the stability of cluster C i . In this method first
a set of partitionings over resampled datasets is provided which is called the reference
set. In this notation D is resampled data and P(D) is a partitioning over D . Now, the
problem is: “How many times is the cluster C i repeated in the reference partitions?”
Denote by NMI( C i ,P(D) ), the Normalized Mutual Information between the cluster C i
and a reference partition P(D) . The most previous works only compare a partition
with another partition [25]. However, the stability used in [18] evaluates the
similarity between a cluster and a partition by transforming the cluster C i to a
partition and employing common partition to partition methods. To illustrate this
method let P 1 = P a ={ C i ,D/C i } be a partition with two clusters, where D/C i denotes the
set of data points in D that are not in C i .
Then we may compute a second partition P 2 = P b = {C * ,D/C * } , where C * denotes the
union of all “positive” clusters in P ( D ) and others are in D/C * . A cluster C j in P ( D ) is
positive if more than half of its data points are in C i .
Now, define NMI( C i ,P(D) ) by NMI( P a ,P b ) which is calculated as [12]:
ab
ij
k
k
n
.
n
a
b
∑∑
ab
ij
2
n
log
a
i
b
j
n
.
n
i
==
11
j
a
b
NMI
(
P
,
P
)
=
(1)
k
k
b
j
a
i
n
n
a
b
a
i
b
j
n
log
+
n
log
n
n
=
1
=
1
i
j
ab
ij
n
where n is the total number of samples and
denotes the number of shared patterns
between clusters C a i P a and C a j P a .
This is done between the cluster C i and all partitions available in the reference set.
Fig. 2 shows this method.
NMI i in Fig. 2 shows the stability of cluster C i with respect to the i -th partition in
reference set. The total stability of cluster C i is defined as:
M
1
=
Stability
(
C
)
=
NMI
(2)
i
i
M
i
1
where M is the number of partitions available in reference set. This procedure is
applied to each cluster of every primary partition.
 
Search WWH ::




Custom Search