Information Technology Reference
In-Depth Information
combine the primary results [25], [13] and [15]. Here, the difference is the absence of
some clusters in primary partitions. Since the original EAC method [13] cannot truly
identify the pairwise similarity while there is just a subset of clusters, in this paper a
new method to construct the co-association matrix is presented which hereafter called:
Extended Evidence Accumulation Clustering method, EEAC. Finally, the single-link
method is employed to extract the final clusters from this matrix.
3.1 Cluster Evaluation Using APMM Criteria
A stable cluster is the one that has a high likelihood of recurrence across multiple
applications of the clustering method. The stable clusters are usually preferable, since
they are robust with respect to minor changes in the dataset [18].
Now assume that we want to compute the stability of cluster
C
i
. In this method first
a set of partitionings over resampled datasets is provided which is called the reference
set. In this notation
D
is resampled data and
P(D)
is a partitioning over
D
. Now, the
problem is: “How many times is the cluster
C
i
repeated in the reference partitions?”
Denote by NMI(
C
i
,P(D)
), the Normalized Mutual Information between the cluster
C
i
and a reference partition
P(D)
. The most previous works only compare a
partition
with another partition
[25]. However, the stability used in [18] evaluates the
similarity between a
cluster and a partition
by transforming the cluster
C
i
to a
partition and employing common
partition to partition
methods. To illustrate this
method let
P
1
=
P
a
={
C
i
,D/C
i
}
be a partition with two clusters, where
D/C
i
denotes the
set of data points in
D
that are not in
C
i
.
Then we may compute a second partition
P
2
=
P
b
=
{C
*
,D/C
*
}
, where
C
*
denotes the
union of all “positive” clusters in
P
(
D
) and others are in
D/C
*
. A cluster
C
j
in
P
(
D
) is
positive if more than half of its data points are in
C
i
.
Now, define NMI(
C
i
,P(D)
) by NMI(
P
a
,P
b
) which is calculated as [12]:
ab
ij
k
k
⎛
⎞
n
.
n
a
b
∑∑
⎜
⎜
⎝
⎟
⎟
⎠
ab
ij
−
2
n
log
a
i
b
j
n
.
n
i
==
11
j
a
b
NMI
(
P
,
P
)
=
(1)
k
k
⎛
b
j
⎞
⎛
a
i
⎞
n
n
a
b
∑
∑
⎜
⎝
⎟
⎠
a
i
⎜
⎜
⎝
⎟
⎟
⎠
b
j
n
log
+
n
log
⎜
⎟
n
n
=
1
=
1
i
j
ab
ij
n
where
n
is the total number of samples and
denotes the number of shared patterns
between clusters
C
a
i
∈
P
a
and
C
a
j
∈
P
a
.
This is done between the cluster
C
i
and all partitions available in the reference set.
Fig. 2 shows this method.
NMI
i
in Fig. 2 shows the stability of cluster
C
i
with respect to the
i
-th partition in
reference set. The total stability of cluster
C
i
is defined as:
M
1
∑
=
Stability
(
C
)
=
NMI
(2)
i
i
M
i
1
where
M
is the number of partitions available in reference set. This procedure is
applied to each cluster of every primary partition.
Search WWH ::
Custom Search