where
$$R_{ij} = \frac{\mathrm{stdev}(C_i) + \mathrm{stdev}(C_j)}{d(C_i, C_j)} \tag{10}$$
where $\mathrm{stdev}(C_i)$ and $\mathrm{stdev}(C_j)$ denote the standard deviations of clusters $C_i$ and $C_j$, respectively. The standard deviation of a cluster $C_i$ is given by Equation 11.
$$\mathrm{stdev}(C_i) = \sqrt{\frac{1}{|C_i|} \sum_{x \in C_i} \left( d(x, \bar{x}) \right)^2} \tag{11}$$
where $\bar{x}$ is the centroid of the cluster and $d(x, \bar{x})$ is the Euclidean distance between
the point $x$ and the centroid $\bar{x}$.
The DB index measures the average similarity between each cluster and its most similar
one, so lower values indicate a better partitioning.
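The quantities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cluster distance d(C_i, C_j) is assumed to be the Euclidean distance between centroids, the standard deviation is taken as the root-mean-square distance of the points to their centroid, and the final index is assumed to be the mean over clusters of the largest R_ij.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def euclidean(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_stdev(points):
    """Root-mean-square distance of the points to their centroid (assumed form of Eq. 11)."""
    c = centroid(points)
    return math.sqrt(sum(euclidean(p, c) ** 2 for p in points) / len(points))


def db_index(clusters):
    """Mean over clusters of the largest R_ij (Eq. 10).

    Assumption: d(C_i, C_j) is the distance between centroids.
    Requires at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    devs = [cluster_stdev(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (devs[i] + devs[j]) / euclidean(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two compact, well-separated clusters give a small index value.
example = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(db_index(example))
```

In this toy example each cluster has a standard deviation of 1 and the centroids are 10 apart, so R_ij is 2/10 for both clusters.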
When a reference partitioning is available, external validity measures can be
used. External indices take into account the membership of points in the generated
structure ( C ) and in the compared structure ( P ) [3]. One example is the Rand statistic ( R ),
which takes values between 0 and 1. High values of this index indicate great similarity
between C and P.
Let $U = \{x_1, \ldots, x_n\}$ be a set of objects, and let the original partitioning C and the
compared partitioning P each be composed of r clusters: $C = \{c_1, \ldots, c_r\}$ and
$P = \{p_1, \ldots, p_r\}$. The Rand index is defined by Equation 12:
$$R = \frac{a + b}{a + b + c + d} \tag{12}$$
where a , b , c and d are defined as follows:
- a is the number of pairs of elements in U which are in the same set in C and in the
same set in P,
- b is the number of pairs of elements in U which are in different sets in P and in
different sets in C,
- c is the number of pairs of elements in U which are in the same set in P and in
different sets in C,
- d is the number of pairs of elements in U which are in different sets in P and in the
same set in C.
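The four pair counts and the resulting index can be computed directly by enumerating all pairs of objects. The following is a minimal sketch, assuming each partitioning is given as a list of cluster labels aligned with the objects of U; the function name and the label encoding are illustrative choices, not taken from the source.

```python
from itertools import combinations


def rand_index(labels_c, labels_p):
    """Rand index (Eq. 12) for two label lists describing the same objects.

    labels_c[i] and labels_p[i] are the cluster labels of object i in the
    generated partitioning C and the compared partitioning P.
    Requires at least two objects, otherwise there are no pairs to count.
    """
    if len(labels_c) != len(labels_p):
        raise ValueError("both partitionings must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1  # same set in C and same set in P
        elif not same_c and not same_p:
            b += 1  # different sets in both
        elif same_p:
            c += 1  # same set in P, different sets in C
        else:
            d += 1  # different sets in P, same set in C
    return (a + b) / (a + b + c + d)


# Four objects: C groups them as {0,1},{2,3}; P splits the second group.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Here a = 1, b = 4, c = 0 and d = 1, so the index is 5/6. Enumerating pairs is quadratic in the number of objects, which is fine for a sketch but slow for large data sets.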
5 Experiments
The experiments compare the results of group detection under two approaches:
when the data are points and when they are hyperboxes. The following algorithms are used:
k-means, hcl, hsl and SOSIG. For methods that require the number of groups as a
parameter, the original values from Table 1 are supplied. In every case, the
clustering time (Table 3) and the values of the validity indices (Tables 4 and 5) are compared.
For the SOSIG algorithm, which can itself detect the number of clusters, the
numbers of detected groups are also examined (Table 2). The interpretability of
clusterings created on the basis of SOSIG results has also been taken into consideration
(Tables 6 and 7).