Advantages of Information Granulation in Clustering Algorithms - Agents and Artificial Intelligence

where

$$R_{ij} = \frac{\operatorname{stdev}(C_i) + \operatorname{stdev}(C_j)}{d(C_i, C_j)} \qquad (10)$$

where $\operatorname{stdev}(C_i)$ and $\operatorname{stdev}(C_j)$ denote the standard deviations of clusters $C_i$ and $C_j$, respectively. The standard deviation of a cluster $C_i$ is given by Equation 11.

$$\operatorname{stdev}(C_i) = \sqrt{\frac{\sum_{x \in C_i} \left( d(x, \bar{x}) \right)^2}{|C_i|}} \qquad (11)$$

where $\bar{x}$ is the centroid of the cluster and $d(x, \bar{x})$ is the Euclidean distance between the point $x$ and the centroid $\bar{x}$.

The DB index measures the average similarity between each cluster and its most similar one, so it is desirable to minimize this value.
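
The following minimal Python sketch illustrates how Equations 10 and 11 could be combined into the DB index as described above, that is, as the average over clusters of the largest $R_{ij}$ with any other cluster. It rests on assumptions not stated in this section: $d(C_i, C_j)$ is taken to be the Euclidean distance between the two centroids, and the function names `centroid`, `stdev` and `db_index` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def stdev(points):
    """Equation 11: root of the mean squared Euclidean distance to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(x, c) ** 2 for x in points) / len(points))


def db_index(clusters):
    """Average over clusters of the largest R_ij (Equation 10) with any other cluster.

    Assumption: d(C_i, C_j) is the Euclidean distance between centroids.
    Requires at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    devs = [stdev(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Similarity of cluster i to its most similar other cluster.
        total += max(
            (devs[i] + devs[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two compact, well-separated clusters give a small index value.
example = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(db_index(example))
```

In this example each cluster has standard deviation 1 and the centroids are 10 apart, so $R_{12} = 2/10$ and the index is small, which is consistent with a good partitioning.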

When a reference partitioning is available, external validity measures can be used. External indices take into account the membership of points in the generated ($C$) and compared ($P$) structures [3]. One example is the Rand statistic ($R$), which takes values between 0 and 1. High values of this index indicate a high degree of similarity between $C$ and $P$.

Let $U = \{x_1, \ldots, x_n\}$ be a set of objects, and let the original partitioning $C$ and the compared partitioning $P$ each be composed of $r$ clusters: $C = \{c_1, \ldots, c_r\}$ and $P = \{p_1, \ldots, p_r\}$. The Rand index is defined by Equation 12:

$$R = \frac{a + b}{a + b + c + d} \qquad (12)$$

where $a$, $b$, $c$ and $d$ are defined as follows:

- $a$ is the number of pairs of elements in $U$ which are in the same set in $C$ and in the same set in $P$,

- $b$ is the number of pairs of elements in $U$ which are in different sets in $P$ and in different sets in $C$,

- $c$ is the number of pairs of elements in $U$ which are in the same set in $P$ and in different sets in $C$,

- $d$ is the number of pairs of elements in $U$ which are in different sets in $P$ and in the same set in $C$.
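
The pair counts defined above can be sketched in Python as follows. This is only an illustration under stated assumptions: each partitioning is represented as a list of cluster labels, one per object of $U$ in the same order, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_c, labels_p):
    """Rand index (Equation 12) from two label lists of equal length (n >= 2).

    labels_c[i] and labels_p[i] are the cluster labels of object i
    in partitionings C and P, respectively (an assumed representation).
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1  # same set in C and same set in P
        elif not same_c and not same_p:
            b += 1  # different sets in P and different sets in C
        elif same_p:
            c += 1  # same set in P, different sets in C
        else:
            d += 1  # different sets in P, same set in C
    return (a + b) / (a + b + c + d)


print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # identical partitionings
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))  # one object moved
```

For the second call the six pairs split into $a = 1$, $b = 2$, $c = 2$ and $d = 1$, so $R = 3/6 = 0.5$, while identical partitionings give $R = 1$.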

Experiments

The experiments compare the results of group detection under two approaches: when the data are points and when they are hyperboxes. The following algorithms are used: k-means, hcl, hsl and SOSIG. For methods that require the number of groups as a parameter, the original values from Table 1 are supplied. In every case the following are compared: clustering time (Table 3) and the values of validity indices (Tables 4 and 5). For the SOSIG algorithm, because it is able to detect the number of clusters, the numbers of detected groups are also examined (Table 2). The interpretability of clusterings created on the basis of SOSIG results has also been taken into consideration (Tables 6 and 7).
