This algorithm assumes processing of both hyperboxes and point-type data. To make this possible, new data are characterized by 2n values in comparison to the original data. The first n attributes describe the minimal, and the following n the maximal values for every dimension. To assure topological "compatibility" of point-type data and hyperboxes, the dimensionality of the data is doubled initially. The computational complexity of this algorithm is O(N³). However, in every step of the method, the size of the data is decreased by 1, which in practice reduces the overall complexity significantly.
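As a minimal sketch of the doubled representation (the function names are illustrative, not from the paper): a point becomes a degenerate hyperbox whose minimal and maximal corners coincide, so points and genuine hyperboxes share one 2n-dimensional encoding.

```python
# Sketch (not the paper's implementation): encoding n-dimensional points and
# hyperboxes in a common 2n-dimensional space. The first n values hold the
# minima, the following n the maxima, for every dimension.

def to_hyperbox(point):
    """Encode an n-dimensional point as a degenerate hyperbox (min == max)."""
    return list(point) + list(point)

def hyperbox(mins, maxs):
    """Encode a genuine hyperbox by concatenating its minimal and maximal corners."""
    assert len(mins) == len(maxs)
    return list(mins) + list(maxs)

# A 2-D point and a 2-D hyperbox now share one 4-dimensional representation.
print(to_hyperbox([1.0, 2.0]))
print(hyperbox([0.0, 0.0], [1.0, 3.0]))
```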
3 Clustering Algorithms
Clustering methods can be divided into two main categories, partitioning and hierarchical [5]; the division reflects the form of the results achieved. There are also algorithms proposed in the last 20 years that create a new category, density-based techniques, where groups are areas of high data density [4]. Their main advantages are automatic evaluation of the number of groups and the ability to detect groups of arbitrary shapes. One such technique is SOSIG [11], which has all the merits mentioned above, but is characterized by high computational complexity.
3.1 Partitioning Algorithms
Partitioning algorithms determine cluster centers so as to optimize a criterion function, most commonly the sum-squared-error function (see Equation 3):

E = \sum_{i=1}^{n_c} \sum_{x \in C_i} |x - c_i|^2,    (3)
where c_i denotes the center of cluster C_i. Partitioning methods are based on an iterative process that relocates objects among the groups to minimize the value of the criterion function. The most popular partitioning method is k-means [5], which divides the data into a required number k of groups.
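The iterative relocation can be sketched as a naive k-means (an illustrative implementation, not the cited work's code): assignment and update steps alternate until the centers stop moving, which monotonically reduces the criterion E from Equation 3.

```python
import random

def kmeans(data, k, iters=100, seed=0):
    """Naive k-means: relocate points among k groups to reduce the
    sum-squared-error E = sum_i sum_{x in C_i} |x - c_i|^2."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
            clusters[i].append(x)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster happens to be empty).
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers, clusters = kmeans(data, k=2)
```

Note that k must be supplied a priori, which is one of the limitations discussed next.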
All partitioning methods suffer from certain difficulties. Among other things, the number of clusters is required a priori. The form of the criterion function limits the detected groups to spherical shapes. Additionally, there are problems when clusters vary in size. However, the time complexity of partitioning algorithms is very low, O(n), which makes them the most often used methods in the exploration of large databases.
3.2 Hierarchical Algorithms
Hierarchical clustering algorithms [4] create a kind of cluster tree, called a dendrogram, showing the relationships among the objects. Grouping into the desired number of clusters is achieved by cutting the dendrogram at an appropriate level. According to the method of calculating the distance between clusters, they can be divided mainly into single-link (hsl) and complete-link (hcl) approaches. In the hsl method, the dissimilarity between groups is calculated as the distance between the two nearest points from the groups. In contrast, in the hcl approach the dissimilarity is measured as the longest distance between points from
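The agglomerative variant of this process can be sketched naively (an illustrative implementation, not from the cited work): starting from singletons, the two closest clusters are merged until the desired number of groups remains, which is equivalent to cutting the dendrogram at that level; a `link` parameter selects the single-link or complete-link dissimilarity.

```python
from itertools import combinations

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(data, k, link="single"):
    """Naive agglomerative clustering: merge the two closest clusters
    until k remain. link='single' uses the nearest pair of points between
    clusters (hsl); link='complete' uses the farthest pair (hcl)."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair = [dist2(a, b) for a in clusters[i] for b in clusters[j]]
            d = min(pair) if link == "single" else max(pair)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

groups = agglomerate([(0.0,), (0.2,), (5.0,), (5.3,)], k=2)
```

This brute-force sketch recomputes all pairwise distances at every merge, reflecting the high computational cost mentioned for such methods; practical implementations cache the inter-cluster distance matrix.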