This algorithm assumes processing of both hyperboxes and point-type data. To make this possible, new data are characterized by 2n values in comparison to the original data. The first n attributes describe the minimal, and the following n the maximal values for every dimension. To assure topological "compatibility" of point-type data and hyperboxes, the dimensionality of the data is doubled initially. The computational complexity of this algorithm is O(N³). However, in every step of the method, the size of the data is decreased by 1, which in practice reduces the overall complexity significantly.
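As a minimal sketch of the doubled representation (the function names are illustrative, not from the paper): a point becomes a degenerate hyperbox whose minimal and maximal corners coincide, so points and genuine hyperboxes share one 2n-dimensional encoding.

```python
# Sketch (not the paper's implementation): encoding n-dimensional points and
# hyperboxes in a common 2n-dimensional space. The first n values hold the
# minima, the following n the maxima, for every dimension.

def to_hyperbox(point):
    """Encode an n-dimensional point as a degenerate hyperbox (min == max)."""
    return list(point) + list(point)

def hyperbox(mins, maxs):
    """Encode a genuine hyperbox by concatenating its minimal and maximal corners."""
    assert len(mins) == len(maxs)
    return list(mins) + list(maxs)

# A 2-D point and a 2-D hyperbox now share one 4-dimensional representation.
print(to_hyperbox([1.0, 2.0]))
print(hyperbox([0.0, 0.0], [1.0, 3.0]))
```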
3 Clustering Algorithms
Clustering methods can be divided into two main categories, partitioning and hierarchical [5]; the division reflects the form of the results achieved. There are also algorithms proposed in the last 20 years that create a new category, density-based techniques, where groups are areas of high data density [4]. Their main advantages are automatic evaluation of the number of groups and the ability to detect groups of arbitrary shapes. One such technique is SOSIG [11], which has all the merits mentioned above, but is characterized by high computational complexity.
3.1 Partitioning Algorithms
Partitioning algorithms determine cluster centers so as to optimize a criterion function, most commonly the sum-squared-error function (see Equation 3):

E = \sum_{i=1}^{n_c} \sum_{x \in C_i} |x - c_i|^2,    (3)
where c_i denotes the center of cluster C_i. Partitioning methods are based on an iterative process that relocates objects among the groups to minimize the value of the criterion function. The most popular partitioning method is k-means [5], which divides the data into a required number k of groups.
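The iterative relocation can be sketched as a naive k-means (an illustrative implementation, not the cited work's code): assignment and update steps alternate until the centers stop moving, which monotonically reduces the criterion E from Equation 3.

```python
import random

def kmeans(data, k, iters=100, seed=0):
    """Naive k-means: relocate points among k groups to reduce the
    sum-squared-error E = sum_i sum_{x in C_i} |x - c_i|^2."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
            clusters[i].append(x)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster happens to be empty).
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers, clusters = kmeans(data, k=2)
```

Note that k must be supplied a priori, which is one of the limitations discussed next.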
All partitioning methods suffer from certain difficulties. Among other things, the number of clusters is required a priori. The form of the criterion function limits the detected groups to spherical shapes. Additionally, there are problems when clusters vary in size. However, the time complexity of partitioning algorithms is very low, O(n), which makes them the most often used methods in the exploration of large databases.
3.2 Hierarchical Algorithms
Hierarchical clustering algorithms [4] create a kind of cluster tree, called a dendrogram, showing the relationships among the objects. Grouping into the desired number of clusters is achieved by cutting the dendrogram at an appropriate level. According to the method of calculating the distance between clusters, they can be divided mainly into single-link (hsl) and complete-link (hcl) approaches. In the hsl method, the dissimilarity between groups is calculated as the distance between the two nearest points from the groups. In contrast, in the hcl approach the dissimilarity is measured as the longest distance between points from
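The agglomerative variant of this process can be sketched naively (an illustrative implementation, not from the cited work): starting from singletons, the two closest clusters are merged until the desired number of groups remains, which is equivalent to cutting the dendrogram at that level; a `link` parameter selects the single-link or complete-link dissimilarity.

```python
from itertools import combinations

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(data, k, link="single"):
    """Naive agglomerative clustering: merge the two closest clusters
    until k remain. link='single' uses the nearest pair of points between
    clusters (hsl); link='complete' uses the farthest pair (hcl)."""
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair = [dist2(a, b) for a in clusters[i] for b in clusters[j]]
            d = min(pair) if link == "single" else max(pair)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

groups = agglomerate([(0.0,), (0.2,), (5.0,), (5.3,)], k=2)
```

This brute-force sketch recomputes all pairwise distances at every merge, reflecting the high computational cost mentioned for such methods; practical implementations cache the inter-cluster distance matrix.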