Advantages of Information Granulation in Clustering Algorithms
Urszula Kuzelewska
Faculty of Computer Science, Bialystok University of Technology
Wiejska 45a, 15-521 Bialystok, Poland
u.kuzelewska@pb.edu.pl
http://www.wi.pb.edu.pl
Abstract. Clustering is part of the data mining domain. Its task is to identify
groups of similar data objects according to a defined similarity criterion.
One of the most common problems in this field is the time complexity of the
algorithms. Reducing processing time is particularly important because of the
constantly growing size of present-day databases. Granular computing (GrC)
techniques create and/or process portions of data, called granules, identified
with regard to similar description, functionality or behavior. An interesting
characteristic of granular computation is the ability to create a
multi-perspective view of data depending on the resolution level required.
Data granules identified at different levels of resolution form a hierarchical
structure expressing relations between the data objects. Granular computing
includes methods from various areas, with the aim of supporting humans in
better understanding the analyzed problems and the generated results.
The proposed clustering solution is based on processing granulated data in
the form of hyperboxes. The results are compared with the clustering of
point-type data with regard to complexity, quality and interpretability.
Keywords: Knowledge discovery, Data mining, Information granulation,
Granular computing, Clustering, Hyperboxes.
1 Introduction
Cluster analysis organizes a collection of patterns (usually represented as vectors
of measurements, or points in a multi-dimensional space) into clusters based on their
similarity [5]. The points within one cluster are more similar to one another than to
any points from the remaining clusters. The meaning of "similar" can differ between
clustering algorithms and types of data, but it usually denotes the inverse of the
distance between the points, the Euclidean distance in the case of continuous
attributes. Partitioning methods have found wide application in, among others,
pattern recognition, image processing, statistical data analysis and knowledge
discovery.
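
The relation between distance and similarity described above can be illustrated
with a minimal Python sketch. It is only an illustration: the text does not say
how a distance is turned into a similarity value, so the mapping 1 / (1 + d) and
the function names euclidean_distance and similarity are assumptions chosen for
this example.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with continuous attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similarity(p, q):
    """Assumed mapping (not from the paper): similarity falls as distance grows.

    Identical points get similarity 1.0; distant points approach 0.0.
    """
    return 1.0 / (1.0 + euclidean_distance(p, q))


a = (0.0, 0.0)
b = (3.0, 4.0)
c = (0.5, 0.5)

print(euclidean_distance(a, b))             # 5.0
print(similarity(a, c) > similarity(a, b))  # True: c is closer to a than b is
```

Any other monotonically decreasing function of the distance would serve the same
purpose; what matters for clustering is that closer points are treated as more
similar.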
Clustering methods face many challenges, such as differences in cluster
size or density, arbitrary cluster shapes, the presence of noise or outliers, and
detecting data in which no clusters are present [4]. Another issue with clustering
algorithms is time complexity, which is particularly important when dealing with
large databases.
 