Advantages of Information Granulation in Clustering Algorithms - Agents and Artificial Intelligence


Advantages of Information Granulation in Clustering Algorithms

Urszula Kuzelewska

Faculty of Computer Science, Bialystok University of Technology

Wiejska 45a, 15-521 Bialystok, Poland

u.kuzelewska@pb.edu.pl

http://www.wi.pb.edu.pl

Abstract. Clustering is part of the data mining domain. Its task is to identify groups of similar data objects according to a defined similarity criterion. One of the most common problems in this field is the time complexity of algorithms. Reducing processing time is particularly important due to the constantly growing size of present-day databases. Granular computing (GrC) techniques create and/or process data portions, called granules, identified with regard to similar description, functionality or behavior. An interesting characteristic of granular computation is the ability to create a multi-perspective view of data depending on the required resolution level. Data granules identified on different levels of resolution form a hierarchical structure expressing relations between the data objects. Granular computing includes methods from various areas with the aim of helping humans better understand the analyzed problems and the generated results.

The proposed clustering solution is based on processing granulated data in the form of hyperboxes. The results are compared with the clustering of point-type data with regard to complexity, quality and interpretability.

Keywords: Knowledge discovery, Data mining, Information granulation, Granular computing, Clustering, Hyperboxes.
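To make the idea of hyperbox granules concrete, the sketch below summarises point data by axis-aligned hyperboxes (a lower and an upper bound per dimension). It is not the granulation procedure proposed in this paper: it assumes a simple grid scheme in which all points falling into the same cell of a hypothetical width `cell_size` form one granule. Varying `cell_size` imitates the different resolution levels mentioned in the abstract; the names `Hyperbox` and `granulate` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass
class Hyperbox:
    """Axis-aligned hyperbox granule: per-dimension lower/upper bounds plus a point count."""
    lower: List[float]
    upper: List[float]
    count: int = 0

    def add(self, p: Sequence[float]) -> None:
        # Stretch the bounds just enough to cover p and record it.
        for i, x in enumerate(p):
            self.lower[i] = min(self.lower[i], x)
            self.upper[i] = max(self.upper[i], x)
        self.count += 1

def granulate(points: Sequence[Sequence[float]], cell_size: float) -> List[Hyperbox]:
    """Hypothetical grid granulation (an assumption, not the paper's method):
    points in the same grid cell are summarised by one bounding hyperbox.
    A larger cell_size means a coarser resolution level."""
    cells: Dict[Tuple[int, ...], Hyperbox] = {}
    for p in points:
        key = tuple(int(x // cell_size) for x in p)  # index of the grid cell containing p
        if key not in cells:
            cells[key] = Hyperbox(list(p), list(p))
        cells[key].add(p)
    return list(cells.values())

data = [(0.1, 0.2), (0.4, 0.3), (0.2, 0.9), (2.5, 2.6), (2.7, 2.9), (5.1, 0.4)]
for size in (0.5, 3.0):
    boxes = granulate(data, size)
    print(f"cell_size={size}: {len(data)} points -> {len(boxes)} hyperboxes")
```

On this toy data the six points collapse into four hyperboxes at `cell_size=0.5` and into two at `cell_size=3.0`, so a clustering algorithm working on granules would see fewer, coarser objects as the resolution decreases.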

1 Introduction

Cluster analysis is the organization of a collection of patterns (usually represented as a vector of measurements, or a point in a multi-dimensional space) into clusters based on their similarity [5]. Points within one cluster are more similar to one another than to any points from the remaining clusters. The meaning of "similar" can differ depending on the clustering algorithm and the type of data used, but it usually means the inverse of a distance between points, typically the Euclidean distance for continuous attributes. Partitioning methods have been widely applied in, among others, pattern recognition, image processing, statistical data analysis and knowledge discovery.
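The notion of similarity as the inverse of a Euclidean distance can be sketched as follows. The text does not fix a particular transformation, so the form 1/(1 + d) used here is an assumption; it is one common choice that maps identical points to 1 and lets similarity fall towards 0 as distance grows.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; math.dist raises ValueError if the dimensions differ."""
    return math.dist(a, b)

def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed inverse-distance similarity in (0, 1]: identical points give 1.0,
    and the value shrinks towards 0 as the distance grows."""
    return 1.0 / (1.0 + euclidean(a, b))

p, q, r = (0.0, 0.0), (3.0, 4.0), (1.0, 0.0)
print(euclidean(p, q))                      # 5.0
print(similarity(p, r) > similarity(p, q))  # True: r is closer to p than q is
```

Any strictly decreasing function of the distance (for example, a Gaussian kernel) would preserve the same ordering of neighbours; the choice mainly affects how quickly similarity decays.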

Clustering methods face many challenges, such as differences in cluster size or density, arbitrary cluster shapes, the presence of noise or outliers, and detecting data in which no clusters are present [4]. Another issue when discussing clustering algorithms is time complexity, which is particularly important when dealing with large databases.
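A rough way to see why data size matters: any algorithm that needs a full distance matrix (agglomerative hierarchical clustering is a typical example) evaluates n(n - 1)/2 distances for n objects. The sketch below only does this arithmetic; the figure of 1,000 granules standing in for 100,000 points is a hypothetical assumption, and the estimate ignores both the cost of building the granules and any extra cost of comparing hyperboxes instead of points.

```python
def pair_count(n: int) -> int:
    """Distinct object pairs, i.e. distance evaluations for a full distance matrix."""
    return n * (n - 1) // 2

def pair_reduction(n_points: int, n_granules: int) -> float:
    """Hypothetical helper: how many times fewer pairwise distances are needed
    when n_points objects are summarised by n_granules granules."""
    return pair_count(n_points) / pair_count(n_granules)

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} objects -> {pair_count(n):>13,} pairs")
print(f"100,000 points vs 1,000 granules: ~{pair_reduction(100_000, 1_000):,.0f}x fewer pairs")
```

The pair counts grow from 499,500 to 49,995,000 to 4,999,950,000, so a tenfold increase in data size costs roughly a hundredfold more distance evaluations, while clustering 1,000 granules instead of 100,000 points cuts the count by a factor of about 10^4.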
