concise representation of a system's behaviour. Most clustering algorithms are
unsupervised and do not rely on assumptions common to statistical classification
methods, such as the statistical data distribution. They are, therefore, very
appropriate for situations where little a priori knowledge exists. The data
classification capability of clustering algorithms has been widely exploited in
pattern recognition, image processing, and nonlinear system modelling. In what
follows, we introduce the reader to clustering theory and present some
fuzzy clustering algorithms based on the c-means functional. For an in-depth
treatment of fuzzy clustering, readers may refer to the classical monograph by Jain
and Dubes (1988); for an overview of different clustering algorithms, refer to
Bezdek and Pal (1992), Babuška (1996), and Setnes (2000).
4.7.1.1 Elements of Clustering Theory
Clustering techniques essentially try to group data samples in feature space and
they form the basis of many classification and system modelling algorithms. They
are applied to data that could be numerical (quantitative), qualitative (categorical),
or a mixture of both. Our attention here will be focused on clustering of
quantitative data, which might be observations of some physical process, such as
time series data. It will be supposed that each observation consists of n variables,
grouped into an n-dimensional column vector

$$\mathbf{z}_s = [z_{1s}, z_{2s}, \ldots, z_{ns}]^T, \quad \mathbf{z}_s \in \mathbb{R}^n .$$

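A set of N observations is described by

$$\mathbf{Z} = \{ \mathbf{z}_s \mid s = 1, 2, \ldots, N \}$$

and is represented by the n × N pattern matrix Z:

$$\mathbf{Z} = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nN} \end{bmatrix} .$$
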
In pattern recognition terminology, the rows and columns of the pattern matrix
are called features (or attributes) and patterns (or objects), respectively. The
pattern matrix Z is also called the data matrix. In control engineering, for
example, each row of a data matrix may represent one of the process variables,
such as pressure, temperature, or flow, whereas each column corresponds to a
sampling time point.
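
As a minimal sketch of this layout, the following Python fragment arranges a handful of observations into an n × N pattern matrix using only built-in lists. The sample values, the variable names, and the helper `pattern_matrix` are hypothetical and serve only to illustrate the row/column convention described above.

```python
# Hypothetical process data (illustrative values only): each observation z_s
# holds n = 3 variables (pressure, temperature, flow) taken at one time point.
observations = [
    [1.2, 300.0, 0.8],  # z_1
    [1.3, 305.0, 0.9],  # z_2
    [1.1, 298.0, 0.7],  # z_3
    [1.4, 310.0, 1.0],  # z_4
]


def pattern_matrix(samples):
    """Arrange N observations (each of length n) as an n x N nested list.

    Row k holds feature k across all samples; column s is observation z_s.
    """
    if not samples:
        raise ValueError("at least one observation is required")
    n = len(samples[0])
    if any(len(z) != n for z in samples):
        raise ValueError("all observations must have the same number of variables")
    # Transpose: the k-th entry of every observation forms row k.
    return [[z[k] for z in samples] for k in range(n)]


Z = pattern_matrix(observations)
n, N = len(Z), len(Z[0])          # n features (rows), N patterns (columns)
temperature_row = Z[1]            # one feature over all sampling times
third_pattern = [row[2] for row in Z]  # column 3, i.e. observation z_3
```

Here each row of `Z` is one process variable over time and each column is one sampled observation, which matches the feature/pattern convention used in the text.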
Clusters are usually defined as groups of objects that are more similar to one
another than to the members of other clusters (Bezdek, 1981; Jain and
Dubes, 1988), whereby the term “similarity” should be understood as mathematical
similarity, measured in some well-defined sense. In metric spaces, similarity is