Biology Reference
In-Depth Information
analysis. For example, in image segmentation, clusters are regions in the image,
each of which is considered to “be homogeneous with respect to some image
property of interests such as intensity, color or texture” [16]. In variable clustering,
a cluster is usually a group of sequences that are associated with each other. For
instance, clustering spike train data (sequences of spike time stamps) from
multiple brain neurons identifies the associations among those neurons. In some
other applications, a cluster may be considered as a sample from an underlying
probability distribution. In the fish data example, objects in a cluster can be
considered as a random sample from a multivariate distribution.
The goal of clustering analysis also varies with the application. In image
processing, clustering analysis is mostly used for detecting the edges of
objects [28] and for image segmentation. Image segmentation is a common problem
in image processing. It involves taking an image and identifying particular
features, such as human figures or vehicles, for further purposes such
as movement tracking. If properly implemented, clustering analysis can
automatically divide an image into homogeneous regions. In some other applications,
the goal of clustering analysis may be to infer the underlying distributions
generating the clusters, such as the number of underlying distributions and the
parameters of each distribution.
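As a rough illustration of how clustering can divide an image into homogeneous regions, the sketch below groups the pixel intensities of a tiny grayscale image into two clusters. It uses a one-dimensional k-means heuristic, which is an assumption made here for illustration and not a method prescribed by the text; the function name segment_by_intensity and the 4 x 4 image are hypothetical.

```python
def segment_by_intensity(image, k=2, iters=20):
    """Label each pixel with the index of its nearest intensity centroid.

    A minimal 1-D k-means sketch; assumes k >= 2 and a non-empty image.
    """
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    # Spread the initial centroids evenly over the intensity range.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in pixels:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Recompute each centroid as the group mean; keep the old one if the group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return [
        [min(range(k), key=lambda j: abs(v - centroids[j])) for v in row]
        for row in image
    ]


# Hypothetical 4 x 4 grayscale image: dark background, bright object in the lower right.
image = [
    [10, 12, 11, 13],
    [11, 10, 12, 14],
    [12, 11, 200, 210],
    [13, 12, 205, 198],
]
labels = segment_by_intensity(image)
for row in labels:
    print(row)
```

Pixels that share a label form one region. Real images would normally call for more features than raw intensity (color or texture, for example) and for a principled choice of the number of clusters.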
Readers should note the difference between clustering and classification.
Classification is also called supervised learning. Given a collection of
labeled objects, we derive a discrimination model, which is later used to label
a new object that has no class label. Clustering, also called unsupervised learning,
groups a collection of unlabeled objects into meaningful clusters. After
clustering, objects in the same cluster are given the same label, and objects in
different clusters are labeled differently.
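The contrast can be made concrete with a small sketch on one-dimensional data. The supervised part learns one centroid per class from labeled values and uses it to label a new value; the unsupervised part receives the same values without labels and splits them at the largest gap. Both rules (nearest centroid and largest gap) are simple stand-ins chosen here for illustration, and the data and function names are hypothetical.

```python
def fit_centroids(points, labels):
    """Supervised step: average the labeled points of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(x, centroids):
    """Assign a new, unlabeled value to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def cluster_by_largest_gap(points):
    """Unsupervised step: split the values into two clusters at the largest gap."""
    order = sorted(points)
    gaps = [b - a for a, b in zip(order, order[1:])]
    cut = gaps.index(max(gaps))
    threshold = (order[cut] + order[cut + 1]) / 2
    return [0 if x < threshold else 1 for x in points]


# Hypothetical measurements; class labels exist only in the supervised case.
points = [1.0, 1.2, 0.8, 5.0, 5.4, 4.6]
known_labels = ["a", "a", "a", "b", "b", "b"]

centroids = fit_centroids(points, known_labels)
print(classify(1.5, centroids), classify(4.0, centroids))

cluster_labels = cluster_by_largest_gap(points)
print(cluster_labels)
```

The cluster labels 0 and 1 are arbitrary identifiers: clustering recovers the grouping, but it cannot know that one group is called "a" and the other "b".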
In this chapter, we introduce clustering analysis mostly from the perspective of
multivariate statistics. For the convenience of the readers, we also introduce some
heuristic methods that may be needed in applications where it is not
appropriate to assume a multivariate probability distribution. We focus on
two basic aspects of clustering analysis: clustering and determining the number
of clusters.
As the definition of clustering analysis indicates, the measure of similarity (or
dissimilarity) plays an important role. Before we go into those two topics, we describe
the measures of similarity (or dissimilarity) between two objects.
Before moving forward, we first give the notation that will be used in the
remainder of this chapter.
We denote the dataset to be clustered as X, which is an N × P matrix where P is
the number of features (variables) and N is the number of observations. Here,
X′ stands for the transpose of X. In observation clustering, observation i is
characterized by the i-th row of X, denoted as x_i,