Statistical Clustering Analysis: An Introduction - Clustering Challenges in Biological Network

Chapter 5

Statistical Clustering Analysis: An Introduction

Hang Zhang

Department of Industrial Engineering

Arizona State University

Tempe, Arizona 85287-5906, USA

hang.zhang@asu.edu

Clustering analysis segments the objects in a dataset into meaningful subsets
such that objects with high similarity fall into the same subset and objects
with low similarity fall into different subsets. This chapter introduces three
fundamental topics in clustering analysis: the definition of similarity and
dissimilarity measures, the clustering algorithm, and the determination of the
number of clusters. For each topic, we introduce the most widely used
approaches and emphasize their statistical background.

5.1. Introduction

Clustering analysis groups the objects in a dataset into subsets such that
objects with high similarity fall into the same subset and objects with low
similarity fall into different subsets. The resulting subsets are called
clusters.
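The definition above can be illustrated with a minimal sketch. It assumes Euclidean distance as the dissimilarity measure (the chapter has not yet fixed one), and the data, the labels, and the helper `mean_pairwise_distances` are hypothetical names chosen for illustration. For a reasonable grouping, the average distance between objects in the same cluster should be smaller than the average distance between objects in different clusters.

```python
import math
from itertools import combinations

# Hypothetical toy data: four objects, each a vector of two feature values.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
# Assumed grouping: labels[i] is the cluster that points[i] belongs to.
labels = [0, 0, 1, 1]


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Euclidean distance is an assumption made for this sketch only.
    Requires at least one same-cluster pair and one cross-cluster pair.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


within_mean, between_mean = mean_pairwise_distances(points, labels)
print(within_mean < between_mean)
```

Comparing the two averages is only an informal check of a given grouping; it is not a clustering algorithm.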

A dataset to be clustered consists of a collection of objects. An object may be
characterized by a vector of feature values. For example, in a dataset of fish,
an object is a single observation of a fish, represented by a vector of
features such as its weight, length, and color. We call the clustering of such
objects observation clustering. An object may also be characterized by a
sequence of observations, e.g., the time series of a stock price over one year.
If we want a segmentation in which stocks with high dependency are grouped into
the same cluster and stocks with low dependency are placed in different
clusters, we take each sequence as an object. We call the clustering of such
objects (sequences) variable clustering.
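The two kinds of objects can be contrasted in a short sketch. All data values and names (`fish`, `prices`, `pearson`) are hypothetical, and the choice of Euclidean distance for feature vectors and Pearson correlation as the dependency measure for sequences is an assumption made here for illustration, not a choice stated in the text.

```python
import math

# Observation clustering: each object is one fish, described by a
# feature vector (weight in kg, length in cm). Values are made up.
fish = [(1.2, 30.0), (1.3, 31.0), (4.8, 70.0)]

# Variable clustering: each object is a whole sequence of observations,
# here a short hypothetical price series per stock.
prices = {
    "A": [10.0, 11.0, 12.0, 13.0],
    "B": [20.0, 22.0, 24.0, 26.0],
    "C": [5.0, 4.0, 6.0, 3.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Dissimilarity between two observations (fish 0 and fish 1 are close).
d_close = math.dist(fish[0], fish[1])
d_far = math.dist(fish[0], fish[2])

# Dependency between two variables (stocks A and B move together).
r_ab = pearson(prices["A"], prices["B"])
r_ac = pearson(prices["A"], prices["C"])
print(d_close < d_far, r_ab > r_ac)
```

In this toy data the first two fish would be natural candidates for the same cluster under observation clustering, and stocks A and B for the same cluster under variable clustering.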

One question arises from the definition of clustering analysis: what is a
cluster? The answer to this question varies across applications of clustering
