Chapter 4
Hierarchical Clustering from ICA Mixtures
4.1 Introduction
In this chapter, we present a procedure for clustering (unsupervised learning) data
from a model based on mixtures of independent component analyzers. Clustering
techniques have been extensively studied in many different fields for a long time.
They can be organized in various ways according to several theoretical criteria.
However, a rough but widely accepted classification distinguishes hierarchical
and partitional clustering; see, for instance, [1]. Both clustering categories
provide a division of the data objects. The hierarchical approach also yields a
hierarchical structure from a sequence of partitions performed from singleton
clusters to a cluster including all data objects (agglomerative or bottom-up strat-
egy) or vice versa (divisive or top-down strategy). This structure consists of a
binary tree (dendrogram) whose leaves are the data objects and whose internal
nodes represent nested clusters of various sizes. The root node of the dendrogram
represents the whole data set, while the internal nodes indicate how proximal the
objects they contain are to one another; the height at which two branches merge
usually represents the distance between the corresponding pair of objects or
clusters, or between an object and a cluster.
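The agglomerative strategy just described can be sketched in a few lines of code. The following is a minimal, self-contained illustration using single linkage on a toy one-dimensional data set; the function name and data are illustrative assumptions only, not the ICA-mixture procedure this chapter develops:

```python
from itertools import combinations

def agglomerative_single_linkage(points):
    """Bottom-up clustering: start from singleton clusters and repeatedly
    merge the two closest clusters until one cluster holds all points.
    Returns the merge sequence, i.e. the dendrogram's internal nodes."""
    clusters = [frozenset([i]) for i in range(len(points))]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((sorted(a), sorted(b), dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Toy 1-D data: two well-separated groups {0, 1} and {9, 10}.
merges = agglomerative_single_linkage([0, 1, 9, 10])
for left, right, height in merges:
    print(left, right, height)
```

Reading the merge sequence bottom-up reproduces the dendrogram: the leaves are the four data points, each merge is an internal node, and the recorded distance is the height of that node. A divisive algorithm would traverse the same structure in the opposite direction, starting from the full data set.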
A review of clustering algorithms should include the following types:
hierarchical; squared error-based (vector quantization); mixture density-
based; graph theory-based; combinatorial search technique-based; fuzzy; neural
network-based; and kernel-based. In addition, some techniques have been developed
to tackle sequential, large-scale, and high-dimensional data sets [2]. The advantages
of hierarchical clustering include embedded flexibility regarding the level of
granularity and the ability to deal with different types of attributes. The disadvantages
of hierarchical clustering are the difficulty of scaling up to large data sets, the
vagueness of stopping criteria, and the fact that most clustering algorithms cannot
recover from poor choices when merging or splitting data points [3].