first and fourth columns. In our visualization this corresponds to brighter
off-diagonal regions and darker block-diagonal regions in S compared to the
original S matrix. The proposed visualization technique is quite powerful
and versatile. In Figure 3.2(a) the chosen similarity behaves randomly. Consequently,
no strong visual difference between on- and off-diagonal regions can
be perceived with Clusion in S. This indicates that clustering is ineffective, which
is expected because there is no structure in the similarity matrix. Figure
3.2(b) is based on data consisting of pairwise almost equidistant singletons.
Clustering into two groups still renders the on-diagonal regions very bright,
suggesting more splits. In fact, this will remain unchanged until each data
point is a cluster by itself, thus revealing the singleton character of the data.
For monolithic data (Fig. 3.2(c)), many strong similarities are indicated by
an almost uniformly dark similarity matrix S. Splitting the data results in
dark off-diagonal regions in S. A dark off-diagonal region suggests that the
clusters in the corresponding rows and columns should be merged (or not be
split in the first place).
Clusion indicates that these data are actually one
large cluster. In Fig. 3.2(d), the gray-level distribution of S exposes bright
and dark pixels, thereby recommending that the data be split. In this case, k = 2
apparently is a very good choice (and the clustering algorithm worked well)
because in S on-diagonal regions are uniformly dark and off-diagonal regions
are uniformly bright.
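
The reading of on- and off-diagonal regions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian similarity exp(-d^2), the toy points, and the helper names reorder_by_cluster and block_means are all hypothetical choices made for the example. The sketch permutes S so that cluster members are contiguous, maps similarity to a gray level (dark meaning similar, as in the figures), and summarizes the two kinds of regions by their mean similarity.

```python
import numpy as np

def reorder_by_cluster(S, labels):
    """Permute rows and columns of S so members of each cluster are contiguous."""
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")  # stable: keeps original order inside a cluster
    return S[np.ix_(order, order)], order

def block_means(S, labels):
    """Mean similarity inside clusters (on-diagonal) and across clusters (off-diagonal).

    Assumes at least two clusters, so that off-diagonal pairs exist.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)  # ignore trivial self-similarities
    on_mean = float(S[same & not_self].mean())
    off_mean = float(S[~same].mean())
    return on_mean, off_mean

# Toy data (assumption): two tight groups, stored in interleaved order.
X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, 0.1],
              [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]])
labels = np.array([0, 1, 0, 1, 0, 1])

# Assumed similarity: Gaussian of the squared Euclidean distance.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-sq_dist)

S_reordered, order = reorder_by_cluster(S, labels)
gray = 1.0 - S_reordered  # 0 = dark = high similarity, 1 = bright = low similarity
on_mean, off_mean = block_means(S, labels)
print("row order:", order.tolist())
print("mean on-diagonal similarity:", round(on_mean, 3))
print("mean off-diagonal similarity:", round(off_mean, 3))
```

For a well-matched clustering such as panel (d), the on-diagonal mean should be high (dark blocks) and the off-diagonal mean close to zero (bright blocks); for random similarities as in panel (a), the two means would be expected to be nearly equal.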
This induces an intuitive mining process that guides the user to the
“right” number of clusters. Too small a k leaves the on-diagonal regions inhomogeneous.
Conversely, growing k beyond the natural number of clusters
will introduce dark off-diagonal regions. Finally, Clusion can be used to visually
compare the appropriateness of different similarity measures. Let us
assume, for example, that each row in Fig. 3.2 illustrates a particular way of
defining similarity for the same data set. Then Clusion makes it visually apparent
that the similarity measure in (d) lends itself much better to clustering
than the measures illustrated in rows (a), (b), and (c).
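
The two warning signs used to pick k (inhomogeneous on-diagonal regions when k is too small, dark off-diagonal regions when k is too large) can be mimicked numerically. The sketch below is only an assumed stand-in for the visual inspection that Clusion supports: the summary statistics (standard deviation inside clusters, largest mean similarity between two clusters), the function name block_summary, the Gaussian similarity, and the hand-written candidate labelings are all hypothetical and are not taken from the original text.

```python
import numpy as np

def block_summary(S, labels):
    """Return (on_spread, darkest_off) for a labeling of the rows of S.

    on_spread   : std of within-cluster similarities (high = inhomogeneous blocks)
    darkest_off : largest mean similarity between two different clusters
                  (high = a dark off-diagonal block, i.e. a questionable split)
    """
    labels = np.asarray(labels)
    ids = np.unique(labels)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    within = S[same & not_self]
    on_spread = float(within.std()) if within.size else 0.0
    darkest_off = 0.0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            block = S[np.ix_(labels == a, labels == b)]
            darkest_off = max(darkest_off, float(block.mean()))
    return on_spread, darkest_off

# Toy data (assumption): two natural groups of four points each.
X = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.3, 0.3],
              [5.0, 5.0], [5.3, 5.0], [5.0, 5.3], [5.3, 5.3]])
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-sq_dist)  # assumed Gaussian similarity

# Hand-written candidate partitions; a real run would take them from a clustering algorithm.
candidates = {
    1: [0, 0, 0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2, 2, 2],  # splits the first natural group
}
summary = {k: block_summary(S, lab) for k, lab in candidates.items()}
for k, (on_spread, darkest_off) in summary.items():
    print(f"k={k}: on-diagonal spread={on_spread:.3f}, darkest off-diagonal mean={darkest_off:.3f}")
```

On this toy data, k = 1 shows a large on-diagonal spread, k = 3 shows a dark off-diagonal block between the two halves of the split group, and k = 2 shows neither, which matches the guidance in the paragraph above.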
An interactive tool that facilitates exploration of the merge/split process
can be experienced at http://lans.ece.utexas.edu/~strehl/.
3.4.3 Comparison
Clusion gives a relationship-centered view, as contrasted with common projective
techniques, such as the selection of dominant features or optimal linear
projections (PCA), which are object-centered. In Clusion, the actual features
are transparent; instead, all pairwise relationships, the relevant aspect for the
purpose of clustering, are displayed.
Figure 3.3 compares Clusion with other popular visualizations. In Fig. 3.3(a),
parallel axis, PCA projection, CViz (projection through the plane defined by the centroids
of clusters 1, 2, and 3), as well as Clusion succeed in visualizing the
IRIS data. Membership in cluster 1/2/3 is indicated by colors red/blue/green
(parallel axis), colors red/blue/green and shapes ○/×/+ (PCA and CViz),