[Figure 3.2. Panel columns, left to right: Original (S), Seriated (S´), and the distribution of similarities from 0 to 1 overall, within cluster 1, within cluster 2, and between clusters. Rows: (a), (b), (c), (d).]
Fig. 3.2. Illustrative Clusion patterns in original order and seriated using optimal bipartitioning are shown in the left two columns. The right four columns show corresponding similarity distributions. In each example there are 50 objects: (a) no natural clusters (randomly related objects), (b) set of singletons (pairwise near-orthogonal objects), (c) one natural cluster (unimodal Gaussian), and (d) two natural clusters (mixture of two Gaussians).
red horizontal and vertical lines are used to show the divisions into the rectangular regions.⁴ Visualizing similarity space in this way can help to quickly get a feel for the clusters in the data. Even for a large number of points, a sense for the intrinsic number of clusters k in a data set can be gained.
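As a rough illustration of the reordering described above, the following minimal sketch permutes a similarity matrix so that objects in the same cluster are adjacent, records where the dividing lines between the rectangular regions would fall, and summarizes the similarity inside each region. Everything in it is an assumption made for illustration: the function names `similarity_matrix`, `seriate`, and `block_means` are hypothetical, the similarity s = exp(-d²) is just one convenient choice with values in (0, 1], and the known generating labels stand in for the output of a clustering algorithm.

```python
import math
import random

def similarity_matrix(points):
    """Pairwise similarity in (0, 1]: s = exp(-squared Euclidean distance).

    This particular similarity is an assumption made for the sketch.
    """
    def sim(p, q):
        return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)))
    return [[sim(p, q) for q in points] for p in points]

def seriate(S, labels):
    """Permute rows and columns of S so same-cluster objects are adjacent.

    Returns the permuted matrix, the permutation, and the positions at which
    the dividing lines between clusters would be drawn.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    S_perm = [[S[i][j] for j in order] for i in order]
    sorted_labels = [labels[i] for i in order]
    cuts = [p for p in range(1, len(order))
            if sorted_labels[p] != sorted_labels[p - 1]]
    return S_perm, order, cuts

def block_means(S, labels):
    """Mean similarity over distinct pairs, per cluster and between clusters."""
    sums, counts = {}, {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            key = labels[i] if labels[i] == labels[j] else "between"
            sums[key] = sums.get(key, 0.0) + S[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy data: 50 objects from two well-separated 2-D Gaussians, shuffled.
rng = random.Random(0)
data = [((rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)), 0) for _ in range(25)]
data += [((rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)), 1) for _ in range(25)]
rng.shuffle(data)
points = [p for p, _ in data]
labels = [c for _, c in data]  # stand-in for a clustering algorithm's output

S = similarity_matrix(points)
S_seriated, order, cuts = seriate(S, labels)
means = block_means(S, labels)

print("dividing lines at:", cuts)
print("mean similarity per block:", means)
```

With well-separated clusters, the two diagonal blocks of `S_seriated` should hold high similarities and the off-diagonal block low ones, which is the pattern the dividing lines make visible. Rendering the matrix as an image is left out here to keep the sketch free of plotting dependencies.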
Figure 3.2 shows Clusion output in four extreme scenarios to provide a feel for how data properties translate to the visual display. Without loss of generality, we consider the partitioning of a set of objects into two clusters. For each scenario, the left-hand side shows the original similarity matrix S and the seriated version S´ (Clusion) for an optimal bipartitioning. The right-hand side shows four histograms of the distribution of the similarity values s, which range from 0 to 1. From left to right, we have plotted the distribution of s over the entire data, within the first cluster, within the second cluster, and between the first and second clusters. If the data are naturally clustered and the clustering algorithm is good, then the middle two columns of plots will be much more skewed to the right compared to the
⁴ This can be seen more clearly in the color pictures in the soft copy.