[Figure 3.3 panels: Parallel axis, PCA projection, CViz projection, and Clusion, each shown for data sets (a) and (b).]
Fig. 3.3. Comparison of visualization techniques. All tools work well on the
4-dimensional IRIS data (a). But on the 2903-dimensional Yahoo! news document
data (b), only Clusion reveals that clusters 1 and 2 are actually highly related,
cluster 3 is strong and interdisciplinary, cluster 4 is weak, and cluster 5 is strong.
and position on-diagonal from the upper-left to the lower-right corner
(Clusion), respectively. All four tools succeed in visualizing three clusters
and in making apparent that clusters 2 and 3 are closer to each other than any
other pair and that cluster 1 is very compact.

Figure 3.3(b) shows the same comparison for 293 documents from which 2903
word frequencies were extracted to be used as features. This data set consists
of five clusters selected from 40 clusters extracted from a Yahoo! news
document collection that will be described in more detail in Section 3.5.2.
The colors black and magenta, along with distinct marker shapes, have been
added to indicate clusters 4 and 5, respectively. The parallel axis plot
becomes useless clutter due to the high number of dimensions and the large
number of objects. PCA and CViz each succeed in separating three clusters
(2, 3, 5 and 1, 2, 3, respectively) and show all others superimposed on the
axis origin. They give no indication of which clusters are compact or which
clusters are related. Only Clusion suggests that clusters 1 and 2 are
actually highly related, cluster 3 is interdisciplinary, cluster 4 is weak, and
cluster 5 is a strong cluster. Indeed, when looking at the cluster descriptions
(which might not be so easily available and understandable in all domains),
the intuitive interpretations suggested by Clusion are confirmed:
Cluster  Dominant category  Purity (%)  Entropy  Most frequent word stems
1        health (H)         100         0.00     hiv, depress, immun
2        health (H)         100         0.00     weight, infant, babi
3        online (o)         58          0.43     apple, intel, electron
4        film (f)           38          0.72     hbo, ali, alan
5        television (t)     83          0.26     household, sitcom, timeslot
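
The purity and entropy columns above can be illustrated with a small sketch. This excerpt does not define either measure, so the code below rests on assumptions: purity is taken as the fraction of a cluster's documents that belong to its dominant category, and entropy is taken as the Shannon entropy of the cluster's category distribution divided by the logarithm of the number of categories, so that it lies between 0 and 1. The function names and the toy labels are hypothetical and are not drawn from the Yahoo! data.

```python
import math
from collections import Counter


def purity(labels):
    """Fraction of objects in a cluster that carry its most common label.

    Assumed definition; the excerpt does not state the formula.
    """
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def normalized_entropy(labels, num_categories):
    """Shannon entropy of the label distribution, scaled into [0, 1].

    Dividing by log(num_categories) is an assumption made for this sketch;
    num_categories is the total number of categories in the collection.
    """
    if num_categories < 2:
        return 0.0
    n = len(labels)
    entropy = sum(-(c / n) * math.log(c / n) for c in Counter(labels).values())
    return entropy / math.log(num_categories)


# Made-up category labels for two hypothetical clusters.
pure_cluster = ["H"] * 10
mixed_cluster = ["o", "o", "o", "f"]

for name, cluster in [("pure", pure_cluster), ("mixed", mixed_cluster)]:
    print(name, purity(cluster), normalized_entropy(cluster, num_categories=4))
```

Under these assumed definitions, a cluster drawn from a single category has purity 1 and entropy 0, which is the pattern clusters 1 and 2 show in the table, while a cluster spread over several categories has lower purity and higher entropy, as cluster 4 does.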
 