adaptation capability with spectacular results for two-dimensional data, but
its effectiveness and interpretability for higher-dimensional data have not
been reported. In [3.38], a hypergraph clustering approach was taken for
clustering highly related items defined in high-dimensional space and for
generating the corresponding association rules. This method was applied to
binarized data,
with each frequent item set being represented by a weighted hyperedge. Like
our method, it is suitable for high-dimensional data and is linear in d.
Subsequently, this and another graph partitioning algorithm, called principal
direction divisive partitioning, were applied to Web document categorization
[3.39]. These two algorithms are the closest in spirit to our approach.
Finally, spectral partitioning methods [3.17], [3.40] can be applied to similarity
graphs. A probabilistic foundation for spectral methods for clustering and
segmentation has been recently proposed [3.41].
Related work on scalability issues of clustering is discussed in Section
3.6.2. Visualization of high-dimensional data clusters can be largely divided
into three popular approaches:
1. Dimensionality reduction by selection of two or three dimensions, or,
more generally, projecting the data down to two or three dimensions.
Often these dimensions correspond to principal components or a scal-
able approximation thereof. Another noteworthy method is CViz [3.42],
which projects onto the plane that passes through three selected cluster
centroids to yield a “discrimination optimal” two-dimensional projection.
These projections are useful for a medium number of dimensions, i.e., if
d is not too large (d < 100).8 Nonlinear projections have also been studied
[3.43]. Re-creating a two- or three-dimensional space from a similarity
graph can also be done through multidimensional scaling [3.44].
2. Parallel-axis plots show each object as a line along d parallel axes. How-
ever, this technique is rendered ineffective if the number of dimensions d
or the number of objects gets too high.
3. Kohonen's self-organizing map (SOM) [3.45] provides an innovative and
powerful way of clustering while enforcing constraints on a logical topol-
ogy imposed on the cluster centers. If this topology is two-dimensional,
one can readily “visualize” the clustering of data. Essentially a two-
dimensional manifold is mapped onto the (typically higher-dimensional)
feature space, trying to approximate data density while maintaining topo-
logical constraints. Because the mapping is not bijective, the quality can
degrade very rapidly with increasing dimensionality of the feature space,
unless the data are largely confined to a much lower-order manifold within
this space [3.43]. Multidimensional scaling (MDS) and associated meth-
ods also face similar issues.
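The principal-component projection behind approach 1 can be sketched in a few lines of NumPy. This is a generic sketch, not the CViz method of [3.42]; the data, dimensionality, and function name are illustrative:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)              # center the data
    # rows of Vt are the principal directions (SVD of the centered data)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                 # (n, k) coordinates for plotting

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # 200 points in d = 50 dimensions
Y = pca_project(X, k=2)                  # two-dimensional view of the data
```

The two columns of Y are the coordinates one would scatter-plot; by construction the first column captures at least as much variance as the second.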
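The coordinate preparation behind a parallel-axis plot (approach 2) amounts to rescaling each of the d dimensions to a common range, so that every object becomes one polyline across the d axes. A minimal sketch, with illustrative array sizes:

```python
import numpy as np

def parallel_axis_coords(X):
    """Min-max scale each dimension to [0, 1] so that all d parallel
    axes share one vertical range; each row becomes one polyline."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return (X - lo) / span

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))                  # 5 objects, d = 4 axes
P = parallel_axis_coords(X)
# row i of P gives the heights of object i's polyline at the 4 axes
```

Drawing one line per row is then straightforward; the clutter the text mentions arises because the number of polylines equals the number of objects and the number of axes equals d.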
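Approach 3 can be illustrated with a minimal online SOM in NumPy. This is a generic sketch of Kohonen's algorithm, not the specific implementation of [3.45]; the grid size, learning-rate schedule, and Gaussian neighborhood are illustrative assumptions:

```python
import numpy as np

def train_som(X, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online SOM: a lattice of code vectors is pulled toward
    the data while lattice neighbors move together, enforcing the 2-D
    logical topology on the cluster centers."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    gx, gy = grid
    W = rng.normal(size=(gx * gy, d))    # code vectors in feature space
    # lattice coordinates of each unit, used for the neighborhood term
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    T, t = epochs * n, 0
    for _ in range(epochs):
        for x in X[rng.permutation(n)]:
            lr = lr0 * (1 - t / T)                        # decaying rate
            sigma = sigma0 * (1 - t / T) + 0.5            # shrinking radius
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))         # lattice neighborhood
            W += lr * h[:, None] * (x - W)
            t += 1
    return W, coords

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (100, 2)),
               rng.normal(5, 0.2, (100, 2))])             # two tight clusters
W, coords = train_som(X)
# quantization error: mean distance from each point to its best unit
err = np.mean([np.min(np.linalg.norm(W - x, axis=1)) for x in X])
```

To "visualize" a clustering as described above, each data point is placed at the lattice coordinates of its best-matching unit, giving a two-dimensional map of the high-dimensional data.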
8 For text mining, linearly projecting down to about 20 to 50 dimensions does not
affect results much (e.g., latent semantic indexing). However, this is still too
high to visualize. A projection to lower dimensions leads to substantial
degradation, and three-dimensional projections are of very limited utility.