3.3.5 Projection of Data from the High-Dimensional Data Space to a Visualizable Space
In order to detect structures in data, in particular clusters, it is helpful to project
structures in the high-dimensional data space (R^n) onto a visualizable space of only
two or three dimensions (R_viz). No projection of a high-dimensional data space
onto a lower-dimensional space can preserve all the spatial relationships of the
original space. Nonetheless, the projection onto R_viz should reflect as closely as
possible the distances and clustering of data points in R^n. At a minimum, the visual
projection must allow relative distances between data points to be estimated. An
overview of projection methods can be found in Hand et al. (2001). The so-called
nonlinear projection methods do not strive to preserve linear relationships
between projected data and the original data space. Instead, a projection is sought
that provides the optimal visualization of the structural characteristics of data.
Such structural characteristics can be spatial relationships, as, for example, the
occurrence of dense clusters of data. A further important requirement is that
nonlinear projection methods reproduce the neighborhood relationships of the
high-dimensional space as similar or identical neighborhood relationships in the
projection space.
In principle it is impossible to preserve all neighborhood relationships between
data points when projecting high-dimensional data to a lower-dimensional space.
Yet there are visualization processes that attempt to preserve such structuring
as precisely as possible by employing various scaling levels. Such methods are
described as topology preserving.
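
How well a given projection preserves neighborhoods can be estimated numerically. The following minimal sketch is an illustration only, not a method taken from the text: it counts, for every data point, how many of its k nearest neighbors in R^n are still among its k nearest neighbors after projection. The function names, the synthetic two-cluster data, and the choice of a linear principal-component projection as a stand-in for the projection method are all assumptions made for this example.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (the point itself excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def neighborhood_preservation(high, low, k=5):
    """Mean fraction of each point's k nearest neighbors that both spaces share."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k

# Synthetic illustration data (assumption): two Gaussian clusters in 10 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, (50, 10)),
                  rng.normal(6.0, 1.0, (50, 10))])

# Stand-in projection (assumption): the first two principal components.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T

score = neighborhood_preservation(data, projected, k=5)
print(f"fraction of shared neighbors: {score:.2f}")
```

A value of 1 would mean that every k-neighborhood survives the projection unchanged; values below 1 quantify the loss that any projection to two dimensions generally incurs.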
Suitable visualizations include the self-organizing feature maps (SOMs) proposed
in 1982 by the Finnish physicist Teuvo Kohonen (1982, 2001). These can be
seen as a mathematical model of the formation of sensorimotor regions in biological
neural networks, such as the human brain. SOMs are used in two different forms:
as the so-called k-means-SOM (KMSOM) and as emergent self-organizing feature
maps (ESOM) (Ultsch 1999). In k-means-SOM, each neuron stands for a cluster.
Emergent SOMs (ESOMs) create a map of R^n on a two-dimensional grid structure
formed by neurons (units). For the resulting map, it can be shown
that no energy function is available to measure the quality of the visualization.
Nonetheless, this form of visualization has the advantage that the topology of
the original data space is largely preserved. An important example of this
characteristic is the U-Matrix, which can be used to reveal nonlinear entanglements
within Kohonen maps. A typical example is the case of two intertwined toroidal data
sets (cf. Figs. 3.10 and 3.11). The Emergent Self-Organizing Map also enhances the
investigation of multidimensional spatial objects. An ESOM with 50 × 82 neurons
is trained with the inspected and preprocessed data.
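
The training of a SOM and the derivation of a U-Matrix can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the figures: it uses a small planar grid with borders instead of the 50 × 82 map, a Gaussian neighborhood with linearly decaying learning rate and radius, and synthetic three-dimensional data. The function names `train_som` and `u_matrix` and all parameter values are hypothetical.

```python
import numpy as np

def train_som(data, rows, cols, epochs=15, lr0=0.5, seed=0):
    """Train a basic online SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise each neuron with a randomly drawn data point.
    picks = rng.integers(0, len(data), rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, -1)
    grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit: the neuron whose weight vector is closest to x.
            dist = np.linalg.norm(weights - x, axis=2)
            br, bc = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood on the grid around the best-matching unit.
            grid_d2 = (grid_r - br) ** 2 + (grid_c - bc) ** 2
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, :, None] * (x - weights)
    return weights

def u_matrix(weights):
    """Height of each neuron: mean distance to its direct grid neighbors."""
    rows, cols, _ = weights.shape
    heights = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            heights[r, c] = np.mean(dists)
    return heights

# Synthetic illustration data (assumption): two separated clusters in 3 dimensions.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (60, 3)),
                  rng.normal(4.0, 0.5, (60, 3))])

weights = train_som(data, rows=6, cols=9)
heights = u_matrix(weights)
print(np.round(heights, 2))
```

Large values in `heights` mark grid positions where neighboring neurons lie far apart in the data space, which is how a U-Matrix displays cluster boundaries as ridges.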
The corresponding U*-Map (Ultsch 2003) (island view, cf. Fig. 3.12) delivers
a geographical landscape of the UD data on a projected map (imaginary axis). A
clear structure can be easily recognized. The structures are expressed by mountains
(displayed on the z-axis), the height of which defines the distance between different