Figure 2.13 Visualization of a 2-D data set using a scatter plot (both axes run from 0 to 80). Source: www.cs.sfu.ca/jpei/publications/rareevent-geoinformatica06.pdf.
projection techniques help users find interesting projections of multidimensional data sets. The central challenge the geometric projection techniques try to address is how to visualize a high-dimensional space on a 2-D display.
A scatter plot displays 2-D data points using Cartesian coordinates. A third dimension can be added by using different colors or shapes to represent different data points. Figure 2.13 shows an example, where X and Y are two spatial attributes and the third dimension is represented by different shapes. Through this visualization, we can see that points of types “+” and “×” tend to be colocated.
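The shape encoding described above can be sketched without any plotting library. The following is a minimal illustration, not the method used to produce Figure 2.13: it places each point on a character grid and uses the marker character as the third dimension. The function name `ascii_scatter`, the grid size, and the sample points are all assumptions made for this example.

```python
def ascii_scatter(points, width=40, height=20, x_range=(0, 80), y_range=(0, 80)):
    """Render (x, y, marker) triples on a character grid.

    Assumes every point lies inside x_range and y_range.
    """
    grid = [[" "] * width for _ in range(height)]
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y, marker in points:
        # Scale the data coordinates to grid columns and rows.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first (top of the plot), so flip the y axis.
        grid[height - 1 - row][col] = marker
    return "\n".join("".join(line) for line in grid)


# Hypothetical sample: two marker types placed close together, one apart.
sample = [(10, 10, "+"), (14, 12, "x"), (40, 40, "+"), (44, 42, "x"), (70, 20, "o")]
print(ascii_scatter(sample))
```

Because the marker carries the category, colocated types show up as adjacent but distinct characters, which is the visual pattern the paragraph above describes.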
A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses
color, it can display up to 4-D data points (Figure 2.14).
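As a rough sketch of how a fourth attribute can ride on color, the code below keeps the first three attributes as the 3-D position and maps the fourth to an RGB value by linear interpolation. The names `value_to_rgb` and `encode_4d`, the blue-to-red color ramp, and the sample records are assumptions for illustration; Figure 2.14 may use a different color scheme.

```python
def value_to_rgb(value, lo, hi, cold=(0, 0, 255), hot=(255, 0, 0)):
    """Linearly interpolate between two RGB colors for a value in [lo, hi]."""
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))


def encode_4d(records):
    """Turn (a, b, c, d) records into ((x, y, z), rgb) pairs."""
    fourth = [r[3] for r in records]
    lo, hi = min(fourth), max(fourth)
    return [((r[0], r[1], r[2]), value_to_rgb(r[3], lo, hi)) for r in records]


# Hypothetical 4-D records; the last attribute becomes the color.
records = [(1.0, 2.0, 3.0, 0.0), (4.0, 5.0, 6.0, 5.0), (7.0, 8.0, 9.0, 10.0)]
for position, color in encode_4d(records):
    print(position, color)
```

The resulting position and color pairs could then be handed to any 3-D plotting routine.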
For data sets with more than four dimensions, scatter plots are usually ineffective. The scatter-plot matrix technique is a useful extension to the scatter plot. For an n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that provides a visualization of each dimension with every other dimension. Figure 2.15 shows an example, which visualizes the Iris data set. The data set consists of 50 samples from each of three species of Iris flowers (150 samples in total). There are five dimensions in the data set: sepal length, sepal width, petal length, petal width, and species.
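The n × n layout can be made concrete with a short sketch that builds the grid of panels, where each panel holds the 2-D points for one pair of attributes. The function name `scatter_matrix_panels`, the convention of putting the column attribute on the horizontal axis, and the toy values are assumptions for this example; they are not taken from Figure 2.15 or from the Iris data.

```python
def scatter_matrix_panels(columns):
    """Return an n x n grid of panels for a scatter-plot matrix.

    columns maps attribute name -> list of values (all the same length).
    Cell (i, j) plots attribute j on the horizontal axis against
    attribute i on the vertical axis.
    """
    names = list(columns)
    grid = []
    for row_name in names:
        row = []
        for col_name in names:
            points = list(zip(columns[col_name], columns[row_name]))
            row.append((col_name, row_name, points))
        grid.append(row)
    return grid


# Toy three-attribute data set (made-up values, for illustration only).
toy = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
grid = scatter_matrix_panels(toy)
print(len(grid), "x", len(grid[0]), "panels")
```

Since the number of panels grows as n squared, the grid gets crowded quickly as attributes are added.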
The scatter-plot matrix becomes less effective as the dimensionality increases. Another popular technique, called parallel coordinates, can handle higher dimensionality. To visualize n-dimensional data points, the parallel coordinates technique draws n equally spaced axes, one for each dimension, parallel to one of the display axes.
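A minimal sketch of the axis layout: the first function computes the positions of n equally spaced axes across the display, and the second places one record on those axes. The second step assumes the usual parallel-coordinates convention that a record is drawn as a polyline crossing each axis at its (normalized) value. The names `axis_positions` and `polyline`, the unit display width, and the min-max normalization are choices made for this example.

```python
def axis_positions(n, width=1.0):
    """x-coordinates of n equally spaced vertical axes spanning [0, width]."""
    if n == 1:
        return [0.0]
    return [i * width / (n - 1) for i in range(n)]


def polyline(record, mins, maxs, width=1.0):
    """Vertices of the polyline for one record, one vertex per axis.

    Each value is min-max normalized to [0, 1] along its own axis.
    """
    xs = axis_positions(len(record), width)
    ys = []
    for value, lo, hi in zip(record, mins, maxs):
        # A constant attribute has no spread; place it mid-axis.
        ys.append((value - lo) / (hi - lo) if hi > lo else 0.5)
    return list(zip(xs, ys))


# Hypothetical 3-D record with each attribute ranging over [0, 10].
print(polyline((5, 0, 10), mins=(0, 0, 0), maxs=(10, 10, 10)))
```

Adding a dimension only adds one more axis, so the layout grows linearly with n rather than quadratically as in the scatter-plot matrix.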