9 Generalized biplots
9.1 Introduction
Nonlinear biplots (Chapter 5) generalize the PCA biplot by providing for distance
measures for quantitative variables other than Pythagorean distance; generalized biplots
(Gower, 1992) offer a further generalization that allows categorical variables to be
included. The map of the samples is obtained by calculating distances between
samples in a way that accommodates categorical measurements and then applying
principal coordinate analysis (PCO) to the resulting distance matrix. Biplot axes for
any quantitative variables are handled in a similar way to that discussed for
nonlinear biplots.
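
As a rough illustration of this two-step idea (a distance matrix that accommodates categorical variables, followed by PCO), the following minimal sketch may help. It rests on assumptions that are not taken from the text: the squared distance between two samples is taken to be the squared Pythagorean distance on standardized quantitative variables plus the number of categorical variables on which the samples disagree, and the function names `mixed_sq_distances` and `pco` as well as the four-sample data set are hypothetical. Other distance definitions could be substituted without changing the PCO step.

```python
import numpy as np

def mixed_sq_distances(X_num, X_cat):
    """Squared distances for mixed data (an assumed, illustrative definition):
    squared Pythagorean distance on the numeric columns plus the number of
    categorical columns on which two samples differ."""
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    num_part = ((X_num[:, None, :] - X_num[None, :, :]) ** 2).sum(axis=2)
    cat_part = (X_cat[:, None, :] != X_cat[None, :, :]).sum(axis=2)
    return num_part + cat_part

def pco(D2, n_dims=2):
    """Principal coordinate analysis (classical scaling) of a matrix of
    squared distances; returns sample coordinates in n_dims dimensions."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ D2 @ J                        # doubly centred matrix
    evals, evecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_dims]     # keep the largest n_dims
    evals = np.clip(evals[order], 0.0, None)     # guard against tiny negatives
    return evecs[:, order] * np.sqrt(evals)

# Hypothetical data: one quantitative and one categorical variable.
heights = np.array([[170.0], [182.0], [165.0], [176.0]])
heights_std = (heights - heights.mean(axis=0)) / heights.std(axis=0)
eyes = np.array([["blue"], ["brown"], ["green"], ["blue"]])

D2 = mixed_sq_distances(heights_std, eyes)
Y = pco(D2, n_dims=2)    # two-dimensional map of the samples
Y_full = pco(D2, n_dims=3)  # with four samples, three dimensions suffice
```

Under these assumptions the distances are Euclidean embeddable, so the full-dimensional coordinates `Y_full` reproduce the entries of `D2` exactly, while `Y` gives the two-dimensional approximation that would be plotted.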
Because a categorical variable can only assume a finite number of distinct values,
called category levels, the concept of a continuous trajectory no longer applies. Instead,
each categorical variable is represented by a set of category-level points (CLPs), one
point for each level, in such a way that samples nearest to a certain CLP will be associated
with that level of the categorical variable (see Chapter 8). The nearest-neighbour region
for a CLP is the convex subspace that contains all points that are nearer to this CLP
than to any of the other CLPs for this variable. The combination of nonlinear
trajectories for continuous variables with CLPs for categorical variables defines a
generalized coordinate system, termed a reference system. In low-dimensional
approximations, continuous variables are represented by nonlinear biplot axes, while the
contributions from the nearest-neighbour regions define convex regions termed
prediction regions.
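
The nearest-neighbour rule described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the construction used in the book: the CLPs are simply given as points in the same space as the samples, and the coordinates, the level labels and the function name `nearest_clp` are all hypothetical.

```python
import numpy as np

def nearest_clp(points, clp_coords, levels):
    """Associate each point with the level whose CLP is nearest
    (Pythagorean distance), i.e. with the nearest-neighbour region it falls in."""
    points = np.asarray(points, dtype=float)
    clp_coords = np.asarray(clp_coords, dtype=float)
    # Squared distance from every point to every CLP.
    d2 = ((points[:, None, :] - clp_coords[None, :, :]) ** 2).sum(axis=2)
    return [levels[i] for i in d2.argmin(axis=1)]

# Hypothetical CLPs for one categorical variable with three levels.
levels = ["blue", "brown", "green"]
clps = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

# Hypothetical sample points in the same two-dimensional space.
samples = np.array([[0.5, 0.2], [3.0, 1.0], [1.0, 3.5]])

predicted = nearest_clp(samples, clps, levels)
print(predicted)  # ['blue', 'brown', 'green']
```

Because each region consists of the points closer to one CLP than to any other, the regions are bounded by perpendicular bisectors between pairs of CLPs, which is why they are convex.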
Table 9.1 shows a simple data set consisting of two variables, the first the height of
each subject (a continuous variable in centimetres) and the second the colour of their
eyes (a categorical variable with three levels: blue, brown and green). One possibility for
plotting these data is to represent height on a horizontal axis and use three parallel bars
for the categories, one below the other, as is shown in Figure 9.1. There is, however,
no natural ordering of the categories and any other order would be equally valid. In the
following, we show how categorical variables can be represented directly in a scatterplot.