Generalized biplots - Understanding Biplots


similarly to the prediction biplot trajectories of the nonlinear biplot described in Chapter 5.

Since this example uses Pythagorean distance, all three prediction methods produce the

same linear biplot trajectory shown in Figure 9.8.

9.8 An example

We now return to the remuneration data introduced in Section 4.9.1 and also used in

Section 8.10 to illustrate a categorical PCA biplot where the continuous variables were

categorized. In a generalized biplot, the distinction between quantitative and qualita-

tive variables is retained. The same variables described in Section 8.10 are used in this

illustration but now treating Remun, Resrch, Age and AQual quantitatively. We use

Pythagorean distance for the continuous variables and the ECM for the categorical vari-

ables. As usual (see, for example, Section 9.2) we have to use some form of scaling for

the quantitative variables. Furthermore, due to the difference in the number of categories

of the qualitative variables, they too have to be normalized. The usual way to normalize

the quantitative variables is to centre and then scale each to unit sum of squares. Thus

we normalized each of Remun, Resrch, Age and AQual to unit sum of squares. Since

$\mathbf{1}'\mathbf{D}_q\mathbf{1}$ for each quantitative variable is equal to $-n$ times the corrected sum of squares

for that variable, this normalization process is equivalent to dividing the ddistances $\mathbf{D}_q$

by $-(\mathbf{1}'\mathbf{D}_q\mathbf{1})/n$. Equivalently, with the ECM, each qualitative variable was scaled so that

$\mathbf{1}'\mathbf{D}_k\mathbf{1} = -n$, where $\mathbf{D}_k = -\tfrac{1}{2}(\mathbf{1}_n\mathbf{1}_n' - \mathbf{G}_k\mathbf{G}_k')$. This type of normalization balances the

contributions of quantitative and qualitative variables to overall distance, but it is not the

only possibility.
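
The normalization described above can be checked numerically. The following is a minimal sketch in plain Python, not the book's Genbipl function: the helper names (ddistance_quantitative, ddistance_ecm, total, normalise) and the five data values are hypothetical and chosen only for illustration. It assumes ddistances of the form $-\tfrac{1}{2}d_{ij}^2$, with squared Pythagorean distance for a quantitative variable and the ECM (0 for matching categories, 1 otherwise) for a qualitative one.

```python
def ddistance_quantitative(x):
    """Matrix of ddistances -1/2 (x_i - x_j)^2 for one quantitative variable."""
    return [[-0.5 * (a - b) ** 2 for b in x] for a in x]

def ddistance_ecm(levels):
    """Matrix of ddistances for one qualitative variable under the ECM:
    0 when two samples share a category level, -1/2 otherwise."""
    return [[0.0 if a == b else -0.5 for b in levels] for a in levels]

def total(D):
    """The scalar 1'D1, i.e. the sum of all entries of D."""
    return sum(sum(row) for row in D)

def normalise(D):
    """Divide D by -(1'D1)/n so that the result satisfies 1'D1 = -n.
    Assumes the variable is not constant (otherwise 1'D1 = 0)."""
    n = len(D)
    scale = -total(D) / n
    return [[v / scale for v in row] for row in D]

# Made-up illustrative values, not the remuneration data.
age = [31.0, 45.0, 52.0, 38.0, 60.0]
gender = ["F", "M", "M", "F", "M"]

n = len(age)
mean = sum(age) / n
css = sum((a - mean) ** 2 for a in age)  # corrected sum of squares

D_q = ddistance_quantitative(age)
D_k = ddistance_ecm(gender)

# 1'D_q 1 equals -n times the corrected sum of squares.
assert abs(total(D_q) + n * css) < 1e-8

# After normalization each variable contributes 1'D1 = -n,
# so the two kinds of variable are balanced in the summed matrix.
D_q_norm = normalise(D_q)
D_k_norm = normalise(D_k)
D_total = [[D_q_norm[i][j] + D_k_norm[i][j] for j in range(n)]
           for i in range(n)]
```

Dividing the quantitative matrix by $-(\mathbf{1}'\mathbf{D}_q\mathbf{1})/n$ is the same as dividing by the corrected sum of squares, which is why it agrees with scaling the centred variable to unit sum of squares.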

It is clear from Figure 9.9 that prediction regions for all categories of a qualitative

variable are not necessarily represented in the biplot space: for academic position,

prediction regions for lecturer (R2), senior lecturer (R3) and full professor (R5) are visible

but not those for junior lecturer and associate professor, while prediction regions for only

four of the nine faculties appear in the biplot space. The individual sample points are

printed as solid squares. We have coloured (in the top left biplot) the squares according

to gender: red squares denoting the females and green squares the males. However, the

output of our function Genbipl provides all the necessary information for easily obtain-

ing biplots with a different colouring scheme - for example, the different faculties or

different academic positions.

That the sample points, obtained by projection, do not necessarily fall within their

corresponding prediction regions, obtained by back-projection, is clear from Figure 9.9.

We could show this by colouring the category levels in Figure 9.9, but this would interfere

with the coloured prediction regions. Therefore we show the information numerically in

Table 9.2, where the entries in bold give the numbers of correct predictions while the plain

entries show the numbers of incorrect predictions. The proportion of correct predictions

is closely analogous to the predictivity measure (Section 3.3) for quantitative variables.

However, with categorical variables we have additional information giving the separate

contributions to predictivity of each category level. Thus in Table 9.2(a) R1 and R4

are never predicted, R2 and R5 are well predicted, while R3 is poorly predicted. In

Table 9.2(b) both genders are well predicted. Table 9.2(c) never predicts F3, F4, F6, F7

and F8, while F1, F2, F5 and F9 are all rather poorly predicted.
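
A table of correct and incorrect predictions of this kind can be tabulated as in the sketch below. It assumes that each sample already has a predicted category level (the level whose prediction region contains its projected point); how that level is obtained is not shown. The function names and the label vectors are hypothetical and do not reproduce the counts of Table 9.2.

```python
from collections import Counter

def prediction_table(actual, predicted):
    """Cross-tabulate actual category levels against predicted ones."""
    return Counter(zip(actual, predicted))

def proportion_correct(actual, predicted):
    """Overall and per-level proportions of correctly predicted samples."""
    table = prediction_table(actual, predicted)
    totals = Counter(actual)  # number of samples at each actual level
    per_level = {lev: table[(lev, lev)] / totals[lev] for lev in totals}
    overall = sum(table[(lev, lev)] for lev in totals) / len(actual)
    return overall, per_level

# Hypothetical labels for illustration only.
actual    = ["R1", "R2", "R2", "R3", "R3", "R5", "R5", "R4"]
predicted = ["R2", "R2", "R2", "R2", "R3", "R5", "R5", "R5"]

overall, per_level = proportion_correct(actual, predicted)
```

The diagonal counts of the table correspond to the bold entries, and the per-level proportions give the separate contribution of each category level; a level that is never predicted has proportion zero.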
