The measures (3.26) and (3.27) will coincide only when the regression fit is exact and
the final residual in (3.25) vanishes. Normally (3.26), unlike (3.27), will not attain unity
even when all p dimensions of the PCA fit are used. In the expressions for (3.26) and
(3.27), X may be replaced by its SVD $X = U\Sigma V'$, in which case the least-squares coefficients of (3.25) become $b = \Sigma^{-1}U'x$ and
$$
b'JV'X'XVJb = b'JV'V\Sigma^{2}V'VJb = x'U\Sigma^{-1}J\Sigma^{2}J\Sigma^{-1}U'x = x'UJU'x,
$$
giving (3.26) in the form
$$
\frac{x'UJU'x}{x'x}. \qquad (3.28)
$$
Furthermore,
$$
b'V'X'XVb = b'V'V\Sigma^{2}V'Vb = x'U\Sigma^{-1}\Sigma^{2}\Sigma^{-1}U'x = x'UU'x,
$$
so that (3.27) takes the form
$$
\frac{x'UJU'x}{x'UU'x}. \qquad (3.29)
$$
The above expressions generalize in a straightforward way to the addition of several new variables.
It is clear from formulae (2.20), (3.28) and (3.29) that in order to add new axes with
their associated predictivities all we need, in addition to the original SVD, are the values
of all the samples on each of the new variables. There is no need to perform the actual
regression.
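To make this concrete, the following is a minimal R sketch, not taken from the book's accompanying code: it computes the measures (3.28) and (3.29) for a new variable x using only the SVD of the centred and scaled data matrix, and then checks (3.28) against an explicit least-squares regression of x on the first r principal component scores. The function name new.var.predictivity and the simulated data are illustrative assumptions; only base R functions (svd, lm) are used.

# Hedged sketch: axis predictivity of a new variable from the SVD only.
new.var.predictivity <- function(X, x, r) {
  X  <- scale(X)                    # centred and scaled data matrix
  xc <- x - mean(x)                 # centred new variable
  U  <- svd(X)$u
  Ux <- as.vector(t(U) %*% xc)      # U'x
  num <- sum(Ux[1:r]^2)             # x'UJU'x, with J retaining r dimensions
  c(eq.3.28 = num / sum(xc^2),      # denominator x'x, as in (3.28)
    eq.3.29 = num / sum(Ux^2))      # denominator x'UU'x, as in (3.29)
}

# Check against an explicit regression on the first r PC scores:
set.seed(1)
X  <- matrix(rnorm(100 * 5), 100, 5)
x  <- X[, 1] + X[, 2] + rnorm(100)
r  <- 2
Z  <- scale(X) %*% svd(scale(X))$v[, 1:r]  # sample scores on first r PCs
xc <- x - mean(x)
fit <- lm(xc ~ Z - 1)                      # the regression done explicitly
sum(fitted(fit)^2) / sum(xc^2)             # agrees with eq.3.28
new.var.predictivity(X, x, r)              # no regression required

The check simply confirms the algebra above: the fitted sum of squares from the explicit regression coincides with x'UJU'x obtained from U and x alone.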
As an example of adding new variables we again consider the Ocotea data. Knowledge of the ratios VesL to VesD and RayH to RayW is of practical importance. These two ratios (VLDratio and RHWratio) have been added in the form of calibrated biplot axes to the PCA biplot of Figure 3.23. The augmented biplot is given in Figure 3.24.
The function call for obtaining the biplot in the bottom panel of Figure 3.24 is
> VLDratio <- Ocotea.data[,4]/Ocotea.data[,3]
> RHWratio <- Ocotea.data[,6]/Ocotea.data[,7]
> Ocotea.data.newvars <- data.frame(Ocotea.data, VLDratio =
VLDratio, RHWratio = RHWratio)
> PCAbipl(Ocotea.data[,3:8], scaled.mat = TRUE,
X.new.vars = as.matrix(Ocotea.data.newvars[,9:10]),
colours = "green", pch.samples = 15, pch.samples.size = 1.25,
label = FALSE, pos = "Hor", offset = c(-0.2, 0.1, 0.1, 0.2),
n.int = c(5,5,5,5,3,5,10,5),
ax.col = list(ax.col = c(rep("grey",6),"red","red"),
tickmarker.col = c(rep("grey",6),"red","red"),
marker.col = c(rep("grey",6),"red","red")),
ax.name.col = c(rep("black",6), "red","red"),
pch.new = 16, pch.new.cols = c("red","blue","cyan"),
pch.new.labels = c("O.bul","O.ken","O.por"),
predictions.sample = c(10,35))
It follows from Table 3.16 that neither of the two added variables has high axis predictivity in two dimensions. Although this is particularly true of VLDratio, its axis predictivity increases dramatically when a third dimension is added.
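As a hedged illustration (again not the book's code) of how such an increase can be examined directly from the SVD, the predictivity of VLDratio can be evaluated for one, two and three fitted dimensions. This assumes Ocotea.data and VLDratio are defined as in the call above, that the scaling matches scaled.mat = TRUE, and that measure (3.28) is the one tabulated; if (3.29) is intended, the denominator x'UU'x would be used instead.

X  <- scale(Ocotea.data[, 3:8])    # matches scaled.mat = TRUE above
U  <- svd(X)$u
xc <- VLDratio - mean(VLDratio)    # centred new variable
sapply(1:3, function(r)            # measure (3.28) for r = 1, 2, 3
  sum((t(U[, 1:r, drop = FALSE]) %*% xc)^2) / sum(xc^2))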