Principal component analysis biplots - Understanding Biplots

3.8 Some novel applications and enhancements of PCA biplots

3.8.1 Risk management example revisited

In the biplot in Figure 3.1 the point (CM = 0, IRD = 0, MM = 0, ALCO = 0, SE = 0,
EDSA = 0, EDM = 0) is interpolated by adding the argument
X.new.samples = matrix(rep(0,7), nrow = 1) in the call to PCAbipl. This results in the
top panel of Figure 3.29. If the interpolated point is wanted for the scaled data then the
argument scaled.mat = TRUE is also needed. The reader can verify that adding this argument
alone gives a biplot in which the interpolated point does not appear, because the point
lies outside the original plotting area. To make the point visible, increase the default
setting of exp.factor. The bottom panel of Figure 3.29 was obtained with the settings
scaled.mat = TRUE, exp.factor = 2. Note the annotation using the functions draw.arrow and
draw.text. The interpolated point can be viewed as an ideal point where there is zero loss
for all seven instruments. Clearly the 'best' situation using the unscaled data was
achieved on day20.
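The interpolation described above can be sketched in base R without PCAbipl. This is only an illustration under stated assumptions: the data are simulated stand-ins for the 20 x 7 risk data, interpolate_point is a hypothetical helper (not a function from the book), and scaling is taken to mean division by the column standard deviations. The key idea is that a new sample receives the same centring and scaling as the original data before it is projected onto the first two eigenvectors.

```r
# Simulated stand-in for the 20 x 7 risk data (NOT the data used in the book).
set.seed(1)
vars <- c("CM", "IRD", "MM", "ALCO", "SE", "EDSA", "EDM")
X <- matrix(rnorm(20 * 7, mean = 5, sd = 2), nrow = 20,
            dimnames = list(paste0("day", 1:20), vars))

# Hypothetical helper: interpolate new samples into the two-dimensional PCA
# space of X. With scaled = TRUE the columns are also divided by their
# standard deviations (an assumed meaning of "scaled data").
interpolate_point <- function(X, x.new, scaled = FALSE) {
  means <- colMeans(X)
  sds <- if (scaled) apply(X, 2, sd) else rep(1, ncol(X))
  X.std <- sweep(sweep(X, 2, means, "-"), 2, sds, "/")
  V <- svd(X.std)$v[, 1:2]              # first two eigenvectors
  # New samples get the same centring and scaling as the original data.
  x.std <- sweep(sweep(x.new, 2, means, "-"), 2, sds, "/")
  list(Z = X.std %*% V, z.new = x.std %*% V)
}

# The zero-loss point, as in X.new.samples = matrix(rep(0,7), nrow = 1).
x.new <- matrix(rep(0, 7), nrow = 1)
res <- interpolate_point(X, x.new, scaled = TRUE)

# Check whether the interpolated point lies inside the range of the samples;
# if it does not, the plotting area has to be enlarged to show it.
inside <- all(res$z.new >= apply(res$Z, 2, min) &
              res$z.new <= apply(res$Z, 2, max))
```

When inside is FALSE the interpolated point falls outside the region spanned by the sample points, which is the situation that increasing exp.factor addresses in PCAbipl.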

One problem, especially with the biplot in the bottom panel of Figure 3.29, is that
several data points are bundled together so closely that it is impossible to identify a
particular day. With even moderately large data sets this problem is bound to get worse.
We therefore provide an R function, PCAbipl.zoom, to zoom interactively into any required
part of a biplot. When this function is called with the argument zoomval = x, the window
with the drawn PCA biplot is activated, the mouse pointer changes to a cross, and the user
can move the cross to select the bottom left-hand corner of the region to zoom into. The
value x controls the amount of zooming such that the aspect ratio is kept constant at
unity. Figure 3.30 gives an example of the zooming function.

The Figure 3.29 biplot can be further enhanced by adding a trend line showing the
seven-dimensional movement over time. A simple solution would be to connect the sample
points in the PCA biplot in temporal order. Since PCAbipl returns the coordinates of the
sample points in the biplot, this is easy to accomplish. The connecting lines are shown in
Figure 3.31 for both the unscaled and the scaled data, but it is obvious that the trend is
difficult to follow, with too many interconnecting lines. Had the data set consisted of
100 days' VAR values, such a connecting line would render the biplot useless. It is easy
to correct this: we need some form of smoothing. In Figure 3.32 a nonparametric regression
smoother was fitted to each of the two dimensions separately. The R commands

> z1 <- X.cent %*% Eigenvectors[,1]    # Eigenvectors returned by PCAbipl
> z2 <- X.cent %*% Eigenvectors[,2]
> zfit1 <- fitted(loess(z1 ~ I(1:20)))
> zfit2 <- fitted(loess(z2 ~ I(1:20)))

yield the values to be connected to form this trend line. In this example, the default
value of the span argument of the loess function was used. The amount of smoothing can be
controlled through this argument, and any other smoothing technique can be applied in the
same way.
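The effect of the span argument can be explored with a short, self-contained sketch. The coordinates below are simulated random walks standing in for z1 and z2 (they are not the book's data), and smooth_trend is a hypothetical helper written for this illustration. It fits a separate loess smoother to each biplot dimension against the day index, as in the commands above, and returns the two columns of fitted values that are connected to form the trend line.

```r
# Simulated biplot coordinates for 20 days (NOT the data used in the book).
set.seed(2)
n <- 20
z1 <- cumsum(rnorm(n))
z2 <- cumsum(rnorm(n))

# Hypothetical helper: smooth each dimension separately against time.
# span = 0.75 is the documented default of stats::loess.
smooth_trend <- function(z1, z2, span = 0.75) {
  idx <- seq_along(z1)
  cbind(fitted(loess(z1 ~ idx, span = span)),
        fitted(loess(z2 ~ idx, span = span)))
}

trend.default <- smooth_trend(z1, z2)            # default amount of smoothing
trend.heavy   <- smooth_trend(z1, z2, span = 1)  # larger span, smoother trend

# Draw the raw temporal path and both trend lines when run interactively.
if (interactive()) {
  plot(z1, z2, asp = 1, pch = 16, xlab = "Dimension 1", ylab = "Dimension 2")
  lines(z1, z2, col = "grey")                      # points joined in time order
  lines(trend.default, col = "blue", lwd = 2)
  lines(trend.heavy, col = "red", lwd = 2, lty = 2)
}
```

A larger span uses more neighbouring days in each local fit and so gives a smoother trend line; a smaller span follows the individual days more closely.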
