One might argue that a simple line plot of EDM, or of CM, over time would also
highlight this feature, but the advantage of the biplot is that it gives a global overview
of the interlinked financial instruments. We return to this example later in the chapter.
3.2 Understanding PCA and constructing its biplot
According to Jolliffe (2002), PCA is essentially directed 'to reduce the dimensionality of
a data set consisting of a large number of interrelated variables, while retaining as much
as possible of the variation present in the data set. This is achieved by transforming to a
new set of variables, the principal components (PCs), which are uncorrelated, and which
are ordered so that the first few retain most of the variation present in all of the original
variables.' While not disagreeing with Jolliffe, we take a rather different approach that
emphasizes different aspects of the PCA transformation. This difference of emphasis will
become clear in the following.
The fundamental problem of PCA is to approximate X by X[r] in r dimensions or,
equivalently, of rank r. In PCA the columns of X refer to different variables, and before
we can think about approximation we have to handle possible incommensurabilities
between variables. Thus, if we had variables measuring height in metres and weight in
kilograms, we might have difficulties and certainly would not be content with any method
of analysis that was sensitive to changes of scale, such as replacing metres and kilograms
by feet and pounds. Some multivariate methods handle this problem routinely, but PCA
is not one of them. The problem was avoided in our introductory example because
all variables were on the same VAR scale. Other common methods of scaling are to
normalize by dividing each variable by its standard deviation or, for positive variables, by
taking logarithms.
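As a concrete sketch of these scaling options, the following NumPy fragment (the data
matrix X here is hypothetical) normalizes each column to unit standard deviation and,
for a positive version of the data, takes logarithms:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * [1.0, 10.0, 100.0]  # columns on very different scales

# Normalize each variable to unit standard deviation.
X_unit = X / X.std(axis=0, ddof=1)

# For strictly positive variables, taking logarithms is an alternative.
X_pos = np.abs(X) + 1.0   # hypothetical positive data, for illustration only
X_log = np.log(X_pos)
```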
Assuming that any necessary pre-scaling of the data has been attended to, PCA
uses a least-squares criterion as the basis of approximation. To be precise, the sum of
squares of the differences between corresponding members of X and X[r] is minimized.
Algebraically this may be written:
\[
\text{minimize } \operatorname{tr}\{(\mathbf{X} - \mathbf{X}_{[r]})'(\mathbf{X} - \mathbf{X}_{[r]})\}
\quad \text{or, equivalently, minimize } \lVert \mathbf{X} - \mathbf{X}_{[r]} \rVert^{2}.
\tag{3.1}
\]
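The two forms in (3.1) are the same number, since the trace of (X − X[r])′(X − X[r])
sums the squared entries of the residual matrix. A minimal NumPy check, with purely
hypothetical matrices standing in for X and X[r]:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
X_r = rng.normal(size=(6, 4))   # stands in for a rank-r approximation

residual = X - X_r
trace_form = np.trace(residual.T @ residual)         # tr{(X - X[r])'(X - X[r])}
norm_form = np.linalg.norm(residual, "fro") ** 2     # ||X - X[r]||^2

assert np.isclose(trace_form, norm_form)
```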
Geometrically, we consider the rows of X as giving the coordinates of n points in p
dimensions and are seeking the r -dimensional plane containing the points whose coor-
dinates are given by the rows of X[r] that minimizes criterion (3.1). For a minimum, it
is intuitive (and may be formally proved) that the best fit is obtained when X[r] is an
orthogonal projection of X . Furthermore, we know that the plane must pass through the
centroid of the points given by X . This is a simple consequence of Huygens' principle
that the sum of squares about the mean is smaller than the sum of squares about any
other point. Replacing $\mathbf{X}$ by $(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}')\mathbf{X}$ ensures that the centroid is at the origin. In the
following, we assume that X has been centred in this way. Thus, it only remains to find
the direction of the best-fitting plane. The solution to this least-squares problem is given
by the Eckart-Young theorem (Eckart and Young, 1936).
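A minimal NumPy sketch of both steps, on hypothetical data: centring with
(I − (1/n)11′), then taking the rank-r truncated SVD that the Eckart-Young theorem
identifies as the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 100, 5, 2
X = rng.normal(size=(n, p)) + rng.normal(size=p)   # hypothetical data, nonzero means

# Centring: (I - (1/n) 1 1') X, identical to subtracting the column means.
C = np.eye(n) - np.ones((n, n)) / n
Xc = C @ X
assert np.allclose(Xc, X - X.mean(axis=0))

# Eckart-Young: the best rank-r least-squares approximation keeps the r
# largest singular values and zeroes the rest.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# The minimized loss (3.1) equals the sum of the discarded squared singular values.
loss = np.linalg.norm(Xc - X_r, "fro") ** 2
assert np.isclose(loss, np.sum(s[r:] ** 2))
```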
As shown in (2.3), $\mathbf{X}: n \times p = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'$ and the r-dimensional Eckart-Young approximation
$\mathbf{X}_{[r]} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}' = \mathbf{U}\mathbf{J}\boldsymbol{\Sigma}\mathbf{V}' = \mathbf{U}\mathbf{J}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}' = \mathbf{X}\mathbf{V}\mathbf{J}\mathbf{V}'$ minimizes the squared error loss
(2.4). Then the coordinates of the r-dimensional approximation of the centred $\mathbf{X}$ are given
by the first r columns of $\mathbf{U}\boldsymbol{\Sigma}\mathbf{J}$, that is $(\mathbf{U}\boldsymbol{\Sigma})_r = \mathbf{X}\mathbf{V}_r$, and the directions of the axes by $\mathbf{V}\mathbf{J}$.
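The identity (UΣ)_r = XV_r is easy to verify numerically; a short sketch on hypothetical
centred data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 100, 5, 2
Xc = rng.normal(size=(n, p))
Xc -= Xc.mean(axis=0)                  # centred, as assumed in the text

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Point coordinates: the first r columns of U*Sigma equal Xc V_r exactly,
# since V'V = I gives Xc V_r = U Sigma V'V_r = (U Sigma)_r.
scores_svd = U[:, :r] * s[:r]          # (U Sigma)_r
scores_proj = Xc @ Vt[:r, :].T         # X V_r
assert np.allclose(scores_svd, scores_proj)

# The biplot axis directions are read from the rows of Vt[:r, :] (i.e. V J).
```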