One might argue that a simple line plot of EDM, or of CM, over time would also
highlight this feature, but the advantage of the biplot is that it gives a global overview
of the interlinked financial instruments. We return to this example later in the chapter.
3.2 Understanding PCA and constructing its biplot
According to Jolliffe (2002), PCA is essentially directed 'to reduce the dimensionality of
a data set consisting of a large number of interrelated variables, while retaining as much
as possible of the variation present in the data set. This is achieved by transforming to a
new set of variables, the principal components (PCs), which are uncorrelated, and which
are ordered so that the first few retain most of the variation present in all of the original
variables.' While not disagreeing with Jolliffe, we take a rather different approach that
emphasizes different aspects of the PCA transformation. This difference of emphasis will
become clear in the following.
The fundamental problem of PCA is to approximate X by X[r] in r dimensions or,
equivalently, of rank r. In PCA the columns of X refer to different variables, and before
we can think about approximation we have to handle possible incommensurabilities
between variables. Thus, if we had variables measuring height in metres and weight in
kilograms, we might have difficulties and certainly would not be content with any method
of analysis that was sensitive to changes of scale, such as replacing metres and kilograms
by feet and pounds. Some multivariate methods handle this problem routinely, but PCA
is not one of them. The problem was avoided in our introductory example because
all variables were on the same VAR scale. Other common methods of scaling are to
normalize by dividing each variable by its standard deviation or, for positive variables, by
taking logarithms.
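As a concrete sketch of these scaling options, the following NumPy fragment (the data
matrix X here is hypothetical) normalizes each column to unit standard deviation and,
for a positive version of the data, takes logarithms:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * [1.0, 10.0, 100.0]  # columns on very different scales

# Normalize each variable to unit standard deviation.
X_unit = X / X.std(axis=0, ddof=1)

# For strictly positive variables, taking logarithms is an alternative.
X_pos = np.abs(X) + 1.0   # hypothetical positive data, for illustration only
X_log = np.log(X_pos)
```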
Assuming that any necessary pre-scaling of the data has been attended to, PCA
uses a least-squares criterion as the basis of approximation. To be precise, the sum of
squares of the differences between corresponding members of X and X[r] is minimized.
Algebraically this may be written:
\[
\text{minimize } \operatorname{tr}\{(\mathbf{X} - \mathbf{X}_{[r]})'(\mathbf{X} - \mathbf{X}_{[r]})\}
\quad \text{or, equivalently, minimize } \lVert \mathbf{X} - \mathbf{X}_{[r]} \rVert^{2}.
\tag{3.1}
\]
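The two forms in (3.1) are the same number, since the trace of (X − X[r])′(X − X[r])
sums the squared entries of the residual matrix. A minimal NumPy check, with purely
hypothetical matrices standing in for X and X[r]:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
X_r = rng.normal(size=(6, 4))   # stands in for a rank-r approximation

residual = X - X_r
trace_form = np.trace(residual.T @ residual)         # tr{(X - X[r])'(X - X[r])}
norm_form = np.linalg.norm(residual, "fro") ** 2     # ||X - X[r]||^2

assert np.isclose(trace_form, norm_form)
```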
Geometrically, we consider the rows of X as giving the coordinates of n points in p
dimensions and are seeking the r -dimensional plane containing the points whose coor-
dinates are given by the rows of X[r] that minimizes criterion (3.1). For a minimum, it
is intuitive (and may be formally proved) that the best fit is obtained when X[r] is an
orthogonal projection of X . Furthermore, we know that the plane must pass through the
centroid of the points given by X . This is a simple consequence of Huygens' principle
that the sum of squares about the mean is smaller than the sum of squares about any
other point. Replacing $\mathbf{X}$ by $(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}')\mathbf{X}$ ensures that the centroid is at the origin. In the
following, we assume that X has been centred in this way. Thus, it only remains to find
the direction of the best-fitting plane. The solution to this least-squares problem is given
by the Eckart-Young theorem (Eckart and Young, 1936).
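A minimal NumPy sketch of both steps, on hypothetical data: centring with
(I − (1/n)11′), then taking the rank-r truncated SVD that the Eckart-Young theorem
identifies as the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 100, 5, 2
X = rng.normal(size=(n, p)) + rng.normal(size=p)   # hypothetical data, nonzero means

# Centring: (I - (1/n) 1 1') X, identical to subtracting the column means.
C = np.eye(n) - np.ones((n, n)) / n
Xc = C @ X
assert np.allclose(Xc, X - X.mean(axis=0))

# Eckart-Young: the best rank-r least-squares approximation keeps the r
# largest singular values and zeroes the rest.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# The minimized loss (3.1) equals the sum of the discarded squared singular values.
loss = np.linalg.norm(Xc - X_r, "fro") ** 2
assert np.isclose(loss, np.sum(s[r:] ** 2))
```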
As shown in (2.3), $\mathbf{X}: n \times p = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'$ and the r-dimensional Eckart-Young approximation
$\mathbf{X}_{[r]} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}' = \mathbf{U}\mathbf{J}\boldsymbol{\Sigma}\mathbf{V}' = \mathbf{U}\mathbf{J}\boldsymbol{\Sigma}\mathbf{J}\mathbf{V}' = \mathbf{X}\mathbf{V}\mathbf{J}\mathbf{V}'$ minimizes the squared error loss
(2.4). Then the coordinates of the r-dimensional approximation of the centred $\mathbf{X}$ are given
by the first r columns of $\mathbf{U}\boldsymbol{\Sigma}\mathbf{J}$, that is $(\mathbf{U}\boldsymbol{\Sigma})_r = \mathbf{X}\mathbf{V}_r$, and the directions of the axes by $\mathbf{V}\mathbf{J}$.
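The identity (UΣ)_r = XV_r is easy to verify numerically; a short sketch on hypothetical
centred data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 100, 5, 2
Xc = rng.normal(size=(n, p))
Xc -= Xc.mean(axis=0)                  # centred, as assumed in the text

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Point coordinates: the first r columns of U*Sigma equal Xc V_r exactly,
# since V'V = I gives Xc V_r = U Sigma V'V_r = (U Sigma)_r.
scores_svd = U[:, :r] * s[:r]          # (U Sigma)_r
scores_proj = Xc @ Vt[:r, :].T         # X V_r
assert np.allclose(scores_svd, scores_proj)

# The biplot axis directions are read from the rows of Vt[:r, :] (i.e. V J).
```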