Biology Reference
In-Depth Information
i,j element of the covariance matrix quantifies the relative change
between variables i and j. If an element of the covariance matrix is
zero, there is no linear relationship (correlation) between the two
variables.

Related to the covariance matrix is the correlation matrix, in which all
the variables have been scaled by their standard deviations. The
correlation matrix is useful when one or more of the variables has much
larger numerical values than the other variables. Scaling the variables
means that all variables contribute to the analysis in roughly the same
way. Mathematically, the correlation matrix, R, is written as

$$
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
       = \frac{\sum_{j} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}
              {\sqrt{\sum_{j} (x_{ji} - \bar{x}_i)^2}\,\sqrt{\sum_{j} (x_{jk} - \bar{x}_k)^2}},
\qquad
R = \begin{pmatrix}
1      & \cdots & r_{1p} \\
\vdots & 1      & \vdots \\
r_{p1} & \cdots & 1
\end{pmatrix}
\tag{4}
$$

where the elements of the correlation matrix are given by $r_{ik}$. R is
a square p × p matrix, where p is the number of variables. The diagonal
elements of R are equal to 1.

FIGURE 1 Scaled physical measurement data. Both the mass and length are
scaled by the standard deviation of the data for each variable. Because
the variables are scaled by the standard deviation, they are
dimensionless.

PCA is the systematic analysis of the covariance or correlation matrix.
It can be shown that the eigenvalues are non-negative and the
eigenvectors are orthogonal for both matrices [5]. The eigenvector
equation for C is

$$
C u_i = \lambda_i u_i
\tag{5}
$$
where $u_i$ is the $i$th eigenvector and $\lambda_i$ is the corresponding
eigenvalue. By convention, the eigenvalues are placed in descending
order, so that $\lambda_1$ is the largest eigenvalue. In PCA, the
eigenvectors are also called principal components. It can be shown that
the first principal component (PC) represents the largest source of
variance in the data set. The percentage of variation explained by the
$i$th PC is given by

$$
\frac{100\,\lambda_i}{\sum_{k=1}^{p} \lambda_k}
\tag{6}
$$
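Equations (4) to (6) can be traced numerically for the two-variable case shown in the figures. The following is a minimal sketch rather than a general PCA routine: the mass and length values are made up for illustration, the helper names `correlation` and `eig_sym_2x2` are hypothetical, and the closed-form eigen-solver only handles symmetric 2 × 2 matrices. For more than two variables a numerical library would normally be used instead.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(x_i, x_k):
    """Equation (4): correlation r_ik between two lists of observations."""
    m_i, m_k = mean(x_i), mean(x_k)
    s_ik = sum((a - m_i) * (b - m_k) for a, b in zip(x_i, x_k))
    s_ii = sum((a - m_i) ** 2 for a in x_i)
    s_kk = sum((b - m_k) ** 2 for b in x_k)
    return s_ik / (math.sqrt(s_ii) * math.sqrt(s_kk))


def eig_sym_2x2(m):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    eigenvalues = [centre + radius, centre - radius]
    if b == 0.0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        axes = [(1.0, 0.0), (0.0, 1.0)]
        eigenvectors = axes if a >= d else axes[::-1]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            # (b, lam - a) satisfies (M - lam * I) u = 0 whenever b != 0.
            v = (b, lam - a)
            norm = math.hypot(v[0], v[1])
            eigenvectors.append((v[0] / norm, v[1] / norm))
    return eigenvalues, eigenvectors


# Made-up measurements (assumption, not data from the text).
mass = [1.0, 2.0, 3.0, 4.0, 5.0]
length = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(mass, length)
R = [[1.0, r], [r, 1.0]]                       # p = 2 correlation matrix

eigenvalues, eigenvectors = eig_sym_2x2(R)     # equation (5), applied to R
total = sum(eigenvalues)
percent_explained = [100.0 * lam / total for lam in eigenvalues]  # equation (6)

for lam, u, pct in zip(eigenvalues, eigenvectors, percent_explained):
    print(f"eigenvalue {lam:.4f}  eigenvector {u}  explains {pct:.2f}%")
```

Because the diagonal of R is 1, the eigenvalues of a two-variable correlation matrix sum to 2, so the percentages depend only on how strongly the two variables are correlated.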
It is common with many metabolomics and
proteomics data sets that the data set can be well
approximated by a few principal components.
As explained earlier, score values provide infor-
mation about the relationship between different
observations. The PCs form a basis set that can
FIGURE 2 Scaled physical measurement data showing both the first and
second principal components for the data set. The first principal
component is the direction of the maximum variation within the data set.
The second principal component is perpendicular to the first PC. The
scores for each sample point are given by the projection of the data
point onto the principal component vector.
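The projection described in the Figure 2 caption can be sketched for two scaled variables as follows. This is an illustration under stated assumptions: the mass and length values are made up, the data are mean-centered as well as scaled (centering is an assumption here; the text only mentions scaling), and the principal component vectors are the known eigenvectors of a 2 × 2 correlation matrix with positive correlation, (1, 1)/√2 and (1, −1)/√2. The helper names `standardize` and `project` are hypothetical.

```python
import math
import statistics

# Made-up measurements (assumption, not data from the text).
mass = [1.0, 2.0, 3.0, 4.0, 5.0]
length = [2.0, 4.0, 5.0, 4.0, 5.0]


def standardize(values):
    """Mean-center and divide by the sample standard deviation (dimensionless)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def project(points, axis):
    """Score of each point: dot product with a unit-length PC vector."""
    return [p[0] * axis[0] + p[1] * axis[1] for p in points]


z_mass = standardize(mass)
z_length = standardize(length)
points = list(zip(z_mass, z_length))

# Eigenvectors of a 2 x 2 correlation matrix when the correlation is positive.
inv_sqrt2 = 1.0 / math.sqrt(2.0)
pc1 = (inv_sqrt2, inv_sqrt2)
pc2 = (inv_sqrt2, -inv_sqrt2)

scores_pc1 = project(points, pc1)
scores_pc2 = project(points, pc2)

for t1, t2 in zip(scores_pc1, scores_pc2):
    print(f"PC1 score {t1:+.3f}   PC2 score {t2:+.3f}")
```

With this scaling, the sample variance of the scores along each PC equals the corresponding eigenvalue of the correlation matrix, and the two sets of scores are uncorrelated.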