where $X_{NP}$ is the value of the $P$th coordinate in the $N$th individual. We can also think of
this as a $P$-dimensional space with $N$ points plotted in that space; it is just a
multidimensional version of the simplistic examples presented in the previous section.
Our problem is to replace the original variables ($X_1, X_2, X_3, \ldots, X_P$), which are the columns
of the data matrix, with a new set of variables ($Y_1, Y_2, Y_3, \ldots, Y_P$), the PCs, that meet the
constraints outlined in the first paragraph of this section. Each PC will be a straight line
through the original $P$-dimensional space, so we can write each $Y_j$ as a linear combination
of the original variables:
$$Y_j = A_{1j} X_1 + A_{2j} X_2 + \cdots + A_{Pj} X_P \tag{6.7}$$
which can be expressed in matrix notation as:
$$Y_j = A_j^{T} X \tag{6.8}$$
where $A_j$ is a vector of constants $\{A_{1j}, A_{2j}, A_{3j}, \ldots, A_{Pj}\}$. (The notation $A_j^{T}$ refers to the
transpose, or row form, of the column matrix $A_j$.) All this means is that the new values of the
individuals, their PC scores, will be computed by multiplying their original values (listed
in matrix $X$) by the appropriate values of $A_j$ and summing the appropriate combinations
of multiples. Now we can see that our problem is to find the values of $A_j$ that satisfy the
constraints outlined above.
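As a minimal sketch of the linear combination described above, the following computes the score of each individual on a single new axis. The data values, the coefficient vector `a`, and the helper name `pc_scores` are all hypothetical and chosen only for illustration; in particular, `a` is not derived from the data and is not claimed to be a real principal component.

```python
def pc_scores(X, a):
    """Return one score per individual: sum over k of a[k] * X[i][k]."""
    return [sum(a_k * x_k for a_k, x_k in zip(a, row)) for row in X]


# Hypothetical data matrix: 4 individuals (rows) by 2 variables (columns).
X = [
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 5.0],
    [8.0, 7.0],
]

# Hypothetical coefficient vector A_j (assumed, not estimated from X).
a = [0.6, 0.8]

scores = pc_scores(X, a)
print(scores)
```

Each score is simply the weighted sum of that individual's original values, which is all that the matrix notation expresses.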
The first constraint we will address is the requirement that the total variance is not
changed. Variance is the sum of the squared distances of individuals from the mean, so
this is equivalent to requiring that distances in the new coordinate system are the same as
distances in the original coordinate system. The total variance of a sample is given by the
sample variance-covariance matrix $S$:
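$$S = \begin{bmatrix}
s_{11} & s_{12} & s_{13} & \cdots & s_{1P} \\
s_{21} & s_{22} & s_{23} & \cdots & s_{2P} \\
s_{31} & s_{32} & s_{33} & \cdots & s_{3P} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{P1} & s_{P2} & s_{P3} & \cdots & s_{PP}
\end{bmatrix} \tag{6.9}$$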
in which $s_{ii}$ is the sample variance observed in variable $X_i$, and $s_{ij}$ (which is equal to $s_{ji}$) is
the sample covariance observed in variables $X_i$ and $X_j$.
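To make the structure of this matrix concrete, here is a small sketch that builds a sample variance-covariance matrix from a data matrix with individuals in rows. It assumes the usual $N - 1$ denominator for sample variances and covariances; the data and the function name `covariance_matrix` are hypothetical.

```python
def covariance_matrix(X):
    """Sample variance-covariance matrix of X (rows = individuals)."""
    n = len(X)
    p = len(X[0])
    # Mean of each variable (column).
    means = [sum(row[k] for row in X) / n for k in range(p)]
    # Entry (i, j) is the sum of cross-products of deviations over n - 1.
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical data: 4 individuals, 2 variables.
X = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
]

S = covariance_matrix(X)
total_variance = sum(S[i][i] for i in range(len(S)))  # sum of the diagonal
print(S, total_variance)
```

The diagonal entries are the variances of the individual variables, the off-diagonal entries are symmetric, and the sum of the diagonal is the total variance that the new axes must preserve.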
We can meet the requirement that the total variance is unchanged by requiring
that each PC is a vector of length one. If we multiply matrix $X$ by a vector of constants as
indicated in Equation 6.8, the variance of the resulting vector $Y_j$ will be:
$$\mathrm{Var}(Y_j) = \mathrm{Var}(A_j^{T} X) = A_j^{T} S A_j \tag{6.10}$$
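The identity in Equation 6.10 can be checked numerically. The sketch below computes the variance of the scores directly and compares it with the quadratic form built from the variance-covariance matrix. The data, the coefficient vector `a`, and all function names are hypothetical, and the $N - 1$ denominator is assumed throughout.

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def covariance_matrix(X):
    """Sample variance-covariance matrix of X (rows = individuals)."""
    n = len(X)
    p = len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def quadratic_form(a, S):
    """Compute a^T S a as a double sum."""
    p = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))


# Hypothetical data and coefficient vector.
X = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
]
a = [0.6, 0.8]

scores = [sum(a_k * x_k for a_k, x_k in zip(a, row)) for row in X]
direct = sample_variance(scores)                    # Var(Y_j)
via_matrix = quadratic_form(a, covariance_matrix(X))  # A_j^T S A_j
print(direct, via_matrix)
```

The two numbers agree up to floating-point rounding, which is the content of the equation: the variance along any new axis is determined by $S$ and the coefficients alone.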
Thus, the constraint that variance is unchanged can be formally stated as the requirement
that the inner product, or dot product, of each vector $A_j$ with itself must be one:
$$A_j^{T} A_j = \sum_{k=1}^{P} A_{kj}^{2} = 1 \tag{6.11}$$
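The unit-length requirement can be illustrated by rescaling an arbitrary vector of coefficients so that its dot product with itself equals one. This is only a sketch of the constraint, not the procedure for finding the PCs; the starting vector and the helper names `dot` and `normalize` are hypothetical.

```python
import math


def dot(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def normalize(a):
    """Rescale a so that dot(a, a) equals one."""
    length = math.sqrt(dot(a, a))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a_k / length for a_k in a]


# Hypothetical unnormalized coefficients.
raw = [3.0, 4.0]
unit = normalize(raw)

print(unit, dot(unit, unit))
```

Rescaling changes only the length of the vector, not its direction, so the axis it defines through the $P$-dimensional space is unchanged.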