FIGURE 6.2 Graphical representation of PCA on three original variables (X1, X2, X3).
(A) The distribution of individual specimens on the three original axes is summarized by a
three-dimensional ellipsoid; (B) the three-dimensional ellipsoid is cut by a plane passing
through the sample centroid and perpendicular to the longest axis (PC1) at its midpoint,
showing the distribution of individuals around the longest axis in the plane of the section;
(C) the upper half of the ellipsoid in B has been rotated so that the cross-section is in the
horizontal plane. Perpendicular projections of all individuals (from both halves) onto this
plane are used to solve for the second and third PCs.
the first step of the two-dimensional PCA, namely, solving for the long axis of a
two-dimensional ellipse, as outlined above. In the three-dimensional case, the long axis of the
two-dimensional ellipse will be PC2. The short axis of this ellipse will be PC3, and will
complete the description of the distribution of seeds in the watermelon. By logical extension,
we can consider N variables measured on some set of individuals to represent an
N-dimensional ellipsoid. The PCs of this data set will be the N axes of the ellipsoid.
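
The sequential procedure described above (find the longest axis, flatten the data onto the plane perpendicular to it, then find the longest axis of what remains) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' procedure: the measurements are invented, the function names are hypothetical, and power iteration is used here simply as one easy way to find the direction of greatest variance. Statistical software commonly obtains all PCs at once from an eigen-decomposition of the covariance matrix instead.

```python
import math

def center(rows):
    """Subtract the sample mean (the centroid) from every specimen."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def longest_axis(rows, iterations=500):
    """Unit vector along the direction of greatest variance (power iteration)."""
    cov = covariance(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary starting direction (assumption)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def remove_axis(rows, v):
    """Project every specimen onto the (hyper)plane perpendicular to v."""
    flattened = []
    for r in rows:
        d = sum(a * b for a, b in zip(r, v))  # position along v
        flattened.append([x - d * vi for x, vi in zip(r, v)])
    return flattened

def principal_axes(data):
    """Find PC1, then repeat on the data with PC1 removed, and so on."""
    rows = center(data)
    axes = []
    for _ in range(len(data[0])):
        v = longest_axis(rows)
        axes.append(v)
        rows = remove_axis(rows, v)
    return axes

# Hypothetical measurements: three variables (X1, X2, X3) on eight specimens.
specimens = [
    (2.0, 1.0, 0.5), (4.1, 2.2, 0.4), (6.0, 2.9, 1.1), (8.2, 4.1, 0.9),
    (9.9, 5.2, 1.6), (12.1, 5.8, 1.2), (3.0, 2.5, 1.8), (11.0, 4.6, 0.2),
]

pcs = principal_axes(specimens)
for k, axis in enumerate(pcs, start=1):
    print(f"PC{k}:", [round(a, 3) for a in axis])
```

With three variables the loop returns three mutually perpendicular unit vectors, ordered from the longest axis of the ellipsoid (PC1) to the shortest (PC3); with N variables it would return N.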
After the variation in the original variables has been redescribed in terms of the PCs,
we want to know the positions of the individual specimens relative to these new axes
(Figure 6.3). As shown in Figure 6.3A, the values we want are determined by the orthogonal
projections of the specimen onto the PCs. These new distances are called principal
component scores. Because the PCs intersect at the sample mean, the values of the scores
represent the distances of the specimen from the mean in the directions of the PCs.
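
As a small worked illustration of scores as orthogonal projections, the sketch below uses two variables so that the PCs can be written down from a single rotation angle. The measurements are invented and all names are hypothetical. The angle formula is the standard closed form for the major axis of a 2 x 2 covariance matrix, and it is used here only to keep the example self-contained.

```python
import math

# Hypothetical measurements: two variables on six specimens.
specimens = [(2.0, 1.1), (3.0, 1.9), (4.0, 3.2), (5.0, 3.8), (6.0, 5.1), (7.0, 5.9)]

# Express every specimen as a deviation from the sample mean,
# because the PCs intersect at the mean.
n = len(specimens)
mean_x = sum(x for x, _ in specimens) / n
mean_y = sum(y for _, y in specimens) / n
centered = [(x - mean_x, y - mean_y) for x, y in specimens]

# Sample variances and covariance of the two original variables.
sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
syy = sum(dy * dy for _, dy in centered) / (n - 1)
sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

# Orientation of the long axis of the ellipse (PC1); PC2 is perpendicular to it.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
pc1 = (math.cos(theta), math.sin(theta))
pc2 = (-math.sin(theta), math.cos(theta))

def score(point, axis):
    """Orthogonal projection of a centered specimen onto a unit-length axis."""
    return point[0] * axis[0] + point[1] * axis[1]

scores = [(score(p, pc1), score(p, pc2)) for p in centered]

for (s1, s2), original in zip(scores, specimens):
    print(original, "-> PC1 score", round(s1, 3), ", PC2 score", round(s2, 3))
```

Each pair of scores locates a specimen relative to the mean along PC1 and PC2. Because the axes are only rotated, the specimen's distance from the mean is the same whether it is computed from the original deviations or from the scores.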