are related to each other. Principal component plots, also called loading plots, provide information about how the different variables are related to each other. Because we are working with scaled variables, the principal components (PCs) and scores are dimensionless.

The mathematics of PCA can be described clearly using linear algebra.⁴ An excellent discussion of linear algebra can be found in the book by Strang.⁵ By convention, the data matrix, $X$, has $n$ rows and $p$ columns. Each column represents a different variable, and each row represents an observation or sample. The average data matrix, $\bar{X}$, contains the average of each individual column (i.e., variable) in the data set, repeated in every row so that it has the same $n \times p$ shape as $X$. Mean centering is written as

$$X - \bar{X} \qquad (2)$$

The covariance matrix is written as

$$C = (X - \bar{X})^{T} (X - \bar{X}) \qquad (3)$$

where an uppercase T represents a matrix transpose. The covariance matrix is a square, symmetric, $p \times p$ matrix. It provides information about the relationships between different variables. For example, the

TABLE 1 Example Data Set (cont'd)
Length    Mass
0.824     124.100
0.801     121.410
0.788     120.800
0.824     127.700
0.782     107.400
0.841     129.200
0.795     120.700
0.816     131.800
0.805     121.910
0.840     135.100
0.836     122.310
0.842     131.500
0.788     110.600
0.820     126.700
0.772     103.510
0.802     115.100
0.776     110.710
0.828     130.800
0.758     113.800
0.819     124.600
0.826     118.310
0.802     114.200
0.810     120.300
0.802     115.700
0.832     117.510
0.796     109.810
0.759     109.100
0.770     115.100
0.759     118.310
0.772     112.600
0.806     116.200
0.803     118.000
0.845     131.000
0.822     125.700
0.971     126.100
0.816     125.800
0.836     125.500
0.815     127.800
0.822     130.500
0.822     127.900
0.843     123.900
( Continued )
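A minimal sketch in Python (standard library only) of Eqs. (2) and (3), applied to the first five Length and Mass rows of Table 1. The helper names `column_means`, `mean_center`, `autoscale` and `covariance` are hypothetical, introduced only for illustration. The autoscaling step (dividing each mean-centered column by its sample standard deviation) is an assumption about how the variables are scaled; the text states only that scaled variables are used.

```python
from statistics import mean, stdev

# First five (Length, Mass) rows of Table 1, used purely for illustration.
X = [
    [0.824, 124.100],
    [0.801, 121.410],
    [0.788, 120.800],
    [0.824, 127.700],
    [0.782, 107.400],
]


def column_means(X):
    """Mean of each column (variable); this is every row of the average matrix X-bar."""
    return [mean(col) for col in zip(*X)]


def mean_center(X):
    """Equation (2): subtract each column's mean from every entry in that column."""
    means = column_means(X)
    return [[x - m for x, m in zip(row, means)] for row in X]


def autoscale(X):
    """Assumed scaling: mean-center, then divide each column by its sample
    standard deviation, which makes every entry dimensionless."""
    means = column_means(X)
    sds = [stdev(col) for col in zip(*X)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]


def covariance(Xc):
    """Equation (3): C = Xc^T Xc for a centered n x p matrix Xc.
    Entry C[j][k] sums the products of columns j and k, so C is p x p and symmetric."""
    p = len(Xc[0])
    return [[sum(row[j] * row[k] for row in Xc) for k in range(p)] for j in range(p)]


Xc = mean_center(X)
C = covariance(Xc)

print("Column means:", column_means(X))
for row in C:
    print([round(v, 4) for v in row])
```

Equation (3) as written carries no normalization factor. Dividing C by n - 1 gives the sample covariance matrix, and applying the same division to autoscaled data gives the correlation matrix, whose diagonal entries equal 1.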