Table 3.7 Adequacies and axis predictivities returned by PCA.predictivities for
the centred (but unscaled) aircraft data.

            Adequacy                          Axis predictivity
Dimension   SPR     RGF     PLF     SLF       SPR     RGF     PLF     SLF
1           0.9110  0.0144  0.0001  0.0745    0.9878  0.2119  0.0828  0.4283
2           0.9992  0.2054  0.0018  0.7936    1.0000  0.5710  0.2446  0.9566
3           0.9995  0.9997  0.0018  0.9990    1.0000  1.0000  0.2446  1.0000
4           1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000

k and m have satisfactory predictivities in two dimensions but again the cautionary
remark about pre-scaling the data should be kept in mind.
3.4 Predictivities of newly interpolated samples
Apart from calculating the predictivities for a sample point, the predictivity for newly
interpolated samples can also be calculated, although such samples do not play any role
in constructing the scaffolding underlying the biplot. We have seen that a new sample,
say $\mathbf{x}$, can be interpolated into the biplot by the relation
$\mathbf{z}' = \mathbf{x}'\mathbf{V}_r$. Therefore, its
sample predictivity can be calculated by applying (3.20). The predictivity of the new
sample $\mathbf{z}' = \mathbf{x}'\mathbf{V}_r$ follows then as
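$$\frac{\mathbf{x}'\mathbf{V}\mathbf{J}\mathbf{V}'\mathbf{x}}{\mathbf{x}'\mathbf{x}} = (\mathbf{x}'\mathbf{V}\mathbf{J}\mathbf{V}'\mathbf{x})(\mathbf{x}'\mathbf{x})^{-1}.$$    (3.22)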
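The calculation in (3.22) can be sketched numerically. The following is only an illustrative sketch in Python with NumPy, not the code behind PCA.predictivities: the function name, the made-up data and the new sample are all hypothetical, and it assumes that V holds the right singular vectors of the centred data matrix and that J selects the first r columns of V, so that x'VJV'x equals z'z.

```python
import numpy as np

def new_sample_predictivity(X_centred, x_centred, r=2):
    """Predictivity of an already centred new sample in r dimensions."""
    # Rows of Vt are the right singular vectors of the centred data matrix.
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    V_r = Vt[:r].T                    # first r columns of V
    z = x_centred @ V_r               # interpolated coordinates z' = x'V_r
    # Assumption: J keeps the first r columns of V, so x'VJV'x = z'z.
    return float(z @ z) / float(x_centred @ x_centred)

rng = np.random.default_rng(0)        # made-up data, for illustration only
X_raw = rng.normal(size=(20, 4))
col_means = X_raw.mean(axis=0)
X = X_raw - col_means                 # column-centred data matrix
x = rng.normal(size=4) - col_means    # hypothetical new sample, centred by the same means

# Predictivity of the new sample in 1, 2, 3 and 4 dimensions.
preds = [new_sample_predictivity(X, x, r) for r in range(1, 5)]
print(preds)
```

The values cannot decrease as r grows and reach 1 in the full space, because z'z then equals x'x.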
When interpolating a new sample algebraically into a PCA biplot and calculating its
sample predictivity the meaning of x must be correctly understood. Remember that in
this chapter we follow the convention that X denotes the column-centred data, therefore
x denotes the observed new vector of values centred by the mean vector ($\bar{\mathbf{x}}$) used in the
column centring of X, that is,
$$\mathbf{x}' = \mathbf{x}_{\mathrm{New}}' - \bar{\mathbf{x}}'.$$    (3.23)
If, in addition to centring, the columns of X are scaled, then the same scaling must be
applied to the columns (elements) of (3.23); for example, dividing the ith element of
(3.23) by the standard deviation of the ith column of X.
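As a small illustration of this centring and scaling step, the sketch below uses only the Python standard library. The function name and the numbers are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption; the scaling applied to the new sample must match whatever was applied to X.

```python
from statistics import mean, stdev

def centre_and_scale(new_row, data_columns, scale=True):
    """Centre (and optionally scale) a raw new sample with the original data's statistics."""
    # The means come from the ORIGINAL data columns, not from the new sample.
    means = [mean(col) for col in data_columns]
    centred = [v - m for v, m in zip(new_row, means)]
    if not scale:
        return centred
    # Assumption: sample standard deviation (divisor n - 1) of each original column.
    sds = [stdev(col) for col in data_columns]
    return [c / s for c, s in zip(centred, sds)]

columns = [[1.0, 3.0, 5.0], [10.0, 20.0, 60.0]]   # made-up data, one list per variable
x_new = [7.0, 30.0]                               # hypothetical raw new sample

centred_only = centre_and_scale(x_new, columns, scale=False)
centred_scaled = centre_and_scale(x_new, columns)
print(centred_only, centred_scaled)
```

The centred (and, where applicable, scaled) vector is what plays the role of x in the predictivity calculation.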
PCAbipl and PCA.predictivities have optional arguments X.new.samples and
X.new.vars expecting raw data in matrix format. The necessary centring and scaling
are then taken care of by the respective functions.
As an example of the sample predictivity associated with a new sample consider
the following data. Anatomical characteristics of 37 wood samples were determined
by microscopic methods. The following measurements were made: vessel diameter
in micrometres (VesD), vessel element length in micrometres (VesL), fibre length in
micrometres (FibL), ray height in micrometres (RayH), ray width in micrometres
(RayW) and the number of vessels per square millimetre (NumVes). The 37 samples
consisted of three known species: Ocotea bullata, O. porosa and O. kenyensis. The data
are presented in Table 3.9 and a PCA biplot disregarding the group structure of this data
set is given in Figure 3.23. Since the standard deviations of the six variables range from