4.6.1 Predictivities of new samples and variables
We showed in Chapter 3 that new samples or variables added to a PCA biplot do not
play any role in finding the scaffolding for constructing the biplot, but that it is still
possible to calculate predictivities associated with the newly added samples or variables.
Similar calculations are possible with CVA biplots. The matrix of within-group sample
predictivities (4.23) can be written as
$$\mathbf{W} = \operatorname{diag}\{(\mathbf{I}-\mathbf{H})\mathbf{XMJM}'\mathbf{X}'(\mathbf{I}-\mathbf{H})\}\,[\operatorname{diag}\{(\mathbf{I}-\mathbf{H})\mathbf{XMM}'\mathbf{X}'(\mathbf{I}-\mathbf{H})\}]^{-1}, \qquad (4.24)$$

since $\hat{\mathbf{X}}\mathbf{W}^{-1}\hat{\mathbf{X}}' = (\mathbf{XMJM}^{-1})(\mathbf{MM}')(\mathbf{XMJM}^{-1})' = \mathbf{XMJM}'\mathbf{X}'$. Given a new sample $\mathbf{x}: p \times 1$ with known values for all $p$ original variables and belonging to group $t$, its within-group sample predictivity can be calculated as follows. Centre $\mathbf{x}$ by subtracting the same mean vector used in the column-centring of the original input data matrix $\mathbf{X}$. Let $\mathbf{x}^*$ denote the centred $\mathbf{x}$, and let $\tilde{\mathbf{x}}' = \mathbf{x}^{*\prime} - (\text{row } t \text{ of } (\mathbf{G}'\mathbf{G})^{-1}\mathbf{G}'\mathbf{X})$, the deviation of the new sample from its group mean. The factor $(\mathbf{I}-\mathbf{H})\mathbf{X}$ in (4.24) is then replaced by $\tilde{\mathbf{x}}'$, giving

$$\text{Within-group sample predictivity} = \frac{\tilde{\mathbf{x}}'\mathbf{MJM}'\tilde{\mathbf{x}}}{\tilde{\mathbf{x}}'\mathbf{MM}'\tilde{\mathbf{x}}}.$$
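To make this computation concrete, here is a minimal NumPy sketch. It assumes $\mathbf{X}$ is the column-centred $n \times p$ data matrix, $\mathbf{G}$ the $n \times K$ group indicator matrix, and that $\mathbf{M}$ solves the two-sided eigenproblem with $\mathbf{M}'\mathbf{W}\mathbf{M} = \mathbf{I}$ for the within-groups matrix; the function names and the use of scipy.linalg.eigh to obtain $\mathbf{M}$ are our own choices, not anything prescribed in the text.

```python
import numpy as np
from scipy.linalg import eigh

def cva_scaffolding(X, G):
    """Solve the two-sided eigenproblem B m = lambda W m with M'WM = I.

    X : (n, p) column-centred data matrix
    G : (n, K) group indicator matrix
    Returns M (p x p, leading canonical dimensions first) and the
    K x p matrix of group means (G'G)^{-1} G'X.
    """
    N = G.T @ G                          # diagonal matrix of group sizes
    Xbar = np.linalg.solve(N, G.T @ X)   # group means (G'G)^{-1} G'X
    B = Xbar.T @ N @ Xbar                # between-groups SSP matrix
    W = X.T @ X - B                      # within-groups SSP matrix
    lam, M = eigh(B, W)                  # generalized eigenvectors, M'WM = I
    order = np.argsort(lam)[::-1]        # largest eigenvalues first
    return M[:, order], Xbar

def within_group_sample_predictivity(x_new, mu, X, G, t, r):
    """Predictivity of a new p-vector x_new from group t in an
    r-dimensional CVA biplot, following the formula after (4.24).

    mu : the mean vector originally used to column-centre X.
    """
    M, Xbar = cva_scaffolding(X, G)
    x_dev = (x_new - mu) - Xbar[t]       # deviation from the group-t mean
    z = M.T @ x_dev                      # canonical coordinates M'x
    return (z[:r] ** 2).sum() / (z ** 2).sum()   # x'MJM'x / x'MM'x
```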
If the group membership of the new sample is unknown, as was the case with the
specimen of unknown origin in our Ocotea example, its group membership can be taken
as suggested by the nearest group mean to its position in the biplot display. The above
procedure generalizes in a straightforward way to an s × p matrix containing measurements of s new samples.
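Continuing the sketch above, the nearest-mean rule and its extension to an $s \times p$ matrix of new samples might look as follows; measuring distances between canonical coordinates in the first $r$ dimensions of the display is our reading of "nearest group mean", not a definition made in the text.

```python
def suggest_groups(X_new, mu, X, G, r):
    """Suggest group membership for an (s, p) matrix of new samples by
    the nearest canonical group mean in the r-dimensional display."""
    M, Xbar = cva_scaffolding(X, G)            # from the sketch above
    Z_new = (X_new - mu) @ M[:, :r]            # biplot coordinates of new samples
    Z_means = Xbar @ M[:, :r]                  # biplot coordinates of group means
    d2 = ((Z_new[:, None, :] - Z_means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)                   # index of the nearest group mean
```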
Consider the vector $\mathbf{C}^{1/2}\mathbf{x}$ and its regression $\mathbf{b}$ introduced in equation (4.10). Similar to (3.24) and (3.25), we have the orthogonal decomposition

$$\mathbf{C}^{1/2}\mathbf{x} = \mathbf{C}^{1/2}\mathbf{XMJb} + \mathbf{C}^{1/2}\mathbf{XM}(\mathbf{I}-\mathbf{J})\mathbf{b} + (\mathbf{C}^{1/2}\mathbf{x} - \mathbf{C}^{1/2}\mathbf{XMb}). \qquad (4.25)$$
In (4.25) the first two terms on the right-hand side represent the contribution of the
regression in the canonical space, while the final term is the regression residual. Thus,
in the full space we have $\mathbf{b} = \boldsymbol{\Lambda}^{-1}\mathbf{M}'\mathbf{X}'\mathbf{Cx}$, while $\mathbf{Jb} = \mathbf{J}\boldsymbol{\Lambda}^{-1}\mathbf{M}'\mathbf{X}'\mathbf{Cx}$ selects the $r$ elements of $\mathbf{b}$ in the approximation space. From (4.25), axis predictivity for the newly added variable may be defined in two ways:

$$\mathrm{CVA}.\hat{\mathbf{x}} = \frac{\mathbf{b}'\mathbf{JM}'\mathbf{X}'\mathbf{CXMJb}}{\mathbf{x}'\mathbf{Cx}} = \frac{\mathbf{b}'\mathbf{J}\boldsymbol{\Lambda}\mathbf{Jb}}{\mathbf{x}'\mathbf{Cx}}. \qquad (4.26)$$
This compares the regression fit in r dimensions of the canonical space with the sum
of squares among the means of the new variable. Usually the residual term $(\mathbf{C}^{1/2}\mathbf{x} - \mathbf{C}^{1/2}\mathbf{XMb})$ in (4.25) vanishes, because both the canonical means and $\mathbf{x}$ will occupy $K-1$ dimensions. Then (4.26) simplifies to
$$\mathrm{CVA}.\hat{\mathbf{x}} = \frac{\mathbf{b}'\mathbf{JM}'\mathbf{X}'\mathbf{CXMJb}}{\mathbf{b}'\mathbf{M}'\mathbf{X}'\mathbf{CXMb}} = \frac{\mathbf{b}'\mathbf{J}\boldsymbol{\Lambda}\mathbf{Jb}}{\mathbf{b}'\boldsymbol{\Lambda}\mathbf{b}}. \qquad (4.27)$$
This compares the regression fit in $r$ dimensions of the canonical space with the sum of squares in the remaining $p-r$ dimensions. However, when $\mathbf{x}$ occupies fewer dimensions, as when $p < K-1$ or when there happen to be collinearities among the canonical means, then (4.27) excludes a nonzero regression residual from its denominator and so gives an overoptimistic indication of predictivity. Thus we prefer (4.26) to (4.27). Expressions (4.26) and (4.27) generalize in a straightforward way to $t$ newly added variables.
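The preference for (4.26) can be checked numerically. The sketch below, continuing the earlier one, computes both versions for a new variable observed on the original $n$ samples; it takes $\mathbf{C}$ to be the diagonal matrix of group sizes $\mathbf{G}'\mathbf{G}$, $\mathbf{x}$ the vector of group means of the new variable, and $\boldsymbol{\Lambda} = \mathbf{M}'\mathbf{X}'\mathbf{CXM}$, identifications that are our reading of the setup around (4.10) rather than definitions made in this section; it also assumes the canonical means span $K-1$ dimensions.

```python
def axis_predictivity_new_variable(x_new, mu_x, X, G, r):
    """Axis predictivity of a newly added variable, both as (4.26)
    and as the simplified (4.27)."""
    M, Xbar = cva_scaffolding(X, G)                  # from the sketch above
    N = G.T @ G
    K = G.shape[1]
    lam = np.diag(M.T @ Xbar.T @ N @ Xbar @ M)       # Lambda = M'X'CXM
    Mk, lamk = M[:, :K - 1], lam[:K - 1]             # canonical space: K-1 dims
    xbar = np.linalg.solve(N, G.T @ (x_new - mu_x))  # group means of new variable
    b = (Mk.T @ Xbar.T @ N @ xbar) / lamk            # b = Lambda^{-1} M'X'Cx
    fit_r = (lamk[:r] * b[:r] ** 2).sum()            # b'J Lambda J b
    pred_426 = fit_r / (xbar @ N @ xbar)             # against SS among the means
    pred_427 = fit_r / (lamk * b ** 2).sum()         # regression residual excluded
    return pred_426, pred_427
```

When the new variable's means lie in the span of the canonical means the two returned values coincide; otherwise pred_427 exceeds pred_426, illustrating the overoptimism noted above.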