4.6.1 Predictivities of new samples and variables
We showed in Chapter 3 that new samples or variables added to a PCA biplot do not
play any role in finding the scaffolding for constructing the biplot, but that it is still
possible to calculate predictivities associated with the newly added samples or variables.
Similar calculations are possible with CVA biplots. The matrix of within-group sample
predictivities (4.23) can be written as
$$\mathbf{W} = \operatorname{diag}\{(\mathbf{I}-\mathbf{H})\mathbf{XMJM}'\mathbf{X}'(\mathbf{I}-\mathbf{H})\}\,[\operatorname{diag}\{(\mathbf{I}-\mathbf{H})\mathbf{XMM}'\mathbf{X}'(\mathbf{I}-\mathbf{H})\}]^{-1}, \qquad (4.24)$$

since $\hat{\mathbf{X}}\mathbf{W}^{-1}\hat{\mathbf{X}}' = (\mathbf{XMJM}^{-1})(\mathbf{MM}')(\mathbf{XMJM}^{-1})' = \mathbf{XMJM}'\mathbf{X}'$. Given a new sample $\mathbf{x}: p \times 1$ with known values for all $p$ original variables and belonging to group $t$, its within-group sample predictivity can be calculated as follows. Centre $\mathbf{x}$ by subtracting the same mean vector used in the column-centring of the original input data matrix $\mathbf{X}$. Let $\mathbf{x}^*$ denote the centred $\mathbf{x}$, and let $\tilde{\mathbf{x}}' = \mathbf{x}^{*\prime} - (\text{row } t \text{ of } (\mathbf{G}'\mathbf{G})^{-1}\mathbf{G}'\mathbf{X})$, the deviation of the new sample from its group mean. The factor $(\mathbf{I}-\mathbf{H})\mathbf{X}$ in (4.24) is then replaced by $\tilde{\mathbf{x}}'$, giving

$$\text{Within-group sample predictivity} = \frac{\tilde{\mathbf{x}}'\mathbf{MJM}'\tilde{\mathbf{x}}}{\tilde{\mathbf{x}}'\mathbf{MM}'\tilde{\mathbf{x}}}.$$
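To make this computation concrete, here is a minimal NumPy sketch. It assumes $\mathbf{X}$ is the column-centred $n \times p$ data matrix, $\mathbf{G}$ the $n \times K$ group indicator matrix, and that $\mathbf{M}$ solves the two-sided eigenproblem with $\mathbf{M}'\mathbf{W}\mathbf{M} = \mathbf{I}$ for the within-groups matrix; the function names and the use of scipy.linalg.eigh to obtain $\mathbf{M}$ are our own choices, not anything prescribed in the text.

```python
import numpy as np
from scipy.linalg import eigh

def cva_scaffolding(X, G):
    """Solve the two-sided eigenproblem B m = lambda W m with M'WM = I.

    X : (n, p) column-centred data matrix
    G : (n, K) group indicator matrix
    Returns M (p x p, leading canonical dimensions first) and the
    K x p matrix of group means (G'G)^{-1} G'X.
    """
    N = G.T @ G                          # diagonal matrix of group sizes
    Xbar = np.linalg.solve(N, G.T @ X)   # group means (G'G)^{-1} G'X
    B = Xbar.T @ N @ Xbar                # between-groups SSP matrix
    W = X.T @ X - B                      # within-groups SSP matrix
    lam, M = eigh(B, W)                  # generalized eigenvectors, M'WM = I
    order = np.argsort(lam)[::-1]        # largest eigenvalues first
    return M[:, order], Xbar

def within_group_sample_predictivity(x_new, mu, X, G, t, r):
    """Predictivity of a new p-vector x_new from group t in an
    r-dimensional CVA biplot, following the formula after (4.24).

    mu : the mean vector originally used to column-centre X.
    """
    M, Xbar = cva_scaffolding(X, G)
    x_dev = (x_new - mu) - Xbar[t]       # deviation from the group-t mean
    z = M.T @ x_dev                      # canonical coordinates M'x
    return (z[:r] ** 2).sum() / (z ** 2).sum()   # x'MJM'x / x'MM'x
```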
If the group membership of the new sample is unknown, as was the case with the
specimen of unknown origin in our Ocotea example, its group membership can be taken
as suggested by the nearest group mean to its position in the biplot display. The above
procedure generalizes in a straightforward way to an s × p matrix containing measurements of s new samples.
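Continuing the sketch above, the nearest-mean rule and its extension to an $s \times p$ matrix of new samples might look as follows; measuring distances between canonical coordinates in the first $r$ dimensions of the display is our reading of "nearest group mean", not a definition made in the text.

```python
def suggest_groups(X_new, mu, X, G, r):
    """Suggest group membership for an (s, p) matrix of new samples by
    the nearest canonical group mean in the r-dimensional display."""
    M, Xbar = cva_scaffolding(X, G)            # from the sketch above
    Z_new = (X_new - mu) @ M[:, :r]            # biplot coordinates of new samples
    Z_means = Xbar @ M[:, :r]                  # biplot coordinates of group means
    d2 = ((Z_new[:, None, :] - Z_means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)                   # index of the nearest group mean
```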
Consider the vector $\mathbf{C}^{1/2}\mathbf{x}$ and its regression $\mathbf{b}$ introduced in equation (4.10). Similar to (3.24) and (3.25), we have the orthogonal decomposition

$$\mathbf{C}^{1/2}\mathbf{x} = \mathbf{C}^{1/2}\mathbf{XMJb} + \mathbf{C}^{1/2}\mathbf{XM}(\mathbf{I}-\mathbf{J})\mathbf{b} + (\mathbf{C}^{1/2}\mathbf{x} - \mathbf{C}^{1/2}\mathbf{XMb}). \qquad (4.25)$$
In (4.25) the first two terms on the right-hand side represent the contribution of the
regression in the canonical space, while the final term is the regression residual. Thus,
in the full space we have $\mathbf{b} = \boldsymbol{\Lambda}^{-1}\mathbf{M}'\mathbf{X}'\mathbf{Cx}$, while $\mathbf{Jb} = \mathbf{J}\boldsymbol{\Lambda}^{-1}\mathbf{M}'\mathbf{X}'\mathbf{Cx}$ selects the $r$ elements of $\mathbf{b}$ in the approximation space. From (4.25), axis predictivity for the newly added variable may be defined in two ways:

$$\mathrm{CVA}.\hat{\mathbf{x}} = \frac{\mathbf{b}'\mathbf{JM}'\mathbf{X}'\mathbf{CXMJb}}{\mathbf{x}'\mathbf{Cx}} = \frac{\mathbf{b}'\mathbf{J}\boldsymbol{\Lambda}\mathbf{Jb}}{\mathbf{x}'\mathbf{Cx}}. \qquad (4.26)$$
This compares the regression fit in r dimensions of the canonical space with the sum
of squares among the means of the new variable. Usually the residual term $(\mathbf{C}^{1/2}\mathbf{x} - \mathbf{C}^{1/2}\mathbf{XMb})$ in (4.25) vanishes, because both the canonical means and $\mathbf{x}$ will occupy $K-1$ dimensions. Then (4.26) simplifies to
$$\mathrm{CVA}.\hat{\mathbf{x}} = \frac{\mathbf{b}'\mathbf{JM}'\mathbf{X}'\mathbf{CXMJb}}{\mathbf{b}'\mathbf{M}'\mathbf{X}'\mathbf{CXMb}} = \frac{\mathbf{b}'\mathbf{J}\boldsymbol{\Lambda}\mathbf{Jb}}{\mathbf{b}'\boldsymbol{\Lambda}\mathbf{b}}. \qquad (4.27)$$
This compares the regression fit in $r$ dimensions of the canonical space with the sum of squares in the remaining $p-r$ dimensions. However, when $\mathbf{x}$ occupies fewer dimensions, as when $p < K-1$ or when there happen to be collinearities among the canonical means, then (4.27) excludes a nonzero regression residual from its denominator and so gives an overoptimistic indication of predictivity. Thus we prefer (4.26) to (4.27). Expressions (4.26) and (4.27) generalize in a straightforward way to $t$ newly added variables.
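The preference for (4.26) can be checked numerically. The sketch below, continuing the earlier one, computes both versions for a new variable observed on the original $n$ samples; it takes $\mathbf{C}$ to be the diagonal matrix of group sizes $\mathbf{G}'\mathbf{G}$, $\mathbf{x}$ the vector of group means of the new variable, and $\boldsymbol{\Lambda} = \mathbf{M}'\mathbf{X}'\mathbf{CXM}$, identifications that are our reading of the setup around (4.10) rather than definitions made in this section; it also assumes the canonical means span $K-1$ dimensions.

```python
def axis_predictivity_new_variable(x_new, mu_x, X, G, r):
    """Axis predictivity of a newly added variable, both as (4.26)
    and as the simplified (4.27)."""
    M, Xbar = cva_scaffolding(X, G)                  # from the sketch above
    N = G.T @ G
    K = G.shape[1]
    lam = np.diag(M.T @ Xbar.T @ N @ Xbar @ M)       # Lambda = M'X'CXM
    Mk, lamk = M[:, :K - 1], lam[:K - 1]             # canonical space: K-1 dims
    xbar = np.linalg.solve(N, G.T @ (x_new - mu_x))  # group means of new variable
    b = (Mk.T @ Xbar.T @ N @ xbar) / lamk            # b = Lambda^{-1} M'X'Cx
    fit_r = (lamk[:r] * b[:r] ** 2).sum()            # b'J Lambda J b
    pred_426 = fit_r / (xbar @ N @ xbar)             # against SS among the means
    pred_427 = fit_r / (lamk * b ** 2).sum()         # regression residual excluded
    return pred_426, pred_427
```

When the new variable's means lie in the span of the canonical means the two returned values coincide; otherwise pred_427 exceeds pred_426, illustrating the overoptimism noted above.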