Biplot basics - Understanding Biplots


known as the right singular vectors of $X$, while the matrix $\Sigma^{*}$ is of the form

$$\Sigma^{*} : n \times p = \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix}. \qquad (2.2)$$

In (2.2), $k$ denotes the rank of $X$ while $\Sigma$ is a $k \times k$ diagonal matrix whose diagonal elements are the nonzero singular values of $X$, assumed to be presented in nonincreasing order. It follows that (2.1) can also be written as

$$X : n \times p = U \Sigma V', \qquad (2.3)$$

where $U : n \times k$ and $V : p \times k$ consist of the first $k$ columns of $U^{*}$ and $V^{*}$, respectively. The matrices $U$ and $V$ are both orthonormal, that is, $U'U = V'V = I_k$.
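The passage from the full decomposition to the compact form (2.3) can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the data matrix, its sizes, the random seed and the rank tolerance are all illustrative assumptions.

```python
import numpy as np

# Illustrative data only: a 6 x 4 matrix constructed to have rank 3.
rng = np.random.default_rng(0)
n, p, true_rank = 6, 4, 3
X = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, p))

# Full SVD: U_star is n x n, Vt_star is the transpose of V* (p x p),
# and s holds min(n, p) singular values in nonincreasing order.
U_star, s, Vt_star = np.linalg.svd(X, full_matrices=True)

# Numerical rank k: singular values above a relative tolerance
# (the value 1e-10 is an arbitrary choice for this sketch).
k = int(np.sum(s > 1e-10 * s[0]))

# Compact factors: first k columns of U* and V*, and the k x k diagonal Sigma.
U = U_star[:, :k]
V = Vt_star.T[:, :k]
Sigma = np.diag(s[:k])

# Rebuild X from the compact form (2.3).
X_rebuilt = U @ Sigma @ V.T

print("rank k =", k)
print("X recovered:", np.allclose(X_rebuilt, X))
print("U'U = I:", np.allclose(U.T @ U, np.eye(k)))
print("V'V = I:", np.allclose(V.T @ V, np.eye(k)))
```

Dropping the columns of $U^{*}$ and $V^{*}$ that pair with zero singular values changes nothing in the product, which is why the compact form reproduces $X$.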

An $r$-dimensional approximation of $X$ is given by

$$X_{[r]} = U \Sigma_{[r]} V',$$

where $\Sigma_{[r]}$ replaces the $p - r$ smallest diagonal values of $\Sigma$ by zero. In the remainder

of this chapter we discuss approximation, axes, interpolation, prediction, projection, and

the like, from the viewpoint of extending scatter diagrams to more than two or three

dimensions. We use mainly a simple type of biplot, the principal component analysis

(PCA) biplot, as the instrument for introducing these concepts. In Chapter 3 we shall

consider the PCA biplot as a distinct type of biplot in more detail while in subsequent

chapters we shall show how the basic concepts generalize to more complicated data

structures. Underpinning PCA is a result, proved by Eckart and Young (1936), that the $r$-dimensional approximation of $X$ given by $X_{[r]} = U \Sigma_{[r]} V'$ is optimal in the least-squares sense that

$$\| X - X_{[r]} \|^{2} = \operatorname{trace}\{ (X - X_{[r]})(X - X_{[r]})' \} \qquad (2.4)$$

is minimized over all matrices $X_{[r]}$ of rank not larger than $r$.
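The least-squares criterion (2.4) can be illustrated numerically. The sketch below is a hedged example under assumed data: the matrix sizes, the seed, the helper name `sq_error` and the random rank-$r$ competitor are all hypothetical choices made only for illustration. It relies on the standard fact that the minimum of (2.4) equals the sum of the squared discarded singular values.

```python
import numpy as np

# Illustrative data only: sizes and seed are arbitrary.
rng = np.random.default_rng(1)
n, p, r = 8, 5, 2
X = rng.standard_normal((n, p))

# Compact SVD: U is n x p, s has p singular values, Vt is V' (p x p).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Sigma_[r]: keep the r largest singular values and set the rest to zero.
s_r = s.copy()
s_r[r:] = 0.0
X_r = U @ np.diag(s_r) @ Vt


def sq_error(A, B):
    """Criterion (2.4): trace{(A - B)(A - B)'}, the squared Frobenius norm."""
    D = A - B
    return float(np.trace(D @ D.T))


err_svd = sq_error(X, X_r)

# An arbitrary rank-r competitor (random, purely for comparison).
competitor = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))
err_other = sq_error(X, competitor)

print("error of truncated SVD:", err_svd)
print("sum of discarded squared singular values:", float(np.sum(s[r:] ** 2)))
print("error of random rank-r matrix:", err_other)
```

A single random competitor proves nothing by itself, but it shows how any other matrix of rank at most $r$ can be compared against the truncated SVD on criterion (2.4).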

It turns out to be convenient to express these results in terms of what we term $J$-notation. Here the $p \times p$ matrix $J$ is defined by

$$J = \begin{pmatrix} I_r & 0 : r \times (p - r) \\ 0 : (p - r) \times r & 0 : (p - r) \times (p - r) \end{pmatrix}. \qquad (2.5)$$

Note that $J^{2} = J$ and $(I - J)^{2} = I - J$, and recall that diagonal matrices commute. With this notation we can write the above as $\Sigma_{[r]} = \Sigma J$ and

$$X_{[r]} = U \Sigma J V' = U J \Sigma V' = U J \Sigma J V'.$$
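The properties of $J$ and the equivalent expressions for the rank-$r$ approximation can be verified directly. This is a minimal NumPy sketch under the assumption that $X$ has full column rank, so that $k = p$ and all the products conform. The data, sizes and seed are illustrative only.

```python
import numpy as np

# Illustrative data only; a random n x p matrix with n > p has full
# column rank almost surely, so k = p in this sketch.
rng = np.random.default_rng(2)
n, p, r = 7, 4, 2
X = rng.standard_normal((n, p))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
Sigma = np.diag(s)

# J as in (2.5): identity in the leading r x r block, zeros elsewhere.
J = np.zeros((p, p))
J[:r, :r] = np.eye(r)
I = np.eye(p)

# J and I - J are idempotent, and J commutes with the diagonal Sigma.
print("J^2 = J:", np.allclose(J @ J, J))
print("(I - J)^2 = I - J:", np.allclose((I - J) @ (I - J), I - J))
print("Sigma J = J Sigma:", np.allclose(Sigma @ J, J @ Sigma))

# The three equivalent forms of the rank-r approximation.
form_a = U @ Sigma @ J @ V.T
form_b = U @ J @ Sigma @ V.T
form_c = U @ J @ Sigma @ J @ V.T
print("forms agree:", np.allclose(form_a, form_b) and np.allclose(form_a, form_c))
```

The three forms agree because $J$ and $\Sigma$ are both diagonal, so $J$ can be moved across $\Sigma$ or duplicated without changing the product.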

Of course, the final $p - r$ columns of $UJ$ and $VJ$ vanish, but the matrices $UJ$ and $VJ$ retain $p$ columns. In some instances, it is more convenient to use the notation $U_r$ and $V_r$ to denote the first $r$ columns of $U$ and $V$, respectively.
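The two notations are linked by a short identity. In the worked step below, $\Sigma_r$ is an assumed symbol, not used in the text above, for the leading $r \times r$ block of $\Sigma$. The step uses only the symmetry of $J$ and the fact that $\Sigma$ is diagonal.

```latex
UJ = \begin{pmatrix} U_r & 0 \end{pmatrix}, \qquad
VJ = \begin{pmatrix} V_r & 0 \end{pmatrix},
\qquad\text{so that}\qquad
X_{[r]} = UJ\,\Sigma\,JV' = (UJ)\,\Sigma\,(VJ)' = U_r \Sigma_r V_r' .
```

The zero columns contribute nothing to the product, so the $J$-notation and the $U_r$, $V_r$ notation describe the same matrix.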

In the biplot, we want to represent the approximated rows and columns of our data matrix $X$, that is, we want to represent the rows and columns of $X_{[r]}$. A standard result
