Generalized biplots - Understanding Biplots


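for the square root of the Manhattan distance,

$$ f_k(x_{ik}, x_{jk}) = -\tfrac{1}{2}\,\lvert x_{ik} - x_{jk} \rvert; \qquad (9.3) $$

and for Clark's distance,

$$ f_k(x_{ik}, x_{jk}) = -\tfrac{1}{2}\left(\frac{x_{ik} - x_{jk}}{x_{ik} + x_{jk}}\right)^2. \qquad (9.4) $$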
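The per-variable functions can be evaluated over all pairs of samples to give one matrix per variable. The sketch below assumes that each $f_k$ carries the factor $-\tfrac{1}{2}$ (the ddistance convention), and it assumes that the Pythagorean form (9.2), which is not shown in this excerpt, is $-\tfrac{1}{2}(x_{ik} - x_{jk})^2$. The function names are hypothetical, and Clark's distance is only defined here for positive values so that $x_{ik} + x_{jk} \neq 0$. The loop at the end rescales a variable by 10 to show which forms depend on the scaling.

```python
import numpy as np

# Hypothetical per-variable ddistance contributions f_k(x_ik, x_jk).
# Assumption: each returns -1/2 times the variable's contribution to d_ij^2.

def f_pythagorean(a, b):
    """Assumed form of (9.2): -1/2 (x_ik - x_jk)^2."""
    return -0.5 * (a - b) ** 2

def f_sqrt_manhattan(a, b):
    """(9.3): -1/2 |x_ik - x_jk|."""
    return -0.5 * np.abs(a - b)

def f_clark(a, b):
    """(9.4): -1/2 ((x_ik - x_jk) / (x_ik + x_jk))^2; assumes x_ik + x_jk != 0."""
    return -0.5 * ((a - b) / (a + b)) ** 2

def ddistance_matrix(x, f):
    """Apply f to every pair of values of one variable x (length n): an n x n matrix."""
    x = np.asarray(x, dtype=float)
    # Broadcasting a column against a row evaluates f for all (i, j) pairs.
    return f(x[:, None], x[None, :])

x = np.array([1.0, 2.0, 4.0])   # hypothetical positive values of one variable
for name, f in [("Pythagorean", f_pythagorean),
                ("sqrt Manhattan", f_sqrt_manhattan),
                ("Clark", f_clark)]:
    # Multiplying the variable by 10 changes the first two matrices but not Clark's.
    unchanged = np.allclose(ddistance_matrix(x, f), ddistance_matrix(10 * x, f))
    print(name, "unaffected by rescaling:", unchanged)
```

Only Clark's distance reports that it is unaffected, because the constant cancels in the ratio; this is why normalization matters for the other two forms.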

All these distances are defined in terms of differences between two values of the same variable, so there is no need to work in terms of deviations from the mean. However, (9.2) and (9.3) depend on the scaling used (Clark's distance (9.4), being a ratio, is unaffected when a variable is multiplied by a constant), which, as explained in Sections 2.5 and 3.6, makes it vital to use some form of normalization when combining measurements from variables measured on incommensurable scales. Gower (1992) and Gower and Hand (1996) consider scaling each variable to have unit sum of squares or unit range. In the following, we assume that all quantitative variables have been prescaled to correct for incommensurability. In our first example, we have scaled to unit range, with the scaled value $\tilde{x}_{ik}$ for $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, p$ given by

$$ \tilde{x}_{ik} = \frac{x_{ik}}{\max_i(x_{ik}) - \min_i(x_{ik})}. \qquad (9.5) $$

We then used Pythagorean distance on the scaled values.
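As a sketch of this first example, the code below divides each column by its range as in (9.5) and then forms the matrix of $-\tfrac{1}{2}d_{ij}^2$ values from Pythagorean distances between the scaled rows. The data and function names are hypothetical, and the sketch assumes no variable is constant, since a zero range would cause division by zero.

```python
import numpy as np

def scale_unit_range(X):
    """(9.5): divide each column by max_i(x_ik) - min_i(x_ik).
    Assumes no column is constant (a zero range would divide by zero)."""
    X = np.asarray(X, dtype=float)
    col_range = X.max(axis=0) - X.min(axis=0)
    return X / col_range

def pythagorean_ddistance(X):
    """Return the n x n matrix {-1/2 d_ij^2}, with d_ij the Pythagorean
    (Euclidean) distance between rows i and j of X."""
    diff = X[:, None, :] - X[None, :, :]      # n x n x p pairwise differences
    return -0.5 * (diff ** 2).sum(axis=2)

# Hypothetical data: two variables measured on very different scales.
X = np.array([[0.0, 10.0],
              [1.0, 30.0],
              [2.0, 20.0]])
Xs = scale_unit_range(X)
D = pythagorean_ddistance(Xs)
print(np.ptp(Xs, axis=0))   # → [1. 1.]
print(D)
```

Without the scaling step, the second variable, whose values span 20 units rather than 2, would dominate every distance.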

Because the distance is assumed to be additive, we may assume without loss of generality that the variables are ordered so that the first $p^{(1)}$ are continuous and the remaining $p^{(2)}$ are categorical, with $p^{(1)} + p^{(2)} = p$ and

$$ D = \sum_{k=1}^{p^{(1)}} D_k + \sum_{k=p^{(1)}+1}^{p} D_k = D^{(1)} + D^{(2)}, \qquad (9.6) $$

where $D_k = \{f_k(x_{ik}, x_{jk})\}$ is the ddistance matrix derived solely from the $k$th variable.
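The additive decomposition (9.6) can be checked numerically. The sketch below is illustrative only: it uses the Pythagorean contribution for every column, including the block standing in for the categorical variables, whereas the text derives $D^{(2)}$ differently. The data, the split $p^{(1)} = 2$, and the function names are hypothetical. It confirms that $D^{(1)} + D^{(2)}$ equals the matrix computed from all variables at once, and that permuting the variables leaves $D$ unchanged, which is what allows the continuous variables to be placed first.

```python
import numpy as np

def variable_ddistance(x):
    """D_k for one column x, using the Pythagorean contribution -1/2 (x_ik - x_jk)^2."""
    return -0.5 * (x[:, None] - x[None, :]) ** 2

def summed_ddistance(X, cols):
    """Sum D_k over the given column indices (the additive form of (9.6))."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for k in cols:
        D += variable_ddistance(X[:, k])
    return D

# Hypothetical data with p = 3 variables, split as p1 = 2 and p2 = 1.
X = np.array([[0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 2.0, 2.0]])
p1 = 2
D1 = summed_ddistance(X, range(p1))               # plays the role of D^(1)
D2 = summed_ddistance(X, range(p1, X.shape[1]))   # plays the role of D^(2)

# The same matrix computed directly from all p variables at once.
diff = X[:, None, :] - X[None, :, :]
D_direct = -0.5 * (diff ** 2).sum(axis=2)
print(np.allclose(D_direct, D1 + D2))   # True

# Reordering the variables does not change D.
D_perm = summed_ddistance(X, [2, 0, 1])
print(np.allclose(D_direct, D_perm))    # True
```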

The matrix $D^{(1)}$ is calculated as before, using an additive Euclidean embeddable distance measure on the $p^{(1)}$ continuous variables. If the $k$th variable is categorical, an indicator matrix (see Chapter 8) $G_k: n \times L_k$ is formed, with $L_k$ the number of category levels for this variable. Each row of $G_k$ represents a sample, such that

$$ g_{ih} = \begin{cases} 1 & \text{if the } i\text{th observation on variable } k \text{ falls into category level } h, \\ 0 & \text{otherwise.} \end{cases} $$
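A minimal sketch of forming $G_k$ from the raw category labels of one variable follows. The function name and the data are hypothetical, and ordering the levels alphabetically (as `np.unique` returns them) is an arbitrary choice; any fixed ordering gives an equivalent indicator matrix. The last lines place two indicator matrices side by side to give an $n \times L$ matrix.

```python
import numpy as np

def indicator_matrix(codes):
    """Build G_k (n x L_k): g_ih = 1 if observation i falls in level h, else 0.
    Levels are ordered as returned by np.unique (sorted), an arbitrary choice."""
    codes = np.asarray(codes)
    levels, level_index = np.unique(codes, return_inverse=True)
    G = np.zeros((codes.shape[0], levels.shape[0]), dtype=int)
    # Put a single 1 in each row, in the column of that observation's level.
    G[np.arange(codes.shape[0]), level_index] = 1
    return G, levels

G_k, levels = indicator_matrix(["b", "a", "c", "a"])   # hypothetical variable
print(levels)   # ['a' 'b' 'c']
print(G_k)

# A second hypothetical categorical variable; concatenating the indicator
# matrices column-wise gives an n x (L_1 + L_2) matrix.
G_m, _ = indicator_matrix([1, 1, 2, 2])
G = np.hstack([G_k, G_m])
print(G.shape)
```

Each row of every $G_k$ sums to 1, since each observation falls into exactly one level of each variable.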

To calculate $D^{(2)}$, the matrix

$$ G: n \times L = \begin{bmatrix} G_{p^{(1)}+1} & \cdots & G_p \end{bmatrix} $$

is formed, where $L = L_{p^{(1)}+1} + \cdots + L_p$. If the $L$ columns of $G$ are, or are viewed as, dichotomous variables, an obvious approach would be to derive $D^{(2)}$ as a matrix of dissimilarities. However, coding multilevel categories as a series of dichotomous variables leads to a situation where the number of negative matches (0-0) dominates the number
