of matches (1 - 1), which occurs only when both samples have the same category level.
Furthermore, the number of matches (1 - 1) is the same as the number of agreements
between the variables, but the number of mismatches (1 - 0 or 0 - 1) is twice the number of
disagreements between the category levels of the variables. To overcome this deficiency,
Gower (1992) suggests an extension of the Jaccard coefficient, the extended matching
coefficient (EMC), which is Euclidean embeddable, defined as

$$f_k(x_{ik}, x_{jk}) = -\frac{1}{2}\left(1 - \sum_{h=1}^{L_k} I\left(g_{ih}^{(k)} = g_{jh}^{(k)}\right)\right), \qquad (9.7)$$

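where $I(a = b)$ is the indicator function, which is unity when $a = b = 1$ and zero
otherwise, and $g_{ih}^{(k)}$ is the $(i, h)$th element of the $n \times L_k$ indicator matrix
$\mathbf{G}_k$ of the $k$th categorical variable. Since the $ij$th element of $\mathbf{D}_k$ is
$f_k(x_{ik}, x_{jk})$, it follows that

$$\mathbf{D}_k = -\frac{1}{2}\left(\mathbf{1}_n\mathbf{1}_n' - \mathbf{G}_k\mathbf{G}_k'\right),$$

and from $\mathbf{D}^{(2)} = \sum_{k=p^{(1)}+1}^{p} \mathbf{D}_k$ the matrix $\mathbf{D}^{(2)}$ is obtained as

$$\mathbf{D}^{(2)} = -\frac{1}{2}\left(p^{(2)}\mathbf{1}_n\mathbf{1}_n' - \mathbf{G}\mathbf{G}'\right), \qquad (9.8)$$

where $\mathbf{G} = [\mathbf{G}_{p^{(1)}+1}, \ldots, \mathbf{G}_p]$ and $p^{(2)} = p - p^{(1)}$ is the number of
categorical variables.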
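To check that (9.7) and (9.8) agree, the sketch below builds indicator matrices for two categorical variables, computes $\mathbf{D}^{(2)}$ in matrix form and compares it with the element-wise sum of the EMC terms. It also shows the deficiency mentioned above: a single disagreement in category level produces two mismatching indicator columns. The helper names (`indicator_matrix`, `emc_term`, `emc_matrix`) and the data are hypothetical, chosen only for illustration, and levels are assumed to be coded 0, ..., L_k - 1.

```python
import numpy as np

def indicator_matrix(codes, n_levels):
    """n x L_k indicator matrix G_k: row i has a single 1 in the column of
    the category level taken by sample i (levels coded 0, ..., L_k - 1)."""
    codes = np.asarray(codes)
    G = np.zeros((codes.size, n_levels))
    G[np.arange(codes.size), codes] = 1.0
    return G

def emc_term(g_i, g_j):
    """f_k for one pair of samples, as in (9.7)."""
    matches = np.sum((g_i == 1) & (g_j == 1))   # sum_h I(g_ih = g_jh = 1)
    return -0.5 * (1.0 - matches)

def emc_matrix(G_list):
    """D^(2) = -1/2 (p2 * 1 1' - G G'), as in (9.8)."""
    G = np.hstack(G_list)              # G = [G_k] over the categorical variables
    n, p2 = G.shape[0], len(G_list)
    return -0.5 * (p2 * np.ones((n, n)) - G @ G.T)

# Hypothetical data: two categorical variables observed on four samples.
eye = indicator_matrix([0, 1, 2, 0], 3)    # three levels
hair = indicator_matrix([0, 0, 1, 1], 2)   # two levels
D2 = emc_matrix([eye, hair])

# Element-wise version of (9.7), summed over the categorical variables.
D2_loop = np.array([[sum(emc_term(G[i], G[j]) for G in (eye, hair))
                     for j in range(4)] for i in range(4)])
assert np.allclose(D2, D2_loop)

# One disagreement in eye colour (samples 0 and 1) appears as two
# mismatching indicator columns: one 1-0 and one 0-1.
print(int(np.sum(eye[0] != eye[1])))   # 2
print(D2[0, 1])                        # -0.5
```

The matrix form avoids the double loop entirely: all pairwise match counts come from the single product $\mathbf{G}\mathbf{G}'$.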
Since the EMC contributes zero to the overall similarity of the two samples if the
$k$th categorical variable mismatches and unity if it matches, $f_k(x_{ik}, x_{jk})$ can assume a
maximum value of 1. This ensures that the contributions from the EMC can be combined
commensurably with the contributions from the quantitative variables, for example as
in (9.5).
The matrix D in (9.6) can be plugged into any multidimensional scaling method
to obtain a map of the sample points. However, biplot axes and prediction regions are
most readily constructed when using principal coordinates/classical scaling as we did for
nonlinear biplots.
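Principal coordinates analysis (classical scaling) of a matrix with elements $-\frac{1}{2}d_{ij}^2$ can be sketched in a few lines: double-centre the matrix, take its eigendecomposition and scale the eigenvectors by the square roots of the positive eigenvalues. The function names `principal_coordinates` and `is_euclidean_embeddable` are hypothetical. The embeddability check uses the standard fact that the distances are Euclidean embeddable exactly when the double-centred matrix has no negative eigenvalues; the tolerance `tol` is an assumption to absorb rounding error.

```python
import numpy as np

def principal_coordinates(D, tol=1e-10):
    """Classical scaling of D whose elements are -1/2 d_ij^2.

    Returns coordinates Y (one column per positive eigenvalue) and all
    eigenvalues of the double-centred matrix, largest first."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centring matrix I - 11'/n
    B = J @ D @ J                            # centred inner-product matrix
    evals, evecs = np.linalg.eigh(B)         # B is symmetric
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > tol
    Y = evecs[:, keep] * np.sqrt(evals[keep])
    return Y, evals

def is_euclidean_embeddable(D, tol=1e-10):
    """True when the double-centred matrix has no materially negative eigenvalue."""
    _, evals = principal_coordinates(D, tol)
    return bool(evals.min() > -tol)

# Distances 1, 1 and 3 violate the triangle inequality, so no
# Euclidean configuration can reproduce them.
d = np.array([[0, 1, 1], [1, 0, 3], [1, 3, 0]], dtype=float)
print(is_euclidean_embeddable(-0.5 * d ** 2))   # False
```

For a Euclidean embeddable matrix the rows of `Y` reproduce every $d_{ij}$ exactly, which is what makes the biplot axes and prediction regions straightforward to construct on top of a PCO map.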
9.3 Constructing a generalized biplot
A PCO of a Euclidean embeddable ddistance matrix $\mathbf{D}$ gives a map $\mathbf{Y}$ of the $n$ samples
that reproduces the ddistances exactly in at most $n - 1$ dimensions. In Figure 9.2 the map is
based on Pythagorean distance for the heights and the EMC for the eye colours. This
representation is exact, and approximation is not an issue, since four samples can always be
represented exactly in three dimensions.
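The exactness claim can be checked numerically. The sketch below uses four hypothetical samples (illustrative heights and eye colours, not the data behind Figure 9.2), stores the Pythagorean contribution of the heights and the EMC contribution of the eye colours as $-\frac{1}{2}d^2$ matrices, and simply adds them; treating the combined matrix as this sum is an assumption of the sketch. A PCO of the result then reproduces every distance in at most three dimensions.

```python
import numpy as np

def pco(D, tol=1e-10):
    """Classical scaling of D = {-1/2 d_ij^2}: coordinates and eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(J @ D @ J)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > tol
    return evecs[:, keep] * np.sqrt(evals[keep]), evals

# Hypothetical heights (m) and eye colours for four samples.
height = np.array([1.60, 1.75, 1.82, 1.68])
eye = np.array(["blue", "brown", "green", "brown"])

# Quantitative part: Pythagorean distance, stored as -1/2 d^2.
D1 = -0.5 * (height[:, None] - height[None, :]) ** 2
# Categorical part: EMC, -1/2 (1 - I(same level)).
same = (eye[:, None] == eye[None, :]).astype(float)
D2 = -0.5 * (1.0 - same)
D = D1 + D2                      # assumed combination of the two parts

Y, evals = pco(D)
sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
print(Y.shape[1] <= 3)           # True: at most n - 1 = 3 dimensions
print(np.allclose(sq, -2.0 * D)) # True: the map reproduces every distance
```

With more samples than dimensions needed, the same code would keep all positive eigenvalues; a biplot would then display only the leading two columns of `Y` as an approximation.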
9.4 Reference system
As explained above, the reference system in generalized biplots consists of the usual
biplot trajectories representing the continuous variables, and a set of category level points
(CLPs) with accompanying nearest-neighbour and prediction regions defined for each
categorical variable.
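The nearest-neighbour idea behind these regions can be sketched as follows: a point is assigned to the category level whose CLP lies closest to it. The CLP coordinates below are hypothetical two-dimensional values, and `nearest_clp` is an illustrative helper. The sketch does not reproduce how CLPs are computed in a generalized biplot, nor how the prediction regions are constructed; it only applies the nearest-CLP rule to points in a displayed map.

```python
import numpy as np

def nearest_clp(points, clps, labels):
    """Predict a category level for each point as the level whose CLP is
    nearest in Euclidean distance (the nearest-neighbour rule)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    d2 = ((points[:, None, :] - clps[None, :, :]) ** 2).sum(axis=2)
    return [labels[i] for i in d2.argmin(axis=1)]

# Hypothetical two-dimensional CLP coordinates for one categorical variable.
clps = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
labels = ["blue", "brown", "green"]

print(nearest_clp([[-0.9, 0.1], [0.8, -0.3], [0.0, 2.0]], clps, labels))
# ['blue', 'brown', 'green']

# Crude picture of the nearest-neighbour regions: label a grid by first letter.
for y in np.linspace(2.0, -1.0, 7):
    row = nearest_clp([[x, y] for x in np.linspace(-2.0, 2.0, 17)], clps, labels)
    print("".join(label[0] for label in row))
```

The regions produced by this rule are the Voronoi cells of the CLPs; on exact ties `argmin` returns the first level in `labels`.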
If all the variables were continuous, the biplot trajectories could be obtained, similarly
to the nonlinear biplot trajectories, by interpolating 'new samples' $\tau\mathbf{e}_k$, $k = 1, 2, \ldots, p$,
with $-\infty < \tau < \infty$ or, over the observed range of the $k$th variable,
$\min_{h=1,\ldots,n}(x_{hk}) \le \tau \le \max_{h=1,\ldots,n}(x_{hk})$.
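A minimal sketch of this interpolation for continuous variables with Pythagorean distance is given below. It assumes centred variables, so that a new sample $\tau\mathbf{e}_k$ holds every other variable at its mean and $\tau$ runs over the centred range of variable $k$. It maps each new sample with the usual PCO formula for adding a point, $\mathbf{y} = \frac{1}{2}\boldsymbol{\Lambda}^{-1}\mathbf{Y}'(\mathbf{b} - \mathbf{d})$, where $b_i = \lVert\mathbf{y}_i\rVert^2$, $d_i$ is the squared distance from the new sample to sample $i$, and $\mathbf{Y}'\mathbf{Y} = \boldsymbol{\Lambda}$. The data and helper names are hypothetical.

```python
import numpy as np

def pco(D, tol=1e-10):
    """Principal coordinates of D = {-1/2 d_ij^2}: coordinates and eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(J @ D @ J)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > tol
    return evecs[:, keep] * np.sqrt(evals[keep]), evals[keep]

def interpolate(Y, lam, d2_new):
    """Map a new sample from its squared distances d2_new to the n samples:
    y = 1/2 Lambda^{-1} Y' (b - d2_new), with b_i = ||y_i||^2 and Y'Y = Lambda."""
    b = np.sum(Y ** 2, axis=1)
    return 0.5 * (Y.T @ (b - d2_new)) / lam

# Hypothetical continuous data: six samples on three variables, centred.
X = np.array([[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1], [6.7, 3.1, 4.4], [5.5, 2.4, 3.8]])
Xc = X - X.mean(axis=0)
D = -0.5 * ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
Y, lam = pco(D)

# Trajectory for variable k: new samples tau * e_k over the observed range.
k = 0
e_k = np.eye(Xc.shape[1])[k]
taus = np.linspace(Xc[:, k].min(), Xc[:, k].max(), 5)
trajectory = np.array([
    interpolate(Y, lam, ((Xc - tau * e_k) ** 2).sum(axis=1)) for tau in taus
])
print(np.linalg.matrix_rank(trajectory, tol=1e-8))  # 1: a straight line through the origin
```

With Pythagorean distance the interpolated points lie on a straight line, which recovers the familiar linear biplot axis; a nonlinear ddistance in place of `D` would generally bend this trajectory into a curve.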