8.4.2.4
Distance Metrics for Mixed-Type Attributes
When instances are characterized by attributes of mixed types, one may
calculate the distance by combining the methods mentioned above. For
instance, when calculating the distance between instances i and j using a
metric such as the Euclidean distance, one may take the difference between
nominal and binary attributes as 0 or 1 (“match” or “mismatch”,
respectively), and the difference between numeric attributes as the
difference between their normalized values. The square of each such
difference is then added to the total distance. Such a calculation is
employed in many clustering algorithms presented below.
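
The calculation described above can be sketched in Python as follows. This is a minimal illustration only, not the implementation used by any particular clustering algorithm. The function name `mixed_euclidean`, the `kinds` list, and the sample instances are hypothetical, and the sketch assumes that numeric values have already been normalized (for example to [0, 1]).

```python
import math

def mixed_euclidean(a, b, kinds):
    """Euclidean-style distance between two mixed-type instances.

    kinds[k] is "nominal" (also used for binary) or "numeric".
    Numeric values are assumed to be normalized already.
    """
    total = 0.0
    for x, y, kind in zip(a, b, kinds):
        if kind == "nominal":
            diff = 0.0 if x == y else 1.0   # match -> 0, mismatch -> 1
        else:
            diff = x - y                    # difference of normalized values
        total += diff ** 2                  # add the squared difference
    return math.sqrt(total)

# Hypothetical instances: (colour, is_member, normalized age)
i = ("red", True, 0.25)
j = ("blue", True, 0.75)
kinds = ["nominal", "nominal", "numeric"]
print(mixed_euclidean(i, j, kinds))
```

Here the colour mismatch contributes 1, the matching binary attribute contributes 0, and the numeric attribute contributes the square of 0.5, so the result is the square root of 1.25.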
The dissimilarity $d(x_i, x_j)$ between two instances, containing p
attributes of mixed types, is defined as:

\[
d(x_i, x_j) = \frac{\sum_{n=1}^{p} \delta_{ij}^{(n)} \, d_{ij}^{(n)}}{\sum_{n=1}^{p} \delta_{ij}^{(n)}},
\]
where the indicator $\delta_{ij}^{(n)} = 0$ if one of the values is missing,
and 1 otherwise. The contribution of attribute n to the distance between
the two objects, $d^{(n)}(x_i, x_j)$, is computed according to its type:
If the attribute is binary or categorical, $d^{(n)}(x_i, x_j) = 0$ if
$x_{in} = x_{jn}$; otherwise $d^{(n)}(x_i, x_j) = 1$.
If the attribute is continuous-valued,
$d_{ij}^{(n)} = \frac{|x_{in} - x_{jn}|}{\max_h x_{hn} - \min_h x_{hn}}$,
where h runs over all non-missing objects for attribute n.
If the attribute is ordinal, the standardized values of the attribute are
computed first, and then $z_{i,n}$ is treated as continuous-valued.
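
The dissimilarity defined above can be sketched in Python as follows. This is an illustrative sketch under several assumptions that are not part of the text: missing values are represented by `None`, the per-attribute ranges are supplied by the caller, and ordinal values are standardized by mapping rank r out of M levels to (r - 1) / (M - 1), which is one common choice since the text does not spell out the standardization. All function and variable names are hypothetical.

```python
def ordinal_to_unit(rank, n_levels):
    """Map an ordinal rank 1..n_levels onto [0, 1] (assumed standardization)."""
    if n_levels == 1:
        return 0.0
    return (rank - 1) / (n_levels - 1)

def mixed_dissimilarity(xi, xj, kinds, ranges):
    """Average of per-attribute contributions over attributes present in both.

    kinds[n]  : "categorical" (also binary) or "continuous"
    ranges[n] : max_h x_hn - min_h x_hn for continuous attributes, else None
    """
    num = 0.0
    den = 0.0
    for a, b, kind, rng in zip(xi, xj, kinds, ranges):
        if a is None or b is None:
            continue                       # indicator is 0: attribute skipped
        if kind == "categorical":
            d = 0.0 if a == b else 1.0     # match -> 0, mismatch -> 1
        else:
            # a zero range means the attribute is constant, so it contributes 0
            d = abs(a - b) / rng if rng else 0.0
        num += d                           # indicator is 1
        den += 1.0
    if den == 0:
        raise ValueError("no attribute is observed in both instances")
    return num / den

# Hypothetical data: (colour, age, income, ordinal grade already standardized)
kinds = ["categorical", "continuous", "continuous", "continuous"]
ranges = [None, 50.0, 10.0, 1.0]
xi = ("red", 30.0, None, ordinal_to_unit(1, 3))
xj = ("blue", 40.0, 5.0, ordinal_to_unit(3, 3))
print(mixed_dissimilarity(xi, xj, kinds, ranges))
```

In this example the income attribute is missing for one instance and is therefore left out of both sums; the remaining contributions are 1, 0.2 and 1, averaged over three attributes.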
8.4.3
Similarity Functions
An alternative concept to that of the distance is the similarity function
$s(x_i, x_j)$ that compares the two vectors $x_i$ and $x_j$. This function
should be symmetric (namely $s(x_i, x_j) = s(x_j, x_i)$), have a large value
when $x_i$ and $x_j$ are somehow “similar”, and attain its largest value for
identical vectors.
A similarity function where the target range is [0,1] is called a
dichotomous similarity function. In fact, the measures described in the
previous sections for calculating the “distances” in the case of binary
and nominal attributes may be easily converted to similarity functions,
by subtracting the distance measure from 1.
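
The conversion from a distance to a similarity can be sketched in Python as follows. The mismatch-fraction distance used here is only a stand-in for the binary and nominal measures of the earlier sections, chosen because its values lie in [0, 1]; the function names and sample vectors are hypothetical.

```python
def mismatch_distance(a, b):
    """Fraction of attributes on which two nominal/binary vectors disagree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def similarity(a, b):
    """Similarity in [0, 1], obtained by subtracting the distance from 1."""
    return 1.0 - mismatch_distance(a, b)

u = (1, 0, 1, 1)
v = (1, 1, 1, 0)
print(similarity(u, v))   # two of four attributes differ
print(similarity(u, u))   # identical vectors give the largest value
```

The resulting function is symmetric and reaches its maximum of 1 for identical vectors, as required of a similarity function.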