if both of its states are equally valuable. In that case, the simple
matching coefficient can be used to assess the dissimilarity between two objects:
$$d(x_i, x_j) = \frac{r + s}{q + r + s + t},$$
where q is the number of attributes that equal 1 for both objects; t is the
number of attributes that equal 0 for both objects; and r and s count the
attributes on which the two objects differ (one count for each direction of
mismatch, 1 versus 0 and 0 versus 1).
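The counts q, r, s and t can be tallied directly from two binary vectors. The following is a minimal sketch, assuming each object is a Python list of 0/1 values of equal length; the function names and the sample vectors are illustrative and do not come from the text.

```python
def binary_counts(x, y):
    """Return (q, r, s, t) for two equal-length sequences of 0/1 values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must have the same, non-zero number of attributes")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both equal 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # mismatch: 1 versus 0
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # mismatch: 0 versus 1
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both equal 0
    return q, r, s, t


def simple_matching_dissimilarity(x, y):
    """Simple matching dissimilarity: (r + s) / (q + r + s + t)."""
    q, r, s, t = binary_counts(x, y)
    return (r + s) / (q + r + s + t)


# Illustrative objects with six symmetric binary attributes.
x_i = [1, 0, 1, 1, 0, 0]
x_j = [1, 1, 0, 1, 0, 0]

# Two mismatches out of six attributes.
print(simple_matching_dissimilarity(x_i, x_j))
```

Because the denominator is simply the total number of attributes, the measure treats a shared 0 and a shared 1 as equally informative.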
A binary attribute is asymmetric if its states are not equally important
(usually the positive outcome is considered more important). In this case,
the denominator ignores the unimportant negative matches (t). This is
called the Jaccard coefficient:
$$d(x_i, x_j) = \frac{r + s}{q + r + s}.$$
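As a rough illustration of how dropping the negative matches changes the result, the sketch below computes the Jaccard dissimilarity for two 0/1 lists. The function name and the sample vectors are illustrative. Returning 0.0 when the two objects share no positive value and have no mismatches is an assumption made here to avoid division by zero; the text does not address that case.

```python
def jaccard_dissimilarity(x, y):
    """Jaccard dissimilarity for asymmetric binary attributes: (r + s) / (q + r + s)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # positive matches
    mismatches = sum(1 for a, b in zip(x, y) if a != b)    # r + s
    denominator = q + mismatches                           # negative matches (t) are ignored
    if denominator == 0:
        # Assumption: two all-zero objects are treated as identical.
        return 0.0
    return mismatches / denominator


# Illustrative objects: q = 2, r + s = 2, t = 2.
x_i = [1, 0, 1, 1, 0, 0]
x_j = [1, 1, 0, 1, 0, 0]

print(jaccard_dissimilarity(x_i, x_j))  # 2 / (2 + 2) = 0.5
```

For the same pair, the simple matching coefficient would give 2/6, so ignoring the two shared zeros makes the objects appear less similar.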
8.4.2.2 Distance Measures for Nominal Attributes
When the attributes are nominal, two main approaches may be used:
(1) Simple matching:
$$d(x_i, x_j) = \frac{p - m}{p},$$
where p is the total number of attributes and m is the number of
matches.
(2) Creating a binary attribute for each state of each nominal attribute
and computing their dissimilarity as described above.
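Both approaches can be sketched in a few lines. The example below assumes each object is a list of nominal values and that the domain of every attribute is known in advance; the attribute domains, the sample objects, and the function names are hypothetical and serve only as illustration.

```python
def nominal_dissimilarity(x, y):
    """Approach (1), simple matching: (p - m) / p."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(x)                                     # total number of attributes
    m = sum(1 for a, b in zip(x, y) if a == b)     # number of matches
    return (p - m) / p


def one_hot(obj, domains):
    """Approach (2): one binary attribute per state of each nominal attribute."""
    encoded = []
    for value, domain in zip(obj, domains):
        if value not in domain:
            raise ValueError(f"{value!r} is not in the domain {domain}")
        encoded.extend(1 if value == state else 0 for state in domain)
    return encoded


# Hypothetical domains for three nominal attributes.
domains = [["red", "blue"], ["small", "large"], ["round", "square"]]
x_i = ["red", "small", "round"]
x_j = ["red", "large", "round"]

print(nominal_dissimilarity(x_i, x_j))  # one mismatch out of three attributes
print(one_hot(x_i, domains))            # [1, 0, 1, 0, 1, 0]
print(one_hot(x_j, domains))            # [1, 0, 0, 1, 1, 0]
```

The encoded vectors can then be compared with either binary measure described earlier. Since a one-hot encoding produces many shared zeros, the choice between the symmetric and asymmetric measure affects the result.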
8.4.2.3 Distance Metrics for Ordinal Attributes
When the attributes are ordinal, the order of the values is meaningful.
In such cases, the attributes can be treated as numeric ones after mapping
their range onto [0,1]. Such a mapping may be carried out as follows:
$$z_{i,n} = \frac{r_{i,n} - 1}{M_n - 1},$$
where $z_{i,n}$ is the standardized value of attribute $a_n$ of object $i$, $r_{i,n}$ is that
value before standardization, and $M_n$ is the upper limit of the domain of
attribute $a_n$ (assuming the lower limit is 1).
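The mapping can be illustrated with a short sketch. It assumes the ordinal values have already been converted to integer ranks 1, ..., M; the ordered level names and the function name are hypothetical. A domain with a single level (M = 1) makes the denominator zero, so the sketch rejects it.

```python
def standardize_rank(rank, upper):
    """Map an ordinal rank in 1..upper onto [0, 1]: (rank - 1) / (upper - 1)."""
    if upper < 2:
        raise ValueError("the domain must contain at least two levels")
    if not 1 <= rank <= upper:
        raise ValueError("rank must lie between 1 and the upper limit")
    return (rank - 1) / (upper - 1)


# Hypothetical ordinal attribute with three ordered levels.
levels = ["low", "medium", "high"]
ranks = {level: position + 1 for position, level in enumerate(levels)}  # low=1, ..., high=3

standardized = {level: standardize_rank(ranks[level], len(levels)) for level in levels}
print(standardized)  # {'low': 0.0, 'medium': 0.5, 'high': 1.0}
```

After this step, the standardized values can be fed to any distance measure for numeric attributes.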