if both of its states are equally valuable. In that case, the simple
matching coefficient can be used to assess the dissimilarity between two objects:
$$d(x_i, x_j) = \frac{r + s}{q + r + s + t},$$
where $q$ is the number of attributes that equal 1 for both objects; $t$ is the
number of attributes that equal 0 for both objects; and $r$ and $s$ are the
numbers of attributes on which the two objects differ ($r$ counts attributes
that equal 1 for $x_i$ and 0 for $x_j$, and $s$ counts the reverse).
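The counts and the symmetric dissimilarity can be sketched in a few lines of Python. This is a minimal illustration, assuming each object is a list of 0/1 values of equal length and following the usual convention that r counts 1-in-x_i/0-in-x_j and s the reverse; the function names binary_contingency and simple_matching_distance are hypothetical, not from the text.

```python
from typing import Sequence, Tuple


def binary_contingency(x_i: Sequence[int], x_j: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (q, r, s, t) for two objects described by 0/1 attributes."""
    if len(x_i) != len(x_j) or len(x_i) == 0:
        raise ValueError("objects need the same, non-zero number of attributes")
    q = r = s = t = 0
    for a, b in zip(x_i, x_j):
        if a == 1 and b == 1:
            q += 1  # positive match
        elif a == 1 and b == 0:
            r += 1  # 1 in x_i, 0 in x_j
        elif a == 0 and b == 1:
            s += 1  # 0 in x_i, 1 in x_j
        elif a == 0 and b == 0:
            t += 1  # negative match
        else:
            raise ValueError("binary attributes must be 0 or 1")
    return q, r, s, t


def simple_matching_distance(x_i: Sequence[int], x_j: Sequence[int]) -> float:
    """Symmetric binary dissimilarity d = (r + s) / (q + r + s + t)."""
    q, r, s, t = binary_contingency(x_i, x_j)
    return (r + s) / (q + r + s + t)


x_i = [1, 0, 1, 1, 0]
x_j = [1, 1, 0, 1, 0]
print(binary_contingency(x_i, x_j))        # (2, 1, 1, 1)
print(simple_matching_distance(x_i, x_j))  # 0.4
```

Here two of the five attributes disagree, so d = 2/5; the shared 0 in the last attribute counts as agreement because both states are treated as equally informative.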
A binary attribute is asymmetric if its states are not equally important
(usually the positive outcome is considered more important). In this case,
the denominator ignores the unimportant negative matches ($t$). The resulting
dissimilarity is based on the Jaccard coefficient (it equals one minus the
Jaccard similarity $q/(q + r + s)$):
$$d(x_i, x_j) = \frac{r + s}{q + r + s}.$$
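A minimal Python sketch of the asymmetric (Jaccard-based) dissimilarity follows, assuming 0/1 lists of equal length. The text does not say what to return when neither object has any positive value (q + r + s = 0); returning 0.0 in that case is an assumption of this sketch, and the name jaccard_distance is hypothetical.

```python
from typing import Sequence


def jaccard_distance(x_i: Sequence[int], x_j: Sequence[int]) -> float:
    """Asymmetric binary dissimilarity d = (r + s) / (q + r + s).

    Negative matches (both attributes equal to 0, counted as t in the
    symmetric case) are deliberately left out of the denominator.
    """
    if len(x_i) != len(x_j):
        raise ValueError("objects need the same number of attributes")
    q = r = s = 0
    for a, b in zip(x_i, x_j):
        if a == 1 and b == 1:
            q += 1  # positive match
        elif a == 1 and b == 0:
            r += 1  # 1 in x_i only
        elif a == 0 and b == 1:
            s += 1  # 1 in x_j only
        elif a != 0 or b != 0:
            raise ValueError("binary attributes must be 0 or 1")
        # a == b == 0 is a negative match and is simply not counted
    if q + r + s == 0:
        # Assumption: two objects with no positive values are treated as identical.
        return 0.0
    return (r + s) / (q + r + s)


x_i = [1, 0, 1, 1, 0]
x_j = [1, 1, 0, 1, 0]
print(jaccard_distance(x_i, x_j))  # 0.5
```

For this pair the simple matching coefficient gives 2/5 = 0.4, while dropping the negative match in the last attribute raises the dissimilarity to 2/4 = 0.5.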
8.4.2.2 Distance Measures for Nominal Attributes
When the attributes are nominal, two main approaches may be used:
(1) Simple matching:
$$d(x_i, x_j) = \frac{p - m}{p},$$
where $p$ is the total number of attributes and $m$ is the number of
matches.
(2) Creating a binary attribute for each state of each nominal attribute
and computing the dissimilarity between the objects over these binary
attributes, as described above.
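Both approaches can be sketched in Python as below. This is an illustration under stated assumptions: objects are lists of string labels, each attribute's possible states are supplied in a hypothetical domains list, and the helper names nominal_simple_matching, one_hot and binary_dissimilarity are invented for the example.

```python
from typing import List, Sequence


def nominal_simple_matching(x_i: Sequence[str], x_j: Sequence[str]) -> float:
    """Approach (1): d = (p - m) / p, with m the number of matching attributes."""
    if len(x_i) != len(x_j) or len(x_i) == 0:
        raise ValueError("objects need the same, non-zero number of attributes")
    p = len(x_i)
    m = sum(1 for a, b in zip(x_i, x_j) if a == b)
    return (p - m) / p


def one_hot(x: Sequence[str], domains: Sequence[Sequence[str]]) -> List[int]:
    """Approach (2): one 0/1 attribute per state of each nominal attribute."""
    if len(x) != len(domains):
        raise ValueError("need one domain per attribute")
    bits: List[int] = []
    for value, states in zip(x, domains):
        if value not in states:
            raise ValueError(f"{value!r} is not a known state")
        bits.extend(1 if value == state else 0 for state in states)
    return bits


def binary_dissimilarity(b_i: Sequence[int], b_j: Sequence[int], symmetric: bool = True) -> float:
    """(r + s) / (q + r + s + t) if symmetric, else (r + s) / (q + r + s)."""
    q = sum(1 for a, b in zip(b_i, b_j) if a == 1 and b == 1)
    t = sum(1 for a, b in zip(b_i, b_j) if a == 0 and b == 0)
    mismatches = sum(1 for a, b in zip(b_i, b_j) if a != b)  # r + s
    denominator = q + mismatches + (t if symmetric else 0)
    return mismatches / denominator if denominator else 0.0


domains = [["red", "green", "blue"], ["small", "large"]]
x_i, x_j = ["red", "small"], ["blue", "small"]
print(nominal_simple_matching(x_i, x_j))  # 0.5
b_i, b_j = one_hot(x_i, domains), one_hot(x_j, domains)
print(b_i, b_j)                           # [1, 0, 0, 1, 0] [0, 0, 1, 1, 0]
print(binary_dissimilarity(b_i, b_j))     # 0.4
```

The two approaches need not agree: one mismatched colour becomes two binary mismatches, and the unselected states add negative matches, so the symmetric binary measure gives 0.4 and the asymmetric one (symmetric=False) gives 2/3, against 0.5 for simple matching.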
8.4.2.3 Distance Metrics for Ordinal Attributes
When the attributes are ordinal, the sequence of the values is meaningful.
In such cases, the attributes can be treated as numeric ones after mapping
their range onto [0,1]. Such a mapping may be carried out as follows:
$$z_{i,n} = \frac{r_{i,n} - 1}{M_n - 1},$$
where $z_{i,n}$ is the standardized value of attribute $a_n$ of object $i$,
$r_{i,n}$ is that value before standardization, and $M_n$ is the upper limit
of the domain of attribute $a_n$ (assuming the lower limit is 1).
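The mapping can be sketched in Python as follows, assuming the ordered states of an attribute are given as a list from lowest to highest so that a label's rank is its 1-based position. The names standardize_ordinal and ordinal_to_unit_interval are hypothetical, and rejecting domains with fewer than two states (where M_n - 1 would be zero) is a choice made for this sketch.

```python
from typing import List, Sequence


def standardize_ordinal(rank: int, upper: int) -> float:
    """Map a rank r in 1..M onto [0, 1] via z = (r - 1) / (M - 1)."""
    if upper < 2:
        raise ValueError("the domain needs at least two ordered states")
    if not 1 <= rank <= upper:
        raise ValueError("rank must lie between 1 and the upper limit")
    return (rank - 1) / (upper - 1)


def ordinal_to_unit_interval(values: Sequence[str], levels: Sequence[str]) -> List[float]:
    """Replace each label by its 1-based rank in levels, then standardize."""
    upper = len(levels)  # M_n: the highest rank
    return [standardize_ordinal(levels.index(v) + 1, upper) for v in values]


levels = ["low", "medium", "high"]
print(ordinal_to_unit_interval(["low", "high", "medium"], levels))  # [0.0, 1.0, 0.5]
```

Once mapped, the values can be fed to any distance measure for numeric attributes.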