dissimilarity between i and j is

$$d(i, j) = \frac{r + s}{q + r + s + t}. \tag{2.13}$$
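As a minimal sketch, the following Python code counts q, r, s, and t for two objects given as equal-length 0/1 vectors and then evaluates Eq. (2.13). It assumes the contingency-table convention used with these equations: q counts attributes equal to 1 for both objects, r those equal to 1 for i and 0 for j, s those equal to 0 for i and 1 for j, and t those equal to 0 for both. The function names `contingency_counts` and `symmetric_binary_dissimilarity` are hypothetical.

```python
from typing import Sequence, Tuple


def contingency_counts(i: Sequence[int], j: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (q, r, s, t) for two objects whose attributes are coded as 0/1."""
    if len(i) != len(j) or len(i) == 0:
        raise ValueError("objects need the same, nonzero number of attributes")
    pairs = list(zip(i, j))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)  # 1 for both i and j
    r = sum(1 for a, b in pairs if a == 1 and b == 0)  # 1 for i, 0 for j
    s = sum(1 for a, b in pairs if a == 0 and b == 1)  # 0 for i, 1 for j
    t = sum(1 for a, b in pairs if a == 0 and b == 0)  # 0 for both i and j
    return q, r, s, t


def symmetric_binary_dissimilarity(i: Sequence[int], j: Sequence[int]) -> float:
    """Eq. (2.13): mismatches divided by the total number of attributes."""
    q, r, s, t = contingency_counts(i, j)
    return (r + s) / (q + r + s + t)


# One positive match, two mismatches, one negative match: d = 2 / 4.
print(symmetric_binary_dissimilarity([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.5
```

Because both states carry equal weight here, a negative match (t) counts as agreement just as a positive match (q) does.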
For asymmetric binary attributes, the two states are not equally important, such as the positive (1) and negative (0) outcomes of a disease test. When two objects are compared on asymmetric binary attributes, agreement on 1s (a positive match) is therefore considered more significant than agreement on 0s (a negative match). For this reason, such binary attributes are often considered "monary" (having one state). The dissimilarity based on these attributes is called asymmetric binary dissimilarity, where the number of negative matches, t, is considered unimportant and is thus ignored in the following computation:
$$d(i, j) = \frac{r + s}{q + r + s}. \tag{2.14}$$
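A minimal Python sketch of Eq. (2.14), assuming the same 0/1 coding and the same meaning of q, r, and s as in Eq. (2.13). The function name `asymmetric_binary_dissimilarity` is hypothetical, and raising an error for two all-zero objects (where the ratio would be 0/0) is an assumed policy, not something the text prescribes.

```python
from typing import Sequence


def asymmetric_binary_dissimilarity(i: Sequence[int], j: Sequence[int]) -> float:
    """Eq. (2.14): (r + s) / (q + r + s); negative matches (t) are ignored."""
    if len(i) != len(j):
        raise ValueError("objects need the same number of attributes")
    pairs = list(zip(i, j))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)  # positive matches
    r = sum(1 for a, b in pairs if a == 1 and b == 0)  # 1 for i, 0 for j
    s = sum(1 for a, b in pairs if a == 0 and b == 1)  # 0 for i, 1 for j
    # t (negative matches) is deliberately never counted.
    if q + r + s == 0:
        # Assumed policy: two all-zero objects give 0/0, so refuse rather than guess.
        raise ValueError("undefined: neither object has an attribute equal to 1")
    return (r + s) / (q + r + s)


# q = 1, r = 1, s = 0; the three 0/0 attributes do not enter the ratio.
print(asymmetric_binary_dissimilarity([1, 1, 0, 0, 0], [1, 0, 0, 0, 0]))  # 0.5
```

For the same pair, Eq. (2.13) would give (1 + 0) / 5 = 0.2, because the three negative matches would count as agreement and pull the dissimilarity down.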
Complementarily, we can compare two objects described by binary attributes based on the notion of similarity instead of dissimilarity. For example, the asymmetric binary similarity between the objects i and j can be computed as
$$\mathit{sim}(i, j) = \frac{q}{q + r + s} = 1 - d(i, j). \tag{2.15}$$
The coefficient sim(i, j) of Eq. (2.15) is called the Jaccard coefficient and is widely referenced in the literature.
When both symmetric and asymmetric binary attributes occur in the same data set, the mixed attributes approach described in Section 2.4.6 can be applied.
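Since q counts the attributes that are 1 in both objects and q + r + s counts the attributes that are 1 in at least one of them, the Jaccard coefficient can also be read as the size of the intersection over the size of the union of the two objects' sets of positive attributes. The Python sketch below computes it that way and cross-checks the identity sim(i, j) = 1 - d(i, j) against Eq. (2.14) on one pair. The helper names `positive_attributes` and `jaccard` are hypothetical.

```python
from typing import Sequence, Set


def positive_attributes(x: Sequence[int]) -> Set[int]:
    """Indices of the attributes whose value is 1 (the positive state)."""
    return {k for k, v in enumerate(x) if v == 1}


def jaccard(i: Sequence[int], j: Sequence[int]) -> float:
    """Eq. (2.15): q / (q + r + s), computed as |A & B| / |A | B|."""
    if len(i) != len(j):
        raise ValueError("objects need the same number of attributes")
    a, b = positive_attributes(i), positive_attributes(j)
    union = a | b  # the q + r + s attributes that are 1 in at least one object
    if not union:
        raise ValueError("Jaccard coefficient undefined for two all-zero objects")
    return len(a & b) / len(union)  # intersection size is q


i = [1, 1, 1, 0, 0]
j = [1, 1, 1, 1, 0]
sim = jaccard(i, j)
print(sim)  # 0.75

# Cross-check against Eq. (2.14): d(i, j) = (r + s) / (q + r + s).
q = sum(x == 1 and y == 1 for x, y in zip(i, j))
r = sum(x == 1 and y == 0 for x, y in zip(i, j))
s = sum(x == 0 and y == 1 for x, y in zip(i, j))
d = (r + s) / (q + r + s)
print(d, 1 - sim)  # 0.25 0.25
```

The set form makes it explicit why negative matches drop out: an attribute that is 0 in both objects belongs to neither the intersection nor the union.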
Example 2.18 Dissimilarity between binary attributes.
Suppose that a patient record table (Table 2.4) contains the attributes name, gender, fever, cough, test-1, test-2, test-3, and test-4, where name is an object identifier, gender is a symmetric attribute, and the remaining attributes are asymmetric binary.
For asymmetric attribute values, let the values Y (yes) and P (positive) be set to 1, and the value N (no or negative) be set to 0. Suppose that the distance between objects
Table 2.4 Relational Table Where Patients Are Described by Binary Attributes

name   gender   fever   cough   test-1   test-2   test-3   test-4
Jack   M        Y       N       P        N        N        N
Jim    M        Y       Y       N        N        N        N
Mary   F        Y       N       P        N        P        N
...    ...      ...     ...     ...      ...      ...      ...