dissimilarity between i and j is

$$d(i, j) = \frac{r + s}{q + r + s + t}. \qquad (2.13)$$
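Here is a minimal Python sketch of Eq. (2.13). It assumes each object is given as an equal-length list of 0/1 values. The helper names `binary_contingency` and `symmetric_binary_dissimilarity` are hypothetical and chosen for this illustration. The counts follow the contingency table defined earlier: q attributes equal 1 for both objects, r equal 1 for i and 0 for j, s equal 0 for i and 1 for j, and t equal 0 for both.

```python
def binary_contingency(x, y):
    """Count (q, r, s, t) for two equal-length 0/1 vectors x (object i) and y (object j).

    q: both 1; r: x is 1 and y is 0; s: x is 0 and y is 1; t: both 0.
    Assumes every value is 0 or 1.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    q = r = s = t = 0
    for a, b in zip(x, y):
        if a == 1 and b == 1:
            q += 1
        elif a == 1 and b == 0:
            r += 1
        elif a == 0 and b == 1:
            s += 1
        else:
            t += 1
    return q, r, s, t


def symmetric_binary_dissimilarity(x, y):
    """Eq. (2.13): the fraction of attributes on which objects i and j disagree."""
    q, r, s, t = binary_contingency(x, y)
    return (r + s) / (q + r + s + t)


# Two hypothetical objects described by four symmetric binary attributes
print(binary_contingency([1, 0, 1, 1], [1, 1, 0, 1]))              # (2, 1, 1, 0)
print(symmetric_binary_dissimilarity([1, 0, 1, 1], [1, 1, 0, 1]))  # 0.5
```

Because both states count equally here, a shared 0 contributes to the denominator exactly as a shared 1 does.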
For asymmetric binary attributes, the two states are not equally important, such as
the positive (1) and negative (0) outcomes of a disease test. When two objects are
compared on an asymmetric binary attribute, the agreement of two 1s (a positive match)
is therefore considered more significant than that of two 0s (a negative match). For this
reason, such binary attributes are often considered "monary" (having one state). The
dissimilarity based on these attributes is called asymmetric binary dissimilarity, where
the number of negative matches, t, is considered unimportant and is thus ignored in the
following computation:
$$d(i, j) = \frac{r + s}{q + r + s}. \qquad (2.14)$$
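A sketch of Eq. (2.14), under the same assumption that objects are equal-length 0/1 lists (the function name is hypothetical). The only change from Eq. (2.13) is that t is never counted. The text does not say what to do when neither object has any 1s (the formula becomes 0/0); raising an error in that case is this sketch's own choice.

```python
def asymmetric_binary_dissimilarity(x, y):
    """Eq. (2.14): (r + s) / (q + r + s); negative matches (t) are ignored.

    x and y are equal-length sequences of 0/1 values for objects i and j.
    """
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # positive matches
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # 1 for i, 0 for j
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # 0 for i, 1 for j
    if q + r + s == 0:
        # Both objects are all 0s: Eq. (2.14) is undefined (0/0). Assumed policy: reject.
        raise ValueError("no attribute is 1 in either object")
    return (r + s) / (q + r + s)


x = [1, 0, 0, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0]
print(asymmetric_binary_dissimilarity(x, y))  # 0.5
```

For these two vectors, Eq. (2.13) would give 1/6, because the four shared 0s would count as agreements. Eq. (2.14) gives 0.5, since only the attributes where at least one object has a 1 are compared.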
Complementarily, we can compare two objects described by binary attributes using
similarity instead of dissimilarity. For example, the asymmetric binary similarity
between the objects i and j can be computed as
$$\mathrm{sim}(i, j) = \frac{q}{q + r + s} = 1 - d(i, j). \qquad (2.15)$$
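Eq. (2.15) can be sketched the same way (the function name is hypothetical, and the all-zero guard is again an assumed policy). The set-based lines at the end show an equivalent view. If each object is treated as the set of attribute positions whose value is 1, then q is the size of the intersection and q + r + s is the size of the union.

```python
def jaccard_similarity(x, y):
    """Eq. (2.15): sim(i, j) = q / (q + r + s) for equal-length 0/1 vectors x and y."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    if q + r + s == 0:
        # Undefined (0/0) when neither object has a 1; this sketch rejects it.
        raise ValueError("no attribute is 1 in either object")
    return q / (q + r + s)


x = [1, 0, 1, 0, 0]
y = [1, 1, 1, 0, 0]
sim = jaccard_similarity(x, y)
print(sim)      # 2/3, about 0.667
print(1 - sim)  # about 0.333: the asymmetric binary dissimilarity of Eq. (2.14)

# Set view: positions whose value is 1 in each object
set_x = {k for k, v in enumerate(x) if v == 1}  # {0, 2}
set_y = {k for k, v in enumerate(y) if v == 1}  # {0, 1, 2}
print(len(set_x & set_y) / len(set_x | set_y))  # also 2/3
```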
The coefficient sim(i, j) of Eq. (2.15) is called the Jaccard coefficient and is widely
referenced in the literature.
When both symmetric and asymmetric binary attributes occur in the same data set,
the mixed attributes approach described in Section 2.4.6 can be applied.
Example 2.18 Dissimilarity between binary attributes. Suppose that a patient record table (Table 2.4)
contains the attributes name, gender, fever, cough, test-1, test-2, test-3, and test-4, where
name is an object identifier, gender is a symmetric attribute, and the remaining attributes
are asymmetric binary.
For asymmetric attribute values, let the values Y (yes) and P (positive) be set to 1,
and the value N (no or negative) be set to 0. Suppose that the distance between objects
Table 2.4 Relational Table Where Patients Are Described by Binary Attributes

name    gender  fever  cough  test-1  test-2  test-3  test-4
Jack    M       Y      N      P       N       N       N
Jim     M       Y      Y      N       N       N       N
Mary    F       Y      N      P       N       P       N
...     ...     ...    ...    ...     ...     ...     ...
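The Y/P/N encoding described above can be applied to the three rows shown in Table 2.4 with a short sketch. The excerpt breaks off before stating exactly which attributes enter the distance. This sketch therefore assumes that only the six asymmetric attributes (fever through test-4) are compared, using Eq. (2.14), and it leaves out name and gender. The variable and function names are hypothetical.

```python
# Assumed encoding from the text: Y (yes) and P (positive) -> 1, N -> 0.
ENCODE = {"Y": 1, "P": 1, "N": 0}

# Rows of Table 2.4, asymmetric attributes only: fever, cough, test-1 .. test-4
patients = {
    "Jack": ["Y", "N", "P", "N", "N", "N"],
    "Jim":  ["Y", "Y", "N", "N", "N", "N"],
    "Mary": ["Y", "N", "P", "N", "P", "N"],
}


def encode(values):
    """Map Y/P/N codes to 1/1/0."""
    return [ENCODE[v] for v in values]


def asymmetric_binary_dissimilarity(x, y):
    """Eq. (2.14): (r + s) / (q + r + s); negative matches t are ignored."""
    q = sum(a == 1 and b == 1 for a, b in zip(x, y))
    r = sum(a == 1 and b == 0 for a, b in zip(x, y))
    s = sum(a == 0 and b == 1 for a, b in zip(x, y))
    return (r + s) / (q + r + s)


# Print every unordered pair of patients once
names = list(patients)
for idx, i in enumerate(names):
    for j in names[idx + 1:]:
        d = asymmetric_binary_dissimilarity(encode(patients[i]), encode(patients[j]))
        print(f"d({i}, {j}) = {d:.2f}")
```

Under these assumptions it prints d(Jack, Jim) = 0.67, d(Jack, Mary) = 0.33, and d(Jim, Mary) = 0.75. For example, Jack and Jim share one positive match (fever) and disagree on cough and test-1, which gives 2/3.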
 