Getting to Know Your Data - Data Mining: Concepts and Techniques


dissimilarity between i and j is

$$d(i, j) = \frac{r + s}{q + r + s + t}. \tag{2.13}$$
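As a minimal sketch of Eq. (2.13), the counts can be tallied directly from two 0/1 vectors. This excerpt does not restate the symbols, so the usual contingency-table reading is assumed here: q counts attributes that are 1 for both objects, r those that are 1 for i and 0 for j, s those that are 0 for i and 1 for j, and t those that are 0 for both. The function names and the sample vectors are illustrative only.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (q, r, s, t) for two equal-length 0/1 vectors.

    Assumed meaning of the symbols (not restated in the excerpt):
    q = both 1, r = (1, 0), s = (0, 1), t = both 0.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_binary_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Eq. (2.13): mismatches divided by the total number of attributes."""
    q, r, s, t = contingency(x, y)
    total = q + r + s + t
    if total == 0:
        raise ValueError("at least one attribute is required")
    return (r + s) / total


# Made-up objects with six binary attributes each.
obj_i = [1, 0, 1, 0, 0, 1]
obj_j = [1, 1, 0, 0, 0, 1]
print(contingency(obj_i, obj_j))                     # (2, 1, 1, 2)
print(symmetric_binary_dissimilarity(obj_i, obj_j))  # 2 / 6
```

Here two of the six attributes disagree, so the symmetric dissimilarity is 2/6.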

For asymmetric binary attributes, the two states are not equally important, such as the positive (1) and negative (0) outcomes of a disease test. Given two asymmetric binary attributes, the agreement of two 1s (a positive match) is then considered more significant than that of two 0s (a negative match). Therefore, such binary attributes are often considered “monary” (having one state). The dissimilarity based on these attributes is called asymmetric binary dissimilarity, where the number of negative matches, t, is considered unimportant and is thus ignored in the following computation:

$$d(i, j) = \frac{r + s}{q + r + s}. \tag{2.14}$$
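A minimal sketch of Eq. (2.14) follows. It assumes the same reading of q, r, and s as counts of (1, 1), (1, 0), and (0, 1) attribute pairs, and it makes one choice the text does not specify: two objects with no 1s at all are given a dissimilarity of 0. The function name and the sample vectors are illustrative.

```python
from typing import Sequence


def asymmetric_binary_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Eq. (2.14): (r + s) / (q + r + s); negative matches are ignored."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    q = r = s = 0
    for a, b in zip(x, y):
        if a == 1 and b == 1:
            q += 1  # positive match
        elif a == 1 and b == 0:
            r += 1  # 1 for the first object only
        elif a == 0 and b == 1:
            s += 1  # 1 for the second object only
        # (0, 0) pairs are negative matches and are deliberately not counted
    denominator = q + r + s
    if denominator == 0:
        # Assumption, not from the text: two all-zero objects count as identical.
        return 0.0
    return (r + s) / denominator


obj_i = [1, 0, 1, 0, 0, 1]
obj_j = [1, 1, 0, 0, 0, 1]
print(asymmetric_binary_dissimilarity(obj_i, obj_j))  # 0.5

# Appending attributes that are 0 for both objects leaves the result unchanged.
print(asymmetric_binary_dissimilarity(obj_i + [0, 0, 0], obj_j + [0, 0, 0]))  # 0.5
```

The second call illustrates why t is dropped: adding any number of shared negatives does not make the two objects look more alike.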

Complementarily, we can measure the difference between two binary attributes based on the notion of similarity instead of dissimilarity. For example, the asymmetric binary similarity between the objects i and j can be computed as

$$\mathit{sim}(i, j) = \frac{q}{q + r + s} = 1 - d(i, j). \tag{2.15}$$

The coefficient sim(i, j) of Eq. (2.15) is called the Jaccard coefficient and is popularly referenced in the literature.
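The Jaccard coefficient is often computed on sets, which gives a compact way to check Eq. (2.15). In the sketch below each object is represented by the set of attribute positions where it has a 1, so the intersection size plays the role of q and the union size the role of q + r + s. The function name, the sample vectors, and the value returned for two all-zero objects are assumptions made for illustration.

```python
from typing import Sequence


def jaccard_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Eq. (2.15): q / (q + r + s), computed as |A & B| / |A | B|."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    ones_x = {k for k, v in enumerate(x) if v == 1}
    ones_y = {k for k, v in enumerate(y) if v == 1}
    union = ones_x | ones_y
    if not union:
        # Assumption, not from the text: two all-zero objects are fully similar.
        return 1.0
    return len(ones_x & ones_y) / len(union)


obj_i = [1, 0, 1, 0, 0, 1]
obj_j = [1, 1, 0, 0, 0, 1]

sim = jaccard_similarity(obj_i, obj_j)
print(sim)      # 0.5, since 2 shared positives out of 4 positions with any 1
print(1 - sim)  # 0.5, the asymmetric binary dissimilarity of the same pair
```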

When both symmetric and asymmetric binary attributes occur in the same data set, the mixed attributes approach described in Section 2.4.6 can be applied.


Example 2.18 Dissimilarity between binary attributes. Suppose that a patient record table (Table 2.4) contains the attributes name, gender, fever, cough, test-1, test-2, test-3, and test-4, where name is an object identifier, gender is a symmetric attribute, and the remaining attributes are asymmetric binary.

For asymmetric attribute values, let the values Y (yes) and P (positive) be set to 1, and the value N (no or negative) be set to 0. Suppose that the distance between objects

Table 2.4 Relational Table Where Patients Are Described by Binary Attributes

name   gender   fever   cough   test-1   test-2   test-3   test-4
Jack   M        Y       N       P        N        N        N
Jim    M        Y       Y       N        N        N        N
Mary   F        Y       N       P        N        P        N
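The pairwise distances for the three patients can be sketched with Eq. (2.14). Two assumptions are made here and repeated in the code: only the six asymmetric attributes (fever, cough, test-1 through test-4) enter the computation, with gender left out, and the rows of Table 2.4 read as encoded below. The variable and function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Encoding stated in the example: Y and P map to 1, N maps to 0.
ENCODE = {"Y": 1, "P": 1, "N": 0}

# Assumed rows of Table 2.4, asymmetric attributes only, in the order
# fever, cough, test-1, test-2, test-3, test-4 (gender is left out).
patients: Dict[str, List[str]] = {
    "Jack": ["Y", "N", "P", "N", "N", "N"],
    "Jim":  ["Y", "Y", "N", "N", "N", "N"],
    "Mary": ["Y", "N", "P", "N", "P", "N"],
}


def asymmetric_binary_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Eq. (2.14): (r + s) / (q + r + s); negative matches are ignored."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denominator = q + r + s
    # Assumption: objects with no 1s at all are treated as identical.
    return (r + s) / denominator if denominator else 0.0


encoded = {name: [ENCODE[v] for v in row] for name, row in patients.items()}

distances: Dict[Tuple[str, str], float] = {
    (a, b): asymmetric_binary_dissimilarity(encoded[a], encoded[b])
    for a, b in combinations(encoded, 2)
}

for (a, b), d in distances.items():
    print(f"d({a}, {b}) = {d:.2f}")
# d(Jack, Jim) = 0.67
# d(Jack, Mary) = 0.33
# d(Jim, Mary) = 0.75
```

Under these assumptions Jack and Mary are the closest pair, while Jim and Mary are the farthest apart.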
