Russell and Rao: $f = \frac{a}{n}$, (7)
Sokal and Michener: $f = \frac{a + d}{n}$, (8)
Jaccard and Needham: $f = \frac{a}{a + b + c}$, (9)
Kulczynski: $f = \frac{a}{b + c + 1}$, (10)
Rogers and Tanimoto: $f = \frac{a + d}{a + d + 2(b + c)}$, (11)
Yule: $f = \frac{ad - bc}{ad + bc}$. (12)
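Measures (7)-(12) can be sketched in Python as below. The counting convention is an assumption, because definition (6) lies outside this excerpt: a counts positions where both bits are 1, b positions where the first string has 1 and the second 0, c the reverse, and d positions where both are 0. This convention is consistent with the later remark that c and d vanish once the first string becomes all ones. The names `counts`, `measures` and the dictionary keys are hypothetical, and reporting zero denominators as NaN is an illustrative choice, not something the text specifies.

```python
import math
from typing import Dict, Tuple


def counts(x: str, y: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length strings of '0'/'1'.

    Assumed convention: a = both 1, b = x is 1 and y is 0,
    c = x is 0 and y is 1, d = both 0.
    """
    if len(x) != len(y):
        raise ValueError("bit-strings must have equal length")
    a = b = c = d = 0
    for p, q in zip(x, y):
        if p == "1" and q == "1":
            a += 1
        elif p == "1":
            b += 1
        elif q == "1":
            c += 1
        else:
            d += 1
    return a, b, c, d


def _ratio(num: float, den: float) -> float:
    # Zero denominators are reported as NaN (an illustrative choice).
    return num / den if den != 0 else math.nan


def measures(x: str, y: str) -> Dict[str, float]:
    """Evaluate measures (7)-(12) for one pair of bit-strings."""
    a, b, c, d = counts(x, y)
    n = a + b + c + d
    return {
        "russell_rao": _ratio(a, n),                            # (7)
        "sokal_michener": _ratio(a + d, n),                     # (8)
        "jaccard_needham": _ratio(a, a + b + c),                # (9)
        "kulczynski": _ratio(a, b + c + 1),                     # (10)
        "rogers_tanimoto": _ratio(a + d, a + d + 2 * (b + c)),  # (11)
        "yule": _ratio(a * d - b * c, a * d + b * c),           # (12)
    }


m = measures("1100", "1010")  # a = b = c = d = 1
print(m["russell_rao"], m["sokal_michener"], m["yule"])  # 0.25 0.5 0.0
```

For two all-zero strings, Jaccard and Needham (9) and Yule (12) have zero denominators, which is why the sketch guards every ratio.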
The Hamming distance $d_H$ is a well-known measure; in terms of $a$, $b$, $c$, $d$ it can be written as $d_H = b + c$.
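A short sketch (function names hypothetical) checking that $b + c$ equals the direct definition of the Hamming distance, i.e. the number of differing positions, computed here with an XOR of the strings read as integers. It assumes b and c count the two kinds of mismatch, (1, 0) and (0, 1), and that both strings have equal length.

```python
from itertools import product


def hamming_from_counts(x: str, y: str) -> int:
    """d_H = b + c, where b counts (1, 0) positions and c counts (0, 1) positions."""
    b = sum(p == "1" and q == "0" for p, q in zip(x, y))
    c = sum(p == "0" and q == "1" for p, q in zip(x, y))
    return b + c


def hamming_xor(x: str, y: str) -> int:
    """Number of differing positions via XOR of the strings read as base-2 integers."""
    return bin(int(x, 2) ^ int(y, 2)).count("1")


# The two definitions agree on every ordered pair of 4-bit strings.
strings = ["".join(bits) for bits in product("01", repeat=4)]
assert all(hamming_from_counts(x, y) == hamming_xor(x, y) for x in strings for y in strings)
print(hamming_from_counts("1100", "1010"))  # 2
```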
The last of the discussed measures is the r-contiguous bits matching rule. Strictly speaking, the rule is a classifier rather than a measure, because it returns only two values, true and false. True is returned (i.e. the classifier says that the two patterns match each other) if both patterns carry identical bits on at least r contiguous positions; false is returned otherwise.
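The rule can be checked in a single left-to-right pass that tracks the length of the current run of agreeing positions. The sketch below is a minimal illustration; the function name `r_contiguous_match` is hypothetical.

```python
def r_contiguous_match(x: str, y: str, r: int) -> bool:
    """Return True if x and y carry identical bits on at least r contiguous positions."""
    if len(x) != len(y):
        raise ValueError("bit-strings must have equal length")
    if r < 1:
        raise ValueError("r must be positive")
    run = 0  # length of the current run of agreeing positions
    for p, q in zip(x, y):
        run = run + 1 if p == q else 0
        if run >= r:
            return True
    return False


print(r_contiguous_match("10110", "10011", 2))  # True: positions 0-1 agree
print(r_contiguous_match("10110", "10011", 3))  # False: no run of 3 agreeing bits
```

A single pass keeps the cost linear in the string length, independent of r.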
Additionally, a transformation operator T [8] was applied to the measured bit-strings. Before the evaluation, every pair was modified by the T operator, which works as follows. For every two patterns $A, B \in \{0, 1\}^N$:
$\forall_{i \in \{0, 1, \ldots, N-1\}}\; A[i] = 0 \Rightarrow \big(A[i] := 1 \wedge B[i] := 1 - B[i]\big)$ (13)
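A minimal sketch of the T operator, reading (13) as: wherever A has a 0, set that bit of A to 1 and replace the corresponding bit of B by its complement; positions where A already has a 1 are left unchanged. The function name `t_operator` and the representation of patterns as Python strings of '0'/'1' are illustrative assumptions.

```python
from typing import Tuple


def t_operator(x: str, y: str) -> Tuple[str, str]:
    """Apply T to a pair of equal-length bit-strings:
    where x[i] == '0', set x[i] to '1' and replace y[i] by 1 - y[i]."""
    if len(x) != len(y):
        raise ValueError("bit-strings must have equal length")
    new_x, new_y = [], []
    for p, q in zip(x, y):
        if p == "0":
            new_x.append("1")
            new_y.append("1" if q == "0" else "0")  # 1 - y[i]
        else:
            new_x.append(p)
            new_y.append(q)
    return "".join(new_x), "".join(new_y)


print(t_operator("1100", "1010"))  # ('1111', '1001')
```

In the result, the second string has a 1 exactly where the two original strings agree, which is how it carries the information about their differences.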
The operator reduces the search space: for example, the 65536 pairs of 8-bit binary strings are mapped to only 256 different transformed pairs. After the transformation, one of the strings is always a string of ones, while the other carries the information about differences between the input strings. The operator is simple and computationally cheap, yet it significantly modifies the properties of the measures and improves their sensitivity.
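The reduction from 65536 to 256 pairs can be verified by brute force. The sketch below (names hypothetical) applies the same reading of T as above (where the first string has a 0, set it to 1 and flip the second string's bit) to every ordered pair of 8-bit strings and collects the distinct results.

```python
from itertools import product
from typing import Set, Tuple


def t_operator(x: str, y: str) -> Tuple[str, str]:
    """T operator: where x[i] == '0', set x[i] to '1' and flip y[i]."""
    pairs = [(p, q) if p == "1" else ("1", "0" if q == "1" else "1") for p, q in zip(x, y)]
    return "".join(p for p, _ in pairs), "".join(q for _, q in pairs)


def transformed_pairs(n_bits: int) -> Set[Tuple[str, str]]:
    """Apply T to every ordered pair of n_bits-long strings; return the distinct results."""
    strings = ["".join(bits) for bits in product("01", repeat=n_bits)]
    return {t_operator(x, y) for x in strings for y in strings}


result = transformed_pairs(8)  # 256 * 256 = 65536 input pairs
print(len(result))  # 256
print(all(x == "1" * 8 for x, _ in result))  # True: the first string is always all ones
```

Because the first transformed string is fixed, a transformed pair is determined by its second string alone, so at most $2^8 = 256$ distinct pairs can occur.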
The operator should be applied just before matching. Each pair of strings to be matched is first turned into a new pair by the T operator, and then the measure is applied to the new pair. The returned value is assigned to the original pair of strings, i.e. the pair before transformation. The transformed $A[i]$ never equals zero (the transformed $A$ is always a string of ones), so the values $c$ and $d$ in (6) equal zero for all pairs of transformed binary strings.
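The matching procedure (transform, measure, assign the value to the original pair) might be sketched as follows. The helper names, the string representation and the (a, b, c, d) convention (a = both 1, b = (1, 0), c = (0, 1), d = both 0) are assumptions, since definition (6) is outside this excerpt. The sketch also checks that c and d are zero after the transformation.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]


def t_operator(x: str, y: str) -> Pair:
    """T operator: where x[i] == '0', set x[i] to '1' and flip y[i]."""
    pairs = [(p, q) if p == "1" else ("1", "0" if q == "1" else "1") for p, q in zip(x, y)]
    return "".join(p for p, _ in pairs), "".join(q for _, q in pairs)


def counts(x: str, y: str) -> Tuple[int, int, int, int]:
    """(a, b, c, d): both 1, (1, 0), (0, 1), both 0 (assumed convention)."""
    a = sum(p == q == "1" for p, q in zip(x, y))
    b = sum(p == "1" and q == "0" for p, q in zip(x, y))
    c = sum(p == "0" and q == "1" for p, q in zip(x, y))
    d = sum(p == q == "0" for p, q in zip(x, y))
    return a, b, c, d


def match_with_t(pairs: Iterable[Pair],
                 measure: Callable[[int, int, int, int], float]) -> Dict[Pair, float]:
    """Transform each pair with T, evaluate the measure on the transformed pair,
    and store the value under the original (untransformed) pair."""
    results: Dict[Pair, float] = {}
    for x, y in pairs:
        a, b, c, d = counts(*t_operator(x, y))
        results[(x, y)] = measure(a, b, c, d)
    return results


def russell_rao(a: int, b: int, c: int, d: int) -> float:
    return a / (a + b + c + d)  # measure (7)


# c and d vanish for every transformed pair of 4-bit strings.
strings = ["".join(bits) for bits in product("01", repeat=4)]
assert all(counts(*t_operator(x, y))[2:] == (0, 0) for x in strings for y in strings)

print(match_with_t([("1100", "1010")], russell_rao))  # {('1100', '1010'): 0.5}
```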
All the measures except Yule (12) were also applied to the transformed pairs of binary strings. Yule's measure cannot be evaluated on a transformed pair because its denominator $ad + bc$ always equals zero. With $c = d = 0$ and $n = a + b$, measures (7), (8) and (9) all reduce to T1, (10) reduces to T2 and (11) to T3:
T1: $f = \frac{a}{a + b}$, (14)
T2: $f = \frac{a}{b + 1}$, (15)
T3: $f = \frac{a}{a + 2b}$. (16)
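Measures (14)-(16) need only a and b of the transformed pair. In the sketch below (function names hypothetical), these are read off the original pair, assuming the reading of T in which A becomes all ones and B[i] is complemented wherever A[i] = 0. Under that reading the transformed B has a 1 exactly where the original strings agree, so a is the number of agreeing positions and b the number of differing ones. The loop checks T1-T3 against the general formulas (7)-(11) with c = d = 0, and confirms that Yule's denominator vanishes.

```python
from typing import Tuple


def transformed_counts(x: str, y: str) -> Tuple[int, int]:
    """(a, b) of the T-transformed pair, read off the original pair:
    a = positions where x and y agree, b = positions where they differ."""
    if len(x) != len(y):
        raise ValueError("bit-strings must have equal length")
    a = sum(p == q for p, q in zip(x, y))
    return a, len(x) - a


def t1(a: int, b: int) -> float:
    return a / (a + b)      # (14): (7), (8), (9) with c = d = 0, n = a + b


def t2(a: int, b: int) -> float:
    return a / (b + 1)      # (15): (10) with c = 0


def t3(a: int, b: int) -> float:
    return a / (a + 2 * b)  # (16): (11) with c = d = 0


# Check the reductions against the general formulas for all (a, b) with a + b = 8.
for a in range(9):
    b = 8 - a
    c = d = 0
    n = a + b + c + d
    assert t1(a, b) == a / n == (a + d) / n == a / (a + b + c)
    assert t2(a, b) == a / (b + c + 1)
    assert t3(a, b) == (a + d) / (a + d + 2 * (b + c))
    assert a * d + b * c == 0  # Yule's denominator vanishes, so (12) is undefined

a, b = transformed_counts("1100", "1010")  # a = 2, b = 2
print(t1(a, b), t2(a, b), t3(a, b))
```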