Database Reference
In-Depth Information
distance between vectors. The Hamming distance can be used to derive
upper bounds on the Intersection, Jaccard, Dice, and Cosine similarity
for any uniform weighing function W .
Lemma 7.10. Assuming a uniform weight function W ( λ )= c, λ
Λ,
given two strings s, r
2 θ
I
( s, r )
θ
⇒H
( s, r )
≤|
s
|
+
|
r
|−
c .
Proof.
I ( s, r ) ≥ θ ⇒ c|s ∩ r|≥θ,
and
H
( s, r )=
|
s
|−|
s
r
|
+
|
r
|−|
s
r
|
=
|
s
|
+
|
r
|−
2
|
s
r
|⇒
= |s|
+
|r|−H
( s, r )
|
s
r
|
.
2
Hence,
2 θ
2 θ
|
s
|
+
|
r
|−H
( s, r )
c ⇒H
( s, r )
≤|
s
|
+
|
r
|−
c .
Similarly
Lemma 7.11. Assuming a uniform weight function W ( λ )= c , λ
Λ,
given two strings s, r
Normalized Weighted Intersection:
N
( s, r )
θ
⇒H
( s, r )
≤|
s
|
+
|
r
|−
2 θ max(
|
s
|
,
|
r
|
) .
Jaccard:
1
θ
1+ θ ( |s| + |r| ) .
J ( s, r ) ≥ θ ⇒H ( s, r )
Dice:
D
( s, r )
θ
⇒H
( s, r )
(1
θ )(
|
s
|
+
|
r
|
) .
Cosine:
2 θ
C
( s, r )
θ
⇒H
( s, r )
≤|
s
|
+
|
r
|−
|
s
||
r
|
.
 
Search WWH ::




Custom Search