Database Reference
In-Depth Information
•
Jaccard:
1
θ
−
1
J
(
s, r
)
≥
θ
⇒H
(
s, r
)
≤
|
r
|
.
•
Dice:
2
θ
2
D
(
s, r
)
≥
θ
⇒H
(
s, r
)
≤
−
|
r
|
.
•
Cosine:
1
θ
2
θ
C
(
s, r
)
≥
θ
⇒H
(
s, r
)
≤
+1
−
|
r
|
.
Notice that the pigeonhole signature and the prefix enumeration sig-
nature have to be built with respect to a pre-determined minimum
query threshold
θ
. Queries with threshold
θ
<θ
cannot be answered
correctly using the existing signatures. Queries with threshold
θ
≥
θ
can be answered without any false negatives.
The original partenum signature proposed in the literature uses
an enumeration strategy in order to reduce the problem of identify-
ing sub-signatures with intersection size larger than or equal to
x
,to
that of identifying sub-signatures with non-empty intersections (sim-
ilar to the prefix enumeration approach). The idea is simple. Given
sub-signatures consisting of 1 +
x
partitions, enumerate all possible
1+
x
=1+
x
combinations of
x
partitions and hash the resulting
combinations instead of each of the original 1 +
x
partitions individ-
ually. Then, if two sub-signatures agree in at least
x
partitions they
must agree in at least one hash value. The same idea applies for the
pigeonhole signature only for
θ
= 1, otherwise the number of combina-
tions explodes, yielding very large signatures. This is hardly a prob-
lem though, since answering intersection queries using the algorithms
described in Section 6 is equally ecient for small and large intersection
thresholds alike.
7.2.5
Top-
k
Selection Queries
Partitioning and enumeration signatures cannot be used to answer
top-
k
selection queries due to the fact that a minimum pre-determined