Database Reference
In-Depth Information
Jaccard:
1
θ
1
J
( s, r )
θ
⇒H
( s, r )
|
r
|
.
Dice:
2
θ
2
D
( s, r )
θ
⇒H
( s, r )
|
r
|
.
Cosine:
1
θ
2 θ
C
( s, r )
θ
⇒H
( s, r )
+1
|
r
|
.
Notice that the pigeonhole signature and the prefix enumeration sig-
nature have to be built with respect to a pre-determined minimum
query threshold θ . Queries with threshold θ cannot be answered
correctly using the existing signatures. Queries with threshold θ
θ
can be answered without any false negatives.
The original partenum signature proposed in the literature uses
an enumeration strategy in order to reduce the problem of identify-
ing sub-signatures with intersection size larger than or equal to x ,to
that of identifying sub-signatures with non-empty intersections (sim-
ilar to the prefix enumeration approach). The idea is simple. Given
sub-signatures consisting of 1 + x partitions, enumerate all possible
1+ x =1+ x combinations of x partitions and hash the resulting
combinations instead of each of the original 1 + x partitions individ-
ually. Then, if two sub-signatures agree in at least x partitions they
must agree in at least one hash value. The same idea applies for the
pigeonhole signature only for θ = 1, otherwise the number of combina-
tions explodes, yielding very large signatures. This is hardly a prob-
lem though, since answering intersection queries using the algorithms
described in Section 6 is equally ecient for small and large intersection
thresholds alike.
7.2.5
Top-
k
Selection Queries
Partitioning and enumeration signatures cannot be used to answer
top- k selection queries due to the fact that a minimum pre-determined
 
Search WWH ::




Custom Search