Database Reference
In-Depth Information
A, s
identify the answer set A , s.t.
|
A
|
= k and
s
S
\
A :Θ( v,s )
Θ( v,s ).
Top- k queries are very useful in practice since in many applications it
is dicult to decide in advance a meaningful threshold θ for running
an all-match query.
Clearly, all-match queries are easier to evaluate than top- k queries,
given that a cutoff similarity threshold for top- k queries cannot be
decided in advance, making initial pruning of strings di cult. Nev-
ertheless, once k good answers have been identified (good in a sense
that the k -th answer has similarity suciently close to the correct k -th
answer) top- k queries essentially degenerate to all-match queries. Query
answering strategies typically try to identify k good answers as fast as
possible and subsequently revert to all-match query strategies.
4.2
Join Queries
Given two sets of strings and a user specified threshold, all-match join
queries return all pairs of strings in the cross product of the two sets,
with similarity larger than or equal to the threshold.
Definition 4.3 (All-Match Join Query). Given a string similarity
function Θ, two sets of strings S,R , and a positive threshold θ , identify
the answer set
A = { ( s, r ) ∈ S × R :Θ( s, r ) ≥ θ}.
Top- k join queries return the k pairs with the largest similarity among
all pairs in the cross product.
Definition 4.4(Top-
Join Query). Given a string similarity func-
tion Θ, two sets of strings S,R , and a positive integer k , identify
the answer set A s.t.
k
( s ,r )
|
A
|
= k and
( s, r )
A and
( S
×
R )
\
Θ( s ,r ).
A :Θ( s, r )
 
Search WWH ::




Custom Search