Database Reference
In-Depth Information
Algorithm 8.3.4: TopKQuery( v,N,θ,s l ,s u ,H )
if N is the leaf node
for each s
N
if Edit Distance Verification ( v, s, θ )
do
Insert s into H
if
do
do
>k : H. pop
Update the global threshold θ
|
H
|
else
if LB ( v, [ s l ,s 1 ])
θ
do TopKQuery ( v,N 1 ,θ,s l ,s 1 ,H )
for j =2 to m
do if LB ( v, [ s j− 1 ,s j ])
do
θ
do TopKQuery ( v,N j ,θ,s j− 1 ,s j ,H )
if LB ( v, [ s m ,s u ])
θ
do TopKQuery ( v,N m +1 ,s m ,s u ,H )
We use LB ([ s l ,s u ] , [ r l ,r u ]) to denote the lower bound edit distance
between the two string intervals. This property allows the direct adop-
tion of the standard join algorithm, as is shown in Algorithm 8.3.5. The
algorithm recursively expands the nodes in depth-first order, to give a
clear idea on the joining process. In practice we can use a heap to store
the node pairs and pop out the next candidate to join if it satisfies the
minimal distance lower bound.
Finally, this B-tree index scheme is also capable of handling nor-
malized edit distance. This is achievable if the maximal length of the
strings within an interval can be estimated by the string order:
Property 8.4(Length Bounding). Given any string interval [ s i ,s j ],
the string order φ is length bounding if it e ciently returns an upper
bound on the length of any string s
[ s i ,s j ].
The length bounding property can be combined with any of the
query processing algorithms, by dividing the lower bound edit distance
with the maximal length of the string intervals. Therefore, this index
 
Search WWH ::




Custom Search