Database Reference
In-Depth Information
'11011011'. Therefore, each string is indexed in the B-tree according
to the z -order value of its q -gram counting vector. Formally, the gram
counting order φ gc for string s is defined as
φ gc ( s ) = zorder( s ) .
(8.5)
The property of comparability is straightforwardly satisfied since
it only requires to compare the binary representations of the z -order
values on vectors s and r to verify their order in the B-tree.
Next, we analyze the property of lower bounding. For a string inter-
val [ s i ,s j ]in z -order representation, a lower bound and upper bound
on the number of q -grams in each bucket for all strings in the inter-
val can be derived. This is easy to see with an example. Let s i and s j
have binary z -order values '11011011' and '11011110', respectively. Any
string between them must have the prefix '11011', while the remaining
bits can be either 0 or 1. In Figure 8.2 it can be seen that some accu-
rate estimation on the bucket values can be recovered if the common
prefix is long enough. Specifically, the first bucket B 1 is clearly 3, since
both bits are deterministically decided. For the rest of the buckets, the
set of possible values can also be calculated with the confirmed prefix
bits.
Assume that for a given string interval [ s i ,s j ] the lower and upper
bound values on bucket B are B l and B u
L ). After trans-
forming the query v to vector v in gram counting order, we apply
Equation (8.4) using the lower and upper bound values of the buckets,
to reach some new lower bound on the edit distance from v to any
string contained in the interval. The pairwise lower bounding property
can be achieved similarly by retrieving the bound pairs ( B l ( s ) ,B u ( s ))
and ( B l ( r ) ,B u ( r )) for [ s l ,s u ] and [ r l ,r u ], respectively.
(for 1
11
11
??
0
?
1
1
1
?
0
?
1
?
{
{
2,3
}
{
0,1
}
{
2,3
}
Fig. 8.2 Example of bounding the values in buckets on z -order.
Search WWH ::




Custom Search