Database Reference
In-Depth Information
8.3.2
B-Trees
An alternative approach is to use a B-tree structure to store the strings
in a one-dimensional order that provides guarantees on the retrieval of
strings based on unweighted edit distance.
To index the strings with the B-tree, it is necessary to construct a
mapping from the string domain to integer space. Formally:
Definition 8.1(String Order). Given the string domain Σ , a string
order is a mapping function φ N
, mapping each string to an
integer value.
The definition above implies that the mapping function φ uniquely
decides the string order. Therefore, we abuse notation to represent with
φ both the actual string order imposed as well as the mapping function
itself. Note that some strings might be mapped to the same integer by
the function φ . In many cases, the use of a concrete mapping function
is ineffective on both computation and storage. To alleviate this prob-
lem, it is better if we do not construct the mapping explicitly. This
requirement can be fulfilled if the string order satisfies the following
desirable property on comparability:
Property 8.1 (Comparability). A string order φ is eciently com-
parable if it takes linear time to verify if φ ( s ) is larger than φ ( r ) for
any string pair s and r .
The verification method is supposed to take linear time with respect
to the lengths of the strings. It is easy to see that the insertion and
deletion operations on the B-tree rely only on the comparability of
the string order. Therefore, any string order having Property 8.1 can
be used to index strings on the B-tree. Algorithm 8.3.2 can be used
for locating the first leaf node of the tree potentially containing a
given target string. Each intermediate node N in the B-tree contains
m splitting strings
{
s 1 ,...,s m
}
and m + 1 pointers to children nodes
{
. The algorithm identifies the first splitting string with
mapping value larger than that of the query string and iteratively
N 1 ,...,N m +1 }
 
Search WWH ::




Custom Search