Database Reference
In-Depth Information
8.3.2
B-Trees
An alternative approach is to use a B-tree structure to store the strings
in a one-dimensional order that provides guarantees on the retrieval of
strings based on unweighted edit distance.
To index the strings with the B-tree, it is necessary to construct a
mapping from the string domain to integer space. Formally:
Definition 8.1(String Order).
Given the string domain Σ
∗
, a string
order is a mapping function
φ
:Σ
∗
→
N
, mapping each string to an
integer value.
The definition above implies that the mapping function
φ
uniquely
decides the string order. Therefore, we abuse notation to represent with
φ
both the actual string order imposed as well as the mapping function
itself. Note that some strings might be mapped to the same integer by
the function
φ
. In many cases, the use of a concrete mapping function
is ineffective on both computation and storage. To alleviate this prob-
lem, it is better if we do not construct the mapping explicitly. This
requirement can be fulfilled if the string order satisfies the following
desirable property on comparability:
Property 8.1 (Comparability).
A string order
φ
is eciently com-
parable if it takes linear time to verify if
φ
(
s
) is larger than
φ
(
r
) for
any string pair
s
and
r
.
The verification method is supposed to take linear time with respect
to the lengths of the strings. It is easy to see that the insertion and
deletion operations on the B-tree rely only on the comparability of
the string order. Therefore, any string order having Property 8.1 can
be used to index strings on the B-tree. Algorithm 8.3.2 can be used
for locating the first leaf node of the tree potentially containing a
given target string. Each intermediate node
N
in the B-tree contains
m
splitting strings
{
s
1
,...,s
m
}
and
m
+ 1 pointers to children nodes
{
. The algorithm identifies the first splitting string with
mapping value larger than that of the query string and iteratively
N
1
,...,N
m
+1
}