Database Reference
In-Depth Information
Fig. 5.4 A B-tree data structure for strings.
(or pointers to the actual data values) are stored in the leaves of the
tree. A simple B-tree structure is shown in Figure 5.4.
Assume that data values and keys are strings sorted in lexicographic
order. Given a query string v an exact string match query can be
answered by starting at the root of the B-tree and following the pointer
to the next level corresponding to the largest key s s.t. v ≤ s . We con-
tinue iteratively until we reach a leaf node. A simple scan of the leaf
node reveals whether v exists or not. An optimized specialization of
the B-tree specifically designed to index strings, called the string B-
tree, organizes each index node as a trie built over the keys of the node
(instead of a sorted list of keys), to reduce the computational cost. Spe-
cialized algorithms for answering approximate string matching queries
using B-trees will be covered in Section 8.
5.3 Related Work
Witten et al. [71], Baeza-Yates and Ribeiro-Neto [8], and Zobel and
Moffat [79] present very detailed discussions on inverted indexes. Baeza-
Yates and Ribeiro-Neto [8] also discuss sux trees and sux arrays.
Puglisi et al. [59] discuss all varieties of sux array construction algo-
rithms. Comer [24] gives a detailed explanation of B-trees. The string
B-tree was proposed by Ferragina and Grossi [30].
Search WWH ::




Custom Search