Index Structures - Approximate String Processing

Database Reference

In-Depth Information

Fig. 5.4 A B-tree data structure for strings.

(or pointers to the actual data values) are stored in the leaves of the

tree. A simple B-tree structure is shown in Figure 5.4.

Assume that data values and keys are strings sorted in lexicographic

order. Given a query string v an exact string match query can be

answered by starting at the root of the B-tree and following the pointer

to the next level corresponding to the largest key s s.t. v ≤ s . We con-

tinue iteratively until we reach a leaf node. A simple scan of the leaf

node reveals whether v exists or not. An optimized specialization of

the B-tree specifically designed to index strings, called the string B-

tree, organizes each index node as a trie built over the keys of the node

(instead of a sorted list of keys), to reduce the computational cost. Spe-

cialized algorithms for answering approximate string matching queries

using B-trees will be covered in Section 8.

5.3 Related Work

Witten et al. [71], Baeza-Yates and Ribeiro-Neto [8], and Zobel and

Moffat [79] present very detailed discussions on inverted indexes. Baeza-

Yates and Ribeiro-Neto [8] also discuss sux trees and sux arrays.

Puglisi et al. [59] discuss all varieties of sux array construction algo-

rithms. Comer [24] gives a detailed explanation of B-trees. The string

B-tree was proposed by Ferragina and Grossi [30].

Search WWH ::

Custom Search

Home