Database Reference
In-Depth Information
und
e
nd
o
f
eed
d
o
und
od
ed
d
und
d
od
und
Fig. 5.3 The sux tree for the strings 'food', 'feed', 'fund', and 'found'.
instantiated in the worst case, as opposed to one node for every string
in the sux tree in the worst case.)
The size of the sux tree can be reduced even further by construct-
ing what is called a sux array. The strings in S are concatenated into
one large string (using a special terminating character # Σ for every
string) and the position of every sux is sorted in an array according
to the lexicographic ordering of the sux. The resulting sorted array
of positions can be searched using binary search on the sux pointed
to by each element in the array. The size of the sux array (since
it contains only sux positions) is very small. The drawback of the
su x array is that the data strings need to be readily available for
comparison purposes, resulting in accesses with poor locality of refer-
ence. Advanced algorithms for improving the eciency of sux arrays
have been proposed, but the structure eventually resembles an inverted
index built over all suxes in S .
Sux tries, sux trees, and sux arrays were designed to be used
primarily for main memory processing. Fairly recently, a variety of
alternatives have been specifically designed for external memory pro-
cessing. Cache-conscious alternatives have also been proposed.
5.2.3
B-trees
B-trees are used for indexing one-dimensional data in secondary stor-
age. A B-tree is a multiway tree that contains n values k 1
k n
that act as comparison keys and n + 1 pointers p 1 ,...,p n +1 , per node.
Pointer p i points to a child node that is the root of a sub-tree con-
taining all data values that lie in the interval ( k i− 1 ,k i ]. Data values
≤ ··· ≤
Search WWH ::




Custom Search