Accelerating Queries on Very Large Datasets - Scientific Data Management

Database Reference

In-Depth Information

I0

<x0,I1> <x4, I2>

I1

I2

<x0,B0> <x2, B1>

<x4,B2> <x6, B3>

B0

B1

B2

B3

x0{15}, x1{5,6}

x2{12,14}, x3{3,4}

x4{10}, x5{8,9,15}

x6{1,2}, x7{7,13}

Figure 6.3 Illustrative tree-structured index by ISAM organization. The

data blocks are labeled I0,

...

, I2, B0,

...

, B3.

given a file of N data items. Note that we are concerned with

indexing very large datasets so that in practice b

O

(

log b (

N

/

b

))

2. In this scheme the

ISAM index was mapped directly to the layout of data on a disk storage where

the root level index searches give the proper cylinder number of the record.

The first track of each cylinder gives the track number, and this corresponds

to the second level of indexing for searches. At the lowest level, this corre-

sponds to the locations of records within a track. A sequential scan at this

level is used to locate the records. The ISAM index method is a static index

method. Subsequent insertions require the records to be managed as overflow

records that are periodically merged by reorganizing the entire ISAM index.

The indexed sequential organization illustrates the structural characteristic

of tree-structured index schemes. To circumvent the static limitations of the

ISAM, the B -Tree indexing scheme was developed. Detailed coverage of the

B -Tree and its variants such as B + -Tree, B ∗ -Tree, and Prefix- B -Tree is given in

References 9, 24, 25, and 47. The B -Tree is a dynamic, height-balanced index

scheme that grows and shrinks by recursively splitting and merging nodes

from the lowest level of the index tree up to the root node. The VSAM file

organization 55

>>

is a B + -Tree that is mapped directly to the layout of data on

disk storage.

Tree-structured index schemes typically maintain the fixed-sized keys, such

as integers or fixed-length character strings, in the index nodes. When the

keys are variable-length strings, then rather than storing the entire keys, only

sucient strings of leading characters that form separator keys are stored.

This is the basic idea of the prefix- B -Tree. An alternative index method for

long alphabetic strings is the use of a trie . 39 , 47 , 79 , 99 Suppose the keys are formed

from a domain of alphabet set

. Each node of the trie-

index at level i is comprised of all occurrences of the distinct i th characters of

keys with the same i

with cardinality

| |

−

1 prefix string. A node in a trie index structure has size

of at most

entries, where each entry is pair of a character and a pointer

to the lower level node.

A special kind of trie, called the sux tree, 37 , 39 , 43 , 47 can be used to index

all suxes in a text in order to carry out fast full or partial text searches. A

basic trie implementation has the disadvantage of having a single path that

| |

Scientific Data Management

Search WWH ::

Custom Search

Home