Database Reference
In-Depth Information
is not space ecient. A more space-ecient implementation of a trie index
is the PATRICIA index. 47 , 58 PATRICIA stands for practical algorithm to
retrieve information coded in alphanumeric. Tries are the fundamental index
schemes for string-oriented databases, for example, very large text databases
used in information-retrieval problems, genome sequence databases, and bio-
informatics. Tries can be perceived as generic index methods with variants
such as Su x-Trees , String B-Trees , and Burst Tries . 31 , 37 , 39 , 43 , 47 , 99
6.4.3 Hashing Schemes
Hashing methods are mechanisms for accessing records by the address of a
record. The address is computed using a key of the record, usually the primary
key or some unique combination of attribute values of the record. The method
involves a set of n disk blocks, also termed buckets , with index values numbered
from 0 to n
1. A bucket is a unit of storage containing one or more records.
For each record
k i , r i
whose key is k i and entire record is r i , a hash function
H
()
is used to compute the bucket address I where the record is stored, that
is, I
. The result I of computing the hash function forms the index
into the array of buckets or data blocks that hold the records. A hashing
scheme can either be static or dynamic .
In static hashing, the number of buckets is preallocated. Records with dif-
ferent search-key values may map to the same bucket, in which case we say
that a collision occurs. To locate a record in a bucket with multiple keys, the
entire bucket has to be searched sequentially. Occasionally a bucket may have
more records that hash into it than it can contain. When this occurs it is
handled by invoking some overflow handling methods. Typical overflow han-
dling methods include separate chaining, rehashing, coalesced chaining, and
so forth. To minimize the occurrence of overflow, a hash function has to be
chosen to distribute the keys uniformly over the allocated buckets. Hashing
methods on single attributes are discussed extensively in the literature. 25 , 47 , 79
An alternative to static hashing is dynamic hashing. Dynamic hashing uses
a dynamically changing function that allows the addressed space, that is, the
number of allocated buckets, to grow and shrink with insertions and dele-
tions of the records, respectively. It embeds the handling of overflow records
dynamically into its primary address space to avoid explicit management of
overflow buckets. Various techniques for dynamic hashing have been proposed.
Notable among these are dynamic hashing, 51 linear hashing, 54 and extendible
hashing. 30 , 68
=
H
(
k i )
6.5 Multidimensional Index Schemes
Multidimensional indexing arises from the need for ecient query processing
in numerous application domains where objects are generally characterized
by feature vectors . They arise naturally from representation and querying of
Search WWH ::




Custom Search