Databases Reference
In-Depth Information
What if hash-keys are not integers? In a sense, all data types have values
that are composed of bits, and sequences of bits can always be interpreted as in-
tegers. However, there are some simple rules that enable us to convert common
types to integers. For example, if hash-keys are strings, convert each character
to its ASCII or Unicode equivalent, which can be interpreted as a small inte-
ger. Sum the integers before dividing by B. As long as B is smaller than the
typical sum of character codes for the population of strings, the distribution
into buckets will be relatively uniform. If B is larger, then we can partition the
characters of a string into groups of several characters each. Treat the concate-
nation of the codes for the characters of a group as a single integer. Sum the
integers associated with all the groups of a string, and divide by B as before.
For instance, if B is around a billion, or 2 30 , then grouping characters four at
a time will give us 32-bit integers. The sum of several of these will distribute
fairly evenly into a billion buckets.
For more complex data types, we can extend the idea used for converting
strings to integers, recursively.
•For a type that is a record, each of whose components has its own type,
recursively convert the value of each component to an integer, using the
algorithm appropriate for the type of that component. Sum the integers
for the components, and convert the integer sum to buckets by dividing
by B.
•For a type that is an array, set, or bag of elements of some one type,
convert the values of the elements' type to integers, sum the integers, and
divide by B.
1.3.3
Indexes
An index is a data structure that makes it e cient to retrieve objects given the
value of one or more elements of those objects. The most common situation
is one where the objects are records, and the index is on one of the fields
of that record. Given a value v for that field, the index lets us retrieve all
the records with value v in that field. For example, we could have a file of
(name, address, phone) triples, and an index on the phone field. Given a phone
number, the index allows us to find quickly the record or records with that
phone number.
There are many ways to implement indexes, and we shall not attempt to
survey the matter here. The bibliographic notes give suggestions for further
reading. However, a hash table is one simple way to build an index. The field
or fields on which the index is based form the hash-key for a hash function.
Records have the hash function applied to value of the hash-key, and the record
itself is placed in the bucket whose number is determined by the hash function.
The bucket could be a list of records in main-memory, or a disk block, for
example.
Search WWH ::




Custom Search