Database Reference
In-Depth Information
The generalization of Example 1.4 is that when hash-keys are integers, chosing B so it
has any common factor with all (or even most of) the possible hash-keys will result in non-
random distribution into buckets. Thus, it is normally preferred that we choose B to be a
prime. That choice reduces the chance of nonrandom behavior, although we still have to
consider the possibility that all hash-keys have B as a factor. Of course there are many other
types of hash functions not based on modular arithmetic. We shall not try to summarize the
options here, but some sources of information will be mentioned in the bibliographic notes.
What if hash-keys are not integers? In a sense, all data types have values that are com-
posed of bits, and sequences of bits can always be interpreted as integers. However, there
are some simple rules that enable us to convert common types to integers. For example,
if hash-keys are strings, convert each character to its ASCII or Unicode equivalent, which
can be interpreted as a small integer. Sum the integers before dividing by B . As long as
B is smaller than the typical sum of character codes for the population of strings, the dis-
tribution into buckets will be relatively uniform. If B is larger, then we can partition the
characters of a string into groups of several characters each. Treat the concatenation of the
codes for the characters of a group as a single integer. Sum the integers associated with all
the groups of a string, and divide by B as before. For instance, if B is around a billion, or
2 30 , then grouping characters four at a time will give us 32-bit integers. The sum of several
of these will distribute fairly evenly into a billion buckets.
For more complex data types, we can extend the idea used for converting strings to in-
tegers, recursively.
• For a type that is a record, each of whose components has its own type, recursively
convert the value of each component to an integer, using the algorithm appropriate
for the type of that component. Sum the integers for the components, and convert
the integer sum to buckets by dividing by B .
• For a type that is an array, set, or bag of elements of some one type, convert the
values of the elements' type to integers, sum the integers, and divide by B .
1.3.3
Indexes
An index is a data structure that makes it efficient to retrieve objects given the value of one
or more elements of those objects. The most common situation is one where the objects
are records, and the index is on one of the fields of that record. Given a value v for that
field, the index lets us retrieve all the records with value v in that field. For example, we
could have a file of (name, address, phone) triples, and an index on the phone field. Given
a phone number, the index allows us to find quickly the record or records with that phone
number.
Search WWH ::




Custom Search