Database Reference
In-Depth Information
The technique has three significant disadvantages that discourage widespread use as
a primary hash function:
•
It offers no guarantee against collision (though it can be
minimized by carefully choosing the divisor).
•
It does not perform very well in attempting to distribute keys
evenly across the address space.
•
As the size of the data set increases, so does the likelihood of
collision.
A2.2.4 Mid-Square
In the mid-square technique, the key value is squared, and then a specific number of
digits is extracted from the “middle” of the result to yield the hash address. If an address
space of N digits is desired, then digits are truncated at both ends of the squared key
value, leaving N digits from the middle.
As an example, suppose that we need a 4-digit address, and we have a key value of
7895. The address could be determined as follows:
Like the division-remainder technique, when a record is to be stored or retrieved,
its address is first determined via the hash function.
The technique is a slight improvement over the division-remainder technique,
providing the following advantages:
•
It is simple and easy to implement.
•
It is very efficient for small files as well as large files.
•
It performs reasonably well in attempting to distribute the keys
evenly over the address space.
•
It avoids using two physical files for a single data file.
The technique has one significant disadvantage: It offers no guarantee against
collision (but the performance is better than the division-remainder technique).
A2.2.5 Folding
The folding technique involves partitioning the key into several parts and combining
the parts in a convenient way (via multiplication, addition or subtraction) to obtain a
hash address.