Java Reference
In-Depth Information
with key k can be stored in HT[k], and the hash function is simply h(k) = k. To
find the record with key value k, simply look in HT[k].
Typically, there are many more values in the key range than there are slots in
the hash table. For a more realistic example, suppose that the key can take any
value in the range 0 to 65,535 (i.e., the key is a two-byte unsigned integer), and that
we expect to store approximately 1000 records at any given time. It is impractical
in this situation to use a hash table with 65,536 slots, because most of the slots will
be left empty. Instead, we must devise a hash function that allows us to store the
records in a much smaller table. Because the possible key range is larger than the
size of the table, at least some of the slots must be mapped to from multiple key
values. Given a hash function h and two keys k 1 and k 2 , if h(k 1 ) = = h(k 2 )
where is a slot in the table, then we say that k 1 and k 2 have a collision at slot
under hash function h.
Finding a record with key value K in a database organized by hashing follows
a two-step procedure:
1. Compute the table location h(K).
2. Starting with slot h(K), locate the record containing key K using (if neces-
sary) a collision resolution policy.
9.4.1
Hash Functions
Hashing generally takes records whose key values come from a large range and
stores those records in a table with a relatively small number of slots. Collisions
occur when two records hash to the same slot in the table. If we are careful—or
lucky—when selecting a hash function, then the actual number of collisions will
be few. Unfortunately, even under the best of circumstances, collisions are nearly
unavoidable. 1 For example, consider a classroom full of students. What is the
probability that some pair of students shares the same birthday (i.e., the same day
of the year, not necessarily the same year)? If there are 23 students, then the odds
are about even that two will share a birthday. This is despite the fact that there are
365 days in which students can have birthdays (ignoring leap years), on most of
which no student in the class has a birthday. With more students, the probability
of a shared birthday increases.
The mapping of students to days based on their
1 The exception to this is perfect hashing. Perfect hashing is a system in which records are
hashed such that there are no collisions. A hash function is selected for the specific set of records
being hashed, which requires that the entire collection of records be available before selecting the
hash function. Perfect hashing is efficient because it always finds the record that we are looking
for exactly where the hash function computes it to be, so only one access is required. Selecting a
perfect hash function can be expensive, but might be worthwhile when extremely efficient search
performance is required. An example is searching for data on a read-only CD. Here the database will
never change, the time for each access is expensive, and the database designer can build the hash
table before issuing the CD.
Search WWH ::




Custom Search