Searching - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

Records within a slot's list can be ordered in several ways: by insertion order,

by key value order, or by frequency-of-access order. Ordering the list by key value

provides an advantage in the case of an unsuccessful search, because we know to

stop searching the list once we encounter a key that is greater than the one being

searched for. If records on the list are unordered or ordered by frequency, then an

unsuccessful search will need to visit every record on the list.

Given a table of size M storing N records, the hash function will (ideally)

spread the records evenly among the M positions in the table, yielding on average

N=M records for each list. Assuming that the table has more slots than there are

records to be stored, we can hope that few slots will contain more than one record.

In the case where a list is empty or has only one record, a search requires only one

access to the list. Thus, the average cost for hashing should be (1). However, if

clustering causes many records to hash to only a few of the slots, then the cost to

access a record will be much higher because many elements on the linked list must

be searched.

Open hashing is most appropriate when the hash table is kept in main memory,

with the lists implemented by a standard in-memory linked list. Storing an open

hash table on disk in an efficient way is difficult, because members of a given

linked list might be stored on different disk blocks. This would result in multiple

disk accesses when searching for a particular key value, which defeats the purpose

of using hashing.

There are similarities between open hashing and Binsort. One way to view

open hashing is that each record is simply placed in a bin. While multiple records

may hash to the same bin, this initial binning should still greatly reduce the number

of records accessed by a search operation. In a similar fashion, a simple Binsort

reduces the number of records in each bin to a small number that can be sorted in

some other way.

9.4.3

Closed Hashing

Closed hashing stores all records directly in the hash table. Each record R with key

value k R has a home position that is h(k R ), the slot computed by the hash function.

If R is to be inserted and another record already occupies R's home position, then

R will be stored at some other slot in the table. It is the business of the collision

resolution policy to determine which slot that will be. Naturally, the same policy

must be followed during search as during insertion, so that any record not found in

its home position can be recovered by repeating the collision resolution process.

Bucket Hashing

One implementation for closed hashing groups hash table slots into buckets. The

M slots of the hash table are divided into B buckets, with each bucket consisting

Search WWH ::

Custom Search

Home