Hash tables - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

The worst case for hashing generally results from an implementation

error, whereas sorted input can make binary search trees perform poorly. Bal-

anced search trees are quite expensive to implement. Hence, if no ordering

information is required and there is any suspicion that the input might be

sorted, hashing is the data structure of choice.

hashing applications

20.7

Hashing applications are abundant. Compilers use hash tables to keep track of

declared variables in source code. The data structure is called a symbol table .

Hash tables are the ideal application for this problem because only insert and

find operations are performed. Identifiers are typically short, so the hash func-

tion can be computed quickly. In this application, most searches are successful.

Another common use of hash tables is in game programs. As the program

searches through different lines of play, it keeps track of positions that it has

encountered by computing a hash function based on the position (and storing

its move for that position). If the same position recurs, usually by a simple

transposition of moves, the program can avoid expensive recomputation. This

general feature of all game-playing programs is called the transposition table .

We discussed this feature in Section 10.2, where we implemented the tic-tac-

toe algorithm.

A third use of hashing is in online spelling checkers. If misspelling

detection (as opposed to correction) is important, an entire dictionary can be

prehashed and words can be checked in constant time. Hash tables are well

suited for this purpose because the words do not have to be alphabetized.

Printing out misspellings in the order they occurred in the document is

acceptable.

Hashing applica-

tions are abundant.

summary

Hash tables can be used to implement the insert and find operations in con-

stant average time. Paying attention to details such as load factor is especially

important in the use of hash tables; otherwise, the constant time bounds are

not meaningful. Choosing the hash function carefully is also important when

the key is not a short string or integer. You should pick an easily computable

function that distributes well.

For separate chaining hashing, the load factor is typically close to 1,

although performance does not significantly degrade unless the load factor

becomes very large. For quadratic probing, the table size should be prime and

the load factor should not exceed 0.5. Rehashing should be used for quadratic

probing to allow the table to grow and maintain the correct load factor. This

Search WWH ::

Custom Search

Home