versus an unsuccessful search is unusual in that, if λ < 2, the successful search
is more expensive than the unsuccessful search. This condition makes sense,
however, because many unsuccessful searches encounter an empty linked list.
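To make the crossover explicit (using the standard separate chaining estimates, which this excerpt assumes rather than restates): an unsuccessful search examines about λ nodes on average, while a successful search examines about 1 + λ/2 nodes, so the successful search is the more expensive one precisely when

    1 + λ/2 > λ,   that is, when λ < 2.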
A typical load factor is 1.0; a lower load factor does not significantly
enhance performance, but it costs extra space. The appeal of separate chaining
hashing is that performance is not affected by a moderately increasing load
factor; thus rehashing can be avoided. For languages that do not allow
dynamic array expansion, this consideration is significant. Furthermore, the
expected number of probes for a search is less than in quadratic probing, particularly for unsuccessful searches.
We can implement separate chaining hashing by using our existing linked list classes. However, because the header node adds space overhead and is not really needed, if space were at a premium we could elect not to reuse components and instead implement a simple stacklike list. The coding effort turns out to be remarkably light. Also, the space overhead is essentially one reference per node, plus an additional reference per list; for example, when the load factor is 1.0, it is two references per item. This overhead could be important in other programming languages if the size of an item is large. In that case, we have the same trade-offs as with the array and linked list implementations of stacks. The Java Collections API uses separate chaining hashing, with a default load factor of 0.75.
To illustrate the complexity (or rather, the relative lack of complexity) of the separate chaining hash table, Figure 20.20 provides a short sketch of the basic implementation of separate chaining hash tables. It avoids issues such as rehashing, does not implement remove, and does not even keep track of the current size. Nonetheless, it shows the basic logic of add and contains, both of which use the hash code to select an appropriate singly linked list.
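In the same spirit, here is a minimal sketch of such a table (it is not the book's Figure 20.20; the class name ChainingHashTable, the default table size, and the other identifiers are illustrative assumptions). Like the figure it stands in for, it omits rehashing, remove, and size tracking, and supports only add and contains.

// A minimal separate chaining sketch; omits rehashing, remove, and size tracking.
public class ChainingHashTable<T> {
    private static final int DEFAULT_TABLE_SIZE = 101; // illustrative choice

    private Node<T>[] lists; // each slot is the head of a singly linked list

    @SuppressWarnings("unchecked")
    public ChainingHashTable() {
        lists = new Node[DEFAULT_TABLE_SIZE];
    }

    // Map an item's hashCode onto a valid array index.
    private int myHash(T x) {
        int hashVal = x.hashCode() % lists.length;
        if (hashVal < 0)
            hashVal += lists.length;
        return hashVal;
    }

    // Add x to the front of the list its hash code selects, if not already present.
    public void add(T x) {
        if (contains(x))
            return;
        int slot = myHash(x);
        lists[slot] = new Node<>(x, lists[slot]);
    }

    // Scan only the one list selected by the hash code.
    public boolean contains(T x) {
        for (Node<T> p = lists[myHash(x)]; p != null; p = p.next)
            if (p.data.equals(x))
                return true;
        return false;
    }

    // Simple stacklike node with no header node, as the text suggests.
    private static class Node<T> {
        T data;
        Node<T> next;

        Node(T data, Node<T> next) {
            this.data = data;
            this.next = next;
        }
    }
}

Inserting at the front of the selected list keeps add cheap once the slot is chosen, which is exactly the stacklike behavior described above.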
For separate chaining hashing, a reasonable load factor is 1.0. A lower load factor does not significantly improve performance; a moderately higher load factor is acceptable and can save space.
20.6 Hash Tables versus Binary Search Trees
We can also use binary search trees to implement insert and find operations. Although the resulting average time bounds are O(log N), binary search trees also support routines that require order and thus are more powerful. Using a hash table, we cannot efficiently find the minimum element or extend the table to allow computation of an order statistic. We cannot search efficiently for a string unless the exact string is known. A binary search tree could quickly find all items in a certain range, but this capability is not supported by a hash table. Furthermore, the O(log N) bound is not necessarily that much more than O(1), especially since no multiplications or divisions are required by search trees.
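To make the contrast concrete, the following sketch uses the Java Collections API: TreeMap (a red-black search tree) supports order-based queries such as firstKey and subMap, while HashMap answers only exact-match lookups efficiently. The sample data and names are illustrative only.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> tree = new TreeMap<>();
        Map<String, Integer> hash = new HashMap<>();

        for (String s : new String[] { "apple", "banana", "cherry", "date" }) {
            tree.put(s, s.length());
            hash.put(s, s.length());
        }

        // Order-based queries come naturally to a search tree...
        System.out.println(tree.firstKey());             // smallest key: apple
        System.out.println(tree.subMap("b", "d"));       // every key in ["b", "d")

        // ...while a hash table supports only exact-match lookups efficiently.
        System.out.println(hash.containsKey("banana"));  // true
    }
}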
Use a hash table instead of a binary search tree if you do not need order statistics and are worried about nonrandom inputs.
 
 