Generic Collections - Java: How to Program Early Objects

Java Reference

In-Depth Information

a kind of “mishmashed,” or hashed, number. The number actually has no real significance

beyond its usefulness in storing and retrieving a particular data record.

A glitch in the scheme is called a collision —this occurs when two different keys “hash

into” the same cell (or element) in the array. We cannot store two values in the same space,

so we need to find an alternative home for all values beyond the first that hash to a partic-

ular array index. There are many schemes for doing this. One is to “hash again” (i.e., to

apply another hashing transformation to the key to provide a next candidate cell in the

array). The hashing process is designed to distribute the values throughout the table, so the

assumption is that an available cell will be found with just a few hashes.

Another scheme uses one hash to locate the first candidate cell. If that cell is occupied,

successive cells are searched in order until an available cell is found. Retrieval works the

same way: The key is hashed once to determine the initial location and check whether it

contains the desired data. If it does, the search is finished. If it does not, successive cells are

searched linearly until the desired data is found.

The most popular solution to hash-table collisions is to have each cell of the table be

a hash “bucket,” typically a linked list of all the key-value pairs that hash to that cell. This

is the solution that Java's Hashtable and HashMap classes (from package java.util ) imple-

ment. Both Hashtable and HashMap implement the Map interface. The primary differences

between them are that HashMap is unsynchronized (multiple threads should not modify a

HashMap concurrently) and allows null keys and null values.

A hash table's load factor affects the performance of hashing schemes . The load factor

is the ratio of the number of occupied cells in the hash table to the total number of cells

in the hash table. The closer this ratio gets to 1.0, the greater the chance of collisions.

Performance Tip 16.2

The load factor in a hash table is a classic example of a memory-space/execution-time

trade-off: By increasing the load factor, we get better memory utilization, but the program

runs slower, due to increased hashing collisions. By decreasing the load factor, we get better

program speed, because of reduced hashing collisions, but we get poorer memory utiliza-

tion, because a larger portion of the hash table remains empty.

Computer science students study hashing schemes in courses called “Data Structures”

and “Algorithms.” Classes Hashtable and HashMap enable you to use hashing without

having to implement hash-table mechanisms—a classic example of reuse. This concept is

profoundly important in our study of object-oriented programming. As discussed in ear-

lier chapters, classes encapsulate and hide complexity (i.e., implementation details) and

offer user-friendly interfaces. Properly crafting classes to exhibit such behavior is one of the

most valued skills in the field of object-oriented programming. Figure 16.18 uses a

HashMap to count the number of occurrences of each word in a string.

1

// Fig. 16.18: WordTypeCount.java

2

// Program counts the number of occurrences of each word in a String.

3

import java.util.Map;

4

import java.util.HashMap;

5

import java.util.Set;

6

import java.util.TreeSet;

Fig. 16.18 | Program counts the number of occurrences of each word in a String . (Part 1 of 3.)

Java: How to Program Early Objects

Search WWH ::

Custom Search

Home