Java Reference
In-Depth Information
a kind of “mishmashed,” or hashed, number. The number actually has no real significance
beyond its usefulness in storing and retrieving a particular data record.
A glitch in the scheme is called a collision —this occurs when two different keys “hash
into” the same cell (or element) in the array. We cannot store two values in the same space,
so we need to find an alternative home for all values beyond the first that hash to a partic-
ular array index. There are many schemes for doing this. One is to “hash again” (i.e., to
apply another hashing transformation to the key to provide a next candidate cell in the
array). The hashing process is designed to distribute the values throughout the table, so the
assumption is that an available cell will be found with just a few hashes.
Another scheme uses one hash to locate the first candidate cell. If that cell is occupied,
successive cells are searched in order until an available cell is found. Retrieval works the
same way: The key is hashed once to determine the initial location and check whether it
contains the desired data. If it does, the search is finished. If it does not, successive cells are
searched linearly until the desired data is found.
The most popular solution to hash-table collisions is to have each cell of the table be
a hash “bucket,” typically a linked list of all the key-value pairs that hash to that cell. This
is the solution that Java's Hashtable and HashMap classes (from package java.util ) imple-
ment. Both Hashtable and HashMap implement the Map interface. The primary differences
between them are that HashMap is unsynchronized (multiple threads should not modify a
HashMap concurrently) and allows null keys and null values.
A hash table's load factor affects the performance of hashing schemes . The load factor
is the ratio of the number of occupied cells in the hash table to the total number of cells
in the hash table. The closer this ratio gets to 1.0, the greater the chance of collisions.
Performance Tip 16.2
The load factor in a hash table is a classic example of a memory-space/execution-time
trade-off: By increasing the load factor, we get better memory utilization, but the program
runs slower, due to increased hashing collisions. By decreasing the load factor, we get better
program speed, because of reduced hashing collisions, but we get poorer memory utiliza-
tion, because a larger portion of the hash table remains empty.
Computer science students study hashing schemes in courses called “Data Structures”
and “Algorithms.” Classes Hashtable and HashMap enable you to use hashing without
having to implement hash-table mechanisms—a classic example of reuse. This concept is
profoundly important in our study of object-oriented programming. As discussed in ear-
lier chapters, classes encapsulate and hide complexity (i.e., implementation details) and
offer user-friendly interfaces. Properly crafting classes to exhibit such behavior is one of the
most valued skills in the field of object-oriented programming. Figure 16.18 uses a
HashMap to count the number of occurrences of each word in a string.
1
// Fig. 16.18: WordTypeCount.java
2
// Program counts the number of occurrences of each word in a String.
3
import java.util.Map;
4
import java.util.HashMap;
5
import java.util.Set;
6
import java.util.TreeSet;
Fig. 16.18 | Program counts the number of occurrences of each word in a String . (Part 1 of 3.)
 
Search WWH ::




Custom Search