this case study because otherwise we are likely to pursue an approach that won't
work well.
For example, in the program we are about to write, we have to read in a file and
come up with a list of the words from the file that doesn't have any duplicates. One
approach would be to test each word as we read it in to see if it is in the list, as
described in the following pseudocode:
list = new empty list.
while (more words to process) {
    word = next word from file.
    if (list does not contain word) {
        add word to list.
    }
}
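The pseudocode above translates almost directly into Java. The following is a sketch rather than the book's own code; in particular, the Scanner-based input loop is an assumption about how the words arrive:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class UniqueWords {
    // Builds a list of unique words, calling contains once per word read.
    public static List<String> readUnique(Scanner input) {
        List<String> list = new ArrayList<>();
        while (input.hasNext()) {
            String word = input.next();
            if (!list.contains(word)) {  // linear scan of the list each time
                list.add(word);
            }
        }
        return list;
    }
}
```

Notice that every word read triggers a call on contains, which is exactly the cost discussed next.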
The problem with this approach is that it would require us to call the ArrayList
method contains each time the program executes the loop, and contains is
fairly expensive in terms of time. To find out whether a particular value is in the
list, the method performs a linear search, examining each element in turn. So as the
list becomes larger and larger, it becomes more and more expensive to search
through it to see if it contains a particular word.
We will run into a similar problem when we get to the second version of the pro-
gram and have to compute the overlap between the two lists. The simplest way to
compute the overlap would be to write a method like this:
overlap = new empty list.
for (each word in list1) {
    if (word is in list2) {
        add word to overlap.
    }
}
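As a sketch (again, not the book's code), the pseudocode corresponds to a Java method like this one, where each iteration pays for a linear scan of list2:

```java
import java.util.ArrayList;
import java.util.List;

public class Overlap {
    // Direct translation of the pseudocode: one contains call
    // (a linear scan of list2) per word in list1.
    public static List<String> overlap(List<String> list1, List<String> list2) {
        List<String> result = new ArrayList<>();
        for (String word : list1) {
            if (list2.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }
}
```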
This approach will again require calling the contains method for a list that could
potentially be very large. If both lists are large, then the approach will run particu-
larly slowly.
Both of these potential bottlenecks can be addressed by dealing with sorted lists.
In a sorted list of words, all of the duplicates are grouped together, which makes
them easier to spot. And looking for the overlap between two sorted lists is easier
than looking for overlap in two lists that are not ordered.
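To see why sorted lists make the overlap easier, here is one possible sketch of the idea (this particular merge-style method is an illustration, not code from the case study; it assumes both lists are sorted and duplicate-free). Instead of scanning all of list2 for each word, it walks both lists once:

```java
import java.util.ArrayList;
import java.util.List;

public class SortedOverlap {
    // Walks both sorted lists in a single pass, advancing whichever
    // index points at the smaller word; matching words go in the result.
    public static List<String> overlap(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {            // same word in both lists
                result.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {      // a's word is smaller; move past it
                i++;
            } else {                   // b's word is smaller; move past it
                j++;
            }
        }
        return result;
    }
}
```

Each word in either list is examined only once, which is far cheaper than a contains call per word.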
Of course, sorting isn't cheap either. It takes a nontrivial amount of time to sort a
list. But if we can manage to sort the list just once, it will turn out to be cheaper than
making all of those calls on the contains method.
Instead of trying to eliminate the duplicates as we read the words, we can just read
all of the words directly into the list. That way we won't make any expensive calls on
the contains method. After we have read everything in, we can put the list into
sorted order.
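One way to sketch this plan in Java (an illustration under the assumptions above, not the case study's final code) is to sort the list once and then remove duplicates in a single pass, relying on the fact that equal words sit next to each other in a sorted list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortAndDedupe {
    // Sorts once, then removes duplicates in one pass: in a sorted
    // list, equal words are adjacent, so each word only needs to be
    // compared with the most recently kept word.
    public static List<String> uniqueSorted(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        List<String> unique = new ArrayList<>();
        for (String word : sorted) {
            if (unique.isEmpty() || !unique.get(unique.size() - 1).equals(word)) {
                unique.add(word);
            }
        }
        return unique;
    }
}
```

The single sort replaces the repeated contains calls of the earlier pseudocode.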
 