sorted order. When we do that, all of the duplicates will appear right next to one
another, so we can fairly easily get rid of them.
Reading all of the words into the list and then eliminating duplicates will
require more memory than eliminating the duplicates as we go, but it will end up
running faster because the only expensive operation we will have is the sorting
step. This is a classic tradeoff between running time and memory that comes up
often in computer science. We can make programs run faster if we're willing to
use more memory or we can limit memory if we don't mind having the program
take longer to run.
Our approach to building up the list of words, then, will be as follows:
list = new empty list.
while (there are more words to process) {
    add word to list.
}
sort list.
eliminate duplicates.
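In Java, the first part of that pseudocode might look roughly like the sketch below. The method name readSortedWords, the use of a Scanner over a File, and the file name parameter are illustrative assumptions, not the final code we will develop:

import java.io.*;
import java.util.*;

// sketch: read every word from the given file into a list, then sort it
// (duplicates are still present at this point; they are removed later)
public static List<String> readSortedWords(String fileName)
        throws FileNotFoundException {
    List<String> words = new ArrayList<String>();
    Scanner input = new Scanner(new File(fileName));
    while (input.hasNext()) {      // while there are more words to process
        words.add(input.next());   // add word to list
    }
    Collections.sort(words);       // sort list
    return words;
}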
The task of eliminating duplicates also brings up an efficiency consideration. One
obvious approach would be the following:
for (each word in list) {
    if (word is a duplicate) {
        remove word from list.
    }
}
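For concreteness, here is one way that remove-based approach could be written in Java, assuming the sorted words list from the earlier sketch. (Java requires an Iterator, or an index-based loop, to remove elements safely during traversal; on a sorted list, a duplicate is simply any word equal to the word just before it.)

Iterator<String> it = words.iterator();
String prev = null;
while (it.hasNext()) {
    String word = it.next();
    if (word.equals(prev)) {   // word is a duplicate
        it.remove();           // expensive: shifts all later elements left
    } else {
        prev = word;
    }
}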
It turns out that remove is another expensive operation. A better approach is to
simply build up a new list that doesn't have duplicates:
result = new empty list.
for (each word in list) {
    if (word is not a duplicate) {
        add word to result.
    }
}
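In Java, the build-a-new-list version might look like the following sketch (again assuming a sorted words list; the variable names are illustrative):

// keep only the first copy of each word from the sorted list
List<String> result = new ArrayList<String>();
String prev = null;
for (String word : words) {
    if (!word.equals(prev)) {   // word is not a duplicate
        result.add(word);       // appending at the end is cheap
    }
    prev = word;
}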
This code runs faster because the method that adds a word at the end of the list runs
very fast compared with the method that removes a word from the middle of the list.
As we write the actual code, we will refine the pseudocode presented here, but at
least we have an idea of the approach we're going to take. For each file, we'll read
all of the words into a list and sort it once. Then we'll use each sorted list to build
up a list that has no duplicates. Finally, we'll use those two sorted, duplicate-free
lists to look for the overlap between them.
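One reasonable way to find that overlap, once both lists are sorted and duplicate-free, is a merge-style walk over the two lists at the same time. The sketch below (with assumed names list1 and list2) shows the idea; it is one possible technique, not necessarily the one the finished program will use:

// walk two sorted, duplicate-free lists in parallel, collecting
// the words that appear in both
List<String> overlap = new ArrayList<String>();
int i = 0;
int j = 0;
while (i < list1.size() && j < list2.size()) {
    int compare = list1.get(i).compareTo(list2.get(j));
    if (compare == 0) {           // same word appears in both lists
        overlap.add(list1.get(i));
        i++;
        j++;
    } else if (compare < 0) {     // list1's word comes first; skip it
        i++;
    } else {                      // list2's word comes first; skip it
        j++;
    }
}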
 