expression isn't quite right, though, because there might be many such characters
in a row. For example, there might be several spaces, dashes, or other punctuation
characters separating two words. We can indicate that a sequence of illegal characters
should also be ignored by putting a plus after the square brackets to indicate “Any
sequence of one or more of these characters”:
    [^a-zA-Z']+
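As a quick check of why the plus matters, the sketch below feeds the same text to String.split with and without it. String.split accepts the same regular-expression syntax as useDelimiter, so it stands in for the Scanner here. The class name RegexDemo, its helper methods, and the sample sentence are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: compares the character class with and
// without the trailing plus, using String.split as a stand-in.
public class RegexDemo {
    // Exactly one character that is not a letter or an apostrophe.
    static final String ONE = "[^a-zA-Z']";
    // A run of one or more such characters.
    static final String RUN = "[^a-zA-Z']+";

    static String[] splitOne(String text) {
        return text.split(ONE);
    }

    static String[] splitRun(String text) {
        return text.split(RUN);
    }

    public static void main(String[] args) {
        String text = "wait -- don't go!";
        // Each separator character is its own delimiter, so the four-character
        // run " -- " leaves three empty strings between its characters.
        System.out.println(Arrays.toString(splitOne(text)));
        // prints [wait, , , , don't, go]

        // With the plus, the whole run " -- " is consumed as one delimiter.
        System.out.println(Arrays.toString(splitRun(text)));
        // prints [wait, don't, go]
    }
}
```

The apostrophe sits inside the brackets, so "don't" stays in one piece in both cases.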
We pass this regular expression as a String to a call on useDelimiter. We can add this at the beginning of the getWords method:
    public static ArrayList<String> getWords(Scanner input) {
        input.useDelimiter("[^a-zA-Z']+");
        ...
    }
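To see the delimiter at work on a Scanner, here is a minimal sketch. It assumes a simplified, hypothetical version of getWords that does nothing but collect tokens, so it is not the full method from the program. The class name DelimiterDemo and the sample sentence are also assumptions. A Scanner built from a String tokenizes the same way as one built from a File, which makes the delimiter easy to try out without a text file.

```java
import java.util.ArrayList;
import java.util.Scanner;

// Hypothetical demo class with a simplified getWords that only
// collects tokens; the real method may do more work.
public class DelimiterDemo {
    public static ArrayList<String> getWords(Scanner input) {
        // Treat any run of characters other than letters and
        // apostrophes as a single delimiter.
        input.useDelimiter("[^a-zA-Z']+");
        ArrayList<String> words = new ArrayList<>();
        while (input.hasNext()) {
            words.add(input.next());
        }
        return words;
    }

    public static void main(String[] args) {
        // Scanner(String) reads from the string instead of a file.
        Scanner input = new Scanner("To be, or not to be -- that's it.");
        System.out.println(getWords(input));
        // prints [To, be, or, not, to, be, that's, it]
    }
}
```

The comma, the spaced dashes, and the final period are all consumed as delimiters. The contraction "that's" survives as one word because the apostrophe is part of the character class.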
The following is a complete program that incorporates all of these changes and
includes more extensive commenting:
    // This program reads two text files and compares the
    // vocabulary used in each.

    import java.util.*;
    import java.io.*;

    public class Vocabulary3 {
        public static void main(String[] args)
                throws FileNotFoundException {
            Scanner console = new Scanner(System.in);
            giveIntro();

            System.out.print("file #1 name? ");
            Scanner in1 = new Scanner(new File(console.nextLine()));
            System.out.print("file #2 name? ");
            Scanner in2 = new Scanner(new File(console.nextLine()));
            System.out.println();

            ArrayList<String> list1 = getWords(in1);
            ArrayList<String> list2 = getWords(in2);
            ArrayList<String> common = getOverlap(list1, list2);

            reportResults(list1, list2, common);
        }
