Figure 17.17 uses lambdas and streams to count the occurrences of each word in a text file, then display a summary of the words in alphabetical order, grouped by starting letter. This is commonly called a concordance (http://en.wikipedia.org/wiki/Concordance_(publishing)).
Concordances are often used to analyze published works. For example, concordances of William Shakespeare's and Christopher Marlowe's works have been used to question whether they are the same person. Figure 17.18 shows the program's output. Line 16 of Fig. 17.17 creates a regular expression Pattern that we'll use to split lines of text into their individual words. This Pattern matches one or more consecutive white-space characters. (We introduced regular expressions in Section 14.7.)
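One way to see what this Pattern does is to call its splitAsStream method (available since Java SE 8) on a few sample strings. The class name SplitDemo and its helper method tokens are hypothetical, added only for illustration. The sketch also shows an edge case worth knowing: when a line begins with white space, the positive-width match at the start of the input produces an empty first token.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of Fig. 17.17
public class SplitDemo
{
   // same regex as line 16 of Fig. 17.17: one or more white-space characters
   private static final Pattern WHITESPACE = Pattern.compile("\\s+");

   // splits one line into tokens and collects them into a List
   static List<String> tokens(String line)
   {
      return WHITESPACE.splitAsStream(line).collect(Collectors.toList());
   }

   public static void main(String[] args)
   {
      // runs of spaces and tabs act as a single separator
      System.out.println(tokens("to be\t\tor   not")); // [to, be, or, not]

      // leading white space yields an empty first token
      System.out.println(tokens("  to be")); // [, to, be]
   }
}
```

An empty token like the one in the second call would reach the word-counting step as a "word" with no characters, which matters later when the program calls charAt(0) on each word.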
 1  // Fig. 17.17: StreamOfLines.java
 2  // Counting word occurrences in a text file.
 3  import java.io.IOException;
 4  import java.nio.file.Files;
 5  import java.nio.file.Paths;
 6  import java.util.Map;
 7  import java.util.TreeMap;
 8  import java.util.regex.Pattern;
 9  import java.util.stream.Collectors;
10
11  public class StreamOfLines
12  {
13     public static void main(String[] args) throws IOException
14     {
15        // Regex that matches one or more consecutive whitespace characters
16        Pattern pattern = Pattern.compile("\\s+");
17
18        // count occurrences of each word in a Stream<String> sorted by word
19        Map<String, Long> wordCounts =
20           Files.lines(Paths.get("Chapter2Paragraph.txt"))
21                .map(line -> line.replaceAll("(?!')\\p{P}", ""))
22                .flatMap(line -> pattern.splitAsStream(line))
23                .collect(Collectors.groupingBy(String::toLowerCase,
24                   TreeMap::new, Collectors.counting()));
25
26        // display the words grouped by starting letter
27        wordCounts.entrySet()
28           .stream()
29           .collect(
30              Collectors.groupingBy(entry -> entry.getKey().charAt(0),
31                 TreeMap::new, Collectors.toList()))
32           .forEach((letter, wordList) ->
33              {
34                 System.out.printf("%n%C%n", letter);
35                 wordList.stream().forEach(word -> System.out.printf(
36                    "%13s: %d%n", word.getKey(), word.getValue()));
37              });
38     }
39  } // end class StreamOfLines
Fig. 17.17 | Counting word occurrences in a text file.
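To experiment with the two pipelines without Chapter2Paragraph.txt, the following sketch runs the same steps on an in-memory List<String> instead of a file. The class ConcordanceSketch, its methods countWords and byLetter, and the sample lines are hypothetical. The sketch adds one step that Fig. 17.17 does not have: a filter that drops empty tokens. Without that filter, a line beginning with white space (or with punctuation followed by white space) would contribute an empty string as a "word," and calling charAt(0) on that empty key in the grouping step would throw a StringIndexOutOfBoundsException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical variant of Fig. 17.17 that reads from a List instead of a file
public class ConcordanceSketch
{
   private static final Pattern WHITESPACE = Pattern.compile("\\s+");

   // counts each lowercased word, sorted by word (first pipeline in Fig. 17.17)
   static Map<String, Long> countWords(List<String> lines)
   {
      return lines.stream()
         // remove every punctuation character except the apostrophe
         .map(line -> line.replaceAll("(?!')\\p{P}", ""))
         .flatMap(line -> WHITESPACE.splitAsStream(line))
         // extra step, not in Fig. 17.17: drop empty tokens
         .filter(word -> !word.isEmpty())
         .collect(Collectors.groupingBy(String::toLowerCase,
            TreeMap::new, Collectors.counting()));
   }

   // regroups the sorted counts by first letter (second pipeline in Fig. 17.17)
   static Map<Character, List<Map.Entry<String, Long>>> byLetter(
      Map<String, Long> wordCounts)
   {
      return wordCounts.entrySet()
         .stream()
         .collect(Collectors.groupingBy(entry -> entry.getKey().charAt(0),
            TreeMap::new, Collectors.toList()));
   }

   public static void main(String[] args)
   {
      List<String> lines = Arrays.asList(
         "  The cat's hat, the cat's bat.",
         "A bat!");

      // same display format as Fig. 17.17
      byLetter(countWords(lines)).forEach((letter, wordList) ->
      {
         System.out.printf("%n%C%n", letter);
         wordList.forEach(word -> System.out.printf(
            "%13s: %d%n", word.getKey(), word.getValue()));
      });
   }
}
```

For these sample lines, countWords returns {a=1, bat=2, cat's=2, hat=1, the=2}. The apostrophes in cat's survive because the negative lookahead (?!') prevents \p{P} from matching them. Also note that Files.lines, used in Fig. 17.17, returns a stream backed by an open file, so longer-running code would typically close it with a try-with-resources statement.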