17.7 Creating a Stream<String> from a File
Figure 17.17 uses lambdas and streams to count the occurrences of each word in a file,
then displays a summary of the words in alphabetical order, grouped by starting letter.
This is commonly called a concordance (http://en.wikipedia.org/wiki/Concordance_(publishing)).
Concordances are often used to analyze published works. For example, concordances of
William Shakespeare's and Christopher Marlowe's works have been used to question whether
they are the same person. Figure 17.18 shows the program's output. Line 16 of Fig. 17.17
creates a regular expression Pattern that we'll use to split lines of text into their
individual words. This Pattern represents one or more consecutive white-space characters.
(We introduced regular expressions in Section 14.7.)
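To make the splitting step concrete, here is a minimal sketch that applies the same white-space Pattern to a single line. The class name SplitWordsSketch and the helper splitWords are hypothetical names used only for illustration; they are not part of Fig. 17.17. The filter step is also an addition, assuming we want to drop the empty first token that a line with leading white space produces.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitWordsSketch {
    // Same regular expression as line 16 of Fig. 17.17:
    // one or more consecutive white-space characters
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper (not in the figure): split one line into its words
    public static List<String> splitWords(String line) {
        return WHITESPACE.splitAsStream(line)
                         // assumption: discard empty tokens, such as the one
                         // produced when the line starts with white space
                         .filter(word -> !word.isEmpty())
                         .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // spaces and a tab between words are all treated as separators
        System.out.println(splitWords("the  quick\tbrown fox"));
        // prints [the, quick, brown, fox]
    }
}
```

Because the pattern matches runs of white space rather than single characters, two spaces or a tab between words still yield one separator, not empty words in between.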
 1  // Fig. 17.17: StreamOfLines.java
 2  // Counting word occurrences in a text file.
 3  import java.io.IOException;
 4  import java.nio.file.Files;
 5  import java.nio.file.Paths;
 6  import java.util.Map;
 7  import java.util.TreeMap;
 8  import java.util.regex.Pattern;
 9  import java.util.stream.Collectors;
10
11  public class StreamOfLines
12  {
13     public static void main(String[] args) throws IOException
14     {
15        // Regex that matches one or more consecutive whitespace characters
16        Pattern pattern = Pattern.compile("\\s+");
17
18        // count occurrences of each word in a Stream<String> sorted by word
19        Map<String, Long> wordCounts =
20           Files.lines(Paths.get("Chapter2Paragraph.txt"))
21                .map(line -> line.replaceAll("(?!')\\p{P}", ""))
22                .flatMap(line -> pattern.splitAsStream(line))
23                .collect(Collectors.groupingBy(String::toLowerCase,
24                   TreeMap::new, Collectors.counting()));
25
26        // display the words grouped by starting letter
27        wordCounts.entrySet()
28           .stream()
29           .collect(
30              Collectors.groupingBy(entry -> entry.getKey().charAt(0),
31                 TreeMap::new, Collectors.toList()))
32           .forEach((letter, wordList) ->
33              {
34                 System.out.printf("%n%C%n", letter);
35                 wordList.stream().forEach(word -> System.out.printf(
36                    "%13s: %d%n", word.getKey(), word.getValue()));
37              });
38     }
39  } // end class StreamOfLines
Fig. 17.17 | Counting word occurrences in a text file.
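The counting pipeline in the figure can also be tried without a file. The sketch below runs the same map, flatMap and collect steps over an in-memory Stream<String>. The class name WordCountSketch, the helper countWords and the sample sentences are hypothetical, chosen only for illustration. The filter step is an addition not found in the figure, assuming that empty tokens (for example, from blank lines) should not be counted.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {
    // one or more consecutive white-space characters, as in the figure
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: the figure's counting pipeline over any Stream<String>
    public static Map<String, Long> countWords(Stream<String> lines) {
        return lines
            // remove every punctuation character except the apostrophe
            .map(line -> line.replaceAll("(?!')\\p{P}", ""))
            // turn each line into a stream of its words
            .flatMap(line -> WHITESPACE.splitAsStream(line))
            // assumption (not in the figure): skip empty tokens
            .filter(word -> !word.isEmpty())
            // lowercase keys, kept sorted by the TreeMap, with a count per key
            .collect(Collectors.groupingBy(String::toLowerCase,
                TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(
            Stream.of("The cat, the dog.", "", "The CAT's hat!"));
        System.out.println(counts);
        // prints {cat=1, cat's=1, dog=1, hat=1, the=3}
    }
}
```

To count the words of a real file, one option is to pass Files.lines(path) to countWords inside a try-with-resources statement, so that the underlying file is closed when the stream is done.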
 
 