Chapter 7. Parallel data processing and performance
This chapter covers
Processing data in parallel with parallel streams
Performance analysis of parallel streams
The fork/join framework
Splitting a stream of data using a Spliterator
In the last three chapters, you've seen how the new Stream interface lets you manipulate
collections of data in a declarative way. We also explained that the shift from external to internal
iteration enables the Java library to take control of how the elements of a stream are processed.
This approach relieves Java developers of having to explicitly implement the optimizations needed to
speed up the processing of collections of data. By far the most important benefit is the
possibility of executing a pipeline of operations on these collections that automatically makes
use of the multiple cores on your computer.
For instance, before Java 7, processing a collection of data in parallel was extremely
cumbersome. First, you needed to explicitly split the data structure containing your data into
subparts. Second, you needed to assign each of these subparts to a different thread. Third, you
needed to synchronize them appropriately to avoid race conditions, wait for the
completion of all threads, and finally combine the partial results. Java 7 introduced a framework
called fork/join to perform these operations more consistently and in a less error-prone way. We
explore this framework in section 7.2.
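To make the three manual steps above concrete, here is a minimal sketch of summing an array with hand-managed threads. It is an illustration only, not code from this chapter: the class name ManualParallelSum and the method sum are hypothetical, the method assumes nThreads is at least 1, and a lambda is used for brevity where code written before Java 8 would have needed an anonymous Runnable.

```java
// Hypothetical example (not from the chapter): summing an array with hand-managed threads.
class ManualParallelSum {

    // Assumes nThreads >= 1.
    static long sum(long[] numbers, int nThreads) {
        long[] partial = new long[nThreads];   // one result slot per thread: no shared counter to race on
        Thread[] workers = new Thread[nThreads];
        int chunk = (numbers.length + nThreads - 1) / nThreads;  // ceiling division

        // Steps 1 and 2: split the array into index ranges and give each range to its own thread.
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final int from = Math.min(numbers.length, t * chunk);
            final int to = Math.min(numbers.length, from + chunk);
            workers[t] = new Thread(() -> {    // lambda for brevity; pre-Java 8 code used an anonymous Runnable
                long s = 0;
                for (int i = from; i < to; i++) {
                    s += numbers[i];
                }
                partial[id] = s;
            });
            workers[t].start();
        }

        // Step 3: wait for every thread, then combine the partial results.
        long total = 0;
        for (int t = 0; t < nThreads; t++) {
            try {
                workers[t].join();             // join() also makes the worker's write to partial[t] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for worker " + t, e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        long[] numbers = new long[1_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;                // 1, 2, ..., 1000
        }
        System.out.println(sum(numbers, 4));   // prints 500500
    }
}
```

Even in this small case, the chunk arithmetic, the per-thread result slots, and the interrupt handling are all bookkeeping that has nothing to do with the sum itself.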
In this chapter, you'll discover how the Stream interface gives you the opportunity to execute
operations in parallel on a collection of data without much effort. It lets you declaratively turn a
sequential stream into a parallel one. Moreover, you'll see how Java can make this magic happen
or, more practically, how parallel streams work under the hood by employing the fork/join
framework introduced in Java 7. You'll also discover that it's important to know how parallel
streams work internally, because if you ignore this aspect, you could obtain unexpected (and
very likely wrong) results by misusing them.
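As a rough sketch of what this looks like in practice, the following compares a sequential sum with its parallel counterpart, and adds one example of misuse. The class and method names (ParallelSumSketch, sequentialSum, parallelSum, racySum) are hypothetical and chosen only for illustration; the stream calls themselves (LongStream.rangeClosed, parallel, sum, forEach) are standard library methods.

```java
import java.util.stream.LongStream;

// Hypothetical example (not from the chapter): sequential vs. parallel sum of 1..n.
class ParallelSumSketch {

    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // The only change is the call to parallel(); the library decides how to split the work.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Misuse: several threads update the same variable without synchronization.
    // The result is not deterministic and is usually wrong for large n.
    static long racySum(long n) {
        long[] total = {0};
        LongStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);  // data race
        return total[0];
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(sequentialSum(n));  // prints 500000500000
        System.out.println(parallelSum(n));    // prints 500000500000
        System.out.println(racySum(n));        // may print a different value on each run
    }
}
```

The first two methods are safe because sum is a reduction with no shared mutable state. The third goes wrong because the lambda mutates state shared by all worker threads.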
In particular, we'll demonstrate that the way a parallel stream gets divided into chunks, before
the different chunks are processed in parallel, can in some cases be the origin of these incorrect and
seemingly inexplicable results. For this reason, you'll learn how to take control of this
splitting process by implementing and using your own Spliterator.
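The sketch below gives a first idea of what controlling the split can look like. It assumes an illustrative scenario that is not taken from this introduction: extracting words from a text in parallel, where splitting the text at an arbitrary position could cut a word in two. The class WordSpliterator, its MIN_CHUNK threshold, and the helper words are hypothetical names; Spliterator and StreamSupport.stream are the standard library types.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical example: a Spliterator that emits whole words and splits only at whitespace.
class WordSpliterator implements Spliterator<String> {
    private static final int MIN_CHUNK = 10;  // assumed threshold: smaller chunks are not split further

    private final String text;
    private int pos;        // next character to read
    private final int end;  // exclusive upper bound of this chunk

    WordSpliterator(String text) {
        this(text, 0, text.length());
    }

    private WordSpliterator(String text, int pos, int end) {
        this.text = text;
        this.pos = pos;
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        while (pos < end && Character.isWhitespace(text.charAt(pos))) {
            pos++;                             // skip separators
        }
        if (pos >= end) {
            return false;                      // no word left in this chunk
        }
        int start = pos;
        while (pos < end && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        action.accept(text.substring(start, pos));
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        if (end - pos < MIN_CHUNK) {
            return null;                       // too small to be worth splitting
        }
        // Search from the midpoint for whitespace, so that no word straddles two chunks.
        for (int split = pos + (end - pos) / 2; split < end; split++) {
            if (Character.isWhitespace(text.charAt(split))) {
                Spliterator<String> prefix = new WordSpliterator(text, pos, split);
                pos = split;                   // this spliterator keeps the second half
                return prefix;
            }
        }
        return null;                           // no safe split point found
    }

    @Override
    public long estimateSize() {
        return end - pos;                      // characters left: an upper bound on the words left
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL | IMMUTABLE;  // not SIZED: the exact word count is unknown up front
    }

    // Collects the words of the text using a parallel stream backed by this spliterator.
    static List<String> words(String text) {
        return StreamSupport.stream(new WordSpliterator(text), true)
                            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("the quick brown fox"));  // prints [the, quick, brown, fox]
    }
}
```

Because trySplit only hands off a prefix that ends at a whitespace character, every chunk contains whole words, whichever thread processes it.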
 