Parallel data processing and performance - Java 8 in Action: Lambdas, Streams, and Functional-Style Programming

Chapter 7. Parallel data processing and performance

This chapter covers

• Processing data in parallel with parallel streams
• Performance analysis of parallel streams
• The fork/join framework
• Splitting a stream of data using a Spliterator

In the last three chapters, you've seen how the new Stream interface lets you manipulate collections of data in a declarative way. We also explained that the shift from external to internal iteration enables the native Java library to gain control over processing the elements of a stream. This approach relieves Java developers of having to implement explicitly the optimizations needed to speed up the processing of collections of data. By far the most important benefit is the possibility of executing a pipeline of operations on these collections that automatically makes use of the multiple cores on your computer.

For instance, before Java 7, processing a collection of data in parallel was extremely cumbersome. First, you needed to explicitly split the data structure containing your data into subparts. Second, you needed to assign each of these subparts to a different thread. Third, you needed to synchronize the threads appropriately to avoid unwanted race conditions, wait for all of them to complete, and finally combine the partial results. Java 7 introduced a framework called fork/join to perform these operations more consistently and in a less error-prone way. We explore this framework in section 7.2.
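To make the three manual steps concrete, here is a minimal sketch of summing the numbers from 1 to n with plain threads. The class name `ManualParallelSum`, the method `sum`, and the choice of summing a numeric range are assumptions made for illustration, not code from this chapter. The sketch uses a lambda for brevity; pre-Java 7 code would have needed an anonymous `Runnable` instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: split, assign to threads, wait, combine.
class ManualParallelSum {

    // Sums 1..n using threadCount threads, each working on its own subrange.
    static long sum(long n, int threadCount) {
        // One slot per thread, so no two threads ever write to the same element.
        long[] partial = new long[threadCount];
        List<Thread> threads = new ArrayList<>();
        long chunk = n / threadCount;

        // Steps 1 and 2: split the range into subparts and give each to a thread.
        for (int i = 0; i < threadCount; i++) {
            final int index = i;
            final long from = index * chunk + 1;
            // The last thread also takes whatever remains after integer division.
            final long to = (index == threadCount - 1) ? n : (index + 1) * chunk;
            Thread t = new Thread(() -> {
                long acc = 0;
                for (long v = from; v <= to; v++) {
                    acc += v;
                }
                partial[index] = acc;
            });
            threads.add(t);
            t.start();
        }

        // Step 3: wait for every thread to finish, then combine the partial results.
        long total = 0;
        try {
            for (int i = 0; i < threadCount; i++) {
                threads.get(i).join(); // join() makes the thread's write visible here
                total += partial[i];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(ManualParallelSum.sum(1_000, 4)); // prints 500500
    }
}
```

Even for a task this simple, the splitting arithmetic, the thread bookkeeping, and the interruption handling outweigh the actual computation, which is the kind of burden the paragraph above describes.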

In this chapter, you'll discover how the Stream interface lets you execute operations in parallel on a collection of data without much effort. It lets you declaratively turn a sequential stream into a parallel one. Moreover, you'll see how Java can make this magic happen or, more practically, how parallel streams work under the hood by employing the fork/join framework introduced in Java 7. You'll also discover that it's important to know how parallel streams work internally, because if you ignore this aspect, you could obtain unexpected (and very likely wrong) results by misusing them.
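As a minimal sketch of what "declaratively turning a sequential stream into a parallel one" can look like, the two methods below differ only by a call to `parallel()`. The class name `ParallelSumSketch`, the method names, and the use of `LongStream.rangeClosed` are assumptions chosen for illustration rather than this chapter's own example.

```java
import java.util.stream.LongStream;

// Hypothetical illustration: the same pipeline, run sequentially and in parallel.
class ParallelSumSketch {

    // Sums 1..n on the calling thread.
    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n)
                         .sum();
    }

    // Same pipeline; parallel() asks the library to split the work across threads.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .sum();
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(1_000_000)); // prints 500000500000
        System.out.println(parallelSum(1_000_000));   // prints 500000500000
    }
}
```

Both versions give the same answer here because the pipeline has no shared mutable state; whether the parallel version is actually faster depends on the data source, the size of the input, and the machine.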

In particular, we'll demonstrate that the way a parallel stream gets divided into chunks, before the different chunks are processed in parallel, can in some cases be the origin of these incorrect and apparently inexplicable results. For this reason, you'll learn how to take control of this splitting process by implementing and using your own Spliterator.
