The following snippet of code shows parallel processing of the stream pipeline because the stream is parallel:
String names = Person.persons()                         // The data source
                     .parallelStream()                   // Produces a parallel stream
                     .filter(Person::isMale)             // Processed in parallel
                     .map(Person::getName)               // Processed in parallel
                     .collect(Collectors.joining(", ")); // Processed in parallel
The following snippet of code creates a sequential stream and then switches it to a parallel stream partway
through the pipeline. Note that the parallel() and sequential() methods do not divide the pipeline into serial and
parallel sections; they set the execution mode of the entire pipeline. The mode in effect when the terminal operation
starts (that is, the mode set last) wins, so this pipeline is processed in parallel; the sketch after the snippet shows
how to confirm the mode:
String names = Person.persons()                         // The data source
                     .stream()                           // Produces a sequential stream
                     .filter(Person::isMale)             // Intermediate operation
                     .parallel()                         // Switches the whole pipeline to parallel mode
                     .map(Person::getName)               // Intermediate operation
                     .collect(Collectors.joining(", ")); // Terminal operation; runs the pipeline in parallel
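You can check which mode a pipeline will use by calling isParallel() on the stream before invoking the terminal
operation. The following is a minimal sketch, not part of the original example; the class name and the list of names
are illustrative:
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamModeCheck {
    public static void main(String[] args) {
        // Build a pipeline that starts sequential and is switched to parallel.
        Stream<String> pipeline = Arrays.asList("Ken", "Jeff", "Donna")
                                        .stream()
                                        .filter(name -> name.length() > 3)
                                        .parallel();
        // isParallel() reports the mode the terminal operation will use.
        System.out.println(pipeline.isParallel()); // prints true
    }
}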
You get the mechanics of parallelism in stream processing almost for free: a single method call requests it, and
the library manages the threads for you. So when should you use parallelism in stream processing? Do you get the
benefits of parallelism whenever you use it? The answer is no. Some conditions must be met before you should use
parallel streams, and sometimes a parallel stream gives you worse performance than a sequential one.
The Streams API uses the Fork/Join framework to process parallel streams. The Fork/Join framework uses
multiple threads. It divides the stream elements into chunks, each thread processes a chunk of elements to produce
a partial result, and finally the partial results are combined to give you the final result. Starting up multiple threads,
dividing the data into chunks, and combining partial results takes CPU time, so this overhead is justified only when
it is small compared to the overall time needed to finish the task. For example, a stream of six people will take longer
to process in parallel than in serial; the overhead of setting up the threads and coordinating them for such a small
amount of work is not worth it.
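The following rough sketch, which is not part of the original example and uses an illustrative six-name list instead
of the Person class, shows the point. It is not a proper benchmark (use a harness such as JMH for real measurements),
but on such a small data set the parallel run is typically no faster, and often slower, than the sequential one:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallStreamOverhead {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ken", "Jeff", "Donna", "Chris", "Laynie", "Li");

        long start = System.nanoTime();
        String serial = names.stream()
                             .map(String::toUpperCase)
                             .collect(Collectors.joining(", "));
        long serialTime = System.nanoTime() - start;

        start = System.nanoTime();
        String parallel = names.parallelStream()
                               .map(String::toUpperCase)
                               .collect(Collectors.joining(", "));
        long parallelTime = System.nanoTime() - start;

        // For six elements the parallel run is usually no faster, and often
        // slower, because thread setup and coordination dominate the work.
        System.out.println(serial + " | sequential took " + serialTime + " ns");
        System.out.println(parallel + " | parallel took " + parallelTime + " ns");
    }
}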
You have seen the use of an Iterator for traversing elements of collections. The Streams API uses a Spliterator
(a splittable iterator) to traverse elements of streams. Spliterator is a generalization of Iterator: an Iterator
provides only sequential access to data elements, whereas a Spliterator provides sequential access as well as
decomposition of the data elements. When you create a Spliterator, it knows the chunk of data it will process, and
you can split it in two so that each half gets its own chunk of data to process. Spliterator is an interface in the
java.util package. It is used heavily for splitting stream elements into chunks to be processed by multiple threads;
as a user of the Streams API, you will rarely, if ever, have to work with a Spliterator directly, because the data
source of a stream provides one. Parallel processing of a stream is faster if the Spliterator knows the size of the
stream. A stream may be based on a data source with a fixed, known size or on one whose size is unknown. If the
size of the stream cannot be determined, it is hard to split the elements into evenly sized chunks, so even though
you can use a parallel stream, you may not get the benefits of parallelism.
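The following minimal sketch, not part of the original text, uses a list with a known size of eight elements to show
the two abilities a Spliterator adds over an Iterator: reporting its size and splitting off a chunk of its data:
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> whole = numbers.spliterator();
        System.out.println(whole.estimateSize()); // 8 - the size is known

        // trySplit() carves off a chunk (here the first half) into a new
        // Spliterator that a parallel stream could hand to another thread.
        Spliterator<Integer> firstHalf = whole.trySplit();
        System.out.println(firstHalf.estimateSize()); // 4
        System.out.println(whole.estimateSize());     // 4 - the rest stays here

        firstHalf.forEachRemaining(System.out::println); // 1 2 3 4
        whole.forEachRemaining(System.out::println);     // 5 6 7 8
    }
}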
Another consideration in parallel processing is the ordering of elements. If the elements are ordered, the threads
must preserve that encounter order in the final result, which adds coordination cost. If ordering is not important to
you, you can convert an ordered stream into an unordered stream using the unordered() method.
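A minimal sketch, not part of the original text, of a case where the order genuinely does not matter: the result is
collected into a Set, so declaring the parallel stream unordered lets the pipeline skip the work of preserving order:
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnorderedDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // A Set has no meaningful encounter order, so nothing is lost
        // by letting the parallel pipeline run unordered.
        Set<Integer> squares = numbers.parallelStream()
                                      .unordered()
                                      .map(n -> n * n)
                                      .collect(Collectors.toSet());
        System.out.println(squares);
    }
}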
Spliterators divide the data elements into chunks. It is important that the data source for the stream does not
change during stream processing; otherwise, the result is undefined. For example, if your stream uses a list or a set
as the data source, do not add elements to or remove elements from that list or set while the stream is being processed.
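The following minimal sketch, not part of the original text, shows the kind of interference to avoid: the lambda
passed to forEach modifies the stream's own data source while the pipeline is still reading from it. With an
ArrayList source this typically fails with a ConcurrentModificationException; with other sources the damage can
be silent, and the result is undefined either way:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterferenceDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Ken", "Jeff", "Donna"));

        // Do NOT do this: the lambda adds to the list that the stream
        // is currently traversing.
        names.stream()
             .forEach(name -> names.add(name.toUpperCase()));
    }
}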
Stream processing is rooted in functional programming, which does not modify data during processing; it creates
new data elements rather than changing existing ones. The same rule holds for a stream pipeline, particularly when it
is processed in parallel: the operations in the pipeline are specified as lambda expressions, and those lambda
expressions should be stateless and should not modify the elements being processed or any shared mutable state.
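A minimal sketch, not part of the original text, contrasting the two styles: the first pipeline uses a stateful lambda
that mutates a shared, non-thread-safe list from multiple threads, so elements may be missing, out of order, or the
call may even fail with an exception; the second expresses the same result with a collector and stays side-effect free:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StatelessDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // Avoid: forEach mutates shared mutable state from multiple threads.
        List<Integer> bad = new ArrayList<>();
        numbers.parallelStream()
               .map(n -> n * n)
               .forEach(bad::add); // unreliable in parallel

        // Prefer: let the collector accumulate and combine partial results.
        List<Integer> good = numbers.parallelStream()
                                    .map(n -> n * n)
                                    .collect(Collectors.toList());

        System.out.println(bad);
        System.out.println(good);
    }
}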