The following snippet of code shows parallel processing of the stream pipeline because the stream is parallel:
String names = Person.persons()                         // The data source
                     .parallelStream()                   // Produces a parallel stream
                     .filter(Person::isMale)             // Processed in parallel
                     .map(Person::getName)               // Processed in parallel
                     .collect(Collectors.joining(", ")); // Processed in parallel
The following snippet of code creates a sequential stream and then switches it to a parallel stream partway
through the pipeline. Note that the parallel() and sequential() methods do not divide the pipeline into serial and
parallel sections; they set the execution mode of the entire pipeline. The mode in effect when the terminal operation
starts (that is, the mode set last) wins, so this pipeline is processed in parallel; the sketch after the snippet shows
how to confirm the mode:
String names = Person.persons()                         // The data source
                     .stream()                           // Produces a sequential stream
                     .filter(Person::isMale)             // Intermediate operation
                     .parallel()                         // Switches the whole pipeline to parallel mode
                     .map(Person::getName)               // Intermediate operation
                     .collect(Collectors.joining(", ")); // Terminal operation; runs the pipeline in parallel
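You can check which mode a pipeline will use by calling isParallel() on the stream before invoking the terminal
operation. The following is a minimal sketch, not part of the original example; the class name and the list of names
are illustrative:
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamModeCheck {
    public static void main(String[] args) {
        // Build a pipeline that starts sequential and is switched to parallel.
        Stream<String> pipeline = Arrays.asList("Ken", "Jeff", "Donna")
                                        .stream()
                                        .filter(name -> name.length() > 3)
                                        .parallel();
        // isParallel() reports the mode the terminal operation will use.
        System.out.println(pipeline.isParallel()); // prints true
    }
}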
You get the mechanics of parallelism in stream processing almost for free: a single method call requests it, and
the library manages the threads for you. So when should you use parallelism in stream processing? Do you get the
benefits of parallelism whenever you use it? The answer is no. Some conditions must be met before you should use
parallel streams, and sometimes a parallel stream gives you worse performance than a sequential one.
The Streams API uses the Fork/Join framework to process parallel streams. The Fork/Join framework uses
multiple threads. It divides the stream elements into chunks, each thread processes a chunk of elements to produce
a partial result, and finally the partial results are combined to give you the final result. Starting up multiple threads,
dividing the data into chunks, and combining partial results takes CPU time, so this overhead is justified only when
it is small compared to the overall time needed to finish the task. For example, a stream of six people will take longer
to process in parallel than in serial; the overhead of setting up the threads and coordinating them for such a small
amount of work is not worth it.
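The following rough sketch, which is not part of the original example and uses an illustrative six-name list instead
of the Person class, shows the point. It is not a proper benchmark (use a harness such as JMH for real measurements),
but on such a small data set the parallel run is typically no faster, and often slower, than the sequential one:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallStreamOverhead {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ken", "Jeff", "Donna", "Chris", "Laynie", "Li");

        long start = System.nanoTime();
        String serial = names.stream()
                             .map(String::toUpperCase)
                             .collect(Collectors.joining(", "));
        long serialTime = System.nanoTime() - start;

        start = System.nanoTime();
        String parallel = names.parallelStream()
                               .map(String::toUpperCase)
                               .collect(Collectors.joining(", "));
        long parallelTime = System.nanoTime() - start;

        // For six elements the parallel run is usually no faster, and often
        // slower, because thread setup and coordination dominate the work.
        System.out.println(serial + " | sequential took " + serialTime + " ns");
        System.out.println(parallel + " | parallel took " + parallelTime + " ns");
    }
}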
You have seen the use of an Iterator for traversing elements of collections. The Streams API uses a Spliterator
(a splittable iterator) to traverse elements of streams. Spliterator is a generalization of Iterator: an Iterator
provides only sequential access to data elements, whereas a Spliterator provides sequential access as well as
decomposition of the data elements. When you create a Spliterator, it knows the chunk of data it will process, and
you can split it in two so that each half gets its own chunk of data to process. Spliterator is an interface in the
java.util package. It is used heavily for splitting stream elements into chunks to be processed by multiple threads;
as a user of the Streams API, you will rarely, if ever, have to work with a Spliterator directly, because the data
source of a stream provides one. Parallel processing of a stream is faster if the Spliterator knows the size of the
stream. A stream may be based on a data source with a fixed, known size or on one whose size is unknown. If the
size of the stream cannot be determined, it is hard to split the elements into evenly sized chunks, so even though
you can use a parallel stream, you may not get the benefits of parallelism.
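The following minimal sketch, not part of the original text, uses a list with a known size of eight elements to show
the two abilities a Spliterator adds over an Iterator: reporting its size and splitting off a chunk of its data:
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> whole = numbers.spliterator();
        System.out.println(whole.estimateSize()); // 8 - the size is known

        // trySplit() carves off a chunk (here the first half) into a new
        // Spliterator that a parallel stream could hand to another thread.
        Spliterator<Integer> firstHalf = whole.trySplit();
        System.out.println(firstHalf.estimateSize()); // 4
        System.out.println(whole.estimateSize());     // 4 - the rest stays here

        firstHalf.forEachRemaining(System.out::println); // 1 2 3 4
        whole.forEachRemaining(System.out::println);     // 5 6 7 8
    }
}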
Another consideration in parallel processing is the ordering of elements. If the elements are ordered, the threads
must preserve that encounter order in the final result, which adds coordination cost. If ordering is not important to
you, you can convert an ordered stream into an unordered stream using the unordered() method.
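A minimal sketch, not part of the original text, of a case where the order genuinely does not matter: the result is
collected into a Set, so declaring the parallel stream unordered lets the pipeline skip the work of preserving order:
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnorderedDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // A Set has no meaningful encounter order, so nothing is lost
        // by letting the parallel pipeline run unordered.
        Set<Integer> squares = numbers.parallelStream()
                                      .unordered()
                                      .map(n -> n * n)
                                      .collect(Collectors.toSet());
        System.out.println(squares);
    }
}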
Spliterators divide the data elements into chunks. It is important that the data source for the stream does not
change during stream processing; otherwise, the result is undefined. For example, if your stream uses a list or a set
as the data source, do not add elements to or remove elements from that list or set while the stream is being processed.
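The following minimal sketch, not part of the original text, shows the kind of interference to avoid: the lambda
passed to forEach modifies the stream's own data source while the pipeline is still reading from it. With an
ArrayList source this typically fails with a ConcurrentModificationException; with other sources the damage can
be silent, and the result is undefined either way:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterferenceDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Ken", "Jeff", "Donna"));

        // Do NOT do this: the lambda adds to the list that the stream
        // is currently traversing.
        names.stream()
             .forEach(name -> names.add(name.toUpperCase()));
    }
}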
Stream processing is rooted in functional programming, which does not modify data during processing; it creates
new data elements rather than changing existing ones. The same rule holds for a stream pipeline, particularly when it
is processed in parallel: the operations in the pipeline are specified as lambda expressions, and those lambda
expressions should be stateless and should not modify the elements being processed or any shared mutable state.
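A minimal sketch, not part of the original text, contrasting the two styles: the first pipeline uses a stateful lambda
that mutates a shared, non-thread-safe list from multiple threads, so elements may be missing, out of order, or the
call may even fail with an exception; the second expresses the same result with a collector and stays side-effect free:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StatelessDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // Avoid: forEach mutates shared mutable state from multiple threads.
        List<Integer> bad = new ArrayList<>();
        numbers.parallelStream()
               .map(n -> n * n)
               .forEach(bad::add); // unreliable in parallel

        // Prefer: let the collector accumulate and combine partial results.
        List<Integer> good = numbers.parallelStream()
                                    .map(n -> n * n)
                                    .collect(Collectors.toList());

        System.out.println(bad);
        System.out.println(good);
    }
}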