Listing 11-19. Picklist Output
Items to pick
As you can see, developing jobs that process in parallel with Spring Batch is typically as simple
as updating some XML. However, these approaches have limits. Up to this point, you've used only a
single JVM. Because of that, you're restricted to the CPU and memory available on the box on which you
start the job. But what about more complex scenarios where things are computationally hard? And how
can you take advantage of server clusters to improve throughput? The next two sections look at how to
scale Spring Batch jobs past a single JVM.
Java's multithreading abilities allow very high-performance software to be developed. But there is a limit
to what any single JVM can do. Let's begin to look at ways to spread out the processing of a given task to
many computers. The largest example of this type of distributed computing is the SETI@home project.
SETI (Search for Extraterrestrial Intelligence) takes signals it records from radio telescopes and divides
them into small chunks of work. To analyze the work, SETI offers a screensaver that anyone can
download onto their computer. The screensaver analyzes the data downloaded from SETI and returns
the results. As of this writing, the SETI@home project has had more than 5.2 million
participants providing over 2 million years of aggregate computing time. The only way to scale to
numbers like this is to get more computers involved.
Although you probably won't need to scale to the levels of SETI@home, the fact remains that the
amount of data you need to process will probably at least tax the limits of a single JVM and may be
prohibitively large to process in the time window you have. This section looks at how to use Spring
Batch's remote chunking functionality to extend processing past what a single JVM can do.
Spring Batch provides two ways to scale beyond a single JVM. Remote chunking reads data locally,
sends it to a remote JVM for processing, and then receives the results back in the original JVM for
writing. This type of scaling outside of a single JVM is useful only when item processing is the bottleneck
in your process. If input or output is the bottleneck, this type of scaling only makes things worse. There
are a couple of things to consider before using remote chunking as your method for scaling batch
processing:
•	Processing needs to be the bottleneck: Because reading and writing are completed
in the master JVM, in order for remote chunking to be of any benefit, the cost of
sending data to the slaves for processing must be less than the benefit received
from parallelizing the processing.
• Guaranteed delivery is required: Because Spring Batch doesn't maintain any type
of information about who is processing what, if one of the slaves goes down
during processing, Spring Batch has no way to know what data is in play. Thus a
persisted form of communication (typically JMS) is required.
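To make the master side of this arrangement concrete, the following fragment sketches how a step's writer can hand chunks off to slaves over JMS. This is a sketch only, not a complete configuration: the bean ids, channel names, and the `chunkRequests` queue name are illustrative assumptions, and the classes shown (`ChunkMessageChannelItemWriter`, `MessagingTemplate`, and the Spring Integration JMS adapter) come from the Spring Batch Integration and Spring Integration projects.

```xml
<!-- Master side (sketch): the step's ItemWriter sends each chunk of
     read items over JMS to the slaves and collects the replies.
     Bean ids, channel names, and the queue name are illustrative. -->
<bean id="chunkWriter" scope="step"
      class="org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter">
    <property name="messagingOperations" ref="messagingGateway"/>
    <property name="replyChannel" ref="replies"/>
</bean>

<bean id="messagingGateway"
      class="org.springframework.integration.core.MessagingTemplate">
    <property name="defaultChannel" ref="requests"/>
    <property name="receiveTimeout" value="1000"/>
</bean>

<!-- Outbound adapter: chunk requests flow from the requests channel
     onto a persistent JMS queue, satisfying the guaranteed-delivery
     requirement noted above -->
<int-jms:outbound-channel-adapter channel="requests"
    destination-name="chunkRequests"/>
```

Because the writer, not the reader, crosses the JVM boundary, the master retains full control of reading and of recording what was written, which is why processing cost must dominate for this to pay off.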
Remote chunking takes advantage of two additional Spring projects. The Spring Integration project
extends the Spring Framework to provide lightweight messaging in Spring
applications as well as adapters for interacting with remote applications via messaging. In the case of
remote chunking, you use its adapters to interact with slave nodes via JMS. The other project that
remote chunking relies on is the Spring Batch Integration project. This subproject of the Spring Batch