Scaling and Tuning - Spring Batch

Listing 11-19. Picklist Output

Items to pick

5837232417899987867:1

As you can see, developing parallel-processing jobs with Spring Batch is typically as simple

as updating some XML. However, these approaches have limits. Up to this point, you've used only a

single JVM. Because of that, you're restricted to the CPU and memory available on the box on which you

start the job. But what about scenarios where the processing is computationally expensive? And how

can you take advantage of server clusters to improve throughput? The next two sections look at how to

scale Spring Batch jobs past a single JVM.

Remote Chunking

Java's multithreading abilities allow very high-performance software to be developed. But there is a limit

to what any single JVM can do. Let's begin to look at ways to spread out the processing of a given task to

many computers. The largest example of this type of distributed computing is the SETI@home project.

SETI (Search for Extraterrestrial Intelligence) takes signals it records from radio telescopes and divides

them into small chunks of work. To analyze the work, SETI offers a screensaver that anyone can

download onto their computer. The screensaver analyzes the data downloaded from SETI and returns

the results. As of this writing, the SETI@home project has had more than 5.2 million

participants providing over 2 million years of aggregate computing time. The only way to scale to

numbers like this is to get more computers involved.

Although you probably won't need to scale to the levels of SETI@home, the fact remains that the

amount of data you need to process will probably at least tax the limits of a single JVM and may be

prohibitively large to process in the time window you have. This section looks at how to use Spring

Batch's remote chunking functionality to extend processing past what a single JVM can do.

Spring Batch provides two ways to scale beyond a single JVM. Remote chunking reads data locally,

sends it to a remote JVM for processing, and then receives the results back in the original JVM for

writing. This type of scaling outside of a single JVM is useful only when item processing is the bottleneck

in your process. If input or output is the bottleneck, this type of scaling only makes things worse. There

are a couple of things to consider before using remote chunking as your method for scaling batch

processing:

• Processing needs to be the bottleneck: Because reading and writing are completed

in the master JVM, in order for remote chunking to be of any benefit, the cost of

sending data to the slaves for processing must be less than the benefit received

from parallelizing the processing.

• Guaranteed delivery is required: Because Spring Batch doesn't maintain any type

of information about who is processing what, if one of the slaves goes down

during processing, Spring Batch has no way to know what data is in play. Thus a

persisted form of communication (typically JMS) is required.
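
To make the flow described above concrete, here is a minimal sketch that simulates it with plain JDK classes. It is not Spring Batch or Spring Integration code: the class RemoteChunkingSketch, the Chunk message type, and the process method are hypothetical names chosen for illustration, and worker threads in one JVM stand in for the slave JVMs. The in-memory queues also give none of the guaranteed delivery discussed above, so treat them as placeholders for a persisted channel such as JMS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-JDK simulation of the remote chunking flow; not Spring Batch API.
class RemoteChunkingSketch {

    // Hypothetical message type: a chunk of items plus a sequence number
    // so the master can match each reply to the request it sent.
    static final class Chunk {
        final int sequence;
        final List<String> items;

        Chunk(int sequence, List<String> items) {
            this.sequence = sequence;
            this.items = items;
        }
    }

    // Stand-in for an expensive item processor (assumption for illustration).
    static String process(String item) {
        return item.toUpperCase();
    }

    static List<String> run(List<String> input, int chunkSize, int workers) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        BlockingQueue<Chunk> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Chunk> replies = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Slaves: take a chunk, process every item, send the result back.
            for (int w = 0; w < workers; w++) {
                pool.execute(() -> {
                    try {
                        while (true) {
                            Chunk request = requests.take();
                            List<String> processed = new ArrayList<>();
                            for (String item : request.items) {
                                processed.add(process(item));
                            }
                            replies.put(new Chunk(request.sequence, processed));
                        }
                    } catch (InterruptedException e) {
                        // Interrupted by shutdownNow() once the master is done.
                        Thread.currentThread().interrupt();
                    }
                });
            }

            // Master: read locally and send each chunk out for processing.
            int sent = 0;
            for (int i = 0; i < input.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, input.size());
                requests.put(new Chunk(sent++, new ArrayList<>(input.subList(i, end))));
            }

            // Master: collect one reply per chunk; replies may arrive out of order.
            Chunk[] bySequence = new Chunk[sent];
            for (int i = 0; i < sent; i++) {
                Chunk reply = replies.take();
                bySequence[reply.sequence] = reply;
            }

            // Master: "write" the processed items locally, in the original order.
            List<String> written = new ArrayList<>();
            for (Chunk chunk : bySequence) {
                written.addAll(chunk.items);
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for slaves", e);
        } finally {
            // All replies are in (or the run failed), so stop the idle workers.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Prints [A, B, C, D, E]
        System.out.println(run(Arrays.asList("a", "b", "c", "d", "e"), 2, 3));
    }
}
```

Notice that the master in this sketch tracks nothing beyond a count of the chunks it sent. If a worker died while holding a chunk, the call to replies.take() would block forever, which is the situation the guaranteed-delivery requirement is meant to prevent. Each chunk is also copied onto a queue and back, so the approach only pays off when process costs more than that round trip.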

Remote chunking takes advantage of two additional Spring projects. The Spring Integration project

is an extension of the Spring project that is intended to provide lightweight messaging in Spring

applications as well as adapters for interacting with remote applications via messaging. In the case of

remote chunking, you use its adapters to interact with slave nodes via JMS. The other project that

remote chunking relies on is the Spring Batch Integration project. This subproject of the Spring Batch
