Scaling and Tuning - Spring Batch

communication like JMS is required, and this approach can't provide any benefits for jobs where the bottleneck exists in the input or output phase of the step.

In situations where offloading just the ItemProcessor's work isn't enough (when I/O is the bottleneck, for example), Spring Batch has one other option up its sleeve: partitioning. The next section looks at partitioning and how you can use it to scale your jobs.

Partitioning

Although remote chunking is useful when a process has its bottleneck in the processing of items, most of the time the bottleneck is in input and output. Interacting with a database or reading files is typically where performance and scalability concerns come into play. To help with that, Spring Batch lets multiple workers execute complete steps: the entire ItemReader, ItemProcessor, and ItemWriter interaction can be offloaded to slave workers. This section looks at what partitioning is and how to configure jobs to take advantage of this powerful Spring Batch feature.

Partitioning is a concept in which a master step farms out work to any number of listening slave steps for processing. This may sound very similar to remote chunking (and it is), but there are some key differences. First, the slave nodes aren't message-driven POJOs as they are with remote chunking; the slaves in partitioning are Spring Batch steps, each complete with its own reader, processor, and writer. Because they're full Spring Batch steps, partitioning offers a couple of unique benefits over a remote-chunking implementation.
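
To make the contrast with remote chunking concrete, here is a minimal plain-Java sketch of the partitioning idea that uses no Spring Batch classes: the master only decides slice boundaries, and each worker runs its own complete read, process, and write loop over its slice. The names `PartitionedStepSketch`, `runWorkerStep`, and `runPartitioned` are hypothetical, and a thread pool stands in for remote worker nodes. In a real partitioned job, each worker would be a full Spring Batch step writing to shared storage rather than to an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PartitionedStepSketch {

    // Hypothetical "worker step": it owns the whole read -> process -> write
    // cycle for its slice, unlike a remote-chunking worker, which only
    // receives items that the master has already read.
    static List<String> runWorkerStep(List<Integer> source, int from, int to,
                                      Function<Integer, String> processor) {
        List<String> written = new ArrayList<>(); // stands in for an ItemWriter target
        for (int i = from; i < to; i++) {
            Integer item = source.get(i);          // read
            String result = processor.apply(item); // process
            written.add(result);                   // write
        }
        return written;
    }

    // Hypothetical "master step": it computes slice boundaries and hands each
    // slice to a worker; it never reads or writes the items itself.
    public static List<String> runPartitioned(List<Integer> source, int gridSize) {
        if (gridSize < 1) throw new IllegalArgumentException("gridSize must be >= 1");
        ExecutorService workers = Executors.newFixedThreadPool(gridSize);
        try {
            int sliceSize = Math.max(1, (source.size() + gridSize - 1) / gridSize);
            List<Future<List<String>>> pending = new ArrayList<>();
            for (int from = 0; from < source.size(); from += sliceSize) {
                final int start = from;
                final int end = Math.min(from + sliceSize, source.size());
                pending.add(workers.submit(
                        () -> runWorkerStep(source, start, end, n -> "item-" + n)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                try {
                    all.addAll(f.get()); // wait for each worker to finish its slice
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("worker step failed", e.getCause());
                }
            }
            return all;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) input.add(i);
        System.out.println(runPartitioned(input, 3));
    }
}
```

With ten items and a grid size of 3, the master creates three slices of four, four, and two items, and the results are collected in slice order.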

The first advantage of partitioning over remote chunking is that you don't need a guaranteed delivery system (JMS, for example). Each step maintains its own state, just like any other Spring Batch step. Currently, the Spring Batch Integration project uses Spring Integration's channels to abstract out the communication mechanism, so you can use any transport that Spring Integration supports.
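
The point that each step maintains its own state can be illustrated with a toy model. In the sketch below, a map stands in for the job repository and records one status per partition, so a second run re-executes only the partition that failed; this per-partition bookkeeping is the intuition behind not needing guaranteed message delivery. All names here (`PartitionRestartSketch`, `run`, `statusOf`, `failOnThisRun`) are hypothetical, and this is not Spring Batch's actual restart logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionRestartSketch {

    public enum Status { COMPLETED, FAILED }

    // Toy stand-in for the job repository: one recorded status per partition,
    // mirroring the idea that each worker step keeps its own state.
    private final Map<String, Status> repository = new LinkedHashMap<>();

    // Executes every partition not already marked COMPLETED and returns the
    // names of the partitions that actually ran. Partitions listed in
    // failOnThisRun simulate a worker failure during this run.
    public List<String> run(List<String> partitions, Set<String> failOnThisRun) {
        List<String> executed = new ArrayList<>();
        for (String name : partitions) {
            if (repository.get(name) == Status.COMPLETED) {
                continue; // this partition's own recorded state says it is done
            }
            executed.add(name);
            Status outcome = failOnThisRun.contains(name) ? Status.FAILED : Status.COMPLETED;
            repository.put(name, outcome);
        }
        return executed;
    }

    public Status statusOf(String partition) {
        return repository.get(partition);
    }

    public static void main(String[] args) {
        PartitionRestartSketch job = new PartitionRestartSketch();
        List<String> parts = List.of("partition0", "partition1", "partition2");
        System.out.println(job.run(parts, Set.of("partition1"))); // first run, one failure
        System.out.println(job.run(parts, Set.of()));             // rerun
    }
}
```

The first run executes all three partitions and records partition1 as failed; the rerun consults the recorded statuses and executes only partition1.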

The second advantage is that you don't need to develop custom components. Because each slave is a regular Spring Batch step, there is almost nothing special you need to code; the one extra class is a Partitioner implementation, which you'll see later.
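
Spring Batch's `Partitioner` interface declares a single method, `partition(int gridSize)`, which returns a `Map<String, ExecutionContext>` with one entry per partition. The sketch below mimics that shape in plain Java so it runs without the framework, assuming a numeric ID range split evenly across `gridSize` partitions. A plain `Map<String, Object>` stands in for `ExecutionContext`, and the class name `RangePartitionerSketch` and the `minValue`/`maxValue` keys are hypothetical choices that a worker step's reader would look up to restrict its query.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for a Spring Batch Partitioner implementation.
// A Map<String, Object> replaces ExecutionContext so this runs on its own.
public class RangePartitionerSketch {

    private final long minId;
    private final long maxId;

    public RangePartitionerSketch(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }

    // Splits [minId, maxId] into at most gridSize contiguous ranges and
    // returns one context per partition, keyed "partition0", "partition1", ...
    public Map<String, Map<String, Object>> partition(int gridSize) {
        if (gridSize < 1) throw new IllegalArgumentException("gridSize must be >= 1");
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long rangeSize = (total + gridSize - 1) / gridSize; // ceiling division
        long start = minId;
        int number = 0;
        while (start <= maxId) {
            long end = Math.min(start + rangeSize - 1, maxId);
            Map<String, Object> context = new HashMap<>();
            context.put("minValue", start); // hypothetical key read by the worker's reader
            context.put("maxValue", end);   // hypothetical key read by the worker's reader
            partitions.put("partition" + number, context);
            start = end + 1;
            number++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(new RangePartitionerSketch(1, 100).partition(4));
    }
}
```

For IDs 1 through 100 and a grid size of 4, this produces partition0 through partition3 covering 1 to 25, 26 to 50, 51 to 75, and 76 to 100. Each slave step then reads only its own range, which is why the input must be reachable from every slave node.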

Even with these advantages, however, you need to keep a couple of things in mind. First, remote steps need to be able to communicate with your job repository: because each slave is a true Spring Batch step, it has its own StepExecution and maintains its state in the database like any other step. Second, the input and output need to be accessible from all the slave nodes. With remote chunking, the master handles all input and output, so the data can be centralized; with partitioning, the slaves are responsible for their own input and output. Thus some forms of I/O lend themselves to partitioning better than others (databases are typically easier than files, for example).

Figure 11-18 shows how a job using partitioning is structured, which makes the structural difference between remote chunking and partitioning easier to see.