communication like JMS is required, and this approach can't provide any benefits for jobs where the
bottleneck exists in the input or output phase of the step.
In situations where offloading just the ItemProcessor's work isn't enough (when I/O is the
bottleneck, for example), Spring Batch has one other option up its sleeve: partitioning. You look at
partitioning and how you can use it to scale your jobs in the next section.
Although remote chunking is useful when you're working with a process that has a bottleneck in the
processing of items, most of the time the bottleneck exists in the input and output. Interacting with a
database or reading files typically is where performance and scalability concerns come into play. To help
with that, Spring Batch provides the ability for multiple workers to execute complete steps. The entire
ItemReader, ItemProcessor, and ItemWriter interaction can be offloaded to slave workers. This section
looks at what partitioning is and how to configure jobs to take advantage of this powerful Spring Batch feature.
Partitioning is a concept where a master step farms out work to any number of listening slave steps
for processing. This may sound very similar to remote chunking (and it is), but there are some key
differences. First, the slave nodes aren't message-driven POJOs as they are with remote chunking. The
slaves in partitioning are Spring Batch steps, each complete with its own reader, processor, and writer.
Because they're full Spring Batch steps, partitioning offers a couple of unique benefits over a remote-
chunking implementation.
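To make the structural difference concrete, here is a minimal plain-Java model of the idea. It is not Spring Batch code: a master splits the input into ranges and hands each range to a worker that runs its own complete read, process, and write cycle. A thread pool stands in for the remote slave nodes, and the names `runMasterStep` and `runSlaveStep` and the doubling "processor" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of partitioning; NOT the Spring Batch API.
// Each worker owns a complete read -> process -> write cycle over its own slice.
public class PartitionedJobSketch {

    // Hypothetical slave step: reads only its slice, processes each item, writes the result.
    public static List<Integer> runSlaveStep(int[] input, int from, int to) {
        List<Integer> written = new ArrayList<>();
        for (int i = from; i < to; i++) {
            int item = input[i];          // "reader": this slave's slice only
            int processed = item * 2;     // "processor": a placeholder transformation
            written.add(processed);       // "writer": collect the output
        }
        return written;
    }

    // Hypothetical master step: splits the input into up to gridSize ranges and
    // hands each range to a pool thread standing in for a remote slave node.
    public static List<Integer> runMasterStep(int[] input, int gridSize) {
        ExecutorService workers = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<List<Integer>>> results = new ArrayList<>();
            int size = Math.max(1, (input.length + gridSize - 1) / gridSize); // ceiling division
            for (int start = 0; start < input.length; start += size) {
                final int from = start;
                final int to = Math.min(start + size, input.length);
                results.add(workers.submit(() -> runSlaveStep(input, from, to)));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> result : results) {
                try {
                    all.addAll(result.get()); // wait for each slave, in partition order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a slave", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("slave step failed", e.getCause());
                }
            }
            return all;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMasterStep(new int[] {1, 2, 3, 4, 5, 6, 7}, 3));
    }
}
```

With the seven-item input in `main` and a grid size of 3, the ranges are [0, 3), [3, 6), and [6, 7), and the program prints `[2, 4, 6, 8, 10, 12, 14]`. Note that the master never touches individual items; it only assigns ranges, which mirrors how a partitioned master step delegates whole steps rather than chunks.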
The first advantage of partitioning over remote chunking is that you don't need a guaranteed
delivery system (JMS, for example). Each step maintains its own state just like any other Spring Batch
step. Currently, the Spring Batch Integration project uses Spring Integration's channels to abstract out
the communication mechanism so you can use anything Spring Integration supports.
The second advantage is that you don't need to develop custom components. Because the slave is a
regular Spring Batch step, there is almost nothing special you need to code (the one extra class is a
Partitioner implementation, which you see later).
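In Spring Batch, the Partitioner interface declares a single method, `Map<String, ExecutionContext> partition(int gridSize)`, and each returned ExecutionContext seeds one slave step. The sketch below mimics that shape under a stated assumption: a plain `Map<String, Object>` stands in for ExecutionContext so it runs without Spring on the classpath. The `SketchPartitioner` interface, the `IdRangePartitioner` class, and its `minId`/`maxId` keys are hypothetical; the class splits a range of primary keys into contiguous slices that each slave's reader could then query.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionerSketch {

    // Stand-in for Spring Batch's Partitioner. The real method is
    // Map<String, ExecutionContext> partition(int gridSize); a plain
    // Map<String, Object> replaces ExecutionContext here so no Spring is needed.
    public interface SketchPartitioner {
        Map<String, Map<String, Object>> partition(int gridSize);
    }

    // Hypothetical partitioner: splits the ID range [minId, maxId] into
    // contiguous slices, one per slave step.
    public static class IdRangePartitioner implements SketchPartitioner {
        private final long minId;
        private final long maxId;

        public IdRangePartitioner(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public Map<String, Map<String, Object>> partition(int gridSize) {
            if (gridSize < 1) {
                throw new IllegalArgumentException("gridSize must be at least 1");
            }
            Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
            long total = maxId - minId + 1;
            long size = Math.max(1, (total + gridSize - 1) / gridSize); // ceiling division
            int number = 0;
            for (long start = minId; start <= maxId; start += size) {
                // Each context tells one slave's reader which IDs it owns.
                Map<String, Object> context = new HashMap<>();
                context.put("minId", start);
                context.put("maxId", Math.min(start + size - 1, maxId));
                partitions.put("partition" + number++, context);
            }
            return partitions;
        }
    }

    public static void main(String[] args) {
        new IdRangePartitioner(1, 100).partition(4)
                .forEach((name, ctx) -> System.out.println(name + " " + ctx));
    }
}
```

For IDs 1 through 100 and a grid size of 4, the sketch produces four partitions covering 1 to 25, 26 to 50, 51 to 75, and 76 to 100. Splitting by key range keeps the partitions disjoint, so no two slaves read the same rows.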
But even with these advantages, you need to keep a couple of things in mind. First, remote steps
need to be able to communicate with your job repository. Because each slave is a true Spring Batch step,
it has its own StepExecution and maintains its state in the database like any other step. In addition, the
input and output need to be accessible from all the slave nodes. With remote chunking, the master
handles all input and output, so the data can be centralized. But with partitioning, slaves are responsible
for their own input and output. Thus some forms of I/O lend themselves better to partitioning than
others (databases are typically easier than files, for example, because every slave can connect to the same database).
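The requirement that every slave reach the job repository can be illustrated with another plain-Java model, assuming a thread-safe map in place of the repository database (this is not Spring Batch's JobRepository). Each slave saves its own state under its partition name, and the master derives the overall outcome from all of them, loosely similar to how a partitioned master step reflects the results of its slave executions. The class, enum, and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java model of a shared job repository; NOT Spring Batch's JobRepository.
// Every slave writes its own per-partition state here, which is why each slave
// node must be able to reach the repository.
public class SharedRepositorySketch {

    public enum Status { STARTED, COMPLETED, FAILED }

    // Hypothetical per-partition record, loosely mirroring a StepExecution.
    private static final class PartitionState {
        final Status status;
        final int itemsWritten;

        PartitionState(Status status, int itemsWritten) {
            this.status = status;
            this.itemsWritten = itemsWritten;
        }
    }

    // Thread-safe map standing in for the repository database shared by all nodes.
    private final Map<String, PartitionState> executions = new ConcurrentHashMap<>();

    // Called by a slave whenever its own state changes.
    public void save(String partitionName, Status status, int itemsWritten) {
        executions.put(partitionName, new PartitionState(status, itemsWritten));
    }

    // Master-side view: any failed partition fails the step; the step is
    // complete only when every recorded partition has completed.
    public Status aggregateStatus() {
        if (executions.values().stream().anyMatch(s -> s.status == Status.FAILED)) {
            return Status.FAILED;
        }
        boolean allCompleted = !executions.isEmpty()
                && executions.values().stream().allMatch(s -> s.status == Status.COMPLETED);
        return allCompleted ? Status.COMPLETED : Status.STARTED;
    }

    // Master-side view: total items written across all partitions.
    public int totalItemsWritten() {
        return executions.values().stream().mapToInt(s -> s.itemsWritten).sum();
    }

    public static void main(String[] args) {
        SharedRepositorySketch repo = new SharedRepositorySketch();
        repo.save("partition0", Status.COMPLETED, 25);
        repo.save("partition1", Status.FAILED, 10);
        System.out.println(repo.aggregateStatus() + " " + repo.totalItemsWritten());
    }
}
```

Running `main` prints `FAILED 35`: one failed slave fails the whole step even though the other partition completed, which is why per-slave state has to live somewhere every node can write to.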
To see the structural difference between remote chunking and partitioning, Figure 11-18 shows how
a job using partitioning is structured.