The last two approaches to parallelization let you spread processing across multiple JVMs. In every
approach described so far, the processing runs in a single JVM, which can seriously limit your
scalability options. When you can scale any part of your process horizontally across multiple JVMs, your
ability to keep up with large demands increases.
The first remote-processing option is remote chunking. In this approach, input is read using a
standard ItemReader in a master node; each item is then sent via a form of durable communication (JMS,
for example) to a remote slave ItemProcessor that is configured as a message-driven POJO. When the
processing is complete, the slave sends the updated item back to the master for writing. Because this
approach reads the data at the master, processes it at the slave, and then sends it back, it's important to
note that it can be very network intensive. This approach is good for scenarios where the cost of I/O is
small compared to the actual processing.
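The read, send, process, return, and write flow described above can be sketched in plain Java. This is only an illustration of the data flow, not Spring Batch's API: two in-memory queues stand in for the durable middleware (JMS in the text), threads stand in for the remote slave JVMs, and the names RemoteChunkingSketch, run, and STOP are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Plain-Java simulation of remote chunking; NOT Spring Batch's API.
// In-memory queues are an assumption standing in for durable messaging.
class RemoteChunkingSketch {

    // Hypothetical end-of-input marker, compared by identity so that it
    // can never collide with a real input item.
    private static final String STOP = new String("STOP");

    static List<String> run(List<String> input, int workers,
                            Function<String, String> processor) {
        BlockingQueue<String> requests = new LinkedBlockingQueue<>();
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();

        // Slave side: take an item, process it, send the result back.
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String item = requests.take();
                        if (item == STOP) {
                            break;
                        }
                        replies.put(processor.apply(item));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }

        try {
            // Master side: read every item and ship it to the slaves.
            for (String item : input) {
                requests.put(item);
            }
            for (int i = 0; i < workers; i++) {
                requests.put(STOP);
            }
            // Master side: receive one reply per item and "write" it.
            List<String> written = new ArrayList<>();
            for (int i = 0; i < input.size(); i++) {
                written.add(replies.take());
            }
            for (Thread t : pool) {
                t.join();
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        }
    }

    public static void main(String[] args) {
        List<String> out = run(List.of("a", "b", "c", "d"), 2, String::toUpperCase);
        Collections.sort(out); // reply order depends on thread scheduling
        System.out.println(out); // prints [A, B, C, D]
    }
}
```

Every item crosses the queue twice, once on the way out and once on the way back, which is why the approach pays off only when processing dominates the cost of moving the data. The sketch also omits error handling: a processor that throws would leave the master waiting for a reply that never arrives.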
The final method for parallelization within Spring Batch is partitioning, shown in Figure 2-5. Again, you
use a master/slave configuration; but this time you don't need a durable method of communication, and
the master serves only as a controller for a collection of slave steps. In this case, each of your slave steps
is self-contained and configured the same as if it were locally deployed. The only difference is that the
slave steps receive their work from the master node instead of the job itself. When all the slaves have
completed their work, the master step is considered complete. This configuration doesn't require
durable communication with guaranteed delivery because the JobRepository guarantees that no work is
duplicated and all work is completed. In the remote-chunking approach, by contrast, the
JobRepository has no knowledge of the state of the distributed work.
Figure 2-5. Partitioning work
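The partitioning flow described above can be illustrated with a small plain-Java sketch. It is a simulation under stated assumptions, not Spring Batch's API: a thread pool stands in for the remote slave steps, a range of record IDs stands in for the data, and the names PartitioningSketch, Range, partition, slaveStep, and runMasterStep are hypothetical. It does not model the JobRepository bookkeeping that tracks each slave step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java simulation of partitioning; NOT Spring Batch's API.
class PartitioningSketch {

    // A partition describes a slice of work, not the data itself.
    static final class Range {
        final int fromInclusive;
        final int toExclusive;

        Range(int fromInclusive, int toExclusive) {
            this.fromInclusive = fromInclusive;
            this.toExclusive = toExclusive;
        }
    }

    // Master side: split the IDs [0, total) into at most gridSize ranges.
    static List<Range> partition(int total, int gridSize) {
        List<Range> ranges = new ArrayList<>();
        int size = (total + gridSize - 1) / gridSize; // ceiling division
        for (int start = 0; start < total; start += size) {
            ranges.add(new Range(start, Math.min(start + size, total)));
        }
        return ranges;
    }

    // Slave side: a self-contained step that handles only its own range.
    static long slaveStep(Range range) {
        long sum = 0;
        for (int id = range.fromInclusive; id < range.toExclusive; id++) {
            sum += id; // stand-in for reading, processing, and writing record id
        }
        return sum;
    }

    // Master step: hand out the partitions, then wait for every slave step.
    static long runMasterStep(int total, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Range r : partition(total, gridSize)) {
                results.add(pool.submit(() -> slaveStep(r)));
            }
            long grandTotal = 0;
            for (Future<Long> f : results) {
                grandTotal += f.get(); // blocks until that slave step is done
            }
            return grandTotal;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master step interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("slave step failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMasterStep(10, 3)); // prints 45
    }
}
```

Unlike the remote-chunking sketch, only a small description of each range travels from the master to a slave; each slave does its own input and output, so the items themselves never cross the network.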
Any enterprise system must be able to start and stop processes, monitor their current state, and even
view results. With web applications, this is easy: you see the results of each action you request, and
tools like Google Analytics provide various metrics on how your application is being used and how it
is performing.
However, in the batch world, you may have a single Java process running on a server for eight hours
with no output other than log files and the database the process is working on. This is hardly a
manageable situation. For this reason, Spring has developed a web application called Spring Batch
Admin that lets you start and stop jobs and also provides details about each job execution.