The pieces are there, however: transaction support, fast I/O, schedulers such as Quartz, and solid
threading support, and a very powerful concept of an application container in Java EE and Spring. It was
only natural that Dave Syer and his team would come along and build Spring Batch, a batch processing
solution for the Spring platform.
It's important to think about the kinds of problems this framework solves before diving into the
details. A technology is defined by its solution space. A typical Spring Batch application reads in
a lot of data and then writes it back out in a modified form. Decisions about transactional barriers, input
size, concurrency, and order of steps in processing are all dimensions of a typical integration.
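The "read a lot of data, write it back out in a modified form" shape, together with the idea of a transactional barrier, can be sketched in plain Java as chunk-oriented processing with a commit interval. This is a minimal illustration under stated assumptions: `ChunkSketch`, `processInChunks`, and the commit interval of 2 are hypothetical names and values, this is not Spring Batch's API, and no real transactions are opened. Each writer call merely stands in for one committed chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch, not Spring Batch types: each writer call stands in for
// one transaction ("commit") covering a chunk of processed items.
public class ChunkSketch {

    // Reads items one at a time, transforms each one, and hands them to the
    // writer in groups of `commitInterval`.
    public static <I, O> int processInChunks(Iterator<I> reader,
                                             Function<I, O> processor,
                                             Consumer<List<O>> writer,
                                             int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk)); // hand over a copy, then reuse the buffer
                chunksWritten++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // flush the last, partially filled chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<String, String> upper = String::toUpperCase;
        Consumer<List<String>> sink = written::add;
        int chunks = processInChunks(List.of("a", "b", "c", "d", "e").iterator(), upper, sink, 2);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

Running `main` prints `3 chunks: [[A, B], [C, D], [E]]`. The commit interval is one of the dimensions the paragraph mentions: a smaller interval means less work is redone if a chunk fails, at the cost of more commits.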
A common requirement is loading data from a comma-separated value (CSV) file, perhaps as a
business-to-business (B2B) transaction, perhaps as an integration technique with a legacy
application. Another common application is nontrivial processing of records in a database. Perhaps
the output is an update of the database record itself. An example might be resizing images on the
file system whose metadata is stored in a database, or triggering another process based on
Note Fixed-width data is a format of rows and cells, much like a CSV file. In a CSV file, however, cells are
separated by a delimiter such as a comma or a tab, whereas fixed-width data presumes a fixed length for each value.
The first value might be the first nine characters, the second value the next four characters, and so on.
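The difference described in the note can be sketched in plain Java. `FixedWidthSketch`, its two methods, and the sample record are hypothetical; the widths 9 and 4 simply mirror the note's example. The CSV parser deliberately ignores quoting and escaping, which a production reader would have to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: the same record split by delimiter and by position.
public class FixedWidthSketch {

    // CSV: cell boundaries are found by searching for the delimiter.
    // The -1 limit keeps trailing empty cells; quoting and escaping are ignored.
    public static List<String> parseCsv(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Fixed width: cell boundaries come purely from position, so each value is
    // cut out by its assumed length and its padding is trimmed.
    public static List<String> parseFixedWidth(String line, int... widths) {
        List<String> cells = new ArrayList<>();
        int start = 0;
        for (int width : widths) {
            int end = Math.min(start + width, line.length()); // tolerate short lines
            cells.add(line.substring(start, end).trim());
            start = end;
        }
        return cells;
    }

    public static void main(String[] args) {
        // "JOHN" padded to 9 characters, followed by a 4-character value
        String fixed = String.format("%-9s%4s", "JOHN", "2024");
        System.out.println(parseFixedWidth(fixed, 9, 4));
        System.out.println(parseCsv("JOHN,2024"));
    }
}
```

Both calls in `main` yield the cells `JOHN` and `2024`; the fixed-width version needs the layout (the widths) up front, while the CSV version needs only the delimiter.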
Fixed-width data, which is often used with legacy or embedded systems, is a fine candidate
for batch processing. Processing that deals with a resource that's fundamentally nontransactional
(for example, a web service or a file) is a natural fit for batch processing, because batch processing
provides retry/skip/fail functionality that most web services do not.
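To make "retry/skip/fail" concrete, here is a minimal plain-Java sketch of those semantics around a call to an unreliable resource. The policy is an assumption chosen for illustration: a failing item is retried up to `maxAttempts` times, then skipped, and the whole run fails once more than `skipLimit` items have been skipped. `RetrySkipSketch`, `processAll`, and the simulated service are hypothetical and are not Spring Batch's retry or skip API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical retry/skip/fail policy around a nontransactional call.
public class RetrySkipSketch {

    public static <I, O> List<O> processAll(List<I> items, Function<I, O> call,
                                            int maxAttempts, int skipLimit,
                                            List<I> skipped) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<O> results = new ArrayList<>();
        for (I item : items) {
            RuntimeException lastError = null;
            boolean succeeded = false;
            for (int attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
                try {
                    results.add(call.apply(item));
                    succeeded = true;
                } catch (RuntimeException e) {
                    lastError = e; // retry: remember the failure and try again
                }
            }
            if (!succeeded) {
                skipped.add(item); // skip: retries exhausted for this item
                if (skipped.size() > skipLimit) {
                    // fail: too many skipped items, abort the whole run
                    throw new IllegalStateException("skip limit exceeded", lastError);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Simulated flaky service: "b" fails on its first call, "x" always fails.
        Map<String, Integer> calls = new HashMap<>();
        Function<String, String> service = s -> {
            int n = calls.merge(s, 1, Integer::sum);
            if (s.equals("x") || (s.equals("b") && n == 1)) {
                throw new RuntimeException("service unavailable for " + s);
            }
            return s.toUpperCase();
        };
        List<String> skipped = new ArrayList<>();
        List<String> out = processAll(List.of("a", "b", "x", "c"), service, 3, 1, skipped);
        System.out.println(out + ", skipped " + skipped);
    }
}
```

With these settings `main` prints `[A, B, C], skipped [x]`: `b` succeeds on retry, `x` is skipped, and the run completes because only one item was skipped.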
It's also important to understand what Spring Batch doesn't do. Spring Batch is a flexible but not
all-encompassing solution. Just as Spring doesn't reinvent the wheel when it can be avoided, Spring
Batch leaves a few important pieces to the discretion of the implementor. Case in point: Spring Batch
provides a generic mechanism by which to launch a job, be it by the command line, a Unix cron, an
operating system service, Quartz (discussed in Chapter 6), or in response to an event on an enterprise
service bus (for example, the Mule ESB or Spring's own ESB-like solution, Spring Integration, which is
discussed in Chapter 8). Another example is the way Spring Batch manages the state of batch processes.
Spring Batch requires a durable store. The only useful implementation of a JobRepository (an interface
provided by Spring Batch for storing runtime data) requires a database because a database is
transactional and there's no need to reinvent it. To which database you should deploy, however, is
largely unspecified, although there are useful defaults provided for you, of course.
Runtime Meta Model
Spring Batch works with a JobRepository, which is the keeper of all the knowledge/metadata for each
job (including component parts such as JobExecution and StepExecution). Each job is composed of
one or more steps, one after another. With Spring Batch 2.0, a step can conditionally follow another
step, allowing for primitive workflows. These steps can also be concurrent: two steps can run at the
same time.
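The three ordering rules (sequential, conditional, concurrent) can be illustrated with a hypothetical plain-Java mini-model. `StepFlowSketch`, `runJob`, and the step names are invented for illustration and are not Spring Batch types; Spring Batch expresses such flows in its job configuration rather than hand-coding them like this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical mini-model: each "step" just appends its name to a thread-safe log.
public class StepFlowSketch {

    public static List<String> runJob(boolean inputValid) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // Sequential: the validation step always runs first.
        String exitStatus = inputValid ? "COMPLETED" : "FAILED";
        log.add("validate:" + exitStatus);

        // Conditional: which step runs next depends on the previous exit status.
        if (!exitStatus.equals("COMPLETED")) {
            log.add("notifyOperator");
            return log;
        }

        // Concurrent: two independent steps run at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> customers = pool.submit(() -> { log.add("loadCustomers"); });
            Future<?> orders = pool.submit(() -> { log.add("loadOrders"); });
            customers.get(); // wait for both parallel steps to finish
            orders.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for steps", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parallel step failed", e.getCause());
        } finally {
            pool.shutdown();
        }

        // Sequential again: runs only after both parallel steps are done.
        log.add("report");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runJob(true));  // the two load steps may appear in either order
        System.out.println(runJob(false));
    }
}
```

Because the two load steps run concurrently, their relative order in the log is not fixed; only the validation step's position before them and the report step's position after them are guaranteed.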
When a job is run, it's often coupled with JobParameters to parameterize the behavior of the job
itself. For example, a job might take a date parameter to determine which records to process. This
coupling is called a JobInstance. A JobInstance is unique because of the JobParameters associated
with it. Each time the same JobInstance (i.e., the same job and JobParameters) is run, it's called a