Working with Steps
If a job defines the entire process, a step is the building block of a job. It's an independent, sequential
batch processor. I call it a batch processor for a reason. A step contains all of the pieces a job requires. It
handles its own input. It has its own processor. It handles its own output. Transactions are self-contained
within a step. Steps are this disjointed by design, which allows you as the
developer to structure your job as freely as needed.
In this section you take the same style deep dive into steps that you did with jobs in the previous
section. You cover the way Spring Batch breaks processing within a step into chunks and how that has
changed since previous versions of the framework. You also look at a number of examples of how to
configure steps within your job, including how to control the flow from step to step and conditional step
execution. Finally, you configure the steps required for your statement job. With all of this in mind, let's
start looking at steps by looking at how steps process data.
Chunk vs. Item Processing
Batch processes in general are about processing data. When you think about what a unit of data to be
processed is, there are two options: an individual item or a chunk of items. An individual item consists of
a single object that typically represents a single row in a database or file. Item-based processing,
therefore, is the reading, processing, and then writing of your data one row, record, or object at a time, as
Figure 4-5 shows.
Figure 4-5. Item-based processing (read, process, and write, repeated for each item)
As you can imagine, there can be significant overhead with this approach. The inefficiency of writing
individual rows when you know you'll be committing large numbers of rows to a database or writing
them to a file can be enormous.
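To make that overhead concrete, here is a minimal sketch of item-based processing in plain Java. It is not Spring Batch code: the class ItemBasedSketch, its run method, and the commits counter are hypothetical names invented for illustration, and the "commit" is only simulated by incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Spring Batch.
class ItemBasedSketch {

    // Number of simulated commits performed by the last call to run().
    static int commits = 0;

    static List<String> run(List<String> input) {
        List<String> output = new ArrayList<>();
        commits = 0;
        for (String item : input) {                 // read one item
            String processed = item.toUpperCase();  // process it
            output.add(processed);                  // write it
            commits++;                              // simulated commit after every single item
        }
        return output;
    }
}
```

With this loop, the number of commits always equals the number of items, so a million-row input means a million commits.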
When Spring Batch 1.x came out in 2008, item-based processing was the way records were
processed. Since then the guys at SpringSource and Accenture have upgraded the framework, and in
Spring Batch 2, they introduced the concept of chunk-based processing. A chunk in the world of batch
processing is a subset of the records or rows that need to be processed, typically defined by the commit
interval. In Spring Batch, a chunk of data is defined by how many rows are
processed between commits.
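The idea of a commit interval can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the framework's implementation: ChunkBasedSketch, run, commitInterval, and the commits counter are hypothetical names, and the commit is again simulated with a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Spring Batch.
class ChunkBasedSketch {

    // Number of simulated commits performed by the last call to run().
    static int commits = 0;

    static List<String> run(List<String> input, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        List<String> output = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        commits = 0;
        for (String item : input) {
            chunk.add(item.toUpperCase());        // read and process, buffering the result
            if (chunk.size() == commitInterval) { // chunk is full
                output.addAll(chunk);             // write the whole chunk at once
                commits++;                        // one simulated commit per chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                   // flush the final, partial chunk
            output.addAll(chunk);
            commits++;
        }
        return output;
    }
}
```

In this sketch, five items with a commit interval of 2 produce three commits (two full chunks and one partial chunk) instead of five.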