StepExecution is information about a single run of the job or step. You see more detail about what is in
the executions and the repository later in this chapter and in Chapter 5.
Figure 2-2. The job components and their relationships
Running a job begins with the JobLauncher. The JobLauncher verifies whether the job has been run
before by checking the JobRepository, validates the parameters being passed into the job, and, finally,
executes the job.
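The launch sequence described above can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: LaunchSketch, InMemoryRepository, the status strings, and the "parameters must not be empty" rule are hypothetical stand-ins invented for illustration, not Spring Batch classes or behavior.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of the launch sequence; these are NOT Spring Batch classes.
class LaunchSketch {

    // Hypothetical stand-in for the JobRepository: remembers completed job/parameter pairs.
    static final class InMemoryRepository {
        private final Set<String> completed = new HashSet<>();

        boolean hasCompleted(String jobName, Map<String, String> params) {
            return completed.contains(key(jobName, params));
        }

        void markCompleted(String jobName, Map<String, String> params) {
            completed.add(key(jobName, params));
        }

        private static String key(String jobName, Map<String, String> params) {
            // A TreeMap sorts the entries, so equal parameters always give the same key.
            return jobName + new TreeMap<>(params);
        }
    }

    // Hypothetical stand-in for the JobLauncher: check history, validate, then execute.
    static String launch(InMemoryRepository repo, String jobName,
                         Map<String, String> params, Runnable job) {
        if (repo.hasCompleted(jobName, params)) {
            return "ALREADY_COMPLETE";      // the job has been run before with these parameters
        }
        if (params.isEmpty()) {             // assumed validation rule, for illustration only
            return "INVALID_PARAMETERS";
        }
        job.run();                          // finally, execute the job
        repo.markCompleted(jobName, params);
        return "COMPLETED";
    }

    public static void main(String[] args) {
        InMemoryRepository repo = new InMemoryRepository();
        Map<String, String> params = Map.of("run.date", "2024-01-01");
        System.out.println(launch(repo, "importJob", params, () -> { }));
        System.out.println(launch(repo, "importJob", params, () -> { }));
    }
}
```

Running main launches the same job twice with identical parameters: the first call executes it, and the second is turned away by the repository check.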
Jobs and steps are processed in very similar ways. A job goes through the list of steps it has been
configured to run, executing each one. A step goes through the list of items read in by the ItemReader.
As the step completes each chunk of items, Spring Batch updates the StepExecution in the repository
with where it is in the step: the current commit count, start and end times, and other information are
stored there. When a job or step is complete, the related JobExecution or StepExecution is updated in
the repository with the final status.
One of the things that changed in Spring Batch from version 1 to 2 was the addition of chunked
processing. In version 1, records were read in, processed, and written out one at a time. The issue with
this is that it doesn't take advantage of the batch-writing abilities of Java's file and database I/O
(buffered writing and batch updates). In version 2 and later, reading and processing still happen one
item at a time; there is no reason to load a large amount of data into memory if it can't be processed.
But now, the write occurs only once the commit interval has been reached, that is, once a full chunk of
items has been read and processed. This allows for more performant writing of records as well as a more
capable rollback mechanism.
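The chunked read-process-write loop can be sketched in plain Java as follows. This is a simplified model, not the framework's implementation: ChunkSketch, runStep, and commitInterval are hypothetical names, an Iterator plays the role of the ItemReader, and transactions and rollback are left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of chunk-oriented processing; not Spring Batch code.
class ChunkSketch {

    // Reads and processes items one at a time, but writes them a chunk at a time.
    // Returns the number of chunks written (the "commit count").
    static <I, O> int runStep(Iterator<I> reader, Function<I, O> processor,
                              Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int commitCount = 0;
        List<O> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));   // one item read, one item processed
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk));   // single batched write for the chunk
                chunk.clear();
                commitCount++;
            }
        }
        if (!chunk.isEmpty()) {                          // write the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            commitCount++;
        }
        return commitCount;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = i -> "item-" + i;
        Consumer<List<String>> writer = written::add;
        int commits = runStep(List.of(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        System.out.println(commits + " commits: " + written);
        // prints: 3 commits: [[item-1, item-2], [item-3, item-4], [item-5]]
    }
}
```

With five items and a commit interval of 2, the writer is called three times instead of five. In a real step, each of those writes would also be the point at which the repository is updated and the unit that is rolled back on failure.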
A simple batch process's architecture consists of a single-threaded process that executes a job's steps in
order from start to finish. However, Spring Batch provides a number of parallelization options that you
should be aware of as you move forward. (Chapter 11 covers these options in detail.) There are four
different ways to parallelize your work: dividing work via multithreaded steps, parallel execution of full
steps, remote chunking, and partitioning.
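To give a rough feel for the last of these options, the sketch below splits a list of items into slices and handles each slice on its own worker thread. It uses only java.util.concurrent and is an illustration of the general idea, not Spring Batch's partitioning API; PartitionSketch, sumInPartitions, and gridSize are names assumed for this example, and summing integers stands in for real step work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration of partitioned work using plain java.util.concurrent; not Spring Batch code.
class PartitionSketch {

    // Splits items into at most gridSize contiguous slices and sums each slice on a worker thread.
    static long sumInPartitions(List<Integer> items, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            int sliceSize = (items.size() + gridSize - 1) / gridSize;   // ceiling division
            List<Future<Long>> results = new ArrayList<>();
            for (int start = 0; start < items.size(); start += sliceSize) {
                int end = Math.min(start + sliceSize, items.size());
                List<Integer> slice = items.subList(start, end);
                Callable<Long> worker = () -> {
                    long sum = 0;
                    for (int value : slice) {
                        sum += value;
                    }
                    return sum;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();          // wait for each partition to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPartitions(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3));
        // prints: 55
    }
}
```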