CHAPTER 5
Job Repository and Metadata
When you look into writing a batch process, executing a process in a standalone manner without a UI
isn't that hard. When you dig into Spring Batch, the execution of a job amounts to nothing
more than using an implementation of Spring's TaskExecutor to run a separate task. You don't need
Spring Batch to do that.
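To make that point concrete, here is a minimal sketch that runs a task on a separate thread with no UI, using only the JDK. It uses java.util.concurrent.ExecutorService as a stand-in for Spring's TaskExecutor; the class name HeadlessTaskSketch and the sumRecords task are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: plain JDK, no Spring Batch involved.
class HeadlessTaskSketch {

    // Hypothetical "batch" work: add up a set of record values.
    static int sumRecords(int[] records) {
        int total = 0;
        for (int record : records) {
            total += record;
        }
        return total;
    }

    // Runs the work on a separate thread and waits for the result.
    static int runInBackground(int[] records) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = executor.submit(() -> sumRecords(records));
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Task did not complete", e);
        } finally {
            executor.shutdown(); // let the JVM exit once the task is done
        }
    }

    public static void main(String[] args) {
        System.out.println(runInBackground(new int[] {1, 2, 3}));
    }
}
```

Note what this sketch lacks: if the task fails partway through, nothing records how far it got.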
Where things get interesting, however, is when things go wrong. If your batch job is running and an
error occurs, how do you recover? How does your job know where it was in processing when the error
occurred, and what should happen when the job is restarted? State management is an important part of
processing large volumes of data. This is one of the key features that Spring Batch brings to the table.
Spring Batch, as discussed previously in this book, maintains the state of a job as it executes in a job
repository. It then uses this information when a job is restarted or an item is retried to determine how to
continue. The power of this feature can't be overstated.
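The idea of recorded state driving a restart can be sketched in a few lines of plain Java. This is a conceptual illustration only, not Spring Batch's JobRepository API: the class RestartableJobSketch, its in-memory map, and the failOn parameter used to simulate an error are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only; this is not how Spring Batch itself is implemented.
class RestartableJobSketch {

    // Hypothetical stand-in for a job repository:
    // job name -> number of items already processed and recorded.
    private final Map<String, Integer> committedCounts = new HashMap<>();

    // Processes items starting at the last recorded position.
    // An item equal to failOn triggers a simulated failure (pass null for no failure).
    void run(String jobName, List<String> items, String failOn, List<String> output) {
        int start = committedCounts.getOrDefault(jobName, 0);
        for (int i = start; i < items.size(); i++) {
            String item = items.get(i);
            if (item.equals(failOn)) {
                throw new IllegalStateException("Failed on item " + item);
            }
            output.add(item.toUpperCase());
            committedCounts.put(jobName, i + 1); // record progress after each item
        }
    }

    int committedCount(String jobName) {
        return committedCounts.getOrDefault(jobName, 0);
    }

    public static void main(String[] args) {
        RestartableJobSketch job = new RestartableJobSketch();
        List<String> items = Arrays.asList("a", "b", "c", "d");
        List<String> output = new ArrayList<>();
        try {
            job.run("demoJob", items, "c", output); // first run fails on "c"
        } catch (IllegalStateException e) {
            System.out.println("Recorded before failure: " + job.committedCount("demoJob"));
        }
        job.run("demoJob", items, null, output); // restart resumes at "c"
        System.out.println(output);
    }
}
```

Because progress is recorded after every item, the second call skips the items that already succeeded instead of reprocessing them. The map here lives only in memory, so this state would not survive the end of the process.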
Another aspect of batch processing in which the job repository is helpful is monitoring. The ability
to see how far a job is in its processing as well as trend elements such as how long operations take or
how many items were retried due to errors is vital in the enterprise environment. The fact that Spring
Batch does the number gathering for you makes this type of trending much easier.
This chapter covers job repositories in detail. It goes over ways to configure a job repository for most
environments by using either a database or an in-memory repository. You also look briefly at
the performance impact of the job repository's configuration. After you have the job repository
configured, you learn how to put the job information it stores to use through the
JobExplorer and the JobOperator.
Configuring the Job Repository
For Spring Batch to be able to maintain state, the job repository needs to be available. Spring Batch
offers two options by default: an in-memory repository and a persisted repository in a database. This
section looks at how to configure each of those options as well as the performance impacts of both
options. Let's start with the simpler option, the in-memory job repository.
Using an In-Memory Job Repository
The opening paragraphs of this chapter laid out a list of benefits for the job repository, such as the ability
to maintain state from execution to execution and trend run statistics from run to run. However, you'll
almost never use an in-memory repository for those reasons. That's because when the process ends, all
of that data is lost. So, why would you use an in-memory repository at all?
The answer is that sometimes you don't need to persist the data. For example, in development, it's
common to run jobs with an in-memory repository so that you don't have to worry about maintaining
the job schema in a database. This also allows you to execute the same job multiple times with the same