Data Loads and Data Warehouses
In this example, I didn't tune the table at all; for instance, there are no indexes on any of the columns
besides the primary key. This keeps the example simple, but great care should be taken with a table
like this one in a nontrivial, production-bound application.
Spring Batch applications are workhorse applications and have the potential to reveal bottlenecks in your
application that you didn't know you had. Imagine suddenly being able to achieve 1 million new database
insertions every 10 minutes. Would your database grind to a halt? Insert speed can be a critical factor in
the speed of your application. Software developers will (hopefully) think about a schema in terms of how well
it enforces the constraints of the business logic and how well it serves the overall business model.
However, it's important to wear another hat, that of a DBA, when writing applications such as this one. A
common solution is to create a denormalized table whose contents can be coerced into valid data once
inside the database, perhaps by a trigger on inserts. This is a common technique in data warehousing.
Later, you'll explore using Spring Batch to process a record before insertion, which lets the
developer verify or override the input to the database. This processing, in tandem with a conservative
application of constraints that are best expressed in the database, can make for applications that are very
robust and quick.
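To give a flavor of that per-record processing, here is a minimal sketch of validating and normalizing a record before it is written. The `Registration` record, the validation rules, and the class names are hypothetical; the interface is reproduced inline so the sketch is self-contained, but in a real job you would implement `org.springframework.batch.item.ItemProcessor<I, O>` instead:

```java
// Self-contained stand-in for Spring Batch's ItemProcessor interface,
// reproduced here only so this sketch compiles without the framework.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical input record for illustration.
record Registration(String name, String email) {}

// Validates and normalizes each record before it reaches the writer.
// In Spring Batch, returning null filters the item out entirely, so
// malformed rows never reach the database.
class RegistrationValidator implements ItemProcessor<Registration, Registration> {
    @Override
    public Registration process(Registration item) {
        if (item.email() == null || !item.email().contains("@")) {
            return null; // skip invalid records
        }
        return new Registration(item.name().trim(), item.email().toLowerCase());
    }
}
```

Keeping this logic in the processor, rather than in triggers alone, keeps the validation testable in plain Java while the database still enforces its hard constraints.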
The Job Configuration
The configuration for the job is as follows:
As described earlier, a job consists of steps, which are the real workhorses of a given job. A step
can be as complex or as simple as you like; indeed, a step could be considered the smallest unit of work
for a job. Input (what's read) is passed to the step and potentially processed; then output (what's
written) is created from the step. This processing is spelled out using a Tasklet. You can provide your
own Tasklet implementation or simply use one of the preconfigured implementations for different
processing scenarios. These implementations are made available as subelements of the tasklet
element. One of the most important aspects of batch processing is chunk-oriented processing, which is
employed here using the chunk element.
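A chunk-oriented step along these lines might be declared as follows; this is a sketch using the Spring Batch XML namespace, and the job, step, and bean names (`csvFileReader`, `jdbcItemWriter`, `registrationValidator`) are hypothetical:

```xml
<!-- Sketch of a chunk-oriented step: the reader, processor, and writer
     bean names are placeholders, not beans defined in this example. -->
<job id="loadRegistrations" xmlns="http://www.springframework.org/schema/batch">
    <step id="loadStep">
        <tasklet>
            <chunk reader="csvFileReader"
                   processor="registrationValidator"
                   writer="jdbcItemWriter"
                   commit-interval="10"/>
        </tasklet>
    </step>
</job>
```

The `commit-interval` attribute controls the chunk size: the reader and processor handle items one at a time, and the writer receives them in batches of ten, each batch committed in a single transaction.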