Data Loads and Data Warehouses
In this example, I didn't tune the table at all. For example, there are no indexes on any of the columns
besides the primary key. This is to avoid complicating the example. Great care should be taken with a table
like this one in a nontrivial, production-bound application.
Spring Batch applications are workhorse applications and have the potential to reveal bottlenecks in your
application you didn't know you had. Imagine suddenly being able to achieve 1 million new database
insertions every 10 minutes. Would your database grind to a halt? Insert speed can be a critical factor in
the speed of your application. Software developers will (hopefully) think about schema in terms of how well
it enforces the constraints of the business logic and how well it serves the overall business model.
However, it's important to wear another hat, that of a DBA, when writing applications such as this one. A
common solution is to create a denormalized table whose contents can be coerced into valid data once
inside the database, perhaps by a trigger on inserts. This is a common technique in data warehousing.
Later, you'll explore using Spring Batch to do processing on a record before insertion. This lets the
developer verify or override the input into the database. This processing, in tandem with a conservative
application of constraints that are best expressed in the database, can make for applications that are very
robust and quick.
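The pre-insertion processing described above can be sketched in plain Java. The `ItemProcessor` interface below mirrors the contract of Spring Batch's `org.springframework.batch.item.ItemProcessor`, and the `Registration` record type is a hypothetical stand-in for whatever the CSV rows map to; returning `null` is how a processor tells Spring Batch to filter a record out before it ever reaches the writer.

```java
// Minimal stand-in for Spring Batch's ItemProcessor contract:
// transform the item, or return null to filter it out of the chunk.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical record type standing in for a parsed CSV row.
class Registration {
    final String email;

    Registration(String email) {
        this.email = email;
    }
}

// Normalizes the email address and filters out records with no address,
// so only valid rows are handed to the JDBC writer.
class RegistrationValidator implements ItemProcessor<Registration, Registration> {
    public Registration process(Registration item) {
        if (item.email == null || item.email.trim().isEmpty()) {
            return null; // filtered: this row is never written
        }
        return new Registration(item.email.trim().toLowerCase());
    }
}

public class ProcessorDemo {
    public static void main(String[] args) throws Exception {
        RegistrationValidator validator = new RegistrationValidator();
        System.out.println(validator.process(new Registration("  USER@Example.COM ")).email);
        System.out.println(validator.process(new Registration("   ")) == null);
    }
}
```

In a real job, a processor like this is declared as a bean and wired into the chunk via the `processor` attribute, sitting between the reader and the writer.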
The Job Configuration
The configuration for the job is as follows:
<job
job-repository="jobRepository"
id="insertIntoDbFromCsvJob">
<step id="step1">
<tasklet transaction-manager="transactionManager">
<chunk
reader="csvFileReader"
writer="jdbcItemWriter"
commit-interval="5"
/>
</tasklet>
</step>
</job>
As described earlier, a job consists of steps, which are the real workhorses of a given job. The steps can be as complex or as simple as you like. Indeed, a step could be considered the smallest unit of work for a job. Input (what's read) is passed to the step and potentially processed; then output (what's written) is created from the step. This processing is spelled out using a Tasklet. You can provide your own Tasklet implementation or simply use one of the preconfigured implementations for different processing scenarios. These implementations are made available as subelements of the tasklet element. One of the most important aspects of batch processing is chunk-oriented processing, which is employed here using the chunk element.
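The effect of the chunk element and its commit-interval can be sketched as a simple loop: read items one at a time, and every time the chunk fills up, write the whole batch in a single transaction. The sketch below uses an `Iterator` and a list of lists as stand-ins for Spring Batch's `ItemReader`, `ItemWriter`, and transaction boundaries; the interval of 5 matches the configuration above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {
    // Groups the items a reader produces into chunks of commitInterval items.
    // Each inner list stands for one ItemWriter.write() call, i.e. one commit.
    static <T> List<List<T>> runChunks(Iterator<T> reader, int commitInterval) {
        List<List<T>> commits = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());             // read one item at a time
            if (chunk.size() == commitInterval) { // chunk is full:
                commits.add(chunk);               // write it and commit
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            commits.add(chunk);                   // final, partial chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> commits = runChunks(items.iterator(), 5);
        System.out.println(commits.size()); // 7 items at interval 5 -> 2 commits
    }
}
```

Batching writes this way is what makes the commit-interval a tuning knob: a larger interval means fewer transactions and better insert throughput, at the cost of more work redone if a chunk fails and rolls back.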