I have covered many scenarios involving flat files: fixed-width records, delimited records,
multiline records, and even input from multiple files. However, flat files are not the only type of file
you are likely to see. You have spent a lot of time so far (and will spend a good deal more)
looking at XML, yet you haven't looked at how Spring Batch processes it. Let's see what
Spring Batch can do for you when you're faced with XML files.
When I introduced file-based processing at the beginning of this chapter, I explained that
different file formats carry differing amounts of metadata describing the format of the file. I said that
fixed-width records have the least metadata, requiring the most information about the record
format to be known in advance. XML is at the other end of the spectrum. XML uses tags to describe the
data in the file, providing a full description of the data it contains.
Two types of XML parser are commonly used: DOM and SAX. A DOM parser loads the entire file into
memory as a tree structure so that you can navigate its nodes. This approach is a poor fit for batch
processing because memory use grows with the size of the file, and batch input files are often large.
This leaves you with SAX. SAX is an event-based parser that fires events (callbacks into your code)
as it encounters elements in the document.
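To make the event-based model concrete, here is a minimal sketch that uses only the SAX classes shipped with the JDK. The class name SaxPushSketch, the method collectAddresses, and the assumption that each address sits in an <address> element are illustrative choices, not something Spring Batch defines. Notice that the single call to parse() drives the whole document through the handler before it returns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxPushSketch {

    // Collects the text of every <address> element (element name assumed).
    static List<String> collectAddresses(String xml) throws Exception {
        List<String> addresses = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <address>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("address".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("address".equals(qName)) {
                    addresses.add(text.toString());
                    text = null;
                }
            }
        };

        // parse() pushes every event in the document to the handler
        // and returns only when the whole input has been consumed.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return addresses;
    }
}
```

Because the parser controls the loop, your code cannot ask for just the next record and then stop; it can only react to events as they arrive.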
In Spring Batch, you use a StAX parser. Although it is an event-based parser like SAX, it has
the advantage of letting you parse sections of your document independently. This maps
directly to the item-oriented reading you do. A SAX parser pushes events to your code for the entire
file in a single run; a StAX parser lets you pull each section of the file that represents an item, one at a time.
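The pull model can be sketched with the StAX API that ships with the JDK (javax.xml.stream). This is not how Spring Batch implements its reader; it is only an illustration of reading one item per call. The class StaxPullSketch, the method names, and the <customers>/<customer> wrapper elements in the sample string are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StaxPullSketch {

    // Assumed input shape: the wrapper element names are illustrative.
    static final String SAMPLE =
          "<customers>"
        + "<customer><address>2039 Wall Street</address></customer>"
        + "<customer><address>8192 Wall Street</address></customer>"
        + "</customers>";

    // Pulls events only until the next <address> is complete, then hands
    // control back to the caller. Returns null once the input is exhausted.
    static String readNextAddress(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "address".equals(reader.getLocalName())) {
                return reader.getElementText();
            }
        }
        return null;
    }

    // Drives the reader one item at a time, the way a chunk-oriented
    // step would call read() repeatedly.
    static List<String> readAll(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> addresses = new ArrayList<>();
        try {
            String address;
            while ((address = readNextAddress(reader)) != null) {
                addresses.add(address);
            }
        } finally {
            reader.close();
        }
        return addresses;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String address : readAll(SAMPLE)) {
            System.out.println(address);
        }
    }
}
```

The point of the design is that readNextAddress returns after a single fragment, so the caller decides when (or whether) to ask for the next one. That is the property that lines up with item-oriented reading.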
Before you look at how to parse XML with Spring Batch, let's look at a sample input file. You will
be working with the same input as before: your customer file. Instead of a flat file, however, the data
is structured as XML. Listing 7-34 shows a
sample of the input.
Listing 7-34. Customer XML File Sample
<address>2039 Wall Street</address>
<address>8192 Wall Street</address>