Reading an XML Data Stream
Problem
You want to process an XML document as a fast stream. Your dataset is too large for DOM,
and you want a more selective API than SAX offers.
Solution
Use the StAX API in Java SE 6 to “pull” parse your document.
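As a minimal sketch of the pull model, assuming only the javax.xml.stream classes bundled with Java SE 6 and later, the class below creates an XMLStreamReader from an XMLInputFactory and advances it one event at a time. The class name PullParseDemo, the method elementNames, and the sample <order> document are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullParseDemo {

    /** Returns the local names of all start elements, in document order. */
    public static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            // The application drives the loop: each next() call pulls one event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            // Frees parser resources; does not close the underlying Reader.
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<order><item/><item/></order>"));
        // prints [order, item, item]
    }
}
```

Because the loop belongs to your code, you decide when to call next() and when to stop; the parser never calls back into your program.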
Discussion
Java has given us a number of ways to work with XML documents, including the popular
DOM and SAX. The most recent addition is StAX, or Streaming API for XML, which is
largely the brainchild of Oracle/BEA. While all three of these methods of parsing XML have
advantages, they have shortcomings too.
StAX is generally one of the most efficient ways to process XML in Java, which makes it
particularly well suited to demanding tasks such as data binding and handling SOAP messages.
Oracle/BEA's WebLogic 9 and 10 use this parser internally within the application server, as
does GlassFish v2.
DOM offers an easy-to-use API and, unlike SAX and StAX, is XPath-capable. But it also forces
you to read the entire document into memory. This is fine for small documents, but it degrades
performance for sizeable documents and can be prohibitive for very large ones. One European
bank network regularly transfers multi-gigabyte XML files within its SOA; it is not using DOM
to handle them.
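For comparison, here is a minimal DOM sketch using the JDK's javax.xml.parsers and javax.xml.xpath packages. It shows both sides of the trade-off just described: the whole document is parsed into an in-memory tree before anything else happens, and in exchange you can query that tree with an XPath expression. DomXPathDemo, countItems, and the sample XML are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomXPathDemo {

    /** Builds a full DOM tree, then counts the <item> children of <order> via XPath. */
    public static int countItems(String xml) throws Exception {
        // parse() returns only after the entire document is held in memory as a tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The XPath expression is evaluated against the in-memory tree.
        Double count = (Double) xpath.evaluate("count(/order/item)", doc, XPathConstants.NUMBER);
        return count.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countItems("<order><item/><item/></order>")); // prints 2
    }
}
```

With a multi-gigabyte input, the parse() call alone would have to build the complete tree before the XPath query could run.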
SAX, on the other hand, handles this problem by working as a “push” parser: events are
generated for each structure the parser encounters within the document, and the programmer
chooses which of them to handle. The disadvantage is that SAX typically generates a lot of
events the programmer doesn't care about. Moreover, the SAX API does not let you step through
your document iteratively; it blasts through the whole thing from beginning to end. In this
model, the parser controls the processing of the document.
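The push model can be sketched with the JDK's SAXParser and a DefaultHandler subclass. In this hypothetical example (SaxPushDemo, PriceHandler, and the <price> element are illustrative assumptions), the handler only wants prices, yet the parser calls startElement for every element and does not return from parse() until it has reached the end of the document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPushDemo {

    /** Receives every element event, but only cares about <price> text. */
    public static class PriceHandler extends DefaultHandler {
        private int callbacks;   // number of startElement calls pushed to us
        private double total;    // running sum of <price> values
        private boolean inPrice;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            callbacks++; // called for every element, wanted or not
            // qName is used because the default factory is not namespace-aware.
            inPrice = "price".equals(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (inPrice) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("price".equals(qName)) {
                total += Double.parseDouble(text.toString().trim());
                inPrice = false;
            }
        }

        public int getCallbacks() { return callbacks; }
        public double getTotal() { return total; }
    }

    /** The parser drives: parse() returns only after the whole document is processed. */
    public static PriceHandler parse(String xml) throws Exception {
        PriceHandler handler = new PriceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        PriceHandler h = parse("<order>"
                + "<item><name>A</name><price>2.5</price></item>"
                + "<item><name>B</name><price>1.5</price></item>"
                + "</order>");
        System.out.println(h.getCallbacks() + " startElement calls, total = " + h.getTotal());
        // prints "7 startElement calls, total = 4.0"
    }
}
```

Only two of the seven startElement calls concern prices; the handler still has to receive, inspect, and discard the other five.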
The StAX API gives you control akin to that of the Java I/O RandomAccessFile: you can skip
sections of the document, work with a subsection of it, pause and resume processing, or stop
processing at any time. Using the “pull” model, the application is in charge of how the
document is processed, and exerts this control by indicating which items it's interested in;
the parser then pulls them out of the event stream.
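The sketch below illustrates that kind of control with a plain XMLStreamReader loop, assuming a hypothetical document of <book> elements: the application handles only <title> start tags, skips each <reviews> subtree by tracking nesting depth, and stops pulling as soon as it has enough titles. The names SelectivePullDemo, firstTitles, and skipElement are illustrative, not part of the StAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SelectivePullDemo {

    /**
     * Collects the text of at most {@code limit} {@code <title>} elements, skipping
     * every {@code <reviews>} subtree and returning as soon as the limit is reached.
     */
    public static List<String> firstTitles(String xml, int limit) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<String>();
        try {
            // The loop condition is where the application decides to stop pulling.
            while (titles.size() < limit && reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // ignore text, end tags, comments, etc.
                }
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    // Reads the element's text and leaves the cursor on </title>.
                    titles.add(reader.getElementText());
                } else if ("reviews".equals(name)) {
                    skipElement(reader);
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    /**
     * Advances to the end tag matching the current start tag. The parser still
     * reads the nested content; the application simply never handles it.
     */
    private static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books>"
                + "<book><title>A</title>"
                + "<reviews><review><title>ignored</title></review></reviews></book>"
                + "<book><title>B</title></book>"
                + "<book><title>C</title></book>"
                + "</books>";
        System.out.println(firstTitles(xml, 2)); // prints [A, B]
    }
}
```

Skipping here means the application ignores those events; the parser still has to read through them to find the matching end tag. If you prefer to declare your interest up front, XMLInputFactory also provides createFilteredReader, which accepts a StreamFilter (or, for event readers, an EventFilter).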