Java Reference
        // Deal with the possible exceptions
        catch (IOException ioe) {
            System.out.println("Couldn't open file: " + fileName);
            System.exit(1);
        }
        catch (SAXException se) {
            System.out.println("Couldn't parse the XML file");
            System.exit(1);
        }
        catch (ParserConfigurationException pce) {
            System.out.println("Couldn't create a DocumentBuilder");
            System.exit(1);
        }
        // Finally return the Document object
        // that we built from the file
        return doc;
    }
}
In this case, we just create a Document object, work through each element in it, and describe the
content in the console. Naturally, you'll probably want to do more with your input than describe it
in the console, but this example shows you how to read an XML file. One thing to note is that each Element
object is also a Node object (the Element interface extends the Node interface). Because of the way DOM has
been designed, you sometimes need to work with both Element objects and Node objects, as I had to
do here when working with the attribute values.
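To make the Element/Node point concrete, here is a minimal, self-contained sketch using the standard JAXP and W3C DOM APIs. It is not the book's example: the class name ElementVsNodeDemo, the describe method, and the sample XML are hypothetical. It reads one attribute through the Element convenience method getAttribute, then walks all attributes through getAttributes, which returns a NamedNodeMap whose items are plain Node objects.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementVsNodeDemo {

    // Hypothetical sample input; not the file used in the book's example
    static final String XML =
        "<people><person name=\"Ada\" born=\"1815\"/><person name=\"Alan\" born=\"1912\"/></people>";

    public static String describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();

        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            // NodeList.item returns a Node; cast to reach Element-only methods
            Element person = (Element) people.item(i);

            // Element view: look up a single attribute by name
            out.append(person.getAttribute("name"));

            // Node view: each attribute is itself a Node (an Attr)
            NamedNodeMap attrs = person.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                out.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe(XML));
    }
}
```

The order in which getAttributes lists attributes is not guaranteed by DOM, so don't rely on it matching the order in the source document.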
Reading XML with SAX
SAX uses an interface called ContentHandler to expose parsing events that you can intercept in your
own code to do whatever processing you want for each parsing event. The SAX packages also
provide a default implementation of ContentHandler, called DefaultHandler (in org.xml.sax.helpers). DefaultHandler does
nothing with each event, because doing nothing is the default behavior. However, you can override the
methods in DefaultHandler to do whatever you like. The advantage of extending DefaultHandler is that
you can override just the methods you care about and leave the rest alone. In the example I've used here,
I didn't need many of the methods in DefaultHandler, so I didn't override them.
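As a rough illustration of overriding only the methods you need, the following sketch extends DefaultHandler, overrides four event methods to record the order in which they fire, and inherits the rest (such as characters) as no-ops. EventTraceHandler, trace, and the sample document are hypothetical; the parser itself comes from the standard SAXParserFactory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EventTraceHandler extends DefaultHandler {

    // Events recorded in the order the parser reports them
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // With a non-namespace-aware parser, qName holds the element name
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // characters(), ignorableWhitespace(), and the rest stay inherited no-ops

    public static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventTraceHandler handler = new EventTraceHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<book><title>SAX</title><author/></book>")) {
            System.out.println(event);
        }
    }
}
```

For the sample document in main, the recorded events are startDocument, start book, start title, end title, start author, end author, end book, and endDocument. Note that the empty author element still produces both a start and an end event.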
If you look at the names of the methods, you can see why SAX uses so little memory to process XML.
It triggers an event for the beginning and end of each part of an XML document, be it the document itself
or an element. So, all the parser has to hold in memory at any moment is the name (and a few other details,
such as the attributes) of the current element, plus enough context about the still-open parent elements
to check that the document is well formed. It doesn't have to put the element's content into memory
until it gets to the characters method, which is the method that handles an element's character content
(and the parser may deliver that content in several chunks rather than all at once).
Most elements don't have vast amounts of text content (one exception is when someone embeds an image
in an XML element, as some Word XML formats do), so the memory used to process the text usually isn't much.
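Because the parser may hand an element's text to characters in several pieces, a common pattern is to buffer the pieces in a StringBuilder and use the text only when endElement fires. The sketch below assumes leaf elements that contain only text (mixed content would need a stack of buffers); TextCollectingHandler, collect, and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectingHandler extends DefaultHandler {

    // Holds only the text of the element currently being read
    private final StringBuilder text = new StringBuilder();
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start fresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One run of text may arrive in several calls, so append rather than assign
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            results.add(qName + ": " + content);
        }
        text.setLength(0); // clear the buffer once the element is done
    }

    public static List<String> collect(String xml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<note><to>Tove</to><body>Hello &amp; welcome</body></note>"));
    }
}
```

For the sample in main, collect returns the two entries "to: Tove" and "body: Hello & welcome". The buffer only ever holds the text of one element at a time, which is the memory behavior described above.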
To show you how to read a simple XML document and describe its contents in the console, I first
created a class (called XMLToConsoleHandler) that extends DefaultHandler and overrides the handful of
methods I need to use when capturing the contents of an XML file. Here's the XMLToConsoleHandler class:
Listing 9-7. XMLToConsoleHandler
package com.bryantcs.examples.xml;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;