Writing and Reading XML - Java 7 for Absolute Beginners

  // Deal with the possible exceptions
  catch (IOException ioe) {
    System.out.println("Couldn't open file: " + fileName);
    System.exit(1);
  }
  catch (SAXException se) {
    System.out.println("Couldn't parse the XML file");
    System.exit(1);
  }
  catch (ParserConfigurationException pce) {
    System.out.println("Couldn't create a DocumentBuilder");
    System.exit(1);
  }
  // Finally return the Document object
  // that we built from the file
  return doc;
}

In this case, we just create a Document object, read through each line of the input, and describe the

content in the console. Naturally, you'll probably want to do something more than describe your input

in the console, but this example shows you how to read a file. One thing to note is that each Element

object is really a Node object (the Element interface extends the Node interface). Due to the way DOM has

been implemented, you sometimes need to work with both Element objects and Node objects, as I had to

do here when working with the attribute values.
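
The Element and Node interplay described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the listing from this chapter: the class name AttributeDemo, the method describeRootAttributes, and the sample XML string are all hypothetical. It assumes the XML is supplied as a string rather than read from a file. Note that getAttributes() is called on an Element, but each item in the returned NamedNodeMap comes back as a plain Node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical demo class; not part of the book's example code.
class AttributeDemo {

    // Returns one "name=value" string for each attribute of the root element.
    static List<String> describeRootAttributes(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // The root is handed back as an Element...
        Element root = doc.getDocumentElement();

        // ...but its attributes are exposed as generic Node objects.
        NamedNodeMap attributes = root.getAttributes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            result.add(attribute.getNodeName() + "=" + attribute.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input for illustration only.
        String xml = "<book id=\"42\" lang=\"en\"/>";
        for (String line : describeRootAttributes(xml)) {
            System.out.println(line);
        }
    }
}
```

The order in which a NamedNodeMap returns attributes is not specified by DOM, so code like this should not depend on it.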

Reading XML with SAX

SAX uses an interface called ContentHandler to expose parsing events that you can then intercept in your

own code to do whatever processing you want to do for each parsing event. The SAX packages also

provide a default implementation of ContentHandler, called DefaultHandler. DefaultHandler does

nothing with each event, because doing nothing is the default behavior. However, you can override the

methods in DefaultHandler to do whatever you like. The advantage of extending DefaultHandler is that

you can override just the methods you care about and leave the rest alone. In the example I've used here,

I didn't need many of the methods in DefaultHandler, so I didn't override them.
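
As a minimal sketch of overriding just the methods you care about (this is not the XMLToConsoleHandler class from this chapter), the hypothetical ElementCounter below extends DefaultHandler and overrides only startElement. Every other event falls through to the do-nothing defaults. The class name, the countElements helper, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements and ignores every other event.
class ElementCounter extends DefaultHandler {

    private int count = 0;

    // The only overridden callback; it fires once per opening tag.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        count++;
    }

    int getCount() {
        return count;
    }

    // Parses the given XML string and returns the number of elements seen.
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getCount();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input for illustration only.
        System.out.println(countElements("<a><b/><b/><c>text</c></a>"));
    }
}
```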

If you look at the names of the methods, you can see why SAX uses so little memory to process XML.

It triggers an event for the beginning and end of each part of an XML document, be it the document itself

or an element. So, all the parser has to put in memory is the name (and some other details, such as the

attributes) of the current element, not a tree of the element's children. It doesn't have to put the element's content into memory

until it gets to the characters method, which is the method that handles an element's character content.

Most elements don't have vast amounts of text content (one exception is when someone stores an image

in an XML element, as Word documents do), so the memory used to process the text usually isn't much.
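
To make the event sequence concrete, here is a small sketch that records each callback as the parser fires it. The EventTrace class, its trace helper, and the sample XML are hypothetical and exist only for this illustration. One assumption worth flagging: a SAX parser is allowed to deliver a single run of text through several calls to characters, so the sketch collects text in a buffer and reports it only when the next start or end tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the order of SAX events.
class EventTrace extends DefaultHandler {

    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        flushText();
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("endElement " + qName);
    }

    // May be called more than once per text run, so only buffer here.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Records any buffered, non-whitespace text as one event.
    private void flushText() {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            events.add("characters " + content);
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventTrace handler = new EventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input for illustration only.
        for (String event : trace("<note><to>Ann</to></note>")) {
            System.out.println(event);
        }
    }
}
```

At no point does the handler hold more than the current element's name and the text collected since the last tag, which is the memory behavior the paragraphs above describe.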

To show you how to read a simple XML document and describe its contents in the console, I first

created a class (called XMLToConsoleHandler) that extends DefaultHandler and overrides the handful of

methods I need to use when capturing the contents of an XML file. Here's the XMLToConsoleHandler class:

Listing 9-7. XMLToConsoleHandler

package com.bryantcs.examples.xml;

import org.xml.sax.Attributes;

import org.xml.sax.SAXException;
