Creating and Modifying XML Documents - Beginning Java 2 SDK


System.out.println(indent+" "+node);

This automatically outputs the string produced by the toString() method for the node. The corresponding output is quite revealing. The node for the root element comes first, and for it the toString() method generates the entire document contents:

<street> South Lasalle Street</street>

<city>Chicago</city>

<state>Illinois</state>

</address>

On the basis of this, when you create a new Document object and want to write it to a file, you might be tempted to use the toString() method for the root element to provide all the text for the document body. This would be unwise. It works for this particular parser, but you cannot be sure that another parser will do the same. No particular string is prescribed for toString() to return, so what you get depends entirely on the parser, and possibly on the particular release of the parser. When you want to write a document to a file, extract the data from the Document object and assemble the text yourself.
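As a rough illustration of assembling the text yourself, the following sketch walks the DOM tree and builds the markup from node names and values, without relying on toString(). The class and method names (XmlTextAssembler, toXml(), parse()) are hypothetical and not part of the original listing. It handles only element, attribute, and text nodes, and it assumes a JDK recent enough to provide StandardCharsets.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original listing.
class XmlTextAssembler {

    // Replace characters that cannot appear literally in XML text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Append the markup for a node and, recursively, for its children.
    static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue()));
        } else if (type == Node.ELEMENT_NODE) {
            out.append('<').append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(' ').append(attr.getNodeName()).append("=\"")
                   .append(escape(attr.getNodeValue())).append('"');
            }
            out.append('>');
            appendChildren(node, out);
            out.append("</").append(node.getNodeName()).append('>');
        } else if (type == Node.DOCUMENT_NODE) {
            appendChildren(node, out);
        }
        // Other node types (comments, CDATA sections, ...) are ignored in this sketch.
    }

    // Append the markup for every child of the given node, in document order.
    static void appendChildren(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), out);
        }
    }

    // Build the complete text for a node and everything below it.
    static String toXml(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    // Parse a string into a Document; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<address><city>Chicago</city><state>Illinois</state></address>");
        System.out.println(toXml(doc));
    }
}
```

Because the text is built only from getNodeName(), getNodeValue(), and getAttributes(), which the DOM API defines for every parser, the result should not depend on how a particular parser implements toString().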

The remainder of the listNodes() code iterates through the child nodes of the current node if it has any:

NodeList list = node.getChildNodes();        // Get the list of child nodes
if(list.getLength() > 0) {                   // As long as there are some...
  System.out.println(indent+"Child Nodes of "+nodeName+" are:");
  for(int i = 0 ; i<list.getLength() ; i++)  //...list them & their children...
    listNodes(list.item(i),indent+"  ");     // by calling listNodes() for each
}

The for loop simply iterates through the list of child nodes obtained by calling the

getChildNodes() method. Each child is passed as an argument to the listNodes() method, which

will list the node and iterate through its children. In this way the method will work through all the nodes

in the document. You can see that we append an extra couple of spaces to indent in the second

argument to the listNodes() call for a child node. The indent parameter in the next level down

will reference a string that is two spaces longer. This ensures that the output for the next level of nodes

will be indented relative to the current node.
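The recursion described above can be tried in isolation with a sketch like the following. It is a simplified stand-in rather than the full listNodes() method from the text. The class name NodeLister and the list() helper are hypothetical. It collects output in a StringBuilder instead of printing directly, and it lists each node by name only, so the result does not depend on any parser's toString().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class name; the recursion mirrors the listNodes() fragment in the text.
class NodeLister {

    // List a node, then each of its children one indent level (two spaces) deeper.
    static void listNodes(Node node, String indent, StringBuilder out) {
        String nodeName = node.getNodeName();
        out.append(indent).append("Node: ").append(nodeName).append('\n');

        NodeList list = node.getChildNodes();          // Get the list of child nodes
        if (list.getLength() > 0) {                    // As long as there are some...
            out.append(indent).append("Child Nodes of ").append(nodeName).append(" are:\n");
            for (int i = 0; i < list.getLength(); i++) {
                listNodes(list.item(i), indent + "  ", out);
            }
        }
    }

    // Parse the XML text and list every node from the root element down.
    static String list(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            listNodes(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(list("<address><city>Chicago</city></address>"));
    }
}
```

Each recursive call receives a longer indent string, so the depth of a node in the tree is visible directly from how far its line is indented.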

The output shows that the string the Crimson parser's toString() method produces for each node encompasses its child nodes too. Of course, we also get all of the child nodes explicitly as nodes in their own right, so there is a lot of duplication in the output. You may have noticed that the output is strange in some ways. We seem to have picked up some extra #text nodes that contain just whitespace. Each block of text or whitespace is returned as a node with the name #text, and that includes ignorable whitespace here. The newline characters at the end of each line in the original document, for instance, contribute text nodes that are ignorable whitespace.
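To see where these extra nodes come from, a small sketch can count the whitespace-only #text children of an element. The class and method names here (WhitespaceNodes, countWhitespaceText(), parseRoot()) are hypothetical, and the sample document is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original listing.
class WhitespaceNodes {

    // Count the direct children of parent that are #text nodes holding only whitespace.
    static int countWhitespaceText(Node parent) {
        int count = 0;
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node child = list.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Parse the XML text and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The line breaks and indentation between the child elements become #text nodes.
        Element root = parseRoot(
                "<address>\n  <city>Chicago</city>\n  <state>Illinois</state>\n</address>");
        System.out.println("Children: " + root.getChildNodes().getLength());
        System.out.println("Whitespace-only #text children: " + countWhitespaceText(root));
    }
}
```

In this sample the root element has two child elements, and the whitespace before, between, and after them accounts for the remaining child nodes.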

If you don't want to see it, getting rid of the ignorable whitespace is very simple. We just need to set

another parser feature in the factory object:
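A minimal sketch of such a setting, assuming the factory is a standard JAXP DocumentBuilderFactory, uses its setIgnoringElementContentWhitespace() method. The class name, the rootChildCount() helper, and the sample document with its internal DTD are hypothetical. The sketch turns validation on and supplies a DTD because the parser needs the content model to tell which whitespace is ignorable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original listing.
class IgnoreWhitespaceDemo {

    // Sample document with an internal DTD declaring element-only content for <address>.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE address [\n"
        + "  <!ELEMENT address (city, state)>\n"
        + "  <!ELEMENT city (#PCDATA)>\n"
        + "  <!ELEMENT state (#PCDATA)>\n"
        + "]>\n"
        + "<address>\n"
        + "  <city>Chicago</city>\n"
        + "  <state>Illinois</state>\n"
        + "</address>\n";

    // Parse the sample and report how many child nodes the root element has.
    static int rootChildCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // the content model comes from the DTD
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Keeping whitespace:  " + rootChildCount(false));
        System.out.println("Ignoring whitespace: " + rootChildCount(true));
    }
}
```

With the setting enabled, the whitespace-only #text nodes between the child elements should no longer appear in the tree, so the root element reports fewer children.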
