Java Reference
In-Depth Information
System.out.println(indent+" "+node);
Because node appears in a string concatenation, this implicitly calls the toString() method for the node and
outputs the string it returns. Take a look at the corresponding output, as it is quite revealing. The node for
the root element comes first, and for this the toString() method generates the entire document contents:
<address>
<buildingnumber> 29 </buildingnumber>
<street> South Lasalle Street</street>
<city>Chicago</city>
<state>Illinois</state>
<zip>60603</zip>
</address>
On the basis of this, when you create a new Document object and want to write it as a document to a file,
you might be tempted to use the toString() method for the root element to provide all the text for the
document body. This would be unwise. It would work for this particular parser, but you cannot be sure that
another parser will do the same. The DOM interfaces do not prescribe what toString() returns for a node, so
what you get depends entirely on the parser, and possibly on the particular release of the parser. When you
want to write a document to a file, extract the data from the Document object and assemble the text yourself.
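As a rough illustration of assembling the text yourself, here is a minimal sketch that walks the nodes and builds the markup by hand. The class name DomTextAssembler and its methods are our own inventions for this illustration, not part of any parser API. It handles only elements, attributes, and text, and silently skips comments, CDATA sections, and processing instructions, so treat it as a starting point rather than a complete writer:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration only.
class DomTextAssembler {

    // Parses XML held in a string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Escapes characters that must not appear literally in text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Appends the markup for one node and, recursively, for its children.
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.append(' ').append(attr.getNodeName())
                       .append("=\"").append(escape(attr.getNodeValue())).append('"');
                }
                out.append('>');
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    write(children.item(i), out);
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            }
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            default:
                // Other node types are skipped in this sketch.
                break;
        }
    }

    // Builds the text for the document body, starting at the root element.
    static String toXml(Document doc) {
        StringBuilder out = new StringBuilder();
        write(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<address><city>Chicago</city><zip>60603</zip></address>");
        System.out.println(toXml(doc));
    }
}
```

Because the text is built only from getNodeName() and getNodeValue(), the result does not depend on what any particular parser chooses to return from toString().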
The remainder of the listNodes() code iterates through the child nodes of the current node if it has any:
NodeList list = node.getChildNodes();       // Get the list of child nodes
if(list.getLength() > 0) {                  // As long as there are some...
  System.out.println(indent+"Child Nodes of "+nodeName+" are:");
  for(int i = 0 ; i<list.getLength() ; i++) //...list them & their children...
    listNodes(list.item(i),indent+"  ");    // by calling listNodes() for each
}
The for loop simply iterates through the list of child nodes obtained by calling the
getChildNodes() method. Each child is passed as an argument to the listNodes() method, which
will list the node and iterate through its children. In this way the method will work through all the nodes
in the document. You can see that we append an extra couple of spaces to indent in the second
argument to the listNodes() call for a child node. The indent parameter in the next level down
will reference a string that is two spaces longer. This ensures that the output for the next level of nodes
will be indented relative to the current node.
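To see the recursion and the growing indent in isolation, here is a small self-contained sketch. It is a simplified variant of listNodes(), not the version developed in the text: it records only the node name (rather than the toString() result) and collects the lines in a StringBuilder so that the result is easy to inspect. The class name ListNodesSketch and the render() helper are assumptions made for this illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class for illustration only.
class ListNodesSketch {

    // Appends one line for this node, then recurses into each child
    // with an indent that is two spaces longer.
    static void listNodes(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName()).append('\n');
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            listNodes(list.item(i), indent + "  ", out);
        }
    }

    // Parses the XML text and returns the indented listing for the root element.
    static String render(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            listNodes(root, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<address><city>Chicago</city></address>"));
    }
}
```

Each level of the document appears two spaces further to the right than its parent, with the text content of city showing up as a #text node at the deepest level.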
The output shows that the string the Crimson parser produces from toString() for each node encompasses
its child nodes too. Of course, we also get each child explicitly as a node in its own right, so there is a
lot of duplication in the output. You may have noticed that the output is strange in some ways. We seem to
have picked up some extra #text nodes that contain just whitespace. Each block of text or whitespace is
returned as a node with the name #text, and that includes ignorable whitespace here. The newline characters
at the end of each line in the original document, for instance, contribute text nodes that are ignorable
whitespace.
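The effect is easy to reproduce. The following sketch counts the child nodes of the root element and how many of them are whitespace-only #text nodes. The class name WhitespaceNodesSketch and the method countRootChildren() are made up for this illustration, and the exact counts may vary between parsers, since a parser is free to split text into more than one node:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class for illustration only.
class WhitespaceNodesSketch {

    // Returns {total child nodes, whitespace-only #text child nodes} for the root element.
    static int[] countRootChildren(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            int blank = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    blank++;
                }
            }
            return new int[] { children.getLength(), blank };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The same two elements, once on separate indented lines and once on a single line.
        String multiLine = "<address>\n  <city>Chicago</city>\n  <zip>60603</zip>\n</address>";
        String singleLine = "<address><city>Chicago</city><zip>60603</zip></address>";
        for (String xml : new String[] { multiLine, singleLine }) {
            int[] counts = countRootChildren(xml);
            System.out.println(counts[0] + " child nodes, " + counts[1] + " whitespace-only");
        }
    }
}
```

The single-line version has only the two element children, whereas the version spread over several lines picks up extra #text nodes from the line breaks and indentation.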
If you don't want to see it, getting rid of the ignorable whitespace is very simple. We just need to set
another parser feature in the factory object:
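The statement itself is not reproduced in this excerpt. One standard JAXP setting that fits the description is setIgnoringElementContentWhitespace() on the DocumentBuilderFactory, and the sketch below assumes that this is the feature meant. The parser can only tell that whitespace is ignorable if it knows the content model of the enclosing element, so the sketch also assumes a validating parser and a document with a DTD; the class name, the helper method, and the small inline DTD are invented for this illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
class IgnoreWhitespaceSketch {

    // A small document with an internal DTD that declares address as element-only content.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE address [\n"
        + "  <!ELEMENT address (city, zip)>\n"
        + "  <!ELEMENT city (#PCDATA)>\n"
        + "  <!ELEMENT zip (#PCDATA)>\n"
        + "]>\n"
        + "<address>\n"
        + "  <city>Chicago</city>\n"
        + "  <zip>60603</zip>\n"
        + "</address>\n";

    // Returns the number of child nodes of the root element.
    static int rootChildCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // assumption: a DTD is available to validate against
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Keeping whitespace:  " + rootChildCount(false) + " child nodes");
        System.out.println("Ignoring whitespace: " + rootChildCount(true) + " child nodes");
    }
}
```

With the setting switched on, only the city and zip elements should remain as children of address. Whitespace inside elements that are declared to hold text, such as city, is never dropped, since it is part of the data.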