System.out.println(indent+" "+node);
This automatically outputs the string produced by the toString() method for the node. Take a look at the corresponding output, as it is quite revealing. The node corresponding to the root element comes first, and for this node the toString() method generates the entire document contents:
<address>
<buildingnumber> 29 </buildingnumber>
<street> South Lasalle Street</street>
<city>Chicago</city>
<state>Illinois</state>
<zip>60603</zip>
</address>
On the basis of this, when you create a new Document object and want to write it as a document to a file, you might be tempted to use the toString() method for the root element to provide all the text for the document body. This would be unwise. It would work for this particular parser, but you cannot be sure that another parser will do the same. There is no prescribed string returned by toString(), so what you get depends entirely on the parser, and possibly on the particular release of the parser. When you want to write a document to a file, extract the data from the Document object and assemble the text yourself.
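As a rough illustration of assembling the text yourself, here is a minimal sketch that walks a DOM tree and builds the markup by hand. The class DomTextWriter and its methods toXml() and escape() are hypothetical names introduced for this example, not part of the program in this chapter. The sketch assumes that only element, attribute and text nodes matter; comments, CDATA sections, the XML declaration and any DOCTYPE are simply skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only
class DomTextWriter {

  // Returns markup for the given node and all of its descendants
  static String toXml(Node node) {
    StringBuilder sb = new StringBuilder();
    append(node, sb);
    return sb.toString();
  }

  private static void append(Node node, StringBuilder sb) {
    short type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      sb.append(escape(node.getNodeValue()));
    } else if (type == Node.ELEMENT_NODE) {
      sb.append('<').append(node.getNodeName());
      NamedNodeMap attrs = node.getAttributes();
      for (int i = 0; i < attrs.getLength(); i++) {
        Node attr = attrs.item(i);
        sb.append(' ').append(attr.getNodeName())
          .append("=\"").append(escape(attr.getNodeValue())).append('"');
      }
      sb.append('>');
      appendChildren(node, sb);
      sb.append("</").append(node.getNodeName()).append('>');
    } else if (type == Node.DOCUMENT_NODE) {
      appendChildren(node, sb);      // The document node itself produces no text
    }
    // Other node types (comments, CDATA sections, ...) are skipped in this sketch
  }

  private static void appendChildren(Node node, StringBuilder sb) {
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
      append(list.item(i), sb);
    }
  }

  // Replaces the characters that cannot appear literally in text or attribute values
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
  }

  public static void main(String[] args) throws Exception {
    String xml = "<address><city>Chicago</city><zip>60603</zip></address>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    System.out.println(toXml(doc));
  }
}
```

Because the text is built only from getNodeName() and getNodeValue(), whose results are defined by the DOM interfaces, the output should not depend on which parser produced the tree.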
The remainder of the listNodes() code iterates through the child nodes of the current node, if it has any:
NodeList list = node.getChildNodes();         // Get the list of child nodes
if(list.getLength() > 0) {                    // As long as there are some...
  System.out.println(indent+"Child Nodes of "+nodeName+" are:");
  for(int i = 0 ; i<list.getLength() ; i++)   //...list them & their children...
    listNodes(list.item(i),indent+"  ");      // by calling listNodes() for each
}
The for loop simply iterates through the list of child nodes obtained by calling the getChildNodes() method. Each child is passed as an argument to the listNodes() method, which lists the node and iterates through its children. In this way the method works through all the nodes in the document. Notice that we append two extra spaces to indent in the second argument to the listNodes() call for a child node, so the indent parameter at the next level down references a string that is two spaces longer. This ensures that the output for the next level of nodes is indented relative to the current node.
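To see the recursion and the growing indent in isolation, here is a cut-down sketch of the same idea. It is a simplified variant rather than the listNodes() method from this chapter: it prints only the node name instead of relying on toString(), and the class name ListNodesSketch and the tiny in-memory document are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Simplified variant for illustration only
class ListNodesSketch {

  // Prints the name of a node, then lists its children two spaces further in
  static void listNodes(Node node, String indent) {
    String nodeName = node.getNodeName();
    System.out.println(indent + nodeName);

    NodeList list = node.getChildNodes();
    if (list.getLength() > 0) {
      System.out.println(indent + "Child Nodes of " + nodeName + " are:");
      for (int i = 0; i < list.getLength(); i++) {
        listNodes(list.item(i), indent + "  ");   // Each level adds two spaces
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Written on one line so that no whitespace-only text nodes appear
    String xml = "<address><city>Chicago</city></address>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    listNodes(doc.getDocumentElement(), "");
  }
}
```

For this small document the sketch prints address with no indent, city indented by two spaces, and the #text node holding Chicago indented by four, which mirrors how each recursive call receives a string two spaces longer than its caller's.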
You can see from the output that the string produced by the toString() method for each node by the Crimson parser encompasses its child nodes too. Of course, we also get all of them explicitly as nodes in their own right, so there is a lot of duplication in the output. You may have noticed that the output is strange in some ways. We seem to have picked up some extra #text nodes that contain just whitespace. Each block of text or whitespace is returned as a node with the name #text, and that includes ignorable whitespace here. The newline characters at the end of each line in the original document, for instance, contribute text nodes that are ignorable whitespace.
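The following sketch shows where these extra nodes come from by counting the whitespace-only text nodes among the children of a root element. The class WhitespaceNodes and its methods are hypothetical names for this example, and the check simply treats any text node whose value trims to an empty string as whitespace. It assumes a parser obtained from a default DocumentBuilderFactory with no features changed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only
class WhitespaceNodes {

  // True for a #text node that holds nothing but spaces, tabs or line breaks
  static boolean isWhitespaceText(Node node) {
    return node.getNodeType() == Node.TEXT_NODE
        && node.getNodeValue().trim().isEmpty();
  }

  // Counts the direct children of parent that are whitespace-only text nodes
  static int countWhitespaceChildren(Node parent) {
    NodeList list = parent.getChildNodes();
    int count = 0;
    for (int i = 0; i < list.getLength(); i++) {
      if (isWhitespaceText(list.item(i))) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws Exception {
    // The line breaks and indentation between elements become text nodes
    String xml = "<address>\n  <city>Chicago</city>\n  <zip>60603</zip>\n</address>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Node root = doc.getDocumentElement();
    System.out.println(root.getChildNodes().getLength() + " child nodes, "
        + countWhitespaceChildren(root) + " of them whitespace-only #text nodes");
  }
}
```

Here the root element has two child elements but five child nodes in all, because the line break and indentation before, between and after the elements each arrive as a separate #text node.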
If you don't want to see it, getting rid of the ignorable whitespace is very simple. We just need to set
another parser feature in the factory object: