The indent parameter defines the indentation for the current node. Calling getNodeType() for the
node object returns a value of type short that identifies the node type. You then pass this value to the
nodeType() helper method that you've added to the TryDOM class. The code for the helper method is just
a switch statement with the constants from the Node interface that identify the types of nodes as case
values. I just included a representative set in the code, but you can add case labels for all 12 node-type
constants if you want.
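A minimal sketch of such a helper follows. The class name NodeTypeDemo and the strings returned are illustrative assumptions, not the book's listing; only the Node constants come from the org.w3c.dom API.

```java
import org.w3c.dom.Node;

public class NodeTypeDemo {
    // Map the short returned by Node.getNodeType() to a readable label.
    // The labels are illustrative; only a representative set of cases is shown.
    static String nodeType(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATA section";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.DOCUMENT_TYPE_NODE:          return "Document type";
            default:                               return "Unidentified (" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeType(Node.ELEMENT_NODE));
        System.out.println(nodeType(Node.TEXT_NODE));
    }
}
```

The Node constants are compile-time constants of type short, which is why they can be used directly as case labels.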
The remainder of the listNodes() code iterates through the child nodes of the current node if it has any:
NodeList list = node.getChildNodes();            // Get the list of child nodes
if(list.getLength() > 0) {                       // As long as there are some...
  System.out.println(indent+"Child Nodes of " + nodeName + " are:");
  //...list them & their children...
  // ...by calling listNodes() for each
  for(int i = 0 ; i < list.getLength() ; ++i) {
    listNodes(list.item(i),indent + "  ");
  }
}
The for loop simply iterates through the list of child nodes obtained by calling the getChildNodes()
method. Each child is passed as an argument to the listNodes() method, which lists the node and
iterates through its children. Because listNodes() calls itself for every child, the method works through
all the nodes in the document. Note that the second argument to the listNodes() call for a child node
appends two spaces to indent, so the indent parameter at the next level down references a string that is
two spaces longer. This ensures that the output for the next level of nodes is indented relative to the
current node.
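The recursion can be tried in isolation with the following sketch. It is a simplified stand-in for the TryDOM code, not the book's listing: the class name ListNodesDemo, the listing() wrapper, the StringBuilder parameter, and the sample XML are all assumptions made so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListNodesDemo {
    // Record the current node, then recurse into each child with a longer indent.
    static void listNodes(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName()).append('\n');
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); ++i) {
            listNodes(list.item(i), indent + "  ", out);   // Two extra spaces per level
        }
    }

    // Hypothetical wrapper: parse XML held in a string and return the indented listing.
    static String listing(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            listNodes(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The sample has no whitespace between tags, so no #text nodes appear.
        System.out.print(listing("<a><b><c/></b><d/></a>"));
    }
}
```

For the sample document, each element name appears on its own line, indented two spaces further than its parent.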
Ignorable Whitespace and Element Content
Some of the elements have multiple #text nodes recorded in the output. The #text nodes arise from
two things: text that represents element content and ignorable whitespace that is there to present the markup
in a readable fashion. If you don't want to see the ignorable whitespace, you can get rid of it quite easily.
You just need to set another parser feature in the factory object:
builderFactory.setNamespaceAware(true); // Set namespace aware
builderFactory.setValidating(true); // and validating parser
builderFactory.setIgnoringElementContentWhitespace(true);
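Calling setIgnoringElementContentWhitespace(true) results in a parser that does not report ignorable
whitespace as a node, so you don't see it in the Document object. The setting relies on the content model
declared for each element, so it only takes effect when the parser is validating. If you run the example
again with this change, the #text nodes arising from ignorable whitespace are no longer there.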
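The effect of the setting can be checked with a small sketch like the one below. The class name IgnoreWhitespaceDemo, the childCount() helper, and the sample document with its internal DTD are assumptions for illustration; they are not part of the TryDOM example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreWhitespaceDemo {
    // Hypothetical sample: the internal DTD declares that <circle> contains
    // only elements, so whitespace between its tags is ignorable.
    static final String XML =
          "<!DOCTYPE circle [\n"
        + "  <!ELEMENT circle (position)>\n"
        + "  <!ELEMENT position EMPTY>\n"
        + "]>\n"
        + "<circle>\n  <position/>\n</circle>";

    // Count the child nodes of the root element with the setting on or off.
    static int childCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);   // Set namespace aware
            builderFactory.setValidating(true);       // and validating parser
            builderFactory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = builderFactory.newDocumentBuilder()
                                         .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Children with whitespace kept:    " + childCount(false));
        System.out.println("Children with whitespace ignored: " + childCount(true));
    }
}
```

With the JDK's default parser, the root element should report three children when the whitespace is kept (two #text nodes around the position element) and one child when it is ignored.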
That still leaves some other #text nodes that represent element content, and you really do want to
access that content and display it. In this case you can use the getWholeText() method for a node of type Text to
obtain all of the content as a single string. You could modify the code in the listNodes() method in the
example to do this:
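As a minimal sketch of what getWholeText() returns on its own, the following builds an element with two adjacent Text nodes and reads them back as one string. The class name WholeTextDemo and the helpers contentOf() and demo() are illustrative assumptions and are not part of the TryDOM example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

public class WholeTextDemo {
    // Return the combined text of a Text node and its adjacent Text siblings,
    // or null when the node is not a Text node.
    static String contentOf(Node node) {
        return (node instanceof Text) ? ((Text) node).getWholeText() : null;
    }

    // Build <name> with two adjacent Text children and read the first one.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element name = doc.createElement("name");
            doc.appendChild(name);
            name.appendChild(doc.createTextNode("Hello, "));
            name.appendChild(doc.createTextNode("world"));
            return contentOf(name.getFirstChild());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // Prints the two text nodes as one string
    }
}
```

Checking the node with instanceof Text before the cast keeps the helper safe to call for any node type that listNodes() might encounter.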