The indent parameter defines the indentation for the current node. Calling getNodeType() for the node object returns a value of type short that identifies the node type. You then pass this value to the nodeType() helper method that you've added to the TryDOM class. The code for the helper method is just a switch statement with the constants from the Node interface that identify the types of nodes as case values. I included only a representative set in the code, but you can add case labels for all 12 node-type constants if you want.
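As a rough illustration of the helper described above, here is a minimal sketch of such a switch. The class name NodeTypeDemo and the returned strings are assumptions made for this example, not the book's TryDOM code; only the Node constants come from the org.w3c.dom API.

```java
import org.w3c.dom.Node;

public class NodeTypeDemo {
    // Maps the short returned by Node.getNodeType() to a readable name.
    // Only a representative subset of the node-type constants is covered;
    // the strings returned here are illustrative choices.
    static String nodeType(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.ATTRIBUTE_NODE:              return "Attribute";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATA section";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.DOCUMENT_TYPE_NODE:          return "Document type";
            default:                               return "Unidentified";
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeType(Node.ELEMENT_NODE));
        System.out.println(nodeType(Node.TEXT_NODE));
    }
}
```

Because the Node constants are compile-time constants of type short, they can be used directly as case labels without a cast.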
The remainder of the listNodes() code iterates through the child nodes of the current node if it has any:
    NodeList list = node.getChildNodes();       // Get the list of child nodes
    if(list.getLength() > 0) {                  // As long as there are some...
      System.out.println(indent+"Child Nodes of " + nodeName + " are:");
      //...list them & their children...
      // ...by calling listNodes() for each
      for(int i = 0 ; i < list.getLength() ; ++i) {
        listNodes(list.item(i),indent + "  ");
      }
    }
The for loop simply iterates through the list of child nodes obtained by calling the getChildNodes() method. Each child is passed as an argument to the listNodes() method, which lists the node and iterates through its children. In this way the method works through all the nodes in the document. Notice that you append two extra spaces to indent in the second argument to the listNodes() call for a child node, so the indent parameter at the next level down references a string that is two spaces longer. This ensures that the output for the next level of nodes is indented relative to the current node.
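The recursion and the growing indent can be seen in isolation with a stripped-down sketch. The class name ListNodesDemo, the render() wrapper, the StringBuilder parameter, and the sample XML are assumptions for this example; the version in the text prints directly and also reports node types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListNodesDemo {
    // Appends one line per node to out; each level down is indented
    // by two more spaces than its parent.
    static void listNodes(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName()).append('\n');
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); ++i) {
            listNodes(list.item(i), indent + "  ", out);
        }
    }

    // Parses the XML string and returns the indented listing of its nodes.
    static String render(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            listNodes(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The sample has no whitespace between tags, so no #text nodes appear.
        System.out.print(render("<sketch><circle><position/></circle><line/></sketch>"));
    }
}
```

Each recursive call receives a new, longer string rather than modifying a shared one, so the indentation automatically drops back to the parent's level when the call returns.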
Ignorable Whitespace and Element Content
Some of the elements have multiple #text nodes recorded in the output. The #text nodes arise from two things: text that represents element content, and ignorable whitespace that is there to present the markup in a readable fashion. If you don't want to see the ignorable whitespace, you can get rid of it quite easily. You just need to set another parser feature in the factory object:
builderFactory.setNamespaceAware(true); // Set namespace aware
builderFactory.setValidating(true); // and validating parser
builderFactory.setIgnoringElementContentWhitespace(true);
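Calling the setIgnoringElementContentWhitespace() method with the argument true results in a parser that does not report ignorable whitespace as a node, so you don't see it in the Document object. This relies on the parser being in validating mode, because it needs the DTD to determine which whitespace is ignorable. If you run the example again with this change, the #text nodes arising from ignorable whitespace are no longer there.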
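To see the effect of this setting on its own, the following sketch parses a small document twice and counts the children of the root element. The class name WhitespaceDemo, the childCount() method, and the sample document with its internal DTD are all assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Hypothetical sample document. The internal DTD declares that <sketch>
    // contains only elements, so the line breaks and spaces inside it are
    // ignorable whitespace.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE sketch [\n"
      + "  <!ELEMENT sketch (circle)>\n"
      + "  <!ELEMENT circle (#PCDATA)>\n"
      + "]>\n"
      + "<sketch>\n"
      + "  <circle>round</circle>\n"
      + "</sketch>\n";

    // Returns the number of child nodes of the root element.
    static int childCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);   // The DTD is needed to classify whitespace
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Keeping whitespace:  " + childCount(false));
        System.out.println("Ignoring whitespace: " + childCount(true));
    }
}
```

When whitespace is kept, the root element's children are the circle element plus a #text node on either side of it; when it is ignored, only the circle element remains. The text inside circle is element content, not ignorable whitespace, so it is unaffected by the setting.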
That still leaves some other #text nodes that represent element content, and you really do want to access that content and display it. In this case you can use the getWholeText() method for a node of type Text to obtain all of the content as a single string. You could modify the code in the listNodes() method in the example to do this: