How It Works
The static newInstance() method in the DocumentBuilderFactory class returns a reference to a factory object. We call the newDocumentBuilder() method for the factory object to obtain a reference to a DocumentBuilder object that encapsulates a DOM parser. This will be the default parser. If we want the parser to validate the XML or provide other capabilities, we need to set the parser features by calling methods for the DocumentBuilderFactory object before we create the DocumentBuilder object.
You can see that we get a version of the Crimson parser as a DOM parser. Many DOM parsers are built on top of SAX parsers, and this is the case with both the Crimson and Xerces parsers.
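The steps described above can be sketched as follows. This is only a minimal illustration: the class name DefaultParserSketch and the helper createDefaultBuilder() are made up for the sketch, and the parser class that gets printed depends on which JAXP implementation is installed, so it may be Crimson, Xerces, or something else entirely.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class DefaultParserSketch {
    // Hypothetical helper: creates a DOM parser using the factory's default settings
    static DocumentBuilder createDefaultBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // Thrown if a parser with the requested configuration cannot be created
            throw new IllegalStateException("No DOM parser could be configured", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder builder = createDefaultBuilder();
        // The concrete class reveals which parser implementation is in use
        System.out.println("Builder class:   " + builder.getClass().getName());
        System.out.println("Namespace aware: " + builder.isNamespaceAware());
        System.out.println("Validating:      " + builder.isValidating());
    }
}
```

Because no features are set on the factory here, the last two lines should both report false.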
Setting DOM Parser Features
The idea of a feature for a DOM parser is the same as with SAX: a parser option that can be either on or off. The DocumentBuilderFactory object has the following methods for setting DOM parser features:
setNamespaceAware(boolean aware)
    Calling this method with a true argument sets the parser to be namespace aware. The default setting is false.
setValidating(boolean validating)
    Calling this method with a true argument sets the parser to validate the XML in a document as it is parsed. The default setting is false.
setIgnoringElementContentWhitespace(boolean ignore)
    Calling this method with a true argument sets the parser to remove ignorable whitespace in element content so the Document object produced by a parser will not contain ignorable whitespace. The default setting is false.
setIgnoringComments(boolean ignore)
    Calling this method with a true argument sets the parser to remove comments as the document is parsed. The default setting is false.
setExpandEntityReferences(boolean expand)
    Calling this method with a true argument sets the parser to expand entity references. The default setting is true.
setCoalescing(boolean coalesce)
    Calling this method with a true argument sets the parser to convert CDATA sections to text and append it to any adjacent text. The default setting is false.
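To get a feel for what two of these features do to the resulting Document object, here is a small sketch. It is not part of the original example: the class name ParserFeatureSketch, the helper countRootChildren(), and the sample XML string are all assumptions made purely for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParserFeatureSketch {
    // Illustrative document: a comment, some text, and a CDATA section in the root element
    static final String XML = "<note><!-- a comment -->Hello <![CDATA[<World>]]></note>";

    // Hypothetical helper: parses XML with the given feature settings and
    // returns the number of child nodes the root element ends up with
    static int countRootChildren(boolean ignoreComments, boolean coalesce) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Features are set on the factory before the parser is created
            factory.setIgnoringComments(ignoreComments);
            factory.setCoalescing(coalesce);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Defaults:                 " + countRootChildren(false, false));
        System.out.println("Ignoring comments:        " + countRootChildren(true, false));
        System.out.println("Ignoring and coalescing:  " + countRootChildren(true, true));
    }
}
```

With the default settings the root element should have three children: the comment, the text, and the CDATA section. Ignoring comments should leave two, and coalescing as well should merge the CDATA section into the text to leave just one.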
As you see, by default the parser that is produced is neither namespace aware nor validating. We should
at least set these two features before creating our parser. This is quite simple: