Document xmlDoc = null;
// try-with-resources closes the stream when the block exits
try (BufferedInputStream in =
         new BufferedInputStream(Files.newInputStream(xmlFile))) {
    xmlDoc = builder.parse(in);    // Parse the file into a DOM tree
} catch (SAXException | IOException e) {
    e.printStackTrace();
    System.exit(1);
}
This creates a Path object for the file and creates an input stream for the file in the try block. Calling parse() for the builder object with the input stream as the argument parses the XML file and returns it as a Document object. Note that the entire XML file contents are encapsulated by the Document object, so in practice this can require a lot of memory.
To compile this code you need import statements for the BufferedInputStream and IOException names in the java.io package, and Paths, Path, and Files names in the java.nio.file package, as well as the org.w3c.dom.Document class name. After this code executes, you can call methods for the xmlDoc object to navigate through the elements in the document tree structure. Let's look at what the possibilities are.
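As a self-contained illustration of the parsing step, here is a minimal sketch that builds its own DocumentBuilder and parses a small file. The class name ParseToDocument, the helper parseFile(), and the sample XML content are hypothetical and used only for this demonstration. Instead of exiting on failure, the sketch lets exceptions propagate to the caller.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseToDocument {

    // Parses the file at xmlFile and returns the whole document as a DOM tree.
    static Document parseFile(Path xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // try-with-resources closes the stream even if parse() throws
        try (BufferedInputStream in =
                 new BufferedInputStream(Files.newInputStream(xmlFile))) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample content written to a temporary file for the demo
        Path xmlFile = Files.createTempFile("sample", ".xml");
        Files.write(xmlFile,
                "<address><city>Paris</city></address>"
                        .getBytes(StandardCharsets.UTF_8));
        try {
            Document xmlDoc = parseFile(xmlFile);
            // The root element is reachable from the Document node
            System.out.println(xmlDoc.getDocumentElement().getNodeName());
        } finally {
            Files.delete(xmlFile);
        }
    }
}
```

Running main() prints the name of the root element, address. Note that this sketch also needs imports from javax.xml.parsers and org.xml.sax, because it creates the builder itself and declares SAXException.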
NAVIGATING A DOCUMENT OBJECT TREE
The org.w3c.dom.Node interface is fundamental to all objects that encapsulate components of an XML document, and this includes the Document object itself. It represents a type that encapsulates a node in the document tree. Node is also a super-interface of a number of other interfaces that declare methods for accessing document components. The subinterfaces of Node that identify components of a document are the following:
• Element: Represents an XML element.
• Text: Represents text that is part of element content. This is a subinterface of CharacterData, which is a subinterface of Node. Text references, therefore, have methods from all three interfaces.
• CDATASection: Represents a CDATA section, which is unparsed character data. This extends Text.
• Comment: Represents a document comment. This interface extends the CharacterData interface.
• DocumentType: Represents the type of a document.
• Document: Represents the entire XML document.
• DocumentFragment: Represents a lightweight document object that encapsulates a subtree of a document.
• Entity: Represents an entity that may be parsed or unparsed.
• EntityReference: Represents a reference to an entity.
• Notation: Represents a notation declared in the DTD for a document. A notation is a definition of an unparsed entity type.
• ProcessingInstruction: Represents a processing instruction for an application.
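One way to see how these interface types show up in a parsed document is to ask each node what kind of node it is. The following is a minimal sketch that uses the getNodeType() method and the type constants declared in the standard org.w3c.dom.Node interface. The class name NodeKinds, the helpers parse() and kindOf(), and the sample XML string are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeKinds {

    // Maps a node's type code to the name of the matching Node subinterface.
    static String kindOf(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.DOCUMENT_TYPE_NODE:          return "DocumentType";
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DocumentFragment";
            case Node.ENTITY_NODE:                 return "Entity";
            case Node.ENTITY_REFERENCE_NODE:       return "EntityReference";
            case Node.NOTATION_NODE:               return "Notation";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default:                               return "Other"; // e.g. attributes
        }
    }

    // Parses XML held in a string so that the demo needs no file.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document containing several kinds of node
        Document doc = parse("<note><!-- reminder -->Call home<![CDATA[<now>]]></note>");
        System.out.println("Top of tree: " + kindOf(doc));

        Node root = doc.getDocumentElement();
        System.out.println("Root: " + kindOf(root) + " " + root.getNodeName());

        // List the kind of each child of the root element
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println("  Child " + i + ": " + kindOf(children.item(i)));
        }
    }
}
```

An alternative is to test a node with instanceof against the interface types directly. Because CDATASection extends Text, such tests need to check the more specific interface first, whereas getNodeType() returns one distinct code per node.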
Each of these interfaces declares its own set of methods and inherits the fields and methods declared in the Node interface. Every XML document is modeled as a hierarchy of nodes that are accessible as one or another of the interface types in the list. At the top of the node hierarchy for a document is the Document node that is returned by the parse() method. Each type of node may or may not have child nodes in the