The XML document would then be:
<?xml version="1.0"?>
<!DOCTYPE proverb SYSTEM "proverbDoc.dtd">
<proverb>A little knowledge is a dangerous thing.</proverb>
The DTD is referenced by a relative URI, which is resolved relative to the directory containing the document.
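To see the relative reference at work, here is a minimal sketch using the standard JAXP DOM classes in javax.xml.parsers. Since the contents of proverbDoc.dtd are not shown here, it assumes the DTD holds the single declaration <!ELEMENT proverb (#PCDATA)>; the class name RelativeDtdDemo is hypothetical. The sketch writes the document and the DTD into the same temporary directory and parses the document from a File, so the parser knows where the document lives and can resolve "proverbDoc.dtd" against that directory. A document that breaks the DTD's rules is rejected, which confirms that the DTD was actually located and applied.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the content of proverbDoc.dtd is an assumption.
class RelativeDtdDemo {

    // Assumed DTD: <proverb> may contain only text.
    static final String DTD = "<!ELEMENT proverb (#PCDATA)>\n";

    /**
     * Writes proverbDoc.dtd and proverb.xml into one temporary directory,
     * parses proverb.xml with validation, and returns the <proverb> text,
     * or null if the document could not be parsed or is invalid.
     */
    static String parse(String proverbContent) {
        try {
            Path dir = Files.createTempDirectory("proverb");
            Path dtd = dir.resolve("proverbDoc.dtd");
            Path xml = dir.resolve("proverb.xml");
            Files.writeString(dtd, DTD);
            Files.writeString(xml, "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE proverb SYSTEM \"proverbDoc.dtd\">\n"
                + "<proverb>" + proverbContent + "</proverb>\n");
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setValidating(true); // check the document against its DTD
                DocumentBuilder builder = factory.newDocumentBuilder();
                builder.setErrorHandler(new ErrorHandler() {
                    public void warning(SAXParseException e) { }
                    // Treat validity errors as failures instead of just reporting them.
                    public void error(SAXParseException e) throws SAXParseException { throw e; }
                    public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
                });
                // Parsing from a File tells the parser where the document is, so the
                // relative URI "proverbDoc.dtd" resolves against the same directory.
                Document doc = builder.parse(xml.toFile());
                return doc.getDocumentElement().getTextContent();
            } finally {
                Files.delete(xml);
                Files.delete(dtd);
                Files.delete(dir);
            }
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("A little knowledge is a dangerous thing."));
        // A child element breaks the assumed DTD, so this parse is rejected.
        System.out.println(parse("<b>A little knowledge</b>"));
    }
}
```

If the DTD could not be found, even the valid proverb would fail to parse, so a successful first call is itself evidence that the relative URI was resolved as described.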
When you want both an internal and an external subset, you put both in the DOCTYPE declaration, with
the external DTD reference appearing first. Entities from both are available for use in the document, but
where the two conflict, the entities declared in the internal subset take precedence over those declared in
the external subset. This is because the internal subset is read first, and the first declaration of an entity
is the one that is binding.
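The precedence rule can be checked with a small sketch. Here a hypothetical external.dtd and an internal subset both declare an entity named who, while only the external subset declares source; the class name, file names, entity names, and values are all made up for the illustration. The sketch assumes the JDK's built-in parser reads the external DTD even when it is not validating, which is its default configuration. The expected text is "internal,external": the internal declaration of who wins, while source still comes from the external subset.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: both subsets declare the entity "who";
// only the external subset declares "source".
class SubsetPrecedenceDemo {

    /** Returns the expanded text of <proverb>, or null if parsing fails. */
    static String expandedText() {
        try {
            Path dir = Files.createTempDirectory("subsets");
            Path dtd = dir.resolve("external.dtd");
            Path xml = dir.resolve("doc.xml");
            Files.writeString(dtd,
                "<!ELEMENT proverb (#PCDATA)>\n"
                + "<!ENTITY who \"external\">\n"
                + "<!ENTITY source \"external\">\n");
            // External reference first, then the internal subset in square brackets.
            Files.writeString(xml,
                "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE proverb SYSTEM \"external.dtd\" [\n"
                + "  <!ENTITY who \"internal\">\n"
                + "]>\n"
                + "<proverb>&who;,&source;</proverb>\n");
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setExpandEntityReferences(true); // the default, stated for clarity
                Document doc = factory.newDocumentBuilder().parse(xml.toFile());
                return doc.getDocumentElement().getTextContent();
            } finally {
                Files.delete(xml);
                Files.delete(dtd);
                Files.delete(dir);
            }
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(expandedText());
    }
}
```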
The syntax for defining elements and their attributes is rather different from the syntax for XML
markup. It can also get quite complex, so we won't be able to go into it comprehensively here. However,
we do need a fair idea of how a DTD is put together in order to understand the operation of the
Java API for XML Processing (JAXP), so let's look at some of the ways in which we can define elements in a DTD.
Defining Elements in DTDs
The DTD defines each type of element that can appear in the document using an element type
declaration, which begins with <!ELEMENT. For example, the <address> element could be defined like this:
<!ELEMENT address (buildingnumber, street, city, state, zip)>
This defines an element with the name address. The content specification between the parentheses
states what can appear within an <address> element: exactly one each of the elements <buildingnumber>,
<street>, <city>, <state>, and <zip>, in that sequence. This is an example of element content, since only
elements are allowed within an <address> element. Note the space between the element name and the
opening parenthesis of the content specification. It is required, and a parser will flag the absence of at
least one whitespace character here as an error. The ELEMENT keyword must be in capital letters and
must immediately follow the opening "<!".
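A validating parser enforces both the content model and the declaration syntax just described. The sketch below uses the JAXP DOM classes with a hypothetical AddressDtdDemo class, and it assumes each of the five child elements is declared as text (#PCDATA, covered next) so that the DTD is complete. It checks a correctly ordered <address>, one with two elements swapped, one missing <zip>, and two malformed versions of the declaration: one without the space before the parenthesis and one with a lowercase element keyword.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the five #PCDATA declarations are assumptions.
class AddressDtdDemo {

    static final String DTD =
        "<!ELEMENT address (buildingnumber, street, city, state, zip)>\n"
        + "<!ELEMENT buildingnumber (#PCDATA)>\n"
        + "<!ELEMENT street (#PCDATA)>\n"
        + "<!ELEMENT city (#PCDATA)>\n"
        + "<!ELEMENT state (#PCDATA)>\n"
        + "<!ELEMENT zip (#PCDATA)>\n";

    /** Builds an <address> element whose children appear in the given order. */
    static String address(String... childNames) {
        StringBuilder sb = new StringBuilder("<address>");
        for (String name : childNames) {
            sb.append('<').append(name).append(">x</").append(name).append('>');
        }
        return sb.append("</address>").toString();
    }

    /** Returns true only if the DTD is well formed and the body is valid against it. */
    static boolean isValid(String dtd, String body) {
        String xml = "<?xml version=\"1.0\"?>\n<!DOCTYPE address [\n" + dtd + "]>\n" + body;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String inOrder = address("buildingnumber", "street", "city", "state", "zip");
        // Children in the declared sequence.
        System.out.println(isValid(DTD, inOrder));
        // Two children swapped.
        System.out.println(isValid(DTD, address("street", "buildingnumber", "city", "state", "zip")));
        // Declaration without the required space before the parenthesis.
        System.out.println(isValid(DTD.replace("address (", "address("), inOrder));
    }
}
```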
The definition of <address> above makes no provision for anything other than the five elements shown,
in that sequence. Any whitespace that you put between these elements in a document is therefore not part
of the content, and a parser that applies the DTD will ignore it; this is why it is known as ignorable
whitespace. That said, you can still find out whether there is whitespace there when the document is
parsed, as we shall see.
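One way to observe that whitespace is a SAX parser with validation switched on, sketched below. The parser reports character data through characters() and whitespace in element content through ignorableWhitespace(); both are standard org.xml.sax.ContentHandler callbacks that DefaultHandler lets us override. The class name IgnorableWhitespaceDemo and the address values are hypothetical sample data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo; the address values are made-up sample data.
class IgnorableWhitespaceDemo {

    static final String XML = "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE address [\n"
        + "<!ELEMENT address (buildingnumber, street, city, state, zip)>\n"
        + "<!ELEMENT buildingnumber (#PCDATA)>\n"
        + "<!ELEMENT street (#PCDATA)>\n"
        + "<!ELEMENT city (#PCDATA)>\n"
        + "<!ELEMENT state (#PCDATA)>\n"
        + "<!ELEMENT zip (#PCDATA)>\n"
        + "]>\n"
        + "<address>\n"
        + "  <buildingnumber>29</buildingnumber>\n"
        + "  <street>Acacia Road</street>\n"
        + "  <city>Nowhere</city>\n"
        + "  <state>CA</state>\n"
        + "  <zip>12345</zip>\n"
        + "</address>\n";

    /** Returns {character data, ignorable whitespace} as reported by a validating SAX parser. */
    static String[] parse() {
        StringBuilder text = new StringBuilder();
        StringBuilder ignorable = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            // Called for whitespace inside element content; the parser can only
            // recognize it as ignorable because it applies the DTD.
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                ignorable.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new String[] { text.toString(), ignorable.toString() };
    }

    public static void main(String[] args) {
        String[] result = parse();
        System.out.println("character data: " + result[0]);
        System.out.println("ignorable whitespace characters: " + result[1].length());
    }
}
```

The whitespace is not lost; it is simply routed to a separate callback, so an application that cares about layout can still see it.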
We can define the <buildingnumber> element like this:
<!ELEMENT buildingnumber (#PCDATA)>
This states that the element can contain only parsed character data, specified by #PCDATA. This is just
ordinary text; because it is parsed, it cannot contain child elements, and any entity references in it, such
as &amp;, are replaced by the text they stand for. The # character preceding the word PCDATA has no
significance other than ensuring that it cannot be confused with an element or attribute name. Such names
must start with a letter, an underscore, or a colon, so nothing prefixed with # can be interpreted as one.
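A short sketch of what #PCDATA permits, again using a validating JAXP DOM parser and a hypothetical PcdataDemo class: plain text and the predefined entity reference &amp; are accepted, with the reference replaced by &, while a child element or a bare & is rejected. The sample values are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for the #PCDATA declaration from the text.
class PcdataDemo {

    static final String DTD = "<!ELEMENT buildingnumber (#PCDATA)>\n";

    /** Returns the parsed text of <buildingnumber>, or null if the document is rejected. */
    static String parse(String content) {
        String xml = "<?xml version=\"1.0\"?>\n<!DOCTYPE buildingnumber [\n" + DTD + "]>\n"
            + "<buildingnumber>" + content + "</buildingnumber>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("42"));          // plain text
        System.out.println(parse("12 &amp; 14")); // entity reference, replaced by '&'
        System.out.println(parse("<b>42</b>"));   // child element: rejected
        System.out.println(parse("12 & 14"));     // bare '&' starts markup: rejected
    }
}
```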