Java Reference
In-Depth Information
<proverb>A little knowledge is a dangerous thing.</proverb>
All the internal definitions for elements used within the document appear between the square brackets in
the DOCTYPE declaration. In this case just one element is declared, the root element, and the element content
is PCDATA — parsed character data.
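As a rough sketch of how this looks from the Java side, the following parses a document that carries its DTD in the internal subset, with validation switched on. The class name InternalDtdDemo and the method parseValidated are made up for illustration, and the DOCTYPE declaration in the string is an assumed reconstruction based on the description above (one declaration, for the root element, with PCDATA content).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class InternalDtdDemo {
    // Assumed form of the document: the single ELEMENT declaration
    // sits between the square brackets of the DOCTYPE declaration.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE proverb [\n"
      + "  <!ELEMENT proverb (#PCDATA)>\n"
      + "]>\n"
      + "<proverb>A little knowledge is a dangerous thing.</proverb>";

    // Parses with DTD validation on and returns the text of the root element.
    // Any parsing or validity problem is rethrown as IllegalArgumentException.
    static String parseValidated(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors arrive at error(); rethrow them so that an
            // invalid document makes parse() fail.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Document could not be validated", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseValidated(XML));
    }
}
```

Putting an undeclared element inside <proverb> should make parseValidated() throw, because the declaration allows only character data.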
You could define an external DTD in a file with the name proverbDoc.dtd in the same directory as the
document. The file would contain just a single line:
<!ELEMENT proverb (#PCDATA)>
The XML document would then be the following:
<?xml version="1.0"?>
<!DOCTYPE proverb SYSTEM "proverbDoc.dtd">
<proverb>A little knowledge is a dangerous thing.</proverb>
The DTD is referenced by a relative URI, which is resolved against the location of the document, so the parser looks for the file in the directory containing the document.
When you want both an internal and external subset, you just put both in the DOCTYPE declaration, with
the external DTD reference appearing first. Entities from both are available for use in the document, but
where there is any conflict between them, the entities defined in the internal subset take precedence over
those declared in the external subset.
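You can see the precedence rule in action with a small experiment. This is only a sketch: the entity names source and extra are invented for the example, and the external subset is served from a string through an EntityResolver so that no proverbDoc.dtd file needs to exist on disk. The class name EntityPrecedenceDemo is likewise hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityPrecedenceDemo {
    // Stand-in for the contents of proverbDoc.dtd (the entities are assumptions).
    static final String EXTERNAL_DTD =
        "<!ELEMENT proverb (#PCDATA)>\n"
      + "<!ENTITY source \"external subset\">\n"
      + "<!ENTITY extra \"only in external\">\n";

    // The external reference comes first, then the internal subset in brackets.
    // The entity named source is declared in both subsets.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE proverb SYSTEM \"proverbDoc.dtd\" [\n"
      + "  <!ENTITY source \"internal subset\">\n"
      + "]>\n"
      + "<proverb>&source; / &extra;</proverb>";

    // Parses XML and returns the root text with entity references expanded.
    static String rootText() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Supply the external subset from memory instead of the file system.
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId != null && systemId.endsWith("proverbDoc.dtd")) {
                    return new InputSource(new StringReader(EXTERNAL_DTD));
                }
                return null; // fall back to the parser's default resolution
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText());
    }
}
```

The text that comes back uses the internal definition of source, while extra, which is declared only in the external subset, is still available.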
The syntax for defining elements and their attributes is rather different from the syntax for XML markup.
It also can get quite complex, so I'm not able to go into it comprehensively here. However, you do need to
have a fair idea of how a DTD is put together in order to understand the operation of the Java API for XML,
so let's look at some of the ways in which you can define elements in a DTD.
Defining Elements in DTDs
The DTD defines each type of element that can appear in the document using an ELEMENT type declaration.
For example, the <address> element could be defined like this:
<!ELEMENT address (buildingnumber, street, city, state, zip)>
This defines the element with the name address. The information between the parentheses specifies what
can appear within an <address> element. The definition states that an <address> element contains exactly
one each of the elements <buildingnumber>, <street>, <city>, <state>, and <zip>, in that sequence.
This is an example of element content because only elements are allowed within an <address> element.
Note the space that appears between the element name and the parentheses enclosing the content definition.
This is required, and a parser flags the absence of at least one space here as an error. The ELEMENT identifier
must be in capital letters and must immediately follow the opening "<!".
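To check these rules for yourself, you could feed a few documents to a validating parser. The sketch below is one way of doing that, not the only one. The class AddressDtdDemo and its methods are hypothetical, the address values are made-up sample data, and declaring every child element as (#PCDATA) is an assumption made so that the DTD is complete.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AddressDtdDemo {
    static final String ADDRESS_DECL =
        "<!ELEMENT address (buildingnumber, street, city, state, zip)>";

    // Made-up sample content, first in the declared order, then with
    // <city> and <street> swapped.
    static final String IN_ORDER =
        "<buildingnumber>29</buildingnumber><street>South LaSalle Street</street>"
      + "<city>Chicago</city><state>Illinois</state><zip>60603</zip>";
    static final String SWAPPED =
        "<buildingnumber>29</buildingnumber><city>Chicago</city>"
      + "<street>South LaSalle Street</street><state>Illinois</state><zip>60603</zip>";

    // Builds a document whose internal subset uses the given address declaration.
    // The (#PCDATA) declarations for the children are assumptions for this sketch.
    static String document(String addressDecl, String addressContent) {
        return "<!DOCTYPE address [\n"
             + addressDecl + "\n"
             + "<!ELEMENT buildingnumber (#PCDATA)>\n"
             + "<!ELEMENT street (#PCDATA)>\n"
             + "<!ELEMENT city (#PCDATA)>\n"
             + "<!ELEMENT state (#PCDATA)>\n"
             + "<!ELEMENT zip (#PCDATA)>\n"
             + "]>\n"
             + "<address>" + addressContent + "</address>";
    }

    // Returns true if a validating parser accepts the document without errors.
    static boolean accepts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // turn validity errors into a failed parse
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same declaration with the space before the parenthesis removed.
        String noSpace = ADDRESS_DECL.replace("address (", "address(");
        System.out.println(accepts(document(ADDRESS_DECL, IN_ORDER)));
        System.out.println(accepts(document(ADDRESS_DECL, SWAPPED)));
        System.out.println(accepts(document(noSpace, IN_ORDER)));
    }
}
```

Only the first document should be accepted. The second breaks the sequence required by the content definition, and the third omits the space between the element name and the opening parenthesis.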
The preceding definition of the <address> element makes no provision for anything other than the five
elements shown, and in that sequence. Thus, any whitespace that you put between these elements in a
document is not part of the content and is ignored by a parser; therefore, it is known as ignorable whitespace.
That said, you can still find out if there is whitespace there when the document is parsed, as you will see later.
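One way to observe ignorable whitespace is through the SAX ignorableWhitespace() callback, which a validating parser must use for whitespace in element content. The following is a minimal sketch under that assumption. The class name IgnorableWhitespaceDemo, the sample address values, and the (#PCDATA) declarations for the child elements are all invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {
    static final String DOCTYPE =
        "<!DOCTYPE address [\n"
      + "<!ELEMENT address (buildingnumber, street, city, state, zip)>\n"
      + "<!ELEMENT buildingnumber (#PCDATA)>\n"
      + "<!ELEMENT street (#PCDATA)>\n"
      + "<!ELEMENT city (#PCDATA)>\n"
      + "<!ELEMENT state (#PCDATA)>\n"
      + "<!ELEMENT zip (#PCDATA)>\n"
      + "]>\n";

    // The same address twice: once indented for readability, once compact.
    static final String INDENTED = DOCTYPE
      + "<address>\n"
      + "  <buildingnumber>29</buildingnumber>\n"
      + "  <street>South LaSalle Street</street>\n"
      + "  <city>Chicago</city>\n"
      + "  <state>Illinois</state>\n"
      + "  <zip>60603</zip>\n"
      + "</address>";
    static final String COMPACT = DOCTYPE
      + "<address><buildingnumber>29</buildingnumber>"
      + "<street>South LaSalle Street</street><city>Chicago</city>"
      + "<state>Illinois</state><zip>60603</zip></address>";

    // Returns how many whitespace characters the parser reported as ignorable.
    static int countIgnorableWhitespace(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // the content model must be in use
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void ignorableWhitespace(char[] ch, int start, int length) {
                    count[0] += length;
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("indented: " + countIgnorableWhitespace(INDENTED));
        System.out.println("compact:  " + countIgnorableWhitespace(COMPACT));
    }
}
```

The indented document should report a nonzero count and the compact one zero, even though both contain the same five elements with the same content.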
You can define the <buildingnumber> element like this:
<!ELEMENT buildingnumber (#PCDATA)>