Java and XML - Beginning Java 2 SDK


The XML document would then be:

<?xml version="1.0"?>
<!DOCTYPE proverb SYSTEM "proverbDoc.dtd">
<proverb>A little knowledge is a dangerous thing.</proverb>

The DTD is referenced here by a relative URI, which is resolved against the location of the document that contains the reference.

When you want both an internal and an external subset, you put both in the DOCTYPE declaration, with the external DTD reference appearing first. Entities from both are available for use in the document, but where there is any conflict between them, the entities defined in the internal subset take precedence over those declared in the external subset.
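As a rough illustration of that precedence rule, the sketch below parses a document whose internal subset and external subset both declare the same entity. Everything beyond the proverb document itself is an assumption made for the example: the entity name adjective, the contents given for proverbDoc.dtd, and the class name SubsetPrecedence are hypothetical, and the external DTD is served from a string through an EntityResolver so that no file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SubsetPrecedence {
    // Hypothetical contents for proverbDoc.dtd (the external subset).
    static final String EXTERNAL_DTD =
        "<!ELEMENT proverb (#PCDATA)>\n"
      + "<!ENTITY adjective \"harmless\">\n";

    // The internal subset follows the external reference and redeclares the entity.
    static final String DOCUMENT =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE proverb SYSTEM \"proverbDoc.dtd\" [\n"
      + "  <!ENTITY adjective \"dangerous\">\n"
      + "]>\n"
      + "<proverb>A little knowledge is a &adjective; thing.</proverb>";

    static String parseProverb() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Supply the external subset from memory instead of the file system.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("proverbDoc.dtd")
                    ? new InputSource(new StringReader(EXTERNAL_DTD))
                    : null);
            Document doc = builder.parse(new InputSource(new StringReader(DOCUMENT)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // The internal declaration of the entity is the one that is used.
        System.out.println(parseProverb());
    }
}
```

With the declarations assumed here, the text of the proverb element should come out with the value from the internal subset.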

The syntax for defining elements and their attributes in a DTD is rather different from the syntax for XML markup. It can also get quite complex, so we won't be able to go into it comprehensively here. However, we do need a fair idea of how a DTD is put together in order to understand the operation of the Java API for XML, so let's look at some of the ways in which we can define elements in a DTD.

Defining Elements in DTDs

The DTD defines each type of element that can appear in the document using an ELEMENT type declaration. For example, the <address> element could be defined like this:

<!ELEMENT address (buildingnumber, street, city, state, zip)>

This defines the element with the name address. The information between the parentheses specifies what can appear within an <address> element. The definition states that an <address> element contains exactly one each of the elements <buildingnumber>, <street>, <city>, <state>, and <zip>, in that sequence. This is an example of element content, since only elements are allowed within an <address> element. Note the space that appears between the element name and the parentheses enclosing the content definition. This is required, and a parser will flag the absence of at least one space here as an error. The ELEMENT identifier must be in capital letters and must immediately follow the opening "<!".
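To see these rules being enforced, here is a minimal sketch that runs a validating parser over a few small documents using the address declaration above. It makes several assumptions that are not in the text: the five child elements are all declared as (#PCDATA), the address values are made-up sample data, and the class and method names (AddressValidation, document, validate) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class AddressValidation {
    static final String GOOD_DECL =
        "<!ELEMENT address (buildingnumber, street, city, state, zip)>";
    // Same declaration without the required space after the element name.
    static final String BAD_DECL =
        "<!ELEMENT address(buildingnumber, street, city, state, zip)>";
    // Assumed declarations for the child elements.
    static final String LEAF_DECLS =
        "<!ELEMENT buildingnumber (#PCDATA)>\n<!ELEMENT street (#PCDATA)>\n"
      + "<!ELEMENT city (#PCDATA)>\n<!ELEMENT state (#PCDATA)>\n"
      + "<!ELEMENT zip (#PCDATA)>\n";
    static final String IN_ORDER =
        "<buildingnumber>29</buildingnumber><street>Main Street</street>"
      + "<city>Springfield</city><state>Illinois</state><zip>62701</zip>";
    static final String OUT_OF_ORDER =
        "<buildingnumber>29</buildingnumber><city>Springfield</city>"
      + "<street>Main Street</street><state>Illinois</state><zip>62701</zip>";

    // Builds a document with an internal DTD subset and an <address> root.
    static String document(String addressDecl, String children) {
        return "<?xml version=\"1.0\"?>\n<!DOCTYPE address [\n" + addressDecl + "\n"
             + LEAF_DECLS + "]>\n<address>" + children + "</address>";
    }

    // Returns the messages for every validity error or fatal error found.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    problems.add(e.getMessage());
                }
                @Override public void fatalError(SAXParseException e)
                        throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(document(GOOD_DECL, IN_ORDER)));
        System.out.println(validate(document(GOOD_DECL, OUT_OF_ORDER)));
        System.out.println(validate(document(BAD_DECL, IN_ORDER)));
    }
}
```

The first document should produce no messages, while the out-of-sequence children and the declaration with the missing space should each be reported. The exact wording of the messages depends on the parser implementation.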

The definition of the <address> element above makes no provision for anything other than the five elements shown, in that sequence. Any whitespace that you put between these elements in a document is therefore not part of the content and will be ignored by a parser, which is why it is known as ignorable whitespace. That said, you can still find out whether there is whitespace there when the document is parsed, as we shall see.
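One way to observe ignorable whitespace is through the SAX ContentHandler callback ignorableWhitespace(), which a validating parser calls for whitespace in element content. The sketch below counts the characters reported that way separately from ordinary character data. The child element declarations, the sample values, and the names IgnorableWhitespaceDemo and Counter are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {
    // An indented document: the newlines and spaces between the child
    // elements of <address> are whitespace in element content.
    static final String DOCUMENT =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE address [\n"
      + "  <!ELEMENT address (buildingnumber, street, city, state, zip)>\n"
      + "  <!ELEMENT buildingnumber (#PCDATA)>\n"
      + "  <!ELEMENT street (#PCDATA)>\n"
      + "  <!ELEMENT city (#PCDATA)>\n"
      + "  <!ELEMENT state (#PCDATA)>\n"
      + "  <!ELEMENT zip (#PCDATA)>\n"
      + "]>\n"
      + "<address>\n"
      + "  <buildingnumber>29</buildingnumber>\n"
      + "  <street>Main Street</street>\n"
      + "  <city>Springfield</city>\n"
      + "  <state>Illinois</state>\n"
      + "  <zip>62701</zip>\n"
      + "</address>";

    // Handler that tallies the two kinds of character callbacks.
    static class Counter extends DefaultHandler {
        int ignorable; // characters reported as ignorable whitespace
        int text;      // characters reported as ordinary character data

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text += length;
        }
    }

    static Counter parse() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // the DTD tells the parser what is ignorable
            SAXParser parser = factory.newSAXParser();
            Counter counter = new Counter();
            parser.parse(new InputSource(new StringReader(DOCUMENT)), counter);
            return counter;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Counter c = parse();
        System.out.println("ignorable: " + c.ignorable + ", text: " + c.text);
    }
}
```

Without a DTD the parser has no way of knowing that <address> has element content, so the same whitespace would normally arrive through characters() instead.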

We can define the <buildingnumber> element like this:

<!ELEMENT buildingnumber (#PCDATA)>

This states that the element can only contain parsed character data, specified by #PCDATA. This is just ordinary text, and since it will be parsed, it cannot contain markup: a literal < or & in the text has to be written as an entity reference such as &lt; or &amp;. The # character preceding the word PCDATA is there only to ensure that it cannot be confused with an element or attribute name; it has no other significance. Names must start with a letter or an underscore (the XML specification also permits a colon), so a name can never begin with #.
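The effect of a (#PCDATA) declaration can be sketched with a validating parser: text containing escaped markup characters is accepted, whereas a child element inside <buildingnumber> is a validity error. The class name PcdataDemo, the method parseBuildingNumber, and the sample content are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class PcdataDemo {
    static final String PROLOG =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE buildingnumber [\n"
      + "  <!ELEMENT buildingnumber (#PCDATA)>\n"
      + "]>\n";

    // Parses <buildingnumber>content</buildingnumber> with validation on and
    // returns its text; throws IllegalArgumentException if the content is rejected.
    static String parseBuildingNumber(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat validity errors as failures rather than ignoring them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e)
                        throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e)
                        throws SAXParseException { throw e; }
            });
            String xml = PROLOG + "<buildingnumber>" + content + "</buildingnumber>";
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("Content rejected: " + e.getMessage(), e);
        } catch (Exception e) {
            throw new IllegalStateException("Parser could not be set up or run", e);
        }
    }

    public static void main(String[] args) {
        // Escaped characters are fine in parsed character data.
        System.out.println(parseBuildingNumber("29 &amp; 31"));
        try {
            // A child element is not allowed by (#PCDATA).
            parseBuildingNumber("<b>29</b>");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The first call should return the text with the entity reference expanded, and the second should be rejected, with a message whose wording depends on the parser in use.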
