This states that the element can contain only parsed character data, specified by #PCDATA. This is just ordinary text, and because it is parsed, it cannot contain markup. The # character preceding the word PCDATA is there only to ensure that it cannot be confused with an element or attribute name; it has no other significance. Element and attribute names must start with a letter or an underscore, so a name prefixed with # can never be interpreted as one of them.
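To see the effect of a #PCDATA-only declaration, here is a small sketch that is not part of the original text. It uses the JDK's validating SAX parser (JAXP `SAXParserFactory.setValidating(true)`) to check two documents against an internal DTD. The class name `PcdataCheck` and the `<district>` element are hypothetical, chosen only for illustration. `DefaultHandler` ignores validity errors by default, so the sketch rethrows them to make a content-model violation visible.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
public class PcdataCheck {
    // Internal DTD: <city> may hold only parsed character data.
    // <district> (hypothetical) is declared so the only error is the content model.
    static final String DTD =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE city [\n" +
        "<!ELEMENT city (#PCDATA)>\n" +
        "<!ELEMENT district (#PCDATA)>\n" +
        "]>\n";

    // Returns true if the document is well formed and valid against its DTD.
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // check element content against the DTD
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // DefaultHandler ignores validity errors; make them fatal
                }
            });
            return true;
        } catch (Exception e) {
            return false; // malformed XML or a validity error
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD + "<city>Chicago</city>"));                           // true
        System.out.println(isValid(DTD + "<city>Chicago <district>Loop</district></city>")); // false
    }
}
```

Plain text is accepted, but as soon as a child element appears inside `<city>`, the parser reports that the content does not match `(#PCDATA)`.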
The PCDATA specification can also allow markup, in the form of child elements, to be mixed in with ordinary text. In this case you must specify the names of the elements that can occur mixed in with the text. If you want to allow a <suite> element specifying a suite number to appear alongside the text within a <buildingnumber> element, you could express it like this:
<!ELEMENT buildingnumber (#PCDATA|suite)*>
This indicates that the content of a <buildingnumber> element is parsed character data, and that the text can be combined with <suite> elements. The | operator here has the same meaning as the | operator you read about in the context of regular expressions in Chapter 15: it means one or the other of the two operands, but not both. The * following the parentheses is required whenever child elements are mixed with #PCDATA, and it has the same meaning as the * operator that you also read about in the context of regular expressions: the operand to the left can appear zero or more times. Taken together, (#PCDATA|suite)* permits any sequence of text and <suite> elements, including no content at all.
If you want to allow several element types to be optionally mixed in with the text, you separate their names with |, and #PCDATA must come first in the group. Note that it is not possible to control the sequence in which mixed content appears, or how many times each element occurs.
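The following sketch checks these mixed-content rules with the JDK's validating DOM parser. It is illustrative only: the class `MixedContentCheck` and the `<floor>` and `<room>` elements are hypothetical additions to the text's `<suite>` example. Both orderings of the mixed-in elements validate, while an element that is declared but not named in the mixed-content group does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class and element names (floor, room), for illustration only.
public class MixedContentCheck {
    // Mixed content: text plus any number of <suite> and <floor> elements, in any order.
    // <room> is declared but deliberately left out of the mixed-content group.
    static final String DTD =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE buildingnumber [\n" +
        "<!ELEMENT buildingnumber (#PCDATA|suite|floor)*>\n" +
        "<!ELEMENT suite (#PCDATA)>\n" +
        "<!ELEMENT floor (#PCDATA)>\n" +
        "<!ELEMENT room (#PCDATA)>\n" +
        "]>\n";

    // Returns true if the document parses and validates against its internal DTD.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }                           // ignore warnings
                public void error(SAXParseException e) throws SAXParseException { throw e; }      // validity errors
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; } // well-formedness errors
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD +
            "<buildingnumber>29 <suite>4B</suite> <floor>2</floor></buildingnumber>"));                  // true
        System.out.println(isValid(DTD +
            "<buildingnumber><floor>2</floor> 29 <suite>4B</suite><suite>4C</suite></buildingnumber>")); // true
        System.out.println(isValid(DTD +
            "<buildingnumber>29 <room>12</room></buildingnumber>"));                                     // false
    }
}
```

The DTD cannot require that `<suite>` precede `<floor>`, or that either appear exactly once; the only thing it checks is which element names may appear among the text.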
The other elements used to define an address are similar, so you could define the whole document with
its DTD like this:
<?xml version="1.0"?>
<!DOCTYPE address
[
<!ELEMENT address (buildingnumber, street, city, state, zip)>
<!ELEMENT buildingnumber (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
]>
<address>
<buildingnumber> 29 </buildingnumber>
<street> South LaSalle Street</street>
<city>Chicago</city>
<state>Illinois</state>
<zip>60603</zip>
</address>
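One way to read this document from Java is sketched below, assuming the standard JAXP DOM API that ships with the JDK. The class and method names are hypothetical. Because the DTD declares the children of <address> as a sequence, a validating parser rejects a document with an element missing or out of order. The spaces around 29 in <buildingnumber> are part of the parsed character data, so the sketch trims the text it reads.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class, for illustration only.
public class AddressReader {
    // The address document from the text, with its internal DTD.
    static final String ADDRESS_XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE address\n[\n" +
        "<!ELEMENT address (buildingnumber, street, city, state, zip)>\n" +
        "<!ELEMENT buildingnumber (#PCDATA)>\n" +
        "<!ELEMENT street (#PCDATA)>\n" +
        "<!ELEMENT city (#PCDATA)>\n" +
        "<!ELEMENT state (#PCDATA)>\n" +
        "<!ELEMENT zip (#PCDATA)>\n" +
        "]>\n" +
        "<address>\n" +
        "<buildingnumber> 29 </buildingnumber>\n" +
        "<street> South LaSalle Street</street>\n" +
        "<city>Chicago</city>\n" +
        "<state>Illinois</state>\n" +
        "<zip>60603</zip>\n" +
        "</address>\n";

    // Parses with DTD validation; any validity error aborts the parse with an exception.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first element with the given name, surrounding spaces trimmed.
    static String field(Document doc, String name) {
        return doc.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ADDRESS_XML);
        for (String name : new String[] {"buildingnumber", "street", "city", "state", "zip"}) {
            System.out.println(name + ": " + field(doc, name));
        }
    }
}
```

Running `main` prints one line per field, starting with `buildingnumber: 29`. Removing the `<zip>` element from the string makes `parse()` throw, because the content of `<address>` no longer matches the declared sequence.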
Note that a DTD gives you no way to constrain the parsed character data in an element. It would be nice to be able to specify that the building number had to be numeric, for example, but the DTD grammar and syntax provide no way to do this. This is a serious limitation of DTDs and one of the driving forces behind the development of XML Schema, an XML-based description language that supports data types and offers an alternative to DTDs. I introduce XML Schema a little later in this chapter.
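With only a DTD, a constraint such as "the building number must be numeric" has to be enforced in application code. The sketch below assumes nothing beyond the JDK's JAXP parser and `String.matches()`; the class and method names are hypothetical. It shows that the DTD happily accepts any text in <buildingnumber>, so the numeric test falls to a regular expression of the kind discussed in Chapter 15.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class, for illustration only.
public class BuildingNumberCheck {
    // Wraps a building number in a one-element document with an internal DTD.
    static String document(String buildingNumber) {
        return "<?xml version=\"1.0\"?>\n" +
               "<!DOCTYPE buildingnumber [\n" +
               "<!ELEMENT buildingnumber (#PCDATA)>\n" +
               "]>\n" +
               "<buildingnumber>" + buildingNumber + "</buildingnumber>";
    }

    // Validates against the DTD (throwing on any error) and returns the trimmed element text.
    static String parseValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement().getTextContent().trim();
    }

    // The constraint the DTD cannot express: one or more decimal digits.
    static boolean isNumeric(String text) {
        return text.matches("\\d+");
    }

    public static void main(String[] args) throws Exception {
        for (String value : new String[] {"29", "twenty-nine"}) {
            String text = parseValid(document(value)); // DTD validation succeeds for both values
            System.out.println(value + ": DTD-valid, numeric=" + isNumeric(text));
        }
    }
}
```

This prints `29: DTD-valid, numeric=true` followed by `twenty-nine: DTD-valid, numeric=false`: the parser accepts both documents, and only the application-level check distinguishes them.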
If you were to create the DTD for an address document as a separate file, the file would contain just the element definitions: