Now that you know a bit more about XML elements and what goes into a DTD, I can formulate what you
must do to ensure your XML document is well-formed. The rules for a document to be well-formed are quite
simple:
1. If the XML declaration appears in the prolog, it must include the XML version. Other specifications
in the XML declaration must be in the prescribed sequence: character encoding followed by the
standalone specification.
2. If the document type declaration appears in the prolog, the DOCTYPE name must match that of the root
element, and the markup declarations in the DTD must be according to the rules for writing markup
declarations.
3. The body of the document must contain at least one element, the root element, which contains all the
other elements, and an instance of the root element must not appear in the content of another element.
All elements must be properly nested.
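4. Elements in the body of the document must be consistent with the markup declarations identified by
the DOCTYPE declaration. (Strictly speaking, the XML specification classes this requirement, and the
DOCTYPE name match in rule 2, as validity constraints, which only a validating parser checks.)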
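To see rules 1 and 3 enforced in practice, here is a minimal sketch that hands a few strings to the standard JAXP DOM parser and reports whether each one is accepted. The class name WellFormedCheck, the method isWellFormed(), and the sample documents are hypothetical names chosen for this illustration. The parser is used in its default non-validating mode, so it does not compare the document against a DTD; the checks in rules 2 and 4 that depend on the markup declarations are not exercised here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class WellFormedCheck {

    // Returns true if the non-validating parser accepts the text.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser stopped at the first broken rule.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            // Version first, then encoding; one root; proper nesting.
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a><b/></a>",
            // Rule 1: the XML declaration omits the version.
            "<?xml encoding=\"UTF-8\"?><a/>",
            // Rule 1: standalone appears before encoding.
            "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?><a/>",
            // Rule 3: two top-level elements, so there is no single root.
            "<a/><b/>",
            // Rule 3: the end tags overlap instead of nesting.
            "<a><b></a></b>"
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample is accepted. Each of the others breaks a single rule, and the parser rejects the whole document rather than trying to recover.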
The rules for writing an XML document are absolutely strict. Break one rule and your document is not
well-formed and is not processed. This strict application of the rules is essential because you are
communicating data and its structure. If any laxity were permitted, it would open the door to uncertainty about how
the data should be interpreted. HTML used to be quite different from XML in this respect. Until recently,
the rules for writing HTML were only loosely applied by HTML readers such as web browsers.
For example, even though a paragraph in HTML should be defined using a start tag, <p>, and an end tag,
</p>, you can usually get away with omitting the end tag, and you can use both capital and lowercase p, and
indeed close a capital P paragraph with a lowercase p, and vice versa. You can often have overlapping tags
in HTML and get away with that, too. Although it is not to be recommended, a loose application of the rules
for HTML is not so harmful because HTML is concerned only with data presentation. The worst that can
happen is that the data does not display quite as you intended.
In 2000, the W3C released the XHTML 1.0 standard, which reformulates HTML as an XML language, so more and
more HTML documents conform to it. The enduring problem is, of course, that the Internet has
accumulated a great deal of material over many years that is still very useful but will never be well-formed
XML, so browsers may never be fully XML-compliant.
XML NAMESPACES
Even though they are very simple, XML namespaces can be very confusing. The confusion arises because it
is so easy to make assumptions about what they imply when you first meet them. Let's look briefly at why
you have XML namespaces in the first place, and then see what an XML namespace actually is.
You saw earlier that an XML document can have only one DOCTYPE declaration. This can identify an
external DTD by a URI or include explicit markup declarations, or it may do both. What happens if you want to
combine two or more XML documents that each have their own DTD into a single document? The short answer
is that you can't, or at least not easily. Because the DTD for each document has been defined without regard
for the other, element name collisions are a real possibility. It may be impossible to differentiate between
different elements that share a common name, and in this case major revisions of the documents' contents,
as well as a new DTD, are necessary to deal with this. It won't be easy.
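The collision is easy to reproduce. The sketch below assumes a hypothetical merged document in which one source vocabulary uses title for the title of a book and the other uses it for a customer's form of address. The class name NameCollisionDemo, the method countElements(), and the document content are all invented for this illustration. Looking the elements up by name returns both kinds together, with nothing in the name to tell them apart.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demonstration class for illustration only.
class NameCollisionDemo {

    // Invented merged document: <title> has two unrelated meanings.
    static final String MERGED =
          "<order>"
        + "<book><title>XML Basics</title></book>"
        + "<customer><title>Dr</title><name>Lee</name></customer>"
        + "</order>";

    // Counts the elements in the document that have the given tag name.
    static int countElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tagName).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both the book title and the customer's title are counted.
        System.out.println(countElements(MERGED, "title"));
    }
}
```

A program can still separate the two by inspecting each element's parent, but that relies on knowing the structure of both source documents in advance. A single DTD fares worse, because it may declare an element name such as title only once, with one content model.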
XML namespaces are intended to help deal with this problem. They enable names used in markup to be
qualified, so that you can make duplicate names that are used in different markup unique by putting them in