Java and XML - Beginning Java

Now that you know a bit more about XML elements and what goes into a DTD, I can formulate what you must do to ensure your XML document is well-formed. The rules for a document to be well-formed are quite simple:

1. If the XML declaration appears in the prolog, it must include the XML version. Any other specifications in the XML declaration must appear in the prescribed sequence: character encoding followed by standalone specification.

2. If the document type declaration appears in the prolog, the DOCTYPE name must match that of the root element, and the markup declarations in the DTD must follow the rules for writing markup declarations.

3. The body of the document must contain at least one element, the root element, which contains all the other elements, and an instance of the root element must not appear in the content of another element. All elements must be properly nested.

4. Elements in the body of the document must be consistent with the markup declarations identified by the DOCTYPE declaration.
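Some of these rules can be tried out with the XML parser built into the JDK. The following is a minimal sketch, assuming the default DocumentBuilderFactory from javax.xml.parsers; the class name WellFormedCheck and the sample documents are hypothetical. The default factory is non-validating, so it rejects syntax errors such as a misordered XML declaration or overlapping elements, but it does not check elements against the DTD's markup declarations as rule 4 requires.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the JDK's default (non-validating) parser accepts the text.
    // Hypothetical helper for illustration; DTD consistency (rule 4) is not checked.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors but rethrows fatal
            // errors; it also replaces the default handler, which prints to System.err.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation is reported as a fatal error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            // Rule 1 followed: version, then encoding, then standalone
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><a><b/></a>",
            // Rule 1 broken: standalone appears before encoding
            "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?><a/>",
            // Rule 3 broken: two top-level elements, so no single root element
            "<a/><b/>",
            // Rule 3 broken: <b> overlaps the end of <a>
            "<a><b></a></b>"
        };
        for (String xml : samples) {
            System.out.println(isWellFormed(xml) + "  " + xml);
        }
    }
}
```

Run as is, it should print true for the first sample and false for the other three.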

The rules for writing an XML document are absolutely strict. Break one rule and your document is not well-formed and is not processed. This strict application of the rules is essential because you are communicating data and its structure. If any laxity were permitted, it would open the door to uncertainty about how the data should be interpreted. HTML used to be quite different from XML in this respect. Until recently, the rules for writing HTML were only loosely applied by HTML readers such as web browsers.

For example, even though a paragraph in HTML should be defined using a start tag, <p>, and an end tag, </p>, you can usually get away with omitting the end tag, and you can use both capital and lowercase p, and indeed close a capital P paragraph with a lowercase p, and vice versa. You can often have overlapping tags in HTML and get away with that, too. Although it is not recommended, a loose application of the rules for HTML is not so harmful because HTML is concerned only with data presentation. The worst that can happen is that the data does not display quite as you intended.
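To see the contrast, the sketch below feeds two of these HTML habits to the JDK's default XML parser and reports what happens. It is a minimal illustration under the assumption that DocumentBuilderFactory's default parser is used; the class name HtmlHabits and the sample strings are hypothetical, and the exact wording of the error text depends on the parser implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlHabits {
    // Hypothetical helper: returns null if the text is accepted as XML,
    // otherwise the parser's description of the problem.
    static String parseError(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXException e) {
            return e.toString(); // includes line and column when the parser supplies them
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>A properly closed paragraph</p>",
            "<body><p>First paragraph<p>Second paragraph</body>", // end tags omitted
            "<P>Capital start tag, lowercase end tag</p>"          // names differ in case
        };
        for (String s : samples) {
            String error = parseError(s);
            System.out.println(s + "\n  -> " + (error == null ? "accepted" : "rejected: " + error));
        }
    }
}
```

A browser would render all three samples, but only the first is accepted as XML: element names are case-sensitive, and every start tag needs its matching end tag.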

In 2000, the W3C released the XHTML 1.0 standard, which makes HTML an XML language, and more and more HTML documents now conform to it. The enduring problem is, of course, that the Internet has accumulated a great deal of material over many years that is still very useful but that will never be well-formed XML, so browsers may never be fully XML-compliant.

XML NAMESPACES

Even though they are very simple, XML namespaces can be very confusing. The confusion arises because it is so easy to make assumptions about what they imply when you first meet them. Let's look briefly at why you have XML namespaces in the first place, and then see what an XML namespace actually is.

You saw earlier that an XML document can have only one DOCTYPE declaration. This can identify an external DTD by a URI or include explicit markup declarations, or it may do both. What happens if you want to combine two or more XML documents, each with its own DTD, into a single document? The short answer is that you can't, not easily anyway. Because the DTD for each document has been defined without regard for the other, element name collisions are a real possibility. It may be impossible to differentiate between different elements that share a common name, and in this case major revisions of the documents' contents, as well as a new DTD, are necessary to deal with this. It won't be easy.
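A small sketch makes the collision concrete. It assumes two hypothetical vocabularies, each designed without regard for the other, that both use an element named title: in one it holds the title of a book, in the other a person's job title. The class name NameCollision and all element names are illustrative only. Once fragments from both are combined under a new root and parsed with the JDK's default DocumentBuilderFactory, the element name alone no longer tells the two kinds of title apart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NameCollision {
    // Hypothetical fragments from two independently designed vocabularies
    static final String BOOK = "<book><title>Ivanhoe</title></book>";           // title = book title
    static final String STAFF = "<employee><name>Ann</name><title>Editor</title></employee>"; // title = job title

    // Combines the fragments under a new root and returns every element named "title"
    static NodeList titlesInCombinedDocument() {
        String combined = "<library>" + BOOK + STAFF + "</library>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(combined)));
            return doc.getElementsByTagName("title"); // matches in document order
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList titles = titlesInCombinedDocument();
        for (int i = 0; i < titles.getLength(); i++) {
            // Both elements report the same name; nothing in it identifies their origin
            System.out.println(titles.item(i).getNodeName() + ": " + titles.item(i).getTextContent());
        }
    }
}
```

It should print title: Ivanhoe and title: Editor. Any code that processes the combined document has to guess from context which meaning each title element carries.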

XML namespaces are intended to help deal with this problem. They enable names used in markup to be qualified, so that you can make duplicate names that are used in different markup unique by putting them in