Rules for a Well-Formed Document
Now that we know a bit more about XML elements and what goes into a DTD, we can formulate what
you must do to ensure your XML document is well-formed. The rules for a document to be well-formed
are quite simple:
1. If the XML declaration appears in the prolog, it must include the XML version. Other
specifications in the XML declaration must be in the prescribed sequence: character encoding,
then standalone specification.
2. If the document type declaration appears in the prolog, the DOCTYPE name must match that of
the root element and the markup declarations in the DTD must be according to the rules for
writing markup declarations.
3. The body of the document must contain at least one element, the root element, which contains
all the other elements, and an instance of the root element must not appear in the content of
another element. All elements must be properly nested.
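4. Elements in the body of the document must be consistent with the markup declarations
identified by the DOCTYPE declaration. (Strictly speaking, the XML specification classes this
requirement as validity rather than well-formedness, so only a validating parser checks it.)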
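To see how strictly a parser applies the declaration and nesting rules, here is a minimal sketch that uses the JAXP DocumentBuilder supplied with the JDK. The class name WellFormedCheck and the method isWellFormed() are hypothetical names chosen for illustration. The parser is assumed to be the default, non-validating one, so it checks well-formedness only and does not compare elements against a DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class WellFormedCheck {

    // Returns true if the default JAXP parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports a broken well-formedness rule as a SAXException.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] documents = {
            // Declaration in the prescribed order, elements properly nested.
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><a><b/></a>",
            // Overlapping elements break the nesting requirement in rule 3.
            "<a><b></a></b>",
            // The XML declaration omits the version, breaking rule 1.
            "<?xml encoding=\"UTF-8\"?><a/>",
            // standalone appears before encoding, also breaking rule 1.
            "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?><a/>"
        };
        for (String doc : documents) {
            System.out.println(isWellFormed(doc) + "  " + doc);
        }
    }
}
```

Only the first document should be accepted. The parser does not attempt to recover from any of the other three, which is the strictness described in the text that follows.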
The rules for writing an XML document are absolutely strict. Break one rule and your document is not
well-formed and will not be processed. This strict application of the rules is essential because we are
communicating data and its structure. If any laxity were permitted it would open the door to uncertainty
about how the data should be interpreted. HTML used to be quite different from XML in this respect. Until
recently, the rules for writing HTML were only loosely applied by HTML readers such as web browsers.
For instance, even though a paragraph in HTML should be defined using a begin tag, <p>, and an end
tag, </p>, you can usually get away with omitting the end tag. You can use either an upper-case or a
lower-case p, and even close an upper-case P paragraph with a lower-case p, or vice versa. You can often
have overlapping tags in HTML and get away with that too. While it is not to be recommended, a loose
application of the rules for HTML is not so harmful since HTML is only concerned with data
presentation. The worst that can happen is that the data does not display quite as you intended.
Recently, the W3C has released a number of specifications, collectively XHTML, that make HTML an XML language, and we
can expect compliance within the next few years. The enduring problem is, of course, that the Internet
has many years of material that is still very useful but that will never be well-formed XML, so browsers
may never be fully XML compliant.
XML Namespaces
This is the last topic we need a little insight into before we get back into Java programming. Even
though they are very simple, XML namespaces can be very confusing. The confusion arises because it is
so easy to make assumptions about what they imply when you first meet them. Let's look briefly at why
we have XML namespaces in the first place and then see what an XML namespace actually is.