Well-Formed Documents
HTML is a sloppy language in which elements can be specified out of order, end tags
can be omitted, and so on. The complexity of a web browser's page layout code is partly
due to the need to handle these special cases. In contrast, XML is a much stricter
language. To make XML documents easier to parse, XML mandates that XML documents
follow certain rules:
• All elements must either have start and end tags or consist of empty-element
tags. For example, unlike the HTML <p> tag, which is often specified without a
</p> counterpart, an XML <p> start tag must always be paired with a </p> end tag.
• Tags must be nested correctly. For example, while you'll probably get away
with specifying <b><i>JavaFX</b></i> in HTML, an XML parser would report an
error. In contrast, <b><i>JavaFX</i></b> doesn't result in an error.
• All attribute values must be quoted. Either single quotes (') or double quotes
(") are permissible (although double quotes are the more commonly specified
quotes). It is an error to omit these quotes.
• Empty elements must be properly formatted. For example, HTML's <br> tag
would have to be specified as <br/> in XML. You can specify a space between
the tag's name and the / character, although the space is optional.
• Be careful with case. XML is a case-sensitive language in which tags differing
in case (such as <author> and <Author>) are considered different. It is an
error to mix start and end tags of different cases, for example, <author> with
</Author>.
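The rules above can be tried out with a small sketch that hands short strings to the JDK's standard DOM parser and reports whether each one parses. The class name WellFormedCheck, the helper isWellFormed(), and the sample strings are illustrative assumptions, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Hypothetical helper: returns true if the parser accepts the text,
    // false if it reports a well-formedness (fatal) error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays silent otherwise,
            // which keeps the parser from printing its own messages to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // setup or I/O problem, not a document error
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>text",                     // end tag omitted
            "<p>text</p>",                 // well formed
            "<b><i>JavaFX</b></i>",        // incorrect nesting
            "<b><i>JavaFX</i></b>",        // well formed
            "<a href=index.html>x</a>",    // unquoted attribute value
            "<a href='index.html'>x</a>",  // well formed (single quotes)
            "<p>one<br>two</p>",           // empty element not closed
            "<p>one<br />two</p>",         // well formed (optional space before /)
            "<author>name</Author>"        // start and end tags differ in case
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Each sample that breaks a rule should be reported as false, and its corrected counterpart as true. The sketch only tests well-formedness; it does not validate against a DTD or schema.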
XML parsers that are aware of namespaces enforce two additional rules:
• All element and attribute names must not include more than one colon character.
• No entity names, processing instruction targets, or notation names (discussed
later) can contain colons.
An XML document that conforms to these rules is well formed. The document has
a logical and clean appearance, and is much easier to process. XML parsers will only
parse well-formed XML documents.
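The first namespace rule can be illustrated by parsing the same text twice, once with namespace awareness switched off and once with it switched on. This is a minimal sketch that assumes the JDK's bundled parser; the class name NamespaceCheck, the helper parses(), and the urn:example namespace URI are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceCheck {
    // Hypothetical helper: returns true if the text parses under the given setting.
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is off by default for DocumentBuilderFactory.
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr output
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two colons in an element name: acceptable as a plain XML name,
        // but not as a namespace-qualified name.
        String twoColons = "<a:b:c xmlns:a='urn:example'/>";
        // One colon separating a declared prefix from the local name.
        String oneColon = "<a:b xmlns:a='urn:example'/>";

        System.out.println("two colons, not namespace aware: " + parses(twoColons, false));
        System.out.println("two colons, namespace aware:     " + parses(twoColons, true));
        System.out.println("one colon, namespace aware:      " + parses(oneColon, true));
    }
}
```

Under the rules described above, only the namespace-aware parse of the two-colon name should fail. The same approach could be extended to processing instruction targets, although that case is not shown here.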