HTML and CSS Reference
To achieve well-formedness in SGML-based languages such as HTML, every element must be opened and closed properly.
Empty elements must also be terminated. Elements must be nested properly so that they do not overlap.
The root element of the document must contain all other elements.
Because HTML parsers in browsers are extremely error-tolerant, HTML developers rarely follow these rules completely,
and the resulting lack of well-formedness leads directly to incorrect, nonstandard markup.
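The rules above can be made visible by passing markup to a strict XML parser, which rejects what a tolerant HTML parser would silently repair. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule discussed in the text.
samples = {
    "properly nested": "<html><body><p><b>text</b></p></body></html>",
    "overlapping elements": "<html><body><p><b>text</p></b></body></html>",
    "unterminated empty element": "<html><body><p>line<br></p></body></html>",
    "terminated empty element": "<html><body><p>line<br/></p></body></html>",
    "no single root element": "<head></head><body></body>",
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

Only the properly nested fragment and the one with the terminated empty element pass. The check covers well-formedness only; it says nothing about whether the markup is valid against a document type.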
In XML languages such as XHTML, well-formedness has additional requirements. Element names are case-sensitive,
so start and end tags must match exactly, including letter case. Well-formed XML documents must contain only
legal, properly encoded Unicode characters. These characters, however, can also be used directly in element names
and attributes, not just in character data (document text). Characters with special meaning in XML, such as
<, >, and &, are reserved for markup. If they are intended to be represented as text, their entity references
(&lt;, &gt;, and &amp;) should be used instead (see the section “Entity references”).
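As a rough illustration of these XML-specific requirements, the sketch below checks case-sensitive tag matching, a non-ASCII element name, and the escaping of special characters. It relies on Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the helper name parses and the sample strings are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(markup: str) -> bool:
    """Return True if the markup is accepted by a strict XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Element names are case-sensitive: <Title> is not closed by </title>.
print(parses("<title>News</title>"))
print(parses("<Title>News</title>"))

# Legal Unicode characters may appear in element names as well as in text.
print(parses("<név>érték</név>"))

# Special characters in text must be written as entity references.
text = "Fish & chips: 3 < 5"
raw_fragment = f"<p>{text}</p>"
escaped_fragment = f"<p>{escape(text)}</p>"
print(parses(raw_fragment))
print(parses(escaped_fragment))

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring(escaped_fragment).text
print(round_trip == text)
```

Here escape replaces &, <, and > with &amp;, &lt;, and &gt;, so the text survives the round trip through the parser unchanged, whereas the unescaped fragment is rejected.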
Characters that violate the well-formedness rules can make XML parsers unable to process XML
files (XHTML documents, RDF metadata, RSS feed channels, and so on), typically resulting
in error messages. A single illegal character can make the whole file impossible to process. For example,
the XML file of a valid RSS feed opened locally in a modern browser is presented as a tree structure. The same file
retrieved from a server is rendered as a news feed. If the file contains just one illegal character, however, the
browser gives an error message instead of displaying the page content (Figure 1-5).
Figure 1-5. An XML parsing error in a browser
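The same all-or-nothing behavior can be reproduced outside the browser. The sketch below is an illustration only, assuming a hypothetical minimal RSS feed and a helper named check_feed; it uses Python's standard xml.etree.ElementTree module and inserts a single control character (U+000B), which is not a legal XML 1.0 character, into an otherwise well-formed feed.

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text: str):
    """Return None if the text parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical feed used only for this demonstration.
good_feed = """<rss version="2.0">
  <channel>
    <title>Example news</title>
    <link>http://example.com/</link>
    <description>Latest headlines</description>
  </channel>
</rss>"""

# One illegal character (vertical tab, U+000B) is enough to break the file.
bad_feed = good_feed.replace("Example news", "Example\x0bnews")

for name, feed in (("good feed", good_feed), ("bad feed", bad_feed)):
    problem = check_feed(feed)
    if problem is None:
        print(f"{name}: well-formed")
    else:
        line, column, message = problem
        print(f"{name}: error at line {line}, column {column}: {message}")
```

The parser stops at the first illegal character and reports its position, so nothing else in the file is processed, which mirrors the browser error described above.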