HTML and CSS Reference
Chapter 3. Well-Formedness
The very first step in moving markup into modern form is to make it well-formed. Well-formedness is the basis
of the huge and incredibly powerful XML tool chain. Well-formedness guarantees a single unique tree structure
for the document that can be operated on by the DOM, thus making it the basis of reliable, cross-browser JavaScript.
Validity, although important, is not nearly as crucial as well-formedness. There are often good reasons to
compromise on validity. In fact, I often deliberately publish invalid pages. If I need an element the DTD doesn't
allow, I put it in. It won't hurt anything because browsers ignore elements they don't understand. If I have a
blockquote that contains raw text but no elements, no great harm is done. If I use an HTML 5 element such as m,
which Opera recognizes and other browsers don't, those other browsers will just ignore it. However, if the page is
malformed, the consequences are much more severe.
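The difference shows up as soon as a non-validating XML parser reads the page. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for the XML tool chain (an assumption; any conforming XML parser should accept and reject the same inputs). The two sample pages and the helper name parses are hypothetical. The first page is invalid against the XHTML DTD (an unknown m element, raw text directly inside blockquote) but well-formed, so it parses; the second is malformed, so it does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical page 1: invalid (m is not in the XHTML DTD, and Strict XHTML
# does not allow raw text directly inside blockquote) but well-formed.
invalid_but_well_formed = """<html><body>
<blockquote>Raw text, no block element.</blockquote>
<p>Some <m>highlighted</m> text.</p>
</body></html>"""

# Hypothetical page 2: the strong element is never closed before </p>.
malformed = """<html><body>
<p>Some <strong>bold text</p>
</body></html>"""

def parses(markup: str) -> bool:
    """Return True if a non-validating XML parser can build a tree from markup."""
    try:
        ET.fromstring(markup)  # checks well-formedness only, never validity
        return True
    except ET.ParseError:
        return False

print(parses(invalid_but_well_formed))  # True
print(parses(malformed))                # False
```

The parser never consults a DTD here, which is exactly why invalid pages remain usable by XML tools while malformed ones do not.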
First, I won't be able to use any XML tools, such as XSLT or SAX, to process the page. Indeed, almost the only
thing I can do with it is view it in a browser. It is very hard to do any reliable automated processing or testing
with a malformed page.
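As an illustration of the kind of automated processing at stake, here is a minimal SAX sketch using Python's standard xml.sax module. The handler class ElementCounter, the function count_elements, and both sample pages are hypothetical. On a well-formed page it reports how many of each element appear; on a malformed page the parser raises a fatal error partway through, so any counts gathered so far cannot be trusted.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start-tags by element name as SAX streams."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(markup: str):
    """Return a dict of element counts, or None if the page is malformed."""
    handler = ElementCounter()
    try:
        xml.sax.parseString(markup.encode("utf-8"), handler)
    except xml.sax.SAXParseException as err:
        # The default error handler stops at the first well-formedness error.
        print("malformed:", err.getMessage(), "at line", err.getLineNumber())
        return None
    return handler.counts

good = "<html><body><p>One</p><p>Two</p></body></html>"
bad = "<html><body><p>One<p>Two</body></html>"  # SGML-style omitted </p>

print(count_elements(good))  # {'html': 1, 'body': 1, 'p': 2}
print(count_elements(bad))   # None
```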
Second, browser display becomes much more unpredictable. Different browsers fill in the missing pieces and
correct the errors in different ways. Cross-browser CSS and JavaScript are hard enough without worrying about
what tree each browser will construct from ambiguous HTML. Making the page
well-formed makes it a lot more likely that I can make it behave as I like across a wide range of browsers.
What Is Well-Formedness?
Well-formedness is a concept that comes from XML. Technically, it means that a document adheres to certain
rigid constraints: every start-tag must have a matching end-tag, every element must begin and end in the same
parent element, and every entity reference must be defined.
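Each of these constraints can be checked mechanically. The following sketch, again assuming Python's xml.etree.ElementTree as the parser, tries one hypothetical snippet per constraint; the function name well_formedness_error and the samples are illustrative only. Note that XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so an HTML entity such as &nbsp; counts as undefined unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(markup: str):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)

# Hypothetical snippets, one per constraint plus two that pass.
samples = {
    "balanced": "<p>One <em>two</em> three</p>",
    "predefined entity": "<p>Fish &amp; chips</p>",
    "missing end-tag": "<div><p>One</div>",         # <p> never closed
    "overlapping": "<p><em>One</p></em>",            # em ends outside its parent
    "undefined entity": "<p>One&nbsp;two</p>",       # no DTD declares nbsp
}

for label, markup in samples.items():
    print(f"{label:18} -> {well_formedness_error(markup) or 'well-formed'}")
```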
Classic HTML is based on SGML, which allows a lot more leeway than does XML. For example, in HTML and
SGML, it's perfectly OK to have a <br> or <li> tag with no corresponding </br> or </li> end-tag. However, this
is no longer allowed in a well-formed document.
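The usual repair is an empty-element tag for elements such as br and an explicit end-tag for elements such as li. The sketch below, assuming Python's xml.etree.ElementTree and a hypothetical helper child_tags, shows the SGML-style list being rejected and the repaired one parsing into a ul with two li children. The space before the slash in <br /> follows the common XHTML habit of keeping older HTML browsers happy; XML itself does not require it.

```python
import xml.etree.ElementTree as ET

# Legal in classic SGML-based HTML: </br> and </li> are simply omitted.
sgml_style = "<ul><li>First<br>line<li>Second</ul>"

# Well-formed equivalent: empty-element tag for br, explicit </li> end-tags.
xml_style = "<ul><li>First<br />line</li><li>Second</li></ul>"

def child_tags(markup: str):
    """Return the root's child tag names, or None if markup is malformed."""
    try:
        return [child.tag for child in ET.fromstring(markup)]
    except ET.ParseError:
        return None

print(child_tags(sgml_style))  # None
print(child_tags(xml_style))   # ['li', 'li']
```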
Well-formedness ensures that every conforming processor treats the document in the same way at a low level.
For example, consider this malformed fragment:
<p>The quick <strong>brown fox</p>
<p>jumped over the</strong></p>
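Where browsers guess, an XML parser simply stops. The sketch below feeds an XML parser (Python's xml.etree.ElementTree, as an assumed example) markup in which strong opens in one paragraph and closes in the next. The wrapping body element is an assumption, added because an XML document needs a single root; error_position is a hypothetical helper. Every conforming parser rejects this input, and ElementTree reports the line and column of the first error.

```python
import xml.etree.ElementTree as ET

# strong starts in the first paragraph and ends in the second.
# The <body> wrapper is assumed: XML requires exactly one root element.
fragment = """<body>
<p>The quick <strong>brown fox</p>
<p>jumped over the</strong></p>
</body>"""

def error_position(markup: str):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None

print(error_position(fragment))  # the </p> on line 2 is a mismatched tag
```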
The strong element begins in one paragraph and ends in the next. Different browsers can and do build different
internal representations of this text. For example, Firefox and Safari fill in the missing start- and end-tags
(including those between the paragraphs). In essence, they treat the preceding fragment as equivalent to this:
<p>The quick <strong>brown fox</strong></p>
<p><strong>jumped over the </strong></p>
This creates the tree shown in Figure 3.1.
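Once the markup is well-formed, the tree is no longer a matter of browser interpretation: every XML parser builds the same one. The following sketch prints a rough text outline of the tree for one plausible well-formed rewrite of the fragment. It is not a reproduction of Figure 3.1; the body wrapper, the string repaired, and the helper outline are assumptions for illustration, and Python's xml.etree.ElementTree stands in for the parser.

```python
import xml.etree.ElementTree as ET

# One plausible well-formed rewrite; <body> is an assumed single root.
repaired = ("<body><p>The quick <strong>brown fox</strong></p>"
            "<p><strong>jumped over the </strong></p></body>")

def outline(elem, depth=0):
    """Return indented lines naming each element and its non-blank text nodes."""
    pad = "  " * (depth + 1)
    lines = ["  " * depth + elem.tag]
    if elem.text and elem.text.strip():
        lines.append(pad + repr(elem.text))       # text before the first child
    for child in elem:
        lines.extend(outline(child, depth + 1))
        if child.tail and child.tail.strip():
            lines.append(pad + repr(child.tail))  # text after this child
    return lines

print("\n".join(outline(ET.fromstring(repaired))))
```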