&eacute; and &Eacute; but not &EAcute; or É. Even a browser operating in HTML mode can
guess wrong if you don't have the right case in the entity reference.
Generic XML tools don't care which case you use but do care that it matches. That is, a <table> start-tag is closed by a
</table> end-tag but not by </TABLE> or </Table>. The same holds for attribute names. The id attribute has the type ID as
defined in the XHTML DTD and can be used as a link anchor. However, an ID attribute does not and cannot.
Potential Trade-offs
There are relatively few trade-offs for converting to lowercase. All modern browsers support lowercase tag
names without any problems. A few very old browsers that were never in widespread use, such as HotJava, only
supported uppercase for some tags. The same is true of early versions of Java Swing's built-in HTML renderer.
However, this has long since been fixed.
It is also possible that some homegrown scripts based on regular expressions may not recognize lowercase
forms. If you have any scripts that screen-scrape your HTML, you'll need to check them to make sure they're
also ready to handle lowercase tag names. Once you're done making the document well-formed, it may be time
to consider refactoring those scripts, too, so that they use a real parser instead of regular expression hacks.
However, that can wait. Usually it's simple enough to change the expressions to look for lowercase tag names
instead of uppercase ones, or to not care about the case of the tag names at all.
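As a rough illustration of the "not care about the case" option, here is a minimal Python sketch of a scraping expression made case-insensitive. The pattern, the extract_headings name, and the choice of h1 are all hypothetical stand-ins for whatever your own scripts look for; the only point is the re.IGNORECASE flag.

```python
import re

# Hypothetical scraping pattern: capture the content of every h1 element.
# re.IGNORECASE makes the same expression accept <H1>, <h1>, or <H1 ...>.
HEADING = re.compile(r"<h1\b[^>]*>(.*?)</h1\s*>", re.IGNORECASE | re.DOTALL)


def extract_headings(html):
    """Return the stripped content of each h1 element, whatever its case."""
    return [text.strip() for text in HEADING.findall(html)]


if __name__ == "__main__":
    print(extract_headings("<H1>Before</H1>"))
    print(extract_headings('<h1 class="title">After</h1>'))
```

A script patched this way keeps working both before and after the conversion, which buys time until it can be rewritten around a real parser.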
The first rule of well-formedness is that every start-tag has a matching end-tag. The matching part is crucial.
Although classic HTML is case-insensitive, XML and XHTML are not. <DIV> is not the same as <div> and a
</div> end-tag cannot close a <DIV> start-tag.
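To see the mismatch for yourself, you can feed both forms to any XML parser. The following sketch uses the ElementTree parser from the Python standard library; the is_well_formed helper is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including mismatched tags.
        return False


if __name__ == "__main__":
    print(is_well_formed("<div>text</div>"))  # start and end case match
    print(is_well_formed("<DIV>text</DIV>"))  # uppercase, but consistent
    print(is_well_formed("<DIV>text</div>"))  # case mismatch
```

The first two inputs parse, because the parser cares only that the names match. The third is rejected as a mismatched tag.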
For purely well-formedness reasons, all that's needed is to normalize the case. All tags could be capitalized or
not, as long as you're consistent. However, it's easiest for everyone if we pick one case convention and stick to
it. The community has chosen lowercase for XHTML. Thus, the first step is to convert all tag names, attribute
names, and entity names to lowercase. For example:
<P> to <p>
<Table> to <table>
</DIV> to </div>
<BLOCKQUOTE CITE=",372,n,n"> to <blockquote cite=",372,n,n">
&COPY; to &copy;
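To make the conversion concrete, here is a small Python sketch that lowercases tag names and attribute names while leaving attribute values alone. It is an illustration under simplifying assumptions, not a replacement for a real tool: the function and pattern names are invented, and it makes no attempt to skip comments, CDATA sections, or script content. Entity references are deliberately left untouched, because some names such as &eacute; and &Eacute; are both legal and mean different characters, so they cannot be lowercased blindly.

```python
import re

# A start- or end-tag: optional slash, the tag name, then the attribute text.
# Quoted values are consumed whole so a ">" inside them does not end the tag.
TAG = re.compile(r"""<(/?)([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")

# Inside the attribute text: group 1 is a quoted or unquoted value (kept
# as is), group 2 is an attribute name (lowercased).
ATTR_PART = re.compile(
    r"""("[^"]*"|'[^']*'|=\s*[^\s"'>]+)|([A-Za-z_:][-A-Za-z0-9_:.]*)"""
)


def _lower_attr_names(attr_text):
    def fix(match):
        if match.group(1) is not None:
            return match.group(1)       # attribute value: leave untouched
        return match.group(2).lower()   # attribute name: lowercase it
    return ATTR_PART.sub(fix, attr_text)


def lowercase_markup(html):
    """Lowercase tag and attribute names; values and text are unchanged."""
    def fix_tag(match):
        slash, name, attrs = match.groups()
        return "<" + slash + name.lower() + _lower_attr_names(attrs) + ">"
    return TAG.sub(fix_tag, html)


if __name__ == "__main__":
    for sample in ("<P>", "<Table>", "</DIV>", '<BLOCKQUOTE CITE=",372,n,n">'):
        print(sample, "to", lowercase_markup(sample))
```

Run against the examples above, the sketch reproduces the tag and attribute conversions. Real documents contain enough edge cases that a dedicated tool is the safer choice.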
There are several ways to do this.
The first and the simplest is to use TagSoup or Tidy in XHTML mode. Along with many other changes, these
