Regular expressions are well and good for individual, custom changes, but they can be tedious and difficult to
use for large quantities of changes. In particular, they are designed more to work with plain text than with
semistructured HTML text. For batch changes and automated corrections of common mistakes, it helps to have
tools that take advantage of the markup in HTML. The first such tool is Dave Raggett's Tidy
(www.w3.org/People/Raggett/tidy/), the original HTML fixer-upper. It's a simple, multiplatform command-line
program that can correct most HTML mistakes.
For the purposes of this topic, use Tidy with the -asxhtml command-line option. For example, this
command converts the file index.html to well-formed XHTML and, because of the -m option, writes the result back into the same file:
$ tidy -asxhtml -m index.html
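The same command can be applied to every HTML file in a directory tree. What follows is only a minimal sketch in Python, assuming that the tidy executable is on your PATH and that your files end in .html or .htm. The function names find_html_files, tidy_command, and tidy_tree are hypothetical helpers invented for this illustration; they are not part of Tidy.

```python
import subprocess
from pathlib import Path

# Assumption: HTML files are identified by these extensions only.
HTML_SUFFIXES = {".html", ".htm"}


def find_html_files(root):
    """Return every .html/.htm file under root, sorted for stable output."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )


def tidy_command(path):
    """Build the single-file command shown above for one path."""
    return ["tidy", "-asxhtml", "-m", str(path)]


def tidy_tree(root):
    """Run Tidy in place on each HTML file under root.

    Returns a dict mapping each path to Tidy's exit status. Tidy is
    documented to exit with 0 for a clean run, 1 when it issued
    warnings, and 2 when it found errors.
    """
    results = {}
    for path in find_html_files(root):
        completed = subprocess.run(
            tidy_command(path), capture_output=True, text=True
        )
        results[path] = completed.returncode
    return results
```

Calling tidy_tree("site") would rewrite every matching file under a directory named site. Because -m overwrites the originals, run a sketch like this only on a copy or on files under version control.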
Frankly, you could do worse than just running Tidy across all your HTML files and calling it a day, but please
don't stop reading just yet. Tidy has a few more options that can improve your code further, and there are some
problems it can't handle or handles incorrectly. For example, when I ran this command on one of my older
pages that I hadn't looked at in at least five years, Tidy generated the following warning messages:
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 7 column 1 - Warning: <body> attribute "bgcolor" has invalid value "#fffffff"
line 16 column 2 - Warning: <table> lacks "summary" attribute
line 230 column 1 - Warning: <table> lacks "summary" attribute
line 14 column 91 - Warning: trimming empty <p>
Info: Document content looks like XHTML 1.0 Transitional
5 warnings, 0 errors were found!
These are mostly problems that Tidy didn't know how to fix. It was able to supply a DOCTYPE because I
specified XHTML mode, which has a known DOCTYPE. However, it doesn't know what to do with
bgcolor="#fffffff". The value has seven hex digits instead of six, so either the extra f should be removed, or
perhaps the entire bgcolor attribute should be removed and replaced with CSS.
Once you've identified a problem such as this, it's entirely possible that the same problem crops up in
multiple documents. Having noticed this in one file, it's worth doing a search and replace across the
entire directory tree to catch any other occurrences. Given the prevalence of copy and paste coding, few
mistakes occur only once.
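As a rough illustration of the search half of that search and replace, the following Python sketch walks a directory tree and reports every bgcolor value that starts with # but is not exactly six hex digits. The function name find_bad_bgcolors and both regular expressions are assumptions made for this example, and it only looks at files ending in .html. It deliberately reports occurrences rather than rewriting them, so each one can be reviewed before anything is changed.

```python
import re
from pathlib import Path

# Matches bgcolor="..." or bgcolor='...' and captures the quoted value.
BGCOLOR = re.compile(r'''bgcolor\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)
# For this sketch, a valid hex color is "#" followed by exactly six hex digits.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def find_bad_bgcolors(root):
    """Return (path, line number, value) for each malformed hex bgcolor."""
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in BGCOLOR.finditer(line):
                value = match.group(2)
                # Named colors such as "white" are skipped; only values
                # that begin with "#" are checked.
                if value.startswith("#") and not HEX_COLOR.fullmatch(value):
                    hits.append((path, lineno, value))
    return hits
```

Each tuple in the returned list gives the file, the line, and the offending value, which is enough to decide whether to fix the color or to drop the attribute in favor of CSS.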
The next two warnings concern tables that lack a summary attribute. This is an accessibility problem, and you
should correct it. Tidy actually prints some further details about this:
The table summary attribute should be used to describe