xmllint
You can also use a generic XML validator such as xmllint, which is bundled on many UNIX machines and is also
available for Windows. It is part of libxml2, which you can download from http://xmlsoft.org/.
There are advantages and disadvantages to using a generic XML validator to check HTML. One advantage is that
you can separate well-formedness checking from validity checking. It is usually easier to fix well-formedness
problems first and then fix validity problems. Indeed, that is the order in which this topic is organized.
Well-formedness is also more important than validity.
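The well-formedness-then-validity order can be scripted. The following is a minimal sketch in POSIX shell, assuming xmllint is on your PATH; check_page is a hypothetical helper name, not part of libxml2. The first pass runs xmllint --noout, which checks only well-formedness. Only if that passes does the second pass add --valid, which validates the document against the DTD named in its DOCTYPE.

```shell
# check_page FILE-OR-URL: hypothetical helper, not part of libxml2.
# Returns 0 if the document is well-formed and valid,
# 1 if it is not well-formed, 2 if it is well-formed but not valid.
check_page() {
  # Pass 1: well-formedness only. --noout suppresses printing the parsed
  # document; xmllint exits with a nonzero status on any parse error.
  if ! xmllint --noout "$1"; then
    echo "$1: not well-formed; fix these errors first" >&2
    return 1
  fi
  # Pass 2: --valid loads the DTD named in the DOCTYPE and checks validity.
  if ! xmllint --noout --valid "$1"; then
    echo "$1: well-formed but not valid" >&2
    return 2
  fi
  echo "$1: well-formed and valid"
}
```

Running check_page on a page keeps you from wading through validity errors while the document still has well-formedness problems.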
The first disadvantage of using a generic XML validator is that it won't catch HTML-specific problems that the
DTD does not spell out. For instance, it won't notice an a element nested inside another a element (though that
problem doesn't come up much in practice). The second disadvantage is that it has to actually read the DTD,
because it doesn't assume anything about the document it's checking.
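Because xmllint won't flag nested a elements on its own, you may want a separate check for them once a file is well-formed. A minimal sketch, assuming your xmllint build supports the --xpath option; count_nested_anchors is a hypothetical helper name. The query uses local-name() so that it matches a elements whether or not they are in the XHTML namespace.

```shell
# count_nested_anchors FILE: hypothetical helper.
# Prints how many a elements sit inside another a element.
# local-name() matches both plain <a> and XHTML-namespaced <a>.
# The file must already be well-formed for xmllint to evaluate the query.
count_nested_anchors() {
  xmllint --xpath \
    'count(//*[local-name()="a"]//*[local-name()="a"])' "$1"
}
```

Any result other than 0 points to an anchor nesting problem that the DTD-based check alone would miss.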
Using xmllint to check for well-formedness is straightforward. Just point it at the local file or remote URL you
wish to check from the command line. Use the --noout option so that the document itself isn't printed, and
--loaddtd so that entity references can be resolved. For example:
$ xmllint --noout --loaddtd http://www.aw.com
value for attribute nowrap
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
attributes construct error
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
Start Tag TD line 118
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
tag mismatch: IMG line 120 and A
Benjamin Cummings" WIDTH="84" HEIGHT="64" HSPACE="0"
VSPACE="0" BORDER="0"></A>
...
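To apply the same check to a whole site rather than a single URL, you can loop over the local files. A minimal sketch in POSIX shell, assuming the pages end in .html and that no file names contain newlines; list_malformed is a hypothetical helper name. It prints the path of every file that xmllint rejects, so piping its output to wc -l gives a rough count of the pages still left to fix.

```shell
# list_malformed [DIR]: hypothetical helper.
# Prints every *.html file under DIR (default: current directory)
# that xmllint reports as not well-formed.
list_malformed() {
  find "${1:-.}" -type f -name '*.html' | while IFS= read -r f; do
    # Discard the detailed messages; only the exit status matters here.
    xmllint --noout --loaddtd "$f" 2>/dev/null || printf '%s\n' "$f"
  done
}
```

For example, list_malformed mysite | wc -l after each round of fixes shows whether the number of broken pages is going down.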
When you first run a report such as this, the number of error messages can be daunting. Don't despair. Start at
the top and fix the problems one by one. Most errors fall into a few common categories that we will discuss
later in the topic, and you can fix them en masse. For instance, in this example, the first error is a valueless
nowrap attribute. You can fix this simply by searching for nowrap and replacing it with nowrap="nowrap".
Indeed, with a multifile search and replace, you can fix this problem on an entire site in less than five minutes.
(I'll get to the details of that a little later in this chapter.)
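Here is one way to do that site-wide replacement with sed, offered as a minimal sketch rather than the method the chapter describes later; fix_nowrap and fix_nowrap_site are hypothetical helper names. Unlike a plain search for nowrap, the patterns match the word only when it stands alone (preceded by whitespace and followed by whitespace, > or the end of a line), so running the script twice does not mangle an attribute that already reads nowrap="nowrap". The sketch assumes lowercase nowrap, as in the example above. Like any textual replacement, it can also hit the word in ordinary body text, so review the changes.

```shell
# fix_nowrap: hypothetical filter; reads HTML on stdin, writes to stdout.
# Turns a valueless nowrap attribute into nowrap="nowrap".
fix_nowrap() {
  sed -e 's|\([[:space:]]\)nowrap\([[:space:]>]\)|\1nowrap="nowrap"\2|g' \
      -e 's|\([[:space:]]\)nowrap$|\1nowrap="nowrap"|'
}

# fix_nowrap_site [DIR]: apply fix_nowrap to every *.html file under DIR,
# rewriting each file through a temporary copy.
# Assumes no file names contain newlines.
fix_nowrap_site() {
  find "${1:-.}" -type f -name '*.html' | while IFS= read -r f; do
    fix_nowrap < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```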
After each change, run the validator again. You should see fewer problems with each pass, though occasionally a
new one will crop up. Repeat the process until there are no more well-formedness errors. The next problem is an
IMG element that uses a start-tag rather than an empty-element tag.
This one isn't quite as easy, but you can fix most occurrences by searching for BORDER="0"> and replacing it
with border="0" />. That won't catch all of the problems with IMG elements, but it will fix a lot of them.
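A plain search for BORDER="0"> also matches TABLE start-tags, which must not become empty-element tags. A minimal sketch that restricts the change to IMG start-tags, under the assumption that each IMG tag sits on a single line and no attribute value contains a > character; fix_img is a hypothetical helper name. An IMG tag split across several lines will be missed. Unlike the search and replace above, it only adds the closing slash and leaves attribute case alone.

```shell
# fix_img: hypothetical filter; reads HTML on stdin, writes to stdout.
# Rewrites <IMG ...> as <IMG ... /> (tag name in any case), leaving tags that
# already end in /> and non-IMG tags such as <TABLE BORDER="0"> untouched.
# Assumes each IMG tag fits on one line.
fix_img() {
  sed 's|\(<[Ii][Mm][Gg][[:space:]][^>]*[^/>]\)>|\1 />|g'
}
```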
It is important, though, to start with the first error in the list rather than picking one at random. Often, one
early mistake causes multiple well-formedness problems. This is especially true of omitted start-tags and
end-tags. Fixing an early problem often removes the need to fix many later ones.
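Working from the top is easier if each pass shows you only the first error. A minimal sketch, assuming xmllint's usual layout of three lines per message (the message, the offending source line, and a caret), as in the output above; first_error is a hypothetical helper name.

```shell
# first_error FILE: hypothetical helper.
# Prints only the first xmllint diagnostic for FILE and returns
# xmllint's exit status (0 when the file is well-formed).
first_error() {
  fe_status=0
  # Capture the diagnostics (xmllint writes them to stderr) and the status.
  fe_out=$(xmllint --noout --loaddtd "$1" 2>&1) || fe_status=$?
  if [ -n "$fe_out" ]; then
    # Keep the first three lines: message, source line, caret.
    printf '%s\n' "$fe_out" | head -n 3
  fi
  return "$fe_status"
}
```

Fix whatever it reports, run it again, and stop when it prints nothing and returns 0.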