HTML and CSS Reference
xmllint
You can also use a generic XML validator such as xmllint, which is bundled on many UNIX machines and is also
available for Windows. It is part of libxml2, which you can download from http://xmlsoft.org/ .
There are advantages and disadvantages to using a generic XML validator to check HTML. One advantage is that
you can separate well-formedness checking from validity checking. It is usually easier to fix well-formedness
problems first and then fix validity problems. Indeed, that is the order in which this topic is organized.
Well-formedness is also more important than validity, because a document that is not well-formed cannot be
parsed as XML at all.
The first disadvantage of using a generic XML validator is that it won't catch HTML-specific problems that are not
spelled out in the DTD. For instance, it won't notice an a element nested inside another a element
(though that problem doesn't come up a lot in practice). The second disadvantage is that the validator must
actually read the DTD, because it assumes nothing about the document it's checking.
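The separation of the two checks can be sketched as a small Python driver around xmllint. This is only an illustration under stated assumptions: the helper names xmllint_command and check are hypothetical, xmllint is assumed to be on the PATH, and the validity pass relies on xmllint's --valid option, which this section does not otherwise cover.

```python
import subprocess


def xmllint_command(target, validate=False):
    """Build the xmllint argument list for one pass (hypothetical helper).

    validate=False: well-formedness only; --loaddtd lets entity references resolve.
    validate=True:  DTD validity check via xmllint's --valid option.
    """
    cmd = ["xmllint", "--noout"]
    cmd.append("--valid" if validate else "--loaddtd")
    cmd.append(target)
    return cmd


def check(target, validate=False):
    """Run one pass and return (ok, report). Assumes xmllint is installed."""
    result = subprocess.run(
        xmllint_command(target, validate),
        capture_output=True,
        text=True,
    )
    # xmllint exits with 0 when it found no errors and writes its report to stderr.
    return result.returncode == 0, result.stderr
```

A typical use would be to call check(path) until it reports no errors, and only then call check(path, validate=True).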
Using xmllint to check for well-formedness is straightforward. Just point it at the local file or remote URL you
wish to check from the command line. Use the --noout option to say that the document itself shouldn't be
printed and --loaddtd to allow entity references to be resolved. For example:
$ xmllint --noout --loaddtd http://www.aw.com
http://www.aw-bc.com/:118: parser error : Specification mandate value for attribute nowrap
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
http://www.aw-bc.com/:118: parser error : attributes construct error
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
http://www.aw-bc.com/:118: parser error : Couldn't find end of Start Tag TD line 118
<TD class="headerBg" bgcolor="#004F99" nowrap align="left">
^
http://www.aw-bc.com/:120: parser error : Opening and ending tag mismatch: IMG line 120 and A
Benjamin Cummings" WIDTH="84" HEIGHT="64" HSPACE="0" VSPACE="0" BORDER="0"></A>
...
When you first run a report such as this, the number of error messages can be daunting. Don't despair: start at
the top and fix the problems one by one. Most errors fall into a few common categories that we will discuss
later in the topic, and you can fix them en masse. For instance, in this example, the first error is a valueless
nowrap attribute. You can fix this simply by searching for nowrap and replacing it with nowrap="nowrap" .
Indeed, with a multifile search and replace, you can fix this problem on an entire site in less than five minutes.
(I'll get to the details of that a little later in this chapter.)
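As a rough illustration of the nowrap replacement, here is a minimal Python sketch. It is not the search-and-replace tooling covered later in the chapter; the function name fix_nowrap is hypothetical, and the regular expressions are a deliberately simple approximation that only looks inside tags and skips attributes that already have a value.

```python
import re

# A valueless nowrap attribute: preceded by whitespace, followed by
# whitespace, "/" or ">", and not followed by an "=" sign.
VALUELESS_NOWRAP = re.compile(r'(?<=\s)nowrap(?=[\s/>])(?!\s*=)', re.IGNORECASE)

# Anything that looks like a tag. Assumes no "<" or ">" inside attribute values.
TAG = re.compile(r'<[^<>]+>')


def fix_nowrap(html):
    """Rewrite valueless nowrap attributes as nowrap="nowrap", inside tags only."""
    def fix_tag(match):
        return VALUELESS_NOWRAP.sub('nowrap="nowrap"', match.group(0))
    return TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(fix_nowrap('<TD class="headerBg" bgcolor="#004F99" nowrap align="left">'))
```

Restricting the replacement to tags keeps the word nowrap in ordinary text untouched. Applying the function to every file on a site would be a loop over the files, and is best done on a copy or under version control.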
After each change, run the validator again. You should see fewer problems with each pass, though
occasionally a new one will crop up. Simply repeat the process until there are no more well-formedness
errors. The next problem in the report is an IMG element that uses a start-tag rather than an empty-element tag.
This one isn't quite as easy, but you can fix most occurrences by searching for BORDER="0"> and replacing it
with border="0" /> . That won't catch all of the problems with IMG elements, but it will fix a lot of them.
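The literal BORDER="0"> search misses IMG elements whose last attribute is something else. One possible alternative, shown here only as a sketch and not as the method this text recommends, is to match the whole IMG start-tag and close it as an empty-element tag. The function name close_img_tags is hypothetical, and the pattern assumes that attribute values contain no < or > characters.

```python
import re

# An IMG start-tag, with or without a trailing "/". The lazy group captures
# the tag name and attributes; trailing whitespace and any "/" are dropped.
IMG_TAG = re.compile(r'<(img\b[^<>]*?)\s*/?>', re.IGNORECASE)


def close_img_tags(html):
    """Rewrite <IMG ...> as <IMG ... /> and leave already-closed tags unchanged."""
    return IMG_TAG.sub(r'<\1 />', html)


if __name__ == "__main__":
    print(close_img_tags('<A HREF="/"><IMG SRC="logo.gif" BORDER="0"></A>'))
```

The sketch leaves element and attribute names in their original case, so converting them to lowercase remains a separate step.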
It is important to start with the first error in the list, though, and not pick an error at random. Often, one early
mistake can cause multiple well-formedness problems. This is especially true for omitted start-tags and
end-tags. Fixing an early problem often removes the need to fix many later ones.
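To keep attention on the first error, a small Python helper can pull it out of a saved report. This is a sketch that assumes each error line has the shape seen in the sample report above (source, line number, "parser error", message, all on one line); the name first_error is hypothetical, and other xmllint versions or error kinds may format their lines differently.

```python
import re

# Assumed line shape, taken from the sample report:
#   <source>:<line>: parser error : <message>
ERROR_LINE = re.compile(
    r'^(?P<source>.+):(?P<line>\d+): (?P<kind>[a-z ]*error) : ?(?P<message>.*)$'
)


def first_error(report):
    """Return (source, line, message) for the first error line, or None."""
    for raw in report.splitlines():
        match = ERROR_LINE.match(raw)
        if match:
            return match["source"], int(match["line"]), match["message"].strip()
    return None


if __name__ == "__main__":
    sample = (
        "http://www.aw-bc.com/:118: parser error : "
        "Specification mandate value for attribute nowrap\n"
        '<TD class="headerBg" bgcolor="#004F99" nowrap align="left">\n'
        "^\n"
    )
    print(first_error(sample))
```

Fixing that one error and rerunning the validator before reading further down the report fits the advice above, since later messages are often side effects of an earlier mistake.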
 