This style of validation is good when authoring. It is also extremely useful for spot-checking a site to see how
much work you're likely to face, or for working on a single page. However, at some point, you'll want to verify all
of the pages on a site, not just a few of them. For this, an automatic batch validator is a necessity.
The Log Validator
The W3C Markup Validation Service has given birth to the Log Validator (www.w3.org/QA/Tools/LogValidator/),
a command-line tool written in Perl that can check an entire site. It can also use a web server's log files to
determine which pages are the most popular and start its analysis there. Obviously, you care a lot more about
fixing the page that gets 100 hits a minute than the one that gets 100 hits a year. The Log Validator will provide
you with a convenient list of problems to fix, prioritized by seriousness and popularity of page. Listing 2.1 shows
the beginning of one such list.
Listing 2.1. Log Validator Output
Results for module HTMLValidator
****************************************************************
Here are the 10 most popular invalid document(s) that I could
find in the logs for www.elharo.com.
Rank   Hits   #Error(s)   Address
------ ------ ----------- -------------------------------------
1      2738   21          http://www.elharo.com/blog/feed/atom/
2      1355   21          http://www.elharo.com/blog/feed/
3      1231   3           http://www.elharo.com/blog/
4      1127   6           http://www.elharo.com/
6      738    3           http://www.elharo.com/blog/networks/2006/03/18/looking-for-a-router/feed/
11     530    3           http://www.elharo.com/journal/fruitopia.html
20     340    1           http://www.elharo.com/blog/wp-comments-post.php
23     305    3           http://www.elharo.com/blog/birding/2006/03/15/birding-at-sd/
25     290    4           http://www.elharo.com/journal/fasttimes.html
26     274    1           http://www.elharo.com/journal/
The first two pages in this list are Atom feed documents, not HTML files at all. Thus, it's no surprise that they
show up as invalid. The third and fourth ones are embarrassing, though, since they're my blog's main page and
my home page, respectively. They're definitely worth fixing. The fifth most visited page on my site is valid,
however, so it doesn't show up in the list. Numbers 11, 25, and 26 are very old pages that predate XML, much
less XHTML. It's no surprise that they're invalid, but because they're still getting hits, it's worth fixing them.
Number 20 is also a false positive. It's just the script that accepts posted comments. When the validator
sends it a GET rather than a POST, it receives a blank page. That's not a real problem, though I might want to
fix it one of these days to show a complete list of the comments. Or perhaps I should simply set up the script to
return "HTTP error 405 Method Not Allowed" rather than replying with "200 OK" and a blank document.
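The 405 response suggested above can be sketched in a few lines. The real script on this site is a PHP file, so the following is only an illustration of the idea in Python, written as a standard-library WSGI application. The name comments_app and the placeholder POST handling are assumptions made for the example, not part of any existing script.

```python
def comments_app(environ, start_response):
    """Hypothetical POST-only endpoint: any other method gets a 405."""
    if environ.get("REQUEST_METHOD") != "POST":
        body = b"This resource only accepts POST requests.\n"
        start_response(
            "405 Method Not Allowed",
            [
                # A 405 response should list the methods the resource supports.
                ("Allow", "POST"),
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    # Placeholder only: a real script would read and store the comment here.
    body = b"Comment received.\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the application can be passed to wsgiref.simple_server.make_server from the Python standard library. A validator that sends a GET request then sees an explicit error status instead of a "200 OK" with an empty document, so the page no longer looks like an invalid HTML page.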
After these, various other pages that aren't as popular are listed. Just start at the top and work your way down.
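To make the idea of ranking pages by popularity concrete, here is a minimal sketch of how hit counts could be derived from a web server log. It is not the Log Validator's own code (that tool is written in Perl). It assumes the log uses the Common Log Format, and the names REQUEST_RE and rank_pages, as well as the sample log lines, are invented for the illustration.

```python
import re
from collections import Counter

# Pulls the method, path, and status code out of a Common Log Format line,
# e.g. ... "GET /blog/ HTTP/1.1" 200 5120
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def rank_pages(log_lines, top=10):
    """Return the `top` most requested paths as (path, hits) pairs."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Count only successful GET requests; skip anything unparseable.
        if match and match.group("method") == "GET" and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits.most_common(top)

# Made-up sample data, for illustration only.
sample_log = [
    '192.0.2.1 - - [18/Mar/2006:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120',
    '192.0.2.2 - - [18/Mar/2006:10:00:05 +0000] "GET / HTTP/1.1" 200 2048',
    '192.0.2.3 - - [18/Mar/2006:10:00:09 +0000] "GET /blog/ HTTP/1.1" 200 5120',
    '192.0.2.4 - - [18/Mar/2006:10:00:12 +0000] "POST /blog/wp-comments-post.php HTTP/1.1" 302 0',
]

for path, count in rank_pages(sample_log):
    print(count, path)
```

Feeding the resulting paths to a validator in this order reproduces the "start at the top and work your way down" approach: the most visited pages get checked, and fixed, first.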