HTML and CSS Reference
Encodings
Tidy is surprisingly bad at detecting the character encoding of HTML documents, despite the relatively rich
metadata in most HTML documents for specifying exactly this. If your content is anything other than ASCII or
ISO-8859-1 (Latin-1), you'd best tell Tidy that with the --input-encoding option. For example, if you've saved
your documents in UTF-8, invoke Tidy thusly:
$ tidy -asxhtml --input-encoding utf8 index.html
Tidy generates ASCII text as output unless you tell it otherwise. It escapes non-ASCII characters using
named entities when available, and numeric character references when not. Tidy also supports several
other common output encodings, but the only other one I recommend is UTF-8. To get it, use the
--output-encoding option:
$ tidy -asxhtml --output-encoding utf8 index.html
The input encoding does not have to be the same as the output encoding. However, if it is, you can just specify
-utf8 instead:
$ tidy -asxhtml -utf8 index.html
For various reasons, I strongly recommend that you stick to either ASCII or UTF-8. Other encodings do not
transfer as reliably when documents are exchanged across different operating systems and locales.
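If you are not sure how a file was saved, one possible approach is to ask the file utility for its guess and translate that guess into the name Tidy expects. The following is only a sketch: the function names tidy_encoding_name and tidy_with_detected_encoding are hypothetical, the mapping covers just a few common cases, and it assumes a file command that supports -b --mime-encoding. The guess itself can be wrong (a Latin-1 file that happens to contain only ASCII is reported as us-ascii, for example), so treat the result as a hint rather than a fact.

```shell
# Map a charset label (as printed by `file -b --mime-encoding`) to one of
# Tidy's encoding names. Only a handful of common labels are handled;
# anything else makes the function return a nonzero status.
tidy_encoding_name() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    utf-8|utf8)          echo utf8 ;;
    us-ascii|ascii)      echo ascii ;;
    iso-8859-1|latin1)   echo latin1 ;;
    windows-1252|cp1252) echo win1252 ;;
    *)                   return 1 ;;
  esac
}

# Hypothetical wrapper: guess the input encoding, then write UTF-8 XHTML
# to standard output. Requires both `file` and `tidy` on the PATH.
tidy_with_detected_encoding() {
  charset=$(file -b --mime-encoding "$1") || return 1
  enc=$(tidy_encoding_name "$charset") || {
    echo "unrecognized encoding: $charset" >&2
    return 1
  }
  tidy -asxhtml --input-encoding "$enc" --output-encoding utf8 "$1"
}
```

For example, tidy_with_detected_encoding index.html > index.xhtml would convert a single page. When the guess looks suspicious, it is safer to pass --input-encoding by hand.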
Pretty Printing
Tidy also has a couple of options that don't have a lot to do with the HTML itself, but do make documents
prettier to look at and thus easier to work with when you open them in a text editor.
The -i option indents the text so that it's easier to see how the elements nest. Tidy is smart enough not to
indent whitespace-significant elements such as pre.
The -wrap option wraps text at a specified column. About 80 columns usually works well.
$ tidy -asxhtml -utf8 -i -wrap 80 index.html
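If you find yourself typing the same options repeatedly, they can also be kept in a Tidy configuration file and loaded with -config. The sketch below writes what I believe is a rough equivalent of the command line above; the function name write_tidy_config and the file name tidy.conf are arbitrary choices for illustration, not anything Tidy requires.

```shell
# Write a Tidy configuration file roughly equivalent to the options
# -asxhtml -utf8 -i -wrap 80. The target path defaults to tidy.conf,
# which is just a name chosen for this sketch.
write_tidy_config() {
  cat > "${1:-tidy.conf}" <<'EOF'
output-xhtml: yes
char-encoding: utf8
indent: auto
wrap: 80
EOF
}

# Usage (requires tidy on the PATH):
#   write_tidy_config tidy.conf
#   tidy -config tidy.conf index.html
```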
Generated Code
Tidy has limited support for working on PHP, JSP, and ASP pages. Basically, it will ignore the content inside the
PHP, ASP, or JSP sections and try to work on the rest of the HTML markup. However, that is very tricky to do. In
particular, most templating languages do not respect element boundaries. If code is generating half of an
element or a start-tag for an element that is later closed by a literal end-tag, it is very easy for Tidy to get
confused. I do not recommend using Tidy directly on these sorts of pages.
Instead, download some fully rendered pages from your web site after processing by the template engine. Run
Tidy on a representative sample of these pages, and then compare the results to the original pages. By looking
at the differences, you can usually figure out what needs to be changed in your templates; then make the
changes manually.
Although this does require more manual work and human intelligence, if each template is generating multiple
static pages, this process can finish sooner than semiautomated processing of large numbers of static HTML
pages.
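The comparison step described above can be scripted. The following sketch assumes the rendered pages have already been saved as .html files in one directory (with curl or wget, say). The function name diff_tidied_pages, the TIDY override variable, and the .tidied, .messages, and .diff suffixes are all inventions for this example.

```shell
# For every saved page in a directory, run Tidy and store a unified diff
# between the original and the tidied version next to it.
# Set TIDY to use a different executable; it defaults to `tidy`.
diff_tidied_pages() {
  dir=$1
  for page in "$dir"/*.html; do
    [ -f "$page" ] || continue
    # Tidy exits with 1 when it reports warnings and 2 when it reports
    # errors, so a nonzero status is not treated as fatal here.
    "${TIDY:-tidy}" -asxhtml -utf8 -q "$page" > "$page.tidied" 2> "$page.messages"
    # diff exits with 1 when the files differ, which is the expected case.
    diff -u "$page" "$page.tidied" > "$page.diff"
  done
  return 0
}
```

Running diff_tidied_pages ./rendered leaves one .diff file per page. Reading a few of them side by side is usually enough to spot the recurring problems that trace back to a template. If Tidy reports outright errors for a page, it may produce no output at all, in which case the .messages file is the place to look.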