page individually and build the dictionary as you go, I find it more efficient to work in larger batches. The basic
procedure is as follows.
1. Generate a list of all possibly misspelled words in all documents.
2. Delete all actually misspelled words from the list. This requires the services of a native speaker who is an
   excellent speller. What remains is a custom dictionary for your site.
3. Rerun the spell checker on one file at a time using the custom dictionary. This time, any words it flags
   should be genuine spelling errors, so you should fix them.
Be sure to store the dictionary you create for later use. You will occasionally need to add new words to it as the
site grows and changes.
At least for English, the gold standard is the GNU Project's Aspell. This is really a library more than an end-user
program, but you can make it work by stringing together a few UNIX commands. Here's how I use it.
First I check an entire directory of files with this command:
$ cat *.html | aspell --mode=html list | sort | uniq
This concatenates all the HTML files in the directory, pipes the output into the spell checker, sorts the results, and
removes duplicates. The result is a list of every word in the directory that the spell checker does not recognize, such as this:
Of course, looking at a list such as this, it will immediately strike you that most of these words are not in fact
misspelled. They are proper names, technical terms, foreign words, coinages, and other things the spell checker
doesn't recognize. Thus, next you inspect the output and use it to build a custom dictionary.
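When the list is long, it may help to see how often each flagged word occurs, because a word that turns up dozens of times is often a proper name or technical term rather than a typo. Here is a minimal sketch, assuming a POSIX shell. The function name rank_flagged and the file name candidates.txt are hypothetical and not part of Aspell.

```shell
# rank_flagged (hypothetical helper): reads one flagged word per line on
# standard input and prints each distinct word preceded by its count,
# most frequent first.
rank_flagged() {
    sort | uniq -c | sort -rn
}
```

It would take the place of sort | uniq in the pipeline above: cat *.html | aspell --mode=html list | rank_flagged > candidates.txt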
Pipe or copy the output into a text editor and delete all clearly misspelled words. (I am assuming here that
you're a solid speller. If not, hire someone who is. This is especially important if you're not a native speaker of
the language you're checking.) If you're in doubt about a word, delete it. You'll want to look at it in context.
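To look at a doubtful word in context, a plain grep over the same files is usually enough. This is a minimal sketch. show_context is a hypothetical helper name, and the -w option (match whole words only) is assumed to be available, as it is in GNU and BSD grep.

```shell
# show_context (hypothetical helper): print every line, prefixed with its
# line number, on which the given word appears as a whole word.
# Usage: show_context WORD FILE...
show_context() {
    word=$1
    shift
    grep -n -w -- "$word" "$@"
}
```

For example, show_context someword *.html lists each line containing someword, prefixed by the file name when more than one file is searched.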
Save the remaining correctly spelled words in a file called customdict.txt. If the file is too large to inspect
manually, you may want to start with a smaller sample. Then compile this text file into a custom dictionary, like