HTML and CSS Reference
expression should find most documents that do not have a lang or xml:lang attribute on their root html element.
This will find most cases you need to deal with.
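As a complement to a search expression, the same check can be sketched with a parser instead of pattern matching. The following is a minimal illustration in Python, not the expression referred to above. The names RootLangChecker and missing_root_lang are hypothetical, and the sketch assumes the markup is already available as a string.

```python
from html.parser import HTMLParser


class RootLangChecker(HTMLParser):
    """Records the lang and xml:lang attributes of the first html start tag."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.langs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        # Only the first html start tag (the root element) is inspected.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            for name, value in attrs:
                # Attributes with no value or an empty value do not count.
                if name in ("lang", "xml:lang") and value:
                    self.langs[name] = value


def missing_root_lang(markup):
    """Return True if the root html element has neither lang nor xml:lang."""
    checker = RootLangChecker()
    checker.feed(markup)
    return not checker.langs
```

A document with no html start tag at all is also reported as missing a language, which is usually what you want when hunting for pages to fix.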
The language codes themselves are standardized in the IANA Language Subtag Registry. Where possible, you
should use the standard two-letter codes as shown in Table 6.3. This is an abbreviated list. You can find the full
list at www.iana.org/assignments/language-subtag-registry.
Table 6.3. Common Language Codes
Although there are many more codes than I've shown in Table 6.3, there are even more languages on the
planet (about 6,000) than there are two-letter codes. Less common languages now use three-letter codes. For
instance, Coptic has the code cop. There are also dialect subcodes you can use. For example, en-US is English
as spoken in the United States, whereas en-GB is English as spoken in Great Britain. This might matter a little to
search engines or spell checkers. However, getting this right is not nearly as important as identifying the
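To make the structure of codes such as en-US and cop concrete, here is a small Python sketch that splits a code into its primary language and an optional region. The function name split_language_tag is hypothetical, and the sketch is deliberately simplified: it assumes a region subtag is either two letters or three digits, and it does not validate anything against the IANA registry.

```python
def split_language_tag(tag):
    """Split a code such as 'en-US' into (language, region).

    region is None when the code carries no region subtag.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    region = None
    for part in parts[1:]:
        # A single-character subtag starts an extension or private-use
        # section, so stop looking for a region at that point.
        if len(part) == 1:
            break
        # Assumption: a region is two letters (US, GB) or three digits.
        if (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
            break
    return language, region


print(split_language_tag("en-US"))
print(split_language_tag("cop"))
```

A spell checker or search tool could use the language part alone when it does not care about the dialect, and fall back to the region only when it has dialect-specific data.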
Although they are redundant, you should include both lang and xml:lang attributes, at least for now. Older