Accessibility - Refactoring HTML: Improving the Design of Existing Web Applications


expression should find most documents that do not have a lang or xml:lang attribute on their root html element:

<html\s+((id|class|dir|xmlns)\s*=\s*("[^"]+"|'[^']+')\s*)*>

This will find most cases you need to deal with.
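As a rough illustration of how the expression above behaves, here is a minimal Python sketch that applies it to a few sample start tags. The function name lacks_lang and the sample strings are assumptions made for this example; only the pattern itself comes from the text.

```python
import re

# The pattern from the text: an <html> start tag whose attributes are limited
# to id, class, dir, and xmlns, which means it carries no lang or xml:lang.
NO_LANG_ROOT = re.compile(
    r"""<html\s+((id|class|dir|xmlns)\s*=\s*("[^"]+"|'[^']+')\s*)*>"""
)


def lacks_lang(document):
    """Return True if the pattern finds a root tag without a language attribute.

    Hypothetical helper for illustration; not part of any library.
    """
    return NO_LANG_ROOT.search(document) is not None


# Sample start tags (made up for this sketch).
samples = [
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">',
    '<html>',
]

for tag in samples:
    print(lacks_lang(tag), tag)
```

Note that, as written, the pattern requires at least one whitespace character after html, so a bare <html> tag with no attributes at all is not reported even though it also lacks a language attribute. That is one of the cases the expression misses.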

The language codes themselves are standardized in the IANA Language Subtag Registry. Where possible, you should use the standard two-letter codes as shown in Table 6.3. This is an abbreviated list. You can find the full list at www.iana.org/assignments/language-subtag-registry.

Table 6.3. Common Language Codes

Language      Code
Amharic       am
Arabic        ar
Czech         cs
German        de
Greek         el
English       en
Esperanto     eo
Spanish       es
French        fr
Hindi         hi
Indonesian    id
Italian       it
Japanese      ja
Korean        ko
Dutch         nl
Portuguese    pt
Russian       ru
Vietnamese    vi
Chinese       zh

Although there are many more codes than I've shown in Table 6.3, there are even more languages on the planet (about 6,000) than there are two-letter codes. Less common languages now use three-letter codes. For instance, Coptic has the code cop. There are also dialect subcodes you can use. For example, en-US is English as spoken in the United States, whereas en-GB is English as spoken in Great Britain. This might matter a little to search engines or spell checkers. However, getting this right is not nearly as important as identifying the primary language.
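Because the primary language is the part that matters most, a tool that processes lang values may want to pull it out of a longer tag. The following Python sketch shows one way to do that for the simple forms mentioned above (en, cop, en-US). The helper name primary_language is hypothetical, and the check covers only two- and three-letter primary codes, not the full syntax of language tags.

```python
import re

# Assumption for this sketch: a primary code is two or three ASCII letters,
# which covers the forms discussed in the text (en, cop) but not every tag.
PRIMARY_CODE = re.compile(r"[A-Za-z]{2,3}")


def primary_language(tag):
    """Return the primary language code of a tag such as 'en-US'.

    Hypothetical helper for illustration; raises ValueError when the first
    subtag is not a two- or three-letter code.
    """
    primary = tag.split("-", 1)[0]
    if not PRIMARY_CODE.fullmatch(primary):
        raise ValueError("unrecognized primary language code: %r" % tag)
    return primary.lower()


for tag in ("en", "en-US", "en-GB", "cop"):
    print(tag, "->", primary_language(tag))
```

Both en-US and en-GB reduce to en here, which matches the point that the dialect subcode is a refinement rather than the essential piece of information.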

Although they are redundant, you should include both lang and xml:lang attributes, at least for now. Older
