HTML and CSS Reference
Convert Text to UTF-8
Reencode all text as Unicode UTF-8.
Pages that contain anything beyond basic ASCII have cross-platform display problems. Windows encodings are
not interpreted correctly on the Mac, and vice versa. Web browsers try to guess which encoding a page uses,
but they often guess wrong.
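To see why, consider a single accented letter. The following Python sketch (Python is used here purely for illustration) encodes "café" in Windows-1252 and then decodes the same bytes as Mac OS Roman. This is roughly what happens when a tool on the other platform assumes its own default encoding. The byte 0xE9 is é in one encoding and È in the other.

```python
# Illustration only: the same bytes read under two legacy 8-bit encodings.
raw = "café".encode("cp1252")        # bytes as saved by a Windows-1252 editor

as_windows = raw.decode("cp1252")    # decoded with the encoding it was written in
as_mac = raw.decode("mac_roman")     # decoded by a tool that assumes Mac OS Roman

print(raw)         # b'caf\xe9'
print(as_windows)  # café
print(as_mac)      # cafÈ
```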
UTF-8 is a standard encoding that works across all web browsers and is supported by all major text editors and
other tools. It is reasonably fast, small, and efficient. It can support all Unicode characters and is a good basis
for internationalization and localization of pages.
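UTF-8 represents each character in one to four bytes, and plain ASCII text keeps exactly the same single-byte values. That is part of why it stays compact for mostly-English markup. The small Python sketch below prints the UTF-8 bytes for a few characters. The sample characters are arbitrary, and `bytes.hex` with a separator argument needs Python 3.8 or later.

```python
# Illustrative samples: ASCII, Latin-1 range, a BMP symbol, CJK, and an emoji.
samples = {
    "A": "A",                     # U+0041, plain ASCII
    "e-acute": "\u00e9",          # é
    "euro sign": "\u20ac",        # €
    "CJK ideograph": "\u4e2d",    # 中
    "emoji": "\U0001F600",        # grinning face, outside the Basic Multilingual Plane
}

# Number of bytes UTF-8 needs for each sample character.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{name:14} U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```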
To implement this properly, you need to be able to control your web server's HTTP response headers, which can
be problematic in shared hosting environments. In addition, some poorly written tools do not recognize UTF-8 when they should.
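Controlling the response headers means, in practice, that the server sends Content-Type: text/html; charset=utf-8 along with the page. The sketch below uses a Python WSGI application purely for illustration. On most web servers you would set this through configuration instead. The names PAGE and app are hypothetical.

```python
# Hypothetical page content; any text containing non-ASCII characters will do.
PAGE = "<html><head><title>Café</title></head><body><p>Café</p></body></html>\n"

def app(environ, start_response):
    """Minimal WSGI app that serves PAGE as UTF-8 and says so in the header."""
    body = PAGE.encode("utf-8")  # the bytes on the wire are UTF-8...
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),  # ...and the header says so
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, pass app to wsgiref.simple_server.make_server and call serve_forever() on the result. The important detail is that the body is produced with encode("utf-8") and the header names the same encoding. If the two disagree, the browser will misread the page.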
There are two steps here. First, reencode all content in UTF-8. Second, tell clients that you've done that.
Reencoding is straightforward, provided that you know what encoding you're starting with. You have to tell Tidy
that you want UTF-8, but once you do, it will do the work:
$ tidy -asxhtml -m --output-encoding utf8 index.html
TagSoup you don't have to tell. It just produces UTF-8 by default.
A number of command-line tools and other programs will also save content in UTF-8 if you ask, such as GNU
recode (www.gnu.org/software/recode/recode.html), BBEdit, and jEdit. You should also set your editor of choice
to save in UTF-8 by default.
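If you would rather script the conversion yourself, the core of it is decoding with the old encoding and encoding again as UTF-8. This Python sketch assumes you already know the source encoding. The function name reencode_to_utf8 is hypothetical. It works on raw bytes, so line endings are left alone.

```python
from pathlib import Path

def reencode_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8.

    Assumes the file is currently in `source_encoding` (e.g. "cp1252",
    "iso-8859-1", "mac_roman"); the caller must know this, not guess it.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)  # strict: raises on invalid bytes
    p.write_bytes(text.encode("utf-8"))            # bytes in, bytes out: line endings untouched

# Example: reencode_to_utf8("index.html", "cp1252")
```

Decoding is strict, so bytes that are invalid in the named encoding raise UnicodeDecodeError. However, a wrong guess between two 8-bit encodings usually decodes without any error and simply produces the wrong characters, which is why you need to know the starting encoding. The file is rewritten in place, so keep a backup.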
The next step is to tell the browsers that the content is in UTF-8. There are three parts to this.
Add a byte order mark.
Add a meta tag.
Specify the Content-type header.
The byte order mark is Unicode character U+FEFF, the zero-width no-break space. When this is the first character in a
document, the browser should recognize its byte sequence (EF BB BF in UTF-8) and treat the rest of the content as UTF-8. This
shouldn't be necessary, but Internet Explorer and some other tools are more reliable if they have it. Some
editors add this automatically and some require you to request it.
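If your editor won't add the byte order mark, a few lines of Python can. The helper names add_bom and has_bom are hypothetical. The sketch leaves data that already starts with a BOM unchanged, so the mark is never doubled.

```python
import codecs

def has_bom(data: bytes) -> bool:
    """Return True if the bytes already start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)   # codecs.BOM_UTF8 == b"\xef\xbb\xbf"

def add_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with a BOM, leaving data that already has one unchanged."""
    return data if has_bom(data) else codecs.BOM_UTF8 + data

# Example (hypothetical file name):
# from pathlib import Path
# p = Path("index.html"); p.write_bytes(add_bom(p.read_bytes()))
```

When you are encoding text rather than patching bytes, Python's built-in utf-8-sig codec does the same job. It writes the BOM when encoding and strips it when decoding.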
The second step is to add a meta tag in the head, such as this one: