HTML and CSS Reference
HTML 4.01 took things a huge leap further by adding support for Unicode. Unicode
is a standard character encoding system that, although backward compatible with
our familiar ASCII encoding, offers the capability to encode characters in almost any
of the world's languages, including Chinese and Japanese. This means that documents
can support any language, and that one document can contain multiple languages.
All modern browsers support Unicode, and can render documents that use
all the characters provided by Unicode as long as the necessary fonts are available.
Specifying Character Encoding
The basic ASCII characters, which make up most of a typical English-language web page,
are encoded identically in Unicode (UTF-8), Western (ISO-8859-1), and Western
(Windows-1252). Characters outside that range, however, can differ between them. If a
page contains such characters and the browser decodes it using a different encoding from
the one it was saved in, they will display incorrectly. For example, if the browser thinks
a page is encoded in UTF-8 but the page was actually saved as Windows-1252 and includes
smart (curly) quotation marks that were copied from a Word document, those characters
will display as unintelligible symbols.
This is one of the reasons to use named entities wherever possible. Browsers will translate
those entities to the proper characters, regardless of the character set used. You can
change the character set the browser is using for a web page. This can prove useful when
a page has the wrong encoding, or if you just want to experiment with different character
sets. In my browser, you can change character sets using the Character Encoding item in
the View menu. The character encoding used for the page will be selected in that menu,
and you can change to another one by selecting it.
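As a minimal sketch of the named-entity approach mentioned above, the fragment below writes curly quotation marks, an em dash, and an accented letter using nothing but ASCII characters. The sentence itself is an invented example; the entity names used (ldquo, rdquo, mdash, eacute) are standard HTML named entities.

```html
<!-- Every byte in this fragment is plain ASCII, so it reads the same
     whether the file is treated as UTF-8, ISO-8859-1, or Windows-1252. -->
<p>
  &ldquo;Smart&rdquo; quotation marks, an em dash &mdash; and the
  &eacute; in caf&eacute; are all written as named entities.
</p>
```

Because the browser resolves entities after it has decoded the page, the curly quotes and the accented letter should still render correctly if the page is decoded with the wrong one of these three encodings.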
You can also specify the character set used by a page in the page header, using a <meta>
tag. Here's a tag that will specify UTF-8 as the character encoding for that page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This indicates to the browser that the page is an HTML page encoded using UTF-8. The
<meta> tag must be placed within the <head> element. A <meta> tag with the http-equiv
attribute is used to specify information that is normally provided by the web server, and
indeed, web server software enables you to specify a character encoding for all the pages
that it serves. When the server does send a character encoding, browsers use it in
preference to the <meta> tag. If you need to use a specific character encoding for all
the pages on your site, you'll want to configure your web server to use that encoding,
instead of adding a <meta> tag to every page on your site.
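Putting this together, a minimal page might look like the sketch below. The title and body text are placeholders, not part of any real site. The commented-out line shows the shorter <meta charset> form that HTML5 also accepts; a page would normally use one form or the other, not both.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Declare the encoding first, before any text that depends on it
       (such as the title). -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <!-- HTML5 shorthand with the same effect:
       <meta charset="utf-8"> -->
  <title>Encoding example (placeholder title)</title>
</head>
<body>
  <p>Placeholder text with curly quotes: &ldquo;hello&rdquo;</p>
</body>
</html>
```

Placing the declaration at the top of the <head> element means the browser finds it before it reaches any characters outside the ASCII range.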