HTML and CSS Reference
Declaring Character Encoding for the Markup
Character encoding of web documents can be declared in several ways:
Using the HTTP header
Using in-document declarations
  Using the meta element
    pragma directive (HTML 4, XHTML, (X)HTML5)
    meta charset attribute (HTML5)
  Using the XML declaration [5] (XHTML)
  Using the link charset attribute
The last three options are written in the markup itself, whereas the first is applied by the web server. Not all
in-document declarations can be used in all markup languages, but the pragma directive can be used in most. Because
browsers rely on the character encoding declaration to choose the decoding scheme and display the content
correctly, these declarations must match the actual character encoding of the file.
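The requirement that a declaration match the file's real encoding can be checked roughly in code. The following is a minimal sketch, not a full HTML parser: the helper names `declared_meta_charset` and `declaration_matches` are hypothetical, the regular expression only approximates how a browser scans for a meta declaration, and the 1024-byte scan window is an assumption of this example.

```python
import re

# Matches both <meta charset="..."> and the pragma form
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def declared_meta_charset(raw: bytes):
    """Return the charset named in the first meta declaration, or None."""
    match = META_CHARSET.search(raw[:1024])  # scan window is an assumption
    return match.group(1).decode("ascii").lower() if match else None


def declaration_matches(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly under the declared encoding."""
    name = declared_meta_charset(raw)
    if name is None:
        return False
    try:
        raw.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Unknown encoding label, or bytes that are invalid in that encoding.
        return False
    return True


markup = '<meta charset="utf-8"><p>caf\u00e9</p>'
consistent = declaration_matches(markup.encode("utf-8"))      # saved as UTF-8
inconsistent = declaration_matches(markup.encode("latin-1"))  # saved as Latin-1
```

A clean decode is only a necessary condition: single-byte encodings such as ISO-8859-1 accept any byte sequence, so a check like this can catch some mismatches but cannot prove that a declaration is right.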
If the different encoding declarations are inconsistent or contradictory, the following precedence rules determine
the encoding to apply:
1. HTTP Content-Type header
2. Byte-order mark [6]
3. XML declaration
4. The meta element
5. The link charset attribute
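The precedence rules can be sketched as a simple first-match lookup. This example assumes the individual declarations have already been extracted by other means; the function names `detect_bom` and `resolve_encoding`, their parameters, and the source labels are hypothetical, and the order simply mirrors the list above rather than any particular browser's implementation.

```python
import codecs

# Byte-order marks covered by this sketch. UTF-32 marks are deliberately
# left out, so a UTF-32LE file would be misreported here as UTF-16.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(raw: bytes):
    """Return the encoding signalled by a leading byte-order mark, or None."""
    for mark, name in BOMS:
        if raw.startswith(mark):
            return name
    return None


def resolve_encoding(http_charset=None, bom=None, xml_decl=None,
                     meta=None, link_charset=None):
    """Return (source, encoding) for the highest-precedence declaration."""
    candidates = (
        ("HTTP Content-Type header", http_charset),
        ("byte-order mark", bom),
        ("XML declaration", xml_decl),
        ("meta element", meta),
        ("link charset attribute", link_charset),
    )
    for source, value in candidates:
        if value:
            return source, value.lower()
    return None, None  # nothing declared; the caller must pick a fallback


# Contradictory declarations: the HTTP header wins over the meta element.
winner = resolve_encoding(http_charset="UTF-8", meta="ISO-8859-1")
```

Returning the source alongside the encoding makes it easier to report which declaration won when the declarations disagree.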
Encoding Declaration in the HTTP Header
The highest-precedence declaration is the one sent in the HTTP Content-Type header. Listing 2-1 shows an example.
Listing 2-1. Setting the Character Encoding in the HTTP Header
HTTP/1.1 200 OK
Date: Tue, 02 Aug 2011 14:18:05 GMT
Server: Apache/2.2.3 (Oracle)
...
Content-Type: text/html; charset=UTF-8
Content-Language: en
The encoding declared in the HTTP header should be consistent with any in-document declarations.
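To compare the header with an in-document declaration, the charset parameter first has to be pulled out of the Content-Type value. One possible way to do this, using only Python's standard library, is shown below; the wrapper name `charset_from_content_type` is hypothetical.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() parses the header parameters and lowercases the result.
    return msg.get_content_charset()


header_charset = charset_from_content_type("text/html; charset=UTF-8")
```

Applied to the header in Listing 2-1, this yields the label in lowercase, which can then be compared with the lowercased value taken from the markup.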
Documents using UTF-16 should be declared as UTF-16 rather than UTF-16BE or UTF-16LE, and the file should begin
with a byte-order mark.
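The difference between the generic UTF-16 label and the byte-order-specific labels can be illustrated with Python's built-in codecs. This is only an illustration of how one encoder behaves, not a statement about web servers or browsers; the variable names are arbitrary.

```python
import codecs

text = "h\u00e9"

# The generic "utf-16" codec prepends a byte-order mark in native byte order.
with_bom = text.encode("utf-16")

# Naming the byte order explicitly produces no byte-order mark.
without_bom = text.encode("utf-16-le")

has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding with the generic label reads the mark to determine the byte order.
round_trip = with_bom.decode("utf-16")
```

Because the generic label relies on the mark to convey byte order, a file saved this way carries the information a consumer needs to decode it.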
HTTP headers are used for other purposes too. For more information on the HTTP header, see Chapter 4.
[5] The character encoding declaration, if provided exclusively using the XML declaration, is ignored by some rendering engines.
[6] The BOM was added to the hierarchy in the HTML5 specification, but this is not implemented in all browsers yet.
 