HTML and CSS Reference
The character set declared in the response headers is generally taken in preference over
the character set specified in your document, so the headers are the preferred method
of communicating this information. However, if you cannot control what headers your
server sends, declaring the character set in your HTML document is the next best option.
If a character set is declared neither in the document nor in the response headers, the
browser might choose one for you, and it may be the wrong one for your site's needs.
This not only can cause issues with rendering, but also poses a security risk.
Several years ago, a cross-site scripting vulnerability was discovered at
Google that demonstrated the importance of character encoding: http:
In previous versions of HTML, the character set needed to be declared with additional
attributes and values:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
But, as with the DOCTYPE, HTML5 only needs the minimum information required
by browsers. Again, this helps with backward compatibility and makes it easier for
authors to implement.
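As a minimal sketch of the shorter HTML5 form, the declaration can be reduced to a single charset attribute. The page title and body text here are placeholders assumed for illustration, not taken from any particular site:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- HTML5 short form of the character set declaration.
       Keep it near the top of <head>: the HTML specification requires
       the declaration to fall within the first 1024 bytes of the document. -->
  <meta charset="UTF-8" />
  <!-- Placeholder title, assumed for this example -->
  <title>Example page</title>
</head>
<body>
  <!-- Placeholder content, assumed for this example -->
  <p>Hello, world.</p>
</body>
</html>
```

The longer http-equiv form is still accepted by browsers, so either declaration should work; the short form simply has less to mistype.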
UTF-8, an encoding of Unicode, is versatile enough to cover most web builders' needs.
Sometimes, though, you need to include a character that is difficult to type or that
might not survive if the file is saved or served in a different encoding.
A great resource for character entities is at
http://www.digitalmediaminute.com/reference/entity/. It includes the numeric, named, and Unicode
references for many of the more common characters allowed in HTML.
You can specify such characters with Numeric Character References (NCRs) or as
named entities in order to help browsers render them correctly. If you wanted a
copyright symbol, for example, you could include it in your HTML as an NCR:
&#169;
or you could include it as a named entity:
&copy;
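To show how these references look in context, here is a small sketch of a page footer that writes the copyright symbol three ways: as a decimal NCR, as a hexadecimal NCR, and as a named entity. The company name and surrounding markup are hypothetical, assumed only for this example:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <!-- Placeholder title, assumed for this example -->
  <title>Character reference example</title>
</head>
<body>
  <footer>
    <!-- Decimal numeric character reference for U+00A9 -->
    <p>&#169; Example Company (decimal NCR)</p>
    <!-- Hexadecimal numeric character reference for the same code point -->
    <p>&#xA9; Example Company (hexadecimal NCR)</p>
    <!-- Named entity for the same character -->
    <p>&copy; Example Company (named entity)</p>
  </footer>
</body>
</html>
```

All three lines should render with the same copyright symbol. Numeric references work for any Unicode code point, whereas named entities exist only for a predefined set of characters.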
Mark Pilgrim's “Dive Into HTML5” discussion about character encoding at http://di