Fundamental Syntax and Semantics - HTML5

The character set declared in the HTTP response headers generally takes precedence over the character set specified in your document, so the headers are the preferred way to communicate this information. However, if you cannot control which headers your server sends, declaring the character set in your HTML document is the next best option.
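To make that precedence concrete, here is a minimal Python sketch. Python is used only for illustration, and choose_charset and charset_from_header are hypothetical helpers, not part of any browser. It assumes the header arrives as a plain Content-Type string and that the in-document declaration has already been extracted. It also deliberately ignores other inputs that real browsers weigh, such as a byte order mark.

```python
from email.message import Message
from typing import Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when absent.
    return msg.get_content_charset()


def choose_charset(content_type: Optional[str],
                   meta_charset: Optional[str]) -> Optional[str]:
    """Hypothetical, simplified rule: HTTP header first, then the document.

    Browsers actually follow the fuller HTML encoding-sniffing algorithm
    (a byte order mark, for example, outranks both); this sketch ignores it.
    """
    if content_type:
        from_header = charset_from_header(content_type)
        if from_header:
            return from_header
    if meta_charset:
        return meta_charset.strip().lower()
    return None  # Nothing declared: the browser would have to guess.


print(choose_charset("text/html; charset=UTF-8", "windows-1252"))  # utf-8
print(choose_charset("text/html", "UTF-8"))                        # utf-8
```

Suppose the header says text/html; charset=UTF-8 and the page declares windows-1252. The sketch returns utf-8, mirroring the header-first rule described above.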

If a character set is declared neither in the document nor in the response headers, the browser might choose one for you, and it may be the wrong one for your site's needs. This can cause not only rendering problems but also a security risk.
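The rendering problem is easy to reproduce: encode text with one character set and decode it with another. The Python sketch below uses a hypothetical misread helper to show a UTF-8 copyright sign being read as windows-1252. The second half illustrates one classic security failure, which relied on the browser guessing UTF-7. Bytes that contain no literal "<" pass a naive filter, yet they decode to a script element. Current browsers no longer support UTF-7, so treat this as an illustration of the risk rather than a working attack.

```python
# Hypothetical helper for illustration: encode with the real charset,
# then decode with the browser's (wrong) guess.
def misread(text: str, actual: str, guessed: str) -> str:
    return text.encode(actual).decode(guessed, errors="replace")


# Rendering: UTF-8 encodes "©" as the bytes C2 A9; windows-1252 maps
# those two bytes to two separate characters, "Â" and "©".
print(misread("© Example", "utf-8", "windows-1252"))  # Â© Example

# Security: these bytes contain no literal "<", so a filter that only
# escapes "<" and ">" lets them through unchanged...
PAYLOAD = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
print(b"<" in PAYLOAD)  # False

# ...but a decoder that assumes UTF-7 turns them into live markup.
print(PAYLOAD.decode("utf-7"))  # <script>alert(1)</script>
```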

Several years ago, a cross-site scripting vulnerability was discovered at Google that demonstrated the importance of character encoding: http:

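In previous versions of HTML, the character set had to be declared with additional attributes and values: a meta element with http-equiv="Content-Type" and a content attribute holding both the MIME type and the charset.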

But, as with the DOCTYPE, HTML5 needs only the minimum information that browsers require. Again, this helps with backward compatibility and makes the declaration easier for authors to write.
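The legacy and HTML5 declarations differ only in syntax: both carry a single charset name. As a rough sketch, the Python parser below reads either form. It is built on the standard html.parser module, and the names MetaCharsetFinder and declared_charset are hypothetical. This is not the prescan algorithm browsers actually use, and utf-8 is just an example value.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Record the charset from the first <meta> that declares one."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:
            # HTML5 form: <meta charset="...">
            self.charset = attributes["charset"].strip().lower()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # Legacy form: <meta http-equiv="Content-Type"
            #                    content="text/html; charset=...">
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(markup: str) -> Optional[str]:
    finder = MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset


LEGACY = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
HTML5 = '<meta charset="utf-8">'

print(declared_charset(LEGACY))  # utf-8
print(declared_charset(HTML5))   # utf-8
```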

Special characters

UTF-8 can encode every Unicode character, so it covers most web builders' needs. Sometimes, though, you need to include a character that is hard to type directly or that HTML would otherwise treat as markup, such as < or &.
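Characters such as < and & have special meaning in HTML, so when they should appear as text they are normally written as character references. Python's standard html module, used here purely for illustration, performs that escaping and reverses it:

```python
import html

TEXT = "Use <p> for paragraphs & <br> for line breaks"

# escape() replaces &, <, > (and, by default, both quote characters)
# with character references so the browser shows them as text.
escaped = html.escape(TEXT)
print(escaped)  # Use &lt;p&gt; for paragraphs &amp; &lt;br&gt; for line breaks

# unescape() reverses the process, decoding named and numeric references.
assert html.unescape(escaped) == TEXT
```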

A great resource for character entities is at http://www.digitalmediaminute.com/reference/entity/. It includes the numeric, named, and Unicode references for many of the more common characters allowed in HTML.

You can specify such characters with Numeric Character References (NCRs) or as named entities to help browsers render them correctly. If you wanted a copyright symbol, for example, you could include it in your HTML as an NCR:

&#169;

or you could include it as a named entity:

&copy;
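Both notations decode to the same character. The Python sketch below uses two hypothetical helpers, to_ncr and to_named_entity. The first builds decimal and hexadecimal NCRs from a character's code point. The second looks up the character's named entity in html.entities.codepoint2name, which covers only the HTML 4.01 entity set; HTML5 defines many more names, available in html.entities.html5. Finally, html.unescape confirms that every form maps back to ©.

```python
import html
from html.entities import codepoint2name
from typing import Optional


def to_ncr(ch: str, hexadecimal: bool = False) -> str:
    """Return the numeric character reference for a single character."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


def to_named_entity(ch: str) -> Optional[str]:
    """Return the HTML 4.01 named entity for a character, or None."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


# Prints "&#169; -> ©", "&#xA9; -> ©" and "&copy; -> ©" on separate lines.
for ref in (to_ncr("©"), to_ncr("©", hexadecimal=True), to_named_entity("©")):
    print(ref, "->", html.unescape(ref))
```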