Internationalization - Web Standards: Mastering HTML5, CSS3, and XML

In-Document Declarations

Character encoding can be set by the @charset at-rule with the syntax shown in Listing 2-6.

Listing 2-6. Syntax of the @charset At-Rule

@charset "<charset-name>";

Only one @charset rule can be used per CSS file, and it must be declared at the very beginning of the file. No characters may precede the declaration, apart from a byte order mark (BOM) if the CSS file is Unicode encoded (see footnote 7).

The charset-name can be any of the character set names registered by IANA [15]. Some encodings have multiple names in the IANA registry; the one marked as preferred should be used. Listing 2-7 shows a typical character encoding declaration for an external CSS file.

Listing 2-7. Setting the Character Encoding of CSS with an At-Rule

@charset "UTF-8";

The @charset rule can be used only in external style sheets. In-document style sheet declarations cannot use @charset rules because they share the character encoding of the document that contains them.

The HTML 4.01 specification defined a charset attribute for the link element to identify the character encoding of the target document. In HTML5, however, this attribute is obsolete and should not be used.
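As a rough illustration of the points above, the following HTML5 sketch declares the document encoding once, links an external style sheet without a charset attribute, and uses an in-document style sheet without an @charset rule. The file name styles.css and the style rule are hypothetical and serve only as an example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Document encoding; the in-document style sheet below shares it -->
  <meta charset="UTF-8">
  <title>Encoding example</title>
  <!-- HTML5: no charset attribute on link. The hypothetical styles.css
       declares its own encoding on its first line: @charset "UTF-8"; -->
  <link rel="stylesheet" href="styles.css">
  <style>
    /* No @charset rule is allowed here */
    blockquote::before { content: "«"; }
  </style>
</head>
<body>
  <blockquote>Quoted text</blockquote>
</body>
</html>
```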

Escape Codes, Special Characters, and Symbols

In HTML and XHTML documents, each character can be typed in directly or represented by a character sequence (also known as a character reference). Two types of character sequences exist: numeric character references and character entity references.

Assume a document fragment contains the letter a with an acute accent (á). This character can be declared by either the &#225; or &#xE1; numeric character references or by the &aacute; entity reference in (X)HTML documents (see the following sections for details). However, the best practice is to type in the á character directly in the markup. The same is true for the copyright sign (© instead of &copy;), the registered trademark sign (® instead of &reg;), and so on.
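The alternatives can be compared side by side in a short sketch. The word used is only an example, and the first paragraph assumes that the file is saved in the encoding the document declares.

```html
<!-- All four paragraphs display the same word: más -->
<p>más</p>          <!-- typed in directly (preferred) -->
<p>m&#225;s</p>     <!-- decimal numeric character reference -->
<p>m&#xE1;s</p>     <!-- hexadecimal numeric character reference -->
<p>m&aacute;s</p>   <!-- character entity reference -->
```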

Characters should always be preferred to escape codes unless they are special characters with syntactic meaning in (X)HTML or XML, or characters that are invisible or ambiguous. In such cases, using entities is mandatory [16]. In other words, markup characters used in textual content or attribute values must be escaped. For example, when we demonstrate (X)HTML source code blocks on a web page and want to avoid processing, the < and > characters should be provided by their entity names (&lt; and &gt;) in the source code rather than typing them in directly. Analogously, if an & character is needed as text within an RSS feed or an RDF file, the &amp; entity should be used instead (see the “Entity References” section for more information).
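A minimal sketch of such escaping in HTML follows. The element content, the attribute values, and the URL are made up for illustration.

```html
<!-- Displays the literal source text <em>important</em> instead of emphasized text -->
<pre><code>&lt;em&gt;important&lt;/em&gt;</code></pre>

<!-- An ampersand in an attribute value and in text content -->
<p title="Research &amp; Development">R&amp;D department</p>

<!-- An ampersand separating query parameters in a (hypothetical) URL -->
<a href="search?q=css&amp;lang=en">Search results</a>
```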

Numeric References

Numeric character references identify characters by Universal Character Set or Unicode codepoints in the form &#nnnn; where nnnn is the codepoint in decimal form. Both HTML and XHTML support hexadecimal references as well. In HTML, they can be applied in either the &#Xhhhh; or the &#xhhhh; form. Since XML is case sensitive, in XHTML the x must be in lowercase (&#xhhhh;) [17]. The nnnn or hhhh can be any number of digits and may include leading zeros.
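The variants described above can be sketched with a single character. The copyright sign (U+00A9, decimal 169) is chosen here only as an example.

```html
<!-- Each reference below denotes the copyright sign, U+00A9 (decimal 169) -->
<p>&#169; decimal</p>
<p>&#0169; decimal with a leading zero</p>
<p>&#xA9; hexadecimal with a lowercase x (HTML and XHTML)</p>
<p>&#x00A9; hexadecimal with leading zeros</p>
<p>&#XA9; hexadecimal with an uppercase X (HTML only, not well-formed in XHTML)</p>
```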

7. External CSS files are usually encoded in US-ASCII.
