HTML and CSS Reference
Wherever possible, type characters directly instead of using their corresponding numeric references. Usually there is no reason to insert an apostrophe in the markup as a numeric reference such as &#39; rather than typing the ' character itself. If a character, such as a Japanese ideograph, cannot be typed with the keyboard, it can be inserted using advanced software tools or copied and pasted via the clipboard from other applications, code charts, or web sites. Note that even advanced text editors may display many of these directly inserted characters incorrectly during development; however, browsers will display them correctly if the character encoding of the containing file has been set properly and the file is served correctly.
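As an illustration of why the declared encoding matters, the following minimal Python sketch (Python and the sample character are assumptions, not part of the original text) encodes a directly typed character as UTF-8 and decodes the bytes once with the correct encoding and once as ISO-8859-1, mimicking a file served with the wrong charset. It also shows that the numeric reference &#39; stands for exactly the same character as a typed apostrophe.

```python
import html

# A directly typed character: o with double acute accent (U+0151).
text = "ő"

# Saved as UTF-8, the character occupies two bytes.
data = text.encode("utf-8")

# Decoding with the correct (declared) encoding restores the character.
correct = data.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 yields two unrelated characters,
# which is what a reader sees when the declared encoding is wrong.
wrong = data.decode("latin-1")

# A numeric reference and the typed apostrophe are the same character,
# so the reference only adds noise to the markup.
apostrophe = html.unescape("&#39;")

print(data, correct, repr(wrong), apostrophe)
```

The garbled Latin-1 result is the typical symptom of bytes and declared encoding that disagree; the directly typed character itself was never wrong.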
Entity References
Character entity references refer to characters by the name of the appropriate entity that has the desired character as its replacement text, in the form &name;.
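The &name; form can be illustrated with Python's standard html module. In this sketch, the regular expression is a simplified, hypothetical pattern for entity names, and html.entities.name2codepoint (which maps HTML 4 entity names to Unicode code points) stands in for the entity table a browser uses.

```python
import html
import re
from html.entities import name2codepoint

# Simplified pattern for the &name; form: a letter followed by letters or digits.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entity_names(markup: str) -> list[str]:
    """Return the entity names referenced in a piece of markup, in order."""
    return ENTITY_REF.findall(markup)


sample = "caf&eacute; &amp; cr&egrave;me"
names = entity_names(sample)

# Look up the code point each entity name stands for (HTML 4 entity set).
codepoints = {name: name2codepoint[name] for name in names}

# html.unescape replaces every reference with its replacement text.
decoded = html.unescape(sample)

print(names, codepoints, decoded)
```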
HTML supports 252 character entities [18]. In XHTML, there are 253 entities (including the five predefined entities of XML 1.0) [19]; however, their application is affected by the way XHTML documents are processed. Keep in mind that XHTML documents, if served correctly, are processed by XML parsers rather than by the SGML parsers that interpret HTML documents. Characters that have a special meaning in XML, such as the less-than sign (<), cause parsing errors if they are written directly rather than as entity references. There are only four character entities whose processing is guaranteed both by XML parsers and by HTML user agents: &amp;, &gt;, &lt;, and &quot; (&, >, <, and ", respectively). Fortunately, this short list contains the character entities most important for syntactic notation (ampersand, greater than, less than). W3C recommends escaping ampersand characters as &amp; in href attribute values of XHTML documents [20].
Particular attention should be paid to URIs that include parameters. Single ampersand characters in these URIs
should be replaced by the &amp; entity [21].
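A minimal sketch of the XML behavior described above, using Python's xml.etree.ElementTree as a stand-in for the XML parser that processes a correctly served XHTML document (the parses helper and the sample URI are hypothetical). The predefined entities parse; an HTML-only entity and a bare ampersand in a URI do not; and escaping the ampersand with html.escape yields well-formed markup whose attribute value the parser decodes back to the original URI.

```python
import html
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Hypothetical helper: True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XML's predefined entities are always understood.
predefined_ok = parses("<p>&lt;b&gt; &amp; &quot;</p>")

# An HTML-only entity such as &eacute; is undefined without a DTD.
html_entity_ok = parses("<p>caf&eacute;</p>")

# A bare ampersand in a URI query string is not well-formed...
raw_amp_ok = parses('<a href="search?q=html&lang=en">x</a>')

# ...but escaping it as &amp; is, and the parser restores the '&'.
url = "search?q=html&lang=en"
escaped = '<a href="' + html.escape(url) + '">x</a>'
href = ET.fromstring(escaped).get("href")

print(predefined_ok, html_entity_ok, raw_amp_ok, href)
```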
Although the &apos; entity (apostrophe, U+0027 ) is among the five predefined entities of XML, it should not be
used in XHTML [22].
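Following this advice in generated markup means emitting a numeric reference instead of &apos;. Python's html.escape already does so, using the hexadecimal form &#x27;; the sketch below (the helper name is hypothetical) switches to the decimal &#39;, the form commonly suggested for HTML-compatible XHTML, and confirms that the result decodes back to the apostrophe.

```python
import html

# html.escape with quote=True escapes the apostrophe as the numeric
# reference &#x27; rather than &apos;.
escaped = html.escape("It's <new>", quote=True)


def escape_apostrophe_decimal(text: str) -> str:
    """Hypothetical helper: escape markup characters, writing U+0027 as &#39;."""
    # Any '&' in the input is escaped first, so only apostrophes become "&#x27;".
    return html.escape(text, quote=True).replace("&#x27;", "&#39;")


decimal = escape_apostrophe_decimal("It's <new>")

# Both numeric references decode to the same apostrophe.
round_trip = html.unescape(decimal)

print(escaped, decimal, round_trip)
```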
Character references should be eliminated wherever possible, since virtually all characters can be represented directly in Unicode, including, but not limited to, all letters and ideograms of natural languages, accented letters, special characters, mathematical signs, and symbols [23]. Direct character use is easier to interpret, maintain, and modify than numeric or entity references (see Listing 2-8). Texts filled with character references are more difficult to extend and almost impossible to search. Many characters have no named entity reference, which often resulted in incorrect characters on web pages in the 1990s. For example, the small o with tilde, õ, was often displayed instead of the o with double acute accent (also known as the Hungarumlaut), ő, which is a different character.
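The õ/ő confusion can be checked with Python's unicodedata and html.entities modules, as an illustrative sketch: the two are separate code points with separate Unicode names, and the HTML 4 entity table mirrored by codepoint2name contains otilde but no named entity for ő.

```python
import unicodedata
from html.entities import codepoint2name

o_tilde = "\u00f5"         # õ
o_double_acute = "\u0151"  # ő

# Distinct characters with distinct Unicode names.
names = (unicodedata.name(o_tilde), unicodedata.name(o_double_acute))

# The HTML 4 entity set covers õ (&otilde;) but has no name for ő.
has_tilde_entity = ord(o_tilde) in codepoint2name
has_double_acute_entity = ord(o_double_acute) in codepoint2name

print(names, has_tilde_entity, has_double_acute_entity)
```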
Listing 2-8. Three Versions of the Same Central-European Text with Characters, Numeric, and Entity References
A HTML5 a HTML teljes megújulása, új funkciókkal felvértezve.
A HTML5 a HTML teljes meg&#250;jul&#225;sa, &#250;j funkci&#243;kkal felv&#233;rtezve.
A HTML5 a HTML teljes meg&uacute;jul&aacute;sa, &uacute;j funkci&oacute;kkal felv&eacute;rtezve.
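The equivalence of the three versions in Listing 2-8 can be confirmed with a short Python sketch (an illustration, not part of the original listing): html.unescape converts both the numeric and the named references to the directly typed characters, while a plain substring search finds the word megújulása only in the direct version.

```python
import html

direct = "A HTML5 a HTML teljes megújulása, új funkciókkal felvértezve."
numeric = ("A HTML5 a HTML teljes meg&#250;jul&#225;sa, &#250;j "
           "funkci&#243;kkal felv&#233;rtezve.")
named = ("A HTML5 a HTML teljes meg&uacute;jul&aacute;sa, &uacute;j "
         "funkci&oacute;kkal felv&eacute;rtezve.")

# html.unescape replaces numeric and named references alike with characters.
converted = [html.unescape(version) for version in (numeric, named)]
all_equal = all(text == direct for text in converted)

# Only the version with direct characters is searchable as plain text.
found = ["megújulása" in version for version in (direct, numeric, named)]

print(all_equal, found)
```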