HTML and CSS Reference
In-Depth Information
not visible but reserve some space when rendered. The list of whitespace characters varies from context to context.
For example, the form feed control character is considered whitespace in HTML but not in XML. Each markup
language defines the few whitespace characters that can be used as part of its markup syntax. The XML
specification defines whitespace as a sequence of one or more of the following characters: space (U+0020),
carriage return (U+000D), line feed (U+000A), or tab (U+0009). HTML 4.01 also supports the form feed character
(U+000C), which cannot be used in XHTML.
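The difference between the two whitespace definitions can be sketched in a few lines of Python. The set names and the helper is_whitespace_only are hypothetical, introduced only for illustration; the code points are the ones listed above.

```python
# Whitespace code points as listed in the text (names are illustrative).
XML_WHITESPACE = frozenset({"\u0020", "\u000D", "\u000A", "\u0009"})
HTML401_WHITESPACE = XML_WHITESPACE | {"\u000C"}  # HTML 4.01 adds form feed


def is_whitespace_only(text, allowed):
    """Return True if text is non-empty and every character is in allowed."""
    return len(text) > 0 and all(ch in allowed for ch in text)


sample = " \t\f\n"  # space, tab, form feed, line feed
print(is_whitespace_only(sample, HTML401_WHITESPACE))  # True
print(is_whitespace_only(sample, XML_WHITESPACE))      # False: form feed is not XML whitespace
```

The same string therefore counts as pure whitespace under one definition and not under the other, which is why the context has to be known before whitespace is stripped or collapsed.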
Not all whitespace characters can be typed from the keyboard, although the most common ones, such as a
blank space (the basic word divider in Western languages) or a single tab, can be typed using the spacebar and
the Tab key, respectively. Advanced text editors usually provide options for inserting whitespace characters (see the
later section “Development Tools”).
A very bad practice from the 1990s is to create whitespace for typography or layout by embedding blank
images, such as 1×1 pixel spacer.gif files, instead of using whitespace characters, margins, or padding. The biggest
disadvantage of this technique is the lack of structure or semantic meaning in the markup. Such images also have
a negative effect on searchability and accessibility (text browsers would display, and screen readers would read
aloud, “spacer.gif” repeatedly). Another huge problem with spacer images is that even the slightest changes in the
markup can completely destroy the site layout.
NFC Normalization Is Recommended
In Unicode, the same text can be represented by different character sequences. The accented a (in other words, á),
for example, can be represented either as the precomposed U+00E1 (Latin small letter a with acute) or as the decomposed
sequence of U+0061 (Latin small letter a) and U+0301 (Combining acute accent).
The Unicode standard supports four normalization forms: NFC, NFD, NFKC, and NFKD, where C stands for
composed (precomposed), D for decomposed, and K represents compatibility.
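As a minimal sketch, the two representations of á and the effect of the normalization forms can be checked with Python's standard unicodedata module. The variable names are illustrative, and the ligature U+FB01 is an extra example, not taken from the text, chosen to show what the compatibility (K) forms add.

```python
import unicodedata

precomposed = "\u00E1"       # á as a single code point
decomposed = "\u0061\u0301"  # a followed by a combining acute accent

# The two strings render identically but are different code point sequences.
print(precomposed == decomposed)  # False

nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose
print(nfc == precomposed, len(nfc))  # True 1
print(nfd == decomposed, len(nfd))   # True 2

# Compatibility forms additionally replace compatibility characters,
# such as the "fi" ligature U+FB01, which NFC leaves untouched.
ligature = "\uFB01"
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
print(unicodedata.normalize("NFKC", ligature))             # fi
```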
The normalization form is especially important when accents or other diacritics are used in (X)HTML identifiers
or CSS selectors and class names. If such a word is used in precomposed form in the HTML (for example,
<div id="hangsúlyos"> ), but in decomposed form in the CSS (for example, #hangsúlyos { color: red; } ), then
the selector won't match the id. This problem can be avoided by eliminating accented characters from
attribute values and CSS selectors altogether and using unaccented ASCII characters only, which is the best practice.
The W3C recommends NFC normalization, which advanced text editors support by default, to improve
interoperability on the Web [13].
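The mismatch described above can be reproduced outside the browser. The following is a sketch only: the helper to_nfc and the variable names are hypothetical, and a plain string comparison stands in for selector matching, on the assumption that identifiers are compared code point by code point.

```python
import unicodedata


def to_nfc(identifier):
    """Return the identifier normalized to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", identifier)


html_id = "hangs\u00FAlyos"       # ú precomposed, as typed in the HTML
css_id = "hangsu\u0301lyos"       # u + combining acute, as typed in the CSS

print(html_id == css_id)                  # False: the selector would not match
print(to_nfc(html_id) == to_nfc(css_id))  # True: both sides normalized to NFC
print(to_nfc(css_id) == css_id)           # False: the CSS source was not in NFC
```

Running such a check over identifiers in both the markup and the style sheets before deployment is one way to catch the problem early; saving every file in NFC makes the check pass trivially.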
Unicode Should Be Preferred
Web pages should use one character encoding at a time. Different parts of the same document should not be encoded
with different encoding schemes.
UTF-8 character encoding can simplify multilingual sites. Unicode allows more languages to be used on a
single page than any other encoding system, which makes it ideal for content, forms, scripts, and databases. Due
to its powerful features, Unicode should be used wherever possible [14]. Thanks to the increasing popularity
of HTML5 templates and best practices, web designers tend to use UTF-8 for all their projects. The widespread
adoption of UTF-8 reduces the risk of incorrect automatic encoding detection in browsers rendering documents with
special characters.
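What an incorrect encoding guess does to special characters can be sketched as follows. ISO-8859-1 is used here only as an assumed example of a wrongly detected encoding; the variable names are illustrative.

```python
text = "hangs\u00FAlyos"  # contains ú (U+00FA)

# ú needs two bytes in UTF-8, so the byte string is one longer than the text.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 10 11

# Decoding with the same encoding restores the text exactly.
print(utf8_bytes.decode("utf-8") == text)  # True

# Decoding the same bytes as ISO-8859-1 (an assumed wrong guess) garbles ú.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # hangsÃºlyos
```

When every document on a site is saved and declared as UTF-8, there is no second encoding for a browser to guess wrongly.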
Using Unicode does not guarantee that text will be displayed correctly in browsers. Several scripts,
such as Arabic, require additional techniques to ensure that glyphs appear in the appropriate sequence.
 