HTML and CSS Reference
In-Depth Information
Table 2-4. The Most Important Formatting Characters That Can Also Be Used for Markup [9]
Codepoint(s)        Name or Function                          Comment
U+00A0              Nonbreakable space                        Line break control.
U+00AD              Soft hyphen                               Line break control.
U+200B              Zero-width space                          Line break control.
U+200C .. U+200D    Zero-width join controls (ZWJ and ZWNJ)   Required for Persian and many Indic scripts.
U+200E .. U+200F    Implicit directional marks (LRM and RLM)  LRM and RLM are allowed.
U+2011              Nonbreaking hyphen                        Line break control.
U+2044              Fraction slash                            Alternatively, MathML markup can be used.
U+2060              Word joiner                               Should be used as the word joiner instead of U+FEFF (ZWNBSP).
U+2061 .. U+2064    Invisible mathematical operators          Mathematical use.
U+2FF0 .. U+2FFB    Ideographic character description         Graphic characters (not controls).
U+303E              Ideographic variation indicator           Graphic character (not a control).
U+FE00 .. U+FE0F    Variation selectors                       Modify graphic characters.
U+E0100 .. U+E01EF  Variation selectors                       Modify graphic characters.
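Because these characters are invisible or easy to confuse in an editor, it can help to write them in markup as numeric character references. The following is a minimal Python sketch of that idea, not part of the original table; the helper name to_char_ref and the selection of characters are assumptions made for illustration.

```python
import unicodedata

# A few single codepoints from Table 2-4 (illustrative selection only).
FORMATTING_CHARS = ["\u00A0", "\u00AD", "\u200B", "\u2011", "\u2044", "\u2060"]


def to_char_ref(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character.

    Hypothetical helper: formats the codepoint as &#xHHHH; so that an
    invisible character remains visible in the markup source.
    """
    return f"&#x{ord(ch):04X};"


if __name__ == "__main__":
    for ch in FORMATTING_CHARS:
        # unicodedata.name() returns the official Unicode character name.
        print(f"U+{ord(ch):04X}  {to_char_ref(ch)}  {unicodedata.name(ch)}")
```

Writing &#x00A0; rather than a literal nonbreakable space keeps the character recognizable when the source is reviewed or edited later.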
Special Characters
Certain Unicode characters deserve extended attention because they should be used with caution.
The Byte-Order Mark (BOM)
Unicode files can begin with a special byte sequence known as the byte-order mark (BOM). It encodes the codepoint
U+FEFF (zero-width non-breaking space, ZWNBSP). As mentioned earlier, the byte order of UTF-16 and UTF-32
encoded files should be declared, and the BOM provides this information.
In UTF-16, each character occupies 2 or 4 bytes, and the 2 bytes of each 16-bit code unit can be stored in two orders,
little-endian or big-endian, which determine the order in which the bytes should be read. To indicate which of the two
is used, documents encoded in UTF-16 should always start with the BOM. In UTF-8, the BOM is optional since there is
only one possible byte order, but if it is still provided, it is called the UTF-8 signature. According to the I18N
Activity Group at W3C, the byte-order mark should be omitted in UTF-8 [10], mainly because it could cause display
problems in some browsers. Typically it produces an extra line or unwanted characters at the top of the page [11]. An
advanced text editor or Richard Ishida's UTF-8 BOM tester [12] can be used to check for the presence of a UTF-8
signature.
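The check for a BOM can also be scripted. The following is a minimal Python sketch, assuming the file content is available as raw bytes; the function names detect_bom and strip_utf8_signature are hypothetical, and only the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longer signatures are tested first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
# Caveat: a UTF-16 LE file whose first character after the BOM is U+0000
# would be reported as UTF-32 LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def strip_utf8_signature(data: bytes) -> bytes:
    """Remove a leading UTF-8 signature (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

For example, detect_bom(b"\xef\xbb\xbf<!DOCTYPE html>") returns ("utf-8", 3), and strip_utf8_signature applied to the same bytes returns the markup without the three signature bytes.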
Whitespace Characters
Some Unicode characters are invisible whitespace characters that differ in their line-breaking properties,
ligating properties, and widths. These characters are used to separate different parts of the
document with line breaks, tabs, and spaces. They represent horizontal or vertical spaces on web pages and
contribute to the appearance and layout of content blocks or the entire page. Whitespace characters are typically
 
 