HTML and CSS Reference
The data types allowed in element content and attribute values are defined by the DTDs and specifications of
the markup language being used. While many elements (such as p, div, and section) and attributes allow most
Unicode characters, others have specific restrictions. For example, the href attribute of an a element must
contain a valid URL or file path (Listing 3-25). The width attribute of an img element should be a number,
with or without a unit of a permitted type (Listing 3-26). Always make sure that you use valid attribute
values only.
Listing 3-25. Correct and Incorrect URLs in the href Attribute Value
<a href="http://www.example.com/about/"> <!-- correct -->
<a href="contact.htm"> <!-- correct -->
<a href="a ; b.html"> <!-- incorrect (contains an illegal character) -->
Listing 3-26. Correct and Incorrect Width Attribute Values
<img src="/images/logo.png" width="128" height="128" alt="logo" /> <!-- correct -->
<img src="/images/logo.png" width="100px" height="100px" alt="logo" /> <!-- correct -->
<img src="/images/logo.png" width="78pt" height="64pt" alt="logo" /> <!-- incorrect (not allowed unit) -->
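The character check behind Listing 3-25 can be sketched in a few lines of Python. This is a minimal illustration, not a full URL validator: the function name has_only_legal_uri_chars is hypothetical, and the character set is assumed to be the one RFC 3986 permits in a URI reference (unreserved and reserved characters plus percent-encoded escapes). It does not check whether the URL is well structured or resolvable.

```python
import re

# Characters that may appear literally in a URI reference (assumed per RFC 3986):
# unreserved, reserved (gen-delims and sub-delims), and "%" for escapes.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")

# A "%" that is not followed by two hexadecimal digits is a broken escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_only_legal_uri_chars(value: str) -> bool:
    """Return True if value contains no character that is illegal in a URI."""
    if _URI_CHARS.fullmatch(value) is None:
        return False
    return _BAD_ESCAPE.search(value) is None


if __name__ == "__main__":
    # The three href values from Listing 3-25.
    for href in ("http://www.example.com/about/", "contact.htm", "a ; b.html"):
        print(repr(href), has_only_legal_uri_chars(href))
```

The third value is rejected because a literal space is not allowed in a URI; writing it as %20 would pass this check.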
Markup elements and attributes can contain a variety of data types, such as case information, SGML basic data
types, text strings, URIs, colors, lengths, content types, language codes, character encodings, single characters, dates
and times, link types, media descriptors, script data, and style sheet data.
The syntax of the core markup element content values and attribute values is derived from SGML tokens such
as the following:
• PCDATA: Parsed character data. Mixed content; in other words, an element can contain any
amount of character data and/or any number of child elements in arbitrary order.
• CDATA: Character data. A sequence of characters from the document character set, which may
include character entities. CDATA attribute values should not contain leading or trailing
whitespace characters. User agents replace character entities with characters, replace carriage
returns and tabs with a single space, and ignore line feeds when interpreting CDATA attribute
values. For script and style elements, CDATA content is treated as raw text and passed
forward as is. The end tag open delimiter </ is considered the terminator of the element content.
• NAME, ID: Identifier tokens that must begin with a letter (A-Z, a-z) and may be followed by
any number of letters, digits (0-9), hyphens (-), underscores (_), colons (:), and periods (.).
• NUMBER: Tokens containing a minimum of one digit (0-9).
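The NAME/ID and NUMBER rules in the list above map directly onto regular expressions. The following Python sketch shows one way to express them; the function names are hypothetical, and NUMBER is assumed to mean a token consisting only of digits, with at least one present.

```python
import re

# NAME / ID: a letter first, then any number of letters, digits,
# hyphens, underscores, colons, and periods.
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")

# NUMBER: one or more digits (assumption: digits only, no sign or unit).
NUMBER_TOKEN = re.compile(r"[0-9]+")


def is_name_token(value: str) -> bool:
    """Return True if value is a valid NAME or ID token."""
    return NAME_TOKEN.fullmatch(value) is not None


def is_number_token(value: str) -> bool:
    """Return True if value is a valid NUMBER token."""
    return NUMBER_TOKEN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("main-nav", "nav:item.1", "1st", "_private"):
        print(candidate, is_name_token(candidate))
    for candidate in ("128", "100px", ""):
        print(repr(candidate), is_number_token(candidate))
```

Under these rules, an identifier such as 1st or _private is rejected because it does not begin with a letter.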
The SGML tokens were introduced in the ISO 8879 standard, and they determine the allowed values of the
data types used in markup attributes, such as URLs, text, and numbers. The characters supported in the
markup depend on the data type, because some characters are reserved or considered unsafe for a particular data type.
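The way user agents interpret CDATA attribute values, as described in the list above, can be sketched as a small normalization function. This is an illustration under stated assumptions rather than the behavior of any particular browser: the function name is hypothetical, Python's html.unescape stands in for entity replacement, and literal whitespace is processed before entities are decoded (the text does not prescribe an order).

```python
import html


def normalize_cdata_attribute(value: str) -> str:
    """Normalize a CDATA attribute value roughly as a user agent would."""
    # Ignore line feeds.
    text = value.replace("\n", "")
    # Replace each carriage return and tab with a single space.
    text = text.replace("\r", " ").replace("\t", " ")
    # Replace character entities (such as &amp;) with their characters.
    # Done last (an assumption) so that decoded characters are left untouched.
    return html.unescape(text)


if __name__ == "__main__":
    print(normalize_cdata_attribute("Tom &amp; Jerry"))
    print(normalize_cdata_attribute("first\tsecond"))
    print(normalize_cdata_attribute("line\nbreak"))
```

Leading and trailing whitespace is deliberately left in place here; authors are expected to avoid it in the first place.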
The PCDATA and CDATA data types are used mainly in XML applications and serializations, such as XHTML,
RSS, and Atom (Chapter 7). SGML and XML Document Type Definition files also use PCDATA and CDATA for