Markup Languages: More Than HTML5 - Web Standards: Mastering HTML5, CSS3, and XML


Data Types

The data types that can be used in element content and attribute values are defined by the DTD and the specification of the markup language being used. While many elements and attributes allow most Unicode characters (such as the p, div, and section elements), others have specific restrictions. For example, the href attribute of an a element must contain a valid URL or file path (Listing 3-25). The width attribute of an img element should be a number, with or without a unit of an allowed type (Listing 3-26). Always make sure that you use only valid attribute values.

Listing 3-25. Correct and Incorrect URLs in the href Attribute Value

Listing 3-26. Correct and Incorrect Width Attribute Values

<img src="/images/logo.png" width="78pt" height="64pt" alt="logo" /> <!-- incorrect (not allowed unit) -->
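
The restriction on width values can be checked mechanically. The following is a minimal sketch, assuming the HTML 4 style length rule (an integer number of pixels, or a percentage); the function name is_valid_length is hypothetical, and other markup languages may allow other units.

```python
import re

# Assumption: a length is either an integer pixel count ("78")
# or a percentage ("50%"). Units such as "pt" are rejected.
_LENGTH_RE = re.compile(r"[0-9]+%?")


def is_valid_length(value: str) -> bool:
    """Return True if value is a pixel count or a percentage."""
    return _LENGTH_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("78", "50%", "78pt", ""):
        print(repr(candidate), is_valid_length(candidate))
```

A check like this only covers the lexical form of the value; whether a given attribute accepts percentages at all depends on the markup language version.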

Markup elements and attributes can contain a variety of data types, such as case information, SGML basic data

types, text strings, URIs, colors, lengths, content types, language codes, character encodings, single characters, dates

and times, link types, media descriptors, script data, and style sheet data [33].

The syntax of the core markup element content values and attribute values is derived from SGML tokens such as the following:

• PCDATA : Parsed Character Data. Used for mixed content; in other words, an element can contain any amount of character data and/or any number of child elements in arbitrary order.

• CDATA : Character data. A sequence of characters from the document character set, which may include character entities. CDATA attribute values should not contain leading or trailing whitespace characters. When interpreting CDATA attribute values, user agents replace character entities with characters, replace each carriage return or tab with a single space, and ignore line feeds. For script and style elements, CDATA content is treated as raw text and passed forward as is. The first occurrence of the end tag open delimiter </ is treated as the terminator of the element content.

• NAME, ID : Identifier tokens that must begin with a letter (A-Z, a-z) and may be followed by any number of letters, digits (0-9), hyphens (-), underscores (_), colons (:), and periods (.).

• NUMBER : Tokens containing a minimum of one digit (0-9).
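
The token rules above can be sketched as simple checks. This is an illustration only, not a conforming SGML parser: the function names are hypothetical, NUMBER is assumed to consist solely of digits, and the order of the CDATA normalization steps (whitespace first, entities last) is an assumption.

```python
import html
import re

# NAME / ID: a letter, then any number of letters, digits,
# hyphens, underscores, colons, and periods.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")
# NUMBER: at least one digit (assumed to be digits only).
_NUMBER_RE = re.compile(r"[0-9]+")


def is_name_token(value: str) -> bool:
    """Return True if value matches the NAME/ID rule."""
    return _NAME_RE.fullmatch(value) is not None


def is_number_token(value: str) -> bool:
    """Return True if value matches the NUMBER rule."""
    return _NUMBER_RE.fullmatch(value) is not None


def normalize_cdata_attribute(value: str) -> str:
    """Approximate how a user agent interprets a CDATA attribute value."""
    value = value.replace("\n", "")        # ignore line feeds
    value = re.sub(r"[\r\t]", " ", value)  # each CR or tab becomes one space
    value = value.strip()                  # assumption: drop leading/trailing whitespace
    return html.unescape(value)            # replace character entities with characters


if __name__ == "__main__":
    print(is_name_token("section-1.2"), is_name_token("1st"))
    print(is_number_token("42"), is_number_token("4a"))
    print(normalize_cdata_attribute(" Tom &amp;\tJerry\r\n"))
```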

The SGML tokens were introduced in the ISO 8879 standard [34], and they determine the allowed values of the data types used in markup attributes, such as URLs, text, and numbers. The characters supported in the markup depend on the data type, because some characters are reserved or considered unsafe for a particular data type.

The PCDATA and CDATA data types are used mainly in XML applications and serialization, including XHTML,

RSS, Atom, and so on (Chapter 7). SGML and XML Document Type Definition files also use PCDATA and CDATA for

markup declarations.
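
The difference between parsed character data and an XML CDATA section can be observed with a parser. The following sketch uses Python's standard xml.etree.ElementTree module; the two sample documents are hypothetical and are not taken from the listings of this chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents for illustration.
# PCDATA: entities are resolved and child elements are parsed.
pcdata_doc = "<note>1 &lt; 2 and <em>bold</em> text</note>"
# CDATA section: the content is passed through as plain characters.
cdata_doc = "<note><![CDATA[1 < 2 and <em>bold</em> text]]></note>"

pcdata_root = ET.fromstring(pcdata_doc)
cdata_root = ET.fromstring(cdata_doc)

# Tag names of the direct child elements found by the parser.
pcdata_children = [child.tag for child in pcdata_root]
cdata_children = [child.tag for child in cdata_root]

if __name__ == "__main__":
    print(repr(pcdata_root.text), pcdata_children)
    print(repr(cdata_root.text), cdata_children)
```

In the first document the parser reports an em child element, while in the second the same characters remain part of the text of the note element.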
