Note
As with Listings 10-1 and 10-2, Listing 10-3 also contains whitespace (invisible characters such as spaces, tabs, carriage returns, and line feeds). The XML specification permits whitespace to be added to a document. Whitespace appearing within content (such as spaces between words) is considered part of the content. In contrast, the parser typically ignores whitespace appearing between an end tag and the next start tag. Such whitespace is not considered part of the content.
An XML element's start tag can contain one or more attributes. For example, the <article> tag in the listing has title and lang attributes. Attributes provide additional information about elements. For example, qty identifies the amount of an ingredient that can be added, title identifies an article's title, and lang identifies the language in which the article is written (en for English). Attributes can be optional. For example, if qty is not specified, a default value of 1 is assumed.
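The attribute behavior described above can be sketched with the JDK's DOM parser. This is only an illustration under stated assumptions: the class name AttributeDemo, the helper names, and the sample element content ("Sample Title", "bread slice", "cheese slice") are hypothetical, and the default of 1 for qty is applied here by the application code, since the text does not say where that default comes from.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeDemo {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    // Returns the qty attribute, falling back to "1" when it is absent.
    // Assumption: the default is supplied here, by the application.
    static String qtyOf(Element ingredient) {
        return ingredient.hasAttribute("qty") ? ingredient.getAttribute("qty") : "1";
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the book's listings.
        Element article = parseRoot("<article title=\"Sample Title\" lang=\"en\"/>");
        System.out.println(article.getAttribute("title"));
        System.out.println(article.getAttribute("lang"));
        System.out.println(qtyOf(parseRoot("<ingredient qty=\"2\">bread slice</ingredient>")));
        System.out.println(qtyOf(parseRoot("<ingredient>cheese slice</ingredient>")));
    }
}
```

Checking hasAttribute before reading matters because getAttribute returns an empty string, not null, when the attribute is missing.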
Note
Element and attribute names may contain any alphanumeric character from English or another language, and may also include the underscore (_), hyphen (-), period (.), and colon (:) punctuation characters. The colon should only be used with namespaces (discussed later in this chapter), and names cannot contain whitespace.
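As a rough illustration of these naming rules, the following sketch checks a candidate name against a regular expression. It is a simplification, not the full Name production from the XML specification. The class and method names are hypothetical, and the pattern adds one rule the note does not mention: the XML specification does not allow a name to start with a digit, hyphen, or period.

```java
import java.util.regex.Pattern;

class NameCheck {
    // First character: a letter, underscore, or colon.
    // Remaining characters: letters, digits, underscore, period, colon, hyphen.
    // Simplified approximation of the XML naming rules (assumption).
    private static final Pattern NAME =
            Pattern.compile("[\\p{L}_:][\\p{L}\\p{N}_.:\\-]*");

    // Returns true when the whole string looks like an XML name.
    static boolean isPlausibleName(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleName("lang"));
        System.out.println(isPlausibleName("recipe-title_1.0"));
        System.out.println(isPlausibleName("my name")); // contains whitespace
        System.out.println(isPlausibleName("2nd"));     // starts with a digit
    }
}
```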
Character References and CDATA Sections
Certain characters cannot appear literally in the content that appears between a start tag and an end tag, or within an attribute value. For example, you cannot place a literal < character between a start tag and an end tag because doing so would confuse an XML parser into thinking that it had encountered another tag.
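The problem can be observed with the JDK's DOM parser. The sketch below is a minimal illustration: the class name, the method name isWellFormed, and the sample <expr> elements are hypothetical. It reports whether a document parses without a fatal error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LiteralLessThanDemo {
    // Returns true when the parser accepts the document,
    // false when it rejects it with a SAXException.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The literal < in the content is read as the start of a tag.
        System.out.println(isWellFormed("<expr>a < b</expr>"));
        // The same element without the literal < parses normally.
        System.out.println(isWellFormed("<expr>a plus b</expr>"));
    }
}
```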
One solution to this problem is to replace the literal character with a character reference, which is a code that represents the character. Character references are classified as numeric character references or character entity references:
• A numeric character reference refers to a character via its Unicode code point, and adheres to the format &#nnnn; (not restricted to four positions) or &#xhhhh; (not restricted to four positions), where nnnn provides a decimal representation of the code point and hhhh provides a hexadecimal representation. For example, &#0931; and &#x03A3; represent the Greek capital letter sigma. Although XML mandates that the x in &#xhhhh; be lowercase, it is flexible in that the leading zero is optional in either format, and in allowing you to specify