Element Names
If you're going to create elements, you have to give them names, and XML is very generous
in the names you're allowed to use. For example, there aren't any reserved words to avoid in XML, as
there are in most programming languages, so you do have a lot of flexibility in this regard. However, there
are certain rules that you must follow. The names you choose for elements must begin with either a letter
or an underscore and can include digits, periods, and hyphens. Here are some examples of valid element
names:
net_price Gross-Weight _sample clause_3.2 pastParticiple
In theory you can use colons within a name, but because colons have a special purpose in the context of
names (they separate a namespace prefix from the rest of the name), you should not do so. XML documents
use the Unicode character set, so any of the national language alphabets defined within that set may be used
for names. HTML users need to remember that tag names in XML are case-sensitive, so <Address> is not the
same as <address>.
Note also that names starting with uppercase or lowercase x followed by m followed by l are reserved, so
you must not define names that begin xml or XmL or any of the other six possible sequences.
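The naming rules above can be sketched as a small Java check. This is only an illustration of the rules as stated in this section, not the full Name production from the XML specification; the class name ElementNameCheck and the method isAcceptable are made up for this example, and the check looks at one char at a time, so it assumes names without supplementary Unicode characters.

```java
// A simplified check of the naming rules described in the text.
// ElementNameCheck and isAcceptable are hypothetical names for illustration only.
final class ElementNameCheck {

    private ElementNameCheck() {}

    /** Returns true if the name follows the rules stated in the text. */
    static boolean isAcceptable(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // Names beginning with any upper/lowercase variant of "xml" are reserved.
        if (name.regionMatches(true, 0, "xml", 0, 3)) {
            return false;
        }
        // The first character must be a letter or an underscore.
        char first = name.charAt(0);
        if (!Character.isLetter(first) && first != '_') {
            return false;
        }
        // The rest may be letters, digits, underscores, periods, or hyphens.
        // Colons are rejected here because the text advises against them.
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = Character.isLetterOrDigit(c) || c == '_' || c == '.' || c == '-';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "net_price", "Gross-Weight", "_sample", "clause_3.2",
            "pastParticiple", "3rdItem", "XmLdata", "a:b"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAcceptable(s));
        }
    }
}
```

The five sample names from the text are accepted, while a name starting with a digit, a name starting with a variant of xml, and a name containing a colon are rejected.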
Defining General Entities
There is a frequent requirement to repeat a given block of parsed character data in the body of a document.
An obvious example is some kind of copyright notice that you may want to insert in various places. You can
define a named block of parsed text like this:
<!ENTITY copyright "© 2011 Ivor Horton">
This is an example of a declaration of a general entity. You can put declarations of general entities within
a DOCTYPE declaration in the document prolog or within an external DTD. I describe how a little later in this
chapter. The block of text that appears between the double quotes is identified by the name copyright. You
could equally well use single quotes as delimiters for the string. Wherever you want to insert this text in the
document, you just need to insert the name delimited by an ampersand at the beginning and a semicolon at
the end, thus:
&copyright;
This is called an entity reference. This is exactly the same notation as the predefined entities representing
markup characters that you saw earlier. It causes the equivalent text to be inserted at this point when the
document is parsed. A general entity is parsed text, so you need to take care that the document is still
well-formed and valid after the substitution has been made.
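To see the substitution happen, here is a minimal Java sketch that parses a small document with the standard DOM parser from javax.xml.parsers. The document structure (the report, header, and footer elements) and the class name CopyrightEntityDemo are assumptions made for this illustration; only the copyright entity comes from the text. The copyright sign is written as a Unicode escape in the Java source so that the example does not depend on the source file encoding.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a document that references &copyright; twice.
final class CopyrightEntityDemo {

    // The entity is declared once in the internal DTD subset and referenced in two places.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE report [\n"
        + "  <!ENTITY copyright \"\u00A9 2011 Ivor Horton\">\n"
        + "]>\n"
        + "<report>\n"
        + "  <header>&copyright;</header>\n"
        + "  <footer>&copyright;</footer>\n"
        + "</report>";

    private CopyrightEntityDemo() {}

    /** Parses the sample document and returns the text of the first element with the given tag. */
    static String textOf(String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("header: " + textOf("header"));
        System.out.println("footer: " + textOf("footer"));
    }
}
```

Both elements end up containing the replacement text of the entity, so changing the declaration in one place changes every occurrence in the parsed document.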
An entity declaration can include entity references. For example, I could declare the copyright entity
like this:
<!ENTITY copyright "© 2011 Ivor Horton &documentDate;">
The text contains a reference to a documentDate entity. Entity references may appear in a document only
after their corresponding entity declarations, so the declaration for the documentDate entity must precede
the declaration for the copyright entity:
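As a rough illustration of that ordering, the following Java sketch parses a document in which documentDate is declared before copyright. The date value "24 October 2011", the notice element, and the class name NestedEntityDemo are all hypothetical and chosen only for this example; they do not come from the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: one entity declaration refers to another entity.
final class NestedEntityDemo {

    // documentDate is declared first; its value is an invented placeholder.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE notice [\n"
        + "  <!ENTITY documentDate \"24 October 2011\">\n"
        + "  <!ENTITY copyright \"\u00A9 2011 Ivor Horton &documentDate;\">\n"
        + "]>\n"
        + "<notice>&copyright;</notice>";

    private NestedEntityDemo() {}

    /** Parses the sample document and returns the fully expanded text of the root element. */
    static String expandedNotice() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(expandedNotice());
    }
}
```

When the parser expands the reference to copyright in the document body, it also expands the documentDate reference contained in the replacement text, so the root element holds the combined string.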