Writing and Reading XML - Java 7 for Absolute Beginners

Listing 9-1. The Smallest Possible XML File

<?xml version="1.0" encoding="UTF-8"?>

I have worked with systems that had many such files, because each directory in a set of directories
meant to contain the output of a complex process had to have at least one file. Consequently, we had a
bunch of XML files with content as follows: <?xml version="1.0" encoding="UTF-8"?><placeholder/> (the
empty placeholder element supplies the single root element that a well-formed XML document needs).
You'll see the exact syntax shortly. Until then, a more meaningful example will help to clarify things.
Here's one of my favorite poems, encoded as an XML document.

Listing 9-2. An Example of XML

<?xml version="1.0" encoding="UTF-8"?>
<poem title="The Great Figure" author="William Carlos Williams">
<line>Among the rain</line>
<line>and lights</line>
<line>I saw the figure 5</line>
<line>fire truck</line>
<line>moving</line>
<line>tense</line>
<line>unheeded</line>
<line>to gong clangs</line>
<line>siren howls</line>
<line>and wheels rumbling</line>
<line>through the dark city</line>
</poem>

The first line, the document specifier (formally, the XML declaration), indicates that this document
is an XML document and specifies the version (1.0, which is the most often used version and suffices for
most purposes) and the encoding. A document specifier always begins with <?xml and ends with ?>, so it
can't be confused with an XML element. The XML specification makes the document specifier optional,
and most systems that can process XML will work with documents that don't have one. Even so, your
XML document may be rejected for just that reason by some systems, and without a specifier a parser
has to assume that the encoding is UTF-8 or UTF-16, so it's good to get in the habit of always including
one. The encoding indicates how the characters in the content are stored as bytes. UTF-8 is an encoding
of Unicode, a character set that covers nearly every writing system in use, from alphabetic scripts
(English, Greek, Spanish, Russian, Vietnamese, and many others) to the much larger character sets of
Chinese and Japanese, in which a character can stand for a whole word or part of a word. Those larger
character sets can be tricky to manipulate. For the sake of simplicity, we'll stick to UTF-8 and
documents in English.
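
To make the role of the first line concrete, here is a minimal sketch that parses one document with a document specifier and one without, then asks the parser what version and encoding it recorded. The class name DeclarationDemo and the helper parse are made up for this illustration; DocumentBuilderFactory, Document.getXmlVersion, and Document.getXmlEncoding come from the XML support that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the chapter's own code.
class DeclarationDemo {

    // Turns XML text into UTF-8 bytes and parses those bytes into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document withSpecifier =
            parse("<?xml version=\"1.0\" encoding=\"UTF-8\"?><placeholder/>");
        Document withoutSpecifier = parse("<placeholder/>");

        // Version and encoding as read from the first line of the document.
        System.out.println("version:  " + withSpecifier.getXmlVersion());
        System.out.println("encoding: " + withSpecifier.getXmlEncoding());

        // The JDK parser accepts this document too, but it carries no declared encoding.
        System.out.println("version:  " + withoutSpecifier.getXmlVersion());
        System.out.println("encoding: " + withoutSpecifier.getXmlEncoding());
    }
}
```

Both documents parse with the JDK's built-in parser. The difference shows up only in what the parser can report about the encoding, which is one reason to include the first line even when a parser doesn't insist on it.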

The next line contains the root element. The first element in any XML file is that document's root
element. All other elements, no matter how deeply nested, are descendants of the root. The root
element, poem, has two attributes, title and author. The root element also contains all the line
elements, which make up the body of the poem.
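
The relationship between the root element, its attributes, and its child elements can be sketched with the DOM parser that ships with the JDK. This is an illustration only: the class name PoemReader and its helper methods are made up, the document is shortened to two lines, and the title and author values are filled in here as assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of the chapter's own code.
class PoemReader {

    // A shortened poem document; the attribute values are assumed for illustration.
    static final String POEM_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<poem title=\"The Great Figure\" author=\"William Carlos Williams\">\n"
        + "<line>Among the rain</line>\n"
        + "<line>and lights</line>\n"
        + "</poem>";

    // Parses the text and returns the document's root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Collects the text of every line element that descends from the given element.
    static List<String> lines(Element root) {
        List<String> result = new ArrayList<>();
        NodeList nodes = root.getElementsByTagName("line");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(POEM_XML);
        System.out.println("root:   " + root.getTagName());
        System.out.println("title:  " + root.getAttribute("title"));
        System.out.println("author: " + root.getAttribute("author"));
        for (String line : lines(root)) {
            System.out.println("line:   " + line);
        }
    }
}
```

Everything the program prints is reached through the root element, which mirrors the rule that every other element in the document is a descendant of the root.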

Note the syntax for each element. Each one begins with an opening tag (<poem> or <line>) and ends
with a closing tag (</poem> or </line>). The basic rule is that the names within the two tags have to
match exactly, including case (there are also various restrictions on which characters can be used, but
just about any English word works). Other than that, opening tags always start with a left angle
character (<) and end with a right angle character (>). Closing tags always begin with a left angle
character and a forward slash (</) and end
