</td>
<td bgcolor="#FFFFFF" width="51%">
<a href="../81332713233407">
<em>The Hunt for Red October</em></a>
<br>
Tom Clancy;
Hardcover</b>
<font size=2 face="Verdana, Helvetica, Courier" color=#000000>
<NOBR>Price: <font color=#990>$18.99</font></b></NOBR><br>
</td>
An Internet browser has no trouble understanding how to format and display
this information in a page to the end user. While viewing the page in a browser, the
end user has no trouble understanding what the data means (the shopping cart
contains two books, one for $6.99 and the other for $18.99).
However, what if you want a program to parse this document and extract the
item number and other information, including its price, from the HTML
document? In theory, you could use a trial-and-error design approach to build a
parser for this particular document. Perhaps you could fine-tune an algorithm
like the following so that it can process the shopping cart HTML page and
extract the price of the book:
1. Look for the string "NOBR>Price:".
2. Skip past the font declaration (<font ...>).
3. The characters before the closing font tag (</font>) contain the price.
4. Ignore the currency symbol in the price string.
5. Convert the price string into a numeric price variable.
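The five steps above can be sketched in Java. This is a minimal, deliberately page-specific sketch that assumes the markup looks exactly like the sample page. The class name FragilePriceParser is hypothetical. BigDecimal is used for the result because it represents decimal currency amounts exactly.

```java
import java.math.BigDecimal;
import java.util.Optional;

/**
 * Hypothetical, deliberately naive price extractor that follows the five
 * steps listed in the text. It only works on markup shaped like the sample
 * shopping-cart page.
 */
public class FragilePriceParser {

    /** Returns the price found after the "NOBR>Price:" marker, if any. */
    public static Optional<BigDecimal> extractPrice(String html) {
        // Step 1: look for the literal marker string.
        int marker = html.indexOf("NOBR>Price:");
        if (marker < 0) {
            return Optional.empty();
        }
        // Step 2: skip past the opening <font ...> tag that follows the marker.
        int fontOpen = html.indexOf("<font", marker);
        if (fontOpen < 0) {
            return Optional.empty();
        }
        int fontOpenEnd = html.indexOf('>', fontOpen);
        if (fontOpenEnd < 0) {
            return Optional.empty();
        }
        // Step 3: the price is the text before the closing </font> tag.
        int fontClose = html.indexOf("</font>", fontOpenEnd);
        if (fontClose < 0) {
            return Optional.empty();
        }
        String priceText = html.substring(fontOpenEnd + 1, fontClose).trim();
        // Step 4: strip leading non-digit characters such as the currency symbol.
        String amount = priceText.replaceFirst("^[^0-9]+", "");
        // Step 5: convert the remaining string into a numeric value.
        try {
            return Optional.of(new BigDecimal(amount));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String html = "<NOBR>Price: <font color=#990>$18.99</font></b></NOBR><br>";
        System.out.println(extractPrice(html)); // prints Optional[18.99]
    }
}
```

Every index lookup in this sketch depends on the exact layout of a single page. Nothing in the code knows what a price actually is.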
However, you would have no guarantee that the parser would work if the vendor
made even minor changes to the Web site, or that the parser wouldn't be
confused by similar pages. More important, you would have no guarantee that the
parser would work with another vendor's HTML pages. Furthermore, important
contextual information is hard to decipher. For example, what is the identifier
(the order ID) for this shopping cart? Is it the number in the href attribute?
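A small experiment makes both weaknesses concrete. The sketch below uses hypothetical class and method names. It checks whether the literal marker that a hand-built parser relies on is present, and it guesses the order ID from the digits in an href. The "redesigned" page only lowercases the NOBR tag. A browser renders it identically because HTML tag names are case-insensitive, yet the marker search fails. The order ID lookup is purely an assumption, since nothing in the markup says that the number in the href identifies the order.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical demo of two weaknesses of a hand-built HTML parser:
 * it breaks on cosmetic markup changes, and context such as the
 * order ID can only be guessed at.
 */
public class BrittleCartScraper {

    /** The literal marker the page-specific parser depends on. */
    static final String PRICE_MARKER = "NOBR>Price:";

    /** Assumption: the order ID is the digit run at the end of an href value. */
    static final Pattern HREF_DIGITS = Pattern.compile("href=\"[^\"]*?(\\d+)\"");

    /** True if the naive parser would find its starting point in the page. */
    public static boolean findsPriceMarker(String html) {
        return html.contains(PRICE_MARKER);
    }

    /** A guess only: returns the trailing digits of the first href, if any. */
    public static Optional<String> guessOrderId(String html) {
        Matcher m = HREF_DIGITS.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String original = "<a href=\"../81332713233407\">"
                + "<NOBR>Price: <font color=#990>$18.99</font></NOBR>";
        // Hypothetical redesign: lowercase tag names, same rendered result.
        String redesigned = original.replace("NOBR", "nobr");

        System.out.println(findsPriceMarker(original));   // true
        System.out.println(findsPriceMarker(redesigned)); // false
        System.out.println(guessOrderId(original));       // Optional[81332713233407]
    }
}
```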
An XML document, on the other hand, contains information in a format that
can be readily parsed by an application. An XML document might express a
shopping cart using this type of syntax:
<Order orderNumber="81332713233407">
  <LineItem>
    <Title>Debt of Honor</Title>
    <Author>Tom Clancy</Author>
    <BookType>Paperback</BookType>