The VALUE attribute of each of the <option> tags above defines the URL that we will access to find that page.
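A VALUE may be a relative URL, so it must be resolved against the page it came from. The listing below does this with new URL(url, str). The following minimal sketch shows that resolution step on its own. The class name OptionUrlResolver, the method resolve, and the sample option values are hypothetical illustrations, not values taken from the actual page.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class OptionUrlResolver {
    // Resolve an <option> VALUE against the URL of the page it appeared on,
    // using the same two-argument URL constructor as the listing.
    // Checked exceptions are wrapped so callers need no throws clause.
    static URL resolve(String pageUrl, String optionValue) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, optionValue);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + optionValue, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.httprecipes.com/1/9/article.php";
        // Hypothetical option values: relative, root-relative, and absolute.
        String[] values = {
            "article.php?page=2",
            "/1/9/article.php?page=3",
            "http://example.com/other.html"
        };
        for (String v : values) {
            System.out.println(resolve(page, v));
        }
    }
}
```

A relative value such as article.php?page=2 keeps the base page's host and directory, a value starting with / keeps only the host, and an absolute URL replaces the base entirely.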
This function begins by opening a stream to the URL of the first page.
URL url = new URL("http://www.httprecipes.com/1/9/article.php");
InputStream is = url.openStream();
A ParseHTML object will be used to parse the HTML. The ParseHTML class was
discussed in Chapter 6, “Extracting Data”.
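ParseHTML itself is not reproduced here. Purely to illustrate what this loop is looking for, the following sketch swaps in a different technique: regular expressions from java.util.regex that pull the VALUE attribute out of each <option> tag. It is a simplified stand-in, not the ParseHTML approach; it handles only quoted values and would be fooled by markup that a real parser handles, such as tags inside comments. The class OptionValueExtractor and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionValueExtractor {
    // Matches an <option ...> start tag (case-insensitive) and captures the
    // attribute text; closing </option> tags do not match.
    private static final Pattern OPTION_TAG =
        Pattern.compile("<option\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Captures a double-quoted (group 2) or single-quoted (group 3) VALUE.
    private static final Pattern VALUE_ATTR =
        Pattern.compile("\\bvalue\\s*=\\s*(\"([^\"]*)\"|'([^']*)')",
                        Pattern.CASE_INSENSITIVE);

    static List<String> extractOptionValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher tag = OPTION_TAG.matcher(html);
        while (tag.find()) {
            Matcher attr = VALUE_ATTR.matcher(tag.group(1));
            if (attr.find()) {
                values.add(attr.group(2) != null ? attr.group(2) : attr.group(3));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical markup, not copied from the real page.
        String html = "<select><OPTION VALUE=\"article.php?page=1\">Page 1</OPTION>"
                    + "<option value='article.php?page=2'>Page 2</option></select>";
        System.out.println(extractOptionValues(html));
    }
}
```

Like the equalsIgnoreCase check in the listing, the CASE_INSENSITIVE flag lets both <OPTION> and <option> match.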
ParseHTML parse = new ParseHTML(is);
int ch;
while ((ch = parse.read()) != -1)
{
As the data is read from the HTML page, each tag is processed. The read method returns 0 when it encounters a tag, which is then retrieved with getTag. If that tag is an <option> tag, we look for a URL.
if (ch == 0)
{
HTMLTag tag = parse.getTag();
if (tag.getName().equalsIgnoreCase("option"))
{
If an <option> tag is found, its VALUE attribute is resolved against the URL of the current page to construct a URL object, which is then passed to the downloadArticlePage function.
String str = tag.getAttributeValue("value");
URL u = new URL(url, str);
System.out.println(downloadArticlePage(u));
}
}
}
The downloadArticlePage function returns a string for each page it downloads, and that string is printed to the console.
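The listing does not show how downloadArticlePage builds its string. One common way to turn the InputStream returned by openStream into a String is sketched below. The helper name readAll is hypothetical, decoding as UTF-8 is an assumption (a real page may declare another charset), and this is not presented as the book's implementation. Unlike the listing, the sketch also closes the stream when it is done.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamText {
    // Hypothetical helper: reads the stream to the end, closes it, and
    // decodes the bytes as UTF-8 (an assumption about the page's charset).
    static String readAll(InputStream stream) {
        try (InputStream in = stream) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // read returns -1 at end of stream, like parse.read() in the listing
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for url.openStream() so this runs offline.
        InputStream fake = new ByteArrayInputStream(
            "<h1>Programming Binary Files in Java</h1>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(fake));
    }
}
```

Against a live page the call would be readAll(u.openStream()). On Java 9 and later, InputStream.readAllBytes() can replace the manual buffer loop.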
Reading Each Article Page
Reading the data from each of the article pages is fairly straightforward. First, let's examine the HTML for a single page of the article. You can see this HTML here:
<center>
<h1>Programming Binary Files in Java</h1>
<h3>Introduction</h3></center>
<p>Java contains an extensive array of classes for file access.
A series of readers, writers and filters make up