USING JAVASCRIPT - HTTP Programming Recipes for Java Bots


The VALUE attribute of each of the above option tags defines the URL that we will access to find that page.

This function begins by opening the URL to the first page.

URL url = new URL("http://www.httprecipes.com/1/9/article.php");

InputStream is = url.openStream();

A ParseHTML object will be used to parse the HTML. The ParseHTML class was

discussed in Chapter 6, “Extracting Data”.

ParseHTML parse = new ParseHTML(is);
int ch;
while ((ch = parse.read()) != -1)
{

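As the data is read from the HTML page, each tag is processed. The code below treats a return value of 0 from parse.read() as the signal that a tag was encountered. If the tag found is an <option> tag, then we will look for a URL.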

if (ch == 0)
{
  HTMLTag tag = parse.getTag();
  if (tag.getName().equalsIgnoreCase("option"))
  {

If an <option> tag is found, then construct a URL object for it and call the

downloadArticlePage function.

    String str = tag.getAttributeValue("value");
    URL u = new URL(url, str);
    System.out.println(downloadArticlePage(u));
  }

The downloadArticlePage function returns a string for every page downloaded. This string is then displayed.
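The option-scanning step above can be tried without the book's ParseHTML class. The following is only a minimal sketch under stated assumptions: the OptionLinks class, its extractOptionUrls method, and the sample markup are hypothetical and not taken from the book, a regular expression stands in for ParseHTML, and URI.resolve stands in for the two-argument URL constructor. It shows how each VALUE attribute is resolved against the address of the first page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the book's code.
class OptionLinks {
    // Matches an <option> tag and captures its double-quoted value attribute.
    // A regular expression is a simplification of a real HTML parser.
    private static final Pattern OPTION_VALUE = Pattern.compile(
        "<option\\b[^>]*?\\bvalue\\s*=\\s*\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    // Returns one absolute URL per <option value="..."> tag found in html,
    // resolving each value against the base address.
    static List<String> extractOptionUrls(String html, URI base) {
        List<String> urls = new ArrayList<>();
        Matcher m = OPTION_VALUE.matcher(html);
        while (m.find()) {
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup is an assumption; the real page may differ.
        String html = "<select>"
            + "<option value=\"article.php?page=1\">Page 1</option>"
            + "<option value=\"article.php?page=2\">Page 2</option>"
            + "</select>";
        URI base = URI.create("http://www.httprecipes.com/1/9/article.php");
        for (String u : extractOptionUrls(html, base)) {
            System.out.println(u);
        }
    }
}
```

In the book's version, each resolved address would then be passed to downloadArticlePage rather than printed directly.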

Reading Each Article Page

Reading the data from each of the article pages is fairly straightforward. First, let's examine the HTML page that contains each page of the article. You can see this HTML here:

<h1>Programming Binary Files in Java</h1>

<h3>Introduction</h3></center>

<p>Java contains an extensive array of classes for file access.

A series of readers, writers and filters make up
