EXTRACTING DATA - HTTP Programming Recipes for Java Bots

try
{
    ExtractPartial parse = new ExtractPartial();
    parse.process();
} catch (Exception e)
{
    e.printStackTrace();
}

This recipe works by downloading the first page, then following the “next page” links until the end is reached.

Processing the First Page

The process method of the ExtractPartial class is used to access the first page and to download subsequent pages. Note that ExtractPartial has two process methods. The one that starts the download is the process method that accepts no parameters. It begins by creating a URL for the first page.

URL url = new URL("http://www.httprecipes.com/1/6/partial.php");
do
{
    url = process(url);
} while (url != null);

The URL is passed to the process method that accepts a URL. That method returns the URL of the next page, or null when there is no next page. The loop repeats until all pages have been downloaded.
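The loop above can be tried without a network connection. The following is a minimal sketch, not the recipe's actual code: the class name PagingLoopSketch, the example.com URLs, the in-memory nextLinks map, and the toUrl helper are all assumptions made for illustration. It only shows how a no-argument process method can hand each URL to an overloaded process(URL) until that method returns null.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PagingLoopSketch {
    // Hypothetical stand-in for the web site: page URL -> "next page" URL.
    private final Map<String, String> nextLinks = new HashMap<>();

    // Records the pages in the order they were processed.
    final List<String> visited = new ArrayList<>();

    PagingLoopSketch() {
        nextLinks.put("http://example.com/list?page=1", "http://example.com/list?page=2");
        nextLinks.put("http://example.com/list?page=2", "http://example.com/list?page=3");
        // Page 3 has no entry, so it is the last page.
    }

    // Entry point: start at the first page and follow links until null.
    void process() {
        URL url = toUrl("http://example.com/list?page=1");
        do {
            url = process(url);
        } while (url != null);
    }

    // Handle one page and return the URL of the next page, or null if none.
    URL process(URL url) {
        visited.add(url.toString());
        String next = nextLinks.get(url.toString());
        return next == null ? null : toUrl(next);
    }

    // Helper (an assumption of this sketch) that hides the checked exception.
    private static URL toUrl(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        PagingLoopSketch sketch = new PagingLoopSketch();
        sketch.process();
        for (String page : sketch.visited) {
            System.out.println(page);
        }
    }
}
```

Because the do/while form runs the body before testing the condition, the first page is always processed, even when it turns out to be the only page.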

Processing Individual Pages

The overloaded process method that accepts a URL is called for each partial page that is found. The method begins by creating the variables needed to process the page. The result variable holds the URL of the next partial page, or null if there is no next page. The buffer variable holds the non-tag text encountered. The value variable holds the href attribute of the <a> tags found. The src variable holds the src attribute of the <img> tags encountered.

URL result = null;
StringBuilder buffer = new StringBuilder();
String value = "";
String src = "";
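The excerpt does not show how these variables are filled in, so the following is only a rough sketch of the idea under stated assumptions. The class PageVariablesSketch, its scan and attribute methods, and the sample HTML are hypothetical, and the hand-written scanner with a regular expression stands in for whatever HTML parser the recipe actually uses. It also handles only double-quoted attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageVariablesSketch {
    final StringBuilder buffer = new StringBuilder(); // non-tag text seen so far
    String value = "";                                // href of the latest <a> tag
    String src = "";                                  // src of the latest <img> tag

    // Walk the HTML once, separating tags from plain text.
    void scan(String html) {
        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            int close = (ch == '<') ? html.indexOf('>', i) : -1;
            if (close < 0) {
                // Not inside a complete tag: treat the character as text.
                buffer.append(ch);
                i++;
                continue;
            }
            String tag = html.substring(i + 1, close).trim();
            String lower = tag.toLowerCase();
            if (lower.startsWith("a ")) {
                value = attribute(tag, "href");
            } else if (lower.startsWith("img ")) {
                src = attribute(tag, "src");
            }
            i = close + 1; // skip past the whole tag
        }
    }

    // Return the double-quoted value of the named attribute, or "" if absent.
    static String attribute(String tag, String name) {
        Pattern p = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        PageVariablesSketch page = new PageVariablesSketch();
        // Made-up fragment: one text item plus an image link to the next page.
        page.scan("<p>Alabama</p><a href=\"partial.php?i=2\"><img src=\"next.png\"></a>");
        System.out.println("buffer = " + page.buffer);
        System.out.println("value  = " + page.value);
        System.out.println("src    = " + page.src);
    }
}
```

Keeping the most recent href and src side by side is what would let a method decide that a given link is the "next page" link, for example when the image inside the link is the one used for the next button. Whether the recipe makes that decision in this way is not stated in the text above.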
