    try
    {
      ExtractPartial parse = new ExtractPartial();
      parse.process();
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
This recipe works by downloading the first page, then following the “next page” links
until the end is reached.
Processing the First Page
The process method of the ExtractPartial class is used to access the first page and to download subsequent pages. Note that the ExtractPartial class has two process methods. The one used to start downloading is the process method that accepts no parameters. It begins by creating a URL for the first page.
URL url = new URL("http://www.httprecipes.com/1/6/partial.php");
do
{
  url = process(url);
} while (url != null);
The URL is passed to the process method that accepts a URL. That method returns the URL of the next page, or null when there is no next page. The loop continues until all pages have been downloaded.
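The control flow of this loop can be tried without a network connection. The following is a minimal sketch, not the recipe's code: the class name PagingLoopSketch, the page names, and the in-memory map of "next page" links are all hypothetical stand-ins for the real site. It only illustrates how a no-argument process method can drive an overloaded process method until null is returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pages are simulated by a map from page name to next page name.
class PagingLoopSketch {
    private final Map<String, String> nextLinks;
    private final List<String> visited = new ArrayList<>();

    PagingLoopSketch(Map<String, String> nextLinks) {
        this.nextLinks = nextLinks;
    }

    // Processes one page and returns the next page, or null if there is none.
    String process(String page) {
        visited.add(page);
        return nextLinks.get(page); // a missing entry yields null, which ends the loop
    }

    // Starts at the first page and follows "next page" links to the end.
    List<String> process() {
        String page = "page1"; // assumed name of the first page
        do {
            page = process(page);
        } while (page != null);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        links.put("page1", "page2");
        links.put("page2", "page3");
        // page3 has no entry, so it is the last page
        System.out.println(new PagingLoopSketch(links).process());
    }
}
```

Because the loop is a do/while, the first page is always processed, even when it has no "next page" link.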
Processing Individual Pages
The overloaded process method that accepts a URL is called for each partial page that is found. The method begins by creating some variables that will be needed to process the page. The result variable holds the URL of the next partial page, or null if there is no next page. The buffer variable holds non-tag text encountered. The value variable holds the href attribute for <a> tags found. The src variable holds the src attribute for <img> tags encountered.
URL result = null;
StringBuilder buffer = new StringBuilder();
String value = "";
String src = "";
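To make the roles of these four variables concrete, here is a rough sketch of how they might be filled in while scanning a page. It rests on several assumptions that are not taken from the recipe: tags are found with a simple regular expression rather than a real HTML parser, the class and helper names are hypothetical, and a link is treated as the "next page" link when it wraps an image whose src ends in next.gif (a made-up marker). In this sketch the buffer is passed in by the caller so that the collected text can be inspected afterwards.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a regular expression stands in for a real HTML parser.
class PageVariablesSketch {
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)([^>]*)>");
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");
    private static final String NEXT_IMAGE = "next.gif"; // assumed marker image

    // Returns the named double-quoted attribute, or "" if it is absent.
    static String attribute(String attrs, String name) {
        Matcher m = ATTR.matcher(attrs);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(name)) {
                return m.group(2);
            }
        }
        return "";
    }

    // Scans html, appends non-tag text to buffer, and returns the next page URL or null.
    static URL process(URL url, String html, StringBuilder buffer)
            throws MalformedURLException {
        URL result = null;
        String value = "";
        String src = "";
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            buffer.append(html, pos, m.start()); // text between tags
            pos = m.end();
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (name.equals("a") && !closing) {
                value = attribute(m.group(3), "href"); // remember the link target
                src = "";
            } else if (name.equals("img")) {
                src = attribute(m.group(3), "src"); // remember the image source
            } else if (name.equals("a") && src.endsWith(NEXT_IMAGE)) {
                result = new URL(url, value); // closing </a>: resolve against the page URL
            }
        }
        buffer.append(html, pos, html.length()); // trailing text after the last tag
        return result;
    }

    public static void main(String[] args) throws Exception {
        URL base = new URL("http://www.httprecipes.com/1/6/partial.php");
        String html = "Alabama<a href=\"partial.php?p=2\"><img src=\"next.gif\"></a>";
        StringBuilder buffer = new StringBuilder();
        System.out.println(process(base, html, buffer));
        System.out.println(buffer);
    }
}
```

The two-argument URL constructor resolves a relative href against the URL of the current page, which is why value can stay a plain string until the link is confirmed as a "next page" link.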