/**
* Called when the spider tries to process a URL but gets
* an error.
*
* @param url
* The URL that generated an error.
*/
public void spiderURLError(URL url);
}
Any class that implements this interface must provide implementations for each of the
methods and functions contained in the above listing. These methods and functions are
summarized in Table 13.5.
Table 13.5: Functions and Methods of the SpiderReportable Interface

Name                        Purpose
beginHost                   Called when the spider begins processing a new host.
init                        Called to set up the object. The object is provided with
                            a reference to the spider at this point.
spiderFoundURL              Called when the spider finds a URL. Return true if links
                            from this URL should be processed.
spiderProcessURL (HTML)     Called when the spider encounters an HTML page. A
                            SpiderHTMLParse object is provided to parse the HTML.
spiderProcessURL (binary)   Called to download a binary page, such as an image. An
                            InputStream is provided to download the page.
spiderURLError              Called when a URL results in an error while loading.
By providing a class that implements the SpiderReportable interface, you
can process all of the data the spider finds. This is how you define the sort
of spider you are creating. The recipes section of this chapter demonstrates several
SpiderReportable implementations.
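To make the callback idea concrete, here is a minimal sketch of a report class. It does not use the real SpiderReportable interface. MiniSpiderReportable is a hypothetical, reduced stand-in with only three of the callbacks from Table 13.5, and its parameter lists are assumptions made so that the example compiles on its own. CollectingReport and its url helper are likewise illustrative names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for SpiderReportable.
// The method names follow Table 13.5; the parameter lists are assumed.
interface MiniSpiderReportable {
    void beginHost(String host);
    boolean spiderFoundURL(URL url, URL source);
    void spiderURLError(URL url);
}

// Records every URL the spider reports and keeps the crawl on the first host.
class CollectingReport implements MiniSpiderReportable {
    private String host; // first host the spider reported
    private final List<URL> found = new ArrayList<>();
    private final List<URL> errors = new ArrayList<>();

    @Override
    public void beginHost(String host) {
        // Remember only the first host; later hosts are ignored here.
        if (this.host == null) {
            this.host = host;
        }
    }

    @Override
    public boolean spiderFoundURL(URL url, URL source) {
        found.add(url);
        // Return true only if links from this URL should be processed,
        // which in this sketch means the URL is on the remembered host.
        return host != null && host.equalsIgnoreCase(url.getHost());
    }

    @Override
    public void spiderURLError(URL url) {
        errors.add(url);
    }

    List<URL> getFound() {
        return found;
    }

    List<URL> getErrors() {
        return errors;
    }

    // Illustrative helper: builds a URL without a checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }
}
```

In this sketch the value returned by spiderFoundURL decides whether the spider follows links from a page, so restricting the crawl to one site takes a single host comparison. A real implementation would also supply init and both spiderProcessURL methods.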
Starting the Spider
Now that you have seen how to configure and set up the spider, you are ready to learn
how to start it. The spider can be started with the following lines of code:
URL base = new URL("http://www.httprecipes.com/");
SimpleReport report = new SimpleReport();
SpiderOptions options = new SpiderOptions();