  /**
   * Called when the spider tries to process a URL but gets
   * an error. This method is not used in this manager.
   *
   * @param url
   *          The URL that generated an error.
   */
  public void spiderURLError(URL url) {
  }
}
What sets the world spider apart is the way it handles new URLs when
spiderFoundURL is called. Unlike the previous spiders, it makes no check to
determine whether the URL is on the same host. Any URL is a candidate to be visited.
public boolean spiderFoundURL(URL url, URL source,
    SpiderReportable.URLType type) {
  return true;
}
As you can see, spiderFoundURL simply returns true.
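To make the difference concrete, the following minimal sketch contrasts the two policies outside of the spider. It is not part of the recipe: the class UrlPolicyDemo, its methods, and the url helper are hypothetical names introduced here, and the host comparison is only an assumption about how a same-host check might look.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class; it does not implement SpiderReportable.
class UrlPolicyDemo {

    /** World-spider policy: every discovered URL is a candidate. */
    static boolean acceptAny(URL url) {
        return true;
    }

    /** Host-restricted policy (assumed form): accept only URLs on baseHost. */
    static boolean acceptSameHost(URL url, String baseHost) {
        return baseHost.equalsIgnoreCase(url.getHost());
    }

    /** Helper that builds a URL without exposing a checked exception. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL internal = url("http://www.example.com/about.html");
        URL external = url("http://www.example.org/");

        // The host-restricted policy rejects the external link.
        System.out.println(acceptSameHost(internal, "www.example.com"));
        System.out.println(acceptSameHost(external, "www.example.com"));
        // The world-spider policy accepts it.
        System.out.println(acceptAny(external));
    }
}
```

With the host-restricted policy the spider never leaves its starting site. With the accept-everything policy the workload grows with every outbound link, which is why the world spider depends on a workload manager that can hold a very large number of URLs.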
This spider shows how you would set up a spider that accesses a large number of web
sites. Of course, this spider is only the beginning of a search engine, but it does demonstrate
how to configure the Heaton Research Spider to access a large number of sites.
Recipe #13.4: Display Spider Statistics
Because the SQLWorkloadManager class stores the workload in a database, it is
possible for other programs to monitor the progress of the spider. This recipe will show you
how to create a simple program that monitors the spider's progress using the Heaton Research
Spider database.
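The general idea of such a monitor can be sketched as follows. This is a rough illustration under stated assumptions, not the recipe's listing: the class SpiderStatsSketch and its methods are hypothetical, and the table and column names in the query are assumptions that must be matched to the actual spider schema. The status labels in main are made-up sample data.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical monitor sketch; not the SpiderStats class from the recipe.
class SpiderStatsSketch {

    // ASSUMPTION: table and column names are placeholders for the real schema.
    static final String COUNT_SQL =
        "SELECT status, COUNT(*) FROM spider_workload GROUP BY status";

    /** Reads one row count per status value from an open JDBC connection. */
    static Map<String, Integer> readCounts(Connection conn) throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(COUNT_SQL)) {
            while (rs.next()) {
                counts.put(rs.getString(1), rs.getInt(2));
            }
        }
        return counts;
    }

    /** Sums the per-status counts to get the total workload size. */
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    /** Formats the counts as a one-line report: the total, then each status. */
    static String report(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("total=" + total(counts));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative counts only; a real monitor would call
        // readCounts(conn) periodically and print each report.
        Map<String, Integer> sample = new LinkedHashMap<>();
        sample.put("waiting", 120);
        sample.put("done", 75);
        sample.put("error", 5);
        System.out.println(report(sample));
    }
}
```

Because the monitor only reads from the database, it can run in a separate process while the spider is working, without the two programs needing to know about each other.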
This recipe makes use of a Heaton Research Spider configuration file, just like previous
recipes. To start this recipe, specify the name of the configuration file as the first argument.
The following command shows how you might start the spider:
SpiderStats spider.conf c:\temp\ http://www.example.com
The above command simply shows the abstract format for calling this recipe, with the
appropriate parameters. For exact information on how to run this recipe, refer to Appendix B,
C, or D, depending on the operating system you are using. Figure 13.1 shows this program
monitoring a spider's progress.