Now that everything has been notified, we begin processing the host. To do this, we
attempt to obtain a URL from the workload manager. If no URL is available, we wait for
up to 60 seconds and then try again.
    do {
        url = this.workloadManager.getWork();
        if (url != null) {
            SpiderWorker worker = new SpiderWorker(this, url);
            this.threadPool.execute(worker);
        } else {
            this.workloadManager.waitForWork(60, TimeUnit.SECONDS);
        }
    } while (((url != null) || (this.threadPool.getActiveCount() > 0))
            && !this.cancel);
This process continues until there is no more work left for the current host (no URL is
returned and no worker threads remain active) or until the spider is cancelled.
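The loop can be tried in isolation with a small stand-in for the workload manager. The following is a minimal sketch, not the spider's actual implementation: the class names WorkLoopSketch and SimpleWorkload, the pool size of 4, and the "worker" that merely records each URL are all assumptions made for illustration. Only the getWork and waitForWork calls and the shape of the loop come from the code above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WorkLoopSketch {

    /** Hypothetical in-memory stand-in for the workload manager. */
    static class SimpleWorkload {
        private final Deque<String> urls = new ArrayDeque<>();

        synchronized void add(String url) {
            urls.add(url);
            notifyAll(); // wake a loop that is waiting for work
        }

        /** Returns the next URL, or null if none is queued. */
        synchronized String getWork() {
            return urls.poll();
        }

        /** If the queue is empty, waits up to the given timeout for new work. */
        synchronized void waitForWork(long time, TimeUnit unit) throws InterruptedException {
            if (urls.isEmpty()) {
                wait(Math.max(1L, unit.toMillis(time)));
            }
        }
    }

    private final SimpleWorkload workloadManager = new SimpleWorkload();
    private final ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private volatile boolean cancel = false;

    public void cancel() {
        this.cancel = true;
    }

    /** Queues the seed URLs, runs the polling loop, and returns the URLs handled. */
    public Set<String> processHost(List<String> seeds, long waitMillis) {
        for (String seed : seeds) {
            workloadManager.add(seed);
        }
        try {
            String url;
            do {
                url = workloadManager.getWork();
                if (url != null) {
                    final String current = url;
                    // Stand-in for a worker: just record the URL.
                    threadPool.execute(() -> processed.add(current));
                } else {
                    workloadManager.waitForWork(waitMillis, TimeUnit.MILLISECONDS);
                }
            } while (((url != null) || (threadPool.getActiveCount() > 0)) && !cancel);

            // Let any tasks that are still queued or running finish.
            threadPool.shutdown();
            threadPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            threadPool.shutdownNow();
        }
        return processed;
    }
}
```

Calling new WorkLoopSketch().processHost(seeds, 50) drains the seed URLs, waits once for further work, and returns when the pool is idle. The workers in this sketch never add new URLs; a real worker would, which is why the loop also checks the active thread count before giving up.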
Other Important Classes in the Heaton Research Spider
When you use the Heaton Research Spider, you will deal primarily with the
Spider class. However, you will also use several other important classes in the
Heaton Research Spider. In particular, the spider supports several
interfaces and can throw several exceptions.
Spider Interfaces
There are two interfaces that the Heaton Research Spider makes use of. These interfaces
allow you to define how the spider acts.
The first interface is the SpiderReportable interface. To make use of the Heaton
Research Spider, you must provide a class that implements the SpiderReportable
interface. This class is responsible for processing the data that the spider finds.
The second interface is the WorkloadManager interface. The
WorkloadManager interface allows the spider to use more than one type of workload manager.
There are two workload managers provided with the spider. The SQLWorkloadManager
stores URLs in an SQL database. The MemoryWorkloadManager stores URLs in
memory.
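The benefit of putting these two roles behind interfaces can be shown with a small sketch. Every name in it (PluggableSpiderSketch, PageHandler, UrlStore, MemoryUrlStore, MiniSpider) is hypothetical and chosen for illustration; they stand in for the roles played by SpiderReportable and WorkloadManager and do not reproduce the actual method signatures of those interfaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PluggableSpiderSketch {

    /** Hypothetical reporting role: receives each URL the spider handles. */
    interface PageHandler {
        void handle(String url);
    }

    /** Hypothetical workload role: hides where the URLs are stored. */
    interface UrlStore {
        void add(String url);

        /** Returns the next URL, or null when no work remains. */
        String next();
    }

    /** In-memory store; a database-backed store could implement the same interface. */
    static class MemoryUrlStore implements UrlStore {
        private final Deque<String> urls = new ArrayDeque<>();

        @Override
        public void add(String url) {
            urls.add(url);
        }

        @Override
        public String next() {
            return urls.poll();
        }
    }

    /** The spider depends only on the two interfaces, never on a concrete class. */
    static class MiniSpider {
        private final UrlStore store;
        private final PageHandler handler;

        MiniSpider(UrlStore store, PageHandler handler) {
            this.store = store;
            this.handler = handler;
        }

        void run() {
            String url;
            while ((url = store.next()) != null) {
                handler.handle(url);
            }
        }
    }

    /** Wires an in-memory store to a handler that collects the URLs it sees. */
    public static List<String> demo() {
        UrlStore store = new MemoryUrlStore();
        store.add("http://example.com/");
        store.add("http://example.com/about");

        List<String> seen = new ArrayList<>();
        new MiniSpider(store, seen::add).run();
        return seen;
    }
}
```

Because MiniSpider sees only the interfaces, swapping the in-memory store for one backed by a database would not require any change to the spider itself, which is the design choice the two workload managers described above rely on.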
Spider Exceptions
There are two exceptions that can be thrown by the spider. You must catch these
exceptions when you are working with the spider. Which exception you must catch
is determined by the operation you are performing with the spider.