INSIDE THE HEATON RESEARCH SPIDER - HTTP Programming Recipes for Java Bots


Now that everything has been notified, we begin processing the host. To do this, we attempt to obtain a URL from the workload manager. If no URL is available, we wait up to 60 seconds for one to arrive, and then try again.

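do {
    // Ask the workload manager for the next URL to process.
    url = this.workloadManager.getWork();

    if (url != null) {
        // Hand the URL to a worker thread from the pool.
        SpiderWorker worker = new SpiderWorker(this, url);
        this.threadPool.execute(worker);
    } else {
        // Nothing queued yet: wait up to 60 seconds for a worker to add a URL.
        this.workloadManager.waitForWork(60, TimeUnit.SECONDS);
    }
    // Continue while URLs remain or workers are still running,
    // unless the spider has been canceled.
} while (((url != null) || (this.threadPool.getActiveCount() > 0))
        && !this.cancel);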

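This process continues until there is no more work left for the current host: the workload manager has no URL to hand out and no worker threads are still active (a running worker may still discover new links). The loop also ends early if the spider is canceled.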
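The loop above can be tried outside the spider with a small, self-contained sketch. Everything in it is hypothetical: a fake in-memory "site" stands in for HTTP, and a BlockingQueue plays the workload manager, with poll() standing in for getWork() and a timed poll() standing in for waitForWork() (here it hands over the URL directly). One deliberate swap: instead of ThreadPoolExecutor.getActiveCount(), which its Javadoc describes as approximate, the sketch counts in-flight tasks with an AtomicInteger, so the loop cannot exit while a just-submitted task has not started yet.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone model of the host-processing loop.
// A fake "site" maps each URL to the links found on it; no HTTP is performed.
class WorkLoopSketch {
    private final BlockingQueue<String> work = new LinkedBlockingQueue<>();
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final Map<String, List<String>> site;
    private volatile boolean cancel = false;

    WorkLoopSketch(Map<String, List<String>> site) {
        this.site = site;
    }

    // Queue a URL only the first time it is seen.
    private void add(String url) {
        if (seen.add(url)) {
            work.add(url);
        }
    }

    List<String> crawl(String start) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        add(start);
        String url;
        do {
            url = work.poll(); // like getWork(): null when nothing is queued
            if (url == null) {
                // like waitForWork(): block briefly in case a worker queues new links
                url = work.poll(100, TimeUnit.MILLISECONDS);
            }
            if (url != null) {
                final String target = url;
                inFlight.incrementAndGet(); // count the task before it can start
                pool.execute(() -> {
                    try {
                        processed.add(target);
                        for (String link : site.getOrDefault(target, List.of())) {
                            add(link);
                        }
                    } finally {
                        inFlight.decrementAndGet(); // only after its links are queued
                    }
                });
            }
            // inFlight is read before the queue, so a finished worker's links are visible.
        } while ((url != null || inFlight.get() > 0 || !work.isEmpty()) && !cancel);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return processed;
    }

    void cancel() {
        cancel = true;
    }
}
```

Because each empty iteration blocks for a bounded time, the loop never spins, yet it still notices newly queued links promptly.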

Other Important Classes in the Heaton Research Spider

When you make use of the Heaton Research Spider, you will deal primarily with the Spider class. However, there are other important classes in the Heaton Research Spider that you will also use. In particular, the spider relies on several interfaces and can throw several exceptions.

Spider Interfaces

The Heaton Research Spider makes use of two interfaces. These interfaces allow you to define how the spider behaves.

The first interface is the SpiderReportable interface. To make use of the Heaton Research Spider, you must provide a class that implements the SpiderReportable interface. This class is responsible for processing the data that the spider finds.
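To illustrate this division of labor, here is a hypothetical callback interface in the same spirit: the spider would call it for every link it discovers and for every page it downloads, and your class decides what to follow and what to do with the content. The names PageReport, SameHostReport, foundURL, and processPage are invented for this sketch; they are not the methods the real SpiderReportable interface declares.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, loosely modeled on the role of SpiderReportable.
// The real interface's method names and signatures are NOT reproduced here.
interface PageReport {
    // Called for each discovered link; return true to have the spider queue it.
    boolean foundURL(URI source, URI target);

    // Called with the downloaded content of each page.
    void processPage(URI url, String content);
}

// Example implementation: stays on one host and logs each page's path and length.
class SameHostReport implements PageReport {
    private final String host;
    final List<String> log = new ArrayList<>();

    SameHostReport(String host) {
        this.host = host;
    }

    @Override
    public boolean foundURL(URI source, URI target) {
        // getHost() is null for URIs such as mailto:, which are then rejected.
        return host.equalsIgnoreCase(target.getHost());
    }

    @Override
    public void processPage(URI url, String content) {
        log.add(url.getPath() + " " + content.length());
    }
}
```

The spider side of this contract stays generic; all site-specific decisions live in the implementing class.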

The second interface is the WorkloadManager interface. The WorkloadManager interface allows the spider to use more than one type of workload manager. Two workload managers are provided with the spider: the SQLWorkloadManager stores URLs in an SQL database, and the MemoryWorkloadManager stores URLs in memory.
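A rough sketch of why the workload manager is pluggable: the spider talks only to an interface, and each implementation decides where its table of URLs lives. SimpleWorkload, UrlStatus, and InMemoryWorkload are hypothetical; only getWork() and waitForWork() mirror calls made in the loop shown earlier, and the remaining methods are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical, trimmed-down contract in the spirit of WorkloadManager.
interface SimpleWorkload {
    boolean add(String url);          // queue a URL; false if it is already known
    String getWork();                 // next waiting URL, or null if none
    void waitForWork(long time, TimeUnit unit) throws InterruptedException;
    void markProcessed(String url);
    void markError(String url);
}

enum UrlStatus { WAITING, WORKING, PROCESSED, ERROR }

// In-memory implementation (compare MemoryWorkloadManager). An SQL-backed
// version would keep the same URL/status table in a database instead.
class InMemoryWorkload implements SimpleWorkload {
    private final Map<String, UrlStatus> status = new HashMap<>();
    private final Deque<String> waiting = new ArrayDeque<>();

    @Override
    public synchronized boolean add(String url) {
        if (status.containsKey(url)) {
            return false;
        }
        status.put(url, UrlStatus.WAITING);
        waiting.addLast(url);
        notifyAll(); // wake any thread blocked in waitForWork
        return true;
    }

    @Override
    public synchronized String getWork() {
        String url = waiting.pollFirst();
        if (url != null) {
            status.put(url, UrlStatus.WORKING);
        }
        return url;
    }

    @Override
    public synchronized void waitForWork(long time, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(time);
        while (waiting.isEmpty()) { // loop guards against spurious wakeups
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
    }

    @Override
    public synchronized void markProcessed(String url) {
        status.put(url, UrlStatus.PROCESSED);
    }

    @Override
    public synchronized void markError(String url) {
        status.put(url, UrlStatus.ERROR);
    }

    synchronized UrlStatus statusOf(String url) {
        return status.get(url);
    }
}
```

Swapping storage then means writing another class against the same interface; the spider's loop does not change.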

Spider Exceptions

There are two exceptions that can be thrown by the spider. You will be required to catch these exceptions when you are working with the spider. Which exception you must catch depends on the operation you are performing with the spider.
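The excerpt does not name the two exceptions, so the following sketch uses two hypothetical checked exceptions, QueueException and CrawlException, to show the general Java pattern: each operation declares the exception that fits it, and the compiler requires the caller to catch (or rethrow) exactly that one. The operations enqueue and fetch are likewise invented for illustration.

```java
// Hypothetical exception types; they are not the spider's actual exception classes.
class QueueException extends Exception {
    QueueException(String message, Throwable cause) {
        super(message, cause);
    }
}

class CrawlException extends Exception {
    CrawlException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ExceptionSketch {
    // A queue/storage operation reports QueueException, wrapping the low-level cause.
    static void enqueue(String url) throws QueueException {
        if (url.isEmpty()) {
            throw new QueueException("cannot queue empty URL", new IllegalArgumentException(url));
        }
    }

    // A download/parse operation reports CrawlException.
    static void fetch(String url) throws CrawlException {
        if (!url.startsWith("http")) {
            throw new CrawlException("unsupported scheme: " + url, null);
        }
    }

    // Each call site catches the checked exception its operation declares.
    static String tryBoth(String url) {
        try {
            enqueue(url);
        } catch (QueueException e) {
            return "queue: " + e.getMessage();
        }
        try {
            fetch(url);
        } catch (CrawlException e) {
            return "crawl: " + e.getMessage();
        }
        return "ok";
    }
}
```

Where one handler suits both, Java's multi-catch (`catch (QueueException | CrawlException e)`) avoids duplicating it.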
