INSIDE THE HEATON RESEARCH SPIDER - HTTP Programming Recipes for Java Bots


Additionally, the thread pool is started with CallerRunsPolicy. This policy specifies that when a new task cannot be handed to a worker thread, because every thread in the pool is busy and there is no room to queue the task, the task is run on the thread that submitted it, which here is the main thread. This lets the spider make use of all threads while also throttling it when it gets too much work: while the main thread is busy processing a task itself, it has no chance to generate more work. Once worker threads become free again, the main thread goes back to submitting new tasks to the pool.
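
The throttling effect can be observed in isolation. The following is a minimal, self-contained sketch rather than code from the spider: the class name CallerRunsDemo and the one-thread pool size are chosen purely for illustration. It occupies the pool's only worker with a blocked task and then submits a second task, which CallerRunsPolicy runs on the submitting thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {

    /** Returns, in submission order, "taskN:<thread name>" for each task that ran. */
    static List<String> runDemo() {
        // One worker thread, no queue capacity, caller-runs on rejection.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<String> ranOn = new CopyOnWriteArrayList<>();
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Task 1 occupies the only worker until it is released.
            pool.execute(() -> {
                ranOn.add("task1:" + Thread.currentThread().getName());
                workerBusy.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();

            // No idle worker and no queue room: the task is rejected and
            // CallerRunsPolicy runs it on this thread before execute() returns.
            pool.execute(() -> ranOn.add("task2:" + Thread.currentThread().getName()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println("caller thread: " + Thread.currentThread().getName());
        for (String entry : runDemo()) {
            System.out.println(entry);
        }
    }
}
```

The first task reports a pool thread and the second reports the caller thread. Because execute() only returns after the caller has finished the rejected task, the submitting thread is held back in the same way the spider's main thread is.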

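The queue is a SynchronousQueue. The Java ThreadPoolExecutor requires some sort of BlockingQueue to hold tasks awaiting execution, and SynchronousQueue is a BlockingQueue with no internal capacity: rather than storing waiting tasks, it passes each one directly from the submitting thread to a worker thread. As a result, a new task is rejected, and therefore run by CallerRunsPolicy, as soon as all worker threads are busy and the pool has reached its maximum size.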

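// SynchronousQueue has no capacity: each task is handed straight to a worker thread.
this.tasks = new SynchronousQueue<Runnable>();
this.threadPool = new ThreadPoolExecutor(options.corePoolSize,
    options.maximumPoolSize, options.keepAliveTime,
    TimeUnit.SECONDS, this.tasks);
// When every worker is busy, run the task on the submitting thread instead.
this.threadPool.setRejectedExecutionHandler(
    new ThreadPoolExecutor.CallerRunsPolicy());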
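
The zero-capacity behavior of SynchronousQueue can be checked directly. This sketch (the class and method names are illustrative only) shows that offer() fails when no consumer is waiting, and that put() completes only when another thread takes the element.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class SynchronousQueueDemo {

    /** Tries to insert with no consumer waiting; SynchronousQueue refuses it. */
    static boolean offerWithoutConsumer() {
        SynchronousQueue<Runnable> queue = new SynchronousQueue<>();
        return queue.offer(() -> { });
    }

    /** Hands one item from a producer thread to this thread and returns it. */
    static String handOff() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                queue.put("task"); // blocks until the consumer below takes it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            String received = queue.poll(5, TimeUnit.SECONDS);
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SynchronousQueue<Runnable> q = new SynchronousQueue<>();
        System.out.println("size = " + q.size()
                + ", remainingCapacity = " + q.remainingCapacity());
        System.out.println("offer with no waiting consumer: " + offerWithoutConsumer());
        System.out.println("handed off: " + handOff());
    }
}
```

Inside ThreadPoolExecutor, that failed offer() is what triggers the attempt to add a worker and, once maximumPoolSize is reached, the rejection handler.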

If any filters were specified, they are loaded at this point. Each entry in options.filter is a class name, which is loaded and instantiated through reflection.

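// Add filters. Each entry is a fully qualified class name that implements
// SpiderFilter and has a no-argument constructor.
if (options.filter != null) {
    for (String name : options.filter) {
        // Class.newInstance() is deprecated since Java 9; the replacement is
        // Class.forName(name).getDeclaredConstructor().newInstance().
        SpiderFilter filter = (SpiderFilter)
            Class.forName(name).newInstance();
        this.filters.add(filter);
    }
}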
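
The filter-loading step can be tried outside the spider with a small sketch. Here SpiderFilter and PdfFilter are hypothetical stand-ins defined inside the example, not the spider's actual interface or any shipped filter, and the isExcluded(String) method is an assumption made for illustration. The sketch uses the non-deprecated getDeclaredConstructor().newInstance() form, plus Class.asSubclass so that a misconfigured class name fails with a clear ClassCastException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FilterLoadingDemo {

    /** Hypothetical stand-in for the spider's SpiderFilter interface. */
    interface SpiderFilter {
        boolean isExcluded(String url);
    }

    /** Hypothetical example filter that excludes URLs ending in ".pdf". */
    static class PdfFilter implements SpiderFilter {
        @Override
        public boolean isExcluded(String url) {
            return url.toLowerCase(Locale.ROOT).endsWith(".pdf");
        }
    }

    /** Loads each named class, instantiates it with its no-argument constructor, and collects it. */
    static List<SpiderFilter> loadFilters(List<String> classNames)
            throws ReflectiveOperationException {
        List<SpiderFilter> filters = new ArrayList<>();
        for (String name : classNames) {
            Class<?> cls = Class.forName(name);
            // asSubclass throws ClassCastException if the class is not a SpiderFilter.
            SpiderFilter filter = cls.asSubclass(SpiderFilter.class)
                    .getDeclaredConstructor()
                    .newInstance();
            filters.add(filter);
        }
        return filters;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Binary names of nested classes use '$', e.g. "FilterLoadingDemo$PdfFilter".
        List<SpiderFilter> filters = loadFilters(List.of(PdfFilter.class.getName()));
        System.out.println(filters.get(0).isExcluded("http://example.com/report.pdf"));
    }
}
```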

Finally, we are ready to perform the startup operation specified in the SpiderOptions configuration. If the user requests STARTUP_RESUME, the workload manager is instructed to prepare to resume from where the last spider run left off. Otherwise, the workload is cleared.

// Perform startup.
if (options.startup.equalsIgnoreCase(
        SpiderOptions.STARTUP_RESUME)) {
    this.workloadManager.resume();
} else {
    this.workloadManager.clear();
}
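
The startup decision reduces to a single string comparison. In this sketch the constant value "resume" and the WorkloadManager interface are hypothetical stand-ins for SpiderOptions.STARTUP_RESUME and the spider's workload manager; a recording implementation shows which operation runs for each option.

```java
import java.util.ArrayList;
import java.util.List;

class StartupDemo {

    // Hypothetical value; the real constant is defined in SpiderOptions.
    static final String STARTUP_RESUME = "resume";

    /** Hypothetical stand-in for the spider's workload manager. */
    interface WorkloadManager {
        void resume();
        void clear();
    }

    /** Records which startup operation was invoked. */
    static class RecordingManager implements WorkloadManager {
        final List<String> calls = new ArrayList<>();

        @Override
        public void resume() { calls.add("resume"); }

        @Override
        public void clear() { calls.add("clear"); }
    }

    /** Resume only when explicitly requested (case-insensitive); otherwise clear. */
    static void performStartup(String startup, WorkloadManager manager) {
        // Calling equalsIgnoreCase on the constant means a null option
        // falls through to clear() instead of throwing NullPointerException.
        if (STARTUP_RESUME.equalsIgnoreCase(startup)) {
            manager.resume();
        } else {
            manager.clear();
        }
    }

    public static void main(String[] args) {
        for (String option : new String[] {"RESUME", "clear", null}) {
            RecordingManager m = new RecordingManager();
            performStartup(option, m);
            System.out.println(option + " -> " + m.calls);
        }
    }
}
```

Calling equalsIgnoreCase on the constant rather than on the option is a small defensive variation on the original code: a missing startup value is treated as a request to clear instead of causing an exception.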

After the constructor completes, the spider is ready to run. The Heaton Research Spider is designed so that you create a new Spider object for each spider run.
