URLStatus          The status of a URL, held in the memory workload manager.
WorkloadException  Thrown when the workload manager encounters a problem.
WorkloadManager    An interface that defines the spider's workload manager. Workload managers hold all of the URLs that the spider has encountered.
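To make the roles of these three workload entries concrete, here is a minimal in-memory sketch. It is not the book's code: the names SketchWorkloadManager, MemoryWorkload, Status, and SketchWorkloadException, along with every method signature, are assumptions made for illustration, and the real WorkloadManager interface may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkloadSketch {

    /** Hypothetical status values; the book's URLStatus class may differ. */
    enum Status { WAITING, PROCESSED, ERROR }

    /**
     * Hypothetical stand-in for WorkloadException. It is unchecked here
     * only to keep the sketch short.
     */
    static class SketchWorkloadException extends RuntimeException {
        SketchWorkloadException(String message) {
            super(message);
        }
    }

    /** Hypothetical interface; not the book's WorkloadManager signature. */
    interface SketchWorkloadManager {
        boolean add(String url);               // false if already known
        String next();                         // null if nothing is waiting
        void mark(String url, Status status);  // record the outcome
        Status statusOf(String url);           // null if never seen
    }

    /** Holds every URL encountered so far, in the order it was found. */
    static class MemoryWorkload implements SketchWorkloadManager {
        private final Map<String, Status> urls = new LinkedHashMap<>();

        public boolean add(String url) {
            if (url == null || url.isEmpty()) {
                throw new SketchWorkloadException("URL must not be empty");
            }
            if (urls.containsKey(url)) {
                return false; // already encountered, do not queue it twice
            }
            urls.put(url, Status.WAITING);
            return true;
        }

        public String next() {
            for (Map.Entry<String, Status> entry : urls.entrySet()) {
                if (entry.getValue() == Status.WAITING) {
                    return entry.getKey();
                }
            }
            return null;
        }

        public void mark(String url, Status status) {
            if (!urls.containsKey(url)) {
                throw new SketchWorkloadException("Unknown URL: " + url);
            }
            urls.put(url, status);
        }

        public Status statusOf(String url) {
            return urls.get(url);
        }
    }

    public static void main(String[] args) {
        MemoryWorkload workload = new MemoryWorkload();
        workload.add("http://www.example.com/");
        workload.add("http://www.example.com/about");
        workload.add("http://www.example.com/"); // duplicate, ignored

        String url;
        while ((url = workload.next()) != null) {
            System.out.println("Processing " + url);
            workload.mark(url, Status.PROCESSED);
        }
    }
}
```

The sketch keys a map on the URL so that a page found twice is only queued once, and it keeps insertion order so that URLs are handed out in the order they were discovered. A real spider would also need thread safety, which this sketch leaves out.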
We will review all but the simplest classes shown in Table 14.1, beginning with the Spider
class.
The Spider Class
As you will recall from Chapter 13, the Spider class is one of the most important
classes in the Heaton Research Spider. This section examines that class, which is
shown in Listing 14.1.
Listing 14.1: The Spider Class (Spider.java)
package com.heatonresearch.httprecipes.spider;

import java.io.*;
import java.net.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.logging.*;

import com.heatonresearch.httprecipes.spider.filter.*;
import com.heatonresearch.httprecipes.spider.workload.*;

public class Spider
{
  /**
   * The logger.
   */
  private static Logger logger = Logger
      .getLogger("com.heatonresearch.httprecipes.spider.Spider");

  /**
   * The object that the spider reports its findings to.
   */
  private SpiderReportable report;

  /**
   * A flag that indicates if this process should be
   * canceled.