    /**
     * Set the OutputStream.
     *
     * @param os
     *            The OutputStream.
     */
    public void setOutputStream(OutputStream os) {
        this.os = os;
    }
}
The read method performs the work done by this class. It reads a value from the
parent class and writes that value to the OutputStream. The value is then returned
to the calling method.
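The sketch below shows one way such a class can work, assuming it extends java.io.FilterInputStream. The class name LoggingInputStream, its two-argument constructor, and the demo class are hypothetical, not taken from the book's listing. It also overrides the three-argument read, because FilterInputStream forwards bulk reads straight to the wrapped stream and would otherwise bypass the logging in the single-byte read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: copies every byte read from the wrapped stream to a second OutputStream. */
class LoggingInputStream extends FilterInputStream {
    private OutputStream os; // destination for the copied bytes (for example, a log)

    LoggingInputStream(InputStream in, OutputStream os) {
        super(in);
        this.os = os;
    }

    public void setOutputStream(OutputStream os) {
        this.os = os;
    }

    @Override
    public int read() throws IOException {
        int value = super.read();          // read one byte from the parent class
        if (value != -1 && os != null) {   // -1 means end of stream, so nothing to copy
            os.write(value);               // copy the byte to the OutputStream
        }
        return value;                      // return the byte to the caller
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int count = super.read(b, off, len);   // bulk read from the wrapped stream
        if (count > 0 && os != null) {
            os.write(b, off, count);           // copy only the bytes actually read
        }
        return count;
    }
}

/** Hypothetical usage: drain a small stream and print what was logged. */
class LoggingInputStreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "GET /".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        try (InputStream in = new LoggingInputStream(new ByteArrayInputStream(data), log)) {
            while (in.read() != -1) {
                // keep reading until end of stream; each byte is copied to log
            }
        }
        System.out.println(new String(log.toByteArray(), StandardCharsets.US_ASCII)); // prints: GET /
    }
}
```

Because the caller still receives every value unchanged, this kind of wrapper can be slipped in front of any existing InputStream to record exactly what was received without altering the code that consumes it.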
Workload Management
A workload manager is a class that manages the list of URLs for the spider. The workload
manager tracks which URLs the spider has yet to visit, as well as which URLs resulted in an
error.
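As a rough illustration of that bookkeeping, the sketch below keeps a queue of URLs still to visit, a set of every URL already added (so a link found twice is queued only once), and a set of URLs that resulted in an error. The class SimpleWorkload and its methods are hypothetical and are not the book's workload manager interface; it holds everything in memory.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical minimal workload tracker, for illustration only. */
class SimpleWorkload {
    private final Queue<String> toVisit = new ArrayDeque<>(); // URLs the spider has yet to visit
    private final Set<String> seen = new HashSet<>();         // every URL ever added
    private final Set<String> errors = new LinkedHashSet<>(); // URLs that resulted in an error

    /** Queues a URL unless it was added before; returns true if it was queued. */
    boolean add(String url) {
        if (!seen.add(url)) {
            return false; // duplicate link, already known
        }
        toVisit.add(url);
        return true;
    }

    /** Returns the next URL to visit, or null when nothing is left. */
    String next() {
        return toVisit.poll();
    }

    /** Records that processing the URL failed. */
    void markError(String url) {
        errors.add(url);
    }

    /** Read-only view of the URLs that failed, in the order they failed. */
    Set<String> errors() {
        return Collections.unmodifiableSet(errors);
    }

    boolean isEmpty() {
        return toVisit.isEmpty();
    }
}
```

For example, adding the same URL twice returns true and then false, and only one copy is ever handed out by next().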
As the spider finds URLs, they are added to the workload. Initially, each URL is in the
WAITING state. As the spider processes the URL, its state changes accordingly. Table 14.3
lists the states that a URL goes through as it is processed.
Table 14.3: URL States

State       Purpose
ERROR       The URL has resulted in an error. The URL will not enter a new
            state after this one.
PROCESSED   The URL was processed successfully. The URL will not enter a new
            state after this one.
WAITING     The URL is waiting to be processed. The URL will enter the
            WORKING state once the spider is ready to process it.
WORKING     The spider is currently processing the URL. If processing the URL
            is successful, the URL will enter the PROCESSED state next. If
            processing the URL results in an error, the URL will enter the
            ERROR state next.
Figure 14.1 summarizes these states as a state diagram.
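The transitions in Table 14.3 can be encoded in a Java enum so that an illegal move, such as WAITING directly to PROCESSED, is rejected. This is a sketch; the enum name URLState and its methods are illustrative assumptions rather than the book's code.

```java
import java.util.EnumSet;
import java.util.Set;

/** The four URL states from Table 14.3 and the transitions the table allows (illustrative). */
enum URLState {
    WAITING, WORKING, PROCESSED, ERROR;

    /** Returns the states this state may move to next. */
    Set<URLState> nextStates() {
        switch (this) {
            case WAITING:
                return EnumSet.of(WORKING);              // picked up by the spider
            case WORKING:
                return EnumSet.of(PROCESSED, ERROR);     // success or failure
            default:
                return EnumSet.noneOf(URLState.class);   // PROCESSED and ERROR are final
        }
    }

    boolean canMoveTo(URLState target) {
        return nextStates().contains(target);
    }

    boolean isFinal() {
        return nextStates().isEmpty();
    }

    /** Returns target if the move is allowed by the table, otherwise throws. */
    URLState moveTo(URLState target) {
        if (!canMoveTo(target)) {
            throw new IllegalStateException(this + " cannot move to " + target);
        }
        return target;
    }
}
```

A workload manager could call moveTo whenever it updates a URL's state, so an attempt to reprocess a URL that is already PROCESSED or ERROR fails immediately instead of silently changing the workload.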