Table 14.4: Methods and Functions in the WorkloadManager Interface
Method or Function  Purpose
add                 Adds the specified URL to the workload. Returns true if the
                    URL was added, false otherwise.
clear               Clears the workload.
contains            Returns true if the workload contains the specified URL.
convertURL          Converts the specified String to a URL. If the string is too
                    long or has other issues, throws a WorkloadException.
                    Returns the URL with any necessary conversion.
getCurrentHost      Returns the current host.
getDepth            Returns the depth for the specified URL.
getSource           Returns the source for the specified URL.
getWork             Returns the next URL that needs to be processed. Also marks
                    this URL as currently being processed.
init                Sets up the workload manager.
markError           Marks the specified URL as having an error.
markProcessed       Marks the specified URL as successfully processed.
nextHost            Returns the next host to be processed.
resume              Sets up the workload to resume from a previous attempt.
waitForWork         If there is currently no work available, waits until a new
                    URL has been added to the workload.
workloadEmpty       Returns true if the workload is empty.
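
The way a spider might drive these methods can be sketched as a simple loop: take a URL with getWork, try to process it, then report the outcome with markProcessed or markError. The sketch below is illustrative only. The WorkQueue interface is a hypothetical, trimmed-down stand-in for WorkloadManager that uses plain strings and simplified signatures, and the SpiderLoop class, its drain method, and the fetch callback are likewise assumptions rather than part of the Heaton Research Spider.

```java
import java.util.function.Predicate;

/**
 * Hypothetical subset of the methods in Table 14.4. The signatures are
 * simplified assumptions (String URLs, no exceptions), not the real interface.
 */
interface WorkQueue {
    /** Adds a URL; returns true if it was added. */
    boolean add(String url);

    /** Returns the next URL to process, or null when nothing is waiting. */
    String getWork();

    /** Records that the URL was processed successfully. */
    void markProcessed(String url);

    /** Records that processing the URL failed. */
    void markError(String url);
}

/** Hypothetical driver that empties a workload one URL at a time. */
final class SpiderLoop {
    private SpiderLoop() {
    }

    /**
     * Processes every waiting URL. The fetch callback stands in for the real
     * download step and returns true on success.
     *
     * @return the number of URLs that were processed successfully
     */
    static int drain(WorkQueue workload, Predicate<String> fetch) {
        int succeeded = 0;
        String url;
        while ((url = workload.getWork()) != null) {
            if (fetch.test(url)) {
                workload.markProcessed(url);
                succeeded++;
            } else {
                workload.markError(url);
            }
        }
        return succeeded;
    }
}
```

Because drain depends only on the interface, the same loop would work unchanged with any workload manager that implements it, which is the point of defining the interface in the first place.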
By defining a workload management interface, the Heaton Research Spider can be
programmed to use a variety of workload managers. Currently, there are only two workload
managers defined for the Heaton Research Spider. These workload managers are listed here:
• Memory Workload Management
• SQL Workload Management
Each of these workload managers is discussed in the next few sections.
Memory Workload Management
The most basic workload management type provided by the spider is the
memory-based workload. The memory-based workload is contained in the class
MemoryWorkloadManager. This class stores the complete list of URLs in memory.
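
To make the idea of a memory-based workload concrete, here is a minimal sketch of one way to keep every URL and its state in memory. It is not the actual MemoryWorkloadManager. The class name InMemoryWorkload, the Status enum, the statusOf helper, the use of plain strings for URLs, and the rule that the workload counts as empty once nothing is waiting or in progress are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical in-memory workload; not the spider's MemoryWorkloadManager. */
final class InMemoryWorkload {
    /** Assumed lifecycle states for a URL. */
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    // Every URL ever added, with its current state.
    private final Map<String, Status> status = new HashMap<>();
    // URLs that have been added but not yet handed out by getWork.
    private final Queue<String> waiting = new ArrayDeque<>();

    /** Adds a URL; returns false if the URL is already in the workload. */
    synchronized boolean add(String url) {
        if (status.containsKey(url)) {
            return false;
        }
        status.put(url, Status.WAITING);
        waiting.add(url);
        return true;
    }

    /** Returns true if the URL has ever been added. */
    synchronized boolean contains(String url) {
        return status.containsKey(url);
    }

    /** Returns the next waiting URL and marks it WORKING, or null if none. */
    synchronized String getWork() {
        String url = waiting.poll();
        if (url != null) {
            status.put(url, Status.WORKING);
        }
        return url;
    }

    /** Marks a known URL as successfully processed; unknown URLs are ignored. */
    synchronized void markProcessed(String url) {
        status.replace(url, Status.PROCESSED);
    }

    /** Marks a known URL as failed; unknown URLs are ignored. */
    synchronized void markError(String url) {
        status.replace(url, Status.ERROR);
    }

    /** Assumption: empty means no URL is waiting or being worked on. */
    synchronized boolean workloadEmpty() {
        return !status.containsValue(Status.WAITING)
                && !status.containsValue(Status.WORKING);
    }

    /** Forgets every URL. */
    synchronized void clear() {
        status.clear();
        waiting.clear();
    }

    /** Illustrative helper: returns the state of a URL, or null if unknown. */
    synchronized Status statusOf(String url) {
        return status.get(url);
    }
}
```

Since everything lives in a HashMap and a queue, this kind of workload is fast and needs no setup, but its size is bounded by available memory and its contents are lost when the program exits.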