Getting Work
The spider calls the getWork method to obtain individual URLs for the
SpiderWorker objects to process. The getWork method begins by polling the
waiting queue, blocking for up to five seconds until a URL is placed in
the queue.
URL url;
try {
url = this.waiting.poll(5, TimeUnit.SECONDS);
Once a URL is located, the URL's status is set to WORKING and the workingCount
is incremented.
if (url != null) {
    setStatus(url, null, URLStatus.Status.WORKING, -1);
    this.workingCount++;
}
Any URL that was found is returned. If the poll timed out, url is null and null is returned.
return url;
} catch (InterruptedException e) {
return null;
}
If the poll is interrupted, getWork returns null, which indicates that no work
could be found.
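The polling logic above can be sketched as a small standalone class. This is a minimal illustration, not the book's workload manager: the class name GetWorkSketch, the add method, and the use of String in place of URL are assumptions made to keep the example self-contained. It also uses an AtomicInteger for the counter and restores the thread's interrupt flag, neither of which appears in the original listing.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the workload manager; names are illustrative only.
class GetWorkSketch {
    // URLs waiting to be processed (plain strings here, for simplicity).
    private final BlockingQueue<String> waiting = new LinkedBlockingQueue<>();
    // Number of URLs currently handed out to workers.
    private final AtomicInteger workingCount = new AtomicInteger();

    // Assumed helper: places a URL on the waiting queue.
    void add(String url) {
        waiting.add(url);
    }

    // Waits up to the given timeout for a URL.
    // Returns null if the wait times out or the thread is interrupted.
    String getWork(long timeout, TimeUnit unit) {
        try {
            String url = waiting.poll(timeout, unit);
            if (url != null) {
                workingCount.incrementAndGet();
            }
            return url;
        } catch (InterruptedException e) {
            // Addition not in the original: preserve the interrupt for callers.
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int getWorkingCount() {
        return workingCount.get();
    }

    public static void main(String[] args) {
        GetWorkSketch manager = new GetWorkSketch();
        manager.add("http://www.example.com/");
        // The first call finds the queued URL; the second times out.
        System.out.println(manager.getWork(100, TimeUnit.MILLISECONDS));
        System.out.println(manager.getWork(100, TimeUnit.MILLISECONDS));
        System.out.println(manager.getWorkingCount());
    }
}
```

A worker that receives null from a method like this cannot tell a timeout from an interrupt, so callers would typically just retry or check whether the spider is shutting down.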
Marking a URL as Error
If an error occurs while processing a URL, then that URL is marked as ERROR and the
workingCount is decreased. The count should never fall below zero; if it does, this
indicates an error. The URL is then removed from the waiting queue.
this.workingCount--;
assert this.workingCount >= 0;
this.waiting.remove(url);
setStatus(url, null, URLStatus.Status.ERROR, -1);
Finally, the URL's status is set to ERROR.
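The error-marking steps can be sketched in isolation as follows. This is a minimal illustration under stated assumptions: the class MarkErrorSketch, its Status enum, the startWork helper, and the status map are hypothetical stand-ins for the book's workload manager and its setStatus method. Because Java assert statements are skipped unless the JVM is started with -ea, the sketch checks the counter explicitly and throws instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the workload manager; names are illustrative only.
class MarkErrorSketch {
    enum Status { WAITING, WORKING, ERROR, PROCESSED }

    private final BlockingQueue<String> waiting = new LinkedBlockingQueue<>();
    private final Map<String, Status> statuses = new HashMap<>();
    private int workingCount = 0;

    // Assumed helper: records that a worker has picked up the URL.
    synchronized void startWork(String url) {
        statuses.put(url, Status.WORKING);
        workingCount++;
    }

    // Marks a URL as failed and releases its slot in the working count.
    synchronized void markError(String url) {
        // Explicit check instead of assert, so it runs even without -ea.
        if (workingCount <= 0) {
            throw new IllegalStateException("no URL is currently being worked on");
        }
        workingCount--;
        waiting.remove(url);             // make sure the URL is not queued again
        statuses.put(url, Status.ERROR); // finally, record the ERROR status
    }

    synchronized int getWorkingCount() {
        return workingCount;
    }

    synchronized Status getStatus(String url) {
        return statuses.get(url);
    }

    public static void main(String[] args) {
        MarkErrorSketch manager = new MarkErrorSketch();
        manager.startWork("http://www.example.com/broken");
        manager.markError("http://www.example.com/broken");
        System.out.println(manager.getStatus("http://www.example.com/broken"));
        System.out.println(manager.getWorkingCount());
    }
}
```

Checking the counter before decrementing keeps it from ever going negative, which is one way to enforce the invariant described above.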
Marking a URL as Processed
If a URL has been successfully processed, then that URL is marked as PROCESSED
and the workingCount is decreased. The count should never fall below zero; if it
does, this indicates an error. The URL is then removed from the waiting queue.
this.workingCount--;
assert this.workingCount >= 0;