Getting Work
The getWork method is used by the spider to receive individual URLs for the SpiderWorker objects to work on. The getWork method begins by polling the waiting queue. The program will wait up to five seconds for something to be placed in the queue.
URL url;
try {
    url = this.waiting.poll(5, TimeUnit.SECONDS);
Once a URL is located, the URL's status is set to WORKING and the workingCount is increased.
    if (url != null) {
        setStatus(url, null, URLStatus.Status.WORKING, -1);
        this.workingCount++;
    }
Return the URL that was found, or null if the poll timed out with nothing in the queue.
    return url;
} catch (InterruptedException e) {
    return null;
}
If the poll was interrupted, then return null. This indicates that no work could be found.
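The polling pattern described above can be tried in isolation. The following is a minimal sketch, not the spider's actual class: WorkQueueSketch, its Status enum, the use of plain String URLs, and the configurable timeout are all assumptions made for illustration. It also restores the thread's interrupt flag before returning null, which the listing above does not do.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the spider's workload manager.
class WorkQueueSketch {
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    private final BlockingQueue<String> waiting = new LinkedBlockingQueue<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final long timeoutMillis; // the text uses five seconds
    private int workingCount = 0;

    WorkQueueSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Queue a URL for a worker to pick up later.
    void add(String url) {
        statuses.put(url, Status.WAITING);
        waiting.add(url);
    }

    // Wait up to the timeout for a URL; null means no work was found.
    String getWork() {
        try {
            String url = waiting.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (url != null) {
                statuses.put(url, Status.WORKING);
                synchronized (this) {
                    workingCount++;
                }
            }
            return url;
        } catch (InterruptedException e) {
            // Assumption: preserve the interrupt so callers can still see it.
            Thread.currentThread().interrupt();
            return null;
        }
    }

    synchronized int getWorkingCount() {
        return workingCount;
    }

    Status getStatus(String url) {
        return statuses.get(url);
    }

    public static void main(String[] args) {
        WorkQueueSketch queue = new WorkQueueSketch(100);
        queue.add("http://www.example.com/");
        System.out.println(queue.getWork()); // the queued URL
        System.out.println(queue.getWork()); // null, after the timeout elapses
    }
}
```

A short timeout is used here only so the example finishes quickly. A worker loop would normally treat a null result as "nothing to do right now" and either poll again or shut down.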
Marking a URL as Error
If an error occurs while processing a URL, then that URL is marked as ERROR and the workingCount is decreased. The workingCount should never fall below zero; if it does, this is an error. The URL is also removed from the waiting queue.
this.workingCount--;
assert this.workingCount >= 0;
this.waiting.remove(url);
setStatus(url, null, URLStatus.Status.ERROR, -1);
Finally, the URL's status is set to ERROR.
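The bookkeeping for a finished URL can be sketched as follows. This is an illustrative sketch under stated assumptions, not the spider's own code: CompletionSketch, startWork, markError, markProcessed, and the shared finish helper are hypothetical names. It uses an explicit check instead of assert, because Java assertions are disabled unless the JVM is started with -ea.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the "mark as ERROR / PROCESSED" bookkeeping.
class CompletionSketch {
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    private final Queue<String> waiting = new ConcurrentLinkedQueue<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private int workingCount = 0;

    synchronized void addWaiting(String url) {
        statuses.put(url, Status.WAITING);
        waiting.add(url);
    }

    // Record that a worker has started on this URL.
    synchronized void startWork(String url) {
        statuses.put(url, Status.WORKING);
        workingCount++;
    }

    synchronized void markError(String url) {
        finish(url, Status.ERROR);
    }

    synchronized void markProcessed(String url) {
        finish(url, Status.PROCESSED);
    }

    // Shared steps: decrement the count, drop the URL from the
    // waiting queue, then record its final status.
    private void finish(String url, Status finalStatus) {
        if (workingCount <= 0) {
            // Decrementing here would push the count below zero.
            throw new IllegalStateException("no URL is currently being worked on");
        }
        workingCount--;
        waiting.remove(url);
        statuses.put(url, finalStatus);
    }

    synchronized int getWorkingCount() {
        return workingCount;
    }

    synchronized boolean isWaiting(String url) {
        return waiting.contains(url);
    }

    Status getStatus(String url) {
        return statuses.get(url);
    }
}
```

Checking the count before decrementing means a mismatched call fails loudly and leaves the counter unchanged, where a disabled assert would let a negative count go unnoticed.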
Marking a URL as Processed
If a URL has been successfully processed, then that URL is marked as PROCESSED and the workingCount is decreased. The workingCount should never fall below zero; if it does, this is an error. The URL is also removed from the waiting queue.
this.workingCount--;
assert this.workingCount >= 0;