this.waiting.remove(url);
setStatus(url, null, URLStatus.Status.PROCESSED, -1);
Finally, the URL's status is set to PROCESSED.
Setting a URL Status
Both the markProcessed and markError methods rely on the setStatus method to actually set
the status for a URL. The setStatus method accepts a URL, a page source, a status, and a
page depth. Setting the source and depth is optional. If you do not wish to affect the
source, pass null for source. If you do not wish to affect the depth, pass negative one
(-1) for depth.
The setStatus method begins by attempting to access the URLStatus object in the map for
the specified URL. If no status object is found, then one is created.
URLStatus s = this.workload.get(url);
if (s == null) {
    s = new URLStatus();
    this.workload.put(url, s);
}
s.setStatus(status);
If a value was specified for source, then set the source for the URLStatus object.
if (source != null) {
    s.setSource(source);
}
If a value was specified for depth (anything other than -1), then set the depth for the
URLStatus object.
if (depth != -1) {
    s.setDepth(depth);
}
The workload manager uses this method internally any time the status is set.
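The fragments above can be assembled into one method. The following is a minimal sketch rather than the spider's actual source: URLStatusSketch and WorkloadSketch are hypothetical stand-ins, URLs are held as plain strings for simplicity, and every status name other than PROCESSED is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for URLStatus; only the members used below.
class URLStatusSketch {
    // PROCESSED appears in the text; WAITING and ERROR are assumed names.
    enum Status { WAITING, PROCESSED, ERROR }

    private Status status;
    private String source; // assumption: the source page kept as a String
    private int depth;

    Status getStatus() { return status; }
    void setStatus(Status status) { this.status = status; }
    String getSource() { return source; }
    void setSource(String source) { this.source = source; }
    int getDepth() { return depth; }
    void setDepth(int depth) { this.depth = depth; }
}

// Hypothetical in-memory workload holding one status object per URL.
class WorkloadSketch {
    private final Map<String, URLStatusSketch> workload = new HashMap<>();

    void setStatus(String url, String source,
                   URLStatusSketch.Status status, int depth) {
        // Look up the status object, creating it on first use.
        URLStatusSketch s = this.workload.get(url);
        if (s == null) {
            s = new URLStatusSketch();
            this.workload.put(url, s);
        }
        // The status is always updated.
        s.setStatus(status);
        // null means "leave the source unchanged".
        if (source != null) {
            s.setSource(source);
        }
        // -1 means "leave the depth unchanged".
        if (depth != -1) {
            s.setDepth(depth);
        }
    }

    URLStatusSketch get(String url) {
        return this.workload.get(url);
    }

    public static void main(String[] args) {
        WorkloadSketch w = new WorkloadSketch();
        // First call records the source and depth.
        w.setStatus("http://example.com/a", "http://example.com/",
                URLStatusSketch.Status.WAITING, 2);
        // Second call changes only the status.
        w.setStatus("http://example.com/a", null,
                URLStatusSketch.Status.PROCESSED, -1);
        URLStatusSketch s = w.get("http://example.com/a");
        System.out.println(s.getStatus() + " " + s.getSource() + " " + s.getDepth());
    }
}
```

Because null and -1 act as "no change" markers, the second call in main updates the status while the source and depth recorded by the first call are kept.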
Summary
In Chapter 13 you saw how to use the Heaton Research Spider. In this chapter you saw
how the Heaton Research Spider was constructed. This chapter is intended for those who
want to see the inner workings of the Heaton Research Spider, rather than simply using it.
The spider uses thread pools to work more efficiently. In addition to allowing the spider
to execute more effectively on multi-processor systems, the thread pool allows even a
single-processor system to execute more efficiently. This is because the spider spends a
great deal of time waiting for pages to download. A thread pool allows the spider to wait
on a large number of URLs at the same time.
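The effect of waiting on many URLs at once can be illustrated with the standard java.util.concurrent classes. This is a sketch under stated assumptions, not the spider's own thread pool code: PoolWaitSketch, fakeFetch, and fetchAll are hypothetical names, and the download is simulated with a short sleep instead of real network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolWaitSketch {
    // Simulated download: the thread mostly waits, as a network read would.
    static String fakeFetch(String url) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return "fetched " + url;
    }

    // Submits every URL to a fixed-size pool and collects results in order.
    static List<String> fetchAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fakeFetch(url)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that task finishes
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a", "b", "c", "d");
        long start = System.currentTimeMillis();
        List<String> results = fetchAll(urls, 4);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(results.size() + " pages in about " + elapsed + " ms");
    }
}
```

With four threads the four simulated waits overlap, so the total elapsed time should be close to one wait instead of four, even on a single processor.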