this.waiting.remove(url);
setStatus(url, null, URLStatus.Status.PROCESSED, -1);
Finally, the URL's status is set to PROCESSED.
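The two steps above can be illustrated with a tiny self-contained class. This is a minimal sketch under stated assumptions, not the spider's own code. It assumes `waiting` is a `List<URL>`. It also simplifies the workload map to hold only a hypothetical `Status` enum, so it calls `put` directly where the spider calls setStatus. `ProcessedSketch`, `add`, and `url` are illustrative names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified workload: only a status per URL plus a waiting list.
final class ProcessedSketch {
    // Only PROCESSED appears in the text; the other constants are assumed.
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    final List<URL> waiting = new ArrayList<>();      // URLs not yet handed out
    final Map<URL, Status> workload = new HashMap<>(); // status of every known URL

    // Builds a URL without a checked exception, to keep the examples short.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // Queues a URL for processing (illustrative helper).
    void add(URL url) {
        waiting.add(url);
        workload.put(url, Status.WAITING);
    }

    void markProcessed(URL url) {
        // Take the URL off the waiting list so it is not handed out again...
        waiting.remove(url);
        // ...then record that it has been processed (stands in for setStatus).
        workload.put(url, Status.PROCESSED);
    }
}
```

After `add(u)` followed by `markProcessed(u)`, the waiting list is empty and the map reports `PROCESSED` for `u`.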
Setting a URL Status
Both the markProcessed and markError methods rely on the setStatus
method to actually set the status for a URL. The setStatus method accepts a URL, a page
source, a status, and a page depth, in that order. The status is always set; setting the
source and depth is optional. If you do not wish to change the source, pass null for source.
If you do not wish to change the depth, pass -1 for depth.
The setStatus method begins by attempting to access the URLStatus object in
the map for the specified URL. If no status object is found, then one is created.
URLStatus s = this.workload.get(url);
if (s == null) {
    s = new URLStatus();
    this.workload.put(url, s);
}
s.setStatus(status);
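The lookup-and-create sequence above can also be written with `Map.computeIfAbsent`, which has been part of `java.util.Map` since Java 8. This is a sketch of that alternative, not the spider's code. `URLStatus` here is a hypothetical, stripped-down stand-in with only a status field, and `StatusLookupSketch`, `statusFor`, and `url` are illustrative names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the spider's URLStatus class (status only).
final class URLStatus {
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    private Status status;

    Status getStatus() { return status; }
    void setStatus(Status status) { this.status = status; }
}

final class StatusLookupSketch {
    final Map<URL, URLStatus> workload = new HashMap<>();

    // Builds a URL without a checked exception, to keep the examples short.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // Same effect as get / null check / put: reuse the stored URLStatus
    // if there is one, otherwise create, store, and return a new one.
    URLStatus statusFor(URL url) {
        return workload.computeIfAbsent(url, key -> new URLStatus());
    }
}
```

Calling `statusFor` twice with the same URL returns the same object, so the map never holds more than one status per URL. If the map were a `ConcurrentHashMap`, `computeIfAbsent` would also perform the check and the insert atomically, which the separate get and put calls do not guarantee.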
If a value was specified for source, then set the source for the URLStatus object.
if (source != null) {
    s.setSource(source);
}
If a value was specified for depth, then set the depth for the URLStatus object.
if (depth != -1) {
    s.setDepth(depth);
}
The workload manager uses this method internally any time the status is set.
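Putting the fragments together, a complete setStatus might look like the sketch below. It assumes the source is itself a `URL` and uses a simplified, hypothetical `URLStatus` class. `WorkloadSketch`, `get`, and `url` are illustrative names, not the spider's API. The method body follows the steps described above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the spider's URLStatus class.
final class URLStatus {
    enum Status { WAITING, WORKING, PROCESSED, ERROR }

    private Status status;
    private URL source; // assumed to be the page the URL was found on
    private int depth;

    Status getStatus() { return status; }
    void setStatus(Status status) { this.status = status; }
    URL getSource() { return source; }
    void setSource(URL source) { this.source = source; }
    int getDepth() { return depth; }
    void setDepth(int depth) { this.depth = depth; }
}

final class WorkloadSketch {
    private final Map<URL, URLStatus> workload = new HashMap<>();

    // Builds a URL without a checked exception, to keep the examples short.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    URLStatus get(URL url) {
        return workload.get(url);
    }

    // Pass null for source or -1 for depth to leave that field unchanged.
    void setStatus(URL url, URL source, URLStatus.Status status, int depth) {
        URLStatus s = workload.get(url);
        if (s == null) {             // first time this URL is seen
            s = new URLStatus();
            workload.put(url, s);
        }
        s.setStatus(status);         // the status is always updated
        if (source != null) {
            s.setSource(source);
        }
        if (depth != -1) {
            s.setDepth(depth);
        }
    }
}
```

For example, `setStatus(page, parent, WAITING, 1)` followed by `setStatus(page, null, PROCESSED, -1)` leaves `page` marked PROCESSED while keeping the source and depth from the first call. The sentinel values keep the signature short. The trade-off is that a caller cannot use them to clear a field: a source can never be reset to null, nor a depth set to -1.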
Summary
In Chapter 13 you saw how to use the Heaton Research Spider. In this chapter you saw
how the Heaton Research Spider was constructed. This chapter is intended for those who
want to see the inner workings of the Heaton Research Spider, rather than simply using it.
The spider uses a thread pool to work more efficiently. The thread pool lets the spider take
advantage of multiprocessor systems, but it also helps on a single-processor system. This is
because the spider spends a great deal of its time waiting for web servers to respond. A
thread pool allows the spider to wait on many URLs at the same time, rather than one after
another.
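To see why a pool helps even on one processor, the sketch below replaces real HTTP requests with `Thread.sleep`. It assumes each request spends nearly all of its time waiting on the server, and it performs no network access. It uses the standard `java.util.concurrent` executor classes. It is not the Heaton Research Spider's thread-pool code, and `PoolSketch`, `fakeDownload`, and `downloadAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class PoolSketch {

    // Pretend to download one URL: almost all waiting, almost no CPU work.
    static String fakeDownload(String url, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return "contents of " + url;
    }

    // Runs every download on a fixed-size pool and returns the results
    // in the same order as the input list.
    static List<String> downloadAll(List<String> urls, int threads, long latencyMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                // Each task sleeps on its own pool thread, so the waits overlap.
                pending.add(pool.submit(() -> fakeDownload(url, latencyMillis)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that download finishes
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

With eight threads, eight simulated 200 ms downloads overlap and should finish in a little over 200 ms, instead of the roughly 1.6 seconds a single thread would need. A sleeping thread does not occupy the processor, so the gain does not depend on having more than one core.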