Clearing the Workload
Clearing the workload is easy. The workload and waiting collections are cleared,
and workingCount is reset to zero.
this.workload.clear();
this.waiting.clear();
this.workingCount = 0;
The resume method is not implemented because the memory workload does not persist
its data between runs of the program, so there is nothing to resume.
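As a rough illustration of what clear() resets, here is a minimal in-memory workload sketch. The class name MemoryWorkloadSketch, its nested Status class, and the add and take methods are hypothetical stand-ins rather than the book's classes; only the three pieces of state cleared in the text (workload, waiting, workingCount) come from the original. Keys are java.net.URI instead of java.net.URL because URL.equals and URL.hashCode may perform blocking host-name resolution. Throwing UnsupportedOperationException from resume is this sketch's own choice; a silent no-op would fit the text equally well.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory workload used only to illustrate clear(); not the book's class.
class MemoryWorkloadSketch {
    // Minimal stand-in for URLStatus: the depth and the page the URL was found on.
    static final class Status {
        final int depth;
        final URI source;

        Status(int depth, URI source) {
            this.depth = depth;
            this.source = source;
        }
    }

    // Every URL seen so far, keyed by URI to avoid URL.equals/hashCode name lookups.
    private final Map<URI, Status> workload = new HashMap<>();
    // URLs waiting to be processed.
    private final Queue<URI> waiting = new ArrayDeque<>();
    // URLs handed to workers and not yet finished.
    private int workingCount = 0;

    // Records a URL the first time it is seen and queues it for processing.
    synchronized void add(URI url, URI source, int depth) {
        if (!workload.containsKey(url)) {
            workload.put(url, new Status(depth, source));
            waiting.add(url);
        }
    }

    // Hands out the next waiting URL (or null if none) and counts it as in progress.
    synchronized URI take() {
        URI next = waiting.poll();
        if (next != null) {
            workingCount++;
        }
        return next;
    }

    // The same three steps as the text: forget all URLs, the queue, and the counter.
    synchronized void clear() {
        workload.clear();
        waiting.clear();
        workingCount = 0;
    }

    // An in-memory workload keeps nothing between runs, so there is nothing to resume.
    void resume() {
        throw new UnsupportedOperationException("in-memory workload cannot resume");
    }

    synchronized int size() { return workload.size(); }
    synchronized int getWaitingCount() { return waiting.size(); }
    synchronized int getWorkingCount() { return workingCount; }
}
```

After a few add and take calls, a single clear() returns size(), getWaitingCount(), and getWorkingCount() to zero, which is the state a fresh spider run would start from.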
Getting the Depth of a URL
An important aspect of workload management is tracking the depth of each URL
encountered. The depth of a URL was stored when the URL was added to the workload. To
determine the depth of a URL, its URLStatus entry is read from the map.
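// Look up the status recorded when the URL was added to the workload
URLStatus s = this.workload.get(url);
// Asking for an unknown URL is a programming error (checked only with -ea)
assert (s != null);
if (s != null) {
    return s.getDepth();
} else {
    // Fallback used when assertions are disabled and the URL is unknown
    return 1;
}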
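An assert is used to ensure that the URL is found: if the spider asks for the depth
of a URL that has not yet been added, that is an error. Java assertions are disabled
unless the JVM is started with -ea (or -enableassertions), so in a normal run the check
is skipped and the method falls back to returning a depth of 1.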
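The following sketch shows one way a depth could be recorded when a URL is added and read back with the same assert-then-fallback pattern. The class DepthTracker, its addRoot and addLink methods, and the rules that a starting URL has depth 1 and that a link is one level deeper than the page it was found on are assumptions made for illustration, not the book's code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of depth tracking; class and method names are illustrative.
class DepthTracker {
    // Minimal stand-in for URLStatus holding the depth and the source page.
    static final class Status {
        private final int depth;
        private final URI source;

        Status(int depth, URI source) {
            this.depth = depth;
            this.source = source;
        }

        int getDepth() {
            return depth;
        }
    }

    private final Map<URI, Status> workload = new HashMap<>();

    // Assumed convention: a starting URL is at depth 1.
    void addRoot(URI url) {
        workload.putIfAbsent(url, new Status(1, null));
    }

    // Assumed convention: a link is one level deeper than the page it was found on.
    // putIfAbsent keeps the depth from the first time the URL was seen.
    void addLink(URI url, URI source) {
        workload.putIfAbsent(url, new Status(getDepth(source) + 1, source));
    }

    // Mirrors the text: assert during development, fall back to depth 1 otherwise.
    int getDepth(URI url) {
        Status s = workload.get(url);
        assert s != null : "depth requested for unknown URL " + url;
        if (s != null) {
            return s.getDepth();
        } else {
            return 1;
        }
    }
}
```

Run with plain `java`, assertions are off and a lookup for an unknown URL quietly returns 1; run with `java -ea`, the same lookup throws an AssertionError, which surfaces the bug during development.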
Getting the Source of a URL
The source of each URL is tracked as well; the source of a URL is the page on which
the URL was found. Finding the source of a URL works much like finding its depth: it is
read from the URL's URLStatus entry in the map.
URLStatus s = this.workload.get(url);
if (s == null) {
    // Unknown URL: there is no source to report
    return null;
} else {
    // The page on which this URL was found
    return s.getSource();
}
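If no URLStatus entry is found for the specified URL, null is returned. Unlike the
depth lookup, no assertion is made here, so asking for the source of an unknown URL is
not treated as an error.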
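Because each entry remembers the page it was found on, the sources form a chain back to a starting URL. This sketch uses a hypothetical SourceTracker class and pathTo helper (neither comes from the original) to read sources with the same null-returning lookup and follow them to reconstruct how the spider reached a page. Requiring a link's source to be recorded already is an assumption of the sketch; together with keeping only the first recorded source, it guarantees every chain ends at a starting URL instead of looping.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of source tracking; names are illustrative, not the book's API.
class SourceTracker {
    // Minimal stand-in for URLStatus holding only the source page.
    static final class Status {
        private final URI source;

        Status(URI source) {
            this.source = source;
        }

        URI getSource() {
            return source;
        }
    }

    private final Map<URI, Status> workload = new HashMap<>();

    // Records where a URL was first found; a null source marks a starting URL.
    // Assumption: the source page must already be recorded, and the first
    // recorded source is kept, so sources always point to older entries.
    void add(URI url, URI source) {
        if (source != null && !workload.containsKey(source)) {
            throw new IllegalArgumentException("unknown source: " + source);
        }
        workload.putIfAbsent(url, new Status(source));
    }

    // Same shape as the text: null when no status entry exists for the URL.
    URI getSource(URI url) {
        Status s = workload.get(url);
        if (s == null) {
            return null;
        } else {
            return s.getSource();
        }
    }

    // Follows sources back to the starting URL and returns the path root-first.
    // An unknown URL yields a path containing only itself.
    List<URI> pathTo(URI url) {
        Deque<URI> path = new ArrayDeque<>();
        for (URI u = url; u != null; u = getSource(u)) {
            path.addFirst(u);
        }
        return new ArrayList<>(path);
    }
}
```

A chain like this can be handy when debugging a crawl, for example to explain why the spider ended up on an unexpected page.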