} catch (Throwable e) {
    try {
        this.spider.getWorkloadManager().markError(this.url);
    } catch (WorkloadException e1) {
        // Log the failure to mark the URL (e1), not the original exception.
        logger.log(Level.WARNING, "Error marking workload(2).", e1);
    }
    logger.log(Level.SEVERE, "Caught exception at URL:" + this.url.toString(), e);
    this.spider.getReport().spiderURLError(this.url);
    return;
A finally block ensures that the InputStream is closed.
} finally {
    if (is != null) {
        try {
            is.close();
        } catch (IOException e) {
            // Ignored: nothing useful can be done if close() fails.
        }
    }
}
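On Java 7 or later, the same cleanup guarantee can also be written with try-with-resources. The following is only a minimal sketch of that alternative, not the spider's own code; the class name CloseSketch and the method countBytes are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the spider.
class CloseSketch {

    // Reads the stream to the end and returns the number of bytes seen.
    // try-with-resources closes the stream on normal and exceptional exit.
    static int countBytes(InputStream source) {
        try (InputStream is = source) {
            int count = 0;
            while (is.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            // Wrap the checked exception so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBytes(in)); // prints 5
    }
}
```

Unlike the explicit finally block, this form needs no null check, because the resource is only declared once it has been obtained.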
If no exceptions have occurred by this point, the URL can be marked as processed in the
workload manager.
try {
    // Mark URL as complete.
    this.spider.getWorkloadManager().markProcessed(this.url);
    logger.fine("Complete: " + this.url);
    if (!this.url.equals(connection.getURL())) {
Sometimes the spider requests one URL and receives another. This is the case with an
HTTP redirect, where the requested URL sends the client on to a different URL. If this happens,
the redirect URL must be added to the workload and marked as processed as well. The following
lines of code do this.
        this.spider.getWorkloadManager().add(connection.getURL(),
            this.url,
            this.spider.getWorkloadManager().getDepth(connection.getURL()));
        this.spider.getWorkloadManager().markProcessed(connection.getURL());
    }
If any errors occur marking the workload, they are logged.
} catch (WorkloadException e) {
    logger.log(Level.WARNING, "Error marking workload(3).", e);
}
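The redirect bookkeeping above can be tried in isolation with a small in-memory stand-in. This is a sketch under stated assumptions, not the spider's real workload manager: the classes RedirectSketch and Workload and the methods complete, statusOf and depthOf are hypothetical. It uses java.net.URI rather than URL for the keys, and it simply reuses the requested URL's depth for the redirect target, which may differ from what the actual workload manager does.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of the spider.
class RedirectSketch {

    enum Status { WAITING, PROCESSED }

    // Hypothetical in-memory stand-in for a workload manager.
    static final class Workload {
        private final Map<URI, Status> status = new HashMap<>();
        private final Map<URI, Integer> depth = new HashMap<>();

        // Adds a URL if it is not already known.
        void add(URI url, int urlDepth) {
            status.putIfAbsent(url, Status.WAITING);
            depth.putIfAbsent(url, urlDepth);
        }

        void markProcessed(URI url) {
            if (!status.containsKey(url)) {
                throw new IllegalStateException("Unknown URL: " + url);
            }
            status.put(url, Status.PROCESSED);
        }

        Status statusOf(URI url) { return status.get(url); }

        int depthOf(URI url) { return depth.get(url); }
    }

    // Marks the requested URL as processed and, if the server answered
    // from a different URL, records and marks that URL as well.
    static void complete(Workload workload, URI requested, URI actual) {
        workload.markProcessed(requested);
        if (!requested.equals(actual)) {
            // Assumption: the redirect target inherits the requested depth.
            workload.add(actual, workload.depthOf(requested));
            workload.markProcessed(actual);
        }
    }

    public static void main(String[] args) {
        Workload workload = new Workload();
        URI requested = URI.create("http://example.com/old");
        URI actual = URI.create("http://example.com/new");
        workload.add(requested, 1);
        complete(workload, requested, actual);
        System.out.println(workload.statusOf(requested));
        System.out.println(workload.statusOf(actual));
    }
}
```

Marking the redirect target as processed keeps the spider from downloading the same page a second time if another page links to the target directly.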
The thread pool will continue processing SpiderWorker objects until the spider has
no more work to do.
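As a rough illustration of a pool that runs worker tasks until the work is exhausted, the sketch below uses the standard java.util.concurrent executor API. It is not the spider's actual thread pool: the class PoolSketch, the method processAll and the 30-second timeout are hypothetical, and each task only records its URL where a real worker would download and parse the page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; not part of the spider.
class PoolSketch {

    // Submits one task per URL and waits until the pool has drained.
    static Set<String> processAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> done = ConcurrentHashMap.newKeySet();
        for (String url : urls) {
            // Placeholder work: a real worker would fetch and parse the URL.
            pool.execute(() -> done.add(url));
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        try {
            // Assumption: 30 seconds is an arbitrary illustrative limit.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/1",
                "http://example.com/2",
                "http://example.com/3");
        System.out.println(processAll(urls, 2).size()); // prints 3
    }
}
```

A real spider differs in one important way: workers discover new URLs while running, so the pool cannot be shut down until the workload itself reports that nothing is waiting.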