      logger.log(Level.SEVERE,
          "Caught exception at URL:" + this.url.toString(), e);
      this.spider.getReport().spiderURLError(this.url);
      return;
    } finally {
      if (is != null) {
        try {
          is.close();
        } catch (IOException e) {
          // ignored: nothing useful can be done if the close fails
        }
      }
    }
    try {
      // mark URL as complete
      this.spider.getWorkloadManager().markProcessed(this.url);
      logger.fine("Complete: " + this.url);
      if (!this.url.equals(connection.getURL())) {
        // save the final URL as well (for redirects)
        this.spider.getWorkloadManager().add(
            connection.getURL(), this.url,
            this.spider.getWorkloadManager().getDepth(
                connection.getURL()));
        this.spider.getWorkloadManager().markProcessed(
            connection.getURL());
      }
    } catch (WorkloadException e) {
      logger.log(Level.WARNING, "Error marking workload(3).", e);
    }
  }
}
As the thread pool processes the SpiderWorker objects presented to it, the run method of each SpiderWorker object is executed. The run method begins by logging the URL that it is currently processing. Then a connection is opened to that URL.
try {
  logger.fine("Processing: " + this.url);
  // Get the URL's contents.
  connection = this.url.openConnection();
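The way a thread pool drives the run method of each worker can be illustrated with a minimal sketch. It assumes a standard java.util.concurrent fixed thread pool, which may differ from the pool the spider actually uses. The names WorkerPoolSketch, DemoWorker, and processAll are hypothetical stand-ins, and the worker only records its URL instead of opening a connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerPoolSketch {
  /** Hypothetical stand-in for SpiderWorker: one Runnable per URL. */
  static class DemoWorker implements Runnable {
    private final String url;
    private final List<String> processed;

    DemoWorker(String url, List<String> processed) {
      this.url = url;
      this.processed = processed;
    }

    @Override
    public void run() {
      // A real worker would open a connection here; this one only records the URL.
      processed.add(url);
    }
  }

  /** Submits one worker per URL and waits for the pool to finish them. */
  static List<String> processAll(List<String> urls, int threads)
      throws InterruptedException {
    // Thread-safe list, because several workers append concurrently.
    List<String> processed = new CopyOnWriteArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (String url : urls) {
      pool.execute(new DemoWorker(url, processed)); // the pool calls run()
    }
    pool.shutdown(); // accept no new work, let queued workers finish
    if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
      pool.shutdownNow(); // give up on workers that are still running
    }
    return processed;
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> done = processAll(
        Arrays.asList("http://www.example.com/a", "http://www.example.com/b"), 2);
    System.out.println("Processed " + done.size() + " URLs");
  }
}
```

The caller never invokes run directly. It hands each worker to the pool, and whichever pool thread becomes free executes it, so the order in which URLs are processed is not guaranteed.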
Next, the timeout values are set. The same timeout value is used for both connection and
read timeouts.
connection.setConnectTimeout(this.spider.getOptions().timeout);
connection.setReadTimeout(this.spider.getOptions().timeout);
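The timeout step can be tried in isolation with a small sketch. It assumes only the standard java.net.URLConnection API. The class TimeoutSketch, the helper openWithTimeout, and the 5000 millisecond value are illustrative and not part of the spider, which reads its timeout from its options object.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class TimeoutSketch {
  /**
   * Creates a connection object for the URL and applies one timeout
   * (in milliseconds) to both the connect phase and the read phase.
   * For an http URL, no network traffic occurs until the caller connects
   * or reads from the connection.
   */
  static URLConnection openWithTimeout(URL url, int timeoutMillis)
      throws IOException {
    URLConnection connection = url.openConnection();
    connection.setConnectTimeout(timeoutMillis); // limit on establishing the connection
    connection.setReadTimeout(timeoutMillis);    // limit on waiting for data to arrive
    return connection;
  }

  public static void main(String[] args) throws IOException {
    URL url = URI.create("http://www.example.com/").toURL();
    URLConnection connection = openWithTimeout(url, 5000);
    System.out.println("connect timeout: " + connection.getConnectTimeout());
    System.out.println("read timeout: " + connection.getReadTimeout());
  }
}
```

A timeout of zero means wait indefinitely, and a negative value causes an IllegalArgumentException. When either limit expires during an actual connect or read, a java.net.SocketTimeoutException is thrown, which is an IOException and so would be handled by a catch block for IOException.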