            logger.log(Level.SEVERE,
                "Caught exception at URL: " + this.url.toString(), e);
            this.spider.getReport().spiderURLError(this.url);
            return;
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    // Ignored: nothing useful can be done if close fails.
                }
            }
        }
        try {
            // Mark the URL as complete.
            this.spider.getWorkloadManager().markProcessed(this.url);
            logger.fine("Complete: " + this.url);
            if (!this.url.equals(connection.getURL())) {
                // The request was redirected, so save the final URL too.
                this.spider.getWorkloadManager().add(
                    connection.getURL(), this.url,
                    this.spider.getWorkloadManager().getDepth(
                        connection.getURL()));
                this.spider.getWorkloadManager().markProcessed(
                    connection.getURL());
            }
        } catch (WorkloadException e) {
            logger.log(Level.WARNING, "Error marking workload(3).", e);
        }
    }
}
As the thread pool processes the SpiderWorker objects presented to it, the run
method of each SpiderWorker object is executed. The run method begins by
logging the URL that it is currently processing. It then opens a connection to that URL.
try {
    logger.fine("Processing: " + this.url);
    // Get the URL's contents.
    connection = this.url.openConnection();
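The way a thread pool drives the run method of each worker can be sketched in isolation. The following is a minimal illustration built on the standard java.util.concurrent classes, not the spider's actual implementation. The names WorkerPoolSketch, DemoWorker, and processAll are hypothetical, and the worker only records its URL rather than opening a connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerPoolSketch {

    /** Hypothetical stand-in for SpiderWorker: one task per URL. */
    static class DemoWorker implements Runnable {
        private final String url;
        private final List<String> processed;

        DemoWorker(String url, List<String> processed) {
            this.url = url;
            this.processed = processed;
        }

        @Override
        public void run() {
            // A real worker would open a connection here;
            // this sketch only records that the URL was handled.
            processed.add(url);
        }
    }

    /** Submits one worker per URL, then waits for the pool to finish. */
    static List<String> processAll(List<String> urls, int threads) {
        // Thread-safe list, because several workers append concurrently.
        List<String> processed = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(new DemoWorker(url, processed));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> done = processAll(
            Arrays.asList("http://example.com/a", "http://example.com/b"), 2);
        System.out.println("Processed " + done.size() + " URLs");
    }
}
```

The pool decides which thread runs which worker, so the order in which URLs are recorded is not guaranteed.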
Next, the timeout values are set. The same timeout value is used for both connection and
read timeouts.
connection.setConnectTimeout(this.spider.getOptions().timeout);
connection.setReadTimeout(this.spider.getOptions().timeout);
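The timeout calls above can be tried outside the spider. The sketch below assumes a single timeout given in milliseconds, which is the unit that setConnectTimeout and setReadTimeout expect. The class TimeoutSketch, the method openWithTimeout, and the example URL are hypothetical. A value of 0 means an infinite timeout, and a negative value causes an IllegalArgumentException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class TimeoutSketch {

    /**
     * Creates a connection object for the address and applies the same
     * timeout, in milliseconds, to both the connect and read phases.
     */
    static URLConnection openWithTimeout(String address, int timeoutMillis) {
        try {
            // openConnection() only creates the connection object;
            // no network traffic occurs until connect() or a read.
            URLConnection connection = URI.create(address).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = openWithTimeout("http://example.com/", 5000);
        System.out.println("connect=" + connection.getConnectTimeout()
            + " ms, read=" + connection.getReadTimeout() + " ms");
    }
}
```

If either timeout expires once the connection is in use, a java.net.SocketTimeoutException is thrown. Because that exception is an IOException, a catch block for IOException also handles it.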