Java Reference
In-Depth Information
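    // Let each filter prepare for the new host; the robots.txt
    // file is read at this point.
    for (SpiderFilter filter : this.filters)
    {
      try
      {
        filter.newHost(host, this.options.userAgent);
      } catch (IOException e)
      {
        logger.log(Level.INFO,
            "Error while reading robots.txt file: "
            + e.getMessage());
      }
    }
    // now process this host
    do
    {
      url = this.workloadManager.getWork();
      if (url != null)
      {
        // Hand the URL to a worker running on the thread pool.
        SpiderWorker worker = new SpiderWorker(this, url);
        this.threadPool.execute(worker);
      } else
      {
        // No work is available; wait up to 60 seconds for more.
        this.workloadManager.waitForWork(60, TimeUnit.SECONDS);
      }
      // Continue while work remains or workers are still active,
      // unless the spider has been canceled.
    } while (((url != null) ||
        (this.threadPool.getActiveCount() > 0))
        && !this.cancel);
  }
}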
As the listing above shows, the spider uses a number of instance variables to track its
current state and to hold its configuration information. These instance variables are
summarized in Table 14.2.
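The dispatch loop at the end of the listing can be tried in isolation. The following is only a minimal sketch, not the spider's actual classes: DispatchLoopSketch is a hypothetical name, a BlockingQueue stands in for the workload manager (its timed poll plays the combined role of getWork and waitForWork), and a counter stands in for SpiderWorker. The wait is shortened from 60 seconds to 100 milliseconds so the sketch finishes quickly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the spider; every name here is an assumption.
class DispatchLoopSketch
{
  // Plays the role of the workload manager: a thread-safe queue of URLs.
  private final BlockingQueue<String> workload =
      new LinkedBlockingQueue<String>();
  // Fixed pool of four worker threads.
  private final ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
      4, 4, 0L, TimeUnit.MILLISECONDS,
      new LinkedBlockingQueue<Runnable>());
  private final AtomicInteger processed = new AtomicInteger();
  private volatile boolean cancel = false;

  // Queues the seed URLs, dispatches them, and returns how many were processed.
  public int run(List<String> seeds)
  {
    workload.addAll(seeds);
    try
    {
      String url;
      do
      {
        // Take the next URL, or wait briefly if none is queued yet.
        url = workload.poll(100, TimeUnit.MILLISECONDS);
        if (url != null)
        {
          // A real worker would download and parse the page here.
          threadPool.execute(() -> processed.incrementAndGet());
        }
        // Stop once no work is left and no worker is busy, or on cancel.
      } while ((url != null || threadPool.getActiveCount() > 0)
          && !cancel);
      // Let any tasks that are already submitted finish before counting.
      threadPool.shutdown();
      threadPool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e)
    {
      Thread.currentThread().interrupt();
      threadPool.shutdownNow();
    }
    return processed.get();
  }

  public static void main(String[] args)
  {
    List<String> seeds = Arrays.asList(
        "http://a.example/", "http://b.example/", "http://c.example/");
    System.out.println(new DispatchLoopSketch().run(seeds));
  }
}
```

The loop tests the active thread count as well as the URL because, in a real spider, a worker that is still running may add newly discovered links to the workload. An empty queue therefore does not by itself mean that the host is finished. Note that getActiveCount returns only an approximate value, which is why the sketch also shuts the pool down and waits for termination before reading the counter.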