Table 14.1: The Heaton Research Spider Classes
Class                   Purpose
MemoryWorkloadManager   Manages the set of URLs the spider knows about, using the computer's memory.
OracleHolder            Holds the SQL statements used by the OracleWorkloadManager.
OracleWorkloadManager   Manages the set of URLs the spider knows about, using an Oracle database.
RepeatableStatement     Holds an SQL statement that can be repeated. SQL statements are repeated if the connection is broken.
RobotsFilter            Filters URLs using the bot exclusion file (robots.txt).
SimpleReport            A very simple SpiderReportable implementation. This class does nothing with the data reported by the spider.
Spider                  The main class for the spider. Through this class you will command the spider.
SpiderException         Thrown when the spider encounters an error it cannot handle.
SpiderFilter            An interface that defines how to create filters for the spider. Filters allow specific URLs to be excluded.
SpiderFormatter         A JDK logging formatter that displays the spider's log output in a simple way.
SpiderInputStream       A special InputStream that also writes everything it reads to an OutputStream. This class allows the spider to save HTML and parse it at the same time.
SpiderOptions           Holds configuration items for the spider. It also loads the configuration from a file.
SpiderParseHTML         A special version of the HTML parser for the spider. This version records any links found as the user program parses the HTML.
SpiderReportable        An interface that defines a class to which the spider can report its findings.
SpiderWorker            Performs the actual work of the spider. The SpiderWorker class is used to perform work in the thread pool.
SQLHolder               Contains all of the SQL statements used by the spider.
SQLWorkloadManager      Manages the set of URLs the spider knows about, using an SQL database.
Status                  The status of a URL, held in the SQL workload manager.
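
The SpiderInputStream entry describes a stream that copies everything it reads to an OutputStream, so that HTML can be saved and parsed in a single pass. The following is a minimal sketch of that idea, not the library's actual code: the class name TeeInputStreamSketch and its constructor are assumptions made for illustration, and the real SpiderInputStream may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration; not the library's SpiderInputStream.
class TeeInputStreamSketch extends FilterInputStream {
    private final OutputStream copy;

    TeeInputStreamSketch(InputStream in, OutputStream copy) {
        super(in);
        this.copy = copy;
    }

    // Single-byte read: pass the byte to the caller and also write it to the copy.
    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            copy.write(b);
        }
        return b;
    }

    // Bulk read: write exactly the bytes that were actually read.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            copy.write(buf, off, n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<a href=\"/next\">next</a>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream saved = new ByteArrayOutputStream();
        try (InputStream in = new TeeInputStreamSketch(new ByteArrayInputStream(html), saved)) {
            byte[] buf = new byte[8];
            // This loop stands in for an HTML parser consuming the stream.
            while (in.read(buf) != -1) {
                // parsing would happen here
            }
        }
        // The saved copy now holds every byte the "parser" consumed.
        System.out.println(new String(saved.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Bytes passed over with skip() are not copied in this sketch, so a caller that needs a complete saved copy should read the stream rather than skip through it.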
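
The SpiderFilter and RobotsFilter entries describe filters that exclude specific URLs, with RobotsFilter driven by robots.txt. The sketch below illustrates the general pattern under stated assumptions: the interface UrlFilterSketch, its isExcluded method, and the class DisallowPrefixFilter are hypothetical names, since the table does not show the real SpiderFilter methods. It also handles only plain "Disallow:" prefixes and ignores User-agent sections, which a complete robots.txt implementation would need to honor.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; the real SpiderFilter's methods are not shown in the table.
interface UrlFilterSketch {
    boolean isExcluded(URI url);
}

// Hypothetical filter: excludes URLs whose path starts with a "Disallow:" prefix.
class DisallowPrefixFilter implements UrlFilterSketch {
    private static final String KEY = "Disallow:";
    private final List<String> disallowed = new ArrayList<>();

    // Simplification: collects every Disallow line, regardless of User-agent section.
    DisallowPrefixFilter(String robotsTxt) {
        for (String line : robotsTxt.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String prefix = trimmed.substring(KEY.length()).trim();
                // An empty Disallow value means nothing is disallowed.
                if (!prefix.isEmpty()) {
                    disallowed.add(prefix);
                }
            }
        }
    }

    @Override
    public boolean isExcluded(URI url) {
        String path = url.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A spider built this way could consult each registered filter before adding a URL to its workload and skip any URL for which a filter returns true.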