Table 15.2: Instance Variables of the RepeatableStatement Class
Instance Variable   Purpose
driver              The driver for the JDBC connection.
url                 The URL for the JDBC connection.
addLock             Ensures that only one thread at a time can add to the workload.
workLatch           Signals whether any work is available; threads wait on this latch until work arrives.
maxURLSize          The maximum size of a URL, determined from the column size of the URL field in the SPIDER_WORKLOAD table.
maxHostSize         The maximum size of a host name, determined from the column size of the HOST field in the SPIDER_HOST table.
statements          All of the RepeatableStatement objects.
workResultSet       Used to obtain the next URL.
hostResultSet       Used to obtain the next host.
connection          A connection to a JDBC database.
currentHost         The current host.
currentHostID       The ID of the current host.
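The roles of addLock and workLatch can be illustrated with a minimal in-memory sketch. It assumes that addLock is a one-permit java.util.concurrent.Semaphore and that workLatch is a CountDownLatch; the class WorkloadSketch, its methods, and the in-memory queue standing in for the SPIDER_WORKLOAD table are all hypothetical, not the book's code.

```java
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an in-memory queue stands in for the SPIDER_WORKLOAD table.
public class WorkloadSketch {
    // addLock (assumed to be a Semaphore): one permit, so only one thread adds work at a time.
    private final Semaphore addLock = new Semaphore(1);
    // workLatch (assumed to be a CountDownLatch): opens when the first piece of work arrives.
    private final CountDownLatch workLatch = new CountDownLatch(1);
    // Only accessed while holding addLock.
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> workload = new ConcurrentLinkedQueue<>();

    /** Adds a URL unless it was added before; returns true if it was new. */
    public boolean add(String url) {
        addLock.acquireUninterruptibly();
        try {
            // The duplicate check and the insert must happen as one step,
            // otherwise two threads could both add the same URL.
            if (!seen.add(url)) {
                return false;
            }
            workload.add(url);
            workLatch.countDown(); // wake any threads waiting for work
            return true;
        } finally {
            addLock.release();
        }
    }

    /** Blocks up to the timeout; returns true once any work has been added. */
    public boolean waitForWork(long timeout, TimeUnit unit) {
        try {
            return workLatch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Removes and returns the next URL, or null if the workload is empty. */
    public String next() {
        return workload.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        WorkloadSketch manager = new WorkloadSketch();
        Thread spider = new Thread(() -> {
            if (manager.waitForWork(5, TimeUnit.SECONDS)) {
                System.out.println("Processing " + manager.next());
            }
        });
        spider.start();
        manager.add("http://www.example.com/");
        spider.join();
    }
}
```

A CountDownLatch cannot be reset once it has counted down, so this sketch only signals the first arrival of work; a long-running workload manager would need to replace the latch or use a different primitive.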
There are also several methods and functions that make up the
RepeatableStatement class. These methods and functions will be discussed in the
next few sections.
Generating Hash Codes
URLs can become quite long; some exceed 2,000 characters. In practice, very few URLs ever reach this length, but because they can, the spider must support them. This is why the URL field of the SPIDER_WORKLOAD table is 2,083 characters wide, which is the maximum URL length supported by many web browsers.
This large field size presents a problem. The spider frequently needs to look up whether a URL has already been processed, and a field this long is very difficult to index in a database. To get around this issue, we create a hash of the URL and store it in a field named URL_HASH. Using the hash, we can quickly narrow the search down to just a few rows. Additionally, because the hash is a number, it can be indexed very efficiently.
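The narrowing step can be modeled in memory. In this sketch a HashMap keyed by String.hashCode() plays the role of the database index on URL_HASH, and each map entry holds the few rows that share one hash; the class UrlHashIndex is hypothetical. In SQL the equivalent lookup would presumably filter on both columns (something like WHERE URL_HASH = ? AND URL = ?), but that query is an assumption, not taken from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an indexed URL_HASH column.
public class UrlHashIndex {
    // Key: hash of the URL (the indexed column); value: the rows sharing that hash.
    private final Map<Integer, List<String>> rowsByHash = new HashMap<>();

    public void add(String url) {
        rowsByHash.computeIfAbsent(url.hashCode(), h -> new ArrayList<>()).add(url);
    }

    public boolean contains(String url) {
        // Step 1: a fast lookup on the numeric hash narrows the search to a few rows.
        List<String> candidates =
                rowsByHash.getOrDefault(url.hashCode(), Collections.emptyList());
        // Step 2: compare full URLs, because different URLs can share a hash.
        return candidates.contains(url);
    }

    public static void main(String[] args) {
        UrlHashIndex index = new UrlHashIndex();
        index.add("http://www.example.com/Aa");
        // "Aa" and "BB" have the same String hash, so these two URLs collide.
        String other = "http://www.example.com/BB";
        System.out.println(other.hashCode() == "http://www.example.com/Aa".hashCode()); // true
        System.out.println(index.contains("http://www.example.com/Aa")); // true
        System.out.println(index.contains(other)); // false
    }
}
```

Because a hash is far smaller than the URL it summarizes, collisions cannot be ruled out; the final full-string comparison is what makes the lookup exact, while the hash keeps the number of comparisons small.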
You may be wondering what a hash is. Hashing is a very common programming technique that converts a String, or another type of object, into a number. Java supports hashing all the way down to the Object level: the Object class contains a method called hashCode, which returns a hash for any Java object.
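For Strings, the value returned by hashCode is not arbitrary: the java.lang.String documentation specifies it as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in int arithmetic. The sketch below recomputes it by hand (the helper stringHash is illustrative only) and shows that equal strings always produce equal hashes, which is what lets a hash stored in URL_HASH be recomputed later and still match. By contrast, the default Object.hashCode is typically based on object identity, so it would not be suitable for values persisted in a database.

```java
public class HashCodeDemo {
    /** Recomputes String.hashCode() from its documented formula (Horner's rule). */
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // int overflow wraps around, matching the JDK's int arithmetic
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/index.html";
        System.out.println(stringHash(url) == url.hashCode()); // true
        // A distinct String object with the same characters hashes identically.
        String copy = new String(url);
        System.out.println(copy != url && copy.hashCode() == url.hashCode()); // true
        System.out.println("Hi".hashCode()); // 2337
    }
}
```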