A hash code is not a unique identifier: two different values can produce the same hash code. Even so, hash codes allow us to break large sets of data down into small, manageable subsets that can be accessed very quickly.
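The following is a small illustrative sketch, not code from the SQLWorkloadManager. It shows both points above. Two different strings ("Aa" and "BB") share the same hash code, and hash codes are used to spread strings across a few buckets so that a lookup only has to search one bucket. The class name HashBuckets, the method bucketOf, the bucket count and the example URLs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: using String.hashCode() to split data into small buckets. */
public class HashBuckets {

    /** Maps a string to one of bucketCount buckets; floorMod keeps the index non-negative. */
    static int bucketOf(String s, int bucketCount) {
        return Math.floorMod(s.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        // Two different strings with the same hash code: hash codes are not unique.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"

        // Spread a few example URLs across 4 buckets.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            buckets.add(new ArrayList<>());
        }
        String[] urls = {
            "http://www.example.com/",
            "http://www.example.com/a.html",
            "http://www.example.net/"
        };
        for (String url : urls) {
            buckets.get(bucketOf(url, 4)).add(url);
        }

        // A later lookup only needs to scan the one bucket its hash points to.
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("bucket " + i + ": " + buckets.get(i));
        }
    }
}
```

Because colliding values land in the same bucket, a real lookup must still compare the full values inside that bucket. The hash only narrows the search.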
To compute a hash for a URL, the SQLWorkloadManager class uses the computeHash method, which relies heavily on Java's built-in hashCode method. The computeHash method begins by converting the URL to a String and trimming it.
String str = url.toString().trim();
Next, the hashCode method is called to generate a hash code for the trimmed string.
int result = str.hashCode();
result = (result % 0xffff);
return result;
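The remainder operation reduces the hash code to a much smaller range. Strictly speaking, the result is not a 16-bit value. Java's % operator keeps the sign of its left operand, so a negative hash code produces a result between -65534 and 0, and a positive one produces a result between 0 and 65534. The smaller value can then be stored in a regular integer column type.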
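The statements above can be assembled into a complete method as follows. This is a minimal sketch under stated assumptions. The text does not show the full signature of computeHash, so this version assumes it takes a java.net.URL and returns an int. The wrapper class UrlHash and the example URL are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical wrapper around a computeHash-style helper built from the steps in the text. */
public class UrlHash {

    /** Assumed signature: hashes a URL into the range -65534..65534. */
    public static int computeHash(URL url) {
        // Convert the URL to a string and remove surrounding whitespace.
        String str = url.toString().trim();
        // String.hashCode() may return any int, including negative values.
        int result = str.hashCode();
        // Java's % keeps the sign of the dividend, so the result lies in -65534..65534.
        result = (result % 0xffff);
        return result;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://www.example.com/").toURL();
        System.out.println(computeHash(url));
    }
}
```

If only non-negative values were wanted, Math.floorMod(str.hashCode(), 0xffff) would give a result from 0 to 65534 instead. That is a design alternative, not what the text describes.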
Workload Synchronization
The SQL workload manager must synchronize access to the workload. The SQLWorkloadManager uses two synchronization objects. The first, named addLock, is defined as follows:
private Semaphore addLock;
The addLock semaphore ensures that no two threads add URLs to the workload at exactly the same time. This prevents the same URL from being added to the workload two or more times. A semaphore is a synchronization object that allows a specified number of threads to access a resource simultaneously. The addLock semaphore was created to allow only one thread to access it at a time. A semaphore that allows only one thread at a time is sometimes called a mutex.
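The following minimal sketch shows how a one-permit Semaphore can guard a "check, then add" step so that two threads cannot both insert the same URL. It is not the SQLWorkloadManager code. The class GuardedWorkload is hypothetical, and its in-memory HashSet stands in for the SQL workload table. The check-then-add pair mirrors a query-then-insert against a database.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: a one-permit Semaphore guarding additions to a workload. */
public class GuardedWorkload {

    // One permit: only one thread may hold the lock at a time (mutex-style use).
    private final Semaphore addLock = new Semaphore(1);
    // In-memory stand-in for the SQL workload table.
    private final Set<String> workload = new HashSet<>();

    /** Adds url unless it is already present; returns true if it was added. */
    public boolean add(String url) throws InterruptedException {
        addLock.acquire();
        try {
            // Without the lock, two threads could both see "not present"
            // and both insert the same URL.
            if (workload.contains(url)) {
                return false;
            }
            workload.add(url);
            return true;
        } finally {
            addLock.release(); // always give the permit back
        }
    }

    public int size() {
        addLock.acquireUninterruptibly();
        try {
            return workload.size();
        } finally {
            addLock.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedWorkload w = new GuardedWorkload();
        // Two threads try to add the same 1000 URLs.
        Runnable task = () -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    w.add("http://www.example.com/page" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(w.size()); // prints 1000: each URL is stored once
    }
}
```

Releasing the permit in a finally block ensures that an exception thrown while adding cannot leave the semaphore permanently locked.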
The second synchronization object used by the SQLWorkloadManager, named workLatch, is defined as follows:
private CountDownLatch workLatch;
The workLatch synchronization object allows threads to wait until a workload becomes available. A latch allows one or more threads to wait for an event to occur. When the event occurs, the latch opens and releases all of the waiting threads. With Java's CountDownLatch, this happens when the latch's count reaches zero.
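The following sketch shows a CountDownLatch holding worker threads until work arrives, and shows that a single countDown() releases every waiting thread at once. It is an illustration under assumptions, not the SQLWorkloadManager implementation. The class LatchDemo and its methods addWork, awaitWork and nextUrl are hypothetical, and a ConcurrentLinkedQueue stands in for the SQL workload.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: threads wait on a CountDownLatch until work is available. */
public class LatchDemo {

    // Count of 1: one countDown() opens the latch for every waiting thread.
    private final CountDownLatch workLatch = new CountDownLatch(1);
    // In-memory stand-in for the workload.
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

    public void addWork(String url) {
        queue.add(url);
        workLatch.countDown(); // count reaches zero: all awaiting threads are released
    }

    /** Returns true once work is available, or false if the timeout expires first. */
    public boolean awaitWork(long timeoutMs) throws InterruptedException {
        return workLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Takes the next URL, or returns null if another thread already took it. */
    public String nextUrl() {
        return queue.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        LatchDemo demo = new LatchDemo();
        AtomicInteger released = new AtomicInteger();
        Thread[] workers = new Thread[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    if (demo.awaitWork(5000)) {
                        released.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        demo.addWork("http://www.example.com/");
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(released.get()); // prints 3: every waiting thread was released
    }
}
```

A CountDownLatch cannot be reset. Once it opens, it stays open. A workload manager built on this idea would therefore need to create a fresh latch whenever the workload runs empty and threads must wait again.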
Multiple Hosts
The ability to spider multiple hosts is one of the main reasons to use the SQL workload
manager. First, let me define what I mean by multiple hosts. Consider the following two
URLs. These URLs are on the same host: