A hash code is not a unique identifier: two different URLs can produce the same hash code.
Even so, hash codes let us break a large set of data down into small, manageable sets that
can be accessed very quickly.
To compute a hash for a URL, the SQLWorkloadManager class uses the
computeHash method. The computeHash method relies heavily on Java's built-in
hashCode method. It begins by converting the URL to a String and trimming
any leading and trailing whitespace.
String str = url.toString().trim();
Next, the hashCode method is called to generate a hash code for the trimmed URL.
int result = str.hashCode();
result = (result % 0xffff);
return result;
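The remainder operation limits the hash code to roughly 16 bits. Because Java's %
operator keeps the sign of its left operand, the result always lies between -65534 and
65534. This allows the hash code to be stored in a regular integer column type.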
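The steps above can be collected into a small, self-contained sketch. The class name UrlHashSketch, the use of java.net.URI as the URL type, and the example address are assumptions made for illustration; the actual SQLWorkloadManager may declare its method differently.

```java
import java.net.URI;

class UrlHashSketch {
    // Mirrors the three steps described above: trim, hashCode, remainder.
    // The URI parameter type is an assumption for this sketch.
    static int computeHash(URI url) {
        String str = url.toString().trim();
        int result = str.hashCode();
        // Java's % keeps the sign of the left operand, so the result
        // can be negative; it always lies between -65534 and 65534.
        result = (result % 0xffff);
        return result;
    }

    public static void main(String[] args) {
        URI a = URI.create("http://www.example.com/index.html");
        URI b = URI.create("http://www.example.com/index.html");
        // Equal URLs always produce equal hashes.
        System.out.println(computeHash(a) == computeHash(b)); // prints true
        // The magnitude of the hash never reaches 0xffff.
        System.out.println(Math.abs(computeHash(a)) < 0xffff); // prints true
    }
}
```

Because different URLs can share a hash, a lookup by hash would presumably still need to compare the full URL text before treating two entries as the same.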
Workload Synchronization
The SQL workload manager must synchronize access to the workload. There are
two synchronization objects used by the SQLWorkloadManager. The first, named
addLock, is defined as follows:
private Semaphore addLock;
The addLock semaphore is used to ensure that no two threads add to the workload
at exactly the same time. This prevents the same URL from being added to the workload
two or more times. A semaphore is a synchronization object that allows a specified number of
threads to access a resource simultaneously. The addLock semaphore was created to allow
only one thread to access the workload at a time. A semaphore that allows only one thread
access at a time is sometimes called a mutex.
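As a rough illustration of this idea, the following sketch guards an in-memory set with a one-permit Semaphore. The class AddLockSketch, its add and size methods, and the use of a HashSet in place of the SQL-backed workload are all hypothetical; only the addLock field name comes from the text.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;

class AddLockSketch {
    // One permit: at most one thread may hold the lock at a time.
    private final Semaphore addLock = new Semaphore(1);
    // Stand-in for the real workload, which lives in a SQL database.
    private final Set<String> workload = new HashSet<>();

    // Returns true if the URL was new, false if it was already present.
    boolean add(String url) {
        addLock.acquireUninterruptibly();
        try {
            // Check-then-insert is safe only because the permit is held.
            if (workload.contains(url)) {
                return false;
            }
            workload.add(url);
            return true;
        } finally {
            addLock.release();
        }
    }

    int size() {
        addLock.acquireUninterruptibly();
        try {
            return workload.size();
        } finally {
            addLock.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AddLockSketch manager = new AddLockSketch();
        Runnable task = () -> manager.add("http://www.example.com/");
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Both threads tried to add the same URL; only one copy is stored.
        System.out.println(manager.size()); // prints 1
    }
}
```

Releasing the permit in a finally block ensures the lock is returned even if the guarded code throws an exception.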
The second synchronization object used by the SQLWorkloadManager, named
workLatch, is defined as follows:
private CountDownLatch workLatch;
The workLatch synchronization object allows threads to wait until a workload
becomes available. A latch allows several threads to wait for an event to occur. When the
event occurs (the latch's count reaches zero), the waiting threads are released and can
go on to request work.
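The waiting behavior can be sketched with a CountDownLatch whose count is one. The class WorkLatchSketch and the methods signalWorkAvailable and waitForWork are hypothetical names for this illustration; only the workLatch field name comes from the text, and the real manager may create and use its latch differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class WorkLatchSketch {
    // Count of 1: a single countDown() call opens the latch.
    private final CountDownLatch workLatch = new CountDownLatch(1);

    // Called by a producer once work has been added.
    void signalWorkAvailable() {
        workLatch.countDown();
    }

    // Waits up to the timeout; returns true if the latch opened in time.
    boolean waitForWork(long timeout, TimeUnit unit) {
        try {
            return workLatch.await(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report that no work arrived.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WorkLatchSketch sketch = new WorkLatchSketch();
        Runnable worker = () -> {
            boolean ready = sketch.waitForWork(5, TimeUnit.SECONDS);
            System.out.println(Thread.currentThread().getName() + " released: " + ready);
        };
        Thread t1 = new Thread(worker, "worker-1");
        Thread t2 = new Thread(worker, "worker-2");
        t1.start();
        t2.start();
        sketch.signalWorkAvailable(); // opens the latch for both workers
        t1.join();
        t2.join();
    }
}
```

A CountDownLatch cannot be reset once its count reaches zero, so code that needs to wait repeatedly would have to create a fresh latch for each wait.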
Multiple Hosts
The ability to spider multiple hosts is one of the main reasons to use the SQL workload
manager. First, let me define what I mean by multiple hosts. Consider the following two
URLs. These URLs are on the same host: