Finally, if the variable is numeric, then the value variable is parsed as an integer.
} else {
    int x = Integer.parseInt(value);
    field.set(this, x);
}
This process is repeated for each line of the configuration file.
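The per-line step described above can be sketched as follows. This is a minimal illustration, not the spider's actual loader: the class name OptionsSketch, the fields timeout and userAgent, and the "name: value" line format are all assumptions made for the example.

```java
import java.lang.reflect.Field;

// Hypothetical options holder; not the spider's real options class.
class OptionsSketch {
    // Example option fields (assumed). They are public so getField can find them.
    public int timeout = 0;
    public String userAgent = "";

    // Applies one "name: value" line (assumed format) to the matching public field.
    void loadLine(String line) {
        int i = line.indexOf(':');
        if (i < 0) {
            return; // not a name/value pair, so skip it
        }
        String name = line.substring(0, i).trim();
        String value = line.substring(i + 1).trim();
        try {
            Field field = getClass().getField(name);
            if (field.getType() == String.class) {
                // String options are stored as-is.
                field.set(this, value);
            } else {
                // Anything else is treated as numeric and parsed as an integer.
                int x = Integer.parseInt(value);
                field.set(this, x);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot set option: " + name, e);
        }
    }

    public static void main(String[] args) {
        OptionsSketch options = new OptionsSketch();
        options.loadLine("timeout: 30");
        options.loadLine("userAgent: ExampleBot");
        System.out.println(options.timeout + " " + options.userAgent);
    }
}
```

Calling loadLine once per line of the file fills in every option whose name matches a public field.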
Understanding the Thread Pool
The spider uses a thread pool to perform its tasks. The thread pool programming pattern
creates a fixed number of threads, N, and organizes the tasks to be performed in a queue.
The queue feeds work to the threads: as soon as a thread completes its task, it requests the
next task from the queue, until all tasks have been completed. At that point, the thread can
either terminate or sleep until new tasks are available. The number of threads is tuned to
increase overall performance.
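To make the pattern concrete, here is a hand-rolled sketch of it, assuming Java 8 or later for the lambda syntax. The class SimplePoolSketch and its runAll method are hypothetical names used only for illustration, and the workers simply terminate once the queue is empty rather than sleeping.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified pool: n threads drain one shared queue of tasks.
class SimplePoolSketch {
    // Runs every queued task on n worker threads and returns how many tasks ran.
    static int runAll(int n, Queue<Runnable> queue) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                Runnable task;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((task = queue.poll()) != null) {
                    task.run();
                    completed.incrementAndGet();
                }
            });
            workers[i].start();
        }
        // Wait for every worker to finish before reporting the count.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            queue.add(() -> System.out.println("task " + id + " on "
                    + Thread.currentThread().getName()));
        }
        System.out.println("completed: " + runAll(3, queue));
    }
}
```

A thread-safe queue is used because several workers poll it at the same time.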
Thread pools address two different problems. First, they usually improve performance when
executing large numbers of asynchronous tasks, because threads are reused rather than
created anew for each task. Second, a thread pool provides a means of managing the
resources (including threads) consumed when executing a collection of tasks.
As of JDK 1.5, Java provides direct support for thread pools through the ThreadPoolExecutor
class. In addition to creating a ThreadPoolExecutor, you must also supply an object that
implements the BlockingQueue interface, such as a LinkedBlockingQueue, to hold the waiting
tasks. Using these two objects, you can implement a thread pool.
To use the thread pool, you submit tasks to run. These tasks are objects created from
classes that implement the Runnable interface. As new tasks are submitted to the thread
pool, they are held in the queue until a thread is free to execute them.
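The following sketch shows one way these pieces fit together using the standard java.util.concurrent classes. The class ExecutorSketch, the method runTasks, and the choice of LinkedBlockingQueue as the queue are assumptions made for illustration. This is not the spider's own code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the names here are not part of the spider.
class ExecutorSketch {
    // Submits taskCount trivial tasks to a pool of `threads` threads
    // and returns how many of them completed.
    static int runTasks(int threads, int taskCount) {
        // The queue holds tasks that are waiting for a free thread.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        // Core and maximum pool size are equal, giving a fixed-size pool.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.SECONDS, queue);
        final AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Each task is an object that implements Runnable.
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    completed.incrementAndGet();
                }
            });
        }
        // Accept no new tasks, then wait for the queued ones to finish.
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runTasks(4, 20));
    }
}
```

Calling shutdown does not discard queued tasks. It only stops the pool from accepting new ones, so every task submitted beforehand still runs.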
The thread pool is particularly valuable to a spider. Even on a single-processor computer,
using a thread pool will considerably increase the performance of the spider. This is because
a spider spends a good deal of time waiting. When a spider submits a request to a web server,
the spider immediately begins waiting for the response. It is much faster to wait on several
web pages at once than on just one at a time.
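The effect of overlapping waits can be illustrated with a small simulation, assuming Java 8 or later. Here Thread.sleep stands in for the time spent waiting on a web server. The class WaitingSketch and its methods are hypothetical, and no real network requests are made.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; sleeping stands in for waiting on a web server.
class WaitingSketch {
    // Simulates one request that spends all of its time waiting.
    static void fakeRequest(long waitMillis) {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the elapsed milliseconds to finish `requests` simulated
    // requests using a pool of `threads` threads.
    static long timeRequests(int threads, int requests, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> fakeRequest(waitMillis));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  " + timeRequests(1, 8, 100) + " ms");
        System.out.println("8 threads: " + timeRequests(8, 8, 100) + " ms");
    }
}
```

With one thread the waits happen back to back. With eight threads they overlap, so the total time approaches that of a single wait even though no extra processing power is involved.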
Constructing a Worker Class
To use a Java thread pool, you must add objects that implement the Runnable interface.
These objects are the individual workers that carry out the work of the thread pool. The
Heaton Research Spider uses the SpiderWorker class for this purpose. The SpiderWorker
class is shown in Listing 14.3.