g.fillRect(10, y, (int) (progressWidth * this.donePercent), 16);
Finally, a black border is drawn around the full width of the progress bar. The white region
left inside the border shows the user how much processing remains.
g.setColor(Color.BLACK);
g.drawRect(10, y, progressWidth, 16);
The bar will be updated until it reaches 100%.
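The two drawing steps above can be tried outside the recipe's window with a small off-screen sketch. The class name ProgressBarSketch, the helper filledWidth, the green fill color, and the image size are assumptions made for illustration. Only the fillRect and drawRect calls mirror the fragments shown above, and donePercent is assumed to be a fraction between 0.0 and 1.0.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ProgressBarSketch {

    // Width in pixels of the filled part of the bar.
    // donePercent is assumed to be a fraction from 0.0 to 1.0 and is clamped to that range.
    static int filledWidth(int progressWidth, double donePercent) {
        double clamped = Math.max(0.0, Math.min(1.0, donePercent));
        return (int) (progressWidth * clamped);
    }

    // Draws the bar into an off-screen image so no window is needed.
    static BufferedImage render(int progressWidth, double donePercent) {
        int y = 10;
        BufferedImage image =
                new BufferedImage(progressWidth + 21, 36, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();

        // White background, so the unfilled part of the bar stays white.
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, image.getWidth(), image.getHeight());

        // Filled region. The color is an assumption; the source does not show it.
        g.setColor(Color.GREEN);
        g.fillRect(10, y, filledWidth(progressWidth, donePercent), 16);

        // Black border around the total width of the bar.
        g.setColor(Color.BLACK);
        g.drawRect(10, y, progressWidth, 16);

        g.dispose();
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = render(200, 0.25);
        System.out.println("image: " + image.getWidth() + "x" + image.getHeight());
        System.out.println("filled width: " + filledWidth(200, 0.25));
    }
}
```

Clamping the fraction is a defensive choice in this sketch, so that a value slightly above 1.0 cannot paint past the border.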
Summary
A spider is a special kind of bot. A spider scans HTML pages and looks for more pages to
visit. A spider would theoretically continue finding URLs forever, or until it has visited every
URL on the Internet. In practice, two factors keep a spider from doing this. First,
a spider is often given a maximum depth to visit. If a page is deeper, relative to the home page,
than this depth, the spider will not visit it. Second, spiders are often instructed to stay within
a specified set of hosts. This set is often just one host.
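The two limits can be illustrated with a minimal sketch that walks an in-memory link map instead of fetching real pages. This is not the Heaton Research Spider's code. The class BoundedCrawlSketch, the crawl method, and the example.com style URLs are hypothetical, and the map stands in for downloading and parsing HTML. The home page is assumed to be on an allowed host.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoundedCrawlSketch {

    // Returns the pages in the order they were visited.
    // "links" maps each page to the URLs found on it (a stand-in for real fetching).
    static List<String> crawl(String home, Map<String, List<String>> links,
                              int maxDepth, Set<String> allowedHosts) {
        List<String> visited = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the "already seen" set
        Deque<String> queue = new ArrayDeque<>();

        depthOf.put(home, 0); // the home page is depth 0
        queue.add(home);

        while (!queue.isEmpty()) {
            String url = queue.remove();
            visited.add(url);
            int nextDepth = depthOf.get(url) + 1;

            for (String link : links.getOrDefault(url, List.of())) {
                boolean tooDeep = nextDepth > maxDepth;                               // limit 1
                boolean offHost = !allowedHosts.contains(URI.create(link).getHost()); // limit 2
                if (tooDeep || offHost || depthOf.containsKey(link)) {
                    continue;
                }
                depthOf.put(link, nextDepth);
                queue.add(link);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("http://a.example/", List.of("http://a.example/one", "http://b.example/"));
        links.put("http://a.example/one", List.of("http://a.example/two"));
        System.out.println(crawl("http://a.example/", links, 1, Set.of("a.example")));
    }
}
```

With a maximum depth of 1 and a single allowed host, the sketch visits the home page and /one only: /two is too deep and b.example is outside the host set.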
This chapter showed how to use the Heaton Research Spider. The Heaton Research
Spider is an open source spider, written in Java and C#, and is available for free from Heaton
Research, Inc. To use the Heaton Research Spider you must create two objects.
First, a SpiderOptions object must be created to provide the spider with some
basic configuration options. The SpiderOptions properties can either be set directly,
or loaded from a file.
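As a rough illustration of the "set directly or load from a file" idea, the sketch below keeps a few settings as plain fields and fills them from key=value text using java.util.Properties. It is not the SpiderOptions class. The class name CrawlSettingsSketch, its field names, its defaults, and the key=value format are all assumptions, and the real class may use a different file format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CrawlSettingsSketch {
    // Hypothetical settings with defaults; each can also be assigned directly.
    int maxDepth = -1;            // -1 is used here to mean "no depth limit"
    int timeoutMillis = 60000;
    String userAgent = "SketchBot";

    // Loads settings from key=value text; keys that are absent keep their defaults.
    static CrawlSettingsSketch load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);

        CrawlSettingsSketch s = new CrawlSettingsSketch();
        s.maxDepth = Integer.parseInt(
                props.getProperty("maxDepth", String.valueOf(s.maxDepth)).trim());
        s.timeoutMillis = Integer.parseInt(
                props.getProperty("timeoutMillis", String.valueOf(s.timeoutMillis)).trim());
        s.userAgent = props.getProperty("userAgent", s.userAgent).trim();
        return s;
    }

    // Convenience wrapper for text that is already in memory.
    static CrawlSettingsSketch fromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Option 1: set values directly.
        CrawlSettingsSketch direct = new CrawlSettingsSketch();
        direct.maxDepth = 2;

        // Option 2: load them from text; a java.io.FileReader could be passed to load() instead.
        CrawlSettingsSketch loaded = fromString("maxDepth=3\nuserAgent=ExampleBot\n");
        System.out.println(direct.maxDepth + " " + loaded.maxDepth + " " + loaded.userAgent);
    }
}
```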
Second, a WorkloadManager is also required. For simple spiders, you may
choose to use the MemoryWorkloadManager. This will store all URLs in the
computer's memory. For larger spiders, you should use the SQLWorkloadManager. The
SQLWorkloadManager stores the URL workload in an SQL database.
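The idea of a workload that can live either in memory or in a database can be sketched with one small interface and an in-memory implementation. The names UrlWorkload and InMemoryWorkload are hypothetical, and the methods are not the WorkloadManager API. The sketch only shows why a spider can stay unaware of where its URL list is stored.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class WorkloadSketch {

    // Hypothetical contract: the spider only needs to add URLs and ask for the next one.
    interface UrlWorkload {
        boolean add(String url); // false if the URL was already known
        String next();           // null when nothing is waiting
        boolean isEmpty();
    }

    // Keeps everything in memory: simple, but limited by available RAM.
    // A database-backed class could implement the same interface for larger crawls.
    static class InMemoryWorkload implements UrlWorkload {
        private final Queue<String> waiting = new ArrayDeque<>();
        private final Set<String> known = new HashSet<>();

        @Override
        public boolean add(String url) {
            if (!known.add(url)) {
                return false; // never queue the same URL twice
            }
            waiting.add(url);
            return true;
        }

        @Override
        public String next() {
            return waiting.poll();
        }

        @Override
        public boolean isEmpty() {
            return waiting.isEmpty();
        }
    }

    public static void main(String[] args) {
        UrlWorkload workload = new InMemoryWorkload();
        workload.add("http://a.example/");
        workload.add("http://a.example/one");
        workload.add("http://a.example/"); // duplicate, ignored
        while (!workload.isEmpty()) {
            System.out.println(workload.next());
        }
    }
}
```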
This chapter provided four recipes. The first recipe showed how to use a spider to check
for bad links on a web site. The second recipe showed how to use a spider to download a site.
The third recipe showed how to create a spider that accesses a large number of URLs and does
not restrict itself to a single host. The fourth recipe showed how to display statistics from
the database as a spider executes.
Now that you know how to use the Heaton Research Spider, the next chapter will take
you through the internals of how the Heaton Research Spider works. If you are content with
only using the Heaton Research Spider and do not yet wish to learn how a spider is built,
you may safely skip to Chapter 16 and learn how to create well-behaved bots;
otherwise, continue through Chapters 14 and 15 and learn the internals of the Heaton
Research Spider.