Not every bot programmer will use bots ethically. Because of this, web sites are forced to take action to curtail bot usage. This chapter covers some of the ways that web sites do this. The purpose of this chapter is not to teach you to circumvent any of these mechanisms. Rather, this chapter makes you aware of them so that your bots always act in a well-behaved manner. The most common methods used to curtail bot usage are listed here:
• CAPTCHAs
• Bot Exclusion File
• User-Agent Filtering
This chapter will explore each of these methods.
Using a CAPTCHA
One of the most common methods used to thwart bot access is the CAPTCHA. You have
likely seen CAPTCHAs on popular web sites. A CAPTCHA displays an image of distorted text
and asks the user to enter the characters displayed. Figure 16.1 shows four CAPTCHAs from
popular web sites.
Figure 16.1: Four CAPTCHAs
CAPTCHA is an acronym for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”. The term CAPTCHA is trademarked by Carnegie Mellon University. Luis von Ahn, Manuel Blum, and Nicholas J. Hopper of Carnegie Mellon University, together with John Langford of IBM, coined the term in the year 2000. Most CAPTCHAs ask the user to type the letters displayed in a distorted on-screen image, sometimes along with an additional obscured sequence of letters or digits.
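To make the mechanism concrete, here is a minimal sketch of the server side of a simple text CAPTCHA in Java. It assumes a site that generates its own challenges with the standard java.awt imaging classes. The class name SimpleCaptcha, its methods, and the character set are hypothetical and chosen only for illustration. The site picks random characters, draws them as a distorted image, keeps the plain text, and later compares it with what the user typed. Real CAPTCHA services apply far stronger distortion than this.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.util.Random;

/**
 * Hypothetical sketch of a text CAPTCHA: choose random characters,
 * render them as a distorted image, keep the plain text on the server
 * (for example in the user's session), and later check the typed answer.
 */
public final class SimpleCaptcha {
    // Characters that are easy to confuse (0/O, 1/I/L) are left out.
    static final String ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";

    private SimpleCaptcha() {
    }

    /** Picks the challenge text; pass a java.security.SecureRandom in production. */
    public static String generateText(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Draws each character with its own rotation, offset and color, then adds noise lines. */
    public static BufferedImage render(String text, int width, int height, Random random) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);

        g.setFont(new Font(Font.SERIF, Font.BOLD, height / 2));
        int step = width / (text.length() + 1);
        for (int i = 0; i < text.length(); i++) {
            int x = step * i + step / 2;
            int y = height / 2 + random.nextInt(height / 4 + 1);
            AffineTransform saved = g.getTransform();
            // Rotate this character by roughly -23 to +23 degrees around the start of its baseline.
            g.rotate((random.nextDouble() - 0.5) * 0.8, x, y);
            g.setColor(new Color(random.nextInt(128), random.nextInt(128), random.nextInt(128)));
            g.drawString(String.valueOf(text.charAt(i)), x, y);
            g.setTransform(saved);
        }

        // Lines drawn across the text make the characters harder to separate automatically.
        g.setStroke(new BasicStroke(1.5f));
        for (int i = 0; i < 6; i++) {
            g.setColor(new Color(random.nextInt(256), random.nextInt(256), random.nextInt(256)));
            g.drawLine(random.nextInt(width), random.nextInt(height),
                    random.nextInt(width), random.nextInt(height));
        }
        g.dispose();
        return image;
    }

    /** Compares the typed answer with the stored text, ignoring case and surrounding spaces. */
    public static boolean verify(String expected, String typed) {
        return typed != null && expected.equalsIgnoreCase(typed.trim());
    }
}
```

In a servlet, for example, a site might call generateText, store the result in the session, write the output of render to the response with javax.imageio.ImageIO.write(image, "png", out), and call verify when the form is submitted. The Random is passed in so that a test can use a fixed seed while production code uses SecureRandom.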