WELL BEHAVED BOTS - HTTP Programming Recipes for Java Bots


Origin of CAPTCHAs

The first discussion of automated tests that distinguish humans from computers, for the purpose of controlling access to web services, appears in a 1996 manuscript by Moni Naor of the Weizmann Institute of Science. Primitive CAPTCHAs seem to have been developed later, in 1997, at AltaVista by Andrei Broder and his colleagues, to prevent bots from adding URLs to their search engine. The team sought to make their CAPTCHA resistant to an Optical Character Recognition (OCR) attack. To do so, they consulted the manual for their Brother scanner, which included recommendations for improving OCR results.

These recommendations included using similar typefaces and plain backgrounds. The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.

In 2000, Luis von Ahn and Manuel Blum developed and publicized the notion of a CAPTCHA, which included any program that could distinguish humans from computers. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used (at Yahoo!).

Accessibility Concerns

CAPTCHAs are usually based on reading text. This can present a problem for blind or visually impaired users who would like to access the protected resource. However, CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, could be used as the basis of a CAPTCHA. Some implementations of CAPTCHAs permit visually impaired users to opt for an audio CAPTCHA.

Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools, such as screen readers, cannot interpret them. Since sites may use CAPTCHAs as part of the initial registration process, or even at every login, this challenge can block some users' access completely. In certain jurisdictions, site owners could become a target for litigation if they use CAPTCHAs that discriminate against people with certain disabilities.

Circumvention of CAPTCHAs

Unethical bot writers use a number of means to defeat CAPTCHAs. If a webmaster has taken the time to insert a CAPTCHA, they surely do not want bots to access their site. Although this chapter will not demonstrate how to circumvent a CAPTCHA, it will discuss some of the methods used, so that you are aware of both sides of this “battle.” Some of the more common means of circumventing CAPTCHAs are listed here:

• Optical Character Recognition (OCR)

• Cheap Human Labor

• Insecure Implementation

Optical Character Recognition is a computer process that converts images of text into machine-readable text, such as ASCII. It is often used for fax documents: by using OCR, you can capture the text from the image of a fax and import it into a word processor for editing. OCR technology can also be used to circumvent a CAPTCHA. However, most modern CAPTCHAs take steps to make it very difficult for traditional OCR technology to read them.
