CHAPTER 16: WELL BEHAVED BOTS
• Understanding the Ethical Use of Bots
• Understanding CAPTCHAs
• Using User-Agent Filtering
• Working with the Bot Exclusion Standard
• Implementing a Robots.txt Filter
Not all sites are welcoming, or even indifferent, to bots. Some sites actively take steps
to curtail bot usage. These web sites often have good reason. Unethical bots can be a real
nuisance to a webmaster. Some commonly unwanted bot behaviors are:
• Posting Advertisements to Blog Comments
• Posting Advertisements to Forums
• Registering Fake Users in Forums
• Creating Large Numbers of Web Postings
• Spamming the Referrer Logs of Web Sites
• Harvesting Email Addresses from Web Sites
You should never create a bot that performs any of these actions. Such bots simply bog down
the Internet with useless information and annoy people who use the Internet for legitimate
purposes.
Although there are exceptions, most web sites do not object to bots that simply scan
for information, so long as you intend to use the information for a legal purpose. Unsolicited
email, or spam, is becoming illegal in many parts of the world; therefore, you should not use
a spider to harvest email addresses. Furthermore, it is not something this topic will teach
you to do.
When programming your bot, be especially careful whenever it posts information to a
site. Posting information to web sites is where bots and webmasters most often clash. Most
of the programming decisions come down to common sense. For example, you could create
a bot that posts a link to your site on hundreds of forums. First, you should ask yourself
whether the forum owners really want their systems clogged with this useless information.
 