CHAPTER 1: THE STRUCTURE OF HTTP REQUESTS
• Understanding the Structure of Surfing
• Using the HTTP Recipes Web Site
• Understanding HTTP Requests
This book shows how to create HTTP programs in Java. HTTP programming lets you build
programs that retrieve information from web sites in much the same way a human user
surfs the web. Such programs are called bots. This book presents many useful recipes for
commonly performed HTTP programming tasks, and with these recipes a wide array of bots
can be built.
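The following minimal sketch makes the idea of a bot concrete. It is an illustration, not one of the book's recipes: a Java program that downloads the contents of a web page, much as a browser does when a user visits it. It assumes Java 9 or later (for `InputStream.readAllBytes`) and a UTF-8 encoded page. The class name `SimpleBot`, the method `download`, and the default address `http://www.example.com/` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal bot: fetches one page and returns its body as text.
class SimpleBot {

    /**
     * Downloads the resource at the given absolute URL.
     * Assumes the response body is UTF-8 encoded.
     */
    static String download(String address) throws IOException {
        // openStream() sends an HTTP GET request and returns the response body.
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Use the address given on the command line, or a placeholder site.
        String address = args.length > 0 ? args[0] : "http://www.example.com/";
        System.out.println(download(address));
    }
}
```

Running `java SimpleBot http://www.example.com/` prints the raw HTML the server returns. A human user would see the same page rendered in a browser.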
To create HTTP programs in Java, you need to understand the structure of HTTP requests.
This chapter introduces that structure. Understanding it will allow you to create programs
that surf the web just as a user does.
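As a preview of that structure, the following minimal sketch assembles the text a client sends for a simple HTTP/1.1 GET request. The request consists of three parts:

- a request line
- header lines, each ending in a carriage return and line feed
- an empty line that ends the header section

The class name `RawGetRequest`, the `build` method, and the host and path are hypothetical. A real browser typically sends more headers than the two shown here.

```java
// Hypothetical sketch: builds the raw text of an HTTP/1.1 GET request.
class RawGetRequest {

    // Every line in the request head is terminated by CR LF.
    private static final String CRLF = "\r\n";

    /**
     * Builds a GET request for the given host and path.
     * Order: request line, headers, then a blank line.
     */
    static String build(String host, String path) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1").append(CRLF); // request line: method, path, version
        sb.append("Host: ").append(host).append(CRLF);                   // Host header is required in HTTP/1.1
        sb.append("Connection: close").append(CRLF);                     // ask the server to close the connection afterwards
        sb.append(CRLF);                                                 // empty line marks the end of the headers
        return sb.toString();
    }

    public static void main(String[] args) {
        // Show each CR LF explicitly so the line structure is visible.
        String request = build("www.example.com", "/index.html");
        System.out.print(request.replace("\r\n", "\\r\\n\n"));
    }
}
```

Running `main` prints four lines: the request line, the two headers, and the final empty line. Each line ends with a visible `\r\n` marker. A GET request carries no body, so the blank line is the end of the request.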
The HTTP Recipes Examples Site
The Java bots created in this book are only one half of an HTTP exchange: they must
communicate with a web server. For example, a typical HTTP bot might access a popular
online bookseller and obtain the price of a certain book. I could write an example bot that
accesses Amazon.com and obtains this price. However, this approach has two problems:
• Amazon may prohibit bots from accessing its site
• Amazon may modify its site and break my bot
Both issues are important. If the examples in this book were all written against real-world
web sites, a major redesign of those sites could leave many of the examples non-functional.
Even one minor change to one of these sites would immediately put the related examples
out of date.
Additionally, some sites do not allow access by bots. There are two main ways to stop bot
access to a web site.
• Lawyers
• Technology