The Structure of Surfing
As a user browses the web, considerable network activity occurs behind the scenes to
support the browsing experience. The Hypertext Transfer Protocol (HTTP) is what makes
this possible. HTTP specifies how web browsers and web servers manage the flurry of
requests and responses that occur while a user is surfing the web. Once it is understood
how web browsers and servers communicate, the HTTP classes built into Java can be used
to obtain information from a web server programmatically.
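Java's standard library includes classes for making these requests. The following is a minimal sketch, not the book's own code. It sends one GET request with java.net.HttpURLConnection and prints the page that comes back. To stay self-contained, it assumes a throwaway local server (built with the JDK's com.sun.net.httpserver package) that serves a hypothetical page at /hello.html. The helper names fetch, startDemoServer, and fetchFromDemoServer are illustrative only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class FetchPage {

    // Send a single GET request with Java's built-in HttpURLConnection and return the body.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Hypothetical demo server on the loopback interface, so no outside site is needed.
    // Port 0 asks the operating system to pick any free port.
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hello.html", exchange -> {
            byte[] body = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // One browser-like round trip: request the page, read the response, stop the server.
    static String fetchFromDemoServer() throws IOException {
        HttpServer server = startDemoServer();
        try {
            int port = server.getAddress().getPort();
            return fetch("http://127.0.0.1:" + port + "/hello.html");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchFromDemoServer());
    }
}
```

In a real bot the address would point at an actual web site rather than a local demo server.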
If you already understand the structure of HTTP requests between web servers and web
browsers, you may be able to skip this chapter and proceed directly to Chapter 2, “Analyzing
Sites”, or Chapter 3, “Simple HTTP Requests”. Chapter 2 expands on Chapter 1 by showing
how to use a “network analyzer” to examine, firsthand, the information exchanged between
a web server and a web browser. A network analyzer can be very valuable when programming
a bot to access a complex web site. However, if you are already familiar with using
network analyzers, you may proceed directly to Chapter 3, which begins Java HTTP
programming.
The first thing to understand about web browsing is that it is made up of a series of HTTP
requests and responses. The web browser sends a request to the server, and the server
responds. This communication is one-sided: the web server never sends a request to the web
browser; it only responds to the requests it receives.
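In every exchange the request travels first, from browser to server, and the response comes back. As a rough illustration, the sketch below builds the text of a minimal GET request and reads the status code out of the first line of a response. It assumes HTTP/1.1 and uses www.example.com as a placeholder host. The helpers buildGetRequest and parseStatusCode are hypothetical and do not come from any library.

```java
public class RequestResponse {

    // Build the text of a minimal HTTP/1.1 GET request, the kind a browser sends.
    // HTTP lines end in CRLF, and a blank line marks the end of the headers.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Extract the numeric status code from a status line such as "HTTP/1.1 200 OK".
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.print(buildGetRequest("www.example.com", "/index.html"));
        System.out.println(parseStatusCode("HTTP/1.1 200 OK"));
    }
}
```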
HTTP communication begins when the browser requests the first page from a web server.
It continues as additional pages from that site are requested. To see how this works, the next
section will examine the requests that are sent between the web browser and web server.
Examining HTTP Requests
In this section the requests that pass between the web server and web browser will be
examined. The first step is to examine the HTTP requests for a typical web page. This page
will be covered in the next section. Understanding how a single page is transmitted is key to
seeing how that page fits into a typical surfing session.
A Typical Web Page
A typical web page is displayed in the browser by assembling text and images, each of which
is obtained through its own request. One of the first things to understand about HTTP
requests is that at the heart of each request is a Uniform Resource Locator (URL). The URL
tells the web server which file should be sent. The URL could point to an actual file, such
as a Hypertext Markup Language (HTML) document, or it could point to an image file, such
as a GIF or JPEG.
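Because each file is fetched by its own URL, a single page usually takes several requests: one for the HTML and one for each image it references. The sketch below lists the URLs a browser would request for a small hypothetical page. It resolves relative image paths against the page URL with java.net.URI. This is a simplification that assumes images appear as img tags with double-quoted src values. A real bot should use a proper HTML parser rather than a regular expression. The class and method names are illustrative only.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageRequests {

    // Naive pattern for <img ... src="..."> tags; only double-quoted src values are matched.
    private static final Pattern IMG_SRC =
        Pattern.compile("<img[^>]*\\bsrc\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Return the page URL followed by every image URL it references,
    // with relative image paths resolved against the page URL.
    static List<URI> requestsForPage(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<URI> requests = new ArrayList<>();
        requests.add(base); // first request: the HTML page itself
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            requests.add(base.resolve(m.group(1))); // one further request per image
        }
        return requests;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Welcome</h1>"
                    + "<img src=\"images/logo.gif\"><img src=\"/photos/team.jpg\">"
                    + "</body></html>";
        for (URI uri : requestsForPage("http://www.example.com/site/index.html", html)) {
            System.out.println(uri);
        }
    }
}
```

Running main prints three URLs: the page itself, then http://www.example.com/site/images/logo.gif and http://www.example.com/photos/team.jpg.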
URLs are what the web user types into a browser to access a web page. Chapter 3, “Simple
HTTP Requests”, will explain what each part of the URL is for. For now, they simply identify
a resource, somewhere on the Internet, that is being requested.
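As a brief preview only, java.net.URI can split a URL into the pieces a browser relies on to locate a resource. The helper describe and the example address are hypothetical.

```java
import java.net.URI;

public class UrlParts {

    // Split a URL into three pieces: the scheme names the protocol, the host names the
    // server to contact, and the path names the resource to ask that server for.
    static String describe(String url) {
        URI uri = URI.create(url);
        return "scheme=" + uri.getScheme()
             + " host=" + uri.getHost()
             + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Prints: scheme=http host=www.example.com path=/images/logo.gif
        System.out.println(describe("http://www.example.com/images/logo.gif"));
    }
}
```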