Sometimes this flow of requests and responses is difficult to determine. Just viewing
the source of the HTML pages and trying to understand what is going on can be a lengthy
task. For sites that use techniques such as AJAX, the requests can become quite complex.
To analyze HTTP requests and properly design the bot, first start the network analyzer
and begin recording network traffic. How to do this is discussed later in this chapter.
Next, launch the web browser. It is a good idea to
clear the browser's cache at this point. The procedure for clearing the cache varies with each
browser; the option is usually found in the browser's settings or Internet options. Files
served from the cache are never requested from the web server, so those requests will not
appear in the network analyzer.
Once the web browser is launched, the desired web site should be accessed. While on
the desired web site, use the site as a regular user would. The objective while using the ana-
lyzer is to get to the data that the bot should access. Take as direct a path to the desired data
as possible. The simpler the path, the easier it will be to emulate. As the web site is navigated,
the network analyzer will record the progress. The analyzer will capture every request made
by the web browser. To access this site, the bot must issue the same requests to
the web server.
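As a rough illustration of issuing the same request the browser did, the sketch below copies headers noted from a capture onto a Java HttpURLConnection. The class name ReplayRequest, the prepare method, the example.com URL, and the header values are all hypothetical placeholders, not taken from a real capture. The request is only prepared, not sent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: prepares a GET request that carries the same
// headers the browser was seen sending in the network analyzer.
class ReplayRequest {

    static HttpURLConnection prepare(String url, Map<String, String> capturedHeaders) {
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("GET");
            // Copy each captured header onto the bot's request.
            for (Map.Entry<String, String> header : capturedHeaders.entrySet()) {
                connection.setRequestProperty(header.getKey(), header.getValue());
            }
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; substitute what the analyzer actually recorded.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (placeholder browser string)");
        headers.put("Accept", "text/html");
        headers.put("Accept-Language", "en-us");

        HttpURLConnection connection = prepare("http://www.example.com/data", headers);

        // Show what the bot would send, without contacting the server.
        System.out.println(connection.getRequestMethod() + " " + connection.getURL());
        connection.getRequestProperties()
                .forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

Calling connect() or getInputStream() on the returned connection would actually send the request. Keeping preparation separate makes it easy to print the headers and check them against the capture first.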
Using a Network Analyzer to Debug a Bot
Creating a bot for some sites can be tricky. For example, a site may use complex messag-
es to communicate with the web server. If the bot does not exactly reproduce these requests,
it will not function properly. If the bot is not functioning properly, then a network analyzer
should be used to debug the bot.
The technique that I normally use is to run the network analyzer while my bot runs. The
network analyzer can track the HTTP requests issued by the bot just as easily as it can track
the requests issued by a real user on a web browser.
If the web server is not communicating properly with the bot, then one of the HTTP
requests must be different from what a regular web browser would issue. Compare the
packets captured from the bot's session with the desired web site to the packets
captured from a regular browser session with the same site.
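The comparison can be done by eye, but a small program helps when the requests carry many headers. The sketch below assumes the raw text of each request has been copied out of the analyzer into a string. The class name RequestDiff and its methods are hypothetical, and the sample requests are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reports how a bot's HTTP request differs
// from the request a browser sent for the same page.
class RequestDiff {

    // Parses the header lines of a raw request into a case-insensitive map.
    static Map<String, String> parseHeaders(String rawRequest) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = rawRequest.split("\\r?\\n");
        for (int i = 1; i < lines.length; i++) {   // line 0 is the request line
            if (lines[i].isEmpty()) {
                break;                              // blank line ends the headers
            }
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }

    // Lists headers that are missing from, different in, or extra in the bot request.
    static List<String> diff(String browserRequest, String botRequest) {
        Map<String, String> browser = parseHeaders(browserRequest);
        Map<String, String> bot = parseHeaders(botRequest);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, String> entry : browser.entrySet()) {
            String botValue = bot.get(entry.getKey());
            if (botValue == null) {
                report.add("missing in bot: " + entry.getKey());
            } else if (!botValue.equals(entry.getValue())) {
                report.add("differs: " + entry.getKey()
                        + " (browser=" + entry.getValue() + ", bot=" + botValue + ")");
            }
        }
        for (String name : bot.keySet()) {
            if (!browser.containsKey(name)) {
                report.add("extra in bot: " + name);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up captures; paste the real request text from the analyzer here.
        String browser = "GET /data HTTP/1.1\r\nHost: www.example.com\r\n"
                + "User-Agent: Mozilla/5.0\r\nCookie: id=1\r\n\r\n";
        String bot = "GET /data HTTP/1.1\r\nHost: www.example.com\r\n"
                + "User-Agent: Java/1.8\r\n\r\n";
        diff(browser, bot).forEach(System.out::println);
    }
}
```

This sketch compares headers only. A fuller comparison would also check the request line and, for POST requests, the body.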
The next section shows how to use a network analyzer. Many different network
analyzers are available. The one used for this topic is Wireshark. Wireshark is
a free, open source network analyzer that runs on a wide variety of operating systems.
Understanding Wireshark
Wireshark is one of the most popular network analyzers available. Wireshark was once
known by the name Ethereal, but the project was renamed Wireshark because of trademark issues.
Wireshark can be downloaded from the following web site:
http://www.wireshark.org/