GIS data into more general formats. Geographic data are best handled by dedicated
GIS applications; we will describe a simple application in Chapter 11.
Application log files
Another type of data is generated automatically by server software. A variety
of server applications keep logs, and these are sometimes useful for research.
For instance, web servers generally log each page request made to the server.
A researcher may want to examine or summarize the contents of such logs.
The two most common web server applications are Apache, which runs on
Unix/Linux, Mac OS X and Windows, and IIS (Internet Information Services),
which runs on Windows (NT and later). Apache is open source and freely
downloadable; IIS is included with Windows but is generally not installed
except on Windows Server.
The Apache access log is a text file with one line per page request. In the
widely used "combined" format, each line contains the IP network address of the
computer requesting the page, the date and time of the request, the request type
and the page requested, the response code from the server, the number of bytes
sent, the referring page ("-" if there is none), and the type of browser (its
user-agent string) that made the request.
65.55.210.49 - - [19/Aug/2007:04:07:27 -0400] "GET /robots.txt HTTP/1.0" 200 183 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
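The following Python sketch splits a line like the one above into the fields just listed, using a regular expression. It assumes the Apache "combined" format. A site with a customized log format would need a different pattern. The field names (`host`, `ident`, `user`, `time`, `request`, `status`, `size`, `referer`, `agent`) and the `parse_line` helper are our own illustrative choices, not anything defined by Apache.

```python
import re

# Assumed layout (Apache "combined" format):
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Quoted fields may contain backslash-escaped characters such as \".
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" instead of 0 when no response body bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


line = ('65.55.210.49 - - [19/Aug/2007:04:07:27 -0400] '
        '"GET /robots.txt HTTP/1.0" 200 183 "-" '
        '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"')
rec = parse_line(line)
print(rec["host"], rec["status"], rec["size"], rec["agent"])
```

Returning `None` for non-matching lines lets a caller count or log malformed entries instead of stopping on the first one.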
The logging can be configured with different options in Apache if desired.
IP addresses, rather than computer names, are generally written to the log file
because the reverse lookup from IP address to DNS name is time-consuming, and a
busy web server would not be able to keep up with the requests. The response
code is a numerical HTTP status code indicating success (200), not found (404),
and various other outcomes. In the example above, the "browser" is actually the
MSN web crawler fetching robots.txt as it updates its search catalog.
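The first digit of an HTTP status code gives its broad class: 2xx for success, 3xx for redirection, 4xx for client errors such as 404, and 5xx for server errors. The minimal Python sketch below labels codes with the standard library's `http.HTTPStatus` enumeration. The `describe_status` helper and its output format are hypothetical illustrations.

```python
from http import HTTPStatus

# Broad classes keyed by the first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a label such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes unknown to this Python version still get a class label.
        phrase = "unknown"
    return f"{code} {phrase} ({CLASSES.get(code // 100, 'invalid')})"


for code in (200, 304, 404, 500):
    print(describe_status(code))
```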
If only a summary of web access is desired, programs such as AWStats
(awstats.sourceforge.net), Webalizer (www.mrunix.net/webalizer) and Analog
(www.analog.cx) can generate summaries with optional histograms and pie charts.
These programs generally handle the conversion of IP addresses to DNS computer
names.
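To give a feel for what such summary programs compute, here is a toy Python sketch. It is not how AWStats, Webalizer or Analog are implemented. It counts requests per hour of day and per status code, then prints a crude text histogram. The simplified pattern assumes the Common or Combined Log Format and ignores escaped quotes inside the request field. It also skips the reverse DNS lookups those tools provide. The first sample line is the example above; the other two are hypothetical and use the 192.0.2.0/24 documentation address range.

```python
import re
from collections import Counter

# Captures the hour of the timestamp (server-local time, as recorded) and the
# three-digit status code. Simplified: escaped quotes in the request are not handled.
HOUR_STATUS = re.compile(r'\[\d{2}/\w{3}/\d{4}:(\d{2}):[^\]]*\] "[^"]*" (\d{3}) ')


def summarize(lines):
    """Count requests per hour of day and per status code."""
    by_hour, by_status = Counter(), Counter()
    for line in lines:
        m = HOUR_STATUS.search(line)
        if m:  # lines that do not match are silently skipped
            by_hour[int(m.group(1))] += 1
            by_status[int(m.group(2))] += 1
    return by_hour, by_status


def text_histogram(counts, width=40):
    """One row per key: key, count, and a bar scaled to the largest count."""
    if not counts:
        return ""
    peak = max(counts.values())
    rows = []
    for key in sorted(counts):
        bar = "#" * max(1, round(width * counts[key] / peak))
        rows.append(f"{key:>4} {counts[key]:>6} {bar}")
    return "\n".join(rows)


sample = [
    '65.55.210.49 - - [19/Aug/2007:04:07:27 -0400] "GET /robots.txt HTTP/1.0" 200 183 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    # Hypothetical lines (192.0.2.x is a documentation address range).
    '192.0.2.7 - - [19/Aug/2007:04:15:02 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.7 - - [19/Aug/2007:13:40:11 -0400] "GET /missing.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
]
hours, statuses = summarize(sample)
print(text_histogram(hours))
print(text_histogram(statuses))
```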
If more fine-grained analysis of the log files is required, the log files can be
parsed with your favorite text-processing tool (your author, AES, favors the Perl
language for tasks like these). The fields should be separated, the date converted
into the desired format, and any URL-encoded special characters (such as %20 for
a space) decoded; the output can then be saved in a suitable format, perhaps a
table or a database.
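The steps just described can be sketched in Python; the language choice is ours, not the author's. The sketch assumes the Common or Combined Log Format and keeps only the leading fields of each line. It converts the timestamp to ISO 8601 in UTC and decodes percent-encoded characters in the URL with `urllib.parse.unquote`. The results are written as CSV, which most spreadsheets and databases can import. The `convert` and `to_csv` helpers, the output column names, and the second sample line are hypothetical.

```python
import csv
import io
import re
from datetime import datetime, timezone
from urllib.parse import unquote

# Simplified pattern for the leading fields of a Common/Combined Log Format line:
# host, timestamp, method, URL, status, bytes.
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)')


def convert(line):
    """Split a log line, normalize its date to UTC, and decode its URL."""
    m = CLF.match(line)
    if m is None:
        return None
    host, stamp, method, url, status, size = m.groups()
    # Timestamps look like 19/Aug/2007:04:07:27 -0400; %b expects English
    # month abbreviations (Python's default C locale for LC_TIME).
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": host,
        "utc_time": when.astimezone(timezone.utc).isoformat(),
        "method": method,
        "url": unquote(url),  # percent-decoding, e.g. %20 -> space
        "status": int(status),
        "bytes": 0 if size == "-" else int(size),
    }


def to_csv(lines, out):
    """Write converted records as CSV rows, skipping lines that do not parse."""
    fields = ["host", "utc_time", "method", "url", "status", "bytes"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in lines:
        rec = convert(line)
        if rec is not None:
            writer.writerow(rec)


sample = [
    '65.55.210.49 - - [19/Aug/2007:04:07:27 -0400] "GET /robots.txt HTTP/1.0" 200 183 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    # Hypothetical line showing percent-decoding of the URL.
    '192.0.2.7 - - [19/Aug/2007:13:40:11 -0400] "GET /my%20page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
buf = io.StringIO()
to_csv(sample, buf)
print(buf.getvalue())
```

Normalizing timestamps to UTC makes logs from servers in different time zones directly comparable.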
Be aware that the IP address (or computer name) can be used, to a limited
extent, to identify the user who made the web page request. In addition, some
URLs contain extra parameters that may carry sensitive data such as passwords.
Any applicable privacy concerns should be considered before publishing results
derived from these data.
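One common precaution before publishing is to truncate IP addresses and drop URL query strings. The text does not prescribe this; it is simply a widely used approach. Below is a minimal Python sketch using the standard `ipaddress` and `urllib.parse` modules. The /24 (IPv4) and /48 (IPv6) prefix lengths are an assumed policy, not a standard. Truncation reduces, but does not eliminate, the chance of identifying a user.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def anonymize_ip(addr):
    """Zero the host part of an address (assumed policy: keep /24 or /48)."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)


def strip_query(url):
    """Drop the query string and fragment, which may carry sensitive parameters."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical URL with a password in its query string.
print(anonymize_ip("65.55.210.49"), strip_query("/login?user=bob&password=secret"))
```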