Fig. 1. Internet Retailer Web Site and available Data
3.1 Server and Cookie Data
Web server logs (see Fig. 2) are generated automatically by the server whenever a user
visits a URL at a site. Each log entry records the IP address of the visitor, the time
at which he enters the website, the length of time he spends on the requested URL, and
the URL itself. From this information the path the user takes through the website can
be reconstructed. Web server logs are therefore an important source of information for
discovering the behavior of the user at the website. However, the IP address stored in
the server log does not always identify a particular user. The address might have been
changed by a proxy server, and the heuristic used to identify a user session does not
always hold. Cookie logs may therefore be preferable.
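As a minimal sketch of how a path can be derived from such log entries, the following Python function groups requests by IP address and orders them by request time. The record layout (a tuple of IP address, timestamp, and URL) and the function name are assumptions made for illustration; they are not taken from the log file in Fig. 2.

```python
from collections import defaultdict
from datetime import datetime


def paths_by_visitor(entries):
    """Return {ip: [url, ...]} with each visitor's URLs in request order.

    `entries` is an iterable of (ip, timestamp, url) tuples. This layout
    is a hypothetical one chosen for the sketch.
    """
    grouped = defaultdict(list)
    for ip, timestamp, url in entries:
        grouped[ip].append((timestamp, url))
    # Sort each visitor's requests by timestamp only, then keep the URLs.
    return {
        ip: [url for _, url in sorted(hits, key=lambda hit: hit[0])]
        for ip, hits in grouped.items()
    }


# Hypothetical entries; the addresses and URLs are invented for the example.
example_entries = [
    ("192.0.2.1", datetime(2000, 1, 1, 1, 58), "/products"),
    ("192.0.2.1", datetime(2000, 1, 1, 1, 54), "/index"),
    ("192.0.2.7", datetime(2000, 1, 1, 1, 55), "/index"),
]
example_paths = paths_by_visitor(example_entries)
```

Because the grouping key is the IP address alone, this sketch inherits the weakness described above: visitors behind the same proxy server would be merged into a single path.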
Cookies are short text files that the server creates on the client side while the
user's browser is visiting the website. A cookie allows a special identification
number or code to be assigned to a particular user. Each time the user visits the
website, he can be identified by this identification code. However, setting a cookie
requires the user's permission, which is not always given. Therefore, only the
combination of server logs and cookie logs provides a good basis for data mining.
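One possible way to combine the two sources, sketched here under stated assumptions, is to identify a visitor by the cookie code when one was set and to fall back on the IP address otherwise. The field names `cookie_id` and `ip` and the function `visitor_key` are hypothetical and serve only to illustrate the idea.

```python
def visitor_key(entry):
    """Pick an identifier for the visitor behind one log entry.

    `entry` is a dict with an "ip" field and an optional "cookie_id"
    field (both field names are assumptions for this sketch).
    """
    cookie_id = entry.get("cookie_id")
    if cookie_id:
        # A cookie code identifies the user even if the IP address changes.
        return ("cookie", cookie_id)
    # No cookie was accepted: fall back on the less reliable IP address.
    return ("ip", entry["ip"])
```

Tagging the key with its source keeps cookie codes and IP addresses from colliding and makes it visible later which identifications rest on the weaker heuristic.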
Figure 2 shows a typical server log file. Table 1 shows the codes for the URLs, and
Table 2 shows the path the user takes through the website. The user has visited the
website 4 times. A user session is considered closed when the user takes no new action
within 20 minutes. This is a rule of thumb that might not always hold. Since in our
example the time between the first user access, starting at 1:54, and the second one,
at 2:24, is longer than 20 minutes, we consider the first access and the second access
to be two sessions. However, it might be that the user stayed on the website for more
than 20 minutes, since he did not enter the website through the main page.
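The 20-minute rule of thumb can be sketched in Python as follows. The function name, the (timestamp, URL code) record layout, the calendar date, and the URL codes "A" and "B" are assumptions for illustration; only the 20-minute threshold and the access times 1:54 and 2:24 come from the example above. A gap of exactly 20 minutes is treated here as still belonging to the same session, which is one possible reading of the rule.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=20)


def split_sessions(accesses, timeout=SESSION_TIMEOUT):
    """Split one visitor's (timestamp, url_code) accesses into sessions.

    A new session starts whenever the gap to the previous access
    exceeds `timeout`.
    """
    sessions = []
    for timestamp, url_code in sorted(accesses, key=lambda a: a[0]):
        if sessions and timestamp - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((timestamp, url_code))
        else:
            sessions.append([(timestamp, url_code)])
    return sessions


# The two access times from the text; date and URL codes are hypothetical.
example_accesses = [
    (datetime(2000, 1, 1, 1, 54), "A"),
    (datetime(2000, 1, 1, 2, 24), "B"),
]
example_sessions = split_sessions(example_accesses)
```

With these two accesses the gap is 30 minutes, so the sketch yields two sessions, matching the reasoning in the text. It cannot detect the case mentioned last, where the user stayed on a page for longer than the timeout without requesting a new URL.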