2 The Available Data
The data set considered in this analysis was derived from the log file of an
e-commerce site. The source of the data cannot be disclosed; it is the website of a
company that sells hardware and software products, and it will be referred to as
"a webshop". Accesses to the website were recorded in a log file over a period of
about two years, from 30 September 1997 to 30 June 1999.
The log file was then processed to produce a dataset named "sequences". This
dataset contains the user id (c_value), the date and time at which the visitor
requested a specific page (c_time), and the web page viewed (c_caller). Table 1
reports a small extract of the available dataset, corresponding to one visit.
Table 1. The considered dataset.

c_value         c_time              c_caller    c_order
70ee683a6df…    14OCT97:11:09:01    home        1
70ee683a6df…    14OCT97:11:09:08    catalog     2
70ee683a6df…    14OCT97:11:09:14    program     3
70ee683a6df…    14OCT97:11:09:23    product     4
70ee683a6df…    14OCT97:11:09:24    program     5

Table 1 shows that the visitor with the identifier (cookie) 70ee683a6df… entered
the site on 14 October 1997 at 11:09:01 and visited, in sequence, the pages home,
catalog, program, product, program, leaving the website at 11:09:24.
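
As an illustration of how the c_time values in Table 1 can be read, the following is a minimal Python sketch. The format string is an assumption inferred from the five values shown in the table, and the helper name parse_c_time is hypothetical; neither comes from the original data documentation.

```python
from datetime import datetime

# Format assumed from the values in Table 1, e.g. "14OCT97:11:09:01".
# %b relies on English month abbreviations in the active locale.
C_TIME_FORMAT = "%d%b%y:%H:%M:%S"


def parse_c_time(value: str) -> datetime:
    """Parse a c_time string into a datetime (hypothetical helper)."""
    return datetime.strptime(value, C_TIME_FORMAT)


# The (c_time, c_caller) pairs of the visit reported in Table 1.
visit = [
    ("14OCT97:11:09:01", "home"),
    ("14OCT97:11:09:08", "catalog"),
    ("14OCT97:11:09:14", "program"),
    ("14OCT97:11:09:23", "product"),
    ("14OCT97:11:09:24", "program"),
]

times = [parse_c_time(c_time) for c_time, _ in visit]

# Seconds between the first and the last click of the visit.
elapsed = (times[-1] - times[0]).total_seconds()
print(elapsed)  # 23.0
```

The 23 seconds correspond to the interval between 11:09:01 and 11:09:24 described above.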
The whole data set contains 250711 observations, each corresponding to a click,
which describe the navigation paths of 22527 visitors among the 36 pages that
make up the webshop site. Visitors are treated as unique, that is, no visitor
appears with more than one session. On the other hand, a page can occur more
than once in the same session.
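
The repetition of pages within a session can be checked with a simple count. The sketch below uses only the page sequence of the visit in Table 1; the variable names are illustrative.

```python
from collections import Counter

# Page sequence (c_caller) of the single visit shown in Table 1.
session_pages = ["home", "catalog", "program", "product", "program"]

# Number of times each page was requested in the session.
page_counts = Counter(session_pages)

# Pages requested more than once in the same session.
repeated = sorted(page for page, n in page_counts.items() if n > 1)
print(repeated)  # ['program']
```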
This data set is a notable example of a transactional dataset. It can be used
directly, in the form shown in Table 1, with as many rows as the total number of
clicks, to determine association and sequence rules. Alternatively, a derived
dataset, named "visitors", can be used. It is organised by session and contains
variables that characterise each session. These variables include important
quantitative ones, such as the total duration of the server session (length), the
total number of clicks made in a session (clicks), and the time at which the
session starts (start, setting
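
A session-level summary of this kind could be derived from click-level rows roughly as in the following Python sketch. It is an illustration under stated assumptions, not the procedure used to build the "visitors" dataset: the function name build_visitors is hypothetical, the c_time format is inferred from Table 1, and length is taken here as the time between the first and the last click, which may differ from the server's own definition of session length.

```python
from datetime import datetime

# Format assumed from the c_time values in Table 1.
C_TIME_FORMAT = "%d%b%y:%H:%M:%S"


def build_visitors(rows):
    """Summarise (c_value, c_time, c_caller) rows into one record per visitor.

    Assumes each visitor has exactly one session, as stated in the text.
    """
    click_times = {}
    for c_value, c_time, _c_caller in rows:
        parsed = datetime.strptime(c_time, C_TIME_FORMAT)
        click_times.setdefault(c_value, []).append(parsed)

    visitors = []
    for c_value, times in click_times.items():
        start = min(times)
        visitors.append({
            "c_value": c_value,
            "start": start,  # time of the first click
            # Assumption: length = last click minus first click, in seconds.
            "length": (max(times) - start).total_seconds(),
            "clicks": len(times),  # number of click rows for this visitor
        })
    return visitors


# Rows of Table 1; the identifier is truncated in the source and kept as is.
rows = [
    ("70ee683a6df…", "14OCT97:11:09:01", "home"),
    ("70ee683a6df…", "14OCT97:11:09:08", "catalog"),
    ("70ee683a6df…", "14OCT97:11:09:14", "program"),
    ("70ee683a6df…", "14OCT97:11:09:23", "product"),
    ("70ee683a6df…", "14OCT97:11:09:24", "program"),
]

visitors = build_visitors(rows)
print(visitors[0]["clicks"], visitors[0]["length"])  # 5 23.0
```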