Sequence Rules for Web Clickstream Analysis
Erika Blanc and Paolo Giudici
Department of Economics and Quantitative Methods
University of Pavia
Via San Felice 5, 27100 Pavia (Italy)
blaner@eco.unipv.it , giudici@unipv.it
Abstract. We present new methodologies for the search of sequence rules in
the analysis of web clickstream data. We distinguish direct and indirect
sequence rules, and show how to draw data mining conclusions on the basis of
them. We then compare sequence rules, which are local models, with a global
probabilistic expert system model. Our analysis has been conducted on a real
e-commerce dataset.
1 Clickstream Analysis
Every time a user connects to a web site, the server records all the actions
performed in a log file. What is captured is the "click flow" (click-stream) of the
mouse and of the keys used by the user while navigating the site. Usually,
every mouse click corresponds to the display of a web page. Therefore, we
can define a click-stream as the sequence of requested pages.
The succession of pages viewed by a single user while navigating the
Web identifies a user session. Typically, the analysis concentrates only on the part of
each user session that concerns access to a specific site. The set of pages viewed
within a user session that belong to a given site is known as a server
session or, more commonly, a visit (J. Srivastava et al.,
2000).
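As an illustration of the definitions above, the following minimal sketch groups log records into visits, each represented as the time-ordered sequence of requested pages. The record layout (session identifier, timestamp, page) and the function name build_visits are assumptions made for this example; real server logs need additional parsing and session identification, which are not covered here.

```python
from collections import defaultdict


def build_visits(records):
    """Group (session_id, timestamp, page) records into ordered page sequences.

    The record layout is a hypothetical one chosen for illustration.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, page in records:
        by_session[session_id].append((timestamp, page))
    # Sort each session's requests by timestamp and keep only the page names.
    return {
        session_id: [page for _, page in sorted(hits)]
        for session_id, hits in by_session.items()
    }


# Toy log records, deliberately out of order.
records = [
    ("s1", 3, "cart"),
    ("s1", 1, "home"),
    ("s2", 1, "home"),
    ("s1", 2, "catalog"),
    ("s2", 2, "login"),
]
visits = build_visits(records)
```

Each value in visits is then a click-stream in the sense defined above: the sequence of pages requested during one visit.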
We remark that other statistical methods can be applied to web clickstream data
in order to detect association rules. For instance, Blanc and Giudici (2002) consider,
besides sequence rules, odds ratios and graphical loglinear models, while Blanc and
Tarantola (2002) consider Bayesian networks and dependency networks. Furthermore,
in a recent paper, Di Scala and La Rocca (2002) consider the application of
Markov chain models to web data, with the main emphasis on assessing the homogeneity of
the Markov chain considered. Finally, Rognoni, Giudici and Polpettini (2002) consider
using Markov chains to directly estimate the transition probabilities from one page to
another.
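To make the idea of page-to-page transition probabilities concrete, the sketch below estimates them as relative frequencies of consecutive page pairs observed in a set of visits. This is a generic first-order estimate under the assumption that each visit is a list of page names; it is not necessarily the procedure used in the papers cited above, and the function name transition_probabilities is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(visits):
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts = defaultdict(Counter)
    for pages in visits:
        # Pair each page with the page requested immediately after it.
        for src, dst in zip(pages, pages[1:]):
            counts[src][dst] += 1
    # Normalise each row of counts so that it sums to one.
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Toy visits, each a sequence of requested pages.
visits = [
    ["home", "catalog", "cart"],
    ["home", "login"],
    ["home", "catalog", "home"],
]
probs = transition_probabilities(visits)
```

In this toy data, "home" is followed by "catalog" in two of its three observed transitions, so the estimated probability of moving from "home" to "catalog" is 2/3.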