A precedes B. For an introduction see, for example, Hastie, Tibshirani and Friedman
(2001).
In web clickstream analysis, a sequence rule is typically indirect: that is, other
pages may be visited between the visit to page A and the visit to page B. In a direct
sequence rule, by contrast, A and B are visited consecutively.
In this section we consider indirect rules; direct rules are considered in the next
section.
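As an illustration only, the distinction between the two kinds of rule can be sketched in Python. The function names and the representation of a session as an ordered list of page names are assumptions made here, not part of the source; the sketch also assumes that an indirect sequence allows, but does not require, intervening pages.

```python
def has_indirect_sequence(session, a, b):
    """Return True if page `b` is visited at some point after page `a`.

    `session` is assumed to be an ordered list of page names.
    """
    if a not in session:
        return False
    first_a = session.index(a)          # position of the first visit to `a`
    return b in session[first_a + 1:]   # any later visit to `b` qualifies


def has_direct_sequence(session, a, b):
    """Return True if `b` is visited immediately after `a` at least once."""
    return any(x == a and y == b for x, y in zip(session, session[1:]))


# Hypothetical session used only for illustration.
session = ["home", "catalog", "product", "cart"]
indirect_home_cart = has_indirect_sequence(session, "home", "cart")
direct_home_cart = has_direct_sequence(session, "home", "cart")
direct_home_catalog = has_direct_sequence(session, "home", "catalog")
```

In this hypothetical session, home is followed by cart only indirectly, because catalog and product are visited in between, whereas home is followed by catalog directly.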
A sequence rule model is, essentially, an algorithm that searches for the most
interesting rules in a database.
The indexes commonly used in web mining to evaluate the importance of a
sequence rule are support and confidence.
Consider the indirect sequence A → B and let N_{A→B} denote the number of visits
(server sessions) in which the sequence appears at least once. Let N be the total
number of server sessions. Note that the rule A → B is counted only once per session,
even if it is repeated several times within that session.
The support for the rule A → B is obtained by dividing the number of server sessions
that satisfy the rule by the total number of server sessions:
$$\mathrm{support}\{A \rightarrow B\} = \frac{N_{A \rightarrow B}}{N} \qquad (1)$$
Support is therefore a relative frequency: the proportion of user sessions in which
the two pages are visited in succession. When the number of sessions is large, as is
usually the case, the support for the rule can be interpreted as the probability that
a user session contains the two pages in sequence:
$$\mathrm{support}\{A \rightarrow B\} = \Pr\{A \rightarrow B\} \qquad (2)$$
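The support computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the source's implementation: the helper names, the list-of-lists representation of sessions, and the sample data are all hypothetical.

```python
def contains_sequence(session, a, b):
    """Return True if `b` is visited somewhere after the first visit to `a`."""
    if a not in session:
        return False
    return b in session[session.index(a) + 1:]


def support(sessions, a, b):
    """Share of sessions that contain the indirect sequence a -> b.

    Each session contributes at most once, however often the
    sequence repeats inside it.
    """
    n_ab = sum(1 for s in sessions if contains_sequence(s, a, b))
    return n_ab / len(sessions)


# Hypothetical server sessions, each an ordered list of visited pages.
sessions = [
    ["home", "catalog", "cart"],         # satisfies home -> cart
    ["home", "cart", "home", "cart"],    # satisfies it twice, counted once
    ["cart", "home"],                    # wrong order, does not satisfy it
    ["catalog", "product"],              # neither page is visited
]

support_home_cart = support(sessions, "home", "cart")
print(support_home_cart)  # 0.5
```

Two of the four hypothetical sessions satisfy the rule, so the support is 2/4 = 0.5.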
The confidence for the rule A → B, instead, is obtained by dividing the number of
server sessions that satisfy the rule by the number of sessions containing page
A:
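$$\mathrm{confidence}\{A \rightarrow B\} = \frac{N_{A \rightarrow B}}{N_A} = \frac{N_{A \rightarrow B} / N}{N_A / N} = \frac{\mathrm{support}\{A \rightarrow B\}}{\mathrm{support}\{A\}} \qquad (3)$$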
Therefore, the confidence approximates the conditional probability that, in a server
session in which page A has been seen, page B is subsequently requested.
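A small sketch of the confidence computation, again under assumptions: the function names, the session representation, and the sample data are hypothetical and serve only to check that the count ratio and the ratio of supports agree.

```python
def contains_sequence(session, a, b):
    """Return True if `b` is visited somewhere after the first visit to `a`."""
    if a not in session:
        return False
    return b in session[session.index(a) + 1:]


def confidence(sessions, a, b):
    """Sessions satisfying a -> b divided by sessions that contain `a`."""
    n_a = sum(1 for s in sessions if a in s)
    if n_a == 0:
        raise ValueError("page %r does not appear in any session" % (a,))
    n_ab = sum(1 for s in sessions if contains_sequence(s, a, b))
    return n_ab / n_a


# Hypothetical server sessions, each an ordered list of visited pages.
sessions = [
    ["home", "catalog", "cart"],
    ["home", "cart", "home", "cart"],
    ["cart", "home"],
    ["catalog", "product"],
]

conf_home_cart = confidence(sessions, "home", "cart")

# The same value obtained as a ratio of supports.
n = len(sessions)
support_rule = sum(contains_sequence(s, "home", "cart") for s in sessions) / n
support_a = sum("home" in s for s in sessions) / n
ratio_of_supports = support_rule / support_a
```

With these hypothetical sessions, three sessions contain home and two of them satisfy home → cart, so both expressions give 2/3.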
The above refers to itemsets A and B containing one page each;
however, each itemset can contain more than one page, and the previous definition still
 