Database Reference
In-Depth Information
the items were purchased was irrelevant; therefore, if a rule 1
3 holds, then
3
1 must also hold. However, there are many real-world situations where
the order in which actions are taken is relevant for the decision maker. For
instance, in our Northwind example, we may be interested not only in the
items bought together but also in statements like “Customers that purchase
item X frequently order item Y afterward , with 40% support.” We can see
that if the rule 1
1 does since the order is now
important. When order matters, we are in the case of sequential pattern
mining , which is a particular case of association analysis.
Consider a sequence
3 holds, not necessarily 3
such that the s i 's are itemsets as
defined above when studying association rules. We say that a sequence
s 1 =
s 1 ,s 2 ,...,s n
,
if there exist integers i 1 ,...,i n such that a 1 ⊆ b i 1 , a 2 ⊆ b i 2 , ..., a n ⊆ b i n ,
where the b i j 's are itemsets in s 2 , ordered and different from each other.
For example, the sequence
a 1 ,a 2 ,...,a n
is contained in another sequence s 2 =
b 1 ,b 2 ,...,b m
{
Shoes
}
,
{
Shirt , Tie
}
iscontainedinthe
sequence
{
Belt
}
,
{
Shoes
}
,
{
Jacket
}
,
{
Shirt , Belt , Tie
}
,
{
Jacket
}
, because
the term
{
Shoes
}
in the first sequence matches a similar term in the second
one, and the term
{
Shirt , Tie
}
in the first sequence is included in
{
Shirt , Belt ,
Tie
}
in the second one. On the other hand, the sequence
{
Shoes
}
,
{
Shirt
}
is not contained in the sequence
.
Let us consider the set of transactions performed by customers given
in Fig. 9.6 , where the transactions are ordered by customer and, for each
customer, by transaction time.
We say a customer C supports a sequence s if s iscontainedinthe
sequence corresponding to C .The support s of a sequence is the fraction
of total customers who support it. In this case, the problem of finding
sequential patterns can be defined as follows: given a database D of customer
transactions, find the maximal sequences that have a certain minimum
support.
As an example, we want to find the maximal sequential patterns with
support greater than 2 in the transaction database above. In this case, there
will be Shoes followed by Shoes ,and Shoes followed by Shirt and Tie , because
we can see that customers 1 and 4 support the sequence
{
Shoes , Shirt
}
{
Shoes
}
,
{
Shoes
}
,
and customers 2 and 4 support the sequences
.
Algorithms for finding sequential patterns are similar to the algorithms for
discovering association rules. However, the number of candidate sequences
will be much larger than the number of candidate itemsets because:
{
Shoes
}
,
{
Shirt , Tie
}
￿ In association rule mining, an item can appear at most once in an
itemset. For example, given two items i 1 and i 2 , only one 2-itemset can be
generated. On the contrary, for sequential pattern mining, there are many
sequences that can be generated, like
{i 1 ,i 2 }
,
{i 1 }, {i 1 }
,
{i 2 }, {i 1 }
,
and so on.
￿ As already commented, order matters for sequences but not for item-
sets. For example,
{
1 , 2
}
and
{
2 , 1
}
represent the same itemset, while
Search WWH ::




Custom Search