CustomerId  Time        Items
1           2012-06-02  {Shoes}
1           2013-10-03  {Shoes}
2           2013-06-01  {Shoes}
2           2013-06-15  {Jacket}
2           2013-08-14  {Shirt, Tie}
3           2012-03-02  {Shoes, Tie}
4           2013-06-02  {Shoes}
4           2013-07-12  {Shirt, Belt, Tie}
4           2013-10-21  {Shoes}
5           2013-11-06  {Shoes}
Fig. 9.6 A set of transactions of the Northwind customers
$\langle \{i_1\}, \{i_2\} \rangle$ and $\langle \{i_2\}, \{i_1\} \rangle$ correspond to different sequences and must be generated separately.
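The role of order can be made concrete with a short Python sketch. It assumes a representation not given in the text: a sequence is a tuple of elements, each element a frozenset of items, and a data sequence contains a sequence when every element of the latter is a subset of a distinct element of the former, in the same order. The helpers `seq` and `contains` are hypothetical names chosen for illustration.

```python
# Assumed representation: a sequence is a tuple of elements (frozensets of items).
Sequence = tuple[frozenset[str], ...]


def seq(*elements: set[str]) -> Sequence:
    """Hypothetical helper: build a sequence, e.g. seq({"i1"}, {"i2"})."""
    return tuple(frozenset(e) for e in elements)


def contains(data: Sequence, s: Sequence) -> bool:
    """True if each element of s is a subset of a distinct element of data,
    in the same order. Greedily matching each element of s to the earliest
    possible element of data is enough to decide this."""
    j = 0
    for element in s:
        # Skip elements of data that do not include the current element of s.
        while j < len(data) and not element <= data[j]:
            j += 1
        if j == len(data):
            return False
        j += 1  # the next element of s must match a strictly later element
    return True


# Same items, different order: two distinct sequences.
print(seq({"i1"}, {"i2"}) == seq({"i2"}, {"i1"}))  # False

# Customer 2 of Fig. 9.6: Shoes, then Jacket, then Shirt and Tie together.
customer2 = seq({"Shoes"}, {"Jacket"}, {"Shirt", "Tie"})
print(contains(customer2, seq({"Shoes"}, {"Jacket"})))  # True
print(contains(customer2, seq({"Jacket"}, {"Shoes"})))  # False
```

Customer 2 supports the first ordering but not the reverse one, so the two candidates may end up with different support and each has to be counted on its own.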
Further, the Apriori principle also holds for sequential data, since any data sequence that contains a particular $k$-sequence must also contain all of its $(k-1)$-subsequences.
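This principle is what allows candidates to be discarded before their support is counted. The Python sketch below assumes the same representation as elsewhere in this illustration (a sequence is a tuple of frozensets; all helper names are hypothetical). It derives the $(k-1)$-subsequences of a $k$-sequence by deleting one item at a time and checks the principle on customer 4 of Fig. 9.6.

```python
# Assumed representation: a sequence is a tuple of elements (frozensets of items).
Sequence = tuple[frozenset[str], ...]


def seq(*elements: set[str]) -> Sequence:
    """Hypothetical helper: build a sequence from item sets."""
    return tuple(frozenset(e) for e in elements)


def contains(data: Sequence, s: Sequence) -> bool:
    """Order-preserving containment: elements of s are subsets of distinct,
    increasingly later elements of data (greedy earliest match)."""
    j = 0
    for element in s:
        while j < len(data) and not element <= data[j]:
            j += 1
        if j == len(data):
            return False
        j += 1
    return True


def subsequences_k_minus_1(s: Sequence) -> set[Sequence]:
    """All (k-1)-subsequences of a k-sequence: delete exactly one item;
    an element that becomes empty is dropped from the sequence."""
    result = set()
    for idx, element in enumerate(s):
        for item in element:
            rest = element - {item}
            middle = (rest,) if rest else ()
            result.add(s[:idx] + middle + s[idx + 1:])
    return result


def may_be_frequent(candidate: Sequence, frequent_prev: set[Sequence]) -> bool:
    """Apriori pruning: a k-sequence can only be frequent if every one of
    its (k-1)-subsequences is frequent."""
    return all(sub in frequent_prev for sub in subsequences_k_minus_1(candidate))


s = seq({"Shoes"}, {"Shirt", "Tie"})  # a 3-sequence
customer4 = seq({"Shoes"}, {"Shirt", "Belt", "Tie"}, {"Shoes"})
print(contains(customer4, s))  # True
print(all(contains(customer4, sub) for sub in subsequences_k_minus_1(s)))  # True
```

If, say, `seq({"Shoes"}, {"Tie"})` were found to be infrequent, `may_be_frequent` would reject `seq({"Shoes"}, {"Shirt", "Tie"})` without any pass over the data.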
In essence, to generate sequential patterns we enumerate all possible sequences and count their support, proceeding level by level: we first generate the 1-sequences, then the 2-sequences, and so on. The general form of the sequences produced is as follows:
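1-sequences:
$\langle i_1 \rangle, \langle i_2 \rangle, \ldots, \langle i_n \rangle$
2-sequences:
$\langle \{i_1, i_2\} \rangle, \langle \{i_1, i_3\} \rangle, \ldots, \langle \{i_{n-1}, i_n\} \rangle,$
$\langle \{i_1\}, \{i_1\} \rangle, \langle \{i_1\}, \{i_2\} \rangle, \ldots, \langle \{i_{n-1}\}, \{i_n\} \rangle$
3-sequences:
$\langle \{i_1, i_2, i_3\} \rangle, \langle \{i_1, i_2, i_4\} \rangle, \ldots, \langle \{i_1, i_2\}, \{i_1\} \rangle, \ldots,$
$\langle \{i_1\}, \{i_1, i_2\} \rangle, \ldots, \langle \{i_1\}, \{i_1\}, \{i_1\} \rangle, \ldots, \langle \{i_n\}, \{i_n\}, \{i_n\} \rangle$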
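The two shapes of 2-sequences in the listing can be enumerated directly. In the minimal Python sketch below, the function name is hypothetical and each sequence is assumed to be a tuple of frozensets. It builds the one-element candidates from pairs of distinct items and the two-element candidates from ordered pairs, repeats included. For $n$ items this gives $n(n-1)/2 + n^2$ candidates, which is why it matters to start from as few (frequent) items as possible.

```python
from itertools import combinations, product


def candidate_2_sequences(items: list[str]) -> list[tuple[frozenset[str], ...]]:
    """Enumerate the two shapes of 2-sequences over the given items:
    one element with two distinct items, e.g. <{i1, i2}>, and two
    single-item elements in order, e.g. <{i1}, {i2}> (repeats allowed)."""
    items = sorted(items)
    # <{a, b}>: unordered pairs of distinct items in a single element.
    one_element = [(frozenset(pair),) for pair in combinations(items, 2)]
    # <{a}, {b}>: ordered pairs, so <{a}, {b}> and <{b}, {a}> both appear.
    two_elements = [(frozenset({a}), frozenset({b}))
                    for a, b in product(items, repeat=2)]
    return one_element + two_elements


candidates = candidate_2_sequences(["i1", "i2", "i3"])
print(len(candidates))  # 12 = 3*2/2 + 3*3
```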
We can see that we first generate all sequences containing a single item (the 1-sequences). To produce the 2-sequences, we combine the frequent 1-sequences in all possible ways, placing the two items either in the same element, as in $\langle \{i_1, i_2\} \rangle$, or in two separate elements, as in $\langle \{i_1\}, \{i_2\} \rangle$, and eliminate the candidates that do not satisfy the minimum support condition. With the remaining sequences, we proceed in the same way to generate the 3-sequences, and continue until no more sequences with the required support can be produced.
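The level-wise procedure can be sketched in Python on the transactions of Fig. 9.6. Several choices here are assumptions, not the book's: each customer's transactions, ordered by time, form one data sequence; support is the number of customers whose data sequence contains the candidate; and candidate $(k+1)$-sequences are obtained by extending each frequent $k$-sequence with one frequent item, either as a new last element or inside the last element. This extension step is a simplification, not the candidate-merging step of published algorithms such as GSP. Combined with pruning on $(k-1)$-subsequences it still finds every frequent sequence, because removing the largest item of the last element of a frequent sequence always leaves a frequent sequence. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed representation: a sequence is a tuple of elements (frozensets of items).
Sequence = tuple[frozenset[str], ...]

# Transactions of Fig. 9.6 as (CustomerId, Time, Items).
TRANSACTIONS = [
    (1, "2012-06-02", {"Shoes"}), (1, "2013-10-03", {"Shoes"}),
    (2, "2013-06-01", {"Shoes"}), (2, "2013-06-15", {"Jacket"}),
    (2, "2013-08-14", {"Shirt", "Tie"}), (3, "2012-03-02", {"Shoes", "Tie"}),
    (4, "2013-06-02", {"Shoes"}), (4, "2013-07-12", {"Shirt", "Belt", "Tie"}),
    (4, "2013-10-21", {"Shoes"}), (5, "2013-11-06", {"Shoes"}),
]


def data_sequences(transactions) -> list[Sequence]:
    """Assumption: one data sequence per customer, itemsets ordered by date
    (ISO dates sort correctly as strings)."""
    rows = defaultdict(list)
    for customer, time, items in transactions:
        rows[customer].append((time, frozenset(items)))
    return [tuple(items for _, items in sorted(r, key=lambda row: row[0]))
            for r in rows.values()]


def contains(data: Sequence, s: Sequence) -> bool:
    """Order-preserving containment, using a greedy earliest match."""
    j = 0
    for element in s:
        while j < len(data) and not element <= data[j]:
            j += 1
        if j == len(data):
            return False
        j += 1
    return True


def support(s: Sequence, data: list[Sequence]) -> int:
    """Number of data sequences (customers) that contain s."""
    return sum(contains(d, s) for d in data)


def subsequences_k_minus_1(s: Sequence) -> set[Sequence]:
    """Delete one item at a time; drop an element that becomes empty."""
    result = set()
    for idx, element in enumerate(s):
        for item in element:
            rest = element - {item}
            result.add(s[:idx] + ((rest,) if rest else ()) + s[idx + 1:])
    return result


def extensions(s: Sequence, items: list[str]) -> set[Sequence]:
    """Add one item to s: as a new last element, or inside the last element
    when it sorts after the items already there (so no set is built twice)."""
    out = set()
    for x in items:
        out.add(s + (frozenset({x}),))
        if x > max(s[-1]):
            out.add(s[:-1] + (s[-1] | {x},))
    return out


def mine(data: list[Sequence], min_support: int) -> dict[Sequence, int]:
    """Level-wise search returning every frequent sequence with its support."""
    singles = {(frozenset({x}),) for d in data for e in d for x in e}
    level = {s: n for s in singles if (n := support(s, data)) >= min_support}
    items = sorted(next(iter(s[0])) for s in level)
    result = dict(level)
    while level:
        candidates = set()
        for s in level:
            candidates |= extensions(s, items)
        # Apriori pruning: every (k-1)-subsequence must already be frequent.
        candidates = {c for c in candidates
                      if all(sub in level for sub in subsequences_k_minus_1(c))}
        level = {c: n for c in candidates
                 if (n := support(c, data)) >= min_support}
        result.update(level)
    return result


frequent = mine(data_sequences(TRANSACTIONS), min_support=2)
for s, n in frequent.items():
    print(" -> ".join("{" + ", ".join(sorted(e)) + "}" for e in s), n)
```

With a minimum support of 2 customers, this sketch finds eight frequent sequences; the longest is {Shoes} -> {Shirt, Tie}, supported by customers 2 and 4. Jacket and Belt are dropped at the first level, since each was bought by a single customer.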
From the above, it follows that the same principles apply to association rules and sequential pattern analysis; thus, we do not go into further detail and refer the interested reader to the references given at the end of the chapter.
 