Database Reference
In-Depth Information
− c k )
1] + c k ∗ A [ k −
A [ k, ]=(1
∗ A [ k, −
1 ,−
1] .
(4)
Lemma 1. Given a p-sequence D i
and a sequence s , by applying Eq. 4 repeat-
edly, we correctly compute Pr[ s D i
] .
Fas t L 1 Computation. It was shown by [12] that it was possible to compute all
frequent 1-sequences in a single pass over the database. The procedure for this is
as follows: Initialize two arrays F and G ,eachofsize q =
|I|
, to zero and consider
each source i in turn. If D i
,for k =1 ,...,r take the pair
( e k ,c k ) and iterate through each x ∈ e k , setting F [ x ]:=( F [ x ]
=
( e 1 ,c 1 ) ,..., ( e r ,c r )
− c k )) + c k .
Once finished with source i ,if F [ x ] is non-zero, update G [ x ]:= G [ x ]+ F [ x ]and
reset F [ x ] to zero (for each source i use a list structure to keep track of all the
non-zero entries in F ). Finally, for any item x ∈I
(1
x,D p ).
, G [ x ]= ES (
PrefixSpan. PrefixSpan is based on the idea of pattern-growth, and works as
follows: First, all frequent 1-sequences are discovered. It is argued that any of the
frequent 2-sequences must begin with a frequent 1-sequence and therefore, the
complete set of sequential patterns can be partitioned into as many subsets as
the number of frequent 1-sequences where each 1-sequence is taken as a prefix.
A projected database is a smaller databased based on some prefix (sequence).
For example, in the sample database of Table 1(R), a ( d )-projected database is
{
. The sub-
set of sequential patterns is mined by constructing the set of projected databases
based on frequent 1-sequences and mining each recursively. For example, if ( e )
is a frequent 1-sequence in the above ( d )-projected database, a ( d )( e )-projected
database looks like
( a, d, e :0 . 2)( b, c, e :0 . 6)
,
( a, d, e :0 . 3)
,
( ,e :0 . 5)( b, c, e :0 . 4)
}
. This recursive mining process con-
tinues until no more sequential patterns could be found. For details see [13].
It was noted in [12] that it is not correct to simply perform the fast L 1
computation on a projected database. For example, if an ( a )-projected database
contained two p-sequences ( b :0 . 5)( b :0 . 5)( a :0 . 5) and ( b :0 . 5)( a :0 . 5)( b :0 . 5),
then when considering whether ( a )( b ) is frequent, it is not correct to compute
the expected support of ( b ) in the projected database (for example, both p-
sequences above would give the same contribution - 0.75 - to the support of
( b ) in the projected database, but clearly their support for ( a )( b ) is different).
In this work, we show how these sub-routines could be put together to find all
frequent sequential patterns using PGA.
{
( b, c, e :0 . 6)
,
,
}
3.1 Pattern-Growth Step
Pre-conditions
1. s is a previously discovered frequent sequence.
2. The list of sources i ,wherePr[ s D i
] > 0 is available.
3. The B i,s arrays for all such sources i are also available.
 
Search WWH ::




Custom Search