Database Reference
In-Depth Information
Table 2. An example of computing the ES of all S-extensions of s =( a )forsource
X in the sample database of Table 1. The gray row is the B X,s array. In the bottom
half, the cells that changed in F after processing the corresponding event are marked
as gray. After processing source X , values in the final column are updated to G .
D X
( a, c, e :0 . 1)
( b, c, d :0 . 7)
( a, d, e :0 . 2)
( b, c, e :0 . 6)
(a)
0.1
0.1
0.28
0.28
(a)
0.0
0.0
0.02
0.020
(b)
0.0
0.07
0.07
0.196
(c)
0.0
0.07
0.07
0.196
(d)
0.0
0.07
0.076
0.076
(e)
0.0
0.0
0.02
0.176
(i)
(ii)
(iii)
(iv)
Objective: To compute the ES of all the S- or I-extensions of s in one pass
over the database, and thus discover all frequent extensions of s .
Steps: We consider the two cases of finding the S- and I-extensions of s in turn.
We compute the frequent S-extensions of s as follows: Let i be a source, D i
=
( e 1 ,c 1 ) ,..., ( e r ,c r )
,and s =
s 1 ,...,s q
be any sequence. Initialize two arrays
F and G ,eachofsize q =
to zero and consider each source i in turn. Then
scan B i,s up-to the first non-zero entry e k ,andforeveryitem x in e , k<≤ r ,
update F [ x ] as follows:
|I|
F [ x ] := ((1
∗ F [ x ]) + ( c ∗ B i,s [
− c )
1])
(5)
We keep track of all the non-zero entries in F [ x ], and once finished with source
i ,update G [ x ]:= G [ x ]+ F [ x ]andreset F [ x ] to zero. After all the sources i
are processed, all the entries in G [ x ]
≥ θm are frequent S-extensions of s .An
example of this computation is shown in Table 2.
For the I-extensions case, the initializations are the same as for the S-extensions
case. For a sequence s = s 1 ,...,s q and source i , when scanning B i,s up-to the
first non-zero entry e which means s q ⊆ e , for every item x in e that is not in
s q
and is lexicographically greater than all items in s q ,update F [ x ] as follows:
F [ x ]:=(1 − c ) ∗ F [ x ]+( B i,s [ ] − B i,s [ 1] (1 − c )) ,
(6)
and apply Eq. 6 to all the events e , ≤ r ,where s q ⊆ e (or alternatively where
the B i,s
values change). After all the sources i are processed, all the entries in
G [ x ]
≥ θm are frequent I-extensions of s .
Pattern-Growth Algorithm. An overview of our pattern-growth algorithm
is in Fig. 1. We first compute the set of frequent 1-sequences, L 1 (Line 3) (as-
sume L 1 is in ascending order). For each 1-sequence x , first we compute the B i,x
 
Search WWH ::




Custom Search