Database Reference
In-Depth Information
Table 2.
An example of computing the ES of all S-extensions of
s
=(
a
)forsource
X
in the sample database of Table 1. The gray row is the
B
X,s
array. In the bottom
half, the cells that changed in
F
after processing the corresponding event are marked
as gray. After processing source
X
, values in the final column are updated to
G
.
D
X
(
a, c, e
:0
.
1)
(
b, c, d
:0
.
7)
(
a, d, e
:0
.
2)
(
b, c, e
:0
.
6)
(a)
0.1
0.1
0.28
0.28
(a)
0.0
0.0
0.02
0.020
(b)
0.0
0.07
0.07
0.196
(c)
0.0
0.07
0.07
0.196
(d)
0.0
0.07
0.076
0.076
(e)
0.0
0.0
0.02
0.176
(i)
(ii)
(iii)
(iv)
Objective:
To compute the
ES
of all the S- or I-extensions of
s
in one pass
over the database, and thus discover all frequent extensions of
s
.
Steps:
We consider the two cases of finding the S- and I-extensions of
s
in turn.
We compute the frequent S-extensions of
s
as follows: Let
i
be a source,
D
i
=
(
e
1
,c
1
)
,...,
(
e
r
,c
r
)
,and
s
=
s
1
,...,s
q
be any sequence. Initialize two arrays
F
and
G
,eachofsize
q
=
to zero and consider each source
i
in turn. Then
scan
B
i,s
up-to the first non-zero entry
e
k
,andforeveryitem
x
in
e
,
k<≤ r
,
update
F
[
x
] as follows:
|I|
F
[
x
] := ((1
∗ F
[
x
]) + (
c
∗ B
i,s
[
−
− c
)
1])
(5)
We keep track of all the non-zero entries in
F
[
x
], and once finished with source
i
,update
G
[
x
]:=
G
[
x
]+
F
[
x
]andreset
F
[
x
] to zero. After all the sources
i
are processed, all the entries in
G
[
x
]
≥ θm
are frequent S-extensions of
s
.An
example of this computation is shown in Table 2.
For the I-extensions case, the initializations are the same as for the S-extensions
case. For a sequence
s
=
s
1
,...,s
q
and source
i
, when scanning
B
i,s
up-to the
first non-zero entry
e
which means
s
q
⊆ e
, for every item
x
in
e
that is not in
s
q
and is lexicographically greater than all items in
s
q
,update
F
[
x
] as follows:
F
[
x
]:=(1
− c
)
∗ F
[
x
]+(
B
i,s
[
]
− B
i,s
[
−
1]
∗
(1
− c
))
,
(6)
and apply Eq. 6 to all the events
e
,
≤ r
,where
s
q
⊆ e
(or alternatively where
the
B
i,s
values change). After all the sources
i
are processed, all the entries in
G
[
x
]
≥ θm
are frequent I-extensions of
s
.
Pattern-Growth Algorithm.
An overview of our pattern-growth algorithm
is in Fig. 1. We first compute the set of frequent 1-sequences,
L
1
(Line 3) (as-
sume
L
1
is in ascending order). For each 1-sequence
x
, first we compute the
B
i,x