Database Reference
In-Depth Information
Algorithm 1.
Pattern-Growth Approach
1:
Input:
SLU probabilistic database
D
p
and support threshold
θ
.
2:
Output:
All sequences
s
with
ES
(
s, D
p
)
≥ θm
.
3:
L
1
←
ComputeFrequent-1-sequences(
D
p
)
4:
for all
sequences
x ∈ L
1
do
5: Compute
B
i,x
arrays
6: Call ProjectedDB(
x
)
7:
function
ProjectedDB(
s
)
8:
L
S
←
Compute Frequent S-extensions
{
fast
L
1
computation
}
9:
L
I
←
Compute Frequent I-extensions
{
fast
L
1
computation
}
10: Output all Frequent Sequences
{s
extended with
x
, for all
x
in
L
S
and
L
I
}
11:
for all
x ∈ L
S
do
12:
t ←s ·{x} {
S-extension
}
13: Compute
B
i,t
arrays
14: ProjectedDB(
t
)
15:
for all
x ∈ L
I
do
16:
t ←s
1
,...,s
q
∪{x} {
I-extension
}
17: Compute
B
i,t
arrays
18: ProjectedDB(
t
)
19:
end function
arrays for each source (Line 5) and also keep track of all the sources where
Pr[
x D
i
]
>
0 (projected database) and then, call the ProjectedDB(
x
) sub-
routine (Line 6).
In the ProjectedDB sub-routine, we first compute all the frequent S- and
I-extensions of
s
using the fast
L
1
computation by applying Eq. 5 and Eq. 6
accordingly (Line 8 and 9). In step 10, output all the frequent S- and I-extensions
of
s
computed in the previous steps. In steps 11-18, for every sequence
t
which
is a frequent S- or I-extension of
s
, compute
B
i,t
arrays and also keep track of
all the sources where Pr[
t D
i
]
>
0, and call the ProjectedDB sub-routine
recursively to mine all frequent sequential patterns.
Table 3.
Number of DP computations (in millions) performed by each algorithm, for
the set of experiments in Fig. 1 (L) and in Fig. 2 (R)
BFS+P DFS+P PGA
θ
= 0.5 % 3.141
C10D10K
C
=10
,θ
=1% BFS+P DFS+P PGA
D
= 10K
3.199 1.494
1.465
1.711
0.886
θ
= 1 % 1.711
1.465 0.886
D
= 20K
2.890
3.370
1.735
θ
= 2 % 0.879
0.781 0.487
D
= 40K
5.716
6.718
3.464
θ
= 4 % 0.505
0.487 0.281
D
= 80K
11.460
13.423
6.936
gazelle
D
=10
= 25%
C
= 10
K, θ
θ
= 0.01 % 0.777
0.375 0.261
0.100
0.111
0.081
θ
= 0.02 % 0.149
0.172 0.152
C
= 20
0.690
0.694
0.314
θ
= 0.03 % 0.073
0.129 0.122
C
= 40
13.044
13.891
4.868
θ
= 0.04 % 0.045
0.110 0.107
C
= 80
2353.677 2782.807 881.480