Fig. 1. Scalability of the three algorithms for decreasing values of θ, for synthetic dataset (C10D10K) (L) and for real dataset Gazelle (R). [Figure omitted; x-axis of both panels: θ values (in %age); curves: BFS+P, DFS+P, PGA.]
Fig. 2. Scalability for increasing number of sources D, with average number of events per source C = 10 and θ = 1% (L), and for increasing number of events per source C, with number of sources D = 10K and θ = 25% (R). [Figure omitted; x-axes: No. of sources (in thousands) and No. of events per source; curves: BFS+P, DFS+P, PGA.]
4 Experimental Evaluation
In [12], the authors adapt GSP and SPADE/SPAM to yield a breadth-first (BFS) and a depth-first (DFS) algorithm, respectively. In addition, they propose a probabilistic pruning technique that eliminates potentially infrequent candidates without computing their support, achieving an overall speedup. We therefore choose two of the faster candidate-generation variants from [12], i.e., BFS+P (breadth-first search with pruning) and DFS+P (depth-first search with pruning), and compare them with the Pattern-Growth Approach (PGA) proposed in this work.
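The exact pruning rules of [12] are more involved than we reproduce here; the following C# sketch only illustrates the general idea under our own assumptions (the class, the method name, and the particular bound are ours, not taken from [12]): a candidate is discarded when even an optimistic upper bound on its expected support, exploiting the fact that a sequence cannot occur in a source with higher probability than its least likely item does, stays below the threshold θ.

using System.Collections.Generic;
using System.Linq;

static class ProbabilisticPruning
{
    // itemProbsPerSource[s] holds, for source s, the probability that each item
    // of the candidate occurs somewhere in s (empty if some item never occurs).
    public static bool CanPrune(List<double[]> itemProbsPerSource, double minExpSup)
    {
        double upperBound = 0.0;
        foreach (double[] itemProbs in itemProbsPerSource)
        {
            if (itemProbs.Length == 0) continue;       // candidate cannot occur in this source
            upperBound += itemProbs.Min();             // P(candidate in source) <= min item probability
            if (upperBound >= minExpSup) return false; // bound reaches threshold: keep candidate
        }
        return true;                                   // even the optimistic bound falls short: prune
    }
}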
Our implementations are in C# (Visual Studio .NET 2005), executed on a machine with a 3.2 GHz Intel CPU and 3 GB of RAM running Windows XP (SP3). We begin by describing the datasets used in the experiments and then demonstrate the scalability of the three algorithms; all reported running times are averages over multiple runs. We use both real (Gazelle from Blue Martini [10]) and synthetic (IBM Quest [2]) datasets. We transform these deterministic datasets into probabilistic form in a way similar to [1,4,18,12]: we assign an existence probability to each event.
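The cited works differ in how these probabilities are generated; as a minimal sketch of one common scheme (the Gaussian distribution, its parameters, and all names below are illustrative assumptions, not values taken from this paper), each event receives an existence probability drawn from a clamped normal distribution:

using System;
using System.Collections.Generic;

static class DatasetConverter
{
    static readonly Random Rng = new Random(42); // fixed seed for repeatable runs

    // Draw an existence probability from N(mean, stdDev) via the Box-Muller
    // transform, clamped to (0, 1].
    static double SampleProbability(double mean = 0.5, double stdDev = 0.2)
    {
        double u1 = 1.0 - Rng.NextDouble(); // in (0, 1], so Log(u1) is defined
        double u2 = Rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return Math.Min(1.0, Math.Max(1e-6, mean + stdDev * z));
    }

    // events[s] is the ordered list of event labels of source s; the result
    // pairs every event with a sampled existence probability.
    public static List<List<(string Label, double Prob)>> ToProbabilistic(
        List<List<string>> events)
    {
        var result = new List<List<(string Label, double Prob)>>();
        foreach (var source in events)
        {
            var probabilistic = new List<(string Label, double Prob)>();
            foreach (var label in source)
                probabilistic.Add((label, SampleProbability()));
            result.Add(probabilistic);
        }
        return result;
    }
}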