Big Data Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 10.3 Serial and parallel frequent sequence mining algorithms

Type

Acronym

Name

Cite

Serial

AprioriAll

[ 2 ]

Serial

BIDE

BI-Directional Extension

[ 56 ]

Serial

CloSpan

Closed Sequential Pattern Mining

[ 61 ]

Serial

FreeSpan

Frequent Pattern-projected Sequential Pattern Mining

[ 20 ]

Serial

GSP

General Sequential Patterns

[ 54 ]

Serial

PrefixScan

Prefix-projected Sequential Pattern Mining

[ 43 ]

Serial

SPADE

Sequential PAttern Discovery using Equivalence classes

[ 62 ]

Serial

WAP-miner

Web Access Pattern Miner

[ 42 ]

Parallel

ACME

Advanced Parallel Motif Extractor

[ 50 ]

Parallel

DGSP

Distributed GSP

[ 44 ]

Parallel

DPF

Data Parallel Formulation

[ 18 ]

Parallel

EVE

Event Distribution

[ 27 ]

Parallel

EVECAN

Event and Candidate Distribution

[ 27 ]

Parallel

HPSPM

Hash Partitioned Sequential Pattern Mining

[ 53 ]

Parallel

MG-FSM

Large-Scale Frequent Sequence Mining

[ 38 ]

Parallel

NPSPM

Non-Partitioned Sequential Pattern Mining

[ 53 ]

Parallel

Par-ASP

Parallel PrefixSpan with Sampling

[ 9 ]

Parallel

Par-CSP

Parallel CloSpan with Sampling

[ 10 ]

Parallel

PLUTE

Parallel Sequential Patterns Mining

[ 45 ]

Parallel

pSPADE

Parallel SPADE

[ 65 ]

of parallelizing these approaches. For easy reference, Table 10.3 lists the serial and

parallel methods described in this section.

5.1

Serial Frequent Sequence Mining

As direct extensions of the Apriori algorithm for the FIM problem, AprioriAll [ 2 ] and

General Sequential Patterns (GSP) [ 54 ], by Srikant and Agrawal, make use of the

downward closure property and follow the same general multi-pass candidate gener-

ation outline as described in Sect. 2 . The join operation is redefined for the sequence

domain s.t. only ( k

−

2)-prefix are

joined. GSP generalizes the problem definition, introducing time constraints (mini-

mum or maximum time period between elements in a frequent sequence), a sliding

window (sequence itemset items may come from a set of transactions with times-

tamps within a user-specified time window), and user-defined taxonomies (the set of

items I may be provided as a hierarchy and sequential patterns may include items

across all levels of the hierarchy).

Pattern growth methods also exhibit direct extensions to the sequence do-

main. Zaki [ 64 ] developed Sequential PAttern Discovery using Equivalence classes

(SPADE), patterned after their ECLAT [ 62 ] algorithm for FIM, which is introduced

in Sect. 2 . During the first scan of the sequence database, SPADE creates, for each

discovered sequence element (itemset), an id-list containing < sequence id,

timestamp > pairs denoting locations of the element in the database. Candidate

1)-length frequent sequences with the same ( k

Frequent Pattern Mining

Search WWH ::

Custom Search

Home