Database Reference
In-Depth Information
support counts can then be computed via id-list intersections. Zaki used lattice theory
[ 12 ] to show that any possible frequent sequence can be obtained as a union or join
of shorter frequent sequences. He proposed both breath-first-search and depth-first-
search approaches to recursively decompose the lattice into prefix-based classes,
providing an alternative to the breath-first-search approach of GSP.
Han et al. [ 20 ] introduced Frequent pattern-projected Sequential pattern mining
(FreeSpan), which Pei et al. [ 43 ] followed with Prefix-projected Sequential pattern
mining (PrefixSpan). FreeSpan and PrefixSpan recursively partition the sequence
data based on frequent items and sequence prefixes, respectively. They first scan
the database and derive frequent items. Then, projected databases are constructed
for each of the frequent items, or length-1 frequent sequences, eliminating those
sequences that only contain infrequent items. By choosing to partition on sequence
prefixes, PrefixSpan is able to find all frequent sequences by examining only prefix
sub-sequences and projecting only their corresponding postfix sub-sequences into
projected databases. Pei et al. also explore pseudo-projections (when data fits in
main memory) and bi-level projections in PrefixSpan, using the downward closure
property to prune items in projected databases.
Guralnik and Karypis [ 18 ] altered a tree projection algorithm for discovering
frequent itemsets by Agarwal et al. [ 4 ] to mine frequent sequences. As in the FIM
approach, a lexicographic projection tree is grown through bi-level candidate se-
quence generation (nodes at level k
1). Each
node in the tree represents a frequent sequence, which can be extended either by
adding a lexicographically correct item in the last element of the sequence ( itemset
extension ) or a new itemset/element to the sequence ( sequence extension ). Each ac-
tive node, one that still has a possibility of being extended, maintains four sparse
count matrices used to efficiently update frequencies during the projection phase.
In the context of Web access mining, Pei et al. [ 42 ] developed Web Access Pattern
miner (WAP-miner) for mining symbol sequences constructed from Web logs. The
algorithm uses a Web access pattern tree ( WAP-tree ) as a compressed representation
of the sequence database and a conditional search tree-projection mechanism for
growing sequence patterns. Rajimol and Raju [ 46 ] surveyed several WAP-tree based
extensions which improve the data structure and projection mechanism of WAP-
miner.
While finding all frequent patterns is costly, the majority of the resulting pat-
terns may not be interesting. CloSpan (Yan et al. [ 61 ]) and BIDE (Wang and Han
[ 56 ]) focus on finding frequent closed sequences . Though following a candidate
generate-and-test methodology, CloSpan prunes more of the search space through
CommonPrefix and Backward Sub-Pattern pruning . It also speeds up testing by not-
ing that projected databases with the same number of items are equivalent. BIDE
uses a paradigm called BI-Directional Extension to both check closure of candidate
sequences and prune the search space.
As a related problem first studied by Mannila et al. [ 36 ], frequent episode mining
finds frequent collections of events that occur relatively close to each other in a long
symbol sequence, given a partial event order. Joshi et al. [ 26 ] developed a universal
formulation for sequence patterns, encompassing both Mannila et al.'s definition of
1 generate candidates for level k
+
Search WWH ::




Custom Search