Database Reference
In-Depth Information
algorithms often degrades dramatically. In the field of association rule mining, al-
gorithms such as CHARM [ 21 ], CLOSET [ 22 ], CLOSET+ [ 23 ], CARPENTER[ 24 ]
are proposed for mining frequent closed itemsets , which overcome some of these dif-
ficulties. Similarly, in the area of sequential pattern mining, Yan et al. [ 25 ] proposed
an algorithm called CloSpan, which mines only closed sequential patterns instead
of mining the complete set of sequential patterns. CloSpan can produce a signifi-
cantly less number of sequences than the traditional (i. e., full-set) methods while
preserving the same expressive power since the whole set of frequent subsequences,
together with their supports, can be derived easily from the closed sequential pattern
mining results.
CloSpan [ 25 ] is developed based on the philosophy of sequential pattern growth
and mines closed sequential patterns efficiently by discovery of sharing portions
of the projected databases in the mining process and pruning any redundant search
space, which therefore substantially enhances the mining efficiency. The algorithm
first uses a lexicographic sequence tree to store the generated sequences using both I-
Step and S-Step mechanisms and discovers all of the frequent sequences (closed and
non-closed). During this mining process, CloSpan introduces a search space pruning
condition: whenever it finds two exactly same prefix-based project databases, it can
stop growing one prefix, which is called equivalent projected database pruning. To
determine whether two projected databases are the same for two sequences one of
which is a subsequence of another, CloSpan just needs to compare the size of the
two projected databases, which has been proved in [ 25 ]. Finally, CloSpan uses a
post-pruning step to filter out non-closed sequences.
BIDE [ 26 ] is an efficient algorithm for mining frequent closed sequences without
candidate maintenance. It adopts a novel sequence closure checking scheme called
BI-Directional Extension , where the forward directional extension is used to grow
the prefix patterns and also checks the closure of prefix patterns, while the backward
directional extension can be used to both check closure of a prefix pattern and prune
the search space. Furthermore, the search space pruning is made more efficient by
using the BackScan pruning method and the ScanSkip optimization technique [ 26 ].
The BackScan search space pruning is based on the theorem that given a prefix s p ,if
i ( i is a positive integer and is no greater than the length of s p ) and there exists any
item that appears in each of its i -th semi-maximum periods, s p can be safely pruned.
The interested readers are referred to [ 26 , 27 ] for more details.
Li et al. [ 28 ] proposed Gap-BIDE for mining closed sequential patterns with
gap constraints from a set of input sequences. As Gap-BIDE inherits the same
design philosophy as BIDE algorithm [ 26 ], it shares the same merit, that is, it
does not need to maintain a candidate pattern set, which saves space consumption.
Specifically, Gap-BIDE firstly establishes a framework to enumerate all the frequent
gap-constrained patterns. Then, it leverages a gap-constrained Bi-Directional clo-
sure checking scheme under this enumeration framework, and thus avoids keeping a
large candidate set of closed frequent gap-constrained patterns in order for checking
if a newly mined pattern is closed. At last, it derives a gap-constrained BackScan
pruning technique to prune the unpromising parts of the search space for closed
gap-constrained sequential pattern mining.
Search WWH ::




Custom Search