Database Reference
In-Depth Information
code, API specification mining and API usage mining from open source repositories,
and Web log data mining [ 1 ]. Sequential pattern mining has become a focused theme
in data mining research. Over the past years, various surveys on sequential pattern
mining have been published and provide useful resources [ 1 - 6 ].
Similar to association rule mining [ 7 ], sequential pattern mining was initially
motivated by the decision support problem in retail industry and was first addressed
by Agrawal and Srikant in [ 8 ]. This problem was defined as follows: Given a set
of sequences, where each sequence consists of a list of elements and each element
consists of a set of items, and given a user-specified min_support threshold, sequen-
tial pattern mining is to find all frequent subsequences, i.e., the subsequences whose
occurrence frequency in the set of sequences is no less than min_support [ 8 ].
Since then, abundant literature has been dedicated to this research and tremen-
dous progress has been made. Improvements in sequential pattern mining algorithms
have followed similar trend in the related area of association rule mining and have
been motivated by the need to process more data at a faster speed with lower cost.
Generally, sequential pattern mining algorithms can be categorized into two major
classes: Apriori-based approaches [ 8 - 14 ] and pattern growth algorithms [ 15 , 16 ].
The first class of algorithms (i.e., Apriori-based approaches) form the vast ma-
jority of algorithms proposed in the literature for sequential pattern mining. They
depend mainly on the Apriori property, which states the fact that any super-pattern
of an infrequent pattern cannot be frequent, and are based on a candidate generation-
and-test paradigm proposed in association rule mining [ 7 ]. These methods have the
disadvantage of repeatedly generating an explosive number of candidate sequences
and scanning the database to maintain the support count information for these se-
quences during each iteration of the algorithm, which makes them computationally
expensive.
To alleviate these problems, pattern growth approach for efficient sequential
pattern mining adopts a divide-and-conquer, pattern growth paradigm as follows,
sequence databases are recursively projected into a set of smaller projected databases
based on the current sequential pattern(s), and sequential patterns are grown in each
projected database by exploring only locally frequent fragments [ 17 ]. The frequent
pattern growth paradigm removes the need for the candidate generation and prune
steps that occur in the Apriori-based algorithms and repeatedly narrows the search
space by dividing a sequence database into a set of smaller projected databases,
which are mined separately.
Additionally, there are various kinds of extensions for sequential pattern mining,
including (1) closed sequential pattern mining, (2) multi-level, multi-dimensional
sequential pattern mining, (3) incremental methods, (4) hybrid methods, (5) approx-
imate methods, (6) top- k closed sequential pattern mining, and (7) frequent episode
mining. This chapter will discuss algorithms which fall into these categories in detail.
The remainder of this chapter is organized as follows. Section 2 defines the se-
quential pattern mining problem. Section 3 discusses Apriori-based approaches. In
Sect. 4, pattern growth algorithms are introduced. The extensions for sequential pat-
tern mining in different directions are discussed in Sect. 5. Finally, we conclude this
chapter in Sect. 6.
Search WWH ::




Custom Search