Sequential Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

code, API specification mining and API usage mining from open source repositories,

and Web log data mining [ 1 ]. Sequential pattern mining has become a focused theme

in data mining research. Over the past years, various surveys on sequential pattern

mining have been published and provide useful resources [ 1 - 6 ].

Similar to association rule mining [ 7 ], sequential pattern mining was initially

motivated by the decision support problem in retail industry and was first addressed

by Agrawal and Srikant in [ 8 ]. This problem was defined as follows: Given a set

of sequences, where each sequence consists of a list of elements and each element

consists of a set of items, and given a user-specified min_support threshold, sequen-

tial pattern mining is to find all frequent subsequences, i.e., the subsequences whose

occurrence frequency in the set of sequences is no less than min_support [ 8 ].

Since then, abundant literature has been dedicated to this research and tremen-

dous progress has been made. Improvements in sequential pattern mining algorithms

have followed similar trend in the related area of association rule mining and have

been motivated by the need to process more data at a faster speed with lower cost.

Generally, sequential pattern mining algorithms can be categorized into two major

classes: Apriori-based approaches [ 8 - 14 ] and pattern growth algorithms [ 15 , 16 ].

The first class of algorithms (i.e., Apriori-based approaches) form the vast ma-

jority of algorithms proposed in the literature for sequential pattern mining. They

depend mainly on the Apriori property, which states the fact that any super-pattern

of an infrequent pattern cannot be frequent, and are based on a candidate generation-

and-test paradigm proposed in association rule mining [ 7 ]. These methods have the

disadvantage of repeatedly generating an explosive number of candidate sequences

and scanning the database to maintain the support count information for these se-

quences during each iteration of the algorithm, which makes them computationally

expensive.

To alleviate these problems, pattern growth approach for efficient sequential

pattern mining adopts a divide-and-conquer, pattern growth paradigm as follows,

sequence databases are recursively projected into a set of smaller projected databases

based on the current sequential pattern(s), and sequential patterns are grown in each

projected database by exploring only locally frequent fragments [ 17 ]. The frequent

pattern growth paradigm removes the need for the candidate generation and prune

steps that occur in the Apriori-based algorithms and repeatedly narrows the search

space by dividing a sequence database into a set of smaller projected databases,

which are mined separately.

Additionally, there are various kinds of extensions for sequential pattern mining,

including (1) closed sequential pattern mining, (2) multi-level, multi-dimensional

sequential pattern mining, (3) incremental methods, (4) hybrid methods, (5) approx-

imate methods, (6) top- k closed sequential pattern mining, and (7) frequent episode

mining. This chapter will discuss algorithms which fall into these categories in detail.

The remainder of this chapter is organized as follows. Section 2 defines the se-

quential pattern mining problem. Section 3 discusses Apriori-based approaches. In

Sect. 4, pattern growth algorithms are introduced. The extensions for sequential pat-

tern mining in different directions are discussed in Sect. 5. Finally, we conclude this

chapter in Sect. 6.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home