Sequential Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Summarization subsequences which are used for sequence clustering are a subset

of closed sequential patterns. Wang et al. [ 29 ] proposed a new algorithm, CON-

TOUR, which efficiently mines summarization subsequences directly from the input

sequences, and uses the set of summarization subsequences to construct a sequence

clustering algorithm. CONTOUR is based on the pattern growth paradigm and lever-

ages some effective pruning methods to prune the unpromising parts of search space

by fully exploring some nice properties of the summarization subsequences.

5.2

Multi-level, Multi-dimensional Sequential Pattern Mining

Algorithms discussed so far are based on 1 or 2-dimensional spaces. In many ap-

plications, sequences are often associated with different circumstances, and such

circumstances form a multiple dimensional space. For example, customer purchase

sequences are associated with region, time, customer group, and others. It is in-

teresting and useful to mine sequential patterns associated with multi-dimensional

information. For example, one may find that retired customers (with age) over 60

may have very different patterns in shopping sequences from the professional cus-

tomers younger than 40. Similarly, items in the sequences may also be associated

with different levels of abstraction, and such multiple abstraction levels will form a

multi-level space for sequential pattern mining. For example, one may not be able to

find any interesting buying patterns in an electronics store by examining the concrete

models of products that customers purchase. However, if the concept level is raised

a little high to brand-level, one may find some interesting patterns, such as “ if one

bought an IBM PC, it is likely s/he will buy a new IBM Laptop and then a Cannon

digital camera within the next six months ”[ 3 ].

Pinto et al. [ 30 ] proposed a uniform sequential (or Uni-Seq ) algorithm by em-

bedding multi-dimensional information into sequences. For each sequence, a set of

multi-dimensional circumstance values can be treated as one added transaction in

the sequence. Similarly, for each item, its associated multi-level information can be

added as additional items into the same transaction where that item resides. With

such transformation, the database becomes a typical single-dimensional, single-level

sequence database, and the PrefixSpan algorithm is applied to efficiently mine such

transformed sequence databases. In the same work [ 30 ], Pinto et al. also proposed

Seq-Dim and Dim-Seq algorithms, which divide the mining process into two steps.

Seq-Dim algorithm first mines sequential patterns, and then for each sequential

pattern, forms projected multi-dimensional database and finds multi-dimensional

patterns within the projected databases, while Dim-Seq algorithm uses the reverse

procedure.

Yu and Chen formally defined the multi-dimensional sequential pattern mining

problem in [ 31 ]. In this study, they introduced two algorithms, the first of which

is developed by modifying the traditional Apriori algorithm [ 19 ] and the second by

modifying the PrefixSpan algorithm [ 16 ]. The first algorithm has different methods

for candidate generation and support counting compared with the original Apriori

algorithm. The second algorithm has different approaches for sequential pattern

Frequent Pattern Mining

Search WWH ::

Custom Search

Home