Database Reference
In-Depth Information
Summarization subsequences which are used for sequence clustering are a subset
of closed sequential patterns. Wang et al. [ 29 ] proposed a new algorithm, CON-
TOUR, which efficiently mines summarization subsequences directly from the input
sequences, and uses the set of summarization subsequences to construct a sequence
clustering algorithm. CONTOUR is based on the pattern growth paradigm and lever-
ages some effective pruning methods to prune the unpromising parts of search space
by fully exploring some nice properties of the summarization subsequences.
5.2
Multi-level, Multi-dimensional Sequential Pattern Mining
Algorithms discussed so far are based on 1 or 2-dimensional spaces. In many ap-
plications, sequences are often associated with different circumstances, and such
circumstances form a multiple dimensional space. For example, customer purchase
sequences are associated with region, time, customer group, and others. It is in-
teresting and useful to mine sequential patterns associated with multi-dimensional
information. For example, one may find that retired customers (with age) over 60
may have very different patterns in shopping sequences from the professional cus-
tomers younger than 40. Similarly, items in the sequences may also be associated
with different levels of abstraction, and such multiple abstraction levels will form a
multi-level space for sequential pattern mining. For example, one may not be able to
find any interesting buying patterns in an electronics store by examining the concrete
models of products that customers purchase. However, if the concept level is raised
a little high to brand-level, one may find some interesting patterns, such as “ if one
bought an IBM PC, it is likely s/he will buy a new IBM Laptop and then a Cannon
digital camera within the next six months ”[ 3 ].
Pinto et al. [ 30 ] proposed a uniform sequential (or Uni-Seq ) algorithm by em-
bedding multi-dimensional information into sequences. For each sequence, a set of
multi-dimensional circumstance values can be treated as one added transaction in
the sequence. Similarly, for each item, its associated multi-level information can be
added as additional items into the same transaction where that item resides. With
such transformation, the database becomes a typical single-dimensional, single-level
sequence database, and the PrefixSpan algorithm is applied to efficiently mine such
transformed sequence databases. In the same work [ 30 ], Pinto et al. also proposed
Seq-Dim and Dim-Seq algorithms, which divide the mining process into two steps.
Seq-Dim algorithm first mines sequential patterns, and then for each sequential
pattern, forms projected multi-dimensional database and finds multi-dimensional
patterns within the projected databases, while Dim-Seq algorithm uses the reverse
procedure.
Yu and Chen formally defined the multi-dimensional sequential pattern mining
problem in [ 31 ]. In this study, they introduced two algorithms, the first of which
is developed by modifying the traditional Apriori algorithm [ 19 ] and the second by
modifying the PrefixSpan algorithm [ 16 ]. The first algorithm has different methods
for candidate generation and support counting compared with the original Apriori
algorithm. The second algorithm has different approaches for sequential pattern
Search WWH ::




Custom Search