Information Technology Reference
In-Depth Information
8
From Sequence Mining to Multidimensional
Sequence Mining
Karine Zeitouni
PRISM Laboratory, University of Versailles-St-Quentin
45 Avenue des Etats-Unis, 78035 Versailles, France
Karine.Zeitouni@prism.uvsq.fr
Abstract. Sequential pattern mining has been broadly studied and many algorithms have been
proposed. The first part of this chapter proposes a new algorithm for mining frequent se-
quences. This algorithm processes only one scan of the database thanks to an indexed structure
associated to a bit map representation. Thus, it allows a fast data access and a compact storage
in main memory. Experiments have been conducted using real and synthetic datasets. The
experimental results show the efficiency of our method compared to existing algorithms.
Beyond mining plain sequences, taking into account multidimensional information associated
to sequential data is for a great interest for many applications. In the second part, we propose a
characterization based multidimensional sequential patterns mining. This method first groups
sequences by similarity; then characterizes each cluster using multidimensional properties
describing the sequences. The clusters are built around the frequent sequential patterns. Thus,
the whole process results in rules characterizing sequential patterns using multidimensional
information. This method has been experimented towards a survey on population daily activity
and mobility in order to analyze the profile of the population having typical activity sequences.
The extracted rules show our method effectiveness.
Keywords: Sequential data mining, data structures, algorithms, optimization.
8.1 Introduction
The problem of mining sequential patterns was first introduced in the context of mar-
ket basket analysis [2]. It aims to retrieve frequent patterns in the sequences of prod-
ucts purchased by customers through time ordered transactions. Several algorithms
have been proposed in order to improve the performances and to reduce required
space in memory [20] [26] [11] [13] [10]. Other works have concerned mining
frequent sequences in DNA [9] or Web Usage Mining [19] [22]. In general, it could
apply to any database containing a collection of item sequences. Our target applica-
tion was based on a time-use survey, more precisely the database reports the daily
activities and displacements carried out by each surveyed person in a household.
After having tested the most cited algorithms, we have observed their weakness to
scale with large size datasets. Indeed, most of them perform multiple scans of the
database, which is the main bottleneck in mining. Others require too large memory
space to load the data when the database size increases.
 
Search WWH ::




Custom Search