This leads us to propose, in the first part of this chapter, a new algorithm that
aims to improve the performance of mining sequential association rules while
reducing resource consumption. We make the following contributions:
1. The algorithm makes only one scan of the database;
2. It is based on a highly compact main-memory data structure, which reduces the
required storage resources;
3. It allows fast access to the data thanks to an index structure;
4. The experimental results show that our algorithm outperforms existing ones.
Mining sequential patterns has many interesting applications in its own right. In
addition to performance issues, many works have proposed new features, such as
incremental sequential pattern mining [5] [12], restriction by constraints [14], or
handling new types of data, such as query plans [26]. Among the interesting
extensions, multidimensional sequence mining is a major issue [16]. It allows
discovering rules that link sequences (e.g. transaction history) with regular attribute
data (such as those in a client file). Such rules may describe customer profiles, e.g. to
which category of individuals a given purchase (or a given path traversal pattern)
corresponds. This is the subject of the second part of this chapter.
Our approach consists in mining individual profiles - based on attributes - for the
most frequent sequential patterns. To this end, we propose a characterization-based
approach where a whole sequence is considered as a complex attribute. Thus, it makes
sense to integrate reasoning on sequences (frequent patterns, similarity, grouping)
while the other dimensions are considered as descriptive of each sequence group.
Briefly, our approach is based on two steps. The first gathers all database sequences
around the most similar sequential pattern in order to derive classes of sequences
represented by their sequential patterns. The second step describes these classes (and
their sequential patterns) by the multidimensional attribute values that characterize
them. The characteristic rules express which attribute properties are typical of
frequent sequential patterns. The sequential patterns must meet a given support
threshold, and the rules must be satisfied with a given confidence threshold. The
extraction of such rules raises three main questions:
1. How to determine that a sequence or a subsequence is similar to another?
2. How to group multidimensional sequences with a given sequential pattern?
3. How to determine the most characteristic properties for a group of sequences?
We have adopted different solutions, which we detail later in this chapter.
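To make the notions of support, grouping and confidence more concrete, the following Python sketch illustrates one possible reading of the three questions above. It is not the method developed in this chapter: it assumes that "similar" simply means that a sequence contains the pattern as an ordered subsequence (gaps allowed), that a group is the set of records whose sequence contains the pattern, and that the confidence of a property is the fraction of the group sharing it. All function names, attribute names and data are hypothetical.

```python
from collections import defaultdict


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # "item in it" consumes the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def characteristic_rules(records, pattern, min_support, min_confidence):
    """Return {(attribute, value): confidence} for the group of records matching pattern.

    records is a list of (attributes_dict, sequence) pairs.
    Assumption: confidence = share of the group having that attribute value.
    """
    group = [attrs for attrs, seq in records if is_subsequence(pattern, seq)]
    if not records or len(group) / len(records) < min_support:
        return {}  # the pattern is not frequent enough to be characterized
    counts = defaultdict(int)
    for attrs in group:
        for key, value in attrs.items():
            counts[(key, value)] += 1
    return {
        prop: n / len(group)
        for prop, n in counts.items()
        if n / len(group) >= min_confidence
    }


# Hypothetical activity sequences with descriptive attributes.
records = [
    ({"age": "adult", "job": "employee"}, ["home", "work", "shop", "home"]),
    ({"age": "adult", "job": "employee"}, ["home", "work", "home"]),
    ({"age": "senior", "job": "retired"}, ["home", "shop", "home"]),
    ({"age": "adult", "job": "student"}, ["home", "school", "work", "home"]),
]
pattern = ["home", "work", "home"]

rules = characteristic_rules(records, pattern, min_support=0.5, min_confidence=0.6)
for (attribute, value), confidence in rules.items():
    print(f"{pattern} -> {attribute}={value} (confidence {confidence:.2f})")
```

In this toy data, three of the four records contain the pattern, so its support is 0.75 and the group is characterized only by the properties shared by at least 60% of its members. A real similarity measure would likely be more tolerant than strict containment.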
Both methods have been evaluated on a real dataset from a survey of the population's
daily activity and mobility. The aim is to mine frequent patterns of activity
sequences, then to analyze the profile of the population having those typical activity
sequences. In addition, other experiments have been conducted to test the scalability
of the sequential pattern mining algorithm, using synthetic data and widely used
publicly available data.
This chapter combines and extends two previously published papers, namely [17] and
[18]. It is organized as follows: a background section will provide an overview of the
state of the art, before stating the concepts and definitions used further, and finally, it