Information Technology Reference
In-Depth Information
It has been applied to discover all frequent activity sequences in a time use survey
database within an urban area. The experiments have shown that IBM and its variants
provide better performances than existing algorithms in most cases: a better response
time is reached, while a very low memory is required. Experimental results have also
shown that IBM2 and IBM2_OPT outperforms IBM and IBM_OPT , which in turns
outperforms SPAM and PrefixSPAN for large and very large databases and for a
limited number of distinct items.
Notice that the proposed data structure for IBM and IBM2 algorithms, and espe-
cially the SV vector, could be used for other purposes as similarity search between
sequences and sequence clustering. Another perspective is to apply it to different
application contexts, as the analysis of query plans and genomic sequences. We have
already experimented many various datasets: activity sequences, chess play se-
quences, and the web pages sequences msnbc.dat from the UCI KDD archive. In the
context of the activity-mobility survey, we will explore the mining of spatial
sequences, such as trajectories [7]. This is still a challenging research issue.
This chapter also describes a sequence centered approach for mining multidimen-
sional sequential rules . It characterizes the main sequences and performs in three
steps. First, it mines sequential patterns and sets them as class models . In the second
step, a similarity algorithm is used to gather each model with similar sequences of the
database to form the classes. Then in the third step, a characterization algorithm is
used to extract attribute values characterizing each class. Compared to existing works,
our approach: (i) allows extracting rules of the form (a 1 , a 2 , a i ,…, a n ) Î <s 1 , s 2 , s i ,…,
s n > [ R ] where attribute values (a 1 , a 2 , a i …, a n ) are characteristics of objects whom
sequences are similar to <s 1 , s 2 , s i ,…, s n > with a significance R ; ii) considers similar
sequences rather than their exact values, producing more relevant knowledge for a
better decision-making. In perspective, this approach will be extended to multidimen-
sional spatial and temporal sequences in order to characterize the trajectories of mov-
ing objects. At this end, similarity search of trajectories may use the approach of [23]
which is also based on LCSS.
References
1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. of the
20th Int. Conf. Very Large Data Bases (VLDB), Santiago, Chile (September 1994)
2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proc. of the 11th Int'l Conference
on Data Engineering, Taipei, Taiwan (March 1995)
3. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic Local Alignment
Search Tool. J. Mol. Biol. 215, 403-410 (1990)
4. Beyer, K., Ramakrishnan, R.: Bottom-up computation of sparse and iceberg cubes. In:
Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD 1999), Philadel-
phia, PA, pp. 359-370 (June 1999)
5. Cheng, H., Yan, X., Han, J.: IncSpan: Incremental mining of sequential patterns in large
database. In: Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases
(KDD 2004), Seattle, WA, pp. 527-532 (August 2004)
6. Freschi, V., Bogliolo, A.: Longest Common Subsequence between Run-Length-Encoded
Strings: a New Algorithm with Improved Parallelism. Elsevier Information Processing
Letters 90(4), 167-173 (2004)
Search WWH ::




Custom Search