Ramex: A Sequence Mining Algorithm Using Poly-trees
Luís Cavique
Departamento de Ciências e Tecnologia, Universidade Aberta, Portugal
lcavique@uab.pt
Abstract. Sequence mining combines the discovery of frequent itemsets with
the order in which they appear. Most sequence pattern discovery techniques
present some handicaps, such as the generation of a huge number of rules and the
lack of scalability. In this work, the proposed algorithm analyzes
the whole rather than the parts, thus providing a holistic view of the sequences.
The algorithm analyzes event logs and allows a non-expert user to understand
the sequences using a poly-tree visualization. The scalability provided by
condensed data structures, which shrink the data without losing information,
makes it possible to address the Big Data challenge. Ramex was implemented in
different scenarios.
Keywords: pervasive business intelligence, sequence mining, poly-trees.
1 Introduction
The emergence of the Internet has changed the way people interact with computers.
People can access information from personal computers or mobile devices (smart
phones or tablets) anytime and anywhere. The mobile, or ubiquitous, dimension
associated with Web 2.0 and the Internet of Things generates a large volume of data
at an increasing update velocity. The exponential growth of data, when
compared to the linear growth of processing capabilities, leads to a decline in the
capacity to extract useful knowledge from the stored data.
In order to understand the activities of complex computer systems and to
diagnose problems when they arise, keeping a record of event logs is an essential task.
Accordingly, the storage and analysis of event logs are a pertinent challenge.
Pervasive Information Systems aim to study how information environments affect
human interactions. Pervasive spaces go beyond Human-Computer Interaction
(HCI) in order to create socio-technical systems that benefit stakeholders and users. In
particular, Pervasive Business Intelligence looks for holistic views which combine
information from different latencies [17].
Most sequence pattern discovery techniques present three common
handicaps: the need for parameters, a huge number of rules that prevents a
global view, and scalability problems:
i) Parameters: The user must specify a minimum support threshold to find the
desired patterns. A threshold set too high or too low prunes either too many or
too few items and yields a useless output. The process must then be repeated
interactively, which becomes very time-consuming for large databases.
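The sensitivity to the minimum support threshold described in item i) can be illustrated with a minimal sketch. This is not the Ramex algorithm: it only counts consecutive item pairs in a small, hypothetical event log, and the function name frequent_pairs and the sample data are assumptions made for illustration.

```python
from collections import Counter


def frequent_pairs(sequences, min_support):
    """Return consecutive item pairs found in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # A pair is counted at most once per sequence (sequence-level support).
        counts.update(set(zip(seq, seq[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical event log: each inner list is one ordered session.
logs = [
    ["home", "search", "item", "cart"],
    ["home", "search", "item"],
    ["home", "item", "cart"],
    ["search", "item", "cart"],
]

# The same data gives very different outputs depending on the threshold.
for threshold in (1, 3, 4):
    print(threshold, sorted(frequent_pairs(logs, threshold)))
```

With a threshold of 1 every observed pair is kept, while a threshold of 4 keeps none, so the user has to rerun the search with several values before the output becomes useful.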