Ramex: A Sequence Mining Algorithm Using Poly-trees - New Contributions in Information Systems and Technologies

ii) Number of Rules: Association rule systems that support itemset and sequence mining usually generate a huge number of rules, so it is difficult for the user to decide which rules to use.

iii) Scalability: Since most existing algorithms use a lattice structure in the search space and need to scan the database more than once, they do not scale to very large databases.

In order to create holistic environments in data mining, micro-patterns and macro-patterns must be differentiated [7]. Micro-patterns correspond to small percentages of the data; for instance, in association rules it is usual to require support values ≥ 5% and to choose the rules with high confidence. Macro-patterns, on the other hand, involve a large percentage of the data; for example, a regression model uses all data elements. Micro-patterns are characterized by high confidence, while macro-patterns are characterized by high support. There are other examples of micro-patterns: in sequence mining, a sequence with support ≥ 1% is considered frequent; in classification with decision trees, each branch of the tree corresponds to a small percentage of the data; in classification with k-nearest neighbors, comparisons are made using a small number k of elements. Finally, macro-pattern techniques such as regression, hypothesis testing, clustering or attribute reduction take all data into account.
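To make the two measures concrete, the following minimal sketch computes support and confidence for an association rule on a toy dataset. It uses the usual textbook definitions (support as the fraction of transactions containing an itemset, confidence as the conditional frequency of the consequent given the antecedent). The function names and the transactions are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def count(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Among transactions with the antecedent, fraction that also hold the consequent."""
    ant = frozenset(antecedent)
    n_ant = count(transactions, ant)
    if n_ant == 0:
        return 0.0  # rule never applies; report zero rather than divide by zero
    return count(transactions, ant | frozenset(consequent)) / n_ant


# Hypothetical toy data: 20 transactions, 2 of which contain bread and butter.
transactions = [frozenset({"bread", "butter"})] * 2 + [frozenset({"milk"})] * 18

rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data the rule bread → butter covers only 10% of the transactions yet holds every time it applies, which is the low-support, high-confidence profile the text associates with micro-patterns.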

The proposed algorithm was given the Latin name Ramex, meaning “branch of a tree”. Ramex introduces a new perspective on classic sequence mining problems: it considers the accumulation of events and allows the search for macro-patterns, instead of searching only for micro-patterns.

Ramex provides a comprehensive view of the sequences: it lets the user visualize the data sequences with a special kind of tree, a poly-tree, which shows all the items but only the most relevant sequences, giving an x-ray of the dataset.
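As a rough illustration of the structure involved, the sketch below tests whether a directed graph is a poly-tree, assuming the standard graph-theoretic definition: a directed graph whose underlying undirected graph is connected and has no cycles. The paper establishes its own definition later, which may differ in detail. The function name `is_polytree` and the example graph are hypothetical, and the code assumes every edge endpoint appears in `nodes`.

```python
from typing import Hashable, Iterable, Tuple


def is_polytree(nodes: Iterable[Hashable],
                edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if the directed graph is a tree once edge directions are ignored."""
    parent = {n: n for n in nodes}  # union-find forest over the nodes

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the component representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merged = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # Endpoints already connected: self-loop, duplicate or
            # antiparallel edge, or a longer undirected cycle.
            return False
        parent[ru] = rv
        merged += 1

    # A connected acyclic graph on n nodes has exactly n - 1 edges.
    return merged == len(parent) - 1


# Hypothetical example: node "c" has two parents, so this is not a rooted
# tree, but it is still a poly-tree.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(is_polytree(nodes, edges))
print(is_polytree(nodes, edges + [("a", "d")]))
```

The second call adds an edge from "a" to "d", which closes the undirected cycle a, c, d and so breaks the poly-tree property even though the graph still has no directed cycle.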

Ramex has been applied in different scenarios: web mining [6] and financial studies [16], [20]. This paper also includes part of the work of [8].

In Section 2, related work on sequence and process mining is presented. In Section 3, the proposed algorithm is described and a numeric example is presented. In Section 4, computational results obtained with the IBM Quest synthetic dataset generator are reported. Finally, in Section 5, we draw some conclusions.

2 Related Work

In this section, concepts are presented and definitions are established for reuse in the following sections. Sequence mining is reviewed, process mining is introduced, and the definition of a poly-tree is established.

2.1 Sequence Mining

The problem of pattern discovery is to extract interesting patterns from the data, although it is difficult to define what makes a pattern interesting. Temporal data mining can be divided into four different approaches: Periodic Patterns [22], Sequential Discovery [4], Frequent Episodes [15] and Markov Chain Models [5].
