Ramex: A Sequence Mining Algorithm Using Poly-trees - New Contributions in Information Systems and Technologies - page 140


Table 2. Parameters for IBM Data Generator

Symbol   Meaning
T        average number of items per stream
N        number of different items, in thousands

We generated the test datasets by varying the parameters of the IBM data generator, as shown in Table 3. The T?N1 datasets show how the algorithms perform as the average number of items per stream varies, with N fixed at 1; the T10N? datasets show the effect of varying the number of different items, in thousands, with T fixed at 10.

Table 3. Varying parameters of the datasets

T?N1     Average number of items per stream      CPU time (seconds)
t2n1     2                                       13.3
t4n1     4                                       13.1
t6n1     6                                       13.2
t8n1     8                                       13.1
t10n1    10                                      13.3

T10N?    Number of different items (thousands)   CPU time (seconds)
t10n1    1                                       13.3
t10n2    2                                       15.9
t10n4    4                                       16.3
t10n6    6                                       16.7
t10n8    8                                       16.6

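Our algorithm ran in roughly 13 to 17 seconds in all the experiments, which indicates good scalability: the running time is essentially unchanged as the average number of items per stream grows from 2 to 10, and it rises only from 13.3 to 16.6 seconds as the number of different items grows from 1 to 8 thousand. The robustness of the Back-and-Forward Heuristic stems from its use of condensed data and from polynomial algorithms that are able to find poly-trees in networks.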
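As a quick check on the timings, the figures in Table 3 can be summarised with a few lines of Python. This is only an illustrative sketch: the dictionary and function names are assumptions made for this example, and the values are transcribed from Table 3 with decimal commas written as decimal points.

```python
from statistics import mean

# CPU times in seconds, transcribed from Table 3.
# Keys are the varied parameter (T or N); names are illustrative only.
cpu_time_by_T = {2: 13.3, 4: 13.1, 6: 13.2, 8: 13.1, 10: 13.3}  # T?N1 series
cpu_time_by_N = {1: 13.3, 2: 15.9, 4: 16.3, 6: 16.7, 8: 16.6}   # T10N? series


def summarize(series):
    """Return min, max, mean and spread (max - min) of a timing series."""
    times = list(series.values())
    return {
        "min": min(times),
        "max": max(times),
        "mean": round(mean(times), 2),
        "spread": round(max(times) - min(times), 2),
    }


summary_T = summarize(cpu_time_by_T)
summary_N = summarize(cpu_time_by_N)

print("T?N1 :", summary_T)
print("T10N?:", summary_N)
```

The spread of the T?N1 series is a fraction of a second, while the T10N? series grows by a few seconds as N increases eightfold, which is consistent with the scalability claim above.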

4.2 Data Visualization

The output of Ramex is written in the DOT language so that it can be visualized by [12] and [21]. Figure 3 shows a poly-tree with 7472 nodes and 7471 edges obtained from the t10n8 dataset. Note the subset of inner nodes highlighted in the graph by the characters 'a', 'b', 'c' and 'd'. The fork “abcd” is the origin of the radial poly-tree. Given this holistic view of the dataset, the Tulip software allows the user to zoom in on the radial poly-tree and discover micro-sequences.
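To make the DOT output concrete, the following minimal Python sketch writes a small directed graph as DOT text and checks the poly-tree property (the underlying undirected graph is a tree, so it has exactly one edge fewer than it has nodes, as with the 7472 nodes and 7471 edges above). The function names `is_polytree` and `to_dot` and the toy edge list are hypothetical; this is not the actual Ramex output routine, whose exact format is not given here.

```python
def is_polytree(edges):
    """True if the directed edges form a poly-tree, i.e. the graph is a
    tree once edge directions are ignored (connected, no cycles)."""
    nodes = {n for edge in edges for n in edge}
    if len(edges) != len(nodes) - 1:
        return False

    # Union-find: an edge joining two nodes already connected closes a cycle.
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True


def to_dot(edges, name="polytree"):
    """Serialise directed edges as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for u, v in edges:
        lines.append(f'    "{u}" -> "{v}";')
    lines.append("}")
    return "\n".join(lines)


# Toy example (illustrative items, not taken from the t10n8 dataset).
example_edges = [("a", "b"), ("b", "c"), ("b", "d")]
assert is_polytree(example_edges)
dot_text = to_dot(example_edges)
print(dot_text)
```

The resulting text can be saved to a `.dot` file and opened in any viewer that reads the DOT language.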
