Table 2. Parameters for IBM Data Generator

Symbol   Meaning
T        average number of items per stream
N        number of different items (in thousands)

We generated the test datasets by varying the parameters of the IBM data
generator, as shown in Table 3. The T?N1 datasets show how the algorithms perform
as the average number of items per stream varies, and the T10N? datasets are used
to study the effect of varying the number of different items (in thousands).

Table 3. Varying parameters of the datasets

T?N1     Average number of items per stream          CPU time (seconds)
t2n1     2                                           13.3
t4n1     4                                           13.1
t6n1     6                                           13.2
t8n1     8                                           13.1
t10n1    10                                          13.3

T10N?    Number of different items (in thousands)    CPU time (seconds)
t10n1    1                                           13.3
t10n2    2                                           15.9
t10n4    4                                           16.3
t10n6    6                                           16.7
t10n8    8                                           16.6

Our algorithm performed very well in all the experiments, with running times
between 13.1 and 16.7 seconds, showing excellent scalability. CPU time stayed
essentially constant as T grew from 2 to 10, and rose only from 13.3 to about
16.7 seconds as N grew from 1 to 8 thousand. The robustness of the
Back-and-Forward Heuristic is based on the use of condensed data and takes
advantage of polynomial-time algorithms that are able to find poly-trees in
networks.
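
The running-time claim can be checked directly against Table 3. The following is a minimal sketch that only summarizes the published CPU times (with decimal commas read as decimal points); the dictionary and function names are illustrative and are not part of Ramex.

```python
from statistics import mean

# CPU times in seconds, transcribed from Table 3.
t_sweep = {"t2n1": 13.3, "t4n1": 13.1, "t6n1": 13.2, "t8n1": 13.1, "t10n1": 13.3}
n_sweep = {"t10n1": 13.3, "t10n2": 15.9, "t10n4": 16.3, "t10n6": 16.7, "t10n8": 16.6}


def summarize(times):
    """Return the minimum, maximum and mean (rounded to 2 decimals) of the times."""
    values = list(times.values())
    return {"min": min(values), "max": max(values), "mean": round(mean(values), 2)}


# t10n1 appears in both sweeps; merging the dictionaries counts it once.
all_runs = {**t_sweep, **n_sweep}

for label, runs in [("T?N1", t_sweep), ("T10N?", n_sweep), ("all", all_runs)]:
    print(label, summarize(runs))
```

The T?N1 sweep averages about 13.2 seconds and the T10N? sweep about 15.8 seconds, so the overall mean of roughly 14.6 seconds is consistent with the "around 15 seconds" figure.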

4.2 Data Visualization

The output of Ramex is written in the DOT language so that it can be visualized
with the tools in [12] and [21]. Figure 3 shows a poly-tree with 7472 nodes and
7471 edges obtained from the t10n8 dataset. Note that there is a subset of inner
nodes in the graph highlighted by the characters 'a', 'b', 'c' and 'd'. The fork
“abcd” is the origin of the radial poly-tree. Starting from this holistic view of
the dataset, the Tulip software allows the user to zoom in on the radial
poly-tree and discover micro-sequences.
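
To illustrate what a DOT description of a poly-tree looks like, here is a minimal sketch. It is not Ramex's actual output format: the edge list, the node labels 'e' and 'f', the graph name and the helper functions are hypothetical. It also checks the property visible in the Figure 3 counts, namely that a tree-shaped graph has exactly one edge fewer than it has nodes (7471 edges for 7472 nodes).

```python
def to_dot(edges, highlighted=()):
    """Render directed edges as DOT text; highlighted nodes get a filled style."""
    lines = ["digraph polytree {"]
    for node in highlighted:
        lines.append(f'    "{node}" [style=filled, fillcolor=lightgrey];')
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def is_polytree(edges):
    """True if the underlying undirected graph is a tree (connected, |E| = |V| - 1)."""
    nodes = {n for edge in edges for n in edge}
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    # Depth-first search over the undirected adjacency to test connectivity.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


# Hypothetical toy example: a chain a -> b -> c -> d that forks into e and f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]
print(is_polytree(edges))
print(to_dot(edges, highlighted="abcd"))
```

The printed text can be saved to a file and opened with any viewer that reads DOT.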