Information Technology Reference
In-Depth Information
chess.dat
1800
IB M
IB M2
PrefixSpan
1600
1400
1200
1000
800
600
400
200
0
Support
Fig. 8.12.
Performances with
chess.dat
file
2000
IB M 2
IB M
PrefixSpan
SPAM
1500
1000
500
0
100000
300000
600000
1000000
Dataset Size
Fig. 8.13.
Memory consumption
Fig. 8.13 shows the total memory consumption (in MB) used by
IBM
and
IBM2
,
SPAM
, and
PrefixSPAN
. For instance, with a database composed of 600,000 rows,
SV
contains about 265 values for 90,000 distinct rows. The size of the Bitmap in
IBM2
is
then equal to: 265*90,000 = 23.85 MB. As
IBM
is 8 times more compact, the size of
the binary Bitmap is less than 3 MB. With 1,000,000 rows (Fig. 8.9),
SV
contains 370
elements for 160,000 distinct rows. Then, the size of the Bitmap reaches 59.2 MB in
IBM2
, whereas the size of the Bitmap fits in 7.5 MB in
IBM
algorithm. These results
show that
IBM
is more appropriate than
IBM2
for very large databases, due to data
compression. However,
IBM2
runs faster than
IBM
. This is due to the costs of shifting
operations necessary to access target values, whereas
IBM2
directly accesses the
target sequences. As we can see in Fig. 8.13, the difference between memory costs in
IBM
and in
IBM2
is insignificant compared to memory costs in
SPAM
and in
Prefix-
SPAN
. For example with 1,000,000 rows, the total memory size for
IBM
is equal to 28
MB whereas for
PrefixSPAN
, it is about 468 MB.