Information Technology Reference
In-Depth Information
chess.dat
1800
IB M
IB M2
PrefixSpan
1600
1400
1200
1000
800
600
400
200
0
Support
Fig. 8.12. Performances with chess.dat file
2000
IB M 2
IB M
PrefixSpan
SPAM
1500
1000
500
0
100000
300000
600000
1000000
Dataset Size
Fig. 8.13. Memory consumption
Fig. 8.13 shows the total memory consumption (in MB) used by IBM and IBM2 ,
SPAM , and PrefixSPAN . For instance, with a database composed of 600,000 rows, SV
contains about 265 values for 90,000 distinct rows. The size of the Bitmap in IBM2 is
then equal to: 265*90,000 = 23.85 MB. As IBM is 8 times more compact, the size of
the binary Bitmap is less than 3 MB. With 1,000,000 rows (Fig. 8.9), SV contains 370
elements for 160,000 distinct rows. Then, the size of the Bitmap reaches 59.2 MB in
IBM2 , whereas the size of the Bitmap fits in 7.5 MB in IBM algorithm. These results
show that IBM is more appropriate than IBM2 for very large databases, due to data
compression. However, IBM2 runs faster than IBM . This is due to the costs of shifting
operations necessary to access target values, whereas IBM2 directly accesses the
target sequences. As we can see in Fig. 8.13, the difference between memory costs in
IBM and in IBM2 is insignificant compared to memory costs in SPAM and in Prefix-
SPAN . For example with 1,000,000 rows, the total memory size for IBM is equal to 28
MB whereas for PrefixSPAN , it is about 468 MB.
Search WWH ::




Custom Search