Fig. 6. Compression factor for Reuters dataset: (a) CRC set, (b) CCRS set
Fig. 7. Rule length for CRC and CCRS sets for Reuters dataset (maxgap = 2): (a) CRC set, (b) CCRS set
For low support thresholds and high maxgap values, the CRC representation
always achieves higher compression. In particular, when minsup = 0.1% and
3 ≤ maxgap ≤ 5, the compression factor is more than 10% higher than in the
CCRS representation (about 20% higher when maxgap = 5). The two
representations provide comparable compression for higher minsup and lower
maxgap values. To analyze this behavior, Fig. 7 plots the number of general
and compact rules for different rule lengths, for maxgap = 2 and different
minsup values. As discussed above, when minsup decreases, the number of
compact rules increases more significantly. Figure 7 shows that this is due to
an increase in the number of longer compact rules.
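The comparison above rests on two quantities: the compression factor of a compact form and the number of rules at each rule length. This excerpt does not restate how the compression factor is defined, so the sketch below assumes it is the percentage reduction in rule count, 100 × (|general| − |compact|) / |general|. It also assumes that rule length is the total number of items in antecedent and consequent. The rule representation and the helper names (`compression_factor`, `rule_length`, `count_by_length`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical rule representation: (antecedent, consequent), each a tuple of items.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def compression_factor(n_general: int, n_compact: int) -> float:
    """Percentage reduction in rule count when the general rules are replaced
    by a compact form (assumed definition, not taken from the paper)."""
    if n_general <= 0:
        raise ValueError("n_general must be positive")
    return 100.0 * (n_general - n_compact) / n_general


def rule_length(rule: Rule) -> int:
    """Assumed length measure: total number of items in antecedent and consequent."""
    antecedent, consequent = rule
    return len(antecedent) + len(consequent)


def count_by_length(rules: Iterable[Rule]) -> Counter:
    """Number of rules for each rule length (the quantity plotted per length)."""
    return Counter(rule_length(r) for r in rules)


if __name__ == "__main__":
    # Toy values for illustration only; they are not the Reuters results.
    print(compression_factor(1000, 800))  # 20.0
    toy_rules = [(("a",), ("x",)), (("a", "b"), ("x",)), (("a", "c"), ("y",))]
    print(sorted(count_by_length(toy_rules).items()))  # [(2, 1), (3, 2)]
```

Running `count_by_length` separately on the general set and on each compact set gives per-length counts. These counts can be compared to see which lengths account for growth in the compact rules as minsup decreases.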
As shown in Fig. 6a, b, for a given minsup value, compression increases for
increasing maxgap values. Figure 8 focuses on this issue and plots the
compression factor of both compact forms over a wide range of maxgap values,
for minsup = 0.5% and minsup = 1%. For both forms, the compression factor
increases up to maxgap = 5 and then decreases. The compression factors of the
two representations are very close up to maxgap = 5; beyond that value, the
difference between them becomes more significant, especially when
minsup = 0.5%. The CRC form always achieves higher compression. Analogous
behavior was observed for other minsup values.
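Reading a maxgap sweep such as the one in Fig. 8 involves two questions: where the compression factor peaks, and how far apart the two compact forms are at each maxgap. The helpers below are a minimal sketch under stated assumptions. The function names are hypothetical, compression factors are assumed to be percentages keyed by maxgap, and the values in the usage block are invented for illustration rather than read from the figure.

```python
from typing import Dict, Tuple


def peak_maxgap(cf_by_maxgap: Dict[int, float]) -> Tuple[int, float]:
    """Return (maxgap, compression factor) where the factor is highest.
    On ties, the first maxgap in dict order wins."""
    if not cf_by_maxgap:
        raise ValueError("empty sweep")
    best = max(cf_by_maxgap, key=cf_by_maxgap.__getitem__)
    return best, cf_by_maxgap[best]


def pointwise_difference(cf_a: Dict[int, float], cf_b: Dict[int, float]) -> Dict[int, float]:
    """cf_a - cf_b (in percentage points) for each maxgap present in both sweeps."""
    return {g: cf_a[g] - cf_b[g] for g in sorted(cf_a.keys() & cf_b.keys())}


if __name__ == "__main__":
    # Hypothetical sweep values for illustration; not read from Fig. 8.
    crc = {1: 10.0, 3: 25.0, 5: 40.0, 7: 30.0}
    ccrs = {1: 9.0, 3: 24.0, 5: 35.0, 7: 20.0}
    print(peak_maxgap(crc))  # (5, 40.0)
    print(pointwise_difference(crc, ccrs))  # {1: 1.0, 3: 1.0, 5: 5.0, 7: 10.0}
```

Restricting the difference to maxgap values present in both sweeps avoids misaligned comparisons when the two representations were evaluated on different grids.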