Compact Representations of Sequential Classification Rules - Data Mining: Foundations and Practice


Fig. 6. Compression factor for Reuters dataset: (a) CRC set; (b) CCRS set

Fig. 7. Rule length for CRC and CCRS sets for Reuters dataset (maxgap = 2): (a) CRC set; (b) CCRS set

For low support thresholds and high maxgap values, the CRC representation always achieves higher compression. In particular, when minsup = 0.1% and 3 ≤ maxgap ≤ 5, its compression factor is more than 10% higher than that of the CCRS representation (about 20% higher when maxgap = 5). The two representations provide comparable compression for higher minsup and lower maxgap values. To analyze this behavior, Fig. 7 plots the number of general and compact rules for different rule lengths, for maxgap = 2 and different minsup values. As discussed above, when minsup decreases, the number of compact rules increases more significantly. Figure 7 shows that this is due to an increase in the number of longer compact rules.
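To make the quantities in this discussion concrete, the following Python sketch computes a compression factor and a per-length rule count of the kind plotted in Fig. 7. It assumes that the compression factor is the percentage reduction in the number of rules, (1 - |compact| / |general|) x 100, and that rule length is the number of elements in the antecedent sequence. Both definitions, the `Rule` tuple layout and the function names are hypothetical and may differ from those used in the chapter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical rule representation: (antecedent sequence, class label).
Rule = Tuple[Tuple[str, ...], str]


def compression_factor(num_general: int, num_compact: int) -> float:
    """Percentage reduction in rule count (assumed definition, not necessarily the chapter's)."""
    if num_general <= 0:
        raise ValueError("num_general must be positive")
    return (1.0 - num_compact / num_general) * 100.0


def rules_by_length(rules: Iterable[Rule]) -> Dict[int, int]:
    """Count rules per antecedent length (assumed meaning of 'rule length')."""
    counts = Counter(len(antecedent) for antecedent, _label in rules)
    # Sort by length so the result reads like the x-axis of a per-length plot.
    return dict(sorted(counts.items()))
```

Applied separately to the general rule set and to a compact rule set for each minsup value, `rules_by_length` yields the per-length counts whose comparison Fig. 7 visualizes, and `compression_factor` summarizes the overall reduction in a single number.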

As shown in Fig. 7a, b, for a given minsup value the compression increases as maxgap increases. Figure 8 focuses on this issue and plots the compression factor of both compact forms over a wide range of maxgap values, for minsup = 0.5% and minsup = 1%. For both forms, the compression factor increases up to maxgap = 5 and then decreases. The two compression factors are very close up to maxgap = 5; beyond that value, the difference between the two representations becomes more significant, especially when minsup = 0.5%. The CRC form always achieves higher compression.

Analogous behavior was observed for other minsup values.
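The role of the maxgap constraint can be illustrated with a small containment check. This sketch assumes that maxgap bounds the distance between the positions of consecutive matched elements (j - i ≤ maxgap); the chapter may define the gap differently (for instance, as the number of skipped elements). The helpers `contains_with_maxgap` and `gap_support` are hypothetical and are not the authors' implementation.

```python
from typing import Sequence


def contains_with_maxgap(data: Sequence[str], pattern: Sequence[str], maxgap: int) -> bool:
    """Return True if `pattern` occurs in `data` as a subsequence whose consecutive
    matched positions i < j satisfy j - i <= maxgap (assumed gap semantics)."""
    if not pattern:
        return True
    # Positions where a match of the first pattern element ends.
    ends = [i for i, x in enumerate(data) if x == pattern[0]]
    for item in pattern[1:]:
        # Keep positions of `item` reachable from some previous end within maxgap.
        ends = [j for j, x in enumerate(data)
                if x == item and any(0 < j - i <= maxgap for i in ends)]
        if not ends:
            return False
    return bool(ends)


def gap_support(dataset: Sequence[Sequence[str]], pattern: Sequence[str], maxgap: int) -> float:
    """Fraction of sequences in `dataset` that contain `pattern` under the maxgap constraint."""
    if not dataset:
        return 0.0
    hits = sum(contains_with_maxgap(seq, pattern, maxgap) for seq in dataset)
    return hits / len(dataset)


if __name__ == "__main__":
    # Toy sequences (illustrative only, not taken from the Reuters experiments).
    toy = [["a", "x", "b"], ["a", "x", "x", "b"], ["a", "b"]]
    for g in (1, 2, 3):
        print(g, round(gap_support(toy, ["a", "b"], g), 3))
```

The check tracks every position where a partial match can end instead of greedily taking the earliest match, because under a gap constraint an early occurrence of one element may be too far from the next element while a later occurrence is close enough. Under this assumed semantics, raising maxgap can only add matches, so the support of a pattern never decreases as maxgap grows and more candidate rules can reach a given minsup.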

