Fig. 6. Compression factor for Reuters dataset: (a) CRC set, (b) CCRS set
Fig. 7. Rule length for CRC and CCRS sets for Reuters dataset (maxgap = 2): (a) CRC set, (b) CCRS set
For low support thresholds and high maxgap values, the CRC representation always achieves a higher compression. In particular, when minsup = 0.1% and 3 ≤ maxgap ≤ 5, the compression factor is more than 10% higher than in the CCRS representation (about 20% when maxgap = 5). The two representations provide comparable compression for higher minsup and lower maxgap values. To analyze this behavior, Fig. 7 plots the number of general and compact rules for different rule lengths, for maxgap = 2 and different minsup values. As discussed above, when decreasing minsup, the number of compact rules increases more significantly. Figure 7 shows that this is due to an increase in the number of longer compact rules.
As shown in Fig. 7a, b, for a given minsup value compression increases for increasing maxgap values. Figure 8 focuses on this issue and plots the compression factor for both compact forms for a large set of maxgap values and for thresholds minsup = 0.5% and minsup = 1%. For both forms the compression factor increases until maxgap = 5 and then decreases again. The compression factors are very close until maxgap = 5, after which the difference between the two representations becomes more significant. This difference is more pronounced when minsup = 0.5%. The CRC form always achieves higher compression. An analogous behavior has been obtained for other minsup values.
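The comparison above can be illustrated with a minimal sketch. It assumes that the compression factor is the percentage of rules removed by a compact form relative to the complete rule set, and that differences between the two forms are read in percentage points; neither assumption is confirmed by this section. The function name and all rule counts below are hypothetical and are not taken from the Reuters experiments.

```python
def compression_factor(n_rules: int, n_compact: int) -> float:
    """Percentage of rules removed by a compact form.

    Assumed definition (not stated in this section):
    100 * (1 - |compact set| / |complete rule set|).
    """
    if n_rules <= 0:
        raise ValueError("n_rules must be positive")
    if not 0 <= n_compact <= n_rules:
        raise ValueError("n_compact must lie between 0 and n_rules")
    return 100.0 * (1.0 - n_compact / n_rules)


# Hypothetical counts for illustration only:
# maxgap -> (complete rule set size, CRC size, CCRS size)
counts = {
    1: (1000, 800, 820),
    3: (4000, 2000, 2400),
    5: (10000, 3000, 5000),
}

# Difference (in percentage points) between the CRC and CCRS compression factors.
gaps = {
    maxgap: compression_factor(n, crc) - compression_factor(n, ccrs)
    for maxgap, (n, crc, ccrs) in counts.items()
}

for maxgap, gap in sorted(gaps.items()):
    print(f"maxgap={maxgap}: CRC exceeds CCRS by {gap:.1f} points")
```

With these made-up counts the gap between the two forms widens as maxgap grows, which mirrors the qualitative trend described in the text without reproducing any measured value.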