Compact rules
rule                        sup%    conf%
({A}, A) → c_1              66.66   66.66
({A}, A) → c_2              33.33   33.33
({B}, B) → c_1              33.33   50.00
({B}, B) → c_2              33.33   50.00
({E}, AE) → c_1             33.33   100.00
({AB, E}, ABE) → c_1        33.33   100.00
({C}, ACA) → c_1            33.33   50.00
({C}, ACA) → c_2            33.33   50.00
({DA}, ADA) → c_1           33.33   100.00
({CB, BA}, ACBA) → c_2      33.33   100.00
({DB, BA}, ADBA) → c_2      33.33   100.00
({D, C}, ADCA) → c_1        33.33   50.00
({D, C}, ADCA) → c_2        33.33   50.00
({CB}, ADCBA) → c_2         33.33   100.00
(a) CCRS set

General rules
rule                        sup%    conf%
A → c_1                     66.66   66.66
A → c_2                     33.33   33.33
B → c_1                     33.33   50.00
B → c_2                     33.33   50.00
C → c_1                     33.33   50.00
C → c_2                     33.33   50.00
D → c_1                     33.33   50.00
D → c_2                     33.33   50.00
E → c_1                     33.33   100.00
AB → c_1                    33.33   100.00
BA → c_2                    33.33   100.00
CB → c_2                    33.33   100.00
DA → c_1                    33.33   100.00
DB → c_2                    33.33   100.00
(b) CRC set

Fig. 4. Compact representations
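The sup% and conf% columns in Fig. 4 appear to follow the usual definitions: the support of a rule is the fraction of all sequences that contain the antecedent and carry the class label, and the confidence is that count divided by the number of sequences containing the antecedent. The sketch below illustrates this for single-item antecedents only. The function name `rule_stats` and the three-sequence dataset `toy` are hypothetical and are not taken from the paper.

```python
def rule_stats(dataset, item, label):
    """Return (support %, confidence %) for the rule item -> label.

    dataset is a list of (sequence, class_label) pairs; item is a single
    symbol, so plain membership is enough to test containment.
    """
    # Class labels of all sequences that contain the antecedent item.
    covered = [cls for seq, cls in dataset if item in seq]
    # Sequences that contain the item and also carry the requested label.
    matching = sum(1 for cls in covered if cls == label)
    support = 100.0 * matching / len(dataset)
    confidence = 100.0 * matching / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy dataset, used only for illustration.
toy = [("ADCA", "c1"), ("ABE", "c1"), ("ADCBA", "c2")]
sup, conf = rule_stats(toy, "B", "c1")
print(f"{sup:.2f} {conf:.2f}")  # prints "33.33 50.00"
```

Here B occurs in two of the three sequences and only one of them is labeled c1, which gives a support of one third and a confidence of one half.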
Sequences in set M_2 which are not generators inherit generators from
their subsequences with the same support. For example, sequence BE contains
sequence E, and BE and E have equal support. Hence, we add to G(BE) all
sequences in set G(E) (i.e., E).
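The inheritance step can be sketched as follows. This is a minimal illustration, assuming that supports are stored as counts in a dictionary and that containment is the plain order-preserving subsequence relation, ignoring any gap constraint the full algorithm may impose. The names `is_subsequence` and `inherit_generators` and the support counts are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of short appear in long in the same order."""
    it = iter(long)
    # Each membership test consumes the iterator up to the matched item.
    return all(item in it for item in short)


def inherit_generators(seq, support, generators):
    """Collect the generators of every proper subsequence of seq
    that has the same support as seq."""
    inherited = set()
    for sub, gens in generators.items():
        if sub != seq and is_subsequence(sub, seq) and support[sub] == support[seq]:
            inherited |= gens
    return inherited


# Hypothetical support counts: BE and E are equally frequent, B is not.
support = {"B": 2, "E": 1, "BE": 1}
generators = {"B": {"B"}, "E": {"E"}}
print(inherit_generators("BE", support, generators))  # prints {'E'}
```

B is also a subsequence of BE, but its support differs, so only the generators of E are inherited.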
By iteratively applying the algorithm, we generate set M_3, which includes
all sequences with length 3, by joining M_2 with itself. For instance, we
generate sequence DCA from sequences DC and CA. DCA has the same support
as both CA and DC. Hence, DCA is not a generator sequence. Instead, it
inherits generators from both CA and DC. Hence G(DCA) = {D, C}.
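The join of a level with itself can be sketched as below. This assumes an Apriori-style join in which two sequences of length k are merged when the last k-1 items of the first equal the first k-1 items of the second, which matches the DC and CA example. The function name `join_level` is hypothetical, and support counting and pruning of the resulting candidates are omitted.

```python
def join_level(level):
    """Join a set of length-k sequences with itself to obtain
    candidate sequences of length k+1."""
    candidates = set()
    for left in level:
        for right in level:
            # Overlap test: suffix of left must equal prefix of right.
            if left[1:] == right[:-1]:
                candidates.add(left + right[-1])
    return candidates


print(join_level({"DC", "CA"}))  # prints {'DCA'}
```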
Set M_3 does not contribute to the CRC set, since none of its elements
is a generator sequence. For set M_2, only sequence AE is a closed sequence.
Hence, it generates the compact rule ({E}, AE) → c_1.
Figure 4 reports the CRC and CCRS sets for our example dataset.
7 Experimental Results
Experiments have been run to evaluate both the compression achievable
by means of the proposed compact representations and the performance of
the proposed algorithm. To run experiments we considered three datasets.
The Reuters-21578 news and NewsGroups datasets [2] include textual data. The DNA
dataset includes collections of DNA sequences [2]. Table 2 reports the number
of items, sequences, and class labels for each dataset. For the Reuters and
NewsGroups datasets, items correspond to words in a text. For the DNA dataset,
items correspond to the four nucleotide symbols. Table 2 also shows the maximum,
minimum, and average length of sequences in the datasets.