(a) CCRS set: compact rules

rule                        sup%    conf%
({A}, A) → c1               66.66   66.66
({A}, A) → c2               33.33   33.33
({B}, B) → c1               33.33   50.00
({B}, B) → c2               33.33   50.00
({E}, AE) → c1              33.33  100.00
({AB, E}, ABE) → c1         33.33  100.00
({C}, ACA) → c1             33.33   50.00
({C}, ACA) → c2             33.33   50.00
({DA}, ADA) → c1            33.33  100.00
({CB, BA}, ACBA) → c2       33.33  100.00
({DB, BA}, ADBA) → c2       33.33  100.00
({D, C}, ADCA) → c1         33.33   50.00
({D, C}, ADCA) → c2         33.33   50.00
({CB}, ADCBA) → c2          33.33  100.00

(b) CRC set: general rules

rule                        sup%    conf%
A → c1                      66.66   66.66
A → c2                      33.33   33.33
B → c1                      33.33   50.00
B → c2                      33.33   50.00
C → c1                      33.33   50.00
C → c2                      33.33   50.00
D → c1                      33.33   50.00
D → c2                      33.33   50.00
E → c1                      33.33  100.00
AB → c1                     33.33  100.00
BA → c2                     33.33  100.00
CB → c2                     33.33  100.00
DA → c1                     33.33  100.00
DB → c2                     33.33  100.00

Fig. 4. Compact representations
Sequences in set M2 which are not generators inherit generators from
their subsequences with the same support. For example, sequence BE contains
sequence E, and BE and E have equal support. Hence, we add to G(BE) all
sequences in set G(E) (i.e., E).
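
The inheritance step can be illustrated with a minimal Python sketch. It is
not the authors' implementation: the toy dataset, the function names, and the
choice of plain order-preserving containment (gaps allowed) are assumptions
made only for illustration, as is the convention that every single-item
sequence is its own generator.

```python
def is_subsequence(sub, seq):
    """True if the items of `sub` occur in `seq` in the same order.

    Gaps are allowed here; this containment notion is an assumption.
    """
    it = iter(seq)
    return all(item in it for item in sub)


def support_count(seq, dataset):
    """Number of dataset sequences that contain `seq`."""
    return sum(1 for s in dataset if is_subsequence(seq, s))


def generators_of(seq, known, dataset):
    """Return the generator set G(seq).

    `known` maps each one-item-shorter subsequence of `seq` to its own
    generator set. If some shorter subsequence has the same support,
    `seq` is not a generator and inherits from it; otherwise G(seq) = {seq}.
    """
    if len(seq) == 1:
        return {seq}  # assumed base case: single items are generators
    count = support_count(seq, dataset)
    inherited = set()
    for i in range(len(seq)):
        sub = seq[:i] + seq[i + 1:]  # drop the i-th item
        if support_count(sub, dataset) == count:
            inherited |= known[sub]
    return inherited or {seq}


# Hypothetical toy data (not the paper's example dataset).
toy = ["ABE", "ACB", "BCA"]
known = {"B": {"B"}, "E": {"E"}}
g_be = generators_of("BE", known, toy)
```

On this toy data BE and E are both contained in one sequence only, so BE is
not a generator and `g_be` holds the single generator E, mirroring the BE
example in the text.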
By iteratively applying the algorithm, we generate set M3, which includes
all sequences with length=3, by joining M2 with itself. For instance, we
generate sequence DCA from sequences DC and CA. DCA has the same support
as both CA and DC. Hence, DCA is not a generator sequence. Instead, it
inherits generators from both CA and DC. Hence G(DCA) = {D, C}.
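
A possible reading of the join step is sketched below. The text only states
that M3 is obtained by joining M2 with itself and that DCA comes from DC and
CA; the overlap condition used here (the first sequence without its first
item equals the second without its last item) is an assumption borrowed from
Apriori-style candidate generation, and `join_level` is a hypothetical name.

```python
def join_level(level):
    """Join length-k sequences with themselves into length-(k+1) candidates.

    Two sequences are joined when the suffix of the first (all items but
    the first) equals the prefix of the second (all items but the last).
    """
    candidates = set()
    for left in level:
        for right in level:
            if left[1:] == right[:-1]:
                candidates.add(left + right[-1])
    return candidates


# Hypothetical fragment of a length-2 level.
m2_fragment = {"DC", "CA", "AE"}
m3_candidates = join_level(m2_fragment)
```

Here DC and CA overlap on C and yield DCA, while CA and AE yield CAE. The
candidates would still need a support check against the dataset before they
are kept.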
Set M3 does not contribute to the CRC set, since none of its elements
is a generator sequence. For set M2, only sequence AE is a closed sequence.
Hence, it generates the compact rule ({E}, AE) → c1.
Figure 4 reports the CRC and CCRS sets for our example dataset.
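
The sup% and conf% columns of Figure 4 can be read with the usual rule
measures, sketched below under stated assumptions: support is taken as the
percentage of dataset sequences that contain the rule antecedent and carry
the rule class, and confidence as that count divided by the number of
sequences containing the antecedent. The labeled toy dataset and all names
are hypothetical and are not the paper's example data.

```python
def is_subsequence(sub, seq):
    """True if the items of `sub` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in sub)


def rule_metrics(antecedent, label, dataset):
    """Return (support %, confidence %) of the rule antecedent -> label.

    `dataset` is a list of (sequence, class_label) pairs.
    """
    covered = [lbl for seq, lbl in dataset if is_subsequence(antecedent, seq)]
    hits = sum(1 for lbl in covered if lbl == label)
    sup = 100.0 * hits / len(dataset)
    conf = 100.0 * hits / len(covered) if covered else 0.0
    return sup, conf


# Hypothetical labeled sequences.
labeled = [("ABE", "c1"), ("ACA", "c1"), ("ACBA", "c2")]
sup_a, conf_a = rule_metrics("A", "c1", labeled)
sup_e, conf_e = rule_metrics("E", "c1", labeled)
```

On this toy data the rule A → c1 is matched by two of three sequences, all
of which contain A, so support and confidence are both two thirds; E → c1 has
support one third and confidence 100%.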
7 Experimental Results
Experiments have been run to evaluate both the compression achievable
by means of the proposed compact representations and the performance of
the proposed algorithm. To run experiments we considered three datasets.
The Reuters-21578 news and NewsGroups datasets [2] include textual data. The
DNA dataset includes collections of DNA sequences [2]. Table 2 reports the
number of items, sequences, and class labels for each dataset. For the Reuters
and NewsGroups datasets, items correspond to words in a text. For the DNA dataset, items
correspond to the four nucleotide symbols. Table 2 also shows the maximum,
minimum and average length of sequences in the datasets.