rules [14]. For association rules, concise representations have been proposed
based on closed and generator itemsets [22, 23, 33]. In the context of associa-
tive classification, compact representations for associative classification rules
have been proposed based on generator itemsets [7] and free-sets [15].
In the sequence domain, less effort has so far been devoted to mining concise representations. Recently, the concept of closed itemset has been extended to represent frequent sequences [29, 31], and an algorithm to mine top-k closed sequential patterns has been presented [28]. As far as we know, no concise representations have been proposed for sequential classification rules.
Our work addresses the definition of concise representations for a sequential classification rule set. We define a general framework for sequential classification rule mining. Within this framework, the notions of containment between two arbitrary sequences, and between a sequence and an input sequence, generalize previous definitions of constrained containment [5, 17, 25]. In
this general context, we define the concept of sequence generator, which, to
our knowledge, has never been proposed before in the sequence domain. Fur-
thermore, we introduce the concepts of constrained closed sequence and con-
strained generator sequence. We exploit both concepts to define two compact
representations of a classification rule set.
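To make the distinction between the two concepts concrete, the following Python sketch computes which frequent sequences are closed and which are generators. It is an illustrative toy, not the chapter's algorithm: sequences are modeled as plain tuples of items, containment as ordinary (not necessarily contiguous) subsequence matching rather than the framework's constrained containment, and supports are assumed given.

```python
def is_subsequence(s, t):
    """True if s is a (not necessarily contiguous) subsequence of t."""
    it = iter(t)
    return all(x in it for x in s)  # membership on the iterator preserves order

def closed_and_generators(freq):
    """freq: dict mapping sequence (tuple of items) -> support count.
    A sequence is closed if no proper super-sequence has the same support;
    it is a generator if no proper sub-sequence has the same support."""
    closed, generators = set(), set()
    for s, sup in freq.items():
        if not any(s != t and is_subsequence(s, t) and freq[t] == sup
                   for t in freq):
            closed.add(s)
        if not any(s != t and is_subsequence(t, s) and freq[t] == sup
                   for t in freq):
            generators.add(s)
    return closed, generators
```

For example, with supports {('a',): 3, ('b',): 2, ('a','b'): 2}, the sequence ('b',) is a generator but not closed, because its super-sequence ('a','b') has the same support.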
9 Conclusions and Future Work
In this chapter we propose two compact representations to encode the knowl-
edge available in a sequential classification rule set. The classification rule
cover (CRC) is defined by means of the concept of generator sequence and
yields a simple rule set, which is equivalent to the complete rule set for classifi-
cation purposes. Compact rules, which are the building blocks of the compact
classification rule set (CCRS), are characterized by a more complex structure,
based on closed sequences and their associated generator sequences. Compact
rules allow us to regenerate the entire set of frequent sequential classification
rules from the compact form.
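The regeneration step can be sketched in the same toy setting. Since every sequence lying between a generator and its associated closed sequence belongs to the same equivalence class and shares the closed sequence's support, the antecedents of the full rule set can be enumerated from the compact form. The function below is an illustrative assumption about this enumeration, not the chapter's actual procedure, and again uses plain subsequence containment.

```python
from itertools import combinations

def is_subsequence(s, t):
    """True if s is a (not necessarily contiguous) subsequence of t."""
    it = iter(t)
    return all(x in it for x in s)

def expand(generator, closed):
    """Enumerate every subsequence s of `closed` that contains `generator`.
    All such sequences share the support of the closed sequence, so each
    yields a rule with the same support and class label."""
    n = len(closed)
    out = set()
    for r in range(len(generator), n + 1):
        for idx in combinations(range(n), r):
            s = tuple(closed[i] for i in idx)
            if is_subsequence(generator, s):
                out.add(s)
    return sorted(out)
```

For instance, expanding the generator ('b',) against the closed sequence ('a','b') yields both ('b',) and ('a','b') as rule antecedents.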
Experiments on textual and biological datasets show that the compression
ratio is significant for low support thresholds and correlated datasets. In this
case, traditional techniques would generate a huge number of classification
rules.
As future work, we plan to exploit our compact representations to design
an effective classifier. A promising direction is the integration of both sequen-
tial and associative classification rules, to exploit both the specific character-
ization provided by sequential rules and the general representation given by
associative classification rules.