rules [14]. For association rules, concise representations have been proposed
based on closed and generator itemsets [22, 23, 33]. In the context of associa-
tive classification, compact representations for associative classification rules
have been proposed based on generator itemsets [7] and free-sets [15].
In the sequence domain, less effort has so far been devoted to mining concise representations. Recently, the concept of closed itemset has been extended to represent frequent sequences [29, 31], and an algorithm to mine top-k closed sequential patterns has been presented [28]. As far as we know, no concise representations have been proposed for sequential classification rules.
Our work addresses the definition of concise representations for a sequential classification rule set. We define a general framework for sequential classification rule mining. Within this framework, the notions of containment between two arbitrary sequences, and between a sequence and an input sequence, generalize previous definitions of constrained containment [5, 17, 25]. In
this general context, we define the concept of sequence generator, which, to
our knowledge, has never been proposed before in the sequence domain. Fur-
thermore, we introduce the concepts of constrained closed sequence and con-
strained generator sequence. We exploit both concepts to define two compact
representations of a classification rule set.
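To make the distinction between the two concepts concrete, the following Python sketch computes which frequent sequences are closed and which are generators. It is an illustrative toy, not the chapter's algorithm: sequences are modeled as plain tuples of items, containment as ordinary (not necessarily contiguous) subsequence matching rather than the framework's constrained containment, and supports are assumed given.

```python
def is_subsequence(s, t):
    """True if s is a (not necessarily contiguous) subsequence of t."""
    it = iter(t)
    return all(x in it for x in s)  # membership on the iterator preserves order

def closed_and_generators(freq):
    """freq: dict mapping sequence (tuple of items) -> support count.
    A sequence is closed if no proper super-sequence has the same support;
    it is a generator if no proper sub-sequence has the same support."""
    closed, generators = set(), set()
    for s, sup in freq.items():
        if not any(s != t and is_subsequence(s, t) and freq[t] == sup
                   for t in freq):
            closed.add(s)
        if not any(s != t and is_subsequence(t, s) and freq[t] == sup
                   for t in freq):
            generators.add(s)
    return closed, generators
```

For example, with supports {('a',): 3, ('b',): 2, ('a','b'): 2}, the sequence ('b',) is a generator but not closed, because its super-sequence ('a','b') has the same support.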
9 Conclusions and Future Work
In this chapter we propose two compact representations to encode the knowl-
edge available in a sequential classification rule set. The classification rule
cover (CRC) is defined by means of the concept of generator sequence and
yields a simple rule set, which is equivalent to the complete rule set for classifi-
cation purposes. Compact rules, which are the building blocks of the compact
classification rule set (CCRS), are characterized by a more complex structure,
based on closed sequences and their associated generator sequences. Compact
rules allow us to regenerate the entire set of frequent sequential classification
rules from the compact form.
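The regeneration step can be sketched in the same toy setting. Since every sequence lying between a generator and its associated closed sequence belongs to the same equivalence class and shares the closed sequence's support, the antecedents of the full rule set can be enumerated from the compact form. The function below is an illustrative assumption about this enumeration, not the chapter's actual procedure, and again uses plain subsequence containment.

```python
from itertools import combinations

def is_subsequence(s, t):
    """True if s is a (not necessarily contiguous) subsequence of t."""
    it = iter(t)
    return all(x in it for x in s)

def expand(generator, closed):
    """Enumerate every subsequence s of `closed` that contains `generator`.
    All such sequences share the support of the closed sequence, so each
    yields a rule with the same support and class label."""
    n = len(closed)
    out = set()
    for r in range(len(generator), n + 1):
        for idx in combinations(range(n), r):
            s = tuple(closed[i] for i in idx)
            if is_subsequence(generator, s):
                out.add(s)
    return sorted(out)
```

For instance, expanding the generator ('b',) against the closed sequence ('a','b') yields both ('b',) and ('a','b') as rule antecedents.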
Experiments on textual and biological datasets show that the compression
ratio is significant for low support thresholds and correlated datasets. In this
case, traditional techniques would generate a huge number of classification
rules.
As future work, we plan to exploit our compact representations to design
an effective classifier. A promising direction is the integration of both sequen-
tial and associative classification rules, to exploit both the specific character-
ization provided by sequential rules and the general representation given by
associative classification rules.