Databases Reference
In-Depth Information
5 Compact Representations of Sequential
Classification Rules
We propose two compact representations to encode the knowledge available
in a sequential classification rule set. These representations are based on the
concepts of closed and generator sequence. One concise form is a lossless rep-
resentation of the complete rule set and allows regenerating all encoded rules.
This form is based on the concepts of both closed and generator sequences.
Instead, the other representation captures the most general information in
the rule set. This form is based on the concept of generator sequence and it
does not allow the regeneration of the original rule set. Both representations
provide a smaller and more easily understandable class model than traditional
sequential rule representations.
In Sect. 5.1, we introduce the concepts of general and specialistic classi-
fication rule. These rules characterize the more general (shorter) and more
specific (longer) classification rules in a given classification rule set. We then
exploit the concepts of general and specialistic rule to define the two compact
forms, which are presented in Sects. 5.2 and 5.3, respectively.
5.1 General and Specialistic Rules
In associative classification [11,19,30], a shorter rule (i.e., a rule with less ele-
ments in the antecedent) is often preferred to longer rules with same confidence
and support with the intent of both avoiding the risk of overfitting, and re-
ducing the size of the classifier. However, in some applications (e.g., modeling
surfing paths in web log analysis [32]), longer sequences may be more accurate
since they contain more detailed information. In these cases, longest-matching
rules may be preferable to shorter ones. To characterize both kinds of rules,
we propose the definition of specialization of a sequential classification rule.
Definition 8 (Classification Rule Specialization). Let r i : X
c i and
r j : Y
c j be two arbitrary sequential classification rules for
D
. r j is a
specialization of r i iff (i) X
Ψ Y , (ii) c i = c j , (iii) sup Φ ( X )= sup Φ ( Y ) ,
and (iv) sup Φ ( r i )= sup Φ ( r j ) .
From Definition 8, a classification rule r j is a specialization of a rule r i if r i
is more general than r j , i.e., r i has fewer conditions than r j in the antecedent.
Both rules assign the same class label and have equal support and confidence.
The next lemma states that any new data object covered by r j is also
covered by r i . The lemma trivially follows from Property 1, the transitive
property of the set of matching functions Ψ .
Lemma 1. Let r i and r j be two arbitrary sequential classification rules for
D
,andd an arbitrary data object covered by r j .Ifr j is a specialization of r i ,
then r i covers d.
Search WWH ::




Custom Search