Databases Reference
In-Depth Information
universe of all possible matching functions, relation X
Φ S can be formalized
as (a) X
S and (b) X satisfies Gap θ K in S . This case corresponds to
the usual definition of gap constrained sequence as introduced for example
in [17, 25].
Based on the notion of containment between a sequence and an input-
sequence, we can now formalize the definition of support of a sequence. In par-
ticular, sup Φ ( X )=
. A sequence X is frequent
with respect to a given support threshold minsup when sup Φ ( X )
|{
( SID,S,c )
∈D|
X
Φ S
}|
minsup .
c i may be mea-
sured by means of two quality indexes [19], rule support and rule confi-
dence. These indexes estimate the accuracy of r in predicting the correct
class for a data object d . Rule support is the number of input-sequences
in
The quality of a (sequential) classification rule r : X
D
which contain X and are labeled by class label c i . Hence, sup Φ ( r )=
|{
. Rule confidence is given by
the ratio conf Φ ( r )= sup Φ ( r ) /sup Φ ( X ). A sequential rule r is frequent if
sup Φ ( r )
( SID,S,c )
∈D|
X
Φ S
c = c i }|
minsup .
3.3 Framework Properties
The concise representations for sequential classification rules we propose in
this work require the pair ( Ψ,Φ ) to satisfy the following two properties.
Property 1 (Transitivity). Let ( Ψ,Φ ) define a constrained framework for
mining sequential classification rules. Let X, Y ,andZ be arbitrary sequences
in D.IfX Ψ Y and Y Ψ Z, then it follows that X Ψ Z, i.e., the
subsequence relation defined by Ψ satisfies the transitive property.
Property 2 (Containment). Let ( Ψ,Φ ) define a constrained framework for
mining sequential classification rules. Let X,Y be two arbitrary sequences
in
D
.IfX
Ψ Y , then it follows that
{
( SID,S,c )
∈D|
X
Φ S
}⊇
{
.
Property 2 states the anti-monotone property of support both for se-
quences and classification rules. In particular, for an arbitrary class label c
it is sup Φ ( X
( SID,S,c )
∈D|
Y
Φ S
}
c ).
Albeit in a different form, several specializations of the above framework
have already been proposed previously [5, 17, 25]. In the remainder of the
chapter, we assume a framework for sequential classification rule mining where
Properties 1 and 2 hold.
The concepts proposed in the following sections rely on both properties of
our framework. In particular, the concepts of closed and generator itemsets
in the sequence domain are based on Property 2. These concepts are then ex-
ploited in Sect. 5 to define two concise forms for a sequential rule set. By means
of Property 1 we define the equivalence between two classification rules. We
exploit this property to define a compact form which allows the classification of
unlabeled data without information loss with respect to the complete rule set.
Both properties are exploited in the extraction algorithm described in Sect. 6.
c )
sup Φ ( Y
Search WWH ::




Custom Search