Compact Representations of Sequential Classification Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

universe of all possible matching functions, relation X

Φ S can be formalized

as (a) X

S and (b) X satisfies Gap θ K in S . This case corresponds to

the usual definition of gap constrained sequence as introduced for example

in [17, 25].

Based on the notion of containment between a sequence and an input-

sequence, we can now formalize the definition of support of a sequence. In par-

ticular, sup Φ ( X )=

. A sequence X is frequent

with respect to a given support threshold minsup when sup Φ ( X )

( SID,S,c )

∈D|

Φ S

≥

minsup .

c i may be mea-

sured by means of two quality indexes [19], rule support and rule confi-

dence. These indexes estimate the accuracy of r in predicting the correct

class for a data object d . Rule support is the number of input-sequences

The quality of a (sequential) classification rule r : X

→

which contain X and are labeled by class label c i . Hence, sup Φ ( r )=

. Rule confidence is given by

the ratio conf Φ ( r )= sup Φ ( r ) /sup Φ ( X ). A sequential rule r is frequent if

sup Φ ( r )

( SID,S,c )

∈D|

Φ S

∧

c = c i }|

≥

minsup .

3.3 Framework Properties

The concise representations for sequential classification rules we propose in

this work require the pair ( Ψ,Φ ) to satisfy the following two properties.

Property 1 (Transitivity). Let ( Ψ,Φ ) define a constrained framework for

mining sequential classification rules. Let X, Y ,andZ be arbitrary sequences

in D.IfX Ψ Y and Y Ψ Z, then it follows that X Ψ Z, i.e., the

subsequence relation defined by Ψ satisfies the transitive property.

Property 2 (Containment). Let ( Ψ,Φ ) define a constrained framework for

mining sequential classification rules. Let X,Y be two arbitrary sequences

.IfX

Ψ Y , then it follows that

{

( SID,S,c )

∈D|

Φ S

}⊇

{

Property 2 states the anti-monotone property of support both for se-

quences and classification rules. In particular, for an arbitrary class label c

it is sup Φ ( X

( SID,S,c )

∈D|

Φ S

}

c ).

Albeit in a different form, several specializations of the above framework

have already been proposed previously [5, 17, 25]. In the remainder of the

chapter, we assume a framework for sequential classification rule mining where

Properties 1 and 2 hold.

The concepts proposed in the following sections rely on both properties of

our framework. In particular, the concepts of closed and generator itemsets

in the sequence domain are based on Property 2. These concepts are then ex-

ploited in Sect. 5 to define two concise forms for a sequential rule set. By means

of Property 1 we define the equivalence between two classification rules. We

exploit this property to define a compact form which allows the classification of

unlabeled data without information loss with respect to the complete rule set.

Both properties are exploited in the extraction algorithm described in Sect. 6.

→

c )

≥

sup Φ ( Y

→

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home