Compact Representations of Sequential Classification Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

Lemma 3 (Specialistic Rule). Let

be the set of frequent sequential classi-

fication rules for

,andr

∈R

, r : X

→

c, an arbitrary rule. r is a specialistic

rule in

iff X is a closed sequence in

Proof. We first prove the su cient condition. Let r i : X

→

c be an arbitrary

rule in

,where X is a closed sequence. By Definition 5, if X is a closed

sequence then

Ψ Y it is sup Φ ( X ) >sup Φ ( Y ).

Thus, r i is a specialistic rule according to Definition 10. We now prove the

necessary condition. Let r i : X

∀

r j : Y

→

c in

, with X

For the sake of contradiction, let X not be a closed sequence. It follows that

∃

→

c be an arbitrary specialistic rule in

Ψ Y and sup Φ ( X )= sup Φ ( Y ). Hence, from

Property 2, { ( SID,S,c ) ∈D|Y Φ S} = { ( SID,S,c ) ∈D|X Φ S} ,and

thus sup Φ ( r i )= sup Φ ( r j ). It follows that r i is not a specialistic rule according

to Definition 10, a contradiction.

r j : Y

→

c in

, with X

5.2 Sequential Classification Rule Cover

In this section we present a compact form which is based on the general rules

in a given set

. This form allows the classification of unlabeled data without

information loss with respect to the complete rule set

. Hence, it is equivalent

for classification purposes.

Intuitively, we say that two rule sets are equivalent if they contain the

same knowledge. When referring to a classification rule set, its knowledge is

represented by its capability in classifying an arbitrary data object d .Note

that d can be matched by different rules in

. Each rule r labels d with a

class c . The estimated accuracy of r in predicting the correct class is usually

given by r 's support and confidence.

The equivalence between two rule sets can be formalized in terms of rule

cover.

Definition 11 (Sequential Classification Rule Cover). Let

R 1 and

R 2 ⊆

R 1 be two arbitrary sequential classification rule sets extracted from

R 2 is

a sequential classification rule cover of

r j ∈R 2 , such that

r i is a specialization of r j according to Definition 8 and (ii) R 2 is minimal.

R 1 if (i)

∀

r i ∈R 1 ,

∃

R 1 , the two sets classify in the

same way an arbitrary data object d .Ifarule r i ∈R 1 labels d with class c ,

then in

When

R 2 ⊆R 1 is a classification cover of

R 2 there is a rule r j ,where r i is a specialization of r j ,and r j labels

d with the same class c (see Lemma 1). r i and r j have the same support

and confidence. It follows that

R 1 and

R 2 are equivalent for classification

purposes.

We propose a compact representation of rule set

which includes all

general rules in

. This compact representation, named classification rule

cover , encodes all necessary information to perform classification, but it does

not allow the regeneration of the complete rule set

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home