Note that the confidence is calculated based on the whole database. Then we apply the Apriori algorithm [2] within each partition $P^{(k)}$ to generate the class rule set $R^{(k)} = \{r_\ell^{(k)}\}$:

$$R^{(k)} = \mathrm{Apriori}\left(P^{(k)},\, \mathrm{MinSupp}^{(k)}\right), \quad k = 1,\dots,N_{TC}; \qquad (14)$$

$$r_\ell^{(k)} : F_\ell^{(k)} \to C^{(k)}, \quad \ell = 1,\dots,N_R^{(k)}, \qquad (15)$$
where $N_R^{(k)}$ denotes the number of rules in the rule set $R^{(k)}$ and $\mathrm{MinSupp}^{(k)}$ denotes the minimum support value that each rule in $R^{(k)}$ must satisfy.
The final rule set $R_{\mathrm{ARM}}$ is the union of the class rule sets, that is,

$$R_{\mathrm{ARM}} = \bigcup_{k=1}^{N_{TC}} R^{(k)}. \qquad (16)$$
This procedure is what we refer to as partitioned-ARM.
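As a concrete illustration of partitioned-ARM, the following self-contained Python sketch partitions a toy database by class label, mines each partition against its own support threshold, and takes the union of the class rule sets as in Eqs. (14)-(16). The data, the helper name mine_class_rules, and the threshold are all hypothetical, and the helper enumerates subsets by brute force rather than performing Apriori's level-wise candidate generation; it stands in for the Apriori call only illustratively.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: each record is (set of feature-value
# items, class label). Names and values are illustrative only.
DATA = [
    ({"f1=a", "f2=x"}, "c1"),
    ({"f1=a", "f2=y"}, "c1"),
    ({"f1=b", "f2=x"}, "c2"),
    ({"f1=a", "f2=x"}, "c2"),
]

def mine_class_rules(partition, min_supp):
    """Stand-in for the Apriori call in Eq. (14): return every itemset F
    whose support within the class partition P^(k) is at least
    MinSupp^(k). (Brute-force subset enumeration, not level-wise
    Apriori candidate generation.)"""
    n = len(partition)
    counts = defaultdict(int)
    for items in partition:
        for size in range(1, len(items) + 1):
            for subset in combinations(sorted(items), size):
                counts[subset] += 1
    return [frozenset(f) for f, c in counts.items() if c / n >= min_supp]

# Partition the database by class label: one partition P^(k) per class.
partitions = defaultdict(list)
for items, label in DATA:
    partitions[label].append(items)

# Union of the class rule sets, R_ARM = U_k R^(k) (Eqs. 15-16): each
# frequent antecedent F yields the class rule F -> C^(k).
R_ARM = [(F, label)
         for label, part in partitions.items()
         for F in mine_class_rules(part, min_supp=0.5)]

for F, C in R_ARM:
    print(sorted(F), "->", C)
```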
An antecedent $F_\ell^{(k)}$ of a rule $r_\ell^{(k)}$ is of the form

$$F_\ell^{(k)} = \left\langle f_1^{(k)}, f_2^{(k)}, \dots, f_{N_F}^{(k)} \right\rangle, \qquad (17)$$
and we assume that each feature value $f_j^{(k)}$ can assume either a singleton value $\Theta_{f_j}^{(i)}$, $i = 1,\dots,n_{f_j}$, or the completely ambiguous value $\Theta_{f_j}$. In other words, the only feature imperfection we allow is a missing value, which is modeled as a complete ambiguity. This is in contrast to the class label ambiguity, where both partial and complete ambiguities are allowed.
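This feature-value model, in which each $f_j^{(k)}$ is either a singleton from its frame or the whole frame $\Theta_{f_j}$ when missing, can be captured with a small sketch. The frames, feature names, and the use of None as the marker for a missing value are assumptions for illustration:

```python
# Hypothetical frames of discernment for two features; each frame
# Theta_{f_j} lists the n_{f_j} possible singleton values.
FRAMES = {"f1": {"a", "b"}, "f2": {"x", "y", "z"}}

def encode(feature, value):
    """Return a singleton {value}, or the whole frame Theta_{f_j}
    when the value is missing (complete ambiguity)."""
    if value is None:
        return frozenset(FRAMES[feature])  # completely ambiguous
    if value not in FRAMES[feature]:
        raise ValueError(f"{value!r} not in frame of {feature}")
    return frozenset({value})              # singleton value

# Antecedent F = <f_1, f_2> with f_2 missing, per Eq. (17).
antecedent = {"f1": encode("f1", "a"), "f2": encode("f2", None)}
```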
2.5 Rule Pruning

While the classification accuracy of the classifier is a function of the integrity of the rule set, its efficiency and the computational burden it imposes depend on the size of the rule set. Hence, to increase efficiency and reduce the associated computational burden, we propose to prune the rules.
Consider the antecedent $F_\ell^{(k)}$ of the rule $r_\ell^{(k)}$. Then, $f_j^{(k)} = \Theta_{f_j}$ indicates complete ambiguity regarding the feature value $f_j^{(k)}$. In this case, the feature value does not provide any valuable information. To formalize the relevant notions, we introduce
Definition 1 (Level of Abstraction (LoA)). For a given rule $r_\ell^{(k)} : F_\ell^{(k)} \to C^{(k)}$, $\ell = 1,\dots,N_R^{(k)}$, $k = 1,\dots,N_{TC}$, define

$$\mathrm{LoA}\left[r_\ell^{(k)} : f_j^{(k)}\right] = \begin{cases} 1, & \text{if } f_j^{(k)} = \Theta_{f_j}; \\ 0, & \text{otherwise}, \end{cases} \quad j = 1,\dots,N_F. \qquad (18)$$
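Reading Eq. (18) as a per-feature indicator (the 0/1 case values are an assumption here; the prose above only establishes that a completely ambiguous value carries no information), the LoA of each feature value, and a rule-level count of ambiguous features, might be computed as in this sketch, which reuses the hypothetical frames from the earlier example:

```python
FRAMES = {"f1": {"a", "b"}, "f2": {"x", "y", "z"}}  # hypothetical frames

# Antecedent with f1 observed as a singleton and f2 missing
# (encoded as its whole frame, i.e., complete ambiguity).
antecedent = {"f1": frozenset({"a"}), "f2": frozenset(FRAMES["f2"])}

def loa(feature, value):
    """Per-feature LoA in the spirit of Eq. (18): 1 when the value is
    the completely ambiguous Theta_{f_j}, 0 when it is a singleton.
    (The 0/1 case values are an assumption, not taken verbatim from
    the source.)"""
    return 1 if value == frozenset(FRAMES[feature]) else 0

# Rule-level aggregate: the number of antecedent features that carry
# no information; rules with high LoA are natural pruning candidates.
rule_loa = sum(loa(f, v) for f, v in antecedent.items())
print(rule_loa)  # -> 1
```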