Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice

3 Preliminaries

As noted in Sect. 1.1, $\mathcal{R} = \{R_1, R_2, \ldots, R_{2^n-n-2}, R_{2^n-n-1}\}$ represents a complete set of possible CARs that are generated from $D_{TR}$, and $R_j$ represents a rule in set $\mathcal{R}$ with label $j$.

3.1 Proposed Rule Weighting Scheme

Item Weighting Score

There are $n$ items involved in $D_{TR}$. For a particular pre-defined class $A$ (as $c_i \in C$), a score is assigned to each item in $D_{TR}$ that distinguishes the significant items for class $A$ from the insignificant ones.

Definition 1. Let $c_A(\mathit{Item}_h)$ denote the contribution of each item $h \in D_{TR}$ for class $A$, which represents how significantly item $h$ determines $A$, where $0 \le c_A(\mathit{Item}_h) \le |C|$, and $|C|$ is the size function of the set $C$.

The calculation of $c_A(\mathit{Item}_h)$ is given as follows:

$$c_A(\mathit{Item}_h) = \mathit{TransFreq}(\mathit{Item}_h, A) \times \bigl(1 - \mathit{TransFreq}(\mathit{Item}_h, \bar{A})\bigr) \times \frac{|C|}{\mathit{ClassCount}(\mathit{Item}_h, C)},$$

where

1. The $\mathit{TransFreq}(\mathit{Item}_h, A \text{ or } \bar{A})$ function computes how frequently $\mathit{Item}_h$ appears in class $A$ or in the group of classes $\bar{A}$ (the complement of $A$). The calculation of this function is:

$$\frac{\text{number of transactions with } \mathit{Item}_h \text{ in the class(es)}}{\text{number of transactions in the class(es)}}.$$

2. The $\mathit{ClassCount}(\mathit{Item}_h, C)$ function simply counts the number of classes in $C$ which contain $\mathit{Item}_h$.
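
The item weighting score defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the data layout (a list of item-set and class-label pairs), the function names `trans_freq`, `class_count` and `item_score`, and the toy transactions are all assumptions made for the example, as is the choice to return 0 for an empty class group or for an item that never occurs.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical layout: each transaction is (set of items, class label).
Transaction = Tuple[FrozenSet[str], str]


def trans_freq(item: str, classes: Set[str], data: List[Transaction]) -> float:
    """Fraction of the transactions in the given class(es) that contain item."""
    in_classes = [items for items, label in data if label in classes]
    if not in_classes:
        return 0.0  # assumption: an empty class group contributes frequency 0
    return sum(item in items for items in in_classes) / len(in_classes)


def class_count(item: str, data: List[Transaction]) -> int:
    """Number of distinct classes with at least one transaction containing item."""
    return len({label for items, label in data if item in items})


def item_score(item: str, target: str, all_classes: Set[str],
               data: List[Transaction]) -> float:
    """c_A(Item_h) = TransFreq(h, A) * (1 - TransFreq(h, not A)) * |C| / ClassCount(h, C)."""
    count = class_count(item, data)
    if count == 0:
        return 0.0  # assumption: an item that never occurs scores 0
    freq_in = trans_freq(item, {target}, data)
    freq_out = trans_freq(item, all_classes - {target}, data)
    return freq_in * (1.0 - freq_out) * len(all_classes) / count


# Toy training set (invented for illustration only).
data: List[Transaction] = [
    (frozenset({"a", "b"}), "A"),
    (frozenset({"a"}), "A"),
    (frozenset({"b"}), "B"),
    (frozenset({"b", "c"}), "B"),
    (frozenset({"c"}), "C"),
    (frozenset({"a", "c"}), "C"),
]
classes = {"A", "B", "C"}

for item in ("a", "b", "c"):
    print(item, item_score(item, "A", classes, data))
```

On this toy data, item `a` scores 1.125 for class A (it occurs in every transaction of A, in one of the four transactions outside A, and in two of the three classes), item `b` scores 0.375, and item `c`, which never occurs in A, scores 0. All three values stay within the bound $0 \le c_A(\mathit{Item}_h) \le |C|$ from Definition 1.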

The rationale of this item weighting score is as follows:

1. The weighting score of $\mathit{Item}_h$ for class $A$ tends to be high if $\mathit{Item}_h$ is frequent in $A$.

2. The weighting score of $\mathit{Item}_h$ for class $A$ tends to be high if $\mathit{Item}_h$ is infrequent in $\bar{A}$.

3. The weighting score of $\mathit{Item}_h$ for any class tends to be high if $\mathit{Item}_h$ is involved in a small number of classes in $C$. In [5], a similar idea can be found in feature selection for text categorisation.
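
A short worked calculation may make the three effects concrete. The numbers are hypothetical and chosen only for illustration: take $|C| = 3$ and two items that are equally frequent in $A$ but differ in how often they occur outside $A$ and in how many classes they appear.

```latex
% Item 1: frequent in A, rare outside A, present in 2 of 3 classes
c_A(\mathit{Item}_1) = 0.8 \times (1 - 0.1) \times \frac{3}{2} = 1.08

% Item 2: equally frequent in A, common outside A, present in all 3 classes
c_A(\mathit{Item}_2) = 0.8 \times (1 - 0.6) \times \frac{3}{3} = 0.32
```

Both items are equally frequent in $A$, yet the first scores more than three times higher because it is rarer in $\bar{A}$ and is spread over fewer classes.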

Rule Weighting Score

Based on the item weighting score, a weighting score is assigned to the rule antecedent of each $R_j \in \mathcal{R}$.
