Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation - Feature Selection for Data and Pattern Recognition


based on the probabilities of one attribute value occurring together with the value of the second attribute, and for the classification task the second attribute will correspond to a special attribute in the dataset defined as class. The ST measure for the capability of input attribute $at_i$ in predicting the class attribute $Y$ is defined in [54] as follows.

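\[
\mathrm{Tau}(at_i, Y) =
\frac{\displaystyle \sum_{c=1}^{C}\sum_{r=1}^{R}\frac{P(rc)^2}{P(+c)}
      + \sum_{r=1}^{R}\sum_{c=1}^{C}\frac{P(rc)^2}{P(r+)}
      - \sum_{r=1}^{R}P(r+)^2
      - \sum_{c=1}^{C}P(+c)^2}
     {\displaystyle 2 - \sum_{r=1}^{R}P(r+)^2 - \sum_{c=1}^{C}P(+c)^2}
\tag{10.1}
\]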
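As an illustration, Eq. 10.1 can be evaluated directly from a contingency table of counts. The following is a minimal sketch, not the authors' implementation. It assumes the standard form of the measure, in which the joint and marginal probabilities are squared. The function name `symmetrical_tau`, the row/column layout of the table, and the choice to return 0 for a degenerate table are all illustrative assumptions.

```python
def symmetrical_tau(table):
    """Symmetrical Tau (Eq. 10.1) for a contingency table of counts.

    table[r][c] is assumed to hold the number of records with the r-th
    value of the input attribute and the c-th class value.
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("contingency table contains no records")

    # Joint probabilities P(rc) and marginals P(r+), P(+c).
    p = [[count / total for count in row] for row in table]
    p_row = [sum(row) for row in p]
    p_col = [sum(col) for col in zip(*p)]

    sum_row_sq = sum(v * v for v in p_row)
    sum_col_sq = sum(v * v for v in p_col)

    denominator = 2 - sum_row_sq - sum_col_sq
    if denominator <= 1e-12:
        # Assumption: when both variables are constant the measure is
        # undefined, so the attribute is treated as carrying no information.
        return 0.0

    # Cells with a zero marginal are skipped; their joint probability is zero.
    term_col = sum(
        p[r][c] ** 2 / p_col[c]
        for r in range(len(p))
        for c in range(len(p_col))
        if p_col[c] > 0
    )
    term_row = sum(
        p[r][c] ** 2 / p_row[r]
        for r in range(len(p))
        for c in range(len(p_col))
        if p_row[r] > 0
    )
    return (term_col + term_row - sum_row_sq - sum_col_sq) / denominator
```

Under these assumptions the function gives 1 for a perfectly associated table such as `[[10, 0], [0, 10]]` and 0 for an independent one such as `[[5, 5], [5, 5]]`.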

Higher values of the ST measure indicate better discriminating criteria (features) for the class that is to be predicted in the domain. As in [15], the attributes are ranked in decreasing order of their ST values, and a relevance cut-off point is chosen; all attributes at and below this point are considered irrelevant and are discarded. The cut-off is placed where a significant drop occurs between consecutive ST values in the ranking, that is, where a value is less than half of the previous value. Discarding these attributes beforehand prevents the generation of rules that would later have to be discarded because they contain irrelevant attributes. In accordance with [5], we have found that mutual information typically ranks attributes with more values higher than the ST measure does.
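The cut-off rule described above can be sketched as follows. This is only an illustration of one reading of the rule (cut at the first attribute whose ST value is less than half of the value ranked immediately before it). The function name `select_relevant` and the attribute names and ST values in the usage lines are hypothetical.

```python
def select_relevant(st_values):
    """Return the attributes kept after applying the relevance cut-off.

    st_values maps an attribute name to its ST value. Attributes are
    ranked by decreasing ST value; the first attribute whose value is
    less than half of the previous one, and every attribute ranked
    below it, is discarded.
    """
    ranked = sorted(st_values.items(), key=lambda item: item[1], reverse=True)
    kept = []
    for position, (name, value) in enumerate(ranked):
        if position > 0 and value < ranked[position - 1][1] / 2:
            break  # cut-off point: this attribute and all below are dropped
        kept.append(name)
    return kept


# Hypothetical ST values, for illustration only.
example = {"a": 0.40, "b": 0.35, "c": 0.30, "d": 0.10, "e": 0.09}
relevant = select_relevant(example)
```

With these made-up values the drop from 0.30 to 0.10 is the first one below half of the previous value, so `d` and `e` are discarded and `relevant` holds `a`, `b` and `c`.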

Chi-square: A natural way to express the dependence between the antecedent and the consequent of an association rule is the correlation based on the chi-square test for independence [7]. The chi-square test is defined as follows. For a given $D_{tr}$, the occurrence of $at_i$, where $at_i \in AT = (at_1, \ldots, at_{|AT|})$, is independent of the occurrence of $y_r \in Y$ if $P(at_i \cup y_r) = P(at_i)\,P(y_r)$; otherwise $at_i$ and $y_r$ are dependent and correlated. The correlation between $at_i$ and $y_r \in Y$ is measured using Eq. 10.2. For a given lift measure [40] based on Eq. 10.2, the chi-square ($\chi^2$) statistic value was used to determine whether the correlation is statistically significant.

\[
\mathrm{lift}(at_i, y_r) = \frac{P(at_i \cup y_r)}{P(at_i)\,P(y_r)}
\tag{10.2}
\]

Hence, the chi-square test discards any $fA_k$ for which there exists an $at_i$ contained in $x$ whose $\chi^2$ value is not significant for $y \in Y$ (correlation analysis in Eq. 10.2).
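The lift in Eq. 10.2 and the accompanying chi-square statistic can be computed from four counts. The sketch below is illustrative rather than the procedure used in the chapter. It assumes a 2x2 contingency table (attribute value present or absent against class value present or absent) and a 0.05 significance level, for which the critical value with one degree of freedom is 3.841. The source does not state the significance level, and the function and parameter names are hypothetical.

```python
CHI2_CRITICAL_005_DF1 = 3.841  # assumed significance level of 0.05, 1 d.o.f.


def lift_and_chi_square(n_both, n_at, n_y, n_total):
    """Return (lift, chi-square statistic) for one attribute/class pair.

    n_both  : records containing both the attribute value and the class value
    n_at    : records containing the attribute value
    n_y     : records containing the class value
    n_total : number of records in the training set
    """
    if not (0 < n_at < n_total and 0 < n_y < n_total):
        raise ValueError("both marginal counts must lie strictly between 0 and n_total")

    # Lift (Eq. 10.2): joint probability over the product of the marginals.
    lift = (n_both / n_total) / ((n_at / n_total) * (n_y / n_total))

    # Observed 2x2 table: rows = attribute value present/absent,
    # columns = class value present/absent.
    observed = [
        [n_both, n_at - n_both],
        [n_y - n_both, n_total - n_at - n_y + n_both],
    ]
    row_totals = [n_at, n_total - n_at]
    col_totals = [n_y, n_total - n_y]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return lift, chi2


def is_significant(chi2, critical=CHI2_CRITICAL_005_DF1):
    """True when the statistic exceeds the assumed critical value."""
    return chi2 > critical
```

A rule whose antecedent contains an attribute for which `is_significant` returns `False` would, under this reading, be a candidate for removal.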

Logistic Regression: Another form of statistical analysis applied was logistic regression. The relationship between the antecedent and consequent in association rule mining can be presented as a relationship between a target variable and the input variables in logistic regression. The following is the definition of the logistic regression model involved in the framework. For a given $D_{tr}$, several logistic regression models were developed based on $\ln(\mathrm{odds}(y)) = \beta_0 + \beta_1 at_1 + \beta_2 at_2 + \cdots + \beta_{|AT|} at_{|AT|} + e$, where $\ln(\mathrm{odds}(y))$ is the natural logarithm of the odds ratio, $y \in Y$, the $\beta_0, \beta_1, \ldots, \beta_{|AT|}$ are the coefficients of the input attributes $at_i$, $e$ is the error variable and $Y$ the dichotomous class attribute. The coefficient $\beta_i$ of $at_i$ is determined based on the log likelihood value given in Eq. 10.3, where $at_i\,val_r$ denotes the value of attribute $at_i$ occurring in record $r$.
