estimate the relationship between any attribute and the class. This requires coding the target class as a binary vector.
Again, these types of measures are closely related to information and distance measures. Features with a strong association with the class are good features for predictive tasks. One of the most widely used dependence measures is the Bhattacharyya dependence measure B, defined as:

B(A_j) = − Σ_i P(c_i) log ∫ √( P(A_j = a | c_i) P(A_j = a) ) da
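For a discrete attribute, the integral above becomes a sum over the observed values of A_j, and the probabilities can be estimated by frequency counts. The following sketch illustrates this (the function name and the plug-in probability estimates are my own, not from the text):

```python
from collections import Counter
from math import log, sqrt

def bhattacharyya_dependence(attribute_values, class_labels):
    """Discrete sketch of the Bhattacharyya dependence measure B(A_j):
    for each class c_i, compute the Bhattacharyya coefficient between
    P(A_j = a | c_i) and P(A_j = a), take -log, and weight by P(c_i).
    Larger values indicate a stronger attribute/class association."""
    n = len(class_labels)
    attr_counts = Counter(attribute_values)    # counts of A_j = a
    class_counts = Counter(class_labels)       # counts of c_i
    score = 0.0
    for c, n_c in class_counts.items():
        given_c = Counter(a for a, y in zip(attribute_values, class_labels)
                          if y == c)
        # Bhattacharyya coefficient between P(A_j | c) and P(A_j)
        coeff = sum(sqrt((given_c.get(a, 0) / n_c) * (n_a / n))
                    for a, n_a in attr_counts.items())
        if coeff > 0:
            score += (n_c / n) * (-log(coeff))
    return score
```

An attribute independent of the class yields a coefficient of 1 per class and hence a score of 0, while a perfectly predictive attribute scores strictly above 0.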
7.2.2.4 Consistency Measures
The previous measures attempt to find the features that best separate one class from the others, but they are not able to detect whether one of those features is redundant. Consistency measures, on the other hand, attempt to find a minimum number of features that separate the classes as well as the full set of features can. They aim to achieve

P(C | FullSet) = P(C | SubSet)

Feature evaluation rules derived from consistency measures state that we should select the minimum subset of features that can maintain the consistency of the data as observed with the full set of features. An inconsistency is defined as the case of two examples with the same inputs (same feature values) but with different output feature values (classes in classification). Using these measures, both irrelevant and redundant features can be removed.
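A common way to operationalize this criterion is to count inconsistencies: group the examples by their projection onto a candidate subset, and count every example beyond the majority class of its group. A minimal sketch (the function and parameter names are illustrative):

```python
from collections import Counter, defaultdict

def inconsistency_count(rows, subset, labels):
    """Count inconsistencies with respect to a feature subset: examples
    that agree on all features in `subset` but disagree on the class.
    Within each group, every example beyond the majority class counts
    as one inconsistency."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in subset)   # projection onto the subset
        groups[key][y] += 1
    # group size minus the largest class count, summed over groups
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())
```

A subset would then be acceptable when its inconsistency count equals the count obtained with the full feature set.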
7.2.2.5 Accuracy Measures
This form of evaluation relies on the classifier or learner itself. Among the various possible subsets of features, the subset that yields the best predictive accuracy is chosen. This family is distinguished from the previous four because it is directly focused on improving the accuracy of the same learner used in the DM task. However, we have to take some considerations into account. Firstly, how to truly estimate the predictive accuracy while avoiding the problem of over-fitting. Secondly, it is important to consider the time required by the DM model to complete learning from the data (classifiers usually perform more complex tasks than the computation of any of the four measures seen above). Lastly, the subset of features could be biased towards a single learning model, producing feature subsets that do not generalize.
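This wrapper idea can be sketched as follows, using a simple 1-NN classifier evaluated by leave-one-out accuracy and an exhaustive search over subsets (both choices are my own for illustration; real FS would use the actual DM learner, cross-validation, and a heuristic search):

```python
from itertools import combinations

def knn_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features in `subset` (a stand-in for the learner of the DM task)."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in subset)
    hits = 0
    for i, row in enumerate(rows):
        # nearest neighbour among all other examples
        j = min((k for k in range(len(rows)) if k != i),
                key=lambda k: dist(row, rows[k]))
        hits += labels[j] == labels[i]
    return hits / len(rows)

def best_subset(rows, labels, n_features):
    """Exhaustive wrapper search: return the subset with the highest
    estimated accuracy."""
    all_subsets = (s for r in range(1, n_features + 1)
                   for s in combinations(range(n_features), r))
    return max(all_subsets, key=lambda s: knn_accuracy(rows, labels, s))
```

The exhaustive search makes the cost issue mentioned above concrete: the number of candidate subsets grows exponentially with the number of features, and each candidate requires retraining the learner.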
Table 7.2 summarizes the computation of the accuracy metric and some derivatives that have been used in FS. The notation used is: tp, true positives; fp, false positives; fn, false negatives; tn, true negatives; tpr = tp/(tp + fn), sample true positive rate; fpr = fp/(fp + tn), sample false positive rate; precision = tp/(tp + fp); recall = tpr.
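These derived rates can be computed directly from the confusion counts; a small sketch (the function name and dictionary layout are my own):

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Compute tp, fp, fn, tn from paired labels and predictions,
    then the derived rates: tpr (= recall), fpr, precision, accuracy."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "tpr": tp / (tp + fn),                    # recall = tpr
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```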
 