little in this situation. With increased memory capacity, we will be better
and better guided by theoretical bounds in determining sample size.
Sample size is also related to mining quality. However, samples of
the same size can vary in quality. In particular, some samples are more
representative of, or resemble, the original data more closely than
others. Hence, there is a need to measure sample quality; we then wish
to establish a positive correlation between sample quality and mining
quality.
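As a concrete illustration (not from this chapter), one simple way to quantify how well a sample resembles the original data is to compare the distribution of an attribute in the sample against its distribution in the full dataset, e.g., with a symmetric KL-style divergence. The function name and scoring choices below are assumptions made for this sketch:

import numpy as np

def sample_quality(population, sample, bins=20):
    """Illustrative quality score: how closely the sample's
    distribution of one attribute matches the population's.
    Returns a symmetric KL-style divergence (lower = better)."""
    lo, hi = population.min(), population.max()
    p, _ = np.histogram(population, bins=bins, range=(lo, hi))
    q, _ = np.histogram(sample, bins=bins, range=(lo, hi))
    eps = 1e-12                      # avoid log(0) on empty bins
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()  # normalize counts to probabilities
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Example: a larger random sample is usually more representative.
rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
print(sample_quality(data, rng.choice(data, 100)))     # higher divergence
print(sample_quality(data, rng.choice(data, 10_000)))  # lower divergence

Under this kind of measure, a positive correlation between sample quality and mining quality can then be tested empirically.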
8.2.4. Feature selection based on information theory
This is a practical and efficient method that eliminates features which
give little information. It addresses both theoretical and empirical
aspects of feature selection, i.e., it is a filter approach that can handle
a larger number of features. It is a probabilistic approach, i.e., for
each instance it considers:
Pr(C | F = f),
(8.1)
where C is the class variable, F denotes the set of features, and f is a tuple of feature values.
This method uses cross-entropy (the KL distance) to select a feature subset G such that
Pr(C | G = f_G) remains as close as possible to Pr(C | F = f).
Now:
Δ_G = Σ_f Pr(f) δ_G(f)
(8.2)
and:
δ_G(f) = D(Pr(C | f), Pr(C | f_G))
(8.3)
i.e., it employs backward elimination (at each step, eliminate the feature F_i whose
removal causes the smallest increase in Δ_G).
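A minimal sketch of this backward-elimination idea, assuming discrete features and empirical estimates of the probabilities in Eqs. (8.2) and (8.3); all helper names here are illustrative, not from the chapter:

from collections import Counter, defaultdict
import math

def cond_class_dist(rows, labels, feat_idx):
    """Estimate Pr(C | projection onto features in feat_idx) from data."""
    counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        counts[tuple(row[i] for i in feat_idx)][c] += 1
    return {k: {c: n / sum(v.values()) for c, n in v.items()}
            for k, v in counts.items()}

def delta(rows, labels, keep):
    """Eqs. (8.2)-(8.3): expected KL distance between Pr(C | f)
    and Pr(C | f_G), weighted by the empirical Pr(f)."""
    full = cond_class_dist(rows, labels, range(len(rows[0])))
    proj = cond_class_dist(rows, labels, keep)
    total = 0.0
    for row in rows:
        p = full[tuple(row)]
        q = proj[tuple(row[i] for i in keep)]
        total += sum(pc * math.log(pc / q[c]) for c, pc in p.items())
    return total / len(rows)   # averaging over rows supplies Pr(f)

def backward_eliminate(rows, labels, n_keep):
    keep = list(range(len(rows[0])))
    while len(keep) > n_keep:
        # drop the feature whose removal increases Delta the least
        best = min(keep, key=lambda i: delta(rows, labels,
                                             [j for j in keep if j != i]))
        keep.remove(best)
    return keep

# Toy usage: feature 0 determines the class; feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
print(backward_eliminate(rows, labels, n_keep=1))  # -> [0]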
Working principle:
If Pr(A = a | X = x, B = b) = Pr(A = a | X = x), then B gives us no
information about A beyond what X already provides.
M is a Markov blanket for a feature F_i if M does not contain F_i and,
given M, F_i is conditionally independent of the remaining features and
the class.
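To make the working principle concrete, the condition Pr(A = a | X = x, B = b) = Pr(A = a | X = x) can be checked empirically from data; the function name and tolerance below are assumptions of this sketch, not part of the method as stated:

from collections import Counter

def gives_no_information(rows, a_idx, x_idx, b_idx, tol=0.01):
    """Empirically test whether Pr(A | X, B) ~= Pr(A | X),
    i.e., whether B adds no information about A once X is known."""
    joint = Counter((r[a_idx], r[x_idx], r[b_idx]) for r in rows)
    xb = Counter((r[x_idx], r[b_idx]) for r in rows)
    ax = Counter((r[a_idx], r[x_idx]) for r in rows)
    x = Counter(r[x_idx] for r in rows)
    for (a, xv, b), n in joint.items():
        p_given_xb = n / xb[(xv, b)]        # Pr(A=a | X=xv, B=b)
        p_given_x = ax[(a, xv)] / x[xv]     # Pr(A=a | X=xv)
        if abs(p_given_xb - p_given_x) > tol:
            return False
    return True

# Toy usage: B duplicates X, so it adds nothing once X is known.
rows = [(a, x, x) for a in (0, 1) for x in (0, 1) for _ in range(5)]
print(gives_no_information(rows, a_idx=0, x_idx=1, b_idx=2))  # True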
With these two measures, two new feature selection algorithms were
developed: the quadratic MI-based feature selection (QMIFS) approach and
the MI-based constructive criterion (MICC) approach. In classificatory analysis,