(1) Initialize the relevant parameters: F = F; S = φ; D_u = D; D_l = φ.
(2) Repeat
(3)     For each feature f ∈ F do
(4)         Calculate its mutual information I(C; f) on D_u;
(5)         If I(C; f) = 0 then F = F \ {f};
(6)     Choose the feature f with the highest I(C; f);
(7)     S = S ∪ {f}; F = F \ {f};
(8)     Obtain new labeled instances D_l from D_u induced by f;
(9)     Remove them from D_u, i.e., D_u = D_u \ D_l;
(10) Until F = φ or |D_u| = IT.
This algorithm works in a straightforward way. It estimates the mutual information between each candidate feature in F and the label C. During this calculation step, a feature is immediately discarded from F if its mutual information is zero: in that case the probability distribution of the feature is fully random, so it contributes nothing to predicting the unlabeled instances D_u. 70,71 After that, the feature with the highest mutual information is chosen. Note that the search strategy in DMIFS is sequential forward search, which means that the subset selected by DMIFS is an approximate one.
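As a concrete illustration of this loop, the following is a minimal Python sketch of a DMIFS-style selection procedure. It assumes discrete feature values; the helper names and the rule used for the instances "induced by f" (here, instances whose value of the selected feature maps to exactly one class) are illustrative assumptions rather than details taken from the original source.

import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X; Y) in nats for two equally long sequences of discrete values.
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def dmifs_sketch(data, labels, features):
    # data: list of dicts mapping feature name -> discrete value.
    unlabeled = list(range(len(data)))   # indices forming D_u
    remaining = list(features)           # F
    selected = []                        # S
    while remaining and unlabeled:
        scores = {}
        for f in list(remaining):
            column = [data[i][f] for i in unlabeled]
            classes = [labels[i] for i in unlabeled]
            mi = mutual_information(column, classes)
            if mi <= 1e-12:              # uninformative on D_u: discard f
                remaining.remove(f)
            else:
                scores[f] = mi
        if not scores:
            break
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        # Assumed reading of "instances induced by f": instances whose
        # value of the chosen feature corresponds to a single class.
        value_classes = {}
        for i in unlabeled:
            value_classes.setdefault(data[i][best], set()).add(labels[i])
        unlabeled = [i for i in unlabeled
                     if len(value_classes[data[i][best]]) > 1]
    return selected

# Toy usage: feature "a" determines the class, feature "b" is noise.
data = [{"a": 0, "b": 1}, {"a": 1, "b": 1}, {"a": 0, "b": 0}, {"a": 1, "b": 0}]
labels = [0, 1, 0, 1]
print(dmifs_sketch(data, labels, ["a", "b"]))   # -> ["a"]

Because the mutual information is re-estimated only on the shrinking set D_u, features chosen later are credited for discriminating the instances that the earlier features could not.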
8.2.9. Learning to classify by ongoing feature selection
Existing classification algorithms use a set of training examples to select classification features, which are then used for all future applications of the classifier. A major problem with this approach is the selection of a training set: a small set results in reduced performance, while a large set requires extensive training. In addition, class appearance may change over time, requiring an adaptive classification system. That work proposes a solution to these basic problems by developing an on-line feature selection method, which continuously modifies and improves the features used for classification based on the examples provided so far.
Online feature selection (n; k; e): Given a time point in the online learning process following the presentation of e examples and n features, find the subset with k < n features that is maximally informative about the class, estimated on the e examples. For computational efficiency, an on-line selection method 23,72 will also be of use when the set of features to consider is large, even in a non-online scheme. It then becomes possible to
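To make the (n; k; e) definition concrete, the following is a minimal sketch (reusing the mutual_information helper defined in the sketch above) in which, after e examples have been observed, each of the n candidate features is scored by its mutual information with the class and the k best are kept. The incremental bookkeeping and the class and method names are illustrative assumptions, not the method proposed in the cited work.

class OnlineFeatureSelector:
    # Accumulates examples one at a time and, on request, returns the
    # indices of the k features currently most informative about the class.
    def __init__(self, n_features, k):
        self.k = k
        self.columns = [[] for _ in range(n_features)]  # per-feature values
        self.labels = []                                 # labels seen so far

    def observe(self, example, label):
        # example: sequence of n discrete feature values for one instance.
        for j, value in enumerate(example):
            self.columns[j].append(value)
        self.labels.append(label)

    def select(self):
        # Rank the n features by mutual information on the e examples so far.
        scores = [mutual_information(col, self.labels) for col in self.columns]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        return ranked[:self.k]

# Toy usage with n = 3 features, k = 1, and e = 4 examples.
selector = OnlineFeatureSelector(n_features=3, k=1)
for example, label in [((0, 1, 0), 0), ((1, 1, 1), 1), ((0, 0, 0), 0), ((1, 0, 1), 1)]:
    selector.observe(example, label)
print(selector.select())   # -> [0]; feature 2 is equally informative here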