subset to distinguish the different class labels. Considering these divisions
and the latest developments, we divide the evaluation functions into five
categories: distance, information (or uncertainty), dependence, consistency,
and classifier error rate. In the following subsections we briefly discuss each
of these types of evaluation functions.
Distance measures:
These are also known as separability, divergence, or
discrimination measures. For a two-class problem, a feature X is preferred
to another feature Y if X induces a greater difference between the two
class-conditional probabilities than Y does; if the difference is zero, then X
and Y are indistinguishable. An example is the Euclidean distance measure.
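As an illustration (not part of the original text), here is a minimal Python sketch of a distance-based score in this spirit: it estimates the two class-conditional probability distributions of a feature by histograms and takes the Euclidean distance between them. The function name, the binning of a continuous feature, and the histogram estimation are all assumptions of the sketch.

```python
import numpy as np

def euclidean_divergence(x, y, bins=10):
    """Score a feature by the Euclidean distance between its two
    estimated class-conditional distributions P(x | class 0) and
    P(x | class 1). A larger score means the feature separates the
    two classes better; zero means they are indistinguishable.

    x : 1-D array of feature values
    y : 1-D array of binary class labels (0 or 1)
    """
    edges = np.histogram_bin_edges(x, bins=bins)   # shared bins for both classes
    p0, _ = np.histogram(x[y == 0], bins=edges)
    p1, _ = np.histogram(x[y == 1], bins=edges)
    p0 = p0 / p0.sum()                             # normalize counts to probabilities
    p1 = p1 / p1.sum()
    return np.linalg.norm(p0 - p1)                 # Euclidean distance between the two
```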
Information measures:
These measures typically determine the information
gain from a feature. The information gain from a feature X is defined
as the difference between the prior uncertainty [40] and the expected
posterior uncertainty after using X. Feature X is preferred to feature Y if
the information gain from X is greater than that from Y (e.g., the entropy
measure) [108].
Dependence measures:
Dependence measures or correlation measures
quantify the ability to predict the value of one variable from the value of
another. The correlation coefficient is a classical dependence measure and
can be used to find the correlation between a feature and a class. If the
correlation of feature X with class C is higher than the correlation of
feature Y with C [109], then feature X is preferred to Y. A slight variation
of this is to determine the dependence of a feature on other features; this
value indicates the degree of redundancy of the feature. All evaluation
functions based on dependence measures can be divided between distance
and information measures [110,111]. They are nevertheless kept as a separate
category because, conceptually, they represent a different viewpoint [112].
More about the above three measures can be found in Ben-Bassat's
survey [113].
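As an illustration (not from the text), the feature-class correlation comparison can be sketched in a few lines, assuming a numerically encoded class label; for a binary 0/1 class the Pearson coefficient used here reduces to the point-biserial correlation.

```python
import numpy as np

def feature_class_correlation(x, y):
    """Absolute Pearson correlation between a feature and a numerically
    encoded class label. The feature with the larger value is preferred."""
    return abs(np.corrcoef(x, y)[0, 1])

# Prefer feature x1 over x2 if it correlates more strongly with the class c:
# preferred = x1 if feature_class_correlation(x1, c) > feature_class_correlation(x2, c) else x2
```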
Consistency measures:
These measures are relatively new and have
received much attention recently. They are characteristically different from
the other measures because of their heavy reliance on the training dataset
and their use of the Min-Features bias in selecting a subset of features [114].
The Min-Features bias [115] prefers consistent hypotheses definable over as
few features as possible. These measures find the minimally sized subset
that satisfies an acceptable inconsistency rate, which is usually set by the
user.
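A minimal sketch of the inconsistency rate and the Min-Features search it supports, assuming categorical feature values and an exhaustive search for illustration; the function names and the grouping-by-tuple implementation are our assumptions, not the authors' algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

def inconsistency_rate(rows, labels, subset):
    """Inconsistency rate of a feature subset: instances that agree on
    every selected feature but carry different class labels are
    inconsistent. For each matching group, the inconsistency count is
    the group size minus the frequency of its majority class."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)   # project the instance onto the subset
        groups[key].append(label)
    inconsistent = sum(len(g) - max(Counter(g).values()) for g in groups.values())
    return inconsistent / len(rows)

def min_features(rows, labels, n_features, threshold):
    """Exhaustive search for the smallest subset whose inconsistency
    rate is within the user-set threshold (the Min-Features bias;
    exponential in n_features, so for illustration only)."""
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            if inconsistency_rate(rows, labels, subset) <= threshold:
                return subset
    return tuple(range(n_features))           # fall back to the full feature set
```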