Feature Selection - Data Mining with Decision Trees: Theory and Applications

13.3.1 Feature Filters

Filter methods, the earliest approaches for feature selection, use general

properties of the data in order to evaluate the merit of feature subsets.

As a result, filter methods are generally much faster and more practical than

wrapper methods, especially for use on data of high dimensionality.

13.3.1.1 FOCUS

The FOCUS algorithm was originally designed for attributes with Boolean

domains [Almuallim and Dietterich (1994)]. FOCUS exhaustively searches

the space of feature subsets until it finds a subset in which every combination

of feature values is associated with a single value of the class. The selected

subset is then passed to the ID3 algorithm, which constructs a decision tree.
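
The search described above can be illustrated with a minimal Python sketch. It assumes the data is given as a list of feature tuples plus a parallel list of class labels, and it tries subsets in order of increasing size so that the first consistent subset found is also a smallest one. The function names `is_consistent` and `focus` are hypothetical, chosen for illustration only, and the hand-off to ID3 is not shown.

```python
from itertools import combinations

def is_consistent(rows, labels, subset):
    """Return True if no two rows agree on `subset` but differ in class."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        # Remember the first class seen for this value combination;
        # any later row with the same key must carry the same class.
        if seen.setdefault(key, label) != label:
            return False
    return True

def focus(rows, labels):
    """Exhaustively search subsets by increasing size (illustrative sketch)."""
    n_features = len(rows[0])
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(rows, labels, subset):
                return subset
    return None  # the data is inconsistent even with all features
```

Because the number of subsets grows exponentially with the number of features, an exhaustive search of this kind is only feasible for small feature sets.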

13.3.1.2 LVF

The LVF algorithm [Liu and Setiono (1996)] is consistency-driven and can

handle noisy domains if the approximate noise level is known a priori.

In each round, LVF generates a random subset

from the feature subset space. If the chosen subset is smaller than the

current best subset, the inconsistency rate of the dimensionally reduced

data described by the subset is compared with the inconsistency rate of the

best subset. If the subset is at least as consistent as the best subset, the

subset replaces the best subset.
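
The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the inconsistency rate is taken to be the number of instances that do not belong to the majority class of their group of matching feature values, divided by the total number of instances; the names `inconsistency_rate` and `lvf`, and the parameters `max_tries` and `seed`, are hypothetical.

```python
import random
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of rows outside the majority class of their matching group."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)

def lvf(rows, labels, max_tries=1000, seed=0):
    """Random search for a smaller subset that is at least as consistent."""
    rng = random.Random(seed)  # assumed: fixed seed for repeatable runs
    n_features = len(rows[0])
    best = tuple(range(n_features))  # start from the full feature set
    best_rate = inconsistency_rate(rows, labels, best)
    for _ in range(max_tries):
        size = rng.randint(1, n_features)
        candidate = tuple(sorted(rng.sample(range(n_features), size)))
        # Only strictly smaller subsets are evaluated.
        if len(candidate) < len(best):
            rate = inconsistency_rate(rows, labels, candidate)
            if rate <= best_rate:
                best, best_rate = candidate, rate
    return best, best_rate
```

Since the subsets are drawn at random, the quality of the result depends on the number of rounds allowed; more rounds give a better chance of finding a small consistent subset.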

13.3.1.3 Using a Learning Algorithm as a Filter

Some works have explored the possibility of using a learning algorithm as a

pre-processor to discover useful feature subsets for a primary learning

algorithm. Cardie (1995) describes the application of decision tree algorithms for

selecting feature subsets for use by instance-based learners. In [Provan and

Singh (1996)], a greedy oblivious decision tree algorithm is used to select

features to construct a Bayesian network. Holmes and Nevill-Manning

(1995) apply Holte's (1993) 1R system in order to estimate the predictive

accuracy of individual features. A program that induces decision table

majority classifiers for feature selection is presented in [Pfahringer

(1995)].
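
As a rough illustration of scoring individual features with a one-level rule in the spirit of 1R, the sketch below maps each value of a feature to its majority class and reports the resulting accuracy. It is a simplification under stated assumptions: features are nominal (no discretization of numeric attributes is attempted) and accuracy is measured on the training data itself. The names `one_r_accuracy` and `rank_features` are hypothetical.

```python
from collections import Counter, defaultdict

def one_r_accuracy(rows, labels, feature):
    """Accuracy of the rule 'predict the majority class for each feature value'."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[row[feature]][label] += 1
    # Each feature value correctly classifies the rows in its majority class.
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(rows)

def rank_features(rows, labels):
    """Return (accuracy, feature index) pairs, best feature first."""
    n_features = len(rows[0])
    scores = [(one_r_accuracy(rows, labels, f), f) for f in range(n_features)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

A ranking of this kind could then be used to keep only the highest-scoring features before handing the data to the primary learning algorithm.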

Decision table majority (DTM) classifiers are restricted to returning

stored instances that are exact matches with the instance to be classified.

When no instances are returned, the most prevalent class in the training
