As the number of dimensions increases, the sample size needs to increase
exponentially in order to obtain an effective estimate of multivariate densities
[Hwang et al. (1994)].
This phenomenon is usually referred to as the “curse of dimensionality”.
Bellman (1961) coined this term while working on complicated
signal processing problems. Techniques such as decision tree inducers that are
efficient in low dimensions fail to provide meaningful results when the
number of dimensions grows beyond a “modest” size. Furthermore,
smaller classifiers, involving fewer features (typically fewer than 10), are
much easier for humans to understand. Smaller classifiers are also more
appropriate for user-driven data mining techniques such as visualization.
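To make the exponential growth concrete, the following back-of-the-envelope sketch (an illustration, not taken from [Hwang et al. (1994)]) counts the samples a simple histogram density estimator would need if one wanted roughly ten observations per cell with ten bins per axis:

```python
# Rough illustration of the curse of dimensionality for density estimation:
# with 10 histogram bins per axis and a target of ~10 observations per cell,
# the required sample size grows as 10 * 10**d with the dimension d.
for d in (1, 2, 3, 5, 10):
    bins_per_axis = 10
    cells = bins_per_axis ** d              # number of histogram cells
    samples_needed = 10 * cells             # ~10 observations per cell
    print(f"d={d:2d}: {cells:>15,} cells, ~{samples_needed:,} samples needed")
```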
Most methods for dealing with high dimensionality focus on Feature
Selection techniques, i.e., selecting a single subset of features upon which
the inducer (induction algorithm) will run, while ignoring the rest. The
selection of the subset can be done manually, by using prior knowledge to
identify irrelevant variables, or automatically, by suitable selection algorithms.
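As an illustration of the subset-selection idea, the following is a minimal sketch (assuming scikit-learn and its bundled breast cancer data, neither of which comes from the text) in which the features are first filtered down to a small subset and only that subset is passed to the inducer:

```python
# Filter-style feature selection before running an inducer (a decision tree):
# keep only the 10 features with the highest mutual information with the
# class label, then induce the tree on that subset.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # 30 original features

selector = SelectKBest(mutual_info_classif, k=10)
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
pipeline = make_pipeline(selector, tree)

scores = cross_val_score(pipeline, X, y, cv=5)
print("mean accuracy with 10 selected features:", scores.mean())
```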
In the last decade, many researchers have become increasingly inter-
ested in feature selection. Consequently, many feature selection algorithms
have been proposed, some of which have been reported as displaying
remarkable improvements in accuracy. Since the subject is too broad to
survey here, readers seeking further information about recent developments
should see [Langley (1994)] and [Liu and Motoda (1998)].
A number of linear dimension reducers have been developed over the
years. The linear methods of dimensionality reduction include projection
pursuit [Friedman and Tukey (1973)]; factor analysis [Kim and Mueller
(1978)]; and principal components analysis [Dunteman (1989)]. These
methods are not aimed directly at eliminating irrelevant and redundant
features, but are rather concerned with transforming the observed variables
into a small number of “projections” or “dimensions”. The underlying
assumptions are that the variables are numeric and the dimensions can
be expressed as linear combinations of the observed variables (and vice
versa). Each discovered dimension is assumed to represent an unobserved
factor and thus to provide a new way of understanding the data (similar to
the curve equation in regression models).
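A minimal sketch of this idea, assuming scikit-learn's PCA and synthetic data in which three hidden factors generate thirty observed numeric variables (none of which comes from the text), shows the observed variables being replaced by a small number of linear projections:

```python
# Linear dimension reduction with principal components analysis (PCA):
# 3 unobserved factors generate 30 observed numeric variables, and PCA
# recovers 3 linear "projections" of the observed variables.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))            # 3 hidden factors, 200 samples
loadings = rng.normal(size=(3, 30))           # how factors map to observed vars
X = latent @ loadings + 0.1 * rng.normal(size=(200, 30))

pca = PCA(n_components=3)                     # keep 3 dimensions
Z = pca.fit_transform(X)                      # each column of Z is a linear
                                              # combination of the 30 variables
print(Z.shape)                                # (200, 3)
print(pca.explained_variance_ratio_.round(3)) # variance captured per projection
```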
The linear dimension reducers have been enhanced by constructive
induction systems that use a set of existing features and a set of predefined
constructive operators to derive new features [Pfahringer (1994)]; [Ragavan
and Rendell (1993)]. These methods are effective for high dimensionality