In the literature, several works deal with feature set
partitioning. In one study, the features are grouped according to
feature type: nominal-valued, numeric-valued and text-valued
features [Kusiak (2000)]. A similar approach was also used in
developing the linear Bayes classifier [Gama (2000)]. The basic idea is
to aggregate the features into two subsets: the first containing only
the nominal features and the second only the continuous features.
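This type-based split can be sketched in a few lines. The helper and the sample feature table below are illustrative, not taken from the cited works: features whose values are all numeric go into the continuous subset, everything else into the nominal one, so each subset can be modeled separately and the models combined later.

```python
# Minimal sketch of type-based feature partitioning: split a
# {name: values} table into a nominal and a continuous subset.
# Names and data are illustrative only.

def partition_by_type(columns):
    """Split {name: values} into nominal (non-numeric) and continuous groups."""
    nominal, continuous = {}, {}
    for name, values in columns.items():
        # booleans are ints in Python, so exclude them explicitly
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
            continuous[name] = values
        else:
            nominal[name] = values
    return nominal, continuous

features = {
    "color": ["red", "blue", "red"],    # nominal
    "weight": [1.2, 3.4, 2.2],          # continuous
    "label_text": ["ok", "bad", "ok"],  # nominal (text)
}
nominal, continuous = partition_by_type(features)
print(sorted(nominal), sorted(continuous))
```

In practice, a classifier suited to each feature type (e.g. a discrete naive Bayes on the nominal subset, a Gaussian model on the continuous one) would be trained per subset and their outputs combined.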
In another study, the feature set was decomposed according to
the target class [Tumer and Ghosh (1996)]. For each class, the features
with low correlation to that class were removed. This method
was applied to a feature set of 25 sonar signals, where the goal was
to identify the source of the sound (whale, cracking ice, etc.). Feature
set partitioning has also been used for radar-based volcano recognition
[Cherkauer (1996)]. The researcher manually decomposed a set of
119 features into 8 subsets, grouping together features based on
different image-processing operations. For each subset,
four neural networks of different sizes were then built. A new combining
framework for feature set partitioning has been used for text-independent
speaker identification [Chen et al. (1997)]. Other researchers manually
decomposed the feature set of a certain truck backer-upper problem and
reported that this strategy has important advantages [Jenkins and Yuhas
(1993)].
Feature set decomposition can also be obtained by grouping features
based on pairwise mutual information, with statistically similar features
assigned to the same group [Liao and Moody (2000)]. For this purpose, one
can use an existing hierarchical clustering algorithm. Several feature
subsets are then constructed by selecting one feature from each
group. A neural network is subsequently trained for each subset, and all
networks are then combined.
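The grouping step can be sketched with stdlib Python alone. This is a simplified stand-in for the cited approach: pairwise mutual information is computed between discrete features, features with MI above a threshold are merged into one group (a crude single-linkage clustering), and one representative is selected per group. The threshold and the toy data are assumptions for illustration.

```python
# Sketch of mutual-information-based feature grouping: merge features
# with high pairwise MI, then pick one representative per group.
# Threshold and data are illustrative, not from the cited work.
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI (in nats) between two discrete feature columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        # p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += p * math.log(p * n * n / (px[x] * py[y]))
    return mi

def group_features(data, threshold=0.3):
    """Single-linkage-style merging of features with MI > threshold."""
    names = list(data)
    groups = [{name} for name in names]
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mutual_information(data[a], data[b]) > threshold:
                ga = next(g for g in groups if a in g)
                gb = next(g for g in groups if b in g)
                if ga is not gb:
                    ga |= gb
                    groups.remove(gb)
    return groups

data = {
    "f1": [0, 0, 1, 1, 0, 1],
    "f2": [0, 0, 1, 1, 0, 1],  # duplicate of f1 -> high MI, same group
    "f3": [1, 0, 0, 1, 1, 0],  # weakly related -> its own group
}
groups = group_features(data)
representatives = sorted(min(g) for g in groups)
print(representatives)  # one feature selected from each group
```

A full pipeline would repeat the selection to build several disjoint subsets, train one network per subset, and combine the networks' outputs.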
In the statistics literature, the best-known feature-oriented ensemble
algorithm is the MARS algorithm [Friedman (1991)]. In this algorithm, a
multiple regression function is approximated using linear splines and their
tensor products. It has been shown that the algorithm performs an ANOVA
decomposition, namely, the regression function is represented as a grand
total of several sums. The first sum is over all basis functions that involve
only a single attribute. The second sum is over all basis functions that involve
exactly two attributes, representing (if present) two-variable interactions.
Similarly, the third sum represents (if present) the contributions from three-
variable interactions, and so on.
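The decomposition described above can be sketched in Friedman's notation, where each sum collects the basis functions involving exactly one, two, three, etc. variables (the symbols $a_0$, $f_i$, $f_{ij}$ here follow that convention):

$$
\hat{f}(x) \;=\; a_0 \;+\; \sum_{i} f_i(x_i) \;+\; \sum_{i<j} f_{ij}(x_i, x_j) \;+\; \sum_{i<j<k} f_{ijk}(x_i, x_j, x_k) \;+\; \cdots
$$

Here $a_0$ is the constant term, the first sum contains the single-attribute contributions, the second the two-variable interactions, and so on.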