Decision Forests - Data Mining with Decision Trees: Theory and Applications

shows that the proposed method is well suited for datasets of numeric input

attributes and that its performance is influenced by the dataset size and its

homogeneity.

NBTree [ Kohavi (1996) ] is an instance space decomposition method

that induces a decision tree and a Naïve Bayes hybrid classifier. Naïve

Bayes, which is a classification algorithm based on Bayes' theorem and a

naïve independence assumption, is very efficient in terms of its processing

time. To induce an NBTree, the instance space is recursively partitioned

according to attribute values. The result of the recursive partitioning is

a decision tree whose terminal nodes are Naïve Bayes classifiers. Since

assigning a Naïve Bayes classifier to a terminal node means that the hybrid

classifier may classify two instances from a single hyper-rectangle region

into distinct classes, the NBTree is more flexible than a pure decision tree.

In order to decide when to stop the growth of the tree, NBTree compares

two alternatives in terms of error estimation: partitioning further into

hyper-rectangle regions, or inducing a single Naïve Bayes classifier. The error

estimation is calculated by cross-validation, which significantly increases the

overall processing time. Although NBTree applies a Naïve Bayes classifier

to decision tree terminal nodes, classification algorithms other than Naïve

Bayes are also applicable. However, the cross-validation estimations make

the NBTree hybrid computationally expensive for more time-consuming

algorithms such as neural networks.
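
The stopping rule described above can be illustrated with a minimal sketch. It assumes categorical attributes, a small hand-written Naïve Bayes with Laplace smoothing, and interleaved k-fold cross-validation; the function names (`train_nb`, `predict_nb`, `cv_error`, `split_decision`) and the toy data are hypothetical and are not taken from Kohavi (1996). A node is split only if the weighted cross-validated error of Naïve Bayes classifiers on the partitions is lower than that of a single Naïve Bayes classifier on the whole node.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> values seen in training
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict_nb(model, row):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for i, v in enumerate(row):
            # Laplace smoothing; the extra +1 reserves mass for unseen values.
            score += math.log((value_counts[(i, y)][v] + 1)
                              / (n_y + len(values[i]) + 1))
        if score > best_score:
            best, best_score = y, score
    return best


def cv_error(rows, labels, folds=5):
    """Cross-validated error rate of Naive Bayes on one region."""
    n = len(rows)
    k = min(folds, n)
    if k < 2:
        return 1.0  # assumption: too little data counts as worst case
    errors = 0
    for f in range(k):
        train = [j for j in range(n) if j % k != f]
        test = [j for j in range(n) if j % k == f]
        model = train_nb([rows[j] for j in train], [labels[j] for j in train])
        errors += sum(predict_nb(model, rows[j]) != labels[j] for j in test)
    return errors / n


def split_decision(rows, labels, attr, folds=5):
    """Compare a single Naive Bayes leaf with a split on attribute `attr`."""
    leaf_error = cv_error(rows, labels, folds)
    parts = defaultdict(list)
    for j, row in enumerate(rows):
        parts[row[attr]].append(j)
    split_error = sum(
        len(idx) / len(rows)
        * cv_error([rows[j] for j in idx], [labels[j] for j in idx], folds)
        for idx in parts.values()
    )
    return split_error < leaf_error, leaf_error, split_error


# Toy data: the class is the XOR of two binary attributes, which a single
# Naive Bayes classifier cannot represent.
rows = [(a, b) for _ in range(10) for a in (0, 1) for b in (0, 1)]
labels = [a ^ b for a, b in rows]
print(split_decision(rows, labels, attr=0))  # (True, 0.5, 0.0)
```

On this toy data a single Naïve Bayes classifier is no better than chance, while one split makes each region trivially separable, so the sketch decides to split. Every candidate split triggers a full round of cross-validation on each partition, which is the source of the processing cost noted in the text.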

More recently, Cohen et al. (2007) generalize the NBTree idea and

examine a decision-tree framework for space decomposition. According to

this framework, the original instance-space is hierarchically partitioned into

multiple sub-spaces and a distinct classifier (such as a neural network) is

assigned to each sub-space. Subsequently, an unlabeled, previously-unseen

instance is classified by employing the classifier that was assigned to the

sub-space to which the instance belongs.
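
The classification step of such a framework can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of Cohen et al. (2007): the `Node` class and `classify` function are hypothetical names, internal nodes test a single categorical attribute, and the leaf classifiers are plain functions standing in for trained models such as neural networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Node:
    # Internal node: index of the attribute to test and one child per value.
    attr: Optional[int] = None
    children: Optional[Dict[object, "Node"]] = None
    # Leaf node: the classifier assigned to this sub-space.
    classifier: Optional[Callable[[Sequence], object]] = None


def classify(node, instance):
    """Route the instance to its sub-space, then apply that leaf's classifier."""
    while node.classifier is None:
        # Raises KeyError for an attribute value with no child (not handled here).
        node = node.children[instance[node.attr]]
    return node.classifier(instance)


# Hypothetical two-region decomposition on attribute 0; each region has its
# own (here trivial) classifier over the remaining attribute.
tree = Node(
    attr=0,
    children={
        0: Node(classifier=lambda x: x[1]),
        1: Node(classifier=lambda x: 1 - x[1]),
    },
)

print(classify(tree, (0, 1)))  # 1
print(classify(tree, (1, 1)))  # 0
```

Only the classifier of the matching sub-space is consulted for a given instance, so the leaf models never need to agree with one another outside their own regions.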

The divide-and-conquer approach includes many other specific methods

such as local linear regression, CART/MARS, adaptive sub-space models,

etc. [Johansen and Foss (1992); Ramamurti and Ghosh (1999); Holmstrom

et al. (1997)].

9.5.4.2 Feature Subset-based Ensemble Methods

Another less common strategy for manipulating the search space is to

manipulate the input attribute set. Feature subset-based ensemble methods

are those that manipulate the input feature set for creating the ensemble
