at the meta-level and empirically showed that it performs better than
existing stacking approaches and better than selecting the best classifier
by cross-validation.
The SCANN (for Stacking, Correspondence Analysis and Nearest
Neighbor) combining method [Merz (1999)] uses the strategies of stacking
and correspondence analysis. Correspondence analysis is a method for
geometrically modeling the relationship between the rows and columns
of a matrix whose entries are categorical. In this context, correspondence
analysis is used to explore the relationship between the training examples
and their classification by a collection of classifiers.
A nearest neighbor method is then applied to classify unseen examples.
Here, each possible class is assigned coordinates in the space derived by
correspondence analysis. Unclassified examples are mapped into the new
space, and the class label corresponding to the closest class point is assigned
to the example.
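The final nearest-neighbor step can be sketched as follows. This is a minimal illustration of assigning an unclassified example to the closest class point; it assumes the correspondence-analysis coordinates for the class points and the mapped example have already been computed (the derivation of those coordinates is omitted, and all names are illustrative):

```python
import math

def scann_classify(example_coords, class_points):
    """Return the label of the class point closest (in Euclidean
    distance) to the example's coordinates in the derived space."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(class_points, key=lambda label: dist(example_coords, class_points[label]))

# class_points maps each class label to its (assumed precomputed)
# coordinates in the correspondence-analysis space.
class_points = {"spam": (0.9, 0.1), "ham": (0.1, 0.8)}
print(scann_classify((0.8, 0.2), class_points))  # → spam
```

The example at (0.8, 0.2) lies closest to the "spam" class point, so that label is assigned.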
9.3.2.2 Arbiter Trees
According to Chan and Stolfo's approach [Chan and Stolfo (1993)], an
arbiter tree is built in a bottom-up fashion. Initially, the training set is
randomly partitioned into k disjoint subsets. The arbiter is induced from a
pair of classifiers and recursively a new arbiter is induced from the output
of two arbiters. Consequently for k classifiers, there are log 2 ( k ) levels in the
generated arbiter tree.
The creation of the arbiter is performed as follows. For each pair
of classifiers, the union of their training dataset is classified by the
two classifiers. A selection rule compares the classifications of the two
classifiers and selects instances from the union set to form the training
set for the arbiter. The arbiter is induced from this set with the same
learning algorithm used in the base level. The purpose of the arbiter is to
provide an alternate classification when the base classifiers present diverse
classifications. This arbiter, together with an arbitration rule, decides on
a final classification outcome, based upon the base predictions. Figure 9.4
shows how the final classification is selected based on the classification of
two base classifiers and a single arbiter.
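One common instantiation of these two rules can be sketched as follows. Here the selection rule keeps the instances on which the pair of base classifiers disagree, and the arbitration rule returns the agreed class when the bases concur and otherwise defers to the arbiter; both rules are illustrative variants, and the toy classifiers are hypothetical stand-ins for induced models:

```python
def select_for_arbiter(instances, clf1, clf2):
    """Selection rule (one common variant): keep the instances on
    which the two base classifiers disagree; these form the
    training set for the arbiter."""
    return [x for x in instances if clf1(x) != clf2(x)]

def arbitrate(x, clf1, clf2, arbiter):
    """Arbitration rule: when the base classifiers agree, return
    the agreed class; otherwise let the arbiter decide."""
    p1, p2 = clf1(x), clf2(x)
    return p1 if p1 == p2 else arbiter(x)

# Toy base classifiers over integers (purely illustrative).
even = lambda x: "A" if x % 2 == 0 else "B"
small = lambda x: "A" if x < 5 else "B"
data = [1, 2, 6, 8]
print(select_for_arbiter(data, even, small))        # → [1, 6, 8]
print(arbitrate(6, even, small, lambda x: "B"))     # → B
```

On instance 2 the bases agree on "A", so no arbitration is needed; on the disagreement instances the arbiter's prediction decides the outcome.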
The process of forming the union of data subsets; classifying it using
a pair of arbiter trees; comparing the classifications; forming a training
set; training the arbiter; and picking one of the predictions, is recursively
performed until the root arbiter is formed. Figure 9.5 illustrates an arbiter
tree created for k = 4. T1, ..., T4 are the initial four training datasets from
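The recursive bottom-up construction can be sketched as follows. This is a structural illustration only: `train_arbiter` is a hypothetical callable that would apply the selection rule and induce an arbiter for a pair of classifiers (or sub-arbiters), and k is assumed to be a power of two so that every level pairs up evenly:

```python
def build_arbiter_tree(classifiers, train_arbiter):
    """Pair up the classifiers (or sub-arbiters) at each level and
    train an arbiter over each pair, repeating until a single root
    arbiter remains. For k leaves this yields log2(k) arbiter levels."""
    level = list(classifiers)
    while len(level) > 1:
        level = [train_arbiter(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]  # the root arbiter

# With k = 4 base classifiers the tree has log2(4) = 2 arbiter levels.
tree = build_arbiter_tree(
    ["C1", "C2", "C3", "C4"],
    lambda a, b: ("arb", a, b))  # placeholder arbiter trainer
print(tree)  # → ('arb', ('arb', 'C1', 'C2'), ('arb', 'C3', 'C4'))
```

The nested tuples mirror the tree of Figure 9.5: two first-level arbiters over the classifier pairs, and a root arbiter over those two.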