Decision Forests - Data Mining with Decision Trees: Theory and Applications

are transformed into new concepts in an iterative manner to create a hierarchy of concepts. A different function decomposition that can be applied in data mining is the Bi-Decomposition [Long (2003)]. In this approach, the original function is decomposed into two decomposition functions that are connected by a two-input operator called a “gate”. Each of the decomposition functions depends on fewer variables than the original function. Recursive bi-decomposition represents a function as a structure of interconnected gates.
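To make the idea concrete, the following Python sketch checks one bi-decomposition by brute force. The function `f`, the variable split, and all helper names are hypothetical choices for illustration and are not taken from Long (2003): `f` depends on four Boolean variables, while `g` and `h` each depend on only two, and a two-input gate joins their outputs.

```python
from itertools import product

# Hypothetical target function of four Boolean inputs (illustrative only).
def f(a, b, c, d):
    return (a and b) or (c != d)

# Decomposition functions: each depends on fewer variables than f.
def g(a, b):
    return a and b      # uses only {a, b}

def h(c, d):
    return c != d       # uses only {c, d}

# Two-input "gates" that can connect the outputs of g and h.
def or_gate(x, y):
    return x or y

def and_gate(x, y):
    return x and y

def is_bi_decomposition(f, g, h, gate):
    """Exhaustively check f(a, b, c, d) == gate(g(a, b), h(c, d)) on all 16 inputs.

    The variable split {a, b} / {c, d} is fixed here for simplicity.
    """
    return all(
        f(a, b, c, d) == gate(g(a, b), h(c, d))
        for a, b, c, d in product([False, True], repeat=4)
    )

print(is_bi_decomposition(f, g, h, or_gate))   # True
print(is_bi_decomposition(f, g, h, and_gate))  # False
```

The sketch only verifies a given decomposition; it does not search for one. Applying the same idea recursively, by splitting a wider `g` or `h` into two smaller functions joined by another gate, yields the structure of interconnected gates described above.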

9.5.4 Partitioning the Search Space

The idea is that each member of the ensemble explores a different part of the search space. Thus, the original instance space is divided into several sub-spaces. Each sub-space is considered independently, and the total model is a (possibly soft) union of such simpler models.

When using this approach, one should decide whether the sub-spaces will overlap. At one extreme, the original problem is decomposed into several mutually exclusive sub-problems, such that each sub-problem is solved using a dedicated classifier. In such cases, the classifiers may have significant variations in their overall performance over different parts of the input space [Tumer and Ghosh (2000)]. At the other extreme, each classifier solves the same original task. In such cases, “If the individual classifiers are then appropriately chosen and trained properly, their performances will be (relatively) comparable in any region of the problem space” [Tumer and Ghosh (2000)]. In practice, however, the sub-spaces usually have soft boundaries; that is, they are allowed to overlap.
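The difference between mutually exclusive and overlapping sub-spaces can be sketched in a few lines of Python. This is a minimal illustration under assumed settings, not a method from the text: the one-dimensional instance space, the split at 0, the constant-output local models, and the logistic membership function with its `width` parameter are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical classifiers, each dedicated to one half of a 1-D instance space.
# Each returns an estimate of P(y = 1 | x); constants keep the example small.
def left_model(x):    # responsible for x < 0
    return 0.8

def right_model(x):   # responsible for x >= 0
    return 0.3

def hard_union(x):
    # Mutually exclusive sub-spaces: exactly one model answers for each x.
    return left_model(x) if x < 0 else right_model(x)

def soft_union(x, width=1.0):
    # Overlapping sub-spaces: x belongs to both, with memberships that sum to 1.
    # Smaller `width` means less overlap; as width -> 0 this approaches hard_union.
    w_right = sigmoid(x / width)
    w_left = 1.0 - w_right
    return w_left * left_model(x) + w_right * right_model(x)

for x in (-3.0, 0.0, 3.0):
    print(x, hard_union(x), round(soft_union(x), 3))
```

Near the boundary the soft union blends both local models, whereas the hard union switches abruptly at x = 0; far from the boundary, or with a very small `width`, the two give nearly the same answer.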

There are two popular approaches to search space manipulation: divide-and-conquer approaches and feature subset-based ensemble methods.

9.5.4.1 Divide and Conquer

In the neural-networks community, Nowlan and Hinton (1991) examined the mixture of experts (ME) approach, which partitions the instance space into several sub-spaces and assigns different experts (classifiers) to the different sub-spaces. In ME, the sub-spaces have soft boundaries (i.e., they are allowed to overlap). A gating network then combines the experts' outputs and produces a composite decision.
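The combination step of a mixture of experts can be sketched as follows, assuming a linear gating network with a softmax output, which is one common choice. Only inference is shown: the experts, the gate parameters `weights` and `biases`, and all function names are hypothetical, and training (fitting the experts and the gate together) is omitted.

```python
import math

def softmax(scores):
    m = max(scores)                           # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate(x, weights, biases):
    """Linear gating network: one score per expert, normalised by softmax."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

def mixture_of_experts(x, experts, weights, biases):
    """Composite decision: gate-weighted average of the experts' class distributions."""
    g = gate(x, weights, biases)
    outputs = [expert(x) for expert in experts]
    n_classes = len(outputs[0])
    return [sum(g_k * out[c] for g_k, out in zip(g, outputs))
            for c in range(n_classes)]

# Hypothetical stand-ins for two trained experts on a 2-class problem.
experts = [
    lambda x: [0.9, 0.1],   # expert for the region where x[0] is negative
    lambda x: [0.2, 0.8],   # expert for the region where x[0] is positive
]
# Hand-picked gate parameters (in ME these would be learned with the experts).
weights = [[-5.0], [5.0]]
biases = [0.0, 0.0]

for x in ([-2.0], [0.0], [2.0]):
    print(x, [round(p, 3) for p in mixture_of_experts(x, experts, weights, biases)])
```

Because the gate's outputs are positive and sum to 1, every expert contributes to every input to some degree; this is what gives the sub-spaces their soft boundaries.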

Some researchers have used clustering techniques to partition the space. The basic idea is to partition the instance space into mutually exclusive subsets using the K-means clustering algorithm. An analysis of the results
