Hybridization of Decision Trees with other Techniques - Data Mining with Decision Trees: Theory and Applications

Database Reference

In-Depth Information

GA begin by randomly generating a population of L candidate

solutions. Given such a population, a GA generates a new candidate solution

(population element) by selecting two of the candidate solutions as the

parent solutions. This process is termed reproduction. Generally, parents

are selected randomly from the population with a bias toward the better

candidate solutions. Given two parents, one or more new solutions are

generated by taking some characteristics of the solution from the first

parent (the “father”) and some from the second parent (the “mother”).

This process is termed “crossover”. For example, in GA that use binary

encoding of n bits to represent each possible solution, we might randomly

select a crossover bit location denoted as o . Two descendant solutions could

then be generated. The first descendant would inherit the first o string

characteristics from the father and the remaining n

o characteristics

from the mother. The second descendant would inherit the first o string

characteristics from the mother and the remaining n

−

o characteristics

from the father. This type of crossover is the most common and it is

termed one-point crossover. Crossover is not necessarily applied to all pairs

of individuals selected for mating: a P crossover probability is used in order

to decide whether crossover will be applied. If crossover is not applied, the

offspring are simply duplications of the parents.

Finally, once descendant solutions are generated, GA allow characteris-

tics of the solutions to be changed randomly in a process known as mutation.

In the binary encoding representation, according to a certain probability

( P mut ), each bit is changed from its current value to the opposite value.

Once a new population has been generated, it is decoded and evaluated.

The process continues until some termination criterion is satisfied. A GA

converges when most of the population is identical, or in other words, when

the diversity is minimal.

Based on the pseudo code, one should provide the following ingredients

when using a GA algorithm for decision trees: crossover operator, mutation

operator, fitness function, a method to create the initial population and a

stopping criterion.

Several GA-based systems, which learn decision trees in the top-down

manner have been proposed, such as BTGA [ Chai et al . (1996) ] ,OC1-ES

[ Cantu-Paz and Kamath (2003) ] and DDT-EA [ Krtowski (2004) ] . Generally,

they apply an evolutionary approach to the test search, especially in the

form of hyper-planes.

The GDT-EA algorithm [ Krtowski and Grze (2005) ] that we describe

here searches for the whole tree at once in contrast to greedy, top-down

−

Data Mining with Decision Trees: Theory and Applications

Search WWH ::

Custom Search

Home