Advanced Decision Trees - Data Mining with Decision Trees: Theory and Applications

[John (1996)] and others ([Utgoff (1989a)]; [Lubinsky (1993)]; [Sethi and Yoo (1994)]).

Growing oblique decision trees was first proposed as a linear combination extension to the CART algorithm. This extension is known as CART-LC [Biermann et al. (1982)]. Oblique Classifier 1 (OC1) is an inducer of oblique decision trees designed for training sets with numeric instances [Murthy et al. (1994)]. OC1 builds oblique hyperplanes by using a linear combination of one or more numeric attributes at each internal node; the resulting trees partition the space of examples with both oblique and axis-parallel hyperplanes.
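
The difference between the two kinds of node test can be illustrated with a minimal sketch. It only evaluates a given split; it does not show how OC1 or CART-LC search for the coefficients. The function names, weights, thresholds and sample points are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def axis_parallel_test(x: Sequence[float], attribute: int, threshold: float) -> bool:
    """Axis-parallel test: compare a single attribute to a threshold."""
    return x[attribute] <= threshold


def oblique_test(x: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Oblique test: compare a linear combination of attributes to a threshold."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


# Hypothetical 2-D instances, split by the oblique hyperplane x0 + x1 = 1.
points = [(0.2, 0.3), (0.9, 0.8), (0.1, 0.7), (0.6, 0.9)]
left = [p for p in points if oblique_test(p, weights=(1.0, 1.0), threshold=1.0)]
right = [p for p in points if p not in left]
```

An axis-parallel test is the special case in which exactly one weight is non-zero, which is why a single oblique tree can mix both kinds of hyperplane.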

11.7 Incremental Learning of Decision Trees

To reflect new data that has become available, most decision tree inducers must rebuild the tree from scratch. This is time-consuming and expensive, and several researchers have addressed the issue of updating decision trees incrementally. Utgoff [Utgoff (1989b); Utgoff (1997)], for example, presents several methods for incrementally updating decision trees, while Crawford (1989) describes an extension to the CART algorithm that is capable of inducing incrementally.
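
As a rough illustration of the general idea of updating rather than rebuilding, the sketch below keeps per-leaf class counts in a fixed one-level tree and adjusts them as each new example arrives, without revisiting earlier data. This is not Utgoff's or Crawford's algorithm, which also restructure the tree; the class `IncrementalStump`, its fixed split, and the sample data are assumptions made for illustration only.

```python
from collections import Counter


class IncrementalStump:
    """Fixed one-level tree whose leaf class counts are updated per example."""

    def __init__(self, attribute: int, threshold: float) -> None:
        self.attribute = attribute    # index of the attribute tested at the root
        self.threshold = threshold    # split point (fixed here, never revised)
        self.leaves = {"left": Counter(), "right": Counter()}

    def _route(self, x):
        """Return the name of the leaf that instance x falls into."""
        return "left" if x[self.attribute] <= self.threshold else "right"

    def update(self, x, label) -> None:
        """Absorb one new labeled example by incrementing a single counter."""
        self.leaves[self._route(x)][label] += 1

    def predict(self, x):
        """Return the majority class of the leaf, or None if the leaf is empty."""
        counts = self.leaves[self._route(x)]
        return counts.most_common(1)[0][0] if counts else None


# Hypothetical stream of (instance, label) pairs.
stump = IncrementalStump(attribute=0, threshold=0.5)
for x, y in [((0.2,), "a"), ((0.8,), "b"), ((0.3,), "a")]:
    stump.update(x, y)
```

Each update touches only one counter, so the cost of absorbing a new example does not depend on how much data has already been seen.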

11.7.1 The Motives for Incremental Learning

In the ever-changing world of information technology there are two

fundamental problems to be addressed:

• Vast quantities of digital data continue to grow at staggering rates. In organizations such as e-commerce sites, large retailers and telecommunication corporations, data increases of gigabytes per day are not uncommon. While this data could be extremely valuable to these organizations, its tremendous volume makes it virtually impossible to extract useful information. This is because KDD systems in general, and traditional data mining algorithms in particular, are limited by several crippling factors. These factors, referred to as computational resources, are the size of the sample to be processed, running time and memory. As a result, most of the available data goes unused, which leads to underfitting: although there is enough data to model a compound phenomenon, there is no capability to fully utilize it, and unsatisfactorily simple models are produced.
